Re: regex.c simplification
Sat, 16 Jun 2018 09:11:34 -0700
Eli Zaretskii wrote:
> I think we still haven't abandoned the hope of updating to the latest
> glibc/gnulib versions of regex.c, although I'm not sure how practical
> these hopes are at this point.
That's been on my list of things to do for ages. I don't know if it'll ever get
done, or even whether it's worth doing.
As far as I know, Emacs is the only package that still uses the "old" regex.c
code derived from pre-2002 glibc. Everybody else has migrated to the "new"
regex.c code that was contributed to glibc in 2002 and is in Gnulib. So, in some
sense regex.c has already forked; we just haven't made it official.
A complication: src/regex.c is compiled twice, once within lib-src (for etags)
and once within src (for Emacs proper), and the "#if defined emacs" stuff in
src/regex.c matters for this.
If we wanted to make the fork more official, we could simplify src/regex.c to
not worry about lib-src, by having etags use glibc/Gnulib regex rather than
Emacs regex. That would be easy for me to arrange, if you like. Once we did
that, you could simplify src/regex.c by assuming that 'emacs' is defined. None
of this would preclude us from eventually merging Emacs src/regex.c with
Gnulib/glibc, a task that is so hard that the changes Daniel is thinking about
wouldn't make it much harder.
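
For readers unfamiliar with the pattern, here is a minimal sketch of how one
source file can serve two builds through a preprocessor symbol. It is only an
illustration and is not taken from src/regex.c: IS_WORD_CHAR,
host_is_word_char and count_word_chars are hypothetical names, and the
word-character rule is an assumption chosen to keep the example short.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical illustration; none of these names come from src/regex.c.  */

#ifdef emacs
/* Build inside the host program (compiled with -Demacs): the host is
   assumed to supply its own notion of a word-constituent character.  */
extern bool host_is_word_char (int c);
# define IS_WORD_CHAR(c) host_is_word_char (c)
#else
/* Standalone build (for example, a small utility): use a fixed rule.  */
# define IS_WORD_CHAR(c) (isalnum (c) || (c) == '_')
#endif

/* Shared code, compiled unchanged in both configurations.  */
int
count_word_chars (const char *s)
{
  assert (s != NULL);
  int n = 0;
  for (; *s; s++)
    if (IS_WORD_CHAR ((unsigned char) *s))
      n++;
  return n;
}
```

If the standalone build went away, the #else branch and the conditional
around it could be deleted, which is the kind of simplification that assuming
'emacs' is defined would allow.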
While we're on the topic, a couple more comments about regex code.
The "old" and the "new" regex implementations both have problems. The old one
has serious performance problems in some cases, and fails to conform to POSIX.
The new one is typically better in both departments, but is so complicated that
no maintainer understands it (I have attempted to contact the original
contributor Isamu Hasegawa of Square Enix Co., Ltd., but have never heard back),
so its (hopefully few) bugs remain unfixed.
The Perl regular expression library is popular in other free software and
appears to be better maintained than either the "old" or the "new" regex code. GNU
Grep, for example, uses either the "new" regex code or the Perl library,
depending on command-line options. The Perl library tends to be more like the
"old" regex implementation, in that it prefers functionality and flexibility to
performance; however, it has many more features than the "old" regex code does.
Among other things, it supports a more-readable regular expression syntax (a
topic that came up recently on this mailing list in another context).