

Re: regex.c simplification

From: Paul Eggert
Subject: Re: regex.c simplification
Date: Sat, 16 Jun 2018 09:11:34 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0

Eli Zaretskii wrote:
> I think we still haven't abandoned the hope of updating to the latest
> glibc/gnulib versions of regex.c, although I'm not sure how practical
> these hopes are at this point.

That's been on my list of things to do for ages. I don't know if it'll ever get done, or even whether it's worth doing.

As far as I know, Emacs is the only package that still uses the "old" regex.c code derived from pre-2002 glibc. Everybody else has migrated to the "new" regex.c code that was contributed to glibc in 2002 and is in Gnulib. So, in some sense regex.c has already forked; we just haven't made it official.

A complication: src/regex.c is compiled twice, once within lib-src (for etags) and once within src (for Emacs proper), and the "#if defined emacs" stuff in src/regex.c matters for this.
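To illustrate why the double compilation matters, here is a minimal, hypothetical sketch of the pattern: one source file whose behavior changes depending on whether the macro 'emacs' is defined at compile time. The function names are invented for illustration and do not come from src/regex.c; in the real file, the Emacs-only branches consult buffer syntax tables and other Emacs internals that etags does not have.

```c
/* Hypothetical sketch: one source file, two builds.
   Compile with -Demacs for the "Emacs proper" variant,
   or without it for a standalone (etags-like) variant. */
#include <stddef.h>

#ifdef emacs
/* Emacs build: word syntax would come from the buffer's syntax table.
   This stub pretends that table also treats '_' as a word character. */
static int
char_is_word (int c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
         || (c >= '0' && c <= '9') || c == '_';
}
#else
/* Standalone build: no syntax tables, so use a fixed ASCII rule. */
static int
char_is_word (int c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
         || (c >= '0' && c <= '9');
}
#endif

/* Count word characters in S; the answer depends on which build this is. */
size_t
count_word_chars (const char *s)
{
  size_t n = 0;
  for (; *s; s++)
    if (char_is_word ((unsigned char) *s))
      n++;
  return n;
}
```

If only one build remained, one side of every such conditional could simply be deleted.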

If we wanted to make the fork more official, we could simplify src/regex.c to not worry about lib-src, by having etags use Glibc/Gnulib regex rather than Emacs regex. That would be easy for me to arrange, if you like. Once we did that, you could simplify src/regex.c by assuming that 'emacs' is defined. None of this would preclude us from eventually merging Emacs src/regex.c with Gnulib/glibc, a task that is so hard that the changes Daniel is thinking about wouldn't make it much harder.

While we're on the topic, a couple more comments about the regex code.

The "old" and the "new" regex implementations both have problems. The old one has serious performance problems in some cases, and fails to conform to POSIX. The new one is typically better in both departments, but it is so complicated that no maintainer understands it, so its (hopefully few) bugs remain unfixed. (I have attempted to contact the original contributor, Isamu Hasegawa of Square Enix Co., Ltd., but have never heard back.)
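One concrete POSIX requirement is the leftmost-longest rule: among matches starting at the leftmost position, regexec must report the longest. A backtracking matcher that tries alternatives in order, as Emacs's ordinary string-match does, stops at the first alternative that succeeds (Emacs provides posix-string-match for the POSIX behavior). The sketch below, with a hypothetical helper match_length, shows the rule through the POSIX API on a conforming C library such as glibc.

```c
#include <regex.h>

/* Hypothetical helper: return the length of the leftmost match of the
   extended regular expression PATTERN in SUBJECT, or -1 if there is no
   match or PATTERN does not compile. */
long
match_length (const char *pattern, const char *subject)
{
  regex_t re;
  regmatch_t m[1];   /* m[0] receives the whole-match offsets */

  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;

  int rc = regexec (&re, subject, 1, m, 0);
  regfree (&re);
  if (rc != 0)
    return -1;
  return (long) (m[0].rm_eo - m[0].rm_so);
}
```

With a POSIX-conforming matcher, match_length ("a|ab", "abc") is 2, because "ab" is longer than "a"; a first-alternative-wins matcher would report 1.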

The Perl-compatible regular expression library (PCRE) is popular in other free software and appears to be better maintained than either the "old" or the "new" regex code. GNU Grep, for example, uses either the "new" regex code or PCRE, depending on command-line options (-P selects PCRE). The Perl-style library tends to be more like the "old" regex implementation, in that it prefers functionality and flexibility to performance; however, it has many more features than the "old" regex code does. Among other things, it supports a more readable regular expression syntax (a topic that came up recently on this mailing list in another context).

