Re: Regex library
From: Assaf Gordon
Subject: Re: Regex library
Date: Sun, 27 Jun 2021 00:24:35 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.0

Hello,
On 2021-06-18 1:42 a.m., Pietro Paolini wrote:
> In the sed source code there is a folder called lib/ which seems to include
> the GNU lib, or maybe I am flat wrong and that isn't gnulib
The content of "lib/" in sed-4.8.tar.gz is indeed a subset of gnulib.
> Another question concerns the regex library in use, I can see the code
> using regex functions defined as part of gnulib
> [...]
> Yet when I ldd the sed binary I can observe that PCRE is dynamically linked
> [...]
> What library is used for regex in GNU sed? I am inclined to say that PCRE
> isn't used, after all libpthread gets linked too and it is not used.
First,
You're correct - PCRE is not used by GNU sed.
On my system, it is "libselinux" that uses PCRE (and sed does use
selinux by default):
$ ldd /lib/x86_64-linux-gnu/libselinux.so.1 | grep -i pcre
libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3
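
ldd prints every library that ends up loaded, including dependencies of
dependencies, which is why libpcre shows up for sed. A minimal sketch for
listing only the direct dependencies, assuming binutils' "readelf" is
installed ("needed_libs" is a hypothetical helper name, not part of sed):

```shell
# needed_libs: read "readelf -d" output on stdin and print only the
# library names from the (NEEDED) entries, one per line.
# Hypothetical helper for illustration.
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\([^]]*\)].*/\1/p'
}

# Example use (assumes readelf is available on your system):
#   readelf -d "$(command -v sed)" | needed_libs
```

If libpcre is missing from that list but present in the ldd output, it is
being pulled in by one of the listed libraries rather than by sed itself.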
Second,
As for which regex code is used, the answer is a bit nuanced.
The source code file which does the actual regex matching is "sed/regexp.c":
https://git.savannah.gnu.org/cgit/sed.git/tree/sed/regexp.c
Inside, two main functions are used: re_compile_pattern() and re_search().
These are defined in gnulib's "regcomp.c" and "regexec.c" files:
https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/regcomp.c
https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/regexec.c
However,
These functions are also defined in glibc (although internal).
glibc's and gnulib's source code are often synchronized, so these
functions might be identical, or (if it's an old glibc) gnulib's
version that is bundled with GNU sed might be newer.
During "./configure", if the system's glibc is detected to have a
new-enough version of these functions, they will be used.
Otherwise, the gnulib version will be used.
You can force the build to use glibc's version with:
./configure --without-included-regex
But that's not recommended, unless you are certain of what you're
doing.
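
To see which way the detection went on a given build, one option is to look
at what "./configure" printed. A rough sketch, assuming the output was saved
(e.g. "./configure | tee configure.out") and that the gnulib check is
reported on a line starting with "checking for working re_compile_pattern";
"regex_choice" is a hypothetical helper name:

```shell
# regex_choice: read saved ./configure output on stdin and report which
# regex implementation the re_compile_pattern check selected.
# Assumption: the check line reads
#   "checking for working re_compile_pattern... <result>"
# where <result> ends in "yes" or "no" (possibly prefixed, e.g. "(cached)").
regex_choice() {
    result=$(sed -n 's/^checking for working re_compile_pattern\.\.\. *//p' | tail -n 1)
    case "$result" in
        *yes) echo "system regex (glibc)" ;;
        *no)  echo "bundled gnulib regex" ;;
        *)    echo "unknown" ;;
    esac
}

# Example use:
#   regex_choice < configure.out
```

If the line is absent (for instance because the choice was forced on the
command line), the sketch simply prints "unknown".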
Third,
To add another layer, GNU sed employs some regex optimizations using a
faster engine (gnulib's DFA engine,
https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/dfa.c ).
That code is not available in glibc, and so it is always taken from gnulib.
Hope this answers the question.
regards,
- assaf