From: Bruno Haible
Subject: Re: implementing extended bracket expressions in gnulib [was Re: Dealing with character ranges in grep]
Date: Thu, 9 Jun 2011 13:53:21 +0200
User-agent: KMail/1.9.9

Paolo,

> My proposal wouldn't change defaults, which is why I believe that this 
> is a separate topic.

But at the same time you are pushing for the use of --with-included-regex.
We found out that by doing this, the equivalence classes feature gets lost,
and the divergence between glibc and gnulib becomes greater.

> 1) Aharon would like to release gawk 4.0 in the very near future, and 2) 
> adding an extension to glibc takes time.  That's why I prefer to work in 
> smaller steps.

If there is time pressure for gawk 4.0, gawk can itself make modifications
to the regex from gnulib, through gnulib-tool's option --local-dir. See
<http://www.gnu.org/software/gnulib/manual/html_node/Extending-Gnulib.html>.
It can also make --with-included-regex the default on its own.

> We'd need glibc to export two functions in both multi-byte and 
> wide-character versions:
> 
> 1) streqcoll(S1, S2) and wcseqcoll(S1, S2) would be the same as strcoll 
> and wcscoll, but they would compare only according to primary weights. 
> A slightly more formal definition is that streqcoll(S1, S2) == 0 iff S1 
> matches the `[=C1=][=C2=][=C3=]...[=Cn=]' regular expression, where Ci 
> are the characters of S2 (I'd need to double check this against POSIX 
> though).  When non-zero, the result of streqcoll(S1, S2) would be the 
> same as strcoll(S1, S2).  Likewise, glibc could provide streqxfrm and 
> wcseqxfrm, with the definition that strcmp(streqxfrm(S1), streqxfrm(S2)) 
> == streqcoll(S1, S2).
> 
> 2) On top of this, [.ss.] could be implemented using an additional 
> function mbelemlen(S) giving the length of the first collation element 
> in S.  [.S1.] would be rejected unless mbelemlen(S1) == strlen(S1), and 
> [.S1.] would match S2 if strcoll(S1, strndup(S2, mbelemlen(S2))) == 0. 
> wcelemlen could be provided likewise.
> 
> These are the minimal extensions that would be required to support full 
> regular expression features portably and in a manner that is compatible 
> with glibc, except for ranges

Great! These look like a good basis for discussing with the glibc people.

Ad 1): Is streqcoll symmetric? That is, is streqcoll(S1, S2) the same as
streqcoll(S2, S1)? It is not immediately clear to me from the definition.
If not, then a single streqxfrm function is not sufficient; you would need
two functions, streqxfrm1 and streqxfrm2, such that
  streqcoll(S1, S2) == strcmp(streqxfrm1(S1), streqxfrm2(S2)).
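
To illustrate why symmetry matters, here is a rough toy model in C. The
names toy_streqxfrm and toy_streqcoll are hypothetical, and ASCII case
folding merely stands in for real primary weights, which would come from
the locale's collation tables. The toy also ignores the part of the
proposal that says a non-zero result equals strcoll(S1, S2). The point is
only that any streqcoll of the form strcmp(f(S1), f(S2)) with a single f
is zero for (S1, S2) exactly when it is zero for (S2, S1):

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum { TOY_MAX = 64 };  /* arbitrary buffer size for this sketch */

/* Hypothetical stand-in for the proposed streqxfrm: maps each byte to a
   "primary weight" by folding ASCII case.  Not a real glibc function.  */
static void
toy_streqxfrm (char *dst, size_t size, const char *src)
{
  size_t n = strlen (src);
  assert (n < size);
  for (size_t i = 0; i < n; i++)
    dst[i] = (char) tolower ((unsigned char) src[i]);
  dst[n] = '\0';
}

/* streqcoll defined through a single transform applied to both sides.  */
static int
toy_streqcoll (const char *s1, const char *s2)
{
  char x1[TOY_MAX];
  char x2[TOY_MAX];
  toy_streqxfrm (x1, sizeof x1, s1);
  toy_streqxfrm (x2, sizeof x2, s2);
  return strcmp (x1, x2);
}
```

With this shape, toy_streqcoll ("Strasse", "STRASSE") and
toy_streqcoll ("STRASSE", "Strasse") are both 0 by construction. If the
intended streqcoll is not symmetric, no single transform of this kind can
reproduce it.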

Ad 2): Do you need 2 functions, one for char * strings, and one for wide
strings here as well?
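
For concreteness, the char * variant of point 2 might look roughly like
the sketch below. Everything prefixed with toy_ is hypothetical, and the
table of multi-character collating elements is made up for illustration;
a real mbelemlen would consult the locale and handle multibyte characters.
strcoll is used as in the proposal; in the default "C" locale it simply
compares bytes.

```c
#include <assert.h>
#include <string.h>

/* Made-up table of multi-character collating elements.  */
static const char *const toy_elements[] = { "ch", "ll" };

/* Length in bytes of the first collating element of S, 0 if S is empty.
   Assumes single-byte characters outside the table.  */
static size_t
toy_mbelemlen (const char *s)
{
  size_t count = sizeof toy_elements / sizeof toy_elements[0];
  for (size_t i = 0; i < count; i++)
    {
      size_t len = strlen (toy_elements[i]);
      if (strncmp (s, toy_elements[i], len) == 0)
        return len;
    }
  return *s != '\0' ? 1 : 0;
}

/* [.S1.] is accepted only if S1 is exactly one collating element.  */
static int
toy_symbol_is_valid (const char *s1)
{
  return *s1 != '\0' && toy_mbelemlen (s1) == strlen (s1);
}

/* [.S1.] matches S2 if S1 collates equal to the first element of S2.  */
static int
toy_symbol_matches (const char *s1, const char *s2)
{
  char first[8];
  size_t len = toy_mbelemlen (s2);
  assert (len < sizeof first);
  memcpy (first, s2, len);
  first[len] = '\0';
  return toy_symbol_is_valid (s1) && strcoll (s1, first) == 0;
}
```

In this toy table, [.ch.] is accepted and matches the start of "chleb",
while [.ss.] is rejected because "ss" is not a single collating element
here. A wide-character variant would have the same shape, with wchar_t
strings and wcscoll.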

Bruno
-- 
In memoriam Johanna Kirchner <http://en.wikipedia.org/wiki/Johanna_Kirchner>


