From: Bruno Haible
Subject: Re: implementing extended bracket expressions in gnulib [was Re: Dealing with character ranges in grep]
Date: Thu, 9 Jun 2011 13:53:21 +0200
User-agent: KMail/1.9.9
Paolo,
> My proposal wouldn't change defaults, which is why I believe that this
> is a separate topic.
But at the same time you are pushing for the use of --with-included-regex.
We found out that by doing this, the equivalence classes feature gets lost,
and the divergence between glibc and gnulib becomes greater.
> 1) Aharon would like to release gawk 4.0 in the very near future, and 2)
> adding an extension to glibc takes time. That's why I prefer to work in
> smaller steps.
If there is time pressure for gawk 4.0, gawk can itself make modifications
to the regex from gnulib, through gnulib-tool's option --local-dir. See
<http://www.gnu.org/software/gnulib/manual/html_node/Extending-Gnulib.html>.
It can also make --with-included-regex the default on its own.
> We'd need glibc to export two functions in both multi-byte and
> wide-character versions:
>
> 1) streqcoll(S1, S2) and wcseqcoll(S1, S2) would be the same as strcoll
> and wcscoll, but they would compare only according to primary weights.
> A slightly more formal definition is that streqcoll(S1, S2) == 0 iff S1
> matches the `[=C1=][=C2=][=C3=]...[=Cn=]' regular expression, where Ci
> are the characters of S2 (I'd need to double check this against POSIX
> though). When non-zero, the result of streqcoll(S1, S2) would be the
> same as strcoll(S1, S2). Likewise, glibc could provide streqxfrm and
> wcseqxfrm, with the definition that strcmp(streqxfrm(S1), streqxfrm(S2))
> == streqcoll(S1, S2).
>
> 2) On top of this, [.ss.] could be implemented using an additional
> function mbelemlen(S) giving the length of the first collation element
> in S. [.S1.] would be rejected unless mbelemlen(S1) == strlen(S1), and
> [.S1.] would match S2 if strcoll(S1, strndup(S2, mbelemlen(S2))) == 0.
> wcelemlen could be provided likewise.
>
> These are the minimal extensions that would be required to support full
> regular expression features portably and in a manner that is compatible
> with glibc, except for ranges
Great! These look like a good basis for discussing with the glibc people.
Ad 1): Is streqcoll symmetric? That is, is streqcoll(S1, S2) the same as
streqcoll(S2, S1)? It is not immediately clear to me from the definition.
If not, then a single streqxfrm function is not sufficient; you need two
functions, streqxfrm1 and streqxfrm2, such that
streqcoll(S1, S2) == strcmp(streqxfrm1(S1), streqxfrm2(S2)).
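To make the symmetry question concrete, here is a minimal sketch using only the existing standard functions strcoll and strxfrm as stand-ins, since streqcoll and streqxfrm do not exist yet. The helper names (xfrm_dup, sign, coll_via_xfrm, is_antisymmetric) are made up for this illustration. It shows the property a single transform function gives for free: comparing transformed strings with strcmp always flips sign when the arguments are swapped, so a single streqxfrm can only work if streqcoll behaves the same way.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated strxfrm() image of s, or NULL if malloc fails. */
static char *xfrm_dup(const char *s)
{
    size_t n;
    char *buf;

    assert(s != NULL);
    n = strxfrm(NULL, s, 0);      /* bytes needed, excluding the NUL */
    buf = malloc(n + 1);
    if (buf != NULL)
        strxfrm(buf, s, n + 1);
    return buf;
}

/* Reduce a comparison result to -1, 0 or 1. */
static int sign(int x)
{
    return (x > 0) - (x < 0);
}

/* Sign of strcmp(strxfrm(s1), strxfrm(s2)); returns 2 if malloc fails. */
static int coll_via_xfrm(const char *s1, const char *s2)
{
    char *x1 = xfrm_dup(s1);
    char *x2 = xfrm_dup(s2);
    int r = (x1 != NULL && x2 != NULL) ? sign(strcmp(x1, x2)) : 2;

    free(x1);
    free(x2);
    return r;
}

/* 1 if swapping the arguments of strcoll flips the sign of the result. */
static int is_antisymmetric(const char *s1, const char *s2)
{
    return sign(strcoll(s1, s2)) == -sign(strcoll(s2, s1));
}
```

A check such as is_antisymmetric(S1, S2), run against a prototype of streqcoll instead of strcoll, would be one way to answer the question empirically for a given locale.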
Ad 2): Do you need 2 functions, one for char * strings, and one for wide
strings here as well?
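For reference, this is roughly how I read the [.S1.] rule, as a sketch only. mbelemlen and wcelemlen are proposed functions, not existing glibc API, so the code below uses hypothetical stand-ins (mbelemlen_stub, wcelemlen_stub) that treat every character as its own collation element. That assumption holds in the "C" locale but not in locales with multi-character collating elements. The helpers elem_is_valid and elem_matches are likewise invented names.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* HYPOTHETICAL stand-in for the proposed mbelemlen(): length in bytes of
   the first element of s, assuming one character == one element. */
static size_t mbelemlen_stub(const char *s)
{
    int n;

    if (*s == '\0')
        return 0;
    n = mblen(s, strlen(s));
    return n > 0 ? (size_t) n : 1;   /* count an invalid byte as one element */
}

/* HYPOTHETICAL stand-in for the proposed wcelemlen(), same assumption. */
static size_t wcelemlen_stub(const wchar_t *ws)
{
    return *ws != L'\0' ? 1 : 0;
}

/* 1 if [.s1.] would be accepted, i.e. s1 is exactly one element. */
static int elem_is_valid(const char *s1)
{
    return *s1 != '\0' && mbelemlen_stub(s1) == strlen(s1);
}

/* 1 if [.s1.] matches the first element of s2, 0 if not,
   -1 if malloc fails.  The copy is freed, unlike a bare strndup(). */
static int elem_matches(const char *s1, const char *s2)
{
    size_t n = mbelemlen_stub(s2);
    char *head = malloc(n + 1);
    int r;

    if (head == NULL)
        return -1;
    memcpy(head, s2, n);
    head[n] = '\0';
    r = (strcoll(s1, head) == 0);
    free(head);
    return r;
}
```

With these stand-ins, "ss" is rejected as a collating symbol, which is what one would expect in the "C" locale. The wide-character variant is trivial under the one-character assumption; whether it stays trivial with real collation data is part of the question.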
Bruno
--
In memoriam Johanna Kirchner <http://en.wikipedia.org/wiki/Johanna_Kirchner>