bug#57507: Regular expression matching depends on locale encoding

From: Jean Abou Samra
Subject: bug#57507: Regular expression matching depends on locale encoding
Date: Mon, 5 Sep 2022 20:39:26 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.2.0

On 05/09/2022 at 09:48, Ludovic Courtès wrote:
> Hi Jean,
>
> Jean Abou Samra <jean@abou-samra.fr> writes:
>
>> Regular expressions do funky things with Unicode if a non-Unicode-aware
>> locale is set. Yet, they're purely string operations, so I don't think
>> it's expected that they depend on the locale encoding.
>
> This is the expected behavior: first because (ice-9 regex) is
> implemented in terms of the libc regex functions, as Dale put it (but that
> could be thought of as an implementation detail), and second because things
> such as character classes are necessarily locale-dependent (this has
> bitten us in the past, for instance with <https://bugs.gnu.org/35785>).
>
> I hope that makes sense.

OK, thanks, but in this case, it should be clearly stated as a limitation
in the (ice-9 regex) documentation IMHO. If you don't know what constraints
there are on the implementation, there is no reason to expect this. Would it
help if I submitted a patch for that?
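
For illustration, the locale dependence discussed above can be reproduced
directly with the libc regex functions. This is only a minimal sketch in C,
assuming glibc's POSIX regex implementation. The helper name
`matches_in_locale` is hypothetical (it is not part of Guile or libc), and any
UTF-8 locale name passed to it, such as `C.UTF-8`, is an assumption that may
not be installed on every system.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of Guile or libc): returns 1 if `text`
   matches the POSIX extended regex `pattern` under `locale`, 0 if it does
   not, and -1 if the locale is unavailable or the pattern does not compile.
   Note: it leaves the process-wide locale set to `locale`. */
int matches_in_locale(const char *locale, const char *pattern,
                      const char *text)
{
    assert(locale != NULL && pattern != NULL && text != NULL);

    /* The regex functions interpret the pattern and the text according to
       the current locale, so switch it before compiling. */
    if (setlocale(LC_ALL, locale) == NULL)
        return -1;

    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    int status = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return status == 0 ? 1 : 0;
}
```

Under these assumptions, `matches_in_locale("C", "^.$", "\xc3\xa9")` should
return 0, because the C locale sees the UTF-8 encoding of "é" as two separate
bytes, whereas the same call with a UTF-8 locale (if one is installed) should
see a single character and return 1.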
