bug#57507: Regular expression matching depends on locale encoding

From: Jean Abou Samra
Subject: bug#57507: Regular expression matching depends on locale encoding
Date: Thu, 17 Nov 2022 21:33:42 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.1

On 05/09/2022 at 21:24, Ludovic Courtès wrote:
> Yes, that’d be welcome.  I would not call it a constraint or limitation;
> for example, that ‘w’ is not a letter in Swedish is the kind of thing
> you’d generally want to take into account.  Now, it’d be nice if one
> could easily specify the locale to operate under, with an API similar to
> that of (ice-9 i18n) and its first-class locale objects.

Sorry that it took me forever to send this.

From c666ca4f72dc0a00d28b8d7ef1221ebfc9741551 Mon Sep 17 00:00:00 2001
From: Jean Abou Samra <jean@abou-samra.fr>
Date: Thu, 17 Nov 2022 21:26:07 +0100
Subject: [PATCH] Doc: clarification on regexes and encodings

* doc/ref/api-regex.texi: Make it clearer that regexp matching supports
  only characters that can be represented in the locale encoding.
---
 doc/ref/api-regex.texi | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/doc/ref/api-regex.texi b/doc/ref/api-regex.texi
index b14c2b39c..bd1f4079d 100644
--- a/doc/ref/api-regex.texi
+++ b/doc/ref/api-regex.texi
@@ -57,7 +57,11 @@ locale's encoding, and then passed to the C library's regular expression
 routines (@pxref{Regular Expressions,,, libc, The GNU C Library
 Reference Manual}).  The returned match structures always point to
 characters in the strings, not to individual bytes, even in the case of
-multi-byte encodings.
+multi-byte encodings.  This ensures that the match structures are
+correct when performing matching with characters that have a multi-byte
+representation in the locale encoding.  Note, however, that using
+characters which cannot be represented in the locale encoding can lead
+to surprising results.

 @deffn {Scheme Procedure} string-match pattern str [start]
 Compile the string @var{pattern} into a regular expression and compare
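
To make the distinction in the patched paragraph concrete, here is a minimal sketch in Python (not Guile code). It shows why a matcher that works on encoded bytes reports byte offsets that have to be translated back to character indices, and how a lossy conversion could produce surprising matches. The helper name `byte_offset_to_char_index` is hypothetical, and the substitution of `?` for unrepresentable characters is an assumption made for illustration, not a statement about what Guile does internally.

```python
import re


def byte_offset_to_char_index(encoded: bytes, byte_offset: int,
                              encoding: str = "utf-8") -> int:
    """Translate a byte offset in an encoded string into a character index.

    Hypothetical helper: decoding the prefix counts whole characters. It
    assumes the offset falls on a character boundary.
    """
    return len(encoded[:byte_offset].decode(encoding))


text = "éb"                      # 'é' occupies two bytes in UTF-8
encoded = text.encode("utf-8")   # b'\xc3\xa9b'

# A byte-oriented matcher (standing in for the C library routines)
# reports where the match starts in bytes, not in characters.
byte_start = re.search(rb"b", encoded).start()
char_start = byte_offset_to_char_index(encoded, byte_start)

# Assumption for illustration: characters that the target encoding cannot
# represent are replaced by '?', so distinct characters become identical.
lossy_e = "é".encode("ascii", errors="replace")
lossy_u = "ü".encode("ascii", errors="replace")

print(byte_start, char_start)    # byte offset vs. character index
print(lossy_e == lossy_u)        # both collapsed to the same byte string
```

Under these assumptions, the match on `b` starts at byte 2 but at character 1, and `é` and `ü` become indistinguishable once converted to ASCII, which is the kind of surprising result the added documentation sentence warns about.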

