bug-guix

bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’


From: Ludovic Courtès
Subject: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
Date: Mon, 20 May 2019 11:14:04 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)

Hi!

So the crux of the problem is that Guile’s ‘string->uri’ procedure
behaves incorrectly under the sv_SE.utf8 locale:

--8<---------------cut here---------------start------------->8---
$ export GUIX_LOCPATH=$(guix build glibc-locales)/lib/locale
$ LANGUAGE= LC_ALL=sv_SE.utf8 ./pre-inst-env guile
GNU Guile 2.2.4
Copyright (C) 1995-2017 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> ,use(web uri)
scheme@(guile-user)> (string->uri "ftp://sourceware.org/pub/libffi/libffi-3.2.1.tar.gz")
$1 = #f
--8<---------------cut here---------------end--------------->8---

More specifically, ‘parse-authority’ is failing under that locale,
because of the “w”:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ((@@ (web uri) parse-authority) "//sourceware.org" (const 'fail))
$5 = fail
scheme@(guile-user)> ((@@ (web uri) parse-authority) "//sourcevare.org" (const 'fail))
$6 = #f
$7 = "sourcevare.org"
$8 = #f
--8<---------------cut here---------------end--------------->8---

We can boil it down to this example:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use(ice-9 regex)
scheme@(guile-user)> (string-match "[a-z]" "a")
$10 = #("a" (0 . 1))
scheme@(guile-user)> (string-match "[a-z]" "w")
$11 = #f
--8<---------------cut here---------------end--------------->8---
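
For comparison, here is a sketch of the same match after switching the
process to the “C” locale, which is always available.  ‘string-match’
compiles the regexp anew on each call, so the ‘a-z’ range should then be
interpreted in plain ASCII order and “w” should match:

--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 regex))

;; Switch every locale category to "C" for the rest of the session.
(setlocale LC_ALL "C")

;; Under "C", the range is plain ASCII, so this is expected to match.
(display (if (string-match "[a-z]" "w") "match" "no match"))
(newline)
--8<---------------cut here---------------end--------------->8---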

In short, under the sv_SE.utf8 locale of glibc 2.28, “w” is not
considered part of the ‘a-z’ interval.

Indeed, ‘localedata/locales/sv_SE’ in glibc reads this:

  % The letter w is normally not present in the Swedish alphabet. It
  % exists in some names in Swedish and foreign words, but is accounted
  % for as a variant of 'v'.  Words and names with 'w' are in Swedish
  % ordered alphabetically among the words and names with 'v'. If two
  % words or names are only to be distinguished by 'v' or 'w', 'v' is
  % placed before 'w'.

Using the “lower” regexp class instead of “[a-z]” works:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (string-match "[[:lower:]]" "w")
$12 = #("w" (0 . 1))
--8<---------------cut here---------------end--------------->8---

However, it’s not clear to me whether the “lower” class is supposed to
be the same for all locales or if we’re just lucky:

  http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
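
One way to sidestep the question, as a sketch only: SRFI-14 character
sets work on code points and do not consult the locale, so an explicit
set of ASCII letters behaves the same everywhere.  The names
‘ascii-lower’ and ‘ascii-lower?’ below are hypothetical helpers; they
are not part of (web uri), and this is not necessarily how the fix
should look:

--8<---------------cut here---------------start------------->8---
(use-modules (srfi srfi-14))

;; Hypothetical helper: the 26 ASCII lowercase letters, listed
;; explicitly so that no locale-dependent range is involved.
(define ascii-lower
  (string->char-set "abcdefghijklmnopqrstuvwxyz"))

;; Hypothetical helper: true when STR is non-empty and consists only
;; of ASCII lowercase letters.
(define (ascii-lower? str)
  (and (not (string-null? str))
       (string-every ascii-lower str)))

(ascii-lower? "w")            ; true, whatever the locale
(ascii-lower? "sourceware")   ; true
(ascii-lower? "W")            ; false: uppercase is not in the set
--8<---------------cut here---------------end--------------->8---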

Thoughts?

The workaround until we’ve fixed it is to use another locale, though you
can still set “LC_MESSAGES=sv_SE.utf8” or “LANGUAGE=sv”.

Ludo’.
