bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
From: Ludovic Courtès
Subject: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
Date: Mon, 20 May 2019 11:14:04 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)

Hi!
So the crux of the problem is that Guile’s ‘string->uri’ procedure
behaves incorrectly under the ‘sv_SE.utf8’ locale:
--8<---------------cut here---------------start------------->8---
$ export GUIX_LOCPATH=$(guix build glibc-locales)/lib/locale
$ LANGUAGE= LC_ALL=sv_SE.utf8 ./pre-inst-env guile
GNU Guile 2.2.4
Copyright (C) 1995-2017 Free Software Foundation, Inc.
Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.
Enter `,help' for help.
scheme@(guile-user)> ,use(web uri)
scheme@(guile-user)> (string->uri "ftp://sourceware.org/pub/libffi/libffi-3.2.1.tar.gz")
$1 = #f
--8<---------------cut here---------------end--------------->8---
More specifically, ‘parse-authority’ is failing under that locale,
because of the “w”:
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ((@@ (web uri) parse-authority) "//sourceware.org" (const 'fail))
$5 = fail
scheme@(guile-user)> ((@@ (web uri) parse-authority) "//sourcevare.org" (const 'fail))
$6 = #f
$7 = "sourcevare.org"
$8 = #f
--8<---------------cut here---------------end--------------->8---
We can boil it down to this example:
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use(ice-9 regex)
scheme@(guile-user)> (string-match "[a-z]" "a")
$10 = #("a" (0 . 1))
scheme@(guile-user)> (string-match "[a-z]" "w")
$11 = #f
--8<---------------cut here---------------end--------------->8---
In short, under the sv_SE.utf8 locale of glibc 2.28, “w” is not
considered part of the ‘a-z’ interval.
Indeed, ‘localedata/locales/sv_SE’ in glibc reads this:
% The letter w is normally not present in the Swedish alphabet. It
% exists in some names in Swedish and foreign words, but is accounted
% for as a variant of 'v'. Words and names with 'w' are in Swedish
% ordered alphabetically among the words and names with 'v'. If two
% words or names are only to be distinguished by 'v' or % 'w', 'v' is
% placed before 'w'.
Using the “lower” regexp class instead of “[a-z]” works:
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (string-match "[[:lower:]]" "w")
$12 = #("w" (0 . 1))
--8<---------------cut here---------------end--------------->8---
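For comparison outside Guile, here is a minimal sketch that probes both patterns with grep in the C locale, where ‘a-z’ is the plain ASCII range. It assumes a POSIX grep is on PATH; the variable names are only for illustration. I make no claim about what the same commands print under sv_SE.utf8, since that depends on the regexp implementation grep uses.

```shell
# In the C locale, [a-z] is the plain ASCII range, so "w" is inside it.
range_hits=$(printf 'w\n' | LC_ALL=C grep -c '[a-z]')

# The named character class matches "w" as well.
class_hits=$(printf 'w\n' | LC_ALL=C grep -c '[[:lower:]]')

# Each count is the number of matching input lines (one line was fed in).
echo "range: $range_hits, class: $class_hits"
```

Replacing LC_ALL=C with another locale name is a quick way to see whether a given C library and tool treat the range differently.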
However, it’s not clear to me whether the “lower” class is supposed to
be the same for all locales or if we’re just lucky:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
Thoughts?
The workaround until we’ve fixed it is to use another locale, though you
can still set “LC_MESSAGES=sv_SE.utf8” or “LANGUAGE=sv”.
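
A minimal sketch of that workaround, assuming the en_US.utf8 locale is installed (that name is only an example of mine; any locale whose collation keeps “w” inside ‘a-z’ should do):

```shell
# LC_ALL overrides every other locale variable, so it must be unset
# for the per-category settings below to take effect.
unset LC_ALL

# Assumption: en_US.utf8 is available; it stands in for "another locale"
# and governs collation and character classification.
export LANG=en_US.utf8

# Only the message catalog is switched to Swedish.
export LC_MESSAGES=sv_SE.utf8

# Print the settings a program started from this shell would inherit.
echo "LANG=$LANG LC_MESSAGES=$LC_MESSAGES"
```

With these settings, programs started from that shell should still print Swedish messages while regexps are compiled under the other locale’s collation rules.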
Ludo’.