bug-guix
bug#54893: guix-daemon, locale, LANG, and unicode in git tag names


From: Attila Lendvai
Subject: bug#54893: guix-daemon, locale, LANG, and unicode in git tag names
Date: Wed, 13 Apr 2022 07:51:08 +0000

> * LANG should be set, because it is in #:leaked-env-vars (see
> guix/git-download.scm). I don't know whose LANG it is though
> -- the user's, or the daemon's?


if i add this to the gexp:

(simple-format (current-error-port)
               "LANG is '~A'~%"
               (getenv "LANG"))
(setenv "LANG" "en_US.utf8")
(setenv "GUIX_LOCPATH" "/run/current-system/locale")
(setlocale LC_ALL (getenv "LANG"))

i see:

LANG is ''
Backtrace:
           2 (primitive-load "/gnu/store/z4bis94jg0s0y0xj1xbmliv7xs8?")
In ice-9/eval.scm:
    619:8  1 (_ #f)
In unknown file:
           0 (setlocale 6 "en_US.utf8")

ERROR: In procedure setlocale:
In procedure setlocale: Invalid argument


> * GUIX_LOCPATH is not leaked.


it's the same if i add GUIX_LOCPATH to the #:leaked-env-vars and don't setenv 
it explicitly.


> * Even if it was, I don't think that /gnu/store/...glibc-locales
> would be accessible from the build container (though you could give
> it a try?).


i didn't check this specifically, but i'm afraid you are right, and this is why 
my kludge doesn't work.


> * So perhaps GUIX_LOCPATH needs to be set in the gexp in
> guix/git-download.scm, + some setlocale as done by
> gnu-build-system.


i don't understand why the setlocale call in gnu-build-system's install-locale 
works, but my setlocale kludge in git-download doesn't.

i even tried to add glibc-locales to the native-inputs of the package in 
question, but it didn't help.
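
One possible explanation for the difference, offered as an assumption rather than a confirmed diagnosis: if install-locale wraps its setlocale call in a catch for system-error and only prints a warning on failure, it would appear to work in the build container even when no locale data is available, whereas a bare setlocale call aborts the gexp. A minimal sketch of that pattern follows. The name try-setlocale is hypothetical and not part of Guix; setlocale, catch, strerror and system-error-errno are standard Guile procedures.

```scheme
;; Hypothetical helper (not part of Guix): try to install LOCALE for all
;; categories and report failure instead of aborting the build.
(define (try-setlocale locale)
  (catch 'system-error
    (lambda ()
      (setlocale LC_ALL locale)
      #t)
    (lambda args
      ;; ARGS is the full exception (key included), which is what
      ;; system-error-errno expects.
      (format (current-error-port)
              "warning: failed to install '~a' locale: ~a~%"
              locale (strerror (system-error-errno args)))
      #f)))

;; Usage inside the gexp, in place of the bare setlocale call:
;; (try-setlocale "en_US.utf8")
```

This only keeps the build going; when it returns #f the locale is still not installed, so file names would still be decoded with the default encoding.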


> * Long-term, it could be interesting to remove the
> ‘file name = string encoded in current locale's encoding’
> assumption from Guile.


i'm not sure why the wrong locale breaks file-system walking and deleting, 
though.

i assume that if every function in guile uses/assumes the same locale 
(character encoding), then conversions through the guile FFI should round-trip 
losslessly in both directions, no? and i think both ASCII and UTF-8 round-trip 
losslessly wrt C bytes <-> scheme string conversions. IOW, only the displaying 
of the chars should be broken, not file operations.

or am i wrong to assume this?

or maybe the character encoding algo used in guile's FFI silently emits actual 
question marks in place of bytes that are outside the valid range of the 
encoding used? if so, that's not a very defensive way of coding, and it's 
eating up hours of my life...
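
The substitution question can be checked in isolation. The sketch below uses the (ice-9 iconv) module with the 'substitute conversion strategy; it is an assumption that guile's file-name procedures apply the same strategy when they decode names in the current locale. The tag bytes and the helper name round-trip are made up for illustration.

```scheme
(use-modules (ice-9 iconv)       ; bytevector->string, string->bytevector
             (rnrs bytevectors)) ; bytevector=?

;; The UTF-8 bytes of the made-up tag name "v1-é" (é is the bytes 195 169).
(define tag-bytes #vu8(118 49 45 195 169))

;; Decode BV as ENCODING and encode it again, substituting anything that
;; cannot be converted instead of raising an error.
(define (round-trip bv encoding)
  (string->bytevector (bytevector->string bv encoding 'substitute)
                      encoding 'substitute))

;; #t when the bytes survive the round trip unchanged.
(define utf8-ok?  (bytevector=? tag-bytes (round-trip tag-bytes "UTF-8")))
(define ascii-ok? (bytevector=? tag-bytes (round-trip tag-bytes "ASCII")))
```

Under these assumptions utf8-ok? is true and ascii-ok? is false: the two non-ASCII bytes are replaced during decoding, so the re-encoded name no longer matches the bytes on disk. That would be enough to break file operations, and not just display, whenever the encoding in effect is ASCII.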

hrm... this is not relevant here, only a related thought: things can go wrong 
in the GEXP serialization, too, if the writing side and the reading side 
don't use the same character encoding. the locale should be set explicitly at 
the relevant entry points.

i'd appreciate it if someone could help me come up with at least a kludge, so 
that i could make progress until it's fixed properly.

thanks for your insights, Maxime,

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
If you never heal from what hurt you, you'll bleed on people who didn't cut you.
