From: Eli Zaretskii
Subject: bug#15803: default-file-name-coding-system: utf-8 better than latin-1 these days?
Date: Fri, 11 Sep 2020 15:24:14 +0300

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: rgm@gnu.org, 15803@debbugs.gnu.org
> Date: Fri, 11 Sep 2020 13:27:28 +0200
>
> make[1]: Entering directory '/home/larsi/src/emacs/fóo/test'
> ELC lisp/eshell/eshell-tests.elc
> foo2:
> "#(\"/home/larsi/src/emacs/fóo/test/lisp/eshell/eshell-tests.elcnjDFYY\" 0 65
> (charset iso-8859-1))"
> >>Error occurred processing lisp/eshell/eshell-tests.el: File is missing
> >>(("Doing chmod" "No such file or directory"
> >>"/home/larsi/src/emacs/f\303\263o/test/lisp/eshell/eshell-tests.elcnjDFYY"))
> make[1]: *** [Makefile:165: lisp/eshell/eshell-tests.elc] Error 1
>
> So it's created a tempfile, tagged with the correct charset (I had no
> idea that that's how it worked), but decoded, and then set-file-modes
> interprets that as an UTF-8 file name.
>
> So... it's a bug in set-file-modes? Hm, nope, write-region has the
> same problem.

There be dragons ;-)

The difficulty in debugging these problems is that what you see is
not always what's there, due to display and decoding/encoding
operations by both Emacs and the display software you have on your
system (which drives the terminal).

In particular, strings inside Emacs are always in a UTF-8-compatible
encoding, so the fact that you get UTF-8 in *Messages* doesn't prove
anything.  What we need is to find 2 types of possible problems:

  . raw bytes from Latin-1 encoding inside Emacs buffers or strings
    that are supposed to be decoded
  . UTF-8 encoded (instead of Latin-1 encoded) characters passed to
    libc functions

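As a rough illustration of the second kind of problem, here is a minimal sketch of a check one could apply to the plain byte string that ends up being passed to a libc function. The helper name looks_like_utf8 is hypothetical and not part of Emacs, the check is deliberately simplified, and it says nothing about how Emacs represents strings internally. On a Latin-1 file system, a file name containing ó that passes this check has most likely not been encoded to Latin-1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the LEN bytes at S are structurally well-formed UTF-8.
   Hypothetical helper for illustration only.  Simplified: it checks
   lead-byte ranges and continuation bytes, but not every overlong or
   surrogate case.  */
static bool
looks_like_utf8 (const unsigned char *s, size_t len)
{
  assert (s != NULL);
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = s[i];
      size_t need;                /* continuation bytes expected */
      if (c < 0x80)
        need = 0;
      else if (c >= 0xC2 && c <= 0xDF)
        need = 1;
      else if (c >= 0xE0 && c <= 0xEF)
        need = 2;
      else if (c >= 0xF0 && c <= 0xF4)
        need = 3;
      else
        return false;             /* stray continuation or invalid lead */
      if (need > len - i - 1)
        return false;             /* sequence truncated */
      for (size_t k = 1; k <= need; k++)
        if ((s[i + k] & 0xC0) != 0x80)
          return false;           /* not a continuation byte */
      i += need + 1;
    }
  return true;
}
```

With the directory name from this report, the UTF-8 bytes 66 C3 B3 6F pass the check, while the Latin-1 bytes 66 F3 6F do not, because 0xF3 would have to be followed by three continuation bytes.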
So if you found that the problem reveals itself in set-file-modes,
let's see what happens there. The relevant code is this:

  char *fname = SSDATA (ENCODE_FILE (absname));
  mode_t imode = XFIXNUM (mode) & 07777;
  if (fchmodat (AT_FDCWD, fname, imode, nofollow) != 0)
    report_file_error ("Doing chmod", absname);

Please either run this under GDB, or add printf's, to show the byte
sequences of 'absname' and of 'fname'. The former should be in UTF-8
(so you should see 0xC3 and 0xB3 for the ó character), the latter
should be in Latin-1 (so you should see 0xF3 for the same letter).
This should give us some hints wrt where to look for the cause of the
problem.
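
For the printf route, something along these lines might do. This is only a sketch: hex_bytes is a hypothetical helper, not an existing Emacs function, and it assumes a NUL-terminated C string such as 'fname' above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the bytes of the NUL-terminated string S into OUT (capacity
   OUTSIZE) as space-separated uppercase hex, and return OUT.
   Hypothetical debugging helper, not part of Emacs.  Output is
   silently truncated if OUT is too small.  */
static char *
hex_bytes (const char *s, char *out, size_t outsize)
{
  assert (s != NULL && out != NULL && outsize > 0);
  size_t len = strlen (s), pos = 0;
  out[0] = '\0';
  /* Each byte needs at most 3 characters plus the terminating NUL.  */
  for (size_t i = 0; i < len && pos + 4 <= outsize; i++)
    pos += (size_t) snprintf (out + pos, outsize - pos, "%s%02X",
                              i ? " " : "", (unsigned char) s[i]);
  return out;
}
```

Passing the result to fprintf (stderr, ...) for both strings avoids any re-decoding by Emacs or the terminal: a UTF-8 "fóo" comes out as 66 C3 B3 6F, a Latin-1 one as 66 F3 6F.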
> That weird file name (decoded and tagged with a charset text parameter)
> comes from make-temp-file -- everything seems to be OK before that.
> target-file is:
>
> foo: "\"/home/larsi/src/emacs/f\\363o/test/lisp/eshell/eshell-tests.elc\""
>
> which seems to be correct,
Where does the "foo:" printout come from?  I wouldn't expect to see
Latin-1 encoded strings inside Emacs, not normally anyway.
> but
>
> (tempfile
> (make-temp-file (expand-file-name target-file)))
>
> is
>
> "#(\"/home/larsi/src/emacs/fóo/test/lisp/eshell/eshell-tests.elcnjDFYY\" 0 65
> (charset iso-8859-1))"
I see nothing wrong here: this is how decoding works in Emacs. And
again, how did you produce this string? As I explained above, the
details of how you display these strings matter in this case.
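
To make the byte-level relationship concrete, here is a minimal sketch of what decoding Latin-1 to UTF-8 does to the bytes. It is not the Emacs decoder, and latin1_to_utf8 is a hypothetical name used only for illustration. It shows why the on-disk byte \363 (0xF3) and the sequence \303\263 (0xC3 0xB3) seen in the error message are two representations of the same ó.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode the INLEN Latin-1 bytes at IN into UTF-8 in OUT (capacity
   OUTSIZE), NUL-terminate OUT, and return the number of bytes written
   (excluding the NUL).  Hypothetical illustration, not Emacs code.
   Stops early if OUT is too small.  */
static size_t
latin1_to_utf8 (const unsigned char *in, size_t inlen,
                unsigned char *out, size_t outsize)
{
  assert (in != NULL && out != NULL && outsize > 0);
  size_t pos = 0;
  /* Each input byte yields at most 2 output bytes, plus the final NUL.  */
  for (size_t i = 0; i < inlen && pos + 3 <= outsize; i++)
    {
      if (in[i] < 0x80)
        out[pos++] = in[i];       /* ASCII is unchanged */
      else
        {
          /* U+0080..U+00FF become a two-byte UTF-8 sequence.  */
          out[pos++] = (unsigned char) (0xC0 | (in[i] >> 6));
          out[pos++] = (unsigned char) (0x80 | (in[i] & 0x3F));
        }
    }
  out[pos] = '\0';
  return pos;
}
```

For the three Latin-1 bytes 66 F3 6F this produces the four UTF-8 bytes 66 C3 B3 6F.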