bug-gnu-emacs

bug#58281: 27.1; windows mangles encoding on command line


From: Eli Zaretskii
Subject: bug#58281: 27.1; windows mangles encoding on command line
Date: Wed, 12 Oct 2022 19:35:49 +0300

> From: Daniel Bastos <dbastos@id.uff.br>
> Date: Wed, 12 Oct 2022 08:49:32 -0300
> Cc: 58281@debbugs.gnu.org
> 
> > I think you said at some point that using non-ASCII commit log
> > messages from a shell outside of Emacs did succeed?  If so, can you
> 
> Not from a shell but from a regular GNU EMACS buffer.  I then showed
> an ESHELL session where I don't specify the commit message on the
> command-line and then emacsclientw was invoked.  In the buffer that
> opened, I typed a UTF-8 encoded message and that was not mangled.
> 
> --8<---------------cut here---------------start------------->8---
> However, if instead of the command-line, I use a regular GNU EMACS
> buffer, it works just fine.
> 
> %echo kkk >> encoding.txt
> 
> %fs commit
> Pull from https://mer@somewhere.edu/test
> Round-trips: 1   Artifacts sent: 0  received: 0
> Pull done, wire bytes sent: 437  received: 2118  ip: 5.161.138.46
> emacsclientw ./ci-comment-A2803F45F10B.txt
> Waiting for Emacs...
> Pull from https://mer@somewhere.edu/test
> Round-trips: 1   Artifacts sent: 0  received: 0
> Pull done, wire bytes sent: 441  received: 2118  ip: 5.161.138.46
> New_Version: 09ea1b5d5b8d776d61a74bb412cd58bd8b6f82323c2f539a1eb0d915f7026f20
> Sync with https://mer@somewhere.edu/test
> Round-trips: 1   Artifacts sent: 2  received: 0
> Sync done, wire bytes sent: 2496  received: 309  ip: 5.161.138.46
> 
> %fs timeline
> === 2022-10-01 ===
> 14:09:39 [09ea1b5d5b] *CURRENT* Naiveté. (user: mer tags: trunk)
> --8<---------------cut here---------------end--------------->8---

I don't understand what that means, sorry.  There's a lot of stuff
that isn't relevant to the issue at hand (and I'm not familiar with
fossil, so its detailed output makes no difference to me).  But
there's no description of what you did in plain English, which I could
read and understand.

I'm guessing that emacsclientw was invoked to edit a file with the
commit log message, and the commit command then used that edited
file.  If that is true, then it's no wonder this works: the problem
you experience only happens if the commit log message is passed to
fossil through the command-line arguments, not through a disk file.

> > describe how you do that, i.e. which shell do you use and how you type
> > 'Naiveté' from the shell?  Also, what does the command "chcp" report
> > in that shell, if you invoke it with no arguments?
> 
> I had not tested with a different shell.  I'm testing it with cmd.exe
> below.  The encoding is not mangled, but I don't know which encoding
> is applied there because I have no idea how cmd.exe works.  The
> command chcp reports code page 850.

If chcp says codepage 850, then cmd.exe uses that codepage to encode
the command line.  And my reading of the fossil source code is that it
converts the command-line arguments from the codepage encoding to
UTF-8 internally.
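
To make that conversion concrete, here is a minimal Python sketch of
what such a step amounts to.  It is not fossil's actual code: the
function name codepage_args_to_utf8 is hypothetical, and the sketch
simply assumes the argument arrives as cp850-encoded bytes, matching
the codepage that chcp reported.

```python
def codepage_args_to_utf8(raw: bytes, codepage: str = "cp850") -> bytes:
    """Re-encode a codepage-encoded argument as UTF-8.

    Hypothetical helper for illustration only; it assumes the bytes
    really are in the given codepage.
    """
    return raw.decode(codepage).encode("utf-8")


# Assumption: this is how a codepage-850 shell would encode the message.
raw = "Naiveté".encode("cp850")

print(raw.hex(" "))                         # single byte 82 for the e-acute
print(codepage_args_to_utf8(raw).hex(" "))  # two bytes c3 a9 for the e-acute
```

If the bytes on the command line are not actually in the codepage the
receiving program assumes, the decode step picks the wrong characters,
which is the kind of mismatch that produces a mangled log message.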

> 
> --8<---------------cut here---------------start------------->8---
> c:\my\path>chcp
> Active code page: 850
> 
> c:\my\path>fossil commit -m 'Naiveté'
> Pull from https://mer@somewhere.edu/mer
> Round-trips: 1   Artifacts sent: 0  received: 0
> Pull done, wire bytes sent: 438  received: 3250  ip: 5.161.138.46
> New_Version: 8cce649b5236e507e84ce8114ab273e3b9ea246dd00e42484b47ab86517cf028
> Sync with https://mer@somewhere.edu/mer
> Round-trips: 1   Artifacts sent: 2  received: 0
> Sync done, wire bytes sent: 3615  received: 307  ip: 5.161.138.46
> 
> c:\my\path>fossil timeline -n 1
> === 2022-10-12 ===
> 11:31:30 [8cce649b52] *CURRENT* 'Naiveté' (user: mer tags: trunk)
> --- entry limit (1) reached ---
> 
> c:\my\path>
> --8<---------------cut here---------------end--------------->8---

So now the question is why Eshell doesn't use the cp850 encoding when
you tell it?  What happens if you say

  C-x RET f cp850 RET

in the Eshell buffer before invoking the commit command?

> However, there is some evidence that UTF-8 is the encoding used by
> cmd.exe.  I committed again with the message "água aaaaa".
> 
> --8<---------------cut here---------------start------------->8---
> c:\my\path>fossil timeline -n 1
> === 2022-10-12 ===
> 11:38:30 [148c174ad3] *CURRENT* água aaaaa (user: mer tags: trunk)
> --- entry limit (1) reached ---
> --8<---------------cut here---------------end--------------->8---
> 
> I know "á" encodes to the two-byte c3 a1 in UTF-8.  Asking /od/ to
> show me the byte sequence, I see the c3 a1 in there.  First notice the
> position of the two-byte sequence of interest --- it's in line 0000060
> at the 4th column.
> 
> --8<---------------cut here---------------start------------->8---
> c:\my\path>fossil timeline -n 1 | od -t c
> 0000000   =   =   =       2   0   2   2   -   1   0   -   1   2       =
> 0000020   =   =  \n   1   1   :   3   8   :   3   0       [   1   4   8
> 0000040   c   1   7   4   a   d   3   ]       *   C   U   R   R   E   N
> 0000060   T   *       Ã   ¡   g   u   a       a   a   a   a   a       (
> [...]
> --8<---------------cut here---------------end--------------->8---
> 
> If we look at which bytes are there, we find c3 a1.  I do not
> understand this: I have no idea why my cmd.exe is UTF-8 encoding
> anything.

It doesn't.  What you see is the result of fossil's internal
conversion to UTF-8, not what cmd.exe passed to fossil.
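
The following Python sketch reproduces the byte values seen in the od
output under that explanation.  It assumes the shell passed
codepage-850 bytes and that the receiving program re-encoded them as
UTF-8; the variable names are illustrative and none of this is taken
from fossil's code.

```python
typed = "água aaaaa"

# Assumption: a codepage-850 shell would pass these bytes.
from_shell = typed.encode("cp850")

# Assumption: the receiving program converts codepage bytes to UTF-8.
stored = from_shell.decode("cp850").encode("utf-8")

print(from_shell[:1].hex())   # a0: the a-acute as one cp850 byte
print(stored[:2].hex())       # c3a1: the same character in UTF-8

# od -t c printed the two UTF-8 bytes as separate single-byte
# characters; read as Latin-1 they are "Ã" and "¡".
print(stored[:2].decode("latin-1"))
```

So the c3 a1 pair in the timeline output is consistent with a cp850
command line that was converted afterwards; it does not show that the
shell itself produced UTF-8.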




