--- Begin Message ---
Subject: emacs-mule/utf-8 difference
Date: Thu, 01 Mar 2012 16:39:57 +0100
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/20111109 Thunderbird/3.1.16
Hi,
I have a problem regarding coding systems:
I'm using process-send-string to send substrings of a buffer through a
socket, after setting the process's encoding and decoding coding systems
to emacs-mule.
I expect the number of bytes written to match the byte length of the
substring as computed with position-bytes, because position-bytes is
supposed to always count bytes in the emacs-mule encoding. From
emacs-devel:
"The byte sequence of a buffer after decoded is always in emacs-mule (in
emacs-unicode-2 branch, it's utf-8). So, changing
buffer-file-coding-system or any other coding-system-related variables
doesn't affects position-bytes."
However, this is not the case for characters that are 3 bytes long in
UTF-8: position-bytes counts them as 3 bytes, but process-send-string
writes 4 bytes.
Setting the socket's process coding systems to utf-8 solves the
problem, but I don't think that would work for other coding systems,
even if I used buffer-file-coding-system instead, since position-bytes
does not take it into account.
What is the actual expected behavior here, and how can I make this work
correctly?
Regards,
Tiphaine Turpin
--- End Message ---
--- Begin Message ---
Subject: Re: bug#10919: emacs-mule/utf-8 difference
Date: Thu, 01 Mar 2012 19:54:48 +0200
> Date: Thu, 01 Mar 2012 16:39:57 +0100
> From: Tiphaine Turpin <address@hidden>
>
> From emacs-devel:
>
> "The byte sequence of a buffer after decoded is always in emacs-mule (in
> emacs-unicode-2 branch, it's utf-8).
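This is very old info. The emacs-unicode-2 branch was merged with the
mainline when Emacs 23.1 was released, and since then the internal
representation of buffer text has been based on UTF-8, not emacs-mule.
That is why position-bytes counts such characters as 3 bytes.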
> So, changing
> buffer-file-coding-system or any other coding-system-related variables
> doesn't affects position-bytes."
>
> However, this is not the case for characters that are 3 bytes long in
> UTF-8: position-bytes counts them as 3 bytes, but process-send-string
> writes 4 bytes.
process-send-string _encodes_ the string; it does not send the
internal representation of the string in the buffer. Using
process-send-string is like writing the string to a disk file: Emacs
encodes it before sending or writing.
Therefore, buffer-file-coding-system _does_ affect what is being sent.
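A minimal sketch of how the byte count could be obtained instead, assuming Emacs 23 or later: measure the text after encoding it with the process's own output coding system, which is what process-send-string applies, rather than relying on position-bytes. The names my-encoded-byte-count and my-send-region-counting-bytes are hypothetical helpers for illustration, not part of Emacs.

```elisp
;; Sketch (assumes Emacs 23+): the bytes written by `process-send-string'
;; are the text encoded with the process's output coding system, so that
;; is what must be measured, not the internal size seen by `position-bytes'.
;; Both function names below are hypothetical helpers.

(defun my-encoded-byte-count (string coding-system)
  "Return how many bytes STRING occupies after encoding with CODING-SYSTEM."
  ;; `encode-coding-string' returns a unibyte string, so its `length'
  ;; is its size in bytes.
  (length (encode-coding-string string coding-system)))

(defun my-send-region-counting-bytes (proc start end)
  "Send the text between START and END to PROC; return the bytes sent.
The count uses PROC's output coding system, the same one that
`process-send-string' uses to encode the text."
  (let* ((text (buffer-substring-no-properties start end))
         ;; `process-coding-system' returns (DECODING . ENCODING).
         (coding (cdr (process-coding-system proc)))
         (nbytes (my-encoded-byte-count text coding)))
    (process-send-string proc text)
    nbytes))
```

To see the mismatch from the original report, compare (string-bytes text), which in a multibyte buffer equals the position-bytes difference between START and END, with (my-encoded-byte-count text 'emacs-mule). For "\u20ac" the first is 3; the second depends on how emacs-mule represents that character.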
I'm closing this non-bug.
--- End Message ---