guile-user

Re: guile can't find a chinese named file


From: David Kastrup
Subject: Re: guile can't find a chinese named file
Date: Thu, 16 Feb 2017 12:49:23 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (gnu/linux)

Marko Rauhamaa <address@hidden> writes:

> David Kastrup <address@hidden>:
>> It's still irrelevant since split does not _use_ the existing file name
>> for constructing new file names.
>
> Split was just an example of a command that concatenates byte sequences
> to get pathnames, nothing more.
>
> Such concatenation is commonplace in Linux programs of all kinds.
>
> And the point of bringing concatenation into the discussion was that
> remapping byte sequences to byte sequences breaks concatenation
> additivity:
>
>    U(x) + U(y) = U(x + y)

But Emacs' implementation doesn't in any respect "break concatenation
additivity".

If you split an arbitrary byte stream (including material invalid as
UTF-8) at an arbitrary point (including in the middle of a UTF-8
character), decode the resulting pieces as UTF-8 (one of several
"reversible" encodings Emacs can interpret), concatenate the resulting
Emacs strings, and re-encode the result as UTF-8 (since you actually
need to provide a byte sequence to open(2) or similar), you will get
back the original byte stream.  No ifs and buts.
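
The same round trip can be tried outside Emacs.  What follows is only a
rough sketch, and an analogy rather than Emacs' own code: it assumes
Python 3 and uses Python's "surrogateescape" error handler as a stand-in
for Emacs' raw-byte characters.  The helper name and the sample file name
are made up for illustration.

```python
def split_decode_join_encode(data: bytes, cut: int) -> bytes:
    """Split at `cut`, decode both halves, join the strings, re-encode."""
    # surrogateescape maps each undecodable byte to a lone surrogate,
    # which plays the role of Emacs' raw-byte characters here.
    left = data[:cut].decode("utf-8", errors="surrogateescape")
    right = data[cut:].decode("utf-8", errors="surrogateescape")
    return (left + right).encode("utf-8", errors="surrogateescape")


# Hypothetical file name: valid Chinese text followed by bytes that are
# invalid as UTF-8.
name = "中文.txt".encode("utf-8") + b"\xff\xc2"

# Try every cut position, including ones inside a multi-byte character.
survives = all(split_decode_join_encode(name, cut) == name
               for cut in range(len(name) + 1))
print(survives)  # prints True
```

Every cut position is checked, so the cases that fall in the middle of a
character are covered along with the ordinary ones.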

The _decoded_ concatenated string might differ from decoding the unsplit
byte string: it might contain "byte 0xc2, byte 0x80" (represented as
0xc1 0x82 0xc0 0x80) at the concatenation point rather than "character
0x80" (represented as 0xc2 0x80).  But the moment you use this
concatenation of half-sequences as a file name, it gets reencoded into
the bytes 0xc2 and 0x80 and works just fine.
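
For what it is worth, the difference between the two decoded strings can
be seen in a small sketch.  Again this assumes Python 3 with
"surrogateescape" as an analogy: Python marks a raw byte with a lone
surrogate (U+DCC2, U+DC80) where Emacs uses the internal sequences quoted
above, so only the behaviour carries over, not the representation.

```python
# Keyword arguments shared by every decode/encode call below.
ESC = {"encoding": "utf-8", "errors": "surrogateescape"}

# Decoding the unsplit bytes yields the single character U+0080.
whole = b"\xc2\x80".decode(**ESC)

# Decoding the two halves separately yields two escaped raw bytes.
pieces = b"\xc2".decode(**ESC) + b"\x80".decode(**ESC)

print(whole == pieces)                # prints False
print(len(whole), len(pieces))        # prints 1 2

# Re-encoding either string gives the same two bytes, which is all that
# matters once the result is used as a file name.
print(whole.encode(**ESC) == pieces.encode(**ESC) == b"\xc2\x80")  # prints True
```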

-- 
David Kastrup


