emacs-devel

Re: Multibyte and unibyte file names


From: Eli Zaretskii
Subject: Re: Multibyte and unibyte file names
Date: Fri, 25 Jan 2013 22:31:19 +0200

> From: Stefan Monnier <address@hidden>
> Cc: address@hidden,  address@hidden,  address@hidden
> Date: Fri, 25 Jan 2013 06:36:39 -0500
> 
> >> That the callers get to see meaningful (decoded) names?
> >> That file-name manipulation functions don't have the side effect of
> >> encoding/decoding file names?
> > If we decode unibyte file names at entry to each primitive, before
> > doing anything else, and thereafter manipulate decoded multibyte
> > strings, this will happen anyway.
> 
> I get the impression that we're not talking about the same thing.

Looks like that.

> If you only decode on entry, then Elisp code will first see encoded file
> names returned by directory-files and will then see them converted to
> decoded form after passing the result to a file-name
> manipulation function.

No.  Elisp code will see _decoded_ file names from directory-files,
because we already decode them.  I didn't mean to change that.

What I meant was to return decoded file names from all file-name
primitives, such as file-name-nondirectory, even if their input was
encoded.

> Which is why I suggest to decode right away in the functions that return
> file names (e.g. directory-files).

We already do that, so there's no issue in that department.

The issue is in the file-name primitives that want to support both
encoded and decoded file names, and as I understand from this
discussion, this feature should stay.

> > But since everybody (at least those who spoke) seems to think this is a
> > w32 only problem, I will solve it for w32 only.
> 
> I think the specific problems you mentioned are mostly non-issues under
> POSIX, but the general problem of deciding which representation to use
> is more general.

I thought this was already decided in favor of decoded file names,
a.k.a. "multibyte strings".  The few calls that pass encoded file
names are rare exceptions, but since we want to keep support for
encoded file names, fixing those few places is not going to buy us
anything except code reshuffling.

The problem with encoded file names is that we have little support for
them.  E.g., we cannot up-/down-case them (except if we know the
encoding is supported by the current locale).  For multibyte encodings
that are not UTF-8, we also cannot scan them by characters, only by
bytes, so e.g. strchr will not generally work reliably.  We are
crippled.

So some things will never work with encoded file names, but I guess no
one cares, because most of those problems go away if the encoding is
UTF-8.  Fine; if no one cares, neither do I.


