From: Dave Love
Subject: non-ASCII TAGS
Date: 02 Apr 2003 18:34:53 +0100
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

Probably the worst problem with using non-ASCII programming
identifiers is etags.  It isn't aware of encoding issues and fixing
the issues is non-trivial, so this is mainly raising a flag and hoping
someone can work on it.  I think sorting it out requires not only
extending the TAGS format, but probably also generating it with Emacs.
I don't have time to work on this, but here's the problem and some

In general, different files in TAGS could have different encodings --
as is actually the case for Emacs, though its tags are all ASCII -- and
file names could be encoded inappropriately for the locale in which
the TAGS file is used.
I think it's reasonable to assume that the locale in which it's used
is the same as the one in which it's generated, i.e. the file names
are always in `file-name-coding-system', though it wouldn't harm to
record the locale information and act on it.  However, the file
content encodings may well be different from the locale coding system
which determines the encoding of their names, e.g. utf-8 code
processed in a Latin-N locale.
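
To illustrate the mismatch, here is a minimal Python sketch (not
anything from etags or Emacs; the identifier "größe" is a made-up
example).  It assumes a source file written in utf-8 whose bytes are
then decoded with a Latin-1 locale coding system, and shows that a
search for the identifier fails in that case.

```python
# A made-up non-ASCII identifier, as it would be stored in a utf-8 source file.
identifier = "größe"
raw = identifier.encode("utf-8")

# Decoding the bytes with the locale's coding system (assumed Latin-1 here)
# yields mojibake; decoding with the file's real coding system recovers it.
as_latin1 = raw.decode("latin-1")
as_utf8 = raw.decode("utf-8")

# A search for the identifier only succeeds with the right coding system.
found_with_latin1 = identifier in as_latin1
found_with_utf8 = identifier in as_utf8

print(found_with_latin1, found_with_utf8)
```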

Thus it's a question of labelling the section for each file with a
coding system corresponding to how Emacs would read the source file
(accounting for coding cookies &c).  This can all be decoded
appropriately with a bit of effort, and searches in the result should
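

The labelling idea might be sketched in Python as follows.  The
section header layout used here (name,size,coding) is purely
hypothetical: the existing TAGS header has only the file name and
size, and the names FILE_NAME_CODING, make_section and decode_sections
are invented for the example.  File names are decoded with one fixed
coding system standing in for `file-name-coding-system', while each
section's tag lines are decoded with the coding system from its label.

```python
# Stands in for `file-name-coding-system' (an assumption for this sketch).
FILE_NAME_CODING = "latin-1"


def make_section(name, coding, lines):
    """Build one section with a hypothetical 'name,size,coding' header."""
    body = "".join(line + "\n" for line in lines).encode(coding)
    header = b",".join([
        name.encode(FILE_NAME_CODING),
        str(len(body)).encode("ascii"),
        coding.encode("ascii"),
    ])
    return b"\x0c\n" + header + b"\n" + body


def decode_sections(data):
    """Return a list of (file_name, coding, tag_lines), decoded per section.

    This sketch splits on the form-feed section marker and ignores the
    size field; a careful reader would use the size instead.
    """
    sections = []
    for chunk in data.split(b"\x0c\n")[1:]:
        header, _, body = chunk.partition(b"\n")
        # Split from the right so commas in the file name survive.
        name, _size, coding = header.rsplit(b",", 2)
        coding = coding.decode("ascii")
        lines = [l for l in body.decode(coding).split("\n") if l]
        sections.append((name.decode(FILE_NAME_CODING), coding, lines))
    return sections


# Two source files with different content encodings, same identifier.
data = (
    make_section("größe.c", "utf-8", ["int größe;\x7f1,0"])
    + make_section("alt.c", "latin-1", ["int größe;\x7f3,20"])
)
sections = decode_sections(data)

# After per-section decoding, one search string matches in both sections.
hits = [name for name, _coding, lines in sections
        if any("größe" in line for line in lines)]
print(hits)
```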

I think the TAGS files have to be generated with Emacs, since making
etags.c multilingual doesn't seem realistic.  A Lisp version should be
efficient enough and it would have the advantage that tags, imenu and
font-lock might work from the same set of patterns.  It could be used
in makefiles by running Emacs in batch, obviously.  It will be a
significant amount of work, though, and I guess dropping etags.c isn't
reasonable, so two programs would have to be maintained in parallel.
