Re: coding tags and utf-16

emacs-devel

From:	Kevin Rodgers
Subject:	Re: coding tags and utf-16
Date:	Wed, 08 Feb 2006 17:32:02 -0700
User-agent:	Mozilla Thunderbird 0.9 (X11/20041105)

David Kastrup wrote:

Kenichi Handa <address@hidden> writes:

In article <address@hidden>, Stefan Monnier <address@hidden> writes:

So, in any case, a tag value itself is useless.  Then how
to detect utf-16 more reliably?  In the current Emacs
(i.e. Ver.22), I think we can use auto-coding-regexp-alist
or auto-coding-alist.  In the former case, we can register
BOM patterns and also something like "\\`\\(\0[\0-\177]\\)+"
for utf-16be.  In the latter case, you can use more
complicated heuristics in a registered function.
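
As a rough illustration of the BOM-less pattern quoted above, it can be tried against literal byte strings with `string-match'. This is only a sketch: the variable name `my-utf-16be-ascii-regexp' is made up for this example and is not part of Emacs.

```elisp
;; Hypothetical name, introduced only for this illustration.
(defvar my-utf-16be-ascii-regexp "\\`\\(\0[\0-\177]\\)+"
  "Match text that starts with NUL + ASCII byte pairs (BOM-less utf-16be).")

;; "ab" encoded as utf-16be without a BOM is the bytes 00 61 00 62.
(string-match my-utf-16be-ascii-regexp "\0a\0b")   ; non-nil: looks like utf-16be

;; Plain ASCII "ab" does not start with a NUL byte.
(string-match my-utf-16be-ascii-regexp "ab")       ; nil
```

Because the `+' is already satisfied by a single pair, any data that begins with a NUL byte followed by an ASCII byte matches, so a binary file could match as well; a registered function could apply a stricter test.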

Can't it be somehow added to detect_coding_utf_16?


Yes, but usually it has no effect if, for instance,
iso-8859-1 is more preferred.  If only ASCII and Latin-1
characters are encoded in utf-16, all bytes (including BOM)
are valid for iso-8859-1.


I thought we had discussed this already.  The BOM-encodings should
have priority since the likelihood of a misdetection is negligible
(the character pair does not make sense at the start of a text in
latin-1 in any language): the only thing that can reasonably be
expected to happen is that a binary file is detected as utf-16.  Not
much of an issue, I'd say.

Exactly.  So why haven't these entries been added to auto-coding-regexp-alist?


("\\`\xEF\xBB\xBF" . utf-8)
("\\`\xFE\xFF" . utf-16-be)
("\\`\xFF\xFE" . utf-16-le)
("\\`\x00\x00\xFE\xFF" . utf-32-be)
("\\`\xFF\xFE\x00\x00" . utf-32-le)
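
For anyone who wants to try entries like these locally, a minimal init-file sketch might look like the following. It assumes the coding system names from the list above exist in the running Emacs, which is why each one is checked with `coding-system-p' first; the utf-32 entries are left out because I am not sure those coding systems are defined.

```elisp
;; Sketch for an init file, not a statement of how Emacs should ship it.
;; The names are the ones used in the message above; they may differ
;; between Emacs versions, hence the `coding-system-p' guard.
(dolist (entry '(("\\`\xEF\xBB\xBF" . utf-8)
                 ("\\`\xFE\xFF" . utf-16-be)
                 ("\\`\xFF\xFE" . utf-16-le)))
  (when (coding-system-p (cdr entry))
    ;; `add-to-list' skips entries that are already present.
    (add-to-list 'auto-coding-regexp-alist entry)))
```

One thing to watch if utf-32 entries are ever added: the utf-32-le signature FF FE 00 00 begins with the utf-16-le signature FF FE, so if the alist is searched in order and the first match wins (as I understand it), the utf-32-le entry would have to come before the utf-16-le one.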

Of course, for the BOM-less utf-16 encodings, priority should depend
on the language environment.


Definitely.
--
Kevin Rodgers
