Re: [Bug-AUCTeX] Re: 11.81; *output* buffer is set to UTF-8

From:	David Kastrup
Subject:	Re: [Bug-AUCTeX] Re: 11.81; *output* buffer is set to UTF-8
Date:	Tue, 30 Jan 2007 16:41:17 +0100
User-agent:	Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)

Stefan Pofahl <address@hidden> writes:

> David Kastrup <dak <at> gnu.org> writes:
>> 
>> > Ok, my problem is that I cannot edit German TeX-documents
>> > with the current XEmacs shipped with Debian (version Sarge).
>> 
>> There is no such thing as a "German TeX-document" from the view of an
>> editor.
>
> Ok, German as in umlauts "ü", "ä", "ö", and TeX-documents in the
> German language are "German TeX-documents"; you can also say
> "German-language TeX-documents with e.g. latin1-coding", and
> it would be nice if there were a chance that the LaTeX-mode would
> look for the correct coding of the TeX-file.

Umlauts, when entered via the keyboard, are interpreted according to
the keyboard locale.  Loaded and saved files are interpreted according
to the buffer file encoding which usually defaults to your system's
locale.  Many systems nowadays default to a utf-8 locale for most of
their operations, and thus they will, unless you configure them
differently, expect to work in such a locale.

XEmacs 21.4 out of the box is unable to deal with a utf-8 locale
(that's why you need (require 'un-define) for it), but it will likely
notice that your locale is not latin-1.  So it throws itself into some
semi-binary mode.  At least that's what I guess is the problem you are
trying to address: you have not described at all which actions lead to
which problems in your setup.

>> > XEmacs writes the buffer in the wrong coding-system into
>> > the file.
>> 
>> What do you call "wrong coding system"?
> A coding system that destroys my input, e.g. an "ä" should be handled
> like an "ä", and if emacs files, it is using the wrong
> coding-system.

Sorry, but that's just nonsense.  "ä" is a character.  XEmacs may pick
a different coding system from what you expected or used before, but
in that coding system, the character "ä" will still be represented
correctly if it exists there.  That holds if your problem occurs when
saving files.  If it occurs when loading files, the situation is
similar: no character "ä" is "destroyed", but rather the byte sequence
in the file is interpreted differently.
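To make the distinction between a character and its encoded bytes
concrete, here is a small Python sketch.  It has nothing to do with how
Emacs or XEmacs work internally; the variable names are made up for the
illustration.  It only shows that "ä" survives saving in either coding
system, and that a utf-8 file read as latin-1 is misinterpreted rather
than destroyed.

```python
# One character, two different byte representations.
char = "ä"
latin1_bytes = char.encode("latin-1")  # a single byte, 0xE4
utf8_bytes = char.encode("utf-8")      # two bytes, 0xC3 0xA4

# "Saving": either coding system represents the character correctly,
# as long as it is read back with the same coding system.
assert latin1_bytes.decode("latin-1") == char
assert utf8_bytes.decode("utf-8") == char

# "Loading" with the wrong assumption: the bytes are intact, but they
# are interpreted as two latin-1 characters instead of one "ä".
misread = utf8_bytes.decode("latin-1")
print(repr(misread))
```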

Generally, loading and saving a file should mostly preserve its
contents regardless of whether they have been interpreted wrongly.
Again, XEmacs has a much worse track record preserving byte sequences
in the wrong encoding than Emacs has.
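What "preserving contents even when interpreted wrongly" can look like
is sketched below in Python.  This is an analogy under my own
assumptions, not a description of what either editor does: an 8-bit
coding system such as latin-1 assigns a character to every byte, so a
wrong guess round-trips, whereas a decoder that replaces undecodable
bytes loses information.

```python
# A utf-8 file that gets loaded under the wrong (latin-1) assumption.
utf8_file = "Grüße".encode("utf-8")
wrongly_loaded = utf8_file.decode("latin-1")   # looks garbled on screen
saved_again = wrongly_loaded.encode("latin-1")

# The misinterpretation was harmless for the file itself.
assert saved_again == utf8_file

# The opposite mistake with a lossy decoder: latin-1 bytes read as
# utf-8, with invalid sequences replaced by U+FFFD.
latin1_file = "Grüße".encode("latin-1")
lossy = latin1_file.decode("utf-8", errors="replace")
lossy_saved = lossy.encode("utf-8")

# Here the original bytes cannot be recovered from what was saved.
assert lossy_saved != latin1_file
```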

>> > Is it possible, that the "emacs/xemacs/auctex" is looking for
>> > the argument of e.g. "inputenc":
>> > \usepackage[coding-system]{inputenc}
>> >
>> > and changes the file I/O coding-system in a proper way?
>> 
>> Emacs (the snapshot version that is to become Emacs 22) uses the
>> function latexenc-find-file-coding-system in order to figure out the
>> correct coding system to use for LaTeX files.
> Is this a part of auctex of this new emacs or is it an emacs feature
> for LaTeX-users?

It is not a part of AUCTeX, but of Emacs, and then only the developer
version of Emacs that is to become Emacs 22.
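The general idea of picking a coding system from the inputenc option
can be sketched in a few lines of Python.  This is only an
illustration and not the code of latexenc-find-file-coding-system; the
function name, the regular expression and the small mapping table are
assumptions made up for this example.

```python
import re

# Hypothetical mapping from a few inputenc options to Python codec names.
INPUTENC_TO_CODEC = {
    "latin1": "latin-1",
    "latin9": "iso8859-15",
    "utf8": "utf-8",
    "ansinew": "cp1252",
}

# Matches \usepackage[<option>]{inputenc} in the raw bytes of a file.
INPUTENC_RE = re.compile(rb"\\usepackage\[([A-Za-z0-9]+)\]\{inputenc\}")


def guess_codec_from_inputenc(raw, default="latin-1"):
    """Return a codec name for the inputenc option, or the default."""
    match = INPUTENC_RE.search(raw)
    if match is None:
        return default
    option = match.group(1).decode("ascii")
    return INPUTENC_TO_CODEC.get(option, default)


raw = b"\\documentclass{article}\n\\usepackage[latin1]{inputenc}\nM\xe4rchen\n"
codec = guess_codec_from_inputenc(raw)
text = raw.decode(codec)
print(codec, text.splitlines()[-1])
```

The sketch searches the undecoded bytes because the declaration itself
is plain ASCII.  It ignores commented-out lines and multiple options,
which a real implementation would have to consider.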

>> Feel free to forward my comments there if you feel that I am biased
>> with my advice.
>
> Yes, of course, it should be one of the most simple things, even
> for xemacs to handle normal "latin1-coded" tex-files in a proper
> way,  especially if it calls itself XEmacs-mule ;-)

Files are primarily byte streams.  The information that they may be
"latin1-coded" is usually not inherent in them.  It can only be
deduced by looking at the locale and by heuristics that mostly depend
on illegal byte sequences.  While those are unlikely to distinguish
between the various iso-latin-* encodings, they can, to some degree,
tell them apart from utf-8 and some other encodings.
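A heuristic of this kind can be sketched in Python as follows.  It is a
deliberately minimal example, not what any Emacs variant implements;
the function name and the fallback parameter (standing in for "whatever
the locale says") are assumptions of the sketch.

```python
def guess_encoding(raw, fallback="latin-1"):
    """Guess utf-8 unless the bytes contain a sequence illegal in utf-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Illegal utf-8 sequence: assume the 8-bit encoding of the locale.
        return fallback
    return "utf-8"


print(guess_encoding("Märchen".encode("utf-8")))    # valid utf-8
print(guess_encoding("Märchen".encode("latin-1")))  # 0xE4 is illegal here

# The iso-latin-* encodings cannot be told apart this way: every byte
# is legal in each of them, it just means something different.
byte = b"\xa4"
print(byte.decode("latin-1"), byte.decode("iso8859-15"))
```

Pure ASCII input passes the utf-8 check as well, which is harmless
since ASCII bytes mean the same thing in all of these encodings.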

So a lot of guesswork is involved here, and XEmacs 21.4 is, of all
Emacsen available that happen to be able to deal with utf-8 at all,
pretty much the worst option.  It will basically not guess at all but
follow the locale, and even then you need to kick it extra if that is
utf-8.

Doing the right thing is far from trivial, in particular if "the right
thing" is such a fuzzy concept.

-- 
David Kastrup
