Re: I'm sorry that I tried to insert a DOS file

From: Eli Zaretskii
Subject: Re: I'm sorry that I tried to insert a DOS file
Date: Wed, 13 Feb 2002 13:41:07 +0200

On 13 Feb 2002, Kim F. Storm wrote:

> I really don't understand why INSERTING a file (or another buffer)
> into a buffer should change the coding system of the target buffer --
> if the ONLY difference is in the EOL format of the inserted file.
> In that case, the EOL format of the source file should simply be
> ignored (i.e. converted to the EOL format of the target buffer)!
> The exception may be if the target buffer is empty, in which case it
> could inherit the EOL format from the inserted buffer.

These decisions are currently made by insert-file-contents and its 
subroutines.  On that level, it's not easy to be optimal in each and 
every case, because insert-file-contents doesn't always know enough about 
the context to DTRT.

It is also not clear why an empty buffer should be treated differently 
than a buffer with a single character, for example.  Where do you draw 
the line beyond which the current buffer's file encoding should be 
ignored because you inserted text from another file?

This discussion was held among developers a long time ago.  I don't 
remember the details, but there were valid reasons to do it both ways, so 
no single way is better.

> > Note that similar situations occur when the buffer's file encoding is 
> > different from the encoding of the inserted text.  E.g., imagine a buffer 
> > visiting a Latin-1 file into which you insert an ISO-2022 file.  In those 
> > situations, too, asking the question after C-x i already inserted the 
> > text is too late.  (I assume that any feature that asks the user should 
> > handle the latter case as well, not be limited to the EOL format alone.)
>
> I haven't looked at the code (and I'm not really using the coding stuff
> except for the EOL support), but are you saying that if I have a
> Latin-1 buffer, and I insert an ISO-2022 file into it, emacs will
> automatically convert all the Latin-1 stuff already in the buffer into
> ISO-2022 ?

Neither.  The characters in the buffer are neither in Latin-1 nor in
ISO-2022, they are in the internal Mule representation.  This is what 
decoding is all about: you convert the external representation, like 
Latin-1 and ISO-2022, into the internal representation.
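
As a rough analogy outside Emacs, the decoding step can be sketched in Python.
This is an illustration only: Python's Unicode str stands in for the internal
representation (it is not the Mule representation), and the sample strings and
variable names are made up for the example.

```python
# Analogy only: Python's str plays the role of the internal representation.
# The sample texts below are hypothetical.
latin1_bytes = "café".encode("latin-1")          # external Latin-1 bytes
iso2022_bytes = "日本語".encode("iso2022_jp")     # external ISO-2022-JP bytes

# Decoding converts each external form into the same internal form,
# so text from both files can sit side by side in one "buffer".
buffer_text = latin1_bytes.decode("latin-1") + iso2022_bytes.decode("iso2022_jp")

# The bytes on disk differ per encoding; the decoded text carries no trace
# of which encoding it came from.
print(repr(buffer_text))
```

Once decoded, the text is no longer "in" Latin-1 or ISO-2022; the external
encoding matters again only when the text is written out.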

buffer-file-coding-system, whose mnemonic you see on the mode line, 
doesn't tell you anything about the characters in the buffer.  What it tells you 
is how those characters will be encoded when you save the buffer to a 
file, or send it as email.

> But what if I insert a Latin-9 file instead...  What happens to the
> parts of the Latin-1 file which cannot be represented in Latin-9 ?

This question is only meaningful when you type "C-x C-s".  Until you do, 
there's no problem; Emacs can mix any characters in the buffer, no matter 
what its buffer-file-coding-system is.

> In
> that case, I would most likely prefer to keep my Latin-1 stuff, and not
> being able to see the part of the Latin-9 file which does not fit into
> Latin-1.

That's the reason Emacs pops up a question about an appropriate coding 
system when you try to save the buffer.  At that time, and no earlier, 
does Emacs force you to make a decision.
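
The same point can be sketched in Python, again only as an analogy and not as
what Emacs does internally.  The helper can_save_as and the sample text are
hypothetical; the codec names are Python's, not Emacs coding-system names.

```python
def can_save_as(text, coding_system):
    """Return True if every character in text can be encoded in coding_system."""
    try:
        text.encode(coding_system)
        return True
    except UnicodeEncodeError:
        return False

# Mixing characters in memory is harmless.
buffer_text = "price: 5 ¤"   # U+00A4 exists in Latin-1 but not in Latin-9
buffer_text += " or 5 €"     # U+20AC exists in Latin-9 but not in Latin-1

# Only at "save time" does the choice of encoding matter.
latin1_ok = can_save_as(buffer_text, "latin-1")
latin9_ok = can_save_as(buffer_text, "iso8859_15")
utf8_ok = can_save_as(buffer_text, "utf-8")
print(latin1_ok, latin9_ok, utf8_ok)
```

Neither Latin-1 nor Latin-9 can hold the mixed text, which is the situation in
which a decision about the coding system has to be made.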

Some people say that's too late.  They say Emacs should have warned you 
about possible problems when you insert the first Latin-9 character into 
an otherwise Latin-1 buffer.  It should be possible to implement such a 
feature, but it isn't easy to do so without making Emacs an absolute 
annoyance for people who routinely mix scripts and languages, given the 
current architecture of m17n support in Emacs.

> And what does yank do (at least I can see that it converts EOL to the
> target buffer format)?

No, it doesn't convert anything.  Buffers and strings inside Emacs are 
always kept in Unix LF-only EOL format.  This, too, becomes an issue 
only when Emacs is about to write a region of text to a file.
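
A minimal sketch of that arrangement, in Python and under the assumption that
EOL handling is a pure read-time and write-time conversion (the function names
are invented for the example):

```python
def decode_eol(raw):
    """On reading: normalize DOS CRLF line endings to the internal LF form."""
    return raw.replace("\r\n", "\n")

def encode_eol(text, eol):
    """On writing: expand the internal LF to the target file's EOL format."""
    return text.replace("\n", eol)

dos_text = decode_eol("one\r\ntwo\r\n")   # text read from a DOS file
unix_text = decode_eol("three\n")         # text read from a Unix file

# Combining the two (as a yank would) needs no conversion at all,
# because both are already LF-only in memory.
buffer_text = dos_text + unix_text

# The EOL format is applied only when the text is written out.
saved_dos = encode_eol(buffer_text, "\r\n")
saved_unix = encode_eol(buffer_text, "\n")
print(repr(buffer_text), repr(saved_dos))
```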

> In any case, the documentation on insert-file-contents is a bit
> vague on the consequences of mixing coding systems....

IMHO, that's the understatement of the decade ;-)  Mule issues in general
are notoriously underdocumented, the exception being the user-level 
facilities that got lots of attention in the user manual in preparation 
for the Emacs 21.1 release.
