Re: EOL: unix/dos/mac

From: Stephen J. Turnbull
Subject: Re: EOL: unix/dos/mac
Date: Tue, 26 Mar 2013 16:45:30 +0900

Eli Zaretskii writes:
 > > From: "Stephen J. Turnbull" <address@hidden>

 > > [Unicode] just says "all of these sequences when encountered in
 > > text purporting to conform to this standard should be treated in
 > > the same way."  Emacsen should do the same.
 > That would require Emacs to store all the possible EOL sequences in
 > the buffer, and treat them all identically.  That's doable, but is a
 > non-trivial job; volunteers are welcome.

I don't know what you mean by "all the possible EOL sequences".  It's
well-defined (in Unicode TR#13 or section 5.8 of Unicode 6.2) what an
NLF is: it's the first of CRLF, LF, CR, or NEL (U+0085) that matches
when parsing a line.  In the buffer, they would all be converted to
Emacs' representation (i.e., LF).  Ensuring that C-x C-f file RET C-x
C-w file RET is the identity requires marking non-default EOL
sequences somehow; that's all.
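
To make the matching rule concrete, here is a rough sketch in Python
(not Emacs Lisp, and not Emacs code).  The names NLF and split_nlf are
made up for illustration; the only thing taken from the standard is
the set of four sequences, with CRLF tried before a bare CR.

```python
import re

# Assumption: only the four NLF sequences named above are recognized.
# CRLF is listed first so it wins over a bare CR at the same position.
NLF = re.compile("\r\n|\n|\r|\u0085")


def split_nlf(text):
    """Return (line, terminator) pairs.

    The terminator is "" for a final line that has no NLF after it.
    """
    pairs = []
    pos = 0
    for m in NLF.finditer(text):
        pairs.append((text[pos:m.start()], m.group()))
        pos = m.end()
    if pos < len(text):
        pairs.append((text[pos:], ""))
    return pairs
```

Every terminator found this way would become a single LF in the
buffer; the original sequence only matters again at write time.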

 > > The question then is how to deal with file comparison.  We'd like to
 > > avoid creating spurious diffs based on "fixing" random different line
 > > endings
 > If Emacs is to support different EOL formats in the same file, it
 > should not convert them at all.

Of course it should convert them.

Trying to support multiple EOL codings in the buffer is craziness.
Two decades ago, I had to live that madness at the coding system
level; it was called "Nihongo Emacs" (or "The Japanese Patch" in other
programs).  Richard (and every other upstream maintainer) rightly
rejected those patches for the mainstream project, with all due
respect to their developers.  Doing it only for EOLs would be much
less painful, but it's not worth it.

 > Anything else _will_ introduce spurious modifications, and could
 > even corrupt some files, if the exact EOL sequence here or there
 > matters.

No, it need not, any more than any ambiguous encoding need do so.  Of
course it will be fragile if (for example) Emacs crashes and you have
to recover an autosave file.

 > > I guess one could attach a text property to newlines differing from
 > > the file's autodetected EOL convention.
 > Not sure how a text property should help here.

It would mark non-default EOL sequences for correct output.
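
A minimal sketch of the idea, in Python rather than Emacs Lisp.  The
names read_buffer and write_buffer are hypothetical, a plain dict of
buffer positions stands in for a text property, and "autodetected
convention" is assumed to mean the most frequent terminator in the
file.

```python
import re
from collections import Counter

# Assumption: the NLF set is CRLF, LF, CR, NEL, with CRLF matched first.
NLF = re.compile("\r\n|\n|\r|\u0085")


def read_buffer(raw):
    """Normalize every NLF to LF; return (buffer, default_eol, marks)."""
    found = NLF.findall(raw)
    # Assumption: the file's default EOL is its most common terminator.
    default = Counter(found).most_common(1)[0][0] if found else "\n"
    parts, marks = [], {}
    pos = 0       # read position in the raw text
    out_len = 0   # length of the normalized buffer so far
    for m in NLF.finditer(raw):
        line = raw[pos:m.start()]
        parts.append(line + "\n")
        out_len += len(line)
        if m.group() != default:
            # Stand-in for a text property on this buffer newline.
            marks[out_len] = m.group()
        out_len += 1
        pos = m.end()
    parts.append(raw[pos:])
    return "".join(parts), default, marks


def write_buffer(buf, default, marks):
    """Write each LF as its marked sequence, or the default if unmarked."""
    return "".join(
        marks.get(i, default) if ch == "\n" else ch
        for i, ch in enumerate(buf)
    )
```

With this, write_buffer(*read_buffer(raw)) gives back raw unchanged as
long as the buffer is not edited in between.  A real text property
would travel with its newline through edits, which a position-keyed
dict does not.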

 > > I've also considered switching the internal representation of newline
 > > to U+2028 LINE SEPARATOR
 > What good would that be?

Unicode correctness; no confusion between Emacs internal
representation and the actual encoding of EOL on any given platform;
no long-lines ambiguity (LS would be considered a "soft newline" in
applications that automatically rewrap, and U+2029 PARAGRAPH SEPARATOR
would unambiguously demarcate paragraphs).

As I wrote, it's not urgent.
