Re: EOL: unix/dos/mac

From: Eli Zaretskii
Subject: Re: EOL: unix/dos/mac
Date: Tue, 26 Mar 2013 15:07:21 +0200

> From: "Stephen J. Turnbull" <address@hidden>
> Cc: address@hidden,
>     address@hidden,
>     address@hidden
> Date: Tue, 26 Mar 2013 20:47:33 +0900
>  > > Trying to support multiple EOL codings in the buffer is craziness.
>  > 
>  > But it's the only way to be 100% sure you don't introduce spurious
>  > changes into files.  And since newlines, unlike characters, are not
>  > displayed, there's no issues with fonts etc. here.
> Currently NLFs *are* displayed, if they don't match the default for
> the buffer.

No, they are displayed because nothing other than a single LF is
treated as an NLF by the Emacs internals.  EOL conversion is a layer on
top of that; the buffer maintenance and the display engine know
absolutely nothing about it.

Once these byte sequences are recognized as NLFs, they will not be
displayed, because that's how the Emacs display works.

>  > > Doing it only for EOLs would be much less painful, but it's not
>  > > worth it.
>  > 
>  > Please explain why do you think it isn't worth it.
> Because you have to fix pretty much everything

I'm probably missing something important, because things I think will
need fixing are nowhere near "pretty much everything".  How about
posting a long enough list of things to fix to convince me that
"pretty much everything" is close to the truth?

> new syntax will be required for stuff like zap-to-char


> and nearly required for regexps.

For $ we will need to get regex.c to support the additional NLFs, and
that's all.  If you mean a literal \n in regexps, then yes, something
will have to be done with that.  But it would be a good thing in its
own right, because Emacs will come closer to supporting Unicode
standard annexes.
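
As a rough illustration of what recognizing the additional NLFs for $
could look like, here is a minimal Python sketch.  It is not regex.c
code: the NLF set is an assumption based on the Unicode newline
guidelines, and line_end_positions is a hypothetical name.  It reports
each position where an end-of-line anchor would match, treating CRLF
as a single terminator:

```python
import re

# Assumed NLF set: CRLF, LF, CR, VT, FF, NEL, LS, PS.
# CRLF is listed first so it is matched as one terminator.
NLF = re.compile(r"\r\n|[\n\r\x0b\x0c\x85\u2028\u2029]")

def line_end_positions(text):
    """Return the offsets at which a line ends (just before its NLF)."""
    ends = [m.start() for m in NLF.finditer(text)]
    # A final line without a terminator still ends at end of text.
    if not NLF.search(text[-2:]) or not text:
        ends.append(len(text))
    elif not NLF.fullmatch(text[-1:]):
        ends.append(len(text))
    return ends

print(line_end_positions("ab\r\ncd\u2028ef"))
```

Note that no position is reported between the CR and the LF of a CRLF
pair, which is the behavior one would presumably want from $.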

> Code will be massively uglified with tests for variable-length
> sequences instead of single characters

The code is already replete with that, ever since Emacs started using
a multi-byte representation for characters in buffers.  We have a set
of macros to fetch and examine multi-byte sequences, for that reason.
I see nothing hard or "ugly" here, sorry.
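
To make the point concrete, here is a small Python sketch of a
variable-length test on a UTF-8 byte buffer.  It does not reproduce
the actual Emacs macros; nlf_length_at and the list of sequences are
assumptions made for illustration only:

```python
# Byte sequences assumed to count as NLFs in a UTF-8 buffer.
# CRLF comes first so that it wins over a lone CR.
NLF_BYTES = (
    b"\r\n",          # CR LF
    b"\n",            # LF
    b"\r",            # CR
    b"\xc2\x85",      # NEL, U+0085
    b"\xe2\x80\xa8",  # LINE SEPARATOR, U+2028
    b"\xe2\x80\xa9",  # PARAGRAPH SEPARATOR, U+2029
)

def nlf_length_at(buf, i):
    """Return the byte length of the NLF starting at offset i, or 0."""
    for seq in NLF_BYTES:
        if buf.startswith(seq, i):
            return len(seq)
    return 0

buf = "a\r\nb\u2028c\x85".encode("utf-8")
print([nlf_length_at(buf, i) for i in (0, 1, 4, 8)])
```

The test has the same shape as fetching any other multi-byte
character: look at the bytes at the current position and report how
many of them belong together.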

> everything from motion to insdel will have to be modified


> Any code handling old-style hidden lines (with CR marking
> "invisible" lines) will have to be changed.

First, we want to deprecate and remove this feature anyway (there's
already an implemented alternative).  And second, we already handle
this today so that we don't display ^M there; the same method can be
used for the other NLFs.

> It's not obvious to me that there are no counterintuitive
> implications.  Opposed to that, there are very few text files with
> mixed line endings, and in many cases the user would actually like to
> have them regularized (at a time of their choosing, so they can have a
> commit with only whitespace changes, for example).

We should be consistent: either there is a problem with mixed line
endings and with Unicode NLFs that aren't treated as EOL at all, or
there isn't.  If the problem is insignificant, perhaps nothing should
be changed at all.  If the problem _is_ significant, we might as well
solve it The Right Way, instead of applying more and more band-aids.
Conversion of NLFs to a single LF is a kludge, same as emptying the
kettle when you already have a procedure for preparing a kettle of
boiled water starting with an empty one.  You cannot do such
conversion efficiently if you need to discover the EOL format for
every line.  Dispensing with the conversion altogether solves both
problems in one go.  What it adds doesn't seem so frightening to me,
certainly less so than, say, adding bidi support ;-)
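
The difference can be sketched in a few lines of Python.  This is only
an illustration under stated assumptions: the NLF set is taken from
the Unicode newline guidelines, and split_keep_nlf, join_nlf and
convert_and_rewrite are hypothetical helpers, not anything Emacs has.
Converting to LF and writing back with one EOL format alters a file
with mixed line endings, while keeping each terminator as found
round-trips it unchanged:

```python
import re

# Assumed NLF set; CRLF first so it is treated as one terminator.
NLF = re.compile(r"\r\n|[\n\r\x0b\x0c\x85\u2028\u2029]")

def split_keep_nlf(text):
    """Return (line, terminator) pairs, keeping terminators verbatim."""
    pairs, pos = [], 0
    for m in NLF.finditer(text):
        pairs.append((text[pos:m.start()], m.group()))
        pos = m.end()
    if pos < len(text):
        pairs.append((text[pos:], ""))  # last line, no terminator
    return pairs

def join_nlf(pairs):
    """Reassemble the text from (line, terminator) pairs."""
    return "".join(line + nlf for line, nlf in pairs)

def convert_and_rewrite(text, eol):
    """Convert every NLF to LF, then write back using a single EOL."""
    return NLF.sub("\n", text).replace("\n", eol)

mixed = "one\r\ntwo\nthree\rfour"
print(join_nlf(split_keep_nlf(mixed)) == mixed)
print(convert_and_rewrite(mixed, "\r\n") == mixed)
```

With the terminators kept, an edit to one line leaves every other
line's ending untouched on save.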

>  > Surely, going again through the pain of inadvertent changes to user
>  > files is a movie we don't want to be part of again.
> What pain of inadvertent changes?  Sure, there will likely be bugs in
> the first draft of such code, what else is new?  If you're talking
> specifically about the \201 regression, that's a completely different
> issue AFAICT -- that was about buffer-as-unibyte exposing the
> *internal* representation to Lisp, which was a "Mr. Foot, may I
> introduce to you Mr. Bullet" kind of idea from Day 1.

The internal representation is still exposed, so nothing's changed in
that department.

>  > >  > Anything else _will_ introduce spurious modifications, and could
>  > >  > even corrupt some files, if the exact EOL sequence here or there
>  > >  > matters.
>  > > 
>  > > No, it need not, any more than any ambiguous encoding need do so.  Of
>  > > course it will be fragile if (for example) Emacs crashes and you have
>  > > to recover an autosave file.
>  > 
>  > It will be fragile, and subtle bugs will tend to break quite a bit.
> I don't think so.

Well, then we will have to agree to disagree.

> I think you're hearing monsters in the closet.

And I think _you_ are hearing them.  Or maybe you will show me such a
large list of things that will become broken by keeping NLFs that I
will change my mind.
