Re: EOL: unix/dos/mac

From: Stephen J. Turnbull
Subject: Re: EOL: unix/dos/mac
Date: Wed, 27 Mar 2013 03:34:36 +0900

Alan Mackenzie writes:

 > This is a little confusing to poor old me.  ASCII doesn't care about line
 > breaks either; only particular use cases care.

True.  ASCII is a coded character set.  It does not have a way to
represent an abstract line break in a single character; whatever you
do, then, is outside of the ASCII standard.

 > If you write a script (whether bash, sed, ....) on a *nix system
 > and it has CRLF line ends, it will fail (with an obscure error
 > message) regardless of whether that script is nominally in UTF-8 or
 > ASCII or whatever.

Python, at least, is not in your ellipsis.  Not by default, and not on
any supported platform.  I wouldn't be surprised if Perl and Ruby have
adopted "universal newlines", too.
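
As a small illustration of what "universal newlines" means when Python
reads text (a sketch only; the helper name read_lines_universal is made
up for this example, and it says nothing about how a shell or sed treats
the same bytes):

```python
import io

def read_lines_universal(raw):
    """Decode bytes in text mode with universal newlines enabled.

    newline=None is the default for text-mode streams: on input, the
    endings LF, CR, and CRLF are all translated to "\n".
    """
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline=None)
    return list(stream)

# The same three-line text with DOS, old Mac, and Unix endings mixed.
mixed = b"echo one\r\necho two\recho three\n"
lines = read_lines_universal(mixed)
```

Each element of lines ends in a plain "\n", whichever convention the
input bytes used.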

 > In what sense does Unicode "not care"?

In the sense that Unicode is more than a character set; it prescribes
all kinds of algorithms for text processing as well.  Here, section
5.8 of the Unicode Standard v6.2 prescribes that any of LF, CR, CRLF,
and ISO 6429 NEXT LINE (U+0085) should be considered to be a single
line (or paragraph) break in legacy text.  It says nothing about how
they should be represented internally, though.  Unusually for the
Unicode Standard, it allows you to guess what the user wants, and in
some cases even alter the input stream before outputting it.
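
A minimal sketch of that rule for reading legacy text, assuming you have
chosen LF as the internal representation (the Standard leaves that choice
open, so it is an assumption of this example, as is the function name):

```python
import re

# CRLF must be tried first so that it counts as ONE break, not two.
# "\x85" is NEXT LINE (NEL), U+0085.
_LEGACY_BREAK = re.compile("\r\n|[\n\r\x85]")

def normalize_legacy_breaks(text, internal="\n"):
    """Map each of CRLF, CR, LF, and NEL to a single internal line break.

    `internal` is this sketch's own parameter, not something the
    Unicode Standard prescribes.
    """
    return _LEGACY_BREAK.sub(internal, text)

sample = "one\r\ntwo\rthree\nfour\x85five"
normalized = normalize_legacy_breaks(sample)
```

Note that this only treats the four legacy sequences named above.
Python's own str.splitlines() also splits on several other characters
(U+2028 and U+2029 among them), so it is not a drop-in substitute if you
want exactly this set.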

"Legacy" text means it uses ASCII (or C1) control characters to
represent line and/or paragraph breaks, rather than the characters
prescribed by Unicode (U+2028 LINE SEPARATOR and U+2029 PARAGRAPH
