bug-grep

bug#25707: [PATCH] grep: don't forcefully strip carriage returns


From: Eli Zaretskii
Subject: bug#25707: [PATCH] grep: don't forcefully strip carriage returns
Date: Thu, 16 Feb 2017 19:26:39 +0200

> From: Eric Blake <address@hidden>
> Date: Mon, 13 Feb 2017 14:20:56 -0600
> 
> > While we're on the topic, the undossify_input approach is just a
> > heuristic and it sometimes guesses wrong. I wish the heuristic could be
> > removed somehow, so that grep would behave more deterministically on
> > MS-DOS/Windows.
> > 
> 
> I'm of the opinion that undossify_input causes more problems than it
> solves.  We should trust fopen("r") to do the right thing, rather than
> reinventing it ourselves.

FYI: You'd be losing an important feature for non-Cygwin DOS/Windows
users if you remove undossify_input and decide to trust fopen's "r"
(or "rt") mode.  That's because reading a file that was opened in
text mode generally removes _all_ CR characters, even if they are not
followed by a newline; it also stops at the first ^Z character in the
file, treating it as a kind of "software EOF", a legacy of the CP/M
years.

That's why the original patch switched the file descriptor to binary
mode (Grep used 'open', not 'fopen', in those days) and used
undossify_input: that allowed Grep to DTRT with these use cases,
removing CRs only if they are followed by a newline, and not stopping
at ^Z.  As a side effect, undossify_input also collects the
information needed for displaying byte offsets.
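
To illustrate the behavior described above, here is a minimal sketch.
It is not Grep's actual undossify_input: the function name
strip_cr_before_lf is hypothetical, and the sketch assumes the whole
input is already in one buffer that was read in binary mode.  It drops
a CR only when the next byte is a newline, and it treats ^Z (0x1A) as
ordinary data.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only; NOT grep's real
   undossify_input.  Compacts BUF in place, dropping each CR that is
   immediately followed by LF.  Lone CRs and ^Z (0x1A) bytes are kept.
   Returns the new length of the data in BUF.  */
size_t
strip_cr_before_lf (char *buf, size_t len)
{
  size_t out = 0;

  for (size_t in = 0; in < len; in++)
    {
      /* Skip the CR of a CR-LF pair; everything else is copied.  */
      if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
        continue;
      buf[out++] = buf[in];
    }
  return out;
}
```

Real code that reads in chunks would also have to handle a CR that is
the last byte of one buffer when the LF arrives at the start of the
next, and would have to record where CRs were removed if it wants to
report byte offsets; the sketch does neither.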

It seems to me that when one bumps into some code which looks
incorrect or less than optimal, and one considers replacing it with
something more clever, it would be a good idea to ask the person(s) who
contributed the original code, in case there was some good reason for
doing it that way.  Was that done in this case?  If not, it should
have been.




