Re: wc enhancement possibility
From: Pádraig Brady
Subject: Re: wc enhancement possibility
Date: Thu, 30 Jun 2016 09:46:31 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
On 30/06/16 02:52, Allan Chandler wrote:
> Good arbitrary-time-of-day, people.
>
> I helped a colleague out today with a "wc" problem they were having with line
> counts when the final line of a file did not have a newline at the end of it.
>
> Now this is technically not a bug since the doco explicitly states that "wc
> --lines/-l" gives the count of newline characters, not the count of lines.
> And, in any case, it could be argued that the definition of a line SHOULD be
> "zero or more characters followed by a newline".
>
> However, this has caused confusion before in that a non-terminated final line
> COULD be considered a line, especially if you're just outputting the file.
>
> I don't propose changing the behaviour of "--lines" since that would result
> in chaos for a large number of scripts in the world currently using it, and I
> don't wish to spend the rest of my life fighting off affected parties,
> Omega-Man-against-the-zombies style, because of the trouble I caused :-)
>
> However, I wonder whether it would be worthwhile adding another option which
> included a final non-terminated line, something like "--lines-all".
>
> I've seen some "wc" suggestions turned down in the past
> (https://www.gnu.org/software/coreutils/rejected_requests.html) but these
> seem to generally be requests for things that other tools are better to
> provide.
>
> Keeping in mind the philosophy of UNIX's "a tool should do one thing and do
> it well", and the fact that the purpose of "wc" is most definitely counting
> things, it appears it may be a better fit in the "wc" program itself rather
> than doing it as part of a pipeline.
>
> Anyway, I'm really just raising it as a discussion point. Tell me what you
> think...
Maybe.
Note that one of the reasons wc -l doesn't count a final line that is not
terminated by \n is so that counts stay accurate for split files, for example:
the per-piece counts then add up to the count for the whole file.
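
To illustrate the split-file point, here is a minimal sketch. The file
contents, the 5-byte piece size and the temporary paths are made up for the
demo; it only assumes the standard split, wc and mktemp utilities.

```shell
# Demo data: three "lines", the last one without a trailing newline.
tmp=$(mktemp -d)
printf 'one\ntwo\nthree' > "$tmp/whole"

# Cut the file into 5-byte pieces, deliberately splitting mid-line.
split -b 5 "$tmp/whole" "$tmp/part."

# Count newlines in the whole file and, separately, in every piece.
whole=$(( $(wc -l < "$tmp/whole") ))
sum=0
for f in "$tmp"/part.*; do
  sum=$(( sum + $(wc -l < "$f") ))
done

echo "whole=$whole sum_of_parts=$sum"
rm -rf "$tmp"
```

Because wc -l counts newline characters, both numbers come out as 2. If every
unterminated tail were counted as a line instead, the three pieces would
report 2 + 2 + 1 = 5 against 3 for the whole file.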
If we were to add an option it would be a flag type option
rather than selecting a different mode.
But it mightn't be too much overhead to pre-process the data instead?
I.e. something like:

  # GNU sed: '$a\' adds a trailing newline only if the last line lacks one
  wc-all-lines() { sed '$a\' | wc -l; }
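
A quick way to try the idea is sketched below. It assumes GNU sed, where
'$a\' with no text appends nothing but still causes a missing final newline
to be written. The underscore name wc_all_lines is only an illustrative
variant of the function above, chosen because hyphens in function names are
not accepted by every POSIX shell.

```shell
# Assumption: GNU sed. '$a\' leaves the data unchanged except for
# terminating the last line with a newline when that newline is missing.
wc_all_lines() { sed '$a\' | wc -l; }

printf 'a\nb'   | wc_all_lines   # unterminated final line -> 2
printf 'a\nb\n' | wc_all_lines   # properly terminated     -> 2
printf ''       | wc_all_lines   # empty input             -> 0
```

Plain wc -l would report 1 for the first input, so the wrapper changes the
count only when the final line is unterminated.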
cheers,
Pádraig