RE: wc enhancement possibility
From: Allan Chandler
Subject: RE: wc enhancement possibility
Date: Thu, 30 Jun 2016 09:34:15 +0000
Agreed, I'd be rather confused if the sum of "wc -l <file1" and "wc -l <file2"
was not the same as "cat file1 file2 | wc -l". That's why I thought it would be
better as a separate operation.
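A small sketch of that additivity property, assuming a POSIX shell and GNU coreutils (the temporary file names are arbitrary). Both files lack a trailing newline. Counting newlines gives 1 + 1 for the separate files and 2 for the concatenation. A count that included partial last lines would give 2 + 2 = 4 separately but only 3 concatenated, because cat joins "two" and "three" onto one line.

```shell
#!/bin/sh
# Two files whose final lines are NOT newline-terminated.
tmp=$(mktemp -d)
printf 'one\ntwo' > "$tmp/file1"     # 1 newline, 2 lines as displayed
printf 'three\nfour' > "$tmp/file2"  # 1 newline, 2 lines as displayed

a=$(wc -l < "$tmp/file1")
b=$(wc -l < "$tmp/file2")
joined=$(cat "$tmp/file1" "$tmp/file2" | wc -l)

# Counting newlines keeps the totals additive: 1 + 1 = 2 either way.
echo "separate: $((a + b)), concatenated: $((joined))"

rm -r "$tmp"
```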
And I don't doubt you could generate a pipeline that would do the job (piping
it through "awk '{print}'" also seems to do the trick for me) but my point was
that counting things was the raison d'être of "wc", so it would be better added
to _that_ program. You could just as well argue that "sed" shouldn't have an
in-place editing option because you can do the same thing with a "mv" after the
event :-)
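A minimal sketch of the awk route mentioned above, assuming a POSIX awk; the function names are hypothetical. awk treats an unterminated final line as a record, and print always ends its output with ORS ("\n" by default), so the missing newline is supplied before wc -l counts.

```shell
#!/bin/sh
# Re-emit every record; print appends "\n", so a partial last line gains one.
count_via_awk_print() { awk '{ print }' | wc -l; }

# Same result in one process: NR is the number of records read, and the
# unterminated final line counts as a record.
count_via_awk_nr() { awk 'END { print NR }'; }

printf 'a\nb\nc' | wc -l                # 2 (newlines only)
printf 'a\nb\nc' | count_via_awk_print  # 3
printf 'a\nb\nc' | count_via_awk_nr     # 3
```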
I assume by flag option you meant that, rather than a new counting option like
"--lines-including-partial-last", it may be better to have something that
modified the behaviour of the existing "--lines", such as
"--count-incomplete-last-line" or something like that. I have no issue with the
mechanics so that sounds fine, my only desire was that there should be _some_
way to get the information from "wc".
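To make the flag semantics concrete, here is a hypothetical shell helper that emulates what a "--count-incomplete-last-line" modifier might do. That option does not exist in GNU wc; it is only proposed in this thread, and the function name is invented for illustration. It takes the normal newline count and adds one when the file is non-empty and its last byte is not a newline.

```shell
#!/bin/sh
# Hypothetical emulation of the proposed flag (not a real wc option).
lines_including_partial() {
    n=$(wc -l < "$1")
    # $(...) strips trailing newlines, so the last byte is non-empty here only
    # when the file is non-empty and does not end in '\n'.
    # (Edge case: shells drop a final NUL byte, so that case is not counted.)
    if [ -n "$(tail -c 1 "$1")" ]; then
        n=$((n + 1))
    fi
    echo "$((n))"
}

tmp=$(mktemp)
printf 'a\nb' > "$tmp"
lines_including_partial "$tmp"    # 2 (partial last line counted)
printf 'a\nb\n' > "$tmp"
lines_including_partial "$tmp"    # 2 (already terminated)
rm -f "$tmp"
```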
Cheers.
-----Original Message-----
From: Pádraig Brady [mailto:address@hidden]
Sent: Thursday, 30 June 2016 4:47 PM
To: Allan Chandler; address@hidden
Subject: Re: wc enhancement possibility
On 30/06/16 02:52, Allan Chandler wrote:
> Good arbitrary-time-of-day, people.
>
> I helped a colleague out today with a "wc" problem they were having with line
> counts when the final line of a file did not have a newline at the end of it.
>
> Now this is technically not a bug since the doco explicitly states that "wc
> --lines/-l" gives the count of newline characters, not the count of lines.
> And, in any case, it could be argued that the definition of a line SHOULD be
> "zero or more characters followed by a newline".
>
> However, this has caused confusion before in that a non-terminated final line
> COULD be considered a line, especially if you're just outputting the file.
>
> I don't propose changing the behaviour of "--lines" since that would
> result in chaos for a large number of scripts in the world currently
> using it, and I don't wish to spend the rest of my life fighting off
> affected parties, Omega-Man-against-the-zombies style, because of the
> trouble I caused :-)
>
> However, I wonder whether it would be worthwhile adding another option which
> included a final non-terminated line, something like "--lines-all".
>
> I've seen some "wc" suggestions turned down in the past
> (https://www.gnu.org/software/coreutils/rejected_requests.html) but these
> seem to generally be requests for things that other tools are better to
> provide.
>
> Keeping in mind the philosophy of UNIX's "a tool should do one thing and do
> it well", and the fact that the purpose of "wc" is most definitely counting
> things, it appears it may be a better fit in the "wc" program itself rather
> than doing it as part of a pipeline.
>
> Anyway, I'm really just raising it as a discussion point. Tell me what you
> think...
Maybe.
Note that one of the reasons wc -l doesn't count a final line lacking a
terminating \n is so that counts stay accurate for split files, for example.
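A sketch of the split case, assuming GNU split and a POSIX shell (the 7-byte chunk size is arbitrary). The 23-byte input is cut into pieces "alpha\nb", "eta\ngam", "ma\ndelt" and "a\n", three of which end mid-line. Summing wc -l over the pieces gives 4, matching the whole file; a count that also included each piece's partial last line would give 7.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'alpha\nbeta\ngamma\ndelta\n' > "$tmp/whole"   # 4 lines, 23 bytes

# Cut into 7-byte chunks (part.aa, part.ab, ...); most end mid-line.
( cd "$tmp" && split -b 7 whole part. )

total=0
for f in "$tmp"/part.*; do
    total=$((total + $(wc -l < "$f")))
done

# Newline counts over the pieces add up to the count for the whole file.
echo "pieces: $total, whole: $(($(wc -l < "$tmp/whole")))"
rm -r "$tmp"
```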
If we were to add an option it would be a flag type option rather than
selecting a different mode.
But it mightn't be too much overhead to pre-process the data?
I.e. something like:
wc-all-lines() { sed '$a\' | wc -l; }
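A usage sketch of the one-liner above, assuming GNU sed, whose '$a\' with empty text emits a newline after the last line only if that line lacked one. The underscore name is a hypothetical variant: POSIX limits function names to letters, digits and underscores, so the hyphenated form relies on shells such as bash that accept it.

```shell
#!/bin/sh
# Supply a missing final newline (GNU sed), then count newlines.
wc_all_lines() { sed '$a\' | wc -l; }

printf 'a\nb'   | wc_all_lines   # 2 (missing newline supplied)
printf 'a\nb\n' | wc_all_lines   # 2 (already terminated, nothing added)
printf ''       | wc_all_lines   # 0 (no last line to append after)
```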
cheers,
Pádraig