coreutils

From: Dragan Simic
Subject: Re: [PATCH 4/4] cut: Optionally treat multiple consecutive delimiters as one
Date: Tue, 01 Aug 2023 20:37:43 +0200

On 2023-08-01 16:42, Pádraig Brady wrote:
> On 01/08/2023 10:07, Dragan Simic wrote:
>> Add new command-line option and the required logic that allow multiple
>> consecutive delimiters to be treated as a single delimiter. Of course,
>> this option is valid only with the cut's field mode.
>>
>> This new feature should make cut much more usable in various real-world
>> applications, some of which are already mentioned in the gotchas.  For
>> example, merging the consecutive delimiters is very useful when cut is
>> used to process the outputs of various commands.
>>
>> Add a whole battery of new cut tests, which cover this new feature, and
>> add more tests for the related already existing features, to make sure
>> no regressions are introduced.
>>
>> While there, clean up the comments and the whitespace in the cut tests
>> a bit, to make them slightly more readable.
>
> Thanks for the patch.
> I wonder whether a --empty-fields={ignore,suppress} is a more general interface.

I wonder whether that would be a more complex approach and, more importantly, a less intuitive one. Quite frankly, I think it's easier to visualize the empty space, or more generally the delimiters, becoming "squeezed". I think that visualizing the empty fields is harder, especially when the delimiter is a whitespace character.
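To illustrate the difference between the two ways of looking at it, here is a rough sketch in Python. It is not taken from the patch: the function name cut_field and its merge parameter are made up for illustration, and the sketch ignores cut's handling of lines that contain no delimiter at all.

```python
import re


def cut_field(line: str, delim: str, field: int, merge: bool = False) -> str:
    """Return the 1-based field of line, roughly like `cut -d DELIM -f FIELD`.

    merge=True is a hypothetical stand-in for the proposed "-m" behaviour:
    a run of consecutive delimiters is treated as a single delimiter.
    Simplification: lines without any delimiter are not special-cased.
    """
    if merge:
        # One or more consecutive delimiters separate two fields.
        parts = re.split(re.escape(delim) + "+", line)
    else:
        # Every single delimiter separates two fields, so runs of
        # delimiters produce empty fields in between.
        parts = line.split(delim)
    # A field past the end of the line yields an empty result.
    return parts[field - 1] if field <= len(parts) else ""
```

With the input "a::b:c" and ":" as the delimiter, the second field is empty without merging, while with merging the second field is "b".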

> This overlaps somewhat with the -w option in FreeBSD's cut,
> which merges runs of whitespace, and which I was also considering adding.

After thinking a bit about it, how about having both "-m", from the patch I submitted, and "-w", which would behave differently from FreeBSD's "-w"? Please, allow me to explain.

More specifically, our "-w" would simply "squeeze" all runs of whitespace in the input, without forcing the delimiter to be whitespace. Each "squeezed" run would be replaced with a single whitespace character. That would be either the whitespace character specified as an optional value for the "-w" option or, by default, a space wherever only spaces were "squeezed" and a tab wherever the "squeezed" whitespace contained at least one tab.

With both the "-m" and "-w" options in place, we'd end up with quite a versatile cut, which would cover what FreeBSD's cut does and be able to do more. I'd be willing to implement the "-w" option as well.
