Re: A new 'transpose' coreutil

From: Pádraig Brady
Subject: Re: A new 'transpose' coreutil
Date: Thu, 17 Dec 2015 11:00:18 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 17/12/15 05:06, Ryan wrote:
> Hi everybody,
> I looked through the rejected feature requests and did not find this.
> I and others I know have often wished for an efficient 'transpose'
> utility that can handle files of arbitrary size. I am aware of
> solutions involving 'awk' and 'perl', but they are neither efficient
> in time nor in space. There is also a 'coreutils-like' transpose
> available on GitHub, but it is also suboptimal in time and space
> (often failing to perform well enough to be usable for moderately large
> files) and is not consistent with other coreutils utilities in several
> ways.
> I would be interested in submitting a 'transpose' tool, which I would
> finish coding (I've already started for myself) and assign copyright
> to the FSF. I am exerting significant effort to model all the
> particulars of my code on existing coreutils code and various related
> style recommendations. Features of my proposed tool include:
> - works on rectangular matrices of arbitrary dimension (mxn)
> - specifiable single-character separator (-t 'SEP')
> - specifiable buffer (-S num_bytes_with_suffix) a la GNU sort
> - correct and efficient function regardless of values S, m, or n
> - specifiable temp directories (-T dir) a la GNU sort
> - specifiable (de)compressor (--compress-program) a la GNU sort
> - correct and efficient operation regardless of field length or length
> uniformity
> - correct support for input from or output to pipes
> - standard handling of all other relevant flags
> Please let me know if there is any interest. I would be very happy to
> submit a draft for consideration if so. Finally, thanks to all the
> people who have contributed the fantastic tools that already exist.
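
For illustration only, the bounded-memory behaviour listed above can be sketched in Python roughly as follows. This is not the proposed implementation: the function name `transpose_chunked` and its parameters are hypothetical, and `max_rows` is an assumed stand-in for a byte-sized buffer like `-S`, while `tmp_dir` plays the role of `-T`. The idea is to transpose one chunk of rows at a time into a temporary file, then stitch line j of every temporary file together to form output row j.

```python
import os
import tempfile


def transpose_chunked(lines, sep="\t", max_rows=1000, tmp_dir=None):
    """Yield transposed rows while holding at most max_rows input rows in memory.

    Hypothetical sketch: max_rows approximates a byte buffer limit, and
    tmp_dir is where the intermediate chunk files are written.
    """
    chunk_paths = []
    ncols = None

    def flush(rows):
        # Transpose one in-memory chunk: line j of the temp file holds
        # column j of this chunk, already joined with the separator.
        fd, path = tempfile.mkstemp(dir=tmp_dir, text=True)
        with os.fdopen(fd, "w") as f:
            for col in zip(*rows):
                f.write(sep.join(col) + "\n")
        chunk_paths.append(path)

    rows = []
    try:
        for line in lines:
            fields = line.rstrip("\n").split(sep)
            if ncols is None:
                ncols = len(fields)
            elif len(fields) != ncols:
                raise ValueError("input is not rectangular")
            rows.append(fields)
            if len(rows) >= max_rows:
                flush(rows)
                rows = []
        if rows:
            flush(rows)

        # Merge pass: output row j is line j of every chunk file, in order.
        # Note this keeps one open file per chunk, which a real tool would
        # need to bound (for example by merging in several passes).
        files = [open(p) for p in chunk_paths]
        try:
            for parts in zip(*files):
                yield sep.join(p.rstrip("\n") for p in parts)
        finally:
            for f in files:
                f.close()
    finally:
        for p in chunk_paths:
            os.remove(p)
```

For example, `list(transpose_chunked(open("matrix.tsv"), max_rows=100000))` would transpose a tab-separated file (the file name is made up) without ever holding more than 100000 input rows at once.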


Yes, generally any program that needs to consume all input
before generating output would benefit from the "sort" options above.
It would be good to share as much of that code as possible.
I'm guessing such a tool would benefit from --skip-rows and
--skip-cols options, to handle row/col labels?
Please begin the copyright assignment process at least.
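
To make the --skip-rows/--skip-cols idea concrete, here is a minimal in-memory sketch in Python. The semantics are only an assumption (the leading label rows and columns are simply dropped before transposing; one could equally pass them through unchanged), and the function name and parameters are hypothetical:

```python
from itertools import islice


def transpose_skipping(lines, sep="\t", skip_rows=0, skip_cols=0):
    """Transpose a small table after dropping leading label rows/columns.

    Assumed semantics: the first skip_rows lines and the first skip_cols
    fields of each remaining line are discarded, not carried over.
    """
    body = [line.rstrip("\n").split(sep)[skip_cols:]
            for line in islice(lines, skip_rows, None)]
    if len({len(row) for row in body}) > 1:
        raise ValueError("input is not rectangular")
    # zip(*body) turns the list of rows into a sequence of columns.
    return [sep.join(col) for col in zip(*body)]


table = ["id,x,y\n", "r1,1,2\n", "r2,3,4\n"]
print(transpose_skipping(table, sep=",", skip_rows=1, skip_cols=1))
# prints ['1,3', '2,4']
```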

The main question I would have is:
how useful is this functionality _at the command line_?
I've needed it a couple of times, but for small data sets.
Could you provide some background on your use case to
help determine whether this is appropriate to maintain as a general tool?

