Re: tac feature suggestion
Tue, 03 Jun 2014 19:27:56 +0100
On 06/03/2014 06:13 PM, braultbaron wrote:
> I have a feature suggestion for tac.
> This would be an option:
> -c --bytes=N
> where N would mean "ignore the last N bytes of the file".
> In other words,
>> tac --bytes=<N> <file>
> would produce the same output as:
>> head --bytes=-<N> <file> | tac
> The fundamental difference is time and space efficiency.
Yes, it's often useful to add new options/functionality
to an existing tool to improve efficiency.
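For concreteness, the proposed semantics can be reproduced today with
head, at the cost of the extra copy the suggestion wants to avoid
(demo.txt is just a throwaway example file):

```shell
# Example file: 14 bytes, three lines.
printf 'one\ntwo\nthree\n' > demo.txt

# Current workaround: strip the last 6 bytes ("three\n"), then reverse.
# The proposed "tac --bytes=6 demo.txt" would print the same thing,
# but by seeking to size-6 rather than copying the prefix through a pipe.
head --bytes=-6 demo.txt | tac
# prints:
# two
# one

rm demo.txt
```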
> In the second case, a copy of the file is made
> in memory, which may be a problem if
> the file is huge (e.g. a few gigabytes).
So the issue is with non-seekable inputs in general.
> Instead, we could add this option
> to allow tac to fseek accordingly
> at the beginning of the process,
> thus using only constant (i.e. BUFSIZ) memory,
> and with no delay between tac invocation
> and the beginning of the production of the output.
> A slightly more complicated example:
> tac --offset=<k> <file> | head --bytes=1000
> versus:
> head --bytes=-<k> <file> | tac | head --bytes=1000
> Assume the file takes 3 gigabytes and <k> is 1 gigabyte.
> In the first case, about 1000 bytes are read, and the output
> is produced immediately.
> In the second case, 2G bytes are read and stored in memory,
> and then the output is produced.
Note tac first persists non-seekable input to a temp file,
and so has bounded memory usage, but yes it
will have the initial overhead of the file copy.
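That path is easy to exercise: a pipe is non-seekable, yet tac still
works with bounded memory because it spools stdin to a temp file first:

```shell
# A pipe is non-seekable, so tac copies stdin to a temporary file and
# then reverses that; memory stays bounded, but output only starts
# once the whole input has been spooled.
seq 3 | tac
# prints:
# 3
# 2
# 1
```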
> The modifications in tac.c look rather straightforward.
> If needed, I am willing to do it, though
> I am convinced it would take only a few minutes
> for an original author of the program.
A change of this size would need a copyright assignment.
> Maybe a justification for this request would be welcome.
> (if not, you can skip this)
> I am currently programming a vim-like text editor
> that operates on files that are too big to fit
> in memory, and, for that project, I make
> intensive use of head, tail, tac and grep.
> The basic idea is to rely on mature, simple
> programs as much as possible.
Your project sounds useful and interesting,
though the tac usage here seems a little unusual.
Since we're dealing in bytes and tac doesn't transform
the size of the data, perhaps we could use dd to skip
the required data in the file before presenting to tac?
So if you had a 3GB file and you wanted to process the last 1GB:
(dd bs=1 count=0 skip=2GB && tac) < file | head --bytes=1000
Though I see that tac doesn't support that currently,
as it seeks the whole way from the end of the file back to offset 0.
We could store the initial offset and seek only back to there,
without any interface changes.
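The fd-offset mechanism the dd trick relies on can be shown on a small
file; here cat stands in for tac, since cat always reads from the
current offset (whereas tac, as noted above, currently seeks back to 0
regardless):

```shell
# dd with count=0 copies nothing; it just advances the shared stdin
# offset by skip*bs bytes and exits successfully, so the next command
# in the subshell reads from that offset.
printf 'one\ntwo\nthree\n' > demo.txt
(dd bs=1 count=0 skip=8 status=none && cat) < demo.txt
# prints: three
rm demo.txt
```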