Re: Read a fixed length of input each time
From: Neil R. Ormos
Subject: Re: Read a fixed length of input each time
Date: Tue, 23 Jun 2020 12:48:29 -0500 (CDT)
Andrew J. Schorr wrote:
> Neil R. Ormos wrote:
>> RS="................"
>> Then, each getline will place chunk-size
>> characters in RT, provided there are enough
>> characters available to match RS. [...]
>> [*1] I acknowledge the warnings from the
>> developers and their suggestions that reading
>> binary data can best be done with an extension
>> or some pre-processing step. But those
>> solutions may not be available or uniform in
>> all environments where gawk is available. So,
>> even if this RS-based method is not as good, it
>> might allow the user to write a relatively
>> portable program intended for several
>> heterogeneous environments.
> I'll bite -- what's the benefit of reading
> binary data like this? What do you do with it
> once you get it inside gawk? It's easy enough to
> add read and write functions in an extension
> library, but I've never understood the usage
> case. [...]
I've used this for a few different things. I don't suggest these use cases
justify any changes to gawk or extensions.
1. Detecting file type. Yes, I understand
there's the file(1) utility, but file(1)'s
behavior is not consistent across all platforms
and versions, and at least in the past, there
was a significant delay between the time a
particular new file type was seen in the wild
and the availability of a file(1) (or magic
entry) that could detect it. For portably
detecting among a tiny universe of possible
file types, it can be easier in gawk to
directly inspect the file's content than to
deal with the output of file(1), especially
when the user does not control when file(1) is
updated.
2. Extracting version information from Android APK
files on systems where Android Asset Packaging
Tool is not available.
3. Detecting groups of files having common
initial chunks of N bytes. There are a few
different applications for this. One is
identifying probable near-duplicate
media files--e.g., video or audio files that
have the same substantive content and differ
only in metadata placed near the end of the
file. Although there may be "better" ways to
do it using the shell or common utilities, the
function of those utilities can vary by
platform, and if you will need to process some
of the content of the file, orchestrating a
shell pipeline may not be more convenient or
efficient.