bug-gawk


From: Neil R. Ormos
Subject: Re: Read a fixed length of input each time
Date: Tue, 23 Jun 2020 12:48:29 -0500 (CDT)

Andrew J. Schorr wrote:
> Neil R. Ormos wrote:

>> RS="................"

>> Then, each getline will place chunk-size
>> characters in RT, provided there are enough
>> characters available to match RS. [...]

>> [*1] I acknowledge the warnings from the
>> developers and their suggestions that reading
>> binary data can best be done with an extension
>> or some pre-processing step.  But those
>> solutions may not be available or uniform in
>> all environments where gawk is available.  So,
>> even if this RS-based method is not as good, it
>> might allow the user to write a relatively
>> portable program intended for several
>> heterogeneous environments.

> I'll bite -- what's the benefit of reading
> binary data like this? What do you do with it
> once you get it inside gawk? It's easy enough to
> add read and write functions in an extension
> library, but I've never understood the usage
> case. [...]

I've used this for a few different things.  I'm not suggesting that these use
cases justify any changes to gawk or extensions.

1.  Detecting file type.  Yes, I understand
    there's the file(1) utility, but file(1)'s
    behavior is not consistent across all platforms
    and versions, and, at least in the past, there
    was a significant delay between the time a
    particular new file type was seen in the wild
    and the availability of a file(1) (or magic
    entry) that could detect it.  When portably
    distinguishing among a tiny universe of
    possible file types, it can be easier in gawk
    to inspect the file's content directly than to
    deal with the output of file(1), especially
    when the user does not control when file(1) is
    updated.

2.  Extracting version information from Android APK
    files on systems where the Android Asset
    Packaging Tool (aapt) is not available.

3.  Detecting groups of files that share the same
    initial chunk of N bytes.  There are a few
    different applications for this.  One is
    identifying media files that are probably
    duplicates in substance, e.g., video or audio
    files that have the same content and differ
    only in metadata placed near the end of the
    file.  Although there may be "better" ways to
    do this with the shell or common utilities,
    the behavior of those utilities can vary by
    platform, and if you also need to process some
    of the file's content, orchestrating a shell
    pipeline may be no more convenient or
    efficient.
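For item 3 in the list above, grouping files by their first N bytes can be sketched the same way: read one N-byte chunk per file and use it, raw, as an array subscript, assuming (as gawk documents for strings generally) that subscripts may hold arbitrary bytes. The function name samehead and its output format (one tab-separated group per line) are hypothetical choices for this example. Only prefixes are compared, so a group means "probably the same content", matching the use described; confirming a true duplicate would still need a full comparison, e.g., with cmp(1).

```sh
# samehead N FILE...   (hypothetical helper name)
# Print, one group per line (tab-separated), files whose first N bytes match.
# Files shorter than N bytes are compared in full.
samehead() {
    n=$1; shift
    LC_ALL=C gawk -v n="$n" '
    BEGIN {
        if (n < 1) { print "samehead: N must be >= 1" > "/dev/stderr"; exit 1 }
        for (i = 0; i < n; i++)             # n dots: any n characters
            rs = rs "."
        RS = "(" rs ")"                     # parens stop a 1-char RS from being taken literally
        for (i = 1; i < ARGC; i++) {
            f = ARGV[i]
            r = (getline rec < f)           # read only the first chunk
            key = (r > 0) ? ((RT != "") ? RT : rec) : ""
            close(f)
            if (r < 0) continue             # unreadable: skip it
            count[key]++
            group[key] = group[key] (count[key] > 1 ? "\t" : "") f
        }
        for (key in count)                  # group order is unspecified
            if (count[key] > 1)
                print group[key]
        exit
    }' "$@"
}
```

A call such as samehead 4096 *.mp4 would list candidate duplicate groups. Because RS grows by one dot per byte, very large values of N produce a correspondingly long regular expression; this sketch has modest N in mind and does not measure that cost.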

