Re: line buffering in pipes

From: Assaf Gordon
Subject: Re: line buffering in pipes
Date: Thu, 2 May 2019 13:40:47 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

Follow-up for completeness:

On 2019-05-02 1:14 p.m., Assaf Gordon wrote:
> If you do need to worry about special characters in filenames
> (or file names with ':'), see file's "-print0" option in addition
> to "find -print0 | xargs -0".

This might not be trivial, so here's one way to use "file" to
detect and act on file types, even for files whose names contain
special characters (including ":" and newlines):

  find [DIRECTORY] -type f -print0 \
       | xargs -0r \
            file --raw --no-buffer --no-pad \
                 --mime-type --print0 --print0 \
       | sed -zn 'h;n;/application\/x-archive/{x;p}' \
       | xargs -0 -n1 echo == processing file:

Several things here:

1. Using "find -print0 | xargs -0" - that's well known.

2. Using "file ... --raw --print0 --print0" ("--print0" must be given TWICE).
This tells "file" not to encode special characters as octal,
and to print a NUL after the file name
and a second NUL after the mime-type, with no other field separator
(e.g. no ":" is printed).

For example:

 $ touch $'hello\nworld .txt'
 $ file --mime-type --print0 --print0 h* | od -taz
0000000   h   e   l   l   o  nl   w   o   r   l   d  sp   .   t   x   t  >hello.world .txt<
0000020 nul   i   n   o   d   e   /   x   -   e   m   p   t   y nul  >.inode/x-empty.<

The output is now "filename<NUL>mime-type<NUL>".

3. Using "sed -z" (must be GNU sed) - use NUL as line-terminator
instead of new-line.
Based on file's output (above), every odd line is a file name
and every even line is a mime type.
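To make the odd/even record layout concrete, here is a small sketch that
pairs the records up two at a time. The input is simulated with printf
instead of running "file", and the function name "show_pairs" is only an
illustration, not something from the original pipeline:

```shell
# Hypothetical helper: consume NUL-terminated records two at a time
# (file name, then mime type) and print each pair on one line.
show_pairs() {
    xargs -0 -n2 printf '%s => %s\n'
}

# Simulated "file --mime-type --print0 --print0" output:
# name NUL mime NUL name NUL mime NUL
printf '%s\0' \
    'a:b c.txt' 'text/plain' \
    'lib.a'     'application/x-archive' \
  | show_pairs
```

Note that the ":" and the space in the first file name cause no trouble,
because only NUL separates the records.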

4. The sed program:
'h' copies the current line (an odd line, the file name) into the hold buffer.
'n' reads the next line (an even line, the mime type).
'/application\/x-archive/' is a regex match to check if the mime-type
matches. If it does match, 'x' in '{x;p}' swaps the hold buffer
(the file name) back in, and 'p' prints it.

The result of the sed program is a NUL-terminated list of file names
whose mime-type matched the regular expression.
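The sed step can be tried on its own, without "file" or any real files,
by feeding it hand-made name/mime pairs. This is only a sketch: the
wrapper name "filter_by_mime" and the sample file names are made up for
the example, and GNU sed is assumed for "-z":

```shell
# Hypothetical helper: read "name NUL mime NUL" pairs on stdin and print
# (NUL-terminated) the names whose mime type matches the sed regex in $1.
# Needs GNU sed for -z.
filter_by_mime() {
    sed -zn "h;n;/$1/{x;p}"
}

# Simulated output of "file --mime-type --print0 --print0".
printf '%s\0' \
    'a.a'     'application/x-archive' \
    'b.txt'   'text/plain' \
    'lib c.a' 'application/x-archive' \
  | filter_by_mime 'application\/x-archive' \
  | tr '\0' '\n'    # NUL -> newline, for display only
```

This should print "a.a" and "lib c.a", one per line. The final "tr" is
there only to make the result readable on a terminal; in a real pipeline
the NULs should be kept.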

5. Since it is a NUL-terminated list of file names,
we can feed it to "xargs -0" again and execute anything we want
on these files, safely.
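As a rough illustration of that last step, the sketch below runs a
command once per NUL-terminated name and brackets the argument, so it is
visible that a name containing a newline and a space still arrives as a
single argument. The function name "process_each" and the sample names
are assumptions made for the example:

```shell
# Hypothetical helper: run one command per NUL-terminated file name.
# The name is passed as "$1" to a small sh script, never re-split.
process_each() {
    xargs -0 -n1 sh -c 'printf "== processing file: [%s]\n" "$1"' _
}

# Two simulated names: one with an embedded newline and a space, one plain.
printf 'hello\nworld .txt\0plain.a\0' | process_each
```

Replacing the printf inside the sh script with a real command (still
using "$1", quoted) keeps the same safety.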

Hope this helps,
 - assaf
