coreutils

Re: line buffering in pipes


From: Assaf Gordon
Subject: Re: line buffering in pipes
Date: Thu, 2 May 2019 14:56:38 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

Hello,

On 2019-05-02 1:51 p.m., Assaf Gordon wrote:
> On 2019-05-02 1:22 p.m., Egmont Koblinger wrote:
>> On Thu, May 2, 2019 at 9:14 PM Assaf Gordon <address@hidden> wrote:
>>> [...]
>>
>> I don't think this is robust enough. If many "stdbuf -oL file"
>> processes decide to produce a reasonably sized output pretty much at
>> the same time, it might still suddenly clog the pipe and result in a
>> short write in one of them. Or am I missing something?
>
>
> That's exactly why I wrote "assuming the lines are short enough".
>
> [...]
>
> I will have to dig further for exact details and justification.

More technical details:

From POSIX:

https://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_05.html#tag_02_05 :

  2.5 Standard I/O Streams
  [...] When a stream is ``line buffered'', bytes are intended to be
  transmitted as a block when a newline byte is encountered.
  [...]
  Furthermore, bytes are intended to be transmitted as a block when a
  buffer is filled,
  [...]
  Support for these characteristics is implementation-defined,


http://pubs.opengroup.org/onlinepubs/9699919799/functions/setvbuf.html :

    setvbuf - assign buffering to a stream
    [...]
    Applications should note that many implementations only provide line
    buffering on input from terminal devices.


This doesn't guarantee anything, but it helps in understanding what
could be expected:

For implementations that support this behavior, line-buffered stream
output (using fputc(3), fprintf(3), fwrite(3), etc.) should accumulate
until a "\n" character is encountered (or the buffer fills up), and then
the entire buffer should be written "as one" to the kernel using
something like write(2).

It leaves some unanswered questions:
1. which implementations support it?
2. what is the size of the libc internal line-buffer?
3. does the kernel write(2) guarantee anything?

1.
To the best of my understanding, glibc and musl-libc both
implement line-buffering for all streams, not just terminals.
Same for Net/Open/Free-BSDs.

2.
In Glibc, BUFSIZ = 8192.
In Musl,  BUFSIZ = 1024.
For BSDs, BUFSIZ = 1024.
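BUFSIZ is only the constant advertised by <stdio.h>; the buffer a libc
actually allocates for a given stream is not required to have that
size. To check on a particular system, something like the following may
help. It assumes the non-standard <stdio_ext.h> functions __fbufsize()
and __flbf(), which glibc and musl provide (the BSDs may not);
query_stream_buffer is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <stdio_ext.h>  /* non-standard; provided by glibc and musl */

/* Hypothetical helper: report the compile-time BUFSIZ, the size of the
   buffer a line-buffered stream actually ended up with, and whether the
   stream reports itself as line buffered. Returns 0 on success. */
int query_stream_buffer(size_t *advertised, size_t *actual, int *line_buffered)
{
    assert(advertised != NULL && actual != NULL && line_buffered != NULL);
    FILE *f = fopen("/dev/null", "w");
    if (f == NULL)
        return -1;
    /* buf == NULL: the library allocates the buffer itself. */
    if (setvbuf(f, NULL, _IOLBF, BUFSIZ) != 0) {
        fclose(f);
        return -1;
    }
    fputc('x', f);      /* no newline: ensures a (possibly lazily allocated)
                           buffer exists, nothing is flushed yet */
    *advertised = BUFSIZ;
    *actual = __fbufsize(f);
    *line_buffered = __flbf(f) != 0;
    fclose(f);
    return 0;
}
```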

3.
POSIX requires that a write(2) to a pipe or FIFO be atomic
if the amount of data written is at most PIPE_BUF bytes:
https://pubs.opengroup.org/onlinepubs/009695399/functions/write.html

   An attempt to write to a pipe or FIFO has several major
   characteristics:

       Atomic/non-atomic: A write is atomic if the whole amount written
       in one operation is not interleaved with data from any other
       process. This is useful when there are multiple writers sending
       data to a single reader. Applications need to know how large a
       write request can be expected to be performed atomically. This
       maximum is called {PIPE_BUF}. This volume of IEEE Std 1003.1-2001
       does not say whether write requests for more than {PIPE_BUF}
       bytes are atomic, but requires that writes of {PIPE_BUF} or fewer
       bytes shall be atomic.

POSIX defines PIPE_BUF's minimum value as _POSIX_PIPE_BUF=512.

For Linux, the default is PIPE_BUF=4096.
For BSDs: PIPE_BUF=512.

----

Given all the above, is the following "robust enough"?

  find [DIRECTORY] | xargs -P99 stdbuf -oL [PROG1] | [PROG2]

I think the answer is "it's complicated :)"

On standard Linux+Glibc (the ubiquitous GNU/Linux),
I think line-buffered output with lines of up to 4096 bytes is perfectly safe
(i.e. there should not be any line-mangling or interleaving from
multiple processes as long as [PROG1] does not produce lines longer
than 4096 bytes, counting the trailing newline).

For Linux+musl - 1024 bytes is the maximum.
For BSDs - 512 bytes.

So my "stdbuf" solution would *not* be robust on all
systems. For GNU/Linux it should work fine.

In general, lines longer than MIN(PIPE_BUF, BUFSIZ) bytes lose the
atomicity guarantee, and output may be interleaved/mangled.
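The rule of thumb above, written out as code (a sketch; both helper
names are hypothetical, and the two limits are passed in rather than
queried so the arithmetic is easy to check):

```c
#include <assert.h>
#include <stddef.h>

/* Largest line (including its trailing '\n') that is expected to reach
   the pipe in one atomic write(2): it must fit in the stdio buffer AND
   be no larger than PIPE_BUF. */
size_t max_atomic_line(size_t pipe_buf, size_t stdio_buf)
{
    assert(pipe_buf > 0 && stdio_buf > 0);
    return pipe_buf < stdio_buf ? pipe_buf : stdio_buf;
}

/* Nonzero if a line of len bytes (including '\n') is within that limit. */
int line_fits(size_t len, size_t pipe_buf, size_t stdio_buf)
{
    return len <= max_atomic_line(pipe_buf, stdio_buf);
}
```

With the figures above: max_atomic_line(4096, 8192) is 4096 for
Linux+glibc, max_atomic_line(4096, 1024) is 1024 for Linux+musl, and
max_atomic_line(512, 1024) is 512 for the BSDs.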

---

Of course, there are more caveats:
'stdbuf' only affects programs which use libc's stream I/O
(e.g. fputc/fputs/fprintf). Programs which bypass stdio
and call read(2)/write(2) directly are not affected by it.

Also, some programs set their own buffering (e.g. tee(1)).
Those will also override 'stdbuf' settings.
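For context, stdbuf does its work by preloading a small helper library
(via LD_PRELOAD) that calls setvbuf() on the standard streams before the
program's main() runs. That explains both caveats: programs that never
use stdio see no effect, and a later setvbuf() call by the program
itself replaces the setting. The sketch below imitates only the
mode-to-setvbuf() step; the helper names are hypothetical, size suffixes
such as K or M are not handled, and this is not coreutils' actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: translate a stdbuf-style MODE string into setvbuf()
   arguments. "L" -> line buffered, "0" -> unbuffered, "<N>" -> fully
   buffered with a requested size of N bytes.
   Returns 0 on success, -1 for an invalid mode. */
int parse_buffer_mode(const char *mode, int *type, size_t *size)
{
    assert(mode != NULL && type != NULL && size != NULL);
    if (mode[0] == 'L' && mode[1] == '\0') {
        *type = _IOLBF;
        *size = 0;
        return 0;
    }
    if (mode[0] < '0' || mode[0] > '9')
        return -1;                      /* rejects "", "-1", " 1", ... */
    errno = 0;
    char *end;
    unsigned long n = strtoul(mode, &end, 10);
    if (*end != '\0' || errno != 0)
        return -1;
    *type = (n == 0) ? _IONBF : _IOFBF;
    *size = (size_t)n;
    return 0;
}

/* Apply MODE to a stream. Like stdbuf, this must run before any I/O on
   the stream, and a later setvbuf() by the program overrides it. With
   buf == NULL an implementation may ignore the requested size, so only
   the buffering mode is reliable here. Returns 0 on success. */
int apply_buffer_mode(FILE *stream, const char *mode)
{
    int type;
    size_t size;
    if (parse_buffer_mode(mode, &type, &size) != 0)
        return -1;
    return setvbuf(stream, NULL, type, size);
}
```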

---

Hope this helps.
I welcome corrections or clarifications if I got something wrong.

regards,
 - assaf


