
Re: fifo unlimited buffer size?


From: Pádraig Brady
Subject: Re: fifo unlimited buffer size?
Date: Tue, 04 Dec 2012 16:16:55 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1

On 12/04/2012 03:46 PM, Peng Yu wrote:
> On Tue, Dec 4, 2012 at 6:24 AM, Pádraig Brady <address@hidden> wrote:
>> tag 13075 + notabug
>> close 13075
>> thanks
>>
>> On 12/04/2012 03:19 AM, Peng Yu wrote:
>>>
>>> Hi,
>>>
>>> I have the following script. When the number to the right of 'seq' is
>>> large (100000 in the example), the script hangs, but when the
>>> number is small (say 1000), the script finishes correctly. I
>>> suspect the problem is that there is a limit on the buffer size
>>> of a fifo. Is that so? Is there a way to make the following script work
>>> no matter how large the number is? Thanks!
>>>
>>> ~/linux/test/gnu/gnu/coreutils/mkfifo/tee$ cat main2.sh
>>> #!/usr/bin/env bash
>>>
>>> rm -rf a b c
>>> mkfifo a b c
>>> seq 100000 | tee a > b &
>>> sort -k 1,1n a > c &
>>> join -j 1 <(awk 'BEGIN{OFS="\t"; FS="\t"} {print $1, $1+10}' < c) \
>>>     <(awk 'BEGIN{OFS="\t"; FS="\t"}{print $1, $1+20}' < b)


>> So this is problematic due to `sort`.
>> That's special as it needs to consume all its input before
>> producing any output. Therefore unless the buffers connecting
>> the other commands in || can consume the data, there will be a deadlock.
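A minimal sketch of this buffer effect (assuming GNU coreutils on Linux, and using a hypothetical `joined` output file not in the thread): the reported pipeline completes when the whole stream fits in the kernel pipe buffers, e.g. with N=1000 (roughly 4 KB), even though sort still holds everything back until end of input:

```shell
#!/usr/bin/env bash
# Sketch: same pipeline as in the report, but with N=1000 (~3.9 KB) the
# whole stream fits in the kernel pipe buffers while sort withholds its
# output, so no deadlock occurs.
set -e
tmp=$(mktemp -d) && cd "$tmp"
mkfifo a b c
seq 1000 | tee a > b &
sort -k 1,1n a > c &
join -j 1 <(awk 'BEGIN{OFS="\t"; FS="\t"} {print $1, $1+10}' < c) \
          <(awk 'BEGIN{OFS="\t"; FS="\t"}{print $1, $1+20}' < b) > joined
wait
# every key 1..1000 pairs up exactly once
wc -l < joined
```

This matches the report above: the hang with N=100000 is a capacity effect, not a logic error in the script.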

> I can't parse "Therefore unless the buffers connecting the other
> commands in || can consume the data... " What is '||'?

Sorry, parallel.

>> This version doesn't block, for example, as
>> the input is being generated asynchronously for the sort command.
>>
>> #!/usr/bin/env bash
>> rm -rf a b c
>> mkfifo a b c
>> join -j 1 <(awk 'BEGIN{OFS="\t"; FS="\t"} {print $1, $1+10}' < c) \
>>     <(awk 'BEGIN{OFS="\t"; FS="\t"}{print $1, $1+20}' < b) &
>> seq 100000 | sort -k 1,1n > c &
>> seq 100000 > b
>> wait
>>
>> Obviously, if your input is expensive to generate,
>> then you'd be best copying it to another file
>> and sorting that.
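One way to realize that suggestion (a sketch, not part of the original thread; `input` and `joined` are hypothetical file names): keep the asynchronous layout above, but run the generator exactly once into a regular file and have both branches read the copy:

```shell
#!/usr/bin/env bash
# Sketch of the "copy to another file" route: the (possibly expensive)
# generator runs exactly once; both consumers then read the regular-file
# copy, keeping the asynchronous layout from the script above.
set -e
tmp=$(mktemp -d) && cd "$tmp"
mkfifo b c
seq 100000 > input            # generator invoked once
join -j 1 <(awk 'BEGIN{OFS="\t"; FS="\t"} {print $1, $1+10}' < c) \
          <(awk 'BEGIN{OFS="\t"; FS="\t"}{print $1, $1+20}' < b) > joined &
sort -k 1,1n input > c &
cat input > b
wait
wc -l < joined
```

This satisfies the "run seq once" requirement, at the cost of an intermediate file, which is exactly the trade-off discussed below.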

> I should send the message to the regular mailing list. I have two
> implicit requirements:
>
> 1. The input 'seq 100000' cannot be run twice; it has to be called once.

If the input is expensive to generate,
then it would need to be copied.

> 2. There cannot be intermediate files generated.

Ah :(

> Given the above requirements, is there no solution?

Not one I can think of.

> The general question is: one input stream fans out into
> multiple streams, which each undergo some processing; these
> processed streams then converge into one program, which produces one
> output stream. This seems to be a general usage pattern, and it can be
> nested arbitrarily, in which case the above two requirements
> still apply. Does this make sense?

I understand the structure, but the concurrent pipelines
need separate data sources (a process each, or a file copy); otherwise
deadlock may happen as data overflows the various buffers.
I suppose this could be encapsulated in tee(1) with non-blocking
writes and internal buffering, but that would just end up
being a data copy anyway, so I'm not sure it's warranted.
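For concreteness, a sketch (assuming Linux, where a pipe or FIFO holds 64 KiB by default; not from the original thread) showing the finite buffer that overflows: a FIFO writer stalls as soon as that capacity fills with no one reading:

```shell
#!/usr/bin/env bash
# Sketch (assumes Linux default pipe capacity of 64 KiB): a writer whose
# reader never reads stalls once the FIFO's kernel buffer fills.
tmp=$(mktemp -d) && cd "$tmp"
mkfifo f
sleep 3 < f &                 # hold the read end open, but never read
seq 100000 > f &              # ~570 KB of data; stalls after ~64 KiB
wpid=$!
sleep 1
blocked=no
kill -0 "$wpid" 2>/dev/null && blocked=yes   # still running => blocked write
cat f > /dev/null             # a real reader drains the FIFO...
wait "$wpid"                  # ...letting the writer finish
echo "writer was blocked: $blocked"
```

When the whole stream fits inside that 64 KiB (plus the buffers of the other pipes along the stalled branch), the original script never hits this stall, which matches the small-N behavior reported above.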

thanks,
Pádraig.


