coreutils

Re: fifo unlimited buffer size?


From: Peng Yu
Subject: Re: fifo unlimited buffer size?
Date: Tue, 4 Dec 2012 09:46:17 -0600

On Tue, Dec 4, 2012 at 6:24 AM, Pádraig Brady <address@hidden> wrote:
> tag 13075 + notabug
> close 13075
> thanks
>
> On 12/04/2012 03:19 AM, Peng Yu wrote:
>>
>> Hi,
>>
>> I have the following script. When the number to the right of 'seq' is
>> large (as 100000 in the example), the script will hang. But when the
>> number is small (say 1000), the script can be finished correctly. I
>> suspect that the problem is that there is a limit on the buffer size
>> for fifo. Is it so? Is there a way to make the following script work
>> no matter how large the number is? Thanks!
>>
>> ~/linux/test/gnu/gnu/coreutils/mkfifo/tee$ cat main2.sh
>> #!/usr/bin/env bash
>>
>> rm -rf a b c
>> mkfifo a b c
>> seq 100000 | tee a > b &
>> sort -k 1,1n a > c &
>> join -j 1 <(awk 'BEGIN{OFS="\t"; FS="\t"} {print $1, $1+10}' < c) \
>> <(awk 'BEGIN{OFS="\t"; FS="\t"}{print $1, $1+20}' < b)
>
>
> So this is problematic due to `sort`.
> That's special as it needs to consume all its input before
> producing any output. Therefore unless the buffers connecting
> the other commands in || can consume the data, there will be a deadlock.

I can't parse "Therefore unless the buffers connecting the other
commands in || can consume the data... " What is '||'?

>
> This version doesn't block for example as
> the input is being generated asynchronously for the sort command.
>
> #!/usr/bin/env bash
> rm -rf a b c
> mkfifo a b c
> join -j 1 <(awk 'BEGIN{OFS="\t"; FS="\t"} {print $1, $1+10}' < c) \
> <(awk 'BEGIN{OFS="\t"; FS="\t"}{print $1, $1+20}' < b) &
> seq 100000 | sort -k 1,1n > c &
> seq 100000 > b
> wait
>
> Obviously, if your input is expensive to generate,
> then you'd be best copying to another file
> and sorting that.
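Pádraig's last suggestion (save the expensive input once, then sort the copy) might look like the following. This is a sketch that assumes bash, GNU coreutils, and that an intermediate file is acceptable. `seq` runs exactly once into a file inside a private `mktemp -d` directory, and each branch then reads that file independently, so no reader can stall a writer. The function name `tmp_join` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: generate the input once into a temporary file, then let each
# branch read its own copy. Assumes bash (process substitution) and
# GNU coreutils. tmp_join is a hypothetical name used for illustration.
tmp_join() (
  dir=$(mktemp -d) || exit 1
  trap 'rm -rf "$dir"' EXIT          # remove the temp dir on exit

  seq "$1" > "$dir/input"            # the expensive step runs only once

  # Each branch reads the saved file at its own pace, so sort can
  # consume all of its input before join needs anything from it.
  join -j 1 \
    <(sort -k 1,1n "$dir/input" | awk 'BEGIN{OFS="\t"; FS="\t"} {print $1, $1+10}') \
    <(awk 'BEGIN{OFS="\t"; FS="\t"} {print $1, $1+20}' "$dir/input")
)

tmp_join 100000 | tail -n 3
```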

I should have sent this message to the regular mailing list. I also
had two requirements that I did not state:

1. The input command 'seq 100000' cannot be run twice; it must be run
   only once.
2. No intermediate files may be generated.

Given these requirements, is there really no solution?

The general question is this: one input stream fans out to multiple
streams, each of which undergoes some processing. The processed
streams then converge on one program, which produces a single output
stream. This seems to be a common usage pattern. The pattern can also
be nested arbitrarily, in which case the two requirements above matter
even more. Does this make sense?
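A sketch of one way to meet both requirements, assuming bash, GNU coreutils, and a stream small enough to hold in memory. In the original script, `join` blocks waiting for the first line from `sort`, so nothing drains the `b` branch; once that pipe fills, `tee` blocks writing to `b`, never finishes writing `a`, and `sort` never sees end of file. Here a hypothetical helper `membuf` (an `awk` one-liner that stores every line and prints them only at end of input) keeps draining fifo `b`, so `tee` never blocks while `sort` is still reading. `seq` runs once, and the only filesystem objects are the two fifos, created in a private `mktemp -d` directory. `fanout_join` is also a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes bash, GNU coreutils, and that the whole stream
# fits in memory, because membuf holds every line until its input ends.

# Hypothetical helper: read all of stdin, then print it unchanged.
# It never writes before EOF, so it keeps emptying its fifo and tee is
# never blocked on this branch while sort is still consuming the other.
membuf() {
  awk '{ line[NR] = $0 } END { for (i = 1; i <= NR; i++) print line[i] }'
}

# Hypothetical wrapper: run seq exactly once and fan its output out to
# two branches through fifos; no data file is written.
fanout_join() (
  dir=$(mktemp -d) || exit 1
  trap 'rm -rf "$dir"' EXIT          # remove the fifos on exit
  mkfifo "$dir/a" "$dir/b" || exit 1

  seq "$1" | tee "$dir/a" > "$dir/b" &
  join -j 1 \
    <(sort -k 1,1n < "$dir/a" | awk 'BEGIN{OFS="\t"; FS="\t"} {print $1, $1+10}') \
    <(membuf < "$dir/b" | awk 'BEGIN{OFS="\t"; FS="\t"} {print $1, $1+20}')
  wait                               # reap the background seq | tee
)

# The last line printed should be: 100000 100010 100020
fanout_join 100000 | tail -n 3
```

The price is memory: `membuf` holds the entire stream before emitting anything, in the same way that `sort` must read all of its input before producing output.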

-- 
Regards,
Peng
