coreutils



From: Thibault LE PAUL
Subject: Re: [head] wished an option to continue consuming the input after the specified number of lines has been read
Date: Tue, 16 Oct 2012 20:58:05 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.7) Gecko/20120922 Icedove/10.0.7

On 16/10/2012 19:25, Bob Proulx wrote:
Thibault LE PAUL wrote:
I wasn't clear enough.
My goal was to do different things on the first lines and the last
lines of the same input, without using temporary storage, hence using
piped processes.
Depending upon what you want to do, I would do something like this,
using sed to treat each part differently.

   $ seq 1 10 | sed '1,3s/^/head /;7,10s/^/tail /'
   head 1
   head 2
   head 3
   4
   5
   6
   tail 7
   tail 8
   tail 9
   tail 10

Or without printing the skipped lines:

   $ seq 1 10 | sed -n '1,3s/^/head /p;7,9s/^/tail /p'
   head 1
   head 2
   head 3
   tail 7
   tail 8
   tail 9

Or awk:

   $ seq 1 10 | awk 'NR<=3{print "head ",$0} NR>7{print "tail ",$0}'
   head  1
   head  2
   head  3
   tail  8
   tail  9
   tail  10
That is difficult without knowing the line numbers a priori, if you want the equivalent of tail -n2. I suppose tail uses something like a rotating line buffer.
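For the tail -n2 part, a rotating buffer does work without knowing the line count in advance. A minimal sketch in awk, driven from the shell; `headtail` is a hypothetical helper name (not a coreutils command), it assumes M is at least 1, and it only illustrates the idea rather than claiming to be how GNU tail is implemented.

```shell
# headtail N M (hypothetical helper): print the first N lines prefixed with
# "head " and the last M lines prefixed with "tail ", in a single pass over
# stdin. Only the last M lines are kept in memory (a rotating buffer).
# Assumes M >= 1.
headtail() {
  awk -v n="$1" -v m="$2" '
    NR <= n { print "head " $0 }
    { ring[NR % m] = $0 }          # overwrite the oldest slot
    END {
      start = (NR > m) ? NR - m + 1 : 1
      for (i = start; i <= NR; i++)
        print "tail " ring[i % m]
    }'
}

seq 1 10 | headtail 3 2
```

This prints head 1, head 2, head 3, tail 9 and tail 10; memory use stays bounded by M lines however long the input is.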
I assume that any amount of input to the tee can be used?  Can I
simply 'echo foo > /tmp/fifo1' and trigger your test case?  Please
say what input must be used.  If you don't say then we won't know.

What input are you providing to tee?  For use in this test case I
assume a few lines of input, more than 2+2=4 lines.  I will use the
command 'seq 1 7' to generate easily repeatable input.  I also like
spaces between debug strings, so I will add some spaces.

   seq 1 7 | tee /tmp/fifo1|tail -n2|sed 's/^/tail&/'

That way, the /tmp/fifo1 fifo propagates SIGPIPE ahead to tee as
soon as head has finished,
What?  There is a misunderstanding at this point.  This statement does
not make sense.

SIGPIPE occurs when a process writes to a closed pipe.  It is sent
from the kernel to the writing process.  The default action of SIGPIPE
is to terminate the process.  See 'man 7 signal' for more details.

The 'tee' process is writing to the pipe.  When the last reader on the
pipe closes it (usually by exiting), any later write to the pipe sends
SIGPIPE to the writer, which terminates it.  This is normal behavior.
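This can be observed directly. In the sketch below, `head -n1` exits after the first line and seq's next write to the pipe fails. The temporary status file is only there to capture the writer's exit status, since `$?` after a plain pipeline reports the last command. With the default SIGPIPE action the recorded status is 141, i.e. 128 + 13, SIGPIPE being signal 13 on Linux; if the environment that started the shell ignores SIGPIPE, seq instead gets an EPIPE write error and exits with status 1.

```shell
# head exits after one line; seq's next write to the pipe then fails.
status_file=$(mktemp)
{ seq 1 1000000; echo "$?" > "$status_file"; } | head -n1
writer_status=$(cat "$status_file")
rm -f "$status_file"
echo "seq exit status: $writer_status"
```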

So, yes, tee is not able to read all of the input and write all of the
output to all of the output pipes from it.  But that is expected given
that one of the readers has exited.

cat /tmp/fifo1|head -n2|sed 's/^/head&/'&
That extra 'cat' process is going to confuse things.  It will buffer
input and write buffered output.  This will reblock the data in
confusing ways.  I prefer to remove it.  It isn't needed.
However, (head -n2;cat >/dev/null) will read until EOF; that was the point of my XY problem :)
  head -n2 < /tmp/fifo1 | sed 's/^/head &/' &

But perhaps this is simply a smaller example from the larger problem
and the cat represents some other process?
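A sketch of the (head -n2;cat >/dev/null) variant, again recording the writer's exit status in a temporary file: because the reader keeps consuming until EOF, seq never writes to a pipe without a reader and exits with status 0. The cost is that the whole stream is read only to be discarded.

```shell
# The reader prints two lines, then keeps reading (and discarding) until
# EOF, so the writer never writes to a pipe that has no reader.
status_file=$(mktemp)
{ seq 1 1000000; echo "$?" > "$status_file"; } | (head -n2; cat >/dev/null)
writer_status=$(cat "$status_file")
rm -f "$status_file"
echo "seq exit status: $writer_status"
```

This prints 1, 2 and then "seq exit status: 0".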

rm /tmp/fifo1
mkfifo /tmp/fifo1
cat /tmp/fifo1|head -n2|sed 's/^/head&/'&
tee /tmp/fifo1|tail -n2|sed 's/^/tail&/'
It is easier to debug this by avoiding the backgrounding and running
this test in three terminal windows.  In the first, read from the pipe
with the "head" section.  In the second, run the tee section.  In the
third, send input to the fifo.  Doing so will make it more visible when
the processes are running and when they are exiting.  It will show
that the head command pipeline reads two lines and emits them, followed
by the tail task emitting its two lines.  But running them as you have
shown produces this output:

   $ head -n2 < fifo | sed 's/^/head &/' &
   [1] 14641
   $ seq 1 9 | tee fifo | tail -n2 | sed 's/^/tail &/'
   head 1
   head 2
   tail 8
   tail 9
   [1]+  Done                    head -n2 < fifo | sed 's/^/head &/'
   $

then tee stops, and tail doesn't read the expected last lines,
instead just the lines before tee aborts and EOF is read on
pipe. The effect is observable on large input, like
/usr/share/mysql/errmsg-utf8.txt
Yes.  I did this:

   $ head -n2 < fifo | sed 's/^/head &/' &
   [1] 14641
   $ tee fifo < /usr/share/mysql/errmsg-utf8.txt | tail -n2 | sed 's/^/tail &/'

And I could see that tee exited because the fifo side finished before
the write to stdout did, and so the tail did not get all of the file.
I consider that normal behavior.  Yes, reading all of the input and
discarding it in the head process will allow tee to write all of the
output.  But that is a lot of extra data written only to be thrown
away, and therefore I would avoid doing it that way.
I agree. However, using temporary file storage is worse: you write the data to disk instead of to a pipe, and read it back from disk instead of from a pipe. Even if you write twice to a pipe instead of once to disk, I think that is better. Beyond performance, you may not want to rely on storage that you are not sure is available, since you don't know how big your input is, whereas CPU is always available.
Also since the background process is asynchronous the order of emitted
output isn't specified.  It is possible that the background process
would be scheduled later (kernel process scheduling) and then the
output of the two processes might appear in a different order.  It is
tickling a lot of possible problems.  Best to avoid those entirely.
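Either:
1) add a wait:

cat /tmp/fifo1|(head -n2;cat>/dev/null)|sed 's/^/head&/'&
# wait must run in this shell: a subshell cannot wait for a job it did not start
tail_out=$(cat /usr/share/mysql/errmsg-utf8.txt|tee /tmp/fifo1|tail -n2)
wait
printf '%s\n' "$tail_out"|sed 's/^/tail&/'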

2) be independent of scheduling:

cat /tmp/fifo1|(head -n2;cat>/dev/null) > /tmp/head &
cat /usr/share/mysql/errmsg-utf8.txt|tee /tmp/fifo1|tail -n2 > /tmp/tail
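A self-contained sketch combining the draining reader with an explicit wait in the parent shell. It creates its own fifo in a temporary directory (this assumes `mktemp -d` is available, as it is with GNU coreutils and the BSDs) and prints the saved head lines before the tail lines only after the background reader has finished, so the output order does not depend on scheduling. `head_and_tail` is a hypothetical name.

```shell
# head_and_tail N M (hypothetical helper): split stdin with tee and a fifo,
# keep the first N and the last M lines, and print them in that order.
head_and_tail() {
  ht_dir=$(mktemp -d) || return 1
  mkfifo "$ht_dir/fifo" || { rm -rf "$ht_dir"; return 1; }
  # Fifo reader: keep the first N lines, then drain to EOF so that tee
  # never writes to a fifo without a reader.
  (head -n "$1"; cat >/dev/null) < "$ht_dir/fifo" > "$ht_dir/head" &
  ht_reader=$!
  tee "$ht_dir/fifo" | tail -n "$2" > "$ht_dir/tail"
  # Wait in this shell (not in a subshell) for the background reader.
  wait "$ht_reader"
  cat "$ht_dir/head" "$ht_dir/tail"
  rm -rf "$ht_dir"
}

seq 1 100000 | head_and_tail 2 2
```

This prints 1, 2, 99999 and 100000. The two small result files are the only storage used; the bulk of the data only passes through pipes.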

The use of the term "abort" would usually mean 'man 3 abort', the
abort() library function.  That is the action that happens from various
signals.  But I think you are using it casually, simply meaning that
the program is exiting.  Read the man page for 'man 3 abort' and the
signals that cause an abort() to happen in 'man 7 signal' and then
please avoid using that word when we aren't talking about that event
so that it doesn't confuse us.  :-)
While writing it I felt guilty but was lazy. Sorry :)
Thanks for being strict.

I also tried tee -i to ignore interrupts, but I suppose that is not the
purpose of this option. No effect in our case.
The tee -i option only ignores SIGINT; it is meant to avoid Control-C
interrupts.

Whew!  So where are we?  I advise avoiding the background process
approach, for this particular case at least, as it isn't needed.  It
opens a large box of potential and real problems.
Finally, when written correctly, it works, doesn't it? With very little code, we've solved Y.

However, my initial case was to extract a sheet from a Calc document, leave the first 4 lines (table headers) unsorted, sort the following lines by a column value, then concatenate the result and transform it into an HTML table. For this case it is of course better to use awk instead of head -n4 and tail -n+5. That solves the X problem. You were right :D
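A minimal sketch of that X problem, assuming the sheet has already been exported as tab-separated text (the export step is not shown) and that the sort key is numeric in column 2; both function names are hypothetical. awk prints the 4 header lines itself and feeds the remaining lines to sort through a pipe, which avoids the trap of `{ head -n4; sort; }` on a pipe, where head may read more than 4 lines and those extra lines would never reach sort.

```shell
# sort_body HDR KEY (hypothetical): pass the first HDR lines through
# unchanged, then sort the remaining tab-separated lines numerically on
# column KEY.
sort_body() {
  awk -v hdr="$1" -v key="$2" '
    BEGIN { cmd = "sort -t \"\t\" -k" key "," key "n" }
    NR <= hdr { print; next }
    { print | cmd }
    END { fflush(); close(cmd) }  # headers out first, then sorted rows
  '
}

# to_html (hypothetical): turn tab-separated lines into an HTML table,
# escaping &, < and > (& first, so the other entities are not re-escaped).
to_html() {
  awk -F '\t' '
    BEGIN { print "<table>" }
    {
      gsub(/&/, "\\&amp;"); gsub(/</, "\\&lt;"); gsub(/>/, "\\&gt;")
      row = "<tr>"
      for (i = 1; i <= NF; i++) row = row "<td>" $i "</td>"
      print row "</tr>"
    }
    END { print "</table>" }'
}

printf 'H1\nH2\nH3\nName\tScore\nbob\t3\nalice\t1\ncarol\t2\n' | sort_body 4 2 | to_html
```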

Thibault


