Re: grep/sed and some strange patterns/inputs
From: Assaf Gordon
Subject: Re: grep/sed and some strange patterns/inputs
Date: Wed, 27 Jul 2016 00:54:56 -0400
Hello,
> On Jul 26, 2016, at 23:09, Christoph Anton Mitterer <address@hidden> wrote:
>
> I've always had the impression that ^ and $ were the end/begin anchor
> of the current pattern, and since e.g. grep/sed work normally in terms
> of lines the start/end of lines.
[...]
> What I found a bit strange is that e.g.:
> printf '' | sed 's/^/foo/'
>
> doesn't produce foo and that e.g.
> printf '' | grep '^'
> don't match.
>
> Why? Or better said, which part of POSIX mandates this? Or is it simply
> "no stdin, nothing happens"?
Exactly!
The command "printf ''" produces no output, so for sed's purposes it is
equivalent to redirecting input from /dev/null:
sed's first read hits end-of-file immediately, there is no line to process,
and the script is never executed.
printf with *any* output (with newlines or not) will cause 'sed' and 'grep' to
read some characters, and then to try to execute commands or match patterns on
the input.
This can be demonstrated using 'strace' on GNU/Linux machines.
The commands below run 'sed' with both printf and /dev/null, and 'strace' will
report the 'read' system-call.
The first 'read(3,...)' can be ignored; it is the dynamic loader reading a
shared library.
The second 'read(0,...)' is the interesting one:
The first argument, 0, is the file descriptor for STDIN.
sed tries to read up to 4096 bytes from STDIN, and the return value (following
the equal sign) is zero.
A return value of zero indicates end-of-file, meaning there is no input at all,
and sed will not try to execute any commands.
Notice that printf with an empty string and /dev/null result in the same
behavior:
$ strace -e read sed 's/^/foo/' < /dev/null
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\34\2\0\0\0\0\0"..., 832) = 832
read(0, "", 4096) = 0
+++ exited with 0 +++
$ printf '' | strace -e read sed 's/^/foo/'
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\34\2\0\0\0\0\0"..., 832) = 832
read(0, "", 4096) = 0
+++ exited with 0 +++
However, if even one character is provided on STDIN,
the read() call will return it, and sed will execute the
script on that input:
$ printf 'a' | strace -e read sed 's/^/foo/'
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\34\2\0\0\0\0\0"..., 832) = 832
read(0, "a", 4096) = 1
read(0, "", 4096) = 0
read(0, "", 4096) = 0
fooa+++ exited with 0 +++
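The same behavior can be sketched as a plain shell loop. This is only an illustration of the "read a line, then run the script" cycle, not sed's actual implementation: the function name run_script is hypothetical, and unlike GNU sed the sketch always ends its output with a newline.

```shell
# Hypothetical stand-in for sed's main cycle: run the "script" once per line read.
run_script() {
    # `|| [ -n "$line" ]` also accepts a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # The "script": prepend foo, like s/^/foo/.
        printf 'foo%s\n' "$line"
    done
}

empty_out=$(printf '' | run_script)      # no bytes: the loop body never runs
one_char_out=$(printf 'a' | run_script)  # one byte, no newline: body runs once
newline_out=$(printf '\n' | run_script)  # one empty line: body runs once

printf 'empty=[%s] one_char=[%s] newline=[%s]\n' \
    "$empty_out" "$one_char_out" "$newline_out"
```

With empty input the first read fails and leaves nothing to process, so the body is never entered. That mirrors the 'read(0, "", 4096) = 0' result in the strace output.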
One possible source of confusion is 'echo' vs 'printf': echo appends a
newline by default.
Thus, the command:
echo '' | sed 's/^/foo/'
does work as expected because there is some input (one byte: a newline),
whereas this command does not, since there is no input at all:
printf '' | sed 's/^/foo/'
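The difference can be checked without strace by counting the bytes each command writes. A small sketch, assuming a POSIX shell whose echo does not interpret the empty argument specially; the variable names are only for illustration, and 'tr -d' strips the padding some 'wc' implementations add.

```shell
# Count the bytes each command writes to the pipe.
echo_bytes=$(echo '' | wc -c | tr -d ' ')
printf_bytes=$(printf '' | wc -c | tr -d ' ')
echo "echo: $echo_bytes byte(s), printf: $printf_bytes byte(s)"

# sed runs its script only when it has read at least one line.
echo_result=$(echo '' | sed 's/^/foo/')
printf_result=$(printf '' | sed 's/^/foo/')
echo "echo | sed -> [$echo_result], printf | sed -> [$printf_result]"
```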
'grep' follows the same principle, and can be examined using:
printf '' | strace -e read grep -q '.' && echo match || echo no-match
strace -e read grep -q '.' </dev/null && echo match || echo no-match
printf 'a' | strace -e read grep -q '.' && echo match || echo no-match
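On systems without strace, grep's exit status alone shows the same thing. The sketch below uses the '^' pattern from the original question; the variable names are hypothetical, and the '|| status=$?' form is used only so the snippet also behaves under 'set -e'.

```shell
# Exit status 0 means at least one line matched; 1 means no line matched.
empty_status=0
printf '' | grep -q '^' || empty_status=$?      # no input at all

one_char_status=0
printf 'a' | grep -q '^' || one_char_status=$?  # one line without a newline

newline_status=0
echo '' | grep -q '^' || newline_status=$?      # one empty line

echo "empty=$empty_status one_char=$one_char_status newline=$newline_status"
```

An empty line still counts as a line, so '^' matches it; with no input there is no line to test the pattern against.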
Others can perhaps elaborate on the POSIX standard.
From a cursory look, the wording for 'grep' and 'sed' seems to imply that
output is tied to having input,
while 'wc' has mandatory default output regardless of input; see
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/wc.html#tag_20_154_10
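A quick comparison of the three utilities on empty input, as a sketch (the variable names are only for illustration; '|| true' keeps grep's no-match exit status from aborting a 'set -e' shell):

```shell
# wc reports a count even when it reads nothing.
wc_out=$(printf '' | wc -l | tr -d ' ')

# sed and grep print nothing, because there is no line to act on.
sed_out=$(printf '' | sed 's/^/foo/')
grep_out=$(printf '' | grep '^' || true)

echo "wc=[$wc_out] sed=[$sed_out] grep=[$grep_out]"
```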
Hope this helps,
- assaf
P.S.
A minor nitpick: coreutils is a separate project from grep or sed.
grep questions should be sent to address@hidden , and
sed questions should be sent to address@hidden .