Re: RFE: enable buffering on null-terminated data
From: Carl Edquist
Subject: Re: RFE: enable buffering on null-terminated data
Date: Thu, 14 Mar 2024 09:15:58 -0500 (CDT)
On Mon, 11 Mar 2024, Zachary Santer wrote:
On Mon, Mar 11, 2024 at 7:54 AM Carl Edquist <edquist@cs.wisc.edu>
wrote:
(In my coprocess management library, I effectively run every coproc
with --output=L by default, by eval'ing the output of 'env -i stdbuf
-oL env', because most of the time for a coprocess, that's what's
wanted/necessary.)
Surrounded by 'set -a' and 'set +a', I guess? Now that's interesting.
Ah, no - I use the 'VAR=VAL command line' syntax so that it's specific to
the command (it's not left exported to the shell).
Effectively the coprocess commands are run with
LD_PRELOAD=... _STDBUF_O=L command line
This allows running shell functions for the command line, which will all
get the desired stdbuf behavior. Because you can't pass a shell function
(within the context of the current shell) as the command to stdbuf.
As far as I can tell, the stdbuf tool sets LD_PRELOAD (to point to
libstdbuf.so) and your custom buffering options in _STDBUF_{I,O,E}, in the
environment for the program it runs. The double-env thing there is just a
way to cleanly get exactly the env vars that stdbuf sets. The values
don't change, but since they are an implementation detail of stdbuf, it's
a bit more portable to grab the values this way rather than hard code
them. This is done only once per shell session to extract the values, and
save them to a private variable, and then they are used for the command
line as shown above.
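A minimal sketch of the capture-once idea, assuming GNU coreutils stdbuf is installed. The names STDBUF_ENV, with_line_buffering and prefix_lines are hypothetical, and the subshell-plus-export approach is a simplification for illustration, not necessarily how the library described above does it:

```shell
# Capture the variables stdbuf would set, once per shell session.
# 'env -i' starts from an empty environment, so the only lines printed
# are the ones stdbuf itself adds (LD_PRELOAD=... and _STDBUF_O=L).
STDBUF_ENV=$(env -i stdbuf -oL env)

# Hypothetical helper: run any command, including a shell function, with
# those variables exported only inside a subshell, so nothing is left
# exported in the calling shell.
with_line_buffering () {
    (
        # Unquoted on purpose: split into one VAR=VAL word per variable.
        # Assumes the libstdbuf.so path contains no whitespace.
        export $STDBUF_ENV
        "$@"
    )
}

# A shell function cannot be handed to stdbuf directly, but it can be
# wrapped this way; the sed it starts inherits the line-buffered setting.
prefix_lines () { sed 's/^/:::/'; }

printf 'a\nb\n' | with_line_buffering prefix_lines
```

The subshell keeps the variables scoped to the one command, which is the same goal as the 'VAR=VAL command line' form.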
Of course, if "command line" starts with "stdbuf --output=0" or whatever,
that will override the new line-buffered default.
You can definitely export it to your shell though, either with 'set -a'
like you said, or with the export command. After that everything you run
should get line-buffered stdio by default.
I just added that to a script I have that prints lines output by another
command that it runs, generally a build script, to the command line, but
updating the same line over and over again. I want to see if it updates
more continuously like that.
So, a lot of times build scripts run a bunch of individual commands.
Each of those commands has an implied flush when it terminates, so you
will get the output from each of them promptly (as each command
completes), even without using stdbuf.
Where things get sloppy is if you add some stuff in a pipeline after your
build script, which results in things getting block-buffered along the
way:
$ ./build.sh | sed s/what/ever/ | tee build.log
And there you will definitely see a difference.
sloppy () {
    for x in {1..10}; do sleep .2; echo $x; done |
    sed s/^/:::/ | cat
}
{
    echo before:
    sloppy
    echo
    export $(env -i stdbuf -oL env)
    echo after:
    sloppy
}
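If exporting the settings to the whole shell is more than is wanted, roughly the same effect can be sketched per command. This assumes GNU stdbuf and GNU sleep (fractional seconds); slow_lines is a hypothetical stand-in for a build script:

```shell
# Hypothetical stand-in for a build script that prints a line now and then.
slow_lines () {
    for x in 1 2 3 4 5; do sleep .2; echo "$x"; done
}

# Only sed is wrapped here. Its stdout is a pipe, so without stdbuf it
# would block-buffer and all the lines would show up together at the end.
slow_lines | stdbuf -oL sed 's/^/:::/' | cat
```

With the wrapper in place the prefixed lines should appear one at a time, about as fast as slow_lines produces them.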
Yeah, there's really no way to break what I'm doing into a standard
pipeline.
I admit I'm curious what you're up to :)
Of course, using line-buffered or unbuffered output in this situation
makes no sense. Where it might be useful in a pipeline is when an
earlier command in a pipeline might only print things occasionally, and
you want those things transformed and printed to the command line
immediately.
Right ... And in that case, losing the performance benefit of a larger
block buffer is a smaller price to pay.
My assumption is that line-buffering through setbuf(3) was implemented
for printing to the command line, so its availability to stdbuf(1) is
just a useful side effect.
Right, stdbuf(1) leverages setbuf(3).
setbuf(3) tweaks the buffering behavior of stdio streams (stdin, stdout,
stderr, and anything else you open with, eg, fopen(3)). It's not really
limited to terminal applications, but yeah it makes it easier to ensure
that your calls to printf(3) actually get output after each line (whether
that's to a file or a pipe or a tty), without having to call an explicit
fflush(3) of stdout every time.
stdbuf(1) sets LD_PRELOAD to libstdbuf.so for your program, causing it to
call setbuf(3) at program startup based on the values of _STDBUF_* in the
environment (which stdbuf(1) also sets).
(That's my read of it anyway.)
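One way to check that reading, assuming GNU coreutils: have stdbuf run env and look at what it added to the environment. The libstdbuf.so path varies by system, so no particular path is promised here:

```shell
# Print only the variables stdbuf(1) adds for the program it runs.
# -i0: unbuffered stdin, -oL: line-buffered stdout, -e0: unbuffered stderr.
stdbuf -i0 -oL -e0 env | grep -E '^(LD_PRELOAD|_STDBUF_[IOE])='
```

Each of -i, -o and -e should show up as its own _STDBUF_ variable, alongside an LD_PRELOAD entry pointing at libstdbuf.so.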
In the BUGS section in the man page for stdbuf(1), we see: On GLIBC
platforms, specifying a buffer size, i.e., using fully buffered mode
will result in undefined operation.
Eheh xD
Oh, I imagine "undefined operation" means something more like
"unspecified" here. stdbuf(1) uses setbuf(3), so the behavior you'll get
should be whatever the setbuf(3) from the libc on your system does.
I think all this means is that the C/POSIX standards are a bit loose about
what is required of setbuf(3) when a buffer size is specified, and there
is room in the standard for it to be interpreted as only a hint.
If I'm not mistaken, then buffer modes other than 0 and L don't actually
work. Maybe I should count my blessings here. I don't know what's going
on in the background that would explain glibc not supporting any of
that, or stdbuf(1) implementing features that aren't supported on the
vast majority of systems where it will be installed.
Hey try it right?
Works for me (on glibc-2.23)
$ for s in 8k 16k 32k 1M; do
    echo ::: $s :::
    { stdbuf -o$s strace -ewrite tr 1 2
    } < /dev/zero 2>&1 > /dev/null | head -3
    echo
  done
::: 8k :::
write(1, "\0\0\0\0\0\0\0\0"..., 8192) = 8192
write(1, "\0\0\0\0\0\0\0\0"..., 8192) = 8192
write(1, "\0\0\0\0\0\0\0\0"..., 8192) = 8192
::: 16k :::
write(1, "\0\0\0\0\0\0\0\0"..., 16384) = 16384
write(1, "\0\0\0\0\0\0\0\0"..., 16384) = 16384
write(1, "\0\0\0\0\0\0\0\0"..., 16384) = 16384
::: 32k :::
write(1, "\0\0\0\0\0\0\0\0"..., 32768) = 32768
write(1, "\0\0\0\0\0\0\0\0"..., 32768) = 32768
write(1, "\0\0\0\0\0\0\0\0"..., 32768) = 32768
::: 1M :::
write(1, "\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
write(1, "\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
write(1, "\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
It may just be that nobody has actually had a real need for it.
(Yet?)
I imagine if anybody has, they just set --output=0 and moved on. Bash
scripts aren't the fastest thing in the world, anyway.
Ouch. Ouch. Ouuuuch. :)
While that's true if you're talking about bash itself doing the actual
computation and data processing, the main work of the shell is making it
easy to set up pipelines for other (very fast) programs to pass their data
around.
The stdbuf tool is not meant for the shell! It's meant for those very
fast programs that the shell stands up.
Using stdbuf to tweak a very fast program, causing it to output more often
at newlines over pipes rather than at block boundaries, does slow down
those programs somewhat. But as we've discussed, this is necessary for
certain pipelines that have two-way communication (including coprocesses),
or in general any time you want the output immediately.
What may not be obvious is that the shell does not need to get involved
with writing input for a coprocess or reading its output - the shell can
start other (very fast) programs with input/output redirected to/from the
coprocess pipes to do that processing.
My point though earlier was that a null-terminated record buffering mode,
as useful as it sounds on the surface (for null-terminated paths), may
actually be something _nobody_ has ever actually needed for an actual (not
contrived) workflow.
But then again I say "Yet?" - because, never say never.
Happy line-buffering :)
Carl