emacs-devel
Re: Filtering out process filters


From: Eli Zaretskii
Subject: Re: Filtering out process filters
Date: Fri, 06 Jun 2025 16:35:39 +0300

> From: Gerd Möllmann <gerd.moellmann@gmail.com>
> Cc: Dmitry Gutov <dmitry@gutov.dev>,  dancol@dancol.org,
>  arstoffel@gmail.com,  emacs-devel@gnu.org
> Date: Fri, 06 Jun 2025 11:44:52 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > If you are sure it's consing, did you do the math?  How many strings
> > are consed in the benchmark?  According to my understanding, 8000,
> > assuming that each string is 64K (the default value of
> > read-process-output-max) and the total size of process output is
> > 800MB.  The slower methods seem to take about 0.25 sec, given the
> > throughput of 4000 MB/s (when GC is disabled).  So we are basically
> > saying that it takes 0.25 sec to cons 8000 strings.  Is that
> > reasonable?  Or did I mis-calculate?
> 
> Reading 500Mb with a gc-cons-threshold of 800K means 5 * 10^8 / 8 * 10^5
> = ca. 600 GCs per run. Plus the processing of the strings in Lisp.

I've run some benchmarks on MS-Windows.  With the default value of
gc-cons-threshold, the method which uses a filter function (and thus
conses strings from each chunk of subprocess output) is about 25%
slower than the buffer+acf method.  But if I set gc-cons-threshold to
a high value, so that no GC happens during the run, the
filter-function method is just 10% slower than buffer+acf.  This means
that consing strings is very fast, and most of the slowdown in these
benchmarks is probably caused by GC.
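
The GC estimate quoted above can be reproduced with a quick
calculation, and the real number of collections in a run can be read
from the built-in variable gcs-done.  This is a minimal sketch under
two assumptions: that every byte of process output is consed into a
string, and that gc-cons-threshold alone decides when GC runs (in
practice gc-cons-percentage also plays a part).  The names
my/estimated-gc-count and my/count-gcs are hypothetical helpers, not
part of Emacs.

```elisp
;;; -*- lexical-binding: t -*-

(defun my/estimated-gc-count (total-bytes threshold)
  "Rough number of GCs if one runs after every THRESHOLD bytes consed.
Assumes all of TOTAL-BYTES is consed and ignores `gc-cons-percentage'."
  (/ total-bytes threshold))

;; The figures from the quoted message: 5 * 10^8 bytes, 8 * 10^5 threshold.
(my/estimated-gc-count (* 5 (expt 10 8)) (* 8 (expt 10 5))) ; => 625

(defmacro my/count-gcs (&rest body)
  "Evaluate BODY and return how many GCs ran while it was evaluated."
  (let ((before (make-symbol "before")))
    `(let ((,before gcs-done))
       ,@body
       (- gcs-done ,before))))
```

Wrapping a benchmark run in my/count-gcs, once with the default
gc-cons-threshold and once with a very high value, would show how far
the measured count is from the estimate.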

In my benchmark, the only processing of the incoming text was decoding
it from UTF-8 and recording the total number of bytes read.  I used
"cat FILE" as the subprocess, with FILE a large file (several hundred
MB), instead of 'dd' from /dev/zero, which cannot be usefully emulated
on Windows.
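
For readers who want to try a comparison along these lines, here is a
minimal sketch of the two methods.  It is not the code used for the
measurements in this message: the function names are hypothetical, it
counts decoded characters rather than bytes (the two agree only for
ASCII input), the buffer variant simply lets the text accumulate in a
temporary buffer, and it assumes a "cat" program is on PATH.

```elisp
;;; -*- lexical-binding: t -*-
(require 'benchmark)

(defun my/drain-process (proc)
  "Read output from PROC until it has exited and no output is left."
  (while (or (accept-process-output proc 0.1)
             (process-live-p proc))))

(defun my/cat-with-filter (file)
  "Run `cat FILE'; count decoded characters in a filter function.
Each chunk of output is consed into a string passed to the filter."
  (let* ((total 0)
         (proc (make-process
                :name "bench-filter"
                :command (list "cat" file)
                :connection-type 'pipe
                :coding 'utf-8
                :noquery t
                :sentinel #'ignore
                :filter (lambda (_proc string)
                          (setq total (+ total (length string)))))))
    (my/drain-process proc)
    total))

(defun my/cat-with-buffer (file)
  "Run `cat FILE' into a buffer; count characters via `after-change-functions'.
The default filter inserts the output, so no Lisp string is made per chunk."
  (let ((total 0))
    (with-temp-buffer
      (add-hook 'after-change-functions
                (lambda (beg end _len)
                  (setq total (+ total (- end beg))))
                nil t)
      (let ((proc (make-process
                   :name "bench-buffer"
                   :command (list "cat" file)
                   :buffer (current-buffer)
                   :connection-type 'pipe
                   :coding 'utf-8
                   :noquery t
                   ;; Keep the default sentinel from inserting a
                   ;; status message, which would be counted too.
                   :sentinel #'ignore)))
        (my/drain-process proc)))
    total))

(defun my/compare-methods (file &optional no-gc)
  "Time both methods on FILE; with NO-GC, raise `gc-cons-threshold'.
Each result is the list returned by `benchmark-run':
\(ELAPSED GC-COUNT GC-ELAPSED)."
  (let ((gc-cons-threshold (if no-gc
                               most-positive-fixnum
                               gc-cons-threshold)))
    (list :filter (benchmark-run 1 (my/cat-with-filter file))
          :buffer (benchmark-run 1 (my/cat-with-buffer file)))))
```

Calling (my/compare-methods "FILE") and then (my/compare-methods
"FILE" t) gives the timings with and without GC; the second and third
elements of each result show how much of the difference is GC.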

My guess is that any non-trivial processing of the subprocess output
will make the advantage of the buffer+acf method even smaller.

I have no idea why the buffer+acf method is so much faster in Daniel's
benchmarks.  Maybe it's because on Windows this is less efficient (but
note that I timed "cat FILE >nul" from the shell, and found it to be
only about 40% faster than consuming the same FILE in the benchmark).

So my conclusion is that one should prefer the buffer method to the
filter function only if that doesn't complicate the code too much,
because the gains are going to be relatively small and won't justify
the added complexity.


