Filtering out process filters
From: Daniel Colascione
Subject: Filtering out process filters
Date: Sun, 01 Jun 2025 23:38:00 -0400
User-agent: mu4e 1.12.10; emacs 31.0.50
Emacs packages that parse output from a subprocess or network connection
usually install a process filter and process one string chunk at a time.
They should be using the C fast-path buffer insertion and
after-change-functions instead.
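Concretely, the shape I mean is something like this rough sketch
(my-start-consumer and its callback are placeholders, not an existing API):

;; Rough sketch: let the Emacs core insert process output into a buffer
;; (the C fast path) and react to new text from after-change-functions.
(defun my-start-consumer (command consume)
  "Run COMMAND; call CONSUME with the bounds of each new chunk of output."
  (let ((buf (generate-new-buffer " *consumer*")))
    (with-current-buffer buf
      (add-hook 'after-change-functions
                (lambda (beg end _len)
                  ;; Each chunk of process output shows up as an insertion
                  ;; between BEG and END in this buffer.
                  (funcall consume beg end))
                nil t))
    (make-process :name "consumer"
                  :buffer buf        ; no :filter, so the C core inserts here
                  :command command
                  :connection-type 'pipe
                  :noquery t)))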
Process filters are slow (benchmark below), mostly due to GC pressure
created by consing all those strings and in part due to the parsing
contortions (e.g. in term.el) necessary to handle structured input
arriving in arbitrarily-split chunks.
jsonrpc, used by eglot, uses a process filter that inserts each incoming
chunk into a buffer and then parses that buffer with json-parse-buffer.
It could instead have the Emacs core insert directly into that buffer.
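A rough sketch of that direction, assuming a stream of concatenated JSON
values rather than jsonrpc's actual Content-Length framing
(my-handle-message is a placeholder):

;; Sketch only: parse complete JSON values straight out of the process
;; buffer with `json-parse-buffer'.
(defun my-json-after-change (_beg _end _len)
  "Buffer-local `after-change-functions' member for a JSON process buffer."
  (goto-char (point-min))
  (while (condition-case nil
             (let ((msg (json-parse-buffer)))
               (my-handle-message msg)   ; placeholder handler
               ;; Discard what we consumed so the buffer stays small.
               (delete-region (point-min) (point))
               t)
           ;; Incomplete value at the end of the buffer: wait for more output.
           (json-parse-error nil))))

;; To use: (add-hook 'after-change-functions #'my-json-after-change nil t)
;; in the connection's process buffer.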
term does a ton of regex searches and string splits on its incoming
bytes. It could instead allocate an internal undecoded-bytes buffer,
have its process deposit bytes there, and then efficiently match
control sequences in that buffer without having to copy them elsewhere.
insert-buffer-substring can copy regions between control sequences and
encoded characters without additional consing. Term could also use
looking-back at the end to detect incomplete sequences without
concatenation or multiple search passes.
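Very roughly, and not term.el's actual code (only CSI sequences are
recognized, decoding is omitted, and the two buffers are placeholders):

;; Sketch: scan a unibyte buffer of undecoded output in place.
(defun my-scan-escapes (raw-buf out-buf)
  "Move complete text and sequences out of RAW-BUF, keeping partial ones."
  (with-current-buffer raw-buf
    (goto-char (point-min))
    (let ((text-start (point-min)))
      ;; Runs of plain text between control sequences are copied with
      ;; `insert-buffer-substring', which doesn't cons intermediate strings.
      (while (re-search-forward "\e\\[[0-9;]*[@-~]" nil t)
        (let ((seq-beg (match-beginning 0))
              (seq-end (match-end 0)))
          (with-current-buffer out-buf
            (insert-buffer-substring raw-buf text-start seq-beg))
          ;; ... interpret the control sequence between SEQ-BEG and SEQ-END ...
          (setq text-start seq-end)))
      (goto-char (point-max))
      (if (looking-back "\e\\(\\[[0-9;]*\\)?" text-start)
          ;; Trailing partial escape sequence: flush the text before it and
          ;; keep the partial sequence for the next round of output.
          (let ((partial (match-beginning 0)))
            (with-current-buffer out-buf
              (insert-buffer-substring raw-buf text-start partial))
            (delete-region (point-min) partial))
        (with-current-buffer out-buf
          (insert-buffer-substring raw-buf text-start (point-max)))
        (delete-region (point-min) (point-max))))))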
Yes, jsonrpc and term do a bunch of things besides receiving characters,
but they're going to do those things whether or not they use
process filters.
Using buffers and after-change-functions makes code not only faster but
also simpler: Emacs has a richer set of inspection and manipulation
facilities for buffers than it does for strings.
Simplicity aside, the performance isn't even close. On my machine:
Z$ emacs -Q -batch -l buffer-vs-filter.el
buffer+acf: 5728 MB/s
term: 308 MB/s
jsonrpc: 576 MB/s
That's with the default gc-cons-threshold. If I use 100MB instead, I get
Z$ emacs -Q -batch -l buffer-vs-filter.el
buffer+acf: 5273 MB/s
term: 1796 MB/s
jsonrpc: 2633 MB/s
If I use 10GB (so GC never happens), I get
Z$ emacs -Q -batch -l buffer-vs-filter.el
buffer+acf: 5218 MB/s
term: 3792 MB/s
jsonrpc: 4185 MB/s
Even without factoring GC into performance, the buffer approach wins.
If you do include a realistic GC, it's a rout.
I am curious what the timing looks like on the MPS branch. I expect filters to
look much better there but still not quite match memcpying into a buffer.
We should update the manual and discourage new uses of process filters.
;; -*- lexical-binding: t -*-

(defconst modes '(buffer+acf term jsonrpc))

(defun benchmark-process-io (mode)
  (let* ((mb 500)
         (bytes (* mb 1024 1024))
         (received 0)
         (start nil)
         (gc-cons-threshold
          ;; (* 1024 1024 1024 1024 10)
          ;; (* 1024 1024 100)
          gc-cons-threshold)
         (record-recv (lambda (size)
                        (unless start (setq start (float-time)))
                        (incf received size)))
         (buf (with-current-buffer (generate-new-buffer " *test*")
                (set-buffer-multibyte nil)
                ;; Pre-allocate in small chunks to avoid huge string allocation
                (let ((chunk-size (* 64 1024)) ; 64KB chunks
                      (remaining (* mb 1024 1024)))
                  (while (> remaining 0)
                    (let ((to-insert (min chunk-size remaining)))
                      (insert (make-string to-insert 0))
                      (setq remaining (- remaining to-insert)))))
                (delete-region (point-min) (point-max))
                (set-buffer-multibyte nil)
                (when (eq mode 'buffer+acf)
                  (add-hook 'after-change-functions
                            (lambda (beg end _len)
                              (funcall record-recv (- end beg)))
                            nil t))
                (current-buffer)))
         (process
          (apply #'make-process
                 :name "test"
                 :noquery t
                 :command '("dd" "if=/dev/zero" "bs=1M")
                 :connection-type 'pipe
                 :coding 'binary
                 ;; :buffer buf
                 (pcase-exhaustive mode
                   ('buffer+acf `(:buffer ,buf))
                   ('term
                    `(:filter ,(lambda (_proc string)
                                 (funcall record-recv (length string))
                                 ;; term splits strings on semi-common
                                 ;; characters like newlines, so copying the
                                 ;; filter input string once is a conservative
                                 ;; approximation of the work it does
                                 (copy-sequence string))))
                   ('jsonrpc
                    `(:filter ,(lambda (_proc string)
                                 (funcall record-recv (length string))
                                 ;; jsonrpc inserts the string it gets into a
                                 ;; buffer before doing a bunch of other work,
                                 ;; so let's just do the insert as a conservative
                                 ;; approximation of what it does
                                 (when (buffer-live-p buf)
                                   (with-current-buffer buf
                                     (insert string))))))))))
    (unwind-protect
        (progn
          (garbage-collect)
          (while (< received bytes)
            (accept-process-output process nil nil t))
          (/ mb (- (float-time) start)))
      (when (process-live-p process)
        (kill-process process))
      (kill-buffer buf))))

(byte-compile 'benchmark-process-io)

;; Warmup
(dolist (mode modes)
  (benchmark-process-io mode))

;; Benchmark
(dolist (mode modes)
  (message "%s: %.0f MB/s" mode (benchmark-process-io mode)))