Re: Any idea about what makes Emacs slow reading on pipes?

From: David Kastrup
Subject: Re: Any idea about what makes Emacs slow reading on pipes?
Date: 18 May 2003 01:39:14 +0200
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50

address@hidden (Kim F. Storm) writes:

> address@hidden (David Kastrup) writes:
> > No, that isn't it.  I can pipe the stuff into |cat >/dev/null
> > instead of Emacs, and it will be finished in a whiffy.  Emacs must
> > be using some sort of system call that _stops_ dd most of the time
> > from being able to do another write to the pipe until the Linux
> > scheduler has run again (which it does at the rate of about 100
> > per second, almost exactly the rate at which characters arrive).
> That is because emacs is using a pty by default for process i/o (I
> guess that is to avoid buffering output from the writer, so emacs
> will get a more steady flow of input to process, e.g. for compiler
> output), and it sets the pty to non-blocking mode when reading from
> the pty (to avoid having emacs hanging on the read system call until
> it has successfully read all 1024 bytes).

Actually, this isn't it.  Pipes behave just the same, and I now know
why.  You won't see this problem on an SMP machine.  The writing
process makes a _small_ write, and Emacs, waiting on the receiving end
of it with select, gets woken up immediately and gets to process the
small write.  But while Emacs is processing the small write, the
writing process does not get any processing time at all, so it won't
produce anything new until Emacs either goes back to sleep, or is
preempted because of taking too long (for example garbage collecting).
The moment Emacs is preempted and not waiting on the select system
call, the writing process is able to keep stuffing bytes into the pipe
for the duration of a full time slice without being scheduled away for
Emacs again.  So the low latency of Linux and the quasi-synchronous
(instead of timeslice-based) context switch on select starve the
writing process on single-CPU systems and consequently cause Emacs to
get most of the stuff only in small packets.
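A minimal sketch for observing the effect described above, written in Python rather than Emacs Lisp. It assumes a Unix system; the function name `tally_chunk_sizes` and the inline writer script are hypothetical and only illustrate the select/read pattern. The child makes one-byte writes, the parent waits in select and then counts how many bytes each read returned. The resulting distribution depends on the scheduler and the number of CPUs, so no particular output is claimed.

```python
import os
import select
import subprocess
import sys
from collections import Counter


def tally_chunk_sizes(total_bytes=2000, read_size=1024):
    """Read a child's one-byte writes via select() and count chunk sizes."""
    # Hypothetical writer: performs total_bytes separate one-byte writes.
    writer = (
        "import os, sys\n"
        "for _ in range(int(sys.argv[1])):\n"
        "    os.write(1, b'x')\n"
    )
    child = subprocess.Popen(
        [sys.executable, "-c", writer, str(total_bytes)],
        stdout=subprocess.PIPE,
        bufsize=0,
    )
    fd = child.stdout.fileno()
    sizes = Counter()
    while True:
        select.select([fd], [], [])     # sleep until the pipe is readable
        chunk = os.read(fd, read_size)  # whatever is available, up to read_size
        if not chunk:                   # an empty read means end of file
            break
        sizes[len(chunk)] += 1
    child.stdout.close()
    child.wait()
    return sizes


if __name__ == "__main__":
    for size, count in sorted(tally_chunk_sizes().items()):
        print(f"{count:4d} blocks with size {size:4d}")
```

If the reader is woken for every small write, most blocks will have size 1; if the writer gets to run for a while between wakeups, larger blocks appear.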

So what can we do about this?

a) use only SMP systems
b) pester Linux developers to be less eager with context switches on
   select.  I am trying this course now, but it will of course take
   time to register, and other OSes might have similar problems.
c) make a super-efficient path for process output arriving in tiny
   chunks, since this is what we will be force-fed most of the time.
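One possible reading of option c), sketched in Python under stated assumptions: keep the per-chunk work trivially cheap (append to a list) and run the expensive handler only once per batch. This is a swapped-in batching technique for illustration, not a description of how Emacs handles process output; `ChunkBatcher`, `max_bytes` and `max_delay` are hypothetical names, and the caller is assumed to supply the current time.

```python
class ChunkBatcher:
    """Collect tiny chunks cheaply and hand them to a handler in batches."""

    def __init__(self, handler, max_bytes=1024, max_delay=0.01):
        self.handler = handler      # expensive per-call consumer (assumed)
        self.max_bytes = max_bytes  # flush once this many bytes are pending
        self.max_delay = max_delay  # flush once the oldest chunk is this old
        self.parts = []
        self.size = 0
        self.first_seen = None

    def feed(self, data, now):
        """Record one chunk; 'now' is a timestamp in seconds."""
        if not self.parts:
            self.first_seen = now
        self.parts.append(data)     # cheap path: no handler call here
        self.size += len(data)
        if self.size >= self.max_bytes or now - self.first_seen >= self.max_delay:
            self.flush()

    def flush(self):
        """Deliver everything pending as a single handler call."""
        if self.parts:
            self.handler(b"".join(self.parts))
            self.parts = []
            self.size = 0
            self.first_seen = None
```

The caller still has to call `flush()` at end of input or when its own timer expires; otherwise a trailing prompt would sit in the batch while the user waits for it.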

> > > What happens if you try the following version, which forces
> > > start-process to use a pipe rather than a pty:
> > 
> > Nothing much, I said already.
> Well, it could have an effect for your AUCTeX case, since it would
> then be writing to a pipe (using buffered i/o) rather than to a pty
> (typically using non-buffered i/o).

I already said that the scheduling seems to be pretty much the same.
The pipe never fills up.

> Come on yourself...  This illustrates that if you use
>         "tex ... | dd bs=1k"
> instead of just
>         "tex ..."
> in AUCTeX, you'll probably end up with much faster performance
> since you will get buffered output from tex.

Possibly.  But when we get some prompt output and feedback is
expected, dd will not flush.

> But you should get the same effect by setting
> process-connection-type to nil.

No, since a pipe does not wait for being full before switching on the
reader.  dd, in contrast, will not write anything out before a read
of 1k has been satisfied.
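A rough Python model of the reblocking behaviour attributed to dd here, as an illustration only: `reblock` is a hypothetical helper, not part of dd, and it mirrors output blocking in the style of the `obs=` operand. Input arrives in arbitrary small chunks, and nothing is passed on until a full output block has accumulated or the input ends.

```python
def reblock(chunks, obs):
    """Yield blocks of exactly obs bytes; a short block only at end of input."""
    pending = bytearray()
    for chunk in chunks:
        pending += chunk
        # Emit only complete output blocks; keep the remainder pending.
        while len(pending) >= obs:
            yield bytes(pending[:obs])
            del pending[:obs]
    # End of input: whatever is left goes out as one short block.
    if pending:
        yield bytes(pending)
```

Feeding it 2500 one-byte chunks with `obs=1024` gives blocks of 1024, 1024 and 452 bytes. A short prompt such as `b"? "` is held back until the input ends, which is the flushing problem mentioned above; a plain pipe has no such threshold and makes the reader runnable as soon as any byte is written.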

> > The dd was explicitly there to force small writes, since that is
> > the most common situation when talking with applications through a
> > pipe.
> I would expect that for a pty, not for a pipe!

As I explained, it does not make a noticeable difference for a pipe (I
tried) since the moment something gets into the pipe, the writing
process is starved of any CPU time needed for letting the pipe get
any fuller.

> > And that is what makes Emacs dog slow when compared with XEmacs
> > when running things like TeX from within AUCTeX.
> Maybe XEmacs is using pipes rather than ptys by default (or never
> uses ptys at all)...

Pipes don't play into it.  Either it does something other than
select, or its very_small_input_chunk_size path is much more
efficient than Emacs'.

> > We are having here an operating system with a single CPU, and we
> > switch processes here for every byte mainly, and the producing process
> > gets no CPU time for making new bytes while the consuming process is
> > still busy processing the byte it got.  Anyhow, I _do_ have the
> > additional suspicion that there is pretty much one "tick" of delay for
> > every byte involved and that the CPU is mostly idling for that
> > reason: a 600MHz PIII should even with the full processing overhead
> > due to catering for every single character on its own be able to
> > handle more than 100 characters per second.

Should, but I now have better statistics.

> Could you try
>         M-: (setq process-connection-type nil)
> before you run your AUCTeX process, and see if that makes a
> difference.

Since you refuse to believe me...


Function Name       Call Count  Elapsed Time  Average Time
==================  ==========  ============  ============
TeX-command-filter  725         0.1004559999  0.0001385599


Function Name       Call Count  Elapsed Time  Average Time
==================  ==========  ============  ============
TeX-command-filter  684         0.0818009999  0.0001195921

dd obs=16k & pty

Function Name       Call Count  Elapsed Time  Average Time
==================  ==========  ============  ============
TeX-command-filter  196         0.0215799999  0.0001101020

dd obs=16k & pipe

Function Name       Call Count  Elapsed Time  Average Time
==================  ==========  ============  ============
TeX-command-filter  180         0.0209769999  0.0001165388

And that does not take the context switch time and a lot of other
stuff you can see scroll by into account.  As you can see, the
difference pty/pipe is pretty much irrelevant.  The blocking makes the
difference (the difference is quite more prominent than what appears

Here is my program polished somewhat: you can run it either with
M-x make-test RET
or with

emacs -batch -q -no-site-file -l testio.el -eval "(progn(make-test t 'kill-emacs)(sit-for 100000))"

from the command line.

The relevant part of the output will be something like:
Time:  0
 641 blocks with size    1
   1 blocks with size 1023
   3 blocks with size 1024
Time:  1
 640 blocks with size    1
Time:  2
 449 blocks with size    1
   1 blocks with size 1023
   3 blocks with size 1024
Time:  3
1827 blocks with size    1
Time:  4
1132 blocks with size    1
   1 blocks with size  881
   6 blocks with size 1023
  20 blocks with size 1024

Attachment: testio.el
Description: I/O benchmark

Note that on multiprocessor machines, the bad effect does not occur.
This is really a single-processor problem, and it only affects
operating systems that do an immediate context switch from a task
writing to a pty or pipe to a task waiting on it in a select system
call.

At the very least, we need a well-optimized path for small process
output sizes to deal with that problem.

David Kastrup, Kriemhildstr. 15, 44793 Bochum
