emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: wait_reading_process_ouput hangs in certain cases (w/ patches)


From: Eli Zaretskii
Subject: Re: wait_reading_process_ouput hangs in certain cases (w/ patches)
Date: Tue, 07 Nov 2017 18:40:42 +0200

> From: Matthias Dahl <address@hidden>
> Cc: address@hidden
> Date: Tue, 7 Nov 2017 15:18:02 +0100
> 
> The amount of output read (in total) from a process is tracked via the
> following member variable of Lisp_Process:
> 
>   /* Byte-count for process output read from `infd'.  */
>   unsigned long infd_num_bytes_read;
> 
> That gives us at least 4 GiB worth of data that can be read back and
> tracked before it wraps around to 0.

If we want, we can make it EMACS_INT, which will give us as much as
Emacs can gobble.

> What might have happened now is that wait_proc->infd_num_bytes_read was
> already at the edge of the maximum representable value range and wrapped
> around during some recursive calls to wait_reading_process_output while
> we still have a value in initial_wait_proc_num_bytes_read that has not
> wrapped around. E.g.:
> 
>   wait_proc->infd_num_bytes_read = (2^32)-10
>   initial_wait_proc_num_bytes_read = wait_proc->infd_num_bytes_read
> 
>   ... recursive calls happen, output is read ...
> 
>   wait_proc->infd_num_bytes_read = 256
>   initial_wait_proc_num_bytes_read = (2^32)-10
> 
>   // this will be wrong
>   bytes_read = wait_proc->infd_num_bytes_read -
>                initial_wait_proc_num_bytes_read
> 
> That causes a problem -- even though this case is unlikely, it is still
> possible and I don't think it should be dismissed.

That should be very rare (barring application-level bugs), and in any
case it is no worse than just returning 1.  So I don't think we should
give up better support of 90% or even 99% of use cases due to the
other 10% or 1%.

> gnulib's integer overflow helpers won't help in this case, at all.

Those helpers are meant to avoid the overflow itself, thus preventing
"undefined behavior" when a signed integer overflows.  They cannot,
and shouldn't be expected to, fix the reason for the overflow -- that
cannot be helped when it happens.

> > I don't agree with this line of reasoning.  We are not developing a
> > library whose users will be familiar only with the formal interface
> > definition and nothing else.  We are talking about code that is being
> > read, studied, and used by Emacs hack^H^H^H^Hdevelopers all the time.
> 
> Even the C interface?

Especially in the C interface.

> It is my impression that the C parts of Emacs are usually what is
> being avoided at all costs.

It depends on who we are talking about.  Some of us (yours truly
included) work on the C level quite frequently.  Any relatively
radical new feature needs at least some C infrastructure.  I had my
share of writing code based on what I read in the function commentary
and some general common sense and familiarity with the internals, only
to find out later that they were incomplete or even prone to wrong
interpretation.  I'd like to minimize such occurrences as much as I
can.

> Besides that, I treated the C parts like implementation details and
> kept strictly to the documented interfaces, expecting no user to
> ever touch this.

But that's exactly the problem: we _don't_have_ documented interfaces
on the C level, at least not of the quality we would like to have.  Do
you really think that the commentary at the beginning of
wait_reading_process_output describes what the function does anywhere
close to completeness?  Far from it.

> > Having a non-trivial function whose behavior is hard to describe and
> > remember accurately makes using it error-prone, especially if the
> > deviant behavior happens only in some corner use case that is hard to
> > reproduce.
> 
> That is why it is important that the documentation and implementation of
> a function actually align.

Of course.  But it's hard to keep them in sync, partly because it's
incomplete to begin with, and that makes it not easy to recognize when
the commentary needs to change due to some code changes.  And the
result is that I quite frequently find comments that are blatantly
incorrect: mention parameters no longer there or fail to mention new
ones that were added, for example.

> In this case, no user should actually expect the function to return
> the number of bytes read, just a positive or negative number or
> zero... just like it is stated.

I'd actually say that the commentary is incomplete because it doesn't
say what that "positive" value means.  Returning an opaque value from
a function is not useful, unless you can pass it later to the same
function to get it to do something special.

> But, again, that might be a matter of perspective. For me, what is
> documented is like a contract.

IME with Emacs code, we are a long way from interfaces that are
documented on a level that could be considered a contract.  For
starters, we don't even say clearly enough which functions can throw
to top level, without which a contract is not really a contract, don't
you agree?

> > Sorry, you lost me half-way through this description.  I actually
> > meant to return only the increment of how many bytes were read since
> > the last call, but I don't see why you'd need more than one counter
> > that will be reset to zero once its value is consumed.
> 
> My suggestion was to have an in-flight bytes read counter, that will
> only track the number of bytes read while /any/ wait for that specific
> process is active.

Ah, I see.  I think this is an unnecessary complication.  The risk of
overflow, while it isn't zero, is too low to justify such measures and
the resulting complexity, IMO.

> > It's a bug that has been there "forever", and the fix is too risky to
> > have it in the middle of a pretest.  I don't think we understand well
> > enough all of its implications, and reading from subprocesses is a
> > very central and very delicate part of Emacs.  Sorry.
> 
> Ack. Even though this means that users running into this right now will
> have to wait quite a while for a fix in a stable release, which is a
> pity imho.

We can publish the patch here, and then people who really want this
fixed ASAP can simply apply the patch.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]