guile-user
Re: Guile bugs


From: Linas Vepstas
Subject: Re: Guile bugs
Date: Tue, 19 Sep 2017 19:04:21 +0800

Hi Ludo,

On Fri, Sep 15, 2017 at 3:56 PM, Ludovic Courtès <address@hidden> wrote:

> Linas Vepstas <address@hidden> skribis:
>
> > On Mon, Sep 11, 2017 at 2:26 AM, Ludovic Courtès <address@hidden> wrote:
> >
> >> Hello,
> >>
> >> Linas Vepstas <address@hidden> skribis:
> >>
> >> > The stuff coming over the network sockets are bytes, not s-exps. Since
> >> > none of the bytes are ever zero, they are effectively C/C++ strings, and
> >> > are handled as such. These C strings are sent to scm_eval_string()
> >> > wrapped by scm_c_catch().
> >>
> >> I don’t know to what extent that is applicable to your software, but my
> >> recommendation would be to treat that network socket as a Scheme port,
> >> pass it to ‘read’, and pass the result to ‘eval’ (as opposed to reading
> >> the whole string from C++ and passing it to ‘scm_eval_string’.)
> >>
> >
> > Why?  What advantage does this offer?
>
> It avoids copies and conversions, which is big deal if you deal with
> very big strings.
>
> > Its not clear that guile eval is smart enough to manage a network socket --
> > if the user starts a long-running process with intermittent prints, will it
> > send that to the socket?  What if the user hits cntrl-C in the middle of it
> > all? What if the code that came over the socket happened to throw an
> > exception?
>
> These are important considerations, but it’s not eval’s business IMO.
> Instead, I suggest building your own protocol around it, and having a
> way in that protocol to report both exceptions and normal returns.
>

Well, yes, this is exactly what I've done.
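
For anyone following along, here is a minimal sketch of the shape such a
protocol can take. It is an illustration only, not the wire format of the
server discussed in this thread: the OK/ERR status words, the length prefix,
and the name frame_reply are all invented for the example, and the evaluator
is passed in as a plain callable standing in for the scm_c_catch()-wrapped
scm_eval_string() call.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical reply framing: "<STATUS> <byte-count>\n<payload>".
// STATUS is "OK" for a normal return and "ERR" when evaluation threw.
// `evaluate` is a stand-in for whatever actually runs the expression.
std::string frame_reply(const std::function<std::string()>& evaluate) {
    std::string status = "OK";
    std::string body;
    try {
        body = evaluate();            // normal return: payload is the result
    } catch (const std::exception& e) {
        status = "ERR";               // exception: payload is the message
        body = e.what();
    }
    // The length prefix lets the client know where the payload ends, so
    // intermittent output and the final result cannot be confused.
    return status + " " + std::to_string(body.size()) + "\n" + body;
}
```

With this framing the client reads one header line, then exactly that many
bytes, and can tell a result from an error without parsing the payload.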

This conversation is frustrating: either piping read to eval is the right
thing to do, in which case eval must handle network connections correctly,
or else piping read to eval is the wrong thing to do.  You can't have it
both ways.


> > I've had to deal with all of these issues in the past, and have a stable
> > code base; but if I had to start all over again, its not clear that these
> > issues have gone away.  I mean, eval was designed to eval -- it was not
> > designed to support multi-threaded, concurrent network operations, right?
>
> Right.
>
> > To support my point: the default guile network REPL server is painfully
> > slow, and frequently crashes/hangs. It works well enough to do some demos
> > but is not stable enough for production use ... if its just read+eval,
> > that might explain why its unstable.
>
> I’ve never noticed slowness of the REPL server, nor crashes.
>

You are probably using it only very lightly, and not in a high-load
environment. It runs maybe 5x slower than my current guile shell server,
and it is very definitely unstable and crashy.

In my environment, I am sending it roughly one to twenty scheme
expressions every second, with a new socket opened for each scheme
expression. This goes on for days or weeks. I am using a custom guile
server written in C++, which accepts network connections, reads bytes from
the network, and sends them to scm_eval_string(). It mostly works fine,
with a couple of problems. The first is what seems to be a pointless
utf8-to-utf32 conversion, which is what started this email chain.

There also seems to be some sort of very rare race condition in the
compiler that leads to corruption inside of guile. I believe that this can
be triggered by starting twenty threads (for example) and then compiling
and running fairly short programs in each thread. By "fairly short" I mean
"less than 5-10 lines of code", and which compute and return answers in
less than a tenth of a second. Doing this for a few hours eventually causes
guile to hang in a spinloop, trying to read some guile-internal structure
that has invalid data in it. I opened a bug report for this a month or two
ago, but did not supply an easy-to-trigger test case.

I tried replacing my guile network server with the REPL server, and
discovered that the REPL server is much, much slower. I don't recall exactly
how I measured the 5x number, but it came from an actual measurement. I
don't think the REPL server can handle 20 network connections per second.

I remember hypothesizing that guile was being re-initialized for every
network connection. Obviously, this is wasteful and slow.

Entering guile is a large bottleneck.  I once measured this, and I think it
takes approximately 200 microseconds to enter guile, which implies a
maximum limit of about 5K guile evaluations per second, when using the
simple-minded design of having the C code enter guile each time before
evaluating an expression.  By contrast, python (cython) can be entered in
10 or 20 microseconds.
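
The arithmetic behind those figures, as a small sketch. The function name is
hypothetical; the 200, 20 and 10 microsecond inputs are the rough
measurements quoted above, not new data.

```cpp
// Upper bound on evaluations per second for one thread when every
// evaluation pays a fixed entry cost and nothing else is counted.
double max_evals_per_second(double entry_cost_us) {
    return 1e6 / entry_cost_us;   // 1e6 microseconds in one second
}

// max_evals_per_second(200.0) gives 5000   (guile, entering each time)
// max_evals_per_second(20.0)  gives 50000  (python, slower estimate)
// max_evals_per_second(10.0)  gives 100000 (python, faster estimate)
```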

The test case here is how many times per second can one eval some simple
expression, e.g. (+ 2 2) or the equivalent of that in python.
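
A rough sketch of how such a measurement could be taken is below. It is not
the harness used for the numbers in this mail: calls_per_second is a
hypothetical helper, and eval_once stands in for whatever performs a single
evaluation (for example, entering guile and evaluating "(+ 2 2)").

```cpp
#include <chrono>
#include <functional>

// Call `eval_once` `iterations` times back to back and report the
// observed rate in calls per second, using a monotonic clock.
double calls_per_second(const std::function<void()>& eval_once,
                        long iterations) {
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        eval_once();
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return static_cast<double>(iterations) / elapsed.count();
}
```

The iteration count should be large enough that the total run time dwarfs
the clock resolution; otherwise the reported rate is meaningless.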

The solution for the heavy cost of entering guile is to create a pool of a
few dozen threads, enter guile in each, and then never exit: just return
each thread to the pool when its eval is completed and the thread is
no longer needed.  This cuts the 200 microseconds overhead to zero, and
what one is then left with is the cost of calling scm_eval_string().  I did
measure that too, but I don't recall the numbers.


> That said, if you run a REPL server in a separate thread and mutate the
> global state of the program, you could possibly crash it—no wonders
> here.
>

Yes, well, I would call that a bug! It feels like you are trying to blame
me for a guile bug -- it's not my fault that it crashes!

I did not look very carefully, and don't recall what the stack traces
looked like, but I got the impression that there were race conditions in
guile init, and how it interacted with the sockets.

> Likewise, the REPL server is meant to be used for debugging on
> localhost.  If you talk to a REPL server over the network with high
> latency, it’s going to be slow, not surprisingly.
>

The performance problem was not the latency, it was the number of
connections it could accept.

I'll say it again: I have a different network server that is 5x faster than
the REPL server, and it works, it is stable.

For reasons completely unrelated to guile, I would like to declare my
network server deprecated and obsolete.  However, I cannot do this, because
the guile REPL server is not yet good enough to be an adequate replacement.

--linas

>
> So yes, I find the REPL server to be a really pleasant tool when
> debugging an application locally, but that’s all it is—it’s not a remote
> procedure call framework or anything like that.
>

> Thanks,
> Ludo’.
>



-- 
*"The problem is not that artificial intelligence will get too smart and
take over the world," computer scientist Pedro Domingos writes, "the
problem is that it's too stupid and already has." *

