[Chicken-users] Re: How are exceptions propagated? - details on the race
From: F. Wittenberger
Subject: [Chicken-users] Re: How are exceptions propagated? - details on the race
Date: Wed, 20 Aug 2008 23:59:25 +0200
On Wednesday, 20.08.2008, at 08:29 +0200, felix winkelmann wrote:
> On Tue, Aug 12, 2008 at 4:39 PM, Jörg F. Wittenberger
> <address@hidden> wrote:
> > On Thursday, 07.08.2008, at 23:05 +0200, Jörg F. Wittenberger wrote:
> >> Hi all,
> >>
> >> this is once again a slightly complicated test case. Again I understand
> >> all calls for a simpler version. Just I have a hard time to find one.
> >
> > I've been able to track this one down to chicken not handling bad
> > filedescriptors in ##sys#unblock-threads-for-i/o .
Since Elf expressed some doubt about the existence of the race - which I
can understand, since race conditions are usually hard to reproduce
reliably, so there's a good chance that my test case did not exhibit
the problem on his machine - I guess it might help the review if
I comment on some details.
It's actually not that hard to understand the problem - that is, if we
start from the presumption that the runtime system ought to be robust
against some misuse. After all, we have file-close at our disposal, and
even without it, it would be all too easy to get a bad fd, at least when
using libraries.
----
So once a thread is waiting on an fd that has become bad in the
meantime, what goes on in the scheduler?
(define (##sys#unblock-threads-for-i/o)
  (dbg "fd-list: " ##sys#fd-list)
  (let* ([to? (pair? ##sys#timeout-list)]
         [rq? (pair? ##sys#ready-queue-head)]
         [n (##sys#fdset-select-timeout ; we use FD_SETSIZE, but really should use max fd
             (or rq? to?)
             (if (and to? (not rq?)) ; no thread was unblocked by timeout, so wait
                 (let* ([tmo1 (caar ##sys#timeout-list)]
                        [now (##sys#fudge 16)])
                   (fxmax 0 (- tmo1 now)) )
                 0) ) ] ) ; otherwise immediate timeout.
    (dbg n " fds ready")
If there's a bad fd, we shall see "-1 fds ready", the return code from
select(2).
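For anyone who wants to see that return code outside of the scheduler, here is a minimal C sketch. The function name is made up for this illustration, and it assumes a POSIX system where the pipe descriptors fall below FD_SETSIZE. It opens a pipe, closes it again, and then hands the stale descriptor number to select(2):

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno left by select(2) when the read
   set contains an already closed descriptor, 0 if select did not fail,
   and -1 if the experiment could not be set up. */
int select_errno_for_closed_fd(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;

    int stale = p[0];          /* remember the number, then invalidate it */
    close(p[0]);
    close(p[1]);
    if (stale >= FD_SETSIZE)   /* FD_SET would be undefined beyond this */
        return -1;

    fd_set rd;
    FD_ZERO(&rd);
    FD_SET(stale, &rd);

    struct timeval tv = {0, 0};   /* poll, do not block */
    int n = select(stale + 1, &rd, NULL, NULL, &tv);
    return n == -1 ? errno : 0;
}
```

On Linux the result should be EBADF, which is the situation behind the "-1 fds ready" message.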
      (cond [(eq? -1 n)
             (cond
              (error-bad-file
               (set! ##sys#fd-list
                     (let loop ((l ##sys#fd-list))
                       (cond
                        ((null? l) l)
                        ((##sys#handle-bad-fd! (car l))
                         (##sys#fdset-clear (caar l))
                         ;; This is supposed to be a rare case, catch
                         ;; them one by one, not all at once
                         ;; (commented out here).
                         ;; (loop (cdr l))
                         (cdr l))
                        (else (cons (car l) (loop (cdr l)))))))
               (##sys#fdset-restore)
               (##sys#unblock-threads-for-i/o))
Without the case above, we switch to the primordial thread.
              (else (##sys#force-primordial))) ]
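The same recovery idea can be sketched in plain C, detached from the scheduler. This is only an illustration under assumptions: the function name and the flat array of descriptors are invented here (the real fd-list pairs each fd with its waiting threads), and every descriptor is assumed to lie in the range 0 to FD_SETSIZE-1. On EBADF it drops the first descriptor that fstat(2) rejects, rebuilds the set and retries, one bad fd per round as in the patch:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/stat.h>

/* Hypothetical helper: polls fds[0..*n) for readability without blocking.
   When select(2) fails with EBADF, the first descriptor rejected by
   fstat(2) is removed and the call is retried. Returns the final select
   result; *n is updated to the number of surviving descriptors. */
int poll_purging_bad_fds(int *fds, size_t *n)
{
    for (;;) {
        fd_set rd;
        int maxfd = -1;
        FD_ZERO(&rd);
        for (size_t i = 0; i < *n; i++) {
            FD_SET(fds[i], &rd);
            if (fds[i] > maxfd)
                maxfd = fds[i];
        }

        struct timeval tv = {0, 0};
        int r = select(maxfd + 1, &rd, NULL, NULL, &tv);
        if (r != -1 || errno != EBADF)
            return r;              /* success, or an error not handled here */

        /* Locate one bad descriptor. */
        struct stat buf;
        size_t i = 0;
        while (i < *n && !(fstat(fds[i], &buf) == -1 && errno == EBADF))
            i++;
        if (i == *n)
            return -1;             /* no culprit found, give up */

        /* Remove it and go round again. */
        for (size_t j = i + 1; j < *n; j++)
            fds[j - 1] = fds[j];
        (*n)--;
    }
}
```

Removing one descriptor per round keeps the common path cheap; several bad descriptors simply cost several rounds.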
For now let's postpone the question of whether the "else" case is
handled gracefully with this change.
(define (##sys#force-primordial)
  (dbg "primordial thread forced due to interrupt")
  ;(display "switching to primordial thread\n" debug-port)
  (##sys#thread-unblock! ##sys#primordial-thread) )
That's actually all it takes.
----
It all depends on the state of the primordial thread; there is no
special provision in ##sys#force-primordial. In my case it was waiting
in a thread-join!:
(define thread-join!
  (lambda (thread . timeout)
    (##sys#check-structure thread 'thread 'thread-join!)
    (let* ((limit (and (pair? timeout)
                       (##sys#compute-time-limit (##sys#slot timeout 0))))
           (rest (and (pair? timeout) (##sys#slot timeout 1)))
           (tosupplied (and rest (pair? rest)))
           (toval (and tosupplied (##sys#slot rest 0))) )
      (##sys#call-with-current-continuation
       (lambda (return)
         (let ([ct ##sys#current-thread])
           (when limit (##sys#thread-block-for-timeout! ct limit))
           (##sys#setslot
            ct 1
            (lambda ()
So it's going to continue here:
              (case (##sys#slot thread 3)
                [(dead) (apply return (##sys#slot thread 2))]
                [(terminated)
                 (return
                  (##sys#signal
                   (##sys#make-structure
                    'condition '(uncaught-exception)
                    (list '(uncaught-exception . reason) (##sys#slot thread 7)) ) ) ) ]
and since the thread is neither dead nor terminated...
                [else
                 (return
                  (if tosupplied
                      toval
                      (##sys#signal
                       (##sys#make-structure 'condition
                                             '(join-timeout-exception)
                                             '())) ) ) ] ) ) )
the above case applies. In fact I was lucky: if it had been waiting on
a mutex for a precious resource, it would have entered the critical
section. Wherever it was, the primordial is just unblocked.
----
Now let's come back to the question of whether the "else" case is
handled correctly. Probably not. I only have a Linux box here right now,
but man 2 select gives:
ERRORS
       EBADF  An invalid file descriptor was given in one of the sets.
              (Perhaps a file descriptor that was already closed, or one
              on which an error has occurred.)
       EINTR  A signal was caught.
       EINVAL nfds is negative or the value contained within timeout is
              invalid.
       ENOMEM unable to allocate memory for internal tables.
I believe none of them should simply activate the primordial.
EBADF is handled now.
For EINTR I have yet to understand how the signals are propagated, but
I'm afraid we need some code here too.
EINVAL would be a grave programming error in the scheduler. Maybe it's
better to print a message and die here. Similarly for ENOMEM, though
this is not chicken's fault.
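To make the proposed distinction concrete, here is a small C sketch of a dispatch on the four documented errno values. The enum and function names are invented for this illustration and are not part of chicken. What exactly should happen on EINTR is left open above, so that branch is only a placeholder, and treating unknown codes as fatal is my assumption:

```c
#include <errno.h>

/* Hypothetical classification of a failed select(2) call. */
enum select_action {
    ACT_PURGE_BAD_FD,    /* EBADF: find and drop the defunct descriptor */
    ACT_SIGNAL_PENDING,  /* EINTR: placeholder, signal handling still open */
    ACT_FATAL            /* EINVAL, ENOMEM: report and terminate */
};

enum select_action classify_select_errno(int err)
{
    switch (err) {
    case EBADF:
        return ACT_PURGE_BAD_FD;
    case EINTR:
        return ACT_SIGNAL_PENDING;
    case EINVAL:   /* scheduler bug: bad nfds or timeout */
    case ENOMEM:   /* not the runtime's fault, but nothing useful to do */
    default:       /* assumption: unknown codes are treated as fatal */
        return ACT_FATAL;
    }
}
```

None of the branches wakes an arbitrary thread, which is the point of the paragraph above.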
----
The same consideration should be applied to ##sys#schedule, where the
variable "eintr" controls ##sys#force-primordial. On the other hand,
signals are handled somehow, so probably I have overlooked something.
(Felix?)
----
Now to the really interesting question: what should be done, once a
defunct fd is found? Since ##sys#fd-list contains fd's and threads
only, the simple solution is (here a better version than in my last
message):
(define (##sys#handle-bad-fd! e)
  (dbg "check bad" e)
  (let ((bad ((foreign-lambda*
               bool ((integer fd))
               "struct stat buf;"
               "int i = ( (fstat(fd, &buf) == -1 && errno == EBADF) ? 1 : 0);"
               "return(i);")
              (car e))))
    (if bad
        (for-each
         (lambda (thread)
           (thread-signal!
            thread
            (##sys#make-structure
             'condition
             '(exn i/o) ;; better? '(exn i/o net)
             (list '(exn . message) "bad file descriptor"
                   '(exn . arguments) (car e)
                   '(exn . location) thread) )))
         (cdr e)))
    bad))
That is: thread-signal! a condition to every thread waiting on the fd.
In fact it might be better if we could also close the corresponding
ports behind the scenes. But that easily gets messy: the fd-list would
then have to hold both the fd and the port. Lots of changes ahead. I'd
abstain.
> > The attached patch uses fstat(2) to check the fd-list.
> >
> > Unfortunately I have no idea how well this is going to be supported
> > under windows.
>
> Not very well, but perhaps it can at least be supported under
> UNIXish environments.
Is there no good way on Windows to tell a bad fd from a good one?
Anything will do. If worst comes to worst, we could repeat the
select(2) with just the fd in question.
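A sketch of that fallback, written against POSIX select(2) only; whether it carries over to Windows is exactly the open question. The function name is hypothetical. It puts the single descriptor into an otherwise empty read set, polls with a zero timeout, and reports whether select rejects it with EBADF:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical probe: returns 1 if select(2) rejects fd with EBADF,
   0 if select accepts it, and -1 if fd cannot be probed this way
   (outside the range an fd_set can represent). */
int fd_is_bad_by_select(int fd)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;

    int r;
    do {
        fd_set rd;
        FD_ZERO(&rd);
        FD_SET(fd, &rd);
        struct timeval tv = {0, 0};   /* poll only, never block */
        r = select(fd + 1, &rd, NULL, NULL, &tv);
    } while (r == -1 && errno == EINTR);   /* interrupted: try again */

    return (r == -1 && errno == EBADF) ? 1 : 0;
}
```

Compared with fstat(2) this costs one extra select per suspect descriptor, but it needs nothing beyond what the scheduler already calls.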
best regards
/Jörg