gnustep-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: NSRunLoop Tidying


From: Richard Frith-Macdonald
Subject: Re: NSRunLoop Tidying
Date: Sun, 10 Oct 2010 12:43:36 +0100

On 9 Oct 2010, at 12:19, David Chisnall wrote:

> On 8 Oct 2010, at 17:14, Richard Frith-Macdonald wrote:
> 
>> Actually the current implementation uses an unsorted list as depending on 
>> the ordering introduced a subtle bug ... a timer can be present in more than 
>> one run loop, and/or in the same loop in more than one mode.  
> 
> From the NSTimer documentation (CFRunLoopTimerRef contains a similar line):
> 
> 'A timer object can be registered in only one run loop at a time, although it 
> can be added to multiple run loop modes within that run loop.'
> 
> Therefore, the second case is the only one that should apply.  

That's good ... perhaps it changed or I misread it.  If a timer is restricted 
to a single loop, we can avoid the need for locking (since a loop is restricted 
to a single thread) , which makes keeping an ordered list much easier to do 
efficiently.

>> If it fires in one loop and changes its next fire date (ie is a repeating 
>> timer), that automatically breaks the ordering for the other loops/modes 
>> that it exists in.  Of course, what we really want is probably something 
>> more complex to keep track of all the loops/modes the timer is in, and tell 
>> them that they need to re-order their list of timers ... if we did that we 
>> could depend on the order in the list.
>> The API demands that we keep track of the timers some way (so we can return 
>> the correct value from the -limitDateForMode: method), but it doesn't have 
>> to be an ordered list if a better alternative is available (though I can't 
>> really think of one).
> 
> I was thinking that it would be simpler to just keep an unsorted list and 
> scan it when one of these methods is called.  It's O(n) in terms of the 
> number of timers, but hopefully these methods are not called very often (and 
> n is relatively small - I doubt many runloops have hundreds of timers 
> associated with them).

That's what the current code does ... and it's been fine for most applications 
(because most only have a small number of timers).  However, I recently had to 
rewrite one application's timer handling for precisely this reason ... when 
scaling to handle large volumes of traffic it was grinding to a halt because it 
was searching a huge list of timers in each loop iteration ... the code used to 
have good scalable performance when we used an ordered list but was no longer 
workable for the unsorted list. So, while the current code works for most case, 
there are certainly ones where it's performance is not good enough.  Unless the 
operating system supports  efficient lookups to find the earliest timer in a 
run loop mode (and I don't believe it does), we really need to maintain that 
list ourselves.  In fact in terms of scalability of the code, the current 
unordered list seems to be the main bottleneck ... I get reasonable performance 
with thousands of file descriptors, but not with thousands of timers.  I'd like 
to have even better performance with huge numbers of descriptors (and using the 
epoll API ought to provide that, I guess kqueue probably would too), but the 
current performance is good enough for the vast majority of applications.

> 
>>> This is required with traditional select() and poll(), but Win32 has 
>>> SetTimer, Linux has timerfd(), and *BSD has kevent(), all of which allow 
>>> you to schedule timer events and wait on them just like fd events.
>> 
>> I don't really see how that can help since the API explicitly separates 
>> timer firing (done in -limitDateForMode:) and I/O event handling ... so 
>> having timers and I/O events use the same mechanism in the operating system 
>> does not mean that they can be done at the same place in the code or with a 
>> single system call.  So using things like SetTimer() and timerfd() may just 
>> add complexity.
> 
> No.  Timers and timeouts are different in the high-level API.  
> -limitDateForMode: is an exception.  It is possible to implement the entire 
> API on top of this, but that requires doing something in userspace that the 
> kernel already has code for.  It is significantly simpler to just schedule 
> the timers and sleep.  

I can't see how.  The API does not allow timers to fire except as part of the 
runloop ... so you are always waiting for I/O (except in the rare case where a 
loop contains no input sources) at the same time as waiting for a timer to fire 
... so using two codepaths (one to sleep waiting for a timer but no I/O events, 
and one to poll/select for both I/O events and timers) is clearly more complex 
than using a single one.... you have twice the code to maintain.  

> The most common way of using a runloop is either -run (no timeout), or 
> -runUntilDate: / -runMode:beforeDate:, which have an explicit timeout that is 
> orthogonal to the timers.

> As these are currently implemented, we:
> 
> 1) Look through the timers list to find the next timer.
> 2) Find whether the timer will fire before the timeout.
> 3) Set the minimum as the timeout.
> 4) Call down into the kernel.
> 5) If the timeout occurred, determine whether it was due to a timer firing or 
> due to the user-specified timeout elapsing.
> 
> In contrast, with kqueue timers, win32 timers, or timerfd, we need to:
> 
> 1) Call down to the kernel with the user-specified timeout.
> 
> This is vastly simpler.

I see what you mean, though I think you've overstated the case a bit since you 
are omitting loads of stuff to do with managing the timers, and not comparing 
like with like (eg 4/5 (system call and branch on result) in the first instance 
are just counted as 1 operation in the second instance).  But clearly there is 
the possibility of a simpler (and faster) codepath there.   Adding this sort of 
thing will of course make the code a little more complex and less maintainable 
overall (since you still need the existing code or something similar for POSIX 
platforms), but it will get you something more efficient on some modern 
platforms in most circumstances ... though when you have lots of timers firing, 
an implementation with an ordered list of timers might be just as efficient 
since it runs in user space apart from a single call to get the current time.

>  This is especially true when you consider multiple run loop iterations.  If 
> timers change during the loop, then we need to recalculate the timer / 
> timeout minimum with the first approach, while with the second we just need 
> to modify a single timer in the kernel.
> On vaguely modern platforms, this means that entering a run loop is much 
> simpler, meaning that we spend a lot more time sleeping in userspace waiting 
> for the kernel to wake us up, which helps preserve battery life as it means 
> more time the CPU can spend in low-power mode.

So what you are talking about here is sacrificing a little 
maintainability/simplicity  for a little performance ... probably a good 
trade-off, but not obviously so since the majority of apps don't need that 
performance boost (they spend almost all their time either waiting for an event 
or handling it, with a tiny portion of time spent in the runloop code itsself). 
 I happen to write server code where a very high throughput is important ... so 
I'm probably more sympathetic to the idea than most people.

> 
>> I agree with "Dr. H. Nikolaus Schaller" <address@hidden> ... that it would 
>> make most sense to start by implementing the CFRunLoop API, and then 
>> re-implement NSRunLoop on top of it (it's quite hard to do the other way 
>> round).
> 
> CFRunLoop is quite a messy API and is actually not sufficiently expressive 
> for NSRunLoop.  For example, you can add Mach ports as sources to a 
> CFRunLoop, but the API does not provide a mechanism for specifying whether 
> you want read or write events (presumably it only generates data-available?), 
> and it does not provide an API for adding file descriptors, rather than Mach 
> ports, so you'd have to modify it slightly for all non-Darwin platforms.

Yes, you would need CFRunLoop+ to implement NSRunLoop, and you would need 
NSRunLoop+ to implement CFRunLoop ... but it does look easier and more 
efficient to do it by extending CFRunLoop.  

>> Also, you need to put together *LOTS* to testcases in the testsuite to 
>> ensure that the implementation behaves exactly like OSX does ... this is 
>> because NSRunLoop is a really key part of so much of the system and very 
>> subtle/minor changes in behavior can have really nasty effects on 
>> applications.  It's really not good enough to have an implementation which 
>> does what the documentation says, we have to mimic actual OSX behavior 
>> pretty closely.
> 
> Agreed.
> 
>> My guess is that your idea of re-implementing timer handling in a platform 
>> specific way is actually a  recipe for a less maintainable and more complex 
>> codebase, since the timeout behavior is largely mandated by the API and I 
>> can't currently see how platform specific APIs will help... 
> 
> Timers are not timeouts.  They are completely orthogonal in the API, it's 
> only an accident of implementation that we treat them as being the same.
> 
>> but maybe you have some ideas you didn't mention here.  In general it's a 
>> good idea (for maintainability) to keep platform specific code to a minimum, 
>> so for instance I wouldn't use PostThreadMessage()  as it's probably no more 
>> efficient than the current code using SetEvent(), and would demand changes 
>> to NSThread to hold additional thread data and a message queue.
> 
> Why would it need NSThread to be modified? 

To send a message to a thread, you need to pass the windows thread-id as a 
parameter to the call ... so of course you must know that ID, which means that 
NSThread must store it so that it can get to it when you try to perform a 
selector on a particular thread.  That much is clearly needed.  A quick look at 
the documentation says that a message queue must be created for the thread too 
... but that's probably fairly trivial (a call to some windows function when 
the thread is created).  I'm not saying there's any reason we can't use 
PostThreadMessage(), just that it would take more work to code/change, and make 
the codebase slightly less maintainable.  I generally believe we should keep 
things simple and avoid platform specific code unless there is good reason to 
make things more complex (ie where we have to do so for exact OSX 
compatibility, or we want to do so for substantial performance improvement).

>> On the other hand, the kqueue API is supposed to be significantly more 
>> efficient than poll/select ... so using that on BSD and the similar epoll 
>> API on Linux would be good.
> 
> Using kqueue() to emulate poll() does not provide much of a performance 
> advantage.  The advantage comes from actually using the features available 
> with the new APIs, not simply making the existing kludge use a tiny subset of 
> the new APIs.

I don't know kqueue, but I thought it was supposed to be like epoll, and that 
should offer significant performance advantages when a very high number of 
events are to be monitored.
At the moment, each time we call select/poll we must build a data structure to 
describe the events we are polling for ... which is quite expensive (and dwarfs 
the overhead of handling timers separately from I/O events).
With epoll we could avoid that, since the data structure would be held in the 
kernel and we would modify it only when adding/removing a descriptor to the set 
of those we are interested in.  So adding/removing a descriptor would be a bit 
more expensive, but usually that's pretty rare so performance would improve.  
The obvious case where this would be great is some sort of server process which 
is listening for data from thousands of clients ... each time round the loop 
it's probably only dealing with input from one client, so why should it need to 
build new list of (say) two thousand descriptors to be polled?

> 
>> So, what would be great is to:
>> 1. Implement the CFRunLoop
>> 2. Keep platform specific code to a minimum (though make use of it where it 
>> really helps performance) for maintainability
>> 3. Have testcases to check that it behaves like OSX on all platforms
>> 4. Finally, re-implement NSRunLoop on top of CFRunLoop
> 
> It is not possible to implement NSRunLoop on top of CFRunLoop using only the 
> public APIs.  It should, however, be possible to create a thin 
> platform-specific layer which permits implementation of both NSRunLoop and 
> CFRunLoop.  On OS X 10.6, libdispatch is used to implement both.  

Yes, well if you want to implement a libdispatch clone and build on top of that 
I wouldn't complain ... sounds like fun.

When you rewrote thread handling, I think we got a really big benefit ... you 
ware able to make the code simpler (more maintainable) and more efficient 
because you could work to a single thread API for all platforms.  Yes there 
were teething problems and extra work involved, but not too much and the result 
was a definite improvement.

When you rewrote NSNumber the result was much less successful because the new 
code, while smaller, is not actually much/any easier to understand (except to 
you perhaps), and there were enough glitches to iron out for OSX compatibility 
that overall benefit is hard to judge.

I'd like any NSRunLoop rewrite to be more like the NSThread rewrite, but the 
big problem is that there is no single API to use, so whatever you do needs to 
maintain something like the existing code for posix and add lots more code for 
different modern platforms.  This inherently means more code and makes things 
less maintainable, so you need to look for *real* improvements in performance 
(eg scaling to many thousands of descriptors and/or timers) and/or 
power/flexibility (eg adding CFRunLoop and perhaps libdispatch) without 
breaking any of the existing API of course :-)







reply via email to

[Prev in Thread] Current Thread [Next in Thread]