Re: [Qemu-devel] [PATCH v3 1/3] qemu-thread: introduce qemu-thread-commo


From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH v3 1/3] qemu-thread: introduce qemu-thread-common.[ch]
Date: Mon, 23 Apr 2018 13:19:10 +0800
User-agent: Mutt/1.9.1 (2017-09-22)

On Fri, Apr 20, 2018 at 01:07:34PM -0400, Emilio G. Cota wrote:
> On Fri, Apr 20, 2018 at 12:42:10 +0800, Peter Xu wrote:
> > Put all the shared qemu-thread implementations into these files.  The
> > header should be internal to qemu-thread but not for qemu-thread users.
> > 
> > Introduce some hooks correspondingly for the shared part.  Note that in
> > qemu_mutex_unlock_impl() we moved the call before unlock operation which
> > should make more sense.  And we don't need qemu_mutex_post_unlock() hook.
> > 
> > Currently the hooks only call the tracepoints.
> > 
> > Signed-off-by: Peter Xu <address@hidden>
> (snip)
> > -    trace_qemu_mutex_lock(mutex, file, line);
> > -
> > +    qemu_mutex_pre_lock(mutex, file, line);
> >      err = pthread_mutex_lock(&mutex->lock);
> >      if (err)
> >          error_exit(err, __func__);
> > -
> > -    trace_qemu_mutex_locked(mutex, file, line);
> > +    qemu_mutex_post_lock(mutex, file, line);
> >  }
> 
> I see the value in consolidating these calls. However, having a separate
> object means that this adds two function calls to mutex_lock. This
> significantly reduces performance, even without --enable-debug-mutex:
> - Before:
> $ taskset -c 0 tests/atomic_add-bench -n 1 -m
> Parameters:
>  # of threads:      1
>  duration:          1
>  ops' range:        1024
> Results:
> Duration:            1 s
>  Throughput:         57.24 Mops/s
>  Throughput/thread:  57.24 Mops/s/thread
> 
> - After:
> $ taskset -c 0 tests/atomic_add-bench -n 1 -m
> Parameters:
>  # of threads:      1
>  duration:          1
>  ops' range:        1024
> Results:
> Duration:            1 s
>  Throughput:         49.22 Mops/s
>  Throughput/thread:  49.22 Mops/s/thread
> 
> So inlines or macros should be used instead -- I'd prefer
> inlines but I'm not sure they'll work with the tracing calls.

Indeed, it's a drop of about 14% (57.24 -> 49.22 Mops/s).  I never
thought an extra call would bring so much overhead, but after reading
your patch I think I see why: the test only exercises raw mutex
locking, so the extra call is "amplified" compared to real usage,
where there is normally much more work to do besides taking and
releasing the lock.
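
To illustrate the amplification, here is a minimal sketch of a loop that
does almost nothing except take and release an uncontended mutex.  It is
not the atomic_add-bench tool; demo_time_lock_unlock() is a hypothetical
helper written only for this example.  With so little work per iteration,
any extra call on the lock path is a large share of the total cost.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper (not part of QEMU): run `iters` uncontended
 * lock/unlock pairs around a single increment and return the elapsed
 * time in nanoseconds.
 */
static int64_t demo_time_lock_unlock(uint64_t iters, uint64_t *counter)
{
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    struct timespec start, end;

    assert(counter != NULL);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (uint64_t i = 0; i < iters; i++) {
        pthread_mutex_lock(&lock);
        (*counter)++;            /* the only "real work" per iteration */
        pthread_mutex_unlock(&lock);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    pthread_mutex_destroy(&lock);

    /* Convert the timespec difference to nanoseconds. */
    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
           + (int64_t)(end.tv_nsec - start.tv_nsec);
}
```

Dividing the iteration count by the elapsed time gives a throughput
figure comparable in spirit to the Mops/s numbers quoted above, though
the absolute values will differ from the real tool.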

But sure, making it inline should be better, and your reasoning is
valid.  Though I don't see why it can't work with traces; I thought it
should work naturally.  I'll see.
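
For illustration, a minimal sketch of what inline hooks could look like.
Everything prefixed with demo_ is a hypothetical stand-in: DemoMutex is
not QemuMutex, and demo_trace() just counts calls instead of invoking a
real tracepoint.  The point is only that the hooks are static inline
functions visible to the caller, so the compiler can fold them into the
lock path rather than emitting two extra calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QemuMutex; only the field used here. */
typedef struct DemoMutex {
    pthread_mutex_t lock;
} DemoMutex;

/* Hypothetical stand-in for a tracepoint: counts invocations. */
static unsigned long demo_trace_events;

static inline void demo_trace(const void *mutex, const char *file, int line)
{
    (void)mutex;
    (void)file;
    (void)line;
    demo_trace_events++;
}

/* Hooks defined as static inline so they can be inlined at the call site. */
static inline void demo_mutex_pre_lock(DemoMutex *m, const char *file, int line)
{
    demo_trace(m, file, line);
}

static inline void demo_mutex_post_lock(DemoMutex *m, const char *file, int line)
{
    demo_trace(m, file, line);
}

static inline void demo_mutex_pre_unlock(DemoMutex *m, const char *file, int line)
{
    demo_trace(m, file, line);
}

static void demo_error_exit(int err, const char *msg)
{
    fprintf(stderr, "%s: %s\n", msg, strerror(err));
    abort();
}

static void demo_mutex_lock_impl(DemoMutex *m, const char *file, int line)
{
    int err;

    assert(m != NULL);
    demo_mutex_pre_lock(m, file, line);
    err = pthread_mutex_lock(&m->lock);
    if (err) {
        demo_error_exit(err, __func__);
    }
    demo_mutex_post_lock(m, file, line);
}

static void demo_mutex_unlock_impl(DemoMutex *m, const char *file, int line)
{
    int err;

    assert(m != NULL);
    /* The hook runs before the unlock; there is no post-unlock hook. */
    demo_mutex_pre_unlock(m, file, line);
    err = pthread_mutex_unlock(&m->lock);
    if (err) {
        demo_error_exit(err, __func__);
    }
}
```

Whether the real tracepoints can be called from such a header is exactly
what still needs checking; the sketch does not settle that.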

> 
> I think you should cherry-pick this patch[1] and add it to the
> series -- it'll let you make sure the series does not affect
> performance.

Sure!  I'll attach benchmark results in my next post with your tool.

Thanks,

-- 
Peter Xu