qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 10/11] 9pfs: T_readdir latency optimization


From: Greg Kurz
Subject: Re: [PATCH v4 10/11] 9pfs: T_readdir latency optimization
Date: Wed, 11 Mar 2020 17:14:08 +0100

On Wed, 11 Mar 2020 02:18:04 +0100
Christian Schoenebeck <address@hidden> wrote:

> On Dienstag, 10. März 2020 19:33:36 CET Greg Kurz wrote:
> > > This patch is also too big for my preference, but I don't see a viable way
> > > to split it further into separate patches. I already separated all the
> > > patches I could. If you have suggestions, very much appreciated!
> > 
> > Well, the patch could be split in two or three parts at least:
> > 
> > (1) introduce the new function that reads multiple entries in codir.c
> > 
> > (2) use it from 9p.c
> > 
> > (3) remove unused stuff if anything remains
> > 
> > This doesn't seem to change much but the smaller diffstats
> > for each individual patch make them less scary :) and with
> > (1) applied only it is easier to compare what the old code
> > in 9p.c and the new one in codir.c do.
> > 
> > > The reason for this is that in order to fix these issues with current
> > > T_readdir implementation, it requires to separate what's currently one
> > > task
> > > (i.e. one function) into two separate tasks (i.e. two functions). There is
> > > no sensible way to do that.
> > 
> > Yeah, I won't ask for finer grain.
> 
> Me confused. Does that mean your split suggestion was just theoretical, or do 
> you need it?
> 

I need it and I won't ask for more splitting. Promised ! :)

> > > But don't be too scared about this patch; if you look just at the diff
> > > directly then yes, the diff is not very human readable. However if you
> > > apply the patch and look at the resulting code, you'll notice that there
> > > are actually only 2 functions (v9fs_do_readdir() in 9p.c and
> > > do_readdir_lowlat() in codir.c) which require careful reviewing and that
> > > their resulting code is actually very, very straight forward, and not
> > > long either.
> > 
> > These are personal opinions. Careful reviewing can take a lot of time :)
> 
> Well, depends on what precisely you mean with opinion. :) That this patch 
> actually reduces code complexity is a fact (less branches, less locks, less 
> dispatches, less side effects, less error/cleanup handlers). These are 
> objectively measurable quantities.
> 

You're likely right but that wasn't my point.

> But yes, nevertheless reviewing it costs time.
> 

That is my point: even if the resulting code is a lot better than the
old one in several aspects, it is still a non-trivial change to review
on my side.

> > > Current code on master is much more tricky and error prone due to the huge
> > > amount of potential branches, individual error/cleanup handlers, high
> > > amount of thread dispatching and much more. In the patched version all
> > > these code complexities and error sources are removed.
> > 
> > Come on :) The 9pfs code has been a can of worms from the beginning.
> > It produced more than the average amount of security-related bugs,
> > and most sadly, due to the overall lack of interest, it bitrotted
> > and missed a lot of cool improvements like an appropriate support of
> > unlinked files, QOM-ification of fsdev, conversion of proxy fsdev to
> > vhost-user, hot plug/unplug support, live migration support and
> > "much more"... The performance aspect of things is a full time job
> 
> No, the performance issues are actually very managable in case of 9pfs.
> I already addressed readdir with this patch (by far the worst performance 

They're very manageable if someone cares and spends time. Thanks again
for doing this.

> issue), and then there would just be one more severe performance issue: 
> walkdir.
> 

That doesn't surprise me. It is a big function that may end up doing a
lot of dispatching.

> My intention is not to squeeze out the last fractional percent of performance 
> for 9pfs, but you certainly agree that a simple "ls" blocking for more than 1 
> second is something that should be fixed, and fortunately the amount of 

I never observed such timeouts with my personal toy use of 9p but
you did and this motivated you to step in, which is very welcome.

> changes involved are far less than I originally feared they would.
> 

Fortunately is the word :)

> > I never had the opportunity to even consider. So yes, your changes
> > are likely beneficial but the code base is still extremely fragile
> > and sensible to huge changes... not breaking things that happen
> > to work, even in a sub-optimal way, is essentially what I care for
> > these days.
> 
> And I think I'm also very careful not breaking anything. I carefully consider 
> what to touch and what not. I wrote test cases and I am actively testing my 
> changes with real installations and snapshots as well.
> 

Everyone thinks he's also very careful :)

> I think the cause of disagreements we have are solely our use cases of 9pfs: 
> your personal use case does not seem to require any performance 
> considerations 
> or multi-user aspects, whereas my use case requires at least some minimum 
> performance grade for utilizing 9pfs for server applications.
> 

Your point about the personal use case is right indeed but our disagreements,
if any, aren't uniquely related to that. It's more about maintainership and
available time really. I'm 100% benevolent "Odd fixer" now and I just try
to avoid being forced into fixing things after my PR is merged. If you want
to go faster, then you're welcome to upgrade to maintainer and send PRs.
This would make sense since you care for 9p, you showed a good understanding
of the code and you provided beneficial contributions so far :)

> > > > Oh, so we'd still have the current implementation being used, even
> > > > with this patch applied... This would be okay for a RFC patch but
> > > > in the end I'd really like to merge something that also converts
> > > > v9fs_do_readdir_with_stat().
> > > 
> > > Yes, I know, but I would not recommend mixing these things at this point,
> > > because it would be an entire effort on its own.
> > > 
> > > v9fs_do_readdir_with_stat() is used for 9P2000.u, while v9fs_do_readdir()
> > > is used for 9P2000.L. They're behaving very differently, so it would not
> > > only require me to update v9fs_do_readdir_with_stat() and v9fs_read(), I
> > > would also need to write their own test cases (plural, since there are
> > > none at all yet) and benchmarks, and of course somebody what need to
> > > review all that additional amount of code, and who would actually test
> > > it? I am actively testing my 9P2000.L changes, but I am not actually
> > > using 9P2000.u, so I could not guarantee for the same quality.
> > > 
> > > Considering the current commitment situation regarding 9pfs, we can only
> > > bring things forward with compromises. Hence I suggest I concentrate on
> > > fixing the worst performance issues on 9P2000.L side first, and later on
> > > if there are really people interested in seeing these improvements on
> > > 9P2000.u side and willing to at least help out with testing, then I can
> > > definitely also adjust 9P2000.u side accordingly as well.
> > > 
> > > But to also make this clear: this patch 10 is not introducing any
> > > behaviour
> > > change for 9P2000.u side.
> > 
> > Ok, fair enough to leave 9p2000.U behind for now but I had to ask :)
> > /me is even wondering if we should start deprecating 9p2000.U since
> > most clients can use 9p2000.L now...
> 
> I probably wouldn't. On macOS for instance there's only 9p2000.u. In the end 

Hmm... the only thing I've heard about is:

https://github.com/benavento/mac9p

and unless I'm missing something, this seems to only have a socket based
transport... the server in QEMU is for virtio-9p and Xen 9p devices only.
Unless mac9p seriously plans to implement a driver for either of those,
I'm still not convinced it is worth keeping .U around... and BTW, our
deprecation policy imposes a 2 QEMU versions cooling period before
actually removing the code. This would leave ample time for someone
to speak up.

> 9p2000.L is Linux specific. By deprecating 9p2000.u that would mean a shift 
> towards only supporting Linux guests.
> 
> > > Mmm... maybe later on, as a cleanup patch or something. Current version is
> > > tested. I would like to avoid to make things more complicated than they
> > > already are (i.e. accidental introduction of some bug just due to this).
> > 
> > It seems we might agree on not breaking things that work ;-)
> 
> Of course! All the things I work on 9pfs are just fixes. But that's probably 
> where we start to disagree. :)
> 

:)

> > > > > > +static int do_readdir_lowlat(V9fsPDU *pdu, V9fsFidState *fidp,
> > > > > > +                             struct V9fsDirEnt **entries,
> > > > > > +                             int32_t maxsize, bool dostat)
> > > > > > +{
> > > > > > +    V9fsState *s = pdu->s;
> > > > > > +    V9fsString name;
> > > > > > +    int len, err = 0;
> > > > > > +    int32_t size = 0;
> > > > > > +    off_t saved_dir_pos;
> > > > > > +    struct dirent *dent;
> > > > > > +    struct V9fsDirEnt *e = NULL;
> > > > > > +    V9fsPath path;
> > > > > > +    struct stat stbuf;
> > > > > > +
> > > > > > +    *entries = NULL;
> > > > > > +    v9fs_path_init(&path);
> > > > > > +
> > > > > > +    /*
> > > > > > +     * TODO: Here should be a warn_report_once() if lock failed.
> > > > > > +     *
> > > > > > +     * With a good 9p client we should not get into concurrency
> > > > > > here,
> > > > > > +     * because a good client would not use the same fid for
> > > > > > concurrent
> > > > > > +     * requests. We do the lock here for safety reasons though.
> > > > > > However
> > > > > > +     * the client would then suffer performance issues, so better
> > > > > > log
> > > > > > that
> > > > > > +     * issue here.
> > > > > > +     */
> > > > 
> > > > This should probably be done with qemu_mutex_trylock() in a loop
> > > > directly
> > > > in v9fs_readdir_lock().
> > > 
> > > No definitely not in the loop. That's intentionally *one* lock *outside*
> > > of
> > 
> > Not sure we're talking about the same loop here...
> > 
> > I'm just suggesting that v9fs_readdir_lock() could be something like:
> > 
> > static inline void v9fs_readdir_lock(V9fsDir *dir)
> > {
> >     while (qemu_mutex_trylock(&dir->readdir_mutex)) {
> >         warn_report_once("blah");
> >     }
> > }
> 
> Ah yeah, in fact I got you wrong on this one. I thought you meant to move the 
> lock within do_readdir_lowlat(). About your actual suggestion above ...
> 
> > > the loop. The normal case is not requiring a lock at all, like the comment
> > > describes. Doing multiple locks inside the loop unnecessaririly reduces
> > > performance for no reason.
> > > 
> > > About *_trylock() instead of v9fs_readdir_lock(): might be a candidate to
> > 
> > Hmm... are you suggesting that do_readdir_lowlat() should return if it
> > can't take the lock ?
> 
> ... yep, that's what I had in mind for it later on. I would not mind running 
> trylock in a loop like you suggested; if it fails, log a warning and return.
> Because if the lock fails, then client is buggy. User can read the warning in 
> the logs and report the bug for client to be fixed. Not worth to handle that 
> case in any fancy way by server.
> 

The locking is just a safety net because readdir(3) isn't necessarily
thread safe. It was never my intention to add an error path for that.
And thinking again, sending concurrent Treaddir requests on the same
fid is certainly a weird thing to do but is it really a bug ?

> > > > > > + * Retrieves the requested (max. amount of) directory entries from
> > > > > > the
> > > > > > fs
> > > > > > + * driver. This function must only be called by the main IO thread
> > > > > > (top
> > > > > > half). + * Internally this function call will be dispatched to a
> > > > > > background
> > > > > > IO thread + * (bottom half) where it is eventually executed by the
> > > > > > fs
> > > > > > driver. + *
> > > > > > + * The old readdir implementation above just retrieves always one
> > > > > > dir
> > > > > > entry + * per call. The problem of that implementation above is that
> > > > > > latency is + * added for (retrieval of) each directory entry, which
> > > > > > in
> > > > > > practice lead to + * latencies of several hundred ms for readdir of
> > > > > > only one directory. + * + * This is avoided in this function by
> > > > > > letting
> > > > > > the fs driver retrieve all + * required directory entries with only
> > > > > > call of this function and hence with + * only a single fs driver
> > > > > > request.
> > > > 
> > > > True but these kind of considerations rather belong to the changelog...
> > > > otherwise, we'll have to not forget to kill these lines when the "old
> > > > readdir implementation" is no more.
> > > 
> > > Mmm... I do think this overall latency issue should be clarified somewhere
> > > as
> > The issue itself is best described in a changelog, but...
> > 
> > > a comment. Where exactly is not that important to me. For instance it
> > > could
> > > also be described on v9fs_co_run_in_worker() macro definition in coth.h
> > > instead, as general rule of thumb when dispatching stuff.
> > 
> > ... if you have useful recommendations that don't involve referring to a
> > piece of code that might go away, a comment in coth.h is certainly a good
> > idea.
> 
> Ok, I'll move it to coth.h then.
> 
> > > The thing is: for >10 years several developers obviously did not realize
> > > the severe negative performance impact of frivolously dispatching tasks
> > > to
> > I won't dare to comment on some people I don't know not doing the obvious
> > right things at the time... what I do know is that for >10 years, nobody
> > cared for this code. Period. You're lucky it is still functional ;-)
> 
> This was not about judging individuals, just to express that the latency 
> impact of dispatching does not seem to be obvious for people, and trying to 

Maybe because it isn't that obvious...

> avoid similar mistakes by placing appropriate comment(s).
> 
> > > > > > +                                        struct V9fsDirEnt
> > > > > > **entries,
> > > > > > +                                        int32_t maxsize, bool
> > > > > > dostat)
> > > > 
> > > > s/maxsize/max_count since this is the wording use in the caller.
> > > 
> > > Wouldn't do that. This is the driver side, not the 9p protocol request
> > > handler. As you might have relized before, "max_count" is semantically
> > > completely wrong. This variable does not count integral entries, it is
> > > rather a maximum size (in bytes) of the destination buffer being filled.
> > > 
> > > On 9p request handler side it is ok to use "max_count" instead, because
> > > the
> > > protocol definition uses exactly this term for the request parameter, but
> > > on driver side it's IMO inappropriate.
> > 
> > I agree the max_count wording sucks and I don't care that much to
> > name variables according to protocol definitions, but I do care
> > to use the same name in caller/callee when it makes sense, for a
> > better grep or cscope experience.
> 
> So you insist on that max_count change?
> 

Rather the opposite. Kill max_count and make it maxsize, in a preparatory
patch with the reasons you've exposed.

> Best regards,
> Christian Schoenebeck
> 
> 

Cheers,

--
Greg



reply via email to

[Prev in Thread] Current Thread [Next in Thread]