Re: [PATCH v6] 9pfs: use GHashTable for fid table
From: Christian Schoenebeck
Subject: Re: [PATCH v6] 9pfs: use GHashTable for fid table
Date: Thu, 06 Oct 2022 18:12:38 +0200
On Wednesday, 5 October 2022 11:38:39 CEST Christian Schoenebeck wrote:
> On Tuesday, 4 October 2022 14:54:16 CEST Christian Schoenebeck wrote:
> > On Tuesday, 4 October 2022 12:41:21 CEST Linus Heckemann wrote:
> > > The previous implementation would iterate over the fid table for
> > > lookup operations, resulting in an operation with O(n) complexity on
> > > the number of open files and poor cache locality -- for every open,
> > > stat, read, write, etc operation.
> > >
> > > This change uses a hashtable for this instead, significantly improving
> > > the performance of the 9p filesystem. The runtime of NixOS's simple
> > > installer test, which copies ~122k files totalling ~1.8GiB from 9p,
> > > decreased by a factor of about 10.
> > >
> > > Signed-off-by: Linus Heckemann <git@sphalerite.org>
> > > Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> > > Reviewed-by: Greg Kurz <groug@kaod.org>
> > > Message-Id: <20220908112353.289267-1-git@sphalerite.org>
> > > [CS: - Retain BUG_ON(f->clunked) in get_fid().
> > >
> > > - Add TODO comment in clunk_fid(). ]
> > >
> > > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> > > ---
> >
> > In general: LGTM now, but I will definitely go for some longer test runs
> > before queuing this patch. Some minor side notes below ...
>
> So I ran a compilation marathon with 9p as root fs overnight. The first
> couple of hours went smoothly, but after about 12 hours 9p became unusable
> with the error:
>
> Too many open files
>
> The question is, is that a new issue introduced by this patch? I.e. does it
> break the reclaim fd code? Or is that rather unrelated to this patch, and a
> problem we already had?
>
> Linus, could you look at this? It would probably make sense to force getting
> into this situation much earlier like:
>
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index aebadeaa03..0c104b81e1 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -4330,6 +4330,6 @@ static void __attribute__((__constructor__)) v9fs_set_fd_limit(void)
>          error_report("Failed to get the resource limit");
>          exit(1);
>      }
> -    open_fd_hw = rlim.rlim_cur - MIN(400, rlim.rlim_cur / 3);
> +    open_fd_hw = rlim.rlim_cur - MIN(50, rlim.rlim_cur / 3);
>      open_fd_rc = rlim.rlim_cur / 2;
>  }
>
> I can't remember that we had this issue before, so there might still be
> something wrong with this GHashTable patch here.
A much easier reproducer, with no source changes required whatsoever:
prlimit --nofile=140 -- qemu-system-x86_64 ...
And I actually get this error without this patch as well, which suggests that
we already had a bug in the reclaim FDs code before? :/
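
For a sense of scale, the thresholds that the quoted `v9fs_set_fd_limit()` hunk derives from the soft limit can be worked out by hand. The sketch below only mirrors the two formulas from that hunk. The helper names `calc_open_fd_hw` and `calc_open_fd_rc` and the `reserve` parameter are hypothetical and not part of QEMU.

```c
/* Smaller of two values; stands in for the MIN() macro used in the hunk. */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

/*
 * Hypothetical helper mirroring:
 *   open_fd_hw = rlim.rlim_cur - MIN(reserve, rlim.rlim_cur / 3);
 * where reserve is 400 in the original code and 50 in the quoted diff.
 */
static unsigned long calc_open_fd_hw(unsigned long rlim_cur,
                                     unsigned long reserve)
{
    return rlim_cur - min_ul(reserve, rlim_cur / 3);
}

/*
 * Hypothetical helper mirroring:
 *   open_fd_rc = rlim.rlim_cur / 2;
 */
static unsigned long calc_open_fd_rc(unsigned long rlim_cur)
{
    return rlim_cur / 2;
}
```

With `--nofile=140`, `calc_open_fd_hw(140, 400)` and `calc_open_fd_hw(140, 50)` both give 94, because 140 / 3 = 46 is below either reserve, and `calc_open_fd_rc(140)` gives 70. So at this limit the original and the modified hunk yield identical thresholds, whereas at a soft limit of 1024 they would differ (683 versus 974).
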
Anyway, as it seems that this bug was not introduced by this particular patch,
and with the unnecessary `goto` and `out:` label removed:
Queued on 9p.next:
https://github.com/cschoenebeck/qemu/commits/9p.next
Best regards,
Christian Schoenebeck