bug#24937: "deleting unused links" GC phase is too slow
From: Ludovic Courtès
Subject: bug#24937: "deleting unused links" GC phase is too slow
Date: Tue, 13 Dec 2016 18:02:18 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
Hello Mark,
Mark H Weaver <address@hidden> wrote:
> address@hidden (Ludovic Courtès) writes:
>
>> I did some measurements with the attached program on chapters, which is
>> a Xen VM with spinning disks underneath, similar to hydra.gnu.org. It
>> has 600k entries in /gnu/store/.links.
>
> I just want to point out that 600k inodes use 150 megabytes of disk
> space on ext4, which is small enough to fit in the cache, so the disk
> I/O will not be multiplied for such a small test case.
Right. That’s the only spinning-disk machine I could access without
problem. :-/
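As a rough check of the 150-megabyte figure quoted above, the arithmetic can be sketched as follows. The 256-byte inode size is an assumption (it is the usual mke2fs default for ext4, but the thread does not state it), and the function name is purely illustrative.

```python
# Rough estimate of the space taken by on-disk inodes on the test machine.
# ASSUMPTION: 256-byte inodes (common ext4 default); not stated in the thread.
INODE_SIZE_BYTES = 256


def inode_table_bytes(inode_count, inode_size=INODE_SIZE_BYTES):
    """Return the number of bytes occupied by `inode_count` on-disk inodes."""
    return inode_count * inode_size


# 600k entries in /gnu/store/.links, as reported for the test machine.
chapters_bytes = inode_table_bytes(600_000)
print(f"{chapters_bytes / 1e6:.1f} MB")  # 153.6 MB
```

Under that assumption the whole set of inodes is small enough to sit in the page cache, which is why this test case does not exercise the disk much.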
Ricardo, Roel: would you be able to run that links-traversal.c from
<https://debbugs.gnu.org/cgi/bugreport.cgi?filename=links-traversal.c;bug=24937;msg=25;att=1>
on a machine with a big store, as described at
<https://debbugs.gnu.org/cgi/bugreport.cgi?bug=24937#25>?
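For readers without access to the attachment, here is a minimal sketch of the kind of traversal being measured. It is not the attached links-traversal.c. It assumes that an entry counts as unused when its hard-link count is 1, and that the ordering under discussion is by inode number; `scan_links` and `demo` are hypothetical names, and the demo runs on a temporary directory rather than on /gnu/store/.links.

```python
import os
import tempfile


def scan_links(directory, sort_by_inode=True):
    """Return (total, unused) for the entries of `directory`.

    ASSUMPTION: an entry is "unused" when its hard-link count is 1,
    i.e. nothing outside `directory` refers to the same inode.
    """
    # Reading the directory yields inode numbers without a stat call per entry.
    with os.scandir(directory) as it:
        entries = [(entry.inode(), entry.path) for entry in it]

    if sort_by_inode:
        # Visit inodes in increasing order, hoping to reduce disk seeks.
        entries.sort()

    unused = 0
    for _inode, path in entries:
        if os.lstat(path).st_nlink == 1:
            unused += 1
    return len(entries), unused


def demo():
    """Build a tiny links directory: three files, two of them hard-linked elsewhere."""
    with tempfile.TemporaryDirectory() as root:
        links = os.path.join(root, "links")
        store = os.path.join(root, "store")
        os.mkdir(links)
        os.mkdir(store)
        for name in ("a", "b", "c"):
            with open(os.path.join(links, name), "w") as f:
                f.write(name)
        for name in ("a", "b"):
            os.link(os.path.join(links, name), os.path.join(store, name))
        return scan_links(links)


print(demo())  # (3, 1)
```

Whether sorting helps in practice depends on the file system and on whether the inodes fit in the cache, which is exactly what the measurements requested above are meant to show.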
>> Semi-interleaved is ~12% slower here (not sure how reproducible that is
>> though).
>
> This directory you're testing on is more than an order of magnitude
> smaller than Hydra's when it's full. Unlike in your test above, all of
> the inodes in Hydra's store won't fit in the cache.
Good point. I’m trying my best to get performance figures; there’s no
doubt we could do better!
> In my opinion, the reason Hydra performs so poorly is because efficiency
> and scalability are apparently very low priorities in the design of the
> software running on it. Unfortunately, I feel that my advice in this
> area is discarded more often than not.
Well, as you know, I’m currently traveling, yet I take the time to
answer your email at night; I think this should suggest that far from
discarding your advice, I very much value it.
I’m a maintainer though, so I’m trying to understand the problem better.
It’s not just about finding the “optimal” solution, but also about
finding a tradeoff between the benefits and the maintainability costs.
>> sort.c in Coreutils is very big, and we surely don’t want to duplicate
>> all that. Yet, I’d rather not shell out to ‘sort’.
>
> The "shell" would not be involved here at all, just the "sort" program.
> I guess you dislike launching external processes? Can you explain why?
I find that passing strings around among programs is inelegant
(subjective), but I don’t think you’re really looking to argue about
that, are you? :-)
It remains that, if invoking ‘sort’ appears to be preferable *both* from
performance and maintenance viewpoints, then it’s a good choice. That
may be the case, but again, I prefer to have figures to back that.
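To make the distinction in the quoted text concrete, here is a small sketch of running the ‘sort’ program directly, with no shell in between. It only illustrates the mechanism and is not how the daemon does or would do it; `sort_numbers_externally` is a hypothetical helper, and the sample numbers are made up.

```python
import os
import subprocess


def sort_numbers_externally(numbers):
    """Sort integers by piping them through the external `sort` program."""
    data = "".join(f"{n}\n" for n in numbers)
    result = subprocess.run(
        ["sort", "-n"],  # argument list: the program is executed directly, no shell
        input=data,
        capture_output=True,
        text=True,
        check=True,
        env={**os.environ, "LC_ALL": "C"},  # fixed locale for predictable output
    )
    return [int(line) for line in result.stdout.splitlines()]


# Made-up inode numbers, used only to compare against an in-process sort.
inode_numbers = [4021, 17, 993, 17, 250000]
assert sort_numbers_externally(inode_numbers) == sorted(inode_numbers)
```

The in-process `sorted` call in the last line is the alternative being weighed: it avoids passing strings between programs, while the external program can fall back on temporary files when the input does not fit in memory.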
>> Do you know how many entries are in .links on hydra.gnu.org?
>
> "df -i /gnu" indicates that it currently has about 5.5M inodes, but
> that's with only 29% of the disk in use. A few days ago, when the disk
> was full, assuming that the average file size is the same, it may have
> had closer to 5.5M / 0.29 ~= 19M inodes,
OK, good to know.
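The extrapolation in the quoted text can be written out as below. The inode count and disk usage come from the quote; the 256-byte inode size is again an assumption (the usual ext4 default), used only to give a sense of how much inode data that would be.

```python
# Figures from the quoted "df -i /gnu" output.
inodes_now = 5.5e6          # inodes currently in use
disk_fraction_used = 0.29   # 29% of the disk in use

# Assumes the average file size stays the same as the disk fills up.
inodes_when_full = inodes_now / disk_fraction_used
print(f"{inodes_when_full / 1e6:.1f}M inodes")  # 19.0M inodes

# ASSUMPTION: 256-byte inodes (common ext4 default); not stated in the thread.
INODE_SIZE_BYTES = 256
inode_bytes_when_full = inodes_when_full * INODE_SIZE_BYTES
print(f"{inode_bytes_when_full / 1e9:.1f} GB of inodes")  # 4.9 GB of inodes
```

Under those assumptions the inode data on a full store is roughly thirty times larger than on the 600k-entry test machine, so it is much less likely to stay cached.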
Thanks!
Ludo’.