
Re: [Gnu-arch-users] Re: give us a hand with arch

From: Tom Lord
Subject: Re: [Gnu-arch-users] Re: give us a hand with arch
Date: Sat, 27 Sep 2003 19:19:49 -0700 (PDT)

    > From: Miles Bader <address@hidden>

    >> My belief is that the time-savings would be *negative*, [....]
    > Why do you think that the time-savings would be negative?

    > For the no-cached-inode-state case, [....]
    > For the cached-inode-state case, [....]

We're all still a bit fuzzy.  Summing up observations made by various
people and design points and so forth (and leaving out a column of
subjective judgements on the impact of each method, in combination
with fluoridated water, on our POE):

("manifest-a == an inventory index in {arch}"
 "manifest-b == an inventory index per directory")

                i                  ii              iii

        needs mkpatch/dopatch   space costs     speed costs
        changes; complicates                    of inventory --tags

embedded        no              minimal         inode-signature [1]

explicit        no              maximal         maybe-inode-signature [2]

manifest-a      yes             minimal         traversal-with-stat

manifest-b      yes             minimal         traversal-with-stat

names           no              none            traversal-with-stat

            iv              v

        supports        reliance on
        renames in      tla {mv,rm,etc.}

embedded        yes     very low

explicit        yes     files only (not dirs)

manifest-a      yes     total

manifest-b      yes     files only (not dirs)

names           no      none

[1] "speed costs == inode-signature"

     The worst-case cost (of inventory --tags) involves opening
     approximately one file per source file, doing a short read (or
     two) from it, and closing it.

     A common case cost, however, consists of a traversal-with-stat:
     just stat'ing each source file and reading source directories.
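     To make the inode-signature idea concrete, here is a minimal
     Python sketch, not arch's actual code.  It assumes a signature
     built from (device, inode, size, mtime); the `arch-tag:` marker
     string, the 1024-byte head/tail windows, and the names `TagCache`,
     `inode_sig` and `read_embedded_tag` are all illustrative
     assumptions.  In the common case a lookup costs one stat; only a
     file whose signature changed is opened and read.

```python
import os
import tempfile
from typing import Dict, NamedTuple, Optional, Tuple

TAG_MARKER = "arch-tag:"  # assumed marker string for an embedded tag
WINDOW = 1024             # assumed size of each "short read"


class InodeSig(NamedTuple):
    """Stat fields expected to change when a file is rewritten."""
    dev: int
    ino: int
    size: int
    mtime_ns: int


def inode_sig(path: str) -> InodeSig:
    st = os.stat(path)
    return InodeSig(st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)


def read_embedded_tag(path: str) -> Optional[str]:
    """Worst case: open the file, read its head (and tail), look for a tag."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        chunks = [f.read(WINDOW)]
        if size > WINDOW:  # second short read, from the end of the file
            f.seek(size - WINDOW)
            chunks.append(f.read(WINDOW))
    for chunk in chunks:
        text = chunk.decode("latin-1")
        idx = text.find(TAG_MARKER)
        if idx != -1:
            words = text[idx + len(TAG_MARKER):].split()
            if words:
                return words[0]  # first token after the marker
    return None


class TagCache:
    """Re-read a file's tag only when its inode signature has changed."""

    def __init__(self) -> None:
        self._cache: Dict[str, Tuple[InodeSig, Optional[str]]] = {}
        self.reads = 0  # number of files actually opened and read

    def tag_for(self, path: str) -> Optional[str]:
        sig = inode_sig(path)  # common case: a single stat
        cached = self._cache.get(path)
        if cached is not None and cached[0] == sig:
            return cached[1]
        self.reads += 1
        tag = read_embedded_tag(path)
        self._cache[path] = (sig, tag)
        return tag


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "foo.c")
        with open(path, "w") as f:
            f.write("/* arch-tag: example-id-1234 */\nint x;\n")
        cache = TagCache()
        print(cache.tag_for(path), cache.reads)  # example-id-1234 1
        print(cache.tag_for(path), cache.reads)  # example-id-1234 1
```

     The second lookup hits the cache because nothing in the signature
     changed; rewriting the file changes its size or mtime and forces
     one fresh read.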

[2] "speed costs == maybe-inode-signature"

    Currently, this requires opening and reading one short file per
    source file (the .id file).  Actually, the .id file is currently
    read more than once, but that's an easy optimization to toss in....

    An inode-signature-style optimization applies with a traversal
    cost of 2 stats per source file.  That's not a hugely difficult
    task to implement and would be a good task for somebody who wants
    to learn arch in more detail and "give us a hand with arch".
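    A comparable sketch for the explicit method, in Python.  It
    assumes each source file `dir/name` is tagged by
    `dir/.arch-ids/name.id` and that the id is the first line of that
    file; the names `ExplicitIdCache` and `id_file_for` are
    hypothetical.  Each lookup costs the 2 stats mentioned above (the
    source file and its .id file), and the .id file is read only when
    its own signature changes.

```python
import os
import stat
import tempfile
from typing import Dict, Optional, Tuple

Sig = Tuple[int, int, int, int]  # (dev, ino, size, mtime_ns)


def id_file_for(path: str) -> str:
    """Assumed layout: dir/name is tagged by dir/.arch-ids/name.id."""
    directory, name = os.path.split(path)
    return os.path.join(directory, ".arch-ids", name + ".id")


class ExplicitIdCache:
    def __init__(self) -> None:
        self._cache: Dict[str, Tuple[Sig, str]] = {}
        self.stats = 0  # stat calls issued
        self.reads = 0  # .id files actually opened

    def _stat(self, path: str) -> os.stat_result:
        self.stats += 1
        return os.stat(path)

    def id_for(self, path: str) -> Optional[str]:
        # Stat 1: the source file itself (a plain traversal does this anyway).
        if not stat.S_ISREG(self._stat(path).st_mode):
            return None
        id_path = id_file_for(path)
        try:
            st = self._stat(id_path)  # Stat 2: the explicit .id file.
        except FileNotFoundError:
            return None  # no explicit id for this file
        sig = (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)
        cached = self._cache.get(id_path)
        if cached is not None and cached[0] == sig:
            return cached[1]  # signature unchanged: skip the read
        self.reads += 1
        with open(id_path, encoding="utf-8") as f:
            tag = f.readline().strip()
        self._cache[id_path] = (sig, tag)
        return tag


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        os.mkdir(os.path.join(d, ".arch-ids"))
        src = os.path.join(d, "foo.c")
        with open(src, "w"):
            pass
        with open(id_file_for(src), "w") as f:
            f.write("example-id-1234\n")
        cache = ExplicitIdCache()
        cache.id_for(src)
        cache.id_for(src)
        print(cache.stats, cache.reads)  # 4 1
```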

A partly subjective exercise might be to score all of these tagging
methods other than names:

        Scoring key:

        * i  -- needs mkpatch/dopatch hacks
        ** no:  +CHANGESET
        ** yes: -CHANGESET

        * ii -- space costs
        ** none: +SPACE
        ** minimal: +(7/10)SPACE
        ** maximal: -SPACE

        * iii -- speed costs
        ** traversal-with-stat: +SPEED
        ** (maybe-)inode-signature:  +(1/2)SPEED

        * iv -- supports renames
        ** yes: n/a                     (n/a since only `names' is different)
        ** no: n/a

        * v -- reliance on tla mv, etc.
        ** none: n/a
        ** very low:  +(9/10)CONVENIENCE        (the fractions of 10
        ** files only: +(1/10)CONVENIENCE        ultimately don't matter
                                                 -- see below)
        ** total: -CONVENIENCE

The scores are then:

embedded   =  CHANGESET + 7/10 SPACE + 1/2 SPEED + 9/10 CONVENIENCE
explicit   =  CHANGESET + -SPACE     + 1/2 SPEED + 1/10 CONVENIENCE
manifest-a = -CHANGESET + SPACE      + SPEED     + -CONVENIENCE
manifest-b = -CHANGESET + SPACE      + SPEED     + 1/10 CONVENIENCE

And the debate is (mostly) over which relation (<, >, ==, etc.) goes
in the following (noting that with POE considerations, I might be
asking about manifest-a instead):

        explicit        ??      manifest-b

which is:

        CHANGESET + -SPACE     + 1/2 SPEED + 1/10 CONVENIENCE
                        ??
        -CHANGESET + SPACE     + SPEED     + 1/10 CONVENIENCE

which is (dropping the 1/10 CONVENIENCE and 1/2 SPEED common to both sides):

        CHANGESET + -SPACE     ??      -CHANGESET + SPACE + 1/2 SPEED

which is:

           explicit      ??     manifest-b

        2 * CHANGESET    ??     2 * SPACE + 1/2 SPEED
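As a cross-check on the algebra, here is a small Python sketch that
stores the four scores above as exact coefficient vectors (using
`fractions.Fraction`) and subtracts them.  The representation and the
helper names `difference` and `render` are illustrative choices only.

```python
from fractions import Fraction as F

PRIZES = ("CHANGESET", "SPACE", "SPEED", "CONVENIENCE")

# Coefficients copied from the scores above, in PRIZES order.
SCORES = {
    "embedded":   (F(1),  F(7, 10), F(1, 2), F(9, 10)),
    "explicit":   (F(1),  F(-1),    F(1, 2), F(1, 10)),
    "manifest-a": (F(-1), F(1),     F(1),    F(-1)),
    "manifest-b": (F(-1), F(1),     F(1),    F(1, 10)),
}


def difference(a: str, b: str) -> dict:
    """Coefficients of score(a) - score(b); a wins iff the weighted sum is > 0."""
    return {p: x - y for p, x, y in zip(PRIZES, SCORES[a], SCORES[b])}


def render(coeffs: dict) -> str:
    """Join the non-zero terms in the same style as the text above."""
    return " + ".join(f"{c} {p}" for p, c in coeffs.items() if c != 0)


if __name__ == "__main__":
    print(render(difference("explicit", "manifest-b")))
    # 2 CHANGESET + -2 SPACE + -1/2 SPEED
```

Moving the two negative terms to the other side gives exactly the
comparison 2 * CHANGESET ?? 2 * SPACE + 1/2 SPEED.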

I don't think "SPACE" is worth very much in today's economy.
(Remember that the speed implications of how much disk space is used
are scored under SPEED, not SPACE.  The SPACE prize just reflects the
bucks-per-bit cost of storage.)  I'd be pretty comfortable treating
the above as:

           explicit      ??     manifest-b

        2 * CHANGESET    ??     2 * epsilon + 1/2 SPEED

or, approximately:

        2 * CHANGESET    ??     1/2 SPEED

        4 * CHANGESET    ??     SPEED

Explicit is better than manifest-b unless the SPEED prize is worth
more than 4 times the CHANGESET prize.
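The break-even point can be checked with plain arithmetic.  This Python
sketch plugs prize weights into 2 * CHANGESET ?? 2 * SPACE + 1/2 SPEED;
the weights in the usage example are made-up inputs (the prizes above
are deliberately left unpriced), and the function names are
hypothetical.

```python
from fractions import Fraction


def compare_explicit_manifest_b(changeset, speed, space=0):
    """Evaluate 2*CHANGESET ?? 2*SPACE + 1/2*SPEED for given prize weights.

    Returns '>' if explicit scores higher, '<' if manifest-b does, '==' on a tie.
    SPACE defaults to 0, i.e. the 'epsilon' simplification made above.
    """
    lhs = 2 * Fraction(changeset)
    rhs = 2 * Fraction(space) + Fraction(speed) / 2
    if lhs > rhs:
        return ">"
    if lhs < rhs:
        return "<"
    return "=="


def breakeven_speed(changeset, space=0):
    """SPEED weight at which the two methods tie: 4*CHANGESET - 4*SPACE."""
    return 4 * Fraction(changeset) - 4 * Fraction(space)


if __name__ == "__main__":
    for speed in (3, 4, 5):
        print(speed, compare_explicit_manifest_b(changeset=1, speed=speed))
    # 3 >
    # 4 ==
    # 5 <
```

With SPACE treated as zero, the tie sits at SPEED == 4 * CHANGESET, and
any non-zero SPACE prize lowers that threshold.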

Over time, the SPEED prize clearly shrinks as hardware gets better.

And the SPEED prize clearly shrinks at least a little every time
there's a good reason to want a new changeset utility or a new
mkpatch/dopatch.

I think it's clear that I made the correct _long_term_ choice.

The question is how long term, and how do these prize values translate
into economic terms, whether money or labor?

One might argue: "hey, I have this box here, I want to use arch, I
want to hack LK, and the SPEED prize is, as far as I'm concerned,
infinite (i.e., I won't use arch without it)."

(I think such arguments are a bit premature -- the readily available
expansions of the inode-sig hacks could very well change your mind and
would take far less work.)

It's a little hard to tell whether any argument about the relative
size of the CHANGESET and SPEED penalties for any particular use
really holds water without some actual development, both of additional
inode-signature optimizations and of a manifest-b implementation.

I'd also be more comfortable if the people engaging in arguments in
this area displayed more awareness of the issues surrounding


    > reading lots of little files is just about always a lot slower than
    > reading one still-pretty-little file; remember, ids are _short_, you can
    > probably pack about 80 of them into a _single_ disk block on a typical
    > ext2 filesystem.

Remember that once the inode-signature optimizations for inventory
are extended to cover explicit, explicit will (in some common cases at
least) wind up doing work much closer to that of manifest-b: roughly
2x the file stats, and that's all.

    > see you still win with one-.arch-ids-file-per-directory.  

It's a bit hard to quantify, and the answer is going to change quite
a bit in relatively few years.  And you're glossing over the CHANGESET
prize.

    > Am I missing some obvious point???

No, just the above.

