Re: [Monotone-devel] Re: user-friendly hash formats, redux

From: Nathan Myers
Subject: Re: [Monotone-devel] Re: user-friendly hash formats, redux
Date: Mon, 6 Dec 2004 23:22:27 -0800
User-agent: Mutt/1.3.28i

On Sun, Dec 05, 2004 at 02:39:25PM -0500, graydon hoare wrote:
> what I'm unwilling to do is make deep changes without giving a fair
> bit  of thought to what we're trying to accomplish. so I'm glad we're 
> discussing. I don't enjoy someone throwing down the gauntlet and 
> deciding there is One True Way of doing things. there are usually 
> many ways.

This attitude gives me confidence in the direction of the project and 
hope for its ultimate success.

> I see there as being 3 issues to decide:
> 1. whether hashes-as-identifiers are acceptable, from a human factors
>    perspective, as an *implementation* technique alone. eg. is it
>    acceptable for debug logs and whatnot to contain hashes, and for
>    some parts of the manual to describe hashes in passing, if they are
>    sufficiently rare in user input and output?

To me this is uncontroversial.  However, I see no merit at all in
the hexadecimal presentation of hashes.  The only issue I see is that
it is not instantly obvious what the best alternative to hex is.

On that note, I have posted my first word list, with enough entries
(1024) for each word to encode ten bits.  It turned out to be surprisingly
difficult to filter and then extend "grep '^...$' /usr/dict/words" out 
to 1024 entries.  Not all are English; some heavily-used Latin, Spanish, 
French, German, numbers, and abbreviations appear.  I've made no effort
to keep them aurally distinguishable -- e.g., both Tao and tau, cay and
Kay (and que) are there.

If you disagree with any choices, please propose replacements.  If 
necessary, the list might be culled to 512 entries, encoding only nine
bits per word.
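
A minimal Python sketch of the filtering and sizing steps, assuming the list is kept one word per line. The helper names are hypothetical. `three_letter_candidates` roughly mirrors the `grep '^...$'` filter and also drops entries that differ only in capitalization, since typed input would normally omit caps. `bits_per_word` shows why the list must hold a power-of-two number of entries: 1024 give ten bits per word, 512 give nine.

```python
from typing import Iterable, List


def three_letter_candidates(lines: Iterable[str]) -> List[str]:
    """Keep lines exactly three characters long (like grep '^...$') and
    drop later entries that differ from an earlier one only in case."""
    seen = set()
    words = []
    for line in lines:
        word = line.rstrip("\n")
        if len(word) != 3:
            continue
        key = word.lower()  # typed input normally omits caps
        if key in seen:
            continue
        seen.add(key)
        words.append(word)
    return words


def bits_per_word(words: List[str]) -> int:
    """Bits each word encodes; the list length must be a power of two."""
    n = len(words)
    if n == 0 or n & (n - 1):
        raise ValueError(f"{n} entries is not a power of two; cull or extend the list")
    return n.bit_length() - 1


if __name__ == "__main__":
    sample = ["tao\n", "Tau\n", "TAO\n", "dog\n", "frump\n"]
    print(three_letter_candidates(sample))  # ['tao', 'Tau', 'dog']
    print(bits_per_word([f"w{i}" for i in range(1024)]))  # 10
    print(bits_per_word([f"w{i}" for i in range(512)]))  # 9
```

Note that "Tao" and "tau" both survive this filter. It checks only case-insensitive spelling, not how the words sound.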

Coupled with my proposed per-repository parameter U -- a pessimistic 
lower bound on the number of bits necessary to uniquely distinguish 
all entries in a repository, and used for non-verbose presentation -- 
hashes would typically look like "GalPerMooVia", which is hardly scary 
at all, vs. the equivalent "0xfe14c03d1a" -- albeit still a bit weird.  
Coupled with an indicator of the actual minimum unique substring, 
e.g. "GalPer.MooVia" (vs. "0xfe14c.03d1a") it could actually become 
more convenient to type hashes than more indicative aliases.  (Of 
course when typing one would normally omit caps, e.g. "fooper".)
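
The presentation described above can be sketched in Python. This is an illustration under stated assumptions, not Monotone code. The hash is a hex string, and `u_bits` stands in for the proposed parameter U. Each word carries log2(len(words)) bits, taken from the most significant end of the hash. The dot is placed after the fewest leading words that no other entry shares. All function names are hypothetical, and a placeholder word list is used instead of the real one.

```python
from typing import List, Sequence


def hash_to_words(hex_hash: str, words: Sequence[str], u_bits: int) -> List[str]:
    """Encode the leading u_bits of a hex hash, rounded up to whole words."""
    n = len(words)
    if n == 0 or n & (n - 1):
        raise ValueError("word list length must be a power of two")
    k = n.bit_length() - 1            # bits per word, e.g. 10 for 1024 words
    total_bits = 4 * len(hex_hash)
    count = -(-u_bits // k)           # ceiling division: words needed for U bits
    if count * k > total_bits:
        raise ValueError("hash is too short for the requested number of bits")
    value = int(hex_hash, 16)
    chunks = []
    for i in range(count):
        shift = total_bits - (i + 1) * k   # k bits at a time, high bits first
        chunks.append(words[(value >> shift) & (n - 1)].capitalize())
    return chunks


def min_unique_words(target: List[str], others: List[List[str]]) -> int:
    """Fewest leading words of target that no other entry's encoding shares."""
    for m in range(1, len(target) + 1):
        if all(other[:m] != target[:m] for other in others):
            return m
    raise ValueError("not unique at this length; U is too small")


def render(chunks: List[str], unique: int) -> str:
    """Join the words, marking the end of the minimum unique prefix with '.'."""
    if unique >= len(chunks):
        return "".join(chunks)
    return "".join(chunks[:unique]) + "." + "".join(chunks[unique:])


if __name__ == "__main__":
    toy_words = [f"w{i}" for i in range(1024)]  # placeholder for the real list
    mine = hash_to_words("fe14c03d1a", toy_words, u_bits=40)
    theirs = hash_to_words("fe00000000", toy_words, u_bits=40)
    print(mine)  # ['W1016', 'W332', 'W15', 'W282']
    print(render(mine, min_unique_words(mine, [theirs])))  # W1016W332.W15W282
```

Rounding U up to whole words is one design choice among several. With ten bits per word, U = 40 gives four words, matching the length of the "GalPerMooVia" example.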

As noted earlier, there may be a need to collect and distribute word 
lists for other locales.  It would be awkward to have similar but not
identical lists in circulation, so we would need confidence in a list 
before distributing it.

> 2. whether the UI accepts something which is more user-friendly to
>    input.

User-assigned aliases, at least.  Automatically generated aliases
are, evidently, a research project; some experiments will have to 
fail before we know more.  It seems to me the most important 
consideration is not to attempt the impossible.  A stable naming 
scheme that works within one repository is a reasonable, and hard 
enough, goal.  Between repositories we have hashes and tags.

> for #2, I already made the UI accept something more user-friendly than 
> hashes, but it's apparently not enough. 

If monotone were to write out the (currently) shortest version of the
alias that uniquely selects the item, then it would be easier to get
users to use such names.
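
One way to compute "the (currently) shortest version of the alias that uniquely selects the item" is a shortest-unique-prefix search over all aliases in the repository. This can be paired with a resolver that accepts such prefixes on input. The Python below is a hypothetical sketch. It assumes aliases are plain strings and that an exact match wins over a prefix match, which is needed when one alias is a prefix of another (e.g. `fix` and `fix-2`).

```python
from typing import List


def shortest_unique_prefix(name: str, names: List[str]) -> str:
    """Shortest prefix of name that no other alias starts with.

    If every prefix is shared (name is itself a prefix of another alias),
    the full name is returned and must be resolved by exact match.
    """
    others = [n for n in names if n != name]
    for length in range(1, len(name) + 1):
        prefix = name[:length]
        if not any(o.startswith(prefix) for o in others):
            return prefix
    return name


def resolve(typed: str, names: List[str]) -> str:
    """Map user input back to an alias: exact match first, then unique prefix."""
    if typed in names:
        return typed
    matches = [n for n in names if n.startswith(typed)]
    if len(matches) == 1:
        return matches[0]
    raise LookupError(f"{typed!r} matches {len(matches)} aliases")


if __name__ == "__main__":
    aliases = ["release-1.0", "release-2.0", "refactor", "fix", "fix-2"]
    for a in aliases:
        print(a, "->", shortest_unique_prefix(a, aliases))
    # release-1.0 -> release-1
    # release-2.0 -> release-2
    # refactor -> ref
    # fix -> fix
    # fix-2 -> fix-
```

The result depends on the current set of aliases, so a prefix printed today may become ambiguous once a new alias is added. This is why the shortest form is only "currently" unique.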

>   - perhaps bibblebabble or nathan's dictionary approach would help.
>     I've resisted this because I don't feel like it adds much
>     *meaning* to identifiers; I think people would in fact find it
>     easier to look at monotone as some crazy moon-man system if it
>     insisted on calling your versions "dog-wall-stink-egg" or
>     "fwee-bazoo-frump-gorf". at least 0x9f798f98ea is somewhat clearly
>     a number, albeit an unfriendly one.

We don't seem to have a choice about using something more or less
off-putting when hashes are involved.  The best we can do is make them
not too unpleasant to use in those contexts where they are needed.  
Certainly revision hashes shouldn't be needed for local operations 
where automatically-generated aliases suffice, so maybe they need 
not always appear in default output, although I imagine there are 
places where you would want to see them anyway (e.g. to identify 
branch heads).

> and we ask a hook for your preferences as far as which to print out 
> (possibly all three) when listing logs, status, etc.

This seems to avoid the careful work of choosing a good default for 
each command, according to how it is normally used.  Assigning commands 
to families that share a policy might lead to good choices.  
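
A sketch of the command-family idea follows. It is in Python for consistency with the other examples, although Monotone's own hooks are written in Lua. The family names, the command-to-family table, and the defaults are hypothetical illustrations, not an existing Monotone configuration. The command names serve only as examples. A user's hook, when present, overrides the family default.

```python
from enum import Flag, auto
from typing import Optional


class Show(Flag):
    """Which identifier forms a command prints."""
    ALIAS = auto()   # user-assigned or generated alias
    HASH = auto()    # the revision hash, in whatever presentation is chosen
    WORDS = auto()   # word-list encoding of the hash


# Hypothetical grouping of commands into families that share a policy.
FAMILY_OF = {
    "status": "working-copy",
    "diff": "working-copy",
    "log": "history",
    "heads": "branch",
    "merge": "branch",
}

# Hypothetical defaults: local operations get aliases; branch-level commands
# also show hashes, e.g. to identify branch heads.
FAMILY_DEFAULT = {
    "working-copy": Show.ALIAS,
    "history": Show.ALIAS,
    "branch": Show.ALIAS | Show.HASH,
}

FALLBACK = Show.ALIAS | Show.HASH  # for commands not yet assigned a family


def display_policy(command: str, hook_override: Optional[Show] = None) -> Show:
    """Pick the identifier forms to print for a command."""
    if hook_override is not None:  # an explicit user preference always wins
        return hook_override
    family = FAMILY_OF.get(command)
    return FAMILY_DEFAULT.get(family, FALLBACK)
```

Grouping keeps the number of decisions small, because a default is chosen once per family rather than once per command. The hook remains an escape hatch for users who disagree.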


> > No, we're _not_ doing OK.  We are falling farther and farther behind 
> > Arch and Subversion.  Arch is held back mainly by its incidental 
> > weirdnesses in, e.g., file name conventions.  In public discussion 
> > on SCM systems, Monotone is rarely more than a footnote.  Even Darcs 
> > gets more respect!
> I used to worry a lot about this.  [...]
> there is some sort of population-genetics equation defining the number 
> of competitors which can coexist in a niche. it is something related to 
> network effects, size of the niche, amount of variation in the niche, 
> barriers to entry of new competitors, and "inertial" costs of 
> maintaining and adapting large general competitors -vs- small 
> specialized competitors.

This is what worries me.  People on this list are comfortable with
the idea of experimenting with SCM systems.  The overwhelming majority
of software development, though, is done under circumstances where:

1. The choice of SCM system is made once, company-wide or project-wide.
   Divisions, companies, or subprojects merged in are forced to replace 
   their SCM with the company choice, most likely the "most popular" one.

2. Software developers have, as a rule, no interest in SCM as anything 
   but a job requirement.  They learn the minimum necessary to work 
   with the system they are assigned, and resist learning a new one.  
   (I feel this way about CVS, and am only motivated to get off it.)

3. Project leaders may be unable to change SCM once it is chosen because
   of the difficulty of getting all the participants up to speed on the
   new system.

We are in a unique position now.  CVS is so nearly intolerable that 
people and organizations are motivated to abandon it as soon as inertia 
and the stability of the alternatives permit.  Once they switch to 
something tolerable they will never switch again if they can possibly 
avoid it.  At some point familiarity with Monotone OR Subversion will 
be an essential resume checklist item.  Administrators will need to 
have managed the popular one, so will install it even against their own
better judgment.

It will be years before most outfits drop CVS.  However, the great bulk 
of them will jump to whichever seems to be most popular at the time 
they begin to consider the switch, so unless a dark horse is gaining 
by leaps and bounds at the time (and thus smells of The Future), their 
"choice" will only amplify any imbalance.  

Nathan Myers
