Re: [Monotone-devel] Re: user-friendly hash formats, redux

From: Nathan Myers
Subject: Re: [Monotone-devel] Re: user-friendly hash formats, redux
Date: Wed, 8 Dec 2004 20:36:43 -0800
User-agent: Mutt/1.3.28i

On Tue, Dec 07, 2004 at 11:36:16AM +0200, Oren Ben-Kiki wrote:
> On Tuesday 07 December 2004 09:22, Nathan Myers wrote:
> Some of these words are only known to Scrabble addicts, if they 
> are indeed words (random selection: 'fra', 'sho', 'ums', 'wog'...).

Even if you couldn't identify a painting by Fra Angelico, you have
probably heard of him.  You almost certainly say "um" now and again; 
your speech may be peppered with ums.  "Wog" is common British slang 
for the locals in a colony.  "Sho", like "sno" and "gro", is extremely
common in trademarks; look in your yellow pages or in any supermarket.

It doesn't much matter if some of the words are not OED.  You've seen 
them all, or words very like them, often enough that it won't take any 
conscious attention to use them even where you can't put your finger
on a definition.  Thus, "bux", "wix" and "dux" cause no trouble in
practice even if you know of no trademarks that use them.

> > If you disagree with any choices, please propose replacements.  If
> > necessary, the list might be culled to encode only nine bits per
> > word, instead.
> Nine bits seems neither here nor there - if you go down to 9 bits per 
> word you'd need an extra word to get to the 40-bit "safe" limit; you 
> might as well get down to 8 bits, using only phonetically distinct 
> "real" words.

It was already established that 36 bits ought to be enough.  Eight 
bits per word would allow a lot more latitude, though.  One could 
try to make blatantly off-color combinations impossible, although 
that seems ultimately futile -- imaginations are fertile.  (Anyway, 
off-color sequences are better, mnemonically.)  It might be more 
helpful to eliminate homonyms, although that's probably almost as

The need to express hashes with no possible ambiguity over the phone 
has to be rare.  In phone conversations you might mention the JimBob 
revision.  When you're both looking at the same graph, it's almost as 
unlikely that your correspondent will think you meant the GymBob 
revision as it is that two revisions' hashes both start with JimBob
anyway.  In either case, you just add another word, or in extremis
maybe two more.

If we find sound reasons to prefer an 8-bit encoding, it's easy to 
produce lists for it, and 8-bit hash fragments would be almost
as useful as 10-bit ones.
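
To make the bit-counting above concrete, here is a minimal sketch of
turning a hash prefix into words.  The word list is a generated
placeholder (consonant-vowel-consonant syllables), not the list proposed
in this thread, and the name hash_to_words and its parameters are
assumptions made only for illustration.

```python
import math

# Placeholder word list: 16 * 4 * 16 = 1024 pronounceable syllables.
# This is NOT the list discussed in the thread, only a stand-in.
CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aiou"
WORDS = [a + v + b for a in CONSONANTS for v in VOWELS for b in CONSONANTS]


def hash_to_words(hex_hash, bits_per_word=10, min_bits=36):
    """Encode the leading bits of hex_hash as a list of words.

    Uses the smallest number of words that covers at least min_bits.
    Only the first 2**bits_per_word entries of WORDS are ever used.
    """
    if not 1 <= bits_per_word <= 10:
        raise ValueError("placeholder list supports 1 to 10 bits per word")
    n_words = math.ceil(min_bits / bits_per_word)
    needed = n_words * bits_per_word
    total_bits = len(hex_hash) * 4
    if needed > total_bits:
        raise ValueError("hash is too short for the requested bits")
    # Keep only the leading `needed` bits of the hash.
    prefix = int(hex_hash, 16) >> (total_bits - needed)
    mask = (1 << bits_per_word) - 1
    words = []
    for i in range(n_words):
        shift = (n_words - 1 - i) * bits_per_word
        words.append(WORDS[(prefix >> shift) & mask])
    return words
```

With a 36-bit target, 9 or 10 bits per word both need four words and
8 bits per word needs five; with a 40-bit target, 9 bits per word also
needs five.  Because words are taken from the front of the hash, asking
for more bits only appends words, which is the "just add another word"
case.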

> That said, I'm less convinced that this approach is necessary in the 
> first place, given that CVS-like cross-db stable revision ids are 
> achievable (using the branch/fork owner's E-mail address).

That makes a lot of assumptions.  It assumes there is only one project
tree in a repository, or that one must identify in commands which 
project, i.e. main trunk, is being operated on.  (Probably there are
lots of places where it would be hard to open a separate port, and 
maintain a separate repository, for each project.)  It assumes that 
a user is only operating on one branch, from one other repository.  
It assumes your e-mail address is not, itself, inconveniently long,
or unpleasantly ambiguous in its convenient shortened form.

> > Automatically generated aliases 
> > are, evidently, a research project; some experiments will have to
> > fail before we know more.  It seems to me the most important
> > consideration is not to attempt the impossible.  A stable naming
> > scheme that works within one repository is a reasonable, and hard
> > enough, goal.  Between repositories we have hashes and tags.
> I'm not convinced that having cross-repository stable ids is a lost 
> cause. I think that CVS-like branch/fork numbering using the (prefix of 
> the) author's E-mail as the fork identifier does achieve both goals. 
> The key advantage here is that revision relationships are inherent in 
> the ids. All other methods only give you a unique id, period. I think 
> that's important enough to warrant some experimentation before we give 
> up and settle for some form of "random" unique ids, short and nice as 
> they may be.

I don't believe hashes can be avoided entirely.  That's not to say an 
enormous amount of work cannot be wasted trying.  Still, we will end up 
with both, after all.  Conceptual corruptions in support of avoiding 
them will be regretted when it's too late to fix them, and will also 

That's not to say that other notations won't be useful.  Rather, we
shouldn't load more weight on them than they can carry without 
getting heavier than the hashes they try to subsume.  For example, 
in any naming scheme it helps to have a suffix that says "parent", 
that may be repeated to indicate parent's parent etc., and similarly 
for children.  Merges and branches would make such references 
ambiguous, but that's OK if you have hashes to fall back on.  When 
an ambiguous reference fails, you have to identify what you mean.   
It could hardly ever take many bits from the hash to clear up any 
such ambiguous reference.  A place for such fragmentary hash 
disambiguators could be part of the suffix notation.
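
The suffix idea above can be sketched as follows.  The notation is
purely hypothetical: "^" means "parent", it may be repeated, and each
"^" may be followed by a few hex digits of the intended parent's hash
to settle an ambiguous step.  The function names, the exception class,
and the dictionary layout are likewise assumptions, not anything
monotone provides.

```python
import re


class AmbiguousReference(LookupError):
    """Raised when a step matches more than one revision."""


def unique_match(fragment, candidates):
    """Return the single candidate hash that starts with fragment."""
    hits = [c for c in candidates if c.startswith(fragment)]
    if not hits:
        raise LookupError("no revision matches %r" % fragment)
    if len(hits) > 1:
        raise AmbiguousReference("%r matches %s" % (fragment, sorted(hits)))
    return hits[0]


def resolve(spec, parents, tags):
    """Resolve a spec such as 'release^^' or 'release^b2^' to a hash.

    parents maps each revision hash to a list of its parent hashes.
    tags maps tag names to revision hashes.
    """
    m = re.fullmatch(r"([^^]+)((?:\^[0-9a-f]*)*)", spec)
    if m is None:
        raise ValueError("malformed reference: %r" % spec)
    base = m.group(1)
    # One entry per '^'; the entry is the (possibly empty) hash fragment.
    steps = re.findall(r"\^([0-9a-f]*)", m.group(2))
    # The base is either a tag name or a unique hash prefix.
    rev = tags.get(base) or unique_match(base, parents.keys())
    for fragment in steps:
        # An empty fragment matches every parent, so it only succeeds
        # when there is exactly one parent.
        rev = unique_match(fragment, parents[rev])
    return rev


# Toy history: a1f0 is the root, b2c3 and c9d4 branch from it,
# and 3fe1 merges the two branches.
PARENTS = {
    "a1f0": [],
    "b2c3": ["a1f0"],
    "c9d4": ["a1f0"],
    "3fe1": ["b2c3", "c9d4"],
}
TAGS = {"release": "3fe1"}
```

In this toy history, "release^" fails as ambiguous because the tagged
revision is a merge, while "release^b" picks the b2c3 parent with a
single hex digit, and "release^b^" walks on to the root.  A similar
suffix for children would follow the same pattern over a reversed map.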

Then, of course, there are tags.  They too could be ambiguous, 
although that is generally quite easy to avoid in practice, and 
we have hashes to fall back on anyway.  Of course the suffixes 
mentioned above would be just as helpful appended to tag names.  

Floating tags would be helpful too, as I mentioned earlier this year.  
Branches from a floating tag's current revision would make that 
tag ambiguous until it's re-assigned manually, or until the branches
are merged again.  Again, ambiguity is easily resolved with hash

Nathan Myers
