Re: [Monotone-devel] Re: user-friendly hash formats, redux

From: Oren Ben-Kiki
Subject: Re: [Monotone-devel] Re: user-friendly hash formats, redux
Date: Tue, 7 Dec 2004 11:36:16 +0200
User-agent: KMail/1.7.1

On Tuesday 07 December 2004 09:22, Nathan Myers wrote:
> ... I have posted my first word list,
> sufficient to encode ten bits each.  It
> turned out to be surprisingly difficult to filter and then extend
> "grep '^...$' /usr/dict/words" out to 1024 entries.  Not all are
> English; some heavily-used Latin, Spanish, French, German, numbers,
> and abbreviations appear.  I've made no effort to keep them aurally
> distinguishable -- e.g., both Tao and tau, cay and Kay (and que) are
> there.

I see that you were really running out of words: you had to resort to 
numbers ('111', '789') to complete the list. Some of the entries are 
known only to Scrabble addicts, if they are words at all (a random 
selection: 'fra', 'sho', 'ums', 'wog'...).

> If you disagree with any choices, please propose replacements.  If
> necessary, the list might be culled to encode only nine bits per
> word, instead.

Nine bits seems neither here nor there. Four 9-bit words give only 36 
bits, so you'd need a fifth word to reach the 40-bit "safe" limit. Five 
8-bit words reach exactly 40 bits as well, so you might as well go down 
to 8 bits and use only phonetically distinct "real" words.
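
To make the word-count arithmetic concrete, here is a minimal Python 
sketch. The function names and the placeholder word list are my own 
illustrations, not anything from the posted list or from monotone; the 
only figure taken from this thread is the 40-bit target.

```python
def words_needed(bits_per_word, total_bits=40):
    """Number of words required to cover total_bits (ceiling division)."""
    return -(-total_bits // bits_per_word)


def encode_bits(value, bits_per_word, word_list, total_bits=40):
    """Map the low total_bits of value to words, most significant first.

    word_list is a hypothetical list with at least 2**bits_per_word entries.
    """
    if len(word_list) < 2 ** bits_per_word:
        raise ValueError("word list too short for this many bits per word")
    count = words_needed(bits_per_word, total_bits)
    mask = (1 << bits_per_word) - 1
    value &= (1 << total_bits) - 1
    indices = [(value >> (bits_per_word * i)) & mask
               for i in reversed(range(count))]
    return [word_list[i] for i in indices]


if __name__ == "__main__":
    for bits in (10, 9, 8):
        count = words_needed(bits)
        print(bits, "bits/word:", count, "words,", count * bits, "bits total")
```

With a 40-bit target, 10 bits per word needs four words, while both 9 
and 8 bits per word need five, which is why the 9-bit list buys nothing 
over the 8-bit one.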

That said, I'm less convinced that this approach is necessary in the 
first place, given that CVS-like cross-db stable revision ids are 
achievable (using the branch/fork owner's E-mail address).

> Automatically generated aliases 
> are, evidently, a research project; some experiments will have to
> fail before we know more.  It seems to me the most important
> consideration is not to attempt the impossible.  A stable naming
> scheme that works within one repository is a reasonable, and hard
> enough, goal.  Between repositories we have hashes and tags.

I'm not convinced that having cross-repository stable ids is a lost 
cause. I think that CVS-like branch/fork numbering using the (prefix of 
the) author's E-mail as the fork identifier does achieve both goals. 
The key advantage here is that revision relationships are inherent in 
the ids. All other methods only give you a unique id, period. I think 
that's important enough to warrant some experimentation before we give 
up and settle for some form of "random" unique ids, short and nice as 
they may be.
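
As a rough illustration of what "relationships inherent in the ids" 
could mean, here is a small Python sketch. The id format is purely an 
assumption for the example: CVS-style dotted pairs where each fork adds 
a (fork identifier, revision number) pair, e.g. "1.5.oren.3" for the 
third revision on a fork by "oren" off revision 1.5. It also assumes 
the fork identifier contains no dots and ignores merges entirely.

```python
def parse(rev_id):
    """Split a hypothetical id like '1.5.oren.3' into (branch, number) pairs."""
    parts = rev_id.split(".")
    if len(parts) % 2 != 0:
        raise ValueError("expected an even number of dot-separated fields")
    return [(parts[i], int(parts[i + 1])) for i in range(0, len(parts), 2)]


def is_ancestor(a, b):
    """True if a is a proper ancestor of b, judged from the ids alone."""
    sa, sb = parse(a), parse(b)
    if sa == sb or len(sa) > len(sb):
        return False
    last = len(sa) - 1
    # Every fork point before a's final pair must match b's exactly.
    if sa[:last] != sb[:last]:
        return False
    (branch_a, num_a), (branch_b, num_b) = sa[last], sb[last]
    # Same line of development, and a is not later than b on that line.
    return branch_a == branch_b and num_a <= num_b


if __name__ == "__main__":
    print(is_ancestor("1.5", "1.5.oren.3"))
    print(is_ancestor("1.5.oren.2", "1.5.oren.3"))
    print(is_ancestor("1.5.oren.3", "1.5.nathan.1"))
```

No lookup in any database is needed to answer the question, which is 
the property a "random" unique id cannot offer.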

Have fun,

 Oren Ben-Kiki
