Re: [Monotone-devel] serialization format


From: Markus Wanner
Subject: Re: [Monotone-devel] serialization format
Date: Fri, 8 Apr 2016 07:31:52 +0200

On 04/08/2016 06:34 AM, J Decker wrote:
> 1) Hashes... once they're serialized, can't 90% of the time they just
> be compared as strings?  (The output of which fits in utf-8 as ascii
> subset esp if you're using 58)

Monotone did that, but migrated to using binary representation for
efficiency. Note that we do hash calculations quite frequently, so we
need to serialize pretty frequently, too.

I rather think we need to migrate to binary all the way and encode the
hash just before displaying it to the user. That doesn't need to scale,
because the user hardly wants to see millions of hashes at once.
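
To illustrate the idea of keeping hashes binary internally and encoding them only for display, here is a minimal Python sketch. It is not Monotone code; SHA-1 via `hashlib` and the helper name `short_display` are assumptions chosen for the example.

```python
import hashlib


def short_display(digest: bytes, length: int = 8) -> str:
    """Hex-encode a binary digest only at the point of showing it to a person.

    Hypothetical helper for illustration; not part of Monotone.
    """
    return digest.hex()[:length]


# Internally, hashes stay as raw bytes and are compared as raw bytes.
a = hashlib.sha1(b"hello").digest()
b = hashlib.sha1(b"hello").digest()
c = hashlib.sha1(b"world").digest()

assert a == b
assert a != c

# The binary form is half the size of its hex rendering.
print(len(a), len(a.hex()))

# Encoding happens once, at the display boundary.
print(short_display(a))
```

The comparison and storage paths never touch the textual form, so the cost of encoding is paid only for the few hashes a user actually looks at.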

> 2) hashes fed through as utf-8 code points (because any value from
> 0-4,000,000 is encodable in a general algorithm, regardless of
> arbitrary restrictions) would yes more often be outside of the 94
> characters, and be encoded characters... but since the output is just
> characters anyway...

So you could come up with some kind of base4000000 encoding, where every
code point would cost 1-4 bytes in UTF-8 encoding, i.e. we're speaking
about encoding twice. And lose all of the benefits of using a subset of
ASCII... I don't see the point.
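
The 1-4 byte cost per code point can be checked with a small Python sketch. The function name `utf8_cost` and the sample values are assumptions for illustration. Note that Python's `chr` only accepts values up to 0x10FFFF, the Unicode maximum, so the samples stay within that range.

```python
def utf8_cost(code_point: int) -> int:
    """Return how many bytes one code point occupies once encoded as UTF-8."""
    return len(chr(code_point).encode("utf-8"))


# Sample code points on either side of each UTF-8 length boundary.
samples = [0x41, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF]

for cp in samples:
    print(f"U+{cp:06X}: {utf8_cost(cp)} byte(s)")
```

Only code points below 0x80 stay at one byte, which is exactly the ASCII subset; everything above it pays for the second layer of encoding.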

> Yuck, YAML has keywords?
> 
> {"cert":"1249123840182028934801az","Idunno":"blah"}

Something like that may be a canonical format, but without any newline,
I don't consider it human readable. I'd rather use something like:

{
  cert: "1249123840182028934801az"
  Idunno: "blah"
}
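
The difference between a compact canonical form and a line-broken one can be sketched with Python's standard `json` module. This is plain JSON, so keys are quoted and entries are comma-separated, unlike the looser notation above; the variable names and the choice to sort keys are assumptions for the example.

```python
import json

record = {"cert": "1249123840182028934801az", "Idunno": "blah"}

# Compact form: sorted keys, no whitespace, everything on one line.
compact = json.dumps(record, sort_keys=True, separators=(",", ":"))

# Readable form: same data, one key per line.
readable = json.dumps(record, sort_keys=True, indent=2)

# Both forms carry exactly the same data.
assert json.loads(compact) == json.loads(readable)

print(compact)
print(readable)
```

The one-line form suits hashing and comparison, while the indented form is what a person would want to read or diff.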

> and that itself is in utf-8... which means any value is storable in a
> rune (to borrow a type name from Go)

Well, yes, we're already using UTF-8 for commit messages and such. So
any human-readable, textual format is very likely to use Unicode and be
encoded as UTF-8.

Regards

Markus Wanner
