Re: [Monotone-devel] locale bug ?
Sat, 16 Feb 2008 15:34:48 -0600
On Feb 16, 2008 11:29 AM, Zack Weinberg <address@hidden> wrote:
> On Sat, Feb 16, 2008 at 8:32 AM, Timothy Brownawell <address@hidden> wrote:
> > > mtn: misuse: error converting 12 UTF-8 bytes to IDNA ACE: non-LDH characters
> > >
> > > Any ideas how I can solve the problem and avoid it in the future? Is
> > > this a bug or am I doing something wrong? Is this related to the
> > > fact that the cert value is a multi-line string?
> > Monotone has some very silly rules about what characters can be in a
> > cert name -- numbers, letters, and '-'. I have no idea what the
> > reasoning behind this rule was, but we should probably try to make it go
> > away. Until that happens, you'll have to use cert names that only have
> > those characters.
Thanks. In my particular case, replacing underscores with dashes did
the trick. However, the count of non-LDH characters reported by
monotone in the error message does not match the count of underscores
in the cert value, and I have no idea why. In any case, it would be
useful if the error message included at least one of the characters
that cause the error.
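A minimal Python sketch of the check discussed above, assuming the rule is exactly as Timothy describes it (ASCII letters, digits and '-' only). The helper names `find_non_ldh`, `check_cert_name` and `to_ldh` are hypothetical and not part of monotone; the error raised here names each offending character and its position, as suggested above. One possible reason for the count mismatch, stated as an assumption: the 12 in the quoted message reads as the byte length of the whole UTF-8 string being converted, not as a count of invalid characters.

```python
import string

# Assumed rule, per this thread: ASCII letters, digits and '-' ("LDH").
LDH_CHARS = frozenset(string.ascii_letters + string.digits + "-")


def find_non_ldh(name: str) -> list[tuple[int, str]]:
    """Return (index, character) for every character outside the LDH set."""
    return [(i, ch) for i, ch in enumerate(name) if ch not in LDH_CHARS]


def check_cert_name(name: str) -> None:
    """Hypothetical validator: raise ValueError listing each offending character."""
    bad = find_non_ldh(name)
    if bad:
        shown = ", ".join(f"{ch!r} at {i}" for i, ch in bad)
        raise ValueError(
            f"cert name {name!r} ({len(name.encode('utf-8'))} UTF-8 bytes) "
            f"contains non-LDH characters: {shown}"
        )


def to_ldh(name: str) -> str:
    """Replace underscores with dashes, the workaround used in this thread."""
    return name.replace("_", "-")
```

For example, `check_cert_name("test_result")` raises a ValueError that mentions `'_' at 4`, while `check_cert_name(to_ldh("test_result"))` passes silently.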
Having full Unicode support in the lowest layers implies supporting
ambiguous data, or data that is difficult to handle or interpret. So
apart from the technical burden Unicode support can cause for
implementation, testing and maintenance, it can also lead to
situations where social trust in the system gets undermined.
People who really want to add Unicode metadata can always do so by
putting the data in a separate database and putting the database key
of the metadata in the cert value. That is probably what I will do too,
because it makes the metadata easier to search. Then the Unicode
support comes from the external database, and from the monotone
development viewpoint, it simply saves work.
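A sketch of that approach, using Python's built-in `sqlite3` as a stand-in for the separate database (an assumption; any store would do). The text is stored under its SHA-256 hex digest, so the key that would go in the cert contains only 0-9 and a-f and passes the letters/digits/'-' rule. The table layout and the function names are hypothetical; attaching the key to a revision is left to monotone's own cert commands and is not shown.

```python
import hashlib
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical key -> Unicode text table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    return conn


def put_metadata(conn: sqlite3.Connection, text: str) -> str:
    """Store arbitrary Unicode text; return an ASCII key to put in the cert.

    The key is the SHA-256 hex digest of the UTF-8 text: lowercase hex only,
    and identical text always yields the same key.
    """
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO metadata (key, value) VALUES (?, ?)", (key, text)
    )
    conn.commit()
    return key


def get_metadata(conn: sqlite3.Connection, key: str) -> Optional[str]:
    """Look the text back up from a key found in a cert (None if unknown)."""
    row = conn.execute("SELECT value FROM metadata WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


def search_metadata(conn: sqlite3.Connection, term: str) -> list[str]:
    """Return keys whose stored text contains term (case-sensitive substring)."""
    rows = conn.execute(
        "SELECT key FROM metadata WHERE instr(value, ?) > 0", (term,)
    ).fetchall()
    return [row[0] for row in rows]
```

Using a content hash rather than a counter means the same metadata never gets two different keys, at the cost of 64-character cert values.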
So I actually think it is not too bad if there are certain
restrictions on cert values, as in the end you want transparency and
clarity for this type of data, certainly if it gets generated
Anyway, just my 2c.
> It's a consequence of certain fields in the database being run through
> a canonicalization designed for domain names (that's what IDNA is).
> The only person who might possibly have remembered the rationale is
> Graydon, but I already asked him and he doesn't. :-/
> I'm definitely in favor of getting rid of it. It has to be done a
> little carefully because one of the things that gets canonicalized
> this way is key IDs, but I think it can be done without forcing us to
> reissue certs, even in the unlikely event that someone has a key ID
> that was changed by the canonicalization.
> There are two higher-level concerns, which are, what do we then do
> with cert names / key IDs that are the same under some Unicode
> canonicalization? And do we need to worry about e.g. CYRILLIC CAPITAL
> LETTER A being visually indistinguishable from LATIN CAPITAL LETTER A,
> despite their not being unified by any Unicode canonicalization?
> (This was a major issue with opening up DNS to non-ASCII names.)
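To make the canonicalization concerns above concrete, a sketch using Python's standard library. `encodings.idna` is CPython's IDNA 2003 codec (its `nameprep` function is the case-folding and NFKC mapping step); it is not the library monotone itself uses, so exact behavior may differ. The sketch shows an ACE conversion, a mixed-case key ID that nameprep would rewrite, names that collide under NFKC plus case folding, and a Cyrillic/Latin pair that no canonicalization unifies. The helpers `nameprep_changes`, `canonical`, `collisions` and `scripts` are hypothetical, and the script check is only a crude heuristic; real confusable detection is specified in Unicode Technical Standard #39.

```python
import unicodedata
from collections import defaultdict
from encodings import idna  # CPython's IDNA 2003 codec; provides nameprep()


def ace(label: str) -> bytes:
    """IDNA ToASCII ("ACE") form of one label, via Python's 'idna' codec."""
    return label.encode("idna")


def nameprep_changes(key_id: str) -> bool:
    """True if the nameprep mapping (case folding + NFKC) would alter key_id."""
    return idna.nameprep(key_id) != key_id


def canonical(name: str) -> str:
    """NFKC normalization followed by Unicode case folding."""
    return unicodedata.normalize("NFKC", name).casefold()


def collisions(names: list[str]) -> list[list[str]]:
    """Group names that become identical after canonical(); keep groups of 2+."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[canonical(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


def scripts(name: str) -> set[str]:
    """Crude script guess: first word of each letter's Unicode name.

    More than one entry (e.g. {'CYRILLIC', 'LATIN'}) flags a possible
    look-alike; this is a heuristic, not the UTS #39 algorithm.
    """
    return {unicodedata.name(ch).split()[0] for ch in name if ch.isalpha()}


if __name__ == "__main__":
    print(ace("b\u00fccher"))                        # b'xn--bcher-kva'
    print(nameprep_changes("Alice@Example.com"))     # True
    print(collisions(["ABC", "\uff21\uff22\uff23", "abc", "\u0410BC"]))
    print(canonical("\u0410") == canonical("A"))     # False
    print(scripts("\u0410BC"))
```

The last two lines illustrate Zack's second point: CYRILLIC CAPITAL LETTER A (U+0410) survives NFKC and case folding as a distinct character, so a canonicalization step alone cannot catch it; only something like the mixed-script check would.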
Hugo Cornelis Ph.D.
Research Imaging Center
University of Texas Health Science Center at San Antonio
7703 Floyd Curl Drive
San Antonio, TX 78284-6240
Phone: 210 567 8112
Fax: 210 567 8152