
[Monotone-devel] Re: problems with i18n testsuite


From: graydon hoare
Subject: [Monotone-devel] Re: problems with i18n testsuite
Date: Tue, 20 Apr 2004 13:03:01 -0400
User-agent: Mozilla Thunderbird 0.5 (X11/20040208)

Robert Bihlmeyer wrote:

> I think the best solution is to assume UTF-8, and use LC_CTYPE's
> charset in case the filename is not valid UTF-8.

I'm afraid that algorithm doesn't work well. suppose I'm an EUC-KR (korean) user:

 - my filenames have one encoding in EUC-KR
 - they have a different encoding in UTF-8
 - they *mean* the same characters in either encoding
 - if I commit on an EUC-KR machine, the filename is not valid UTF-8;
   but the filename is representable in UTF-8 if I do a conversion.
 - if I check out from monotone (UTF-8) to an EUC-KR machine, the
   UTF-8 filename is not valid EUC-KR, but it is representable in
   EUC-KR if I do a conversion.
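
To make the points above concrete, here is a minimal Python sketch. It is not monotone code: the filename and the helpers `is_valid` and `convert` are hypothetical, and Python's built-in `euc_kr` codec stands in for the machine's local charset.

```python
# Hypothetical filename; any Hangul name would do.
name = "한글.txt"

# The same characters have different byte encodings in each charset.
euc_kr_bytes = name.encode("euc_kr")
utf8_bytes = name.encode("utf-8")


def is_valid(raw: bytes, charset: str) -> bool:
    """Return True if raw decodes cleanly in the given charset."""
    try:
        raw.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


def convert(raw: bytes, source: str, target: str) -> bytes:
    """Re-encode raw from the source charset into the target charset."""
    return raw.decode(source).encode(target)


# Neither byte string is valid when read in the other encoding...
print(is_valid(euc_kr_bytes, "utf-8"))
print(is_valid(utf8_bytes, "euc_kr"))

# ...but an explicit conversion maps each onto the other.
print(convert(euc_kr_bytes, "euc_kr", "utf-8") == utf8_bytes)
print(convert(utf8_bytes, "utf-8", "euc_kr") == euc_kr_bytes)
```

The validity check alone says nothing about which characters the bytes mean; only a conversion with a known source charset recovers that.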

I think the solution monotone has is a good one:

 - UTF-8 in control files (manifest, work, .mt-attrs)
 - no non-UTF-8-representable characters allowed in filenames
 - filenames externalized to whatever your local charset is
 - no constraints on contents of files *required*
 - any conversion on contents of files *permitted*

this means that if you have the luxury of working in a rich UTF-8 environment, checkouts are no-ops on pathnames. if you're on a "lesser" system (say, EUC-KR or KOI8-R), some filenames may not check out on your machine (say, those in the arabic part of UTF-8), but you'll be informed of this by a useful error, not a mysterious set of bytes in your filenames.

if you stick to filenames which are representable on your current system, you'll still be able to edit those files using your EUC-KR or KOI8-R or whatever editor and check in without "breaking" those filenames. they'll be "promoted" to UTF-8 for storage and "demoted" back to your local charset when you check them out or update them.
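
The "promote" and "demote" steps can be sketched roughly as follows. This is an illustration in Python, not monotone's actual implementation: the names `internalize`, `externalize` and `FilenameNotRepresentable` are assumptions made for the example, and the local charset is passed in explicitly rather than read from the locale.

```python
class FilenameNotRepresentable(Exception):
    """Raised when a stored name has no encoding in the local charset."""


def internalize(local_name: bytes, local_charset: str) -> bytes:
    """Promote a filename from the local charset to UTF-8 for storage."""
    return local_name.decode(local_charset).encode("utf-8")


def externalize(stored_name: bytes, local_charset: str) -> bytes:
    """Demote a stored UTF-8 filename to the local charset on checkout."""
    name = stored_name.decode("utf-8")
    try:
        return name.encode(local_charset)
    except UnicodeEncodeError as err:
        # Fail with a clear message instead of writing mangled bytes.
        raise FilenameNotRepresentable(
            f"cannot check out {name!r}: not representable in {local_charset}"
        ) from err


# A Russian filename survives the trip through storage on a KOI8-R system.
local = "файл.txt".encode("koi8_r")
stored = internalize(local, "koi8_r")
print(externalize(stored, "koi8_r") == local)

# On a UTF-8 system, externalizing leaves the stored bytes unchanged.
print(externalize(stored, "utf-8") == stored)

# An Arabic filename cannot be demoted to KOI8-R, so checkout reports it.
arabic = "\u0645\u0644\u0641.txt".encode("utf-8")
try:
    externalize(arabic, "koi8_r")
except FilenameNotRepresentable as err:
    print(err)
```

Raising at the point of conversion is what keeps unrepresentable names from silently turning into arbitrary bytes on disk.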

-graydon



