[Monotone-devel] Re: problems with i18n testsuite
From: graydon hoare
Subject: [Monotone-devel] Re: problems with i18n testsuite
Date: Tue, 20 Apr 2004 13:03:01 -0400
User-agent: Mozilla Thunderbird 0.5 (X11/20040208)
Robert Bihlmeyer wrote:
> I think the best solution is to assume UTF-8, and use LC_CTYPE's
> charset in case the filename is not valid UTF-8.
I'm afraid that algorithm doesn't work well. suppose I'm an EUC-KR
(Korean) user:
- my filenames are one byte sequence in EUC-KR
- they are a different byte sequence in UTF-8
- they *mean* the same characters in either encoding
- if I commit on an EUC-KR machine, the filename is not valid UTF-8;
but the filename is representable in UTF-8 if I do a conversion.
- if I checkout from monotone (UTF-8) to a EUC-KR machine, the
UTF-8 filename is not valid EUC-KR, but it is representable in
EUC-KR if I do a conversion.
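
To make the point above concrete, here is a small Python sketch (not monotone code; the filename and the is_valid helper are made up for illustration). It encodes one Korean filename both ways and checks that neither byte sequence is valid in the other charset, even though both decode to the same characters.

```python
# Illustrative only: the sample filename and helper are assumptions.
korean_name = "한글.txt"

euc_kr_bytes = korean_name.encode("euc_kr")  # bytes on an EUC-KR machine
utf8_bytes = korean_name.encode("utf-8")     # bytes in UTF-8 storage


def is_valid(raw: bytes, charset: str) -> bool:
    """Return True if raw decodes cleanly in the given charset."""
    try:
        raw.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


# Same characters, different bytes.
same_text = euc_kr_bytes.decode("euc_kr") == utf8_bytes.decode("utf-8")
different_bytes = euc_kr_bytes != utf8_bytes

# Neither byte sequence is valid when read as the other charset.
euc_kr_is_utf8 = is_valid(euc_kr_bytes, "utf-8")
utf8_is_euc_kr = is_valid(utf8_bytes, "euc_kr")
```

Validity under one charset says nothing about whether the name can be represented in it, which is why an explicit conversion step is needed rather than a validity check.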
I think the solution monotone has is a good one:
- UTF-8 in control files (manifest, work, .mt-attrs)
- no non-UTF-8-representable characters allowed in filenames
- filenames externalized to whatever your local charset is
- no constraints on contents of files *required*
- any conversion on contents of files *permitted*
this means that if you have the luxury of working in a rich UTF-8
environment, checkouts are no-ops on pathnames. if you're on a "lesser"
system (say, EUC-KR or KOI8-R), some filenames may not check out on
your machine (say, those using Arabic characters), but you'll be
informed of this by a useful error, not a mysterious set of bytes in
your filenames.
if you stick to filenames which are representable on your current
system, you'll still be able to edit those files using your EUC-KR or
KOI8-R or whatever editor and check in without "breaking" those
filenames. they'll be "promoted" to UTF-8 for storage and "demoted" back
to your local charset when you check them out or update them.
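
The "promote" and "demote" steps could be sketched roughly as follows. This is a Python illustration only, not monotone's actual implementation; the function names, the exception class, and the sample filenames are all hypothetical.

```python
# Hypothetical sketch of filename conversion between a local charset
# and UTF-8 storage. None of these names come from monotone itself.


class FilenameNotRepresentable(Exception):
    """Raised when a stored name has no encoding in the local charset."""


def promote(local_bytes: bytes, local_charset: str) -> bytes:
    """Convert an on-disk filename to UTF-8 for storage."""
    return local_bytes.decode(local_charset).encode("utf-8")


def demote(stored_utf8: bytes, local_charset: str) -> bytes:
    """Convert a stored UTF-8 filename back to the local charset."""
    name = stored_utf8.decode("utf-8")
    try:
        return name.encode(local_charset)
    except UnicodeEncodeError as err:
        # Report a clear error instead of writing mystery bytes to disk.
        bad_char = name[err.start]
        raise FilenameNotRepresentable(
            f"cannot check out {name!r}: "
            f"{bad_char!r} has no {local_charset} encoding"
        ) from err


# Round trip on an EUC-KR machine: the on-disk bytes come back unchanged.
on_disk = "한글.txt".encode("euc_kr")
stored = promote(on_disk, "euc_kr")
round_trip_ok = demote(stored, "euc_kr") == on_disk

# An Arabic filename cannot be demoted to KOI8-R, so checkout fails loudly.
arabic_stored = "ملف.txt".encode("utf-8")
try:
    demote(arabic_stored, "koi8_r")
    checkout_failed = False
except FilenameNotRepresentable:
    checkout_failed = True
```

On a UTF-8 machine both functions leave the bytes unchanged, which matches the "no-op on pathnames" case described above.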
-graydon