Re: Special characters
Mark D. Baushke
Mon, 12 Feb 2007 22:35:50 -0800
[thomas] <address@hidden> writes:
> We are experiencing some problems with the use
> of Danish characters in files. When committed,
> CVS changes the characters. Are there some
> settings for this?
Question: Which encoding are you using?
(e.g., ISO-8859-1 or ISO-8859-15 or
windows-1252 or something else?)
Question: Are you using CVS or CVSNT (please
indicate both client and server)?
Question: What versions of CVS or CVSNT are you
using?
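The encoding question matters because CVS stores bytes, so one plausible cause of "changed" characters (an assumption, not a diagnosis of this report) is that bytes written in one encoding are displayed in another. A minimal Python sketch: `guess_encoding` is a hypothetical helper using a simple heuristic (valid UTF-8 first, otherwise ISO-8859-1), and the last line shows what Danish letters look like when UTF-8 bytes are read as ISO-8859-1.

```python
# Danish letters of the kind reported as being changed.
DANISH = "æøå"

def guess_encoding(data: bytes) -> str:
    """Hypothetical heuristic: 'utf-8' if the bytes are valid UTF-8,
    otherwise 'iso-8859-1' (which can decode any byte sequence)."""
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"

utf8_bytes = DANISH.encode("utf-8")          # two bytes per letter
latin1_bytes = DANISH.encode("iso-8859-1")   # one byte per letter

# UTF-8 bytes displayed as if they were ISO-8859-1 produce "mojibake".
mangled = utf8_bytes.decode("iso-8859-1")

print(guess_encoding(utf8_bytes))    # utf-8
print(guess_encoding(latin1_bytes))  # iso-8859-1
print(mangled)                       # Ã¦Ã¸Ã¥
```

If checked-out files show pairs like "Ã¦" where "æ" was expected, the client and the viewer probably disagree about the encoding. The heuristic is only a guess: pure ASCII is reported as UTF-8, and some Latin-1 byte sequences happen to be valid UTF-8.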
FWIW: I have used ISO-8859-1 (sometimes called
Latin-1) characters on GNU/Linux servers and
clients with no problems. I am not a Windows user,
so I am not really qualified to speak about any
problems or advantages Windows may bring.
The big downside of ISO-8859-1 is that it lacks
the euro sign. The Unicode code point for the euro
sign is U+20AC (UTF-8 bytes 0xE2 0x82 0xAC).
Some folks seem to like ISO-8859-15 and others
do not like it. I don't use it myself.
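The euro-sign point can be checked with Python's standard codecs. This sketch (the helper `encode_or_none` is illustrative only) encodes U+20AC in the encodings mentioned in this thread and shows that the same byte 0xA4 means different characters in ISO-8859-1 and ISO-8859-15.

```python
EURO = "\u20ac"  # U+20AC EURO SIGN

def encode_or_none(text: str, encoding: str):
    """Return text encoded in `encoding`, or None if it has no such character."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

for enc in ("utf-8", "iso-8859-1", "iso-8859-15", "cp1252"):
    print(enc, encode_or_none(EURO, enc))
# utf-8 b'\xe2\x82\xac'
# iso-8859-1 None
# iso-8859-15 b'\xa4'
# cp1252 b'\x80'

# Byte 0xA4 is the generic currency sign in ISO-8859-1
# but the euro sign in ISO-8859-15.
print(b"\xa4".decode("iso-8859-1"), b"\xa4".decode("iso-8859-15"))  # ¤ €
```

This is one reason mixing ISO-8859-1 and ISO-8859-15 users on the same files can be confusing: the bytes are identical, only the interpretation differs.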
If you are using CVSNT for both client and server,
then you may need to use 'cvs admin -ku' (Unicode)
to mark the file as Unicode.
If you are using a mixture of CVS (server) and
CVSNT (client), then you should not have any
problems with ISO-8859-1 characters, but you might
need to use -kb (binary) in some cases. (I am not
sure whether any problems exist with windows-1252
encodings.)
If you have questions about CVSNT, look to
http://www.cvsnt.org/ for additional help.
CVS does not interpret Unicode Transformation
Formats; it passes bytes through unchanged, which
works for UTF-8. The exception is that for text
files CR LF line endings are converted to LF on
checkin from a Windows client to the server. (If a
file is marked as binary (-kb), then what is
checked in will be checked out. However, 'cvs
update' will have problems doing merges for binary
files.)
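The text-versus-binary distinction above can be modelled in a few lines of Python. `checkin_bytes` is a hypothetical function written for illustration, not CVS code: it assumes the only change to a text file is CR LF to LF, and that a -kb file is stored byte for byte.

```python
def checkin_bytes(data: bytes, binary: bool) -> bytes:
    """Hypothetical model of what reaches the repository from a Windows client.

    Text files: CR LF line endings become LF; every other byte is untouched,
    so Latin-1 or UTF-8 letters survive as-is.
    Binary files (-kb): the bytes are stored exactly as given.
    """
    if binary:
        return data
    return data.replace(b"\r\n", b"\n")

text = "blåbærgrød\r\n".encode("iso-8859-1")
print(checkin_bytes(text, binary=False))  # b'bl\xe5b\xe6rgr\xf8d\n'
print(checkin_bytes(text, binary=True))   # b'bl\xe5b\xe6rgr\xf8d\r\n'
```

In this model the Danish letters come through unchanged either way; only the line ending differs.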
CVS may have problems with UTF-16 characters as it
treats bytes rather than characters and may be
confused by an encoding where the second byte of a
UTF-16 character looks like a CR or LF character.
In theory this is not a problem for UTF-8, where
every byte of a multi-byte sequence has the
high-order bit set. UTF-16 gives no such
guarantee: U+010A, for example, is stored as the
bytes 0x01 0x0A in UTF-16BE. Also, not all
Unicode data in the wild is well formed.
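The byte-level concern can be checked directly in Python. This sketch uses U+010A (LATIN CAPITAL LETTER C WITH DOT ABOVE) as an example character; `has_line_end_bytes` and `utf8_multibyte_all_high_bit` are illustrative helpers, and the second only scans the Basic Multilingual Plane by default to stay fast.

```python
LF, CR = 0x0A, 0x0D

def has_line_end_bytes(data: bytes) -> bool:
    """True if a byte-oriented tool would see a CR or LF byte in the data."""
    return LF in data or CR in data

def utf8_multibyte_all_high_bit(limit: int = 0x10000) -> bool:
    """Check that for code points 0x80..limit-1 (surrogates skipped)
    every UTF-8 byte has the high-order bit set."""
    return all(
        byte >= 0x80
        for cp in range(0x80, limit)
        if not 0xD800 <= cp <= 0xDFFF
        for byte in chr(cp).encode("utf-8")
    )

c = "\u010a"  # an ordinary letter; no line break intended
print(c.encode("utf-16-be"), has_line_end_bytes(c.encode("utf-16-be")))  # b'\x01\n' True
print(c.encode("utf-8"), has_line_end_bytes(c.encode("utf-8")))          # b'\xc4\x8a' False
print(utf8_multibyte_all_high_bit())                                     # True
```

A tool that splits on LF bytes would cut the UTF-16 form of this single letter in half, while its UTF-8 form is safe.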
Some folks consider Windows-1252 an
extension of ISO-8859-1. Others consider it
non-standard. Exactly how your system renders
arbitrary bytes will be system dependent.
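The Windows-1252 versus ISO-8859-1 question can be made concrete with Python's codecs. The sketch assumes Python's `cp1252` and `iso-8859-1` codecs as the reference tables; other systems may map the 0x80 to 0x9F range differently, which is exactly the system dependence mentioned above.

```python
# Bytes 0x00-0x7F and 0xA0-0xFF decode identically in both encodings.
same = all(
    bytes([b]).decode("cp1252") == bytes([b]).decode("iso-8859-1")
    for b in list(range(0x00, 0x80)) + list(range(0xA0, 0x100))
)
print(same)  # True

# In 0x80-0x9F they differ: ISO-8859-1 has invisible C1 control codes,
# Windows-1252 has printable characters such as the euro sign and curly quotes.
# (Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined
# and raises UnicodeDecodeError on them.)
for b in (0x80, 0x93, 0x94):
    raw = bytes([b])
    print(hex(b), repr(raw.decode("iso-8859-1")), repr(raw.decode("cp1252")))
```

So Windows-1252 text that avoids 0x80 to 0x9F is byte-for-byte valid ISO-8859-1; text that uses those bytes (euro sign, smart quotes) will look different on a system that assumes ISO-8859-1.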