Re: Special characters

From: Mark D. Baushke
Subject: Re: Special characters
Date: Mon, 12 Feb 2007 22:35:50 -0800

Hi Thomas,

[thomas] <address@hidden> writes:

> We are experiencing some problems with the use
> of danish characters in files. When commited,
> CVS changes the characters. Is there some
> settings for this?

Question: Which encoding are you using?
          (e.g., ISO-8859-1 or ISO-8859-15 or
          windows-1252 or something else?)

Question: Are you using CVS or CVSNT (please
          indicate both client and server)?

Question: What versions of CVS or CVSNT are you
          using?
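For illustration only (this sketch is not part of
the original reply): in Python 3, the Danish
letters æ, ø and å become different bytes
depending on the encoding, and reading bytes with
the wrong encoding makes the letters look changed
even though the stored bytes are intact. Assuming
the files are either ISO-8859-1 or UTF-8, this is
one common way such symptoms arise:

```python
# The Danish letters ae, o-slash and a-ring as Unicode text.
danish = "æøå"

# The same letters become different bytes in different encodings.
latin1_bytes = danish.encode("iso-8859-1")  # one byte per letter
utf8_bytes = danish.encode("utf-8")         # two bytes per letter

print(latin1_bytes.hex(" "))  # e6 f8 e5
print(utf8_bytes.hex(" "))    # c3 a6 c3 b8 c3 a5

# Decoding UTF-8 bytes as ISO-8859-1 garbles the text ("mojibake"):
# the stored bytes are unchanged, only their interpretation is wrong.
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled)  # Ã¦Ã¸Ã¥
```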

fwiw: I have used ISO-8859-1 (sometimes called
Latin1) characters on GNU/Linux servers and
clients with no problems. I am not a Windows user,
so I am not really qualified to speak about any
problems or advantages Windows may bring.

The big downside for ISO-8859-1 is that it is
missing the euro sign. The Unicode code point for
the euro sign is U+20AC (UTF-8 bytes 0xE2 0x82 0xAC).
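A quick way to check those bytes, and to see how
the euro sign fares in the other encodings named
above, is Python 3's standard codecs. This is a
sketch; euro_bytes is a made-up helper name.

```python
from typing import Optional

EURO = "\u20ac"  # EURO SIGN, code point U+20AC


def euro_bytes(encoding: str) -> Optional[bytes]:
    """Return the euro sign encoded in `encoding`, or None if it cannot be."""
    try:
        return EURO.encode(encoding)
    except UnicodeEncodeError:
        return None


# "cp1252" is Python's codec name for windows-1252.
for enc in ("utf-8", "iso-8859-1", "iso-8859-15", "cp1252"):
    encoded = euro_bytes(enc)
    shown = encoded.hex(" ") if encoded is not None else "not representable"
    print(f"{enc:<12} {shown}")
```

Run as-is, it should report e2 82 ac for UTF-8,
a4 for ISO-8859-15, 80 for windows-1252, and no
byte at all for ISO-8859-1.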

Some folks seem to like ISO-8859-15 and others
do not like it. I don't use it myself.

If you are using CVSNT for both client and server,
then you may need to use cvs admin -ku (unicode)
to set the file to unicode.

If you are using a mixture of CVS (server) and
CVSNT (client), then you should not have any
problems with ISO-8859-1 characters, but you might
need to use -kb (binary) in some cases. (I am not
sure if any problems exist with windows-1252
encodings or not.)

If you have questions about CVSNT, look to for additional help.

CVS does not interpret Unicode Transformation
Formats; it treats file contents as bytes, so
UTF-8 text passes through unchanged. The exception
is that CR LF line endings will be converted to LF
on checkin from a Windows client to the server. If
a file is marked as binary (-kb), then what is
checked in will be checked out unchanged. However,
'cvs update' will have problems doing merges for
binary files.
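A simplified Python 3 model of that line-ending
rule. The function names are hypothetical and this
is not how CVS itself is implemented; it only shows
why converting CR LF to LF is harmless for
consistently terminated text but corrupts binary
data, which is what -kb avoids. The PNG file
signature is used as a convenient example of real
binary bytes.

```python
# PNG files start with this fixed 8-byte signature, which contains both
# a CR LF pair and a bare LF.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def checkin_text(data: bytes) -> bytes:
    """Hypothetical model of a text-mode checkin from a Windows client:
    each CR LF pair is stored as a bare LF."""
    return data.replace(b"\r\n", b"\n")


def checkout_text_windows(data: bytes) -> bytes:
    """Hypothetical model of a text-mode checkout to a Windows client:
    each LF is written back out as CR LF."""
    return data.replace(b"\n", b"\r\n")


def checkin_binary(data: bytes) -> bytes:
    """Hypothetical model of a -kb checkin: the bytes are stored as-is."""
    return data


# Windows text with consistent CR LF endings survives the round trip.
text = "æble\r\nøl\r\n".encode("iso-8859-1")
print(checkout_text_windows(checkin_text(text)) == text)  # True

# Binary data does not: the bare LF gains a CR it never had.
print(checkout_text_windows(checkin_text(PNG_SIGNATURE)) == PNG_SIGNATURE)  # False

# With -kb nothing is converted, so what is checked in is checked out.
print(checkin_binary(PNG_SIGNATURE) == PNG_SIGNATURE)  # True
```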

CVS may have problems with UTF-16 text because it
works on bytes rather than characters and may be
confused by an encoding where one byte of a UTF-16
character looks like a CR or LF character. UTF-8
does not have this problem in theory: every byte
of a multi-byte UTF-8 sequence has the high-order
bit set, so it can never be mistaken for CR (0x0D)
or LF (0x0A). This does not mean that all Unicode
text is always well formed.
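The byte-level difference between UTF-16 and
UTF-8 can be checked directly with Python 3's
standard codecs. The sketch below (helper names
are made up) shows an ordinary letter whose UTF-16
encoding contains an LF byte, checks that no
multi-byte UTF-8 sequence contains a byte below
0x80, and shows that malformed input can still
carry a stray LF. The full scan over all code
points takes a second or so.

```python
CR, LF = 0x0D, 0x0A


def has_line_end_byte(data: bytes) -> bool:
    """True if any byte would look like CR or LF to a byte-oriented tool."""
    return CR in data or LF in data


# U+010A (LATIN CAPITAL LETTER C WITH DOT ABOVE) is an ordinary letter.
letter = "\u010a"
print(letter.encode("utf-16-be").hex(" "))  # 01 0a  (second byte looks like LF)
print(letter.encode("utf-8").hex(" "))      # c4 8a  (high bit set on both bytes)


def utf8_multibyte_sequences_avoid_ascii() -> bool:
    """Check that every byte of every multi-byte UTF-8 sequence is >= 0x80."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates cannot be encoded as UTF-8
        if min(chr(cp).encode("utf-8")) < 0x80:
            return False
    return True


print(utf8_multibyte_sequences_avoid_ascii())  # True

# Malformed input is another matter: these bytes are not valid UTF-8,
# yet they contain an LF that a byte-oriented tool would act on.
malformed = b"\xc4\x0a"
print(has_line_end_byte(malformed))  # True
```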

Some folks consider windows-1252 to be an
extension of ISO-8859-1. Others consider it
non-standard. Exactly how your system renders
arbitrary bytes is system dependent.
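To make the "extension" view concrete: the two
encodings agree on every byte value except
0x80-0x9F, where ISO-8859-1 has control characters
and windows-1252 has printable characters such as
the euro sign and curly quotes (plus a few
undefined values). A Python 3 sketch using the
standard codecs; decode_byte is a made-up helper.

```python
from typing import Optional, Tuple


def decode_byte(value: int) -> Tuple[str, Optional[str]]:
    """Decode one byte as ISO-8859-1 and as windows-1252 (Python's "cp1252").

    windows-1252 leaves a few byte values undefined; those return None.
    """
    raw = bytes([value])
    latin1 = raw.decode("iso-8859-1")  # every value 0x00-0xFF is defined
    try:
        cp1252: Optional[str] = raw.decode("cp1252")
    except UnicodeDecodeError:
        cp1252 = None
    return latin1, cp1252


# Byte values on which the two encodings disagree.
differences = [v for v in range(256) if decode_byte(v)[0] != decode_byte(v)[1]]
print(f"{hex(differences[0])}..{hex(differences[-1])}, {len(differences)} bytes")

for value in (0x80, 0x93, 0xE6):
    print(hex(value), decode_byte(value))
```

For example, 0x80 is a control character in
ISO-8859-1 but the euro sign in windows-1252,
while 0xE6 is æ in both.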

        Good luck,
        -- Mark