lmi

Re: [lmi] Should we restrict cvs files to ASCII?


From: Evgeniy Tarassov
Subject: Re: [lmi] Should we restrict cvs files to ASCII?
Date: Tue, 17 Apr 2007 17:57:49 +0200

On 4/17/07, Greg Chicares <address@hidden> wrote:
> On 2007-04-17 12:58Z, Vadim Zeitlin wrote:
> > On Tue, 17 Apr 2007 12:50:12 +0000 Greg Chicares <address@hidden> wrote:
> >
> > GC> I see this inadvertent change:
> > GC>
> > GC>   ChangeLog,v 1.618 2007/04/17 12:33:09
> > GC> -  "§21.2206(1)(I)-(M), (O), (Q)-(S) and (U)-(V) Supplemental - Other"
> > GC> +  "�21.2206(1)(I)-(M), (O), (Q)-(S) and (U)-(V) Supplemental - Other"
> >
> >  This is probably due to an editor which does conversion to UTF-8
> > automatically... Which also means that it's probably impossible to even
> > notice that it has happened. And while Codestriker should have shown it
> > it's still easy to miss it.
>
> May I ask whether it's visible in Codestriker? I'm getting some
> other work out of the way right now, and I'm not yet familiar
> enough with the tool to know how to determine this efficiently
> myself. The reason I ask is that I'm encouraging my compatriots
> to report any shortcomings we perceive to Codestriker's author.
> (You don't need more practice doing that, obviously.)

Vadim asked me to create a patch involving non-ASCII characters
and to upload it to Codestriker to test its behavior.

In the end I uploaded two patches, because the way a patch is
created matters:

1) http://lmi.tt-solutions.com/codestriker/codestriker.pl?topic=9337714&action=view
This patch was created using 'cvs diff'. It turns out that the output
encoding is UTF-8. I have not found an option in the cvs manual to
specify the output encoding explicitly.
Codestriker handles this patch correctly.

2) http://lmi.tt-solutions.com/codestriker/codestriker.pl?topic=2112415&action=view
This patch was created using plain 'diff':
'diff ChangeLog.old ChangeLog'
The output is in ISO-8859-1 encoding.
Codestriker does _not_ handle this patch correctly.
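
The difference between the two patches comes down to how the section
sign is stored. Below is a minimal Python sketch of what a reader that
assumes UTF-8 sees in each case. It is not taken from lmi or
Codestriker; the sample string is an abbreviated stand-in for the
ChangeLog line, and the UTF-8 assumption about the reader is mine.

```python
# Illustrative only: an abbreviated stand-in for the ChangeLog line.
line = "\u00a721.2206(1)"  # U+00A7 is the section sign

latin1_bytes = line.encode("iso-8859-1")  # the sign is one byte: 0xA7
utf8_bytes = line.encode("utf-8")         # the sign is two bytes: 0xC2 0xA7

# A reader that assumes UTF-8 decodes the UTF-8 bytes back to the
# original text.
assert utf8_bytes.decode("utf-8") == line

# A lone 0xA7 is not valid UTF-8, so decoding the ISO-8859-1 bytes as
# UTF-8 either raises an error or, with errors="replace", substitutes
# the U+FFFD replacement character.
garbled = latin1_bytes.decode("utf-8", errors="replace")
print(repr(garbled))
```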

Some thoughts:

- Codestriker receives text files as input and cannot reliably
determine the encoding of a patch file, therefore it assumes UTF-8,
which handles Unicode and stays backward compatible with ASCII-only
projects.

- In my IDE, when I create a CVS repository link, I can specify the
encoding of the repository. I tried changing the value and creating
patches, but it seems that this parameter is not passed to 'cvs diff'
and has no effect -- the produced patch is always UTF-8 encoded.
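
To illustrate the first point: pure ASCII bytes are also valid UTF-8,
which is why defaulting to UTF-8 does not break ASCII-only projects,
while other 8-bit encodings cannot be told apart from the bytes alone.
A rough sketch follows; the function name and return labels are
hypothetical and do not come from Codestriker.

```python
def guess_patch_encoding(data: bytes) -> str:
    """Classify patch bytes as 'ascii', 'utf-8', or 'unknown-8bit'.

    Hypothetical helper for illustration only.
    """
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Could be ISO-8859-1 or any other 8-bit encoding; the bytes
        # themselves do not say which one.
        return "unknown-8bit"
```

Note that a "utf-8" result is only a strong hint: some byte sequences
from other encodings happen to be valid UTF-8 too.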

I suspect that the cvs protocol hardcodes UTF-8 as the encoding for text files.

I think the simplest way out would be to switch lmi to UTF-8, since it
is the hardcoded choice for Codestriker and probably for CVS too.
Otherwise the encoding of any patch file containing non-ASCII
characters will depend on the way it was created: 'cvs diff' produces
a UTF-8 encoded file, while 'diff' produces ISO-8859-1 (the native lmi
encoding).
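
If lmi did switch, existing files containing non-ASCII characters
would need a one-time conversion. A minimal sketch, assuming the input
really is ISO-8859-1 as described above (the helper name is
hypothetical):

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (hypothetical helper)."""
    # Every byte value maps to a character in ISO-8859-1, so this
    # decode cannot fail; the result is only meaningful if the input
    # really was ISO-8859-1.
    return data.decode("iso-8859-1").encode("utf-8")
```

Pure ASCII content passes through unchanged. Running the conversion
twice on the same file would double-encode the non-ASCII characters,
so it should be applied only once per file.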

--
Best wishes,
Evgeniy Tarassov



