Re: Unicode support

From: Jarl Friis
Subject: Re: Unicode support
Date: Tue, 25 Jul 2006 20:06:22 +0200
User-agent: Gnus/5.1006 (Gnus v5.10.6) XEmacs/21.5 (chestnut, linux)

Bruno Haible <address@hidden> writes:

> Hi,
> Jarl Friis wrote:
>> I would like to see support for Unicode files, i.e. text files encoded
>> as UCS-2, in diff and diff3.
> The basic principle of Unix on the command-line is that you can put
> together complex commands from simple ones. 

Basically I tend to agree.

>    #!/bin/bash
>    # Convert both UCS-2 files to the locale's encoding on the fly and diff them.
>    inputfile1=$1
>    inputfile2=$2
>    diff <(iconv -f UCS-2 < "$inputfile1") <(iconv -f UCS-2 < "$inputfile2")

Thanks for this "one-liner". I didn't know that, without -t, iconv
converts to the locale's character encoding, which is UTF-8 here; a
small test confirms this.
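A slightly more explicit variant of the wrapper is sketched below, assuming GNU iconv and GNU diff. It names the target encoding with -t UTF-8, so the result no longer depends on the locale, takes the source encoding as a parameter (for example UCS-2LE, which pins down the byte order), and forwards any remaining arguments to diff. The function name diff_unicode and its argument order are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# diff_unicode (hypothetical name): diff two files stored in a Unicode
# encoding other than UTF-8 by converting both to UTF-8 on the fly.
# Usage: diff_unicode FROM_ENCODING FILE1 FILE2 [diff options...]
diff_unicode() {
  local from=$1 file1=$2 file2=$3
  shift 3
  # Without -t, iconv converts to the locale's character encoding, so
  # name UTF-8 explicitly. --label keeps the real file names in the
  # headers of context/unified output instead of /dev/fd/NN.
  diff "$@" --label "$file1" --label "$file2" \
    <(iconv -f "$from" -t UTF-8 < "$file1") \
    <(iconv -f "$from" -t UTF-8 < "$file2")
}
```

For example, "diff_unicode UCS-2LE old.txt new.txt -u" would compare two little-endian UCS-2 files as a unified diff (the file names are placeholders).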

> There is no need to add this support directly to 'diff' itself, because
>   - UCS-2 encoded files are quite rare on Unix,

Not on Cygwin :-)

>   - the above solution does it.

Good argument.

> By the way, the standard encoding on many Linux systems nowadays is
> UTF-8. It is also Unicode, and unlike UCS-2,
>   - it supports all traditional Chinese characters, not just the most
>     frequently used 50%,
>   - it does not require unreliable heuristics to determine the "endianness"
>     of the encoding.
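To make the byte-order problem concrete: a UCS-2 file only states its byte order if it starts with a byte-order mark (U+FEFF, stored as FF FE in little-endian and FE FF in big-endian order). The sketch below, whose helper name guess_ucs2_order is hypothetical, reads that mark; when it is missing, the function simply fails, which is exactly the point where a tool would have to fall back on heuristics. UTF-8 has no such ambiguity, since it is defined as a sequence of bytes.

```shell
#!/bin/bash
# guess_ucs2_order (hypothetical helper): print the iconv encoding name
# matching the byte-order mark at the start of FILE, or fail if there is none.
guess_ucs2_order() {
  local bom
  # Dump the first two bytes as hex, e.g. "fffe".
  bom=$(head -c 2 -- "$1" | od -An -tx1 | tr -d ' \n')
  case $bom in
    fffe) echo UCS-2LE ;;   # FF FE: little-endian
    feff) echo UCS-2BE ;;   # FE FF: big-endian
    *)    return 1 ;;       # no BOM: byte order cannot be read from the file
  esac
}
```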

Very good arguments. I have just realised that UTF-8 covers all of
Unicode, whereas UCS-2 covers only a (large) subset.
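The subset point can be checked with a character outside the Basic Multilingual Plane. The sketch below assumes an iconv (glibc or GNU libiconv) that knows the names UTF-16LE and UCS-2LE, and uses U+20000, the first ideograph of CJK Unified Ideographs Extension B, which also ties in with the Chinese-character argument above: UTF-8 encodes it in four bytes, UTF-16 needs a surrogate pair, and UCS-2, which has no surrogates, should refuse it. The helper names encoded_size and fits_ucs2 are hypothetical.

```shell
#!/bin/bash
# U+20000 (CJK Unified Ideographs Extension B, outside the BMP), written as
# raw UTF-8 bytes so the script does not depend on the current locale.
char=$'\xf0\xa0\x80\x80'

# encoded_size STRING ENCODING: bytes STRING occupies after converting
# from UTF-8 to ENCODING. $(( )) strips the padding some wc versions add.
encoded_size() {
  echo $(( $(printf '%s' "$1" | iconv -f UTF-8 -t "$2" | wc -c) ))
}

# fits_ucs2 STRING: succeeds only if iconv can convert STRING to UCS-2LE.
fits_ucs2() {
  printf '%s' "$1" | iconv -f UTF-8 -t UCS-2LE > /dev/null 2>&1
}

echo "UTF-8:    $(encoded_size "$char" UTF-8) bytes"
echo "UTF-16LE: $(encoded_size "$char" UTF-16LE) bytes (surrogate pair)"
if fits_ucs2 "$char"; then
  echo "UCS-2LE:  converted"
else
  echo "UCS-2LE:  not representable"
fi
```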

So, given these very good arguments, I assume that the diff utilities
support UTF-8, right?
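One way to check this locally, assuming GNU diff and iconv: diff works on bytes and splits lines at the newline byte, and UTF-8 is ASCII-compatible, so its multi-byte sequences never contain 0x0A or 0x00. The same change stored as UCS-2LE is full of 0x00 bytes, which GNU diff takes as a sign of a binary file. The file names in the sketch are arbitrary.

```shell
#!/bin/bash
# Diff the same one-line change stored as UTF-8 and as UCS-2LE.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# UTF-8 bytes for "København" / "Omøgade" and "København" / "Aarhus".
printf 'K\xc3\xb8benhavn\nOm\xc3\xb8gade\n' > "$tmp/a-utf8.txt"
printf 'K\xc3\xb8benhavn\nAarhus\n' > "$tmp/b-utf8.txt"

# Same text as UCS-2LE: every ASCII character gains a 0x00 byte.
iconv -f UTF-8 -t UCS-2LE < "$tmp/a-utf8.txt" > "$tmp/a-ucs2.txt"
iconv -f UTF-8 -t UCS-2LE < "$tmp/b-utf8.txt" > "$tmp/b-ucs2.txt"

# UTF-8: a normal line-by-line diff (exit status 1 means "files differ").
diff "$tmp/a-utf8.txt" "$tmp/b-utf8.txt" || echo "diff exit status: $?"

# UCS-2LE: GNU diff sees NUL bytes and only reports that the files differ.
diff "$tmp/a-ucs2.txt" "$tmp/b-ucs2.txt" || echo "diff exit status: $?"
```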


Jarl Friis
Softace ApS
Omøgade 8, 2.sal
2100 København Ø.
Phone:  +45 26 13 20 90
E-mail: address@hidden
LinkedIn: https://www.linkedin.com/in/jarlfriis