Re: Unicode support
From: Jarl Friis
Subject: Re: Unicode support
Date: Tue, 25 Jul 2006 20:06:22 +0200
User-agent: Gnus/5.1006 (Gnus v5.10.6) XEmacs/21.5 (chestnut, linux)
Bruno Haible <address@hidden> writes:
> Hi,
>
> Jarl Friis wrote:
>> I would like to see support for UNICODE files, i.e. text files encoded
>> as ucs2.
>>
>> i.e. support for this in diff and diff3.
>
> The basic principle of Unix on the command-line is that you can put
> together complex commands from simple ones.
Basically I tend to agree.
> #!/bin/bash
> inputfile1=$1
> inputfile2=$2
> diff <(iconv -f UCS-2 < "$inputfile1") <(iconv -f UCS-2 < "$inputfile2")
Thanks for this "one-liner". I didn't know that iconv falls back to the
current locale's encoding when no "to-encoding" is given; a small test
shows that this is UTF-8 on my system.
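Since the output encoding of iconv depends on the locale when -t is omitted, a variant that names the target explicitly may be more predictable. The following is only a sketch: the function name diff_as_utf8 and its argument order are my own invention, not part of diffutils or iconv.

```shell
#!/bin/bash
# Hypothetical helper (not part of diffutils): compare two files after
# converting both from the same source encoding to UTF-8.
# Usage: diff_as_utf8 ENCODING FILE1 FILE2
diff_as_utf8() {
    local enc=$1 file1=$2 file2=$3
    # -t UTF-8 is given explicitly so the result does not depend on the
    # current locale, which iconv falls back to when -t is omitted.
    diff <(iconv -f "$enc" -t UTF-8 < "$file1") \
         <(iconv -f "$enc" -t UTF-8 < "$file2")
}
```

It would be called as, for example, diff_as_utf8 UCS-2 old.txt new.txt. The exit status is that of diff, so it is 0 when the converted files are identical and 1 when they differ.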
> There is no need to add this support directly to 'diff' itself, because
> - UCS-2 encoded files are quite rare on Unix,
Not on Cygwin :-)
> - the above solution does it.
Good argument
>
> By the way, the standard encoding on many Linux systems nowadays is
> UTF-8. It is also Unicode, and unlike UCS-2,
> - it supports all traditional chinese characters, not just the most
> frequently used 50%,
> - it does not require unreliable heuristics to determine the "endianness"
> of the encoding.
Very good arguments. I have only just realised that UTF-8 covers all of
Unicode, whereas UCS-2 covers only a (large) subset.
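The subset relation can be checked with iconv itself. This is a minimal sketch, assuming an iconv that knows the encoding name UCS-2; the function name can_encode_ucs2 is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the UTF-8 string given as $1 can be
# converted to UCS-2, fails otherwise.
can_encode_ucs2() {
    printf '%s' "$1" | iconv -f UTF-8 -t UCS-2 > /dev/null 2>&1
}

# U+00F8 (o with stroke) is inside the 16-bit range covered by UCS-2,
# so this conversion is expected to succeed.
can_encode_ucs2 $'\xc3\xb8' && echo "U+00F8 fits in UCS-2"

# U+1F600 is above U+FFFF, so UCS-2 has no code for it and the
# conversion is expected to fail.
can_encode_ucs2 $'\xf0\x9f\x98\x80' || echo "U+1F600 does not fit in UCS-2"
```

Both strings are written as raw UTF-8 bytes so that the script does not depend on the encoding of the file it is saved in.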
So, given these very good arguments, I assume that diffutils supports
UTF-8, right?
Jarl
--
Jarl Friis
Softace ApS
Omøgade 8, 2.sal
2100 København Ø.
Denmark
Phone: +45 26 13 20 90
E-mail: address@hidden
LinkedIn: https://www.linkedin.com/in/jarlfriis