[Monotone-devel] iconv diffs [Was: Why is utf8...]
From: Lapo Luchini
Subject: [Monotone-devel] iconv diffs [Was: Why is utf8...]
Date: Fri, 16 Feb 2007 16:26:57 +0100
User-agent: Thunderbird 1.5.0.9 (X11/20070129)
Lapo Luchini wrote:
> Zack Weinberg wrote:
>> The //IGNORE and //TRANSLIT features are glibc / GNU libiconv
>> specific, but I would have thought that they were available in recent
>> Gentoo (they've been around since 2001 give or take).
>
>> Many systems have an iconv(1) command line utility that may be helpful
>> here.
>
> Uh, right, but writing a "known good UTF-8 string" escaped on the
> command line seems a bit trickier to me... no, not really.
>
> % echo "\xC2\xB7" | iconv -f UTF-8 -t CP1252//IGNORE//TRANSLIT
> · (that is, the correct and converted U+00B7 MIDDLE DOT)
> % echo "\xC2\xB7" | iconv -f UTF-8 -t ASCII//IGNORE//TRANSLIT
> .
> % echo "\xC3\x80" | iconv -f UTF-8 -t CP1252//IGNORE//TRANSLIT
> À (that is, correct U+00C0 LATIN CAPITAL LETTER A WITH GRAVE)
> % echo "\xC3\x80" | iconv -f UTF-8 -t ASCII//IGNORE//TRANSLIT
> `A
>
> Derek (or anyone else with Gentoo), what do you get with these?
OK, I managed to reproduce it here at work on a Fedora box; its iconv is
really braindead:
% echo "\xC3\x80" | iconv -f UTF-8 -t ASCII//IGNORE//TRANSLIT
iconv: illegal input sequence at position 3
% echo "\xC3\x80" | iconv -f UTF-8 -t ASCII//IGNORE
iconv: illegal input sequence at position 3
% echo "\xC3\x80" | iconv -f UTF-8 -t ASCII//TRANSLIT
?
So the "solution" on those hosts would be to use only //TRANSLIT, but
that's only a partial solution anyway, as not everything can be
transliterated. E.g. the Japanese katakana "po" (U+30DD).
On FreeBSD, with libiconv 1.9.2:
% echo "\xE3\x83\x9D" | iconv -f UTF-8 -t ASCII//IGNORE//TRANSLIT
% echo "\xE3\x83\x9D" | iconv -f UTF-8 -t ASCII//IGNORE
% echo "\xE3\x83\x9D" | iconv -f UTF-8 -t ASCII//TRANSLIT
iconv: (stdin): cannot convert
On Fedora, with the iconv bundled inside glibc:
% echo "\xE3\x83\x9D" | iconv -f UTF-8 -t ASCII//IGNORE//TRANSLIT
iconv: illegal input sequence at position 4
% echo "\xE3\x83\x9D" | iconv -f UTF-8 -t ASCII//IGNORE
iconv: illegal input sequence at position 4
% echo "\xE3\x83\x9D" | iconv -f UTF-8 -t ASCII//TRANSLIT
?
There isn't any form that does something useful on both. =(
I'll probably take a closer look at the problem this evening.
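Since no single suffix behaves the same on both hosts, one possible workaround is to try several in turn and keep the first that succeeds. What follows is only a rough sketch, not anything taken from monotone: the function name to_ascii and the order of the attempts are my own assumptions, and the last-resort step simply deletes every non-ASCII byte, which loses information.

```shell
#!/bin/sh
# Hypothetical helper (not from monotone or iconv): convert UTF-8 text on
# stdin to ASCII, trying several iconv target suffixes until one succeeds.
to_ascii() {
    # Buffer stdin so that every attempt sees the same bytes.
    input=$(cat)
    for suffix in '//TRANSLIT' '//IGNORE' ''; do
        # Discard iconv's output unless it exits successfully.
        if output=$(printf '%s\n' "$input" |
                    iconv -f UTF-8 -t "ASCII$suffix" 2>/dev/null); then
            printf '%s\n' "$output"
            return 0
        fi
    done
    # Last resort (assumption: dropping data is acceptable): delete every
    # byte outside the ASCII range instead of failing.
    printf '%s\n' "$input" | LC_ALL=C tr -cd '\001-\177'
}
```

It could be called as `printf '\303\200\n' | to_ascii`. The octal escapes are used because whether echo expands "\xC3\x80" depends on the shell (zsh does, bash does not by default), while printf with octal escapes is specified by POSIX. What comes out still differs between iconv implementations (a transliteration, a "?", or nothing), but it should always be plain ASCII.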