bug-gnu-libiconv

Re: [bug-gnu-libiconv] problem with iso-8859-8 encoding


From: Alexander (Sasha) Sirotkin
Subject: Re: [bug-gnu-libiconv] problem with iso-8859-8 encoding
Date: Thu, 28 Feb 2008 23:34:07 +0200



On Tue, Feb 26, 2008 at 4:25 AM, Bruno Haible <address@hidden> wrote:
Hello,

Alexander Sirotkin wrote:
> I find it hard to believe, but apparently iconv has a problem converting
> iso-8859-8 (Hebrew) to any other encoding, for instance UTF-8. Hebrew
> letters in the result appear in reverse order.

As you can read in [1], [2], text in ISO 8859-8 is "sometimes in logical,
sometimes in visual order". Therefore the request to convert ISO-8859-8 to
UTF-8 is already ambiguous per se. Some others [3] say that ISO-8859-8 is always
visual... Oh well.

Additionally, conversion between visual and logical order requires an
arbitrary amount of memory (whose size depends on the input); this
does not fit into the way iconv is implemented in GNU libc and in GNU libiconv.

For these reasons, GNU libc and GNU libiconv don't implement this reordering.
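
To illustrate the point, here is a minimal Python sketch (not iconv or
libiconv code). It assumes a made-up one-word sample and uses Python's
built-in "iso-8859-8" codec as a stand-in for a plain charset converter.
The helper naive_visual_to_logical is hypothetical and only handles a line
consisting purely of Hebrew letters; it is not a bidi algorithm.

```python
# Hypothetical sample: the Hebrew word "shalom" in logical (reading) order:
# shin, lamed, vav, final mem.
logical = "\u05e9\u05dc\u05d5\u05dd"

# A visual-order file stores the letters as they appear left to right on
# screen, i.e. reversed for a pure right-to-left line.
visual_bytes = logical[::-1].encode("iso-8859-8")

# Charset conversion maps each byte to a code point and keeps the order,
# so the decoded text is still in visual order.
decoded = visual_bytes.decode("iso-8859-8")
utf8_bytes = decoded.encode("utf-8")


def naive_visual_to_logical(line: str) -> str:
    """Reverse one line of text.

    Assumption: the line contains only Hebrew letters. Digits, Latin text
    or punctuation need real bidi handling, which this does not attempt.
    The whole line must be in memory first, because the first logical
    character is the last one stored.
    """
    return line[::-1]


restored = naive_visual_to_logical(decoded)
```

The reversal cannot emit anything until it has seen the end of the line,
which is the memory problem described above: the buffer grows with the
input instead of staying at a fixed size.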

Fribidi implements reordering from logical to visual order.

The only free software (that I know of) that does reordering of ISO-8859-8
from visual to logical is ICU, and its documentation [4] says:

 "Legacy systems frequently stored text in visual order to avoid
  reordering for display. When exchanging data with such systems for
  processing in Unicode it is necessary to reorder the data from visual
  order to logical order and back. Such not-for-display transformations
  are sometimes referred to as "storage layout" transformations.

  There are two problems with an "inverse reordering" from visual to
  logical order: There may be more than one logical order of text that
  results in the same display (logical-to-visual reordering is a many-to-one
  function), and there is no standard algorithm for it. ICU's BiDi API
  provides a setting for "inverse" operation that modifies the standard
  Unicode Bidi algorithm. However, it may not always produce the expected
  results. Bidirectional data should be converted to Unicode and reordered
  to logical order only once to avoid roundtrip losses. Just as it is best
  to never convert to non-Unicode charsets, data should not be reordered
  from logical to visual order except for display and printing."
 
Well, I was under the impression that ISO-8859-8 always means visual and ISO-8859-8-I is logical, but apparently not everybody thinks the same.

ICU looks like what I need. Indeed, the uconv utility from that package seems to support both ISO-8859-8 and ISO-8859-8-I, at least according to its help output. Unfortunately, it does not seem to fix the direction either. I guess I will have to write to the ICU people about that.

Thanks a lot.
 

Bruno


[1] http://en.wikipedia.org/wiki/ISO_8859-8
[2] http://en.wikipedia.org/wiki/ISO-8859-8-I
[3] http://www.w3.org/TR/2002/WD-xhtml2-20021211/mod-bidi.html
[4] http://www.icu-project.org/userguide/icu.pdf


