libcdio-devel

Re: [Libcdio-devel] How tolerant to be towards CD-TEXT character set mis


From: Leon Merten Lohse
Subject: Re: [Libcdio-devel] How tolerant to be towards CD-TEXT character set mislabeling ?
Date: Mon, 29 Apr 2019 08:57:26 +0200
User-agent: Roundcube Webmail/1.3.3

Hi,

Note that ASCII, ISO-8859-1, and Japanese SHIFT_JIS (which I have never seen in the wild) are the only allowed encodings. [1] ISO-8859-1 is also the "default", since it is selected by a 0x00 and, more importantly, it is the only allowed encoding for some of the fields.

Ignoring the "ASCII" byte seems like a good workaround to me. But I would strongly advise against assuming an encoding other than ISO-8859-1.

If one wants to be even more tolerant:
How about introducing a second step that simply ignores the illegal characters in case the iconv conversion ISO-8859-1 -> UTF-8 fails?
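
A minimal sketch of what such a second step could look like, assuming POSIX iconv(3). The helper name to_utf8_lenient is hypothetical and not part of libcdio. It skips one input byte whenever iconv reports EILSEQ or EINVAL and then continues with the rest of the text.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, not a libcdio function: convert `in` from the
 * encoding `fromcode` to UTF-8, dropping input bytes that iconv rejects
 * as illegal (EILSEQ) or incomplete (EINVAL).
 * Returns the number of bytes written (without the trailing NUL), or
 * (size_t)-1 if the converter cannot be opened or `out` is too small. */
static size_t to_utf8_lenient(const char *fromcode,
                              const char *in, size_t in_len,
                              char *out, size_t out_size)
{
    if (out_size == 0)
        return (size_t)-1;

    iconv_t cd = iconv_open("UTF-8", fromcode);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;        /* iconv() takes char **, input is not modified */
    size_t src_left = in_len;
    char *dst = out;
    size_t dst_left = out_size - 1; /* reserve room for the NUL */

    while (src_left > 0) {
        if (iconv(cd, &src, &src_left, &dst, &dst_left) != (size_t)-1)
            break;                  /* everything converted */
        if (errno == EILSEQ || errno == EINVAL) {
            src++;                  /* skip the offending byte */
            src_left--;
            iconv(cd, NULL, NULL, NULL, NULL); /* reset conversion state */
        } else {
            iconv_close(cd);        /* E2BIG: output buffer too small */
            return (size_t)-1;
        }
    }

    iconv_close(cd);
    *dst = '\0';
    return (size_t)(dst - out);
}
```

With the iconv implementations I know of, ISO-8859-1 assigns a character to every byte value, so the skip path is more likely to be taken for a stricter source encoding such as "ASCII" than for ISO-8859-1 itself.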

Best
Leon

[1] https://www.gnu.org/software/libcdio/cd-text-format.html

On 2019-04-27 16:34, Thomas Schmitt wrote:
Hi,

in the course of
  https://savannah.gnu.org/bugs/?53929
it turned out that the CD in question announces to have its CD-TEXT
encoded in 7-bit ASCII, but then has 8-bit characters as of ISO-8859-1.

Having pondered how to deal with this situation, i come to the conclusion
that either libcdio should refuse to show such a text, or retry with
ISO-8859-1.
The retry can be cut short due to the fact that ISO-8859-1 officially
includes ASCII as its codes below 128 (decimal).
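
To illustrate why no separate ASCII pass is needed, here is a small sketch of a hand-written ISO-8859-1 to UTF-8 decoder. Both function names are hypothetical and do not exist in libcdio; the real code uses iconv. Bytes below 128 are copied unchanged, which is exactly the ASCII case, and bytes from 128 upwards become two-byte UTF-8 sequences.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if any byte has its high bit set,
 * i.e. the text cannot be pure 7-bit ASCII despite what it announces. */
static int has_8bit_bytes(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (s[i] & 0x80)
            return 1;
    return 0;
}

/* Hypothetical helper: decode ISO-8859-1 bytes into UTF-8.
 * `out` must provide at least 2 * len + 1 bytes.
 * Returns the number of bytes written, not counting the trailing NUL. */
static size_t latin1_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];

        if (b < 0x80) {
            /* ASCII range: identical in ISO-8859-1 and UTF-8 */
            out[o++] = b;
        } else {
            /* U+0080..U+00FF: two-byte UTF-8 sequence */
            out[o++] = (unsigned char)(0xC0 | (b >> 6));
            out[o++] = (unsigned char)(0x80 | (b & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

A text that announces ASCII but makes has_8bit_bytes() return 1 is the situation of the bug report. Decoding it as ISO-8859-1 still yields the same result for all its 7-bit characters.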

So it is about the boss decision whether to simply decode ASCII as
ISO-8859-1 rather than to possibly let iconv take offense from 8-bit
characters.

Rocky will have to decide in the end. But others are invited to tell
their opinion.

I vote for being tolerant.

In the bug report i asked Serge Pouliquen to test this:

--- lib/driver/cdtext.c 2018-06-14 17:26:07.742400554 +0200
+++ lib/driver/cdtext.bug53929.c 2019-04-27 15:08:22.336291660 +0200
@@ -717,7 +717,12 @@ cdtext_data_init(cdtext_t *p_cdtext, uin
             charset = (char *) "ISO-8859-1";
             break;
           case CDTEXT_CHARCODE_ASCII:
-            charset = (char *) "ASCII";
+            /* ASCII is a subset of ISO-8859-1. Some CDs announce it but then
+             * have 8-bit characters in their text. Trying ISO-8859-1 gives
+             * more hope for a readable result than telling iconv to be picky.
+             */
+            charset = (char *) "ISO-8859-1";
             break;
           case CDTEXT_CHARCODE_SHIFT_JIS:
             charset = (char *) "SHIFT_JIS";


Have a nice day :)

Thomas


