Re: Wrong letter in title

From: David Kastrup
Subject: Re: Wrong letter in title
Date: Sun, 30 Sep 2018 14:52:42 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

Davide Liessi <address@hidden> writes:

> On Sun, 20 May 2018 at 18:35, Davide Liessi
> <address@hidden> wrote:
>> The file
>> \version "2.19.81"
>> \header { title = "č" }
>> { b1 }
>> results in a PDF with correct printed title (lowercase c with caron)
>> but wrong title field in metadata (Ċ, i.e. uppercase c with dot
>> above).
> On Sun, 20 May 2018 20:52:58 +0200 David Kastrup wrote:
>> Ghostscript bug when converting PostScript output to PDF.  The
>> PostScript reads (pasted from less' display)
>> mark /Creator (LilyPond 2.21.0)
>> /Title (<FE><FF>^A^M)
>> /DOCINFO pdfmark
>> which is the correct UTF-16BE string with BOM.  Ghostscript however
>> converts the ^M (0x0d) into ^J (0x0a), basically converting an ASCII CR
>> to an ASCII LF.  Unfortunately, we are not in the middle of ASCII here.
> Actually, it turns out that the behaviour of Ghostscript is not wrong
> and this is probably a bug in how LilyPond produces the PostScript
> file.
> PostScript strings must either properly escape non-ASCII or ASCII
> non-printable bytes, e.g., as \ddd with ddd the octal representation,
> or they must be defined as a hexadecimal string (see [1], pages
> 29–31).

Uh WHAT?  To quote:

    The \ddd form may be used to include any 8-bit character constant in
    a string.  One, two, or three octal digits may be specified, with
    high-order overflow ignored. This notation is preferred for
    specifying a character outside the recommended ASCII character set
    for the PostScript language, since the notation itself stays within
    the standard set and thereby avoids possible difficulties in
    transmitting or storing the text of the program. It is recommended
    that three octal digits always be used, with leading zeros as
    needed, to prevent ambiguity. The string (\0053), for example,
    contains two characters—an ASCII 5 (Control-E) followed by the digit
    3—whereas the strings (\53) and (\053) contain one character, the
    ASCII character whose code is octal 53 (plus sign).

Recommended/preferred is not at all equivalent to "must".  However, one
problem indeed is that strings as such have no notion of encoding, and
an unescaped CR, LF, or CRLF inside a literal string is treated as an
end-of-line marker and read as a single newline (LF).  So at least those
bytes, when they occur as part of UTF-16, would warrant escaping.
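
To make the escaping concrete, here is a minimal Python sketch.  It is
not LilyPond's actual code; the function names ps_utf16_literal and
ps_utf16_hex are hypothetical and only illustrate the two forms
mentioned above (a literal string with \ddd escapes, and a hexadecimal
string).  The last lines reproduce the corruption from the report by
swapping the 0x0d byte for 0x0a.

```python
import codecs


def ps_utf16_literal(text: str) -> str:
    """Hypothetical helper: PostScript literal string holding UTF-16BE
    with BOM, every non-printable or non-ASCII byte written as \\ddd."""
    data = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    out = []
    for byte in data:
        ch = chr(byte)
        if ch in "()\\":
            # Parentheses and backslash are special inside ( ... ).
            out.append("\\" + ch)
        elif 0x20 <= byte <= 0x7E:
            out.append(ch)
        else:
            # Three octal digits, as the quoted text recommends.
            out.append("\\%03o" % byte)
    return "(" + "".join(out) + ")"


def ps_utf16_hex(text: str) -> str:
    """Hypothetical helper: the same bytes as a hexadecimal string."""
    data = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


title = "\u010d"  # lowercase c with caron, as in the report
literal_form = ps_utf16_literal(title)
hex_form = ps_utf16_hex(title)

# What a reader sees if the raw 0x0d byte is turned into 0x0a:
raw = codecs.BOM_UTF16_BE + title.encode("utf-16-be")
corrupted = raw.replace(b"\x0d", b"\x0a").decode("utf-16")

print(literal_form)
print(hex_form)
print(corrupted)
```

With either form no raw CR byte ever appears in the PostScript source,
so end-of-line translation cannot touch the title.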

David Kastrup
