Re: Wrong letter in title

From: David Kastrup
Subject: Re: Wrong letter in title
Date: Sun, 30 Sep 2018 15:58:53 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

David Kastrup <address@hidden> writes:

> Davide Liessi <address@hidden> writes:
>> On Sun, 20 May 2018 at 18:35, Davide Liessi
>> <address@hidden> wrote:
>>> The file
>>> \version "2.19.81"
>>> \header { title = "č" }
>>> { b1 }
>>> results in a PDF with correct printed title (lowercase c with caron)
>>> but wrong title field in metadata (Ċ, i.e. uppercase c with dot
>>> above).
>> On Sun, 20 May 2018 20:52:58 +0200 David Kastrup wrote:
>>> Ghostscript bug when converting PostScript output to PDF.  The
>>> PostScript reads (pasted from less' display)
>>> mark /Creator (LilyPond 2.21.0)
>>> /Title (<FE><FF>^A^M)
>>> /DOCINFO pdfmark
>>> which is the correct UTF-16BE string with BOM.  Ghostscript, however,
>>> converts the ^M (0x0d) into ^J (0x0a), basically converting an ASCII CR
>>> to an ASCII LF.  Unfortunately, we are not in the middle of ASCII here.
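A minimal Python sketch of the mix-up (an illustration only, not LilyPond or Ghostscript code): "č" is U+010D, so the pdfmark string holds the bytes FE FF 01 0D; turning the CR byte into LF gives FE FF 01 0A, which decodes as U+010A, "Ċ". The CR-to-LF replacement below is a simplified stand-in for the interpreter's string-literal parsing, not its actual implementation.

```python
# Title as written into the pdfmark: UTF-16BE preceded by a byte-order mark.
title = "\u010d"  # č, LATIN SMALL LETTER C WITH CARON
raw = b"\xfe\xff" + title.encode("utf-16-be")
print(raw.hex())  # feff010d

# Simplified model of the reader: an unescaped CR inside the string
# literal comes back as LF.
mangled = raw.replace(b"\r", b"\n")
print(mangled.hex())  # feff010a

# The "utf-16" codec honours the BOM, so this decodes as big-endian.
decoded = mangled.decode("utf-16")
print(f"U+{ord(decoded):04X}")  # U+010A, i.e. the wrong title letter
```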
>> Actually, it turns out that the behaviour of Ghostscript is not wrong
>> and this is probably a bug in how LilyPond produces the PostScript
>> file.
>> PostScript strings must either escape non-ASCII bytes and
>> non-printable ASCII bytes properly, e.g. as \ddd where ddd is the
>> octal code, or be written as hexadecimal strings (see [1], pages
>> 29–31).
> Uh WHAT?  To quote:
>     The \ddd form may be used to include any 8-bit character constant in
>     a string.  One, two, or three octal digits may be specified, with
>     high-order overflow ignored. This notation is preferred for
>     specifying a character outside the recommended ASCII character set
>     for the PostScript language, since the notation itself stays within
>     the standard set and thereby avoids possible difficulties in
>     transmitting or storing the text of the program. It is recommended
>     that three octal digits always be used, with leading zeros as
>     needed, to prevent ambiguity. The string (\0053) , for example,
>     contains two characters—an ASCII 5 (Control-E) followed by the digit
>     3—whereas the strings (\53) and (\053) contain one character, the
>     ASCII character whose code is octal 53 (plus sign).
> Recommended/preferred is not at all equivalent to "must".  However, one
> problem indeed is that strings as such have no notion of encoding, and
> an unescaped CR, LF or CRLF inside a string literal is read as a single
> LF.  So at least those bytes, when they occur as part of UTF-16 data,
> warrant escaping.
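One way to do that escaping, sketched in Python under stated assumptions: the helper name ps_escape is hypothetical, and it escapes every byte outside printable ASCII, which is broader than only NUL, CR and LF. Backslash and parentheses are escaped too, since UTF-16 data can contain a stray 0x5C or unbalanced 0x28/0x29 (U+0128 "Ĩ" is 01 28). Following the PLRM advice quoted above, octal escapes always use three digits.

```python
def ps_escape(data: bytes) -> str:
    """Return `data` as the body of a PostScript (...) string literal.

    Hypothetical helper: backslash and parentheses get a leading
    backslash; NUL, CR, LF and every other byte outside printable ASCII
    become a three-digit octal escape, so no line-end translation can
    touch UTF-16 payload bytes.
    """
    out = []
    for b in data:
        if b in b"\\()":
            out.append("\\" + chr(b))
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))
        else:
            # Always three digits: "\0053" is 0x05 then "3", not octal 53.
            out.append(f"\\{b:03o}")
    return "".join(out)


title = b"\xfe\xff" + "\u010d".encode("utf-16-be")  # "č" as UTF-16BE + BOM
print(f"mark /Title ({ps_escape(title)}) /DOCINFO pdfmark")
# mark /Title (\376\377\001\015) /DOCINFO pdfmark
```

Plain ASCII titles stay readable with this approach; only the bytes that could be misread are spelled out.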

Tracker issue: 5422 (https://sourceforge.net/p/testlilyissues/issues/5422/)
Rietveld issue: 345090043 (https://codereview.appspot.com/345090043)
Issue description:
  Escape nul, cr, newline in PDF metadata

I wasn't really aware that the strings remain plain 8-bit strings on
input and that the UTF-16 interpretation is the private business of the
pdfmark command.  So thanks for that pointer, which makes it possible to
tackle this fairly long-known bug.
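Since pdfmark only ever sees the raw bytes, the hexadecimal-string form mentioned earlier in the thread would deliver the same bytes with no line-end interpretation at all. A minimal Python sketch; the helper name ps_hex_string is hypothetical:

```python
def ps_hex_string(text: str) -> str:
    """Hypothetical helper: encode `text` as UTF-16BE with a byte-order
    mark and return it as a PostScript hexadecimal string <...>."""
    data = b"\xfe\xff" + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


title = "\u010d"  # č
print(f"mark /Title {ps_hex_string(title)} /DOCINFO pdfmark")
# mark /Title <FEFF010D> /DOCINFO pdfmark
```

The trade-off: a hex string is pure ASCII and needs no escaping rules, but it roughly doubles the size and makes even ASCII titles unreadable in the PostScript source.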

David Kastrup
