Re: guile-2.0 and debian

lilypond-devel

From:	Antonio Ospite
Subject:	Re: guile-2.0 and debian
Date:	Sat, 19 Nov 2016 16:05:22 +0100
On Fri, 18 Nov 2016 00:24:19 +0100
Thomas Morley <address@hidden> wrote:

[...]
> Hi Antonio,
>

Hi,

> as said, no time to dive in deeper, though here some observations (my
> test-file attached.)
> 
> - toplevel-markups with special characters working
> - ly-identifier with special characters working
> - context-names with special characters and assigned Lyrics working
> 
> - scheme/guile-identifier with special characters _not_ working

AFAIU this is what happens with guile-2.0:

  - lilypond tries to open the input file as a bytevector input port
    (see lily/source-file.cc in Source_file::init_port ()), the encoding
    is set to Latin1 which should mean a "pure" binary port:
    https://www.gnu.org/software/guile/docs/master/guile.html/Encoding.html

  - and then, when scm_read() (lily/parse-scm.cc in
    internal_ly_parse_scm()) is called to parse the embedded scm, the
    latter is interpreted as Latin1 too.
  
  - while the identifiers outside the scm code are recognized as UTF-8.

To confirm my reasoning I tried this weird input file, and it does not
give the error, even though the final output is not quite right (bööh
gets the wrong encoding too):

---------------------------------------------------------------------
#(define bääh #{ { c1^\markup "bööh" } #})
\new Staff \bÃ¤Ã¤h
---------------------------------------------------------------------
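
To make the effect visible outside of lilypond, here is a minimal C++
sketch of what happens when UTF-8 bytes are read as Latin1 and then
written back out as UTF-8. The helper name latin1_to_utf8 is made up for
this illustration; it is not a LilyPond or Guile function.

```cpp
#include <string>

// Hypothetical helper: treat every input byte as one Latin1 code point
// and re-encode the result as UTF-8. This mimics a reader that decodes
// UTF-8 source text with a Latin1 port encoding.
std::string latin1_to_utf8(const std::string &bytes)
{
  std::string out;
  for (unsigned char c : bytes)
    {
      if (c < 0x80)
        {
          // ASCII is identical in Latin1 and UTF-8.
          out.push_back(static_cast<char>(c));
        }
      else
        {
          // Code points 0x80..0xFF need two bytes in UTF-8.
          out.push_back(static_cast<char>(0xC0 | (c >> 6)));
          out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
  return out;
}
```

Feeding it the UTF-8 bytes of "bääh" (62 C3 A4 C3 A4 68) yields the
bytes 62 C3 83 C2 A4 C3 83 C2 A4 68, which is the doubly encoded name
used after \new Staff in the input file above.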

If lilypond were to set the %default-port-encoding to UTF-8 your example
would work, but I don't know if that would break something else.

Alternatively, a more "confined" change could look like this:

---------------------------------------------------------------------
diff --git a/lily/parse-scm.cc b/lily/parse-scm.cc
index 576591d..20627ed 100644
--- a/lily/parse-scm.cc
+++ b/lily/parse-scm.cc
@@ -54,7 +54,14 @@ internal_ly_parse_scm (Parse_start *ps)
   if (multiple)
     (void) scm_read_char (port);

+#if GUILEV2
+  SCM current_encoding = scm_port_encoding (port);
+  scm_set_port_encoding_x (port, ly_string2scm("UTF-8"));
   SCM form = scm_read (port);
+  scm_set_port_encoding_x (port, current_encoding);
+#else
+  SCM form = scm_read (port);
+#endif
   SCM to = scm_ftell (port);
---------------------------------------------------------------------

But, really? :)

> - pdf-meta-data with special characters _not_ working:
> 
> exiftool atest-40.pdf
> ExifTool Version Number         : 10.10
> File Name                       : atest-40.pdf
> ...skipping...
> Title                           : ??b??h
> Creator                         : LilyPond 2.19.51
> 

Your test about pdf-meta-data is enough to trigger the issue, but at
first it led me to only a partial understanding of the issue, because
those characters are also representable in Latin1. A better example is
to use Japanese text in the metadata.

BTW, the issue is that the values of the pdf metadata fields are
expected to be in UTF-16, but since my last change the port for the
postscript file uses the Latin1 encoding globally, and guile was
replacing the characters that are not representable in that encoding
with "?" (the first two "??" are the BOM and the other two are the
udieresis).
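
For reference, a small C++ sketch of the byte layout I mean by "UTF-16"
here: the FE FF byte order mark followed by big-endian code units. The
function name utf16be_with_bom is an assumption made for this example,
and it is not what lilypond or guile call internally.

```cpp
#include <string>

// Hypothetical helper: encode Unicode code points as UTF-16BE,
// preceded by the FE FF byte order mark.
std::string utf16be_with_bom(const std::u32string &text)
{
  std::string out = "\xFE\xFF";

  // Append one 16-bit code unit, most significant byte first.
  auto put16 = [&out](unsigned int unit) {
    out.push_back(static_cast<char>((unit >> 8) & 0xFF));
    out.push_back(static_cast<char>(unit & 0xFF));
  };

  for (char32_t cp : text)
    {
      if (cp < 0x10000)
        {
          put16(cp);
        }
      else
        {
          // Code points beyond the BMP become a surrogate pair.
          unsigned int v = cp - 0x10000;
          put16(0xD800 | (v >> 10));
          put16(0xDC00 | (v & 0x3FF));
        }
    }
  return out;
}
```

With this, U+00FC comes out as the two bytes 00 FC after the BOM, while
a Japanese character such as U+65E5 comes out as 65 E5, which has no
Latin1 counterpart at all.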

The following change fixes the issue at hand:

---------------------------------------------------------------------
diff --git a/scm/framework-ps.scm b/scm/framework-ps.scm
index a404119..b2b6802 100644
--- a/scm/framework-ps.scm
+++ b/scm/framework-ps.scm
@@ -28,6 +28,9 @@
              (scm clip-region)
              (lily))

+(if (guile-v2)
+  (use-modules(rnrs bytevectors)))
+
 (define format ergonomic-simple-format)

 (define framework-ps-module (current-module))
@@ -518,15 +521,22 @@
   (define (metadata-encode val)
     ;; First, call ly:encode-string-for-pdf to encode the string (Latin1 or
     ;; utf-16be), then escape all parentheses and backslashes
-    ;; FIXME guile-2.0: use (string->utf16 str 'big) instead
+    ;; With guile-2.0: use (string->utf16 str 'big) instead
+    (if (guile-v2)
+      (ps-quote (utf16->string (string->utf16 val 'big)))
+      (ps-quote (ly:encode-string-for-pdf val))))

-    (ps-quote (ly:encode-string-for-pdf val)))
   (define (metadata-lookup-output overridevar fallbackvar field)
     (let* ((overrideval (ly:modules-lookup (list header) overridevar))
            (fallbackval (ly:modules-lookup (list header) fallbackvar))
            (val (if overrideval overrideval fallbackval)))
       (if val
-          (format port "/~a (~a)\n" field (metadata-encode (markup->string val (list header)))))))
+        (begin
+          (format port "/~a (" field)
+          (set-port-encoding! port "UTF-16")
+          (format port "~a" (metadata-encode (markup->string val (list header))))
+          (set-port-encoding! port "ISO-8859-1")
+          (format port ")\n")))))

   (if (module? header)
       (begin
---------------------------------------------------------------------

This is rather ugly, but encoding only the actual _value_ of the field
in UTF-16 makes it possible to get exactly the same output as with
guile-1.8.

The issue is about a file (the postscript file) with a mixed encoding
(Latin1 and UTF-16) while the file port only has one encoding.
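
Seen at the byte level, the mixed encoding is less mysterious. The
following C++ sketch assembles one metadata line from plain-ASCII
framing and an already encoded value, escaping parentheses and
backslashes as the comment in metadata-encode describes. Both function
names are hypothetical, and this only illustrates the layout, not how
framework-ps.scm does it.

```cpp
#include <string>

// Hypothetical helper: backslash-escape the bytes that would otherwise
// terminate or corrupt a PostScript string literal.
std::string escape_ps_string(const std::string &bytes)
{
  std::string out;
  for (char c : bytes)
    {
      if (c == '(' || c == ')' || c == '\\')
        out.push_back('\\');
      out.push_back(c);
    }
  return out;
}

// Hypothetical helper: the framing is ASCII (valid Latin1), while
// value_bytes is copied through untouched apart from the escaping, so
// it may hold UTF-16 data.
std::string metadata_field(const std::string &field,
                           const std::string &value_bytes)
{
  return "/" + field + " (" + escape_ps_string(value_bytes) + ")\n";
}
```

A port that works on bytes can emit such a line directly; a port with a
single character encoding has to be switched around the value, which is
what the patch above does.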

AFAICS in guile-2.0 the difference between characters and bytes is taken
very seriously.

I'll try to set %default-port-conversion-strategy to 'error and see if
some other issue shows up.  Where is the earliest point I can set that?
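
To illustrate what I expect the two strategies to do, here is a C++
sketch of converting text to Latin1 either by substituting "?" or by
failing loudly. The Strategy enum and to_latin1 are names invented for
this example, and this is only my understanding of the behaviour, not
guile's implementation.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the 'substitute and 'error strategies.
enum class Strategy { Substitute, Error };

// Hypothetical helper: convert code points to Latin1 bytes.
std::string to_latin1(const std::u32string &text, Strategy strategy)
{
  std::string out;
  for (char32_t cp : text)
    {
      if (cp <= 0xFF)
        {
          // Representable: Latin1 maps code points 0..255 to one byte.
          out.push_back(static_cast<char>(cp));
        }
      else if (strategy == Strategy::Substitute)
        {
          // Silent data loss, which is what hid the problem so far.
          out.push_back('?');
        }
      else
        {
          throw std::range_error("character not representable in Latin1");
        }
    }
  return out;
}
```

With the error strategy every place that writes non-Latin1 text to a
Latin1 port should fail immediately instead of producing "?".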

Maybe in the long run using soft ports[1] or R6RS I/O Ports[2] could be
a solution?

[1] https://www.gnu.org/software/guile/manual/html_node/Soft-Ports.html#Soft-Ports
[2] https://www.gnu.org/software/guile/manual/html_node/R6RS-I_002fO-Ports.html#R6RS-I_002fO-Ports

But I don't really know what I am talking about here...


> Last, but _most_ serious issue:
> 
> make LANGS='' doc
> always fails in the regtests with
>
> (1) gs-warnings like
> warning: `(gs -q -dNOSAFER -dEPSCrop -dCompatibilityLevel=1.4
> -dNOPAUSE -dBATCH -r1200 -sDEVICE=pdfwrite
> -sOutputFile=./11/lily-41c82a1e.pdf -c.setpdfwrite
> -f./11/lily-41c82a1e.eps)' failed (256)
>
> although they do not happen, if I compile the relevant file separately.
>

My previous workarounds took care of the postscript output, but here EPS
files are produced and I was not covering that case yet.

> (2) and finally aborts with:
> Processing `./0b/lily-d193a1c3.ly'
> Parsing...
> Renaming input to:
> `/home/hermann/lilypond-git/input/regression/markup-cyclic-reference.ly'lilypond:
> dynwind.c:121: scm_dynwind_end: Assertion `WINDER_P (entry)' failed.
> Aborted (core dumped)
> 
> The last one, probably because the two patches about it, one provided
> by Antonio and the one David checked in, have
> scm_dynwind_end ();
> at different places.
> Likely I should have applied only one of them
>

Yes :)

I managed to get "make LANGS='' doc" to succeed again; there was
a transient failure, but just rerunning the make command worked.

This is with the patches at:
https://ao2.it/tmp/lilypond-guile2/patches_2016-11-19/ on top of the git
tag "release/2.19.50-1".

With this setup your test file should work.

I still need to clean things up and ask for advice on better fixes,
but I wanted to report something in case you had some time over the
weekend.

Thanks,
   Antonio

-- 
Antonio Ospite
https://ao2.it
https://twitter.com/ao2it

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?