lilypond-user

Re: XML to .ly and Lilypond, again, with a fix


From: David Wright
Subject: Re: XML to .ly and Lilypond, again, with a fix
Date: Mon, 15 May 2017 11:28:41 -0500
User-agent: Mutt/1.5.21 (2010-09-15)

On Sun 14 May 2017 at 15:35:51 (+0100), Phil Holmes wrote:
> ----- Original Message ----- From: "Urs Liska" <address@hidden>
> To: <address@hidden>
> Sent: Sunday, May 14, 2017 3:06 PM
> Subject: Re: XML to .ly and Lilypond, again

> >On 14.05.2017 at 16:03, Phil Holmes wrote:
> >>I've just confirmed Ian Ring's suggestion - removing the copyright
> >>symbol allows the conversion to continue, but results in text with
> >>spurious null characters.

Only some of the text, as I reported in
http://lists.gnu.org/archive/html/lilypond-user/2017-05/msg00241.html

> >But can that be? Shouldn't MusicXML allow arbitrary regular Unicode
> >characters?
> 
> My understanding is the XML is like HTML and requires special
> characters to be escaped.

No, the norm is for XML to be written in Unicode as this one is,
hence its header:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>

So the program should be handling all the data as Unicode, and the
problem is the exact opposite of what I started out looking for.
Handling Unicode is tricky at best in Python 2, and I avoided it
myself by switching to Python 3 before trying to do anything more
than printing Unicode to output, which is all musicxml2ly should
really be doing.
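
To illustrate the point about the header, here is a minimal Python 3
sketch showing that a UTF-8 XML document can carry the copyright
symbol unescaped and that a parser hands it back as ordinary Unicode
text. The <rights> element and its content are made up for the
example; this is not taken from the file under discussion.

```python
import xml.etree.ElementTree as ET

# Bytes as they would sit in a UTF-8 encoded file.
# \xc2\xa9 is the UTF-8 encoding of U+00A9 (the copyright sign).
xml_bytes = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n'
    b'<rights>\xc2\xa9 2017 Example</rights>'
)

root = ET.fromstring(xml_bytes)
text = root.text          # a Unicode str, no escaping involved
print(repr(text))         # '© 2017 Example'
```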

However, the new version tries to do one clever thing and it's in
split_string_and_preserve_doublequoted_substrings in utilities.py.
This uses the shlex module whose preamble runs:

 The shlex class makes it easy to write lexical analyzers for simple
 syntaxes resembling that of the Unix shell. This will often be useful
 for writing minilanguages, (for example, in run control files for
 Python applications) or for parsing quoted strings.

 Prior to Python 2.7.3, this module did not support Unicode input.
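
For comparison, here is a rough sketch of what such a splitter can
look like in Python 3, where shlex accepts Unicode without trouble.
The helper name split_preserving_quotes is hypothetical and this is
not the code in utilities.py; it only shows the intended behaviour
(split on whitespace, keep quoted substrings in one piece).

```python
import shlex


def split_preserving_quotes(text):
    """Hypothetical stand-in: split on whitespace, keep quoted runs whole.

    posix=False makes shlex leave the quote characters in the tokens
    and treat backslashes as ordinary characters.
    """
    return shlex.split(text, posix=False)


words = split_preserving_quotes('\\markup "\u00a9 2017 Example" c4')
print(words)   # ['\\markup', '"© 2017 Example"', 'c4']
```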

So the fate of the copyright symbol in printer.dump should be
to go from
 u'"\xa9"'                   ← a unicode value
to
 [u'"\xa9"']                 ← a list with one unicode value
but instead it gets mangled to
 ['"\x00\xa9\x00"', '\x00']  ← a list of byte strings, no longer unicode.
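
One observation, which is an inference on my part and not something
I have confirmed in the shlex source: the mangled pieces, joined back
together, are exactly the UTF-16 (little-endian) bytes of the
original value. That would explain the null characters if the string
is being read as raw 16-bit data and then split byte by byte. A
Python 3 check:

```python
original = '"\u00a9"'                  # the value passed in: "©"
encoded = original.encode('utf-16-le')
print(encoded)                         # b'"\x00\xa9\x00"\x00'

# The two mangled strings reported above, written as bytes.
mangled = [b'"\x00\xa9\x00"', b'\x00']

# Joined, they match the UTF-16-LE encoding byte for byte.
print(b''.join(mangled) == encoded)    # True
```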

I don't know what the change was meant to fix as I've never used
musicxml in anger. But the easiest patch to get things to work is
to replace

 words = utilities.split_string_and_preserve_doublequoted_substrings (str)

with

 words = string.split (str)

in .../lilypond-2.19.…/lilypond/usr/share/lilypond/current/python/utilities.py
assuming you're running a downloaded version rather than one
included in your distribution. (Debian is still installing 2.18 IIRC.)
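
Bear in mind what the workaround gives up. A plain whitespace split
no longer keeps a double-quoted substring together if it contains
spaces. The sketch below uses a made-up input line and the Python 3
spelling str.split(); string.split (str) is the Python 2 form and
does not exist in Python 3.

```python
line = '\\markup "\u00a9 2017 Example" c4'   # illustrative input only

words = line.split()     # Python 3 equivalent of string.split (line)
print(words)
# ['\\markup', '"©', '2017', 'Example"', 'c4']
# The quoted text is now spread over three words, but the copyright
# sign itself comes through untouched.
```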

Cheers,
David.


