Re: guile-2.0 and debian

From: David Kastrup
Subject: Re: guile-2.0 and debian
Date: Sat, 19 Nov 2016 16:49:58 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux)

Antonio Ospite <address@hidden> writes:

> AFAICS in guile-2.0 the difference between characters and bytes is
> taken very seriously.

The problem is that lily/parser.yy and particularly lily/lexer.ll
implement robust and fast recognition and interpretation of UTF-8.
They transparently map the input to C++ strings encoded in UTF-8.

Guile-2.0 has _no_ UTF-8 encoded strings.  Its strings are _either_
encoded in Latin-1 or in UCS-4.  Its string _ports_ are exclusively
encoded in UTF-8 and that also includes any file offsets in the string
ports.  As a result, its string port offsets are _useless_ for indexing
into strings.
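
To make the offset mismatch concrete, here is a small sketch in Python
rather than Guile.  It is an illustration under assumptions, not Guile
code: Python's str (indexed by character) stands in for a Guile string,
its UTF-8 encoding stands in for the contents of a string port, and the
sample text and variable names are made up.

```python
# Illustration only: Python str stands in for a Guile string, and the
# UTF-8 encoded bytes stand in for what a string port holds.
text = "ä♪cdefg"             # hypothetical sample input
data = text.encode("utf-8")  # "ä" takes 2 bytes, "♪" takes 3 bytes

char_index = text.index("c")    # position counted in characters
byte_offset = data.index(b"c")  # position counted in UTF-8 bytes

print(char_index, byte_offset)  # 2 5

# Using the byte offset as a string index lands on the wrong character.
wrong = text[byte_offset]
print(repr(text[char_index]), repr(wrong))  # 'c' 'f'
```

The two positions agree only as long as everything before them is
plain ASCII.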

If you want to get a UTF-8 string into Guile, it gets decoded into
UCS-4, only to be re-encoded into UTF-8 when moved through a string port
(as when using the Scheme reader on it).  Each character is then decoded
into UCS-4 again and re-encoded into UTF-8 when getting it back into
C++.
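
The chain of conversions can be counted in a short Python sketch.  This
is only a model of the round trip described above, not of Guile's
actual code paths: the helper names to_wide and to_utf8 are
hypothetical, and Python's "utf-32-le" codec stands in for the wide
32-bit string representation.

```python
# Illustration only: model the transcoding steps for one string that
# starts and ends as UTF-8 on the C++ side.
source = "c'4 ♪".encode("utf-8")  # hypothetical UTF-8 input
steps = []

def to_wide(utf8_bytes):
    """Decode UTF-8 and store it with 32 bits per character."""
    steps.append("UTF-8 -> 32-bit")
    return utf8_bytes.decode("utf-8").encode("utf-32-le")

def to_utf8(wide_bytes):
    """Re-encode a 32-bit-per-character string as UTF-8."""
    steps.append("32-bit -> UTF-8")
    return wide_bytes.decode("utf-32-le").encode("utf-8")

as_string = to_wide(source)    # entering Guile as a string
in_port = to_utf8(as_string)   # moved through a string port
read_back = to_wide(in_port)   # the reader decodes it again
result = to_utf8(read_back)    # handed back to C++

print(len(steps), result == source)  # 4 True
```

For valid input the result is unchanged, so all four conversions are
pure overhead.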

Guile-2.0 cannot work efficiently with string ports internally since it
constantly needs to recode stuff.  Its UTF-8 encoding/decoding (unlike
that of Emacs) cannot represent anything not in proper UTF-8: it either
produces stuff that does not encode into the original, or errors out
without remedy or useful offsets.  As a consequence, pinpointing the
problem in the original string or byte sequence is unreliable.
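
The two failure modes, and what an information-preserving decoder looks
like by contrast, can be shown with Python's codecs.  This is a sketch
of the general problem only: it uses neither Guile's nor Emacs's
machinery, the input bytes are made up, and Python's "surrogateescape"
error handler is merely a stand-in for a decoder that keeps invalid
bytes recoverable.

```python
# Illustration only: Python's codecs, not Guile's or Emacs's.
raw = b"c'4 \xff d'4"  # hypothetical input with one invalid byte

# Lossy decoding: the invalid byte becomes U+FFFD, so re-encoding
# does not reproduce the original bytes.
lossy = raw.decode("utf-8", errors="replace")
lossy_roundtrip = lossy.encode("utf-8") == raw

# Strict decoding: the conversion fails outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Preserving decoding: the invalid byte is kept as a lone surrogate
# and restored when encoding with the same error handler.
kept = raw.decode("utf-8", errors="surrogateescape")
kept_roundtrip = kept.encode("utf-8", errors="surrogateescape") == raw

print(lossy_roundtrip, strict_ok, kept_roundtrip)  # False False True
```

Only the last variant lets the caller get back to the exact bytes that
were read.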

The UTF-8 libraries Guile employs are not internal to Guile (though
partly distributed as part of Guile rather than an external dependency).
Very little active work on them has been done in recent years.

The Guile developers will be in total denial that anything is amiss with
the current situation and that there is anything wrong with the
inability of Guile to read and write UTF-8 strings without involving a
non-information-preserving conversion to UCS-4 or Latin-1 and back and
having its string ports work in an encoding that its strings cannot

LilyPond uses Guile as a very tightly integrated extension language so
it constantly passes strings into Guile and back and reads from string
ports.

Actual byte streams seem like they could help keep some of this
insanity in check, in particular if you can let the Scheme reader treat
them as if they were in UTF-8.

Now in Guile-1.8, we did a lot of the UTF-8 work seamlessly and
manually.  There are a few rough corners with that in the context of
Scheme identifiers and strings.

Doing stuff "the Guile way" instead will cause a lot of headaches:
Guile's representations are not even compatible within Guile itself,
and any attempt at getting strings into and out of Guile requires a
conversion since Guile's internal encodings are not exposed to its API.

David Kastrup
