Re: GUILE 2/3 and string encoding cost

From: David Kastrup
Subject: Re: GUILE 2/3 and string encoding cost
Date: Wed, 22 Jan 2020 12:01:53 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

Han-Wen Nienhuys <address@hidden> writes:

> I looked a bit through the GUILE source code to see what is going on.
> I believe our current hypothesis (LilyPond's slowdown is caused by
> expensive unicode transcoding into 32-bit strings) is incorrect.
> If you look at the source code, you can see that the UTF-8 -> SCM
> conversion checks whether there are any code points over 255. If
> there aren't, it uses Latin-1 encoding ("narrow == 1") to store the
> string as a plain byte array. This code walks the string twice, but that
> is very cheap due to CPU cache locality, so it should be
> essentially equivalent to whatever GUILE 1.8 was doing.

GUILE 1.8 did not walk the string even once.
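
For illustration, the two-pass scheme described in the quoted paragraph can
be sketched in C roughly as follows. This is not GUILE's code: the names
decoded_string, decode_one and decode_utf8 are hypothetical, the input is
assumed to be well-formed UTF-8, and error handling is minimal. The first
pass counts code points and checks whether any exceeds 255; the second
pass writes either one byte or four bytes per character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type: narrow (1 byte/char) or wide (4 bytes/char). */
typedef struct {
    int narrow;              /* 1 if every code point is <= 255 */
    size_t len;              /* number of code points */
    union {
        uint8_t *latin1;     /* valid when narrow == 1 */
        uint32_t *utf32;     /* valid when narrow == 0 */
    } chars;
} decoded_string;

/* Decode one UTF-8 sequence at s (assumed well-formed).
   Stores the code point in *cp and returns the bytes consumed. */
static size_t decode_one(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12)
            | ((uint32_t)(s[1] & 0x3F) << 6)
            | (uint32_t)(s[2] & 0x3F);
        return 3;
    }
    *cp = ((uint32_t)(s[0] & 0x07) << 18)
        | ((uint32_t)(s[1] & 0x3F) << 12)
        | ((uint32_t)(s[2] & 0x3F) << 6)
        | (uint32_t)(s[3] & 0x3F);
    return 4;
}

/* Convert n bytes of UTF-8 into a narrow or wide buffer.
   Returns 0 on success, -1 if allocation fails. */
int decode_utf8(const char *src, size_t n, decoded_string *out)
{
    assert(src != NULL && out != NULL);
    const unsigned char *s = (const unsigned char *)src;
    size_t len = 0, k = 0;
    int narrow = 1;
    uint32_t cp;

    /* Pass 1: count code points and look for any above 255. */
    for (size_t i = 0; i < n; len++) {
        i += decode_one(s + i, &cp);
        if (cp > 0xFF)
            narrow = 0;
    }

    /* Pass 2: decode again into a buffer of the chosen width. */
    if (narrow) {
        uint8_t *buf = malloc(len ? len : 1);
        if (buf == NULL)
            return -1;
        for (size_t i = 0; i < n; )  {
            i += decode_one(s + i, &cp);
            buf[k++] = (uint8_t)cp;
        }
        out->chars.latin1 = buf;
    } else {
        uint32_t *buf = malloc((len ? len : 1) * sizeof *buf);
        if (buf == NULL)
            return -1;
        for (size_t i = 0; i < n; ) {
            i += decode_one(s + i, &cp);
            buf[k++] = cp;
        }
        out->chars.utf32 = buf;
    }
    out->narrow = narrow;
    out->len = len;
    return 0;
}
```

Even in the narrow case, a scheme like this reads every input byte twice
and allocates a fresh buffer, which is the per-string work being compared
against GUILE 1.8 above.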

> LilyPond internally doesn't use any Unicode strings, as all our
> identifiers are pure ASCII, as are internal strings (e.g. font
> glyph names). This means that files that do not use Unicode characters
> at all should have the same overhead for strings as GUILE 1.8.

We already use the latin1 calls for LilyPond internals.

> Even so, if the input file does use UTF-8, there should be little
> overhead, because the number of texts that we process is always
> small. LilyPond is not a text processor.
> So, what hard data do we have on GUILE 2/3 slowness, and what does
> that data say?

That data says "humongous slowdown".  As far as I know, there is little
more than speculation about what causes it.

David Kastrup
