Re: GUILE 2/3 and string encoding cost
From: David Kastrup
Subject: Re: GUILE 2/3 and string encoding cost
Date: Wed, 22 Jan 2020 12:01:53 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
Han-Wen Nienhuys <address@hidden> writes:
> I looked a bit through the GUILE source code to see what is going on.
>
> I believe our current hypothesis (LilyPond's slowdown is caused by
> expensive Unicode transcoding into 32-bit strings) is incorrect.
>
> If you look into the source code, you can see that the UTF-8 -> SCM
> conversion checks whether there are any code points over 255:
>
>
> https://git.savannah.nongnu.org/cgit/guile.git//tree/libguile/strings.c/?id=1b8e9ca0e37fab366435436995248abdfc780a10#n1620
>
> If there aren't, it uses Latin1 encoding ("narrow == 1") to encode the
> string as a normal byte array. This code walks the string twice, but that
> is very cheap due to CPU cache locality, so it should be
> essentially equivalent to whatever GUILE 1.8 was doing.
GUILE 1.8 did not walk the string even once.
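
For readers following along, the two-pass scheme described in the quoted paragraph can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from libguile/strings.c: the names sketch_string, decode_utf8 and from_utf8 are hypothetical, and the input is assumed to be valid, complete UTF-8. The first pass counts characters and checks whether every code point fits in one byte; the second pass fills either a Latin-1 byte array or a 32-bit array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type; not Guile's actual string representation. */
typedef struct {
    int narrow;   /* 1: one byte per character (Latin-1), 0: 32-bit code points */
    size_t len;   /* number of characters, not bytes */
    void *data;   /* uint8_t[len] if narrow, uint32_t[len] otherwise */
} sketch_string;

/* Decode one UTF-8 sequence at s into *cp and return the bytes consumed.
   Assumption: the input is valid UTF-8 and not truncated. */
static size_t decode_utf8(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12)
            | ((uint32_t)(s[1] & 0x3F) << 6)
            | (uint32_t)(s[2] & 0x3F);
        return 3;
    }
    *cp = ((uint32_t)(s[0] & 0x07) << 18)
        | ((uint32_t)(s[1] & 0x3F) << 12)
        | ((uint32_t)(s[2] & 0x3F) << 6)
        | (uint32_t)(s[3] & 0x3F);
    return 4;
}

/* Convert nbytes of UTF-8 into a narrow or wide string in two passes. */
static sketch_string from_utf8(const unsigned char *s, size_t nbytes)
{
    sketch_string out = { 1, 0, NULL };
    uint32_t cp;
    size_t i, k;
    size_t width;

    /* Pass 1: count characters and look for any code point above 255. */
    for (i = 0; i < nbytes; ) {
        i += decode_utf8(s + i, &cp);
        if (cp > 0xFF)
            out.narrow = 0;
        out.len++;
    }

    width = out.narrow ? sizeof(uint8_t) : sizeof(uint32_t);
    out.data = malloc((out.len ? out.len : 1) * width);
    assert(out.data != NULL);

    /* Pass 2: decode again and store in the chosen width. */
    for (i = 0, k = 0; i < nbytes; k++) {
        i += decode_utf8(s + i, &cp);
        if (out.narrow)
            ((uint8_t *)out.data)[k] = (uint8_t)cp;
        else
            ((uint32_t *)out.data)[k] = cp;
    }
    return out;
}
```

In this sketch even a pure-ASCII input is traversed twice and copied, which is the cost being contrasted with GUILE 1.8 in the reply above.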
> LilyPond internally doesn't use any Unicode strings, as all our
> identifiers are pure ASCII, as well as internal strings (e.g. font
> glyph names). This means that files that do not use Unicode characters
> at all should have the same overhead for strings as GUILE 1.8.
We already use the latin1 calls for LilyPond internals.
> Even so, if the input file does use UTF-8, there should be little
> overhead, because the number of texts that we process is always
> small. LilyPond is not a text processor.
>
> So, what hard data do we have on GUILE 2/3 slowness, and what does
> that data say?
That data says "humongous slowdown". As far as I know, there is not
much more than speculation about what causes it.
--
David Kastrup