GUILE 2/3 and string encoding cost

lilypond-devel



From:	Han-Wen Nienhuys
Subject:	GUILE 2/3 and string encoding cost
Date:	Wed, 22 Jan 2020 10:00:03 +0100

I looked a bit through the GUILE source code to see what is going on.

I believe our current hypothesis (that LilyPond's slowdown is caused by
expensive Unicode transcoding into 32-bit strings) is incorrect.

If you look at the source code, you can see that the UTF-8 -> SCM
conversion checks whether any code point is above 255:

https://git.savannah.nongnu.org/cgit/guile.git//tree/libguile/strings.c/?id=1b8e9ca0e37fab366435436995248abdfc780a10#n1620

If there are none, it uses Latin-1 encoding ("narrow == 1") and stores the
string as a plain byte array. This code walks the string twice, but that
is very cheap thanks to CPU cache locality, so it should be essentially
equivalent to whatever GUILE 1.8 was doing.
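To make the two-pass idea concrete, here is a minimal C sketch of a UTF-8 decoder that picks a narrow (Latin-1, one byte per character) or wide (UCS-4, four bytes per character) representation. It is not GUILE's code: the type `sketch_string` and the functions `decode_one` and `sketch_from_utf8` are hypothetical names, and the decoder assumes well-formed UTF-8 input, skipping the validation that real code must do.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical string type: either one byte per character (Latin-1,
   "narrow") or one 32-bit code point per character (UCS-4, "wide"). */
typedef struct {
  int narrow;       /* 1 if every code point is <= 255 */
  size_t len;       /* number of characters, not bytes */
  uint8_t *latin1;  /* valid when narrow == 1 */
  uint32_t *ucs4;   /* valid when narrow == 0 */
} sketch_string;

/* Decode the UTF-8 sequence at s[*i] and advance *i past it.
   Assumes well-formed UTF-8; real code must validate. */
static uint32_t decode_one(const unsigned char *s, size_t *i) {
  unsigned char c = s[*i];
  uint32_t cp;
  if (c < 0x80) {
    cp = c;
    *i += 1;
  } else if (c < 0xE0) {
    cp = ((uint32_t)(c & 0x1F) << 6) | (s[*i + 1] & 0x3F);
    *i += 2;
  } else if (c < 0xF0) {
    cp = ((uint32_t)(c & 0x0F) << 12) | ((uint32_t)(s[*i + 1] & 0x3F) << 6)
         | (s[*i + 2] & 0x3F);
    *i += 3;
  } else {
    cp = ((uint32_t)(c & 0x07) << 18) | ((uint32_t)(s[*i + 1] & 0x3F) << 12)
         | ((uint32_t)(s[*i + 2] & 0x3F) << 6) | (s[*i + 3] & 0x3F);
    *i += 4;
  }
  return cp;
}

/* Two passes over the input: the first counts characters and checks for
   code points above 255, the second fills the narrow or the wide buffer. */
int sketch_from_utf8(const char *utf8, size_t nbytes, sketch_string *out) {
  const unsigned char *s = (const unsigned char *)utf8;
  size_t i = 0, nchars = 0;
  int narrow = 1;

  while (i < nbytes) {                    /* pass 1: scan */
    if (decode_one(s, &i) > 255)
      narrow = 0;
    nchars++;
  }

  out->narrow = narrow;
  out->len = nchars;
  out->latin1 = NULL;
  out->ucs4 = NULL;
  if (narrow)
    out->latin1 = malloc(nchars > 0 ? nchars : 1);
  else
    out->ucs4 = malloc(nchars * sizeof(uint32_t));
  if (out->latin1 == NULL && out->ucs4 == NULL)
    return -1;                            /* allocation failed */

  i = 0;
  for (size_t k = 0; k < nchars; k++) {   /* pass 2: store */
    uint32_t cp = decode_one(s, &i);
    if (narrow)
      out->latin1[k] = (uint8_t)cp;
    else
      out->ucs4[k] = cp;
  }
  return 0;
}
```

For pure ASCII input such as a note name, both passes touch the same few bytes and the narrow buffer ends up byte-for-byte identical to the input; only a character like U+266F (the sharp sign) forces the wide, 32-bit layout.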

The conversion in the other direction (SCM -> UTF-8) is here:
https://git.savannah.nongnu.org/cgit/guile.git//tree/libguile/strings.c/?id=1b8e9ca0e37fab366435436995248abdfc780a10#n2065

As you can see, if the string is narrow (Latin-1/ASCII), it takes the cheap
path as well.
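A rough C sketch of why the narrow case is cheap: encoding a Latin-1 buffer as UTF-8 needs at most two output bytes per character, and when every byte is below 0x80 (plain ASCII) the input already is valid UTF-8 and can simply be copied. This is one plausible shape for such a path, not GUILE's actual implementation; `sketch_latin1_to_utf8` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Encode a narrow (Latin-1) buffer as NUL-terminated UTF-8.
   Returns a malloc'd string, or NULL on allocation failure;
   *out_len receives the byte count excluding the NUL. */
char *sketch_latin1_to_utf8(const uint8_t *latin1, size_t len, size_t *out_len) {
  size_t extra = 0;
  for (size_t k = 0; k < len; k++)   /* bytes >= 0x80 need 2 UTF-8 bytes */
    if (latin1[k] >= 0x80)
      extra++;

  char *buf = malloc(len + extra + 1);
  if (buf == NULL)
    return NULL;

  if (extra == 0) {
    /* Pure ASCII: the Latin-1 bytes are already valid UTF-8. */
    memcpy(buf, latin1, len);
  } else {
    size_t j = 0;
    for (size_t k = 0; k < len; k++) {
      uint8_t c = latin1[k];
      if (c < 0x80) {
        buf[j++] = (char)c;
      } else {
        buf[j++] = (char)(0xC0 | (c >> 6));    /* lead byte */
        buf[j++] = (char)(0x80 | (c & 0x3F));  /* continuation byte */
      }
    }
  }
  buf[len + extra] = '\0';
  if (out_len != NULL)
    *out_len = len + extra;
  return buf;
}
```

For the ASCII identifiers and glyph names LilyPond uses internally, this reduces to a scan plus a `memcpy`.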

LilyPond internally doesn't use any Unicode strings: all our identifiers
are pure ASCII, as are internal strings (e.g. font glyph names). This
means that files that contain no non-ASCII characters at all should have
the same string overhead as under GUILE 1.8.

Even if the input file does use UTF-8, there should be little overhead,
because the amount of text we process is always small. LilyPond is
not a text processor.

So, what hard data do we have on GUILE 2/3 slowness, and what does that
data say?

--
Han-Wen Nienhuys - address@hidden - http://www.xs4all.nl/~hanwen
