Re: GUILE 2/3 and string encoding cost
From: Han-Wen Nienhuys
Subject: Re: GUILE 2/3 and string encoding cost
Date: Wed, 22 Jan 2020 21:07:09 +0100

On Wed, Jan 22, 2020 at 12:01 PM David Kastrup <address@hidden> wrote:
> Han-Wen Nienhuys <address@hidden> writes:
>
> > I looked a bit through the GUILE source code to see what is going on.
> >
> > I believe our current hypothesis (LilyPond's slowdown is caused by
> > expensive unicode transcoding into 32-bit strings) is incorrect.
> >
> > If you look into the source code, you can see that the UTF-8 -> SCM
> > conversion checks if there are any code points over 255:
> >
> > https://git.savannah.nongnu.org/cgit/guile.git//tree/libguile/strings.c/?id=1b8e9ca0e37fab366435436995248abdfc780a10#n1620
> >
> > If there aren't, it uses Latin1 encoding ("narrow == 1") to encode the
> > string as a normal byte array. This code walks the string twice, but that
> > is very cheap due to CPU cache locality, so it should be
> > essentially equivalent to whatever GUILE 1.8 was doing.
>
> GUILE 1.8 did not walk the string even once
>
GUILE 1.8 walks it once when you do memcpy.
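
To make the two-pass idea from the quoted text concrete, here is a minimal
sketch in C. It is not GUILE's actual code: decoded_string, decode_one and
decode_utf8 are hypothetical names, and the decoder assumes well-formed
UTF-8 with no validation. The first pass counts code points and notes
whether any exceeds 255; the second pass fills either a one-byte-per-character
(Latin-1) buffer or a four-byte-per-character buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type: at most one of narrow/wide is non-NULL. */
typedef struct {
    size_t len;       /* number of code points */
    uint8_t *narrow;  /* Latin-1 storage, one byte per code point */
    uint32_t *wide;   /* UTF-32 storage, four bytes per code point */
} decoded_string;

/* Decode one code point; returns the number of bytes consumed.
   Assumption: the input is well-formed UTF-8 (no validation here). */
static size_t decode_one(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12)
            | ((uint32_t)(s[1] & 0x3F) << 6)
            | (uint32_t)(s[2] & 0x3F);
        return 3;
    }
    *cp = ((uint32_t)(s[0] & 0x07) << 18)
        | ((uint32_t)(s[1] & 0x3F) << 12)
        | ((uint32_t)(s[2] & 0x3F) << 6)
        | (uint32_t)(s[3] & 0x3F);
    return 4;
}

/* Two-pass conversion. On allocation failure both pointers are NULL
   and len is 0. The caller frees whichever buffer is non-NULL. */
decoded_string decode_utf8(const unsigned char *s, size_t nbytes)
{
    decoded_string out = { 0, NULL, NULL };
    int is_narrow = 1;
    uint32_t cp;
    size_t i = 0;

    /* Pass 1: count code points, check whether all fit in one byte. */
    while (i < nbytes) {
        i += decode_one(s + i, &cp);
        if (cp > 0xFF)
            is_narrow = 0;
        out.len++;
    }

    if (is_narrow)
        out.narrow = malloc(out.len ? out.len : 1);
    else
        out.wide = malloc(out.len * sizeof *out.wide);
    if (!out.narrow && !out.wide) {
        out.len = 0;
        return out;
    }

    /* Pass 2: decode again, storing into the chosen representation. */
    i = 0;
    for (size_t k = 0; k < out.len; k++) {
        i += decode_one(s + i, &cp);
        if (is_narrow)
            out.narrow[k] = (uint8_t)cp;
        else
            out.wide[k] = cp;
    }
    return out;
}
```

For pure ASCII or Latin-1 range input this ends up with a plain byte array,
so the extra cost over a memcpy is one additional linear scan of data that
is most likely already in cache.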
> > Even so, if the input file does use UTF-8, there should be little
> > overhead, because the number of texts that we process is always
> > small. LilyPond is not a text processor.
> >
> > So, what hard data do we have on GUILE 2/3 slowness, and what does
> > that data say?
>
> That data says "humongous slowdown". There is not much more than
> speculation about what causes it, as far as I know.
>
>
Do we have a standardized test file for benchmarking performance?
--
Han-Wen Nienhuys - address@hidden - http://www.xs4all.nl/~hanwen