

Re: Case mapping of sharp s

From: Stephen J. Turnbull
Subject: Re: Case mapping of sharp s
Date: Thu, 19 Nov 2009 10:57:17 +0900

Eli Zaretskii writes:

 > Yes, I know: you explained it earlier in this thread.  What I thought
 > would help is that fold("SS") and fold("ß") have the same number of
 > bytes, which I think was the problem that prevented the use of BM.

No, the problem is that BM (Boyer-Moore) thinks in terms of code
units, and only works "out of the box" if you are comparing a single
code unit (perhaps transformed by "folding").  In fixed-width
representations you could use a 16-bit unit, but UTF-8 is not fixed
width, so you can't guarantee the appropriate alignment.  Thus the
code unit is 1 octet, and the problem is that the width of fold("SS")
is not 1 code unit.
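
To illustrate, here is a minimal sketch in Python (not Emacs's actual search code) of Boyer-Moore-Horspool, a simplified BM variant, running over UTF-8 octets with a 256-entry skip table. The per-octet fold is a hypothetical helper that only maps ASCII A-Z to a-z.  "ß" and "ss" are both two octets long, but the octets 0xC3 0x9F cannot be folded one at a time into "s" "s" (0xC3 is also the lead octet of "é" and many other characters), so the octet-wise search misses the match.

```python
def octet_fold(b: int) -> int:
    """Hypothetical per-octet fold: ASCII A-Z -> a-z, every other octet unchanged."""
    return b + 32 if 0x41 <= b <= 0x5A else b


def horspool_fold(text: bytes, pattern: bytes) -> int:
    """Return the index of the first case-folded match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    pat = bytes(octet_fold(b) for b in pattern)
    # Bad-character table: one slot per possible octet, i.e. 256 entries.
    skip = [m] * 256
    for i in range(m - 1):
        skip[pat[i]] = m - 1 - i
    pos = 0
    while pos <= n - m:
        # Compare right to left, one folded octet (code unit) at a time.
        j = m - 1
        while j >= 0 and octet_fold(text[pos + j]) == pat[j]:
            j -= 1
        if j < 0:
            return pos
        # Shift by the table entry for the folded last octet of the window.
        pos += skip[octet_fold(text[pos + m - 1])]
    return -1


print(horspool_fold("STRASSE".encode("utf-8"), b"ss"))  # found at octet 4
print(horspool_fold("straße".encode("utf-8"), b"ss"))   # -1: 0xC3 0x9F never folds to "ss"
```

The lengths agree (len("ß".encode("utf-8")) == 2 == len(b"ss")), so equal byte counts alone do not rescue a code-unit-at-a-time comparison.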

With 16-bit units you also bloat the skip table from 256 bytes to
65536 bytes, which is perhaps not large compared to modern memories,
but is still an alarming 256-fold increase.  A sparse representation
might work OK, but it would clearly have a dramatic negative impact on
performance.
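
A rough sketch of the size arithmetic, under the assumptions of 1-byte table entries (so patterns shorter than 256 units) and a BMP-only pattern so each character is one UTF-16 code unit. The names dense_skip and sparse_skip are hypothetical. The dense table grows from 2**8 to 2**16 entries; the sparse variant stores only the units that occur in the pattern, at the cost of a hash lookup instead of a plain index on every shift.

```python
import array


def dense_skip(units: list[int], bits: int) -> array.array:
    """Horspool bad-character table with one 1-byte entry per possible code unit."""
    m = len(units)
    assert 0 < m < 256, "1-byte entries only hold shifts up to 255"
    table = array.array("B", [m]) * (1 << bits)  # default shift is m
    for i in range(m - 1):
        table[units[i]] = m - 1 - i  # rightmost occurrence wins
    return table


def sparse_skip(units: list[int]) -> dict[int, int]:
    """Same table, but storing only code units that occur in the pattern.

    Lookups use .get(unit, len(units)) for units not in the pattern.
    """
    m = len(units)
    return {units[i]: m - 1 - i for i in range(m - 1)}


pattern = "straße"
utf8_units = list(pattern.encode("utf-8"))   # 8-bit code units
utf16_units = [ord(c) for c in pattern]      # BMP-only: one 16-bit unit per char

print(len(dense_skip(utf8_units, 8).tobytes()))    # 256 bytes
print(len(dense_skip(utf16_units, 16).tobytes()))  # 65536 bytes
print(len(sparse_skip(utf16_units)))               # 5 entries
```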

