Re: [bug-libunistring] UAX #29 changes
From: Ben Pfaff
Subject: Re: [bug-libunistring] UAX #29 changes
Date: Sun, 29 Oct 2017 10:18:32 -0700
User-agent: Mutt/1.5.23 (2014-03-12)

On Thu, Oct 26, 2017 at 04:47:35PM +0200, Daiki Ueno wrote:
> Daiki Ueno <address@hidden> writes:
>
> > I have been trying to update libunistring to Unicode 9.0.0. Initially I
> > planned it for the end of this month, but now I'm almost giving up,
> > because of the recent additions to the UAX #29 algorithms:
> >
> > - The 3 rules added to the Grapheme Cluster Boundary Rules, namely
> > (GB10, GB12, GB13), involve 3 consecutive characters, while the current
> > API uc_is_grapheme_break() only takes 2 characters
> >
> > - Similar rules are also added to the Word Boundary Rules. Though
> > it wouldn't be a problem as uniwbrk.h doesn't expose such an API, the
> > implementation of WB15 and WB16 could be complicated because it
> > requires lookahead to the next character
>
> As I had some time this week, I resumed this work. Thanks to the help
> of my colleagues, the above new rules involving 3 or more characters are
> now implemented without breaking the ABI.
>
> For the Grapheme Cluster Boundary rules, u*_grapheme_breaks have been
> rewritten to be more generic, taking into account the entire
> sequence. The other API functions are still kept, but have limitations
> due to the number of arguments.
>
> Bruno, Ben, could you take a look at the attached patch, when you have
> time?

I'm impressed. I have not looked carefully at the whole patch. That is
partly because of my time constraints, but it is also partly because I
get patch rejects when I apply the patch to the tip of master for
gnulib. To what commit should I apply the patch?

Thanks,
Ben.
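
For readers following the thread, here is a minimal sketch of why a
two-argument predicate cannot express the regional-indicator rules
(GB12, GB13) mentioned above: whether two adjacent regional indicators
may be split depends on how many regional indicators precede them.
This is not the libunistring implementation. The names
is_regional_indicator and ri_breaks are hypothetical, and every pair
that is not covered by the regional-indicator rule is simply treated
as a break.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Regional Indicator symbols occupy U+1F1E6..U+1F1FF. */
static bool is_regional_indicator(uint32_t c)
{
    return c >= 0x1F1E6 && c <= 0x1F1FF;
}

/* Hypothetical helper (not a libunistring function): sets breaks[i]
   to 1 if there is a boundary before s[i], else 0.  Only the
   regional-indicator pairing rule is modelled; all other positions
   are reported as breaks.  */
static void ri_breaks(const uint32_t *s, size_t n, char *breaks)
{
    /* Number of consecutive regional indicators ending at s[i-1]. */
    size_t ri_run = 0;

    for (size_t i = 0; i < n; i++) {
        if (i > 0 && is_regional_indicator(s[i]) && ri_run % 2 == 1)
            breaks[i] = 0;  /* second half of a pair: do not break */
        else
            breaks[i] = 1;  /* start of text, or any other case */

        ri_run = is_regional_indicator(s[i]) ? ri_run + 1 : 0;
    }
}
```

Given the four code points U+1F1EB U+1F1F7 U+1F1E9 U+1F1EA (two flag
pairs), the sketch yields breaks 1, 0, 1, 0. The pairs (s[0], s[1])
and (s[1], s[2]) are both "regional indicator followed by regional
indicator", yet only the second is a boundary, so a function that sees
just two characters cannot give the right answer for both.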