Re: [bug-libunistring] _wordbreaks/_grapheme_breaks and break count?


From: Ben Pfaff
Subject: Re: [bug-libunistring] _wordbreaks/_grapheme_breaks and break count?
Date: Tue, 2 Sep 2014 20:43:08 -0700
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Sep 02, 2014 at 03:18:14PM -0400, Andrew Boling wrote:
> >
> > I wrote the grapheme break functions.  It didn't occur to me that it would
> > be useful to return anything, because usually the breakpoints are scanned to
> > find good places to break, and usually those are pretty common.
> >
> 
> It's probably not a common use case (otherwise someone would have said the
> same thing about the _wordbreaks series already), but I'll elaborate a
> little bit to help demonstrate an applicable scenario.
> 
> The strings my functions operate on are arrays in memory with associated
> link counts. The original code used random access to perform string
> manipulation, but that's not a valid approach when n_bytes != n_codepoints
> (non-ASCII). The new approach I'm using is to pre-generate the grapheme
> breaks when the string is instantiated (u8_wordbreaks). This way the break
> positions are only calculated once across the life of that string. Knowing
> the grapheme count is beneficial here as the operation can be immediately
> rejected without an additional scan.

This is a use case that makes sense to me.  It would not cost very much
to return the number of grapheme or word breaks.  That argues toward
adding it.

On the other hand: Are you using grapheme breaks or word breaks?  The
u*_grapheme_breaks() functions, in particular, are very simple, and use
only public libunistring interfaces, so it would be very easy for the
library client to implement its own specialized version that also
returns the number of breaks found.
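
Short of a specialized implementation, the count can also be recovered
from the break array that the library already fills in.  The following
is only a minimal sketch: count_breaks() is a hypothetical helper, not a
libunistring function, and it assumes the usual convention that a
nonzero p[i] marks a boundary before s[i].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of libunistring): return how many
   entries of the break array P[0..N-1] are marked as boundaries.
   P is assumed to have been filled by a u*_grapheme_breaks() or
   u*_wordbreaks() style function, where nonzero means "break here".  */
size_t
count_breaks (const char *p, size_t n)
{
  size_t count = 0;
  size_t i;

  /* An empty array may be passed as NULL.  */
  assert (p != NULL || n == 0);

  for (i = 0; i < n; i++)
    if (p[i] != 0)
      count++;

  return count;
}
```

A client would call, for example, u8_grapheme_breaks (s, n, p) once when
the string is instantiated, then count_breaks (p, n), and store the
result next to the string so that later operations can consult it
without rescanning.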


