bug-gnulib

Re: new modules for grapheme cluster breaking


From: Bruno Haible
Subject: Re: new modules for grapheme cluster breaking
Date: Sat, 1 Jan 2011 12:54:29 +0100
User-agent: KMail/1.9.9

Hi Ben,

> > "grapheme" or "grapheme cluster"? I'm a bit confused: The Unicode 3.0
> > book uses the term "grapheme" to denote the entity that users consider
> > to be a single character, but UAX #29 nowadays calls it "grapheme cluster".
> 
> I am being a little sloppy with terminology.  My take-away from
> the Unicode glossary definitions is that a "grapheme" is a
> user-perceived character, and a "grapheme cluster" is the
> sequence of code points that make up a grapheme.

Hmm. In <http://www.unicode.org/versions/Unicode6.0.0/ch02.pdf> they use
only the term "grapheme cluster". So, to me it appears that "grapheme" is
an older term that they wanted to get away from. Therefore your exclusive
use of "grapheme cluster" in unigbrk.h is perfect.

One tiny improvement to your patches: In C source code, use octal escapes
instead of hexadecimal escapes. The cc compiler on some platforms (IRIX 6.5
or HP-UX 10.20 or something like that) handles only octal escapes correctly.
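
To illustrate the point, here is a minimal sketch of how the two markers look when spelled with octal escapes. The macro names and the helper parse_break_marker are hypothetical and are not part of the patch; only the byte values and the strspn/strncmp pattern come from the test file. U+00F7 (÷) is the UTF-8 bytes 0xC3 0xB7, which is 303 267 in octal, and U+00D7 (×) is 0xC3 0x97, which is 303 227.

```c
#include <stdbool.h>
#include <string.h>

/* UTF-8 encodings of the two markers, written with octal escapes.
   These macro names are illustrative assumptions, not gnulib names.  */
#define DIVISION_SIGN       "\303\267"  /* U+00F7, bytes 0xC3 0xB7 */
#define MULTIPLICATION_SIGN "\303\227"  /* U+00D7, bytes 0xC3 0x97 */

/* Hypothetical helper modelled on the loop in the test:
   skip whitespace at *PP, then look for one of the two markers.
   Returns 1 for a break marker, 0 for a no-break marker, and -1 if
   neither is present.  On success *PP is advanced past the marker;
   on failure *PP is left unchanged.  */
static int
parse_break_marker (const char **pp)
{
  const char *p = *pp;

  p += strspn (p, " \t\r\n");
  if (!strncmp (p, DIVISION_SIGN, 2))
    {
      *pp = p + 2;
      return 1;
    }
  if (!strncmp (p, MULTIPLICATION_SIGN, 2))
    {
      *pp = p + 2;
      return 0;
    }
  return -1;
}

/* On a compiler that handles both notations, the octal and the
   hexadecimal spellings denote the same bytes, including the
   terminating NUL (hence the length 3).  */
static bool
octal_matches_hex (void)
{
  return memcmp ("\303\267", "\xc3\xb7", 3) == 0
         && memcmp ("\303\227", "\xc3\x97", 3) == 0;
}
```

Called on a line such as " \303\267 0020", parse_break_marker would skip the leading space, return 1, and leave the pointer just after the two marker bytes. A side benefit that is independent of the compiler issue: an octal escape takes at most three digits, whereas a hexadecimal escape consumes every following hex digit, so octal escapes are also safer when the next character of the string happens to be one of 0-9 or a-f.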


2011-01-01  Bruno Haible  <address@hidden>

        Avoid use of hexadecimal escapes.
        * tests/unigbrk/test-uc-is-grapheme-break.c (main): Use octal escapes
        instead of hexadecimal escapes.

--- tests/unigbrk/test-uc-is-grapheme-break.c.orig      Sat Jan  1 12:52:02 2011
+++ tests/unigbrk/test-uc-is-grapheme-break.c   Sat Jan  1 12:34:04 2011
@@ -1,5 +1,5 @@
 /* Grapheme cluster break function test.
-   Copyright (C) 2010 Free Software Foundation, Inc.
+   Copyright (C) 2010-2011 Free Software Foundation, Inc.
 
    This program is free software: you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published
@@ -97,12 +97,12 @@
           ucs4_t next;
 
           p += strspn (p, " \t\r\n");
-          if (!strncmp (p, "\xc3\xb7" /* ÷ */, 2))
+          if (!strncmp (p, "\303\267" /* ÷ */, 2))
             {
               should_break = true;
               p += 2;
             }
-          else if (!strncmp (p, "\xc3\x97" /* × */, 2))
+          else if (!strncmp (p, "\303\227" /* × */, 2))
             {
               should_break = false;
               p += 2;


