[bug-libunistring] Re: new modules for grapheme cluster breaking
From: Bruno Haible
Subject: [bug-libunistring] Re: new modules for grapheme cluster breaking
Date: Sat, 1 Jan 2011 12:54:29 +0100
User-agent: KMail/1.9.9
Hi Ben,
> > "grapheme" or "grapheme cluster"? I'm a bit confused: The Unicode 3.0
> > book uses the term "grapheme" to denote the entity that users consider
> > to be a single character, but UAX #29 nowadays calls it "grapheme cluster".
>
> I am being a little sloppy with terminology. My take-away from
> the Unicode glossary definitions is that a "grapheme" is a
> user-perceived character, and a "grapheme cluster" is the
> sequence of code points that make up a grapheme.
Hmm. In <http://www.unicode.org/versions/Unicode6.0.0/ch02.pdf> they use
only the term "grapheme cluster". So, to me it appears that "grapheme" is
an older term that they wanted to get away from. Therefore your exclusive
use of "grapheme cluster" in unigbrk.h is perfect.
One tiny improvement to your patches: In C source code, use octal escapes
instead of hexadecimal escapes. Some platform's cc compiler (IRIX 6.5 or
HP-UX 10.20 or something like that) handles only octal escapes correctly.
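To illustrate the point, here is a minimal sketch of how the two markers from
the test file can be matched using octal escapes only. The macro names and the
helper functions parse_break_marker and check_octal_escapes are hypothetical,
made up for this example; they are not part of libunistring or of the patch.

```c
#include <assert.h>
#include <string.h>

/* UTF-8 encodings written with octal escapes only.
   U+00F7 DIVISION SIGN:       bytes 0xC3 0xB7 = octal 303 267
   U+00D7 MULTIPLICATION SIGN: bytes 0xC3 0x97 = octal 303 227  */
#define DIVISION_SIGN_UTF8       "\303\267"
#define MULTIPLICATION_SIGN_UTF8 "\303\227"

/* Hypothetical helper: return 1 if P starts with the "break" marker,
   0 if it starts with the "no break" marker, and -1 otherwise.  */
static int
parse_break_marker (const char *p)
{
  if (!strncmp (p, DIVISION_SIGN_UTF8, 2))
    return 1;
  if (!strncmp (p, MULTIPLICATION_SIGN_UTF8, 2))
    return 0;
  return -1;
}

/* Hypothetical sanity check: the octal escapes denote the intended bytes.  */
static void
check_octal_escapes (void)
{
  assert ((unsigned char) DIVISION_SIGN_UTF8[0] == 0xC3);
  assert ((unsigned char) DIVISION_SIGN_UTF8[1] == 0xB7);
  assert ((unsigned char) MULTIPLICATION_SIGN_UTF8[0] == 0xC3);
  assert ((unsigned char) MULTIPLICATION_SIGN_UTF8[1] == 0x97);
}
```

A side benefit, independent of the compiler issue: an octal escape takes at
most three digits, whereas a hexadecimal escape extends over all following
hex digits, so "\267a" is unambiguous while "\xb7a" is not.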
2011-01-01  Bruno Haible  <address@hidden>

        Avoid use of hexadecimal escapes.
        * tests/unigbrk/test-uc-is-grapheme-break.c (main): Use octal escapes
        instead of hexadecimal escapes.

--- tests/unigbrk/test-uc-is-grapheme-break.c.orig Sat Jan 1 12:52:02 2011
+++ tests/unigbrk/test-uc-is-grapheme-break.c Sat Jan 1 12:34:04 2011
@@ -1,5 +1,5 @@
/* Grapheme cluster break function test.
- Copyright (C) 2010 Free Software Foundation, Inc.
+ Copyright (C) 2010-2011 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published
@@ -97,12 +97,12 @@
ucs4_t next;
p += strspn (p, " \t\r\n");
- if (!strncmp (p, "\xc3\xb7" /* ÷ */, 2))
+ if (!strncmp (p, "\303\267" /* ÷ */, 2))
{
should_break = true;
p += 2;
}
- else if (!strncmp (p, "\xc3\x97" /* × */, 2))
+ else if (!strncmp (p, "\303\227" /* × */, 2))
{
should_break = false;
p += 2;