[groff] 15/23: Support CJK fonts encoded in UTF-16 (1/6).
From: G. Branden Robinson
Subject: [groff] 15/23: Support CJK fonts encoded in UTF-16 (1/6).
Date: Thu, 21 Nov 2024 14:47:49 -0500 (EST)
gbranden pushed a commit to branch master
in repository groff.
commit 78d1ef7c37edeb8cc39ae15bec3020eb31472bd8
Author: TANAKA Takuji <ttk@t-lab.opal.ne.jp>
AuthorDate: Fri Dec 29 13:56:37 2023 +0000
Support CJK fonts encoded in UTF-16 (1/6).
* src/include/unicode.h (to_utf8_string): Declare new function.
* src/libs/libgroff/unicode.cpp (to_utf8_string): New function converts
input integer into UTF-8 sequence (or an HTML character entity in
hexadecimal if the integer is out of range).
---
ChangeLog | 9 +++++++++
src/include/unicode.h | 2 ++
src/libs/libgroff/unicode.cpp | 27 +++++++++++++++++++++++++++
3 files changed, 38 insertions(+)
diff --git a/ChangeLog b/ChangeLog
index 506ba31a6..7a7da1f62 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2024-11-20 TANAKA Takuji <ttk@t-lab.opal.ne.jp>
+
+ Support CJK fonts encoded in UTF-16 (1/6).
+
+ * src/include/unicode.h (to_utf8_string): Declare new function.
+ * src/libs/libgroff/unicode.cpp (to_utf8_string): New function
+ converts input integer into UTF-8 sequence (or an HTML character
+ entity in hexadecimal if the integer is out of range).
+
2024-11-21 G. Branden Robinson <g.branden.robinson@gmail.com>
* src/devices/grops/ps.cpp: Fix code style nits. Parenthesize
diff --git a/src/include/unicode.h b/src/include/unicode.h
index f07cbafe5..a7c915068 100644
--- a/src/include/unicode.h
+++ b/src/include/unicode.h
@@ -82,6 +82,8 @@ const size_t UNIBUFSZ = sizeof "u10FFFF"; // see glyphuni.cpp
// `unicode_to_glyph_name` might return.
const size_t GLYPHBUFSZ = sizeof "bracketrighttp"; // see uniglyph.cpp
+char *to_utf8_string (unsigned int);
+
// Local Variables:
// fill-column: 72
// mode: C++
diff --git a/src/libs/libgroff/unicode.cpp b/src/libs/libgroff/unicode.cpp
index 757a73399..dc7a1baed 100644
--- a/src/libs/libgroff/unicode.cpp
+++ b/src/libs/libgroff/unicode.cpp
@@ -98,6 +98,33 @@ const char *valid_unicode_code_sequence(const char *u, char *errbuf)
return u;
}
+// TODO: Does gnulib have a function that does this?
+char *to_utf8_string(unsigned int ch)
+{
+ static char buf[16];
+
+ if (ch < 0x80)
+ sprintf(buf, "%c", (ch & 0xff));
+ else if (ch < 0x800)
+ sprintf(buf, "%c%c",
+ 0xc0 + ((ch >> 6) & 0x1f),
+ 0x80 + ((ch ) & 0x3f));
+ else if ((ch < 0xD800) || ((ch > 0xDFFF) && (ch < 0x10000)))
+ sprintf(buf, "%c%c%c",
+ 0xe0 + ((ch >> 12) & 0x0f),
+ 0x80 + ((ch >> 6) & 0x3f),
+ 0x80 + ((ch ) & 0x3f));
+ else if ((ch > 0xFFFF) && (ch < 0x120000))
+ sprintf(buf, "%c%c%c%c",
+ 0xf0 + ((ch >> 18) & 0x07),
+ 0x80 + ((ch >> 12) & 0x3f),
+ 0x80 + ((ch >> 6) & 0x3f),
+ 0x80 + ((ch ) & 0x3f));
+ else
+ sprintf(buf, "&#x%X;", ch);
+ return buf;
+}
+
// Local Variables:
// fill-column: 72
// mode: C++
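
For readers who want to see the encoding rule from the commit message in isolation, here is a minimal standalone sketch of the same idea: turn one integer into a UTF-8 byte sequence, and fall back to a hexadecimal HTML character entity when the value is a surrogate or lies outside the Unicode range. The name `encode_utf8_sketch` and the `std::string` return type are assumptions made for illustration; they are not part of groff, whose function returns a pointer to a static buffer. The sketch also caps the four-byte branch at 0x110000, because U+10FFFF is the highest Unicode code point; that is a different upper bound from the one in the diff above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not groff's to_utf8_string): encode one code
// point as UTF-8, or as "&#xHEX;" if it is not a Unicode scalar value.
std::string encode_utf8_sketch(unsigned int ch)
{
  std::string out;
  if (ch < 0x80) {
    // 1 byte: 0xxxxxxx
    out += static_cast<char>(ch);
  }
  else if (ch < 0x800) {
    // 2 bytes: 110xxxxx 10xxxxxx
    out += static_cast<char>(0xC0 | (ch >> 6));
    out += static_cast<char>(0x80 | (ch & 0x3F));
  }
  else if (ch < 0xD800 || (ch > 0xDFFF && ch < 0x10000)) {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (surrogates excluded)
    out += static_cast<char>(0xE0 | (ch >> 12));
    out += static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (ch & 0x3F));
  }
  else if (ch >= 0x10000 && ch < 0x110000) {
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xF0 | (ch >> 18));
    out += static_cast<char>(0x80 | ((ch >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (ch & 0x3F));
  }
  else {
    // Surrogate or beyond U+10FFFF: emit a hexadecimal HTML entity.
    // "&#x" + at most 8 hex digits + ";" + NUL fits in 16 bytes.
    char buf[16];
    std::snprintf(buf, sizeof buf, "&#x%X;", ch);
    out = buf;
  }
  return out;
}
```

Returning a `std::string` avoids the shared static buffer, so two results can be held at the same time; whether that trade-off suits groff's callers is not something this sketch tries to decide.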