groff-commit

[groff] 15/23: Support CJK fonts encoded in UTF-16 (1/6).


From: G. Branden Robinson
Subject: [groff] 15/23: Support CJK fonts encoded in UTF-16 (1/6).
Date: Thu, 21 Nov 2024 14:47:49 -0500 (EST)

gbranden pushed a commit to branch master
in repository groff.

commit 78d1ef7c37edeb8cc39ae15bec3020eb31472bd8
Author: TANAKA Takuji <ttk@t-lab.opal.ne.jp>
AuthorDate: Fri Dec 29 13:56:37 2023 +0000

    Support CJK fonts encoded in UTF-16 (1/6).
    
    * src/include/unicode.h (to_utf8_string): Declare new function.
    
    * src/libs/libgroff/unicode.cpp (to_utf8_string): New function converts
      input integer into UTF-8 sequence (or an HTML character entity in
      hexadecimal if the integer is out of range).
---
 ChangeLog                     |  9 +++++++++
 src/include/unicode.h         |  2 ++
 src/libs/libgroff/unicode.cpp | 27 +++++++++++++++++++++++++++
 3 files changed, 38 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index 506ba31a6..7a7da1f62 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2024-11-20  TANAKA Takuji <ttk@t-lab.opal.ne.jp>
+
+       Support CJK fonts encoded in UTF-16 (1/6).
+
+       * src/include/unicode.h (to_utf8_string): Declare new function.
+       * src/libs/libgroff/unicode.cpp (to_utf8_string): New function
+       converts input integer into UTF-8 sequence (or an HTML character
+       entity in hexadecimal if the integer is out of range).
+
 2024-11-21  G. Branden Robinson <g.branden.robinson@gmail.com>
 
        * src/devices/grops/ps.cpp: Fix code style nits.  Parenthesize
diff --git a/src/include/unicode.h b/src/include/unicode.h
index f07cbafe5..a7c915068 100644
--- a/src/include/unicode.h
+++ b/src/include/unicode.h
@@ -82,6 +82,8 @@ const size_t UNIBUFSZ = sizeof "u10FFFF"; // see glyphuni.cpp
 // `unicode_to_glyph_name` might return.
 const size_t GLYPHBUFSZ = sizeof "bracketrighttp"; // see uniglyph.cpp
 
+char *to_utf8_string (unsigned int);
+
 // Local Variables:
 // fill-column: 72
 // mode: C++
diff --git a/src/libs/libgroff/unicode.cpp b/src/libs/libgroff/unicode.cpp
index 757a73399..dc7a1baed 100644
--- a/src/libs/libgroff/unicode.cpp
+++ b/src/libs/libgroff/unicode.cpp
@@ -98,6 +98,33 @@ const char *valid_unicode_code_sequence(const char *u, char *errbuf)
   return u;
 }
 
+// TODO: Does gnulib have a function that does this?
+char *to_utf8_string(unsigned int ch)
+{
+  static char buf[16];
+
+  if (ch < 0x80)
+    sprintf(buf, "%c", (ch & 0xff));
+  else if (ch < 0x800)
+    sprintf(buf, "%c%c",
+      0xc0 + ((ch >>  6) & 0x1f),
+      0x80 + ((ch      ) & 0x3f));
+  else if ((ch < 0xD800) || ((ch > 0xDFFF) && (ch < 0x10000)))
+    sprintf(buf, "%c%c%c",
+      0xe0 + ((ch >> 12) & 0x0f),
+      0x80 + ((ch >>  6) & 0x3f),
+      0x80 + ((ch      ) & 0x3f));
+  else if ((ch > 0xFFFF) && (ch < 0x120000))
+    sprintf(buf, "%c%c%c%c",
+      0xf0 + ((ch >> 18) & 0x07),
+      0x80 + ((ch >> 12) & 0x3f),
+      0x80 + ((ch >>  6) & 0x3f),
+      0x80 + ((ch      ) & 0x3f));
+  else
+    sprintf(buf, "&#x%X;", ch);
+  return buf;
+}
+
 // Local Variables:
 // fill-column: 72
 // mode: C++
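
The branch structure of `to_utf8_string` above can be sketched as a standalone function for experimentation. This is a minimal illustration and not part of the commit: the name `encode_utf8` and the `std::string` return type are assumptions made for this example. It also assumes the four-byte range should end at U+10FFFF, the last Unicode code point, so its upper bound is `0x110000` where the diff has `0x120000`.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not part of groff: encode one code point as UTF-8.
// Surrogates (U+D800..U+DFFF) and values above U+10FFFF (an assumed
// limit, see the lead-in) fall back to a hexadecimal HTML character
// reference, as the commit message describes.
std::string encode_utf8(unsigned int ch)
{
  std::string out;
  if (ch < 0x80)
    // One byte: 0xxxxxxx
    out += static_cast<char>(ch);
  else if (ch < 0x800) {
    // Two bytes: 110xxxxx 10xxxxxx
    out += static_cast<char>(0xC0 | (ch >> 6));
    out += static_cast<char>(0x80 | (ch & 0x3F));
  }
  else if (ch < 0xD800 || (ch > 0xDFFF && ch < 0x10000)) {
    // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xE0 | (ch >> 12));
    out += static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (ch & 0x3F));
  }
  else if (ch > 0xFFFF && ch < 0x110000) {
    // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xF0 | (ch >> 18));
    out += static_cast<char>(0x80 | ((ch >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (ch & 0x3F));
  }
  else {
    // Out of range: "&#xFFFFFFFF;" plus the terminator fits in 16 bytes
    // when unsigned int is 32 bits wide.
    char buf[16];
    std::snprintf(buf, sizeof buf, "&#x%X;", ch);
    out = buf;
  }
  return out;
}
```

Returning a `std::string` by value means a second call does not overwrite the result of the first, which a shared `static` buffer would. One behavioral difference to keep in mind: for `ch == 0` this sketch yields a one-byte string holding NUL, whereas `sprintf(buf, "%c", 0)` leaves a C string that reads as empty.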


