octave-maintainers

[PATCH] Narrow UTF-16 and UTF-32 to ASCII when loading .mat files.


From: Jason Riedy
Subject: [PATCH] Narrow UTF-16 and UTF-32 to ASCII when loading .mat files.
Date: Thu, 27 Sep 2007 16:12:29 -0700
User-agent: Gnus/5.110007 (No Gnus v0.7) Emacs/23.0.50 (gnu/linux)

This is somewhat nasty, but something like it is needed to load
the UF sparse matrix collection.  Rather than play with iconv
and "true" conversions, just carry along the convention of
replacing out-of-ASCII-range entries with '?'.

Signed-off-by: Jason Riedy <address@hidden>
---

  Note: With this patch, the main CVS branch can load every
  matrix from the UF collection using Dr. Davis's UFget
  interface.  Kinda handy.

 src/ls-mat5.cc |   18 ++++++++++++++++--
 1 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/ls-mat5.cc b/src/ls-mat5.cc
index a9128ed..57e10e7 100644
--- a/src/ls-mat5.cc
+++ b/src/ls-mat5.cc
@@ -140,6 +140,7 @@ read_mat5_binary_data (std::istream& is, double *data,
       read_doubles (is, data, LS_SHORT, count, swap, flt_fmt);
       break;
 
+    case miUTF16:
     case miUINT16:
       read_doubles (is, data, LS_U_SHORT, count, swap, flt_fmt);
       break;
@@ -148,6 +149,7 @@ read_mat5_binary_data (std::istream& is, double *data,
       read_doubles (is, data, LS_INT, count, swap, flt_fmt);
       break;
 
+    case miUTF32:
     case miUINT32:
       read_doubles (is, data, LS_U_INT, count, swap, flt_fmt);
       break;
@@ -1251,8 +1253,20 @@ read_mat5_binary_element (std::istream& is, const std::string& filename,
              {
                if (type == miUTF16 || type == miUTF32)
                  {
-                   error ("load: can not read Unicode UTF16 and UTF32 encoded characters");
-                   goto data_read_error;
+                   bool found_big_char = false;
+                   for (int i = 0; i < n; i++)
+                     {
+                       if (re(i) > 127) {
+                         re(i) = '?';
+                         found_big_char = true;
+                       }
+                     }
+
+                   if (found_big_char)
+                     {
+                       warning ("load: can not read non-ASCII portions of UTF characters.");
+                       warning ("      Replacing unreadable characters with '?'.");
+                     }
                  }
                else if (type == miUTF8)
                  {
-- 
1.5.3.2
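
For readers who want to try the narrowing rule outside of Octave, here is a
minimal standalone sketch of the same idea.  It assumes, as in the patch, that
the UTF-16 or UTF-32 code units have already been read into doubles.  The
name narrow_to_ascii and its signature are hypothetical and are not part of
ls-mat5.cc; this only illustrates the replace-with-'?' convention.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not an Octave function): narrow code units that
// were read as doubles down to ASCII.  Any unit above 127 becomes '?',
// and `lossy` reports whether at least one replacement was made.
std::string
narrow_to_ascii (const std::vector<double>& units, bool& lossy)
{
  std::string out;
  out.reserve (units.size ());
  lossy = false;

  for (std::size_t i = 0; i < units.size (); i++)
    {
      if (units[i] > 127)
        {
          // Outside the ASCII range: substitute a placeholder.
          out.push_back ('?');
          lossy = true;
        }
      else
        out.push_back (static_cast<char> (units[i]));
    }

  return out;
}
```

Calling it on the units { 'c', 'a', 'f', 0x00E9 } gives "caf?" and sets lossy
to true, which is the point at which a caller would emit a single warning.
Because the check is applied per code unit, a UTF-16 surrogate pair turns
into two '?' characters rather than one.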



