guile-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Guile-commits] GNU Guile branch, string-abstraction, updated. feda5bd1c


From: Michael Gran
Subject: [Guile-commits] GNU Guile branch, string-abstraction, updated. feda5bd1c08d69628ec9fdb5577d5c894b70b6e8
Date: Sun, 26 Apr 2009 16:15:34 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU Guile".

http://git.savannah.gnu.org/cgit/guile.git/commit/?id=feda5bd1c08d69628ec9fdb5577d5c894b70b6e8

The branch, string-abstraction has been updated
       via  feda5bd1c08d69628ec9fdb5577d5c894b70b6e8 (commit)
       via  a1386c72a49c5f047c7d6451ffdfd23933c0a5c8 (commit)
       via  a84d5f1e3d290417d443d2a29b25e173bb8c3da3 (commit)
       via  3aa3f19175fa199090632889ef874c5516209649 (commit)
       via  cca0264be4c44ede8fecdad0100c298b41a832c2 (commit)
       via  ed9ba980a3d54529ac86cd8b99877a5e9825a51b (commit)
       via  d0ba78e9a92ccf31d845ca3de79d3cf20343db5c (commit)
      from  274d338ba7ee788d06ec9a87708afc3130c8616d (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit feda5bd1c08d69628ec9fdb5577d5c894b70b6e8
Author: Michael Gran <address@hidden>
Date:   Sun Apr 26 09:13:38 2009 -0700

    Lowercase a character name

commit a1386c72a49c5f047c7d6451ffdfd23933c0a5c8
Author: Michael Gran <address@hidden>
Date:   Sun Apr 26 09:11:34 2009 -0700

    Do character reading in 32bit
    
        * read.c: Allow characters of the form #\x where x can be a wide
        codepoint.  Hack around the fact that number processing is still
        narrow only.

commit a84d5f1e3d290417d443d2a29b25e173bb8c3da3
Author: Michael Gran <address@hidden>
Date:   Sun Apr 26 09:02:43 2009 -0700

    Refactor writing symbols; don't double-covert strings in VM
    
        * vm-i-loader.c: Don't double-convert strings and symbols to the
        locale.
    
        * vm.c: Pick up symbols.h
    
        * symbols.h: Add declaration
    
        * symbols.c (scm_i_is_keywordish_symbol): Do kewordish
        determination here and not in print.c
    
        * strings.h: Add declarations for scm_i_symbol_name*
    
        * strings.c (is_printable_ascii): New
        (string_escape_char): New
        (symbol_escape_char): New
        (hex_escape_length): New
        (hex_escape): New
        (scm_i_to_write_string): Use new helper funcs to reduce complexity
        (scm_i_symbol_name_to_write_string): Do this conversion here and
        not in the print module.  Use new helper funcs. Draw a distiction
        between writing and displaying for symbols, where writing is 7-bit
        ASCII always.
        (scm_i_symbol_name_to_locale_string): Do this conversion here and
        not in the print module.  Use new help funcs
    
        * print.c (quote_keywordish_symbol): Move determination of whether
        a symbol is keywordish to symbols module.
        (iprin1): Move symbol to c string conversion to strings module.

commit 3aa3f19175fa199090632889ef874c5516209649
Author: Michael Gran <address@hidden>
Date:   Fri Apr 24 20:28:59 2009 -0700

    Add locale to test 8-bit data
    
        * regexp.test: By setting the locale to ISO-8859-1, the regexp
        tests will work with the 8-bit chars

commit cca0264be4c44ede8fecdad0100c298b41a832c2
Author: Michael Gran <address@hidden>
Date:   Fri Apr 24 20:27:36 2009 -0700

    Clarify test for symbol syntax
    
        * symbols.test ("symbol syntax"): Clarificaions to tests of syntax
        symbols.

commit ed9ba980a3d54529ac86cd8b99877a5e9825a51b
Author: Michael Gran <address@hidden>
Date:   Fri Apr 24 20:25:16 2009 -0700

    Use R6RS format string escapes in tests
    
        * reader.test ("reading"): Change tests to use R6RS-format
        string escapes
    
        * ports.test ("port-column"): Change tests to use R6RS-format
        string escapes

commit d0ba78e9a92ccf31d845ca3de79d3cf20343db5c
Author: Michael Gran <address@hidden>
Date:   Fri Apr 24 20:21:23 2009 -0700

    Fix regressions from wide string conversion
    
        * unif.c (scm_istr2bve): Forgot to iterate over string.
    
        * read.c: Don't capitalize error messages. Remember to strip of
        colon on postfix keywords.
    
        * socket.c (scm_send): String length varible never initialized
    
        * strings.c (scm_take_locale_stringn): Forgot to return value

-----------------------------------------------------------------------

Summary of changes:
 libguile/print.c              |   26 ++-
 libguile/read.c               |  153 +++++++++++---
 libguile/socket.c             |    1 +
 libguile/strings.c            |  462 +++++++++++++++++++++++++++++++++-------
 libguile/strings.h            |    6 +
 libguile/symbols.c            |   22 ++
 libguile/symbols.h            |    2 +
 libguile/unif.c               |    2 +-
 libguile/vm-i-loader.c        |    6 +-
 libguile/vm.c                 |    1 +
 module/srfi/srfi-19.scm       |    2 +-
 test-suite/tests/ports.test   |   16 +-
 test-suite/tests/reader.test  |    4 +-
 test-suite/tests/regexp.test  |    2 +
 test-suite/tests/symbols.test |   37 +++-
 15 files changed, 595 insertions(+), 147 deletions(-)

diff --git a/libguile/print.c b/libguile/print.c
index f3ed140..da03523 100644
--- a/libguile/print.c
+++ b/libguile/print.c
@@ -316,12 +316,8 @@ static int
 quote_keywordish_symbol (SCM str)
 {
   SCM option;
-  size_t len = scm_i_symbol_length (str);
 
-  /* LEN is guaranteed to be > 0.
-   */
-  if (!scm_i_symbol_ref_eq_char (str, 0, ':')
-      && !scm_i_symbol_ref_eq_char (str, len - 1, ':'))
+  if (!scm_i_is_keywordish_symbol (str))
     return 0;
 
   option = SCM_PRINT_KEYWORD_STYLE;
@@ -620,15 +616,31 @@ iprin1 (SCM exp, SCM port, scm_print_state *pstate)
        case scm_tc7_symbol:
          if (scm_i_symbol_is_interned (exp))
            {
-             scm_i_print_symbol_name (exp, port);
+             char *str;
+             if (SCM_WRITINGP (pstate))
+               str = scm_i_symbol_name_to_write_string (exp, 
quote_keywordish_symbol (exp));
+             else
+               str = scm_i_symbol_name_to_locale_string (exp, 
quote_keywordish_symbol (exp));
+             scm_lfwrite (str, strlen (str), port);
+             free (str);
+             scm_remember_upto_here_1 (exp);
            }
          else
            {
+             char *str;
+
              scm_puts ("#<uninterned-symbol ", port);
-             scm_i_print_symbol_name (exp, port);
+             if (SCM_WRITINGP (pstate))
+               str = scm_i_symbol_name_to_write_string (exp, 
quote_keywordish_symbol (exp));
+             else
+               str = scm_i_symbol_name_to_locale_string (exp, 
quote_keywordish_symbol (exp));
+
+             scm_lfwrite (str, strlen (str), port);
              scm_putc (' ', port);
              scm_uintprint (SCM_UNPACK (exp), 16, port);
              scm_putc ('>', port);
+             free (str);
+             scm_remember_upto_here_1 (exp);
            }
          break;
        case scm_tc7_variable:
diff --git a/libguile/read.c b/libguile/read.c
index 098b75f..06a1555 100644
--- a/libguile/read.c
+++ b/libguile/read.c
@@ -186,13 +186,42 @@ static inline SCM scm_read_scsh_block_comment (int chr, 
SCM port);
 /* Read from PORT until a delimiter (e.g., a whitespace) is read.  Return
    zero if the whole token fits in BUF, non-zero otherwise.  */
 static inline int
-read_token (SCM port, char *buf, size_t buf_size, size_t *read)
+read_token (SCM port, scm_t_uint32 *buf, size_t buf_size, size_t *read)
 {
   *read = 0;
 
   while (*read < buf_size)
     {
-      int chr;
+      scm_t_uint32 chr;
+
+      chr = scm_getc (port);
+      chr = (SCM_CASE_INSENSITIVE_P ? CHAR_DOWNCASE (chr) : chr);
+
+      if (chr == (scm_t_uint32) EOF)
+       return 0;
+      else if (CHAR_IS_DELIMITER (chr))
+       {
+         scm_ungetc (chr, port);
+         return 0;
+       }
+      else
+       {
+         *buf = chr;
+         buf++, (*read)++;
+       }
+    }
+
+  return 1;
+}
+
+static inline int
+read_token2 (SCM port, char *buf, size_t buf_size, size_t *read)
+{
+  *read = 0;
+
+  while (*read < buf_size)
+    {
+      scm_t_uint32 chr;
 
       chr = scm_getc (port);
       chr = (SCM_CASE_INSENSITIVE_P ? CHAR_DOWNCASE (chr) : chr);
@@ -206,7 +235,7 @@ read_token (SCM port, char *buf, size_t buf_size, size_t 
*read)
        }
       else
        {
-         *buf = (char) chr;
+         *buf = chr;
          buf++, (*read)++;
        }
     }
@@ -373,6 +402,35 @@ scm_read_sexp (int chr, SCM port)
 }
 #undef FUNC_NAME
 
+static int
+update_hex_value (scm_t_uint32 *c, scm_t_uint32 digit) 
+{
+  *c = *c << 4;
+  if      ('0' <= digit && digit <= '9') 
+    *c += digit - '0';
+  else if ('A' <= digit && digit <= 'F') 
+    *c += digit - 'A' + 10;
+  else if ('a' <= digit && digit <= 'f') 
+    *c += digit - 'a' + 10;
+  else 
+    return 0;
+
+  return 1;
+}
+
+static int
+update_octal_value (scm_t_uint32 *c, scm_t_uint32 digit) 
+{
+  *c = *c << 3;
+  if ('0' <= digit && digit <= '9') 
+    *c += digit - '0';
+  else
+    return 0;
+
+  return 1;
+}
+
+
 static scm_t_uint32
 scm_read_hex_escape (const char *func_name, SCM port)
 {
@@ -405,14 +463,8 @@ scm_read_hex_escape (const char *func_name, SCM port)
   c = 0;
   for (i = 2; i < siz-1; i ++)
     {
-      c = c << 4;
-      if      ('0' <= digit[i] && digit[i] <= '9') 
-       c += digit[i] - '0';
-      else if ('A' <= digit[i] && digit[i] <= 'F') 
-       c += digit[i] - 'A' + 10;
-      else if ('a' <= digit[i] && digit[i] <= 'f') 
-       c += digit[i] - 'a' + 10;
-      else goto bad_hex_escaped;
+      if (!update_hex_value (&c, digit[i]))
+       goto bad_hex_escaped;
     }
   if ((c < 0) 
       || (c > SCM_CODEPOINT_MAX)
@@ -511,7 +563,7 @@ scm_read_string (int chr, SCM port)
          default:
          bad_escaped:
            scm_i_input_error (FUNC_NAME, port,
-                              "Invalid character in escape sequence: ~S",
+                              "invalid character in escape sequence: ~S",
                               scm_list_1 (SCM_MAKE_CHAR (c)));
          }
       scm_i_string_set (str, c_str_len, SCM_MAKE_CHAR (c));
@@ -540,7 +592,7 @@ scm_read_number (int chr, SCM port)
   scm_ungetc (chr, port);
   do
     {
-      overflow = read_token (port, buffer, sizeof (buffer), &read);
+      overflow = read_token2 (port, buffer, sizeof (buffer), &read);
 
       if ((overflow) || (scm_is_pair (str)))
        str = scm_cons (scm_from_locale_stringn (buffer, read), str);
@@ -606,14 +658,27 @@ scm_read_mixed_case_symbol (char *func_name, int chr, SCM 
port)
          {
          case (scm_t_uint32) EOF:
            scm_i_input_error (func_name, port,
-                              "Invalid symbol terminator: ~S",
+                              "invalid symbol terminator: ~S",
                               scm_list_1 (SCM_MAKE_8BIT_CHAR ('\\')));
+#ifdef BRACKETS_AS_PARENS
+         case '[':
+         case ']':
+#endif
+         case '(':
+         case ')':
+         case '"':
+         case ';':
+         case '#':
+         case SCM_WHITE_SPACES:
+         case SCM_LINE_INCREMENTORS:
+         case '\\':
+           break;
          case 'x':
            c = scm_read_hex_escape (func_name, port);
            break;
          default:
            scm_i_input_error (func_name, port,
-                              "Invalid character in escape sequence: ~S",
+                              "invalid character in escape sequence: ~S",
                               scm_list_1 (SCM_MAKE_CHAR (c)));
          }
       scm_i_string_set (str, c_str_len, SCM_MAKE_CHAR (c));
@@ -624,11 +689,16 @@ scm_read_mixed_case_symbol (char *func_name, int chr, SCM 
port)
 
   str =  scm_c_substring_copy (str, 0, c_str_len);
 
-  result = scm_string_to_symbol (str);
-
   /* Per SRFI-88, `:' alone is an identifier, not a keyword.  */
-  if (postfix && ends_with_colon && (scm_c_string_length (result) > 1))
-    result = scm_symbol_to_keyword (result);
+  if (postfix && ends_with_colon && (scm_i_string_length (str) > 1))
+    {
+      /* This is a string that ends in a colon that is meant to be a
+        keyword.  The colon gets stripped off before conversion.  */
+      str = scm_c_substring (str, 0, c_str_len - 1);
+      result = scm_symbol_to_keyword (scm_string_to_symbol (str));
+    }
+  else
+    result = scm_string_to_symbol (str);
 
   return result;
 }
@@ -673,7 +743,7 @@ scm_read_number_and_radix (int chr, SCM port)
 
   do
     {
-      overflow = read_token (port, buffer, sizeof (buffer), &read);
+      overflow = read_token2 (port, buffer, sizeof (buffer), &read);
 
       if ((overflow) || (scm_is_pair (str)))
        str = scm_cons (scm_from_locale_stringn (buffer, read), str);
@@ -788,8 +858,9 @@ scm_read_character (int chr, SCM port)
 #define FUNC_NAME "scm_lreadr"
 {
   unsigned i;
-  char charname[READER_CHAR_NAME_MAX_SIZE], c;
+  scm_t_uint32 charname[READER_CHAR_NAME_MAX_SIZE], c;
   size_t charname_len;
+  scm_t_uint32 val;
   SCM p;
 
   if (read_token (port, charname, sizeof (charname), &charname_len))
@@ -813,28 +884,30 @@ scm_read_character (int chr, SCM port)
   p = SCM_BOOL_F;
   if (charname[0] == 'x')
     {
+      val = 0;
       for (i = 1; i < charname_len; i++)
        {
          c = charname[i];
-         if (! isxdigit(c))
+         if (!update_hex_value (&val, c))
            scm_i_input_error (FUNC_NAME, port, "invalid character escape ~a",
-                              scm_list_1 (scm_from_locale_stringn (charname, 
-                                                                   
charname_len)));
+                              scm_list_1 (scm_from_utf32_stringn (charname, 
+                                                                  
charname_len)));
        }
-      p = scm_c_locale_stringn_to_number (charname+1, charname_len-1, 16);
+      p = scm_from_uint32 (val);
     }
   else if (*charname >= '0' && *charname < '8')
     {
       /* Handle characters in the Guile-extension octal format.  */
+      val = 0;
       for (i = 0; i < charname_len; i++)
        {
          c = charname[i];
-         if (c < '0' || c > '8')
+         if (!update_octal_value (&val, c))
            scm_i_input_error (FUNC_NAME, port, "invalid character escape ~a",
-                              scm_list_1 (scm_from_locale_stringn (charname, 
-                                                                   
charname_len)));
+                              scm_list_1 (scm_from_utf32_stringn (charname, 
+                                                                  
charname_len)));
        }
-      p = scm_c_locale_stringn_to_number (charname, charname_len, 8);
+      p = scm_from_uint32 (val);
     }
   if (scm_is_integer (p))
     { 
@@ -845,22 +918,34 @@ scm_read_character (int chr, SCM port)
        return scm_integer_to_char (p);
       else
        scm_i_input_error (FUNC_NAME, port, "out of range character escape ~a",
-                          scm_list_1 (scm_from_locale_stringn (charname, 
-                                                               charname_len)));
+                          scm_list_1 (scm_from_utf32_stringn (charname, 
+                                                              charname_len)));
     }
        
 
   /* Lookup named character.  */
   {
-    SCM ch = scm_i_charname_to_char (charname, charname_len);
+    /* FIXME: do this in uint32.  */
+    char *str = scm_malloc (charname_len +1);
+    int i;
+
+    for(i = 0; i < charname_len; i++)
+      {
+       if (charname[i] > 127)
+         goto char_error;
+       str[i] = charname[i];
+      }
+    str[charname_len] = '\0';
+    SCM ch = scm_i_charname_to_char (str, charname_len);
+    free (str);
     if (scm_is_true (ch))
       return ch;
   }
 
  char_error:
   scm_i_input_error (FUNC_NAME, port, "unknown character name ~a",
-                    scm_list_1 (scm_from_locale_stringn (charname,
-                                                         charname_len)));
+                    scm_list_1 (scm_from_utf32_stringn (charname,
+                                                        charname_len)));
 
   return SCM_UNSPECIFIED;
 }
diff --git a/libguile/socket.c b/libguile/socket.c
index 51ff902..8c7b639 100644
--- a/libguile/socket.c
+++ b/libguile/socket.c
@@ -1487,6 +1487,7 @@ SCM_DEFINE (scm_send, "send", 2, 1, 0,
     flg = scm_to_int (flags);
   fd = SCM_FPORT_FDES (sock);
 
+  len = scm_i_string_length (message);
   if (scm_i_is_narrow_string (message))
     {
       src = scm_i_string_chars (message);
diff --git a/libguile/strings.c b/libguile/strings.c
index f3fbb21..bb1c17c 100644
--- a/libguile/strings.c
+++ b/libguile/strings.c
@@ -628,6 +628,9 @@ scm_i_c_take_symbol (char *name, size_t len,
 size_t
 scm_i_symbol_length (SCM sym)
 {
+  if (!scm_is_symbol (sym))
+    scm_wrong_type_arg ("scm_i_symbol_length", 0, sym);
+
   return STRINGBUF_LENGTH (SYMBOL_STRINGBUF (sym));
 }
 
@@ -1170,6 +1173,109 @@ scm_i_symbol_strcmp (SCM sym, const char *str)
     }      
 }
 
+
+/* Displaying and writing strings */
+
+static int
+is_printable_ascii (scm_t_uint32 ch)
+{
+  return (ch >= 0x20 && ch <= 0x7E);
+}
+
+#define NO_MATCH ((scm_t_uint32)(-1))
+
+static scm_t_uint32
+string_escape_char (scm_t_uint32 ch)
+{
+  if (ch == '"' || ch == '\\')
+    return ch;
+  else if (ch == '\0')
+    return '0';
+  else if (ch == '\f')
+    return 'f';
+  else if (ch == '\n')
+    return 'n';
+  else if (ch == '\r')
+    return 'r';
+  else if (ch == '\t')
+    return 't';
+  else if (ch == '\007')
+    return 'a';
+  else if (ch == '\010')
+    return 'b';
+  else if (ch == '\v')
+    return 'v';
+  else 
+    return NO_MATCH;
+}
+
+static scm_t_uint32
+symbol_escape_char (scm_t_uint32 c)
+{
+  if ( c == '(' || c == ')' || c == '"' || c == ';' || c == '#'
+#ifdef BRACKETS_AS_PARENS
+       || c == '[' || c == ']'
+#endif
+#ifdef MSDOS
+       || c == SCM_ASCII_SUB_CHAR
+#endif
+       || c == ' '  || c == '\t' 
+       || c == '\n' || c == '\r' || c == '\f')
+    return c;
+  else
+    return NO_MATCH;
+}
+
+static size_t
+hex_escape_length (scm_t_uint32 ch)
+{
+  size_t olen;
+
+  if (ch <= 0xF)               /* \xN; */
+    olen = 4;                  
+  else if (ch <= 0xFF)         /* \xNN; */
+    olen = 5;
+  else if (ch <= 0xFFF)                /* \xNNN; */
+    olen = 6;
+  else if (ch <= 0xFFFF)       /* \xNNNN; */
+    olen = 7;
+  else if (ch <= 0xFFFFF)      /* \xNNNNN; */
+    olen = 8;
+  else if (ch <= SCM_CODEPOINT_MAX)
+    olen = 9;                  /* \x10NNNN; */
+  else 
+    /* This should be impossible.  */
+    abort ();
+
+  return olen;
+}
+
+static const char *
+hex_escape (scm_t_uint32 ch, size_t *plen)
+{
+  static char const hex[]="0123456789abcdef";
+  static char odata[10];
+  size_t j = 0;
+
+  odata[j++] = '\\';
+  odata[j++] = 'x';
+  if (ch >= 0x100000)
+    odata[j++] = hex [(ch & 0xF00000) >> 20];
+  if (ch >= 0x10000)
+    odata[j++] = hex [(ch & 0xF0000) >> 16];
+  if (ch >= 0x1000)
+    odata[j++] = hex [(ch & 0xF000) >> 12];
+  if (ch >= 0x100)
+    odata[j++] = hex [(ch & 0xF00) >> 8];
+  if (ch >= 0x10)
+    odata[j++] = hex [(ch & 0xF0) >> 4];
+  odata[j++] = hex [ch & 0xF];
+  odata[j++] = ';';
+
+  odata[j] = '\0';
+  *plen = j;
+  return odata;
+}
 
 /* Encode STR as a machine-readable 7-bit string for use with the
    write command.  */
@@ -1178,8 +1284,7 @@ scm_i_to_write_string (SCM str)
 {
   size_t i, j, ilen, olen;
   char *odata;
-  scm_t_uint32 ch;
-  static char const hex[]="0123456789abcdef";
+  scm_t_uint32 ch, ch2;
 
   /* The output length begins at 3: 2 for quotation marks plus 1 for
      null terminator.  */
@@ -1190,35 +1295,15 @@ scm_i_to_write_string (SCM str)
   for (i=0; i < ilen; i++)
     {
       ch = scm_i_string_ref_to_uint32 (str, i);
-      if (ch >= 0x20 && ch <= 0x7E && ch != '"' && ch != '\\')
-       {
-         /* These chars are printable ASCII */
-         olen ++;
-       }
-      else if (ch == '"' || ch == '\\' || ch == '\0' || ch == '\f'
-              || ch == '\n' || ch == '\r' || ch == '\t' || ch == '\007'
-              || ch == '\010' || ch == '\v')
-       {
-         /* These chars have special escapes.  */
-         olen += 2;
-       }
+      if (string_escape_char (ch) != NO_MATCH)
+       olen += 2;
+      else if (is_printable_ascii (ch))
+       olen ++;
       /* Otherwise, print it as a generic hex escape.  */
-      else if (ch <= 0xF)
-       olen += 4;              /* \xN; */
-      else if (ch <= 0xFF)     /* \xNN; */
-       olen += 5;
-      else if (ch <= 0xFFF)    /* \xNNN; */
-       olen += 6;
-      else if (ch <= 0xFFFF)   /* \xNNNN; */
-       olen += 7;
-      else if (ch <= 0xFFFFF)  /* \xNNNNN; */
-       olen += 8;
-      else if (ch <= SCM_CODEPOINT_MAX)
-       olen += 9;              /* \x10NNNN; */
-      else 
-       /* This should be impossible.  */
-       abort ();
+      else
+       olen += hex_escape_length (ch);
     }
+  
   /* Pass 2: create the output string.  */
   odata = scm_malloc (olen);
   j = 0;
@@ -1226,83 +1311,267 @@ scm_i_to_write_string (SCM str)
   for (i=0; i < ilen; i++)
     {
       ch = scm_i_string_ref_to_uint32 (str, i);
-      if (ch >= 0x20 && ch <= 0x7E && ch != '"' && ch != '\\')
+      if ((ch2 = string_escape_char (ch)) != NO_MATCH)
        {
-         /* These chars are printable ASCII.  */
-         odata[j++] = (char) ch;
+         odata[j++] = '\\';
+         odata[j++] = (char) ch2;
        }
-      /* Else, check the special escapes.  */
-      else if (ch == '"' || ch == '\\')
+      else if (is_printable_ascii (ch))
        {
-         odata[j++] = '\\';
+         /* These chars are unescaped.  */
          odata[j++] = (char) ch;
        }
-      else if (ch == '\0')
+      else
        {
-         odata[j++] = '\\';
-         odata[j++] = '0';
+         /* Otherwise, print it in the form \xNNN;  */
+         size_t len;
+         const char *hex;
+         hex = hex_escape (ch, &len);
+         memcpy (odata + j, hex, len);
+         j += len;
        }
-      else if (ch == '\f')
+    }
+  odata[j++] = '"';
+  odata[j++] = '\0';
+
+  if (j != olen)
+    scm_misc_error ("write string","internal string-size error~s vs ~s", 
+                   scm_list_2 (scm_from_size_t (j), 
+                               scm_from_size_t (olen)));
+
+  scm_remember_upto_here_1 (str);
+  
+  return odata;
+}
+
+
+/* Encode SYM as a machine-readable 7-bit string for use with the
+   write command.  */
+char *
+scm_i_symbol_name_to_write_string (SCM sym, int quote_key_flag)
+{
+  /* If the name contains weird characters, we'll escape them with
+   * backslashes and set this flag; it indicates that we should surround the
+   * name with "#{" and "}#". */
+  size_t i, j, ilen, olen;
+  char *odata;
+  scm_t_uint32 c, c2;
+  int weird = 0;
+
+  /* Pass 1: Determine if it requires extended symbol syntax, aka if
+     it is weird.  */
+  ilen = scm_i_symbol_length (sym);
+  if (ilen == 0
+      || scm_i_symbol_ref_eq_char (sym, 0, '\'') 
+      || scm_i_symbol_ref_eq_char (sym, 0, '`') 
+      || scm_i_symbol_ref_eq_char (sym, 0, ',') 
+      || (scm_i_is_keywordish_symbol (sym) && quote_key_flag)
+      || (scm_i_symbol_ref_eq_char (sym, 0, '.')  && ilen == 1)
+      || scm_is_true (scm_string_to_number (scm_symbol_to_string (sym), 
+                                           scm_from_int (10))))
+    {
+      weird = 1;
+    }
+  else
+    {
+      i = 0;
+      while (i < ilen)
        {
-         odata[j++] = '\\';
-         odata[j++] = 'f';
+         c = scm_i_symbol_ref_to_uint32 (sym, i++);
+         if (symbol_escape_char (c) != NO_MATCH)
+           {
+             weird = 1;
+             break;
+           }
        }
-      else if (ch == '\n')
+    }
+         
+  /* Pass 2: determine the length required. */
+  if (weird)
+    olen = 5;              /* "#{" plus "}#" plus terminating null */
+  else
+    olen = 1;                  /* Terminating null */
+  for (i = 0; i < ilen; i++)
+    {
+      c = scm_i_symbol_ref_to_uint32 (sym, i);
+      if (symbol_escape_char (c) != NO_MATCH)
        {
-         odata[j++] = '\\';
-         odata[j++] = 'n';
+         if (weird)
+           olen += 2;
+         else
+           olen ++;
        }
-      else if (ch == '\r')
+      else if (is_printable_ascii (c))
+       olen ++;
+      else
+       olen += hex_escape_length (c);
+    }
+  
+  /* Pass 3: create the output string.  */
+  odata = scm_malloc (olen);
+  j = 0;
+  if (weird)
+    {
+      odata[j++] = '#';
+      odata[j++] = '{';
+    }
+  for (i=0; i < ilen; i++)
+    {
+      c = scm_i_symbol_ref_to_uint32 (sym, i);
+      if ((c2 = symbol_escape_char (c)) != NO_MATCH)
        {
-         odata[j++] = '\\';
-         odata[j++] = 'r';
+         if (weird)
+           {
+             odata[j++] = '\\';
+             odata[j++] = c2;
+           }
+         else
+           odata[j++] = c;
        }
-      else if (ch == '\t')
+      else if (is_printable_ascii (c))
        {
-         odata[j++] = '\\';
-         odata[j++] = 't';
+         /* These chars are printable ASCII */
+         odata[j++] = (char) c;
        }
-      else if (ch == '\007')
+      else
        {
-         odata[j++] = '\\';
-         odata[j++] = 'a';
+         /* Otherwise, print it in the form \xNNN;  */
+         size_t len;
+         const char *hex;
+         hex = hex_escape (c, &len);
+         memcpy (odata + j, hex, len);
+         j += len;
        }
-      else if (ch == '\010')
+    }
+  if (weird)
+    {
+      odata[j++] = '}';
+      odata[j++] = '#';
+    }
+  odata[j++] = '\0';
+
+  if (j != olen)
+    scm_misc_error ("symbol-name->write-string","internal string-size error~s 
vs ~s", 
+                   scm_list_2 (scm_from_size_t (j), 
+                               scm_from_size_t (olen)));
+
+  scm_remember_upto_here_1 (sym);
+
+  return odata;
+}
+
+/* Encode SYM as a machine-readable 7-bit string for use with the
+   write command.  */
+char *
+scm_i_symbol_name_to_locale_string (SCM sym, int quote_key_flag)
+{
+  /* If the name contains weird characters, we'll escape them with
+   * backslashes and set this flag; it indicates that we should surround the
+   * name with "#{" and "}#". */
+  size_t i, j, ilen, olen, llen;
+  scm_t_uint32 *odata;
+  scm_t_uint32 c, c2;
+  char *ldata;
+  int weird = 0;
+  int ret;
+
+  /* Pass 1: Determine if it requires extended symbol syntax, aka if
+     it is weird.  */
+  ilen = scm_i_symbol_length (sym);
+  if (ilen == 0
+      || scm_i_symbol_ref_eq_char (sym, 0, '\'') 
+      || scm_i_symbol_ref_eq_char (sym, 0, '`') 
+      || scm_i_symbol_ref_eq_char (sym, 0, ',') 
+      || (scm_i_is_keywordish_symbol (sym) && quote_key_flag)
+      || (scm_i_symbol_ref_eq_char (sym, 0, '.')  && ilen == 1)
+      || scm_is_true (scm_string_to_number (scm_symbol_to_string (sym), 
+                                           scm_from_int (10))))
+    {
+      weird = 1;
+    }
+  else
+    {
+      i = 0;
+      while (i < ilen)
        {
-         odata[j++] = '\\';
-         odata[j++] = 'b';
+         c = scm_i_symbol_ref_to_uint32 (sym, i++);
+         if (symbol_escape_char (c) != NO_MATCH)
+           {
+             weird = 1;
+             break;
+           }
        }
-      else if (ch == '\v')
+    }
+         
+  /* Pass 2: determine the length required in codepoints. */
+  if (weird)
+    olen = 5;              /* "#{" plus "}#" plus NULL */
+  else                    
+    olen = 1;                  /* NULL terminator */
+  for (i = 0; i < ilen; i++)
+    {
+      c = scm_i_symbol_ref_to_uint32 (sym, i);
+      if (symbol_escape_char (c) != NO_MATCH)
        {
-         odata[j++] = '\\';
-         odata[j++] = 'v';
+         if (weird)
+           olen += 2;
+         else
+           olen ++;
        }
       else
+       olen ++;
+    }
+  
+  /* Pass 3: create the wide output string.  */
+  odata = scm_malloc (olen * sizeof(scm_t_uint32));
+  j = 0;
+  if (weird)
+    {
+      odata[j++] = '#';
+      odata[j++] = '{';
+    }
+  for (i=0; i < ilen; i++)
+    {
+      c = scm_i_symbol_ref_to_uint32 (sym, i);
+      if ((c2 = symbol_escape_char (c)) != NO_MATCH)
        {
-         /* Otherwise, print it in the form \xNNN;  */
-         odata[j++] = '\\';
-         odata[j++] = 'x';
-         if (ch >= 0x100000)
-           odata[j++] = hex [(ch & 0xF00000) >> 20];
-         if (ch >= 0x10000)
-           odata[j++] = hex [(ch & 0xF0000) >> 16];
-         if (ch >= 0x1000)
-           odata[j++] = hex [(ch & 0xF000) >> 12];
-         if (ch >= 0x100)
-           odata[j++] = hex [(ch & 0xF00) >> 8];
-         if (ch >= 0x10)
-           odata[j++] = hex [(ch & 0xF0) >> 4];
-         odata[j++] = hex [ch & 0xF];
-         odata[j++] = ';';
+         if (weird)
+           {
+             odata[j++] = '\\';
+             odata[j++] = c2;
+           }
+         else
+           odata[j++] = c;
        }
+      else
+       odata[j++] = c;
     }
-  odata[j++] = '"';
-  odata[j] = '\0';
-  scm_remember_upto_here_1 (str);
-  
-  return odata;
+  if (weird)
+    {
+      odata[j++] = '}';
+      odata[j++] = '#';
+    }
+  odata[j++] = 0;
+  if (j != olen)
+    scm_misc_error ("symbol-name->locale-string","internal string-size error 
~s vs ~s", 
+                   scm_list_2 (scm_from_size_t (j), 
+                               scm_from_size_t (olen)));
+
+
+  ldata = NULL;
+  llen = 0;
+  ret = u32_conv_to_encoding (locale_charset (), iconveh_question_mark,
+                             odata, olen, NULL, &ldata, &llen);
+  if (ret != 0)
+    scm_misc_error (NULL, "conversion error", scm_list_1 (sym));
+
+  free (odata);
+  scm_remember_upto_here_1 (sym);
+
+  return ldata;
 }
 
+
 /* Create a null-terminated utf-8 C string.  */
 scm_t_uint8 *
 scm_i_to_utf8_string (SCM str)
@@ -1564,6 +1833,36 @@ scm_i_c_is_native_8bit_codeset (const char *charset)
 }
 
 SCM
+scm_from_latin1_stringn (const char *str, size_t len)
+{
+  SCM res;
+  char *dst;
+
+  if (len == (size_t)-1)
+    len = strlen (str);
+  res = scm_i_make_string (len, &dst);
+  memcpy (dst, str, len);
+  return res;
+}
+
+SCM
+scm_from_utf32_stringn (const scm_t_uint32 *str, size_t len)
+{
+  SCM res;
+  scm_t_uint32 *dst;
+
+#if 0
+  if (len == (size_t)-1)
+    len = u32_strlen (str);
+#endif
+  res = scm_i_make_wide_string (len, &dst);
+  memcpy (dst, str, len);
+  return res;
+}
+
+
+
+SCM
 scm_from_locale_stringn (const char *str, size_t len)
 {
   SCM res;
@@ -1598,7 +1897,7 @@ scm_from_locale_stringn (const char *str, size_t len)
        {
          SCM errstr = scm_i_make_string (len, &dst);
          memcpy (dst, str, len);
-         scm_misc_error (NULL, "Invalid sequence in locale string: ~S in ~S",
+         scm_misc_error (NULL, "invalid sequence in locale string: ~S in ~S",
                          scm_list_2 (errstr, scm_from_locale_string 
(locale_charset())));
        }
       else
@@ -1674,7 +1973,7 @@ scm_take_locale_stringn (char *str, size_t len)
     }
   else
     {
-      buf = scm_from_locale_stringn (str, len);
+      res = scm_from_locale_stringn (str, len);
       free (str);
     }
   
@@ -1701,6 +2000,7 @@ scm_to_locale_stringn (SCM str, size_t *lenp)
   len = 0;
   if (scm_i_is_narrow_string (str))
     {
+      
       ret = mem_iconveh (scm_i_string_chars (str), scm_i_string_length (str),
                         "ISO-8859-1", locale_charset(),
                         iconveh_question_mark, NULL,
diff --git a/libguile/strings.h b/libguile/strings.h
index 6c26a96..5b7142d 100644
--- a/libguile/strings.h
+++ b/libguile/strings.h
@@ -99,6 +99,8 @@ SCM_API SCM scm_c_substring_shared (SCM str, size_t start, 
size_t end);
 SCM_API SCM scm_c_substring_copy (SCM str, size_t start, size_t end);
 
 SCM_API int scm_is_string (SCM x);
+SCM_INTERNAL SCM scm_from_latin1_stringn (const char *str, size_t len);
+SCM_INTERNAL SCM scm_from_utf32_stringn (const scm_t_uint32 *str, size_t len);
 SCM_API SCM scm_from_locale_string (const char *str);
 SCM_API SCM scm_from_locale_stringn (const char *str, size_t len);
 SCM_API SCM scm_take_locale_string (char *str);
@@ -170,6 +172,10 @@ SCM_INTERNAL int scm_i_string_contains_char (SCM str, char 
ch);
 SCM_INTERNAL int scm_i_symbol_strcmp (SCM sym, const char *str);
 
 SCM_INTERNAL char *scm_i_to_write_string (SCM str);
+SCM_INTERNAL char *scm_i_symbol_name_to_write_string (SCM sym, 
+                                                     int quote_key_flag);
+SCM_INTERNAL char *scm_i_symbol_name_to_locale_string (SCM sym,
+                                                      int quote_key_flag);
 SCM_INTERNAL scm_t_uint8 *scm_i_to_utf8_string (SCM str);
 SCM_INTERNAL SCM scm_i_from_utf8_string (const scm_t_uint8 *str);
 
diff --git a/libguile/symbols.c b/libguile/symbols.c
index 8940118..fe6bc84 100644
--- a/libguile/symbols.c
+++ b/libguile/symbols.c
@@ -364,6 +364,21 @@ SCM_DEFINE (scm_symbol_pset_x, "symbol-pset!", 2, 0, 0,
 }
 #undef FUNC_NAME
 
+int
+scm_i_is_keywordish_symbol (SCM str)
+{
+  size_t len = scm_i_symbol_length (str);
+
+  if (len == 0)
+    return 0;
+
+  if (!scm_i_symbol_ref_eq_char (str, 0, ':')
+      && !scm_i_symbol_ref_eq_char (str, len - 1, ':'))
+    return 0;
+
+  return 1;
+}
+
 SCM
 scm_from_locale_symbol (const char *sym)
 {
@@ -371,6 +386,13 @@ scm_from_locale_symbol (const char *sym)
 }
 
 SCM
+scm_from_latin1_symboln (const char *sym, size_t len)
+{
+  SCM str = scm_from_latin1_stringn (sym, len);
+  return scm_i_mem2symbol (str);
+}
+
+SCM
 scm_from_locale_symboln (const char *sym, size_t len)
 {
   SCM str = scm_from_locale_stringn (sym, len);
diff --git a/libguile/symbols.h b/libguile/symbols.h
index c2dc183..50ff3f5 100644
--- a/libguile/symbols.h
+++ b/libguile/symbols.h
@@ -54,6 +54,8 @@ SCM_API SCM scm_symbol_pset_x (SCM s, SCM val);
 SCM_API SCM scm_symbol_hash (SCM s);
 SCM_API SCM scm_gensym (SCM prefix);
 
+SCM_INTERNAL int scm_i_is_keywordish_symbol (SCM str);
+SCM_INTERNAL SCM scm_from_latin1_symboln (const char *str, size_t len);
 SCM_API SCM scm_from_locale_symbol (const char *str);
 SCM_API SCM scm_from_locale_symboln (const char *str, size_t len);
 SCM_API SCM scm_take_locale_symbol (char *sym);
diff --git a/libguile/unif.c b/libguile/unif.c
index 29ca857..c9aba59 100644
--- a/libguile/unif.c
+++ b/libguile/unif.c
@@ -2254,7 +2254,7 @@ scm_istr2bve (SCM str)
   scm_t_uint32 *data;
 
   data = scm_bitvector_writable_elements (vec, &handle, NULL, NULL, NULL);
-
+  i = 0;
   for (k = 0; k < (len + 31) / 32; k++)
     {
       data[k] = 0L;
diff --git a/libguile/vm-i-loader.c b/libguile/vm-i-loader.c
index 515001d..eeae82f 100644
--- a/libguile/vm-i-loader.c
+++ b/libguile/vm-i-loader.c
@@ -72,7 +72,8 @@ VM_DEFINE_LOADER (62, load_string, "load-string")
   size_t len;
   FETCH_LENGTH (len);
   SYNC_REGISTER ();
-  PUSH (scm_from_locale_stringn ((char *)ip, len));
+  /* PUSH (scm_from_locale_stringn ((char *)ip, len)); */
+  PUSH (scm_from_latin1_stringn ((char *)ip, len));
   /* Was: scm_makfromstr (ip, len, 0) */
   ip += len;
   NEXT;
@@ -83,7 +84,8 @@ VM_DEFINE_LOADER (63, load_symbol, "load-symbol")
   size_t len;
   FETCH_LENGTH (len);
   SYNC_REGISTER ();
-  PUSH (scm_from_locale_symboln ((char *)ip, len));
+  /* PUSH (scm_from_locale_symboln ((char *)ip, len));  */
+  PUSH (scm_from_latin1_symboln ((char *)ip, len));
   ip += len;
   NEXT;
 }
diff --git a/libguile/vm.c b/libguile/vm.c
index 38d085c..55b8010 100644
--- a/libguile/vm.c
+++ b/libguile/vm.c
@@ -52,6 +52,7 @@
 #include "objcodes.h"
 #include "programs.h"
 #include "lang.h" /* NULL_OR_NIL_P */
+#include "symbols.h"
 #include "vm.h"
 
 /* I sometimes use this for debugging. */
diff --git a/module/srfi/srfi-19.scm b/module/srfi/srfi-19.scm
index 29c604f..235e2b8 100644
--- a/module/srfi/srfi-19.scm
+++ b/module/srfi/srfi-19.scm
@@ -1077,7 +1077,7 @@
                                           pad-with 2)
                             port))))
    (cons #\t (lambda (date pad-with port)
-               (display #\Tab port)))
+               (display #\tab port)))
    (cons #\T (lambda (date pad-with port)
                (display (date->string date "~H:~M:~S") port)))
    (cons #\U (lambda (date pad-with port)
diff --git a/test-suite/tests/ports.test b/test-suite/tests/ports.test
index f1ba80b..b2f8502 100644
--- a/test-suite/tests/ports.test
+++ b/test-suite/tests/ports.test
@@ -441,14 +441,14 @@
        (display "x\a" port)
        (= 1 (port-column port))))
 
-    (pass-if "\\x08 backspace"
+    (pass-if "\\x08; backspace"
       (let ((port (open-output-string)))
-       (display "\x08" port)
+       (display "\x08;" port)
        (= 0 (port-column port))))
 
-    (pass-if "x\\x08 backspace"
+    (pass-if "x\\x08; backspace"
       (let ((port (open-output-string)))
-       (display "x\x08" port)
+       (display "x\x08;" port)
        (= 0 (port-column port))))
 
     (pass-if "\\n"
@@ -498,13 +498,13 @@
        (while (not (eof-object? (read-char port))))
        (= 1 (port-column port))))
 
-    (pass-if "\\x08 backspace"
-      (let ((port (open-input-string "\x08")))
+    (pass-if "\\x08; backspace"
+      (let ((port (open-input-string "\x08;")))
        (while (not (eof-object? (read-char port))))
        (= 0 (port-column port))))
 
-    (pass-if "x\\x08 backspace"
-      (let ((port (open-input-string "x\x08")))
+    (pass-if "x\\x08; backspace"
+      (let ((port (open-input-string "x\x08;")))
        (while (not (eof-object? (read-char port))))
        (= 0 (port-column port))))
 
diff --git a/test-suite/tests/reader.test b/test-suite/tests/reader.test
index b068c71..91f09dd 100644
--- a/test-suite/tests/reader.test
+++ b/test-suite/tests/reader.test
@@ -34,7 +34,7 @@
 (define exception:eof-in-string
   (cons 'read-error "end of file in string constant$"))
 (define exception:illegal-escape
-  (cons 'read-error "illegal character in escape sequence: .*$"))
+  (cons 'read-error "invalid character in escape sequence: .*$"))
 
 
 (define (read-string s)
@@ -88,7 +88,7 @@
 
   (pass-if "CR recognized as a token delimiter"
     ;; In 1.8.3, character 0x0d was not recognized as a delimiter.
-    (equal? (read-string "one\x0dtwo") 'one))
+    (equal? (read-string "one\x0d;two") 'one))
 
   (pass-if "returned strings are mutable"
     ;; Per R5RS Section 3.4, "Storage Model", `read' is supposed to return
diff --git a/test-suite/tests/regexp.test b/test-suite/tests/regexp.test
index 15f77a3..873b787 100644
--- a/test-suite/tests/regexp.test
+++ b/test-suite/tests/regexp.test
@@ -22,6 +22,8 @@
   #:use-module (test-suite lib)
   #:use-module (ice-9 regex))
 
+(setlocale LC_ALL "en_US.iso88591")
+
 
 ;;; Run a regexp-substitute or regexp-substitute/global test, once
 ;;; providing a real port and once providing #f, requesting direct
diff --git a/test-suite/tests/symbols.test b/test-suite/tests/symbols.test
index be4e2da..8598761 100644
--- a/test-suite/tests/symbols.test
+++ b/test-suite/tests/symbols.test
@@ -94,7 +94,11 @@
 ;;; lexical syntax of identifiers
 ;;;
 
-(define (legit x)
+(define (legit-r6rs x)
+  (expect-fail-exception (string-append x " is invalid") exception:read-error
+    (with-input-from-string x read)))                   
+
+(define (legit-guile-extension x)
   (expect-fail-exception (string-append x " is invalid") exception:read-error
     (with-input-from-string x read)))                   
 
@@ -102,23 +106,34 @@
   (pass-if-exception (string-append x " is invalid")  exception:read-error
     (with-input-from-string x read)))                   
               
-(define (should-be-invalid x)
-  (pass-if (string-append x " is invalid")
-    (throw 'unresolved)))
 
 (with-test-prefix "symbol syntax"
 
  (with-test-prefix "1-character ASCII identifiers"
-   (map legit (list "'!" "'$" "'%" "'&" "'*" "'/" "':" "'<" "'=" "'>" "'?" 
-               "'^" "'_" "'~"))
-   (map should-be-invalid (list "'+" "'-" "'." "'0" "'@" "'[" "']" "'{" 
-                               "'|" "'}"))
+   (map legit-r6rs 
+       (append
+        ;; special initial
+        (list "'!" "'$" "'%" "'&" "'*" "'/" "':" "'<" "'=" "'>" "'?" 
+              "'^" "'_" "'~")
+        ;; "peculiar" identifiers
+        (list "'+" "'-" "'.")))
+ 
+   (map legit-guile-extension (list "'0" "'@" "'[" "']" "'{" "'|" "'}"))
+
    (map invalid (list "'\"" "'#" "'\\")))
 
  (with-test-prefix "2-character ASCII identifiers"
-   (map legit (list "'A!" "'A$" "'A%" "'A&" "'A*" "'A+" "'A-" "'A." "'A/"
-                   "'A:" "'A<" "'A=" "'A")) 
-   (map should-be-invalid (list "'a\"" "'a#" "'a[" "'a]" "'a{" "'a|" "'a}"))
+
+   (map legit-r6rs 
+       (append
+        ;; special initial
+        (list "'A!" "'A$" "'A%" "'A&" "'A*" "'A/" "'A:" "'A<" "'A=" 
+              "'A>" "'A?" "'A^" "'A_" "'A~")
+        ;; special subsequent
+        (list "'A+" "'A-" "'A." "'@")))
+
+   (map legit-guile-extension (list "'a\"" "'a#" "'a[" "'a]" "'a{" "'a|" 
"'a}"))
+
    (map invalid (list "'a\\")))
 
  (with-test-prefix "inline hex escapes"


hooks/post-receive
-- 
GNU Guile




reply via email to

[Prev in Thread] Current Thread [Next in Thread]