guile-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Guile-commits] GNU Guile branch, string_abstraction2, updated. fc50695e


From: Michael Gran
Subject: [Guile-commits] GNU Guile branch, string_abstraction2, updated. fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2
Date: Fri, 05 Jun 2009 13:40:40 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU Guile".

http://git.savannah.gnu.org/cgit/guile.git/commit/?id=fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2

The branch, string_abstraction2 has been updated
       via  fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2 (commit)
       via  c1988a1fd75b84947e35f402a24c0b96fd94fd10 (commit)
       via  5eff5b4a8fc5317301484f114f55cc0af37e30ee (commit)
      from  9d52ea2fa4073582113f4235dededb002682e459 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit fc50695e8d6a5cc0cebc3a8fcd0833ec1ff316a2
Author: Michael Gran <address@hidden>
Date:   Fri Jun 5 06:39:25 2009 -0700

    ports have their own character encoding
    
    Make character encoding part of the port structure.  Also repurpose the
    fluid that contained the global character encoding to now be the default
    character encoding for new ports.  Scan for character encodings as
    magic comments when calling primitive-load for a file.
    
        * strings.h: update declarations for
        scm_set_conversion_error_behavior_x, scm_i_get_port_encoding,
        scm_i_set_port_encoding.  New declaration scm_to_encoded_stringn
    
        * strings.c (scm_i_get_port_encoding): new parameter 'port'
        (scm_i_set_port_encoding): new parameter 'port'
        (scm_set_conversion_error_behavior_x): new parameter 'port'
        (scm_from_locale_stringn): store encoding: don't recompute it
        (scm_from_locale_string): return null string on NULL
        (scm_to_encoded_stringn): new func
        (scm_to_locale_stringn): call scm_to_encoding_stringn
    
        * read.h: new func declaration for scm_scan_for_encoding
    
        * read.c (read_complete_token): remove parameter 'encoding'
        (scm_read_string): remove encoding parameter.  Use scm_getc not
        scm_getc_with_encoding.  Remove dynwind.
        (SCM_ENCODING_SEARCH_SIZE): new
        (scm_scan_for_encoding): new func
    
        * posix.h: new declarations for scm_setencoding and scm_setbinary
    
        * posix.c (scm_setencoding): new optional port parameter. Set
        encoding of port.
        (scm_setbinary): new optional port parameter.  Set encoding of ports.
    
        * ports.h (struct scm_t_port): new members encoding and ilseq_handler
    
        * ports.c (scm_new_port_table_entry): initialize new port struct
        members encoding and ilseq_handler
        (scm_getc): use port's encoding.
        (scm_lfwrite_substr): use port's encoding
    
        * load.c (scm_primitive_load): scan for encoding

commit c1988a1fd75b84947e35f402a24c0b96fd94fd10
Author: Michael Gran <address@hidden>
Date:   Fri Jun 5 06:33:37 2009 -0700

    fixes in scm_getc
    
    Reduce the maximum number of bytes in a char to 4. Stop multi-byte
    processing if a newline is encountered.  Simplify error message generation.
    
        * ports.c (SCM_MBCHAR_BUF_SIZE): max bytes per char: now 4
        (scm_getc): Terminate a multi-byte character if it contains
        a newline. Simplify error message generation.

commit 5eff5b4a8fc5317301484f114f55cc0af37e30ee
Author: Michael Gran <address@hidden>
Date:   Fri Jun 5 06:28:05 2009 -0700

    allow writing of escaped wide string characters
    
        * print.c (iprin1): add string escapes for L1 and wide strings

-----------------------------------------------------------------------

Summary of changes:
 libguile/load.c                         |    8 +-
 libguile/ports.c                        |   81 +++++-------
 libguile/ports.h                        |    6 +
 libguile/posix.c                        |   93 ++++++++++----
 libguile/posix.h                        |    4 +-
 libguile/print.c                        |   38 +++++-
 libguile/read.c                         |  113 ++++++++++++++--
 libguile/read.h                         |    1 +
 libguile/strings.c                      |  218 ++++++++++++++++++-------------
 libguile/strings.h                      |    9 +-
 test-suite/tests/encoding-escapes.test  |    6 +-
 test-suite/tests/encoding-iso88591.test |    4 +-
 test-suite/tests/encoding-iso88597.test |    2 +-
 test-suite/tests/i18n.test              |    3 +-
 test-suite/tests/ports.test             |    1 +
 15 files changed, 386 insertions(+), 201 deletions(-)

diff --git a/libguile/load.c b/libguile/load.c
index 1b5b24f..3deae37 100644
--- a/libguile/load.c
+++ b/libguile/load.c
@@ -80,6 +80,7 @@ SCM_DEFINE (scm_primitive_load, "primitive-load", 1, 0, 0,
 #define FUNC_NAME s_scm_primitive_load
 {
   SCM hook = *scm_loc_load_hook;
+  char *encoding;
   SCM_VALIDATE_STRING (1, filename);
   if (scm_is_true (hook) && scm_is_false (scm_procedure_p (hook)))
     SCM_MISC_ERROR ("value of %load-hook is neither a procedure nor #f",
@@ -92,7 +93,12 @@ SCM_DEFINE (scm_primitive_load, "primitive-load", 1, 0, 0,
     SCM port = scm_open_file (filename, scm_from_locale_string ("r"));
     scm_dynwind_begin (SCM_F_DYNWIND_REWINDABLE);
     scm_i_dynwind_current_load_port (port);
-
+    encoding = scm_scan_for_encoding (port);
+    if (encoding)
+      {
+       scm_i_set_port_encoding (port, encoding);
+       free (encoding);
+      }
     while (1)
       {
        SCM reader, form;
diff --git a/libguile/ports.c b/libguile/ports.c
index 0b9ecd0..c1fedbf 100644
--- a/libguile/ports.c
+++ b/libguile/ports.c
@@ -511,10 +511,16 @@ scm_new_port_table_entry (scm_t_bits tag)
   
   SCM z = scm_cons (SCM_EOL, SCM_EOL);
   scm_t_port *entry = (scm_t_port *) scm_gc_calloc (sizeof (scm_t_port), 
"port");
+  const char *enc;
 
   entry->file_name = SCM_BOOL_F;
   entry->rw_active = SCM_PORT_NEITHER;
   entry->port = z;
+  if ((enc = scm_i_get_port_encoding (SCM_BOOL_F)) == NULL)
+    entry->encoding = NULL;
+  else
+    entry->encoding = strdup (enc);
+  entry->ilseq_handler = scm_i_get_conversion_strategy ();
 
   SCM_SET_CELL_TYPE (z, tag);
   SCM_SETPTAB_ENTRY (z, entry);
@@ -942,7 +948,7 @@ SCM_DEFINE (scm_read_char, "read-char", 0, 1, 0,
 }
 #undef FUNC_NAME
 
-#define SCM_MBCHAR_BUF_SIZE (5)
+#define SCM_MBCHAR_BUF_SIZE (4)
 
 scm_t_wchar
 scm_getc (SCM port)
@@ -953,6 +959,8 @@ scm_getc (SCM port)
   scm_t_wchar codepoint = 0;
   scm_t_uint32 *u32;
   size_t u32len;
+  scm_t_port *pt = SCM_PTAB_ENTRY (port);
+
 
 
   c = scm_raw_getc (port);
@@ -963,7 +971,7 @@ scm_getc (SCM port)
   bufcount++;
 
   /* Handle most ASCII characters quickly.  */
-  if ((scm_i_get_port_encoding () == NULL) 
+  if ((pt->encoding == NULL) 
       || ('0' <= c && c <= '9') || ('a' <= c && c <= 'z') || ('A' <= c && c <= 
'Z'))
     {
       /* In all (?) encodings except UTF-16 and UTF-32, alnum characters
@@ -974,7 +982,7 @@ scm_getc (SCM port)
 
   for (;;)
     {
-      u32 = u32_conv_from_encoding (SCM_PORT_ENCODING, 
scm_i_get_conversion_strategy (), 
+      u32 = u32_conv_from_encoding (pt->encoding, pt->ilseq_handler, 
                                    buf, bufcount, NULL, NULL, &u32len);
       if (u32 == NULL || u32len == 0)
        {
@@ -1013,9 +1021,14 @@ scm_getc (SCM port)
              scm_ungetc (buf[bufcount-1], port);
              bufcount --;
            }
-         return EOF;
        }
 
+      if (c == '\n')
+       {
+         scm_ungetc ('\n', port);
+         goto failure;
+       }
+       
       buf[bufcount++] = c;
     }
 
@@ -1044,47 +1057,20 @@ scm_getc (SCM port)
   return codepoint;
 
  failure:
-  if (errno == EILSEQ)
-    {
-      int z, pos = 0;
-      char encoded_buf[4*SCM_MBCHAR_BUF_SIZE+1];
-
-      /* This character sequence can't be converted, and extra
-        characters won't help.  Try hard to make a string for the
-        error message. */
-
-      if (scm_i_get_conversion_strategy ()!= iconveh_escape_sequence)
-       {
-         u32 = u32_conv_from_encoding (SCM_PORT_ENCODING, 
iconveh_escape_sequence, 
-                                       buf, bufcount, NULL, NULL, &u32len);
-         if (u32)
-           {
-             SCM encoded_str;
-             scm_t_wchar *wbuf;
-             encoded_str = scm_i_make_wide_string (u32len, &wbuf);
-             u32_cpy ((scm_t_uint32 *) wbuf, (scm_t_uint32 *) u32, u32len);
-             free (u32);
-             
-             scm_misc_error (NULL, "input encoding error for ~s: ~s",
-                             scm_list_2 (scm_from_locale_string 
(SCM_PORT_ENCODING),
-                                         encoded_str));
-           }
-       }
-
-      for (z = 0; z < bufcount; z++)
-       {
-         if (buf[z]>=32 && buf[z]<=127)
-           pos += sprintf (encoded_buf+pos, "%c", buf[z]);
-         else
-           pos += sprintf (encoded_buf+pos, "\\x%02x", (unsigned char) buf[z]);
-       }
-      
-      scm_misc_error (NULL, "Input encoding error for ~s: ~s",
-                     scm_list_2 (scm_from_locale_string (SCM_PORT_ENCODING),
-                                 scm_from_locale_string (encoded_buf)));
-    }
-  else
-    scm_misc_error (NULL, "Input encoding error (invalid)\n", SCM_EOL);
+  {
+    char *err_buf;
+    SCM err_str = scm_i_make_string (bufcount, &err_buf);
+    memcpy (err_buf, buf, bufcount);
+
+    if (errno == EILSEQ)
+      scm_misc_error (NULL, "input encoding error for ~s: ~s",
+                     scm_list_2 (scm_from_locale_string 
(scm_i_get_port_encoding (port)),
+                                 err_str));
+    else
+      scm_misc_error (NULL, "Input encoding error (invalid) for ~s: ~s\n", 
+                     scm_list_2 (scm_from_locale_string 
(scm_i_get_port_encoding (port)),
+                                 err_str));
+  }
 
   /* Never gets here.  */
   return 0;
@@ -1178,7 +1164,8 @@ scm_lfwrite_substr (SCM str, size_t start, size_t end, 
SCM port)
     end = size;
   size = end - start;
 
-  buf = scm_to_locale_stringn (scm_c_substring (str, start, end), &len);
+  buf = scm_to_encoded_stringn (scm_c_substring (str, start, end), &len, 
+                               pt->encoding, pt->ilseq_handler);
   ptob->write (port, buf, len);
   free (buf);
 
@@ -1396,7 +1383,7 @@ scm_ungetc (scm_t_wchar c, SCM port)
   int i;
 
   wbuf[0] = c;
-  buf = scm_to_locale_stringn (str, &len);
+  buf = scm_to_encoded_stringn (str, &len, pt->encoding, pt->ilseq_handler);
     
   for (i = len - 1; i >= 0; i--)
     {
diff --git a/libguile/ports.h b/libguile/ports.h
index 322757d..20840c6 100644
--- a/libguile/ports.h
+++ b/libguile/ports.h
@@ -31,6 +31,7 @@
 
 /* Not sure if this is a good idea.  We need it for off_t.  */
 #include <sys/types.h>
+#include <uniconv.h>
 
 
 
@@ -58,6 +59,11 @@ typedef struct
   long line_number;            /* debugging support.  */
   int column_number;           /* debugging support.  */
 
+  /* Character encoding support  */
+
+  char *encoding;
+  enum iconv_ilseq_handler ilseq_handler;
+
   /* port buffers.  the buffer(s) are set up for all ports.  
      in the case of string ports, the buffer is the string itself.
      in the case of unbuffered file ports, the buffer is a
diff --git a/libguile/posix.c b/libguile/posix.c
index 30a224b..da76d54 100644
--- a/libguile/posix.c
+++ b/libguile/posix.c
@@ -1665,7 +1665,7 @@ SCM_DEFINE (scm_setlocale, "setlocale", 1, 1, 0,
       SCM_SYSERROR;
     }
 
-  scm_i_set_port_encoding (locale_charset ());
+  scm_i_set_port_encoding (SCM_BOOL_F, locale_charset ());
   /* Recompute the standard SRFI-14 character sets in a locale-dependent
      (actually charset-dependent) way.  */
   scm_srfi_14_compute_char_sets ();
@@ -1676,58 +1676,105 @@ SCM_DEFINE (scm_setlocale, "setlocale", 1, 1, 0,
 #undef FUNC_NAME
 #endif /* HAVE_SETLOCALE */
 
-SCM_DEFINE (scm_setencoding, "setencoding", 1, 0, 0,
-           (SCM enc),
+SCM_DEFINE (scm_setencoding, "setencoding", 1, 1, 0,
+           (SCM enc, SCM port),
            "Sets the character encoding that will be used to interpret all\n"
-           "port I/O.  Normally, one would set this using @code{setlocale},\n"
+           "port I/O.  Normally, a new port would inherit the encoding\n"
+           "set by using @code{setlocale},\n"
            "so that the character encoding is in sync with the environment.\n"
            "This command can be used to set it directly.\n"
            "\n"
-           "Note that since the output to the current output and error ports\n"
-           "is also interpreted using this locale, non-ASCII strings may not\n"
-           "be correctly displayed by the terminal device that receives 
current\n"
-           "output and current error.\n"
+           "If the optional parameter @var{port} is included, the\n"
+           "encoding will be used for that specific port.\n"
            "\n"
-           "Subsequent calls to @code{setlocale} will report the encoding to\n"
-           "be that proscribed by the locale.")
+           "If @var{port} is not included, this encoding will be used\n"
+           "for all new ports opened in this thread.\n")
 #define FUNC_NAME s_scm_setencoding
 {
   const char str[] = " ";
   scm_t_uint32 *u32;
   size_t u32len;
-  char *new_enc = scm_to_locale_string (enc);
+  char *new_enc;
+  SCM old_enc;
+  scm_t_port *pt;
 
+  SCM_VALIDATE_STRING (1, enc);
+
+  new_enc = scm_to_locale_string (enc);
   u32 = u32_conv_from_encoding (new_enc, iconveh_error, str, 1,
                                NULL, NULL, &u32len);
   if (u32 == NULL)
     scm_misc_error (FUNC_NAME, "invalid or unknown character encoding ~s",
                    scm_list_1 (enc));
   free (u32);
-  scm_i_set_port_encoding (new_enc);
 
-  return SCM_UNSPECIFIED;
+  if (SCM_UNBNDP (port))
+    {
+      if (scm_i_get_port_encoding (SCM_BOOL_F))
+       old_enc = scm_from_locale_string (scm_i_get_port_encoding (SCM_BOOL_F));
+      else 
+       old_enc = SCM_BOOL_F;
+      scm_i_set_port_encoding (SCM_BOOL_F, new_enc);
+    }
+  else
+    {
+      SCM_VALIDATE_OPPORT (2, enc);
+      pt = SCM_PTAB_ENTRY (port);
+
+      if (pt->encoding)
+       old_enc = scm_from_locale_string (pt->encoding);
+      else
+       old_enc = SCM_BOOL_F;
+
+      scm_i_set_port_encoding (port, new_enc);
+    }
+
+  free (new_enc);
+  return old_enc;
 }
 #undef FUNC_NAME
 
-SCM_DEFINE (scm_setbinary, "setbinary", 0, 0, 0,
-           (void),
-           "Sets the behavior of all port I/O to treat data as 8-bit\n"
+SCM_DEFINE (scm_setbinary, "setbinary", 0, 1, 0,
+           (SCM port),
+           "If the optional @var{port} parameter is passed to this procedure\n"
+           "the port will treat data as 8-bit\n"
            "clean bytes with no character encoding.\n" 
            "This is the default behavior at startup.\n"
-           "Calling @code{setlocale} will modify the character encoding.\n"
-           "Thus, the procedure @code{setbinary} can be used to\n"
-           "clear it.\n"
+           "\n"
+           "If no @var{port} is passed in, the encoding for future ports\n"
+           "created in this thread will treat data as 8-bit clean.\n"
            "\n"
            "In practice, @code{setbinary} sets the port character encoding\n"
            "to ISO-8859-1: an 8-bit character encoding that does not modify\n"
            "binary data when it is input or output\n"
            "\n"
-           "After @code{setbinary} has been called, subsequent calls to \n"
-           "@code{setlocale} will, again, modify the character encoding.\n")
+           "The old character encoding is returned.")
 #define FUNC_NAME s_scm_setbinary
 {
-  scm_i_set_port_encoding (NULL);
-  return SCM_UNSPECIFIED;
+  SCM old_enc;
+  scm_t_port *pt;
+
+  if (!SCM_UNBNDP (port))
+    {
+      SCM_VALIDATE_OPPORT (1, port);
+      pt = SCM_PTAB_ENTRY (port);
+
+      if (pt->encoding)
+       old_enc = scm_from_locale_string (pt->encoding);
+      else
+       old_enc = SCM_BOOL_F;
+      scm_i_set_port_encoding (port, NULL);
+
+    }
+  else
+    {
+      old_enc = scm_from_locale_string (scm_i_get_port_encoding (SCM_BOOL_F));
+      scm_i_set_port_encoding (SCM_BOOL_F, NULL);
+    }
+
+
+
+  return old_enc;
 }
 #undef FUNC_NAME
 
diff --git a/libguile/posix.h b/libguile/posix.h
index f87a49a..7ab4a6a 100644
--- a/libguile/posix.h
+++ b/libguile/posix.h
@@ -73,8 +73,8 @@ SCM_API SCM scm_access (SCM path, SCM how);
 SCM_API SCM scm_getpid (void);
 SCM_API SCM scm_putenv (SCM str);
 SCM_API SCM scm_setlocale (SCM category, SCM locale);
-SCM_API SCM scm_setencoding (SCM encoding);
-SCM_API SCM scm_setbinary (void);
+SCM_API SCM scm_setencoding (SCM encoding, SCM port);
+SCM_API SCM scm_setbinary (SCM port);
 SCM_API SCM scm_mknod (SCM path, SCM type, SCM perms, SCM dev);
 SCM_API SCM scm_nice (SCM incr);
 SCM_API SCM scm_sync (void);
diff --git a/libguile/print.c b/libguile/print.c
index 55ea440..3c9290f 100644
--- a/libguile/print.c
+++ b/libguile/print.c
@@ -555,6 +555,9 @@ iprin1 (SCM exp, SCM port, scm_print_state *pstate)
          if (SCM_WRITINGP (pstate))
            {
              size_t i, j, len;
+             static char const hex[]="0123456789abcdef";
+             char buf[8];
+
 
              scm_putc ('"', port);
              len = scm_i_string_length (exp);
@@ -562,24 +565,43 @@ iprin1 (SCM exp, SCM port, scm_print_state *pstate)
                {
                  scm_t_wchar ch = scm_i_string_ref_to_wchar (exp, i);
 
-                 if ((ch < 32 && ch != '\n') || (127 <= ch && ch < 148))
+                 if (ch == '"' || ch == '\\')
+                   {
+                     scm_lfwrite_substr (exp, j, i, port);
+                     scm_putc ('\\', port);
+                     j = i;
+                   }
+                 else if ((ch < 32 && ch != '\n') || (127 <= ch && ch < 256))
                    {
-                     static char const hex[]="0123456789abcdef";
-                     char buf[4];
-
                      scm_lfwrite_substr (exp, j, i, port);
                      buf[0] = '\\';
                      buf[1] = 'x';
-                     buf[2] =  hex [ch / 16];
+                     buf[2] = hex [ch / 16];
                      buf[3] = hex [ch % 16];
                      scm_lfwrite (buf, 4, port);
                      j = i+1;
                    }
-                 else if (ch == '"' || ch == '\\')
+                 else if (ch > 0xFF && ch <= 0xFFFF)
                    {
                      scm_lfwrite_substr (exp, j, i, port);
-                     scm_putc ('\\', port);
-                     j = i;
+                     buf[0] = '\\';
+                     buf[1] = 'u';
+                     buf[2] = hex [(ch & 0xF000) >> 12];
+                     buf[3] = hex [(ch & 0xF00) >> 8];
+                     buf[4] = hex [(ch & 0xF0) >> 4];
+                     buf[5] = hex [(ch & 0xF)];
+                   }
+                 else if (ch > 0xFFFF)
+                   {
+                     scm_lfwrite_substr (exp, j, i, port);
+                     buf[0] = '\\';
+                     buf[1] = 'U';
+                     buf[2] = hex [(ch & 0xF00000) >> 20];
+                     buf[3] = hex [(ch & 0xF0000) >> 16];
+                     buf[4] = hex [(ch & 0xF000) >> 12];
+                     buf[5] = hex [(ch & 0xF00) >> 8];
+                     buf[6] = hex [(ch & 0xF0) >> 4];
+                     buf[7] = hex [(ch & 0xF)];
                    }
                }
              scm_lfwrite_substr (exp, j, i, port);
diff --git a/libguile/read.c b/libguile/read.c
index bade5d5..babdb30 100644
--- a/libguile/read.c
+++ b/libguile/read.c
@@ -223,7 +223,7 @@ read_token (SCM port, SCM buf, size_t *read)
       if (CHAR_IS_DELIMITER (chr))
        {
          scm_dynwind_end ();
-         scm_ungetc_with_encoding (chr, port, encoding);
+         scm_ungetc (chr, port);
          return 0;
        }
 
@@ -236,21 +236,21 @@ read_token (SCM port, SCM buf, size_t *read)
 }
 
 static SCM
-read_complete_token (SCM port, size_t *read, const char *encoding)
+read_complete_token (SCM port, size_t *read)
 {
   SCM buffer, str = SCM_EOL;
   size_t len;
   int overflow;
 
   buffer = scm_i_make_string (READER_BUFFER_SIZE, NULL); 
-  overflow = read_token (port, buffer, read, encoding);
+  overflow = read_token (port, buffer, read);
   if (!overflow)
     return scm_i_substring (buffer, 0, *read);
 
   str = scm_string_copy (buffer);
   do
     {
-      overflow = read_token (port, buffer, &len, encoding);
+      overflow = read_token (port, buffer, &len);
       str = scm_string_append (scm_list_2 (str, buffer));
       *read += len;
     }
@@ -418,7 +418,7 @@ scm_read_sexp (scm_t_wchar chr, SCM port)
 #undef FUNC_NAME
 
 static SCM
-scm_read_string (int chr, SCM port, const char *encoding)
+scm_read_string (int chr, SCM port)
 #define FUNC_NAME "scm_lreadr"
 {
   /* For strings smaller than C_STR, this function creates only one Scheme
@@ -429,10 +429,6 @@ scm_read_string (int chr, SCM port, const char *encoding)
   scm_t_wchar c;
 
   str = scm_i_make_string (READER_STRING_BUFFER_SIZE, NULL);  
-  scm_dynwind_begin (0);
-  scm_dynwind_unwind_handler (stop_writing, NULL, SCM_F_WIND_EXPLICITLY);
-  str = scm_i_string_start_writing (str);
-
   while ('"' != (c = scm_getc (port)))
     {
       if (c == EOF)
@@ -452,7 +448,7 @@ scm_read_string (int chr, SCM port, const char *encoding)
 
       if (c == '\\')
        {
-         switch (c = scm_getc_with_encoding (port, encoding))
+         switch (c = scm_getc (port))
            {
            case EOF:
              goto str_eof;
@@ -492,9 +488,9 @@ scm_read_string (int chr, SCM port, const char *encoding)
            case 'x':
              {
                scm_t_wchar a, b;
-               a = scm_getc_with_encoding (port, encoding);
+               a = scm_getc (port);
                if (a == EOF) goto str_eof;
-               b = scm_getc_with_encoding (port, encoding);
+               b = scm_getc (port);
                if (b == EOF) goto str_eof;
                if      ('0' <= a && a <= '9') a -= '0';
                else if ('A' <= a && a <= 'F') a = a - 'A' + 10;
@@ -514,7 +510,7 @@ scm_read_string (int chr, SCM port, const char *encoding)
                c = 0;
                for (i = 0; i < 4; i ++)
                  {
-                   a = scm_getc_with_encoding (port, encoding);
+                   a = scm_getc (port);
                    if (a == EOF)
                      goto str_eof;
                    if      ('0' <= a && a <= '9') a -= '0';
@@ -532,7 +528,7 @@ scm_read_string (int chr, SCM port, const char *encoding)
                c = 0;
                for (i = 0; i < 6; i ++)
                  {
-                   a = scm_getc_with_encoding (port, encoding);
+                   a = scm_getc (port);
                    if (a == EOF)
                      goto str_eof;
                    if      ('0' <= a && a <= '9') a -= '0';
@@ -550,9 +546,10 @@ scm_read_string (int chr, SCM port, const char *encoding)
                                 scm_list_1 (SCM_MAKE_CHAR (c)));
            }
        }
+      str = scm_i_string_start_writing (str);
       scm_i_string_set_from_wchar (str, c_str_len++, c);
+      scm_i_string_stop_writing ();
     }
-  scm_dynwind_end ();
   if (c_str_len > 0)
     {
       return scm_i_substring_copy (str, 0, c_str_len);
@@ -1271,6 +1268,92 @@ scm_get_hash_procedure (int c)
     }
 }
 
+#define SCM_ENCODING_SEARCH_SIZE (500)
+
+char *
+scm_scan_for_encoding (SCM port)
+{
+  char header[SCM_ENCODING_SEARCH_SIZE+1];
+  size_t bytes_read;
+  char *encoding = NULL;
+  int utf8_bom = 0;
+  char *pos;
+  int i;
+  int in_comment;
+
+  bytes_read = scm_c_read (port, header, SCM_ENCODING_SEARCH_SIZE);  
+  scm_seek (port, scm_from_int (0), scm_from_int (SEEK_SET));
+
+  if (bytes_read > 3 
+      && header[0] == '\xef' && header[1] == '\xbb' && header[2] == '\xbf')
+    utf8_bom = 1;
+
+  /* search past "coding[:=]" */
+  if ((pos = strstr(header, "coding")) == NULL)
+    return NULL;
+
+  pos += strlen("coding");
+  if (pos - header >= SCM_ENCODING_SEARCH_SIZE || 
+      (*pos != ':' && *pos != '='))
+    return NULL;
+  pos ++;
+
+  /* skip spaces */
+  while (pos - header <= SCM_ENCODING_SEARCH_SIZE && 
+        (*pos == ' ' || *pos == '\t'))
+    pos ++;
+
+  /* grab the next token */
+  i = 0;
+  while (pos + i - header <= SCM_ENCODING_SEARCH_SIZE 
+        && (isalnum(pos[i]) || pos[i] == '_' || pos[i] == '-' || pos[i] == 
'.'))
+    i++;
+
+  if (i == 0)
+    return NULL;
+
+  encoding = scm_malloc (i+1);
+  memcpy (encoding, pos, i);
+  encoding[i] ='\0';
+  for (i = 0; i < strlen(encoding); i++)
+    encoding[i] = toupper(encoding[i]);
+
+  /* push backwards to make sure we were in a comment */
+  in_comment = 0;
+  while (pos - i - header > 0)
+    {
+      if (*(pos - i) == '\n')
+       {
+         /* This wasn't in a semicolon comment. Check for a
+          hash-bang comment. */
+         char *beg = strstr (header, "#!");
+         char *end = strstr (header, "!#");
+         if (beg < pos && pos < end)
+           in_comment = 1;
+         break;
+       }
+      if (*(pos - i) == ';')
+       {
+         in_comment = 1;
+         break;
+       }
+      i ++;
+    }
+  if (!in_comment)
+    {
+      /* This wasn't in a comment */
+      free (encoding);
+      return NULL;
+    }
+  if (utf8_bom && strcmp(encoding, "UTF-8"))
+    scm_misc_error (NULL, 
+                   "the port input declares the encoding ~s but is encoded as 
UTF-8",
+                   scm_list_1 (scm_from_locale_string (encoding)));
+      
+  return encoding;
+}
+
+
 void
 scm_init_read ()
 {
diff --git a/libguile/read.h b/libguile/read.h
index 4253622..f08872c 100644
--- a/libguile/read.h
+++ b/libguile/read.h
@@ -55,6 +55,7 @@ SCM_API SCM scm_read_options (SCM setting);
 SCM_API SCM scm_read (SCM port);
 SCM_API size_t scm_read_token (int ic, SCM * tok_buf, SCM port, int weird);
 SCM_API SCM scm_read_hash_extend (SCM chr, SCM proc);
+SCM_INTERNAL char *scm_scan_for_encoding (SCM port);
 
 SCM_INTERNAL void scm_i_input_error (const char *func, SCM port,
                                     const char *message, SCM arg)
diff --git a/libguile/strings.c b/libguile/strings.c
index 63d7e82..d46b855 100644
--- a/libguile/strings.c
+++ b/libguile/strings.c
@@ -268,28 +268,42 @@ SCM_GLOBAL_VARIABLE (scm_port_encoding, "%port-encoding");
 static int scm_port_encoding_init = 0;
 
 const char *
-scm_i_get_port_encoding ()
+scm_i_get_port_encoding (SCM port)
 {
   SCM encoding;
-  if (!scm_port_encoding_init)
-    return NULL;
-  else if (!scm_is_fluid (SCM_VARIABLE_REF (scm_port_encoding)))
-    return NULL;
-  else
+  
+  if (scm_is_false (port))
     {
-      encoding = scm_fluid_ref (SCM_VARIABLE_REF (scm_port_encoding));
-      if (!scm_is_string (encoding))
+      if (!scm_port_encoding_init)
        return NULL;
+      else if (!scm_is_fluid (SCM_VARIABLE_REF (scm_port_encoding)))
+       return NULL;
+      else
+       {
+         encoding = scm_fluid_ref (SCM_VARIABLE_REF (scm_port_encoding));
+         if (!scm_is_string (encoding))
+           return NULL;
+         else
+           return scm_i_string_chars (encoding);
+       }
+    }
+  else
+    {
+      scm_t_port *pt;
+      pt = SCM_PTAB_ENTRY (port);
+      if (pt->encoding)
+       return pt->encoding;
       else
-       return scm_i_string_chars (encoding);
+       return NULL;
     }
 }
 
 void
-scm_i_set_port_encoding (const char *enc)
+scm_i_set_port_encoding (SCM port, const char *enc)
 {
   char *buf;
   SCM encoding;
+  scm_t_port *pt;
 
   if (enc == NULL)
     encoding = SCM_BOOL_F;
@@ -298,11 +312,24 @@ scm_i_set_port_encoding (const char *enc)
       encoding = scm_i_make_string (strlen (enc), &buf);
       memcpy (buf, enc, strlen(enc));
     }
-  if (!scm_port_encoding_init
-      || !scm_is_fluid (SCM_VARIABLE_REF (scm_port_encoding)))
-    scm_misc_error (NULL, "tried to set port encoding fluid before it is 
initialized", 
-                   SCM_EOL);
-  scm_fluid_set_x (SCM_VARIABLE_REF (scm_port_encoding), encoding);
+
+  if (scm_is_false (port))
+    {
+      /* Set the default encoding for future ports.  */
+      if (!scm_port_encoding_init
+         || !scm_is_fluid (SCM_VARIABLE_REF (scm_port_encoding)))
+       scm_misc_error (NULL, "tried to set port encoding fluid before it is 
initialized", 
+                       SCM_EOL);
+      scm_fluid_set_x (SCM_VARIABLE_REF (scm_port_encoding), encoding);
+    }
+  else
+    {
+      /* Set the character encoding for this port.  */
+      pt = SCM_PTAB_ENTRY (port);
+      if (pt->encoding)
+       free (pt->encoding);
+      pt->encoding = strdup (enc);
+    }
 }
 
 /* This determines how conversions handle unconvertible characters.  */
@@ -1039,11 +1066,12 @@ SCM_DEFINE (scm_sys_stringbuf_hist, "%stringbuf-hist", 
0, 0, 0,
 
 
 
-SCM_DEFINE (scm_set_conversion_error_behavior_x, 
"set-conversion-error-behavior!",
-           1, 0, 0, 
-           (SCM sym),
+SCM_DEFINE (scm_set_conver
+sion_error_behavior_x, "set-conversion-error-behavior!",
+           2, 0, 0, 
+           (SCM port, SCM sym),
            "Sets the behavior of the interpreter when outputting a character\n"
-           "that is not representable in the current locale.\n"
+           "that is not representable in the port's current encoding.\n"
            "@var{sym} can be either @code{'error}, @code{'substitute}, or\n"
            "@code{'escape}.  If it is @code{'error}, an error will be thrown\n"
            "when an unconvertible character is encountered.  If it is\n"
@@ -1051,45 +1079,71 @@ SCM_DEFINE (scm_set_conversion_error_behavior_x, 
"set-conversion-error-behavior!
            "be replaced with approximate characters, or with question marks\n"
            "if no approximately correct character is available.\n"
            "If it is @code{'escape},\n"
-           "it will appear as a hex escape when output.\n")
+           "it will appear as a hex escape when output.\n"
+           "\n"
+           "If @var{port} is an open port, the conversion error behavior\n"
+           "is set for that port.  If it is @code{#f}, it is set as the\n"
+           "default behavior for any future ports that get created in\n"
+           "this thread.\n")
 #define FUNC_NAME s_scm_set_conversion_error_behavior_x
 {
   SCM err;
   SCM qm;
   SCM esc;
-  
+  int isport;
+  scm_t_port *pt;
+
+  if (!scm_is_false (port))
+    {
+      SCM_VALIDATE_OPPORT (1, port);
+      pt = SCM_PTAB_ENTRY (port);
+      isport = 1;
+    }
+  else
+    isport = 0;
+
   if (!scm_conversion_strategy_init)
     scm_misc_error (NULL, 
                    "tried to set a conversion strategy before string module 
initialization.", 
                    SCM_EOL); 
-  err = scm_from_locale_symbol ("error");
 
+  err = scm_from_locale_symbol ("error");
   if (scm_is_true (scm_eqv_p (sym, err)))
     {
-      scm_fluid_set_x (SCM_VARIABLE_REF (scm_conversion_strategy), 
-                      scm_from_int ((int) iconveh_error));
+      if (!isport)
+       scm_fluid_set_x (SCM_VARIABLE_REF (scm_conversion_strategy), 
+                        scm_from_int ((int) iconveh_error));
+      else
+       pt->ilseq_handler = iconveh_error;
+
+      return SCM_UNSPECIFIED;
     }
-  else 
+
+  qm = scm_from_locale_symbol ("substitute");
+  if (scm_is_true (scm_eqv_p (sym, qm)))
     {
-      qm = scm_from_locale_symbol ("substitute");
-      if (scm_is_true (scm_eqv_p (sym, qm)))
-       {
-         scm_fluid_set_x (SCM_VARIABLE_REF (scm_conversion_strategy), 
-                          scm_from_int ((int) iconveh_question_mark));
-       }
-      else 
-       {
-         esc = scm_from_locale_symbol ("escape");
-         if (scm_is_true (scm_eqv_p (sym, esc)))
-           {
-             scm_fluid_set_x (SCM_VARIABLE_REF (scm_conversion_strategy), 
-                              scm_from_int ((int) iconveh_escape_sequence));
-           }
-         else
-           SCM_MISC_ERROR ("unknown conversion behavior ~s", scm_list_1 (sym));
-       }
+      if (!isport)
+       scm_fluid_set_x (SCM_VARIABLE_REF (scm_conversion_strategy), 
+                        scm_from_int ((int) iconveh_question_mark));
+      else
+       pt->ilseq_handler = iconveh_question_mark;
+      
+      return SCM_UNSPECIFIED;
     }
 
+  esc = scm_from_locale_symbol ("escape");
+  if (scm_is_true (scm_eqv_p (sym, esc)))
+    {
+      if (!isport)
+       scm_fluid_set_x (SCM_VARIABLE_REF (scm_conversion_strategy), 
+                        scm_from_int ((int) iconveh_escape_sequence));
+      else
+       pt->ilseq_handler = iconveh_escape_sequence;
+      
+      return SCM_UNSPECIFIED;
+    }
+
+  SCM_MISC_ERROR ("unknown conversion behavior ~s", scm_list_1 (sym));
 
   return SCM_UNSPECIFIED;
 }
@@ -1443,6 +1497,7 @@ scm_from_locale_stringn (const char *str, size_t len)
   size_t u32len, i;
   int wide = 0;
   char *dst;
+  const char *enc;
 
   if (len == (size_t)-1)
     len = strlen (str);
@@ -1450,8 +1505,8 @@ scm_from_locale_stringn (const char *str, size_t len)
   if (len == 0)
     return scm_nullstr;
 
-  if (scm_i_get_port_encoding () == NULL
-      || strcmp (SCM_PORT_ENCODING, "ISO-8859-1") == 0)
+  enc = scm_i_get_port_encoding (SCM_BOOL_F);
+  if (enc == NULL || strcmp (enc, "ISO-8859-1") == 0)
     {
       res = scm_i_make_string (len, &dst);
       memcpy (dst, str, len);
@@ -1459,7 +1514,7 @@ scm_from_locale_stringn (const char *str, size_t len)
     }
 
   u32len = 0;
-  u32 = (scm_t_wchar *) u32_conv_from_encoding (SCM_PORT_ENCODING,
+  u32 = (scm_t_wchar *) u32_conv_from_encoding (enc,
                                                scm_i_get_conversion_strategy 
(),
                                                str, len,
                                                NULL,
@@ -1479,7 +1534,7 @@ scm_from_locale_stringn (const char *str, size_t len)
          escaped_str = scm_i_make_string (len, &dst);
          memcpy (dst, str, len);
          scm_misc_error (NULL, "input locale conversion error from ~s: ~s",
-                         scm_list_2 (scm_from_locale_string 
(SCM_PORT_ENCODING),
+                         scm_list_2 (scm_from_locale_string (enc),
                                      escaped_str));
        }
     }
@@ -1513,6 +1568,9 @@ scm_from_locale_stringn (const char *str, size_t len)
 SCM
 scm_from_locale_string (const char *str)
 {
+  if (str == NULL)
+    return scm_nullstr;
+
   return scm_from_locale_stringn (str, -1);
 }
 
@@ -1609,6 +1667,15 @@ unistring_escapes_to_guile_escapes (char **bufp, size_t 
*lenp)
 char *
 scm_to_locale_stringn (SCM str, size_t *lenp)
 {
+  return scm_to_encoded_stringn (str, lenp, 
+                                scm_i_get_port_encoding (SCM_BOOL_F),
+                                scm_i_get_conversion_strategy ());
+}
+
+char *
+scm_to_encoded_stringn (SCM str, size_t *lenp, const char *encoding, 
+                       enum iconv_ilseq_handler handler)
+{
   char *buf;
   size_t strlen, len, i;
   int ret;
@@ -1626,7 +1693,7 @@ scm_to_locale_stringn (SCM str, size_t *lenp)
                          "string contains #\\nul character: ~S",
                          scm_list_1 (str));
   
-  if (scm_i_is_narrow_string (str) && (scm_i_get_port_encoding() == NULL))
+  if (scm_i_is_narrow_string (str) && (encoding == NULL))
     {
       if (lenp)
        {
@@ -1645,76 +1712,39 @@ scm_to_locale_stringn (SCM str, size_t *lenp)
 
   buf = NULL;
   len = 0;
-  enc = scm_i_get_port_encoding ();
+  enc = encoding;
   if (enc == NULL)
     enc = "ISO-8859-1";
   if (scm_i_is_narrow_string (str))
     {
       ret = mem_iconveh (scm_i_string_chars (str), strlen,
                         "ISO-8859-1", enc,
-                        scm_i_get_conversion_strategy (), NULL,
+                        handler, NULL,
                         &buf, &len);
 
-      if (ret == 0 && scm_i_get_conversion_strategy () == 
iconveh_escape_sequence)
+      if (ret == 0 && handler == iconveh_escape_sequence)
        unistring_escapes_to_guile_escapes (&buf, &len);
 
       if (ret != 0)
        {
-         /* If there is a conversion error, try to generate an escaped
-            version of the string for the error message.  */
-         SCM escape_str;
-         if (scm_i_get_conversion_strategy () != iconveh_escape_sequence)
-           ret = mem_iconveh (scm_i_string_chars (str), 
-                              strlen,
-                              "ISO-8859-1", enc,
-                              iconveh_escape_sequence, NULL,
-                              &buf, &len);
-         if (ret == 0)
-           {
-             unistring_escapes_to_guile_escapes (&buf, &len);
-             escape_str = scm_from_locale_stringn (buf, len);
-             free (buf);
-             scm_misc_error (NULL, "cannot convert to output locale ~s: 
\"~a\"", 
-                             scm_list_2 (scm_from_locale_string (enc),
-                                         escape_str));
-           }
-         else
-           scm_misc_error (NULL, "cannot convert to output locale ~s", 
-                           scm_list_1 (scm_from_locale_string (enc)));
+         scm_misc_error (NULL, "cannot convert to output locale ~s: \"~s\"", 
+                         scm_list_2 (scm_from_locale_string (enc),
+                                     str));
        }
     }
   else
     {
       buf = u32_conv_to_encoding (enc, 
-                                 scm_i_get_conversion_strategy (),
+                                 handler,
                                  (scm_t_uint32 *) scm_i_string_wide_chars 
(str), 
                                  strlen,
                                  NULL,
                                  NULL, &len);
       if (buf == NULL)
        {
-         /* If there is a conversion error, try to generate an escaped
-            version of the string for the error message.  */
-         SCM escape_str;
-         if (scm_i_get_conversion_strategy () != iconveh_escape_sequence)
-           buf = u32_conv_to_encoding (enc, 
-                                       iconveh_escape_sequence,
-                                       (scm_t_uint32 *) 
scm_i_string_wide_chars (str), 
-                                       strlen,
-                                       NULL,
-                                       NULL, &len);
-         if (buf != NULL)
-           {
-             /* scm_unistring_escapes_to_guile_escapes (&buf, &len) */
-             escape_str = scm_from_locale_stringn (buf, len);
-             free (buf);
-             scm_misc_error (NULL, "cannot convert to output locale ~s: 
\"~a\"", 
-                             scm_list_2 (scm_from_locale_string (enc),
-                                         escape_str));
-           }
-         else
-           scm_misc_error (NULL, "cannot convert to output locale ~s", 
-                           scm_list_1 (scm_from_locale_string (enc)));
+         scm_misc_error (NULL, "cannot convert to output locale ~s: \"~s\"", 
+                         scm_list_2 (scm_from_locale_string (enc),
+                                     str));
        }
     }
 
diff --git a/libguile/strings.h b/libguile/strings.h
index b43523b..ffed569 100644
--- a/libguile/strings.h
+++ b/libguile/strings.h
@@ -78,9 +78,9 @@
 
 #define SCM_PORT_ENCODING (scm_i_get_port_encoding())
 
-SCM_API SCM scm_set_conversion_error_behavior_x (SCM behavior);
-SCM_INTERNAL const char *scm_i_get_port_encoding (void);
-SCM_INTERNAL void scm_i_set_port_encoding (const char *enc);
+SCM_API SCM scm_set_conversion_error_behavior_x (SCM port, SCM behavior);
+SCM_INTERNAL const char *scm_i_get_port_encoding (SCM port);
+SCM_INTERNAL void scm_i_set_port_encoding (SCM port, const char *enc);
 SCM_INTERNAL enum iconv_ilseq_handler scm_i_get_conversion_strategy (void);
 SCM_API SCM scm_string_p (SCM x);
 SCM_API SCM scm_string (SCM chrs);
@@ -112,6 +112,9 @@ SCM_API SCM scm_take_locale_string (char *str);
 SCM_API SCM scm_take_locale_stringn (char *str, size_t len);
 SCM_API char *scm_to_locale_string (SCM str);
 SCM_API char *scm_to_locale_stringn (SCM str, size_t *lenp);
+SCM_INTERNAL char *scm_to_encoded_stringn (SCM str, size_t *lenp, 
+                                          const char *encoding,
+                                          enum iconv_ilseq_handler handler);
 SCM_INTERNAL scm_t_uint8 *scm_i_to_utf8_string (SCM str);
 SCM_API size_t scm_to_locale_stringbuf (SCM str, char *buf, size_t max_len);
 
diff --git a/test-suite/tests/encoding-escapes.test 
b/test-suite/tests/encoding-escapes.test
index 79c4f3c..3ab6a58 100644
--- a/test-suite/tests/encoding-escapes.test
+++ b/test-suite/tests/encoding-escapes.test
@@ -70,7 +70,7 @@
 
 ;; Check that an error is flagged on display output when the output
 ;; error strategy is 'error
-(set-conversion-error-behavior! 'error)
+(set-conversion-error-behavior! #f 'error)
 
 (setlocale LC_ALL "C")
 
@@ -87,7 +87,7 @@
 
 ;; Check that questions marks or substitutions appear when the conversion
 ;; mode is substitute
-(set-conversion-error-behavior! 'substitute)
+(set-conversion-error-behavior! #f 'substitute)
 
 (with-test-prefix "display output substitutions"
 
@@ -101,7 +101,7 @@
 
 ;; Check that hex escapes appear in the write output and that no error
 ;; is thrown.  The output error strategy should be irrelevant here.
-(set-conversion-error-behavior! 'escape)
+(set-conversion-error-behavior! #f 'escape)
 
 (with-test-prefix "display output escapes"
 
diff --git a/test-suite/tests/encoding-iso88591.test 
b/test-suite/tests/encoding-iso88591.test
index 425d174..b1b9586 100644
--- a/test-suite/tests/encoding-iso88591.test
+++ b/test-suite/tests/encoding-iso88591.test
@@ -116,12 +116,12 @@
                 (ñ 2))
             (eq? (+ á ñ) 3))))
 
-(set-conversion-error-behavior! 'error)
+(set-conversion-error-behavior! #f 'error)
+(setencoding "ISO-8859-1")
 
 (with-test-prefix "output errors"
 
   (pass-if-exception "char 256" exception:conversion
-                    (setencoding "ISO-8859-1")
                     (with-output-to-string 
                       (lambda() (display (string-ints 256))))))
 
diff --git a/test-suite/tests/encoding-iso88597.test 
b/test-suite/tests/encoding-iso88597.test
index 44ba823..259f1b8 100644
--- a/test-suite/tests/encoding-iso88597.test
+++ b/test-suite/tests/encoding-iso88597.test
@@ -117,7 +117,7 @@
                 (ñ 2))
             (eq? (+ á ñ) 3))))
 
-(set-conversion-error-behavior! 'error)
+(set-conversion-error-behavior! #f 'error)
 
 (with-test-prefix "output errors"
 
diff --git a/test-suite/tests/i18n.test b/test-suite/tests/i18n.test
index 1f8d406..5a23eff 100644
--- a/test-suite/tests/i18n.test
+++ b/test-suite/tests/i18n.test
@@ -1,5 +1,4 @@
-;;;; i18n.test --- Exercise the i18n API.
-(setencoding "ISO-8859-1")
+;;;; i18n.test --- Exercise the i18n API. -*- mode: scheme; coding: iso-8859-1 
-*-
 ;;;;
 ;;;; Copyright (C) 2006, 2007 Free Software Foundation, Inc.
 ;;;; Ludovic Courtès
diff --git a/test-suite/tests/ports.test b/test-suite/tests/ports.test
index 88782a0..f53cb4d 100644
--- a/test-suite/tests/ports.test
+++ b/test-suite/tests/ports.test
@@ -32,6 +32,7 @@
   (data-file-name "ports-test.tmp"))
 
 ;; Make sure that we are set up for 8-bit clean data
+(setlocale LC_ALL "C")
 (setbinary)
 
 


hooks/post-receive
-- 
GNU Guile




reply via email to

[Prev in Thread] Current Thread [Next in Thread]