Re: fencepost error in encoding processing

From: Ken Raeburn
Subject: Re: fencepost error in encoding processing
Date: Mon, 16 Nov 2009 12:25:17 -0500

On Nov 16, 2009, at 08:03, Ludovic Courtès wrote:
> As far as encoding names are concerned, Bruno Haible pointed me to and I added a link to it
> in the manual a couple of days ago.

Between your link and Mike's, it looks to me like we should add several more characters.

The GNU libc code adds ":" and "," to the list. The comment in iconv_open doesn't list the comma, but the function it calls does permit it. There's also some special handling of "/".

The IANA list shows names using "+" and parens ("ebcdic-us-37+euro", "NF_Z_62-010_(1973)"), as well as colons.

I've skimmed the ICU page Mike pointed to, and it includes names like "UTF-16BE,version=1" and "ibm-1149_P100-197,swaplfnl" as well as "+" and ":" names, when showing "all aliases". If we only try to support, say, IANA and MIME, then "+" and ":" are used but not "=".

Since we're scanning an Emacs-style coding specification, as long as whitespace and semicolon aren't on the list, I think we can be expansive, so let's go ahead and add all of ":,+=/()" to the allowed set. The results will still be constrained by whatever the OS supports; we just don't want Guile to impose additional constraints.

Should we allow punctuation in general by calling ispunct (and explicitly checking for semicolon) instead? (Note that isalnum and ispunct will also check for locale-specific characters... of course, the new encoding spec hasn't come into effect yet....)
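
To make the ispunct question concrete, here is a minimal sketch of what that alternative test might look like. The helper name is hypothetical and is not anything in libguile. Note that in the "C" locale ispunct accepts every printable character that is neither alphanumeric nor a space, so this is considerably broader than the explicit list (it lets quotes through, for example).

```c
#include <ctype.h>

/* Hypothetical helper, not part of libguile: nonzero if C would be
   accepted in a coding system name under the "any punctuation except
   semicolon" rule.  Taking an unsigned char keeps the <ctype.h> calls
   well-defined for bytes above 127. */
static int
name_char_punct_p (unsigned char c)
{
  /* Whitespace and NUL are neither alphanumeric nor punctuation, so
     they are rejected without a separate check. */
  return isalnum (c) || (ispunct (c) && c != ';');
}
```

Whether that extra breadth is desirable is exactly the open question; the explicit list is the more conservative choice.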


Allow more characters in coding system names in Emacs-style declarations.

    * libguile/read.c (scm_i_scan_for_encoding): Allow more punctuation
      symbols in coding system names.

diff --git a/libguile/read.c b/libguile/read.c
index 775612a..657e101 100644
--- a/libguile/read.c
+++ b/libguile/read.c
@@ -1506,8 +1506,7 @@ scm_i_scan_for_encoding (SCM port)
   i = 0;
   while (pos + i - header <= SCM_ENCODING_SEARCH_SIZE
          && pos + i - header < bytes_read
-        && (isalnum((int) pos[i]) || pos[i] == '_' || pos[i] == '-'
-             || pos[i] == '.'))
+        && (isalnum((int) pos[i]) || strchr("_-.:/,+=()", pos[i]) != NULL))

   if (i == 0)
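
One caveat with the strchr form, sketched below as a standalone program rather than as a change to read.c: strchr also matches the terminating NUL of the string it searches, so a NUL byte in the header would count as an allowed character unless it is excluded explicitly. The helper names here are hypothetical and are not libguile functions; the cast to unsigned char is an extra precaution for bytes above 127, not something the patch does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libguile: nonzero if C may appear
   in a coding system name, using the explicit list from the patch. */
static int
encoding_name_char_p (unsigned char c)
{
  /* strchr (s, '\0') returns a pointer to the terminator of S, which
     is non-NULL, so reject NUL before consulting the list. */
  if (c == '\0')
    return 0;
  return isalnum (c) || strchr ("_-.:/,+=()", c) != NULL;
}

/* Hypothetical helper: length of the leading run of name characters
   in BUF, looking at no more than LEN bytes. */
static size_t
encoding_name_length (const char *buf, size_t len)
{
  size_t i = 0;

  assert (buf != NULL);
  while (i < len && encoding_name_char_p ((unsigned char) buf[i]))
    i++;
  return i;
}
```

With this, a name such as "ebcdic-us-37+euro" or "UTF-16BE,version=1" is scanned in full, while the scan still stops at whitespace, a semicolon, or a NUL byte.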
