From: Andy Wingo
Date: Thu, 31 Jan 2013 12:04:56 +0100
On Wed 16 Jan 2013 19:16, Andy Wingo <address@hidden> writes:

> On Wed 16 Jan 2013 18:37, address@hidden (Ludovic Courtès) writes:
>> I just think [string port encodings] may have to wait until 2.2.
> Oh yes, agreed here.  Anyway let's let it simmer for a while.  Another
> two or three of these threads should be enough to either reaffirm or
> change the current state of things :)

OK that was simmering long enough ;)

I just merged stable-2.0 to master.  There is now a failing test.

      '(*TOP* (foo "\xA0"))
      (xml->sxml "<foo>&nbsp;</foo>"
                 #:entities '((nbsp . "\xA0"))))

This one fails, with (encoding-error "scm_to_stringn" "cannot convert
narrow string to output locale" 84 #f #f).

It passes in stable-2.0 because "ASCII" is erroneously treated as the
same as "ISO-8859-1".  In master, attempting to write a character
above #\x7F to an ASCII port will cause an encoding error.  It seems
more correct than the 2.0 behavior.  This error would have happened in
stable-2.0 if I had chosen an entity with a character above #\xFF.

Looking further, the cause is in sxml/upstream/SSAX.scm:

   (define (ssax:handle-parsed-entity port name entities
                                      content-handler str-handler seed)
           (call-with-input-string ent-body
             (lambda (port) (content-handler port new-entities seed)))

Here is where I think this code goes wrong: its correctness appears to
depend on the default port encoding.  That is totally bogus.  It was
written long before we had such a thing.

Again, I think the default encoding for a string port should be one that
can represent all characters, and we should change this in master.
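
Until then, a caller can avoid depending on the ambient default by
binding the encoding explicitly around the string port.  What follows is
only a minimal sketch of that idea, not a patch to SSAX: it assumes the
`%default-port-encoding' fluid as it exists in Guile 2.0, and
`call-with-utf8-input-string' is a hypothetical helper name that is not
part of Guile or SSAX.

```scheme
;; Sketch only.  `call-with-utf8-input-string' is a hypothetical helper,
;; not an existing Guile or SSAX procedure.
;;
;; Open the string port while %default-port-encoding is bound to
;; "UTF-8", so the port can represent every character in STR no matter
;; what the surrounding default encoding happens to be.
(define (call-with-utf8-input-string str proc)
  (with-fluids ((%default-port-encoding "UTF-8"))
    (call-with-input-string str proc)))

;; Read back the non-breaking space used by the failing test's entity.
(define nbsp-char
  (call-with-utf8-input-string "\xA0"
    (lambda (port) (read-char port))))
```

The binding is dynamic, so it only affects ports opened inside the
helper and leaves the global default alone.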


