Re: string port encodings
From: Andy Wingo
Subject: Re: string port encodings
Date: Thu, 31 Jan 2013 12:04:56 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2 (gnu/linux)
Hi,
On Wed 16 Jan 2013 19:16, Andy Wingo <address@hidden> writes:
> On Wed 16 Jan 2013 18:37, address@hidden (Ludovic Courtès) writes:
>
>> I just think [string port encodings] may have to wait until 2.2.
>
> Oh yes, agreed here. Anyway let's let it simmer for a while. Another
> two or three of these threads should be enough to either reaffirm or
> change the current state of things :)
OK, that has simmered long enough ;)
I just merged stable-2.0 into master. There is now a failing test:
  (pass-if-equal
      '(*TOP* (foo "\xA0"))
      (xml->sxml "<foo>&nbsp;</foo>"
                 #:entities '((nbsp . "\xA0"))))
This one fails with (encoding-error "scm_to_stringn" "cannot convert
narrow string to output locale" 84 #f #f).
It passes in stable-2.0 because "ASCII" is erroneously treated the same
as "ISO-8859-1". In master, attempting to write a character above
#\x7F to an ASCII port causes an encoding error. This seems more
correct than the 2.0 behavior. The error would have happened in
stable-2.0 too if I had chosen an entity with a character above #\xFF.
Looking further, the cause is in sxml/upstream/SSAX.scm:
  (define (ssax:handle-parsed-entity port name entities
                                     content-handler str-handler seed)
    ...
    (call-with-input-string ent-body
      (lambda (port) (content-handler port new-entities seed)))
    ...)
Here is where I think this code goes wrong: its correctness depends on
the default port encoding, because call-with-input-string has to encode
ent-body in that encoding. That is totally bogus. This code was written
long before Guile had a default port encoding at all.
Again, I think the default encoding for a string port should be one that
can represent every character, and we should make that change in master.
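For illustration only: until the string-port default changes, a call site like the one in ssax:handle-parsed-entity could pin the encoding itself. This is a minimal sketch, assuming Guile 2.0 semantics where a newly created string port takes its encoding from the %default-port-encoding fluid; the helper name call-with-unicode-input-string is hypothetical and not part of Guile or SSAX.

```scheme
;; Hypothetical helper (not in Guile or SSAX). Assumes that a string
;; port created while %default-port-encoding is bound takes its
;; encoding from that fluid, as in Guile 2.0.
(define (call-with-unicode-input-string str proc)
  ;; Bind the default encoding to UTF-8 for the extent of this call,
  ;; so that STR can be encoded regardless of the current locale.
  (with-fluids ((%default-port-encoding "UTF-8"))
    (call-with-input-string str proc)))

;; Usage: read back a string containing U+00A0 (no-break space).
(call-with-unicode-input-string "\xA0" read-char)
```

Binding the fluid locally confines the fix to one call site; the proposal above instead changes the default for every string port, which would also fix other code that silently relies on the locale.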
Andy
--
http://wingolog.org/