--- Begin Message ---
Subject: drain-input doesn't decode
Date: Fri, 4 Mar 2016 03:09:44 +0000
The documentation for drain-input says that it returns a string of
characters, implying that the result is equivalent to what you'd get
from calling read-char some number of times. In fact it differs in a
significant respect: whereas read-char decodes input octets according to
the port's selected encoding, drain-input ignores the selected encoding
and always decodes according to ISO-8859-1 (thus preserving the octet
values in character form).

$ echo -n $'\1a\2b\3c' | guile-2.0 -c '
(set-port-encoding! (current-input-port) "UCS-2BE")
(write (port-encoding (current-input-port)))
(newline)
(write (map char->integer
            (let r ((l '\''()))
              (let ((c (read-char (current-input-port))))
                (if (eof-object? c)
                    (reverse l)
                    (r (cons c l)))))))
(newline)'
"UCS-2BE"
(353 610 867)
$ echo -n $'\1a\2b\3c' | guile-2.0 -c '
(set-port-encoding! (current-input-port) "UCS-2BE")
(write (port-encoding (current-input-port)))
(newline)
(peek-char (current-input-port))
(write (map char->integer
            (string->list (drain-input (current-input-port)))))
(newline)'
"UCS-2BE"
(1 97 2 98 3 99)

The practical upshot is that the input returned by drain-input can't
be used in the same way as regular input from read-char. It can still
be used if the code doing the reading is totally aware of the encoding,
so that it can perform the decoding manually, but this seems a failure
of abstraction. The value returned by drain-input ought to be coherent
with the abstraction level at which it is specified.
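
As a rough illustration of the manual decoding mentioned above, the following sketch re-encodes the drained string as ISO-8859-1 to recover the original octets, then decodes them with the port's actual encoding. It assumes the (ice-9 iconv) module is available in the Guile in use; redecode-drained is a hypothetical helper name, not part of Guile, and the drained string is rebuilt by hand from the second transcript rather than read from a port.

```scheme
;; Sketch only: assumes (ice-9 iconv) is available and that drain-input
;; returned one character per octet, as in the second transcript.
(use-modules (ice-9 iconv))

;; redecode-drained is a hypothetical helper, not part of Guile.
(define (redecode-drained str encoding)
  ;; ISO-8859-1 maps each character 0..255 back to the octet it came from.
  (let ((octets (string->bytevector str "ISO-8859-1")))
    ;; Decode the recovered octets with the port's real encoding.
    (bytevector->string octets encoding)))

;; The six characters from the second transcript, rebuilt by hand.
(define drained
  (list->string (map integer->char '(1 97 2 98 3 99))))

(write (map char->integer
            (string->list (redecode-drained drained "UCS-2BE"))))
(newline)
```

Under those assumptions this should print (353 610 867), the same values read-char produced. It only works when the drained octets end on a character boundary, and it requires the caller to know the port's encoding.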
I can see that there is a reason for drain-input to avoid performing
decoding: the problem that occurs if the buffer ends in the middle
of a character. If drain-input is to return decoded characters then
presumably in this case it would have to read further octets beyond the
buffer contents, in an unbuffered manner, until it reaches a character
boundary. If this is too unpalatable, perhaps drain-input should be
permitted only on ports configured for single-octet character encodings.
If, on the other hand, it is decided to endorse the current non-decoding
behaviour, then the break of abstraction needs to be documented.
-zefram
--- End Message ---
--- Begin Message ---
Subject: drain-input doesn't decode
Date: Wed, 19 May 2021 13:41:26 +0200
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.10.1
Closing this since it's 5 years old and fixed in Guile 2.1 and higher.
--
Taylan
--- End Message ---