bug-guile

From: David Kastrup
Subject: bug#20200: GUILE 2.0.11: open-bytevector-input-port fails to open in binary mode
Date: Wed, 25 Mar 2015 15:31:32 +0100

Run the following code in a UTF-8 capable locale:

(setlocale LC_ALL "")
(use-modules (rnrs io ports) (rnrs bytevectors) (ice-9 format))
(let ((p (open-bytevector-input-port
          (u8-list->bytevector '(#xc3 #x9f #xc3 #x9f)))))
  (format #t "~a ~a\n" (port-encoding p) (binary-port? p))
  (format #t "#x~x\n" (char->integer (read-char p)))
  (format #t "~a ~a\n" (port-encoding p) (binary-port? p))
  (set-port-encoding! p "ISO-8859-1")
  (format #t "~a ~a\n" (port-encoding p) (binary-port? p))
  (format #t "#x~x\n" (char->integer (read-char p)))
  (format #t "~a ~a\n" (port-encoding p) (binary-port? p)))

This results in the output:

#f #t
#xdf
#f #t
ISO-8859-1 #f
#xc3
ISO-8859-1 #f

The manual, however, states:

 -- Scheme Procedure: port-encoding port
 -- C Function: scm_port_encoding (port)
     Returns, as a string, the character encoding that PORT uses to
     interpret its input and output.  The value ‘#f’ is equivalent to
     ‘"ISO-8859-1"’.

That would appear to be false: the first read-char above returns #xdf,
so the value #f is here treated as equivalent to "UTF-8" rather than
"ISO-8859-1".
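
To spell out the arithmetic: the bytes #xc3 #x9f form a single
character, U+00DF, when read as UTF-8, while under ISO-8859-1 every
byte is a character with the same code point.  A minimal sketch that
checks this without any port involved (the names bv, as-utf8 and
as-latin1 are only for illustration):

```scheme
(use-modules (rnrs bytevectors) (ice-9 format))

;; The first two bytes of the test data used in the report.
(define bv (u8-list->bytevector '(#xc3 #x9f)))

;; UTF-8 reading: both bytes together encode one character.
(define as-utf8 (string-ref (utf8->string bv) 0))

;; ISO-8859-1 reading: each byte maps to the character with the same
;; code point, so the first character comes from the first byte alone.
(define as-latin1 (integer->char (bytevector-u8-ref bv 0)))

(format #t "UTF-8:      #x~x\n" (char->integer as-utf8))
(format #t "ISO-8859-1: #x~x\n" (char->integer as-latin1))
```

These are the #xdf and #xc3 values that appear in the output above.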

In addition, the manual states:

 -- Scheme Procedure: binary-port? port
     Return ‘#t’ if PORT is a "binary port", suitable for binary data
     input/output.

     Note that internally Guile does not differentiate between binary
     and textual ports, unlike the R6RS. Thus, this procedure returns
     true when PORT does not have an associated encoding—i.e., when
     ‘(port-encoding PORT)’ is ‘#f’ (*note port-encoding: Ports.).  This
     is the case for ports returned by R6RS procedures such as
     ‘open-bytevector-input-port’ and ‘make-custom-binary-output-port’.

     However, Guile currently does not prevent use of textual I/O
     procedures such as ‘display’ or ‘read-char’ with binary ports.
     Doing so “upgrades” the port from binary to textual, under the
     ISO-8859-1 encoding.  Likewise, Guile does not prevent use of
     ‘set-port-encoding!’ on a binary port, which also turns it into a
     “textual” port.

But it would appear that the only way to actually get byte-per-character
read-char behavior is to switch the port to textual.  While the port is
in "binary" mode, read-char decodes as UTF-8 rather than delivering
binary data.  Nor does read-char switch the port away from the nominal
#f encoding as documented, even though #f is not the encoding actually
in effect.
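
For comparison, the byte-oriented R6RS procedures do not go through the
port encoding at all.  A minimal sketch, assuming raw bytes rather than
characters are acceptable to the caller (the variable names are only
for illustration):

```scheme
(use-modules (rnrs io ports) (rnrs bytevectors))

(define p
  (open-bytevector-input-port (u8-list->bytevector '(#xc3 #x9f))))

;; get-u8 returns one byte as an integer, or the EOF object once the
;; bytevector is exhausted.
(define first-byte (get-u8 p))
(define second-byte (get-u8 p))
(define at-end (get-u8 p))

(display (list first-byte second-byte (eof-object? at-end)))
(newline)
```

This only helps code that can work with integers; it does not address
what read-char does on such a port.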

Putting (with-fluids ((%default-port-encoding #f)) ...) around the
open-bytevector-input-port call results in the output:
#f #t
#xc3
ISO-8859-1 #f
ISO-8859-1 #f
#x9f
ISO-8859-1 #f
which actually corresponds to the documentation.
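
Written out, the wrapped call could look like the sketch below.  The
helper name open-bytevector-input-port/no-default-encoding is made up
for illustration and is not part of Guile; the sketch only binds
%default-port-encoding to #f for the duration of the open call, as
described above.  Other Guile versions may well behave differently
from 2.0.11.

```scheme
(use-modules (rnrs io ports) (rnrs bytevectors) (ice-9 format))

;; Hypothetical helper: open the port while the default port encoding
;; is bound to #f, so the new port does not pick up the locale's
;; encoding.
(define (open-bytevector-input-port/no-default-encoding bv)
  (with-fluids ((%default-port-encoding #f))
    (open-bytevector-input-port bv)))

(define p
  (open-bytevector-input-port/no-default-encoding
   (u8-list->bytevector '(#xc3 #x9f #xc3 #x9f))))

;; Same probes as in the original test: encoding and binary flag
;; before and after the first read-char.
(format #t "~a ~a\n" (port-encoding p) (binary-port? p))
(format #t "#x~x\n" (char->integer (read-char p)))
(format #t "~a ~a\n" (port-encoding p) (binary-port? p))
```

The fluid binding has to surround the open call itself, since the
encoding is picked up when the port is created.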

-- 
David Kastrup
