bug#20200: GUILE 2.0.11: open-bytevector-input-port fails to open in binary mode

From: David Kastrup
Subject: bug#20200: GUILE 2.0.11: open-bytevector-input-port fails to open in binary mode
Date: Wed, 25 Mar 2015 15:31:32 +0100

Run the following code in an UTF-8 capable locale:

(setlocale LC_ALL "")
(use-modules (rnrs io ports) (rnrs bytevectors) (ice-9 format))
(let ((p (open-bytevector-input-port
          (u8-list->bytevector '(#xc3 #x9f #xc3 #X9f)))))
  (format #t "~a ~a\n" (port-encoding p) (binary-port? p))
  (format #t "#x~x\n" (char->integer (read-char p)))
  (format #t "~a ~a\n" (port-encoding p) (binary-port? p))
  (set-port-encoding! p "ISO-8859-1")
  (format #t "~a ~a\n" (port-encoding p) (binary-port? p))
  (format #t "#x~x\n" (char->integer (read-char p)))
  (format #t "~a ~a\n" (port-encoding p) (binary-port? p)))

This results in the output
#f #t
#f #t
ISO-8859-1 #f
ISO-8859-1 #f

The manual, however, states:

 -- Scheme Procedure: port-encoding port
 -- C Function: scm_port_encoding (port)
     Returns, as a string, the character encoding that PORT uses to
     interpret its input and output.  The value ‘#f’ is equivalent to
     ‘"ISO-8859-1"’.

That would appear to be false since the value #f here is treated as
equivalent to "UTF-8" rather than "ISO-8859-1".

In addition, the manual states

 -- Scheme Procedure: binary-port? port
     Return ‘#t’ if PORT is a "binary port", suitable for binary data
     input/output.

     Note that internally Guile does not differentiate between binary
     and textual ports, unlike the R6RS. Thus, this procedure returns
     true when PORT does not have an associated encoding—i.e., when
     ‘(port-encoding PORT)’ is ‘#f’ (*note port-encoding: Ports.).  This
     is the case for ports returned by R6RS procedures such as
     ‘open-bytevector-input-port’ and ‘make-custom-binary-output-port’.

     However, Guile currently does not prevent use of textual I/O
     procedures such as ‘display’ or ‘read-char’ with binary ports.
     Doing so “upgrades” the port from binary to textual, under the
     ISO-8859-1 encoding.  Likewise, Guile does not prevent use of
     ‘set-port-encoding!’ on a binary port, which also turns it into a
     “textual” port.

But it would appear that the only way to actually get one-byte-per-character
read-char behavior is to switch the port to textual.  While the port is
in "binary" mode, read-char decodes the data as UTF-8 rather than
delivering binary data.  Nor does reading automagically switch the port
away from the nominal #f encoding, which is not actually in effect.
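
For illustration, a minimal sketch of two ways to get at the raw bytes
without depending on the port's default encoding: selecting ISO-8859-1
explicitly before the first read-char, or bypassing character decoding
with get-u8 from (rnrs io ports).  The variable names are made up for
this example.

```scheme
(use-modules (rnrs io ports) (rnrs bytevectors))

;; Sample data: the two bytes that encode U+00DF in UTF-8.
(define sample (u8-list->bytevector '(#xc3 #x9f)))

;; Variant 1: pick ISO-8859-1 explicitly, so that every byte maps to
;; exactly one character, then read with read-char.
(define text-port (open-bytevector-input-port sample))
(set-port-encoding! text-port "ISO-8859-1")
(define first-char-code (char->integer (read-char text-port)))

;; Variant 2: read bytes directly; get-u8 does no character decoding.
(define byte-port (open-bytevector-input-port sample))
(define first-byte (get-u8 byte-port))
```

Both variants yield the first byte, #xc3, rather than the decoded
character #xdf.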

Putting (with-fluids ((%default-port-encoding #f)) ...) around the
open-bytevector-input-port call results in the output
#f #t
ISO-8859-1 #f
ISO-8859-1 #f
ISO-8859-1 #f
which actually corresponds to the documentation.
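
Spelled out, that workaround might look like the following sketch.  The
helper name open-raw-bytevector-port is hypothetical, not part of Guile;
it only confines the fluid binding to the call that creates the port.

```scheme
(setlocale LC_ALL "")
(use-modules (rnrs io ports) (rnrs bytevectors) (ice-9 format))

;; Hypothetical helper: create the port while %default-port-encoding
;; is bound to #f, so the locale's encoding is not picked up.
(define (open-raw-bytevector-port bv)
  (with-fluids ((%default-port-encoding #f))
    (open-bytevector-input-port bv)))

(define p
  (open-raw-bytevector-port
   (u8-list->bytevector '(#xc3 #x9f #xc3 #x9f))))

;; Same probes as in the original test case.
(format #t "~a ~a\n" (port-encoding p) (binary-port? p))
(format #t "#x~x\n" (char->integer (read-char p)))
(format #t "~a ~a\n" (port-encoding p) (binary-port? p))
```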

David Kastrup

