bug-guile

bug#11197: problems with string ports and unicode


From: Ludovic Courtès
Subject: bug#11197: problems with string ports and unicode
Date: Wed, 11 Apr 2012 23:01:16 +0200
User-agent: Gnus/5.110018 (No Gnus v0.18) Emacs/24.0.93 (gnu/linux)

Hi Mark,

Mark H Weaver <address@hidden> writes:

> Okay, now I understand.  The problem is that internally, string ports
> are implemented by converting the string into a stream of bytes in the
> string port's encoding, and then the string port reads those bytes.

Exactly.

[...]

> Conceptually, a string port is a textual port, not a binary port.

But not in Guile, where there’s no distinction between textual and
binary ports.  One can write code like:

  scheme@(guile-user)> (define (string->utf16 s)
                         (let ((p (with-fluids ((%default-port-encoding "UTF-16BE"))
                                    (open-input-string s))))
                           (get-bytevector-all p)))
  scheme@(guile-user)> (string->utf16 "hello")
  $4 = #vu8(0 104 0 101 0 108 0 108 0 111)
  scheme@(guile-user)> (use-modules(rnrs bytevectors))
  scheme@(guile-user)> (utf16->string $4)
  $5 = "hello"

> You should be able to hand it an arbitrary string and read those
> characters from it, as described in SRFI-6, without setting
> Guile-specific fluid variables.  Similarly, you should be able to
> write arbitrary characters to a string-output-port.

The SRFI-6 issue could be addressed with:

diff --git a/module/srfi/srfi-6.scm b/module/srfi/srfi-6.scm
index 098b586..ba946ec 100644
--- a/module/srfi/srfi-6.scm
+++ b/module/srfi/srfi-6.scm
@@ -1,6 +1,6 @@
 ;;; srfi-6.scm --- Basic String Ports
 
-;;     Copyright (C) 2001, 2002, 2003, 2006 Free Software Foundation, Inc.
+;;     Copyright (C) 2001, 2002, 2003, 2006, 2012 Free Software Foundation, Inc.
 ;;
 ;; This library is free software; you can redistribute it and/or
 ;; modify it under the terms of the GNU Lesser General Public
@@ -23,10 +23,16 @@
 ;;; Code:
 
 (define-module (srfi srfi-6)
-  #:re-export (open-input-string open-output-string get-output-string))
+  #:export (open-input-string open-output-string)
+  #:re-export (get-output-string))
 
-;; Currently, guile provides these functions by default, so no action
-;; is needed, and this file is just a placeholder.
+(define (open-input-string s)
+  (with-fluids ((%default-port-encoding "UTF-8"))
+    ((@ (guile) open-input-string) s)))
+
+(define (open-output-string)
+  (with-fluids ((%default-port-encoding "UTF-8"))
+    ((@ (guile) open-output-string))))
 
 (cond-expand-provide (current-module) '(srfi-6))

It wouldn’t completely solve the problem.

> IMO, string ports should use UTF-8 as their initial port encoding, since
> we know that UTF-8 can represent any Guile string.  This will allow
> portable use of string ports.

The change was submitted and briefly discussed at
<http://thread.gmane.org/gmane.lisp.guile.devel/9822>.

I think the rationale was mostly backward compatibility (in 1.8 people
could mix Latin-1 textual and binary I/O), consistency with how other
ports behave, and the ability to change the default encoding of string
ports.

> I realize that this would change the existing behavior of programs that
> use binary I/O on string ports, but as things stand right now, portable
> SRFI-6 code is broken on Guile.
>
> What do you think?

In hindsight, UTF-8 does seem like a better default than the locale port
encoding (which is what %default-port-encoding is, by default), but it
does remain useful to specify a different encoding.
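
As a minimal sketch of what "specifying a different encoding" can look like with the behavior described in this thread: the helper below opens an input string port under an explicitly chosen encoding and defaults to UTF-8. The name ‘open-input-string/encoding’ is hypothetical and not part of Guile. The sketch assumes that string ports take their encoding from ‘%default-port-encoding’ at creation time, as discussed above; other Guile versions may behave differently.

```scheme
;; Hypothetical helper, not part of Guile: open an input string port
;; with an explicit encoding instead of whatever the fluid
;; %default-port-encoding currently holds.
(define* (open-input-string/encoding s #:optional (encoding "UTF-8"))
  ;; Assumption: the string port reads %default-port-encoding when it
  ;; is created, so binding the fluid around the call is enough.
  (with-fluids ((%default-port-encoding encoding))
    (open-input-string s)))

;; Usage: inspect the encoding the port ended up with.
(define p (open-input-string/encoding "hello" "ISO-8859-1"))
(display (port-encoding p))
(newline)
```

Callers that do not care about the byte-level representation can omit the second argument and get UTF-8, which can represent any Guile string.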

>>> What _is_ needed is a file coding declaration near the top of the source
>>> file, e.g. "coding: utf-8" (see "Character Encoding of Source Files" in
>>> the manual).
>>
>> Yes.  And you actually need both–i.e., the ‘coding’ cookie won’t
>> magically make string ports use that encoding.
>>
>>> I tried that and it still fails for me.
>>
>> What fails exactly?
>
> It fails ungracefully (goes into an infinite loop while trying to print the
> backtrace) without the %default-port-encoding setting.

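Indeed, it’s stuck in a deadlock: ‘scm_mkstrport’ is re-entered from the
backtrace printer (frame #4) while the outer call (frame #15) is still in
progress, and it blocks on a mutex: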

--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  0x00007ffff75e1204 in __lll_lock_wait () from /nix/store/vxycd107wjbhcj720hzkw2px7s7kr724-glibc-2.12.2/lib/libpthread.so.0
#1  0x00007ffff75dc4d4 in _L_lock_999 () from /nix/store/vxycd107wjbhcj720hzkw2px7s7kr724-glibc-2.12.2/lib/libpthread.so.0
#2  0x00007ffff75dc2ea in pthread_mutex_lock () from /nix/store/vxycd107wjbhcj720hzkw2px7s7kr724-glibc-2.12.2/lib/libpthread.so.0
#3  0x00007ffff7b30499 in scm_dynwind_pthread_mutex_lock (mutex=0x7ffff7dd28c0) at threads.c:1962
#4  0x00007ffff7b2bb0e in scm_mkstrport (pos=0x2, str=0x4, modes=327680, caller=<value optimized out>) at strports.c:287
#5  0x00007ffff7aac20b in display_backtrace_body (a=0x7fffffffc1a0) at backtrace.c:487
#6  0x00007ffff7b46c7b in vm_regular_engine (vm=0x6f61f0, program=0x7f5d50, argv=0x6fa3b0, nargs=-1) at vm-i-system.c:895
#7  0x00007ffff7ac039e in scm_call_3 (proc=0x7f5d50, arg1=<value optimized out>, arg2=<value optimized out>, arg3=<value optimized out>) at eval.c:500
#8  0x00007ffff7b32504 in scm_internal_catch (tag=<value optimized out>, body=<value optimized out>, body_data=<value optimized out>, handler=<value optimized out>, handler_data=<value optimized out>) at throw.c:222
#9  0x00007ffff7aabbba in scm_display_backtrace_with_highlights (stack=<value optimized out>, port=<value optimized out>, first=<value optimized out>, depth=<value optimized out>, highlights=<value optimized out>) at backtrace.c:558
#10 0x00007ffff7ab725e in print_exception_and_backtrace (error_port=0x6f6170, tag=0x66d4c0, args=0x8e6ea0) at continuations.c:490
#11 pre_unwind_handler (error_port=0x6f6170, tag=0x66d4c0, args=0x8e6ea0) at continuations.c:534
#12 0x00007ffff7b46c7b in vm_regular_engine (vm=0x6f61f0, program=0x7f3ce0, argv=0x6fa300, nargs=-1) at vm-i-system.c:895
#13 0x00007ffff7b4846e in scm_call_with_vm (vm=0x6f61f0, proc=0x7f3ce0, args=<value optimized out>) at vm.c:878
#14 0x00007ffff7b296db in scm_to_stringn (str=0x8dba80, lenp=0x7fffffffc4e8, encoding=<value optimized out>, handler=SCM_FAILED_CONVERSION_ERROR) at strings.c:2102
#15 0x00007ffff7b2bb73 in scm_mkstrport (pos=0x2, str=0x8dba80, modes=196608, caller=<value optimized out>) at strports.c:312
--8<---------------cut here---------------end--------------->8---

This could be fixed by calling ‘scm_new_port_table_entry’ after having
prepared the backing buffer, but the problem is that ‘pt->encoding’ is
needed before.

Thoughts?

Ludo’.
