[MIT-Scheme-users] *parser and UTF-8 (was: Staging problem)

From: Kaloian Doganov
Subject: [MIT-Scheme-users] *parser and UTF-8 (was: Staging problem)
Date: Thu, 29 Jun 2006 22:09:17 +0300

        > I am eager to try out the improved *parser that deals with Unicode
        > buffers, described in release notes for "Testing release 7.7.91
        > (pending)" [1].  That's why I've tried to build the code from CVS in
        > the first place.

        I see.  You don't need to build from CVS for that -- the Debian package
        you have installed has that update already.

Here is a simple test case.

Let's say I have a text file "sample-data.txt" which is encoded in UTF-8
and contains just one line:

Когато бях овчарче и овците пасях...

Then executing the following simple program:

(load-option '*parser)

(define full-alphabet
  (code-points->alphabet (list (cons #x0 #xD7FF)
                               (cons #xE000 #xFFFD)
                               (cons #x10000 (-1+ 

(call-with-input-file "sample-data.txt"
  (lambda (port)
    (display ((*parser (seq (match (* (alphabet full-alphabet)))))
              (input-port->parser-buffer port)))))

should display:

   #(Когато бях овчарче и овците пасях

but it displays:

   #(Когато бях овчарче и овците пасях...

This looks like a UTF-8 byte sequence parsed as ISO-8859-1.  Or perhaps
the data itself is parsed properly, but the `display' procedure is not
capable of handling it?
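
To make the first possibility concrete, here is a minimal sketch of what
reading UTF-8 bytes as ISO-8859-1 does to a string.  It assumes an
R7RS-capable Scheme (`string->utf8', `bytevector-u8-ref'), which may not
match the 7.7.x release discussed here; `utf8-as-latin1' is a
hypothetical helper, not part of *parser.

```scheme
;; Hypothetical helper (assumes R7RS bytevector procedures):
;; encode S as UTF-8, then turn every byte into its own character,
;; which is what decoding those bytes as ISO-8859-1 amounts to.
(define (utf8-as-latin1 s)
  (let ((bytes (string->utf8 s)))
    (let loop ((i (- (bytevector-length bytes) 1))
               (chars '()))
      (if (< i 0)
          (list->string chars)
          (loop (- i 1)
                (cons (integer->char (bytevector-u8-ref bytes i))
                      chars))))))

(string-length "Когато")                   ; 6 characters
(string-length (utf8-as-latin1 "Когато"))  ; 12, one per UTF-8 byte
```

Each Cyrillic letter occupies two bytes in UTF-8, so the misread string
comes out twice as long as the original.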

What am I doing wrong?

My locale is LANG=bg_BG.UTF-8.

