guile-devel

Re: byte-order marks


From: Neil Jerram
Subject: Re: byte-order marks
Date: Tue, 29 Jan 2013 21:12:21 +0000
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)

Andy Wingo <address@hidden> writes:

> On Tue 29 Jan 2013 20:22, Neil Jerram <address@hidden> writes:
>
>> (define (read-csv file-name)
>>   (let ((s (utf16->string (get-bytevector-all (open-input-file file-name))
>>                           'little)))
>>
>>     ;; Discard possible byte order mark.
>>     (if (and (>= (string-length s) 1)
>>              (char=? (string-ref s 0) #\xfeff))
>>         (set! s (substring s 1)))
>>
>>     ...))
>
> FWIW the procedure I had was:
>
> (define (consume-byte-order-mark port)
>   (let ((enc (or (port-encoding port) "ISO-8859-1")))
>     (set-port-encoding! port "ISO-8859-1")
>     (case (peek-char port)
>       ((#\xEF)
>        (read-char port)
>        (case (peek-char port)
>          ((#\xBB)
>           (read-char port)
>           (case (peek-char port)
>             ((#\xBF)
>              (read-char port)
>              (set-port-encoding! port "UTF-8"))
>             (else
>              (unread-char #\xBB port)
>              (unread-char #\xEF port)
>              (set-port-encoding! port enc))))
>          (else
>           (unread-char #\xEF port)
>           (set-port-encoding! port enc))))
>       ((#\xFE)
>        (read-char port)
>        (case (peek-char port)
>          ((#\xFF)
>           (read-char port)
>           (set-port-encoding! port "UTF-16BE"))
>          (else
>           (unread-char #\xFE port)
>           (set-port-encoding! port enc))))
>       ((#\xFF)
>        (read-char port)
>        (case (peek-char port)
>          ((#\xFE)
>           (read-char port)
>           (set-port-encoding! port "UTF-16LE"))
>          (else
>           (unread-char #\xFF port)
>           (set-port-encoding! port enc))))
>       (else
>        (set-port-encoding! port enc)))))
>
> The encoding dance is because there is no unread-u8 from Scheme, only
> unread-char.

I can see why you'd want to do something about that!
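
When the whole file is already in a bytevector, as in read-csv, the
same check can be done on the leading bytes directly, with no port
encoding to switch back and forth. A minimal sketch, assuming only
(rnrs bytevectors); detect-bom is a hypothetical helper name, not
something Guile provides, and it recognises just the three marks handled
in the procedure quoted above:

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: return two values, the encoding name implied by
;; a leading byte-order mark (or #f if none is recognised) and the
;; number of bytes that mark occupies.
(define (detect-bom bv)
  ;; True if BV begins with exactly the given byte values.
  (define (starts-with? . bytes)
    (and (>= (bytevector-length bv) (length bytes))
         (let loop ((i 0) (bs bytes))
           (or (null? bs)
               (and (= (bytevector-u8-ref bv i) (car bs))
                    (loop (+ i 1) (cdr bs)))))))
  (cond ((starts-with? #xEF #xBB #xBF) (values "UTF-8" 3))
        ((starts-with? #xFE #xFF)      (values "UTF-16BE" 2))
        ((starts-with? #xFF #xFE)      (values "UTF-16LE" 2))
        (else                          (values #f 0))))
```

The second value says how many bytes to skip before decoding, so the
caller never has to unread anything.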

Regards,
        Neil


