Re: Unicode, ports and encoding

From: Mike Gran
Subject: Re: Unicode, ports and encoding
Date: Tue, 17 Feb 2009 15:45:32 -0800 (PST)

> From: Ludovic Courtès <address@hidden>
> > Mike Gran writes:

> >     This implies that a source code file should have syntax to
> >     indicate its own encoding, if it is not ASCII.  Something akin to
> >     the  line in HTML files.
> One could imagine special treatment of, say, the first 10 lines of a
> file, with the ability to recognize Emacs file variables like
> "-*- coding: utf-8 -*-" and to change the current port transcoder
> accordingly, something like that.

Yeah.  Something like that.
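
To make the idea concrete, here is a rough sketch in plain C of scanning
the first few lines of a source buffer for an Emacs-style "coding:" file
variable.  The function name find_coding_cookie and its signature are
made up for illustration and are not part of Guile.  It only extracts
the encoding name; changing the port's transcoder is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Guile: scan the first MAX_LINES lines
   of TEXT for an Emacs-style "coding: NAME" file variable and copy NAME
   into OUT (truncated to OUT_SIZE - 1 characters).
   Returns 1 if a name was found, 0 otherwise.  */
static int
find_coding_cookie (const char *text, size_t max_lines,
                    char *out, size_t out_size)
{
  static const char key[] = "coding:";
  const size_t key_len = sizeof key - 1;
  const char *line = text;
  size_t n;

  assert (text != NULL && out != NULL && out_size > 0);

  for (n = 0; n < max_lines && *line != '\0'; n++)
    {
      const char *end = strchr (line, '\n');
      size_t len = end ? (size_t) (end - line) : strlen (line);
      size_t i;

      /* Look for "coding:" anywhere on the current line.  */
      for (i = 0; i + key_len <= len; i++)
        if (memcmp (line + i, key, key_len) == 0)
          {
            size_t j = i + key_len, k = 0;

            /* Skip blanks between "coding:" and the name.  */
            while (j < len && (line[j] == ' ' || line[j] == '\t'))
              j++;
            /* Copy the characters allowed in an encoding name.  */
            while (j < len && k + 1 < out_size
                   && (isalnum ((unsigned char) line[j])
                       || line[j] == '-' || line[j] == '_'
                       || line[j] == '.'))
              out[k++] = line[j++];
            out[k] = '\0';
            if (k > 0)
              return 1;
          }

      if (end == NULL)
        break;
      line = end + 1;
    }
  return 0;
}
```

A reader could call this on the first block read from a file, with
max_lines set to 10, and fall back to the default encoding when it
returns 0.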

> IIRC, the first step you suggested was the implementation of wide
> string/char types.  Did you also work on this?

Sort of.

I thought I could start there, but it isn't easy.  A lot could be
broken by modifying string processing, so I tried writing some tests
first so that I can check my work as I go along.  But the tests have to
be non-ASCII, so they need to be converted when they are read in.
It gets a little circular to use scm_from_locale_string to convert the
non-ASCII strings in the test source and then have the test check the
behavior of scm_from_locale_string itself.

So now I think a better route is to first make some kind of simplified
transcoding system available to ports, so that non-ASCII tests are read
in correctly.  From there, one can work up toward wide strings and
chars, checking the work along the way.
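
As a rough illustration of the smallest piece such a simplified
transcoding layer would need, here is a sketch of decoding one UTF-8
sequence into a 32-bit code point.  It assumes UTF-8 input only, and the
name utf8_decode_one is hypothetical, not an existing libguile function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libguile: decode one UTF-8 sequence
   from BUF, which holds LEN bytes.  On success, store the code point in
   *CP and return the number of bytes consumed.  Return 0 if the input
   is empty, truncated, or malformed.  */
static size_t
utf8_decode_one (const unsigned char *buf, size_t len, uint32_t *cp)
{
  size_t need, i;
  uint32_t c, min;

  assert (buf != NULL && cp != NULL);

  if (len == 0)
    return 0;

  if (buf[0] < 0x80)
    {
      /* Plain ASCII byte.  */
      *cp = buf[0];
      return 1;
    }
  else if ((buf[0] & 0xE0) == 0xC0)
    { need = 2; c = buf[0] & 0x1F; min = 0x80; }
  else if ((buf[0] & 0xF0) == 0xE0)
    { need = 3; c = buf[0] & 0x0F; min = 0x800; }
  else if ((buf[0] & 0xF8) == 0xF0)
    { need = 4; c = buf[0] & 0x07; min = 0x10000; }
  else
    return 0;                   /* Stray continuation or invalid lead.  */

  if (len < need)
    return 0;                   /* Truncated sequence.  */

  for (i = 1; i < need; i++)
    {
      if ((buf[i] & 0xC0) != 0x80)
        return 0;               /* Not a continuation byte.  */
      c = (c << 6) | (buf[i] & 0x3F);
    }

  /* Reject overlong forms, surrogates, and values past U+10FFFF.  */
  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;

  *cp = c;
  return need;
}
```

A test for a routine like this can spell its input as explicit bytes,
for example 0xC3 0xA9 for U+00E9, so it does not depend on the locale
conversion it is meant to check.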


Mike Gran

