Re: Unicode and Guile

From: Tom Lord
Subject: Re: Unicode and Guile
Date: Tue, 11 Nov 2003 17:40:28 -0800 (PST)

    > From: Marius Vollmer <address@hidden>

    > Tom Lord <address@hidden> writes:

    > >     ~ (grapheme=? g1 g2 [locale]) => <boolean>
    > >     ~ (grapheme<? g1 g2 [locale])
    > >     ~ (grapheme>? g1 g2 [locale])
    > >     [...]
    > >     ~ (grapheme-ci=? g1 g2 [locale])
    > >     ~ (grapheme-ci<? g1 g2 [locale])
    > >     ~ (grapheme-ci>? g1 g2 [locale])

    > >       The usual orderings.

    > Is it a good idea to have an ordering among graphemes, or would it be
    > better to only order texts, i.e., to allow for the context of a
    > grapheme to determine the order?

I think it's a fine idea to order graphemes but, depending on the
locale, the ordering of texts is _not_ a lexical ordering grounded in
grapheme ordering.

It would be good to provide a locale, perhaps the default, in which
ordering of texts _is_ a lexical ordering grounded in (default)
grapheme order.
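
To make the distinction concrete, here is a toy sketch in Python (not Guile, and not part of the proposal). It assumes graphemes are modeled as short strings, that the default grapheme order is code point order, and that a text is a list of graphemes. The names `grapheme_less`, `text_less_lexical`, and `text_less_digraph` are hypothetical. The second text ordering treats the pair "c", "h" as a single collation unit sorting after plain "c", which is one way a locale's text ordering can fail to be lexical.

```python
def grapheme_less(g1, g2):
    """Hypothetical default grapheme order: compare code point sequences."""
    return [ord(c) for c in g1] < [ord(c) for c in g2]


def text_less_lexical(t1, t2):
    """Lexical ordering of texts (lists of graphemes) grounded in grapheme_less."""
    for g1, g2 in zip(t1, t2):
        if grapheme_less(g1, g2):
            return True
        if grapheme_less(g2, g1):
            return False
    return len(t1) < len(t2)  # a proper prefix sorts first


def text_less_digraph(t1, t2):
    """Toy contextual ordering: the pair "c", "h" collates as one unit after "c"."""
    def units(t):
        out, i = [], 0
        while i < len(t):
            if t[i] == "c" and i + 1 < len(t) and t[i + 1] == "h":
                out.append(("c", 1))  # sorts after every plain "c"
                i += 2
            else:
                out.append((t[i], 0))
                i += 1
        return out
    return units(t1) < units(t2)
```

Under the lexical ordering "chico" sorts before "cumbre", because "h" precedes "u". Under the digraph ordering the result is reversed, and no per-grapheme comparison can reproduce that.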

    > >     ~ (make-text-marker text index) => <marker>

    > What about having _only_ markers and not allow integers as
    > indices?

Seems excessive and arbitrary.  How do I implement (Emacs') GOTO-CHAR
without standing on my head?
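
A rough sketch of why integer indices keep this simple, written in Python for illustration only. `Buffer`, `Marker`, `make_marker`, `goto_char`, and `insert` are hypothetical names, indices are 0-based here, and graphemes are modeled as list items. Markers are created from an integer index and shift when text is inserted at or before them; moving point to an integer position is a bounds check and one assignment.

```python
class Marker:
    """A position in a text that moves as the text is edited."""
    def __init__(self, index):
        self.index = index


class Buffer:
    """Hypothetical editor buffer; graphemes are modeled as list items."""
    def __init__(self, graphemes):
        self.graphemes = list(graphemes)
        self.markers = []
        self.point = self.make_marker(0)

    def _check(self, index):
        if not 0 <= index <= len(self.graphemes):
            raise IndexError(index)

    def make_marker(self, index):
        # Analogue of (make-text-marker text index): grounded in an integer.
        self._check(index)
        m = Marker(index)
        self.markers.append(m)
        return m

    def goto_char(self, index):
        # With integer indices this is a bounds check and one assignment.
        self._check(index)
        self.point.index = index

    def insert(self, index, new):
        self._check(index)
        new = list(new)
        self.graphemes[index:index] = new
        for m in self.markers:
            if m.index >= index:  # markers at or after the insertion shift right
                m.index += len(new)
```

Whether a marker sitting exactly at the insertion point should advance is a policy choice; this sketch always advances it.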

    > Also, what about making TEXTs immutable by default and instead let
    > TEXT-REPLACE, etc return a new text object?

Given an implementation that can do that efficiently, I see no
obstacle to implementing a new type, META-TEXT?, which is mutable in
exactly the way that TEXT? is in my proposal.   That'd be ridiculously
inconvenient though.   So, make META-TEXT? the same thing as TEXT?.

(I strongly suggest splay trees as an ideal implementation strategy
for TEXT?.   They would make _both_ mutating and functional
REPLACE efficient.)
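
The META-TEXT? argument can be sketched as follows, in Python for illustration only. `Text` and `MetaText` are hypothetical stand-ins for an immutable TEXT? and the mutable wrapper built on it. This sketch stores graphemes in a plain tuple, so each replace copies the whole text; it does not implement the splay-tree strategy and makes no claim about efficiency.

```python
class Text:
    """Immutable text: replace returns a new Text and leaves this one alone."""
    def __init__(self, graphemes=()):
        self._graphemes = tuple(graphemes)  # tuple: never modified in place

    def replace(self, start, end, new):
        """Return a new Text with graphemes[start:end] replaced by new."""
        g = self._graphemes
        return Text(g[:start] + tuple(new) + g[end:])

    def __str__(self):
        return "".join(self._graphemes)


class MetaText:
    """Mutable wrapper: holds a reference to the current immutable Text."""
    def __init__(self, text):
        self.current = text

    def replace(self, start, end, new):
        # "Mutation" is a functional replace followed by swapping the reference.
        self.current = self.current.replace(start, end, new)

    def __str__(self):
        return str(self.current)
```

Every caller that wants in-place editing ends up going through the wrapper, which is the inconvenience the paragraph above describes.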

    > >   There is no essential difference between a grapheme and a text
    > >   object of length 1, and thus the proposal makes GRAPHEME? a 
    > >   subtype of TEXT?.

    > Do we need the concept of grapheme at all, then?

Interesting question!  And it ties in with your question about "why
not just markers and not integer indexes".

I don't see a good way to ground markers _without_ integer indexes.

Graphemes are a reasonable "what the user thinks of as a character".

What does DELETE-BACKWARD-CHAR delete, at least by default, if not a
grapheme?  And in the non-default cases, how does it analyze the TEXT?
value to figure out what to do?
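
As an illustration of the default case, here is a small Python sketch that deletes one user-perceived character backward from a position in a string of code points. It approximates a grapheme as a base code point followed by combining marks, using `unicodedata.combining` from the standard library. That is a simplification: full grapheme segmentation follows more rules than this. The function names are hypothetical.

```python
import unicodedata


def previous_grapheme_start(s, index):
    """Return the start of the (approximate) grapheme ending at index."""
    i = index - 1
    # Step back over combining marks until a base code point is reached.
    while i > 0 and unicodedata.combining(s[i]) != 0:
        i -= 1
    return i


def delete_backward_grapheme(s, index):
    """Delete the grapheme before index; return (new_string, new_index)."""
    if index <= 0:
        return s, 0
    start = previous_grapheme_start(s, index)
    return s[:start] + s[index:], start
```

For example, with "e" followed by U+0301 (combining acute accent) before the cursor, both code points are removed together, where a code-point-based delete would strip only the accent.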

    > > The proposal also makes it possible to pass strings everywhere that
    > > text can be used.   I think that's the more interesting direction: 
    > > just use text- and grapheme- procedures from now on except where you
    > > _really_ want to refer to octets.

    > Could we make strings/chars go away completely over time?  For vectors
    > of octets, there is u8vector? from SRFI-4.

I wouldn't object to seeing a complete unification of STRING? with
u8vector.   I'm not so sure that the CHAR? type is particularly useful
in the long run -- it's rather culturally biased.
