Re: utf8 and emacs text/string multibyte representation
From: Raymond Toy
Subject: Re: utf8 and emacs text/string multibyte representation
Date: Wed, 29 Oct 2014 08:56:55 -0700
User-agent: Gnus/5.101 (Gnus v5.10.10) XEmacs/21.5-b34 (darwin)
>>>>> "Camm" == Camm Maguire <address@hidden> writes:
Camm> Greetings! I've recently been considering supporting unicode in gcl by
Camm> representing strings internally in utf8. It appears that emacs does the
Camm> same or similar. Apart from the obvious memory footprint benefits, I'd
Camm> like to ask what other advantages/disadvantages have been discovered.
Camm> Much of the utf8 literature emphasizes that most algorithms can proceed
Camm> conventionally in byte-wise fashion, including lexicographical ordering
Camm> comparisons, given that almost all jobs are sequential, at least
Camm> initially. A cached internal pointer storing the last referenced
Camm> codepoint offset makes access essentially O(1). Yet setting string
Camm> elements can trigger reallocations/memmove operations. While these can
Camm> be aggregated over the setting of multiple elements, operations like
Camm> nreverse look ridiculous if left in terms of calls to aref and aset.
Camm> Thoughts, advice and experiences most appreciated.
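
The cached-offset idea in the quoted message can be sketched roughly as
below. This is only an illustration in Python, not how gcl or emacs
implement it; the class name Utf8String and its method char_at are
hypothetical, and the sketch assumes well-formed UTF-8 and a single
cached position.

```python
class Utf8String:
    """Hypothetical UTF-8 backed string with a one-entry position cache."""

    def __init__(self, text: str):
        self.data = text.encode("utf-8")
        self._char = 0  # code point index of the cached position
        self._byte = 0  # byte offset where that code point starts

    @staticmethod
    def _width(lead: int) -> int:
        # Sequence length implied by the lead byte (well-formed input assumed).
        if lead < 0x80:
            return 1
        if lead < 0xE0:
            return 2
        if lead < 0xF0:
            return 3
        return 4

    def char_at(self, index: int) -> str:
        if index < 0:
            raise IndexError(index)
        if index < self._char:
            # Backward access: this sketch simply restarts from the beginning.
            self._char = self._byte = 0
        # Walk forward from the cached position; sequential access
        # advances by one code point per call.
        while self._char < index:
            if self._byte >= len(self.data):
                raise IndexError(index)
            self._byte += self._width(self.data[self._byte])
            self._char += 1
        if self._byte >= len(self.data):
            raise IndexError(index)
        width = self._width(self.data[self._byte])
        return self.data[self._byte:self._byte + width].decode("utf-8")
```

Sequential reads cost one step each, while a read behind the cached
position falls back to a linear scan in this sketch. Writes that change
a character's byte width would still need the memmove mentioned above.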
Have you looked at what other Lisp implementations do? AFAIK, none use
utf-8. CCL and clisp use utf-32, cmucl and allegro use utf-16, sbcl
and ecl(?) have two string types: 8-bit base-string and 32-bit
strings.
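
To make the storage trade-off between these representations concrete,
here is a small Python sketch that reports how many bytes the same text
occupies in each encoding. The helper name encoded_sizes is made up for
this illustration, and the little-endian codec variants are chosen only
so that no byte order mark is counted.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of text in three Unicode encoding forms."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # The explicit -le codecs do not prepend a byte order mark.
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


ascii_sizes = encoded_sizes("hello")
mixed_sizes = encoded_sizes("a\u00e9\u20ac\U0001F600z")
```

Only utf-32 uses a fixed four bytes per code point, which is what makes
plain array indexing work there. utf-16 needs two code units for code
points outside the Basic Multilingual Plane, and utf-8 uses one to four
bytes per code point.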
As a one-man operation (unfortunately), I'd go with the easiest one to
get right and follow either ccl or cmucl. The rest of the support for
unicode can be added with libraries like cl-unicode and/or babel, if
need be.
--
Ray