Re: utf8 and emacs text/string multibyte representation

From: Eli Zaretskii
Subject: Re: utf8 and emacs text/string multibyte representation
Date: Wed, 29 Oct 2014 16:51:35 +0200

> From: Camm Maguire <address@hidden>
> Date: Wed, 29 Oct 2014 10:04:58 -0400
> Greetings!  I've recently been considering supporting unicode in gcl by
> representing strings internally in utf8.  It appears that emacs does the
> same or similar.

If you haven't already, you can find some basic description of what
Emacs does in the node "Text Representations" of the ELisp manual.

> Apart from the obvious memory footprint benefits, I'd
> like to ask what other advantages/disadvantages have been discovered.

You have basically said it yourself: memory footprint vs
addressability.  If you want to discuss this in more detail, I suggest
asking more specific questions about the aspects that concern you.

> A cached internal pointer storing the last referenced codepoint
> offset makes access essentially O(1).

We indeed maintain a cache for byte-to-character and character-to-byte
conversions.
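
To make the idea of a cached position concrete, here is a minimal
sketch in Python.  It is not Emacs's implementation; the class name
Utf8Text and its methods are hypothetical, and it assumes the text is
valid UTF-8 and keeps only one cached position.  A lookup scans
forward or backward from the last position it resolved, so sequential
or nearby accesses cost a few steps rather than a scan from the start.

```python
class Utf8Text:
    """UTF-8 bytes plus a one-entry cache of (char index, byte offset)."""

    def __init__(self, text: str) -> None:
        self.data = text.encode("utf-8")
        self.cached_char = 0  # character index of the cached position
        self.cached_byte = 0  # byte offset of that same position

    def _is_continuation(self, byte_pos: int) -> bool:
        # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
        return (self.data[byte_pos] & 0xC0) == 0x80

    def char_to_byte(self, char_index: int) -> int:
        """Return the byte offset where character CHAR_INDEX starts."""
        if char_index < 0:
            raise IndexError("negative character index")
        c, b = self.cached_char, self.cached_byte
        while c < char_index:  # scan forward from the cached position
            if b >= len(self.data):
                raise IndexError("character index out of range")
            b += 1
            while b < len(self.data) and self._is_continuation(b):
                b += 1
            c += 1
        while c > char_index:  # scan backward from the cached position
            b -= 1
            while self._is_continuation(b):
                b -= 1
            c -= 1
        self.cached_char, self.cached_byte = c, b
        return b

    def char_at(self, char_index: int) -> str:
        """Decode the single character at CHAR_INDEX."""
        start = self.char_to_byte(char_index)
        if start >= len(self.data):
            raise IndexError("character index out of range")
        end = start + 1
        while end < len(self.data) and self._is_continuation(end):
            end += 1
        return self.data[start:end].decode("utf-8")
```

With a single cached position, random access far from the last lookup
still costs a scan proportional to the distance, which is why the
"essentially O(1)" claim holds only for localized access patterns.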

> Yet setting string elements can trigger reallocations/memmove
> operations.

Emacs, like every editor, needs to handle this efficiently anyway,
because editing operations rarely leave the buffer size unchanged.  So
Emacs uses a gap to minimize reallocations.
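
For readers unfamiliar with the technique, a gap buffer can be
sketched in a few lines of Python.  This is an illustration only, not
Emacs's code: the GapBuffer class, its method names, and the growth
increment of 16 bytes are assumptions made for the example.  The point
is that an insertion or deletion at the gap costs no copying, and
moving the gap copies only the bytes between the old and new
positions.

```python
class GapBuffer:
    """Byte buffer with a movable gap of free space at the edit point."""

    def __init__(self, text: bytes = b"", gap_size: int = 16) -> None:
        self.buf = bytearray(text) + bytearray(gap_size)
        self.gap_start = len(text)   # first byte of the gap
        self.gap_end = len(self.buf)  # first byte after the gap

    def __len__(self) -> int:
        return len(self.buf) - (self.gap_end - self.gap_start)

    def _move_gap(self, pos: int) -> None:
        """Move the gap so that it starts at text position POS."""
        if not 0 <= pos <= len(self):
            raise IndexError("position out of range")
        if pos < self.gap_start:
            n = self.gap_start - pos
            # Shift the n bytes before the gap to just after it.
            self.buf[self.gap_end - n:self.gap_end] = self.buf[pos:self.gap_start]
            self.gap_start -= n
            self.gap_end -= n
        elif pos > self.gap_start:
            n = pos - self.gap_start
            # Shift the n bytes after the gap to just before it.
            self.buf[self.gap_start:self.gap_start + n] = self.buf[self.gap_end:self.gap_end + n]
            self.gap_start += n
            self.gap_end += n

    def insert(self, pos: int, data: bytes) -> None:
        self._move_gap(pos)
        gap = self.gap_end - self.gap_start
        if len(data) > gap:
            # Only here does the underlying storage have to grow.
            grow = len(data) - gap + 16
            self.buf[self.gap_end:self.gap_end] = bytearray(grow)
            self.gap_end += grow
        self.buf[self.gap_start:self.gap_start + len(data)] = data
        self.gap_start += len(data)

    def delete(self, pos: int, count: int) -> None:
        if count < 0 or pos + count > len(self):
            raise IndexError("deletion out of range")
        self._move_gap(pos)
        self.gap_end += count  # deleted bytes simply join the gap

    def contents(self) -> bytes:
        return bytes(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```

Replacing a character with one of a different encoded length is then
a delete followed by an insert at the same position, with no
reallocation as long as the gap has room.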

> While these can be aggregated over the setting of multiple elements,
> operations like nreverse look ridiculous if left in terms of calls
> to aref and aset.

nreverse applied to a string is a rarity, IME.
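
If it does matter for your use case, an in-place reversal of a UTF-8
string does not have to go through per-character reads and writes.
The following Python sketch shows one well-known byte-level approach:
reverse the bytes of each character, then reverse the whole buffer.
The function name nreverse_utf8 is made up for the example, and the
code assumes the input is valid UTF-8.

```python
def nreverse_utf8(buf: bytearray) -> None:
    """Reverse the characters of a valid UTF-8 buffer in place."""

    def reverse_span(lo: int, hi: int) -> None:
        # Reverse buf[lo:hi] in place by swapping from both ends.
        hi -= 1
        while lo < hi:
            buf[lo], buf[hi] = buf[hi], buf[lo]
            lo += 1
            hi -= 1

    n = len(buf)
    i = 0
    while i < n:
        # Find the end of the character starting at i: skip its
        # continuation bytes (bit pattern 10xxxxxx).
        j = i + 1
        while j < n and (buf[j] & 0xC0) == 0x80:
            j += 1
        reverse_span(i, j)  # pre-reverse this character's bytes
        i = j
    # Reversing everything restores each character's byte order while
    # reversing the order of the characters themselves.
    reverse_span(0, n)
```

This runs in two linear passes and never changes the buffer size, so
no reallocation or gap movement is involved.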
