emacs-devel

Re: creating unibyte strings


From: Eli Zaretskii
Subject: Re: creating unibyte strings
Date: Fri, 22 Mar 2019 15:27:17 +0200

> From: Stefan Monnier <address@hidden>
> Cc: address@hidden
> Date: Fri, 22 Mar 2019 08:33:02 -0400
> 
> >> Which reminds me: could someone add to the module API a primitive to
> >> build a *unibyte* string?
> > I don't like adding such a primitive.  We don't want to proliferate
> > unibyte strings in Emacs through that back door, because manipulating
> > unibyte strings involves subtle issues many Lisp programmers are not
> > aware of.
> 
> I don't see what's subtle about "unibyte" strings, as long as you
> understand that these are strings of *bytes* instead of strings
> of *characters* (i.e. they're `int8[]` rather than `wchar_t[]`).

That's the subtlety, right there.  Handling such "strings" in Emacs
Lisp can produce strange and unexpected results for someone who is not
aware of the difference and its implications.
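
To make the byte/character distinction concrete, here is a minimal C sketch (plain C, not Emacs code) that compares the byte length of a UTF-8 buffer with the number of characters it encodes. The helper name `utf8_char_count` is hypothetical, and the sketch assumes well-formed UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the characters (code points) in a
   well-formed UTF-8 buffer.  Every byte that is not a continuation
   byte (bit pattern 10xxxxxx) starts a new character.  */
size_t
utf8_char_count (const char *buf, size_t nbytes)
{
  assert (buf != NULL || nbytes == 0);
  size_t count = 0;
  for (size_t i = 0; i < nbytes; i++)
    if (((unsigned char) buf[i] & 0xC0) != 0x80)
      count++;
  return count;
}
```

For the UTF-8 encoding of "naïve" (`"na\xC3\xAFve"`), `strlen` reports 6 bytes while `utf8_char_count` reports 5 characters. Code that treats one count as the other is the kind of mistake the distinction invites.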

> "Multibyte" strings are just as subtle (maybe more so even), yet we
> rightly don't hesitate to offer a primitive way to construct them.

Because we succeed in hiding the subtleties in that case, so the
multibyte nature is not really visible on the Lisp level, unless you
try very hard to make it so.

> > Instead, how about doing that via vectors of byte values?
> 
> What's the advantage?  That seems even more convoluted: create a Lisp
> vector of the right size (i.e. 8x the size of your string on a 64bit
> system), loop over your string turning each byte into a Lisp integer
> (with the reverted API, this involves allocation of an `emacs_value`
> box), then pass that to `concat`?

That's one way, but I'm sure I can come up with a simpler one. ;-)
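
The trade-off in the quoted paragraph can be sketched in plain C. This is only a model under stated assumptions: `lisp_word` is a hypothetical stand-in for a 64-bit Lisp object word, and `bytes_to_words` and `widening_factor` are illustrative names, not part of Emacs or its module API. It shows both sides of the argument: each element is filled by a single assignment, but the resulting vector occupies one word per byte.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumption: a Lisp object occupies one 64-bit word, as on a
   typical 64-bit build.  This type is a stand-in, not Emacs's own.  */
typedef int64_t lisp_word;

/* Widen each byte of SRC into its own word, the way a vector of
   byte values would store it.  Returns a malloc'd array that the
   caller must free, or NULL if allocation fails.  */
lisp_word *
bytes_to_words (const unsigned char *src, size_t nbytes)
{
  lisp_word *vec = malloc (nbytes * sizeof *vec);
  if (vec == NULL)
    return NULL;
  for (size_t i = 0; i < nbytes; i++)
    vec[i] = src[i];            /* one assignment per byte */
  return vec;
}

/* Storage ratio between the word vector and the original bytes.  */
size_t
widening_factor (void)
{
  return sizeof (lisp_word) / sizeof (unsigned char);
}
```

Under this model `widening_factor ()` is 8, which matches the "8x" figure in the quoted text. The sketch leaves out any per-element boxing cost in the module API.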

> It's probably going to be even less efficient than going through utf-8
> and back.

I doubt that.  It's just an assignment.  And it's a rare situation
anyway.

> Think about cases where the module receives byte strings from the disk
> or the network and needs to pass that to `decode-coding-string`.
> And consider that we might be talking about megabytes of strings.

They don't need to decode; they just need to arrange for it to be
UTF-8.


