emacs-devel

Re: "Raw" string literals for elisp


From: Daniel Brooks
Subject: Re: "Raw" string literals for elisp
Date: Sat, 02 Oct 2021 14:03:57 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Anna Glasgall <anna@crossproduct.net> writes:

> Alan (Dr. Mackenzie? Forgive me, not sure what standards are here),
> your point about strings ending in \ is very well taken and I'm frankly
> not sure what the easiest path forward here is. Having "raw literals
> cannot end in a \" is a weird and unpleasant restriction, although the
> fact that it is one that Python places on r-strings (to my considerable
> surprise; I've been using Python since the mid-00s and have never run
> across this particular syntax oddity before) may mean that it is
> perhaps not so bad. The C++ concept of allowing r-strings to specify
> their own delimiters is perhaps maximally flexible, but is definitely
> going to be a heavier lift to implement than any of the above. I'd love
> to hear people's opinions on the merits of the various possible
> approaches here.

I’ve written a little about raw strings on this mailing list. You might
read 87zgzqz6mu.fsf@db48x.net, but I can summarize or restate the parts
dealing with delimiters.

I happen to love Raku’s choice: you can use any matched pair of
nonalphanumeric Unicode characters. U+2603 SNOWMAN is a perfectly
cromulent choice of delimiter as far as Raku is concerned; an example
would be q☃foo☃. Since you can always choose a character that will not
appear in your string, this essentially eliminates all need for escaping
of the delimiter. Raku also lets you use characters that come in left-
and right-handed versions, as long as you order them correctly. For
example q«foo» is allowed, while q»foo« is not. There are Unicode
properties that allow this to work without enumerating all of the
possibilities, making it future-proof. (There are only a couple of dozen
pairs, so enumerating them is not hard either.)
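
To make the delimiter rule concrete, here is a minimal sketch in Python
(not Raku, and not Emacs reader code). It takes the "enumerate the
pairs" route rather than consulting Unicode properties. The PAIRS table
is a hypothetical, deliberately incomplete sample, the function names
are made up for illustration, and the sketch ignores nesting of
bracketing delimiters.

```python
# Hypothetical, deliberately incomplete table of left/right delimiter pairs.
PAIRS = {
    "(": ")",
    "[": "]",
    "{": "}",
    "<": ">",
    "\u00ab": "\u00bb",  # « »
    "\u2018": "\u2019",  # ‘ ’
    "\u201c": "\u201d",  # “ ”
    "\u300c": "\u300d",  # 「 」
}
CLOSERS = set(PAIRS.values())


def closing_delimiter(opener):
    """Return the character that must close a string opened by `opener`."""
    if len(opener) != 1:
        raise ValueError("delimiter must be a single character")
    if opener.isalnum() or opener.isspace():
        raise ValueError("delimiter must be non-alphanumeric and non-space")
    if opener in CLOSERS:
        # A right-handed character may not open a string: q»foo« is rejected.
        raise ValueError("right-handed character cannot open a string")
    # Paired characters close with their partner; anything else closes itself.
    return PAIRS.get(opener, opener)


def read_quoted(text):
    """Parse q<delim>body<delim> at the start of `text`.

    Returns (body, index just past the closing delimiter).
    """
    if len(text) < 2 or text[0] != "q":
        raise ValueError("expected q followed by a delimiter")
    closer = closing_delimiter(text[1])
    end = text.find(closer, 2)
    if end == -1:
        raise ValueError("unterminated string")
    return text[2:end], end + 1
```

With this sketch, read_quoted("q☃foo☃") and read_quoted("q«foo»") both
yield the body "foo", while read_quoted("q»foo«") raises ValueError.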

Then of course there are languages where the delimiters can be chosen by
the programmer but from a much more constrained set of
possibilities. C++ and Rust seem like good ones that we could mimic.

All of these delimiter styles are quite easy to implement in the reader,
but as Alan points out they can cause some complexity in the
corresponding language modes:

Alan Mackenzie <acm@muc.de> writes:

> When implementing the C++ raw strings, that flexibility caused me a lot
> of grief.  For example, changing text in the middle of a C++ raw string,
> I had to check the new text didn't, by chance, form a closing delimiter
> matching the opening one.  I would recommend not implementing anything
> like the C++ raw string identifiers.

As such, if we go this route I would recommend Rust-style over C++-style
raw strings. The Rust style is a lot like the C++ style, except that the
extra delimiter must be a sequence of # characters, matching on both
sides, rather than arbitrary source characters. Modes that want to check
for this will have an easier time with Rust-style than C++-style raw
strings.
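
As an illustration of why the Rust-style rule is easy to check, here is
a minimal sketch in Python (not the Emacs reader, and not a proposal for
the actual elisp syntax). It assumes Rust's surface form: r, then zero
or more # characters, then a double quote, closed by a double quote
followed by the same number of # characters. The function name
read_raw_string is made up for this example.

```python
def read_raw_string(text, start=0):
    """Read r"..." or r#"..."# beginning at text[start].

    Returns (body, index just past the literal). No escapes are
    processed, so backslashes in the body are kept verbatim.
    """
    i = start
    if i >= len(text) or text[i] != "r":
        raise ValueError("expected 'r'")
    i += 1

    # Count the # characters that form the extra delimiter.
    hashes = 0
    while i < len(text) and text[i] == "#":
        hashes += 1
        i += 1

    if i >= len(text) or text[i] != '"':
        raise ValueError("expected opening double quote")
    i += 1

    # The only thing that can end the literal is a quote followed by
    # the same number of # characters.
    closer = '"' + "#" * hashes
    end = text.find(closer, i)
    if end == -1:
        raise ValueError("unterminated raw string")
    return text[i:end], end + len(closer)
```

Under these assumptions, a literal such as r"C:\temp\" reads as the body
C:\temp\ (a raw string ending in a backslash), and r#"say "hi""# reads
as the body say "hi". A mode checking an edit inside such a string only
has to look for a quote followed by a run of # characters, not for an
arbitrary user-chosen identifier.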

But ultimately I prefer the exuberance and whimsy of Raku’s approach
over the more staid and pedestrian approaches taken by C++ and Rust.

db48x


