Christmas wish: Literate Elisp

Well, this is more of a proposal than a question, but here it is:

From Wikipedia: "Literate programming is a programming paradigm
introduced by Donald Knuth in which a computer program is given
an explanation of its logic in a natural language, such as English,
interspersed with snippets of macros and traditional source code,
from which compilable source code can be generated."

Emacs already supports a form of literate programming in the form of
org-mode and Babel, where we can insert code for programming
languages between #+BEGIN_SRC and #+END_SRC markers,
which is a super nice and cool feature.

However, it occurred to me that Lisps (Lisp-like languages) have natural
code markers, since all code is enclosed in parentheses. Thus one
could see '(' and ')' as code markers in literate-programming style,
closer to what Knuth proposed. In other words, Lisp (or at least Elisp)
does not need special markers to denote the start and end of code. Unlike
in Haskell, there is no need to use '\begin{code}' or '>' to differentiate
code from text (comments).

My proposal is to slightly change the Elisp parser so that a line starting
with any printable character other than '(' is treated as the start of a
comment and simply ignored, just as ';' starts a comment today. Code blocks
would still be parsed as they are now, and ';' would still mean a comment
wherever it is encountered; it is just that anything that does not
belong to a code block (a list) is a comment. For example, consider this
mail: if it were thrown at the parser, all lines up to this point would
simply be ignored, since they either are blank or do not start with '('.

(my-elisp-fun
 (progn
   (while (very-cool)
     (do-something-very-literate-here))))

Then I could have Elisp code in between, continue to write this mail,
and later on just use it as source code. Wouldn't that be a step
toward a truer and cooler literate programming language? Below is another
snippet of imaginary Elisp. If we could do this, then this email would be
working literate Elisp, where those two snippets are code and the text of
this mail is just ignored.

(my-other-fun
 (progn
   ;; this-is-some-other-fun
   (to-hopefully-demonstrate-the-idea)))

What would this achieve

Beyond a highly increased coolness factor, it would be a small quality
of life improvement. For example, it would make it slightly easier to use
org-mode for Elisp programming. Those of us who use org-mode to structure
our Emacs init file could just load the org file directly, instead of
using Babel to tangle it into an Elisp file first. If every printable
character but '(' starts a comment line, then everything in the org file
except Elisp code would simply be ignored, and only the Elisp executed.
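
For comparison, the tangle-then-load workflow mentioned above usually looks
something like the fragment below in an init file. The file name
"config.org" is only an assumed example; `org-babel-load-file' tangles the
emacs-lisp source blocks of the org file into an Elisp file next to it and
then loads that file.

```elisp
;; init.el (sketch): "config.org" is a hypothetical file name.
(require 'org)
;; Tangle the emacs-lisp blocks of config.org into config.el, then load it.
(org-babel-load-file
 (expand-file-name "config.org" user-emacs-directory))
```

With the proposed parser change, the tangling step in the middle would no
longer be needed.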

Thinking the other way around, it would also let us use pure Elisp for
literate programming without org-mode at all, although it would be a cool
feature to use org headings and similar to structure the code. It might make
code more structured and thus more readable. When I think in terms of Elisp
as the starting point rather than org-mode as the starting point, this could
result in adding org-mode organizational features directly to Elisp. One
could even mark, say, not-yet-implemented functions as TODO items, use the
calendar, etc.

We could also embed other languages within pure Elisp code without using
org-mode at all, either within some code markers so they can be extracted
to separate files, or without code markers, just as documentation or
whatever. I don't have a better example of a use case at the moment.

I don't mean that tangling files is incredibly slow, but it would be
slightly more efficient to process Elisp embedded in an org file directly.
I also don't think it is hard to type ';' at the beginning of a line to
start a comment line. But it is a small convenience, and thus a quality of
life improvement, that probably does not need many changes to the parser but
has quite a dramatic effect on how source code looks to the human eye (at
least mine, if you don't mind that I count myself as part of the
species :-)). It would let us use org-mode as a standard Elisp source code
format, which might be just a perceived convenience rather than some really
useful new thing that does not exist yet.

Some thoughts about implementation

In terms of cost effectiveness, it probably isn't that much work to
implement this, but honestly I have no idea. I believe it can't be much
work, but I am not sure, so I should really put an exclamation mark on the
word 'probably' in the sentence above. Feel free to educate me about the
cost of making it work. I looked in the C source to see if I could test
this myself before posting here, but I couldn't find where the parser is
implemented. I am sorry, but I am that bad :-(.

Essentially, when parsing literate Elisp, if I may call it so, what the
parser has to do is simply not flag random printable characters on lines
that do not belong to code blocks as errors in the source code. Instead it
should treat them as if it had seen a ';'.

It means there are just two classes of printable characters: '(', which
opens a code block, and everything else, which opens a comment block. Well,
almost. Parsing of code blocks would not need to change at all: ';' in a
code block would still mean that the rest of the line is a comment, and all
code with comments would continue to work as it does now. It would only
affect new code written in this style. However, such new Elisp code
wouldn't be backward compatible with old versions of Emacs.
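
To make the rule above concrete, here is a minimal sketch written in Elisp
itself rather than as a change to the C reader. It is only an illustration
under my own assumptions: the function name `literate-elisp-read-all' is
hypothetical, it works on a string, and it treats only lines whose very
first character is '(' as the start of a code block. The forms themselves
are still read by the ordinary `read'.

```elisp
;; Sketch only: `literate-elisp-read-all' is a hypothetical name,
;; not an existing Emacs function.
(defun literate-elisp-read-all (text)
  "Return the list of top-level forms in the literate Elisp string TEXT.
A line whose first character is `(' starts a code block, which is read
with the ordinary `read'.  Every other line is skipped like a comment."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let (forms)
      (while (not (eobp))
        ;; Point is always at the beginning of a line here.
        (when (eq (char-after) ?\()
          ;; Code block: `read' consumes the whole form, including
          ;; nested lists and `;' comments inside it.
          (push (read (current-buffer)) forms))
        ;; Skip a prose line, or whatever trails the closing paren.
        (forward-line 1))
      (nreverse forms))))

;; Usage: prose lines are dropped, code blocks come back as forms.
(literate-elisp-read-all
 "Some prose.\n(setq answer 42)\nMore prose.\n(message \"%s\" answer)\n")
;; => ((setq answer 42) (message "%s" answer))
```

The returned forms could then be passed to `eval' one by one. An unbalanced
code block still signals the usual reader error, so only the handling of
lines outside code blocks differs from today.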

As an extra, one could keep the current Elisp parser, write a new one, and,
as in Haskell, use an extra 'l' in the file suffix to denote a literate
program: '.lel'. Though it kind of looks fun, I don't think it would be
needed; I don't think this change requires a different parser to ensure
backward compatibility. I don't think that is very important either, since
if we write Elisp that needs to run on older versions, we can simply
refrain from writing it in literate form.

Drawbacks

As far as I can think, it might make spotting errors slightly harder. For
example, I could type a random character before an opening parenthesis and
comment out the entire line, but those kinds of errors are easily spotted
on the first run of the code. Another drawback would probably be syntax
highlighting. It could become much harder to detect comments in code, since
there is no ';' to mark a comment line. Maybe I am wrong about this one; it
is just a quick thought.

Final thought

I have no idea if somebody else has already thought about this and found
that it can't work. It seems like a very straightforward and simple thing,
so I am probably not the first one to think of it, and there is probably
some reason why it has not been done that I am not aware of. In that case
just ignore this email. It was just a thought I had yesterday.
Unfortunately I am not familiar enough with the Emacs source to try to
implement this myself. It is just an idea, a thought for discussion, but it
would be cool if it could work.

I am sorry for the very long email. I hope you have at least somewhat
enjoyed my ramblings and Christmas wishes, and please ask Santa to excuse
my English. I am not a native English speaker; it is just my 3rd language,
if that is any defense of my horrible writing.

From:	arthur miller
Subject:	Christmas wish: Literate Elisp
Date:	Thu, 12 Dec 2019 15:45:50 +0000