Re: Some commentary on the Org Syntax document

From: Timothy
Subject: Re: Some commentary on the Org Syntax document
Date: Fri, 03 Dec 2021 03:16:40 +0800
User-agent: mu4e 1.6.9; emacs 28.0.50

Hi Tom,

Thanks for your comments, they've been most helpful.
I have some comments on your comments, and have also started drafting
some tweaks to the document in light of your initial comments, put as a
diff excerpt at the end of this email.

For starters, I have some more general comments. However, this has
turned out a bit longer than I intended. Unfortunately I am moments away
from heading to bed, so to quote Pascal "I have only made this letter
longer because I have not had the time to make it shorter".

I think a big problem is the mix of implicit and explicit information.
Some components are rigorously specified in terms of the characters they
may contain, elements and objects that are recognised inside them, and
even the order in which different parts of the pattern are parsed.

As mentioned originally, the current Dynamic Blocks description doesn't
even mention the CONTENTS part of the pattern, and relies on the reader
inferring that it operates similarly to the CONTENTS part of Drawers.

Forcing the reader to start making inferences like this is a treacherous
path, and I think I can blame it for some of the other issues I've
experienced. Take for instance the "surely X can't contain a newline?"
comments I've made. In the Node Properties and Entities descriptions you
have statements along the lines of "X can contain any character [...]
except a newline". In my mind this then sets up the reader to interpret
a similar statement without the "except a newline" clause to mean that
newlines are permitted.

I'm also thinking that the term "element" is overworked in the document.
It's basically pulling triple duty: you have Elements, Greater
Elements, and elements which are Elements and/or Greater Elements 😓.

The naming here is quite understandable, and I think we all know that
naming things well isn't easy, but I think it would behove us to try to
give each term a single unique meaning across the document --- or at
least try to come as close to that as reasonably possible.

I think we may be able to improve this by tweaking the hierarchy of
terms and then applying it rigorously throughout the document.

At the highest level, I think we want to encapsulate Headlines,
Sections, Greater Elements, Elements, and Objects. I suppose we might
call these the *components* of an Org document. Then we have the group
of Element and Greater Elements, which are useful to clump together.
Each component is usually given in terms of a number of forms or
patterns, which usually contain terms which are elucidated in the
description of that component.

So, the hierarchy appears to be something like:

1. (Headline / Section / Greater Element / Element / Object)
2. Headline
3. Section
4. Greater Element
5. (Greater Element / Element)
6. Element
7. Object
8. Pattern / Form
9. Term

We could call (1) Components, (7) Units, (6) Objects, (5) Element or
Object (why not spell it out to avoid telling people to remember

I could have put more thought into this, but it should do for
illustrating my line of thinking. Let me know if you have any good

A separate improvement could be using more formatting to distinguish
when terms are used in a particular way.

Now for a few specific comments.

Tom Gillespie <tgbugs@gmail.com> writes:

>> As a general comment, in many places the Org Syntax document states what
>> characters a component can contain, but not what objects/elements. This feels
>> like a bit of a hole in the current specifications.
> This is indeed confusing because there are some implicit constraints
> that are not listed because they never come up.

I've sort of covered this before, but I think the document would benefit
from being more explicit in general.

> For example, you cannot have two newlines
> inside an inline footnote because the two newlines break the paragraph and the
> thing that appears to be an inline footnote is just plain text that is
> never terminated.

Specifically regarding newlines, perhaps we could add something like
this to the start of the Objects section?

"Furthermore, while many objects may contain newlines, an empty line
(i.e. a double newline) often terminates the element that the object is
a part of, such as a paragraph."
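
To illustrate the effect, here is a rough Python sketch. It is not how
org-element works: `split_paragraphs`, `inline_footnotes` and the
simplified footnote pattern are all made up for this example, and the
pattern ignores nested brackets and footnote definitions.

```python
import re

# Assumption: paragraphs are separated by one or more empty lines.
def split_paragraphs(text):
    return [p for p in re.split(r"\n[ \t]*\n", text) if p.strip()]

# Hypothetical, simplified pattern for an inline footnote "[fn:LABEL:DEF]".
# DEF may contain single newlines but no square brackets.
INLINE_FOOTNOTE = re.compile(r"\[fn:[^:\]\[]*:([^\[\]]*)\]")

def inline_footnotes(text):
    """Collect inline footnote definitions, one paragraph at a time."""
    found = []
    for paragraph in split_paragraphs(text):
        found.extend(m.group(1) for m in INLINE_FOOTNOTE.finditer(paragraph))
    return found
```

Because objects are only searched for inside a single paragraph, a
footnote whose text spans one newline is found, while one interrupted
by an empty line is never closed and stays plain text.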

> Ensuring that font locking is in sync org-element and org-export is
> critical to ensure that users know what will actually happen.

On this, I'm cautiously optimistic about the discussion about using
org-element for fontification.

>> Heading
>> ───────
>> ⁃ Ok, so `TITLE' can have any character but a newline, but what Org 
>> components can it contain?
>>   I’m going to assume any object?
> Via org-element-object-restrictions it is standard-set-no-line-break which is
> all elements except citation-reference, table-cell, and line-break.

I must thank you and Ihor for pointing me to
org-element-object-restrictions! I wasn't aware of that till now, and
it's most helpful. Should all the information given by it be included in
the Syntax document? I lean towards saying yes.

>> Drawers and Property Drawers
>> ────────────────────────────
>> ⁃ “Contents can contain any element but another drawer”
>>   • Does “any element” mean “any Element or Greater Element”
> Any element that does not have greater precedence, so that would
> be only a heading.

I'm not sure this element = Element / Greater Element "shorthand" is
doing us any favours, but I've discussed that already...

>> Dynamic Blocks
>> ──────────────
>> ⁃ It is not specified what `CONTENTS' may be
> Implicitly follows the same rules as drawers, no headings
> and no nesting of dynamic blocks. Text should be added
> that states this explicitly.

I'm drafting some changes, and this change has been added.

>> ⁃ Surely `PARAMETERS' cannot contain a newline?
> Termination by newline is implicit in the example, but the text is confusing.

Made explicit in my draft.

>> Plain Lists and Items
>> ─────────────────────
>> ⁃ It is not completely clear what content an item may have.
>>   I assume any Object?
> By my reading it may contain anything, objects and elements,
> except for a heading, but that is already implied by the de-indent.
> To quote from the docs:
> An item ends before the next item, the first line less or equally
> indented than its starting line, or two consecutive empty lines.
> Indentation of lines within other greater elements do not count,
> neither do inlinetasks boundaries.
> This makes plain lists one of the most complex elements to parse.

Is it? Perhaps I'm not doing it right but it didn't seem bad to me when
implementing my parser (though I need to add the element support).
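
For reference, the quoted rule can be sketched roughly as below. This
is a simplified Python illustration and not my parser or org-element:
`indentation` and `item_end` are hypothetical names, only spaces count
as indentation, and the exceptions for lines inside other greater
elements and for inlinetask boundaries are ignored.

```python
def indentation(line):
    # Assumption: indentation is measured in leading spaces only.
    return len(line) - len(line.lstrip(" "))

def item_end(lines, start):
    """Return the index of the first line after the item at lines[start]."""
    base = indentation(lines[start])
    empty_run = 0
    for i in range(start + 1, len(lines)):
        if lines[i].strip() == "":
            empty_run += 1
            if empty_run == 2:
                # Two consecutive empty lines end the item before them.
                return i - 1
            continue
        empty_run = 0
        if indentation(lines[i]) <= base:
            # A line indented less than or equal to the item's first line
            # ends it; this also covers the next sibling item.
            return i
    return len(lines)
```

With the lines "- first", "  continued", "  - nested", "- second", the
first item ends at index 3, so the nested item belongs to it.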

All right, that's all I have time for for now.
Hopefully some of this is of use/interest.

