Re: A vision for multiple major modes: some design notes

From: Alan Mackenzie
Subject: Re: A vision for multiple major modes: some design notes
Date: Thu, 21 Apr 2016 21:33:23 +0000
User-agent: Mutt/1.5.24 (2015-08-30)

Hello, Eli.

I'll get a fuller reply to you later.  But for now....

On Thu, Apr 21, 2016 at 05:17:09PM +0300, Eli Zaretskii wrote:
> > Date: Wed, 20 Apr 2016 19:44:50 +0000
> > From: Alan Mackenzie <address@hidden>
> > 
> > This post describes my notion of how multiple major modes {c,sh}ould be
> > implemented.  Key notions are "islands", "island chains", and "chain
> > local" variable bindings.

> Thank you for publishing this.  A few comments and questions below.
> Please keep in mind that I never had to write any Lisp that deals with
> these issues, so apologies in advance for possibly silly questions and
> misunderstandings.

[ .... ]

> More generally, perhaps it will help if you publish the rationale for
> at least the main points of this design, discussing possible
> alternatives and explaining why you ended up with the one you present
> as the design decision.  This could help us see the main issues that
> are to be dealt with, and perhaps suggest better ways of dealing with
> them.  Seeing just the final product of the design tends to limit the
> discussions to low-level details, which could easily miss the broader
> picture and issues.

It would be nice if Emacs supported several major modes in a buffer, not
just by awkward workarounds, but fully and natively.  There's no magic
involved in the emergence of the design - it's basically a naive vision
of how things should be, given the current state of Emacs.

The essence of major mode support is buffer local variables.  (Things
like the syntax table and local key map are basically buffer local
variables, even though they are not accessible as such from Lisp.)  So,
at first sight, each "island" in the buffer needs its own set of "buffer
local" variables.

However, a set of variable bindings is a big overhead in terms of RAM,
so it would make sense, wherever possible, to share these bindings
between islands with the same major mode.  Furthermore, in some use
cases, there are sequences of islands which are in essence a single
stream of text.  It thus makes sense to have "chains of islands", all
islands in a chain sharing the "chain local" variable bindings.
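
As a rough illustration of the sharing (a toy Python model, not Emacs
code; the class names `Chain' and `Island' are invented here), two
islands in one chain can point at a single set of bindings, so a value
set through one island is seen through the other:

```python
class Chain:
    """Toy stand-in for an island chain: one shared set of bindings."""
    def __init__(self, major_mode):
        self.major_mode = major_mode
        self.bindings = {}          # the "chain local" variable bindings


class Island:
    """A stretch of buffer text [start, end) belonging to one chain."""
    def __init__(self, start, end, chain):
        self.start, self.end, self.chain = start, end, chain


# Two separate islands of shell script share one chain, so they share
# one set of bindings instead of each carrying a private copy.
sh_chain = Chain("sh-mode")
first = Island(10, 40, sh_chain)
second = Island(90, 120, sh_chain)

first.chain.bindings["comment-start"] = "# "
shared_value = second.chain.bindings["comment-start"]
same_storage = first.chain.bindings is second.chain.bindings
```

The cost of a set of bindings is then paid once per chain rather than
once per island.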

There might be a need for actual "island local" variables, with a
separate value in each island.  However, Dmitry and I were unable to
identify any such variables in an earlier thread on emacs-devel.  If
any such variables became apparent, then would be the time to work out
how to implement them.

The parts of a buffer which are not in any island (we won't call these
"the ocean" ;-) also need their own variable bindings.  It seems to make
sense to use the standard buffer local bindings for these, since there
would otherwise be no use for them.  An alternative would be to construe
these regions as being islands in their own right, in their own island
chain.  However, that would fit badly with the syntactic delimiters for
islands (see below).

The above applies to most variables which are currently buffer local.
However, there are some such variables which are intrinsically to do
with the whole buffer, not individual islands within it.  These include
`buffer-undo-list', the mark, `mark-ring', .....  They must be marked as
belonging to the whole buffer, and handled as such, hence the
`entire-buffer' property applied to their symbols.

How do we implement chain local variable bindings?  Why not base them on
the implementation of buffer local bindings?  Some buffer local
variables are fixed slots in the struct buffer, the rest are elements in
an association list in the struct buffer.  Until there's a better idea,
we copy this scheme for chain local variables.  The fixed slot
variables, currently accessed through the BVAR macro, could instead be
accessed through a somewhat more involved macro called "CVAR", which
would somehow use the current position (whatever that means) to select
either the pertinent struct chain or the familiar struct buffer.
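
A minimal sketch of that selection, again as a toy Python model rather
than the C it would really be: `Bindings' plays the part of both struct
buffer and the hypothetical struct chain (fixed slots plus an
association list), and `cvar' is an invented stand-in for the proposed
macro.  Whether a chain should fall back to the buffer's value for a
variable it lacks is not settled in these notes, so the model simply
raises an error in that case.

```python
# Symbols that would carry the `entire-buffer' property.
ENTIRE_BUFFER = {"buffer-undo-list", "mark-ring"}


class Bindings:
    """Toy 'struct buffer' / 'struct chain': fixed slots plus an alist."""
    def __init__(self, **slots):
        self.slots = dict(slots)    # fixed slots, e.g. the syntax table
        self.alist = []             # (symbol, value) pairs for the rest

    def get(self, name):
        if name in self.slots:
            return self.slots[name]
        for sym, val in self.alist:
            if sym == name:
                return val
        raise KeyError(name)        # fallback behaviour is left open

    def set(self, name, value):
        if name in self.slots:
            self.slots[name] = value
            return
        self.alist = [(s, v) for s, v in self.alist if s != name]
        self.alist.insert(0, (name, value))


def cvar(buffer, chain_at, pos, name):
    """Read NAME from the chain covering POS, else from the buffer."""
    chain = None if name in ENTIRE_BUFFER else chain_at(pos)
    target = buffer if chain is None else chain
    return target.get(name)


buffer = Bindings(syntax_table="html-syntax")
buffer.set("buffer-undo-list", ["edit-1"])
js_chain = Bindings(syntax_table="js-syntax")


def chain_at(pos):
    # Hypothetical layout: one island occupying positions 100..199.
    return js_chain if 100 <= pos < 200 else None


inside = cvar(buffer, chain_at, 150, "syntax_table")
outside = cvar(buffer, chain_at, 50, "syntax_table")
undo = cvar(buffer, chain_at, 150, "buffer-undo-list")
```

Here `inside' comes from the chain, `outside' from the buffer, and the
undo list comes from the buffer even at a position inside the island.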

Given a buffer position, we need to be able to find the corresponding
island chain.  "Obviously", we do this with a text property, which we
might as well call `island', or possibly `chain'.  Since successive
accesses to chain local variables are very likely to be in the same
chain most of the time, we will cache the "current" chain in buffer
local variables.
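
The lookup and its cache might behave roughly as follows.  This is a
toy Python model: a sorted list of intervals stands in for the text
property, the class name `ChainIndex' is invented, and the cache is an
object attribute rather than a buffer local variable.

```python
import bisect


class ChainIndex:
    """Map buffer positions to chains, remembering the last island hit."""
    def __init__(self, intervals):
        # intervals: non-overlapping (start, end, chain), end exclusive
        self.intervals = sorted(intervals)
        self.starts = [start for start, _, _ in self.intervals]
        self.cache = None           # the "current" (start, end, chain)
        self.misses = 0             # how often the full lookup ran

    def chain_at(self, pos):
        if self.cache and self.cache[0] <= pos < self.cache[1]:
            return self.cache[2]    # same island as the previous access
        self.misses += 1
        i = bisect.bisect_right(self.starts, pos) - 1
        if i >= 0 and pos < self.intervals[i][1]:
            self.cache = self.intervals[i]
            return self.cache[2]
        return None                 # position is outside every island


index = ChainIndex([(10, 40, "sh-chain"),
                    (50, 70, "awk-chain"),
                    (90, 120, "sh-chain")])
first_lookup = index.chain_at(15)   # full lookup, fills the cache
second_lookup = index.chain_at(20)  # answered from the cache
misses_so_far = index.misses
```

Successive accesses within one island cost a single range comparison;
only a move to another island triggers the full lookup.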

We want `parse-partial-sexp' and friends to work "properly" wrt islands.
It is immediately clear that the syntactic context of each island chain
is independent of other chains and of the regions outside islands.  It
is also clear that the syntactic context at the end of an island should
be preserved and used as the starting value at the start of the next
island in the same chain.  It thus seems sensible to introduce new
syntactic classes "open island" and "close island" to facilitate this.
Why not give them the characters "{" and "}", which are currently
unused?  This method of delimiting islands does, however, force us to
deal with nested islands.  Clearly, our parser state must be amended to
deal with these stacked and suspended states.
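
The stacking and suspending can be sketched with a toy scanner (Python,
purely illustrative).  The "state" here is nothing more than a
parenthesis depth, the token format is invented, and the chain of each
island is given directly in the "open" token rather than being found
from a text property:

```python
def scan(tokens):
    """Return the final paren depth of the outer text and of each chain."""
    saved = {}                  # chain -> depth at end of its last island
    stack = []                  # suspended (chain, depth) outer states
    chain, depth = None, 0      # None = text outside every island
    for kind, value in tokens:
        if kind == "open":      # "open island": suspend the current state
            stack.append((chain, depth))
            chain, depth = value, saved.get(value, 0)
        elif kind == "close":   # "close island": save, resume outer state
            saved[chain] = depth
            chain, depth = stack.pop()
        else:                   # ordinary text: count parentheses only
            depth += value.count("(") - value.count(")")
    saved[None] = depth
    return saved


tokens = [
    ("text", "(outer "),
    ("open", "sh"),
    ("text", "(a (b "),
    ("open", "awk"),            # an island nested inside the sh island
    ("text", "f(g( "),
    ("close", None),
    ("text", "c) "),
    ("close", None),
    ("text", "more) "),
    ("open", "sh"),             # next island in the same sh chain
    ("text", "d"),
    ("close", None),
]
depths = scan(tokens)
```

The second "sh" island starts from the depth at which the first one
ended, while the outer text and the nested "awk" island each keep their
own count.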

It is currently unclear whether `syntax-ppss' needs to return this
amended state, or whether the simple "state within the chain" would be
adequate.  It is clear that syntactic commands such as `forward-list'
(C-M-n) must confine their operation to a single island chain.

When it comes to movement and search primitives, we want to adapt these
so that the impact on existing major modes is minimised.  Ideally, we
would want major modes to "see" only their own islands (or lack
thereof).  Thus we treat irrelevant islands as blocks of whitespace.  It
seems to make sense to have such islands matched by subexpressions in
regexps which match spaces.  This would obviate the need to amend a
great number of regexps currently coded in major modes.

On the other hand, when a user does C-s or C-M-s, the Right Thing is
surely to search the buffer as a whole, without regard to islands.  We
therefore need a flag which instructs the primitives how to behave when
there are islands.  We might as well call this flag `in-islands', for
want of a better name.
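
The two behaviours can be contrasted in a small Python model.  It is
only an approximation: real primitives would not copy the buffer text,
the "<% ... %>" delimiters and the function names are invented, and
only islands belonging to other chains are blanked, since the notes do
not say more than that.

```python
import re


def mask_foreign_islands(text, islands, chain):
    """Return TEXT with islands of other chains replaced by spaces.

    islands is a list of (start, end, chain), end exclusive;
    chain=None stands for the text outside every island.
    """
    chars = list(text)
    for start, end, owner in islands:
        if owner != chain:
            chars[start:end] = " " * (end - start)
    return "".join(chars)


def search(pattern, text, islands, chain=None, in_islands=False):
    """Search the raw text, or the masked view when in_islands is set."""
    if in_islands:
        text = mask_foreign_islands(text, islands, chain)
    return re.search(pattern, text)


text = "x = f(a, <% b(c %> d)"
islands = [(text.index("<%"), text.index("%>") + 2, "template")]

# Major mode code for the outer text: the island reads as whitespace,
# so an existing regexp using \s+ matches across it unchanged.
mode_hit = search(r"f\(a,\s+d\)", text, islands, in_islands=True)

# User-level search (C-s) looks at the whole buffer, islands included.
user_hit = search(r"b\(c", text, islands)

# The same pattern is invisible to the outer mode's view.
mode_miss = search(r"b\(c", text, islands, in_islands=True)
```

Because the masked view has the same length as the original text, match
positions still correspond to buffer positions.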

The user will, from time to time, delete the delimiters which define
islands, and will insert other ones.  The super mode needs to be able to
react to these actions, amending its island chains appropriately.  I
have not been able to come up with an adequate scheme for this using
only before/after-change-functions.  These variables are going to be
chain local, and the buffer local values will hold functions for the
buffer regions not in islands.  So we introduce
`island-before/after-change-function', entire-buffer local variables,
each of which will hold a single function intended for adjusting island
chains.  Their return values will direct Emacs which islands need
`before/after-change-functions' invoking on them.

To minimise changes to major modes, quite a few primitives (such as
`skip-syntax-forward' and `next-single-property-change') will be amended
to restrict themselves to island chains when `in-islands' is bound to a
non-nil value.

Several Emacs subsystems will need enhancement, in particular redisplay
and font-lock.

Sorry this has turned out so long, so pedestrian, and so boring.  :-(
As promised, I have had no magic insights, no sparkling innovations in
drawing up these notes - just a sequence of humdrum decisions, one after
the other.  If I've missed out anything relevant, please say so, then I
can try and fill in the gap.

It's also clear that what I'm proposing can't be implemented in a couple
of weekends - it would be a long hard grind.  But it would enable super
modes to be written with comparative ease.

> Thanks.

Alan Mackenzie (Nuremberg, Germany).
