Re: generic buffer parsing cache data

From: Paul Pogonyshev
Subject: Re: generic buffer parsing cache data
Date: Sun, 1 Jul 2007 16:41:58 +0300
User-agent: KMail/1.7.2

martin rudalics wrote:
>  > I propose to add something generic.  For instance, Python mode needs to
>  > know indentation level of blocks.  It seems that `syntax-ppss` doesn't
>  > return it at all.  And adding everything that might ever be needed by
>  > some XYZ mode seems counter-productive and complicates an already complex
>  > function and its return value.
>  >
>  > I just mean that major modes can have needs beyond that suited by
>  > `syntax-ppss`.  And as far as I can see, they can either parse half of
>  > the buffer each time they need something, or invent some ad-hoc custom
>  > code for caching such data.
> Like `c-state-cache'.  Well, `syntax-ppss' can only do whatever
> `parse-partial-sexp' does.  Occasionally, that's not even sufficient for
> the Elisp case (look how `lisp-font-lock-syntactic-face-function'
> strives for detecting doc-strings).  I'd appreciate if you came up with
> something more "generic" (if you just could give a clear description of
> that term).

For instance, something like this:

    Function: put-cache-data key data &optional pos

        Store cache DATA with given KEY in the current buffer, at position
        POS (or at point if POS is not specified).

    Function: get-cache-data key &optional pos

        Return cache data associated with given KEY in the current buffer
        at position POS (or at point if POS is not specified).  If there is
        no data with that KEY stored at that position, or if it has been
        invalidated, return nil.

Internally, the Emacs core (at the C level) would automatically invalidate
cache data from X onwards whenever buffer text from X to Y (Y >= X) changes
in some way.  Whether cache data is actively removed from internal storage
or just marked invalid is an implementation detail, irrelevant at the
Elisp level.
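
The put/get/invalidate behaviour described above can be modelled outside
Emacs with a small Python sketch.  This is only a toy model of the proposal,
not Emacs code: the class name `BufferCache` and the method `text_changed`
are hypothetical, and the sketch assumes that data stored exactly at X is
also invalidated.

```python
class BufferCache:
    """Toy model of the proposed per-buffer cache (hypothetical names)."""

    def __init__(self):
        self._entries = {}  # (key, pos) -> data

    def put(self, key, pos, data):
        # Counterpart of the proposed put-cache-data.
        self._entries[(key, pos)] = data

    def get(self, key, pos):
        # Counterpart of the proposed get-cache-data; None plays the role of nil.
        return self._entries.get((key, pos))

    def text_changed(self, start, end):
        # Text from START to END changed: drop everything stored at START or
        # later.  END does not affect the result, because invalidation runs
        # to the end of the buffer anyway.
        self._entries = {
            (key, pos): data
            for (key, pos), data in self._entries.items()
            if pos < start
        }


cache = BufferCache()
cache.put("indent", 10, 4)
cache.put("indent", 50, 8)
cache.text_changed(30, 35)  # an edit somewhere between the two entries
```

After the simulated edit, the entry at position 10 is still available, while
the entry at position 50 has to be recomputed by the mode.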

It is unclear whether changes in text properties should lead to cache
invalidation.  Probably not, at least by default.

It also makes sense to define some `anchors'.  These would partition a
buffer into parts, so that changes in one part don't invalidate cache data
in other parts.  For instance, in Python mode anchors would be set wherever
a toplevel block is defined, since parsing stops on reaching a toplevel
block anyway.  However, this can be added later: for one thing, it is not
clear when and how to remove anchors.  (E.g. in Python mode, if a toplevel
block is indented to another level, it should stop being an anchor.)
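
One possible reading of the anchor idea, as a Python sketch: a change from
START to END invalidates cached entries only up to the first anchor after
END.  The function name `invalidate` and the data layout are assumptions
made for illustration.  The sketch also ignores the fact that positions
after an edit shift, which a real implementation would have to track (for
example with markers).

```python
import bisect


def invalidate(entries, anchors, start, end):
    """Return ENTRIES without those stored from START up to the next anchor.

    ENTRIES maps (key, pos) to data; ANCHORS is a sorted list of positions.
    Both the layout and the exact boundary rule are assumptions.
    """
    # First anchor strictly after the end of the changed region.
    i = bisect.bisect_right(anchors, end)
    limit = anchors[i] if i < len(anchors) else float("inf")
    return {
        (key, pos): data
        for (key, pos), data in entries.items()
        if not (start <= pos < limit)
    }


entries = {("indent", 10): 0, ("indent", 30): 4, ("indent", 120): 0}
anchors = [0, 100, 200]  # e.g. starts of toplevel blocks
remaining = invalidate(entries, anchors, 20, 25)
```

Here the entry at position 30 is dropped, while the entry at position 120
survives because it lies beyond the anchor at 100.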

A major mode must store cache data at some logical position, so that it
can find the data again later.  Maybe it also makes sense to add

    Function: find-cache-data key &optional pos

        Find and return cache data with given KEY at POS (or at point) or
        _before it_.  Return nil if there is no (valid) cached data with
        that KEY at POS or anywhere before it.
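
A lookup "at POS or before it" could be modelled with a sorted list of
positions per key and a binary search.  The sketch below is a Python toy
model, not Emacs code; the class name `SortedCache` is hypothetical, and it
assumes that the nearest earlier entry is the one wanted, which the
description above does not spell out.

```python
import bisect


class SortedCache:
    """Toy cache that supports 'at or before' lookups (hypothetical names)."""

    def __init__(self):
        self._positions = {}  # key -> sorted list of positions
        self._data = {}       # (key, pos) -> data

    def put(self, key, pos, data):
        if (key, pos) not in self._data:
            bisect.insort(self._positions.setdefault(key, []), pos)
        self._data[(key, pos)] = data

    def find(self, key, pos):
        # Counterpart of the proposed find-cache-data: return the data stored
        # at POS or at the nearest earlier position, or None if there is none.
        positions = self._positions.get(key, [])
        i = bisect.bisect_right(positions, pos)
        if i == 0:
            return None
        return self._data[(key, positions[i - 1])]


sorted_cache = SortedCache()
sorted_cache.put("indent", 10, "block-a")
sorted_cache.put("indent", 40, "block-b")
```

With these two entries, a lookup at position 25 falls back to the data
stored at position 10, and a lookup at position 5 finds nothing.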

However, I don't see any obvious ways of using it.  As far as I can see,
modes should access cache data like this (in pseudocode):

    mode-get-cache-data:
        data = (get-cache-data mode-key)
        if data is nil:
            data = (mode-compute-cache-data)
            (put-cache-data mode-key data)
        return data

    mode-compute-cache-data:
        higher-level-data = (mode-get-cache-data)  # at the higher-level block
        data = (mode-compute-data-from-higher-level higher-level-data)
        return data
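
The access pattern above can be tried out as a runnable Python model.  The
`PARENT` table, the `"depth"` key, and the explicit `pos` argument are
assumptions added for illustration; a plain dictionary stands in for the
proposed get-cache-data and put-cache-data, and the computed value is simply
the nesting depth.

```python
cache = {}          # (key, pos) -> data; stands in for the proposed cache
compute_calls = []  # positions for which data actually had to be computed

# Hypothetical block structure: position -> position of the enclosing block.
PARENT = {0: None, 10: 0, 20: 10, 30: 20, 40: 10}


def mode_get_cache_data(pos):
    data = cache.get(("depth", pos))
    if data is None:
        data = mode_compute_cache_data(pos)
        cache[("depth", pos)] = data
    return data


def mode_compute_cache_data(pos):
    compute_calls.append(pos)
    parent = PARENT[pos]
    if parent is None:
        return 0  # toplevel block: nothing higher to consult
    # Derive the data from the higher-level block, which may itself be cached.
    return mode_get_cache_data(parent) + 1


depth_30 = mode_get_cache_data(30)
depth_40 = mode_get_cache_data(40)
```

The second call only computes data for position 40 itself, because the data
for its enclosing block at position 10 was already cached by the first call.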

Here `higher-level' is not the same as `previous'.  For instance, in
Python mode it makes sense to compute indentation from the block this one
is nested in, not just from the previous block:

    class X:
        class Y: # <-- higher-level block for the current block
            class Z:
                def bla (): # <-- previous block (with cached data)
            def __init__(self): # <-- current block
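
To make the distinction concrete, here is a small Python sketch that finds
the higher-level block for a line by scanning backwards for the nearest
line with smaller indentation.  The helper names `indent_of` and
`enclosing_block` are hypothetical, and the rule is a simplification that
ignores continuation lines, strings, and comments.

```python
SOURCE = """\
class X:
    class Y:
        class Z:
            def bla ():
        def __init__(self):
""".splitlines()


def indent_of(line):
    """Return the number of leading spaces on LINE."""
    return len(line) - len(line.lstrip(" "))


def enclosing_block(lines, index):
    """Return the index of the nearest earlier line indented less, or None."""
    target = indent_of(lines[index])
    for i in range(index - 1, -1, -1):
        if lines[i].strip() and indent_of(lines[i]) < target:
            return i
    return None


current = 4                                # the `def __init__(self):` line
higher = enclosing_block(SOURCE, current)  # index of the `class Y:` line
previous = current - 1                     # the `def bla ():` line
```

For the `__init__` line, the scan skips both `def bla ():` and `class Z:`
and stops at `class Y:`, which is the block whose data the indentation
should be derived from.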

