Re: "Font-lock is limited to text matching" is a myth

emacs-devel

From:	Daniel Colascione
Subject:	Re: "Font-lock is limited to text matching" is a myth
Date:	Tue, 11 Aug 2009 08:13:28 -0400

Morning, Steve. It's good to see you jump into the discussion. Thanks for all the work you've put into js2-mode.


On Aug 11, 2009, at 2:47 AM, Steve Yegge wrote:

js2-mode performs both syntactic and (some) semantic analysis.  It
knows, for instance, when you're using a symbol that's not defined in
its file.  js2-mode does not currently understand project structure,
but I'm doing some work in this area, and it may at some point gather
semantic information collected from several files.

I agree that some kind of project abstraction is certainly in the purview of Emacs, and is sorely needed. But a single major-mode is not the place for that kind of infrastructure. espresso-mode, for now, just looks at all extant buffers in espresso-mode when gathering symbol information. That seems like a reasonable approach to use while we don't have a project system.
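
A minimal sketch of the "look at all extant buffers in the mode" approach follows. It is not espresso-mode's actual code: the function name `my-demo-collect-function-names` and the deliberately naive regexp are assumptions made for illustration only.

```elisp
;; Hypothetical helper; the name and the regexp are illustrative assumptions.
(defun my-demo-collect-function-names (mode)
  "Return names declared as `function NAME' in every live buffer using MODE.
MODE is a major-mode symbol; buffers in modes derived from it count too."
  (let ((names nil))
    (dolist (buf (buffer-list))
      (with-current-buffer buf
        (when (derived-mode-p mode)
          (save-excursion
            (save-restriction
              ;; Look at the whole buffer, even if it is narrowed.
              (widen)
              (goto-char (point-min))
              (let ((case-fold-search nil))
                (while (re-search-forward
                        "\\_<function[ \t]+\\([A-Za-z_$][A-Za-z0-9_$]*\\)"
                        nil t)
                  (push (match-string-no-properties 1) names))))))))
    ;; Restore discovery order and drop duplicates.
    (delete-dups (nreverse names))))
```

Calling `(my-demo-collect-function-names 'espresso-mode)` would then scan every open buffer in that mode. No project abstraction is involved, which is the trade-off described above.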

There is a relatively simple alternative that might appease Daniel:
I could have js2-mode simply not do any highlighting by default,
except for errors and warnings.  We'd use whatever highlighting is
provided by espresso-mode, and users would be able to choose between
espresso-highlighting and js2-mode highlighting.  With the former,
they'd get "instantaneous" font-locking, albeit not as rich as what
js2-mode can provide.

That's an interesting idea, but my concerns are not limited to js2-mode's highlighting. js2-mode is a valuable experiment, but I stand by my assertion that js2-mode represents a fundamentally wrong way to design major modes, and represents a possible future I would like to avoid.

I recognize that full parsing can take substantial time, but fontification and indentation *must* be synchronous. If the parser cannot perform well enough to play a part in fontification and indentation, then it must be a separate and optional module.

If you propose relegating js2's parser to error reporting, then wouldn't it be better to package it up as a separate, optional minor mode? There is *already* a separate and optional full parsing framework called CEDET that is powerful, generic, and not tied to a major-mode in particular. The right approach is for a given major-mode to understand enough of a given language for fontification and indentation while leaving more substantial parsing and indexing to CEDET (which the user can disable). I recognize that js2's parser may work well in its problem domain --- couldn't it just be added to CEDET?

Alternatively, error checking can be offloaded to a separate processing tool that regularly scans the buffer. Something like Flymake would seem appropriate here.

Even if js2-mode's parser could be made fast enough for synchronous fontification, it still would remain an isolated module apart from CEDET's infrastructure, and it would still make the mode brittle in the face of minor syntactic variations (which might arise, for example, from using the C preprocessor on Javascript like we do). Rigid parsers with nonterminals and error productions appear superficially attractive, but using them for all aspects of a mode not only leads to the issues you discuss below, but also prevents that mode from being reused for similar languages without the grammar being re-worked. It's the wrong approach.

This would be trivial to change.  I am actively maintaining js2-mode,

My apologies, then. I had merely looked at the lack of commits and releases.

Errors and warnings would still need to be asynchronous (if they're
enabled).  So, too, would the imenu outline and my in-progress
buffer-based outline, which is somewhat nicer than the IMenu one.

Have you looked at the espresso imenu support? It heuristically captures much of the object structure of a typical Javascript file and produces something quite nice, if I say so myself. It's fast enough that the parsing operation can be performed synchronously when an imenu data structure is required.

But I think the main objection to js2-mode revolves around its
highlighting, correct?  If so, AND if we can solve the font-lock
integration issues, AND if we can fix the multi-mode issues (II
below), then I'm hopeful that js2-mode might become a reasonable
choice as the default editing mode for JavaScript.

I think espresso-mode is a fine fallback position.  Anything but
java-mode!  The default today is java-mode, and I had no qualms about
replacing it as the default for JavaScript.

Indeed. java-mode is unacceptable. But it's not quite as unacceptable as it appears at first glance: cc-mode would be an excellent platform on which to base a Javascript mode, and I only created espresso-mode after becoming frustrated time and again in trying to extend cc-mode to support Javascript. Unfortunately, cc-mode's core includes some baked-in assumptions about the language that, IIRC, include function declarations requiring parameter types. It's a shame, really, because Javascript is close enough to C that they could conceivably share a mode, whereas something like Perl clearly requires a world unto itself.

Note: diagnostic messages in js2-mode are highlighted using overlays.
I tried using overlays for all highlighting but it was unacceptably
slow and had a tendency to crash Emacs.

I've had the same thought. Just as an aside, overlays seem like a much better conceptual fit for fontification than text properties do. To this day, I believe the old XEmacs extent system, which was similar to overlays, represented a more elegant approach than the semipermanent text properties used by GNU Emacs. Nevertheless, text property fontification isn't bad enough to warrant the terrible backward compatibility problems that would be generated by a switch to overlays.


II. Multi-mode support

JavaScript is especially needful of mumamo (or equivalent) multi-mode
support, because much of the JavaScript in the wild is embedded in
HTML, in template files, even in strings in other languages.

js2-mode does not support mumamo (or mmm-mode, with which I am
currently more familiar) because js2-mode's lexer needs to support
ignoring parts of the buffer.  I do not think this would be very
hard to implement, but I have not done it yet.

If I don't get to it before the next version of Emacs launches, then I
think this should effectively disqualify js2-mode from being the
default JavaScript mode.  It would be an inconsistent user experience
to have one JavaScript mode in .js files and another mode for
JavaScript inside multi-mode-enabled files.

I agree. If I recall correctly, I've also used rather strong language to describe the current state of multi-mode support in Emacs. (So it's not just js2-mode.) I still believe that some kind of indirect buffer solution in the core would be the most effective and elegant approach to supporting multiple modes. However, mumamo seems to have improved quite a bit, and if it exposes sufficient information to the underlying major modes, it shouldn't be hard to modify either js2-mode or espresso-mode to work with it. (Though I do have to ask how your parser will deal with possibly non-contiguous chunks of Javascript.)

I'm ready to give it a try, though, and I'll ping Lennart offline about
integrating the two somehow.

III. Incremental and partial parsing

Lennart and others have asked whether it is possible for js2-mode to
support partial or incremental parsing.  The short answer is
"incremental: yes; partial: no".

nxml-mode, last I checked, does incremental parsing.  It parses ahead
in the buffer, but then stops and saves its state.  If you jump forward
in the buffer, it resumes and continues the parse until some point
beyond the section you're viewing.


espresso does precisely this kind of parsing, as does cc-mode.

js2-mode could do it this way without much additional effort.


That's good to hear.

I've been an Emacs user for 20+ years now, and like many I found
the idea of a parsing delay to be somewhere between "undesirable"
and "sickening".

That would describe my experience. May I add "maddening" and "distracting"?

But the majority of programmers today have
apparently learned not to notice delays of ~1sec as long as it
never interferes with their typing or indentation (see IV below).

That other programmers have resigned themselves to inferior fontification is no argument for Emacs to accept it. Asynchronous fontification is completely unacceptable for me, and if it were to become commonplace and unavoidable in Emacs, I would simply stay with older versions.

So after looking at my ~8000 lines of elisp devoted to parsing
JavaScript, I weighed it and decided not to support partial parsing.
It's certainly possible to support it, but I think my time would be
better spent on things that average users are more likely to notice.

cc-mode has a surprisingly complex and robust reparsing system that caches both whitespace runs and syntactic information necessary for indentation. It'd be nice to be able to reuse that.

As it is, espresso uses an incremental parsing scheme that feeds into the fontification layer.
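
For readers unfamiliar with the general technique: Emacs itself ships a small incremental scanner of this kind, `syntax-ppss`, which caches parse states at earlier buffer positions and resumes from the nearest one instead of rescanning from the start. The sketch below only illustrates that cache-and-resume idea with the built-in function. It is not espresso's or cc-mode's scheme, and the helper name `my-demo-in-string-or-comment-p` is hypothetical.

```elisp
;; Hypothetical helper built on the real built-in `syntax-ppss'.
(defun my-demo-in-string-or-comment-p (pos)
  "Return non-nil if POS is inside a string or comment.
`syntax-ppss' reuses cached parser states from earlier positions, so
repeated queries further down the buffer do not rescan from the top."
  (let ((state (save-excursion (syntax-ppss pos))))
    ;; Element 3 is non-nil inside a string, element 4 inside a comment.
    (or (nth 3 state) (nth 4 state))))
```

A fontification or indentation function can call such a predicate synchronously because the expensive scanning is amortized across calls.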

IV.  Indentation

The indentation in js2-mode is broken.  I'll be the first to say it.

It is based on the indentation in Karl Langstrom's mode, which does a
better job for JavaScript than any indenter based on cc-engine, but
that doesn't mean it's a good job.  And it's essentially unconfigurable.


espresso-mode shares this problem, which means that for this
important use case it is not an improvement over js2-mode.

Indeed. I've implemented some changes to the indentation system, but the basic approach remains the same. I should point out, however, that cc-mode's indentation, however convenient, is the exception to the typical Emacs rule. Most modes have only a few knobs the user may tweak to adjust indentation, and users seem happy with those. Also, as Alan Mackenzie mentioned in another thread, cc-mode's indentation is a maintenance burden. I'm not quite sure moving away from Karl Langstrom's indentation approach is worth the trouble right now.

Daniel's objections to js2-mode's non-interaction with font-lock
apply equally to the non-interaction with cc-engine's indentation
configuration system.  The indent configuration for JavaScript should
share as many settings as practical with cc-mode.

I actually made a serious attempt to generate the `c-style-alist'
data structure for js2-mode using the parse tree, but ran into three
issues: [snipped]

I encountered the same problems myself when trying to implement the same feature.

V. Font Lock framework design problems
There seems to be a common misconception flitting about to the
effect that font-lock is perfect and will never need to change.

Nobody is making this claim, and it would be a foolish one to make. Of course font-lock can be improved. But the fundamental approach is sound.

Actually, there seems to be a common misconception that font-lock is an ancient, decrepit mess that's preventing Emacs from striding forward into the "modern" world. Far from it: used properly, font-lock is flexible and powerful. I'd love to see some improvements, such as syntactic keywords being pushed down to a lower level, but the basic idea is sound. In one system, one can combine everything from fast, efficient keyword fontification to arbitrarily complex schemes that depend on elaborate contexts and subtle rules. espresso-mode performs this kind of mixing, actually. Font-lock confers many benefits in terms of reusability, modularity, and customizability, and it would be a waste to replicate it instead of augmenting it. (You have some excellent ideas for doing that below.)

Really, those who dislike font-lock have the same mindset as those who dislike X11. Like font-lock, X11 is an old, powerful system that superficially appears poorly-designed. What detractors ignore is that old, mature systems embody years of experience in the problem domain, and that attempts at ground-up rewrites typically lead to either a system with a reduced feature-set relative to the original, or a mere reimplementation of the original system in different terms, and without the benefit of the experience embodied in the original system.

This is a somewhat paradoxical viewpoint in view of the corpses
littering the path to jit-lock, which include font-lock, fast-lock,
lazy-lock, and vapor-lock.  Each decade we've had a cadre of people
claiming that *-lock meets everyone's needs, and then it gets rewritten
anyway.

I'm not aware of the authors of *any* of these modes making that claim. The facilities you mention have all been incremental improvements on the basic font-locking idea. Do you really want to discourage that kind of experimentation? Also, in defense of these modes, jit-lock depends on core Emacs functionality that has not always been available, and some of the modes you mentioned would doubtless not have been written if jit-lock had been available earlier.

So it's hard to understand how it remains such a popular viewpoint.

I'll make yet another attempt to dispel it, since once we're past the
emotional stumbling blocks, font-lock may be able to evolve again.

Va) Inadequate/insufficient style names
Vb) Ad-hoc default faces that are not being autoloaded
Vc) Additional semantic styles not needed by JavaScript
Vd) Composable semantic styles

I fully agree with these points. While the default font-lock faces have been generally adequate over the years, adding a set of richer faces (that perhaps inherit from the traditional ones) would be welcome. A composable set of styles is an interesting idea too, and it'd be great to see.
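
As a rough illustration of a richer face that inherits from a traditional one, something along these lines would work. The face name `my-demo-property-name-face` is hypothetical and not part of Emacs or of either mode.

```elisp
;; Hypothetical face: it looks like `font-lock-variable-name-face' until a
;; user or theme customizes it separately.
(defface my-demo-property-name-face
  '((t :inherit font-lock-variable-name-face))
  "Face for object property names (illustrative only)."
  :group 'font-lock-faces)
```

Existing themes keep working because the new face falls back to the traditional one, while users who want finer distinctions can customize it on its own.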

Vf) No font-lock interface for setting exact style runs

I could be mistaken here -- if so, please correct me.

The problem is that I need a way, in a given font-lock redisplay, to
say "highlight the region from X to Y with text properties {Z}".

This use case does not seem like it should be inordinately difficult
to support, but it does not seem to be supported today.

As I detailed in '"Font-lock is limited to text matching" is a myth', explicit fontification has essentially always been possible in font-lock. cc-mode has used it for over a decade, and today, both espresso-mode and nxml use this regrettably poorly-documented facility.
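
One way to get explicit fontification inside font-lock is a function matcher: an entry in the keyword list that is a function rather than a regexp, which font-lock calls with a search limit. The sketch below assumes a matcher that applies faces itself and then returns nil, so that font-lock adds nothing further. All `my-demo-*` names are hypothetical, and the code is not taken from cc-mode, espresso-mode, or nxml.

```elisp
;; Hypothetical names throughout; a sketch, not any existing mode's code.
(defun my-demo-fontify-matcher (limit)
  "Fontify every TODO between point and LIMIT directly.
Always return nil, so font-lock applies no highlighting of its own."
  (let ((case-fold-search nil))
    (while (re-search-forward "\\_<TODO\\_>" limit t)
      ;; "Highlight the region from X to Y with face Z."
      (put-text-property (match-beginning 0) (match-end 0)
                         'face 'font-lock-warning-face)))
  nil)

(defvar my-demo-font-lock-keywords '(my-demo-fontify-matcher)
  "Keyword list whose single entry is a function matcher.")

(define-derived-mode my-demo-mode prog-mode "Demo"
  "Hypothetical major mode that fontifies through a function matcher."
  (setq-local font-lock-defaults '(my-demo-font-lock-keywords)))
```

A real mode would replace the regexp search with its own scanner or parser output. Because the work still happens inside a font-lock pass, jit-lock and refontification after edits continue to apply.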

If this simple feature were supported, I would have a great deal more
incentive to try to get my parsing to be fast enough to work within
the time constraints users expect from font-lock.

I've taken pains in espresso-mode to ensure that synchronous operations are fast enough to be used interactively, even on large files. Were js2's parsing to also become fast and synchronous, some of my objections would indeed evaporate.

Vg) Lack of differentiation between mode- and minor-mode styles

One of the most common complaints from the thousands of users of
js2-mode, most of whom have exercised enough self-restraint to use the
term "work in progress" in preference to "abomination", is that
js2-mode has poor support for minor modes that do their work with
font-lock -- 80-column highlighters being a popular example, although
there are others.

As I mentioned earlier, my diction reflects not js2-mode's maturity, but its fundamental structure. I believe it is wrong, and "abomination", while incendiary, is correct. I don't want the future of Emacs to be chock full of modes like js2.

For one thing, it's possible (as Daniel observes) to bypass this
mechanism and call font-lock-apply-highlight directly, which makes
the reverse-engineering even more cumbersome and fragile.


Quite the opposite, actually.

(Vf) is the reason (Vg) is a problem for js2-mode.  font-lock-defaults
does not seem to be a very satisfactory way to apply 2000-10000
precise style runs to a buffer, so I do all my own highlighting,
and it doesn't include style-run contributions from minor modes.

When using font-lock-apply-highlight, or its moral equivalents, user and minor-mode font locking is automatically composed with the major mode's. By using the 'prepend and 'append operators, minor modes and users may state the priority of their fontification rules with respect to those of the major mode. Niceties like these have grown with Emacs for years, and a great deal is lost when a particular major-mode attempts to re-implement core functions to account for some imagined, or at worst, temporary deficiency.
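
To make the composition concrete, here is a minimal sketch of an 80-column highlighter written as a minor mode. The names `my-long-line-mode` and `my-long-line-keywords` are hypothetical. The `prepend` override flag in the keyword asks font-lock to put the warning face in front of whatever face the major mode already applied to the same text.

```elisp
;; Hypothetical minor mode; names are illustrative assumptions.
(defvar my-long-line-keywords
  ;; Group 1 is everything past column 80.  `prepend' merges the face in
  ;; front of any face the major mode's rules have already set.
  '(("^.\\{80\\}\\(.+\\)$" 1 'font-lock-warning-face prepend))
  "Font-lock keywords marking text beyond column 80.")

(define-minor-mode my-long-line-mode
  "Highlight text past column 80 on top of the major mode's fontification."
  :lighter " LL"
  (if my-long-line-mode
      ;; A non-nil HOW argument adds the rule after the major mode's rules.
      (font-lock-add-keywords nil my-long-line-keywords 'append)
    (font-lock-remove-keywords nil my-long-line-keywords))
  ;; Ask font-lock to refontify so the change becomes visible.
  (font-lock-flush))
```

This only works when the major mode fontifies through font-lock. A mode that applies all of its faces outside font-lock never runs these keywords, which is the complaint in (Vg).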
