Re: "Font-lock is limited to text matching" is a myth

From: Daniel Colascione
Subject: Re: "Font-lock is limited to text matching" is a myth
Date: Tue, 11 Aug 2009 08:13:28 -0400

Morning, Steve. It's good to see you jump into the discussion. Thanks for all the work you've put into js2-mode.

On Aug 11, 2009, at 2:47 AM, Steve Yegge wrote:
> js2-mode performs both syntactic and (some) semantic analysis.  It
> knows, for instance, when you're using a symbol that's not defined in
> its file.  js2-mode does not currently understand project structure,
> but I'm doing some work in this area, and it may at some point gather
> semantic information collected from several files.

I agree that some kind of project abstraction is certainly in the purview of Emacs, and is sorely needed. But a single major-mode is not the place for that kind of infrastructure. espresso-mode, for now, just looks at all extant buffers in espresso-mode when gathering symbol information. That seems like a reasonable approach to use while we don't have a project system.

> There is a relatively simple alternative that might appease Daniel:
> I could have js2-mode simply not do any highlighting by default,
> except for errors and warnings.  We'd use whatever highlighting is
> provided by espresso-mode, and users would be able to choose between
> espresso-highlighting and js2-mode highlighting.  With the former,
> they'd get "instantaneous" font-locking, albeit not as rich as what
> js2-mode can provide.

That's an interesting idea, but my concerns are not limited to js2-mode's highlighting. js2-mode is a valuable experiment, but I stand by my assertion that it represents a fundamentally wrong way to design major modes, and a possible future I would like to avoid.

I recognize that full parsing can take substantial time, but fontification and indentation *must* be synchronous. If the parser cannot perform well enough to play a part in fontification and indentation, then it must be a separate and optional module.

If you propose relegating js2's parser to error reporting, then wouldn't it be better to package it up as a separate, optional minor mode? There is *already* a separate and optional full parsing framework called CEDET that is powerful, generic, and not tied to a major-mode in particular. The right approach is for a given major-mode to understand enough of a given language for fontification and indentation while leaving more substantial parsing and indexing to CEDET (which the user can disable). I recognize that js2's parser may work well in its problem domain --- couldn't it just be added to CEDET?

Alternatively, error checking can be offloaded to a separate processing tool that regularly scans the buffer. Something like Flymake would seem appropriate here.

Even if js2-mode's parser could be made fast enough for synchronous fontification, it would remain an isolated module apart from CEDET's infrastructure, and it would still make the mode brittle in the face of minor syntactic variations (which might arise, for example, from running the C preprocessor over Javascript, as we do). Rigid parsers with nonterminals and error productions appear superficially attractive, but using them for all aspects of a mode not only leads to the issues you discuss below, but also prevents that mode from being reused for similar languages without reworking the grammar. It's the wrong approach.

> This would be trivial to change.  I am actively maintaining js2-mode,

My apologies, then. I had merely looked at the lack of commits and releases.

> Errors and warnings would still need to be asynchronous (if they're
> enabled).  So, too, would the imenu outline and my in-progress
> buffer-based outline, which is somewhat nicer than the IMenu one.

Have you looked at the espresso imenu support? It heuristically captures much of the object structure of a typical Javascript file and produces something quite nice, if I say so myself. It's fast enough that the parsing operation can be performed synchronously when an imenu data structure is required.

> But I think the main objection to js2-mode revolves around its
> highlighting, correct?  If so, AND if we can solve the font-lock
> integration issues, AND if we can fix the multi-mode issues (II
> below), then I'm hopeful that js2-mode might become a reasonable
> choice as the default editing mode for JavaScript.
>
> I think espresso-mode is a fine fallback position.  Anything but
> java-mode!  The default today is java-mode, and I had no qualms about
> replacing it as the default for JavaScript.

Indeed. java-mode is unacceptable. But it's not quite as unacceptable as it appears at first glance: cc-mode would be an excellent platform on which to base a Javascript mode, and I only created espresso-mode after becoming frustrated, time and again, trying to extend cc-mode to support Javascript. Unfortunately, cc-mode's core includes some baked-in assumptions about the language; IIRC, one of them is that function declarations have typed parameters. It's a shame, really, because Javascript is close enough to C that they could conceivably share a mode, whereas something like Perl clearly requires a world unto itself.

> Note: diagnostic messages in js2-mode are highlighted using overlays.
> I tried using overlays for all highlighting but it was unacceptably
> slow and had a tendency to crash Emacs.

I've had the same thought. Just as an aside, overlays seem like a much better conceptual fit for fontification than text properties do. To this day, I believe the old XEmacs extent system, which was similar to overlays, represented a more elegant approach than the semipermanent text properties used by GNU Emacs. Nevertheless, text-property fontification isn't bad enough to justify the severe backward-compatibility problems that switching to overlays would cause.

> II. Multi-mode support
>
> JavaScript is especially needful of mumamo (or equivalent) multi-mode
> support, because much of the JavaScript in the wild is embedded in
> HTML, in template files, even in strings in other languages.
>
> js2-mode does not support mumamo (or mmm-mode, with which I am
> currently more familiar) because js2-mode's lexer needs to support
> ignoring parts of the buffer.  I do not think this would be very
> hard to implement, but I have not done it yet.
>
> If I don't get to it before the next version of Emacs launches, then I
> think this should effectively disqualify js2-mode from being the
> default JavaScript mode.  It would be an inconsistent user experience
> to have one JavaScript mode in .js files and another mode for
> JavaScript inside multi-mode-enabled files.

I agree. If I recall correctly, I've also used rather strong language to describe the current state of multi-mode support in Emacs. (So it's not just js2-mode.) I still believe that some kind of indirect-buffer solution in the core would be the most effective and elegant approach to supporting multiple modes. However, mumamo seems to have improved quite a bit, and if it exposes sufficient information to the underlying major modes, it shouldn't be hard to modify either js2-mode or espresso-mode to work with it. (Though I do have to ask how your parser will deal with possibly non-contiguous chunks of Javascript.)
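One way a lexer could cope with non-contiguous chunks is to scan only inside the (start, end) regions a multi-mode package reports as Javascript, treat everything between them as opaque, and keep token offsets absolute so they still map onto the whole buffer. The Python sketch below is a toy model of that idea under those assumptions; it is not js2-mode's, mumamo's, or mmm-mode's code, and `lex_regions`, `TOKEN_RE`, and the token kinds are hypothetical.

```python
import re

# Toy token rules (an assumption for illustration): identifiers,
# integers, and single punctuation characters; whitespace is skipped.
TOKEN_RE = re.compile(
    r"\s+|(?P<ident>[A-Za-z_$][\w$]*)|(?P<num>\d+)|(?P<punct>[^\s\w])"
)


def lex_regions(text, regions):
    """Yield (kind, start, end) tokens found only inside `regions`.

    `regions` is a sorted list of (start, end) offsets that a multi-mode
    package is assumed to report as Javascript. Text between regions is
    never examined, and offsets stay absolute, so every token maps back
    onto the whole buffer. A token cut by a region boundary comes out
    split; a real lexer would need a policy for that case.
    """
    for start, end in regions:
        pos = start
        while pos < end:
            # endpos=end keeps a match from running past the chunk.
            m = TOKEN_RE.match(text, pos, end)
            if m is None:
                pos += 1  # character outside the toy rules: skip it
                continue
            if m.lastgroup is not None:  # None means whitespace
                yield (m.lastgroup, m.start(), m.end())
            pos = m.end()


if __name__ == "__main__":
    doc = "<script>var a = 1;</script><p>hi</p><script>a+2</script>"
    # Stand-in for the chunk list a multi-mode package would supply.
    regions = [m.span(1) for m in re.finditer(r"<script>(.*?)</script>", doc)]
    print([doc[s:e] for _, s, e in lex_regions(doc, regions)])
    # prints ['var', 'a', '=', '1', ';', 'a', '+', '2']
```

The HTML between the two script chunks ("hi") never reaches the lexer, yet the parser downstream still sees one continuous token stream with buffer-accurate positions.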

> I'm ready to give it a try, though, and I'll ping Lennart offline about
> integrating the two somehow.

> III. Incremental and partial parsing
>
> Lennart and others have asked whether it is possible for js2-mode to
> support partial or incremental parsing.  The short answer is
> "incremental: yes; partial: no".
>
> nxml-mode, last I checked, does incremental parsing.  It parses ahead
> in the buffer, but then stops and saves its state. If you jump forward
> in the buffer, it resumes and continues the parse until some point
> beyond the section you're viewing.

espresso does precisely this kind of parsing, as does cc-mode.

> js2-mode could do it this way without much additional effort.

That's good to hear.
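The parse-ahead-and-save-state scheme described above can be modeled as a cache of checkpoints: parser states recorded at increasing buffer positions, from which a later query resumes, and which an edit truncates at the edit point. The Python sketch below is a toy model under that assumption, not nxml-mode's, cc-mode's, or espresso-mode's implementation; the "state" is just a brace depth standing in for whatever a real mode would record, and `IncrementalScanner` and its methods are hypothetical.

```python
import bisect


class IncrementalScanner:
    """Toy incremental parser whose state is the brace depth at a position.

    Checkpoints (position, depth) are saved every `step` characters as the
    scan advances past the last checkpoint. A query resumes from the
    nearest checkpoint at or before the requested position, and an edit
    discards only the checkpoints after the edit point.
    """

    def __init__(self, text, step=64):
        self.text = text
        self.step = step
        self.positions = [0]    # sorted checkpoint positions
        self.depths = [0]       # brace depth at each checkpoint
        self.chars_scanned = 0  # instrumentation only

    def depth_at(self, pos):
        """Return the brace depth of text[:pos] (requires pos <= len(text))."""
        i = bisect.bisect_right(self.positions, pos) - 1
        cur, depth = self.positions[i], self.depths[i]
        while cur < pos:
            ch = self.text[cur]
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
            cur += 1
            self.chars_scanned += 1
            # Extend the cache only beyond the last saved checkpoint.
            if cur % self.step == 0 and cur > self.positions[-1]:
                self.positions.append(cur)
                self.depths.append(depth)
        return depth

    def edit(self, pos, new_text, old_len=0):
        """Replace old_len characters at pos with new_text."""
        self.text = self.text[:pos] + new_text + self.text[pos + old_len:]
        # A checkpoint at p describes text[:p], so those with p <= pos survive.
        keep = bisect.bisect_right(self.positions, pos)
        del self.positions[keep:]
        del self.depths[keep:]
```

Asking twice for the state at the end of the buffer rescans only the characters past the last checkpoint the second time, and an edit keeps every checkpoint at or before the edit point, so the next query resumes from the closest surviving one instead of from the top of the buffer.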

> I've been an Emacs user for 20+ years now, and like many I found
> the idea of a parsing delay to be somewhere between "undesirable"
> and "sickening".

That would describe my experience. May I add "maddening" and "distracting"?

> But the majority of programmers today have
> apparently learned not to notice delays of ~1sec as long as it
> never interferes with their typing or indentation (see IV below).

That other programmers have resigned themselves to inferior fontification is no argument for Emacs to accept it. Asynchronous fontification is completely unacceptable for me, and if it were to become commonplace and unavoidable in Emacs, I would simply stay with older versions.

> So after looking at my ~8000 lines of elisp devoted to parsing
> JavaScript, I weighed it and decided not to support partial parsing.
> It's certainly possible to support it, but I think my time would be
> better spent on things that average users are more likely to notice.

cc-mode has a surprisingly complex and robust reparsing system that caches both whitespace runs and the syntactic information needed for indentation. It'd be nice to be able to reuse that.

As it is, espresso uses an incremental parsing scheme that feeds into the fontification layer.

> IV.  Indentation
>
> The indentation in js2-mode is broken.  I'll be the first to say it.
>
> It is based on the indentation in Karl Langstrom's mode, which does a
> better job for JavaScript than any indenter based on cc-engine, but
> that doesn't mean it's a good job. And it's essentially unconfigurable.
>
> espresso-mode shares this problem, which means that for this
> important use case it is not an improvement over js2-mode.

Indeed. I've made some changes to the indentation system, but the basic approach remains the same. I should point out, though, that cc-mode's highly configurable indentation, convenient as it is, is the exception to the usual Emacs rule. Most modes offer only a few knobs the user may tweak to adjust indentation, and users seem happy with those. Also, as Alan Mackenzie mentioned in another thread, cc-mode's indentation is a maintenance burden. I'm not sure moving away from Karl Langstrom's indentation approach is worth the trouble right now.

> Daniel's objections to js2-mode's non-interaction with font-lock
> apply equally to the non-interaction with cc-engine's indentation
> configuration system.  The indent configuration for JavaScript should
> share as many settings as practical with cc-mode.
>
> I actually made a serious attempt to generate the `c-style-alist'
> data structure for js2-mode using the parse tree, but ran into three
> issues: [snipped]

I encountered the same problems myself when trying to implement the same feature.

> V. Font Lock framework design problems
>
> There seems to be a common misconception flitting about to the
> effect that font-lock is perfect and will never need to change.

Nobody is making this claim, and it would be a foolish one to make. Of course font-lock can be improved. But the fundamental approach is sound.

Actually, there seems to be a common misconception that font-lock is an ancient, decrepit mess that's preventing Emacs from striding forward into the "modern" world. Far from it: used properly, font-lock is flexible and powerful. I'd love to see some improvements, such as pushing syntactic keywords down to a lower level, but the basic idea is sound. Within a single system, one can combine everything from fast, efficient keyword fontification to arbitrarily complex schemes that depend on elaborate contexts and subtle rules; espresso-mode actually performs this kind of mixing. Font-lock confers many benefits in terms of reusability, modularity, and customizability, and it would be a waste to replicate it instead of augmenting it. (You have some excellent ideas for doing that below.)

Really, those who dislike font-lock have the same mindset as those who dislike X11. Like font-lock, X11 is an old, powerful system that superficially appears poorly-designed. What detractors ignore is that old, mature systems embody years of experience in the problem domain, and that attempts at ground-up rewrites typically lead to either a system with a reduced feature-set relative to the original, or a mere reimplementation of the original system in different terms, and without the benefit of the experience embodied in the original system.

> This is a somewhat paradoxical viewpoint in view of the corpses
> littering the path to jit-lock, which include font-lock, fast-lock,
> lazy-lock, and vapor-lock.  Each decade we've had a cadre of people
> claiming that *-lock meets everyone's needs, and then it gets rewritten

I'm not aware of the authors of *any* of these modes making that claim. The facilities you mention have all been incremental improvements on the basic font-locking idea. Do you really want to discourage that kind of experimentation? Also, in defense of these modes, jit-lock depends on core Emacs functionality that has not always been available, and some of the modes you mentioned would doubtless never have been written had jit-lock been available earlier.

> So it's hard to understand how it remains such a popular viewpoint.
>
> I'll make yet another attempt to dispel it, since once we're past the
> emotional stumbling blocks, font-lock may be able to evolve again.
>
> Va) Inadequate/insufficient style names
> Vb) Ad-hoc default faces that are not being autoloaded
> Vc) Additional semantic styles not needed by JavaScript
> Vd) Composable semantic styles

I fully agree with these points. While the default font-lock faces have been generally adequate over the years, adding a set of richer faces (that perhaps inherit from the traditional ones) would be welcome. A composable set of styles is an interesting idea too, and it'd be great to see.
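Richer faces that fall back on the traditional ones can rest on face inheritance: a face leaves an attribute unspecified and takes it from the face it inherits from, ending at `default`. In Emacs Lisp this is the `:inherit` face attribute; the Python sketch below only models the lookup rule, and the face `js-method-name-face` and all attribute values are hypothetical.

```python
# Toy face table (hypothetical values). A missing attribute means
# "unspecified"; the "inherit" entry names the parent face.
FACES = {
    "default": {"foreground": "black", "weight": "normal", "slant": "normal"},
    "font-lock-function-name-face": {"foreground": "blue"},
    # Hypothetical richer face that refines the traditional one.
    "js-method-name-face": {
        "inherit": "font-lock-function-name-face",
        "slant": "italic",
    },
}


def face_attribute(face, attr, faces=FACES):
    """Resolve attr for face by walking the inheritance chain to `default`."""
    seen = set()
    while face is not None:
        if face in seen:
            raise ValueError(f"inheritance cycle at {face!r}")
        seen.add(face)
        spec = faces[face]
        if attr in spec:
            return spec[attr]
        face = spec.get("inherit")
    # Nothing in the chain specified attr: fall back on the default face.
    return faces["default"][attr]
```

Because the lookup happens at use time, a user's customization of the traditional face carries over to the richer face automatically, which is what makes inheriting from the existing faces attractive for backward compatibility.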

> Vf) No font-lock interface for setting exact style runs
>
> I could be mistaken here -- if so, please correct me.
>
> The problem is that I need a way, in a given font-lock redisplay, to
> say "highlight the region from X to Y with text properties {Z}".
>
> This use case does not seem like it should be inordinately difficult
> to support, but it does not seem to be supported today.

As I detailed in '"Font-lock is limited to text matching" is a myth', explicit fontification has essentially always been possible in font-lock. cc-mode has used it for over a decade, and today both espresso-mode and nxml-mode use this regrettably poorly documented facility.
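The facility rests on the fact that a MATCHER in `font-lock-keywords` may be a function rather than a regexp: font-lock calls it with the search limit (point is at the start of the region), and the function may apply faces itself and return nil, so font-lock has nothing further to highlight for that keyword. The Python sketch below is a toy model of that control flow, not font-lock's code; here the matcher takes (point, limit) explicitly, the `runs` list stands in for the precomputed style runs a parser might supply, and all names are hypothetical.

```python
import re


def fontify_region(faces, start, end, keywords):
    """Toy model of font-lock's keyword pass over [start, end).

    `keywords` is a list of (matcher, face). Each matcher is called as
    matcher(point, limit) and returns a (match_start, match_end) span for
    the pass to highlight with `face`, or None to end that keyword's scan.
    """
    for matcher, face in keywords:
        point = start
        while point < end:
            span = matcher(point, end)
            if span is None:
                break
            s, e = span
            for i in range(s, e):
                faces[i] = face
            point = e if e > point else point + 1  # always make progress


def make_regex_matcher(text, pattern):
    """Ordinary keyword: search for a regexp between point and limit."""
    rx = re.compile(pattern)

    def matcher(point, limit):
        m = rx.search(text, point, limit)
        return m.span() if m else None

    return matcher


def make_explicit_fontifier(runs, faces):
    """Explicit fontification: apply precomputed (start, end, face) runs.

    Only the part of each run inside [point, limit) is applied, so the
    result does not depend on how redisplay splits the buffer. It always
    returns None, so the keyword pass itself applies nothing more.
    """

    def matcher(point, limit):
        for run_start, run_end, face in runs:
            for i in range(max(run_start, point), min(run_end, limit)):
                faces[i] = face
        return None

    return matcher


if __name__ == "__main__":
    text = "var foo = bar;"
    faces = [None] * len(text)
    runs = [(4, 7, "variable-name"), (10, 13, "function-call")]
    keywords = [
        (make_regex_matcher(text, r"\bvar\b"), "keyword"),
        (make_explicit_fontifier(runs, faces), None),
    ]
    # Redisplay fontifies the buffer in two chunks that split "foo".
    fontify_region(faces, 0, 6, keywords)
    fontify_region(faces, 6, len(text), keywords)
    print(faces)
```

Because the explicit fontifier is just another entry in the keyword list, it covers the "highlight the region from X to Y" case while ordinary regexp keywords, including any added by minor modes or users, still run in the same pass.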

> If this simple feature were supported, I would have a great deal more
> incentive to try to get my parsing to be fast enough to work within
> the time constraints users expect from font-lock.

I've taken pains in espresso-mode to ensure that synchronous operations are fast enough to be used interactively, even on large files. Were js2's parsing to also become fast and synchronous, some of my objections would indeed evaporate.

> Vg) Lack of differentiation between mode- and minor-mode styles
>
> One of the most common complaints from the thousands of users of
> js2-mode, most of whom have exercised enough self-restraint to use the
> term "work in progress" in preference to "abomination", is that
> js2-mode has poor support for minor modes that do their work with
> font-lock -- 80-column highlighters being a popular example, although
> there are others.

As I mentioned earlier, my diction reflects not js2-mode's maturity but its fundamental structure. I believe that structure is wrong, and "abomination", while incendiary, is accurate. I don't want the future of Emacs to be chock-full of modes like js2-mode.

> For one thing, it's possible (as Daniel observes) to bypass this
> mechanism and call font-lock-apply-highlight directly, which makes
> the reverse-engineering even more cumbersome and fragile.

Quite the opposite, actually.

> (Vf) is the reason (Vg) is a problem for js2-mode.  font-lock-defaults
> does not seem to be a very satisfactory way to apply 2000-10000
> precise style runs to a buffer, so I do all my own highlighting,
> and it doesn't include style-run contributions from minor modes.

When using font-lock-apply-highlight, or its moral equivalents, user and minor-mode font-locking is automatically composed with the major mode's. By using the 'prepend and 'append override flags, minor modes and users can state the priority of their fontification rules relative to those of the major mode. Niceties like these have grown with Emacs over the years, and a great deal is lost when a particular major mode reimplements core functions to work around some imagined, or at worst temporary, deficiency.
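The composition rules come from the OVERRIDE flag in a font-lock highlight. The Python sketch below models how `font-lock-apply-highlight` treats the five values for the `face` property, keeping each character's faces as a list with the highest priority first: nil leaves the whole match alone if any of it is already fontified, t replaces, `prepend` puts the new face in front so it wins, `append` puts it behind so existing faces win, and `keep` fills in only characters that have no face yet. It is a simplified model for illustration; `apply_highlight` and the face names are hypothetical.

```python
def apply_highlight(faces, start, end, face, override=None):
    """Apply `face` to characters [start, end) using font-lock's OVERRIDE rules.

    faces[i] is the list of faces on character i, highest priority first;
    an empty list means "not fontified". `override` is None (nil), True (t),
    "prepend", "append", or "keep".
    """
    span = range(start, end)
    if override is None:
        # nil: skip the whole match if any part of it is already fontified.
        if any(faces[i] for i in span):
            return
        for i in span:
            faces[i] = [face]
    elif override is True:
        # t: replace whatever was there.
        for i in span:
            faces[i] = [face]
    elif override == "prepend":
        # The new face takes precedence over existing ones.
        for i in span:
            faces[i] = [face] + faces[i]
    elif override == "append":
        # Existing faces take precedence over the new one.
        for i in span:
            faces[i] = faces[i] + [face]
    elif override == "keep":
        # Fill in only the characters that have no face yet.
        for i in span:
            if not faces[i]:
                faces[i] = [face]
    else:
        raise ValueError(f"unknown override value {override!r}")
```

In this model, a minor mode that wants its face to show through a major mode's comment face uses `prepend`, while one that only wants to decorate otherwise-unhighlighted text uses `keep`; neither needs to know how the major mode computed its own faces.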
