Re: "Font-lock is limited to text matching" is a myth
From: Daniel Colascione
Subject: Re: "Font-lock is limited to text matching" is a myth
Date: Tue, 11 Aug 2009 08:13:28 -0400
Morning, Steve. It's good to see you jump into the discussion. Thanks
for all the work you've put into js2-mode.
On Aug 11, 2009, at 2:47 AM, Steve Yegge wrote:
js2-mode performs both syntactic and (some) semantic analysis. It
knows, for instance, when you're using a symbol that's not defined in
its file. js2-mode does not currently understand project structure,
but I'm doing some work in this area, and it may at some point gather
semantic information collected from several files.
I agree that some kind of project abstraction is certainly in the
purview of Emacs, and is sorely needed. But a single major-mode is not
the place for that kind of infrastructure. espresso-mode, for now,
just looks at all extant buffers in espresso-mode when gathering
symbol information. That seems like a reasonable approach to use while
we don't have a project system.
There is a relatively simple alternative that might appease Daniel:
I could have js2-mode simply not do any highlighting by default,
except for errors and warnings. We'd use whatever highlighting is
provided by espresso-mode, and users would be able to choose between
espresso-highlighting and js2-mode highlighting. With the former,
they'd get "instantaneous" font-locking, albeit not as rich as what
js2-mode can provide.
That's an interesting idea, but my concerns are not limited to
js2-mode's highlighting. js2-mode is a valuable experiment, but I stand by
my assertion that js2-mode represents a fundamentally wrong way to
design major modes, and represents a possible future I would like to
avoid.
I recognize that full parsing can take substantial time, but
fontification and indentation *must* be synchronous. If the parser
cannot perform well enough to play a part in fontification and
indentation, then it must be a separate and optional module.
If you propose relegating js2's parser to error reporting, then
wouldn't it be better to package it up as a separate, optional minor
mode? There is *already* a separate and optional full parsing
framework called CEDET that is powerful, generic, and not tied to a
major-mode in particular. The right approach is for a given major-mode
to understand enough of a given language for fontification and
indentation while leaving more substantial parsing and indexing to
CEDET (which the user can disable). I recognize that js2's parser may
work well in its problem domain --- couldn't it just be added to CEDET?
Alternatively, error checking can be offloaded to a separate
processing tool that regularly scans the buffer. Something like
Flymake would seem appropriate here.
Even if js2-mode's parser could be made fast enough for synchronous
fontification, it still would remain an isolated module apart from
CEDET's infrastructure, and it would still make the mode brittle in
the face of minor syntactic variations (which might arise, for
example, from using the C preprocessor on Javascript like we do).
Rigid parsers with nonterminals and error productions appear
superficially attractive, but using them for all aspects of a mode not
only leads to the issues you discuss below, but also prevents that
mode from being reused for similar languages without the grammar being
re-worked. It's the wrong approach.
This would be trivial to change. I am actively maintaining js2-mode,
My apologies, then. I had merely looked at the lack of commits and
releases.
Errors and warnings would still need to be asynchronous (if they're
enabled). So, too, would the imenu outline and my in-progress
buffer-based outline, which is somewhat nicer than the IMenu one.
Have you looked at the espresso imenu support? It heuristically
captures much of the object structure of a typical Javascript file and
produces something quite nice, if I say so myself. It's fast enough
that the parsing operation can be performed synchronously when an
imenu data structure is required.
But I think the main objection to js2-mode revolves around its
highlighting, correct? If so, AND if we can solve the font-lock
integration issues, AND if we can fix the multi-mode issues (II
below), then I'm hopeful that js2-mode might become a reasonable
choice as the default editing mode for JavaScript.
I think espresso-mode is a fine fallback position. Anything but
java-mode! The default today is java-mode, and I had no qualms about
replacing it as the default for JavaScript.
Indeed. java-mode is unacceptable. But it's not quite as unacceptable
as it appears at first glance: cc-mode would be an excellent platform
on which to base a Javascript mode, and I only created espresso-mode
after becoming frustrated time and again in trying to extend cc-mode
to support Javascript. Unfortunately, cc-mode's core includes some
baked-in assumptions about the language, that, IIRC, include function
declarations requiring parameter types. It's a shame, really, because
Javascript is close enough to C that they could conceivably share a
mode, whereas something like Perl clearly requires a world unto itself.
Note: diagnostic messages in js2-mode are highlighted using overlays.
I tried using overlays for all highlighting but it was unacceptably
slow and had a tendency to crash Emacs.
I've had the same thought. Just as an aside, overlays seem like a much
better conceptual fit for fontification than text properties do. To
this day, I believe the old XEmacs extent system, which was similar to
overlays, represented a more elegant approach than the semipermanent
text properties used by GNU Emacs. Nevertheless, text property
fontification isn't bad enough to warrant the terrible backward
compatibility problems that would be generated by a switch to overlays.
II. Multi-mode support
JavaScript is especially needful of mumamo (or equivalent) multi-mode
support, because much of the JavaScript in the wild is embedded in
HTML, in template files, even in strings in other languages.
js2-mode does not support mumamo (or mmm-mode, with which I am
currently more familiar) because js2-mode's lexer needs to support
ignoring parts of the buffer. I do not think this would be very
hard to implement, but I have not done it yet.
If I don't get to it before the next version of Emacs launches, then I
think this should effectively disqualify js2-mode from being the
default JavaScript mode. It would be an inconsistent user experience
to have one JavaScript mode in .js files and another mode for
JavaScript inside multi-mode-enabled files.
I agree. If I recall correctly, I've also used rather strong language
to describe the current state of multi-mode support in Emacs. (So it's
not just js2-mode.) I still believe that some kind of indirect buffer
solution in the core would be the most effective and elegant approach
to supporting multiple modes. However, mumamo seems to have improved
quite a bit, and if it exposes sufficient information to the
underlying major modes, it shouldn't be hard to modify either js2-mode or
espresso-mode to work with it. (Though I do have to ask how your
parser will deal with possibly non-contiguous chunks of Javascript.)
I'm ready to give it a try, though, and I'll ping Lennart offline
about integrating the two somehow.
III. Incremental and partial parsing
Lennart and others have asked whether it is possible for js2-mode to
support partial or incremental parsing. The short answer is
"incremental: yes; partial: no".
nxml-mode, last I checked, does incremental parsing. It parses ahead
in the buffer, but then stops and saves its state. If you jump forward
in the buffer, it resumes and continues the parse until some point
beyond the section you're viewing.
espresso does precisely this kind of parsing, as does cc-mode.
js2-mode could do it this way without much additional effort.
That's good to hear.
I've been an Emacs user for 20+ years now, and like many I found
the idea of a parsing delay to be somewhere between "undesirable"
and "sickening".
That would describe my experience. May I add "maddening" and
"distracting"?
But the majority of programmers today have
apparently learned not to notice delays of ~1sec as long as it
never interferes with their typing or indentation (see IV below).
That other programmers have resigned themselves to inferior
fontification is no argument for Emacs to accept it. Asynchronous
fontification is completely unacceptable for me, and if it were to
become commonplace and unavoidable in Emacs, I would simply stay with
older versions.
So after looking at my ~8000 lines of elisp devoted to parsing
JavaScript, I weighed it and decided not to support partial parsing.
It's certainly possible to support it, but I think my time would be
better spent on things that average users are more likely to notice.
cc-mode has a surprisingly complex and robust reparsing system that
caches both whitespace runs and syntactic information necessary for
indentation. It'd be nice to be able to reuse that.
As it is, espresso uses an incremental parsing scheme that feeds into
the fontification layer.
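
The stop-and-resume parsing discussed in this section can be illustrated with a minimal sketch built on the standard `parse-partial-sexp`, which accepts a previously returned state and continues from it. This is not the code of espresso-mode, cc-mode, or nxml-mode; every name prefixed with `my-` is hypothetical, and the one-entry cache is an assumption chosen for brevity.

```elisp
;; Illustrative only: all `my-' names are hypothetical.
(defvar-local my-parse-cache nil
  "Cons (POSITION . STATE) recorded by the most recent partial parse.")

(defun my-parse-state-at (pos)
  "Return the syntactic state at POS, resuming from the cache if possible."
  (let* ((cached (and my-parse-cache
                      (<= (car my-parse-cache) pos)
                      my-parse-cache))
         ;; Resume where the last parse stopped, else start at the top.
         (start (if cached (car cached) (point-min)))
         (state (save-excursion
                  (parse-partial-sexp start pos nil nil (cdr cached)))))
    ;; Save the state so a later call further down can continue from here.
    (setq my-parse-cache (cons pos state))
    state))

(defun my-invalidate-parse-cache (beg &rest _ignored)
  "Discard the cache when text before the cached position changes."
  (when (and my-parse-cache (< beg (car my-parse-cache)))
    (setq my-parse-cache nil)))

;; A mode would typically install the invalidation hook buffer-locally:
;; (add-hook 'after-change-functions #'my-invalidate-parse-cache nil t)
```

A real mode would likely keep several cached positions rather than one, so that jumping backward does not force a reparse from the start of the buffer.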
IV. Indentation
The indentation in js2-mode is broken. I'll be the first to say it.
It is based on the indentation in Karl Langstrom's mode, which does a
better job for JavaScript than any indenter based on cc-engine, but
that doesn't mean it's a good job. And it's essentially
unconfigurable.
espresso-mode shares this problem, which means that for this
important use case it is not an improvement over js2-mode.
Indeed. I've implemented some changes to the indentation system, but
the basic approach remains the same. I should point out, however,
that cc-mode's indentation, however convenient, is the exception to
the typical Emacs rule. Most modes have only a few knobs the user may
tweak to adjust indentation, and users seem happy with those. Also, as
Alan Mackenzie mentioned in another thread, cc-mode's indentation is a
maintenance burden. I'm not quite sure moving away from Karl
Langstrom's indentation approach is worth the trouble right now.
Daniel's objections to js2-mode's non-interaction with font-lock
apply equally to the non-interaction with cc-engine's indentation
configuration system. The indent configuration for JavaScript should
share as many settings as practical with cc-mode.
I actually made a serious attempt to generate the `c-style-alist'
data structure for js2-mode using the parse tree, but ran into three
issues: [snipped]
I encountered the same problems myself when trying to implement the
same feature.
V. Font Lock framework design problems
There seems to be a common misconception flitting about to the
effect that font-lock is perfect and will never need to change.
Nobody is making this claim, and it would be a foolish one to make. Of
course font-lock can be improved. But the fundamental approach is sound.
Actually, there seems to be a common misconception that font-lock is
an ancient, decrepit mess that's preventing Emacs from striding
forward into the "modern" world. Far from it: used properly, font-lock
is flexible and powerful. I'd love to see some improvements, such as
syntactic keywords being pushed down to a lower level, but the basic
idea is sound. In one system, one can combine everything from fast,
efficient keyword fontification to arbitrarily complex schemes that
depend on elaborate contexts and subtle rules. espresso-mode performs
this kind of mixing, actually. Font-lock confers many benefits in
terms of reusability, modularity, and customizability, and it would be
a waste to replicate it instead of augmenting it. (You have some
excellent ideas for doing that below.)
Really, those who dislike font-lock have the same mindset as those who
dislike X11. Like font-lock, X11 is an old, powerful system that
superficially appears poorly-designed. What detractors ignore is that
old, mature systems embody years of experience in the problem domain,
and that attempts at ground-up rewrites typically lead to either a
system with a reduced feature-set relative to the original, or a mere
reimplementation of the original system in different terms, and
without the benefit of the experience embodied in the original system.
This is a somewhat paradoxical viewpoint in view of the corpses
littering the path to jit-lock, which include font-lock, fast-lock,
lazy-lock, and vapor-lock. Each decade we've had a cadre of people
claiming that *-lock meets everyone's needs, and then it gets
rewritten anyway.
I'm not aware of the authors of *any* of these modes making that
claim. The facilities you mention have all been incremental
improvements on the basic font-locking idea. Do you really want to
discourage that kind of experimentation? Also, in defense of these
modes, jit-lock depends on core Emacs functionality that has not
always been available, and some of the modes you mentioned would
doubtless not have been written if jit-lock had been available earlier.
So it's hard to understand how it remains such a popular viewpoint.
I'll make yet another attempt to dispel it, since once we're past the
emotional stumbling blocks, font-lock may be able to evolve again.
Va) Inadequate/insufficient style names
Vb) Ad-hoc default faces that are not being autoloaded
Vc) Additional semantic styles not needed by JavaScript
Vd) Composable semantic styles
I fully agree with these points. While the default font-lock faces
have been generally adequate over the years, adding a set of richer
faces (that perhaps inherit from the traditional ones) would be
welcome. A composable set of styles is an interesting idea too, and
it'd be great to see.
Vf) No font-lock interface for setting exact style runs
I could be mistaken here -- if so, please correct me.
The problem is that I need a way, in a given font-lock redisplay, to
say "highlight the region from X to Y with text properties {Z}".
This use case does not seem like it should be inordinately difficult
to support, but it does not seem to be supported today.
As I detailed in '"Font-lock is limited to text matching" is a myth',
explicit fontification has essentially always been possible in
font-lock. cc-mode has used it for over a decade, and today, both
espresso-mode and nxml use this regrettably poorly-documented facility.
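
As a minimal sketch of what explicit fontification can look like, font-lock accepts a function in place of a regexp as the matcher of a keyword entry. The function is called with a search limit and must return non-nil with the match data set to the exact region to highlight. The example below is not taken from cc-mode, espresso-mode, or nxml; the `my-` names are hypothetical, and the rule itself (flagging TODO inside comments) is only an assumption chosen to show logic that a plain regexp cannot express.

```elisp
;; Illustrative only: all `my-' names are hypothetical.
(defun my-match-todo-in-comment (limit)
  "Search up to LIMIT for the word TODO inside a comment.
Return non-nil with match data set, as font-lock expects of a
function matcher."
  (let ((case-fold-search nil)
        (found nil))
    (while (and (not found)
                (re-search-forward "\\_<TODO\\_>" limit t))
      ;; Element 4 of the parser state is non-nil inside a comment.
      (when (save-match-data (nth 4 (syntax-ppss)))
        (setq found t)))
    found))

(defun my-enable-todo-highlighting ()
  "Add the function matcher to the current buffer's font-lock keywords."
  (font-lock-add-keywords
   nil
   ;; `prepend' layers the face on top of the comment face already there.
   '((my-match-todo-in-comment (0 'font-lock-warning-face prepend)))))
```

Because the matcher decides the region itself, any amount of analysis can sit behind it, provided it is fast enough to run synchronously.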
If this simple feature were supported, I would have a great deal more
incentive to try to get my parsing to be fast enough to work within
the time constraints users expect from font-lock.
I've taken pains in espresso-mode to ensure that synchronous
operations are fast enough to be used interactively, even on large
files. Were js2's parsing to also become fast and synchronous, some of
my objections would indeed evaporate.
Vg) Lack of differentiation between mode- and minor-mode styles
One of the most common complaints from the thousands of users of
js2-mode, most of whom have exercised enough self-restraint to use the
term "work in progress" in preference to "abomination", is that
js2-mode has poor support for minor modes that do their work with
font-lock -- 80-column highlighters being a popular example, although
there are others.
As I mentioned earlier, my diction reflects not js2-mode's
maturity, but its fundamental structure. I believe it is wrong, and
"abomination", while incendiary, is correct. I don't want the future
of Emacs to be chock full of modes like js2.
For one thing, it's possible (as Daniel observes) to bypass this
mechanism and call font-lock-apply-highlight directly, which makes
the reverse-engineering even more cumbersome and fragile.
Quite the opposite, actually.
(Vf) is the reason (Vg) is a problem for js2-mode. font-lock-defaults
does not seem to be a very satisfactory way to apply 2000-10000
precise style runs to a buffer, so I do all my own highlighting,
and it doesn't include style-run contributions from minor modes.
When using font-lock-apply-highlight, or its moral equivalents, user
and minor-mode font locking is automatically composed with the major
mode's. By using the 'prepend and 'append operators, minor modes and
users may state the priority of their fontification rules with respect
to those of the major mode. Niceties like these have grown with Emacs
for years, and a great deal is lost when a particular major-mode
attempts to re-implement core functions to account for some imagined,
or at worst, temporary deficiency.
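
The composition described above can be sketched with a small minor mode of the kind mentioned earlier in the thread, a long-line highlighter. This is a hedged illustration, not code from any existing package: the `my-` names are hypothetical, and the regexp counts characters rather than display columns, so tabs are not handled.

```elisp
;; Illustrative only: all `my-' names are hypothetical.
(defvar my-long-line-keywords
  ;; The `prepend' override places this face ahead of whatever faces
  ;; the major mode has already applied to the same text.
  '(("^.\\{80\\}\\(.+\\)$" (1 'font-lock-warning-face prepend)))
  "Font-lock rule flagging text past the 80th character of a line.")

(define-minor-mode my-long-line-mode
  "Toggle highlighting of text beyond 80 characters per line."
  :lighter " LL"
  (if my-long-line-mode
      ;; `append' runs this rule after the major mode's own keywords.
      (font-lock-add-keywords nil my-long-line-keywords 'append)
    (font-lock-remove-keywords nil my-long-line-keywords))
  ;; Refontify so the change becomes visible immediately.
  (when font-lock-mode
    (font-lock-flush)))
```

The minor mode needs no knowledge of the major mode's rules. It works in any buffer whose major mode fontifies through font-lock, which is the property lost when a major mode applies its own highlighting outside that framework.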