Re: feature/tree-sitter: Where to Put C/C++ Stuff

From: Theodor Thornhill
Subject: Re: feature/tree-sitter: Where to Put C/C++ Stuff
Date: Tue, 01 Nov 2022 08:55:44 +0100

Hi Eli!

>> Date: Tue, 01 Nov 2022 06:44:38 +0100
>> From: Theodor Thornhill <theo@thornhill.no>
>> >Where specifically should the C and C++ tree-sitter stuff go? I've
>> >been using it for a couple months and would like to upstream syntax
>> >highlighting for both. I'll focus on getting C done first.
>> >
>> >I see there are a lot of cc- files; would it be appropriate to add
>> >the tree-sitter stuff into a new cc-treesit.el file?  Thanks.
>> I'm no authority on the matter, but I'd love for us not to complicate
>> things too much. I vote for separate, non-cc-prefixed _new_ modes
>> that derive from prog-mode.
> That'd mean people will need either to invent all the other goodies in
> CC mode (everything except fontifications and indentation) from
> scratch, or give up all those other goodies.  Does that make sense?

Yes, well, partially.  I think that we are too likely to create unwanted
issues by merging the two too closely.  I have seen several of these
issues the last couple of years while implementing c-sharp mode in cc
mode, emacs-tree-sitter and treesit.  There are several things that are
happening.  I'll try to expand on some of them to give some
perspective, and also to raise specific points where we can improve so
that this maybe isn't a problem at all.

1: Use CC mode for one thing and tree-sitter for the rest
While first implementing tree-sitter in c-sharp mode we tried applying
only tree-sitter font-locking, and using cc mode for indentation and the
rest.  What happened was that we immediately inherited the performance
issues from cc mode straight into our code.  Specifically, when typing
in a file with too many (from cc mode's perspective) strings, typing lag
rose to several seconds per keypress.  I filed several bug reports on
this both here and to Alan.  After some time and much heroics we got
some improvement on this from Alan, but c-sharp had already moved on.

2: Using separate names for modes.
The great advantage here is easy to understand.  You have no inheritance
issues, and are free to develop features without regard to legacy.  A
disadvantage is that some users depend on the major mode name for other
stuff.  We had some issues filed asking us to flip over to tree-sitter
completely, because keeping that name (csharp-mode) mattered so much
more to them than (csharp-tree-sitter-mode).  We almost made the change,
but then Yuan started his work so we waited.  This would have sunsetted
the cc mode variant almost immediately.

3: Confusion with where to file bugs
We have many bugs in c-sharp mode where some things are emacs bugs, some
things are cc mode bugs, some are tree-sitter bugs and some are our own
bugs.  There is a real issue with understanding cc mode and figuring out
where a bug fix should end up.  It has taken me many weeks' worth of
digging to understand only the simplest mechanisms of cc mode.
Tree-sitter takes contributors only a couple of hours to be immediately
productive.  To disregard this point solely for the sake of
compatibility with cc mode is a huge mistake, IMO.

4: How do we know what to disable?
If there's a problem somewhere in the tree-sitter variant of the new
mode derived from cc mode, and we see some issue - who makes the fix?
For example, previously there was limited support for multiline strings
in cc mode, which took almost a year to finalize.  The tree-sitter
variant, with better performance and accuracy, took me maybe 20 minutes
in a work-meeting.  Should a feature that is simple to implement in the
tree-sitter variant wait for a similar cc mode implementation?  The
namespacing seems to suggest that yes, it should.

5: While tree-sitter is only an engine, it provides a lot more goodies
We have a huge opportunity to create real new frameworks for emacs now,
but limiting ourselves to merging the features/modes suggests that we
cannot reliably make overarching advancements such as we see now in the
feature/tree-sitter branch.  For example, many small hacks I've made in
the modes I've submitted thus far have made it into general mechanisms
in treesit.el.  All modes that enable tree-sitter should be able to use
these and all the new ones that come _without_ worrying whether or not
some issue will crop up from inheriting from cc mode or some other
thing.  Examples are indentation styles, paredit-like functionalities,
refactorings and more.

6: What are the goodies that we really need from CC mode?
CC mode provides indentation and font locking.  What else does it
provide that isn't replaceable pretty quickly?  I mean this not as a
contrarian, but out of real curiosity.  My guess is that we can get to
feature parity and well beyond that in a very short amount of time, if
we're not hindered by merging everything.

Sorry for the long mail, but I think we are missing the point by viewing
tree-sitter simply as an engine to plop in alongside cc mode for
convenience, and not as the real infrastructure change it is.  There is
no need to sunset cc mode, but equally there is no need to limit
tree-sitter.

> Tree-sitter doesn't (and cannot) replace everything a major mode does
> for a programming language.  So a completely new mode means we throw
> the baby out with the bathwater.

I don't agree, but I'm very curious as to what else would take
significant effort, _apart_ from indentation feature parity with cc
mode.

One thing I know of is integration with package managers such as what
elm-mode and go-mode do, but that is an easy fix.  The upstream
go-mode, if it cannot be moved to core, can just derive from a simple
go-treesit mode, skip all indentation and font-locking in its own mode,
but supply the goodies.
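
To make the layering concrete, here is a rough sketch of what I have in
mind.  All mode names are made up for illustration (this is not the
actual go-mode, nor anything in the feature/tree-sitter branch), and the
tree-sitter setup itself is left as a comment since it depends on what
treesit.el ends up providing:

```elisp
;; Hypothetical base mode: only font-lock and indentation live here.
(define-derived-mode my-go-treesit-mode prog-mode "Go[ts]"
  "Minimal tree-sitter based Go mode (illustrative sketch only)."
  ;; The tree-sitter font-lock and indentation setup would go here.
  )

;; Hypothetical upstream mode: derives from the base mode, does no
;; font-locking or indentation of its own, and only adds the goodies.
(define-derived-mode my-go-mode my-go-treesit-mode "Go"
  "Go mode with extra tooling on top of `my-go-treesit-mode' (sketch)."
  ;; `compile-command' is the standard variable from compile.el; the
  ;; command string is only an example of a mode-specific goodie.
  (setq-local compile-command "go build"))
```

In a buffer where my-go-mode is enabled, (derived-mode-p 'prog-mode)
still holds, so anything that keys off prog-mode keeps working.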