Re: Initial fontification in sh-mode with tree-sitter

From: Yuan Fu
Subject: Re: Initial fontification in sh-mode with tree-sitter
Date: Sat, 12 Nov 2022 14:28:52 -0800

> On Nov 12, 2022, at 2:04 PM, João Paulo Labegalini de Carvalho 
> <jaopaulolc@gmail.com> wrote:
> I see. This is tree-sitter-bash’s problem. When there are only newlines 
> between two EOF’s, the parser erroneously marks everything that follows as 
> heredoc_body. I tried tree-sitter’s online demo and it gives the same 
> result[1]. We should report this to tree-sitter-bash’s author.
> Sorry for the delay. I confirmed the problem was in the tree-sitter-bash side 
> and submitted a PR to fix it: 
> https://github.com/tree-sitter/tree-sitter-bash/pull/137
> Once my fixes are pulled in, there is no change required to my patch.
> Also, when defining sh-mode--treesit-settings, instead of using the value 
> sh-shell as the language, it’s better to just use ‘bash. Here is what 
> happened to me: my default value for sh-shell is fish, so 
> sh-mode--treesit-settings was defined with language = fish. When I open 
> heredoc-issue.sh, sh-mode parses the shebang and sets sh-shell to bash. Since 
> bash does have a parser, (treesit-ready-p ’sh-mode sh-shell) returns t, and 
> tree-sitter is activated. However when font-lock tries to use the query, it 
> errors because query tries to load a parser for fish.
> I see. I thought that because sh-mode--treesit-settings is executed after the 
> local variable sh-shell is defined, it would always be equal to the 
> detected/file shell type. I am still getting my head around scope in elisp.

When the defvar is evaluated at load time, sh-shell still holds the value set 
by the user’s configuration, not the detected/file shell type. Only when the 
major-mode initialization runs (when we open a file) does sh-shell’s value 
become the detected/file shell type. 
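
A minimal sketch of that timing difference, using hypothetical stand-in names 
(my-shell for sh-shell, my-settings for sh-mode--treesit-settings) so that it 
can be evaluated in a fresh session without sh-script loaded:

```elisp
;; Hypothetical names throughout; `my-shell' stands in for `sh-shell'.
(defvar my-shell 'fish
  "Stand-in for the user's configured default shell.")

;; The initial value form runs once, when this defvar is evaluated,
;; so it captures whatever value `my-shell' has at that moment.
(defvar my-settings (list :language my-shell)
  "Stand-in for `sh-mode--treesit-settings'.")

(with-temp-buffer
  ;; Stand-in for what major-mode setup does after reading a shebang.
  (setq-local my-shell 'bash)
  ;; The buffer-local value changed, the captured one did not.
  (list my-shell (plist-get my-settings :language)))
;; Evaluates to (bash fish)
```

The last form shows the mismatch: the buffer sees bash, while the settings 
built at load time still name fish.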

Because the tree-sitter language definition only works with bash, it doesn’t 
make sense to define those queries with anything other than bash, in 

> I did the change and I think it is good to go, unless there is anything else 
> to improve for now.
> I hope to soon get time to work on imenu, navigation, and indentation for 
> sh-mode & bash with tree-sitter.
> Please find the corrected patch attached.

Thanks, some comments:

+(defun sh-mode--treesit-fontify-decl-command (node override _start _end)
+  "Fontifies only the name of declaration_command nodes.
+This is used instead of `font-lock-builtion-face' directly because
+otherwise the whole command, including the variable assignment part,
+is fontified with with `font-lock-builtin-face'. An alternative to
+this would be to declaration_command nodes to have a `name:' field."

I guess you meant “...for declaration_command node to have…”? (Disclaimer: not 
a native speaker)

+  (let* ((maybe-decl-cmd (treesit-node-parent node))
+         (node-type (treesit-node-type maybe-decl-cmd)))
+    (when (string= node-type "declaration_command")
+      (let* ((name-node (car (treesit-node-children maybe-decl-cmd)))
+             (name-beg (treesit-node-start name-node))
+             (name-end (treesit-node-end name-node)))
+        (put-text-property name-beg
+                           name-end
+                           'face
+                           font-lock-builtin-face)))))

+  (cond
+   ;; Tree-sitter
+   ((treesit-ready-p 'sh-mode sh-shell)

+    (setq-local font-lock-keywords-only t)

This line is not necessary anymore due to recent changes.

+    (setq-local treesit-font-lock-feature-list
+                '((comments functions strings heredocs)
+                  (variables keywords commands decl-commands)
+                  (constants operators builtin-variables)))
+    (setq-local treesit-font-lock-settings
+                sh-mode--treesit-settings)
+    (treesit-major-mode-setup))
+   ;; Elisp.
+   (t
+    (setq font-lock-defaults
+          `((sh-font-lock-keywords
+             sh-font-lock-keywords-1 sh-font-lock-keywords-2)
+            nil nil
+            ((?/ . "w") (?~ . "w") (?. . "w") (?- . "w") (?_ . "w")) nil
+            (font-lock-syntactic-face-function
+             . ,#'sh-font-lock-syntactic-face-function))))))

+(defvar sh-mode--treesit-settings
+  (treesit-font-lock-rules
+   :feature 'comments
+   :language sh-shell
+   '((comment) @font-lock-comment-face)
+   :feature 'functions
+   :language sh-shell
+   '((function_definition name: (word) @font-lock-function-name-face))
+   :feature 'strings
+   :language sh-shell
+   '([(string) (raw_string)] @font-lock-string-face)
+   :feature 'heredocs
+   :language sh-shell
+   '([(heredoc_start) (heredoc_body)] @sh-heredoc)
+   :feature 'variables
+   :language sh-shell
+   '((variable_name) @font-lock-variable-name-face)

For the reasons I mentioned earlier, we should use 'bash instead of sh-shell.
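
A rough sketch of the quoted rules with the language fixed to 'bash. It 
assumes Emacs 29 or later, where treesit-font-lock-rules is available, and it 
covers only the five rules quoted above; the rest of the patch is not 
reproduced here:

```elisp
(require 'treesit)

;; Sketch only: same queries as the quoted excerpt, but each rule names
;; the `bash' grammar instead of reading `sh-shell' at load time.
(defvar sh-mode--treesit-settings
  (treesit-font-lock-rules
   :feature 'comments
   :language 'bash
   '((comment) @font-lock-comment-face)
   :feature 'functions
   :language 'bash
   '((function_definition name: (word) @font-lock-function-name-face))
   :feature 'strings
   :language 'bash
   '([(string) (raw_string)] @font-lock-string-face)
   :feature 'heredocs
   :language 'bash
   '([(heredoc_start) (heredoc_body)] @sh-heredoc)
   :feature 'variables
   :language 'bash
   '((variable_name) @font-lock-variable-name-face))
  "Tree-sitter font-lock settings for bash (excerpt).")
```

With the language hard-coded, the user's default value of sh-shell no longer 
affects which parser the queries ask for.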

Once those are changed, I think we can push to feature/tree-sitter; other 
features/fixes can come later.


