Feature request/RFC: proper highlighting of code embedded in comments

From: Clément Pit--Claudel
Date: Sat, 15 Oct 2016 11:19:24 -0400
Hi emacs-devel,

Some languages have a way to quote code in comments.  Some examples:

* Python

    def example(foo, *bars):
        """Foo some bars.

        >>> example(1,
        ...         2,
        ...         3)

        >>> example(4, 8)
        """

* Coq

    Definition example foo bars :=
        (* [example foo bars] uses [foo] to foo some [bars].  For example:
           <<
             Compute (example 1 [2, 3]).
             (* 3 *)
           >> *)

In Python, ‘>>>’ indicates a doctest (a small bit of example code).  In Coq, 
‘[…]’ and ‘<<…>>’ serve as markers (inside comments) of single-line and 
multi-line code snippets, respectively.  At the moment, Emacs doesn't highlight these 
snippets.  I originally asked about this in 
 , but received no answers.
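
To make the Python case concrete, here is a minimal sketch of which parts of a docstring count as embedded code.  It is plain Python, not Emacs code, and it uses the standard library's doctest parser; the names DOCSTRING and doctest_snippets are illustrative only.  A highlighter would need to find the same spans.

```python
import doctest

# A docstring shaped like the Python example above (illustrative only).
DOCSTRING = '''Foo some bars.

>>> example(1,
...         2,
...         3)

>>> example(4, 8)
'''


def doctest_snippets(docstring):
    """Return (first_line, source) pairs for each doctest example.

    first_line is zero-based, counted from the start of the docstring.
    """
    parser = doctest.DocTestParser()
    return [(ex.lineno, ex.source) for ex in parser.get_examples(docstring)]


snippets = doctest_snippets(DOCSTRING)
for lineno, source in snippets:
    print(lineno, repr(source))
```

Each pair gives the line where a snippet starts and its source with the ‘>>>’ and ‘...’ prompts stripped; everything else in the docstring is ordinary prose.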

There are multiple currently-available workarounds, but none that I know of is 
satisfactory:

* Duplicate all font-lock rules, creating anchored matchers that recognize code 
in comments.  The duplication is very unpleasant, and it will require adding 
‘prepend’ to a bunch of font-lock rules, which will break some of them.

* Use a custom syntax-propertize-function to recognize these code snippets and 
escape out of strings.  This has some potential, but it confuses existing 
tools.  For example, in Python, one can do the following; it works fine for 
‘>>>’ in comments, but in strings it seems to break eldoc, among others:

    (let ((current-defun (python-info-current-defun)))
      (if current-defun
          (progn (format "In: %s()" current-defun))))

    (defconst litpy--doctest-re
      "Regexp matching doctests.")

    (defun litpy--syntax-propertize-function (start end)
      "Mark doctests in START..END."
      (goto-char start)
      (while (re-search-forward litpy--doctest-re end t)
        (let* ((old-syntax (save-excursion (syntax-ppss (match-beginning 1))))
               (in-docstring-p (eq (nth 3 old-syntax) t))
               (in-comment-p (eq (nth 4 old-syntax) t))
               (closing-syntax (cond (in-docstring-p "|") (in-comment-p ">")))
               (reopening-syntax (cond (in-docstring-p "|") (in-comment-p "<")))
               (reopening-char (char-after (match-end 2)))
               (no-reopen (eq (and reopening-char (char-syntax reopening-char))
                              (cond (in-comment-p ?>)))))
          (when closing-syntax
            (put-text-property (1- (match-end 1)) (match-end 1)
                               'syntax-table (string-to-syntax closing-syntax))
            (when (and reopening-char (not no-reopen))
              (put-text-property (match-end 2) (1+ (match-end 2))
                                 'syntax-table (string-to-syntax
                                                reopening-syntax)))))))

Maybe the second approach can be made to more-or-less work for Python, despite 
the issue above — I'm not entirely sure.  The idea there is to detect chunks of 
code, and mark their starting and ending characters in a way that escapes from 
the surrounding comment or string.
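
The "detect chunks and mark their boundaries" idea can be modelled outside Emacs.  The sketch below is Python rather than Emacs Lisp purely so that it runs standalone.  The pattern DOCTEST_RE is an assumption (the actual value of litpy--doctest-re is not shown above), and escape_points is a hypothetical name.  It computes the two offsets that the Lisp code would tag with a syntax-table property: the last character of the prompt, and the character right after the snippet.

```python
import re

# Assumed pattern, for illustration only: group 1 is the ">>> " prompt,
# group 2 is the snippet, including any "... " continuation lines.
DOCTEST_RE = re.compile(
    r"^[ \t]*(>>>[ ])(.*(?:\n[ \t]*\.\.\.[ ].*)*)", re.MULTILINE)


def escape_points(text):
    """Return (close_pos, reopen_pos) offsets for each doctest chunk.

    close_pos is the last character of the prompt, where the enclosing
    string or comment would be made to end.  reopen_pos is the character
    right after the snippet, where it would be made to start again.
    """
    points = []
    for m in DOCTEST_RE.finditer(text):
        close_pos = m.end(1) - 1   # mirrors (1- (match-end 1))
        reopen_pos = m.end(2)      # mirrors (match-end 2)
        points.append((close_pos, reopen_pos))
    return points


sample = '"""Doc.\n\n>>> f(1,\n...   2)\n"""\n'
for close_pos, reopen_pos in escape_points(sample):
    print(close_pos, reopen_pos, repr(sample[close_pos + 1:reopen_pos]))
```

Everything strictly between the two offsets is what the rest of Emacs would then see as ordinary code.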

But this doesn't solve the problem for Coq, for example, because it confuses 
comment-forward and the like.  Some Coq tools depend on Emacs to identify 
comments and skip over them when running a file.  Code is sent bit by bit, so 
if ‘(* foo [some code here] bar *)’ is annotated with syntax properties to make 
Emacs think that it should be understood as ‘(* foo *) some code here (* bar 
*)’, then Proof General (a Coq IDE based on Emacs) won't realize that “some 
code here” is part of a comment, and things will break.
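
The failure mode can be reproduced with a toy comment skipper.  This is a deliberately simplified Python model, not how Emacs or Proof General actually scan buffers, and code_outside_comments and its snippets_escape flag are hypothetical names.  It returns the text that a tool would treat as code, first with ‘[…]’ left as plain comment text, then with ‘[…]’ treated as escaping the comment.

```python
def code_outside_comments(text, snippets_escape=False):
    """Return the text a comment-skipping tool would treat as code.

    With snippets_escape set, "[" inside a (* ... *) comment behaves as if
    it closed the comment and "]" as if it reopened it.
    """
    out = []
    depth = 0           # nesting depth of (* ... *) comments
    in_snippet = False  # inside [ ... ] within a comment
    i = 0
    while i < len(text):
        two = text[i:i + 2]
        if not in_snippet and two == "(*":
            depth += 1
            i += 2
        elif not in_snippet and two == "*)" and depth > 0:
            depth -= 1
            i += 2
        elif snippets_escape and depth > 0 and not in_snippet and text[i] == "[":
            in_snippet = True
            i += 1
        elif in_snippet and text[i] == "]":
            in_snippet = False
            i += 1
        else:
            # Keep the character only if the scanner considers it code.
            if depth == 0 or in_snippet:
                out.append(text[i])
            i += 1
    return "".join(out).strip()


comment = "(* foo [some code here] bar *)"
print(repr(code_outside_comments(comment)))
print(repr(code_outside_comments(comment, snippets_escape=True)))
```

In the first call the whole line is skipped as a comment; in the second, the snippet leaks out as if it were code to be sent to the prover.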

I'm not sure what the right approach is.  I guess there are two approaches:

* Mark embedded code in comments as actual code using 
syntax-propertize-function, and add a way for tools to detect this "code but 
not really code" situation.  Pros: things like company, eldoc, 
prettify-symbols-mode, etc. will work in embedded code comments without having 
to opt them in.  Cons: some things will break, and will need to be fixed 
(comment-forward, Proof General, Elpy, indentation functions…).

* Add new "code block starter"/"code block ender" syntax classes?  Then 
font-lock would know that it has to highlight these.  Pros: few things would 
break.  Cons: tools would have to be opted in (company-mode, eldoc, 
prettify-symbols-mode, …).

Am I missing another obvious solution?  Has this topic been discussed before?


