Re: regexp font-lock highlighting

From: martin rudalics
Subject: Re: regexp font-lock highlighting
Date: Wed, 15 Jun 2005 18:00:07 +0200
User-agent: Mozilla Thunderbird 1.0 (Windows/20041206)

Richard Stallman wrote:
>     believe that I found a solution that does the right thing in most cases
>     and will send it to you in the next days.
> Could you describe in words what it does?

Attached find a file called `lisp-font-lock-regexp.el' which contains
all changes I propose.  You may try to load it, make the face
definitions meet your requirements, and see whether it works.
Syntax-highlighting and decoration for lisp-font-lock-keywords-2 must be
activated.  Eventually someone would have to decide on appropriate
names and defaults for faces.

I have set regexp highlighting to the minimum level 1.  If this were
incorporated in font-lock.el, the standard level should be 0 - which
means no regexp highlighting and thus no obtrusiveness.  Emacs would
behave as before the introduction of regexp highlighting a couple of
weeks ago.  Level 1 does regexp highlighting as introduced recently with
some minor bug fixes.

Levels 2 and 3 should do something that was proposed in font-lock.el but
commented out due to problems with an "unbreakable endless loop".  Level
2 does this for regexp groups on a single line only.  Level 3 should
handle regexp groups spanning several lines as well.  The default level
should by no means be 3, as will become evident from the remarks below.

The variable `lisp-font-lock-regexp' can be used to set the default
level.  Individual buffer settings can be achieved by using the command
`lisp-font-lock-regexp'.

Levels 2 and 3 use the syntax-table property to remove parenthesis
syntax from unescaped parentheses and escaped brackets within regexp
groups.  I added syntax-table to `font-lock-extra-managed-props' since I
don't want font-lock to perform the extra syntactic fontification pass.
This idea is non-standard and could be defeated by anyone who removed
syntax-table from that list - so far no one seems to use syntax-table
properties in elisp-mode.
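
To illustrate the mechanism, here is a minimal sketch that is not taken
from the attached file.  The helper `my-sexp-end-with-punct-paren' is a
hypothetical name, and the sketch uses the raw syntax descriptor (1),
which stands for punctuation.  With `parse-sexp-lookup-properties'
non-nil, a syntax-table text property on a single character overrides
the buffer's syntax table for that character, so `forward-sexp' no
longer treats it as a closing parenthesis.

```elisp
(defun my-sexp-end-with-punct-paren (mark-inner)
  "Return where `forward-sexp' stops in a buffer containing \"(a ) b)\".
If MARK-INNER is non-nil, first give the inner `)' punctuation syntax."
  (with-temp-buffer
    (set-syntax-table emacs-lisp-mode-syntax-table)
    (insert "(a ) b)")
    ;; Let the sexp scanner consult `syntax-table' text properties.
    (set (make-local-variable 'parse-sexp-lookup-properties) t)
    (when mark-inner
      ;; Position 4 holds the first `)'; (1) is the raw descriptor for
      ;; punctuation syntax.
      (put-text-property 4 5 'syntax-table '(1)))
    (goto-char (point-min))
    (forward-sexp)
    (point)))
```

Calling the helper with nil stops right after the first `)', while
calling it with t skips that character and stops at the end of the
buffer.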

With that property paren-matching/blinking and forward/backward-sexp
should work "as intended" within parenthetical groups.  You may have
noticed my simple-minded posting on emacs-pretest-bug about forward-sexp
not being able to handle unescaped semicolons within strings.  I
resolved the problem by setting the syntax-table property of `;' to
punctuation within regexp groups.  For a similar reason I reset the
escape syntax property of single backslashes preceding parentheses and
brackets.

I do not treat special characters "as ordinary ones if they are in
contexts where their special meanings make no sense".  Hence,
subexpressions like

\\(\\[[^]]*]\\)* in `reftex-extract-bib-entries-from-thebibliography'

\\(\\[[^\\]]*\\]\\)? in `reftex-all-used-citation-keys'

\\`\\(\\\\[sS]?.\\|\\[\\^?]?[^]]*]\\|[^\\]\\) in `gnus-score-regexp-bad-p'

\\(\[[0-9]+\] \\)* in `gud-jdb-marker-filter'

do contain mismatches.
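
As a quick check that such a subexpression is still a valid regexp even
though its brackets look unbalanced, consider the following sketch.  The
function name `my-bracket-regexp-match' and the sample input are
assumptions made for illustration only.

```elisp
(defun my-bracket-regexp-match (string)
  "Match STRING against the first subexpression quoted above.
Return a list of the match start and the end of the whole match."
  (let ((regexp "\\(\\[[^]]*]\\)*"))
    ;; `\\[' is a literal opening bracket, `[^]]*' matches any run of
    ;; characters other than `]', and the trailing `]' is a literal
    ;; closing bracket because no character alternative is open there.
    (list (string-match regexp string)
          (match-end 0))))
```

Evaluating (my-bracket-regexp-match "[opt]") shows that the whole
bracketed string is matched, although only two of the three closing
brackets in the regexp pair up with an opening bracket.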

With level 3 highlighting I'm using the font-lock-multiline property.
Apparently this property is used by `smerge.el' too.  Consequently, I
cannot simply reset the variable `font-lock-multiline' to nil when I
switch to a lower level.  I believe that this variable - and the
variable `parse-sexp-lookup-properties' as well - should be handled in a
way similar to hooks or `buffer-invisibility-spec'.  Anyone who wants to
set one of these variables should add their name to a corresponding
list and remove that name again to eventually reset the variable.  Routines
checking the value of the variable would not be affected by this
convention.  Likely font-lock-multiline, syntax-table and
`lisp-font-lock-regexp' prefixed properties should be added to
`yank-excluded-properties' too.
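
The convention proposed here could look roughly like the sketch below.
Nothing like it exists in font-lock.el; the variable
`my-font-lock-multiline-clients' and both functions are hypothetical
names.  The variable `font-lock-multiline' stays non-nil for as long as
at least one client is registered.

```elisp
(defvar my-font-lock-multiline-clients nil
  "Hypothetical list of symbols naming the users of `font-lock-multiline'.")
(make-variable-buffer-local 'my-font-lock-multiline-clients)

(defun my-font-lock-multiline-request (client)
  "Register CLIENT and turn on `font-lock-multiline' in this buffer."
  (unless (memq client my-font-lock-multiline-clients)
    (push client my-font-lock-multiline-clients))
  (set (make-local-variable 'font-lock-multiline) t))

(defun my-font-lock-multiline-release (client)
  "Unregister CLIENT; reset `font-lock-multiline' when no client is left."
  (setq my-font-lock-multiline-clients
        (delq client my-font-lock-multiline-clients))
  (unless my-font-lock-multiline-clients
    (set (make-local-variable 'font-lock-multiline) nil)))
```

With this scheme, switching regexp highlighting to a lower level would
release only its own request, and a buffer also used by `smerge.el'
would keep the variable set.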

I've been experimenting a bit with level 3 highlighting.  With a 200MHz
PC the results are negative: Fontifying a buffer is moderately slow,
modifying text is hardly bearable.  With a 1GHz PC I did not
encounter substantial difficulties with one exception - fontifying
`cperl-init-faces' took a couple of seconds.  I tried to look a bit
closer at what's going on.

When I scrolled down through `cperl.el' and looked at what font-lock is
doing I found out that the range from position 168761 to 172839 gets
fontified no less than _seven_ times in sequence: Apparently `xdisp.c' -
encountering an unfontified object at a position START - asks
`jit-lock-function' to fontify from position START.  jit-lock-function
now calls `jit-lock-fontify-now' to fontify from START to (+ START
jit-lock-chunk-size).  The latter sets the fontified property for this
region to t.  `font-lock-default-fontify-region' detects that there is a
font-lock-multiline pattern, fontifies the entire region from beginning
to end of the pattern - the 168761 to 172839 region above - but does not
set the fontified property for this region.
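
One way to observe such repeated fontification is to log every call, as
in the following sketch.  It is not how the numbers above were
obtained; `my-fontify-log' and `my-log-fontify-region' are hypothetical
names, and `advice-add' requires Emacs 24.4 or later.

```elisp
(defvar my-fontify-log nil
  "Hypothetical log of (BEG . END) pairs, most recent first.")

(defun my-log-fontify-region (beg end &rest _ignored)
  "Record that the region from BEG to END is about to be fontified."
  (push (cons beg end) my-fontify-log))

;; Run the logger before each call of the default fontification function.
(advice-add 'font-lock-default-fontify-region
            :before #'my-log-fontify-region)
```

After scrolling through a buffer, overlapping or identical entries in
`my-fontify-log' hint at text that was fontified more than once.  The
logger can be removed again with `advice-remove'.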

I simply inserted `(put-text-property beg end 'fontified t)' in the text
of `font-lock-default-fontify-region' right before it calls
`font-lock-unfontify-region' and the problem disappeared.

When I change some text within a font-lock-multiline pattern of
`cperl-init-faces' font-lock refontifies the entire area twice which can
take a couple of seconds.  What happens here?  The first refontification
is triggered by redisplay which encounters an unfontified thing it
should display (the thing was unfontified by `jit-lock-after-change'
previously).  The second refontification is eventually triggered by
`jit-lock-context-fontify' which unfontifies everything from
`jit-lock-context-unfontify-pos' until point-max.  However, the second
refontification is useless because font-lock-default-fontify-region
already took care of the font-lock multiline pattern.  Moreover, the
second fontification usually occurs right after the first has finished
_before_ I am able to enter the next character.

I could resolve this by having font-lock-default-fontify-region
fontify a region iff it has not fontified exactly that region already
since the last modification of the buffer.  But font-lock-multiline
patterns do not seem suited for handling this problem anyway.  Patterns
spanning more than a couple of lines - your mileage may vary - will
delay redisplay because inserting one single character triggers
refontification of the _entire_ pattern.  It should be possible to
resolve this problem by using the `jit-lock-defer-multiline' property.
However, the latter is broken.

Suppose I used jit-lock-defer-multiline instead of font-lock-multiline
for my pattern.  Inserting a character now will not delay redisplay
anymore since font-lock-default-fontify-region does not cater for
jit-lock-defer-multiline.  Eventually, jit-lock-context-fontify will
unfontify the relevant parts of my buffer from the start of the pattern
to point-max, and everything should get fontified correctly.  It does
not, however, when the jit-lock-defer-multiline pattern starts _before_
`window-start': After jit-lock-context-fontify has unfontified the
buffer, redisplay - for some reason I did not investigate - intercepts
this by fontifying the _visible_ part of the buffer without caring about
my pattern.  Eventually, the invisible parts get refontified but the
already fontified part doesn't because, as mentioned before,
font-lock-default-fontify-region does not know jit-lock-defer-multiline
patterns.  Hence, fontification appears incorrect.

I'm afraid there are no simple patches for this.  Hence I provided the
appropriate warnings that level 3 highlighting should be used with
sufficient care.

The feature I propose could be quite useful for people who write regular
expressions only occasionally and I don't want to compromise it on
account of the recent controversies on font-lock-comment-delimiter and
font-lock-negation-char-face faces.  On the other hand, I don't want to
give pretext to anyone who plans to introduce yet another feature in the
pre-release phase.  Hence if you think that this should be delayed or
cancelled please tell me so.

I've also experimented with a patch of `show-paren-function' where I
overlay the backslashes in `\\(...\\)' groups with the respective count
of that group.  Hence I don't have to literally step through such pairs
when searching for the subexpressions referenced by match-string,
match-beginning, ...
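
The counting behind that idea can be sketched as follows.  This is not
the `show-paren-function' patch itself; `my-regexp-group-starts' is a
hypothetical helper that only numbers the `\\(' openers of a regexp
string.  It skips shy groups and escaped backslashes, but it does not
handle explicitly numbered groups or backslashes inside character
alternatives.

```elisp
(defun my-regexp-group-starts (regexp)
  "Return an alist of (NUMBER . INDEX) for the numbered groups in REGEXP.
INDEX is the string index of the backslash that opens group NUMBER."
  (let ((pos 0) (count 0) (result nil))
    ;; Consume one backslash plus the following character per step, so
    ;; that an escaped backslash never pairs up with a following paren.
    (while (string-match "\\\\\\(.\\)" regexp pos)
      (let ((start (match-beginning 0))
            (char (aref regexp (match-beginning 1))))
        (setq pos (match-end 0))
        (when (and (eq char ?\()
                   ;; `\\(?:' opens a shy group, which gets no number.
                   (not (string-prefix-p "?:" (substring regexp pos))))
          (setq count (1+ count))
          (push (cons count start) result))))
    (nreverse result)))
```

For "\\(a\\|\\(?:b\\)\\(c\\)\\)" the helper reports two numbered groups,
the outer one and the one around c, which are the groups that
`match-string' and `match-beginning' can refer to.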
(defface lisp-font-lock-regexp-delimiter
  '((t (:bold t)))
  "Face for highlighting regexp group delimiters and brackets."
  :group 'font-lock-highlighting-faces)

(defface lisp-font-lock-regexp-backslash
  '((t (:foreground "PaleGreen3")))
  "Face for highlighting the backslash part of regexp group delimiters."
  :group 'font-lock-highlighting-faces)

(defface lisp-font-lock-regexp-group
  '((t (:background "Grey86")))
  "Face for highlighting inner regexp groups."
  :group 'font-lock-highlighting-faces)

(defun lisp-font-lock-regexp-hook ()
  "Automatically turn on regexp highlighting in `emacs-lisp-mode'."
  (setq lisp-font-lock-regexp lisp-font-lock-regexp) ; set buffer-local value
  (when (> lisp-font-lock-regexp 1)
    (set (make-local-variable 'parse-sexp-lookup-properties) t)
    (when (> lisp-font-lock-regexp 2)
      (set (make-local-variable 'font-lock-multiline) t))
    (set (make-local-variable 'font-lock-extra-managed-props)
         (append font-lock-extra-managed-props
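                 (list 'syntax-table
                       ;; remaining entries reconstructed from the `delq'
                       ;; calls in the command `lisp-font-lock-regexp'
                       'lisp-font-lock-regexp
                       'lisp-font-lock-regexp-group
                       'lisp-font-lock-regexp-alt)))))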

(defcustom lisp-font-lock-regexp 1
  "*Highlight regular expressions in `emacs-lisp-mode'.

The following levels are available:

0 (off) do not highlight regular expressions specially.

1 (minimum) highlight the non-backslash parts of regexp group delimiters with
  `lisp-font-lock-regexp-delimiter' face and delimiter backslashes with
  `lisp-font-lock-regexp-backslash' face.  Group delimiters are the
  backslash-sequences `\\(' `\\(?:' `\\|' and `\\)'.  Delimiters appearing in
  documentation strings or non-string text are not highlighted.  Within proper
  strings, however, *every* instance of such a delimiter will be highlighted
  regardless of its actual or intended semantics.  Hence, you should use these
  backslash-sequences *exclusively* for parenthetical grouping of regexps.  For
  other purposes try something like `(concat \"\\\\\" \"(\")' instead.  Within
  character alternatives write `)\\\\' instead of `\\\\('.

2 (medium) as 1 but also highlight brackets delimiting character alternatives
  within single-line regexp groups with `lisp-font-lock-regexp-delimiter' face.
  Moreover, highlight inner regexp groups with `lisp-font-lock-regexp-group'
  face.  Inner regexp groups are character sequences within `\\(...\\)' and
  `\\(?:...\\)' that appear on a single line and do not contain one of the
  backslash-sequences `\\(' `\\(?:' or `\\)'.  Inner regexp groups may contain
  non-string text provided the respective delimiters appear within a string.

  In addition, 2 will try to set the syntax-table properties of parentheses,
  brackets and semicolons within single-line regexp groups appropriately.  More
  precisely, brackets that do not delimit a character alternative or class,
  parentheses that do not delimit a group, semicolons, and single backslashes
  preceding a parenthesis or bracket, are classified as punctuation characters.

  Note that you can always create a surrounding group with the shy group
  delimiters `\\(?:...\\)' without modifying the semantics of enclosed regexps.

3 (maximum) as 2 but permit operations on regexp groups spanning several lines.
  This option exploits the `font-lock-multiline' text-property which is not
  guaranteed to work reliably and is notorious for delaying redisplay
  considerably.  Hence use this option with *extreme* care!

Setting the default value of this variable does not affect highlighting of live
buffers.  Use the command `lisp-font-lock-regexp' to change highlighting for the
current buffer only."
  :type '(choice (const :tag "off" 0)
                 (const :tag "minimum" 1)
                 (const :tag "medium" 2)
                 (const :tag "maximum" 3))
  :set (lambda (symbol value)
         (set-default symbol value)
         (remove-hook 'emacs-lisp-mode-hook 'lisp-font-lock-regexp-hook)
         (when (and (boundp 'font-lock-mode) ; silly if this is part of 
                    (> value 0))
           (custom-add-option 'emacs-lisp-mode-hook 'lisp-font-lock-regexp-hook)
           (add-hook 'emacs-lisp-mode-hook 'lisp-font-lock-regexp-hook)))
  :version "22.1"
  :group 'font-lock)
(make-variable-buffer-local 'lisp-font-lock-regexp)

(defconst lisp-font-lock-keywords-2
     `( ;; Control structures.  Emacs Lisp forms.
          "(" (regexp-opt
               '("cond" "if" "while" "let" "let*"
                 "prog" "progn" "progv" "prog1" "prog2" "prog*"
                 "inline" "lambda" "save-restriction" "save-excursion"
                 "save-window-excursion" "save-selected-window"
                 "save-match-data" "save-current-buffer" "unwind-protect"
                 "condition-case" "track-mouse"
                 "eval-after-load" "eval-and-compile" "eval-when-compile"
                 "with-current-buffer" "with-electric-help"
                 "with-local-quit" "with-no-warnings"
                 "with-output-to-string" "with-output-to-temp-buffer"
                 "with-selected-window" "with-syntax-table"
                 "with-temp-buffer" "with-temp-file" "with-temp-message"
                 "with-timeout" "with-timeout-handler") t)
        .  1)
       ;; Control structures.  Common Lisp forms.
          "(" (regexp-opt
               '("when" "unless" "case" "ecase" "typecase" "etypecase"
                 "ccase" "ctypecase" "handler-case" "handler-bind"
                 "restart-bind" "restart-case" "in-package"
                 "break" "ignore-errors"
                 "loop" "do" "do*" "dotimes" "dolist" "the" "locally"
                 "proclaim" "declaim" "declare" "symbol-macrolet"
                 "lexical-let" "lexical-let*" "flet" "labels" "compiler-let"
                 "destructuring-bind" "macrolet" "tagbody" "block" "go"
                 "multiple-value-bind" "multiple-value-prog1"
                 "return" "return-from"
                 "with-accessors" "with-compilation-unit"
                 "with-condition-restarts" "with-hash-table-iterator"
                 "with-input-from-string" "with-open-file"
                 "with-open-stream" "with-output-to-string"
                 "with-package-iterator" "with-simple-restart"
                 "with-slots" "with-standard-io-syntax") t)
        . 1)
       ;; Exit/Feature symbols as constants.
       (,(concat "(\\(catch\\|throw\\|featurep\\|provide\\|require\\)\\>"
                 "[ \t']*\\(\\sw+\\)?")
        (1 font-lock-keyword-face)
        (2 font-lock-constant-face nil t))
       ;; Erroneous structures.
("(\\(abort\\|assert\\|warn\\|check-type\\|cerror\\|error\\|signal\\)\\>" 1 
       ;; Words inside \\[] tend to be for `substitute-command-keys'.
       ("\\\\\\\\\\[\\(\\sw+\\)\\]" 1 font-lock-constant-face prepend)
       ;; Words inside `' tend to be symbol names.
       ("`\\(\\sw\\sw+\\)'" 1 font-lock-constant-face prepend)
       ;; Constant values.
       ("\\<:\\sw+\\>" 0 font-lock-builtin-face)
       ;; ELisp and CLisp `&' keywords as types.
       ("\\&\\sw+\\>" . font-lock-type-face)
       ((lambda (bound)
          (when (and (local-variable-p 'lisp-font-lock-regexp)
                     (not (zerop lisp-font-lock-regexp)))
            (while (cond
                    ((and (= lisp-font-lock-regexp 3)
                          (get-text-property (point) 'lisp-font-lock-regexp))
 bound 'bound))
                    ((and (= lisp-font-lock-regexp 2)
                          (or (and (get-text-property (point) 
 bound 'eol)
                                   (not (match-beginning 11)))
                               bound 'bound))))
                    (t (re-search-forward
                        bound 'bound)))
              (let ((face (get-text-property (1- (point)) 'face)))
                (when (or (and (listp face)
                               (memq 'font-lock-string-face face))
                          (eq 'font-lock-string-face face))
                   ((match-beginning 2) ; \\(
                     (match-beginning 1) (match-end 1)
                     'face 'lisp-font-lock-regexp-backslash)
                     (match-beginning 2) (match-end 2)
                     'face 'lisp-font-lock-regexp-delimiter)
                    (when (> lisp-font-lock-regexp 1)
                      (let* ((level (or (get-text-property
                                         (point) 'lisp-font-lock-regexp)
                             (from (match-beginning 1))
                             (mid (match-end 2))
                             (to (or (and (= lisp-font-lock-regexp 2)
                                     (and (> level 1)
                                           (point) 'lisp-font-lock-regexp))
                                         ;; beginning of next defun
                                         (if (re-search-forward "^(" nil t)
                                             (1- (point))
                        (put-text-property from to 'lisp-font-lock-regexp (1+ 
                        (put-text-property ; `\\(' is not part of inner group
                         from mid 'lisp-font-lock-regexp-group nil)
                        (put-text-property mid to 'lisp-font-lock-regexp-group 
                        (when (> lisp-font-lock-regexp 2)
                          (put-text-property from to 'font-lock-multiline t)))))
                   ((match-beginning 3) ; \\|
                     (match-beginning 1) (match-end 1)
                     'face 'lisp-font-lock-regexp-backslash)
                     (match-beginning 3) (match-end 3)
                     'face 'lisp-font-lock-regexp-delimiter))
                   ((match-beginning 4) ; \\)
                    (let ((level
                           (when (> lisp-font-lock-regexp 1)
                             (get-text-property (point) 
                       ((and level
                             (not (get-text-property
                                   (point) 'lisp-font-lock-regexp-alt)))
                        (let* ((from
                                (when (get-text-property
                                       (point) 'lisp-font-lock-regexp-group)
                                   (point) 'lisp-font-lock-regexp-group)))
                               (to (or (and (= lisp-font-lock-regexp 2)
                                        (point) 'lisp-font-lock-regexp)
                           (match-beginning 1) (match-end 1)
                           'face 'lisp-font-lock-regexp-backslash)
                           (match-beginning 4) (match-end 4)
                           'face 'lisp-font-lock-regexp-delimiter)
                          (if (> level 1)
                               (match-end 4) to 'lisp-font-lock-regexp (1- 
                             (match-end 4) to '(lisp-font-lock-regexp nil))
                            (when (> lisp-font-lock-regexp 2)
                               (match-end 4) to '(font-lock-multiline nil))))
                          (when from
                             (match-end 4) to '(lisp-font-lock-regexp-group 
                             from (match-beginning 1)
                             'face 'lisp-font-lock-regexp-group))))
                       ((not level)
                        ;; no open \\( or lisp-font-lock-regexp equals 1
                         (match-beginning 1) (match-end 1)
                         'face 'lisp-font-lock-regexp-backslash)
                         (match-beginning 4) (match-end 4)
                         'face 'lisp-font-lock-regexp-delimiter))
                       ((get-text-property (point) 'lisp-font-lock-regexp)
                         (1- (point)) (point) 'syntax-table '(3))))))
                   ;; matches below should occur within parenthetical groups
                   ((match-beginning 5) ; \\[ or \\]
                    (if (get-text-property (point) 'lisp-font-lock-regexp-alt)
                        ;; within alternative
                        (goto-char (1- (point)))
                       (1- (point)) (point) 'syntax-table '(3))))
                   ((match-beginning 6) ; escaped parenthesis or bracket
                     (match-beginning 0) (1+ (match-beginning 0))
                     'syntax-table '(3))
                    ;; reread paren
                    (goto-char (1+ (match-beginning 0))))
                   ((match-beginning 7))
                    ;; POSIX character class, skip to preserve paren syntax
                   ((match-beginning 8) ; [
                    (if (get-text-property (point) 'lisp-font-lock-regexp-alt)
                        ;; already within alternative
                         (1- (point)) (point) 'syntax-table '(3))
                      ;; starting new alternative
                      (let ((to (or (and (= lisp-font-lock-regexp 2)
                                     (point) 'lisp-font-lock-regexp)
                         (match-beginning 8) (match-end 8)
                         'face 'lisp-font-lock-regexp-delimiter)
                         ;; the following should be reset at \\)
                         (point) to 'lisp-font-lock-regexp-alt t))))
                   ((match-beginning 9) ; ]
                    (let* ((from
                            (when (get-text-property
                                   (point) 'lisp-font-lock-regexp-alt)
                               (point) 'lisp-font-lock-regexp-alt))))
                       ((not from))
                       ;; here likely some cases are missing
                       ((or (and (char-equal (char-before (1- (point))) ?\[ )
                                 ;; []
                                 (= from (1- (point))))
                            (and (char-equal (char-before (1- (point))) ?^ )
                                 (or (and (char-equal (char-before (- (point) 2)) ?\[ )
                                          ;; [^]
                                          (= from (- (point) 2)))
                                     (and (char-equal (char-before (- (point) 2)) ?\\ )
                                          (char-equal (char-before (- (point) 3)) ?\[ )
                                          ;; [\^]
                                          (= from (- (point) 3)))))
                            (and (char-equal (char-before (1- (point))) ?\\ )
                                 (or (and (char-equal (char-before (- (point) 2)) ?\[ )
                                          ;; [\]
                                          (= from (- (point) 2)))
                                     (and (char-equal (char-before (- (point) 2)) ?^ )
                                          (char-equal (char-before (- (point) 3)) ?\[ )
                                          ;; [^\]
                                          (= from (- (point) 3)))
                                     (and (char-equal (char-before (- (point) 2)) ?\^ )
                                          (char-equal (char-before (- (point) 3)) ?\\ )
                                          (char-equal (char-before (- (point) 4)) ?\[ )
                                          ;; [\^\]
                                          (= from (- (point) 4))))))
                         (1- (point)) (point) 'syntax-table '(3)))
                       ((< from (match-beginning 9))
                        (let ((to (or (and (= lisp-font-lock-regexp 2)
                                       (point) 'lisp-font-lock-regexp-alt)
                           (match-beginning 9) (match-end 9)
                           'face 'lisp-font-lock-regexp-delimiter)
                           (point) to '(lisp-font-lock-regexp-alt nil)))))))
                   ((match-beginning 10) ; (;)
                     (1- (point)) (point) 'syntax-table '(3)))))))))
  "Gaudy level highlighting for Lisp modes.")

(defun lisp-font-lock-regexp (arg)
  "Set regular expression highlighting for current buffer.

ARG may be one of 0, 1, 2 or 3.  See the documentation of the variable
`lisp-font-lock-regexp' for the meaning of these values.  Regular expression
highlighting works in Emacs-Lisp mode only."
  (interactive
   (if (eq major-mode 'emacs-lisp-mode)
       (list (read-number "Locally highlight regexps 0 (off), 1 (min), 2 (med), 3 (max): "))
     (error "This option can be used in `emacs-lisp-mode' only")))
  (when (and (boundp 'font-lock-mode) font-lock-mode
             (>= arg 0) (<= arg 3) (/= arg lisp-font-lock-regexp))
      (setq lisp-font-lock-regexp arg)
      ;; can't reset parse-sexp-lookup-properties and font-lock-multiline since
      ;; they might have been set by other programs or the user, this could be
      ;; resolved by handling them like buffer-invisibility-spec
      (setq font-lock-extra-managed-props
            ;; removing syntax-table from font-lock-extra-managed-props is dirty
            ;; provided someone else did put it there; eventually this should be
            ;; handled by font-lock-syntactic-keywords in an appropriate fashion
            (delq 'syntax-table
                  (delq 'lisp-font-lock-regexp
                        (delq 'lisp-font-lock-regexp-group
                              (delq 'lisp-font-lock-regexp-alt
                                    font-lock-extra-managed-props)))))
      ;; the following is needed to have jit-lock refontify the buffer
      (when jit-lock-mode
        (setq jit-lock-context-unfontify-pos (point-min)))
      (when (> arg 1)
        (set (make-local-variable 'parse-sexp-lookup-properties) t)
        (when (> arg 2)
          (set (make-local-variable 'font-lock-multiline) t))
        (set (make-local-variable 'font-lock-extra-managed-props)
             (append font-lock-extra-managed-props
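                     (list 'syntax-table
                           ;; remaining entries reconstructed from the
                           ;; `delq' calls above
                           'lisp-font-lock-regexp
                           'lisp-font-lock-regexp-group
                           'lisp-font-lock-regexp-alt))))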
      (dolist (window (get-buffer-window-list (current-buffer) nil t))
        ;; refontify contents of any window showing current buffer, avoids
        ;; displaying unfontified buffers
        (with-selected-window window
          (font-lock-fontify-region (window-start) (window-end)))))))
