Saving TeX auto-parser's regexp groups
From: Arash Esbati
Subject: Saving TeX auto-parser's regexp groups
Date: Wed, 19 Jul 2023 12:43:58 +0200
User-agent: Gnus/5.13 (Gnus v5.13)
Hi all,
there is a comment in the function `TeX-auto-parse-region' saying:
;; TODO: Emacs allows at most 255 groups in a regexp, see the
;; "#define MAX_REGNUM 255" in regex-emacs.c. If our regex
;; has more groups, bad things may happen, e.g.,
;; (match-beginning 271) returns nil although the regexp that
;; matched contains group number 271. Sadly, MAX_REGNUM is
;; not exposed to Lisp, so we need to hard-code it here (and
;; sometimes check if it increased in newer Emacs versions).
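To see how close a given regexp list comes to that limit without redefining the parser, the groups can be counted directly. This is only a sketch: the name `my-auto-regexp-group-total` is made up for illustration, and it uses the standard `regexp-opt-depth` instead of AUCTeX's own `TeX-regexp-group-count`. It assumes, as `TeX-auto-parse-region' does, that every list entry gets one wrapping group in addition to the groups inside its own regexp.

```elisp
;; Hypothetical helper, not part of AUCTeX.
(defun my-auto-regexp-group-total (regexp-list)
  "Return the number of capture groups in the regexp built from REGEXP-LIST.
REGEXP-LIST is a list of entries whose car is a regexp string, or a
symbol whose value is such a list.  Each entry costs one wrapping
group plus the non-shy groups inside its own regexp."
  (when (symbolp regexp-list)
    (setq regexp-list (and (boundp regexp-list) (symbol-value regexp-list))))
  (let ((total 0))
    (dolist (entry regexp-list total)
      ;; `regexp-opt-depth' counts non-shy groups only.
      (setq total (+ total 1 (regexp-opt-depth (car entry)))))))

;; Two entries with one capture group each, plus two wrapping groups.
(my-auto-regexp-group-total
 '(("\\\\def\\\\\\([a-zA-Z]+\\)" 1 ignore)
   ("\\\\new\\(?:count\\|dimen\\)\\\\\\([a-zA-Z]+\\)" 1 ignore)))
;; => 4
```

With AUCTeX loaded, calling it as (my-auto-regexp-group-total 'plain-TeX-auto-regexp-list) should give the total for plain-TeX mode. Note that `count' in `TeX-auto-parse-region' starts at 1, so the value it ends up with is one higher than this total.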
I haven't asked the Emacs developers about raising that limit, but I see
that AUCTeX is also somewhat wasteful with regexp groups during parsing.
To see this, start with plain-TeX mode and open a file like this:
--8<---------------cut here---------------start------------->8---
\noindent
{\bf TeX-engine}:
\noindent
This variable allows you to choose which TeX engine should be used for
typesetting the document, i.e. the executables which will be used when
you invoke the `TeX' or `LaTeX' commands.
\bye
%%% Local Variables:
%%% mode: plain-tex
%%% TeX-master: t
%%% End:
--8<---------------cut here---------------end--------------->8---
and then evaluate this function in the *scratch* buffer:
--8<---------------cut here---------------start------------->8---
(defun TeX-auto-parse-region (regexp-list beg end)
  "Parse TeX information according to REGEXP-LIST between BEG and END."
  (if (symbolp regexp-list)
      (setq regexp-list (and (boundp regexp-list) (symbol-value regexp-list))))
  (if regexp-list
      ;; Extract the information.
      (let* (groups
             (count 1)
             (regexp (concat "\\("
                             (mapconcat
                              (lambda(x)
                                (push (cons count x) groups)
                                (setq count
                                      (+ 1 count
                                         (TeX-regexp-group-count (car x))))
                                (car x))
                              regexp-list "\\)\\|\\(")
                             "\\)"))
             syms
             lst)
        ;; TODO: Emacs allows at most 255 groups in a regexp, see the
        ;; "#define MAX_REGNUM 255" in regex-emacs.c.  If our regex
        ;; has more groups, bad things may happen, e.g.,
        ;; (match-beginning 271) returns nil although the regexp that
        ;; matched contains group number 271.  Sadly, MAX_REGNUM is
        ;; not exposed to Lisp, so we need to hard-code it here (and
        ;; sometimes check if it increased in newer Emacs versions).
        ;; The following line added:
        (message "The TeX auto-parser's regexp used %d groups" count)
        ;; End addition
        (when (> count 255)
          (error "The TeX auto-parser's regexp has too many groups (%d)" count))
        (setq count 0)
        (goto-char (if end (min end (point-max)) (point-max)))
        (while (re-search-backward regexp beg t)
          (let* ((entry (cdr (TeX-member nil groups
                                         (lambda (_a b)
                                           (match-beginning (car b))))))
                 (symbol (nth 2 entry))
                 (match (nth 1 entry)))
            (unless (TeX-in-comment)
              (looking-at (nth 0 entry))
              (if (fboundp symbol)
                  (funcall symbol match)
                (puthash (if (listp match)
                             (mapcar #'TeX-match-buffer match)
                           (TeX-match-buffer match))
                         (setq count (1- count))
                         (cdr (or (assq symbol syms)
                                  (car (push
                                        (cons symbol
                                              (make-hash-table :test #'equal))
                                        syms)))))))))
        (setq count 0)
        (dolist (symbol syms)
          (setq lst (symbol-value (car symbol)))
          (while lst
            (puthash (pop lst)
                     (setq count (1+ count))
                     (cdr symbol)))
          (maphash (lambda (key value)
                     (push (cons value key) lst))
                   (cdr symbol))
          (clrhash (cdr symbol))
          (set (car symbol) (mapcar #'cdr (sort lst #'car-less-than-car)))))))
--8<---------------cut here---------------end--------------->8---
Then go back to the TeX file and hit 'C-c C-n'. You should get
The TeX auto-parser’s regexp used 19 groups
in the *Messages* buffer. If I apply this patch to tex.el,
--8<---------------cut here---------------start------------->8---
diff --git a/tex.el b/tex.el
index 064e694d..c7010f94 100644
--- a/tex.el
+++ b/tex.el
@@ -4267,21 +4267,16 @@ alter the numbering of any ordinary, non-shy groups.")
 
 (defvar plain-TeX-auto-regexp-list
   (let ((token TeX-token-char))
-    `((,(concat "\\\\def\\\\\\(" token "+\\)[^a-zA-Z@]")
+    `((,(concat "\\\\\\(?:def\\|let\\)\\\\\\(" token "+\\)[^a-zA-Z@]")
        1 TeX-auto-symbol-check)
-      (,(concat "\\\\let\\\\\\(" token "+\\)[^a-zA-Z@]")
-       1 TeX-auto-symbol-check)
-      (,(concat "\\\\font\\\\\\(" token "+\\)[^a-zA-Z@]") 1 TeX-auto-symbol)
-      (,(concat "\\\\chardef\\\\\\(" token "+\\)[^a-zA-Z@]") 1 TeX-auto-symbol)
-      (,(concat "\\\\new\\(?:count\\|dimen\\|muskip\\|skip\\)\\\\\\(" token
-                "+\\)[^a-zA-Z@]")
+      (,(concat "\\\\"
+                (regexp-opt '("font" "newfont" "chardef" "mathchardef"
+                              "newcount" "newdimen" "newmuskip" "newskip"))
+                "{?\\\\\\(" token "+\\)}?[^a-zA-Z@]")
        1 TeX-auto-symbol)
-      (,(concat "\\\\newfont{?\\\\\\(" token "+\\)}?") 1 TeX-auto-symbol)
       (,(concat "\\\\typein\\[\\\\\\(" token "+\\)\\]") 1 TeX-auto-symbol)
       ("\\\\input +\\([^#}%\"\\\n\r]+?\\)\\(?:\\.[^#}%/\"\\.\n\r]+\\)?"
-       1 TeX-auto-file)
-      (,(concat "\\\\mathchardef\\\\\\(" token "+\\)[^a-zA-Z@]")
-       1 TeX-auto-symbol)))
+       1 TeX-auto-file)))
   "List of regular expression matching common plain TeX macro definitions.")
 
 (defvar TeX-auto-full-regexp-list plain-TeX-auto-regexp-list
--8<---------------cut here---------------end--------------->8---
rebuild AUCTeX, restart Emacs and do the same procedure, I get
The TeX auto-parser’s regexp used 9 groups
which is 10 regexp groups fewer. So the question is: should we go this
route? I haven't run any benchmarks on how performance changes when
several single regexp entries are merged into one larger regexp. The
real target would be latex.el. Running a similar test with a LaTeX file says:
The TeX auto-parser’s regexp used 125 groups
So half of the cake (the 255 available groups) is already gone upon
loading vanilla LaTeX, with no other AUCTeX styles loaded.
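The saving in the patch comes from the fact that alternatives merged with `regexp-opt' (called without its PAREN argument) or with an explicit shy group add no numbered groups. A small sketch of the idea, where the variable name `my-merged-def-regexp' is made up and "[a-zA-Z@]" is an assumed stand-in for `TeX-token-char':

```elisp
;; Hypothetical stand-in for one merged entry; "[a-zA-Z@]" is an assumed
;; token class used here instead of AUCTeX's `TeX-token-char'.
(defvar my-merged-def-regexp
  (concat "\\\\"
          (regexp-opt '("font" "chardef" "mathchardef"
                        "newcount" "newdimen" "newmuskip" "newskip"))
          "\\\\\\([a-zA-Z@]+\\)")
  "One regexp matching several plain TeX definition commands.")

;; The keyword alternation adds no numbered group; only the macro name does.
(regexp-opt-depth my-merged-def-regexp)
;; => 1

;; Group 1 still holds the name of the defined macro.
(let ((line "\\newdimen\\mydimen "))
  (and (string-match my-merged-def-regexp line)
       (match-string 1 line)))
;; => "mydimen"
```

Seven separate entries of this shape would cost two groups each in the combined parser regexp (one wrapper, one for the macro name), while the merged entry costs two in total.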
Any comments welcome.
Best, Arash