Re: Regexp bytecode disassembler

From: Pip Cet
Subject: Re: Regexp bytecode disassembler
Date: Fri, 20 Mar 2020 15:39:03 +0000

On Fri, Mar 20, 2020 at 12:28 PM Mattias Engdegård <address@hidden> wrote:
> It is sometimes useful to inspect the generated regexp engine bytecode, but 
> doing so currently involves recompiling with REGEX_EMACS_DEBUG configured, 
> setting an internal variable using a debugger, and watching data scrolling 
> past on stderr.
> This patch adds a lisp-based regexp bytecode disassembler which is always 
> available without any runtime cost to the regexp engine. It is mainly a tool 
> for maintainers but curious users may find it useful as well. It has already 
> revealed one bug in the regexp compiler, now fixed (f189e5dc10).

This looks excellent!

I think we should warn more about the non-reentrancy of our regexp
code, though: the disassembled text of a regexp may change when it is
used to match a string. Alternatively, we could omit volatile state
information from the disassembled text.

I don't think
  exactn "a"
is very readable, since there's no n on the right-hand side.
  exactn 1, "a"
would reflect the bytecode more precisely, while
  exact "a"
would work better as a description, IMHO.

I'd use nreverse rather than reverse, if we're worried about garbage
collecting a few cells :-)

I'd print the address of the "value" of succeed-n etc. separately: that
makes it easier to find the corresponding set-number-at. So instead
of printing

  10  succeed-n addr 23, value 0

we could print

  10  succeed-n addr 23, value 0 at addr 13

Or similar.
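
To make the suggested format concrete, here is a rough sketch in Python (not the Lisp of the patch). It assumes a succeed-n instruction is laid out as a one-byte opcode, a two-byte little-endian jump offset relative to the byte after that field, and a two-byte little-endian counter; that layout, the opcode number SUCCEED_N and the helper names are assumptions for illustration only, not taken from the patch.

```python
# Assumed layout (not confirmed by this message): one-byte opcode, then a
# two-byte little-endian relative jump offset, then a two-byte little-endian
# counter. The opcode number below is made up for the sketch.
SUCCEED_N = 0x15  # hypothetical opcode number


def read_u16(code, pos):
    """Read an unsigned 16-bit little-endian number at pos."""
    return code[pos] | (code[pos + 1] << 8)


def read_s16(code, pos):
    """Read a signed 16-bit little-endian number at pos."""
    n = read_u16(code, pos)
    return n - 0x10000 if n >= 0x8000 else n


def format_succeed_n(code, pc):
    """Format the succeed-n instruction at pc, naming the counter's address."""
    if code[pc] != SUCCEED_N:
        raise ValueError("not a succeed-n instruction")
    offset_addr = pc + 1
    value_addr = pc + 3
    # The jump is assumed to be relative to the byte after the offset field.
    target = value_addr + read_s16(code, offset_addr)
    value = read_u16(code, value_addr)
    return f"{pc:4d}  succeed-n addr {target}, value {value} at addr {value_addr}"


# Ten filler bytes, then succeed-n at address 10 with offset 10 and counter 0.
code = bytes(10) + bytes([SUCCEED_N, 10, 0, 0, 0])
print(format_succeed_n(code, 10))
```

With the counter's address printed this way, a set-number-at that names address 13 can be matched to this instruction by searching for "at addr 13".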
