help-gnu-emacs

Re: How to grok a complicated regex?


From: Emanuel Berg
Subject: Re: How to grok a complicated regex?
Date: Fri, 13 Mar 2015 23:46:48 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4 (gnu/linux)

Marcin Borkowski <mbork@wmi.amu.edu.pl> writes:

> so I have this monstrosity [note: I know, there are
> much worse ones, too!]:
>
> "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
>
> (it's in the org-latex--script-size function in
> ox-latex.el, if you're curious).
>
> I'm not asking “what does this match” – I can read
> it myself. But it comes with a considerable effort.

I dare say most people (even programmers) cannot read
that, so if you can, that's great. As a math
professional you are of course aware of the discipline
called automata theory, which deals with such things.
Perhaps relational algebra might help too, if the data
in the sets are strings. But automata theory should
fit even better.

Also, remember you don't have to understand those
expressions. Often they are set up incrementally. They
only need to be correct. The computer understands them
- the programmer only understands the purpose, and the
latest edition. Kind of risky, perhaps not what a math
person would find appealing, but I've constructed many
that way so I know that method works.
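
The incremental approach can be sketched in Emacs Lisp,
using the regexp quoted above as the target. The `my-'
names are hypothetical, chosen for illustration; only
the regexp itself comes from the original message. Each
piece is small enough to test on its own before it is
glued onto the rest.

```elisp
;; Hypothetical names; the pieces mirror the regexp quoted above.

;; Optional opener: a backslash plus ( or [, or a run of dollar signs.
(defconst my-open-re "\\(?:\\\\[([]\\|\\$+\\)?")

;; Group 1: the body, matched non-greedily.
(defconst my-body-re "\\(.*?\\)")

;; Optional closer: a backslash plus ] or ), or a run of dollar signs.
(defconst my-close-re "\\(?:\\\\[])]\\|\\$+\\)?")

;; Glue the pieces together, anchored at both ends of the string.
(defconst my-math-re
  (concat "\\`" my-open-re my-body-re my-close-re "\\'"))

(defun my-math-body (string)
  "Return the part of STRING between the optional delimiters, or nil."
  (when (string-match my-math-re string)
    (match-string 1 string)))

;; Try each step as it is added, e.g. in *scratch* or M-x ielm:
;; (string-match-p (concat "\\`" my-open-re) "$$x$$")
;; (my-math-body "\\(x^2\\)")
;; (my-math-body "$$a+b$$")
```

The concatenation gives back the original string, so
nothing is lost by keeping the pieces separate.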

> Are you aware of any tools that might help to
> understand such regexen?

I have seen tools with which you can construct such
expressions and they output figures, states,
transitions, and so on. I wonder how advanced an
expression they can deal with. But if you get the
basics right, it should be just basic building blocks
that stick together, and from there on the sky is the
limit.

Instead the problem is, as I see it: will those
figures, balls and arrows, tagged with preconditions,
postconditions, everything you can think of, will that
actually be *clearer*?

If I were to do it (which I am not, thank God) my
answer would be *no*. The only way I could do it would
instead be the opposite. Train the brain with such
expressions - exactly as they are - day in, day out,
until they are second nature.

Example: a C++ OO project with classes and everything.
Silly inheritance and interfaces. Some people would
consider those pretty darn difficult to understand.
But to the seasoned C++ programmer (no exaggerating
here, a few years of focused training is enough) those
programs are clear. For those guys, giving up writing
C++ code and instead using some other representation
(be it graphical or not) would cripple their skills in
one stroke.

So no, I think that representation is the best there
is. Translating it back and forth would be very
difficult to do. It is possible, of course, because a
representation is just one of I don't know how many
possible ones, but I don't see the end result being
any clearer: on the contrary, most likely.

What I would do is try to make it more readable by
using character classes, string classes (do they
exist?), and even more advanced constructs if
necessary - as in this simple example:

    (defconst stop-char-default "\\([[:punct:]]\\|[[:space:]][[:alnum:]]\\)")

How do you define those? Can you identify any which
aren't there, but could/should be?
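
One readability aid that ships with Emacs is the `rx'
macro, which writes a regexp as nested Lisp forms with
named parts. Below is a sketch of the regexp quoted
above in that notation. The `my-' names are
hypothetical, and the rx form is my own reading of the
string, so the helper at the end only checks that the
two agree on a given input; it proves nothing in
general.

```elisp
(require 'rx)

;; The original string form, copied from the message quoted above.
(defconst my-math-re-string
  "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'")

;; The same pattern as an rx form (hypothetical name).
(defconst my-math-re-rx
  (rx string-start
      (opt (or (seq "\\" (any "(["))   ; backslash, then ( or [
               (+ "$")))               ; or one or more $
      (group (*? nonl))                ; group 1: shortest body
      (opt (or (seq "\\" (any "])"))   ; backslash, then ] or )
               (+ "$")))               ; or one or more $
      string-end))

(defun my-same-body-p (string)
  "Return non-nil if both regexps capture the same group 1 in STRING."
  (equal (and (string-match my-math-re-string string)
              (match-string 1 string))
         (and (string-match my-math-re-rx string)
              (match-string 1 string))))

;; Spot checks to evaluate by hand:
;; (my-same-body-p "\\[x\\]")
;; (my-same-body-p "$$a+b$$")
```

The rx form is longer, but each line can carry its own
comment, which the string form cannot.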

Example: say there is a class called "delimiters"
which contains [, (, {, <, >, }, ), and ]. Can you
split that up, into "opening-delimiters" and closing
ditto?
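
No such named classes exist in Emacs regexps as far as
I know, but the split can be sketched with ordinary
constants. The names below are hypothetical.

```elisp
;; Hypothetical names, following the split suggested above.

;; Any one of [ ( { <
(defconst my-opening-delimiters "[[({<]")

;; Any one of ] ) } >  (the ] must come first inside the brackets)
(defconst my-closing-delimiters "[])}>]")

;; The full class, built from its two halves.
(defconst my-delimiters
  (concat "\\(?:" my-opening-delimiters
          "\\|" my-closing-delimiters "\\)"))

;; Position of the first delimiter of each kind in a string:
;; (string-match-p my-opening-delimiters "a(b)")
;; (string-match-p my-closing-delimiters "a(b)")
```

Building the full class from the two halves keeps the
three definitions from drifting apart.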

Second, exactly what you mentioned - the font lock
issue - work on that.

You do know, of course, of

    font-lock-regexp-grouping-construct
    font-lock-regexp-grouping-backslash

Are there more of those, that you can identify, and
add?
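
To see which such faces a running Emacs already
defines, one can filter the face list by name. This is
only a sketch: the function name is hypothetical, and
the "font-lock-regexp" prefix is my assumption about
how any further faces of this kind would be named.

```elisp
(require 'cl-lib)

(defun my-regexp-faces ()
  "Return the defined faces whose names start with \"font-lock-regexp\"."
  (cl-remove-if-not
   (lambda (face)
     (string-prefix-p "font-lock-regexp" (symbol-name face)))
   (face-list)))

;; Evaluate (my-regexp-faces) in *scratch* to list them.
```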

-- 
underground experts united

