Re: Unquoted special characters in regexps

From: Luc Teirlinck
Subject: Re: Unquoted special characters in regexps
Date: Sat, 4 Mar 2006 21:37:53 -0600 (CST)

Thien-Thi Nguyen wrote:
   whether or not the delimiter itself is considered inside or outside
   depends on whether your pov tends to be forward- or backward-looking
   (which is a personal choice, and thus, algorithmically irrelevent).

No, both the notion of context and the forward-looking view are
algorithmically _very_ relevant.

If you consider in "[a]b]" the first and the second `]' to be _both_
inside or _both_ outside the context of a character alternative, then
it would be impossible to determine solely from that notion of context
which of the two `]' has to be taken literally.  If you consider the
opening and ending " of a string to be _both_ inside or _both_ outside
the context of a string, then it would be impossible from that notion
of context to determine which " open and which " close strings.

Thus any such notions of context are useless.

On the other hand the regexp compiler uses the notion
of context I mentioned to determine which `[' or `]' are to be
interpreted literally.  It is also how other parsers determine which "
open strings and which close them.  Hence, that notion of context is
useful, in fact, necessary.
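
To make this concrete, here is a minimal sketch in Python of a forward
scan that tracks whether it is currently inside a character
alternative and labels each `[' and `]' accordingly.  It is not the
actual Emacs regexp compiler: the function name and the role labels
are made up for illustration, and the rules are simplified (a `]'
directly after `[' or `[^' is literal, `[:name:]' is a character
class, and a backslash quotes only outside a character alternative).

```python
def classify_brackets(regexp):
    """Scan REGEXP left to right; return (index, char, role) per bracket.

    Hypothetical helper with simplified rules, not the Emacs implementation.
    """
    roles = []
    i, n = 0, len(regexp)
    inside = False  # are we inside a character alternative right now?
    while i < n:
        c = regexp[i]
        if not inside:
            if c == "\\" and i + 1 < n:
                # Outside an alternative, a backslash quotes the next char.
                if regexp[i + 1] in "[]":
                    roles.append((i + 1, regexp[i + 1], "literal"))
                i += 2
                continue
            if c == "[":
                roles.append((i, c, "open"))
                inside = True
                i += 1
                if i < n and regexp[i] == "^":
                    i += 1  # skip the complement marker
                if i < n and regexp[i] == "]":
                    # A `]' in first position is a member, not the end.
                    roles.append((i, "]", "literal"))
                    i += 1
                continue
            if c == "]":
                roles.append((i, c, "literal"))
        else:
            if regexp.startswith("[:", i):
                end = regexp.find(":]", i + 2)
                if end != -1:
                    # Character class such as [:alpha:]; skip it whole.
                    roles.append((i, "[", "class-open"))
                    roles.append((end + 1, "]", "class-close"))
                    i = end + 2
                    continue
            if c == "[":
                roles.append((i, c, "literal"))
            elif c == "]":
                roles.append((i, c, "close"))
                inside = False
        i += 1
    return roles


print(classify_brackets("[a]b]"))
print(classify_brackets("[]a]"))
```

For "[a]b]" the scan reports the first `]' as closing the alternative
and the second as literal; the single `inside' flag is all it needs to
tell them apart.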

Also, forward and backward views of a regexp are not
algorithmically equivalent.  If you read a regexp forward, you know
immediately when you encounter a character whether it has to be taken
literally or not (or at worst after a _very_ limited number of
characters, as with the second `[' in "[[:...").  If you read the regexp
backward, you may have to read all the way back to the beginning
before you can be sure that a `]' is to be taken literally.
Hence, reading a regexp forward _is_ algorithmically _very_ superior
to reading it backward if your purpose is to understand the regexp.
I must admit, however, that if what you want is to uncover the subliminal
satanic messages in the regexp, then you _have_ to read it backward.
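
As a small illustration of the backward-reading problem, take two
regexps that share the same ending and differ only in their very first
character.  The sketch below uses Python's `re' module rather than the
Emacs regexp engine (an assumption made for convenience; for these two
patterns both treat an unmatched `]' as literal), and the helper name
`matches' is made up for the example.

```python
import re

tail = "abc]"               # the shared ending of both regexps

literal_case = "x" + tail   # "xabc]": no `[' precedes, so `]' is literal
closing_case = "[" + tail   # "[abc]": the same `]' now closes an alternative


def matches(regexp, text):
    """Return True if REGEXP matches the whole of TEXT."""
    return re.fullmatch(regexp, text) is not None


print(matches(literal_case, "xabc]"))   # True: `]' matched literally
print(matches(closing_case, "a"))       # True: one char from the set a, b, c
print(matches(closing_case, "[abc]"))   # False: `]' is not literal here
```

Reading from the end, the two regexps look identical until the first
character is reached; only there does the role of the final `]' become
known.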


