On 9-Sep-2008, David Bateman wrote:
| Ben Abbott wrote:
| > On Tuesday, September 09, 2008, at 09:41AM, "David Bateman" <address@hidden> wrote:
| >
| >> Grrrr, it's more annoying than I thought. PCRE CAN do arbitrary length
| >> lookahead, but not arbitrary length lookbehind. Thus "(?=[a-z]*)" is ok
| >> but "(?<=[a-z]*)" isn't. I'd hoped to replace this with
| >> "(?<=[a-z]{0,MAXLENGTH})", but a variable-length lookbehind is not ok
| >> either, even when the length is bounded. What I'd have to do is
| >> replace it with
| >>
| >> ((?<=[a-z]{0})|(?<=[a-z]{1})|...|(?<=[a-z]{MAXLENGTH}))
| >>
| >> which uses the alternation operator and MAXLENGTH+1 copies of the
| >> lookbehind expression to get the effect. This seems to be a ridiculous
| >> amount of extra crap in the pattern space to get this functionality. Is
| >> it worth supporting arbitrary length lookbehind expressions like
| >> "(?<=[a-z]*)" if this is what is needed to get it to work with PCRE? Is
| >> it worth supporting them but limiting the maximum length and printing a
| >> warning? If so, what value should the limit be?
| >>
| >> Frankly I wonder how mathworks got this to work as they appear to be
| >> using the Boost regex library which also doesn't support arbitrary
| >> length lookbehind expressions....
| >>
| >> D.
| >>
| >
| > David,
| >
| > Have you tried the example in Matlab?
| >
| > Using 2007b, it does *not* work for me. My 2008a/b is busy running some
| > simulations, so I can't try it there until later.
| >
| >
| >>> g='x^(-1)+y(-1)+z(-1)=0';
| >>> regexprep(g,'(?<=[a-z]*)\(\-[1-9]*\)','\_minus1')
| >>>
| > ans =
| > x^_minus1+y_minus1+z_minus1=0
| >
| > If I understand correctly the result should be
| >
| > ans =
| > x^(-1)+y_minus1+z_minus1=0
| >
| > Correct?
| >
| > Ben
| >
| >
| >
| >
|
| The message
|
| http://groups.google.com/group/comp.soft-sys.matlab/browse_thread/thread/babf37252132fd99/250b037e60b345ff?lnk=gst&q=lookbehind#250b037e60b345ff
|
| seems to imply that mathworks have their own regexp engine and that
| lookbehind is inefficient. I therefore don't consider it that much of an
| issue to duplicate the lookbehind pattern in the pattern space and so
| propose the attached changeset that replaces "(?<=[a-z]*)" with
| "((?<=[a-z]{0})|(?<=[a-z]{1})|...|(?<=[a-z]{10}))" before calling PCRE on
| it. It also issues a warning about the maximum string length if the
| lookbehind might be affected. So the limitation is that "+" then
| represents 1 to 10 characters and "*" 0 to 10 characters in a lookbehind
| expression. This limitation doesn't apply to lookaheads, etc.
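
The rewrite described above can be sketched in Python, whose "re" module also insists on fixed-width lookbehind. This is only an illustration of the idea, not the attached changeset: expand_lookbehind and MAX_LOOKBEHIND are hypothetical names, and the cap of 10 is simply the value quoted in the message.

```python
import re

MAX_LOOKBEHIND = 10  # assumed cap, taken from the "{10}" quoted above


def expand_lookbehind(body, quantifier, max_len=MAX_LOOKBEHIND):
    """Rewrite (?<=body*) or (?<=body+) as an alternation of
    fixed-length lookbehinds, one per permitted length.

    body is a single-character sub-pattern such as "[a-z]";
    quantifier is "*" (0..max_len) or "+" (1..max_len).
    """
    start = 0 if quantifier == "*" else 1
    alternatives = ["(?<=%s{%d})" % (body, n) for n in range(start, max_len + 1)]
    return "(?:" + "|".join(alternatives) + ")"


g = "x^(-1)+y(-1)+z(-1)=0"
tail = r"\(-[1-9]*\)"

# "*" permits a zero-length lookbehind, so every "(-1)" is replaced.
star = re.sub(expand_lookbehind("[a-z]", "*") + tail, "_minus1", g)

# "+" demands at least one preceding letter, so "x^(-1)" is left alone.
plus = re.sub(expand_lookbehind("[a-z]", "+") + tail, "_minus1", g)
```

With these definitions, star is x^_minus1+y_minus1+z_minus1=0 and plus is x^(-1)+y_minus1+z_minus1=0. The difference comes from the {0} alternative: a "*" lookbehind is satisfied by zero characters, so it never rejects a match.
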
Is the bug report

  http://bugs.exim.org/show_bug.cgi?id=547

the same problem? Note the comment

  I can't see an efficient way of doing this with the current
  implementation. Note that Perl is even more restrictive - all
  alternatives in the lookbehind have to be the same length in Perl.
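
The same-length restriction mentioned in that comment can be seen in other engines too. As a stand-in (an assumption for illustration, since neither PCRE nor Perl is being called here), Python's "re" module rejects both an unbounded lookbehind and a lookbehind whose alternatives differ in length, but accepts the form where each alternative gets its own fixed-length lookbehind. The helper name compiles is hypothetical.

```python
import re


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Unbounded lookbehind: rejected.
unbounded = compiles(r"(?<=[a-z]*)x")

# Alternatives of different lengths inside one lookbehind: rejected.
mixed = compiles(r"(?<=a|bc)x")

# One fixed-length lookbehind per alternative: accepted.
hoisted = compiles(r"(?:(?<=a)|(?<=bc))x")
```

Here unbounded and mixed are False and hoisted is True, which is why duplicating the lookbehind once per length works even in an engine with this restriction.
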