help-octave

[Changeset] Re: Aw: Re: regexp: matching expressions b4 and after ....


From: John W. Eaton
Subject: [Changeset] Re: Aw: Re: regexp: matching expressions b4 and after ....
Date: Tue, 09 Sep 2008 12:37:10 -0400

On  9-Sep-2008, David Bateman wrote:

| Ben Abbott wrote:
| > On Tuesday, September 09, 2008, at 09:41AM, "David Bateman" <address@hidden> wrote:
| >   
| >> Grrrr, it's more annoying than I thought. PCRE CAN do arbitrary length
| >> lookahead, but not arbitrary length lookbehind. Thus "(?=[a-z]*)" is ok
| >> but "(?<=[a-z]*)" isn't. I'd hoped to replace this with
| >> "(?<=[a-z]{0,MAXLENGTH})", but a variable-length lookbehind is not ok
| >> either, even when its length is bounded. What I'd have to do is replace it with
| >>
| >> ((?<=[a-z]{0})|(?<=[a-z]{1})|...|(?<=[a-z]{MAXLENGTH}))
| >>
| >> which uses the alternation operator and MAXLENGTH+1 copies of the
| >> lookbehind expression to get the effect. This seems to be a ridiculous
| >> amount of extra crap in the pattern space to get this functionality. Is
| >> it worth supporting arbitrary length lookbehind expressions like
| >> "(?<=[a-z]*)" if this is what is needed to get it to work with PCRE? Is
| >> it worth supporting them with a limited maximum length and printing a
| >> warning? If so, what value should the limit be?
| >>
| >> Frankly I wonder how MathWorks got this to work, as they appear to be
| >> using the Boost regex library, which also doesn't support arbitrary
| >> length lookbehind expressions....
| >>
| >> D.
| >>     
| >
| > David,
| >
| > Have you tried the example in Matlab?
| >
| > Using 2007b, it does *not* work for me. My 2008a/b is busy running some
| > simulations, so I can't try it there until later.
| >
| >   
| >>> g='x^(-1)+y(-1)+z(-1)=0';
| >>> regexprep(g,'(?<=[a-z]*)\(\-[1-9]*\)','\_minus1')
| >>>       
| > ans =
| > x^_minus1+y_minus1+z_minus1=0
| >
| > If I understand correctly, the result should be
| >
| > ans =
| > x^(-1)+y_minus1+z_minus1=0
| >
| > Correct?
| >
| > Ben
| >
| >
| >
| >   
| 
| The message
|
| http://groups.google.com/group/comp.soft-sys.matlab/browse_thread/thread/babf37252132fd99/250b037e60b345ff?lnk=gst&q=lookbehind#250b037e60b345ff
|
| seems to imply that MathWorks have their own regexp engine and that
| lookbehind is inefficient. I therefore don't consider it that much of an
| issue to duplicate the lookbehind pattern in the pattern space, and so
| propose the attached changeset, which replaces "(?<=[a-z]*)" with
| "((?<=[a-z]{0})|(?<=[a-z]{1})|...|(?<=[a-z]{10}))" before calling PCRE on
| it. It also issues a warning about the maximum length if the
| lookbehind might be affected. So the limitation is that, in a lookbehind
| expression, "+" then represents 1 to 10 characters and "*" 0 to 10
| characters. This limitation doesn't apply to lookaheads, etc.
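The pattern rewrite described in the changeset message can be sketched outside Octave. The Python below is a hypothetical illustration, not the code from the changeset: `expand_lookbehind` and `MAX_LOOKBEHIND` are invented names, and Python's `re` module stands in for PCRE on the assumption that it is a fair proxy, since it likewise rejects a variable-length lookbehind but accepts an alternation of fixed-width ones. It also runs Ben's example through both the `*` and `+` expansions.

```python
import re

# Hypothetical cap, mirroring the limit of 10 described in the message.
MAX_LOOKBEHIND = 10


def expand_lookbehind(atom: str, quantifier: str, max_len: int = MAX_LOOKBEHIND) -> str:
    """Rewrite (?<=ATOM*) or (?<=ATOM+) as an alternation of fixed-width
    lookbehinds: (?<=ATOM{lo})|(?<=ATOM{lo+1})|...|(?<=ATOM{max_len}).

    Sketch only: handles a single atom followed by '*' or '+'.
    """
    if quantifier == "*":
        lo = 0  # '*' becomes 0 to max_len repetitions
    elif quantifier == "+":
        lo = 1  # '+' becomes 1 to max_len repetitions
    else:
        raise ValueError("only '*' and '+' are handled in this sketch")
    # Each alternative has a single fixed width, so the engine accepts it.
    alternatives = [f"(?<={atom}{{{n}}})" for n in range(lo, max_len + 1)]
    return "(?:" + "|".join(alternatives) + ")"


g = "x^(-1)+y(-1)+z(-1)=0"
target = r"\(-[1-9]*\)"

star = expand_lookbehind("[a-z]", "*")
plus = expand_lookbehind("[a-z]", "+")

# '*': every "(-N)" is replaced, including the one after '^'.
print(re.sub(star + target, "_minus1", g))  # prints x^_minus1+y_minus1+z_minus1=0
# '+': at least one letter must precede "(-N)", so "x^(-1)" is kept.
print(re.sub(plus + target, "_minus1", g))  # prints x^(-1)+y_minus1+z_minus1=0
```

Because the `{0}` alternative produced for `*` is an empty lookbehind, it succeeds at every position, so the expansion of `(?<=[a-z]*)` behaves as if there were no lookbehind at all; only the `+` form leaves `x^(-1)` untouched.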

Is the bug report

  http://bugs.exim.org/show_bug.cgi?id=547

the same problem?  Note the comment

  I can't see an efficient way of doing this with the current
  implementation.  Note that Perl is even more restrictive - all
  alternatives in the lookbehind have to be the same length in Perl.
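The Perl-style restriction in that comment can be reproduced with Python's `re` module, which (used here only as an illustrative stand-in, not as PCRE) requires each lookbehind to have one fixed width, so alternatives of different lengths inside a single lookbehind are rejected. PCRE, by contrast, accepts top-level alternatives of different fixed lengths. A minimal sketch; the helper `compiles` is hypothetical, and the last pattern shows the usual workaround of giving each alternative its own lookbehind.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# One lookbehind whose alternatives differ in length (2 vs 1): rejected.
print(compiles(r"(?<=ab|c)x"))  # prints False

# Separate fixed-width lookbehinds joined by alternation: accepted.
workaround = r"(?:(?<=ab)|(?<=c))x"
print(compiles(workaround))  # prints True

# Only an 'x' preceded by "ab" or "c" is replaced.
print(re.sub(workaround, "X", "abx cx dx"))  # prints abX cX dx
```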

I guess it might be worth asking whether there is a way to get this
feature, even if it is not efficient.

Meanwhile, I've applied your changeset.

Thanks,

jwe

