help-octave

Re: [Changeset] Re: Aw: Re: regexp: matching expressions b4 and after ..


From: David Bateman
Subject: Re: [Changeset] Re: Aw: Re: regexp: matching expressions b4 and after ....
Date: Wed, 10 Sep 2008 13:49:15 +0200
User-agent: Thunderbird 2.0.0.16 (X11/20080725)

John W. Eaton wrote:
On  9-Sep-2008, David Bateman wrote:

| Ben Abbott wrote:
| > On Tuesday, September 09, 2008, at 09:41AM, "David Bateman" 
<address@hidden> wrote:
| >
| >> Grrrr, it's more annoying than I thought. PCRE CAN do arbitrary length
| >> lookahead, but not arbitrary length lookbehind. Thus "(?=[a-z]*)" is ok
| >> but "(?<=[a-z]*)" isn't. I'd hoped to replace this with
| >> "(?<=[a-z]{0,MAXLENGTH})" but a variable, even if bounded, length is
| >> not ok either. What I'd have to do is replace it with
| >>
| >> ((?<=[a-z]{0})|(?<=[a-z]{1})|...|(?<=[a-z]{MAXLENGTH}))
| >>
| >> which uses the alternation operator and MAXLENGTH+1 copies of the
| >> lookbehind expression to get the effect. This seems to be a ridiculous
| >> amount of extra crap in the pattern space to get this functionality. Is
| >> it worth supporting arbitrary length lookbehind expressions like
| >> "(?<=[a-z]*)" if this is what is needed to get it to work with PCRE? Is
| >> it worth supporting it but limiting the maximum length, and printing a
| >> warning? If so, what value should be the limit?
| >>
| >> Frankly I wonder how mathworks got this to work as they appear to be
| >> using the Boost regex library which also doesn't support arbitrary
| >> length lookbehind expressions....
| >>
| >> D.
| >>
| >
| > David,
| >
| > Have you tried the example in Matlab?
| >
| > Using 2007b, It does *not* work for me. My 2008a/b is busy running some 
simulations, so I can't try it there until later.
| >
| > >> g='x^(-1)+y(-1)+z(-1)=0';
| > >> regexprep(g,'(?<=[a-z]*)\(\-[1-9]*\)','\_minus1')
| > ans =
| > x^_minus1+y_minus1+z_minus1=0
| >
| > If I understand correctly the result should be
| >
| > ans =
| > x^(-1)+y_minus1+z_minus1=0
| >
| > Correct?
| >
| > Ben
| >
| >
| >
| >
|
| The message
|
| http://groups.google.com/group/comp.soft-sys.matlab/browse_thread/thread/babf37252132fd99/250b037e60b345ff?lnk=gst&q=lookbehind#250b037e60b345ff
|
| seems to imply that mathworks have their own regexp engine and that
| lookbehind is inefficient. I therefore don't consider it that much of an
| issue to duplicate the lookbehind pattern in the pattern space and so
| propose the attached changeset that replaces "(?<=[a-z]*)" with
| "((?<=[a-z]{0})|(?<=[a-z]{1})|...|(?<=[a-z]{10}))" before calling PCRE on
| it. It also issues a warning about the maximum string length if the
| lookbehind might be an issue. So the limitation is that "+" then
| represents 1 to 10 characters and "*" 0 to 10 characters in a lookbehind
| expression. This limitation doesn't apply to lookaheads, etc.
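
As an illustration of the pattern-space expansion described above, here is a minimal sketch in Python. It is not the changeset itself: Python's `re` module is used only as a stand-in because it also rejects variable-length lookbehind, and the helper name `expand_lookbehind` and the constant `MAX_LENGTH` are hypothetical names chosen for this example.

```python
import re

MAX_LENGTH = 10  # assumed cap, mirroring the limit of 10 mentioned above


def expand_lookbehind(atom, quantifier, max_length=MAX_LENGTH):
    """Rewrite (?<=atom*) or (?<=atom+) as alternated fixed-length lookbehinds.

    Hypothetical helper: "*" expands to lengths 0..max_length,
    "+" expands to lengths 1..max_length.
    """
    start = {"*": 0, "+": 1}[quantifier]
    parts = ["(?<=%s{%d})" % (atom, n) for n in range(start, max_length + 1)]
    return "(?:" + "|".join(parts) + ")"


# The unexpanded form is refused by Python's engine, much as by PCRE.
try:
    re.compile(r"(?<=[a-z]*)\(-[1-9]*\)")
    rejected = False
except re.error:
    rejected = True

g = "x^(-1)+y(-1)+z(-1)=0"
tail = r"\(-[1-9]*\)"

# "*" allows zero preceding letters, so every "(-1)" is replaced.
star = re.sub(expand_lookbehind("[a-z]", "*") + tail, "_minus1", g)

# "+" needs at least one preceding letter, so the "(-1)" after "^" is kept.
plus = re.sub(expand_lookbehind("[a-z]", "+") + tail, "_minus1", g)

print(star)
print(plus)
```

The cost is visible in the expanded pattern: the lookbehind body is copied once per permitted length, which is the growth in the pattern space discussed above.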

Is the bug report

  http://bugs.exim.org/show_bug.cgi?id=547

the same problem?  Note the comment

  I can't see an efficient way of doing this with the current
  implementation.  Note that Perl is even more restrictive - all
  alternatives in the lookbehind have to be the same length in Perl.

Well, I added this as alternative lookbehinds rather than alternatives in the lookbehind expression itself. However, yes, it is the same issue.


I guess it might be worth asking whether there is a way to get this
feature, even if it is not efficient.

The inefficient way of doing it is essentially to do the pattern space expansion I did, but in PCRE itself. However, it can be more efficient in PCRE, as it can know how much it has to expand the search length. There are also cases like "(?<=Nov(ember)?)" to consider, which match both "Nov" and "November" and so need to be expanded as "((?<=Nov)|(?<=November))"; I haven't taken those into account. Maybe this is what the bug report is talking about regarding PCRE handling alternatives in the lookbehind expressions.
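
The optional-suffix case ("Nov" versus "November") can be sketched the same way. This is only an illustration under assumptions: it uses Python's `re` module rather than PCRE, and `expand_optional_suffix` and `day_after_month` are hypothetical helpers, not part of Octave or the changeset.

```python
import re


def expand_optional_suffix(base, suffix):
    """Build alternated lookbehinds for a literal with an optional suffix.

    Hypothetical helper: each alternative has a fixed length, which is
    what a fixed-width lookbehind engine requires.
    """
    short = re.escape(base)
    full = re.escape(base + suffix)
    return "(?:(?<=%s)|(?<=%s))" % (short, full)


# The single variable-length lookbehind is refused by Python's engine.
try:
    re.compile(r"(?<=Nov(ember)?) \d+")
    rejected = False
except re.error:
    rejected = True

pattern = re.compile(expand_optional_suffix("Nov", "ember") + r" (\d+)")


def day_after_month(text):
    """Return the number following 'Nov ' or 'November ', or None."""
    m = pattern.search(text)
    return m.group(1) if m else None


print(day_after_month("Nov 10"))
print(day_after_month("November 10"))
print(day_after_month("Dec 10"))
```

Handling this in general would mean enumerating every length the lookbehind body can match, which is why doing it inside the regex engine would be preferable to rewriting the pattern text.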

Yes, it would be better if PCRE handled this internally rather than leaving us to do it by modifying the pattern.

Cheers
David

Meanwhile, I've applied your changeset.

Thanks,

jwe



--
David Bateman                                address@hidden
Motorola Labs - Paris                        +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin    +33 6 72 01 06 33 (Mob)
91193 Gif-Sur-Yvette FRANCE                  +33 1 69 35 77 01 (Fax)

The information contained in this communication has been classified as:

[x] General Business Information
[ ] Motorola Internal Use Only
[ ] Motorola Confidential Proprietary


