help-octave

Re: regexp question


From: Philip Nienhuis
Subject: Re: regexp question
Date: Tue, 06 Dec 2011 21:00:13 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

Sergei Steshenko wrote:




----- Original Message -----
From: Philip Nienhuis<address@hidden>
To: William Krekeler<address@hidden>
Cc: "address@hidden"<address@hidden>; address@hidden
Sent: Tuesday, December 6, 2011 7:52 PM
Subject: Re: regexp question

Sergei, William,

Two answers in one post:

Sergei Steshenko wrote:
  I guess you need 'aa' surrounded by characters other than 'a'. Octave uses
PCRE; I am not familiar with the nuances of Octave's PCRE usage; in Perl I would
write the regular expression this way:

  [^a]aa[^a]

  and if/when it matches, it returns a pointer to the character preceding the
'aa' substring, i.e. in the case of 'baab' it should return a pointer
to the first 'b'.

Thanks, Sergei. I had already tried this and found that it works, but unfortunately
not in a more complicated situation:

octave:35>  tststr3 = 'aa aaaaa baa'     ## Patterns at start & end
tststr3 = aa aaaaa baa
octave:36>  regexp (tststr3, "[^a]aa[^a]")
ans = [](1x0)                           ## Hey......

but
octave:41>  tststr4 = ' aa aaaaa baa '   ## Note spaces at start and end
tststr4 =  aa aaaaa baa
octave:42>  regexp (tststr4, "[^a]aa[^a]")
ans =
     1   11

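... so it doesn't catch the pattern at the start and end of the line: each [^a] has to match an actual character, and there is none before the first or after the last character.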

[snip]

I still recommend the Perl regular expression tutorials and documentation I gave
links to.

The regular expression can be extended straightforwardly to (in Perl syntax):

(^|[^a])(aa)([^a]|$)
#  $1    $2    $3

Outside a character class, '^' means line beginning and '$' means line end.

In Perl terms, the 'aa' part you are interested in is in $2.
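For comparison, here is a sketch that is not from the thread. It uses PCRE's zero-width lookaround assertions, which Octave's regexp() accepts, to express "not preceded or followed by 'a'" directly. `(?<!a)` succeeds when the previous character is not 'a' or when there is no previous character, and `(?!a)` does the same for the next character. Neither assertion consumes characters, so the returned index is the start of 'aa' itself and the string start and end need no special treatment.

```octave
## Sketch: lookaround assertions match 'aa' not preceded or followed by 'a'.
## The assertions are zero-width, so the start index points at 'aa' itself.
tststr3 = 'aa aaaaa baa';
regexp (tststr3, '(?<!a)aa(?!a)')   # returns [1 11]

## Targets separated by a single character are both found,
## because no separator character is consumed by a match.
regexp ('aa aa', '(?<!a)aa(?!a)')   # returns [1 4]
```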

Thank you, Sergei.

How do I get $2?

octave-3.5.0+:1> tststr3 = 'aa aaaaa baa'  # No spaces at ends
tststr3 = aa aaaaa baa
octave-3.5.0+:2> regexp (tststr3, "(^|[^a])(aa)([^a]|$)")
ans =
    1   10

octave-3.5.0+:3> tststr3(1)
ans = a
octave-3.5.0+:4> tststr3(10)
ans = b

... so some extra interpretation is needed to get the proper position: the first match starts at the 'aa' itself, while the second starts at the preceding 'b'. (Little wonder, as line beginnings and ends have no length.)
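One way to do that interpretation is sketched below, assuming the target is always 'aa' and using only the match start positions. Each match either begins at the 'aa' itself (when `^` matched the empty string) or one character earlier (when `[^a]` matched), so adding 1 wherever the match begins with a character other than 'a' yields the position of 'aa'. A caveat: the trailing `[^a]` consumes a character, so two targets separated by a single character (e.g. 'aa aa') produce only one match.

```octave
## Sketch: recover the start of 'aa' from the match start of
## (^|[^a])(aa)([^a]|$).  If the match begins with a character other
## than 'a', that character is the leading [^a], so 'aa' starts one later.
tststr3 = 'aa aaaaa baa';
s = regexp (tststr3, '(^|[^a])(aa)([^a]|$)');  # match starts: 1 10
pos = s + (tststr3(s) != 'a')                   # positions of 'aa': 1 11
```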


Anyway, I think a regexp() solution is doomed here, as its execution time is (currently) excessive (see my previous post). A while ago Rik wrote that regexprep() is on the order of 20 times slower than strrep(). The script in my previous post confirms that regexp() is relatively slow compared with built-in string functions such as strrep().

In conclusion, I think I'll try to cook up something with strfind().
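A rough sketch of what a strfind()-based approach could look like (the variable names are illustrative and not from the thread): strfind() returns every occurrence of 'aa', overlapping ones included, and two logical masks then discard occurrences that have an 'a' immediately before or after them. Only vectorized indexing is involved, no regular expressions.

```octave
## Sketch: find 'aa' not adjacent to another 'a', using strfind() and masks.
str = 'aa aaaaa baa';
p = strfind (str, 'aa');              # all occurrences, overlaps included: 1 4 5 6 7 11

not_a = (str != 'a');                 # true where the character is not 'a'
prev_ok = [true, not_a(1:end-1)];     # prev_ok(k): k == 1, or str(k-1) is not 'a'
next_ok = [not_a(3:end), true, true]; # next_ok(k): str(k+2) is not 'a', or k+2 is past the end

idx = p(prev_ok(p) & next_ok(p))      # isolated 'aa' start positions: 1 11
```

Whether this is actually faster than regexp() for long strings would still have to be measured.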

Philip

