Re: regexp question

help-octave

From:	Philip Nienhuis
Subject:	Re: regexp question
Date:	Tue, 06 Dec 2011 21:00:13 +0100
User-agent:	Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

Sergei Steshenko wrote:





----- Original Message -----

From: Philip Nienhuis<address@hidden>
To: William Krekeler<address@hidden>
Cc: "address@hidden"<address@hidden>; address@hidden
Sent: Tuesday, December 6, 2011 7:52 PM
Subject: Re: regexp question

Sergei, William,

2 answers in one post:

Sergei Steshenko wrote:

  I guess you need 'aa' surrounded by not 'a'. Octave uses PCRE; I am not familiar with nuances of Octave PCRE usage; in Perl I would write the regular expression this way:

  [^a]aa[^a]

  and if/when it matches, it returns pointer to the character preceding the 'aa' substring, i.e. in case of 'baab' it should return pointer to the first 'b'.

Thanks, Sergei. I already tried this and found it'll work, but unfortunately
not in a more complicated situation:

octave:35>  tststr3 = 'aa aaaaa baa'     ## Patterns at start & end
tststr3 = aa aaaaa baa
octave:36>  regexp (tststr3, "[^a]aa[^a]")
ans = [](1x0)                           ## Hey......

but
octave:41>  tststr4 = ' aa aaaaa baa '   ## Note spaces at start and end
tststr4 =  aa aaaaa baa
octave:42>  regexp (tststr4, "[^a]aa[^a]")
ans =
     1   11

... so it doesn't catch the pattern at start and end of line.

[snip]

I still suggest the Perl regular expressions tutorials/documentation I gave 
links to.

Straightforwardly, the regular expression can be extended to (in Perl syntax):

(^|[^a])(aa)([^a]|$)
#  $1    $2    $3

Outside a character class, '^' means line beginning and '$' means line end.

In Perl terms, the 'aa' part you are interested in is in $2.


Thank you, Sergei.

How do I get $2?

octave-3.5.0+:1> tststr3 = 'aa aaaaa baa'  # No spaces at ends
tststr3 = aa aaaaa baa
octave-3.5.0+:2> regexp (tststr3, "(^|[^a])(aa)([^a]|$)")
ans =
    1   10

octave-3.5.0+:3> tststr3(1)
ans = a
octave-3.5.0+:4> tststr3(10)
ans = b

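... so there's some extra interpretation involved to get the proper position: regexp() returns the start of the whole match, so index 10 points at the 'b' matched by the first group rather than at the 'aa' itself. (Little wonder, as line beginnings/ends have no length.)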
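For what it's worth, one way to sidestep that index bookkeeping might be PCRE lookaround assertions, which check the neighbouring characters without consuming them. This is a different technique from the grouped pattern quoted above and nobody in this thread proposed it; the sketch assumes an Octave build whose regexp() accepts (?<!...) and (?!...), and it says nothing about speed.

```octave
# Sketch only: lookaround assertions are an assumption, not used in this thread.
tststr3 = 'aa aaaaa baa';          # same test string as above, no spaces at ends
pat = '(?<!a)aa(?!a)';             # 'aa' neither preceded nor followed by 'a'
[s, e] = regexp (tststr3, pat);    # s: start indices, e: end indices of matches
disp (s)
disp (e)
```

Because the assertions have zero width, s should point directly at the first 'a' of each isolated 'aa' (here positions 1 and 11), also at the start and end of the string.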

Anyway, I think a regexp() solution is doomed here as its execution time is -currently- excessive (see my previous post). A while ago Rik wrote that regexprep() would be on the order of 20 times slower than strrep. The script in my previous post confirms this relative slowness of regexp vs. compiled script functions.


In conclusion, I think I'll try to cook up something with strfind().
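A minimal sketch of what such a strfind() approach could look like, purely as an illustration and not the code that was eventually written: pad the string with a sentinel character other than 'a' (a space is assumed here), collect every 'aa' candidate, and keep only those whose neighbours on both sides are not 'a'. The variable names padded, idx, keep and pos are made up for this example.

```octave
# Sketch only: one possible strfind()-based approach, not code from this thread.
tststr3 = 'aa aaaaa baa';
padded = [' ' tststr3 ' '];        # sentinel spaces so both ends have neighbours
idx = strfind (padded, 'aa');      # all candidates, overlapping ones included
keep = (padded(idx - 1) ~= 'a') & (padded(idx + 2) ~= 'a');
pos = idx(keep) - 1;               # shift back to indices in tststr3
disp (pos)
```

The padding avoids special-casing the start and end of the string; for the test string above, pos should come out as 1 and 11.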

Philip
