octave-maintainers

Re: Regexp cleanup


From: PhilipNienhuis
Subject: Re: Regexp cleanup
Date: Wed, 3 Jul 2013 12:57:23 -0700 (PDT)

Rik-4 wrote
> 7/3/13
> 
> All,
> 
> Does anyone know if the following expression is legal in Matlab?
> 
> [S, E, TE, M, T, NM, SP] = regexp ("John Davis\nRogers, James",
> '(?<first>\w+)\s+(?<last>\w+)|(?<last>\w+),\s+(?<first>\w+)')
> 
> The issue is with the repeated use of a named capture buffer across an
> alternation operator.  PCRE, which we use underneath for regular
> expressions, does not support non-unique capture names in a pattern. 
> Octave currently works around this by renaming the capture buffers. 
> However, the logic at the far end to parse the output of PCRE and return
> results to Octave is very complex and creaky.  I re-wrote the back end
> routine in util/regexp.cc and I can now, at least, follow what the code is
> doing.  The re-write also solves the following existing bugs (I said it was
> creaky).
> 
> 38778: wrong return value for regexp
> 38616: memory leak
> 38149: wrong tokens returned
> 
> So, depending on what Matlab does, would it be okay to drop support for
> this esoterica?  I'm pretty tired of trying to work it out at this point.
> 
> --Rik
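
The renaming workaround Rik describes can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual util/regexp.cc code: every repeated "(?<name>" is rewritten with a hypothetical "_2", "_3" (and so on) suffix so that PCRE only ever sees unique names, and the original name of each group is kept so the back end could map the results back later.

```octave
## Hypothetical sketch of the renaming workaround; the "_2" suffix
## scheme is an assumption, not what util/regexp.cc does.
pat = '(?<first>\w+)\s+(?<last>\w+)|(?<last>\w+),\s+(?<first>\w+)';

## Find every "(?<name>" opener, its position, and the name itself.
[tok, starts, ends] = regexp (pat, '\(\?<(\w+)>', 'tokens', 'start', 'end');
names = cellfun (@(c) c{1}, tok, 'UniformOutput', false);

## Give the n-th repeat of a name the suffix "_n" so PCRE only sees
## unique names; names{k} still records what group k was called.
alias = cell (size (names));
newpat = '';
last_end = 0;
for k = 1:numel (names)
  n = sum (strcmp (names(1:k), names{k}));   # occurrences so far
  if (n == 1)
    alias{k} = names{k};
  else
    alias{k} = sprintf ('%s_%d', names{k}, n);
  endif
  ## Copy the text before this opener, then the renamed opener.
  newpat = [newpat, pat(last_end+1:starts(k)-1), '(?<', alias{k}, '>'];
  last_end = ends(k);
endfor
newpat = [newpat, pat(last_end+1:end)];
```

For the pattern above this yields '(?<first>\w+)\s+(?<last>\w+)|(?<last_2>\w+),\s+(?<first_2>\w+)'. Keeping names next to alias is what would let the back end report last_2 as last again when it builds the outputs.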

Matlab R2013b prerelease gives the following (after changing the double quotes
to single quotes and removing the empty lines):

>> [S, E, TE, M, T, NM, SP] = regexp ('John Davis\nRogers, James', '(?<first>\w+)\s+(?<last>\w+)|(?<last>\w+),\s+(?<first>\w+)')

S =
     1    12
E =
    10    25
TE = 
    [2x2 double]    [2x2 double]
M = 
    'John Davis'    'nRogers, James'
T = 
    {1x2 cell}    {1x2 cell}
NM = 
1x2 struct array with fields:
    first
    last
SP = 
    ''    '\'    ''
>> 

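...so it seems Matlab thinks this is valid. (Note that Matlab does not
interpret \n inside a single-quoted string, so the subject contains a literal
backslash followed by 'n'; that is why the second match is 'nRogers, James'
and the middle split piece is '\'.)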
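
To get a Matlab-compatible NM (a 1x2 struct array with fields first and last) out of renamed groups, a back end would have to fold the duplicates together again. The Octave sketch below shows one way to do that. The capture lists are what the example subject should produce for the four groups, with '' standing for a group that did not take part in a match, and the rule "take the first non-empty group sharing the original name" is an assumption, not documented Matlab behavior.

```octave
## Hypothetical back-end step (a sketch, not util/regexp.cc):
## fold renamed capture groups back into their original names.
## orig_names{k} is the name group k had before renaming.
orig_names = {'first', 'last', 'last', 'first'};

## caps{m}{k}: text captured by group k in match m of the example
## subject; '' marks a group that did not take part in that match.
caps = {{'John', 'Davis', '', ''}, ...
        {'', '', 'nRogers', 'James'}};

## Collect the distinct names, keeping first-appearance order.
uniq = {};
for k = 1:numel (orig_names)
  if (! any (strcmp (uniq, orig_names{k})))
    uniq{end+1} = orig_names{k};
  endif
endfor

## Build one struct per match; for each name take the first
## non-empty capture among the groups sharing that name (assumed rule).
NM = struct ();
for m = 1:numel (caps)
  for f = 1:numel (uniq)
    val = '';
    for k = find (strcmp (orig_names, uniq{f}))
      if (! isempty (caps{m}{k}))
        val = caps{m}{k};
        break;
      endif
    endfor
    NM(m).(uniq{f}) = val;
  endfor
endfor
```

Here NM(1) holds first = 'John', last = 'Davis' and NM(2) holds first = 'James', last = 'nRogers', matching the shape of the Matlab output above.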

Philip



--
View this message in context: 
http://octave.1599824.n4.nabble.com/Regexp-cleanup-tp4655163p4655172.html
Sent from the Octave - Maintainers mailing list archive at Nabble.com.

