Re: new strsplit function
From: Ben Abbott
Subject: Re: new strsplit function
Date: Tue, 23 Apr 2013 08:18:10 -0400
On Apr 20, 2013, at 6:49 PM, Ben Abbott wrote:
> On Apr 20, 2013, at 5:41 PM, Philip Nienhuis wrote:
>
>> Ben Abbott wrote:
>>> On Apr 20, 2013, at 3:51 PM, Philip Nienhuis wrote:
>>>
>>>> Ben Abbott wrote:
>>>>>> On Apr 2, 2013, at 11:42 PM, John W. Eaton wrote:
>> :
>> <snip>
>> :
>>>>>>> Also, the new strsplit does not seem to be working correctly for
>>>>>>> single-quoted strings that contain escape sequences. I think that
>>>>>>>
>>>>>>> strsplit ("foo\tbar", '\t')
>>>>>>>
>>>>>>> should split on the TAB, but it is currently returning the original
>>>>>>> string for me.
>>>>>>
>>>>>> Oops. I hadn't considered single quoted strings. Your example does
>>>>>> work in Matlab. How is this sort of thing handled in other places? It
>>>>>> is sufficient to just ...
>>
>> <snip>
>>
>>>> And then there is the bug John mentioned with escape characters.
>>>> On a stable MinGW 3.6.4 build it doesn't work either:
>>>>
>>>> octave:1> strsplit ("foo\tbar", '\t')
>>>> ans =
>>>> {
>>>> [1,1] = foo bar
>>>> }
>>>
>>> Is this with the changeset I attached applied?
>>
>> Sorry I got a bit confused, didn't pick up that your "sq" cs was for that
>> very issue.
>>
>> I applied it manually to yesterday's MXE-build (from yesterday's updated
>> source tree) and now it works correctly:
>>
>> octave-cli:1> test strsplit
>> PASSES 38 out of 38 tests
>> octave-cli:2> strsplit ("foo\tbar", '\t')
>> ans =
>> {
>> [1,1] = foo
>> [1,2] = bar
>> }
>> octave-cli:3>
>>
>>
>> Anyway, in view of JWE's comments, what is going to happen now with
>> strsplit.m?
>> Do we keep the changes and the "legacy" option, or do we simply follow ML as
>> John suggested?
>>
>> As long as speed isn't affected (significantly) for simple cases I don't
>> care very much.
>
> Rik indicated the slowdown was about 30x. If we keep the "legacy" code, but
> don't want to commit to supporting the "legacy" option, then I'd just
> eliminate "legacy" from the doc-string and throw an error if the user
> requests it.
>
>> You have already patched the affected core functions.
>> In the io package (spreadsheet functions) there are only "simple" calls to
>> strsplit (i.e. with just a one-char delimiter) so am I right to assume no
>> changes are required there?
>
> If we get rid of the "legacy", then only list_forge_packages.m and strread.m
> need to be patched.
>
>> BTW in the texinfo help of strsplit there's neither an (explicit)
>> explanation of what the 'simple' DELIMITERTYPE does, nor an example.
>
> I can fix that. The "simple" version doesn't support regular expressions,
> although the current implementation of "simple" does rely upon regexp. There
> may be some cases (bugs) where regexp stuff sneaks past my clumsy
> attempt to escape special characters. Perhaps it is a good idea to rewrite
> that part so as not to rely on regexp?
>
> Ben
I got sidetracked looking at how to correctly collect the collapsed delimiters
for the 2nd output.
http://hg.savannah.gnu.org/hgweb/octave/rev/03a28487fa9d
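To illustrate what "collapsed delimiters" means for the 2nd output, here is a
minimal sketch using plain regexp. It is not the code from the changeset
above; the sample string and the variable names are made up for illustration.

```octave
% Sketch only: shows consecutive delimiters being collapsed into one match.
% The sample string and variable names are assumptions for illustration.
str = "a,,b,c";

% ',+' treats a run of one or more commas as a single delimiter.
% With the 'split' and 'match' options, regexp returns the pieces first
% and the matched delimiter runs second.
[pieces, delims] = regexp (str, ',+', 'split', 'match');

% pieces = {"a", "b", "c"}
% delims = {",,", ","}    (each collapsed run is returned whole)
```

The point is that each entry of the second output is the whole run of
delimiters that was collapsed, not a single delimiter character.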
I'll prepare the changeset for sq-strings next.
Ben
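
For reference on the sq-string case discussed above: in Octave a single-quoted
'\t' is two characters (a backslash and a t), while a double-quoted "\t" is a
TAB, so the delimiter has to be converted before it can match. Below is a
minimal sketch, assuming the delimiter is passed through do_string_escapes and
the split is done with strfind rather than regexp. It is not the pending
changeset, and the variable names are illustrative only.

```octave
% Sketch only: not the strsplit.m changeset. Variable names are illustrative.
str = "foo\tbar";                 % double-quoted: contains a real TAB
del = '\t';                       % single-quoted: backslash followed by 't'
del = do_string_escapes (del);    % convert the escape sequence to a TAB

% Literal split with strfind, so no regexp special characters are involved.
% Assumes occurrences of the delimiter do not overlap (true for one char).
idx    = strfind (str, del);              % start index of each delimiter
starts = [1, idx + numel(del)];           % first character of each piece
stops  = [idx - 1, numel(str)];           % last character of each piece
parts  = cell (1, numel (starts));
for k = 1:numel (starts)
  parts{k} = str(starts(k):stops(k));
end

% parts = {"foo", "bar"}
```

Because the comparison is literal, delimiters such as '.' or '*' would need no
escaping with this approach, which is the appeal of not relying on regexp for
the "simple" case.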