Re: Improving strread / textread / textscan
From: Philip Nienhuis
Subject: Re: Improving strread / textread / textscan
Date: Mon, 24 Oct 2011 20:49:17 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6
Answers to three emails in one:
Ben Abbott wrote:
On Oct 23, 2011, at 6:42 PM, Ben Abbott wrote:
OK. Let's start by writing tests for ML. I'll begin by extracting Octave's
tests and confirming they work on ML.
Ben
I've copied the tests from textscan and modified them to run on ML. To do that
I wrote a simple oct_assert function to handle the asserts. Of the 14 asserts
in total, two failed.
Test #1: Passed.
Test #2: Passed.
Test #3: Passed.
Test #4: Passed.
Test #5: Passed.
Test #6: Passed.
Test #7: Passed.
Test #8: Passed.
Test #9: Failed.
OBSERVED:
16 241 3
EXPECTED:
16 241 3 0
Test #10: Passed.
Test #11: Passed.
Hmmm... on ML2007a, I get:
Test #11: Failed.
OBSERVED:
49 10 76 50
EXPECTED:
76 49 10 76 50
So ML's behavior is inconsistent between versions...
(Note: I fixed some typos in your script :-) )
Test #12: Failed.
OBSERVED:
2
EXPECTED:
2
4
0
Test #13: Passed.
Test #14: Passed.
The script with the tests and the oct_assert function are attached.
Apparently ML doesn't recognize empty fields squeezed between two literals.
====================
On Oct 23, 2011, at 8:37 PM, Ben Abbott wrote:
> a3 = cell2mat (textscan (sprintf ('Text1Text2Text\nText3TextText\nText57Text63Text'), 'Text%dText%dText'))
>
> Matlab returns ...
>
> a1 =
> 2
> 4
> a2 =
> 2
> Error using cat
> CAT arguments dimensions are not consistent.
I got this wrong. Removing the "cell2mat" ...
a3 = textscan (sprintf ('Text1Text2Text\nText3TextText\nText57Text63Text'), 'Text%dText%dText')
a3 =
[2x1 int32] [2]
a3{1}
ans =
1
3
However, I'm still having trouble understanding ML's behavior.
This might just be an ML bug.
I think Octave does the right (= expected) thing.
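Assuming "the right thing" here means honouring the empty field between the two literals in "Text3TextText" and filling it with 0, the usual empty value for an int32 field, the expected result can be sketched in Python. `parse_literal_format` is a hypothetical helper that hard-codes the single format 'Text%dText%dText' as a regular expression; it is an illustration of the expected output, not how Octave's textscan works internally.

```python
import re


def parse_literal_format(text, fill=0):
    """Hypothetical helper: read 'Text%dText%dText' line by line.

    An empty slot between two literals counts as an empty field and is
    replaced by `fill` (assumed 0, as for int32 data).
    """
    # Each %d becomes an optional integer group between the literals.
    pattern = re.compile(r"Text(-?\d+)?Text(-?\d+)?Text")
    col1, col2 = [], []
    for line in text.splitlines():
        m = pattern.fullmatch(line)
        if m is None:
            raise ValueError("line does not match the format: %r" % line)
        a, b = m.groups()  # None when the field between literals is empty
        col1.append(int(a) if a is not None else fill)
        col2.append(int(b) if b is not None else fill)
    return col1, col2


print(parse_literal_format("Text1Text2Text\nText3TextText\nText57Text63Text"))
# ([1, 3, 57], [2, 0, 63])
```

Under that assumption all three lines are read and the second column becomes 2, 0, 63, instead of the truncated columns ML returns above.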
=========================
Ben Abbott wrote:
I've made some modifications to your original notes, and added a few more below.
a. "Words" or fields (to be interpreted later) are separated by white-space or
delimiters.
b. The white-space char set can be adapted by the user with the "whitespace"
keyword. It can even be set to empty.
c. White-space may be a run of several white-space chars; during reading such a
run is folded into a single separator between two fields.
d. Delimiters are also characters that separate words / fields. Multiple
delimiters are not folded into a single instance.
e. Vectors of white-space and one delimiter are folded into one _delimiter_
that separates fields.
f. A pair of delimiters separated by white-space (or nothing) implies an empty
value.
g. By default "emptyvalue" is NaN for numeric data types. If the numeric type
doesn't support NaN (int32, for example), zero is used. For character fields, an
empty value is just an empty string.
h. If so desired, multiple consecutive delimiters can be folded into one delimiter
by setting the "MultipleDelimsAsOne" parameter to 1.
i. EOL char sequences (\n, \r\n, or \r) are also delimiters, but are not
affected by the MultipleDelimsAsOne parameter.
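Taken literally, rules a-i amount to a small tokenizer. The Python sketch below follows them one by one; `split_fields` is a hypothetical helper, the default white-space set " \t" is an assumption, delimiters are single characters only, and the empty-value conversion of rule g is left out (empty fields come back as "").

```python
import re
from itertools import groupby


def split_fields(text, whitespace=" \t", delimiters="", multiple_delims_as_one=False):
    """Hypothetical helper: split text into fields per a literal reading of rules a-i."""
    def is_sep(ch):
        return ch in whitespace or ch in delimiters

    fields = []
    # Rule i: \r\n, \n and \r end a record. Splitting on them first keeps
    # EOLs out of the MultipleDelimsAsOne folding below.
    for record in re.split(r"\r\n|\r|\n", text):
        current = None          # word not yet emitted as a field
        ended_on_delim = False
        for sep, run in groupby(record, key=is_sep):
            chunk = "".join(run)
            if not sep:
                # Rules a, c: a white-space-only gap separates two words.
                if current is not None:
                    fields.append(current)
                current = chunk
                ended_on_delim = False
                continue
            # Rule e: white-space around a delimiter is absorbed into it.
            n_delims = sum(ch in delimiters for ch in chunk)
            if multiple_delims_as_one:
                n_delims = min(n_delims, 1)     # rule h
            for _ in range(n_delims):
                # Rules d, f: each delimiter closes a field, empty if no word preceded it.
                fields.append(current if current is not None else "")
                current = None
            ended_on_delim = n_delims > 0
        if current is not None:
            fields.append(current)
        elif ended_on_delim:
            fields.append("")                   # a trailing delimiter closes an empty field
    return fields


print(split_fields("a b c, d e, , f", delimiters=","))
# ['a', 'b', 'c', 'd', 'e', '', 'f']
print(split_fields("a b c, d e, , f", delimiters=",", multiple_delims_as_one=True))
# ['a', 'b', 'c', 'd', 'e', 'f']
```

Applied to the strread test quoted further down, this literal reading yields seven fields, with an empty sixth one, rather than the four fields Octave and ML actually return.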
As to strread, there's another ML subrule:
<QUOTE>
If your data uses a character other than a space as a delimiter, you
must use the strread parameter 'delimiter' to specify the delimiter
</QUOTE>
Which is it, space or whitespace?
Anyway, if our collection of inferred rules applies, I do not
understand this (7th test in Octave's strread.m):
octave:23> a = strread ("a b c, d e, , f", "%s", "delimiter", ",")
a =
{
[1,1] = a b c
[2,1] = d e
[3,1] =
[4,1] = f
}
(The same goes for ML.) If the rules applied, especially a. and e., I would
expect ML to yield:
a =
{
[1,1] = a
[2,1] = b
[3,1] = c
[4,1] = d
[5,1] = e
[6,1] = []
[7,1] = f
}
because in this example spaces ("whitespace") separate, e.g.,
'a' and 'b'.
But (ML):
>> a = strread ('1 2 3, 4 5, , 6', '%d', 'delimiter', ',')
a =
1
2
3
4
5
0
6
In the above cases, I get the same results for textscan.
So it seems that the interpretation and processing of default whitespace
also depend on the field format specifier?
Weird.
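One hypothesis that fits both outputs above: once a delimiter is set, a %s field extends to the next delimiter and only has whitespace trimmed at its edges, while a %d conversion stops at the first character that cannot continue a number, so whitespace still splits numeric fields. The Python sketch below models only that guess; `read_with_delimiter` is a hypothetical name, it handles just '%s' and '%d' with a single-character delimiter, and it is fitted to these two examples rather than taken from ML's documentation.

```python
def read_with_delimiter(text, fmt, delimiter=","):
    """Hypothetical model of the two outputs above; not ML's documented algorithm."""
    out = []
    for field in text.split(delimiter):
        if fmt == "%s":
            # A string field keeps inner spaces; only its edges are trimmed.
            out.append(field.strip())
        elif fmt == "%d":
            # A number ends at whitespace, so one field may yield several values.
            tokens = field.split()
            if not tokens:
                out.append(0)       # empty numeric field -> 0, as in the ML output
            out.extend(int(t) for t in tokens)
        else:
            raise ValueError("only %s and %d are modelled")
    return out


print(read_with_delimiter("a b c, d e, , f", "%s"))   # ['a b c', 'd e', '', 'f']
print(read_with_delimiter("1 2 3, 4 5, , 6", "%d"))   # [1, 2, 3, 4, 5, 0, 6]
```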
Philip