Re: Single/Double precision equality
From: Daniel J Sebald
Subject: Re: Single/Double precision equality
Date: Sat, 27 Sep 2014 12:41:47 -0500
On 09/26/2014 11:35 PM, Rik wrote:
On 09/26/2014 04:35 PM, address@hidden wrote:
Subject: Single/double precision equality question
From: Daniel J Sebald <address@hidden>
Date: 09/26/2014 04:34 PM
Should the following equality behavior be explained in "help =="?
format bit
single(0.1)
ans = 0011111110111001100110011001100110100000000000000000000000000000
double(0.1)
ans = 0011111110111001100110011001100110011001100110011001100110011010
single(0.1) == double(0.1)
ans = 0011111111110000000000000000000000000000000000000000000000000000
I guess I'm OK with it. The test demotes the double to a single. Try:
double(single(0.1)) == double(0.1)
ans = 0000000000000000000000000000000000000000000000000000000000000000
However, the demoting rule doesn't hold true for all classes. Try:
format
v1 = pi;
v2 = int8(v1);
v1 == v2
ans = 0
Whatever the case, it probably would be good to have this documented
for the user. Perhaps it is, but I don't know where to look beyond
"help ." and "help ==".
Look at section 4.7 of the manual "Promotion and Demotion of Data
Types". I'm sure there's still room for improvement there if you want
to reword things.
OK, thanks. I vaguely remember this now. At the time there was
discussion of whether equality tests should even be allowed for mixed
types. Anyway, the description could use some rewording (or even
removal) given that on the whole it doesn't really seem to make the
issue any clearer. I can see, though, it would take a bold and brave
soul to attempt a rewrite.
"
The reason is that if Octave
promoted values in expressions like the above with all numerical
constants would need to be explicitly cast to the appropriate data type
like
uint8 (1) + uint8 (1)
=> 2
"
I like the above explanation, but it seems sort of ancillary. Maybe
making it more of a main point would be helpful...
Basically, when different types are used in a mixed operation such as A
+ B, integer trumps single, which trumps double. Thus, single (0.1)
== double (0.1) should demote the RHS to single and then do the
comparison, which evaluates to true.
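To make the demotion concrete, here is a small sketch that emulates
single-precision rounding. It is Python (standard library only), not
Octave, and to_single and double_bits are hypothetical helper names
chosen for illustration. The only assumption is that demoting to single
means rounding to the nearest IEEE single.

```python
import struct

def to_single(x):
    """Round a double to the nearest IEEE single; the result is returned as a double."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def double_bits(x):
    """Return the 64-bit IEEE pattern of x as a string of 0s and 1s."""
    return format(struct.unpack('<Q', struct.pack('<d', x))[0], '064b')

d = 0.1              # double(0.1)
s = to_single(0.1)   # single(0.1), widened back to double for inspection

print(double_bits(s))        # low 29 mantissa bits are zero
print(double_bits(d))
print(s == d)                # False: compared as doubles, the values differ
print(to_single(d) == s)     # True: demote the double first, then compare
```

The third line printed corresponds to double(single(0.1)) == double(0.1),
and the fourth to single(0.1) == double(0.1) under the demotion rule.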
When doing mixed integer/floating point operations, the integer is
temporarily promoted to floating point and the final result is then cast
back to integer (at least that is the way Matlab does it). This
explains the second example: int8 (pi) is 3, which is promoted to the
double 3 for the comparison, and 3 != pi.
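A rough sketch of that promote, evaluate, cast-back sequence, again in
Python rather than Octave. The helpers round_half_away and to_int8 are
hypothetical stand-ins, and they assume that conversion to int8 rounds
to the nearest integer (halves away from zero) and saturates at the
type limits.

```python
import math

def round_half_away(x):
    """Round to the nearest integer, with halves rounded away from zero."""
    return int(math.floor(x + 0.5)) if x >= 0 else int(math.ceil(x - 0.5))

def to_int8(x):
    """Hypothetical stand-in for int8(): round, then saturate to [-128, 127]."""
    return max(-128, min(127, round_half_away(x)))

v1 = math.pi
v2 = to_int8(v1)                     # 3

# Comparison: the integer is promoted to double and nothing is cast back.
equal = float(v2) == v1              # False, since 3.0 != pi

# Arithmetic: evaluate in double, then cast the result back to int8.
total = to_int8(float(v2) + v1)      # 6.14159... rounds to 6
saturated = to_int8(float(to_int8(100)) + 100.7)   # 200.7 saturates to 127

print(v2, equal, total, saturated)   # 3 False 6 127
```

Only the arithmetic case has a cast back to integer, which is why the
comparison can come out false even though the stored types "agree".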
...and here is where another explanation could be used. Going from int
to float back to int is at first puzzling.
The explanation may lie in the fact that integer literals are stored as
double upon input.
octave:7> x = 1
x = 1
octave:8> class(x)
ans = double
That is, we don't have to type "x = 1.0" to make x be floating point.
Thus integers are naturally cast to float when operations are evaluated.
Then again, in the case where there is no ambiguity about the variable
type, such as a char, the char is promoted to double when evaluated as
well, and when stored:
octave:25> 'a' == 97.1
ans = 0
octave:27> 'a' + 1.1
ans = 98.100
Maybe the easiest thing is to concede to arbitrariness and in the
documentation add a third column, an "operator evaluation" column:
Mixed Operation      Evaluate   Storage
---------------------------------------
double OP single     single     single
double OP integer    double     integer
double OP char       double     double
double OP logical    double     double
single OP integer    single     integer
single OP char       single     single
single OP logical    single     single
And then there is this result:
octave:28> uint8(3) == uint16(3)
ans = 1
octave:29> uint8(3) + uint16(3)
error: binary operator '+' not implemented for 'uint8 scalar' by 'uint16 scalar' operations
Why? This is no problem:
octave:35> uint8(3) + 3
ans = 6
octave:36> class(ans)
ans = uint8
so why should uint8(3) + uint16(3) be invalid? At the start of the
documentation it's argued that time is saved by not having to cast
explicitly, but now we're forced to do so with
octave:37> uint8(300) + uint8(uint16(300))
ans = 255
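For anyone puzzled by the 255, here is a sketch of where it comes from,
assuming saturating integer conversion and saturating integer addition.
It is Python, not Octave, and saturate, to_uint8 and to_uint16 are
hypothetical helpers that only handle integer inputs.

```python
def saturate(value, lo, hi):
    """Clamp value into the closed range [lo, hi]."""
    return max(lo, min(hi, value))

def to_uint8(n):
    """Hypothetical stand-in for uint8() on an integer input."""
    return saturate(n, 0, 255)

def to_uint16(n):
    """Hypothetical stand-in for uint16() on an integer input."""
    return saturate(n, 0, 65535)

a = to_uint8(300)               # 300 does not fit, saturates to 255
b = to_uint8(to_uint16(300))    # 300 fits in uint16, then saturates to 255
total = to_uint8(a + b)         # 510 saturates to 255

print(a, b, total)              # 255 255 255
```

So the explicit cast buys a valid expression, but both operands were
already clipped to 255 before the addition clipped the sum again.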
Oh well.
Dan