FLOATING POINT DEMO

Demonstrates several floating-point effects and facilities in MATLAB: overflow, underflow, cancellation, machine epsilon, and the special quantities of the IEEE standard (signed zeros, Inf, NaN).


format long
format compact

Overflow/underflow

1
ans = 1
-1
ans = -1
1e100
ans = 1.000000000000000e+100
1e-100
ans = 1.000000000000000e-100
1e400
ans = Inf
1e-400
ans = 0
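For comparison, here is a minimal Python sketch of the same overflow/underflow behaviour. It relies on the fact that Python's float is an IEEE 754 double, like MATLAB's default double; it is an illustration, not part of the original MATLAB demo.

```python
import math

# Python floats are IEEE 754 double precision, like MATLAB's default double.
big = 1e100      # representable
huge = 1e400     # larger than the largest double, so the literal overflows to inf
tiny = 1e-100    # representable
lost = 1e-400    # smaller than the smallest positive double, so it underflows to 0.0

print(big, huge, tiny, lost)   # prints: 1e+100 inf 1e-100 0.0
```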

Cancellation

x=rand
x = 0.126986816293506
y=rand
y = 0.913375856139019
z=x-y
z = -0.786389039845513
x1=x+1e10
x1 = 1.000000000012699e+10
y1=y+1e10
y1 = 1.000000000091338e+10
z1=x1-y1
z1 = -0.786388397216797
z1-z
ans = 6.426287164629230e-07
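Adding 1e10 forces both sums onto the grid of doubles near 1e10, whose spacing is 2^-19 (about 1.9e-6). Most of the digits of x and y are rounded away before the subtraction, and the subtraction, although exact itself, cannot bring them back. The Python sketch below (Python floats are IEEE 754 doubles) reuses the two values printed by rand above as an assumption, since a fresh rand draw would differ; it checks that the error stays within one grid spacing.

```python
import math

x = 0.126986816293506   # values copied from the rand output above
y = 0.913375856139019
z = x - y               # accurate difference

x1 = x + 1e10           # rounded to a multiple of the spacing near 1e10
y1 = y + 1e10
z1 = x1 - y1            # exact subtraction, but it inherits both rounding errors

spacing = math.ulp(1e10)   # gap between neighbouring doubles near 1e10 (Python 3.9+)
print(spacing)             # 2**-19, about 1.9e-06
print(abs(z1 - z))         # error on the order of 1e-6 instead of 1e-16
print(abs(z1 - z) / abs(z))
```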

Epsilon

1+1e-20
ans = 1
(1+1e-20)-1
ans = 0
1+1e-16
ans = 1
1+2e-16
ans = 1.000000000000000
(1+2e-16)-1
ans = 2.220446049250313e-16
e=1;
while (1+e>1) e=e/2, end % halve e until 1+e rounds to 1; do not "simplify" 1+e>1 to e>0!
e = 0.500000000000000
e = 0.250000000000000
e = 0.125000000000000
e = 0.062500000000000
e = 0.031250000000000
e = 0.015625000000000
e = 0.007812500000000
e = 0.003906250000000
e = 0.001953125000000
e = 9.765625000000000e-04
e = 4.882812500000000e-04
e = 2.441406250000000e-04
e = 1.220703125000000e-04
e = 6.103515625000000e-05
e = 3.051757812500000e-05
e = 1.525878906250000e-05
e = 7.629394531250000e-06
e = 3.814697265625000e-06
e = 1.907348632812500e-06
e = 9.536743164062500e-07
e = 4.768371582031250e-07
e = 2.384185791015625e-07
e = 1.192092895507813e-07
e = 5.960464477539063e-08
e = 2.980232238769531e-08
e = 1.490116119384766e-08
e = 7.450580596923828e-09
e = 3.725290298461914e-09
e = 1.862645149230957e-09
e = 9.313225746154785e-10
e = 4.656612873077393e-10
e = 2.328306436538696e-10
e = 1.164153218269348e-10
e = 5.820766091346741e-11
e = 2.910383045673370e-11
e = 1.455191522836685e-11
e = 7.275957614183426e-12
e = 3.637978807091713e-12
e = 1.818989403545857e-12
e = 9.094947017729282e-13
e = 4.547473508864641e-13
e = 2.273736754432321e-13
e = 1.136868377216160e-13
e = 5.684341886080802e-14
e = 2.842170943040401e-14
e = 1.421085471520200e-14
e = 7.105427357601002e-15
e = 3.552713678800501e-15
e = 1.776356839400251e-15
e = 8.881784197001252e-16
e = 4.440892098500626e-16
e = 2.220446049250313e-16
e = 1.110223024625157e-16
eps
ans = 2.220446049250313e-16
b=2^50
b = 1.125899906842624e+15
(b+e*b)-b
ans = 0
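The halving loop stops at eps/2 rather than eps: 1+eps/2 lies exactly halfway between 1 and 1+eps, and the tie is rounded to 1. The same loop in Python (whose floats are IEEE 754 doubles) is sketched below, together with the 2^50 example, which shows that e is a relative quantity: e*b is lost next to b just as e is lost next to 1.

```python
import sys

e = 1.0
while 1.0 + e > 1.0:   # halve e until adding it to 1 no longer changes 1
    e /= 2

print(e)                        # equals 2**-53, half of machine epsilon
print(sys.float_info.epsilon)   # equals 2**-52, MATLAB's eps

b = 2.0**50
print((b + e*b) - b)            # 0.0: e*b is half the spacing of doubles near b, a tie rounded back to b
```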

realmin and realmax

realmin
ans = 2.225073858507201e-308
realmax
ans = 1.797693134862316e+308
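realmin is the smallest positive normalized double, 2^-1022, and realmax is the largest finite double, (2-2^-52)*2^1023. A short Python check of both formulas (Python floats are IEEE 754 doubles; `sys.float_info` exposes the same two limits):

```python
import math
import sys

realmin = sys.float_info.min   # smallest positive normalized double
realmax = sys.float_info.max   # largest finite double

# math.ldexp(m, k) computes m * 2**k exactly for these arguments
print(realmin == math.ldexp(1.0, -1022))                          # True
print(realmax == math.ldexp(2 - math.ldexp(1.0, -52), 1023))      # True
```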

Signed zeros

0
ans = 0
+0
ans = 0
-0
ans = 0
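MATLAB prints -0 as 0, so the sign of a negative zero is invisible at the command line; it only shows up in the hex view further down. Python (whose floats are IEEE 754 doubles) prints the sign, and `math.copysign` exposes the sign bit directly, as in this small sketch:

```python
import math

pz = 0.0
nz = -0.0
print(nz)                       # -0.0: Python, unlike MATLAB, displays the sign
print(pz == nz)                 # True: the two zeros compare equal
print(math.copysign(1.0, pz))   # 1.0
print(math.copysign(1.0, nz))   # -1.0: yet the sign bit is stored
```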

Infinity

1/0
ans = Inf
-1/0
ans = -Inf
0/0
ans = NaN
inf
ans = Inf
1/inf
ans = 0
-1/inf
ans = 0
-1/-inf
ans = 0
2*inf
ans = Inf
inf+inf
ans = Inf
inf^inf
ans = Inf
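In IEEE arithmetic -1/inf is actually -0, even though MATLAB displays it as 0. The Python sketch below (Python floats are IEEE 754 doubles) makes that sign visible. One difference to keep in mind: Python raises ZeroDivisionError for 1/0 instead of returning Inf, so the sketch starts from `math.inf`.

```python
import math

inf = math.inf   # Python raises ZeroDivisionError for 1/0, so use inf directly

print(1 / inf, -1 / inf, -1 / -inf)   # 0.0 -0.0 0.0: the sign of the zero is kept
print(2 * inf, inf + inf)             # inf inf
print(inf ** inf)                     # inf
```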

NaN

inf-inf
ans = NaN
inf/inf
ans = NaN
0/0
ans = NaN
nan+123
ans = NaN
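Invalid operations such as inf-inf and inf/inf produce NaN, and NaN then propagates through any further arithmetic. A Python version of the same examples (Python floats are IEEE 754 doubles); 0/0 is left out because Python raises ZeroDivisionError for it instead of returning NaN:

```python
import math

inf = math.inf
a = inf - inf        # invalid operation -> nan
b = inf / inf        # invalid operation -> nan
c = math.nan + 123   # nan propagates through arithmetic
print(a, b, c)       # nan nan nan
```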

Check for NaN

x=nan;
x==nan
ans = 0
x==x
ans = 0
isnan([1,2,3,nan,inf])
ans = 0 0 0 1 0
isinf([1,2,3,nan,inf])
ans = 0 0 0 0 1
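Because NaN compares unequal to everything, including itself, `x==nan` can never detect it; a dedicated test such as isnan is needed. The equivalent checks in Python (whose floats are IEEE 754 doubles) look like this:

```python
import math

x = math.nan
print(x == math.nan)   # False: nan is unequal to everything
print(x == x)          # False: even to itself
values = [1.0, 2.0, 3.0, math.nan, math.inf]
print([math.isnan(v) for v in values])   # [False, False, False, True, False]
print([math.isinf(v) for v in values])   # [False, False, False, False, True]
```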

Round to even

e=eps/2
e = 1.110223024625157e-16
1+e
ans = 1
1+2*e
ans = 1.000000000000000
((1+2*e)-1)/e
ans = 2
((1+3*e)-1)/e
ans = 4
((1+4*e)-1)/e
ans = 4
[0:16; ((1+(0:16)*e)-1)/e]'
ans =
     0     0
     1     0
     2     2
     3     4
     4     4
     5     4
     6     6
     7     8
     8     8
     9     8
    10    10
    11    12
    12    12
    13    12
    14    14
    15    16
    16    16

View hex/bin representations

format hex
0
ans = 0000000000000000
-0
ans = 8000000000000000
inf
ans = 7ff0000000000000
-inf
ans = fff0000000000000
nan
ans = fff8000000000000
-nan
ans = 7ff8000000000000
123123+nan
ans = fff8000000000000
1
ans = 3ff0000000000000
2
ans = 4000000000000000
(1:10)'
ans =
   3ff0000000000000
   4000000000000000
   4008000000000000
   4010000000000000
   4014000000000000
   4018000000000000
   401c000000000000
   4020000000000000
   4022000000000000
   4024000000000000
realmin
ans = 0010000000000000
realmax
ans = 7fefffffffffffff
eps
ans = 3cb0000000000000
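The same 64-bit patterns can be inspected from Python (whose floats are IEEE 754 doubles) by packing a float big-endian with the standard `struct` module; `double_hex` is a hypothetical helper name used only for this sketch. NaN is left out on purpose: the sign bit of a default NaN may differ between environments (MATLAB shows fff8000000000000 above).

```python
import struct
import sys


def double_hex(x: float) -> str:
    """Big-endian hex string of the 64 IEEE bits of x, similar to MATLAB's format hex."""
    return struct.pack('>d', x).hex()


for x in [0.0, -0.0, float('inf'), float('-inf'), 1.0, 2.0, 10.0,
          sys.float_info.min, sys.float_info.max, sys.float_info.epsilon]:
    print(f"{x!r:>24} {double_hex(x)}")
```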
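% special values, the integers 1..10, the first 11 singles from 1 upward, the last 11 up to 2
xs=[0,-0,inf,-inf,nan,-nan,1:10,1+(0:10)*2^-23,2-(10:-1:0)*2^-23];
format short
for x=xs
  % num2bin(single(x),true) is assumed to be a user-supplied helper (not the toolbox
  % num2bin(q,x)) returning the sign, exponent and fraction bits separated by spaces
  fprintf('%10.8g %s\n',x,num2bin(single(x),true));
  %pause
end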
0 0 00000000 00000000000000000000000
-0 1 00000000 00000000000000000000000
Inf 0 11111111 00000000000000000000000
-Inf 1 11111111 00000000000000000000000
NaN 1 11111111 10000000000000000000000
NaN 0 11111111 10000000000000000000000
1 0 01111111 00000000000000000000000
2 0 10000000 00000000000000000000000
3 0 10000000 10000000000000000000000
4 0 10000001 00000000000000000000000
5 0 10000001 01000000000000000000000
6 0 10000001 10000000000000000000000
7 0 10000001 11000000000000000000000
8 0 10000010 00000000000000000000000
9 0 10000010 00100000000000000000000
10 0 10000010 01000000000000000000000
1 0 01111111 00000000000000000000000
1.0000001 0 01111111 00000000000000000000001
1.0000002 0 01111111 00000000000000000000010
1.0000004 0 01111111 00000000000000000000011
1.0000005 0 01111111 00000000000000000000100
1.0000006 0 01111111 00000000000000000000101
1.0000007 0 01111111 00000000000000000000110
1.0000008 0 01111111 00000000000000000000111
1.000001 0 01111111 00000000000000000001000
1.0000011 0 01111111 00000000000000000001001
1.0000012 0 01111111 00000000000000000001010
1.9999988 0 01111111 11111111111111111110110
1.9999989 0 01111111 11111111111111111110111
1.999999 0 01111111 11111111111111111111000
1.9999992 0 01111111 11111111111111111111001
1.9999993 0 01111111 11111111111111111111010
1.9999994 0 01111111 11111111111111111111011
1.9999995 0 01111111 11111111111111111111100
1.9999996 0 01111111 11111111111111111111101
1.9999998 0 01111111 11111111111111111111110
1.9999999 0 01111111 11111111111111111111111
2 0 10000000 00000000000000000000000
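The sign/exponent/fraction split of a single-precision number can also be obtained in Python with the standard `struct` module: packing with format `'>f'` rounds the value to IEEE single precision, and unpacking the 4 bytes as an unsigned integer exposes the 32 bits. `single_fields` is a hypothetical helper written for this sketch, standing in for the num2bin call above.

```python
import struct


def single_fields(x: float) -> tuple[str, str, str]:
    """Sign (1 bit), exponent (8 bits) and fraction (23 bits) of x rounded to single precision."""
    (bits,) = struct.unpack('>I', struct.pack('>f', x))
    s = f"{bits:032b}"
    return s[0], s[1:9], s[9:]


for x in [0.0, -0.0, float('inf'), 1.0, 2.0, 3.0, 10.0, 1 + 2**-23, 2 - 2**-23]:
    sign, exponent, fraction = single_fields(x)
    print(f"{x:10.8g} {sign} {exponent} {fraction}")
```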

Contradiction?

format long
realmax
ans = 1.797693134862316e+308
log2(ans)
ans = 1024
2^1024
ans = Inf

Explanation

This looks like a contradiction at first glance, since according to the IEEE conventions the largest exponent of a double is $1023$, so no finite double should reach $2^{1024}$. But realmax is the number with the largest possible exponent and with the significand f consisting of all ones:
format hex
realmax
ans = 7fefffffffffffff
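Even though MATLAB reports log2(realmax) = 1024, realmax does not equal $2^{1024}$ but rather $(2-2^{-52})\cdot 2^{1023} = (1-2^{-53})\cdot 2^{1024}$. Taking the base-2 logarithm of realmax yields 1024 only because of rounding: the exact value $1024 + \log_2(1-2^{-53}) \approx 1024 - 1.6\cdot 10^{-16}$ is much closer to 1024 than to the next smaller double. Similar rounding effects would also occur for machine numbers that are a bit smaller than realmax.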
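The explanation can be checked numerically. In the Python sketch below (Python floats are IEEE 754 doubles), realmax matches $(2-2^{-52})\cdot 2^{1023}$ exactly, its log2 rounds to 1024.0, and doubling $2^{1023}$ overflows to inf. Note that Python's `2.0**1024` raises OverflowError rather than returning inf, so the sketch uses a multiplication instead.

```python
import math
import sys

realmax = sys.float_info.max

# realmax = (2 - 2**-52) * 2**1023: all 52 fraction bits set, exponent 1023
print(realmax == math.ldexp(2 - math.ldexp(1.0, -52), 1023))   # True

# exact log2 is 1024 + log2(1 - 2**-53), about 1024 - 1.6e-16, which rounds to 1024.0
print(math.log2(realmax))                                      # 1024.0

# 2**1024 is out of range: the product overflows to inf
print(math.ldexp(1.0, 1023) * 2.0)                             # inf
```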