CARVIEW |
Navigation Menu
RP2 Floating point accuracy #13075
Replies: 21 comments · 49 replies
-
when I was testing esp32 vs RP2040 for Floating point calculation for Kalman filter algorithm i found out on Rasppbery pi forum some info about speed overall it is poor. I found that RP2040 ( with cortex M0 ) do not have FPU and from SDK docs it should meet IEEE 754 standard., But from My tests precision are not as good as in FPU and errors expand fast (due to type of math operation) - in RP2040 floating-point arithmetic routines is based on interpolators. |
Beta Was this translation helpful? Give feedback.
All reactions
-
Interesting - thanks for that. I wonder if there is a benchmark for precision? Re speed, I have the following results, all on standard clocks:
|
Beta Was this translation helpful? Give feedback.
All reactions
-
There's indeed a significant problem. void test(void) {
float value = 1;
int n = 1;
while (n <= 27) {
printf("%3d %33.30f\n", n, value);
value += 1.0 / (1<<n);
n++;
}
}
int main(void) {
test();
return 0;
} produces:
whereas #!/usr/bin/env python
# -*- coding: utf-8 -*-
from array import array
def test():
value = array('f', (1,))
n = 1
while n <= 27:
print(f'{n:3d} {value[0]:33.30f}')
value[0] += 1.0 / (1<<n)
n += 1
test() produces:
|
Beta Was this translation helpful? Give feedback.
All reactions
-
I would worry a little about this accuracy comparison, because the problem could be in the output formatter, too. You probably need to compare computed values to manually-constructed , exact values. These can be generated by assembling a mantissa and exponent with ldexp. Note that in normal python, the output formatting is almost certainly being done with IEEE754 doubles, upconverted from the 32 bit floats in your array. Note that the values you expect here should be readable by directly printing the floats as hex (get them as mem32 objects). With the suppressed 1 on IEEE754 formats, you should just see a bit walking across the hex representation, with all other bits zero. |
Beta Was this translation helpful? Give feedback.
All reactions
-
@peterhinch I forget to mention, that for any maths calculations where accuracy is needed, but platform calculation has some limitation I'm always looking for information in any Jet Propulsion Laboratory and articles/studies about maths calculation and implementation in early microprocessors (mostly code is in Fortran, but all steps and algorithms are explained and at usually goal is eliminating calculation errors. For example: Implementation of the Sun Position Calculation, |
Beta Was this translation helpful? Give feedback.
All reactions
-
Wow, I looked at the paper you cited. It is based on computing using an AM9511 FPU, which is the first FPU I did a lot of work on (circa 1978). I had it interfaced to my Apple II computer, and worked in a language (I helped write) called First, which was a compiled Forth-like language. The Am9511 had some odd foibles, including not using the suppressed '1' high bit in PDP-11 and modern IEEE754 floating point formats. |
Beta Was this translation helpful? Give feedback.
All reactions
-
Two related discussions are here and here.
So the problem should be in 1, the internal FP routines.
Good question. I did your python version of precision test @GitHubsSilverBullet here on a Blackpill (stm32f411) board and it shows exactly the same results. I would interpret that in agreement with the above, that the basic FP calculations involved (addition, division by powers of 2) are the same as on the Pico and the FP representation is the same too. So, looking for the cause of the problem: Could the trigonometric computations differ? I seem to remember that for the FP math of the Pico a special library was acquired, that is extraordinarily fast. It is not the usual math lib of the GNU C compiler. I did not find this reference now, though. |
Beta Was this translation helpful? Give feedback.
All reactions
-
Beta Was this translation helpful? Give feedback.
All reactions
-
I ran the Python test code of @GitHubsSilverBullet on various MicroPython boards. At boards with 32 bit floats, whether hardware or software including the RP2, the results were those of the second result list, at boards with 64 bit floats (double), it returned the first result list. |
Beta Was this translation helpful? Give feedback.
All reactions
-
So here is the accuracy test I suggested, to make sure basic addition was accurate: import array
import uctypes
from machine import mem32
a = array.array('f', 1 for _ in range(32))
abase = uctypes.addressof(a)
for i in range(1,26):
a[i] = a[i-1] + 1 / (1 << i)
ai = mem32[abase + 4*i]
print(i, f"{a[i]:20.15f} {ai-0x3f800000:08x}") with result
which demonstrates that the internal math is working fine for this test case, but the output formatter is, indeed , limited by its own 32 bit precision |
Beta Was this translation helpful? Give feedback.
All reactions
-
The problem is not with the output formatter: my application produces as output integers being datetimes expressed in seconds. It can be reproduced by importing this module. You only need to look at the sun rise time of day 0. This is consistent within ~20s on all platforms except RP2, where the error is 5 minutes. The code has one function @jimmo I'm pretty confident that the trig functions are called with reasonable argument values (smaller than +-4Ο radians). |
Beta Was this translation helpful? Give feedback.
All reactions
-
@peterhinch yes, I knew your problem with the actual calculation wasn't the output formatter; it was just the accuracy test that was not meaningful because of that. I strongly suspect your problem is in the range reduction of the trig functions, which the pico sdk warns is not done to the usual accuracy. You may have to find an almanac calculation based on a different epoch, so that there aren't too many cycles of trig to wrap around anywhere, or at least so that any range reduction explicitly built in to the calculation doesn't have too much loss of accuracy. Single precision really doesn't work too well for calendar stuff. |
Beta Was this translation helpful? Give feedback.
All reactions
-
As I said above, the difference between SP and DP is about 20s. Every SP platform I tested returned identical results (within my resolution of 1s) - albeit out by 20s relative to DP. This whether FP is done in hardware or software. The one exception was RP2. |
Beta Was this translation helpful? Give feedback.
All reactions
-
@peterhinch I went and looked at the module you posted. The 'quad' function is strongly subject to roundoff error. You should use the method in 'Numerical Recipes', in which you always compute the larger root in absolute magnitude (if b is negative, use -b + sqrt(disc), and if b is positive, use -b - sqrt(disc)), and then compute the other root as c/first root (since the a*c is the product of the roots). This is not susceptible to roundoff error. Here is a modified version of 'quad()' which is formally stable: def quad(ym, yz, yp):
# See Astronomy on the PC P48-49
# finds the parabola throuh the three points (-1,ym), (0,yz), (1, yp)
# and returns the values of x where the parabola crosses zero
# (roots of the quadratic)
# and the number of roots (0, 1 or 2) within the interval [-1, 1]
nz = 0
a = 0.5 * (ym + yp) - yz
b = 0.5 * (yp - ym)
c = yz
xe = -b / (2 * a)
ye = (a * xe + b) * xe + c
dis = b * b - 4.0 * a * c # discriminant of y=a*x^2 +bx +c
if dis > 0: # parabola has roots
if b < 0:
z2 = (-b + sqrt(dis)) / (2*a) # z2 is larger root in magnitude
else:
z2 = (-b - sqrt(dis)) / (2*a)
z1 = (c/a) / z2 # z1 is always closer to zero
if fabs(z1) <= 1.0:
nz += 1
if fabs(z2) <= 1.0:
nz += 1
if z1 < -1.0:
z1 = z2
return nz, z1, z2, ye I checked, though, that this isn't actually your problem. This gives exactly the same answers as your quad, so apparently it is never dealing with roots that result from subtracting nearly-equal numbers. |
Beta Was this translation helpful? Give feedback.
All reactions
-
Thanks for that - I will adapt the code. In the vast majority of cases there is only one root and the numbers differ substantially. The tricky case is where the Sun or Moon is grazing the horizon (in arctic latitudes at noon or midnight). In the two hour interval two roots can occur and the numbers can be quite similar. |
Beta Was this translation helpful? Give feedback.
All reactions
-
Data from this paper showing the errors in IEEE 754 single-precision format (binary32) computations relative to the least bit of resolution reveal that the precision quite often is not optimal. (It is much worse for more special functions btw.)
I think that a combination of mpmath on the PC (for computing exact values of sufficiently high resolution) together with belay (for transferring the results of the same computations on the RPi Pico board transparently to the PC) might allow for a reproduction of such data with relatively low effort. |
Beta Was this translation helpful? Give feedback.
All reactions
-
@peterhinch I will possibly upload a version of this in the afternoon that solves the roundoff problem. I think it is in the sin_alt routine, where a large number (51544.5) is subtracted from the complete time, including the hour. If that subtraction is done from the (integer, exact) mjd, and since 51544.5 also has an exact representation, one has a much smaller remaining number that one is adding the fractional hour to. This should gain at least a few bits of precision in the calculation of 't'.
I am not next to my pico right now to see if it helps.
Here is the relevant snippet (Not sure why this refuses to format... it is fine in preview):
```python
def lmstt(self, t):
# Takes the mjd and the longitude (west negative) and then returns
# the local sidereal time in hours. Im using Meeus formula 11.4
# instead of messing about with UTo and so on
# modified to use the pre-computed 't' value from sin_alt
d = t * 36525
lst = 280.46061837 + 360.98564736629 * d + 0.000387933 * t * t - t * t * t / 38710000
return (lst % 360) / 15.0 + self.long / 15
def sin_alt(self, hour, func):
# Returns the sine of the altitude of the object (moon or sun)
# at an hour relative to the current date (mjd)
mjd0 = self.mjd + hour / 24.0
t0 = (mjd0 - 51544.5) / 36525.0
mjd1 = (self.mjd - 51544.5) + hour / 24.0
t1 = mjd1 / 36525.0
print(f"sin_alt mjd0={mjd0:.7f} t0={t0:.7f} mjd1={mjd1:.7f} t1={t1:.7f}")
dec, ra = func(t1)
# hour angle of object: one hour = 15 degrees. Note lmst() uses longitude
tau = 15.0 * (self.lmstt(t1) - ra)
# sin(alt) of object using the conversion formulas
salt = self.sglat * sin(radians(dec)) + self.cglat * cos(radians(dec)) * cos(radians(tau))
return salt
```
|
Beta Was this translation helpful? Give feedback.
All reactions
-
Thanks for that, I'll study it in detail tomorrow. That addresses one issue, improving the code. However the second issue is the observation that RP2 seems to have lower precision than other 32-bit platforms. |
Beta Was this translation helpful? Give feedback.
All reactions
-
I have reduced the code to a much simpler script which produces a single floating point value. It eliminates arguably contentious issues like the interpolator, epoch and time dependence. Feel free to cut and paste this into other platforms: from math import sin, cos, sqrt, atan, radians
LAT = 53.29756504536339 # Local defaults
LONG = -2.102811634540558
def frac(x):
return x % 1
def minisun(t):
p2 = 6.283185307
coseps = 0.91748
sineps = 0.39778
M = p2 * frac(0.993133 + 99.997361 * t)
DL = 6893.0 * sin(M) + 72.0 * sin(2 * M)
L = p2 * frac(0.7859453 + M / p2 + (6191.2 * t + DL) / 1296000)
SL = sin(L)
X = cos(L)
Y = coseps * SL
Z = sineps * SL
RHO = sqrt(1 - Z * Z)
dec = (360.0 / p2) * atan(Z / RHO)
ra = (48.0 / p2) * atan(Y / (X + RHO))
if ra < 0:
ra += 24
return dec, ra
class RiSet:
def __init__(self, lat=LAT, long=LONG): # Local defaults
self.sglat = sin(radians(lat))
self.cglat = cos(radians(lat))
self.long = long
self.mjd = 60276 # MJD 28 Nov 2023
print("Value", self.rise())
def lmst(self, mjd):
d = mjd - 51544.5
t = d / 36525.0
lst = 280.46061837 + 360.98564736629 * d + 0.000387933 * t * t - t * t * t / 38710000
return (lst % 360) / 15.0 + self.long / 15
def sin_alt(self, hour, func):
mjd = self.mjd + hour / 24.0
t = (mjd - 51544.5) / 36525.0
dec, ra = func(t)
tau = 15.0 * (self.lmst(mjd) - ra)
salt = self.sglat * sin(radians(dec)) + self.cglat * cos(radians(dec)) * cos(radians(tau))
return salt
def rise(self):
sinho = sin(radians(-0.833))
yp = self.sin_alt(9, minisun) - sinho
return yp
r = RiSet() Results on everything I could lay my hands on: 64-bit FPUnix 0.1236665384827108 (presumed accurate) 32-bit FPPyboard 1.1 0.1252849 (+1.31%) OutliersESP8266 0.116815 (-5.5%) Interesting that ESP8266 is similarly challenged. |
Beta Was this translation helpful? Give feedback.
All reactions
-
I could add: |
Beta Was this translation helpful? Give feedback.
All reactions
-
I guessed, that the Changing this function for a printout of deviations: def sin_alt(self, hour, func):
mjd = self.mjd + hour / 24.0
t = (mjd - 51544.5) / 36525.0
dec, ra = func(t)
dec_prec = -21.28489097872491
ra_prec = 16.264695874889682
print('dec, ra:', dec, ra, 'deviations (%):', 100*(dec-dec_prec)/dec_prec, 100*(ra-ra_prec)/ra_prec)
tau = 15.0 * (self.lmst(mjd) - ra)
salt = self.sglat * sin(radians(dec)) + self.cglat * cos(radians(dec)) * cos(radians(tau))
tau_prec = -44.06301269320757
salt_prec = 0.10912845798021381
print('tau, salt:', tau, salt, 'deviations (%):', 100*(tau-tau_prec)/tau_prec, 100*(salt-salt_prec)/salt_prec)
return salt where the
on the Pico |
Beta Was this translation helpful? Give feedback.
All reactions
-
Following the same way it turned out that the lmst() function introduced the deviations: def lmst(self, mjd):
d = mjd - 51544.5
t = d / 36525.0
d_prec = 8731.875
t_prec = 0.23906570841889116
d = d_prec
t = t_prec
print('d, t:', d, t, 'deviations (%):', 100*(d-d_prec)/d_prec, 100*(t-t_prec)/t_prec)
# lst = 280.46061837 + 360.98564736629 * d + 0.000387933 * t * t - t * t * t / 38710000
lst = 280.46061837 + 360.98564736629 * d + 0.000387933 * t * t - 2.5833118057349522e-08* t * t * t
lst_prec = 3152362.0102370647
# lst = lst_prec
res = (lst % 360) / 15.0 + self.long / 15
res_pico = 13.40981
lst_prec = 3152362.0102370647
res_prec = 13.327161695342511
print('lst, res:', lst, res, 'deviations (%):', 100*(lst-lst_prec)/lst_prec, 100*(res-res_prec)/res_prec)
# return res_pico
return res With output on Pico: 0.6 % error, whereas it is only 0.12 % on the stm Blackpill. Setting starting values d and t to common values we find that the following two lines introduce the deviations between platforms: d = 8731.875
t = 0.23906571
lst = 280.46061837 + 360.98564736629 * d + 0.000387933 * t * t - 2.5833118057349522e-08* t * t * t
res = (lst % 360) / 15.0 + -2.102811634540558 / 15
res_prec = 13.327161695342511
print('res:', res, 'deviation (%):', 100*(res-res_prec)/res_prec) Blackpill (stm): |
Beta Was this translation helpful? Give feedback.
All reactions
-
So, I think I have at least cornered the numerical problem. In the lmst function,
is very subject to roundoff. For very tiny differences in the input, it gives wildly different ( a few degrees) results, which of course add up to minutes of time error. I will see if I can figure out a way to make it numerically much more stable in single precision. |
Beta Was this translation helpful? Give feedback.
All reactions
-
Same finding here :-) |
Beta Was this translation helpful? Give feedback.
All reactions
-
@peterhinch @rkompass I rewrote two routines to eliminate roundoff error and remove a large modular reduction. Try pasting these into the module (I've slightly cleaned it up since my first post): def lmstt(self, t):
# Takes the mjd and the longitude (west negative) and then returns
# the local sidereal time in hours. Im using Meeus formula 11.4
# instead of messing about with UTo and so on
# modified to use the pre-computed 't' value from sin_alt
d = t * 36525
df = frac(d)
c1 = 360
c2 = 0.98564736629
dsum = c1 * df + c2 * d #no large integer * 360 here
lst = 280.46061837 + dsum + 0.000387933 * t * t - t * t * t / 38710000
return (lst % 360) / 15.0 + self.long / 15
def sin_alt(self, hour, func):
# Returns the sine of the altitude of the object (moon or sun)
# at an hour relative to the current date (mjd)
mjd1 = (self.mjd - 51544.5) + hour / 24.0
t1 = mjd1 / 36525.0
#print(f"sin_alt mjd0={mjd0:.7f} t0={t0:.9f} mjd1={mjd1:.7f} t1={t1:.9f}")
dec, ra = func(t1)
# hour angle of object: one hour = 15 degrees. Note lmst() uses longitude
tau = 15.0 * (self.lmstt(t1) - ra)
# sin(alt) of object using the conversion formulas
salt = self.sglat * sin(radians(dec)) + self.cglat * cos(radians(dec)) * cos(radians(tau))
return salt |
Beta Was this translation helpful? Give feedback.
All reactions
-
The RP2 Pico transfers floats wrongly into the internal representation:
gives this is a deviation of 4 in least significant bit units. Given the many constants in the code such errors may sum up. Edit:
|
Beta Was this translation helpful? Give feedback.
All reactions
-
I was suspicious that this could be a contribution, too. 4 LSB is a pretty big error. |
Beta Was this translation helpful? Give feedback.
All reactions
-
I think this warrants an issue. |
Beta Was this translation helpful? Give feedback.
All reactions
-
I'll raise an issue. The origins of the deviation of esp8266 and RP2 to stm boards are obscure to me. |
Beta Was this translation helpful? Give feedback.
All reactions
-
@mendenm @rkompass I'm deeply impressed :) RP2 now produces 0.1236907, within 0.02% of the 64 bit result. PYBD is now within 0.003%. I had been led to believe that the source of the algorithms, "Astronomy on the Personal Computer" by Montenbruck and Pfleger, was authoritative. It's certainly full of maths, sunrise and set being the nursery slopes. Clearly there is room for improvement in the algorithms (including |
Beta Was this translation helpful? Give feedback.
All reactions
-
@peterhinch It may well have worked much better when it was written, probably close to mjd 51544.5, since the 't' parameter would have been very close to zero at that point. That would eliminate the roundoff error in the modular reduction. Those equations should probably be updated to a more recent epoch. |
Beta Was this translation helpful? Give feedback.
All reactions
-
I changed in and in but that does not improve any further. Still I think it's a reasonable change. |
Beta Was this translation helpful? Give feedback.
All reactions
-
I'll implement those. I've incorporated your improvements to date in the actual code. The RP2040 and other 32-bit platforms produce rise and set times within 30s of the 64-bit results. This is (in my opinion) good enough for any likely microcontroller application. There is always the option of using a soft or hard 64-bit FPU. |
Beta Was this translation helpful? Give feedback.
All reactions
-
On further thought I'm unsure about the change to MJD. The modified Julian date is a known concept with wider significance: changing its value could cause confusion if anyone were to extend the module. |
Beta Was this translation helpful? Give feedback.
All reactions
-
If it turns out to be an RP2040 FP library issue (which it's increasingly looking like it may not be), the library used is Mark Owen's Qfplib-M0. Mark is extremely approachable. |
Beta Was this translation helpful? Give feedback.
All reactions
-
Thank you Steward. That was the library I remembered but couldn't find now. Still I see the str2float mentioned there (as an optinal external add-on) not included in the SDK. |
Beta Was this translation helpful? Give feedback.
All reactions
-
The issue seems to be the pico-sdk's implementation of powf... it doesn't handle 10**-11 correctly. |
Beta Was this translation helpful? Give feedback.
All reactions
-
@dpgeorge Should the string-to-float conversion be modified to use an ipow routine (which is only a couple lines long, if it needs to be written out) instead of pow/powf? I suspect it would be both faster and more accurate. |
Beta Was this translation helpful? Give feedback.
All reactions
-
No, I don't think that's necessary. The point is that And because we try to optimise for size instead of speed (for non-critical things), it's best to reuse |
Beta Was this translation helpful? Give feedback.
All reactions
-
@dpgeorge OK, no argument on the appropriateness of fixing powf. ipow can be really tiny, but FP conversion speed probably isn't critical enough to be worth much space. |
Beta Was this translation helpful? Give feedback.
All reactions
-
Realistically, you can't do much better than that under any conditions, since local topography can make a much bigger change than that! We live in a bowl near a lake, with hills all around us. Normal calculations are way off.
Incidentally, for a different project, which needed sunrise and sunset times, I used the 'astral' package:
https://astral.readthedocs.io/en/latest/
but that was on a regular RPi4 with double precision math. I have not evaluated it at reduced precision ever.
Edit: I just looked at astral. It has the same problem with not pre-reducing the integer part of the day count. It will give bad results on a single-precision machine, too. Of course, it was never written for that. However, I wonder if I should suggest the change, since it is harmless on normal machines, and an improvement on microcontrollers.
From astral.julian
```python
def geom_mean_long_sun(juliancentury: float) -> float:
"""Calculate the geometric mean longitude of the sun"""
l0 = 280.46646 + juliancentury * (36000.76983 + 0.0003032 * juliancentury)
return l0 % 360.0
```
Marcus
|
Beta Was this translation helpful? Give feedback.
All reactions
-
I had the feeling that the computations were still unnecessarily involved: from math import pi, sin, cos, sqrt, atan, radians
for i in range(20):
z = i/20
p2 = 2* pi # 6.283185307
rho = sqrt(1 - z*z)
dec = (360.0 / p2) * atan(z / rho)
sr = sin(radians(dec))
print('z, sin(radians(dec))', z, sr)
for i in range(20):
z = i/20
p2 = 2* pi # 6.283185307
rho = sqrt(1 - z*z)
dec = (360.0 / p2) * atan(z / rho)
sr = cos(radians(dec))
print('rho, cos(radians(dec))', rho, sr) demonstrate that the atan(z / rho) is not necessary at all, because it is fed into sin() and cos() (after a conversion from radians to degrees and back again) So dec (declination) is interesting for itself, but not needed for the computations? |
Beta Was this translation helpful? Give feedback.
All reactions
-
You've lost me there for two reasons. Firstly the testing that revealed the error was done with >>> round(1.5*3600)
5400
>>> round(1.75)
2
>>> round(1.33)
1 |
Beta Was this translation helpful? Give feedback.
All reactions
-
It could be that the simplification |
Beta Was this translation helpful? Give feedback.
All reactions
-
@peterhinch No, round() with no second argument returns a float, whose fractional part is zero! That is not an int. And then, when that float (even zero) is added to the time.time(), the whole thing gets upconverted to float. Try: |
Beta Was this translation helpful? Give feedback.
All reactions
-
|
Beta Was this translation helpful? Give feedback.
All reactions
-
@robert-hh @peterhinch OK, Robert, you're right. I am embarrassed. I thought round(1.5,0) would return the same as round(1.5), and just did a shortcut typing it. CPython does the same thing (real int if no sec ond argument), which I had never noticed. |
Beta Was this translation helpful? Give feedback.
All reactions
-
The esp8266 is off because it uses truncated 30-bit floating point numbers, via For rp2... I'm not sure, will need investigation. |
Beta Was this translation helpful? Give feedback.
All reactions
-
What a famous discussion! Thank you all guy! βΊ |
Beta Was this translation helpful? Give feedback.
All reactions
-
Nice you appreciate it. |
Beta Was this translation helpful? Give feedback.
All reactions
-
I did a string to float function in python. This is to exclude errors from wrong constants on the RP2 Pico.
If you confirm that it works well: Should I post that to the issue too, as workaround? |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
I am in the late stages of developing some code for calculating sun and moon rise and set times and moon phases. There is a much floating point maths going on so I set about benchmarking it on various platforms. To my surprise the computed results varied. There were expected small variations between 64-bit and 32-bit platforms: around 20s on today's sunrise time. But the RP2 had a more significant error of 5 minutes on all the calculated values.
The 32-bit platforms were in extremely close mutual agreement, whether FP is implemented in hardware or software. RP2 was the only outlier.
Is this a known issue?
Beta Was this translation helpful? Give feedback.
All reactions