A very simple solution is to use a decent table-driven approximation. You don’t actually need a lot of data if you reduce your inputs correctly. exp(a)==exp(a/2)*exp(a/2)
, which means you really only need to calculate exp(x)
for 1 < x < 2
. Over that range, a runga-kutta approximation would give reasonable results with ~16 entries IIRC.
Similarly, sqrt(a) == 2 * sqrt(a/4) == sqrt(4*a) / 2
which means you need only table entries for 1 < a < 4
. Log(a) is a bit harder: log(a) == 1 + log(a/e)
. This is a rather slow iteration, but log(1024) is only 6.9 so you won’t have many iterations.
You’d use a similar “integer-first” algorithm for pow: pow(x,y)==pow(x, floor(y)) * pow(x, frac(y))
. This works because pow(double, int)
is trivial (divide and conquer).
[edit] For the integral component of log(a)
, it may be useful to store a table 1, e, e^2, e^3, e^4, e^5, e^6, e^7
so you can reduce log(a) == n + log(a/e^n)
by a simple hardcoded binary search of a in that table. The improvement from 7 to 3 steps isn’t so big, but it means you only have to divide once by e^n
instead of n
times by e
.
[edit 2]
And for that last log(a/e^n)
term, you can use log(a/e^n) = log((a/e^n)^8)/8
– each iteration produces 3 more bits by table lookup. That keeps your code and table size small. This is typically code for embedded systems, and they don’t have large caches.
[edit 3]
That’s still not to smart on my side. log(a) = log(2) + log(a/2)
. You can just store the fixed-point value log2=0.6931471805599
, count the number of leading zeroes, shift a
into the range used for your lookup table, and multiply that shift (integer) by the fixed-point constant log2
. Can be as low as 3 instructions.
Using e
for the reduction step just gives you a “nice” log(e)=1.0
constant but that’s false optimization. 0.6931471805599 is just as good a constant as 1.0; both are 32 bits constants in 10.22 fixed point. Using 2 as the constant for range reduction allows you to use a bit shift for a division.
[edit 5]
And since you’re storing it in Q10.22, you can better store log(65536)=11.09035488. (16 x log(2)). The “x16” means that we’ve got 4 more bits of precision available.
You still get the trick from edit 2, log(a/2^n) = log((a/2^n)^8)/8
. Basically, this gets you a result (a + b/8 + c/64 + d/512) * 0.6931471805599
– with b,c,d in the range [0,7]. a.bcd
really is an octal number. Not a surprise since we used 8 as the power. (The trick works equally well with power 2, 4 or 16.)
[edit 4]
Still had an open end. pow(x, frac(y)
is just pow(sqrt(x), 2 * frac(y))
and we have a decent 1/sqrt(x)
. That gives us the far more efficient approach. Say frac(y)=0.101
binary, i.e. 1/2 plus 1/8. Then that means x^0.101
is (x^1/2 * x^1/8)
. But x^1/2
is just sqrt(x)
and x^1/8
is (sqrt(sqrt(sqrt(x)))
. Saving one more operation, Newton-Raphson NR(x)
gives us 1/sqrt(x)
so we calculate 1.0/(NR(x)*NR((NR(NR(x)))
. We only invert the end result, don’t use the sqrt function directly.