ieee-754 – Make Me Engineer

Are the bit patterns of NaNs really hardware-dependent?

June 10, 2023 by Tarik

This is what §2.3.2 of the JVM 7 spec has to say about it: "The elements of the double value set are exactly the values that can be represented using the double floating-point format defined in the IEEE 754 standard, except that there is only one NaN value (IEEE 754 specifies 2^53-2 distinct NaN values)." … Read more
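The NaN count in that quote can be checked by hand: a double is a NaN when its 11 exponent bits are all ones and its 52-bit fraction is nonzero, with either sign. The sketch below is only an illustration in Python (not taken from the post); the helper name bits_to_double is hypothetical, and it only verifies that a few such patterns decode as NaN, not that any platform preserves the payload bits.

```python
import math
import struct

def bits_to_double(bits):
    """Reinterpret a 64-bit integer as an IEEE 754 double (hypothetical helper)."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

# Exponent bits all ones, fraction nonzero: three different NaN bit patterns.
patterns = [
    0x7FF8000000000000,  # positive quiet NaN, empty payload
    0x7FF8000000000001,  # positive quiet NaN, payload 1
    0xFFF8000000000001,  # negative quiet NaN, payload 1
]
all_nan = all(math.isnan(bits_to_double(p)) for p in patterns)

# 2 signs * (2**52 - 1) nonzero fractions
nan_count = 2 * (2**52 - 1)
print(all_nan, nan_count == 2**53 - 2)
```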

How to subtract IEEE 754 numbers?

June 2, 2023 by Tarik

Really not any different than you do it with pencil and paper. Okay, a little different: 123400 - 5432 = 1.234*10^5 - 5.432*10^3. The bigger number dominates: shift the smaller number’s mantissa off into the bit bucket until the exponents match, 1.234*10^5 - 0.05432*10^5, then perform the subtraction with the mantissas: 1.234 - 0.05432 = … Read more
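The align-then-subtract idea above can be sketched in a few lines. This is a toy decimal model, not the post's code: the function name align_and_subtract, the four-digit integer significand, the non-negative inputs, and plain truncation of the shifted-out digits are all assumptions made for illustration (real hardware works in binary and keeps guard bits so it can round).

```python
def align_and_subtract(a, b, digits=4):
    """Compute a - b for toy decimal floats given as (significand, exponent),
    where value = significand * 10**exponent and significands are
    non-negative integers of `digits` digits (an assumption of this sketch)."""
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    # Shift the operand with the smaller exponent right until the exponents
    # match; the digits shifted out fall into the "bit bucket" (truncation).
    if exp_a >= exp_b:
        sig_b //= 10 ** (exp_a - exp_b)
        exp = exp_a
    else:
        sig_a //= 10 ** (exp_b - exp_a)
        exp = exp_b
    diff = sig_a - sig_b
    # Renormalize so the result again has `digits` significant digits.
    while diff != 0 and abs(diff) < 10 ** (digits - 1):
        diff *= 10
        exp -= 1
    return diff, exp

# 123400 - 5432 as 1234*10^2 - 5432*10^0
result = align_and_subtract((1234, 2), (5432, 0))
print(result)
```

With four digits the sketch returns (1180, 2), that is 118000, while the exact difference is 117968: the digits pushed off the end of the smaller operand are lost.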

Converting Int to Float or Float to Int using Bitwise operations (software floating point)

June 1, 2023 by Tarik

First, a paper you should consider reading if you want to understand floating-point foibles better: “What Every Computer Scientist Should Know About Floating-Point Arithmetic,” http://www.validlab.com/goldberg/paper.pdf. And now to some meat. The following code is bare bones and attempts to produce an IEEE-754 single-precision float from an unsigned int in the range 0 … Read more
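The post's own code is cut off in this excerpt, so what follows is not it. It is a minimal Python sketch of the same general idea, building the single-precision bit pattern of an unsigned integer with shifts and masks. The function name uint_to_float32_bits is hypothetical, and the input is restricted to 0 <= n < 2^24 (an assumption chosen here so that every input is exactly representable and no rounding step is needed).

```python
import struct

def uint_to_float32_bits(n):
    """Return the IEEE-754 single-precision bit pattern for an integer n,
    assuming 0 <= n < 2**24 so the conversion is exact (sketch only)."""
    if not 0 <= n < (1 << 24):
        raise ValueError("this sketch only handles 0 <= n < 2**24")
    if n == 0:
        return 0                               # +0.0 is all zero bits
    msb = n.bit_length() - 1                   # position of the leading 1 bit
    exponent = msb + 127                       # biased exponent
    fraction = (n << (23 - msb)) & 0x7FFFFF    # drop the implicit leading 1
    return (exponent << 23) | fraction

def reference_bits(n):
    """Bit pattern produced by the platform's own int-to-float32 conversion."""
    return struct.unpack("<I", struct.pack("<f", float(n)))[0]

print(hex(uint_to_float32_bits(1)), hex(uint_to_float32_bits(3)))
```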

Extreme numerical values in floating-point precision in R

May 31, 2023 by Tarik

Uses for negative zero floating point value?

May 29, 2023 by Tarik

From Wikipedia: It is claimed that the inclusion of signed zero in IEEE 754 makes it much easier to achieve numerical accuracy in some critical problems[1], in particular when computing with complex elementary functions[2]. The first reference is “Branch Cuts for Complex Elementary Functions or Much Ado About Nothing’s Sign Bit” by W. Kahan, that … Read more
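A small illustration of the branch-cut point, using Python's standard cmath module (an example chosen here, not taken from the cited references): the two zeros compare equal, yet the sign of a zero imaginary part selects which side of the square root's branch cut along the negative real axis is used.

```python
import cmath
import math

# +0.0 and -0.0 compare equal, but the sign bit is still observable.
zeros_equal = (0.0 == -0.0)
sign_of_negative_zero = math.copysign(1.0, -0.0)

# sqrt has a branch cut along the negative real axis; the sign of the zero
# imaginary part tells it from which side the cut is being approached.
above = cmath.sqrt(complex(-1.0, 0.0))
below = cmath.sqrt(complex(-1.0, -0.0))
print(above.imag, below.imag)
```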

Is it safe to assume floating point is represented using IEEE754 floats in C?

May 29, 2023 by Tarik

Essentially all architectures in current non-punch-card use, including embedded architectures and exotic signal processing architectures, offer one of two floating point systems: (1) IEEE-754, or (2) IEEE-754 except for blah. That is, they mostly implement 754, but cheap out on some of the more expensive and/or fiddly bits. The most common cheap-outs: Flushing denormals to zero. This invalidates … Read more
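One way to probe for the first cheap-out on a given machine is to halve the smallest normal number and see whether the result survives. The sketch below does this for Python's float (a C double); the function name supports_subnormals is hypothetical, and the check only reflects the floating-point mode in effect when it runs.

```python
import sys

def supports_subnormals():
    """Return True if halving the smallest normal double gives a nonzero
    (subnormal) result, False if it is flushed to zero."""
    smallest_normal = sys.float_info.min   # 2**-1022 for IEEE 754 doubles
    return smallest_normal / 2 > 0.0

print(supports_subnormals())
```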

Status of __STDC_IEC_559__ with modern C compilers

May 27, 2023 by Tarik

I believe __STDC_IEC_559__ relies on some library features and can’t be defined solely by the compiler. See this post for some information. This is not uncommon for C — the compiler and the C library must sometimes cooperate in order to implement the entire standard. What you’re asking depends on the compiler. I think you … Read more

How computer does floating point arithmetic?

May 27, 2023 by Tarik

Check out the article “What Every Computer Scientist Should Know About Floating-Point Arithmetic”.

Ranges of floating point datatype in C?

May 26, 2023 by Tarik

A 32-bit floating point number has 23 + 1 bits of mantissa (23 stored bits plus an implicit leading 1) and an 8-bit exponent (only -126 to 127 is used for normal numbers), so the largest number you can represent is: (1 + 1 / 2 + … 1 / (2 ^ 23)) * (2 ^ 127) = (2 ^ 23 + 2 ^ … Read more
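The sum in that expression is a geometric series equal to 2 - 2^-23, so the largest finite single-precision value is (2 - 2^-23) * 2^127. The sketch below checks this against the actual bit patterns using Python's struct module; the helper name float32_from_bits is hypothetical, and the arithmetic is done in doubles, which represent these values exactly.

```python
import struct

def float32_from_bits(bits):
    """Decode a 32-bit pattern as an IEEE 754 single (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# 1 + 1/2 + ... + 1/2**23, summed term by term
mantissa_sum = sum(2.0 ** -k for k in range(24))

largest = mantissa_sum * 2.0 ** 127   # exponent field 254, fraction all ones
smallest_normal = 2.0 ** -126         # exponent field 1, fraction zero

print(largest == float32_from_bits(0x7F7FFFFF))
print(smallest_normal == float32_from_bits(0x00800000))
```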

Double vs float on the iPhone

May 25, 2023 by Tarik

The iPhone can do both single and double precision arithmetic in hardware. On the ARM1176 (original iPhone and iPhone3G), the two precisions operate at approximately the same speed, though you can fit more single-precision data in the caches. On the Cortex-A8 (iPhone3GS, iPhone4 and iPad), single-precision arithmetic is done on the NEON unit instead of VFP, and … Read more