There are several ways to convert integers to floating-point numbers. The most obvious one is using the native instruction of a target CPU, while the other ones include table lookups and bit manipulation trick, each having their own trade-offs. This article provides a comparison between these techniques.
Modern CPUs with a FPU usually have dedicated instructions for converting integers to floating-point numbers and forth. The examples include x86's
cvtsi2ss instruction. Although they might seem to be the optimal choice as they usually are, sometimes the other techniques described later may outperform the dedicated instructions.
The mapping from the integer representations of IEEE floating-point numbers to the represented numbers is a piecewise linear function. The bit manipulation technique utilizes this property to accomplish the task.
The bits of a single-precision floating-point number are laid out as follows:
The value represented is
If the exponent field is set to and the sign is positive, the represented value is: . Based on this, we can construct a conversion function, which would be something like the following in Rust:
If the input range must support the entirety of 32-bit integers, the upper part and the lower part of an input value could be converted separately and then added together. It appears that LLVM uses this technique in some cases.
Since the L1 data cache of a modern CPU has very high throughput, it makes a sense to use a lookup table for integer-to-FP conversion. As reading a lookup table is just a normal data access, it will compete with other memory accesses and put a strain on the memory system. It rapidly loses efficiency as the input range becomes large, so it's only applicable if the range is very small.
Unlike other techniques, an extra care must be taken to limit the input range when using applying this technique. Out-of-bounds accesses not only lead to an application crash, but may also cause serious security problems.
cvt2siss is not very fast in general. The following table based on Agner Fog's work1 shows the latency and reciprocal throughput of
cvt2siss with 32-bit operand:
u23_to_f32 function shown in the Bit manipulation section is compiled to the following x86_64 assembly:
orl $1258291200, %esi vmovd %esi, %xmm1 vaddss %xmm0, %xmm1, %xmm1
The following tables based on Agner Fog's work1 show the characteristics of these instructions:
Based on these tables, the performance characteristics of
u23_to_f32 can be estimated:
Comparing with the table for
cvt2siss, it can be said that the bit manipulation technique provides a superior or equivalent throughput in all of the microarchitectures shown here, albeit having a narrower input range. On the other hand, this technique requires one extra µops.
The assembly code of the table lookup technique consists of a single instruction:
vmovss (%rdx,%rbx,4), %xmm0
The load latency is 5 cycles and the reciprocal throughput is 0.5 according to Intel's document2, thus it theoretically provides the highest throughput.
Agner Fog, Software optimization resources.
Intel corporation, Intel® 64 and IA-32 Architectures Optimization Reference Manual.