# Converting Integers to Floats on x86

## February 10, 2019

There are several ways to convert integers to floating-point numbers. The most obvious one is using the native instruction of a target CPU, while the other ones include table lookups and bit manipulation trick, each having their own trade-offs. This article provides a comparison between these techniques.

# 🔗Techniques

## 🔗Hardware instructions

Modern CPUs with a FPU usually have dedicated instructions for converting integers to floating-point numbers and forth. The examples include x86's cvtsi2ss instruction. Although they might seem to be the optimal choice as they usually are, sometimes the other techniques described later may outperform the dedicated instructions.

## 🔗Bit manipulation

The mapping from the integer representations of IEEE floating-point numbers to the represented numbers is a piecewise linear function. The bit manipulation technique utilizes this property to accomplish the task.

The bits of a single-precision floating-point number are laid out as follows:

The value represented is $\text{value} = (-1)^\text{sign} \times 2^{\text{exponent}-127} \times \left( 1 + \text{fraction} \cdot 2^{-23} \right)$

If the exponent field is set to $127 + 23 = 150$ and the sign is positive, the represented value is: $2^{23} + \text{fraction}$ . Based on this, we can construct a conversion function, which would be something like the following in Rust:

fn u23_to_f32(x: u32) -> f32 {
assert!(x < 0x800000);
<f32>::from_bits(x | 0x4b000000) - 8388608.0
}


If the input range must support the entirety of 32-bit integers, the upper part and the lower part of an input value could be converted separately and then added together. It appears that LLVM uses this technique in some cases.

## 🔗Table lookups

Since the L1 data cache of a modern CPU has very high throughput, it makes a sense to use a lookup table for integer-to-FP conversion. As reading a lookup table is just a normal data access, it will compete with other memory accesses and put a strain on the memory system. It rapidly loses efficiency as the input range becomes large, so it's only applicable if the range is very small.

Unlike other techniques, an extra care must be taken to limit the input range when using applying this technique. Out-of-bounds accesses not only lead to an application crash, but may also cause serious security problems.

# 🔗Theoretical analysis

cvt2siss is not very fast in general. The following table based on Agner Fog's work1 shows the latency and reciprocal throughput of cvt2siss with 32-bit operand:

cvt2siss x,r32Sandy BridgeSkylakeRyzen
Latency468
Reciprocal throughput1.521
µops422

The u23_to_f32 function shown in the Bit manipulation section is compiled to the following x86_64 assembly:

orl     \$1258291200, %esi
vmovd   %esi, %xmm1


The following tables based on Agner Fog's work1 show the characteristics of these instructions:

or r,rSandy BridgeSkylakeRyzen
Latency111
Reciprocal throughput10.250.25
Portp015p0156
mov x,r32Sandy BridgeSkylakeRyzen
Latency123
Reciprocal throughput11
Portp015p5
addss x,xSandy BridgeSkylakeRyzen
Latency343
Reciprocal throughput10.50.5
Portp1p01p23

Based on these tables, the performance characteristics of u23_to_f32 can be estimated:

u23_to_f32Sandy BridgeSkylakeRyzen
Latency577
Reciprocal throughput111

Comparing with the table for cvt2siss, it can be said that the bit manipulation technique provides a superior or equivalent throughput in all of the microarchitectures shown here, albeit having a narrower input range. On the other hand, this technique requires one extra µops.

The assembly code of the table lookup technique consists of a single instruction:

vmovss  (%rdx,%rbx,4), %xmm0


The load latency is 5 cycles and the reciprocal throughput is 0.5 according to Intel's document2, thus it theoretically provides the highest throughput.