Number Parsing at a Gigabyte per Second

01/11/2021
by Daniel Lemire, et al.

With disks and networks providing gigabytes per second, parsing decimal numbers from strings becomes a bottleneck. We consider the problem of parsing decimal numbers to the nearest binary floating-point value. The general problem requires variable-precision arithmetic. However, we need at most 17 digits to represent 64-bit standard floating-point numbers (IEEE 754). Thus we can represent the decimal significand with a single 64-bit word. By combining the significand and precomputed tables, we can compute the nearest floating-point number using as few as one or two 64-bit multiplications. Our implementation can be several times faster than conventional functions present in standard C libraries on modern 64-bit systems (Intel, AMD, ARM and POWER9). Our work is available as open source software used by major systems such as Apache Arrow and Yandex ClickHouse. The Go standard library has adopted a version of our approach.
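To make the idea of holding the decimal significand in a single machine word concrete, here is a minimal sketch of the classic exact fast path (often attributed to Clinger), not the full table-based algorithm the abstract describes. It assumes the input is a plain decimal string. When the significand fits in 53 bits and the decimal exponent lies within ±22, both operands are exactly representable as doubles, so one multiplication or division gives the correctly rounded result. The names `parse_fast_path`, `POW10`, and `_DECIMAL` are hypothetical and chosen for illustration only.

```python
import re

# Powers of ten from 10^0 to 10^22 are exactly representable as 64-bit floats.
POW10 = [float(10 ** k) for k in range(23)]

# Hypothetical, simplified grammar: optional sign, digits, optional fraction,
# optional exponent. ASCII digits only.
_DECIMAL = re.compile(r"(-?)(\d*)(?:\.(\d*))?(?:e([+-]?\d+))?",
                      re.ASCII | re.IGNORECASE)

def parse_fast_path(text):
    """Return the nearest float for simple inputs, or None when the
    fast path does not apply and a slower, general routine is needed."""
    m = _DECIMAL.fullmatch(text)
    if m is None:
        return None
    sign, int_part, frac_part, exp_part = m.groups()
    frac_part = frac_part or ""
    digits = int_part + frac_part
    if not digits:
        return None  # no digits at all, e.g. "" or "."

    w = int(digits)                                  # decimal significand
    q = (int(exp_part) if exp_part else 0) - len(frac_part)  # value = w * 10^q

    # Both operands must be exact doubles for a single operation to
    # produce the correctly rounded result.
    if w > 2 ** 53 or abs(q) > 22:
        return None

    value = float(w) * POW10[q] if q >= 0 else float(w) / POW10[-q]
    return -value if sign else value
```

A caller would fall back to a general routine, such as Python's built-in `float`, whenever the sketch returns `None`. The algorithm in the paper extends well beyond this range by using 64-bit multiplications against precomputed tables.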

research
12/13/2022

Fast Number Parsing Without Fallback

In recent work, Lemire (2021) presented a fast algorithm to convert numb...
research
02/27/2018

Reproducible Floating-Point Aggregation in RDBMSs

Industry-grade database systems are expected to produce the same result ...
research
08/27/2023

Accurate complex Jacobi rotations

This note shows how to compute, to high relative accuracy under mild ass...
research
10/18/2019

The Bitwise Hashing Trick for Personalized Search

Many real world problems require fast and efficient lexical comparison o...
research
08/20/2017

Conversion of Mersenne Twister to double-precision floating-point numbers

The 32-bit Mersenne Twister generator MT19937 is a widely used random nu...
research
10/10/2018

Generalized Ziggurat Algorithm for Unimodal and Unbounded Probability Density Functions with Zest

We present a modified Ziggurat algorithm that could generate a random nu...
research
11/14/2021

Unicode at Gigabytes per Second

We often represent text using Unicode formats (UTF-8 and UTF-16). The UT...
