PolymurHash is a 64-bit universal hash function designed for use in hash tables. It has a couple desirable properties:

  • It is mathematically proven to have a statistically low collision rate. When initialized with an independently chosen random seed, for any distinct pair of inputs m and m' of up to n bytes the probability that h(m) = h(m') is at most n * 2^-60.2. This is known as an almost-universal hash function. In fact PolymurHash has a stronger property: it is almost pairwise independent. For any two distinct inputs m and m' the probability they hash to the pair of specific 64-bit hash values (x, y) is at most n * 2^-124.2.
  • It is very fast for short inputs, while being no slouch for longer inputs. On an Apple M1 machine it can hash any input <= 49 bytes in 21 cycles, and processes 33.3 GiB/sec (11.6 bytes / cycle) for long inputs.
  • It is cross-platform, using no extended instruction sets such as CLMUL or AES-NI. For good speed it only requires native 64 x 64 -> 128 bit multiplication, which almost all 64-bit processors have.
  • It is small in code size and space. Ignoring cross-platform compatibility definitions, the hash function and initialization procedure is just over 100 lines of C code combined. The initialized hash function uses 32 bytes of memory to store its parameters, and it uses only a small constant amount of stack memory when computing a hash.