Order can be preserved with variable byte integer encoding. Different implementations indicate optimisations in JavaScript.

A common way to encode unsigned integer is to have a sequence of bytes, with 7-bit data, and 1-bit length encoding per byte. The naive implementation usually does not preserve the integer order into the lexicographical order of the encoded bytes, but is possible.

Encoding:

bytes.append(number % 128)

number = number / 128

while number > 0:

number = number – 1

bytes.append(128 + (number % 128))

number = number / 128

bytes.reverse()

Decoding:

number = 0

while true:

byte = getNextByte()

number = number * 128 + (byte % 128)

if byte & 128:

number = number + 1

else:

break

The key elements to ensure that the lexicographical order of the binary strings preserve the order of the integers are:

- The binary string for an integer is unique by adding/subtracting 1 to the
*number*. This both adds slightly larger rang, and guarantees we do not have the case where 1000000 00000001 and 00000001 would both decode to 1, – in contrast to some variable byte integer encodings. - When being order-preserving, it is given that the most significant bit must be the one used for byte length encoding, and it must be 1 if there are more bytes to read and zero otherwise.

This is useful if you have a lexicographically ordered binary keys, and want to encode arbitrary unsigned integers where you do not know the size beforehand. The concrete case I was thinking about was efficient encoding of timestamps appended into a sorted data structure, without hitting the year 2038 bug.

I implemented a couple of versions in JavaScript with the following conclusions on v8:

- Implementation using typed arrays are 10-30x faster than using arrays (when optimised correctly).
- Optimising for only encoding 32bit integers, instead of up to 53 integer bits from a double, can yield about 30% performance.
- You need to benchmark each version individually, as the JIT otherwise can make the results incomparable, (even if you warm it up a bit).
- In JavaScript on V8 on a slow laptop, with above optimisations you can encode 1MB/s version, versus 40KB/s with the naïve version.
- GIST available here: https://gist.github.com/rasmuserik/47ce8e20e88f8443a79d