Reducing K/V costs by “key packing”

Yair Morgenstern
5 min read · Jul 9, 2024


Recently at Forter, we evaluated a data migration to Aerospike, a popular K/V datastore with per-key memory requirements. Despite its many advantages, it had one large problem — it was expensive.

By changing the way we stored our data — storing multiple objects per key — we were able to shift the bottleneck from memory to storage, and thus reduce cluster costs by ~50%.

This technique is applicable to all datastores with a per-key memory cost, such as Garnet or Redis Enterprise’s Auto Tiering, when the cost bottleneck is memory usage.

“Where’s the money?” — Finding the cost bottleneck

As with all optimizations, we need to make sure we're working on the right thing. In this case, that means identifying the cost bottleneck — the resource that keeps you from paying less.

For servers this comes in 3 main flavors: CPU, Memory, and Storage.

In our use-case, we had:

  • A lot of very small objects, meaning low Storage requirements
  • No queries, only get/set access required, meaning low CPU requirements
  • Many keys with a per-key memory cost, meaning high Memory requirements

This imbalance meant we were paying for more CPU and Storage than we would use, because we needed the Memory.

To give a feeling of the scale discussed, we needed to store 25 billion keys, which — at 64 bytes per key with ×2 replication — comes out at ~3TB of RAM.
At 100K writes/s and only 1K reads/s — for a 25B-object data store — that’s a lot of memory sitting around doing nothing most of the time.
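
To make that estimate explicit, here is the same back-of-envelope arithmetic as a few lines of Python (the figures are the ones quoted above):

```python
# Back-of-envelope estimate of the per-key (primary index) memory cost.
num_objects = 25_000_000_000   # 25 billion objects, one key each
bytes_per_key = 64             # in-memory index entry per key
replication_factor = 2

index_ram = num_objects * bytes_per_key * replication_factor
print(f"~{index_ram / 1e12:.1f} TB of RAM")  # ~3.2 TB
```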

So we have our cost bottleneck — memory usage. How can we reduce this?

The solution — Multiple objects per key

The first solution we considered was technology-specific — Aerospike’s All Flash mode. However, this would have required significant changes to our infrastructure and automation.

Instead, we opted for a more technology-agnostic approach:

  • By taking the number of objects (a known number since we’re migrating), and dividing by how many objects we want per key (see below), we get the total number of keys we will have in the cluster. This remains constant.
  • Each object already has an object key; from it we derive a record key, `hash(object_key) % number_of_keys`.
  • Each bin that would hold a scalar type (number / string) instead becomes a map — where the object key is the map key.
  • To get/set the values for the object, we ask for the record according to the record key, and add a map operation asking for the value in the map according to the object key.

In abstract terms, we can think of this as a hashmap of hashmaps, where the outer hashmap retrieves the record and the inner one retrieves the object's value. Since the inner lookup occurs on the Aerospike cluster, we get the final result as if we did `map[record_key][object_key]`.

This works with multiple fields as well, since we can send a list of map operations to be executed, and we will receive the final record with only the field values for our object.

Finding the map values is executed in-cluster, so the result we get will contain only our fields
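
To make this concrete, here is a minimal sketch of the get/set path using the Python Aerospike client's map operations. The namespace, set, and bin names are illustrative assumptions, not our production values:

```python
import hashlib

import aerospike
from aerospike_helpers.operations import map_operations as map_ops

NUMBER_OF_KEYS = 1_000_000_000  # fixed keyspace size, chosen up front

client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()


def record_key(object_key: str) -> tuple:
    # Many objects share one Aerospike record; the record is picked by hashing
    # the object key into the fixed keyspace (a stable hash, not Python's hash()).
    digest = hashlib.sha1(object_key.encode()).digest()
    return ("test", "packed", int.from_bytes(digest[:8], "big") % NUMBER_OF_KEYS)


def set_object(object_key: str, fields: dict) -> None:
    # Every bin is a map keyed by object_key, so writing an object is just a
    # list of map_put operations on its record.
    ops = [map_ops.map_put(bin_name, object_key, value)
           for bin_name, value in fields.items()]
    client.operate(record_key(object_key), ops)


def get_object(object_key: str, bin_names: list) -> dict:
    # The map lookups execute server-side, so only this object's values return.
    ops = [map_ops.map_get_by_key(bin_name, object_key, aerospike.MAP_RETURN_VALUE)
           for bin_name in bin_names]
    _, _, bins = client.operate(record_key(object_key), ops)
    return bins
```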

How many objects per key?

This is extremely context-dependent — it hinges on your data store, data format, object size, and transaction rate — so you'll need to run your own performance tests.

For us, we chose an arbitrary number — 25 objects per key — and tested the performance at double that, to ensure future-proofing. At 1ms p99 and 2ms p99.9 for both reads and writes, at 10K TPS per node, we can confidently say performance for our use-case was not harmed.

Challenges

Not every use-case is suited for this optimization.

Even though queries on the scalar data types can be replicated with map indices, other functionality is not as easily transferable.

TTLs

In Aerospike, TTLs apply to an entire record, not to individual bins or map entries, so packed objects need custom retention. Our solution was to add an “expiry date” timestamp to each object/record, with a secondary index on it, and run an external cron that periodically queries for, and removes, expired data.
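
For illustration, a minimal sketch of such an external sweep, assuming a hypothetical `expiry` bin (epoch seconds) with a secondary index on it; for brevity it removes whole expired records rather than individual map entries:

```python
import time

import aerospike
from aerospike import predicates as p

client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()


def expiry_sweep() -> None:
    # Run from an external cron: find records whose expiry timestamp has
    # passed, via the secondary index on the hypothetical 'expiry' bin,
    # and delete them. Until the sweep runs, expired data stays readable.
    now = int(time.time())
    query = client.query("test", "packed")
    query.where(p.between("expiry", 0, now))

    def delete_record(record):
        key, _, _ = record
        client.remove(key)

    query.foreach(delete_record)
```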

Since this cron is external, it does not prevent you from retrieving data that has already passed its expiry date.

Optimistic Concurrency + parallel updates

Data versioning in Aerospike is done via Generation metadata at the record level. An update to any object within a record increases its Generation, so the number of “generation error” rejections you see rises roughly in proportion to the number of objects per record.

Likewise, too many concurrent operations on a single record can lead to rejections (“hot key” errors in Aerospike), which also scale linearly with the number of objects per record.

We found that simple retries for generation mismatch errors were enough to mitigate this factor.
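
For illustration, a minimal retry loop, assuming the Python Aerospike client and a check-and-set against the record generation (names and retry budget are illustrative):

```python
import aerospike
from aerospike import exception as ex
from aerospike_helpers.operations import map_operations as map_ops

MAX_RETRIES = 5


def update_object(client, key, bin_name, object_key, new_value):
    # Optimistic concurrency: read the record's generation, write only if it
    # is unchanged, and retry on a generation mismatch. Hot-key rejections
    # can be retried the same way.
    for _ in range(MAX_RETRIES):
        _, meta, _ = client.get(key)
        try:
            client.operate(key,
                           [map_ops.map_put(bin_name, object_key, new_value)],
                           meta={"gen": meta["gen"]},
                           policy={"gen": aerospike.POLICY_GEN_EQ})
            return
        except ex.RecordGenerationError:
            continue  # another object in this record won the race; retry
    raise RuntimeError("update failed after retries")
```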

For our example, at 25 billion objects, 1 billion keys, with 100,000 operations per second at a 1ms p99:

  • The naive math: 100K ops/s at 1ms per op means roughly 100 operations in flight at any moment. With 1 billion keys, the chance that a given op lands on the same record as one of those in-flight ops is about 100 / 1B = 1/10M, so at 100K ops/s we expect roughly 0.01 collisions per second, or about one every couple of minutes (see the snippet below).
  • Apparently the model for ‘hot key’ in Aerospike isn’t as simplistic as the above, because we got a ‘hot key’ error roughly every 30s — still infrequent enough that retries sufficed.
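
The same estimate as a few lines of Python (the observed hot-key rate was higher than this naive model predicts):

```python
ops_per_second = 100_000
latency_seconds = 0.001          # ~1ms per operation
number_of_keys = 1_000_000_000

in_flight = ops_per_second * latency_seconds                  # ~100 concurrent ops
p_collision_per_op = in_flight / number_of_keys               # ~1e-7
collisions_per_second = p_collision_per_op * ops_per_second   # ~0.01
print(f"~1 collision every {1 / collisions_per_second:.0f} seconds")  # ~100s
```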

Elastic sizing

Having a constant number of keys means that if we want to change the number of objects per record, we’ll need to rehash them. The amount of data moved can be reduced by using consistent hashing, but the migration — especially keeping live reads/writes working throughout — is non-trivial.

This is a risk we were willing to take, because we did not find a performance-affecting limit on the number of objects per record.

Choosing the keyspace

As mentioned earlier, we need to fix a constant number of keys, obtained by dividing the number of objects by the number of objects per key. This means you need to know the number of objects upfront — making this solution particularly suited to migrations of datastores with predictable growth, and particularly unsuited for services where order-of-magnitude growth is expected.
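
In code, the keyspace sizing is just the division above (figures match our example):

```python
import math

num_objects = 25_000_000_000   # known upfront, since this is a migration
objects_per_key = 25           # chosen and load-tested as described above

# Changing objects_per_key changes number_of_keys, and therefore changes
# hash(object_key) % number_of_keys for every object: hence the rehash.
number_of_keys = math.ceil(num_objects / objects_per_key)  # 1,000,000,000
```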

Conclusions

  • Determine your cost bottlenecks — what you’re actually paying for
  • Data stores are built to be multi-purpose — you can often optimize for your use-cases by trading away functionalities you won’t use
  • Server-side operations, such as those Aerospike provides, combined with a specialized storage layout, let you optimize storage and cost inside the datastore while retaining a minimal code interface to it.
