Redis Memory Sizer
Estimate what your keyspace actually costs: per-key overhead, encoding cliffs, fragmentation, fork headroom, replicas. An honest ±30% planning number — the page tells you how to measure the real one.
Dataset shape
Type: the dominant top-level type. For mixed workloads, run the sizer per type and add the results.
Keys: number of top-level keys (DBSIZE).
Key name size: e.g. "session:8f3a…" ≈ 40 B.
Value size: values ≤ 44 B embed in the object (embstr).
TTL: EXPIRE adds an expires-dict entry per key.
Deployment
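Replicas: each holds a full copy of the dataset.
Fragmentation ratio: read it from INFO memory; 1.2–1.5 is typical.
Headroom (%): covers fork copy-on-write plus growth; use at least 30.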
Estimate
Fits comfortably
- Dataset (≈ used_memory): 404 MB
- Expected RSS (× fragmentation): 505 MB
- Per key, all overheads in: 424 B
- Modelled encoding: n/a
~424 B per key (string) × 1,000,000 keys ≈ 0.39 GB dataset. With 1.25× fragmentation and 30% headroom you need ~0.71 GB per instance, 1.41 GB across the fleet. Estimate only (±30%) — sample real keys with MEMORY USAGE.
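The arithmetic behind these figures can be reproduced in a few lines. The sketch below is not the sizer's source. It uses the constants listed under "How the math works" and assumes one input combination that yields 424 B per key: a 40-byte key name, a 256-byte value, and a TTL. The function names are illustrative, and the page's MB and GB figures only match when read as binary units (2^20 and 2^30 bytes).

```python
# Constants as listed on this page; the real tool's source may differ.
BOOKKEEPING_B = 64   # dict entry + robj + SDS headers
TTL_ENTRY_B = 48     # expires-dict entry, only for keys with a TTL
EMBSTR_MAX_B = 44    # values up to this size embed in the object
RAW_SDS_B = 16       # extra header for values stored as raw SDS


def string_key_bytes(key_len: int, value_len: int, has_ttl: bool) -> int:
    """Modelled bytes for one string key."""
    # The page lists embedded values as "free"; read literally here as
    # zero extra bytes, which is an interpretation.
    value_cost = 0 if value_len <= EMBSTR_MAX_B else value_len + RAW_SDS_B
    ttl_cost = TTL_ENTRY_B if has_ttl else 0
    return BOOKKEEPING_B + key_len + ttl_cost + value_cost


def size_fleet(keys, per_key, fragmentation, headroom_pct, replicas):
    """Return (dataset, rss, instance, fleet) in bytes."""
    dataset = keys * per_key                    # ≈ used_memory
    rss = dataset * fragmentation               # allocator overhead
    instance = rss / (1 - headroom_pct / 100)   # fork copy-on-write envelope
    fleet = instance * (1 + replicas)           # every replica is a full copy
    return dataset, rss, instance, fleet


# Assumed inputs: 40 B key name, 256 B value, TTL set.
per_key = string_key_bytes(key_len=40, value_len=256, has_ttl=True)
dataset, rss, instance, fleet = size_fleet(1_000_000, per_key, 1.25, 30, 1)
```

With those inputs, per_key is 424, dataset / 2**20 rounds to 404, rss / 2**20 rounds to 505, and instance and fleet come to about 0.71 and 1.41 when divided by 2**30.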
Operational advisories
- Each of the 1 replica(s) holds a full dataset copy plus replication buffers — the fleet figure includes them, your cloud bill will too.
- If this is a cache, set maxmemory near the dataset estimate and pick an eviction policy (allkeys-lru is the usual default). Out of the box Redis uses noeviction and returns write errors when full.
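For the cache case, the advice above boils down to two settings. The sketch below is a hedged illustration: the helper names are made up for this example, and `client` is assumed to be a redis-py style object exposing `config_set(name, value)`. Whether maxmemory should sit exactly at the dataset estimate or somewhat above it is a judgment call the sketch leaves to the caller.

```python
# Eviction policies accepted by the maxmemory-policy setting.
EVICTION_POLICIES = {
    "noeviction", "allkeys-lru", "allkeys-lfu", "allkeys-random",
    "volatile-lru", "volatile-lfu", "volatile-random", "volatile-ttl",
}


def cache_settings(dataset_bytes, policy="allkeys-lru"):
    """Build the two settings from a dataset estimate in bytes."""
    if policy not in EVICTION_POLICIES:
        raise ValueError(f"unknown eviction policy: {policy}")
    return {"maxmemory": int(dataset_bytes), "maxmemory-policy": policy}


def apply_settings(client, settings):
    """Apply each setting at runtime through CONFIG SET."""
    for name, value in settings.items():
        client.config_set(name, value)
```

With redis-py this would be called as `apply_settings(redis.Redis(), cache_settings(424_000_000))`. CONFIG SET changes the running server only; put the same values in redis.conf (or run CONFIG REWRITE) if they should survive a restart.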
How the math works — and how wrong it can be
Unlike Postgres connections or Kafka partitions, Redis memory has no published closed-form formula. Real usage depends on jemalloc size classes, encoding transitions, and version. What Redis does document is the structure [Redis memory optimization]: every key carries dictionary-entry and object-header bookkeeping, small collections live in compact listpacks, and large ones convert to pointer-heavy encodings. The sizer models exactly that, with every constant visible:
per_key  = 64 B bookkeeping (dict entry + robj + SDS headers)
         + key name bytes
         + 48 B if the key has a TTL (expires-dict entry)
         + value cost

value cost:
  string      value ≤ 44 B → embedded (embstr), free
              value > 44 B → value + 16 B (raw SDS)
  collection  listpack:  elements × (11 B + element)
              hashtable: elements × (48–80 B + element)

dataset  = keys × per_key            (≈ used_memory, what maxmemory limits)
rss      = dataset × fragmentation   (jemalloc: typically 1.2–1.5)
instance = rss / (1 − headroom%)     (fork copy-on-write envelope)
fleet    = instance × (1 + replicas)

These constants are estimates, and the output is a ±30% planning number. The authoritative answer comes from your own server:
- MEMORY USAGE key [SAMPLES n]: actual bytes for a real key, overheads included. Sample a few hundred representative keys, take the mean, and multiply by your key count.
- INFO memory: used_memory vs used_memory_rss gives your real fragmentation ratio; stop guessing at 1.25.
- DEBUG OBJECT / OBJECT ENCODING: confirms which encoding your collections actually use.
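The measurement steps above can be scripted. The sketch below assumes a redis-py style client, using its `scan_iter`, `memory_usage`, and `info` methods; the function names, the pattern, and the default sample limit are illustrative choices. Note that SCAN returns keys in iteration order rather than a random sample, so treat the mean as indicative.

```python
from statistics import mean


def sample_memory_usage(client, pattern, limit=500):
    """Mean MEMORY USAGE in bytes over up to `limit` keys matching `pattern`."""
    sizes = []
    for key in client.scan_iter(match=pattern, count=1000):
        size = client.memory_usage(key)
        if size is not None:   # the key may expire between SCAN and MEMORY USAGE
            sizes.append(size)
        if len(sizes) >= limit:
            break
    if not sizes:
        raise ValueError(f"no keys matched {pattern!r}")
    return mean(sizes)


def fragmentation_ratio(client):
    """used_memory_rss / used_memory, taken from INFO memory."""
    info = client.info("memory")
    return info["used_memory_rss"] / info["used_memory"]
```

With redis-py, `sample_memory_usage(redis.Redis(), "session:*")` would give the measured per-key figure to compare against the estimate, and `fragmentation_ratio(redis.Redis())` the ratio to enter in place of 1.25.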
The two cliffs worth knowing by heart
- The encoding cliff. With stock settings, a hash of up to 512 short fields is one contiguous listpack: cheap. At 513 fields (or with a single field name or value over 64 B) the whole key converts to a hashtable where every field is a separate allocation with its own dict entry [Redis memory optimization]. Same data, several times the memory. Sorted sets and listpack-encoded sets convert earlier, at 128 entries by default. The thresholds (hash-max-listpack-entries, hash-max-listpack-value, and the set/zset equivalents) are tunable, and teams with object-cache workloads routinely raise them after measuring.
- The fork cliff. BGSAVE, AOF rewrite, and replica full-sync all fork. Copy-on-write means the child shares pages until the parent writes to them, so under heavy write load memory transiently grows toward 2×. Instances sized to fit the dataset exactly meet the OOM killer during their first background save.
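The encoding cliff can be put into numbers with the collection constants from the model on this page. The sketch below is a rough illustration, not the sizer's source: the function name, its parameters, and the 20-byte entries are assumptions. The entry threshold is passed in explicitly because it is tunable and differs by type and version.

```python
LISTPACK_ENTRY_B = 11          # per-element overhead in a listpack (page's constant)
HASHTABLE_ENTRY_B = (48, 80)   # per-element overhead range in a hashtable


def collection_value_bytes(elements, entry_len, largest_item_len,
                           max_entries, max_value=64):
    """Return (low, high) modelled bytes for a collection's value.

    entry_len is the payload per element (for a hash: field name plus value);
    largest_item_len is the biggest single field name or value.
    """
    if elements <= max_entries and largest_item_len <= max_value:
        cost = elements * (LISTPACK_ENTRY_B + entry_len)
        return cost, cost          # listpack: a single figure
    low, high = HASHTABLE_ENTRY_B
    return elements * (low + entry_len), elements * (high + entry_len)


# Assumed threshold of 128 entries; check CONFIG GET for your type and version.
below = collection_value_bytes(128, 20, 12, max_entries=128)
above = collection_value_bytes(129, 20, 12, max_entries=128)
```

With these assumed inputs, 128 entries model at 3,968 B as a listpack, while 129 entries model at 8,772 to 12,900 B as a hashtable: roughly two to three times the memory for one extra element.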
Why small values have embarrassing overhead
The fixed ~112 B of per-key bookkeeping (64 B + 48 B TTL) doesn't care how small your value is. A 40-byte session token with a TTL costs ~190 B — the metadata outweighs the data 3:1. This is why the standard space optimisation is aggregating many small keys into hashes that stay listpack-encoded: the per-key overhead is paid once per hash instead of once per item[Redis memory optimization].
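A rough comparison shows what the aggregation buys. The sketch below takes the ~190 B standalone figure from the paragraph above as given and models the bucketed layout with the page's listpack constant. The bucket size, key and field lengths, and the function name are assumptions for illustration. The model's TTL is per key, so all items in one hash expire together.

```python
import math

BOOKKEEPING_B = 64       # per-key bookkeeping (page's constant)
TTL_ENTRY_B = 48         # expires-dict entry (page's constant)
LISTPACK_ENTRY_B = 11    # per-element listpack overhead (page's constant)


def bucketed_bytes(items, fields_per_hash, hash_key_len, field_len, value_len,
                   max_entries, max_value=64, has_ttl=True):
    """Modelled bytes for `items` small values packed into listpack hashes."""
    if fields_per_hash > max_entries or max(field_len, value_len) > max_value:
        raise ValueError("hash would leave the listpack encoding")
    hashes = math.ceil(items / fields_per_hash)
    # Per-key overhead is paid once per hash, not once per item.
    per_hash = BOOKKEEPING_B + hash_key_len + (TTL_ENTRY_B if has_ttl else 0)
    return hashes * per_hash + items * (LISTPACK_ENTRY_B + field_len + value_len)


items = 1_000_000
standalone = items * 190   # ~190 B per standalone key, as quoted above
bucketed = bucketed_bytes(items, fields_per_hash=100, hash_key_len=20,
                          field_len=8, value_len=40, max_entries=128)
```

Under these assumptions the standalone layout models at 190,000,000 B and the bucketed one at 60,320,000 B, about a third of the memory. Measure with MEMORY USAGE before committing to either layout.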
When this sizer is wrong
- Modules: RedisJSON, RediSearch, Bloom filters, and vector indexes have their own memory models entirely. This sizer covers core data types only.
- Redis Cluster: add per-slot bookkeeping and key-to-slot metadata; small per node, real at scale. Size each shard with this model, then verify on one shard before multiplying by sixteen.
- Client and replication buffers: a slow replica or a SUBSCRIBE fan-out with lagging consumers can hold gigabytes in output buffers. That's workload, not keyspace; the sizer can't see it.
- Version drift: encodings improved repeatedly (ziplist→listpack, embedded entries). Numbers here reflect current stable defaults; a 6.x cluster differs in the details.
- Mixed workloads: one dominant type is assumed. Run the sizer per type and sum, or just sample with MEMORY USAGE per logical keyspace prefix.
Further reading
- Redis — Memory Optimization — the authoritative page behind this tool: encoding thresholds, small-aggregate tricks, and measurement guidance.
- Scaling Redis for High Throughput — our deep dive on hot keys, pipelining, Lua, and when to reach for Cluster.
- Caching Strategies at Scale — stampede protection and consistency patterns for the cache this sizer is probably sizing.
About this tool
This sizer is part of BackendBytes' reference tools collection. The model constants live in an open, unit-tested source file — when your MEMORY USAGE samples disagree with the estimate, trust the samples and tell us where the model drifted.