How do I estimate Redis memory usage?

Model each key as overhead + key name + value cost: roughly 64 bytes of dict/object bookkeeping per key, plus ~48 bytes if the key has a TTL, plus the value. For collections, add a per-element overhead that depends on encoding — about 11 bytes per element while listpack-encoded (small collections), jumping to 48-80 bytes per element once converted to hashtable or skiplist encodings. Multiply by key count, then by your fragmentation ratio (typically 1.2-1.5), then add 30%+ headroom for fork copy-on-write. Treat the result as ±30% and verify by sampling real keys with MEMORY USAGE.

Why does Redis use more memory than my data size?

Three multipliers stack on top of raw data: per-key bookkeeping (dict entry, object header, SDS string headers — significant when values are small), allocator fragmentation (jemalloc rounds allocations into size classes; RSS is typically 1.2-1.5x used_memory), and fork copy-on-write during RDB saves and AOF rewrites, which can transiently approach 2x under heavy writes. A dataset of 10 GB of raw values routinely needs a 20+ GB instance.

What are listpack and hashtable encodings in Redis?

Small collections (by default ≤128 entries with elements ≤64 bytes) are stored as listpacks — single contiguous buffers with ~11 bytes of bookkeeping per element. Past those thresholds, Redis converts the key to its big encoding (hashtable for hashes/sets, skiplist+dict for sorted sets) where each element costs 48-80 bytes of structure overhead. Crossing the threshold on a large keyspace can jump memory several-fold, which is why the thresholds (hash-max-listpack-entries and friends) are tunable.
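Here is a minimal sketch of that conversion rule. It assumes the default limits of 128 entries and 64 bytes per element, which are the defaults for hash-max-listpack-entries and hash-max-listpack-value; the set and zset equivalents default to the same values in recent releases. The function name is hypothetical. It deliberately ignores intsets and lists, which follow their own rules.

```python
# Assumed defaults (hash-max-listpack-entries / hash-max-listpack-value);
# check CONFIG GET on your own server before relying on them.
MAX_LISTPACK_ENTRIES = 128
MAX_LISTPACK_VALUE = 64


def collection_encoding(elements, big_encoding="hashtable"):
    """Predict the encoding of a hash, set or sorted set (hypothetical helper).

    `elements` is a list of byte strings; for a hash, pass both the field
    names and the values. The key stays a listpack while it has at most
    MAX_LISTPACK_ENTRIES entries and every element fits in MAX_LISTPACK_VALUE
    bytes. Crossing either limit converts it to the big encoding
    ("hashtable" for hashes/sets, "skiplist" for sorted sets).
    """
    if len(elements) > MAX_LISTPACK_ENTRIES:
        return big_encoding
    if any(len(e) > MAX_LISTPACK_VALUE for e in elements):
        return big_encoding
    return "listpack"


small = [b"field%d" % i for i in range(128)]
print(collection_encoding(small))                    # 128 short entries
print(collection_encoding(small + [b"one-more"]))    # 129 entries
print(collection_encoding([b"x" * 65], "skiplist"))  # one oversized element
```

The three calls print listpack, hashtable and skiplist. On a live server, OBJECT ENCODING reports the real answer.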

How much memory headroom does Redis need?

Plan at least 30% above expected RSS. RDB saves and AOF rewrites fork the process; under write load, copy-on-write duplicates touched pages and can transiently approach 2x in the worst case. Replica full-syncs trigger the same fork. Diskless replication skips the disk write, but it still forks. If the same host's page cache also has to hold AOF files, leave more. The alternative is to move persistence off the hot path (for example, persisting only on a replica). Plan that explicitly rather than discovering it via the OOM killer.
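A rough model helps show why 30% is a rule of thumb and 2x is the ceiling. This sketch assumes copy-on-write duplicates exactly the pages the parent dirties while the child is alive. `dirty_fraction` is a hypothetical input: you would estimate it from write rate and save duration. A better source is the rdb_last_cow_size field in INFO persistence, which reports what the last save actually copied.

```python
GIB = 1024 ** 3


def fork_peak_bytes(rss_bytes, dirty_fraction):
    """Rough peak memory while a BGSAVE / AOF-rewrite child is alive.

    Assumption: every page the parent writes to before the child exits is
    copied once, so the extra cost is the dirtied share of RSS.
    dirty_fraction ranges from 0.0 (no writes) to 1.0 (every page
    rewritten, the 2x worst case).
    """
    if not 0.0 <= dirty_fraction <= 1.0:
        raise ValueError("dirty_fraction must be between 0 and 1")
    return rss_bytes * (1.0 + dirty_fraction)


rss = 10 * GIB
for dirty in (0.1, 0.3, 1.0):
    peak = fork_peak_bytes(rss, dirty)
    print(f"{dirty:.0%} of pages dirtied -> peak {peak / GIB:.1f} GiB")
```

For a 10 GiB instance this prints peaks of 11.0, 13.0 and 20.0 GiB. A 30% margin therefore covers a save during which about a third of memory is rewritten.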

Tool

Redis Memory Sizer

Estimate what your keyspace actually costs: per-key overhead, encoding cliffs, fragmentation, fork headroom, replicas. An honest ±30% planning number — the page tells you how to measure the real one.

Dataset shape

Data type

The dominant top-level type. Mixed workloads: run the sizer per type and add.

Number of keys

Top-level keys (DBSIZE).

Avg key name (bytes)

e.g. "session:8f3a…" ≈ 40 B.

Avg value size (bytes)

≤44 B embeds in the object (embstr).

Keys carry TTLs

EXPIRE adds an expires-dict entry per key.

Deployment

Replicas

Full copy each.

Fragmentation ratio

INFO memory; 1.2–1.5 typical.

Headroom (%)

Fork COW + growth; ≥30.

Estimate

Fits comfortably

Memory per instance (RSS + headroom)

722 MB

1.41 GB across the fleet with 1 replica

Dataset (≈ used_memory): 404 MB
Expected RSS (× fragmentation): 505 MB
Per key, all overheads in: 424 B
Modelled encoding: n/a

~424 B per key (string) × 1,000,000 keys ≈ 0.39 GB dataset. With 1.25× fragmentation and 30% headroom you need ~0.71 GB per instance, 1.41 GB across the fleet. Estimate only (±30%) — sample real keys with MEMORY USAGE.

Operational advisories

The 1 replica holds a full dataset copy plus replication buffers; the fleet figure includes it, and so will your cloud bill.
If this is a cache, set maxmemory near the dataset estimate and pick an eviction policy (allkeys-lru is the usual default). Out of the box Redis uses noeviction and returns write errors when full.

How the math works — and how wrong it can be

Unlike Postgres connections or Kafka partitions, Redis memory has no published closed-form formula. Real usage depends on jemalloc size classes, encoding transitions, and version. What Redis does document is the structure^{[Redis memory optimization]}: every key carries dictionary-entry and object-header bookkeeping, small collections live in compact listpacks, and large ones convert to pointer-heavy encodings. The sizer models exactly that, with every constant visible:

per_key  = 64 B bookkeeping              (dict entry + robj + SDS headers)
         + key name bytes
         + 48 B if the key has a TTL    (expires-dict entry)
         + value cost

value cost:
  string      value ≤ 44 B → embedded (embstr), free
              value > 44 B → value + 16 B (raw SDS)
  collection  listpack:  elements × (11 B + element)
              hashtable: elements × (48–80 B + element)

dataset  = keys × per_key            (≈ used_memory — what maxmemory limits)
rss      = dataset × fragmentation   (jemalloc: typically 1.2–1.5)
instance = rss / (1 − headroom%)     (fork copy-on-write envelope)
fleet    = instance × (1 + replicas)
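The same arithmetic can be written as a small Python sketch. The function names are hypothetical, and the constants are the model's estimates above, not values read from Redis. The usage inputs are an assumption: 1,000,000 keys, 40-byte key names, 256-byte values, TTLs on, one replica, 1.25× fragmentation and 30% headroom. They were chosen because they reproduce the 424 B per key in the sample estimate. The sample's MB/GB figures line up with binary units, so the sketch prints MiB/GiB.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Model constants: estimates from the formula above, not Redis internals.
PER_KEY_BOOKKEEPING = 64    # dict entry + robj + SDS headers
TTL_ENTRY = 48              # expires-dict entry
EMBSTR_LIMIT = 44           # string values up to this size embed in the object
RAW_SDS_OVERHEAD = 16       # extra cost of a separately allocated (raw) value
LISTPACK_PER_ELEMENT = 11   # listpack bookkeeping per element
HASHTABLE_PER_ELEMENT = 64  # assumed midpoint of the model's 48-80 B range


def string_value_cost(value_bytes):
    """Embedded strings are free in the model; raw ones pay value + 16 B."""
    return 0 if value_bytes <= EMBSTR_LIMIT else value_bytes + RAW_SDS_OVERHEAD


def collection_value_cost(elements, element_bytes, encoding):
    """Per-element overhead depends on the encoding the collection ended up in."""
    if encoding == "listpack":
        per_element = LISTPACK_PER_ELEMENT
    elif encoding == "hashtable":
        per_element = HASHTABLE_PER_ELEMENT
    else:
        raise ValueError(f"unknown encoding: {encoding}")
    return elements * (per_element + element_bytes)


def estimate(keys, key_name_bytes, value_cost, has_ttl,
             replicas=1, fragmentation=1.25, headroom=0.30):
    """Apply per_key -> dataset -> rss -> instance -> fleet, all in bytes."""
    per_key = (PER_KEY_BOOKKEEPING + key_name_bytes
               + (TTL_ENTRY if has_ttl else 0) + value_cost)
    dataset = keys * per_key           # roughly used_memory
    rss = dataset * fragmentation      # allocator fragmentation
    instance = rss / (1 - headroom)    # headroom = share of the instance left free
    fleet = instance * (1 + replicas)  # every replica holds a full copy
    return {"per_key": per_key, "dataset": dataset, "rss": rss,
            "instance": instance, "fleet": fleet}


r = estimate(1_000_000, key_name_bytes=40,
             value_cost=string_value_cost(256), has_ttl=True)
print(f"per key {r['per_key']} B, dataset {r['dataset'] / MIB:.0f} MiB, "
      f"RSS {r['rss'] / MIB:.0f} MiB, instance {r['instance'] / MIB:.0f} MiB, "
      f"fleet {r['fleet'] / GIB:.2f} GiB")
```

This prints 424 B per key, a 404 MiB dataset, 505 MiB RSS, a 722 MiB instance and a 1.41 GiB fleet, which matches the sample estimate. Because headroom is treated as the free share of the instance, 30% headroom works out to provisioning about 1.43× RSS.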

These constants are estimates, and the output is a ±30% planning number. The authoritative answer comes from your own server:

MEMORY USAGE key [SAMPLES n] — actual bytes for a real key, overheads included. Sample a few hundred representative keys and multiply.
INFO memory — used_memory vs used_memory_rss gives your real fragmentation ratio; stop guessing at 1.25.
OBJECT ENCODING key — confirms which encoding a collection actually uses. DEBUG OBJECT reports it too, but Redis 7 and later disable DEBUG by default (enable-debug-command).
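The sampling advice can be scripted. This sketch assumes a redis-py-style client: dbsize, randomkey, memory_usage and info are methods on redis-py's Redis class. The helper name and the default of 300 samples are illustrative choices. RANDOMKEY samples with replacement across the whole database. For mixed keyspaces, you would instead sample per prefix, for example from keys collected with SCAN and MATCH.

```python
def sample_memory(client, samples=300, nested_samples=None):
    """Extrapolate keyspace memory from MEMORY USAGE on random keys (sketch).

    `client` is assumed to behave like redis.Redis. `nested_samples` is
    passed through to MEMORY USAGE ... SAMPLES; None keeps the server default.
    """
    total_keys = client.dbsize()
    sizes = []
    for _ in range(samples):
        key = client.randomkey()
        if key is None:            # empty database
            break
        size = client.memory_usage(key, samples=nested_samples)
        if size is not None:       # key may have expired between the two calls
            sizes.append(size)
    if not sizes:
        return None
    mean = sum(sizes) / len(sizes)
    mem = client.info("memory")    # redis-py parses INFO into a dict
    return {
        "mean_bytes_per_key": mean,
        "estimated_keyspace_bytes": mean * total_keys,
        "fragmentation": mem["used_memory_rss"] / mem["used_memory"],
    }
```

With redis-py installed, `sample_memory(redis.Redis(host="localhost", port=6379))` would run it against a local server. Compare the result with the sizer's dataset and fragmentation inputs.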

The two cliffs worth knowing by heart

The encoding cliff. A hash with 128 short fields is one contiguous listpack, which is cheap. At 129 fields (or a single field name or value over 64 B), the whole key converts to a hashtable in which every field is a separate allocation with its own dict entry^{[Redis memory optimization]}. Same data, several times the memory. The thresholds (hash-max-listpack-entries, hash-max-listpack-value, and the set/zset equivalents) are tunable, and teams with object-cache workloads routinely raise them after measuring.
The fork cliff. BGSAVE, AOF rewrite, and replica full-sync all fork. Copy-on-write means the child shares pages until the parent writes to them — so under heavy write load, memory transiently grows toward 2×. Instances sized to fit the dataset exactly meet the OOM killer during their first background save.
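The model's constants can put a number on the encoding cliff. This sketch assumes each hash field/value pair counts as one model element, since the model does not say how hash pairs map onto elements. It uses the 48 and 80 B per-element bounds for the hashtable side. The function name is hypothetical.

```python
LISTPACK_PER_ELEMENT = 11          # model estimate
HASHTABLE_PER_ELEMENT = (48, 80)   # model's low/high bounds
MAX_LISTPACK_ENTRIES = 128         # assumed hash-max-listpack-entries default
MAX_LISTPACK_VALUE = 64            # assumed hash-max-listpack-value default


def hash_value_cost(fields, pair_bytes, max_part_bytes):
    """Modelled bytes for one hash's fields: (encoding, low, high).

    pair_bytes: average field name + value size.
    max_part_bytes: longest single field name or value in the hash.
    """
    if fields <= MAX_LISTPACK_ENTRIES and max_part_bytes <= MAX_LISTPACK_VALUE:
        cost = fields * (LISTPACK_PER_ELEMENT + pair_bytes)
        return "listpack", cost, cost
    low, high = (fields * (per + pair_bytes) for per in HASHTABLE_PER_ELEMENT)
    return "hashtable", low, high


for n in (128, 129):
    enc, low, high = hash_value_cost(n, pair_bytes=20, max_part_bytes=12)
    print(f"{n} fields: {enc} {low}-{high} B")
```

With 20-byte pairs whose longest part is 12 B, 128 fields model at 3,968 B as a listpack. At 129 fields, the hashtable models at 8,772 to 12,900 B: 2.2 to 3.3 times the memory for one extra field.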

Why small values have embarrassing overhead

The fixed ~112 B of per-key bookkeeping (64 B + 48 B TTL) doesn't care how small your value is. A 40-byte session token with a TTL costs ~190 B — the metadata outweighs the data 3:1. This is why the standard space optimisation is aggregating many small keys into hashes that stay listpack-encoded: the per-key overhead is paid once per hash instead of once per item^{[Redis memory optimization]}.
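Here is a rough comparison of the two layouts under the same model constants. The bucketing scheme is hypothetical: key session:123456 becomes field 56 of hash session:1234, giving 100 items per hash. All sizes are also hypothetical. Aggregating gives up per-item EXPIRE, so the string side is costed without TTLs. The model treats embedded string values as free, so the comparison is conservative toward individual keys.

```python
PER_KEY_BOOKKEEPING = 64
TTL_ENTRY = 48
EMBSTR_LIMIT = 44
RAW_SDS_OVERHEAD = 16
LISTPACK_PER_ELEMENT = 11
MAX_LISTPACK_ENTRIES = 128  # assumed hash-max-listpack-entries default
MAX_LISTPACK_VALUE = 64     # assumed hash-max-listpack-value default
MIB = 1024 ** 2


def as_strings(items, key_bytes, value_bytes, has_ttl=False):
    """Modelled bytes for one top-level string key per item."""
    value_cost = 0 if value_bytes <= EMBSTR_LIMIT else value_bytes + RAW_SDS_OVERHEAD
    per_key = PER_KEY_BOOKKEEPING + key_bytes + (TTL_ENTRY if has_ttl else 0) + value_cost
    return items * per_key


def as_bucketed_hashes(items, bucket_key_bytes, field_bytes, value_bytes, per_bucket=100):
    """Modelled bytes when items are grouped into listpack-encoded hashes."""
    if per_bucket > MAX_LISTPACK_ENTRIES or max(field_bytes, value_bytes) > MAX_LISTPACK_VALUE:
        raise ValueError("buckets would convert to hashtable; the saving disappears")
    buckets = -(-items // per_bucket)  # ceiling division
    # Per-key overhead is paid once per bucket; each item pays listpack overhead.
    return (buckets * (PER_KEY_BOOKKEEPING + bucket_key_bytes)
            + items * (LISTPACK_PER_ELEMENT + field_bytes + value_bytes))


strings = as_strings(1_000_000, key_bytes=40, value_bytes=40)
hashes = as_bucketed_hashes(1_000_000, bucket_key_bytes=38, field_bytes=2, value_bytes=40)
print(f"strings: {strings / MIB:.0f} MiB, bucketed hashes: {hashes / MIB:.0f} MiB")
```

For 1,000,000 items with 40-byte keys and 40-byte values, this models about 99 MiB as individual strings and about 52 MiB as bucketed hashes.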

When this sizer is wrong

Modules — RedisJSON, RediSearch, Bloom filters, and vector indexes have their own memory models entirely. This sizer covers core data types only.
Redis Cluster — add per-slot bookkeeping and key-to-slot metadata; small per node, real at scale. Size each shard with this model, then verify on one shard before multiplying by your shard count.
Client and replication buffers — a slow replica or a SUBSCRIBE fan-out with lagging consumers can hold gigabytes in output buffers. That's workload, not keyspace; the sizer can't see it.
Version drift — encodings improved repeatedly (ziplist→listpack, embedded entries). Numbers here reflect current stable defaults; a 6.x cluster differs in the details.
Mixed workloads — one dominant type is assumed. Run the sizer per type and sum, or just sample with MEMORY USAGE per logical keyspace prefix.
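Sizing each Cluster shard needs a per-shard key count, and hash tags can skew it. This is a sketch of the slot calculation described in the Redis Cluster specification: CRC16 (XMODEM variant) modulo 16384, hashing only the {tag} when one is present. `keys_per_shard` is hypothetical. It assumes slots are split into equal contiguous ranges, which real clusters only approximate.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: bytes) -> int:
    """Slot = CRC16(key) mod 16384; a non-empty {tag} replaces the key."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # '{}' (empty tag) means hash the whole key
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


def keys_per_shard(keys, shard_count):
    """Count sampled key names per shard, assuming equal contiguous slot ranges."""
    counts = [0] * shard_count
    for key in keys:
        counts[hash_slot(key) * shard_count // 16384] += 1
    return counts
```

Feed it a sample of real key names. A shard whose count is far above the mean is the one to size, and to verify with MEMORY USAGE, first.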

About this tool

This sizer is part of BackendBytes' reference tools collection. The model constants live in an open, unit-tested source file — when your MEMORY USAGE samples disagree with the estimate, trust the samples and tell us where the model drifted.