
Kafka Partition & Retention Calculator

Size partitions for throughput and consumer parallelism, then see what retention actually costs per broker. Every formula is shown and cited — adjust an input and watch the advisories react.

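Throughput

  • Target throughput (MB/s): peak produce rate the topic must absorb. Size for peak, not average.
  • Per-partition produce capacity (MB/s): measure yours; 10 is conservative.
  • Per-partition consume capacity (MB/s): slow consumers raise the count.
  • Average message size: used for the msg/s conversion. The broker's default maximum message size is ~1 MB.

Parallelism & growth

  • Max consumers in one group: each needs ≥1 partition.
  • Growth factor: repartitioning remaps keys, so provision ahead.

Retention & cluster

  • Retention (days): Kafka default is 7.
  • Replication factor: 3 is the usual production setting.
  • Broker count: used for per-broker budgets.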

Result

Status: Healthy

  • Recommended partitions: 12 (max(1 throughput-driven, 6 consumer-driven) × 2 growth)
  • Partition replicas / broker: 12
  • Messages / second: 10,240
  • Ingest / day (pre-replication): 843.8 GB
  • Retained storage / broker: 5.8 TB

12 partitions recommended — max(1 throughput-driven, 6 consumer-driven) × 2 growth. Retention at RF=3 costs ~17,719 GB across the cluster (5,906 GB/broker).

Operational advisories

  • Partition count is driven by consumer parallelism (6), not throughput (1). Within a consumer group each partition is owned by exactly one consumer, so consumers beyond the partition count sit idle.
  • ~5,906 GB of retained data per broker. Broker replacement and partition reassignment times scale with on-disk state — consider tiered storage or shorter retention before adding disks.
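A minimal sketch of why extra consumers idle, assuming a simple round-robin deal of partitions to consumers. Kafka's real assignors (range, round-robin, sticky, cooperative-sticky) differ in which consumer gets which partition, but none can give one partition to two consumers in the same group, so at least (consumers − partitions) of them end up with nothing. `assign_round_robin` and `idle_consumers` are hypothetical helpers, not Kafka client APIs.

```python
from __future__ import annotations


def assign_round_robin(partitions: int, consumers: int) -> dict[int, list[int]]:
    """Deal partitions 0..partitions-1 to consumers in turn.

    Simplified stand-in for Kafka's group assignors: each partition goes to
    exactly one consumer, which is the property that matters here.
    """
    assignment: dict[int, list[int]] = {c: [] for c in range(consumers)}
    for p in range(partitions):
        assignment[p % consumers].append(p)
    return assignment


def idle_consumers(partitions: int, consumers: int) -> int:
    """Count consumers that own no partition and therefore do no work."""
    return sum(1 for owned in assign_round_robin(partitions, consumers).values() if not owned)


if __name__ == "__main__":
    for group_size in (6, 12, 13, 16):
        print(group_size, "consumers on 12 partitions ->", idle_consumers(12, group_size), "idle")
```

With 12 partitions, groups of 6 and 12 leave nobody idle, while 13 and 16 consumers leave 1 and 4 idle respectively.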

How the math works

Partition count is the one Kafka decision that's genuinely hard to revisit, because the default partitioner maps messages to partitions with hash(key) % partition_count — change the count and existing keys land on different partitions. The sizing formula itself is small and comes from the canonical Confluent guidance written by a Kafka co-creator[Jun Rao, Confluent]:

throughput_driven = max( ⌈t / p⌉, ⌈t / c⌉ )
consumer_driven   = max consumers in one group
recommended       = ⌈ max(throughput_driven, consumer_driven) × growth ⌉

t = target peak throughput (MB/s)
p = measured per-partition produce capacity (MB/s)
c = measured per-partition consume capacity (MB/s)
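The formula translates almost line for line into code. This is a minimal sketch, assuming all throughputs share one unit (MB/s) and that growth is a plain multiplier such as 2.0; `recommended_partitions` is a hypothetical name, not the calculator's own source.

```python
from __future__ import annotations

import math


def recommended_partitions(t: float, p: float, c: float,
                           max_consumers: int, growth: float) -> tuple[int, int, int]:
    """Return (recommended, throughput_driven, consumer_driven).

    t: target peak throughput (MB/s)
    p: measured per-partition produce capacity (MB/s)
    c: measured per-partition consume capacity (MB/s)
    """
    throughput_driven = max(math.ceil(t / p), math.ceil(t / c))
    consumer_driven = max_consumers  # one partition per consumer in the largest group
    recommended = math.ceil(max(throughput_driven, consumer_driven) * growth)
    return recommended, throughput_driven, consumer_driven


# Calculator defaults: 10 MB/s target, 10/20 MB/s per partition, 6 consumers, 2x growth.
print(recommended_partitions(10, 10, 20, 6, 2))  # (12, 1, 6)
```

Raising the target to 300 MB/s with the same other inputs flips the driver to throughput: ⌈300 / 10⌉ = 30 partitions before growth, 60 after.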

Three things about this formula that the one-line version hides:

  • The consumer side is a hard floor, not a suggestion. Within a consumer group, each partition is owned by exactly one consumer. If you want 12-way parallel processing, you need ≥12 partitions — a 13th consumer sits idle no matter how fast the topic is.
  • p and c are measurements, not constants. Per-partition throughput depends on batching (linger.ms, batch.size), compression, acks, message size, and disk. The calculator defaults to a conservative 10 MB/s produce / 20 MB/s consume; benchmark your own cluster and replace them — that single measurement changes the answer more than anything else.
  • Growth factor exists because repartitioning breaks ordering. Doubling partitions on a keyed topic re-routes roughly half your keys mid-stream. If consumers hold per-key state (aggregations, joins, entity caches), provisioning 2× up front is far cheaper than the migration.
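The remapping is easy to measure, assuming keys are routed with hash(key) % partition_count as described earlier. The hash below is SHA-256 truncated to 32 bits, a deliberate stand-in: Kafka's Java client uses murmur2, but any well-mixed hash followed by the modulo shows the same effect. `partition_for` and `fraction_moved` are hypothetical helpers.

```python
from __future__ import annotations

import hashlib


def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in hash (not Kafka's murmur2): first 4 bytes of SHA-256 as an unsigned int.
    h = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return h % num_partitions


def fraction_moved(keys: list[bytes], old_count: int, new_count: int) -> float:
    """Share of keys whose partition changes when the partition count changes."""
    moved = sum(1 for k in keys if partition_for(k, old_count) != partition_for(k, new_count))
    return moved / len(keys)


keys = [f"customer-{i}".encode() for i in range(10_000)]
print(f"12 -> 24 partitions: {fraction_moved(keys, 12, 24):.0%} of keys move")
print(f"12 -> 13 partitions: {fraction_moved(keys, 12, 13):.0%} of keys move")
```

Doubling lands near 50% because a key stays put only when hash % 24 is below 12. Adding a single partition is worse: in this model a key keeps its partition only when hash % 156 is below 12, so roughly nine keys in ten move.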

Retention is the bill nobody budgets

ingest_per_day  = t × 86,400 s
total_retained  = ingest_per_day × retention_days × replication_factor
per_broker      = total_retained / broker_count

The calculator's defaults make the point: a modest 10 MB/s topic with Kafka's default 7-day retention[Apache Kafka Docs] and replication factor 3 retains ~17,719 GB (~17.3 TB) across the cluster. Throughput rarely sizes a Kafka cluster; retained state does. It also dominates operations: broker replacement and partition reassignment both move that on-disk state across the network.
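A minimal sketch of the three retention formulas, assuming the calculator divides by 1,024 to go from MB to GB (that choice reproduces its 843.8 GB/day figure) and a 3-broker cluster (inferred from 12 partitions × RF 3 yielding 12 replicas per broker). `retention_footprint` is a hypothetical name.

```python
from __future__ import annotations

SECONDS_PER_DAY = 86_400
MB_PER_GB = 1024  # assumption: binary units, matching the calculator's displayed numbers


def retention_footprint(t_mb_s: float, retention_days: float,
                        replication_factor: int, broker_count: int) -> tuple[float, float, float]:
    """Return (ingest_per_day_gb, total_retained_gb, per_broker_gb)."""
    ingest_per_day_gb = t_mb_s * SECONDS_PER_DAY / MB_PER_GB  # before replication
    total_retained_gb = ingest_per_day_gb * retention_days * replication_factor
    per_broker_gb = total_retained_gb / broker_count  # assumes an even spread across brokers
    return ingest_per_day_gb, total_retained_gb, per_broker_gb


print(retention_footprint(10, 7, 3, 3))  # (843.75, 17718.75, 5906.25)
```

Dividing the per-broker figure by 1,024 once more gives the ~5.8 TB shown in the result panel.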

Why partition count has a ceiling

Partitions aren't free. Each partition replica holds open file handles for its log segments and indexes, each partition has a leader that must be re-elected when its broker fails, and every replica adds recovery work after an unclean shutdown. The long-standing operational guidance is to keep partition replicas per broker in the low thousands (~4,000)[Jun Rao, Confluent]. KRaft removed ZooKeeper's cluster-wide metadata bottleneck, but the per-broker costs (memory, file handles, recovery time) still scale with partition count, which is why the calculator's status chip keys off the per-broker number.
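The per-broker check behind that guidance is a single line of arithmetic. This sketch assumes replicas spread evenly across brokers and uses ~4,000 as one hard cutoff; the calculator's actual status thresholds aren't stated here, so `PARTITION_REPLICA_GUIDANCE`, `partition_status`, and the "Over guidance" label are hypothetical.

```python
import math

PARTITION_REPLICA_GUIDANCE = 4_000  # "low thousands (~4,000)" per broker; hypothetical cutoff


def replicas_per_broker(partitions: int, replication_factor: int, broker_count: int) -> int:
    # Round up: with an even spread, the busiest broker carries the ceiling.
    return math.ceil(partitions * replication_factor / broker_count)


def partition_status(partitions: int, replication_factor: int, broker_count: int) -> str:
    load = replicas_per_broker(partitions, replication_factor, broker_count)
    return "Healthy" if load <= PARTITION_REPLICA_GUIDANCE else "Over guidance"


print(replicas_per_broker(12, 3, 3), partition_status(12, 3, 3))        # 12 Healthy
print(replicas_per_broker(5_000, 3, 3), partition_status(5_000, 3, 3))  # 5000 Over guidance
```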

When this calculator is wrong

The model is deliberately simple. Real clusters diverge from it in known ways:

  • Compacted topics: cleanup.policy=compact retains the latest record per key rather than a time window. The retention math here doesn't apply; size by keyspace × average record size (times the replication factor) instead.
  • Tiered storage: brokers that offload segments to object storage break the per-broker-disk math (that's the point of the feature). The partition-count math still applies.
  • Hot keys: the formula assumes an even key distribution. One key producing 40% of traffic makes its partition the bottleneck regardless of the total count; fix the keying strategy, not the partition count.
  • Shared clusters: the per-broker budgets cover every topic on the broker, not just this one. If this topic shares brokers with 200 others, sum the partition replicas and retained bytes across all of them before comparing against the guidance.
  • Peak vs average: the model sizes for the steady peak you enter. Traffic with sharp spikes (sales events, batch replays) needs the spike value in the throughput field, not the daily average.
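Two of these adjustments are straightforward to sketch. The code below, assuming the same binary GB as the retention math, sums footprints across every topic on a shared cluster and estimates a compacted topic from keyspace × average record size × replication factor. Treat the compacted figure as a floor: the active segment and any segments the cleaner hasn't reached yet still hold superseded records. `TopicFootprint`, both functions, and the example topics are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

BYTES_PER_GB = 1024 ** 3  # binary units, consistent with the retention math


@dataclass
class TopicFootprint:
    name: str
    partitions: int
    replication_factor: int
    retained_gb_all_replicas: float


def shared_cluster_per_broker(topics: list[TopicFootprint], broker_count: int) -> tuple[float, float]:
    """Return (partition replicas per broker, retained GB per broker) across all topics."""
    replicas = sum(t.partitions * t.replication_factor for t in topics)
    retained = sum(t.retained_gb_all_replicas for t in topics)
    return replicas / broker_count, retained / broker_count


def compacted_retained_gb(keyspace: int, avg_record_bytes: int, replication_factor: int) -> float:
    # Latest record per key, copied to every replica; ignores not-yet-compacted data.
    return keyspace * avg_record_bytes * replication_factor / BYTES_PER_GB


topics = [
    TopicFootprint("orders", 12, 3, 17_718.75),  # hypothetical topic with the calculator's defaults
    TopicFootprint("clicks", 48, 3, 3_000.0),    # hypothetical neighbour on the same brokers
]
print(shared_cluster_per_broker(topics, 3))       # (60.0, 6906.25)
print(compacted_retained_gb(1024 ** 2, 1024, 3))  # 3.0
```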


About this tool

This calculator is part of BackendBytes' reference tools collection. The math lives in an open, unit-tested source file — if you disagree with a constant, the methodology above tells you exactly which input to change.