How many partitions does my Kafka topic need?

Take the larger of two numbers: partitions needed for throughput — max(target_throughput / per_partition_producer_throughput, target_throughput / per_partition_consumer_throughput) — and partitions needed for consumer parallelism (the largest consumer group you'll run, since each consumer in a group needs at least one partition). Then multiply by a growth factor of 2x or more, because adding partitions later remaps keys and breaks per-key ordering during the transition.

Can I add partitions to a Kafka topic later?

Mechanically, yes, but with a real cost: Kafka's default partitioner assigns keyed messages by hash(key) % partition_count, so changing the count sends existing keys to different partitions. Per-key ordering breaks across the boundary, and any consumer that relies on key-to-partition affinity (local state, joins) must rebuild. This is why the standard advice is to over-partition up front.
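
The effect is easy to see in a small simulation. This is only a sketch: MD5 stands in for the partitioner's hash (Kafka's default partitioner uses murmur2 on the serialized key), and the function names and sample keys are made up for illustration.

```python
import hashlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition with hash(key) % partition_count.

    MD5 is a stand-in hash for illustration only; it is not the hash
    that Kafka's partitioner uses.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def fraction_remapped(keys, old_count: int, new_count: int) -> float:
    """Share of keys whose partition changes when the count changes."""
    keys = list(keys)
    moved = sum(
        1
        for k in keys
        if partition_for(k, old_count) != partition_for(k, new_count)
    )
    return moved / len(keys)


# Hypothetical keys; any reasonably varied key set behaves the same way.
sample_keys = [f"customer-{i}" for i in range(10_000)]
moved = fraction_remapped(sample_keys, old_count=12, new_count=24)
print(f"{moved:.1%} of keys changed partition going from 12 to 24")
```

Doubling the count moves about half the keys, because a key keeps its partition only when hash % 24 falls below 12.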

How much disk does Kafka retention use?

Retained bytes = ingest_rate × retention_period × replication_factor. At a modest 10 MB/s with Kafka's default 7-day retention and replication factor 3, that's roughly 17,700 GB (about 17.3 TB) across the cluster. Retention, not throughput, is usually the dominant cost. Size brokers for the retained data and consider tiered storage when per-broker state grows past a few TB.

What replication factor should I use for Kafka topics?

Three, for anything you care about. Replication factor 1 has zero fault tolerance. Replication factor 2 with min.insync.replicas=2 cannot tolerate a single broker loss without halting producers that use acks=all. RF=3 with min.insync.replicas=2 survives one broker failure while still accepting writes, which is why it's the production default.
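
The arithmetic behind that answer can be written as a small helper. This is a sketch of write availability only: it assumes every failed broker hosts a replica of the partition in question, it says nothing about durability, and the function name is illustrative.

```python
def tolerated_broker_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Brokers that can fail while acks=all producers keep writing.

    A write with acks=all needs at least min.insync.replicas replicas in
    sync, so writes continue as long as
    replication_factor - failures >= min_insync_replicas.
    """
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min_insync_replicas <= replication_factor")
    return replication_factor - min_insync_replicas


for rf, min_isr in [(1, 1), (2, 2), (3, 2)]:
    failures = tolerated_broker_failures(rf, min_isr)
    print(f"RF={rf}, min.insync.replicas={min_isr}: tolerates {failures} broker failure(s)")
```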

Kafka Partition & Retention Calculator

Size partitions for throughput and consumer parallelism, then see what retention actually costs per broker. Every formula is shown and cited — adjust an input and watch the advisories react.

Throughput

Target throughput (MB/s)

Peak produce rate the topic must absorb — size for peak, not average.

Per-partition produce (MB/s)

Measure yours; 10 is conservative.

Per-partition consume (MB/s)

Slow consumers raise the count.

Average message size (bytes)

For msg/s conversion. Broker default caps messages at ~1 MB.

Parallelism & growth

Max consumers in a group

Each needs ≥1 partition.

Growth factor

Repartitioning remaps keys — provision ahead.

Retention & cluster

Retention (days)

Kafka default: 7.

Replication factor

Production default: 3.

Brokers

For per-broker budgets.

Result

Healthy

Recommended partitions: 12

max(1 throughput-driven, 6 consumer-driven) × 2 growth

Partition replicas / broker: 12
Messages / second: 10,240
Ingest / day (pre-replication): 843.8 GB
Retained storage / broker: 5.8 TB

12 partitions recommended — max(1 throughput-driven, 6 consumer-driven) × 2 growth. Retention at RF=3 costs ~17,719 GB across the cluster (5,906 GB/broker).

Operational advisories

Partition count is driven by consumer parallelism (6), not throughput (1). Within a group, each partition is consumed by exactly one consumer, so consumers beyond the partition count sit idle.
~5,906 GB of retained data per broker. Broker replacement and partition reassignment times scale with on-disk state — consider tiered storage or shorter retention before adding disks.

How the math works

Partition count is the one Kafka decision that's genuinely hard to revisit, because the default partitioner maps messages to partitions with hash(key) % partition_count: change the count and existing keys land on different partitions. The sizing formula itself is small and comes from the canonical Confluent guidance written by a Kafka co-creator [Jun Rao, Confluent]:

throughput_driven = max( ⌈t / p⌉, ⌈t / c⌉ )
consumer_driven   = max consumers in one group
recommended       = ⌈ max(throughput_driven, consumer_driven) × growth ⌉

t = target peak throughput (MB/s)
p = measured per-partition produce capacity (MB/s)
c = measured per-partition consume capacity (MB/s)

Three things about this formula that the one-line version hides:

The consumer side is a hard floor, not a suggestion. Within a consumer group, each partition is owned by exactly one consumer. If you want 12-way parallel processing, you need ≥12 partitions; with exactly 12, a 13th consumer sits idle no matter how fast the topic is.
p and c are measurements, not constants. Per-partition throughput depends on batching (linger.ms, batch.size), compression, acks, message size, and disk. The calculator defaults to a conservative 10 MB/s produce / 20 MB/s consume; benchmark your own cluster and replace them — that single measurement changes the answer more than anything else.
Growth factor exists because repartitioning breaks ordering. Doubling partitions on a keyed topic re-routes roughly half your keys mid-stream. If consumers hold per-key state (aggregations, joins, entity caches), provisioning 2× up front is far cheaper than the migration.
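
The sizing formula translates directly into a few lines of Python. This is a sketch of the formula as written above, not the calculator's actual source; the function and parameter names are illustrative.

```python
import math


def recommended_partitions(
    target_mb_s: float,
    produce_mb_s_per_partition: float,
    consume_mb_s_per_partition: float,
    max_consumers_in_group: int,
    growth_factor: float = 2.0,
) -> int:
    """recommended = ceil(max(throughput_driven, consumer_driven) * growth)."""
    # Partitions needed so neither producers nor consumers are the bottleneck.
    throughput_driven = max(
        math.ceil(target_mb_s / produce_mb_s_per_partition),
        math.ceil(target_mb_s / consume_mb_s_per_partition),
    )
    # Hard floor: each consumer in the largest group needs its own partition.
    consumer_driven = max_consumers_in_group
    return math.ceil(max(throughput_driven, consumer_driven) * growth_factor)


# Inputs matching the Result panel: 10 MB/s target, 10 MB/s produce and
# 20 MB/s consume per partition, 6 consumers, 2x growth.
print(recommended_partitions(10, 10, 20, 6, 2))
```

With those inputs the throughput side needs only 1 partition, so the 6-consumer floor decides the answer before the growth factor is applied.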

Retention is the bill nobody budgets

ingest_per_day  = t × 86,400 s
total_retained  = ingest_per_day × retention_days × replication_factor
per_broker      = total_retained / broker_count

The calculator's defaults make the point: a modest 10 MB/s topic with Kafka's default 7-day retention [Apache Kafka Docs] and replication factor 3 retains ~17,700 GB (about 17.3 TB) across the cluster. Throughput rarely sizes a Kafka cluster; retained state does. It also dominates operations: broker replacement and partition reassignment both move that on-disk state across the network.
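
The three retention formulas can be checked with a short script. This is a sketch, not the calculator's source: the names are illustrative, the broker count of 3 is inferred from the Result panel (12 partitions × RF 3 at 12 replicas per broker), and dividing by 1,024 to get GB is an inference from the panel's figures.

```python
SECONDS_PER_DAY = 86_400
MB_PER_GB = 1024  # assumption: the Result panel appears to use 1,024-based units


def retained_storage_mb(target_mb_s, retention_days, replication_factor, broker_count):
    """Return (ingest_per_day, total_retained, per_broker), all in MB."""
    ingest_per_day = target_mb_s * SECONDS_PER_DAY
    total_retained = ingest_per_day * retention_days * replication_factor
    per_broker = total_retained / broker_count
    return ingest_per_day, total_retained, per_broker


ingest, total, per_broker = retained_storage_mb(
    target_mb_s=10, retention_days=7, replication_factor=3, broker_count=3
)
print(f"ingest/day:  {ingest / MB_PER_GB:,.1f} GB")
print(f"cluster:     {total / MB_PER_GB:,.0f} GB")
print(f"per broker:  {per_broker / MB_PER_GB:,.0f} GB")
```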

Why partition count has a ceiling

Partitions aren't free. Each partition replica is an open file handle set, a leader that can need re-electing, and recovery work after an unclean shutdown. The long-standing operational guidance is to keep partition replicas per broker in the low thousands (~4,000) [Jun Rao, Confluent]. KRaft removed ZooKeeper's cluster-wide metadata bottleneck, but the per-broker costs (memory, file handles, recovery time) still scale with partition count, which is why the calculator's status chip keys off the per-broker number.
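
The per-broker number is simple to compute. The sketch below assumes replicas spread evenly across brokers and uses the ~4,000 guidance figure as its only threshold; the calculator's actual status thresholds are not stated here, and the function names are illustrative.

```python
import math

GUIDANCE_REPLICAS_PER_BROKER = 4_000  # from the guidance cited above


def partition_replicas_per_broker(partitions: int, replication_factor: int, broker_count: int) -> int:
    """Replicas each broker hosts, assuming an even spread across brokers."""
    return math.ceil(partitions * replication_factor / broker_count)


def within_guidance(replicas_per_broker: int, ceiling: int = GUIDANCE_REPLICAS_PER_BROKER) -> bool:
    """True when the per-broker replica count is at or below the ceiling."""
    return replicas_per_broker <= ceiling


replicas = partition_replicas_per_broker(partitions=12, replication_factor=3, broker_count=3)
print(replicas, within_guidance(replicas))
```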

When this calculator is wrong

The model is deliberately simple. Real clusters diverge from it in known ways:

Compacted topics — cleanup.policy=compact retains the latest record per key, not a time window. The retention math here doesn't apply; size by keyspace × average record size instead.
Tiered storage — brokers that offload segments to object storage break the per-broker-disk math (that's the point of the feature). The partition-count math still applies.
Hot keys — the formula assumes even distribution. One key producing 40% of traffic makes its partition the bottleneck regardless of the total count; fix the keying strategy, not the partition count.
Shared clusters — the per-broker budgets cover every topic hosted on a broker, not just this one. If this topic shares brokers with 200 others, sum the partitions and retained bytes across all of them before comparing against the guidance.
Peak vs average — the model sizes for the steady peak you enter. Traffic with sharp spikes (sales events, batch replays) needs the spike value in the throughput field, not the daily average.
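
Two of these caveats are easy to quantify. The sketch below is a rough illustration with hypothetical names: the hot-key check ignores other keys that share the hot partition, so it is a lower bound, and the compacted estimate is a steady-state floor that multiplies by replication factor as an added assumption (uncompacted segments will hold more).

```python
def hot_partition_load_mb_s(target_mb_s: float, hot_key_share: float) -> float:
    """Traffic landing on the hot key's partition from that key alone."""
    return target_mb_s * hot_key_share


def is_hot_key_bottleneck(target_mb_s: float, hot_key_share: float,
                          per_partition_capacity_mb_s: float) -> bool:
    """True when one key alone exceeds what a single partition can absorb."""
    return hot_partition_load_mb_s(target_mb_s, hot_key_share) > per_partition_capacity_mb_s


def compacted_topic_bytes(distinct_keys: int, avg_record_bytes: int,
                          replication_factor: int) -> int:
    """Steady-state floor for a compacted topic: keyspace x record size x RF."""
    return distinct_keys * avg_record_bytes * replication_factor


# A key carrying 40% of a 50 MB/s topic overloads a 10 MB/s partition,
# no matter how many partitions the topic has.
print(is_hot_key_bottleneck(50, 0.4, 10))
print(compacted_topic_bytes(1_000_000, 512, 3))
```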

About this tool

This calculator is part of BackendBytes' reference tools collection. The math lives in an open, unit-tested source file — if you disagree with a constant, the methodology above tells you exactly which input to change.