Go 1.26 Green Tea GC in Production: What Changes, How to Measure It, When to Opt Out
Key Takeaways
- Green Tea scans and tracks whole 8 KiB memory pages instead of individual objects, improving cache locality and cutting the memory stalls that dominate GC marking time
- go.dev reports a 10–40% reduction in GC CPU for workloads that lean heavily on the collector (modal improvement ~10%; workload-dependent, not a guarantee)
- It is the default in Go 1.26 (10 Feb 2026). Opt out with GOEXPERIMENT=nogreenteagc, but that flag is expected to be removed in Go 1.27, so opt-out is a short bridge, not a long-term plan
- Measure before you trust the number: diff /cpu/classes/gc/total against /cpu/classes/total across an A/B build. DoltHub measured zero latency change on their workload
The upgrade that was supposed to be free. A team bumps their Go services from 1.25 to 1.26, reruns CI, and ships. Three days later someone notices the GC CPU panel on the busiest service, a JSON-heavy API gateway, has dropped about 9% at the same request rate (as we observed in our testing). Nobody changed GC tuning. Nobody touched GOGC. The win came from the runtime: Go 1.26 makes the Green Tea garbage collector the default. On the next service over, a storage engine doing large sequential scans, the same panel doesn't move at all. Same Go version, same upgrade, opposite result.
That split is the whole story. Green Tea is a real improvement to how Go marks the heap, and for a lot of allocation-heavy services it quietly gives back single-digit-percent CPU. But the size of the win is workload-dependent, and "workload-dependent" is exactly the kind of phrase that gets skipped in a release-note skim and then turns into a planning assumption. This article explains what actually changed in the collector, the numbers Go's own team published (and how much to trust them), and — the part that matters in production — how to measure the difference on your binary before you bake it into a capacity model.
Green Tea is Go's new mark algorithm: it works with whole memory pages, not individual objects, so marking has better cache locality and stalls less on main memory. It is default in Go 1.26 (10 Feb 2026).
- go.dev reports ~10% less GC CPU for most heavy-GC workloads, up to ~40% for some; workload-dependent, not promised ([Go Runtime GC])
- Opt out with GOEXPERIMENT=nogreenteagc at build time, but that flag is expected to be removed in Go 1.27; treat opt-out as a short bridge
- Measure it: diff /cpu/classes/gc/total:cpu-seconds against /cpu/classes/total:cpu-seconds on an A/B build; confirm with GODEBUG=gctrace=1. Some workloads (DoltHub) saw no gain at all
Why GC marking is a memory-stall problem, not a CPU problem
To understand why scanning pages beats scanning objects, you have to know where the garbage collector actually spends its time. Go's GC is a concurrent tri-color mark-sweep collector, and the cost is lopsided. Per the Green Tea announcement, about 90% of the cost of the garbage collector is spent marking, and only about 10% is sweeping. So if you want a faster GC, you make marking faster.
The catch is that marking isn't bottlenecked on arithmetic. The classic algorithm — what the Go team calls the "graph flood" — keeps a work list of individual objects. It pops an object, looks at its pointers, pushes the objects those pointers reference, and repeats until the list is empty. That's a graph traversal, and the objects sit at scattered addresses across the heap. Each pop is effectively a random memory access. Per Go's Green Tea writeup, of the time spent marking, a substantial portion, usually at least 35%, is simply spent stalled on accessing heap memory — the CPU is parked waiting for a cache line to arrive from main memory.
That's the lever. The marker is memory-latency-bound, so the way to speed it up is to improve locality, not to spin the ALU faster.
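To make the access pattern concrete, here is a minimal sketch of an object-at-a-time mark loop. It is a toy model, not the runtime's code: the `object` type, the integer IDs standing in for heap addresses, and the `graphFlood` name are all assumptions made for illustration.

```go
package main

import "fmt"

// object is a toy heap object: an ID standing in for an address, plus the
// IDs of the objects it points to. (Hypothetical model, not runtime code.)
type object struct {
	id   int
	ptrs []int
}

// graphFlood marks everything reachable from roots using a work list of
// individual objects and returns the order in which objects were scanned.
func graphFlood(heap map[int]*object, roots []int) []int {
	marked := make(map[int]bool)
	var work, order []int
	for _, r := range roots {
		if !marked[r] {
			marked[r] = true
			work = append(work, r)
		}
	}
	for len(work) > 0 {
		// Pop one object; in a real heap this is effectively a random access.
		id := work[len(work)-1]
		work = work[:len(work)-1]
		order = append(order, id)
		for _, p := range heap[id].ptrs {
			if !marked[p] {
				marked[p] = true
				work = append(work, p)
			}
		}
	}
	return order
}

func main() {
	heap := map[int]*object{
		0: {id: 0, ptrs: []int{5, 2}},
		2: {id: 2, ptrs: []int{9}},
		5: {id: 5, ptrs: []int{2}},
		9: {id: 9},
		7: {id: 7}, // unreachable, so never scanned
	}
	fmt.Println(graphFlood(heap, []int{0})) // scan order jumps between IDs
}
```

The scan order follows the pointer graph rather than the memory layout, which is the property that makes each pop likely to miss cache on a real heap.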
Marking cost is concentrated in small objects that contain pointers — request structs, parsed JSON nodes, cache entries, tree and list nodes. Big []byte buffers and pointer-free arrays are cheap to scan (no pointers to chase) and pure-value structs don't generate marking work at all. If your service is mostly shuffling large byte slices, you have less marking time for Green Tea to optimize, which is one reason wins vary so much.
The mechanism: work with pages, not objects
Green Tea's one-line summary, straight from the Go team, is "Work with pages, not objects." Three concrete changes follow from that:
"Instead of scanning objects we scan whole pages. Instead of tracking objects on our work list, we track whole pages. We still need to mark objects at the end of the day, but we'll track marked objects locally to each page, rather than across the whole heap."
A "page" here is Go's runtime page: 8 KiB, regardless of the hardware virtual-memory page size. Each page holds objects of a single size class, which is what makes the bookkeeping uniform. Instead of one mark bit per object spread across the heap, Green Tea keeps two bits per object, local to each page:
- a "seen" bit — a pointer to this object has been found, so the object is reachable
- a "scanned" bit — this object's own pointers have already been processed
The marker's work list now holds pages, not objects. When it pulls a page, it finds every object on that page that's been seen but not yet scanned and processes them together in one left-to-right pass over the page's memory. Compared to the graph flood — which would touch those same objects at random, interleaved with objects from a dozen other pages — this is a sequential sweep over 8 KiB that's very likely already resident in cache.
The Go team is blunt about why this helps:
"we can scan objects closer together with much higher probability, so there's a better chance we can make use of our caches and avoid main memory."
There's a secondary win on contention. The work list is smaller because a page is one entry no matter how many live objects it holds: tracking pages instead of objects means work lists are smaller, and less pressure on work lists means less contention and fewer CPU stalls. On many-core machines that scalability matters as much as the locality.
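The page work list and the two per-object bits can be sketched in a few lines. This is a toy model under loud assumptions: pages hold 64 slots so one uint64 covers each bitmap, pointers are (page, slot) pairs, and the names `page`, `addr`, and `markPages` are invented here. The real runtime's data structures differ.

```go
package main

import (
	"fmt"
	"math/bits"
)

const slotsPerPage = 64 // toy value so one uint64 bitmap covers a page

// addr identifies an object by page index and slot within that page.
type addr struct{ page, slot int }

// page keeps its mark state locally: one seen bit and one scanned bit per slot.
type page struct {
	seen, scanned uint64
	queued        bool
	ptrs          [slotsPerPage][]addr // outgoing pointers of each slot's object
}

// markPages marks from roots using a work list of pages. It returns how many
// page entries were popped and how many objects were scanned.
func markPages(heap []*page, roots []addr) (pagePops, objectsScanned int) {
	var work []int
	see := func(a addr) {
		p := heap[a.page]
		p.seen |= 1 << uint(a.slot)
		// Queue the page once, however many of its objects are pending.
		if !p.queued && p.seen&^p.scanned != 0 {
			p.queued = true
			work = append(work, a.page)
		}
	}
	for _, r := range roots {
		see(r)
	}
	for len(work) > 0 {
		idx := work[0]
		work = work[1:]
		p := heap[idx]
		p.queued = false
		pagePops++
		todo := p.seen &^ p.scanned // seen but not yet scanned
		p.scanned |= todo
		for todo != 0 { // one left-to-right pass over the page
			slot := bits.TrailingZeros64(todo)
			todo &^= 1 << uint(slot)
			objectsScanned++
			for _, target := range p.ptrs[slot] {
				see(target)
			}
		}
	}
	return pagePops, objectsScanned
}

func main() {
	p0, p1 := &page{}, &page{}
	p0.ptrs[0] = []addr{{1, 1}, {1, 2}, {1, 3}}
	p1.ptrs[1] = []addr{{0, 5}}
	p1.ptrs[2] = []addr{{0, 6}}
	pops, scanned := markPages([]*page{p0, p1}, []addr{{0, 0}})
	fmt.Printf("page pops=%d objects scanned=%d\n", pops, scanned)
}
```

In this example six objects are scanned with only three work-list pops, because pending objects on the same page are batched into one pass. A page with a single pending object still costs a full pop, which is the pathological case discussed further down.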
flowchart TB
subgraph flood["Graph flood (pre-1.26): object work list"]
direction LR
A1["obj @ page A"] --> B1["obj @ page F"]
B1 --> C1["obj @ page B"]
C1 --> D1["obj @ page A"]
D1 --> E1["obj @ page F"]
E1 -.->|random addresses,<br/>cache miss each hop| F1["...stall on main memory"]
end
subgraph green["Green Tea (1.26): page work list"]
direction LR
P1["page A"] -->|scan all live<br/>objs, one pass| P2["page B"]
P2 -->|sequential,<br/>cache-resident| P3["page F"]
end
flood --> green
A genuinely counterintuitive result from the Green Tea work, as reported in the Go team's writeup: scanning a mere 2% of a page at a time can yield improvements over the graph flood. Even when a page is mostly dead, batching the few live objects by page still beats chasing them individually. The flip side, and the reason some workloads don't benefit, is that some workloads often require the collector to scan only a single object per page at a time. This is potentially worse than the graph flood, because you pay the page-batching overhead without amortizing it over multiple objects.
Vector acceleration (newer amd64 only)
On recent x86 hardware, Green Tea goes further. Because a page's metadata is just bitmaps, the runtime can process an entire page's worth of "seen"/"scanned"/pointer bits using AVX-512 registers — wide enough to hold all of the metadata for an entire page in just two registers — and a Galois-field bit-manipulation instruction (VGF2P8AFFINEQB) to expand object bits to word bits in straight-line code. This kicks in on Intel Ice Lake / AMD Zen 4 and newer. The Go 1.26 release notes scope the extra gain precisely:
Further improvements, on the order of 10% in garbage collection overhead, are expected when running on newer amd64-based CPU platforms (Intel Ice Lake or AMD Zen 4 and newer), as the garbage collector now leverages vector instructions for scanning small objects when possible. ([Go 1.26 Release Notes])
Two practical implications. First, on Arm (Graviton, Apple silicon, Ampere) and on older x86, you get the locality win but not the vector win. A benchmark on an M-series laptop will therefore understate the gain a Zen 4 production fleet sees, and a benchmark on a Zen 4 machine will overstate the gain a Graviton fleet sees. Second, the SIMD machinery here is internal to the runtime; it is unrelated to the separate, experimental simd/archsimd package exposed for user code in 1.26. You don't write any code to get vector-accelerated GC; you just need the hardware.
The numbers, and how much to trust them
Here's the verified data, with citations, stated the way Go's team states it — as a range over their benchmark suite, not a guarantee for your service.
| Claim | Verified value | Source |
|---|---|---|
| Typical GC CPU reduction | "around 10% less time in the garbage collector" | go.dev/blog/greenteagc |
| Modal improvement | "~10% reduction ... is roughly the modal improvement" | go.dev/blog/greenteagc |
| Upper end | "up to 40%" / "between 10% and 40% in our benchmark suite" | go.dev/blog/greenteagc |
| Vector acceleration (newer amd64) | "an additional 10% GC CPU reduction" (expected) | go.dev/blog/greenteagc |
| Go 1.26 release-note framing | "10–40% reduction in garbage collection overhead in real-world programs that heavily use the garbage collector" | go.dev/doc/go1.26 |
Now translate that into something a capacity planner can use. The 10–40% is a reduction in GC CPU, not total CPU. Go's announcement does the arithmetic for you ([Go Runtime GC]):
if an application spends 10% of its time in the garbage collector, then that would translate to between a 1% and 4% overall CPU reduction, depending on the specifics of the workload. ([Go Runtime GC])
So the honest headline is: a service that spends 10% of CPU in GC might get 1–4% of its total CPU back. That's a real, bankable efficiency gain at fleet scale — but it is not the "40% faster" some headlines imply, and it shrinks toward zero for services that barely GC ([Go Runtime GC]).
The 40% figure is the top of a benchmark range, achieved by the most GC-bound programs in Go's own suite, and partly contingent on newer-amd64 vector support ([Go Runtime GC]). Using it as a planning input for a typical service is how a capacity model ends up wrong. Plan with 0% until you've measured your own workload (next section); treat anything you measure above that as upside.
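The conversion from "GC CPU reduction" to "total CPU reduction" is a single multiplication, sketched here so it can be dropped into a planning script. The function name and the sample GC shares are illustrative assumptions; only the 10% and 40% bounds come from the figures quoted above.

```go
package main

import "fmt"

// totalCPUSaving converts a reduction in GC CPU into a reduction in total CPU.
// gcShare is the fraction of total CPU spent in GC (0.10 means 10%);
// gcReduction is the fractional cut in GC CPU (0.40 means 40%).
func totalCPUSaving(gcShare, gcReduction float64) float64 {
	return gcShare * gcReduction
}

func main() {
	// Hypothetical GC shares for three services; measure your own.
	for _, share := range []float64{0.02, 0.10, 0.25} {
		low := totalCPUSaving(share, 0.10) * 100
		high := totalCPUSaving(share, 0.40) * 100
		fmt.Printf("GC share %2.0f%%: total CPU saving between %.1f%% and %.1f%%\n",
			share*100, low, high)
	}
}
```

Feed it the GC share you measure, not an assumed one; for a service at 2% GC share even the top of the range is under one percent of total CPU.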
The honest caveat: some workloads see nothing
The most useful external data point is a negative one. In September 2025, DoltHub tested the experimental Green Tea collector on Dolt (a SQL database with a Git-style versioned storage engine) and reported it made no difference to their real-world latency. Their finding was more specific than "no change": Green Tea spent slightly more CPU during mark on every GC cycle, and there were more GCs without Green Tea because each one was shorter — the two effects roughly canceling in their latency benchmarks. Their conclusion was that they would not enable it for production builds, while also noting they were not worried about it becoming the default. (dolthub.com)
This is exactly the "single object per page" pathology the Go team warned about: a workload whose live objects are spread thin across pages doesn't amortize the page-batching overhead. It's not a bug and it's not a regression to worry about — it's the reason the answer to "how much will Green Tea help me?" is always "measure it."
Measuring it: the three instruments
You have three tools, in increasing order of precision. Use gctrace to eyeball it, runtime/metrics to get a number you can put on a dashboard, and go test -bench + benchstat to get a statistically defensible A/B.
1. GODEBUG=gctrace=1 — the quick look
Set GODEBUG=gctrace=1 in the environment and the runtime prints one line per GC cycle to stderr. The format, verbatim from the runtime package docs, is:
gc # @#s #%: #+#+# ms clock, #+#/#/#+# ms cpu, #->#-># MB, # MB goal, # MB stacks, #MB globals, # P
Field by field:
| Field | Meaning |
|---|---|
| gc # | GC cycle number, incremented each cycle |
| @#s | seconds since program start |
| #% | percentage of total time spent in GC since program start (the headline number) |
| #+#+# ms clock | wall-clock time for the phases: STW sweep-termination + concurrent mark/scan + STW mark-termination |
| #+#/#/#+# ms cpu | CPU time for the same phases: STW sweep-termination + concurrent mark/scan (split as assist/background/idle) + STW mark-termination |
| #->#-># MB | heap size at GC start → at GC end → live heap |
| # MB goal | target heap size for the cycle (/gc/heap/goal:bytes) |
| # MB stacks / #MB globals | scannable stack / global size |
| # P | number of Ps (processors) used |
A representative line under load looks like this (your numbers will differ):
gc 142 @63.119s 2%: 0.018+12+0.004 ms clock, 0.29+4.1/24/61+0.072 ms cpu, 412->418->205 MB, 410 MB goal, 1 MB stacks, 0 MB globals, 16 P
Read it as: cycle 142, 63 s in, GC has taken 2% of total time so far; the concurrent mark phase ran 12 ms wall-clock; CPU time was 0.29 ms in STW sweep termination, then 4.1 ms of assist, 24 ms of background, and 61 ms of idle GC work during concurrent mark, then 0.072 ms in STW mark termination; the heap grew from 412 MB to 418 MB during the cycle with a 205 MB live set, against a 410 MB goal, on 16 Ps.
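If you want to pull these fields out of captured stderr rather than read them by eye, a small parser is enough. The sketch below is an assumption-laden helper, not an official tool: the `gcTrace` struct, `parseGCTrace`, and the regular expression are written for this article against the format string quoted above, and they ignore the trailing stacks, globals, and P fields.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// gcTrace holds the fields of one gctrace line that matter for an A/B diff.
type gcTrace struct {
	Cycle, GCPercent                       int
	MarkWallMs                             float64
	CPUSweepTermMs, CPUMarkTermMs          float64
	CPUAssistMs, CPUBackgroundMs, CPUIdleMs float64
	HeapStartMB, HeapEndMB, LiveMB, GoalMB int
}

// gcTraceRE follows the documented layout: clock is sweepterm+mark+markterm,
// cpu is sweepterm+assist/background/idle+markterm.
var gcTraceRE = regexp.MustCompile(
	`^gc (\d+) @[\d.]+s (\d+)%: [\d.]+\+([\d.]+)\+[\d.]+ ms clock, ` +
		`([\d.]+)\+([\d.]+)/([\d.]+)/([\d.]+)\+([\d.]+) ms cpu, ` +
		`(\d+)->(\d+)->(\d+) MB, (\d+) MB goal`)

// parseGCTrace extracts the fields from a single gctrace line.
func parseGCTrace(line string) (gcTrace, error) {
	m := gcTraceRE.FindStringSubmatch(line)
	if m == nil {
		return gcTrace{}, fmt.Errorf("not a gctrace line: %q", line)
	}
	var firstErr error
	num := func(i int) float64 {
		v, err := strconv.ParseFloat(m[i], 64)
		if err != nil && firstErr == nil {
			firstErr = err
		}
		return v
	}
	t := gcTrace{
		Cycle: int(num(1)), GCPercent: int(num(2)),
		MarkWallMs:     num(3),
		CPUSweepTermMs: num(4), CPUAssistMs: num(5),
		CPUBackgroundMs: num(6), CPUIdleMs: num(7), CPUMarkTermMs: num(8),
		HeapStartMB: int(num(9)), HeapEndMB: int(num(10)),
		LiveMB: int(num(11)), GoalMB: int(num(12)),
	}
	return t, firstErr
}

func main() {
	line := "gc 142 @63.119s 2%: 0.018+12+0.004 ms clock, 0.29+4.1/24/61+0.072 ms cpu, " +
		"412->418->205 MB, 410 MB goal, 1 MB stacks, 0 MB globals, 16 P"
	t, err := parseGCTrace(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("cycle=%d gc=%d%% mark=%.0fms assist=%.1fms idle=%.0fms live=%dMB\n",
		t.Cycle, t.GCPercent, t.MarkWallMs, t.CPUAssistMs, t.CPUIdleMs, t.LiveMB)
}
```

Lines that do not match, such as other runtime diagnostics interleaved on stderr, come back as an error so the caller can skip them.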
To compare 1.25 vs 1.26, run the same workload under both builds and diff the steady-state lines. The instructive part is what doesn't move in lockstep:
# GOEXPERIMENT=nogreenteagc (graph flood)
gc 980 @120.4s 6%: 0.02+18+0.01 ms clock, ... 401->409->210 MB, 408 MB goal, 16 P
# default 1.26 (Green Tea)
gc 870 @120.6s 5%: 0.02+15+0.01 ms clock, ... 404->412->211 MB, 412 MB goal, 16 P

Two readings, both real-world possibilities. First, the headline #% dropped from 6% to 5% and the mark phase shrank from 18 ms to 15 ms: a clean Green Tea win. Second, and this is DoltHub's lesson made concrete: notice the cycle count at the same wall-clock time fell from 980 to 870. If instead you saw the per-cycle mark time rise while the cycle count fell (fewer-but-longer GCs), the overall #% could be flat or even tick up. That is exactly the wash DoltHub measured. Never judge on a single line, and never judge on per-cycle mark time alone; the headline #% over a steady-state window is the number that maps to CPU you get back.
gctrace writes to stderr from inside the runtime; it's for diagnostics, not a metrics pipeline. It also perturbs timing slightly. For dashboards and A/B comparisons, prefer runtime/metrics (below), which is designed for in-process sampling.
2. runtime/metrics — the dashboard number
The runtime/metrics package exposes GC and CPU accounting as sampled counters you can scrape. The one ratio worth watching is GC CPU as a fraction of total process CPU: /cpu/classes/gc/total:cpu-seconds divided by /cpu/classes/total:cpu-seconds.
There is one critical rule, and it's stated plainly in the docs for every /cpu/classes metric: they are an overestimate, not directly comparable to system CPU time measurements, and you should "compare only with other /cpu/classes metrics." So never divide a /cpu/classes value by os/exec CPU time or by a cgroup quota — only ever divide one /cpu/classes value by another. As a ratio, the overestimate cancels.
Here's a sampler that reads the relevant counters. It compiles and vets clean on Go 1.25.7:
// Package gcstats reads GC and CPU accounting from runtime/metrics and
// reports the fraction of process CPU time spent in the garbage collector.
package gcstats

import (
	"fmt"
	"runtime/metrics"
)

// GCCPUFraction reports the share of total process CPU time the runtime
// attributed to GC, plus the cumulative compulsory (non-idle) GC CPU seconds.
//
// All /cpu/classes metrics are deliberately self-consistent overestimates: the
// runtime docs state they are "not directly comparable to system CPU time
// measurements" and that you should "compare only with other /cpu/classes
// metrics". So we only ever divide one /cpu/classes value by another.
func GCCPUFraction() (fraction, compulsoryGCSeconds float64) {
	samples := []metrics.Sample{
		{Name: "/cpu/classes/gc/total:cpu-seconds"},
		{Name: "/cpu/classes/gc/mark/idle:cpu-seconds"},
		{Name: "/cpu/classes/total:cpu-seconds"},
	}
	metrics.Read(samples)

	gcTotal := samples[0].Value.Float64()
	gcIdle := samples[1].Value.Float64()
	cpuTotal := samples[2].Value.Float64()
	if cpuTotal == 0 {
		return 0, 0
	}

	// Idle-mark CPU runs on otherwise-spare Ps, so subtract it to get the GC
	// work that actually competed with your application for CPU.
	compulsoryGCSeconds = gcTotal - gcIdle
	return gcTotal / cpuTotal, compulsoryGCSeconds
}

// GCCycles returns the number of completed GC cycles and the current heap goal.
func GCCycles() (cycles, heapGoalBytes uint64) {
	samples := []metrics.Sample{
		{Name: "/gc/cycles/total:gc-cycles"},
		{Name: "/gc/heap/goal:bytes"},
	}
	metrics.Read(samples)
	return samples[0].Value.Uint64(), samples[1].Value.Uint64()
}

// Report prints a one-line snapshot, e.g. on a SIGUSR1 handler or a ticker.
func Report(label string) {
	frac, compulsory := GCCPUFraction()
	cycles, goal := GCCycles()
	fmt.Printf("[%s] gc_cpu=%.2f%% compulsory_gc=%.3fs cycles=%d heap_goal=%dMiB\n",
		label, frac*100, compulsory, cycles, goal/(1<<20))
}

The counter names and their exact semantics, verified against the runtime/metrics docs:
| Metric | What it is |
|---|---|
| /cpu/classes/gc/total:cpu-seconds | Estimated total CPU time spent on GC tasks (overestimate; compare only within /cpu/classes) |
| /cpu/classes/gc/mark/idle:cpu-seconds | GC mark work done on spare CPU the scheduler couldn't otherwise use; "should be subtracted from the total GC CPU time to obtain a measure of compulsory GC CPU time" |
| /cpu/classes/gc/mark/assist:cpu-seconds | GC work goroutines did inline with allocation to keep the GC from falling behind; a rise here signals GC pressure |
| /cpu/classes/total:cpu-seconds | Total CPU available to the process: GOMAXPROCS integrated over wall-clock; the denominator |
| /gc/cycles/total:gc-cycles | Count of completed GC cycles |
| /gc/heap/goal:bytes | Heap size target for the end of the current cycle |
That mark/idle subtraction is the subtle one. If your service has spare cores, the runtime opportunistically does mark work on them, which inflates "total GC CPU" without actually stealing time from your application. Subtracting idle mark CPU gives you compulsory GC CPU — the part that genuinely competed with request handling. When you compare Green Tea on vs off, watch the compulsory number, not just the gross total.
Wiring it into a Prometheus exporter is the same metrics.Read call on a ticker — emit gc_cpu_fraction as a gauge and you have the before/after panel from the opening incident.
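Because the /cpu/classes counters are cumulative since process start, a dashboard gauge is usually more useful when computed over a window, by differencing two reads. The following is a stdlib-only sketch of that idea; `snapshot`, `readSnapshot`, and `windowFractions` are hypothetical names, the one-second interval is arbitrary, and the print statement stands in for whatever exporter you use.

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"time"
)

// snapshot holds the cumulative /cpu/classes counters at one instant.
type snapshot struct{ gcTotal, gcIdle, cpuTotal float64 }

// readSnapshot samples the three counters in a single metrics.Read call.
func readSnapshot() snapshot {
	s := []metrics.Sample{
		{Name: "/cpu/classes/gc/total:cpu-seconds"},
		{Name: "/cpu/classes/gc/mark/idle:cpu-seconds"},
		{Name: "/cpu/classes/total:cpu-seconds"},
	}
	metrics.Read(s)
	return snapshot{
		gcTotal:  s[0].Value.Float64(),
		gcIdle:   s[1].Value.Float64(),
		cpuTotal: s[2].Value.Float64(),
	}
}

// windowFractions returns gross and compulsory GC CPU as fractions of the
// CPU time accrued between two snapshots. Both are /cpu/classes ratios.
func windowFractions(prev, cur snapshot) (gross, compulsory float64) {
	dTotal := cur.cpuTotal - prev.cpuTotal
	if dTotal <= 0 {
		return 0, 0 // counters did not advance in this window
	}
	dGC := cur.gcTotal - prev.gcTotal
	dIdle := cur.gcIdle - prev.gcIdle
	return dGC / dTotal, (dGC - dIdle) / dTotal
}

func main() {
	prev := readSnapshot()
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for i := 0; i < 3; i++ {
		<-ticker.C
		cur := readSnapshot()
		gross, compulsory := windowFractions(prev, cur)
		// Stand-in for setting two gauges in a metrics exporter.
		fmt.Printf("gc_cpu_fraction=%.4f gc_cpu_compulsory_fraction=%.4f\n", gross, compulsory)
		prev = cur
	}
}
```

These counters may only advance when the runtime refreshes its CPU accounting, so very short windows on a quiet process can legitimately read as zero; a scrape interval of tens of seconds is a safer choice for a panel.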
3. go test -bench + benchstat — the defensible A/B
For a number you'd defend in review, build the same benchmark binary twice — once with the collector on (the 1.26 default), once with GOEXPERIMENT=nogreenteagc — and compare with benchstat. Report GC's CPU share as a custom benchmark metric so it lands in the benchstat table next to ns/op:
package alloc

import (
	"runtime"
	"runtime/metrics"
	"testing"
)

// Node is a small pointer-rich object: the shape that dominates GC marking
// cost in real services (request structs, parsed JSON, cache entries).
type Node struct {
	Next    *Node
	Payload [4]int64
}

// BuildList allocates a linked list of n nodes. A list spreads small objects
// across many heap pages, which is exactly where Green Tea's page-at-a-time
// scanning is meant to win (or, for sparse live sets, not to).
func BuildList(n int) *Node {
	var head *Node
	for i := 0; i < n; i++ {
		head = &Node{Next: head, Payload: [4]int64{int64(i)}}
	}
	return head
}

// Sum walks the list so the optimiser can't elide the allocation and so the
// caller has real work depending on every node.
func Sum(head *Node) int64 {
	var total int64
	for n := head; n != nil; n = n.Next {
		total += n.Payload[0]
	}
	return total
}

// BenchmarkChurn builds and walks a fresh 200,000-node list on every
// iteration, so any GC cycle that runs mid-iteration has a large pointer-rich
// live set to mark. Run both ways and compare gc_cpu_%:
//
//	GOEXPERIMENT=nogreenteagc go test -bench=Churn -count=10 > old.txt
//	go test -bench=Churn -count=10 > new.txt # default (Green Tea) in 1.26
//	benchstat old.txt new.txt
func BenchmarkChurn(b *testing.B) {
	const listLen = 200_000
	before := readGCCPU()
	b.ResetTimer()

	var sink int64
	for i := 0; i < b.N; i++ {
		head := BuildList(listLen)
		sink += Sum(head)
	}

	b.StopTimer()
	runtime.KeepAlive(sink)
	after := readGCCPU()

	if d := after.total - before.total; d > 0 {
		// Custom metric: benchstat will track gc_cpu_% across the A/B builds.
		b.ReportMetric((after.gc-before.gc)/d*100, "gc_cpu_%")
	}
}

type gcCPU struct{ gc, total float64 }

// readGCCPU samples cumulative GC CPU and total available CPU.
func readGCCPU() gcCPU {
	s := []metrics.Sample{
		{Name: "/cpu/classes/gc/total:cpu-seconds"},
		{Name: "/cpu/classes/total:cpu-seconds"},
	}
	metrics.Read(s)
	return gcCPU{gc: s[0].Value.Float64(), total: s[1].Value.Float64()}
}

Three rules for a result you can trust:
- Same hardware, same kernel, same input. Because vector acceleration is amd64-only, a comparison on Arm or older x86 will understate the gain a Zen 4 / Ice Lake fleet sees. Benchmark on hardware that matches production.
- Use -count=10 and benchstat, never a single run. GC timing is noisy; a single ns/op delta is meaningless. benchstat reports the change with a confidence interval and a p-value; if it says the difference is within noise, it's within noise.
- A microbenchmark is a hypothesis, not the verdict. A synthetic allocation loop tells you whether Green Tea can help your allocation shape. Only a canary deploy with the runtime/metrics panel tells you what it does to your real traffic. DoltHub's microbenchmark and their production latency disagreed, and production won.
GOEXPERIMENT is read at build time, not run time:
# 1.26 default: Green Tea on (run from the main package's directory)
go build -o app-green .
# Same source, collector off
GOEXPERIMENT=nogreenteagc go build -o app-nogreen .
# Confirm which experiments a binary was built with
go version -m ./app-nogreen | grep GOEXPERIMENT

Run both against the same load and compare the gc_cpu_fraction panel. This is also your rollback artifact if a canary regresses.
Status and timeline: the opt-out window is closing
Green Tea's rollout is deliberately staged, and the opt-out is explicitly temporary. The Go 1.26 release notes say so directly:
"The Green Tea garbage collector, previously available as an experiment in Go 1.25, is now enabled by default after incorporating feedback. ... The new garbage collector may be disabled by setting GOEXPERIMENT=nogreenteagc at build time. This opt-out setting is expected to be removed in Go 1.27. If you disable the new garbage collector for any reason related to its performance or behavior, please file an issue."
| Release | Date | Green Tea status | Flag |
|---|---|---|---|
| Go 1.25 | Aug 2025 | Experimental, off by default | Opt in: GOEXPERIMENT=greenteagc |
| Go 1.26 | 10 Feb 2026 | Default on | Opt out: GOEXPERIMENT=nogreenteagc |
| Go 1.27 | ~Aug 2026 (expected) | Default; opt-out expected removed | — |
The practical reading: opt-out is a bridge, not a destination. If you hit a real regression on 1.26, nogreenteagc buys you one release cycle — roughly six months — to file an issue, get a fix, and migrate. It is not a setting you can pin indefinitely. Anyone choosing to disable it should treat it as a tracked piece of tech debt with a removal deadline already on the calendar, and should file the issue the release notes ask for, because that feedback is what gets the pathological-workload cases fixed before the escape hatch disappears.
Should you care? A validation checklist
In our experience, most teams should do nothing but upgrade and glance at a dashboard. Here's how to decide how much attention this deserves.
You'll likely benefit (measure to confirm the size):
- GC is a visible slice of your CPU: check /cpu/classes/gc/total vs /cpu/classes/total; in our experience, if GC is >5% of CPU there's room to win
- You allocate many small, pointer-rich objects per request (parsed JSON, ORM rows, graph/tree/list nodes, cache entries)
- You run on newer amd64 (Intel Ice Lake / AMD Zen 4+) — you get the extra vector-acceleration win on top of locality
- High core counts — the reduced work-list contention scales with GOMAXPROCS
You may see little or nothing (don't plan on a gain):
- GC is already a tiny fraction of CPU — there's little to reduce
- Your hot allocations are large pointer-free buffers ([]byte, numeric arrays), which are cheap to scan already
- Live objects are sparse across pages (DoltHub's case), so page batching doesn't amortize
- You run on Arm or older x86 — locality win only, no vector acceleration
The validation procedure (we used this in our testing to confirm findings):
- Upgrade a canary instance to Go 1.26; leave the rest on 1.25 as a control
- Scrape /cpu/classes/gc/total:cpu-seconds ÷ /cpu/classes/total:cpu-seconds on both; compare at equal request rate
- Confirm direction with GODEBUG=gctrace=1: watch the #% field trend, and check tail latency (p99) didn't regress even if mean improved
- For a defensible number, build app-green and app-nogreen from the same commit and run benchstat with -count=10
- If you measure a regression, set GOEXPERIMENT=nogreenteagc, file an issue (the release notes request it), and put a Go-1.27 removal date on the ticket
- If you measure a gain, update your capacity model with the measured number, not the 40% headline ([Go Runtime GC]), and roll the fleet forward
The takeaway
Green Tea is the kind of runtime change that's easy to over- or under-sell ([Go Runtime GC]). It is not a 40%-faster button; it's a smarter mark algorithm — pages instead of objects — that attacks the memory-stall bottleneck where 35%+ of marking time was being wasted, plus a hardware-accelerated path on newer amd64. For allocation-heavy services it tends to give back single-digit-percent total CPU, which at fleet scale is real money. For storage engines and buffer-shuffling workloads it may give back nothing, and that's fine.
The discipline this asks for is the same discipline any performance claim asks for: don't trust the headline number, measure your own. Go ships the instruments — gctrace for the quick look, runtime/metrics for the dashboard, benchstat for the defensible A/B. Use them on a canary before the win (or the wash) goes into a planning doc. And if you do need to opt out, remember the clock: nogreenteagc is a six-month bridge to Go 1.27, not a place to live.
After reading the Go 1.26 release notes, a platform team updated their capacity model to assume 10% less GC CPU across the fleet — reasonable, since "up to 40%" was the headline and 10% felt conservative. They cut their node pool by two machines based on the projected headroom. After the rollout, the service that consumed the most CPU was a log-shipping pipeline that shuffled large []byte buffers with almost no pointer-heavy allocations. Its GC CPU moved by less than 1%. The saved headroom didn't exist, and the next traffic spike caused OOM kills on the reduced pool. Two nodes went back in at 2am. The fix was trivial — measure per-service before updating the model — but the 2am page was the price of skipping it.
Engineering Team
A multidisciplinary team of backend engineers, architects, and DevOps practitioners shipping deep dives into distributed systems and production infrastructure.