Go's concurrency model is famously approachable: go func() and you have a goroutine. The trap is that easy to write is not the same as easy to write correctly at scale. Most production Go incidents I've debugged trace back to one of three things: leaked goroutines, missing context cancellation, or unbounded fan-out overwhelming the database.
This is the production playbook for the patterns that actually hold up.
errgroup: The Pattern You Should Reach for First
golang.org/x/sync/errgroup is the single most important addition to Go's standard concurrency story. It handles the "run N goroutines, wait for all, capture the first error, cancel the rest" pattern that you'd otherwise build by hand every time.
import "golang.org/x/sync/errgroup"
func GetDashboard(ctx context.Context, userID string) (*Dashboard, error) {
    g, ctx := errgroup.WithContext(ctx)

    var profile *Profile
    var orders []Order
    var notifications []Notification

    g.Go(func() error {
        var err error
        profile, err = profileService.Get(ctx, userID)
        return err
    })
    g.Go(func() error {
        var err error
        orders, err = orderService.List(ctx, userID)
        return err
    })
    g.Go(func() error {
        var err error
        notifications, err = notifyService.List(ctx, userID)
        return err
    })

    if err := g.Wait(); err != nil {
        return nil, err
    }
    return &Dashboard{Profile: profile, Orders: orders, Notifications: notifications}, nil
}
Three calls run in parallel. If any one fails, the shared context is cancelled and the other two abort their in-flight work instead of wasting cycles. The first error returned is the one g.Wait() reports.
This pattern replaces the entire genre of "manually allocate channels, manually count goroutines, manually accumulate errors" code that older Go codebases are full of.
Worker Pools for Bounded Fan-Out
Spawning 10,000 goroutines to process 10,000 jobs is easy. It's also a great way to OOM your service or melt your downstream database. The bounded worker pool is the production answer.
func ProcessImages(ctx context.Context, urls []string) error {
    const workers = 16
    jobs := make(chan string)
    g, ctx := errgroup.WithContext(ctx)

    // Producer
    g.Go(func() error {
        defer close(jobs)
        for _, url := range urls {
            select {
            case jobs <- url:
            case <-ctx.Done():
                return ctx.Err()
            }
        }
        return nil
    })

    // Workers
    for i := 0; i < workers; i++ {
        g.Go(func() error {
            for url := range jobs {
                if err := processOne(ctx, url); err != nil {
                    return err
                }
            }
            return nil
        })
    }

    return g.Wait()
}
The producer feeds the channel; 16 workers consume it. If any worker errors, errgroup cancels the context, the producer's select picks up the cancellation and stops feeding new jobs, and the remaining workers exit once the channel closes and drains.
The number of workers should be sized to the bottleneck, which is usually a downstream connection pool or an external API rate limit, not CPU cores. For a service hitting a Postgres database with a 30-connection pool, 16 workers is the right ballpark; 100 will just queue on the pool and produce the same throughput with worse tail latency.
Context Cancellation: The Discipline That Saves You
Every function that does I/O or blocks should accept a context.Context and respect cancellation. Every function that hands work to a goroutine should propagate the context. This isn't optional: it's the difference between a service that gracefully sheds load under pressure and one that piles up goroutines until it dies.
// ✅ Cancellable
func FetchAll(ctx context.Context, ids []string) ([]Item, error) {
    items := make([]Item, len(ids))
    g, ctx := errgroup.WithContext(ctx)
    for i, id := range ids {
        i, id := i, id // pre-Go 1.22 loop variable capture (see pitfalls below)
        g.Go(func() error {
            item, err := fetchOne(ctx, id) // ctx propagated
            if err != nil {
                return err
            }
            items[i] = item
            return nil
        })
    }
    return items, g.Wait()
}
// ❌ Will leak if the caller times out
func FetchAllBad(ids []string) ([]Item, error) {
    items := make([]Item, len(ids))
    var wg sync.WaitGroup
    for i, id := range ids {
        wg.Add(1)
        go func(i int, id string) {
            defer wg.Done()
            item, _ := fetchOneNoCtx(id) // no way to cancel
            items[i] = item
        }(i, id)
    }
    wg.Wait()
    return items, nil
}
The bad version starts goroutines that the caller cannot stop. If the HTTP request behind this call times out at 30 seconds and fetchOneNoCtx takes 60, you've got 60-second goroutine lifetimes accumulating under load. They eventually finish, but they're consuming connections, memory, and downstream capacity for work no one wants.
Fan-Out, Fan-In with Order Preservation
When you need to process items in parallel but return them in input order, the pattern is: give each goroutine its own pre-assigned slot in the output slice and have it write only to that slot. No shared results channel, no reordering logic.
func ParallelMap(ctx context.Context, in []string) ([]string, error) {
    out := make([]string, len(in))
    sem := make(chan struct{}, 8) // concurrency limit
    g, ctx := errgroup.WithContext(ctx)
    for i, item := range in {
        i, item := i, item
        g.Go(func() error {
            select {
            case sem <- struct{}{}:
                defer func() { <-sem }()
            case <-ctx.Done():
                return ctx.Err()
            }
            v, err := transform(ctx, item)
            if err != nil {
                return err
            }
            out[i] = v // safe: each goroutine writes to its own index
            return nil
        })
    }
    return out, g.Wait()
}
The semaphore caps concurrency at eight in-flight transforms. Each goroutine writes only to its own pre-assigned index in out, so no two goroutines ever touch the same element and no channel ordering tricks are needed. Output order matches input order automatically.
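If you'd rather not hand-roll the semaphore, recent versions of golang.org/x/sync give errgroup a SetLimit method that caps in-flight goroutines directly. A sketch of the same function using it (transform is the same placeholder as above):

func ParallelMapLimited(ctx context.Context, in []string) ([]string, error) {
    out := make([]string, len(in))
    g, ctx := errgroup.WithContext(ctx)
    g.SetLimit(8) // g.Go now blocks until a slot frees; call before any g.Go
    for i, item := range in {
        i, item := i, item // not needed on Go 1.22+
        g.Go(func() error {
            v, err := transform(ctx, item)
            if err != nil {
                return err
            }
            out[i] = v // each goroutine still owns its own slot
            return nil
        })
    }
    return out, g.Wait()
}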
sync.Once and sync.OnceFunc for Lazy Initialisation
For singletons and lazy-init patterns, sync.Once is the right primitive. Go 1.21 added sync.OnceFunc, sync.OnceValue, and sync.OnceValues, which collapse the most common boilerplate:
// Old way
var (
    once   sync.Once
    client *http.Client
)

func getClient() *http.Client {
    once.Do(func() {
        client = &http.Client{Timeout: 30 * time.Second}
    })
    return client
}

// New way
var getClient = sync.OnceValue(func() *http.Client {
    return &http.Client{Timeout: 30 * time.Second}
})
Cleaner, harder to misuse, and the closure makes the dependency obvious.
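sync.OnceValues covers the common case where initialisation can fail: both the value and the error are computed once and cached for every caller. A minimal sketch, assuming a hypothetical dsn variable and a registered postgres driver:

var getDB = sync.OnceValues(func() (*sql.DB, error) {
    // Runs exactly once; both return values are cached for all callers.
    return sql.Open("postgres", dsn)
})

One caveat: a failed initialisation is cached too. If the first call returns an error, every subsequent call returns that same error, so this is the wrong tool for anything you want to retry.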
Pitfalls That Bite Production
1. Goroutine leaks from blocking sends to unbuffered channels. A goroutine sending to an unbuffered channel blocks until someone receives. If the receiver has exited (because of an error, cancellation, and so on), the sender leaks forever. Always pair sends with a select on the context:
select {
case ch <- value:
case <-ctx.Done():
    return ctx.Err()
}
2. Loop variable capture pre-Go 1.22. Code like for _, x := range items { go func() { use(x) }() } captured the same x across iterations on older Go versions. Go 1.22 fixed this, but if your codebase still builds with older toolchains, write x := x inside the loop or pass x as an argument, as shown below.
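A minimal illustration of both fixes for older toolchains (items and use are placeholders):

for _, x := range items {
    x := x // shadow the loop variable; each goroutine gets its own copy
    go func() {
        use(x)
    }()
}

// Equivalent: pass the value as an argument instead.
for _, x := range items {
    go func(x string) {
        use(x)
    }(x)
}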
3. WaitGroup misuse. Calling wg.Add(1) inside the goroutine instead of before it is a race condition: the parent might call wg.Wait() before the goroutine has incremented the counter. Always wg.Add before go func().
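The shape to internalise (Job and process are placeholders):

var wg sync.WaitGroup
for _, job := range jobs {
    wg.Add(1) // increment before the goroutine starts, never inside it
    go func(j Job) {
        defer wg.Done()
        process(j)
    }(job)
}
wg.Wait() // guaranteed to observe every Add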
4. Unbounded channel buffers. A channel with a buffer of 100,000 isn't a queue: it's a memory bomb waiting for the producer to outpace the consumer. Use small, bounded buffers and rely on backpressure to slow producers down.
When Channels Are the Wrong Tool
Go culture used to push channels for everything. The modern wisdom is more pragmatic:
- Use channels for streaming data between goroutines, signalling, and coordination.
- Use mutexes for protecting shared state (caches, counters, maps); see the sketch after this list.
- Use atomic operations for single-word counters and flags.
- Use errgroup for "fan out, wait, capture errors" patterns instead of building it yourself.
The right primitive is the one that matches the problem shape. A sync.Map is sometimes the right answer; a 4-channel coordination dance is sometimes overkill.
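For the shared-state case, a plain RWMutex around a map is usually all the machinery you need. A minimal sketch:

type Cache struct {
    mu sync.RWMutex
    m  map[string]string
}

func NewCache() *Cache {
    return &Cache{m: make(map[string]string)}
}

func (c *Cache) Get(k string) (string, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    v, ok := c.m[k]
    return v, ok
}

func (c *Cache) Set(k, v string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.m[k] = v
}

No channels, no goroutines: the lock is the whole coordination story, and it's far easier to reason about.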
The Production Checklist
Before any concurrent Go code ships:
- Does every long-lived goroutine respect context cancellation?
- Is fan-out bounded (worker pool, semaphore, or errgroup with limit)?
- Is every blocking channel send wrapped in a select with ctx.Done()?
- Are errors returned, not swallowed?
- Will this code degrade gracefully under load, or will it pile up goroutines?
Concurrency in Go is easy to start and hard to stop. The patterns above tilt that asymmetry in your favour.