When Go services talk to each other across a network, gRPC is the default for good reason: schema-first contracts, generated client and server code, streaming support, and a binary wire format that's faster and smaller than JSON. But gRPC's defaults are not production defaults, and the gap between "tutorial gRPC" and "gRPC that survives Saturday night" is wider than most teams realise.
This is the production playbook.
The Generation Workflow
The biggest source of friction for teams adopting gRPC is the code generation pipeline. Get this right once and forget about it.
proto/
├── orders/
│   └── v1/
│       └── orders.proto
└── inventory/
    └── v1/
        └── inventory.proto
// proto/orders/v1/orders.proto
syntax = "proto3";

package orders.v1;

option go_package = "github.com/acme/protos/gen/go/orders/v1;ordersv1";

import "google/protobuf/timestamp.proto";

service OrderService {
  rpc PlaceOrder(PlaceOrderRequest) returns (PlaceOrderResponse);
  rpc GetOrder(GetOrderRequest) returns (Order);
}

message Order {
  string id = 1;
  string customer_id = 2;
  google.protobuf.Timestamp placed_at = 3;
  // ...
}
Use buf for everything: generation, linting, breaking-change detection, dependency management. The buf.yaml and buf.gen.yaml configs give you reproducible builds across machines:
# buf.gen.yaml
version: v2
plugins:
  - remote: buf.build/protocolbuffers/go
    out: gen/go
    opt: paths=source_relative
  - remote: buf.build/grpc/go
    out: gen/go
    opt: paths=source_relative
buf generate # produce gen/go/...
buf lint # enforce style rules
buf breaking --against '.git#branch=main' # catch breaking changes
Pin the buf version in CI; check in the generated code (yes, even though it's generated). The reproducibility win (every developer and every PR sees the same generated output) is worth the disk space.
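As one illustrative sketch of what that pinning can look like (GitHub Actions and the bufbuild/buf-setup-action assumed; the version number and workflow name are examples, not prescriptions):

```yaml
# .github/workflows/proto.yml — illustrative sketch
name: proto
on: [pull_request]
jobs:
  buf:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: bufbuild/buf-setup-action@v1
        with:
          version: "1.32.0"  # pinned: every runner uses the same buf
      - run: buf lint
      - run: buf breaking --against '.git#branch=main'
      # Regenerate and fail if the checked-in code has drifted
      - run: buf generate && git diff --exit-code gen/
```

The last step is the reproducibility check in action: if a PR's committed generated code doesn't match what the pinned buf produces, CI fails.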
Always Set Deadlines
The single most important thing you can do for gRPC reliability: set deadlines on every RPC call. A call without a deadline is a call that can hang forever.
ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
defer cancel()
resp, err := client.GetOrder(ctx, &ordersv1.GetOrderRequest{Id: id})
The deadline propagates over the wire: the server sees it on its incoming context, and any downstream RPCs the server makes inherit it. A request budget of 2 seconds means every sub-call must complete within the remaining time, automatically.
The pattern that prevents the most outages: enforce deadlines at the service boundary.
func unaryDeadlineInterceptor(defaultTimeout time.Duration) grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req any, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (any, error) {
		if _, hasDeadline := ctx.Deadline(); !hasDeadline {
			var cancel context.CancelFunc
			ctx, cancel = context.WithTimeout(ctx, defaultTimeout)
			defer cancel()
		}
		return handler(ctx, req)
	}
}
Now any client that forgot to set a deadline gets a sane default. Defence in depth.
Smart Retries
gRPC supports automatic retries via service config, but turning it on naively can amplify outages. A retry policy that's safe for production:
serviceConfig := `{
	"methodConfig": [{
		"name": [{"service": "orders.v1.OrderService"}],
		"retryPolicy": {
			"maxAttempts": 3,
			"initialBackoff": "0.1s",
			"maxBackoff": "1s",
			"backoffMultiplier": 2,
			"retryableStatusCodes": ["UNAVAILABLE", "DEADLINE_EXCEEDED"]
		}
	}]
}`

conn, err := grpc.NewClient(
	addr,
	grpc.WithDefaultServiceConfig(serviceConfig),
	grpc.WithTransportCredentials(insecure.NewCredentials()),
)
Three rules:
- Only retry idempotent operations. GetOrder is fine; PlaceOrder is not: a retry could create two orders. For non-idempotent operations, design an idempotency key into the request and let the server deduplicate.
- Never retry on INVALID_ARGUMENT, PERMISSION_DENIED, NOT_FOUND, etc. These are deterministic failures; retrying just amplifies load.
- Cap total attempts and backoff. A retry storm during a partial outage is the textbook way to turn a 5-minute incident into a 50-minute one.
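The idempotency-key pattern can be sketched in a few lines. This is an illustrative, in-memory version (the store, field names, and ID format are assumptions; a real service would persist keys with a TTL and store the full original response):

```go
package main

import (
	"fmt"
	"sync"
)

// idempotencyStore remembers the first result for each client-supplied key,
// so a retried PlaceOrder returns the original order instead of creating a
// second one.
type idempotencyStore struct {
	mu   sync.Mutex
	seen map[string]string // idempotency key -> order ID
}

func (s *idempotencyStore) placeOrder(key, customerID string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if id, ok := s.seen[key]; ok {
		return id // duplicate: return the original outcome, create nothing
	}
	id := fmt.Sprintf("order-%s-%d", customerID, len(s.seen)+1)
	s.seen[key] = id
	return id
}

func main() {
	store := &idempotencyStore{seen: map[string]string{}}
	first := store.placeOrder("key-123", "cust-1")
	retry := store.placeOrder("key-123", "cust-1") // same key: deduplicated
	fmt.Println(first == retry)                    // true
}
```

The client generates the key (a UUID per logical operation) and reuses it on every retry of that operation; the server treats a seen key as "already done".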
Interceptors for Observability
gRPC interceptors are middleware. The set every production service needs:
import (
	grpc_otel "go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
	grpc_recovery "github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/recovery"
)

server := grpc.NewServer(
	grpc.ChainUnaryInterceptor(
		grpc_otel.UnaryServerInterceptor(),
		grpc_recovery.UnaryServerInterceptor(),
		loggingInterceptor(),
		authInterceptor(),
	),
)
The order matters: otel first, so spans capture everything below; recovery next, so panics turn into proper gRPC errors instead of crashing the process; logging after that; auth last (it can short-circuit further work).
func loggingInterceptor() grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req any, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (any, error) {
		start := time.Now()
		resp, err := handler(ctx, req)
		slog.InfoContext(ctx, "rpc",
			"method", info.FullMethod,
			"duration_ms", time.Since(start).Milliseconds(),
			"status", status.Code(err).String(),
		)
		return resp, err
	}
}
Every RPC logged with method, duration, status, and trace ID (from the OTel context). Every RPC traced. Every panic recovered. Three interceptors, hours of debugging time saved.
Connection Management
Common mistake: creating a new gRPC connection per request. gRPC connections are designed to be long-lived and multiplexed.
// ❌ Don't do this
func handler(w http.ResponseWriter, r *http.Request) {
	conn, _ := grpc.NewClient(addr)
	defer conn.Close()
	client := ordersv1.NewOrderServiceClient(conn)
	// ...
}

// ✅ Initialise once, reuse
type Handler struct {
	orderClient ordersv1.OrderServiceClient
}

func NewHandler(conn *grpc.ClientConn) *Handler {
	return &Handler{orderClient: ordersv1.NewOrderServiceClient(conn)}
}
A single grpc.ClientConn handles concurrent calls. Multiplex through it.
For services with multiple replicas behind it, configure client-side load balancing:
conn, _ := grpc.NewClient(
	"dns:///order-service.default.svc.cluster.local:50051",
	grpc.WithDefaultServiceConfig(`{"loadBalancingConfig":[{"round_robin":{}}]}`),
)
The dns:/// scheme tells the client to resolve all backing IPs and round-robin across them. On Kubernetes, point it at a headless Service (clusterIP: None) so DNS returns the individual pod IPs rather than a single virtual IP; with that in place, client-side balancing works without a service mesh.
Streaming: When and How
gRPC offers four call types (unary, server-streaming, client-streaming, bidirectional). The three streaming modes cover real use cases:
- Server streaming: log tail, change feed, search results paging. The server pushes results as they're available.
- Client streaming: file upload, batch ingest. The client pushes chunks; the server returns one final response.
- Bidirectional: chat, real-time collaboration, anything with sustained two-way traffic.
Streaming is more complex than unary; reach for it only when the use case clearly warrants it. A "GetAll" endpoint that returns 10,000 items is not a use case for streaming: paginate the unary call instead. Reserve streaming for cases where the data flow is genuinely incremental.
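The pagination alternative can be sketched as a token-based pager. This is an illustrative, self-contained version (in a real API the token travels in hypothetical page_token / next_page_token fields on the request and response messages):

```go
package main

import "fmt"

// listPage stands in for a paginated unary response: one page of items plus
// an opaque token for the next page ("" when the listing is exhausted).
func listPage(all []string, token string, size int) (items []string, next string) {
	start := 0
	if token != "" {
		fmt.Sscanf(token, "%d", &start) // decode the (illustrative) token
	}
	end := start + size
	if end >= len(all) {
		return all[start:], "" // last page: no next token
	}
	return all[start:end], fmt.Sprintf("%d", end)
}

func main() {
	all := []string{"a", "b", "c", "d", "e"}
	token := ""
	for {
		items, next := listPage(all, token, 2)
		fmt.Println(items)
		if next == "" {
			break
		}
		token = next
	}
	// prints: [a b], then [c d], then [e]
}
```

The client loops until the token comes back empty, so a 10,000-item listing becomes a series of small, deadline-friendly unary calls.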
What "Production-Ready" Looks Like
A gRPC service hits these checkpoints before it goes live:
- Deadline propagation enforced at the server boundary
- Retry policy configured with idempotency in mind
- Interceptors: tracing, recovery, logging, auth
- Long-lived client connections, never per-request
- Health check service registered (grpc_health_v1)
- Protobuf schemas linted by buf and breaking-change checked in CI
- Generated code committed to the repo, not regenerated at build time
Hit those, and gRPC stops being a source of production surprises and starts being the boring, reliable transport it's meant to be.