Software.
Everything I've written about software. Start at the top; the list compounds.
Migrating From GPT 5.4 to GPT 5.5: A Practical Migration Playbook
Migrating production LLM features from one model version to the next is mostly mechanical: change a string, ship the build. Until it isn't. The places it bites: subtle output differences your prompt-test suite missed, cost shifts you didn't notice, and behavio…
ChatGPT 5.5 Multimodal Patterns: Vision, Audio, and Mixed Inputs
Multimodal LLM features have moved from "interesting demo" to "real production capability" over the last two years. With GPT 5.5, vision and audio inputs are reliable enough to ship into customer-facing features for use cases like document analysis, visual sup…
ChatGPT 5.5 for Coding Tasks: Where It Wins and Where It Doesn't
Asking "is GPT 5.5 good at coding?" is the wrong question. The right question is: for which coding tasks does it produce reliable, useful output, and when should you reach for a different tool? The answer is more granular than the marketing makes it sound, and…
ChatGPT 5.5: What Changed for Developers
Each minor GPT version brings a mix of broadly-better and use-case-specific improvements. This article focuses on what GPT 5.5 changes for builders: the capability shifts that justify migration effort, the patterns that newly become practical, and the places w…
Claude Opus 4.7 1M Context Window: Patterns and Pitfalls
The 1M-token context window in Claude Opus 4.7 is a genuine capability shift, not a marketing increment. But "you can fit it" and "you should fit it" are different questions, and the production patterns for long context are non-obvious. This article walks thro…
Claude Opus 4.7: What's New, What It Changes for Builders
Claude Opus 4.7 is the current frontier of the Claude model family. The headline upgrade from 4.6 is the 1M-token context window, five times the size, but the more practical wins are in long-context recall, agentic stability over long sessions, and a noticeabl…
Cost-Optimising ChatGPT 5.4 Production Deployments
The fastest path from a working LLM feature to a financially sustainable LLM feature is a set of cost optimisations that don't compromise quality. For most production deployments of GPT 5.4, these patterns cut spend by 60-85% with no measurable user-facing imp…
Gemini 3.1 Pro vs Other Frontier Models: A Practitioner's Comparison
The frontier-model market is now a genuine multi-provider one: Anthropic's Claude, OpenAI's GPT, Google's Gemini, plus serious open-weight models. The "best model" varies by use case, by week, and by which evaluation suite you trust. The question that actually…
ChatGPT 5.4 for Builders: Capability Patterns and Production Notes
When a new GPT model lands, the first wave of "what's new" coverage focuses on benchmark deltas. The second wave, the one builders actually need, is about which production patterns the model unlocks, where it changes the cost-quality calculus, and what to migr…
High-Volume Classification and Extraction with Gemini Flash Lite
The two highest-volume LLM use cases in production today are classification (assign a category to an input) and extraction (pull structured fields from unstructured input). For both, small-tier models like Gemini 3.1 Flash Lite often produce identical-quality…
ChatGPT 5.4: When to Use Reasoning Models vs Standard Chat
OpenAI now ships two distinct families of models for builders to choose between: standard chat models like GPT 5.4, and reasoning-tier models that produce longer, more deliberate outputs by spending more compute per request. They're not interchangeable, and ch…
Gemini 3.1 Flash Lite: When Fast and Cheap Wins
The frontier-model conversation gets the headlines. The small-tier models do the work. In production AI systems with real volume, the lite-class models (Gemini 3.1 Flash Lite, Claude Haiku, GPT mini variants) handle the bulk of requests, while the frontier tie…
Gemini 3.1 Pro for Builders: Strengths, Use Cases, and Production Patterns
Google's Gemini line has always been positioned as a frontier alternative to OpenAI and Anthropic: strong capabilities, deep integration with Google Cloud, and a willingness to lean into long-context and multimodal differentiators. Gemini 3.1 Pro is the curren…
Claude Sonnet vs Opus: A Practitioner's Guide to Choosing the Right Model
The Claude 4 family (Sonnet 4.6, Opus 4.6, and Opus 4.7) gives builders three meaningful tiers to choose from. Pick the wrong one and you're either burning money on a task that didn't need the firepower, or shipping a feature that almost works. The right choic…
Gemini 3.1 Pro Long Context: Patterns That Hold Up in Production
Long context is the dimension where Gemini's family has consistently distinguished itself. With Gemini 3.1 Pro, the ability to process very large inputs in a single call is mature enough to ship into production for serious analytical workloads: codebase reason…
Claude Sonnet 4.6 for Production AI Features: A Builder's Guide
Claude Sonnet 4.6 is the model most production AI features should be built on. It's the workhorse of the Claude 4 family: strong enough to handle complex reasoning, fast enough to drive real-time features, and priced for the volume that production usage actual…
Claude Opus 4.6 for Complex Reasoning Tasks: When and How to Use It
Opus 4.6 is the model you reach for when the answer matters and the question is hard. It's slower than Sonnet, costs more per token, and you should be deliberate about every call you make to it. But when the task is genuinely complex: multi-step reasoning, dee…
Building With the Claude Agent SDK: Production Patterns for 2026
The Claude Agent SDK is Anthropic's higher-level layer for building agents, handling the loop, tool execution, session management, and the surrounding infrastructure that you'd otherwise build yourself. Combined with Opus 4.7's improvements in long-running sta…
Building Production Agents with Claude Opus and Tool Use
The gap between "an LLM with tool use" and "a production agent that does real work" is wider than the demos suggest. The model can call your tools, but making it do so reliably, recovering when tools fail, knowing when to stop, and shipping outputs your users…
Claude Prompt Caching: Production Patterns That Cut Costs 80%
Prompt caching is the single highest-ROI feature in the Claude API for production workloads. Used well, it cuts the cost of high-traffic endpoints by 70-90% and shaves hundreds of milliseconds off latency. Used poorly, or ignored, it leaves the equivalent of a…
LLM Tier Economics: Flash Lite vs Pro vs Frontier - A Decision Framework
Every major LLM provider now offers three tiers: a small/lite model (Gemini 3.1 Flash Lite, Claude Haiku, GPT mini variants), a mid-tier workhorse (Gemini 3.1 Pro, Claude Sonnet 4.6, GPT 5.4/5.5), and a frontier flagship (Claude Opus 4.7, reasoning-tier models…
Streaming LLM Responses in FastAPI: SSE, WebSockets, and Real-Time AI
LLM responses are fundamentally different from traditional API responses. A typical database query returns in under 100ms. A GPT-4 completion for a long prompt can take 15-30 seconds to fully generate. Users will abandon a blank screen after 2-3 seconds. The s…
Next.js Caching: A Production Deep Dive (Fetch, Router, ISR, Edge)
Next.js has four caches. They interact. They don't always invalidate the way you'd guess. Most production incidents I've debugged in the last year on Next.js apps trace back to a misunderstanding of one of the four, usually the Data Cache or the Full Route Cac…
Next.js Edge Runtime - A Production Reality Check
The pitch for the Edge Runtime sounds irresistible: your code runs in 300+ cities, the cold start is under 50ms, and your users always hit a server within a few hundred miles of them. Latency disappears. The reality, after building several apps that targeted E…
Next.js App Router in Production: Patterns That Actually Scale
The Next.js App Router has been generally available for over two years now. The early "should we migrate?" debates have settled: the answer for new projects is yes, and the migration patterns for existing apps are mature. But the App Router rewards a different…
React Server Components: A Mental Model That Actually Sticks
Most React developers I've onboarded onto an App Router project have the same first reaction: "Oh, Server Components are like SSR." This is wrong in a way that takes weeks to unlearn. The mistake leads to apps that ship JavaScript they don't need, fetch data i…
React 19 Hooks: use(), useOptimistic, useActionState in Production
React 19 introduced three hooks that fundamentally change how production React apps handle async data and forms: use(), useOptimistic, and useActionState. Used together, they replace huge amounts of boilerplate that previously took libraries like SWR, React Qu…
React Rendering Performance: When to memo, When Not to (2026 Edition)
For most of React's history, the standard performance advice was: profile, find re-renders, wrap things in useMemo, useCallback, and React.memo. The advice produced codebases full of memoisation that couldn't pass a real audit: half of it was wrong (memoising…
Go Concurrency Patterns That Actually Hold Up in Production
Go's concurrency model is famously approachable: go func() and you have a goroutine. The trap is that "easy to write" is not the same as "easy to write correctly at scale". Most production Go incidents I've debugged trace back to one of three things: leaked gorout…
Go HTTP Services in 2026: net/http vs Gin, Echo, Chi, Fiber
For years, the standard advice for Go HTTP services was "use Chi" or "use Gin": anything to escape net/http's missing features. The standard library couldn't do path parameters, method routing was awkward, middleware composition was painful. Frameworks closed…
Go gRPC in Production: Patterns for Reliable Microservice Communication
When Go services talk to each other across a network, gRPC is the default for good reason: schema-first contracts, generated client and server code, streaming support, and a binary wire format that's faster and smaller than JSON. But gRPC's defaults are not pr…
Spring Boot AI Error Analyzer: One Annotation, Plain-English Stack Traces
Every Java team I've worked with loses hours per week to the same ritual. An exception fires in production, an engineer copies the stack trace into a chat, scrolls past framework noise to find the one line that matters, then walks back through the code to figu…
Spring Boot Modular Monolith: Better Than Microservices for Most Teams
The pendulum has swung back. After a decade of teams over-correcting from monoliths to microservices and discovering the operational tax (distributed tracing, network failures, eventual consistency, deploy-time coupling pretending to be runtime decoupling), th…
OpenTelemetry in Spring Boot: A Production Observability Setup
OpenTelemetry has become the default observability stack for modern Java services. It's vendor-neutral (you can ship to Datadog, Honeycomb, Grafana Tempo, or Jaeger with the same code), it covers traces, metrics, and logs in one SDK, and Spring Boot's integration story i…
Spring WebFlux vs Virtual Threads: Which Concurrency Model in 2026
For five years, Spring teams chasing high throughput had one answer: WebFlux. Reactive streams, non-blocking I/O, the whole reactive programming model. The cost was steep: every dependency had to be reactive (R2DBC instead of JDBC, reactive Kafka clients, reac…
Spring Boot + Project Loom: Virtual Threads for High-Throughput Java Services
Java 21 shipped Project Loom as a production feature. Virtual threads, lightweight user-mode threads managed by the JVM rather than the OS, fundamentally change the performance profile of blocking I/O applications. For Spring Boot developers, this means near-W…
Java Spring Boot: The Complete Guide to Building Production REST APIs
Spring Boot is the most widely deployed Java framework in the world. It powers banking systems, healthcare platforms, e-commerce giants, and the overwhelming majority of enterprise microservices. If you're building anything serious in Java, Spring Boot is your…
Python FastAPI: The Complete Guide to Building Production APIs
FastAPI is the fastest-growing Python API framework, and for good reason. It combines Python type hints with automatic OpenAPI documentation, Pydantic v2 validation, and genuine async support. Teams routinely see 2-3× the throughput of Flask/Django for I/O-bou…
Firestore Data Modeling That Survives Scale: Patterns, Pitfalls, and Production Lessons
Firestore's most common cause of failure isn't technical: it's data modeling. Bad Firestore schemas produce expensive queries, hit document size limits, require full collection scans, or make certain features structurally impossible. Good schemas are designed…
Supabase RLS Patterns for Multi-Tenant SaaS: The Complete Playbook
Row Level Security (RLS) is Postgres's mechanism for enforcing data access rules at the database level. In Supabase, it's the primary security boundary between your application and your data. When implemented correctly, RLS makes it structurally impossible for…
Supabase: The Complete Developer Guide for Modern Full-Stack Apps
Supabase is the open-source Firebase alternative built on Postgres. It gives you a hosted Postgres database, REST and GraphQL APIs auto-generated from your schema, real-time subscriptions, built-in authentication, file storage, and serverless Edge Functions, a…
Firebase for Modern App Developers: The Complete 2026 Guide
Firebase is Google's application development platform: a fully managed suite of backend services designed for mobile and web apps. At its core: Firestore (a NoSQL document database), Authentication, Realtime Database, Cloud Storage, Cloud Functions, and Hostin…
PostgreSQL for Application Developers: The Complete Guide
PostgreSQL is the world's most advanced open-source relational database. It's the default choice for serious applications: powerful enough to handle the most complex data requirements, reliable enough for financial and healthcare systems, and open enough to de…
PostgreSQL JSONB: Indexing Strategies and Query Performance Deep-Dive
PostgreSQL's JSONB type is one of its most powerful features, and one of the most misunderstood. Teams reach for JSONB to store flexible data, then discover their JSONB queries are slow, their indexes aren't being used, or their query planner is making bad cho…