Software.
Everything I've written about software. Start at the top; the list compounds.
Migrating From GPT 5.4 to GPT 5.5: A Practical Migration Playbook
Migrating production LLM features from one model version to the next is mostly mechanical: change a string, ship the build. Until it isn't. The places it bites: subtle output differences your prompt-test suite missed, cost shifts you didn't notice, and behavio…
ChatGPT 5.5 Multimodal Patterns: Vision, Audio, and Mixed Inputs
Multimodal LLM features have moved from "interesting demo" to "real production capability" over the last two years. With GPT 5.5, vision and audio inputs are reliable enough to ship into customer-facing features for use cases like document analysis, visual sup…
ChatGPT 5.5 for Coding Tasks: Where It Wins and Where It Doesn't
Asking "is GPT 5.5 good at coding?" is the wrong question. The right question is: for which coding tasks does it produce reliable, useful output, and when should you reach for a different tool? The answer is more granular than the marketing makes it sound, and…
ChatGPT 5.5: What Changed for Developers
Each minor GPT version brings a mix of broadly-better and use-case-specific improvements. This article focuses on what GPT 5.5 changes for builders: the capability shifts that justify migration effort, the patterns that newly become practical, and the places w…
Claude Opus 4.7 1M Context Window: Patterns and Pitfalls
The 1M-token context window in Claude Opus 4.7 is a genuine capability shift, not a marketing increment. But "you can fit it" and "you should fit it" are different questions, and the production patterns for long context are non-obvious. This article walks thro…
Claude Opus 4.7: What's New, What It Changes for Builders
Claude Opus 4.7 is the current frontier of the Claude model family. The headline upgrade from 4.6 is the 1M-token context window, five times the size, but the more practical wins are in long-context recall, agentic stability over long sessions, and a noticeabl…
Cost-Optimising ChatGPT 5.4 Production Deployments
The fastest path from a working LLM feature to a financially sustainable LLM feature is a set of cost optimisations that don't compromise quality. For most production deployments of GPT 5.4, these patterns cut spend by 60-85% with no measurable user-facing imp…
Gemini 3.1 Pro vs Other Frontier Models: A Practitioner's Comparison
The frontier-model market is now a genuine multi-provider one: Anthropic's Claude, OpenAI's GPT, Google's Gemini, plus serious open-weight models. The "best model" varies by use case, by week, and by which evaluation suite you trust. The question that actually…
ChatGPT 5.4 for Builders: Capability Patterns and Production Notes
When a new GPT model lands, the first wave of "what's new" coverage focuses on benchmark deltas. The second wave, the one builders actually need, is about which production patterns the model unlocks, where it changes the cost-quality calculus, and what to migr…
High-Volume Classification and Extraction with Gemini Flash Lite
The two highest-volume LLM use cases in production today are classification (assign a category to an input) and extraction (pull structured fields from unstructured input). For both, small-tier models like Gemini 3.1 Flash Lite often produce identical-quality…
ChatGPT 5.4: When to Use Reasoning Models vs Standard Chat
OpenAI now ships two distinct families of models for builders to choose between: standard chat models like GPT 5.4, and reasoning-tier models that produce longer, more deliberate outputs by spending more compute per request. They're not interchangeable, and ch…
Gemini 3.1 Flash Lite: When Fast and Cheap Wins
The frontier-model conversation gets the headlines. The small-tier models do the work. In production AI systems with real volume, the lite-class models (Gemini 3.1 Flash Lite, Claude Haiku, GPT mini variants) handle the bulk of requests, while the frontier tie…
Gemini 3.1 Pro for Builders: Strengths, Use Cases, and Production Patterns
Google's Gemini line has always been positioned as a frontier alternative to OpenAI and Anthropic: strong capabilities, deep integration with Google Cloud, and a willingness to lean into long-context and multimodal differentiators. Gemini 3.1 Pro is the curren…
Claude Sonnet vs Opus: A Practitioner's Guide to Choosing the Right Model
The Claude 4 family (Sonnet 4.6, Opus 4.6, and Opus 4.7) gives builders three meaningful tiers to choose from. Pick the wrong one and you're either burning money on a task that didn't need the firepower, or shipping a feature that almost works. The right choic…
Gemini 3.1 Pro Long Context: Patterns That Hold Up in Production
Long context is the dimension where Gemini's family has consistently distinguished itself. With Gemini 3.1 Pro, the ability to process very large inputs in a single call is mature enough to ship into production for serious analytical workloads: codebase reason…
Claude Sonnet 4.6 for Production AI Features: A Builder's Guide
Claude Sonnet 4.6 is the model most production AI features should be built on. It's the workhorse of the Claude 4 family: strong enough to handle complex reasoning, fast enough to drive real-time features, and priced for the volume that production usage actual…
Claude Opus 4.6 for Complex Reasoning Tasks: When and How to Use It
Opus 4.6 is the model you reach for when the answer matters and the question is hard. It's slower than Sonnet, costs more per token, and you should be deliberate about every call you make to it. But when the task is genuinely complex: multi-step reasoning, dee…
Building With the Claude Agent SDK: Production Patterns for 2026
The Claude Agent SDK is Anthropic's higher-level layer for building agents, handling the loop, tool execution, session management, and the surrounding infrastructure that you'd otherwise build yourself. Combined with Opus 4.7's improvements in long-running sta…
Building Production Agents with Claude Opus and Tool Use
The gap between "an LLM with tool use" and "a production agent that does real work" is wider than the demos suggest. The model can call your tools, but making it do so reliably, recovering when tools fail, knowing when to stop, and shipping outputs your users…
Claude Prompt Caching: Production Patterns That Cut Costs 80%
Prompt caching is the single highest-ROI feature in the Claude API for production workloads. Used well, it cuts the cost of high-traffic endpoints by 70-90% and shaves hundreds of milliseconds off latency. Used poorly, or ignored, it leaves the equivalent of a…
LLM Tier Economics: Flash Lite vs Pro vs Frontier - A Decision Framework
Every major LLM provider now offers three tiers: a small/lite model (Gemini 3.1 Flash Lite, Claude Haiku, GPT mini variants), a mid-tier workhorse (Gemini 3.1 Pro, Claude Sonnet 4.6, GPT 5.4/5.5), and a frontier flagship (Claude Opus 4.7, reasoning-tier models…
Streaming LLM Responses in FastAPI: SSE, WebSockets, and Real-Time AI
LLM responses are fundamentally different from traditional API responses. A typical database query returns in under 100ms. A GPT-4 completion for a long prompt can take 15-30 seconds to fully generate. Users will abandon a blank screen after 2-3 seconds. The s…
Next.js Caching: A Production Deep Dive (Fetch, Router, ISR, Edge)
Next.js has four caches. They interact. They don't always invalidate the way you'd guess. Most production incidents I've debugged in the last year on Next.js apps trace back to a misunderstanding of one of the four, usually the Data Cache or the Full Route Cac…
Next.js Edge Runtime - A Production Reality Check
The pitch for the Edge Runtime sounds irresistible: your code runs in 300+ cities, the cold start is under 50ms, and your users always hit a server within a few hundred miles of them. Latency disappears. The reality, after building several apps that targeted E…
Next.js App Router in Production: Patterns That Actually Scale
The Next.js App Router has been generally available for over two years now. The early "should we migrate?" debates have settled: the answer for new projects is yes, and the migration patterns for existing apps are mature. But the App Router rewards a different…
React Server Components: A Mental Model That Actually Sticks
Most React developers I've onboarded onto an App Router project have the same first reaction: "Oh, Server Components are like SSR." This is wrong in a way that takes weeks to unlearn. The mistake leads to apps that ship JavaScript they don't need, fetch data i…
React 19 Hooks: use(), useOptimistic, useActionState in Production
React 19 introduced three hooks that fundamentally change how production React apps handle async data and forms: use(), useOptimistic, and useActionState. Used together, they replace huge amounts of boilerplate that previously took libraries like SWR, React Qu…
React Rendering Performance: When to memo, When Not to (2026 Edition)
For most of React's history, the standard performance advice was: profile, find re-renders, wrap things in useMemo, useCallback, and React.memo. The advice produced codebases full of memoisation that couldn't pass a real audit: half of it was wrong (memoising…
Go Concurrency Patterns That Actually Hold Up in Production
Go's concurrency model is famously approachable: go func() and you have a goroutine. The trap is that "easy to write" is not the same as "easy to write correctly at scale". Most production Go incidents I've debugged trace back to one of three things: leaked gorout…
Go HTTP Services in 2026: net/http vs Gin, Echo, Chi, Fiber
For years, the standard advice for Go HTTP services was "use Chi" or "use Gin": anything to escape net/http's missing features. The standard library couldn't do path parameters, method routing was awkward, middleware composition was painful. Frameworks closed…
Go gRPC in Production: Patterns for Reliable Microservice Communication
When Go services talk to each other across a network, gRPC is the default for good reason: schema-first contracts, generated client and server code, streaming support, and a binary wire format that's faster and smaller than JSON. But gRPC's defaults are not pr…
Spring Boot AI Error Analyzer: One Annotation, Plain-English Stack Traces
Every Java team I've worked with loses hours per week to the same ritual. An exception fires in production, an engineer copies the stack trace into a chat, scrolls past framework noise to find the one line that matters, then walks back through the code to figu…
Spring Boot Modular Monolith: Better Than Microservices for Most Teams
The pendulum has swung back. After a decade of teams over-correcting from monoliths to microservices and discovering the operational tax (distributed tracing, network failures, eventual consistency, deploy-time coupling pretending to be runtime decoupling), th…
OpenTelemetry in Spring Boot: A Production Observability Setup
OpenTelemetry has become the default observability stack for modern Java services. It's vendor-neutral (you can ship to Datadog, Honeycomb, Grafana Tempo, or Jaeger with the same code), it covers traces, metrics, and logs in one SDK, and Spring Boot's integration story i…
Spring WebFlux vs Virtual Threads: Which Concurrency Model in 2026
For five years, Spring teams chasing high throughput had one answer: WebFlux. Reactive streams, non-blocking I/O, the whole reactive programming model. The cost was steep: every dependency had to be reactive (R2DBC instead of JDBC, reactive Kafka clients, reac…
Spring Boot + Project Loom: Virtual Threads for High-Throughput Java Services
Java 21 shipped Project Loom as a production feature. Virtual threads, lightweight user-mode threads managed by the JVM rather than the OS, fundamentally change the performance profile of blocking I/O applications. For Spring Boot developers, this means near-W…
Java Spring Boot: The Complete Guide to Building Production REST APIs
Spring Boot is the most widely deployed Java framework in the world. It powers banking systems, healthcare platforms, e-commerce giants, and the overwhelming majority of enterprise microservices. If you're building anything serious in Java, Spring Boot is your…
Python FastAPI: The Complete Guide to Building Production APIs
FastAPI is the fastest-growing Python API framework, and for good reason. It combines Python type hints with automatic OpenAPI documentation, Pydantic v2 validation, and genuine async support. Teams routinely see 2-3× the throughput of Flask/Django for I/O-bou…
Firestore Data Modeling That Survives Scale: Patterns, Pitfalls, and Production Lessons
Firestore's most common cause of failure isn't technical: it's data modeling. Bad Firestore schemas produce expensive queries, hit document size limits, require full collection scans, or make certain features structurally impossible. Good schemas are designed…
Supabase RLS Patterns for Multi-Tenant SaaS: The Complete Playbook
Row Level Security (RLS) is Postgres's mechanism for enforcing data access rules at the database level. In Supabase, it's the primary security boundary between your application and your data. When implemented correctly, RLS makes it structurally impossible for…
Supabase: The Complete Developer Guide for Modern Full-Stack Apps
Supabase is the open-source Firebase alternative built on Postgres. It gives you a hosted Postgres database, REST and GraphQL APIs auto-generated from your schema, real-time subscriptions, built-in authentication, file storage, and serverless Edge Functions, a…
Firebase for Modern App Developers: The Complete 2026 Guide
Firebase is Google's application development platform: a fully managed suite of backend services designed for mobile and web apps. At its core: Firestore (a NoSQL document database), Authentication, Realtime Database, Cloud Storage, Cloud Functions, and Hostin…
PostgreSQL for Application Developers: The Complete Guide
PostgreSQL is the world's most advanced open-source relational database. It's the default choice for serious applications: powerful enough to handle the most complex data requirements, reliable enough for financial and healthcare systems, and open enough to de…
PostgreSQL JSONB: Indexing Strategies and Query Performance Deep-Dive
PostgreSQL's JSONB type is one of its most powerful features, and one of the most misunderstood. Teams reach for JSONB to store flexible data, then discover their JSONB queries are slow, their indexes aren't being used, or their query planner is making bad cho…