Why HTML Beats Markdown for Claude Code Plans and Specs

I have stopped writing markdown plans for Claude Code. Not because markdown stopped working, but because I stopped reading them. Once the model started running long enough to produce thousand-line plans, my honest behaviour was to skim, lose the loop, and shrug at whatever came out the other end. That isn't a model problem. It's a medium problem.

The fix that has compounded for me over the last few months is to use HTML for everything in that input-to-Claude category: plans, PRDs, specs, design systems, even weekly status updates. The model can produce HTML just as easily as it produces markdown. What's different is that I engage with HTML in a way I no longer engage with long markdown documents. And because the bottleneck on agent quality is almost always how clearly the human can see what's happening, that single change has done more for my output than any prompt-engineering trick I've picked up this year.

This essay is the workflow.

The Compute Allocator Mindset

The frame this all sits inside: when I tell Claude Code to run for eight hours on a non-trivial task, I am, very roughly, authorising five hundred dollars of compute spend. Numbers vary, but the order of magnitude is real. The job description for a software builder is quietly shifting from "writes code" to "decides what is worth compute." The Level 4 / Level 5 mental shift I've written about elsewhere lands here too — the work moves up a level of abstraction, and the question becomes which loops to send the model down.

That answer lives in the spec. It lives in the plan. It lives in the PRD. If the plan is wrong or sloppy, the eight-hour run produces an eight-hour-shaped mistake, and you are out the compute either way. Better planning is no longer an aesthetic preference. It is the highest-leverage cost-control lever in the whole stack, and the token-economics piece I've made about prompt caching only tells half the story — caching makes each run cheaper, but planning decides whether the run should happen at all.
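To make the cost-control argument concrete, here is a back-of-envelope sketch. The five-hundred-dollar and eight-hour figures are the ones quoted above; the two failure probabilities are hypothetical placeholders, not measurements, so treat the output as an illustration of the shape of the trade-off rather than a real estimate.

```python
# Back-of-envelope cost model for a long agent run.
# RUN_COST and RUN_HOURS come from the essay; the probabilities below
# are hypothetical placeholders chosen only to illustrate the arithmetic.

def hourly_rate(total_cost: float, hours: float) -> float:
    """Compute spend per hour of agent runtime."""
    return total_cost / hours


def expected_waste(run_cost: float, p_bad_plan: float) -> float:
    """Expected dollars spent on runs that execute a wrong plan."""
    return run_cost * p_bad_plan


RUN_COST = 500.0   # dollars, from the essay
RUN_HOURS = 8.0    # hours, from the essay

# Hypothetical: a skimmed plan is wrong 30% of the time,
# a properly reviewed plan is wrong 10% of the time.
P_BAD_SKIMMED = 0.30
P_BAD_REVIEWED = 0.10

rate = hourly_rate(RUN_COST, RUN_HOURS)
saved = expected_waste(RUN_COST, P_BAD_SKIMMED) - expected_waste(RUN_COST, P_BAD_REVIEWED)

print(f"hourly rate: ${rate:.2f}")
print(f"expected savings per run from real review: ${saved:.2f}")
```

Whatever the real probabilities are for your work, the structure is the same: the savings scale with the run cost, which is why plan review matters more as runs get longer.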

Why Markdown Stopped Working

Markdown was perfect for the era when plans were fifty lines, you read them end-to-end, and you edited them by hand to nudge the model. That era ended somewhere between Opus 4.5 and Opus 4.7. Today, when I ask Claude Code to plan a non-trivial feature, the markdown plan is routinely a thousand lines long. Sometimes more.

The failure mode I started catching myself in: I would scroll past three-quarters of it, "approve" the plan, and then be surprised by the result. The model wasn't doing anything wrong. I wasn't doing the part of the job that requires me to actually be in the loop. And because markdown's only visual hierarchy is heading levels and bullet indentation, there is no way to lay out a thousand-line plan that makes it scannable. You either read it or you don't.

By the end of last year I had also stopped editing markdown plans manually — I would ask Claude to edit them instead. At which point the medium was no longer doing any of the things it had originally been chosen for. It was just inertia.

What HTML Gives You That Markdown Doesn't

Real mockups instead of ASCII art

Markdown plans full of ASCII-art mockups are charming and slightly tragic. The model is trying to render a UI in a medium that cannot render a UI. In HTML, it doesn't have to try. It draws boxes that look like boxes, cards that look like cards, and tables that look like tables. The mockup in the plan is the mockup. I am no longer doing the mental translation from "two characters of dashes and a pipe" to "a button."

This matters more than it sounds. Half the bugs I caught in markdown-plan-driven runs were ones I would have caught at plan-review time if the plan had let me see the thing instead of describing it.

Scrollable, scannable structure

HTML lets the model lay out the plan the way the content actually wants to be laid out. A two-column section where code lives next to the diagram explaining it. A mood board of references next to the design copy. A risk callout pinned to the side of the decision it relates to. Tables that don't collapse into unreadable pipe-fences. Hierarchy that I can scan in three seconds and then dive into the parts that matter.

The model can produce all of this with roughly the same prompt I would have used for markdown. The output is longer, sometimes by an order of magnitude, but I am actually reading it.

The Workflow: From Brainstorm to Implementation Plan

This is the pattern I now use across most non-trivial tasks. It has three steps.

Step 1: brainstorm in HTML

My opening prompt is genuinely as simple as: "Brainstorm eight directions for X. Write the output as an HTML file with visual cards for each idea, a one-line 'why this,' a mockup, and a risk callout."

What comes back is eight cards I can scroll through in under a minute. The model uses real visual hierarchy, draws thumbnails, and lays them out so I can compare them. The rule of thumb I follow at this stage is "don't read longer than one screen of Claude Code output." HTML brainstorms respect that constraint; markdown ones don't.
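For readers who want a feel for what such a file contains, the following is a minimal sketch of the card layout, written as a Python renderer. It is not what Claude Code emits verbatim; the `Idea` fields, class names, and styles are assumptions chosen to mirror the prompt (title, one-line "why this", mockup, risk callout).

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Idea:
    """One brainstorm card. Field names are illustrative assumptions."""
    title: str
    why: str          # the one-line "why this"
    mockup_html: str  # trusted HTML fragment standing in for the mockup
    risk: str


def render_card(idea: Idea) -> str:
    """Render a single idea as an <article>; text fields are escaped."""
    return (
        '<article class="card">'
        f"<h2>{escape(idea.title)}</h2>"
        f'<p class="why">{escape(idea.why)}</p>'
        f'<div class="mockup">{idea.mockup_html}</div>'
        f'<aside class="risk">Risk: {escape(idea.risk)}</aside>'
        "</article>"
    )


def render_brainstorm(topic: str, ideas: list[Idea]) -> str:
    """Return a self-contained HTML page with one card per idea."""
    cards = "\n".join(render_card(i) for i in ideas)
    return f"""<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>{escape(topic)}</title>
<style>
  body {{ font-family: system-ui, sans-serif; margin: 2rem; }}
  .grid {{ display: grid; grid-template-columns: repeat(auto-fill, minmax(260px, 1fr)); gap: 1rem; }}
  .card {{ border: 1px solid #ccc; border-radius: 8px; padding: 1rem; }}
  .risk {{ background: #fff4e5; padding: 0.5rem; border-radius: 4px; }}
</style>
</head>
<body>
<h1>{escape(topic)}</h1>
<main class="grid">
{cards}
</main>
</body>
</html>"""
```

The grid is what keeps eight ideas comparable on one screen: each card has the same four slots in the same order, so the eye can scan across rather than read down.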

Step 2: have Claude interview you

Once I've picked a direction, the second prompt is "Interview me about idea #N. Ask me whatever you need to write a complete implementation plan." This is the unknown-unknowns phase. Better answers come from being asked specific questions than from staring at a blank document trying to enumerate requirements myself.

This is also where I first notice the effect of the "I trust you" tone I now end most prompts with. Giving the model an explicit out ("ask whatever you need") gets sharper questions than over-constraining it would.

Step 3: generate the plan

Now the third prompt: "Create an HTML file as the implementation plan. Include mockups, code excerpts, file system diagrams, mood boards — whatever is needed to give me maximum context." The output is a single self-contained HTML file I open in a browser. It contains the file-tree layout, exemplar code blocks, the design references, the data shapes, and the testing rubric. I will actually read it.

The integration of this workflow with parallel agents — /best-of-n runs in Cursor 3, say — is the obvious next move. An HTML plan is a great input to multiple parallel implementations because each agent reads the same artifact and the diffs become comparable.
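The three prompts in this section are short enough to keep as templates. The sketch below does that in Python. The prompt wording is taken from the steps above, lightly repunctuated; the function names and the idea of storing them as code are assumptions for illustration, not part of Claude Code.

```python
# Prompt wording follows the three steps in the essay.
# Function names and parameters are hypothetical, for illustration only.

def brainstorm_prompt(topic: str, n: int = 8) -> str:
    """Step 1: ask for n directions rendered as visual HTML cards."""
    return (
        f"Brainstorm {n} directions for {topic}. Write the output as an HTML file "
        "with visual cards for each idea, a one-line 'why this,' a mockup, "
        "and a risk callout."
    )


def interview_prompt(idea_number: int) -> str:
    """Step 2: have the model interview you about the chosen idea."""
    return (
        f"Interview me about idea #{idea_number}. Ask me whatever you need "
        "to write a complete implementation plan."
    )


def plan_prompt() -> str:
    """Step 3: request the self-contained HTML implementation plan."""
    return (
        "Create an HTML file as the implementation plan. Include mockups, "
        "code excerpts, file system diagrams, mood boards, whatever is needed "
        "to give me maximum context."
    )


if __name__ == "__main__":
    for prompt in (brainstorm_prompt("X"), interview_prompt(3), plan_prompt()):
        print(prompt, end="\n\n")
```

Note how little is parameterised: a topic, a count, and an idea number. Everything about structure is left to the model.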

Micro-Software on Top of Micro-Software

The part of the pattern that took me longest to internalise, and that has the biggest compounding effect: when I don't like a section of the plan — a table of decision rules, a config matrix, a list of edge cases — I no longer argue with Claude in chat. I ask the model to build a custom UI for editing just that one section.

The prompt is something like: "Create an editable HTML artifact for the decision-rules table. Custom UI that gives me structure but flexibility. Design the ideal interface for this problem. Output back to me as data I can paste into the plan."

What I get is a tiny throwaway app — sliders, dropdowns, drag-and-drop, whatever the model thinks fits — that exists for the next ten minutes and lets me edit the section properly. When I'm done, I copy the data back into the plan and discard the UI. I have built and thrown away dozens of these little apps in the last two months.

This is what makes HTML genuinely compounding rather than just prettier. It isn't only that the plan looks better. It's that any module of the plan I disagree with can be zoomed into and edited inside a custom interface designed specifically for that decision. Personal software on top of personal software. The cost of producing it is roughly nothing, and the engagement quality is enormous.
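The "copy the data back into the plan" step can also be scripted. The following is a minimal sketch under two assumptions that are mine rather than anything the workflow requires: the throwaway UI exports a JSON list of objects, and the plan marks the editable section with `<!-- begin:NAME -->` and `<!-- end:NAME -->` comments.

```python
import json
from html import escape


def rules_to_table(rules: list[dict]) -> str:
    """Render exported rules as an HTML table; columns come from the first row."""
    if not rules:
        return "<table></table>"
    columns = list(rules[0].keys())
    head = "".join(f"<th>{escape(c)}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(r.get(c, '')))}</td>" for c in columns) + "</tr>"
        for r in rules
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


def replace_section(plan_html: str, name: str, new_inner: str) -> str:
    """Swap the content between the begin/end markers for `name`.

    The marker convention is hypothetical. Raises ValueError if either
    marker is missing, so a typo in the name fails loudly.
    """
    begin = f"<!-- begin:{name} -->"
    end = f"<!-- end:{name} -->"
    start = plan_html.index(begin) + len(begin)
    stop = plan_html.index(end, start)
    return plan_html[:start] + "\n" + new_inner + "\n" + plan_html[stop:]


# Example: data pasted from the throwaway editing UI.
exported = '[{"when": "cache miss", "then": "refetch"}]'
plan = (
    "<h1>Plan</h1>"
    "<!-- begin:decision-rules --><p>old rules</p><!-- end:decision-rules -->"
    "<p>rest of plan</p>"
)
updated = replace_section(plan, "decision-rules", rules_to_table(json.loads(exported)))
```

Keeping the markers in place means the same section can be edited again later with a different throwaway UI.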

Prompting Philosophy

The prompts I use across this workflow are short. The instinct I see most often in builders who haven't yet calibrated is to over-constrain: long preambles, hundred-line system prompts, prescriptive output formats. That outsources too much. The model produces better plans when it has room to make choices about structure.

My rule of thumb: give Claude enough information to know what you want, never enough to dictate how it gets there. End the prompt with an out — "whatever you think is needed to give me maximum context" — and you give the model permission to add something you didn't think to ask for.

The last line on most of my plan-generation prompts is no longer "make no mistakes." It's "I trust your judgement." In my runs, the reasoning trace behind a "trust" framing is noticeably better than the one behind a "do not fail me" framing. My working explanation is that the model is shaped by its training data, and that data is full of human collaboration patterns where trust produces better work than fear does.

Beyond Plans: Living Design Systems and Status Updates

The HTML-as-medium pattern generalises far past planning. Two cases where it has already paid off for me:

Living design systems. I now store every project's design system as a single HTML file at the root of the repo: design.html. Colours, typography, spacing, radius scale, an interactive component gallery with every variant. I point Claude Code at it at the start of any session ("reference design.html") and the consistency of the output across multiple agents and sessions is dramatically better than when I was relying on a design.md style guide. Marketers and designers who can't read code can still open it in a browser and see what the brand looks like. The same artifact serves the engineering loop and the design-review loop.

Status updates. I now write client weekly updates as HTML files. The model reads my Slack and my commits and produces a one-page status update with visual hierarchy, charts where appropriate, and a clear "decisions needed from you" callout. Compared to the markdown version, the response rate is roughly double. HTML status reports get read. Markdown ones get skimmed.
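To illustrate the design-system case, here is a minimal sketch of a design.html built from a handful of tokens. The token names and values are hypothetical, and in practice the model writes the file directly; the point is only to show how one file can carry both machine-readable tokens (CSS custom properties) and a human-readable gallery.

```python
from html import escape

# Hypothetical token values, for illustration only.
TOKENS = {
    "color-primary": "#1d4ed8",
    "color-surface": "#f8fafc",
    "radius-md": "8px",
    "space-md": "16px",
}


def css_variables(tokens: dict[str, str]) -> str:
    """Emit the tokens as CSS custom properties on :root."""
    lines = [f"  --{name}: {value};" for name, value in tokens.items()]
    return ":root {\n" + "\n".join(lines) + "\n}"


def swatches(tokens: dict[str, str]) -> str:
    """Render one visual swatch per colour token."""
    items = []
    for name, value in tokens.items():
        if name.startswith("color-"):
            items.append(
                '<li><span style="display:inline-block;width:2rem;height:2rem;'
                f'background:var(--{escape(name)})"></span> '
                f"<code>--{escape(name)}</code> {escape(value)}</li>"
            )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


def render_design_html(tokens: dict[str, str]) -> str:
    """Return a self-contained page: tokens, swatches, and one sample component."""
    return (
        "<!doctype html>\n"
        '<html lang="en"><head><meta charset="utf-8"><title>Design system</title>\n'
        f"<style>\n{css_variables(tokens)}\n</style></head>\n"
        f"<body><h1>Design system</h1>\n<h2>Colours</h2>\n{swatches(tokens)}\n"
        "<h2>Components</h2>\n"
        '<button style="background:var(--color-primary);color:#fff;'
        'border:0;border-radius:var(--radius-md);padding:var(--space-md)">Primary</button>\n'
        "</body></html>"
    )
```

Writing the result to the repo root is one line, for example `Path("design.html").write_text(render_design_html(TOKENS), encoding="utf-8")` with `pathlib.Path`. Because the sample button is styled only through the custom properties, changing a token changes both the gallery and anything an agent copies from it.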

Just-in-Time Documentation

The bigger shift sitting behind all of this: the cost of producing a tailored, high-quality, custom-shaped document has dropped to near zero. That changes which documents are worth producing.

For most of the last decade, the dominant question in product and engineering documentation was "where is the single source of truth?" Centralised wikis. Templated PRDs. One blessed format for everything. The reason was that creating high-quality artifacts was expensive, and finding them later was hard. So the answer was to consolidate.

Neither premise holds any more. Creating an artifact is cheap. Models discover context through tools — they can find the document they need without me filing it in the right folder. The right default has shifted from "one canonical artifact in a blessed format" to just-in-time, custom-shaped, high-quality, throwaway documentation. That generalises further than people expect, all the way to building production agents on the Claude Agent SDK — the artifacts those systems produce should be tailored to the consumer too, not generic.

The Practitioner's Take

design.md is dead. Long live design.html. Same for plan.md, prd.md, spec.md, and probably your weekly status update template too.

The pattern works not because the model is smarter at HTML — it isn't — but because I engage with HTML in a way I no longer engage with markdown. The bottleneck on agent quality, especially on long-running agents, is almost never the agent. It is how clearly the human in the loop can see what's actually happening. HTML beats markdown on that one axis, and that one axis turns out to matter more than almost anything else.