Sean Grove from OpenAI gave a talk recently that's been making the rounds in engineering circles. His argument: specifications are becoming more valuable than code. The vision sounds great: write your intent once in clear language, derive everything else from it. But there's a gap between the conference stage and your team's daily standup.
The Problem We're Creating
Here's something we all do now: craft a detailed prompt for an LLM, get back some code, copy-paste it into our project, and delete the prompt. We keep the generated artifact and throw away the source material. Grove calls it "shredding the source and version controlling the binary."
It does seem backwards when you think about it. The prompt contains your actual knowledge: what the system should do, why it matters, what edge cases exist. The generated code is just one possible expression of that intent. Yet we treat the derivative as precious and the original as disposable.
The Specs as Code idea flips this around. Treat your specifications like you treat your source code: version them, review them, maintain them. Everything else flows from that single source of truth.
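In practice, the smallest version of this habit is simply keeping the prompt next to the code it produced. Here is a minimal sketch of that idea: `save_prompt` and the `.prompt.md` naming convention are hypothetical, and the hash is only there to tie a prompt to the exact version of the file it generated.

```python
import hashlib
from pathlib import Path

def save_prompt(prompt: str, generated_path: Path) -> Path:
    """Store a prompt next to the code it produced, linked by a content hash.

    The .prompt.md suffix and comment format are illustrative conventions,
    not an established standard.
    """
    # Hash the generated file so the prompt records which version it produced.
    digest = hashlib.sha256(generated_path.read_bytes()).hexdigest()[:12]
    prompt_path = generated_path.parent / (generated_path.name + ".prompt.md")
    prompt_path.write_text(
        f"<!-- generated-file: {generated_path.name} sha256: {digest} -->\n\n"
        f"{prompt}\n"
    )
    return prompt_path
```

Both files then go into the same commit, so the "source material" travels with the "binary" through version control.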
Sounds reasonable. But does it actually work?
Why This Might Be The Right Direction
There are real benefits to treating specifications as first-class artifacts.
When specifications live in version control alongside code, they get less ambiguous over time. Every vague phrase becomes a code review comment. Every edge case gets explicitly documented. The specification evolves into something more precise through the same process that makes code better.
Cross-functional collaboration improves too. Product managers, legal teams, and engineers can all contribute to a Markdown file in Git. No specialized tools, no translation layers between "business requirements" and "technical specs." OpenAI's Model Spec is literally just Markdown files that anyone can read and edit.
You also get a complete audit trail. When something breaks, you can trace back through the specification history to see exactly when and why requirements changed. When there's a dispute about what the system should do, the spec is the authoritative reference.
Grove's most interesting point is about using specifications directly as training material for AI models. The spec doesn't just describe the desired behavior; it actively shapes it. But this only works if your specifications are rigorous, versioned artifacts that you treat seriously.
Where This Falls Apart
The theory holds up better than the practice.
Making a specification precise enough to be executable means you've essentially written a program. Every edge case, every validation rule, every state transition: these are the same things we encode in code. You've just changed the syntax. A specification detailed enough to generate correct implementations carries the same complexity as the implementation itself.
There's also a fundamental tension between precision and readability. Specifications need to be precise enough to drive behavior but readable enough for non-technical stakeholders to contribute. These goals conflict. Natural language is ambiguous by design. Remove that ambiguity sufficiently for a machine to act on it, and you often end up with pseudo-code that's harder to read than actual code.
Then there's the determinism problem. When you compile TypeScript to JavaScript, you get the same output every time. The compilation is deterministic and verifiable. Generate code from specifications via AI? Each run produces different results. The same prompt today might generate subtly different code tomorrow. Different models produce different implementations. There's no guarantee of consistency or reproducibility. You can't treat AI code generation like compilation; it's more like having a different programmer implement the spec each time.
Now you're maintaining two things: the specification and the implementation. Even if you generate the implementation, you still need to verify it matches the spec. When bugs appear, do you fix the spec, the implementation, or both? The surface area for inconsistency just doubled.
Versioning becomes unclear too. Code has straightforward semantics: changing a function signature is a breaking change. But specifications? If you clarify a vague requirement, is that a patch or a major version? If you add detail to an edge case that was previously implicit, did the behavior change or did you just document existing behavior? Specification versioning lacks the clear semantics that make code versioning work.
What Actually Happens Day-to-Day
The philosophy is one thing. Daily practice is another.
Drift Is Inevitable
You write a great specification. Implementations get built. Then production breaks at 2am. The fastest fix is changing the code directly. You'll update the spec later, but there's another incident, then a deadline, then another crisis. Within weeks, nobody's sure whether the specification or the implementation is authoritative.
Without automated validation that implementations match specifications, drift happens. And building that validation is often as hard as building the system itself.
Review Becomes a Bottleneck
Specification changes define system behavior, so they need rigorous review. But who reviews them? Engineers understand technical implications but might miss business context. Product managers understand requirements but might miss technical edge cases. Legal cares about compliance but might not grasp system constraints.
You need everyone in the review, which means slow reviews. Or you split reviews by concern, which means inconsistent specifications. Neither approach scales.
Some Knowledge Lives in Code
Real systems have emergent behaviors not captured in any specification. Performance characteristics, failure modes under load, interactions between components: these emerge from implementation details. You can try to specify everything, but complete specifications become unmaintainable. Keep specs high-level, and they're too vague to be actionable.
Some knowledge simply lives in the code and can't be fully extracted.
Testing Gets Weird
If specs are code, they need tests. But what tests a specification? You can write example test cases, but those are just more specifications. You can validate generated implementations, but that only catches misalignment between spec and code, not a spec that is itself wrong. There's no good answer for how you know your specification is correct.
Tooling Doesn't Exist Yet
Code has incredible tooling: IDEs, linters, debuggers, profilers. Specifications have text editors. You can lint Markdown, but you can't lint intent. You can check for broken links, but you can't debug why a specification produces unexpected behavior. The tooling ecosystem for treating specifications as code barely exists.
Patterns That Actually Help
Despite these problems, some approaches make Specs as Code more viable.
Write executable specifications from the start. Not prose, but executable examples: BDD-style scenarios, property-based test descriptions, or formal specifications that can be checked. If it can't be validated, it's not really code.
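The point can be made concrete with a BDD-style scenario written as a runnable check instead of prose. Everything here is a made-up example: the discount rule, `apply_discount`, and the threshold are hypothetical, chosen only to show the Given/When/Then shape.

```python
def apply_discount(total: float, is_member: bool) -> float:
    """Implementation under test (hypothetical rule):
    members get 10% off orders over 100."""
    if is_member and total > 100:
        return round(total * 0.9, 2)
    return total

def scenario_member_discount():
    # Given a member with an order over the threshold
    total, is_member = 120.0, True
    # When the discount is applied
    result = apply_discount(total, is_member)
    # Then the expected outcome is asserted, not just described
    assert result == 108.0

scenario_member_discount()
```

The scenario reads like a requirement but fails loudly when the implementation drifts, which is exactly what a prose spec cannot do.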
Version everything aggressively. Every specification change is a versioned release. Breaking changes get major versions, additions get minor versions, clarifications get patches. Take semantic versioning seriously.
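A small helper makes the versioning policy mechanical rather than a judgment call at review time. This is a sketch under the assumption that every spec change is classified as one of three types; the function name and labels are illustrative.

```python
def bump_spec_version(version: str, change: str) -> str:
    """Bump a spec's semantic version by change type:
    'breaking' -> major, 'addition' -> minor, 'clarification' -> patch."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        return f"{major + 1}.0.0"
    if change == "addition":
        return f"{major}.{minor + 1}.0"
    if change == "clarification":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")
```

Forcing authors to pick a change type for every spec PR is most of the value; the version arithmetic is trivial once that decision is explicit.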
Build automation that validates implementations against specifications. Test generation, contract validation, static analysis: whatever works for your domain. Without automation, consistency is impossible.
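A minimal form of contract validation, sketched below: the spec declares required fields and their types, and CI checks the implementation's actual output against that declaration. The `CONTRACT` excerpt and field names are hypothetical.

```python
# Hypothetical excerpt of a spec'd response contract: field name -> type.
CONTRACT = {"id": int, "email": str, "active": bool}

def validate_against_contract(payload: dict, contract: dict) -> list:
    """Return a list of contract violations; an empty list means compliant."""
    errors = []
    for field, expected in contract.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            errors.append(
                f"{field}: expected {expected.__name__}, "
                f"got {type(payload[field]).__name__}"
            )
    return errors
```

Run against every implementation's output in CI, this catches the most common kind of drift (a field renamed or retyped in code but not in the spec) without trying to verify full behavioral equivalence.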
Modularize your specifications like you modularize code. Don't write monolithic specs. Compose small, focused specifications with clear responsibilities and interfaces.
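One lightweight way to compose modular specs is a build step that stitches focused modules into a single reviewable document. This is a sketch; the module names and comment markers are invented for illustration.

```python
from pathlib import Path

def compose_spec(module_paths: list) -> str:
    """Concatenate small, focused spec modules (e.g. auth.md, billing.md)
    into one document, marking where each module begins."""
    parts = []
    for path in module_paths:
        parts.append(f"<!-- module: {path.name} -->\n{path.read_text().strip()}\n")
    return "\n".join(parts)
```

Each module stays small enough to own, review, and version independently, while the composed output gives stakeholders one document to read.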
Use unique IDs for every requirement. Code references the requirements it implements. When code changes, you know which specs are affected. When specs change, you know which code needs updating.
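Traceability like this can start as a simple scan: code comments reference requirement IDs, and a script reports which requirements the codebase mentions. The `SPEC-123` ID scheme and comment convention here are hypothetical.

```python
import re

# Hypothetical requirement-ID scheme referenced in code comments, e.g.
#   def login():  # implements SPEC-101
REQ_PATTERN = re.compile(r"SPEC-\d+")

def trace_requirements(spec_ids: set, source_text: str) -> dict:
    """Map each requirement ID from the spec to whether the code references it."""
    referenced = set(REQ_PATTERN.findall(source_text))
    return {req_id: req_id in referenced for req_id in sorted(spec_ids)}
```

Run in both directions (spec IDs with no code references, and code references to IDs no longer in the spec), this gives you the change-impact signal the pattern describes.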
Accept that specifications can't capture everything. They should document critical contracts and interfaces. Implementation details live in code. Don't try to specify everything; specify what matters for correctness and alignment.
Is This The Right Direction?
Specs as Code isn't wrong, but it's not a universal solution.
It works when you need strong alignment across diverse stakeholders, when the problem domain is well-understood and stable, when you have clear contracts that benefit from explicit specification, and when correctness matters more than rapid iteration.
It doesn't work when requirements are volatile, when you're exploring the solution space, when performance and implementation details dominate, or when you lack tooling to validate spec-implementation alignment.
The real insight from Grove's talk isn't that we should replace code with specifications. It's that we've undervalued specifications as engineering artifacts. We should version them, review them, and maintain them, not because they replace code but because they complement it.
Code answers "how." Specifications answer "what" and "why." Both matter. The question isn't which one is primary; it's how to keep them synchronized.
The Open Question
Maybe the future Grove describes will arrive: one where clear specifications become our most valuable engineering artifact. Or maybe we'll discover that the relationship between intent and implementation is more complex than any single paradigm can capture.
What's certain is that we're discarding something valuable when we delete our prompts. The question is whether Specs as Code is the answer, or if we need something else entirely. Something that captures intent without the overhead of dual maintenance. Something that enables alignment without forcing everything through the narrow aperture of formal specifications.
The infrastructure exists: Git works fine for Markdown files. The practices are familiar: PRs, reviews, CI/CD all apply. What we're missing might not be tooling or process. It might be clarity about what problem we're actually trying to solve.
Are we trying to align humans? Align AI systems? Create audit trails? Enable non-technical contribution? Each of these goals might need a different solution. Specs as Code might be right for some and wrong for others.
For now, at least stop deleting your prompts. Version them, reference them, learn from them. Whether they become your new source code or just better documentation, they're too valuable to treat as ephemeral artifacts. The rest we'll figure out as we go.
Get in touch to discuss your specific challenges and explore solutions.