Understanding Spec-Driven-Development: Kiro, Spec-Kit, and Tessl (martinfowler.com)

128 points by janpio 4 months ago | 32 comments

• tconfrey 4 months ago

I've been watching this trend toward SDD. It makes sense, but it feels like the process pendulum is swinging back toward the pre-agile era of functional specs and design documents. Not quite Big Design Up Front[0], but maybe increasingly working software == comprehensive documentation[1]?

Waterfall anyone?!

[0] https://en.wikipedia.org/wiki/Big_design_up_front

[1] https://agilemanifesto.org/

• 9rx 4 months ago

Functional specs and design documents are just programming in a natural language. In the olden days it took a human to "code" that into a programming language, but now that compilers (i.e. LLMs) are getting better at compiling natural language, it might look like you're able to skip a step (to varying degrees of success).

Whereas agile doesn't care what language you build your software in. It's about taking managers out of the picture and encouraging developers to get involved with what are normally considered "managerial" tasks. The 12 Principles go into more detail about the things developers might need to do if there are no managers.

• bonesss 4 months ago

Behaviour-Driven Development (BDD), following Test-Driven Development practice, can create a living specification: human-readable domain exploration, human-readable criteria, and direct links to the test harness to demonstrate conformance and domain capabilities.

This gives you a verifiable set of spec documents (BDD reports for integration tests, acceptance tests, domain requirements, etc., with green/red status) to iterate and collaborate on without requiring undue upfront work separated from the actual product. ‘Agile’, JIT, YAGNI-aware specifications, no waterfall necessary.
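A minimal sketch of the "living specification" idea, assuming a tiny hand-rolled Gherkin-style runner rather than a real BDD tool such as Cucumber or behave: each Given/When/Then line is matched against registered step functions, and the scenario reports green or red. The `step` decorator, `run_scenario`, and the shopping-cart domain are all hypothetical.

```python
import re

# Hypothetical step registry: (compiled pattern, step function) pairs.
STEPS = []

def step(pattern):
    """Register a function implementing Given/When/Then lines that match `pattern`."""
    def decorator(fn):
        STEPS.append((re.compile(pattern), fn))
        return fn
    return decorator

# Illustrative domain: a shopping cart, invented for this example.
@step(r"a cart with (\d+) items")
def given_cart(ctx, n):
    ctx["cart"] = int(n)

@step(r"the user adds (\d+) items")
def when_add(ctx, n):
    ctx["cart"] += int(n)

@step(r"the cart has (\d+) items")
def then_count(ctx, n):
    assert ctx["cart"] == int(n), f"expected {n}, got {ctx['cart']}"

def run_scenario(text):
    """Run one scenario; return "green" if every step passes, else "red"."""
    ctx = {}
    for line in text.strip().splitlines():
        # Drop the Gherkin keyword so only the step body is matched.
        body = re.sub(r"^\s*(Given|When|Then|And)\s+", "", line).strip()
        for pattern, fn in STEPS:
            match = pattern.fullmatch(body)
            if match:
                try:
                    fn(ctx, *match.groups())
                except AssertionError:
                    return "red"  # behaviour does not match the spec
                break
        else:
            return "red"  # undefined step: the spec is ahead of the code
    return "green"
```

Running it on "Given a cart with 2 items / When the user adds 3 items / Then the cart has 5 items" reports green; an undefined step reports red, so the readable spec and the test status stay in one place.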

• whazor 4 months ago

You can have small design up front as well. You write down one or two pages, let the LLM generate code and tests. Keep iterating. If you believe in 100% LLM-coded applications, then it makes sense to manage the English input as specifications. Instead of throwing the prompts away, you neatly organize them. Plus you can add additional constraints when the AI does something you don't like.

But I don't trust LLMs to program anything critical, and only use them for sandboxes, tests, and demos: things where code quality is less important.

• CuriouslyC 4 months ago

Spec-driven development is a good idea, but the current implementations are trash because they hand off markdown files to an agent who might as well be wiping its ass with them for all the reproducibility you get. If you're going to have agents generate specs, they should be structured and transformable via code gen into actual stub code and tests. It's only a little bit more work than unstructured markdown specs, saves a bunch of time in terms of boilerplate generation, and gives you very high reproducibility.

• ilteris 4 months ago

Do you have any documents how this could be achieved? Thanks

• CuriouslyC 4 months ago

Take a look at the CLI subproject of https://github.com/sibyllinesoft/arbiter. It does all this. I am in the process of making a version of the CLI that's standalone with a fully open license, I'm just swamped ATM getting a side hustle ready for No Kings.

• yodon 4 months ago

This pretty much aligns with my experience with SpecKit - I'm excited by it, and enjoying working with it, but have had a hard time finding guidance on advanced real world use cases.

All the tutorials I've found are little more than "here's how to install it - now let's make a todo list app from scratch!!"

Would be great to see how others are handling real world use cases like making incremental improvements or refactorings to a huge legacy code base that didn't start out as a spec driven development hello world project.

• gsadaka 4 months ago

I have also struggled to find real world examples for these approaches.

Following a BDD approach with a coding CLI works a lot better, as it documents the features as code rather than verbose markdown files no one will read.

Having a checklist for an AI to follow makes sense, but that's why AGENTS.md exists. Once the coding patterns and NFRs are documented in it, the agent follows them as well as it would follow a separate markdown spec.

• CuriouslyC 4 months ago

This focus on markdown specs is the dumbest thing. Have a spec DSL that can be validated and transformed into real code. I've already got this working with CUE (you can even define gherkin rules as part of the spec and it'll codegen them), I just need to split the CLI out from the enterprise product it's embedded in.

• josefrichter 4 months ago

Yeah, you need to read through its templates and "source code" to understand what it does, which is not necessarily a bad thing for this type of project.

• tharkun__ 4 months ago

    Distinguished Engineer and AI-assisted delivery expert at Thoughtworks.

And then talk about memory banks. Yeah, I recognize that from work where "AI has taken off" as well.

Guess what: As memory banks grow or accumulate the AI gets confused and doesn't quite deliver.

So far, a human who actually knows their product still prevails and is necessary to guide any AI effort. AIs have been trying to bullshit me so much it's not even funny any longer. Of course they all apologize and figure out reality when I guide them, but that doesn't change the facts. And I simply can't read and correct all the documents the AIs write for themselves. Even if I did, I wouldn't be confident they'd improve enough to justify spending this mind-bogglingly boring amount of time helping the thing that's supposed to take my job...

• beaker52 4 months ago

What exactly are you trying to say about their role description and the talk of memory banks?

• CuriouslyC 4 months ago

Memory is a bad idea right now, because it requires well-tuned retrieval to deliver value, and one-size-fits-all retrieval systems don't work, full stop. Most memory systems are designed around a homogeneous chat paradigm and produce negative results in heterogeneous chat environments or non-chat-based agentic workflows.

The right way to do "memory" is to feed it to a "metacognition/default mode" network that builds a theory of mind / task ideation structure async from the main agent, then injects context-relevant steering into the agent for each prompt based on this metamodel. So, "agentic memory" basically.
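A toy sketch of the per-prompt steering step, assuming word overlap as a stand-in for a real relevance model and a synchronous `observe` call where the comment describes an asynchronous process. `MetaModel`, `observe`, and `steer` are hypothetical; no particular agent framework is implied.

```python
import re
from collections import Counter

def words(text):
    """Lowercased word counts; a crude stand-in for a real relevance model."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))

class MetaModel:
    """Toy side process that accumulates task notes and steers each prompt.
    Hypothetical design, not a real library."""

    def __init__(self, k=2):
        self.k = k  # maximum number of notes injected per prompt
        self.notes = []

    def observe(self, note):
        # The comment describes this as running asynchronously over the
        # agent's activity; here it is a plain synchronous append.
        self.notes.append(note)

    def steer(self, prompt):
        """Prepend up to k notes that share the most words with the prompt."""
        query = words(prompt)
        scored = [(sum((words(n) & query).values()), i, n)
                  for i, n in enumerate(self.notes)]
        scored.sort(key=lambda t: (-t[0], t[1]))  # best overlap first, then oldest
        picked = [n for score, _, n in scored[: self.k] if score > 0]
        if not picked:
            return prompt  # nothing relevant: send the prompt unchanged
        steering = "\n".join(f"- {n}" for n in picked)
        return f"Context from task model:\n{steering}\n\n{prompt}"
```

The design choice being illustrated is selective injection: notes with no relevance to the current prompt are never sent, rather than dumping an ever-growing memory bank into every request.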

• mrbonner 4 months ago

It’s on there, right? And that “thought leader” title they’ve put on LinkedIn? I’m still scratching my head trying to figure out what that means!

• yoaviram 4 months ago

Sharing my experience with SpecKit in case anyone finds it useful.

I've been using SpecKit for the last two weeks with Claude Code, on two different projects. Both are new codebases. It's just me coding on these projects, so I don't mind experimenting.

The first one was just SpecKit doing its thing. It took about 10 days to complete all the tasks and call the job done. When it finished, there was still a huge gap: most tests were failing, and the build did not succeed. I had to spend an equally long, excruciating stretch guiding it on how to fix the tests. This was a terrible experience, and my confidence in the code is low because Claude kept rewriting and patching it, with each fix to one thing breaking another.

For the second project, I wanted to iterate in smaller chunks. So after SpecKit finished its planning, I added a few slash commands of my own. 1) generate a backlog.md file based on tasks.md so that I don't mess with SpecKit internals. 2) plan-sprint to generate a sprint file with a sprint goal and selected tasks with more detail. 3) implement-sprint broadly based on the implement command.

This setup failed as the implement-sprint command did not follow the process despite several revisions. After implementing some tasks, it would forget to create or run tests, or even implement a task.

I then modified the setup and created a subagent to handle task-specific coding. This is easy, as all the context is stored in SpecKit files. The implement-sprint functions as an orchestrator. This is much more manageable because I get to review each sprint rather than the whole project. There are still many cases where it declares the sprint as done even though tests still fail. But it's much easier to fix, and my level of trust in the code is significantly higher.

My hypothesis now is that Claude is bad at TDD. It almost always has to go back and fix the tests, not the implementation. My next experiment is going to be to create the tests after the implementation. This is not ideal, but at this point I'd rather gain velocity; otherwise it would be faster for me to code it myself.
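A minimal sketch of the sprint orchestrator pattern described in this comment, assuming `implement` wraps a call to a task-specific subagent and `run_tests` returns the names of failing tests. Both callables, `Task`, and `run_sprint` are hypothetical; the design choice worth noting is that "done" is computed from the test results instead of being left to the agent's own judgement.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    id: str
    description: str

@dataclass
class SprintResult:
    done: bool
    failing: list

def run_sprint(tasks, implement: Callable[[Task], None],
               run_tests: Callable[[], list], max_fix_rounds: int = 2) -> SprintResult:
    """Hand each task to a subagent, then refuse to call the sprint done
    while any test fails (hypothetical orchestrator)."""
    for task in tasks:
        implement(task)  # one subagent call per task; context comes from the spec files
    failing = run_tests()
    rounds = 0
    while failing and rounds < max_fix_rounds:
        # Ask for implementation fixes, explicitly not test rewrites.
        implement(Task("fix", "Make these tests pass without editing them: " + ", ".join(failing)))
        failing = run_tests()
        rounds += 1
    return SprintResult(done=not failing, failing=failing)
```

If the fix rounds run out, the sprint comes back as not done with the failing test names attached, which is the point where a human reviews it.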

• fabianlindfors 4 months ago

The note on how all those tools seem to be mostly spec-first and vague about spec maintenance was interesting to me. My cofounder and I have been going all-in on spec-as-source, as we think it's really the most interesting use of specs, but it's also challenging to get off the ground. If anybody has any thoughts on this, I'd love to hear them.

Also in case somebody wants to try a spec-as-source tool, we'd love feedback: https://specific.dev

• conartist6 4 months ago

If the spec is truly the source, it's because you've invented a formal programming language that evaluates the spec.

Anything short of that and the spec is the spec, the source is the source.

Now you get to learn about what good code looks like, like the rest of us!

• fabianlindfors 4 months ago

With that, I was referring to the definition in the article: "The spec is the main source file over time, and only the spec is edited by the human, the human never touches the code". That's how Specific works.

And I think that opens up a very interesting question about quality. If the human never touches the code, then "good code" gets replaced with "good specs" instead, and I don't think anybody knows what constitutes good specs in that context right now!

• conartist6 4 months ago

That is just so at odds with my way of thinking. The code is the spec -- the only spec that matters. The only way of describing the behavior that everyone can agree on the meaning of.

A design document can add color, but only the code tells you what the application does.

• hatmanstack 4 months ago

In my experience with Kiro's spec-driven approach it generated massive task lists (12+ tasks with 4+ sub-tasks each). The workflow was decent but it deleted code unpredictably and wouldn't revert changes. Being a full IDE likely diverts resources to UI edge cases rather than core reliability.

So much simpler to just iterate without the puzzle box of tasks. "a sledgehammer to crack a nut"

• iamdeedubs 4 months ago

In my experiments with SpecKit I was always left wondering "when does it merge all these specs into a single ground truth?" I never got there, and it felt like a huge missing step.

Now I'm left trying to define/design what a "spec" for communication between humans and coding agents would look like, to power what Birgitta called spec anchored.

• pessimizer 4 months ago

> Now I'm left trying to define/design what a "spec" for communication between humans and coding agents would look like, to power what Birgitta called spec anchored.

I feel that with AI this is something we finally have to do: define how we write out a spec and record an architecture semi-formally, in a way that is human-readable and human-manageable. And in a way that can 1) be consumed partially by an LLM context, rather than entirely (because it may be too big), and 2) have that partial ingestion be enough for it to do real work, either on the spec itself or on the code, without deviating from the core intentions and architecture.

We tried and failed with UML and Rational Rose-type stuff, I think because it didn't record intentions well enough, was mostly pictures rather than words, and seemed to be something you would create after you finished a project rather than something that filled in the details and guided you while you were building it. Hence the whole idea fell away, because it wasn't useful for anything but documentation, maintenance, or refactoring; you were already selling the product before the spec became at all useful.

I'm left looking at vague leftfield ideas like https://c4model.com/.
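One way to make the partial-ingestion requirement from this comment concrete, assuming the spec is split into named sections with declared dependencies: always send the sections holding the core intentions, then the target section and its dependencies in breadth-first order, stopping at a size budget. The section names, the `deps` format, `select_context`, and the character budget (a crude proxy for tokens) are all assumptions.

```python
from collections import deque

def select_context(sections, deps, target, budget, pinned=("intent",)):
    """Return the section IDs to send for work on `target` (hypothetical format).

    sections: section ID -> text; deps: section ID -> IDs it depends on.
    """
    # Pinned sections (core intentions, architecture) always go first.
    order = [p for p in pinned if p in sections]
    seen = set(order)
    queue = deque([target])
    while queue:  # breadth-first: nearer dependencies before distant ones
        sid = queue.popleft()
        if sid in seen:
            continue
        seen.add(sid)
        order.append(sid)
        queue.extend(d for d in deps.get(sid, []) if d not in seen)
    picked, used = [], 0
    for sid in order:
        size = len(sections[sid])
        if used + size > budget:
            break  # stop rather than skip, so the nearest context is kept whole
        picked.append(sid)
        used += size
    return picked
```

Unrelated sections never enter the context, and the pinned intent section is what keeps a partial read from drifting away from the overall design.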

• esafak 4 months ago

I don't know why SDD suddenly became a thing, but FWIW, I find value in spec files to make sure I know what I'm going to get, and to track progress when I break projects up into smaller tasks. Mind you, I don't use any tool or framework, just a simple Markdown file. I don't see value in the formalism beyond that.

• josefrichter 4 months ago

I like the part that uses custom slash commands as a way to wrap your input in a well-structured prompt template for a given type of task. I also like the part that injects relevant pieces of what I see as a "broken-down AGENTS.md".

I don't like the part that tries to leave no knot untied, which creates that sledgehammer for cracking a nut, as mentioned in the article. But I am sure it's easy to add another custom slash command like "/experiment" or "/stub" that would bring those context management benefits without the bloat, in situations when you don't know yet what and how you want to build something.

And then maybe "/wrap-up" to tie all the untied knots once you're sufficiently happy. Kinda like a surgeon stepping aside after the core part of the operation.
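A minimal sketch of slash commands as prompt templates, with only the relevant fragments of a broken-down AGENTS.md injected per command. The "/experiment", "/stub", and "/wrap-up" names come from this comment; the template wording, the `AGENTS_FRAGMENTS` and `NEEDS` tables, and `expand` are made up for illustration and are not any tool's built-in commands.

```python
from string import Template

# Fragments of a broken-down AGENTS.md (illustrative content).
AGENTS_FRAGMENTS = {
    "style": "Follow the existing module layout and naming conventions.",
    "testing": "Run the full test suite before declaring anything done.",
}

# Hypothetical command templates; $args is the text typed after the command.
COMMANDS = {
    "experiment": Template("Spike only. Build the smallest thing that answers: $args\n"
                           "No tests, no docs, no task breakdown."),
    "stub": Template("Create interfaces and stubs for: $args\n"
                     "Leave bodies unimplemented and list open questions."),
    "wrap-up": Template("Harden the work on: $args\n"
                        "Add tests, handle edge cases, update the spec to match the code."),
}

# Which AGENTS.md fragments each command pulls in.
NEEDS = {"experiment": [], "stub": ["style"], "wrap-up": ["style", "testing"]}

def expand(line):
    """Turn "/stub payment retries" into a full prompt; pass plain input through."""
    if not line.startswith("/"):
        return line
    name, _, args = line[1:].partition(" ")
    if name not in COMMANDS:
        raise KeyError(f"unknown command /{name}")
    guidance = [AGENTS_FRAGMENTS[key] for key in NEEDS[name]]
    prompt = COMMANDS[name].substitute(args=args.strip())
    return "\n".join(guidance + [prompt])
```

The lightweight "/experiment" path carries no extra guidance, while "/wrap-up" pulls in the testing and style rules, which is one way to get the context benefits without the sledgehammer.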

• ctxc 4 months ago

I was excited to use spec-kit. I had to dump it eventually when it generated steps that were the equivalent of Tony Stark building a robot from scratch in a cave when "just screw this bolt on" would have sufficed.

Always made it too complex, and at some point it wasn't worth correcting it anymore.

• raphinou 4 months ago

Seems I've been doing something like spec-driven development on my last project. I keep a spec of the solution being developed and include it in every request sent to the AI, and it yields good results in my case. I'm still the developer in charge, but I can easily hand off non-subtle or general code generation. It's clearly helped me code faster, though I had to spend quite some time on the spec, which in turn clarified a lot of things for me. In the end I enjoy this approach.

• robertclaus 4 months ago

Plotly's new Plotly Studio product is a spec-anchored approach to building data applications. Each chart or dataset gets its own prompt/spec.

The question of how much detail to include in a spec is really hard. We actually split it into two levels - an input prompt describing details the user cares about in that component and an output spec describing what was built to allow verification.
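A minimal sketch of the two-level split, assuming the output spec is recorded as simple key/value properties of what was built. `ComponentSpec`, its fields, and the chart example are hypothetical and not Plotly Studio's actual format; the idea shown is only that the input level states what the user cares about and the output level can be checked against it.

```python
from dataclasses import dataclass, field

@dataclass
class ComponentSpec:
    # Level 1: what the user asked for, plus the properties they care about.
    input_prompt: str
    required: dict = field(default_factory=dict)
    # Level 2: what was actually built, recorded after generation.
    output_spec: dict = field(default_factory=dict)

    def verify(self):
        """List the required keys that the built component does not satisfy."""
        return [k for k, v in self.required.items() if self.output_spec.get(k) != v]
```

For a prompt like "Monthly revenue by region, as a line chart" with `required={"type": "line", "x": "month", "color": "region"}`, an output spec recording a bar chart makes `verify()` return `["type"]`, while extra details in the output (such as the y column) don't count as violations.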

• beaker52 4 months ago

At that point, isn’t it just a description of the chart or dataset?

• iamsaitam 4 months ago

> When I asked Kiro to fix a small bug (it was the same one I used in the past to try Codex), it quickly became clear that the workflow was like using a sledgehammer to crack a nut. The requirements document turned this small bug into 4 “user stories” with a total of 16 acceptance criteria, including gems like “User story: As a developer, I want the transformation function to handle edge cases gracefully, so that the system remains robust when new category formats are introduced.”

Kiro, your new corporate project manager.

• htrp 4 months ago

they did train it on the Amazon way

• constantcrying 4 months ago

Really, we are doing waterfall, but with AI, now?