How AI-native products actually get built
The traditional SDLC is dead. The new lifecycle isn't about writing code - it's about orchestrating agents, validating outputs, and managing inference economics.
6 stages. Grounded in research from Sequoia, a16z, Bessemer, Anthropic, and the teams shipping AI-native products today.
This lifecycle powers Dacard's intelligence engine, defining how AI-native products get built.
Most B2B SaaS teams are still running a 2019 SDLC with AI bolted on. The ones winning are running a fundamentally different loop.
The AI-native product development lifecycle
Not a waterfall. Not agile. A new rhythm where humans specify intent, orchestrate agents, and own the quality bar.
This is a loop, not a line. Stage 6 feeds directly back into Stage 1.
What actually changed
The traditional SDLC assumed humans write all the code. The AI-native lifecycle assumes they mostly don't.
Traditional SDLC vs. AI-native PDLC
Stage 1: Specify & Constrain
Here's the thing most teams get wrong: they treat AI like a junior developer who needs a Jira ticket. That's not how this works. Your spec needs to be a structured prompt - complete with preconditions, constraints, and examples of what "done" looks like. And the harness? That's what keeps the agent from going rogue. OpenAI built a million-line product with zero hand-written code. The secret wasn't the model. It was the harness.
You
Writing structured specs with explicit acceptance criteria, preconditions, and examples. Defining the harness: what agents can touch, what they can't, and what patterns they must follow.
AI
Nothing yet. This is pure human judgment. The quality of everything downstream depends on what you define here.
Where it goes wrong
Vague specs produce vague outputs. "Build me a dashboard" gets you something. "Build me a dashboard with these 4 metrics, this layout, and this data source" gets you what you need. The difference is enormous.
- Write specs as structured prompts, not narrative documents. Include input/output examples, not just descriptions. (A sketch of what this can look like follows this list.)
- Define harness constraints before generation starts: files agents cannot modify, patterns they must follow, libraries they must use.
- Set measurable acceptance criteria up front. "Works correctly" is not an acceptance criterion.
- Version your specs alongside your code. They're as important as the implementation.
- Include anti-examples - what the output should NOT look like. Agents learn from boundaries as much as targets.
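A minimal sketch of what a structured spec can look like, written here in Python so it can be versioned next to the code. The field names (preconditions, harness, anti_examples) are illustrative, not a standard schema:

```python
from dataclasses import dataclass, field


@dataclass
class AgentSpec:
    """A spec as structured data rather than a narrative document.

    Field names are placeholders; the point is that constraints, examples,
    and acceptance criteria are explicit and versionable.
    """
    goal: str
    preconditions: list[str] = field(default_factory=list)
    harness: dict[str, list[str]] = field(default_factory=dict)   # what agents can and cannot touch
    acceptance_criteria: list[str] = field(default_factory=list)  # measurable, not "works correctly"
    examples: list[tuple[str, str]] = field(default_factory=list)  # (input, expected output)
    anti_examples: list[str] = field(default_factory=list)         # what "done" must NOT look like

    def to_prompt(self) -> str:
        """Render the spec as the structured prompt handed to the agent."""
        lines = [f"GOAL: {self.goal}", "PRECONDITIONS:", *self.preconditions]
        for rule, paths in self.harness.items():
            lines.append(f"{rule.upper()}: {', '.join(paths)}")
        lines += ["ACCEPTANCE CRITERIA:", *self.acceptance_criteria,
                  "ANTI-EXAMPLES:", *self.anti_examples]
        for given, expected in self.examples:
            lines.append(f"EXAMPLE: given {given} -> expect {expected}")
        return "\n".join(lines)


spec = AgentSpec(
    goal="Dashboard with 4 metrics: MRR, churn, NRR, active seats",
    preconditions=["Metrics API already exposes all four series"],
    harness={"must_not_modify": ["auth/", "billing/"], "must_use": ["components/Chart"]},
    acceptance_criteria=["All 4 metrics render from the live API", "p95 load under 2s"],
    examples=[("GET /metrics?range=30d", "4 populated chart panels")],
    anti_examples=["Hard-coded demo data", "A fifth vanity metric"],
)
print(spec.to_prompt())
```

The format matters less than the fact that constraints, acceptance criteria, and anti-examples are explicit data an agent can be held to.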
Stage 2: Build the System of Context
When everyone has access to the same foundation models, what differentiates your product? Context. Emergence Capital calls this "Value over Model" - the surplus value your system creates when its context elevates raw model output into something uniquely useful. This stage is about building that system: curating what the agent knows, selecting which models handle which tasks, and defining the architectural constraints that keep everything coherent.
You
Curating context hierarchies (project-level, feature-level, task-level), selecting models, defining routing rules, and establishing architectural constraints as living documentation.
AI
Indexing codebases, building embeddings, analyzing dependency graphs, suggesting context relevance. The agent is helping you build its own instruction manual.
Where it goes wrong
Feeding the entire codebase as context. More context isn't better context. Token waste and diluted relevance are real problems. ICONIQ data shows companies use 2.8 models on average - single-model dependency is a strategic risk.
- Treat context curation as a first-class engineering discipline, not an afterthought. Someone should own it.
- Implement multi-model routing: expensive frontier models for complex reasoning, smaller models for simple tasks. Your COGS will thank you. (A routing sketch follows this list.)
- Build context hierarchies: project-wide patterns, feature-specific knowledge, task-level instructions. Layer them.
- Define architectural constraints as context, not documentation. The agent reads context. It doesn't read your wiki.
- Pin model versions. Test upgrades in staging. A model provider update should never break your production system.
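A sketch of tiered routing, version pinning, and a layered context hierarchy. The model names, dates, and task types are hypothetical; the pattern is what matters, not these values:

```python
# Pinned model versions; names and dates are hypothetical. Upgrades go through
# staging, never auto-upgrade in production.
PINNED_MODELS = {
    "frontier": "frontier-model-2025-06-01",
    "small": "small-model-2025-04-15",
}

# Task type -> tier. Expensive reasoning goes to the frontier tier,
# mechanical work goes to the cheaper tier.
ROUTING_RULES = {
    "architecture_review": "frontier",
    "complex_refactor": "frontier",
    "boilerplate": "small",
    "summarize_diff": "small",
}


def route(task_type: str) -> str:
    """Return the pinned model version for a task, defaulting to the cheap tier."""
    return PINNED_MODELS[ROUTING_RULES.get(task_type, "small")]


def build_context(project_ctx: str, feature_ctx: str, task_ctx: str) -> str:
    """Layer the context hierarchy instead of dumping the whole codebase."""
    return "\n\n".join([project_ctx, feature_ctx, task_ctx])


if __name__ == "__main__":
    print(route("boilerplate"))          # small-model-2025-04-15
    print(route("architecture_review"))  # frontier-model-2025-06-01
    ctx = build_context("Project: TypeScript, strict mode",
                        "Feature: billing settings page",
                        "Task: add a cancel-subscription button")
    print(len(ctx), "characters of layered context")
```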
Stage 3: Orchestrate & Generate
This is the stage everyone fixates on - and almost everyone gets wrong. Generating code is the easy part. Orchestrating agents so the output is coherent, architecturally sound, and actually solves the right problem? That's the hard part. Cursor's CEO puts it bluntly: "If you close your eyes and have AIs build things with shaky foundations ... things start to crumble." The developer's job isn't writing code anymore. It's directing agents while maintaining taste and architectural judgment.
You
Managing parallel agent threads, resolving merge conflicts, defining scope boundaries, and making architectural decisions the agents can't make.
AI
Generating code across multiple files simultaneously, running parallel implementations, proposing alternatives, handling the mechanical work.
Where it goes wrong
Vibe coding without structure. Letting agents make architectural decisions. No "mission control" pattern for tracking what each agent is doing. The result is inconsistent code that works in isolation and fails at integration.
- Delegate in parallel, not serially. Modern tools support multiple agents on separate branches. Use them.
- Reserve architectural decisions for humans. Delegate implementation. This is the most important boundary in the lifecycle.
- Maintain a mission control view: what is each agent working on, what are the dependencies, where are the conflicts.
- Set token budgets per task before generation starts. Open-ended generation is an open-ended credit card. (A budget-tracking sketch follows this list.)
- Review agent output in small batches. Kent Beck's finding: agents will sometimes delete tests to make them pass. Catch this early.
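A sketch of the bookkeeping behind per-task token budgets and a mission-control view, using in-memory Python structures and invented numbers purely for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class AgentTask:
    """One row in the mission-control view: who is doing what, on which branch."""
    agent: str
    branch: str
    scope: str                                      # what this agent is allowed to touch
    depends_on: list[str] = field(default_factory=list)
    token_budget: int = 50_000                      # set before generation starts
    tokens_used: int = 0

    def spend(self, tokens: int) -> None:
        """Stop and escalate instead of letting generation run open-ended."""
        self.tokens_used += tokens
        if self.tokens_used > self.token_budget:
            raise RuntimeError(f"{self.agent} exceeded its budget on {self.scope}")


mission_control = [
    AgentTask("agent-a", "feat/billing-ui", "components/billing/"),
    AgentTask("agent-b", "feat/billing-api", "api/billing/", depends_on=["feat/billing-ui"]),
]

mission_control[0].spend(12_000)
for task in mission_control:
    print(task.agent, task.branch, f"{task.tokens_used}/{task.token_budget} tokens",
          f"waiting on: {task.depends_on or 'nothing'}")
```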
Stage 4: Validate, Eval & Craft
Here's what nobody talks about: AI-generated code has 1.7x more major issues and 2.74x more security vulnerabilities than human-written code. That's not a reason to stop using AI. It's a reason to get extremely good at validation. Intercom learned this the hard way - they pair every UX improvement with a "truth metric." When their AI agent boosted ticket deflection but accuracy dropped, they rolled it back. Speed is not the metric. Truth is.
You
Reviewing outputs for correctness and craft quality. Evaluating business logic. Making judgment calls on edge cases. Deciding what "good enough" means.
AI
Running automated test suites, eval pipelines, regression detection, security scanning, and code quality analysis. Flagging issues for human review.
Where it goes wrong
Accepting generated code without review. Measuring speed instead of quality. The DORA data is clear: AI improves throughput but degrades stability. More code, more risk - unless you validate ruthlessly.
- Build eval pipelines before you build generation pipelines. If you can't measure quality, you can't improve it.
- Track truth metrics: accuracy, hallucination rate, regression frequency. "All tests pass" is table stakes, not success.
- Distinguish between functional correctness (automatable) and craft quality (human judgment). Both matter.
- Implement the Intercom pattern: every AI-driven improvement gets paired with a counter-metric. If the counter degrades, roll back. (This gate is sketched after the list.)
- Design reviews still matter. In a world where AI makes building easy, craft becomes the differentiator. Figma's Dylan Field calls this "pilot, not copilot."
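The counter-metric gate can be made mechanical. A minimal sketch, with invented deltas standing in for whatever your truth metric is:

```python
def should_ship(primary_delta: float, counter_delta: float,
                counter_tolerance: float = 0.0) -> bool:
    """Pair every AI-driven improvement with a truth metric.

    primary_delta: change in the metric being optimized (e.g. ticket deflection)
    counter_delta: change in the paired truth metric (e.g. answer accuracy)
    Ship only if the primary improves AND the counter does not degrade
    beyond the tolerance. Otherwise, roll back.
    """
    return primary_delta > 0 and counter_delta >= -counter_tolerance


# Invented deltas: deflection up 4 points but accuracy down 2 points -> roll back.
print(should_ship(primary_delta=0.04, counter_delta=-0.02))  # False
print(should_ship(primary_delta=0.04, counter_delta=0.01))   # True
```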
Stage 5: Ship & Manage Economics
This stage didn't exist in the traditional SDLC. It exists now because AI-native products have a cost structure that traditional software doesn't: inference. Every API call, every agent loop, every chain-of-thought costs real money. Development costs of $200/month routinely explode to $10,000/month in production. Kyle Poyar's data shows 1,800+ pricing changes among the top 500 SaaS companies in 2025 alone. Nobody has this figured out yet - but the teams that are thinking about it are the ones that will survive.
You
Setting token budgets, monitoring cost-per-action, making model trade-off decisions, aligning inference costs with pricing tiers, building cost dashboards visible to product and engineering.
AI
Serving inference, processing requests, running production workloads. The meter is always running.
Where it goes wrong
No cost visibility. Inference costs scaling linearly with usage. No model version pinning - a provider update breaks production at 2am. Accel's data shows AI-native companies run 7-40% gross margins vs. 76% for traditional SaaS. The economics are different.
- Track cost-per-action, not just total inference spend. Know what each feature costs to serve. (A tracking sketch follows this list.)
- Implement tiered model routing in production: frontier models for complex tasks, smaller models for simple ones. This is your biggest cost lever.
- Pin model versions in production. Test upgrades in staging. Never auto-upgrade.
- Set per-customer inference budgets tied to pricing tiers. Your biggest customer shouldn't be your biggest loss.
- Build cost dashboards visible to product and engineering, not just finance. Everyone who ships features should see what they cost to serve.
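A sketch of cost-per-action tracking by feature and by customer. The per-token prices are assumptions, since real rates depend on your models and providers:

```python
from collections import defaultdict

# Assumed prices per 1M tokens; real rates depend on your model and provider.
PRICE_PER_MTOK = {
    "frontier": {"in": 3.00, "out": 15.00},
    "small": {"in": 0.15, "out": 0.60},
}


def action_cost(tier: str, tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of a single agent action."""
    p = PRICE_PER_MTOK[tier]
    return (tokens_in * p["in"] + tokens_out * p["out"]) / 1_000_000


cost_by_feature = defaultdict(float)
cost_by_customer = defaultdict(float)


def record(feature: str, customer: str, tier: str, tokens_in: int, tokens_out: int) -> float:
    """Log one action so spend is visible per feature and per customer, not just in total."""
    cost = action_cost(tier, tokens_in, tokens_out)
    cost_by_feature[feature] += cost
    cost_by_customer[customer] += cost
    return cost


record("ai_summary", "acme", "frontier", tokens_in=6_000, tokens_out=1_200)
record("autocomplete", "acme", "small", tokens_in=800, tokens_out=150)
print(dict(cost_by_feature))   # e.g. {'ai_summary': 0.036, 'autocomplete': 0.00021}
print(dict(cost_by_customer))  # e.g. {'acme': 0.03621}
```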
Stage 6: Learn & Compound
This is the stage that turns a development process into a competitive moat. Every cycle, you update three things: your context, your harness constraints, and your delegation patterns. Dan Shipper at Every calls this "compounding engineering" - every feature built creates artifacts and agents that make building the next feature easier. The teams that do this well don't just ship faster. They compound faster. That gap widens every quarter.
You
Analyzing cycle outcomes, updating harness constraints, tuning agent delegation patterns, measuring whether cycles are actually getting faster.
AI
Processing outcome data, suggesting harness updates, identifying patterns across cycles, flagging when context has gone stale.
Where it goes wrong
Not closing the loop. Running cycles without capturing what you learned. No measurement of compounding velocity. This is where cognitive debt accumulates - Karpathy's concept for the hidden cost of poorly managed AI interactions.
- After every cycle, update three things: context, harness constraints, and delegation patterns. If you didn't update all three, the cycle is incomplete.
- Measure your Emergence Rate: output quality per unit of human effort, tracked over time. Emergence Capital uses this in their diligence. (One way to track it is sketched after this list.)
- Build a library of proven spec templates from successful cycles. Your best specs become reusable assets.
- Track cognitive debt: accumulated cost of context loss, poorly managed handoffs, and unreliable agent behavior. It compounds faster than technical debt.
- Review and prune context regularly. Stale context degrades everything downstream. Context curation is maintenance, not a one-time setup.
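The lifecycle describes Emergence Rate as output quality per unit of human effort. One way a team might operationalize tracking it, with placeholder quality scores and effort units; this is an assumption about how to measure it, not a prescribed formula:

```python
from dataclasses import dataclass


@dataclass
class CycleRecord:
    """Outcome of one lifecycle loop, captured so the next loop can compound on it."""
    cycle: str
    quality_score: float      # e.g. eval pass rate or review score, 0..1 (placeholder)
    human_hours: float        # human effort spent this cycle (placeholder unit)
    context_updated: bool
    harness_updated: bool
    delegation_updated: bool

    @property
    def emergence_rate(self) -> float:
        """Output quality per unit of human effort, compared cycle over cycle."""
        return self.quality_score / self.human_hours

    @property
    def loop_closed(self) -> bool:
        """A cycle only counts as complete if all three artifacts were updated."""
        return self.context_updated and self.harness_updated and self.delegation_updated


history = [
    CycleRecord("2025-W18", quality_score=0.78, human_hours=40, context_updated=True,
                harness_updated=True, delegation_updated=True),
    CycleRecord("2025-W20", quality_score=0.84, human_hours=32, context_updated=True,
                harness_updated=True, delegation_updated=False),  # loop not closed
]

for c in history:
    print(c.cycle, f"emergence_rate={c.emergence_rate:.3f}", f"loop_closed={c.loop_closed}")
```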
Present at every stage
Three things that don't fit neatly into one stage because they span all of them. Ignore these and the lifecycle breaks down regardless of how well you execute each stage.
Token Economics
Inference costs inform architecture decisions at Stage 2, sprint planning at Stage 3, quality trade-offs at Stage 4, and production budgets at Stage 5. If your team doesn't think in tokens, they're flying blind on the economics of their own product.
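Thinking in tokens is mostly arithmetic. An illustrative back-of-the-envelope, with every number assumed:

```python
# Illustrative numbers only: one agent action and what it implies at scale.
tokens_per_action = 8_000          # prompt + completion, assumed
price_per_mtok = 5.00              # blended $/1M tokens, assumed
actions_per_user_per_month = 400   # assumed
seat_price = 49.00                 # assumed monthly price

cost_per_action = tokens_per_action * price_per_mtok / 1_000_000        # $0.04
inference_cost_per_seat = cost_per_action * actions_per_user_per_month  # $16.00
gross_margin_per_seat = (seat_price - inference_cost_per_seat) / seat_price

print(f"${cost_per_action:.2f}/action, ${inference_cost_per_seat:.2f}/seat/month, "
      f"{gross_margin_per_seat:.0%} margin before any other COGS")
```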
Role Fluidity
The best person to write the spec might be the designer. The best person to validate might be the domain expert. Andrew Ng's team proposed a 1:0.5 PM-to-engineer ratio - twice as many PMs as engineers. Lenny calls this "a sign of where the world is going." Titles matter less than context and judgment.
Cognitive Debt
Every vague prompt, every unreviewed output, every skipped eval adds to a debt that compounds faster than technical debt. Karpathy coined the concept. It's the accumulated cost of poorly managed AI interactions, context loss, and unreliable agent behavior. Technical debt slows you down. Cognitive debt makes you wrong.
What the optimists leave out
This lifecycle model is only credible if it acknowledges what pushes back against it. Here's what the data actually says.
The METR Paradox
In a rigorous randomized controlled trial, experienced developers were 19% slower with AI tools - despite believing they were 20% faster. The perception gap is the real danger. You think you're moving faster. Your metrics say otherwise.
The DORA Stability Warning
The 2025 DORA report - nearly 5,000 respondents - found AI improves delivery throughput but degrades delivery stability. More code shipped faster, but more things break. AI doesn't fix teams. It amplifies what's already there. Good and bad.
The Quality Tax
CodeRabbit's analysis of 470 GitHub PRs: AI co-authored code surfaces 1.7x more major issues and 2.74x more security vulnerabilities per review. The industry calls this "AI slop" - code that looks correct and isn't. The validation stage exists because of this data.
The Tool Builders' Own Warning
Cursor's CEO Michael Truell - who built the fastest-growing developer tool in history - warns against vibe coding with "shaky foundations." Kent Beck - inventor of TDD - says agents will delete tests to make them pass. When the people building and championing these tools say "slow down," pay attention.
How the lifecycle changes at each maturity stage
The same lifecycle stage looks different depending on where you are on the maturity curve. This is where the lifecycle and the framework connect.
| Lifecycle Stage | Legacy | AI-curious | AI-enhanced | AI-first | AI-native |
|---|---|---|---|---|---|
| Specify & Constrain | PRDs and Jira | Basic prompts | Structured specs | Spec-as-code | Self-evolving specs |
| Build Context | Arch docs on a wiki | README files | Context libraries | Dynamic routing | Autonomous context |
| Orchestrate & Generate | All manual coding | Copilot autocomplete | Guided generation | Agent delegation | Multi-agent swarms |
| Validate & Craft | Manual QA | Basic CI/CD | Eval pipelines | Continuous eval | Autonomous quality |
| Ship & Manage Economics | No AI costs | Untracked spend | Cost monitoring | Token budgets | Self-optimizing |
| Learn & Compound | Quarterly retros | Ad hoc learning | Feedback loops | Systematic tuning | Compounding flywheel |
Who leads each stage
Roles are blurring. Intercom's designers write production code. Linear has 2 PMs for 87 people. The point isn't who has the title - it's who has the context.
Product Manager
- 01 Leads: structured specs, harness constraints, acceptance criteria
- 02 Supports: domain context, model selection priorities
- 03 Manages: scope decisions, dependency resolution, trade-offs
- 04 Validates: business logic, user-facing quality, craft
- 05 Owns: pricing alignment, cost-per-feature economics
- 06 Drives: cycle retrospectives, spec template library
Engineer
- 01 Supports: feasibility checks, architectural constraints
- 02 Leads: context engineering, model routing, version pinning
- 03 Leads: agent orchestration, parallel delegation, merge resolution
- 04 Leads: eval pipelines, automated testing, code review
- 05 Leads: deployment, inference monitoring, AI FinOps
- 06 Tunes: delegation patterns, context pruning, harness updates
Designer
- 01 Leads: interaction specs, UX patterns, user-facing constraints
- 02 Supports: design system as context, component libraries
- 03 Generates: prototypes, UI variations, design exploration
- 04 Validates: craft quality, visual coherence, accessibility
- 05 Supports: cost-aware design decisions, feature scoping
- 06 Evolves: design system, pattern library, UX standards
What each product function does at each stage
The operations framework defines 6 product functions. Here's how each one participates across the lifecycle. This is where the two frameworks connect.
| Stage | Strategy | Design | Development | Data | Operations | GTM & Growth |
|---|---|---|---|---|---|---|
| Specify & Constrain | Frame the problem, set strategic intent, define success criteria | Write interaction specs, define UX constraints and user flows | Assess feasibility, set architectural constraints and harness rules | Define measurement criteria, identify data requirements upfront | Set SLAs, compliance rules, operational constraints | Define positioning constraints, messaging guardrails |
| Build Context | Curate market and competitive intelligence as agent context | Maintain design system and component libraries as context | Lead context engineering, model routing, version pinning | Build data pipelines as context, establish analytics baselines | Maintain infrastructure context, toolchain configuration | Curate customer signals and market intelligence as context |
| Orchestrate & Generate | Prioritize parallel workstreams, resolve scope conflicts | Generate prototypes and UI variations, rapid exploration | Lead agent orchestration, parallel delegation, merge resolution | Instrument agent activity, track generation metrics | Manage CI/CD for agent outputs, parallel environments | Generate launch assets, messaging variations, content |
| Validate & Craft | Verify strategic alignment, review business logic | Validate craft quality, visual coherence, accessibility | Run eval pipelines, automated tests, code review | Measure truth metrics, eval accuracy, detect regressions | Enforce quality gates, security scanning, compliance checks | Test positioning effectiveness, validate adoption metrics |
| Ship & Manage Economics | Align pricing with value delivered, model cost trade-offs | Make cost-aware design decisions, scope feature presentation | Deploy to production, monitor inference, manage AI FinOps | Monitor production metrics, track cost-per-action | Scale infrastructure, manage incidents, ensure uptime | Execute launch, personalize onboarding, track adoption |
| Learn & Compound | Refine priorities from outcome data, update strategic context | Evolve design system, update patterns from user feedback | Tune delegation patterns, update harness, prune context | Analyze cycle data, surface compounding patterns, flag drift | Optimize operational workflows, automate recurring patterns | Refine positioning from adoption data, feed signals back to strategy |
Every function, every stage
In the traditional SDLC, functions hand off sequentially: strategy sets direction, design mocks it up, engineering builds it, then QA tests it. In the AI-native lifecycle, every function participates at every stage simultaneously. Strategy doesn't disappear after Stage 1.
Context is the handoff
Instead of documents and meetings, each function contributes context that other functions consume. Design's component library becomes engineering's generation constraint. Data's metrics become strategy's feedback loop. The context system replaces the handoff chain.
The loop connects them
Stage 6 (Learn & Compound) is where every function closes its loop. GTM & growth adoption data feeds strategy's next cycle. Data's cycle analysis updates engineering's harness. Design's pattern evolution becomes the next cycle's context. Compounding happens across functions, not within them.
Import this lifecycle into your PM tool
6 stages, 34 tasks - ready to use. No sign-up required.
Built with its own methodology
Not just a framework on a page. Dacard itself was built using every stage described above. Every decision, framework application, and AI workflow documented as proof-of-practice.
Every decision documented
From specifying constraints to shipping and compounding, the build log traces every stage of this lifecycle in action. See the methodology at work, not just in theory.
Read the build log
See this lifecycle in action
Take the assessment to score against these stages, or open the app for AI-generated scoring.
Free. No sign-up required.