In AI, a week is a long time. A month is an era. Not long ago, building a new capability for a conversational AI platform meant weeks of scaffolding, meetings, manual wiring, and integration dread. Here’s what it looks like when you build the infrastructure to move differently.
Syntea is the AI learning platform at IU International University of Applied Sciences — thousands of students every day, talking to an AI tutor, running practice exams, tracking their progress. Under the hood: a central AI orchestrator, dozens of interactive tools, and a growing ecosystem of specialized agents. Every new capability has to work within this system. Here’s how we built the infrastructure that lets the phases of development collapse into each other.
Five Phases. Merging Into One.
Every product team has the same five phases: concept, build, test, deploy, iterate. They’re universal. They don’t go away. But the difference between a team that takes months and one that takes hours isn’t that the fast team skips phases — it’s that their phases overlap.
When the infrastructure is right, concept becomes the first build — you define a capability and immediately talk to a working version. Testing becomes using — an AI agent holds a real conversation and evaluates whether the tutor responds correctly. Monitoring feeds directly into the next iteration. The phases don’t disappear. They merge.
We built the infrastructure for this. With each cycle, the phases compress a little more. What matters isn’t a fixed timeline; it’s that every phase has been engineered to collapse into the ones around it. The result: release velocity nearly tripled in roughly half a year, with largely the same team.
The Five Phases — Concept to Production
These five phases aren’t a rigid sequence; they’re a loop, and in practice they rarely happen in isolation. For a small capability, concept and build are the same action: you define the capability and talk to a working version in the same session. For a larger one, the phases separate more, but the infrastructure ensures they still overlap wherever possible, and each cycle compresses the next. The direction is always the same: tighter loops, faster cycles.
Define & Prototype
Every capability starts with intent: what problem does this solve, and what does a successful interaction look like? Trigger language, conversation scenarios, success criteria. This is universal. But with the right infrastructure, concept design doesn’t end with a document — it ends with a working prototype.
A recent example: we needed to help students navigate IU’s exam registration process — deadlines, re-registration rules, special accommodations. The conventional approach would be a search pipeline with a vector database and retrieval system. Instead, we tried something simpler: compress 40 FAQ documents into structured instructions and embed them directly into the AI’s context. No database. No retrieval layer. The AI reasons over the knowledge directly. A working proof of concept in under an hour — already covering the majority of common exam questions, enough to validate the approach before investing further.
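To make the "no database, no retrieval layer" idea concrete, here is a minimal sketch of how compressed FAQ entries could be rendered into a single instruction block for the model's context. The `FaqEntry` structure, the function name, the prompt wording, and the placeholder entries are all assumptions for illustration; the post does not describe the actual format Syntea uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaqEntry:
    """One compressed FAQ item (hypothetical structure, not from the post)."""
    topic: str
    question: str
    answer: str


def build_exam_faq_instructions(entries: list[FaqEntry]) -> str:
    """Render FAQ entries as one instruction block for the model's context."""
    # Group entries by topic so related rules sit next to each other.
    by_topic: dict[str, list[FaqEntry]] = {}
    for entry in entries:
        by_topic.setdefault(entry.topic, []).append(entry)

    lines = [
        "You answer questions about exam registration.",
        "Use only the facts below. If the answer is not covered, say so.",
    ]
    for topic in sorted(by_topic):
        lines.append("")
        lines.append(f"## {topic}")
        for entry in by_topic[topic]:
            lines.append(f"Q: {entry.question}")
            lines.append(f"A: {entry.answer}")
    return "\n".join(lines)


# Placeholder content for illustration only; not real IU policy.
EXAMPLE_ENTRIES = [
    FaqEntry("Deadlines", "When does registration close?", "<deadline rule goes here>"),
    FaqEntry("Re-registration", "Can I re-register after failing?", "<re-registration rule goes here>"),
]

if __name__ == "__main__":
    print(build_exam_faq_instructions(EXAMPLE_ENTRIES))
```

The resulting string would be passed to the model as part of its instructions. Everything lives in one place, which is what makes this cheap to try and easy to throw away if the approach does not hold up.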
We call this the “v0 approach”: start with the simplest thing that could work, add complexity only when you’ve proven the concept. The concept phase and the build phase collapsed into one action.
Scaffold & Validate
Once a concept proves itself, the build phase turns it into production-grade code. With the right tooling, this phase compresses dramatically: an AI coding agent scaffolds the service with correct structure, types, and configuration from a single prompt. The full local stack — dozens of services — starts in one command. The gap between “I have a working prototype” and “I have a production-ready service running locally” shrinks to minutes.
Local ≠ lite. A local environment that doesn’t match production isn’t a development environment — it’s a confidence trap. The overhead of running the real stack locally is always less than the cost of debugging production-only failures.
AI Tests AI
Here’s where AI-native development diverges from everything that came before. In deterministic software, you test code — does this function return the right value? In AI products, you test conversations — does the tutor correctly identify what the student needs? Does it know when not to act? That requires running real scenarios against a real model.
LLMs are fundamentally non-deterministic. The same input can produce different outputs. Behaviour depends on context, conversation history, and subtle prompt interactions you didn’t anticipate. Scenario-based testing doesn’t eliminate this uncertainty — it makes it visible and manageable.
Our QA system: an AI agent drives a real browser, holds multi-turn conversations as a student would, pursues specific test goals, and evaluates whether the AI tutor responded correctly. Test scenarios are defined in configuration files — readable, version-controlled, improvable over time. Because LLM responses vary, we run scenarios in parallel: five simultaneous test runs surface non-deterministic behaviour that single runs would miss. The output is structured — PASS, WARN, or FAIL with reasoning — so it feeds directly into the next decision.
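The parallel-run idea can be sketched in a few lines. This is not our QA system: the browser-driven conversation is replaced by a deterministic stub (`fake_run`), and the "worst verdict wins" aggregation rule, the `flaky` field, and all names are assumptions chosen for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

# Ordered from best to worst so the index doubles as a severity rank.
VERDICTS = ("PASS", "WARN", "FAIL")


@dataclass(frozen=True)
class RunResult:
    verdict: str    # one of VERDICTS
    reasoning: str  # the evaluator's explanation


def run_in_parallel(run_once: Callable[[int], RunResult], runs: int = 5) -> list[RunResult]:
    """Execute the same scenario `runs` times concurrently."""
    with ThreadPoolExecutor(max_workers=runs) as pool:
        return list(pool.map(run_once, range(runs)))


def aggregate(results: list[RunResult]) -> dict:
    """Summarise runs: the worst verdict wins, mixed verdicts count as flaky."""
    counts = Counter(r.verdict for r in results)
    worst = max(counts, key=VERDICTS.index)
    return {
        "verdict": worst,
        "counts": {v: counts.get(v, 0) for v in VERDICTS},
        "flaky": len(counts) > 1,
        "reasons": [r.reasoning for r in results if r.verdict != "PASS"],
    }


def fake_run(i: int) -> RunResult:
    """Stand-in for a real conversation; run 3 deliberately misbehaves."""
    if i == 3:
        return RunResult("WARN", "Tutor answered but skipped the deadline.")
    return RunResult("PASS", "Tutor identified the student's goal.")


if __name__ == "__main__":
    print(aggregate(run_in_parallel(fake_run)))
```

With the stub above, four runs pass and one warns, so a single run would most likely have reported PASS. Running five at once is what makes the variation visible.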
Unit tests tell you the code works. Scenario tests tell you the product works. Both are necessary. Neither replaces the other. In AI systems, the gap between them is where most production incidents live.
Push & Flag
Deployment should not be an event. The infrastructure definition should be generated from your service configuration, committed alongside your code, and applied through a single command. Feature flags collapse deployment and release into a continuous motion — you push to production, then control who sees it and when.
We recently shipped a new multi-view interface, which gives interactive tools nano, micro, and macro display modes, behind a feature flag. That enabled a controlled rollout by user segment before full release. The developer handles everything up to the production boundary; a single privileged step completes the deployment. No runbook, no specialist, no meeting.
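A segment-based feature flag can be very small. The sketch below shows the general technique only; the flag name, the segment names, and the `FeatureFlag` class are hypothetical and do not reflect the flag system we actually run.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """A flag that is on for selected user segments, or for everyone."""
    name: str
    enabled_segments: set[str] = field(default_factory=set)
    released_to_all: bool = False

    def is_enabled_for(self, user_segment: str) -> bool:
        # Full release overrides the per-segment allow list.
        return self.released_to_all or user_segment in self.enabled_segments


# Hypothetical flag and segment names, for illustration only.
multi_view = FeatureFlag("multi_view_interface", enabled_segments={"internal", "pilot"})

if __name__ == "__main__":
    for segment in ("internal", "pilot", "general"):
        print(segment, multi_view.is_enabled_for(segment))
```

The code is already in production for every segment; only the check decides who sees it. Moving from pilot to full release is then a change to flag state (here, setting `released_to_all`), not another deployment.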
Observe & Compress
In traditional software, monitoring and iteration are separate activities — you watch dashboards, then plan the next sprint. In AI systems, monitoring feeds directly into the next build. Capability-level traces show you exactly which AI decision went wrong, how long inference took, and where behaviour diverged from expectations. There is no separate “analysis phase.” The trace IS the next iteration’s starting point.
A concrete example: one of our AI capabilities was intermittently timing out. End-to-end tracing showed the load balancer had a 30-second timeout, while the capability needed 16–28 seconds, leaving as little as two seconds of headroom on the slowest requests. Most of that time was spent on LLM inference, not on the external API it called (which responded in 118 milliseconds). Without capability-level tracing, this would have looked like a general platform issue. With it, we identified the exact bottleneck in under an hour and knew exactly what to fix.
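The arithmetic behind that diagnosis is simple enough to write down. The sketch below uses only the figures quoted above (30-second timeout, 16 to 28 seconds per request, 118 ms for the external API); the function names are made up for illustration, and the second function measures time spent outside the external API, which is an upper bound on inference time rather than a direct measurement of it.

```python
LOAD_BALANCER_TIMEOUT_S = 30.0  # timeout reported in the incident
EXTERNAL_API_S = 0.118          # external API response time reported in the incident


def headroom_s(total_s: float, timeout_s: float = LOAD_BALANCER_TIMEOUT_S) -> float:
    """Seconds left before the load balancer would cut the request off."""
    return timeout_s - total_s


def share_outside_external_api(total_s: float, api_s: float = EXTERNAL_API_S) -> float:
    """Fraction of the request time not spent waiting on the external API."""
    return (total_s - api_s) / total_s


if __name__ == "__main__":
    for total in (16.0, 28.0):  # fastest and slowest observed durations
        print(
            f"total={total:.0f}s "
            f"headroom={headroom_s(total):.0f}s "
            f"outside_api={share_outside_external_api(total):.1%}"
        )
```

At the slow end, only two seconds of headroom remain, and in both cases more than 99% of the time is spent somewhere other than the external API. That is why speeding up the API call would not have helped.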
Build the loop, not just the product. A team that can observe, change, test, and deploy in hours will outperform one that takes weeks — not just on the first capability, but on every one after it.
We’ve seen this concretely. Our exam FAQ capability started as a proof of concept — 40 documents compressed into the AI’s instructions, built in under an hour. Once validated, the second iteration added a dedicated backend service with a live API serving a much larger knowledge base — and a dedicated agent, so the main AI tutor’s context stays clean while the FAQ agent handles its specialized domain. That took about a week. A third version is planned. Each iteration builds on the scaffold, the test suite, and the monitoring already in place. The phases compressed further each time.
Build the Infrastructure to Move Fast. Then Move Fast.
The five phases above aren’t just a workflow — they’re a statement about how AI platform teams should be structured. A good platform eliminates the detour between domain knowledge and a working capability. That only works when the platform itself is treated as a product — shaped by the same people who feel the problems it solves. The closer product and platform sit, the faster each cycle becomes. We’ve watched this compound: what required long coordination loops early on now flows naturally, because the context is shared.
And critically — that platform must be designed not just for humans to use, but for AI agents to operate autonomously within it. Cursor and Claude Code don’t read wikis. They read well-structured CLI help text, consistent project layouts, and clear error messages. If your tools aren’t optimised for agents, you’re leaving most of their capability unused.
Agent-first tooling is the multiplier. A CLI designed for humans can be used by AI agents. A CLI designed for AI agents makes your entire team dramatically faster — because both humans and agents can operate it fluently, discover its capabilities, and chain its commands into autonomous workflows. This is the difference between a tool and a platform.
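As a rough illustration of what "designed for agents" can mean in practice, here is a minimal sketch built on Python's standard `argparse` module: every command and argument carries help text an agent can discover, and a `--json` switch gives it output it can parse without scraping. The tool name `platformctl`, the `status` subcommand, and the stubbed state are hypothetical; this is not our actual CLI.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="platformctl",  # hypothetical tool name
        description="Manage capabilities on the platform.",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    status = sub.add_parser("status", help="Show the state of one capability.")
    status.add_argument("capability", help="Capability name, e.g. exam-faq.")
    status.add_argument(
        "--json",
        action="store_true",
        help="Emit machine-readable JSON instead of plain text.",
    )
    return parser


def run(argv: list[str]) -> str:
    """Parse arguments and return the text the CLI would print."""
    args = build_parser().parse_args(argv)
    # Stubbed lookup; a real tool would query the platform here.
    state = {"capability": args.capability, "state": "unknown"}
    if args.json:
        return json.dumps(state, sort_keys=True)
    return f"{state['capability']}: {state['state']}"


if __name__ == "__main__":
    print(run(["status", "exam-faq", "--json"]))
```

The same command serves both audiences: a person reads the plain-text line, while an agent calls `--help` to discover the subcommands and then chains the JSON output into its next step.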
The five phases haven’t disappeared. They’ve merged into a tighter and tighter loop. What required a full team and weeks of handoffs a year ago now takes two people a few days. A small capability takes one person an afternoon. The direction is always the same: more overlap, fewer boundaries, faster cycles.
Our release velocity nearly tripled in roughly half a year — from 27 to 74 releases per month — with largely the same team. That’s not because we hired faster. It’s because the infrastructure compounds: every capability that ships makes the next one easier to build, test, and deploy. The phases keep compressing. The loop keeps tightening.
