SynteaNext — Sprinting in the Era of AI

Engineering
AI Platform
Production
February 2026

In AI, a week is a long time. A month is an era. Not long ago, building a new capability for a conversational AI platform meant weeks of scaffolding, meetings, manual wiring, and integration dread. Here’s what it looks like when you build the infrastructure to move differently.

Syntea is the AI learning platform at IU International University of Applied Sciences — thousands of students every day, talking to an AI tutor, running practice exams, tracking their progress. Under the hood: a central AI orchestrator, dozens of interactive tools, and a growing ecosystem of specialized agents. Every new capability has to work within this system. Here’s how we built the infrastructure that lets the phases of development collapse into each other.

Five Phases. Merging Into One.

Every product team has the same five phases: concept, build, test, deploy, iterate. They’re universal. They don’t go away. But the difference between a team that takes months and one that takes hours isn’t that the fast team skips phases — it’s that their phases overlap.

When the infrastructure is right, concept becomes the first build — you define a capability and immediately talk to a working version. Testing becomes using — an AI agent holds a real conversation and evaluates whether the tutor responds correctly. Monitoring feeds directly into the next iteration. The phases don’t disappear. They merge.

We built the infrastructure for this. With each cycle, the phases compress a little more. What matters isn’t a fixed timeline; it’s that every phase has been engineered to collapse into the ones around it. The result: release velocity nearly tripled in roughly half a year, from 27 to 74 releases per month, with largely the same team.

The Five Phases — Concept to Production

These five phases aren’t a rigid sequence — they’re a loop. For small capabilities, they collapse into a single session. For larger ones, they separate more. But the infrastructure ensures they overlap wherever possible, and each cycle compresses the next.

Concept

Define & Prototype

Define intent, validate immediately. Concept and first build can be the same action.

Build

Scaffold & Validate

AI agents scaffold. The full stack runs locally. Minutes from idea to working version.

Test

AI Tests AI

An agent holds real conversations and evaluates correctness. Parallel runs catch non-determinism.

Deploy

Push & Flag

Generated infrastructure. Feature flags separate deployment from release. One privileged step.

Iterate

Observe & Compress

Traces show exactly what to fix. The next cycle starts with clarity. Each loop is faster.

In practice, these phases rarely happen in isolation, and the infrastructure keeps them overlapping wherever possible. The direction is always the same: tighter loops, faster cycles.

Inside Each Phase

Concept

Define & Prototype

Every capability starts with intent: what problem does this solve, and what does a successful interaction look like? Trigger language, conversation scenarios, success criteria. This is universal. But with the right infrastructure, concept design doesn’t end with a document — it ends with a working prototype.

A recent example: we needed to help students navigate IU’s exam registration process — deadlines, re-registration rules, special accommodations. The conventional approach would be a search pipeline with a vector database and retrieval system. Instead, we tried something simpler: compress 40 FAQ documents into structured instructions and embed them directly into the AI’s context. No database. No retrieval layer. The AI reasons over the knowledge directly. A working proof of concept in under an hour — already covering the majority of common exam questions, enough to validate the approach before investing further.

We call this the “v0 approach”: start with the simplest thing that could work, add complexity only when you’ve proven the concept. The concept phase and the build phase collapsed into one action.
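A minimal sketch of what "embedding knowledge directly into the AI's context" could look like. The `FaqEntry` type, the `build_instructions` helper, the bracketed topic layout, and the character budget are all assumptions made for illustration; the article does not describe the actual format, and the answers below are placeholders rather than real FAQ content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaqEntry:
    topic: str
    question: str
    answer: str


def build_instructions(entries: list[FaqEntry], max_chars: int = 20_000) -> str:
    """Render FAQ entries, grouped by topic, as one block of model instructions."""
    by_topic: dict[str, list[FaqEntry]] = {}
    for entry in entries:
        by_topic.setdefault(entry.topic, []).append(entry)

    lines = [
        "You answer student questions about exam registration.",
        "Use only the knowledge below. If it does not cover the question, say so.",
    ]
    for topic in sorted(by_topic):
        lines.append("")
        lines.append(f"[{topic}]")
        for entry in by_topic[topic]:
            lines.append(f"Q: {entry.question}")
            lines.append(f"A: {entry.answer}")

    text = "\n".join(lines)
    # Assumed guard: fail loudly instead of silently overflowing the context budget.
    if len(text) > max_chars:
        raise ValueError(f"instructions are {len(text)} chars; budget is {max_chars}")
    return text


# Placeholder entries; real answers would come from the FAQ documents.
entries = [
    FaqEntry("re-registration", "Can I re-register after failing?", "<answer from FAQ>"),
    FaqEntry("deadlines", "When does registration close?", "<answer from FAQ>"),
]
instructions = build_instructions(entries)
```

The resulting string would be passed to the model as part of its instructions. The size check is the point at which a v0 like this tells you it has outgrown the approach and needs a retrieval layer or a dedicated service.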

What enables this compression

Clear trigger definition: Every capability needs a description the AI uses to decide when to activate it. Specific trigger language is the single most important thing you’ll write, and it can be tested immediately.

User scenarios that become test cases: Write 3–5 realistic conversation scenarios before building. They validate the concept, then become your QA test suite. Two phases, one artifact.

Instant prototyping through AI context: Embedding knowledge directly into the AI’s instructions lets you test a concept before writing a single line of infrastructure code. If it works, build the real thing. If it doesn’t, you lost an hour, not a sprint.
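The first two points can be sketched as a single data structure: a capability carries its trigger description and the scenarios that later become tests. Everything here is hypothetical, including the `Capability` and `Scenario` names, the shape of `tool_definition`, and the eight-word threshold, which is an arbitrary heuristic rather than a rule from the platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    student_message: str
    expect_activation: bool  # False covers "knows when not to act"


@dataclass(frozen=True)
class Capability:
    name: str
    trigger: str  # description the model reads to decide when to activate
    scenarios: tuple[Scenario, ...] = ()

    def problems(self) -> list[str]:
        """Return concept-level issues that are cheap to catch before building."""
        found = []
        if len(self.trigger.split()) < 8:  # arbitrary illustrative threshold
            found.append("trigger description is too short to be specific")
        if not 3 <= len(self.scenarios) <= 5:
            found.append("expected 3 to 5 scenarios")
        if all(s.expect_activation for s in self.scenarios):
            found.append("no negative scenario (a message that must not trigger)")
        return found

    def tool_definition(self) -> dict:
        """Assumed shape of what is handed to the model; real platforms differ."""
        return {"name": self.name, "description": self.trigger}


exam_faq = Capability(
    name="exam_faq",
    trigger=(
        "Use when a student asks about exam registration deadlines, "
        "re-registration rules, or special accommodations."
    ),
    scenarios=(
        Scenario("When do I have to register for my exam?", True),
        Scenario("I failed my exam, can I sign up again?", True),
        Scenario("Can you explain the chain rule?", False),
    ),
)
```

Keeping the negative scenario in the same artifact is a deliberate choice: it makes "does it know when not to act?" a check that exists from the first draft of the concept.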

Build

Scaffold & Validate

Once a concept proves itself, the build phase turns it into production-grade code. With the right tooling, this phase compresses dramatically: an AI coding agent scaffolds the service with correct structure, types, and configuration from a single prompt. The full local stack — dozens of services — starts in one command. The gap between “I have a working prototype” and “I have a production-ready service running locally” shrinks to minutes.

💡

Local ≠ lite. A local environment that doesn’t match production isn’t a development environment — it’s a confidence trap. The overhead of running the real stack locally is always less than the cost of debugging production-only failures.

What enables this compression

AI agents optimised for your codebase: A Claude Code or Cursor agent with embedded platform knowledge generates a working service skeleton (endpoint definitions, data models, deployment configuration) in under a minute, with no documentation reading required.

CLI designed for agents, not just humans: Our CLI scaffolds services, starts the full local stack, and runs test suites. Every command and flag is structured so an AI coding agent can discover and invoke it autonomously. When an agent reads your CLI as fluently as a human reads a README, the entire team moves faster.

One-command full-stack orchestration: Starting the entire system (dozens of services including message brokers, databases, caching layers, the AI core, and the UI) is a single command. Agents and developers alike bring the stack up and down without surprises.
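One way to make a CLI discoverable by agents is to drive it from a single command registry that produces both the human help text and a machine-readable description. The sketch below uses Python's standard `argparse`; the program name `platform`, the commands `scaffold`, `up`, and `test`, and the `--json` and `--runs` flags are invented for illustration and are not the actual SynteaNext CLI.

```python
import argparse
import json

# Hypothetical command registry: the single source for help text and JSON output.
COMMANDS = {
    "scaffold": {
        "help": "Create a new service skeleton.",
        "args": [{"name": "name", "help": "Service name, for example exam-faq."}],
    },
    "up": {"help": "Start the full local stack.", "args": []},
    "test": {
        "help": "Run scenario tests.",
        "args": [
            {"name": "--runs", "help": "Parallel runs per scenario.",
             "type": "int", "default": 5},
        ],
    },
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="platform", description="Hypothetical platform CLI."
    )
    parser.add_argument("--json", action="store_true",
                        help="Emit machine-readable output.")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, spec in COMMANDS.items():
        cmd = sub.add_parser(name, help=spec["help"], description=spec["help"])
        for arg in spec["args"]:
            kwargs = {"help": arg["help"]}
            if "default" in arg:
                kwargs["default"] = arg["default"]
            if arg.get("type") == "int":
                kwargs["type"] = int
            cmd.add_argument(arg["name"], **kwargs)
    return parser


def describe_commands() -> str:
    """Stable JSON listing of every command, so an agent need not parse help text."""
    return json.dumps(COMMANDS, indent=2, sort_keys=True)


args = build_parser().parse_args(["--json", "test", "--runs", "3"])
```

Because the parser and the JSON description come from the same registry, the two cannot drift apart, which matters more for an agent than for a human who can guess around stale documentation.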

Test

AI Tests AI

Here’s where AI-native development diverges from everything that came before. In deterministic software, you test code — does this function return the right value? In AI products, you test conversations — does the tutor correctly identify what the student needs? Does it know when not to act? That requires running real scenarios against a real model.

LLMs are fundamentally non-deterministic. The same input can produce different outputs. Behaviour depends on context, conversation history, and subtle prompt interactions you didn’t anticipate. Scenario-based testing doesn’t eliminate this uncertainty — it makes it visible and manageable.

Our QA system: an AI agent drives a real browser, holds multi-turn conversations as a student would, pursues specific test goals, and evaluates whether the AI tutor responded correctly. Test scenarios are defined in configuration files — readable, version-controlled, improvable over time. Because LLM responses vary, we run scenarios in parallel: five simultaneous test runs surface non-deterministic behaviour that single runs would miss. The output is structured — PASS, WARN, or FAIL with reasoning — so it feeds directly into the next decision.
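The parallel-run idea can be sketched with the standard library. The browser-driving agent is replaced by a stub, and the aggregation rule (any FAIL fails the scenario, any WARN downgrades it, and disagreement between runs is reported separately) is an assumption for illustration, not the platform's documented behaviour.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RunResult:
    verdict: str    # "PASS", "WARN" or "FAIL"
    reasoning: str


def run_in_parallel(run_once: Callable[[int], RunResult], runs: int = 5) -> list[RunResult]:
    """Execute the same scenario `runs` times concurrently."""
    with ThreadPoolExecutor(max_workers=runs) as pool:
        return list(pool.map(run_once, range(runs)))


def aggregate(results: list[RunResult]) -> dict:
    """Collapse several runs into one structured outcome (assumed rule)."""
    if not results:
        raise ValueError("no results to aggregate")
    counts = Counter(r.verdict for r in results)
    if counts["FAIL"]:
        overall = "FAIL"
    elif counts["WARN"]:
        overall = "WARN"
    else:
        overall = "PASS"
    return {
        "overall": overall,
        "counts": dict(counts),
        "consistent": len(counts) == 1,  # False means the runs disagreed
    }


def fake_run(index: int) -> RunResult:
    # Stub standing in for a real multi-turn conversation; run 3 misbehaves.
    if index == 3:
        return RunResult("FAIL", "tutor answered without activating the capability")
    return RunResult("PASS", "tutor activated the capability and answered correctly")


summary = aggregate(run_in_parallel(fake_run, runs=5))
```

A single run of this stub would pass four times out of five; only the combined view shows that the behaviour is unstable.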

⚠️

Unit tests tell you the code works. Scenario tests tell you the product works. Both are necessary. Neither replaces the other. In AI systems, the gap between them is where most production incidents live.

What enables this compression

An automated QA agent that behaves like a real user: An AI agent that holds a real conversation, pursues goals, reacts to responses, and evaluates correctness. Configurable per scenario: persona, goals, environment, number of runs.

Scenario files as version-controlled artifacts: Test scenarios live alongside the code, serving as both a specification and a regression guardrail in one file. The same scenarios that validated the concept now guard the production version.

Multi-run variance detection: Probabilistic systems need probabilistic testing. Run scenarios multiple times to expose variance that single runs miss.
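A scenario file might look roughly like the following. The article names the configurable fields (persona, goals, environment, number of runs) but not the file format, so JSON, the field names, and the `load_scenario` helper are assumptions chosen to keep the sketch dependency-free.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioConfig:
    name: str
    persona: str
    goals: tuple[str, ...]
    environment: str
    runs: int = 5


def load_scenario(text: str) -> ScenarioConfig:
    """Parse and validate one scenario definition (hypothetical schema)."""
    raw = json.loads(text)
    missing = {"name", "persona", "goals", "environment"} - raw.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not raw["goals"]:
        raise ValueError("a scenario needs at least one goal")
    runs = int(raw.get("runs", 5))
    if runs < 1:
        raise ValueError("runs must be at least 1")
    return ScenarioConfig(
        name=raw["name"],
        persona=raw["persona"],
        goals=tuple(raw["goals"]),
        environment=raw["environment"],
        runs=runs,
    )


EXAMPLE_FILE = """
{
  "name": "exam-registration-deadline",
  "persona": "First-semester student who is unsure about deadlines",
  "goals": ["Find out when exam registration closes"],
  "environment": "staging",
  "runs": 5
}
"""
scenario = load_scenario(EXAMPLE_FILE)
```

Validating at load time means a malformed scenario fails in review or CI rather than halfway through a browser session.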

Deploy

Push & Flag

Deployment should not be an event. The infrastructure definition should be generated from your service configuration, committed alongside your code, and applied through a single command. Feature flags decouple deployment from release: you push to production, then control who sees it and when.

We recently shipped a new multi-view interface — giving interactive tools nano, micro, and macro display modes — behind a feature flag, enabling controlled rollout by user segment before full release. The developer handles everything up to the boundary; a single privileged step completes the deployment. No runbook, no specialist, no meeting.
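A controlled rollout by user segment can be sketched in a few lines. The `FeatureFlag` class, the segment names, and the percentage mechanism are illustrative assumptions; the article only states that rollout was controlled by segment, not how the flag system works.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureFlag:
    name: str
    enabled_segments: frozenset[str] = frozenset()
    rollout_percent: int = 0  # 0..100, applies to users outside enabled segments

    def is_enabled(self, user_id: str, segment: str) -> bool:
        if segment in self.enabled_segments:
            return True
        # Stable bucket per (flag, user): the same user always gets the same answer.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < self.rollout_percent


# Hypothetical flag: code is deployed for everyone, visible to one segment only.
multi_view = FeatureFlag("multi-view-interface", enabled_segments=frozenset({"staff"}))
fully_released = FeatureFlag("multi-view-interface", rollout_percent=100)
```

Hashing the flag name together with the user ID keeps the decision deterministic without storing per-user state, and raising `rollout_percent` only ever adds users to the enabled group.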

What enables this compression

Infrastructure-as-code generation: Cloud resources are derived from the service config: task definitions, load balancer rules, permission scopes. Generated, not hand-crafted.

Feature flags as standard: Deploying and releasing are different actions. Dark-launch new capabilities, enable them incrementally. Deployment becomes routine, not risky.

A minimal, explicit handoff: Whatever requires elevated permissions is one clearly defined action. The developer takes everything to the boundary; the privileged step is minimal.
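Generating infrastructure from a service config can be sketched as a pure function. The config keys and the output layout below are invented for illustration and do not correspond to any specific cloud provider's schema; the three output sections mirror the resource types named above.

```python
import json


def generate_infrastructure(service: dict) -> dict:
    """Derive a deployable definition from a service config (hypothetical schema)."""
    name = service["name"]
    port = service["port"]
    return {
        "task_definition": {
            "family": name,
            "cpu": service.get("cpu", 256),
            "memory_mb": service.get("memory_mb", 512),
            "container_port": port,
        },
        "load_balancer_rule": {
            "path_pattern": f"/{name}/*",
            "target_port": port,
            "timeout_seconds": service.get("timeout_seconds", 30),
        },
        # Deduplicated and sorted so the generated file diffs cleanly in review.
        "permissions": sorted(set(service.get("permissions", []))),
    }


def render(service: dict) -> str:
    """Deterministic text form, suitable for committing alongside the code."""
    return json.dumps(generate_infrastructure(service), indent=2, sort_keys=True) + "\n"


service_config = {
    "name": "exam-faq",
    "port": 8080,
    "permissions": ["read:faq", "read:faq"],
}
infra = generate_infrastructure(service_config)
```

Keeping the generator deterministic is what makes "committed alongside your code" practical: an unchanged config produces an unchanged file, so every diff in the generated output reflects a real change.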

Iterate

Observe & Compress

In traditional software, monitoring and iteration are separate activities: you watch dashboards, then plan the next sprint. In AI systems, monitoring feeds directly into the next build. Capability-level traces show you exactly which AI decision went wrong, how long inference took, and where behaviour diverged from expectations. There is no separate “analysis phase.” The trace is the next iteration’s starting point.

A concrete example: one of our AI capabilities was intermittently timing out. End-to-end tracing showed the load balancer had a 30-second timeout, while the capability needed 16–28 seconds, leaving as little as two seconds of headroom on slow requests. Most of that time was spent on LLM inference, not on the external API it called (which responded in 118 milliseconds). Without capability-level tracing, this would have looked like a general platform issue. With it, we identified the exact bottleneck in under an hour and knew exactly what to fix.
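The analysis in this example amounts to filtering spans by capability and comparing the total against the timeout. The sketch below assumes a simplified span record and sequential (non-overlapping) spans. Only the 118 ms API call and the 30-second timeout come from the text; the inference duration is an illustrative value chosen to land at the 28-second upper end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    capability: str
    name: str
    duration_ms: int


def latency_profile(spans: list[Span], capability: str, timeout_ms: int) -> dict:
    """Summarise one capability's trace; assumes its spans run one after another."""
    selected = [s for s in spans if s.capability == capability]
    total = sum(s.duration_ms for s in selected)
    if total == 0:
        raise ValueError(f"no timed spans for capability {capability!r}")
    slowest = max(selected, key=lambda s: s.duration_ms)
    return {
        "total_ms": total,
        "slowest_span": slowest.name,
        "slowest_share": round(slowest.duration_ms / total, 3),
        "headroom_ms": timeout_ms - total,
    }


trace = [
    Span("exam_faq", "llm_inference", 27_882),  # illustrative value
    Span("exam_faq", "external_api", 118),      # figure from the article
    Span("practice_exam", "llm_inference", 500),  # unrelated capability, filtered out
]
profile = latency_profile(trace, "exam_faq", timeout_ms=30_000)
```

Here `profile["headroom_ms"]` is 2000 and the slowest span is `llm_inference`, which is the distinction the trace made visible: the request is slow because of inference, and the external API is not the problem.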

🔄

Build the loop, not just the product. A team that can observe, change, test, and deploy in hours will outperform one that takes weeks — not just on the first capability, but on every one after it.

We’ve seen this concretely. Our exam FAQ capability started as a proof of concept — 40 documents compressed into the AI’s instructions, built in under an hour. Once validated, the second iteration added a dedicated backend service with a live API serving a much larger knowledge base — and a dedicated agent, so the main AI tutor’s context stays clean while the FAQ agent handles its specialized domain. That took about a week. A third version is planned. Each iteration builds on the scaffold, the test suite, and the monitoring already in place. The phases compressed further each time.

What enables this compression

APM tracing at the capability level: End-to-end traces are filterable by capability, so you can isolate the latency and error profile of each feature independently. A slow component shouldn’t look like a slow system.

Instruction versioning: In AI systems, changing behaviour often means changing instructions, not code. We version-control everything: instructions, tool definitions, scenario tests. If something regresses, we know exactly what changed.

Scenario tests as a living regression suite: Every scenario from the test phase becomes a guardrail during iteration. Run the suite before any change ships. If something that passed now fails, you know before your users do.
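The last two points combine naturally: fingerprint the instructions so a change is detectable, then compare scenario verdicts before and after it. Both helpers are hypothetical sketches. The verdict ordering (PASS is better than WARN, which is better than FAIL) and the treatment of a missing scenario as a regression are assumptions.

```python
import hashlib
import json

VERDICT_RANK = {"PASS": 0, "WARN": 1, "FAIL": 2}


def fingerprint(instructions: str, tool_definitions: list[dict]) -> str:
    """Short content hash identifying one exact version of the AI's behaviour inputs."""
    payload = json.dumps(
        {"instructions": instructions, "tools": tool_definitions}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:12]


def find_regressions(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """List scenarios whose verdict got worse, or that vanished from the suite."""
    regressions = []
    for name, before in sorted(baseline.items()):
        after = current.get(name)
        if after is None:
            regressions.append(f"{name}: missing from current run (was {before})")
        elif VERDICT_RANK[after] > VERDICT_RANK[before]:
            regressions.append(f"{name}: {before} -> {after}")
    return regressions


baseline = {"deadline-question": "PASS", "off-topic-math": "WARN"}
current = {"deadline-question": "FAIL", "off-topic-math": "PASS"}
regressions = find_regressions(baseline, current)

changed = fingerprint("v1 instructions", []) != fingerprint("v2 instructions", [])
```

A release gate could refuse to ship while `regressions` is non-empty, and record the fingerprint with each run so a failing scenario can be traced back to the exact instruction change that introduced it.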

The Bigger Picture

Build the Infrastructure to Move Fast. Then Move Fast.

The five phases above aren’t just a workflow — they’re a statement about how AI platform teams should be structured. A good platform eliminates the detour between domain knowledge and a working capability. That only works when the platform itself is treated as a product — shaped by the same people who feel the problems it solves. The closer product and platform sit, the faster each cycle becomes. We’ve watched this compound: what required long coordination loops early on now flows naturally, because the context is shared.

And critically — that platform must be designed not just for humans to use, but for AI agents to operate autonomously within it. Cursor and Claude Code don’t read wikis. They read well-structured CLI help text, consistent project layouts, and clear error messages. If your tools aren’t optimised for agents, you’re leaving most of their capability unused.

🤖

Agent-first tooling is the multiplier. A CLI designed for humans can be used by AI agents. A CLI designed for AI agents makes your entire team dramatically faster — because both humans and agents can operate it fluently, discover its capabilities, and chain its commands into autonomous workflows. This is the difference between a tool and a platform.

The five phases haven’t disappeared. They’ve merged into a tighter and tighter loop. What required a full team and weeks of handoffs a year ago is now handled by two people in days. Small capabilities are handled by one person in an afternoon. The direction is always the same: more overlap, fewer boundaries, faster cycles.

Our release velocity nearly tripled in roughly half a year — from 27 to 74 releases per month — with largely the same team. That’s not because we hired faster. It’s because the infrastructure compounds: every capability that ships makes the next one easier to build, test, and deploy. The phases keep compressing. The loop keeps tightening.

Quintus Stierstorfer

Senior Director, SynteaNext

Benjamin Meindl

Head of Agentic AI Platform, SynteaNext

Slman Ziab

Backend Technical Lead, SynteaNext

Mariam Khuchua

AI Engineer, SynteaNext