Human on the Loop: How We Build

Name: IAPM
Author: Immersive Fusion

Dan Kowalski - 2026-04-15

At Immersive Fusion, a human opens a work item with intent. Agents converge on the answer. A human reads the result, judges whether it matches what was asked, and owns the outcome. That loop is the shape of the day across engineering, product, and go-to-market. Code, specs, pricing updates, sales collateral, release notes, market research, blog posts: the shape is the same.

This is the operating model, and the part that makes it work is not the agents. It is the human on the loop. Humans define intent, set constraints, and own outcomes. Agents do the work in between. Remove the human judgment and a model produces confident output at machine speed with nobody accountable for whether it is right. Keep it, and you get leverage instead.

Something shifted in late 2024. Long-horizon agentic workflows stopped compounding errors and started compounding correctness, provided a human stayed on the loop to set the intent and check the result. The interesting question stopped being "can the model do the work" and became "what does the human still have to do." This is our answer.

Humans define intent. Agents do the work. Humans own the outcome.

One Source of Truth Governs Everything

Every team working this way faces the same question on day one: where does the agent read truth from? Get that wrong and you get confident hallucination at scale. Get it right and you get leverage.

Our answer is a single upstream source of truth. It holds the business strategy, pricing, product specifications, roadmaps, competitive analysis, messaging standards, legal posture, brand voice, marketing narratives, engineering conventions, and the work tracker. Every agent, every person, and every downstream surface reads from it.

Everything downstream subscribes to it. The docs site, the company site, the product surfaces, the pitch decks, and every external channel pull business context from the same upstream source. When pricing changes upstream, every downstream surface stays in sync. When a competitive claim gets refined, the messaging standards update and every agent writing a blog, a landing page, or a sales email picks up the new language on the next run.

This is not a content management system. It is a governance layer. The repository is how every agent and every person learns what kind of company we are.

What lives upstream:

Domain	What it contains	Why agents need it
Strategy	Business model, GTM, competitive analysis, vision	Every agent inherits positioning without having to infer it
Product	Specs, roadmaps, feature inventories	PM and sales agents cannot drift from what engineering actually built
Marketing	Messaging standards, narratives, banned phrases, tone rules	One voice across blog, site, decks, social, docs
Finance	Pricing, tiers, unit economics	No agent ever quotes a stale price
Legal	Contracts, trademarks, brand usage rules	Every public artifact inherits the legal guardrails automatically
Engineering	Conventions, architectural decisions, runbooks	Code agents inherit the house style
Work tracker	Active epics, spikes, tasks, status	Agents resume work where the last session left off

A person can read this repository and understand the company in an afternoon. An agent can read it in a second. Both get the same picture. That symmetry is the point.
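To make the "everything downstream subscribes" idea concrete, here is a minimal sketch, not a description of the actual repository. It assumes the upstream source can be modeled as one nested dictionary; the key names, tier names, prices, and function names are all hypothetical. The point it illustrates is that downstream surfaces read at render time and never keep their own copy, so an upstream change shows up everywhere on the next run.

```python
# Hypothetical stand-in for the upstream source of truth.
# All keys, tiers, and numbers are illustrative assumptions.
upstream = {
    "finance": {"pricing": {"starter": 0, "team": 49}},
    "marketing": {"banned_phrases": ["best-in-class"]},
}


def render_pricing_page(source: dict) -> str:
    """Downstream surface 1: reads prices at render time, stores no copy."""
    tiers = source["finance"]["pricing"]
    return "\n".join(f"{name}: ${price}/mo" for name, price in sorted(tiers.items()))


def render_sales_email(source: dict, tier: str) -> str:
    """Downstream surface 2: quotes the same upstream price."""
    price = source["finance"]["pricing"][tier]
    return f"The {tier} tier is ${price} per month."


def violates_messaging(source: dict, draft: str) -> list[str]:
    """Return every banned phrase that appears in a draft (case-insensitive)."""
    banned = source["marketing"]["banned_phrases"]
    return [phrase for phrase in banned if phrase.lower() in draft.lower()]
```

Changing `upstream["finance"]["pricing"]["team"]` once changes what both renderers produce on their next call, which is the property the table above depends on.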

How the Work Splits

The work splits the same way everywhere: what a human does, and what a machine does. That split is the design, and it shows up identically in engineering, in product, and in go-to-market.

Engineering

This is the part most people mean when they say "AI writes the code."

What the machines do. Agents read the work item, load the relevant context from the governance layer, generate the code, write the tests, author the migration, update the documentation, and run the work through a gated pipeline. The pipeline has multiple automated verification layers covering security, correctness, architectural compliance, and behavioral regression. Every merge to main passes automated gates. When agents generate tests for agent-generated code, mutation testing verifies that those tests would actually catch bugs, not just run green. The pipeline decides whether the work is eligible to ship, because the gates are stricter than a quick human skim would be.
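The mutation-testing idea can be shown in a few lines. This is a toy sketch under stated assumptions, not the pipeline described above: the function `is_adult`, the two test suites, and the single mutation (turning `>=` into `>`) are all invented for illustration. A suite "kills" the mutant if it fails once the bug is injected; a suite that still passes was only running green.

```python
import ast

# Hypothetical code under test.
SOURCE = "def is_adult(age):\n    return age >= 18\n"


class FlipGtE(ast.NodeTransformer):
    """Inject one bug: rewrite every `>=` comparison as `>`."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.Gt() if isinstance(op, ast.GtE) else op for op in node.ops]
        return node


def compile_function(tree: ast.Module, name: str):
    """Compile a module AST and return the named function from it."""
    namespace = {}
    exec(compile(tree, "<generated>", "exec"), namespace)
    return namespace[name]


def weak_suite(fn) -> bool:
    # Passes on the original, but never probes the boundary at 18.
    return fn(30) is True and fn(5) is False


def strong_suite(fn) -> bool:
    # Same checks plus the boundary case.
    return weak_suite(fn) and fn(18) is True


def mutant_is_killed(suite) -> bool:
    """True if the suite passes on the original code and fails on the mutant."""
    original = compile_function(ast.parse(SOURCE), "is_adult")
    if not suite(original):
        raise ValueError("suite must pass on the unmutated code")
    mutated_tree = ast.fix_missing_locations(FlipGtE().visit(ast.parse(SOURCE)))
    mutant = compile_function(mutated_tree, "is_adult")
    return not suite(mutant)
```

Here `mutant_is_killed(weak_suite)` is false and `mutant_is_killed(strong_suite)` is true: both suites are green on the real code, but only one would have caught the off-by-one.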

What the humans do. Humans write the intent, shape the constraints, and judge whether the output matches what was asked for. They hold the shape of the system in their head and intervene when the agent is about to make a decision that only a human should make. When the gates go green, the human reviews and approves. When they do not, the human reads the failing signal, sharpens the intent, and the agents try again. A developer who types all day is not thinking about the system. The human's job is to think about the system, hold its shape, and decide.

Product

Roadmaps, specs, feature definitions, competitive positioning, and release planning all live as structured artifacts in the upstream repository.

What the machines do. Agents read the artifacts, reason across them, and produce the next set of proposals. A new competitive move triggers a teardown. A shipped feature triggers a release note, a docs update, a marketing brief, a sales enablement update, and a pricing page review, all from the same commit. The machines do the legwork of turning a product decision into every downstream artifact that decision implies.

What the humans do. Humans decide what is worth building, what is not, and what the company sounds like when it talks about the work. The PM is not a ticket factory. The PM is a taste function: which competitive signals matter, which customer requests are real, which proposed features deserve a human's scarce attention. The agents cannot do that, and we are not trying to make them.

Go-to-Market

Landing pages, competitive comparison pages, outbound sequences, pitch decks, demo scripts, objection handlers, and analyst briefing materials are all generated from the same governance layer.

What the machines do. When a competitor publishes a new feature, an agent reads the announcement, cross-references our positioning, drafts a response, updates the relevant compare page, and queues a refresh of the demo deck. Every outbound sequence, every briefing document, every landing page rewrite starts as agent output that a human then shapes.

What the humans do. Humans are in the room with the customer. The seller reads the prospect, listens for what they are not saying, and decides when to push and when to hold back. The seller does not spend the week building collateral; the seller shows up with it already drafted and spends the meeting being present.

The machine does the legwork. The human owns the judgment, and the outcome.

One Team, No Walls

One quiet consequence of working this way is that it is flat. Every person gets the same AI tools, the same upstream source of truth, and the same process for turning intent into output. The same prompts work for the founder as for the most recent hire. No private stack for leadership and a thinner stack for everyone else.

Most companies are organized around walls. Engineering builds, then throws code over the wall to QA. Product writes specs, then throws them over the wall to engineering. Marketing gets the feature list after the feature ships. Sales learns about the product from a slide deck someone made last quarter. Every wall creates a translation layer, and every translation layer loses signal.

Here there is one upstream source of truth, and everyone reads from it. The same source, in the same format, with the same depth. The engineer who joined last month opens a work item the same way the CTO opens a work item. The result is that everyone has the same mental model of what we are building, why, for whom, and how far along we are. Nobody waits for a sync meeting to find out what another team decided. As Tobi Lütke put it in Shopify's AI-first memo: "The fundamental skill of using AI well is to be able to state a problem with enough context, in such a way that the task is plausibly solvable." The governance layer provides the context. The humans bring the problem.

The Assistant That Builds Itself

There is one more thing that matters more than the rest. The AI assistant we sell runs inside the workflow that builds it.

Tessa is the AI assistant that lives inside our product. When a customer opens our platform, Tessa is the interface that diagnoses traces, explains topology, and proposes fixes. Tessa also runs inside our own workflow. She coordinates agents, routes tasks to the right model, and removes the noise and minutiae so the humans can focus on the work that matters.

This is dogfooding with teeth. Most vendors who claim to "use our own product internally" mean they have a dashboard open in a tab somewhere. We mean something different. When a developer opens a work item, Tessa is the one marshaling the agents that converge on the answer, the same orchestrator a customer would talk to. When a seller prepares a demo, the explainer in the briefing came from the same model ladder that answers "what is wrong with this trace" for a paying customer.

That symmetry is the compounding loop. Every improvement we make to Tessa for customers is an improvement to Tessa for us. Every insight we get from using Tessa to build software becomes a feature request that customers benefit from. The assistant we ship gets better because we use it, and the human role in that loop is not writing the code. It is setting the specification, judging the output, and holding the veto. The human oversight is what keeps it honest.

Tessa builds Tessa. The human oversight is what keeps it honest.

One Model Is a Monoculture

We do not run on a single AI provider. Tessa, our customer-facing assistant, runs on OpenAI's GPT models today, and we published that model ladder so customers can see exactly which model lands on which query. Inside our own workflow, we run Tessa and Anthropic's Claude side by side. Different models are stronger at different things. A model that is excellent at long-context code reasoning may be mediocre at UI copy. A model that is a poet on prose may be expensive for deep investigation.

The first reason is resilience. Running on a single model provider is a single point of failure. Providers deprecate models on their own timelines, change prices, suffer outages, and change safety policies mid-quarter. We deliberately work across providers so no one vendor can brick the line.

The sharper reason is adversarial robustness. A monoculture is easier to manipulate with adversarial prompts than a routed ensemble. When one agent on one model flags something, and a review agent on a different model either confirms or contradicts it, the disagreement is where the interesting information lives. We pay attention to disagreements more than to agreements, because disagreements are where the signal hides.
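A rough sketch of treating disagreement as the signal follows. It assumes each reviewer can be reduced to a function that returns true when it flags a change; the two reviewers here are plain string checks standing in for agents on different providers' models, and the names `reviewer_a`, `reviewer_b`, and `triage` are hypothetical.

```python
from typing import Callable

# A reviewer returns True when it flags the change as a problem.
Reviewer = Callable[[str], bool]


def reviewer_a(diff: str) -> bool:
    # Stand-in for an agent running on one provider's model.
    return "eval(" in diff


def reviewer_b(diff: str) -> bool:
    # Stand-in for a review agent running on a different provider's model.
    return "eval(" in diff or "subprocess" in diff


def triage(diff: str, reviewers: dict[str, Reviewer]) -> str:
    """Agreeing reviewers decide automatically; any disagreement goes to a human."""
    verdicts = {name: review(diff) for name, review in reviewers.items()}
    if len(set(verdicts.values())) > 1:
        return "escalate"
    return "block" if all(verdicts.values()) else "pass"
```

With `reviewers = {"a": reviewer_a, "b": reviewer_b}`, a diff containing `eval(` is blocked by agreement, a harmless diff passes by agreement, and a diff that only one reviewer flags is escalated to a person rather than averaged away.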

Measure Ten Times, Cut Once

There is an old carpentry proverb that says measure twice and cut once. We run on a more extreme version of the same idea.

We spend the vast majority of our time planning, refining, sharpening intent, and drawing the lines the agents are not allowed to cross. The actual handoff to agents is a small fraction of the day. Implementation is the cheap part now. Planning is where the leverage is. As Kent Beck observed about the AI shift: "The whole landscape of what's 'cheap' and what's 'expensive' has shifted." When cutting is cheap, measuring is the job.

When implementation is fast, cheap, and nearly unbounded in volume, a sloppy plan turns into a sloppy artifact at machine speed. A half-formed intent produces a half-formed product, in parallel, across every surface, before anyone has time to notice. Speed without precision is a way to dig a very deep hole very fast. The only defense is to measure more times, not fewer.

So we measure ten times. We write the intent. We argue about the intent. We write the scenarios the system has to handle and the edge cases it is not allowed to get wrong. We write the constraints the agent is not allowed to violate. We write the gates that will verify the output. We decide what the system will look like when it is correct, before the agent writes a single line. And only then do we hand off.

When we hand off, we know what we will get. Not because we can predict every token the model will emit, but because the shape of the output has already been pinned down by the shape of the request. The agent runs inside a small, well-lit corridor. It does not surprise us in the dimensions that matter, because those dimensions are already measured.
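One way to picture "measure before you cut" is a work item that refuses handoff until every part of the plan exists. The sketch below is an assumption about shape only: the `WorkItem` class, its field names, and the example values are hypothetical, chosen to mirror the steps listed above (intent, scenarios, constraints, gates).

```python
from dataclasses import dataclass, field


@dataclass
class WorkItem:
    """Hypothetical plan record; an agent may start only when nothing is missing."""

    intent: str
    scenarios: list[str] = field(default_factory=list)    # cases the system must handle
    constraints: list[str] = field(default_factory=list)  # lines the agent may not cross
    gates: list[str] = field(default_factory=list)        # checks that verify the output

    def missing(self) -> list[str]:
        """Names of the plan sections that are still empty."""
        sections = {
            "intent": self.intent.strip(),
            "scenarios": self.scenarios,
            "constraints": self.constraints,
            "gates": self.gates,
        }
        return [name for name, value in sections.items() if not value]

    def ready_for_handoff(self) -> bool:
        return not self.missing()
```

For example, `WorkItem(intent="Add CSV export")` reports `["scenarios", "constraints", "gates"]` as missing and is not ready; it becomes ready only once all three lists have at least one entry.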

Human on the loop is not a safety blanket draped on top of a reckless system. It is the structural load-bearing element that lets the system move this fast at all. Remove the planning rigor and the gates, and you get a very expensive way to generate very confident nonsense. Keep them, and you get something that actually ships.

Measure ten times. Cut once. The agents make cutting cheap. The humans make sure we cut the right thing.

The Honest Objections

Skeptics of AI-native operations usually raise one of three objections. They deserve real answers, not slogans.

"This only works for trivial code." We ship an observability platform with a 3D rendering engine, a native desktop application, a distributed tracing backend, an AI assistant with tool access to our own API, and a multi-tenant cloud service. None of this is trivial, and most of it is written by agents inside a gated pipeline. The pattern is not a toy. The gates are the point.

"You will lose quality." Quality is a measurable property, and the right measurement is not "did a human read every line." The right measurements are: do the automated gates catch regressions, does the product pass contract tests, does mutation testing confirm the tests are real, do integration tests exercise the actual paths, and does the end user see fewer bugs. Our gates are stricter than most code review processes we have worked under in the past. Kent Beck, the creator of test-driven development, calls AI "an unpredictable genie that grants your wishes, but oftentimes in unexpected ways" and argues that TDD becomes a superpower when working with AI agents. We agree. The verification layer matters more than ever when the generation layer is a model.

"You are just an API wrapper over someone else's model." The model is a commodity. Our leverage is not the model. It is the governance layer, the verification pipeline, the work decomposition, the taste function on top, and the judgment loop around the edges. Tobi Lütke calls the real skill "context engineering, not prompt engineering." A competitor with the same API key and no governance layer does not get the same output. We know, because we have watched it not happen.

Does This Work at Scale?

The honest steelman is: it works partly because we are small. In larger organizations, information is not just a logistics problem. It is a political one. Transparency is not always in people's best interest, at the top of the hierarchy or the bottom. The governance layer depends on clear decision rights, clear accountability, and a willingness to write down what the company actually thinks. That is hard to do at ten people. It is much harder to do at a thousand.

We take that seriously. This model is not a universal prescription. It is an operating choice, and it rests on commitments a larger organization would have to make deliberately. Decision rights have to be explicit, not implicit. Accountability for changes to the governance layer has to be owned, not diffused. The people with the authority to change how the company thinks have to be the same people accountable for the outcomes when those changes propagate.

In a pre-AI world, tribal knowledge carried the load. In an AI-native world, tribal knowledge is a liability, because every agent inherits it badly. So the honest answer is: it works at our scale because we built it, and it will work at larger scale only for companies that make the organizational commitments the governance layer requires. That work is harder than writing the markdown. It is the work of deciding what kind of company you are, and being willing to say so out loud.

The Force Multiplier Cuts Both Ways

There is a second honest limit, and it is more uncomfortable than the first.

The models are a force multiplier, and that cuts both ways. Point a high-throughput agent at a well-decomposed problem inside a clean architecture, and the output compounds correctness. Point the same agent at a tangled codebase with ambiguous intent and no house style, and the output compounds the mess. Faster. At scale. With an air of confidence that makes the mess harder to see, not easier.

This is the part of the AI-native story that gets glossed over in most manifestos. The governance layer is not magic. It does not elevate a team that cannot write a clear spec. It does not substitute for software design literacy. It amplifies whatever the team already is. If the team can hold the shape of a well-designed system in its head, the agents produce a lot of well-designed system. If the team cannot, the agents produce a lot of something else, and nobody reading the diff at machine speed will catch it until the interest on that debt comes due.

Clean architecture and disciplined design matter more this way, not less. When humans typed the code, bad design slowed them down enough to notice. When agents type the code, bad design gets shipped at the full speed of the pipeline. The gates catch functional regressions. The gates do not catch a design that will make the next feature twice as hard to build. That is a human's job, and it is a job that requires having been wrong about design before, recovered from it, and learned the pattern.

This is why we spend the vast majority of our time on planning. The plan is where design literacy shows up. The plan is where a senior engineer's instinct that "this abstraction will bite us in six months" becomes a constraint the agent has to respect. The plan is where a product manager's taste for what to cut becomes a specification the agent cannot drift past. Without those humans, this is a faster way to produce code a more experienced team will have to rewrite. With them, it is a multiplier on the work of people who already knew what good looked like.

The Honest Part

This is hard. The governance layer takes real investment to build and real discipline to maintain. A repository that is out of date is worse than no repository at all, because agents treat it as gospel. Every upstream update has to propagate. Every gate has to hold. Every intent has to be written down with enough precision that an agent stays inside the corridor. A sloppy intent produces a sloppy artifact at machine speed, which is a dangerous combination.

We also learned the hard way that not every task belongs in this model. Some conversations are better had out loud, on a whiteboard, with two humans and no model. The execution belongs in the loop. The invention belongs on the whiteboard. Confusing the two burns both.

And the humans who thrive this way are not the humans who thrived in the last model. The instincts that matter now are: write down what you want, notice what you do not like, stop the loop when it matters, and trust the system when it does not. These are not the instincts most of us were hired for ten years ago. We are learning them together.

The Part Only Humans Can Do

If you read this far and came away thinking the humans are mostly operating the machine, you missed the most important part.

The humans are not here because the machine is incomplete. The humans are here because software companies exist to serve other humans, and the people on the receiving end can tell the difference.

A customer calling during an outage is not looking for a perfectly structured status update. They are looking for someone who understands that their quarter is on the line and their team is not sleeping. An investor on a quarterly call is not looking for a report that could have been generated. They are looking for a founder whose eyes tell them whether to worry. A candidate on a recruiting call is looking for a future colleague they can read the room with. An employee going through a hard month is looking for someone who will notice, and say something, and mean it.

Empathy is the one capability that does not scale from a prompt. It does not compound on iteration. It does not emerge from a better model. It comes from a human being who has been through something similar, or who can sit with someone who has, and who is willing to show up without knowing in advance what the right thing to say is. The machine does not do that. It will not do that in the version after this one, either. Only another human can relate to a human, because relating is the thing that requires having been one.

This is not a sentimental add-on. It is the reason the work is worth doing in the first place. We automate the work that does not need a human so the humans can spend their day doing the work that does. Writing boilerplate did not need empathy. Talking to a customer whose product is on fire does. Sitting with a teammate who just got bad news does. The freed minutes and hours get returned to the conversations only humans can have.

Empathy does not compound from a prompt. Only another human can relate to a human.

What You Will See From Us

Every post you read on this blog, every feature you see in the product, every landing page you land on, every tier on our pricing table, and every answer our AI assistant gives inside the product came out of this way of working. That is not a marketing claim. It is a description of how the work happens. We publish the model ladder that powers Tessa. We publish the skill inventory that powers Tessa's internal workflows. We are willing to show the work because the work is the advantage.

If you are building an observability platform to run alongside the AI systems your team is shipping, we think you will find we understand the terrain in a way that 2D dashboard vendors cannot. We live inside the same loop you are trying to monitor.

And if you are building your own AI-native company, we hope this is useful. The deciding variable is never the model, and it is never the agents. It is the human on the loop: the judgment that sets the intent, checks the result, and owns the outcome.



