After the launch of Liminal a few days ago, I want to begin a series of posts I consider especially important. The idea of this series is to create pieces that can be read on their own, but that together form an increasingly clear map for anyone with the capacity or the interest to drive AI-based products, initiatives or business lines. I think it can be a useful contribution for C-levels and managers given all the noise around AI on social media right now.
We'll start at the most basic, least jargon-heavy layer, and we'll go deeper bit by bit until we reach the guts of these systems.
The aim is not to turn a business profile into an engineer, but to give them enough criteria to hold useful technical conversations, make better decisions, and tell a real opportunity from a demo dressed up as strategy.
In every post I'll use two very simple markers to orient the reading:
- Technical complexity: from 1 to 5. A 1 means any business profile should be able to follow it without friction. A 5 means content already aimed at senior technical teams.
- Business value: from 1 to 5. A 5 means the content is heavily oriented towards pure business, even though we're always talking about AI and technology.
In addition, to make reading easier, specialised industry terms will always appear in the language they were coined in — usually English — and linked to a glossary for anyone who wants to dig further. I don't think it makes sense to force-translate all the technical jargon into other languages, because in many cases it adds more noise than clarity.
This post is the first part of that series. And I want to start with a foundational idea, because if this isn't understood, almost everything else gets twisted: an AI product is not a prompt. It sounds extremely obvious and basic. That's the point. Now let's start digging in.
The original confusion
The biggest confusion I keep seeing in the market is this: entire teams claim to be building "AI products" when, in reality, they've wrapped a couple of prompts in a nice interface. Sometimes the demo is striking. Sometimes it even works in a meeting. But that doesn't mean a product exists, that it scales, or even that it'll work in a real conversation.
There's an additional problem with agent-based products: the mirage that something works simply because of the wow effect (less and less wow each day) of seeing AI do things and seem intelligent. The MVP gap has always existed in software; it's a question of expectation management. In traditional software, to put it very simply, if an interface or a data flow worked and you had unit tests, integration tests, end-to-end tests and a few iterations behind you, you already had something acceptable.
With agentic software, on the other hand, the interface is the language. And that is wonderful, but it also introduces a brutal complexity of corner cases.
The mirage shows up right there: when you see the system converse and respond inside the full complexity of human language, it looks like the product is already working. But when real users arrive, the usual avalanche of bugs from early releases is joined by others that are far harder to track: hallucinations, badly formulated requests, ambiguous expectations, unresolved demands, and behaviours that are hard to reproduce. That's when the real nightmare begins. The good news is that there are many ways to contain it, and they go well beyond the (over-)mentioned guardrails.
When that confusion reaches the executive committee or the decision-making layers, the problem multiplies. Initiatives get approved with a completely wrong picture of where the real difficulty lies, what assets matter, and why the solution then fails to scale, retain users, or generate the expected return.
A real AI product has several layers
When we talk seriously about an AI product, we're usually talking about several layers working together. There is no official way to do it, but there is a set of indispensable components you should know:
- Interface — how the user interacts: chat, voice, API, embedded in another system, SaaS, or as a copilot.
- Intent Routing — how you decide what kind of request has come in and what path it should follow.
- Reasoning and Planning — how the system thinks, decomposes, prioritises, and decides on steps.
- Tools — what actions it can actually execute: query data, call APIs, update systems, or trigger processes.
- Agent Memory — what context persists, what is remembered, and how it is used without breaking security, cost, or experience.
- Observability — how you really measure what's happening. On top of the usual (performance, errors, funnels, costs, system decisions), you now also need to bring in agentic observability (more qualitative).
The prompt only touches part of this system. Important, yes. But only a part. And in fact, often it isn't even the part that creates the most value or destroys the most risk.
Interact with the graphic to see how information flows through it (hover over a box on desktop, tap a box on mobile).
1. Interface: the first guardrail of an AI product
Here, beyond the obvious things we've been building for ages (UX, a good frontend, a properly structured backend to achieve high availability, and everything else), we have to think about building experiences that don't let the beast loose. The UX itself should be the first guardrail of an AI product. And that prevents many headaches and user frustrations.
Leaving a ChatGPT-style interface in a product that isn't another ChatGPT is a mistake. I see it fairly often in client meetings and in some designs out in the market. As has always been the case, your UX team should have a methodical conversation with your AI Engineers and your product team to decide what type of interaction best frames the problem the user wants to solve.
- Divide and conquer: an intentional interface to classify the request. Several touchpoints with agents tend to be better than one general chat when the user is focused on solving a specific problem and you already know what it is. There you're already classifying part of the prompt without using inference, and that prunes a huge number of corner cases (future errors).
- AI hidden in the backend. It runs after a button click, not after a message is sent: not everything has to be a chat with the AI. Filling in a form and having an agent act behind the scenes with its full power is brutally effective, precisely because the input is controlled and structured. That reduces many other problems, narrows the failure surface, and creates an almost deterministic environment for the agent. It will never be fully deterministic, of course, but that's the point: to keep narrowing the response space.
- AI contextualised to specific actions of the app. If the user is in a specific area, the prompt should be focused on a specific action rather than letting the AI wander around the whole product. All available context (user profile, last actions…) should be sent automatically alongside the prompt so the AI has the extra information or can retrieve more.
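To make the last two patterns concrete, here is a minimal sketch of a request that comes from a form rather than a chat box. Everything in it is an assumption for illustration: the AgentRequest type, the action names, and the profile fields are hypothetical, and a real product would plug in its own validation and context sources.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRequest:
    """A structured request built by the UI instead of a free-form chat message."""
    action: str                      # fixed by the screen the user is on
    fields: dict[str, str]           # form inputs, already validated
    context: dict[str, str] = field(default_factory=dict)  # attached automatically


ALLOWED_ACTIONS = {"refund_request", "change_address"}  # hypothetical actions


def build_request(action: str, fields: dict[str, str], user_profile: dict[str, str]) -> AgentRequest:
    # Deterministic checks run before any inference is spent.
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"Unsupported action: {action}")
    missing = [name for name, value in fields.items() if not value.strip()]
    if missing:
        raise ValueError(f"Missing fields: {missing}")
    return AgentRequest(action=action, fields=fields, context=dict(user_profile))


def render_prompt(request: AgentRequest) -> str:
    # The agent only ever sees a narrow, predictable prompt.
    lines = [f"Task: {request.action}"]
    lines += [f"{key}: {value}" for key, value in sorted(request.fields.items())]
    lines += [f"[context] {key}: {value}" for key, value in sorted(request.context.items())]
    return "\n".join(lines)
```

The user never types a prompt here. The screen decides the action, the form supplies the fields, and the context travels with the request, so part of the classification is done before any model is called.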
Before Intent Routing and Reasoning and Planning: the piece almost no one visualises
Most failed initiatives don't fail because the model is bad. They fail because the system treats every input the same way: with the same cost, the same latency, and the same execution logic.
All the more so when latency is a problem, which it always is, because we users want magic and won't settle for less.
That mistake shows up enormously in agents, internal assistants, support, voice agents, operational automation, and any product where volume starts to grow. If every interaction goes through the same lane, the system becomes expensive, slow, and hard to govern.
That's why the Intent Routing and Reasoning and Planning layers aren't a technical detail. They are layers with direct impact on margins, user experience, and product viability.
2. Intent Routing: deciding well before thinking deeper and spending resources
Intent Routing is the system's ability to understand what type of request it's receiving and decide what path it should follow. Put in business language, it's the layer that prevents you from spending expensive resources on tasks that don't need them.
This connects directly with the idea that agents need (at least) two speeds. Not every query deserves the same treatment. A simple, repetitive, or very narrow request should not consume the same time, cost, and complexity as an ambiguous, sensitive request that requires several tools.
When designing these patterns, one of the first cases to classify is small talk. Many conversations have simple beginnings and endings: greetings, closings, thank-yous, or basic requests that should be cheap and fast.
A mature design typically separates at least these concepts:
- Fast lane for the routine: greetings, FAQs, status queries, closed tasks, cacheable or highly structured responses.
- Slow lane for the complex: analysis, chained tasks, tool use, decisions with context, and non-trivial flows.
- Guardrails: I usually also add early guardrails in this layer. And they don't have to be just the usual ones (misuse, abusive language, etc.). Guardrails can also be business guardrails. Things you don't want to process, or where you don't want to spend resources, because every word generated by the AI costs money (we'll see this later in a post about the economics of AI). Bouncing them at this first layer is a pure business decision you should demand. It impacts cost, the intent of your product, and performance, because you specialise the use of your resources on what matters to you.
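The three concepts above can be sketched as a first-level router. This is only an illustration under loud assumptions: the keyword lists are hypothetical, and a production router would more likely use a small classifier model than string matching. The shape of the decision is what matters.

```python
from enum import Enum


class Lane(Enum):
    BLOCKED = "blocked"   # bounced by an early guardrail
    FAST = "fast"         # cheap, cached or templated answer
    SLOW = "slow"         # full reasoning, tools, larger model


# Hypothetical keyword lists, standing in for a real classifier.
OUT_OF_SCOPE = ("write me a poem", "homework")
SMALL_TALK = ("hello", "hi", "thanks", "thank you", "bye")
FAQ = ("opening hours", "order status")


def route(message: str) -> Lane:
    text = message.lower().strip()
    # Business guardrail first: don't spend resources on what the product is not for.
    if any(phrase in text for phrase in OUT_OF_SCOPE):
        return Lane.BLOCKED
    # Fast lane: greetings, closings and FAQs.
    if text.rstrip("!.") in SMALL_TALK or any(phrase in text for phrase in FAQ):
        return Lane.FAST
    # Everything else goes to the slow lane.
    return Lane.SLOW
```

With these lists, route("Hello!") lands in the fast lane, a request for a poem is bounced before it costs anything, and a complaint about a double charge goes to the slow lane.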
In addition, intent routing can be multi-layered: it doesn't only choose between fast and slow, it also routes to specialised agents within each lane. Think of a call centre with dozens of intents: the system filters and dispatches until it reaches the optimal agent. That way you can quickly cover the Pareto share of operations (the few intents that account for most of the volume) and have a positive impact within a few months.
Thanks to this, you decide which operations you are ready to take on with AI, controlling the volume of issues you'll resolve in each case and the quality of the resources invested.
For a C-level, the implication is straightforward. If there's no Routing layer, your operating cost goes up and your user experience drops. You're paying as if everything were premium when most of the traffic isn't. Specialising also drastically increases the agent's success rate, because remember: the more narrowed and contextualised, the better.
And this doesn't only apply to chatbots. It applies to voice agents, employee assistants, sales copilots, document automation, and almost any product with recurring interaction.
Intent isn't a single message. That's the ideal theoretical world: thinking that a user is going to give you enough information to dispatch them in a single interaction. Good Intent Routing has to work with the chat history, handle confidence scores before making decisions, and account for exceptions.
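As a rough sketch of that idea, the snippet below scores intent over the whole conversation and only dispatches when the confidence clears a threshold. The intents, the keywords, and the 0.7 threshold are all hypothetical; a real system would get its scores from a classifier rather than from keyword counts.

```python
from dataclasses import dataclass


@dataclass
class IntentGuess:
    intent: str
    confidence: float  # 0.0 to 1.0


# Hypothetical intents and keywords, standing in for a real classifier.
KEYWORDS = {
    "billing": ("invoice", "charge", "refund"),
    "technical": ("error", "crash", "login"),
}


def classify(history: list[str]) -> IntentGuess:
    # Score over the whole conversation, not only the last message.
    text = " ".join(history).lower()
    hits = {intent: sum(word in text for word in words) for intent, words in KEYWORDS.items()}
    total = sum(hits.values())
    if total == 0:
        return IntentGuess("unknown", 0.0)
    best = max(hits, key=hits.get)
    return IntentGuess(best, hits[best] / total)


def decide(history: list[str], threshold: float = 0.7) -> str:
    guess = classify(history)
    if guess.confidence < threshold:
        # Not sure enough: ask instead of dispatching to the wrong agent.
        return "ask_clarifying_question"
    return f"dispatch:{guess.intent}"
```

A first message like "It doesn't work" gives the router nothing to go on, so it asks. Once the user adds that a refund and an invoice are involved, the same history is enough to dispatch to billing.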
3. Reasoning and Planning: when the system needs judgement, not just text
Once the request has been properly classified, the next layer kicks in: Reasoning and Planning. Here we're not talking simply about the model giving a nicely worded answer. We're talking about how it decomposes a task, decides on steps, evaluates options, and coordinates actions when the problem isn't trivial.
This layer is the one many demos try to simulate with a clever prompt. But in a real environment, useful reasoning doesn't live in isolation. It lives constrained by context, tools, restrictions, memory, permissions, and business goals.
Technical teams know that the demo built in two weeks with a prompt held together with sticky tape — that dangerous mirage we mentioned earlier — usually has to be redone almost from scratch. We're no longer talking about a prompt tested with a few cases and a handful of exceptions added as guidelines.
We're talking about AI engineering: data, complex logic, integrations, and events. That's where the gap between the demo and the final product really opens up.
This is also where the two-speed pattern connects. The slow lane isn't just plugging in a more expensive model. It's switching on a richer form of processing when the use case truly demands it.
For business, this changes an important conversation. It's not about always buying the most powerful model, but about using the right level of reasoning at the right moment. That distinction has a brutal impact on cost, latency, and scalability.
A good signal will be that your technical team has, in their tech stack, several models — even from the same provider — of different sizes. That usually implies different speeds and different levels of intelligence or specialisation. And it lets you play more cards when designing the solution.
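To see why this matters for margins, here is a back-of-the-envelope sketch. The tier names and the prices per thousand tokens are made up for illustration and do not correspond to any real provider; only the arithmetic is the point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                   # placeholder name, not a real model identifier
    cost_per_1k_tokens: float   # made-up price, for illustration only


SMALL = ModelTier("small-fast", 0.001)
LARGE = ModelTier("large-reasoning", 0.020)


def pick_tier(needs_tools: bool, steps: int) -> ModelTier:
    # Only multi-step or tool-using work earns the expensive model.
    return LARGE if needs_tools or steps > 1 else SMALL


def monthly_cost(requests: int, tokens_per_request: int, slow_share: float) -> float:
    """Estimated spend when `slow_share` of the traffic goes to the large model."""
    slow = requests * slow_share
    fast = requests - slow
    per_request_k = tokens_per_request / 1000
    return per_request_k * (slow * LARGE.cost_per_1k_tokens + fast * SMALL.cost_per_1k_tokens)
```

With these invented prices, 100,000 requests of 1,000 tokens each cost about 2,000 if everything goes to the large model, and about 480 if only 20% of the traffic needs it.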
In addition, an agentic piece of software almost always has less than 5% inference code. The rest is still the usual: software engineering, logic, data handling, resource optimisation, and events. This is key. Combining deterministic logic and traditional programming with inference is, in practice, the most powerful guardrail of all. The indispensable one.
Honestly, even though it sounds too basic, all of this is about narrowing inference down to the points where it makes a difference, and not leaving too much freedom to the AI — without intent or control — in systems that are going to ship to production.
95% of your shiny agentic software will be practically the same software that was being written five or ten years ago. And that, in fact, is a great signal that your AI Engineers team is doing it right.
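A tiny sketch of what that mix can look like: one inference step, wrapped in ordinary code before and after. The refund rule, the label set, and the infer callable are hypothetical; infer stands in for whatever model call a team actually uses.

```python
from typing import Callable

MAX_REFUND = 100.0  # hypothetical business rule


def handle_refund(order_total: float, complaint: str, infer: Callable[[str], str]) -> str:
    # Deterministic pre-check: no inference needed for impossible cases.
    if order_total <= 0:
        return "rejected: nothing to refund"
    # The only inference step: classify the complaint into a fixed label.
    label = infer(complaint).strip().lower()
    # Deterministic post-check: never trust free-form output.
    if label not in {"damaged", "late", "other"}:
        return "escalated: unrecognised label"
    if label == "other":
        return "escalated: needs human review"
    amount = min(order_total, MAX_REFUND)
    return f"approved: {amount:.2f}"
```

The model never decides the amount and never invents a new outcome. It picks a label, and plain code does the rest.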
4. Tools: the point where AI stops being a demo
An AI that can't act, query, or execute often stays in a layer of text. Useful, yes. But limited. The Tools layer is what connects the system to the real business.
What's more, there are increasingly better and more reliable ways of getting an AI to interact with tools. Just two years ago, in 2024, this was much more Wild West. Protocols like MCP have paved the way, but they aren't the full magic.
The real magic is that models are being trained better and better to fail less when interacting with other pieces of software, especially in tool calling and function calling.
In 2024, many models were still focused almost entirely on responding to humans in whatever way seemed most useful or most intelligent. We'll see this later in a post about how LLMs are trained to satisfy humans, because it's not trivial. Now we're starting to see a more serious layer of operational reliability.
Querying data, looking something up in a CRM, updating a ticket, kicking off a workflow, summarising a meeting with real context, or cross-referencing information between systems are all examples of value that don't come from the prompt, but from the integration.
In practice, many companies think they have a model-limitation problem when in fact they have an integration problem or a context problem.
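For readers who want to see what the integration side looks like, here is a minimal sketch of a tool registry. It assumes the model emits a tool call as JSON with a name and arguments; that format, the ticket store, and the two tools are hypothetical and not tied to MCP or to any vendor's function-calling API.

```python
import json
from typing import Callable

# Hypothetical in-memory "system"; real tools would call a CRM or ticketing API.
TICKETS = {"T-1": {"status": "open"}}


def get_ticket(ticket_id: str) -> dict:
    return TICKETS[ticket_id]


def close_ticket(ticket_id: str) -> dict:
    TICKETS[ticket_id]["status"] = "closed"
    return TICKETS[ticket_id]


# The registry is the full list of actions the agent is allowed to take.
TOOLS: dict[str, Callable[..., dict]] = {"get_ticket": get_ticket, "close_ticket": close_ticket}


def execute_tool_call(raw_call: str) -> dict:
    """Run a tool call the model emitted as JSON: {"name": ..., "arguments": {...}}."""
    try:
        call = json.loads(raw_call)
        tool = TOOLS[call["name"]]
        return {"ok": True, "result": tool(**call["arguments"])}
    except (json.JSONDecodeError, KeyError, TypeError) as error:
        # Malformed JSON, unknown tool or wrong arguments: report back, don't crash.
        return {"ok": False, "error": f"{type(error).__name__}: {error}"}
```

Anything outside the registry simply cannot be executed, and every failure comes back as data that can be logged, retried, or shown to the model.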
5. Memory: continuity, personalisation, and context
There's a lot of talk now about context engineering: giving context to the model for its answer. Models have larger and larger context windows, which lets you attach even thousands of pages to a question so they can answer taking all that context into account in seconds. The trend is to reach 2 million tokens generally available within a few months (as of May 2026). It's a beast. The Bible, or Don Quixote, fits several times over as context.
But it's not all about giving the model the largest possible amount of information. That has to be limited: it's expensive, and it's not always the most effective approach. Remember, divide and conquer applies here too. Continuity is important, yes, but I don't only mean memory that gives continuity to a conversation. I mean saving the conversation's state.
If a user is talking to an agent that's supposed to solve a technical problem, and that agent runs a defined, complex flow across several states (greeting, requirement gathering, problem solving…), the current state moves forward as steps are resolved and the prerequisites for the next state are met.
That state, alongside other valuable data, can be saved against the user and the conversation. Your agent doesn't need to figure out where it is in the process by reading the entire history if that data already exists and is reliable. This concept is usually called state persistence, and it's one of the most important pieces for an agentic system to scale complexity with judgement.
State, as we have been seeing in the previous points, again specialises the context for the AI: "we are at this point and our objective is only X". As you can see, it is the pattern that ties all the points together.
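A minimal sketch of state persistence, assuming a simple linear flow. The step names follow the example above; the STORE dictionary is a stand-in for a real database, and the function names are hypothetical.

```python
import json
from enum import Enum


class Step(str, Enum):
    GREETING = "greeting"
    REQUIREMENTS = "requirement_gathering"
    SOLVING = "problem_solving"
    DONE = "done"


ORDER = [Step.GREETING, Step.REQUIREMENTS, Step.SOLVING, Step.DONE]
STORE: dict[str, str] = {}  # stand-in for a database keyed by conversation id


def load_state(conversation_id: str) -> Step:
    raw = STORE.get(conversation_id)
    return Step(json.loads(raw)["step"]) if raw else Step.GREETING


def advance(conversation_id: str, prerequisites_met: bool) -> Step:
    current = load_state(conversation_id)
    # Move forward only when the prerequisites for the next step are met.
    if prerequisites_met and current is not Step.DONE:
        current = ORDER[ORDER.index(current) + 1]
    STORE[conversation_id] = json.dumps({"step": current.value})
    return current


def scoped_instruction(conversation_id: str) -> str:
    # The saved state narrows the context: "we are here, the goal is only X".
    return f"Current step: {load_state(conversation_id).value}. Work only on this step."
```

The agent reads one small record instead of re-deriving its position from the whole history, and the instruction it receives is already scoped to the current step.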
6. Observability: seeing what the system is really doing
Observability can no longer be limited to measuring only classical KPIs, conversion funnels, or traditional software bugs. In agentic systems you also need to observe how routing classifies requests, where tools fail, what decisions the system takes, how much each flow costs, where exceptions are triggered, and what error patterns appear when inference comes into play. Without Observability, often you won't even know why something seemingly intelligent is failing in production.
AI enables us to observe the system qualitatively: we should build agents that read and interpret how the system and the other agents are resolving requests, or how satisfied the user ended up.
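On the quantitative side, the raw material is usually a trace per request. The sketch below is one possible shape, with hypothetical field names; real teams would typically send these events to their existing logging or tracing stack rather than keep them in memory.

```python
import time
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Trace:
    """One record per request: which lane it took and what happened along the way."""
    request_id: str
    lane: str
    events: list[dict] = field(default_factory=list)

    def log(self, kind: str, **data) -> None:
        self.events.append({"kind": kind, "at": time.time(), **data})


def summarise(traces: list[Trace]) -> dict:
    # How routing classified the traffic.
    lanes = Counter(trace.lane for trace in traces)
    # Where tools failed.
    tool_errors = Counter(
        event["tool"]
        for trace in traces
        for event in trace.events
        if event["kind"] == "tool_call" and not event["ok"]
    )
    # How much the flows cost in total.
    cost = sum(event.get("cost", 0.0) for trace in traces for event in trace.events)
    return {"lanes": dict(lanes), "tool_errors": dict(tool_errors), "total_cost": cost}
```

The same traces are what a qualitative reviewer, human or agent, would read to judge how a request was actually resolved.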
The key idea a decision maker should keep
If an AI initiative is being framed as if everything depended on the prompt or on the model, you're probably looking at the wrong layer. The real value usually lies in how you design the Interface, how you route (specialise and narrow), how the system reasons and decides, how you integrate Tools, how you manage Memory, and how you observe the whole system.
In other words, an AI product isn't defined by the brilliance of its demo, but by the quality of its architecture, adapted to AI Engineering. And that architecture, when properly thought through, doesn't only make the system look intelligent. It lets you operate it, scale it, narrow it, and turn it into something genuinely useful for business.
Empower yourself and use these concepts to think, from the business side, about how to build real AI systems. Which issues repeat the most? Which ones do I want to invest more resources in (slow lane)? Where could the AI generate a negative perception, so that I should use it in the backend instead to speed up processes and improve the value of my product?
What we'll cover in Part 2
In the next part of this series we'll get into the determining factors a decision maker should put on the table before kicking off an AI project. There we'll talk about, among other things:
- Latency, the eternal enemy, especially delicate in voice models and experiences where a few hundred milliseconds completely change the perception of the product.
- Languages, because not every region has the same level of model maturity, and this shows up enormously in voice, accents, and multilingual cases.
- Cloud vs on-premise, a classic decision that comes back with force, now with a different balance. GPU as a Service no longer turns out so cheap when an operation scales, but on-prem also brings scarcity, capex, talent, and operational complexity.
- Synthetic data and measurable use cases before shipping to production.
The idea is to stop talking about AI as something abstract and start talking about real decisions. The kind that determine whether a project takes off or becomes another expensive pilot.
Closing
If this series does its job (which would make me very happy), by the end you won't have learned to program agents. But you will have learned to ask better questions, to read the stack better, to talk to your team in a far more effective way, and to lead AI solutions from the business side with a lot more judgement. That means spotting earlier where the value is, where the risk is, where the demo mirages are, and where the product really begins. And for a business person with the capacity to launch initiatives, that already changes the conversation a great deal.
See you in the next post.

