AI Products Are Not Prompts — A Guide for C-Levels and Decision Makers (Part 1)

Liminal

by Jonathan Castro

The edge of change



#AI Strategy #AI Systems #Liminal

Business Value

Technical Complexity

After the launch of Liminal a few days ago, I want to begin a series of posts I consider especially important. The idea of this series is to create pieces that can be read on their own, but that together form an increasingly clear map for anyone with the capacity or the interest to drive AI-based products, initiatives or business lines. I think it can be a useful contribution for C-levels and managers given all the noise around AI on social media right now.

We'll start at the most basic, least jargon-heavy layer, and we'll go deeper bit by bit until we reach the guts of these systems.

The aim is not to turn a business profile into an engineer, but to give them enough criteria to hold useful technical conversations, make better decisions, and tell a real opportunity from a demo dressed up as strategy.

In every post I'll use two very simple markers to orient the reading:

Technical complexity: from 1 to 5. A 1 means any business profile should be able to follow it without friction. A 5 means content already aimed at senior technical teams.
Business value: from 1 to 5. A 5 means the content is heavily oriented towards pure business, even though we're always talking about AI and technology.

In addition, to make reading easier, specialised industry terms will always appear in the language they were coined in — usually English — and linked to a glossary for anyone who wants to dig further. I don't think it makes sense to force-translate all the technical jargon into other languages, because in many cases it adds more noise than clarity.

This post is the first part of that series. And I want to start with a foundational idea, because if this isn't understood, almost everything else gets twisted: an AI product is not a prompt. It sounds extremely obvious and basic. That was the point. Now let's start digging in.

The original confusion

The biggest confusion I keep seeing in the market is this: entire teams claim to be building "AI products" when, in reality, they've wrapped a couple of prompts in a nice interface. Sometimes the demo is striking. Sometimes it even works in a meeting. But that doesn't mean a product exists, that it scales, or even that it'll work in a real conversation.

There's an additional problem with agent-based products: the mirage that something works simply because of the wow effect (a little less wow every day) of watching AI do things and seem intelligent. The MVP gap has always existed in software; it's a question of expectation management. In traditional software, to put it very simply, if an interface or a data flow worked, backed by unit tests, integration tests, end-to-end tests and a few iterations, you already had something acceptable.

With agentic software, on the other hand, the interface is the language. And that is wonderful, but it also introduces a brutal complexity of corner cases.

The mirage shows up right there: when you see the system converse and respond inside the full complexity of human language, it looks like the product is already working. But when real users arrive, the usual avalanche of bugs from early releases is joined by others that are far harder to track: hallucinations, badly formulated requests, ambiguous expectations, unresolved demands, and behaviours that are hard to reproduce. That's when the real nightmare begins. The good news is that there are many ways to contain it, and they go well beyond the (over-)mentioned guardrails.

When that confusion reaches the executive committee or the decision-making layers, the problem multiplies. Initiatives get approved with a completely wrong picture of where the real difficulty lies and which assets matter, and later nobody understands why the solution fails to scale, retain users, or generate the expected return.

A real AI product has several layers

When we talk seriously about an AI product, we're usually talking about several layers working together. There is no official way to do it, but there is a set of indispensable components you should know:

Interface — how the user interacts: chat, voice, API, embedded in another system, SaaS, or as a copilot.
Intent Routing — how you decide what kind of request has come in and what path it should follow.
Reasoning and Planning — how the system thinks, decomposes, prioritises, and decides on steps.
Tools — what actions it can actually execute: query data, call APIs, update systems, or trigger processes.
Agent Memory — what context persists, what is remembered, and how it is used without breaking security, cost, or experience.
Observability — how you really measure what's happening. On top of the usual (performance, errors, funnels, costs, system decisions), you now also need to bring in agentic observability (more qualitative).

The prompt only touches part of this system. An important part, yes. But only a part. In fact, it often isn't even the part that creates the most value or removes the most risk.


1. Interface: the first guardrail of an AI product

Here, beyond the obvious things we've been building for ages (UX, a good frontend, a properly structured backend to achieve high availability, and everything else), we have to think about building experiences that don't let the beast loose. The UX itself should be the first guardrail of an AI product. And that prevents many headaches and user frustrations.

Dropping a ChatGPT-style interface into a product that isn't another ChatGPT is a mistake. I see it fairly often in client meetings and in some designs out in the market. As has always been the case, UX teams should hold a methodical conversation with your AI Engineers and your product team to decide which type of interaction best frames the problem the user wants to solve.

Divide and conquer: an intentional interface to classify the request. Several touchpoints with agents tend to be better than one general chat when the user is focused on solving a specific problem and you already know what it is. There you're already classifying part of the prompt without using inference, and that prunes a huge number of corner cases (future errors).
AI hidden in the backend. It runs after a button click, not after a message is sent: not everything has to be a chat with the AI. The magic of filling in a form and having an agent act behind the scenes with its full power is brutal, precisely because the input is controlled and structured. That removes many other problems, narrows the failure surface, and creates an almost deterministic environment for the agent. It will never be fully deterministic, of course, but that's the point: to keep narrowing the response space.
AI contextualised to specific actions of the app. If the user is in a specific area, the prompt should focus on a specific action rather than letting the AI wander around the whole product. All relevant context should be sent automatically alongside the prompt, so the AI has extra information or can retrieve more (user profile, last actions…).
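To make the "AI hidden in the backend" and "contextualised AI" ideas concrete, here is a minimal Python sketch, assuming a hypothetical refund screen: the button click sends structured fields, and the backend builds a narrow prompt that already carries the user's context. `SupportForm`, `UserContext`, `build_prompt` and the allowed values are illustrative assumptions, not any particular product's API.

```python
from dataclasses import dataclass, field

# Hypothetical structured input: the screen already classified the request,
# so no inference is spent working out what the user wants.
@dataclass
class SupportForm:
    action: str   # fixed by the screen the user is on, e.g. "refund_request"
    order_id: str
    reason: str   # picked from a dropdown, not free text

@dataclass
class UserContext:
    user_id: str
    plan: str
    last_actions: list[str] = field(default_factory=list)

ALLOWED_ACTIONS = {"refund_request", "change_address"}
ALLOWED_REASONS = {"damaged", "late", "wrong_item"}

def build_prompt(form: SupportForm, ctx: UserContext) -> str:
    """Assemble a prompt scoped to one action, with context attached automatically."""
    if form.action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {form.action}")
    if form.reason not in ALLOWED_REASONS:
        raise ValueError(f"unsupported reason: {form.reason}")
    recent = ", ".join(ctx.last_actions[-3:]) or "none"
    return (
        f"Task: handle a '{form.action}' for order {form.order_id}.\n"
        f"Reason selected by the user: {form.reason}.\n"
        f"User plan: {ctx.plan}. Recent actions: {recent}.\n"
        "Only address this task. If information is missing, say which field."
    )

if __name__ == "__main__":
    form = SupportForm("refund_request", "A-1042", "damaged")
    ctx = UserContext("u-7", "premium", ["viewed_order", "opened_help"])
    print(build_prompt(form, ctx))
```

Because the action and reason come from the UI rather than free text, the request is already classified before any inference runs, and invalid inputs fail loudly instead of reaching the model.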

Before Intent Routing and Reasoning and Planning: the piece almost no one visualises

Most failed initiatives don't fail because the model is bad. They fail because the system treats every input the same way: with the same cost, the same latency, and the same execution logic.

And that's before we even talk about latency, which is always a problem, because we users want magic and won't settle for just anything.

That mistake is especially visible in agents, internal assistants, support, voice agents, operational automation, and any product where volume starts to grow. If every interaction goes through the same lane, the system becomes expensive, slow, and hard to govern.

That's why the Intent Routing and Reasoning and Planning layers aren't a technical detail. They are layers with direct impact on margins, user experience, and product viability.

2. Intent Routing: deciding well before thinking harder and spending resources

Intent Routing is the system's ability to understand what type of request it's receiving and decide what path it should follow. Put in business language, it's the layer that prevents you from spending expensive resources on tasks that don't need them.

This connects directly with the idea that agents need (at least) two speeds. Not every query deserves the same treatment. A simple, repetitive, or very narrow request should not consume the same time, cost, and complexity as an ambiguous, sensitive request that requires several tools.

When designing these patterns, one of the categories usually carved out is small talk. Many conversations have simple beginnings and endings: greetings, closings, thank-yous, or basic requests that should be cheap and fast.

A mature design typically separates at least these concepts:

Fast lane for the routine: greetings, FAQs, status queries, closed tasks, cacheable or highly structured responses.
Slow lane for the complex: analysis, chained tasks, tool use, decisions with context, and non-trivial flows.
Guardrails: I usually also add early guardrails in this layer. And they don't have to be just the usual ones (misuse, abusive language, etc.). Guardrails can also be business guardrails. Things you don't want to process, or where you don't want to spend resources, because every word generated by the AI costs money (we'll see this later in a post about the economics of AI). Bouncing them at this first layer is a pure business decision you should demand. It impacts cost, the intent of your product, and performance, because you specialise the use of your resources on what matters to you.
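A rough sketch of the two lanes plus early guardrails, written as cheap deterministic rules. The keyword lists, `Lane` and `route` are hypothetical; in practice this first pass is often a mix of rules and a small classifier model, tuned on your own traffic.

```python
import re
from enum import Enum

class Lane(Enum):
    BLOCKED = "blocked"  # bounced by an early guardrail: no inference spent
    FAST = "fast"        # canned, cached, or small-model responses
    SLOW = "slow"        # larger model, tools, multi-step planning

# Hypothetical rule sets; real ones come from analysing your own traffic.
SMALL_TALK = re.compile(r"(hi|hello|hey|thanks|thank you|bye)[\s!.]*", re.IGNORECASE)
FAQ_KEYWORDS = {"opening hours", "order status", "reset password"}
ABUSE_TERMS = {"idiot", "stupid bot"}                # classic guardrail
OUT_OF_SCOPE = {"write my homework", "crypto tips"}  # business guardrail

def route(message: str) -> Lane:
    """Pick a lane with cheap deterministic rules before any model is called."""
    text = message.lower().strip()
    # Guardrails first: the cheapest request is the one you never process.
    if any(term in text for term in ABUSE_TERMS | OUT_OF_SCOPE):
        return Lane.BLOCKED
    # Pure small talk (the whole message is a greeting or thanks) or a known FAQ.
    if SMALL_TALK.fullmatch(text) or any(k in text for k in FAQ_KEYWORDS):
        return Lane.FAST
    return Lane.SLOW

if __name__ == "__main__":
    for msg in ["Thanks!", "What is my order status?", "Compare my last three invoices"]:
        print(msg, "->", route(msg).value)
```

Note the order: guardrails run before anything else, so out-of-scope or abusive requests never cost a single generated token.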

In addition, Intent Routing can be multi-layered: it doesn't only choose between fast and slow, it also routes to specialised agents within each lane. Think of a call centre with dozens of intents: the system filters and dispatches until it reaches the optimal agent. That way you can quickly cover the Pareto share of operations (the few intents that account for most of the volume) and have a positive impact within a few months.
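Multi-layered routing can be sketched as a two-level dispatch table: first a domain, then a specialised agent inside it, with a human handoff for intents you have decided not to cover with AI yet. The agents and keywords below are hypothetical placeholders for real classifiers and agents.

```python
from typing import Callable, Optional

# Hypothetical specialised agents; each would wrap its own narrow prompt and tools.
def billing_agent(msg: str) -> str:
    return f"[billing] {msg}"

def shipping_agent(msg: str) -> str:
    return f"[shipping] {msg}"

def tech_support_agent(msg: str) -> str:
    return f"[tech] {msg}"

def human_handoff(msg: str) -> str:
    return f"[human] {msg}"

# Level 1: domain. Level 2: intent keyword -> specialised agent.
ROUTES: dict[str, dict[str, Callable[[str], str]]] = {
    "billing": {"refund": billing_agent, "invoice": billing_agent},
    "logistics": {"tracking": shipping_agent, "delivery": shipping_agent},
    "product": {"error": tech_support_agent, "crash": tech_support_agent},
}

def classify_domain(text: str) -> Optional[str]:
    """Level 1: a stand-in for a cheap classifier (rules or a small model)."""
    for domain, intents in ROUTES.items():
        if any(keyword in text for keyword in intents):
            return domain
    return None

def dispatch(msg: str) -> str:
    text = msg.lower()
    domain = classify_domain(text)
    if domain is None:
        return human_handoff(msg)  # intents not yet covered by AI go to people
    # Level 2: pick the specialised agent inside the chosen domain.
    for keyword, agent in ROUTES[domain].items():
        if keyword in text:
            return agent(msg)
    return human_handoff(msg)  # defensive fallback
```

Adding coverage then means adding an entry to the table, which is how you expand outwards from the Pareto intents at your own pace.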

Thanks to this, you decide which operations you are ready to take on with AI, controlling the volume of issues you'll resolve in each case and the quality of the resources invested.

For a C-level, the implication is straightforward. If there's no Routing layer, your operating cost goes up and your user experience drops. You're paying as if everything were premium when most of the traffic isn't. Specialising also drastically increases the agent's success rate, because remember: the more narrowed and contextualised, the better.
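The margin argument is simple arithmetic. A back-of-the-envelope sketch, using made-up per-request costs and an assumed 70% share of routine traffic (none of these are real prices):

```python
def monthly_cost(requests: int, fast_share: float, fast_cost: float, slow_cost: float) -> float:
    """Blended monthly cost when a share of traffic goes to a cheaper lane."""
    if not 0.0 <= fast_share <= 1.0:
        raise ValueError("fast_share must be between 0 and 1")
    return requests * (fast_share * fast_cost + (1 - fast_share) * slow_cost)

# Hypothetical figures for illustration only, not real pricing.
REQUESTS = 1_000_000
FAST, SLOW = 0.001, 0.02  # cost per request, in some currency unit

no_routing = monthly_cost(REQUESTS, 0.0, FAST, SLOW)    # everything treated as premium
with_routing = monthly_cost(REQUESTS, 0.7, FAST, SLOW)  # 70% routine traffic
print(f"{no_routing:.0f} vs {with_routing:.0f}")
```

With these illustrative numbers, the script prints `20000 vs 6700`: the same traffic costs about a third once routine requests stop going through the premium lane. Your real figures will differ; the structure of the calculation is the point.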

And this doesn't only apply to chatbots. It applies to voice agents, employee assistants, sales copilots, document automation, and almost any product with recurring interaction.

Intent isn't a single message. That's the ideal theoretical world: thinking that a user is going to give you enough information to dispatch them in a single interaction. Good Intent Routing has to work with the chat history, handle confidence scores before making decisions, and account for exceptions.
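A minimal sketch of routing that looks at recent history and a confidence threshold rather than a single message. The keyword scoring stands in for a real intent classifier, and `THRESHOLD`, `window` and the intents are assumptions to tune, not recommended values.

```python
import re
from collections import Counter

# Hypothetical keyword lists per intent; a real system would use a model.
INTENT_KEYWORDS = {
    "cancel_subscription": {"cancel", "subscription", "unsubscribe"},
    "billing_issue": {"charged", "invoice", "refund"},
}
THRESHOLD = 0.6  # assumed minimum confidence before dispatching

def score_intents(turns: list[str]) -> dict[str, float]:
    """Score intents over several turns, normalised by the total keyword hits."""
    hits: Counter = Counter()
    for turn in turns:
        words = set(re.findall(r"[a-z]+", turn.lower()))
        for intent, keywords in INTENT_KEYWORDS.items():
            hits[intent] += len(words & keywords)
    total = sum(hits.values())
    return {i: (hits[i] / total if total else 0.0) for i in INTENT_KEYWORDS}

def decide(history: list[str], message: str, window: int = 3) -> str:
    """Route on the last few turns plus the new message, or ask to clarify."""
    scores = score_intents(history[-window:] + [message])
    best = max(scores, key=lambda intent: scores[intent])
    if scores[best] < THRESHOLD:
        return "clarify"  # ask a follow-up question instead of guessing
    return best
```

For example, "yes, please do it" on its own returns `clarify`, but after "I want to cancel my subscription" it resolves to `cancel_subscription`.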

3. Reasoning and Planning: when the system needs judgement, not just text

Once the request has been properly classified, the next layer kicks in: Reasoning and Planning. Here we're not talking simply about the model giving a nicely worded answer. We're talking about how it decomposes a task, decides on steps, evaluates options, and coordinates actions when the problem isn't trivial.

This layer is the one many demos try to simulate with a clever prompt. But in a real environment, useful reasoning doesn't live in isolation. It lives constrained by context, tools, restrictions, memory, permissions, and business goals.

Technical teams know that the demo built in two weeks with a prompt held together with duct tape (that dangerous mirage we mentioned earlier) usually has to be redone almost from scratch. At this point we're no longer talking about a prompt tested with a few cases and a handful of exceptions added as guidelines.

We're talking about AI engineering: data, complex logic, integrations, and events. That's where the gap between the demo and the final product really opens up.

This is also where the two-speed pattern connects. The slow lane isn't just plugging in a more expensive model. It's switching on a richer form of processing when the use case truly demands it.

For business, this changes an important conversation. It's not about always buying the most powerful model, but about using the right level of reasoning at the right moment. That distinction has a brutal impact on cost, latency, and scalability.

A good sign is that your technical team's stack includes several models of different sizes, even from the same provider. That usually means different speeds and different levels of intelligence or specialisation, and it lets you play more cards when designing the solution.
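One way to express "several models of different sizes" is a small tier table and a rule that picks the cheapest tier meeting the task's needs. Tier names and relative costs are placeholders, not real models or prices.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelTier:
    name: str             # placeholder identifier, not a real model name
    relative_cost: float  # cost relative to the smallest tier (assumed)
    supports_tools: bool
    supports_planning: bool

TIERS = [
    ModelTier("small-fast", 1.0, supports_tools=False, supports_planning=False),
    ModelTier("medium", 5.0, supports_tools=True, supports_planning=False),
    ModelTier("large-reasoning", 25.0, supports_tools=True, supports_planning=True),
]

def pick_model(needs_tools: bool, needs_planning: bool) -> ModelTier:
    """Return the cheapest tier that satisfies the task's requirements."""
    candidates = [
        t for t in TIERS
        if (t.supports_tools or not needs_tools)
        and (t.supports_planning or not needs_planning)
    ]
    return min(candidates, key=lambda t: t.relative_cost)
```

Fast-lane traffic lands on the cheapest tier by default; only requests that genuinely need planning pay for the largest one.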

In addition, in an agentic piece of software, inference code almost always accounts for less than 5% of the codebase. The rest is still the usual: software engineering, logic, data handling, resource optimisation, and events. This is key. Combining deterministic logic and traditional programming with inference is, in practice, the most powerful guardrail of all. The indispensable one.
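To illustrate how thin the inference layer can be, here is a sketch of a refund flow where the only fuzzy step is extracting a reason category; eligibility, time windows and escalation are plain code. `llm_extract_reason` is a hypothetical stub standing in for a model call, and the 30-day window and 500 amount limit are assumed business rules.

```python
from datetime import date, timedelta

ALLOWED_REASONS = {"damaged", "late", "wrong_item", "other"}
REFUND_WINDOW = timedelta(days=30)  # assumed business rule
MAX_AUTO_AMOUNT = 500.0             # assumed limit for automatic approval

def llm_extract_reason(text: str) -> str:
    """Hypothetical stand-in for the single inference call in this flow."""
    return "damaged" if "broken" in text.lower() else "other"

def handle_refund(text: str, order_date: date, amount: float, today: date) -> dict:
    reason = llm_extract_reason(text)   # inference: the only fuzzy step
    if reason not in ALLOWED_REASONS:   # never trust free-form model output
        reason = "other"
    if today - order_date > REFUND_WINDOW:  # deterministic policy check
        return {"approved": False, "reason": reason, "why": "outside window"}
    if reason == "other" or amount > MAX_AUTO_AMOUNT:  # escalate instead of guessing
        return {"approved": False, "reason": reason, "why": "needs human review"}
    return {"approved": True, "reason": reason, "why": "policy match"}
```

Even if the model returns something unexpected, the deterministic code around it clamps the output to allowed values and decides what actually happens.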

Honestly, even though it sounds too basic, all of this is about narrowing inference down to the points where it makes a difference, and not leaving too much freedom to the AI — without intent or control — in systems that are going to ship to production.

95% of your shiny agentic software will be practically the same software that was being written five or ten years ago. And that, in fact, is a great sign that your AI engineering team is doing it right.

4. Tools: the point where AI stops being a demo

An AI that can't act, query, or execute often stays in a layer of text. Useful, yes. But limited. The Tools layer is what connects the system to the real business.

What's more, there are increasingly better and more reliable ways of getting an AI to interact with tools. Just two years ago, in 2024, this was much more of a Wild West. Protocols like MCP have paved the way, but they aren't the whole story.

The real magic is that models are being trained better and better to fail less when interacting with other pieces of software, especially in tool calling and function calling.

In 2024, many models were still focused almost entirely on responding to humans in whatever way seemed most useful or most intelligent. We'll see this later in a post about how LLMs are trained to satisfy humans, because it's not trivial. Now we're starting to see a more serious layer of operational reliability.

Querying data, looking something up in a CRM, updating a ticket, kicking off a workflow, summarising a meeting with real context, or cross-referencing information between systems are all examples of value that don't come from the prompt, but from the integration.
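In function or tool calling, the model typically proposes a tool name and JSON arguments, and your application decides whether to execute them. A minimal sketch of that validation step, with hypothetical `lookup_customer` and `update_ticket` tools; it is not MCP or any vendor's SDK, just the application-side check that sits between model output and real systems.

```python
import json
from typing import Any, Callable

# Hypothetical tools; real ones would call your CRM or ticketing APIs.
def lookup_customer(email: str) -> dict:
    return {"email": email, "tier": "gold"}

def update_ticket(ticket_id: int, status: str) -> dict:
    return {"ticket_id": ticket_id, "status": status}

# Registry: tool name -> (function, expected argument names and types).
TOOLS: dict[str, tuple[Callable[..., dict], dict[str, type]]] = {
    "lookup_customer": (lookup_customer, {"email": str}),
    "update_ticket": (update_ticket, {"ticket_id": int, "status": str}),
}

def execute_tool_call(raw: str) -> dict[str, Any]:
    """Validate a model-proposed call (as JSON) before touching real systems."""
    try:
        call = json.loads(raw)
        name, args = call["name"], call["arguments"]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return {"error": f"malformed call: {exc}"}
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    func, schema = TOOLS[name]
    if not isinstance(args, dict) or set(args) != set(schema):
        return {"error": f"expected arguments {sorted(schema)}"}
    for arg, expected in schema.items():
        if not isinstance(args[arg], expected):
            return {"error": f"{arg} must be {expected.__name__}"}
    return {"result": func(**args)}
```

Errors are returned as data rather than raised, so they can be fed back to the model or logged for observability instead of crashing the flow.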

In practice, many companies think they have a model-limitation problem when in fact they have an integration problem or a context problem.

5. Memory: continuity, personalisation, and context

There's a lot of talk now about context engineering: giving the model the context it needs for its answer. Context windows keep growing, which lets you attach even thousands of pages to a question and get an answer that takes all of that context into account in seconds. The trend is towards 2 million tokens becoming generally available within a few months (as of May 2026). It's a beast: Don Quixote, or even the entire Bible, fits comfortably as context.

But it's not all about giving the model as much information as possible. That has to be limited: it's expensive, and it's not always the most effective approach. Remember, divide and conquer applies here too. Continuity matters, yes, but I don't only mean memory that gives continuity to a conversation. I mean saving the conversation's state.
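Limiting context can be as simple as ranking candidate chunks by relevance and keeping only what fits in a budget. This sketch assumes word overlap as the relevance score and word count as a token proxy; real systems would typically use embeddings and the model's own tokenizer.

```python
import re

def tokens(text: str) -> int:
    """Rough proxy: real systems should count with the model's own tokenizer."""
    return len(text.split())

def relevance(query: str, chunk: str) -> float:
    """Share of query words that also appear in the chunk (assumed scoring)."""
    q = set(re.findall(r"[a-z]+", query.lower()))
    c = set(re.findall(r"[a-z]+", chunk.lower()))
    return len(q & c) / len(q) if q else 0.0

def select_context(query: str, chunks: list[str], budget: int) -> list[str]:
    """Greedily keep the most relevant chunks that fit in the token budget."""
    ranked = sorted(chunks, key=lambda ch: relevance(query, ch), reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        cost = tokens(chunk)
        if relevance(query, chunk) > 0 and used + cost <= budget:
            selected.append(chunk)
            used += cost
    return selected
```

Irrelevant chunks are never sent, and the budget caps cost per request regardless of how much material is available.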

If a user is talking to an agent that's supposed to solve a technical problem, and that agent runs a defined, complex flow across several states (greeting, requirements gathering, problem solving…), the current state moves forward as steps are resolved and the prerequisites for the next state are met.

That state, along with other valuable data, can be saved and associated with the user and the conversation. Your agent doesn't need to work out where it is in the process by reading the entire history if that data already exists and is reliable. This concept is usually called state persistence, and it's one of the most important pieces for an agentic system to scale in complexity with judgement.
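State persistence can be sketched as a small state machine keyed by user and conversation, using the stages from the example above. The stage names, prerequisites and the in-memory `STORE` (standing in for a database) are assumptions for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum

class Stage(Enum):
    GREETING = 1
    REQUIREMENTS = 2
    PROBLEM_SOLVING = 3
    RESOLVED = 4

# Facts that must be collected before leaving each stage (assumed).
# Greeting has no prerequisites, so it is passed on the first turn.
REQUIRED = {
    Stage.GREETING: set(),
    Stage.REQUIREMENTS: {"device", "error_code"},
    Stage.PROBLEM_SOLVING: {"fix_confirmed"},
}

@dataclass
class ConversationState:
    stage: Stage = Stage.GREETING
    facts: dict[str, str] = field(default_factory=dict)

STORE: dict[tuple[str, str], ConversationState] = {}  # stand-in for a database

def load(user_id: str, conv_id: str) -> ConversationState:
    return STORE.setdefault((user_id, conv_id), ConversationState())

def advance(user_id: str, conv_id: str, new_facts: dict[str, str]) -> Stage:
    """Merge new facts and move forward while each stage's prerequisites are met."""
    state = load(user_id, conv_id)
    state.facts.update(new_facts)
    while state.stage in REQUIRED and REQUIRED[state.stage].issubset(state.facts):
        state.stage = Stage(state.stage.value + 1)
    return state.stage
```

On each turn the agent loads the stored stage instead of re-reading the whole history, and only the instructions for that stage need to go into the prompt.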

State, as we have been seeing in the previous points, again specialises the context for the AI: "we are at this point and our objective is only X". As you can see, it is the pattern that ties all the points together.

6. Observability: seeing what the system is really doing

Observability can no longer be limited to measuring only classical KPIs, conversion funnels, or traditional software bugs. In agentic systems you also need to observe how routing classifies requests, where tools fail, what decisions the system takes, how much each flow costs, where exceptions are triggered, and what error patterns appear when inference comes into play. Without Observability, often you won't even know why something seemingly intelligent is failing in production.
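A minimal sketch of agent-level observability: one structured event per step (routing decision, tool call, generation) and a summary of cost per request and error rate per route. The event fields are assumptions; in production these events would go to whatever tracing or analytics backend you already use.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class TraceEvent:
    request_id: str
    route: str         # lane or specialised agent chosen by routing
    step: str          # e.g. "classify", "tool:update_ticket", "generate"
    cost: float        # assumed: cost of this step in some currency unit
    latency_ms: float  # recorded for latency analysis; not used in this summary
    error: Optional[str] = None

def summarise(events: list[TraceEvent]) -> dict[str, dict[str, float]]:
    """Per route: average cost per request and share of requests with any error."""
    costs: dict[str, float] = defaultdict(float)
    requests: dict[str, set[str]] = defaultdict(set)
    failed: dict[str, set[str]] = defaultdict(set)
    for e in events:
        costs[e.route] += e.cost
        requests[e.route].add(e.request_id)
        if e.error:
            failed[e.route].add(e.request_id)
    return {
        route: {
            "cost_per_request": costs[route] / len(ids),
            "error_rate": len(failed[route]) / len(ids),
        }
        for route, ids in requests.items()
    }
```

Grouping by route is what lets you see, for instance, that one specialised agent is cheap but fails on tool calls far more often than the others.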

AI enables us to observe the system qualitatively: we should build agents that read and interpret how the system and the other agents are resolving requests, or how satisfied the user ended up.

The key idea a decision maker should keep

If an AI initiative is being framed as if everything depended on the prompt or the model, you're probably looking at the wrong layer. The real value usually lies in how you design the Interface, how you route (specialising and narrowing), how the system reasons and decides, how you integrate Tools, how you manage Memory, and how you observe the whole system.

In other words, an AI product isn't defined by the brilliance of its demo, but by the quality of its architecture, adapted to AI Engineering. And that architecture, when properly thought through, doesn't only make the system look intelligent. It lets you operate it, scale it, narrow it, and turn it into something genuinely useful for business.

Empower yourself with these concepts and think, from the business side, about how to build real AI systems: Which issues repeat the most? Which ones do I want to invest more resources in (slow lane)? Where could visible AI create a negative perception, so that it's better used in the backend to speed up processes and improve the value of my product?…

What we'll cover in Part 2

In the next part of this series we'll get into the key factors a decision maker should put on the table before kicking off an AI project. There we'll talk about, among other things:

Latency, the eternal enemy, especially delicate in voice models and experiences where a few hundred milliseconds completely change the perception of the product.
Languages, because not every region has the same level of model maturity, and this shows up enormously in voice, accents, and multilingual cases.
Cloud vs on-premise, a classic decision that comes back with force, now with a different balance. GPU as a Service stops looking so cheap once an operation scales, but on-prem also brings scarcity, capex, talent, and operational complexity.
Synthetic data and measurable use cases before shipping to production.

The idea is to stop talking about AI as something abstract and start talking about real decisions. The kind that determine whether a project takes off or becomes another expensive pilot.

Closing

If this series does its job (which would make me very happy), by the end you won't have learned to program agents. But you will have learned to ask better questions, to read the stack better, to talk to your team far more effectively, and to lead AI solutions from the business side with a lot more judgement. That means spotting earlier where the value is, where the risk is, where the demo mirages are, and where the product really begins. And for a business person with the capacity to launch initiatives, that already changes the conversation a great deal.

See you in the next post.

Tras el lanzamiento de Liminal hace unos días, quiero empezar una serie de posts que considero especialmente importante. La idea de esta serie es crear piezas que se puedan leer por separado, pero que juntas formen un mapa cada vez más claro para quienes tienen capacidad o interés de impulsar productos, iniciativas o líneas de negocio basadas en IA. Creo que puede ser un buen aporte para c-levels y managers dado todo el ruido que existe ahora mismo en redes respecto a IA.

Empezaremos por la capa más básica y con menos jerga, pero iremos profundizando poco a poco hasta bajar a las tripas de estos sistemas.

El objetivo no es convertir a un perfil de negocio en ingeniero, sino darle criterio suficiente para mantener conversaciones técnicas útiles, tomar mejores decisiones y distinguir una oportunidad real de una demo disfrazada de estrategia.

En cada post usaré dos marcadores muy simples para orientar la lectura:

Complejidad técnica: del 1 al 5. Un 1 significa que cualquier perfil de negocio debería poder seguirlo sin fricción. Un 5 significa contenido ya orientado a equipos técnicos senior.
Valor para negocio: del 1 al 5. Un 5 significa que el contenido está muy orientado a negocio puro y duro, aunque siempre estemos hablando de IA y tecnología.

Además, para agilizar la lectura, los términos especializados de la industria irán siempre en el idioma en el que se acuñaron, normalmente inglés, y enlazados a un glosario para quien quiera ampliar su significado. No me parece correcto forzar la traducción de todo el argot técnico a otras lenguas, porque en muchos casos introduce más ruido que claridad.

Este post es la primera parte de esa serie. Y quiero arrancar con una idea base porque, si no se entiende esto, casi todo lo demás se tuerce: un producto de IA no es un prompt. Suena muy obvio y básico. Esa era la idea. Ahora empecemos a profundizar.

La confusión de origen

La mayor confusión que sigo viendo en el mercado es esta: equipos enteros dicen estar construyendo "productos de IA" cuando, en realidad, han envuelto un par de prompts en una interfaz agradable. A veces la demo es llamativa. A veces incluso funciona en una reunión. Pero eso no significa que exista un producto ni que escale, ni siquiera que vaya a funcionar en una conversación real.

Hay un problema adicional con los productos con Agentes: el espejismo de que algo funciona por el simple efecto wow (cada vez menos wow) de ver a la IA hacer cosas y parecer inteligente. El gap del MVP siempre ha existido en el software: es una cuestión de gestión de expectativas. En software tradicional, si una interfaz o un flujo de datos funcionaban, con unit tests, integration tests, end-to-end tests y unas cuantas iteraciones, simplificando mucho, ya podías tener algo aceptable.

Con software agentic, en cambio, la interfaz es el lenguaje. Y eso es maravilloso, pero también introduce una complejidad brutal de corner cases.

El espejismo aparece justo ahí: cuando ves que el sistema conversa y responde dentro de toda la complejidad del lenguaje humano, parece que el producto ya funciona. Pero cuando llegan usuarios reales, a la habitual fase de bugs de las primeras releases, se le suman otros mucho más difíciles de seguir: hallucinations, peticiones mal formuladas, expectativas ambiguas, demandas no resueltas y comportamientos difíciles de reproducir. Ahí empieza la pesadilla de verdad. La buena noticia es que hay muchas formas de contenerlo, y van bastante más allá de los (sobre) mencionados guardrails.

Cuando esa confusión llega a comité de dirección o a capas de decisión, el problema se multiplica. Se aprueban iniciativas con una imagen completamente equivocada de dónde está la dificultad real, qué activos importan y por qué luego la solución no escala, no retiene usuarios o no genera el retorno esperado.

Un producto de IA real tiene varias capas

Cuando hablamos de producto de IA en serio, normalmente estamos hablando de varias capas que trabajan juntas. No existe una forma oficial de hacerlo, pero sí una serie de componentes indispensables que debes conocer:

Interface — cómo interactúa el usuario: chat, voz, API, integrado dentro de otro sistema, SaaS o como copilot.
Intent Routing — cómo decides qué tipo de petición ha llegado y qué camino debe seguir.
Reasoning and Planning — cómo el sistema piensa, descompone, prioriza y decide pasos.
Tools — qué acciones puede ejecutar de verdad: consultar datos, llamar APIs, actualizar sistemas o disparar procesos.
Agent Memory — qué contexto persiste, qué se recuerda y cómo se usa sin romper seguridad, coste o experiencia.
Observability — cómo mides lo que está pasando de verdad. Además de lo ya conocido (rendimiento, errores, funnels, costes, decisiones del sistema), ahora también es necesario meter observabilidad agéntica (más cualitativa).

El prompt solo toca una parte de este sistema. Importante, sí. Pero solo una parte. Y, de hecho, muchas veces ni siquiera es la parte que más valor crea o más riesgo destruye.

Interactúa con el gráfico para ver cómo se mueve la información dentro de él (hover sobre una caja en escritorio, tap sobre una caja en móvil).

1. Interface: el primer guardrail de un producto de IA

Aquí, además de lo obvio que llevamos lustros construyendo (UX, un buen frontend, un backend bien estructurado para conseguir alta disponibilidad y todo lo demás), debemos pensar en construir experiencias que no dejen a la bestia libre. La propia UX debe ser el primer guardrail de un producto de IA. Y eso evita muchos dolores de cabeza y frustraciones de usuarios.

Dejar una interfaz al estilo ChatGPT en un producto que no es otro ChatGPT es un error. Lo veo con bastante frecuencia en reuniones con clientes y también en algunos diseños del mercado. Los equipos de UX deberían tener, como siempre se ha hecho, una conversación metódica con tus AI Engineers y con tu equipo de producto para decidir qué tipo de interacción acota mejor el problema que el usuario quiere resolver.

Divide y vencerás: interfaz intencional para clasificar la petición. Varios puntos de contacto con agents suele ser mejor que un chat general cuando el usuario está enfocado en resolver un problema específico y tú ya sabes cuál es. Ahí ya estás clasificando parte del prompt sin usar inferencia, y eso poda muchísimos corner cases (futuros errores).
IA oculta en el BackEnd. Se ejecuta tras clicks en botones, no al mandar un mensaje: no todo son chats con la IA. La magia de rellenar un formulario y que un agente actúe por detrás con toda su potencia es brutal, precisamente porque es un input controlado y estructurado. Eso reduce muchos otros problemas, acota mucho el fallo y genera un entorno casi determinista para el agent. Nunca lo va a ser del todo, claro, pero de eso se trata, de ir acotando el área de respuesta.
IA contextualizada a ciertas acciones de la app. Si estás en un área específica, tu prompt debería estar enfocado a una acción específica, no a dejar a la IA de paseo por todo el producto. Se debe mandar automáticamente junto con el prompt todo el contexto posible para que la IA tenga información extra o pueda recuperar otra (perfil del usuario, últimas acciones…).

Antes de Intent Routing y Reasoning and Planning: la pieza que casi nadie visualiza

La mayoría de iniciativas fallidas no fallan porque el modelo sea malo. Fallan porque el sistema trata todos los inputs de la misma forma: con el mismo coste, la misma latencia y la misma lógica de ejecución.

Ya no te digo si la latencia es un problema, que siempre lo es, porque los usuarios queremos magia y no nos conformamos con cualquier cosa.

Ese error se nota muchísimo en agentes, asistentes internos, soporte, agentes de voz, automatización operativa y cualquier producto donde el volumen empieza a crecer. Si cada interacción pasa por el mismo carril, el sistema se vuelve caro, lento y difícil de gobernar.

Por eso las capas de Intent Routing y Reasoning and Planning no son un detalle técnico. Son capas con impacto directo en márgenes, experiencia de usuario y viabilidad del producto.

2. Intent Routing: decidir bien antes de pensar más profundo, e invertir recursos

El Intent Routing es la capacidad del sistema para entender qué tipo de petición está recibiendo y decidir qué camino debe seguir. Dicho en lenguaje de negocio, es la capa que evita gastar recursos caros en tareas que no lo necesitan.

Aquí conecta directamente la idea de que los agentes necesitan dos velocidades (al menos). No todas las consultas merecen el mismo tratamiento. Una petición simple, repetitiva o muy acotada no debería consumir el mismo tiempo, coste y complejidad que una petición ambigua, sensible o que requiere varias herramientas.

Al diseñar estos patrones, uno de los casos que se suelen clasificar gira en torno al concepto anglosajón small talk. Muchas conversaciones tienen un comienzo y un final sencillos: saludos, cierres, agradecimientos o peticiones básicas que deben ser baratas y rápidas.

Un diseño maduro suele separar al menos estos conceptos:

Carril rápido para lo rutinario: saludos, FAQs, consultas de estado, tareas cerradas, respuestas cacheables o muy estructuradas.
Carril lento para lo complejo: análisis, tareas encadenadas, uso de herramientas, decisiones con contexto y flujos no triviales.
Guardrails: yo suelo añadir en esta capa también unos guardrails tempranos. Y no tienen por qué ser solo los de siempre (mal uso, lenguaje obsceno, etc.). Los guardrails también pueden ser de negocio. Cosas que no te interesa procesar, o en las que no quieres invertir recursos, porque cada palabra generada por la IA vale dinero (lo veremos más adelante en un post sobre la economía de la IA). Que reboten en esta primera capa es una decisión pura de negocio que debes exigir. Impacta en coste, en la intención de tu producto y en el rendimiento, porque especializas el uso de tus recursos en lo que te interesa.

Además, intent routing puede ser multicapa: no solo decide entre rápido y lento, sino que enruta a agentes especializados dentro de cada carril. Piensa en un call center con decenas de intenciones: el sistema va filtrando y derivando hasta llegar al agente óptimo. Así se puede cubrir el pareto de operaciones rápidamente e impactar positivamente en unos meses.

Gracias a esto, tú decides qué operaciones estás preparado para asumir con IA, controlando el volumen de issues que resolverás en cada caso y la calidad de los recursos invertidos.

Para un C-Level, la implicación es sencilla. Si no existe una capa de Routing, tu coste operativo sube y tu experiencia de usuario cae. Estás pagando como si todo fuera premium cuando la mayoría del tráfico no lo es. Especializar además incrementa drásticamente las probabilidades de éxito del Agente, porque recuerda: cuanto más acotado y contextualizado, mejor.

Y esto no aplica solo a chatbots. Aplica a agents de voice, asistentes de empleados, copilots comerciales, automatización documental y prácticamente cualquier producto con interacción recurrente.

La intención no es un único mensaje. Ese es el mundo ideal teórico: pensar que un usuario te va a dar suficiente información como para derivarlo en una sola interacción. Un buen Intent Routing debe trabajar con el historial del chat, manejar scores de confianza antes de tomar decisiones y contemplar excepciones.

3. Reasoning and Planning: cuando el sistema necesita criterio, no solo texto

Once the request has been correctly classified, the next layer comes in: Reasoning and Planning. We're not simply talking about the model giving a nice-sounding answer. We're talking about how it breaks a task down, decides on steps, evaluates options and coordinates actions when the problem isn't trivial.
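A deliberately simplified sketch of the plan-then-execute idea, assuming the plan comes back from a model as a list of step names. Here `make_plan` is hard-coded, and every step, handler and customer ID is hypothetical. The code around the plan is ordinary software. It checks each step against what the system is allowed to do, runs the steps in order, and stops at the first failure instead of improvising.

```python
from typing import Callable, Dict, List, Tuple

PREMIUM_IDS = {42}  # hypothetical customer data


def look_up_customer(ctx: dict) -> bool:
    ctx["plan"] = "premium" if ctx["customer_id"] in PREMIUM_IDS else "basic"
    return True


def check_refund_policy(ctx: dict) -> bool:
    # Business rule expressed in code, not left to the model.
    return ctx["amount"] <= 100 or ctx["plan"] == "premium"


def issue_refund(ctx: dict) -> bool:
    ctx["refund_issued"] = True
    return True


ALLOWED_STEPS: Dict[str, Callable[[dict], bool]] = {
    "look_up_customer": look_up_customer,
    "check_refund_policy": check_refund_policy,
    "issue_refund": issue_refund,
}


def make_plan(request: str) -> List[str]:
    """Stand-in for a model call that breaks the request into steps."""
    return ["look_up_customer", "check_refund_policy", "issue_refund"]


def execute(plan: List[str], ctx: dict) -> Tuple[bool, List[str]]:
    done: List[str] = []
    for step in plan:
        handler = ALLOWED_STEPS.get(step)
        if handler is None:  # the model proposed a step we don't allow
            return False, done
        if not handler(ctx):  # a step failed: stop instead of improvising
            return False, done
        done.append(step)
    return True, done


print(execute(make_plan("Refund my last order"), {"customer_id": 42, "amount": 250}))
```

The model's freedom is limited to proposing the sequence. What each step may do, and when the flow must stop, is decided by code.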

This is the layer many demos try to simulate with a good prompt. But in a real environment, useful reasoning doesn't live in isolation. It is conditioned by context, tools, constraints, memory, permissions and business objectives.

Technical teams know that the demo built in two weeks around a prompt held together with sticky tape (that dangerous mirage we talked about earlier) usually has to be rebuilt almost from scratch. We're no longer talking about a prompt tested against a handful of cases, with a few exceptions bolted on as guidelines.

We're talking about AI engineering: data, complex logic, integrations and events. This is where the gap between the demo and the final product really opens up.

This is also where the two-speed pattern comes in. The slow lane isn't just about using a more expensive model. It's about switching to a richer kind of processing when the use case genuinely demands it.

For the business, this changes an important conversation. It's not about always buying the most powerful model, but about using the right level of reasoning at the right moment. That distinction has a huge impact on cost, latency and scalability.

It's a good sign if your technical team has several models of different sizes in its tech stack, even if they come from the same provider. That usually means different speeds and different levels of intelligence or specialisation. And it gives you more cards to play when designing the solution.
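A sketch of how model tiers map onto lanes, and why it matters for cost. The model names, prices, latencies and the 90/10 traffic split are all hypothetical, chosen only to make the arithmetic easy to follow. Check your provider's actual pricing before drawing conclusions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # hypothetical price in dollars
    typical_latency_ms: int    # hypothetical


# Hypothetical catalogue: a small/fast model and a large/slow one.
TIERS: Dict[str, ModelTier] = {
    "fast": ModelTier("small-model", 0.0005, 300),
    "slow": ModelTier("large-model", 0.01, 2500),
}


def monthly_cost(requests_per_lane: Dict[str, int], tokens_per_request: int) -> float:
    return sum(
        n * tokens_per_request / 1000 * TIERS[lane].cost_per_1k_tokens
        for lane, n in requests_per_lane.items()
    )


traffic = {"fast": 90_000, "slow": 10_000}  # illustrative split
routed = monthly_cost(traffic, 1_000)
all_premium = monthly_cost({"slow": sum(traffic.values())}, 1_000)
print(f"routed: ${routed:,.2f}  all premium: ${all_premium:,.2f}")
```

Under these assumptions, routing costs $145 a month against $1,000 for sending everything to the large model. That is the "paying as if everything were premium" problem expressed in numbers.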

What's more, agentic software almost always contains less than 5% inference code. The rest is still the usual: software engineering, logic, data handling, resource optimisation and events. This is key. Combining deterministic logic and traditional programming with inference is, in practice, the most powerful guardrail of all. The indispensable one.
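A small sketch of that combination: the model does one narrow inference task (extracting fields from an email as JSON), and plain code validates the result and applies the business rules. `call_model` is a hypothetical stub standing in for a real LLM call, and the refund limit is illustrative.

```python
import json
from typing import Optional

MAX_AUTO_REFUND = 100.0  # business rule, enforced in code rather than in the prompt


def call_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call that is asked to return JSON."""
    return '{"order_id": "A-1001", "refund_amount": 250.0}'


def extract_refund(email_text: str, raw: Optional[str] = None) -> dict:
    # `raw` lets you inject a model response directly (useful for testing).
    if raw is None:
        raw = call_model(f"Extract order_id and refund_amount as JSON: {email_text}")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"status": "needs_human", "reason": "invalid_json"}

    # Deterministic checks: the model proposes, the code decides.
    if not isinstance(data, dict):
        return {"status": "needs_human", "reason": "invalid_json"}
    order_id, amount = data.get("order_id"), data.get("refund_amount")
    if not isinstance(order_id, str) or isinstance(amount, bool) or not isinstance(amount, (int, float)):
        return {"status": "needs_human", "reason": "missing_fields"}
    if amount <= 0:
        return {"status": "needs_human", "reason": "invalid_amount"}
    if amount > MAX_AUTO_REFUND:
        return {"status": "needs_approval", "order_id": order_id, "amount": amount}
    return {"status": "auto_refund", "order_id": order_id, "amount": amount}


print(extract_refund("Hi, I want my money back for order A-1001."))
```

Only one line in this function touches inference. Everything else is the kind of code that was already being written years ago.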

Honestly, even if it sounds too basic, all of this comes down to confining inference to the points where it makes a difference, and not giving the AI too much freedom, without intent or control, in systems that are going into production.

95% of your shiny new agentic software will be practically the same software people were writing five or ten years ago. And that is actually a great sign that your team of AI Engineers is doing a good job.

4. Tools: the point where AI stops being a demo

An AI that can't act, look things up or execute often remains a layer of text. Useful, yes. But limited. The Tools layer is what connects the system to the real business.

On top of that, there are increasingly better and more reliable ways to get an AI to interact with tools. Only two years ago, in 2024, this was much more of a wild west. Protocols such as MCP (Model Context Protocol) have paved the way, but they aren't the whole of the magic.

The real magic is that models are being trained better and better to make fewer mistakes when interacting with other pieces of software, especially in tool calling and function calling.

In 2024, many models were still focused almost entirely on responding to humans in whatever way seemed most helpful or most intelligent. We'll look at this later in a post on how LLMs are trained to satisfy humans, because it isn't trivial. Now we're starting to see a more serious layer of operational reliability.

Searching for data, querying a CRM, updating a ticket, triggering a workflow, summarising a meeting with real context or cross-referencing information between systems: these are examples of value that doesn't come from the prompt, but from integration.
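A sketch of the integration side of tool calling: the model proposes a call, and the system validates it before touching anything real. The JSON shape (`name` plus `arguments`), the ticket tools and the in-memory `TICKETS` store are hypothetical. Real function-calling APIs and MCP each define their own message formats. The pattern is the same either way: a registry of allowed tools, argument checks and business rules before execution.

```python
import json
from typing import Any, Callable, Dict, Tuple

# Hypothetical in-memory ticketing system standing in for a real integration.
TICKETS = {"T-1": {"status": "open", "owner": "ana"}}


def get_ticket(ticket_id: str) -> dict:
    return dict(TICKETS[ticket_id])


def update_ticket(ticket_id: str, status: str) -> dict:
    TICKETS[ticket_id]["status"] = status
    return dict(TICKETS[ticket_id])


# Registry: tool name -> (function, expected argument names and types).
TOOLS: Dict[str, Tuple[Callable[..., Any], Dict[str, type]]] = {
    "get_ticket": (get_ticket, {"ticket_id": str}),
    "update_ticket": (update_ticket, {"ticket_id": str, "status": str}),
}
ALLOWED_STATUSES = {"open", "pending", "closed"}


def dispatch(tool_call_json: str) -> dict:
    """Validate a tool call proposed by the model, then execute it."""
    try:
        call = json.loads(tool_call_json)
        name, args = call["name"], call["arguments"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return {"ok": False, "error": "malformed tool call"}
    if not isinstance(name, str) or name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    func, schema = TOOLS[name]
    if not isinstance(args, dict) or set(args) != set(schema):
        return {"ok": False, "error": "invalid arguments"}
    if not all(isinstance(args[k], t) for k, t in schema.items()):
        return {"ok": False, "error": "invalid arguments"}
    if args["ticket_id"] not in TICKETS:
        return {"ok": False, "error": "ticket not found"}
    if name == "update_ticket" and args["status"] not in ALLOWED_STATUSES:
        return {"ok": False, "error": "status not allowed"}
    return {"ok": True, "result": func(**args)}


print(dispatch('{"name": "update_ticket", "arguments": {"ticket_id": "T-1", "status": "closed"}}'))
```

Errors come back as structured results rather than exceptions, so they can be fed back to the model or logged for observability.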

In practice, many companies believe they have a model-limitation problem when what they actually have is an integration problem, or a problem of feeding the model the wrong context.

5. Memory: continuity, personalisation and context

There's a lot of talk right now about context engineering: giving the model the context it needs for its answer. Models keep getting larger context windows, which makes it possible to attach even thousands of pages to a question and have them answer, in seconds, taking all of that context into account. The trend is for 2 million tokens to become widespread within a few months (written in May 2026). That's enormous. You could fit the Bible or Don Quixote into the context several times over.

But it's not just about giving the model as much information as possible. That has to be limited. It's expensive and not always the most effective approach. Remember, divide and conquer applies here too. Continuity matters, yes, but I don't just mean memory to keep a conversation going. I mean storing the conversation's state.
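A minimal sketch of limiting context to a budget instead of sending everything. Two assumptions keep it short: relevance is scored by shared words (a real system would use a retriever or embeddings), and token counts use the rough rule of thumb of about four characters per token for English text. All names are hypothetical.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English.
    return max(1, len(text) // 4)


def relevance(chunk: str, question: str) -> int:
    """Stand-in for a real retriever: number of words shared with the question."""
    return len(set(chunk.lower().split()) & set(question.lower().split()))


def build_context(chunks: List[str], question: str, budget_tokens: int) -> List[str]:
    ranked = sorted(chunks, key=lambda c: relevance(c, question), reverse=True)
    selected: List[str] = []
    used = 0
    for chunk in ranked:
        if relevance(chunk, question) == 0:
            break  # irrelevant chunks are not worth paying for
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            continue  # skip chunks that don't fit; a smaller one might
        selected.append(chunk)
        used += cost
    return selected


docs = [
    "refund policy refunds within 30 days",
    "office opening hours",
    "refund limits for premium customers",
]
question = "what is the refund policy"
print(build_context(docs, question, budget_tokens=1000))
print(build_context(docs, question, budget_tokens=10))
```

The budget becomes an explicit, tunable parameter, which is exactly the kind of cost lever the business side should be able to see.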

Suppose a user has talked to an agent that has to solve a technical problem, and that agent follows a complex defined flow through several states (Greeting, Requirements gathering, Problem solving…). The current state changes as steps are completed, and requirements are ticked off in order to move from one state to the next.

That state, along with other valuable data, can be stored and linked to the user and the conversation. Your agent doesn't need to work out where it is in the process by reading the whole history if that information already exists and is reliable. This concept is usually called state persistence, and it is one of the most important pieces for an agentic system to scale in complexity in a controlled way.
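A sketch of state persistence for the support flow described above, assuming a simple state machine where each state has requirements that must be met before moving on. The state names follow the example in the text; the requirement names, the `STORE` dictionary (standing in for a database) and the key format are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Set

FLOW = ["greeting", "requirements_gathering", "problem_solving", "resolved"]
# Requirements that must be met before leaving each state (illustrative).
EXIT_REQUIREMENTS = {
    "greeting": {"customer_identified"},
    "requirements_gathering": {"device_known", "symptom_known"},
    "problem_solving": {"fix_confirmed"},
}


@dataclass
class ConversationState:
    user_id: str
    conversation_id: str
    state: str = "greeting"
    met: Set[str] = field(default_factory=set)

    def mark(self, requirement: str) -> None:
        self.met.add(requirement)
        # Advance while the current state's exit requirements are all met.
        while self.state in EXIT_REQUIREMENTS and EXIT_REQUIREMENTS[self.state] <= self.met:
            self.state = FLOW[FLOW.index(self.state) + 1]


STORE: Dict[str, str] = {}  # stand-in for a database or key-value store


def save(s: ConversationState) -> None:
    data = asdict(s)
    data["met"] = sorted(data["met"])  # sets are not JSON-serialisable
    STORE[f"{s.user_id}:{s.conversation_id}"] = json.dumps(data)


def load(user_id: str, conversation_id: str) -> ConversationState:
    raw = STORE.get(f"{user_id}:{conversation_id}")
    if raw is None:
        return ConversationState(user_id, conversation_id)
    data = json.loads(raw)
    data["met"] = set(data["met"])
    return ConversationState(**data)


s = load("u-1", "c-9")
s.mark("customer_identified")
s.mark("device_known")
save(s)
restored = load("u-1", "c-9")
print(restored.state, sorted(restored.met))
```

When the conversation resumes, the agent gets "we are in requirements gathering and still need the symptom" directly, instead of re-reading the whole history to infer it.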

State, as we've seen in the previous points, once again narrows the context for the AI: "we're at this point and our only goal is X". As you can see, it's the pattern that ties all the points together.

6. Observability: seeing what the system is really doing

Observability can no longer be limited to measuring classic KPIs, conversion funnels or traditional software bugs. In agentic systems you also need to observe how routing classifies requests, where tools fail, what decisions the system makes, how much each flow costs, where exceptions are triggered and what error patterns appear once inference comes into play. Without Observability, you often don't even know why something that looks intelligent is failing in production.
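A minimal tracing sketch, assuming every step of a flow (routing, tool calls, model calls) is wrapped in a span that records duration, cost and errors. The `span` helper, the in-memory `TRACE` list and the flow and step names are hypothetical. In production this is usually done with a tracing tool (OpenTelemetry is a common open standard), but the data you want per step is the same.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    flow: str                    # e.g. "billing_dispute"
    step: str                    # e.g. "routing", "tool:get_invoice", "llm:answer"
    duration_ms: float = 0.0
    cost_usd: float = 0.0
    error: Optional[str] = None


TRACE: List[Span] = []  # stand-in for a tracing backend


@contextmanager
def span(flow: str, step: str):
    s = Span(flow, step)
    start = time.perf_counter()
    try:
        yield s
    except Exception as exc:
        s.error = type(exc).__name__  # record the failure, then let it propagate
        raise
    finally:
        s.duration_ms = (time.perf_counter() - start) * 1000
        TRACE.append(s)


def summary() -> Dict[str, Dict[str, float]]:
    """Total cost, number of spans and number of errors per flow."""
    out: Dict[str, Dict[str, float]] = defaultdict(lambda: {"cost_usd": 0.0, "spans": 0, "errors": 0})
    for s in TRACE:
        out[s.flow]["cost_usd"] += s.cost_usd
        out[s.flow]["spans"] += 1
        out[s.flow]["errors"] += int(s.error is not None)
    return dict(out)


with span("billing_dispute", "routing") as s:
    s.cost_usd = 0.0001
try:
    with span("billing_dispute", "tool:get_invoice"):
        raise TimeoutError("CRM did not answer")
except TimeoutError:
    pass  # a real flow would retry or escalate here
with span("billing_dispute", "llm:answer") as s:
    s.cost_usd = 0.004

print(summary())
```

With this kind of data you can answer "which flow costs the most?" and "which tool fails most often?" per intent, not just per server.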

AI also enables us to observe the system qualitatively: we should build agents that read and interpret how the system and the other agents are resolving requests, or how satisfied the user was.

The key idea a decision maker should take away

If an AI initiative is being approached as if everything depended on the prompt or the model, it's probably looking at the wrong layer. The real value usually lies in how you design the Interface and the Routing (where you specialise and bound), how the system reasons and decides, how you integrate Tools, how you manage Memory and how you observe the whole system.

In other words, an AI product is not defined by how brilliant its demo is, but by the quality of its architecture, adapted to AI Engineering. And that architecture, when it's well thought out, doesn't just make the system look intelligent. It lets you operate it, scale it, keep it bounded and turn it into something genuinely useful for the business.

Empower yourself and use these concepts to think, from the business side, about how to build real AI systems: Which issues come up most often? Which ones do I want to invest more resources in (slow lane)? Where could AI create a negative perception, so that I'd be better off using it in the back end to speed up processes and improve the value of my product?…

What we'll cover in Part 2

In the next part of this series we'll get into the decisive factors a decision maker should put on the table before kicking off an AI project. Among other things, we'll talk about:

Latency, the eternal enemy, especially critical in voice models and in experiences where a few hundred milliseconds completely change how the product is perceived.
Languages, because not every region has the same level of model maturity, and this is very noticeable in voice, accents and multilingual cases.
Cloud vs on-premise, a classic decision that is back in force, now with a different balance. GPU as a Service is no longer so cheap once an operation scales, but on-prem also brings hardware scarcity, capex, talent requirements and operational complexity.
Synthetic data and measurable use cases before going to production.

The idea is to stop talking about AI as something abstract and start talking about real decisions. The kind that determine whether a project takes off or becomes yet another expensive pilot.

Closing

If this series achieves its goal (which would make me very happy), by the end you won't have learned to program agents. But you will have learned to ask better questions, to read the stack better, to talk to your team far more effectively and to lead AI solutions from the business side with solid judgement. That means spotting earlier where the value is, where the risk is, where the demo mirages are and where the real product begins. And for a business person with the ability to launch initiatives, that alone changes the conversation enormously.

See you in the next post.

About Jonathan Castro