Glossary

Industry terms kept in their original language with bilingual definitions.

A

Agent
A software system that uses an AI model — typically an LLM — to perceive context, decide on actions and execute them by calling tools or APIs. Agents can take multiple steps and adapt their plan based on intermediate results.
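The multi-step behaviour described above can be sketched as a bounded decide-act-observe loop. This is a minimal illustration, not a reference implementation: `fake_model`, `TOOLS`, `lookup_order` and `run_agent` are hypothetical names, and `fake_model` is a scripted stand-in for a real LLM call.

```python
from typing import Callable

# Hypothetical tools the agent may call.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_order": lambda order_id: f"order {order_id}: shipped",
}

def fake_model(history: list[str]) -> dict:
    """Scripted stand-in for an LLM: picks the next action from what it has seen."""
    if not any(line.startswith("tool:") for line in history):
        return {"action": "call_tool", "tool": "lookup_order", "arg": "A17"}
    return {"action": "finish", "answer": history[-1].removeprefix("tool: ")}

def run_agent(goal: str, max_steps: int = 5) -> str:
    history = [f"user: {goal}"]
    for _ in range(max_steps):            # bounded loop: the agent cannot run forever
        decision = fake_model(history)    # decide
        if decision["action"] == "finish":
            return decision["answer"]
        result = TOOLS[decision["tool"]](decision["arg"])  # act
        history.append(f"tool: {result}")                  # observe, then decide again
    return "stopped: step limit reached"

print(run_agent("Where is my order A17?"))
```

The step limit is a design choice worth keeping in any real agent: it caps cost and prevents a confused model from looping indefinitely.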
Agent Memory
The mechanisms an AI agent uses to retain information across turns or sessions: short-term context (conversation history, working state) and long-term memory (user profile, learned facts, retrieved documents).
Agentic
Describes AI systems that exhibit agency: they can plan, decide, use tools and act semi-autonomously toward a goal — beyond producing a single text response.
AI Engineering
The practice of designing, building, evaluating and operating AI systems in production — covering model selection, prompting, retrieval, tool integration, evaluation, observability, safety and deployment.
API
Application Programming Interface. A defined contract that exposes the functionality of a service so other software can call it programmatically — typically over HTTP, with structured request and response formats.

B

Backend
The server-side portion of an application — databases, business logic, APIs, background jobs and integrations — that supports the user-facing client without being directly visible to the user.
Batching
Grouping many requests together so a model (on the GPU) processes them in a single pass instead of one by one. It raises throughput and lowers cost per request — a core efficiency lever cloud providers exploit aggressively at scale.
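As a rough sketch of the idea, the snippet below groups requests into fixed-size batches and counts how many passes are needed. `fake_model_pass` is a hypothetical stand-in for a single forward pass over a whole batch; real serving stacks batch dynamically and on the GPU.

```python
from typing import Iterator

def batches(requests: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive groups of at most batch_size requests."""
    for start in range(0, len(requests), batch_size):
        yield requests[start:start + batch_size]

def fake_model_pass(batch: list[str]) -> list[str]:
    """Stand-in for one forward pass that handles a whole batch at once."""
    return [text.upper() for text in batch]

requests = ["a", "b", "c", "d", "e"]
passes = 0
outputs: list[str] = []
for batch in batches(requests, batch_size=2):
    outputs.extend(fake_model_pass(batch))
    passes += 1

print(passes, outputs)  # 3 passes instead of 5 one-by-one calls
```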
Bugs
Defects in software that cause it to behave incorrectly or unexpectedly — anything from a visible crash to a subtle logic error that only appears under specific conditions.

C

CAPEX
Capital Expenditure. Money spent up-front on acquiring or upgrading long-lived assets — buildings, hardware, software licences, infrastructure. In AI, typically associated with buying servers, GPUs and on-premise infrastructure.
Chatbots
Software programs that interact with users through conversational text. Older chatbots are typically rule- or intent-based; modern ones are usually built on LLMs.
ChatGPT
OpenAI's conversational AI product, publicly launched in November 2022. It popularised the chat-based interface to LLMs and triggered the consumer-AI wave.
Cloud
On-demand delivery of computing resources — servers, storage, databases, networking, GPUs — over the internet, billed per use. Major providers include AWS, Azure and Google Cloud.
Cluster
A set of connected machines (often GPU servers) that work together as a single computing resource. To run a model on-premise you deploy it on your own cluster. That only works for models whose weights can be downloaded: proprietary frontier models from the big providers can't be installed on it and are only reachable through their API.
Compliance
The set of legal, regulatory and contractual requirements a product must meet — for example, GDPR, HIPAA, SOC 2, ISO 27001. In AI, compliance often dictates where data may be processed, how it is stored, who can access it, and whether models may be trained on it.
Confidence Scores
A numerical estimate (typically between 0 and 1) of how certain a model is about its own prediction or classification. AI systems use confidence scores to decide whether to act on a result, ask for clarification, escalate to a stronger model or hand off to a human.
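One way to turn a confidence score into the decisions listed above is a simple threshold ladder. The thresholds and route names here are illustrative assumptions; in practice they would be tuned against evaluation data.

```python
def decide(confidence: float,
           act_at: float = 0.9,
           clarify_at: float = 0.6,
           escalate_at: float = 0.3) -> str:
    """Map a confidence score in [0, 1] to one of four hypothetical routes."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= act_at:
        return "act"
    if confidence >= clarify_at:
        return "ask_clarification"
    if confidence >= escalate_at:
        return "escalate_to_stronger_model"
    return "hand_off_to_human"

for score in (0.95, 0.7, 0.4, 0.1):
    print(score, decide(score))
```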
Context Window
The maximum amount of text — measured in tokens — that a language model can process in a single call. It bounds how much instruction, prior conversation and reference material can be included in one request.
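A common consequence of this limit is that older conversation turns must be dropped to make room. The sketch below keeps the instructions plus the most recent messages that fit a token budget. It assumes a crude one-token-per-word count (`count_tokens` is hypothetical); real tokenisers split text differently.

```python
def count_tokens(text: str) -> int:
    """Crude approximation: one token per whitespace-separated word."""
    return len(text.split())

def fit_to_window(instructions: str, history: list[str], max_tokens: int) -> list[str]:
    """Keep the instructions plus as many of the most recent messages as fit."""
    budget = max_tokens - count_tokens(instructions)
    if budget < 0:
        raise ValueError("instructions alone exceed the context window")
    kept: list[str] = []
    for message in reversed(history):   # walk from newest to oldest
        cost = count_tokens(message)
        if cost > budget:
            break                       # stop at the first message that does not fit
        kept.append(message)
        budget -= cost
    return [instructions] + kept[::-1]  # restore chronological order

print(fit_to_window("Be brief.", ["one two three", "four five", "six"], max_tokens=5))
```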
Corner Cases
Edge conditions where multiple unusual factors combine — typically rare, hard to reproduce and often missed in initial testing. In AI systems, corner cases are amplified because the input space (natural language) is essentially unbounded.
CRM
Customer Relationship Management. Software used to manage a company's interactions with current and prospective customers — accounts, contacts, opportunities, support tickets and pipelines.

E

End-to-End Tests
Automated tests that exercise an entire user flow through the real system — UI, backend, databases and external services — to verify the product works as a whole.

F

FAQs
Frequently Asked Questions. A curated list of common questions with prepared answers, typically used in support, documentation and onboarding.
Frontier Models
The most capable AI models available at a given moment — typically the latest releases from leading providers (OpenAI, Anthropic, Google, etc.), running at the upper edge of cost, parameter count and capability. They define the ceiling of what current AI can do.

G

GPU
Graphics Processing Unit. A processor with thousands of cores optimised for the massively parallel maths behind training and running neural networks. GPUs are the dominant — and expensive — hardware for AI inference and training.
GPU as a Service
Renting GPU compute on demand from a cloud or specialised provider, paid by the hour or by the call, instead of purchasing and operating dedicated hardware.
Guardrails
Constraints applied to an AI system to keep its inputs and outputs within acceptable bounds — for example, content filters, schema validation, refusal policies, scope limits and rate caps. Can be enforced before, during or after model inference.
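To make the "before" and "after" enforcement points concrete, here is a minimal sketch with one input check and one output check. `BLOCKED_TERMS` and `REQUIRED_KEYS` are hypothetical policies chosen for illustration only.

```python
import json

BLOCKED_TERMS = {"password", "credit card"}  # hypothetical scope limits
REQUIRED_KEYS = {"answer", "sources"}         # hypothetical output schema

def check_input(prompt: str) -> bool:
    """Pre-inference guardrail: reject prompts that mention blocked terms."""
    lowered = prompt.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

def check_output(raw: str) -> bool:
    """Post-inference guardrail: output must be a JSON object with the required keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and REQUIRED_KEYS.issubset(data)

print(check_input("What is your refund policy?"))       # allowed
print(check_output('{"answer": "x", "sources": []}'))   # well-formed
```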

H

Hallucinations
Outputs from a generative AI model that sound plausible but are factually incorrect or fabricated — invented citations, non-existent functions, made-up details. A widely studied failure mode of LLMs.

I

Idle Zero Cost
An architectural pattern in which agents or services consume no compute resources when they are not actively serving requests. Typically achieved by hibernating idle instances and waking them on demand. Critical for cost control when each user or tenant runs an isolated agent.
Inference
The process of running a trained AI model on an input to produce an output — distinct from training. Each inference call has its own cost, latency and resource footprint.
Input
The data given to a system to be processed. For an AI model, the input is whatever is sent in a single call — typically a prompt plus any attached context, instructions, history, documents or images.
Integration Tests
Automated tests that verify multiple modules or services work correctly when combined — covering the seams between components rather than individual units in isolation.
Intelligence Index
An aggregate score that combines benchmarks across reasoning, mathematics, coding, knowledge and instruction-following to compare AI models. A practical proxy for 'how smart is this model overall', useful when picking a model for a product.
Intent Routing
Classifying an incoming request and dispatching it to the appropriate handler, model or specialised agent — so different types of input follow different processing paths optimised for cost, latency and accuracy.
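A toy version of this pattern, assuming a keyword classifier and three hypothetical handlers. A production router would more likely use a small model or embeddings for the classification step; the dispatch structure stays the same.

```python
import re
from typing import Callable

def classify(text: str) -> str:
    """Toy keyword classifier returning one of three hypothetical intents."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    if words & {"invoice", "refund", "charge"}:
        return "billing"
    if words & {"hello", "hi", "thanks", "bye"}:
        return "small_talk"
    return "general"

# Each intent gets its own processing path (all handlers are stand-ins).
HANDLERS: dict[str, Callable[[str], str]] = {
    "billing": lambda text: f"[billing agent] {text}",
    "small_talk": lambda text: "[canned reply] You're welcome!",
    "general": lambda text: f"[general model] {text}",
}

def route(text: str) -> str:
    return HANDLERS[classify(text)](text)

print(route("I need a refund for this charge"))
```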
Interface
The boundary through which a user or another system interacts with software — for AI products, commonly a chat UI, a voice channel, an API, an embedded widget or a copilot inside another tool.

K

Kernel
The core of an operating system: the program that manages memory, CPU time and access to hardware, and isolates the processes running on a machine from one another. Multiple containers or apps on the same server typically share one kernel.
KPI
Key Performance Indicator. A measurable value that tracks how effectively an organisation, team or product is achieving a specific objective. Common examples in software: conversion rate, retention, response time, monthly active users.
KV Cache
A cache of intermediate computations (the 'keys' and 'values') that a language model stores for the tokens it has already processed, so it doesn't recompute the whole context on every generated token. It speeds up inference dramatically; sharing it across requests that start with the same prefix is a key optimisation for serving many users efficiently.
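The saving can be illustrated by counting how often the key/value projection runs with and without a cache. This is a toy model, not real attention: `project` is a hypothetical stand-in for the expensive per-token projection, and the arithmetic inside it is arbitrary.

```python
projection_calls = 0

def project(token: int) -> tuple[int, int]:
    """Stand-in for the key/value projection of one token (the expensive part)."""
    global projection_calls
    projection_calls += 1
    return (token * 2, token * 3)  # (key, value), arbitrary toy arithmetic

def steps_without_cache(tokens: list[int]) -> int:
    """Recompute keys and values for the whole prefix at every generation step."""
    before = projection_calls
    for step in range(1, len(tokens) + 1):
        for token in tokens[:step]:
            project(token)
    return projection_calls - before

def steps_with_cache(tokens: list[int]) -> int:
    """Project each token once and keep the result for all later steps."""
    before = projection_calls
    cache: list[tuple[int, int]] = []
    for token in tokens:
        cache.append(project(token))  # only the newest token is projected
        # attention for this step would read every pair already in `cache`
    return projection_calls - before

tokens = [1, 2, 3, 4, 5, 6]
print(steps_without_cache(tokens), steps_with_cache(tokens))
```

Without the cache the work grows quadratically with sequence length (1 + 2 + ... + n projections); with it, the work grows linearly.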

L

Latency
The time elapsed between sending a request to a system and receiving its response. In real-time AI experiences, especially voice, latency is one of the dominant drivers of perceived quality.
LLM
Large Language Model. A neural network — typically a transformer — trained on very large amounts of text to predict the next token. Examples include GPT, Claude, Gemini and Llama.
LLM Gateway
A middleware layer placed between an application and one or more LLM providers. It centralises concerns that would otherwise be reimplemented per model: API key management, model routing and fallback, caching, rate-limiting, observability, guardrails and cost control. Examples include Portkey, LiteLLM and OpenRouter.
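The routing-and-fallback part of a gateway can be sketched in a few lines. The providers below are plain Python stand-ins (`flaky_primary`, `stable_backup` are hypothetical), not calls to any real gateway or vendor SDK.

```python
from typing import Callable

Provider = Callable[[str], str]

def flaky_primary(prompt: str) -> str:
    """Stand-in for a provider that is currently failing."""
    raise TimeoutError("primary provider timed out")

def stable_backup(prompt: str) -> str:
    """Stand-in for a healthy fallback provider."""
    return f"backup answer to: {prompt}"

def complete(prompt: str, providers: list[tuple[str, Provider]]) -> tuple[str, str]:
    """Try providers in order; return (provider_name, response) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real gateway would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

print(complete("hi", [("primary", flaky_primary), ("backup", stable_backup)]))
```

Because the application only ever calls `complete`, concerns such as caching, rate limiting or logging could be added in that one place without touching callers.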
Logical Isolation
Separating different customers' or users' data and execution by software boundaries (processes, namespaces, access controls) while they share the same underlying machine and kernel. Cheaper and denser than physical isolation, but several tenants still run on the same host.

M

Managed Hibernation
A pattern where long-lived agent or service instances are automatically suspended (their state frozen) when idle and resumed on the next request, instead of being torn down. It keeps per-tenant isolation while avoiding the cost of machines sitting idle.
MCP
Model Context Protocol. An open standard introduced by Anthropic that defines how AI assistants connect to external tools, data sources and prompts, so the same integration can be reused across compatible clients.
MoE
Mixture of Experts. A neural-network architecture in which the model is internally divided into multiple specialised sub-networks ('experts') and only a small subset is activated for each input. Allows models to grow to extremely large parameter counts while keeping inference cost relatively manageable.
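A minimal sketch of top-k expert routing, assuming scalar inputs and four toy experts. Real MoE layers route per token inside the network and learn the router scores; here the scores are passed in by hand and all names are hypothetical.

```python
import math

def top_k_route(scores: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalise their scores."""
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in chosen)
    exps = {i: math.exp(scores[i] - peak) for i in chosen}  # subtract peak for stability
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Four toy "experts"; each just scales its input.
EXPERTS = [lambda x: x * 1.0, lambda x: x * 2.0, lambda x: x * 3.0, lambda x: x * 4.0]

def moe_forward(x: float, router_scores: list[float], k: int = 2) -> float:
    weights = top_k_route(router_scores, k)
    # Only the chosen experts run; the others cost nothing for this input.
    return sum(w * EXPERTS[i](x) for i, w in weights.items())

print(moe_forward(1.0, [0.1, 2.0, 2.0, -1.0]))
```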
Multi-Tenant
A software architecture in which a single instance of the system serves multiple customers (tenants), keeping their data and configuration logically isolated. The opposite is 'single-tenant', where each customer gets a dedicated instance, which is usually more expensive and is often required by stricter compliance regimes.
MVP
Minimum Viable Product. The smallest version of a product that can be released to validate a core hypothesis with real users while keeping cost and time low.

O

Observability
The ability to understand a running system's internal state from its external outputs — typically a combination of metrics, logs, traces and events. In AI systems it also covers model decisions, tool calls, costs and quality signals.
On-Premise
Running software on hardware operated inside an organisation's own data centres or offices, rather than on a public cloud. Often abbreviated 'on-prem'.
OPEX
Operational Expenditure. Ongoing day-to-day expenses to run a business — salaries, utilities, software-as-a-service subscriptions, pay-as-you-go cloud usage. In AI, typically associated with cloud APIs, pay-per-token model calls and managed services.

P

Paralinguistic Cues
Vocal signals that accompany speech without being words: tone, pauses, laughter, sighs, hesitation, volume and pace. They carry intent and emotion, and are critical for a voice agent to interpret what the user really means.
Parameters
The internal numerical weights a model learns during training; they encode what it 'knows'. Count is a rough proxy for capacity: more parameters generally means more capability (and more cost to run). '1T parameters' means one trillion (10^12) of them, roughly the scale attributed to the largest frontier models today.
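The cost implication of parameter count follows from simple arithmetic: memory for the weights is the number of parameters times the bytes used per parameter. The helper below is a back-of-the-envelope sketch (`weight_memory_gb` is a hypothetical name) and ignores activations, KV cache and other runtime overhead.

```python
def weight_memory_gb(parameters: float, bits_per_parameter: int) -> float:
    """Memory for the weights alone, in gigabytes (10^9 bytes)."""
    return parameters * bits_per_parameter / 8 / 1e9

# One trillion parameters stored at 16 bits each, then at 4 bits each.
print(weight_memory_gb(1e12, 16))
print(weight_memory_gb(1e12, 4))
```

At 16 bits per parameter, one trillion parameters need about 2,000 GB for the weights alone; at 4 bits the same model needs about 500 GB.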
Pareto
Pareto principle, also known as the 80/20 rule: the empirical observation that, in many domains, roughly 80% of effects come from 20% of causes.
Physical Isolation
Giving each customer or tenant its own dedicated infrastructure (a dedicated machine or instance) so their data and execution never share hardware or kernel with anyone else's. Containers on a shared host do not qualify, because they share the host's kernel. Stronger than logical isolation and often demanded by strict compliance, but more expensive, since each tenant runs on resources of its own.
Product Market Fit
The point at which a product clearly resonates with a real market segment: customers actively pull it, retention is healthy and growth becomes organic. Reaching product-market fit is typically the goal of the early phase of a product, before scale and optimisation.
Prompt
The text instruction or query supplied as input to a language model to produce a response. Prompts can include role definitions, task description, examples, constraints and reference content.
Prosody
The rhythm, intonation, stress and tempo of speech — everything beyond the words themselves. A voice model that handles prosody well sounds natural and conveys emotion; one that doesn't sounds robotic.

Q

Quantisation
A compression technique that lowers the numerical precision of a model's weights (for example from 16-bit to 8-bit or 4-bit). A quantised model uses less memory and runs faster, usually with a small, controlled loss of quality — key to running large models on limited hardware.
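One simple scheme, shown here as an illustration rather than as what any particular library does, is symmetric 8-bit quantisation: every weight is divided by a shared scale factor and rounded to an integer in [-127, 127]. The function names are hypothetical.

```python
def quantise_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # fall back to 1.0 if all weights are 0
    return [round(w / scale) for w in weights], scale

def dequantise(quantised: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantised]

weights = [0.5, -1.27, 0.003, 1.0]
quantised, scale = quantise_int8(weights)
restored = dequantise(quantised, scale)
print(quantised)
print(restored)
```

Each restored weight differs from the original by at most half the scale factor, which is the "small, controlled loss of quality" the definition refers to. Small weights such as 0.003 round to zero, showing where precision is lost.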

R

Reasoning and Planning
An AI system's capacity to break a complex task into intermediate steps, weigh alternatives and decide on a sequence of actions — often combining model inference with deterministic logic and tool use.

S

Serverless
A deployment model where the provider runs your code on demand and allocates compute only while a request is being handled, scaling to zero when idle. You don't manage servers and you pay per execution — well suited to bursty or intermittent workloads.
Small Talk
Casual, low-content conversational exchanges — greetings, acknowledgements, thank-yous, closings — that fulfil a social function rather than transferring substantive information.
State Persistence
Storing the state of an ongoing process or conversation in a durable store so it can be reloaded later, instead of reconstructing it from raw history each time. A standard pattern for long-running or multi-session systems.
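A minimal sketch of the pattern, assuming a JSON file as the durable store. The file name, state fields and helper names are hypothetical; a production system would more likely use a database or key-value store.

```python
import json
import tempfile
from pathlib import Path

def save_state(path: Path, state: dict) -> None:
    """Write the current state to the durable store."""
    path.write_text(json.dumps(state), encoding="utf-8")

def load_state(path: Path) -> dict:
    """Reload saved state, or start fresh if nothing has been stored yet."""
    if not path.exists():
        return {"turns": 0, "summary": ""}
    return json.loads(path.read_text(encoding="utf-8"))

with tempfile.TemporaryDirectory() as folder:
    store = Path(folder) / "session-123.json"  # hypothetical session id
    state = load_state(store)                  # first session: nothing stored yet
    state["turns"] += 1
    state["summary"] = "user asked about invoice"
    save_state(store, state)
    restored = load_state(store)               # a later session picks up where it left off

print(restored)
```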
Synthetic Data
Data generated artificially — by simulation, rule-based generators or AI models — to augment or replace real-world data. Used to train models, fill rare cases, stress-test systems and protect privacy.

T

Time to Market
The elapsed time between starting work on a product and making it available to its first real users. In early-stage AI products, time to market is often more valuable than infrastructure savings because it accelerates feedback and validation.
Tokens
The atomic units used by a language model to process text. A token is typically a short piece of a word; pricing, context limits and throughput in modern AI APIs are measured in tokens.
Tools
Functions or external services an AI agent can invoke to act on the world or fetch information, for example querying a database, calling an API, executing code or sending an email. Invoking a tool is also referred to as function calling or tool calling.

U

Unit Tests
Automated tests that verify the behaviour of a single unit of code — usually a function or class — in isolation from the rest of the system. Fast, cheap and run on every change.
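For illustration, here is a small unit test written with Python's standard `unittest` module. The function under test, `apply_discount`, is a hypothetical example.

```python
import unittest

def apply_discount(price: float, percent: float) -> float:
    """Return the price after a percentage discount, rounded to cents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

class ApplyDiscountTest(unittest.TestCase):
    def test_typical_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_rejects_invalid_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(10.0, 120)

# Run the two tests programmatically and collect the result.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test exercises one behaviour of one function with no database, network or UI involved, which is what keeps unit tests fast enough to run on every change.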
UX
User Experience. The discipline of designing how a person interacts with a product — flows, layouts, controls, content and overall feel — so the experience is clear, efficient and pleasant.

W

Workflow
A defined sequence of steps that produces a business outcome — for example, processing an invoice, onboarding a customer or fulfilling an order. May be manual, automated, or a mix of both.
