Data & AI Glossary

A

A/B Testing

A controlled experiment that randomly assigns users to one of two versions of something to determine which performs better on a chosen metric.
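
As an illustration, the comparison is often judged with a statistical test. The sketch below assumes the metric is a conversion rate and uses a two-proportion z-test; the function name and the visitor counts are hypothetical, and other tests may suit other metrics better.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: version A converts 120 of 1000 visitors, version B 150 of 1000.
z, p = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the difference is unlikely to be due to chance alone; the threshold to use is a decision for the team running the test.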

Agentic AI

AI designed to act with autonomy, breaking down goals into steps and executing them.

AI Act

The European Union regulation classifying AI systems by risk level and setting rules for their use.

AI Agent

An AI system that can plan, use tools, and take actions autonomously to achieve a goal.

AI Bias

Systematic unfairness in AI outputs, often inherited from biased training data.

AI Governance

The policies and oversight ensuring AI systems are used safely and in line with organisational values.

Algorithm

A defined sequence of steps a computer follows to solve a problem or perform a task.

Analytics

The systematic analysis of data to discover patterns and support decisions.

API (Application Programming Interface)

A set of rules allowing different software systems to communicate and exchange data.

Artificial Intelligence (AI)

Systems designed to perform tasks that normally require human intelligence, such as reasoning or perception.

B

Big Data

Datasets so large, fast, or varied that traditional tools cannot process them efficiently.

Black Box

A model whose internal decision-making process is difficult or impossible to interpret.

Business Intelligence (BI)

Tools and practices that turn data into dashboards, reports, and insights for business users.

C

Chatbot

A program that interacts with users through conversation, often powered by language models.

Classification

A task where a model assigns inputs to predefined categories, such as spam or not spam.

Cloud Computing

Delivering computing services such as storage and processing over the internet, on demand.

Clustering

Grouping similar data points together without predefined categories.

Compliance

Adhering to laws, regulations, and internal policies governing data use.

Computer Vision

The field of AI enabling machines to interpret and analyse images and videos.

Context Window

The maximum amount of text, measured in tokens, that a language model can consider at once.

Correlation

A statistical relationship between two variables, which does not necessarily imply causation.
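
One common measure is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand on made-up monthly figures; the data and the function name are assumptions for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly figures: ice-cream sales and sunburn cases.
ice_cream = [20, 35, 50, 65, 80]
sunburn = [2, 4, 5, 7, 8]
r = pearson(ice_cream, sunburn)
print(f"r = {r:.2f}")
```

The two series move together closely, yet neither causes the other; a third factor such as sunny weather could plausibly drive both.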

D

Dashboard

A visual interface displaying key metrics and indicators at a glance.

Data

Raw facts, figures, or observations that have not yet been processed to produce meaning.

Data Acculturation

The process of building a shared data culture and mindset across an organisation.

Data Anonymisation

Removing or altering personal identifiers so individuals can no longer be identified.

Data Catalog

An organised inventory of an organisation's data assets, making them easy to find and understand.

Data Governance

The framework of policies, roles, and processes ensuring data is managed responsibly across an organisation.

Data Ingestion

The process of collecting and importing data from various sources into a storage system.

Data Lake

A storage system holding raw data in its original format until it is needed.

Data Lakehouse

An architecture combining the flexibility of a data lake with the management features of a warehouse.

Data Lineage

The documented journey of data, showing where it comes from and how it is transformed.

Data Literacy

The ability to read, understand, question, and communicate with data.

Data Mesh

A decentralised approach where business domains own and serve their data as products.

Data Mining

Exploring large datasets to discover hidden patterns and relationships.

Data Owner

The person accountable for a data asset, its access rules, and its business value.

Data Pipeline

An automated sequence of steps that moves and transforms data from source to destination.

Data Point

A single unit of information, such as one measurement or one customer record field.

Data Privacy

The protection of personal information and individuals' rights over how their data is used.

Data Quality

The degree to which data is accurate, complete, consistent, and fit for its intended use.

Data Retention

Rules defining how long data is kept before being archived or deleted.

Data Security

The measures protecting data against unauthorised access, corruption, or theft.

Data Steward

A person responsible for the quality, documentation, and proper use of specific data assets.

Data Storytelling

Combining data, visuals, and narrative to communicate insights in a compelling way.

Data Visualisation

Representing data graphically through charts, maps, or diagrams to make it easier to understand.

Data Warehouse

A central repository storing structured data from multiple sources, optimised for analysis and reporting.

Database

An organised system for storing, managing, and retrieving data electronically.

Dataset

A structured collection of related data, usually organised in rows and columns.

Deep Learning

A type of machine learning using multi-layered neural networks to learn complex patterns.

Descriptive Analytics

Analysis that explains what happened, based on historical data.

Digital Transformation

The integration of digital technologies, including data and AI, to reshape how an organisation operates and delivers value.

E

ELT (Extract, Load, Transform)

A variant of ETL where raw data is loaded first and transformed inside the target system.

Embedding

A numerical representation of text or other data that captures its meaning for machines.
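
To give a feel for how embeddings are compared, the sketch below uses cosine similarity on three hand-made vectors. The vectors are invented for illustration; real embeddings are produced by a trained model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings, not the output of any real model.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.9, 0.15],
    "invoice": [0.1, 0.2, 0.95],
}

sim_close = cosine_similarity(embeddings["cat"], embeddings["kitten"])
sim_far = cosine_similarity(embeddings["cat"], embeddings["invoice"])
print(f"cat vs kitten:  {sim_close:.2f}")
print(f"cat vs invoice: {sim_far:.2f}")
```

Words with related meanings end up with a higher similarity score than unrelated ones.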

ETL (Extract, Transform, Load)

A process that extracts data, transforms it into a usable format, then loads it into a target system.
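
The three steps can be sketched with the Python standard library. The CSV content, table name, and cleaning rules below are hypothetical, and an in-memory SQLite database stands in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with inconsistent spacing, casing, and a missing amount.
RAW_CSV = """order_id,customer,amount_eur
1, Alice ,19.90
2,BOB,5.00
3,carol,
"""


def extract(text):
    """Read the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows without an amount, tidy names, and convert types."""
    cleaned = []
    for row in rows:
        if not row["amount_eur"].strip():
            continue  # skip rows with a missing amount
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(row["amount_eur"]))
        )
    return cleaned


def load(rows, conn):
    """Write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount_eur REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, customer, amount_eur FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

In an ELT setup, the raw rows would be loaded as they are and the cleaning would run afterwards inside the target system.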

Explainability

The ability to understand and explain how an AI model reaches its decisions.

F

Features

The input variables a model uses to make predictions.

Few-Shot Learning

A model performing a task after seeing only a handful of examples in the prompt.
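
A few-shot prompt can be assembled as plain text. The sketch below assumes a sentiment task; the wording, the example reviews, and the function name are hypothetical, and the call to an actual model is left out.

```python
def build_few_shot_prompt(examples, query):
    """Build a prompt that shows labelled examples before the new input."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final item is left unlabelled for the model to complete.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Hypothetical labelled examples.
examples = [
    ("Great quality, works perfectly.", "positive"),
    ("Stopped working after two days.", "negative"),
]
prompt = build_few_shot_prompt(examples, "Arrived late and damaged.")
print(prompt)
```

Passing an empty list of examples to the same function produces a prompt with instructions only.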

Fine-Tuning

Further training a pre-trained model on specific data to specialise it for a task.

Foundation Model

A large model trained on broad data, adaptable to many downstream tasks.

G

GDPR

The European Union's General Data Protection Regulation, governing how organisations collect, store, and process personal data.

Generative AI

AI systems that create new content such as text, images, code, or audio.

H

Hallucination

When an AI model generates information that sounds plausible but is false or invented.

Human-in-the-Loop

Keeping humans involved in AI decisions to validate, correct, or override outputs.

I

Inference

Using a trained model to make predictions on new, unseen data.

Information

Data that has been organised and contextualised so it becomes useful for decision-making.

K

KPI (Key Performance Indicator)

A measurable value showing how effectively an objective is being achieved.

L

Label

The correct answer attached to a training example in supervised learning.

LLM (Large Language Model)

A model trained on massive amounts of text to understand and generate human language.

M

Machine Learning (ML)

A branch of AI where systems learn patterns from data instead of following explicit rules.

Master Data

The core reference data of an organisation, such as customers, products, or suppliers.

Metadata

Data that describes other data, such as a file's author, creation date, or format.

MLOps

The practices and tools for deploying, monitoring, and maintaining machine learning models in production.

Model

The output of training an algorithm on data, capable of making predictions on new inputs.

Model Drift

The degradation of a model's performance over time as real-world data changes.

Multimodal AI

AI capable of processing several types of input, such as text, images, and audio together.

N

Neural Network

A computing model inspired by the human brain, made of interconnected layers of artificial neurons.

NLP (Natural Language Processing)

The field of AI enabling machines to understand and generate human language.

O

Open-Source Model

An AI model whose weights are publicly available for anyone to use or adapt, subject to the terms of its licence.

Overfitting

When a model memorises training data too closely and performs poorly on new data.

P

Predictive Analytics

Analysis that uses historical data to forecast what is likely to happen.

Prescriptive Analytics

Analysis that recommends actions to take based on predicted outcomes.

Prompt

The instruction or question given to a generative AI model to produce a response.

Prompt Engineering

The practice of crafting effective prompts to get better results from AI models.

R

RAG (Retrieval-Augmented Generation)

A technique where a system retrieves relevant documents and passes them to the model as context before it generates its answer.
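
The retrieval step can be sketched in a few lines. This toy version ranks documents by shared words, which is an assumption made for simplicity; production systems usually rely on embeddings and a vector search, and the generation step (sending the prompt to a language model) is omitted. All names and documents are hypothetical.

```python
def tokenize(text):
    """Lowercase the text, strip basic punctuation, and return its set of words."""
    return set(text.lower().replace("?", "").replace(".", "").split())


def retrieve(question, documents, k=1):
    """Return the k documents sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents):
    """Place the retrieved documents in the prompt as context for the model."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical knowledge base.
documents = [
    "Refunds are processed within 14 days of the return.",
    "Our office is closed on public holidays.",
    "Data is retained for five years after contract end.",
]
question = "How long is data retained?"
prompt = build_prompt(question, documents)
print(prompt)
```

Grounding the answer in retrieved text is one way to reduce invented answers, although it does not eliminate them.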

Regression

A task where a model predicts a continuous numerical value, such as a price.
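
The simplest case is fitting a straight line by least squares. The sketch below does this by hand on invented flat prices; the data and function name are hypothetical, and real projects would normally use a statistics or machine learning library.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical flats: floor area in m² and price in thousand euros.
areas = [30, 50, 70, 90]
prices = [150, 240, 330, 420]

slope, intercept = fit_line(areas, prices)
predicted = slope * 60 + intercept  # predicted price for a 60 m² flat
print(f"price = {slope:.1f} * area + {intercept:.1f}; 60 m² -> {predicted:.0f}")
```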

Reinforcement Learning

Training an agent through trial and error, using rewards and penalties.

Reporting

The regular production of structured documents presenting data on activities and performance.

Responsible AI

Developing and using AI in ways that are ethical, fair, transparent, and accountable.

S

Self-Service BI

Tools enabling business users to explore data and build reports without technical help.

Semi-Structured Data

Data with some organisational markers but no rigid schema, such as JSON or XML files.
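
The sketch below parses a small JSON document in which each record carries a different set of fields. The records are invented for illustration.

```python
import json

# Hypothetical records: both have an id and a name, but the other fields differ.
raw = """
[
  {"id": 1, "name": "Alice", "email": "alice@example.com"},
  {"id": 2, "name": "Bob", "phones": ["+33 1 23 45 67 89"], "address": {"city": "Lyon"}}
]
"""

records = json.loads(raw)

# The set of fields is only known after reading the data, not fixed in advance.
all_fields = sorted({key for record in records for key in record})
print(all_fields)

# Optional and nested fields have to be read defensively.
cities = [record.get("address", {}).get("city") for record in records]
print(cities)
```

The keys act as organisational markers, but nothing forces every record to carry the same ones.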

SQL (Structured Query Language)

The standard language used to query and manipulate data in relational databases.
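
A short example, run here through Python's built-in sqlite3 module against an in-memory database. The table and figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 60.0)],
)

# Total sales per region, largest first.
totals = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY total DESC"
).fetchall()
print(totals)
```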

Structured Data

Data organised in a predefined format, such as tables with rows and columns.

Supervised Learning

Training a model on labelled examples where the correct answer is provided.

System Prompt

Instructions, usually not visible to the end user, that define an AI assistant's behaviour, tone, and boundaries.

T

Temperature

A sampling setting controlling how random a model's responses are: low values give more predictable output, high values give more varied output.
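
In many language models, temperature divides the model's raw scores (logits) before they are turned into probabilities. The sketch below illustrates that idea with a softmax over three made-up scores; it is a simplified picture, not the implementation of any particular model.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities after scaling by the temperature."""
    scaled = [x / temperature for x in logits]
    highest = max(scaled)  # subtracted for numerical stability
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]

cool = softmax_with_temperature(logits, temperature=0.5)
warm = softmax_with_temperature(logits, temperature=2.0)
print("T=0.5:", [round(p, 2) for p in cool])
print("T=2.0:", [round(p, 2) for p in warm])
```

At the lower temperature the top candidate takes most of the probability, so sampling picks it nearly every time; at the higher temperature the probabilities are closer together and the output varies more.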

Token

A small unit of text, such as a word or word fragment, that language models process.

Training

The process of teaching a model by exposing it to data and adjusting its parameters.

U

Underfitting

When a model is too simple to capture the patterns in the data.

Unstructured Data

Data without a predefined format, such as emails, images, videos, or free text.

Unsupervised Learning

Training a model on unlabelled data to discover hidden structures or groups.

Z

Zero-Shot Learning

A model performing a task from instructions alone, without being shown any examples of it in the prompt.