Key terms in Belgian tax and AI explained
Small trainable modules inserted into frozen pretrained models, enabling efficient task-specific fine-tuning with minimal parameters.
Systematically probing models with difficult or malicious inputs to find failures.
The process of training AI systems to behave in accordance with human values, intentions, and preferences—ensuring models are helpful, harmless, and honest.
Ensuring that generated answers can be traced back to specific supporting sources.
Algorithms that find approximately similar vectors quickly by trading perfect accuracy for massive speed improvements.
A neural network technique that allows models to focus on relevant parts of input when producing output, enabling context-aware processing.
The AI capability of linking generated statements to specific source evidence, establishing which parts of the output are supported by which documents or data points.
An algorithm that efficiently computes gradients by propagating errors backward through a neural network layer by layer.
A decoding algorithm that explores multiple candidate sequences in parallel, keeping a fixed number of the most promising paths (the beam width) at each step.
The systematic process of evaluating model performance against standardized datasets and metrics, enabling fair comparison between different models, architectures, and approaches.
A neural architecture that separately encodes queries and documents into fixed-size vectors, enabling efficient similarity search through pre-computed embeddings and approximate nearest neighbor indexes.
Bias mitigation is the set of methods used to detect and reduce unfair bias in an AI system’s data, model behavior, and outcomes.
Best Match 25 - a widely used probabilistic ranking function for text search that builds on TF-IDF principles, adding term-frequency saturation and document-length normalization.
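As an illustrative sketch of how BM25 scores a document (toy pre-tokenized corpus, common default k1 and b values; not tied to any particular search library):

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a query with Okapi BM25 (a sketch)."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N      # average document length
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)           # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)    # smoothed IDF
        tf = doc.count(term)                               # term frequency
        score += idf * tf * (k1 + 1) / (
            tf + k1 * (1 - b + b * len(doc) / avgdl)
        )
    return score

corpus = [["tax", "law", "belgium"], ["tax", "rate"], ["machine", "learning"]]
score = bm25_score(["tax"], corpus[0], corpus)
print(score)  # positive: the document contains the query term
```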
A subword tokenization algorithm that builds a vocabulary by iteratively merging frequent symbol pairs.
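A minimal sketch of the BPE merge loop on a toy corpus (character-level start, no end-of-word markers or special tokens; production tokenizers add many refinements):

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn byte-pair-encoding merges from a tiny corpus (illustrative sketch)."""
    # Represent each word as a tuple of symbols, starting from characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)     # most frequent pair wins
        merges.append(best)
        merged = best[0] + best[1]
        # Rewrite the vocabulary with the pair fused into one symbol.
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

learned = bpe_merges(["low", "lower", "lowest"], 2)
print(learned)
```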
Aligning model confidence scores with the true likelihood of correctness.
A prompting technique that elicits step-by-step reasoning from language models, improving performance on complex tasks by making the model's reasoning process explicit and verifiable.
The method of dividing documents into smaller segments for effective retrieval and processing in RAG systems.
The practice of explicitly referencing source documents in AI-generated responses, enabling verification of claims and building trust through transparency and traceability.
A range of values within which a quantity is believed to lie with a specified probability.
The practice of adding retrieved or auxiliary information into an LLM prompt to guide generation.
The maximum amount of text (measured in tokens) that a language model can process in a single interaction.
Regularly re-running evaluations in production to detect regressions or drift early.
A mathematical measure of similarity between two vectors based on the cosine of the angle between them.
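The computation, sketched in plain Python:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```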
A neural architecture that jointly encodes query and document pairs to produce relevance scores, providing higher accuracy than bi-encoders but at greater computational cost.
A subset of machine learning using neural networks with many layers to learn hierarchical representations from data.
Information retrieval using learned dense vector representations, enabling semantic matching beyond keyword overlap.
Techniques that reduce the number of features in data while preserving as much structure as possible.
A function that quantifies how far apart two points are in a space, subject to metric properties.
A similarity measure computed as the dot product between two embedding vectors.
Techniques to map embeddings from different models or languages into a shared space.
Reducing the size or precision of embeddings to save memory and speed up search.
Changes in embedding distributions over time that can degrade retrieval or model performance.
A machine learning model that converts inputs like text or images into vector embeddings.
The high-dimensional vector space in which embeddings live and semantic relationships are encoded geometrically.
Dense vector representations of data (text, images, etc.) that capture semantic meaning in a continuous numerical space.
Carefully examining where and why a model fails to improve future iterations.
The straight-line distance between two points in a vector space, used as a metric between embeddings.
A reusable setup for defining, running, and tracking evaluations of AI systems.
A curated set of inputs and gold-standard outputs used to measure model or system performance.
The ability to understand, interpret, and explain how AI/ML models make predictions—essential for trust, debugging, regulatory compliance, and responsible AI deployment.
The degree to which a generated answer agrees with trusted sources or ground truth.
The degree to which AI-generated content accurately reflects verifiable truth, distinguishing correct statements from fabrications, errors, and hallucinations.
Facebook AI Similarity Search - an open-source library for efficient similarity search and clustering of dense vectors.
The property that a model’s explanation or answer accurately reflects its underlying reasoning or evidence.
A neural network where information flows in one direction from input to output without recurrent connections.
A machine learning paradigm where models learn to perform tasks from just a handful of examples, enabling rapid adaptation without extensive retraining or fine-tuning.
The process of further training a pre-trained model on domain-specific data to improve performance on specialized tasks.
An LLM capability where the model selects and fills structured arguments to call external tools or functions.
The part of a RAG system where the language model conditions on retrieved context to produce an answer.
An optimization algorithm that iteratively adjusts model parameters by moving in the direction that reduces the loss function.
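A toy one-dimensional sketch, assuming the gradient function is already known:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Minimize a 1-D function given its gradient (illustrative sketch)."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)   # step against the gradient
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3); the minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```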
A simple text generation strategy that always selects the highest-probability token at each step.
The authoritative, verified reference data used to train and evaluate machine learning models—the 'correct' answers against which model predictions are measured.
The technique of anchoring AI model outputs to verifiable sources, facts, or retrieved documents to reduce hallucinations and increase response accuracy and trustworthiness.
Safety mechanisms and constraints that prevent AI systems from generating harmful, inappropriate, or off-topic outputs—providing runtime protection beyond training-time alignment.
When an AI model generates false, fabricated, or unsupported information presented as fact.
The proportion of a model’s outputs that contain unsupported, fabricated, or false statements.
Hierarchical Navigable Small World graphs - a graph-based algorithm widely used for fast approximate nearest neighbor search in high-dimensional spaces.
Using human reviewers to check, correct, or approve AI outputs as part of an evaluation process.
Building and maintaining combined sparse and dense indices to support hybrid search.
A retrieval approach combining keyword-based and semantic vector search to leverage the strengths of both methods.
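One common way to combine the two result lists is reciprocal rank fusion (RRF); this sketch assumes each retriever returns a ranked list of document IDs:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it returns.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["d1", "d2", "d3"]   # e.g. from a BM25 index
vector_hits = ["d2", "d4", "d1"]    # e.g. from dense retrieval
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # d2 ranks first: it appears high in both lists
```

RRF is only one fusion strategy; weighted score interpolation is another common choice.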
The ability of large language models to learn new tasks at inference time by conditioning on examples or instructions provided directly in the prompt, without any gradient updates.
The process of updating a vector or search index to reflect new or changed data.
Splitting an index across multiple shards or machines to scale retrieval.
The process of using a trained model to generate predictions or outputs on new, unseen data.
A fine-tuning method that trains language models to follow natural language instructions across diverse tasks.
A data structure mapping terms to document locations, enabling fast full-text search over large document collections.
A retrieval strategy that repeatedly refines queries and context based on intermediate results.
The practice of crafting prompts or inputs to bypass an AI system's safety and policy constraints.
Training a smaller student model to mimic a larger teacher model, transferring knowledge while dramatically reducing size and cost.
A structured network of entities and their relationships that enables machines to understand and reason about real-world concepts.
The high-level design choices for how a system retrieves and structures knowledge for use by LLMs.
Legal domain adaptation tailors an AI or search system to legal language, sources, and reasoning so outputs are more precise and defensible.
Large Language Models are AI systems trained on vast text data to understand and generate human-like text, powering modern conversational AI.
The logarithms of token probabilities produced by a language model, used for scoring and analysis of generations.
Low-Rank Adaptation - an efficient fine-tuning technique that trains small adapter matrices instead of updating all model weights.
A mathematical function that measures how far a model's predictions are from the desired outputs during training.
A field of AI where systems learn patterns from data to make predictions or decisions without explicit programming.
Restricting retrieval results based on document attributes like type, date, or jurisdiction.
An open-source vector database optimized for storing, indexing, and searching massive-scale embedding vectors—enabling similarity search for AI applications like RAG, semantic search, and recommendations.
Techniques to reduce AI model size and computational requirements while preserving performance, enabling efficient deployment.
A degradation in model performance over time because the data distribution or usage changes.
How well a model maintains performance under noise, shifts, or adversarial inputs.
A technique that runs multiple attention operations in parallel, allowing models to capture different types of relationships simultaneously.
Retrieval that chains multiple retrieval steps together to answer complex, multi-step questions.
AI technique that identifies and classifies named entities like people, places, and organizations in text for information extraction.
Finding the closest items to a query in a vector space under a given distance metric.
A retrieval pattern that explicitly searches for contradicting, missing, or disconfirming evidence.
A machine learning model composed of interconnected layers of artificial neurons that learn patterns from data.
Optical Character Recognition—technology that converts images of text (scanned documents, photos, PDFs) into machine-readable text, enabling search, editing, and AI processing of printed or handwritten content.
Retrieving small passages or chunks of text rather than whole documents for more precise answers.
A metric measuring how well a language model predicts text, with lower values indicating better prediction ability.
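Sketch, assuming per-token log-probabilities are available:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean log-probability) over a token sequence."""
    avg = sum(token_logprobs) / len(token_logprobs)
    return math.exp(-avg)

# A model that assigns probability 0.25 to every token has perplexity 4:
# it is, on average, as uncertain as a uniform choice over 4 options.
ppl = perplexity([math.log(0.25)] * 5)
print(ppl)
```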
A fully managed vector database service designed specifically for machine learning applications, providing serverless similarity search at scale.
A technique used in transformer models to inject information about token positions into otherwise order-agnostic embeddings.
The initial phase of training a large language model on massive text corpora to learn general language patterns, world knowledge, and reasoning capabilities before task-specific fine-tuning.
The input text or instruction given to a language model to guide its response generation.
An attack technique where malicious instructions are inserted into LLM inputs to override system prompts, bypass guardrails, or manipulate model behavior in unintended ways.
Removing unnecessary weights or neurons from neural networks to reduce model size and computational cost without significant accuracy loss.
Quantized LoRA - combines 4-bit quantization with LoRA adapters, enabling fine-tuning of 65B+ models on a single 48GB GPU.
Reducing model precision from 32/16-bit to 8/4-bit, dramatically decreasing memory usage and speeding up inference.
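An illustrative sketch of symmetric 8-bit quantization (no zero-point handling, and it assumes at least one non-zero value):

```python
def quantize_int8(values):
    """Symmetric 8-bit quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(v) for v in values) / 127.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the quantized integers."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
print(q, approx)  # small round-trip error, 4x less storage than float32
```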
Techniques that automatically reformulate or augment search queries to improve retrieval by adding synonyms, related terms, or rephrased versions.
The process of transforming a user query into a more effective form for retrieval.
Checking that changes to models or pipelines do not unintentionally degrade existing behavior.
A machine learning approach where agents learn optimal behavior through trial-and-error interactions with an environment.
Metrics that capture how stable, predictable, and safe an AI system is over time.
A second-stage retrieval technique that reorders initial search results to improve relevance using more sophisticated models.
The extent to which a retrieval system can surface all the information needed to answer questions in a domain.
Applying rules or metadata filters to restrict which documents can be retrieved for a query.
The time it takes for a retrieval system to return results for a query.
The part of a RAG system that finds and ranks relevant documents or chunks before generation.
Coordinating multiple retrieval steps, indices, or tools to serve a single AI task or query.
An ordered sequence of steps that process a query and documents to return ranked results in a RAG or search system.
The fraction of retrieved documents that are actually relevant to the query.
The fraction of all truly relevant documents that a retrieval system successfully returns.
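Both metrics in one sketch, using hypothetical document IDs:

```python
def retrieval_precision_recall(retrieved, relevant):
    """Precision = relevant hits / retrieved; recall = relevant hits / all relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved)
    recall = len(hits) / len(relevant)
    return precision, recall

p, r = retrieval_precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],   # what the system returned
    relevant=["d2", "d4", "d7"],          # what it should have returned
)
print(p, r)  # 2 of 4 retrieved are relevant; 2 of 3 relevant were found
```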
The computation of numeric relevance scores for documents or chunks given a query.
Retrieval-Augmented Generation (RAG) is an AI technique that combines information retrieval with text generation to produce accurate, source-grounded responses.
Reinforcement Learning from Human Feedback—a technique to fine-tune language models using human preferences as reward signals.
A mechanism where each element in a sequence computes attention weights with all other elements in the same sequence.
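A toy, unbatched sketch of the underlying scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, in plain Python (real implementations are vectorized and multi-headed):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(Q, K, V):
    """Scaled dot-product attention for lists of row vectors (toy, no batching)."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Attention weights of this query over every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Output is the weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Two tokens attending over each other (queries = keys = values here).
X = [[1.0, 0.0], [0.0, 1.0]]
out = self_attention(X, X, X)
print(out)  # each token attends mostly, but not only, to itself
```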
Grouping embeddings into clusters so that each group represents a coherent semantic concept or topic.
Search technology that understands meaning and intent rather than just matching keywords, enabling more relevant and intelligent results.
A measure of how alike two pieces of text are in meaning, regardless of the specific words used.
A language-agnostic subword tokenization library that learns a vocabulary directly from raw text.
Searching for items in a dataset that are similar to a query item under a chosen distance metric.
A chunking strategy where overlapping windows move across a document to preserve context between chunks.
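Sketch, using characters as stand-in tokens:

```python
def sliding_window_chunks(tokens, size=4, overlap=2):
    """Split a token list into overlapping chunks; the stride is size - overlap."""
    stride = size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break   # the last window already reaches the end
    return chunks

chunks = sliding_window_chunks(list("abcdefgh"), size=4, overlap=2)
print(chunks)  # each chunk shares 2 tokens with its neighbor
```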
Information retrieval using high-dimensional sparse vectors based on term frequencies, like BM25 and TF-IDF.
Evaluating how an AI system behaves under extreme or degraded conditions.
The practice of constraining LLM responses into well-defined formats such as JSON, XML, or schemas.
A machine learning approach where models learn from labeled training data to predict outputs for new inputs.
The hidden or fixed instruction block that sets overall behavior and constraints for an LLM in a given application.
A parameter controlling the randomness of language model outputs, affecting creativity versus consistency.
Term Frequency-Inverse Document Frequency - a statistical measure of word importance in a document relative to a collection.
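A minimal sketch of one common TF-IDF variant (libraries differ in smoothing and normalization; this one also assumes the term occurs somewhere in the corpus):

```python
import math

def tf_idf(term, doc, corpus):
    """TF-IDF: term frequency in the document times inverse document frequency."""
    tf = doc.count(term) / len(doc)                  # relative frequency in doc
    df = sum(1 for d in corpus if term in d)         # how many docs contain it
    idf = math.log(len(corpus) / df)                 # rarer terms weigh more
    return tf * idf

corpus = [["tax", "law", "tax"], ["tax", "rate"], ["machine", "learning"]]
score = tf_idf("tax", corpus[0], corpus)
print(score)
```

A term that appears in every document gets idf = log(1) = 0, so ubiquitous words carry no weight.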
The process of splitting text into smaller units (tokens) that language models can process and understand.
The design pattern where LLMs decide when and how to call external tools to complete tasks.
A sampling method that restricts token selection to the k most probable next tokens at each generation step.
A sampling method that selects from the smallest set of tokens whose cumulative probability exceeds a threshold p.
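Top-k and top-p filtering, together with the temperature parameter, can be sketched in one toy sampler (illustrative only, not any library's API):

```python
import math, random

def sample_token(logits, temperature=1.0, top_k=None, top_p=None, seed=None):
    """Sample a token index with temperature, top-k, and top-p filtering (sketch)."""
    # Temperature-scaled softmax over the logits.
    probs = [math.exp(l / temperature) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Candidate indices sorted by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]          # keep only the k most probable tokens
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in ranked:                 # smallest set whose mass exceeds p
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        ranked = kept
    rng = random.Random(seed)
    mass = sum(probs[i] for i in ranked)
    return rng.choices(ranked, weights=[probs[i] / mass for i in ranked])[0]

logits = [2.0, 1.0, 0.1, -1.0]
print(sample_token(logits, top_k=1))  # top_k=1 reduces to greedy decoding: index 0
```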
A neural network architecture using self-attention to process sequential data in parallel, powering modern LLMs.
Quantifying how uncertain a model is about its predictions or answers.
A machine learning approach where models discover patterns and structure in data without labeled examples.
A specialized database optimized for storing and searching high-dimensional vector embeddings using similarity metrics.
Embeddings represented as numerical vectors in a high-dimensional space, used for similarity and retrieval.
The process of organizing embeddings in a data structure that supports fast similarity search.
Rescaling vectors to have a standard length, often unit norm, before computing similarity.
Approximating vectors using a small set of codebook entries to reduce storage and speed up search.
An open-source vector database that combines vector search with structured data filtering and built-in machine learning modules—enabling semantic search, RAG, and AI-native applications.
A machine learning capability where models perform tasks without any task-specific examples, relying solely on pre-trained knowledge and natural language instructions.