Key terms in Belgian tax and AI explained
An authority ranking model orders retrieved sources by legal authority and trustworthiness so that controlling, reliable sources appear first.
Boolean search is a query style that combines terms with operators like AND, OR, and NOT to control inclusion and exclusion with high precision.
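As a rough illustration of how AND, OR, and NOT narrow a result set, the sketch below checks each document against three term lists. The function name `boolean_match`, its parameters, and the sample documents are hypothetical and chosen only for this example; real search engines parse a query string and evaluate it against an index rather than scanning text.

```python
def boolean_match(text, must=(), any_of=(), must_not=()):
    """Return True if text satisfies the AND / OR / NOT constraints."""
    words = set(text.lower().split())
    has_all = all(term in words for term in must)                   # AND
    has_any = not any_of or any(term in words for term in any_of)   # OR
    has_none = not any(term in words for term in must_not)          # NOT
    return has_all and has_any and has_none


# Hypothetical sample documents.
docs = [
    "vat exemption for exports",
    "vat rate for restaurants",
    "corporate income tax rate",
]

# Query: vat AND (rate OR exemption) NOT restaurants
hits = [
    d for d in docs
    if boolean_match(d, must=["vat"], any_of=["rate", "exemption"],
                     must_not=["restaurants"])
]
print(hits)
```

Only the first document satisfies all three constraints: the second is excluded by NOT, and the third lacks the required term.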
Assigning confidence estimates to predictions, answers, or retrieval results.
Content discoverability is how easily content can be found, accessed, and indexed by search systems (internal search or search engines).
Creating searchable indices over content so that it can be retrieved efficiently.
A body of text or documents used for training, evaluation, or retrieval.
A series of steps that move and transform data from source systems into usable form for AI and analytics.
Cleaning, transforming, and enriching raw data before training or indexing.
Identifying and removing duplicate or near-duplicate documents or records in a corpus.
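One common way to catch near-duplicates is to compare sets of word shingles with Jaccard similarity. The sketch below assumes that approach; the function names, the shingle size of 3, and the 0.8 threshold are illustrative choices, not values taken from this glossary. It compares every document against every kept document, which is fine for small corpora but would need hashing techniques at scale.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs, threshold=0.8):
    """Keep the first document of each group of near-duplicates."""
    kept, kept_shingles = [], []
    for doc in docs:
        s = shingles(doc)
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept


# Hypothetical sample corpus: the second entry differs by one trailing word.
corpus = [
    "the standard vat rate in belgium is 21 percent for most goods and services",
    "the standard vat rate in belgium is 21 percent for most goods and services today",
    "corporate income tax returns are filed annually",
]
unique_docs = deduplicate(corpus)
```

Here the second document shares 12 of 13 shingles with the first, so it falls above the threshold and is dropped.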
A smaller segment of a larger document used as the atomic unit for indexing and retrieval.
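A minimal sketch of one way to produce such chunks, assuming fixed-size word windows with a small overlap so that text cut at a boundary still appears whole in a neighbouring chunk. The name `chunk_words` and the default sizes are hypothetical; production systems often split on sentences, headings, or legal article boundaries instead.

```python
def chunk_words(text, size=50, overlap=10):
    """Split text into chunks of `size` words, each sharing `overlap`
    words with the previous chunk."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


example_chunks = chunk_words("a b c d e f g h i j", size=4, overlap=1)
print(example_chunks)
```

With ten words, a window of four, and an overlap of one, this yields three chunks whose boundaries share one word each.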
The process of collecting, importing, and registering documents into a knowledge or search system.
Standardising document structure, encoding, and fields so that different sources can be processed consistently.
Extracting structure and text from raw document formats such as PDF, Word, or HTML.
Automatically identifying and labeling entities such as people, organisations, or legal concepts in text.
Full-text search retrieves documents by matching query terms against indexed text (often via an inverted index), then ranks the best matches.
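To make the inverted index concrete, the sketch below maps each term to the set of documents containing it and ranks hits by how many distinct query terms they match. This scoring is a deliberate simplification for illustration; real engines typically use weighting schemes such as TF-IDF or BM25. The document IDs and texts are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: term -> set of document IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return document IDs ordered by number of distinct query terms matched,
    with ties broken alphabetically by ID."""
    scores = Counter()
    for term in set(query.lower().split()):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ordered]


# Hypothetical sample documents.
sample_docs = {
    "d1": "vat rate for restaurants",
    "d2": "vat exemption for exports",
    "d3": "income tax rate",
}
sample_index = build_index(sample_docs)
results = search(sample_index, "vat rate")
print(results)
```

The lookup touches only the postings for the query terms, which is why an inverted index stays fast as the collection grows.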
An indexing strategy is the plan for what content gets indexed, how it is structured, and how it is kept up to date for reliable search.
A system that stores, indexes, and retrieves information in response to user queries.
A structured repository of information designed for querying, retrieval, and reuse.
Legal dependency mapping builds a graph of citations and relationships between legal sources so retrieval and analysis can follow what depends on what.
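One way to picture this is a directed graph in which an edge runs from a citing source to the source it cites. The sketch below, under that assumption, walks the graph in reverse to find everything that directly or indirectly depends on a changed source. The source names and both function names are hypothetical.

```python
from collections import defaultdict, deque


def build_dependents(citations):
    """Turn (citing, cited) pairs into a map: cited -> set of citing sources."""
    dependents = defaultdict(set)
    for citing, cited in citations:
        dependents[cited].add(citing)
    return dependents


def affected_by(dependents, changed):
    """Return every source that depends on `changed`, directly or transitively."""
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, ()):
            if dep not in seen and dep != changed:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical citation pairs: (citing source, cited source).
citations = [
    ("royal_decree", "statute"),
    ("circular", "royal_decree"),
    ("ruling", "circular"),
    ("other_circular", "other_statute"),
]
dependency_map = build_dependents(citations)
impacted = affected_by(dependency_map, "statute")
```

If the statute is amended, the traversal flags the decree, the circular, and the ruling for review, while the unrelated circular is left alone.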
Search that matches queries to documents based on exact or stemmed keyword overlap.
Adding or improving metadata on content to make it easier to search, filter, and govern.
Multi-jurisdictional indexing structures an index across countries or regions so retrieval respects jurisdiction, language, and legal applicability.
A formal representation of concepts and relationships in a domain, often used to structure knowledge systems.
Query intent is the underlying goal behind a search query (what the user is trying to accomplish), used to tailor ranking and results.
Query understanding is how a search system interprets a query’s meaning (entities, intent, ambiguity) before retrieving and ranking results.
Computing how relevant each candidate result is to a query, often combining multiple signals.
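A simple way to combine several signals is a weighted sum. The sketch below assumes three signals already normalised to the range 0 to 1; the signal names, the weights, and the candidate values are all invented for illustration and would in practice come from evaluation and tuning.

```python
# Hypothetical weights; in practice these are tuned against evaluation data.
WEIGHTS = {"text_match": 0.6, "authority": 0.3, "freshness": 0.1}


def relevance_score(signals, weights=WEIGHTS):
    """Weighted sum of signals; a missing signal counts as 0."""
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)


# Hypothetical candidates with signals normalised to [0, 1].
candidates = {
    "blog_post": {"text_match": 0.9, "authority": 0.2, "freshness": 0.8},
    "statute": {"text_match": 0.7, "authority": 1.0, "freshness": 0.6},
}
ranked = sorted(candidates,
                key=lambda name: relevance_score(candidates[name]),
                reverse=True)
print(ranked)
```

The statute outranks the blog post despite a weaker text match, because its authority signal carries enough weight to close the gap.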
Relevance tuning is the systematic process of improving search ranking by adjusting signals, weights, and rules based on evidence and evaluation.
Retrieval coverage analysis checks whether your index and retrieval pipeline can find the sources needed for a defined set of questions and topics.
Machine-readable annotations that describe the structure and meaning of content.
Search analytics is the measurement of how search is used and how well it performs (queries, clicks, zero-results, satisfaction) to drive improvements.
Semantic expansion broadens a query with related terms or meanings (synonyms, entities, embeddings) to improve recall without drifting from the original intent.
Ordering results based on semantic relevance rather than only keyword overlap.
Source conflict resolution is how a search or RAG system detects and handles contradictory sources, prioritising controlling authority and making uncertainty explicit.
Source freshness tracking records how current each source is (version, last update, in-force date) so retrieval stays aligned with changing law.
An ordered view of information sources by authority or precedence.
Source reliability weighting assigns higher influence to more trustworthy sources so retrieval and answers prioritise official, higher-quality material.
Data organised into explicit fields and types, such as tables or well-defined JSON.
A hierarchical classification of content used to organise navigation, search, and discovery.
Indexing content in a way that captures time, effective dates, or validity periods.
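As a sketch of what this enables, the example below stores a validity period with each index entry and filters by a reference date, so a query about a past tax year retrieves the version in force at that time. The field names `valid_from` and `valid_to`, the entry IDs, and the dates are hypothetical; an open-ended period is represented here as `None`.

```python
from datetime import date


def in_force(entry, on):
    """True if `on` falls within the entry's validity period (inclusive)."""
    start, end = entry["valid_from"], entry["valid_to"]
    return start <= on and (end is None or on <= end)


def versions_in_force(entries, on):
    """Return the IDs of all entries valid on the given date."""
    return [e["id"] for e in entries if in_force(e, on)]


# Hypothetical index entries for two versions of the same rule.
entries = [
    {"id": "rule-v1", "valid_from": date(2020, 1, 1), "valid_to": date(2022, 12, 31)},
    {"id": "rule-v2", "valid_from": date(2023, 1, 1), "valid_to": None},
]

print(versions_in_force(entries, date(2021, 6, 1)))
print(versions_in_force(entries, date(2024, 1, 1)))
```

A date in 2021 selects only the first version, and a date in 2024 selects only the second.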
Tracking and managing different versions of documents and knowledge artefacts over time.