AI Technology

AI Hallucinations: Why Grounding AI in Your Own Documents Changes Everything

26 March 2025 · 7 min read

If you have spent time with a large language model — the technology behind ChatGPT and its competitors — you will probably have encountered a hallucination. The AI produces a confident, well-written answer that is factually wrong. A case that does not exist. A clause that was never in the contract. A statistic from a source that cannot be found.

For casual use, a hallucination is an inconvenience. For a legal team, an architecture firm, or a public sector body operating under professional obligations, it is potentially a serious liability.

What hallucinations are and why they happen

A large language model is trained on vast amounts of text — books, articles, websites, code, and more. During training, it learns statistical patterns: given this sequence of words, what words are likely to follow? This is a remarkable and powerful capability. It is also the source of hallucinations.

The model does not have a factual database that it looks things up in. It generates responses based on patterns learned during training. When asked about something that falls outside its training data, or where the training data was inconsistent or wrong, it generates a plausible-sounding response anyway — because generating plausible text is what it is optimised to do.

The model does not know when it is hallucinating. Accurate and inaccurate responses are produced by exactly the same mechanism, which is why confidence and accuracy are not the same thing.

Why this matters for professional organisations

In a professional context, the hallucination problem is particularly acute for three reasons.

First, the stakes of errors are higher. A legal team that relies on AI to find precedent and gets a plausible-sounding but non-existent case could produce advice that is wrong in a way that is hard to detect until it matters. An architect who trusts an AI summary of planning conditions that does not accurately reflect the actual conditions faces real professional risk.

Second, professional documents contain specific, technical, and often unique content that is unlikely to feature prominently in public AI training data. The AI simply has no reliable knowledge base to draw on for your planning decision notices, your client contracts, or your internal procedures.

Third, professionals are trained to be accurate and to cite their sources. AI responses presented without sources, or with sources that cannot be verified, are incompatible with how professional knowledge work is supposed to be done.

How grounding the AI in your own documents changes this

There is a well-established technique — used by the better AI knowledge platforms — for addressing the hallucination problem in specific use cases. It is sometimes called retrieval-augmented generation, or RAG. We would describe it more simply as grounding the AI in your own knowledge.

Here is how it works. Instead of asking the AI to generate an answer from its training data, you first search your organisation's own documents for the most relevant passages relating to the question. You then provide those passages directly to the AI as context, and ask it to answer the question based on those passages alone.

The result is an AI system that is not drawing on statistical patterns from the internet — it is summarising and synthesising content that actually exists in your documents, right now, for this specific question. The answer is grounded.
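The retrieve-then-answer loop described above can be sketched in a few lines of Python. Everything here is a simplified illustration: the word-overlap scoring stands in for the vector-embedding search a real platform would use, and the document names and prompt wording are invented for the example.

```python
# Sketch of retrieval-augmented generation (RAG): find the most
# relevant passages, then ask the model to answer from them alone.
# Scoring, document names, and prompt wording are illustrative only.

def score(query: str, passage: str) -> int:
    """Crude relevance score: count of shared words. Production
    systems use embeddings and semantic similarity instead."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def build_grounded_prompt(query: str, documents: dict, top_k: int = 2) -> str:
    """Retrieve the top passages and wrap them in a prompt that
    constrains the model to those passages."""
    ranked = sorted(documents.items(),
                    key=lambda kv: score(query, kv[1]),
                    reverse=True)
    context = "\n\n".join(f"[{name}]\n{text}" for name, text in ranked[:top_k])
    return (
        "Answer the question using ONLY the passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {query}"
    )

docs = {
    "Decision Notice": "Condition 4: drainage details must be approved before work begins.",
    "Meeting Minutes": "The committee approved the scheme subject to conditions.",
}
prompt = build_grounded_prompt("What were the drainage conditions?", docs)
```

The key design point is in the prompt itself: the model is told to answer from the supplied passages only, which is what turns a free-associating generator into a grounded summariser.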

Why cited answers matter

The grounding approach also makes it natural to include citations. Because the system knows which passages it drew on to construct the answer, it can tell you: "This response is based on paragraphs 3 and 4 of the Riverside Planning Decision Notice, dated October 2021."

This changes the professional calculus entirely. Instead of asking "can I trust this AI response?", the question becomes "can I verify this in the source document?" — which is a question professionals know how to answer. The AI becomes a research assistant, not an oracle.

If the AI cannot find a reliable answer in the documents, a well-designed system says so. It does not fabricate. This is an important design choice. The system should be calibrated to say "I cannot find information about this in your current knowledge base" rather than to produce a plausible but unreliable answer.
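That "say so rather than fabricate" behaviour can also be enforced outside the model. A minimal sketch, assuming a hypothetical relevance threshold and message wording: if no retrieved passage is relevant enough, the system returns a fixed refusal and never asks the model to generate at all; otherwise it records which sources the answer is grounded in.

```python
# Sketch of the refusal-by-design choice: refuse when retrieval finds
# nothing relevant, and always carry the citation trail. The threshold
# and message wording are illustrative assumptions.

NO_ANSWER = "I cannot find information about this in your current knowledge base."

def answer_or_refuse(query: str, passages: list, threshold: int = 1) -> str:
    """passages: list of (source_name, text, relevance_score) tuples,
    e.g. as produced by a retrieval step."""
    relevant = [(src, txt) for src, txt, s in passages if s >= threshold]
    if not relevant:
        # Nothing relevant retrieved: refuse rather than generate.
        return NO_ANSWER
    sources = ", ".join(src for src, _ in relevant)
    # A real system would have the model synthesise an answer from the
    # relevant passages here; we return the citation trail to show
    # where that answer would be grounded.
    return f"Answer grounded in: {sources}"
```

Because the refusal happens before generation, the model is never given the opportunity to invent an answer for questions the knowledge base cannot support.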

What this means in practice

For an architecture firm, it means asking "what were the drainage conditions on the Morrison Street scheme?" and getting an answer drawn directly from the relevant project reports — with the source identified. The answer is either right (because it comes from the document) or the system says it cannot find the information.

For a legal team, it means searching across a client's contract portfolio for how a specific type of liability clause has been drafted in previous agreements — and being able to verify every example by going back to the source contract.

For a public sector organisation, it means being able to answer questions about policy and procedure with confidence that the answer reflects what is actually written in the relevant documents — and being able to show an audit trail if needed.

The hallucination problem is not solved — it is bypassed

It is worth being precise about what this approach does and does not do. It does not fix the underlying tendency of language models to hallucinate when they have no reliable context to work from. What it does is constrain the system so that it is working only with context you have provided — your documents — rather than drawing on its general training data.

If your documents are accurate, the answers will be accurate. If a particular question cannot be answered from your documents, the system should tell you that rather than inventing an answer. The hallucination risk does not disappear, but it is substantially contained by the architecture of the system.

This is why the design of an AI knowledge tool matters as much as the AI technology inside it. The same underlying language model, deployed without grounding, will hallucinate. Deployed with a well-designed retrieval and citation architecture, it becomes genuinely reliable for professional use.

AI that cites its sources

Mnemo grounds every answer in your own documents and tells you exactly where it found the information.

Start your 14-day free trial