Data Privacy

The Hidden Risk of Using Public AI Tools with Sensitive Documents

19 March 2025 · 8 min read

The first time someone in your organisation uploads a client contract into ChatGPT to get a quick summary, it might feel like a minor convenience. It probably also represents a compliance breach that no one noticed.

This is happening in professional organisations every day. Not through malice or carelessness — people are trying to do their jobs more effectively, and these tools genuinely help. But the data protection implications of using public AI tools with sensitive professional documents are real, and most organisations have not fully worked through them.

What happens when you upload a document to a public AI tool

When you paste text or upload a file into a consumer AI tool, that content is transmitted to servers operated by the company that provides the tool. Depending on the terms of service — which few people read carefully — that content may be:

  • Stored on the provider's servers for an indefinite period
  • Reviewed by human reviewers as part of safety or quality processes
  • Used to train or improve future versions of the AI model
  • Subject to the data retention and access policies of the provider's jurisdiction
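In mechanical terms, an "upload" is just an HTTPS request that carries the full text of your document to the provider's servers. The sketch below shows the shape of that request, assuming a hypothetical endpoint, payload, and API key (real providers differ in detail, not in substance):

```python
import requests

# Hypothetical external endpoint: real providers differ, but the shape is
# the same. The full document text leaves your network as the body of an
# HTTPS POST to infrastructure you do not operate.
PROVIDER_URL = "https://api.example-ai-provider.com/v1/chat"
API_KEY = "sk-..."  # credential issued by the third-party provider

with open("client_contract.txt", encoding="utf-8") as f:
    document_text = f.read()

response = requests.post(
    PROVIDER_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "assistant-model",
        "messages": [
            {"role": "user",
             "content": f"Summarise this contract:\n\n{document_text}"}
        ],
    },
    timeout=30,
)
print(response.json())
```

From the moment that request leaves your network, retention, review, and reuse of the document are governed by the provider's terms rather than by anything you control.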

Enterprise versions of these tools typically offer stronger protections — promises that data will not be used for training, for instance. But these promises are contractual, not architectural. They depend on the provider honouring them, not on your data never reaching their systems.

The GDPR question

For UK and EU organisations, the GDPR implications are the most immediate concern. The regulation requires that personal data is processed only on an appropriate legal basis, that data subjects are informed of how their data is used, and that personal data is not transferred to countries outside the UK/EU without adequate safeguards.

A document containing a client's name, address, financial details, or health information is personal data under GDPR. Uploading it to a third-party AI tool constitutes processing of that data by a third-party processor. This requires, at a minimum, a Data Processing Agreement with the provider — and most users of consumer AI tools have not entered into one.

The ICO — the UK's data protection regulator — has been clear that organisations are responsible for how their employees handle personal data, including through third-party tools. "My staff are doing it on their own initiative" is not a defence.

Beyond personal data

GDPR is the most clearly defined legal exposure, but it is not the only one. Many professional organisations are also bound by:

Client confidentiality obligations. Legal professionals have strict duties of confidentiality to clients. Sharing client documents with a third-party AI tool — even with good intentions — could breach those obligations. The same applies in architecture, where client briefs and commercial terms are often confidential.

Data residency requirements. Public sector organisations and those in regulated industries often have data residency obligations — requirements that certain data be stored or processed only within specific jurisdictions. Most public AI tools process data in the United States or across multiple global regions.

Commercial sensitivity. Beyond formal legal obligations, there is the simple question of competitive risk. Uploading strategy documents, financial projections, or tender responses to an external AI tool means that information exists, in some form, on someone else's infrastructure.

Why this is hard to manage by policy alone

The obvious response is to issue a policy: "Staff must not upload client data to external AI tools." This is the right thing to do. It is also likely to be only partially effective, for a simple reason: the tools are genuinely useful, and the temptation to use them will not go away.

The more durable solution is to provide an alternative that is as easy to use as the public tools — but that keeps data inside the organisation's own environment. If staff can ask questions about documents and get useful answers without the data leaving their organisation's infrastructure, the incentive to use external tools diminishes considerably.

What a compliant alternative looks like

A private AI knowledge tool — one that runs within your organisation's own deployment environment, or in a sovereign cloud region under your control — provides the same capabilities as public AI tools without the data exposure.

Key requirements for a compliant solution include:

  • The AI processing happens inside your environment, not at an external provider
  • Your documents are never transmitted to external servers for AI inference
  • There is a clear contractual framework with a UK/EU-based processor
  • Audit logs make it possible to demonstrate compliance in the event of an inquiry
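On the last point, an audit log does not need to be elaborate to support a regulator's inquiry; a structured record per query is usually enough. A minimal sketch of one entry, with entirely illustrative field names:

```python
import json
from datetime import datetime, timezone

# Hypothetical audit record: the field names are illustrative, not a
# standard. What matters is that each query is traceable to a user, a
# document set, and a processing location.
audit_entry = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "user": "j.smith",
    "action": "document_query",
    "document_ids": ["contract-2024-117"],
    "query_hash": "sha256:9f2c...",  # store a hash, not the raw query text
    "processing_location": "eu-west-2",
}
print(json.dumps(audit_entry, indent=2))
```

Hashing the query text, as here, is one way to keep the log itself from accumulating personal data.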

This is achievable today. The technology that makes public AI tools powerful — large language models, vector search, retrieval-augmented generation — can be deployed in ways that keep every document, every query, and every response within a boundary you control. The question is not whether a compliant alternative exists. It is whether your organisation has found and implemented one.
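To make that concrete, here is a self-contained sketch of the retrieval step of that pattern running entirely in-process. The bag-of-words "embedding" is a deliberately crude stand-in for a locally hosted embedding model, and the final generation step is left as a comment; everything else runs as-is, and nothing in it touches the network:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'. A real deployment would use a locally
    hosted embedding model; either way, no text leaves this process."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# In-memory vector store: every document stays inside this process.
documents = [
    "The lease for 14 Queen Street runs until March 2027.",
    "Invoice terms for Acme Ltd are 30 days from date of issue.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

query = "When does the Queen Street lease end?"
context = retrieve(query)
# A real system would now pass `context` and `query` to an LLM served
# inside the same boundary (e.g. an on-premise inference server).
# At no point has any document text left the environment.
print(context)
```

A production system would swap in a real embedding model and an LLM hosted inside the same boundary, but the architectural point survives the substitution: the documents, the index, the query, and the answer never leave infrastructure you control.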

The practical first step

For most organisations, the starting point is an honest audit of how AI tools are currently being used. What tools are staff using? What types of document are being processed? What data protection agreements, if any, are in place with AI providers?

The answers to those questions tend to clarify priorities quickly. And for organisations where sensitive, confidential, or regulated documents are part of everyday work — which describes most professional firms — the case for a private alternative becomes straightforward.

Keep your documents inside your own environment

Mnemo gives you the benefits of AI knowledge search without sending your documents anywhere external.

Start your 14-day free trial