Why do AI models make things up?

Large language models predict the next likely word based on training patterns, not on a database of facts. When a model does not know an answer, it does not say "I do not know"; it produces text that looks like an answer. The model is not lying in the human sense; it is doing pattern completion. That is why hallucinations sound so confident.
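The pattern-completion point can be shown with a toy sketch. This is not how a real LLM is built (real models use neural networks over tokens, not word-pair counts), and the corpus, `train_bigrams` and `complete` are made up for illustration. It only shows the behaviour described above: the program always emits the most likely next word and has no way to say "I do not know".

```python
from collections import Counter, defaultdict

# Tiny made-up corpus: purely illustrative, not real training data.
CORPUS = (
    "the capital of france is paris . "
    "the capital of italy is rome . "
    "the capital of spain is madrid . "
    "the capital of france is paris ."
)

def train_bigrams(text):
    """Count which word follows each word in the text."""
    follows = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def complete(follows, prompt, max_words=3):
    """Greedily append the most frequent next word; there is no abstain option."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # the last word was never seen with a follower
        next_word = candidates.most_common(1)[0][0]
        words.append(next_word)
        if next_word == ".":
            break
    return " ".join(words)

model = train_bigrams(CORPUS)
# "atlantis" never appears in the corpus, yet the completion reads like an answer.
print(complete(model, "the capital of atlantis is"))
# prints: the capital of atlantis is paris .
```

The toy model has never seen "atlantis", but it has seen what usually follows "is", so it answers anyway. Nothing in the loop checks whether the sentence is true.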

How often does AI hallucinate?

Studies in 2024-25 reported hallucination rates anywhere between 3% and 27%, depending on the task, the model and how strictly a hallucination was defined. For factual research tasks involving specific names, dates or citations, the rate sits at the higher end. Always verify.

Are the big-name models better now in 2026?

They have improved considerably, especially on common questions, but hallucination is not solved. Newer models hallucinate less often but more confidently. Some 2025 models are worse at saying "I do not know" than 2023 models were. Always verify.

Spotting AI bullshit (hallucinations, fake citations, fabricated cases)

AI hallucination is not a bug that will be fixed next quarter. It is a feature of how large language models work. The interesting question is not whether AI gets things wrong, but how to recognise when it has and what to do about it.

This guide is the bullshit detector you should run on AI output before you submit it, file it, cite it or quote it.

The categories of AI bullshit

Not all hallucinations are the same. There are six main shapes:

Fake citations. AI invents a paper, book, judgment or report that sounds entirely plausible. Authors, titles, dates, journal names, volume numbers, page references and even DOIs are confidently generated and confidently wrong. This is the single most dangerous category for students, lawyers and journalists.
Real source, wrong content. AI cites a real paper but invents what that paper says, or attributes quotes to the wrong author. Harder to catch than a fake citation because the source exists, so a quick existence check passes.
Plausible but wrong numbers. AI generates a percentage, dollar figure or date that fits the shape of the answer but is wrong by 5%, 20% or an order of magnitude. Particularly common with anything after the model's training cutoff.
Composite people. AI invents a person with a real-sounding name, a real-sounding title and a real-sounding employer who does not exist. Especially common when you ask for "an expert in X who said something about Y".
Outdated fact stated as current. AI's training data has a cutoff. It will happily tell you the 2023 prime minister, the 2022 ATO rate or the 2024 product as if they were today's. With agentic browsing on, this is less common, but check that the fact is still current anyway.
Confident wrong reasoning. AI walks through a clean-looking step-by-step argument that contains a wrong premise or a logical jump. Particularly common on legal, scientific, statistical and mathematical questions.

The 60-second verification workflow

For anything you would actually rely on:

Run the claim through a search engine. If the AI's claim is "the ATO raised the tax-free threshold to $24,000 in 2026", search for "ATO tax-free threshold 2026" and see if a gov.au URL confirms it. If you cannot find independent confirmation, the claim is suspect.
For citations: open the source. Click the link or search the title. If AI gives you "Smith v Commonwealth (2021) 273 CLR 482", that case should appear in AustLII. If it does not, treat the citation as fake.
For numbers: find a second source. Two independent sources agreeing on a figure is the bar. One AI claim is not a source.
For names and titles: check LinkedIn or the organisation's own site. A real expert is publicly findable. A composite is not.
For anything in the last six months: assume the AI does not know it. Ask AI to label what is uncertain or outdated.

If a claim fails verification, do not "fix" it by re-prompting AI. Find a real source or remove the claim.
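The keep-or-remove rule above can be written down as a small sketch. Everything here is an assumption made for illustration: the `Claim` record, the `decide` helper, and the use of distinct hostnames as a stand-in for "independent sources". Two sites can repeat the same press release, so the hostname check is a crude proxy and you still need to open and read each source yourself.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

@dataclass
class Claim:
    text: str
    # URLs you opened yourself and that actually confirm the claim.
    confirming_urls: list = field(default_factory=list)

def distinct_hosts(urls):
    """Return the set of distinct hostnames, ignoring a leading 'www.'."""
    hosts = set()
    for url in urls:
        host = urlparse(url).hostname  # None if the URL has no scheme/host
        if host:
            host = host.lower()
            if host.startswith("www."):
                host = host[4:]
            hosts.add(host)
    return hosts

def decide(claim, required_sources=2):
    """Keep a claim only if enough distinct hosts confirm it.

    The default of 2 mirrors the bar for figures; the AI's own
    output never counts as a source.
    """
    if len(distinct_hosts(claim.confirming_urls)) >= required_sources:
        return "keep"
    return "find a real source or remove"

# Placeholder URLs on reserved example domains, not real references.
same_site = Claim("a figure", ["https://www.example.org/report",
                               "https://example.org/summary"])
two_sites = Claim("a figure", ["https://example.org/report",
                               "https://example.com/data"])
print(decide(same_site))
print(decide(two_sites))
```

Note that there is no branch that re-prompts the AI. A claim either has real sources behind it or it comes out.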

Where students and junior staff get caught

Real examples we have seen in 2024-26:

A Year 12 modern history student cited a German historian by name who did not exist. The teacher Googled it.
A first-year law student submitted a memorandum citing two High Court of Australia cases that did not exist. The lecturer reported it to the academic conduct committee.
A journalism cadet ran a story quoting a fictitious CSIRO researcher. The Drum picked it up. The cadet was let go.
A junior consultant generated a market-sizing slide where the AI had multiplied two real numbers wrong. The partner caught it in front of the client.

In every case the person submitting the work had not checked.

Red flags in AI output that should slow you down

Specific names of people you have never heard of, paired with very precise quotes.
Citations that include volume numbers and page numbers, especially for cases or papers.
Statistics with two decimal places ("23.47% of Australians...") and no source.
Dates more specific than the rest of the answer ("On 14 March 2024 the ATO announced...") without a link.
Confident answers to questions that should be controversial or uncertain.
Step-by-step reasoning that uses the word "clearly" or "obviously".
Any answer where you asked for "an example" and got something that fits too neatly.

If you see two or more of these in one paragraph, treat the whole answer as suspect.
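A few of these flags are mechanical enough to scan for. The sketch below is a rough first pass, assuming simple regular expressions are good enough: it covers only four of the seven flags (the rest need human judgement), it ignores the "no source" and "without a link" qualifiers, and the pattern names and the `red_flags` and `is_suspect` helpers are invented for this example. The sample paragraph is stitched together from the illustrative fragments above and is not a real claim.

```python
import re

# Hypothetical patterns for the flags that can be matched mechanically.
PATTERNS = {
    "two-decimal statistic": re.compile(r"\b\d+\.\d{2}%"),
    "exact date": re.compile(
        r"\b\d{1,2} (?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December) \d{4}\b"
    ),
    "volume-and-page citation": re.compile(r"\(\d{4}\) \d+ [A-Z][A-Za-z]* \d+"),
    "certainty word": re.compile(r"\b(?:clearly|obviously)\b", re.IGNORECASE),
}

def red_flags(paragraph):
    """Return the names of the patterns that appear in the paragraph."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(paragraph)]

def is_suspect(paragraph, threshold=2):
    """Apply the rule of thumb: two or more flags in one paragraph."""
    return len(red_flags(paragraph)) >= threshold

sample = ("On 14 March 2024 the ATO announced that 23.47% of Australians "
          "clearly qualify.")
print(red_flags(sample))
# prints: ['two-decimal statistic', 'exact date', 'certainty word']
print(is_suspect(sample))
# prints: True
```

A clean scan proves nothing. The script can only tell you where to slow down, not that an answer is safe.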

What the model is and is not built to do

A useful mental model from CSIRO's Responsible AI work and the Australian Government's safe-and-responsible-AI consultation: current LLMs are pattern-completion engines with strong fluency, moderate-to-good general-knowledge recall, and weak-to-moderate factual grounding for anything specific. They are not search engines. They are not databases. They are not licensed professionals. They are confident-sounding text generators.

Accuracy increases when:

The model has browsing or retrieval (RAG) wired in and shows you the source.
The question is in a well-trodden domain (basic maths, common code patterns, well-known historical events).
You explicitly ask the model to mark uncertainty ("for each claim, rate your confidence").

It decreases when:

The question is about recent events.
The question asks for specific names, citations or numbers.
The model is "freestyling" without a retrieved source.
You phrase the question as if you already know the answer (the model will tend to confirm).

When to use AI and when not to

Use AI as a starting point for exploration. Do not use AI as the final source for anything that would matter if it were wrong. A 2024 Law Council of Australia guidance note puts it cleanly: "Generative AI output is a draft. It is not a finding."

The same principle applies to school assignments, university essays, journalism pieces, legal filings, financial advice, medical questions and anything you would put your name to.