publishing.co.uk · methodology

How the AI Book Discoverability Index is built

One page, plain English — everything a journalist, author or sceptic needs to evaluate the index and the AI visibility audits behind it. Live state: 91,190 citations · 916 books · 16 genres · last updated 10 Jun 2026.

The engines

The index is built from the four citation-transparent engines (ChatGPT, Claude, Gemini and Perplexity), the ones that return checkable source links with their answers. The paid audit product additionally probes a fifth surface, a clearly-labelled Amazon Rufus simulation (Rufus has no public API), making five engines in total: ChatGPT, Claude, Perplexity, Gemini and Amazon Rufus. Simulated-assistant runs never feed the index; they are excluded at the data layer. The free check runs 2 of the 5 engines.

The questions

Every audited book is probed with ~25 reader-style questions per engine — “best [genre] books”, “books like [comp title]”, use-case and gift prompts, and direct recall (“tell me about [title]”). These are the questions readers actually ask, not SEO keywords.
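
As an illustration only, the question families above can be pictured as templates filled in per book and crossed with the engines. This is a minimal sketch, not the production prompt set: the function names, the example book and the exact wording of each template are assumptions, and only three of the roughly 25 questions are shown.

```python
from itertools import product

# The four citation-transparent engines named on this page.
INDEX_ENGINES = ("ChatGPT", "Claude", "Gemini", "Perplexity")


def build_questions(title: str, genre: str, comp_title: str) -> list[str]:
    """Fill the reader-style templates for one book (hypothetical helper)."""
    return [
        f"best {genre} books",        # genre recommendation
        f"books like {comp_title}",   # comparison-title prompt
        f"tell me about {title}",     # direct recall
    ]


def probe_plan(title: str, genre: str, comp_title: str) -> list[tuple[str, str]]:
    """Every (engine, question) pair to run for one book."""
    questions = build_questions(title, genre, comp_title)
    return list(product(INDEX_ENGINES, questions))


# Hypothetical book, used purely as an example.
plan = probe_plan("The Example Title", "cosy crime", "A Comparable Novel")
```

Each pair in the plan is one probe, which is the unit the score and the citation log are built from.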

Citation tracing — live citations, not model memory

When an engine answers with web grounding, it cites sources. We log every cited URL — domain, deep link, the book it was cited for — into a permanent store. Every number on the index traces to those logged citations; nothing is estimated from what a model “remembers”. Search-infrastructure redirect hosts are excluded as non-sources.
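
To make the logging step concrete, here is a minimal sketch of turning one cited URL into a stored record. The record fields follow the description above (domain, deep link, book), but the `Citation` class, the `to_citation` helper, the stripping of a leading "www." and the placeholder redirect host are all assumptions for illustration, not the actual schema or exclusion list.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

# Placeholder only: the real list of search-infrastructure redirect hosts
# is not published on this page.
REDIRECT_HOSTS = frozenset({"redirect.example"})


@dataclass(frozen=True)
class Citation:
    domain: str    # normalised host, e.g. "example.org"
    url: str       # the deep link exactly as the engine cited it
    book_id: str   # the audited book the citation was returned for
    engine: str    # which engine produced the answer


def to_citation(url: str, book_id: str, engine: str) -> Optional[Citation]:
    """Return a record for a cited URL, or None if it is not a real source."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # assumption: treat www.example.org and example.org as one domain
    if not host or host in REDIRECT_HOSTS:
        return None      # unparseable link or redirect host: excluded as a non-source
    return Citation(domain=host, url=url, book_id=book_id, engine=engine)
```

Because every figure is derived from records like these, a number on the index can be traced back to the specific links that produced it.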

The AI Shelf Score

A book’s score (0–100) measures how reliably that title comes back by name across every engine and question. A genre’s score is the average across its audited books. Scores sample a non-deterministic system, so re-audited titles are day-averaged and movement is only reported when it clears sampling noise. Current distribution: 27% of books are never named at all (mean score <10), the median book scores 17/100, and 3% are recommended reliably (50+).
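
The exact scoring formula is not spelled out on this page, so the following is only a sketch under a stated assumption: that a book's score is the percentage of probes (one per engine and question) in which the title was named. The function names are hypothetical, and the day-averaging shown is one plausible reading, namely the mean of all runs made on the same day.

```python
from statistics import mean


def shelf_score(named_flags: list[bool]) -> float:
    """Assumed formula: share of probes that named the title, scaled to 0-100."""
    if not named_flags:
        raise ValueError("a score needs at least one probe")
    return 100 * sum(named_flags) / len(named_flags)


def day_averaged(runs_by_day: dict[str, list[float]]) -> dict[str, float]:
    """Collapse repeat audits made on the same day into one value per day."""
    return {day: mean(scores) for day, scores in runs_by_day.items()}


def genre_score(book_scores: list[float]) -> float:
    """A genre's score is the average across its audited books."""
    return mean(book_scores)


# Four probes, title named in two of them.
example = shelf_score([True, False, False, True])
```

Under this reading, a title named in 2 of 4 probes scores 50, and a genre containing books scoring 10, 20 and 60 scores 30.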

Anti-skew rules

Two rules stop any single book from distorting a genre’s source map: a domain must be cited for at least two different audited books to rank at all, and any domain whose genre citations come ≥70% from one title is excluded from rankings and logged for review. Ranked lists display the breadth (“across N books”) so you can see it. Raw counts are never mutated: what you see is what was logged.
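
The two rules can be sketched as a filter over one genre's logged citations. This is an illustrative sketch rather than the production code: the `rank_domains` name and the input shape are assumptions, as is the choice to apply the two-book rule first so that only multi-book domains are flagged for review. The input list is read, never modified, in keeping with the rule that raw counts are not mutated.

```python
from collections import Counter, defaultdict

MIN_BOOKS = 2              # a domain must be cited for at least two audited books
MAX_SINGLE_TITLE_PCT = 70  # exclude if one title supplies >= 70% of citations


def rank_domains(citations):
    """citations: iterable of (domain, book_id) pairs for a single genre.

    Returns (ranked, flagged): ranked is a list of
    (domain, citation_count, book_count) sorted by count, and flagged
    is the list of domains excluded by the 70% rule and held for review.
    """
    per_domain = defaultdict(Counter)
    for domain, book_id in citations:
        per_domain[domain][book_id] += 1

    ranked, flagged = [], []
    for domain, books in per_domain.items():
        total = sum(books.values())
        if len(books) < MIN_BOOKS:
            continue  # rule 1: too narrow to rank at all
        # Integer comparison avoids any floating-point edge case at exactly 70%.
        if max(books.values()) * 100 >= MAX_SINGLE_TITLE_PCT * total:
            flagged.append(domain)  # rule 2: dominated by one title
            continue
        ranked.append((domain, total, len(books)))

    ranked.sort(key=lambda row: (-row[1], row[0]))
    return ranked, sorted(flagged)


sample = (
    [("a.example", "book-1")] * 3 + [("a.example", "book-2")] * 2
    + [("b.example", "book-1")] * 7 + [("b.example", "book-2")] * 3
    + [("c.example", "book-3")] * 5
)
ranked, flagged = rank_domains(sample)
```

In the sample, a.example ranks with 5 citations across 2 books, b.example is flagged because one title supplies exactly 70% of its citations, and c.example is dropped for appearing under a single book. The third element of each ranked row is the breadth figure shown as “across N books”.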

Update cadence

The dataset has been accumulating since 2 June 2026, and the public pages query it live: every new audit updates the index immediately. The visible “last updated” date on each page is the timestamp of the newest observation.

What stays private

Individual book scores, score movers and the raw citation export are not public — they belong to the authors and publishers who commission audits. The public index shows aggregate, domain-level findings only.

Questions or corrections

We publish corrections openly. Write to hello@publishing.co.uk — journalists can request the methodology in more depth or a walkthrough of any figure.