
§ Methodology

How the AI Discovery Score is computed.

The score measures whether modern LLMs surface a specific book for its genre when readers ask for recommendations. It’s a measurement, not a marketing claim. This page explains exactly how we compute it, what we don’t measure, and the known limitations.

What actually drives LLM book retrieval

When a reader asks ChatGPT, Perplexity, Claude, or Gemini for “best [genre] books” or “books like X”, two layers determine which titles surface:

Layer 1: training-data citation graph. Wikipedia, Wikidata, Goodreads (especially shelves and lists), major review outlets (NYT / Guardian / LRB / Kirkus), Reddit (which most LLMs trained on via Common Crawl and Reddit’s licensed corpus), Substack and Medium niche roundups. Once a model is trained, this layer is locked in until the next retrain.

Layer 2: retrieval-augmented (live web) signals. Listicle rankings (“best [genre] books 2026”), bookstore staff picks, podcast transcripts, genre-vertical sites (Lesbrary, Crimereads, Romance.io, ALLi, SLJ). When LLMs do live web search, this is what they read.

The six axes we score

The diagnostic extracts a structured signal table from its searches and computes the score deterministically: same inputs, same number, every time. Haiku, the model that reports the signals, doesn’t pick the score; the rubric below does.

Retrievability — 0 to 10

Does the Amazon page surface for direct title-and-author search? Almost every published book passes this; a zero here is a giveaway that something’s badly wrong (broken metadata, taken-down listing).

Structured-data depth — 0 to 20

Goodreads page (scaled by ratings count), Wikipedia article on the book, Wikidata entry. Goodreads is the single biggest input to LLM citation graphs for fiction and trade non-fiction. Wikipedia + Wikidata are the semantic backbone that lets LLMs traverse author → book → genre.

Listicle / peer-set presence — 0 to 25 (heaviest weight)

Inclusion in “best [genre] books” round-ups, peer-recommendation chains, curated lists. This is the axis that most directly answers the question “would an LLM recommend this book?” A book with zero hits here is, by rubric definition, not in the recommendation chain, and the score caps accordingly.

Institutional authority — 0 to 20 (genre-specific)

Genre-specific endorsements that disproportionately shape LLM citation. Examples by genre:

  • Health / self-help / clinical: NHS Reading Well, ABCT Self-Help Seal, ADAA, NICE guidance
  • Literary fiction: Booker / Women’s Prize / Pulitzer / Costa long & shortlists, NYT/Guardian/LRB review, Kirkus starred
  • Crime & thriller: CWA Dagger awards, Crimereads roundups, BookRiot crime lists
  • Children’s & YA: ALA awards, Carnegie / Greenaway, SLJ starred reviews
  • Romance: Romance Writers awards, Smart Bitches features, Goodreads romance shelves with 100+ shelf hits
  • Academic / non-fiction: Google Scholar 50+ citations, peer-reviewed press, syllabus inclusions
  • Indie / self-published: ALLi top picks, Reedsy showcase, IndieReader awards, Storygraph community feature

Author graph — 0 to 15

Author Wikipedia article, bylines in credible publications, podcast appearance trail, expert credentials in the genre. A strong author footprint multiplies a book’s citation likelihood — when LLMs see a book by a cited author, they’re more willing to recommend it.

Cross-source citation — 0 to 10

Reddit discussions, Substack pieces, niche enthusiast forums. Reddit weighs disproportionately because its corpus was licensed by major model providers and recurs in retrieval queries.
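
The six axis ranges above add up to 100 (10 + 20 + 25 + 20 + 15 + 10). As a rough illustration of what "deterministic" means here, the raw sum could be sketched as below. The dictionary keys and the function name are hypothetical, and the sketch does not model the hard caps or the way individual signals map to points within an axis.

```python
# Axis maxima as stated in the rubric; the key names are hypothetical.
AXIS_MAX = {
    "retrievability": 10,
    "structured_data": 20,
    "listicle_presence": 25,
    "institutional_authority": 20,
    "author_graph": 15,
    "cross_source": 10,
}


def raw_score(axes: dict[str, int]) -> int:
    """Sum the axis scores, clamping each one to its stated 0..max range.

    Missing axes count as zero. Unknown axis names are rejected so a typo
    cannot silently change the total.
    """
    unknown = set(axes) - set(AXIS_MAX)
    if unknown:
        raise ValueError(f"unknown axes: {sorted(unknown)}")
    return sum(
        min(max(axes.get(name, 0), 0), cap)
        for name, cap in AXIS_MAX.items()
    )


# Example input: cross_source is deliberately over its range and gets clamped.
sample = {
    "retrievability": 10,
    "structured_data": 12,
    "listicle_presence": 0,
    "institutional_authority": 5,
    "author_graph": 8,
    "cross_source": 30,
}
total = raw_score(sample)
```

Because the function depends only on its input table, the same signals always produce the same raw sum.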

How signals are verified

The signal table feeding the score has two sources, in this order of authority:

Direct-source verification eliminates the “model didn’t happen to surface that signal in three searches” failure mode for the boolean facts. For the harder signals, multi-source corroboration is on the roadmap.

The empirical recommendation test

Every score now includes the result of an empirical test: we make a separate LLM call with one constrained web search asking the actual question a reader would ask, “best [genre] books”, and check whether the user’s book appears in the top results.

The test returns a binary: recommended or not recommended, plus the top books that did surface in the search. Both are shown verbatim on the public score page. Anyone can re-run the same query in ChatGPT, Perplexity, or Gemini and verify the answer in under 60 seconds.
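
One way to picture the shape of that result is the sketch below. It assumes the surfaced titles are already available as a list of strings and that a book counts as present when its title matches after lowercasing and stripping punctuation. The matching rule, the class, and the function names are all assumptions for illustration, not a description of the production check.

```python
import re
from dataclasses import dataclass


def normalize(title: str) -> str:
    """Casefold, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", title.casefold())
    return " ".join(text.split())


@dataclass(frozen=True)
class RecommendationResult:
    recommended: bool            # the binary verdict
    top_books: tuple[str, ...]   # titles that did surface, kept verbatim


def check_recommendation(book_title: str, surfaced: list[str]) -> RecommendationResult:
    """Return the binary verdict plus the surfaced titles, unmodified."""
    target = normalize(book_title)
    hit = any(normalize(candidate) == target for candidate in surfaced)
    return RecommendationResult(recommended=hit, top_books=tuple(surfaced))


# Hypothetical search output for a "best [genre] books" query.
surfaced_titles = ["Gone Girl", "the secret history!", "Rebecca"]
found = check_recommendation("The Secret History", surfaced_titles)
missing = check_recommendation("An Unlisted Novel", surfaced_titles)
```

Keeping the surfaced titles verbatim in the result is what lets them be shown unchanged next to the verdict.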

When the test returns not recommended, the runtime forces the listicle-presence axis to zero regardless of any inclusions Haiku separately reported — the test is more authoritative because it asks the recommendation question directly. This is what stops a book from being scored as “well-indexed” when an LLM-with-retrieval doesn’t actually surface it for its genre.
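
The override described above can be sketched as a small pure function. The axis key `listicle_presence` and the function name are hypothetical; only the rule itself (a not-recommended result zeroes that axis, whatever was reported) comes from the text.

```python
def apply_recommendation_override(axes: dict[str, int], recommended: bool) -> dict[str, int]:
    """Return a copy of the axis table with the empirical test applied.

    If the test says the book was not recommended, the listicle-presence
    axis is forced to zero. Otherwise the table is returned unchanged.
    """
    adjusted = dict(axes)  # copy, so the reported signals stay intact
    if not recommended:
        adjusted["listicle_presence"] = 0
    return adjusted


# Hypothetical reported axes: the model found listicle inclusions worth 18 points.
reported = {"retrievability": 10, "listicle_presence": 18, "author_graph": 6}
after_fail = apply_recommendation_override(reported, recommended=False)
after_pass = apply_recommendation_override(reported, recommended=True)
```

Returning a copy rather than mutating the input keeps both the reported and the adjusted values available for display.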

Hard caps

Three caps override the raw axis sum. They exist because the rubric refuses to label a book “well-indexed” when its primary genre-retrieval signal is missing — no matter how good the other axes look.

Bands

What we deliberately do not measure

Anti-gaming notes

The fastest way to ruin a measurement metric is to make it gameable. We weight the signals deliberately:

Known limitations

Refresh cycle

Every public score is automatically re-run every 90 days. The author is emailed the change. The badge SVG updates to show the latest figure and date — no action required from authors who’ve embedded it. Refresh count is visible on the public score page.
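
For readers who want to predict when a score will next change, the 90-day cycle reduces to simple date arithmetic. The sketch below assumes the interval is counted in calendar days from the last run; the function names are hypothetical.

```python
from datetime import date, timedelta

REFRESH_INTERVAL = timedelta(days=90)  # the stated refresh cycle


def next_refresh(last_run: date) -> date:
    """Date on which the score is next due to be re-run."""
    return last_run + REFRESH_INTERVAL


def is_due(last_run: date, today: date) -> bool:
    """True once the refresh interval has fully elapsed."""
    return today >= next_refresh(last_run)


due_date = next_refresh(date(2026, 1, 1))
```

For a score last run on 1 January 2026, `due_date` is 1 April 2026.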

Run it on your own book.

Free, ungated. Takes about 90 seconds. You’ll get the score, the verdict, the competitive set, the three fastest fixes — plus an embeddable badge if you want to publish the score.
