How LLMs Actually Pick Sources — A Decorator's Guide to Becoming One
Inside the source-selection logic of modern LLMs. What designers and decorators need on-page (and off-page) to be the source the model reaches for.
12 min read · By decorator.tv editorial

Most marketing posts about "ranking in ChatGPT" treat the LLM as a black box. It isn't. The retrieval and citation layers of every major assistant are documented — sometimes in research papers, sometimes in product blogs, sometimes by reverse engineering the prompts they leak. Once you understand how a model decides what to read and what to quote, the on-page work is concrete. This piece distills the source-selection logic of the four major assistants as of mid-2026 and translates it into a build list for an interior designer's site.
The two-stage architecture
Every assistant that does live retrieval — ChatGPT with browsing, Perplexity, Google AI Overview, Claude with web access, Gemini, Bing Copilot — uses a two-stage pipeline. Stage one is retrieval: the assistant rewrites the user's query into one or more search queries, runs them against a search index (Bing for ChatGPT, Google for AI Overview, Perplexity's own hybrid, Brave for Claude), and pulls the top N results. Stage two is synthesis: the model fetches the page contents, extracts answer-relevant passages, ranks them by extractability and trust, and generates the answer with one to four citations.
You optimize for stage one with classic SEO. You optimize for stage two with AEO/GEO. Most studios obsess over stage one and ship stage-two-hostile pages, then wonder why they have impressions but no citations.
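If it helps to see the shape of it, here is a minimal sketch of the two stages in Python. Everything in it is hypothetical: the function names, the injected callables, the top-N and the citation cap are illustrative stand-ins, not any vendor's actual pipeline.

```python
# A minimal, hypothetical sketch of the two-stage pipeline described above.
# Every callable here is injected; no assistant publishes its real code.

def retrieve(user_query, rewrite_query, search_index, top_n=10):
    """Stage one: rewrite the question into search queries and run them
    against a classic search index (Bing, Google, or a proprietary hybrid)."""
    results = []
    for q in rewrite_query(user_query):        # e.g. LLM-generated rewrites
        results.extend(search_index(q)[:top_n])
    return results

def synthesize(user_query, results, fetch_page, score_passage, generate, max_cites=4):
    """Stage two: fetch pages, score answer-relevant passages by
    extractability and trust, then answer with one to four citations."""
    scored = []
    for r in results:                          # assume each result is a dict with a "url" key
        page_text = fetch_page(r["url"])
        scored.append((score_passage(page_text, user_query), r["url"], page_text))
    scored.sort(key=lambda t: t[0], reverse=True)   # best passages first
    top = scored[:max_cites]
    context = "\n\n".join(f"[{url}] {text[:500]}" for _, url, text in top)
    return generate(user_query, context)
```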
What the synthesis layer actually scores
Leaked internal Google documentation and published Anthropic and OpenAI evals suggest the synthesis layer scores fetched pages on roughly seven dimensions (a toy scoring sketch follows the list):
Extractability. Can the model lift a clean 30-to-90-word answer from the page without paraphrasing into uncertainty? Pages with a thesis sentence near the top score high. Pages where the answer is scattered across four sections score low.
Specificity. Does the page contain proper nouns, numbers, dates, named entities? A page that says "we serve the Fraser Valley with five-year warranties since 2009" outscores a page that says "we serve the local area with great warranties for many years."
Recency. The model checks the page's last-modified date and looks for in-prose recency signals ("as of 2026," "in our 2025 survey"). Older pages get downweighted on time-sensitive queries.
Trust signals. Outbound links to authoritative sources, inbound links from authoritative sources, presence of an author byline with credentials, an "About" page reachable in one click, and an explicit publisher entity (LegalName, NAP, registration number).
Source diversity. The model penalizes pulling all citations from one domain. If three competing sources answer the question equally well, the model rotates citations across them — which is good news for studios willing to be the third or fourth strongest source on a query.
Schema and structured data. JSON-LD is read directly. The model treats schema as ground truth and weights it higher than scraped prose.
Brand corroboration. The model checks whether the entity claimed by the page (your studio name, your principal designer's name) appears in other indexed sources. A studio mentioned only on its own site is treated as low-trust regardless of how good the page is.
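No vendor publishes its scorer, so treat the following as a toy only: one plausible way a synthesis layer could fold these dimensions into a single number. The weights are invented for illustration.

```python
from dataclasses import dataclass

# Toy illustration only: one way a synthesis layer *could* combine the
# dimensions above into a single score. The weights are invented.
# Source diversity is deliberately absent; per the list above, it is
# applied across the candidate set, not to a single page.

@dataclass
class PageSignals:
    extractability: float   # 0-1: clean 30-to-90-word answer near the top?
    specificity: float      # 0-1: density of proper nouns, numbers, dates
    recency: float          # 0-1: last-modified date plus in-prose signals
    trust: float            # 0-1: bylines, About page, link profile
    schema_present: bool    # valid JSON-LD on the page
    corroborated: bool      # entity appears in other indexed sources

HYPOTHETICAL_WEIGHTS = {
    "extractability": 0.30, "specificity": 0.20, "recency": 0.15,
    "trust": 0.15, "schema": 0.10, "corroboration": 0.10,
}

def synthesis_score(p: PageSignals) -> float:
    w = HYPOTHETICAL_WEIGHTS
    return (w["extractability"] * p.extractability
            + w["specificity"] * p.specificity
            + w["recency"] * p.recency
            + w["trust"] * p.trust
            + w["schema"] * float(p.schema_present)
            + w["corroboration"] * float(p.corroborated))
```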
The build list
Translate the seven dimensions into seven shippable items for your site.
For extractability, every commercial page gets an answer paragraph in the first 80 words after the H1. We covered the structure in the AEO piece on this site; the rule of thumb is that you should be able to highlight one paragraph and email it to a journalist as the answer to the headline question.
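If you want to enforce the rule mechanically, a crude check like the sketch below can fail your build when a commercial page lacks a 30-to-90-word paragraph starting within the first 80 words after the H1. The regex-based HTML handling is a shortcut for illustration; a real parser is safer in production.

```python
import re

def has_answer_paragraph(html: str, window_words: int = 80) -> bool:
    """Crude check: does a 30-to-90-word paragraph start within the
    first `window_words` words after the H1?"""
    # Everything after the closing </h1>.
    parts = re.split(r"</h1\s*>", html, maxsplit=1, flags=re.I)
    if len(parts) < 2:
        return False
    paragraphs = re.findall(r"<p[^>]*>(.*?)</p>", parts[1], flags=re.I | re.S)
    words_seen = 0
    for p in paragraphs:
        text = re.sub(r"<[^>]+>", " ", p)   # strip inline tags
        n = len(text.split())
        if words_seen <= window_words and 30 <= n <= 90:
            return True
        words_seen += n
    return False
```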
For specificity, do a noun audit on your site. Open every commercial page and count proper nouns, numbers, and dates. If the count is below ten per 500 words, the page is GEO-thin. Replace adjectives with measurements. "Premium paint" becomes "Benjamin Moore Aura matte interior, applied at 1.0-1.2 mil dry film thickness." Yes, your designer brain hates that. Models love it.
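The noun audit scripts easily. The sketch below applies the ten-per-500-words threshold from above; its heuristics (capitalized words as proper-noun proxies, digit runs as numbers and dates) are rough approximations, not NLP.

```python
import re

def specificity_audit(text: str) -> dict:
    """Rough GEO-thinness check for one page's body text. Counts
    capitalized words (a crude proper-noun proxy that overcounts
    sentence starts) plus numbers and dates, per 500 words."""
    total = len(text.split())
    proper_nouns = len(re.findall(r"\b[A-Z][a-z]+\b", text))
    numbers = len(re.findall(r"\b\d[\d.,:/-]*\b", text))
    per_500 = (proper_nouns + numbers) / max(total, 1) * 500
    return {
        "words": total,
        "proper_nouns": proper_nouns,
        "numbers_and_dates": numbers,
        "signals_per_500_words": round(per_500, 1),
        "geo_thin": per_500 < 10,   # threshold from the audit rule above
    }
```

Run it over the rendered body text of each commercial page, and treat a GEO-thin flag as a prompt to rewrite, not a number to game.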
For recency, add a visible "Last updated [date]" line near the H1 of every commercial page and a script that bumps it whenever the page is touched. Set a calendar reminder to do a 15-minute touch on each commercial page once per quarter — change the date, refresh one statistic, swap one image. The signal is real and the work is trivial.
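The bump script is a few lines in any build step. The sketch below rewrites the visible line from the file's modification time; the `<p class="updated">` pattern is a hypothetical template detail, so match it to whatever your pages actually use.

```python
import datetime
import pathlib
import re

def bump_last_updated(path: str) -> None:
    """Rewrite the visible 'Last updated' line to the file's mtime.
    Assumes a template line like <p class="updated">Last updated ...</p>."""
    page = pathlib.Path(path)
    html = page.read_text(encoding="utf-8")
    stamp = datetime.date.fromtimestamp(page.stat().st_mtime).isoformat()
    new_html = re.sub(
        r'(<p class="updated">Last updated )[^<]*(</p>)',
        rf"\g<1>{stamp}\g<2>",
        html,
    )
    if new_html != html:
        page.write_text(new_html, encoding="utf-8")
```

If your schema also emits dateModified, generate both from the same timestamp so they never disagree.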
For trust signals, the highest-leverage move for designers is an author byline on every page, even if every page is bylined to the principal. Behind the byline, link to a real person page with credentials, training, professional memberships, and (if applicable) a license number. Build the page once; reuse the byline link site-wide.
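If the person page also carries Person markup, the byline link gives models structured data to corroborate, not just prose. A minimal sketch, generated from Python so the values stay obvious placeholders; the @type and property names are standard schema.org:

```python
import json

# Minimal Person JSON-LD for the byline's target page. Every value
# below is a placeholder to replace with real credentials.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Jane Example",
    "jobTitle": "Principal Designer",
    "worksFor": {"@type": "LocalBusiness", "name": "Example Studio"},
    "memberOf": {"@type": "Organization", "name": "Example Professional Association"},
    "url": "https://example.com/about/jane",
}
print(f'<script type="application/ld+json">\n{json.dumps(person, indent=2)}\n</script>')
```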
For source diversity, paradoxically, you benefit when the niche has multiple strong sources. Don't try to be the only one. Aim to be one of the top four cited domains for your target queries; that gets you into the rotation.
For schema, install LocalBusiness on the homepage, Service on each service page, FAQPage anywhere with an FAQ, BreadcrumbList everywhere, Person on the About page. Validate every page in the Rich Results Test. Stop there; further schema rarely moves the needle for service businesses.
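For the homepage block, a minimal LocalBusiness sketch in the same style; Service, FAQPage, and BreadcrumbList follow the same pattern. The @type and property names are standard schema.org, and every value is a placeholder to swap for your real NAP and legal details.

```python
import json

# Minimal LocalBusiness JSON-LD for the homepage. All values are
# placeholders; replace them with your studio's actual details.
local_business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Studio",
    "legalName": "Example Studio Design Ltd.",
    "url": "https://example.com/",
    "telephone": "+1-604-555-0100",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Example St",
        "addressLocality": "Vancouver",
        "addressRegion": "BC",
        "postalCode": "V5K 0A1",
        "addressCountry": "CA",
    },
    "foundingDate": "2009",
    "areaServed": "Fraser Valley",
}
print(f'<script type="application/ld+json">\n{json.dumps(local_business, indent=2)}\n</script>')
```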
For brand corroboration, the work is off-site. Three guest posts a year on adjacent-trade blogs, with a byline that names your studio and city, will move you from "low-trust unknown entity" to "corroborated entity" in the model's eyes within six months.
The retrieval layer is still mostly Google (or Bing)
Notice that the seven dimensions above include none of the AI-specific things people obsess over: llms.txt, robots.txt allow rules for AI crawlers, special "AI search" markup beyond the standard JSON-LD already covered, and so on. Those matter at the margin, but they are the last 5% of the work, not the first 95%. The retrieval layer is still a search index. If you don't rank in classic Google or Bing search for the query, you cannot be cited in the AI answer for that query, full stop. Foundation first, then the synthesis-layer optimizations.
What to do about each major assistant
ChatGPT (Bing-backed retrieval): Bing rankings are the gating factor. Submit your sitemap to Bing Webmaster Tools, and prioritize the on-page work above. Bing rewards specificity and schema even more strongly than Google does, so the GEO work pays double here.
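Sitemap submission itself happens inside the Bing Webmaster Tools UI, but for ongoing updates Bing also accepts per-URL pings through the IndexNow protocol. A sketch, with placeholder host and key; note that IndexNow requires you to first host the key as a text file at the keyLocation URL:

```python
import json
import urllib.request

def ping_indexnow(urls, host="example.com", key="your-indexnow-key"):
    """Notify IndexNow-participating engines (Bing among them) that
    pages changed. Host and key here are placeholders."""
    body = json.dumps({
        "host": host,
        "key": key,
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }).encode("utf-8")
    req = urllib.request.Request(
        "https://api.indexnow.org/indexnow",
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status   # 200 or 202 means the ping was accepted
```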
Perplexity (proprietary index, biased toward recent and authoritative): Recency and trust signals dominate. Publish more often than your competitors, even if individual posts are shorter. Get into the regional press once a year; it lifts your Perplexity citation rate noticeably.
Google AI Overview (Google retrieval, Gemini synthesis): If you're already winning classic Google rankings for the query and your page is extractable, you have a 60% chance of being cited in the AI Overview. The single biggest win is converting your top-ranking blue-link pages into AEO pages; most aren't, because they were written years ago.
Claude (Brave + selective indexing): Claude is the strictest on trust signals. A real author byline with verifiable credentials and a clean About page move citation rate more than any other change.
The shortest possible to-do list
If you do nothing else this quarter: add an answer paragraph to your top five service-plus-city pages, install LocalBusiness and Service schema, add author bylines, validate the pages in Rich Results, and resubmit them to both Google and Bing webmaster tools. That is one weekend of work and it will move you from "invisible to LLMs" to "in the rotation" within sixty days. Everything else is upside.