The 50 domains AI cites most: What brands can learn from them
AI citations cluster around reference sites, community platforms, video, review hubs, and vertical authorities. Here is where brands need to show up.
Originally published May 14, 2026
The AI source graph is not a mystery anymore. It is uneven, platform-specific, and much less friendly to brand websites than most SEO teams expected. Ahrefs' published Google AI Mode dataset gives us a concrete 50-domain inventory, while cross-engine studies from Semrush, Similarweb, Peec, and Surfer show why the exact ordering changes by model. The list should change how Sarah budgets AI visibility.
Soar is a community marketing agency that has run 4,200+ community campaigns across 280+ brands since 2017. We read the source studies differently than a GEO tool vendor does: not as a leaderboard to admire, but as a map of where a brand has to earn third-party proof before ChatGPT, Perplexity, Gemini, or Google AI Mode will recommend it.
Top-50 sources shared across AI Overviews, ChatGPT, and Perplexity. Source: Ahrefs, June 2025
AI citations analyzed across ChatGPT Search, Google AI Mode, and Perplexity. Source: Semrush, 13-week citation study
Google AI Overview citations analyzed from March to August 2025. Source: Surfer, AI Overviews study
Correlation between brand web mentions and AI Overview brand visibility. Source: Ahrefs, 75K-brand study

Which 50 domains are actually in the source set?
Ahrefs' most complete public 50-row list comes from 5.5 million Google AI Mode queries, where Wikipedia, YouTube, Google properties, Reddit, and Amazon occupy the first six positions. Treat this as the cleanest published inventory, not a universal cross-engine ranking. Semrush, Peec, Similarweb, and Surfer all show the same pattern with different engine-specific winners.
The table below normalizes www and mobile hosts where the parent domain is the useful planning unit. It keeps Google product hosts and Wikipedia language hosts separate because the source behavior is different. The final column is the board-level implication, not a tactical posting instruction.
| Rank | Domain | Source class | What it means for brands |
|---|---|---|---|
| 1 | en.wikipedia.org | Reference | Independent notability and entity facts still anchor AI answers. |
| 2 | youtube.com | Video | Transcript-rich video is a retrieval surface, not only a media channel. |
| 3 | blog.google | Platform-owned | Google cites its own product and policy material heavily. |
| 4 | reddit.com | Community | Buyer objections and peer comparisons feed subjective recommendations. |
| 5 | google.com | Platform-owned | Official Google properties shape product, search, and local answers. |
| 6 | amazon.com | Commerce | Product listings and review density matter for shopping prompts. |
| 7 | quora.com | Community | Explainer questions still have retrieval weight in Google AI surfaces. |
| 8 | facebook.com | Social/community | Local, consumer, and group-based evidence remains visible. |
| 9 | yelp.com | Reviews/local | Local service and reputation prompts need review-site coverage. |
| 10 | instagram.com | Social/video | Visual consumer categories pull from social proof. |
| 11 | imdb.com | Vertical database | Structured vertical databases beat general pages in their category. |
| 12 | tripadvisor.com | Reviews/travel | Travel and local experience queries lean on review ecosystems. |
| 13 | linkedin.com | Professional/social | B2B proof increasingly comes from people, companies, and expert posts. |
| 14 | mapquest.com | Local/maps | Location-intent answers can pull from legacy local data sources. |
| 15 | walmart.com | Commerce | Large retail catalogs become source material for product answers. |
| 16 | britannica.com | Reference | Neutral explainers still compete with Wikipedia for factual prompts. |
| 17 | healthline.com | Health vertical | Consumer health answers reward readable medical explainers. |
| 18 | yahoo.com | Portal/news | Broad portals still surface in news, finance, and general queries. |
| 19 | ebay.com | Commerce | Marketplace pages shape price, availability, and collectible prompts. |
| 20 | clevelandclinic.org | Health authority | Institutional trust dominates higher-risk medical questions. |
| 21 | mayoclinic.org | Health authority | Medical authority is a separate source class, not generic SEO. |
| 22 | webmd.com | Health vertical | Legacy consumer health libraries remain heavily retrievable. |
| 23 | pinterest.com | Visual/social | Visual planning queries use image-led discovery surfaces. |
| 24 | support.google.com | Platform-owned | Official support docs answer product and troubleshooting prompts. |
| 25 | merriam-webster.com | Reference | Definitions are still a major citation pattern. |
| 26 | tiktok.com | Short-form video | Some consumer and trend queries now cite short-form social content. |
| 27 | medicalnewstoday.com | Health vertical | Health publishers can outrank brands with clearer condition explainers. |
| 28 | pmc.ncbi.nlm.nih.gov | Research | Primary and archived research matter when the model needs evidence. |
| 29 | wikihow.com | Instructional | Step-by-step utility content still wins procedural prompts. |
| 30 | study.com | Education | Education libraries perform well on explanatory queries. |
| 31 | indeed.com | Jobs/careers | Labor-market answers pull from structured job and career data. |
| 32 | espn.com | Sports vertical | Sports queries reward dedicated, current vertical authorities. |
| 33 | collinsdictionary.com | Reference | Dictionary sources cluster around definition and language prompts. |
| 34 | medium.com | Editorial/social | Long-form expert posts can enter the citation set when they are clear. |
| 35 | etsy.com | Commerce | Niche marketplaces influence product and gift recommendations. |
| 36 | businessinsider.com | Business media | Business explainers and buying guides still shape category answers. |
| 37 | nytimes.com | News/media | High-authority journalism remains a trust layer for broad topics. |
| 38 | verywellhealth.com | Health vertical | Consumer medical publishers add approachable context to health answers. |
| 39 | target.com | Commerce | Retail catalogs and product pages influence shopping responses. |
| 40 | dictionary.com | Reference | Definition-heavy prompts still cite classic reference sites. |
| 41 | play.google.com | Platform-owned | App and software discovery can pull from app-store listings. |
| 42 | goodrx.com | Health/pricing | Medication and price prompts need specialized health-commerce sources. |
| 43 | homedepot.com | Commerce/how-to | Home-improvement answers blend retail inventory with instructional content. |
| 44 | x.com | Social/news | Breaking-news and Grok-adjacent retrieval can make X disproportionately visible. |
| 45 | sciencedirect.com | Research | Scientific literature remains the trust layer for technical claims. |
| 46 | nerdwallet.com | Finance vertical | Personal-finance explainers shape comparison and recommendation prompts. |
| 47 | people.com | Entertainment/media | Celebrity and entertainment prompts cite mainstream entertainment media. |
| 48 | usatoday.com | News/media | National news sources still anchor broad informational answers. |
| 49 | forbes.com | Business media | Contributor and editorial coverage affects business-category visibility. |
| 50 | simple.wikipedia.org | Reference | Simplified encyclopedic pages can be easier for retrieval systems to use. |
The Soar read is simple: the exact rank order changes, but the source classes do not. Models keep reaching for encyclopedic pages, human discussion, video, professional identity, reviews, commerce catalogs, and high-authority vertical sites. For your brand, the question is not "can we outrank Wikipedia?" It is "which of these source classes should mention us when a buyer asks about our category?"
Why do the rankings vary by platform?
AI citation studies disagree because the engines are not using one shared web index. ChatGPT Search, Google AI Mode, Google AI Overviews, Perplexity, Gemini, and Claude each combine different crawling, retrieval, reranking, partner, and search layers. Treating "AI search" as one channel is the fastest way to misread the data.
OpenAI's own help center says ChatGPT can search the web and may include inline citations when search is used. Google's AI Mode announcement describes query fan-out, where Search breaks a question into subtopics and issues multiple searches simultaneously. That architecture naturally rewards pages that answer fan-out subquestions, not only pages that rank for the surface keyword. Semrush's AI Mode comparison shows the result: AI Mode had only about 54 percent domain overlap and 35 percent URL overlap with Google's top 10, while AI Overviews and Perplexity tracked traditional search more closely.
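Google has not published fan-out internals, but the described shape, one question split into subtopic searches whose results are merged, can be sketched to show why subquestion coverage matters. The subtopics, the `toy_search` index, and all URLs below are hypothetical, illustration only:

```python
def fan_out(question, subtopics, search):
    """Issue one search per subtopic and merge the candidate pages."""
    candidates = {}
    for sub in subtopics:
        for url, score in search(f"{question} {sub}"):
            # A page earns weight for every subquestion it answers
            candidates[url] = candidates.get(url, 0) + score
    return sorted(candidates, key=candidates.get, reverse=True)

# Hypothetical toy index: each subtopic query returns a few scored pages
def toy_search(query):
    index = {
        "pricing": [("vendor-a.com/pricing", 1.0), ("review-hub.com/guide", 0.8)],
        "alternatives": [("review-hub.com/guide", 0.9), ("reddit.com/r/saas", 0.7)],
        "reliability": [("review-hub.com/guide", 0.6), ("reddit.com/r/saas", 0.5)],
    }
    return next((hits for topic, hits in index.items() if topic in query), [])

ranked = fan_out("best crm", ["pricing", "alternatives", "reliability"], toy_search)
```

In this toy model the review guide wins overall despite never being the single best page for any one subtopic, which is the fan-out effect the Semrush overlap numbers point at: broad subquestion coverage beats a narrow keyword rank.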
This is why Sarah should reject any AI visibility plan built around a single tactic. Bing crawlability may matter for ChatGPT Search. Community proof may matter for Perplexity. Classical search and E-E-A-T still matter for AI Overviews. A source plan should be built per engine, not copied from the SEO roadmap.
What source types matter for brands?
For a marketing leader, the useful map has five buckets, not 50 rows. Reference sites establish entity facts. Community sites capture buyer language. Video demonstrates usage and expertise. Professional/social platforms validate people and companies. Review and directory sites give category comparison prompts a structured answer set.
Reference: Wikipedia, NIH, Google docs, government pages, and category explainers help AI systems anchor facts, definitions, and neutral descriptions.
Community: Reddit, Quora, forums, and niche communities carry buyer objections, lived experience, competitor comparisons, and sentiment.
Video: YouTube and transcript-rich video pages help engines answer how-it-works, comparison, product, and tutorial prompts.
Professional: LinkedIn, GitHub, company blogs, and author pages validate expertise, team identity, and technical credibility.
Review: G2, Yelp, Amazon, Capterra, Trustpilot, and vertical directories shape "best", "alternative", and "is it worth it" answers.

The mistake is to chase all five equally. A B2B SaaS brand probably needs G2, LinkedIn, Reddit, technical docs, and GitHub more than Yelp. A DTC brand may need Reddit, YouTube, TikTok, Amazon reviews, and specialist publishers. A health brand needs medical authority and compliance-safe expert content before it worries about Reddit volume. The source graph is a prioritization tool, not a checklist.
What does Reddit's position actually mean?
Reddit matters because it is both a training-data asset and a retrieval asset, but its exact citation share changes by engine and month. That volatility is not a reason to ignore it. It is the reason brands need durable community presence instead of one-off seeding when the numbers look favorable.
OpenAI announced in May 2024 that it would access Reddit's Data API for real-time, structured Reddit content. Axios reported in October 2025 that Profound analyzed more than 1 billion citations and found Reddit was the second most-cited platform behind YouTube, while Perplexity cited Reddit most often among the sources Profound tracked. Semrush, meanwhile, documented a sharp ChatGPT shift: Reddit appeared in close to 60 percent of prompt responses in early August 2025 before dropping to about 10 percent by mid-September.
For a brand, the implication is not "Reddit always wins." It is more precise: Reddit is a high-trust source for subjective category questions, competitor comparisons, troubleshooting, and buyer validation. Those are exactly the prompts that drive pipeline. The deeper mechanics are covered in how Reddit became the biggest single source of LLM citations, but the operating rule is simple: if Reddit threads shape buyer confidence in your category, they are part of your AI visibility surface.
Where should a brand build presence first?
Start where a buyer would expect credible evidence if an AI answer did not exist. That usually means one reference surface, one community surface, one review surface, and one owned page that ties the facts together. Do not start by rewriting every blog post into answer-box copy.
Ahrefs' 75,000-brand study is the strongest signal here. Brand web mentions had a 0.664 correlation with AI Overview brand visibility, much higher than backlinks at 0.218. The same study found the top quartile of brands by web mentions earned over 10 times more AI Overview mentions than the next quartile. Correlation is not causation, but it matches what we see in client audits: brands absent from third-party conversation are rarely recommended in AI answers, even when their own websites are technically sound.
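Correlation figures like Ahrefs' 0.664 can be replicated on your own tracking exports to see whether the same relationship holds in your category. A minimal sketch of the Pearson calculation, where the per-brand counts below are made up for illustration and would come from your own mention and AI-visibility tooling:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-brand counts: third-party web mentions vs. AI Overview mentions
web_mentions = [120, 45, 300, 80, 15, 210]
ai_mentions = [34, 10, 95, 22, 2, 60]

print(round(pearson(web_mentions, ai_mentions), 3))
```

The same caveat from the study applies to any internal replication: a high coefficient shows the variables move together, not that mentions cause visibility.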
The first 90 days should create source diversity. For a category with Reddit-heavy buying behavior, that means credible Reddit participation plus a reference-quality owned page. For a B2B software category, add G2 and LinkedIn. For technical categories, add GitHub or documentation. For consumer categories, add YouTube and review ecosystems. The goal is not volume; the goal is enough corroboration that the model does not have to trust your homepage alone.
How should Sarah use the top 50 list in a board deck?
The board-slide version is this: AI visibility is no longer only an owned-content problem. The source graph shows that models answer commercial questions with third-party proof, community discussion, video, reviews, and vertical authority. That means AI visibility budget belongs partly in community, partly in content, partly in review operations, and partly in measurement.
Use a 2-by-2. On one axis: source credibility for your category. On the other: controllability. Your website is high-control but often lower-trust for recommendation prompts. Wikipedia and major publishers are high-trust but low-control. Reddit and Quora sit in the middle: you cannot control them, but you can participate credibly and create useful evidence over time. Review sites sit in the middle too, with structured profiles and customer proof you can influence without faking anything.
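The 2-by-2 can be drafted as a scored list before it becomes a slide. A minimal sketch, where the 1-to-5 trust and controllability scores are illustrative assumptions, not measurements, and should be replaced with your own category judgments:

```python
# (surface, trust_for_category, controllability), each scored 1-5; illustrative only
surfaces = [
    ("own website", 2, 5),
    ("Wikipedia", 5, 1),
    ("Reddit", 4, 3),
    ("review sites", 4, 3),
    ("major publishers", 5, 1),
]

# Prioritize surfaces where trust and your ability to act are both workable
ranked = sorted(surfaces, key=lambda s: s[1] * s[2], reverse=True)
for name, trust, control in ranked:
    print(f"{name}: trust={trust}, control={control}, priority={trust * control}")
```

Even with rough scores, the middle of the matrix, community and review surfaces, tends to rank first, which matches the budget argument: that is where credible participation is both trusted and possible.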
The budget implication is the point. If 80 percent of the quarter's AI visibility spend goes into schema, blog rewrites, and landing-page copy, the plan is overweight owned surfaces. A healthier first-quarter allocation reserves real budget for third-party source development: Reddit and Quora participation, review profile cleanup, executive LinkedIn proof, YouTube explainers, and source measurement across engines.
How much should this cost?
A serious AI visibility source-development program costs more than a technical audit and less than hiring a full internal GEO team. For a $5M to $50M company, a focused 90-day program typically lands between $15,000 and $45,000 before enterprise tooling. That covers baseline prompts, source mapping, community participation, review-site cleanup, content refreshes, and reporting.
The cheap version is a $2,000 tool subscription plus a few SEO rewrites. It will produce charts and maybe better on-page answers, but it will not create the off-site proof AI systems keep citing. The expensive version is a six-month multi-surface program that bundles Reddit, Quora, YouTube, review operations, PR, content, and measurement. That can be right for a category leader, but it is not the right starting point for a team trying to prove the channel.
The 90-day pilot should have a kill switch. Baseline 40 to 80 prompts, pick 3 to 5 source surfaces, ship interventions for 60 days, and rerun the same prompt set. If mention rate, citation quality, or competitor displacement does not move by the second review, narrow the surface or pause. Our 90-day GEO program lays out the cadence.
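The rerun comparison behind the kill switch is simple arithmetic. A minimal sketch, assuming you store each engine's answer text keyed by prompt; the prompts, answers, and brand name below are hypothetical:

```python
def mention_rate(responses, brand):
    """Share of prompt responses that mention the brand (case-insensitive)."""
    hits = sum(1 for text in responses.values() if brand.lower() in text.lower())
    return hits / len(responses)

# Hypothetical baseline vs. day-60 rerun of the same prompt set
baseline = {
    "best crm for startups": "HubSpot and Salesforce lead this category...",
    "crm with a free tier": "Consider HubSpot or Zoho for free plans...",
}
rerun = {
    "best crm for startups": "Acme CRM and HubSpot are popular picks...",
    "crm with a free tier": "Acme CRM's free tier covers most needs...",
}

delta = mention_rate(rerun, "Acme CRM") - mention_rate(baseline, "Acme CRM")
print(f"Mention rate moved by {delta:+.0%}")  # feeds the continue / narrow / pause decision
```

In practice the same function runs per engine and per prompt cluster, so a flat overall number can still reveal one surface that moved.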
Who is this strategy for?
This source-graph strategy is for brands whose buyers ask comparative, skeptical, or high-intent questions before they book a demo or buy. If your category is low-consideration and mostly impulse-driven, AI citations may matter later. If your category involves risk, switching cost, compliance, peer validation, or category confusion, it matters now.
It is especially useful for B2B SaaS, fintech, health and wellness, developer tools, DTC categories with heavy review behavior, and any company whose competitors are already recommended by ChatGPT or Perplexity. These are categories where a model rarely trusts one vendor page. It wants evidence from peers, reviews, technical documentation, explainers, and comparison pages. The cross-engine differences are why we maintain ChatGPT vs Claude vs Perplexity vs Gemini as a separate playbook.
The wrong fit is a brand that wants citation share without changing where it shows up. AI visibility work exposes old distribution debt. If the category conversation is happening on Reddit and LinkedIn while your team only publishes blog posts, the answer is not more blog posts. The answer is to enter the places the model already trusts.
Frequently asked questions
What are the most cited domains across AI engines?
The exact order changes by study and month, but Wikipedia, Reddit, YouTube, LinkedIn, Google properties, G2, Yelp, Forbes, Fandom, GitHub, NIH, Amazon, and major news publishers recur across the strongest public datasets. Treat the list as source classes, not a fixed leaderboard.
Does my brand need a Wikipedia page to show up in AI answers?
Not always. Wikipedia helps entity recognition, but many brands are not notable enough for a legitimate page. For most mid-market brands, the better first move is third-party corroboration through Reddit, Quora, review sites, YouTube, LinkedIn, analyst coverage, and category-specific publishers.
Is Reddit still worth investing in if ChatGPT citations fluctuate?
Yes, if Reddit shapes buyer trust in your category. Semrush found Reddit's ChatGPT citation share fell sharply in 2025, while Axios and Profound still found Reddit heavily cited across broader AI surfaces. The stable insight is not a single percentage; it is that Reddit remains a trusted source class for subjective buyer questions.
How often should we audit AI source coverage?
Monthly is the right cadence for active programs. AI retrieval weights shift faster than classical rankings, and source changes can move before traffic does. Track prompt-level mention rate, citation source, sentiment, and competitor displacement across ChatGPT, Claude, Perplexity, Gemini, and Google AI surfaces.
Can owned content alone earn AI citations?
Owned content can earn citations, especially when it is authoritative, fresh, and already visible in search. But the strongest brand-visibility studies point to off-site mentions and third-party proof as major signals. Owned content should anchor the facts; community and review surfaces should corroborate them.
Conclusion
The top-50 AI citation lists are not trivia. They are a budget map. AI engines do not build brand recommendations from your homepage alone; they triangulate from reference sites, communities, video, professional identity, reviews, news, and vertical authorities. The brands that treat those surfaces as one visibility system will have a measurable advantage over brands still asking an SEO agency to "add AI keywords" to blog posts.
The practical plan is narrow: choose the engines your buyers use, map the source classes those engines already cite, build credible proof on the 3 to 5 surfaces that matter most, and measure the same prompt set every month. The list will keep changing. The operating model will not.
