If your competitors keep showing up in ChatGPT and your brand does not, you do not have a content problem. You have a measurement problem. Until you can see exactly where the gap is - which prompts, which platforms, which competitors - every dollar you spend on "AI visibility" is a guess.
This is the audit you can run before hiring anyone. Twenty prompts, three AI platforms, a four-metric scoring rubric. It takes about four hours. At the end you will know whether the gap is small enough for your in-house team to close or large enough that you need help.
We run a version of this audit for every prospect who walks through the door. Soar is a community marketing agency that has run 4,200+ community campaigns across 280+ brands since 2017, and every engagement starts here. The version below is stripped down to what a marketing director with a spreadsheet can do without buying a tool.
Why AI visibility audits matter right now
Forty-eight percent of all Google searches now trigger an AI Overview, up from roughly 31% in early 2025 - a year-over-year increase of about 55% (Heroic Rankings 2026). Searches that trigger an AIO show an average zero-click rate around 83%. The traditional CTR funnel is collapsing in front of you, and a brand mention inside the AI answer is the new top-of-funnel impression.
The catch: AI citations do not track Google rankings. The overlap between ChatGPT citations and Google's top 10 is roughly 12%, and ChatGPT shares only about 8% of its top results with Google or Bing (Search Engine Land). The pages that win in AI answers are not the pages your SEO agency optimized. Without an audit, you are flying blind.
The Ahrefs study of 75,000+ brands found that unlinked brand mentions correlate with AI citations at 0.664, versus 0.218 for backlinks - mentions are roughly three times more predictive than links (Ahrefs). That means your audit must measure mentions across third-party sources, not just your own pages. What this gives you: a defensible baseline to put in a board deck before you request budget for the next phase.
What an AI visibility audit actually measures
An AI visibility audit measures four things: mention rate, citation position, share of voice, and citation source mix. Sentiment is a fifth, but most teams skip it on a first pass because the volume is too low to be reliable. If you only have time to track one metric, track share of voice - it is the only one that survives platform turbulence.
Mention rate is the percentage of audit prompts where your brand appears anywhere in the answer. Citation position is whether you are mentioned first ("we recommend X") or buried ("you might also consider Y"). Share of voice is your mentions as a fraction of total brand mentions across the audit set. Source mix is which third-party domains the AI cited when it mentioned you (Reddit thread, G2 review, your own site, an industry roundup).
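To make the definitions concrete, here is a toy computation of mention rate, first-position rate, and share of voice from a three-prompt run log. Every brand name and answer below is invented; the arithmetic is the point.

```python
# Toy run log: one list per answer, brands in order of appearance.
# All names and values are invented for illustration.
answers = [
    ["CompetitorA", "YourBrand", "CompetitorB"],  # mentioned mid-list
    ["CompetitorA", "CompetitorC"],               # absent
    ["YourBrand", "CompetitorA"],                 # mentioned first
]
brand = "YourBrand"

mention_rate = sum(brand in a for a in answers) / len(answers)
first_position = sum(a[0] == brand for a in answers if a) / len(answers)
total_mentions = sum(len(a) for a in answers)
share_of_voice = sum(a.count(brand) for a in answers) / total_mentions

print(f"mention rate {mention_rate:.0%}, "
      f"first-position {first_position:.0%}, "
      f"share of voice {share_of_voice:.0%}")
# -> mention rate 67%, first-position 33%, share of voice 29%
```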
These are different from the brand-search-volume metrics your SEO team tracks today. Brand search volume correlates 0.334 with AI citations - meaningful, but not deterministic (Ahrefs). Most of the predictive signal lives in mentions across third-party communities, not in branded queries on Google. What this means for your scope: the audit has to look outside your own site to be honest.
Step 1: Build your 20-prompt test set
The prompt set is the audit. Get this wrong and the rest is noise. Build twenty prompts in three categories: ten category prompts (no brand named - "best [thing] for [audience]"), six comparison prompts (your brand or category vs. a competitor), and four branded prompts (your name, your product). Weighting the set this heavily toward unbranded and comparison prompts - only four of the twenty are purely branded - mirrors how enterprise AI visibility platforms standardize their tracking sets (Averi).
Each prompt should map to a real buying intent. To find them, pull from three places: your sales team's most-cited objections, your top-of-funnel paid keywords, and the recurring questions in the two or three subreddits where your category lives. If you have a fixed list of prompts your buyers actually type, use that - most teams hit a useful accuracy plateau between 30 and 40 prompts (SE Ranking), but 20 is enough for a first audit. A structured version of the set appears in the sketch after the table.
| Prompt type | Count | Example |
|---|---|---|
| Category | 10 | "What are the best [category] tools for a Series B SaaS company?" |
| Comparison | 6 | "[Your brand] vs [competitor] for [use case]" |
| Branded | 4 | "Is [your brand] a good fit for [persona]?" |
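If you want the library in a form you can version-control and re-run, here is a minimal sketch. The structure and names are our own convention, not any tool's schema, and every prompt string is a placeholder:

```python
# 20-prompt test set as plain data. Swap the placeholder strings for
# prompts pulled from sales objections, paid keywords, and subreddits.
PROMPT_SET = {
    "category": [  # 10 prompts, no brand named
        "What are the best [category] tools for a Series B SaaS company?",
        # ...nine more
    ],
    "comparison": [  # 6 prompts, brand or category vs. a competitor
        "[Your brand] vs [competitor] for [use case]",
        # ...five more
    ],
    "branded": [  # 4 prompts, your name or product
        "Is [your brand] a good fit for [persona]?",
        # ...three more
    ],
}

total = sum(len(prompts) for prompts in PROMPT_SET.values())
print(f"{total} prompts, {len(PROMPT_SET['branded'])} purely branded")
```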
What this gives you: a reusable prompt library you can re-run quarterly without designing new questions every time. For a deeper treatment of prompt selection, see our piece on how to find the prompts that matter for ChatGPT and Claude visibility.
Step 2: Run prompts across ChatGPT, Perplexity, and Google AI Overviews
Run each of the 20 prompts across three AI surfaces: ChatGPT (default GPT, search enabled), Perplexity (default model), and Google AI Overviews (logged-out Chrome window, your target country). That gives you 60 prompt-platform pairs. Run each prompt three times per platform - 180 runs in all - and record the modal answer: every LLM answer is probabilistic, and a single-shot check produces noise rather than signal.
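Picking the modal answer is mechanical once the runs are recorded. A sketch using Python's collections.Counter, with invented run outputs:

```python
from collections import Counter

# Three runs of one prompt on one platform. Brand lists are stored as
# tuples so they can be counted; all names are invented.
runs = [
    ("CompetitorA", "YourBrand"),
    ("CompetitorA", "YourBrand"),
    ("CompetitorA", "CompetitorB", "YourBrand"),
]

modal_answer, seen = Counter(runs).most_common(1)[0]
print(modal_answer, f"(seen in {seen} of {len(runs)} runs)")
```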
Do not test the same prompt on Claude unless you sell into a developer audience. Claude has the smallest measurable consumer footprint of the major assistants and the lowest brand-recall ceiling for non-technical categories. Add it later when you scale the audit.
Why these three? They behave differently enough that one is not a proxy for another. Reddit accounts for roughly 24% of all Perplexity citations and over 5% of ChatGPT's, but only 0.1% of Gemini's, according to the Q1 2026 AI Citations Trends Report (Otterly). A brand that wins on Perplexity through Reddit conversations can be invisible on Gemini, and vice versa.
Capture the raw output in a spreadsheet: prompt, platform, run number, brands mentioned (in order), cited URLs. What this gives you: the corpus you score in step three. Without it, you have opinions, not an audit.
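Here is that capture schema as a CSV sketch, one row per prompt-platform-run. The column names are our own convention, not any tool's export format, and the example values are placeholders:

```python
import csv

# One row per prompt x platform x run; brands kept in order of
# appearance so citation position can be scored later.
FIELDS = ["prompt", "platform", "run", "brands_in_order", "cited_urls"]
rows = [
    {
        "prompt": "What are the best [category] tools for a Series B SaaS company?",
        "platform": "perplexity",  # chatgpt | perplexity | google_aio
        "run": 1,                  # 1-3, so you can take the modal answer
        "brands_in_order": "CompetitorA; YourBrand; CompetitorB",
        "cited_urls": "https://example.com/review; https://example.com/thread",
    },
]

with open("audit_runs.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```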
Step 3: Score the results with a 100-point rubric
Score each prompt-platform pair on four dimensions and sum to a 100-point total. Mention rate (40 pts): percentage of prompts where you appear at all. Citation position (20 pts): weighted score where first-position mentions count fully, mid-list mentions at 0.5, last mentions at 0.25. Share of voice (25 pts): your mentions ÷ total brand mentions across the audit. Source quality (15 pts): percentage of citations from third-party sources your team did not write (forums, review platforms, independent media).
The 40-point weight on mention rate isn't arbitrary - it tracks roughly with how HubSpot's AEO Grader and the major audit tools weight the basics (HubSpot AEO Grader). Citation position matters less than people assume; once you are in the answer, the bigger lever is whether the AI cites you for the right reason.
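As a sketch, the composite is just a weighted sum of the four sub-scores, each normalized to the 0-1 range first. The weights come from the rubric above; the example inputs are invented:

```python
# 100-point composite from the four rubric dimensions.
WEIGHTS = {
    "mention_rate": 40,
    "citation_position": 20,
    "share_of_voice": 25,
    "source_quality": 15,
}

def composite_score(sub_scores):
    """sub_scores: dict of the four dimensions, each in [0, 1]."""
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)

example = {
    "mention_rate": 0.65,       # appears in 65% of prompts
    "citation_position": 0.40,  # mostly mid-list mentions
    "share_of_voice": 0.20,     # 20% of all brand mentions
    "source_quality": 0.50,     # half of citations are third-party
}
print(composite_score(example))  # 46.5 -> the "inconsistent" band below
```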
| Score band | What it means |
|---|---|
| 80–100 | Category leader. AI is reinforcing your existing distribution. |
| 60–79 | Mentioned but not preferred. Optimization, not rebuild. |
| 40–59 | Inconsistent. Visible on one platform, invisible on others. |
| 20–39 | Trailing the category. Structural gap in earned media. |
| 0–19 | Functionally invisible. Your competitors own this surface. |
What this means for the next conversation: anything below 40 is the threshold where DIY rarely closes the gap inside two quarters. We will get to why later.
What different audit results look like across platforms
Once you have the rubric numbers, look at the platform breakdown - not the average. The platform mix is where the diagnosis lives. A brand can score 65 overall but be at 85 on Perplexity and 25 on Google AIO, which is a completely different problem from a flat 65 across all three.
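If the scored results are already in a table, the platform breakdown is one groupby away. A minimal pandas sketch with invented scores:

```python
import pandas as pd

# Invented composite scores, one per prompt-platform pair (abridged).
df = pd.DataFrame({
    "platform": ["chatgpt", "perplexity", "google_aio"] * 2,
    "score": [25, 85, 60, 35, 80, 55],
})

by_platform = df.groupby("platform")["score"].mean()
print(by_platform)
# The spread is the diagnosis: a 50-point gap between platforms is a
# different problem from a flat average at the same overall number.
print("spread:", by_platform.max() - by_platform.min())
```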
| Pattern | Likely cause | Fix |
|---|---|---|
| High Perplexity, low ChatGPT | Strong Reddit footprint, weak independent media | Push for editorial coverage and G2/Capterra presence |
| High ChatGPT, low Perplexity | Strong owned content, weak community signal | Earn Reddit and Quora mentions in target subreddits |
| High AI Overviews, low everywhere else | Strong classical SEO, weak entity authority | Build third-party brand mentions and structured data |
| Flat low across all three | Category awareness gap, not a tactical gap | Strategic community plus PR program, 6+ months |
Brands with G2, Capterra, or Trustpilot profiles are roughly three times more likely to be cited in AI answers, and brands mentioned positively across four or more non-affiliated forums are 2.8x more likely to appear in ChatGPT responses (ConvertMate). The pattern in your audit will tell you which of those levers you are missing. What this means for budget: the diagnosis sets the channel mix, not the other way around.
What "good" looks like - realistic benchmark ranges
Treat absolute scores carefully. Categories vary wildly. In crowded SaaS verticals, the top three brands often hit 70+ and the long tail sits below 20 - there is rarely a middle. In services categories (law, finance, healthcare), median scores cluster between 40 and 55 because AI models hedge harder. In consumer categories with strong Reddit communities, well-known DTC brands frequently hit 60+ on Perplexity and 30–40 on ChatGPT.
Two reasons the benchmarks shift fast. First, the share of AI citations attributed to social media climbed past 9% in early 2026, with Reddit driving most of the growth across nine product categories (Otterly Q1 2026). Second, citation source mix is volatile: when Reddit sued Perplexity in October 2025, Perplexity's Reddit citation share dropped 86% almost immediately and YouTube filled the gap - a category leader could lose half its Perplexity surface in six weeks.
Two content-level findings help you rank the fixes: the first 30% of a page accounts for 44.2% of all ChatGPT citations (Search Engine Land), and content updated within the last 30 days receives 3.2x more AI citations. What this gives you: a ranking of which fixes have the highest expected payoff per quarter.
Tools that automate the audit (free and paid)
If you do not want to run sixty prompts by hand, the tooling has matured. Free options: HubSpot's AEO Grader and Knowatoa give you a single-shot diagnostic across the major platforms. Paid options that we see most often in client stacks: Profound, Otterly, Peec AI, and Parse. Otterly starts around $29/month and Peec AI at €89/month, which keeps an ongoing audit cheaper than a single agency hour (Otterly, Visiblie).
Two cautions. First, free tools standardize their prompt sets. The score they give you is for the prompts they test, not the prompts your buyers type. The audit above is more diagnostic than any free tool because the prompt list matches your actual sales motion. Second, paid platforms drift. The "Reddit citations dropped 86%" event would have whipsawed any tool's score chart even though the underlying brand performance did not change. Tools are useful, but they are not a substitute for understanding the methodology.
For a deeper comparison of tools, including how to budget for ongoing tracking, see free AI visibility tracking tools and our measurement framework for AI visibility KPIs. What this means for procurement: start free, validate the methodology, then buy.
How long the audit takes (and how much it costs DIY)
The DIY version costs roughly four hours of a marketing operations analyst's time, plus whatever you pay for ChatGPT and Perplexity subscriptions. Three hours to design the 20-prompt set with input from sales and product. One hour to run all sixty prompt-platform combinations and score them. Add an hour if you do not have shared prompt-tracking spreadsheet templates yet.
The agency version is different in scope, not just polish. A professional AI visibility audit typically covers 50–100 prompts, runs each ten or more times to control for variance, segments share of voice by buyer persona and geography, and ties citation patterns back to the upstream community signals (which subreddits, which review platforms, which press surfaces) that produced them. Pricing for that engagement runs $5,000–$15,000 as a one-time deliverable, or sits inside a monthly retainer at most agencies, including ours.
For most marketing teams, the DIY four-hour version is the right starting point. It tells you whether you have a problem and how big. The agency version makes sense after you have decided to act. What this gives you: a forcing function before you spend retainer dollars.
Who this audit is for (and who should skip it)
Run this audit yourself if any of the following apply: your category has at least three named competitors, your buyers research before purchase, you sell to humans (not just procurement RFPs), and the average deal size is large enough that a single AI-driven referral matters. SaaS, B2B services, DTC consumer, and prosumer tools all qualify. Most categories above $5M ARR qualify by default.
Skip the audit (for now) if you are pre-product-market-fit, if your buyers are 100% paid-search-driven and never research, or if you are in a regulated category where AI tools refuse to recommend specific brands at all (most of regulated finance, legal, and certain healthcare categories - AIs hedge here regardless of how visible you are).
A founder or CMO can run this with a marketing operations analyst. A junior marketer running it alone tends to underweight comparison prompts and oversample branded prompts, which inflates the score. What this means for staffing: assign someone with category context, not just spreadsheet skills.
When the audit results say "hire help"
The threshold where DIY rarely closes the gap is a sustained score below 40 across all three platforms, combined with a source mix in which more than 60% of citations point to content you own. That pattern says the issue is not your website, your blog, or your schema markup. It is that the third-party surfaces AI models trust are not talking about you, and you cannot fix that by editing pages.
The other DIY ceiling: when the audit shows a clear platform gap (high on one, low on another) that maps to a community channel where you have no organic footprint. Reddit and Quora in particular are not channels where in-house teams reliably succeed. Account-warming, subreddit selection, and AutoMod navigation take months of dedicated work, and the failure mode (a permanent ban) is hard to recover from.
If your score is 60+ and your source mix is balanced, you have an optimization problem and should keep it in-house. For more on the tradeoff, see our breakdown of AI visibility in-house vs. agency in 2026. What this gives you: a defensible threshold to put in front of the finance team.
Frequently asked questions
How often should I re-run this audit? Quarterly for the strategic version. Monthly for the trimmed prompt set if you are actively running an AI visibility program. Citation source mixes shift fast - between 40% and 60% of cited sources change month-to-month across Google AI Mode and ChatGPT. Annual is too slow.
Can I just look at brand-search volume instead? No. Brand search correlates 0.334 with AI citations - it is one signal in a wider model. The Princeton GEO study found that adding statistics to a page lifts AI visibility by 41%, citing external sources lifts visibility by 115% for lower-ranked content, and quotation density lifts it by 28% (Princeton/arXiv). None of those show up in brand search.
Should the audit include Claude and Gemini? Add Claude if you sell into developer or technical buyer audiences. Skip Gemini for the first audit unless you are in Google's owned ecosystem (Workspace, Android, certain enterprise contracts) - Reddit accounts for only ~0.1% of Gemini citations, which makes it a different optimization target than ChatGPT or Perplexity.
What if my brand is brand-new and never appears in any prompt? The audit still has value. Score the category prompts to see who AI does mention - that is your competitive set as the AI sees it. Build your roadmap to displace the bottom-quartile competitors first.
Is there a single number I can report to leadership? Yes - the 100-point composite score, broken out by platform. Report the trend, not the absolute. Leadership wants direction more than they want diagnostics.
What to do after the audit
The audit is the start, not the deliverable. The output is a numbered list of fixes ranked by expected lift: which prompts to win first, which platforms to prioritize, which third-party surfaces to invest in, and which timelines are realistic. Most teams should expect 60–90 days for the first measurable score movement and 4–6 months for compound gains, especially when the work involves earned community presence on Reddit or Quora.
If you want a second pair of eyes on your results - or you want the agency-grade version with deeper share-of-voice segmentation, source-pathway analysis, and a 90-day execution plan - that is the conversation we run with every new client. The audit becomes the strategy document.