Competitor Analysis in AI Search: A 30-Minute Baseline Before You Buy Any Tool

Hugo Debrabandere

Co-founder · Clairon

Apr 28, 2026

A VP Marketing at a $25M ARR analytics SaaS pinged me last Thursday: "Our competitor is showing up everywhere in ChatGPT, we are nowhere." First question I asked: how do you know? She did not. She had run two prompts in her browser on a Tuesday morning, seen Looker named twice, and forwarded the screenshots to her CEO. Two prompts, run once each, are a single snapshot of a system whose answers change 70% of the time between runs. The screenshots were not data; they were anxiety with a logo on top.

Most teams skip the baseline and go straight to the tool purchase. Wrong order. The 30-minute manual baseline below tells you whether your citation gap is real or imagined, which engines actually matter for your category, and where the competitor is earning citations. Once that is clear, you can buy a tool with a specific job description.

  • 6.5% ChatGPT–Google top-10 citation overlap (BrightEdge, 2026)
  • 30% of brands stay visible between two consecutive ChatGPT runs of the same prompt (AirOps, 2026)
  • 85% of brand mentions in LLM answers come from third-party pages, not owned domains (AirOps, 2026)

Why competitor analysis in AI search is not SEO competitor analysis

Most SEO competitor frameworks shipped between 2010 and 2024 assume one thing: that the engines are reading roughly the same web. In 2026, they are not. Each engine pulls from a different retrieval pool with a different refresh cadence and a different weighting on third-party authority versus owned content.

Engine retrieval pool, refresh cadence, citation overlap with Google top 10

| Engine | Refresh cadence | Google top-10 overlap |
| --- | --- | --- |
| ChatGPT | 12 to 16 weeks | 6.5% (BrightEdge, 2026) |
| Claude | Quarterly+ (slowest) | Limited public data |
| Perplexity | 4 to 6 weeks (fastest) | 43.5% (Ahrefs, 2025) |
| Gemini | 8 to 12 weeks | Mid-range |
| Copilot | Inherits Bing, ~weekly | Bing-correlated |
| Google AI Overviews | 8 to 10 weeks | ~17% (citations from outside top 10 ~83%) |

upGrowth’s 2026 framework quantifies the consequence: 83% of Google AI Overviews citations come from pages outside the top 10 organic results. Translation: a competitor who outranks you on Google can still be invisible on AI Overviews, and a competitor invisible on Google can dominate a Perplexity answer. SEO position alone tells you almost nothing about citation share.

Pick 3 competitors and 3 tier buckets

Three real competitors. No more. Beyond three, the prompt grid becomes unworkable inside the 30-minute window, and the tier matrix loses its forcing function. Pick competitors that represent three structural threats: the incumbent leader, the fastest-growing peer, and the AI-native challenger.

Three tier buckets to organize the prompts:

  • Tier A: named head-to-head. "HubSpot vs Salesforce." Forces the model to rank. Highest commercial signal, easiest to score.
  • Tier B: category recommendation. "Best CRM for a 50-person B2B SaaS." Your name may not appear in the prompt. Tests whether you make the shortlist at all.
  • Tier C: problem-led. "How do I stop losing deals at the proposal stage?" Long-tail. Often where small brands displace incumbents because the model has weaker priors.

Reference matrix (use these to sanity-check the method)

Plug your own competitors in, but verify the methodology in real time using one of these publicly observable head-to-heads first. Run any prompt in the next section and you should see results consistent with what we describe.

Verifiable head-to-head examples

| Vertical | Brand 1 | Brand 2 | Brand 3 |
| --- | --- | --- | --- |
| Marketing / CRM | HubSpot | Salesforce | Pipedrive |
| Productivity | Notion | Linear | ClickUp |
| Payments | Stripe | Adyen | Checkout.com |
| Observability | Datadog | New Relic | Grafana |
| Analytics SaaS | Looker | Tableau | Mode |

The 30-prompt grid (verbatim)

Three tiers × ten prompts each. The phrasings below are the shapes that consistently force comparable answers across all six engines. Replace the bracketed placeholders with your actual competitors and category.

Tier A: direct head-to-head (10 prompts)

Forces the engines to rank. Use these verbatim:

  1. Compare [Brand 1] vs [Brand 2] vs [Brand 3] for a [team size] [vertical]. Rank them in order and give one paragraph per tool with sources.
  2. [Brand 1] vs [Brand 2] vs [Brand 3] for [primary use case]. Which would you recommend and why? Cite specific reviews.
  3. Score [Brand 1], [Brand 2] and [Brand 3] on [criterion 1], [criterion 2] and [criterion 3]. Use a 1-to-10 scale.
  4. For a [team size] team running [workflow], compare the top 3 [category] tools head-to-head.
  5. Which is better, [Brand 1] or [Brand 2], for [specific workflow]? Be specific.
  6. List the top 3 [category] tools in 2026 and rank them by [your differentiator].
  7. [Brand 1] vs [Brand 2]: which one has better [specific feature]?
  8. Compare pricing and total cost of ownership for [Brand 1], [Brand 2] and [Brand 3] over 24 months for a [team size] team.
  9. Which [category] tool is the best fit for [vertical], [Brand 1], [Brand 2] or [Brand 3]?
  10. Rank [Brand 1], [Brand 2] and [Brand 3] for [specific integration] support.

Tier B: category recommendation (10 prompts)

Your name may not appear. Tests whether you make the shortlist.

  1. What are the top 3 [category] tools for a [team size] [vertical] in 2026? List with one-line reasons and sources.
  2. Best [category] for a remote-first startup, 2026. Top 3 only.
  3. Best [category] tool with [specific capability], 2026.
  4. Recommend a [category] platform for a $[revenue] [vertical].
  5. What [category] tools have the highest customer satisfaction in 2026?
  6. Best mid-market [category] tools under $[price] per month.
  7. Top 5 [category] platforms for [specific use case].
  8. Which [category] tools have the best [feature] in 2026?
  9. Affordable [category] alternatives to [market leader].
  10. Best [category] tool for [specific integration].

Tier C: problem-led (10 prompts)

Long-tail, where small brands displace incumbents:

  1. Our [team] keeps [problem]. What software stack would you recommend? Name specific products.
  2. How do I [specific job]? Recommend specific tools.
  3. We are [team size] and our [workflow] is a mess across [N] tools. What is the right consolidation?
  4. What is the best way to [specific outcome] for a [vertical]?
  5. How do I improve [metric] using [category] software?
  6. What tool helps with [specific pain point]?
  7. Recommend a workflow to [specific goal] using [category].
  8. Best practices to [outcome], with named tools.
  9. I am switching from [legacy tool], what is the modern replacement?
  10. How do I scale [process] from 10 to 100 [units]? Name the stack.
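If you would rather not hand-edit 30 prompt strings, the grid is easy to generate from the bracketed placeholders. A minimal sketch; the placeholder values below (taken from the analytics row of the reference matrix) are examples only, and just two of the 30 templates are shown:

```python
# Minimal sketch: expand prompt templates by substituting the bracketed placeholders.
TEMPLATES = {
    "A-01": "Compare [Brand 1] vs [Brand 2] vs [Brand 3] for a [team size] [vertical]. "
            "Rank them in order and give one paragraph per tool with sources.",
    "B-01": "What are the top 3 [category] tools for a [team size] [vertical] in 2026? "
            "List with one-line reasons and sources.",
    # ...the remaining 28 prompts follow the same pattern
}

PLACEHOLDERS = {  # example values only; swap in your own brands and category
    "[Brand 1]": "Looker",
    "[Brand 2]": "Tableau",
    "[Brand 3]": "Mode",
    "[category]": "analytics",
    "[team size]": "50-person",
    "[vertical]": "B2B SaaS",
}

def fill(template: str, values: dict[str, str]) -> str:
    # Replace each bracketed placeholder with its concrete value.
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template

for prompt_id, template in TEMPLATES.items():
    print(f"{prompt_id}: {fill(template, PLACEHOLDERS)}")
```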

Run it across 6 engines, two runs each

Open one fresh incognito window per engine. No logged-in personalization. Run each prompt twice, ideally spaced across 48 hours, because LLMs are stochastic and a single run is one observation of a system with 70% answer churn.

Open six incognito tabs

ChatGPT (with web search on), Claude.ai (with search), Perplexity, Gemini, Copilot, Google (for AI Overviews). Log out of any account that personalizes results.

Paste each prompt verbatim, twice per engine

Run 1, log result. Wait 24 to 48 hours. Run 2, log result. Two runs is the minimum for stochastic systems, three is better. Screenshot every answer.

For each answer, log five fields

Brand cited Y/N. Position when cited (1st named / mid-list / footnote). Sentiment (positive / neutral / cautionary). Cited URL. Source domain (own site / G2 / Reddit / LinkedIn / news / blog).

Repeat for all 30 prompts × 6 engines × 2 runs

Total: 360 cells. Time budget: 90 minutes if you do not screenshot, 120 if you do. Screenshots help in the QBR review later; do them.

The competitor citation share scorecard

Copy this template into a Google Sheet. One row per cell (prompt × engine × run). At 360 rows the patterns become legible.

Scorecard fields, one row per cell

| Field | Example value | Why it matters |
| --- | --- | --- |
| Date run | 2026-04-28 | LLMs drift. Date-stamp every cell. |
| Engine | ChatGPT-5 | Cross-engine SOM requires per-engine attribution. |
| Prompt ID | A-01 (Tier A, prompt 1) | Lets you re-run identical prompts on a quarterly cadence. |
| Tier | A / B / C | Tier C is where SMBs steal citations from incumbents. |
| Run | 1 of 2 | Stochastic; only 30% inter-run consistency (AirOps). |
| Your brand cited? | Y / N | Binary base. |
| Position | 1st named / mid / footnote | Position 1 gets 1.5 to 2× more consideration than position 3. |
| Sentiment | Positive / Neutral / Cautionary | Cautionary mentions hurt; separate them from wins. |
| Cited domain (your brand) | g2.com, yourbrand.com, reddit.com/r/SaaS | Tells you which third party is doing the lifting. |
| Comp 1 cited? | Y / N | Same fields per competitor. |
| Comp 1 position + cited domain | 1st named / hubspot.com/blog/x | The third-party asset you need to displace. |
| Citation gap (pp) | (their SOM) − (your SOM) per engine | The weekly KPI. |
| Recovery hypothesis | Add 120-word answer block to /pricing | Tie every gap row to one editable paragraph. |
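If you prefer to seed the sheet from a script before pasting it into Google Sheets, here is a minimal sketch of the row schema. The column names mirror the table above; the filename and helper function are illustrative assumptions, not part of the playbook.

```python
import csv
import os

# Column names mirror the scorecard fields above (illustrative schema).
FIELDS = [
    "date_run", "engine", "prompt_id", "tier", "run",
    "your_brand_cited", "position", "sentiment", "cited_domain",
    "comp1_cited", "comp1_position", "comp1_cited_domain",
    "citation_gap_pp", "recovery_hypothesis",
]

def append_cell(path: str, row: dict) -> None:
    """Append one cell (prompt x engine x run) to the scorecard CSV."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:           # write the header row on first use
            writer.writeheader()
        writer.writerow(row)

append_cell("scorecard.csv", {
    "date_run": "2026-04-28", "engine": "ChatGPT", "prompt_id": "A-01",
    "tier": "A", "run": 1, "your_brand_cited": "N", "position": "",
    "sentiment": "", "cited_domain": "", "comp1_cited": "Y",
    "comp1_position": "1st named", "comp1_cited_domain": "g2.com",
    "citation_gap_pp": "", "recovery_hypothesis": "",
})
```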

Compute share-of-model and the gap

The two derived numbers that close the loop. Compute both per engine, then average across engines for a cross-engine SOM.

```python
# Per engine, per brand: share of model (SOM), in percent
def som_engine(cells_with_brand: int, total_cells_on_engine: int) -> float:
    return cells_with_brand / total_cells_on_engine * 100

# Cross-engine SOM: average of the per-engine values
def som_cross(som_by_engine: dict[str, float]) -> float:
    return sum(som_by_engine.values()) / len(som_by_engine)

# Citation gap to the leading competitor, in percentage points
def citation_gap(som_leader_competitor: float, som_yours: float) -> float:
    return som_leader_competitor - som_yours

# Action threshold
def action(gap_pp: float, engines_with_gap: int) -> str:
    if gap_pp >= 10 and engines_with_gap >= 4:
        return "automate the loop, buy a cross-engine tool"
    if gap_pp >= 10:
        return "engine-specific content sprint"
    return "ship 2-3 paragraph rewrites, re-baseline in 4 weeks"
```

B2B SaaS leadership threshold is roughly 25%+ cross-engine SOM (upGrowth, 2026). Most B2B SaaS sit at 15 to 20%. If you are below 10% cross-engine and your top competitor is above 25%, you have a real but addressable gap. The recovery target is the 10-point band; the recovery horizon is two quarters.
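To run the same computation over the finished 360-row sheet, here is a minimal sketch assuming the CSV schema from the scorecard section (the column names and filename are ours, not a prescribed format):

```python
import csv
from collections import defaultdict

def som_by_engine(path: str, cited_column: str) -> dict[str, float]:
    """Share of model per engine: % of cells on that engine where the brand is cited."""
    cited, total = defaultdict(int), defaultdict(int)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            total[row["engine"]] += 1
            if row[cited_column].strip().upper() == "Y":
                cited[row["engine"]] += 1
    return {engine: cited[engine] / total[engine] * 100 for engine in total}

yours = som_by_engine("scorecard.csv", "your_brand_cited")
theirs = som_by_engine("scorecard.csv", "comp1_cited")

gaps = {engine: theirs[engine] - yours[engine] for engine in yours}
cross_engine_gap = sum(gaps.values()) / len(gaps)
engines_with_gap = sum(1 for g in gaps.values() if g >= 10)
print(f"Cross-engine gap: {cross_engine_gap:.1f} pp; engines with a 10+ pp gap: {engines_with_gap}")
```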

Tag the citation source for every win

The source domain column is the most actionable column in the sheet. Cluster wins by source type:

  • Owned domain wins (yourbrand.com cited): your content engine is doing the work. Double down on the page shapes that win.
  • G2 / Capterra wins: review platform momentum. If competitors win here and you do not, you have a review volume problem.
  • Reddit / Quora wins: watch the date. Post-September 2025, ChatGPT deweighted Reddit aggressively. Reddit wins observed pre-Q4 2025 may already be decaying.
  • LinkedIn / Forbes / news wins: rising. The slice that absorbed the redistributed Reddit / Wikipedia citation share. Invest here in 2026.
  • Comparison-blog wins (third-party listicles): invest in placement on the listicles your competitor wins on. Eighty-five percent of brand mentions come from third-party pages (AirOps); this is the cheapest displacement vector.
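Tagging 360 rows by hand invites drift, so a small classifier keeps the buckets consistent. A minimal sketch; the domain lists are illustrative and should be extended with whatever third-party sites actually show up in your category:

```python
from urllib.parse import urlparse

# Illustrative buckets only; extend with the domains that appear in your own sheet.
SOURCE_TYPES = {
    "owned": {"yourbrand.com"},
    "review": {"g2.com", "capterra.com"},
    "community": {"reddit.com", "quora.com"},
    "news_professional": {"linkedin.com", "forbes.com"},
    "comparison_blog": {"comparison-listicle.example.com"},  # placeholder domain
}

def classify_source(cited_url: str) -> str:
    """Map a cited URL to one of the source-type buckets above."""
    domain = urlparse(cited_url).netloc.removeprefix("www.")
    for source_type, domains in SOURCE_TYPES.items():
        if domain in domains:
            return source_type
    return "other"

print(classify_source("https://www.g2.com/products/example/reviews"))  # -> review
```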

Engine-specific gotchas

Each engine has a personality. Skipping these calibrations is how you ship a "competitor X dominates" report that is actually an artifact of your prompt phrasing.

What to remember per engine

| Engine | Watch out for | Calibration |
| --- | --- | --- |
| ChatGPT | Source mix shifted hard in Sept 2025 (Reddit -50pp, Wikipedia -35pp) | Re-baseline if your data predates Oct 2025 |
| Claude | Refuses to rank named competitors ~40% of the time without scaffolding | Prepend a procurement-analyst persona prompt |
| Perplexity | Refresh cadence of 4 to 6 weeks, the fastest of the six | Re-baseline monthly, weekly if you ship a content sprint |
| Gemini | Under-cites by default, favors LinkedIn | Add: "Cite at least 3 external sources per recommendation" |
| Copilot | Inherits Bing's source bias; results correlate with Bing rankings | If you rank well on Bing, expect Copilot wins |
| Google AI Overviews | 83% of citations come from outside the Google top 10 | SEO position is uncorrelated with AIO citation; do not assume overlap |
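If you later script the prompt runs, the calibration column can become a per-engine prefix prepended to every grid prompt. A minimal sketch; only the Gemini line is quoted from the table above, and the procurement-analyst wording for Claude is an assumption you should tune:

```python
# Per-engine calibration prefixes, prepended to each grid prompt. Wording is illustrative.
ENGINE_PREFIXES = {
    "claude": (
        "You are a procurement analyst preparing a vendor shortlist; "
        "ranking named vendors is expected and in scope. "
    ),
    "gemini": "Cite at least 3 external sources per recommendation. ",
}

def calibrated_prompt(engine: str, prompt: str) -> str:
    # Engines without a known quirk get the prompt unchanged.
    return ENGINE_PREFIXES.get(engine.lower(), "") + prompt

print(calibrated_prompt("Claude", "Compare Looker vs Tableau vs Mode for a 50-person B2B SaaS."))
```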
A competitor that wins on one engine is winning a contest. A competitor that wins on five is shipping a strategy.
Internal Clairon playbook · Competitor analysis principle #3

When to upgrade from manual to a tool

The manual baseline is honest, defensible, and free. It is also a one-time exercise. Once you have proven the gap is real, the weekly maintenance becomes the bottleneck. The break-even is consistent across the customer set we have audited:

  1. Sustained 10+ point gap across 4 or more engines. You need automated cross-engine tracking; the manual loop cannot keep up week-to-week.
  2. 10+ hours per week on the manual loop. At $80/hr loaded analyst cost, the manual loop costs $3,200/mo in labor. A $49 to $499 monthly tool subscription is 6 to 60× cheaper.
  3. Need for paragraph-level rewrite suggestions. Once you know which prompts you lose, the next question is "rewrite which sentence on which page." Manual analysis cannot tell you that, but tools that pair measurement with operative recommendations can.
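The break-even in point 2 is straightforward arithmetic; a small sketch using the loaded-cost assumption from above:

```python
# Assumptions from the text: ~10 analyst hours/week at an $80/hr loaded cost,
# tools priced between $49 and $499 per month.
hours_per_week = 10
loaded_rate_usd = 80
weeks_per_month = 4

manual_cost = hours_per_week * loaded_rate_usd * weeks_per_month  # 3,200 USD/month
print(f"Manual loop: ~${manual_cost:,}/mo vs tool subscription: $49 to $499/mo")
```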

Where to go deeper

This article sits inside the GEO measurement cluster. The companion playbooks below cover the full measurement stack, the tooling teardown, and the weekly operator’s loop.

The competitor that worries you most in your weekly Slack is rarely the one that beats you in AI search. Run the baseline. The actual leader is usually the one you forgot to track.

Frequently asked questions

How many prompts do I actually need to baseline three competitors, 10, 30, or 200?
Thirty prompts is the right unit for a manual baseline (3 tiers × 10 prompts each), run twice for stochasticity. Below 30 you cannot detect a tier-specific gap, above 50 the marginal prompt teaches you nothing new for a manual exercise. The 200-prompt set is for the weekly automated loop, not the one-time competitor baseline.
How often should I re-run the competitor baseline?
Cadence varies by engine. Perplexity refreshes its citation pool every 4 to 6 weeks, Google AI Overviews every 8 to 10 weeks, Gemini every 8 to 12 weeks, ChatGPT every 12 to 16 weeks (upGrowth, 2026 framework). Re-baseline against the slowest engine in your priority list. For a B2B SaaS prioritizing ChatGPT and Perplexity, that's roughly quarterly.
Is being mentioned without being cited (linked) actually worth anything?
Less than you think. AirOps measured that brands earning both a mention and a citation in the same answer were 40% more likely to reappear in subsequent runs than brands earning only a mention. A mention without a link is unstable: the model is paraphrasing, the next refresh may drop you. Track mention rate separately from citation rate, and treat mention-only signals as leading indicators rather than wins.
What does it mean when ChatGPT and Perplexity disagree about the top competitor?
It means the engines are reading different webs. Perplexity has a 43.5% top-10 citation overlap with Google search results (Ahrefs 2025), ChatGPT has only 6.5% (BrightEdge 2026). When they disagree, you should believe the engine your buyer uses, not the average. For technical B2B buyers, weight Claude and Perplexity higher. For consumer DTC, weight ChatGPT and Google AI Overviews.
Can I trust a single run of a prompt?
No. AirOps measured that only 30% of brands stayed visible between two consecutive runs of the same prompt, and only 20% across five runs. Always run each prompt at least twice in incognito sessions, ideally three times spaced across 48 hours. A single run is one sample of a stochastic system, the equivalent of a one-question political poll.
My competitor wins on Reddit threads I cannot post in. What now?
Check the date. Between August and September 2025, ChatGPT cut its Reddit citation share from roughly 60% to 10% and its Wikipedia share from 55% to under 20% (Semrush 3-month study, 230k prompts). If your competitor is winning on Reddit, that win is probably already decaying on ChatGPT. Pivot your third-party content investment to LinkedIn, Forbes, and category-specific publications, which are absorbing the redistributed citation share.
How is share-of-model different from share-of-voice, and which should I report to my CEO?
Share of voice is mentions across all media (PR, social, organic). Share of model is mentions inside LLM answers specifically, derived from a fixed prompt set. They are not interchangeable. Report share of model to your CEO when the conversation is about AI search performance, share of voice when the conversation is about brand awareness more broadly. Most articles in 2026 conflate the two, which is why the same brand can show 38% on one tool and 12% on another.
At what gap size is it cheaper to buy a tool than keep tracking manually?
When the manual baseline shows a sustained 10+ percentage-point gap to your top competitor across at least 4 engines, the cost of running the manual loop weekly (8 to 12 hours of analyst time) exceeds the cost of a $49 to $499 monthly subscription. Below a 10-point gap, manual is honest enough. Above it, you need automated cross-engine tracking plus paragraph-level rewrite suggestions, which is the wedge specialists like Clairon, Profound and Peec compete on.