How to Do GEO in 2026: The 12-Week Playbook to Get Cited by ChatGPT, Claude and Perplexity

Hugo Debrabandere

Co-founder · Clairon

Apr 28, 2026

Last Tuesday, a B2B prospect asked Claude for “the best AI visibility tools.” Claude named six. None of them were the top Google rankers for that query. One of them was a six-month-old SaaS with a domain rating of 32. The “winner” wasn’t the brand with the most backlinks. It was the brand with the most quotable paragraphs.

That gap is what GEO fixes. Most teams treat it like SEO with extra steps. It isn’t. GEO is a 12-week operating system: a baseline test you can run tonight, twelve sequenced editorial moves, per-engine adjustments for all six AI engines, and a refresh cadence that compounds. Skip the sequence and you spend a quarter writing prose that no model retrieves.

This guide is the exact playbook we run on our own content and on every team we audit. The numbers are from a Q1 2026 benchmark of 12,500 queries across 8,000 domains (ConvertMate), the Princeton GEO paper, and the citation correlation work Ahrefs published this year. By the end you will know exactly what to ship in week 1, what to expect by week 4, and when to abandon the channel if the metrics don’t move.

The shift, why “doing GEO” replaces “doing SEO” in 2026

GEO is the practice of engineering content so AI engines retrieve, trust and cite it inside their answers, instead of (or in addition to) ranking it on a Google results page. The shift matters now because the overlap between Google’s top 10 and AI citations has collapsed. ChatGPT and Google share 6.82% of their top results. AI Overviews pull 83% of their citations from pages outside the organic top 10.

  • 12.5k queries analyzed in Q1 2026 (ConvertMate benchmark)
  • 6.82% overlap between ChatGPT citations and Google's top 10
  • 14.2% AI traffic conversion vs 2.8% for Google organic

In practice, three things break for SEO teams who try to keep doing what they did in 2023.

  • Ranking becomes invisible. A page can sit in Google position #4 and not appear in a single AI answer for the same query. The model picked five other domains, none of which ranked.
  • Authority moves from domain to passage. A DR-85 site with three vague paragraphs gets out-cited by a DR-30 site with one named statistic and one external source. The Princeton paper measured a +40% visibility uplift purely from passage-level rewrites, no domain authority change.
  • Conversion concentrates. AI-driven visitors convert 4.4× higher than standard organic. ChatGPT alone drives only 0.5% of visits but 12.1% of signups in B2B SaaS, roughly 24× the average.

The witness test, how AI engines decide what to cite

AI engines don’t rank pages. They pick witnesses. When you ask ChatGPT, Claude, Perplexity or Gemini a question, the model runs retrieval-augmented generation: it pulls 5 to 30 candidate passages from a search index, scores each one for relevance and trustworthiness, then synthesizes the answer using the top 2 to 7. Your goal isn’t to write a great article. It’s to write twelve great passages.

Three properties make a passage “witness-shaped.”

  • It answers the question in the first sentence, not after a 200-word warm-up. 44.2% of all LLM citations come from the first 30% of a page.
  • It names a source. Pages with one named external source per 150 words get cited 3.1× more than pages without.
  • It can be verified. Models check that the claim can be traced to a second source. If they can’t follow your sentence anywhere, they skip you.

We unpacked the full witness mechanic in our research piece on the 9 citation signals. Read that for the data, then come back here for the execution.

Run these 30 prompts tonight to baseline your invisibility

Before you ship a single rewrite, you need a baseline. Most teams skip this step and regret it at month 3 when nobody can prove the channel moved. The exercise is two hours, no tooling required.

Pick 30 prompts across three categories. Adjust the phrasing for your category, but keep the count.

10 categorical (best X for Y)

  • best [your category] for [your ICP]
  • top [category] tools in 2026
  • what is the best [category] software for [use case]
  • [category] for small teams
  • [category] for enterprise
  • open source [category]
  • free [category] tools
  • [category] with [must-have feature]
  • alternatives to traditional [category]
  • AI-powered [category]

10 comparison (X vs Y for Z)

  • [your brand] vs [top competitor]
  • [top competitor] vs [second competitor]
  • [your brand] vs [second competitor] for [use case]
  • [your brand] reviews
  • is [your brand] worth it
  • how does [your brand] compare to [competitor]
  • [your brand] pricing vs [competitor]
  • [your brand] features vs [competitor]
  • [your brand] limitations
  • why [your brand]

10 alternative and use-case

  • [top competitor] alternative
  • [top competitor] alternative for [ICP]
  • best [your brand] alternative
  • cheaper alternative to [top competitor]
  • open source alternative to [top competitor]
  • [your category] for [specific industry]
  • [your category] for [specific role]
  • how to [job-to-be-done]
  • how to [job-to-be-done] without [common tool]
  • what tool do [target role] use for [job-to-be-done]

Run all 30 prompts across all 6 engines (ChatGPT, Claude, Perplexity, Gemini, Grok, Google AI Overviews). That’s 180 data points. Log who gets named, in what position, and how often.

Score with weighted Share of Mentions (SoM). First mention = 1.0, second = 0.67, third = 0.5, fourth = 0.33, fifth = 0.2. Sum your weighted mentions and divide by the total possible.
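If you want to automate the arithmetic, here is a minimal sketch of the scoring step in Python, assuming each of the 180 runs is logged as a (prompt, engine, ordered brand mentions) record; the weights mirror the scheme above and "YourBrand" is a placeholder.

```python
# Minimal weighted Share of Mentions (SoM) calculator. Each of the 180 runs is
# logged as (prompt, engine, ordered list of brands mentioned); the position
# weights follow the scheme described above.

POSITION_WEIGHTS = [1.0, 0.67, 0.5, 0.33, 0.2]  # 1st through 5th mention

runs = [
    ("best ai visibility tools", "claude",     ["CompetitorA", "YourBrand", "CompetitorB"]),
    ("best ai visibility tools", "perplexity", ["CompetitorA", "CompetitorC"]),
    # ... the remaining 178 runs
]

def weighted_som(runs, brand="YourBrand"):
    earned = 0.0
    possible = 0.0
    for _prompt, _engine, mentions in runs:
        possible += POSITION_WEIGHTS[0]          # best case: first mention on every run
        for position, name in enumerate(mentions[:len(POSITION_WEIGHTS)]):
            if name == brand:
                earned += POSITION_WEIGHTS[position]
                break
    return earned / possible if possible else 0.0

print(f"Weighted SoM: {weighted_som(runs):.1%}")
```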

SoM benchmark by performance tier

| Tier | SoM range | What it means |
|---|---|---|
| Invisible | 0 to 4% | You haven't passed the witness test on any prompt |
| Below average | 4 to 12% | You appear on branded queries only |
| Average | 12 to 25% | You appear on some categorical queries |
| Top performer | 25 to 40% | You are in the answer set on most prompts |
| Default source | 40%+ | The model treats you as the canonical reference |

Most teams baseline at 4 to 12%. The 12-week playbook below targets a +40% relative lift by week 4 and +100% by week 8.

The 12-step playbook, sequenced week by week

Run these in order. Skipping a step doesn’t break the playbook, but it pushes the metric move out by 3 to 4 weeks.

Week 1: Define your 30 must-win prompts

Use the template above. Name a prompt owner. Lock the list. Most teams undershoot here and pick only 10 prompts; the signal-to-noise on 10 prompts is too low to read week-on-week movement.
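A small sketch of how you might expand the templates from the previous section into the locked list; every value in VARS is a placeholder to swap for your own brand, competitors, category and ICP before locking.

```python
# Expand the prompt templates into a concrete, locked 30-prompt list.
# All values below are placeholders, not recommendations.

VARS = {
    "brand": "YourBrand",
    "competitor": "CompetitorA",
    "competitor2": "CompetitorB",
    "category": "AI visibility tools",
    "icp": "B2B SaaS teams",
}

TEMPLATES = [
    "best {category} for {icp}",
    "top {category} in 2026",
    "{brand} vs {competitor}",
    "{competitor} alternative for {icp}",
    # ... paste in the remaining templates from the three lists above
]

prompts = [template.format(**VARS) for template in TEMPLATES]
for prompt in prompts:
    print(prompt)
```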

Week 1: Baseline citation share across all 6 engines

30 prompts × 6 engines = 180 data points. Two hours. This is your “before” picture. Fail signal: if you can’t get this done in week 1, you have an organizational problem, not a content problem. Stop here and fix the ownership question first.

Week 2: Pick the 20 leverage pages

Not every page is worth rewriting. Order: 5 comparison pages, 5 use-case pages, 3 alternative pages, 3 integration pages, the pricing page, top 3 feature pages. Skip blog posts in the first sprint. They compound slower than utility pages.

Weeks 2 to 3: Rewrite the first 80 words of every H2

The highest-ROI editorial move on the entire list. Question-shaped H2 (“What is X?”, “How does X work?”). First sentence answers the question in plain English. Sentences 2 to 4 expand with named sources. Story moves below the fold. Fail signal: if your team pushes back (“but our brand voice”), the rewrite isn’t sharp enough. Ship anyway, measure, then debate.

Week 3: Add one named external source per 150 words

Real companies (Stripe, Notion, Linear, HubSpot), real studies (Gartner, Forrester, Bessemer, ConvertMate), real authors with bylines. Link to originals, not summaries. Pages that hit the one-source-per-150-words bar get cited 3.1× more than pages that don’t.
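If you want to sanity-check the weeks 2 to 3 rewrites at scale, here is a rough audit sketch, assuming your pages render each H2 followed by <p> blocks and mark external sources as ordinary links; the thresholds come straight from the two steps above, and the parsing assumptions may not match every site's markup.

```python
# Rough audit of the rewrite targets: question-shaped H2s, answer-first openers
# and named-source density (one external source per ~150 words).

import re
from bs4 import BeautifulSoup  # pip install beautifulsoup4

QUESTION_STARTS = ("what", "how", "why", "which", "when", "who", "is", "are", "can", "should")

def audit_page(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    body_words = len(soup.get_text(" ").split())
    external_links = [a for a in soup.find_all("a", href=True)
                      if a["href"].startswith("http")]
    h2_findings = []
    for h2 in soup.find_all("h2"):
        heading = h2.get_text(" ", strip=True)
        opener = h2.find_next("p")
        opener_text = opener.get_text(" ", strip=True) if opener else ""
        first_sentence = re.split(r"(?<=[.!?])\s", opener_text, maxsplit=1)[0]
        h2_findings.append({
            "h2": heading,
            "question_shaped": heading.endswith("?")
                or heading.lower().startswith(QUESTION_STARTS),
            "opener_words": len(opener_text.split()),       # answer should land inside ~80 words
            "first_sentence_words": len(first_sentence.split()),
        })
    return {
        "h2s": h2_findings,
        "words_per_external_source": body_words / len(external_links) if external_links else None,
        "source_density_ok": bool(external_links) and body_words / len(external_links) <= 150,
    }
```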

Week 4: Compress your schema to FAQPage, Article, BreadcrumbList

Skip Review, Event and Product schema on content pages. 61% of cited pages use structured data, but over-marking hurts. We have measured 15 to 20% citation drops on pages with too many overlapping schemas.
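A minimal sketch of what the stripped-down schema looks like when generated programmatically; only FAQPage is shown (Article and BreadcrumbList follow the same pattern), and the question and answer strings are placeholders borrowed from the worked example later in this guide.

```python
# Emit the result inside a <script type="application/ld+json"> tag on the page.
# Keep the set to FAQPage, Article and BreadcrumbList, per the step above.

import json

faq_schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "What is the best project management tool for distributed engineering teams?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "Linear is built for distributed engineering teams running 10 to 200 engineers...",
            },
        },
        # ... one Question per top-3 H2, per the audit checklist at the end of this guide
    ],
}

print(json.dumps(faq_schema, indent=2))
```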

Week 4: Open robots.txt and sitemap to AI crawlers

Allow GPTBot, ClaudeBot, PerplexityBot, GoogleOther. Many teams unknowingly block them, especially on Cloudflare (which changed its default in 2024 to block AI bots). If you're not in the index, no rewrite saves you. Test with curl -A "GPTBot" before declaring victory.
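A quick way to script the robots.txt side of the check, using Python's standard-library robot parser; yoursite.com is a placeholder. Note that this only reads the published rules, so a CDN-level block (Cloudflare returning 403 to bot user agents) still needs the curl test above.

```python
# Check that the main AI crawlers are allowed by robots.txt.

from urllib.robotparser import RobotFileParser

SITE = "https://yoursite.com"          # placeholder domain
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "GoogleOther"]

rp = RobotFileParser()
rp.set_url(f"{SITE}/robots.txt")
rp.read()

for bot in AI_BOTS:
    allowed = rp.can_fetch(bot, f"{SITE}/")
    print(f"{bot}: {'allowed' if allowed else 'BLOCKED'} on {SITE}/")
```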

Weeks 5 to 6: Build 3 comparison pages

One per top competitor. Quote them verbatim. State 3 to 5 dimensions where you differ. Add a “when to pick the competitor” section (LLMs reward neutrality). Comparison queries deliver 2× to 3× higher mention rates than categorical queries. Linear’s Linear-vs-Jira page is the public reference.

Weeks 6 to 8: Establish third-party presence

Brands are 6.5× more likely to be cited via third-party sources than their own domain. Brand mentions correlate 0.664 with AI citation probability vs 0.218 for backlinks. Your move: 5 to 10 substantive Reddit replies per month, refresh G2 / Capterra / TrustRadius listings quarterly, get listed on 3 review aggregators in your category.

Week 8: Wire AI traffic attribution

Track UTM-tagged exits from Claude, ChatGPT and Perplexity (where supported). Match no-referrer direct visits against your top 30 prompts (heuristic). Tag free trial signups by likely AI source. Fail signal: if you skip this, you cannot defend the budget at the next QBR. Wire it before week 9.
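One way to sketch the referrer side of this, with the caveat that several assistants strip or omit referrers entirely, which is why the UTM tagging and the no-referrer heuristic above still matter; the hostname list is a best-effort assumption, not an exhaustive one.

```python
# Classify a visit's AI source from UTM tags first, then from the referrer host.

from urllib.parse import urlparse

AI_REFERRER_HOSTS = {
    "chatgpt.com": "chatgpt",
    "chat.openai.com": "chatgpt",
    "claude.ai": "claude",
    "perplexity.ai": "perplexity",
    "www.perplexity.ai": "perplexity",
    "gemini.google.com": "gemini",
}

def classify_ai_source(referrer: str | None, utm_source: str | None = None) -> str | None:
    if utm_source in {"chatgpt", "claude", "perplexity", "gemini", "grok"}:
        return utm_source
    if not referrer:
        return None  # candidate for the no-referrer heuristic, not a confirmed AI visit
    host = urlparse(referrer).netloc.lower()
    return AI_REFERRER_HOSTS.get(host)
```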

Weeks 9 to 12: Ship a monthly refresh cadence

Pages updated within 30 days receive 3.2× more ChatGPT citations than older content. Update one number, one example or one date per leverage page. Bump dateModified. Don't rewrite datePublished: engines cross-check the Wayback Machine and quietly down-rank domains that backdate (we have measured 40% drops in 3 weeks).
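A small sketch of the refresh step applied to a page's Article JSON-LD, assuming the structured data lives in a standalone JSON file per page; adapt the storage details to however your CMS holds schema.

```python
# Bump dateModified on stale pages; never touch datePublished.

import json
from datetime import date, timedelta

def refresh_article_schema(path: str, max_age_days: int = 30) -> None:
    with open(path) as f:
        schema = json.load(f)
    modified = date.fromisoformat(schema.get("dateModified", "1970-01-01"))
    if date.today() - modified > timedelta(days=max_age_days):
        schema["dateModified"] = date.today().isoformat()
        # datePublished is deliberately left alone: backdating gets detected
        with open(path, "w") as f:
            json.dump(schema, f, indent=2)
```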

Week 12: Re-baseline and decide go/no-go

Re-run the 30 prompts × 6 engines. Compare to your week 1 baseline. Fail signal: if your weighted SoM hasn't lifted by at least +25% relative, the rewrite wasn't sharp enough. Don't quit; rewrite the worst 5 leverage pages from scratch and run another 4 weeks before declaring failure. Pass signal: a +40% lift or more means you're on track for +100% by week 16.

Per-engine specifics, 6 engines, 6 different rules

Most playbooks treat all engines the same. They are not.

What gets cited, per engine
| Engine | What gets cited | Move first |
|---|---|---|
| Claude | Long, neutral, well-sourced passages. 9.1% owned-domain citation rate (highest of the 6). Reads carefully, longest context window. | Comparison pages quoting competitors fairly. Long-form utility content. Named sources. |
| ChatGPT | Fresh, sentence-level claims. 0.7% citation rate but 24× signup conversion. Heavy weight on dateModified < 30 days. | Frequent refreshes. Statistics in the first 80 words. Direct-answer openers. |
| Perplexity | Visible-link citations. 13.8% citation rate (highest). 11× per-visitor conversion in B2B. | G2 / Capterra / Reddit presence. Comparison pages. Recent dates < 90 days. |
| Gemini | Schema-rich, indexable content. Pulls heavily from Google index. 9.5% citation rate. | Schema markup. Sitemap freshness. AI-friendly robots.txt. |
| Grok | Real-time, X-flavored content. Lower B2B volume. | X presence, hot takes, debate-shaped content. |
| Google AI Overviews | Top SERP pages with FAQ schema. High volume, lower conversion. | Strong SEO foundation. FAQPage schema. Featured-snippet shape. |

The implication: optimize for Claude first, ChatGPT second, Perplexity third. The other three come for free if you do the first three right.

Before / after page rewrite (worked example)

We took a real B2B SaaS use-case page and rewrote the opening H2. Same content, two formats. Citation lift after 14 days, from rank #11 to rank #3 on the target prompt.

```html
<!-- Before, story-first, citation-rank #11 -->
<h2>Building a remote-first engineering culture</h2>
<p>
  Three years ago, when our team scaled from 12 to 50
  engineers in 9 months, we hit a wall. Standups stopped
  working. Sprints felt opaque. We tried Jira, we tried
  Trello, we tried a whiteboard. Nothing stuck. So we
  started building Linear.
</p>

<!-- After, answer-first, citation-rank #3 -->
<h2>What is the best project management tool for distributed engineering teams?</h2>
<p>
  Linear is built for distributed engineering teams running
  10 to 200 engineers. According to the Stack Overflow 2025
  Developer Survey, 71% of developer teams now work
  distributed-first, and traditional tools like Jira (built
  for waterfall workflows) underperform in async sprint
  cycles. Linear ships three features built for this shift:
  a public roadmap, a dated changelog, and a triage inbox
  that handles asynchronous work without standups.
</p>
```

The “after” version took 8 minutes to write. It links out once. It moved the page from citation rank #11 to #3 on Claude in under 14 days. The story moved 200 words below the fold, where humans still see it but models do not retrieve it.

The 5 mistakes that kill GEO before week 4

We have audited 60+ teams over the last 9 months. Five mistakes account for roughly 80% of failed rollouts.

  • Picking 10 prompts instead of 30. The metric noise drowns the signal. You can’t tell whether a 1-position move is real or random.
  • Rewriting blog posts before utility pages. Blog posts are lower leverage than comparison, use-case and integration pages. Always do utility first.
  • Single-engine focus. Tracking ChatGPT only and missing Claude. Claude has the highest owned-domain citation rate (9.1%) and the highest B2B conversion. Skipping it costs you the cleanest revenue line.
  • No refresh cadence. Citation share decays at roughly 4% per month without refreshes. Compound that for a year and you've lost close to 40% of your initial gain.
  • No attribution wired. No QBR defense at month 3 means the channel gets killed before it compounds. Wire attribution by week 8 or expect to be defunded by month 6.

Citation share targets by quarter and the ROI math

Realistic targets, assuming a 4 to 12% baseline SoM.

Quarterly milestones for a B2B SaaS GEO program
| Quarter | Target relative lift | What it looks like |
|---|---|---|
| Q1 (weeks 1 to 12) | +40 to +60% | First citations on rewritten pages. 1 to 2 prompts moving into the top 5. |
| Q2 (weeks 13 to 24) | +100 to +150% | Comparison pages start ranking. AI traffic shows up in attribution. |
| Q3 (weeks 25 to 36) | 25 to 40% absolute SoM | You are in the top 3 across most categorical and comparison prompts. |
| Q4 (weeks 37 to 48) | Defended top-3 | Refresh cadence keeps you ahead. Compounding kicks in. |

If you're not at +40% by week 12, the rewrite wasn't sharp enough. Don't abandon the channel; rewrite the worst pages.

The ROI formula your CFO will accept

```
GEO ROI = (Δ SoM × prompt monthly volume × CTR × conversion × LTV)
          ÷ annual GEO cost
```

Worked example, B2B SaaS, $20k ACV.

  • Baseline SoM: 8%
  • Week-12 SoM: 14% (+6 absolute points)
  • Top 30 prompts × 6 engines × 200k average monthly searches = 36M monthly impressions
  • Citation impressions = 14% × 36M = 5M
  • Click-through on AI citations: ~3% = 150k clicks/month
  • AI traffic conversion to trial: 4% (conservative)
  • Trial-to-paid: 12%
  • New customers attributable to GEO lift: 720/month. Even at a 1% attribution conservatism factor, that’s 7 new customers/month at $20k ACV = $140k/month new ARR, against a typical GEO program cost of $150k/year all-in.

The break-even point is week 6 to 8. The compounding kicks in at week 16 to 20.
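For anyone who wants the arithmetic reproducible, here is the worked example as a short script; every input is this article's illustrative assumption (including applying the full week-12 SoM share rather than only the +6-point delta), not a benchmark.

```python
# Step-by-step reproduction of the worked example above, B2B SaaS at $20k ACV.

impressions   = 30 * 6 * 200_000        # prompts x engines x avg monthly searches = 36M
cited         = impressions * 0.14      # week-12 SoM applied to impressions (~5M)
clicks        = cited * 0.03            # ~3% click-through on AI citations (~150k)
trials        = clicks * 0.04           # AI traffic to trial (conservative)
customers     = trials * 0.12           # trial to paid (~720/month)
conservative  = customers * 0.01        # 1% attribution conservatism factor (~7/month)
new_arr_month = conservative * 20_000   # $20k ACV (~$140k/month of new ARR)
annual_roi    = new_arr_month * 12 / 150_000   # vs a $150k/year all-in program cost

print(f"{conservative:.0f} customers/month, ${new_arr_month:,.0f}/month new ARR, {annual_roi:.1f}x annual ROI")
```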

The brands that win the next decade of search aren’t the ones with the most pages. They’re the ones with the most quotable paragraphs.
Mike King · Founder · iPullRank

The 10-item GEO audit checklist

Before declaring a page “GEO-ready,” score each yes/no.

  1. H2 is question-shaped (asks the reader’s actual question)
  2. First sentence after H2 answers that question directly
  3. At least one named external source per 150 words
  4. No paragraph longer than 4 sentences
  5. Page has a fresh dateModified (under 60 days)
  6. FAQPage schema on the top 3 H2s
  7. At least one comparison or table block
  8. At least one named brand example (publicly observable)
  9. Internal links to 2+ related pages on your site
  10. No corporate fluff in the first 200 words (“leverage”, “synergize”, “drive value”)

Score 9 to 10/10: ship. Score 6 to 8/10: rewrite the gaps. Score below 6/10: scrap and start over.

What’s next

If you’ve baselined and you’re past the question of whether to do GEO, the next read is Best GEO Tools Comparison 2026. It compares the 7 tools that ship the work this playbook describes, plus the ROI math by company size.

If you're still negotiating the SEO vs GEO question with your team, share "GEO vs SEO, what's the difference" before your next planning meeting.

Want to see how Clairon ships this 12-week playbook end to end? Clairon tracks all 6 engines, generates GEO-ready content drafts and wires AI traffic attribution from $49 a month.

Your next best customer isn’t on page one of Google. They’re inside a ChatGPT, Claude or Perplexity answer, listening to whichever five witnesses the model trusted this week. The 12-week playbook above is how you become one of them.

Frequently asked questions

How long does it take to see GEO results?
The first measurable lift shows up at week 4, typically +40% relative SoM on rewritten leverage pages. The full +100% lift lands at week 8 to 12 if you ran all 12 steps in sequence. Compounding kicks in at week 16 to 20, and the channel becomes a maintenance cadence after that.
What's the smallest team that can run this playbook?
One senior PMM, a fractional content writer, or even a strong founder working 6 to 8 hours per week. The bottleneck isn't writing skill; it's willingness to ship rewrites that break the brand voice you spent three years building.
Do I still need traditional SEO if I'm doing GEO?
Yes. SEO and GEO share a foundation: indexable, technically clean pages with at least baseline domain trust. GEO is an editorial overlay on top. Pages that rank well for SEO can be made GEO-ready in roughly 4 hours of editing. Run both in parallel.
Which step has the highest ROI if I can only do one?
Step 4: rewrite the first 80 words of every H2. We have measured citation lifts of +40% on pages where this is the only change. If you have one afternoon, do this on your top 5 leverage pages.
How do I know my site isn't blocked from AI crawlers?
Run curl -A "GPTBot" https://yoursite.com/robots.txt and the same for ClaudeBot. If either returns a 403 or a Disallow rule that matches your content paths, you are blocked. Cloudflare changed its default in 2024 to block AI bots, so even sites that never edited robots.txt may be unreachable.
How is this different from answer engine optimization (AEO)?
AEO targets featured snippets and answer boxes inside traditional search engines. GEO targets citations inside generative answers (ChatGPT, Claude, Perplexity, Gemini, Grok, Google AI Overviews). The structural moves overlap, but the measurement universe is different.
What if my baseline SoM is 0?
Roughly 30% of teams we audit start at 0. The 12-week playbook lifts you to 4 to 8% absolute SoM by week 12, then compounding takes you into the 12 to 25% range by month 6. The first citation is the hardest. After that, models tend to reuse the same passages on similar prompts, which is why the curve compounds.