Step 02 · Measure — Score Visibility Across 5 AI Engines


You cannot improve what you don't measure, and AI visibility measurement is where most platforms go wrong. Step 02 of the RevvUp.ai platform scores your brand against the prompt graph from Step 01 across all five major AI engines, with citation rate, mention rate, and source rate broken out as separate metrics. We refresh weekly because AI engines update their indices and synthesis patterns continuously, and monthly snapshots miss meaningful movement. The output is the diagnostic baseline that turns AI visibility from a vibes-based metric into something you can actually run a program against.

What you get out of Measure: a weekly-refreshed visibility score per engine and per prompt that distinguishes between when AI names your brand, when it links to your site, and when it uses you as a source without naming you.

Why most AI visibility scores are wrong

Three measurement mistakes show up in almost every AI visibility platform we've audited:

1. Treating "citation" and "mention" as the same thing. They're not. A citation is when an AI engine names you and links to you. A mention is when AI names you without a link. Research shows roughly 85% of brand appearances in ChatGPT answers are mentions without citations. If you're only counting citations, you see only the remaining ~15% of appearances, which means you're undercounting your true visibility by 5–7×.

2. Reporting one unified "AI score." AI engines behave so differently that averaging across them hides the signal. A brand strong in Perplexity and absent in ChatGPT looks identical to a brand strong in ChatGPT and absent in Perplexity if you collapse the score. They need wildly different fixes.

3. Measuring monthly or quarterly. AI engines refresh their indices weekly or faster. Content decays in days. A monthly cadence catches roughly 25% of the meaningful movement happening in your category.

Measure is built around these three corrections.
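The 5–7× figure in point 1 is simple arithmetic on the 85% share. A minimal Python sketch, assuming that share applies uniformly across all brand appearances; it illustrates the ratio only and is not how Measure computes anything.

```python
def undercount_factor(mention_only_share: float) -> float:
    """How many more brand appearances exist than a citations-only count sees.

    If `mention_only_share` of appearances name the brand without a link,
    a citations-only count captures just the remaining (1 - share).
    """
    if not 0.0 <= mention_only_share < 1.0:
        raise ValueError("share must be in [0, 1)")
    return 1.0 / (1.0 - mention_only_share)


# ~85% mention-only (the research figure above): citations are ~15% of appearances
print(round(undercount_factor(0.85), 1))  # 6.7
# A slightly lower mention-only share lands at the bottom of the 5-7x range
print(round(undercount_factor(0.80), 1))  # 5.0
```

At an 85% share the factor is about 6.7, inside the 5–7× range.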

The three metrics we score, separately

Every prompt in your intent graph gets scored on three separate dimensions per engine:

Citation rate

Definition: % of relevant AI responses that name your brand AND include a clickable link to your domain.

This is the headline metric most platforms report. It matters — citations drive direct traffic and represent the strongest single signal of AI engine trust. But it's the rarest outcome, and over-indexing on it misses the rest of your visibility surface.

Typical Shopify mid-market brand starting baseline: 3–8% across the engines combined. Strong: 15–30%. Best-in-class: 30%+.

Mention rate

Definition: % of relevant AI responses that name your brand, with or without a link.

This is the metric that matters most for revenue. Mentions shape consideration sets. A brand named three times across an AI shopper's research session enters their consideration set even if they never click. That's the upstream brand-building marketers used to pay for with TV — and AI is now delivering for free, to the brands that earn it.

Typical Shopify mid-market starting baseline: 8–18% across engines. Strong: 30–50%. Best-in-class: 50%+.

Source rate

Definition: % of relevant AI responses that use your domain as a source, regardless of whether your brand is named in the answer.

This is the leading indicator. Source-only appearances mean an AI engine trusts your content as a reference but isn't yet endorsing your brand by name. They precede mentions and citations on the same prompts: when your source rate goes up, mention rate typically follows 30–60 days later.

Typical Shopify mid-market starting baseline: 5–12% across engines.
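One way to check the "source rate leads mention rate" pattern on your own data is to correlate the two weekly series at different lags and see which shift lines them up best. A minimal Python sketch: the weekly numbers are made up, and lag correlation is our illustration rather than a description of Measure's internals. At weekly granularity, 30–60 days is roughly 4–9 snapshots, so a real check needs a longer history than the 12 weeks shown here.

```python
def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def best_lag(leading: list[float], lagging: list[float], max_lag: int) -> tuple[int, float]:
    """Return (lag in weeks, correlation) where `leading` best predicts `lagging`.

    At lag k, week t of the leading series is compared with week t + k of the
    lagging series. Keep max_lag small relative to the series length so every
    comparison still has enough overlapping weeks.
    """
    best = (0, float("-inf"))
    for lag in range(max_lag + 1):
        r = pearson(leading[: len(leading) - lag], lagging[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


# Hypothetical weekly rates, in percent, for one prompt on one engine
source_rate = [5, 6, 9, 7, 10, 14, 12, 15, 18, 16, 20, 22]
mention_rate = [4, 4, 5, 5, 6, 8, 9, 12, 10, 13, 17, 15]  # echoes source_rate 5 weeks later

lag, r = best_lag(source_rate, mention_rate, max_lag=6)
print(lag, round(r, 3))  # 5 1.0
```

Real data will never line up this cleanly; the point is that the best-fitting lag, not any single week, is what supports or undercuts the leading-indicator claim.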

Reporting all three separately, per engine, per prompt, is the difference between AI visibility you can act on and AI visibility you can guess at.
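The three definitions can be expressed as simple counts over one engine's responses to one prompt. A minimal Python sketch, assuming each response has already been parsed into whether the brand is named, which domains are linked, and which domains the engine attributes as sources; the `ScoredResponse` schema and `example.com` are hypothetical, not RevvUp.ai's data model.

```python
from dataclasses import dataclass, field


@dataclass
class ScoredResponse:
    """One parsed AI answer (hypothetical schema for illustration)."""
    names_brand: bool                                       # brand named in the answer text
    linked_domains: set[str] = field(default_factory=set)   # clickable links in the answer
    source_domains: set[str] = field(default_factory=set)   # domains the engine used as sources


def visibility_rates(responses: list[ScoredResponse], domain: str) -> dict[str, float]:
    """Citation, mention, and source rate as shares of relevant responses."""
    n = len(responses)
    if n == 0:
        return {"citation_rate": 0.0, "mention_rate": 0.0, "source_rate": 0.0}
    # Citation: brand named AND a clickable link to your domain
    citations = sum(1 for r in responses if r.names_brand and domain in r.linked_domains)
    # Mention: brand named, with or without a link (citations are a subset)
    mentions = sum(1 for r in responses if r.names_brand)
    # Source: your domain used as a source, whether or not the brand is named
    sources = sum(1 for r in responses if domain in r.source_domains)
    return {
        "citation_rate": citations / n,
        "mention_rate": mentions / n,
        "source_rate": sources / n,
    }


responses = [
    ScoredResponse(True, {"example.com"}, {"example.com"}),  # named + linked + sourced
    ScoredResponse(True),                                     # named only (mention)
    ScoredResponse(False, set(), {"example.com"}),            # source-only
    ScoredResponse(False),                                    # absent
]
print(visibility_rates(responses, "example.com"))
# {'citation_rate': 0.25, 'mention_rate': 0.5, 'source_rate': 0.5}
```

Because a citation is also a mention, citation rate can never exceed mention rate; source rate moves independently of both.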

Per-engine measurement (not averaged)

Measure scores you separately against each of the five major engines because they retrieve from genuinely different parts of the web:

Engine	What we measure separately
ChatGPT	Native web mode citations, mentions in synthesis-only responses, Google Merchant Center product carousel surfacing, third-party aggregator mention coverage
Claude	Brand inclusion in long-form recommendations, qualifier language (Claude tends toward measured phrasing, and we capture that nuance), responses drawn from training data vs. real-time retrieval
Perplexity	Numbered citation position, source diversity per response, recency of cited content, sub-vertical authority signals
Copilot	Microsoft Shopping product surfacing, Bing index alignment, Copilot Checkout enrollment status, LinkedIn-sourced content citations
Gemini	AI Overviews citation, AI Mode separate citation, Knowledge Graph entity recognition, Google Business Profile signal weight

Why this matters in practice: a brand might score 45% mention rate in Perplexity and 8% in ChatGPT — a 5× gap that demands totally different fixes. Averaging hides that. Measure surfaces it.
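A minimal Python sketch of why the per-engine view matters: two hypothetical brands with mirrored engine profiles collapse to the same average, while a simple per-engine gap check surfaces the difference. Only the 45%/8% pair comes from the example above; the mirrored second brand and the `engine_gap` helper are illustrative.

```python
def collapsed_score(rates: dict[str, float]) -> float:
    """The single averaged 'AI score' that hides per-engine differences."""
    return sum(rates.values()) / len(rates)


def engine_gap(rates: dict[str, float]) -> tuple[str, str, float]:
    """Strongest engine, weakest engine, and the ratio between them."""
    strongest = max(rates, key=rates.get)
    weakest = min(rates, key=rates.get)
    low = rates[weakest]
    ratio = rates[strongest] / low if low > 0 else float("inf")
    return strongest, weakest, ratio


# Hypothetical mention rates for two brands with mirrored profiles
brand_a = {"Perplexity": 0.45, "ChatGPT": 0.08}
brand_b = {"Perplexity": 0.08, "ChatGPT": 0.45}

print(round(collapsed_score(brand_a), 3), round(collapsed_score(brand_b), 3))  # 0.265 0.265
print(engine_gap(brand_a)[:2], round(engine_gap(brand_a)[2], 1))  # ('Perplexity', 'ChatGPT') 5.6
print(engine_gap(brand_b)[:2])  # ('ChatGPT', 'Perplexity')
```

The averages are identical, yet brand A needs ChatGPT work and brand B needs Perplexity work.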

How weekly refresh works

Most AI visibility platforms refresh monthly or on-demand. Measure runs weekly by default, with continuous refresh available for high-velocity categories.

What that gets you:

Week-over-week movement on every prompt. When a fix ships, you see the impact within 7–14 days, not 30–60. Our Audit and Fix steps depend on this — if the measurement loop is too slow, the optimization loop falls apart.

Decay detection. Content that loses citation priority shows up as a downward trend before competitors take your spot. You can intervene with a freshness update before the loss compounds.

Competitive movement tracking. When a direct competitor's score moves on a prompt you care about, you see it the same week. That's where most platforms are blind — they catch shifts after the share is already lost.

New prompt surfacing. As AI engines surface new prompts in your category (driven by trends, news cycles, viral products), they enter your tracking set within days, not next quarter's report.
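Decay detection comes down to spotting a sustained downward trend in a prompt's weekly score rather than reacting to a single bad week. A minimal Python sketch using an ordinary least-squares slope over recent weeks; the four-week window and the one-point-per-week threshold are hypothetical settings, not Measure's actual rules.

```python
def weekly_slope(scores: list[float]) -> float:
    """Least-squares slope of scores against week index (points per week)."""
    n = len(scores)
    if n < 2:
        return 0.0
    weeks = range(n)
    mean_w = (n - 1) / 2
    mean_s = sum(scores) / n
    num = sum((w - mean_w) * (s - mean_s) for w, s in zip(weeks, scores))
    den = sum((w - mean_w) ** 2 for w in weeks)
    return num / den


def is_decaying(scores: list[float], window: int = 4, threshold: float = -1.0) -> bool:
    """Flag a prompt whose recent score falls by at least |threshold| points per week."""
    return weekly_slope(scores[-window:]) <= threshold


# Hypothetical weekly visibility scores (0-100) for one prompt on one engine
print(is_decaying([30, 31, 30, 28, 26, 24]))  # True: last 4 weeks slide 2 points/week
print(is_decaying([30, 29, 31, 30, 30, 31]))  # False: week-to-week wobble, no trend
```

The same check, run on a competitor's weekly series for a prompt you care about, is one way to surface competitive movement in the week it happens.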

Variance handling

A research note worth flagging: AI engines produce different recommendation lists more than 99% of the time when the same prompt is asked twice. One 2026 study found only 9.2% of cited URLs in Google AI Mode stay consistent across three runs of the same query.

This is the measurement-validity problem most AI visibility tools haven't solved. A single snapshot doesn't survive that variance — your reported score could swing 20+ points just from re-running the same query.

Measure handles this by running each prompt multiple times per engine per week, then reporting:

Median visibility (the stable signal)
Variance (how much the engine's answer shifts when you re-ask)
Trend (week-over-week, controlling for variance)

That's the difference between "you scored 47 last week and 62 this week" (which could easily be variance) and "you scored 47±3 last week and 62±2 this week" (real movement). Most platforms report the first version. Measure reports the second.
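A minimal Python sketch of the median-plus-spread idea, using the median absolute deviation (MAD) as the spread and a simple "the gap must exceed twice the combined spread" rule for calling movement real. The run scores, the MAD choice, and the factor of 2 are illustrative assumptions; this is not Measure's actual statistical test.

```python
import statistics


def summarize_runs(scores: list[float]) -> tuple[float, float]:
    """Median score across repeated runs, plus median absolute deviation (spread)."""
    med = statistics.median(scores)
    mad = statistics.median([abs(s - med) for s in scores])
    return med, mad


def is_real_movement(last_week: list[float], this_week: list[float], k: float = 2.0) -> bool:
    """Treat a week-over-week change as real only if it clears k times the combined spread."""
    m1, s1 = summarize_runs(last_week)
    m2, s2 = summarize_runs(this_week)
    return abs(m2 - m1) > k * (s1 + s2)


# Five runs per week of the same prompt on the same engine (hypothetical scores)
stable_last, stable_this = [44, 47, 50, 44, 50], [60, 62, 64, 62, 60]
noisy_last, noisy_this = [30, 47, 65, 40, 60], [45, 62, 80, 50, 70]

print(summarize_runs(stable_last), summarize_runs(stable_this))  # (47, 3) (62, 2)
print(is_real_movement(stable_last, stable_this))  # True: 15-point gap vs. 2 * (3 + 2) = 10
print(is_real_movement(noisy_last, noisy_this))    # False: same 15-point gap, far noisier runs
```

Both pairs move the median by 15 points; only the spread tells you which one to trust. The median and MAD are used because a single outlier run barely moves either.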

What you actually see in the dashboard

The Measure dashboard is built around three views:

1. Overall visibility score, with the breakdown. A single overall score (0–100) for a quick health check, with citation rate, mention rate, source rate, and per-engine subscores immediately visible. Hover over any subscore for the prompt-level detail.

2. Per-prompt scorecard. Every prompt in your intent graph, with its current visibility per engine, week-over-week movement, and the specific SKU that should be winning it. Sort by revenue opportunity, sort by movement, sort by gap to category leader.

3. Competitive view. Your visibility on each prompt alongside the top 3–5 competitors AI is currently citing. Share-of-voice per engine, share-of-voice per prompt, and the brands AI is most often pairing your brand with (co-citation patterns matter — they reveal how AI engines view your competitive set, which is sometimes very different from how you view it).
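Share-of-voice and co-citation both reduce to counting which brands appear together in the same answers. A minimal Python sketch, assuming each response has already been reduced to the set of brand names it mentions. Defining share-of-voice as a brand's share of all tracked-brand mentions is one common convention and an assumption here, and the brand names are placeholders. To get per-engine or per-prompt figures, filter the responses first.

```python
from collections import Counter


def share_of_voice(responses: list[set[str]], brands: list[str]) -> dict[str, float]:
    """Each tracked brand's share of all tracked-brand mentions across responses."""
    counts = Counter(b for r in responses for b in r if b in brands)
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in brands}


def co_citations(responses: list[set[str]], our_brand: str) -> Counter:
    """How often each other brand appears in the same response as ours."""
    return Counter(b for r in responses if our_brand in r for b in r if b != our_brand)


# Hypothetical brand sets extracted from four answers on one engine
responses = [
    {"OurBrand", "Acme", "Globex"},
    {"Acme", "Globex"},
    {"OurBrand", "Acme"},
    {"Initech"},
]
tracked = ["OurBrand", "Acme", "Globex", "Initech"]

print(share_of_voice(responses, tracked))
# {'OurBrand': 0.25, 'Acme': 0.375, 'Globex': 0.25, 'Initech': 0.125}
print(co_citations(responses, "OurBrand").most_common(1))  # [('Acme', 2)]
```

The co-citation counter is what reveals the competitive set AI engines actually put you in, which may include brands you never tracked.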

What Measure intentionally avoids

A few things we deliberately don't do:

No vanity composites. "Your AI brand health score is 73" with no breakdown isn't actionable. Every number traces to specific prompts and specific engines.
No keyword-density throwbacks. Counting word matches in AI responses isn't measurement. We're scoring actual brand appearances and source attributions, not lexical pattern matching.
No "AI sentiment" without grounding. Some platforms report whether AI is "positive or negative" about your brand. That's mostly noise. Where Claude phrases recommendations with qualifiers (which it tends to do, as noted in the engine table above), we capture the nuance; we don't reduce it to a happy/sad score.
No black-box ranking. Every score is explainable. You can click through to see the exact prompts driving it, the exact responses AI engines gave, and the exact sources being cited.

What happens next

Measure feeds directly into Step 03 · Audit. Once you know where your visibility sits, Audit diagnoses why — the specific structural, content, and signal gaps causing each shortfall. From there, Step 04 · Fix turns that diagnosis into a ranked queue of concrete moves you can push to Shopify in one click.

Run a free RevvUp.ai audit to see your Measure baseline — no integration, no credit card, just your Shopify URL.

Frequently asked questions


What's the difference between citation rate, mention rate, and source rate?

Citation rate is when AI names you AND links to your site. Mention rate is when AI names you with or without a link (the broader, more important metric, since most AI brand appearances are mentions without citations). Source rate is when AI uses your domain as a reference, whether or not it names you (source-only appearances are the leading indicator that often precedes mentions and citations).

Why refresh weekly instead of monthly?

AI engines refresh their indices and synthesis patterns continuously. A monthly cadence catches roughly 25% of meaningful week-over-week movement in your category. By the time you see a problem on a quarterly report, the share has often already been ceded to a competitor.

How does Measure handle variance in AI responses?

We run each prompt multiple times per engine per week and report median visibility, variance, and trend separately. That's the difference between mistaking variance for real movement and actually detecting signal. A single-snapshot score is unreliable when re-running the same query can swing it 20+ points.

Does Measure track competitors?

Yes, by default. Your visibility on each prompt is reported alongside the top 3–5 competitors AI is currently citing, with share-of-voice and co-citation patterns. This often surfaces competitive sets you wouldn't have identified yourself, because AI engines sometimes pair brands very differently from how marketers do.

What's a good visibility score?

Highly category-dependent. Starting baselines for beauty and supplements are typically 8–18% mention rate; strong programs hit 30–50%; best-in-class brands reach 50%+. Apparel and home tend to start lower because the visual-text mismatch and consideration cycle are different. We benchmark you against your category, not against an absolute number.