For twenty years, "how visible are we?" had a clean answer: your rank. Tools checked your position for a keyword set every morning and drew a line chart. That instrument doesn't work anymore, because the surface it measured is being replaced. When a buyer asks an assistant "what's the best [your category] for [their situation]?", there is no position 4. There is a synthesized answer that names two or three brands — or doesn't name you at all.
The replacement metric is share of voice in AI answers: across the questions your buyers actually ask, how often do assistants bring you up, and how?
The five things worth tracking
A mention count alone is too crude. The useful dashboard has five layers:
- Mention rate — what percentage of category queries include your brand at all. This is the headline number.
- Position — when you appear, are you the first name or an afterthought in a trailing "other options include…"?
- Authority — are you the recommendation, or one of several hedged alternatives?
- Sentiment context — how does the model characterize you relative to competitors? "Powerful but complex" is a positioning problem you can't see in any analytics tool.
- Citation quality — does the answer link or attribute to your pages, or describe you from stale memory? (Stale descriptions are an entity grooming problem.)
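Three of these layers are countable once answers have been parsed: mention rate, position and citation. The other two, authority and sentiment, depend on how the answer is worded and need a human pass or a classifier. Below is a minimal sketch of the countable part. It assumes each answer has already been reduced to a hypothetical `AnswerRecord` holding the tracked brands in order of first mention and the domains the answer cited. The schema and the `share_of_voice` helper are illustrative, not a standard.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class AnswerRecord:
    """One assistant answer to one query, already parsed (hypothetical schema)."""
    query: str
    assistant: str
    brands_in_order: list[str]  # tracked brands, ordered by first mention
    cited_domains: set[str] = field(default_factory=set)


def share_of_voice(records: list[AnswerRecord], brand: str, domain: str) -> dict[str, float]:
    """Score the countable layers (mention, position, citation) for one brand."""
    if not records:
        raise ValueError("no records to score")
    mentioned = [r for r in records if brand in r.brands_in_order]
    # Mention rate: share of all answers that name the brand at all.
    mention_rate = len(mentioned) / len(records)
    # Position: 1-based rank among named brands, only over answers that mention us.
    positions = [r.brands_in_order.index(brand) + 1 for r in mentioned]
    avg_position = mean(positions) if positions else float("nan")
    first_rate = sum(p == 1 for p in positions) / len(mentioned) if mentioned else 0.0
    # Citation proxy: of the answers that mention us, how many cite our own domain.
    cited_rate = (
        sum(domain in r.cited_domains for r in mentioned) / len(mentioned) if mentioned else 0.0
    )
    return {
        "mention_rate": mention_rate,
        "avg_position": avg_position,
        "first_position_rate": first_rate,
        "citation_rate": cited_rate,
    }
```

Position is averaged only over answers that mention you. That keeps "how often" and "how prominently" as separate signals instead of blending them into one number.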
How to actually measure it
Typing queries by hand won't work: phrasing changes results, and a single sample tells you nothing. The working pattern is a synthetic query loop:
- Generate query variants programmatically. "Best X", "top X", "X recommendations", "compare A vs B for [industry]" all retrieve differently. Cover the space.
- Run them on a schedule across ChatGPT, Claude, Gemini, and Perplexity. The four surfaces disagree more than you'd expect.
- Add persona variants: a CTO evaluating an enterprise deployment gets a different shortlist than a founder looking for something cheap. Models respond to framing; your buyers arrive with framing.
- Parse responses for mentions, position, and sentiment; store the results as a time series; alert on drops.
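The generation and parsing ends of that loop can be sketched in a few lines. The sketch below assumes a small set of hypothetical question templates and persona prefixes modeled on the phrasings above. It crosses them with the four assistants and finds tracked brands in an answer by whole-word, case-insensitive matching. The actual API calls are left out because each vendor's SDK differs. Plain name matching is also crude: it misses aliases and misspellings.

```python
import itertools
import re

# Hypothetical templates mirroring the phrasings above; {category} is filled per run.
TEMPLATES = [
    "What is the best {category}?",
    "What are the top {category} tools?",
    "Can you give me {category} recommendations?",
]
# Hypothetical persona framings prepended to each question ("" = no persona).
PERSONAS = [
    "",
    "I'm a CTO evaluating an enterprise deployment. ",
    "I'm a founder looking for something cheap. ",
]
ASSISTANTS = ["ChatGPT", "Claude", "Gemini", "Perplexity"]


def query_matrix(category: str) -> list[tuple[str, str]]:
    """Every (assistant, prompt) pair for one category: templates x personas x assistants."""
    prompts = [p + t.format(category=category) for t, p in itertools.product(TEMPLATES, PERSONAS)]
    return list(itertools.product(ASSISTANTS, prompts))


def brands_in_order(answer: str, brands: list[str]) -> list[str]:
    """Return the tracked brands that appear in an answer, ordered by first mention."""
    hits = []
    for brand in brands:
        # Whole-word, case-insensitive, so "Acme" does not match "Acmeville".
        match = re.search(rf"\b{re.escape(brand)}\b", answer, flags=re.IGNORECASE)
        if match:
            hits.append((match.start(), brand))
    return [brand for _, brand in sorted(hits)]
```

Three templates and three personas across four assistants yield 36 runs per category. That number multiplies quickly, which is one reason to start with a small set.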
One discipline matters more than any tooling choice: test the problems you solve, not your brand name. "What does [YourCo] do?" flatters you and measures nothing. "How do I automate HIPAA compliance monitoring?" measures whether you exist in the buyer's actual decision path.
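As the query set grows, brand-name queries tend to creep back in. One way to hold the line is to screen the set automatically. The sketch below assumes a hypothetical `split_brand_queries` helper that sets aside any query naming your brand, so those queries can be reported separately and not counted in the headline mention rate.

```python
import re


def split_brand_queries(queries: list[str], brand: str) -> tuple[list[str], list[str]]:
    """Separate problem-led queries from ones that name the brand (which flatter you)."""
    pattern = re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)
    problem_led = [q for q in queries if not pattern.search(q)]
    brand_led = [q for q in queries if pattern.search(q)]
    return problem_led, brand_led
```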
Why this is urgent, not just interesting
AI answers don't have a long tail. There is no page two to slowly climb. For a given category query, an assistant's shortlist typically holds only a handful of names, and the evidence so far suggests that only about 3–5 brands per category ever reach reliable recall: the point where the model names them unprompted. Brands outside that set are effectively invisible, because the model doesn't retrieve them.
Worse, positions harden. Each training cycle reinforces the brands models already associate with a category, which makes displacement progressively more expensive; practitioners tracking this estimate an 18–24 month window before today's shortlists calcify into defaults. If you start measuring now, you'll see whether you're consolidating a position or watching a competitor consolidate one over you. If you start in a year, you'll mostly be documenting a fait accompli.
Start small
You don't need Grafana on day one. Fifty queries, two assistants, a spreadsheet, and a weekly cadence will establish your baseline and catch the big moves. Scale the pipeline once the numbers start driving decisions — which they will, faster than rank reports ever did, because a missing mention maps directly to deals you were never considered for. The GEO playbook tells you what to fix; share of voice tells you whether it's working.
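At spreadsheet scale, "alert on drops" can be as simple as comparing each week's mention rate with the previous week's. The sketch below assumes a hypothetical sheet exported as CSV. It has a header row, one row per week in chronological order, a `week` column and a `mention_rate` column between 0 and 1. The ten-point threshold is an arbitrary placeholder.

```python
import csv
import io


def weekly_drops(csv_text: str, threshold: float = 0.10) -> list[tuple[str, float, float]]:
    """Flag weeks where mention rate fell by more than `threshold` versus the prior week.

    Assumes columns `week` and `mention_rate` (0..1), rows in chronological order.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    drops = []
    for prev, cur in zip(rows, rows[1:]):
        before, after = float(prev["mention_rate"]), float(cur["mention_rate"])
        if before - after > threshold:
            drops.append((cur["week"], before, after))
    return drops
```

With only fifty queries a week, sampling noise alone can move the rate by several points. A flag is a prompt to look at the underlying answers, not a verdict, and a rolling average over a few weeks may be a steadier baseline.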
Measurement tells you where you stand in the answers; the Legible Readiness Index tells you why, across the eight machine-facing dimensions that drive it. Run a free report to pair your share-of-voice baseline with the specific gaps holding it down.