How Collimer Scores AI Visibility

Every score is reproducible. Here is exactly what we ask, who we ask, and how we calculate the result.

Model list last updated: July 28, 2026

Probing: OpenAI, Anthropic, Google, Perplexity (current production models)

Prompt categories

A standard scan fires 25 prompts across five buyer-journey stages — each generated fresh for your brand and domain. The category mix reflects where buyers actually are in their decision process.

Problem-led 30%

Buyers describe pain, not your name — if you're absent here, you're invisible at the top of the funnel.

"What tool helps a B2B SaaS team track churn?"

Category-aware 25%

When buyers already know the category they need, your brand must appear as a credible option.

"What are the best AI-visibility platforms for marketing teams?"

Brand 20%

Direct brand queries test whether AI actually knows what you do — a baseline of factual authority.

"Tell me about Acme Analytics and what they offer."

Alternative-seeking 15%

Buyers considering a switch ask for alternatives — appearing here turns competitive pressure into pipeline.

"What are the best alternatives to Competitor X?"

Comparison 10%

Direct head-to-head questions are the highest-intent queries at the bottom of the funnel.

"How does Acme Analytics compare to Rival Y?"

Category	Prompts	Target share
Problem-led	7	30%
Category-aware	6	25%
Brand	5	20%
Alternative-seeking	4	15%
Comparison	3	10%
Total	25	100%

What your report looks like

Every scan produces these three components. Here is a static illustration using example data.

Visibility score

47 ±8

95% confidence interval — 4 runs × 25 prompts × 4 models

Example data only
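The page does not say how the ± figure is derived. One common approach, shown here purely as an assumption, is a normal-approximation 95% interval around the mean of the per-response scores (4 runs × 25 prompts × 4 models = 400 responses). The function name `mean_with_ci95` and the sample data are hypothetical and only illustrate the arithmetic.

```python
import math
import statistics


def mean_with_ci95(scores):
    """Return (mean, half_width) for a normal-approximation 95% interval.

    Assumption: responses are treated as independent samples. The actual
    interval method used in the report is not documented on this page.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate spread")
    mean = statistics.fmean(scores)
    # Sample standard deviation (n - 1 denominator) -> standard error of the mean.
    std_err = statistics.stdev(scores) / math.sqrt(n)
    return mean, 1.96 * std_err


# Hypothetical data: 400 responses, half scoring 0 and half scoring 100.
sample_scores = [0.0, 100.0] * 200
mean, half_width = mean_with_ci95(sample_scores)
print(f"{mean:.0f} ±{half_width:.0f}")  # prints "50 ±5"
```

Because the half-width shrinks with the square root of the number of responses, more runs per scan would tighten the interval under this assumption.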

Per-provider breakdown

GPT 62

Claude 51

Gemini 34

Perplexity 41

Example data only

Score trend (6 scans)

+15 pts over 6 scans

Example data only

Scoring formula

Each model response is scored individually. The per-response scores are then averaged across all runs, prompts, and models to produce the final 0–100 score.

Per-response score


              score = base × mention × sentiment × citation × position_decay

base = 100
mention = 1.0 if brand appears, 0.0 if not
sentiment = 1.2 positive · 1.0 neutral · 0.7 negative
citation = 1.0 for every response (citation scoring is not wired in yet, so this factor is currently neutral)
position_decay = rank 1 → 1.00 · rank 2 → 0.85 · rank 3 → 0.72 · rank 4 → 0.61 · rank 5+ → 0.50

Final score


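              final = mean(per_response_scores), clamped to [0, 100]

A positive rank-1 mention scores 100 × 1.2 × 1.00 = 120, so individual responses can exceed 100; the clamp keeps the final score within 0–100.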

Example — 3 prompts, 1 model

Prompt	Mentioned	Rank	Sentiment	Score (base × mention × sentiment × citation × position_decay)
Awareness #1	yes	1	neutral	100 × 1.0 × 1.0 × 1.0 × 1.00 = 100
Evaluation #1	yes	3	positive	100 × 1.0 × 1.2 × 1.0 × 0.72 = 86.4 ≈ 86
Comparison #1	no	n/a	n/a	100 × 0.0 = 0
Visibility score				mean(100, 86, 0) = 62
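The formula and the worked example above can be sketched in a few lines of Python. This is an illustrative sketch, not Collimer's production code: the function names `response_score` and `final_score` are hypothetical, and rejecting a mentioned brand without a rank is an assumption, since the page does not say how that case is handled.

```python
from statistics import fmean

BASE = 100.0
SENTIMENT = {"positive": 1.2, "neutral": 1.0, "negative": 0.7}
CITATION = 1.0  # currently neutral, per the definitions above
POSITION_DECAY = {1: 1.00, 2: 0.85, 3: 0.72, 4: 0.61}
DECAY_RANK_5_PLUS = 0.50


def response_score(mentioned, rank=None, sentiment="neutral"):
    """Score one model response: base x mention x sentiment x citation x decay."""
    if not mentioned:
        return 0.0  # mention = 0.0 zeroes the whole product
    if rank is None or rank < 1:
        # Assumption: a mentioned brand always has a rank of 1 or higher.
        raise ValueError("a mentioned brand needs a rank >= 1")
    decay = POSITION_DECAY.get(rank, DECAY_RANK_5_PLUS)
    return BASE * 1.0 * SENTIMENT[sentiment] * CITATION * decay


def final_score(scores):
    """Mean of the per-response scores, clamped to [0, 100]."""
    return max(0.0, min(100.0, fmean(scores)))


# The three rows of the worked example (3 prompts, 1 model).
example = [
    response_score(True, rank=1, sentiment="neutral"),   # 100
    response_score(True, rank=3, sentiment="positive"),  # about 86.4
    response_score(False),                               # 0
]
print(round(final_score(example)))  # prints 62
```

The clamp only matters when most responses are positive top-rank mentions, because a single response can score as high as 120.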

Probed models

Each scan sends the same prompt set to every model below. Scores reflect real API responses — no caching, no mocking.

OpenAI · Anthropic · Google · Perplexity — current production models

Exact model IDs (credibility fine-print)

Provider	Model IDs
Anthropic	claude-haiku-4-5, claude-sonnet-4-6
Google (Gemini)	gemini-2.5-flash, gemini-2.5-pro
OpenAI	gpt-4.1, gpt-4.1-mini
Perplexity	sonar-pro

Model IDs sourced from the active routing table (lib/beacon/llm/router.ex). Updated when models rotate.

Questions about the methodology? Email hello@sandcastlelabs.ai