How Collimer Scores AI Visibility

Every score is reproducible. Here is exactly what we ask, who we ask, and how we calculate the result.

Model list last updated: June 12, 2026

Probing: OpenAI · Anthropic · Google · Perplexity (current production models)

Prompt categories

A standard scan fires 25 prompts across five buyer-journey stages — each generated fresh for your brand and domain. The category mix reflects where buyers actually are in their decision process.

Problem-led 30%

Buyers describe pain, not your name — if you're absent here, you're invisible at the top of the funnel.

"What tool helps a B2B SaaS team track churn?"

Category-aware 25%

When buyers already know the category they need, your brand must appear as a credible option.

"What are the best AI-visibility platforms for marketing teams?"

Brand 20%

Direct brand queries test whether AI actually knows what you do — a baseline of factual authority.

"Tell me about Acme Analytics and what they offer."

Alternative-seeking 15%

Buyers considering a switch ask for alternatives — appearing here turns competitive pressure into pipeline.

"What are the best alternatives to Competitor X?"

Comparison 10%

Direct head-to-head questions are the highest-intent queries at the bottom of the funnel.

"How does Acme Analytics compare to Rival Y?"

Category             Prompts   Share
Problem-led                8     30%
Category-aware             6     25%
Brand                      5     20%
Alternative-seeking        4     15%
Comparison                 3     10%
Total                     25    100%
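The shares are targets: 30% of 25 is 7.5 prompts, so whole-prompt counts can only approximate them. Rounding each share on its own gives 8, 6, 5, 4, 3, which adds up to 26 rather than 25. One common way to get whole counts that sum exactly to the total is largest-remainder rounding. The sketch below is an assumption about how such an allocation could be done, not Collimer's documented allocator; the function name `allocate_prompts` and the tie-breaking rule (earlier category wins) are hypothetical, and the result can differ from the table by one prompt in a category.

```python
def allocate_prompts(total, shares_pct):
    """Split `total` prompts across categories by integer percent shares.

    Uses largest-remainder rounding so the counts always sum to `total`.
    Hypothetical helper: not taken from Collimer's code.
    """
    if sum(shares_pct.values()) != 100:
        raise ValueError("shares must sum to 100 percent")

    counts = {}
    remainders = []
    for name, pct in shares_pct.items():
        # Integer arithmetic avoids floating-point surprises at .5 boundaries.
        whole, rem = divmod(total * pct, 100)
        counts[name] = whole
        remainders.append((rem, name))

    # Hand out the leftover prompts to the largest fractional parts.
    # Python's sort is stable, so ties keep the original category order.
    leftover = total - sum(counts.values())
    remainders.sort(key=lambda pair: pair[0], reverse=True)
    for _, name in remainders[:leftover]:
        counts[name] += 1
    return counts


SHARES = {
    "Problem-led": 30,
    "Category-aware": 25,
    "Brand": 20,
    "Alternative-seeking": 15,
    "Comparison": 10,
}

allocation = allocate_prompts(25, SHARES)
print(allocation, sum(allocation.values()))
```

With this tie-breaking rule the two leftover prompts go to Alternative-seeking and Problem-led; a different rule would move one prompt between Problem-led and Comparison.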

What your report looks like

Every scan produces these three components. Here is a static illustration using example data.

Visibility score

47 ±8

95% confidence interval over 4 runs × 25 prompts × 4 models (400 responses)

Example data only
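The page does not say how the ± figure is derived. As a rough illustration only, a 95% interval for a mean is often approximated as the mean plus or minus 1.96 standard errors. The sketch below assumes that normal approximation and treats every per-response score as an independent sample, which may not match Collimer's actual method; `mean_with_ci95` is a hypothetical name.

```python
import math
import statistics


def mean_with_ci95(scores):
    """Return (mean, half_width) using a normal-approximation 95% interval.

    Assumption: scores are independent samples. Not Collimer's documented method.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one score")
    mean = statistics.fmean(scores)
    if n < 2:
        return mean, 0.0  # no spread can be estimated from a single sample
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    return mean, 1.96 * standard_error


# Made-up data: 400 responses, half scoring 0 and half scoring 100.
demo_scores = [0.0, 100.0] * 200
demo_mean, demo_half_width = mean_with_ci95(demo_scores)
print(f"{demo_mean:.0f} ±{demo_half_width:.1f}")
```

Because responses to the same prompt or from the same model are likely correlated, an interval computed this way would tend to be narrower than one computed from run-level means.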

Per-provider breakdown

GPT 62
Claude 51
Gemini 34
Perplexity 41

Example data only

Score trend (6 scans)

32 → 47

+15 pts over 6 scans

Example data only

Scoring formula

Each model response produces one result. Results are scored individually, then averaged across all prompts and models to produce the final 0–100 score.

Per-response score

score = base × mention × sentiment × citation × position_decay
  • base = 100
  • mention = 1.0 if brand appears, 0.0 if not
  • sentiment = 1.2 positive · 1.0 neutral · 0.7 negative
  • citation = 1.0 (wiring in progress; currently neutral)
  • position_decay = rank 1 → 1.00 · rank 2 → 0.85 · rank 3 → 0.72 · rank 4 → 0.61 · rank 5+ → 0.50
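The multipliers above translate directly into a small function. This is a minimal sketch of the published formula, not the production scorer; the name `response_score` and its argument names are hypothetical, and the citation factor is fixed at the neutral 1.0 stated above.

```python
# Multiplier tables copied from the formula above.
SENTIMENT = {"positive": 1.2, "neutral": 1.0, "negative": 0.7}
POSITION_DECAY = {1: 1.00, 2: 0.85, 3: 0.72, 4: 0.61}  # rank 5+ falls back to 0.50
BASE = 100.0


def response_score(mentioned, rank=None, sentiment="neutral", citation=1.0):
    """Score one model response: base × mention × sentiment × citation × decay."""
    if not mentioned:
        return 0.0  # mention = 0.0 zeroes the whole product
    if rank is None or rank < 1:
        raise ValueError("a mentioned brand needs a rank of 1 or higher")
    decay = POSITION_DECAY.get(rank, 0.50)
    return BASE * 1.0 * SENTIMENT[sentiment] * citation * decay


print(response_score(True, rank=2, sentiment="negative"))  # 100 × 0.7 × 0.85
print(response_score(True, rank=1, sentiment="positive"))  # 100 × 1.2 × 1.00
print(response_score(False))
```

Note that a positive mention at rank 1 yields 120, so an individual response can score above 100.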

Final score

final = mean(per_response_scores), clamped to [0, 100]
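The clamp matters because the sentiment multiplier can push a single response above 100 (100 × 1.2 × 1.00 = 120 for a positive mention at rank 1), so a run of strong results could average past the top of the scale. A minimal sketch of the aggregation step, with `final_score` as a hypothetical name:

```python
from statistics import fmean


def final_score(per_response_scores):
    """Average the per-response scores and clamp the result to [0, 100]."""
    if not per_response_scores:
        raise ValueError("no responses to score")
    return max(0.0, min(100.0, fmean(per_response_scores)))


print(final_score([120.0, 102.0, 85.0]))  # mean is above 100, so it clamps
print(final_score([59.5, 0.0, 100.0]))    # ordinary case, no clamping
```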

Example — 3 prompts, 1 model

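Prompt          Mentioned   Rank   Sentiment   Score (base × mention × sentiment × citation × position_decay)
Awareness #1    yes         1      neutral     100 × 1.0 × 1.0 × 1.0 × 1.00 = 100
Evaluation #1   yes         3      positive    100 × 1.0 × 1.2 × 1.0 × 0.72 = 86.4, rounded to 86
Comparison #1   no          -      -           100 × 0.0 = 0 (not mentioned, so the other factors do not apply)

Visibility score: mean(100, 86, 0) = 62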

Probed models

Each scan sends the same prompt set to every model below. Scores reflect real API responses — no caching, no mocking.

OpenAI · Anthropic · Google · Perplexity — current production models

Exact model IDs (credibility fine-print)
Provider Model IDs
Anthropic
  • claude-haiku-4-5
  • claude-sonnet-4-6
Google (Gemini)
  • gemini-2.5-flash
  • gemini-2.5-pro
OpenAI
  • gpt-4o
  • gpt-4o-mini
Perplexity
  • sonar-pro

Model IDs sourced from the active routing table (lib/beacon/llm/router.ex). Updated when models rotate.

Questions about the methodology? hello@sandcastlelabs.ai