---
title: "How citations work"
slug: "how-citations-work"
description: "The methodology behind AIRRNK's citation tracker, including sampling, query construction, and deduplication."
group: "Core Concepts"
order: 20
---

# How citations work
A citation in AIRRNK is a model response, generated in reply to one of your tracked queries, that includes either a direct URL reference to your site or a verbatim paragraph-level snippet traceable to one of your indexed pages. Both count. Neither is trivial to detect. This page explains how we do it.
## Platforms we track
| Platform | Model(s) | Retrieval layer | Refresh |
|---|---|---|---|
| ChatGPT | GPT-4.1 + o-series with browsing | Bing grounding | 6 h |
| Claude | Claude 4.5 / 4.6 / 4.7 web search | First-party index | 6 h |
| Perplexity | Sonar / Claude / GPT auto-routed | Perplexity index | 6 h |
| Google AI Mode | Gemini with Google grounding | Google Search | 6 h |
We run a clean session per query — no memory, no chat history — so results reflect the model's baseline priors for that prompt, not a personalization artifact.
## Query list
You curate it. We seed it with 10–15 suggestions derived from your content, your category, and competitor pages. You edit, add, remove. The quality ceiling of your citation tracking is set by this list.
Good queries are buyer questions with intent — "what's the best X for Y", "how do I do Z", "is A a good alternative to B". Bad queries are your brand name (you'll always get a citation for those) or pure technical lookups (models route those to the docs site of whoever owns the underlying tech).
## Sampling
For each platform, each query, each refresh cycle:
- Issue the query.
- Parse the response into (a) the prose answer, (b) the sources panel (if any), (c) inline citation links.
- For each candidate citation, match against your site's known URLs and content fingerprints.
- Score the citation as one of `url_match` (direct link), `snippet_match` (verbatim span), or `paraphrase_match` (semantic similarity above 0.88 cosine on a sentence embedding).
- Deduplicate across snippet variants.
- Log the raw response for your audit trail.
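The matching step in the cycle above can be sketched as a single classification function. This is an illustrative sketch, not AIRRNK's actual internals: the function names, the fingerprint store, and the `embed`/`cosine` helpers are all assumptions; only the match categories and the 0.88 threshold come from the text.

```python
from dataclasses import dataclass

PARAPHRASE_THRESHOLD = 0.88  # cosine-similarity floor stated in the methodology


@dataclass
class Citation:
    kind: str  # "url_match" | "snippet_match" | "paraphrase_match"
    page_url: str
    span: str


def classify_candidate(candidate_url, snippet, site_urls, fingerprints, embed, cosine):
    """Classify one candidate citation against a site's known URLs and content.

    `fingerprints` maps page URL -> set of verbatim spans from that page;
    `embed` and `cosine` are hypothetical embedding helpers.
    """
    # 1. Direct URL citation to a known page.
    if candidate_url in site_urls:
        return Citation("url_match", candidate_url, snippet)
    # 2. Verbatim span traceable to an indexed page.
    for page_url, spans in fingerprints.items():
        if snippet in spans:
            return Citation("snippet_match", page_url, snippet)
    # 3. Fall back to semantic similarity against each page's spans.
    for page_url, spans in fingerprints.items():
        if any(cosine(embed(snippet), embed(s)) >= PARAPHRASE_THRESHOLD for s in spans):
            return Citation("paraphrase_match", page_url, snippet)
    return None  # no match: not a citation
```

The checks are ordered so the strongest evidence wins: a direct URL beats a verbatim span, which beats a paraphrase.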
We do not synthesize citations. Every entry in your tracker is traceable back to a specific model response we captured.
## Competitor detection
On the same response, we look for citations of competitors you've declared in your site settings. This is how the "you vs competitor" view is populated. If you haven't declared competitors, we auto-infer a candidate list from your category and show those as ghosts — click any one to promote it to a tracked competitor.
## Why the numbers move day-to-day
LLM responses are non-deterministic. Even with temperature 0, retrieval layers introduce variability — Bing re-ranks, Perplexity's index is refreshed hourly, Google AI Mode's grounding layer rotates sources. A 10–15% week-over-week variance on any single query is normal. We smooth the public dashboard over a 7-day trailing window; the raw per-query timeseries is available in the advanced view.
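The dashboard smoothing described above is a plain trailing mean. A minimal sketch, where the only value taken from the text is the 7-day window (early points average over however many days exist so far):

```python
def trailing_mean(daily_counts: list[float], window: int = 7) -> list[float]:
    """Smooth a per-day citation count series with a trailing window.

    Each output point averages the current day and up to `window - 1`
    preceding days; the first few points use a shorter window.
    """
    smoothed = []
    for i in range(len(daily_counts)):
        lo = max(0, i - window + 1)
        chunk = daily_counts[lo : i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

A trailing (rather than centered) window keeps the latest point current, at the cost of lagging sharp changes by a few days.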
## What gets counted, what doesn't
Counted:
- Direct URL citations to your site.
- Verbatim snippets of 15+ tokens traceable to your content.
- Paraphrases above the semantic threshold with no plausible alternative source.
Not counted:
- Generic category mentions without attribution ("many brands in this space…").
- Citations where your URL appears in the response but the snippet traces elsewhere — that's a mention, not a citation, and it's tracked separately.
- Your own site's internal tooling responses (e.g. if your chatbot was indexed).
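One of the rules above is mechanical enough to show directly: a verbatim snippet only counts once it clears the 15-token minimum and actually traces to your content. A sketch of that single check (tokenization is simplified to whitespace splitting, which is an assumption):

```python
MIN_SNIPPET_TOKENS = 15  # threshold from the rules above


def is_countable_snippet(snippet: str, page_text: str) -> bool:
    """A verbatim snippet counts only if it has 15+ tokens and
    appears verbatim in the page it is attributed to."""
    return len(snippet.split()) >= MIN_SNIPPET_TOKENS and snippet in page_text
```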
## Exporting the data
Every citation event can be exported as CSV or Parquet from Dashboard → Citations → Export. The same data is available via `GET /api/sites/:id/citations` — see the API reference.
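A minimal way to pull the same data over the API, using only the standard library. The endpoint path comes from the text; the host, the Bearer-token auth scheme, and the JSON response body are assumptions — check the API reference for the real values.

```python
import json
import urllib.request

# Hypothetical host; substitute the real API base URL from the API reference.
DEFAULT_HOST = "https://api.airrnk.example"


def citations_url(site_id: str, host: str = DEFAULT_HOST) -> str:
    """Build the citations endpoint URL for a site."""
    return f"{host}/api/sites/{site_id}/citations"


def fetch_citations(site_id: str, api_key: str, host: str = DEFAULT_HOST):
    """Fetch citation events for a site; assumes Bearer-token auth and a JSON body."""
    req = urllib.request.Request(
        citations_url(site_id, host),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```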