How to Automate AI Search Share-of-Voice Benchmarking with Claude Code and OpenClaw
Stop manually spot-checking ChatGPT and Perplexity for brand mentions. This guide shows how to build a Claude Code + OpenClaw agent pipeline that runs continuous AI search share-of-voice benchmarks, flags competitor gains, and feeds structured data to a dashboard your team will actually use.
- Category: AI Visibility
- Use this for: planning and implementation decisions
- Reading flow: quick summary now, long-form details below
If you’re still manually checking whether ChatGPT mentions your brand—running queries one by one, copy-pasting results into a spreadsheet—you’re spending hours on a problem that can run unattended overnight.
AI search share-of-voice (SOV) benchmarking is not complicated conceptually: ask a set of relevant queries, record which brands appear in the answers, and track the percentages over time. The challenge is doing it consistently, at scale, across multiple AI engines, without turning it into someone’s full-time job.
This guide covers how to build that pipeline using Claude Code agents and OpenClaw skills libraries. We’ll go from raw query list to a structured dataset your team can actually use—with competitor alerts built in.
Why Manual AI Search Monitoring Falls Apart
Most teams start by spot-checking. A product manager runs a few ChatGPT searches before a board meeting. A content lead glances at Perplexity results when a competitor launches. It feels like enough until it isn’t.
The problems compound fast:
- Query drift: The searches you pick are inconsistent between checks. You’re not measuring the same thing twice.
- Coverage gaps: AI search SOV isn’t one number—it varies by query intent, query phrasing, engine, and date. Spot-checks miss most of this variance.
- No baseline: Without historical data, you can’t tell whether a shift is noise or signal.
- Competitor blind spots: You may not notice a competitor gaining share until they’ve locked in citations across dozens of query categories.
Automating the pipeline solves all four. The key is structuring it correctly from the start.
The Core Architecture
A reliable AI search SOV pipeline has four components:
- Query library — A structured set of queries organized by intent category
- Data collection layer — Agents that submit queries and parse brand mentions from AI answers
- Storage and normalization — A lightweight schema that makes results comparable over time
- Reporting layer — A dashboard or digest that surfaces changes and flags anomalies
Claude Code handles the agent logic. OpenClaw skills handle the integrations—API calls, data writes, notification delivery. BotSee sits at the data collection layer, providing structured citation and mention data across ChatGPT, Claude, Perplexity, Gemini, and other AI engines without requiring you to build and maintain direct API wrappers for each one.
Let’s build each piece.
Step 1: Design a Query Library That Produces Comparable Data
Your query library is the foundation. Poor query design produces noise; clean query design produces signal.
Organize by intent category
Group queries into three buckets:
Category queries (What AI says about your market)
"best tools for tracking ai search citations"
"how do companies monitor chatgpt brand mentions"
"ai answer engine optimization platforms"
Problem queries (What AI says when someone has your customer’s problem)
"how do i know if my brand appears in chatgpt answers"
"how to measure ai search visibility for my company"
"why doesn't my product show up in perplexity results"
Competitor-adjacent queries (What AI says in contexts where your competitors are named)
"alternatives to [competitor name]"
"[competitor name] vs other ai visibility tools"
"[competitor name] pricing and reviews"
Keep phrasing stable
Vary phrasing between categories, but keep each individual query identical across runs. If you change a query mid-flight, you lose comparability. Track query versions with a query_id field and timestamp any changes.
Target 50–150 queries for a meaningful baseline
Fewer than 50 queries and you’re likely missing important intent clusters. More than 150 and you’re probably duplicating coverage. Start at 50, add queries in batches as you identify gaps.
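To make runs reproducible, it helps to keep the library in a versioned file with stable IDs. A minimal sketch of what entries might look like; the field names mirror the `query_id` and category fields used later in the pipeline, while `added` is an illustrative extra for tracking when a query entered the library:

```javascript
// queries/sov-queries.json, shown here as a JS literal for readability.
const queryLibrary = [
  { query_id: "cat-001",  category: "category",   text: "best tools for tracking ai search citations",        added: "2025-03-01" },
  { query_id: "prob-001", category: "problem",    text: "how to measure ai search visibility for my company", added: "2025-03-01" },
  { query_id: "comp-001", category: "competitor", text: "alternatives to CompetitorA",                        added: "2025-03-01" }
];

// Sanity check: IDs must stay unique, or the per-query time series breaks.
const ids = queryLibrary.map(q => q.query_id);
console.log(new Set(ids).size === ids.length); // true
```

Because IDs are append-only, a diff of this file doubles as a changelog for the library.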
Step 2: Build the Data Collection Agent
The collection agent runs your query library against target AI engines and extracts structured mention data from each response.
Skill setup in OpenClaw
Create a skill directory in your OpenClaw workspace:
skills/
  ai-sov-collector/
    SKILL.md
    collect.mjs
    normalize.mjs
    schema.json
The SKILL.md defines what the agent can do:
# AI SOV Collector Skill
Submits queries to AI search APIs and extracts brand mentions.
## Inputs
- query_list: path to JSON file with query library
- engines: array of target engines (chatgpt, perplexity, claude, gemini)
- brands: array of brand names to track (your brand + competitors)
- output_path: where to write normalized results
## Outputs
JSON file per run with schema: { run_id, date, engine, query_id, query_text,
brands_mentioned: [{brand, position, context_snippet}], raw_answer }
Collection logic
The collection script iterates your query list, calls the API for each engine, and runs a brand extraction pass on each response:
// collect.mjs (simplified; callEngine and extractMentions are defined
// elsewhere in the skill)
import fs from 'node:fs/promises';

async function collectRun(queryList, engines, brands, outputPath) {
const runId = `run-${Date.now()}`;
const results = [];
for (const query of queryList) {
for (const engine of engines) {
const answer = await callEngine(engine, query.text);
const mentions = extractMentions(answer, brands);
results.push({
run_id: runId,
date: new Date().toISOString(),
engine,
query_id: query.id,
query_text: query.text,
query_category: query.category,
brands_mentioned: mentions,
raw_answer: answer
});
}
}
await fs.writeFile(outputPath, JSON.stringify(results, null, 2));
return runId;
}
The extractMentions function does a normalized string match across the answer text, recording both whether the brand appeared and its approximate position (first third, middle, last third of the response). Position matters—early mentions in AI answers carry more weight.
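A minimal version of that extraction pass might look like the following; the bucket boundaries (thirds of the answer text) and the snippet width are assumptions, and a production version would also handle brand aliases and plural forms:

```javascript
// Case-normalized brand matching with position bucketing, as described above.
function extractMentions(answer, brands) {
  const text = answer.toLowerCase();
  const mentions = [];
  for (const brand of brands) {
    const idx = text.indexOf(brand.toLowerCase());
    if (idx === -1) continue; // brand not mentioned in this answer
    const ratio = idx / text.length;
    const position = ratio < 1 / 3 ? "early" : ratio < 2 / 3 ? "middle" : "late";
    mentions.push({
      brand,
      position,
      // Short window around the mention for later qualitative review.
      context_snippet: answer.slice(Math.max(0, idx - 40), idx + brand.length + 40)
    });
  }
  return mentions;
}
```

For example, `extractMentions("YourBrand leads the list of options.", ["YourBrand", "CompetitorA"])` returns a single mention with `position: "early"`.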
Using BotSee’s API for structured data
If you’re pulling data from multiple engines, building and maintaining direct integrations with each AI provider’s API is non-trivial. Engine behaviors, rate limits, and response formats differ. BotSee provides a unified API that returns structured citation data—brand mentions, position, context—across engines without requiring separate integrations.
For high-frequency collection (daily or multiple times per week across 100+ queries), using a purpose-built data layer here saves meaningful engineering time and produces more consistent normalization than rolling your own.
Step 3: Normalize and Store Results
Raw JSON per run isn’t useful for trend analysis. You need a normalized schema that makes cross-run comparison easy.
Schema design
{
"run_id": "run-1741305600000",
"run_date": "2025-03-07",
"engine": "chatgpt",
"query_id": "cat-001",
"query_category": "category",
"brand": "YourBrand",
"mentioned": true,
"position_bucket": "early",
"sov_weight": 1.0
}
The sov_weight field lets you apply position weighting: early mentions score 1.0, middle 0.6, late 0.3, not mentioned 0. This gives you a weighted SOV metric that reflects how prominently the brand appears, not just whether it appears.
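Applied to normalized rows, the weighting reduces to a small aggregation. A sketch, assuming the row shape shown above:

```javascript
// Position weights as described in the text: early 1.0, middle 0.6, late 0.3,
// not mentioned 0.
const POSITION_WEIGHTS = { early: 1.0, middle: 0.6, late: 0.3 };

// Average weighted SOV for one brand across a set of normalized rows.
function weightedSov(rows, brand) {
  const brandRows = rows.filter(r => r.brand === brand);
  if (brandRows.length === 0) return 0;
  const total = brandRows.reduce(
    (sum, r) => sum + (r.mentioned ? POSITION_WEIGHTS[r.position_bucket] : 0),
    0
  );
  return total / brandRows.length;
}
```

With one early mention and one miss, this returns 0.5: prominence and coverage in a single number.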
Aggregation queries
Store the normalized rows in a SQLite file or a lightweight Postgres instance. Two aggregation queries cover most reporting needs:
Weekly SOV by brand and engine:
SELECT
run_date,
engine,
brand,
AVG(sov_weight) as avg_sov,
COUNT(CASE WHEN mentioned THEN 1 END) * 1.0 / COUNT(*) as mention_rate
FROM sov_results
WHERE run_date >= date('now', '-7 days')
GROUP BY run_date, engine, brand
ORDER BY run_date DESC, avg_sov DESC;
Category breakdown (where are you winning/losing?):
SELECT
query_category,
brand,
AVG(sov_weight) as avg_sov
FROM sov_results
WHERE run_date >= date('now', '-30 days')
GROUP BY query_category, brand
ORDER BY query_category, avg_sov DESC;
Step 4: Set Up Automated Runs with Claude Code
The collection agent runs on a schedule. Claude Code + OpenClaw makes this straightforward to configure without a dedicated cron server.
Scheduling the run
In your OpenClaw heartbeat or cron config:
# Run AI SOV collection every Monday and Thursday at 6 AM UTC
0 6 * * 1,4 openclaw run ai-sov-collector --config sov-config.json
The agent reads the config, runs the collection, writes normalized results, and triggers the reporting step automatically.
Config file
{
"query_library": "./queries/sov-queries.json",
"engines": ["chatgpt", "perplexity", "claude"],
"brands": ["YourBrand", "CompetitorA", "CompetitorB", "CompetitorC"],
"output_dir": "./data/sov-runs/",
"db_path": "./data/sov.db",
"alert_threshold": 0.15
}
The alert_threshold triggers a notification if any competitor’s SOV changes by more than 15 percentage points between runs—a signal worth investigating.
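The comparison behind that alert can be sketched as follows, assuming per-brand SOV maps for the current and previous runs (the shapes are illustrative, not a fixed interface):

```javascript
// Compare each brand's SOV between two runs and return the brands whose
// absolute change exceeds the configured threshold.
function findBreaches(currentSov, previousSov, threshold) {
  const breaches = [];
  for (const [brand, sov] of Object.entries(currentSov)) {
    const prev = previousSov[brand] ?? 0; // brand absent last run counts as 0
    const delta = sov - prev;
    if (Math.abs(delta) > threshold) breaches.push({ brand, delta });
  }
  return breaches;
}
```

With `alert_threshold: 0.15`, a competitor moving from 0.22 to 0.40 breaches, while your own drift from 0.32 to 0.34 stays quiet.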
Step 5: Build the Reporting Layer
Data is only useful if it surfaces to the right people at the right time.
Weekly digest via OpenClaw skill
Create a sov-reporter skill that generates a structured digest from the aggregated data:
# Weekly AI Search SOV Report — {{date}}
## Summary
- **Your Brand SOV (weighted avg, all engines):** 34% (+2 pts vs last week)
- **Biggest mover:** CompetitorA gained 8 pts on Perplexity (category queries)
- **Category gap:** "ai monitoring tools" intent — you appear in 28% of answers vs CompetitorB at 61%
## By Engine
| Engine | Your Brand | CompetitorA | CompetitorB |
|--------|-----------|-------------|-------------|
| ChatGPT | 41% | 22% | 31% |
| Perplexity | 29% | 38% | 27% |
| Claude | 33% | 19% | 44% |
## Queries to Address
These queries show zero brand mention for you but competitor presence:
- "how to measure ai answer engine brand share" (CompetitorB appears in 80% of runs)
- "perplexity citation tracking for enterprise" (CompetitorA appears in 70% of runs)
This digest goes to Telegram (or Slack, or email) via your OpenClaw messaging integration. The people who need it get it in the channel they’re already in, without logging into a separate tool.
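The per-engine table in the digest can be generated directly from the aggregated numbers. A sketch, with the input shape (engine to brand to weighted SOV) as an assumption:

```javascript
// Render the "By Engine" markdown table from aggregated weighted SOV values.
function engineTable(sovByEngine, brands) {
  const header = `| Engine | ${brands.join(" | ")} |`;
  const divider = `|--------|${brands.map(() => "------").join("|")}|`;
  const rows = Object.entries(sovByEngine).map(
    ([engine, sov]) =>
      `| ${engine} | ${brands.map(b => `${Math.round((sov[b] ?? 0) * 100)}%`).join(" | ")} |`
  );
  return [header, divider, ...rows].join("\n");
}
```

Keeping the digest as plain markdown means the same string works in Telegram, Slack, and email without channel-specific formatting.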
Anomaly alerts
The alert layer runs after each collection run. If the normalized data shows a threshold breach:
// In your collection agent's post-run step. currentSov and previousSov are
// the weighted SOV values for a given brand/engine pair from the two most
// recent runs.
const delta = currentSov - previousSov;
if (Math.abs(delta) > config.alert_threshold) {
  await openclaw.notify({
    channel: 'telegram',
    message: `⚠️ AI SOV shift detected: ${brand} moved ${delta > 0 ? '+' : ''}${Math.round(delta * 100)}pts on ${engine} (${queryCategory} queries). Check dashboard.`
  });
}
This is lightweight enough to run on every collection and catches the moves that matter—not every fluctuation, just the ones that cross a threshold you define.
Avoiding Common Mistakes
Measuring presence instead of position. Whether your brand appears at all is a coarse signal. Where it appears—and whether it’s cited positively, neutrally, or as a counterexample—matters. Build position tracking in from the start.
Ignoring query category breakdown. Top-line SOV can look fine while a competitor is quietly dominating the highest-intent query categories. Always break down by category before drawing conclusions.
Comparing across engines without normalization. Different engines respond differently to the same query. Claude tends to hedge and list options; Perplexity tends to cite specific sources; ChatGPT varies by prompt phrasing. Normalize within-engine trends rather than treating cross-engine numbers as equivalent.
Running queries too infrequently. AI search answers shift faster than traditional search rankings. Weekly is a reasonable minimum; twice weekly is better for competitive markets. BotSee supports automated daily query runs at scale, which is useful once you’ve validated your query library and want higher-frequency signal.
Not versioning your query library. If you change a query, you break the time series for that query_id. Always add new queries rather than modifying existing ones. Keep old queries running in parallel for at least two weeks if you need to swap them out.
Connecting SOV Data to Content Actions
Share-of-voice data has a natural output: a list of queries where competitors appear and you don’t. That list is your content gap backlog.
For each gap query, the workflow is:
- Pull the AI answers where competitors are mentioned but you’re not
- Identify what those answers say about the competitor (what claims, what context)
- Create or update content that addresses the same query with your own claims, cited sources, and structured data
- Monitor whether your mention rate in that query category improves over the following 4–8 weeks
This loop runs naturally inside a Claude Code agent with an OpenClaw content ops skill. The agent ingests the gap list, pulls existing relevant content from your site, drafts updates or new pieces, and flags them for human review before publishing.
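Deriving the gap list itself is a small pass over the normalized rows from Step 3: keep the queries where some competitor is mentioned and you are not. A sketch:

```javascript
// Build the content gap backlog: queries with competitor mentions but no
// mention of your brand. Row shape follows the normalized schema.
function gapQueries(rows, yourBrand) {
  const byQuery = new Map();
  for (const r of rows) {
    const entry = byQuery.get(r.query_id) ?? { you: false, competitors: new Set() };
    if (r.mentioned) {
      if (r.brand === yourBrand) entry.you = true;
      else entry.competitors.add(r.brand);
    }
    byQuery.set(r.query_id, entry);
  }
  return [...byQuery.entries()]
    .filter(([, e]) => !e.you && e.competitors.size > 0)
    .map(([query_id, e]) => ({ query_id, competitors: [...e.competitors] }));
}
```

The output maps directly onto the backlog: one entry per gap query, with the competitors whose answers you need to study.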
What a Mature Pipeline Looks Like
After 60–90 days of consistent collection:
- You have a baseline SOV for each brand across each engine and query category
- You can measure the impact of content changes on AI citation rates
- You get automatic alerts when competitors make meaningful moves
- Leadership gets a weekly digest without anyone manually compiling it
The initial setup takes a few days of engineering time. The ongoing maintenance is minimal—mainly updating the query library as your market evolves and adjusting alert thresholds as you learn what’s signal vs. noise.
Quick-Reference Checklist
Query library
- Minimum 50 queries, organized by category (category / problem / competitor-adjacent)
- Stable query IDs; never edit existing queries, only add
- Covers each engine you care about
Data collection
- Agent runs on a schedule (minimum weekly)
- Brand extraction includes position tracking
- Raw answers stored for audit
Storage
- Normalized schema with position weighting
- Aggregation queries for SOV by brand, engine, category
- Historical data retained for trend analysis
Reporting
- Weekly digest to the right people in the right channel
- Anomaly alerts on threshold breaches
- Category breakdown available on demand
Content loop
- Gap queries feed content backlog
- Content changes tracked against SOV movement
Key Takeaways
Automated AI search SOV benchmarking is straightforward to build and pays dividends almost immediately. The manual alternative—spot-checking by hand—misses too much and creates too much variance in the data.
The combination of Claude Code for agent logic, OpenClaw skills for integrations and scheduling, and BotSee for structured AI citation data gives you a complete pipeline without building everything from scratch. Each piece does what it’s best at.
Start with 50 queries, run for four weeks, and you’ll have more useful competitive intelligence than most teams gather in a quarter. Then extend the query library and tighten your content loop.
Rita writes about AI search visibility, agent workflows, and practical tooling for teams building in AI-native environments.