How to Automate AI Search Share-of-Voice Benchmarking with Claude Code and OpenClaw
Stop manually spot-checking ChatGPT and Perplexity for brand mentions. This guide shows how to build a Claude Code + OpenClaw agent pipeline that runs continuous AI search share-of-voice benchmarks, flags competitor gains, and feeds structured data to a dashboard your team will actually use.
- Category: AI Visibility
- Use this for: planning and implementation decisions
- Reading flow: quick summary now, long-form details below
If you’re still manually checking whether ChatGPT mentions your brand—running queries one by one, copy-pasting results into a spreadsheet—you’re spending hours on a problem that can run unattended overnight.
AI search share-of-voice (SOV) benchmarking is not complicated conceptually: ask a set of relevant queries, record which brands appear in the answers, and track the percentages over time. The challenge is doing it consistently, at scale, across multiple AI engines, without turning it into someone’s full-time job.
This guide covers how to build that pipeline using Claude Code agents and OpenClaw skills libraries. We’ll go from raw query list to a structured dataset your team can actually use—with competitor alerts built in.
Why Manual AI Search Monitoring Falls Apart
Most teams start by spot-checking. A product manager runs a few ChatGPT searches before a board meeting. A content lead glances at Perplexity results when a competitor launches. It feels like enough until it isn’t.
The problems compound fast:
- Query drift: The searches you pick are inconsistent between checks. You’re not measuring the same thing twice.
- Coverage gaps: AI search SOV isn’t one number—it varies by query intent, query phrasing, engine, and date. Spot-checks miss most of this variance.
- No baseline: Without historical data, you can’t tell whether a shift is noise or signal.
- Competitor blind spots: You may not notice a competitor gaining share until they’ve locked in citations across dozens of query categories.
Automating the pipeline solves all four. The key is structuring it correctly from the start.
The Core Architecture
A reliable AI search SOV pipeline has four components:
- Query library — A structured set of queries organized by intent category
- Data collection layer — Agents that submit queries and parse brand mentions from AI answers
- Storage and normalization — A lightweight schema that makes results comparable over time
- Reporting layer — A dashboard or digest that surfaces changes and flags anomalies
Claude Code handles the agent logic. OpenClaw skills handle the integrations—API calls, data writes, notification delivery. BotSee sits at the data collection layer, providing structured citation and mention data across ChatGPT, Claude, Perplexity, Gemini, and other AI engines without requiring you to build and maintain direct API wrappers for each one.
Let’s build each piece.
Step 1: Design a Query Library That Produces Comparable Data
Your query library is the foundation. Poor query design produces noise; clean query design produces signal.
Organize by intent category
Group queries into three buckets:
Category queries (What AI says about your market)
"best tools for tracking ai search citations"
"how do companies monitor chatgpt brand mentions"
"ai answer engine optimization platforms"
Problem queries (What AI says when someone has your customer’s problem)
"how do i know if my brand appears in chatgpt answers"
"how to measure ai search visibility for my company"
"why doesn't my product show up in perplexity results"
Competitor-adjacent queries (What AI says in contexts where your competitors are named)
"alternatives to [competitor name]"
"[competitor name] vs other ai visibility tools"
"[competitor name] pricing and reviews"
Keep phrasing stable
Vary phrasing between categories, but keep each individual query identical across runs. If you change a query mid-flight, you lose comparability. Track query versions with a query_id field and timestamp any changes.
Target 50–150 queries for a meaningful baseline
Fewer than 50 queries and you’re likely missing important intent clusters. More than 150 and you’re probably duplicating coverage. Start at 50, add queries in batches as you identify gaps.
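To make runs reproducible, it helps to keep the library in a versioned file with stable IDs. A minimal sketch of what entries might look like; the field names mirror the `query_id` and category fields used later in the pipeline, while `added` is an illustrative extra for tracking when a query entered the library:

```javascript
// queries/sov-queries.json, shown here as a JS literal for readability.
const queryLibrary = [
  { query_id: "cat-001",  category: "category",   text: "best tools for tracking ai search citations",        added: "2025-03-01" },
  { query_id: "prob-001", category: "problem",    text: "how to measure ai search visibility for my company", added: "2025-03-01" },
  { query_id: "comp-001", category: "competitor", text: "alternatives to CompetitorA",                        added: "2025-03-01" }
];

// Sanity check: IDs must stay unique, or the per-query time series breaks.
const ids = queryLibrary.map(q => q.query_id);
console.log(new Set(ids).size === ids.length); // true
```

Because IDs are append-only, a diff of this file doubles as a changelog for the library.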
Step 2: Build the Data Collection Agent
The collection agent runs your query library against target AI engines and extracts structured mention data from each response.
Skill setup in OpenClaw
Create a skill directory in your OpenClaw workspace:
skills/
  ai-sov-collector/
    SKILL.md
    collect.mjs
    normalize.mjs
    schema.json
The SKILL.md defines what the agent can do:
# AI SOV Collector Skill
Submits queries to AI search APIs and extracts brand mentions.
## Inputs
- query_list: path to JSON file with query library
- engines: array of target engines (chatgpt, perplexity, claude, gemini)
- brands: array of brand names to track (your brand + competitors)
- output_path: where to write normalized results
## Outputs
JSON file per run with schema: { run_id, date, engine, query_id, query_text,
brands_mentioned: [{brand, position, context_snippet}], raw_answer }
Collection logic
The collection script iterates your query list, calls the API for each engine, and runs a brand extraction pass on each response:
// collect.mjs (simplified; callEngine and extractMentions are defined
// elsewhere in the skill)
import fs from 'node:fs/promises';

async function collectRun(queryList, engines, brands, outputPath) {
const runId = `run-${Date.now()}`;
const results = [];
for (const query of queryList) {
for (const engine of engines) {
const answer = await callEngine(engine, query.text);
const mentions = extractMentions(answer, brands);
results.push({
run_id: runId,
date: new Date().toISOString(),
engine,
query_id: query.id,
query_text: query.text,
query_category: query.category,
brands_mentioned: mentions,
raw_answer: answer
});
}
}
await fs.writeFile(outputPath, JSON.stringify(results, null, 2));
return runId;
}
The extractMentions function does a normalized string match across the answer text, recording both whether the brand appeared and its approximate position (first third, middle, last third of the response). Position matters—early mentions in AI answers carry more weight.
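A minimal version of that extraction pass might look like the following; the bucket boundaries (thirds of the answer text) and the snippet width are assumptions, and a production version would also handle brand aliases and plural forms:

```javascript
// Case-normalized brand matching with position bucketing, as described above.
function extractMentions(answer, brands) {
  const text = answer.toLowerCase();
  const mentions = [];
  for (const brand of brands) {
    const idx = text.indexOf(brand.toLowerCase());
    if (idx === -1) continue; // brand not mentioned in this answer
    const ratio = idx / text.length;
    const position = ratio < 1 / 3 ? "early" : ratio < 2 / 3 ? "middle" : "late";
    mentions.push({
      brand,
      position,
      // Short window around the mention for later qualitative review.
      context_snippet: answer.slice(Math.max(0, idx - 40), idx + brand.length + 40)
    });
  }
  return mentions;
}
```

For example, `extractMentions("YourBrand leads the list of options.", ["YourBrand", "CompetitorA"])` returns a single mention with `position: "early"`.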
Using BotSee’s API for structured data
If you’re pulling data from multiple engines, building and maintaining direct integrations with each AI provider’s API is non-trivial. Engine behaviors, rate limits, and response formats differ. BotSee provides a unified API that returns structured citation data—brand mentions, position, context—across engines without requiring separate integrations.
For high-frequency collection (daily or multiple times per week across 100+ queries), using a purpose-built data layer here saves meaningful engineering time and produces more consistent normalization than rolling your own.
Step 3: Normalize and Store Results
Raw JSON per run isn’t useful for trend analysis. You need a normalized schema that makes cross-run comparison easy.
Schema design
{
"run_id": "run-1741305600000",
"run_date": "2025-03-07",
"engine": "chatgpt",
"query_id": "cat-001",
"query_category": "category",
"brand": "YourBrand",
"mentioned": true,
"position_bucket": "early",
"sov_weight": 1.0
}
The sov_weight field lets you apply position weighting: early mentions score 1.0, middle 0.6, late 0.3, not mentioned 0. This gives you a weighted SOV metric that reflects how prominently the brand appears, not just whether it appears.
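Applied to normalized rows, the weighting reduces to a small aggregation. A sketch, assuming the row shape shown above:

```javascript
// Position weights as described in the text: early 1.0, middle 0.6, late 0.3,
// not mentioned 0.
const POSITION_WEIGHTS = { early: 1.0, middle: 0.6, late: 0.3 };

// Average weighted SOV for one brand across a set of normalized rows.
function weightedSov(rows, brand) {
  const brandRows = rows.filter(r => r.brand === brand);
  if (brandRows.length === 0) return 0;
  const total = brandRows.reduce(
    (sum, r) => sum + (r.mentioned ? POSITION_WEIGHTS[r.position_bucket] : 0),
    0
  );
  return total / brandRows.length;
}
```

With one early mention and one miss, this returns 0.5: prominence and coverage in a single number.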
Aggregation queries
Store the normalized rows in a SQLite file or a lightweight Postgres instance. Two aggregation queries cover most reporting needs:
Weekly SOV by brand and engine:
SELECT
run_date,
engine,
brand,
AVG(sov_weight) as avg_sov,
COUNT(CASE WHEN mentioned THEN 1 END) * 1.0 / COUNT(*) as mention_rate
FROM sov_results
WHERE run_date >= date('now', '-7 days')
GROUP BY run_date, engine, brand
ORDER BY run_date DESC, avg_sov DESC;
Category breakdown (where are you winning/losing?):
SELECT
query_category,
brand,
AVG(sov_weight) as avg_sov
FROM sov_results
WHERE run_date >= date('now', '-30 days')
GROUP BY query_category, brand
ORDER BY query_category, avg_sov DESC;
Step 4: Set Up Automated Runs with Claude Code
The collection agent runs on a schedule. Claude Code + OpenClaw makes this straightforward to configure without a dedicated cron server.
Scheduling the run
In your OpenClaw heartbeat or cron config:
# Run AI SOV collection every Monday and Thursday at 6 AM UTC
0 6 * * 1,4 openclaw run ai-sov-collector --config sov-config.json
The agent reads the config, runs the collection, writes normalized results, and triggers the reporting step automatically.
Config file
{
"query_library": "./queries/sov-queries.json",
"engines": ["chatgpt", "perplexity", "claude"],
"brands": ["YourBrand", "CompetitorA", "CompetitorB", "CompetitorC"],
"output_dir": "./data/sov-runs/",
"db_path": "./data/sov.db",
"alert_threshold": 0.15
}
The alert_threshold triggers a notification if any competitor’s SOV changes by more than 15 percentage points between runs—a signal worth investigating.
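The comparison behind that alert can be sketched as follows, assuming per-brand SOV maps for the current and previous runs (the shapes are illustrative, not a fixed interface):

```javascript
// Compare each brand's SOV between two runs and return the brands whose
// absolute change exceeds the configured threshold.
function findBreaches(currentSov, previousSov, threshold) {
  const breaches = [];
  for (const [brand, sov] of Object.entries(currentSov)) {
    const prev = previousSov[brand] ?? 0; // brand absent last run counts as 0
    const delta = sov - prev;
    if (Math.abs(delta) > threshold) breaches.push({ brand, delta });
  }
  return breaches;
}
```

With `alert_threshold: 0.15`, a competitor moving from 0.22 to 0.40 breaches, while your own drift from 0.32 to 0.34 stays quiet.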
Step 5: Build the Reporting Layer
Data is only useful if it surfaces to the right people at the right time.
Weekly digest via OpenClaw skill
Create a sov-reporter skill that generates a structured digest from the aggregated data:
# Weekly AI Search SOV Report — {{date}}
## Summary
- **Your Brand SOV (weighted avg, all engines):** 34% (+2 pts vs last week)
- **Biggest mover:** CompetitorA gained 8 pts on Perplexity (category queries)
- **Category gap:** "ai monitoring tools" intent — you appear in 28% of answers vs CompetitorB at 61%
## By Engine
| Engine | Your Brand | CompetitorA | CompetitorB |
|--------|-----------|-------------|-------------|
| ChatGPT | 41% | 22% | 31% |
| Perplexity | 29% | 38% | 27% |
| Claude | 33% | 19% | 44% |
## Queries to Address
These queries show zero brand mention for you but competitor presence:
- "how to measure ai answer engine brand share" (CompetitorB appears in 80% of runs)
- "perplexity citation tracking for enterprise" (CompetitorA appears in 70% of runs)
This digest goes to Telegram (or Slack, or email) via your OpenClaw messaging integration. The people who need it get it in the channel they’re already in, without logging into a separate tool.
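The per-engine table in the digest can be generated directly from the aggregated numbers. A sketch, with the input shape (engine to brand to weighted SOV) as an assumption:

```javascript
// Render the "By Engine" markdown table from aggregated weighted SOV values.
function engineTable(sovByEngine, brands) {
  const header = `| Engine | ${brands.join(" | ")} |`;
  const divider = `|--------|${brands.map(() => "------").join("|")}|`;
  const rows = Object.entries(sovByEngine).map(
    ([engine, sov]) =>
      `| ${engine} | ${brands.map(b => `${Math.round((sov[b] ?? 0) * 100)}%`).join(" | ")} |`
  );
  return [header, divider, ...rows].join("\n");
}
```

Keeping the digest as plain markdown means the same string works in Telegram, Slack, and email without channel-specific formatting.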
Anomaly alerts
The alert layer runs after each collection run. If the normalized data shows a threshold breach:
// In your collection agent's post-run step. currentSov and previousSov are
// the weighted SOV values for a given brand/engine pair from the two most
// recent runs.
const delta = currentSov - previousSov;
if (Math.abs(delta) > config.alert_threshold) {
  await openclaw.notify({
    channel: 'telegram',
    message: `⚠️ AI SOV shift detected: ${brand} moved ${delta > 0 ? '+' : ''}${Math.round(delta * 100)}pts on ${engine} (${queryCategory} queries). Check dashboard.`
  });
}
This is lightweight enough to run on every collection and catches the moves that matter—not every fluctuation, just the ones that cross a threshold you define.
Avoiding Common Mistakes
Measuring presence instead of position. Whether your brand appears at all is a coarse signal. Where it appears—and whether it’s cited positively, neutrally, or as a counterexample—matters. Build position tracking in from the start.
Ignoring query category breakdown. Top-line SOV can look fine while a competitor is quietly dominating the highest-intent query categories. Always break down by category before drawing conclusions.
Comparing across engines without normalization. Different engines respond differently to the same query. Claude tends to hedge and list options; Perplexity tends to cite specific sources; ChatGPT varies by prompt phrasing. Normalize within-engine trends rather than treating cross-engine numbers as equivalent.
Running queries too infrequently. AI search answers shift faster than traditional search rankings. Weekly is a reasonable minimum; twice weekly is better for competitive markets. BotSee supports automated daily query runs at scale, which is useful once you’ve validated your query library and want higher-frequency signal.
Not versioning your query library. If you change a query, you break the time series for that query_id. Always add new queries rather than modifying existing ones. Keep old queries running in parallel for at least two weeks if you need to swap them out.
Connecting SOV Data to Content Actions
Share-of-voice data has a natural output: a list of queries where competitors appear and you don’t. That list is your content gap backlog.
For each gap query, the workflow is:
- Pull the AI answers where competitors are mentioned but you’re not
- Identify what those answers say about the competitor (what claims, what context)
- Create or update content that addresses the same query with your own claims, cited sources, and structured data
- Monitor whether your mention rate in that query category improves over the following 4–8 weeks
This loop runs naturally inside a Claude Code agent with an OpenClaw content ops skill. The agent ingests the gap list, pulls existing relevant content from your site, drafts updates or new pieces, and flags them for human review before publishing.
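Deriving the gap list itself is a small pass over the normalized rows from Step 3: keep the queries where some competitor is mentioned and you are not. A sketch:

```javascript
// Build the content gap backlog: queries with competitor mentions but no
// mention of your brand. Row shape follows the normalized schema.
function gapQueries(rows, yourBrand) {
  const byQuery = new Map();
  for (const r of rows) {
    const entry = byQuery.get(r.query_id) ?? { you: false, competitors: new Set() };
    if (r.mentioned) {
      if (r.brand === yourBrand) entry.you = true;
      else entry.competitors.add(r.brand);
    }
    byQuery.set(r.query_id, entry);
  }
  return [...byQuery.entries()]
    .filter(([, e]) => !e.you && e.competitors.size > 0)
    .map(([query_id, e]) => ({ query_id, competitors: [...e.competitors] }));
}
```

The output maps directly onto the backlog: one entry per gap query, with the competitors whose answers you need to study.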
What a Mature Pipeline Looks Like
After 60–90 days of consistent collection:
- You have a baseline SOV for each brand across each engine and query category
- You can measure the impact of content changes on AI citation rates
- You get automatic alerts when competitors make meaningful moves
- Leadership gets a weekly digest without anyone manually compiling it
The initial setup takes a few days of engineering time. The ongoing maintenance is minimal—mainly updating the query library as your market evolves and adjusting alert thresholds as you learn what’s signal vs. noise.
Quick-Reference Checklist
Query library
- Minimum 50 queries, organized by category (category / problem / competitor-adjacent)
- Stable query IDs; never edit existing queries, only add
- Covers each engine you care about
Data collection
- Agent runs on a schedule (minimum weekly)
- Brand extraction includes position tracking
- Raw answers stored for audit
Storage
- Normalized schema with position weighting
- Aggregation queries for SOV by brand, engine, category
- Historical data retained for trend analysis
Reporting
- Weekly digest to the right people in the right channel
- Anomaly alerts on threshold breaches
- Category breakdown available on demand
Content loop
- Gap queries feed content backlog
- Content changes tracked against SOV movement
Key Takeaways
Automated AI search SOV benchmarking is straightforward to build and pays dividends almost immediately. The manual alternative—spot-checking by hand—misses too much and creates too much variance in the data.
The combination of Claude Code for agent logic, OpenClaw skills for integrations and scheduling, and BotSee for structured AI citation data gives you a complete pipeline without building everything from scratch. Each piece does what it’s best at.
Start with 50 queries, run for four weeks, and you’ll have more useful competitive intelligence than most teams gather in a quarter. Then extend the query library and tighten your content loop.
Rita writes about AI search visibility, agent workflows, and practical tooling for teams building in AI-native environments.