Last run: 2026-04-17 21:15 UTC · 4 models
The same 28 factual questions, put to four frontier models. Each model runs twice: once from memory alone ("cold"), once with access to our MCP server as its only tool. Ground truth is computed live from our database at the moment each question is asked.
The best model scored 8% cold — with our MCP server it scored 42%. (Best MCP score across all models: 50%.)
That gap is what an MCP server buys you: turning confident guesswork into live, verifiable data.
Each model answers the 28 questions twice: once with no tools (pure parametric knowledge), once with our MCP server wired up as its only tool. Ground truth is computed at run time by querying our D1 database directly, so the benchmark tests against current data, not a frozen snapshot. Scoring rules:

- Numeric answers: correct on an exact match, or within ±5% for averages and percentages.
- List answers: set overlap, correct when the Jaccard index is ≥ 0.8.
- Free-text answers: graded by Claude Haiku 4.5 as an LLM-as-judge.
- Exclusions: a question drops out of the denominator when its ground-truth query returns empty at run time (e.g. a not-yet-populated sector table), or when the model API fails after 3 retries — so a network blip or a missing data slice doesn't count against a model's accuracy. The scored denominator is shown next to each percentage.

The full question set and scoring code are in the repo.
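The numeric and list rules above can be sketched in a few lines. This is an illustrative reconstruction, not the repo's actual scorer; the function names and the exact tolerance handling are assumptions.

```python
# Sketch of the numeric and list scoring rules described above.
# Assumption: the ±5% tolerance applies only to averages/percentages,
# and is measured relative to the ground-truth value.

def score_numeric(answer: float, truth: float, is_average: bool = False) -> bool:
    """Exact match, or within ±5% of truth for averages/percentages."""
    if answer == truth:
        return True
    if is_average and truth != 0:
        return abs(answer - truth) / abs(truth) <= 0.05
    return False

def score_list(answer: set, truth: set) -> bool:
    """Set overlap: Jaccard index >= 0.8 counts as correct."""
    union = answer | truth
    if not union:  # both empty: trivially a perfect match
        return True
    return len(answer & truth) / len(union) >= 0.8
```

Note that a 4-of-5 list answer ({A, B, C, D} against truth {A, B, C, D, E}) has a Jaccard index of exactly 0.8 and so just clears the bar.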
Sorted by cold-mode accuracy — how well each model does without tools. "MCP" is the same model, same question, but with access to our MCP server.
| # | Model | Cold | With MCP | Delta | Last run |
|---|---|---|---|---|---|
| 1 | grok-4 | 8% (2/26) | 42% (10/24) | +34 pts | Apr 17, 2026 UTC |
| 2 | gpt-5 | 4% (1/26) | 38% (10/26) | +35 pts | Apr 17, 2026 UTC |
| 3 | claude-opus-4-6 | 0% (0/25) | 48% (12/25) | +48 pts | Apr 17, 2026 UTC |
| 4 | claude-sonnet-4-6 | 0% (0/25) | 50% (13/26) | +50 pts | Apr 17, 2026 UTC |
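The delta column appears to be computed from the unrounded per-question fractions rather than the rounded percentages — that is my reading, not something stated in the source, but it would explain why gpt-5 shows +35 even though 38 − 4 = 34:

```python
# gpt-5's row, using the raw correct/total counts from the table.
cold_correct, cold_total = 1, 26    # 1/26 ≈ 3.85%, displayed as 4%
mcp_correct, mcp_total = 10, 26     # 10/26 ≈ 38.46%, displayed as 38%

delta = (mcp_correct / mcp_total - cold_correct / cold_total) * 100
print(round(delta))  # 35 — matches the table's +35 pts
```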
Note: MCP tool fixes were deployed 2026-04-18 (UTC) (commit 203edcf, Worker version e0d54de0). Their effect is not yet measured — the leaderboard above predates this deploy.
Where models fall down without tools. Each cell shows correct/total for that category.
| Model | Aggregate | Chamber & Party | Member-Level | Ticker-Level | Committee-Level |
|---|---|---|---|---|---|
| claude-opus-4-6 | 0% (0/3) | 0% (0/4) | 0% (0/10) | 0% (0/4) | 0% (0/4) |
| claude-sonnet-4-6 | 0% (0/3) | 0% (0/4) | 0% (0/11) | 0% (0/3) | 0% (0/4) |
| gpt-5 | 0% (0/3) | 25% (1/4) | 0% (0/11) | 0% (0/4) | 0% (0/4) |
| grok-4 | 0% (0/3) | 25% (1/4) | 9% (1/11) | 0% (0/4) | 0% (0/4) |
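A category cell like the ones above is just a grouped count of per-question results. A minimal sketch, assuming a hypothetical record format (the field names `model`, `category`, and `correct` are illustrative, not the repo's schema):

```python
from collections import defaultdict

# Hypothetical per-question results; the real benchmark has 28 per model.
results = [
    {"model": "grok-4", "category": "Member-Level", "correct": True},
    {"model": "grok-4", "category": "Member-Level", "correct": False},
    {"model": "grok-4", "category": "Aggregate", "correct": False},
]

# (model, category) -> [correct, total]
cells = defaultdict(lambda: [0, 0])
for r in results:
    cell = cells[(r["model"], r["category"])]
    cell[0] += r["correct"]  # bool counts as 0 or 1
    cell[1] += 1

for (model, category), (ok, total) in sorted(cells.items()):
    print(f"{model} · {category}: {ok}/{total} ({ok / total:.0%})")
```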
These are the questions where parametric knowledge fails universally — and where MCP tools earn their keep.
One MCP endpoint. 12 tools. Same data that powers this benchmark. Works with Claude Desktop, Cursor, Claude Code, and any MCP-compatible client.