AI Build · 2026
An agentic client-reporting system
A Seattle digital marketing agency spent a team-week every month hand-building performance reports for roughly 40 clients across paid search, paid social, organic social, and SEO. Over about a month I replaced that workflow with a fleet of agentic report skills that pull live data, do the math, render branded deliverables, file them into a Drive-synced archive, and run on a monthly schedule. The account manager is left with exactly the work that needs a human: judgment, customization, and delivery.
Built in Claude Cowork over ~40 working sessions. Agency and client names withheld.
45+
Reports / May 2026
5
Production skills
~40
Client accounts
0
Fabricated numbers
For May 2026 alone: 17 PPC reports, 13 paid social, 10 SEO, plus organic social and LinkedIn — across ~40 accounts, ~63 connected analytics properties, and 48 Search Console properties.
The problem
A team-week, every month, before the real work started.
Forty clients, multiple channels each — Google Ads, Microsoft Ads, Meta, LinkedIn, organic social, SEO. Every month that meant manual data pulls, manual spreadsheet and dashboard assembly, and a stack of account-manager hours spent transcribing numbers instead of interpreting them. The goal was to make the machine do the pulling, the math, the writing, the rendering, and the filing — and give the humans back the part only humans do well.
The system
One config sheet in. A branded report out. On a schedule.
A master Google Sheet is the single source of truth — every client, their active channels, and their platform account IDs. A scheduled trigger invokes the matching report skill, which reads the sheet and runs the client through a three-phase pipeline. Each report type is a packaged skill: a workflow spec with its guardrails, reference docs, and the Python scripts that do the deterministic work.
Five batches run staggered through the 1st of the month, a 9 PM sweeper rebuilds anything that was missed, and a pre-flight check runs a few business days before month end to confirm connectors, auth, and config are healthy — so the 1st runs clean.
The model is restricted to the accented step. Scripts bracket it on both sides.
The engineering
A deterministic pipeline the LLM can't corrupt.
The first version asked the model to do everything — pull, track which response belonged to which client, compute the deltas, write the prose. It burned tokens transcribing raw API responses through context, and once it mismatched responses to clients because it was holding the mapping in its head. The rebuild splits the work into three phases and lets the model touch only the middle one.
01
Plan — a script
Given a client and a period, it emits a flat list of every data call needed, each one pinned in advance to a deterministic filename. The filename is the contract.
02
Execute — the LLM
The model reads the plan, makes each call, and writes the raw response verbatim to its pre-assigned file. No parsing, no summarizing, no state in its head.
03
Assemble — a script
Parses the raw files, computes every month-over-month and year-over-year figure client-side, runs the cross-checks, and renders the branded document.
Because the response-to-client binding is set before any call is made, the mismatch bug cannot recur. The model keeps only the three jobs it's actually good at: resolving client names, writing the analysis prose, and drafting optional account-manager notes. Everything that has to be exact is Python. On its validation run the new pipeline built five reports from 45 planned data pulls with zero transcription errors.
Design principles
What the build taught me about shipping agents into production.
Never fabricate.
Expired auth, a dormant account, an unsupported metric, an API still rolling out — each one renders an explicit fallback or halts the run. It refused to build a report for an account with zero activity since 2023 because a PDF full of zeros helps no one. The guardrail held on day one and never bent.
Push everything deterministic into scripts.
The model never holds intermediate state — no totals, no mappings, no percent math. Files on disk carry the bindings between phases; filenames encode meaning. Math in a script is reviewable, testable, reproducible. When it breaks you debug Python, not a chain of reasoning.
Bind outputs to destinations before the call.
An early run mismatched API responses to the wrong clients because the model tracked the mapping in its head. The fix: the plan assigns each response’s filename in advance, so the model can’t misfile what it hasn’t seen yet. The bug became structurally impossible.
Validate identity, not just math.
Shipping another client’s numbers is worse than a math error. Every build token-matches the pulled account name to the client name and checks each row’s account ID, aborting with a diagnostic on mismatch — with a logged override for the legitimate exceptions.
Cross-check and warn loudly.
Campaign-row sums are reconciled against account totals on every run. In validation it surfaced a benign $10.51 (0.22%) discrepancy in the upstream platform’s own near-real-time data — exactly the kind of thing that should announce itself instead of hiding.
Match the source the client already sees.
Mid-project I migrated the data layer to the same analytics platform that powers the dashboards clients log into — so the reports agree with what the client sees, instead of quietly disagreeing with it.
What it surfaced
Automating the reports turned up money no one was looking at.
Because every run validated against live connections instead of trusting the config, the system kept finding things a human pass had missed:
- Live ad accounts the config never listed — one running $14.8K/month — surfaced by matching against active platform connections instead of the spreadsheet.
- A client quietly spending with zero conversion tracking, flagged for an audit instead of papered over with a clean-looking report.
- A LinkedIn campaign that looked dead in native reporting — 0 conversions — but had driven 90 conversions visible only in the analytics layer, at a cost-per-result down 52% month over month. Without that second source the campaign looked like it produced nothing.
- Several config errors caught by reconciliation — a transposed account ID, a stale file pointer, a dead analytics property after a site migration — each proposed back to a human rather than silently “fixed.”
Where the human stays
The agent does the labor. The marketer does the judgment.
Automated: the data pulls, the math, the rendering, the filing, the scheduling, and the nightly sweep. Still human, by design: clearing pre-flight blockers, owning the master config, QA on each report, tailoring the analysis, filling the customization markers, adding the data the connectors can't reach, and actually sending the report to the client.
The reports proactively flag tracking changes, small-base percentage swings, and paid-versus-organic contamination — so the numbers are never oversold. Honest analysis beats flattering analysis.
Build one for your org
This is what marketing leadership looks like now.
The same person can make the strategic calls and build the system that measures them. If your team is spending its best hours assembling reports instead of reading them — or if there's an agentic system you've been meaning to stand up — that's the conversation. Thirty minutes, and I'll come with a point of view.
Or write me directly: hello@joekim.info