AI Build · 2026

An agentic client-reporting system

A Seattle digital marketing agency spent a team-week every month hand-building performance reports for roughly 40 clients across paid search, paid social, organic social, and SEO. Over about a month I replaced that workflow with a fleet of agentic report skills that pull live data, do the math, render branded deliverables, file them into a Drive-synced archive, and run on a monthly schedule. The account manager is left with exactly the work that needs a human: judgment, customization, and delivery.

Built in Claude Cowork over ~40 working sessions. Agency and client names withheld.

45+

Reports / May 2026

5

Production skills

~40

Client accounts

0

Fabricated numbers

For May 2026 alone: 17 PPC reports, 13 paid social, 10 SEO, plus organic social and LinkedIn — across ~40 accounts, ~63 connected analytics properties, and 48 Search Console properties.

The problem

A team-week, every month, before the real work started.

Forty clients, multiple channels each — Google Ads, Microsoft Ads, Meta, LinkedIn, organic social, SEO. Every month that meant manual data pulls, manual spreadsheet and dashboard assembly, and a stack of account-manager hours spent transcribing numbers instead of interpreting them. The goal was to make the machine do the pulling, the math, the writing, the rendering, and the filing — and give the humans back the part only humans do well.

The system

One config sheet in. A branded report out. On a schedule.

A master Google Sheet is the single source of truth — every client, their active channels, and their platform account IDs. A scheduled trigger invokes the matching report skill, which reads the sheet and runs the client through a three-phase pipeline. Each report type is a packaged skill: a workflow spec with its guardrails, reference docs, and the Python scripts that do the deterministic work.

Five batches run staggered through the 1st of the month, a 9 PM sweeper rebuilds anything that was missed, and a pre-flight check runs a few business days before month end to confirm connectors, auth, and config are healthy — so the 1st runs clean.

ConfigMaster Google Sheetdrives the batchPhase 1 · scriptPlanlists every call,pins each to a filenamePhase 2 · LLMExecutecalls MCP, writes rawresponses verbatimraw filesPhase 3 · scriptsAssemble + rendermath · cross-checks→ branded DOCXSources · via MCP connectorsAgencyAnalytics — paid + socialGA4 · Search Console · Meta Adslive pullsOutputBranded DOCXsyncHosts in cloudGoogle DriveReview & deliverAccount managerMissing or mismatched data renders an explicit fallback or halts the run — numbers are never fabricated.

The model is restricted to the accented step. Scripts bracket it on both sides.

The engineering

A deterministic pipeline the LLM can't corrupt.

The first version asked the model to do everything — pull, track which response belonged to which client, compute the deltas, write the prose. It burned tokens transcribing raw API responses through context, and once it mismatched responses to clients because it was holding the mapping in its head. The rebuild splits the work into three phases and lets the model touch only the middle one.

  1. 01

    Plan — a script

    Given a client and a period, it emits a flat list of every data call needed, each one pinned in advance to a deterministic filename. The filename is the contract.

  2. 02

    Execute — the LLM

    The model reads the plan, makes each call, and writes the raw response verbatim to its pre-assigned file. No parsing, no summarizing, no state in its head.

  3. 03

    Assemble — a script

    Parses the raw files, computes every month-over-month and year-over-year figure client-side, runs the cross-checks, and renders the branded document.

Because the response-to-client binding is set before any call is made, the mismatch bug cannot recur. The model keeps only the three jobs it's actually good at: resolving client names, writing the analysis prose, and drafting optional account-manager notes. Everything that has to be exact is Python. On its validation run the new pipeline built five reports from 45 planned data pulls with zero transcription errors.

Design principles

What the build taught me about shipping agents into production.

What it surfaced

Automating the reports turned up money no one was looking at.

Because every run validated against live connections instead of trusting the config, the system kept finding things a human pass had missed:

Where the human stays

The agent does the labor. The marketer does the judgment.

Automated: the data pulls, the math, the rendering, the filing, the scheduling, and the nightly sweep. Still human, by design: clearing pre-flight blockers, owning the master config, QA on each report, tailoring the analysis, filling the customization markers, adding the data the connectors can't reach, and actually sending the report to the client.

The reports proactively flag tracking changes, small-base percentage swings, and paid-versus-organic contamination — so the numbers are never oversold. Honest analysis beats flattering analysis.

Build one for your org

This is what marketing leadership looks like now.

The same person can make the strategic calls and build the system that measures them. If your team is spending its best hours assembling reports instead of reading them — or if there's an agentic system you've been meaning to stand up — that's the conversation. Thirty minutes, and I'll come with a point of view.

Or write me directly: hello@joekim.info