An AI reconciliation copilot — and everything you need to understand it, explain it, and defend it in an eBay “AI builder / internal financial tools” interview.
eBay pays millions of sellers. Someone in finance has to prove that what the books say a seller is owed equals what the payment processor settled equals what actually left the bank. That three-way check is called reconciliation. When the three don’t agree, that’s a break, and a human currently hunts it down by hand in spreadsheets.
Closing Room automates the hunt. A deterministic program (plain code, no AI) matches the millions of clean rows and isolates the handful that don’t reconcile. Only those go to an AI, which explains each one in plain English, cites the exact records it used, and drafts a correction — but never posts anything; a human approves. Crucially, the AI’s “confidence” and “is this grounded in real data” are not the AI’s own claims — they are measured by separate code. That is the whole pitch: in finance, you don’t trust an AI’s word; you verify it mechanically.
You’re a designer who taught yourself to ship real autonomous AI systems. That’s rare and valuable — most “AI builders” can’t design, and most designers can’t build agents. Don’t apologize for “no formal training.” The field moves faster than any curriculum; what matters is that you can take a fuzzy problem, wire an LLM + tools + data into something reliable, and ship it. You’ve done exactly that repeatedly.
Closing Room is chosen specifically because it lets you say something a finance org deeply cares about: an AI that confidently invents a number is worse than no AI at all. The entire design answers that fear. That’s why it beats a generic “chat with your data” dashboard — it proves judgment about reliability, not just wiring.
Imagine you sell on a marketplace. Over a two-week period you make sales, some get refunded, the marketplace charges fees and ad costs, and it withholds sales tax on your behalf. At the end of the period the marketplace owes you a single net payout. Three separate systems record this journey:
Ledger. The marketplace’s own books: “for this period, we owe seller X $Y net” (after fees, refunds, tax, reserves).
Processor. The payment company (Stripe/Adyen): “we settled $Z to that seller,” minus its own processing fee.
Bank. The actual money movement: “a payout of $W hit the bank on this date,” often bundling many settlements.
Reconciliation = proving Ledger = Processor = Bank for every seller, every period. At eBay scale this is enormous, repetitive, and unforgiving: a few cents wrong across millions of rows is both a control failure and, in aggregate, real money. The tedious part isn’t the millions that match; it’s finding and explaining the few that don’t.
Use deterministic logic for what must be auditable. Use the LLM only where judgment and language add value. And never let the model self-certify — measure its trustworthiness with separate code.
Breaking that down:
The exceptions go to Claude (via the claude -p command line) to explain the break and propose a resolution. This runs while building, not during the demo.
You don’t need an accounting background. Here are the only concepts you need, in order.
Never store money as 12.34 (a floating-point number). Binary floating point cannot represent most decimal fractions exactly, so rounding errors creep in. We store 1234 (integer cents) and format to “$12.34” only for display. In a reconciliation tool, a one-cent drift is the exact bug you’re hunting, so you must be exact everywhere.
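A minimal sketch of the integer-cents rule, to make the drift concrete. The names Cents and formatCents are hypothetical, not taken from the Closing Room code.

```typescript
// Money is held as integer minor units (cents); strings appear only at the display edge.
type Cents = number;

// Hypothetical helper: format integer cents as a dollar string for display.
function formatCents(amount: Cents): string {
  if (Math.floor(amount) !== amount) {
    throw new Error(`expected integer cents, got ${amount}`);
  }
  const sign = amount < 0 ? "-" : "";
  const abs = Math.abs(amount);
  const dollars = Math.floor(abs / 100);
  const remainder = abs % 100;
  const cents = remainder < 10 ? "0" + remainder : String(remainder);
  return `${sign}$${dollars}.${cents}`;
}

// Floating-point dollars drift; integer cents do not.
const floatSum = 0.1 + 0.2;      // 0.30000000000000004, not 0.3
const centsSum: Cents = 10 + 20; // exactly 30

console.log(formatCents(1234)); // "$12.34"
```

The helper rejects non-integers on purpose: a fractional cent reaching the display layer means something upstream already did float math.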
For a seller in a period, the ledger computes:
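A minimal sketch of that computation. The exact terms and signs are an assumption pieced together from the items this document names (sales, refunds, fees, ad costs, facilitator tax, reserves). Every field name except netOwed is hypothetical.

```typescript
// All amounts are integer cents. Field names are hypothetical.
interface LedgerPeriod {
  grossSales: number;
  refunds: number;
  marketplaceFees: number;
  adCosts: number;
  facilitatorTax: number;  // sales tax withheld and remitted for the seller
  reserveHeld: number;     // newly held back this period
  reserveReleased: number; // older reserve released this period
}

// Assumed formula: everything the seller earned, minus everything withheld,
// plus any old reserve that comes back this period.
function netOwed(p: LedgerPeriod): number {
  return (
    p.grossSales -
    p.refunds -
    p.marketplaceFees -
    p.adCosts -
    p.facilitatorTax -
    p.reserveHeld +
    p.reserveReleased
  );
}

const example: LedgerPeriod = {
  grossSales: 100000,
  refunds: 5000,
  marketplaceFees: 12000,
  adCosts: 3000,
  facilitatorTax: 8000,
  reserveHeld: 2000,
  reserveReleased: 1500,
};

console.log(netOwed(example)); // 71500, i.e. $715.00
```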
Then reconciliation asks: does netOwed equal what the processor settled, which equals what the bank paid out? Two of those terms deserve explanation because they’re the classic sources of confusion an eBay person will look for:
Marketplaces hold back a slice of your money for a while as protection against future refunds/chargebacks, then release it later. So a payout can be larger than the period’s sales because an old reserve was released — surprising, but correct. That’s a reconciling item, not a break.
eBay collects sales tax and remits it to the government on the seller’s behalf, so it’s withheld from the seller’s payout. If you forget this line, your “what the seller is owed” is wrong — and an eBay finance person would notice instantly.
A single bank payout usually bundles many settlements. So matching isn’t always 1-row-to-1-row; the tool handles many settlements → one payout, and shows a variable number of source records rather than a rigid three-column layout.
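A sketch of the many-to-one case. It assumes each settlement carries the ID of the payout that bundled it (the payoutId field is hypothetical) and that amounts are integer cents.

```typescript
// Hypothetical record shapes; amounts are integer cents.
interface Settlement { id: string; payoutId: string; amount: number }
interface Payout { id: string; amount: number }

interface PayoutCheck {
  payoutId: string;
  settlementIds: string[]; // variable-length list of source records
  settledTotal: number;
  paid: number;
  difference: number;      // paid minus settled; 0 means the bundle reconciles
}

// For each payout, gather the settlements it bundles and compare the totals.
// A linear filter keeps the sketch short; at real volume you would index by payoutId first.
function checkPayouts(settlements: Settlement[], payouts: Payout[]): PayoutCheck[] {
  return payouts.map((payout) => {
    const bundle = settlements.filter((s) => s.payoutId === payout.id);
    const settledTotal = bundle.reduce((sum, s) => sum + s.amount, 0);
    return {
      payoutId: payout.id,
      settlementIds: bundle.map((s) => s.id),
      settledTotal,
      paid: payout.amount,
      difference: payout.amount - settledTotal,
    };
  });
}
```

Returning the list of settlement IDs per payout is what lets a UI show however many source records the payout actually has.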
The generator plants these deliberately. Notice the last column: the same “difference” can demand very different actions, and that nuance is the credibility.
| Item | Bucket | What it is | Right action |
|---|---|---|---|
| Sub-cent rounding | reconciling | Off by 1–2¢ from currency math | No action (within tolerance) |
| Reserve released | reconciling | Old held-back money released this period | No action (explained) |
| Timing lag | reconciling | Settlement lands next window, not yet paid | No action; escalate if it ages out |
| Fee mismatch | exception | Charged 2.9% vs. the 2.5% contract | Adjusting journal entry (correct the fee) |
| FX remeasurement | exception | Exchange rate moved → real gain/loss | Adjusting entry to an FX Gain/Loss account |
| Duplicate settlement | exception | Processor paid the same period twice | Adjusting entry (reverse the duplicate) |
| Missing settlement | exception | Ledger owes, processor never settled | Dispute case — NOT a journal entry |
| Chargeback pending | exception | Payout short by a late chargeback | Dispute case until resolved |
| Unknown / ambiguous | exception | No rule matches deterministically | Route to a human |
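One way to keep that table honest in code is a typed lookup, sketched below. The identifiers (BreakKind, PLAYBOOK, actionFor) are hypothetical. The rows mirror the table above.

```typescript
type Bucket = "reconciling" | "exception";
type Action = "none" | "adjusting_entry" | "dispute_case" | "human_review";

type BreakKind =
  | "rounding"
  | "reserve_released"
  | "timing_lag"
  | "fee_mismatch"
  | "fx_remeasurement"
  | "duplicate_settlement"
  | "missing_settlement"
  | "chargeback_pending"
  | "unknown";

// Record<BreakKind, ...> makes the compiler reject a kind with no row.
const PLAYBOOK: Record<BreakKind, { bucket: Bucket; action: Action }> = {
  rounding: { bucket: "reconciling", action: "none" },
  reserve_released: { bucket: "reconciling", action: "none" },
  timing_lag: { bucket: "reconciling", action: "none" },
  fee_mismatch: { bucket: "exception", action: "adjusting_entry" },
  fx_remeasurement: { bucket: "exception", action: "adjusting_entry" },
  duplicate_settlement: { bucket: "exception", action: "adjusting_entry" },
  missing_settlement: { bucket: "exception", action: "dispute_case" },
  chargeback_pending: { bucket: "exception", action: "dispute_case" },
  unknown: { bucket: "exception", action: "human_review" },
};

function actionFor(kind: BreakKind): Action {
  return PLAYBOOK[kind].action;
}
```

The point of the exhaustive type is that adding a new break kind without deciding its action becomes a compile error rather than a silent gap.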
The match/no-match decision is plain code, reproducible and inspectable. Why: auditors and controllers need to point at an exact rule. “The AI thought so” is not an acceptable basis for a financial control.
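A sketch of what “an exact rule” can look like. The 2-cent tolerance and the rule names are assumptions for illustration. The real matcher also has to handle timing, reserves, and bundled payouts.

```typescript
// Amounts are integer cents. The tolerance value is an assumption for illustration.
const ROUNDING_TOLERANCE_CENTS = 2;

interface MatchResult {
  status: "matched" | "within_tolerance" | "break";
  rule: string | null; // which comparison failed, so a reviewer can point at it
  difference: number;  // signed cents for the failing comparison
}

function threeWayMatch(ledger: number, processor: number, bank: number): MatchResult {
  const ledgerVsProcessor = processor - ledger;
  const processorVsBank = bank - processor;

  if (ledgerVsProcessor === 0 && processorVsBank === 0) {
    return { status: "matched", rule: null, difference: 0 };
  }

  // Report the first leg that disagrees.
  const firstLegFails = ledgerVsProcessor !== 0;
  const rule = firstLegFails ? "ledger_vs_processor" : "processor_vs_bank";
  const difference = firstLegFails ? ledgerVsProcessor : processorVsBank;

  // Tolerance is judged on the larger of the two gaps.
  const worst = Math.max(Math.abs(ledgerVsProcessor), Math.abs(processorVsBank));
  const status = worst <= ROUNDING_TOLERANCE_CENTS ? "within_tolerance" : "break";
  return { status, rule, difference };
}
```

Same inputs always give the same status and the same named rule, which is the property an auditor cares about.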
Every AI answer is mechanically checked before it’s shown: (1) every cited ID actually exists in the data; (2) every cited amount equals the real source value; (3) any proposed journal entry balances (debits = credits) and uses a valid account; (4) the correction’s size matches the measured difference. Why: this converts “the AI cites its sources” from a claim into a tested fact. If the model hallucinates an ID or an unbalanced entry, the validator catches it and the UI shows it in red. We literally plant one bad case to prove the catch works — in the live demo it shows Caught ✕ at confidence 0.35.
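The four checks can be sketched as one function that returns a list of failures. All type and field names are hypothetical. The size check assumes the entry’s total debits should equal the absolute measured difference, which is one plausible reading of check (4).

```typescript
// All names here are hypothetical; amounts are integer cents.
interface SourceRecord { id: string; amount: number }
interface Citation { id: string; amount: number }
interface EntryLine { account: string; debit: number; credit: number }
interface AiAnswer { citations: Citation[]; entry?: EntryLine[] }

function findRecord(records: SourceRecord[], id: string): SourceRecord | undefined {
  for (const r of records) {
    if (r.id === id) return r;
  }
  return undefined;
}

// Returns a list of failures; an empty list means the answer is grounded.
function validateGrounding(
  answer: AiAnswer,
  records: SourceRecord[],
  chartOfAccounts: string[],
  measuredDifference: number,
): string[] {
  const failures: string[] = [];

  for (const c of answer.citations) {
    const source = findRecord(records, c.id);
    if (source === undefined) {
      failures.push(`cited id ${c.id} does not exist`); // check 1
    } else if (source.amount !== c.amount) {
      failures.push(`cited amount for ${c.id} differs from source`); // check 2
    }
  }

  if (answer.entry !== undefined) {
    let debits = 0;
    let credits = 0;
    for (const line of answer.entry) {
      debits += line.debit;
      credits += line.credit;
      if (chartOfAccounts.indexOf(line.account) === -1) {
        failures.push(`unknown account ${line.account}`); // check 3: valid account
      }
    }
    if (debits !== credits) {
      failures.push("entry does not balance"); // check 3: debits = credits
    }
    if (debits !== Math.abs(measuredDifference)) {
      failures.push("entry size does not match measured difference"); // check 4
    }
  }
  return failures;
}
```

Collecting every failure, rather than stopping at the first, gives the reviewer the full picture of what the model got wrong.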
The confidence number is calculated from real signals — does the arithmetic reconcile exactly? did the AI’s root cause match what the deterministic matcher independently suspected? does the entry balance? is the case unambiguous? Why: if you ask an LLM “how confident are you?”, the number is poorly calibrated theatre. A number derived from verifiable properties is defensible. This is the single most impressive answer you can give when a sharp interviewer probes.
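A sketch of a computed score built from those four signals. The weights are made up for illustration and are not the weights Closing Room uses. The shape is the point: every input is a fact other code has already verified.

```typescript
// Each signal is a boolean produced by deterministic code, never by the model.
interface Signals {
  arithmeticReconciles: boolean;
  rootCauseAgreesWithMatcher: boolean;
  entryBalances: boolean;
  unambiguous: boolean;
}

// Hypothetical weights in whole points, summing to 100.
const WEIGHTS: Record<keyof Signals, number> = {
  arithmeticReconciles: 40,
  rootCauseAgreesWithMatcher: 30,
  entryBalances: 20,
  unambiguous: 10,
};

function computeConfidence(signals: Signals): number {
  let points = 0;
  for (const key of Object.keys(WEIGHTS) as Array<keyof Signals>) {
    if (signals[key]) points += WEIGHTS[key];
  }
  return points / 100; // 0 to 1
}
```

Summing whole points and dividing once keeps the score itself free of accumulated float error, in keeping with the integer-cents rule.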
Nothing posts automatically. Proposed corrections sit in an Approve/Reject queue with an audit trail. Why: it mirrors real financial controls (segregation of duties, SOX). The AI is an assistant, not an actor.
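A minimal sketch of such a queue, assuming an in-memory store and a rule that the proposer can never be the reviewer. The class and field names are hypothetical.

```typescript
type Status = "pending" | "approved" | "rejected";
interface Proposal { id: string; proposedBy: string; status: Status }
interface AuditEvent { proposalId: string; actor: string; action: string; at: string }

class ApprovalQueue {
  private proposals: Proposal[] = [];
  readonly audit: AuditEvent[] = []; // append-only trail of who did what, and when

  propose(id: string, proposedBy: string): void {
    this.proposals.push({ id, proposedBy, status: "pending" });
    this.log(id, proposedBy, "proposed");
  }

  decide(id: string, reviewer: string, decision: "approved" | "rejected"): Status {
    const proposal = this.find(id);
    if (proposal.status !== "pending") {
      throw new Error(`${id} is already ${proposal.status}`);
    }
    // Segregation of duties: whoever proposed the entry cannot sign it off.
    if (reviewer === proposal.proposedBy) {
      throw new Error("proposer cannot review their own entry");
    }
    proposal.status = decision;
    this.log(id, reviewer, decision);
    return proposal.status;
  }

  private find(id: string): Proposal {
    for (const p of this.proposals) {
      if (p.id === id) return p;
    }
    throw new Error(`no proposal ${id}`);
  }

  private log(proposalId: string, actor: string, action: string): void {
    this.audit.push({ proposalId, actor, action, at: new Date().toISOString() });
  }
}
```

There is no method that posts an entry without a decide call, so the “assistant, not actor” rule is enforced by the API surface rather than by convention.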
The AI runs while building; the live site serves the cached, already-validated results. Why: two reasons — (a) the demo can never lag, rate-limit, or cost money while you’re screen-sharing; (b) it’s deterministic, so “do it again” always behaves. This is itself a signal of engineering judgment about demos and cost.
People assume “AI feature” means “call an API at runtime.” Here it deliberately doesn’t. The flow:
Each exception is sent to the claude -p command (your plan auth: $0, no API key), asking for a strict JSON answer.
The answer is JSON.parsed and validated against a strict schema (zod). If it’s malformed, it retries up to 3×, then throws loudly; it never silently skips an exception.
Real result on the current data: 6 of 7 exceptions grounded, with the one planted hallucination correctly caught. The 3 confirmed breaks got balanced adjusting entries; the unconfirmed ones got dispute cases.
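The parse, validate, and retry step can be sketched as follows. To stay dependency-free, this version swaps zod for a hand-written type guard and takes the model call as an injected function (ask is a hypothetical stand-in for the claude -p invocation). It also assumes “up to 3×” means three attempts in total, and the Triage shape is illustrative.

```typescript
// Illustrative answer shape; the real schema is not shown in this document.
interface Triage { exceptionId: string; rootCause: string; citedIds: string[] }

// Hand-written stand-in for a zod schema check.
function isTriage(value: unknown): value is Triage {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const cited = v["citedIds"];
  return (
    typeof v["exceptionId"] === "string" &&
    typeof v["rootCause"] === "string" &&
    Array.isArray(cited) &&
    cited.every((x) => typeof x === "string")
  );
}

const MAX_ATTEMPTS = 3; // assumption: three attempts in total

// `ask` returns the model's raw text; in the real build it would wrap claude -p.
function triageWithRetry(ask: () => string): Triage {
  let lastError = "no attempts made";
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      const parsed: unknown = JSON.parse(ask());
      if (isTriage(parsed)) return parsed;
      lastError = "response did not match the expected shape";
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  // Fail loudly: an exception must never be skipped silently.
  throw new Error(`triage failed after ${MAX_ATTEMPTS} attempts: ${lastError}`);
}
```

Throwing after the last attempt stops the build, so a missing explanation can never reach the cached results unnoticed.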
claude -p triage, zod-gated, cached
| Term | Meaning |
|---|---|
| reconciliation | Proving two or more records of the same money agree. |
| 3-way match | Checking ledger = processor = bank for each item. |
| break | A case where the records don’t agree. |
| reconciling item | A difference that is explained and needs no action (e.g. timing). |
| exception | A difference that needs a fix or investigation. |
| settlement | The payment processor’s record of paying a seller. |
| payout | The actual bank money movement (may bundle settlements). |
| reserve | Money held back temporarily as risk protection, released later. |
| facilitator tax | Sales tax the marketplace collects & remits for the seller. |
| adjusting entry | An accounting correction with balanced debits and credits. |
| dispute case | A follow-up for an unconfirmed item (no booking yet). |
| chart of accounts | The fixed list of accounts entries can post to. |
| grounding | Checking the AI’s claims trace to real source data. |
| minor units | Money as whole cents (integers) to avoid rounding drift. |
| claude -p | Running Claude non-interactively from the command line. |
| zod | A library that validates data matches an expected shape. |
| SOX | Sarbanes–Oxley — the law behind strict financial controls. |
Closing Room · internal explainer · generated 2026-07-02 · demo at closing-room.thomaspeng.ca · repo /home/tpeng/closing-room