One sprint · 18 ranked opportunities · 3 pilots in production
The challenge
The COO of a mid-sized DACH insurance group — claims, life, and a small SME commercial book — inherited a 200-page “AI roadmap” from a Big-Four engagement the year before. It had three things in abundance: maturity models, risk matrices, and nice typography. It had two things missing: a single shipped system, and an honest view of what was worth doing first.
The board wanted pressure-tested answers to three questions:
- Where are the top 5 AI opportunities, ranked by real effort and real impact?
- How do we ship something this quarter, not next year, without tripping over the EU AI Act?
- What is the smallest team and budget that can keep this going after the consultants leave?
They did not need another deck. They needed a studio that could think and build in the same room.
Our roadmap
- Phase 01 · Week 1 · Strategy sprint. 18 opportunities benchmarked, 3 prioritised, governance & EU AI Act layer drafted, leadership handover.
- Phase 02 · Days 8–30 · Pilot 1: Claims triage. LLM-assisted claims triage for household & motor. Human-in-the-loop from day one, clean audit trail.
- Phase 03 · Days 31–60 · Pilot 2: Broker assistant. Retrieval-grounded assistant for the broker hotline. Reduced hold time, cited sources on every answer.
- Phase 04 · Days 61–90 · Pilot 3: Policy document QA. Underwriter-facing QA on 80k policy documents, with source-linked answers and refusal when unsure.
How we shipped it
Week 1 — A strategy sprint that leaves code behind
Our strategy sprint runs for five working days, on-site or hybrid. Not a 200-page deck — a short, ruthless document and a working prototype of the top-ranked opportunity.
What we did:
- Day 1: Interviewed 12 people across claims, broker operations, underwriting, compliance, and IT. Pulled real data on volumes, current tools, and pain points — not the sanitised version from the intranet.
- Day 2: Benchmarked 18 candidate use cases on a 4-axis scoring rubric: expected impact, data readiness, regulatory risk, time-to-value. Threw out eight that were re-branded OCR projects.
- Day 3: Ran a 90-minute governance workshop with legal and compliance. Sketched the EU AI Act classification for each remaining opportunity (limited vs. high risk), and what that actually meant for controls, documentation, and human oversight.
- Day 4: Shipped a working prototype of the top-ranked opportunity (claims triage) — crude but real, against anonymised sample data, so the room could stop arguing about hypotheticals.
- Day 5: Leadership handover. A 14-page written brief. A ranked backlog. A governance frame. A recommended team shape. A quote for Pilot 1.
The board signed off on three pilots and a rolling governance model on the Monday of week 2.
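For readers who want to see the Day 2 rubric in concrete terms, it can be sketched as a weighted score over the four axes. This is a minimal sketch, not the scoring model used in the engagement: the weights, the 1-to-5 scale, and the example scores are all illustrative assumptions, and regulatory risk is inverted so that lower risk scores higher.

```python
from dataclasses import dataclass

# Hypothetical weights: the case study names the four axes but not their weighting.
WEIGHTS = {
    "impact": 0.35,
    "data_readiness": 0.25,
    "regulatory_risk": 0.20,
    "time_to_value": 0.20,
}


@dataclass(frozen=True)
class Opportunity:
    name: str
    impact: int           # 1 (low) to 5 (high)
    data_readiness: int   # 1 (poor) to 5 (ready)
    regulatory_risk: int  # 1 (low risk) to 5 (high risk)
    time_to_value: int    # 1 (slow) to 5 (fast)


def score(opp: Opportunity) -> float:
    """Weighted score on a 1-5 scale; risk is inverted so lower risk scores higher."""
    axes = {
        "impact": opp.impact,
        "data_readiness": opp.data_readiness,
        "regulatory_risk": 6 - opp.regulatory_risk,
        "time_to_value": opp.time_to_value,
    }
    return round(sum(WEIGHTS[axis] * value for axis, value in axes.items()), 2)


def rank(opps: list[Opportunity]) -> list[Opportunity]:
    """Highest score first."""
    return sorted(opps, key=score, reverse=True)


# Made-up scores, for illustration only.
candidates = [
    Opportunity("Policy document QA", 4, 3, 2, 3),
    Opportunity("Broker assistant", 4, 4, 2, 4),
    Opportunity("Claims triage", 5, 4, 2, 4),
]

for opp in rank(candidates):
    print(f"{score(opp):.2f}  {opp.name}")
```

Keeping the weights in one visible table is the point of the exercise: the room can argue about a weight instead of arguing about a ranking.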
Days 8–30 — Pilot 1: Claims triage
The pilot targeted household and motor claims, the two highest-volume lines. The agent reads the first-notice-of-loss submission, cross-references policy coverage, flags likely fraud indicators, and routes the claim to the right queue with a confidence score.
What made it work:
- Human-in-the-loop from day one. No auto-approvals. Adjusters saw the agent’s recommendation, accepted it in one keystroke, and kept every decision as their own.
- Every recommendation cited its sources — which clause, which previous claim, which fraud signal. Compliance could audit any decision in seconds.
- Refusal was a first-class feature. On 6% of claims the agent explicitly declined to recommend, and humans handled them untouched.
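The three properties above (human decision, cited sources, explicit refusal) can be expressed as a small data contract. The sketch below is an assumption-laden illustration rather than the pilot's code: the confidence threshold, the field names, and the queue names are hypothetical, and the model call itself is left out.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical cut-off: the case study reports a refusal rate, not a threshold.
MIN_CONFIDENCE = 0.70


@dataclass
class Recommendation:
    claim_id: str
    queue: Optional[str]  # None means the agent declined to recommend
    confidence: float
    sources: list[str] = field(default_factory=list)


def recommend(claim_id: str, queue: str, confidence: float,
              sources: list[str]) -> Recommendation:
    """Decline when confidence is low or there is nothing to cite."""
    if confidence < MIN_CONFIDENCE or not sources:
        return Recommendation(claim_id, None, confidence, sources)
    return Recommendation(claim_id, queue, confidence, sources)


def decide(rec: Recommendation, adjuster: str, chosen_queue: str) -> dict:
    """The adjuster always makes the final call; the returned record is the audit entry."""
    return {
        "claim_id": rec.claim_id,
        "recommended_queue": rec.queue,
        "final_queue": chosen_queue,
        "decided_by": adjuster,
        "accepted_recommendation": rec.queue is not None and rec.queue == chosen_queue,
        "sources": rec.sources,
    }


# Illustrative values only.
confident = recommend("C-1", "motor-standard", 0.91, ["policy clause 4.2"])
unsure = recommend("C-2", "household-fraud-review", 0.40, ["fraud signal: repeat address"])

print(decide(confident, "adjuster-17", "motor-standard"))
print(decide(unsure, "adjuster-17", "household-standard"))
```

Note that there is no code path that routes a claim without an adjuster: `decide` is the only function that produces a final queue, and it requires a named person.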
Days 31–60 — Pilot 2: Broker assistant
The broker hotline handled 1,400 calls a day, most of them asking the same fifty questions, with the answers spread across 12,000 pages of product documentation. We built a retrieval-grounded assistant for the hotline staff answering the phones: always cited, never inventive, and quick to refuse cleanly when it doesn’t know.
- Average hold time dropped from 4m 20s to 2m 10s on covered topics.
- Hotline staff satisfaction scores went up; they described the tool as “a colleague who actually read the product manual.”
- Every answer cited the source document and section. Compliance loved it.
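The "always cited, refuses cleanly" behaviour can be illustrated with a toy retriever. This is a sketch under stated assumptions: plain token overlap stands in for whatever retrieval method the pilot actually used, and the document names, sections, passages, and the `min_overlap` threshold are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Passage:
    document: str
    section: str
    text: str


def tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer(question: str, corpus: list[Passage], min_overlap: float = 0.5) -> dict:
    """Return the best-matching passage with its citation, or refuse."""
    q = tokens(question)
    best: Optional[Passage] = None
    best_score = 0.0
    for passage in corpus:
        # Share of question tokens that also appear in the passage.
        overlap = len(q & tokens(passage.text)) / len(q) if q else 0.0
        if overlap > best_score:
            best, best_score = passage, overlap
    if best is None or best_score < min_overlap:
        return {"answer": None, "citation": None, "refused": True}
    return {
        "answer": best.text,
        "citation": f"{best.document}, {best.section}",
        "refused": False,
    }


# Hypothetical passages, not taken from any real product documentation.
corpus = [
    Passage("Household Product Guide", "Section 3.1",
            "Water damage from burst pipes is covered up to the insured sum."),
    Passage("Motor Product Guide", "Section 5.4",
            "Windscreen repair is covered without affecting the no-claims bonus."),
]

print(answer("Is water damage from burst pipes covered?", corpus))
print(answer("What is the surrender value of a life policy?", corpus))
```

The second question falls below the overlap threshold, so the assistant returns a refusal instead of the nearest passage.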
Days 61–90 — Pilot 3: Policy document QA
Underwriters spent hours per day searching across 80,000 policy documents: legacy PDFs, scanned contracts, email addenda. We shipped a document-QA interface that answers natural-language questions, cites the exact paragraph each answer comes from, and refuses to answer when the source is ambiguous.
The refusal policy was the differentiator. Underwriters told us that a tool that admits it doesn’t know is more trustworthy than one that bluffs, and they spent meaningfully more time using it as a result.
The governance layer
Everything above shipped inside a governance frame the board could actually sign. It was short:
- Classification: Each pilot was tagged limited-risk under the EU AI Act, with documented rationale.
- Human oversight: Mandatory for every live decision. No silent auto-actions in Y1.
- Transparency: Every output cited its sources. Every prompt template was versioned in git.
- Evaluation: A gold set of 500 examples per pilot, run as a regression suite in CI and reviewed monthly by a small internal AI council.
- Right to pause: Compliance, legal, or the line-of-business owner can each pause a pilot within 15 minutes. None of them has had to.
The same frame now governs pilots 4 and 5, without re-asking the board.
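The evaluation item in the frame amounts to a regression gate: run the gold set on every change and fail the build if quality drops. A minimal sketch follows, assuming exact-match labels and a hypothetical pass threshold; the toy predictor and the two gold examples are placeholders for a real pilot and its 500-example set.

```python
from typing import Callable

# Hypothetical gate: the case study does not state a pass threshold.
MIN_ACCURACY = 0.95


def evaluate(predict: Callable[[str], str], gold_set: list[tuple[str, str]]) -> float:
    """Fraction of gold examples where the prediction matches the expected label."""
    if not gold_set:
        raise ValueError("gold set is empty")
    hits = sum(1 for text, expected in gold_set if predict(text) == expected)
    return hits / len(gold_set)


def check_regression(predict: Callable[[str], str],
                     gold_set: list[tuple[str, str]],
                     min_accuracy: float = MIN_ACCURACY) -> float:
    """Raise AssertionError so that a CI job fails when accuracy drops below the gate."""
    accuracy = evaluate(predict, gold_set)
    if accuracy < min_accuracy:
        raise AssertionError(
            f"accuracy {accuracy:.3f} is below the gate {min_accuracy:.3f}"
        )
    return accuracy


def toy_predict(text: str) -> str:
    """Stand-in for the real pilot: routes on a single keyword."""
    return "motor" if "collision" in text else "household"


# Two placeholder examples; a real gold set would be loaded from version control.
gold = [
    ("burst pipe in kitchen", "household"),
    ("rear-end collision at junction", "motor"),
]

print(check_regression(toy_predict, gold))
```

Because prompt templates are versioned in git, a gate like this can run on every template change, which is what lets a monthly council review trends rather than individual failures.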
The results
After 90 days, measured across all three pilots:
- 3 pilots live in production, not in a lab.
- 18 opportunities benchmarked with honest scoring — 11 parked deliberately, 4 killed, 3 shipped.
- Projected Y1 savings of €420k from the claims-triage pilot alone, conservatively estimated on adjuster time.
- Broker hotline hold time halved on covered topics.
- Zero regulatory incidents. The governance frame later passed a BaFin informational review in month four.
And the most important number, from the COO: “We now have a team of four people who can keep this running without you. That is what we actually bought.”
What’s next
Pilots 4 and 5 (an underwriting assistant for SME commercial lines and complaint classification for the ombudsman team) are in discovery. Because the governance layer, evaluation infrastructure, and retrieval tooling are already in place, each new pilot starts with roughly 40% of the work already done.
If you have a multi-hundred-page AI strategy and nothing in production, we can help you turn it into a short ranked list and a shipped pilot in five weeks. If you don’t have a strategy yet, we can help you skip straight to the shipped pilot. Either way, we’ll be honest about what’s worth doing.