How We Verify a Casino Bonus
From scrape to publish in 11 stages
The data pipeline behind BonusWijs.nl. Built and refined over a month of adversarial review by six frontier LLMs. Now running daily across 30 KSA-licensed operators.
The Problem
KSA-licensed operators publish their bonus terms in long Dutch legal text. Match percentages buried in paragraph four. Wagering requirements expressed three different ways on the same page. "Choice bonuses" where the player picks between alternative arms (a 100% match up to €250 OR 100 free spins) look like contradictions to anything trying to extract a single value.
Affiliates and comparison sites have two options. Republish the operator's marketing copy unchanged: low-trust, high-throughput, what most of the market does. Or maintain the data by hand: slow, error-prone, doesn't scale past a handful of operators.
Both fail under the regulatory standard the KSA tightened in 2024. Display anything that doesn't match the live source page and you're not just risking your reputation. You're inviting a regulator's letter.
We needed a third option. Something that could read the page like a human, encode the structure like a database, and refuse to publish anything it wasn't sure about.
The Shape of the System
Stages 01-04 reduce noise. Stages 05-07 decide what's safe. Stages 08-11 publish, monitor, and recover. The flow runs in one direction, with documented hand-offs.
Reduce
01 - 04
Scrape, extract, compare, suppress.
Decide
05 - 07
Auto-update what's safe, escalate what isn't, validate everything before publish.
Publish & Watch
08 - 11
Ship clean runs, fail closed on staleness, queue new arms, guard against hallucination.
The 11 Stages
Each stage solves one problem and hands off to the next. Together they take a bonus from operator page to published data with a documented audit trail at every step.
Daily scrape
Every morning at 06:00, the official bonus terms page of every KSA-licensed operator gets pulled. Playwright with stealth flags, Dutch geo via NordVPN, retries on transient failures. 30 operators, fully unattended.
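The retry policy is the interesting part of an unattended run. In sketch form (the function and parameter names are illustrative, and the actual Playwright/VPN driving is omitted), transient failures get exponential backoff before the run gives up:

```typescript
// Retry a scrape on transient failure with exponential backoff.
// Sketch of the retry policy only; the real job drives Playwright
// with stealth flags behind a Dutch exit node.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Backoff: 1s, 2s, 4s, ...
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
      }
    }
  }
  throw lastError;
}
```

A failed operator after all attempts surfaces in the run report rather than silently dropping out.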
LLM extraction
Claude reads the scraped HTML and returns structured fields: match percentage, max bonus, free spins, wagering requirement, wagering target, time limits, bonus type. The model sees the page; the schema sees only typed values.
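The contract between model and pipeline looks roughly like this (field names are illustrative; the production schema is richer):

```typescript
// The extractor's output contract: the model reads messy Dutch HTML,
// but everything downstream sees only these typed fields.
type BonusType = "deposit_match" | "free_spins" | "no_deposit" | "combined";

interface ExtractedBonus {
  matchPercent: number | null;       // e.g. 100 for "100% tot €250"
  maxBonusEur: number | null;        // cap on the match, in euros
  freeSpins: number | null;
  wageringMultiplier: number | null; // e.g. 35 for "35x"
  wageringTarget: "bonus" | "deposit_plus_bonus" | null;
  timeLimitDays: number | null;
  bonusType: BonusType;
}

// A null means "not stated on the page", never "zero".
const example: ExtractedBonus = {
  matchPercent: 100,
  maxBonusEur: 250,
  freeSpins: null,
  wageringMultiplier: 35,
  wageringTarget: "bonus",
  timeLimitDays: 30,
  bonusType: "deposit_match",
};
```

The null-vs-zero distinction matters downstream: a missing wagering clause is a gap to investigate, not a claim of wager-free.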
Choice-arm-aware comparison
Extracted values get compared against the stored data, but the comparator knows about "keuzebonussen" (choice offers). When a player can pick between alternative arms, each arm is structurally encoded. Matching either one counts as confirmed.
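The core of the comparator, sketched with simplified arm shapes (illustrative, not the production types):

```typescript
// A stored "keuzebonus" is a set of alternative arms. An extracted
// value confirms the offer if it matches ANY one of them.
interface Arm {
  matchPercent: number | null;
  maxBonusEur: number | null;
  freeSpins: number | null;
}

function armMatches(extracted: Arm, stored: Arm): boolean {
  return (
    extracted.matchPercent === stored.matchPercent &&
    extracted.maxBonusEur === stored.maxBonusEur &&
    extracted.freeSpins === stored.freeSpins
  );
}

// "confirmed" if the extraction matches any stored arm; otherwise it's
// a real discrepancy, or a candidate new arm (see stage 10).
function compareChoiceOffer(
  extracted: Arm,
  storedArms: Arm[],
): "confirmed" | "flag" {
  return storedArms.some((arm) => armMatches(extracted, arm))
    ? "confirmed"
    : "flag";
}
```

A naive single-value comparator would flag every choice offer on every run; matching per arm is what makes the daily diff quiet.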
Suppression filter
Known LLM extraction artifacts get auto-resolved: prose-as-null misreads, choice-arm confusion, "combined" misclassification on multi-arm offers. Nine recurring patterns are filtered automatically so they don't reach a human.
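Each suppression rule recognizes one artifact. Two of the nine patterns, sketched (the predicates here are simplified stand-ins for the real checks):

```typescript
// A flag is a disagreement between extraction and stored data that
// would otherwise go to a human.
interface Flag {
  field: string;
  extracted: unknown;
  stored: unknown;
  offerIsMultiArm: boolean;
}

type SuppressionRule = { name: string; matches: (f: Flag) => boolean };

const rules: SuppressionRule[] = [
  {
    // The model read surrounding prose and returned null for a
    // value that is still plainly on the page.
    name: "prose-as-null",
    matches: (f) => f.extracted === null && f.stored !== null,
  },
  {
    // The model labeled a multi-arm choice offer as "combined".
    name: "combined-misclassification",
    matches: (f) =>
      f.field === "bonusType" && f.extracted === "combined" && f.offerIsMultiArm,
  },
];

// Only flags that no rule explains reach a human.
function survivingFlags(flags: Flag[]): Flag[] {
  return flags.filter((f) => !rules.some((r) => r.matches(f)));
}
```

Every suppressed flag is still logged, so a rule that starts eating real signal is auditable after the fact.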
Auto-update for safe fields
Fields where the extractor agrees with high confidence and the impact is non-bonus-critical (time limits, formatting) auto-apply. Bonus-critical fields (match %, max bonus, free spins, wagering, type) never auto-apply. Ever.
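The gate is a denylist first, a confidence check second. A minimal sketch (field names and the 0.95 threshold are illustrative assumptions, not the production values):

```typescript
// Bonus-critical fields never auto-apply, at any confidence.
const BONUS_CRITICAL = new Set([
  "matchPercent",
  "maxBonusEur",
  "freeSpins",
  "wageringMultiplier",
  "wageringTarget",
  "bonusType",
]);

function canAutoApply(field: string, confidence: number): boolean {
  if (BONUS_CRITICAL.has(field)) return false; // hard denylist, no override
  return confidence >= 0.95; // high-agreement threshold (assumed value)
}
```

Ordering matters: the denylist is checked before confidence, so no extractor score, however high, can push a bonus-critical change through.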
Human-in-the-loop on real flags
Anything that survives suppression and isn't auto-applicable goes to manual review against the live source page. Each decision is recorded in public verification notes: what changed, why, with a citation to the operator's page.
4-layer validation gate
Before anything publishes: zod-typed schema guards with hard ceilings, a heuristic validator that catches anomalies (wagering on a no-deposit bonus = warn), a verification-notes pre-commit hook that requires documented evidence for bonus-critical changes, and regression tests that catch every past incident.
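The heuristic layer is the one a plain schema can't replace: it judges combinations, not single fields. One check from it, sketched (the shapes and the €10,000 ceiling are illustrative assumptions):

```typescript
interface BonusRecord {
  bonusType: string;
  wageringMultiplier: number | null;
  maxBonusEur: number | null;
}

type Verdict = { level: "ok" | "warn" | "block"; reason?: string };

function heuristicValidate(b: BonusRecord): Verdict {
  // Wagering on a no-deposit bonus is unusual: warn, and let a human
  // confirm against the live source page.
  if (b.bonusType === "no_deposit" && b.wageringMultiplier !== null) {
    return { level: "warn", reason: "wagering on a no-deposit bonus" };
  }
  // Hard-ceiling example: an implausibly large cap blocks outright.
  if (b.maxBonusEur !== null && b.maxBonusEur > 10_000) {
    return { level: "block", reason: "max bonus above hard ceiling" };
  }
  return { level: "ok" };
}
```

The schema layer rejects malformed data; this layer rejects well-formed data that can't be true.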
Auto-publish, auto-deploy
Clean runs commit, bump the package version, publish to GitHub Packages, bump the downstream consumer (BonusWijs.nl), and trigger a Vercel deploy. On a green run, the end-to-end loop completes without human hands.
Safety valves
Publish-block if a displayed bonus has gone unconfirmed for more than 30 days. A per-arm staleness watchdog at 60, 90, 180 days. Fail-closed semantics: better a stale date in the footer than wrong terms on the page.
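The two mechanisms are separate: a hard publish-block and an escalating watchdog. Roughly (names are illustrative):

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Staleness status for one displayed arm. Publish-block and watchdog
// tiers are independent: blocking starts at 30 days, alerts escalate
// at 60, 90, and 180.
function stalenessStatus(lastConfirmed: Date, now: Date) {
  const days = Math.floor((now.getTime() - lastConfirmed.getTime()) / DAY_MS);
  const watchdogTier = [180, 90, 60].find((t) => days >= t) ?? null;
  return {
    days,
    publishBlocked: days > 30, // fail closed past 30 days unconfirmed
    watchdogTier,              // escalating per-arm alert level, or null
  };
}
```

Note the default direction: nothing here ever un-blocks on its own. Only a fresh confirmation against the live page resets the clock.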
Candidate queue for new arms
If extraction returns a value matching no known arm on three consecutive runs, that value gets surfaced for human review. The first promotion per casino requires explicit approval; subsequent matches auto-apply. New offers don't get lost, but they don't slip through either.
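The promotion logic can be sketched as a small state machine (shapes and names are illustrative):

```typescript
// Streak tracking for one casino's unmatched extraction value.
interface CandidateState {
  streak: number;                // consecutive runs returning this value
  value: string;                 // canonical form of the unmatched value
  casinoHasApprovedOnce: boolean; // a human approved a promotion before
}

function observeUnmatched(state: CandidateState, value: string) {
  // A different unmatched value resets the streak to 1.
  const streak = value === state.value ? state.streak + 1 : 1;
  const next = { ...state, value, streak };
  if (streak < 3) return { state: next, action: "wait" as const };
  return {
    state: next,
    action: state.casinoHasApprovedOnce
      ? ("auto-apply" as const)    // subsequent matches after first approval
      : ("human-review" as const), // first promotion per casino
  };
}
```

The streak reset is the anti-noise property: a one-off hallucinated value never survives three runs, while a genuinely changed offer always does.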
Hallucination guard
At promotion time, a second-pass LLM extraction with a structurally different prompt re-reads the page. Disagreement between the two passes defers to manual review. Two cheap reads beat one expensive recovery from a hallucinated field.
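The reconciliation itself is trivial; the value is in running two genuinely independent passes. Sketched (flat field maps assumed for simplicity):

```typescript
// Compare two independent extraction passes field by field.
// Any disagreement defers to manual review rather than guessing
// which pass hallucinated.
type Extraction = Record<string, unknown>;

function reconcile(
  passA: Extraction,
  passB: Extraction,
): "promote" | "manual-review" {
  const fields = new Set([...Object.keys(passA), ...Object.keys(passB)]);
  for (const f of fields) {
    if (passA[f] !== passB[f]) return "manual-review";
  }
  return "promote";
}
```

Because the prompts are structurally different, the two passes are unlikely to hallucinate the same wrong value, which is the whole bet.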
What Changed
The pipeline replaces a manual review queue that was generating ~9 false positives a day. After the suppression filter and choice-arm comparator landed, the queue dropped to 0-3 genuinely actionable flags per day.
9 → 0-3
Daily false positives, before/after suppression and choice-arm comparison
6
Frontier LLMs in 3 rounds of adversarial peer review on the schema
30
KSA-licensed operators on the daily run
100%
Bonus-critical changes documented in public verification notes
Adversarial peer review
The schema design went through three rounds of review by six frontier models: DeepSeek V4 Pro, Minimax M1, GPT-5.5 Pro, Kimi K2.6, Claude Sonnet 4.6, Gemini 2.5 Pro. Disagreement between models surfaced edge cases the original spec missed. The final schema is the version every reviewer signed off on.
Why This Matters For Operators and Affiliates
Three properties of the pipeline translate directly into things our clients care about.
Regulatory readiness
Every bonus-critical change is documented with a citation to the live operator page. If a regulator asks why the displayed value is what it is, the answer is in version control. Not in someone's inbox.
Trust as a product surface
Users can read the verification notes. The audit trail is the marketing. Comparison sites that publish unchecked operator copy can't say that, and increasingly that gap shows up in search and conversion.
Scaling beyond manual review
Adding the 31st operator costs a configuration entry, not a headcount. The structural elimination of noise (choice-arm comparison, suppression filter) is what makes that economically real.
Fail-closed by default
When the pipeline is uncertain, it doesn't publish. It blocks the stale value and surfaces the gap. The asymmetry (better a missing date than a wrong number) is wired into the gate logic.
The Takeaway
Bonus data isn't hard because the values are complex. It's hard because the source pages are messy, the structure is implicit, and a single wrong number is a regulatory event. The pipeline doesn't try to be cleverer than the page. It tries to be more disciplined than a human reviewing 30 operators by hand.
The same approach generalizes. Wherever you have a regulated content surface, structured comparison data, and a wall of source-page legal text in between: the methodology applies.
If you run an operator brand or an affiliate portfolio in a regulated market and your data has these problems, we should talk.
See it live
BonusWijs.nl runs entirely on this pipeline. The verification notes are public.
Visit BonusWijs.nl →