Code review is too late

Baz Planner reviews the plan before the code exists, and turns the pull request into a confirmation.

Use Baz where you work

Plan - Backfill late-arriving events in the orders pipeline DATA-2187

Baz Planner · drafted from the DATA-2187 thread · Pending review · v3 · 2 minutes ago
No reviews yet · review requested from Sofia Keller

Context

The orders pipeline ingests events in hourly batches. The mobile SDK retries from a device queue, so a slice of events lands hours late and is currently dropped from fct_orders. The goal: capture late arrivals without double-counting. Five pipeline stages change together (sections 1 to 5 below), in three steps:

  1. Capture - record a processing timestamp on every event so late arrivals can be detected.
  2. Re-window - drive the incremental model off processing time with a bounded lookback.
  3. Backfill - idempotently re-merge the last 14 days so no day is left short.

1. Source

Stamp event_ingested_at (processing time) onto every row as it lands, in stg_orders. The producer already emits an event_time; this adds the warehouse arrival time alongside it.

2. Schema

Add a nullable event_ingested_at timestamp to the orders_landing table. No historical backfill of the column is required - new loads populate it, and the model treats nulls as on-time.
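One way to read "treats nulls as on-time" is that a row without a stamp falls back to its event_time, so it shows zero lateness. The Python sketch below illustrates that reading only. The helper names and the dict-shaped rows are hypothetical, and the fallback rule itself is an assumption, not something the plan spells out.

```python
from datetime import datetime, timedelta


def effective_ingested_at(row):
    """Arrival time used for windowing.

    Assumption: a null event_ingested_at (a row loaded before the column
    existed) falls back to event_time, i.e. the row counts as on-time.
    """
    ingested = row.get("event_ingested_at")
    return ingested if ingested is not None else row["event_time"]


def lateness(row):
    """How long after event_time the row reached the warehouse."""
    return effective_ingested_at(row) - row["event_time"]


historical = {"order_id": "A", "event_time": datetime(2024, 1, 1, 9), "event_ingested_at": None}
late = {"order_id": "B", "event_time": datetime(2024, 1, 4, 8), "event_ingested_at": datetime(2024, 1, 4, 13)}
```

Under this reading, historical rows never look late, so adding the column does not by itself pull old data back into the lookback window.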

3. Incremental model

Switch the fct_orders incremental predicate from event_time to event_ingested_at with a 3-day lookback, and dedupe on order_id keeping the latest row.

  {% if is_incremental() %}
-   where event_time > (select max(event_time) from {{ this }})
+   where event_ingested_at >=
+     (select max(event_ingested_at) from {{ this }}) - interval '3 days'
  {% endif %}
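The same two rules, the 3-day lookback on processing time and the dedupe on order_id, can be sketched in plain Python to show how they interact. This is an illustration of the logic, not the dbt model. The function names and sample rows are hypothetical, and "latest row" is assumed here to mean the row with the greatest event_ingested_at.

```python
from datetime import datetime, timedelta

LOOKBACK = timedelta(days=3)  # the 3-day lookback from the plan


def incremental_rows(landing, max_ingested_at, lookback=LOOKBACK):
    """Keep rows that arrived inside the lookback window.

    Mirrors: event_ingested_at >= max(event_ingested_at) - interval '3 days'.
    """
    cutoff = max_ingested_at - lookback
    return [r for r in landing if r["event_ingested_at"] >= cutoff]


def dedupe_latest(rows):
    """Keep one row per order_id.

    Assumption: "latest" means the greatest event_ingested_at.
    """
    latest = {}
    for r in rows:
        kept = latest.get(r["order_id"])
        if kept is None or r["event_ingested_at"] > kept["event_ingested_at"]:
            latest[r["order_id"]] = r
    return list(latest.values())


landing = [
    # Arrived before the window opens: not reprocessed.
    {"order_id": "A", "event_time": datetime(2024, 1, 1, 9),
     "event_ingested_at": datetime(2024, 1, 1, 10), "amount": 10},
    # Late arrival, delivered twice by the SDK retry queue.
    {"order_id": "B", "event_time": datetime(2024, 1, 4, 8),
     "event_ingested_at": datetime(2024, 1, 6, 12), "amount": 20},
    {"order_id": "B", "event_time": datetime(2024, 1, 4, 8),
     "event_ingested_at": datetime(2024, 1, 6, 13), "amount": 25},
]

selected = dedupe_latest(incremental_rows(landing, datetime(2024, 1, 7)))
```

With the high-water mark at 2024-01-07, the window opens on 2024-01-04, so only order B is reprocessed and only its most recent delivery survives the dedupe.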

4. Backfill job

Add an idempotent backfill for the last 14 days of partitions, guarded so it can never --full-refresh the production table. Re-runs upsert on order_id and converge to the same totals.
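A minimal Python sketch of the three properties named here: a bounded 14-day window, a guard against --full-refresh, and an upsert that converges on re-run. The function names are hypothetical, an in-memory dict stands in for the production table, and a window that includes today is an assumption.

```python
from datetime import date, timedelta

BACKFILL_DAYS = 14  # the 14-day window from the plan


def backfill_partitions(today, days=BACKFILL_DAYS):
    """Daily partitions to re-merge, oldest first.

    Assumption: the window ends on `today`, inclusive.
    """
    return [today - timedelta(days=i) for i in range(days - 1, -1, -1)]


def guard_args(args):
    """Refuse to run if the caller asks for a full refresh.

    Only checks the long flag spelling; a real guard would also need to
    cover any short form the runner accepts.
    """
    if "--full-refresh" in args:
        raise ValueError("backfill must not run with --full-refresh")
    return args


def upsert(target, rows):
    """Merge rows into `target` keyed on order_id.

    Re-running with the same rows overwrites each key with an identical
    value, so totals do not change on a second run.
    """
    for r in rows:
        target[r["order_id"]] = r
    return target
```

Because the merge is keyed on order_id, a retried or partially failed backfill can simply be run again without inflating totals.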

5. Downstream consumers

The revenue rollup and the finance dashboard both read fct_orders. Late merges revise historical daily totals, so notify #data-platform and move the dashboard to as-of semantics.
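To illustrate why late merges revise history, the sketch below computes daily totals "as of" a point in time, counting only rows that had reached the warehouse by then. This is one possible interpretation of as-of semantics, not the dashboard's actual query. The function name and sample rows are hypothetical, and the rows are assumed to be already deduplicated on order_id.

```python
from collections import defaultdict
from datetime import date, datetime


def daily_totals_as_of(rows, as_of):
    """Revenue per event date, using only rows ingested on or before `as_of`.

    Assumption: rows are already unique per order_id.
    """
    totals = defaultdict(int)
    for r in rows:
        if r["event_ingested_at"] <= as_of:
            totals[r["event_time"].date()] += r["amount"]
    return dict(totals)


orders = [
    # On-time order for 2024-01-04.
    {"order_id": "A", "event_time": datetime(2024, 1, 4, 9),
     "event_ingested_at": datetime(2024, 1, 4, 10), "amount": 10},
    # Same business day, but it arrived two days late.
    {"order_id": "B", "event_time": datetime(2024, 1, 4, 8),
     "event_ingested_at": datetime(2024, 1, 6, 12), "amount": 20},
]

before_merge = daily_totals_as_of(orders, datetime(2024, 1, 5))
after_merge = daily_totals_as_of(orders, datetime(2024, 1, 7))
```

The total for 2024-01-04 moves from 10 to 30 once the late row is merged, which is the kind of revision downstream readers need to expect.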

Verification

dbt build passes with the unique and not_null tests on order_id, row counts reconcile against the source for a seeded late-event fixture, and no run triggers a partition full-refresh. Estimated scope: 4 models, no destructive migration.

Move your review upstream

Start free on a real repository and let Planner review your next change before the code is written.