310 - Backoffice recalculation upon factor updates
Context
When operators update emission factors via the backoffice (CSV upload or factor sync), all derived data downstream of those factors becomes stale: data_entry_emissions rows still reference the old factor values, and carbon_reports.stats aggregates still reflect the pre-update totals. Today this requires a manual recalculation step that operators forget, leading to inconsistent reporting until the next batch run.
The naive fix — "compute emissions inline during factor ingest" — surfaced concurrency bugs: multiple bulk pipelines for different modules race on the same data_entry_emissions and carbon_reports tables, two pods can claim the same job under the existing is_current flag, and the orchestration layer (BackgroundTasks + ad-hoc dispatch) had no recovery for crashed workers.
Decisions
After scoping the work, the issue was decomposed into four phased plans, each landing independently. Read 310-overview.md for the architectural rationale.
| Plan | Focus | File |
|---|---|---|
| A | Pod safety + atomic claim + safety poller + manual recovery endpoint | 310-a-pod-safety.md |
| B | Factor upsert-in-place (JSONB classification, last_seen_job_id), auto-recalc trigger, unit-sync tracking | 310-b-factor-pipeline.md |
| C | Handler registry + unified run_job runner + observability columns | 310-c-dag-handler-registry.md |
| D | Bulk path: pure-async ingest → recalc → aggregation chain (Path 2 only) | 310-d-pipeline-responsibility-split.md |
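To make the Plan C and Plan D rows concrete, here is a minimal sketch of what a handler registry with a single `run_job` entry point could look like. It is an illustration under assumptions, not the implementation described in the plan files: the `register` decorator, the `Handler` signature (returning the name of the follow-up job), and `run_chain` are all hypothetical. Only the job names `csv_ingest`, `emission_recalc`, `aggregation` and the name `run_job` come from this document.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical handler signature: takes a job payload and returns the type of
# the follow-up job, or None when the chain ends.
Handler = Callable[[dict], Optional[str]]

# Registry mapping a job type to the function that handles it.
HANDLERS: Dict[str, Handler] = {}


def register(job_type: str) -> Callable[[Handler], Handler]:
    """Decorator that stores a handler in the registry under job_type."""
    def decorator(fn: Handler) -> Handler:
        HANDLERS[job_type] = fn
        return fn
    return decorator


@register("csv_ingest")
def csv_ingest(payload: dict) -> Optional[str]:
    # A real handler would write data_entries only, then request a recalc.
    return "emission_recalc"


@register("emission_recalc")
def emission_recalc(payload: dict) -> Optional[str]:
    # A real handler would write data_entry_emissions only.
    return "aggregation"


@register("aggregation")
def aggregation(payload: dict) -> Optional[str]:
    # A real handler would write carbon_reports.stats only; the chain ends here.
    return None


def run_job(job_type: str, payload: dict) -> Optional[str]:
    """Single entry point: look up the handler for job_type and run it."""
    return HANDLERS[job_type](payload)


def run_chain(first: str, payload: dict) -> List[str]:
    """Run jobs in-process until a handler returns None.

    Assumption for the sketch: a real system would enqueue each follow-up job
    instead of calling it directly.
    """
    executed: List[str] = []
    current: Optional[str] = first
    while current is not None:
        executed.append(current)
        current = run_job(current, payload)
    return executed
```

Calling `run_chain("csv_ingest", {})` walks the ingest, recalc, and aggregation steps in order. Keeping dispatch in one function is what would give a single place to attach observability and retry behaviour.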
Cross-cutting principles that hold across all four plans:
- One writer per table on the bulk path. `data_entries` written only by `csv_ingest`/`api_ingest`; `data_entry_emissions` only by `emission_recalc`; `carbon_reports.stats` only by `aggregation`.
- Path 1 (interactive UI) is unchanged. Single-row module edits stay synchronous; their inline writes are scoped to one row and trivially serialized by the request.
- Postgres is the queue. Atomic claim via partial unique index + `SELECT FOR UPDATE SKIP LOCKED` polling. No external broker (Celery / Redis / RabbitMQ).
- Recalculation is idempotent. Handlers can be replayed without producing duplicate derived rows; aggregation always uses `ON CONFLICT DO UPDATE`.
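The atomic-claim and idempotency principles above can be sketched in a few lines. This example uses an in-memory SQLite database purely as a runnable stand-in for Postgres: SQLite supports partial unique indexes and `ON CONFLICT DO UPDATE`, but has no `SELECT FOR UPDATE SKIP LOCKED`, so the polling side is not shown. The `jobs` and `report_stats` tables, their columns, the "one running job per module" rule, and the helper names are assumptions made for illustration, not the project's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    module TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    claimed_by TEXT
);
-- Partial unique index: at most one 'running' job per module (assumed rule).
CREATE UNIQUE INDEX one_running_job_per_module
    ON jobs (module) WHERE status = 'running';

CREATE TABLE report_stats (
    report_id INTEGER PRIMARY KEY,
    total_kg_co2e REAL NOT NULL
);
""")
conn.executemany(
    "INSERT INTO jobs (id, module) VALUES (?, ?)",
    [(1, "travel"), (2, "travel")],
)
conn.commit()


def try_claim(job_id: int, pod: str) -> bool:
    """Flip a pending job to running; return False if the claim is lost."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "UPDATE jobs SET status = 'running', claimed_by = ? "
                "WHERE id = ? AND status = 'pending'",
                (pod, job_id),
            )
        # rowcount is 0 when another pod already moved the job out of 'pending'.
        return cur.rowcount == 1
    except sqlite3.IntegrityError:
        # The partial unique index rejected a second running job for the module.
        return False


def upsert_stats(report_id: int, total: float) -> None:
    """Idempotent aggregation write: replaying it never adds a second row."""
    with conn:
        conn.execute(
            "INSERT INTO report_stats (report_id, total_kg_co2e) VALUES (?, ?) "
            "ON CONFLICT (report_id) DO UPDATE "
            "SET total_kg_co2e = excluded.total_kg_co2e",
            (report_id, total),
        )
```

In this sketch the database, not the application, decides who wins a claim: a second pod either matches no pending row or trips the unique index. Replaying `upsert_stats` for the same report overwrites the existing row rather than duplicating it.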
Out of scope
- Path 1 inline emission compute (deliberate UX choice for the interactive editor).
- Migration to a third-party job queue. Decision rationale lives in `310-overview.md`.
- Backfill of factor history beyond what `last_seen_job_id` exposes (Plan B).
Status
Plan A has shipped on feat/310-backoffice-recalculation-upon-factors-update. Plans B, C, and D are sequenced behind it. See each plan's "Tests" section for acceptance criteria.