Emission Pipeline Flow¶

Reference documentation for the emission calculation pipeline. Describes how DataEntryEmission rows are produced from a DataEntry.

Pipeline Overview¶

sequenceDiagram
    participant Router as API Router
    participant MHS as ModuleHandlerService
    participant DES as DataEntryService
    participant DEES as DataEntryEmissionService
    participant Handler as ModuleHandler
    participant FS as FactorService
    participant DB as Database

    Note over Router: POST /carbon-report-modules/{id}/items
    Router->>MHS: resolve_primary_factor_id(handler, payload)
    MHS->>FS: get_by_classification(kind, subkind)
    FS->>DB: SELECT Factor WHERE classification
    DB-->>FS: Factor
    FS-->>MHS: factor.id
    MHS-->>Router: payload + primary_factor_id

    Router->>DES: create(data_entry_create)
    DES->>DB: INSERT DataEntry
    DB-->>DES: DataEntry (with id)
    DES-->>Router: DataEntryResponse

    Router->>DEES: upsert_by_data_entry(response)
    DEES->>DEES: prepare_create(data_entry)

    Note over DEES: Step 1: Resolve emission types
    DEES->>DEES: resolve_emission_types(data_entry_type, data)

    Note over DEES: Step 2: Pre-compute (enrich ctx)
    DEES->>Handler: pre_compute(data_entry, session)
    Handler-->>DEES: extra ctx (e.g. distance_km)

    Note over DEES: Step 3: Resolve computations
    DEES->>Handler: resolve_computations(data_entry, emission_type, ctx)
    Handler-->>DEES: list[EmissionComputation]

    Note over DEES: Step 4: Fetch factors
    alt Strategy A — factor_id
        DEES->>FS: get(factor_id)
    else Strategy B — FactorQuery
        DEES->>FS: get_by_classification / get_factor
    end
    FS->>DB: SELECT Factor
    DB-->>FS: Factor
    FS-->>DEES: Factor (with .values)

    Note over DEES: Step 5: Apply formula
    DEES->>DEES: formula_func(ctx, factor.values) → kg_co2eq
    DEES->>DB: INSERT DataEntryEmission

Pipeline Steps¶

Step 1 — Resolve Emission Types¶

resolve_emission_types(data_entry_type, data) returns the list of EmissionType leaves to produce for this data entry. Each leaf becomes one (or more) DataEntryEmission row.

Step 2 — Pre-compute (`handler.pre_compute`)¶

Enriches the context dict with values that require DB access or non-trivial arithmetic from user data only. The returned dict is merged into ctx = {**data_entry.data, **pre_compute_result}.

Rule: pre_compute must NOT read factor values. Factor data is only available at Step 5 via factor_values.

Step 3 — Resolve Computations (`handler.resolve_computations`)¶

Declares one EmissionComputation per factor lookup needed. Each computation specifies either:

Strategy A — factor_id (int): direct lookup by ID
Strategy B — factor_query (FactorQuery): classification-based lookup

Step 4 — Fetch Factors (`_fetch_factors`)¶

Retrieves Factor objects from the database using the strategy declared in Step 3. For Strategy B, progressive fallbacks are attempted (see below).

Step 5 — Apply Formula (`_apply_formula` / `formula_func`)¶

Computes kg_co2eq from ctx (user data + pre-computed values) and factor_values (from the fetched Factor). Two approaches:

Approach	When	How
Key-based	Simple `quantity × ef`	`formula_key`, `quantity_key`, optional `multiplier_key`
`formula_func`	Complex logic	`formula_func(ctx, factor_values) → float`

Factor Retrieval Strategies¶

Strategy	Trigger	Returns	Used by
A — direct factor_id	`primary_factor_id` in `data_entry.data`	1 factor	Equipment, Purchase, Process Emissions, External Cloud/AI, Research Facilities, Energy Combustion , Buildings
B — FactorQuery	`FactorQuery(kind, subkind, context)`	1..N factors	Travel (plane/train), Headcount (member/student)

Strategy B Fallback Order¶

Full classification (subkind + context + fallbacks)
Kind only (no subkind/context)
By emission_type → returns N factors
By data_entry_type → broadest

Rematch: How Factor Changes Propagate (Plan 310-D)¶

When a factor is updated (CSV reupload, manual edit), existing DataEntry rows must be re-linked to the new factor IDs and their DataEntryEmission rows recomputed. Where the link is stored governs the rematch path — and it's a different axis from the factor-retrieval Strategy A/B above.

Link location	Rematch shape	Modules	Status
`data_entries.data->primary_factor_id` (JSON)	1:1 or 1:N per entry	Equipment, Purchase (common + additional), Process Emissions, External Cloud, External AI, Buildings — Energy Combustion, Buildings — Rooms (1:N)	✅ Plan 310-D
`data_entry_emissions.primary_factor_id` (FK)	1:N per entry	Travel (plane / train), Headcount (member / student), Buildings — Embodied Energy	✅ PR #1042

Why two paths: the JSON-link modules carry the factor on the entry itself — the rematch updates the JSON column on the entry, and any fan-out to multiple emission rows (Buildings Rooms emits one row per room_type) follows from the same canonical entry-side link. The FK-link modules either generate multiple emission rows per entry from sub-factors (Headcount) OR resolve their factor via FactorQuery at compute time so the link only ever lived on the emission row (Travel). For FK-link, the rematch walks data_entry_emissions directly via upsert_by_data_entry — pre_compute + _fetch_factors re-runs the live Strategy B query, which is empirically what propagates new factor values into the existing chain (PR #1042 finding — no workflow code change was required for FK-link rematch; only test coverage was missing).

Plan 310-D Rematch Contract (JSON-link modules)¶

EmissionRecalculationWorkflow.recalculate_for_data_entry_type:

Single bulk-fetch — all factors for (data_entry_type_id, year) in one query, indexed in a Python dict keyed by (kind, subkind). No per-entry DB roundtrips.
In-memory kind→subkind→kind-only fallback — mirrors the chain ModuleHandlerService.resolve_primary_factor_id runs in DB. Try (kind, subkind) first; on miss with subkind set, fall through to (kind, None) (only succeeds if a subkind=NULL row was prefetched).
Strict-drop on overall miss — a factor not in the current CSV is treated as deleted. The entry's primary_factor_id is cleared on the in-memory entry.data, then upsert_by_data_entry is called with the now-unmatched payload — prepare_create returns no computations and the no-emissions branch invokes delete_by_data_entry_id, so the entry's existing emission rows are removed. The entry itself stays around (so operators see the missing-factor signal on the dashboard) but no kg_co2eq row persists. (Earlier wording said "set kg_co2eq to None"; the actual code deletes the row entirely — see PR #1042's strict-drop clarification in 310-d-strategy-b-rematch.md.)
Year-strict — (year IS NULL) factors do not satisfy a year-scoped query. No fallback; if no year-matched factor exists, the entry's link is dropped per rule 3.
Per-entry rollback — the in-memory entry.data mutation is restored if upsert_by_data_entry raises, so a partial-failure batch never persists a stale primary_factor_id next to an old data_entry_emissions row.

The Strategy B handlers (kind_field declared but value derived in pre_compute, e.g. travel) are skipped by the gate kind_field is not None and kind_field in entry.data — their correct rematch path lands in the follow-up (310-d-strategy-b-rematch).

Per-Module Breakdown¶

Professional Travel — Plane¶

Step	What happens
`pre_compute`	Fetches origin/destination `Location` from DB. Computes `distance_km` (haversine) and `haul_category` (short/medium/long)
`resolve_computations`	Strategy B — `FactorQuery(kind=haul_category)`
Formula	Key-based: `distance_km × ef_kg_co2eq_per_km × rfi_adjustment`

Professional Travel — Train¶

Step	What happens
`pre_compute`	Fetches origin/destination `Location` from DB. Computes `distance_km` and `country_code`
`resolve_computations`	Strategy B — `FactorQuery(kind=country_code, fallbacks={"kind": "RoW"})`
Formula	Key-based: `distance_km × ef_kg_co2eq_per_km`

Equipment Electric Consumption (IT / Scientific / Other)¶

Step	What happens
`pre_compute`	Validates `active_usage_hours + standby_usage_hours ≤ 168`
`resolve_computations`	Strategy A — `factor_id = primary_factor_id`
Formula	`formula_func`: reads `active_power_w`, `standby_power_w`, `ef_kg_co2eq_per_kwh` from `factor_values`; reads hours from `ctx`. Computes `annual_kwh = ((active_hours × active_power_w) + (standby_hours × standby_power_w)) × WEEKS_PER_YEAR / 1000`, then `annual_kwh × ef`

Buildings — Rooms¶

Step	What happens
`pre_compute`	No override (default returns `{}`)
`resolve_computations`	Strategy B — `FactorQuery(kind=building_name, subkind=room_type)`. One computation per emission type (lighting, cooling, ventilation, heating_elec, heating_thermal)
Formula	`formula_func`: reads `kwh_per_square_meter` field (varies by emission type) from `factor_values`, multiplies by `room_surface_square_meter` from `ctx`, then `× ef_kg_co2eq_per_kwh × conversion_factor`

Buildings — Energy Combustion¶

Step	What happens
`pre_compute`	No override
`resolve_computations`	Strategy A — `factor_id = primary_factor_id`
Formula	Key-based: `quantity × kg_co2eq_per_unit`

Headcount (Member / Student)¶

Step	What happens
`pre_compute`	No override
`resolve_computations`	Strategy B — `FactorQuery(data_entry_type, no kind/subkind)` → returns all sub-factors
Formula	Key-based: `fte × ef_kg_co2eq_per_fte`

Purchase (Common)¶

Step	What happens
`pre_compute`	No override
`resolve_computations`	Strategy A — `factor_id = primary_factor_id`
Formula	Key-based: `total_spent_amount × ef_kg_co2eq_per_currency`

Purchase (Additional)¶

Step	What happens
`pre_compute`	No override
`resolve_computations`	Strategy A — `factor_id = primary_factor_id`
Formula	`formula_func`: `annual_consumption × coef_to_kg × ef`

External Cloud¶

Step	What happens
`pre_compute`	No override
`resolve_computations`	Strategy A — `factor_id = primary_factor_id`
Formula	Key-based: `spent_amount × ef_kg_co2eq_per_currency`

External AI¶

Step	What happens
`pre_compute`	No override
`resolve_computations`	Strategy A — `factor_id = primary_factor_id`
Formula	`formula_func`: `frequency × 5 × 46 × users × factor_g / 1000`

Process Emissions¶

Step	What happens
`pre_compute`	No override
`resolve_computations`	Strategy A — `factor_id = primary_factor_id`
Formula	`formula_func`: `quantity_kg × gwp_kg_co2eq_per_kg`

Research Facilities¶

Step	What happens
`pre_compute`	No override
`resolve_computations`	Strategy A — `factor_id = primary_factor_id`
Formula	TBD — currently returns empty list

Data Enrichment for LIST/GET Responses¶

The get_submodule_data() repository method enriches data_entry.data with factor information for display purposes:

data_entry.data = {
    **data_entry.data,
    "kg_co2eq": total_kg_co2eq,
    "primary_factor": {
        **factor.values,          # e.g. active_power_w, ef_kg_co2eq_per_kwh
        **factor.classification,  # e.g. kind, subkind
    }
}

This enriched primary_factor dict is used by to_response() to extract display fields (e.g. active_power_w for equipment, kwh_per_square_meter for buildings). It is only available during LIST/GET, NOT during emission creation.

Important: pre_compute and formula_func must never rely on the primary_factor dict in data_entry.data. Factor values are available via factor_values parameter in formula_func / _apply_formula.