Carbon Report Module — Stats Column Population¶

Context¶

The carbon_report_modules table has a stats JSON column that was never populated. Every time the frontend needs emission breakdowns or scope totals for a module, it triggers live aggregation queries across data_entry_emissions → data_entries → carbon_report_modules. As data grows, these queries become expensive.

Goal: Populate the stats column automatically whenever a data entry is created, updated, or deleted. The stats contain scope totals, a full emission-type breakdown (leaves + rollup ancestors), and data quality signals — making frontend reads a single SELECT stats FROM carbon_report_modules WHERE id = ?.

Stats Shape¶

{
  "scope1": 150.0,
  "scope2": 5678.9,
  "scope3": 9012.3,
  "total": 14841.2,
  "by_emission_type": {
    "60000": 6498.1,
    "60100": 1469.1,
    "60101": 123.4,
    "60102": 456.7,
    "60104": 789.0,
    "60105": 100.0,
    "60200": 50.0
  },
  "computed_at": "2026-03-01T12:00:00Z",
  "entry_count": 42
}

by_emission_type includes all nodes in the module's EmissionType subtree — both leaves (actual DB values) and rollup ancestors (computed in Python).
Zero values are omitted.
Scope totals are derived from leaf values using the existing EMISSION_SCOPE mapping.
entry_count is the number of data_entries under this module.

Architecture¶

Key Insight¶

The EmissionType enum already encodes a tree structure via its 6-digit positional scheme (XX YY ZZ). The existing parent, level, and children() helpers make rollup computation trivial in Python — no extra DB queries needed.

Data Flow¶

flowchart TD
    subgraph Route["API Route (carbon_report_module.py)"]
        A["POST / PUT / DELETE\n/unit_id/year/module_id/submodule_id/item_id"]
    end

    subgraph Service["Service Layer"]
        B["DataEntryService\n.create() / .update() / .delete()"]
        C["DataEntryEmissionService\n.upsert_by_data_entry()"]
        D["CarbonReportModuleService\n.recompute_stats()"]
    end

    subgraph Stats["Stats Computation (pure Python)"]
        E["compute_module_stats()"]
        F["get_subtree_leaves()\nget_all_nodes()"]
        G["EMISSION_SCOPE\nMODULE_TYPE_TO_EMISSION_ROOTS"]
    end

    subgraph Repo["Repository Layer"]
        H["DataEntryRepository"]
        I["DataEntryEmissionRepository\n.get_stats()"]
        J["CarbonReportModuleRepository\n.update_stats()"]
    end

    subgraph DB["Database"]
        K[("data_entries")]
        L[("data_entry_emissions")]
        M[("carbon_report_modules\n→ stats JSON column")]
    end

    A --> B
    B --> H
    H --> K
    A --> C
    C -->|"delete old + create new emissions"| L
    A --> D
    D --> I
    I -->|"SUM(kg_co2eq)\nGROUP BY emission_type_id"| L
    D --> E
    E --> F
    E --> G
    F -->|"leaf values\n+ rollup ancestors"| E
    G -->|"scope mapping\n+ module roots"| E
    E -->|"stats dict"| D
    D --> J
    J -->|"UPDATE stats, last_updated"| M

    style A fill:#4a9eff,color:#fff
    style E fill:#ffa94d,color:#fff
    style M fill:#51cf66,color:#fff

Implementation¶

Phase 1 — Pure helpers (no DB)¶

File: app/models/data_entry_emission.py

Added two module-level functions:

get_subtree_leaves(root: EmissionType) -> list[int] — recursively collects only leaf values under a node.
get_all_nodes(root: EmissionType) -> list[EmissionType] — collects all nodes (root + intermediates + leaves) under a node.

File: app/models/module_type.py

Added MODULE_TYPE_TO_EMISSION_ROOTS: dict[ModuleTypeEnum, list[EmissionType]] mapping each module type to its EmissionType root(s):

Module Type	Emission Roots
headcount	`[food, waste, commuting, grey_energy]` (4 flat leaves)
professional_travel	`[professional_travel]`
buildings	`[buildings]`
equipment_electric_consumption	`[equipment]`
purchase	`[purchases]`
process_emissions	`[process_emissions]`
external_cloud_and_ai	`[external]`

global_energy and internal_services have no emission type mapping and are skipped.

Phase 2 — Stats computation¶

File: app/services/carbon_report_module_service.py

Added compute_module_stats(leaf_emissions, emission_roots, entry_count) -> dict — a pure function that:

Takes leaf-level emissions from the existing DataEntryEmissionRepository.get_stats() query.
Populates by_emission_type with leaf values.
Computes rollup values for every non-leaf node by summing its subtree leaves.
Computes scope totals using the existing EMISSION_SCOPE dict.
Returns the full stats dict with scope1/2/3, total, by_emission_type, computed_at, entry_count.

Phase 3 — Repository & Service¶

File: app/repositories/carbon_report_module_repo.py

Added update_stats(carbon_report_module_id: int, stats: dict) -> None — persists the stats JSON on the module row.

File: app/services/carbon_report_module_service.py

Added recompute_stats(carbon_report_module_id: int) -> None:

Fetches the module to get module_type_id.
Looks up emission roots from MODULE_TYPE_TO_EMISSION_ROOTS; skips if no mapping.
Fetches leaf emissions via existing DataEntryEmissionRepository.get_stats().
Counts data entries for this module.
Calls compute_module_stats() → repo.update_stats().

Phase 4 — Route hooks¶

File: app/api/v1/carbon_report_module.py

Hooked CarbonReportModuleService(db).recompute_stats(carbon_report_module_id) into three routes, after emission upsert and before db.commit():

Create (~L443): after DataEntryEmissionService.upsert_by_data_entry().
Update (~L613): after DataEntryEmissionService.upsert_by_data_entry().
Delete (~L670): after DataEntryService.delete(). Resolves carbon_report_module_id from route params before deletion (emissions cascade-delete with the entry).

Verification¶

Smoke-tested with mock data:

Module	Input	Result
Buildings	5 leaf emissions (lighting 123.4, cooling 456.7, heating_elec 789.0, heating_thermal 100.0, combustion 50.0)	scope1=150.0, scope2=1369.1, total=1519.1, rollups: 60100=1469.1, 60000=1519.1
Headcount	4 flat leaves (food 336.0, waste 50.0, commuting 200.0, grey_energy 100.0)	scope3=686.0, total=686.0, 4 entries in by_emission_type

All 653 existing unit tests pass with no regressions.

Future Work¶

Bulk ingestion hooks: CSV imports bypass the route layer. recompute_stats should be called once per module after bulk operations in ingestion tasks.
formula_version: Could be sourced from the most recent emission's formula_version field or from a build config. Omitted for v1.
missing_factors: Count of data entries with no emission rows (LEFT JOIN query). Omitted for v1.
Frontend consumption: Replace live aggregation calls with reads from stats column where applicable.