📄 PRD 3 — Detailed CSVs (*_data.csv)¶
1. Objective¶
Provide raw, traceable, and schema-specific data for each data entry type.
This dataset is intended for:
- auditing
- debugging
- detailed analysis
2. Scope¶
- One CSV per
data_entry_type_id - One row per
data_entry - JSON data flattened into columns
3. Data Sources¶
data_entriesdata_entry_emissionscarbon_report_modulescarbon_reportsunits
4. Data Model¶
Granularity¶
- 1 row = 1 data_entry
5. Common Columns¶
| Column | Description |
|---|---|
| unit_institutional_id | unit identifier |
| year | reporting year |
| data_entry_id | entry identifier |
| kg_co2eq | aggregated emissions |
| note | optional note |
6. Key Rule¶
Each CSV:
- corresponds to one
data_entry_type_id - has its own schema
- flattens
data_entries.dataJSON
7. Base SQL Template¶
SELECT
cr.year,
u.institutional_id AS unit_institutional_id,
de.id AS data_entry_id,
de.data,
SUM(dee.kg_co2eq) AS kg_co2eq
FROM data_entries de
JOIN carbon_report_modules crm
ON crm.id = de.carbon_report_module_id
JOIN carbon_reports cr
ON cr.id = crm.carbon_report_id
JOIN units u
ON u.id = cr.unit_id
LEFT JOIN data_entry_emissions dee
ON dee.data_entry_id = de.id
WHERE de.data_entry_type_id = <TYPE_ID>
GROUP BY
cr.year,
u.institutional_id,
de.id,
de.data;
8. Transformation¶
- Extract JSON fields using
->>operator - Cast values when needed
- Rename columns to match CSV contract
9. Example — Plane Travel¶
SELECT
u.institutional_id,
de.data->>'origin_iata' AS origin_iata,
de.data->>'destination_iata' AS destination_iata,
de.data->>'user_institutional_id' AS user_institutional_id,
de.data->>'departure_date' AS departure_date,
(de.data->>'number_of_trips')::int AS number_of_trips,
de.data->>'cabin_class' AS cabin_class,
de.data->>'note' AS note,
SUM(dee.kg_co2eq) AS kg_co2eq
FROM data_entries de
JOIN carbon_report_modules crm ON crm.id = de.carbon_report_module_id
JOIN carbon_reports cr ON cr.id = crm.carbon_report_id
JOIN units u ON u.id = cr.unit_id
LEFT JOIN data_entry_emissions dee ON dee.data_entry_id = de.id
WHERE de.data_entry_type_id = 20
GROUP BY
u.institutional_id,
origin_iata,
destination_iata,
user_institutional_id,
departure_date,
number_of_trips,
cabin_class,
note;
10. Output¶
- Files:
<type_name>_data.csv - One file per
data_entry_type_id
11. Constraints¶
- No mixing of types
- JSON schema must be stable per type
- Aggregation only at data_entry level
12. Future Improvements¶
- Automate SQL generation per type
- Add validation layer for schema consistency
- Build unified export pipeline (Python/dbt)