ADR-011: Store factors.classification as JSONB¶
Status: Accepted Date: 2026-05-05 Deciders: Backend Team Related: ADR-004: Database Selection; plan docs/src/implementation-plans/310-b-factor-pipeline.md
Context¶
factors.classification was declared Column(JSON) to hold a small free-form dict (e.g. {"fuel": "diesel", "vehicle": "lorry"}). Plan 310-b introduces a unique factor identity:
(data_entry_type_id, year, emission_type_id, classification)
enforced by a partial unique index on (classification::text). PostgreSQL JSON preserves the original text, so ::text reflects whatever order Python's json.dumps produced. A second caller that writes {"vehicle": "lorry", "fuel": "diesel"} would be a different ::text value, defeating the index and silently inserting a duplicate row.
Decision¶
Migrate the column to PostgreSQL JSONB and the SQLAlchemy model to Column(JSONB) from sqlalchemy.dialects.postgresql.
ALTER TABLE factors
ALTER COLUMN classification TYPE JSONB USING classification::JSONB;
JSONB normalizes keys alphabetically at write time, so classification::text is deterministic regardless of insertion order. The partial unique indexes introduced by 310-b (uq_factor_identity and uq_factor_identity_no_year) now actually enforce identity.
Read paths return the same Python dict, so application code is unchanged.
Consequences¶
Positive:
- The unique index is a real constraint, not a coincidence. Silent duplicate-factor rows are eliminated.
ON CONFLICT DO UPDATEupserts (310-b Part 2) preservefactor.id, keepingDataEntry.primary_factor_idFKs alive across CSV re-uploads (Strategy A entries).- JSONB permits GIN indexes if classification queries grow.
Negative:
- One-time PostgreSQL-only migration; SQLite dev databases keep
JSON(no behavioral difference for tests, the index is Postgres-scoped). - Whitespace and key-order changes to the source CSV no longer produce different rows — operators must rely on values, not serialization, to express intent (the desirable outcome).
Migration path:
- Rolled out in the 310-b PR alongside the unique indexes and the
factor_repo.upsert_factorscode path. - No backfill needed — JSONB normalizes existing rows on cast.
References¶
docs/src/implementation-plans/310-b-factor-pipeline.mddocs/src/implementation-plans/310-overview.md- PostgreSQL JSON Types