1183: Train CSV station disambiguation
1. Problem
The train CSV trip format ships station names (origin_name, destination_name). Same-name stations exist in multiple countries (BERNE, CH vs BERNE, DE), so the CSV-time resolver in ProfessionalTravelTrainModuleHandler.enrich_csv_row cannot disambiguate cross-country collisions from names alone.
The earlier approach defaulted the missing country to CH. That silently mis-resolved every non-Swiss station and violated the no-silent-fallbacks rule. This iteration removes the default and requires the country instead — made viable by sourcing the seed from a dataset that carries country natively.
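The collision can be illustrated with a minimal sketch. The sample rows and the `COUNTRY:NAME` key format below are assumptions made for illustration, not the project's actual natural-key scheme.

```python
from collections import defaultdict

# Hypothetical sample rows: (station name, ISO-2 country). Not real seed data.
STATIONS = [("BERNE", "CH"), ("BERNE", "DE"), ("LAUSANNE", "CH")]

# Index by name only: one name can map to several countries.
by_name = defaultdict(list)
# Index by (name, country): each pair maps to exactly one station.
by_name_and_country = {}

for name, country in STATIONS:
    by_name[name].append(country)
    # Illustrative key format only; the real natural key may differ.
    by_name_and_country[(name, country)] = f"{country}:{name}"


def ambiguous_names(index):
    """Return the names that exist in more than one country."""
    return sorted(name for name, countries in index.items() if len(countries) > 1)
```

With a name-only index, BERNE cannot be resolved without guessing; with the country in the key, each lookup has a single answer.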
2. Location seed: trainline-eu stations.csv
The train location seed is built from the open trainline-eu dataset (https://github.com/trainline-eu/stations), which ships an ISO-2 country per station — so country codes come from the source, not a backfill.
- Builder: backend/scripts/build_train_seed_from_trainline.py (stdlib csv only; LF terminators; ≤40-line functions).
- Input backend/stations.csv (;-delimited, gitignored) → output backend/seed_data/seed_travel_location_train.csv (the 10-column comma schema ReferenceDataCSVProvider ingests).
- Kept rows: is_suggestable=t, is_airport≠t, non-empty latitude/longitude (NOT NULL in locations), non-empty country. → 51,299 stations.
- continent/municipality/iata_code/airport_size are absent from the source and stay blank (all optional for trains); keywords mirrors name for station search.
Re-upload via the backoffice train station reference slot (data_entry_type_id=21; the plane slot is 20).
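The kept-row rules in this section can be sketched with the stdlib csv module. This is an illustration rather than the actual builder: the function names `keep_row` and `filter_stations` are hypothetical, and the sample rows (including coordinates) are made up.

```python
import csv
import io


def keep_row(row: dict) -> bool:
    """Apply the four kept-row conditions: suggestable, not an airport,
    non-empty coordinates, non-empty country."""
    def filled(key: str) -> bool:
        return bool((row.get(key) or "").strip())

    return (
        row.get("is_suggestable") == "t"
        and row.get("is_airport") != "t"
        and filled("latitude")
        and filled("longitude")
        and filled("country")
    )


def filter_stations(source) -> list[dict]:
    """Read a ;-delimited stations file and return only the kept rows."""
    reader = csv.DictReader(source, delimiter=";")
    return [row for row in reader if keep_row(row)]


# Made-up sample in the ;-delimited layout; values are illustrative only.
SAMPLE = (
    "name;latitude;longitude;country;is_suggestable;is_airport\n"
    "Bern;46.9;7.4;CH;t;f\n"
    "No Coords;;;DE;t;f\n"
    "Some Airport;48.1;11.5;DE;t;t\n"
    "Hidden;47.0;8.0;CH;f;f\n"
    "No Country;47.0;8.0;;t;f\n"
)

kept = filter_stations(io.StringIO(SAMPLE))
```

Only the first sample row survives; each of the other four fails exactly one condition.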
3. Required country_code in the trip resolver
enrich_csv_row now rejects any train CSV row that lacks a {role}_country_code for an endpoint without a precomputed {role}_natural_key — before any station lookup. There is no CH default.
- app/modules/professional_travel/schemas.py::enrich_csv_row now reports a missing country_code as the row error "Missing {role}_country_code".
- app/services/location_service.py::resolve_train_station_for_csv now takes country_code as a required parameter (no default_country_code="CH").
- UI/API entries are unaffected: they carry *_natural_key from the station autocomplete and skip the resolver branch entirely.
location_repo.search_locations keeps its CH-first autocomplete ranking — that is UI ordering, not an ingestion default, and is out of scope here.
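The rejection rule in this section might look roughly like the sketch below. It is an assumption-laden illustration, not the code in schemas.py: the helper name `missing_country_errors` and the dict-shaped row are hypothetical; only the field names and the error text come from this plan.

```python
ROLES = ("origin", "destination")


def missing_country_errors(row: dict) -> list[str]:
    """Return one error per endpoint that has neither a precomputed
    natural key nor a country code. Runs before any station lookup."""
    errors = []
    for role in ROLES:
        if row.get(f"{role}_natural_key"):
            # Precomputed key (UI/API path): the resolver branch is skipped.
            continue
        if not (row.get(f"{role}_country_code") or "").strip():
            # No fallback to CH: the row is rejected instead.
            errors.append(f"Missing {role}_country_code")
    return errors


# CSV row with a country for the origin only.
csv_row = {
    "origin_name": "LAUSANNE",
    "origin_country_code": "CH",
    "destination_name": "BERNE",
}

# UI/API-style row that already carries natural keys (values are placeholders).
ui_row = {"origin_natural_key": "k1", "destination_natural_key": "k2"}
```

Collecting every missing endpoint before failing lets a single row error name both roles when both are absent.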
4. Tests
- tests/unit/services/data_ingestion/test_train_enrich_csv_row.py (new): asserts a row missing the origin/destination country_code is rejected and never queries the DB (sentinel session). Fast, no Postgres.
- tests/integration/.../test_travel_pg.py::test_train_csv_resolves_station_by_required_country_code (renamed from the CH-default-override test): pins the CH-vs-DE Berne collision, asserting destination_country_code=DE resolves to the German station.
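The sentinel-session idea can be sketched as follows. The `enrich` function is a hypothetical stand-in for enrich_csv_row, and `SentinelSession` is an assumed name; the real unit test may be structured differently.

```python
class SentinelSession:
    """Stand-in DB session that fails loudly on any attribute access."""

    def __getattr__(self, name):
        raise AssertionError(f"DB session touched: {name}")


def enrich(row: dict, session) -> dict:
    """Hypothetical stand-in for enrich_csv_row: validate, then look up."""
    for role in ("origin", "destination"):
        if row.get(f"{role}_natural_key"):
            continue
        if not row.get(f"{role}_country_code"):
            raise ValueError(f"Missing {role}_country_code")
    session.execute("station lookup")  # reached only when validation passes
    return row


def test_missing_country_code_never_queries_db():
    row = {
        "origin_name": "LAUSANNE",
        "origin_country_code": "CH",
        "destination_name": "BERNE",
    }
    try:
        enrich(row, SentinelSession())
    except ValueError as exc:
        # Rejected by validation; the sentinel was never touched.
        assert str(exc) == "Missing destination_country_code"
    else:
        raise AssertionError("row should have been rejected")
```

If validation ever regressed and let the row through, the sentinel would raise AssertionError on the first session access, so the test fails without needing Postgres.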
5. Backfill risk (accepted)
Legacy DataEntry rows ingested under the old CH-default resolver may carry natural_keys pointing at the Swiss station. Per project policy ("No DB backfill until v1.x" — v0.x drops the DB between deploys) we accept the drift and do not migrate historical rows.
6. Verification
- cd backend && uv run python scripts/build_train_seed_from_trainline.py → wrote 51299 train stations.
- uv run pytest tests/unit/services/data_ingestion/test_train_enrich_csv_row.py
- uv run pytest tests/integration/services/data_ingestion/test_travel_pg.py -k train
- make type-check
7. Files
- backend/scripts/build_train_seed_from_trainline.py: new (replaces the retired add_country_codes_to_train_csvs.py).
- backend/app/modules/professional_travel/schemas.py: require country_code.
- backend/app/services/location_service.py: drop CH default param.
- backend/tests/unit/services/data_ingestion/test_train_enrich_csv_row.py: new.
- backend/tests/integration/services/data_ingestion/test_travel_pg.py: test rename/rescope.
- docs/src/implementation-plans/1183-train-csv-country-code.md: this plan.