Implementation Progress Report
Summary
The audit trail infrastructure has been partially implemented with core versioning, database models, and API integration. The foundation is solid but needs external storage integration and UI exposure.
⚠️ Important Clarification: AUDIT vs APPLICATION Logs
| Log Type | Storage | Purpose | Tools |
|---|---|---|---|
| AUDIT (OPDO) | audit_documents DB table | "Who did what when?" - Track data modifications | Internal DB + Export to ElasticSearch |
| APPLICATION | Pod logs (Kubernetes) | Debug: CPU/RAM, errors, API connectivity | Grafana, Loki, OpenTelemetry |
- Travel API Logs = APPLICATION logs (API connectivity, performance) in Kubernetes pods
- Travel Data Imports = AUDIT logs created when job is inserted into DB
- User Activity = AUDIT logs (connections, data modifications)
✅ Phase 1: What Has Been Done
1. Core AUDIT Infrastructure (OPDO - DB Historical Logs)
- ✅ Created `AuditDocument` model with versioning fields (`entity_type`, `entity_id`, `version`, `is_current`, `change_type`, etc.)
- ✅ Implemented `AuditChangeTypeEnum` with CREATE, READ, UPDATE, DELETE, ROLLBACK, TRANSFER
- ✅ Database migration files created for PostgreSQL/SQLite compatibility
- ✅ Hash chain integrity mechanism for tamper detection
- ✅ Tracks what changed, when, by whom, from which IP, via which route
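The hash chain idea can be illustrated with a minimal sketch. The report does not state which hash function or record layout the implementation uses, so SHA-256, the `GENESIS_HASH` sentinel, and the `payload`/`previous_hash`/`hash` keys below are assumptions for illustration only.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed sentinel used as "previous hash" of the first record


def compute_hash(payload: dict, previous_hash: str) -> str:
    """Hash a record's payload together with the previous record's hash."""
    # Canonical JSON so that key order does not change the hash.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((previous_hash + canonical).encode("utf-8")).hexdigest()


def append_record(chain: list, payload: dict) -> None:
    """Append a record that links to the hash of the record before it."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append(
        {
            "payload": payload,
            "previous_hash": previous_hash,
            "hash": compute_hash(payload, previous_hash),
        }
    )


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered record breaks the chain."""
    expected_previous = GENESIS_HASH
    for record in chain:
        if record["previous_hash"] != expected_previous:
            return False
        if record["hash"] != compute_hash(record["payload"], expected_previous):
            return False
        expected_previous = record["hash"]
    return True
```

Because each hash covers the previous one, modifying an old record invalidates that record and every later link, which is what makes tampering detectable.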
2. High-Performance Bulk Version Creation
- ✅ Implemented `AuditDocumentService.bulk_create_versions()`
    - Reduces N sequential DB queries to 1 batch query
    - Single flush instead of N flushes
    - Critical for CSV import performance (1000+ entries)
- ✅ `AuditDocumentRepository.bulk_create()` for batch insertions
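The batching pattern can be sketched with the standard-library `sqlite3` module as a stand-in. This is not the project's `bulk_create_versions()`; the trimmed-down table columns and the function body are hypothetical and only show one batched call with a single commit instead of N separate round trips.

```python
import sqlite3

# Hypothetical, trimmed-down columns; the real audit_documents table has more fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_documents (
    id INTEGER PRIMARY KEY,
    entity_type TEXT NOT NULL,
    entity_id INTEGER NOT NULL,
    version INTEGER NOT NULL,
    change_type TEXT NOT NULL
)
"""


def bulk_create_versions(conn: sqlite3.Connection, rows: list) -> int:
    """Insert all version rows in one batched call and commit once."""
    with conn:  # a single transaction, committed when the block exits cleanly
        cursor = conn.executemany(
            "INSERT INTO audit_documents "
            "(entity_type, entity_id, version, change_type) VALUES (?, ?, ?, ?)",
            rows,
        )
    return cursor.rowcount


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    connection.execute(SCHEMA)
    versions = [("DataEntry", entry_id, 1, "CREATE") for entry_id in range(1000)]
    print(bulk_create_versions(connection, versions), "versions inserted")
```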
3. Data Entry Service Versioning
- ✅ Enhanced `DataEntryService` methods:
    - `create()` - creates version on single entry creation
    - `bulk_create()` - bulk versions with job_id context (CSV imports)
    - `bulk_delete()` - captures snapshots before deletion
    - `update()` - audit trail for modifications
    - `delete()` - records deletions
    - `get_submodule_data()` - READ audit records for OPDO compliance
4. Request Context Capture
- ✅ New `app/utils/request_context.py` - extract IP, route path, payload
- ✅ New `app/utils/audit_helpers.py` - extract user identifiers (sciper, traveler_id)
- ✅ API route handlers updated to pass request context
- ✅ Updated routes: `get_submodule`, `create`, `update`, `delete` in `carbon_report_module.py`
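As an illustration of what request context capture involves, here is a framework-agnostic sketch. It is not the content of `app/utils/request_context.py`; the `RequestContext` class, the `extract_request_context` function, and the assumption that header names arrive lowercased are all hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RequestContext:
    ip_address: Optional[str]
    route_path: str
    method: str


def extract_request_context(
    headers: Mapping[str, str],
    client_host: Optional[str],
    route_path: str,
    method: str,
) -> RequestContext:
    """Build the audit context from request metadata.

    Assumes header names are already lowercased by the web framework.
    """
    # Behind a reverse proxy, the original client is the first X-Forwarded-For entry.
    forwarded = headers.get("x-forwarded-for", "")
    first_hop = forwarded.split(",")[0].strip()
    return RequestContext(
        ip_address=first_hop or client_host,
        route_path=route_path,
        method=method.upper(),
    )
```

`X-Forwarded-For` can be spoofed by clients, so this fallback order is only appropriate when the application sits behind a proxy that overwrites the header.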
5. User Activity Tracking (AUDIT LOGS)
- ✅ Data Entry CREATE operations logged with user, timestamp, IP, route
- ✅ Data Entry UPDATE/DELETE operations logged with change snapshots
- ✅ READ operations logged (trips, member data for OPDO compliance)
- ✅ Handled IDs extracted (sciper for headcount, traveler_id for trips)
- ✅ CSV import jobs tracked (who imported, when, job_id)
- ⏳ Authentication events (login, logout) - NOT YET IMPLEMENTED
    - Need to add audit records for user logins to answer "who logged in when?"
    - Should log from auth middleware/endpoint
6. Schema & DB Updates
- ✅ Migration 1: audit_documents table creation
- ✅ Migration 2: Enum type conversion, field renaming
- ✅ UserRead schema includes provider_code
- ✅ HeadCount schemas include sciper field
🚧 Phase 2: Still Needs Implementation
1. External Audit Log Storage (HIGH PRIORITY) - NOT APPLICATION LOGS
This is for AUDIT logs only (who changed what in DB), NOT travel API application logs.
EPFL Data Protection Compliance (inside.epfl.ch/data-protection/):
- Must log all CRUD operations on personal data (Headcount + Travel modules with `sciper` involved)
- Must log READ operations on personal data (rule TBD: likely when the number of scipers involved is < 20)
- Mandatory Fields in all audit logs:
    - `actor_id`: Identifier of person who performed processing (unique, retrievable by authorized staff)
    - `recipient_id`: Identifier of person whose data is accessed (if applicable)
    - `change_type`: Nature of processing (READ, CREATE, UPDATE, DELETE, TRANSFER)
    - `changed_at`: ISO 8601 timestamp with timezone (yyyy-mm-dd HH:MM:SS ±UTC)
- Recommended Fields (for compliance analysis):
    - `subject_id`: Which sciper/person's data was accessed
    - `query_summary`: Transaction/query used (e.g., "SELECT initial..." without results)
    - `source_ip`: Machine initiating processing (hostname or IP address)
- Automatic Archival & Purge Mechanism (Mécanisme d'archivage et purge automatique)
    - Keep audit logs in local DB for 1 year (searchable, fast access)
    - After 1 year: automatically archive to ElasticSearch/external storage (IS-GOV)
    - Automatic purge from local DB after archival (no manual intervention)
    - Ensure archived logs remain immutable (write-once, read-many)
    - All EPFL compliance fields preserved during archival
    - Scope: Only personal data CRUD + READ operations (< 20 sciper threshold TBD)
- Audit Log Viewing Interface (externalized console)
    - Standalone audit log viewer (NOT embedded in app)
    - Query/filter by:
        - User (who performed action)
        - Entity type (DataEntry, User, etc.)
        - Date range (1-year local + historical from ES)
        - Action (CREATE, READ, UPDATE, DELETE, TRANSFER)
        - Subject (whose data was accessed)
    - Display: timestamp, actor, action, entity, changes, IP address
    - Read-only interface (no data modification)
    - Authorization: Service managers + admins only
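The mandatory and recommended fields listed above could be enforced at export time with a small record type. The following is a sketch under stated assumptions: the `AuditLogRecord` class and its `to_export()` method are hypothetical, and the allowed `change_type` values are taken from the compliance list in this section rather than from the full `AuditChangeTypeEnum`.

```python
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Optional

# Values named in the compliance field list above (assumed to be the exportable set).
ALLOWED_CHANGE_TYPES = {"READ", "CREATE", "UPDATE", "DELETE", "TRANSFER"}


@dataclass(frozen=True)
class AuditLogRecord:
    # Mandatory fields
    actor_id: str
    change_type: str
    changed_at: datetime
    recipient_id: Optional[str] = None  # mandatory "if applicable"
    # Recommended fields
    subject_id: Optional[str] = None
    query_summary: Optional[str] = None
    source_ip: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.actor_id:
            raise ValueError("actor_id is mandatory")
        if self.change_type not in ALLOWED_CHANGE_TYPES:
            raise ValueError(f"unknown change_type: {self.change_type}")
        if self.changed_at.tzinfo is None:
            raise ValueError("changed_at must be timezone-aware")

    def to_export(self) -> dict:
        """Serialize with an ISO 8601 timestamp that keeps the UTC offset."""
        data = asdict(self)
        data["changed_at"] = self.changed_at.isoformat(sep=" ", timespec="seconds")
        return data
```

Rejecting naive timestamps at construction time keeps records without a UTC offset out of the archive, where they would be hard to correct later.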
2. Application Observability Logs (MONITORING - SEPARATE CONCERN)
These are Kubernetes pod logs, handled by DSI (not this feature):
- ⏳ Travel API connectivity logs → Loki/Grafana (NOT our responsibility)
- ⏳ CPU/RAM usage logs → OpenTelemetry
- ⏳ Error/debug logs → Pod logs
- Note: Travel API import jobs ARE tracked as audit events in audit_documents
3. Authentication Audit Logging (HIGH PRIORITY) - NEW
Add audit events for user authentication:
- Login Events
    - entity_type = "User"
    - entity_id = user_id
    - change_type = CREATE (new session) / TRANSFER (existing user login)
    - Track: who, when, from which IP
- Logout Events
    - change_type = DELETE (session ended)
- Failed Login Attempts (optional but recommended)
    - Track for security analysis
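The login/logout mapping above can be sketched as a small event builder. Since authentication logging is not implemented yet, everything here is hypothetical: the `build_auth_event` function, the `new_session` flag used to choose between CREATE and TRANSFER, and the dictionary keys (`changed_by` and `source_ip` follow names used elsewhere in this report). Failed logins are left out because no `change_type` is defined for them.

```python
from datetime import datetime, timezone
from typing import Optional


def build_auth_event(
    user_id: int,
    event: str,
    source_ip: Optional[str],
    new_session: bool = True,
    changed_at: Optional[datetime] = None,
) -> dict:
    """Map a login/logout event onto the audit fields described above."""
    if event == "login":
        change_type = "CREATE" if new_session else "TRANSFER"
    elif event == "logout":
        change_type = "DELETE"
    else:
        raise ValueError(f"unsupported auth event: {event}")
    timestamp = changed_at or datetime.now(timezone.utc)
    return {
        "entity_type": "User",
        "entity_id": user_id,
        "change_type": change_type,
        "changed_by": user_id,  # the user acts on their own session
        "source_ip": source_ip,
        "changed_at": timestamp.isoformat(),
    }
```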
4. Service Manager API & UI (HIGH PRIORITY)
Query the AUDIT logs to answer Service Manager questions.
See 240-feat-interface-service-mgr.
5. Data Retention & Compliance (MEDIUM PRIORITY) - AUDIT LOGS ONLY
Legal requirement: Keep audit logs for 5 years minimum, with local 1-year copies.
- 1-Year Local Archive
    - Audit logs kept in DB for quick access (1 year)
    - Indexed for fast queries
- Long-Term External Storage
    - After 1 year: move to ElasticSearch/cold storage
    - Keep for 5+ years per legal requirements
    - Immutable (write-once, read-many)
- Purge Policy
    - Automated job to archive logs older than 1 year
    - Delete from local DB after archiving
    - Ensure deleted data cannot be recovered (disk wipe/encryption)
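The archive-then-purge order matters: rows should leave the local DB only after the external copy succeeded. A minimal sketch of that ordering follows, using `sqlite3` as a stand-in database and a caller-supplied `export` callable in place of the ElasticSearch client. The function name, the column list, and the assumption that `changed_at` is stored as ISO 8601 UTC text are all hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Tuple

LOCAL_RETENTION = timedelta(days=365)  # "1 year" from the report, approximated in days


def archive_and_purge(
    conn: sqlite3.Connection,
    export: Callable[[List[Tuple]], None],
    now: datetime,
) -> int:
    """Export rows older than the local retention window, then delete them."""
    # Assumes changed_at is stored as ISO 8601 text in UTC, so string order is time order.
    cutoff = (now - LOCAL_RETENTION).isoformat()
    rows = conn.execute(
        "SELECT id, entity_type, entity_id, change_type, changed_at "
        "FROM audit_documents WHERE changed_at < ? ORDER BY id",
        (cutoff,),
    ).fetchall()
    if not rows:
        return 0
    # The export callable must raise on failure; deletion happens only after it returns.
    export(rows)
    with conn:  # delete exactly the exported ids in one transaction
        conn.executemany(
            "DELETE FROM audit_documents WHERE id = ?",
            [(row[0],) for row in rows],
        )
    return len(rows)
```

Deleting by the ids that were actually exported, rather than re-running the date filter, avoids purging rows that crossed the cutoff between the two statements.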
6. Travel API & Data Imports (MEDIUM PRIORITY)
Clarification: Travel API connectivity logs = APPLICATION logs (pod logs), not audit logs.
- ✅ CSV import jobs ARE tracked in audit_documents (job creation event)
- ✅ Data entries created via CSV ARE tracked (entity creation events)
- ⏳ Verify travel API data imports flow through correctly
- ⏳ Test that CSV import audit trail shows correct job_id
7. Testing & Validation (MEDIUM PRIORITY)
- Unit tests:
    - `test_audit_service.py` - versioning logic, hashing
    - `test_data_entry_service_versioning.py` - integration tests
- Integration tests:
    - CSV bulk import with audit trail
    - Update/delete operations
    - Authentication logging (login/logout)
    - READ audit logging
- Performance tests:
    - Bulk 10k entry import performance
    - Query performance on audit table with 1M+ records
- Target coverage: ≥60% backend code

| Files with missing lines | Patch % | Lines |
|---|---|---|
| backend/app/api/v1/audit.py | 25.68% | 81 Missing ⚠️ |
| backend/app/repositories/audit_repo.py | 24.44% | 68 Missing ⚠️ |
| backend/app/services/audit_service.py | 56.48% | 57 Missing ⚠️ |
| backend/app/utils/audit_helpers.py | 42.85% | 20 Missing ⚠️ |
| backend/app/api/v1/auth.py | 80.00% | 10 Missing ⚠️ |
| backend/app/utils/request_context.py | 66.66% | 10 Missing ⚠️ |
| backend/app/services/data_entry_service.py | 87.03% | 7 Missing ⚠️ |
| ...ckend/app/services/data_ingestion/base_provider.py | 0.00% | 6 Missing |
8. Security & RBAC (MEDIUM PRIORITY)
- Authorization checks:
    - Only service managers can view audit logs
    - Users CANNOT see other users' activity (privacy)
    - Admin has full access
- Audit log access is itself logged (create audit event when someone views logs)
- IP address validation/masking (avoid exposing internal IPs)
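A possible shape for the access check and the IP masking, using only the standard-library `ipaddress` module. The role names, the `"internal"`/`"invalid"` placeholders, and the choice to truncate public addresses to a /24 (IPv4) or /48 (IPv6) network are assumptions for illustration, not decisions recorded in this report.

```python
import ipaddress

AUDIT_VIEWER_ROLES = {"service_manager", "admin"}  # hypothetical role names


def can_view_audit_logs(role: str) -> bool:
    """Only service managers and admins may open the audit log viewer."""
    return role in AUDIT_VIEWER_ROLES


def mask_ip(raw_ip: str) -> str:
    """Validate an address and hide internal ones before display."""
    try:
        ip = ipaddress.ip_address(raw_ip)
    except ValueError:
        return "invalid"
    if ip.is_private or ip.is_loopback:
        return "internal"
    # Truncate public addresses to their network so single hosts are not exposed.
    prefix = 24 if ip.version == 4 else 48
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))
```

Masking would apply only at display time; the stored audit record keeps the full address so that authorized staff can still retrieve it.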
9. Documentation (LOW PRIORITY)
- User Guide:
    - "Activity History" section in Service Manager
    - How to find who created/modified an entry
    - Understanding the audit log timeline
- API Documentation:
    - Audit query endpoints
    - AuditChangeTypeEnum values
    - Request/response examples
- Architecture documentation:
    - "AUDIT vs APPLICATION Logs" clarification
    - Versioning system design
    - Hash chain integrity explanation
    - 1-year local + 5-year external retention model
10. Operational Tasks (LOW PRIORITY)
- Database index optimization on audit_documents
    - Index on (entity_type, entity_id, changed_at) for faster queries
    - Index on changed_by for user activity queries
- Monitoring/alerting for audit table growth
- Backup strategy for audit data (immutable copies)
- Audit log integrity verification script
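The two indexes proposed above could be created as follows. The index names are hypothetical, and the statements are run through `sqlite3` only so the sketch is self-contained; in the project they would more likely live in a migration file.

```python
import sqlite3

# Hypothetical index names; columns are the ones proposed in the list above.
INDEX_STATEMENTS = [
    "CREATE INDEX IF NOT EXISTS ix_audit_entity_history "
    "ON audit_documents (entity_type, entity_id, changed_at)",
    "CREATE INDEX IF NOT EXISTS ix_audit_changed_by "
    "ON audit_documents (changed_by)",
]


def create_audit_indexes(conn: sqlite3.Connection) -> None:
    """Create the audit indexes; safe to run more than once."""
    with conn:
        for statement in INDEX_STATEMENTS:
            conn.execute(statement)
```

The composite index serves "history of one entity, ordered by time" lookups, while the single-column index serves "everything this user did" queries.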
Open Questions for DPO/Legal Team
Source: EPFL Data Protection Guidelines (inside.epfl.ch/data-protection/)
Contact: PM/PO
- READ Data Logging Scope & Threshold
    - Which READ operations must be logged to ES?
    - Tentative Rule: Log all READs where affected `sciper` count < 20 (to prevent data breach of "aggregated" data)
    - Question: Is this the correct threshold? Below what number of users does aggregated data become personally identifiable?
    - Example: Unit with 3 people, only 1 took a trip → logging this READ creates breach risk
    - Need: Exact rule from DPO on scope
- Headcount vs Travel Module Logging
    - Both modules involve `sciper` (mandatory personal data subject)
    - Both require CRUD + READ audit logging
    - Confirmed: Both modules go to ES
    - Need: Confirmation on READ threshold application to both
- Anonymous/Aggregated Data
    - Dashboards with aggregated stats (no individual sciper)?
    - Should these be logged? Does the < 20 rule apply?
    - Need: Classification of which queries are "personal data" vs "aggregated only"
- Data Recipient Identification
    - When data is accessed by a report or API call, how to identify the "recipient"?
    - Is it the end-user, or the system querying on their behalf?
    - Need: Guidance on `recipient_id` field mapping for different query types
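Until the DPO confirms the rule, the tentative READ threshold could be isolated in one function so it is easy to change. This is a sketch of the tentative rule only: the function name and the `threshold` default are hypothetical, and treating an empty result as "nothing to log" is an extra assumption that the open questions above do not settle.

```python
from typing import Iterable

READ_LOG_THRESHOLD = 20  # tentative value, pending DPO confirmation


def should_log_read(
    affected_scipers: Iterable[str],
    threshold: int = READ_LOG_THRESHOLD,
) -> bool:
    """Tentative rule: log a READ when fewer than `threshold` distinct people are affected."""
    distinct_count = len(set(affected_scipers))
    # Assumption: a query that touches no sciper at all is not a personal-data READ.
    return 0 < distinct_count < threshold
```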
Implementation Roadmap
Sprint 1 (Immediate - Week of Feb 17)
- ✅ Core audit infrastructure (DONE)
- ✅ High-performance bulk versioning (DONE)
- Write unit tests for AuditDocumentService
- Add authentication logging - login/logout events (NEW)
- Verify CSV import captures audit trail end-to-end
Sprint 2 (Week of Feb 24)
- Create audit query endpoints `/api/v1/audit/activity`
- Implement basic Service Manager dashboard view
- Add date-range and user filtering
- Test authentication event logging
Sprint 3 (Week of Mar 3)
- ElasticSearch integration (requires IS-GOV access)
- Implement 1-year archive + ES sync job
- Create ES queries for dashboard
- Set up purge automation (1-year trigger)
Sprint 4 (Week of Mar 10)
- 5-year retention compliance testing
- Export functionality (CSV/JSON)
- Archive immutability verification
- Documentation and user guide
🎯 Success Criteria Checklist
| Criterion | Status | Notes |
|---|---|---|
| Activity saved in AUDIT DB | ✅ Done | All CRUD ops logged |
| Authentication events logged | ⏳ Not Started | Login/logout audit records needed |
| Visible in Service Mgr UI | ✅ Done | API done, UI needed |
| Query audit logs by date/user/type | ✅ Done | API endpoints required |
| Export audit log capability | ✅ Done | CSV/JSON export |
| User guide updated | ⏳ Not Started | "Activity History" section needed |
| API docs updated | ⏳ Not Started | OpenAPI/Swagger |
| Test coverage ≥60% | ⏳ Not Started | Need test suite |
| ElasticSearch integration | ⏳ Not Started | Awaits IS-GOV access |
| 1-year local archive automation | ⏳ Not Started | Cron job to move old logs |
| 5-year retention compliance | ⏳ Not Started | Legal requirement |
| Who created/modified objects | ✅ Done | changed_by field populated |
| Who logged in when | ⏳ Not Started | Authentication audit events needed |
| Clarified AUDIT vs APP logs | ✅ Done | This document |
Next Immediate Steps
- Clarify scope with PM: AUDIT logs ≠ APPLICATION logs
    - AUDIT: Who changed data in DB (our responsibility)
    - APPLICATION: Pod logs, API connectivity (DSI responsibility)
- Add authentication logging
    - Implement login/logout events in audit_documents
    - Answer the "who logged in when?" requirement
- Run CSV import end-to-end test
    - Confirm audit trail works for bulk operations
    - Verify job_id tracking
- Define audit query API with Service Manager team
    - What fields do they need to filter on?
    - What timeline granularity?
- Get ES access from security team
    - Start planning ElasticSearch cluster use
- Schedule review with code review team on audit log format/structure
    - Validate field naming conventions
    - Confirm compliance with regulations