SamplesAnalysisID in bulk data not validated against record_id in URL

Problem

When posting bulk data via POST /api/rafs-ddms/v2/samplesanalysis/{record_id}/data/{analysis_type}, the SamplesAnalysisID column in the request body is not validated to match the record_id in the URL path.

This allows inconsistent data where:

  • URL: /samplesanalysis/ns:wpc--SamplesAnalysis:ABC:/data/cce
  • Body contains: "SamplesAnalysisID": "ns:wpc--SamplesAnalysis:XYZ:"

The system silently accepts this mismatch.

Current Behavior

  • record_id from URL is used to update DDMSDatasets in the WPC metadata
  • SamplesAnalysisID in bulk data is stored as-is in the parquet file
  • No validation connects these two values

Impact

Data integrity issue: the SamplesAnalysisID stored in the bulk content (the link from the content back to the WPC) can disagree with the authoritative relationship recorded in DDMSDatasets on the WPC.

Proposed Solutions

Option A: Strict Validation

  • Validate that all SamplesAnalysisID values in the data match record_id
  • Return 422 error if mismatch detected
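
A minimal sketch of what Option A could look like, assuming the bulk payload has already been parsed into a list of row dictionaries. The names `find_mismatched_ids`, `validate_samples_analysis_ids`, and `SamplesAnalysisIdMismatch` are hypothetical and do not exist in the codebase. The route handler would still need to translate the exception into a 422 response using the project's existing error handling, which is not shown here.

```python
from typing import Any, Iterable, List, Mapping


class SamplesAnalysisIdMismatch(ValueError):
    """Hypothetical error: a row's SamplesAnalysisID differs from the URL record_id."""


def find_mismatched_ids(
    rows: Iterable[Mapping[str, Any]],
    record_id: str,
    column: str = "SamplesAnalysisID",
) -> List[str]:
    """Return the distinct column values that are not equal to record_id.

    Assumption: comparison is an exact string match, and a missing value
    counts as a mismatch (it is reported as the string "None").
    """
    return sorted({str(row.get(column)) for row in rows if row.get(column) != record_id})


def validate_samples_analysis_ids(
    rows: Iterable[Mapping[str, Any]], record_id: str
) -> None:
    """Raise SamplesAnalysisIdMismatch if any row disagrees with record_id."""
    mismatched = find_mismatched_ids(rows, record_id)
    if mismatched:
        raise SamplesAnalysisIdMismatch(
            f"SamplesAnalysisID values {mismatched} do not match record_id {record_id!r}"
        )
```

Reporting the offending values in the error message lets the caller see which IDs need correcting. Whether a missing SamplesAnalysisID should be rejected or tolerated is a design decision this sketch does not settle.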

Option B: Auto-populate

  • Overwrite/populate SamplesAnalysisID column with record_id from URL
  • Ensures consistency without requiring user to provide it
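
Option B could be sketched along the following lines, again assuming the payload is available as a list of row dictionaries. The function name `populate_samples_analysis_id` is hypothetical. It returns new rows instead of mutating the input, so the original request body stays available for logging or comparison.

```python
from typing import Any, Dict, Iterable, List, Mapping


def populate_samples_analysis_id(
    rows: Iterable[Mapping[str, Any]],
    record_id: str,
    column: str = "SamplesAnalysisID",
) -> List[Dict[str, Any]]:
    """Return copies of the rows with `column` set to record_id.

    Any value the client supplied in `column` is overwritten, and the
    column is added to rows that did not contain it.
    """
    return [{**row, column: record_id} for row in rows]
```

One trade-off to weigh: overwriting silently hides client mistakes that Option A would surface, so logging a warning when an overwritten value differed from record_id may be worth considering.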

Affected Code

  • app/api/routes/v2/data/endpoints.py:157-234 (post_data_v2)
  • app/api/routes/data/api.py:262-326 (_get_validated_payload)
  • app/bulk_data_validation/data_validation.py