Ingestion DAGs (Open Subsurface Data Universe Software / Platform / Data Flow / Data Ingestion), issue #51
Issue created Mar 19, 2021 by Debasis Chatterjee (@debasisc, Owner)

Ingestion workflow: provide an option to enable or disable validation checks (referenced information)

Starting with a recent version of the "Ingestion Workflow", referential integrity checks are enabled. This is very useful.

However, please give the end user (the Data Loader), who may not have the skills needed to modify the Python code inside the DAG, an option to set the "integrity checks" to true or false. The default can remain "True".
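One way to expose such a switch without touching the DAG's Python code is to read it from the run configuration supplied when the DAG is triggered (in Airflow, the JSON passed as `conf` when triggering a run), falling back to True when the key is absent. The sketch below is a minimal illustration under that assumption; the key name `enable_integrity_check` and the helper `integrity_check_enabled` are hypothetical and are not part of the current Ingestion Workflow.

```python
from typing import Any, Mapping, Optional

# Hypothetical key name; not an existing option of the Ingestion DAGs.
INTEGRITY_CHECK_KEY = "enable_integrity_check"

_TRUE_STRINGS = {"true", "1", "yes", "on"}
_FALSE_STRINGS = {"false", "0", "no", "off"}


def integrity_check_enabled(conf: Optional[Mapping[str, Any]], default: bool = True) -> bool:
    """Return whether referential integrity checks should run.

    `conf` stands in for the run configuration a Data Loader supplies when
    triggering the DAG. A missing conf or missing key yields `default`
    (True, as requested in this issue).
    """
    if not conf or INTEGRITY_CHECK_KEY not in conf:
        return default
    value = conf[INTEGRITY_CHECK_KEY]
    # Accept real booleans as-is.
    if isinstance(value, bool):
        return value
    # Accept common string spellings, since users often type "false" in JSON/UI.
    if isinstance(value, str):
        normalized = value.strip().lower()
        if normalized in _TRUE_STRINGS:
            return True
        if normalized in _FALSE_STRINGS:
            return False
    # Fail loudly rather than silently guessing the user's intent.
    raise ValueError(f"Invalid value for {INTEGRITY_CHECK_KEY!r}: {value!r}")
```

For example, `integrity_check_enabled({"enable_integrity_check": "false"})` returns False, while `integrity_check_enabled(None)` keeps the checks on. Raising on unrecognized values is a design choice: a typo should not quietly disable validation.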

An excerpt from the Airflow log (GCP environment) is shown below for quick reference:

[2021-03-18 17:45:08,504] {base_task_runner.py:113} INFO - Job 27967: Subtask provide_manifest_integrity_task [2021-03-18 17:45:08,503] 
{validate_referential_integrity.py:156} DEBUG - Extracted reference ids: 
['osdu:reference-data--AliasNameType:WELL_NAME', 
'osdu:reference-data--VerticalMeasurementPath:DEPTH_DATUM_ELEV', 
'osdu:reference-data--ResourceSecurityClassification:Public', 
'osdu:reference-data--FacilityEventType:SPUD_DATE', 
'osdu:reference-data--FacilityType:WELLBLABLA', 
'osdu:master-data--Organisation:HESS']

In this example, all checks failed because the environment lacked the standard reference values at the time of this run. Otherwise, I would expect only one reference check to fail (FacilityType = "WELLBLABLA" instead of "WELL").
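The expectation above amounts to a set difference between the reference ids extracted from the manifest and the ids that exist in the target environment. The sketch below assumes that framing; `find_missing_ids` is a hypothetical helper, and the `existing_ids` set is made-up illustration data standing in for whatever lookup the DAG actually performs against the platform.

```python
from typing import Iterable, List, Set


def find_missing_ids(referenced: Iterable[str], existing: Set[str]) -> List[str]:
    """Return referenced ids absent from `existing`, sorted for stable output."""
    return sorted(set(referenced) - existing)


# Reference ids as extracted in the log excerpt above.
referenced_ids = [
    "osdu:reference-data--AliasNameType:WELL_NAME",
    "osdu:reference-data--VerticalMeasurementPath:DEPTH_DATUM_ELEV",
    "osdu:reference-data--ResourceSecurityClassification:Public",
    "osdu:reference-data--FacilityEventType:SPUD_DATE",
    "osdu:reference-data--FacilityType:WELLBLABLA",
    "osdu:master-data--Organisation:HESS",
]

# Illustrative only: an environment where the standard reference values are
# loaded, so the valid "WELL" facility type exists but "WELLBLABLA" does not.
existing_ids = (
    set(referenced_ids) - {"osdu:reference-data--FacilityType:WELLBLABLA"}
) | {"osdu:reference-data--FacilityType:WELL"}

missing = find_missing_ids(referenced_ids, existing_ids)
print(missing)  # ['osdu:reference-data--FacilityType:WELLBLABLA']
```

Against an empty environment (as in the run above), the same function reports all six ids as missing.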

[2021-03-18 17:45:44,405] {base_task_runner.py:113} INFO - Job 27967: Subtask provide_manifest_integrity_task [2021-03-18 17:45:44,405] 
{validate_referential_integrity.py:177} WARNING - The next ids are absent in the system: 
['osdu:reference-data--FacilityType:WELLBLABLA', 
'osdu:reference-data--FacilityEventType:SPUD_DATE', 
'osdu:reference-data--ResourceSecurityClassification:Public', 
'osdu:reference-data--VerticalMeasurementPath:DEPTH_DATUM_ELEV', 
'osdu:master-data--Organisation:HESS', 
'osdu:reference-data--AliasNameType:WELL_NAME']

[2021-03-18 17:45:44,413] {base_task_runner.py:113} INFO - Job 27967: Subtask provide_manifest_integrity_task [2021-03-18 17:45:44,411] 
{validate_referential_integrity.py:231} WARNING - Resource with kind odesprod:wks:master-data--Well:1.0.0 was rejected
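The last warning shows a whole resource being rejected because of its absent references. A minimal sketch of how a user-facing switch could gate that rejection follows; the `Resource` shape (a kind plus a flat list of referenced ids) and `filter_resources` are hypothetical simplifications, since real OSDU manifest records carry references inside their data blocks and the DAG's own rejection logic is not shown in this issue.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Resource:
    # Simplified, hypothetical shape: real manifest records embed their
    # references in the record data rather than as a ready-made list.
    kind: str
    reference_ids: List[str] = field(default_factory=list)


def filter_resources(
    resources: List[Resource], existing: Set[str], check_enabled: bool = True
) -> Tuple[List[Resource], List[Resource]]:
    """Split resources into (accepted, rejected).

    With check_enabled=False, every resource is accepted unchanged.
    """
    if not check_enabled:
        return list(resources), []
    accepted: List[Resource] = []
    rejected: List[Resource] = []
    for resource in resources:
        # Reject a resource if any of its references is absent.
        if all(ref in existing for ref in resource.reference_ids):
            accepted.append(resource)
        else:
            rejected.append(resource)
    return accepted, rejected


well = Resource(
    kind="odesprod:wks:master-data--Well:1.0.0",
    reference_ids=["osdu:reference-data--FacilityType:WELLBLABLA"],
)
```

With checks enabled and an empty `existing` set, `filter_resources([well], set())` rejects the Well, matching the log; with `check_enabled=False` the same call accepts it.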