Parsers/converters code organization proposal
Rationale
A number of different parsers have now been presented to OSDU. These parsers/converters were implemented using different technologies and programming languages (C++, Java, Python, etc.).
This can cause difficulties when onboarding such parsers, in particular around requirements, code organization, and runtime environment setup.
Objective
Approve or develop a unified approach to representing and using parsers/converters as operators in Airflow DAGs.
Proposal
Using containerized DAG steps as much as possible, i.e. using KubernetesPodOperator, was mentioned as one of the best practices for Manifest-based Ingestion pipelines. This means that each pipeline step can be implemented in an entirely different technology, and the executable part of the step runs inside a Docker container.
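As a rough illustration of such a containerized step, the sketch below assembles the arguments for a KubernetesPodOperator task in plain Python. The helper name parser_pod_kwargs, the image name, the namespace, the task id, and the environment variable are all hypothetical placeholders, and the operator's import path differs between versions of the Airflow Kubernetes provider, so this is a sketch under those assumptions rather than a reference configuration.

```python
from typing import Dict, List, Optional


def parser_pod_kwargs(
    task_id: str,
    image: str,
    arguments: List[str],
    env_vars: Optional[Dict[str, str]] = None,
    namespace: str = "default",
) -> dict:
    """Build keyword arguments for a KubernetesPodOperator parser step."""
    return {
        "task_id": task_id,
        # Kubernetes pod names may not contain underscores, so replace them.
        "name": task_id.replace("_", "-"),
        "namespace": namespace,
        "image": image,
        # Arguments are passed to the image's entrypoint as-is.
        "arguments": list(arguments),
        "env_vars": dict(env_vars or {}),
        "get_logs": True,
    }


# Hypothetical values; none of them come from an actual OSDU deployment.
kwargs = parser_pod_kwargs(
    task_id="witsml_parser",
    image="example.registry/witsml-parser:latest",
    arguments=["--input", "/data/input.xml"],
    env_vars={"LOG_LEVEL": "INFO"},
)

# Inside a DAG file the dictionary would be unpacked into the operator.
# The import path below is an assumption and depends on the provider version:
#   from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
#       KubernetesPodOperator,
#   )
#   parse_step = KubernetesPodOperator(dag=dag, **kwargs)
```

Keeping the arguments in a plain dictionary means the DAG file only needs to know the image and its invocation, not the language the parser is written in.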
The proposal is to deliver a properly configured base Dockerfile together with the parser code. This Dockerfile will contain only the dependencies required to run the parser, while keeping the ability to extend or configure the executable invocation (parameters, environment variables, etc.).
Each CSP, if needed, should develop its own Dockerfile with additional requirements or environment variable setup.
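One way to picture the relationship between the base image and a CSP-specific image on the DAG side is a small settings object that a CSP can override. The sketch below is only an assumption about how this could be modeled: the ParserImage class, the with_csp_overrides helper, the image tags, and the environment variables are all hypothetical and not part of any existing OSDU code.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ParserImage:
    """Settings a DAG step needs to run one parser container (hypothetical)."""

    image: str
    arguments: Tuple[str, ...] = ()
    env_vars: Dict[str, str] = field(default_factory=dict)


def with_csp_overrides(
    base: ParserImage,
    image: Optional[str] = None,
    extra_env: Optional[Dict[str, str]] = None,
) -> ParserImage:
    """Return a copy of the base settings with CSP-specific changes applied."""
    # CSP values win over base values when both define the same variable.
    merged_env = {**base.env_vars, **(extra_env or {})}
    return replace(base, image=image or base.image, env_vars=merged_env)


# Settings matching the base image delivered with the parser code.
base = ParserImage(
    image="example.registry/witsml-parser:base",
    arguments=("--input", "/data/input.xml"),
    env_vars={"LOG_LEVEL": "INFO"},
)

# A CSP that built its own image on top of the base one.
csp = with_csp_overrides(
    base,
    image="example.registry/witsml-parser:csp-a",
    extra_env={"STORAGE_URL": "https://storage.example.invalid"},
)
```

A CSP that needs no changes can use the base settings directly, which mirrors the "if needed" wording above.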
Implementation
Example implementation of the proposal: WITSML parser
Note
For lightweight DAG dependencies (local dependencies), the Packaged DAGs approach can be used.
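A packaged DAG is a zip archive with the DAG file at its root and local helper modules in subdirectories. The sketch below builds such an archive with the standard library only. The helper name build_packaged_dag, the file names, and the file contents are hypothetical placeholders chosen for illustration.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Dict


def build_packaged_dag(zip_path: Path, files: Dict[str, str]) -> Path:
    """Write {archive name: source text} entries into a zip archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for arcname, source in files.items():
            archive.writestr(arcname, source)
    return zip_path


# Hypothetical layout: one DAG file at the root plus a local helper package.
layout = {
    "parser_dag.py": "# DAG definition would go here\n",
    "parser_helpers/__init__.py": "",
    "parser_helpers/units.py": "METERS_PER_FOOT = 0.3048\n",
}

with tempfile.TemporaryDirectory() as tmp:
    package = build_packaged_dag(Path(tmp) / "parser_dag.zip", layout)
    with zipfile.ZipFile(package) as archive:
        names = sorted(archive.namelist())
```

The resulting zip file would then be placed in the Airflow DAGs folder. This approach suits pure-Python helpers; heavier dependencies remain better served by the container image.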