Parsers/converters code organization proposal

Rationale

A number of different parsers have now been presented to OSDU. These parsers/converters are implemented using different technologies and programming languages (C++, Java, Python, etc.).

This diversity can make onboarding such parsers difficult in terms of requirements, code organization, and runtime environment setup.

Objective

Approve or develop a unified approach to representing parsers/converters and using them as operators in Airflow DAGs.

Proposal

Using containerized DAG steps wherever possible, i.e. using KubernetesPodOperator, has been mentioned as one of the best practices for Manifest-based Ingestion pipelines. This means that a pipeline step can be implemented in an entirely different technology, while the executable part of the step runs inside a Docker container.
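As an illustration, a containerized parser step can be described by a handful of values that map onto KubernetesPodOperator arguments. This is a minimal sketch, assuming the parameter names of the `cncf.kubernetes` provider's `KubernetesPodOperator` (`task_id`, `name`, `namespace`, `image`, `cmds`, `arguments`, `env_vars`); `ParserStep`, `to_pod_operator_kwargs`, the image URL and the entrypoint are hypothetical. It does not import Airflow, so it runs standalone.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParserStep:
    """Hypothetical description of one containerized parser step."""
    task_id: str
    image: str                      # parser image built from its base Dockerfile
    command: List[str]              # executable invoked inside the container
    arguments: List[str] = field(default_factory=list)
    env: Dict[str, str] = field(default_factory=dict)
    namespace: str = "default"      # assumption: target Kubernetes namespace


def to_pod_operator_kwargs(step: ParserStep) -> Dict[str, object]:
    """Hypothetical helper: map a ParserStep onto KubernetesPodOperator keyword arguments."""
    return {
        "task_id": step.task_id,
        # Pod names must be lowercase DNS-style names; this lowercases and
        # replaces underscores but does not handle every invalid character.
        "name": step.task_id.lower().replace("_", "-"),
        "namespace": step.namespace,
        "image": step.image,
        "cmds": list(step.command),
        "arguments": list(step.arguments),
        "env_vars": dict(step.env),
    }


step = ParserStep(
    task_id="witsml_parser",
    image="registry.example.com/osdu/witsml-parser:latest",  # hypothetical image
    command=["python", "-m", "witsml_parser"],                 # hypothetical entrypoint
    arguments=["--input", "/data/input.xml"],
    env={"LOG_LEVEL": "INFO"},
)
kwargs = to_pod_operator_kwargs(step)
```

In a DAG file, the resulting dictionary would be passed as `KubernetesPodOperator(**kwargs)`. Check which `env_vars` types your provider version accepts (a plain dict or a list of `V1EnvVar` objects).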

The proposal is to deliver a properly configured base Dockerfile together with the parser code. This Dockerfile will contain only the dependencies required to run the parser, while allowing the executable invocation to be extended or configured (parameters, environment variables, etc.).
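One way to keep the invocation configurable is an entrypoint that takes its settings from command-line arguments, falls back to environment variables, and then to built-in defaults. The sketch below assumes a Python-based parser; `parse_invocation` and the `PARSER_INPUT`, `PARSER_OUTPUT` and `PARSER_LOG_LEVEL` variables are hypothetical names, not part of any existing parser.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def parse_invocation(argv: Optional[Sequence[str]] = None,
                     env: Optional[Mapping[str, str]] = None) -> argparse.Namespace:
    """Resolve parser settings: CLI arguments override environment
    variables, which override built-in defaults (hypothetical variable names)."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Containerized parser entrypoint (sketch)")
    parser.add_argument("--input", default=env.get("PARSER_INPUT"))
    parser.add_argument("--output", default=env.get("PARSER_OUTPUT", "/tmp/output"))
    parser.add_argument("--log-level", default=env.get("PARSER_LOG_LEVEL", "INFO"))
    args = parser.parse_args(argv)  # argv=None means sys.argv[1:]
    if not args.input:
        # parser.error prints usage and exits with status 2.
        parser.error("an input must be given via --input or PARSER_INPUT")
    return args
```

If such a script is the image's entrypoint, a CSP-specific image or the operator's `arguments` and `env_vars` can change the parser's behavior without rebuilding the base image.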

If needed, each CSP should develop its own Dockerfile with additional requirements or environment variable setup.
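A CSP-specific Dockerfile would normally be written by hand; the helper below only illustrates its expected shape: it starts `FROM` the parser's base image and adds extra packages and environment variables on top. This is a sketch assuming a Python-based base image that ships `pip`; `render_csp_dockerfile`, the image URL and `CLOUD_PROVIDER` are hypothetical.

```python
from typing import Dict, List


def render_csp_dockerfile(base_image: str,
                          extra_packages: List[str],
                          env: Dict[str, str]) -> str:
    """Hypothetical helper: render a Dockerfile that extends the parser's base image."""
    lines = [f"FROM {base_image}"]
    if extra_packages:
        # Assumes the base image is Python-based and has pip available.
        lines.append("RUN pip install --no-cache-dir " + " ".join(extra_packages))
    for key, value in sorted(env.items()):
        # Values are not escaped, so keep them free of quotes.
        lines.append(f'ENV {key}="{value}"')
    return "\n".join(lines) + "\n"


dockerfile = render_csp_dockerfile(
    base_image="registry.example.com/osdu/witsml-parser-base:latest",  # hypothetical
    extra_packages=["google-cloud-storage"],                            # example CSP SDK
    env={"CLOUD_PROVIDER": "gcp"},                                      # hypothetical variable
)
print(dockerfile)
```

Keeping the CSP layer as a thin `FROM` extension means the base image stays provider-neutral, and provider-specific changes never touch the parser code itself.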

Implementation

Example implementation of the proposal: the WITSML parser

Note

For lightweight DAG dependencies (local dependencies), the Packaged DAGs approach can be used.
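A packaged DAG is a zip archive with the DAG file at the archive root and its local helper modules in packages alongside it; Airflow only supports pure-Python modules in such archives. The sketch below builds one in memory with the standard `zipfile` module. `build_packaged_dag`, the file names and the file contents are hypothetical placeholders, not a real DAG.

```python
import io
import zipfile
from typing import Dict


def build_packaged_dag(files: Dict[str, str]) -> bytes:
    """Hypothetical helper: return zip bytes built from {archive_path: source_code}."""
    root_modules = [p for p in files if "/" not in p and p.endswith(".py")]
    if not root_modules:
        raise ValueError("a packaged DAG needs at least one .py file at the archive root")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path, source in sorted(files.items()):
            archive.writestr(path, source)
    return buffer.getvalue()


package = build_packaged_dag({
    # Placeholder contents only; a real DAG file would define an Airflow DAG.
    "witsml_dag.py": "from witsml_helpers.mapping import to_manifest\n",
    "witsml_helpers/__init__.py": "",
    "witsml_helpers/mapping.py": "def to_manifest(record):\n    return record\n",
})
with zipfile.ZipFile(io.BytesIO(package)) as archive:
    print(archive.namelist())
```

Compiled dependencies (shared libraries) cannot be shipped this way, which is where the containerized approach above is the better fit.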

Edited Jun 29, 2021 by Siarhei Khaletski (EPAM)
