Open Subsurface Data Universe Software / Platform / Data Flow / Data Ingestion / Issues / #48
Issue created Jun 29, 2021 by Siarhei Khaletski (EPAM) (@Siarhei_Khaletski), Owner

Parsers/converters code organization proposal

Rationale

A number of different parsers have now been presented to OSDU. These parsers/converters were implemented using different technologies and programming languages (C++, Java, Python, etc.).

This can cause difficulties when onboarding such parsers, because requirements, code organization, and runtime environment setup differ from one parser to another.

Objective

Approve or develop a unified approach to representing and using parsers/converters as Airflow DAG operators.

Proposal

Using containerized DAG steps as much as possible, i.e. running steps with KubernetesPodOperator, was mentioned as one of the best practices for Manifest-based Ingestion pipelines. This means that a pipeline step can be implemented in a completely different technology, with the executable part of the step running inside a Docker container.
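A minimal sketch of what such a containerized step could look like, assuming the `KubernetesPodOperator` from Airflow's `cncf.kubernetes` provider. The image name, namespace, command, and arguments are hypothetical placeholders, not values taken from any existing OSDU parser. The helper only assembles keyword arguments, so it can be checked without an Airflow installation.

```python
from typing import Any, Dict, List, Optional


def parser_pod_kwargs(
    task_id: str,
    image: str,
    cmds: List[str],
    arguments: Optional[List[str]] = None,
    env_vars: Optional[Dict[str, str]] = None,
    namespace: str = "default",
) -> Dict[str, Any]:
    """Assemble keyword arguments for a KubernetesPodOperator parser step."""
    return {
        "task_id": task_id,
        # Pod names must be DNS-compatible, so lowercase and replace underscores.
        "name": task_id.lower().replace("_", "-"),
        "namespace": namespace,
        "image": image,
        "cmds": list(cmds),
        "arguments": list(arguments or []),
        "env_vars": dict(env_vars or {}),
        "get_logs": True,
    }


# Hypothetical values, for illustration only.
kwargs = parser_pod_kwargs(
    task_id="parse_witsml",
    image="registry.example.com/witsml-parser:latest",
    cmds=["python", "-m", "parser"],
    arguments=["--input", "/data/input.xml"],
    env_vars={"LOG_LEVEL": "INFO"},
)

# Inside a DAG definition, with the provider installed, the step would be
# created roughly as follows (the import path depends on the provider version):
#   from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
#       KubernetesPodOperator,
#   )
#   parse_step = KubernetesPodOperator(**kwargs)
```

Because the DAG only references an image and an invocation, the parser itself can be written in C++, Java, Python, or anything else that runs in the container.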

The proposal is to deliver a properly configured base Dockerfile together with the parser code. This Dockerfile will contain only the dependencies required to run the parser, while allowing the executable invocation to be extended or configured (parameters, environment variables, etc.).

Each CSP, if needed, should develop its own Dockerfile with additional requirements or environment variable setup.
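The layering of a base Dockerfile and an optional provider-specific one can be pictured as a base configuration plus overrides. The sketch below models that merge in Python purely for illustration; in practice the layering would live in the Dockerfiles themselves, with the provider's Dockerfile building on the base image. The `ParserImage` type, the `extend_for_csp` helper, and all image names, variables, and packages are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ParserImage:
    """Describes what a parser image provides at runtime."""
    image: str
    env: Dict[str, str] = field(default_factory=dict)
    extra_requirements: List[str] = field(default_factory=list)


def extend_for_csp(
    base: ParserImage,
    image: str,
    env: Optional[Dict[str, str]] = None,
    extra_requirements: Optional[List[str]] = None,
) -> ParserImage:
    """Return a provider-specific image description layered on the base one."""
    return ParserImage(
        image=image,
        # Provider values win over base values for the same variable.
        env={**base.env, **(env or {})},
        # Provider requirements are appended; base requirements are kept.
        extra_requirements=base.extra_requirements + list(extra_requirements or []),
    )


# Base description shipped with the parser (hypothetical values).
base = ParserImage(
    image="registry.example.com/witsml-parser:base",
    env={"LOG_LEVEL": "INFO"},
)

# A provider adds its own requirements and environment setup.
provider = extend_for_csp(
    base,
    image="registry.example.com/witsml-parser:provider-x",
    env={"LOG_LEVEL": "DEBUG", "STORAGE_URL": "https://storage.example.com"},
    extra_requirements=["provider-x-sdk"],
)
```

The base description stays untouched, so providers that need no changes can use the base image as delivered.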

[Figure: Parsers_dependencies]

Implementation

An example implementation of the proposal: the WITSML parser.

Note

For lightweight DAG dependencies (local dependencies), the Packaged DAGs approach can be used.
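A packaged DAG is a zip archive with the DAG file at its root and local dependencies alongside it. The sketch below builds such an archive with the standard library; the file names, the `parser_lib` package, and the `build_packaged_dag` helper are hypothetical and only illustrate the layout.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Iterable


def build_packaged_dag(
    dag_file: Path, dependency_dirs: Iterable[Path], output_zip: Path
) -> Path:
    """Zip a DAG file (at the archive root) together with local packages."""
    with zipfile.ZipFile(output_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        # The DAG definition must sit at the root of the archive.
        archive.write(dag_file, arcname=dag_file.name)
        for dep_dir in dependency_dirs:
            for path in sorted(dep_dir.rglob("*.py")):
                # Keep the package directory name so imports still resolve.
                arcname = path.relative_to(dep_dir.parent).as_posix()
                archive.write(path, arcname=arcname)
    return output_zip


# Build a throwaway example layout and package it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "my_dag.py").write_text("# DAG definition would go here\n")
    (root / "parser_lib").mkdir()
    (root / "parser_lib" / "__init__.py").write_text("")
    (root / "parser_lib" / "helpers.py").write_text("def parse():\n    pass\n")

    package = build_packaged_dag(
        root / "my_dag.py", [root / "parser_lib"], root / "my_dag.zip"
    )
    with zipfile.ZipFile(package) as archive:
        names = sorted(archive.namelist())
```

The resulting archive would then be placed in the Airflow DAGs folder in place of the loose files.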

Edited Jun 29, 2021 by Siarhei Khaletski (EPAM)