Open Subsurface Data Universe Software / Platform / Data Flow / Data Ingestion / Ingestion Workflow / Issues / #94
Issue created Mar 02, 2021 by Alan Henson (@alan.henson, Developer)

Airflow: Performance design review

R3 ingestion development work uncovered multiple performance issues with Airflow 1.10.x. Options for optimization range from changing the infrastructure that manages Airflow to adopting an approach other than Airflow. Engage the Enterprise Architecture team to review the existing Workflow Service design using Airflow and determine which of the considerations below to pursue.

There are near-term and longer-term considerations. Near-term assumes R3M5/R3M6 development efforts. Longer-term provides space for new architectural considerations, such as cloud-native implementations with standardized workflows for write-once run-anywhere capabilities.

Near-Term

  • Update the Airflow infrastructure to favor always-available Airflow instances, minimizing the lag between ingestion initiation and ingestion start (cost is secondary, though cost-optimized profiles are valid)
  • Configure Airflow within the infrastructure as always-on vs. spin-up-on-demand. This approach increases cost but improves performance as it minimizes the delay in initiating a workflow.
  • Introduce a throttling mechanism for workflow run requests so that large numbers of requests cannot overwhelm Airflow to the point of failure (this also needs to account for the Storage Service max-records limit of 500)
  • Understand what scaling capabilities the CSPs have implemented and whether those are captured as best practices
  • Determine SLAs for workflows in terms of parallelism, CPU and memory consumption, etc.
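
The throttling item above could be sketched roughly as follows. This is a minimal illustration, not the Workflow Service's design: `WorkflowRunThrottle`, `chunk_records`, and the `max_active_runs` parameter are hypothetical names, and the only figure taken from this issue is the Storage Service limit of 500 records. It caps the number of workflow runs in flight and splits a large record set into batches that respect that limit.

```python
import threading
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Storage Service max-records limit cited in this issue.
MAX_RECORDS_PER_REQUEST = 500


def chunk_records(
    records: Sequence[T], size: int = MAX_RECORDS_PER_REQUEST
) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `size` records."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(records), size):
        yield list(records[start:start + size])


class WorkflowRunThrottle:
    """Caps the number of workflow runs in flight (hypothetical helper)."""

    def __init__(self, max_active_runs: int) -> None:
        self._slots = threading.BoundedSemaphore(max_active_runs)

    def try_acquire(self) -> bool:
        """Return True if a run may start now; False means queue or retry."""
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        """Free a slot once a run has finished, whether it succeeded or failed."""
        self._slots.release()
```

A caller would check `try_acquire()` before triggering a run and reject or queue the request when it returns `False`, calling `release()` when the run completes. A suitable value for `max_active_runs` would depend on the SLA and scaling work listed above.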

Longer-Term (will break out into separate issue)

  • A migration to Airflow 2.x should be considered
  • Identify infrastructure updates that would support better scalability
  • Determine SLAs for workflows in terms of parallelism, CPU and memory consumption, etc.
  • Consider additional data processing capabilities (e.g., Apache Spark or Apache Beam)
Edited Mar 23, 2021 by Alan Henson