Issue created Jul 22, 2021 by Ben Lasscock (@blasscoc, Maintainer)

M7 Manifest based ingestion - Load Testing

Definitions

Load Testing - Measuring the number of records or manifests that can be processed at a time.

Background

Throughout M7, EPAM has delivered a number of performance improvements, along with work on configuration issues. We expect this has significantly improved the capacity of manifest-based ingestion, but we don't have a specific figure.

The process of load testing should be repeatable, with the expectation it will be applied to the upcoming Airflow 2+ changes.

Requirements

We need the "5000 manifest test" (@debasisc @todaiks) to be re-run on the M7 release. The result should be a binary pass/fail and the wall-time for executing the job. For completeness, Table 1 shows a set of recommended test cases that we believe should ultimately be automated and runnable through the QA group.

Table 1: Recommended test cases

| Test | Issue | AWS | Azure | GCP | IBM |
| --- | --- | --- | --- | --- | --- |
| the "5000 manifest" | Our current baseline | | | | |
| 1 Manifest with 5,000 records | | | | | |
| 1 Manifest with 20,000 records | | | | | |
| 1 Manifest with 50,000 records | Limit on the size of the request body | | | | |
| 50K manifests in multiple requests, not simultaneously | Airflow 1.X doesn’t allow sending multiple requests (Fixed in Airflow 2.0) | | | | |
| chunks of 50, 1000 DAG runs | 1. max_active_runs (50) limitation; 2. limitation of workflow service: Java heap error (Issue 64); 3. Storage Service has a limitation of storing no more than 500 records/s | | | | |
| chunks of 1000 | see above | | | | |
| 50 DAG runs | | | | | |
| Launch several different DAGs simultaneously | | | | | |
| Ingest the Volve data to promote adoption | | | | | |
| Ingest the TNO data to promote adoption | | | | | |
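
As a rough illustration of the figures above, the sketch below computes how many DAG runs a chunked ingestion needs, estimates the minimum wall-time implied by the 500 records/s Storage Service limit, and wraps a test case so that it reports a binary pass/fail plus wall-time. All names here (`dag_runs_needed`, `min_storage_time_s`, `run_load_test`, `job`) are hypothetical and not part of any OSDU or Airflow API. The `job` callable is a placeholder for whatever actually submits the manifests and waits for the DAG runs to finish. The storage estimate assumes one record per manifest, which may not hold for real data.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoadTestResult:
    name: str
    passed: bool        # binary pass/fail
    wall_time_s: float  # elapsed wall-clock time in seconds


def dag_runs_needed(total_manifests: int, chunk_size: int) -> int:
    """Number of DAG runs if each run ingests one chunk of manifests."""
    return math.ceil(total_manifests / chunk_size)


def min_storage_time_s(total_records: int, records_per_s: int = 500) -> float:
    """Lower bound on wall-time implied by a storage write-rate limit."""
    return total_records / records_per_s


def run_load_test(name: str, job: Callable[[], bool]) -> LoadTestResult:
    """Run one test case; an exception raised by the job counts as a failure."""
    start = time.perf_counter()
    try:
        passed = bool(job())
    except Exception:
        passed = False
    return LoadTestResult(name, passed, time.perf_counter() - start)


if __name__ == "__main__":
    print(dag_runs_needed(50_000, 50))    # chunks of 50
    print(dag_runs_needed(50_000, 1000))  # chunks of 1000
    print(min_storage_time_s(50_000))     # seconds, at 500 records/s
    # Placeholder job: a real one would trigger ingestion and poll for completion.
    print(run_load_test("placeholder", lambda: True))
```

Under these assumptions, 50K manifests in chunks of 50 need 1000 DAG runs, chunks of 1000 need 50 DAG runs, and the storage limit alone puts a floor of about 100 seconds on the run, whatever the chunk size.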
Edited Jul 28, 2021 by Ben Lasscock