# Manifest Ingestion DAG issues
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues

---

**Issue #54: Redundant steps executed: validate schemas and ensure referential integrity**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/54
Opened by Brady Spiva [AWS] · Updated 2022-08-23

## Expected behavior
The validate schemas and ensure referential integrity operations should only need to be executed once per manifest.
## Observed behavior
In the [top-level DAG definition](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/blob/master/src/dags/osdu-ingest-r3.py#L131), you can see “validate schema” and “ensure integrity” operators are executed as part of the DAG:
`branch_is_batch_op >> validate_schema_operator >> ensure_integrity_op >> process_single_manifest_file >> update_status_finished_op`
But then diving deeper into the `process_single_manifest_file` operator, it ALSO [validates schemas and ensures referential integrity](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/blob/master/src/dags/libs/processors/single_manifest_processor.py#L81), resulting in redundant API calls.
This problem will go unnoticed for small workloads, but for larger workloads the increased latency quickly adds up. Using the TNO and Volve sample ingestion dataset as an example, there are about **24,000 manifest files**. If this redundancy adds just 2 extra API calls per manifest (one for schema validation, one for referential integrity checks), and each API request takes 250 milliseconds, then it would increase the overall ingestion time by:
24,000 manifests × (0.25 seconds of redundancy × 2 requests) / 60 seconds per minute = **200 minutes, or about 3.3 hours**.
As the ingestion workload grows, this redundancy becomes a non-trivial amount of time. Naturally, your mileage may vary: latency will differ across networks, customers, Cloud Providers, etcetera. I'm confident customer experiences will be improved by reducing this latency.
## Some proposed solutions
1) Remove the [duplicated referential integrity call](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/blob/master/src/dags/libs/processors/single_manifest_processor.py#L97); the results of this operation aren't used anyway.
2) Change the way the manifest is obtained for the `process_single_manifest_file` operator, allowing the removal of the [duplicated schema validation call](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/blob/master/src/dags/libs/processors/single_manifest_processor.py#L98). We could use Airflow mechanisms (XComs, Variables, etcetera) to reuse the manifest from the `validate_schema_operator`, but that might affect the atomicity of the operator.
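To make option 2 concrete, here is a minimal Python sketch with plain functions standing in for the Airflow operators. All names (`validate_schema`, `process_single_manifest_file`, the `pre_validated` flag) are illustrative, not the DAG's actual API; the point is only to show how passing the already-validated manifest downstream removes the duplicate calls:

```python
# Sketch of option 2: skip re-validation inside the single-manifest processor
# when the upstream DAG steps already validated the manifest. Illustrative only.

API_CALLS = {"validate_schema": 0, "ensure_integrity": 0}

def validate_schema(manifest):
    """Stand-in for the Schema service validation call."""
    API_CALLS["validate_schema"] += 1
    return manifest  # pretend every entity passed validation

def ensure_integrity(manifest):
    """Stand-in for the referential-integrity check."""
    API_CALLS["ensure_integrity"] += 1
    return manifest

def process_single_manifest_file(manifest, pre_validated=False):
    """Current behavior re-validates; the proposed fix skips when upstream already did."""
    if not pre_validated:
        manifest = validate_schema(manifest)
        manifest = ensure_integrity(manifest)
    return {"processed": len(manifest["entities"])}

def run_dag(manifest, deduplicated):
    # Top-level DAG chain: validate -> ensure integrity -> process.
    validated = ensure_integrity(validate_schema(manifest))
    return process_single_manifest_file(validated, pre_validated=deduplicated)

manifest = {"entities": ["well-1", "wellbore-1"]}

run_dag(manifest, deduplicated=False)
print(API_CALLS)  # each service called twice per manifest

API_CALLS.update(validate_schema=0, ensure_integrity=0)
run_dag(manifest, deduplicated=True)
print(API_CALLS)  # each service called once per manifest
```

In the real DAG, the "pass it downstream" step would be an XCom push/pull between `validate_schema_operator` and `process_single_manifest_file`, with the atomicity caveat noted above.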
What do you think?

Participants: Joe, Siarhei Khaletski (EPAM), Kateryna Kurach (EPAM), Alan Henson

---

**Issue #71: Airflow 2.0 WIP**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/71
Opened by Ben Lasscock · Updated 2022-03-09

Tracking the task list for the Airflow 2.0 port described [here](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/65).
### Install Airflow 2.0 to all environments
Azure
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
AWS
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
GCP
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
IBM [issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/66) M9
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
### Airflow 2.0 required DAG code changes (level of effort estimated at 5 days)
The strategy is for EPAM to deliver code changes for the manifest-based ingestion DAGs ([issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/60)) and the WITSML parser ([issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/61)), which will demonstrate the changes required to the Python code to support Airflow 2.+, along with a deprecation strategy to continue support for Airflow 1.10.X.
2.1 Manifest-based ingestion (Python) [issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/60) @Kateryna_Kurach
* [ ] Not Started
* [ ] In Progress - Planned delivery tagged for M7
* [ ] Blocking
* [x] Completed
2.2 WITSML parser (Python) [issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/61) @Kateryna_Kurach
* [ ] Not Started
* [ ] In Progress - Planned delivery tagged for M7
* [ ] Blocking
* [x] Completed
2.3 SEGY -> OpenVDS - Planned delivery tagged for M8
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
SEGY -> ZGY [issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/segy-to-zgy-conversion/-/issues/4) @sacha
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
CSV Parser [issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/csv-parser/csv-parser/-/issues/33) @tdixon @sdubey7 @frubio (M8 by EPAM)
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
### Workflow Services (level of effort estimated at 10 days)
Plans for updating the workflow services to be compatible with Airflow 2.0 are detailed in this [issue](https://community.opengroup.org/osdu/platform/data-flow/data-workflow-framework/data-workflow/-/issues/1). ~~(EPAM) has taken on this work.~~
Implement the Airflow 2.0 stable API in workflow-core and make it optional.
* [ ] Not Started - Planned delivery tagged for ~~M8~~ M9
* [x] In Progress
* [ ] Blocking
* [ ] Completed
Azure Migration @kibattul delivery M9
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
AWS Migration @Wibben
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
GCP Migration @Kateryna_Kurach M9
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
IBM Migration @shrikgar M8
* [ ] Not Started
* [ ] In Progress
* [ ] Blocking
* [x] Completed
### 4. Testing and pre-shipping
The dev team (??) will need to provide clear instructions for the Airflow upgrade which the CSPs can follow in both QA and pre-shipping environments, so that each environment is prepped for testing by the respective teams.
Create Airflow upgrade instructions.
* [x] Not Started
* [ ] In Progress
* [ ] Blocking
* [ ] Completed
QA - Airflow 2.+ readiness
* [x] Not Started
* [ ] In Progress
* [ ] Blocking
* [ ] Completed
Pre-shipping - Airflow 2.+ readiness
* [x] Not Started
* [ ] In Progress
* [ ] Blocking
* [ ] Completed

---

**Issue #48: Manifest ingestion fails with large number of wells**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/48
Opened by Alan Henson · Updated 2022-08-23 · Participants: Kateryna Kurach (EPAM)

This issue is a mirror of the issues created by Pre-Shipping Team A, which can be found here: https://gitlab.opengroup.org/osdu/subcommittees/ea/projects/pre-shipping/home/-/issues/64

---

**Issue #87: WIP - Airflow 2+ Adoption**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/87
Opened by Ben Lasscock · Updated 2022-03-16

This issue is a place to track adoption of Airflow 2+ by the various CSPs.
Running Airflow 2+ with the experimental API is backward compatible with the current workflow services. However, it does provide potential performance improvements, particularly around the scheduler. Please update your current status here.
AWS - M9 timeline
* [ ] Not Started
* [ ] Airflow 2+ (experimental API)
* [x] Airflow 2+ (stable API)
Azure - M10 timeline
* [ ] Not Started
* [ ] Airflow 2+ (experimental API)
* [x] Airflow 2+ (stable API)
IBM - M9 timeline
* [ ] Not Started
* [ ] Airflow 2+ (experimental API)
* [x] Airflow 2+ (stable API)
GCP - M8 timeline
* [ ] Not Started
* [ ] Airflow 2+ (experimental API)
* [x] Airflow 2+ (stable API)

---

**Issue #94: Integration E2E Tests for manifest ingestion - AWS**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/94
Opened by Chris Zhang · Updated 2022-08-24 · Milestone: M10 - Release 0.13 · Participants: Gustavo Urdaneta

This is to track the AWS team's work for Integration E2E Tests for manifest ingestion.
Related to issue 85: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/85

---

**Issue #96: Integration E2E Tests for manifest ingestion - IBM**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/96
Opened by Chris Zhang · Updated 2022-11-11 · Milestone: M12 - Release 0.15 · Participants: Shrikant Garg, jingdong sun

This is to track the IBM team's work for Integration E2E Tests for manifest ingestion.
Related to issue 85: https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/85

---

**Issue #77: [R3-M8] Issue with “frame of reference” handling (Unit of Measure) by Manifest-based Ingestion [GONRG-3041]**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/77
Opened by kenneth liew · Updated 2021-10-05

The case is related to the unit conversion in the environment R3-M5 (Katalyst private AWS setup).
Related to “master-data—SeismicAcquisitionSurvey” entity.
Test case 1:
I verified that the Normalizer (Indexer) does perform auto-conversion as expected when we persist using the Storage service.
Test case 2:
However, it does not perform auto-conversion when we persist data using manifest-based ingestion.
My test case uses length values in "feet", and I do see converted values (meters) from a Search service query in the first test case.
I also noticed that the record's "meta" details are missing when retrieving it with the Storage GET method after inserting the record via manifest ingestion.
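For context, the unit auto-conversion described above is driven by the record's `meta` block. The fragment below is purely illustrative (not taken from the attached documents; the property names and the `persistableReference` payload are assumptions) of what a frame-of-reference entry roughly looks like:

```json
{
  "meta": [
    {
      "kind": "Unit",
      "name": "ft",
      "persistableReference": "{\"scaleOffset\":{\"scale\":0.3048,\"offset\":0.0},\"symbol\":\"ft\",\"baseMeasurement\":{\"ancestry\":\"Length\",\"type\":\"UM\"},\"type\":\"USO\"}",
      "propertyNames": ["SomeLengthProperty"]
    }
  ]
}
```

If this block is dropped during manifest ingestion, the Normalizer has nothing to convert from, which would match the observed behavior.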
I have attached some examples related to this issue.
Please focus on the blue highlights in the "ManifestIngestionMethod.docx" document.
FYI @debasisc
[ManifestIngestionMethod.docx](/uploads/4b774f1ccc88db210882ab79a49d70fb/ManifestIngestionMethod.docx)
[StorageMethod.docx](/uploads/2ce57a7bacbee29f3a66b2db07923ebf/StorageMethod.docx)

Milestone: M8 - Release 0.11 · Participants: Siarhei Khaletski (EPAM)

---

**Issue #81: While using manifest_ingestion (Osdu_ingest), the tags field for the Wellbore data is populated in the payload, it appears that the tags field is not getting ingested**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/81
Opened by Kamlesh Todai · Updated 2022-08-28

When the tags field for the Wellbore data is populated in the payload while ingesting the wellbore data, the tags field does not get ingested. There are no warnings or errors in the Airflow logs regarding this; the wellbore data is ingested, but the tags field is missing.
Note: When Wellbore data with the tags field populated was inserted using the Storage API, it worked fine.
The details are attached in the Word docs:
- tagsFieldIngestIssue.docx contains the payload used during ingestion and the queries done to check the tags field data.
- tagsFieldStorageSearch.docx contains the payload used while creating the wellbore record with the tags field using the Storage API.
The test was done on two platforms (AWS and GCP).
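For reference, the tags field in question is a simple key-value map at the top level of the record. The fragment below is illustrative only (the kind string and tag values are assumptions, not the attached payload):

```json
{
  "kind": "osdu:wks:master-data--Wellbore:1.0.0",
  "tags": {
    "dataset": "tno-sample",
    "source": "manifest-ingestion"
  }
}
```

The report above is that this map survives a direct Storage API PUT but is silently dropped somewhere in the manifest ingestion path.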
The DAG run details for GCP:

    {
      "workflowId": "ef82cba0-0e45-4df3-91bf-4df1553102d3",
      "runId": "22821aa9-82a2-4910-9e3f-d1e27addb49d",
      "startTimeStamp": 1627328994856,
      "endTimeStamp": 1627329575098,
      "status": "finished",
      "submittedBy": "kamlesh_todai@osdu-gcp.go3-nrg.projects.epam.com"
    }
The DAG run details for AWS: runId 57a9adc0-aabb-4bb9-8154-561b5c12412f.
I have not tried IBM or Azure to see whether the behavior is the same or different.
[tagsFieldIngestIssue.docx](/uploads/83fbd805ec66927bb850abc683ef076b/tagsFieldIngestIssue.docx)
[tagsFieldStorageSearch.docx](/uploads/9a494e21f5d55ea9a9d337974b1eb6f7/tagsFieldStorageSearch.docx)
@ChrisZhang @ethiraj @debasisc @Wibben @Kateryna_Kurach @anujgupta @manishk

Participants: Kamlesh Todai

---

**Issue #97: "Broken DAG" for manifest ingestion**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/97
Opened by Abhijeet Sawant · Updated 2022-01-23

Airflow UI showing import error: "Broken DAG: [/opt/airflow/dags/manifest_ingestion_dags.zip] No module named 'osdu_ingestion.libs.auth'"
Image deployed - repository: msosdu.azurecr.io/airflow-docker-image, tag: v0.10
![airflow_import_error](/uploads/ecc51fc6ba34574bc852832ee0349177/airflow_import_error.JPG)
Continuous alerts are getting triggered.

Participants: Kishore Battula

---

**Issue #82: Manifest ingestion does not show any updates in Airflow when backslash character used in JSON body**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/82
Opened by Naufal Mohamed Noori · Updated 2021-10-19

**Description**:
Using the manifest ingestion (DAG) workflow service, when a user inserts a backslash \ into the JSON manifest body, the workflow run gets stuck in SUBMITTED status. There is also no trace of the runId in the Airflow log.
**Steps to reproduce:**
a) Insert the body json into DAG worklow body. [With_Backslash_BodyData.json](/uploads/c2f2e8e8241df526830a73cc9ba2336a/With_Backslash_BodyData.json)
b) When the body JSON is submitted to base_url/api/workflow/v1/workflow/Osdu_ingest/workflowRun, the workflow is submitted successfully with the following response:
    {
      "workflowId": "dev:Osdu_ingest",
      "runId": "4327f575-e7b3-490f-a1ee-b1e2e950c2a4",
      "startTimeStamp": 1627041278115,
      "status": "submitted",
      "submittedBy": "naufal.noori@katalystdm.com"
    }
c) After a while, checking the DAG run status still shows the run in submitted status, and there is no trace of the run ID in the Airflow log (this follow-up check was done after 24 hours):
_Endpoint_: base_url/api/workflow/v1/workflow/Osdu_ingest/workflowRun/4327f575-e7b3-490f-a1ee-b1e2e950c2a4
_Response_:
    {
      "workflowId": "dev:Osdu_ingest",
      "runId": "4327f575-e7b3-490f-a1ee-b1e2e950c2a4",
      "startTimeStamp": 1627041278115,
      "status": "submitted",
      "submittedBy": "naufal.noori@katalystdm.com"
    }
d) When a second trial run was conducted with the \ character replaced by an empty character, the workflow ran perfectly and shows a trace of running in the Airflow log. [With_NO_Backslash_BodyData.json](/uploads/9fdbc2a59a930444feeb6bfacd1e1200/With_NO_Backslash_BodyData.json)
**Expectation**:
We expect the workflow run to fail the request with a clear and meaningful error message, e.g. "Request failed. There are non-allowed special characters in line #something to line #something in your json body."
**Reason**
It is confusing for users to have a run successfully submitted but then stuck in process without any log trace whatsoever.
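Until the service returns such an error, a client-side precheck can catch this class of problem before submission. This is a sketch using only the standard library (the function name and the example `FileSource` field are illustrative, not part of the Workflow API):

```python
import json

def precheck_manifest_body(raw_body: str):
    """Reject bodies that are not valid JSON (for example, stray unescaped
    backslashes) before posting them to the workflowRun endpoint."""
    try:
        json.loads(raw_body)
        return True, None
    except json.JSONDecodeError as err:
        return False, f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"

# "\d" is not a legal JSON escape, so a body like this would get stuck as described:
bad = '{"FileSource": "folder\\data.las"}'
print(precheck_manifest_body(bad))

# Escaping the backslash makes the body parse fine:
good = '{"FileSource": "folder\\\\data.las"}'
print(precheck_manifest_body(good))
```

The same check on the server side would let the Workflow service reject the request up front with the line/column information users are asking for.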
cc @debasisc

Milestone: M9 - Release 0.12

---

**Issue #78: SEGY->OpenVDS DAGS Airflow 2+ port**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/78
Opened by Ben Lasscock · Updated 2021-08-24 · Milestone: M8 - Release 0.11 · Participants: Kateryna Kurach (EPAM)

A placeholder for the port of the SEGY->OpenVDS DAGS to Airflow 2+.

---

**Issue #63: [Manifest-based ingestion] Implement the logic to split large manifest into chunks [GONRG-2696]**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/63
Opened by Kateryna Kurach (EPAM) · Updated 2021-08-24

In the current implementation, a Manifest file's entities are processed sequentially, even though these entities don't depend on each other. That
becomes a problem when there are thousands of them inside a single Manifest file. We need to take the approach of splitting the Manifest file into
chunks and ingest them in parallel.
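The splitting step itself can be sketched in a few lines of plain Python (the chunk size and entity list are illustrative; in the DAG each chunk would feed a parallel task rather than a list):

```python
def split_manifest(entities, chunk_size):
    """Split a flat list of independent manifest entities into fixed-size
    chunks, so each chunk can be ingested by a separate parallel task."""
    return [entities[i:i + chunk_size] for i in range(0, len(entities), chunk_size)]

entities = [f"entity-{n}" for n in range(10)]
chunks = split_manifest(entities, chunk_size=4)
print([len(c) for c in chunks])  # → [4, 4, 2]
```

Because the entities don't reference each other, chunk boundaries can be arbitrary; a manifest with internal references would need dependency-aware grouping instead.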
https://jiraeu.epam.com/browse/GONRG-2696

Milestone: M8 - Release 0.11 · Participants: Kateryna Kurach (EPAM)

---

**Issue #62: Add cache for Search ID handler [GONRG-2593]**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/62
Opened by Kateryna Kurach (EPAM) · Updated 2021-07-08 · Milestone: M7 - Release 0.10

https://jiraeu.epam.com/browse/GONRG-2593

---

**Issue #72: Getting error in manifest ingestion after enabling policy service**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/72
Opened by Shrikant Garg · Updated 2022-04-05

When the policy service is enabled for the Search service, manifest ingestion fails with an error stating the search response should have returned fields like acl, id, kind and legal.
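For illustration, a search request that asks for those fields back might be built like this (only the `returnedFields` list comes from the issue; the query shape, kind wildcard, and function name are assumptions):

```python
import json

def build_search_query(kind, ids):
    """Hypothetical sketch of the query body built by search_record_id.py,
    with the returnedFields fix applied."""
    return {
        "kind": kind,
        "query": "id:({})".format(" OR ".join(f'"{i}"' for i in ids)),
        # The fix: explicitly request every field the ingestion code reads back.
        "returnedFields": ["id", "version", "acl", "kind", "legal"],
    }

payload = build_search_query("*:*:*:*", ["opendes:reference-data--MaterialType:WTS"])
print(json.dumps(payload, indent=2))
```

Without `returnedFields`, a policy-filtered search response may omit fields the ingestion code assumes are present.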
Fix: In search_record_id.py, we need to add "returnedFields": ["id", "version", "acl", "kind", "legal"] to make it work.

Milestone: M10 - Release 0.13 · Participants: Shrikant Garg

---

**Issue #61: Implement code changes for WITSML DAG to be compatible with Airflow 2.0 [GONRG-2729]**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/61
Opened by Kateryna Kurach (EPAM) · Updated 2021-07-14 · Milestone: M7 - Release 0.10 · Participants: Siarhei Khaletski (EPAM)

https://jiraeu.epam.com/browse/GONRG-2729

---

**Issue #60: Implement code changes for Manifest-based ingestion DAG to be compatible with Airflow 2.0 [GONRG-2591]**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/60
Opened by Kateryna Kurach (EPAM) · Updated 2021-07-14

Update manifest-based ingestion DAG to work with Airflow 2.0
https://jiraeu.epam.com/browse/GONRG-2591

Milestone: M7 - Release 0.10 · Participants: Kateryna Kurach (EPAM)

---

**Issue #28: Deploy Manifest Ingestion**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/28
Opened by Alan Henson · Updated 2021-03-23

---

**Issue #26: Documentation: Manifest Ingestion User Guide**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/26
Opened by Alan Henson · Updated 2022-09-15

---

**Issue #25: Test Cases: Manifest Ingestion**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/25
Opened by Alan Henson · Updated 2021-03-02

Create test cases for Manifest Ingestion workflow.

---

**Issue #74: [Performance improvements] [Manifest] Batch processing during integrity check**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/74
Opened by Kateryna Kurach (EPAM) · Updated 2021-07-07 · Participants: Kateryna Kurach (EPAM)

Investigate a possibility to improve manifest-based ingestion performance by splitting and processing big manifests into batches.

---

**Issue #6: Implement Integration tests for Ingestion DAGs**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/6
Opened by Elizaveta Zeldina (EPAM) · Updated 2021-02-25

Create the testing stage in GitLab CI/CD, where the DAGs are tested inside a docker container.
The steps of the task:
- Create a container with an Airflow server and a Flask server mocking the external APIs (Storage and Workflow)
- Trigger the DAGs inside the container with different arguments
- Compare expected and actual results of the DAG executions
- Integrate the steps mentioned above into GitLab CI/CD
- [ ] AWS
- [ ] Azure
- [ ] IBM
- [x] GCP

Participants: Joe, Daniel Scholl, Dmitriy Rudko, Alan Braz

---

**Issue #36: Automate file metadata for dataset within manifest**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/36
Opened by Alan Henson · Updated 2021-03-02

---

**Issue #47: Documentation: Best practices for ingestion DAGs**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/47
Opened by Alan Henson · Updated 2022-09-15

We need a guide or document that offers best-practices recommendations to those constructing ingestion (or enrichment) related DAGs. This document should cover things such as:
- DAG Operator composability recommendations
- Performance considerations
- Recommended property use (see https://community.opengroup.org/osdu/documentation/-/issues/80)
- Others

---

**Issue #93: Airflow log reports success when failed**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/93
Opened by Jan Mortensen · Updated 2021-10-29

**Technical Context**
* Deployed version: _M8 aka release/0.11_
* DAG: _Osdu_ingest_
* Task: _process_single_manifest_file_task_
**Description**
When running a manifest ingestion we got some errors, but the specific task completed succe...**Technical Context**
* Deployed version: _M8 aka release/0.11_
* DAG: _Osdu_ingest_
* Task: _process_single_manifest_file_task_
**Description**
When running a manifest ingestion we got some errors, but the specific task completed successfully. This can be confusing when trying to debug.
Attaching the last part of the log which shows both the error and the "Marking task as SUCCESS".
>[2021-10-25 12:39:34,130] {connectionpool.py:230} DEBUG - Starting new HTTP connection (1): storage.osdu-azure.svc.cluster.local:80
[2021-10-25 12:39:34,178] {connectionpool.py:442} DEBUG - http://storage.osdu-azure.svc.cluster.local:80 "PUT /api/storage/v2/records HTTP/1.1" 400 None
[2021-10-25 12:39:34,179] {process_manifest_r3.py:131} ERROR - Request error.
[2021-10-25 12:39:34,179] {process_manifest_r3.py:132} ERROR - Response status: 400. Response content: {"code":400,"reason":"Invalid ACL","message":"Acl not match with tenant or domain"}.
[2021-10-25 12:39:34,179] {authorization.py:137} ERROR - {"code":400,"reason":"Invalid ACL","message":"Acl not match with tenant or domain"}
[2021-10-25 12:39:34,179] {single_manifest_processor.py:79} WARNING - Can't process entity SRN: opendes:reference-data--MaterialType:WTS
[2021-10-25 12:39:34,179] {single_manifest_processor.py:255} INFO - Processed ids []
[2021-10-25 12:39:34,179] {process_manifest_r3.py:173} INFO - Processed ids []
[2021-10-25 12:39:34,735] {__init__.py:62} DEBUG - Backend: None, Lineage called with inlets: [], outlets: []
[2021-10-25 12:39:35,139] {taskinstance.py:1070} INFO - Marking task as SUCCESS.dag_id=Osdu_ingest, task_id=process_single_manifest_file_task, execution_date=20211025T123739, start_date=20211025T123909, end_date=20211025T123935
[2021-10-25 12:39:35,484] {base_job.py:197} DEBUG - [heartbeat]
[2021-10-25 12:39:35,485] {local_task_job.py:102} INFO - Task exited with return code 0
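One way to get the behavior expected below is for the task to raise once any entity fails, so the scheduler marks the task (and hence the DAG run) as failed. This is a sketch only, not the project's implementation; in a real Airflow task one would raise `AirflowException`, and the names here are illustrative:

```python
class ManifestProcessingError(Exception):
    """Raised so the scheduler marks the task (and the DAG run) as failed."""

def finish_task(processed_ids, skipped_entities):
    """Fail the task when entities were skipped, instead of logging a warning
    and exiting with return code 0 as in the log above."""
    if skipped_entities:
        raise ManifestProcessingError(
            f"{len(skipped_entities)} entities failed; processed ids: {processed_ids}"
        )
    return processed_ids

# Mirrors the log above: nothing processed, one entity skipped -> task fails.
try:
    finish_task([], ["opendes:reference-data--MaterialType:WTS"])
except ManifestProcessingError as err:
    print("task failed:", err)
```

Whether a partial failure should fail the whole run (versus reporting per-entity status) is a policy choice, but silently reporting SUCCESS clearly should not be the default.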
**Expected result**
Task and DAG-run marked as failure.

Milestone: M10 - Release 0.13

---

**Issue #85: Integration E2E Tests for manifest ingestion (GONRG-3300) - GCP**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/85
Opened by Kishore Battula · Updated 2021-11-15

Currently manifest ingestion doesn't have tests which invoke the workflow service to trigger the ingestion and validate it by fetching the ingested records through the Search service (or Storage service). Such tests would validate that Airflow, the workflow service, and all related services are running with the right set of configurations for manifest ingestion.
**Acceptance Criteria**: Add new E2E tests which can validate the manifest ingestion by triggering it through workflow service
https://community.opengroup.org/osdu/platform/data-flow/home/-/issues/49#note_58471

Milestone: M10 - Release 0.13 · Participants: Chris Zhang

---

**Issue #8: Register new Ingestion DAG**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/8
Opened by Stephen Whitley (Invited Expert) · Updated 2021-01-14 · Participants: Todd Dixon, Swapnil

---

**Issue #4: Verify record indexing status in workflow Status Tracking Operator**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/4
Opened by Elizaveta Zeldina (EPAM) · Updated 2020-06-26 · Participants: Dmitriy Rudko

Enhance functionality of the Status Tracking operator and add validation that the new record was indexed.

---

**Issue #17: List of Checks to be provided to Oracle Team**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/17
Opened by Meena Rathinavel · Updated 2021-08-10 · Participants: Keith Wall

---

**Issue #11: Work Product Component - Perform Manifest Checks**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/11
Opened by Meena Rathinavel · Updated 2021-08-10

- (Understand EPAM scripts and find out what checks are being performed)
- Decision point
- STOP if there is a problem

Milestone: M1 - Release 0.1 · Participants: Clifford Patterson, James O'Boyle, Rohit Kurhekar · Due: 2020-10-21

---

**Issue #5: Implement custom status-tracking Airflow operator**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/5
Opened by Elizaveta Zeldina (EPAM) · Updated 2020-08-27 · Participants: Dmitriy Rudko

Implement custom status-tracking Airflow operator according to requirements specified in https://community.opengroup.org/osdu/documentation/-/wikis/OSDU-(C)/Design-and-Implementation/Ingestion-and-Enrichment-Detail/R2-Ingestion-Workflow-Orchestration-non-Spike#workflow-status-operator

---

**Issue #3: Evaluate DAG Framework and APIs**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/3
Opened by Dania Kodeih (Microsoft) · Updated 2022-08-23

- [x] GCP
- [x] Azure
- [x] IBM
- [x] AWS

Participants: Wei Sun, Pingjiang Wang

---

**Issue #9: Supplemental materials for Ingestion Framework**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/9
Opened by Dmitriy Rudko · Updated 2021-02-10

As a part of this story we need to create supporting material for people starting with the Ingestion Framework. The material includes:
- [ ] Postman Collections with examples
- [ ] Development related documentation
- [ ] Deployment related documentation
- [ ] Sample hello-world project / DAG

Participants: Kateryna Kurach (EPAM)

---

**Issue #29: Deploy Manifest Ingestion**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/29
Opened by Kishore Battula · Updated 2021-03-02 · Participants: Kishore Battula

---

**Issue #24: R3 Manifest Test Data - Feb 2021**
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/24
Opened by Alan Henson · Updated 2021-03-02

- Complete provision of Volve data
- Provide manifests for TNO data, including the same data used for R2:
- Well master data (well, wellbore)
 - Other master data (Geopolitical Entity, Organization)
- Well Work Product (markers, trajectories, well logs)
- Reference data used by manifestsKeith WallKeith Wall2021-02-28https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/22Master Data - Perform Manifest Checks2021-03-02T12:45:33ZMeena RathinavelMaster Data - Perform Manifest ChecksM1 - Release 0.1Clifford PattersonJames O'BoyleRohit KurhekarClifford Pattersonhttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/21Master Data - Perform checks on Reference Data2021-03-02T12:45:38ZMeena RathinavelMaster Data - Perform checks on Reference DataM1 - Release 0.1Clifford PattersonJames O'BoyleRohit KurhekarClifford Pattersonhttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/20Master Data - Load manifest2022-08-23T10:47:24ZMeena RathinavelMaster Data - Load manifestM1 - Release 0.1Clifford PattersonJames O'BoyleRohit KurhekarClifford Pattersonhttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/19Master Data - Perform check on linked WPC data2021-03-02T12:45:46ZMeena RathinavelMaster Data - Perform check on linked WPC dataClifford PattersonJames O'BoyleRohit KurhekarClifford Pattersonhttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/18Work Product Component - Perform check on linked WPC data2021-03-02T12:45:50ZMeena RathinavelWork Product Component - Perform check on linked WPC data Ex: Bin Grid for Seismic Trace
- Reject or not (based on User Input 'loose' or 'strict')Clifford PattersonJames O'BoyleRohit KurhekarClifford Pattersonhttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/12Work Product Component - Perform checks on Reference Data2022-06-28T19:50:51ZMeena RathinavelWork Product Component - Perform checks on Reference DataPerform checks on reference data
- ("CurveUnit": "srn:reference-data/UnitOfMeasure:M")
- Does it exist?
- Reject or not (based on User Input 'loose' or 'strict')?M1 - Release 0.1Clifford PattersonJames O'BoyleRohit KurhekarClifford Pattersonhttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/15Work Product Component - Perform check on linked master data2021-03-02T12:45:57ZMeena RathinavelWork Product Component - Perform check on linked master data- ex: Wellbore for WellLog
- Reject or not (based on user input 'loose' or 'strict')?M1 - Release 0.1Clifford PattersonJames O'BoyleRohit KurhekarClifford Pattersonhttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/16Load the file plus manifest2021-08-10T08:51:19ZMeena RathinavelLoad the file plus manifestM1 - Release 0.1Clifford PattersonJames O'BoyleClifford Pattersonhttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/13Perform ACL Check2022-08-23T10:47:24ZMeena RathinavelPerform ACL Check- Does my role permit me to add data?M1 - Release 0.1Clifford PattersonJames O'BoyleClifford Pattersonhttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/7[Validation] Add manifest validation in Manifest Ingestion Workflow - Schema ...2021-01-25T16:55:14ZElizaveta Zeldina (EPAM)[Validation] Add manifest validation in Manifest Ingestion Workflow - Schema validationDetails can be found here:
https://community.opengroup.org/osdu/platform/data-flow/home/-/issues/15
This issue was also mentioned in Lessons Learned after Energistics demo. We need to add the following types of validation:
- Manifest structure conforms to the schema that is being used
- Mandatory fields are filledDmitriy RudkoDmitriy Rudkohttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/23R3 Manifest Test Data - Jan 20212021-02-24T20:05:36ZAlan HensonR3 Manifest Test Data - Jan 2021- Loadable manifests and data, for Volve. These will include representative examples of
- Well master data (well, wellbore)
- Seismic master data (Acquisition Project, Processing Project, Interpretation Project)
   - Other master data (Geopolitical Entity, Organization)
- Well Work Product (markers, trajectories, well logs)
   - Seismic Work Product (trace data, bin grid, horizons, fault systems)
- Reference data used by manifests
- Updated Python scripts to generate manifests
- Python scripts to generate synthetic manifests are available now, and example synthetic manifests are ready for testing of ingestionKeith WallKeith Wall2021-01-31https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/31Validate entire manifest entity instead of Data inside the manifest entity2021-02-23T01:50:23ZKishore BattulaValidate entire manifest entity instead of Data inside the manifest entityCurrently in the schema validation logic below
```python
def _validate_entity(self, entity: dict, schema: dict = None):
"""
Validate the 'data' field of any entity against a schema got by entity's kind.
"""
if not schema:
schema = self.get_schema(entity["kind"])
data = entity["data"]
try:
self._validate_against_schema(schema, data)
logger.debug(f"Record successfully validated")
return True
except exceptions.ValidationError as exc:
logger.error("Schema validation error. Data field.")
logger.error(f"Manifest kind: {entity['kind']}")
logger.error(f"Error: {exc}")
return False
```
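For comparison, a hedged sketch of validating the complete entity rather than only its `data` field. This is illustrative code, not the repository's implementation; `get_schema` is a hypothetical stand-in for the real schema-service lookup:

```python
# Hedged sketch: validate the WHOLE entity against the schema for its kind,
# instead of extracting only entity["data"]. `get_schema` stands in for the
# real schema-service lookup and is an assumption of this sketch.
import jsonschema


def validate_whole_entity(entity: dict, get_schema) -> bool:
    """Return True when the complete entity conforms to its kind's schema."""
    schema = get_schema(entity["kind"])
    try:
        # Pass the entire entity so top-level fields (kind, acl, legal, ...)
        # are validated too, not just the contents of "data".
        jsonschema.validate(instance=entity, schema=schema)
        return True
    except jsonschema.exceptions.ValidationError:
        return False
```

With a schema that requires top-level fields such as `acl`, an entity missing them now fails validation even when its `data` section is well-formed.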
Only the data in a masterData/referenceData/Wp/Wpc is validated. Instead, the whole masterData/referenceData/Wp/Wpc must be validated. This validation will conform to the schemas uploaded to the schema service.https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/32DAGs should be updated to adhere to new workflow service APIs2021-02-23T01:50:07ZKishore BattulaDAGs should be updated to adhere to new workflow service APIsCurrently the DAGs for manifest ingestion work against the old workflow service APIs. These APIs are deleted in the [latest MR](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/merge_requests/77) on workflow service.
Below are the changes needed in the ingestion DAGs:
- Input reading inside the manifest process operator. The whole manifest content is available in the `conf` section of the Airflow payload; now it will be part of `conf.executionContext`.
- Update status operator. The update status operator must be updated to use the new API of the workflow service.
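The first change could look like the following sketch. The helper name is hypothetical; only the payload shape from the request object shown under Details is assumed:

```python
# Hedged sketch: with the new workflow service, the manifest/execution
# context is nested under conf["execution_context"] rather than sitting at
# the top level of the trigger payload. The helper name is illustrative.
def extract_execution_context(conf: dict) -> dict:
    """Return the execution context from a new-style `conf` payload."""
    return conf.get("execution_context", {})
```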
### Details:
1. DAG currently is triggered with the following request object:
```json
{
"run_id": "2db251ff-bb77-4518-adae-820e8b559bec",
"execution_date": "2021-02-08T01:10:43",
"conf": {
"workflow_id": "ada0d71c-847e-4724-a5a6-851d74d681e2",
"run_id": "2db251ff-bb77-4518-adae-820e8b559bec",
"authToken": "Bearer <token>",
"execution_context": {
"key1": "value1"
},
"correlation_id": "e1d062dd-2472-4d78-9d4c-43e48b78db0a"
}
}
```
2. Workflow API was updated. Please see the latest OpenAPI spec for Workflow service here:
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-workflow/-/blob/master/docs/api/openapi.workflow.yamlhttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/49Record does not appear in search2021-03-16T11:59:38ZAlan HensonRecord does not appear in searchThis is a mirror of the issue reported by Pre-Shipping Team A here: https://gitlab.opengroup.org/osdu/subcommittees/ea/projects/pre-shipping/home/-/issues/63.
This references GCP Jira issue: https://jiraeu.epam.com/browse/GONRG-1991Kateryna Kurach (EPAM)Kateryna Kurach (EPAM)https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/30Move ingestion DAGs and operators under a folder named osdu2021-06-15T04:06:21ZKishore BattulaMove ingestion DAGs and operators under a folder named osduCurrently the DAGs and operators are in the top-level folder `src`. Clients deploying these DAGs and operators will copy the DAGs into the DAGs folder and the operators into the operators folder.
In a customer environment there will be more DAGs and operators, and there is a chance that Python module names will conflict with the names used in this repository.
Can we move the DAGs, operators, and hooks into an osdu folder (or any other folder name) so that they are easier to manage? This has to be done in this repository because the DAGs use import statements for operators and libs that will fail if someone puts them under a different folder structure.
**Benefits**
- Avoids naming conflict
- Easy to propagate updates from this repository into Airflow. We can replace the entire folder in the destinationM1 - Release 0.1https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/37Refactor reusable Python logic from Ingestion DAGs to Python SDK2021-11-01T15:47:02ZAlan HensonRefactor reusable Python logic from Ingestion DAGs to Python SDKIngestion DAGs have functionality baked into the Python code that should be refactored into the OSDU Python SDK or an equivalent OSDU Python library. This story should determine what that functionality is and create stories to capture the refactoring.
The scope of this issue excludes logic that is specific to ingestion (i.e., validation logic, surrogate-key processing logic, storage logic, etc.).https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/41Manifest Ingestion: Refactor syntax and validation logic into the common inge...2022-08-23T11:19:17ZAlan HensonManifest Ingestion: Refactor syntax and validation logic into the common ingestion Python LibraryThe first step in making the syntax and validation logic within the manifest ingestion DAG reusable is to refactor it to a common place. A new ingestion Python library should be created (see https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/40) and the syntax and validation logic should be refactored to that library for reuse.https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/42Manifest Ingestion: Refactor to have standalone capability2022-08-23T11:19:21ZAlan HensonManifest Ingestion: Refactor to have standalone capabilityIn seeking maximum reusability, the manifest ingestion workflow should take a bottom-up approach where:
- Each DAG operator is Airflow agnostic and capable of running within a Python runtime environment without Airflow
- Each DAG operator is able to run as a script taking the appropriate inputs, performing its work, and then providing the expected outputs (interacting with OSDU services is expected)
- A DAG workflow should run end-to-end without requiring Airflow - in essence, running as a script from the command line with the correct inputs
- The DAG workflow should encapsulate the above into an Airflow workflow
The workflow should be executable outside of Airflow. The Airflow components should abstract the Airflow pieces from the core workflow itself.https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/43Schema Resolver making calls to schema service even though it exists in the d...2021-03-08T10:58:44ZKishore BattulaSchema Resolver making calls to schema service even though it exists in the definitions sectionWhen validating the data against a schema, multiple calls were made to the schema service to fetch schema components, which is unnecessary:
```python
resolver = OSDURefResolver(
schema_service=self.schema_service,
base_uri=schema.get("$id", ""),
referrer=schema,
handlers=self.resolver_handlers,
cache_remote=True
)
validator = jsonschema.Draft7Validator(schema=schema, resolver=resolver)
validator.validate(data)
```
When fetching the schema for the Well master-data kind, it makes the calls below to the schema service:
```
[2021-03-03 05:29:22,532] {connectionpool.py:442} DEBUG - https://osdu-demo.msft-osdu-test.org:443 "GET /api/schema-service/v1/schema/opendes:wks:master-data--Well:1.0.0 HTTP/1.1" 200 74509
[2021-03-03 05:29:22,540] {connectionpool.py:943} DEBUG - Starting new HTTPS connection (1): osdu-demo.msft-osdu-test.org:443
[2021-03-03 05:29:22,869] {connectionpool.py:442} DEBUG - https://osdu-demo.msft-osdu-test.org:443 "GET /api/schema-service/v1/schema/opendes:wks:AbstractMaster:1.0.0 HTTP/1.1" 200 36047
[2021-03-03 05:29:22,876] {connectionpool.py:943} DEBUG - Starting new HTTPS connection (1): osdu-demo.msft-osdu-test.org:443
[2021-03-03 05:29:23,365] {connectionpool.py:442} DEBUG - https://osdu-demo.msft-osdu-test.org:443 "GET /api/schema-service/v1/schema/opendes:wks:AbstractSpatialLocation:1.0.0 HTTP/1.1" 200 21883
```
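One way to avoid these round trips is to pre-seed the resolver's store with the schemas already embedded in the root schema's `definitions` section, so only genuinely missing refs reach the remote handler. A hedged sketch under that assumption (`make_local_first_resolver` and `fetch_remote` are illustrative names, not the DAG's actual code):

```python
# Hedged sketch: seed jsonschema's RefResolver store with every schema
# embedded in the root schema's "definitions", so $refs to those $ids are
# resolved locally and never hit the schema service. `fetch_remote` is an
# illustrative stand-in for the schema-service HTTP handler.
from jsonschema import RefResolver


def make_local_first_resolver(schema: dict, fetch_remote):
    # Map each embedded definition's $id to the definition itself.
    store = {
        sub["$id"]: sub
        for sub in schema.get("definitions", {}).values()
        if "$id" in sub
    }
    return RefResolver(
        base_uri=schema.get("$id", ""),
        referrer=schema,
        store=store,
        cache_remote=True,
        handlers={"http": fetch_remote, "https": fetch_remote},
    )
```

With the store populated, resolving an abstract schema's `$id` (e.g. `AbstractMaster`) returns the embedded copy directly instead of issuing an HTTP GET to `/api/schema-service/v1/schema/...`.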
Below is the schema for Well master data as fetched from the schema service:
```json
{
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:master-data--Well:1.0.0",
"description": "The origin of a set of wellbores.",
"additionalProperties": false,
"title": "Well",
"type": "object",
"definitions": {
"opendes:wks:AbstractGeoPoliticalContext:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractGeoPoliticalContext:1.0.0",
"description": "A single, typed geo-political entity reference, which is 'abstracted' to AbstractGeoContext and then aggregated by GeoContexts properties.",
"x-osdu-review-status": "Accepted",
"title": "AbstractGeoPoliticalContext",
"type": "object",
"properties": {
"GeoPoliticalEntityID": {
"pattern": "^[\\w\\-\\.]+:master-data\\-\\-GeoPoliticalEntity:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "Reference to GeoPoliticalEntity.",
"x-osdu-relationship": [
{
"EntityType": "GeoPoliticalEntity",
"GroupType": "master-data"
}
],
"type": "string"
},
"GeoTypeID": {
"x-osdu-is-derived": {
"RelationshipPropertyName": "GeoPoliticalEntityID",
"TargetPropertyName": "GeoPoliticalEntityTypeID"
},
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-GeoPoliticalEntityType:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "The GeoPoliticalEntityType reference of the GeoPoliticalEntity (via GeoPoliticalEntityID) for application convenience.",
"x-osdu-relationship": [
{
"EntityType": "GeoPoliticalEntityType",
"GroupType": "reference-data"
}
],
"type": "string"
}
},
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractGeoPoliticalContext.1.0.0.json"
},
"opendes:wks:AbstractFacilityOperator:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractFacilityOperator:1.0.0",
"description": "The organisation that was responsible for a facility at some point in time.",
"title": "AbstractFacilityOperator",
"type": "object",
"properties": {
"FacilityOperatorID": {
"type": "string",
"title": "Facility Operator ID",
"description": "Internal, unique identifier for an item 'AbstractFacilityOperator'. This identifier is used by 'AbstractFacility.CurrentOperatorID' and 'AbstractFacility.InitialOperatorID'."
},
"EffectiveDateTime": {
"format": "date-time",
"description": "The date and time at which the facility operator becomes effective.",
"x-osdu-frame-of-reference": "DateTime",
"type": "string"
},
"FacilityOperatorOrganisationID": {
"pattern": "^[\\w\\-\\.]+:master-data\\-\\-Organisation:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "The company that currently operates, or previously operated the facility",
"x-osdu-relationship": [
{
"EntityType": "Organisation",
"GroupType": "master-data"
}
],
"type": "string"
},
"TerminationDateTime": {
"format": "date-time",
"description": "The date and time at which the facility operator is no longer in effect. If the operator is still effective, the 'TerminationDateTime' is left absent.",
"x-osdu-frame-of-reference": "DateTime",
"type": "string"
}
},
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractFacilityOperator.1.0.0.json"
},
"opendes:wks:AbstractCommonResources:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractCommonResources:1.0.0",
"description": "Common resources to be injected at root 'data' level for every entity, which is persistable in Storage. The insertion is performed by the OsduSchemaComposer script.",
"title": "OSDU Common Resources",
"type": "object",
"properties": {
"ResourceHomeRegionID": {
"x-osdu-relationship": [
{
"EntityType": "OSDURegion",
"GroupType": "reference-data"
}
],
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-OSDURegion:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "The name of the home [cloud environment] region for this OSDU resource object.",
"title": "Resource Home Region ID",
"type": "string"
},
"ResourceHostRegionIDs": {
"description": "The name of the host [cloud environment] region(s) for this OSDU resource object.",
"title": "Resource Host Region ID",
"type": "array",
"items": {
"x-osdu-relationship": [
{
"EntityType": "OSDURegion",
"GroupType": "reference-data"
}
],
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-OSDURegion:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"type": "string"
}
},
"ResourceLifecycleStatus": {
"x-osdu-relationship": [
{
"EntityType": "ResourceLifecycleStatus",
"GroupType": "reference-data"
}
],
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-ResourceLifecycleStatus:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "Describes the current Resource Lifecycle status.",
"title": "Resource Lifecycle Status",
"type": "string"
},
"ResourceSecurityClassification": {
"x-osdu-relationship": [
{
"EntityType": "ResourceSecurityClassification",
"GroupType": "reference-data"
}
],
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-ResourceSecurityClassification:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "Classifies the security level of the resource.",
"title": "Resource Security Classification",
"type": "string"
},
"ResourceCurationStatus": {
"x-osdu-relationship": [
{
"EntityType": "ResourceCurationStatus",
"GroupType": "reference-data"
}
],
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-ResourceCurationStatus:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "Describes the current Curation status.",
"title": "Resource Curation Status",
"type": "string"
},
"ExistenceKind": {
"x-osdu-relationship": [
{
"EntityType": "ExistenceKind",
"GroupType": "reference-data"
}
],
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-ExistenceKind:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "Where does this data resource sit in the cradle-to-grave span of its existence?",
"title": "Existence Kind",
"type": "string"
},
"Source": {
"description": "The entity that produced the record, or from which it is received; could be an organization, agency, system, internal team, or individual. For informational purposes only, the list of sources is not governed.",
"title": "Data Source",
"type": "string"
}
},
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractCommonResources.1.0.0.json"
},
"opendes:wks:AbstractAliasNames:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractAliasNames:1.0.0",
"description": "A list of alternative names for an object. The preferred name is in a separate, scalar property. It may or may not be repeated in the alias list, though a best practice is to include it if the list is present, but to omit the list if there are no other names. Note that the abstract entity is an array so the $ref to it is a simple property reference.",
"x-osdu-review-status": "Accepted",
"title": "AbstractAliasNames",
"type": "object",
"properties": {
"AliasNameTypeID": {
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-AliasNameType:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "A classification of alias names such as by role played or type of source, such as regulatory name, regulatory code, company code, international standard name, etc.",
"x-osdu-relationship": [
{
"EntityType": "AliasNameType",
"GroupType": "reference-data"
}
],
"type": "string"
},
"EffectiveDateTime": {
"format": "date-time",
"type": "string",
"description": "The date and time when an alias name becomes effective."
},
"AliasName": {
"type": "string",
"description": "Alternative Name value of defined name type for an object."
},
"TerminationDateTime": {
"format": "date-time",
"type": "string",
"description": "The data and time when an alias name is no longer in effect."
},
"DefinitionOrganisationID": {
"pattern": "^[\\w\\-\\.]+:(reference-data\\-\\-StandardsOrganisation|master-data\\-\\-Organisation):[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "The StandardsOrganisation (reference-data) or Organisation (master-data) that provided the name (the source).",
"x-osdu-relationship": [
{
"EntityType": "StandardsOrganisation",
"GroupType": "reference-data"
},
{
"EntityType": "Organisation",
"GroupType": "master-data"
}
],
"type": "string"
}
},
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractAliasNames.1.0.0.json"
},
"opendes:wks:AbstractAnyCrsFeatureCollection:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractAnyCrsFeatureCollection:1.0.0",
"description": "A schema like GeoJSON FeatureCollection with a non-WGS 84 CRS context; based on https://geojson.org/schema/FeatureCollection.json. Attention: the coordinate order is fixed: Longitude/Easting/Westing/X first, followed by Latitude/Northing/Southing/Y, optionally height as third coordinate.",
"title": "AbstractAnyCrsFeatureCollection",
"type": "object",
"required": [
"type",
"persistableReferenceCrs",
"features"
],
"properties": {
"CoordinateReferenceSystemID": {
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-CoordinateReferenceSystem:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "The CRS reference into the CoordinateReferenceSystem catalog.",
"x-osdu-relationship": [
{
"EntityType": "CoordinateReferenceSystem",
"GroupType": "reference-data"
}
],
"title": "Coordinate Reference System ID",
"type": "string",
"example": "namespace:reference-data--CoordinateReferenceSystem:BoundCRS.SLB.32021.15851:"
},
"persistableReferenceCrs": {
"description": "The CRS reference as persistableReference string. If populated, the CoordinateReferenceSystemID takes precedence.",
"type": "string",
"title": "CRS Reference",
"example": "{\"lateBoundCRS\":{\"wkt\":\"PROJCS[\\\"NAD_1927_StatePlane_North_Dakota_South_FIPS_3302\\\",GEOGCS[\\\"GCS_North_American_1927\\\",DATUM[\\\"D_North_American_1927\\\",SPHEROID[\\\"Clarke_1866\\\",6378206.4,294.9786982]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433]],PROJECTION[\\\"Lambert_Conformal_Conic\\\"],PARAMETER[\\\"False_Easting\\\",2000000.0],PARAMETER[\\\"False_Northing\\\",0.0],PARAMETER[\\\"Central_Meridian\\\",-100.5],PARAMETER[\\\"Standard_Parallel_1\\\",46.1833333333333],PARAMETER[\\\"Standard_Parallel_2\\\",47.4833333333333],PARAMETER[\\\"Latitude_Of_Origin\\\",45.6666666666667],UNIT[\\\"Foot_US\\\",0.304800609601219],AUTHORITY[\\\"EPSG\\\",32021]]\",\"ver\":\"PE_10_3_1\",\"name\":\"NAD_1927_StatePlane_North_Dakota_South_FIPS_3302\",\"authCode\":{\"auth\":\"EPSG\",\"code\":\"32021\"},\"type\":\"LBC\"},\"singleCT\":{\"wkt\":\"GEOGTRAN[\\\"NAD_1927_To_WGS_1984_79_CONUS\\\",GEOGCS[\\\"GCS_North_American_1927\\\",DATUM[\\\"D_North_American_1927\\\",SPHEROID[\\\"Clarke_1866\\\",6378206.4,294.9786982]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433]],GEOGCS[\\\"GCS_WGS_1984\\\",DATUM[\\\"D_WGS_1984\\\",SPHEROID[\\\"WGS_1984\\\",6378137.0,298.257223563]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433]],METHOD[\\\"NADCON\\\"],PARAMETER[\\\"Dataset_conus\\\",0.0],AUTHORITY[\\\"EPSG\\\",15851]]\",\"ver\":\"PE_10_3_1\",\"name\":\"NAD_1927_To_WGS_1984_79_CONUS\",\"authCode\":{\"auth\":\"EPSG\",\"code\":\"15851\"},\"type\":\"ST\"},\"ver\":\"PE_10_3_1\",\"name\":\"NAD27 * OGP-Usa Conus / North Dakota South [32021,15851]\",\"authCode\":{\"auth\":\"SLB\",\"code\":\"32021079\"},\"type\":\"EBC\"}"
},
"features": {
"type": "array",
"items": {
"title": "AnyCrsGeoJSON Feature",
"type": "object",
"required": [
"type",
"properties",
"geometry"
],
"properties": {
"geometry": {
"oneOf": [
{
"type": "null"
},
{
"title": "AnyCrsGeoJSON Point",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
},
"type": {
"type": "string",
"enum": [
"AnyCrsPoint"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "AnyCrsGeoJSON LineString",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"minItems": 2,
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
},
"type": {
"type": "string",
"enum": [
"AnyCrsLineString"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "AnyCrsGeoJSON Polygon",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"type": "array",
"items": {
"minItems": 4,
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
}
},
"type": {
"type": "string",
"enum": [
"AnyCrsPolygon"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "AnyCrsGeoJSON MultiPoint",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
},
"type": {
"type": "string",
"enum": [
"AnyCrsMultiPoint"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "AnyCrsGeoJSON MultiLineString",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
}
},
"type": {
"type": "string",
"enum": [
"AnyCrsMultiLineString"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "AnyCrsGeoJSON MultiPolygon",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"type": "array",
"items": {
"type": "array",
"items": {
"minItems": 4,
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
}
}
},
"type": {
"type": "string",
"enum": [
"AnyCrsMultiPolygon"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "AnyCrsGeoJSON GeometryCollection",
"type": "object",
"required": [
"type",
"geometries"
],
"properties": {
"type": {
"type": "string",
"enum": [
"AnyCrsGeometryCollection"
]
},
"geometries": {
"type": "array",
"items": {
"oneOf": [
{
"title": "AnyCrsGeoJSON Point",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
},
"type": {
"type": "string",
"enum": [
"AnyCrsPoint"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "AnyCrsGeoJSON LineString",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"minItems": 2,
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
},
"type": {
"type": "string",
"enum": [
"AnyCrsLineString"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "AnyCrsGeoJSON Polygon",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"type": "array",
"items": {
"minItems": 4,
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
}
},
"type": {
"type": "string",
"enum": [
"AnyCrsPolygon"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "AnyCrsGeoJSON MultiPoint",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
},
"type": {
"type": "string",
"enum": [
"AnyCrsMultiPoint"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "AnyCrsGeoJSON MultiLineString",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
}
},
"type": {
"type": "string",
"enum": [
"AnyCrsMultiLineString"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "AnyCrsGeoJSON MultiPolygon",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"type": "array",
"items": {
"type": "array",
"items": {
"minItems": 4,
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
}
}
},
"type": {
"type": "string",
"enum": [
"AnyCrsMultiPolygon"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
}
]
}
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
}
]
},
"type": {
"type": "string",
"enum": [
"AnyCrsFeature"
]
},
"properties": {
"oneOf": [
{
"type": "null"
},
{
"type": "object"
}
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
}
},
"persistableReferenceUnitZ": {
"description": "The unit of measure for the Z-axis (only for 3-dimensional coordinates, where the CRS does not describe the vertical unit). Note that the direction is upwards positive, i.e. Z means height.",
"type": "string",
"title": "Z-Unit Reference",
"example": "{\"scaleOffset\":{\"scale\":1.0,\"offset\":0.0},\"symbol\":\"m\",\"baseMeasurement\":{\"ancestry\":\"Length\",\"type\":\"UM\"},\"type\":\"USO\"}"
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
},
"persistableReferenceVerticalCrs": {
"description": "The VerticalCRS reference as persistableReference string. If populated, the VerticalCoordinateReferenceSystemID takes precedence. The property is null or empty for 2D geometries. For 3D geometries and absent or null persistableReferenceVerticalCrs the vertical CRS is either provided via persistableReferenceCrs's CompoundCRS or it is implicitly defined as EPSG:5714 MSL height.",
"type": "string",
"title": "Vertical CRS Reference",
"example": "{\"authCode\":{\"auth\":\"EPSG\",\"code\":\"5773\"},\"type\":\"LBC\",\"ver\":\"PE_10_3_1\",\"name\":\"EGM96_Geoid\",\"wkt\":\"VERTCS[\\\"EGM96_Geoid\\\",VDATUM[\\\"EGM96_Geoid\\\"],PARAMETER[\\\"Vertical_Shift\\\",0.0],PARAMETER[\\\"Direction\\\",1.0],UNIT[\\\"Meter\\\",1.0],AUTHORITY[\\\"EPSG\\\",5773]]\"}"
},
"type": {
"type": "string",
"enum": [
"AnyCrsFeatureCollection"
]
},
"VerticalCoordinateReferenceSystemID": {
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-CoordinateReferenceSystem:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "The explicit VerticalCRS reference into the CoordinateReferenceSystem catalog. This property stays empty for 2D geometries. Absent or empty values for 3D geometries mean the context may be provided by a CompoundCRS in 'CoordinateReferenceSystemID' or implicitly EPSG:5714 MSL height",
"x-osdu-relationship": [
{
"EntityType": "CoordinateReferenceSystem",
"GroupType": "reference-data"
}
],
"title": "Vertical Coordinate Reference System ID",
"type": "string",
"example": "namespace:reference-data--CoordinateReferenceSystem:VerticalCRS.EPSG.5773:"
}
},
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractAnyCrsFeatureCollection.1.0.0.json"
},
"opendes:wks:AbstractFacilitySpecification:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractFacilitySpecification:1.0.0",
"description": "A property, characteristic, or attribute about a facility that is not described explicitly elsewhere.",
"title": "AbstractFacilitySpecification",
"type": "object",
"properties": {
"FacilitySpecificationText": {
"type": "string",
"description": "The actual text value of the parameter."
},
"FacilitySpecificationDateTime": {
"format": "date-time",
"description": "The actual date and time value of the parameter.",
"x-osdu-frame-of-reference": "DateTime",
"type": "string"
},
"FacilitySpecificationIndicator": {
"type": "boolean",
"description": "The actual indicator value of the parameter."
},
"TerminationDateTime": {
"format": "date-time",
"description": "The date and time at which the facility specification instance is no longer in effect.",
"x-osdu-frame-of-reference": "DateTime",
"type": "string"
},
"EffectiveDateTime": {
"format": "date-time",
"description": "The date and time at which the facility specification instance becomes effective.",
"x-osdu-frame-of-reference": "DateTime",
"type": "string"
},
"UnitOfMeasureID": {
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-UnitOfMeasure:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "The unit for the quantity parameter, like metre (m in SI units system) for quantity Length.",
"x-osdu-relationship": [
{
"EntityType": "UnitOfMeasure",
"GroupType": "reference-data"
}
],
"type": "string"
},
"FacilitySpecificationQuantity": {
"type": "number",
"description": "The value for the specified parameter type.",
"x-osdu-frame-of-reference": "UOM_via_property:UnitOfMeasureID"
},
"ParameterTypeID": {
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-ParameterType:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "Parameter type of property or characteristic.",
"x-osdu-relationship": [
{
"EntityType": "ParameterType",
"GroupType": "reference-data"
}
],
"type": "string"
}
},
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractFacilitySpecification.1.0.0.json"
},
"opendes:wks:AbstractMetaItem:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"oneOf": [
{
"title": "FrameOfReferenceUOM",
"type": "object",
"properties": {
"persistableReference": {
"description": "The self-contained, persistable reference string uniquely identifying the Unit.",
"title": "UOM Persistable Reference",
"type": "string",
"example": "{\"abcd\":{\"a\":0.0,\"b\":1200.0,\"c\":3937.0,\"d\":0.0},\"symbol\":\"ft[US]\",\"baseMeasurement\":{\"ancestry\":\"L\",\"type\":\"UM\"},\"type\":\"UAD\"}"
},
"kind": {
"const": "Unit",
"description": "The kind of reference, 'Unit' for FrameOfReferenceUOM.",
"title": "UOM Reference Kind"
},
"propertyNames": {
"description": "The list of property names, to which this meta data item provides Unit context to. Data structures, which come in a single frame of reference, can register the property name, others require a full path like \"Data.StructureA.PropertyB\" to define a unique context.",
"title": "UOM Property Names",
"type": "array",
"items": {
"type": "string"
},
"example": [
"HorizontalDeflection.EastWest",
"HorizontalDeflection.NorthSouth"
]
},
"name": {
"description": "The unit symbol or name of the unit.",
"title": "UOM Unit Symbol",
"type": "string",
"example": "ft[US]"
},
"unitOfMeasureID": {
"x-osdu-relationship": [
{
"EntityType": "UnitOfMeasure",
"GroupType": "reference-data"
}
],
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-UnitOfMeasure:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "SRN to unit of measure reference.",
"type": "string",
"example": "namespace:reference-data--UnitOfMeasure:ftUS:"
}
},
"required": [
"kind",
"persistableReference"
]
},
{
"title": "FrameOfReferenceCRS",
"type": "object",
"properties": {
"coordinateReferenceSystemID": {
"x-osdu-relationship": [
{
"EntityType": "CoordinateReferenceSystem",
"GroupType": "reference-data"
}
],
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-CoordinateReferenceSystem:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "SRN to CRS reference.",
"type": "string",
"example": "namespace:reference-data--CoordinateReferenceSystem:EPSG.32615:"
},
"persistableReference": {
"description": "The self-contained, persistable reference string uniquely identifying the CRS.",
"title": "CRS Persistable Reference",
"type": "string",
"example": "{\"authCode\":{\"auth\":\"EPSG\",\"code\":\"32615\"},\"type\":\"LBC\",\"ver\":\"PE_10_3_1\",\"name\":\"WGS_1984_UTM_Zone_15N\",\"wkt\":\"PROJCS[\\\"WGS_1984_UTM_Zone_15N\\\",GEOGCS[\\\"GCS_WGS_1984\\\",DATUM[\\\"D_WGS_1984\\\",SPHEROID[\\\"WGS_1984\\\",6378137.0,298.257223563]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433]],PROJECTION[\\\"Transverse_Mercator\\\"],PARAMETER[\\\"False_Easting\\\",500000.0],PARAMETER[\\\"False_Northing\\\",0.0],PARAMETER[\\\"Central_Meridian\\\",-93.0],PARAMETER[\\\"Scale_Factor\\\",0.9996],PARAMETER[\\\"Latitude_Of_Origin\\\",0.0],UNIT[\\\"Meter\\\",1.0],AUTHORITY[\\\"EPSG\\\",32615]]\"}"
},
"kind": {
"const": "CRS",
"description": "The kind of reference, constant 'CRS' for FrameOfReferenceCRS.",
"title": "CRS Reference Kind"
},
"propertyNames": {
"description": "The list of property names, to which this meta data item provides CRS context to. Data structures, which come in a single frame of reference, can register the property name, others require a full path like \"Data.StructureA.PropertyB\" to define a unique context.",
"title": "CRS Property Names",
"type": "array",
"items": {
"type": "string"
},
"example": [
"KickOffPosition.X",
"KickOffPosition.Y"
]
},
"name": {
"description": "The name of the CRS.",
"title": "CRS Name",
"type": "string",
"example": "NAD27 * OGP-Usa Conus / North Dakota South [32021,15851]"
}
},
"required": [
"kind",
"persistableReference"
]
},
{
"title": "FrameOfReferenceDateTime",
"type": "object",
"properties": {
"persistableReference": {
"description": "The self-contained, persistable reference string uniquely identifying DateTime reference.",
"title": "DateTime Persistable Reference",
"type": "string",
"example": "{\"format\":\"yyyy-MM-ddTHH:mm:ssZ\",\"timeZone\":\"UTC\",\"type\":\"DTM\"}"
},
"kind": {
"const": "DateTime",
"description": "The kind of reference, constant 'DateTime', for FrameOfReferenceDateTime.",
"title": "DateTime Reference Kind"
},
"propertyNames": {
"description": "The list of property names, to which this meta data item provides DateTime context to. Data structures, which come in a single frame of reference, can register the property name, others require a full path like \"Data.StructureA.PropertyB\" to define a unique context.",
"title": "DateTime Property Names",
"type": "array",
"items": {
"type": "string"
},
"example": [
"Acquisition.StartTime",
"Acquisition.EndTime"
]
},
"name": {
"description": "The name of the DateTime format and reference.",
"title": "DateTime Name",
"type": "string",
"example": "UTC"
}
},
"required": [
"kind",
"persistableReference"
]
},
{
"title": "FrameOfReferenceAzimuthReference",
"type": "object",
"properties": {
"persistableReference": {
"description": "The self-contained, persistable reference string uniquely identifying AzimuthReference.",
"title": "AzimuthReference Persistable Reference",
"type": "string",
"example": "{\"code\":\"TrueNorth\",\"type\":\"AZR\"}"
},
"kind": {
"const": "AzimuthReference",
"description": "The kind of reference, constant 'AzimuthReference', for FrameOfReferenceAzimuthReference.",
"title": "AzimuthReference Reference Kind"
},
"propertyNames": {
"description": "The list of property names, to which this meta data item provides AzimuthReference context to. Data structures, which come in a single frame of reference, can register the property name, others require a full path like \"Data.StructureA.PropertyB\" to define a unique context.",
"title": "AzimuthReference Property Names",
"type": "array",
"items": {
"type": "string"
},
"example": [
"Bearing"
]
},
"name": {
"description": "The name of the CRS or the symbol/name of the unit.",
"title": "AzimuthReference Name",
"type": "string",
"example": "TrueNorth"
}
},
"required": [
"kind",
"persistableReference"
]
}
],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractMetaItem:1.0.0",
"description": "A meta data item, which allows the association of named properties or property values to a Unit/Measurement/CRS/Azimuth/Time context.",
"title": "Frame of Reference Meta Data Item",
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractMetaItem.1.0.0.json"
},
"opendes:wks:AbstractGeoContext:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"oneOf": [
{
"$ref": "#/definitions/opendes:wks:AbstractGeoPoliticalContext:1.0.0"
},
{
"$ref": "#/definitions/opendes:wks:AbstractGeoBasinContext:1.0.0"
},
{
"$ref": "#/definitions/opendes:wks:AbstractGeoFieldContext:1.0.0"
},
{
"$ref": "#/definitions/opendes:wks:AbstractGeoPlayContext:1.0.0"
},
{
"$ref": "#/definitions/opendes:wks:AbstractGeoProspectContext:1.0.0"
}
],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractGeoContext:1.0.0",
"description": "A geographic context to an entity. It can be either a reference to a GeoPoliticalEntity, Basin, Field, Play or Prospect.",
"title": "AbstractGeoContext",
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractGeoContext.1.0.0.json"
},
"opendes:wks:AbstractLegalTags:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractLegalTags:1.0.0",
"description": "Legal meta data like legal tags, relevant other countries, legal status. This structure is included by the SystemProperties \"legal\", which is part of all OSDU records. Not extensible.",
"additionalProperties": false,
"title": "Legal Meta Data",
"type": "object",
"properties": {
"legaltags": {
"description": "The list of legal tags, which resolve to legal properties (like country of origin, export classification code, etc.) and rules with the help of the Compliance Service.",
"title": "Legal Tags",
"type": "array",
"items": {
"type": "string"
}
},
"otherRelevantDataCountries": {
"description": "The list of other relevant data countries as an array of two-letter country codes, see https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2.",
"title": "Other Relevant Data Countries",
"type": "array",
"items": {
"pattern": "^[A-Z]{2}$",
"type": "string"
}
},
"status": {
"pattern": "^(compliant|uncompliant)$",
"description": "The legal status. Set by the system after evaluation against the compliance rules associated with the \"legaltags\" using the Compliance Service.",
"title": "Legal Status",
"type": "string"
}
},
"required": [
"legaltags",
"otherRelevantDataCountries"
],
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractLegalTags.1.0.0.json"
},
"opendes:wks:AbstractGeoBasinContext:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractGeoBasinContext:1.0.0",
"description": "A single, typed basin entity reference, which is 'abstracted' to AbstractGeoContext and then aggregated by GeoContexts properties.",
"x-osdu-review-status": "Accepted",
"title": "AbstractGeoBasinContext",
"type": "object",
"properties": {
"BasinID": {
"pattern": "^[\\w\\-\\.]+:master-data\\-\\-Basin:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "Reference to Basin.",
"x-osdu-relationship": [
{
"EntityType": "Basin",
"GroupType": "master-data"
}
],
"type": "string"
},
"GeoTypeID": {
"x-osdu-is-derived": {
"RelationshipPropertyName": "BasinID",
"TargetPropertyName": "BasinTypeID"
},
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-BasinType:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "The BasinType reference of the Basin (via BasinID) for application convenience.",
"x-osdu-relationship": [
{
"EntityType": "BasinType",
"GroupType": "reference-data"
}
],
"type": "string"
}
},
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractGeoBasinContext.1.0.0.json"
},
"opendes:wks:AbstractMaster:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractMaster:1.0.0",
"description": "Properties shared with all master-data schema instances.",
"x-osdu-review-status": "Accepted",
"title": "Abstract Master",
"type": "object",
"properties": {
"NameAliases": {
"description": "Alternative names, including historical, by which this master data is/has been known (it should include all the identifiers).",
"type": "array",
"items": {
"$ref": "#/definitions/opendes:wks:AbstractAliasNames:1.0.0"
}
},
"SpatialLocation": {
"description": "The spatial location information such as coordinates, CRS information (left empty when not appropriate).",
"$ref": "#/definitions/opendes:wks:AbstractSpatialLocation:1.0.0"
},
"VersionCreationReason": {
"description": "This describes the reason that caused the creation of a new version of this master data.",
"type": "string"
},
"GeoContexts": {
"description": "List of geographic entities which provide context to the master data. This may include multiple types or multiple values of the same type.",
"type": "array",
"items": {
"$ref": "#/definitions/opendes:wks:AbstractGeoContext:1.0.0"
}
}
},
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractMaster.1.0.0.json"
},
"opendes:wks:AbstractGeoProspectContext:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractGeoProspectContext:1.0.0",
"description": "A single, typed Prospect entity reference, which is 'abstracted' to AbstractGeoContext and then aggregated by GeoContexts properties.",
"x-osdu-review-status": "Accepted",
"title": "AbstractGeoProspectContext",
"type": "object",
"properties": {
"ProspectID": {
"pattern": "^[\\w\\-\\.]+:master-data\\-\\-Prospect:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "Reference to the prospect.",
"x-osdu-relationship": [
{
"EntityType": "Prospect",
"GroupType": "master-data"
}
],
"type": "string"
},
"GeoTypeID": {
"x-osdu-is-derived": {
"RelationshipPropertyName": "ProspectID",
"TargetPropertyName": "ProspectTypeID"
},
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-ProspectType:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "The ProspectType reference of the Prospect (via ProspectID) for application convenience.",
"x-osdu-relationship": [
{
"EntityType": "ProspectType",
"GroupType": "reference-data"
}
],
"type": "string"
}
},
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractGeoProspectContext.1.0.0.json"
},
"opendes:wks:AbstractFacility:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractFacility:1.0.0",
"description": "",
"title": "AbstractFacility",
"type": "object",
"properties": {
"FacilityStates": {
"description": "The history of life cycle states the facility has been through.",
"type": "array",
"items": {
"$ref": "#/definitions/opendes:wks:AbstractFacilityState:1.0.0"
}
},
"FacilityID": {
"description": "A system-specified unique identifier of a Facility.",
"type": "string"
},
"OperatingEnvironmentID": {
"x-osdu-relationship": [
{
"EntityType": "OperatingEnvironment",
"GroupType": "reference-data"
}
],
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-OperatingEnvironment:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "Identifies the Facility's general location as being onshore vs. offshore.",
"type": "string"
},
"FacilityNameAliases": {
"description": "Alternative names, including historical, by which this facility is/has been known.",
"type": "array",
"items": {
"$ref": "#/definitions/opendes:wks:AbstractAliasNames:1.0.0"
}
},
"FacilityEvents": {
"description": "A list of key facility events.",
"type": "array",
"items": {
"$ref": "#/definitions/opendes:wks:AbstractFacilityEvent:1.0.0"
}
},
"FacilitySpecifications": {
"description": "facilitySpecification maintains the specification like slot name, wellbore drilling permit number, rig name etc.",
"type": "array",
"items": {
"$ref": "#/definitions/opendes:wks:AbstractFacilitySpecification:1.0.0"
}
},
"DataSourceOrganisationID": {
"x-osdu-relationship": [
{
"EntityType": "Organisation",
"GroupType": "master-data"
}
],
"pattern": "^[\\w\\-\\.]+:master-data\\-\\-Organisation:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "The main source of the header information.",
"type": "string"
},
"InitialOperatorID": {
"x-osdu-relationship": [
{
"EntityType": "Organisation",
"GroupType": "master-data"
}
],
"pattern": "^[\\w\\-\\.]+:master-data\\-\\-Organisation:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "A initial operator organization ID; the organization ID may also be found in the FacilityOperatorOrganisationID of the FacilityOperator array providing the actual dates.",
"type": "string",
"title": "Initial Operator ID"
},
"CurrentOperatorID": {
"x-osdu-relationship": [
{
"EntityType": "Organisation",
"GroupType": "master-data"
}
],
"pattern": "^[\\w\\-\\.]+:master-data\\-\\-Organisation:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "The current operator organization ID; the organization ID may also be found in the FacilityOperatorOrganisationID of the FacilityOperator array providing the actual dates.",
"type": "string",
"title": "Current Operator ID"
},
"FacilityOperators": {
"description": "The history of operator organizations of the facility.",
"type": "array",
"items": {
"$ref": "#/definitions/opendes:wks:AbstractFacilityOperator:1.0.0"
}
},
"FacilityName": {
"description": "Name of the Facility.",
"type": "string"
},
"FacilityTypeID": {
"x-osdu-relationship": [
{
"EntityType": "FacilityType",
"GroupType": "reference-data"
}
],
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-FacilityType:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "The definition of a kind of capability to perform a business function or a service.",
"type": "string"
}
},
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractFacility.1.0.0.json"
},
"opendes:wks:AbstractSpatialLocation:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractSpatialLocation:1.0.0",
"description": "A geographic object which can be described by a set of points.",
"title": "AbstractSpatialLocation",
"type": "object",
"properties": {
"AsIngestedCoordinates": {
"description": "The original or 'as ingested' coordinates (Point, MultiPoint, LineString, MultiLineString, Polygon or MultiPolygon). The name 'AsIngestedCoordinates' was chosen to contrast it to 'OriginalCoordinates', which carries the uncertainty whether any coordinate operations took place before ingestion. In cases where the original CRS is different from the as-ingested CRS, the OperationsApplied can also contain the list of operations applied to the coordinate prior to ingestion. The data structure is similar to GeoJSON FeatureCollection, however in a CRS context explicitly defined within the AbstractAnyCrsFeatureCollection. The coordinate sequence follows GeoJSON standard, i.e. 'eastward/longitude', 'northward/latitude' {, 'upward/height' unless overridden by an explicit direction in the AsIngestedCoordinates.VerticalCoordinateReferenceSystemID}.",
"x-osdu-frame-of-reference": "CRS:",
"title": "As Ingested Coordinates",
"$ref": "#/definitions/opendes:wks:AbstractAnyCrsFeatureCollection:1.0.0"
},
"SpatialParameterTypeID": {
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-SpatialParameterType:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "A type of spatial representation of an object, often general (e.g. an Outline, which could be applied to Field, Reservoir, Facility, etc.) or sometimes specific (e.g. Onshore Outline, State Offshore Outline, Federal Offshore Outline, 3 spatial representations that may be used by Countries).",
"x-osdu-relationship": [
{
"EntityType": "SpatialParameterType",
"GroupType": "reference-data"
}
],
"type": "string"
},
"QuantitativeAccuracyBandID": {
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-QuantitativeAccuracyBand:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "An approximate quantitative assessment of the quality of a location (accurate to > 500 m (i.e. not very accurate)), to < 1 m, etc.",
"x-osdu-relationship": [
{
"EntityType": "QuantitativeAccuracyBand",
"GroupType": "reference-data"
}
],
"type": "string"
},
"CoordinateQualityCheckRemarks": {
"type": "array",
"description": "Freetext remarks on Quality Check.",
"items": {
"type": "string"
}
},
"AppliedOperations": {
"description": "The audit trail of operations applied to the coordinates from the original state to the current state. The list may contain operations applied prior to ingestion as well as the operations applied to produce the Wgs84Coordinates. The text elements refer to ESRI style CRS and Transformation names, which may have to be translated to EPSG standard names.",
"title": "Operations Applied",
"type": "array",
"items": {
"type": "string"
},
"example": [
"conversion from ED_1950_UTM_Zone_31N to GCS_European_1950; 1 points converted",
"transformation GCS_European_1950 to GCS_WGS_1984 using ED_1950_To_WGS_1984_24; 1 points successfully transformed"
]
},
"QualitativeSpatialAccuracyTypeID": {
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-QualitativeSpatialAccuracyType:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "A qualitative description of the quality of a spatial location, e.g. unverifiable, not verified, basic validation.",
"x-osdu-relationship": [
{
"EntityType": "QualitativeSpatialAccuracyType",
"GroupType": "reference-data"
}
],
"type": "string"
},
"CoordinateQualityCheckPerformedBy": {
"type": "string",
"description": "The user who performed the Quality Check."
},
"SpatialLocationCoordinatesDate": {
"format": "date-time",
"description": "Date when coordinates were measured or retrieved.",
"x-osdu-frame-of-reference": "DateTime",
"type": "string"
},
"CoordinateQualityCheckDateTime": {
"format": "date-time",
"description": "The date of the Quality Check.",
"x-osdu-frame-of-reference": "DateTime",
"type": "string"
},
"Wgs84Coordinates": {
"title": "WGS 84 Coordinates",
"description": "The normalized coordinates (Point, MultiPoint, LineString, MultiLineString, Polygon or MultiPolygon) based on WGS 84 (EPSG:4326 for 2-dimensional coordinates, EPSG:4326 + EPSG:5714 (MSL) for 3-dimensional coordinates). This derived coordinate representation is intended for global discoverability only. The schema of this substructure is identical to the GeoJSON FeatureCollection https://geojson.org/schema/FeatureCollection.json. The coordinate sequence follows GeoJSON standard, i.e. longitude, latitude {, height}",
"$ref": "#/definitions/opendes:wks:AbstractFeatureCollection:1.0.0"
},
"SpatialGeometryTypeID": {
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-SpatialGeometryType:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "Indicates the expected look of the SpatialParameterType, e.g. Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon. The value constrains the type of geometries in the GeoJSON Wgs84Coordinates and AsIngestedCoordinates.",
"x-osdu-relationship": [
{
"EntityType": "SpatialGeometryType",
"GroupType": "reference-data"
}
],
"type": "string"
}
},
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractSpatialLocation.1.0.0.json"
},
"opendes:wks:AbstractGeoFieldContext:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractGeoFieldContext:1.0.0",
"description": "A single, typed field entity reference, which is 'abstracted' to AbstractGeoContext and then aggregated by GeoContexts properties.",
"title": "AbstractGeoFieldContext",
"type": "object",
"properties": {
"FieldID": {
"pattern": "^[\\w\\-\\.]+:master-data\\-\\-Field:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "Reference to Field.",
"x-osdu-relationship": [
{
"EntityType": "Field",
"GroupType": "master-data"
}
],
"type": "string"
},
"GeoTypeID": {
"const": "Field",
"description": "The fixed type 'Field' for this AbstractGeoFieldContext."
}
},
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractGeoFieldContext.1.0.0.json"
},
"opendes:wks:AbstractFacilityVerticalMeasurement:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractFacilityVerticalMeasurement:1.0.0",
"description": "A location along a wellbore, _usually_ associated with some aspect of the drilling of the wellbore, but not with any intersecting _subsurface_ natural surfaces.",
"title": "AbstractFacilityVerticalMeasurement",
"type": "object",
"properties": {
"WellboreTVDTrajectoryID": {
"x-osdu-relationship": [
{
"EntityType": "WellboreTrajectory",
"GroupType": "work-product-component"
}
],
"pattern": "^[\\w\\-\\.]+:work-product-component\\-\\-WellboreTrajectory:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "Specifies what directional survey or wellpath was used to calculate the TVD.",
"type": "string"
},
"VerticalCRSID": {
"x-osdu-relationship": [
{
"EntityType": "CoordinateReferenceSystem",
"GroupType": "reference-data"
}
],
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-CoordinateReferenceSystem:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "A vertical coordinate reference system defines the origin for height or depth values. It is expected that either VerticalCRSID or VerticalReferenceID reference is provided in a given vertical measurement array object, but not both.",
"type": "string"
},
"VerticalMeasurementSourceID": {
"x-osdu-relationship": [
{
"EntityType": "VerticalMeasurementSource",
"GroupType": "reference-data"
}
],
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-VerticalMeasurementSource:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "Specifies Driller vs Logger.",
"type": "string"
},
"VerticalReferenceID": {
"description": "The reference point from which the relative vertical measurement is made. This is only populated if the measurement has no VerticalCRSID specified. The value entered must be the VerticalMeasurementID for another vertical measurement array element in this resource or its parent facility, and as a chain of measurements, they must resolve ultimately to a Vertical CRS. It is expected that a VerticalCRSID or a VerticalReferenceID is provided in a given vertical measurement array object, but not both.",
"type": "string"
},
"TerminationDateTime": {
"format": "date-time",
"description": "The date and time at which a vertical measurement instance is no longer in effect.",
"x-osdu-frame-of-reference": "DateTime",
"type": "string"
},
"VerticalMeasurementPathID": {
"x-osdu-relationship": [
{
"EntityType": "VerticalMeasurementPath",
"GroupType": "reference-data"
}
],
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-VerticalMeasurementPath:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "Specifies Measured Depth, True Vertical Depth, or Elevation.",
"type": "string"
},
"EffectiveDateTime": {
"format": "date-time",
"description": "The date and time at which a vertical measurement instance becomes effective.",
"x-osdu-frame-of-reference": "DateTime",
"type": "string"
},
"VerticalMeasurement": {
"description": "The value of the elevation or depth. Depth is positive downwards from a vertical reference or geodetic datum along a path, which can be vertical; elevation is positive upwards from a geodetic datum along a vertical path. Either can be negative.",
"x-osdu-frame-of-reference": "UOM_via_property:VerticalMeasurementUnitOfMeasureID",
"type": "number"
},
"VerticalMeasurementTypeID": {
"x-osdu-relationship": [
{
"EntityType": "VerticalMeasurementType",
"GroupType": "reference-data"
}
],
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-VerticalMeasurementType:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "Specifies the type of vertical measurement (TD, Plugback, Kickoff, Drill Floor, Rotary Table...).",
"type": "string"
},
"VerticalMeasurementDescription": {
"description": "Text which describes a vertical measurement in detail.",
"type": "string"
},
"VerticalMeasurementUnitOfMeasureID": {
"x-osdu-relationship": [
{
"EntityType": "UnitOfMeasure",
"GroupType": "reference-data"
}
],
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-UnitOfMeasure:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "The unit of measure for the vertical measurement. If a unit of measure and a vertical CRS are provided, the unit of measure provided is taken over the unit of measure from the CRS.",
"type": "string"
}
},
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractFacilityVerticalMeasurement.1.0.0.json"
},
"opendes:wks:AbstractFeatureCollection:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractFeatureCollection:1.0.0",
"description": "GeoJSON feature collection as originally published in https://geojson.org/schema/FeatureCollection.json. Attention: the coordinate order is fixed: Longitude first, followed by Latitude, optionally height above MSL (EPSG:5714) as third coordinate.",
"title": "GeoJSON FeatureCollection",
"type": "object",
"required": [
"type",
"features"
],
"properties": {
"type": {
"type": "string",
"enum": [
"FeatureCollection"
]
},
"features": {
"type": "array",
"items": {
"title": "GeoJSON Feature",
"type": "object",
"required": [
"type",
"properties",
"geometry"
],
"properties": {
"geometry": {
"oneOf": [
{
"type": "null"
},
{
"title": "GeoJSON Point",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
},
"type": {
"type": "string",
"enum": [
"Point"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "GeoJSON LineString",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"minItems": 2,
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
},
"type": {
"type": "string",
"enum": [
"LineString"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "GeoJSON Polygon",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"type": "array",
"items": {
"minItems": 4,
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
}
},
"type": {
"type": "string",
"enum": [
"Polygon"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "GeoJSON MultiPoint",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
},
"type": {
"type": "string",
"enum": [
"MultiPoint"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "GeoJSON MultiLineString",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
}
},
"type": {
"type": "string",
"enum": [
"MultiLineString"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "GeoJSON MultiPolygon",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"type": "array",
"items": {
"type": "array",
"items": {
"minItems": 4,
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
}
}
},
"type": {
"type": "string",
"enum": [
"MultiPolygon"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "GeoJSON GeometryCollection",
"type": "object",
"required": [
"type",
"geometries"
],
"properties": {
"type": {
"type": "string",
"enum": [
"GeometryCollection"
]
},
"geometries": {
"type": "array",
"items": {
"oneOf": [
{
"title": "GeoJSON Point",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
},
"type": {
"type": "string",
"enum": [
"Point"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "GeoJSON LineString",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"minItems": 2,
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
},
"type": {
"type": "string",
"enum": [
"LineString"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "GeoJSON Polygon",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"type": "array",
"items": {
"minItems": 4,
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
}
},
"type": {
"type": "string",
"enum": [
"Polygon"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "GeoJSON MultiPoint",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
},
"type": {
"type": "string",
"enum": [
"MultiPoint"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "GeoJSON MultiLineString",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
}
},
"type": {
"type": "string",
"enum": [
"MultiLineString"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
},
{
"title": "GeoJSON MultiPolygon",
"type": "object",
"required": [
"type",
"coordinates"
],
"properties": {
"coordinates": {
"type": "array",
"items": {
"type": "array",
"items": {
"minItems": 4,
"type": "array",
"items": {
"minItems": 2,
"type": "array",
"items": {
"type": "number"
}
}
}
}
},
"type": {
"type": "string",
"enum": [
"MultiPolygon"
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
}
]
}
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
}
]
},
"type": {
"type": "string",
"enum": [
"Feature"
]
},
"properties": {
"oneOf": [
{
"type": "null"
},
{
"type": "object"
}
]
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
}
}
},
"bbox": {
"minItems": 4,
"type": "array",
"items": {
"type": "number"
}
}
},
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractFeatureCollection.1.0.0.json"
},
"opendes:wks:AbstractFacilityState:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractFacilityState:1.0.0",
"description": "The life cycle status of a facility at some point in time.",
"title": "AbstractFacilityState",
"type": "object",
"properties": {
"EffectiveDateTime": {
"format": "date-time",
"description": "The date and time at which the facility state becomes effective.",
"x-osdu-frame-of-reference": "DateTime",
"type": "string"
},
"FacilityStateTypeID": {
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-FacilityStateType:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "The facility life cycle state from planning to abandonment.",
"x-osdu-relationship": [
{
"EntityType": "FacilityStateType",
"GroupType": "reference-data"
}
],
"type": "string"
},
"TerminationDateTime": {
"format": "date-time",
"description": "The date and time at which the facility state is no longer in effect.",
"x-osdu-frame-of-reference": "DateTime",
"type": "string"
}
},
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractFacilityState.1.0.0.json"
},
"opendes:wks:AbstractAccessControlList:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractAccessControlList:1.0.0",
"description": "The access control tags associated with this entity. This structure is included by the SystemProperties \"acl\", which is part of all OSDU records. Not extensible.",
"additionalProperties": false,
"title": "Access Control List",
"type": "object",
"properties": {
"viewers": {
"description": "The list of viewers to which this data record is accessible/visible/discoverable formatted as an email (core.common.model.storage.validation.ValidationDoc.EMAIL_REGEX).",
"title": "List of Viewers",
"type": "array",
"items": {
"pattern": "^[a-zA-Z0-9_+&*-]+(?:\\.[a-zA-Z0-9_+&*-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,7}$",
"type": "string"
}
},
"owners": {
"description": "The list of owners of this data record formatted as an email (core.common.model.storage.validation.ValidationDoc.EMAIL_REGEX).",
"title": "List of Owners",
"type": "array",
"items": {
"pattern": "^[a-zA-Z0-9_+&*-]+(?:\\.[a-zA-Z0-9_+&*-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,7}$",
"type": "string"
}
}
},
"required": [
"owners",
"viewers"
],
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractAccessControlList.1.0.0.json"
},
"opendes:wks:AbstractGeoPlayContext:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractGeoPlayContext:1.0.0",
"description": "A single, typed Play entity reference, which is 'abstracted' to AbstractGeoContext and then aggregated by GeoContexts properties.",
"x-osdu-review-status": "Accepted",
"title": "AbstractGeoPlayContext",
"type": "object",
"properties": {
"PlayID": {
"pattern": "^[\\w\\-\\.]+:master-data\\-\\-Play:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "Reference to the play.",
"x-osdu-relationship": [
{
"EntityType": "Play",
"GroupType": "master-data"
}
],
"type": "string"
},
"GeoTypeID": {
"x-osdu-is-derived": {
"RelationshipPropertyName": "PlayID",
"TargetPropertyName": "PlayTypeID"
},
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-PlayType:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "The PlayType reference of the Play (via PlayID) for application convenience.",
"x-osdu-relationship": [
{
"EntityType": "PlayType",
"GroupType": "reference-data"
}
],
"type": "string"
}
},
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractGeoPlayContext.1.0.0.json"
},
"opendes:wks:AbstractLegalParentList:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractLegalParentList:1.0.0",
"description": "A list of entity IDs in the data ecosystem, which act as legal parents to the current entity. This structure is included by the SystemProperties \"ancestry\", which is part of all OSDU records. Not extensible.",
"additionalProperties": false,
"title": "Parent List",
"type": "object",
"properties": {
"parents": {
"description": "An array of none, one or many entity references in the data ecosystem, which identify the source of data in the legal sense. In contract to other relationships, the source record version is required. Example: the 'parents' will be queried when e.g. the subscription of source data services is terminated; access to the derivatives is also terminated.",
"title": "Parents",
"type": "array",
"items": {
"x-osdu-relationship": [],
"pattern": "^[\\w\\-\\.]+:[\\w\\-\\.]+:[\\w\\-\\.\\:\\%]+:[0-9]+$",
"type": "string"
},
"example": []
}
},
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractLegalParentList.1.0.0.json"
},
"opendes:wks:AbstractFacilityEvent:1.0.0": {
"x-osdu-inheriting-from-kind": [],
"x-osdu-license": "Copyright 2021, The Open Group \\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"$schema": "http://json-schema.org/draft-07/schema#",
"x-osdu-schema-source": "osdu:wks:AbstractFacilityEvent:1.0.0",
"description": "A significant occurrence in the life of a facility, which often changes its state, or the state of one of its components. It can describe a point-in-time (event) or a time interval of a specific type (FacilityEventType).",
"title": "AbstractFacilityEvent",
"type": "object",
"properties": {
"EffectiveDateTime": {
"format": "date-time",
"description": "The date and time at which the event took place or takes effect.",
"x-osdu-frame-of-reference": "DateTime",
"type": "string"
},
"TerminationDateTime": {
"format": "date-time",
"description": "The date and time at which the event is no longer in effect. For point-in-time events the 'TerminationDateTime' must be set equal to 'EffectiveDateTime'. Open time intervals have an absent 'TerminationDateTime'.",
"x-osdu-frame-of-reference": "DateTime",
"type": "string"
},
"FacilityEventTypeID": {
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-FacilityEventType:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "The facility event type is a picklist. Examples: 'Permit', 'Spud', 'Abandon', etc.",
"x-osdu-relationship": [
{
"EntityType": "FacilityEventType",
"GroupType": "reference-data"
}
],
"type": "string"
}
},
"$id": "https://schema.osdu.opengroup.org/json/abstract/AbstractFacilityEvent.1.0.0.json"
}
},
"properties": {
"ancestry": {
"description": "The links to data, which constitute the inputs.",
"title": "Ancestry",
"$ref": "#/definitions/opendes:wks:AbstractLegalParentList:1.0.0"
},
"data": {
"allOf": [
{
"$ref": "#/definitions/opendes:wks:AbstractCommonResources:1.0.0"
},
{
"$ref": "#/definitions/opendes:wks:AbstractMaster:1.0.0"
},
{
"$ref": "#/definitions/opendes:wks:AbstractFacility:1.0.0"
},
{
"type": "object",
"properties": {
"DefaultVerticalCRSID": {
"x-osdu-relationship": [
{
"EntityType": "CoordinateReferenceSystem",
"GroupType": "reference-data"
}
],
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-CoordinateReferenceSystem:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "The default vertical coordinate reference system used in the vertical measurements for a well or wellbore if absent from input vertical measurements and there is no other recourse for obtaining a valid CRS.",
"type": "string"
},
"VerticalMeasurements": {
"description": "List of all depths and elevations pertaining to the well, like, water depth, mud line elevation, etc.",
"type": "array",
"items": {
"allOf": [
{
"type": "object",
"properties": {
"VerticalMeasurementID": {
"description": "The ID for a distinct vertical measurement within the Wellbore VerticalMeasurements array so that it may be referenced by other vertical measurements if necessary.",
"type": "string"
}
}
},
{
"$ref": "#/definitions/opendes:wks:AbstractFacilityVerticalMeasurement:1.0.0"
}
]
}
},
"InterestTypeID": {
"x-osdu-relationship": [
{
"EntityType": "WellInterestType",
"GroupType": "reference-data"
}
],
"pattern": "^[\\w\\-\\.]+:reference-data\\-\\-WellInterestType:[\\w\\-\\.\\:\\%]+:[0-9]*$",
"description": "Pre-defined reasons for interest in the well or information about the well.",
"type": "string"
},
"DefaultVerticalMeasurementID": {
"description": "The default datum reference point, or zero depth point, used to determine other points vertically in a well. References an entry in the VerticalMeasurements array.",
"type": "string"
}
}
},
{
"type": "object",
"properties": {
"ExtensionProperties": {
"type": "object"
}
}
}
]
},
"kind": {
"pattern": "^[\\w\\-\\.]+:[\\w\\-\\.]+:[\\w\\-\\.]+:[0-9]+.[0-9]+.[0-9]+$",
"description": "The schema identification for the OSDU resource object following the pattern {Namespace}:{Source}:{Type}:{VersionMajor}.{VersionMinor}.{VersionPatch}. The versioning scheme follows the semantic versioning, https://semver.org/.",
"title": "Entity Kind",
"type": "string",
"example": "osdu:wks:master-data--Well:1.0.0"
},
"acl": {
"description": "The access control tags associated with this entity.",
"title": "Access Control List",
"$ref": "#/definitions/opendes:wks:AbstractAccessControlList:1.0.0"
},
"version": {
"format": "int64",
"description": "The version number of this OSDU resource; set by the framework.",
"title": "Version Number",
"type": "integer",
"example": 1562066009929332
},
"tags": {
"description": "A generic dictionary of string keys mapping to string value. Only strings are permitted as keys and values.",
"additionalProperties": {
"type": "string"
},
"title": "Tag Dictionary",
"type": "object",
"example": {
"NameOfKey": "String value"
}
},
"modifyUser": {
"description": "The user reference, which created this version of this resource object. Set by the System.",
"title": "Resource Object Version Creation User Reference",
"type": "string",
"example": "some-user@some-company-cloud.com"
},
"modifyTime": {
"format": "date-time",
"description": "Timestamp of the time at which this version of the OSDU resource object was created. Set by the System. The value is a combined date-time string in ISO-8601 given in UTC.",
"title": "Resource Object Version Creation DateTime",
"type": "string",
"example": "2020-12-16T11:52:24.477Z"
},
"createTime": {
"format": "date-time",
"description": "Timestamp of the time at which initial version of this OSDU resource object was created. Set by the System. The value is a combined date-time string in ISO-8601 given in UTC.",
"title": "Resource Object Creation DateTime",
"type": "string",
"example": "2020-12-16T11:46:20.163Z"
},
"meta": {
"description": "The Frame of Reference meta data section linking the named properties to self-contained definitions.",
"title": "Frame of Reference Meta Data",
"type": "array",
"items": {
"$ref": "#/definitions/opendes:wks:AbstractMetaItem:1.0.0"
}
},
"legal": {
"description": "The entity's legal tags and compliance status. The actual contents associated with the legal tags is managed by the Compliance Service.",
"title": "Legal Tags",
"$ref": "#/definitions/opendes:wks:AbstractLegalTags:1.0.0"
},
"createUser": {
"description": "The user reference, which created the first version of this resource object. Set by the System.",
"title": "Resource Object Creation User Reference",
"type": "string",
"example": "some-user@some-company-cloud.com"
},
"id": {
"pattern": "^[\\w\\-\\.]+:master-data\\-\\-Well:[\\w\\-\\.\\:\\%]+$",
"description": "Previously called ResourceID or SRN which identifies this OSDU resource object without version.",
"title": "Entity ID",
"type": "string",
"example": "namespace:master-data--Well:6c60ceb0-3521-57b7-9bd8-e1d7c9f66230"
}
},
"required": [
"kind",
"acl",
"legal"
],
"$id": "https://schema.osdu.opengroup.org/json/master-data/Well.1.0.0.json"
}
```
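One detail worth noting in passing: the `kind` pattern near the end of the schema uses unescaped dots between the version numbers, so it is slightly more permissive than the `{VersionMajor}.{VersionMinor}.{VersionPatch}` convention its description cites. A quick check (Python; the second input is a deliberately malformed example):

```python
import re

# The "kind" pattern from the schema above, with JSON string escaping removed.
# The dots between the version components are unescaped, so each "." matches
# any single character, not only a literal dot.
KIND_PATTERN = re.compile(r"^[\w\-\.]+:[\w\-\.]+:[\w\-\.]+:[0-9]+.[0-9]+.[0-9]+$")

# The documented example matches, as expected:
print(bool(KIND_PATTERN.match("osdu:wks:master-data--Well:1.0.0")))  # True

# Malformed separators also match, because "." is unescaped:
print(bool(KIND_PATTERN.match("osdu:wks:master-data--Well:1x0y0")))  # True
```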
As you can see above, the schema for `AbstractMaster:1.0.0` exists in the definitions section.

## What is the need for implementing BlobStorageClient by CSPs

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/33 (Kishore Battula, updated 2021-03-16)

There are two interfaces that CSPs need to implement: `BlobStorageClient` and `BaseCredentials`. I see that `BaseCredentials` is currently used to generate a token. What is the purpose of adding `BlobStorageClient`, and by when do CSPs need to implement this interface?

Participants: Siarhei Khaletski (EPAM), Kateryna Kurach (EPAM), Alan Henson

## Manifest ingestion not picking dataset id generated by file service

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/44 (Kishore Battula, updated 2021-03-09)

In Azure, referring to a file in a manifest involves the following steps:
1. Generate a signed URL.
2. Upload the content to the signed URL.
3. Create the file metadata.
4. Reference the id returned by the create-file-metadata call in the manifest.
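Sketched as code, the flow looks roughly like this (Python; `file_client` and its method names are illustrative stand-ins, not the actual File-service API):

```python
def ingest_file(file_client, content: bytes) -> str:
    """Return the dataset id to reference in the manifest's Datasets array.

    ``file_client`` is a stand-in for the File service; the method names
    used here are illustrative assumptions, not the real API surface.
    """
    # 1. Generate a signed URL for the upload (and the file source it maps to).
    signed_url, file_source = file_client.get_upload_url()
    # 2. Upload the content to the signed URL.
    file_client.upload(signed_url, content)
    # 3. Create the file metadata record; the service returns the dataset id.
    dataset_id = file_client.create_metadata(file_source)
    # 4. The caller places this id into the manifest.
    return dataset_id
```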
The above steps are needed because manifest ingestion does not talk to the File service; if it did, step 3 would be eliminated (it would become part of manifest ingestion). Because of this limitation, the manifest refers to a file by its actual platform-issued id. The master branch of manifest ingestion throws an exception when validating such dataset ids. See the logs below:
```
Failed validating 'pattern' in schema['properties']['data']['allOf'][1]['properties']['Datasets']['items']:
{'description': 'The SRN which identifies this OSDU File resource.',
'pattern': '^(surrogate-key:.+|[\\w\\-\\.]+:dataset\\-\\-[\\w\\-\\.]+:[\\w\\-\\.\\:\\%]+:[0-9]*)$',
'type': 'string',
'x-osdu-relationship': [{'GroupType': 'dataset'}]}
On instance['data']['Datasets'][0]:
'opendes:dataset--File.Generic:253ea910-a13d-4759-a168-73310a8b2b2e'
```Kateryna Kurach (EPAM)Kateryna Kurach (EPAM)https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/45Manifest ingestion interpreting anything after last ":" as version of the id2021-03-09T13:19:48ZKishore BattulaManifest ingestion interpreting anything after last ":" as version of the idFor example, this id=”opendes:work-product-component--WellLog:feb2:1” is a valid id in the osdu platform. When searching in the search service the input id is split based on last “:” and first part is treated as id and second part as ver...For example, this id=”opendes:work-product-component--WellLog:feb2:1” is a valid id in the osdu platform. When searching in the search service the input id is split based on last “:” and first part is treated as id and second part as version. In the above example, the id that is sent to search service is “opendes:work-product-component--WellLog:feb2” as last 1 is treated as version. Because of the above interpretation the record will not be found in the platform and will skip ingestion for records that dependent on this id.Kishore BattulaKateryna Kurach (EPAM)Kishore Battulahttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/50R3 Ingestion DAG fails to ingest official R3 manifests due to reference to Re...2021-03-17T22:07:34ZSpencer Suttonsuttonsp@amazon.comR3 Ingestion DAG fails to ingest official R3 manifests due to reference to ResourceSecurityClassificationWhen I execute [attached body](/uploads/c9e993a4944f0c4761285c6dad5632e5/example_wp_manifest.json) against the R3 ingestion dag, it fails due to a ResourceSecurityClassification record not being found. The official R3 manifests found [he...When I execute [attached body](/uploads/c9e993a4944f0c4761285c6dad5632e5/example_wp_manifest.json) against the R3 ingestion dag, it fails due to a ResourceSecurityClassification record not being found. 
The official R3 manifests found [here](https://community.opengroup.org/osdu/platform/open-test-data/-/tree/master/rc--3.0.0/4-instances/TNO/work-products/trajectories) reference this in the ResourceSecurityClassification field: `opendes:reference-data--ResourceSecurityClassification:RESTRICTED:`
It seems as though the dag is trying to find a record with that id in the system. The problem is that this id can't be used to create any record; Storage throws a 400 for it. So the dag logs this as a warning:
![image](/uploads/8331a9c98da7af7fc666f2497473da49/image.png)
The dag then doesn't actually create any records, even though it finishes as success.
**Either the official manifests need to be updated to have an actual record id there or the dag needs to be updated to not look for the resource security classification record using that field as an id.**
Specifically, this logic happens on line 181 of `validate_referential_integrity.py`:
![image](/uploads/86ef0fe7a249d35a691d043ad2ea5227/image.png)
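The circled logic is only shown as a screenshot; below is a rough sketch of the warn-and-skip behavior described (the function and the `record_exists` callable are assumptions for illustration, not the DAG's actual code):

```python
import logging

logger = logging.getLogger(__name__)

def filter_resolvable_references(referenced_ids, record_exists):
    """Keep only references that resolve to an existing record.

    `record_exists` stands in for a call to the Search service. Unresolved
    ids are logged as warnings and dropped, which is why the DAG can end
    'green' while having created no records.
    """
    resolvable = []
    for record_id in referenced_ids:
        if record_exists(record_id):
            resolvable.append(record_id)
        else:
            logger.warning("Reference not found, skipping: %s", record_id)
    return resolvable
```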
Other than this problem, I was able to successfully ingest manifests with this dag on AWS.ethiraj krishnamanaiduJoeSiarhei Khaletski (EPAM)Kateryna Kurach (EPAM)Alan HensonBrady Spiva [AWS]ethiraj krishnamanaiduhttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/35Improve Airflow logs2021-04-20T16:13:01ZKateryna Kurach (EPAM)Improve Airflow logsThe problem that we are facing now is that it is hard to read Airflow logs and hard to see what records were stored into Storage service and what records were not.
Several comments here:
1. DAG execution status may be green, but some of the records were not stored.
This is somewhat expected behavior, which is why the DAG is displayed as green in Airflow. We expect that some of the entities may fail validation; if they do, we skip them and process the other entities. This may be confusing to the user.
2. We have lots of tasks in the osdu_ingest DAG now, and validation happens at different stages, so logs are spread out between different tasks -> it is hard for a user to know which log to check.
3. Sometimes Airflow logs don't even display skipped ids. This is a critical issue that has to be fixed.
Ideally, it would be great to produce a report at the end of DAG execution. Report should list processed ids and unprocessed ids with the errors.Kateryna Kurach (EPAM)Kateryna Kurach (EPAM)https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/52Determine if Airflow logs are piped to cloud logs2021-04-20T00:53:22ZAlan HensonDetermine if Airflow logs are piped to cloud logsIdeally, logs generated by Airflow are piped to the underlying cloud service provider's (CSP) logging framework. Once there, these logs are accessible via the CSP's respective consoles.
This issue is meant to validate which CSPs have implemented this capability:
- [ ] AWS
- [X] Azure
- [X] GCP
- [X] IBMhttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/57[POC] Install and investigate Airflow 2.0 [GONRG-2214]2022-08-23T11:19:11ZKateryna Kurach (EPAM)[POC] Install and investigate Airflow 2.0 [GONRG-2214]Install Airflow 2.0 and test:
Backward compatibility
Scheduler performance
Scale the web server by scaling the size of a node that the web server is using
Test Postgresql
Review other features that can improve performance
Link to GCP issue-tracking: https://jiraeu.epam.com/browse/GONRG-2214https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/55Bug in utils.py method "split_id" prevents WP manifest ingestion2022-08-23T11:19:16ZSpencer Suttonsuttonsp@amazon.comBug in utils.py method "split_id" prevents WP manifest ingestionThere is a bug in the utils.py method `split_id` where it assumes that any number at the end must be a version for the record and must be removed even when the number at the end is not a version but part of the actual record id. This mak...There is a bug in the utils.py method `split_id` where it assumes that any number at the end must be a version for the record and must be removed even when the number at the end is not a version but part of the actual record id. This makes it so every single WP manifest that references any master data can't be ingested.
**Details:**
WP manifests reference master data like this:
`{{data-partition-id}}:wks:master-data--Well:1000:`
When this gets to the referential integrity step, the method `split_id` in utils.py takes this external reference and returns:
`{{data-partition-id}}:wks:master-data--Well`
This returned value is then passed along to the search service to look for the record's existence. Search returns nothing because that isn't a valid record id. Subsequently, the dag logs a warning and never ingests the manifest because it "failed" the referential check.
The `split_id` method should return `{{data-partition-id}}:wks:master-data--Well:1000` like how it does for reference data records. The problem is found on these lines:
![image](/uploads/ec995d5b73ee360f3a614f36c3dc0283/image.png)
It is assuming that any run of digits at the end of a record id must be a version number, ignoring the position of those digits. **This line of code needs to change to allow digits at the end of record ids.**
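The `split_id` implementation is only shown as a screenshot; the sketch below (body assumed, not the repository's code) illustrates the intended behavior, where a trailing colon marks "no version" and protects a numeric tail like `:1000` — assuming the trailing colon actually reaches the function, which the issue notes is currently not the case:

```python
def split_id(reference: str):
    """Split a manifest reference into (record_id, version).

    Hypothetical sketch of the intended behavior, not the utils.py code:
    a trailing colon means the reference carries no version, so any
    numeric tail before it is part of the record id itself.
    """
    if reference.endswith(":"):
        return reference[:-1], None
    record_id, _, tail = reference.rpartition(":")
    if tail.isdigit():
        # Only a numeric tail *after* the last colon is a version number.
        return record_id, tail
    return reference, None

print(split_id("opendes:wks:master-data--Well:1000:"))
# → ('opendes:wks:master-data--Well:1000', None)
```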
That first if condition shown above should catch this problem, since the record id we're passing in has a trailing colon. However, this trailing colon is removed earlier in the process, in the method `_extract_external_references`:
![image](/uploads/1e971d49fe55ffb7ff5268cd4a84c6b2/image.png)
If you try to bypass this by removing the colon at the end in the manifest itself, the validation step throws an error and keeps you from ingesting the manifest. The only way past this for now is to comment out the lines of code I've circled in the image above.Siarhei Khaletski (EPAM)Kateryna Kurach (EPAM)Siarhei Khaletski (EPAM)https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/65Move to Airflow 2.0 ADR2023-10-23T07:49:41ZBen LasscockMove to Airflow 2.0 ADR# Moving to Airflow 2.0
## Status
* [x] Proposed
* [ ] Trialing
* [ ] Under review
* [x] Approved
* [ ] Retired
### Decision
This decision will authorize the port of the ingestion workflow and associated DAGs (see below) to support Airflow 2.0, deprecating support for Airflow 1.10.x after a transition period.
## Deprecation strategy
The existing [experimental](https://airflow.apache.org/docs/apache-airflow/stable/deprecated-rest-api-ref.html) API is still available in Airflow 2.0 here:\
/api/experimental/
_To restore these APIs while migrating to the stable REST API, set enable_experimental_api option in [api] section to True._
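In `airflow.cfg` that setting looks like the following (equivalently, the `AIRFLOW__API__ENABLE_EXPERIMENTAL_API=True` environment variable):

```ini
[api]
enable_experimental_api = True
```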
A deprecation strategy will be implemented, providing providers with a transition period to port to Airflow 2.0. During this period, common code will run on Airflow 2.0 by default; however, configuration (possibly environment variables) will allow code written for 1.10.x to continue to be supported.
A guide detailing the backward-compatibility changes can be found [here](https://github.com/apache/airflow/blob/main/UPDATING.md).
### Dependencies
The following task list (provided by EPAM) gives an overview of the dependencies and level of effort required to implement the move.
| | Task | Estimate | Assigned to | Ticket |
| --- | --- | --- | --- | --- |
| 1. | Install Airflow 2.0 to all environments | | All CSP's |
| 2. | Airflow 2.0 required DAG code changes | | |
| 2.1 | Manifest-based ingestion (Python) | 5 days | (GCP) | [issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/60) |
| 2.2 | WITSML parser (Python) | 5 days | (GCP) | [issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/61) |
| 2.3 | SEGY -> OpenVDS | 5 days | (GCP) | |
| 2.4 | SEGY -> ZGY | 5 days | | [Seismic](osdu/platform/data-flow/ingestion/segy-to-zgy-conversion#4) |
| 2.5 | CSV Parser | 5 days | (GCP) | [issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/csv-parser/csv-parser/-/issues/33) |
| 3. | Workflow Services | | |
| 3.1 | Common Code (Java) | 10 days | | [issue](https://community.opengroup.org/osdu/platform/data-flow/data-workflow-framework/data-workflow/-/issues/1) |
| 3.2. | AWS | | | |
| 3.3. | Azure | | | |
### Motivation
The release of Airflow 2.0 is a significant upgrade from the previous versions, and includes improvements and new features that support our goals for the ingestion workflow [see link](https://www.astronomer.io/blog/introducing-airflow-2-0).
In the context of our goals for progressing OSDU ingestion project:
**Ease of on boarding developers**
In Airflow <2.0 the "experimental" REST API is being deprecated, with a move to a new comprehensive "stable" REST API supported by Airflow >=2.0. Moving to the new Airflow should ensure that the code being created by the OSDU will enjoy greater support online and will be easier for new developers to adopt and extend.
The new Airflow 2.0 **TaskFlow API** simplifies the passing of information between tasks in a DAG. This feature does not solve performance problems related to passing large manifests through the workflow; that is the focus of another effort, "Manifest by reference". An example of the new TaskFlow API can be found [here](https://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html).
**Latency**
One of the major features of Airflow 2.0 is a new high availability + low latency scheduler.
In measurements made during the OSDU Airflow 2.0 PoC (conducted by EPAM) it was found that, using Airflow 1.10.14, latency between tasks (productive work) could be as much as 30 seconds; with equivalent code running on Airflow 2.0, this overhead was reduced to 5 seconds.
There has been an issue where Airflow <1.10.15 would, by default, only allow the creation of one DAG run per second, potentially creating latency issues. Although this was solved with the release of [1.10.15](https://github.com/apache/airflow/pull/10633), dependency on a minor version has created a variance in the behavior of the ingestion workflow across providers, and moving to Airflow >=2.0 will solve this. A complete list of bug fixes and improvements through to the current release of Airflow 2.1.0 can be found [here](https://airflow.apache.org/docs/apache-airflow/2.0.1/dag-serialization.html).
**Throughput and scalability**
In the current version of Airflow, the scheduler has been found to fail silently once max_active_runs_per_dag (configuration default is 20) is exceeded. This creates a variance in the behavior of the ingestion workflow based on the specific configuration of the OSDU platform provider. During the Airflow 2.0 PoC it was found that this problem was solved.
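For reference, this threshold is the Airflow `max_active_runs_per_dag` setting; in `airflow.cfg` (shown with the value the issue cites, not Airflow's shipped default):

```ini
[core]
max_active_runs_per_dag = 20
```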
_[A] user can now launch additional "replicas" of the Scheduler to increase the throughput of their Airflow Deployment._\
The option to run additional schedulers creates the potential for providers to scale the ingestion workflow by provisioning more resources, and also removes a single point of failure (when using two or more schedulers), providing a more resilient system.
### Project Risks
| Risk Category | Risk Description | Likelihood | Impact | Comments |
| --- | --- | --- | --- | --- |
| NA | NA | NA | NA | NA |
### Organizational Management
| Name | Project Role | Time Zone |
| --- | --- | --- |
| Kateryna | GCP | CST |
| Kishore | Azure | IST |
| Shrikant | IBM | IST |
| Greg Wibben | AWS | CST |
| Ben | Manifest | CST |
| Chad | Data Loading | CET | |
| Fernando | CSV | | |
| Sacha | [Seismic](osdu/platform/data-flow/ingestion/segy-to-zgy-conversion#4) | | |https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/68Manifest by reference - local dev environmant2021-08-19T14:29:38ZBen LasscockManifest by reference - local dev environmantBuild up locally a complete development environment, this will allow to implement a test a cloud agnostic solution which could be delivered to the EPAM team and others.Build up locally a complete development environment, this will allow to implement a test a cloud agnostic solution which could be delivered to the EPAM team and others.https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/73Python SDK - Manifest-based ingestion updates [GONRG-2726] - Part 12021-07-20T15:28:02ZKateryna Kurach (EPAM)Python SDK - Manifest-based ingestion updates [GONRG-2726] - Part 1[https://jiraeu.epam.com/browse/GONRG-2694](https://jiraeu.epam.com/browse/GONRG-2694)[https://jiraeu.epam.com/browse/GONRG-2694](https://jiraeu.epam.com/browse/GONRG-2694)M7 - Release 0.10Siarhei Khaletski (EPAM)Kateryna Kurach (EPAM)Siarhei Khaletski (EPAM)https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/75Expand the list of valid dataset kinds through configuration to prevent valid...2021-07-09T10:53:37ZJacob RougeauExpand the list of valid dataset kinds through configuration to prevent validation failure for EDS datasets# Problem Statement
The validate_dataset method in [validate_file_source.py](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/blob/master/src/dags/libs/validation/validate_file_source.py#L25) is called during ingestion and validates acceptable dataset types. The logic appears to be hard-coded to accept only Datasets of kind dataset--File and dataset--FileCollection. Any other Dataset type, like the new External Data Services (EDS) [ConnectedSource.Generic.0.2.0](https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/external-data-framework/-/issues/173) currently in proposal, fails validation.
```python
class DatasetType:
FILE = ":dataset--File."
FILE_COLLECTION = ":dataset--FileCollection."
```
```python
def _validate_dataset(self, dataset: dict) -> dict:
"""
:param dataset: A dataset to be validated
:return: Dataset
"""
is_file = DatasetType.FILE in dataset.get("kind", "")
is_file_collection = DatasetType.FILE_COLLECTION in dataset.get("kind", "")
is_valid_dataset = False
```
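One way to avoid hard-coding the accepted kinds is to read them from configuration, as the Options for Resolution below suggest. A minimal sketch (the env var name and function are assumptions, not existing code):

```python
import os

# Hypothetical env var; the default reproduces the current hard-coded kinds,
# and a deployment could append e.g. ":dataset--ConnectedSource." for EDS.
VALID_DATASET_KINDS = os.environ.get(
    "OSDU_VALID_DATASET_KINDS",
    ":dataset--File.,:dataset--FileCollection.",
).split(",")

def is_valid_dataset_kind(kind: str) -> bool:
    """Accept any dataset whose kind contains a configured marker."""
    return any(marker in kind for marker in VALID_DATASET_KINDS)
```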
## Impact
Manifests generated from EDS workflows fail.
## Options for Resolution
- Read acceptable dataset types from a configuration file (best) or
- Add CONNECTED_SOURCE = ":dataset--ConnectedSource" + logic in the validate_dataset to support EDS datasets. The EDS team has made this type of change locally and it is working successfully in our AWS EDS dev environment. I've asked an EDS dev to create a branch with the change for ingestion team review.https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/76Existing dataset id validating is preventing external dataset ID being ingest...2021-08-19T14:38:02ZRajesh BollineniExisting dataset id validating is preventing external dataset ID being ingested on EDS**Problem:**
The DATASET_ID_PATTERN variable in [validate_referential_integrity.py](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/blob/master/src/dags/libs/validation/validate_referential_integrity.py#L48) is used during ingestion to validate whether an externally referred source id record exists on the OSDU platform; if it does not, ingestion fails.
Example: the Dataset id received from the Katalyst supplier is "katalyst:dataset--File.Generic:19330323".
As part of the EDS ingestion process, a Dataset record is created on the consumer side with the supplier dataset id stored as the source id in the EDS Dataset record. Later, the proxy service picks up the actual dataset file based on the information available on the Dataset record, as in the example below.
```
{
  'data': {
    'DatasetProperties': {
      'ConnectedSourceRegistryEntryID': 'osdu:master-data--ConnectedSourceRegistryEntry:a9d70013-e645-4d7a-a721-89f88c32807f:',
      'ConnectedSourceDataJobID': 'osdu:master-data--ConnectedSourceDataJob:a9d70013-e645-4d7a-a721-89f88c32807f:',
      'SourceDataPartitionID': 'katalyst',
      'SourceRecordID': 'katalyst:dataset--File.Generic:19330323'
    }
  },
  'kind': 'osdu:wks:dataset--ConnectedSource.Generic:0.2.0',
  'legal': {
    'legaltags': ['osdu-demo-legaltag'],
    'otherRelevantDataCountries': ['US']
  },
  'acl': {
    'owners': ['data.default.owners@osdu.example.com'],
    'viewers': ['data.default.viewers@osdu.example.com']
  },
  'id': 'osdu:dataset--ConnectedSource.Generic:Katalyst-katalyst-7334715'
}
```
Because of this, DATASET_ID_PATTERN matches, and the validation checks whether a record with the SourceRecordID 'katalyst:dataset--File.Generic:19330323' exists on the consumer platform.
As a workaround, when the SourceRecordID is passed as 'katalyst:dataset-File.Generic:19330323' (a single '-' instead of '--'), the record is processed successfully.
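The actual DATASET_ID_PATTERN is not reproduced in this issue; the hypothetical stand-in below only illustrates why the `--` vs `-` spelling decides whether a value is treated as a dataset reference and therefore checked for existence:

```python
import re

# Hypothetical stand-in for DATASET_ID_PATTERN (the real pattern lives in
# validate_referential_integrity.py and is not shown here).
DATASET_ID_PATTERN = re.compile(r"[\w\-\.]+:dataset--[\w\-\.]+:[\w\-\.]+")

# 'dataset--' ids match, so the existence check is applied...
print(bool(DATASET_ID_PATTERN.match("katalyst:dataset--File.Generic:19330323")))  # True
# ...while the single-dash workaround falls outside the pattern and skips it.
print(bool(DATASET_ID_PATTERN.match("katalyst:dataset-File.Generic:19330323")))   # False
```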
**Impact**
Manifests generated from EDS workflows fail.
**Resolution**
A code change is required to not validate the SourceRecordID when it comes from EDShttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/79Manifest-based Ingestion - issue with surrogate key handling2021-07-24T00:12:06ZDebasis ChatterjeeManifest-based Ingestion - issue with surrogate key handling[manifest-ingestion-fails-w-surrogate-key.txt](/uploads/ab4d6431ec339b14fceb6399a7d0535e/manifest-ingestion-fails-w-surrogate-key.txt)
Enclosed file shows error from "schema validation" DAG component.
For input, I used a variation of the sample load manifest (JSON) for TNO data.
"Variation" means providing proper ACL and legal tag information, and the package inside the manifest structure.
[load_log_1013_akm11_1978_comp_las.json](/uploads/cb7442a98391695f5fef25c2aa3d6720/load_log_1013_akm11_1978_comp_las.json)
I also understand that Thomas Dombrowsky has faced similar problem.
cc - @Keith_Wall for informationhttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/80M7 Manifest based ingestion - Load Testing2022-01-18T20:06:59ZBen LasscockM7 Manifest based ingestion - Load Testing## Definitions
Load Testing - The number of records or manifests that can be processed at a time.
## Background
Throughout M7 there have been a number of performance improvements delivered by EPAM, as well as work on improving issues with configuration etc. We expect this has made a significant improvement to the capacity of the manifest based ingestion, but we don't have a specific figure.
**The process of load testing should be repeatable, with the expectation it will be applied to the upcoming Airflow 2+ changes.**
## Requirements
We need the "5000 manifest test" (@debasisc @todaiks) to be re-run on the M7 release. The result should be a binary pass/fail plus the wall time for executing the job. For completeness, Table 1 shows a set of recommended test cases that we believe should ultimately be automated and runnable through the QA group.
| Test | Issue | AWS | Azure | GCP | IBM |
| ----------- | ----------- | --- | ----- | --- | --- |
| the "5000 manifest" | Our current baseline | | | | |
| 1 Manifest with 5,000 records | | | | |
| 1 Manifest with 20,000 records | | | | |
| 1 Manifest with 50,000 records | Limit on the size of the request body | | | |
| 50K manifests in multiple requests, not simultaneously | Airflow 1.X doesn’t allow sending multiple requests (Fixed in Airflow 2.0) | | | |
| chunks of 50, 1000 DAG runs | 1. max_active_runs (50) limitation 2. limitation of workflow service: java heap error Issue 64 3. Storage Service has a limitation of storing no more than 500 records/s | | | |
| chunks of 1000 | see above | | | |
| 50 DAG runs | | | | | |
| Launch several different DAGS simultaneously | | | | | |
| Ingest the Volve data | to promote adoption | | | | |
| Ingest the TNO data | to promote adoption | | |M7 - Release 0.10Chris ZhangChris Zhanghttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/91Date-Time validation causing ingestion failure2022-03-21T15:29:30ZKeith WallDate-Time validation causing ingestion failureDate values are failing schema validation on ingestion if the dates are not in UTC, or don't contain a time-zone offset.
This is a new validation that requires date-times to conform to RFC 3339. The intent is good, but it does not conform to the schemas, or to our data.
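The validator itself is not shown in this issue; a stdlib-only sketch of what an RFC 3339 "date-time" requirement implies (an explicit time and UTC offset), and why date-only values fail it:

```python
from datetime import datetime

def is_rfc3339_datetime(value: str) -> bool:
    """Rough RFC 3339 date-time check (illustrative, not the DAG's code)."""
    try:
        # fromisoformat needs the 'Z' suffix rewritten as an explicit offset
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return False
    # RFC 3339 date-times carry an explicit UTC offset; bare dates parse
    # as naive datetimes and so are rejected.
    return parsed.tzinfo is not None

print(is_rfc3339_datetime("1987-06-05T04:03:02Z"))  # True
print(is_rfc3339_datetime("1987-06-05"))            # False: date only, no offset
```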
There is a large volume of data in which a date is known, but for which no time or time zone is provided. Recognizing this, the OSDU schemas only require that dates be a string.
As a large volume of data is managed that does not have time zone information, our options are either to reject all these dates, or to ingest and maintain them in their original format.
If we force a time zone change on data by putting it into a UTC format when we really do not know the time zone, we are corrupting the data.
I have consulted the Enterprise Architecture Geomatics team, and asked if we should (1) Do not load dates with unknown time zones or (2) maintain the dates in as-provided form. There was complete agreement that the industry has a large volume of data with dates without time zones, and we can still make use of those dates, but must not modify them by adding a default time zone.
Please remove the date validation from ingestion.M9 - Release 0.12Kishore BattulaShrikant GargSpencer Suttonsuttonsp@amazon.comYan Sushchynski (EPAM)Kishore Battulahttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/66IBM support to move to airflow 2.02021-11-18T15:06:48Zjingdong sunIBM support to move to airflow 2.0M9 - Release 0.12Anuj GuptaShaonjingdong sunAnuj Guptahttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/86Enable Support for Packaged DAGs2021-08-26T14:16:09Zharshit aggarwalEnable Support for Packaged DAGsThe [ADR](https://community.opengroup.org/osdu/platform/data-flow/home/-/issues/47) for package DAG support has been approved and we need to structure the Ingestion Dags repository to support packaged DAGs. The new structure will look li...The [ADR](https://community.opengroup.org/osdu/platform/data-flow/home/-/issues/47) for package DAG support has been approved and we need to structure the Ingestion Dags repository to support packaged DAGs. The new structure will look like as below
**New folder structure**
```
├── osdu_manifest
│   ├── __init__.py
│   ├── libs
│   │   ├── __init__.py
│   │   └── utils.py
│   ├── operators
│   │   ├── __init__.py
│   │   └── customOperator1.py
│   ├── hooks
│   │   └── __init__.py
│   └── configs
│       └── __init__.py
└── osdu-ingest-r3.py
```
Changes to support this will include
- Restructuring the folders
- Fixing any import statements
- Minor changes to run existing testsM8 - Release 0.11harshit aggarwalharshit aggarwalhttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/88Support a policy for missing timezone information in date time data2022-03-16T07:57:57ZBen LasscockSupport a policy for missing timezone information in date time dataMeta information such as units (m, ft, g/cm^3, coordinate reference frame) is required for the ingestion of data. This information is crucial for normalizing the frame of reference.
For date time information, ideally this data is in UTC or has meta information available to transform it to UTC. Currently the schema service is throwing out records that don't conform to this requirement.
However, there is a large body of data already in the environment where this information is not available, and so we need to make date-time the exception and waive the requirement to provide UTC information. A counterpoint to waiving this requirement is activities like active drilling, where correct date-times in UTC are required.
1. Define under what circumstances where date time UTC conversion can be waived.
2. Create a specification for what the behavior of the ingestion application should be to support both the waiving of date time frame of reference, or enforcing it, depending on the policy defined in (1).
e.g. should the ingestion provide warnings that date time meta information isn't available. Or should we have a flag or field in the record to allow the user to waive the requirement etc.Keith WallKeith Wallhttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/89Move Airflow common logic to osdu-airflow-lib project2021-09-27T03:39:50ZSiarhei Khaletski (EPAM)Move Airflow common logic to osdu-airflow-lib projectSome of DAGs have dependencies on code from Ingestion DAGs repository.
For instance, the MRs bring updates for parsers DAGs:
- https://community.opengroup.org/osdu/platform/data-flow/ingestion/segy-to-vds-conversion/-/merge_requests/24
- https://community.opengroup.org/osdu/platform/data-flow/ingestion/segy-to-zgy-conversion/-/merge_requests/36
- https://community.opengroup.org/osdu/platform/data-flow/ingestion/csv-parser/csv-parser/-/merge_requests/149
these require the `UpdateStatusOperator` class from the Ingestion DAGs project. This means the Ingestion DAGs code would have to be deployed into the environment to be usable by the DAGs from the MRs above.
The real case now is the WITSML Parser, where we have to add `osdu_manifest` code to use operators for the WITSML Parser DAG steps.
**Expected**: All the Airflow-related logic (operators, hooks, etc.) can be installed into an environment independently (using pip) of the Ingestion DAGs code base.M9 - Release 0.12Siarhei Khaletski (EPAM)Siarhei Khaletski (EPAM)https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/95Integration E2E Tests for manifest ingestion - MSFT2021-11-17T02:46:12ZChris ZhangIntegration E2E Tests for manifest ingestion - MSFTThis is to track the MSFT team's work for Integration E2E Tests for manifest ingestion.
Related to issue 85 https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/85M10 - Release 0.13Krishnan GanesanKrishnan Ganesanhttps://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/98WIP - Performance Benchmarking2023-10-23T07:51:46ZBen LasscockWIP - Performance BenchmarkingThe purpose of this WIP issue is to plan around how we are going to benchmark ingestion performance.
Need to address the performance of ingestion mechanisms.
Expected performance in production env is in excess of 33k records per minute - Wells, well logs, trajectories etc...
- [ ] @Devendra_R @npickus to connect with @todaiks & @chad if possible to confirm testing approach, timing and feedback cycles
Data examples from real use cases include wells, wellbores, trajectories, etc.. (already using tno and volve)
- [ ] Nick to check with Chevron teams to see if there is an opportunity for the teams to schedule a test of the current manifest ingestion in a real production environment to compare to current test rates within the Forum. Team can use Script from Jean Rainauld and test with the same synthetic data as the forum tests.
[testing info](https://community.opengroup.org/osdu/platform/data-flow/ingestion/osdu-airflow-lib/-/merge_requests/17)
- [ ] @debasisc to follow up with CSPs to gain alignment on CSPs testing ingestion in their environments
### Issues
- No defined custodians & developers for manifest ingestion
- Data sets used for testing are not representative of real data - only master data
- Testing requires close coordination with CSPs
## Load Testing & Performance
[performance changes since M6](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/101)
[load testing Issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/80)
### Old test results
[M7](https://gitlab.opengroup.org/osdu/subcommittees/ea/projects/pre-shipping/home/-/tree/master/TeamD_M7/ManifestLoadTesting), [M8](https://community.opengroup.org/osdu/platform/pre-shipping/-/blob/main/R3-M8/Results/OSDU_LoadTesting_Results_M8_TeamD.xlsx)
### Basic Load testing
Testing the ingestion of 500-, 1,000-, and 50,000-record manifests.
Uses synthetic manifests to perform basic testing of the ingestion. Load testing is run by
Pre-Shipping for each release, one release in arrears, which means MX is tested during the
development cycle of M(X+1). A spreadsheet showing the pass/fail status and timing per CSP is provided
by Pre-Shipping at the conclusion of the test.
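The synthetic manifests used for basic load testing can be generated programmatically. A minimal sketch follows; the manifest structure here is deliberately abridged (real manifests carry ACLs, legal tags, and schema references), and the function and file names are illustrative, not the actual Pre-Shipping scripts:

```python
import json

def build_synthetic_manifest(n_records: int,
                             kind: str = "osdu:wks:master-data--Organisation:1.0.0") -> dict:
    """Build a simplified manifest with n_records synthetic master-data records.
    Structure abridged for illustration; real manifests include ACL/legal blocks."""
    records = [
        {"kind": kind, "data": {"OrganisationName": f"SyntheticOrg-{i:06d}"}}
        for i in range(n_records)
    ]
    return {"kind": "osdu:wks:Manifest:1.0.0", "MasterData": records}

# Write the three manifest sizes used in basic load testing.
for size in (500, 1_000, 50_000):
    with open(f"manifest_{size}.json", "w") as fh:
        json.dump(build_synthetic_manifest(size), fh)
```

Timing the ingestion of each file then gives the "scaling as a function of manifest size" data point listed under Assets below.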
Additional information regarding the run time and latency of the Airflow scheduler can be found in the Airflow console's [Gantt Chart](https://airflow.apache.org/docs/apache-airflow/stable/ui.html). This data provides a view of where the performance bottlenecks might be.
Assets:
- [x] Basic load testing passed True/False (per release)
- [x] Timing information (scaling as a function of manifest size).
- [ ] Snapshots of the Airflow [Gantt Chart](https://airflow.apache.org/docs/apache-airflow/stable/ui.html)
### Advanced Load testing
The other items (beyond the basic) in the [load testing issue](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/80) give insight into the sensitivity of performance to the Airflow configuration. This may be carried out if resources are available.
## Defining standards @npickus
Today teams are loading 8-10 million records, including validations, outside of the Manifest or CSV ingestion mechanisms in ~5 hours, for a rate of about 33k records per minute.
- [x] Collect x2 user stories from operators.
- [x] Define OSDU EA/community expectations.
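The quoted rate follows directly from the figures above; a quick arithmetic check:

```python
# Sanity-check the quoted bulk-load rate: 8-10 million records in ~5 hours.
minutes = 5 * 60
low = 8_000_000 / minutes    # lower bound of the quoted range
high = 10_000_000 / minutes  # upper bound; ~33k records/minute
print(f"{low:,.0f} - {high:,.0f} records/minute")
```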
## For application developers (local mode) @epeysson
There is a use case for application developers to run the Airflow part of the ingestion locally (with core services accessible through a REST API). This local mode can be used to profile the performance of just the Airflow component, independent of any particular CSP environment.
Standalone installation instructions are [here](https://community.opengroup.org/osdu/platform/deployment-and-operations/individual-airflow).
- [ ] Complete basic load testing with standalone Airflow.

Assignee: Devendra Rawat

# [Issue 103: Manifest Based Ingestion - Operator Performance Benchmarking for M10 Release](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/103) (2023-10-23, Devendra Rawat)

The purpose of this WIP issue is to plan how we are going to benchmark ingestion performance. Testing is to be performed on data examples from Reference Data, Master Data (Wells, Wellbores), Work Product Components (Trajectory), etc.
This is to gauge operator acceptance of the performance upgrade benchmarked in [issue 101](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/101). The upgrade has shown significant improvement in throughput and speed, as highlighted.

| Manifest Type | Operator 1 | Operator 2 | Operator 3 | Operator 4 | Operator 5 | Operator 6 |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| Reference Data | | | | | | |
| Master Data | | | | | | |
| Work Product Component | | | | | | |

Assignee: Devendra Rawat

# [Issue 104: Manifest ingestion DAG is not creating master-data--Wellbore](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/104) (2022-11-11, Thomas Dombrowsky)

When running the manifest ingestor DAG with the attached payload, no records are inserted into storage.
The Airflow logs show no error, so it is unknown why the ingestion fails.
Expected: The manifest contains a single record. The record should be inserted into storage during the ingestion.
Expected: The Airflow logs need to be improved. Logs should show the payload that was received and what processing has occurred. If there are errors that prevent the ingestion of data, these should be fully logged.
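The logging expectation above could be met with a small helper invoked at the start of each ingestion task, so a silent no-op run still leaves evidence of what was received. This is a sketch only; the payload shape and all names are illustrative assumptions, not the actual DAG code:

```python
import json
import logging

logger = logging.getLogger("osdu_ingest")

def log_received_payload(payload: dict, max_chars: int = 10_000) -> int:
    """Log what the ingestion task received and return the manifest record count.
    A count of zero is logged loudly, since it means nothing will be ingested."""
    manifest = payload.get("execution_context", {}).get("manifest", {})
    # Count records across the common manifest sections (names assumed here).
    n = sum(len(manifest.get(section, [])) for section in ("ReferenceData", "MasterData"))
    n += len(manifest.get("Data", {}).get("WorkProductComponents", []))
    logger.info("Received payload keys: %s", sorted(payload))
    logger.info("Manifest record count: %d", n)
    if n == 0:
        logger.warning("Manifest contains no records - nothing will be ingested")
    logger.debug("Payload (truncated): %s", json.dumps(payload)[:max_chars])
    return n
```

With something like this in place, the attached wellbore manifest would at least have produced a record count and the payload itself in the task log.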
[wellbore-wks_sandbox.json](/uploads/10c1885dffa4288891cb48a13ad1e489/wellbore-wks_sandbox.json)

# [Issue 105: Performance testing in R3 M11 - Need to determine the maximum size of the payload allowed during ingestion using Osdu_ingest DAG](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/105) (2024-03-20, Kamlesh Todai)

All,
For R3 M11, I performed performance load testing using the Osdu_ingest DAG running in Airflow v2.0.
The environment I used was IBM Pre-ship R3 M11. Here is the summary:
As expected, we can see that when batch_upload is used, the time required to ingest the data goes down (a performance gain).
Some observations about the process used:
There is a difference between the Python scripts used to generate the payload for ingestion and for batch_upload.
The script that generates the payload for ingestion generates records of kind "opendes:wks:master-data--Organisation:1.0.0". So when a user specifies 5 records, it generates 5 records of kind Organisation.
The script that generates the payload for batch_upload generates records of kind "osdu:wks:master-data--Organisation:1.0.0" and "osdu:wks:reference-data--ContractorType:1.0.0". So when a user specifies 5 records, it generates records of both kinds,
Organisation and ContractorType, and is therefore actually generating twice the number of records specified.
At present, to establish the performance benchmark we are using the number of records, probably because it is convenient to tell users that ingesting a certain number of wells, for example, takes x amount of time.
But well record size may vary from one user environment to another, so performance numbers derived from a record count may not hold true in all situations.
How much one can ingest in one job at one time depends on the size of the payload in KB. So I think we should use the payload size in KB to establish the benchmark; the number of records that fit in the payload would then depend on the size of the records.
I have done the testing in the IBM environment, but the test for 50,000 records in batch_upload seems to be failing in all the environments.
I do not know where the size limit is coming from (the REST API, the network, Airflow, or the DAG implementation), nor whether that size is configurable.
It is important for us to understand where that limitation comes from and whether it is a hard limit or a configurable one.
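Whatever the source of the limit turns out to be, the generator script could split its output into size-bounded payload files rather than fixed record counts. A minimal sketch of such chunking, where the size cap and record shape are illustrative assumptions:

```python
import json

def chunk_records_by_size(records, max_payload_kb=10_240):
    """Group records into payloads whose serialized JSON size stays under
    max_payload_kb. A single record larger than the cap still gets its own
    (oversized) payload rather than being dropped."""
    chunks, current, current_kb = [], [], 0.0
    for record in records:
        record_kb = len(json.dumps(record).encode("utf-8")) / 1024
        if current and current_kb + record_kb > max_payload_kb:
            chunks.append(current)
            current, current_kb = [], 0.0
        current.append(record)
        current_kb += record_kb
    if current:
        chunks.append(current)
    return chunks
```

Benchmarks could then be reported per KB of payload rather than per record, as proposed above, and the 50,000-record failure avoided by emitting multiple files.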
The Python script should honor that limit and generate multiple data/payload files, each containing the correct number of records, to avoid failures.

# [Issue 108: Manifest by reference : error while DAG run](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/108) (2022-11-11, Devdatta Santra)

**While running the manifest by reference DAG, we are getting the following error in "validate_manifest_schema_task".**
```
[2022-10-13 08:56:52,287] {standard_task_runner.py:76} INFO - Running: ['***', 'tasks', 'run', 'Osdu_ingest_by_reference', 'validate_manifest_schema_task', '2022-10-13T08:56:41.095723+00:00', '--job-id', '13024', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/osdu-ingest-by-reference-r3.py', '--cfg-path', '/tmp/tmpv4ta88jt', '--error-file', '/tmp/tmpkxyyt6ok']
[2022-10-13 08:56:52,288] {standard_task_runner.py:77} INFO - Job 13024: Subtask validate_manifest_schema_task
[2022-10-13 08:56:52,390] {logging_mixin.py:104} INFO - Running <TaskInstance: Osdu_ingest_by_reference.validate_manifest_schema_task 2022-10-13T08:56:41.095723+00:00 [running]> on host ***-worker-0.***-worker.osdu.svc.cluster.local
[2022-10-13 08:56:52,509] {taskinstance.py:1300} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=***
AIRFLOW_CTX_DAG_ID=Osdu_ingest_by_reference
AIRFLOW_CTX_TASK_ID=validate_manifest_schema_task
AIRFLOW_CTX_EXECUTION_DATE=2022-10-13T08:56:41.095723+00:00
AIRFLOW_CTX_DAG_RUN_ID=83247382-218b-44b5-b1c1-0b921ee67dd6
[2022-10-13 08:57:04,974] {taskinstance.py:1501} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1157, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1331, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1361, in _execute_task
result = task_copy.execute(context=context)
File "/home/airflow/.local/lib/python3.8/site-packages/osdu_airflow/operators/validate_manifest_schema_by_reference.py", line 110, in execute
manifest_data = self._get_manifest_data_by_reference(context=context,
File "/home/airflow/.local/lib/python3.8/site-packages/osdu_airflow/operators/mixins/ReceivingContextMixin.py", line 105, in _get_manifest_data_by_reference
retrieval_content_url = retrieval.json()["delivery"][0]["retrievalProperties"]["signedUrl"]
KeyError: 'delivery'
[2022-10-13 08:57:04,977] {taskinstance.py:1544} INFO - Marking task as FAILED. dag_id=Osdu_ingest_by_reference, task_id=validate_manifest_schema_task, execution_date=20221013T085641, start_date=20221013T085652, end_date=20221013T085704
```
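The `KeyError: 'delivery'` means the Dataset service response did not contain the expected delivery items, which is often a symptom of an upstream failure rather than a problem with the manifest itself. A defensive extraction, sketched here from the response shape implied by the traceback (not the actual `ReceivingContextMixin` code), would surface an actionable message instead of a bare `KeyError`:

```python
def get_signed_url(retrieval_json: dict) -> str:
    """Extract the signed URL from a Dataset service retrieval response,
    raising a descriptive error instead of a bare KeyError."""
    delivery = retrieval_json.get("delivery")
    if not delivery:
        raise ValueError(
            "Dataset service response has no 'delivery' items; "
            f"keys present: {sorted(retrieval_json)}"
        )
    try:
        return delivery[0]["retrievalProperties"]["signedUrl"]
    except (KeyError, IndexError) as exc:
        raise ValueError(f"Malformed delivery entry: {delivery[0]!r}") from exc
```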
It would be very helpful to get any resolution regarding this.
======================================
Updates about the new errors encountered:
1) `AttributeError: 'dict' object has no attribute 'to_JSON'` - as mentioned in this below comment
https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/108#note_159282
2) "Schema is not present" error from Dataset service while running the DAG
```
2022-10-19 12:03:14.191 DEBUG 1 --- [nio-8080-exec-1] .m.m.a.ExceptionHandlerExceptionResolver : Using @ExceptionHandler org.opengroup.osdu.dataset.util.GlobalExceptionMapper#handleAppException(AppException)
2022-10-19 12:03:14.193 WARN 1 --- [nio-8080-exec-1] o.o.o.c.common.logging.DefaultLogWriter : dataset-registry.app: Schema is not present
AppException(error=AppError(code=404, reason=Schema Service: get 'opendes:wks:dataset--File.Generic:1.0.0', message=Schema is not present, errors=null, debuggingInfo=null, originalException=null), originalException=null)
at org.opengroup.osdu.dataset.service.DatasetRegistryServiceImpl.validateDatasets(DatasetRegistryServiceImpl.java:233)
at org.opengroup.osdu.dataset.service.DatasetRegistryServiceImpl.createOrUpdateDatasetRegistry(DatasetRegistryServiceImpl.java:112)
at org.opengroup.osdu.dataset.api.DatasetRegistryApi.createOrUpdateDatasetRegistry(DatasetRegistryApi.java:66)
at org.opengroup.osdu.dataset.api.DatasetRegistryApi$$FastClassBySpringCGLIB$$774ab2c5.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:793)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)
at org.springframework.validation.beanvalidation.MethodValidationInterceptor.invoke(MethodValidationInterceptor.java:123)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)
at org.springframework.security.access.intercept.aopalliance.MethodSecurityInterceptor.invoke(MethodSecurityInterceptor.java:61)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:708)
at org.opengroup.osdu.dataset.api.DatasetRegistryApi$$EnhancerBySpringCGLIB$$649af8f9.createOrUpdateDatasetRegistry(<generated>)
```

Assignee: Valentin Gauthier

# [Issue 109: Manifest ingestion by Reference - error while running DAG for first time](https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/109) (2023-02-13, Naveen Ramachandraiah)

Team,
For Azure, we are trying to implement the Manifest by reference feature but are getting issues while running the DAG. Attached are the error log and a screenshot of the DAG graph. Please do help.
[DAG_-error.log](/uploads/80013a342d3e6d5fbfd843fcf27c0707/DAG_-error.log)
![DAG-_tree](/uploads/e09e4f01cfb6b1a4166a4df2efa83e4d/DAG-_tree.png)

M16 - Release 0.19 · Assignee: Jayesh Bagul