OSDU Software issues
https://community.opengroup.org/groups/osdu/-/issues

https://community.opengroup.org/osdu/platform/ci-cd-pipelines/-/issues/28
Make a daily build (2023-10-03, Okoun-Ola Fabien Houeto)

**Context and Issue**: On master branch, we rely on merge requests to trigger builds. As a result, there can be days or weeks without a successful merge, and it becomes quite difficult to figure out the state of the master branch.
**Solution Proposal**: To alleviate that issue, we propose to have a daily triggered build, at least on the core components, on the master branch.
**Impact**: The build will run on each CSP dev environment. The timing of the build and the dependencies in the build could be adjusted depending on resource availability patterns.
**Trade-off**: The alternative is to leave the current system as is, or to stagger the builds over a week rather than a day. Leaving the system as is keeps the issue highlighted above. If resources are a constraint, staggering the builds over a week may provide a more reliable way to find the latest status of master while reducing the load.
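In GitLab, this proposal maps onto a scheduled pipeline plus a rule that lets build jobs run for schedules on master; a minimal sketch (the job and script names are placeholders, and the daily schedule itself is created under CI/CD > Schedules):

```yaml
# .gitlab-ci.yml fragment -- placeholder job; the real jobs are the
# component's existing build jobs with an extra schedule rule.
build:
  script:
    - ./build.sh
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule" && $CI_COMMIT_BRANCH == "master"'
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
```

Staggering over a week would then just be a matter of giving each component's schedule a different day in its cron expression.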
cc @chad @divido

https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/225
Disable Registry Scan feature for flux (2022-04-12, Dzmitry_Paulouski (slb))

There are a lot of error messages in the Flux pod.
They are caused by Flux checking for new images, but access to the container registry is not provided:
https://fluxcd.io/legacy/flux/faq/#how-do-i-give-flux-access-to-an-image-registry
_Flux transparently looks at the image pull secrets that you attach to workloads and service accounts, and thereby uses the same credentials that Kubernetes uses for pulling each image. In general, if your pods are running, then Kubernetes has pulled the images, and Flux should be able to access them too._
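Since registry credentials are not provided here, the other option the FAQ describes is turning scanning off entirely. A sketch of the Helm values for the legacy flux chart (the key name is an assumption and should be verified against the chart version in use):

```yaml
# values.yaml fragment for the legacy fluxcd flux Helm chart
registry:
  disableScanning: true   # stops fluxd from polling the image registry
```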
Since we do not use this feature, it can be disabled: https://fluxcd.io/legacy/flux/faq/#can-i-disable-flux-registry-scanning

https://community.opengroup.org/osdu/platform/system/storage/-/issues/121
Storage Schema endpoints should be obsoleted (2022-08-24, Gary Murphy)

Remove code and config related to the storage schemas APIs from OSDU, as they are EOL.
The following APIs are to be removed
- GET /Schema
- DELETE /Schema
- POST /schema

https://community.opengroup.org/osdu/ui/data-loading/wellbore-ddms-data-loader/-/issues/51
Well Log loader from DLIS format (2024-03-18, Debasis Chatterjee)

Similar to the existing LAS loader.

https://community.opengroup.org/osdu/platform/pre-shipping/-/issues/284
Manifest ingestion (Osdu_ingest) of 50000 records is failing when trying to perform batch_upload. It works fine when batch_upload is not performed and the regular option is used. (2022-08-23, Kamlesh Todai)

In the R3M11 pre-ship environment, while executing the workflow to do performance load testing using batch_upload:
Manifest ingestion (Osdu_ingest) of 50000 records is failing when trying to perform batch_upload.
It works fine when batch_upload is not performed and the regular option (Single process) is used.
Gateway is timing out.
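For reference, the call that times out is the workflowRun trigger reconstructed below from the log (host and partition are elided, and the payload shape is an assumption -- the real request wraps the manifest in an executionContext per the Workflow API):

```shell
curl -X POST "https://<host>/osdu-workflow/api/workflow/v1/workflow/Osdu_ingest/workflowRun" \
  -H "Authorization: Bearer $TOKEN" \
  -H "data-partition-id: <partition>" \
  -H "Content-Type: application/json" \
  -d @ibm_batchManifest_50000.json
# The gateway answers 504 roughly 75 s after the POST (see the log below),
# long before a 50000-record manifest can be accepted synchronously.
```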
**Here is the console log**
Tue 26-Apr-2022 12:41:11 INFO Selected Cloud Service Provider: ibm
Tue 26-Apr-2022 12:41:11 DEBUG Starting new HTTPS connection (1): keycloak-osdu-keycloak.odi-osdu-og-fa7661852f2ab29a6be32f560b2f5573-0000.us-south.containers.appdomain.cloud:443
Tue 26-Apr-2022 12:41:11 DEBUG https://keycloak-osdu-keycloak.odi-osdu-og-fa7661852f2ab29a6be32f560b2f5573-0000.us-south.containers.appdomain.cloud:443 "POST /auth/realms/OSDU/protocol/openid-connect/token HTTP/1.1" 200 3501
Tue 26-Apr-2022 12:41:11 DEBUG ### Inserting file json <<<<<<<<<<<: D:\OSDU\PreShipping\M11\loadTesting\IngestionBulkBatch\json\ibm\ibm_batchManifest_2022-04-26_12-14-44_50000.json
Tue 26-Apr-2022 12:41:13 DEBUG Starting new HTTPS connection (1): osdu-cpd-osdu.odi-osdu-og-fa7661852f2ab29a6be32f560b2f5573-0000.us-south.containers.appdomain.cloud:443
Tue 26-Apr-2022 12:42:27 DEBUG https://osdu-cpd-osdu.odi-osdu-og-fa7661852f2ab29a6be32f560b2f5573-0000.us-south.containers.appdomain.cloud:443 "POST /osdu-workflow/api/workflow/v1/workflow/Osdu_ingest/workflowRun HTTP/1.1" 504 164
Tue 26-Apr-2022 12:42:27 DEBUG HTTP POST https://osdu-cpd-osdu.odi-osdu-og-fa7661852f2ab29a6be32f560b2f5573-0000.us-south.containers.appdomain.cloud/osdu-workflow/api/workflow/v1/workflow/Osdu_ingest/workflowRun
Tue 26-Apr-2022 12:42:27 DEBUG Response: 504
Tue 26-Apr-2022 12:42:27 DEBUG text = <html>
<head><title>**504 Gateway Time-out**</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>openresty</center>
</body>
</html>
Tue 26-Apr-2022 12:42:27 ERROR ### HTML 504 with response None
Tue 26-Apr-2022 12:42:27 INFO ######### Exiting Process due to error to POST manifest #########
**This is the airflow log**
*** Log file does not exist: /opt/airflow/logs/Osdu_ingest/update_status_finished_task/2022-04-26T17:41:52.705910+00:00/1.log
*** Fetching from: http://airflow-worker-0.airflow-worker.osdu-airflow.svc.cluster.local:8793/log/Osdu_ingest/update_status_finished_task/2022-04-26T17:41:52.705910+00:00/1.log
[2022-04-26 17:43:08,018] {taskinstance.py:896} INFO - Dependencies all met for <TaskInstance: Osdu_ingest.update_status_finished_task 2022-04-26T17:41:52.705910+00:00 [queued]>
[2022-04-26 17:43:08,049] {taskinstance.py:896} INFO - Dependencies all met for <TaskInstance: Osdu_ingest.update_status_finished_task 2022-04-26T17:41:52.705910+00:00 [queued]>
[2022-04-26 17:43:08,049] {taskinstance.py:1087} INFO -
--------------------------------------------------------------------------------
[2022-04-26 17:43:08,049] {taskinstance.py:1088} INFO - Starting attempt 1 of 1
[2022-04-26 17:43:08,050] {taskinstance.py:1089} INFO -
--------------------------------------------------------------------------------
[2022-04-26 17:43:08,082] {taskinstance.py:1107} INFO - Executing <Task(UpdateStatusOperator): update_status_finished_task> on 2022-04-26T17:41:52.705910+00:00
[2022-04-26 17:43:08,094] {standard_task_runner.py:52} INFO - Started process 1308 to run task
[2022-04-26 17:43:08,108] {standard_task_runner.py:76} INFO - Running: ['***', 'tasks', 'run', 'Osdu_ingest', 'update_status_finished_task', '2022-04-26T17:41:52.705910+00:00', '--job-id', '15807', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/osdu-ingest-r3.py', '--cfg-path', '/tmp/tmpzwokqu_w', '--error-file', '/tmp/tmpmglxfmpm']
[2022-04-26 17:43:08,110] {standard_task_runner.py:77} INFO - Job 15807: Subtask update_status_finished_task
[2022-04-26 17:43:11,063] {logging_mixin.py:104} INFO - Running <TaskInstance: Osdu_ingest.update_status_finished_task 2022-04-26T17:41:52.705910+00:00 [running]> on host ***-worker-0.***-worker.osdu-***.svc.cluster.local
[2022-04-26 17:43:13,860] {taskinstance.py:1300} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=***
AIRFLOW_CTX_DAG_ID=Osdu_ingest
AIRFLOW_CTX_TASK_ID=update_status_finished_task
AIRFLOW_CTX_EXECUTION_DATE=2022-04-26T17:41:52.705910+00:00
AIRFLOW_CTX_DAG_RUN_ID=b90defff-6b20-4b9b-8c0b-aa9d93a90aaf
[2022-04-26 17:43:17,232] {update_status.py:66} INFO - There are failed tasks before this one. So it has status FAILED
[2022-04-26 17:43:22,353] {logging_mixin.py:104} INFO - env_vars_enabled ************************ true
[2022-04-26 17:43:22,353] {logging_mixin.py:104} INFO - cloud provider ******************* ibm
[2022-04-26 17:43:22,353] {logging_mixin.py:104} INFO - Inside if
[2022-04-26 17:43:26,773] {taskinstance.py:1501} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1157, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1331, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1361, in _execute_task
result = task_copy.execute(context=context)
File "/home/airflow/.local/lib/python3.8/site-packages/osdu_airflow/operators/update_status.py", line 140, in execute
raise PipelineFailedError("Dag failed")
osdu_ingestion.libs.exceptions.PipelineFailedError: Dag failed
[2022-04-26 17:43:26,777] {taskinstance.py:1544} INFO - Marking task as FAILED. dag_id=Osdu_ingest, task_id=update_status_finished_task, execution_date=20220426T174152, start_date=20220426T174308, end_date=20220426T174326
[2022-04-26 17:43:26,919] {local_task_job.py:151} INFO - Task exited with return code 1

Milestone: M11 - Release 0.14 (assignee: Anuj Gupta)

https://community.opengroup.org/osdu/platform/pre-shipping/-/issues/291
IBM M11 - Well Delivery Invalid responses (2022-08-23, Michael)

I am getting responses where valid=false for some of the requests in the collection: https://community.opengroup.org/osdu/platform/pre-shipping/-/blob/main/R3-M11/IBM-M11/IBM_ODI_R3_WellDelivery.postman_collection.json
The attached document contains details on the failed requests:
[IBM_M11_Well_Delivery_Failed_Requests.docx](/uploads/26d5a619b72b434cdcc14894fab6eb0c/IBM_M11_Well_Delivery_Failed_Requests.docx)

https://community.opengroup.org/osdu/platform/pre-shipping/-/issues/292
Azure - WITSML Parser collection. Suggestions and feedback (2023-09-28, Debasis Chatterjee)

I have a few improvement suggestions for this collection.
![Azure-WITSML-Parser-collection-steps](/uploads/298062b85e86bb7dab3de418b0810336/Azure-WITSML-Parser-collection-steps.PNG)
Why is this called “1. Record id”?
Combine steps 1.0 and 1.4 into a sub-folder “Pre-requisite steps”.
Then have another sub-folder (named “Repeat for each data type”) for steps 1.1-1.3 and 1.5-1.9.
Keep a companion guide which instructs the user/tester to perform the repeatable steps using the test data of a “Well” record.
Ensure that the “Well” mentioned in the Wellbore test data is actually the one created above.
Then the steps for Log, Marker, Trajectory, and Tubular can come in any order.
Each should ensure that the mentioned Wellbore ID actually exists (from the previous steps).
Ideally, please create a dedicated folder for WITSML in the Azure/Preship site with a guide (Word document), the Postman collection, and 6 sample data files.
https://community.opengroup.org/osdu/platform/pre-shipping/-/tree/main/R3-M11/Azure-M11/Services/Ingestion
Thanks
cc @krveduru (assignee: Kishore Battula)

https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/issues/57
Lack of documentation of WITSML parser (2023-05-22, Aha!)

There is ongoing work to enhance the capability of the WITSML parser to write WITSML log and trajectory data into Wellbore DDMS.
There is a need to refactor the existing code base to be able to support that.
There are a few issues identified:
- The [energistics-osdu-integration](https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics-osdu-integration) repo is extremely big - there is a lot of code in there, so it is difficult to work out what it does.
- No documentation is available, as confirmed by Energistics developers - it would take a significant amount of time and effort for the current developers to understand and work out how all of the code in the repo works.
- Lack of working data (the current parser supports WITSML v2.0, but the majority of data available now is in v1.4) - despite numerous requests to the forum, we did not get any feedback.
Actions needed:
1. Understand what all of the functional code does in the repo and document it.
2. Identify code that is not used and remove it.
3. Consider splitting up or refactoring the remaining code into components (e.g. the core parser component)
4. Support from public data
Created from Aha! https://osdu.aha.io/features/TICKETS-11

https://community.opengroup.org/osdu/platform/pre-shipping/-/issues/294
Azure R3M11 - Manifest Ingestion FoR CRS: skipped_ids "Entity doesn't pass the schema validation" (2022-08-23, Esmira Rafigayeva)

Manifest Ingestion FoR CRS test execution: indicating in the JSON payload
"id": "osdu:master-data--Well:1115ER",
"kind": "osdu:wks:master-data--Well:1.0.0",
getting skipped ids.
"runId": "fd4e79f9-accd-42d7-8992-11d648eb29eb"
Airflow response (XCom):

- saved_record_ids: {'process_single_manifest_file_task': ['opendes:reference-data--ResourceSecurityClassification:RESTRICTED']}
- skipped_ids: {'validate_manifest_schema_task': [{'id': 'osdu:master-data--Well:1115ER', 'kind': 'osdu:wks:master-data--Well:1.0.0', 'reason': "Entity doesn't pass the schema validation."}]}
Appreciate your quick response. Regards, Esmira. cc: @sehuboy @ankurrawat @krganesan @krveduru @NikhilSingh @debasisc

Milestone: M11 - Release 0.14 (assignee: Nikhil Singh [Microsoft])

https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/issues/58
WITSML Parser is failing with Tubular data (2022-06-30, Debasis Chatterjee)

Tested in the Azure R3M11 Preship environment.
I have experienced a failure.
Data file.
[Tubular__witsml-DC.xml](/uploads/9f7618f3cf1825573343cc7a39e6a2bd/Tubular__witsml-DC.xml)
Log
[M11_Azure_WITSML-Tubular-Debasis.txt](/uploads/11ae3e249b9ddd2bc3b1b62ae902904c/M11_Azure_WITSML-Tubular-Debasis.txt)
cc @todaiks for information (assignee: etienne peysson)

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/87
Can't determine when page limit is for search request based on error message (2022-08-24, Michael)

The Search API is limited to only returning a maximum number of matching results for a query (default is 10,000 results total).
When this limit is reached, a 400 error is returned; however, no details are provided about which parameters are invalid.
Below is an example search request that retrieves records beyond the max allowed limit:
```
curl --location --request POST 'https://r3m11.preshiptesting.osdu.aws/api/search/v2/query' \
--header 'data-partition-id: osdu' \
--header 'Authorization: Bearer ...' \
--header 'Content-Type: application/json' \
--data-raw '{
"kind": "osdu:wks:work-product-component--WellboreTrajectory:*",
"limit": 1000,
"offset": 10000
}'
```
Response:
```
{
"code": 400,
"reason": "Bad Request",
"message": "Invalid parameters were given on search request"
}
```
The generic 400 error message "Invalid parameters were given on search request" makes it difficult for the user to determine what parameters were specified incorrectly in the request.
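Besides a clearer message, the error could point users at cursor-based pagination, which avoids the 10,000-result window entirely. A sketch using the Search service's cursor endpoint (same query as above; the endpoint name should be checked against the Search API docs):

```shell
curl --location --request POST 'https://r3m11.preshiptesting.osdu.aws/api/search/v2/query_with_cursor' \
--header 'data-partition-id: osdu' \
--header 'Authorization: Bearer ...' \
--header 'Content-Type: application/json' \
--data-raw '{
    "kind": "osdu:wks:work-product-component--WellboreTrajectory:*",
    "limit": 1000
}'
```

Each subsequent page resends the same query together with the "cursor" value returned by the previous response.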
Ideally, the message should be more specific about which parameters are invalid and why.

https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/57
Utilizing Standard Pipelines (2023-03-24, David Diederich, d.diederich@opengroup.org)

I'd like this project to consider merging your CI pipeline work with the osdu/platform/ci-cd-pipelines project, and utilizing more jobs by includes rather than local CI config.
### Some Reasons to Consider
**Copy/paste code is hard to keep maintained**
Most of your CI logic appears to have started as a copy/paste from the main repository, anyway.
But keeping it local means that developers need to update changes in multiple places, and when they're working on the improvements they don't have your use case in mind.
This included some recent developments to get the dev2 environment going, but it also includes the changes to the FOSSA scanning -- you're still using an older, unmaintained image for the scanning.
And, when I did the changes, I worked test examples for maven and pip, the two supported build systems.
If npm had been there, I would have had it in mind.
**You miss new pipeline developments**
I'm moving pieces of the release management scripts into the pipeline to make more aspects of the tagging process happen automatically from branch creation.
For now, it's only dependency scanning data, but upgrades are planned to do more stages from there.
The GitLab Ultimate scanners check for security vulnerabilities, and the InfoSec team utilizes these results to plan their work.
These scanners aren't running on your project, but they would be if you included the appropriate CI configuration -- or at least, we'd see what needs to be improved for those scanners to function, if they don't work out of the box.
**Your improvements aren't available to others**
Any improvements you make to the CI process after you've copied it remain in your local repository.
Others could benefit from having this available in a common location.
Supporting another language gives future OSDU projects more capabilities right at the start.
You'd even get to define the basic processes for these.
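Mechanically, adopting the standard pipelines is mostly a matter of `include:`; a sketch (the file paths are assumptions -- check the ci-cd-pipelines repository for the actual file names):

```yaml
# .gitlab-ci.yml fragment
include:
  - project: "osdu/platform/ci-cd-pipelines"
    file: "standard-setup.yml"           # hypothetical shared job definitions
  - project: "osdu/platform/ci-cd-pipelines"
    file: "scanners/gitlab-ultimate.yml" # hypothetical security-scanner jobs
```

Truly project-specific steps would stay in the local CI config alongside the include.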
### Open to Discussion
I'd like to hear more about how the custom pipelines came to be, and if they are serving a need that can't be generalized.
For steps that are truly custom and unique to your project, it makes sense to have them as local CI config files.
If we do decide to start using more of the standard pipeline logic, I think we'll need to implement it slowly, a piece at a time.
Of course, if you think a big bang MR is better, I'd consider that, too.
Thank you in advance for your thoughts.

https://community.opengroup.org/osdu/platform/security-and-compliance/legal/-/issues/25
Compliance APIs are inconsistent, causing legal tag status not updated if new expiration date is within 1 day and associated record not deleted (2022-09-27, An Ngo)

- API Validate Legal Tag: validates the legal tag at the time the API is called. Specifically, it checks the validity of the properties on the fly.
- API List Legal Tag (valid/invalid): returns the list of legal tags whose valid flag is true or false. This flag is updated during the daily cron job. Therefore, if a legal tag is updated in the afternoon from invalid to valid, but the cron job does not run until midnight, there is a period between the afternoon and midnight where the legal tag is still shown as invalid, and it would still be returned in the response because its valid flag has not been updated.

Due to the aforementioned inconsistency, the system allows ingestion of records whose associated legal tag is not marked as valid.
**Use case:**
- LegalTagA expires on February 4. After the cron job, it is marked as valid=false.
- On April 21, LegalTagA is updated with a new expiration date of April 22. At the time of this update, the valid flag is still false because the cron job has not run.
- Also on April 21, a user creates new records with LegalTagA. At this time, Storage calls ValidateLegalTag -> returns true, because the legal tag is now set to expire on April 22 and is therefore considered valid, so record creation succeeds.
- At midnight on April 22, the cron job runs. It finds that LegalTagA has an expiration date of April 22. HOWEVER, the valid flag was already false, so nothing changed. So even though the legal tag was updated earlier in the day, its valid flag was never true. Because the valid flag never changed, the associated records were not updated: on April 22 the legal tag expired, but its associated records were not deleted. Now we have invalid records in the system.
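The root cause is that validity is read from a flag cached by the cron job instead of being derived from the expiration date. A minimal sketch of the on-the-fly check (hypothetical helper, mirroring what Validate Legal Tag already does):

```python
from datetime import date

def is_valid(expiration_date: date, today: date) -> bool:
    """Derive legal-tag validity from the expiration date at read time,
    instead of trusting a flag that a nightly cron job may not have
    refreshed yet. A tag is valid strictly before its expiration date."""
    return today < expiration_date

# April 21, new expiry April 22: valid, regardless of when the cron last ran.
# April 22 onward: invalid, again with no dependence on cron ordering.
```

If List Legal Tag computed the same predicate instead of reading the cached flag, the two APIs could not disagree.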
**Fix:**
Fixing the API inconsistency should resolve the issue mentioned in the use case, and also allow the correct legal tags to be returned in the Get API.

https://community.opengroup.org/osdu/ui/data-loading/osdu-cli/-/issues/20
Simplify usage of multiple config-files (2022-05-05, Jan Mortensen)

Currently a clean install of OSDU CLI will rely on a configuration file aptly named "config", which is stored in a pre-defined location based on your OS; e.g. on my Mac it is ```/users/<user>/.osducli/config```. Creating multiple environment configs is easily handled by having multiple files in this location. There are already some other files here, namely the state file used for the default config and an MSAL token cache in case you use that authentication type. When switching between different configs you use the ```osdu config default``` command, which requires you to manually write the whole path and name of the config file you would like to use.
I propose two enhancements:
1. the ability to refer to the config-file without using the entire path (and possibly with free naming, i.e. not relying on the ```config```-part.
1. the ability to easily list the available configurations (aka environments/instances) you already have, e.g. like the ```kubectl config get-contexts``` command.

https://community.opengroup.org/osdu/platform/security-and-compliance/legal/-/issues/26
Break up Legal-tag object (2023-08-07, Jan Mortensen)

Not sure this is the right location for this, but the current object for legal tag, with enumerated values for some of the elements, is inhibiting usage of the more savvy interactions with the platform, e.g. expiration date and auto inheritance of legal tags based on the record ancestry.
I would suggest that the legal tag object is broken up into name-value pairs, like the regular ```tags```, but of such a type that one can leverage the functionality of Legal:
* inheritance for these tags when there is an appropriate lineage in place, e.g. ancestry.
* expiration-date enforcement for the current record and derivatives
* ability to add to the enforced enumerated values, e.g. classification to adhere to in-house governance.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/123
Storage GET record returns 404 for records with optional version (Record ID ending with colon) (2023-06-06, An Ngo)

Storage GET /api/storage/v2/records/{id} returns a 404 error for records whose ID ends with a colon (version is empty).
For example, "osdu:master-data--Wellbore:nz-100000391126:"
This is the case where the version component is empty (this is allowed as part of [this change](https://community.opengroup.org/osdu/platform/system/storage/-/issues/26#summary-january-26-2021) in record ID validation).
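A sketch of the ID handling the fix needs (hypothetical helper): the version component after the third colon is optional, and an empty one should select the latest version rather than fail the lookup.

```python
def split_record_id(record_id: str):
    """Split an OSDU record ID of the form partition:type:entity[:version].
    A trailing colon (empty version) means 'latest version' and is
    returned as None instead of being treated as part of the ID."""
    parts = record_id.split(":")
    base = ":".join(parts[:3])
    version = parts[3] if len(parts) > 3 and parts[3] else None
    return base, version

# "osdu:master-data--Wellbore:nz-100000391126:" -> version None (latest)
```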
Expected behavior should be returning the latest version of the record.

https://community.opengroup.org/osdu/platform/data-flow/ingestion/energistics/witsml-parser/-/issues/59
Support of V1.4 data files (2022-05-07, Debasis Chatterjee)

Time and again we are hearing that most data files in the real world still use V1.4, so there is a need to support the older version 1.4 over and above the current support for V2.0.
See note from TotalEnergies - 6-May-2022.
TotalEnergies – access to representative test data in WITSML V2.0 format (that is what is supported by the Parser today).
- (most of) our suppliers still use 1.4.1 so we do not have WITSML v2.0 data available
cc @epeysson, @chad, @Keith_Wall, @jean_francois.rainaud

https://community.opengroup.org/osdu/platform/data-flow/ingestion/ingestion-dags/-/issues/106
Manifest Ingestion by Reference - point to a large set of identical files (2022-05-10, Debasis Chatterjee)

Discussed with Jean Francois Rainaud recently.
Such a collection has identical manifests for different records, such as for the 5000 TNO Wellbores.
https://community.opengroup.org/osdu/platform/data-flow/data-loading/open-test-data/-/tree/master/rc--3.0.0/4-instances/TNO/master-data/Wellbore
It is feasible to make use of **File Collection** type Dataset.
https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/E-R/dataset/FileCollection.Generic.1.0.0.md
The program can point to a Dataset record which is a file collection and handle processing of all 5000 records.
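A manifest entry pointing at such a dataset might look roughly like this (illustrative only -- consult the FileCollection.Generic schema linked above for the real property names):

```json
{
  "kind": "osdu:wks:dataset--FileCollection.Generic:1.0.0",
  "data": {
    "DatasetProperties": {
      "FileCollectionPath": "open-test-data/rc--3.0.0/4-instances/TNO/master-data/Wellbore/"
    }
  }
}
```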
Thus there can be two alternatives for the new program (Manifest Ingestion by Reference) – one with large (concatenated) JSON file and the other with “collection”.
This thought is actually triggered by user feedback (see below).
Manifest Ingestion Issues:
1. While ingesting a set of batch files, the files are picked up by the script, which invokes the DAG.
   a. The DAG has a limitation of only 32 concurrent runs. Hence, when the Python script triggers 100 files, it takes only 32 at a time, and once a job finishes, it picks up the next one.
   b. During concurrent runs, some of the DAGs fail, but Airflow still shows success; the pain point is identifying the unsuccessful file, unless the customer reports that the file did not ingest properly.
2. The TNO dataset takes almost 2-3 hrs to ingest (~5000 wells), and given the massive volume of data (TBs), we are concerned about how many days ingestion will take. Therefore, ingestion performance needs to improve.
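Given the 32-run Airflow cap described in 1.a, the loader script could at least batch its triggers explicitly and verify each run's real terminal state before submitting the next batch; a small sketch (the cap value is taken from the issue text, the polling itself is left out):

```python
from itertools import islice

MAX_CONCURRENT_DAG_RUNS = 32  # Airflow concurrency cap noted in 1.a

def batches(items, size=MAX_CONCURRENT_DAG_RUNS):
    """Yield successive fixed-size batches of manifest files, so the
    loader triggers at most `size` DAG runs, then polls those runs for
    their terminal state (not just 'triggered') before continuing."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk
```

Polling each run's final status per batch also addresses 1.b, since a run that fails would be retried or reported instead of silently counted as a success.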
Regards,
Jegan (Accenture)

https://community.opengroup.org/osdu/platform/system/search-service/-/issues/88
Search service returns date-time fields with an incorrect format (2022-08-24, An Ngo)

I created a record in Storage containing a date-time field. When I get the record from OSDU Storage, it correctly returns the ISO 8601 / RFC 3339 format I inserted.
Problem: When I retrieve the same record from the OSDU Search service, the date has an unexpected format, which causes unmarshalling to fail.
If you fetch it from the OSDU Storage service, the two date-time fields are displayed correctly as inserted.
**What I expect:**
Date fields in the format "2021-06-25T19:16:55+00:00" or "2021-06-25T19:16:55Z"
**What I got instead:**
"2021-06-25T19:16:55+**0000**" (missing colon in the timezone offset)
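Until the Search service emits a compliant offset, clients can normalize the value before unmarshalling; a small sketch (the pattern only touches a colon-less 4-digit offset at the end of the string):

```python
import re

# Matches a trailing RFC 822-style offset such as +0000 or -0530.
_OFFSET = re.compile(r"([+-]\d{2})(\d{2})$")

def normalize_offset(timestamp: str) -> str:
    """Insert the colon RFC 3339 expects into a '+0000'-style offset;
    values already ending in 'Z' or '+00:00' are left untouched."""
    return _OFFSET.sub(r"\1:\2", timestamp)
```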
**per documentation, the correct format should be:**
**Complete date plus hours, minutes and seconds:**
YYYY-MM-DDThh:mm:ssTZD (eg 1997-07-16T19:20:30+01:00)
**Complete date plus hours, minutes, seconds and a decimal fraction of a second**
YYYY-MM-DDThh:mm:ss.sTZD (eg 1997-07-16T19:20:30.45+01:00)

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/60
index-worker & reindex-worker are exposed (2022-08-24, An Ngo)

When a record-changed event is triggered by Storage, it results in a Service Bus message from Storage that is handled by the **indexer-queue** service. This service calls standard HTTP endpoints of the **indexer** service via its Kubernetes-internal name, never transiting through the App Gateway (or out of the cluster).
The **indexer** service exposes those same endpoints outside the cluster. While most endpoints for services are protected by an Istio **AuthorizationPolicy** to require a valid token (and subsequently use that token to extract user information for authorization within the service), there is no token sent with these requests by the **indexer-queue** and the **indexer** service's **AuthorizationPolicy** excludes these endpoints from the token requirement.
This means any outside caller can send requests to these endpoints, with no authorization and no restriction. This could be exploited to cause denial-of-service attacks, send forged event messages, or use vulnerabilities within the **indexer** service to compromise the entire OSDU system. Because no token is required, these attacks can be done by anyone. Because the OSDU Community software is open-source, even "security through obscurity" is not effective here.
**Recommended approaches to solve this:**
- Use **indexer** VirtualService to reject external requests to these endpoints, making them reachable only from within Kubernetes.
- Add a token to the requests going to the **indexer** service's index-worker and reindex-worker endpoints
- [additionally] Consider using Istio's mTLS and/or Kubernetes Network Policy to restrict communication to the **indexer** service's index-worker and reindex-worker endpoints to traffic coming from **indexer-queue**
There may be other acceptable solutions.
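A sketch of the first recommendation as an Istio AuthorizationPolicy rather than a VirtualService change (namespace, labels, and paths are placeholders -- the real worker endpoint paths must be taken from the indexer service):

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: indexer-workers-internal-only
  namespace: osdu            # placeholder namespace
spec:
  selector:
    matchLabels:
      app: indexer           # placeholder workload label
  action: DENY
  rules:
    - to:
        - operation:
            paths: ["*/index-worker*", "*/reindex-worker*"]  # placeholder paths
      from:
        - source:
            # Deny callers outside the in-cluster namespace; note that traffic
            # arriving via the ingress gateway originates from the gateway's
            # namespace, so it is rejected too.
            notNamespaces: ["osdu"]
```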