# Storage issues
https://community.opengroup.org/osdu/platform/system/storage/-/issues

## [SAST] Vue_DOM_XSS in file index.html
https://community.opengroup.org/osdu/platform/system/storage/-/issues/189
2023-11-15 · Yauhen Shaliou [EPAM/GCP]

**Description**
The method m-1"\> embeds untrusted data in the generated output via href, at line 36 of \\storage\\provider\\storage-azure\\src\\main\\resources\\static\\index.html. This untrusted data is embedded into the output without proper sanitization or encoding, enabling an attacker to inject malicious code into the generated web page.
# **Location:**
<table>
<tr>
<th> </th>
<th>Source</th>
<th>Destination</th>
</tr>
<tr>
<th>File</th>
<td>storage/provider/storage-azure/src/main/resources/static/index.html</td>
<td>storage/provider/storage-azure/src/main/resources/static/index.html</td>
</tr>
<tr>
<th>Line number</th>
<td>92</td>
<td>36</td>
</tr>
<tr>
<th>Object</th>
<td>pathname</td>
<td>href</td>
</tr>
<tr>
<th>Code line</th>
<td>return location.protocol + '//' + location.host + location.pathname</td>
<td>
\<a :href="signInUrl" class="btn btn-primary" v-if="!token" class="col-2"\>Login\</a\>
</td>
</tr>
</table>

Milestone: M21 - Release 0.24

## ADR: Replay API
https://community.opengroup.org/osdu/platform/system/storage/-/issues/187
2024-02-29 · Akshat Joshi
Two new APIs will be introduced in the Storage service as part of the Replay flow.
* [ ] Proposed
* [ ] Trialing
* [ ] Under review
* [x] Approved
* [ ] Retired
## Context & Scope
This ADR is centered around the design of the new Replay API within OSDU's Storage service, which is introduced as part of the [Replay ADR](https://community.opengroup.org/osdu/platform/system/storage/-/issues/186). The purpose of this Replay API is to publish messages that indicate changes to records, which are subsequently received and processed by consumers. It's important to note that the handling of these messages follows an idempotent process.
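Idempotent handling means a consumer can safely process the same record-change message more than once. A toy sketch, with an illustrative message shape and in-memory store (not OSDU code), shows why an upsert-style handler makes duplicate deliveries harmless:

```python
# Toy idempotent consumer: re-delivering the same record-change message
# leaves the index state unchanged, because the handler upserts by record id
# rather than appending. Names and message fields here are illustrative.
class ToyIndexer:
    def __init__(self):
        self.index = {}

    def handle(self, message):
        # Upsert keyed by record id: processing a duplicate is a no-op.
        self.index[message["id"]] = message["version"]

indexer = ToyIndexer()
msg = {"id": "opendes:wks:rec1", "version": 42, "op": "update"}
indexer.handle(msg)
indexer.handle(msg)  # duplicate delivery, same resulting state
```

The same property is what allows a replay to safely re-publish messages that some consumers may have already processed.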
## Terminology
<table>
<tr>
<td><strong> Name</strong>
</td>
<td><strong> Explanation</strong>
</td>
</tr>
<tr>
<td><strong> Record</strong>
</td>
<td>A record is stored in the OSDU Data Platform in two parts: a document database, which contains basic data (id, kind, legal information, and access permissions), and file storage in JavaScript Object Notation (JSON) format, which contains the other relevant information of the record. We are interested in the document database part.
</td>
</tr>
</table>
## Tradeoff Analysis
The new APIs do not represent a breaking change for any other API, nor for the consuming applications. Only the consuming applications concerned would benefit from this new feature, while it remains entirely transparent to others.
## Additional Requirement
The newly introduced APIs must facilitate [Collaboration workflows](https://community.opengroup.org/osdu/platform/system/storage/-/issues/149) through the utilization of the x-collaboration header. Additionally, the replay mechanism should ensure the accurate publication of collaboration context information in the corresponding event.
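For illustration, a replay call participating in a collaboration context would carry the `x-collaboration` header alongside the usual request headers. The helper below is hypothetical, and the header value format is an assumption based on the Collaboration workflows ADR, not a confirmed contract:

```python
# Hypothetical helper building request headers for a replay call inside a
# collaboration context. The x-collaboration value format is an assumption.
def replay_headers(token, partition, collaboration_id=None, application="app"):
    headers = {
        "Authorization": f"Bearer {token}",
        "data-partition-id": partition,
        "Content-Type": "application/json",
    }
    if collaboration_id:
        # Only attach the header when a collaboration context is active.
        headers["x-collaboration"] = (
            f"id={collaboration_id},application={application}"
        )
    return headers

h = replay_headers("token123", "opendes", collaboration_id="9e1c4e74")
```

The replay mechanism would then propagate the same collaboration context into the events it publishes.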
## Decision
The proposal is to provide POST and GET Replay APIs.
<table>
<tr>
<td><strong> API fields </strong>
</td>
<td><strong>Explanation</strong>
</td>
</tr>
<tr>
<td><strong>kind</strong>
</td>
<td>Specifies the kind to which the records belong. [optional]
</td>
</tr>
<tr>
<td><strong>replayId</strong>
</td>
<td>Represents the replay status ID. [required]
</td>
</tr>
<tr>
<td><strong>operation</strong>
</td>
<td>Defines the replay operation to be carried out. [required]
</td>
</tr>
<tr>
<td><strong>filter</strong>
</td>
<td>Defines the field on which records are selected. [optional]
</td>
</tr>
</table>
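Assuming these fields are sent as a JSON body to the POST endpoint described below, a minimal request payload could be assembled as follows. The builder function and example values are hypothetical; only the field names come from the table above:

```python
import json

# Hypothetical builder for the POST replay body, using the field names from
# the table above. "operation" and "replayId" are required; "kind" and
# "filter" are optional.
def build_replay_request(operation, replay_id, kind=None, record_filter=None):
    payload = {"operation": operation, "replayId": replay_id}
    if kind is not None:
        payload["kind"] = kind             # currently restricted to one kind
    if record_filter is not None:
        payload["filter"] = record_filter  # record-selection filter
    return payload

body = build_replay_request("replay", "d49d3e2d-0000-4e32-9c3a-example",
                            kind=["osdu:wks:master-data--Well:1.0.0"])
print(json.dumps(body, indent=2))
```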
<strong>Allowed roles for API access</strong> : users.datalake.ops
<br>
<table>
<tr>
<td>
<strong>Method</strong>
</td>
<td>
<strong> API Endpoint</strong>
</td>
<td>
<strong>Design</strong>
</td>
</tr>
<tr>
<td> POST
</td>
<td>v1/replay
</td>
<td>
<strong>Request Example - </strong>
<p>
<strong> </strong>
<p>
1. <strong>Description</strong> - This API request will reindex all the storage records.
<p>
In this phase, an empty body will be passed to replay all records:
<p>
{
<p>
}
<p>
In the next phase:
<p>
![operationrepaly](/uploads/d7679bf7d4d6d9745e0d9c579905fc74/operationrepaly.png)
<p>
2. <strong>Description</strong> - This API request will reindex specific kinds of storage records. Here, operationName is optional; by default, specific kinds are reindexed using the filter field. Currently, replay is supported for a single kind only, so the kind array is restricted to size one.
<p>
![operationrepaly](/uploads/f06805a167d15986688ba23ac85ee897/operationrepaly.png)
<p>
<p>
<strong>Response example – </strong>
![responsepostreplay](/uploads/c557910f6369deda3971866bd2130864/responsepostreplay.png)
<p>
<strong>
</td>
</tr>
<tr>
<td> GET
</td>
<td>
replay/status/{id}
<p>
</td>
<td>
<strong>Request:</strong>
<p>
<p>
<p>
1. <strong>Response: Replay in progress</strong> <br>
<p>
a) <b>Scenario</b> - In Replay All <br><br>
![replaystatusAllKind](/uploads/12f155b5d491010f3ea37c2576e56e19/replaystatusAllKind.png) <br>
b) <b>Scenario</b> - In Replay single kind <br><br> ![replaystatusforsinglekind](/uploads/2043d80e2d350faa2f3fdb41d4601e0f/replaystatusforsinglekind.png)
<br>
<p>
<p>
2. <strong>Response: Replay failed</strong> <br>
<p>
a) <b>Scenario</b> - In Replay All <br><br>
![replayFailedForAllKind](/uploads/3d9a64803b229d3b46d4e283047d285e/replayFailedForAllKind.png)
<br>
b) <b>Scenario</b> - In Replay single kind <br><br>
![replayfailedforsinglekind](/uploads/407b53b19ddfa4545f52e9e88d34fb11/replayfailedforsinglekind.png)
<p>
<p>
</td>
</tr>
</table>
<br>
API spec Swagger YAML: [ReplayAPISpecs.yaml](/uploads/f9e8ddd4958bf04f9bc99994ebdc4e41/ReplayAPISpecs.yaml)

## ADR: Replay
https://community.opengroup.org/osdu/platform/system/storage/-/issues/186
2024-03-05 · Akshat Joshi

<a name="ppadhi"></a>OSDU - Replay and Replay API
# Table of Contents
[Context ](#_toc119676063)
[Problems with Current Reindex All Solution ](#_toc119676075)
[Replay ](#_toc119676076)
[Requirements to address ](#_toc119676077)
[Architectural Options ](#_toc119676078)
[Decision ](#_toc119676079)
[Replay API](#_toc119676080)
## Status
* [x] Proposed
* [ ] Trialing
* [ ] Under review
* [ ] Approved
* [ ] Retired
## <a name="_toc119676063"></a>Context
This ADR is centered around the design of the new replay flow within OSDU's storage service. The purpose of this Replay flow is to publish messages that indicate changes to records, which are subsequently received and processed by consumers. It's important to note that the handling of these messages follows an idempotent process.
The Replay flow will address the following:
1. In case of disaster, the replay flow will help rebuild the indexes to the RPO. [Out of Scope of ADR]
2. Reindexing the records by publishing the record change messages to the consuming Indexer service.
3. Correction of indices after changes to the structure of the storage records of a particular kind.
**Replay rate** - the rate at which the Storage service publishes record change messages to the service bus.
## <a name="_toc119676075"></a>Problems with Current Reindex All Solution
|**Problem**|**Details**|**What is Required?**|
| :- | :- | :- |
|Reliability |<p>**Operation is Synchronous.**</p><p>- Very long HTTP call is never reliable</p><p></p><p></p><p>The Reindex is a synchronous operation, making the operation Unreliable and not resilient to failures. If there is any interruption to the connection, all the status and progress could be lost.</p><p></p>|The operation must be reliable. If the operation is triggered, it must either succeed or it must fail and in both the cases, the user must be diligently informed with the right reasons for success/failures. The system should not be in a state where the user has no clue what’s happening.|
|Resiliency|Abrupt disturbance of the reindex-process leaves the system in an inconsistent state. For example, if there is any exception or if the process crashes, then the system is left entirely in an inconsistent state.|The system must be resilient to failure and must always succeed. If the operation fails, then the system must be left in the previous state.|
|Scale|Due to the synchronous and non-resilient nature of the current implementation, the scale is very limited. It cannot ingest more than a couple of million records reliably.|The reindex operation must scale to any number of records|
|Speed|The speed is very slow. It’s known to take close to an hour for 1 million records.|Faster rate of reindexing is required. For example, 100 million records should not take more than a few hours. |
|Tracking/Observability|There is no way for the user to know about the progress.||
|Pausing/Resuming reindex|Today, there is no capability to pause and resume reindex. Given that this will be a long running operation, having pause and resume will be good to have.||
|No Delta Reindexing|For some Disaster Recovery Scenarios, there may be partial backups available. So reindexing only a subset of records of a kind can prove to be useful. This functionality is not available today.||
|Parallelization|Currently, the reindex is a procedural process. This has impact on both scale as well as speed.||
## <a name="_toc119676077"></a>Requirements to address
To be able to address these issues, we need to re-design the way reindex works, addressing various functional and non-functional aspects like speed, scale, reliability, observability, etc. The below table outlines what is expected out of the new Reindex design.
|**Requirement**|**Details**|**Technical Implications** | **Scope** |
| :- | :- | :- | :- |
|1. Scalability|<p>The Replay operation must be scalable; it should be able to handle infinitely large amounts of records.</p><p><br>A realistic goal to target can be 100m records in 4-5 hours.</p>|<p>Need to ensure Elasticsearch storage can be scaled up.</p><p></p><p>For achieving a higher scale, the following must be done: -</p><p>- The whole operation must be **Asynchronous** in nature</p><p>- It must be resilient to failures due to pod crashes, 429s due to high Database/Service Bus/Elasticsearch load.</p><p>- We can leverage Message Broker to divide and conquer and have the framework.</p><p>- We can also look at job schedulers like QUARTZ to achieve a reliable reindex.</p><p>- Need to evaluate which is the best service to perform this reindexing. </p><p>- Can also try to leverage **Airflow**</p><p></p>| In Scope of ADR |
|2. Reliable Responses|<p>When the operation is triggered, the response must be reliable. </p><p></p><p>There could be some pre-validation done to check whether the reindex process can be completed either successfully or not.</p><p>The result of whether the operation is success or fail, should be communicated via response to the user properly.</p>|Today, we don’t return anything apart from 200 OK in the response even if things fail. <br><br>The entire response should be revamped and reworked on how the status can be conveyed to the user in a useful way.| In Scope of ADR |
|3. Observability and Monitoring|<p>Given the fact that reindex is a long running operation, the User triggering the reindex must have insights into what is going on, using a track status API.</p><p></p><p>Some of the details should include:</p><p>- **Status:** Validating, Stopping-Ingestion, In-progress, Finalizing, Complete, Error, etc.</p><p>- **Progress:** Overall percentage, per index progress, remaining records count, ETA</p><p></p>|We could store the progress in a Redis cache or elsewhere that can be used to report back to the user on the progress.| In Scope of ADR |
|4. Reliable System State – Consistency before/after operation in case of failure|<p>Guarantee to reindex valid storage records – **Must have**</p><p><br>**(depends on message broker reliability)**</p><p></p><p><br>**Rollbacks** – nice to have</p>|<p></p><p>If there are unrecoverable errors during reindexing a particular kind, then that leaves the system in an inconsistent state. It would be good to “**rollback**” the operation to restore the system to the state before the operation was triggered for that kind.</p><p></p><p>There should also be **no concurrent “reindexAll” operation** running. There can however, be concurrent reindex of different kinds happening at the same time.</p><p>It can be a configurable parameter on whether the rollback should be done in case of unrecoverable failures, due to internal system errors.<br><br>How this can be achieved is that, all the reindexed records for a kind, should be indexed into a new “secondary index” for that kind, and only if that is succeeds completely, the index can be renamed and replace the primary index.<br><br>Elasticsearch’s clone index feature can be utilized to achieve this.</p><p></p><p>- Reindex failed record IDs</p>|Out of Scope of ADR |
|5. Stop Ingestion/Search during Reindex|<p>During **Reindex**, the normal ingestion should stop. This is because:</p><p>- There are some edge cases which could end up the system in an inconsistent state. Edge Cases: **<TODO>**</p><p>- Load on Elasticsearch</p><p></p>| | Out of Scope of ADR|
|6. Speed|<p>The operation is quite slow today. It takes almost an hour to reindex a million records. This means it will take a few days to reindex 100m records, which is not practical.</p><p>Two Issues:</p><p>1. Finding Unique Kinds</p><p>2. Reindexing – Database load</p>|<p>This is **directly dependent on the scalability of the underlying infra like the Database** and Elasticsearch.</p><p>The Database can be scaled up/out on demand, either via the UI by the customer (i.e., via a CP) or by some other means.</p><p>Auto scaling-out of Elasticsearch is currently not possible, so we may be limited in speed by Elasticsearch. We can, however, scale up Elasticsearch, and this can help achieve higher speed.</p><p>Whether this scale-up is triggered automatically or manually is something we need to evaluate and do a POC.</p><p>Storage Service’s queries can also be revisited – there was a change done in another service with a more efficient implementation of paginated queries - [Performance improvement on paginated query for CosmosDB (!244) · Merge requests · Open Subsurface Data Universe Software / Platform / System / Lib / cloud / azure / OS Core Lib Azure · GitLab (opengroup.org)](https://community.opengroup.org/osdu/platform/system/lib/cloud/azure/os-core-lib-azure/-/merge_requests/244/diffs)</p>| Out of Scope of ADR |
|7. **Delta Reindex** and **Consistency Checker/Enforcer**|<p>Doing a delta reindex can be useful if there is restoration of backups during a disaster recovery. This will result in faster recovery times.</p><p>Delta Reindex = reindex only those records that are not present in the Backup.<br><br>When we talk about delta reindex, we need to ensure there is consistency across all 3 components – storage blob, storage records, and Elasticsearch.</p>|<p>Need to explore feasibility. The operation can be something like Reindex All records whose create/update time > X.</p><p>A consistency enforcer should be built that will ensure the 3 entities are in a consistent state.</p>| Out of Scope of ADR |
|8. Snapshot Backup/Cluster replication|<p>Backup Elasticsearch storage snapshots frequently, and in case of disaster, restore the snapshot and then perform the delta reindex.<br><br>This will make the recovery times much faster.</p>| |Out of Scope of ADR |
|9. Source of trigger|During a recovery process, who will make the call to reindex? Is it the user or an internal system?|We will need to design and account for this in the reindex design.| Out of Scope of ADR |
|10. Pause/Resume Reindex|Since reindex is a long-running operation, having the ability to pause and resume the reindex operation would be nice to have.|<p>We need to ensure system consistency when the operation is paused and resumed.</p><p>Also, any new records ingested after the pause must be included in the reindex process when it is resumed.</p>| Out of Scope of ADR |
## <a name="_toc119676078"></a> Architectural Options:
<br>
|**Options**|**Pro**|**Cons**|**Work Required**|
| :- | :- | :- | :- |
|1. Using **Airflow** + Message Broker + StorageService + Workflow Service|<p>- Proven Workflow Engine</p><p>- Lesser new implementations in storage services, so lesser work required by other CSPs.</p>|<p>- Process becomes slower and inefficient.</p><p>- Lot of HTTP calls from Airflow <-> AKS</p><p>- Airflow will require access to internal Infrastructure to operate in the most efficient manner.</p><p>- Some required features are not yet available in ADF Airflow </p><p>- Parallelization may spawn up 1000s of tasks waiting to be scheduled. **Scalability can be issue.**</p><p>- Concurrency and Safety guarantee is tricky – allowing no more than one reindex for a kind</p><p></p>|<p>**Airflow**</p><p>- DAG using TaskGroups, Dynamic Task Mapping, Concurrency handling.</p><p>- Build pipelines to integrate new DAG.</p><p></p><p>**Storage Service**</p><p>- Implement new APIs to publish messages to message broker.</p><p></p><p>**Indexer Service**</p><p></p><p>**Workflow Service**</p><p>- Have new APIs to support observability</p><p>- Design for checkpointing</p>|
|2. Using **StorageService** + **Message Broker**|<p>- Simple, Lesser moving parts</p><p>- Fast & Efficient</p>|- Parallelization may require state management.|<p>**Storage Service**</p><p>- New APIs for exposing Replay functionality (ReplayAll, ReplayKind, GetReplayStatus)</p><p>- New Modules for replay message processing</p><p></p><p>**Indexer Service**</p><p>- Delete ALL kinds API</p>|
## <a name="_toc119676079"></a> Decision:
We chose design option 2, using the Storage service and a message broker: it persists the replay status, allows replays to be re-run and their status returned, and is the simpler implementation.
- **[Decision]** What led us to select the Storage service for the Replay API decision? <br>
* The source of truth for the storage records is the Storage service. It is the Storage service that publishes the record change messages, which are then consumed by the consumers, and the processing of those messages is idempotent. So, it's fair to say that to trigger reindexing, we must invoke some procedure in the Storage service that will make it emit record change messages onto the message broker.<br>
* Indexer is just a consumer of the recordChange messages, and there could be other consumers who require this replay functionality as well. In those cases, instead of letting each consumer build their own replay logic, if we have it in one common place, it would benefit all the consumers. <br>
* This way, one consumer doesn’t have to depend on indexer, which is also just another consumer<br>
* Reindex is just one-use cases, that uses this new Replay functionality. Other consumers can have their own use case for consuming those replayed messages.
<br>
**Design Approach for option 2:**
![Aspose.Words.71972436-70f7-48df-8f1c-d2035f55ce34.004](/uploads/5a573b82493315f91adeee547fd97fee/Aspose.Words.71972436-70f7-48df-8f1c-d2035f55ce34.004.png)
**Note**
The ADR also helps to address the following issues: <br>
- **[Issue]** https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/91 <br>
* The Replay flow will include a Service Bus topic for every event. If we need to introduce new events in the future that necessitate message publishing, we can easily do so by introducing a new topic and associated logic. This approach can help prevent unintended consequences that may arise from triggering other listeners on the same topic, as they can be resolved accordingly. <br>
- **[Issue]** https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/66
* Utilizing the service bus and tracking its progress assists us in achieving a reliable design, including the built-in reliability of message queuing. <br>
- **[Issue]** https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/80
* With the flexibility to introduce new topics in the Reindex …

## ADR: API to retrieve past events of storage records
https://community.opengroup.org/osdu/platform/system/storage/-/issues/185
2023-10-11 · Yifan Ye
New API in the Storage service to rehydrate past creation and last-modified events for a given kind within a given time range.
* [x] Proposed
* [ ] Trialing
* [ ] Under review
* [ ] Approved
* [ ] Retired
## Context & Scope
The OSDU Storage service does not provide a way to retrieve past events of records being created/modified. Many OSDU applications would be interested in retrieving past events of records that happened before the application subscribed to the notification service. The new API proposed in this ADR will provide the concerned applications with a way to backtrack the events.
The proposal is to provide an API on storage service to support retrieving past events of records of a kind that happened in the given time range, where the events will be returned in a paginated format and ascending chronological order based on the timestamp.
The new API will retrieve the first and the last events of the record, filter the events by the start date and end date provided by the user, and then return the filtered events.
## Tradeoff Analysis
The new API does not represent a breaking change of any other API, and consequently neither for the consuming applications. Only concerned-consuming applications would benefit from this new feature, while it remains entirely transparent for others.
## Decision
Provide an API to query past events of records of the given kind and return the events in paginated ascending chronological order.
```
{
  "id": <RECORD_ID>,
  "kind": <KIND>,
  "op": <CREATE|UPDATE|DELETE, etc.>,
  "version": <VERSION>,
  "timestamp": <TIMESTAMP>
}
```
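As a behavioral sketch (not the service implementation), filtering stored events by the requested time range and returning them in paginated, ascending chronological order could look like the following. The function, offset-style pagination, and integer timestamps are illustrative assumptions:

```python
# Illustrative sketch of the Decision above: filter record events by a
# [start, end] time range and page them in ascending chronological order.
def get_past_events(events, start_ts, end_ts, page_size=2, offset=0):
    selected = sorted(
        (e for e in events if start_ts <= e["timestamp"] <= end_ts),
        key=lambda e: e["timestamp"],  # ascending chronological order
    )
    page = selected[offset:offset + page_size]
    # Return an offset for the next page, or None when exhausted.
    next_offset = offset + page_size if offset + page_size < len(selected) else None
    return {"events": page, "nextOffset": next_offset}

events = [
    {"id": "r1", "kind": "k", "op": "UPDATE", "version": 2, "timestamp": 200},
    {"id": "r1", "kind": "k", "op": "CREATE", "version": 1, "timestamp": 100},
    {"id": "r1", "kind": "k", "op": "DELETE", "version": 2, "timestamp": 300},
]
first_page = get_past_events(events, 100, 300)  # two oldest events first
```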
## Consequences
* A new API on the Storage service would be available.
* Documentation of the Storage service should be modified with details for the new API.

## Add a note on deleted records to /versions
https://community.opengroup.org/osdu/platform/system/storage/-/issues/183
2023-09-06 · Marton Nagy

The **GET /records/versions/{id}** "Get all record versions" endpoint in [Storage Service](https://p4d.developer.delfi.cloud.slb-ds.com/workspace/apiCatalog/OSDU-Storage-Service) seems to retrieve record versions regardless of the record itself **being (soft) deleted** or not, while neither **GET /records/{id}** "Get record" nor **GET /records/{id}/{version}** "Get record version" retrieves the record when it's (soft) deleted, which is the correct behavior.
Please add a note to the **GET /records/versions/{id}** endpoint description to highlight the difference.
cc @nthakur, @gehrmann

## GET: /records/{recordID}/{version} - ERROR 500
https://community.opengroup.org/osdu/platform/system/storage/-/issues/181
2024-01-01 · Siarhei Khaletski (EPAM)

**Context**
GET: /records/{recordID}/{version} fails with error 500 if an invalid version is provided (see the attachment)
We noticed an odd behavior of the service:
List of existing versions of the following record: `opendes:work-product-component--SamplesAnalysis:e9f02f48f43149a8b69606ff7597f391`
![image](/uploads/3d75fd80a57f5558c7d0eb00a4d795eb/image.png)
Requesting the nonexistent version `1` returns status 500:
![image](/uploads/d3dc228f70263bd24ff7d09975baa63c/image.png)
Meanwhile, requesting the nonexistent version `1234` returns status 404:
![image](/uploads/e82da89c3673b643aaa26845f0eb0c81/image.png)
**Azure GLab Logs**
![image](/uploads/8d54b1addcbc1835b4ea3c90135072b6/image.png)
**Expected Behavior**
404 status code

Milestone: M22 - Release 0.25

## Unable to nullify a non-system attribute from DateTime value to null or empty value using Storage service
https://community.opengroup.org/osdu/platform/system/storage/-/issues/180
2023-08-22 · Shubhankar Srivastava

To support a business use case, a user needs to update an existing attribute (with data type DateTime) residing under the data { } section of a **work-product-component** schema from a valid DateTime value (e.g. 2023-08-10T00:00:00+0000) to "null" or an empty string (""). But when this transaction is attempted and executed via the STORAGE service, the value of the attribute remains unchanged even after a successful execution (HTTP status code 200). The STORAGE service should allow users to register an empty/null value for a DateTime attribute.
Please note that the attribute "DateSubmitted" does not belong to the list of System Properties like "createTime" or "modifyTime" and might not be used for auditing purposes.
1. "kind": "shell:wks:work-product-component--LQCWebSheet:1.0.0"
2. Example record:
```
{
  "data": {
    "ApprovalStatusTypeID": "osdu:reference-data--LQCApprovalStatusType:Submitted:",
    "Source": "shell",
    "Name": null,
    "IsBonus": false,
    "LoggingInterpreter": null,
    "FinalDeliveryDuration": 1.0,
    "WebSheetName": "Test_LWD_Websheet_Edit_Approver_Request_v2",
    "LastUpdatedPPEmail": null,
    "ApproverEmail": "NewApprover1.Nayak@shell.com",
    "WellboreID": "osdu:master-data--Wellbore:BDLQCGOM2_1_WB2:",
    "OperationalComment": "Test_LWD_Websheet_Edit_Approver_v2_Operational_Comments",
    "ApproverComment": null,
    "SourceApplication": "Created in LQC WebSheets",
    "SubmitterName": "Sujith.Submitter@shell.com",
    "IsApprovalStatusReset": true,
    "DateSubmitted": "2023-06-05T07:56:19.914485+0000"
  },
  "kind": "shell:wks:work-product-component--LQCWebSheet:1.0.0",
  "source": "wks",
  "acl": {
    "viewers": ["data.default.viewers@osdu.shell.com"],
    "owners": ["data.default.owners@osdu.shell.com"]
  },
  "type": "work-product-component--LQCWebSheet",
  "version": 1686283555925808,
  "tags": {
    "normalizedKind": "shell:wks:work-product-component--LQCWebSheet:1"
  },
  "modifyUser": "Monalisa.Mohapatra@shell.com",
  "modifyTime": "2023-06-09T04:05:56.083Z",
  "createTime": "2022-12-15T11:26:58.940Z",
  "authority": "shell",
  "namespace": "shell:wks",
  "legal": {
    "legaltags": ["osdu-shell-lqc-dataset-testing"],
    "otherRelevantDataCountries": ["US"],
    "status": "compliant"
  },
  "createUser": "Labanyendu.Nayak@shell.com",
  "id": "osdu:work-product-component--LQCWebSheet:62008"
}
```
3. Target attribute - "data.DateSubmitted"

## Storage batch API returns 404 for unauthorized records
https://community.opengroup.org/osdu/platform/system/storage/-/issues/179
2024-03-07 · An Ngo
**Use-case:** The Reindex Kind API is called.
404s were noted in the logs.
A record fetch on some of the impacted records returned 403s.
Investigation shows the batch record fetch returned 404s instead.
Issue identified from this workflow:
- The Storage batch API reports unauthorized records (403) as not found (404)

### ADR: Storage batch API reports unauthorized records (403) as not found (404)
#### Status
- [x] Proposed
- [ ] Trialing
- [ ] Under review
- [ ] Approved
- [ ] Retired
#### Context & Scope
The current behavior of Storage batch API: if a record is not authorized, it is put in the _notFound_ field of the response body along with other not found records. The response body in this case looks like this:
```
{
"records": [],
"notFound": [
"opendes:facet:unauthorizedrecord1",
"opendes:facet:unauthorizedrecord2",
//other not found records...
],
"conversionStatuses": []
}
```
#### Solution
To fix this behavior of the Storage batch API, the proposed solution is to add a new field (_unauthorized_) to the response body, so we can distinguish unauthorized records from records that are actually not found. Sample response body:
```
{
"records": [],
"notFound": [
//not found records...
],
"unauthorized": [
"opendes:facet:unauthorizedrecord1",
"opendes:facet:unauthorizedrecord2"
],
"conversionStatuses": []
}
```
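A consumer such as the Indexer could then branch on the two lists. The sketch below is illustrative handling of the proposed response shape; the function and bucket names are hypothetical, while the field names follow the sample body above:

```python
# Sketch: split the proposed batch response into actionable buckets so a
# consumer can treat missing and unauthorized records differently.
def classify_batch_response(response):
    return {
        "fetched": response.get("records", []),
        "missing": response.get("notFound", []),           # truly not found
        "unauthorized": response.get("unauthorized", []),  # 403-equivalent
    }

resp = {
    "records": [],
    "notFound": ["opendes:facet:missingrecord1"],
    "unauthorized": ["opendes:facet:unauthorizedrecord1"],
    "conversionStatuses": [],
}
buckets = classify_batch_response(resp)
```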
#### Consequence
This solution is a breaking change, as it implies changing the API contract. It will include a change in the core library, a change in Storage, and then a change in the Indexer service to handle the batch API response.

## ADR: CosmosDb saturation/throttling when records reach too many versions
https://community.opengroup.org/osdu/platform/system/storage/-/issues/178
2024-03-25 · Alok Joshi

## Status
- [x] Proposed
- [ ] Trialing
- [ ] Under review
- [x] Approved
- [ ] Retired
## Context & Scope
***ISSUE***: Storage service stability issues due to too many versions of records.
***User behavior that causes this issue***: Creating a lot of versions for the same record ID. When multiple applications/teams do this long enough, we have too many versions for many records. There are no checks in place to prevent this scenario. We eventually hit infrastructure limits (i.e. CosmosDb document max size 2MB) but observe service instability much before.
***Why is this a problem***: Record versions are stored as part of record metadata. This is part of the `gcsVersionPaths` array. Each version is a string that represents the full path to the version's blob location. Record metadata is stored in CosmosDb. While CosmosDb has a hard size limit (2MB) for each document, this size is already too big when RU usage is considered. If we have hundreds or thousands of such records being updated, the total RU consumed is very high, incurring huge costs. This scenario poorly impacts service latency and availability. While not ideal, it is quite possible for applications to create versions of the same record for their workflows.
![image](/uploads/3f53fa471e7566a04d69ea539712db76/image.png)
For reference, here are some preliminary observations on the number of versions, the size of the document, and the RU consumed to perform an UPSERT on a ***single*** document. Note that the number of versions is not an ***absolute*** indicator of how much RU an UPSERT will consume, because it is the size of the document that matters, and each version string can have a different length; one can fit many more versions if each version string is short. However, as things stand today, it is the only metadata property causing documents to be large.
| Versions | RU consumed (UPSERT) | Document size |
|---------:|---------------------:|--------------:|
| ~1500    | ~300                 | ~243 KB       |
| ~1500    | ~370                 | ~300 KB       |
| ~3800    | ~1250                | ~750 KB       |
| ~5300    | ~1253                | ~880 KB       |
| ~9850    | ~2502                | ~1.3 MB       |
It is quite easy for a few hundred or thousand records to cripple the system once those records reach a certain number of versions.
***CLARIFICATION***: The issue we observed is more specific to the Azure use case. Infrastructure limitations (i.e. cost to access a large document, hard limit on the size of the document) may vary per CSP (i.e. 2MB for CosmosDb, 1MB for GCP datastore). Other CSPs may see this issue once the number of versions reaches a certain number.
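As a rough back-of-the-envelope check (not taken from the source measurements), document size grows roughly linearly with the number of versions, since each version appends one full blob-path string to `gcsVersionPaths`. The average path length and base size below are assumptions for illustration:

```python
def estimate_doc_size_bytes(num_versions, avg_path_len=160, base_size=2000):
    """Rough estimate of record-metadata document size for a given
    version count. avg_path_len and base_size are illustrative
    assumptions, not measured OSDU values."""
    return base_size + num_versions * avg_path_len
```

With these assumptions, ~9,850 versions lands in the same ballpark as the observed ~1.3 MB document size, well on the way to CosmosDb's 2 MB hard limit.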
## Tradeoff Analysis
It is clear we want to limit the number of record versions. We see 2 ways to achieve this.
1. ***Set a hard limit*** on the number of versions on each record (say 1000) (preferred approach).
- Pros: Easy to implement, no behind-the-scenes magic.
- Cons: Breaking change for the existing workflows, when their records already have more than 1000 versions. Needs advance notice of breaking change and time for teams to update the workflows.
We can roll this out by first introducing a `deleteVersion` API in Storage that would give users time to delete older versions by themselves before breaking change is introduced so they don't break immediately.
2. ***Only keep 1000 recent versions***. For new records, this would mean actively start deleting the oldest version once we reach 1000 versions. For existing records with more than 1000 versions, this would mean cleaning up all older versions.
- Pros: Older versions are cleaned up for users automatically.
   - Cons: Still a breaking change, as older versions would get deleted automatically. Involves behind-the-scenes cleanup of older versions; for records that currently have more than 1000 versions, this means cleaning up every version beyond the most recent 1000. There can be failure scenarios with cleanup, and performance implications.
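The trimming step in option 2 can be sketched as follows; the function name and the list-ordering assumption (oldest version first) are illustrative, not the actual implementation:

```python
MAX_VERSIONS = 1000  # the limit discussed above

def trim_versions(gcs_version_paths, max_versions=MAX_VERSIONS):
    """Split version paths into (kept, to_delete), keeping only the most
    recent max_versions entries. Assumes paths are appended in creation
    order, so the oldest versions sit at the front of the list."""
    excess = len(gcs_version_paths) - max_versions
    if excess <= 0:
        return gcs_version_paths, []
    return gcs_version_paths[excess:], gcs_version_paths[:excess]
```

The `to_delete` list is what the behind-the-scenes cleanup would have to remove from blob storage, which is where the failure scenarios mentioned above come in.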
## Consequences
Storage will introduce a limit on the number of versions a record can have. Depending on the solution we choose, the API will either fail after n versions (hard limit), or older versions will get deleted automatically.
Milestone: M23 - Release 0.26

https://community.opengroup.org/osdu/platform/system/storage/-/issues/177
Integration test coverage for users.data.root (2023-07-20, Rustam Lotsmanenko (EPAM))
Changes to data authentication were recently introduced with merge request https://community.opengroup.org/osdu/platform/system/storage/-/merge_requests/694. However, we currently lack integration test cases to cover these modifications.
It is essential to ensure that these changes won't disrupt the current flow and that `users.data.root` will consistently have access to ingested data.
To address this, we need to implement integration test cases covering the new data authentication mechanisms.
Milestone: M20 - Release 0.23

https://community.opengroup.org/osdu/platform/system/storage/-/issues/176
Storage x-collaboration header bug (2023-09-26, Shane Hutchins)
Found this issue in /api/storage/v2/query/records and /api/storage/v2/query/records:batch.
Received a response with 5xx status code: 500
Run these curl commands to reproduce the failure:
```
curl -X GET -H 'Authorization: Bearer TOKEN' -H 'data-partition-id: osdu' -H 'x-collaboration: ^À' 'https://osdu.r3m18.preshiptesting.osdu.aws/api/storage/v2/query/records?kind='
curl -X POST -H 'Authorization: Bearer TOKEN' -H 'data-partition-id: osdu' -H 'x-collaboration: ^À' -d '[]' https://osdu.r3m18.preshiptesting.osdu.aws/api/storage/v2/records/delete
```
PUT /api/storage/v2/records:
```
curl -X PUT -H 'Authorization: Bearer TOKEN' -H 'data-partition-id: osdu' -H 'x-collaboration: €' -d '[]' https://osdu.r3m18.preshiptesting.osdu.aws/api/storage/v2/records
```
Azure PUT /api/storage/v2/records:
```
curl -X PUT -H 'Authorization: Bearer TOKEN' -H 'data-partition-id: opendes' -H 'x-collaboration: €' -d '[]' https://osdu-ship.msft-osdu-test.org/api/storage/v2/records
```
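A 500 here suggests the malformed header value propagates into parsing instead of being rejected up front. A minimal sketch of the kind of input validation one might expect; the allowed character set is an assumption for illustration, not the OSDU spec:

```python
import re

# Assumption: x-collaboration values should be printable ASCII; anything
# containing control or non-ASCII bytes is rejected early with a 400.
PRINTABLE_ASCII = re.compile(r"^[\x20-\x7e]+$")

def check_collaboration_header(value):
    """Return 400 for malformed x-collaboration values instead of
    letting them surface later as a 500."""
    if value is None:
        return 200  # header is optional
    if not PRINTABLE_ASCII.match(value):
        return 400
    return 200
```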
Confirmed this bug in AWS and Azure.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/173
Does not detect mismatch of entity name between "kind" and "id" (2023-06-06, Debasis Chatterjee)
I made this test case to create a record directly using the Storage service, and then the same record using Manifest-based Ingestion.
```
"kind": "osdu:wks:work-product-component--TubularComponent:1.0.0",
"id": "osdu:work-product-component--TubularAssembly:TUBULARDC31May",
```
As you can see "kind" speaks of **TubularComponent** whereas "id" speaks of **TubularAssembly**.
Storage service seems very forgiving. It creates the record and also Indexer replicates the record in Index store. So, we can also retrieve by using Search service.
Manifest-based Ingestion, on the other hand, rejects this JSON payload with a suitable reason, as expected.
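The consistency check Storage could apply can be sketched as below; the helper name and the string parsing are hypothetical, based on the id and kind shapes shown above:

```python
def kind_id_entity_match(kind, record_id):
    """Compare the entity-type segment of a kind
    (authority:source:entityType:version) with the entity-type
    segment of a record id (authority:entityType:uniqueId)."""
    kind_entity = kind.split(":")[2]
    id_entity = record_id.split(":")[1]
    return kind_entity == id_entity
```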
See this document:
https://community.opengroup.org/osdu/platform/pre-shipping/-/blob/main/R3-M17/Test_Plan_Results_M17/Core_Services/M17-AWS-Storage-service-test-sanity.docx

https://community.opengroup.org/osdu/platform/system/storage/-/issues/172
Metadata update API succeeds on remove operation on a `tag` if the tag doesn't exist (2023-05-25, Alok Joshi)
Steps to reproduce:
- Create a record with some tags
- Try to update the record metadata via [metadata update API](https://community.opengroup.org/osdu/platform/system/storage/-/blob/master/docs/tutorial/StorageService.md#metadata-update-api) by removing a non-existing tag
```
curl --request PATCH \
  --url '/api/storage/v2/records' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'Data-Partition-Id: common' \
  --data-raw '{
  "query": {
    "ids": [
      "tenant1:type:unique-identifier:version"
    ]
  },
  "ops": [
    {
      "op": "remove",
      "path": "/tags",
      "value": [
        "tagthatdoesntexist"
      ]
    }
  ]
}'
```
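A minimal sketch of the server-side check one might expect for this case (function name and response shape are hypothetical): a remove operation on `/tags` should fail with a 4xx when the named tag is absent from the record.

```python
def validate_tag_removal(record_tags, op):
    """Return (status, body). record_tags holds the record's current
    tag keys; op is one entry from the PATCH request's "ops" array."""
    if op.get("op") == "remove" and op.get("path") == "/tags":
        missing = [t for t in op.get("value", []) if t not in record_tags]
        if missing:
            return 400, {"message": f"tag(s) not found: {missing}"}
    return 200, {}
```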
This should return a 4xx, but returns a 2xx.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/171
Metadata only updates (via PATCH API) create a mismatch in modifyUser and modifyTime fields between record metadata and record data (2023-07-05, Alok Joshi)
[This ADR](https://community.opengroup.org/osdu/platform/system/storage/-/issues/148) introduces separate modifyTime and modifyUser fields for every version of an OSDU Storage record. This creates a mismatch between the modifyTime and modifyUser fields of the metadata and data objects respectively.
Repro steps:
- Create a storage record
- Modify the metadata ACL with PATCH api
- Retrieve the record with Storage records:batch api or getRecord api
- modifyTime and modifyUser fields are not returned.
OR
- Create a storage record
- Update the same record with PUT api
- Modify the metadata ACL with PATCH api
- Retrieve the record
- modifyTime and modifyUser are returned but not correct
Expected: From a user's perspective, when they update a record (metadata, data, or both), they should get back appropriate modifyUser and modifyTime values.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/170
Invalidate derived data when parent record is deleted (2023-03-31, An Ngo)
Derived data (records with ancestry/parent) inherit the legal tags from the parent record(s).
So when at least one of the parent records is deleted, the child records are no longer valid. Without this step, records with invalid legal tags (or no legal tag) still exist in the system.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/169
ADR: API to purge a batch of storage records (2023-05-02, Mandar Kulkarni)
New API in the Storage service to purge a batch of records.
## Status
- [X] Proposed
- [ ] Trialing
- [ ] Under review
- [ ] Approved
- [ ] Retired
## Context & Scope
The OSDU Storage service provides 2 ways to delete a record. One way is to logically delete the record, in which case a record with the same id can be revived later because its version history is maintained. The other way is to permanently delete the record (called purging), in which case the record's version history is deleted too. This operation cannot be undone, meaning purged records cannot be revived.
In both types of deletions, the record content cannot be accessed using storage or search service.
The storage service provides separate APIs for logical deletion (`POST /records/{id}:delete`) and purging of records (`DELETE /records/{id}`).
The storage service provides API for logical deletion of batch of records (`POST /records/delete`), but such an API is not available for purging of records.
The proposal is to provide an API on the Storage service to purge a batch of records, with a maximum batch size of 500.
Only the records whose IDs are passed in the request body will be deleted; linked records and files, if they exist, are not touched. Cleaning up linked records, such as child records, records in the relationship block, and actual data (files ingested via the workflow service), is not in the scope of this API; it is the user's responsibility.
The new bulk API will work on active as well as non-active (soft-deleted) records, similar to the existing purge API.
Purging of records can be performed by the owner of the records, and the owner should be part of the `users.datalake.admins` group.
The API response would be similar to that of the logical deletion API (`POST /records/{id}:delete`).
In case of partial success, the response code would be 207, and the IDs of records that were not deleted would be listed in the response.
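Assembling that partial-success response could look like the sketch below; the names and body shape are illustrative, modeled on the description above rather than taken from a published spec:

```python
def build_purge_response(requested_ids, failed_ids):
    """Return (http_status, body): 204 when everything was purged,
    207 Multi-Status listing the records that were not deleted."""
    if not failed_ids:
        return 204, None
    return 207, {
        "recordCount": len(requested_ids),
        "notDeletedRecords": [{"id": rid} for rid in failed_ids],
    }
```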
## Tradeoff Analysis
In the absence of an API to purge a batch of records, users would have to call the DELETE API once for every record and it would increase the number of calls to the storage service.
## Decision
Provide an admin-only API to purge a batch of records, with maximum batch size of 500 records.
The Open API specs for storage service with new API is here:
[storage_openapi_batchpurge.yaml](/uploads/1da3f68253419edd693a87d706049565/storage_openapi_batchpurge.yaml)
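From the client side, a caller with more than 500 records would need to chunk requests. A hypothetical helper, where the `purge_batch` callable stands in for the actual HTTP call to the new endpoint:

```python
def purge_in_batches(record_ids, purge_batch, max_batch=500):
    """Split record_ids into chunks of at most max_batch (the proposed
    limit) and collect the IDs that each call failed to purge."""
    failed = []
    for i in range(0, len(record_ids), max_batch):
        failed.extend(purge_batch(record_ids[i:i + max_batch]))
    return failed
```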
## Consequences
- New API on Storage service would be available.
- Documentation of the Storage service should be updated with details for the new API.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/168
Storage should allow empty data block upon record creation/update (2023-03-22, An Ngo)
The Storage PUT API should allow an empty data block upon record creation/update if that is compliant with the schema being defined.
Currently, data block is required.
data: {}
This is a breaking change since it changes the behavior of the API.
The Indexer service needs to be checked to ensure an empty data block is handled correctly.

https://community.opengroup.org/osdu/platform/system/storage/-/issues/166
Need example of how to use the POST /query/records:batch Fetch multiple records (2023-04-20, Kamlesh Todai)
The Storage API documentation mentions
POST /query/records:batch (fetch multiple records). We would like a sample of how this feature is expected to be used.
Need clarification on
Account ID is the active OSDU account (OSDU account or customer's account) which the users choose to use with the Search API.
frame-of-reference: This value indicates whether normalization applies, should be either 'none' or 'units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc;'
@chad @debasisc
Milestone: M17 - Release 0.20

https://community.opengroup.org/osdu/platform/system/storage/-/issues/163
The request to get records of a particular kind using the limit is not working (2023-06-20, Kamlesh Todai)
The Storage API CI/CD v1.11 collection (from the Platform Validation project) was working on all platforms and passing with a 100% pass rate.
https://community.opengroup.org/osdu/platform/testing/-/blob/master/Postman%20Collection/12_CICD_Setup_StorageAPI/Storage%20API%20CI-CD%20v1.11.postman_collection.json
At present, it is still passing with a 100% pass rate in the AWS R3 M16 Platform Validation (forum testing) environment.
However, it is not passing with a 100% pass rate in the other Platform Validation CSP environments, nor in any CSP environment in pre-ship.
In the referenced collection, request #8 is failing:
08 - Storage - Get all records for a kind with limit of 10 records
=====================================================================
Example of a passing run in Platform Validation R3 M16 (forum testing):
curl --location 'https://r3m16.forumtesting.osdu.aws/api/storage/v2/query/records?limit=10&kind=osdu%3Awks%3AautoTest_955280%3A1.1.0' \
--header 'data-partition-id: osdu' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer eyJraWQiOi...4XnucQETfnB3biA' \
--header 'Cookie: session=eyJfZnJlc2giOmZhbHNlLCJfcGVybWFuZW50Ijp0cnVlfQ.Y_VNrw.SMJbZoZwlkMYCD7E9ge4ICPnqJY'
https://{{STORAGE_HOST}}/query/records?limit=10&kind={{authority}}:{{schemaSource}}:{{entityType}}:{{schemaVerMajor}}.{{schemaVerMinor}}.{{schemaVerPatch}}
The response code: 200 OK
{
"results": [
"osdu:999611481173:999301114394"
]
}
===================================================================
Example of when it is failing
curl --location 'https://r3m16-ue1.preshiptesting.osdu.aws/api/storage/v2/query/records?limit=10&kind=osdu%3Awks%3AautoTest_20923%3A1.1.0' \
--header 'data-partition-id: osdu' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer eyJraWQiOi...tW7kPscDabFJ3sEPeNA'
Response code: 415 Unsupported Media Type
Body of response is blank
It is the same failure for every CSP where it is happening.
============================================================================
@chad @debasisc
Milestone: M16 - Release 0.19

https://community.opengroup.org/osdu/platform/system/storage/-/issues/162
Record ACL should be case insensitive (2023-03-09, An Ngo)
Entitlements group creation always lowercases the group name, regardless of the input.
Storage honors the ACL group name case sensitivity. This creates inconsistency for ACL validation.
**For example:**<br>
User creates a data group called: data.SomeGroup.viewers<br>
Upon this request, Entitlements creates a group called: data.somegroup.viewers
Upon creating a record, the user enters data.SomeGroup.viewers as the ACL.<br>
If the user tries to fetch the record, a 403 is returned since Entitlements only sees group data.somegroup.viewers.
**Fix:**<br>
**For existing records (addressing the ghosted records):** Storage fetch record validation should lowercase the ACL group against the list of groups returned from Entitlements.<br>
**Long term solution:** The fix should be in record creation. The Storage PUT API should lowercase the ACL upon record creation, or we could fail the PUT request if the ACL group has mixed case. Note that there is no validation of ACL group existence upon record creation.
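The fetch-time part of the fix can be sketched as a case-insensitive comparison; the function name and group strings below are illustrative:

```python
def acl_grants_access(record_acl_groups, user_groups):
    """Compare record ACL groups against the user's Entitlements groups
    case-insensitively, since Entitlements stores names in lowercase."""
    user = {g.lower() for g in user_groups}
    return any(g.lower() in user for g in record_acl_groups)
```

With this check, a record carrying `data.SomeGroup.viewers` would match the Entitlements group `data.somegroup.viewers` instead of ghosting the record with a 403.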