File issues (https://community.opengroup.org/osdu/platform/system/file/-/issues)

---
https://community.opengroup.org/osdu/platform/system/file/-/issues/10
# Add driver field in /getLocation API response
*Wei Sun, 2022-09-27*

In the current File service API design, the getLocation API returns only a signed URL, but the backend storage providers support HTTP headers that can optimize data operations. I suggest adding a new `Driver` field to the getLocation response so that client applications can upload files to the signed URL in an optimized way.
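For illustration, a client could branch on the proposed `Driver` value to pick provider-specific upload headers. This is a minimal sketch; the header choices below are common provider conventions and the `AZURE` driver name is an assumption, none of it is part of the File service contract:

```python
# Hypothetical client-side helper keyed on the proposed "Driver" field.
DRIVER_HEADERS = {
    "GCS": {"x-goog-resumable": "start"},      # start a GCS resumable upload
    "AZURE": {"x-ms-blob-type": "BlockBlob"},  # required by an Azure Blob PUT
}

def upload_headers(location):
    """Return extra HTTP headers to send with the signed-URL upload, if any."""
    return DRIVER_HEADERS.get(location.get("Driver", ""), {})
```

A client that receives a response without the `Driver` field simply falls back to a plain PUT with no extra headers.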
Original response:
```json
{
  "FileID": "file ID",
  "Location": {
    "SignedURL": "GCS signed URL"
  }
}
```
Proposed change:
```json
{
  "FileID": "file ID",
  "Location": {
    "SignedURL": "GCS signed URL",
    "Driver": "GCS"
  }
}
```

---
https://community.opengroup.org/osdu/platform/system/file/-/issues/64
# ADR: Calculate Checksum before saving metadata
*Paresh Behede, 2023-07-05*

# Decision Title
Calculate checksum of uploaded file before creating its metadata
## Status
- [X] Proposed
- [X] Trialing
- [X] Under review
- [X] Approved
- [ ] Retired
## Context & Scope
A dataset--File.Generic entity record is created in the data platform when a user hits the /metadata endpoint of the File Service. This schema has a couple of useful attributes that we don't use as of now: checksum and checksum algorithm.
These attributes would be very useful for detecting duplicate file uploads in the data platform.
## Mechanism for calculating checksum
I propose to implement a new method in the core module (say, generateChecksum()) that every CSP can implement in its provider module, called before we make the call to the Storage service to save the file's metadata.
This method can be implemented in various ways and with different algorithms, at each CSP's discretion. For example, in Azure we don't need to generate the checksum explicitly because the blob store calculates it automatically, so the Azure implementation of generateChecksum() only has to fetch the blob's metadata. Other providers can implement it similarly if their storage solution also supports calculating a checksum while storing a blob.
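A provider whose storage does not compute checksums automatically could fall back to streaming the object itself. This is a minimal sketch of that fallback; the function name and default algorithm are assumptions, not the proposed core-module API:

```python
import hashlib

def generate_checksum(path, algorithm="md5"):
    """Stream a file in chunks and return its hex checksum, so large files
    are never loaded into memory at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()
```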
## Decision
We should generate the checksum of a single file before creating its metadata in the data platform, so that we can provide that checksum value in the metadata record (an instance of the dataset--File.Generic entity).

Milestone: M12 - Release 0.15

---
https://community.opengroup.org/osdu/platform/system/file/-/issues/88
# [ADR] Dataset service security enhancements
*Om Prakash Gupta, 2023-07-10*

# Decision Title
Security Enhancements for Dataset Service's Signed URL APIs
## Status
- [X] Proposed
- [ ] Trialing
- [ ] Under review
- [ ] Approved
- [ ] Retired
## Context & Scope
A customer has voiced a security concern about File Service's `POST GetStorageInstructions` and `POST GetRetrievalInstructions` APIs: a malicious user could get hold of the generated signed URLs and use them to access files from storage. When Private Link is not a desired option to mitigate these concerns for the customer, due to policy and deployment-complexity reasons, the following enhancements are proposed: changes to the two existing APIs and the introduction of a new API to alleviate the customer's security concerns.
## Decision
### Proposed Changes
1. For the `POST GetStorageInstructions` API: change the default TTL from 7 days to 1 hour, and make the TTL configurable through a query parameter `expiryTime` with time units of minutes, hours, or days. The expiry time is capped at 7 days if the value provided by the user exceeds the cap. In the absence of this parameter, the signed URL is valid for 1 hour by default.
2. For the `POST GetRetrievalInstructions` API: change the default TTL from 7 days to 1 hour, and make the TTL configurable through the same `expiryTime` query parameter with the same 7-day cap.
These two changes also make the two APIs behave consistently.
3. New API to revoke all signed URLs generated for a specified storage account. The storage account is specified through a query parameter `storageAccount`; the user can take its value from the `GetRetrievalInstructions` or `GetStorageInstructions` response.
POST api/dataset/v1/revokeURLs
This API will use the `StorageAccountRevokeUserDelegationKeys` operation to revoke all User Delegation Keys for the storage account, which revokes all user delegation SAS tokens and thus invalidates all the signed URLs.
4. Start using user delegation keys for storage accounts rather than storage account keys.
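The TTL handling in items 1 and 2 could be sketched like this. The `expiryTime` value syntax ("30M", "2H", "3D") is an assumption for illustration, since the ADR does not pin down the exact format:

```python
from datetime import timedelta

DEFAULT_TTL = timedelta(hours=1)  # applied when expiryTime is absent
MAX_TTL = timedelta(days=7)       # hard cap on any requested expiry
_UNITS = {"M": "minutes", "H": "hours", "D": "days"}

def resolve_ttl(expiry_time=None):
    """Resolve the signed-URL TTL from an `expiryTime` query parameter:
    default 1 hour, capped at 7 days."""
    if not expiry_time:
        return DEFAULT_TTL
    value, unit = int(expiry_time[:-1]), expiry_time[-1].upper()
    return min(timedelta(**{_UNITS[unit]: value}), MAX_TTL)
```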
## Rationale
A shortened TTL for the signed URLs decreases the window of opportunity for a malicious user to use the signed URLs to access any sensitive information. The additional revoke API gives customers a capability to mitigate the risk at the earliest moment if a signed-URL leak is detected.
## Consequences
**Caution**: SAS token in a Signed URL cannot be individually revoked. This API will revoke all SAS tokens generated and invalidate all signed URLs for that storage account. A user needs to send `GET uploadURL` and `GET downloadURL` requests again to generate new URLs. It should only be used when the customer knows for sure a signed URL has been compromised.
**Caution**: User Delegation Keys are cached by Azure Storage, so there may be a delay between when the user initiates the revocation and when an existing user delegation SAS becomes invalid. So after calling `POST revokeURLs`, wait for some time and verify that the compromised URL no longer works before sending `GET uploadURL` and `GET downloadURL` requests again.
These cautions need to be included in the File service OpenAPI spec and communicated to customers clearly.
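The second caution amounts to a wait-and-verify step on the caller's side. A minimal client-side sketch, with hypothetical function and parameter names:

```python
import time

def wait_until_revoked(url_still_works, timeout_s=300, poll_s=15):
    """Poll until a compromised signed URL stops working (revocation of the
    delegation keys propagates with a delay). Returns True once the URL is
    dead, or False if the timeout elapses first."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if not url_still_works():
            return True
        time.sleep(poll_s)
    return False
```

`url_still_works` would typically issue a HEAD request against the compromised URL and report whether it still returns a success status.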
## Backward Compatibility
This is NOT a breaking change.

---
https://community.opengroup.org/osdu/platform/system/file/-/issues/80
# ADR: Leverage File Service for Storage Operations
*Elizabeth Halper, 2023-07-05*

# Introduction
## Status
- [x] Initiated
- [x] Proposed
- [x] Under Review
- [ ] Approved
- [ ] Rejected
## Decision
All DMSs will leverage the File Service as a layer between the DMS and storage. Additionally, the File Service will provide methods through the exposed service interface to move data between storage tiers for all DMSs in the most cost-effective manner, based on how the data is used. File Service will provide an abstraction over all storage actions, including calls to the Partition service, so this logic will not need to be implemented in services that don't have that functionality yet. All DDMSs will use File Service for the storage of binary data, and other services will be able to leverage File Service as well.
This decision is a proposed solution to the rejection of [this ADR](https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/39)
![DataStorageFlow](/uploads/92dbaeae4a51fa893e59e8ff2cb7934c/DataStorageFlow.png)
### Add File Service Endpoints
Provide a new utility endpoint to retrieve the list of supported storage tiers. We will need to discuss how to implement an endpoint that returns this list. Additionally, other functionality, such as the finer-grained access required for SDMS storage procedures, will need to be included in File Service.
### Refactor DMS Dataset Requests
Instead of loading the Cosmos client library directly into each DMS, we will send the REST requests above to the File Service to add datasets to the database.
## Rationale
We want all storage capabilities to be available to all DMSs with the smallest amount of variation possible. Additionally, by implementing these features once in one service (File Service), the community will save a lot of time because other services will not need to change when storage features change. The example we use above is storage tiers. Although this requires an initial investment in refactoring services to leverage File Service, we will ultimately be able to implement storage tiers for all DMSs without much change to the services themselves.
## Consequences
We will need to:
- Add this additional functionality to the File Service
- "Onboard" services to using the File Service for all their storage actions
- Refactor all services to make REST requests to File Service as opposed to directly using the library
- We would need to enforce uniformity of requests given different services will be adding storage tier to their models
These tasks take a lot of time and effort, as well as collaboration across many parties. We will need all CSPs and ISVs to support this motion and contribute to ensuring all services are compliant with this decision.

---
https://community.opengroup.org/osdu/platform/system/file/-/issues/11
# ADR Master and Reference Schema versioning; SRN format
*Kateryna Kurach (EPAM), 2023-07-05*

### Change Type:
- [X] Feature
- [ ] Bugfix
- [ ] Refactoring
### Context and Scope
## 1. Reference and Master Data Schema version format
Different aspects of OSDU Reference Schemas (RS), Reference OSDU Resources populated with specific Reference Value Lists, and other OSDU schemas can change with time. It was discussed on the Data Definitions team and Reference Data Ingestion meetings that there are requirements to track these different categories of change/versioning. Many of the identified categories are below. We have added other versioning categories and clarifications as well.
1.1 For any OSDU schema, capture:
- **Schema version** - Describes the version of the Schema structure. Usually a new schema structure version will be delivered together with a new OSDU release, but minor schema versions may also be released (e.g. a schema change that simply adds a property (which is a non-breaking change)).
*Question: Is governance established that the schema version will be tracked by the schema name, or was this a temporary solution by Thomas Gehrmann? Is this documented somewhere? If not, then OSDU needs to establish the proper governance on this and document it.*
*Question: Are we capturing minor and major schema changes? If yes, how is each defined?*
- **Resource version** - Data change within the same schema version. The schema/structure itself didn't change, but a new version of the Resource was added to OSDU (e.g. because one or more property values needed to be updated). For most schemas, it is understood that data change simply creates a new Resource, with incremented version, with the different data values. However, this concept deserves special attention with Reference Data Values/Lists since changing some Reference Data Values can sometimes have massive and breaking data management consequences (e.g. Reference Lists classified by the DD&M subcommittee as “fixed” are defined by OSDU. This exact list is critical either to system functionality or to industry interoperability).
This version number must be incremented regardless of what the reason was for any change to the contents of the data, including the categories below in the Reference Values section.
- **Source** – Uniquely describes the system and/or organization from which this data object comes. Many different source versions can attempt to identify the same real-world object (such as a Wellbore) or activity (such as Production Volume reporting). (For a Wellbore, for example, this would be similar to PPDM's WELL_VERSION.)
Ideally, we could track:
- Source to my organization (value would capture an outside organization) “data.DataSourceOrganisationID” property?
- Source system/application/database “source” property?
*Note: This identifies a version of data that attempts to describe a real-world object or measurement, not a version of a data object that would need to be numerically incremented like the other version categories here.*
1.2 For Reference Values:
- **Reference Value data changes** – In addition to the general “version” resource property, the following properties are needed to better govern Reference Value lists:
- OSDU-governed: You might create a new version because of an OSDU-governed change to a reference list. The OSDU Reference List version must be captured, and incremented whenever an updated OSDU-governed list is published and subsequently used in a Reference Data resource. This applies to the OSDU-governed reference values in an “open” list, and to “fixed” governance categories of reference value lists, as determined by the OSDU Reference Values team. A way to capture this does not exist yet.
- Locally governed: You might create a new version because a governed Reference List for a particular implementation was updated, for example at an operator (i.e. the “open” and “local” reference list categories). The locally-governed Reference List version must be captured and incremented whenever the local data governance group publishes an updated list and the list is subsequently used in a Reference Data resource. This applies to the reference values in an “open” list, and to “local” governance categories of reference value lists, as determined by the OSDU Reference Values team. A way to capture this does not exist yet.
- **Attribution Authority**: For any reference value or reference list, those values and descriptions may have been created by OSDU or by an outside organization (such as PPDM or Energistics). Both OSDU and outside standards may change over time, so it is critical to capture both the source organization and the publication version of those outside standards used. This is already accommodated by the Attribution Authority, Publication, and Revision properties which are standard Reference Resource properties.
*Note: this is different than the “OSDU-governed” versioning category mentioned above. The OSDU-governed versioning category refers to a complete list of Reference Values for a particular reference object. The attribution authority is captured to each value individually. In other words, an OSDU-governed reference list could potentially include some values created by OSDU attribution authority and some from an outside attribution authority, but the list as a whole will be considered “OSDU-governed”.*
Summary: OSDU should establish clear governance to appropriately and consistently track these categories of versioning:
For any resource:
- Schema Version (might exist in schema name format; needs confirmation)
- Resource Version (exists)
Additional to Reference List resources:
- OSDU-governed list version (does not exist)
- Locally-governed list version (does not exist)
- Attribution Authority + Publication + Revision (exists)
The best solution would be to create appropriate properties for the version categories that do not yet exist.
In addition, OSDU should also capture the OSDU governance category of Reference Value Lists within the reference schema and resource itself: “Fixed”, “Open”, or “Local”. A way to capture this does not exist yet.
## 2. SRN format
A decision also has to be taken regarding the SRN format: whether or not it has to contain the corresponding schema version. Currently the SRN doesn't contain a version (e.g. "srn:<namespace>:reference-data/VerticalCRS:MSL:").
*Note: Tentatively, we think that capturing Schema Version + Resource Version in the schema name would uniquely identify resource referenced (like a foreign key).*
For reference lists, you want to be able to identify the specific version of the reference list that a WPC (e.g.) references.
However, for a WPC (a Marker, e.g.) to reference a parent Master object (a Wellbore, e.g.), it doesn't need to reference a specific point-in-time version of it; it should reference the most recent version.
If this is true, SRNs for Reference Data would need to include Schema + Resource Version in the SRN, but SRN would be more generic for all other group types.
Problem: SRN identity is uncertain.
A. Is SRN intended to uniquely define the physical real-world object in the case of Master Data (like a Wellbore)? If yes, then SRN should not contain version for Master Data references.
B. Or is SRN intended to uniquely define a data record with its version (like a GUID)? If yes, then Master Data Version should be included in the SRN.
It should not be used for both, but both must be accommodated by OSDU.
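For illustration, the two candidate SRN shapes can be captured with a small parser. The exact grammar is an assumption extrapolated from the examples in this ADR:

```python
import re

# Candidate formats discussed above:
#   srn:<namespace>:reference-data/VerticalCRS:MSL:         (no schema version)
#   srn:<namespace>:reference-data/VerticalCRS.1.0.0:MSL:   (with schema version)
_SRN = re.compile(
    r"^srn:(?P<namespace>[^:]+):(?P<group>[^/]+)/(?P<entity>[^.:]+)"
    r"(?:\.(?P<schema_version>\d+\.\d+\.\d+))?"
    r":(?P<id>[^:]*):(?P<resource_version>\d*)$"
)

def parse_srn(srn):
    """Split an SRN into its parts; schema_version is None when absent."""
    match = _SRN.match(srn)
    if not match:
        raise ValueError("not an SRN: " + srn)
    return match.groupdict()
```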
Some additional considerations:
A. Version is NOT included in an SRN.
Pros:
- It simplifies end-user aggregation of data to a single parent record. Your WPCs, created at different times, will reference the same Master data record, not a point-in-time older version of that Master record. Existing WPCs are always in the "current" state, and users do not have to enrich and create a new version of a WPC each time the corresponding RS or Master Data Schema changes.
Cons:
- It leaves the question open as to how you could have different Wellbore Versions (similar to WELL_VERSION in PPDM). It seems that this is not currently supported by the OSDU canonical schemas, but is a real use case – similar to the way you can have different versions of Trajectories in WPC.
- You can lose aspects of historical parent-child relationships/data lineage. For example, a Trajectory might have TVD calculated based on the "active" elevation of a particular Wellbore resource version. Then that Wellbore gets updated, and the newest version of that Wellbore record has a different active elevation type or active elevation value. Now the Trajectory file is out of sync in this regard with its parent Wellbore from that point in time.
B. Version is included in an SRN.
Pros:
- It potentially allows you to have different Wellbore Versions (using UWI and Source, for example, as the natural key)
- Traceability and lineage of the data
Cons:
- Raises the question of how to uniquely identify the one physical wellbore, or the “gold” Wellbore record (similar to WELL in PPDM)
- Complexities with updating existing WPCs that have links to older versions of MDS. End-user aggregations can be misaligned if there are WPCs in the system that are linked to different schema versions.
- Another consideration is related to possible future search complexity. If the SRN value changes, some WPCs could be found using the "new" SRN value while others would have to be found using the "old" SRN value.
Users will have to implement additional enrichment workflows to solve the issues related to SRN version discrepancies (and probably develop functions that detect all "outdated" SRN links). That leads to high usage of computing resources (e.g. to change all WPC SRNs to point to the new version, etc.).
### Decision
- There is a requirement to track Schema versioning. A decision has to be taken on the Schema version format (especially for Reference Schemas)
- A decision has to be taken on the SRN format: will it contain the Schema version or not ("srn:<namespace>:reference-data/VerticalCRS:MSL:" vs. "srn:<namespace>:reference-data/VerticalCRS.1.0.0:MSL:")?
### Rationale
### Consequence
- No consequences for CSPs
- Consequences for the majority of the OSDU services. A change in the Schema definition will lead to changes in the Manifest creation process as well as in the Enrichment and Delivery APIs.

---
https://community.opengroup.org/osdu/platform/system/file/-/issues/29
# ADR: Rationalizing File and File DMS services.
*Krishna Nikhil Vedurumudi, 2023-08-08*

# Decision Title
Rationalizing File and File DMS services.
## Status
* [x] Proposed
* [ ] Trialing
* [ ] Under review
* [x] Approved
* [ ] Retired
## Context & Scope
The Dataset service, in its implementation of the DMS APIs (getStorageInstructions and getRetrievalInstructions), delegates the call to the specific DMS's implementation of the `get*Instructions` API. That led to the creation of a new service called `File-DMS`, which supports the APIs that the Dataset Service requires.
> Note: The File-DMS does not have an approved ADR and is not yet approved by the community.
On the other hand, we already have a File Service that owns the responsibility for the management of files. Also, the design and APIs of the File service were approved through a thorough ADR process.
So, instead of having two services that own the similar responsibility of managing files (File and File-DMS), we should rationalize both services and move the APIs that the Dataset Service requires into the File Service itself. That would reduce the additional maintenance overhead of managing another service.
## Decision
The decision is to merge the File and File DMS services and consolidate all the file management APIs in the **File Service**, keeping in line with the approved [File Service ADR](https://community.opengroup.org/osdu/platform/system/home/-/issues/47).
## Rationale
Having multiple services with similar functionalities and responsibilities is an additional maintenance overhead. Since there is already one approved service for supporting files, **the additional file APIs required to support DMS functionalities** should be hosted by the File Service.
## Consequences
> The **DMS APIs** **getStorageInstructions** and **getRetrievalInstructions** for Files and File-Collections will be moved to File Service.
| Functionality | API to be Added in File Service | Existing API in File-DMS Service | Status | Capability |
|---------------------|-------------------------|-------------------------|--------|--------------------------------------------------|
| File Get-Storage Instructions | POST /files/storageInstructions | GET /file/getStorageInstructions | Moved from File-DMS Service to File Service | Generates the Signed URLs / temporary access tokens required to upload Files |
| File Get-Retrieval Instructions | POST /files/retrievalInstructions | POST /file/getRetrievalInstructions | Moved from File-DMS Service to File Service | Generates the Signed URLs / temporary access tokens required to download Files |
| File Collection Get-Storage Instructions | POST /file-collections/storageInstructions | GET /file-collection/getStorageInstructions | Moved from File-DMS Service to File Service | Generates the Signed URLs / temporary access tokens required to upload File Collections |
| File Collection Get-Retrieval Instructions | POST /file-collections/retrievalInstructions | POST /file-collection/getRetrievalInstructions | Moved from File-DMS Service to File Service | Generates the Signed URLs / temporary access tokens required to download File Collections |
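The path changes in the table above could be expressed as a small client-side lookup. This is a hypothetical migration shim for callers moving off File-DMS, not part of either service:

```python
# Old File-DMS paths mapped to their File Service replacements (per the
# table above); both old and new endpoints use POST after the move.
ENDPOINT_MOVES = {
    "/file/getStorageInstructions": "/files/storageInstructions",
    "/file/getRetrievalInstructions": "/files/retrievalInstructions",
    "/file-collection/getStorageInstructions": "/file-collections/storageInstructions",
    "/file-collection/getRetrievalInstructions": "/file-collections/retrievalInstructions",
}

def migrate_path(old_path):
    """Map a retired File-DMS path to its File Service replacement,
    leaving unrelated paths untouched."""
    return ENDPOINT_MOVES.get(old_path, old_path)
```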
### Pros of merging File DMS APIs into File Service
- Existing File Service clients will not be impacted because the existing APIs will continue to stay.
- The core logic for handling file uploads and downloads already exists in File Service and is well tested. So, the DMS APIs that are required for the [Dataset ADR](https://community.opengroup.org/osdu/platform/system/home/-/issues/57) will be able to hook into it directly, reducing development time.
- Features such as "uploading to a staging container" will be available out of the box. This reduces gaps between the Dataset architecture and the existing File Service, which in turn enables easier migration for clients.

---
https://community.opengroup.org/osdu/platform/system/file/-/issues/78
# ADR: Security Enhancements for File Service's Signed URL APIs
*Lucy Liu, 2023-11-29*
# Decision Title
Security Enhancements for File Service's Signed URL APIs
## Status
- [X] Proposed
- [x] Trialing
- [x] Under review
- [x] Approved
- [ ] Retired
## Context & Scope
A customer has voiced a security concern about File Service's `GET uploadURL` and `GET downloadURL` APIs: a malicious user could get hold of the generated signed URLs and use them to access files from storage. When Private Link is not a desired option to mitigate these concerns for the customer, due to policy and deployment-complexity reasons, the following enhancements are proposed: changes to the two existing APIs and the introduction of a new API to alleviate the customer's security concerns.
## Decision
### Proposed Changes
1. For the `GET uploadURL` API: change the default TTL from 7 days to 1 hour, and make the TTL configurable through a query parameter `expiryTime` with time units of minutes, hours, or days. The expiry time is capped at 7 days if the value provided by the user exceeds the cap. In the absence of this parameter, the signed URL is valid for 1 hour by default.
2. For the `GET downloadURL` API: change the default TTL from 7 days to 1 hour. The TTL is already configurable through the `expiryTime` query parameter.
These two changes also make the two APIs behave consistently.
3. New API to revoke all signed URLs generated for a specified storage account. The storage account is specified through a query parameter `storageAccount`; the user can take its value from the `GET uploadURL` or `GET downloadURL` response.
POST api/file/v2/files/revokeURLs
This API will use the `StorageAccountRevokeUserDelegationKeys` operation to revoke all User Delegation Keys for the storage account, which revokes all user delegation SAS tokens and thus invalidates all the signed URLs.
## Rationale
A shortened TTL for the signed URLs decreases the window of opportunity for a malicious user to use the signed URLs to access any sensitive information. The additional revoke API gives customers a capability to mitigate the risk at the earliest moment if a signed-URL leak is detected.
## Consequences
**Caution**: SAS token in a Signed URL cannot be individually revoked. This API will revoke all SAS tokens generated and invalidate all signed URLs for that storage account. A user needs to send `GET uploadURL` and `GET downloadURL` requests again to generate new URLs. It should only be used when the customer knows for sure a signed URL has been compromised.
**Caution**: User Delegation Keys are cached by Azure Storage, so there may be a delay between when the user initiates the revocation and when an existing user delegation SAS becomes invalid. So after calling `POST revokeURLs`, wait for some time and verify that the compromised URL no longer works before sending `GET uploadURL` and `GET downloadURL` requests again.
These cautions need to be included in the File service OpenAPI spec and communicated to customers clearly.
## Backward Compatibility
This is NOT a breaking change.

Milestone: M18 - Release 0.21

---
https://community.opengroup.org/osdu/platform/system/file/-/issues/48
# ADR: Upgrade swagger version from swagger 2.0 to swagger 3.0 (OpenAPI 3)
*Aman Verma, 2022-05-12*

## Status
- [X] Proposed
- [x] Under review
- [x] Approved
- [ ] Retired
## Context & Scope
The OpenAPI Specification (formerly the Swagger Specification) is an API description format for REST APIs. An OpenAPI file allows you to describe your entire API, including endpoints, operation parameters, authentication methods, and contact/license information.
The latest Swagger version has been released under the name OpenAPI 3.0.
### Why is the upgrade required
OpenAPI 3.0 offers several new features, such as the [servers](https://swagger.io/docs/specification/api-host-and-base-path/) field, which enable us to write automation on top of the spec. For example, if you have a tool that scans the REST endpoints exposed by your service, it can consume the openapi.json directly and enumerate all the endpoints in that service.
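For illustration, a minimal OpenAPI 3 document using the `servers` field might look like this; the title, host, and base path values below are placeholders, not the actual File service spec:

```json
{
  "openapi": "3.0.0",
  "info": { "title": "File Service", "version": "2.0" },
  "servers": [
    {
      "url": "https://{host}/api/file/v2",
      "variables": { "host": { "default": "localhost:8080" } }
    }
  ],
  "paths": {}
}
```

A scanner can read `servers[].url` to discover the base path before enumerating `paths`, which is exactly the kind of automation the 2.0 format made harder.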
### Scope
All the services
## Trade-off Analysis
- **Are there any breaking changes in REST API doc while upgrading from swagger 2.0 to swagger 3.0**: NO
- **Are there any breaking changes in UI**: There are no breaking changes in the UI per se. If anything, it has become more fluid and lightweight.
- **How involved are the code changes**: For most of the services, it's a matter of upgrading the Maven package with minor changes here and there. For a few services the changes can be more involved, as some of the annotations have changed between 2.0 and 3.0; for example, `@Api` has become `@Tag`. More details are documented here: https://springdoc.org/#migrating-from-springfox
- **Would the URL change**: YES. The Swagger home page for 2.0 is `swagger-ui.html`, while for 3.0 it is `swagger-ui/index.html`. However, we also have an explicit endpoint `/swagger` (or equivalent) which can be leveraged to abstract the underlying Swagger URL. Users can continue visiting the Swagger home page by hitting the `/swagger` endpoint.
## Decision
cc: @kibattul @madhurtanwani

---
https://community.opengroup.org/osdu/platform/system/file/-/issues/16
# ADR Validation Service
*Kateryna Kurach (EPAM), 2020-10-26*

### Change Type:
- [X] Feature
- [ ] Bugfix
- [ ] Refactoring
### Context and Scope
The problem and the solution that we are raising are related to the already submitted issue:
https://community.opengroup.org/osdu/platform/system/home/-/issues/37
In addition to the scenarios described in the original issue, we see some scenarios where schema / data validation should be enforced. Some of these scenarios require a more sophisticated validation approach than the one described in the original issue:
**A. Schema Structure Validation scenarios:**
1. Manifest Schema structure validation during Ingestion
The manifest file structure should be aligned with the corresponding schema structure; otherwise ingestion fails. It would improve the user experience to provide a report of the discrepancies between the manifest file and the corresponding schema structure. In this case a user will know which parts of the manifest must be corrected to run the ingestion flow.
2. Continuous Schema structure change
It is likely that a specific WPC schema structure will change over time. The complexity of the change can vary: one attribute may be added or deleted, or many. An operator will need to enrich older WPCs (created with the previous version) and create new versions of these WPCs. To simplify the selection of the WPCs that need this, the validation service can be run to compare older WPCs with the new schema structure and produce a report listing all discrepancies found.
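The schema-comparison report in scenario A.2 could be sketched as a simple set difference over attribute names (all class and attribute names below are assumptions for illustration, not an agreed contract):

```java
import java.util.*;

// Hypothetical sketch: compare the attribute set of a stored WPC record
// against a newer schema version and report discrepancies in both directions.
public class SchemaDiff {
    /** Attributes present in the new schema but missing from the record, and vice versa. */
    static List<String> discrepancies(Set<String> recordAttrs, Set<String> schemaAttrs) {
        List<String> report = new ArrayList<>();
        for (String a : schemaAttrs)
            if (!recordAttrs.contains(a)) report.add("missing in record: " + a);
        for (String a : recordAttrs)
            if (!schemaAttrs.contains(a)) report.add("not in schema: " + a);
        return report;
    }

    public static void main(String[] args) {
        Set<String> record = new TreeSet<>(Arrays.asList("Name", "Depth"));
        Set<String> schema = new TreeSet<>(Arrays.asList("Name", "Depth", "Datum"));
        System.out.println(discrepancies(record, schema)); // [missing in record: Datum]
    }
}
```

A real implementation would walk nested schema properties rather than a flat attribute set, but the shape of the report is the same.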
**B. Data Validation scenarios:**
3. Reference SRN checks
WPC and Master Data Manifests contain SRN references to Reference Data values (including “fixed” Reference data schemas values). That means that corresponding Reference Data must be ingested into OSDU before Master Data or WPC data are ingested.
If a user tries to ingest WPC / Master data when a reference value does not exist, ingestion should be terminated, and the user should know what caused the termination.
As a solution, a validation check should be implemented at the ingestion step to help the user: it verifies that all Reference data values linked to SRNs in the manifest are present in OSDU. The user should be able to get a report telling them which validation checks failed, so they can ingest the corresponding Reference data and then proceed with WPC or Master Data ingestion.
4. Master Data SRN checks
Similarly to the scenario described above, Master data should ideally be ingested before WPC.
However, if Master Data for WPC is unavailable, the ingestion workflow should be configurable:
a. Either WPC ingestion should be rejected
b. Or the ingestion workflow should allow creation of an "orphan" WPC (the linked Master data doesn't exist in OSDU, but the WPC is created) and somehow "tag" the properties that are missing real SRN values. Enrichment of the "orphan" WPCs should be done later, after the corresponding Master data is ingested.
5. Multiple data quality scenarios
There is a need to provide a mechanism for data quality checks on the manifest file content (e.g. validating that x, y coordinates in the resource correspond to the resource's geo entity value, not-null property validation, etc.). These checks can be implemented during ingestion and post-ingestion.
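The SRN reference check in scenarios B.3/B.4 boils down to asking which referenced SRNs are not yet ingested. A minimal sketch (the lookup set stands in for a real Storage/Search query; all names and example SRNs are assumptions):

```java
import java.util.*;

// Hypothetical sketch of the Reference/Master SRN existence check.
public class SrnCheck {
    /** Return the SRNs referenced by a manifest that are not yet ingested into OSDU. */
    static List<String> missingReferences(List<String> referencedSrns, Set<String> ingestedSrns) {
        List<String> missing = new ArrayList<>();
        for (String srn : referencedSrns)
            if (!ingestedSrns.contains(srn)) missing.add(srn);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> ingested = new HashSet<>(List.of("srn:reference-data/UnitOfMeasure:m:"));
        List<String> referenced = List.of("srn:reference-data/UnitOfMeasure:m:",
                                          "srn:master-data/Well:1234:");
        // A non-empty result would terminate (or flag) the ingestion run.
        System.out.println(missingReferences(referenced, ingested)); // [srn:master-data/Well:1234:]
    }
}
```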
### Suggested Implementation Approach
The suggested approach is to develop a Validation service that provides an API contract to validate a virtual object (a JSON manifest). This allows a user to run validation rules over a stored resource record as well as over an in-progress manifest, which gives the flexibility to run validation at different stages of the data lifecycle.
Validation API will allow user to:
1. Register and store a validation rule
Rules should be configured as a pluggable code.
It is up to the individual Operator to create a code for pluggable rules.
We can consider supplying several rules out of the box. For example, rules #1 and #2 described above, related to schema structure validation, can be created by the OSDU team. Rules #3 and #4, related to Reference and Master Data validation, can also be developed by the OSDU team, but the workflow configuration based on the results of these rules should be up to the Operator. It is likewise up to the individual Operator to configure data quality rules.
2. Send a validation request
Validation Service calls can be "plugged into" different OSDU services: Ingestion, Enrichment, pre-Ingestion, etc.
3. Produce a response with validation rule results
The response can be generated in different formats (must be negotiated with the Data Definitions team):
- Updated original Manifest object: “Extended properties” block in the schema
- Updated original Manifest object: additional attribute can be developed to store property validation result
- Separate json file can be generated
Depending on the validation result, additional steps in the data workflow can be taken (e.g. ignore validation results and just store them or trigger another DAG).
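The register/validate/report cycle above could be captured by a pluggable-rule contract along these lines (interface, class, and field names are assumptions, not an agreed API):

```java
import java.util.*;

// Hypothetical sketch of the pluggable validation-rule contract the ADR proposes.
public class ValidationRegistry {
    interface ValidationRule {
        String name();
        List<String> validate(Map<String, Object> manifest); // empty list = pass
    }

    private final List<ValidationRule> rules = new ArrayList<>();

    /** Step 1: register and store a validation rule (pluggable code). */
    void register(ValidationRule rule) { rules.add(rule); }

    /** Steps 2-3: run every registered rule and collect a combined report. */
    Map<String, List<String>> validate(Map<String, Object> manifest) {
        Map<String, List<String>> report = new LinkedHashMap<>();
        for (ValidationRule r : rules) report.put(r.name(), r.validate(manifest));
        return report;
    }

    public static void main(String[] args) {
        ValidationRegistry registry = new ValidationRegistry();
        registry.register(new ValidationRule() {
            public String name() { return "not-null-kind"; }
            public List<String> validate(Map<String, Object> manifest) {
                return manifest.get("kind") == null
                        ? List.of("kind must not be null") : List.of();
            }
        });
        System.out.println(registry.validate(Map.of("data", "value")));
    }
}
```

The report map here corresponds to the "separate JSON file" response option; the other two options would instead write the per-property results back into the manifest object.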
**Pros of implementing Validation functionality as a service:**
1. Can work over physical resource record and over manifest that hasn’t been ingested yet
2. Validation requests can be sent by Java and Python applications
3. Validation checks can be configurable
4. Flexibility when validation check has to be applied (Pre-Ingestion, Ingestion, Enrichment etc).
![R3_Ingestion_-_R3_Functional_Mapping](/uploads/338a1a2c6fff120354c35bda8cc68016/R3_Ingestion_-_R3_Functional_Mapping.png)
![R3_Ingestion_-_Validation_Service](/uploads/779552d0366565cf953e571e0d047507/R3_Ingestion_-_Validation_Service.png)
### Rationale
### Consequence
Additional service development and maintenance.

https://community.opengroup.org/osdu/platform/system/file/-/issues/70
Already generated pre-signed URL can be used to upload a file multiple times (2022-09-27, Piyush Talwalkar)

Steps to reproduce the issue:
- Generate a pre-signed URL
- Upload a file using the pre-signed URL generated in step 1
- Post metadata for the uploaded file and verify that the file is moved to the persistent location
- Try to upload another file using the same pre-signed URL from step 1
Expected Result:
The pre-signed URL should be usable only once, for a single file
Actual Result:
An already generated pre-signed URL can be used to upload a file multiple times

https://community.opengroup.org/osdu/platform/system/file/-/issues/26
API Documentation missing indication that file is moved from staging area to persistent area (2023-05-15, Alan Henson)

In reviewing the [OpenAPI Spec](https://community.opengroup.org/osdu/platform/system/file/-/blob/master/docs/file-service_openapi.yaml) for File Service, members of the meeting noticed that the documentation for the `/v1/files/metadata` endpoint does not mention that it moves the file from a staging area to a persistent area. The request is as follows:
- Update the OpenAPI Spec documentation linked above to mention the file is moved from a staging area to a persistent area
- The code that performs that operation within the above endpoint is found [here](https://community.opengroup.org/osdu/platform/system/file/-/blob/master/file-core/src/main/java/org/opengroup/osdu/file/service/FileMetadataService.java#L66)

https://community.opengroup.org/osdu/platform/system/file/-/issues/86
/api/file/v2/files bugs (2023-09-26, Shane Hutchins)

Received a response with 5xx status code: 500. I expected a 404 from these APIs but got 500 Internal Server Error.
Run these curl commands to reproduce the failure:
GET /api/file/v2/files/%0A/metadata:

```shell
curl -X GET -H 'Authorization: Bearer TOKEN' -H 'data-partition-id: osdu' https://osdu.r3m18.preshiptesting.osdu.aws/api/file/v2/files/%0A/metadata
```

DELETE /api/file/v2/files/%3B/metadata:

```shell
curl -X DELETE -H 'Authorization: Bearer TOKEN' -H 'data-partition-id: osdu' https://osdu.r3m18.preshiptesting.osdu.aws/api/file/v2/files/%3B/metadata
```

GET /api/file/v2/files/%3B/downloadURL:

```shell
curl -X GET -H 'Authorization: Bearer TOKEN' -H 'data-partition-id: osdu' https://osdu.r3m18.preshiptesting.osdu.aws/api/file/v2/files/%3B/downloadURL
```
Confirmed this issue on AWS and Azure.
```shell
curl -X GET -H 'Authorization: Bearer TOKEN' -H 'Cookie: JSESSIONID=SESSIONIDHERE' -H 'data-partition-id: opendes' https://osdu-ship.msft-osdu-test.org/api/file/v2/files/%0A/metadata
```

https://community.opengroup.org/osdu/platform/system/file/-/issues/87
Apply role-based access to File V2 endpoints (2023-08-07, Rustam Lotsmanenko (EPAM))

The File V2/DMS API doesn't use Authorization filters (@PreAuthorize) and doesn't evaluate the roles of the requester, which could lead to data leaks.
Also, it was marked as Hidden, but this rule was not applied automatically at the Infra level.
https://community.opengroup.org/osdu/platform/system/file/-/blob/master/file-core/src/main/java/org/opengroup/osdu/file/api/FileDmsApi.java#L57
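The real fix would be Spring's `@PreAuthorize` on the DMS endpoints, but the role comparison it performs amounts to the following plain-Java sketch (the role names are assumptions, not the service's actual group names):

```java
import java.util.*;

// Hypothetical sketch of the missing check: does the requester hold any role
// that is allowed to call the endpoint?
public class RoleGuard {
    static final Set<String> DMS_WRITERS = Set.of("service.file.editors", "service.file.admin");

    /** True if the requester holds at least one role allowed to call the endpoint. */
    static boolean isAuthorized(Set<String> requesterRoles) {
        for (String role : requesterRoles)
            if (DMS_WRITERS.contains(role)) return true;
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isAuthorized(Set.of("service.file.viewers"))); // false
        System.out.println(isAuthorized(Set.of("service.file.editors"))); // true
    }
}
```

Closing the route at the Istio level hides the endpoint from outside callers, but only a per-request role check like this also covers internal traffic, which is the second potential issue listed below.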
Potential issues:
- If not closed from Istio, data leaks are possible.
- Even if closed from the outside, authorization of internal requests will not be evaluated.

Milestone: M19 - Release 0.22; Assignees: Oleksandr Kosse (EPAM), Riabokon Stanislav (EPAM) [GCP], Andrei Dalhikh [EPAM/GC]

https://community.opengroup.org/osdu/platform/system/file/-/issues/83
Checksum values do not match up - value prior to upload, value as auto-populated by File service and persisted in Dataset record (2023-07-10, Debasis Chatterjee)

Please see my recent test in Azure/M17/Preship.
https://community.opengroup.org/osdu/platform/pre-shipping/-/blob/main/R3-M17/Test_Plan_Results_M17/Core_Services/M17-Azuere-Core-File-and-Dataset-steps-Debasis.zip
Prior to uploading the file, I computed the checksum value on the Linux operating system.
After the file is uploaded and the Dataset record is created, I compare it with the value auto-populated by File Service.
The values do not match.
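One plausible cause, offered only as an assumption and not a confirmed diagnosis: Linux tools such as `md5sum` print the digest as lowercase hex, while some cloud SDKs report the *same* digest Base64-encoded, so the strings differ even when the underlying bytes match. A sketch showing both encodings of one digest:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

// Illustration: one MD5 digest, two textual encodings that look "different".
public class ChecksumForms {
    static String hex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(hex(digest));
        // 5d41402abc4b2a76b9719d911017c592  (what md5sum shows)
        System.out.println(Base64.getEncoder().encodeToString(digest));
        // XUFAKrxLKna5cZ2REBfFkg==          (what an SDK may persist)
    }
}
```

If this is the cause, decoding both values to bytes (or normalizing to one encoding) before comparing would show whether the checksums actually agree.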
Please check this.

Milestone: M19 - Release 0.22; Assignee: Chad Leong

https://community.opengroup.org/osdu/platform/system/file/-/issues/44
Compilation failure in Master (2021-09-29, Abhishek Kumar (SLB))

There is a compilation issue in the master branch.
@ethiraj : Please assign it to the right person.
Job [#619104](https://community.opengroup.org/osdu/platform/system/file/-/jobs/619104) failed for 384101a9fe1c781d5e2783fce313803344c6276a

Assignees: ethiraj krishnamanaidu, Rustam Lotsmanenko (EPAM), Riabokon Stanislav (EPAM) [GCP]

https://community.opengroup.org/osdu/platform/system/file/-/issues/49
Compilation failure in Master Branch (2021-11-18, Riabokon Stanislav (EPAM) [GCP])

https://community.opengroup.org/osdu/platform/system/file/-/pipelines/77475
```
ERROR: Uploading artifacts as "archive" to coordinator... too large archive id=723684 responseStatus=413 Request Entity Too Large status=413 token=6k_YiZaq
FATAL: too large
```
Could you increase the limit?

Milestone: M10 - Release 0.13; Assignee: David Diederich (d.diederich@opengroup.org)

https://community.opengroup.org/osdu/platform/system/file/-/issues/12
Complete Azure File Service CI/CD and Tests (2020-09-16, Dania Kodeih (Microsoft)); Milestone: M1 - Release 0.1

https://community.opengroup.org/osdu/platform/system/file/-/issues/13
Complete CI/CD and Integration Tests for Azure (2020-10-15, Dania Kodeih (Microsoft)); Milestone: M1 - Release 0.1

https://community.opengroup.org/osdu/platform/system/file/-/issues/77
Concurrent file metadata create calls with same file source path results in 500 (2022-10-03, harshit aggarwal)

During the file metadata create flow, if concurrent requests with the same file source path are made, the service returns 500 errors: the file gets cleaned up from the staging location, so the exceptions thrown at cloudStorageOperation.deleteFile(stagingLocation) are handled as generic internal server errors, which is not the correct response code.
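A sketch of the suggested remediation (the status-code choice and method name are assumptions): translate a "blob already gone" failure during staging cleanup into a client-visible 4xx instead of a generic 500.

```java
// Hypothetical mapping from the cleanup failure mode to an HTTP status.
public class StagingCleanupStatus {
    static int statusForCleanupFailure(boolean blobMissing) {
        // A concurrent request already moved/deleted the staged file: the
        // client's input referenced a no-longer-existing source -> 4xx.
        if (blobMissing) return 409; // or 404, per CSP choice
        return 500;                  // genuinely unexpected failure
    }

    public static void main(String[] args) {
        System.out.println(statusForCleanupFailure(true));  // 409
        System.out.println(statusForCleanupFailure(false)); // 500
    }
}
```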
CSPs can handle these exceptions in their implementations to throw 4xx errors instead.

https://community.opengroup.org/osdu/platform/system/file/-/issues/1
[Data flow/Ingestion] Ingestion code sync from GitHub to ADO (2023-04-25, ethiraj krishnamanaidu)

Google's team is working on Ingestion services, and their internal process is to push the code to GitHub, which creates some challenges for the R2 Development team.
As discussed and agreed last week, we need to make sure that we push the Ingestion, File, and Delivery Service code from GitHub to ADO for the R2 Development team so that all cloud providers can contribute/develop SPIs.
@Stephen Henderson volunteered to work with the Google team (@fargyle) to set up the process to move the code from GitHub to ADO. The initial code was pushed to GitHub on Feb 10th.
* GitHub code sync with ADO, we need a process in place to sync every day and this is not a onetime task.
* Mono-repo structure: we need to follow the agreed-upon core services structure where each service is a different repo. We are not going to discuss mono-repo vs multi-repo in R2; it does not matter how it's managed in GitHub, but when we push to ADO we need to follow the standards.
* We noticed an os-core-common library within osdu-r2; it is a duplicate. We need to make sure we use the core-common library that we have created for the core services.
* I don't see SPIs in the providers folder; we will have to follow the core service package structure.
* Integration Tests
ethiraj krishnamanaidu, Ferris Argyle, Joe, ethiraj krishnamanaidu; 2020-02-21