File issues: https://community.opengroup.org/osdu/platform/system/file/-/issues

---

Issue #78 - ADR: Security Enhancements for File Service's Signed URL APIs (Lucy Liu, updated 2023-11-29)
# Decision Title
Security Enhancements for File Service's Signed URL APIs
## Status
- [X] Proposed
- [x] Trialing
- [x] Under review
- [x] Approved
- [ ] Retired
## Context & Scope
A customer has voiced a security concern about File Service's `GET uploadURL` and `GET downloadURL` APIs: a malicious user who gets hold of a generated signed URL can use it to access files in storage. When Private Link is not a desired mitigation for the customer, for policy and deployment-complexity reasons, the following enhancements to the two existing APIs, plus a new API, are proposed to alleviate the customer's security concerns.
## Decision
### Proposed Changes
1. For `GET uploadURL` API: change the default TTL from 7 days to 1 hour and make the TTL configurable through a query parameter `expiryTime`, expressed in minutes, hours, or days. The expiry time is capped at 7 days if the value provided by the user exceeds that cap. In the absence of this parameter, the signed URL is valid for 1 hour by default.
2. For `GET downloadURL` API: change the default TTL from 7 days to 1 hour. The TTL is already configurable through the query parameter `expiryTime`.
These two changes also make the two APIs behave consistently.
3. New API to revoke all signed URLs generated for a specified storage account. The storage account is specified through a query parameter `storageAccount`; the user can obtain it from the `GET uploadURL` or `GET downloadURL` response.
`POST api/file/v2/files/revokeURLs`
This API will call `StorageAccountRevokeUserDelegationKeys` to revoke all user delegation keys for the storage account, which invalidates all user delegation SAS tokens and therefore all signed URLs generated for that account.
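As a sketch, the proposed TTL capping could look like the following. The method name `parseExpiryTime` and the `30M`/`4H`/`2D` value format are illustrative assumptions, not the actual File Service implementation:

```java
import java.time.Duration;

public class ExpiryTimeParser {
    static final Duration DEFAULT_TTL = Duration.ofHours(1); // proposed default when expiryTime is absent
    static final Duration MAX_TTL = Duration.ofDays(7);      // proposed hard cap

    /** Parse values like "30M", "4H" or "2D"; cap at 7 days; default to 1 hour. */
    static Duration parseExpiryTime(String expiryTime) {
        if (expiryTime == null || expiryTime.isBlank()) {
            return DEFAULT_TTL;
        }
        String value = expiryTime.trim().toUpperCase();
        long amount = Long.parseLong(value.substring(0, value.length() - 1));
        Duration requested;
        switch (value.charAt(value.length() - 1)) {
            case 'M': requested = Duration.ofMinutes(amount); break;
            case 'H': requested = Duration.ofHours(amount);   break;
            case 'D': requested = Duration.ofDays(amount);    break;
            default:  throw new IllegalArgumentException("Unsupported expiryTime unit: " + expiryTime);
        }
        return requested.compareTo(MAX_TTL) > 0 ? MAX_TTL : requested;
    }
}
```

With this shape, `expiryTime=10D` would silently be reduced to 7 days rather than rejected, matching the capping behaviour described above.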
## Rationale
A shortened TTL for signed URLs narrows the window of opportunity for a malicious user to use a leaked signed URL to access sensitive information; the additional revocation API gives customers the capability to mitigate the risk at the earliest moment if a signed-URL leak is detected.
## Consequences
**Caution**: The SAS token in a signed URL cannot be individually revoked. This API revokes all SAS tokens and invalidates all signed URLs for that storage account; a user then needs to send `GET uploadURL` and `GET downloadURL` requests again to generate new URLs. It should only be used when the customer knows for sure a signed URL has been compromised.
**Caution**: User delegation keys are cached by Azure Storage, so there may be a delay between when the user initiates revocation and when an existing user delegation SAS becomes invalid. After calling `POST revokeURLs`, wait for some time and verify that the compromised URL no longer works before sending `GET uploadURL` and `GET downloadURL` requests again.
These cautions need to be included in the File Service OpenAPI spec and communicated clearly to customers.
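Because of the propagation delay described above, a caller may want to poll the compromised URL until revocation takes effect before reissuing URLs. The following is a hypothetical client-side helper, not part of File Service; `statusCheck` stands in for a real HTTP probe (e.g. via `java.net.http.HttpClient`) against the old signed URL:

```java
import java.util.function.IntSupplier;

public class RevocationVerifier {
    /**
     * Poll the compromised signed URL after POST revokeURLs until storage
     * rejects its SAS token (HTTP 403). statusCheck stands in for an HTTP
     * probe; sleepMillis allows for the cached-key propagation delay.
     */
    static boolean waitUntilRevoked(IntSupplier statusCheck, int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (statusCheck.getAsInt() == 403) {
                return true; // old URL is now rejected; safe to request new signed URLs
            }
            try {
                Thread.sleep(sleepMillis); // wait out the user delegation key cache
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false; // still accessible; keep waiting before reissuing URLs
    }
}
```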
## Backward Compatibility
This is NOT a breaking change.

M18 - Release 0.21 · Om Prakash Gupta

---

Issue #29 - ADR: Rationalizing File and File DMS services (Krishna Nikhil Vedurumudi, updated 2023-08-08)

# Decision Title
Rationalizing File and File DMS services.
## Status
* [x] Proposed
* [ ] Trialing
* [ ] Under review
* [x] Approved
* [ ] Retired
## Context & Scope
The Dataset service, in its implementation of the DMS APIs (getStorageInstructions and getRetrievalInstructions), delegates the call to the specific DMS's implementation of the `get*Instructions` API. That led to the creation of a new service called `File-DMS`, which supports the APIs that Dataset Service requires.
> Note: The File-DMS does not have an approved ADR and has not been approved by the community yet.
On the other hand, we already have a File Service that owns the responsibility for the management of files. Also, the design and APIs of the File service are approved by a thorough process of ADR.
So, instead of having two services that own the similar responsibility of managing files (File and File-DMS), we should rationalize the two services and move the APIs that Dataset Service requires into File Service itself. This would reduce the additional maintenance overhead of managing another service.
## Decision
The decision is to merge the File and File DMS services and consolidate all the file management APIs in **File Service**, in line with the approved [File Service ADR](https://community.opengroup.org/osdu/platform/system/home/-/issues/47).
## Rationale
Having multiple services with similar functionalities and responsibilities is additional maintenance overhead. Since there is already one approved service for supporting files, **the additional file APIs required to support DMS functionality** should be hosted by File Service.
## Consequences
> The **DMS APIs** **getStorageInstructions** and **getRetrievalInstructions** for Files and File-Collections will be moved to File Service.
| Functionality | API to be Added in File Service | Existing API in File-DMS Service | Status | Capability |
|---------------------|-------------------------|-------------------------|--------|--------------------------------------------------|
| File Get-Storage Instructions | POST /files/storageInstructions | GET /file/getStorageInstructions | Moved from File-DMS Service to File Service | Generates the Signed URLs / temporary access tokens required to upload Files |
| File Get-Retrieval Instructions | POST /files/retrievalInstructions | POST /file/getRetrievalInstructions | Moved from File-DMS Service to File Service | Generates the Signed URLs / temporary access tokens required to download Files |
| File Collection Get-Storage Instructions | POST /file-collections/storageInstructions | GET /file-collection/getStorageInstructions | Moved from File-DMS Service to File Service | Generates the Signed URLs / temporary access tokens required to upload File Collections |
| File Collection Get-Retrieval Instructions | POST /file-collections/retrievalInstructions | POST /file-collection/getRetrievalInstructions | Moved from File-DMS Service to File Service | Generates the Signed URLs / temporary access tokens required to download File Collections |
### Pros of merging File DMS APIs into File Service
- Existing File Service clients will not be impacted because the existing APIs will remain.
- The core logic for handling file uploads and downloads already exists in File Service and is well tested, so the DMS APIs required for the [Dataset ADR](https://community.opengroup.org/osdu/platform/system/home/-/issues/57) can hook directly into it, reducing development time.
- Features such as "Uploading to staging container" will be available out of the box. This reduces gaps between the Dataset architecture and the existing File Service, which in turn enables easier migration for clients.

Assignees: Joe, Madhur Tanwani [Microsoft]

---

Issue #87 - Apply role-based access to File V2 endpoints (Rustam Lotsmanenko (EPAM), updated 2023-08-07)

File V2/DMS API doesn't use authorization filters (`@PreAuthorize`) and doesn't evaluate the roles of the requester, which could lead to data leaks.
Also, the API was marked as Hidden, but this rule was not applied automatically at the infrastructure level.
https://community.opengroup.org/osdu/platform/system/file/-/blob/master/file-core/src/main/java/org/opengroup/osdu/file/api/FileDmsApi.java#L57
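For context, a `@PreAuthorize` guard on such an endpoint ultimately reduces to evaluating the requester's roles against the roles the endpoint requires before the handler runs. A framework-free sketch of that underlying check (the method shape and role names are hypothetical illustrations, not the service's actual authorization filter):

```java
import java.util.Set;

public class RoleGuard {
    /**
     * The essence of what a @PreAuthorize expression evaluates before the
     * endpoint body runs: does the requester hold any of the required roles?
     * Role names used with this method are hypothetical examples.
     */
    static boolean isAuthorized(Set<String> requesterRoles, Set<String> requiredRoles) {
        return requiredRoles.stream().anyMatch(requesterRoles::contains);
    }
}
```

Without this evaluation at the application layer, the endpoints rely entirely on network-level controls, which is the gap the issue describes.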
Potential issues:
- If not closed from Istio, data leaks are possible.
- Even if closed from the outside, authorization of internal requests will not be evaluated.

M19 - Release 0.22 · Oleksandr Kosse (EPAM), Riabokon Stanislav (EPAM)[GCP], Andrei Dalhikh [EPAM/GC]

---

Issue #90 - Fixing sonar quality issues (Gauri Chitale, updated 2023-07-28)

Sonarqube analysis has reported multiple code smells in the File Service code.

---

Issue #88 - [ADR] Dataset service security enhancements (Om Prakash Gupta, updated 2023-07-10)

# Decision Title
Security Enhancements for Dataset Service's Signed URL APIs
## Status
- [X] Proposed
- [ ] Trialing
- [ ] Under review
- [ ] Approved
- [ ] Retired
## Context & Scope
A customer has voiced a security concern about Dataset Service's `POST GetStorageInstructions` and `POST GetRetrievalInstructions` APIs: a malicious user who gets hold of a generated signed URL can use it to access files in storage. When Private Link is not a desired mitigation for the customer, for policy and deployment-complexity reasons, the following enhancements to the two existing APIs, plus a new API, are proposed to alleviate the customer's security concerns.
## Decision
### Proposed Changes
1. For `POST GetStorageInstructions` API: change the default TTL from 7 days to 1 hour and make the TTL configurable through a query parameter `expiryTime`, expressed in minutes, hours, or days. The expiry time is capped at 7 days if the value provided by the user exceeds that cap. In the absence of this parameter, the signed URL is valid for 1 hour by default.
2. For `POST GetRetrievalInstructions` API: change the default TTL from 7 days to 1 hour and make the TTL configurable through a query parameter `expiryTime`, expressed in minutes, hours, or days. The expiry time is capped at 7 days if the value provided by the user exceeds that cap.
These two changes also make the two APIs behave consistently.
3. New API to revoke all signed URLs generated for a specified storage account. The storage account is specified through a query parameter `storageAccount`; the user can obtain it from the `GetRetrievalInstructions` or `GetStorageInstructions` response.
`POST api/dataset/v1/revokeURLs`
This API will call `StorageAccountRevokeUserDelegationKeys` to revoke all user delegation keys for the storage account, which invalidates all user delegation SAS tokens and therefore all signed URLs generated for that account.
4. Start using user delegation keys for storage accounts rather than storage account keys.
## Rationale
A shortened TTL for signed URLs narrows the window of opportunity for a malicious user to use a leaked signed URL to access sensitive information; the additional revocation API gives customers the capability to mitigate the risk at the earliest moment if a signed-URL leak is detected.
## Consequences
**Caution**: The SAS token in a signed URL cannot be individually revoked. This API revokes all SAS tokens and invalidates all signed URLs for that storage account; a user then needs to send `POST GetStorageInstructions` and `POST GetRetrievalInstructions` requests again to generate new URLs. It should only be used when the customer knows for sure a signed URL has been compromised.
**Caution**: User delegation keys are cached by Azure Storage, so there may be a delay between when the user initiates revocation and when an existing user delegation SAS becomes invalid. After calling `POST revokeURLs`, wait for some time and verify that the compromised URL no longer works before requesting new instructions.
These cautions need to be included in the Dataset Service OpenAPI spec and communicated clearly to customers.
## Backward Compatibility
This is NOT a breaking change.

---

Issue #11 - ADR Master and Reference Schema versioning; SRN format (Kateryna Kurach (EPAM), updated 2023-07-05)

### Change Type:
- [X] Feature
- [ ] Bugfix
- [ ] Refactoring
### Context and Scope
## 1. Reference and Master Data Schema version format
Different aspects of OSDU Reference Schemas (RS), Reference OSDU Resources populated with specific Reference Value Lists, and other OSDU schemas can change with time. It was discussed on the Data Definitions team and Reference Data Ingestion meetings that there are requirements to track these different categories of change/versioning. Many of the identified categories are below. We have added other versioning categories and clarifications as well.
1.1 For any OSDU schema, capture:
- **Schema version** - Describes the version of the schema structure. Usually a new schema structure version will be delivered together with a new OSDU release, but minor schema versions may also be released (e.g. a schema change that simply adds a property, which is a non-breaking change).
*Question: Is governance established that the schema version will be tracked by the schema name, or was this a temporary solution by Thomas Gehrmann? Is this documented somewhere? If not, then OSDU needs to establish the proper governance on this and document it.*
*Question: Are we capturing minor and major schema changes? If yes, how is each defined?*
- **Resource version** - Data change within the same schema version. The schema/structure itself didn't change, but a new version of the Resource was added to OSDU (e.g. because one or more property values needed to be updated). For most schemas, it is understood that data change simply creates a new Resource, with incremented version, with the different data values. However, this concept deserves special attention with Reference Data Values/Lists since changing some Reference Data Values can sometimes have massive and breaking data management consequences (e.g. Reference Lists classified by the DD&M subcommittee as “fixed” are defined by OSDU. This exact list is critical either to system functionality or to industry interoperability).
This version number must be incremented regardless of what the reason was for any change to the contents of the data, including the categories below in the Reference Values section.
- **Source** – Uniquely describes the system and/or organization from which this data object comes. Many different source versions can attempt to identify the same real-world object (such as a Wellbore) or activity (such as Production Volume reporting). (For a Wellbore, for example, this would be similar to PPDM’s WELL_VERSION.)
Ideally, we could track:
- Source to my organization (value would capture an outside organization) “data.DataSourceOrganisationID” property?
- Source system/application/database “source” property?
*Note: This identifies a version of data that attempts to define a real-world object or measurement, not a version of a data object that would need to be numerically incremented like the other version categories here.*
1.2 For Reference Values:
- **Reference Value data changes** – In addition to the general “version” resource property, the following properties are needed to better govern Reference Value lists:
- OSDU-governed: You might create a new version because of an OSDU-governed change to a reference list. The OSDU Reference List version must be captured, and incremented whenever an updated OSDU-governed list is published and subsequently used in a Reference Data resource. This applies to the OSDU-governed reference values in an “open” list, and to “fixed” governance categories of reference value lists, as determined by the OSDU Reference Values team. A way to capture this does not exist yet.
- Locally governed: You might create a new version because a governed reference list for a particular implementation was updated, for example at an operator (i.e. the “open” and “local” reference list categories). The locally governed reference list version must be captured and incremented whenever the local data governance group publishes an updated list and it is subsequently used in a Reference Data resource. This applies to the reference values in an “open” list, and to “local” governance categories of reference value lists, as determined by the OSDU Reference Values team. A way to capture this does not exist yet.
- **Attribution Authority**: For any reference value or reference list, those values and descriptions may have been created by OSDU or by an outside organization (such as PPDM or Energistics). Both OSDU and outside standards may change over time, so it is critical to capture both the source organization and the publication version of those outside standards used. This is already accommodated by the Attribution Authority, Publication, and Revision properties which are standard Reference Resource properties.
*Note: this is different from the “OSDU-governed” versioning category mentioned above. The OSDU-governed versioning category refers to a complete list of reference values for a particular reference object, while the attribution authority is captured for each value individually. In other words, an OSDU-governed reference list could potentially include some values created under OSDU attribution authority and some from an outside attribution authority, but the list as a whole will be considered “OSDU-governed”.*
Summary: OSDU should establish clear governance to appropriately and consistently track these categories of versioning:
For any resource:
- Schema Version (might exist in schema name format; needs confirmation)
- Resource Version (exists)
Additional to Reference List resources:
- OSDU-governed list version (does not exist)
- Locally-governed list version (does not exist)
- Attribution Authority + Publication + Revision (exists)
The best solution would be to create appropriate properties for the version categories that do not yet exist.
In addition, OSDU should also capture the OSDU governance category of Reference Value Lists within the reference schema and resource itself: “Fixed”, “Open”, or “Local”. A way to capture this does not exist yet.
## 2. SRN format
Also, a decision has to be taken regarding the SRN format: whether it should contain the corresponding schema version or not. Currently the SRN doesn't contain a version (e.g. "srn:<namespace>:reference-data/VerticalCRS:MSL:").
*Note: Tentatively, we think that capturing Schema Version + Resource Version in the schema name would uniquely identify resource referenced (like a foreign key).*
For reference lists, you want to be able to identify the specific version of the reference list that a WPC (e.g.) references.
However, for a WPC (e.g. a Marker) to reference a parent Master object (e.g. a Wellbore), it doesn’t need to reference a specific point-in-time version of it; it should reference the most recent version.
If this is true, SRNs for Reference Data would need to include Schema + Resource Version in the SRN, but SRN would be more generic for all other group types.
Problem: SRN identity is uncertain.
A. Is SRN intended to uniquely define the physical real-world object in the case of Master Data (like a Wellbore)? If yes, then SRN should not contain version for Master Data references.
B. Or is SRN intended to uniquely define a data record with its version (like a GUID)? If yes, then Master Data Version should be included in the SRN.
It should not be used for both, but both must be accommodated by OSDU.
Some additional considerations:
A. Version is NOT included in an SRN.
Pros:
- It simplifies end-user aggregation of data to a single parent record. Your WPCs, created at different times will be referencing the same Master data record, not a point-in-time older version of that Master record. Existing WPC are always in the "current" state and users do not have to enrich and create a new version of WPC each time corresponding RS or Master Data Schema changes.
Cons:
- It leaves open the question of how you could have different Wellbore versions (similar to WELL_VERSION in PPDM). It seems that this is not currently supported by the OSDU canonical schemas, but it is a real use case – similar to the way you can have different versions of Trajectories in WPC.
- You can lose aspects of historical parent-child relationships/data lineage. For example, a Trajectory might have TVD calculated based on the “active” elevation of a particular Wellbore resource version. Then that Wellbore gets updated, and the newest version of that Wellbore record has a different active elevation type or value. Now the Trajectory file is out of sync in this regard with its parent Wellbore from that point in time.
B. Version is included in an SRN.
Pros:
- It potentially allows you to have different Wellbore versions (using UWI and Source, for example, as the natural key)
- Traceability and lineage of the data
Cons:
- Raises the question of how to uniquely identify the one physical wellbore, or the “gold” Wellbore record (similar to WELL in PPDM)
- Complexities with updating existing WPCs that have links to older versions of MDS. End-user aggregations can be misaligned if there are WPCs in the system that are linked to different schema versions.
- Another consideration is possible future search complexity. If the SRN value changes, some WPCs could be found using the "new" SRN value while others would be found using the "old" SRN value.
Users will have to implement additional enrichment workflows to resolve SRN version discrepancies (and probably develop functions that detect all "outdated" SRN links). That leads to high usage of computing resources (e.g. to change all WPC SRNs to point to the new version, etc.).
### Decision
- There is a requirement to track schema versioning. A decision has to be taken on the schema version format (especially for Reference Schemas).
- A decision has to be taken on the SRN format: will it contain the schema version or not ("srn:<namespace>:reference-data/VerticalCRS:MSL:" vs "srn:<namespace>:reference-data/VerticalCRS.1.0.0:MSL:")?
### Rationale
### Consequence
- No consequences for CSPs
- Consequences for the majority of OSDU services. A change in the schema definition will lead to changes in the manifest creation process as well as in the Enrichment and Delivery APIs.

---

Issue #64 - ADR: Calculate Checksum before saving metadata (Paresh Behede, updated 2023-07-05)

# Decision Title
Calculate checksum of uploaded file before creating its metadata
## Status
- [X] Proposed
- [X] Trialing
- [X] Under review
- [X] Approved
- [ ] Retired
## Context & Scope
We support a dataset--File.Generic entity record being created in the data platform when a user hits the /metadata endpoint of File Service. This schema has a couple of useful attributes that we don't use as of now: checksum and checksum algorithm.
These attributes would be very useful for detecting duplicate file uploads in the data platform.
## Mechanism for calculating checksum
I propose implementing a new method in the core module (let's say `generateChecksum()`) that can be implemented by every CSP in its provider module before we call the Storage service to save the file's metadata.
This method can be implemented with whatever approach and algorithm each CSP chooses. For example, in Azure we don't need to generate the checksum explicitly because the blob store calculates it automatically, so the Azure implementation of `generateChecksum()` can simply fetch it from the blob's metadata. Other providers can do the same if their storage solution also calculates a checksum while storing the blob.
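For CSPs whose storage does not compute a checksum automatically, `generateChecksum()` could be implemented by streaming the file through a digest. A minimal sketch; the algorithm choice and method signature here are assumptions, not the proposed core-module contract:

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumGenerator {
    /** Stream the uploaded file through a digest and return a lowercase hex checksum. */
    static String generateChecksum(InputStream file, String algorithm) {
        try {
            MessageDigest digest = MessageDigest.getInstance(algorithm); // e.g. "MD5" or "SHA-256"
            byte[] buffer = new byte[8192];
            int read;
            while ((read = file.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b)); // unsigned hex per byte
            }
            return hex.toString();
        } catch (IOException | NoSuchAlgorithmException e) {
            throw new IllegalStateException("Checksum generation failed", e);
        }
    }
}
```

Streaming keeps memory use constant regardless of file size, and the algorithm name can be stored alongside the value to populate both metadata attributes.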
## Decision
We should generate the checksum of a single file before creating its metadata in the data platform, so that we can provide the checksum value in the metadata record (an instance of the dataset--File.Generic entity).

M12 - Release 0.15 · Paresh Behede

---

Issue #26 - API Documentation missing indication that file is moved from staging area to persistent area (Alan Henson, updated 2023-05-15)

In reviewing the [OpenAPI Spec](https://community.opengroup.org/osdu/platform/system/file/-/blob/master/docs/file-service_openapi.yaml) for File Service, members of the meeting noticed that the OpenAPI Spec documentation for the `/v1/files/metadata` endpoint does not mention that it moves the file from a staging area to a persistent area. The request is as follows:
- Update the OpenAPI Spec documentation linked above to mention the file is moved from a staging area to a persistent area
- The code that performs that operation within the above endpoint is found [here](https://community.opengroup.org/osdu/platform/system/file/-/blob/master/file-core/src/main/java/org/opengroup/osdu/file/service/FileMetadataService.java#L66)

---

Issue #3 - [Data flow/Ingestion] Pluggable ingestion routines support (Raj Kannan, updated 2023-04-25)

Is the vision to expand the data types in the future in a pluggable manner (à la the role of the registry)? At the moment I see this as an enum that limits a consumer's ability to add additional types and workflows to process these from input files.

M1 - Release 0.1 · Ferris Argyle

---

Issue #2 - [Data flow/Ingestion] Support many-to-many correlation between ingestion file and kinds (Raj Kannan, updated 2023-04-25)

There is a many-to-many correlation between an ingestion file and the datatypes/kinds in the schema – therefore a call to submit a job with just a file ID and data type seems misleading:
* For example, a shape file or a RESQML file is actually many OS files that are collectively called RESQML or shape. In some cases you may have a tarball or zip of sorts with a manifest, but in other cases these are separate files altogether.
* Likewise a single file can have many different entities from different kinds as well – for example a LAS file would have the wellbore WP/WPC, log channel set WPC and log curve WPC entities in that file.
Ferris Argyle

---

Issue #1 - [Data flow/Ingestion] Ingestion code sync from GitHub to ADO (ethiraj krishnamanaidu, updated 2023-04-25)

Google's team is working on Ingestion services, and their internal process is to push the code to GitHub, which creates some challenges for the R2 Development team.
As discussed and agreed last week, we need to make sure that we push the Ingestion, File, and Delivery Service code from GitHub to ADO for the R2 Development team so that all cloud providers can contribute/develop SPIs.
@Stephen Henderson volunteered to work with the Google team(@fargyle) to set up the process to move the code from GitHub to ADO. The initial code is pushed to GitHub on Feb 10th.
* GitHub code sync with ADO, we need a process in place to sync every day and this is not a onetime task.
* Mono-repo structure: we need to follow the agreed-upon core services structure where each service is a different repo. We are not going to discuss mono-repo vs multi-repo in R2; again, it does not matter how it is managed in GitHub, but when we push to ADO we need to follow the standards.
* Noticed an os-core-common library within osdu-r2 that is a duplicate; we need to make sure we use the core-common library that we have created for the core services.
* I don't see SPIs in the provider's folder; we will have to follow the core service package structure.
* Integration Tests
ethiraj krishnamanaidu, Ferris Argyle, Joe · Due 2020-02-21

---

Issue #35 - File should get downloaded with actual filename and extension after hitting downloadUrl (sachin Gupta, updated 2023-04-13)

### Problem Statement
Currently, File Service stores files in the persistent location with a random name and without an extension. When we download a file using the download signed URL, it is downloaded without an extension, and to see the content of the file we need to give it an extension explicitly.
### Solution
We can overcome this problem by adding Content-Disposition and Content-Type headers at the time the download signed URL is created. We can get the file name from the file metadata payload, and from the file extension we can derive the content type; both the name and the content type can then be used when creating the download URL.
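The header derivation could be sketched as follows; the extension-to-MIME mapping and method names are illustrative assumptions, not the actual implementation:

```java
import java.util.Map;

public class DownloadHeaders {
    // Minimal illustrative extension-to-MIME map; a real implementation would
    // use a fuller registry (e.g. URLConnection.guessContentTypeFromName).
    static final Map<String, String> CONTENT_TYPES = Map.of(
            "pdf", "application/pdf",
            "csv", "text/csv",
            "las", "text/plain");

    /** Header that makes clients save the download under its real name. */
    static String contentDisposition(String fileName) {
        return "attachment; filename=\"" + fileName + "\"";
    }

    /** Derive Content-Type from the extension, falling back to a binary type. */
    static String contentType(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase();
        return CONTENT_TYPES.getOrDefault(ext, "application/octet-stream");
    }
}
```

These two values would be attached as response header overrides when the download signed URL is generated, so the browser saves the file under its original name and type.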
Note: the name field in the metadata payload is optional; if it is not present in the payload, the above solution won't work for that file and the current implementation of download URL creation is followed.

M9 - Release 0.12 · Paresh Behede

---

Issue #82 - File Services Context Path (Thulasi Dass Subramanian, updated 2023-03-28)
Current File service settings
- context path: `server.servlet.contextPath=/api/file/`
- Endpoints: eg for downloadURL operation: `/v2/files/{id}/downloadURL
1. All API Endpoints have **/v2/** prefixed to the **RequestMapping** Path (_Screenshot attached below_)
1. To be consistent with the **swagger ui** and **api-docs** path customization, and to make versioning of the API easier,
_Can we add '**v2**' to the context path and remove it from all endpoints_?
- context path: `server.servlet.contextPath=/api/file/v2`
- Endpoints, e.g. for the downloadURL operation: `/files/{id}/downloadURL`
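In Spring Boot terms, the proposal amounts to moving the version segment from every request mapping into the context path once; a sketch of the property change:

```properties
# Before: version prefix repeated in every @RequestMapping
server.servlet.contextPath=/api/file/
# mapped endpoint: /v2/files/{id}/downloadURL

# After: version carried once by the context path
server.servlet.contextPath=/api/file/v2
# mapped endpoint: /files/{id}/downloadURL
```

Either way, a consumer still calls `/api/file/v2/files/{id}/downloadURL`, which is why no breaking change is expected.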
**Consequences:**
- There will not be any changes with respect to Consumers of the API
- It is an internal refactoring of how the Base Path and Version are maintained
CSP can provide their inputs if we see any breaking changes or any settings need to be updated.
![image](/uploads/9878bb676457816652e6760868d73002/image.png)

*Milestone: M17 - Release 0.20; assignee: Thulasi Dass Subramanian*

---

## POST files/metadata Retry Failure due to Staging File being Deleted Prematurely
https://community.opengroup.org/osdu/platform/system/file/-/issues/76 (Lucy Liu; updated 2023-03-09)

An issue was observed in POST files/metadata retries during MSFT use of this API in M12: retries of a failed POST files/metadata call are likely to result in 400 errors no matter how many times the retry is performed. Further investigation shows the root cause: when metadata creation fails, the staging file is also deleted, so subsequent retries with the same source file ID, which maps to the now-deleted staging file, will fail. The staging file should not be deleted prematurely if metadata creation failed.
The current workaround is to perform two extra steps to upload the file to staging again, and then retry POST files/metadata:
1. Get a signed URL by calling File location API
2. Upload File to blob storage using signed url
3. Create the metadata using POST Metadata API
Suggested fix:
1. In FileMetadataService::saveMetadata, move the deleteStagingFile step to be the last step, right before the successful return, so that the staging file is only deleted when everything has succeeded.
2. Check staging file existence before deleting, and catch and ignore exceptions thrown from the staging file delete. Staging file deletion failure is rare but can happen under special concurrency situations: simultaneous Metadata create calls with the same payload result in one of the deletes failing because the file was already deleted by the other caller. A failed staging file deletion should not invalidate a successful metadata creation.

*Milestone: M17 - Release 0.20; assignee: Chad Leong*

---

## Preloadfilepath & ExtensionProperties removed from file Metadata API
https://community.opengroup.org/osdu/platform/system/file/-/issues/63 (ivar Soerheim; updated 2022-11-28)

During ingestion of file metadata under /files/metadata using the POST command, the Preloadfilepath and ExtensionProperties are not persisted when the record is returned post-ingest.
This seems like strange behaviour to me. I would like to either understand why this happens, or extend the file metadata api so these properties are not removed.
This is the workflow:
1. Get Signed URL for upload
2. Upload file using signed URL
3. Upload file metadata using file api (this returns ID of created record and can be searched in storage)
4. Refer to this ID when creating well log record or any other record
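For context, the step-3 metadata payload carries the affected properties roughly as below (abridged and illustrative; the exact field names and paths depend on the dataset schema version in use):

```json
{
  "kind": "osdu:wks:dataset--File.Generic:1.0.0",
  "data": {
    "DatasetProperties": {
      "FileSourceInfo": {
        "PreloadFilePath": "s3://source-bucket/welllog.las"
      }
    },
    "ExtensionProperties": {
      "Classification": "Raw File"
    }
  }
}
```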
The problem with this workflow is that:
- PreloadFilePath and ExtensionProperties are removed from the record during metadata upload

---

## File CI/CD pipelines do not use file-test-core-bdd with vital test cases
https://community.opengroup.org/osdu/platform/system/file/-/issues/79 (Rustam Lotsmanenko (EPAM), rustam_lotsmanenko@epam.com; updated 2022-11-04)

There are BDD tests defined in the File testing module: <br/>
https://community.opengroup.org/osdu/platform/system/file/-/tree/master/testing/file-test-core-bdd <br/>
They get test case updates with new feature introductions, for example: <br/>
https://community.opengroup.org/osdu/platform/system/file/-/merge_requests/138/diffs#d67c53013c6814c8d874d0daf0cffc9179ad1d00 <br/>
But they are not used in the CI/CD pipelines, which leaves those features uncovered. <br/>
Because they have been ignored for a long time, the tests have compatibility issues which lead to runtime errors like:
~~~
java.lang.NoClassDefFoundError: Could not initialize class io.restassured.RestAssured
at org.opengroup.osdu.file.util.test.RestAssuredClient.<init>(RestAssuredClient.java:30)
at org.opengroup.osdu.file.util.test.HttpClientFactory.getInstance(HttpClientFactory.java:8)
at org.opengroup.osdu.file.stepdefs.FileStepDef_GET.lambda$new$1(FileStepDef_GET.java:76)
~~~
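A `NoClassDefFoundError: Could not initialize class …` usually means a static initializer failed earlier, most often because of a dependency version clash on the test classpath. One plausible fix, assuming the BDD module builds with Maven, is to pin a single consistent RestAssured version (the version below is illustrative only):

```xml
<dependency>
    <groupId>io.rest-assured</groupId>
    <artifactId>rest-assured</artifactId>
    <version>5.3.2</version>
    <scope>test</scope>
</dependency>
```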
Keeping them ignored may cause issues with feature introduction and verification. <br/>
There are several possible solutions: <br/>
- Fix and enable file-test-core-bdd tests in the integration step
- Copy the missing tests from file-test-core-bdd to file-test-CSP_PROVIDER_MODULE

*Assignee: Rustam Lotsmanenko (EPAM), rustam_lotsmanenko@epam.com*

---

## Update Swagger documentation for end point - File uploadURL
https://community.opengroup.org/osdu/platform/system/file/-/issues/55 (Debasis Chatterjee; updated 2022-09-29)

See this below:
![API-File-service](/uploads/fb1508499115056f6ccd8abbb10c9a01/API-File-service.PNG)
Should be lowercase "**uploadURL**"
{{FILE_HOST}}/files/uploadURL

*Milestone: M14 - Release 0.17; assignee: Shrikant Garg*

---

## Update swagger documentation for end point "downloadURL"
https://community.opengroup.org/osdu/platform/system/file/-/issues/51 (Debasis Chatterjee; updated 2022-09-29)
Please check here:
https://community.opengroup.org/osdu/platform/system/file/-/blob/master/docs/file-service_openapi.yaml
GET {{FILE_HOST}}/files/{{FILE_ID}}/**downloadURL**
Swagger documentation shows **DownloadURL**
cc - @harshit283 for information

*Milestone: M14 - Release 0.17; assignee: Shrikant Garg*

---

## Metadata request payload schema not aligned to API spec
https://community.opengroup.org/osdu/platform/system/file/-/issues/20 (devesh bajpai; updated 2022-09-28)

The Metadata request payload is not aligned to the API spec:
https://community.opengroup.org/osdu/platform/system/file/-/blob/master/docs/file-service_openapi.yaml

---

## Add driver field in /getLocation API response
https://community.opengroup.org/osdu/platform/system/file/-/issues/10 (Wei Sun; updated 2022-09-27)

In the current File service API design, the getLocation API returns only the signed URL, but the backend storage providers support HTTP headers that optimize the data operation. I suggest adding a new Driver field to the getLocation response so that client applications can upload files to the signed URL in an optimized way.
Original:
```json
{
"FileID": "file ID",
"Location": {
"SignedURL": "GCS signed URL"
}
}
```
Change to:
```json
{
"FileID": "file ID",
"Location": {
"SignedURL": "GCS signed URL",
"Driver": "GCS"
}
}
```
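With a Driver field in the response, a client can branch on it to choose the storage-specific headers for an optimized upload. A hedged sketch (the helper is hypothetical; the header values are the commonly documented ones for each provider):

```java
import java.util.Map;

// Hypothetical client-side use of the proposed Driver field: pick the
// extra request header(s) the storage backend expects when uploading
// to the signed URL.
class UploadHints {

    public static Map<String, String> headersFor(String driver) {
        switch (driver) {
            case "GCS":
                // Google Cloud Storage: x-goog-resumable negotiates a
                // resumable upload session via the signed URL.
                return Map.of("x-goog-resumable", "start");
            case "AZURE":
                // Azure Block Blob uploads require the blob type header.
                return Map.of("x-ms-blob-type", "BlockBlob");
            default:
                // Unknown driver: no special headers, plain PUT.
                return Map.of();
        }
    }
}
```

The client would merge these headers into its upload request after parsing `Location.Driver` from the getLocation response.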