File issues · https://community.opengroup.org/osdu/platform/system/file/-/issues · 2022-12-12T15:21:28Z

**#75 Need to bypass checksum generation for file size more than 5 gbs**
https://community.opengroup.org/osdu/platform/system/file/-/issues/75 · Paresh Behede · 2022-12-12T15:21:28Z
When a user calls the POST /metadata API endpoint to register a file on the data platform, the File service generates a checksum of the file provided in the request before saving it as a record. The checksum supports duplicate detection in downstream workflows.
Because this is a blocking HTTP call, checksum calculation takes a long time when the file is huge (more than roughly 3 to 5 GB), and the HTTP POST call hangs and never responds.
**In our tests, checksum generation and metadata registration take about 2 minutes for a 5 GB file.**
We ran into this when a user tried to upload a 14 GB file.
Although such huge files make up a small share of uploads, users still need to be able to register their metadata, and to allow that we must bypass the checksum generation logic for such large files.
This still keeps duplicate detection (calculating the file checksum and saving it in the storage record) for the majority of uploaded files, around 95%, and skips it for the remaining 5% of requests.
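The size gate described above can be sketched as follows, assuming the file size is known before hashing starts. The 5 GiB cutoff, the `ChecksumGate` class, and the in-memory queue standing in for an asynchronous worker are all hypothetical, not the File service's actual code:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical policy class; not taken from the File service codebase.
class ChecksumGate {
    // Assumed cutoff of 5 GiB, in line with the ~2 minute registration time observed at 5 GB.
    static final long INLINE_CHECKSUM_LIMIT_BYTES = 5L * 1024 * 1024 * 1024;

    // Record ids whose checksum a background worker should compute later.
    private final Queue<String> deferred = new ArrayDeque<>();

    /** Returns true if the checksum should be computed inside the blocking POST /metadata call. */
    boolean computeInline(String recordId, long fileSizeBytes) {
        if (fileSizeBytes <= INLINE_CHECKSUM_LIMIT_BYTES) {
            return true;
        }
        // Too large to hash synchronously: register the metadata now and queue the hash for later.
        deferred.add(recordId);
        return false;
    }

    /** Record ids waiting for an asynchronous checksum and storage-record update. */
    Queue<String> deferredRecords() {
        return deferred;
    }
}
```

A 14 GiB upload would return `false` and land in the deferred queue, while ordinary files keep the inline checksum and therefore duplicate detection.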
To cover the remaining 5% of requests as well, we could calculate the checksum asynchronously and update the storage record later.
Milestone: M15 - Release 0.18

**#83 Checksum values do not match up - value prior to upload, value as auto-populated by File service and persisted in Dataset record**
https://community.opengroup.org/osdu/platform/system/file/-/issues/83 · Debasis Chatterjee · 2023-07-10T08:37:54Z
Please see my recent test in Azure/M17/Preship.
https://community.opengroup.org/osdu/platform/pre-shipping/-/blob/main/R3-M17/Test_Plan_Results_M17/Core_Services/M17-Azuere-Core-File-and-Dataset-steps-Debasis.zip
Before uploading the file, I computed its checksum on the Linux operating system.
After the file was uploaded and the Dataset record created, I compared that value with the one auto-populated by the File service.
The values do not match.
Please check this.
Milestone: M19 - Release 0.22 · Assignee: Chad Leong

**#89 Unable to download the file**
https://community.opengroup.org/osdu/platform/system/file/-/issues/89 · Sachin Jaiswal · 2023-08-08T16:30:43Z
**Problem**
File download fails if the file name contains a special character such as ",".
**Steps to reproduce**
- Generate a signed URL to upload a file (tesing,copy).
- Upload the file.
- Create file metadata using the same file name (tesing,copy).
- Generate a signed URL to download the file.
- Paste the signed URL into a browser.
**Error shown in browser** - `<storage-url> sent an invalid response. ERR_RESPONSE_HEADERS_MULTIPLE_CONTENT_DISPOSITION`
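Why the comma matters: in the unquoted form `attachment; filename=tesing,copy`, the comma can be read as a separator between header values, which is consistent with the browser's complaint about multiple Content-Disposition headers. A minimal sketch of building the value with a quoted `filename`, escaping `\` and `"` as the quoted-string syntax requires (the `ContentDisposition` class and `attachment` method are hypothetical helpers, not part of OS Core Lib Azure):

```java
// Hypothetical helper: builds a Content-Disposition value with a quoted filename.
class ContentDisposition {
    static String attachment(String fileName) {
        StringBuilder sb = new StringBuilder("attachment; filename=\"");
        for (char c : fileName.toCharArray()) {
            // Inside a quoted-string, backslash and double quote must be escaped with a backslash.
            if (c == '\\' || c == '"') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }
}
```

For `tesing,copy` this yields `attachment; filename="tesing,copy"`. Names with non-ASCII characters would additionally need the `filename*` parameter defined in RFC 6266, which this sketch does not produce.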
**Proposed Solutions**
- File service: wrap the file name in quotes before passing it to OS Core Lib Azure, **OR**
- OS Core Lib Azure: apply the change below in the [BlobStore class](https://community.opengroup.org/osdu/platform/system/lib/cloud/azure/os-core-lib-azure/-/blob/master/src/main/java/org/opengroup/osdu/azure/blobstorage/BlobStore.java#:~:text=581-%2cblobServiceSasSignatureValues.setContentDisposition%28%22attachment%3b%20filename%3D%20%22%20%2B%20fileName%29%3b%2c-582) of the OS Core Lib Azure repo:
`blobServiceSasSignatureValues.setContentDisposition("attachment; filename=\"" + fileName + "\"");`

**#86 /api/file/v2/files bugs**
https://community.opengroup.org/osdu/platform/system/file/-/issues/86 · Shane Hutchins · 2023-09-26T14:22:03Z
Received a response with a 5xx status code (500). I expected a 404 from these APIs but got a 500 Internal Server Error.
Run these curl commands to reproduce the failure:
GET /api/file/v2/files/%0A/metadata:
curl -X GET -H 'Authorization: Bearer TOKEN' -H 'data-partition-id: osdu' https://osdu.r3m18.preshiptesting.osdu.aws/api/file/v2/files/%0A/metadata
DELETE /api/file/v2/files/%3B/metadata:
curl -X DELETE -H 'Authorization: Bearer TOKEN' -H 'data-partition-id: osdu' https://osdu.r3m18.preshiptesting.osdu.aws/api/file/v2/files/%3B/metadata
GET /api/file/v2/files/%3B/downloadURL:
curl -X GET -H 'Authorization: Bearer TOKEN' -H 'data-partition-id: osdu' https://osdu.r3m18.preshiptesting.osdu.aws/api/file/v2/files/%3B/downloadURL
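The reporter expects these malformed ids to produce a 404 instead of a 500. One way to get there is to screen the path id before any storage call. A minimal sketch, assuming a simplified `partition:entity:identifier` id shape (the regular expression, `FileIdCheck`, and `statusFor` are hypothetical, not the Storage service's actual id rule):

```java
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical pre-lookup check; the pattern is a simplified stand-in, not OSDU's official id rule.
class FileIdCheck {
    // Three or more colon-separated, non-empty segments of word characters, '.', '-' or '%'.
    private static final Pattern RECORD_ID =
            Pattern.compile("^[\\w.\\-%]+:[\\w.\\-%]+:[\\w.\\-%:]+$");

    /** Returns the HTTP status an endpoint such as GET /files/{id}/metadata could respond with. */
    static int statusFor(String id, Set<String> knownIds) {
        if (id == null || !RECORD_ID.matcher(id).matches()) {
            return 404; // decoded ids such as a newline (%0A) or ";" (%3B) can never name a record
        }
        return knownIds.contains(id) ? 200 : 404;
    }
}
```

The sketch follows the reporter's expectation of 404; answering 400 for malformed ids would avoid the 500 equally well.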
Confirmed this issue on AWS and Azure.
curl -X GET -H 'Authorization: Bearer TOKEN' -H 'Cookie: JSESSIONID=SESSIONIDHERE' -H 'data-partition-id: opendes' https://osdu-ship.msft-osdu-test.org/api/file/v2/files/%0A/metadata

**#85 Duplicate PreloadFilePath value on ingestion of File.Generics**
https://community.opengroup.org/osdu/platform/system/file/-/issues/85 · Gorm-Erik Aarsheim · 2023-06-16T13:17:04Z
When ingesting File.Generics into OSDU, we use the {OSDU_BASE_URL}/file/v2/files/metadata endpoint to create the File.Generic. In the body, we set the PreloadFilePath value on data.DatasetProperties.FileSourceInfo.PreloadFilePath as described in the schema. The file is created successfully, but when we read the record back from the Storage API, PreloadFilePath appears twice: once as 'PreloadFilePath' with a capital 'P' and once as 'preloadFilePath' with a lowercase 'p'. We also tried ingesting a file with a lowercase 'preloadFilePath', but reading it back still returns both values. We also generated the raw File.Generic JSON object from our code and posted it directly to the metadata endpoint in Postman, to rule out that our code was somehow meddling with the object. The result was the same.
It's also worth mentioning that we are running OSDU through Microsoft ADME.
Assignee: Om Prakash Gupta

**#84 File Service rejects Record ID from dataset--File.Generic manifest and generate always GUID as record id**
https://community.opengroup.org/osdu/platform/system/file/-/issues/84 · Samiullah Ghousudeen · 2023-06-09T15:27:51Z
Please see the curl request below, which registers a metadata record for a file dataset. I included an ID in the manifest, but the service always rejects it and creates a system-generated GUID for the registered record.
**{+ Create File metadata - Request +}**
curl --location 'https://osdu-ship.msft-osdu-test.org/api/file/v2/files/metadata' \
--header 'Data-Partition-Id: opendes' \
--header 'Authorization: Bearer ...' \
--header 'Content-Type: application/json' \
--header 'Cookie: JSESSIONID=F4452A7D9F8752E8A82DC6E354D29B26' \
--data-raw '{
"kind": "osdu:wks:dataset--File.Generic:1.0.0",
{+ "id":"opendes:dataset--File.Generic:sami-test1",+}
"acl": {
"viewers": [
"data.default.viewers@opendes.contoso.com"
],
"owners": [
"data.default.viewers@opendes.contoso.com"
]
},
"legal": {
"legaltags": [
"opendes-Test-Legal-Tag-4766549"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"data": {
"Endian": "BIG",
"DatasetProperties": {
"FileSourceInfo": {
"FileSource": "/osdu-user/1686225883215-2023-06-08-12-04-43-215/4a62ec123d43427e93af2a4a1c515a6b"
}
}
}
}'
**{+ Create File metadata - Response +}**
{
{+"id": "opendes:dataset--File.Generic:d5c226d6-3eb2-4825-8b9b-e0834d0464cb"+}
}
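The reporter expects the `id` supplied in the request body to be honoured. A minimal sketch of that behaviour, assuming a supplied id is accepted only if it starts with `<partition>:<entity type>:`; this validation rule, `RecordIdResolver`, and `resolve` are hypothetical, as the issue does not say how a supplied id should be checked:

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical resolver illustrating the behaviour requested in the issue.
class RecordIdResolver {
    /**
     * Keeps a client-supplied id if it starts with "<partition>:<entityType>:" and has a
     * non-empty suffix, and rejects it otherwise. Without an id, generates
     * "<partition>:<entityType>:<uuid>".
     */
    static String resolve(Optional<String> requestedId, String partitionId, String entityType) {
        String prefix = partitionId + ":" + entityType + ":";
        if (requestedId.isPresent()) {
            String id = requestedId.get();
            if (id.startsWith(prefix) && id.length() > prefix.length()) {
                return id;
            }
            throw new IllegalArgumentException("Record id does not match partition and kind: " + id);
        }
        return prefix + UUID.randomUUID();
    }
}
```

With `Optional.of("opendes:dataset--File.Generic:sami-test1")`, partition `opendes`, and entity type `dataset--File.Generic`, the sketch returns the supplied id unchanged; with `Optional.empty()` it produces an id of the same shape as the one in the response above.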
cc @todaiks @debasisc @chad

**#81 Test cases commented FileFlowTest.java - Core module**
https://community.opengroup.org/osdu/platform/system/file/-/issues/81 · Pramesh Patil · 2022-11-22T14:31:27Z
All test cases in FileFlowTest.java have been commented out, and recently, in M14, someone added an annotation to suppress the resulting code-analysis warning:
@SuppressWarnings("java:S2187") // there is no test cases in this class at present
Is there a specific reason the test cases in the FileFlowTest class were commented out?
Assignee: Pramesh Patil

**#80 ADR: Leverage File Service for Storage Operations**
https://community.opengroup.org/osdu/platform/system/file/-/issues/80 · Elizabeth Halper · 2023-07-05T09:43:17Z
# Introduction
## Status
- [x] Initiated
- [x] Proposed
- [x] Under Review
- [ ] Approved
- [ ] Rejected
## Decision
All DMSs will use the File Service as a layer between the DMS and storage. Additionally, the File Service will provide methods, through its exposed service interface, to move data between storage tiers for all DMSs in the most cost-effective manner based on how the data is used. The File Service will provide an abstraction over all storage actions, including calls to the Partition service, so services that do not have that functionality yet will not need to implement it. All DDMSs will use the File Service to store binary data, and other services will be able to use it as well.
This decision is a proposed solution to the rejection of [this ADR](https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/seismic/seismic-dms-suite/seismic-store-service/-/issues/39).
![DataStorageFlow](/uploads/92dbaeae4a51fa893e59e8ff2cb7934c/DataStorageFlow.png)
### Add File Service Endpoints
Provide a new utility endpoint that returns the list of supported storage tiers; how to implement this endpoint still needs to be discussed. Additionally, other functionality, such as the finer-grained access required for SDMS storage procedures, will need to be included in the File Service.
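To make the proposed tier endpoint more concrete, here is a minimal sketch of the contract a DMS could program against. The tier names, the `TieredFileStore` interface, and its methods are hypothetical (the ADR leaves the API open for discussion), and the in-memory class only illustrates the behaviour:

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical tier names; the real set would depend on each CSP.
enum StorageTier { HOT, COOL, ARCHIVE }

// Hypothetical File Service contract that a DMS would call instead of a storage SDK.
interface TieredFileStore {
    Set<StorageTier> supportedTiers();                 // would back the "list supported tiers" endpoint
    void moveToTier(String fileId, StorageTier tier);  // would back a "move to tier" operation
    StorageTier tierOf(String fileId);
}

// In-memory stand-in that only illustrates the contract; no real storage is involved.
class InMemoryTieredFileStore implements TieredFileStore {
    private final Set<StorageTier> supported;
    private final Map<String, StorageTier> tiers = new HashMap<>();

    InMemoryTieredFileStore(Set<StorageTier> supported) {
        this.supported = EnumSet.copyOf(supported);
    }

    @Override
    public Set<StorageTier> supportedTiers() {
        return EnumSet.copyOf(supported); // defensive copy
    }

    @Override
    public void moveToTier(String fileId, StorageTier tier) {
        if (!supported.contains(tier)) {
            throw new IllegalArgumentException("Unsupported tier: " + tier);
        }
        tiers.put(fileId, tier);
    }

    @Override
    public StorageTier tierOf(String fileId) {
        return tiers.getOrDefault(fileId, StorageTier.HOT); // assumes files start in the HOT tier
    }
}
```

Because each DMS would depend only on the interface, a CSP could add or rename tiers behind it without the DMS code changing, which is the rationale the ADR gives.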
### Refactor DMS Dataset Requests
Instead of loading the Cosmos client library directly into each DMS, each DMS will send the REST requests described above to the File Service to add datasets to the database.
## Rationale
We want all storage capabilities to be available to all DMSs with as little variation as possible. Additionally, by implementing these features once, in one service (the File Service), the community will save a lot of time, because other services will not need to change when storage features change. The example used above is storage tiers: although this requires an initial investment in refactoring services to use the File Service, we will ultimately be able to implement storage tiers for all DMSs without much change to the services themselves.
## Consequences
We will need to:
- Add this additional functionality to the File Service
- "Onboard" services to use the File Service for all their storage actions
- Refactor all services to make REST requests to the File Service instead of using the library directly
- Enforce uniformity of requests, given that different services will be adding storage tiers to their models
These tasks take a lot of time and effort, as well as collaboration across many parties. We will need all CSVs and ISVs to support this motion and help ensure that all services comply with this decision.
Assignee: Elizabeth Halper

**#77 Concurrent file metadata create calls with same file source path results in 500**
https://community.opengroup.org/osdu/platform/system/file/-/issues/77 · harshit aggarwal · 2022-10-03T15:49:23Z
During the file metadata create flow, concurrent requests with the same file source path result in 500 errors. The file has already been cleaned up from the staging location, so the exception thrown at `cloudStorageOperation.deleteFile(stagingLocation)` is handled as a generic internal server error, which is not the correct response code.
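A minimal sketch of turning the missing-staging-file case into a 4xx, assuming the CSP layer signals it with a dedicated exception. `StagingFileMissingException`, `ConflictException`, `StagingCleanup`, and the choice of 409 Conflict are hypothetical; `CloudStorageOperation` here is a one-method stand-in for the operation named in the issue, not its real interface:

```java
// Hypothetical exception types: real CSP modules throw SDK-specific exceptions instead.
class StagingFileMissingException extends RuntimeException {
    StagingFileMissingException(String location) {
        super("Staging file not found: " + location);
    }
}

class ConflictException extends RuntimeException {
    final int status = 409; // assumed 4xx code; the issue does not name one

    ConflictException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Simplified stand-in for the CSP-specific storage operation named in the issue.
interface CloudStorageOperation {
    void deleteFile(String location);
}

class StagingCleanup {
    private final CloudStorageOperation storage;

    StagingCleanup(CloudStorageOperation storage) {
        this.storage = storage;
    }

    /** Deletes the staging copy; if a concurrent request already removed it, report 409 instead of 500. */
    void deleteStagingFile(String stagingLocation) {
        try {
            storage.deleteFile(stagingLocation);
        } catch (StagingFileMissingException e) {
            throw new ConflictException(
                    "File at " + stagingLocation + " was already processed by another request", e);
        }
    }
}
```

The translation lives next to the storage call, so a controller-level exception handler could map `ConflictException` to its status without knowing about any CSP SDK.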
CSPs can handle these exceptions in their implementations and throw 4xx exceptions instead.

**#72 File service - Compute and store properties such as filesize and checksum at the time of uploading file**
https://community.opengroup.org/osdu/platform/system/file/-/issues/72 · Debasis Chatterjee · 2022-08-23T15:21:13Z
Currently, the Data Loader is expected to provide this information manually. This is error-prone, and the values can be missed.
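A minimal sketch of computing both properties during upload in a single pass over the stream, so the Data Loader does not have to supply them. SHA-256 is an assumed algorithm choice, and `FileProperties` and `compute` are hypothetical names:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical holder for properties the File service could fill in at upload time.
class FileProperties {
    final long sizeBytes;
    final String sha256Hex;

    private FileProperties(long sizeBytes, String sha256Hex) {
        this.sizeBytes = sizeBytes;
        this.sha256Hex = sha256Hex;
    }

    /** Reads the stream once, counting bytes while the DigestInputStream updates the hash. */
    static FileProperties compute(InputStream in) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
        long size = 0;
        byte[] buffer = new byte[8192];
        try (DigestInputStream din = new DigestInputStream(in, digest)) {
            int n;
            while ((n = din.read(buffer)) != -1) {
                size += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return new FileProperties(size, hex.toString());
    }
}
```

For the three-byte input `abc`, the sketch reports a size of 3 and the well-known SHA-256 digest `ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad`.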
cc @Keith_Wall and @krveduru for information

**#70 Already generated pre-signed url can be use to upload a file multiple time**
https://community.opengroup.org/osdu/platform/system/file/-/issues/70 · Piyush Talwalkar · 2022-09-27T11:18:15Z
Steps to reproduce the issue:
- Generate a pre-signed URL
- Upload a file using the pre-signed URL generated in step 1
- Post metadata for the uploaded file and verify that the file is moved to the persistent location
- Try to upload another file using the same pre-signed URL from step 1
Expected Result:
A user should be able to use a pre-signed URL only once, for one file
Actual Result:
An already generated pre-signed URL can be used to upload files multiple times.

**#68 File service: File.GeoJSON kind**
https://community.opengroup.org/osdu/platform/system/file/-/issues/68 · Marton Nagy · 2024-01-25T06:54:39Z
The File service currently accepts only `dataset--File.Generic` as the kind. We would like to set it to `dataset--File.GeoJSON` to match the actual content of the uploaded files, but that currently leads to
> 400 (BadRequest) - {\"error\":{\"code\":400,\"message\":\"Invalid entity in kind\",\"errors\":[{\"domain\":\"global\",\"reason\":\"badRequest\",\"message\":\"Invalid entity in kind\"}]}.
Even the Swagger doc says: `Kind of data being ingested. Must follow the naming convention:{schema-authority}:wks:dataset--File.Generic:{version}.`
Please allow other kinds, such as `dataset--File.XxxXxx`, to be posted there.
The rule could be extended to accept any kind matching [AbstractFile](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/E-R/abstract/AbstractFile.1.0.0.md), like most kinds in the [dataset](https://community.opengroup.org/osdu/data/data-definitions/-/tree/master/E-R/dataset) group.
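A minimal sketch of the relaxed rule, assuming the kind keeps the `{schema-authority}:wks:{entity}:{version}` shape from the Swagger description and that any `dataset--File.*` entity type is acceptable. The pattern and the `FileKindRule` class are illustrative, not the File service's actual validator:

```java
import java.util.regex.Pattern;

// Hypothetical kind check: {schema-authority}:wks:dataset--File.<Type>:<major>.<minor>.<patch>
class FileKindRule {
    private static final Pattern FILE_KIND =
            Pattern.compile("^[\\w\\-.]+:wks:dataset--File\\.[A-Za-z0-9]+:\\d+\\.\\d+\\.\\d+$");

    static boolean isAccepted(String kind) {
        return kind != null && FILE_KIND.matcher(kind).matches();
    }
}
```

Matching the entity-type prefix is only a proxy for "derives from AbstractFile"; checking that relationship properly would mean inspecting the schema itself, which this sketch does not do.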
cc @gehrmann

**#62 POST /metadata endpoint returns 500 error when Blob Not Found**
https://community.opengroup.org/osdu/platform/system/file/-/issues/62 · Tsvetelina Ivanova · 2022-09-27T11:52:47Z
The POST /metadata endpoint returns a 500 Internal Server Error when the deleteFile method is called to delete the file from blob storage. The underlying BlobStorageException has the message:
`Error occurred while creating file metadata Status code 404, "<?xml version="1.0" encoding="utf-8"?><Error><Code>BlobNotFound</Code><Message>The specified blob does not exist._RequestId:f60ebd44-b01e-0012-532e-3236bf000000_Time:2022-03-07T14:23:07.9454432Z</Message></Error>" Status code 404, "<?xml version="1.0" encoding="utf-8"?><Error><Code>BlobNotFound</Code><Message>The specified blob does not exist. RequestId:f60ebd44-b01e-0012-532e-3236bf000000 Time:2022-03-07T14:23:07.9454432Z</Message></Error>"`
The file is copied successfully from staging to the persistent area, but deleting the file from staging fails with "The specified blob does not exist".
For the exception logs, see the attached file:
[File_Service_Azure_Logs_-_BlobStorageException.csv](/uploads/2cfc5a0baf2616fb456791ce8412babb/File_Service_Azure_Logs_-_BlobStorageException.csv)

**#61 Swagger shows data.TotalSize is mandatory but it isn't in actual**
https://community.opengroup.org/osdu/platform/system/file/-/issues/61 · Paresh Behede · 2022-08-26T10:17:33Z
The data returned from /v2/files/{Id}/metadata does not include data.TotalSize, yet the documentation at https://community.opengroup.org/osdu/platform/system/file/-/blob/master/docs/metadataPayload.json suggests that TotalSize is not optional.
We must remove the mandatory flag for that field from Swagger so that the two stay in sync.

**#60 Discrepancy in API endpoint downloadURL**
https://community.opengroup.org/osdu/platform/system/file/-/issues/60 · Paresh Behede · 2022-09-07T09:18:09Z
Swagger and the documentation in GitLab show the endpoint as /DownloadURL, but the endpoint actually implemented is /downloadUrl. We must fix the documentation and Swagger to reflect the current state.

**#58 File SignedURL lifespan**
https://community.opengroup.org/osdu/platform/system/file/-/issues/58 · Jan Mortensen · 2022-09-29T13:41:08Z
When uploading a file, a SignedURL gives full access to the file in question for a lifespan of 7 days. This makes the file available to anyone who happens to have access to this URL.
There is also the concept of a staging area and a persistent area. An uploaded file first resides in the staging area until metadata has been created and posted to the OSDU instance; the system then moves it from staging to persistent. BUT this second step is manual, and if the user somehow forgets or fails to do it, the file stays in the staging area.
So even if it is the user's or the process's "fault", the way OSDU is designed may leave the file potentially "open" to "anyone" with the link.
Is this something that can/should be mitigated?
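One possible mitigation is a periodic sweep that removes files left in the staging area. A minimal sketch of the staleness check such a sweep could use, assuming each staging object exposes its upload time and that a 24-hour retention window is acceptable; the window, `StagingSweep`, and `stale` are hypothetical:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical staging-area sweep; the retention window is an assumption, not an OSDU setting.
class StagingSweep {
    static final Duration MAX_STAGING_AGE = Duration.ofHours(24);

    /** Returns staging paths uploaded more than MAX_STAGING_AGE before {@code now}, sorted by path. */
    static List<String> stale(Map<String, Instant> uploadedAt, Instant now) {
        return uploadedAt.entrySet().stream()
                .filter(e -> Duration.between(e.getValue(), now).compareTo(MAX_STAGING_AGE) > 0)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

Object stores also offer built-in lifecycle rules that could apply a similar age-based deletion without custom code.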
See also #38 for a mitigating action on the signed URL.

**#48 ADR: Upgrade swagger version from swagger 2.0 to swagger 3.0 (OpenAPI 3)**
https://community.opengroup.org/osdu/platform/system/file/-/issues/48 · Aman Verma · 2022-05-12T06:27:04Z
## Status
- [X] Proposed
- [x] Under review
- [x] Approved
- [ ] Retired
## Context & Scope
OpenAPI Specification (formerly Swagger Specification) is an API description format for REST APIs. An OpenAPI file allows you to describe your entire API, including endpoints, parameters, authentication methods, and contact and license information.
The successor to Swagger 2.0 has been released under the name OpenAPI 3.0.
### Why is the upgrade required
OpenAPI 3.0 offers several new features, such as the [servers](https://swagger.io/docs/specification/api-host-and-base-path/) field, which enables us to build automation on top of the specification. For example, a tool that scans the REST endpoints exposed by a service can consume openapi.json directly and go through all the endpoints in that service.
### Scope
All the services
## Trade-off Analysis
- **Are there any breaking changes in REST API doc while upgrading from swagger 2.0 to swagger 3.0**: NO
- **Are there any breaking changes in UI**: There are no breaking changes in the UI per se. If anything, it has become more fluid and lightweight.
- **How involved are the code changes**: For most services, it's a matter of upgrading the Maven package with minor changes here and there. For a few services, the changes can be more involved, as some of the annotations have changed between 2.0 and 3.0; for example, `@Api` has become `@Tag`. More details are documented here: https://springdoc.org/#migrating-from-springfox
- **Would the URL change**: YES. The swagger home page for 2.0 is `swagger-ui.html`, while for 3.0, it is `swagger-ui/index.html`. However, we also have an explicit endpoint `/swagger` (or equivalent) which can be leveraged to abstract the underlying swagger URL. Users can continue visiting the swagger homepage by hitting `/swagger` endpoint.
## Decision
cc: @kibattul @madhurtanwani
Assignee: Aman Verma

**#38 GET /DownloadURL api to support generating signedURL that can be accessible from select IP address.**
https://community.opengroup.org/osdu/platform/system/file/-/issues/38 · Orsu Akhil · 2022-09-29T13:41:08Z
The /DownloadURL API returns a signed URL to access a file stored in the persistent location, based on the File metadata ID.
It would be useful if the user could specify an IP address or a range of IP addresses such that the signed URL grants access to the file in the persistent location only when used from that range. The IP address or range could be passed by the user as a GET query parameter in the API specification.
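A minimal sketch of validating such a query parameter, assuming it uses the same single-IPv4-or-range form (`168.1.5.60-168.1.5.70`) as the `sip` field of an Azure SAS token; `IpRange` and its methods are hypothetical, and the sketch does not handle IPv6:

```java
// Hypothetical parser for an "a.b.c.d" or "a.b.c.d-e.f.g.h" query parameter (IPv4 only).
class IpRange {
    final long start;
    final long end;

    private IpRange(long start, long end) {
        this.start = start;
        this.end = end;
    }

    /** Parses a single address or an inclusive range; rejects malformed or reversed ranges. */
    static IpRange parse(String value) {
        String[] parts = value.split("-", -1);
        if (parts.length > 2) {
            throw new IllegalArgumentException("Expected an IP or an IP range: " + value);
        }
        long start = toLong(parts[0]);
        long end = parts.length == 2 ? toLong(parts[1]) : start;
        if (start > end) {
            throw new IllegalArgumentException("Range start is after range end: " + value);
        }
        return new IpRange(start, end);
    }

    boolean contains(String ip) {
        long v = toLong(ip);
        return v >= start && v <= end;
    }

    // Converts a dotted-quad IPv4 address to an unsigned 32-bit value held in a long.
    private static long toLong(String ip) {
        String[] octets = ip.trim().split("\\.", -1);
        if (octets.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        long result = 0;
        for (String octet : octets) {
            int n = Integer.parseInt(octet); // NumberFormatException is an IllegalArgumentException
            if (n < 0 || n > 255) {
                throw new IllegalArgumentException("Octet out of range: " + ip);
            }
            result = (result << 8) | n;
        }
        return result;
    }
}
```

Validating the range in the File service before it is handed to the CSP would let a malformed parameter fail with a 400 rather than surface as a provider error.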
This kind of customization of generated signed URLs is possible in Azure, and we would like to know whether other cloud providers support it. If it is feasible on the others as well, we could come up with an implementation supporting this feature.
Assignees: Orsu Akhil, Paresh Behede

**#31 Soft DELETE functionality for Files and Metadata associated with it**
https://community.opengroup.org/osdu/platform/system/file/-/issues/31 · Paresh Behede · 2022-09-29T13:41:07Z
Currently, the DELETE endpoint in the File service hard-deletes files from the file store. We must also be able to soft delete, so that a file deleted by mistake can be restored.

**#27 File Service should support ingestion of multiple files using 1 request**
https://community.opengroup.org/osdu/platform/system/file/-/issues/27 · Kateryna Kurach (EPAM) · 2022-09-27T11:37:18Z
(This issue came from R3 pre-shipping testing)
Currently, the process to upload a file is as follows:
- get {{file_api_url}}/v1/files/uploadURL
(user gets SignedURL)
- put {{upload_signed_url}}
(user uploads a file to the OSDU staging area)
- post {{file_api_url}}/v1/files/metadata
(file is moved from the OSDU staging area to OSDU persistent area and metadata record is created)
There is a business need to upload multiple different files (not in the same file collection) using the same SignedURL. That may simplify and speed up the ingestion process.