Search issueshttps://community.opengroup.org/osdu/platform/system/search-service/-/issues2021-06-16T22:18:21Zhttps://community.opengroup.org/osdu/platform/system/search-service/-/issues/25[Search] Related to Elastic search upgrade need to document change impact to ...2021-06-16T22:18:21ZRaj Kannan[Search] Related to Elastic search upgrade need to document change impact to apps moving from R2 to R3When upgrading Elastic to the new version (expected 7.8) it is likely that the search syntax may change to accommodate new types (flattened). We need to document and create a prescription of how apps can migrate from R2 (6.x) to R3 (elas...When upgrading Elastic to the new version (expected 7.8) it is likely that the search syntax may change to accommodate new types (flattened). We need to document and create a prescription of how apps can migrate from R2 (6.x) to R3 (elastic 7.8) search syntax.
Should also note that the data migration is not part of this deliverable. Providers may need to reindex data or perhaps reload based on updated data models for R3.
@Dmitriy_Rudko @ethiraj @Nieten @dkodeih to please review this description and refine. Thanks.M1 - Release 0.1ethiraj krishnamanaiduethiraj krishnamanaidu2021-01-15https://community.opengroup.org/osdu/platform/system/search-service/-/issues/2[Search] Cross Data Partition Searches2021-06-16T22:18:26ZGary Murphy[Search] Cross Data Partition SearchesWhen multiple data partitions are available, it should be possible to issue a search query that returns results from multiple data partitions in one call -- the input should include IDs/names of data partitions to query as well as the ac...When multiple data partitions are available, it should be possible to issue a search query that returns results from multiple data partitions in one call -- the input should include IDs/names of data partitions to query as well as the actual query string. For the call on a data partition to succeed, the user must have read access to the data partition as well as data in the data partition itself. It may be the case that a target data partition has data and the requesting user has some access to the data partition but not to any data that the query retrieves. The specifics of how access to a data partition is defined are a concern of Entitlements; likely this Issue should be linked to an Epic involving Entitlements as well.M1 - Release 0.1https://community.opengroup.org/osdu/platform/system/search-service/-/issues/1[Search] Searching Hierarchies with Nested Arrays of Objects2023-09-08T10:45:22ZGary Murphy[Search] Searching Hierarchies with Nested Arrays of ObjectsThe current Search service supports searching indexed documents with nested structures, but not nested arrays of objects. The ability to search such document structures is important for a number of data types, especially those with ind...The current Search service supports searching indexed documents with nested structures, but not nested arrays of objects. 
The ability to search such document structures is important for a number of data types, especially those with indeterminate numbers of members (e.g. events associated with an activity generator, various acquisition data types, and tags on entities such as data quality tags).
It should be possible to execute search queries against such documents once indexed and utilize values in the nested arrays of objects in the queries and responses.M1 - Release 0.1JoeJoehttps://community.opengroup.org/osdu/platform/system/search-service/-/issues/61Nested attributes are not shown through search api2021-08-25T05:25:38ZAn NgoNested attributes are not shown through search apiSteps to reproduce:
Create attached [slb-wks-GenericDocumentArtefact.json](/uploads/e47e848c5acb8992b868c7c009a5e87b/slb-wks-GenericDocumentArtefact.json) schema with nested attribute and create a storage record.
**Expected**: All attributes defined in the schema, including the nested attributes, are returned by and searchable through the Search API.
**Actual**: Some of the nested attributes are missing when querying the record through the Search API.
Nested attributes such as data.classification.concept-tags, data.classification.taxnodes, etc. are missing.
Also some of the attributes are being flattened is that expected?M8 - Release 0.11https://community.opengroup.org/osdu/platform/system/search-service/-/issues/59Search by Create and Modified info on Records2021-08-25T05:37:08ZAn NgoSearch by Create and Modified info on RecordsAs part of Data Management workflows, a data admin needs the ability to find records that satisfy criteria like "when the record was created within the last 7 days" or "when the record was modified by this <user>". This is currently ...As part of Data Management workflows, a data admin needs the ability to find records that satisfy criteria like "when the record was created within the last 7 days" or "when the record was modified by this <user>". This is currently unavailable since it seems the necessary fields are not being indexed.
The queries should be against the Storage datetimes, not the indexed datetimes (if those are even kept).M8 - Release 0.11https://community.opengroup.org/osdu/platform/system/search-service/-/issues/39Audit trail columns are not available for using in search/query from Search s...2021-09-29T11:02:55ZDebasis ChatterjeeAudit trail columns are not available for using in search/query from Search serviceCheck this enclosed file to see what is exposed from Storage service (Get record) and what is exposed from Search Service query.
```
"createUser": "Debasis.Chatterjee@katalystdm.com",
"createTime": "2021-05-01T00:34:26.213Z"
```
It seems the two fields above are not available in Search.
There are use cases where we need to find records created since a certain date, or records created by a certain user.
This gap prevents that kind of query.
Please check and advise.
Thank you
[Audit-trail-from-Search-service.txt](/uploads/93a083bf84f3af0ca49ee7dbe38e687b/Audit-trail-from-Search-service.txt)M8 - Release 0.11https://community.opengroup.org/osdu/platform/system/search-service/-/issues/68Search is not returning null string attribute in response2022-01-18T04:01:36ZAn NgoSearch is not returning null string attribute in responseNull string value is supported, and filtering works. However, the response does not include the null string attribute.
For example,
"company": null
is supported, indexed and allowed to be filtered.
But in the response, "company" attribute is omitted.M9 - Release 0.12https://community.opengroup.org/osdu/platform/system/search-service/-/issues/70Improve bad request response messages2022-03-09T10:22:47ZRustam Lotsmanenko (EPAM)rustam_lotsmanenko@epam.comImprove bad request response messagesRelated to https://community.opengroup.org/osdu/platform/system/search-service/-/issues/57
When a query does not match the actual structure of the index mapping, Elasticsearch can return a detailed message that helps the user understand that the query or the mapping needs to be changed. For example, a `nested` query run against properties that are not mapped as nested in Elasticsearch:
~~~
{
"kind": "{{data-partition-id}}:wks:master-data--Well:*.*.*",
"query": "nested(data.NameAliases, (AliasName:\"L10-14\" AND AliasNameTypeID:\"osdu:reference-data--AliasNameType:WELL_NAME:\"))"
}
~~~
Elasticsearch responds with a message like this:
~~~
{
"error": {
"root_cause": [{
"type": "query_shard_exception",
"reason": "failed to create query: [nested] failed to find nested object under path [data.NameAliases]",
"index_uuid": "XVFuNDNAQvqrdeAlhykBYw",
"index": "odesprod-wks-master-data--well-1.0.0"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [{
"shard": 0,
"index": "odesprod-wks-master-data--well-1.0.0",
"node": "vozOfr32RhSgpm7dCXzp0g",
"reason": {
"type": "query_shard_exception",
"reason": "failed to create query: [nested] failed to find nested object under path [data.NameAliases]",
"index_uuid": "XVFuNDNAQvqrdeAlhykBYw",
"index": "odesprod-wks-master-data--well-1.0.0",
"caused_by": {
"type": "illegal_state_exception",
"reason": "[nested] failed to find nested object under path [data.NameAliases]"
}
}
}
]
},
"status": 400
}
~~~
However, the Search service suppresses this message, and users receive a non-descriptive response like this:
~~~
{
"code": 400,
"reason": "Bad Request",
"message": "Invalid parameters were given on search request"
}
~~~
We need to improve the `bad request` response messages and pass the Elasticsearch reason through to users. This will make it easier to understand the current data structure, which can be complicated.<br/>
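A minimal sketch of that mapping, written in Python for brevity rather than in the service's own language. The function name `to_bad_request` and the fallback behaviour are assumptions for illustration, not the Search service implementation. It takes the first `root_cause` reason from the Elasticsearch error body and falls back to the generic message when none can be read.

```python
import json

# Generic message the Search service returns today (taken from this issue).
GENERIC_MESSAGE = "Invalid parameters were given on search request"


def to_bad_request(es_error_body):
    """Build a 400 response that carries the Elasticsearch root-cause reason.

    Hypothetical helper: falls back to the generic message when the body
    cannot be parsed or does not contain a root cause with a reason.
    """
    message = GENERIC_MESSAGE
    try:
        error = json.loads(es_error_body).get("error", {})
        root_causes = error.get("root_cause") or []
        if root_causes and root_causes[0].get("reason"):
            message = root_causes[0]["reason"]
    except (ValueError, TypeError, AttributeError):
        # Unparseable or unexpectedly shaped body: keep the generic message.
        pass
    return {"code": 400, "reason": "Bad Request", "message": message}
```

Only the `reason` string is forwarded in this sketch; index names, node ids and shard details from the Elasticsearch body are left out of the user-facing response.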
Example response:
~~~
{
"code": 400,
"reason": "Bad Request",
"message": "failed to create query: [nested] failed to find nested object under path [data.NameAliases]"
}
~~~M10 - Release 0.13Rustam Lotsmanenko (EPAM)rustam_lotsmanenko@epam.comRostislav Dublin (EPAM)Rustam Lotsmanenko (EPAM)rustam_lotsmanenko@epam.comhttps://community.opengroup.org/osdu/platform/system/search-service/-/issues/69ADR: Common discovery within and across kinds2023-07-13T09:46:54Zashley kelhamADR: Common discovery within and across kinds## Status
- [X] Proposed
- [X] Under review
- [X] Approved
- [ ] Retired
## Context & Scope
Today a single schema can define multiple properties for geospatial data. For example Wellbore schema defines both the _GeographicBottomHoleLocation_ and _ProjectedBottomHoleLocation_ properties.
The json key used for spatial data is also not consistent across schemas.
This causes issues for common consumption workflows, such as finding all entities within a given area: a consumer does not know which property to query for each type, so the query becomes complicated.
Looking beyond spatial data, this is a common problem across data types. For instance, the Wellbore schema represents the name with the property 'FacilityName', but other schemas do not use this key for the name.
We want to define a standard to allow indexing properties in a common way across types. This will provide:
- One or more common properties that are searchable across kinds
- A priority list of schema properties that this can be populated from
- A way for these common properties to define relationships
## Trade-off Analysis
We could declare a single property on each schema to use as the common property. However, some schemas have multiple candidate properties, and some entity instances leave a given property undefined while populating another. Therefore no single property will ever be correct.
We could reuse the property key defined in the schema for indexing. However, this causes problems for consumers, who have to understand which property to use for each schema when discovering or running analytics across kinds. Defining a common property between schemas that consumers can use solves this concern.
We could define the standard directly in the schema only. This follows existing patterns with the indexing hints used [here](https://community.opengroup.org/search?search=x-osdu-indexing&group_id=218&project_id=91&scope=&search_code=true&snippets=false&repository_ref=master&nav_source=navbar). However, this solution does not let clients provide their own mappings for OSDU schemas.
It does, however, keep the standards in the schema, so control stays with the schema authority. Therefore a solution that supports this while also giving clients the flexibility to provide their own mappings is preferable.
A separate ADR is proposed to allow for Schema extensions using the virtual property defined in this ADR.
## Decision
We are proposing a new optional attribute in schemas to define a common property mapping.
For OSDU schemas we propose to introduce a new property `x-osdu-virtual-properties`, a dictionary that currently has only one key, `DefaultLocation`. Its value lists the paths to the source properties, and the order defines the priority: the first item in the list has the highest priority. If that property does not exist or is not populated, the next one takes precedence.
`x-osdu-virtual-properties` can be used to map any properties to a new property name that can be used for consumption. Schemas can then declare the same virtual property to allow easier cross schema consumption.
The decision is backed by OSDU Data Definitions as per [Core Concepts meeting July 6, 2021](https://gitlab.opengroup.org/osdu/subcommittees/data-def/projects/core-concepts/docs/-/blob/master/Meeting%20Minutes/2021/DataDefinitionsCoreConcepts_MeetingMinutes-2021-07-06.md#1-decisions).
The declared virtual property is never added to the record. However, consumption services like indexer/search use it to create an indexed entry, which makes the data discoverable based on this property.
#### Example use case: Assigning virtual properties within a schema
```json
{
"x-osdu-virtual-properties":{
"data.VirtualProperties.DefaultLocation": {
"type": "object",
"priority": [
{ "path": "data.ProjectedBottomHoleLocation" },
{ "path": "data.GeographicBottomHoleLocation" },
{ "path": "data.SpatialLocation" }
]}
}
}
```
The above example is prepared for Wellbore, which comes with three potential shapes. The projected representation is preferred over the geographic coordinates. Last priority is the standard shape contributed by the `AbstractFacility`.
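The priority rule can be sketched as follows. This is an illustrative sketch only, not the indexer's code: the helper names `get_path` and `resolve_virtual_property` are hypothetical, "populated" is assumed to mean "present and not empty", and array paths such as `data.dataset[].filepath` are not handled.

```python
from typing import Any, List, Optional, Tuple


def get_path(record: dict, path: str) -> Any:
    """Follow a dotted path such as 'data.SpatialLocation'; None if absent."""
    node: Any = record
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def resolve_virtual_property(record: dict, priority: List[dict]) -> Optional[Tuple[str, Any]]:
    """Return (path, value) for the first priority entry that is populated.

    Assumption: a property counts as populated when it exists and is not
    None or an empty object, list, or string.
    """
    for entry in priority:
        value = get_path(record, entry["path"])
        if value not in (None, {}, [], ""):
            return entry["path"], value
    return None


# Priority list from the Wellbore example above.
priority = [
    {"path": "data.ProjectedBottomHoleLocation"},
    {"path": "data.GeographicBottomHoleLocation"},
    {"path": "data.SpatialLocation"},
]
```

For a Wellbore record that has no projected location but does carry a geographic one, the sketch selects `data.GeographicBottomHoleLocation`, the second entry in the list.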
For now, every key created this way must be prefixed with the following:
```data.VirtualProperties.```
The `DefaultLocation` key name does not clash with any existing entity type property. It becomes relevant in generic search queries across different types including spatial conditions, for example:
```json
{
"kind": "*:*:*:*",
"spatialFilter": {
"field": "data.VirtualProperties.DefaultLocation",
"byGeoPolygon": {
"points": [
{"longitude":-90.65, "latitude":28.56},
{"longitude":-90.65, "latitude":35.56},
{"longitude":-85.65, "latitude":35.56},
{"longitude":-85.65, "latitude":28.56},
{"longitude":-90.65, "latitude":28.56}
]
}
}
}
```
There's also an _optional_ `isType` key you can apply to each entry in the priority list. It restricts the selection based on the type of data the property points to, which can differ per record instance.
For example, datasets and artifacts referenced by a record use generic schemas, so the data type depends on the record instance. In the example below, the `data.dataset[].filepath` property is only mapped if it points to a GeoJson type; otherwise the next entry checks whether it is a Raster file type. The `isType` value is not restricted.
```json
{
"x-osdu-virtual-properties":{
"data.VirtualProperties.MyLocation": {
"type": "object",
"priority": [
{
"path": "data.dataset[].filepath",
"isType": "GeoJson"
},
{
"path": "data.dataset[].filepath",
"isType": "Raster"
}
]}
}
}
```
The ```x-osdu-virtual-properties``` section also supports an _optional_ ```x-osdu-relationship``` block to describe a relationship this virtual property may have. See the example below.
The OSDU Data Definitions team ensures that canonical, well-known schemas contain a populated `x-osdu-virtual-properties`.
The report will then look like:
|Kind|Default Priority|Comment|
|----|----|----|
|→ [osdu:wks:master-data--SeismicProcessingProject:1.0.0](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/blob/237-ambiguous-locations/E-R/master-data/SeismicProcessingProject.1.0.0.md) | data.SpatialLocation | Undefined x-osdu-virtual-properties definition; Unique Location |
|→ [osdu:wks:master-data--Well:1.0.0](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/blob/237-ambiguous-locations/E-R/master-data/Well.1.0.0.md) | data.SpatialLocation | Undefined x-osdu-virtual-properties definition; Unique Location |
|→ [osdu:wks:master-data--Wellbore:1.0.0](https://gitlab.opengroup.org/osdu/subcommittees/data-def/work-products/schema/-/blob/237-ambiguous-locations/E-R/master-data/Wellbore.1.0.0.md) | 1: data.ProjectedBottomHoleLocation<br>2: data.GeographicBottomHoleLocation<br>3: data.SpatialLocation | Schema Controlled Order|
The first two kinds are reported as undefined, the third reports a proper order definition via the schema.
Keeping the x-osdu-virtual-properties mapping within the schema allows the data definitions team in OSDU to maintain control and order of how properties are mapped. However we still need to allow flexibility for specific client consumption workflows. This will be provided by Schema extensions.
#### Example use case: Describing relationships with virtual properties
It is also possible to tag virtual properties as relationships to achieve specific processing/indexing of relationships. The tagging is performed exactly the same as on standard OSDU schemas using the `x-osdu-` custom tags.
Here is a simple relationship 'replication' example: the property `PetrelProjectID` refers to the record id of a record of kind `slb:petrel:master-data--PetrelProject:*.*.*`. As a result, the property, previously not visible to the indexer, becomes declared and visible.
```
{
"kind": "osdu:wks:master-data--Well:1.0.0",
"x-osdu-extensions": {
"authority": "SLB",
"x-osdu-virtual-properties": {
"data.ExtensionProperties.PetrelProjectID": {
"type": "object",
"priority": [
{
"path": "data.ExtensionProperties.PetrelProjectID",
"isType": "string",
"type": "string",
"x-osdu-relationship": [
{
"GroupType": "master-data",
"EntityType": "PetrelProject"
}
]
}
]
}
}
}
}
```
Unconstrained or open relationships to unspecified types are declared as `"x-osdu-relationship": []`.
The next example demonstrates a new relationship by means of a virtual property with prioritized sources:
```
{
"kind": "osdu:wks:master-data--Well:1.0.0",
"x-osdu-extensions": {
"authority": "SLB",
"x-osdu-virtual-properties": {
"data.VirtualProperties.ApplicationProjectID": {
"type": "object",
"priority": [
{
"path": "data.ExtensionProperties.TechlogExtensions.TechlogProjectID",
"isType": "string",
"type": "string",
"x-osdu-relationship": [
{
"EntityType": "TechlogProject"
}
]
},
{
"path": "data.ExtensionProperties.PetrelProjectID",
"isType": "string",
"type": "string",
"x-osdu-relationship": [
{
"GroupType": "master-data",
"EntityType": "PetrelProject"
}
]
}
]
}
}
}
}
```
It demonstrates the 'virtual merge' of a relationship for a given record. The `data.VirtualProperties.ApplicationProjectID` property is expected to carry a relationship to either a Petrel project (kind `*:*:master-data--PetrelProject:*`) or a `*:*:*TechlogProject:*`. Should the Well record contain both property values as defined in the two `path` values, the first one, the `TechlogProjectID`, is taken.
## Consequences
- All existing OSDU schemas that define spatial data should be updated with a new ```DefaultLocation``` virtual property
- Data Definitions team validates that all spatial entity types are properly tagged with `"x-osdu-virtual-properties"`.
- Indexer needs to support `"x-osdu-virtual-properties"`
- Indexer needs to re-index based on all schema creation/change notificationsM10 - Release 0.13https://community.opengroup.org/osdu/platform/system/search-service/-/issues/57Search api returning 400 errors for some nested search fields2022-08-23T11:19:23ZMichaelSearch api returning 400 errors for some nested search fieldsWhen calling the search api (/api/search/v2/query) with nested search queries, some of these requests fail with a 400 error code.
I tested on both the AWS pre-shipping environment and the GCP pre-shipping environment.
The attached document details which nested search queries fail for master-data--well records
[Nested_Queries_for_Wells.docx](/uploads/21a1603d3da2fcf273b369b7833c4b86/Nested_Queries_for_Wells.docx)M10 - Release 0.13Sehubo AkinyanmiSehubo Akinyanmihttps://community.opengroup.org/osdu/platform/system/search-service/-/issues/73ADR: Search data across multiple kinds in one search2023-06-23T08:08:59ZZhibin MaiADR: Search data across multiple kinds in one search## Status
- [X] Proposed
- [x] Under review
- [x] Approved
- [ ] Retired
## Context & Scope
It is quite common for users or applications to search data across multiple kinds in one search. In OSDU search, each kind is mapped to one index, so users may need to search across multiple indices in Elasticsearch. Elasticsearch supports this by accepting either a wildcard index name or a list of index names.
Currently, OSDU search only exposes the wildcard option (e.g. "kind": "\*:\*:\*:\*") to support search across multiple indices.
There may be hundreds, if not thousands, of kinds in one tenant data partition. We found that using a wildcard to search across multiple indices introduces significant performance overhead compared with a list of index names. The more indices in Elasticsearch, the bigger the overhead can be. The attached diagram shows our observation:
![image](/uploads/e84ac4851dd5d19c280b75e4b602d3ad/image.png)
## Trade-off Analysis
Here is the relevant API spec: https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/search-core/src/main/java/org/opengroup/osdu/search/api/SearchApi.java
Without introducing a new field in the search API, we propose to concatenate the index (kind) names with commas in the existing “kind” property, e.g.
``````````````````````````
e.g. I have kinds in the system
a:b:c:d
a:e:c:d
a:f:c:d
a:g:c:d
I want to search keyword "well" against only 2 kinds
a:b:c:d
a:e:c:d
today I can only do this by forming a query
{
"kind": "a:*:c:d",
"query": "(\"kind\": \"a:b:c:d\" OR \"kind\": \"a:e:c:d\") AND well"
}
This still makes my query slower because the search is performed against all indexes the wildcard matches i.e.
a:b:c:d
a:e:c:d
a:f:c:d
a:g:c:d
even though I know I only want to search against 2 of the indexes. The proposed solution will allow me to change this to
{
"kind": "a:b:c:d,a:e:c:d",
"query": "well"
}
Making my query easier to write and potentially a lot more performant, as it targets only the indexes I want to search against
``````````````````````````
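As a rough illustration of the proposal, the comma-separated `kind` value could be turned into an explicit list of kinds as sketched below. The function name `normalize_kinds` is hypothetical and this is not the code from the linked merge requests. Accepting a list of strings as well as a single string is an assumption made here so that both input shapes share one code path.

```python
from typing import List, Union


def normalize_kinds(kind: Union[str, List[str]]) -> List[str]:
    """Split a comma-separated kind string into a de-duplicated list.

    A plain single kind or wildcard kind (no comma) comes back as a
    one-element list, so existing requests keep working unchanged.
    """
    parts = kind.split(",") if isinstance(kind, str) else list(kind)
    kinds: List[str] = []
    for part in parts:
        name = part.strip()
        # Skip empty fragments such as the one after a trailing comma.
        if name and name not in kinds:
            kinds.append(name)
    if not kinds:
        raise ValueError("at least one kind is required")
    return kinds
```

With this shape, `"a:b:c:d,a:e:c:d"` yields two kinds, and each can then be mapped to its own index name rather than to a wildcard pattern.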
Here are the Pros and Cons of the proposal:
| Pros| Cons|
| ------ | ------ |
| - Non-breaking change. No API change required. | - Not following the json pattern to code multiple items |
| - It is consistent with Elasticsearch's pattern on coding multi-indices for search. | |
| - Change only on "Common Code" in both "OSDU Core Common" and "Search Service". | |
## Decision
The proposal is a non-breaking change. Its implementation is pretty simple and safe. Prototype of the implementation in OSDU Core Common and Search Service can be found in MRs:
- [Change on Core Common](https://community.opengroup.org/osdu/platform/system/lib/core/os-core-common/-/merge_requests/127)
- [Change on Search service](https://community.opengroup.org/osdu/platform/system/search-service/-/merge_requests/190)
## Consequences
This is a non-breaking change but with big performance gain when searching across multiple indices.M11 - Release 0.14Zhibin MaiZhibin Mai2022-01-14https://community.opengroup.org/osdu/platform/system/search-service/-/issues/71Cross-cluster Search endpoint needs to be obsoleted2022-12-09T13:13:14ZGary MurphyCross-cluster Search endpoint needs to be obsoleted**Summary**: The cross cluster search endpoint and implementation in Search were incorrectly copied over from the internal SLB implementation leading to issues where experimentation with the exposed service is causing many 5xx errors to...**Summary**: The cross cluster search endpoint and implementation in Search were incorrectly copied over from the internal SLB implementation leading to issues where experimentation with the exposed service is causing many 5xx errors to be thrown. The endpoint and implementation are not functioning in OSDU, and it should be obsoleted as soon as possible.
</br>
**Details**:
Here is the relevant API spec: https://community.opengroup.org/osdu/platform/system/search-service/-/blob/master/search-core/src/main/java/org/opengroup/osdu/search/api/SearchApi.java#L135
The history on this endpoint is that it should never have been contributed to OSDU in its current form (it is incomplete in terms of code robustness and it references concepts not relevant to OSDU such as the "common" data partition which is no longer an OSDU component).
Since this endpoint **does not** work currently, and it would require major rework to have similar functionality, and it is causing runtime issues on the Search service (5xx alerts, etc.) the simplest and correct path is to obsolete it.M11 - Release 0.14https://community.opengroup.org/osdu/platform/system/search-service/-/issues/91In GCP Platform Validation environment - Not able to search the record (that ...2022-08-23T21:15:28ZKamlesh TodaiIn GCP Platform Validation environment - Not able to search the record (that was created using Storage API) using Search APIWhile testing WITSML parser DAG in the GCP platform validation environment (devtwo), I see that DAG finishes successfully.
Using the Storage API, I am not able to find the expected record.
When one looks at the details of the airflow logs, one finds the following:
XCom
Key Value
return_value {'manifest': {'ReferenceData': [], 'MasterData': [{'acl': {'owners': ['data.default.owners@devtwo.dev2.osdu.club'], 'viewers': ['data.default.viewers@devtwo.dev2.osdu.club']}, 'kind': 'devtwo:wks:master-data--Well:1.0.0', 'legal': {'legaltags': ['devtwo-WITSML-Legal-Tag-Test1387421'], 'otherRelevantDataCountries': ['US'], 'status': 'compliant'}, 'createTime': '2022-05-16T21:15:06.587899+00:00', 'data': {'ResourceSecurityClassification': 'devtwo:reference-data--ResourceSecurityClassification:RESTRICTED:', 'SpatialLocation': {'AsIngestedCoordinates': {'features': [{'type': 'AnyCrsFeature', 'geometry': {'type': 'AnyCrsPoint', 'bbox': [0.0, 0.0, 0.0, 0.0], 'coordinates': [0.0, 0.0]}, 'properties': {}}], 'persistableReferenceCrs': 'devtwo:reference-data--CoordinateReferenceSystem:23031:', 'type': 'AnyCrsFeatureCollection', 'CoordinateReferenceSystemID': 'devtwo:reference-data--CoordinateReferenceSystem:23031:'}, 'Wgs84Coordinates': {'features': [{'type': 'Feature', 'geometry': {'type': 'Point', 'coordinates': [0.0, 0.0]}, 'properties': {}}], 'type': 'FeatureCollection'}}, 'FacilityID': 'Govt-Number', 'FacilityName': 'Company Legal Name', 'VerticalMeasurements': [{'VerticalMeasurementID': 'KB', 'VerticalMeasurement': 78.5, 'VerticalMeasurementDescription': 'Kelly Bushing', 'VerticalMeasurementPathID': 'devtwo:reference-data--VerticalMeasurementPath:Elevation:', 'VerticalMeasurementTypeID': 'devtwo:reference-data--VerticalMeasurementType:KB:', 'VerticalMeasurementUnitOfMeasureID': 'devtwo:reference-data--UnitOfMeasure:ft:', 'VerticalReferenceID': 'SL'}, {'VerticalMeasurementID': 'SL', 'VerticalMeasurementDescription': 'Sea Level', 'VerticalMeasurementPathID': 'devtwo:reference-data--VerticalMeasurementPath:Elevation:', 'VerticalMeasurementTypeID': 'devtwo:reference-data--VerticalMeasurementType:MSL:'}]}, 'id': 'devtwo:master-data--Well:WELLAUTOTEST_KTMay16', 'modifyTime': '2022-05-16T21:15:06.587899+00:00', 'version': 1}], 'Data': {'Datasets': [], 
'WorkProductComponents': [], 'WorkProduct': {}}, 'kind': 'devtwo:wks:Manifest:1.0.0'}}
**skipped_ids** [{'id': 'devtwo:dataset--File.WITSML:WELLAUTOTEST_KTMay16', 'kind': 'devtwo:wks:dataset--File.WITSML:1.0.0', 'reason': **'Missing parents: {SRN: devtwo:reference-data--SchemaFormatType:EnergisticsWITSML}'}]**
**Using Storage API - I can retrieve the record.**
**Request**:
curl --location --request GET 'https://dev2.gcp.gnrg-osdu.projects.epam.com/api/storage/v2/records/devtwo:reference-data--SchemaFormatType:EnergisticsWITSML' \
--header 'Content-Type: application/json' \
--header 'data-partition-id: devtwo' \
--header 'frame-of-reference: units=SI;crs=wgs84;elevation=msl;azimuth=true north;dates=utc;' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer ya29.a0ARrdaM-EQ-TzRAYNSXRgk7o8dl5SUtwO...by1PWX3U'
**Response**: 200 OK
{
"data": {
"Endian": "BIG"
},
"id": "devtwo:reference-data--SchemaFormatType:EnergisticsWITSML",
"version": 1652719841337471,
"kind": "devtwo:wks:reference-data--SchemaFormatType:1.0.0",
"acl": {
"viewers": [
"data.default.viewers@devtwo.dev2.osdu.club"
],
"owners": [
"data.default.viewers@devtwo.dev2.osdu.club"
]
},
"legal": {
"legaltags": [
"devtwo-WITSML-Legal-Tag-Test7374005"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"createUser": "kamlesh_todai@osdu-gcp.go3-nrg.projects.epam.com",
"createTime": "2022-05-16T16:50:43.892Z"
}
**Using Search API - I Cannot retrieve the record.**
**Request**:
curl --location --request POST 'https://dev2.gcp.gnrg-osdu.projects.epam.com/api/search/v2/query' \
--header 'Content-Type: application/json' \
--header 'data-partition-id: devtwo' \
--header 'Authorization: Bearer ya29.a0ARrdaM-QCGVm66qleS22ZvF7OT3NuWDDje...jKx3lnOHBIIshQR4' \
--data-raw '{
"kind": "devtwo:wks:reference-data--SchemaFormatType:1.0.0",
"query": "id:\"devtwo:reference-data--SchemaFormatType:EnergisticsWITSML\""
}'
**Response:** 200 OK
{
"results": [],
"aggregations": [],
"totalCount": 0
}
[Search_WITSML_issue.txt](/uploads/5c7613d90d7f826fb86e114bb15dee0d/Search_WITSML_issue.txt)M12 - Release 0.15Dzmitry Malkevich (EPAM)Dzmitry Malkevich (EPAM)https://community.opengroup.org/osdu/platform/system/search-service/-/issues/86Support multi-kinds separate with comma2022-08-23T13:35:49ZZhibin MaiSupport multi-kinds separate with commaThis is the extension of ADR[73](https://community.opengroup.org/osdu/platform/system/search-service/-/issues/73/) to support search across multi-kinds.
According to feedback we received, quite a lot of users prefer to code multiple kinds as a comma-separated string instead of an array of strings. This issue is to extend the format of multi-kinds in the search API, such as
```
{
"kind": "a:b:c:d,a:e:c:d",
"query": "well"
}
```
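One way to accept both forms without breaking existing clients is to normalize the `kind` value before validation. The following is only a minimal sketch under that assumption; `normalize_kinds` is a hypothetical helper, not part of the Search service code.

```python
from typing import List, Union


def normalize_kinds(kind: Union[str, List[str]]) -> List[str]:
    """Return a list of kinds from a single kind, a comma-separated
    string of kinds, or an array of kinds (hypothetical helper)."""
    # A string may hold one kind or several kinds separated by ','.
    raw = kind.split(",") if isinstance(kind, str) else list(kind)
    # Drop surrounding whitespace and empty entries such as a trailing ','.
    kinds = [k.strip() for k in raw if k.strip()]
    if not kinds:
        raise ValueError("at least one kind is required")
    return kinds


print(normalize_kinds("a:b:c:d,a:e:c:d"))
print(normalize_kinds(["a:b:c:d", "a:e:c:d"]))
```

Both calls produce the same list, so the existing single-kind and array forms keep working while the comma-separated form is added.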
The proposed solution must be backward compatible.
M12 - Release 0.15Zhibin MaiZhibin Maihttps://community.opengroup.org/osdu/platform/system/search-service/-/issues/99Fix the search and indexing performance issues when the geometry of the docum...2023-07-10T16:17:26ZZhibin MaiFix the search and indexing performance issues when the geometry of the document is large
##### Background:
Today the geometries (also called shapes) in the indexed records are not decimated. The geometry data can be large, reaching tens if not hundreds of MB. As we know, the geometry in the search index can be used to support spatial queries, data preview, or data discovery.
However, large geometries in the indexed records can significantly slow down the retrieval of search results and prevent the results from being used efficiently in some utilities, such as a GIS map. In O&G applications, the GIS map is a critical component that users rely on to render the shapes in a given region as a data discovery tool. It may need to retrieve and render thousands or even millions of shapes from the OSDU index. If there are tens of thousands of shapes to be retrieved and rendered, the performance won't be good enough even if the shapes are decimated. On the other hand, it is unnecessary to show the shapes in full detail when tens of thousands of indexed records are returned from the search.
##### Proposal:
We propose to decimate the geometry of the following GeoJSON geometry types by applying the Ramer–Douglas–Peucker algorithm to the original shape attribute and to the shape attribute "data.VirtualProperties.DefaultLocation.Wgs84Coordinates", if it exists:
- LineString
- MultiLineString
- Polygon
- MultiPolygon
Regarding shape attribute "data.VirtualProperties.DefaultLocation", please refer to ADR [Common discovery within and across kinds](https://community.opengroup.org/osdu/platform/system/search-service/-/issues/69)
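For readers unfamiliar with the algorithm, here is a minimal sketch of Ramer–Douglas–Peucker on a single line of `(x, y)` points. It is an illustration only, not the prototype's code: it assumes planar distances measured in the same units as the coordinates (degrees for WGS84), and the names `rdp` and `_perp_distance` are made up for this example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _perp_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the line through a and b (planar approximation)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # a and b coincide (e.g. a closed ring): fall back to point distance.
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def rdp(points: Sequence[Point], epsilon: float) -> List[Point]:
    """Simplify a polyline, keeping only points farther than epsilon
    from the chord between the retained end points."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first -> last.
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _perp_distance(points[i], first, last)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        # Every interior point is within tolerance: keep only the ends.
        return [first, last]
    # Otherwise keep the farthest point and simplify both halves.
    left = rdp(points[: idx + 1], epsilon)
    right = rdp(points[idx:], epsilon)
    return left[:-1] + right


line = [(0.0, 0.0), (1.0, 0.00001), (2.0, 0.0), (3.0, 1.0)]
print(rdp(line, 0.0001))
```

A MultiLineString, Polygon, or MultiPolygon would be handled by applying the same routine to each line or ring. This recursive form is kept short for clarity; very large geometries would probably need an iterative variant to avoid deep recursion.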
##### Performance Evaluation:
We evaluated performance with a prototype that decimates the original shape attribute and the shape attribute "data.VirtualProperties.DefaultLocation.Wgs84Coordinates", using some seismic 2D surveys. The tolerance (epsilon) is about 10 meters, which is about 0.0001 degrees around the equator.
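The relation between the 10 meter tolerance and 0.0001 degrees can be checked with a short calculation. The sketch below assumes a spherical approximation using the WGS84 equatorial radius and converts a distance along a parallel into degrees of longitude; `meters_to_lon_degrees` is a name invented for this example.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS84 equatorial radius in meters


def meters_to_lon_degrees(meters: float, latitude_deg: float = 0.0) -> float:
    """Convert an east-west distance to degrees of longitude at a latitude."""
    # Length of one degree of longitude shrinks with cos(latitude).
    meters_per_degree = (
        math.pi * EARTH_RADIUS_M / 180.0 * math.cos(math.radians(latitude_deg))
    )
    return meters / meters_per_degree


print(meters_to_lon_degrees(10.0))        # at the equator
print(meters_to_lon_degrees(10.0, 60.0))  # at 60 degrees latitude
```

At the equator the result is close to 0.00009 degrees, which rounds to the 0.0001 quoted above. The same tolerance in degrees covers a shorter east-west distance at higher latitudes.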
The information of the test dataset and summary of the test report are attached below:
- [performance_test_summary.txt](/uploads/dc913a11d5cead3a1b5b54529c5449de/performance_test_summary.txt)
- [test_dataset.csv](/uploads/0263b8e976526c246e4dd8074a8c52f2/test_dataset.csv)
##### Summary:
1. Decimating the shape attributes significantly improves the end-to-end search performance (search and data retrieval from Elasticsearch to the test client).
2. The extra overhead of decimation during indexing is offset by the time saved when Elasticsearch indexes the geo-shapes. The test result indicates that it reduced the overall indexing time by 58%.
M14 - Release 0.17Zhibin MaiZhibin Maihttps://community.opengroup.org/osdu/platform/system/search-service/-/issues/103Search - Policy Integration "400 Request Header Or Cookie Too Large"2023-07-04T11:01:18ZThulasi Dass SubramanianSearch - Policy Integration "400 Request Header Or Cookie Too Large"
**Background**
After enabling Policy in the Search service, we are observing an intermittent issue that results in the following error response:
```
{
"code": 400,
"reason": "Bad Request",
"message": "Failed to derive xcontent"
}
```
**Analysis:**
Based on our localhost analysis ([logs](/uploads/8c93c067644655091d5e8125885a7fdb/Policy-Translate-Header-issue.txt)), the current user (preshipping@azureglobal1.onmicrosoft.com) is a member of _more than 2000 data groups_.
When Search calls the Policy translate API, the '**X-Data-Groups**' request header contains **more than 2000 groups** for the user, which results in '**400 Request Header Or Cookie Too Large**'.
```HttpResponse(headers={null=[HTTP/1.1 400 Bad Request], Server=[Microsoft-Azure-Application-Gateway/v2], Connection=[close], Content-Length=[259], Date=[Wed, 02 Nov 2022 07:06:04 GMT], Content-Type=[text/html]}, body=<html><head><title>400 Request Header Or Cookie Too Large</title></head><body><center><h1>400 Bad Request</h1></center><center>Request Header Or Cookie Too Large</center><hr><center>Microsoft-Azure-Application-Gateway/v2</center></body></html>, contentType=text/html, responseCode=400, exception=null, request=https://osdu-ship.msft-osdu-test.org/api/policy/v1/translate, httpMethod=POST, latency=1623```
This error body is then passed on as the input query to Elasticsearch, which results in the following Elasticsearch exception:
```
{
"code": 400,
"reason": "Bad Request",
"message": "Failed to derive xcontent"
}
```
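To get a feel for the sizes involved, the sketch below estimates how many bytes a single header carrying a user's groups would occupy. Everything specific here is an assumption for illustration: the comma-joined header format, the synthetic group names, and the 32 KiB limit (the actual gateway limit is not stated in this issue).

```python
from typing import Iterable


def header_size_bytes(name: str, values: Iterable[str]) -> int:
    """Size of one 'Name: v1,v2,...' header line including CRLF.
    The comma-joined format is an assumption, not taken from the service."""
    line = f"{name}: {','.join(values)}\r\n"
    return len(line.encode("utf-8"))


def fits_header_budget(groups: Iterable[str], limit_bytes: int,
                       name: str = "X-Data-Groups") -> bool:
    """True if the groups header stays within the given byte budget."""
    return header_size_bytes(name, groups) <= limit_bytes


# Synthetic group names, only to illustrate the order of magnitude.
groups = [f"data.group{i:04d}.viewers@example.com" for i in range(2000)]
ASSUMED_LIMIT = 32 * 1024  # hypothetical limit, not a documented value

print(header_size_bytes("X-Data-Groups", groups))
print(fits_header_budget(groups, ASSUMED_LIMIT))
```

A pre-flight check of this kind could let a caller fail with a clear message instead of forwarding the gateway's HTML error body as a query.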
**Workaround:**
- We have **deleted** the stale/test groups present in X-Data-Groups for the user via Entitlements API.
**Need Inputs:**
The above workaround is not an ideal or permanent solution, so we are looking for input on how to remediate this issue across all environments.
M15 - Release 0.18Shane HutchinsThulasi Dass SubramanianShane Hutchinshttps://community.opengroup.org/osdu/platform/system/search-service/-/issues/127Wildcard Searching not consistent in nested fields2024-02-01T15:22:13ZMark ChanceWildcard Searching not consistent in nested fields
```
Various queries of nested fields do not return expected results. For instance, with this context:
{"kind":"osdu:wks:master-data--Well:1.2.0",
"offset":0,"limit":30}
WORKS:
"query":"nested(data.FacilityStates, (FacilityStateTypeID:osdu\\:reference*))",
Returns 72, including:
"data": {
"FacilityStates": [
{
"FacilityStateTypeID": "osdu:reference-data--FacilityStateType:Abandoned:",
"Remark": null
},
{
"FacilityStateTypeID": "osdu:reference-data--FacilityStateType:Planning:",
"Remark": null
}
],
FAILS:
"query":"nested(data.FacilityStates, (FacilityStateTypeID:osdu\\:reference-data*))",
"query":"nested(data.FacilityStates, (FacilityStateTypeID:osdu\\:reference?data*))",
"query":"nested(data.FacilityStates, (FacilityStateTypeID:osdu\\:reference\\-data*))",
For what it's worth, the schema has "FacilityStateTypeID": { "type": "string",...
And another example:
{"kind":"osdu:wks:master-data--Wellbore:1.0.0","offset":0,"limit":30}
returns 2453, including
"data": {
"GeoContexts": [
{
"BasinID": null,
"FieldID": "osdu:master-data--Field:Tietjerksteradeel:",
"PlayID": null,
"GeoPoliticalEntityID": null,
"GeoTypeID": "Field",
"ProspectID": null
}
],
...
Works:
"query":"nested(data.GeoContexts, (FieldID:osdu?master*))",
returns 1140
Fails:
"query":"nested(data.GeoContexts, (FieldID:osdu?master?data*))",
"query":"nested(data.GeoContexts, (FieldID:osdu?master\\-*))",
Additional Examples
WORKS:
"query":"nested(data.GeoContexts, (FieldID:\"osdu:master-data--Field:Tietjerksteradeel\"))",
"query":"nested(data.GeoContexts, (FieldID:\"osdu\\:master\\-data\\-\\-Field\\:Tietjerksteradeel\"))",
"query":"nested(data.GeoContexts, (FieldID:osdu\\:master\\-data\\-\\-Field\\:Tietjerksteradeel))",
returns 8
"query":"nested(data.GeoContexts, (FieldID:osdu*))",
returns 1162
And again, the schema has "FieldID": { "type": "string",
The original AHA Link is https://osdu-community.ideas.aha.io/ideas/IDEA-I-68
These queries have been run on a Shell-deployed instance on AWS:
"artifactId":"search-aws",
"version":"0.19.2",
"buildTime":"2023-03-20T22:58:41.497Z",
"branch":"refs/heads/release/r3-m16",
"commitId":"f8549673fca69422a024c9c980a36b22a445ca1e",
"commitMessage":"Change unit test to use older version",
```
M16 - Release 0.19https://community.opengroup.org/osdu/platform/system/search-service/-/issues/118"Search data in a given kind with hundreds of copies" test scenario exceeds E...2023-02-17T12:29:21ZMorris Estepa"Search data in a given kind with hundreds of copies" test scenario exceeds Elasticsearch limits
The integration test for the "Search data in a given kind with hundreds of copies" scenario causes Elasticsearch to throw an HTTP 400 error because the "kind" attribute in the query is too long. Elasticsearch only allows up to 4096 bytes in the query.
M16 - Release 0.19https://community.opengroup.org/osdu/platform/system/search-service/-/issues/112Azure search service can have high latencies from entitlements2023-02-17T09:06:32Zashley kelhamAzure search service can have high latencies from entitlements
The search service on Azure can occasionally have high latencies when the API check against entitlements is slow to respond. Unlike most other CSPs and other services, Search doesn't appear to cache this response from entitlements for reuse, which would negate this impact.
We would like to add an Azure version of IAuthorizationService with a cache to Search to help.
M16 - Release 0.19ashley kelhamashley kelhamhttps://community.opengroup.org/osdu/platform/system/search-service/-/issues/108Multi kind search does not work with more than approximately 85 kinds2023-03-28T20:27:53ZZhibin MaiMulti kind search does not work with more than approximately 85 kinds
In [ADR: Search data across multiple kinds in one search](https://community.opengroup.org/osdu/platform/system/search-service/-/issues/73/), we use an Elasticsearch capability to support multi-kind search and improve search performance. The solution has been deployed to our clients, and the feedback on the performance improvement is quite positive. However, one client reported that the search returns 400 when it includes a large number of kinds.
The reason is that the Elasticsearch API takes the index names (converted kind names) in the path, as part of the URL, and the web server that Elasticsearch uses limits the URL length to about 4000 bytes. Instead of letting Elasticsearch throw an exception, the validator in the OSDU Search service checks the length of the kinds and returns 400 if it exceeds the limit. For details, please refer to [MultiKindValidator](https://community.opengroup.org/osdu/platform/system/lib/core/os-core-common/-/blob/master/src/main/java/org/opengroup/osdu/core/common/model/search/validation/MultiKindValidator.java).
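The length check described above can be sketched as follows. This is not the `MultiKindValidator` code: the 4000-byte budget is the approximate figure from the text, and `kind_to_index_name` assumes that a kind is converted to an index name by replacing `:` with `-` and lowercasing, which should be verified against the actual service.

```python
from typing import List

MAX_INDEX_NAMES_BYTES = 4000  # approximate URL budget quoted in the text


def kind_to_index_name(kind: str) -> str:
    """Assumed kind-to-index conversion (':' -> '-', lowercase)."""
    return kind.replace(":", "-").lower()


def validate_kinds_length(kinds: List[str],
                          max_bytes: int = MAX_INDEX_NAMES_BYTES) -> int:
    """Return the byte length of the comma-joined index names, or raise
    ValueError (which the service would map to HTTP 400) if too long."""
    joined = ",".join(kind_to_index_name(k) for k in kinds)
    size = len(joined.encode("utf-8"))
    if size > max_bytes:
        raise ValueError(
            f"index names take {size} bytes, which exceeds {max_bytes}"
        )
    return size


print(validate_kinds_length(["osdu:wks:master-data--Well:1.2.0",
                             "osdu:wks:master-data--Wellbore:1.0.0"]))
```

Rejecting the request up front gives the caller a clear message about the kinds being too long, rather than an opaque error coming back from Elasticsearch.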
We calculated the average length of kinds in our test environments; it is about 45 bytes per kind. That means the current multi-kind solution can support up to about 85 kinds on average. We also did some research on the Elasticsearch API. To support more kinds, one approach is to use [Aliases](https://www.elastic.co/guide/en/elasticsearch/reference/current/aliases.html#aliases) in Elasticsearch to shorten the index names (converted kinds) passed to Elasticsearch. The basic idea is to replace a long index name with a short alias when the index names passed to Elasticsearch are too long. If we use the hash code of the index name as the alias, the alias is no longer than 10 bytes. That means we can support about 400 kinds, which should be enough in most cases.
M16 - Release 0.19Zhibin MaiZhibin Mai
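The alias idea in issue 108 above can be illustrated with a small sketch. It uses CRC32 rendered as 8 hex characters as a stand-in for the "hash code" mentioned in the text (the proposal does not say which hash is used), and the 4000-byte budget is the approximate figure quoted there. `alias_for` and `max_names` are hypothetical names.

```python
import zlib


def alias_for(index_name: str) -> str:
    """Short, fixed-length alias derived from the index name.
    CRC32 is a stand-in hash; collisions would need handling in practice."""
    return format(zlib.crc32(index_name.encode("utf-8")), "08x")


def max_names(budget_bytes: int, name_bytes: int) -> int:
    """How many comma-separated names of a given length fit in the budget."""
    # n names need n * name_bytes + (n - 1) separator bytes.
    return (budget_bytes + 1) // (name_bytes + 1)


index_name = "osdu-wks-master-data--wellbore-1.0.0"
print(alias_for(index_name), len(alias_for(index_name)))
print(max_names(4000, 45))  # long index names
print(max_names(4000, 10))  # 10-byte aliases
```

Counting one separator byte per name gives figures close to the rounded 85 and 400 quoted above. Because any short hash can collide, a real implementation would need a strategy for two index names that map to the same alias.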