# Indexer issues
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues

# Elasticsearch version upgrade
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/2
Author: ethiraj krishnamanaidu · Updated: 2023-08-18

Current Version
- Elastic Server: 6.8.1
- Elastic Client version (OSDU Indexer service): 6.6.1

Proposed Version upgrade
- Elastic Server: 7
- Elastic Client version (OSDU Indexer service): 7
We need to upgrade the client and the Elasticsearch server version. This would require the following changes:
- Update the Indexer service (code change plus library version upgrade).
- Upgrade the Elasticsearch server in all clouds (AWS, Azure, Google, IBM, etc.).

Milestone: M1 - Release 0.1

# [Indexer] Support for indexing documents with nested arrays of objects
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/1
Author: Gary Murphy · Updated: 2024-01-11

JSON documents with nested arrays of objects are not currently indexed by the Indexer. This capability needs to be added so that search queries on such documents can be executed. Given the performance issues of allowing too many levels of nested arrays to be searched, it is proposed that a limit be placed on the number of levels allowed for nested indexing. Additionally, where an abstract base schema is defined for the attribute type (example: AbstractFacilityEvent in AbstractFacility.json), the indexer should only support indexing the abstract base schema entities, not extensions added to the concrete definition.

Milestone: M1 - Release 0.1

# ADR: Support self-signed certificates for Elasticsearch
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/4
Author: Riabokon Stanislav (EPAM) [GCP] · Updated: 2020-09-10

## Context and Scope
Indexer Service does not support HTTPS connections to Elasticsearch with self-signed certificates.
## Decision
Add a property `security.https.certificate.trust` to application-*.properties.
When it is `true`, `org.opengroup.osdu.search.util.ElasticClientHandler` will use `TrustSelfSignedStrategy()`, which trusts self-signed certificates.
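A minimal configuration sketch of the decision above. The property name comes from this ADR; the file name and comment are illustrative assumptions, not the actual provider configuration:

```properties
# application-dev.properties (illustrative)
# Trust self-signed certificates on HTTPS connections to Elasticsearch.
# When true, ElasticClientHandler builds its SSL context with
# Apache HttpClient's TrustSelfSignedStrategy.
security.https.certificate.trust=true
```

In production, this property should remain `false` so that only CA-signed certificates are trusted.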
## Rationale
This allows deployments to use self-signed certificates for Elasticsearch instead of CA-signed certificates.
## Consequences
These changes could affect all cloud providers, since they will be implemented in indexer-core.

Assignee: Dmitriy Rudko

# Branch merged to master with broken AWS integration tests
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/6
Author: Matt Wise · Updated: 2020-10-09

!29 was merged to master with a failing pipeline. The AWS tests were passing prior to the merge, but the merge was completed even though the pipeline failed.
The breakage was the result of changing the [testing/indexer-test-core/pom.xml](https://community.opengroup.org/osdu/platform/system/indexer-service/-/blob/master/testing/indexer-test-core/pom.xml) os-core-common version from 0.3.6 to 0.3.12, which seems to have changed the dependency for the Jackson data mapper (which is required by AWS).
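For reference, the version bump in question would look roughly like the following fragment of testing/indexer-test-core/pom.xml. The groupId is assumed from the OSDU artifact naming convention; the actual POM layout may differ:

```xml
<!-- testing/indexer-test-core/pom.xml (illustrative fragment) -->
<!-- Bumping os-core-common transitively changes the Jackson data
     mapper version that the AWS tests depend on. -->
<dependency>
    <groupId>org.opengroup.osdu</groupId>
    <artifactId>os-core-common</artifactId>
    <version>0.3.12</version>
</dependency>
```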
This change should have had CSP approval since it touches core code, but it was neither approved nor was the pipeline passing first, as required.

Milestone: M1 - Release 0.1

# Indexer to read from Schema Service as well as Storage Schema -- non-breaking
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/7
Author: Gary Murphy · Updated: 2022-09-16

The Indexer currently reads only from the Storage schema endpoint for schemas that describe storage records.
As part of the Schema Service roadmap, Indexer needs to consume schemas from Schema Service as well as Storage. At this stage in the roadmap, Storage Schemas will not be migrated to the Schema Service, but Indexer will read from both endpoints.
The bulk of record ingestion is expected to come through ingestion services, which will use Schema Service schemas to describe the records being ingested. So, as noted in the Schema Service roadmap, all consumers of schemas should eventually use only the Schema Service as the system of record (SoR) for storage record descriptions, tailoring those full-featured schemas to their consumption patterns. The interim step proposed here is that Indexer first read schemas from the Schema Service (internally flattening them as needed for indexing), then fall back to Storage Schemas for a kind's schema if it is not found in the Schema Service. This approach necessitates an eventual migration of Storage Schemas to the Schema Service, but that is likely to be needed anyway; Schema Service as SoR drives the requirement. By not removing the Storage Service schema endpoint and having Indexer check both endpoints, a non-breaking enhancement is introduced that allows evolutionary introduction of Schema Service schemas into the indexing workflow.

# Indexer support for allOf
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/10
Author: ethiraj krishnamanaidu · Updated: 2021-01-28

Support for allOf has been improved for the JSON schema:
* allOf is not required in the properties->data section
* allOf can be nested inside another allOf
* allOf may be used inside properties
* changes are in core; all cloud providers are affected

Assignee: Sviatoslav Nekhaienko

# Update the index creation logic to accept forward slash
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/9
Author: ethiraj krishnamanaidu · Updated: 2021-07-14
Indexer...DD team introduced new format for [kind ](https://community.opengroup.org/osdu/platform/system/storage/-/issues/26#note_27827), in order to support the new format we need update the index creation logic to support forward slash.
Indexer creates one index per kind and uses the kind as the index name, so any special character requires changes to the logic that maintains the index <-> kind mapping. These characters are unsupported in index names by the search backend: \, /, *, ?, ", <, >, |, (space character), ,, #

Assignee: Neelesh Thakur

# All CSP Integration tests are failing in master
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/11
Author: Matt Wise · Updated: 2021-02-05

It seems that since a day or two ago all CSP integration tests are failing.
Latest Pipeline:
https://community.opengroup.org/osdu/platform/system/indexer-service/-/pipelines/24040
Latest commit (has a core change):
https://community.opengroup.org/osdu/platform/system/indexer-service/-/commit/24f5206c0e103958d7f655f471e64f34f5fa6445

# [Azure] Reindex API for azure fails during entitlement calls
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/12
Author: Aman Verma · Updated: 2021-03-02

The reindex API fails for Azure during the entitlements call.
Sample request:
POST https://osdu-dev.msft-osdu-test.org/api/indexer/v2/reindex
{
"kind": "opendes:at:wellbore:1.0.0"
}
Response:
{
"timestamp": "2021-02-10T09:31:21.560+00:00",
"status": 500,
"error": "Internal Server Error",
"message": "java.lang.String cannot be cast to com.microsoft.azure.spring.autoconfigure.aad.UserPrincipal",
"path": "/api/indexer/v2/reindex"
}
The error comes from the `getGroups()` method of the `EntitlementsServiceAzure` class.
cc: @manishk, @njain5, @nthakur, @komakkar

Assignee: Aman Verma

# [Azure] Indexer service tests are failing with invalid IPv6 address
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/13
Author: Aman Verma · Updated: 2021-04-13

Integration tests for the indexer service are failing with this error:
java.lang.AssertionError: 5fe68ea3f85a44a3b3f620e33ae00286.centralus.azure.elastic-cloud.com:9243: invalid IPv6 address
at org.opengroup.osdu.util.ElasticUtils.deleteIndex(ElasticUtils.java:203)
at org.opengroup.osdu.common.SchemaServiceRecordSteps.deleteIndex(SchemaServiceRecordSteps.java:37)
at org.opengroup.osdu.common.SchemaServiceRecordSteps.lambda$the_schema_is_created_with_the_following_kind$0(SchemaServiceRecordSteps.java:21)
Sample run:
https://community.opengroup.org/osdu/platform/system/indexer-service/-/jobs/235943
cc: @manishk

Assignee: MANISH KUMAR

# [Azure] Entitlements service calls from reindex API are failing
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/14
Author: Aman Verma · Updated: 2021-03-19

**Description**
The entitlements call during the `/reindex` API fails with a `NullPointerException` when the entitlements URI is the pod name in the deployment.yaml files. When the entire URL is hard-coded in the deployment.yaml file, requests go through successfully.
cc: @kiveerap, @manishk

Exception:
[{"severityLevel":"Error","parsedStack":[{"method":"java.io.Reader.<init>","level":0,"line":78,"fileName":"Reader.java"},{"method":"java.io.InputStreamReader.<init>","level":1,"line":72,"fileName":"InputStreamReader.java"},{"method":"org.opengroup.osdu.core.common.http.AbstractHttpClient.getBody","level":2,"line":72,"fileName":"AbstractHttpClient.java"},{"method":"org.opengroup.osdu.core.common.http.AbstractHttpClient.send","level":3,"line":53,"fileName":"AbstractHttpClient.java"},{"method":"org.opengroup.osdu.core.common.http.HttpClient.send","level":4,"line":25,"fileName":"HttpClient.java"},{"method":"org.opengroup.osdu.core.common.entitlements.EntitlementsService.getGroups","level":5,"line":73,"fileName":"EntitlementsService.java"},{"method":"org.opengroup.osdu.core.common.entitlements.AuthorizationServiceImpl.authorizeAny","level":6,"line":38,"fileName":"AuthorizationServiceImpl.java"},{"method":"org.opengroup.osdu.indexer.middleware.AuthorizationFilter.hasPermission","level":7,"line":27,"fileName":"AuthorizationFilter.java"},{"method":"org.opengroup.osdu.indexer.middleware.AuthorizationFilter$$FastClassBySpringCGLIB$$585766ba.invoke","level":8,"line":-1,"fileName":"<generated>"},{"method":"org.springframework.cglib.proxy.MethodProxy.invoke","level":9,"line":218,"fileName":"MethodProxy.java"},{"method":"org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint","level":10,"line":752,"fileName":"CglibAopProxy.java"},{"method":"org.springframework.aop.framework.ReflectiveMethodInvocation.proceed","level":11,"line":163,"fileName":"ReflectiveMethodInvocation.java"},{"method":"org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed","level":12,"line":136,"fileName":"DelegatingIntroductionInterceptor.java"},{"method":"org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke","level":13,"line":124,"fileName":"DelegatingIntroductionInterceptor.java"},{"method":"org.springframework.aop.framework.ReflectiveMe
thodInvocation.proceed","level":14,"line":186,"fileName":"ReflectiveMethodInvocation.java"},{"method":"org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept","level":15,"line":691,"fileName":"CglibAopProxy.java"},{"method":"org.opengroup.osdu.indexer.middleware.AuthorizationFilter$$EnhancerBySpringCGLIB$$f6517c29.hasPermission","level":16,"line":-1,"fileName":"<generated>"},{"method":"sun.reflect.NativeMethodAccessorImpl.invoke","level":18,"line":62,"fileName":"NativeMethodAccessorImpl.java"},{"method":"sun.reflect.DelegatingMethodAccessorImpl.invoke","level":19,"line":43,"fileName":"DelegatingMethodAccessorImpl.java"},{"method":"java.lang.reflect.Method.invoke","level":20,"line":498,"fileName":"Method.java"},{"method":"org.springframework.expression.spel.support.ReflectiveMethodExecutor.execute","level":21,"line":130,"fileName":"ReflectiveMethodExecutor.java"},{"method":"org.springframework.expression.spel.ast.MethodReference.getValueInternal","level":22,"line":138,"fileName":"MethodReference.java"},{"method":"org.springframework.expression.spel.ast.MethodReference.access$000","level":23,"line":54,"fileName":"MethodReference.java"},{"method":"org.springframework.expression.spel.ast.MethodReference$MethodValueRef.getValue","level":24,"line":391,"fileName":"MethodReference.java"},{"method":"org.springframework.expression.spel.ast.CompoundExpression.getValueInternal","level":25,"line":90,"fileName":"CompoundExpression.java"},{"method":"org.springframework.expression.spel.ast.SpelNodeImpl.getTypedValue","level":26,"line":114,"fileName":"SpelNodeImpl.java"},{"method":"org.springframework.expression.spel.standard.SpelExpression.getValue","level":27,"line":308,"fileName":"SpelExpression.java"},{"method":"org.springframework.security.access.expression.ExpressionUtils.evaluateAsBoolean","level":28,"line":26,"fileName":"ExpressionUtils.java"},{"method":"org.springframework.security.access.expression.method.ExpressionBasedPreInvocationAdvice.befo
re","level":29,"line":59,"fileName":"ExpressionBasedPreInvocationAdvice.java"},{"method":"org.springframework.security.access.prepost.PreInvocationAuthorizationAdviceVoter.vote","level":30,"line":72,"fileName":"PreInvocationAuthorizationAdviceVoter.java"},{"method":"org.springframework.security.access.prepost.PreInvocationAuthorizationAdviceVoter.vote","level":31,"line":40,"fileName":"PreInvocationAuthorizationAdviceVoter.java"},{"method":"org.springframework.security.access.vote.AffirmativeBased.decide","level":32,"line":63,"fileName":"AffirmativeBased.java"},{"method":"org.springframework.security.access.intercept.AbstractSecurityInterceptor.beforeInvocation","level":33,"line":233,"fileName":"AbstractSecurityInterceptor.java"},{"method":"org.springframework.security.access.intercept.aopalliance.MethodSecurityInterceptor.invoke","level":34,"line":65,"fileName":"MethodSecurityInterceptor.java"},{"method":"org.springframework.aop.framework.ReflectiveMethodInvocation.proceed","level":35,"line":186,"fileName":"ReflectiveMethodInvocation.java"},{"method":"org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept","level":36,"line":691,"fileName":"CglibAopProxy.java"},{"method":"org.opengroup.osdu.indexer.api.ReindexApi$$EnhancerBySpringCGLIB$$87095a0c.reindex","level":37,"line":-1,"fileName":"<generated>"},{"method":"org.opengroup.osdu.indexer.api.ReindexApi$$FastClassBySpringCGLIB$$8edc5574.invoke","level":38,"line":-1,"fileName":"<generated>"},{"method":"org.springframework.cglib.proxy.MethodProxy.invoke","level":39,"line":218,"fileName":"MethodProxy.java"},{"method":"org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint","level":40,"line":752,"fileName":"CglibAopProxy.java"},{"method":"org.springframework.aop.framework.ReflectiveMethodInvocation.proceed","level":41,"line":163,"fileName":"ReflectiveMethodInvocation.java"},{"method":"org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed","l
evel":42,"line":136,"fileName":"DelegatingIntroductionInterceptor.java"},{"method":"org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke","level":43,"line":124,"fileName":"DelegatingIntroductionInterceptor.java"},{"method":"org.springframework.aop.framework.ReflectiveMethodInvocation.proceed","level":44,"line":186,"fileName":"ReflectiveMethodInvocation.java"},{"method":"org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept","level":45,"line":691,"fileName":"CglibAopProxy.java"},{"method":"org.opengroup.osdu.indexer.api.ReindexApi$$EnhancerBySpringCGLIB$$9b5b5d63.reindex","level":46,"line":-1,"fileName":"<generated>"},{"method":"sun.reflect.NativeMethodAccessorImpl.invoke","level":48,"line":62,"fileName":"NativeMethodAccessorImpl.java"},{"method":"sun.reflect.DelegatingMethodAccessorImpl.invoke","level":49,"line":43,"fileName":"DelegatingMethodAccessorImpl.java"},{"method":"java.lang.reflect.Method.invoke","level":50,"line":498,"fileName":"Method.java"},{"method":"org.springframework.web.method.support.InvocableHandlerMethod.doInvoke","level":51,"line":190,"fileName":"InvocableHandlerMethod.java"},{"method":"org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest","level":52,"line":138,"fileName":"InvocableHandlerMethod.java"},{"method":"org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle","level":53,"line":105,"fileName":"ServletInvocableHandlerMethod.java"},{"method":"org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod","level":54,"line":892,"fileName":"RequestMappingHandlerAdapter.java"},{"method":"org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal","level":55,"line":797,"fileName":"RequestMappingHandlerAdapter.java"},{"method":"org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle","level":56,"line":87,"fileName":"AbstractH
andlerMethodAdapter.java"},{"method":"org.springframework.web.servlet.DispatcherServlet.doDispatch","level":57,"line":1040,"fileName":"DispatcherServlet.java"},{"method":"org.springframework.web.servlet.DispatcherServlet.doService","level":58,"line":943,"fileName":"DispatcherServlet.java"},{"method":"org.springframework.web.servlet.FrameworkServlet.processRequest","level":59,"line":1006,"fileName":"FrameworkServlet.java"},{"method":"org.springframework.web.servlet.FrameworkServlet.doPost","level":60,"line":909,"fileName":"FrameworkServlet.java"},{"method":"javax.servlet.http.HttpServlet.service","level":61,"line":652,"fileName":"HttpServlet.java"},{"method":"org.springframework.web.servlet.FrameworkServlet.service","level":62,"line":883,"fileName":"FrameworkServlet.java"},{"method":"javax.servlet.http.HttpServlet.service","level":63,"line":733,"fileName":"HttpServlet.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.internalDoFilter","level":64,"line":231,"fileName":"ApplicationFilterChain.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.doFilter","level":65,"line":166,"fileName":"ApplicationFilterChain.java"},{"method":"org.apache.tomcat.websocket.server.WsFilter.doFilter","level":66,"line":53,"fileName":"WsFilter.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.internalDoFilter","level":67,"line":193,"fileName":"ApplicationFilterChain.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.doFilter","level":68,"line":166,"fileName":"ApplicationFilterChain.java"},{"method":"org.opengroup.osdu.azure.filters.TransactionLogFilter.doFilter","level":69,"line":67,"fileName":"TransactionLogFilter.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.internalDoFilter","level":70,"line":193,"fileName":"ApplicationFilterChain.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.doFilter","level":71,"line":166,"fileName":"ApplicationFilterChain.java"},{"method":"org.springframework.boot.actua
te.web.trace.servlet.HttpTraceFilter.doFilterInternal","level":72,"line":88,"fileName":"HttpTraceFilter.java"},{"method":"org.springframework.web.filter.OncePerRequestFilter.doFilter","level":73,"line":119,"fileName":"OncePerRequestFilter.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.internalDoFilter","level":74,"line":193,"fileName":"ApplicationFilterChain.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.doFilter","level":75,"line":166,"fileName":"ApplicationFilterChain.java"},{"method":"org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter","level":76,"line":320,"fileName":"FilterChainProxy.java"},{"method":"org.springframework.security.web.access.ExceptionTranslationFilter.doFilter","level":77,"line":119,"fileName":"ExceptionTranslationFilter.java"},{"method":"org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter","level":78,"line":334,"fileName":"FilterChainProxy.java"},{"method":"org.springframework.security.web.session.SessionManagementFilter.doFilter","level":79,"line":137,"fileName":"SessionManagementFilter.java"},{"method":"org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter","level":80,"line":334,"fileName":"FilterChainProxy.java"},{"method":"org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter","level":81,"line":111,"fileName":"AnonymousAuthenticationFilter.java"},{"method":"org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter","level":82,"line":334,"fileName":"FilterChainProxy.java"},{"method":"org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter","level":83,"line":170,"fileName":"SecurityContextHolderAwareRequestFilter.java"},{"method":"org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter","level":84,"line":334,"fileName":"FilterChainProxy.java"},{"method":"org.springframework.security.web.savedrequest.RequestCacheAwareFil
ter.doFilter","level":85,"line":63,"fileName":"RequestCacheAwareFilter.java"},{"method":"org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter","level":86,"line":334,"fileName":"FilterChainProxy.java"},{"method":"org.springframework.security.web.authentication.logout.LogoutFilter.doFilter","level":87,"line":116,"fileName":"LogoutFilter.java"},{"method":"org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter","level":88,"line":334,"fileName":"FilterChainProxy.java"},{"method":"org.springframework.security.web.header.HeaderWriterFilter.doFilterInternal","level":89,"line":74,"fileName":"HeaderWriterFilter.java"},{"method":"org.springframework.web.filter.OncePerRequestFilter.doFilter","level":90,"line":119,"fileName":"OncePerRequestFilter.java"},{"method":"org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter","level":91,"line":334,"fileName":"FilterChainProxy.java"},{"method":"org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter","level":92,"line":105,"fileName":"SecurityContextPersistenceFilter.java"},{"method":"org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter","level":93,"line":334,"fileName":"FilterChainProxy.java"},{"method":"org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter.doFilterInternal","level":94,"line":56,"fileName":"WebAsyncManagerIntegrationFilter.java"},{"method":"org.springframework.web.filter.OncePerRequestFilter.doFilter","level":95,"line":119,"fileName":"OncePerRequestFilter.java"},{"method":"org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter","level":96,"line":334,"fileName":"FilterChainProxy.java"},{"method":"org.springframework.security.web.FilterChainProxy.doFilterInternal","level":97,"line":215,"fileName":"FilterChainProxy.java"},{"method":"org.springframework.security.web.FilterChainProxy.doFilter","level":98,"line":178,"fileName":"FilterChainProxy.java
"},{"method":"org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate","level":99,"line":358,"fileName":"DelegatingFilterProxy.java"},{"method":"org.springframework.web.filter.DelegatingFilterProxy.doFilter","level":100,"line":271,"fileName":"DelegatingFilterProxy.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.internalDoFilter","level":101,"line":193,"fileName":"ApplicationFilterChain.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.doFilter","level":102,"line":166,"fileName":"ApplicationFilterChain.java"},{"method":"org.opengroup.osdu.azure.filters.Slf4jMDCFilter.doFilter","level":103,"line":48,"fileName":"Slf4jMDCFilter.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.internalDoFilter","level":104,"line":193,"fileName":"ApplicationFilterChain.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.doFilter","level":105,"line":166,"fileName":"ApplicationFilterChain.java"},{"method":"org.springframework.web.filter.RequestContextFilter.doFilterInternal","level":106,"line":100,"fileName":"RequestContextFilter.java"},{"method":"org.springframework.web.filter.OncePerRequestFilter.doFilter","level":107,"line":119,"fileName":"OncePerRequestFilter.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.internalDoFilter","level":108,"line":193,"fileName":"ApplicationFilterChain.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.doFilter","level":109,"line":166,"fileName":"ApplicationFilterChain.java"},{"method":"org.springframework.web.filter.FormContentFilter.doFilterInternal","level":110,"line":93,"fileName":"FormContentFilter.java"},{"method":"org.springframework.web.filter.OncePerRequestFilter.doFilter","level":111,"line":119,"fileName":"OncePerRequestFilter.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.internalDoFilter","level":112,"line":193,"fileName":"ApplicationFilterChain.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.doFilter"
,"level":113,"line":166,"fileName":"ApplicationFilterChain.java"},{"method":"org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal","level":114,"line":94,"fileName":"HiddenHttpMethodFilter.java"},{"method":"org.springframework.web.filter.OncePerRequestFilter.doFilter","level":115,"line":119,"fileName":"OncePerRequestFilter.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.internalDoFilter","level":116,"line":193,"fileName":"ApplicationFilterChain.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.doFilter","level":117,"line":166,"fileName":"ApplicationFilterChain.java"},{"method":"org.springframework.boot.actuate.metrics.web.servlet.WebMvcMetricsFilter.filterAndRecordMetrics","level":118,"line":114,"fileName":"WebMvcMetricsFilter.java"},{"method":"org.springframework.boot.actuate.metrics.web.servlet.WebMvcMetricsFilter.doFilterInternal","level":119,"line":104,"fileName":"WebMvcMetricsFilter.java"},{"method":"org.springframework.web.filter.OncePerRequestFilter.doFilter","level":120,"line":119,"fileName":"OncePerRequestFilter.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.internalDoFilter","level":121,"line":193,"fileName":"ApplicationFilterChain.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.doFilter","level":122,"line":166,"fileName":"ApplicationFilterChain.java"},{"method":"org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal","level":123,"line":201,"fileName":"CharacterEncodingFilter.java"},{"method":"org.springframework.web.filter.OncePerRequestFilter.doFilter","level":124,"line":119,"fileName":"OncePerRequestFilter.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.internalDoFilter","level":125,"line":193,"fileName":"ApplicationFilterChain.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.doFilter","level":126,"line":166,"fileName":"ApplicationFilterChain.java"},{"method":"com.microsoft.applicationinsights.web.internal.WebReques
tTrackingFilter.doFilter","level":127,"line":143,"fileName":"WebRequestTrackingFilter.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.internalDoFilter","level":128,"line":193,"fileName":"ApplicationFilterChain.java"},{"method":"org.apache.catalina.core.ApplicationFilterChain.doFilter","level":129,"line":166,"fileName":"ApplicationFilterChain.java"},{"method":"org.apache.catalina.core.StandardWrapperValve.invoke","level":130,"line":202,"fileName":"StandardWrapperValve.java"},{"method":"org.apache.catalina.core.StandardContextValve.invoke","level":131,"line":96,"fileName":"StandardContextValve.java"},{"method":"org.apache.catalina.authenticator.AuthenticatorBase.invoke","level":132,"line":541,"fileName":"AuthenticatorBase.java"},{"method":"org.apache.catalina.core.StandardHostValve.invoke","level":133,"line":139,"fileName":"StandardHostValve.java"},{"method":"org.apache.catalina.valves.ErrorReportValve.invoke","level":134,"line":92,"fileName":"ErrorReportValve.java"},{"method":"org.apache.catalina.core.StandardEngineValve.invoke","level":135,"line":74,"fileName":"StandardEngineValve.java"},{"method":"org.apache.catalina.connector.CoyoteAdapter.service","level":136,"line":343,"fileName":"CoyoteAdapter.java"},{"method":"org.apache.coyote.http11.Http11Processor.service","level":137,"line":373,"fileName":"Http11Processor.java"},{"method":"org.apache.coyote.AbstractProcessorLight.process","level":138,"line":65,"fileName":"AbstractProcessorLight.java"},{"method":"org.apache.coyote.AbstractProtocol$ConnectionHandler.process","level":139,"line":868,"fileName":"AbstractProtocol.java"},{"method":"org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun","level":140,"line":1589,"fileName":"NioEndpoint.java"},{"method":"org.apache.tomcat.util.net.SocketProcessorBase.run","level":141,"line":49,"fileName":"SocketProcessorBase.java"},{"method":"java.util.concurrent.ThreadPoolExecutor.runWorker","level":142,"line":1149,"fileName":"ThreadPoolExecutor.java"},{"met
hod":"java.util.concurrent.ThreadPoolExecutor$Worker.run","level":143,"line":624,"fileName":"ThreadPoolExecutor.java"},{"method":"org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run","level":144,"line":61,"fileName":"TaskThread.java"},{"method":"java.lang.Thread.run","level":145,"line":748,"fileName":"Thread.java"}],"outerId":"0","message":"java.lang.NullPointerException","type":"java.lang.NullPointerException","id":"1155353803"}]

# Indexer fails to create/index when deploying with a new ElasticSearch instance
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/15
Author: Matt Wise · Updated: 2021-05-20

I stood up a fresh Elasticsearch 7, and this is when the problems started. It seems the reason the integration tests are passing on GitLab is that if schemas are already registered in ES, new indexes can be created with no issue. However, on a fresh infrastructure deployment, we see the following issue:
Shortened Stack Trace:
```
2021-03-10 16:48:45.357 ERROR 19 --- [-nio-443-exec-6] o.o.o.c.common.logging.DefaultLogWriter : indexer.app: Elasticsearch exception [type=mapper_parsing_exception, reason=Failed to parse mapping [_doc]: No handler for type [flattened] declared on field [tags]] AppException(error=AppError(code=500, reason=Unknown error, message=An unknown error has occurred., errors=null, debuggingInfo=null, originalException=ElasticsearchStatusException[Elasticsearch exception [type=mapper_parsing_exception, reason=Failed to parse mapping [_doc]: No handler for type [flattened] declared on field [tags]]]; nested: ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=No handler for type [flattened] declared on field [tags]]];), originalException=ElasticsearchStatusException[Elasticsearch exception [type=mapper_parsing_exception, reason=Failed to parse mapping [_doc]: No handler for type [flattened] declared on field [tags]]]; nested: ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=No handler for type [flattened] declared on field [tags]]];) at org.opengroup.osdu.indexer.service.IndexerServiceImpl.processRecordChangedMessages(IndexerServiceImpl.java:153) at org.opengroup.osdu.indexer.api.RecordIndexerApi.indexWorker(RecordIndexerApi.java:79)
```
ethiraj krishnamanaiduWladmir FrazaoJoeDmitriy RudkoJasonethiraj krishnamanaiduhttps://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/17Indexer to support new endpoint to expose Indexing Schema converted from Sche...2022-04-06T14:07:58ZGary MurphyIndexer to support new endpoint to expose Indexing Schema converted from Schema Service**Summary**
The Indexer service needs to be modified so that it exposes the output of Schema Service schema conversion into Storage Schema schemas. This is needed for developers as well as operators of the system in order to check quality and expected results in Search.
**Details**
The Indexer has been modified to support schemas from the Schema Service as well as the original Storage Schemas which are still the basis for actual indexing by Elastic. Here is the issue:
[Indexer to Support Schema Service schemas](https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/7)
However, from a support perspective, the fact that Schema Service schemas are converted internally by the Indexer to produce actionable Storage Schemas is problematic: it gives developers and consumers no insight into the converted schema, or into how or whether the conversion succeeded.
The proposed solution is to add a new endpoint to the Indexer service that will return the converted Schema representation based on the input Schema Service schema. This will make the schema conversion process transparent and allow checks on the content as well as allow better debugging of either Schema Service schemas or the conversion process itself.
<br>Input = Schema Service Schema (JSON representation)
Output = Index Schema
<br>Note that the Input Schema can be from any source (i.e. not necessarily resident in Schema Service already) and the operation will be stateless with respect to the Schema and Storage Schema Services themselves.https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/18Indexer error handling improvement2021-04-30T16:40:14Zethiraj krishnamanaiduIndexer error handling improvementImproves handling of erroneous schemas: currently the indexer does not index any record if a Schema service schema cannot be parsed correctly. At a very minimum, the Indexer must index the meta-attributes when indexing fails during schema processing.
* Try to get as many errors as possible from the schema converter
* Return null if schema has an error
Here are some of the common mistakes users make in Schema service schemas:
- missing `type` attributes -- `type` is a required attribute for the indexer
- `$ref` cannot be resolved for various reasons; here are some examples where indexer schema processing will fail:
- "$ref":"slb:wks:toManyRelationship:1.0.0" -- missing definitions
- "$ref":"#/definitions/relationships" -- where 'relationships' does not exist in 'definitions'.
- "$ref":"#/definitions/Relationships" -- the 'definitions' section has 'relationships' (all lower case).
- "$ref":"#/definitions/relationsh" -- the 'definitions' section has 'relationships' (spelling mistake).
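The error-handling goals above (collect as many converter errors as possible, return null when the schema is bad) can be sketched as follows. This is an illustrative stand-in for the schema converter, not the actual OSDU code; all names are hypothetical.

```python
# Illustrative $ref checker: walk the schema, collect every unresolvable
# reference (missing definitions, case mismatches, typos), and return
# None for the schema when any error was found.
def resolve_refs(schema, definitions):
    errors = []

    def walk(node):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if ref is not None:
                if not ref.startswith("#/definitions/"):
                    errors.append("unresolvable $ref (missing definitions): " + ref)
                elif ref.rsplit("/", 1)[-1] not in definitions:
                    errors.append("missing or misspelled definition: " + ref)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(schema)
    return (None if errors else schema), errors
```

With a `definitions` section containing only `relationships`, each of the broken `$ref` values listed above is reported, rather than aborting on the first one.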
Errors are available in provider error logs as well as indexed documents. Users can search those via this [query](https://community.opengroup.org/osdu/platform/system/indexer-service/-/blob/master/docs/tutorial/IndexerService.md#get-indexing-status)Neelesh ThakurNeelesh Thakurhttps://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/21Upgrade Core Common Dependency2022-02-11T22:01:33ZDavid Diederichd.diederich@opengroup.orgUpgrade Core Common Dependencyhttps://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/22Upgrade Core AWS Dependency2022-02-11T22:01:30ZDavid Diederichd.diederich@opengroup.orgUpgrade Core AWS Dependencyhttps://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/23Upgrade Core Azure Dependency2022-02-11T22:01:28ZDavid Diederichd.diederich@opengroup.orgUpgrade Core Azure Dependencyhttps://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/24Upgrade Core GCP Dependency2022-02-11T22:01:25ZDavid Diederichd.diederich@opengroup.orgUpgrade Core GCP Dependencyhttps://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/25Upgrade Core IBM Dependency2022-02-11T22:01:22ZDavid Diederichd.diederich@opengroup.orgUpgrade Core IBM Dependencyhttps://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/26While using indexer API to re-index the data there seems to be some sort of c...2022-11-09T17:03:13ZKamlesh TodaiWhile using indexer API to re-index the data there seems to be some sort of corruption occurring thereby requiring to reload the data.While working on testing the nested search, I was not able to search the nested data. At which point I was told to verify that records were reindexed, and mappings at elastic are up to date. While using the following command e.g. to rein...While working on testing the nested search, I was not able to search the nested data. 
At that point I was told to verify that the records were reindexed and that the mappings in Elasticsearch were up to date, using the following command (e.g. to reindex WellLog data):
curl --location --request POST 'https://4iqp6vd659.execute-api.us-west-2.amazonaws.com/api/indexer/v2/reindex?force_clean=true' \
--header 'data-partition-id: osdu' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer eyJraWQiOiIwekkwRHB5RzBZK3pUUTJpVlZ...' \
--data-raw '{
"kind":"osdu:wks:work-product-component--WellLog:1.0.0"
}'
The response code: 200 OK
But I was still not able to access the nested data. Upon complaining to CSPs, I was told that data had to be reloaded as it got messed up when I made the reindex request.
This kind of behavior was seen in multiple CSP environments (AWS, IBM, Azure).
I understand that in the normal workflow when everything is set correctly one should not have to re-index. Reindexing is not part of the normal workflow performed by users. But when it is required to be performed, it should work correctly, and more importantly, it should not corrupt/mess up the existing data.
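One way to make the verification step mentioned above concrete: compare the field paths a record actually contains against the field paths present in the index mapping, so a reindex that left stale or default mappings is detected instead of trusting the 200 OK. A hypothetical helper sketch (`missing_fields` and its inputs are illustrative, not an OSDU or Elasticsearch API):

```python
# Recursively collect record field paths that have no entry in the
# Elasticsearch mapping's "properties" tree. A non-empty result means
# the mapping is out of sync with the data (e.g. after a bad reindex).
def missing_fields(record_data, mapping_properties, prefix=""):
    missing = []
    for name, value in record_data.items():
        path = prefix + name
        mapped = mapping_properties.get(name)
        if mapped is None:
            missing.append(path)
        elif isinstance(value, dict):
            missing += missing_fields(value, mapped.get("properties", {}), path + ".")
    return missing
```

For a WellLog record with nested curves but a mapping that never picked them up, the helper would report the unmapped nested paths instead of silently returning nothing.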
@ChrisZhang @ethiraj @ashams_s @Wibben @Kateryna_Kurach @meenarathinavel @manishk @wladmirf @anujgupta @ankitsharmahttps://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/27Indexer service does not index actual schema parsing errors2021-10-25T21:04:33ZAn NgoIndexer service does not index actual schema parsing errorsIf schema parsing fails with an exception, the Indexer service indexes a generic 'schema not found' instead of the actual schema parsing errors
**Example:**
Payload:
```
{
"kind": "slb-osdu-dev-des-prod-testing:oracle:prosource-wellbore:3.0.0",
"returnedFields": ["id", "index"]
}
```
Response:
```
{
"results": [
{
"index": {
"trace": [
"schema not found"
],
"statusCode": 404,
"lastUpdateTime": "2021-06-24T14:23:50.596Z"
},
"id": "slb-osdu-dev-sis-internal-hq:wks:seismicSurvey3d-Ukd1cHRhNy1UZXN0MQ"
}
],
"totalCount": 1
}
```
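A sketch of what this issue asks for: carry the converter's real message into the indexed status document instead of the generic 'schema not found' trace shown in the response above. The function name and payload shape are illustrative assumptions, not the actual Indexer implementation.

```python
# Build an index-status entry that surfaces the actual schema processing
# error (when present) rather than a generic placeholder.
def build_index_status(record_id, error=None):
    return {
        "id": record_id,
        "index": {
            "statusCode": 404 if error else 200,
            "trace": [str(error)] if error else ["successfully indexed"],
        },
    }
```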
In the logs the actual error message was found: `org.opengroup.osdu.indexer.schema.converter.exeption.SchemaProcessingException: Errors occurred during parsing the schema, kind: osdu:osdu:Wellbore:1.0.0 | errors: Wrong definition format:AbstractSpatialLocation:1.0.0`https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/28Indexer fails to index when items is missing for array attribute2021-10-25T21:04:57ZNeelesh ThakurIndexer fails to index when items is missing for array attribute**Steps to reproduce:**
Create a schema where array attribute has items missing.
Example:
```
"Wellbores": {
"pattern": ".*U1A1NjA3MDM2Mzk5MzUy:.*",
"description": "The Well ID reference.",
"x-osdu-relationship": [
{
"EntityType": "Wellbore",
"GroupType": "master-data"
}
],
"type": "array"
}
```
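The expected lenient behavior can be sketched as a pre-indexing partition step: array attributes with no `items` (like `Wellbores` above) are skipped and reported, while the remaining attributes are still indexed. Illustrative only; not the actual indexer code.

```python
# Split schema properties into indexable attributes and skipped ones,
# instead of rejecting the whole record when one attribute is malformed.
def partition_attributes(properties):
    indexable, skipped = {}, []
    for name, attr in properties.items():
        if attr.get("type") == "array" and "items" not in attr:
            skipped.append(name)  # e.g. the "Wellbores" attribute above
        else:
            indexable[name] = attr
    return indexable, skipped
```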
**What I expected to happen, and what actually happened:**
**Expected** - correct attributes in the schema should be indexed, and a message should be logged for the attribute not indexed due to the missing `items` in the schema.
**Actual** - the indexer completely rejected the record.https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/29Indexer service fails to index with nested geopoint2021-10-25T21:05:17ZNeelesh ThakurIndexer service fails to index with nested geopoint
Indexing entity: opendes-testing:methodrun:UHl0aG9uOiBsaW1pdCBwcm9wZXJ0aWVzL3NsYi1vc2R1LWRldi1kZXMtcHJvZC10ZXN0aW5nOmxvZ1NldDo3YzRkZGMwMC1jOTRhLTU3OTUtODlhNi0yYWJmYmNiNmQxYTQvQUxMLzUwMC4wL1pPTkFUSU9OX0FMTC8yMDE1LTAyLTA1VDAyOjIxOjU5
Kind of entity: opendes:log:methodrun:1.0.0
Expecting the entity to be indexed correctly.
Current behavior: Entity is not indexed.
e.g. the following search query shows no results:
```
{
"kind": "opendes:log:methodrun:1.0.0",
"limit": 100,
"query": "data.project_name:\"new_2\"",
"returnedFields": ["id", "data.project_name"],
"cursor": ""
}
```
would expect this entity to be shown:
```
{
"data": {
.....
"project_name": "new_2",
.....
}
"id": "opendes- testing:methodrun:UHl0aG9uOiBsaW1pdCBwcm9wZXJ0aWVzL3NsYi1vc2R1LWRldi1kZXMtcHJvZC10ZXN0aW5nOmxvZ1NldDo3YzRkZGMwMC1jOTRhLTU3OTUtODlhNi0yYWJmYmNiNmQxYTQvQUxMLzUwMC4wL1pPTkFUSU9OX0FMTC8yMDE1LTAyLTA1VDAyOjIxOjU5",
"version": 1628005603319004,
"kind": "opendes:log:methodrun:1.0.0",
.....
}
```https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/30FoR (Frame-of-Reference): FoR value should be passed as query parameter in AP...2021-08-24T13:02:13ZRitika KaushalFoR (Frame-of-Reference): FoR value should be passed as query parameter in API units=SI&crs=wgs84&elevation=msl&azimuth=true north&dates=utc; or as header in requestWith reference to discussion on #55 regarding FoR to be passed as header or query parameter?With reference to discussion on #55 regarding FoR to be passed as header or query parameter?https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/31audit-attributes are not queryable on existing kinds2021-10-25T20:53:46ZNeelesh Thakuraudit-attributes are not queryable on existing kindsElasticsearch allows unmapped field to be included in search service response. This creates issue for newly added audit-attributes on pre-existing indexes. As new record start to come in for pre-existing kinds, these unmapped attributes ...Elasticsearch allows unmapped field to be included in search service response. This creates issue for newly added audit-attributes on pre-existing indexes. As new record start to come in for pre-existing kinds, these unmapped attributes will get populated on record index and returned on Search service response but users cannot query on these attributes. We can fix this problem with re-index force_clean=true option but this will introduce down-time. Indexer should sync meta attribute mapping when it starts to process records for existing kinds.
This issue won't impact new kinds. Elasticsearch addresses this issue but it's not available in current version.https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/32Question - How to troubleshoot normalizer if indexed data is falling short of...2021-10-25T14:55:33ZDebasis ChatterjeeQuestion - How to troubleshoot normalizer if indexed data is falling short of expectation?In my test case, I am trying to convert unit (ft to meter). Provided suitable information in meta portion but resultant data in Elasticsearch does not show converted values. I suspect there is some issue in the way I provided "meta" info...In my test case, I am trying to convert unit (ft to meter). Provided suitable information in meta portion but resultant data in Elasticsearch does not show converted values. I suspect there is some issue in the way I provided "meta" information. So, I want to check in relevant place to troubleshoot. Please provide some pointers.https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/33Simplify way to provide "meta" information (Ex: Unit of Measure) for Normalizer2022-11-24T11:28:06ZDebasis ChatterjeeSimplify way to provide "meta" information (Ex: Unit of Measure) for NormalizerI am showing sample from Postman collection provided by "CSV Ingestion" Dev team. @frubio
My question - should we not offer an option to simply use a suitable entry in the UnitOfMeasure Reference entity?
Assuming, of course, that entry for "ft" exists with conversion to SI unit (meter for length).
That would spare the complexity here and avoid human errors.
Note to @ChrisZhang - we can discuss details offline. Thank you
```
{
"kind": "Unit",
"name": "ft",
"persistableReference": "{\"scaleOffset\":{\"scale\":0.3048,\"offset\":0.0},\"symbol\":\"ft\",\"baseMeasurement\":{\"ancestry\":\"Length\",\"type\":\"UM\"},\"type\":\"USO\"}",
"propertyNames": [
"MD",
"TVD",
"ELEVATION"
],
"propertyValues": [
"ft"
],
"uncertainty": 0
}
```https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/34Reindex API not working2022-06-22T16:33:43ZNeelesh ThakurReindex API not workingThe Reindex API fails for most requests with the following error message:
```json
{
"code": 415,
"reason": "Unsupported media type",
"message": "upstream server responded with unsupported media type: text/html"
}
```
This happens on larger data sets (> 100K records) for most requests, and reindexing never finishes.
The Reindex API uses a different Storage API to retrieve records and, per the log message, if it encounters a non-JSON response from the storage service, reindexing stops.
Another insight: the reindex API uses the Storage get-records-by-kind API with a cursor. The cursor returned by this API is at times too long; when we make a request to this API using such a cursor (tried with the Postman REST client), we get a non-JSON 400 response.
```html
<!doctype html>
<html lang="en">
<head>
<title>HTTP Status 400 – Bad Request</title>
<style type="text/css">
body {
font-family: Tahoma, Arial, sans-serif;
}
h1,
h2,
h3,
b {
color: white;
background-color: #525D76;
}
h1 {
font-size: 22px;
}
h2 {
font-size: 16px;
}
h3 {
font-size: 14px;
}
p {
font-size: 12px;
}
a {
color: black;
}
.line {
height: 1px;
background-color: #525D76;
border: none;
}
</style>
</head>
<body>
<h1>HTTP Status 400 – Bad Request</h1>
</body>
</html>
```M11 - Release 0.14https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/35To be useful as intended, data.ExtensionProperties needs a schema extension2021-11-03T14:48:15ZGary MurphyTo be useful as intended, data.ExtensionProperties needs a schema extensionIf the data.ExtensionProperties block is to be searchable, it needs a schema definition, and due to the nature of ExtensionProperties (i.e. extensible information on the data block that is not known at the time of the record schema creat...If the data.ExtensionProperties block is to be searchable, it needs a schema definition, and due to the nature of ExtensionProperties (i.e. extensible information on the data block that is not known at the time of the record schema creation and assignment), that schema definition can't require updating a (possibly) locked schema definition or creating a new schema for each change.
<br>
It may be the case that proposed additions to the Schema Service and definition around virtual properties and schema extensions will handle this case.https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/36Azure implementation not executed for reindex and worker tasks2021-10-01T17:31:47ZVibhuti Sharma [Microsoft]Azure implementation not executed for reindex and worker tasksDue to the methods not being overridden, the IndexerQueueTaskBuilderAzure is not getting executed. Instead, the core functionality itself is being executed for reindex task creation as well as worker task creation.Due to the methods not being overridden, the IndexerQueueTaskBuilderAzure is not getting executed. Instead, the core functionality itself is being executed for reindex task creation as well as worker task creation.Vibhuti Sharma [Microsoft]Vibhuti Sharma [Microsoft]https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/37Indexer returns only LAST geometry for "type": "AnyCrsGeometryCollection"2021-10-25T20:52:46ZAn NgoIndexer returns only LAST geometry for "type": "AnyCrsGeometryCollection"Indexer not able to return all the converted geometries for "type": "AnyCrsGeometryCollection", it returns only the last geometry type.
Sample record:
```
"AsIngestedCoordinates": {
"features": [
{
"geometry": {
"type": "AnyCrsGeometryCollection",
"bbox": null,
"geometries": [
{
"type": "Point",
"bbox": null,
"coordinates": [
500000.0,
7000000.0
]
},
{
"type": "LineString",
"bbox": null,
"coordinates": [
[
501000.0,
7001000.0
],
[
502000.0,
7002000.0
]
]
}
]
},
"bbox": null,
"properties": {},
"type": "AnyCrsFeature"
}
],
"bbox": null,
"properties": {},
"persistableReferenceCrs": "{\"lateBoundCRS\":{\"wkt\":\"PROJCS[\\\"ED_1950_UTM_Zone_32N\\\",GEOGCS[\\\"GCS_European_1950\\\",DATUM[\\\"D_European_1950\\\",SPHEROID[\\\"International_1924\\\",6378388.0,297.0]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433]],PROJECTION[\\\"Transverse_Mercator\\\"],PARAMETER[\\\"False_Easting\\\",500000.0],PARAMETER[\\\"False_Northing\\\",0.0],PARAMETER[\\\"Central_Meridian\\\",9.0],PARAMETER[\\\"Scale_Factor\\\",0.9996],PARAMETER[\\\"Latitude_Of_Origin\\\",0.0],UNIT[\\\"Meter\\\",1.0],AUTHORITY[\\\"EPSG\\\",23032]]\",\"ver\":\"PE_10_3_1\",\"name\":\"ED_1950_UTM_Zone_32N\",\"authCode\":{\"auth\":\"EPSG\",\"code\":\"23032\"},\"type\":\"LBC\"},\"singleCT\":{\"wkt\":\"GEOGTRAN[\\\"ED_1950_To_WGS_1984_23\\\",GEOGCS[\\\"GCS_European_1950\\\",DATUM[\\\"D_European_1950\\\",SPHEROID[\\\"International_1924\\\",6378388.0,297.0]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433]],GEOGCS[\\\"GCS_WGS_1984\\\",DATUM[\\\"D_WGS_1984\\\",SPHEROID[\\\"WGS_1984\\\",6378137.0,298.257223563]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433]],METHOD[\\\"Position_Vector\\\"],PARAMETER[\\\"X_Axis_Translation\\\",-116.641],PARAMETER[\\\"Y_Axis_Translation\\\",-56.931],PARAMETER[\\\"Z_Axis_Translation\\\",-110.559],PARAMETER[\\\"X_Axis_Rotation\\\",0.893],PARAMETER[\\\"Y_Axis_Rotation\\\",0.921],PARAMETER[\\\"Z_Axis_Rotation\\\",-0.917],PARAMETER[\\\"Scale_Difference\\\",-3.52],AUTHORITY[\\\"EPSG\\\",1612]]\",\"ver\":\"PE_10_3_1\",\"name\":\"ED_1950_To_WGS_1984_23\",\"authCode\":{\"auth\":\"EPSG\",\"code\":\"1612\"},\"type\":\"ST\"},\"ver\":\"PE_10_3_1\",\"name\":\"ED50 * EPSG-Nor N62 2001 / UTM zone 32N [23032,1612]\",\"authCode\":{\"auth\":\"SLB\",\"code\":\"23032023\"},\"type\":\"EBC\"}",
"persistableReferenceUnitZ": "{\"baseMeasurement\":{\"ancestry\":\"Length\",\"type\":\"UM\"},\"scaleOffset\":{\"offset\":0.0,\"scale\":0.3048},\"symbol\":\"ft\",\"type\":\"USO\"}",
"type": "AnyCrsFeatureCollection"
}
```M9 - Release 0.12https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/38Indexer not able to handle three coordinates2021-10-25T20:52:20ZAn NgoIndexer not able to handle three coordinatesSample record:
```
"data": {
"SpatialLocation": {
"AsIngestedCoordinates": {
"features": [
{
"geometry": {
"coordinates": [
313405.9477893702,
6544797.620047403,
6.561679790026246
],
"bbox": null,
"type": "AnyCrsPoint"
},
"bbox": null,
"properties": {},
"type": "AnyCrsFeature"
}
],
"bbox": null,
"properties": {},
"persistableReferenceCrs": "reference",
"persistableReferenceUnitZ": "reference",
"type": "AnyCrsFeatureCollection"
}, "Wgs84Corrdinates": {
"type": "FeatureCollection",
"bbox": null,
"features": [
{
"type": "Feature",
"bbox": null,
"geometry": {
"type": "Point",
"bbox": null,
"coordinates": [
5.7500000010406245,
59.000000000399105,
1.9999999999999998
]
},
"properties": {}
}
],
"properties": {},
"persistableReferenceCrs": null,
"persistableReferenceUnitZ": "reference"
}
"msg": "testing record 2",
"X": 16.00,
"Y": 10.00,
"Z": 0
}
}
```M9 - Release 0.12https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/41Consume schema service events2022-08-26T11:46:19ZSanjeev-SLBConsume schema service events**Summary**
Update the Indexer Service to listen to Schema service notifications (create/update).<br/>
**Details**
The Indexer service currently does not listen to Schema change events that can be published by Schema Service. As part of a roadmap to automatically re-index records for extensibility (e.g. virtual properties added to schemas to indicate authoritative surface location) and discoverability, the Indexer needs to receive and process those events. <br/>
Note: the Schema Service publication of proper events is not covered by this issue. <br/>
**For Reference** the ADR highlighting the proposed extensibility mechanism for Schemas is here: https://community.opengroup.org/osdu/platform/system/search-service/-/issues/69<br/><br/>
**Typical Use Case**
User has ingested 100k records of a new wellbore type client:new:wellbore:1.0.0 and they have been indexed successfully.<br/>
1. A consuming application of those 100k records realizes that the new wellbore schema uses a different property for the default surface location which makes cross-kind searching difficult.
2. The owners of the consuming application add a virtual property to the new wellbore schema to specify the default location in line with other kinds. `{
"x-osdu-Virtual-properties":{
"data.VirtualProperties.DefaultLocation": {
"type": "object",
"priority": [
{ "path": "data.ProjectedBottomHoleLocation" },
{ "path": "data.GeographicBottomHoleLocation" },
{ "path": "data.SpatialLocation" }
]}
}
}
`
3. Once the schema extension has been added to the new wellbore schema, an event will be fired by Schema Service.
4. The Schema Service event indicating update of the new wellbore schema needs to be picked up by the Indexer so that the new DefaultLocation will be indexed by that property name and cross-kind discovery is enabled.Sanjeev-SLBSanjeev-SLBhttps://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/43Records of a new Kind can be unsearchable due to race condition2022-08-23T13:30:00ZNitin-slbRecords of a new Kind can be unsearchable due to race conditionIf an Elasticsearch index (with proper schema definition and mapping) is not created ahead of record ingestion via Storage, Elasticsearch creates a default index mapping when Indexer processes the first records with that schema. Search s...If an Elasticsearch index (with proper schema definition and mapping) is not created ahead of record ingestion via Storage, Elasticsearch creates a default index mapping when Indexer processes the first records with that schema. Search service does not work with this default mapping.
Creating an index with proper mapping and making a shard ready typically takes a few seconds and an issue has been noticed when multiple Indexer service instances try to index a new kind. One instance will try to create the index, while another instance will see the index as created and start indexing with default mapping. This makes the kind/entity unsearchable.<br/><br/>
Simple (and common) scenario:
- Ingestion job created that uses a new kind for the incoming records
- Ingestion job starts using multiple threads.
- When the new kind on the incoming records is encountered by the first indexer thread, it needs to be created (the index), and index creation starts
- In the few seconds the first indexer thread is creating the "real" index, other threads process N records (likely 1.5 * number of seconds for index creation + # of threads) using the default mapping
- The N records created using the default mapping are unusable.M11 - Release 0.14https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/46Rounding Error of Unit Conversion2022-10-06T20:27:22ZGregRounding Error of Unit ConversionRounding error of unit conversion, e.g. 10200 ms converted to 10.200000000000001 s for 'RecordLength' in the above example.
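The artifact (10200 ms becoming 10.200000000000001 s) is classic binary floating point: a scale factor such as 0.001 is not exactly representable as a double. One possible remedy, sketched with Python's decimal module; the scale/offset shape mirrors the persistableReference idea, but this is not the actual normalizer code.

```python
from decimal import Decimal

# Convert using exact decimal arithmetic instead of double multiplication,
# so 10200 * 0.001 yields a value that compares equal to exactly 10.2.
def convert(value, scale, offset="0"):
    return Decimal(str(value)) * Decimal(str(scale)) + Decimal(str(offset))
```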
It returns inaccurate results from numerical query due to the inaccurate UoM data.Chris ZhangChris Zhang2022-01-14https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/47Provide validation of schema and avoid surprise from indexed data (lacking fi...2021-12-08T01:37:14ZDebasis ChatterjeeProvide validation of schema and avoid surprise from indexed data (lacking fields)Because of gap in schema definition script, this is one case of user experience.
We can successfully populate record.
Retrieve data by using Storage service.
But Search service (query) misses many fields.
Even if we do usual troubleshooting (Storage - Get - id and index), there is no obvious clue anywhere about the failure.
See related issue
https://community.opengroup.org/osdu/platform/data-flow/ingestion/csv-parser/csv-parser/-/issues/64
Hence we propose adding strict validation to avoid future surprises.
cc - @nthakur for informationhttps://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/48Log4J Expedient Updates and Patches2021-12-17T17:28:06ZDavid Diederichd.diederich@opengroup.orgLog4J Expedient Updates and PatchesThis issue associates MRs that were applied to this project quickly to get a patched version ready as soon as possible. The intent is to provide a reference point for later, more thoughtful, analysis.This issue associates MRs that were applied to this project quickly to get a patched version ready as soon as possible. The intent is to provide a reference point for later, more thoughtful, analysis.David Diederichd.diederich@opengroup.orgDavid Diederichd.diederich@opengroup.orghttps://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/49Upgrade to Log4J 2.172021-12-21T02:22:29ZDavid Diederichd.diederich@opengroup.orgUpgrade to Log4J 2.17The Apache Foundation released another Log4j2 update, version 2.17, which address a denial of service vulnerability.
The Apache Foundation released another Log4j2 update, version 2.17, which addresses a denial of service vulnerability.
This issue tracks progress to upgrade this dependency for this project.https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/53Handle path prefix in Elasticsearch url2022-07-06T04:21:54ZVibhuti Sharma [Microsoft]Handle path prefix in Elasticsearch url# Context
Indexer and Search connection to Elasticsearch fails if the ElasticSearch Url contains a path after host. This is the value specified in the "**ELASTIC_HOST**" environment variable for the ITs, and "**elastic-username**" value from partition info for the Services.
There is no provision to specify the path.
**Example**: **host.com** works but **host.com/elasticsearch** leads to failure.
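The proposed fix amounts to parsing a single URL into the pieces the rest client builder needs, including the path prefix. A sketch, assuming HTTPS (port 443) when no scheme is given and port 9200 for plain HTTP; `parse_elastic_url` is a hypothetical name, not the service's actual code:

```python
from urllib.parse import urlparse

# Derive scheme, host, port, and path prefix from one URL value, so
# "host.com/elasticsearch" works the same as a bare "host.com".
def parse_elastic_url(url):
    if "://" not in url:
        url = "https://" + url  # assumption: default to TLS when no scheme given
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or (443 if parts.scheme == "https" else 9200),
        "path_prefix": parts.path.rstrip("/"),
    }
```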
# Problem
There has been an update in Azure elasticsearch endpoint. Earlier it did not have a path, now it does. Due to this Azure ITs are failing on master branch.
# Proposed Solution
For IT's, build the rest client by taking the **path** along with host, port and scheme into account.
For Services, we can revisit the way of specifying the URL. Instead of having ELASTIC_HOST, ELASTIC_SSL_ENABLED, ELASTIC_PORT, we can let the user specify ELASTIC_URL instead, in one shot.Vibhuti Sharma [Microsoft]Vibhuti Sharma [Microsoft]https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/54ADR: Delete api endpoint to delete index for a given kind2023-12-11T17:03:35ZSmitha ManjunathADR: Delete api endpoint to delete index for a given kind## Status
- [X] Proposed
- [X] Under review
- [X] Approved
- [ ] Retired
## Context & Scope
Today, integration tests of the <b>Search Service</b> connect to Elasticsearch directly to set up indices (add/update/delete) which are later searched using various query scenario types. This setup has the following shortcomings:
- Elasticsearch, being a platform component, must not be exposed through a public interface directly. It also presents a security risk.
- It is not true black-box testing of the search service. The tests must be run via the public interfaces of OSDU APIs.
## Decision
We propose to modify the way the search tests are initialized: instead of inserting records directly into Elasticsearch, the ITs should add records via the Storage service, which can then be searched.
As a tear-down/clean-up procedure, we will need to delete the indices created by the test cases, and for this purpose we need a <b>delete API in the indexer service</b>.
Sample request:
```bash
curl --request DELETE \
--url '/api/indexer/v2/index?kind=opendes:welldb:wellbore:1.0.0' \
--header 'authorization: Bearer <JWT>' \
--header 'content-type: application/json' \
--header 'data-partition-id: opendes'
```
## Consequences
- As a clean up/run down process, implement a delete API in indexer which deletes index for a given kind. MR [273](https://community.opengroup.org/osdu/platform/system/indexer-service/-/merge_requests/273)
- Update the Search service ITs setup & teardown and integrate the kind index delete API. MR [229](https://community.opengroup.org/osdu/platform/system/search-service/-/merge_requests/229)

Assignee: Smitha Manjunath.

---
**Indexer returns 500 error on reindex for a specific type, with no easy way to understand the underlying root cause** (Eric Schoen, 2022-03-29)
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/55

We're attempting to store and index some custom objects, and are running into indexer failures. We've installed a schema for our object type into the schema service, and the structure of that schema as returned by the schema service matches our expectations and the general structure of intrinsic OSDU types. The records are being successfully stored--we can retrieve them by ID. However, we can't query them, and attempts to reindex return status 500 "An unknown error has occurred."
We realize that there might be issues with the schema, or with the data matching the schema, but there's no way to debug this problem without additional information. Is it unreasonable to ask that the indexer service return its more detailed error logs, or at least a correlation id of some sort to make it easier to find those detailed logs?

---
**Cache-control headers for a SLP specific API thru virtual service** (Mina Otgonbold, 2022-04-19)
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/59

Add cache-control headers for an SLP-specific API (QueryAttributeApi) through the virtual service.

---
**[Bug] [CRS normalization] Ingested manifest with 2D CRS Persistable reference cannot be found by Search** (Kateryna Kurach (EPAM), 2022-09-30)
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/61

1. Ingest the attached manifest
[SeismicBinGrid_2D.txt](/uploads/91b3863d969d59c8370cb14dd9c5ba86/SeismicBinGrid_2D.txt)
2. After airflow is done, check that the following request was executed successfully:
_GET https://{{STORAGE_HOST}}/records/{{data-partition-id}}:work-product-component--SeismicBinGrid:10May2Dpoly_
The record is displayed
3. Execute the following request:
_POST https://{{SEARCH_HOST}}/query_
with the body:
```
{
  "kind": "*:*:*:*",
  "limit": 300,
  "query": "id: \"https://{{STORAGE_HOST}}/records/{{data-partition-id}}:work-product-component--SeismicBinGrid:10May2Dpoly\""
}
```
Result: the record is not returned.

---
**[CRS normalization] Coordinates in the original CRS system are not preserved in the index** (Kateryna Kurach (EPAM), 2023-09-19)
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/62

Steps to reproduce:
1. Ingest a manifest with coordinates in a CRS other than WGS 84 (e.g. you can use the attached) [wellbore_2D_AnyCRSLine.txt](/uploads/4f191886599f7aaa2f735b32de94f11a/wellbore_2D_AnyCRSLine.txt)
2. Execute the following query:
POST https://{{SEARCH_HOST}}/query
```
{
  "kind": "*:*:*:*",
  "limit": 300,
  "query": "id: \"{{data-partition-id}}:master-data--Wellbore:10May2Dline\""
}
```
Result: coordinates were normalized (this is expected). However, the coordinates were not preserved in the original CRS; the original CRS is not even specified in the returned document.
It seems that the best approach would be to have the coordinates in the original CRS plus the coordinates in WGS 84.

---
**[Bug] [CRS normalization] CRS conversion with implicitly specified Z coordinate is not working if AnyCrsMultiPolygon is used as a geometry type** (Kateryna Kurach (EPAM), 2022-05-19)
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/63

Steps to reproduce:
1. Ingest the attached manifest [3D_polygon.txt](/uploads/1a87a2fb39c473a43af6df8547e7c10c/3D_polygon.txt)
2. Execute the following request:
POST https://{{SEARCH_HOST}}/query
```
{
  "kind": "*:*:*:*",
  "limit": 300,
  "query": "id: \"odesprod:work-product-component--SeismicBinGrid:12may3Dpolygon\""
}
```
Expected result:
Coordinates are transformed into WGS84 CRS and are present in the output
Actual result:
No coordinate information is displayed.

Milestone: M12 - Release 0.15. Assignee: Rustam Lotsmanenko (EPAM).

---
**Indexer skipped the entire record with malformed SpatialLocation** (An Ngo, 2022-09-09)
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/67

**Observed current behavior:**
The entire record was skipped when the record's SpatialLocation attribute was missing the geometry coordinates.
**Expected behavior:**
When a record is incorrect in a way that makes complete indexing impossible, the Indexer should still report success (200), but the incorrect parts of the record should not be indexed. The statusCode in the index should be used in all cases where data issues are detected.
I can index a record with an incorrect SpatialLocation and the Indexer returns 200.
I can see the "partially indexed" records by searching on index.statusCode = 400.
For statusCode = 400, I can see the error message in the index.trace field.
The error message should say what the failure was, in this case "Missing feature field in the <propertyName>", where I can see what the <propertyName> is.

---
**[Azure] Jackson conflict dependencies** (Ernesto Gutierrez, 2022-07-27)
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/70

Jackson XML 2.11.4 conflicts with the new Jackson core 2.13.2, causing intermittent indexer behavior: some entities are indexed and others are not.

Milestone: M12 - Release 0.15.

---
**Indexer not creating new index in Elasticsearch when new schema is added** (Yifei Xu, 2022-09-12)
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/71

It was noticed that Elasticsearch indexes are not created when we register a schema. Instead, they are created when we ingest data for the first time. Index mappings are created automatically based on the ingested record, not based on the schema. Due to this behavior many attributes and data types are not properly indexed.
We want to understand if this is the intended behavior in the core code logic. This was at least observed on AWS.
Steps to Reproduce:
- Create new OSDU environment with sample data (Except “osdu:wks:dataset--FileCollection.Generic:1.0.0” data)
- Search for FileCollection Schema {{osdu_base_url}}/api/schema-service/v1/schema/osdu:wks:dataset--FileCollection.Generic:1.0.0. This will return the schema structure.
- Log in to the Elasticsearch container
- Run curl to list the indices matching FileCollection: `curl -u elastic:<pwd> https://localhost:9200/_cat/indices -k | grep -i file`
- There will not be any index for FileCollection
- Use the Dataset Service to add a record for FileCollection without Data.DatasetProperties.FileSourceInfos
- Log in to the Elasticsearch container and search for the index using: `curl -u elastic:<pwd> https://localhost:9200/_cat/indices -k | grep -i file`
- A new index will now have been created for FileCollection based on the payload, not on the schema structure
- The index will not have any mapping for Data.DatasetProperties.FileSourceInfos
Here are some important questions:
1. Should an index be created after a new schema is created?
1. If not, how will the index be created when a record is added (for cases with and without a schema already present in the system)?
1. What should happen to the index when the schema is updated?
@fhoueto.amz @gustavurda @debasisc @chad
Milestone: M14 - Release 0.17. Assignees: Yifei Xu, Gustavo Urdaneta.

---
**Indexer service fails to process text array attribute inside nested array** (An Ngo, 2022-08-15)
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/72

1. Call ReIndex API.
2. Error logged in index service
{"LoggerName":"org.opengroup.osdu.azure.logging.Slf4JLogger","LoggingLevel":"ERROR","SourceType":"Log4j","FileName":"CoreLogger.java","TimeStamp":"Tue, 02 Aug 2022 03:37:15 GMT","LineNumber":"120","Logger Message":"indexer.app Elasticsearch exception [type=mapper_parsing_exception, reason=Failed to parse mapping [_doc]: **No handler for type [text_array]** declared on field [GeologicUnitInterpretationIDs]] {correlation-id=301c5dfc-d93d-4463-88d8-9014c10cb2f1, data-partition-id=testing-eu}","ThreadName":"http-nio-80-exec-1485","ClassName":"org.opengroup.osdu.azure.logging.CoreLogger","MethodName":"error","correlation-id":"301c5dfc-d93d-4463-88d8-9014c10cb2f1","user-id":"","data-partition-id":"testing-eu"}

---
**Not possible to upgrade core-common in parent pom without migration from springfox to springdoc-openapi** (Rustam Lotsmanenko (EPAM), 2023-03-31)
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/74

Currently, the core-common dependency in the Indexer root pom is quite outdated and, furthermore, points to release-candidate versions which could easily be erased during the repository clean-up routine:<br/>
https://community.opengroup.org/osdu/platform/system/indexer-service/-/blob/master/pom.xml#L16
This core-common version propagates old spring-boot dependencies to provider modules:
~~~
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ indexer-service ---
[INFO] org.opengroup.osdu.indexer:indexer-service:pom:0.17.0-SNAPSHOT
[INFO] +- org.opengroup.osdu:os-core-common:jar:0.14.0-rc8:compile
[INFO] | +- org.springframework.boot:spring-boot-starter-web:jar:2.4.12:compile
[INFO] | | +- org.springframework.boot:spring-boot-starter:jar:2.4.12:compile
[INFO] | | | +- org.springframework.boot:spring-boot:jar:2.4.12:compile
~~~
But if we upgrade it to the latest release version `0.16.1`, it will bring new Spring dependencies:
~~~
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ indexer-service ---
[INFO] org.opengroup.osdu.indexer:indexer-service:pom:0.17.0-SNAPSHOT
[INFO] +- org.opengroup.osdu:os-core-common:jar:0.16.1:compile
[INFO] | +- org.springframework.boot:spring-boot-starter-web:jar:2.7.2:compile
[INFO] | | +- org.springframework.boot:spring-boot-starter:jar:2.7.2:compile
[INFO] | | | +- org.springframework.boot:spring-boot:jar:2.7.2:compile
~~~
And these are not compatible with springfox, which is used for API documentation by the Indexer service:<br/>
https://community.opengroup.org/osdu/platform/system/indexer-service/-/blob/master/pom.xml#L150
Since springfox no longer receives updates and is not compatible with newer versions of Spring Boot, it will block us from further dependency upgrades: <br/>
https://github.com/springfox/springfox/issues/3462
The upgrade causes runtime errors and the Indexer service is unable to start up:
~~~
org.springframework.context.ApplicationContextException: Failed to start bean 'documentationPluginsBootstrapper'; nested exception is java.lang.NullPointerException
at org.springframework.context.support.DefaultLifecycleProcessor.doStart(DefaultLifecycleProcessor.java:181)
at org.springframework.context.support.DefaultLifecycleProcessor.access$200(DefaultLifecycleProcessor.java:54)
at org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.start(DefaultLifecycleProcessor.java:356)
at java.lang.Iterable.forEach(Iterable.java:75)
at org.springframework.context.support.DefaultLifecycleProcessor.startBeans(DefaultLifecycleProcessor.java:155)
at org.springframework.context.support.DefaultLifecycleProcessor.onRefresh(DefaultLifecycleProcessor.java:123)
at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:935)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:586)
at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:147)
at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:734)
at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:408)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:308)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1306)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1295)
at org.opengroup.osdu.indexer.IndexerGcpApplication.main(IndexerGcpApplication.java:33)
Caused by: java.lang.NullPointerException: null
at springfox.documentation.spring.web.WebMvcPatternsRequestConditionWrapper.getPatterns(WebMvcPatternsRequestConditionWrapper.java:56)
~~~M15 - Release 0.18Rustam Lotsmanenko (EPAM)rustam_lotsmanenko@epam.comRustam Lotsmanenko (EPAM)rustam_lotsmanenko@epam.comhttps://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/76Some records are not reflecting in Search Service after ingestion2022-10-03T05:46:54ZNorman MedinaSome records are not reflecting in Search Service after ingestionWhen two or more users are ingesting thousands(~10,000) of records simultaneously via manifest ingestion, some records do not reflect/show in Search Service but can be queried using Storage Service. Airflow logs show no errors nor skippe...When two or more users are ingesting thousands(~10,000) of records simultaneously via manifest ingestion, some records do not reflect/show in Search Service but can be queried using Storage Service. Airflow logs show no errors nor skipped record ids. Current solution is to re-ingest the records.https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/78ADR: Normalized kind indexed field2023-09-26T06:00:27ZMingyang ZhuADR: Normalized kind indexed field<a name="TOC"></a>
[[_TOC_]]
# Status
- [ ] Proposed
- [x] Approved
- [ ] Retired
# Context & Scope
Schema id includes the semantic versioning and is indexed as "kind" in the OSDU indexer service. Indexer indexes each "kind" as a se...<a name="TOC"></a>
[[_TOC_]]
# Status
- [ ] Proposed
- [x] Approved
- [ ] Retired
# Context & Scope
The schema id includes the semantic version and is indexed as "kind" in the OSDU Indexer service. The Indexer indexes each "kind" as a separate index in Elasticsearch. Therefore, records from different schemas will have different "kind" and "index" values in Elasticsearch, even for schemas with the same major version. So far there is no direct attribute we can use from search to group (via the aggregateBy payload) the data by schema major version. However, an application user may want either to group all versions of one data type under its major version or, in some cases, to care only about the latest version within the same major version. We'd like to propose an approach to enable this for OSDU applications.
[Back to TOC](#TOC)
---
# Requirement
- The proposed solution should solve the index major version issue without significant performance degradation
- The proposed solution should be compatible with the existing business data that upstream OSDU applications store
[Back to TOC](#TOC)
---
## Approach 1
Elasticsearch allows passing a script to create a [runtime field](https://www.elastic.co/guide/en/elasticsearch/reference/current/runtime.html) and then searching or aggregating by that field. The indexed "kind" field already has all the information; we only need to remove the minor and patch version from it. We could solve the problem on the OSDU search side by building a pre-defined runtime-field script for users to consume.
The advantage of this approach is that we don't need to re-index existing data. However, the server has to run the script at query time, so there is performance degradation. We ran load tests comparing aggregateBy on an indexed field versus a runtime field. The degradation is significant, about 70% slower at the median and 90th-percentile latency, so we passed on this approach.
[Back to TOC](#TOC)
---
## Approach 2 (Proposed)
Taking performance into account, we have to physically index the new field. We propose indexing this additional field under the [record tags](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/Guides/Chapters/06-LifecycleProperties.md#619-record-tags) field with a new sub-attribute key "normalizedKind". The value of "normalizedKind" is derived from the original "kind" value by removing the minor and patch version. E.g., a master-data--Wellbore record of kind "osdu:wks:master-data--Wellbore:1.1.0" will get a new field tags.normalizedKind with the value "osdu:wks:master-data--Wellbore:1".
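The derivation is a simple string operation; a hypothetical sketch (the helper name is illustrative, not taken from the Indexer codebase):

```java
public class KindNormalizer {
    // Derive tags.normalizedKind from a kind of the form
    // authority:source:entityType:major.minor.patch by dropping minor and patch.
    // Searching for the dot only after the last ':' keeps entity types that
    // themselves contain dots (e.g. dataset--FileCollection.Generic) intact.
    static String normalizeKind(String kind) {
        int versionStart = kind.lastIndexOf(':');
        int firstDot = kind.indexOf('.', versionStart);
        // If there is no minor/patch part, the kind is already normalized.
        return firstDot == -1 ? kind : kind.substring(0, firstDot);
    }

    public static void main(String[] args) {
        System.out.println(normalizeKind("osdu:wks:master-data--Wellbore:1.1.0"));
    }
}
```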
- Example of how to use the new field in search query
```
{
"query": "tags.normalizedKind:\"osdu:wks:master-data--Wellbore:1\""
}
```
- Example of how to use the new field in search aggregateBy
```
{
"aggregateBy": "tags.normalizedKind"
}
```
**This approach requires a re-indexing operation during deployment to take effect on existing data.**
[Back to TOC](#TOC)

Milestone: M16 - Release 0.19. Assignees: Mingyang Zhu, Zhibin Mai.

---
**Reindex API triggers storage event replay** (Mingyang Zhu, 2024-03-01)
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/80

The current reindex API implementation triggers a storage event replay, and therefore causes all the downstream services that subscribe to the topic to handle the event.
By design, the reindex API in the indexer service should impact only the Elasticsearch index.

---
**Erroneous Inconsistent search results when adding or removing records** (Michael, 2023-06-05)
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/82

We are adding and removing records from OSDU Storage, and we see erroneous results where a record that we add is not shown in the search results, or a record that we delete is shown.
We tried waiting for the expected result from the search and then trying again. We see erroneous results sometimes when we find it in the search results, but the next search does not find it. Similarly, for deleted records, we sometimes see the deletion only to see it not deleted in the next search.
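Until this is addressed server-side, a client typically has to poll until the result is stable; a generic, stdlib-only sketch (names are hypothetical, not an OSDU API):

```java
import java.util.function.BooleanSupplier;

public class SearchPoller {
    // Poll `condition` (e.g. "record visible in search results") until it holds
    // `stableChecks` times in a row, or until `maxAttempts` polls have been made.
    // Requiring consecutive successes guards against the flapping described above,
    // where a record appears in one search but not in the next one.
    static boolean awaitStable(BooleanSupplier condition, int maxAttempts,
                               int stableChecks, long delayMillis) {
        int consecutive = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (condition.getAsBoolean()) {
                if (++consecutive >= stableChecks) {
                    return true;
                }
            } else {
                consecutive = 0;
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Toy stand-in for a search call: succeeds from the third poll onward.
        int[] calls = {0};
        System.out.println(awaitStable(() -> ++calls[0] >= 3, 10, 2, 10));
    }
}
```

This is a client-side workaround only; it masks, rather than fixes, the underlying indexing inconsistency.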
Here is a video that demonstrates the inconsistent search results after a record has been created: [osdu_indexing_consistency.zip](/uploads/b7333d1e515516d0460d3dc874f9f34a/osdu_indexing_consistency.zip)
Here is the postman collection that is used in the video: [Indexing_Test.postman_collection.json](/uploads/e841351e1326883d95697bfebf7fc0d9/Indexing_Test.postman_collection.json)

---
**ADR: new reindex API to reindex the given records** (Mingyang Zhu, 2023-10-03)
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/90
## Status
- [ ] Proposed
- [ ] Trialing
- [ ] Under review
- [X] Approved
- [ ] Retired
## Context
As of now, the indexer has a reindex API that reindexes a whole given kind. The API is useful in scenarios where index data needs to be migrated because of bug fixes, new indexer features, etc. Sometimes it may not be necessary to reindex the entire kind if we know the exact impact, so it would be good to have a reindex API that reindexes only the given records.
The use cases of the new API could be:
1. If an indexer bug fix or a new indexer feature is deployed and we know exactly which records were impacted, we could use this API to reindex only those records
2. When a user ingests data and the data is successfully created in storage but fails to be indexed for any reason, the application could use this API to fix the impacted records instead of reindexing the whole kind
## API spec
```yaml
paths:
  "/api/indexer/v2/reindex/records":
    post:
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ReindexRecordsRequest'
schemas:
  ReindexRecordsRequest:
    type: object
    properties:
      recordIds:
        type: array
        items:
          type: string
        example: ["recordId1", "recordId2"]
```
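For illustration, a request body matching the `ReindexRecordsRequest` schema above (the record ids are the placeholder values from the spec's example):

```json
{
  "recordIds": ["recordId1", "recordId2"]
}
```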
## Limit
We will initially limit the number of records per request to 1000.

Milestone: M19 - Release 0.22. Assignee: Mingyang Zhu.

---
**Use specific topic instead of the storage record change topic to send the re-index events** (Zhibin Mai, 2024-03-01)
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/91

In the current implementation of the Azure indexer, re-index events share the same topic as the storage record change events. This creates several kinds of problems:
1. It creates unnecessary load on the storage service, as many other services monitor the storage change events and react, e.g. data sync with external datastores
2. It could affect index/re-index performance if the storage service is busy
3. It creates unnecessary duplicate copies of the data, e.g. multiple copies/versions of wks records with the exact same content could be created
4. Events generated from re-index or index-extension could block storage record change events, which could impact SLO requirements in terms of index update latency
We should use a dedicated topic to send and receive the re-index events.

Milestone: M19 - Release 0.22. Assignee: Zhibin Mai.

---
**()** (Akshat Joshi, 2023-07-02)
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/93

()

---
**ADR: Replay API** (Akshat Joshi, 2023-12-13)
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/94
<a name="ppadhi"></a>OSDU - Replay API
# Table of Contents
[Context ](#_toc119676063)
[Decision ](#_toc119676075)
[Design ](#_toc119676076)
[Requirements to address ](#_toc119676077)
## Status
* [x] Proposed
* [ ] Trialing
* [ ] Under review
* [ ] Approved
* [ ] Retired
## <a name="_toc119676063"></a>Context
This ADR is centered around the implementation of the new replay API within OSDU's storage service. The purpose of this Replay API is to publish messages that indicate changes to records, which are subsequently received and processed by consumers. It's important to note that the handling of these messages follows an idempotent process.
## <a name="_toc119676075"></a>Decision
The Replay API will address the following:
a) **Disaster recovery -** All records in storage are brought back to RPO (Recovery Point Objective) state.
b) **Responsibility of publishing record change messages for consumer services** -
1. **Indexer Service** - The Indexer service will be the consumer to the reindex event.
2. **Schema Service**- Correction of indices after changes to structure of the storage records of a particular kind.
## <a name="_toc119676076"></a>Design
The following options were considered for Design -
|**Options**|**Pro**|**Cons**|**Work Required**|
| :- | :- | :- | :- |
|1. Using **Airflow** + Message Broker + StorageService + Workflow Service|<p>- Proven Workflow Engine</p><p>- Lesser new implementations in storage services, so lesser work required by other CSPs.</p>|<p>- Process becomes slower and inefficient.</p><p>- Lot of HTTP calls from Airflow <-> AKS</p><p>- Airflow will require access to internal Infrastructure to operate in the most efficient manner.</p><p>- Some required features are not yet available in ADF Airflow </p><p>- Parallelization may spawn up 1000s of tasks waiting to be scheduled. **Scalability can be issue.**</p><p>- Concurrency and Safety guarantee is tricky – allowing no more than one reindex for a kind</p><p></p>|<p>**Airflow**</p><p>- DAG using TaskGroups, Dynamic Task Mapping, Concurrency handling.</p><p>- Build pipelines to integrate new DAG.</p><p></p><p>**Storage Service**</p><p>- Implement new APIs to publish messages to message broker.</p><p></p><p>**Indexer Service**</p><p></p><p>**Workflow Service**</p><p>- Have new APIs to support observability</p><p>- Design for checkpointing</p>|
|2. Using **StorageService** + **Message Broker**|<p>- Simple, Lesser moving parts</p><p>- Fast & Efficient</p>|- Parallelization may require state management.|<p>**Storage Service**</p><p>- New APIs for exposing Replay functionality (ReplayAll, ReplayKind, GetReplayStatus)</p><p>- New Modules for replay message processing</p><p></p><p>**Indexer Service**</p><p>- Delete ALL kinds API</p>|
**Design Approach for option 2:**
![Aspose.Words.71972436-70f7-48df-8f1c-d2035f55ce34.004](/uploads/362b2ef367dc8e21657ba87f7777c60d/Aspose.Words.71972436-70f7-48df-8f1c-d2035f55ce34.004.png)
**Implementation Steps:**
Attaching the swagger yaml describing the Replay API.
[ReplayAPI_2.0.yaml](/uploads/2337ac52ea50c34ae50937c7086bfb9e/ReplayAPI_2.0.yaml)

Assignee: Akshat Joshi.

---
**ADR: Index AsIngestedCoordinates** (Keith Wall, 2024-03-18)
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/95

# ADR: Index AsIngested Coordinates
@chad @gehrmann @Keith_Wall @LFlakes @josh.townsend @lifeiliu @Java1Guy @srabanaguha
<a name="TOC"></a>
[[_TOC_]]
# Status
- [x] Proposed
- [x] Trialing
- [x] Under review
- [x] Approved
- [ ] Retired
# Background
- Discussed in OSDU Geomatics Integration workstream and supported by Shell, BP, Exxon and Equinor geomatics representatives.
- Discussed during the AAF on 2023-06-07 (Josh Townsend; there is a recording and limited notes).
- Discussion further in issue !95 (this issue)
- Which refers to related issues:
- #62 (1 year ago; reporting in M12 the AsIngestedCoordinates are not returned; kept open but with answer that GET storage can be used to retrieve the original record.)
- [Issue 70 on geomatics board](https://gitlab.opengroup.org/osdu/subcommittees/ea/projects/geomatics/home/-/issues/70) (This is only a placeholder pointing to this issue 95 for monitoring. It has some interesting comments that belong really here, as follows:
- _"The Architectural Advice Forum did not endorse indexing the AsIngested Coordinates as spatial objects that would permit spatial search, but that is not needed or requested. As we discussed, there are at least two options that would allow return of the coordinates from search: (1) Index AsIngested as an ordinary array. or (2) Add data needed for search as extended properties.)"_
- _"Thomas @gehrmann and I discussed and agreed the most robust solution is to index the AsIngested coordinates and CRS as a simple array, not a spatial object."_
This ADR write-up by Bert Kampes, at the request of Chad Leong, is to help Shell developers have a clear idea of the proposed changes/specification, distilled from the above sources. The "way forward" solution is agreed, but not yet marked as "Approved" until comments are received on this ADR specification design.
# Context & Scope
AsIngestedCoordinates are currently not returned by search; only the Wgs84Coordinates are (after normalization of ingested data that has an AbstractSpatialLocation). These Wgs84Coordinates are in a GeoJSON structure and can potentially contain a geometry with many vertices. At some point in the past, a determination was made in the OSDU architecture that returning AsIngestedCoordinates would not be necessary. It is true that Wgs84Coordinates are normalized and used for search. However, AsIngestedCoordinates and the CRS are important properties to have available from Search results, for example for a list of wells.
The Geomatics Workstream and others have commented that AsIngestedCoordinates were not returned as was expected.
We learned that AsIngestedCoordinates were omitted by design, out of fear of performance degradation and because these coordinate values are not used for searches in most use cases. (However, they are used for discovery and QC across records, and existing solutions typically do allow such searches with logical operators.)
A use case is Well records. A developer may want to show a user all the wells of a platform in a table, where one of the properties is the original coordinates and CRS. Currently this is only possible by retrieving each record through Storage; it would be more efficient to have them returned by Search. Well master data do not have an associated data file, such as a Wellbore might have in the form of a path in WITSML.
Another use case is ingesting data without a BoundCRS, i.e., data that cannot be normalized to Wgs84. Then it is useful to have the original coordinates in the array, so someone can see that there were coordinates but that no Wgs84 coordinates were normalized.
See also attached pptx from AAF and description and comments on issue !95.
[Back to TOC](#TOC)
## AbstractSpatialLocation
* [link to AbstractSpatialLocation](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/E-R/abstract/AbstractSpatialLocation.1.1.0.md), which has:
* Quality metadata
* And includes [AbstractAnyCrsFeatureCollection](https://community.opengroup.org/osdu/data/data-definitions/-/blob/master/E-R/abstract/AbstractAnyCrsFeatureCollection.1.1.0.md?ref_type=heads), _A schema like GeoJSON FeatureCollection with a non-WGS 84 CRS context; based on https://geojson.org/schema/FeatureCollection.json. Attention: the coordinate order is fixed: Longitude/Easting/Westing/X first, followed by Latitude/Northing/Southing/Y, optionally height as third coordinate_, which has:
* features[].geometry.type: Point, MultiPoint, LineString, MultiLineString, Polygon or MultiPolygon.
* features[].geometry.coordinates (array),
* And properties for
* CoordinateReferenceSystemID
* persistableReferenceCrs
* VerticalCoordinateReferenceSystemID
* persistableReferenceVerticalCrs
* VerticalUnitID
* persistableReferenceUnitZ
[Back to TOC](#TOC)
## Requirements
In addition to the simplified Elastic GeoJSON derived from Wgs84Coordinates that are currently already returned (i.e., no change to Wgs84Coordinates):
* (Efficient) method to see the first AsIngested Coordinates, with their horizontal (and possibly vertical) CRS(s), and specific metadata on location quality (which are part of the AbstractSpatialLocation entity).
* The string properties are expected to be
1. usable in queries and
2. be returned in search query responses if desired.
* The coordinates of the first point are
1. numbers (in JSON terms, floating-point numbers): AsIngestedCoordinates.FirstPoint.X, AsIngestedCoordinates.FirstPoint.Y, AsIngestedCoordinates.FirstPoint.Z.
2. It is expected that the numbers can be used in simplistic box queries, provided the AsIngestedCoordinates.CoordinateReferenceSystemID (and AsIngestedCoordinates.VerticalCoordinateReferenceSystemID for 3D) are part of the query condition.
3. It is expected that the first point coordinates are returned in search query responses if desired.
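As an illustration of requirement 2, a box query over the flattened first-point fields is only meaningful when pinned to one ingest CRS. A minimal sketch over hypothetical flat documents (field names as proposed in this ADR; not the Search service code):

```python
# Hypothetical flat index documents; the field names follow this ADR's proposal.
docs = [
    {
        "AsIngestedCoordinates.FirstPoint.X": 1500000.0,
        "AsIngestedCoordinates.FirstPoint.Y": 12345678.0,
        "AsIngestedCoordinates.CoordinateReferenceSystemID": "osdu:reference-data--CoordinateReferenceSystem:BoundProjected:EPSG::32021_EPSG::15851:",
    },
    {
        "AsIngestedCoordinates.FirstPoint.X": 999.0,
        "AsIngestedCoordinates.FirstPoint.Y": 999.0,
        "AsIngestedCoordinates.CoordinateReferenceSystemID": "other:crs:",
    },
]

def box_query(documents, crs_id, x_min, x_max, y_min, y_max):
    """Simplistic box query: a numeric range on FirstPoint.X/Y is only
    meaningful when restricted to a single ingest CRS."""
    hits = []
    for doc in documents:
        if doc.get("AsIngestedCoordinates.CoordinateReferenceSystemID") != crs_id:
            continue
        x = doc.get("AsIngestedCoordinates.FirstPoint.X")
        y = doc.get("AsIngestedCoordinates.FirstPoint.Y")
        if x is not None and y is not None and x_min <= x <= x_max and y_min <= y <= y_max:
            hits.append(doc)
    return hits
```

In Elasticsearch terms this would presumably be a numeric range query on the first-point fields combined with a term condition on the CRS ID.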
[Back to TOC](#TOC)
# Tradeoff Analysis
Discussion yielded that returning AsIngestedCoordinates as properties in the Search query response, only for the first point, together with some other SpatialLocation metadata, is the correct tradeoff: it satisfies the Geomatics use cases without burdening indexer performance or memory.
[Back to TOC](#TOC)
# Proposed solution (to be analyzed and implemented by Shell developers)
* The following approach is proposed. It says "proposed" because I am not intimately familiar with the code or all the gotchas you may run into while developing. It mainly describes, from an end-user perspective, what needs to be returned.
For a record being ingested, for example a Well that may have the following AsIngestedCoordinates:
```json
{
"data": {
// Pseudo json follows. feel free to replace with a real example
// AbstractSpatialLocation
"SomeLocation": {
"SpatialLocationCoordinatesDate": "2023-02-19",
"QuantitativeAccuracyBandID": "<1 m",
"QualitativeSpatialAccuracyTypeID": "Checked: Approved",
"CoordinateQualityCheckPerformedBy": "Bert",
"CoordinateQualityCheckDateTime": "2023-01-19",
"CoordinateQualityCheckRemarks": [
"good",
"really",
"vertical is good too"
],
"AppliedOperations": [
"conversion from ED_1950_UTM_Zone_31N to GCS_European_1950; 1 points converted",
"transformation GCS_European_1950 to GCS_WGS_1984 using ED_1950_To_WGS_1984_24; 1 points successfully transformed"
],
"SpatialParameterTypeID": "Outline",
"SpatialGeometryTypeID": "Point"
},
// AbstractAnyCrsFeatureCollection
"AsIngestedCoordinates": {
"CoordinateReferenceSystemID": "osdu:reference-data--CoordinateReferenceSystem:BoundProjected:EPSG::32021_EPSG::15851:",
"VerticalCoordinateReferenceSystemID": "osdu:reference-data--CoordinateReferenceSystem:Vertical:EPSG::5714:",
"VerticalUnitID": "osdu:reference-data--UnitOfMeasure:m:",
"persistableReferenceCrs": "{\"authCode\":{\"auth\":\"OSDU\",\"code\":\"32021079\"},\"lateBoundCRS\":{\"authCode\":{\"auth\":\"EPSG\",\"code\":\"32021\"},\"name\":\"NAD_1927_StatePlane_North_Dakota_South_FIPS_3302\",\"type\":\"LBC\",\"ver\":\"PE_10_9_1\",\"wkt\":\"PROJCS[\\\"NAD_1927_StatePlane_North_Dakota_South_FIPS_3302\\\",GEOGCS[\\\"GCS_North_American_1927\\\",DATUM[\\\"D_North_American_1927\\\",SPHEROID[\\\"Clarke_1866\\\",6378206.4,294.9786982]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433]],PROJECTION[\\\"Lambert_Conformal_Conic\\\"],PARAMETER[\\\"False_Easting\\\",2000000.0],PARAMETER[\\\"False_Northing\\\",0.0],PARAMETER[\\\"Central_Meridian\\\",-100.5],PARAMETER[\\\"Standard_Parallel_1\\\",46.18333333333333],PARAMETER[\\\"Standard_Parallel_2\\\",47.48333333333333],PARAMETER[\\\"Latitude_Of_Origin\\\",45.66666666666666],UNIT[\\\"Foot_US\\\",0.3048006096012192],AUTHORITY[\\\"EPSG\\\",32021]]\"},\"name\":\"NAD27 * OGP-Usa Conus / North Dakota CS27 South zone [32021,15851]\",\"singleCT\":{\"authCode\":{\"auth\":\"EPSG\",\"code\":\"15851\"},\"name\":\"NAD_1927_To_WGS_1984_79_CONUS\",\"type\":\"ST\",\"ver\":\"PE_10_9_1\",\"wkt\":\"GEOGTRAN[\\\"NAD_1927_To_WGS_1984_79_CONUS\\\",GEOGCS[\\\"GCS_North_American_1927\\\",DATUM[\\\"D_North_American_1927\\\",SPHEROID[\\\"Clarke_1866\\\",6378206.4,294.9786982]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433]],GEOGCS[\\\"GCS_WGS_1984\\\",DATUM[\\\"D_WGS_1984\\\",SPHEROID[\\\"WGS_1984\\\",6378137.0,298.257223563]],PRIMEM[\\\"Greenwich\\\",0.0],UNIT[\\\"Degree\\\",0.0174532925199433]],METHOD[\\\"NADCON\\\"],PARAMETER[\\\"Dataset_conus\\\",0.0],OPERATIONACCURACY[5.0],AUTHORITY[\\\"EPSG\\\",15851]]\"},\"type\":\"EBC\",\"ver\":\"PE_10_9_1\"}",
"persistableReferenceVerticalCrs": "{\"authCode\":{\"auth\":\"EPSG\",\"code\":\"5714\"},\"name\":\"MSL_Height\",\"type\":\"LBC\",\"ver\":\"PE_10_9_1\",\"wkt\":\"VERTCS[\\\"MSL_Height\\\",VDATUM[\\\"Mean_Sea_Level\\\"],PARAMETER[\\\"Vertical_Shift\\\",0.0],PARAMETER[\\\"Direction\\\",1.0],UNIT[\\\"Meter\\\",1.0],AUTHORITY[\\\"EPSG\\\",5714]]\"}",
"persistableReferenceUnitZ": "{\"scaleOffset\":{\"scale\":1.0,\"offset\":0.0},\"symbol\":\"m\",\"baseMeasurement\":{\"ancestry\":\"Length\",\"type\":\"UM\"},\"type\":\"USO\"}",
"features": [ // NOTE: A well will only have a single AnyCrsPoint for the surface location, potentially 2D, rather than 3D (and then also no vertical CRS, etc.). But I added here the 3D and additional AnyCrsLineString just to make clear what to do in this case.
{
"type": "AnyCrsFeature",
"geometry": {
"type": "AnyCrsPoint",
"coordinates": [1500000.0, 12345678.0, 100.0]
}
},
{
"type": "AnyCrsFeature",
"geometry": {
"type": "AnyCrsLineString",
"coordinates": [[1400000.0, 12345666.0, 99.0], [1600000.0, 12345777.0, 101.0]]
}
} ]
},
// Wgs84 Coordinates
"Wgs84Coordinates": { } // etc., not relevant here
}
}
```
The desired end result of a search query response would include the following properties. They are a direct copy of the input record's AbstractSpatialLocation fragment.
```json
{
"data": {
"AsIngestedCoordinates.FirstPoint.X": 1500000.0, // Number (floating point), if given on ingest of course
"AsIngestedCoordinates.FirstPoint.Y": 12345678.0, // Number.
"AsIngestedCoordinates.FirstPoint.Z": 100.0, // Number. Blank (null) unless the input had a Z value
"AsIngestedCoordinates.CoordinateReferenceSystemID": "xxx", // see note below. OSDU allows ingesting data with a PR and without a reference to a CRS record id. What to do then?
"AsIngestedCoordinates.VerticalCoordinateReferenceSystemID": "xxx", // for the 3D Z value, if in the input
"AsIngestedCoordinates.persistableReferenceCrs": "string xxx", // see note below.
"AsIngestedCoordinates.persistableReferenceVerticalCrs": "string xxx",
"AsIngestedCoordinates.persistableReferenceUnitZ": "string xxx",
"AsIngestedCoordinates.QuantitativeAccuracyBandID": "xxx",
"AsIngestedCoordinates.QualitativeSpatialAccuracyTypeID": "xxx",
"AsIngestedCoordinates.CoordinateQualityCheckPerformedBy": "xxx",
"AsIngestedCoordinates.CoordinateQualityCheckDateTime": "xxx",
"AsIngestedCoordinates.CoordinateQualityCheckRemarks[]": "(string array)",
"AsIngestedCoordinates.AppliedOperations[]": "(string array)"
}
}
```
Note:
* AsIngestedCoordinates.FirstPoint.Type is not needed because Wgs84Coordinates will have the original type. Though perhaps it is useful to know, in case the FirstPoint came from something like an "AnyCrsMultiPoint".
* AsIngestedCoordinates.SpatialLocationCoordinatesDate is not needed because the QC time is already there, and this date is more for plate motion, which seems not needed at the moment. We could add it, though.
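The "direct copy" of the fragment can be sketched as a simple flattening step. This is illustrative only; `SomeLocation` is the placeholder attribute name from the example above, and the real name comes from the schema:

```python
def flatten_spatial(record_data):
    """Copy the AbstractSpatialLocation metadata and the ingest CRS properties
    into the flat field names proposed above (FirstPoint is handled separately)."""
    location = record_data.get("SomeLocation", {})           # AbstractSpatialLocation
    ingested = record_data.get("AsIngestedCoordinates", {})  # AbstractAnyCrsFeatureCollection
    flat = {}
    for key in ("CoordinateReferenceSystemID", "VerticalCoordinateReferenceSystemID",
                "persistableReferenceCrs", "persistableReferenceVerticalCrs",
                "persistableReferenceUnitZ"):
        if key in ingested:
            flat["AsIngestedCoordinates." + key] = ingested[key]
    for key in ("QuantitativeAccuracyBandID", "QualitativeSpatialAccuracyTypeID",
                "CoordinateQualityCheckPerformedBy", "CoordinateQualityCheckDateTime",
                "CoordinateQualityCheckRemarks", "AppliedOperations"):
        if key in location:
            flat["AsIngestedCoordinates." + key] = location[key]
    return flat
```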
[Back to TOC](#TOC)
## Accepted Limitations / things to work out
The following are some accepted limitations of the proposed solution, e.g., that we agree to index only the first point in a flat array and not as a geometry, for performance reasons. There are also some questions which the developers will have to contemplate and propose a solution for (which may be that there is no solution).
* Only the first point of the AsIngested geometry is indexed if the geometry contains more than one point.
* If it would be useful or better to have a switch or flag on search so the user can decide when to include geometry in the response (I would argue then both Wgs84 and AsIngested), then it is fine if they are returned by default but can be omitted. I expect this is already possible using ReturnedFields.
  - In itself it does not seem a bad option to omit the geometry by default, because it can be large for 2D lines and such. But that is not the intention of this issue.
* What to do if the ingested Geometry is complex?
* _It is not relevant to the implementation, but please clarify whether AnyCrsFeatureCollection can indeed contain both Points and LineStrings (for example) or has to contain only a single feature. The name "collection" suggests it can be a complex combination of types._
* If the AsIngested geometry contains multiple types (or a `oneOf` combination), then
- Index the Point if it exists, else the first point of a MultiPoint, else the first point of a LineString, else of a MultiLineString, else of a Polygon, else of the MultiPolygon (else nothing, there is no geometry!).
* What to do if there is a PR but no CRS id on input?
- Option 1 is to return neither the CRS nor the coordinates, but that is not satisfactory.
- Option 2 is to not return the CRS but still return the coordinates.
- Option 3 is to return the PR in the CRS ID field.
- Option 4 is to return the PR as PR (preferred).
- Option 5 is to look up the id of the PR (but we do not have a function for that and would take time...). In a way this is ideal though, but we expect people to ingest data with a (bound CRS) record id.
* Can somehow the CRS Name (Hor and Vert) be returned?
- Option 1 is no. I think we have to accept this, because the name is not part of the input.
- Option 2 is yes, because the normalizer will print the CRS Name in AppliedOperations (at least for the horizontal, which is most important).
- Option 3 is to look up the CRS by id and then retrieve some parameters (for example the PR to augment the stored and indexed record with the numerical definition used at the time of normalization; as a permanent record frozen in time what was applied at the time of ingestion - which was the original requirement in 2021 for ingested data to look up the PR and store it with the data but this was said not to be possible.)
* Can somehow the AppliedOperations be returned or not too useful to bother?
- Option 1 is yes.
- Option 2 is no.
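The first-point selection rule above (the Point if it exists, else the first point of a MultiPoint, else of a LineString, and so on) can be sketched as follows. This is an illustration, not the indexer's code:

```python
# Priority order from the limitation above: Point, else MultiPoint, else
# LineString, else MultiLineString, else Polygon, else MultiPolygon.
PRIORITY = ["AnyCrsPoint", "AnyCrsMultiPoint", "AnyCrsLineString",
            "AnyCrsMultiLineString", "AnyCrsPolygon", "AnyCrsMultiPolygon"]

def first_point(features):
    """Return the first point [x, y(, z)] of the highest-priority geometry,
    or None when there is no geometry at all."""
    first_by_type = {}
    for feature in features:
        geometry = feature.get("geometry") or {}
        # Keep only the first occurrence of each geometry type.
        first_by_type.setdefault(geometry.get("type"), geometry.get("coordinates"))
    for geometry_type in PRIORITY:
        coords = first_by_type.get(geometry_type)
        if not coords:
            continue
        # Descend nested coordinate arrays until a single [x, y, ...] remains.
        while isinstance(coords[0], list):
            coords = coords[0]
        return coords
    return None
```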
[Back to TOC](#TOC)
# Change Management
* Operators may need to re-ingest data or update the index. Is it possible to "patch" data to re-run the indexer on data already ingested?
# Decision
* Implement by Shell developers working on Search Service.
# Consequences
* The indexer code changes should have no noticeable impact on the system or applications (only additional properties returned).
[Back to TOC](#TOC)
#EOF.

M22 - Release 0.25 · Mark Chance

---

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/108
**Poor performance for index augmenter** (2023-08-24T20:45:43Z, Zhibin Mai)

Though we made several enhancements related to the index augmenter, directly or indirectly (such as creating a separate re-index topic and splitting the big 1000-record messages into small 50-record messages to support parallel indexing), we still found that indexing performance with the augmenter enabled is much worse than with the augmenter disabled. For example, for WellLog with multiple extension configurations, performance with the augmenter enabled is about 15 times slower than with the augmenter disabled.
With the augmenter enabled:
1. Indexing one record individually: each record (for the given property configurations) requires 8 queries to get all the information needed to populate the extended properties. In this test, the cache does not take effect at all.
2. Indexing a kind with 291 WellLog records: each record requires 6.8 queries on average. In this test case the cache should play an important role; however, we found the cache mechanism has little effect.
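For illustration, the effect the cache is expected to have can be shown with a per-batch memoized lookup. Here `fetch_related` is a hypothetical stand-in for one Search/Storage call:

```python
from functools import lru_cache

query_count = 0

@lru_cache(maxsize=None)
def fetch_related(record_id):
    """Stand-in for one Search/Storage query for a related record.
    Memoization should collapse repeated lookups within a batch."""
    global query_count
    query_count += 1
    return {"id": record_id}

# 291 child records that all reference the same two parents should cause
# only two real queries, not ~6.8 per record.
for _ in range(291):
    for parent_id in ("well:1", "well:2"):
        fetch_related(parent_id)
```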
As I ran the tests locally, the search latency is about 1.5 times longer than the search latency in a cloud environment. I estimate that the performance with augmenter enabled would still be about 10 times slower if we don't make any enhancement.

M20 - Release 0.23 · Zhibin Mai

---

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/109
**ADR: Full reindex API access must be elevated** (2023-12-01T12:38:36Z, Neha Khandelwal)
[[_TOC_]]
# Status
* [x] Proposed
* [x] Trialing
* [x] Under review
* [ ] Approved
* [ ] Retired
# Context & Scope
The expected use-case for the full reindex API is the disaster recovery scenario, as it reindexes everything in a data-partition.
Currently, full reindex API access is set to the same level as the other reindex APIs. Because of this, users with the **users.datalake.admin** permission can **accidentally** trigger a full reindex. To make matters worse, there are no APIs to cancel an ongoing re-index, so this operation can run for hours or days depending on the data-partition size. This can have an impact on cost and service performance.
# Requirements
We need to elevate the permission level for the full reindex API so that users with Admin access cannot accidentally trigger a full reindex.
# Tradeoff Analysis
This will be a breaking change, but it should have low impact as this API is used very rarely.
# Solution
The proposed solution is to elevate the permission level for the full reindex API to **users.datalake.ops**.
# Consequences
* Change in indexer-core to Reindex API (permission elevation for full reindex) and PartitionSetup API (refactor)
* Indexer service documentation needs to be updated
# ADR Comments Below

M21 - Release 0.24

---

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/110
**ReIndex API does not always update the schema mapping to ElasticSearch** (2023-09-08T19:53:33Z, Zhibin Mai)

When an augmenter configuration is deployed, in order to make use of the updated configuration, two operations are required:
1. Update the schema mapping with extended schema from the augmenter configuration to ElasticSearch
2. Re-index the records of the affected kind.
In this scenario, it is expected that users can still search the "old" data before the re-index is completed.
Current implementation of the ReIndex API does not always update the schema mapping in ElasticSearch if the forceClean option is not set to true. However, when forceClean is set to true, the original index is deleted/purged; in this case, users may not be able to search the expected data before the new index is fully populated.

Zhibin Mai

---

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/112
**ADR: Create field for case insensitive search** (2024-02-26T17:16:00Z, Mark Chance)

# ADR: Add keywordLower Index Mapping field
<a name="TOC"></a>
[[_TOC_]]
# Status
- [x] Proposed
- [x] Trialing
- [x] Under review
- [x] Approved
- [ ] Retired
# Background
Application developers would like to provide their users a simple search mechanism, much like SQL "LIKE" queries combined with the LOWER function. Currently, none of the existing ElasticSearch fields implement this.
# Context & Scope
## Requirements
The desire is to support the following search query:
```json
{
"kind": "osdu:wks:master-data--Well:1.0.0",
"query": "data.FacilityName.keywordLower:exam*"
}
```
Which would return
```json
{
"results": [
{
"data": {
"FacilityName": "Example test"
},
"id": "osdu:master-data--Well:1012"
}
]
}
```
# Tradeoff Analysis
# Proposed solution
Add a field in the index called keywordLower, in which all input is normalized to lower case.
For example, this mapping in master-data--Well would be created:
```json
"CurrentOperatorID": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"null_value": "null",
"ignore_above": 256
},
"keywordLower": {
"type": "keyword",
"normalizer": "lowercase",
"null_value": "null",
"ignore_above": 256
}
}
},
```
The 'keywordLower' field is added and has the additional attribute `"normalizer": "lowercase"`.
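The behavior of the `lowercase` normalizer plus a wildcard query can be mimicked outside Elasticsearch; a small sketch, illustrative only:

```python
import fnmatch

def index_keyword_lower(value):
    """Mimic the index-time effect of the 'lowercase' normalizer on a keyword
    field (with the mapping's null_value of "null" for missing input)."""
    return value.lower() if value is not None else "null"

def matches(query, indexed_value):
    """Case-insensitive wildcard match, like data.FacilityName.keywordLower:exam*"""
    return fnmatch.fnmatchcase(indexed_value, query.lower())
```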
# Change Management
* Operators may need to re-ingest data or update the index. Is it possible to "patch" data to re-run the indexer on data already ingested?
# Decision
# Consequences
* The indexer code changes should have no noticeable impact on the system or applications (only additional property created).
* The index will be larger with the addition of the many instances of this field.
Draft MR: https://community.opengroup.org/osdu/platform/system/indexer-service/-/merge_requests/618

M22 - Release 0.25 · Stanisław Bieniecki

---

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/114
**The RelatedConditionMatches of the augmenter is not flexible** (2023-10-03T07:51:41Z, Zhibin Mai)

The current implementation of RelatedConditionMatches in the augmenter has the following limitations:
1. The condition match is a text match only. The following two cases demonstrate that regular-expression matching is needed:
##### Case 1: Extend the properties from the related objects whose IDs are defined under data.LineageAssertions[].ID
```
{
"Name": "Document-IndexPropertyPathConfiguration",
"Code": "osdu:wks:work-product-component--Document:1.",
"AttributionAuthority": "OSDU",
"Configurations": [{
"Name": "AssociatedFacilityNames",
"Policy": "ExtractAllMatches",
"Paths": [{
"RelatedObjectsSpec": {
"RelationshipDirection": "ChildToParent",
"RelatedObjectID": "data.LineageAssertions[].ID",
"RelatedObjectKind": "osdu:wks:master-data--Wellbore:1.",
"RelatedConditionMatches": [
"^[\\w\\-\\.]+:master-data\\-\\-Wellbore:[\\w\\-\\.\\:\\%]+$"
],
"RelatedConditionProperty": "data.LineageAssertions[].ID"
},
"ValueExtraction": {
"ValuePath": "data.FacilityName"
}
}
]
}, {
"Name": "AssociatedProjectNames",
"Policy": "ExtractAllMatches",
"Paths": [{
"RelatedObjectsSpec": {
"RelationshipDirection": "ChildToParent",
"RelatedObjectID": "data.LineageAssertions[].ID",
"RelatedObjectKind": "osdu:wks:master-data--SeismicAcquisitionSurvey:1.",
"RelatedConditionMatches": [
"^[\\w\\-\\.]+:master-data\\-\\-SeismicAcquisitionSurvey:[\\w\\-\\.\\:\\%]+$"
],
"RelatedConditionProperty": "data.LineageAssertions[].ID"
},
"ValueExtraction": {
"ValuePath": "data.ProjectName"
}
}
]
}
]
}
]
}
```
##### Case 2: Match the reference data values in any data partition (or ignoring the data partition)
```
{
"Name": "WellLog-IndexPropertyPathConfiguration",
"Code": "osdu:wks:work-product-component--WellLog:1.",
"AttributionAuthority": "OSDU",
"Configurations": [{
"Name": "WellUWI",
"Policy": "ExtractFirstMatch",
"Paths": [{
"ValueExtraction": {
"RelatedConditionMatches": [
"^[\\w\\-\\.]+:reference-data--AliasNameType:UniqueIdentifier:$",
"^[\\w\\-\\.]+:reference-data--AliasNameType:RegulatoryName:$",
"^[\\w\\-\\.]+:reference-data--AliasNameType:PreferredName:$",
"^[\\w\\-\\.]+:reference-data--AliasNameType:CommonName:$",
"^[\\w\\-\\.]+:reference-data--AliasNameType:ShortName:$"
],
"RelatedConditionProperty": "data.NameAliases[].AliasNameTypeID",
"ValuePath": "data.NameAliases[].AliasName"
}
}
]
}
]
}
```
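The proposal amounts to evaluating each RelatedConditionMatches entry as a regular expression rather than an exact string. A sketch using the patterns from Case 2 (the function name is illustrative):

```python
import re

# Two of the Case 2 patterns: any data partition prefix is accepted.
RELATED_CONDITION_MATCHES = [
    r"^[\w\-\.]+:reference-data--AliasNameType:UniqueIdentifier:$",
    r"^[\w\-\.]+:reference-data--AliasNameType:PreferredName:$",
]

def condition_matches(value, patterns=RELATED_CONDITION_MATCHES):
    """Treat RelatedConditionMatches entries as regular expressions instead of
    exact text, so the data partition part of the id does not matter."""
    return any(re.match(pattern, value) for pattern in patterns)
```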
As required, to extend a property from a related record, the kind of the related record must be defined in the configuration. However, the Relationship type under ExtensionProperties does not define the kind of the target object. In some cases, the source record
Example: Extend the related object's name to the document, name of the related objects
2. RelatedConditionProperty is limited to being a property of a one-level nested object.
In the above examples, both `data.NameAliases[].AliasNameTypeID` and `data.ExtensionProperties.Relationships[].TargetID` are properties of one-level nested objects. In some cases, RelatedConditionProperty can be a property of a multi-level nested object. For example:
```
{
"Name": "WellLog-IndexPropertyPathConfiguration",
"Code": "osdu:wks:work-product-component--WellLog:1.",
"AttributionAuthority": "OSDU",
"Configurations": [{
"Name": "OrganisationNames",
"Policy": "ExtractAllMatches",
"Paths": [{
"RelatedObjectsSpec": {
"RelationshipDirection": "ChildToParent",
"RelatedObjectKind": "osdu:wks:master-data--Organisation:1.",
"RelatedObjectID": "data.TechnicalAssurances[].Reviewers[].OrganisationID",
"RelatedConditionMatches": [
"^[\\w\\-\\.]+:reference-data--ContactRoleType:ProjectManager:AccountOwner:$",
"^[\\w\\-\\.]+:reference-data--ContactRoleType:AccountOwner:$"
],
"RelatedConditionProperty": "data.TechnicalAssurances[].Reviewers[].RoleTypeID"
},
"ValueExtraction": {
"ValuePath": "data.OrganisationName"
}
}
]
}
]
}
```

M21 - Release 0.24 · Thomas Gehrmann [slb] · Zhibin Mai

---

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/118
**Avoid using query by cursor if possible** (2023-12-02T13:58:46Z, Zhibin Mai)

In M20, we created MR [601](https://community.opengroup.org/osdu/platform/system/indexer-service/-/merge_requests/601) that tried to improve the performance of the augmenter and reduce the usage of the query with cursor. With that MR, we only have two places (getting related children records) that use query with cursor.
However, queries with cursor are expensive: most Elasticsearch deployments allow at most 500 queries with cursor per minute. The reason we still use queries with cursor is that normal queries can return at most 10,000 records, and when fetching children records for a given set of parent records we cannot be sure whether the returned results will exceed 10,000.
During our stress tests with large datasets, we found many errors from the queries with cursor when re-indexing 100k wellbores that have 5M welllogs in total (each wellbore has 50 welllogs on average). Based on our knowledge of the Augmenter, in more than 99% of cases the query results won't reach 10,000 records. We need to find a way to ensure both correctness (no results missed) and error-free queries.
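For illustration, such a default-with-fallback flow could look like this (stand-in query functions, not the service code):

```python
RESULT_WINDOW = 10_000  # the normal-query result limit mentioned above

def run_query(normal_query, cursor_query, request):
    """Run the normal query first; fall back to the expensive query-with-cursor
    only when the result window limit is reached."""
    total_count, results = normal_query(request)
    if total_count < RESULT_WINDOW:
        return results  # the common case (> 99% per the analysis above)
    return cursor_query(request)
```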
The basic idea is that the Augmenter will use normal queries by default; when the totalCount of a query result reaches the limit (10,000), the query with cursor will automatically kick in.

M22 - Release 0.25 · Zhibin Mai

---

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/120
**Augmenter can't recursively resolve the schema (property/type pair) of the augmented properties** (2023-12-19T15:40:02Z, Zhibin Mai)

The Augmenter is supposed to be able to augment properties from other augmented properties when updating the schema mapping and creating the Document for Elasticsearch.
We illustrate the issue using the following examples:
### Well has augmented properties `CountryNames` from kind `osdu:wks:master-data--GeoPoliticalEntity:1.` and `WellUWI` from itself
```
{
"Name": "Well-IndexPropertyPathConfiguration",
"Description": "The index property list for master-data--Well:1., valid for all master-data--Well kinds for major version 1.",
"Code": "osdu:wks:master-data--Well:1.",
"AttributionAuthority": "OSDU",
"Configurations": [{
"Name": "CountryNames",
"Policy": "ExtractAllMatches",
"UseCase": "As a user I want to find objects by a country name, with the understanding that an object may extend over country boundaries.",
"Paths": [{
"RelatedObjectsSpec": {
"RelatedConditionProperty": "data.GeoContexts[].GeoTypeID",
"RelatedConditionMatches": [
"opendes:reference-data--GeoPoliticalEntityType:Country:"
],
"RelatedObjectID": "data.GeoContexts[].GeoPoliticalEntityID",
"RelatedObjectKind": "osdu:wks:master-data--GeoPoliticalEntity:1.",
"RelationshipDirection": "ChildToParent"
},
"ValueExtraction": {
"ValuePath": "data.GeoPoliticalEntityName"
}
}
]
}, {
"Name": "WellUWI",
"Policy": "ExtractFirstMatch",
"UseCase": "As a user I want to discover and match Wells by their UWI. I am aware that this is not globally reliable, however, I am able to specify a prioritized AliasNameType list to look up value in the NameAliases array.",
"Paths": [{
"ValueExtraction": {
"RelatedConditionProperty": "data.NameAliases[].AliasNameTypeID",
"RelatedConditionMatches": [
"opendes:reference-data--AliasNameType:UniqueIdentifier:",
"opendes:reference-data--AliasNameType:RegulatoryName:",
"opendes:reference-data--AliasNameType:PreferredName:"
],
"ValuePath": "data.NameAliases[].AliasName"
}
}
]
}
]
}
```
### Wellbore has augmented properties `CountryNames` and `WellUWI` from kind `osdu:wks:master-data--Well:1.`
```
{
"Name": "Wellbore-IndexPropertyPathConfiguration",
"Description": "The index property list for master-data--Wellbore:1., valid for all master-data--Wellbore kinds for major version 1.",
"Code": "osdu:wks:master-data--Wellbore:1.",
"AttributionAuthority": "OSDU",
"Configurations": [{
"Name": "CountryNames",
"Policy": "ExtractFirstMatch",
"UseCase": "As a user I want to discover Wellbore instances by the well's name value.",
"Paths": [{
"RelatedObjectsSpec": {
"RelatedObjectID": "data.WellID",
"RelatedObjectKind": "osdu:wks:master-data--Well:1.",
"RelationshipDirection": "ChildToParent"
},
"ValueExtraction": {
"ValuePath": "data.CountryNames"
}
}
]
}, {
"Name": "WellUWI",
"Policy": "ExtractFirstMatch",
"UseCase": "As a user I want to discover Wellbore instances by the well's UWI value.",
"Paths": [{
"RelatedObjectsSpec": {
"RelatedObjectID": "data.WellID",
"RelatedObjectKind": "osdu:wks:master-data--Well:1.",
"RelationshipDirection": "ChildToParent"
},
"ValueExtraction": {
"ValuePath": "data.WellUWI"
}
}
]
}
]
}
```
When the indexer tries to resolve the schema for `Wellbore`, the resolved schema should include both `CountryNames` and `WellUWI`.
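The recursive resolution can be sketched with simplified, hypothetical configuration structures (real configurations use the RelatedObjectsSpec/ValueExtraction form shown above):

```python
# Hypothetical, simplified configurations: each augmented property maps to
# (related kind, value path); a value path may itself be an augmented property.
CONFIGS = {
    "Well": {"CountryNames": ("GeoPoliticalEntity", "GeoPoliticalEntityName"),
             "WellUWI": (None, "NameAliases[].AliasName")},
    "Wellbore": {"CountryNames": ("Well", "CountryNames"),
                 "WellUWI": ("Well", "WellUWI")},
}

# Types that can be resolved directly from the (non-augmented) base schemas.
BASE_SCHEMA = {"GeoPoliticalEntityName": "text", "NameAliases[].AliasName": "text"}

def resolve_type(kind, prop, seen=()):
    """Recursively resolve an augmented property to its indexable base type."""
    if (kind, prop) in seen:  # guard against cyclic configurations
        return None
    related_kind, value_path = CONFIGS.get(kind, {}).get(prop, (None, None))
    if value_path is None:
        return BASE_SCHEMA.get(prop)
    if related_kind is not None and value_path in CONFIGS.get(related_kind, {}):
        # The value path is itself an augmented property: recurse.
        return resolve_type(related_kind, value_path, seen + ((kind, prop),))
    return BASE_SCHEMA.get(value_path)
```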
However, in the current implementation, the resolved schema for `Wellbore` does not include the augmented properties `CountryNames` and `WellUWI`. As a result, these two properties are not searchable even though their values are created in the `Wellbore` records.

Zhibin Mai

---

https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/136
**The augmented attributes are not searchable** (2024-01-26T18:29:27Z, Zhibin Mai)

A common issue, as mentioned in [IndexAugmenter.md](https://community.opengroup.org/osdu/platform/system/indexer-service/-/blob/master/docs/tutorial/IndexAugmenter.md): the augmented attributes are not searchable. It requires the records of the augmented kind(s) to be re-indexed.
It is understandable that in order to augment the existing records, the existing records of the augmented kind(s) need to be re-indexed. However, there is a common scenario: data admins/managers want to verify the effect of an augmenter configuration immediately after they deploy it. They normally don't have permission to trigger a re-index; furthermore, in this scenario they should not trigger a re-index of the whole kind(s) before they finalize the augmenter configuration for the given kind(s).
If the indexer could automatically update the schema mapping of the augmented kind(s) in ElasticSearch when it detects that the augmenter configuration was updated, then data admins/managers could see the effect of the augmenter configuration immediately by updating one of the existing data records or inserting a new one. It would tremendously reduce the time spent on troubleshooting as well as on developing and deploying new or updated augmenter configurations.
Given that updating the schema mapping of the augmented kind(s) in Elasticsearch is a lightweight operation compared to re-indexing the whole kind(s), I think this enhancement is worth making.
M23 - Release 0.26
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/137
String array becomes String after index (2024-01-24, Zhibin Mai)
A String array becomes a String after it is indexed. The bug appears to have been introduced by [MR 649](https://community.opengroup.org/osdu/platform/system/indexer-service/-/merge_requests/649).
To illustrate the problem, I used an example from an Augmenter Configuration that has String array attributes.
- Storage Format of part of data payload:
![image](/uploads/9dacff15a729788fffb02e916b704569/image.png)
- Index (document) format of part of the data payload, returned by the following method in class `StorageIndexerPayloadMapper`:
```
public Map<String, Object> mapDataPayload(ArrayList<String> asIngestedCoordinatesPaths, IndexSchema storageSchema,
                                          Map<String, Object> storageRecordData, String recordId) {
    Map<String, Object> dataCollectorMap = new HashMap<>();
    //..
    mapDataPayload(storageSchema.getDataSchema(), storageRecordData, recordId, dataCollectorMap);
    //...
    return dataCollectorMap;
}
```
![image](/uploads/dfe1df18988936c5b137c542edd58c96/image.png)
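To make the expected behaviour concrete, here is a minimal, hypothetical sketch (not the real `StorageIndexerPayloadMapper` logic): an attribute whose schema type is an array must keep its List value in the index payload rather than being collapsed to a single String, which is the regression described above. The `[]string` type notation is an assumption for illustration.

```java
import java.util.List;

/**
 * Illustrative sketch of array preservation during payload mapping.
 * A String-array attribute stays a List; scalars pass through unchanged.
 */
public class ArrayPreservation {
    /** Maps one attribute value for indexing, preserving array-ness. */
    public static Object mapAttribute(String schemaType, Object storageValue) {
        boolean isArrayType = schemaType.startsWith("[]") || schemaType.endsWith("[]");
        if (isArrayType && storageValue instanceof List<?> values) {
            return List.copyOf(values); // keep the whole array
        }
        return storageValue; // scalar attributes pass through
    }
}
```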
- Search result before re-index from local indexer service with the [MR 649](https://community.opengroup.org/osdu/platform/system/indexer-service/-/merge_requests/649):
![image](/uploads/7714272f8aa0286c90b278e7546d8b33/image.png)
- Search result after re-index from local indexer service with the [MR 649](https://community.opengroup.org/osdu/platform/system/indexer-service/-/merge_requests/649):
![image](/uploads/b51dbecdc83cc6279b71017d1f8f1b61/image.png)
M22 - Release 0.25
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/138
Datetime formatting/parsing issues result in field not appearing in search index (2024-03-04, Mark Chance)
**Subject:** Certain "date" type attributes unavailable via SEARCH API but available by STORAGE API
The QA team just highlighted that some “date” related fields have gone missing again from the SEARCH API services. I have posted the snapshots below. Please note that no schema updates/changes have happened. QA (as end users) are ingesting and retrieving data (CRUD) to and from the schema.

```
{
  "kind": "tenant1:wks:work-product-component--Sheet:1.0.0",
  "query": "\"tenant1:work-product-component--Sheet:d92b4ff85fd040dba9009209e85a3c31\""
}
```
Through SEARCH:
![Search.png](/uploads/8c66dfb8d694298852a39f7d7eb50918/Search.png)
Through STORAGE:
![Storage.png](/uploads/7af8e15227941a43f1a3a8b6440931aa/Storage.png)
This is fixed by https://community.opengroup.org/osdu/platform/system/indexer-service/-/merge_requests/694.
M23 - Release 0.26
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/139
Too many results returned after bagofwords feature (2024-01-19, Guillaume Caillet)
Hi,
When enabling the [BagOfWords feature](https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/113), some search queries with a "query" filter return too many results.
I've reproduced the issue on several AWS environments, and I don't have this issue if the indexer is deployed with the feature flag `featureFlag.bagOfWords.enabled` set to False.
I have attached the 3 records and the schema I used (these are from the `os-search` integration tests in `testing/integration-tests/search-test-core/src/main/resources/testData/records_1.json`).
[records.json](/uploads/196fce2d3f739b3c4349bd4e5075aeed/records.json)
[schema.json](/uploads/990d8ac4242d6a09921e16236f6a72e5/schema.json)
(I didn't delete these 3 records from the `main.osdu-gl.osdu.aws` environment, so if you have access to it, you should be able to reproduce these queries.)
Once the records are indexed, issue a `search` query with the following payload:
```
{
"kind": "opendes:search1704732571020:test-data--Integration:1.0.1",
"query": "OFFICE9"
}
```
All 3 records are returned instead of 0 (there is no "OFFICE9" text in the 3 records).
The same happens if I use a "valid" query matching at least one record, for example:
```
{
"kind": "opendes:search1704732571020:test-data--Integration:1.0.1",
"query": "OFFICE4"
}
```
This also returns 3 records instead of one.
This issue seems to occur only when using a digit suffix. If I use a letter, it works properly; for example:
```
{
"kind": "opendes:search1704732571020:test-data--Integration:1.0.1",
"query": "OFFICEZ"
}
```
Properly returns 0 results.
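One plausible (unconfirmed) explanation for the digit-suffix behaviour is word_delimiter-style tokenization at letter/digit transitions: both `OFFICE9` and `OFFICE4` would then index the token `OFFICE`, so a query for either term matches all three records, while `OFFICEZ` stays a single token and matches nothing. A small sketch of that splitting rule:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothesis only, not confirmed from the indexer code: simulate an
 * analyzer that splits tokens wherever a letter/digit transition occurs,
 * as Elasticsearch's word_delimiter-style filters can do.
 */
public class LetterDigitSplit {
    /** Splits a token at letter<->digit boundaries. */
    public static List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : token.toCharArray()) {
            // Start a new part when digit-ness flips relative to the previous char.
            if (current.length() > 0
                    && Character.isDigit(c) != Character.isDigit(current.charAt(current.length() - 1))) {
                parts.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) parts.add(current.toString());
        return parts;
    }
}
```

Under this hypothesis, `split("OFFICE9")` and `split("OFFICE4")` both emit the shared token `OFFICE`, which would be consistent with the counts observed above.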
I have managed to reproduce the issue directly on the Elasticsearch server by using its REST API, so I think the issue is not with the Search service:
POST https://localhost:9200/opendes-search1704732571020-test-data--integration-1.0.1/_search (I'm using k8s port-forwarding to directly connect to the ES server)
with the following payload:
```
{
  "from": 0,
  "size": 10,
  "timeout": "1m",
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": [
              {
                "query_string": {
                  "query": "OFFICE9"
                }
              }
            ],
            "adjust_pure_negative": true,
            "boost": 1.0
          }
        }
      ]
    }
  }
}
```
This returns 3 results when BagOfWords is enabled, and only 1 if not.
M22 - Release 0.25
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/151
Augmenter throws null pointer exception when casting the related object ids retrieved from the GeoContext (2024-02-26, Zhibin Mai)
The Augmenter throws a NullPointerException when it tries to get the reference id and the reference object id is null. The bug was introduced by [MR 620](https://community.opengroup.org/osdu/platform/system/indexer-service/-/merge_requests/620), included in M21.
The issue was discovered in an M22 deployment when the Augmenter tried to cast a null reference object id from a GeoContext object. GeoContext has 5 reference object id properties defined in the schema; in each GeoContext, only 1 reference object id is not null.
Here is the example:
- Format in Storage record
```
{
"GeoContexts": [{
"GeoPoliticalEntityID": "opendes:master-data--GeoPoliticalEntity:111111:",
"GeoTypeID": "opendes:reference-data--GeoPoliticalEntityType:LicenseBlock:"
}, {
"FieldID": "opendes:master-data--Field:444444:"
}
]
}
```
- Format in index record
```
{
"GeoContexts": [{
"BasinID": null,
"FieldID": null,
"PlayID": null,
"GeoPoliticalEntityID": "opendes:master-data--GeoPoliticalEntity:111111:",
"GeoTypeID": "opendes:reference-data--GeoPoliticalEntityType:LicenseBlock:",
"ProspectID": null
}, {
"BasinID": null,
"FieldID": "opendes:master-data--Field:444444:",
"PlayID": null,
"GeoPoliticalEntityID": null,
"GeoTypeID": "Field",
"ProspectID": null
}
]
}
```
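The failure mode shown above is a key that is present in the index payload but holds a null value. A minimal null-safe sketch of the missing guard (the class and method names are hypothetical helpers, not the actual Augmenter code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative sketch: only treat a related-object property as having a
 * value when it is a non-null, non-blank String.
 */
public class RelatedObjectIds {
    /** Collects non-null ids for one property (e.g. "FieldID") from GeoContexts entries. */
    public static List<String> collectIds(List<Map<String, Object>> geoContexts, String property) {
        List<String> ids = new ArrayList<>();
        for (Map<String, Object> context : geoContexts) {
            Object value = context.get(property);
            // instanceof rejects null, guarding the cast that caused the NPE.
            if (value instanceof String id && !id.isBlank()) {
                ids.add(id);
            }
        }
        return ids;
    }
}
```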
With the bug introduced, the Augmenter considers that the related object id "FieldID" has a value, without checking whether the value is null before casting it to String. In this case, a NullPointerException is thrown and augmenting the record fails, though it does not affect the normal indexing.
M23 - Release 0.26
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/152
Unable search ingested records using search api (2024-03-08, Mohd Asad Shaikh)
Hi Team,
I am not able to search artifacts using the Search service. However, I am able to search using the Storage service. Attached are the DAG ingestion success response and the empty search response.
![Dag_Success_result](/uploads/0a5813e7d6d4d8f6a08052fc23ef8852/Dag_Success_result.png)
![image__3_](/uploads/6638d7ff797274aaacc35e7a09e9dbf3/image__3_.png)
![search_result_](/uploads/d316028de45b17c3be16e9d4a575c5c1/search_result_.png)
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/154
Augmented Index - Use Case 2 (Country Names) not working in M22 Preship Environment (2024-03-07, Norman Medina)
I was testing out the augmented index feature on the M22 Preship environment. I was trying to implement the use cases documented in this [tutorial](https://community.opengroup.org/osdu/platform/system/indexer-service/-/blob/master/docs/tutorial/IndexAugmenter.md#use_cases). Use cases 1 and 5 worked for me. Use case 2 did not, as the field `CountryNames` wasn't coming out of search after reindexing.
I also tried this on `osdu:wks:master-data--Wellbore:1.0.0`, but replaced the item in `RelatedConditionMatches` with `^[\\w\\-\\.]+:reference-data--GeoPoliticalEntityType:Province:$` and changed the Name to `ProvinceNames`; however, the custom `ProvinceNames` field is not appearing.
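One quick sanity check is whether the condition pattern actually matches the `GeoTypeID` values in the target records. The sketch below assumes `RelatedConditionMatches` entries are evaluated as full-string Java regexes against the `GeoTypeID` value (an assumption for illustration, not confirmed from the Indexer source):

```java
import java.util.regex.Pattern;

/**
 * Sanity check for a RelatedConditionMatches entry, assuming full-string
 * regex matching against data.GeoContexts[].GeoTypeID values.
 */
public class ConditionMatch {
    static final Pattern PROVINCE =
        Pattern.compile("^[\\w\\-\\.]+:reference-data--GeoPoliticalEntityType:Province:$");

    public static boolean matches(String geoTypeId) {
        return PROVINCE.matcher(geoTypeId).matches();
    }
}
```

Under that assumption, the pattern matches a `...GeoPoliticalEntityType:Province:` id but not a `...GeoPoliticalEntityType:Country:` one, so a mismatch between the regex and the actual GeoTypeID values in the data would silently produce no `ProvinceNames` field.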
Please see below the reference data I used:
```
[
{
"acl": {
"owners": [
"{{New_OwnerDataGroup}}@{{data-partition-id}}{{domain}}"
],
"viewers": [
"{{New_ViewerDataGroup}}@{{data-partition-id}}{{domain}}"
]
},
"legal": {
"legaltags": [
"{{LegalTagNameExists}}"
],
"otherRelevantDataCountries": [
"US"
],
"status": "compliant"
},
"meta": [],
"data": {
"Code": "osdu:wks:master-data--Wellbore:1.",
"Configurations": [
{
"Name": "ProvinceNames",
"Policy": "ExtractAllMatches",
"Paths": [
{
"RelatedObjectsSpec": {
"RelatedObjectID": "data.GeoContexts[].GeoPoliticalEntityID",
"RelatedObjectKind": "osdu:wks:master-data--GeoPoliticalEntity:1.",
"RelatedConditionMatches": [
"^[\\w\\-\\.]+:reference-data--GeoPoliticalEntityType:Province:$"
],
"RelatedConditionProperty": "data.GeoContexts[].GeoTypeID"
},
"ValueExtraction": {
"ValuePath": "data.GeoPoliticalEntityName"
}
}
],
"UseCase": "As a user I want to find objects by a province name."
}
]
},
"id": "{{data-partition-id}}:reference-data--IndexPropertyPathConfiguration:wks:master-data--Wellbore:1.",
"kind": "osdu:wks:reference-data--IndexPropertyPathConfiguration:1.0.0",
"version": 0
}
]
```
https://community.opengroup.org/osdu/platform/system/indexer-service/-/issues/155
Augmented Index - Use Case 3 (Wellbore Name) not working in M22 Preship Environment (2024-03-07, Norman Medina)
I was testing out the augmented index feature on the M22 Preship environment. I was trying to implement the use cases documented in this [tutorial](https://community.opengroup.org/osdu/platform/system/indexer-service/-/blob/master/docs/tutorial/IndexAugmenter.md#use_cases). Use cases 1 and 5 worked for me. Use case 3 did not, as the field `WellboreName` wasn't coming out of search after reindexing. I tried testing this out three times, but it still didn't work.
I used the snippet that was provided in the tutorial page and didn't modify anything in it.