Indexing records of the same kind with different casing in "kind" results in undesired behaviour.
Assume a new schema is created with the kind "osdu:wks:reference-data--VelocityAnalysisMethodX1:1.0.0".
Next, a storage record is PUT with its kind set to "OSDU:wks:reference-data--VelocityAnalysisMethodX1:1.0.0". Note the upper-case "OSDU".
When indexing, the following happens:
- The Indexer service sends a GET request to the Schema service for the kind "OSDU:wks:reference-data--VelocityAnalysisMethodX1:1.0.0". The schema is NOT found (because of the upper-case authority).
- The Indexer service still proceeds, as designed, and creates an Elasticsearch index named "osdu-wks-reference-data--velocityanalysismethodx1-1.0.0". In this index's mapping, the "authority" field is restricted to the constant value "OSDU" (upper case, taken from the storage record's "kind" field). The record is then indexed into this index with
"trace": ["schema not found"],
as the reason its data fields were not indexed.
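The flow above can be modelled in a few lines. This is a minimal, hypothetical Python sketch, not the Indexer's actual code: SCHEMAS, IndexDocument and index_record are illustrative names, and it assumes the schema lookup is an exact, case-sensitive match on the full kind string.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-in for the Schema service.
# Assumption: lookups are exact, case-sensitive matches on the full kind.
SCHEMAS = {
    "osdu:wks:reference-data--VelocityAnalysisMethodX1:1.0.0": {"Name": "string"},
}


@dataclass
class IndexDocument:
    kind: str
    authority: str
    data: dict = field(default_factory=dict)
    trace: list = field(default_factory=list)


def index_record(kind: str, data: dict) -> IndexDocument:
    """Illustrative model of the behaviour described above."""
    authority = kind.split(":")[0]   # first segment of the kind, case preserved
    schema = SCHEMAS.get(kind)       # exact match, so "OSDU:..." misses
    if schema is None:
        # The record is still indexed, but its data fields are dropped.
        return IndexDocument(kind, authority, trace=["schema not found"])
    indexed = {k: v for k, v in data.items() if k in schema}
    return IndexDocument(kind, authority, data=indexed)


doc = index_record("OSDU:wks:reference-data--VelocityAnalysisMethodX1:1.0.0",
                   {"Name": "Semblance"})
print(doc.authority, doc.data, doc.trace)  # OSDU {} ['schema not found']
```

The same call with the lower-case kind finds the schema and keeps the "Name" field.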
This causes two major issues:
- Legitimate records with the kind "osdu:wks:reference-data--VelocityAnalysisMethodX1:1.0.0" will not be indexed, because the Elasticsearch index for this kind already exists with the "authority" field restricted to the constant value "OSDU", so it cannot accept "osdu".
- Because the schema was not found the first time (as described above), the mapping was also created without any details about the data fields. As a result, the data fields of the storage record are NOT indexed, and hence these records won't be searchable.
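The first issue can be illustrated with a toy model of an index whose "authority" value is fixed when the index is created. This is a hypothetical sketch: ConstantAuthorityIndex is an illustrative name, and it assumes the mapping behaves like a constant field that rejects any document whose value differs (similar in spirit to Elasticsearch's constant_keyword type), compared case-sensitively.

```python
class ConstantAuthorityIndex:
    """Hypothetical index whose "authority" is a constant set at creation."""

    def __init__(self, name: str, authority: str):
        self.name = name
        self.authority = authority   # fixed from the first record's kind
        self.docs = []

    def index(self, record_id: str, authority: str) -> bool:
        # Case-sensitive comparison against the constant in the mapping.
        if authority != self.authority:
            return False             # rejected: value differs from the constant
        self.docs.append(record_id)
        return True


# Index created from the upper-case "OSDU:..." record.
idx = ConstantAuthorityIndex(
    "osdu-wks-reference-data--velocityanalysismethodx1-1.0.0", "OSDU")
print(idx.index("rec-1", "OSDU"))  # True
print(idx.index("rec-2", "osdu"))  # False: the legitimate record is rejected
```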
This happens because the Elasticsearch index name is derived from the kind string by converting it to lower case and replacing ":" with "-" (index name = lowercase(kind), with ":" replaced by "-"). Two records whose kinds are logically the same but differ in case therefore map to the same index and conflict during indexing.
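A minimal sketch of the naming rule, following only the rule stated above (the real Indexer may apply further transformations; kind_to_index_name is a hypothetical name). It shows that both kinds collapse to one index name while remaining different strings for a case-sensitive schema lookup.

```python
def kind_to_index_name(kind: str) -> str:
    """index name = lowercase(kind), with ":" replaced by "-"."""
    return kind.lower().replace(":", "-")


upper = "OSDU:wks:reference-data--VelocityAnalysisMethodX1:1.0.0"
lower = "osdu:wks:reference-data--VelocityAnalysisMethodX1:1.0.0"

print(kind_to_index_name(upper))
# osdu-wks-reference-data--velocityanalysismethodx1-1.0.0
print(kind_to_index_name(upper) == kind_to_index_name(lower))  # True: same index
print(upper == lower)  # False: different keys for an exact schema lookup
```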
To solve this, we need to design a strategy for handling differently cased meta attributes such as "kind" and "authority" consistently.
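One possible option, offered only as a hypothetical sketch and not a decided design, is to resolve an incoming kind against the registered schema kinds case-insensitively before any index is created, and to reject the record when there is no match or the match is ambiguous. canonicalize_kind and registered_kinds are illustrative names; rejecting mismatched casing at storage time would be an alternative.

```python
from typing import List, Optional


def canonicalize_kind(kind: str, registered_kinds: List[str]) -> Optional[str]:
    """Hypothetical: return the registered kind that matches `kind`
    case-insensitively, or None if there is no match or more than one."""
    matches = [k for k in registered_kinds if k.casefold() == kind.casefold()]
    return matches[0] if len(matches) == 1 else None


registered = ["osdu:wks:reference-data--VelocityAnalysisMethodX1:1.0.0"]
print(canonicalize_kind("OSDU:wks:reference-data--VelocityAnalysisMethodX1:1.0.0",
                        registered))
# osdu:wks:reference-data--VelocityAnalysisMethodX1:1.0.0
```

With this approach the upper-case record would be indexed under the registered kind (and its authority "osdu"), so the index mapping and the schema lookup agree.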