Indexer not creating new index in Elasticsearch when new schema is added
We noticed that Elasticsearch indices are not created when a schema is registered. Instead, they are created when data of that kind is ingested for the first time. The index mappings are generated automatically from the ingested record, not from the schema. As a result, many attributes and data types are not indexed properly.
We would like to understand whether this is the intended behavior of the core code. So far it has been observed at least on AWS.
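To make the concern concrete, here is a minimal sketch in plain Python of why a mapping derived from the first ingested record can miss attributes that the schema defines. It is only an illustration and is neither the Indexer's code nor Elasticsearch's actual dynamic-mapping logic. The field list and the record are hypothetical, heavily trimmed stand-ins for the FileCollection schema and a payload.

```python
def field_paths(doc, prefix=""):
    """Return the dotted paths of all leaf fields present in a JSON-like dict."""
    paths = set()
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Descend into nested objects and collect their leaf fields.
            paths |= field_paths(value, prefix=f"{path}.")
        else:
            paths.add(path)
    return paths


# Hypothetical subset of the attributes a schema might define.
schema_fields = {
    "Data.Name",
    "Data.DatasetProperties.FileCollectionPath",
    "Data.DatasetProperties.FileSourceInfos",
}

# Hypothetical first record, ingested without FileSourceInfos.
first_record = {
    "Data": {
        "Name": "example",
        "DatasetProperties": {"FileCollectionPath": "example/path/"},
    }
}

# A mapping built only from the record can cover only the fields it contains.
mapped = field_paths(first_record)
unmapped = sorted(schema_fields - mapped)
print(unmapped)  # ['Data.DatasetProperties.FileSourceInfos']
```

Any schema attribute missing from the first record ends up in `unmapped`, which is the situation described for FileSourceInfos in the steps that follow.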
Steps to Reproduce:
- Create a new OSDU environment with sample data, excluding any “osdu:wks:dataset--FileCollection.Generic:1.0.0” data.
- Retrieve the FileCollection schema, which returns the schema structure: {{osdu_base_url}}/api/schema-service/v1/schema/osdu:wks:dataset--FileCollection.Generic:1.0.0
- Log in to the Elasticsearch container.
- Run curl to list the indices matching FileCollection: curl -u elastic: https://localhost:9200/_cat/indices -k | grep -i file
- No index exists for FileCollection at this point.
- Use the Dataset Service to add a FileCollection record that omits Data.DatasetProperties.FileSourceInfos.
- Log in to the Elasticsearch container again and search for the index with the command: curl -u elastic: https://localhost:9200/_cat/indices -k | grep -i file
- A new index now exists for FileCollection, created from the record payload rather than from the schema structure.
- The index has no mapping for Data.DatasetProperties.FileSourceInfos.
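The two checks in the steps above (does the index exist, and is the attribute mapped) could be scripted roughly as follows. This is a sketch under several assumptions: the first function expects the plain-text output of `_cat/indices` with the default column order (index name in the third column) and, unlike `grep -i file`, matches on the index name only; the second expects the JSON returned by Elasticsearch's `GET /<index>/_mapping` in its typeless form. The index names and sample responses are made up for illustration and are not taken from a real environment.

```python
def matching_indices(cat_output, needle):
    """Index names in `_cat/indices` text output that contain `needle` (case-insensitive)."""
    names = []
    for line in cat_output.splitlines():
        columns = line.split()
        # Assumes the default column order: health status index uuid ...
        if len(columns) >= 3 and needle.lower() in columns[2].lower():
            names.append(columns[2])
    return names


def has_mapping(mapping_response, index, dotted_path):
    """True if `dotted_path` is mapped in a typeless `GET /<index>/_mapping` response."""
    node = mapping_response[index]["mappings"]
    for part in dotted_path.split("."):
        properties = node.get("properties", {})
        if part not in properties:
            return False
        node = properties[part]
    return True


# Hypothetical index name and made-up sample responses.
INDEX = "osdu-wks-dataset--filecollection.generic-1.0.0"
cat_output = (
    "green open osdu-wks-master-data--well-1.0.0 aaaa 1 1 10 0 50kb 25kb\n"
    f"yellow open {INDEX} bbbb 1 1 1 0 8kb 8kb\n"
)
mapping_response = {
    INDEX: {
        "mappings": {
            "properties": {
                "Data": {
                    "properties": {
                        "DatasetProperties": {
                            "properties": {"FileCollectionPath": {"type": "text"}}
                        }
                    }
                }
            }
        }
    }
}

print(matching_indices(cat_output, "file"))
print(has_mapping(mapping_response, INDEX, "Data.DatasetProperties.FileCollectionPath"))
print(has_mapping(mapping_response, INDEX, "Data.DatasetProperties.FileSourceInfos"))
```

With these sample inputs the first call finds only the FileCollection index, and the last call returns False because FileSourceInfos was never part of the ingested payload.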
Here are some important questions:
- Should an index be created after a new schema is created?
- If not, how is the index created when a record is added, both when the schema is already present in the system and when it is not?
- What should happen to the index when the schema is updated?