Records of a new kind can be unsearchable due to a race condition
If an Elasticsearch index (with a proper schema definition and mapping) is not created before records are ingested via Storage, Elasticsearch creates a default index mapping when Indexer processes the first records with that schema. The Search service does not work with this default mapping.
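To illustrate how a default mapping can differ from a schema-derived one, here is a minimal sketch in Python. The field name `WellName` and the choice of `keyword` for the explicit mapping are assumptions made for illustration and are not taken from the Indexer's actual schema handling. The default shown is what Elasticsearch dynamic mapping infers, with its default settings, for a field first seen as a JSON string. The helper `mapping_diff` is hypothetical.

```python
# Explicit mapping that a schema-aware index creation might send.
# "WellName" and its "keyword" type are hypothetical examples.
explicit_mapping = {
    "mappings": {
        "properties": {
            "WellName": {"type": "keyword"},
        }
    }
}

# What Elasticsearch dynamic mapping infers by default for the same field
# when it first sees a JSON string value.
default_mapping = {
    "mappings": {
        "properties": {
            "WellName": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
        }
    }
}


def mapping_diff(expected, actual):
    """Return the sorted field names whose definitions differ (hypothetical helper)."""
    exp = expected["mappings"]["properties"]
    act = actual["mappings"]["properties"]
    return sorted(name for name in exp.keys() | act.keys() if exp.get(name) != act.get(name))


if __name__ == "__main__":
    print("fields with mismatched mapping:", mapping_diff(explicit_mapping, default_mapping))
```

A query written against the explicit field definition may behave differently against the inferred one, which is one plausible reason the Search service cannot use an index built this way.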
Creating an index with the proper mapping and making a shard ready typically takes a few seconds. An issue has been observed when multiple Indexer service instances try to index a new kind at the same time: one instance tries to create the index, while another instance sees the index as already created and starts indexing with the default mapping. This makes the kind/entity unsearchable.
Simple (and common) scenario:
- An ingestion job is created that uses a new kind for the incoming records.
- The ingestion job starts using multiple threads.
- When the first indexer thread encounters the new kind on the incoming records, the index for that kind does not exist yet, so the thread starts creating it.
- During the few seconds that the first indexer thread spends creating the "real" index, the other threads process N records (likely 1.5 * number of seconds for index creation + # of threads) using the default mapping.
- The N records indexed using the default mapping are unusable.
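The scenario above can be reproduced with a toy, in-memory model. This is only a sketch under stated assumptions: `FakeCluster` is a hypothetical stand-in and not an Elasticsearch client, the index name is made up, and the model assumes the index becomes visible to other threads before its proper mapping is in place. Events are used instead of real delays so that the interleaving is deterministic. For simplicity the mapping in effect is recorded per record.

```python
import threading


class FakeCluster:
    """Toy in-memory stand-in for Elasticsearch (hypothetical, not a real client)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = {}  # index name -> "pending" or "explicit"
        self.docs = []    # (record id, mapping in effect when it was indexed)

    def exists(self, index):
        with self._lock:
            return index in self._state

    def create_index(self, index, visible, mapping_ready):
        with self._lock:
            self._state[index] = "pending"
        visible.set()         # other threads can now see the index
        mapping_ready.wait()  # stands in for the few seconds of index setup
        with self._lock:
            self._state[index] = "explicit"

    def index_doc(self, index, record_id):
        with self._lock:
            ready = self._state.get(index) == "explicit"
            self.docs.append((record_id, "explicit" if ready else "default"))


def simulate(other_workers=3, records_per_worker=2):
    cluster = FakeCluster()
    index = "hypothetical-new-kind"
    visible = threading.Event()
    mapping_ready = threading.Event()

    def first_worker():
        # Check-then-act: the first thread finds no index and creates it.
        if not cluster.exists(index):
            cluster.create_index(index, visible, mapping_ready)
        cluster.index_doc(index, "first-0")

    def other_worker(n):
        visible.wait()
        # The index already "exists", so this thread skips creation
        # and indexes straight away.
        if cluster.exists(index):
            for i in range(records_per_worker):
                cluster.index_doc(index, f"w{n}-{i}")

    first = threading.Thread(target=first_worker)
    others = [threading.Thread(target=other_worker, args=(n,)) for n in range(other_workers)]
    first.start()
    for t in others:
        t.start()
    for t in others:
        t.join()
    mapping_ready.set()  # index setup finishes only after the others are done
    first.join()
    return cluster.docs


if __name__ == "__main__":
    docs = simulate()
    bad = [rid for rid, used in docs if used == "default"]
    print(f"{len(bad)} of {len(docs)} records used the default mapping")
```

In this model every record handled by the other threads lands before the proper mapping is ready, and only the record indexed by the creating thread uses it.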
Edited by Gary Murphy