Avoid using query with cursor if possible
In M20, we created MR 601 to improve the performance of the Augmenter and reduce its use of query with cursor. After that MR, only two places (both fetch related children records) still use query with cursor.
However, query with cursor is expensive: most Elasticsearch deployments allow at most 500 queries with cursor per minute. We still use it because a normal query can return at most 10,000 records, and when fetching children records for a given set of parent records we cannot know in advance whether the result will exceed that limit.
During stress tests with large datasets, we saw many errors from queries with cursor when re-indexing 100k wellbores with 5M welllogs in total (50 welllogs per wellbore on average). Based on our knowledge of the Augmenter, the query results stay below 10,000 records in more than 99% of cases. We need an approach that guarantees correctness (no results missed) while avoiding these query errors.
The basic idea is that the Augmenter uses normal queries by default. If the totalCount in the query result reaches the limit (10,000), it automatically falls back to query with cursor.
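
A minimal sketch of this fallback, in Python for brevity. It is not the Augmenter's actual code: `fetch_children`, `normal_query`, and `cursor_query` are hypothetical names, and the response shape (`results`, `totalCount`, `cursor`) is an assumption made for illustration.

```python
from typing import Callable, Dict, List, Optional

# Maximum number of records a normal query can return (from the description above).
QUERY_LIMIT = 10_000


def fetch_children(
    query: str,
    normal_query: Callable[[str, int], Dict],            # hypothetical: (query, limit) -> response
    cursor_query: Callable[[str, Optional[str]], Dict],  # hypothetical: (query, cursor) -> response
    limit: int = QUERY_LIMIT,
) -> List[Dict]:
    """Return all records matching `query`, using a cursor only when needed."""
    first = normal_query(query, limit)

    # Common case: totalCount is below the limit, so the single
    # page already contains every matching record.
    if first["totalCount"] < limit:
        return first["results"]

    # totalCount reached the limit, so the page may be truncated.
    # Start over with query with cursor and read until it is exhausted.
    records: List[Dict] = []
    cursor: Optional[str] = None
    while True:
        page = cursor_query(query, cursor)
        records.extend(page["results"])
        cursor = page.get("cursor")
        if not cursor or not page["results"]:
            return records
```

With this shape, the cheap normal query handles the expected majority of lookups, and the rate-limited query with cursor is issued only for the rare parent sets whose children reach the limit.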