
Refactor queryRecordsInBatch to broadly support varying batch sizes

Type of change

  • Bug Fix
  • Feature

Please provide link to gitlab issue or ADR(Architecture Decision Record)
#577

Does this introduce a change in the core logic?

  • YES

Does this introduce a breaking change?

  • NO

What is the current behavior?

  • Dependency on batchSize being less than 1000 or an exact multiple of 1000.
  • Assumption that the cursor will not expire mid-batch.

What is the new/expected behavior?

  • batchSize may be any value greater than 0, and the batches will adjust accordingly.
  • In the event of cursor expiration, the query will retry, this time specifying an offset and hitting the regular non-cursor query URL.

Any other useful information

Batch Size - What is it?

  • batchSize dictates the number of records read from OSDU before they are turned over for processing by the Transformer
    • Lifecycle of a batch:
      1. Ingestion from OSDU
        • Ingestion happens through OSDU Search, which caps each query at a limit of 1000. So if batchSize is greater than 1000, we must sub-batch the ingestion queries until the total number of ingested records meets the batchSize (see the arithmetic sketch after this list)
        • The sub-batching is handled by Search#queryRecordsInBatch, while the larger batch lifecycle is captured by FeatureCacheSynchronizerHelper#synchronizeInBatch
      2. Process all records (conversion to GeoJSON, etc.)
      3. Load records into Ignite Cache
  • batchSize must be set at the Transformer level, but can optionally also be set at the per-kind level
    • If batchSize is set on a kind, it overrides the batchSize set at the Transformer level
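
For illustration, here is a minimal sketch of the sub-batching arithmetic. This is not the actual Transformer code; SubBatchSketch, subBatchLimits, and OSDU_SEARCH_LIMIT are hypothetical names used only to show how an arbitrary batchSize splits into query limits:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the sub-batching arithmetic, not the real code.
final class SubBatchSketch {
    static final int OSDU_SEARCH_LIMIT = 1000; // hard cap per OSDU Search query

    // Split an arbitrary batchSize (> 0) into query limits of at most 1000.
    static List<Integer> subBatchLimits(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be greater than 0");
        }
        List<Integer> limits = new ArrayList<>();
        for (int remaining = batchSize; remaining > 0; ) {
            int limit = Math.min(remaining, OSDU_SEARCH_LIMIT);
            limits.add(limit);
            remaining -= limit;
        }
        return limits;
    }

    public static void main(String[] args) {
        System.out.println(subBatchLimits(1005)); // [1000, 5]
        System.out.println(subBatchLimits(42));   // [42]
        System.out.println(subBatchLimits(3000)); // [1000, 1000, 1000]
    }
}
```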

Example:

  1. Configuration: batchSize is 1005
  2. Batch Lifecycle: FeatureCacheSynchronizerHelper#synchronizeInBatch will call getData() on a kind with the specified batchSize of 1005
    • Per-batch sub-batching: Search#queryRecordsInBatch will attempt to ingest 1005 records from OSDU, but must do so with a maximum limit of 1000 per query.
      1. First, make a query to retrieve 1000 records
      2. Use the resulting cursor to retrieve the next 5 records
      3. If the cursor has expired, query with an offset of 1000 and a limit of 5 to retrieve the remaining 5 records
  3. The batch has been collected, and is now processed in bulk
  4. The next batch lifecycle of 1005 records is started
  5. These batches continue until there are no more records to ingest (a high-level outline of this loop follows)
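
Here is a hedged outline of that loop. Aside from the queryRecordsInBatch and synchronizeInBatch names already mentioned in this MR, everything (BatchCollaborators, convertToGeoJson, loadIntoIgniteCache, the String record type) is a hypothetical stand-in:

```java
import java.util.List;

// Hypothetical collaborator interface; the real Transformer API differs.
interface BatchCollaborators {
    List<String> queryRecordsInBatch(String kind, long offset, int batchSize); // 1. ingest
    List<String> convertToGeoJson(List<String> records);                       // 2. process
    void loadIntoIgniteCache(List<String> features);                           // 3. load
}

final class LifecycleSketch {
    // Mirrors FeatureCacheSynchronizerHelper#synchronizeInBatch at a high level:
    // ingest up to batchSize records, process them in bulk, cache them, repeat.
    static void synchronizeInBatch(BatchCollaborators c, String kind, int batchSize) {
        long offset = 0;
        while (true) {
            List<String> batch = c.queryRecordsInBatch(kind, offset, batchSize);
            if (batch.isEmpty()) {
                break; // no more records to ingest
            }
            c.loadIntoIgniteCache(c.convertToGeoJson(batch));
            offset += batch.size();
        }
    }
}
```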

Cursor Expiration

  • During my testing, GLAB OSDU Search was very slow, taking over 40 seconds to query 1000 Wellbore records.
  • With a batchSize over 1000, I found the cursor would frequently expire, and our code had no fallback. Hence the update to fall back to a non-cursor query with an offset when the cursor has expired.
  • This is highly unusual, as we have not encountered cursor expiration within the normal 1000 limit before. Something may be off with the environment. However, it is fair to assume this will happen again, and in other environments, so it is best for the code to have a fallback (sketched below).
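
A minimal sketch of that fallback path, assuming hypothetical SearchClient, CursorExpiredException, queryWithCursor, and queryWithOffset names; the real client wraps OSDU Search HTTP calls, and only the control flow shown here is the point:

```java
import java.util.List;

// Hypothetical stand-ins for the real OSDU Search wrapper.
interface SearchClient {
    List<String> queryWithCursor(String cursor, int limit) throws CursorExpiredException;
    List<String> queryWithOffset(int offset, int limit); // regular non-cursor query URL
}

class CursorExpiredException extends Exception {}

final class CursorFallbackSketch {
    static List<String> fetchSubBatch(SearchClient client, String cursor,
                                      int recordsAlreadyIngested, int limit) {
        try {
            // Preferred path: continue paging with the cursor.
            return client.queryWithCursor(cursor, limit);
        } catch (CursorExpiredException e) {
            // Fallback: the cursor expired mid-batch, so retry against the
            // regular non-cursor query URL, skipping records already ingested.
            return client.queryWithOffset(recordsAlreadyIngested, limit);
        }
    }
}
```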
