Draft: Address `esriOid` inaccuracies
Type of change
- Bug Fix
- Feature
Please provide link to gitlab issue or ADR(Architecture Decision Record)
#609
Does this introduce a change in the core logic?
- [YES/NO]
Does this introduce a change in the cloud provider implementation, if so which cloud?
- AWS
- Azure
- GCP
- IBM
Does this introduce a breaking change?
- [YES/NO]
What is the current behavior?
- Currently, the `esriOid` field begins tracking records at 2 and increments from there.
- During batch loads, the `esriOid` incrementation logic assumes that 100% of records in a batch were loaded, so if `batchSize` is 100 and only 50 records succeeded, the second batch will start `esriOid` at 101 instead of 51.
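As a minimal sketch of the buggy assumption (the function and variable names here are hypothetical, not the actual implementation):

```javascript
// Hypothetical sketch of the previous behavior: the starting esriOid
// for a batch was derived purely from batchIndex * batchSize, assuming
// every record in all prior batches was loaded successfully.
function oldStartingOid(batchIndex, batchSize) {
  // Batch 0 starts at 1, batch 1 at batchSize + 1, and so on,
  // regardless of how many records actually succeeded.
  return batchIndex * batchSize + 1;
}

// With batchSize = 100 and only 50 successes in batch 0,
// batch 1 still starts at 101 instead of 51.
```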
What is the new/expected behavior?
- During a batch, `esriOid` is only incremented after it has first been stored on the record.
- This resolves the issue of `esriOid` appearing to start at 2 instead of 1.
- At the start of every batch, after a feature set has been prepared, the starting `esriOid` for the batch is dynamically calculated from the cache's current size.
- This resolves the issue where the previous logic assumed the last batch was 100% successful, causing the starting `esriOid` for each batch to be a multiple of `batchSize` regardless of how many records failed ingestion.
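The new starting point can be sketched as follows (a simplified illustration under the assumption that every cached record holds exactly one `esriOid`; the function name is hypothetical):

```javascript
// Hypothetical sketch of the new behavior: the starting esriOid for a
// batch is derived from the cache's current size, so records that
// failed ingestion never create gaps across batches.
function newStartingOid(currentCacheSize) {
  return currentCacheSize + 1;
}

// If only 50 of 100 records in batch 0 succeeded, the cache holds
// 50 records and batch 1 correctly starts at 51.
```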
Have you added/updated Unit Tests and Integration Tests?
Any other useful information
Intentionally did not use `getMaxObjectId`, since it issues a `MAX(esriOid)` call on a field that is not indexed, making it slower than getting the total cache size and incrementing by 1. Since every record will have an `esriOid`, getting the cache size and adding 1 should be equivalent to calling `MAX(esriOid)`. We should consider reworking or removing the `getMaxObjectId()` function during incremental refresh, especially if the `esriOid` will be automatically resolved in the `putFeatures()` function. Or perhaps we should consider indexing `esriOid`.
Edited by Levi Remington