
Fix NullPointerException in indexer when processing duplicate schemas

Summary

This change resolves a potential NullPointerException (reported in #257, now closed) that can occur when the indexer processes multiple records with the same schema kind. The issue arises when duplicate IndexSchema objects are added to the schemas list during payload preparation.

Changes Made

  • Modified IndexerServiceImpl.getIndexerPayload(): Changed from List<IndexSchema> to LinkedHashSet<IndexSchema> to automatically deduplicate schemas while preserving insertion order
  • Added comprehensive test case: testGetIndexerPayload_ShouldDeduplicateSchemas() verifies that multiple records with the same kind result in only one unique schema in the payload
  • Updated imports: Added LinkedHashSet and Set imports

Technical Details

The root cause was that when multiple records shared the same kind (schema type), a duplicate IndexSchema object was added to the schemas list for each of them. Processing the same schema a second time downstream could then trigger a NullPointerException.

Before:

List<IndexSchema> schemas = new ArrayList<>();
// ... processing loop
schemas.add(schema); // Could add duplicates

After:

Set<IndexSchema> schemasSet = new LinkedHashSet<>();
// ... processing loop  
schemasSet.add(schema); // Automatically deduplicates
return RecordIndexerPayload.builder()
    .schemas(new ArrayList<>(schemasSet))
    .build();
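
The deduplication above relies on how LinkedHashSet compares elements: two objects collapse into one entry only if equals() and hashCode() treat them as equal. The following is a minimal sketch of that behavior, not the actual indexer code. SchemaStub is a hypothetical stand-in for IndexSchema, and it assumes equality is based on the kind field; whether the real IndexSchema defines value-based equality is not shown in this merge request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class SchemaDedupSketch {

    // Hypothetical stand-in for IndexSchema; the real class has more fields.
    static final class SchemaStub {
        final String kind;

        SchemaStub(String kind) {
            this.kind = kind;
        }

        // Assumption: equality is value-based on kind. Without equals/hashCode,
        // a LinkedHashSet would treat every new instance as distinct.
        @Override
        public boolean equals(Object o) {
            return o instanceof SchemaStub && Objects.equals(kind, ((SchemaStub) o).kind);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(kind);
        }
    }

    // Builds one stub per record kind and returns the distinct kinds
    // in first-seen order.
    static List<String> uniqueKinds(List<String> recordKinds) {
        Set<SchemaStub> schemas = new LinkedHashSet<>();
        for (String kind : recordKinds) {
            schemas.add(new SchemaStub(kind)); // add() is a no-op for an equal element
        }
        List<String> result = new ArrayList<>();
        for (SchemaStub schema : schemas) {
            result.add(schema.kind);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [kind-a, kind-b]
        System.out.println(uniqueKinds(Arrays.asList("kind-a", "kind-b", "kind-a")));
    }
}
```

Copying the set back into an ArrayList, as the change does, keeps the payload's list type unchanged while retaining the first-seen order.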

Test Coverage

Added a unit test that:

  • Creates multiple records with the same schema kind
  • Verifies that only one unique schema is included in the final payload
  • Confirms that all records are still processed correctly
  • Uses reflection to test the private getIndexerPayload() method
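
For readers unfamiliar with the reflection approach in the last bullet, the general pattern is sketched below. This is not the test from this merge request: ServiceStub and countDistinctKinds are hypothetical names used only to show how a private method can be looked up and invoked with java.lang.reflect.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class PrivateMethodReflectionSketch {

    // Hypothetical stand-in for a service with a private method under test.
    static class ServiceStub {
        private int countDistinctKinds(List<String> kinds) {
            return new LinkedHashSet<>(kinds).size();
        }
    }

    // Looks up the private method by name and parameter types, makes it
    // accessible, and invokes it on the given instance.
    static int invokeCountDistinctKinds(ServiceStub service, List<String> kinds) {
        try {
            Method method = ServiceStub.class.getDeclaredMethod("countDistinctKinds", List.class);
            method.setAccessible(true);
            return (Integer) method.invoke(service, kinds);
        } catch (ReflectiveOperationException e) {
            // Wrap checked reflection errors so callers do not need a throws clause.
            throw new IllegalStateException("Reflective call failed", e);
        }
    }

    public static void main(String[] args) {
        int distinct = invokeCountDistinctKinds(new ServiceStub(), Arrays.asList("a", "b", "a"));
        System.out.println(distinct); // prints 2
    }
}
```

A trade-off of this pattern is that the method name is a string, so renaming the private method breaks the test at runtime, not at compile time.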

Backward Compatibility

  • Fully backward compatible: no API changes
  • Maintains existing behavior for unique schemas
  • Only affects duplicate schema handling
  • Preserves schema insertion order

Potential Future Improvements

TypeMapper.getDataAttributeIndexerMapping mutates the object that is passed into it. This contributed to the problem when the same schema was processed a second time. The method should be changed so that it does not modify the original object. Doing so will require more extensive testing to ensure that other operations do not depend on the method's current behavior of mutating the IndexSchema object.
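
One possible shape for such a change is to work on a copy and return it. The sketch below only illustrates the idea and does not reflect the real TypeMapper logic: the map-based schema, the method names, and the "mapped:" prefix are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NonMutatingMapperSketch {

    // Mutating variant: rewrites the caller's map in place, so a second
    // call on the same object sees already-transformed values.
    static Map<String, Object> mapInPlace(Map<String, Object> dataSchema) {
        dataSchema.replaceAll((field, type) -> "mapped:" + type);
        return dataSchema;
    }

    // Non-mutating variant: transforms a shallow copy and leaves the
    // caller's map untouched, so repeated calls give the same result.
    static Map<String, Object> mapToCopy(Map<String, Object> dataSchema) {
        Map<String, Object> copy = new LinkedHashMap<>(dataSchema);
        copy.replaceAll((field, type) -> "mapped:" + type);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("name", "text");

        System.out.println(mapToCopy(schema)); // prints {name=mapped:text}
        System.out.println(schema);            // prints {name=text}

        mapInPlace(schema);
        mapInPlace(schema);
        System.out.println(schema);            // prints {name=mapped:mapped:text}
    }
}
```

A shallow copy is enough here because the values are replaced, not modified; nested mutable structures would need a deeper copy.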
