AWS Replay Feature

Overview

This merge request implements the Replay API feature for the AWS provider in the OSDU Storage Service. The implementation follows the consolidated messaging approach using Amazon SNS and SQS, with DynamoDB for status tracking.

Key Components

  • ReplayServiceAWSImpl: AWS-specific implementation of the core ReplayService
  • ReplayRepositoryImpl: DynamoDB-based implementation for storing replay status
  • ReplayMessageHandler: Handles publishing replay messages to SNS
  • ReplaySubscriptionMessageHandler: Polls SQS for replay messages
  • ReplayMessageProcessorAWSImpl: Processes replay messages and updates status
  • ParallelReplayProcessor: Handles asynchronous and parallel processing of replay operations
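As an illustration of the batch-and-parallelism idea behind ParallelReplayProcessor, here is a minimal sketch using only the Java standard library. It is not the code in this merge request: the class `ParallelBatchSketch`, its methods, and the `replayed:` marker are hypothetical names chosen for the example, and the real per-batch work (publishing to SNS, updating DynamoDB) is replaced by a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; names are illustrative and not taken from the merge request.
class ParallelBatchSketch {

    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    // Placeholder for the real per-batch work (assumed: publish messages, update status).
    static List<String> processBatch(List<String> batch) {
        List<String> out = new ArrayList<>();
        for (String kind : batch) {
            out.add("replayed:" + kind);
        }
        return out;
    }

    // Runs one task per batch on a fixed-size pool and collects results in submission order.
    static List<String> processAll(List<String> kinds, int batchSize, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<CompletableFuture<List<String>>> futures = new ArrayList<>();
            for (List<String> batch : partition(kinds, batchSize)) {
                futures.add(CompletableFuture.supplyAsync(() -> processBatch(batch), pool));
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<List<String>> future : futures) {
                results.addAll(future.join()); // join in order, so output order is stable
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(List.of("kind-a", "kind-b", "kind-c"), 2, 2));
    }
}
```

In this sketch, batch size bounds the work per task and the pool size bounds how many batches run concurrently; the two knobs are independent.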

Features

  1. Consolidated Messaging: Single SNS topic for all replay operations with operation-specific attributes
  2. Per-Kind Status Tracking: Individual records in DynamoDB for each kind being replayed
  3. Asynchronous Processing: Non-blocking API with background processing
  4. Parallel Processing: Configurable batch size and parallelism level
  5. Robust Error Handling: Exponential backoff retries and dead-letter queue support
  6. Schema Service Integration: Retrieval of all kinds from the Schema service during a replay-all operation
  7. Resume-on-Failure: DynamoDB stores the current cursor after each batch retrieval, so any pod can resume processing from the last saved position if the original pod fails
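The resume-on-failure behaviour in item 7 can be sketched roughly as follows. This is an assumption-laden illustration, not the merge request's implementation: an in-memory map stands in for the DynamoDB status table, the cursor is a simple list index rather than a storage query cursor, and `ReplayCursorSketch`, `cursorStore`, and `replayKind` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names are illustrative and not taken from the merge request.
class ReplayCursorSketch {

    // Stand-in for the DynamoDB status table: kind -> index of the next unprocessed record.
    static final Map<String, Integer> cursorStore = new ConcurrentHashMap<>();

    // Processes up to maxBatches batches for one kind, starting at the saved cursor,
    // and checkpoints the cursor after every batch. Returns the cursor it stopped at.
    static int replayKind(String kind, List<String> records, int batchSize,
                          int maxBatches, List<String> sink) {
        int cursor = cursorStore.getOrDefault(kind, 0); // resume from last saved position
        int batches = 0;
        while (cursor < records.size() && batches < maxBatches) {
            int end = Math.min(cursor + batchSize, records.size());
            sink.addAll(records.subList(cursor, end)); // placeholder for publishing the batch
            cursor = end;
            cursorStore.put(kind, cursor); // checkpoint only after the batch is done
            batches++;
        }
        return cursor;
    }

    public static void main(String[] args) {
        List<String> records = List.of("r1", "r2", "r3", "r4", "r5");
        List<String> sink = new ArrayList<>();

        // "Pod A" handles one batch, then is assumed to fail.
        replayKind("kind-a", records, 2, 1, sink);

        // "Pod B" picks up from the saved cursor and finishes the remaining records.
        replayKind("kind-a", records, 2, Integer.MAX_VALUE, sink);

        System.out.println(sink);
    }
}
```

Because the checkpoint is written after a batch completes, a pod that dies mid-batch causes that batch to be processed again on resume. In this sketch that means at-least-once delivery per batch, so downstream consumers would need to tolerate duplicates.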

Testing

  • Added unit tests for all major components
  • Implemented integration tests for the API endpoints
  • Tested replay with TNO/Volve/Reference datasets
  • Tested with the stringToBool fix (see indexer-service!809, merged)

Related Issues

Related ADRs

Edited by Marc Burnie [AWS]
