ADR: Persisting / Querying Status messages
Decision Title
Persisting and querying status messages so that the status of a workflow that is either executing or was executed in the past can be queried and analyzed after the fact.
Status
- Proposed
- Trialing
- Under review
- Approved
- Retired
Overview
With ADR#80, only real-time status updates were possible, by subscribing to events. There is no mechanism to query or analyze the status of a workflow after the fact, whether it is still executing or was executed in the past. This ADR addresses that gap by persisting the status events and exposing a type-safe API on top of the persisted content.
Context & Scope
ADR#80 describes the Global Status Monitoring (GSM) framework, which provides a mechanism to track the status of data journeys/dataflows on the data platform. More details can be found in #80.
Data journey/Dataflows - A typical Dataflow can be expressed as shown
A dataflow could have millions of records spread across multiple datasets to ingest into the Data Platform. A consumer of this status model should have a way to determine:
- whether the dataflow has finished or not
- whether it was Successful or Failed
- if it failed, what the reason was
- what stage it is currently in

The most important aspect of ADR#80 was agreeing on the contract of the status message, to ensure that every OSDU service emitting a status message abides by the contract and that there is a standard data model. Please view the MR raised for the same.
This MR defines two types of GSM status messages.
- DataSet Details - A dataset pertains to any data, such as a file, a collection of files, etc. The DatasetId is the metadata record id returned in the response by the Metadata API when the metadata record is created. Users can use the DatasetId to find the correlation ids of the workflows initiated and track their progress using status detail messages.
- Status Details - Holds the status of the multiple stages in an initiated dataflow.

As decided in ADR#80, every OSDU service will publish its status to a message queue against a CorrelationId. Consumers can simply subscribe to that message queue or notification service to get status change events for that CorrelationId. All OSDU services already use the CorrelationId and propagate it in further REST calls. We leverage the CorrelationId to tie together all related status change notifications.
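The two message types and the role of the CorrelationId described above can be sketched roughly as follows. This is a minimal illustration, not the agreed contract: the class names, field names, and helper functions are hypothetical, chosen only from the identifiers this ADR mentions (DatasetId, CorrelationId, recordId, stage, status). The actual schema is the one defined in the MR.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DatasetDetails:
    """Ties a dataset to a workflow run (hypothetical field names)."""
    correlation_id: str
    dataset_id: str


@dataclass(frozen=True)
class StatusDetails:
    """Status of one stage of a dataflow (hypothetical field names)."""
    correlation_id: str
    record_id: str
    stage: str
    status: str


def correlation_ids_for_dataset(
    messages: List[DatasetDetails], dataset_id: str
) -> List[str]:
    """Return the distinct correlation ids seen for a dataset, in arrival order."""
    seen: List[str] = []
    for message in messages:
        if message.dataset_id == dataset_id and message.correlation_id not in seen:
            seen.append(message.correlation_id)
    return seen


def group_by_correlation_id(
    messages: List[StatusDetails],
) -> Dict[str, List[StatusDetails]]:
    """Group status messages so each workflow run can be followed on its own."""
    grouped: Dict[str, List[StatusDetails]] = defaultdict(list)
    for message in messages:
        grouped[message.correlation_id].append(message)
    return dict(grouped)
```

A consumer would first look up the correlation ids for a DatasetId and then read the grouped status details for each of them to follow the progress of that run.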
The same ADR also mentioned that a Status Data Processor service could be built to listen to these status change events and put them into a persistent store, making them accessible for future reference through querying capabilities.
Solution
This ADR proposes contributing two services to the OSDU community.
Status Collector Service
The Status Collector is an internal service (not exposed for external calls) that reacts to status messages published to the status message queue. It picks up the messages published to that queue from every stage (OSDU service) of the process and normalizes them before writing them to persistent storage for future reference.
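The collector behaviour described above (receive a message, normalize it, persist it) might look like the sketch below. Everything here is an assumption made for illustration: the JSON payload shape, the required fields, the normalization rules (trimming whitespace, upper-casing stage and status), and the `InMemoryStatusStore` stand-in for the persistent store. A real implementation would subscribe to the platform's message queue and write to a durable database.

```python
import json
from typing import Dict, List, Optional

# Assumed minimum set of fields a status message must carry (hypothetical).
REQUIRED_FIELDS = ("correlationId", "stage", "status")


def normalize(raw: str) -> Optional[Dict[str, str]]:
    """Parse a raw JSON message into a flat record, or return None if unusable."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    if any(not payload.get(field) for field in REQUIRED_FIELDS):
        return None
    return {
        "correlationId": str(payload["correlationId"]).strip(),
        "recordId": str(payload.get("recordId", "")).strip(),
        "stage": str(payload["stage"]).strip().upper(),
        "status": str(payload["status"]).strip().upper(),
    }


class InMemoryStatusStore:
    """Stand-in for the persistent store; keeps records in a list."""

    def __init__(self) -> None:
        self.records: List[Dict[str, str]] = []

    def save(self, record: Dict[str, str]) -> None:
        self.records.append(record)


class StatusCollector:
    """Handles one queue message at a time: normalize, then persist."""

    def __init__(self, store: InMemoryStatusStore) -> None:
        self.store = store
        self.rejected = 0  # count of messages that could not be normalized

    def on_message(self, raw: str) -> bool:
        record = normalize(raw)
        if record is None:
            self.rejected += 1
            return False
        self.store.save(record)
        return True
```

Counting rejected messages rather than raising keeps one malformed message from blocking the rest of the queue. Whether such messages should instead go to a dead-letter queue is a design choice this sketch leaves open.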
Status Processor Service
The Status Processor Service provides APIs that allow users to query persisted status with multiple filters such as correlationId, recordId, stage and status.
Status Processor endpoints
The Status Processor service will support the operations listed below via different endpoints:
- Query Dataset details
- Query Status
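To illustrate how a status query with optional filters could behave, here is a small sketch. The `StatusRecord` and `StatusQuery` types and the `query_status` function are hypothetical. They model only the four filters named in this ADR (correlationId, recordId, stage, status) and assume that unset filters match everything while set filters are combined with AND. The real endpoints, request shapes, and paging are not specified here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class StatusRecord:
    """One persisted status entry (hypothetical field names)."""
    correlation_id: str
    record_id: str
    stage: str
    status: str


@dataclass(frozen=True)
class StatusQuery:
    """Optional filters; a filter left as None matches every record."""
    correlation_id: Optional[str] = None
    record_id: Optional[str] = None
    stage: Optional[str] = None
    status: Optional[str] = None

    def matches(self, record: StatusRecord) -> bool:
        criteria = (
            (self.correlation_id, record.correlation_id),
            (self.record_id, record.record_id),
            (self.stage, record.stage),
            (self.status, record.status),
        )
        # Every filter that is set must equal the record's value (logical AND).
        return all(wanted is None or wanted == actual for wanted, actual in criteria)


def query_status(
    records: Iterable[StatusRecord], query: StatusQuery
) -> List[StatusRecord]:
    """Return the persisted records that satisfy all set filters, in stored order."""
    return [record for record in records if query.matches(record)]
```

For example, `query_status(records, StatusQuery(correlation_id="c-1", status="FAILED"))` would return only the failed entries of one workflow run. The status value `"FAILED"` is a placeholder, since the allowed values come from the status message contract.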
Decision
Implement services to persist and query GSM messages.
Rationale
At present, only real-time status updates are possible, by subscribing to events. There is no mechanism to query or analyze the status of a workflow after the fact, whether it is still executing or was executed in the past.