ADR: Workflow Scheduling Endpoints
We are often left to address the gap between architectural principles (which stay at a fairly high and abstract level) and the actual implementation details. Here is an attempt to bridge that gap by providing a set of Lightweight Architecture Decision Records (LADRs) that are simple to follow and can be implemented in a given team or project by its developers.
Decision Title
Workflow Scheduling
Status
- Proposed
- Trialing
- Under review
- Approved
- Retired
Context & Scope
While building out the External Data Services (EDS) functionality, we found that Airflow’s DAG scheduling has two shortcomings:
- You can only have one schedule per DAG.
- A DAG’s schedule cannot be changed without modifying its code and redeploying it.
Therefore, we need an OSDU service that provides more advanced scheduling by wrapping Airflow DAGs in a cloud provider’s scheduling technology. The attached Swagger document adds the following endpoints to support scheduling:
* Create Workflow Schedule
* Get Workflow Schedules
* Delete Workflow Schedule
* List Workflow Schedules
Schedules are expressed in cron format so that a common, widely understood standard is followed.
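For illustration, the sketch below splits a cron expression into its named fields. It assumes the common five-field layout (minute, hour, day of month, month, day of week) and, for six or seven fields, a Quartz-style layout with a leading seconds field and an optional trailing year. Which layout each CSP scheduler expects is an assumption, not something this ADR specifies, and the class name `CronFields` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that maps each cron field to a name.
// Assumes 5 fields = standard cron; 6 or 7 fields = Quartz-style (seconds first, optional year last).
class CronFields {
    private static final String[] STANDARD =
            {"minute", "hour", "dayOfMonth", "month", "dayOfWeek"};
    private static final String[] QUARTZ =
            {"second", "minute", "hour", "dayOfMonth", "month", "dayOfWeek", "year"};

    static Map<String, String> parse(String expression) {
        String[] parts = expression.trim().split("\\s+");
        String[] names;
        if (parts.length == 5) {
            names = STANDARD;
        } else if (parts.length == 6 || parts.length == 7) {
            names = QUARTZ; // the "year" entry is only filled when there are 7 parts
        } else {
            throw new IllegalArgumentException("Expected 5 to 7 cron fields but got " + parts.length);
        }
        // LinkedHashMap keeps the fields in expression order.
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i++) {
            fields.put(names[i], parts[i]);
        }
        return fields;
    }
}
```

Under the five-field reading, `CronFields.parse("0 6 * * 1-5")` yields minute `0`, hour `6` and day of week `1-5`, i.e. 06:00 Monday to Friday.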
This ADR is a smaller part of the data workflow ADR: #69 (closed).
Decision
Add workflow scheduling capabilities to the ingestion-workflow service.
Rationale
- Introduce more robust scheduling functionality for OSDU workflows
- Keep scheduling in the existing workflow service, simplifying the code by reusing existing workflow code
Consequences
This decision introduces new endpoints and a new service provider interface (SPI) that each cloud service provider (CSP) implements:
```java
public interface ISchedulerService {
    // Create a new cron-based schedule for a DAG.
    void createWorkflowSchedule(WorkflowSchedule request);
    // Retrieve the schedules matching the request.
    GetWorkflowSchedulesResponse getWorkflowSchedules(GetWorkflowSchedulesRequest request);
    // List every schedule.
    ListWorkflowSchedulesResponse getAllWorkflowSchedules();
    // Delete the schedule with the given ID.
    void deleteWorkflowSchedule(String scheduleId);
}
```
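A CSP implementation maps these four calls onto its own scheduler. As a sketch of the SPI contract only, here is an in-memory implementation that might serve for local testing; it stores schedules but never triggers DAG runs. The shapes of `GetWorkflowSchedulesRequest` and the two response classes, and the rule that a schedule’s name doubles as its ID, are assumptions made for this sketch, and `WorkflowSchedule` is reduced to simplified stand-in fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-ins for the SPI's model types.
class WorkflowSchedule {
    final String name;
    final String cronSchedule;
    final String dagName;
    final Map<String, Object> inputParameters;

    WorkflowSchedule(String name, String cronSchedule, String dagName, Map<String, Object> inputParameters) {
        this.name = name;
        this.cronSchedule = cronSchedule;
        this.dagName = dagName;
        this.inputParameters = inputParameters;
    }
}

class GetWorkflowSchedulesRequest {
    final List<String> scheduleNames; // assumption: lookup is by schedule name
    GetWorkflowSchedulesRequest(List<String> scheduleNames) { this.scheduleNames = scheduleNames; }
}

class GetWorkflowSchedulesResponse {
    final List<WorkflowSchedule> schedules;
    GetWorkflowSchedulesResponse(List<WorkflowSchedule> schedules) { this.schedules = schedules; }
}

class ListWorkflowSchedulesResponse {
    final List<WorkflowSchedule> schedules;
    ListWorkflowSchedulesResponse(List<WorkflowSchedule> schedules) { this.schedules = schedules; }
}

// The SPI as described in this ADR.
interface ISchedulerService {
    void createWorkflowSchedule(WorkflowSchedule request);
    GetWorkflowSchedulesResponse getWorkflowSchedules(GetWorkflowSchedulesRequest request);
    ListWorkflowSchedulesResponse getAllWorkflowSchedules();
    void deleteWorkflowSchedule(String scheduleId);
}

// In-memory sketch: stores schedules keyed by name (assumed to be the schedule ID)
// and never triggers DAG runs. A real CSP implementation would instead register
// the cron expression with the provider's scheduler.
class InMemorySchedulerService implements ISchedulerService {
    private final Map<String, WorkflowSchedule> schedules = new ConcurrentHashMap<>();

    @Override
    public void createWorkflowSchedule(WorkflowSchedule request) {
        if (schedules.putIfAbsent(request.name, request) != null) {
            throw new IllegalStateException("Schedule already exists: " + request.name);
        }
    }

    @Override
    public GetWorkflowSchedulesResponse getWorkflowSchedules(GetWorkflowSchedulesRequest request) {
        List<WorkflowSchedule> found = new ArrayList<>();
        for (String name : request.scheduleNames) {
            WorkflowSchedule schedule = schedules.get(name);
            if (schedule != null) {
                found.add(schedule);
            }
        }
        return new GetWorkflowSchedulesResponse(found);
    }

    @Override
    public ListWorkflowSchedulesResponse getAllWorkflowSchedules() {
        return new ListWorkflowSchedulesResponse(new ArrayList<>(schedules.values()));
    }

    @Override
    public void deleteWorkflowSchedule(String scheduleId) {
        schedules.remove(scheduleId);
    }
}
```

Rejecting duplicate names and silently skipping unknown names in `getWorkflowSchedules` are choices made for this sketch; the ADR does not specify either behavior.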
The WorkflowSchedule model class is defined as follows:
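```java
import java.util.Map;
import javax.validation.constraints.NotNull;
import javax.validation.constraints.Pattern;

public class WorkflowSchedule {
    @NotNull(message = "Schedule name cannot be missing")
    private String name;
    @NotNull(message = "Schedule description cannot be missing")
    private String description;
    // 5 to 7 fields of numbers, lists (1,2), ranges (1-5), steps (0/15), '*' or '?'.
    // Spaces between fields are optional; '*/15'-style steps and names such as MON do not match.
    @NotNull(message = "Cron schedule cannot be missing")
    @Pattern(regexp = "((((\\d+,)+\\d+|(\\d+(\\/|-)\\d+)|\\d+|\\*|\\?) ?){5,7})", message = "Invalid cron expression")
    private String cronSchedule;
    // Name of the Airflow DAG that the schedule triggers.
    @NotNull(message = "Schedule dag name cannot be missing")
    private String dagName;
    // Input parameters passed to each scheduled DAG run.
    @NotNull(message = "Schedule input parameters cannot be missing")
    private Map<String, Object> inputParameters;
}
```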
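The `@Pattern` regular expression on `cronSchedule` is permissive in some ways and strict in others. Because the separating space is optional, a run of digits such as `12345` matches as five fields, while the common step form `*/15` does not match because a step must start with a number. The sketch below, using only `java.util.regex`, demonstrates both cases and offers a hypothetical stricter pattern (`STRICT`, in an illustrative `CronPatternCheck` class) that requires single-space separators and allows `*/n` steps. Since Bean Validation’s `@Pattern` requires the entire value to match, the sketch uses `Matcher.matches()`. Neither pattern checks value ranges (for example, minute 75) or named values such as `MON`, so full validation would need a dedicated cron parser.

```java
import java.util.regex.Pattern;

// Compares the cronSchedule pattern from WorkflowSchedule with a hypothetical stricter variant.
class CronPatternCheck {
    // Copied from the @Pattern annotation on WorkflowSchedule.cronSchedule.
    static final Pattern ADR_PATTERN =
            Pattern.compile("((((\\d+,)+\\d+|(\\d+(\\/|-)\\d+)|\\d+|\\*|\\?) ?){5,7})");

    // One field: '*', '?', or numbers with optional ranges and comma lists, then an optional '/step'.
    private static final String FIELD = "(\\*|\\?|\\d+(-\\d+)?(,\\d+(-\\d+)?)*)(/\\d+)?";

    // Hypothetical stricter pattern: 5 to 7 fields separated by exactly one space.
    static final Pattern STRICT = Pattern.compile(FIELD + "( " + FIELD + "){4,6}");

    // matches() requires the whole string to match, like @Pattern.
    static boolean adrAccepts(String cron) {
        return ADR_PATTERN.matcher(cron).matches();
    }

    static boolean strictAccepts(String cron) {
        return STRICT.matcher(cron).matches();
    }
}
```

For example, `adrAccepts("*/15 * * * *")` is false and `adrAccepts("12345")` is true, while `strictAccepts` gives the opposite result for both inputs.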
When to revisit
Tradeoff Analysis - Input to decision
EDS workflows need the ability to schedule DAG runs based on configuration. The same DAG will run on several different schedules, each with different input. Scheduling could exist as an entirely separate service; however, the code will be cleaner if it is implemented in ingestion-workflow.
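To make the one-DAG, many-schedules case concrete, the sketch below models two hypothetical schedules that trigger the same DAG with different inputs and groups them by DAG name. This is the relationship that Airflow’s single per-DAG schedule cannot express. `ScheduleEntry`, the DAG name `eds_fetch`, and the `source` parameter key are illustrative assumptions, not part of the ADR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified schedule entry used only for this illustration.
class ScheduleEntry {
    final String name;
    final String cronSchedule;
    final String dagName;
    final Map<String, Object> inputParameters;

    ScheduleEntry(String name, String cronSchedule, String dagName, Map<String, Object> inputParameters) {
        this.name = name;
        this.cronSchedule = cronSchedule;
        this.dagName = dagName;
        this.inputParameters = inputParameters;
    }
}

class SchedulesByDag {
    // Two schedules for the same hypothetical DAG, each with its own cron and input.
    static List<ScheduleEntry> example() {
        return Arrays.asList(
                new ScheduleEntry("vendor-a-daily", "0 2 * * *", "eds_fetch",
                        Collections.<String, Object>singletonMap("source", "vendor-a")),
                new ScheduleEntry("vendor-b-hourly", "0 * * * *", "eds_fetch",
                        Collections.<String, Object>singletonMap("source", "vendor-b")));
    }

    // Groups schedules by the DAG they trigger, preserving input order.
    static Map<String, List<ScheduleEntry>> groupByDag(List<ScheduleEntry> schedules) {
        Map<String, List<ScheduleEntry>> byDag = new LinkedHashMap<>();
        for (ScheduleEntry s : schedules) {
            byDag.computeIfAbsent(s.dagName, k -> new ArrayList<>()).add(s);
        }
        return byDag;
    }
}
```

Grouping `SchedulesByDag.example()` produces a single key, `eds_fetch`, holding both schedules with their distinct cron expressions and inputs.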
Alternatives and implications
Decision criteria and tradeoffs
Decision timeline
Decision ready to be made