
ADR: Workflow Scheduling Endpoints

Architectural principles stay at a high, abstract level, which often leaves a gap between them and the actual implementation detail. This is an attempt to bridge that gap with a set of Lightweight Architecture Decision Records (LADRs) that are simple to follow and that the developers on a given team or project can implement.

Decision Title

Workflow Scheduling

Status

  • Proposed
  • Trialing
  • Under review
  • Approved
  • Retired

Context & Scope

While building out the External Data Services functionality, we found that Airflow’s scheduling of DAGs has two shortcomings:

  • You can only have one schedule per DAG.
  • Schedules on DAGs cannot change without modifying the code and re-deploying those DAGs.

Therefore, we need an OSDU service to leverage advanced scheduling capabilities, wrapping Airflow’s DAGs in a cloud provider’s scheduling technology. In the attached swagger doc, the following endpoints have been added to support scheduling:

  • Create Workflow Schedule
  • Get Workflow Schedules
  • Delete Workflow Schedule
  • List Workflow Schedules

The schedule will use cron formatting to ensure a common standard is followed.
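As a rough illustration of the cron requirement, the sketch below checks a few expressions against the same regular expression that the WorkflowSchedule model in this record applies to its cronSchedule field. The class and method names (CronPatternCheck, isValid) are hypothetical and exist only for this example; the expression itself is copied from the model, and the sample inputs are made up.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the service: it reuses the cron regex from
// the WorkflowSchedule model to show which strings that pattern accepts.
final class CronPatternCheck {
    private static final Pattern CRON = Pattern.compile(
        "((((\\d+,)+\\d+|(\\d+(\\/|-)\\d+)|\\d+|\\*|\\?) ?){5,7})");

    // Whole-string match; null is rejected here, mirroring @NotNull on the field.
    static boolean isValid(String expression) {
        return expression != null && CRON.matcher(expression).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("0 12 * * ?"));      // true: five fields
        System.out.println(isValid("0 0/15 * * * ?"));  // true: six fields, one with a step
        System.out.println(isValid("*/5 * * * *"));     // false: a step on '*' is not covered
        System.out.println(isValid("every day"));       // false: not cron syntax
    }
}
```

One observation from this sketch: the pattern as written does not accept a step on a wildcard such as */5, so callers relying on that form would see the "Invalid cron expression" message.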

This is a smaller part of the data workflow ADR: #69 (closed)

Decision

Add workflow scheduling capabilities to the ingestion-workflow service.

Rationale

  • Introduce more robust scheduling functionality for OSDU workflows
  • Keep scheduling in the same workflow service so that existing workflow code can be reused, which simplifies the implementation

Consequences

New endpoints and a new SPI for CSPs (cloud service providers) to implement. Here's the SPI:

public interface ISchedulerService {
  public void createWorkflowSchedule(WorkflowSchedule request);

  public GetWorkflowSchedulesResponse getWorkflowSchedules(GetWorkflowSchedulesRequest request);

  public ListWorkflowSchedulesResponse getAllWorkflowSchedules();

  public void deleteWorkflowSchedule(String scheduleId);
}

And this is what is in a WorkflowSchedule model class:

public class WorkflowSchedule {
  @NotNull(message = "Schedule name cannot be missing")
  private String name;

  @NotNull(message = "Schedule description cannot be missing")
  private String description;

  @NotNull(message = "Cron schedule cannot be missing")
  @Pattern(regexp = "((((\\d+,)+\\d+|(\\d+(\\/|-)\\d+)|\\d+|\\*|\\?) ?){5,7})", message="Invalid cron expression")
  private String cronSchedule;

  @NotNull(message = "Schedule dag name cannot be empty")
  private String dagName;

  @NotNull(message = "Schedule input parameters cannot be empty")
  private Map<String, Object> inputParameters;
}
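To make the intent concrete, here is a minimal in-memory sketch of how one DAG can carry several schedules, each with its own input. It is not the service's implementation: ScheduleSketch and InMemorySchedulerSketch are hypothetical stand-ins for WorkflowSchedule and for a CSP's ISchedulerService implementation, the DAG name and parameters are invented, and using the schedule name as the scheduleId is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, trimmed-down stand-in for the WorkflowSchedule model.
final class ScheduleSketch {
    final String name;
    final String cronSchedule;
    final String dagName;
    final Map<String, Object> inputParameters;

    ScheduleSketch(String name, String cronSchedule, String dagName,
                   Map<String, Object> inputParameters) {
        this.name = name;
        this.cronSchedule = cronSchedule;
        this.dagName = dagName;
        this.inputParameters = inputParameters;
    }
}

// Hypothetical in-memory store. A real CSP implementation would delegate to
// the provider's scheduling technology instead of a map.
final class InMemorySchedulerSketch {
    // Assumption: the schedule name doubles as the scheduleId.
    private final Map<String, ScheduleSketch> schedulesById = new LinkedHashMap<>();

    void createWorkflowSchedule(ScheduleSketch schedule) {
        if (schedulesById.putIfAbsent(schedule.name, schedule) != null) {
            throw new IllegalArgumentException("Schedule already exists: " + schedule.name);
        }
    }

    List<ScheduleSketch> getAllWorkflowSchedules() {
        return new ArrayList<>(schedulesById.values());
    }

    // Not part of the SPI: helper showing that one DAG can have many schedules.
    List<ScheduleSketch> schedulesForDag(String dagName) {
        List<ScheduleSketch> matches = new ArrayList<>();
        for (ScheduleSketch s : schedulesById.values()) {
            if (s.dagName.equals(dagName)) {
                matches.add(s);
            }
        }
        return matches;
    }

    void deleteWorkflowSchedule(String scheduleId) {
        schedulesById.remove(scheduleId);
    }

    public static void main(String[] args) {
        InMemorySchedulerSketch scheduler = new InMemorySchedulerSketch();
        // Two schedules target the same (made-up) DAG with different inputs.
        scheduler.createWorkflowSchedule(new ScheduleSketch(
            "nightly-wells", "0 2 * * ?", "eds_ingest", Map.of("source", "wells")));
        scheduler.createWorkflowSchedule(new ScheduleSketch(
            "hourly-logs", "0 0 * * * ?", "eds_ingest", Map.of("source", "logs")));
        System.out.println(scheduler.schedulesForDag("eds_ingest").size()); // 2
        scheduler.deleteWorkflowSchedule("hourly-logs");
        System.out.println(scheduler.getAllWorkflowSchedules().size());     // 1
    }
}
```

Keeping the schedule separate from the DAG definition is what lets a schedule be added or removed at runtime without redeploying the DAG, which is the shortcoming described in the context section.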

When to revisit


Tradeoff Analysis - Input to decision

EDS workflows need the ability to schedule DAG runs based on configuration. The same DAG will be used on multiple schedules, each with different input. This could exist as a totally separate service; however, the code will be cleaner if it is implemented in ingestion-workflow.

Alternatives and implications

Decision criteria and tradeoffs

Decision timeline

Decision ready to be made

Edited by Chad Leong