David Diederich requested to merge manual-pipelines into master Mar 01, 2021

What this MR does at a high level

This MR adds new stages to the pipeline, which "select" various cloud deployments early in the pipeline. Each cloud deployment can be independently included in the pipeline, and will not run until a maintainer selects that cloud deployment.

The default branches, release branches, and tags will always run all deployments. Trusted branches and committer branches, however, will both present cloud deployment options.

The committers that are merging the MRs to the default branches will need to be cognizant of what changes are present in the MR, and which integration test pipelines are necessary to run. They will also need to express that knowledge by choosing the appropriate clouds in the pipeline view -- pipelines will not run until instructed to.

Rationale for Merging

This is a small but significant change to the standard process of creating, testing, and approving MRs. It comes with costs and benefits, and for that reason I'd like to get feedback from all the major development teams as well as the PMC on whether and when to merge this in.

Some Benefits

Cuts down on pipeline jobs. By restricting the cloud deployments to only those that matter to the MR, we can reduce how much work is performed for each MR, which may reduce total pipeline time. Remember, the jobs were always run in parallel on different runner machines, but there were some shared resources (including community itself) that they competed over.
Reduces cross-MR interference. If one team is testing a new feature and rapidly deploying to their infrastructure with new commits, their pipelines will no longer run the deployments of other clouds. This can avoid situations where an unnecessary deployment interferes with work being done on a different MR.
Formally allows intuitions about what an MR affects. Committers have already bypassed failing pipelines when they believed the failure could not possibly have been caused by the MR. This formalizes that process, and extends it a step further by allowing them to skip the execution altogether when a cloud isn't affected by the change.

Some Costs

Last-minute process change. Getting the word out to all developers about the new process takes time, and there will be confusion immediately after the change about why pipelines aren't running like they used to. This confusion could end up costing time, at a moment when we have very little of it.
More possibility of errors. Running fewer tests runs the risk of not catching mistakes as quickly. If a change is thought to only affect one cloud, but actually affects all of them, this won't be known until the branch lands in the default branch. At that point, it is harder to debug. Because our integration test environments are shared, some test failures are normal -- this lengthens the time it takes to detect the problem, and may muddy the research into which MR caused the problem.
More complexity in the pipeline. This creates even more complexity in the CI/CD pipelines, which all committers will need to maintain and consider going forward.
Not fully tested. This logic was not tested in every service using real deployment logic. It was tested using similar pipeline structures. Some services may have unforeseen problems after merging the change.

How this works technically

Each of the major cloud-provider CI definitions now includes a new job named "select-${CLOUD}". The ruleset for each of those jobs sets it to one of three states:

when: never disables the job altogether. This happens when the cloud deployment isn't possible, usually because the pipeline isn't a protected one.
when: always enables the job and sets it to run automatically. This unblocks the cloud deployment chain without any user input. This happens on the major branches, where we always want stability.
when: manual causes GitLab to wait for user input before proceeding. A maintainer can open the pipeline and explicitly start this job, which enables the rest of the deployment and integration test jobs to execute. All trusted and committer branches follow this pattern.
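
The three states above can be summarized as a small decision function. This is only an illustrative sketch in Python, not the actual ruleset from the MR: the RefKind names, the `protected` flag, and the function name are assumptions made for the example, and the real logic lives in the `rules:` section of each select job.

```python
from enum import Enum


class RefKind(Enum):
    # Hypothetical labels for the kinds of refs described in this MR.
    DEFAULT_BRANCH = "default"
    RELEASE_BRANCH = "release"
    RELEASE_TAG = "tag"
    TRUSTED_BRANCH = "trusted"
    COMMITTER_BRANCH = "committer"
    CONTRIBUTOR_BRANCH = "contributor"


# Refs that should always deploy to every cloud.
ALWAYS_RUN = {RefKind.DEFAULT_BRANCH, RefKind.RELEASE_BRANCH, RefKind.RELEASE_TAG}
# Refs where a maintainer picks the clouds by hand.
MANUAL_RUN = {RefKind.TRUSTED_BRANCH, RefKind.COMMITTER_BRANCH}


def select_job_when(ref_kind: RefKind, protected: bool) -> str:
    """Return the `when:` value the select job would get (assumed model)."""
    if not protected:
        # Deployment is not possible on unprotected pipelines.
        return "never"
    if ref_kind in ALWAYS_RUN:
        return "always"
    if ref_kind in MANUAL_RUN:
        return "manual"
    return "never"
```

For example, under these assumptions `select_job_when(RefKind.TRUSTED_BRANCH, protected=True)` evaluates to "manual", while any unprotected pipeline evaluates to "never".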

The cloud selection jobs are marked as prerequisites to the initial stages for each deployment. As a result, the deployment jobs do not start until the corresponding selection job has run. The dependency is added to the existing one (compile-and-unit-test), so both prerequisites must complete before the deployments and integration testing commence.
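
The effect of the two prerequisites can be modelled in a few lines. This is a simplified sketch, not GitLab's scheduler: the job name "select-examplecloud" and the status strings are placeholders chosen for the illustration.

```python
def can_start(needs, statuses):
    """A job may start only when every prerequisite has succeeded.

    `needs` is the list of prerequisite job names; `statuses` maps job
    names to a status string. A manual job that nobody has started is
    modelled here with the status "manual".
    """
    return all(statuses.get(name) == "success" for name in needs)


# Hypothetical first deployment job for one cloud: it needs both the
# existing compile job and the new selection job.
deploy_needs = ["compile-and-unit-test", "select-examplecloud"]

# Compile finished, but no maintainer has selected the cloud yet.
waiting = {"compile-and-unit-test": "success", "select-examplecloud": "manual"}

# A maintainer started the selection job and it completed.
selected = {"compile-and-unit-test": "success", "select-examplecloud": "success"}
```

In this model `can_start(deploy_needs, waiting)` is False and `can_start(deploy_needs, selected)` is True, which matches the behaviour described above: compilation alone is no longer enough to trigger a deployment.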

Examples

To help illustrate the pipelines, I created a new project in my personal namespace that mirrors the structure of the OSDU pipelines without the content -- I replaced all script tags with simple echo statements. This should help describe the look and feel of the change.

Everybody in OSDU is granted access to this project -- feel free to create branches, MRs, etc. to play around. The project has fake CI variables and fake job scripts, so no real deployments should occur. Of course, if you modify the .gitlab-ci.yml file to include new pipelines, make sure the overrides are set appropriately to maintain this.

Default Branch (Pipeline), Release Branch (Pipeline), and Release Tags (Pipeline)

These pipelines automatically run all stages without prompting. The "select" jobs still appear so as to satisfy the dependency on the containerize steps, but they always run without prompting the user.

These jobs use the osdu-small runners, which are plentiful, and they do not clone the repository to execute. As a result, they should add very little overhead to the execution environment, and will complete long before the compile-and-unit-test job.

Contributor Branch In Progress (Pipeline)

While a contributor branch is in development -- that is, without an MR -- it will not generate the jobs that correspond to the integration tests. This includes the jobs that select the integration tests. This is not a change in any way for these branches.

Contributor Branches with a Trusted Branch (Pipeline, MR)

When a contributor branch has an MR and a corresponding trusted branch, the protected pipeline will show the selection jobs as manual execution. Each one can be independently started.

Note that the MR itself controls the parent pipeline, whose only job is to launch a child (trusted) pipeline, then mirror the pass/fail result. If some manual executions are left unrun, the pipeline will be mirrored as 'failed' -- because the parent job is explicitly looking for a fully successful or allowed-to-fail state. We could potentially update this logic in a future enhancement, but it shouldn't simply treat unexecuted pipelines as success -- there needs to be some logic to ensure that at least one pipeline was executed.
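
To make the difference concrete, here is a rough Python sketch of the mirroring rule as described, next to one possible shape for the future enhancement. Neither function is the actual parent-job script; the status strings and function names are invented for the illustration, and the second function is only one way the "at least one pipeline was executed" requirement could be expressed.

```python
# Hypothetical status labels: "success", "failed", "allowed_failure",
# and "manual" for a cloud whose selection job was never started.
PASSING = {"success", "allowed_failure"}


def mirror_current(cloud_results):
    """Model of the described behaviour: any unrun cloud fails the MR pipeline."""
    if all(r in PASSING for r in cloud_results.values()):
        return "success"
    return "failed"


def mirror_proposed(cloud_results):
    """Possible enhancement: ignore unrun clouds, but require at least one run."""
    executed = [r for r in cloud_results.values() if r != "manual"]
    if not executed:
        # Nothing was selected, so nothing was actually tested.
        return "failed"
    if all(r in PASSING for r in executed):
        return "success"
    return "failed"
```

With one cloud passed and another left unselected, `mirror_current` reports "failed" while `mirror_proposed` reports "success"; with every cloud left unselected, both report "failed".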

Contributor Branches with an outdated Trusted Branch (Pipeline, MR)

When a contributor branch's trust is out of date, because the original contributor pushed changes beyond what was trusted by a committer, the pipeline still fails on the first stage. The selection jobs, as well as the compilation jobs, all have implicit need statements -- they rely on all previous stages (columns) being successful.

Nobody can force a deployment to a cloud environment in this case.

Committer Branches In Progress (Pipeline)

While a committer branch is in development -- that is, without an MR -- it will generate the selection jobs for all clouds. The committer that is working on the branch can choose on each pipeline whether to run one, some, or all of the cloud integration tests against the commit.

These branches are designed for when a committer is working on the pipeline logic or the integration tests themselves, so it is expected that frequently the decision will be to test only one cloud. This gives the committer a way to rapidly test changes to the integration test, in the real GitLab environment, without creating rapid and unnecessary deployments to other clouds.

Committer Branches in MR (Pipeline, MR)

When a committer branch is submitted for MR, the no-detached-pipeline tag is used to prevent the trusted MR triggering. In these cases, the most recent branch pipeline is used for the MR status. So, this looks identical to committer branches in progress.

Manual pipelines for integration tests