ADR: Merge Gate Policy for CSP Validation
Status
- Proposed
- Trialing
- Under review
- Approved
- Retired
Relates to
- ADR #49: Acceptance Testing for One CI/CD Pipeline
- ADR #50: Simplified Gradual Transition (closed)
- Epic #42: Venus CSP decoupling
Context
The Venus roadmap and Epic #42 point toward a CIMPL-only merge gate. Key prerequisites are in place or landing: ADR #50 decoupled the build, and ADR #49 standardizes acceptance tests. However, the merge-gate policy itself has not been formally decided.
In practice, the current "all jobs must pass" policy is not enforced. Maintainers routinely merge with red pipelines when CSP infrastructure failures block unrelated work.
The replacement gate, CIMPL acceptance tests, does not yet match the coverage and quality of the CSP integration tests currently used at merge time. A one-step cutover would retire that signal before its replacement is ready. The phased approach raises CIMPL acceptance-test quality to match or exceed the provider integration tests service by service, while AWS, Azure, and GCP remain visible in the trusted child pipeline as a reference signal during the transition.
IBM is a separate case. Its deploy failures cascade across services throughout the platform and produce noise rather than useful signal, so IBM is disconnected from the merge path immediately.
Decision
Adopt a phased transition to a CIMPL-only merge gate, with routing and advisory behavior defined per provider.
The policy has three governing rules:
- Each provider runs in exactly one pipeline for a given source.
- Advisory behavior is rule-scoped and limited to the MR-connected trusted child pipeline.
- Movement between phases requires explicit evidence and named approval.
Phase plan
Phase 1: hybrid routing
CIMPL becomes the voting merge gate. AWS, Azure, and GCP remain in the MR-connected trusted child pipeline as advisory reference signals during the transition. IBM moves immediately to a disconnected trusted-push pipeline because its current failure pattern does not provide reliable merge-time signal.
| Pipeline | Source | CIMPL | AWS / Azure / GCP | IBM |
|---|---|---|---|---|
| Feature / default / release / tag | push | voting | voting | voting |
| Trusted child (MR-connected) | pipeline | voting (merge gate) | advisory | skipped |
| Trusted push (disconnected) | push / web | skipped | skipped | runs |
Advisory behavior applies only to the MR-connected trusted child pipeline. Release and tag pipelines remain voting.
Measurement begins immediately through the OSDU Quality dashboard:
- per-service acceptance-test pass rates (weekly PMC review)
- per-provider rolling 30-day pass rates
- failure attribution between infrastructure and code
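The weekly per-service pass rate and the infrastructure-versus-code attribution can be computed with a few lines of aggregation. The sketch below assumes a hypothetical `JobResult` record. Its field names and its `"infrastructure"`/`"code"` cause labels are illustrative only, and the dashboard's actual data model may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class JobResult:
    """One acceptance-test job outcome (hypothetical record shape)."""
    service: str
    provider: str
    day: date
    passed: bool
    cause: Optional[str] = None  # for failures: "infrastructure" or "code" (hypothetical labels)


def weekly_pass_rates(results: Iterable[JobResult]) -> Dict[Tuple[str, int, int], float]:
    """Pass rate keyed by (service, ISO year, ISO week), as input to the weekly PMC review."""
    counts: Dict[Tuple[str, int, int], List[int]] = defaultdict(lambda: [0, 0])  # [passed, total]
    for r in results:
        iso_year, iso_week, _ = r.day.isocalendar()
        bucket = counts[(r.service, iso_year, iso_week)]
        bucket[0] += int(r.passed)
        bucket[1] += 1
    return {key: passed / total for key, (passed, total) in counts.items()}


def failure_attribution(results: Iterable[JobResult]) -> Dict[str, int]:
    """Count failures by attributed cause; unlabeled failures are grouped as 'unattributed'."""
    counts: Dict[str, int] = defaultdict(int)
    for r in results:
        if not r.passed:
            counts[r.cause or "unattributed"] += 1
    return dict(counts)
```

Keeping unlabeled failures in their own bucket makes gaps in attribution visible. It avoids silently counting those failures as code or infrastructure.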
Phase 2: validate
Run a per-service acceptance-test coverage analysis. Compare the scenarios covered by the standardized acceptance tests (#49) against the existing CSP integration tests. Document the gaps, then resolve each gap or accept its risk. Obtain PMC sign-off with CSP-representative participation.
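The coverage comparison reduces to a per-service set difference. The sketch below assumes each test suite can be summarized as a set of named scenarios; the scenario names used with it are hypothetical. Any gap whose risk has been explicitly accepted is treated as resolved. Under that reading, an empty result would correspond to "no unresolved gaps" in the Phase 2 to 3 gate.

```python
from typing import Dict, Iterable, List, Mapping


def unresolved_gaps(
    integration: Mapping[str, Iterable[str]],
    acceptance: Mapping[str, Iterable[str]],
    accepted_risk: Mapping[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """Per service: scenarios covered by CSP integration tests that the standardized
    acceptance tests do not cover and whose risk has not been accepted."""
    gaps: Dict[str, List[str]] = {}
    for service, scenarios in integration.items():
        missing = (
            set(scenarios)
            - set(acceptance.get(service, ()))
            - set(accepted_risk.get(service, ()))
        )
        if missing:
            gaps[service] = sorted(missing)
    return gaps
```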
Formalize CSP health thresholds, with an 80% rolling 30-day pass rate as the initial proposed threshold.
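A rolling 30-day pass rate checked against the proposed 80% threshold could be evaluated per provider as sketched below. The window is assumed to be the 30 calendar days ending on the evaluation date, inclusive. Treating a provider with no runs in the window as unhealthy is also an assumption, not a rule stated in this ADR.

```python
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple

PROPOSED_THRESHOLD = 0.80  # initial proposed threshold from this ADR
WINDOW_DAYS = 30


def rolling_pass_rate(runs: Iterable[Tuple[date, bool]], as_of: date,
                      window_days: int = WINDOW_DAYS) -> Optional[float]:
    """Pass rate over the window_days calendar days ending on as_of (inclusive).
    Returns None when no runs fall inside the window."""
    start = as_of - timedelta(days=window_days - 1)
    in_window = [passed for day, passed in runs if start <= day <= as_of]
    if not in_window:
        return None
    return sum(in_window) / len(in_window)


def is_healthy(runs: Iterable[Tuple[date, bool]], as_of: date,
               threshold: float = PROPOSED_THRESHOLD) -> bool:
    """Assumption: a provider with no runs in the window is not considered healthy."""
    rate = rolling_pass_rate(runs, as_of)
    return rate is not None and rate >= threshold
```

Each provider's runs would be passed in separately, so that one provider's failures do not affect another's health status.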
Phase 3: transition
CSP jobs are removed from the MR-connected pipeline entirely. Each CSP operates independent build, test, and deployment mechanisms. The community tracks regression rates attributable to the transition.
Phase gates
| Transition | Required evidence | Approver |
|---|---|---|
| Phase 1 to 2 | Stable pass-rate baseline established per provider; #49 acceptance tests landed per service | PMC review of pass-rate trends |
| Phase 2 to 3 | Per-service coverage analysis complete; no unresolved gaps | PMC + CSP sign-off |
| Phase 3 exit | Each CSP operates independent build, test, and deployment mechanisms; no regression attributable to the transition | PMC |
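The gate table can be read as data. A transition is allowed only when every listed piece of evidence and every named approval is present. In the sketch below, the evidence keys are hypothetical shorthand for the "Required evidence" column. A single `"CSP"` token stands in for CSP sign-off, which in practice may involve several providers.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple


@dataclass(frozen=True)
class Gate:
    required_evidence: FrozenSet[str]
    required_approvals: FrozenSet[str]


# Evidence keys are hypothetical shorthand for the table's "Required evidence" column.
GATES: Dict[Tuple[str, str], Gate] = {
    ("1", "2"): Gate(
        frozenset({"pass_rate_baseline_per_provider", "acceptance_tests_landed_per_service"}),
        frozenset({"PMC"}),
    ),
    ("2", "3"): Gate(
        frozenset({"coverage_analysis_complete", "no_unresolved_gaps"}),
        frozenset({"PMC", "CSP"}),
    ),
    ("3", "exit"): Gate(
        frozenset({"independent_csp_mechanisms", "no_transition_regression"}),
        frozenset({"PMC"}),
    ),
}


def may_advance(current: str, target: str, evidence: Set[str], approvals: Set[str]) -> bool:
    """True only for a defined transition with all evidence and approvals present."""
    gate = GATES.get((current, target))
    if gate is None:
        return False  # e.g. skipping from Phase 1 straight to Phase 3 is not a defined gate
    return gate.required_evidence <= evidence and gate.required_approvals <= approvals
```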
Mechanism
Two guard patterns, applied per job.
AWS / Azure / GCP and CIMPL, which stay on the MR-connected pipeline:

```yaml
rules:
  - if: '$CI_COMMIT_REF_NAME =~ /^trusted-/ && $CI_PIPELINE_SOURCE != "pipeline"'
    when: never            # skip on disconnected trusted-push
  - if: '$CI_COMMIT_REF_NAME =~ /^trusted-/ && <existing condition>'
    when: on_success
    allow_failure: true    # advisory ONLY in MR-connected trusted child (omit for CIMPL)
  - if: '<existing condition>'
    when: on_success       # voting elsewhere (master/release/tags)
```

IBM, which runs only on the disconnected trusted-push pipeline (Phase 1a):

```yaml
rules:
  - if: '$CI_PIPELINE_SOURCE == "pipeline" && $CI_COMMIT_REF_NAME =~ /^trusted-/'
    when: never            # skip on MR-connected pipeline
  - <existing conditions>  # run normally on trusted-push
```
CIMPL carries the same guard as healthy CSPs but remains voting (no allow_failure). It is the replacement merge gate.
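The two guard patterns can be cross-checked against the Phase 1 routing table by modeling them as a first-match function, the way GitLab evaluates `rules`. This is a sketch of the decision logic only, not GitLab's rule evaluator. The provider identifiers (`"aws"`, `"cimpl"`, `"ibm"`, ...) are hypothetical. `existing_condition` is a boolean stand-in for the `<existing condition>` placeholder.

```python
from enum import Enum


class Outcome(Enum):
    SKIPPED = "skipped"    # a rule resolves to `when: never`, or no rule matches
    ADVISORY = "advisory"  # allow_failure: true, so a failure does not block the merge
    VOTING = "voting"      # a failure fails the pipeline the job runs in


CSP_ADVISORY = {"aws", "azure", "gcp"}  # hypothetical identifiers for the advisory CSPs


def phase1_outcome(provider: str, ref_name: str, source: str,
                   existing_condition: bool = True) -> Outcome:
    """Mirror the Phase 1 guard patterns; the first matching rule wins."""
    trusted = ref_name.startswith("trusted-")  # equivalent to =~ /^trusted-/

    if provider == "ibm":
        if source == "pipeline" and trusted:
            return Outcome.SKIPPED  # never on the MR-connected trusted child
        # Runs normally elsewhere, including the disconnected trusted-push pipeline,
        # which is not attached to an MR and therefore cannot block a merge.
        return Outcome.VOTING if existing_condition else Outcome.SKIPPED

    # AWS / Azure / GCP and CIMPL share the same guard.
    if trusted and source != "pipeline":
        return Outcome.SKIPPED  # disconnected trusted-push pipeline
    if trusted and existing_condition:
        # MR-connected trusted child: CSPs are advisory, CIMPL stays voting.
        return Outcome.ADVISORY if provider in CSP_ADVISORY else Outcome.VOTING
    return Outcome.VOTING if existing_condition else Outcome.SKIPPED
```

For example, `phase1_outcome("aws", "trusted-feature-x", "pipeline")` gives `Outcome.ADVISORY`, while the same call for `"cimpl"` gives `Outcome.VOTING`. This matches the trusted child row of the table.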
Rollout requirements
Every `needs:` edge that crosses a provider boundary is a potential scope gap, and so is every legacy `only:`-based template gated on `$CSP == '1'`. A completeness check is required before filing an implementation MR:
- Grep every downstream job whose `needs:` points at a provider job being routed.
- Walk every `cloud-providers/*.yml` template that matches the routing scope, not only the files being actively edited.
- Apply the guard pattern consistently across all matches.
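Part of the audit can be mechanized. The sketch below works on an already-parsed CI YAML mapping, for example the result of loading each `cloud-providers/*.yml` file with a YAML parser such as PyYAML's `yaml.safe_load`. It flags two things: jobs that `needs:` a routed provider job but carry no `trusted-` guard, and jobs still gated through `only:` on `$CSP == '1'`. It does not resolve `extends:`, `include:`, or YAML anchors. The job names used with it are hypothetical, so it complements the manual walk rather than replacing it.

```python
import re
from typing import Any, Dict, Iterator, List, Set

TRUSTED_GUARD = re.compile(r"\$CI_COMMIT_REF_NAME\s*=~\s*/\^trusted-/")
LEGACY_CSP_GATE = re.compile(r"\$CSP\s*==\s*['\"]1['\"]")


def _strings(node: Any) -> Iterator[str]:
    """Yield every string nested anywhere inside a parsed YAML value."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from _strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from _strings(value)


def _needs(job: Dict[str, Any]) -> Set[str]:
    """Job names referenced by needs:, in either the string or {job: ...} form."""
    names: Set[str] = set()
    for entry in job.get("needs") or []:
        if isinstance(entry, str):
            names.add(entry)
        elif isinstance(entry, dict) and "job" in entry:
            names.add(entry["job"])
    return names


def _has_trusted_guard(job: Dict[str, Any]) -> bool:
    rules = job.get("rules") or []
    return any(isinstance(r, dict) and TRUSTED_GUARD.search(str(r.get("if", ""))) for r in rules)


def audit(jobs: Dict[str, Any], routed: Set[str]) -> List[str]:
    """Return sorted findings for unguarded needs: edges and legacy $CSP gates."""
    findings: List[str] = []
    for name, job in jobs.items():
        if not isinstance(job, dict):
            continue  # non-mapping top-level keys such as stages: are not jobs
        if _needs(job) & routed and not _has_trusted_guard(job):
            findings.append(f"{name}: needs a routed provider job but has no trusted- guard")
        if any(LEGACY_CSP_GATE.search(s) for s in _strings(job.get("only"))):
            findings.append(f"{name}: legacy only:-based template gated on $CSP == '1'")
    return sorted(findings)
```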
Review iteration alone is insufficient for this class of change. The audit is non-optional and must be repeated whenever routing or pipeline topology changes.
Consequences
Positive
- Eliminates CSP-driven merge blockages at Phase 1 while keeping AWS, Azure, and GCP visible as a reference signal during the transition.
- Raises CIMPL acceptance-test coverage and quality to match or exceed provider integration tests before retiring the old signal.
- Release and production pipelines still fail hard on CSP regressions, because advisory posture is rule-scoped to the MR-connected trusted child.
- Applies differentiated provider treatment based on signal quality rather than a uniform policy.
- Uses patterns already proven in production (CSP acceptance tests already run with `allow_failure: true`).
Trade-offs
- Slower than a one-step cutover.
- Requires active PMC review of pass-rate trends during Phase 1.
- Phase 2 coverage analysis requires coordination with CSP teams.
- Routing changes require a repeated completeness audit across `needs:` edges and provider templates.
Alternatives considered
- Status quo. All jobs must pass. Rejected: policy not enforced in practice.
- Immediate uniform cutover. Move all CSP jobs to a disconnected trusted-push pipeline in one step. Rejected because it retires the current merge-time signal before CIMPL acceptance tests reach comparable coverage and quality. The phased approach keeps provider integration tests visible as a reference signal during that transition.
- Uniform advisory for all CSPs (no IBM disconnect). Rejected: IBM's cascading deploy failures produce continuous yellow noise without adding decision value.
- Stage-based scoping (advisory for deploy/integration, leave acceptance alone). Rejected: breaks `needs:` dependency graphs because acceptance jobs depend on deploy jobs. The correct unit of routing is per-provider, not per-stage.