CI/CD Architecture Improvement: Terraform Kubernetes Provider resources should be managed with separate apply operations
As outlined in the official Terraform documentation:
The most reliable way to configure the Kubernetes provider is to ensure that the cluster itself and the Kubernetes provider resources can be managed with separate apply operations. Data-sources can be used to convey values between the two stages as needed.
(source: https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs)
There should be two separate Azure DevOps Pipelines:
- one for Kubernetes (AKS) Cluster creation
- one for deploying Kubernetes provider resources (namespaces, Helm charts, etc.) on top of an already existing AKS cluster
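A minimal sketch of what the second stage could look like, assuming the cluster was already created by the first stage. The variable names `aks_name` and `resource_group_name` and the resource labels are hypothetical and not taken from the OSDU templates; the `azurerm_kubernetes_cluster` data source is what conveys the connection details between the two stages:

```hcl
# Stage 2 only: Kubernetes provider resources on top of an existing AKS cluster.
# Variable names and labels are illustrative assumptions.
provider "azurerm" {
  features {}
}

variable "aks_name" {
  type        = string
  description = "Name of the AKS cluster created by the first stage (hypothetical variable)"
}

variable "resource_group_name" {
  type        = string
  description = "Resource group that contains the AKS cluster (hypothetical variable)"
}

# Read the existing cluster instead of creating it in the same apply.
data "azurerm_kubernetes_cluster" "aks" {
  name                = var.aks_name
  resource_group_name = var.resource_group_name
}

# The provider is configured from values that are already known at plan time.
provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.aks.kube_config[0].host
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.aks.kube_config[0].client_certificate)
  client_key             = base64decode(data.azurerm_kubernetes_cluster.aks.kube_config[0].client_key)
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks.kube_config[0].cluster_ca_certificate)
}

resource "kubernetes_namespace" "osdu" {
  metadata {
    name = "osdu"
  }
}
```

With this layout, if the cluster is missing, the second stage should fail early on the data source lookup rather than attempting to reach a default endpoint, and the first stage can be re-run on its own to recreate the cluster.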
Today there is only one ADO Pipeline, and its terraform apply step creates the AKS cluster and the Kubernetes provider resources as a single Terraform action.
This approach is not in line with Terraform's internal architecture and leads to the Terraform dependency tree issues described in the following thread: https://github.com/hashicorp/terraform/issues/27601
The same issue occurs in the OSDU case after the AKS cluster is destroyed (e.g. via the Azure Portal user interface):
[...]
2021-06-09T13:52:05.6505397Z ╷
2021-06-09T13:52:05.6506490Z │ Error: Get "http://localhost/api/v1/namespaces/osdu": dial tcp [::1]:80: connect: connection refused
2021-06-09T13:52:05.6507392Z │
2021-06-09T13:52:05.6508006Z │ with kubernetes_namespace.osdu[0],
2021-06-09T13:52:05.6509247Z │ on config_map.tf line 24, in resource "kubernetes_namespace" "osdu":
2021-06-09T13:52:05.6510077Z │ 24: resource "kubernetes_namespace" "osdu" {
2021-06-09T13:52:05.6510672Z │
2021-06-09T13:52:05.6511157Z ╵
2021-06-09T13:52:05.6511666Z ╷
2021-06-09T13:52:05.6512555Z │ Error: Get "http://localhost/api/v1/namespaces/cert-manager": dial tcp [::1]:80: connect: connection refused
2021-06-09T13:52:05.6513455Z │
2021-06-09T13:52:05.6514047Z │ with kubernetes_namespace.certs,
2021-06-09T13:52:05.6514834Z │ on helm_certs.tf line 26, in resource "kubernetes_namespace" "certs":
2021-06-09T13:52:05.6515659Z │ 26: resource "kubernetes_namespace" "certs" {
[...]
As a result, Terraform is unable to restore its Desired State, which is a basic principle for an Infrastructure as Code platform like OSDU.
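For illustration, the single-apply pattern that tends to produce this kind of error looks roughly like the sketch below. This is an assumption about the general shape of the configuration, not a copy of the OSDU templates; all names and values (`example`, `example-aks`, `example-rg`, the VM size) are hypothetical. The `kubernetes` provider is configured from attributes of a cluster resource managed in the same apply. Once the cluster no longer exists, those attributes are unknown at plan time, and the provider appears to fall back to its default endpoint, which would explain the requests to `http://localhost` in the log:

```hcl
# Problematic pattern (illustrative only): cluster and Kubernetes resources in one apply.
provider "azurerm" {
  features {}
}

resource "azurerm_kubernetes_cluster" "example" {
  name                = "example-aks" # hypothetical
  location            = "westeurope"  # hypothetical
  resource_group_name = "example-rg"  # hypothetical
  dns_prefix          = "exampleaks"  # hypothetical

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_D2_v2"
  }

  identity {
    type = "SystemAssigned"
  }
}

# The provider depends on attributes of a resource from the same configuration.
# If the cluster has to be (re)created, these values are unknown during plan.
provider "kubernetes" {
  host                   = azurerm_kubernetes_cluster.example.kube_config[0].host
  client_certificate     = base64decode(azurerm_kubernetes_cluster.example.kube_config[0].client_certificate)
  client_key             = base64decode(azurerm_kubernetes_cluster.example.kube_config[0].client_key)
  cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.example.kube_config[0].cluster_ca_certificate)
}

resource "kubernetes_namespace" "osdu" {
  metadata {
    name = "osdu"
  }
}
```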
I suggest splitting the current terraform apply ADO Pipeline (pipeline-service-resources.yml) into two separate steps/pipelines, as suggested by HashiCorp in their official documentation, so that Terraform is always able to restore its Desired State when running terraform apply.
Today, explicit dependency rules such as:
- https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/infra/templates/osdu-r3-mvp/service_resources/helm_flux.tf#L34
- https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/infra/templates/osdu-r3-mvp/service_resources/helm_flux.tf#L51
- https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/infra/templates/osdu-r3-mvp/service_resources/helm_flux.tf#L108
- https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/infra/templates/osdu-r3-mvp/service_resources/helm_keda.tf#L31
- https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/infra/templates/osdu-r3-mvp/service_resources/helm_keda.tf#L41
have no effect and are completely ignored by Terraform when it is triggered on already initialized/existing infrastructure.