infra-azure-provisioning issues — https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues

# Issue #201: Concurrent runs dags dashboard not reflecting correct information on graph
*2021-09-06 — Bhakti Thakkar*

4 DAGs are running concurrently. Attached is a screenshot of the concurrently running DAGs.
![image](/uploads/6af01530b1f3ac0a482a764a7b65ca3a/image.png)
The graph shows the number of concurrent DAGs as 0.
![image](/uploads/93c26eabf7da46ed68a6530fe5b793bf/image.png)

# Issue #183: Create new pipeline for deploying airflow in data partitions
*2022-08-23 — Vineeth Guna [Microsoft]*

We need to automate the deployment of Airflow in data partitions to enable multi-partition support for Airflow. To accomplish this, we need to create an ADO pipeline that customers can use to automate the deployment.

# Issue #322: [critical?] Service Schema Loading Fails with `InaccessibleObjectException`
*2023-09-25 — Paweł Grudzień*

**Description:**
When executing the Service Schema Loading step from the Load Service Data section during OSDU installation, the provided Docker setup repeatedly returns "Internal server error". These errors are preventing the successful addition of schemas to OSDU.
**Error Details:**
Received multiple `500 Internal Server Error` responses for different schemas including:
```
Error with kind osdu:wks:AbstractAccessControlList:1.0.0: Message: Internal server error
Error with kind osdu:wks:AbstractActivityParameter:1.1.0: Message: Internal server error
Error with kind osdu:wks:AbstractActivityParameter:1.0.0: Message: Internal server error
Error with kind osdu:wks:AbstractActivityState:1.0.0: Message: Internal server error
Error with kind osdu:wks:AbstractAliasNames:1.0.0: Message: Internal server error
Error with kind osdu:wks:AbstractAnyCrsFeatureCollection:1.1.0: Message: Internal server error
Error with kind osdu:wks:AbstractAnyCrsFeatureCollection:1.0.0: Message: Internal server error
Error with kind osdu:wks:AbstractCoordinates:1.0.0: Message: Internal server error
...
Error with kind osdu:wks:work-product--WorkProduct:1.0.0: Message: Internal server error
Error with kind osdu:wks:reference-data--WorkflowPersonaType:1.0.1: Message: Internal server error
Error with kind osdu:wks:reference-data--WorkflowPersonaType:1.0.0: Message: Internal server error
Error with kind osdu:wks:reference-data--WorkflowUsageType:1.0.1: Message: Internal server error
Error with kind osdu:wks:reference-data--WorkflowUsageType:1.0.0: Message: Internal server error
```
Each results in an error like the following:
```
Error with kind osdu:wks:master-data--WellboreOpening:1.0.0: Message: Internal server error
Try PUT for id: osdu:wks:reference-data--WellboreOpeningStateType:1.0.0
{"error":{"code":500,"message":"Internal server error","errors":[{"domain":"global","reason":"internalError","message":"Internal server error"}]}}
https://osdu-pl2-srpl2-k8q2-istio-gw.centralus.cloudapp.azure.com/api/schema-service/v1/schemas/system
500
```
Logs for the schema service ([schama.zip](/uploads/ba19c442c8b488ecb22d79deaa48b1aa/schama.zip)) report the following exception:
```
java.lang.reflect.InaccessibleObjectException: Unable to make field private static final long java.util.ArrayList.serialVersionUID accessible: module java.base does not "opens java.util" to unnamed module @61e86192
```
(full logs in the attachment)
As a result, a full installation of OSDU is impossible.
**Expected Behavior:**
Schemas should be successfully added to OSDU without any errors.
**Actual Behavior:**
Repeated "Internal server error" prevents the addition of schemas.
**Steps to Reproduce:**
1. Proceed to the Service Schema Loading step from the Load Service Data section of OSDU installation instructions.
2. Execute the provided commands.
3. Observe the repeated "Internal server error" and check logs for details.
**Suggested Fix:**
Research suggests that the Java runtime environment might be causing the `InaccessibleObjectException` due to module restrictions in more recent Java versions. Consider revisiting the implementation to ensure compatibility with the Java version in use, or adjust the runtime environment to a version that doesn't enforce these module boundaries. I'm not sure whether this service changed its Java version, but this may be something to consider.
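A common workaround for this class of error, sketched here without having verified it against this service's deployment, is to open the affected package to unnamed modules via JVM options. How the option is injected (env var, Dockerfile, helm chart) and the jar name below are assumptions:

```shell
# Workaround sketch: allow reflective access to java.util from unnamed modules.
# The JAVA_TOOL_OPTIONS mechanism is one way to inject it without changing the
# launch command; the deployment details are assumptions, not verified here.
export JAVA_TOOL_OPTIONS="--add-opens java.base/java.util=ALL-UNNAMED"

# Or pass the flag directly when launching the service (jar name hypothetical):
# java --add-opens java.base/java.util=ALL-UNNAMED -jar schema-service.jar
```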
**Environment:**
- OSDU version: 0.23
EDIT: added some formatting and got a spam update error

# Issue #153: Dataset Service Onboarding
*2022-08-23 — Vivek Ojha*

**Service name**: `Dataset`
The following steps must be completed for a service to onboard with OSDU on Azure. Additionally, please add the `Service Onboarding` tag to this issue when it is created.
For more information, visit our service onboarding documentation [here](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/docs/service-onboarding.md).
## Steps:
**Infrastructure and Initial Requirements**
- [ ] Add any additional Azure cloud infrastructure (Cosmos containers, Storage containers, fileshares, etc.) to the Terraform template. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/tree/master/infra/templates/osdu-r3-mvp). Note that if the infrastructure is a part of the data-partition template, you may need to add secrets to the keyvault that are partition specific; if doing so, update the createPartition REST request to include the keys that you have added so they are accessible in service code. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/tools/rest/partition.http#L48)
- [ ] Create an ingress point for the service. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/charts/osdu-common/templates/appgw-ingress.yaml)
- [ ] Add any test data that is required for the service integration tests. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/tree/master/tools/test_data)
- [ ] Update `upload-data.py` to upload any new test data files you created. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/tools/test_data/upload-data.py).
- [ ] Update the integration tester with any entitlements required to test the service. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/tools/test_data/user_info_1.json)
- [ ] Add in any new secrets that the service needs to run. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/charts/osdu-common/templates/kv-secrets.yaml)
- [ ] Create environment variable script to generate .yaml files to be used with Intellij [EnvFile](https://plugins.jetbrains.com/plugin/7861-envfile) plugin and .envrc files to be used with [direnv](https://direnv.net/). [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/tree/master/tools/variables)
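For illustration, a generated `.envrc` for direnv might look roughly like this. Every variable name and value below is a placeholder, not output of the actual script:

```shell
# Illustrative .envrc fragment for direnv; all names and values are
# placeholders patterned after typical Azure service-principal settings.
export ARM_TENANT_ID="00000000-0000-0000-0000-000000000000"
export ARM_CLIENT_ID="00000000-0000-0000-0000-000000000000"
export ARM_CLIENT_SECRET="<service-principal-secret>"
export OSDU_HOST="https://<your-osdu-endpoint>"
```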
**Gitlab Code and Documentation**
- [ ] Complete the service code such that it passes all integration tests locally. There is some documentation on starting off implementing an Azure provider. [Link](./gitlab-service-readme-template.md)
- [ ] Create helm charts for service. The charts for each service are located in the `devops/azure` directory. You can look at charts from other services as a model. The charts will be nearly identical except for the different environment variables, values, etc each service needs to run. [Link](./gitlab-service-guide.md)
- [ ] Implement Istio for the service if this has not already been done. Here is an example MR that shows what steps are required. [Link](https://community.opengroup.org/osdu/platform/system/storage/-/merge_requests/64)
- [ ] Create an Istio auth policy in the `devops/azure/chart/templates` directory. Here is an example of an Istio auth policy that is generic and can be used by other services. [Link](https://community.opengroup.org/osdu/platform/system/storage/-/blob/master/devops/azure/chart/templates/azure-istio-auth-policy.yaml)
- [ ] Add any variables that are required for the service integration tests to the Azure CI-CD file. [Link](https://community.opengroup.org/osdu/platform/ci-cd-pipelines/-/blob/master/cloud-providers/azure.yml)
- [ ] Verify that the README for the Azure provider correctly and clearly describes how to run and test the service. There is a README template to help. [Link](./gitlab-service-readme-template.md)
- [ ] Push any changes and verify that the Gitlab pipeline is passing in master.
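For the Istio auth policy item above, the generic policy linked from the storage service follows roughly this shape. This is a sketch only; the kind, names, namespace, and issuer values are placeholders to adapt per service, not the actual chart contents:

```yaml
# Sketch of a generic Istio JWT policy; all names and values are placeholders.
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: dataset-jwt-authn
  namespace: osdu
spec:
  selector:
    matchLabels:
      app: dataset
  jwtRules:
    - issuer: "https://login.microsoftonline.com/<tenant-id>/v2.0"
      jwksUri: "https://login.microsoftonline.com/<tenant-id>/discovery/v2.0/keys"
```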
**Development and Demo Azure Devops Pipelines**
- [ ] Create development ADO pipeline at `devops/azure/development-pipeline.yml` in the service repo.
- [ ] Verify development pipeline passes in ADO.
- [ ] Create Demo ADO pipeline at `devops/azure/pipeline.yml` in the service repo.
- [ ] Verify demo pipeline is passing in ADO.
**User Documentation**
- [ ] Add the service to the mirror pipeline instructions. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/docs/code-mirroring.md)
- [ ] Add the service to the manual deployment instructions. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/tree/master/charts)
- [ ] Add any required variables to the already existing variable group instructions for automated deployment. You should know if any variables need to be added to existing variable groups from creating the development and demo pipelines. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/docs/service-automation.md#create-osdu-service-libraries)
- [ ] Add a variable group `Azure Service Release - $SERVICE_NAME` to the documentation. You should know what values to set for this variable group from creating the development and demo pipelines. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/docs/service-automation.md#create-osdu-service-libraries)
- [ ] Add a step for creating the service pipeline at the bottom of the service-automation page. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/docs/service-automation.md#create-osdu-service-libraries)
- [ ] Create a rest script with sample calls to the service for users. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/tree/master/tools/rest)

# Issue #225: Disable Registry Scan feature for flux
*2022-04-12 — Dzmitry_Paulouski (slb)*

There are a lot of error messages in the Flux pod.
They are caused by Flux checking for new images while access to the container registry is not provided:
https://fluxcd.io/legacy/flux/faq/#how-do-i-give-flux-access-to-an-image-registry
_Flux transparently looks at the image pull secrets that you attach to workloads and service accounts, and thereby uses the same credentials that Kubernetes uses for pulling each image. In general, if your pods are running, then Kubernetes has pulled the images, and Flux should be able to access them too._
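The legacy Flux FAQ documents a `--registry-disable-scanning` daemon flag for turning this feature off. A sketch of wiring it into the Flux deployment follows; only the flag name comes from the FAQ, the surrounding manifest fields are illustrative, and the exact mechanism (Helm values vs. patched manifest) should be checked against our setup:

```yaml
# Sketch: pass the disable flag to the Flux daemon container.
# Only the flag name comes from the Flux FAQ; other fields are illustrative.
spec:
  template:
    spec:
      containers:
        - name: flux
          args:
            - --registry-disable-scanning
```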
Since we do not use this feature, it can be disabled: https://fluxcd.io/legacy/flux/faq/#can-i-disable-flux-registry-scanning

# Issue #241: Drop support of keda_v2_enabled flag on services side
*2022-08-30 — Vasyl Leskiv [SLB]*

Since the feature flag has been added to service repos beyond the infrastructure repo (for example, helm-charts-azure), we need to clean up the services side and drop the file [infra-azure-provisioning/docs/keda-upgrade.md](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/docs/keda-upgrade.md), as it doesn't make sense to support KEDA v1 anymore.

# Issue #167: Enable autoscaling
*2022-08-23 — ashley kelham*

## Status
- [X] Proposed
- [X] Under review
- [X] Approved
- [ ] Retired
## Context & Scope
The Azure OSDU AKS deployment and the services deployed do not utilize autoscaling. This is due to limitations in [AGIC](https://docs.microsoft.com/en-us/azure/application-gateway/ingress-controller-overview) that is used to populate Pod IPs to the Application Gateway Backend pools to route requests.
Application Gateway is slow at updating this information when Pod IPs change, e.g. during autoscaling, and so it can cause high error rates on client requests.
This is a simplified deployment view of the current AKS and Application Gateway setup.
![current-infra-azure](/uploads/3f75f573997e8f34c26287838d530d76/current-infra-azure.PNG)
To enable autoscaling, we need to replace AGIC with a different ingress controller technology. The Istio Ingress Controller is already being utilized for east-west traffic within the cluster; we therefore want to extend it to handle north-south traffic as well.
This will allow us to enable cluster autoscaling and Horizontal Pod Scaling (HPA) of services.
Also of note, this is not the end solution as a fork will be made of the OSDU infrastructure to re-design the solution for the needs of the PAAS deployment on Azure. The time frame for this is approximately 6 months and so this should be seen as a temporary solution to enable autoscaling.
## Trade-off Analysis
One approach is to expose the Istio Ingress Controller directly to external traffic. This simplifies the architecture and enables TLS for in-cluster traffic, whereas today TLS termination happens at App Gateway and HTTP is used afterwards.
However, this would mean we need to replicate in the Istio Ingress Controller both the WAF and the telemetry produced by the Application Gateway today. We would also need to update the monitoring solutions to use the new telemetry.
Although possible, this is all extra work that will take time. Given this is a temporary solution having the optimal implementation is not necessary as long as we can enable autoscaling.
Therefore a compromise solution where we keep the Application Gateway and forward requests to the Istio ingress controller will be simpler to implement and still achieve our goal. We would then keep the Istio ingress controller endpoint private and not expose it to external traffic.
We are also proposing to have a separate node pool in the AKS as currently the system node pool is utilized for the services deployed. This is the recommended best practice to prevent user services compromising critical system resources. This new node pool will be configured to autoscale.
mTLS is required between the ingress controller and App Gateway to enforce that requests to the cluster can only be routed through App Gateway.
After this change is applied, individual services will need to configure their own HPA based on each service's needs.
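As an illustration of that per-service configuration, a minimal HPA manifest might look like this. The service name, namespace, replica counts, and CPU target are all illustrative, not values from the actual charts:

```yaml
# Illustrative HPA for a single OSDU service; every value is an assumption.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: storage-hpa
  namespace: osdu
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: storage
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```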
Below is the simplified deployment view of the new solution.
![autoscale-infra-azure](/uploads/07c1ee912f955cdaf8dfaee054f7fa63/autoscale-infra-azure.PNG)
## Decision
- AGIC will be removed
- Istio ingress will be used to route requests to the services deployed in AKS
- Application gateway will remain with a backend pool per service. Each backend pool forwards the request onto the same Istio Ingress Controller
- A new node pool will be added to the AKS deployment. Cluster autoscaling will be enabled on this node pool. All services will be deployed to this node pool
## Future work / Out of scope
- Understand Node limits for autoscaled cluster
- Understand [HPA](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) configuration needed for individual services
- Optimizing the configuration of the [cluster autoscaler](https://docs.microsoft.com/en-us/azure/aks/cluster-autoscaler#using-the-autoscaler-profile)
- Autoscaling the cluster for [burst traffic scenarios](https://docs.microsoft.com/en-us/azure/aks/virtual-nodes)

# Issue #164: Environment Grid Topic Name Size Constraint
*2022-08-23 — Abhishek Chowdhry*

There is a size limitation for the Event Grid topic name: it must be between 3 and 50 characters. This creates an issue when we try to create new topics with longer names. For example, if we want to configure a topic called legaltagstatechangetopic, then the Event Grid topic names configured for the dev and glab environments are:
"osdu-mvp-dp1dev-qs29-grid-legaltagstatechangetopic" - size 50 for dev env
"osdu-mvp-dp1glab-ky7v-grid-legaltagstatechangetopic" - size 51 for glab env
So for the dev env everything will go smoothly, but builds will fail for the glab env. This happens because of the environment alias length, i.e. dev vs glab (3 chars vs 4 chars), and it would also happen for the demo and pentest environments. For now we will have to keep topic names short enough, relative to the largest env alias, that the Event Grid topic name doesn't exceed 50 chars in any environment. We will have to see how we can mitigate this issue and whether we can drop prefixes like "osdu-mvp-dp1dev-qs29-grid" from Event Grid topic names to allow for bigger custom topic names.
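The length arithmetic above can be checked mechanically. Below is a small sketch that validates generated topic names against the 3-50 char limit; the name format and prefix are reconstructed from the examples in this issue, not read from the actual Terraform templates:

```python
# Sketch: validate generated Event Grid topic names against Azure's
# 3-50 character limit. The name format below is reconstructed from the
# examples in this issue, not taken from the Terraform templates.

EVENTGRID_NAME_MIN, EVENTGRID_NAME_MAX = 3, 50

def topic_name(base_name: str, env_alias: str, unique_id: str = "qs29") -> str:
    """Build a topic name in the assumed osdu-mvp-dp1<env>-<id>-grid-<name> format."""
    return f"osdu-mvp-dp1{env_alias}-{unique_id}-grid-{base_name}"

def validate(name: str) -> str:
    """Raise if the name violates the Event Grid topic name length limit."""
    if not EVENTGRID_NAME_MIN <= len(name) <= EVENTGRID_NAME_MAX:
        raise ValueError(f"{name!r} is {len(name)} chars; must be 3-50")
    return name

validate(topic_name("legaltagstatechangetopic", "dev"))   # 50 chars: passes
# topic_name("legaltagstatechangetopic", "glab") is 51 chars and would raise
```

Running such a check over every env alias before applying the templates would surface the failure at plan time rather than mid-build.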
We will also have to check whether similar issues exist when naming other resources.

# Issue #214: Error Azure Infra setup on step "Deploy Monitoring Resources"
*2021-11-23 — Sergey Zemskov*

I have successfully completed all the required steps before:
- `common_prepare.sh` script has executed without errors and warnings
- the `.envrc` file contains all necessary parameters

I get this error while executing the deployment `terraform apply -var-file custom.tfvars`:
```
Error: Error creating or updating Scheduled Query Rule "airflow-import-errors-alert-osdu-mvp-mrdemo-e5sm" (resource group "osdu-mvp-mrdemo-e5sm-rg"): insights.ScheduledQueryRulesClient#CreateOrUpdate: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="BadRequest" Message="Scope 'osdu-mvp-crdemo-dz5-ai' does not exists"
on main.tf line 239, in resource "azurerm_monitor_scheduled_query_rules_alert" "alerts":
239: resource "azurerm_monitor_scheduled_query_rules_alert" "alerts" {
```

# Issue #317: Error: Plugin did not respond
*2023-09-11 — Yukuo Wang*

We captured several terraform plan failures recently:
```
│ Error: Plugin did not respond
│
│ with module.system_storage_account.azurerm_storage_account.main,
│ on ../../../modules/providers/azure/storage-account/main.tf line 19, in resource "azurerm_storage_account" "main":
│ 19: resource "azurerm_storage_account" "main" {
│
│ The plugin encountered an error, and failed to respond to the
│ plugin.(*GRPCProvider).ReadResource call. The plugin logs may contain more
│ details.
╷
│ Error: Request cancelled
│
│ with module.keyvault_policy.azurerm_key_vault_access_policy.keyvault[0],
│ on ../../../modules/providers/azure/keyvault-policy/main.tf line 15, in resource "azurerm_key_vault_access_policy" "keyvault":
│ 15: resource "azurerm_key_vault_access_policy" "keyvault" {
│
│ The plugin.(*GRPCProvider).UpgradeResourceState request was cancelled.
╵
```
Also with stack trace logs:
Stack trace from the terraform-provider-azurerm_v3.39.1_x5 plugin:

```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4c12582]
goroutine 1950 [running]:
github.com/hashicorp/terraform-provider-azurerm/internal/services/containers.resourceKubernetesClusterRead(0xc001d94480, {0x5d01ea0?, 0xc000737000})
github.com/hashicorp/terraform-provider-azurerm/internal/services/containers/kubernetes_cluster_resource.go:2060 +0x9c2
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).read(0x6f8e340?, {0x6f8e340?, 0xc001fd32c0?}, 0xd?, {0x5d01ea0?, 0xc000737000?})
github.com/hashicorp/terraform-plugin-sdk/v2@v2.24.1/helper/schema/resource.go:712 +0x178
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).RefreshWithoutUpgrade(0xc000b56b60, {0x6f8e340, 0xc001fd32c0}, 0xc001f90750, {0x5d01ea0, 0xc000737000})
github.com/hashicorp/terraform-plugin-sdk/v2@v2.24.1/helper/schema/resource.go:1015 +0x585
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*GRPCProviderServer).ReadResource(0xc00152f980, {0x6f8e340?, 0xc001fd2ea0?}, 0xc001c5a100)
github.com/hashicorp/terraform-plugin-sdk/v2@v2.24.1/helper/schema/grpc_provider.go:613 +0x4a5
github.com/hashicorp/terraform-plugin-go/tfprotov5/tf5server.(*server).ReadResource(0xc001930320, {0x6f8e340?, 0xc001fd2780?}, 0xc001127140)
github.com/hashicorp/terraform-plugin-go@v0.14.1/tfprotov5/tf5server/server.go:748 +0x4b1
github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/tfplugin5._Provider_ReadResource_Handler({0x63c6d80?, 0xc001930320}, {0x6f8e340, 0xc001fd2780}, 0xc001347b20, 0x0)
github.com/hashicorp/terraform-plugin-go@v0.14.1/tfprotov5/internal/tfplugin5/tfplugin5_grpc.pb.go:349 +0x170
google.golang.org/grpc.(*Server).processUnaryRPC(0xc00027a000, {0x6f9e380, 0xc000f9e000}, 0xc002595b00, 0xc001993530, 0xb246a90, 0x0)
google.golang.org/grpc@v1.50.1/server.go:1340 +0xd23
google.golang.org/grpc.(*Server).handleStream(0xc00027a000, {0x6f9e380, 0xc000f9e000}, 0xc002595b00, 0x0)
google.golang.org/grpc@v1.50.1/server.go:1713 +0xa2f
google.golang.org/grpc.(*Server).serveStreams.func1.2()
google.golang.org/grpc@v1.50.1/server.go:965 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
google.golang.org/grpc@v1.50.1/server.go:963 +0x28a

Error: The terraform-provider-azurerm_v3.39.1_x5 plugin crashed!

This is always indicative of a bug within the plugin. It would be immensely
helpful if you could report the crash with the plugin's maintainers so that it
can be fixed. The output above should help diagnose the issue.
```
By troubleshooting this, we noticed that there is a bug fix:
Fix nil panic by correcting nil check expression: https://github.com/hashicorp/terraform-provider-azurerm/pull/21850
This fix is included in terraform-provider-azurerm v3.57.0 (May 19, 2023):
https://github.com/hashicorp/terraform-provider-azurerm/blob/v3.57.0/CHANGELOG.md
> BUG FIXES:
> data.azurerm_kubernetes_cluster - prevent a panic when some values returned are nil (#21850)

# Issue #151: Event Grid Role Assignment for Webhook
*2021-06-23 — Komal Makkar*
# Priority: High
## Expected Behavior
Register Service should be able to create a subscriber on Event Grid Topic.
## Current Behavior
EG errors out, and the creation of subscribers is blocked.
## Possible Solution
https://docs.microsoft.com/en-us/azure/event-grid/secure-webhook-delivery
## Steps to Reproduce
Run ITs on Notification or register. Examine the logs.
## Impact
#### Environments impacted
All the environments that we have are impacted.
#### Release impacted
OSDU M5 tagging is waiting for this fix.
#### Services impacted
Register, Notification
#### Scenario impacted
Creation of a new subscriber to a topic.
## Detailed Description
Due to security issues, EG updated the API behavior, and hence all the environments were impacted. For more details, refer to [this page](https://docs.microsoft.com/en-us/azure/event-grid/secure-webhook-delivery).

# Issue #107: External Data Service Onboarding
*2022-08-23 — Garrett Edmondson*

**Service name**: `External Data Service`
[External Data Service Repo](https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/core-external-data-workflow/-/tree/master)
[External Data Service CI/CD branch](https://community.opengroup.org/osdu/platform/data-flow/ingestion/external-data-sources/core-external-data-workflow/-/tree/CI-CD)
**Infrastructure and Initial Requirements**
- [ ] Add any additional Azure cloud infrastructure (Cosmos containers, Storage containers, fileshares, etc.) to the Terraform template. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/tree/master/infra/templates/osdu-r3-mvp). Note that if the infrastructure is a part of the data-partition template, you may need to add secrets to the keyvault that are partition specific; if doing so, update the createPartition REST request to include the keys that you have added so they are accessible in service code. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/tools/rest/partition.http#L48)
- [ ] Create an ingress point for the service. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/charts/osdu-common/templates/appgw-ingress.yaml)
- [ ] Add any test data that is required for the service integration tests. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/tree/master/tools/test_data)
- [ ] Update `upload-data.py` to upload any new test data files you created. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/tools/test_data/upload-data.py).
- [ ] Update the integration tester with any entitlements required to test the service. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/tools/test_data/user_info_1.json)
- [ ] Add in any new secrets that the service needs to run. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/charts/osdu-common/templates/kv-secrets.yaml)
- [ ] Create environment variable script to generate .yaml files to be used with Intellij [EnvFile](https://plugins.jetbrains.com/plugin/7861-envfile) plugin and .envrc files to be used with [direnv](https://direnv.net/). [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/tree/master/tools/variables)
**Gitlab Code and Documentation**
- [ ] Complete the service code such that it passes all integration tests locally. There is some documentation on starting off implementing an Azure provider. [Link](./gitlab-service-readme-template.md)
- [ ] Create helm charts for service. The charts for each service are located in the `devops/azure` directory. You can look at charts from other services as a model. The charts will be nearly identical except for the different environment variables, values, etc each service needs to run. [Link](./gitlab-service-guide.md)
- [ ] Implement Istio for the service if this has not already been done. Here is an example MR that shows what steps are required. [Link](https://community.opengroup.org/osdu/platform/system/storage/-/merge_requests/64)
- [ ] Create an Istio auth policy in the `devops/azure/chart/templates` directory. Here is an example of an Istio auth policy that is generic and can be used by other services. [Link](https://community.opengroup.org/osdu/platform/system/storage/-/blob/master/devops/azure/chart/templates/azure-istio-auth-policy.yaml)
- [ ] Add any variables that are required for the service integration tests to the Azure CI-CD file. [Link](https://community.opengroup.org/osdu/platform/ci-cd-pipelines/-/blob/master/cloud-providers/azure.yml)
- [ ] Verify that the README for the Azure provider correctly and clearly describes how to run and test the service. There is a README template to help. [Link](./gitlab-service-readme-template.md)
- [ ] Push any changes and verify that the Gitlab pipeline is passing in master.
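As a rough illustration of the Istio auth policy mentioned above, a generic JWT-enforcing `AuthorizationPolicy` might look like the sketch below. This is a hypothetical example, not the actual contents of the linked storage chart; the namespace, labels, and excluded paths are placeholders:

```yaml
# Illustrative sketch only - namespace, labels, and paths are placeholders.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: service-jwt-authz
  namespace: osdu
spec:
  selector:
    matchLabels:
      app: my-service          # placeholder: the service's pod label
  action: DENY
  rules:
    - from:
        - source:
            notRequestPrincipals: ["*"]   # deny any request without a valid JWT
      to:
        - operation:
            notPaths: ["/", "*/actuator/health", "*/swagger-ui.html"]  # allow unauthenticated health/docs
```

Placing the file in `devops/azure/chart/templates` lets the same policy ship with every service chart.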
**Development and Demo Azure Devops Pipelines**
- [ ] Create development ADO pipeline at `devops/azure/development-pipeline.yml` in the service repo.
- [ ] Verify development pipeline passes in ADO.
- [ ] Create Demo ADO pipeline at `devops/azure/pipeline.yml` in the service repo.
- [ ] Verify demo pipeline is passing in ADO.
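The exact pipeline contents depend on the shared templates in use; a minimal, illustrative shape for `devops/azure/pipeline.yml` (repository, template path, variable group, and service names are all placeholders) could be:

```yaml
# Hypothetical sketch of a service's ADO pipeline; names are placeholders.
trigger:
  branches:
    include:
      - master

variables:
  - group: 'Azure - OSDU'                       # placeholder shared group
  - group: 'Azure Service Release - myservice'  # per-service release group

resources:
  repositories:
    - repository: infra-templates   # placeholder: repo holding shared stage templates
      type: git
      name: osdu-infrastructure

stages:
  - template: devops/service-pipeline.yml@infra-templates
    parameters:
      serviceName: 'myservice'
      chartPath: 'devops/azure/chart'
```

The `Azure Service Release - $SERVICE_NAME` variable group referenced in the User Documentation section below is consumed the same way, via a `variables: - group:` entry.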
**User Documentation**
- [ ] Add the service to the mirror pipeline instructions. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/docs/code-mirroring.md)
- [ ] Add the service to the manual deployment instructions. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/tree/master/charts)
- [ ] Add any required variables to the already existing variable group instructions for automated deployment. You should know if any variables need to be added to existing variable groups from creating the development and demo pipelines. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/docs/service-automation.md#create-osdu-service-libraries)
- [ ] Add a variable group `Azure Service Release - $SERVICE_NAME` to the documentation. You should know what values to set for this variable group from creating the development and demo pipelines. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/docs/service-automation.md#create-osdu-service-libraries)
- [ ] Add a step for creating the service pipeline at the bottom of the service-automation page. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/master/docs/service-automation.md#create-osdu-service-libraries)
- [ ] Create a rest script with sample calls to the service for users. [Link](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/tree/master/tools/rest)

---

[Issue #157](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/157) · [Feature] Add Probe test for OSDU based on scenario · 2022-08-23 · Aryaan Singh

M6 - Release 0.9 - remove

---

[Issue #315](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/315) · [Feature] Airflow2 stage with private endpoints · 2023-08-02 · Arturo Hernandez [EPAM]

# Airflow2 stage
---
By default, Airflow2 is deployed at the Service Resources stage, and a single Airflow instance is configured for the whole OSDU deployment.
A single Airflow2 is not enough for Service Resources in a multi-partition environment; therefore, Airflow2 is deployed externally per data partition, in a separate network and subnet (brand-new Airflow2 resources will be created).
To secure and improve performance when using an external Airflow2, we need to set up private endpoints for those resources, including a private endpoint for the partition Airflow2 application gateway from the main AKS cluster.
Airflow2 mostly interacts with the storage accounts, so I expect private endpoints will be needed for those as well.
## Airflow2 independent stage
---
I have a strong opinion that Airflow2 should be segregated from the partition resources. If there is a need for a new external Airflow, it should be created as a separate stage (like data-partition, service, central): an "airflow" resources stage that provides a configured Airflow out of the box.
## ADF Replacement
---
To achieve convergence between ADME and community, we might want to start thinking about Azure Data Factory, which is already available in [Terraform - AzureRM](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/data_factory.html). We could start this migration smoothly at the data partition level and then at the service resource level.
## Action items
---
@lucynliu @nursheikh I would like to start this discussion here in the forum. It would be nice to start converging community and ADME. I have the feeling we should get rid of the per-partition Airflow2 resources (including the AKS cluster for Airflow) and, as a first stage, consider offering ADF per partition as an optional feature, then move forward with ADF at Service Resources; or it may already be fine to start considering ADF per partition now (I don't know if that is really convenient).
We should also include the optional feature of private endpoints from AKS to ADF/AKS-Airflow in any case.
cc. @lucynliu @vleskiv

---

[Issue #138](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/138) · Feature Change - Add tags create and modified timestamps as tags to azure resources · 2021-06-23 · Krishna Nikhil Vedurumudi

For an Azure resource, the activity logs are available only for 90 days.
After that, there is no way to find out the "Creation Date" of the resource.
Sometimes the creation date matters for feature availability. For example, all Cosmos DB instances created after a certain date have a default partition key size larger than 2 KB; before that, the default value was 100 bytes.
Quoting Azure support team
> Unless you add a tag with the creation date, the only way to get a creation date is to use Azure Activity Log to search for the creation operation of the resource. These logs are only saved for 90 days, so if the resource was created more than 90 days ago, there is no way to find the creation date.

---

[Issue #118](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/118) · Feature Change - Enable AKS host-based encryption · 2022-08-23 · Daniel Scholl

Currently the AKS nodes aren't configured to use host-based encryption, which needs to be enabled to support encryption-at-rest security requirements.
OSDU Security - Universal Encryption - A2
> Host-based encryption on Azure Kubernetes Service is in Preview. [Link](https://docs.microsoft.com/en-us/azure/aks/enable-host-encryption#:~:text=With%20host%2Dbased%20encryption%2C%20the,encrypted%20to%20the%20Storage%20service.&text=The%20cache%20of%20OS%20and,type%20set%20on%20those%20disks.)
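For reference, host-based encryption is a node-pool property that must be set when the pool is created; it cannot be toggled on an existing pool. A hypothetical pipeline step (resource and pool names are placeholders, and this is a sketch rather than the repo's actual automation):

```yaml
# Illustrative ADO pipeline step; resource names are placeholders.
steps:
  - script: |
      az aks nodepool add \
        --resource-group osdu-rg \
        --cluster-name osdu-aks \
        --name encrypted \
        --enable-encryption-at-host
    displayName: Add node pool with host-based encryption
```

In the Terraform-based automation the equivalent flag would be set on the AKS node pool resource.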
Acceptance Criteria
---
1. Infrastructure Automation should automatically configure this feature.
2. Unit and integration tests should pass.

M7 - Release 0.10.0 - remove

---

[Issue #117](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/117) · Feature Change - Enable Pod to Pod transport security · 2021-06-14 · Daniel Scholl

The current implementation terminates and SSL-offloads transport security at the Load Balancer. Transport security should exist all the way to the Kubernetes Pod and between Pods.
OSDU Security - Universal Encryption - B2
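One common way to satisfy this with the Istio installation already used by the services is strict mutual TLS between sidecars. A minimal sketch, offered as one design option rather than the decision itself:

```yaml
# Mesh-wide strict mTLS; placing the policy in the Istio root namespace
# applies it to every sidecar-injected workload.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT   # reject plaintext pod-to-pod traffic
```

With this in place, traffic between sidecar-injected pods is encrypted without changing service code, though charts and health probes still need review.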
Acceptance Criteria
---
1. A design decision should be made on the best way to handle this feature.
2. Infrastructure/Helm automation should automatically configure and enable this feature.
3. Service Helm charts should be changed to move to HTTPS.

---

[Issue #69](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/69) · Feature change - Helm values override support - Support of overriding helm config in airflow deployment · 2021-06-23 · Kiran Veerapaneni

**Why is this change needed**
Customers requested a way to override the default helm-config.yaml that is checked in to the repository. Airflow is deployed in the AKS cluster using Helm charts, and the Airflow Helm chart configuration, such as which database to connect to, Redis, replica counts, etc., is provided through helm-config.yaml. The helm config checked in to the master repository is basic and tuned for the GitLab and dev environments. Using the override feature, a customer can provide an override configuration file that overrides the default configuration in the repository. This is optional; customers can deploy the existing templates without any change.
**Current Behavior**
Currently, customers change the helm-config.yaml file directly after forking the infrastructure repository. This causes conflicts when helm-config.yaml is updated in the infrastructure repository, and manual intervention is needed to resolve them.
**Expected Behavior**
Provide a way for customers to supply an override file without worrying about merge conflicts.
**Design proposal**
ADO pipelines are modified to take two input values files, "helm-config.yaml" and "helm-config-override.yaml"; the override file is always empty in the infrastructure repository. After this change, customers who want to deploy Airflow can supply overriding values through "helm-config-override.yaml". Because this file is always empty upstream, there won't be any merge conflicts.
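To illustrate the proposal (all values shown are hypothetical), a customer-supplied override file only needs to contain the keys being changed; Helm merges later `-f` files over earlier ones:

```yaml
# helm-config-override.yaml - hypothetical customer overrides; empty upstream.
# Applied after the default file, e.g.:
#   helm upgrade airflow <chart> -f helm-config.yaml -f helm-config-override.yaml
airflow:
  workers:
    replicas: 5                        # override the default replica count
  externalDatabase:
    host: my-postgres.example.com      # point at a customer-managed database
```

Any key not present in the override file keeps the default from helm-config.yaml, which is what makes the empty upstream file conflict-free.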
**Acceptance Criteria**
- Design the feature so that it can be implemented as a non-breaking change.
- Update all required documentation.

---

[Issue #262](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/262) · Feature - multiregion private endpoints · 2023-05-10 · Arturo Hernandez [EPAM]

As of now, since private endpoints were introduced, the main restriction is that private endpoints are limited to the same region.
* https://learn.microsoft.com/en-us/azure/private-link/private-endpoint-overview#private-endpoint-properties
> The private endpoint must be deployed in the same region and subscription as the virtual network.
This means that if we plan to deploy another partition in a different region, it will not be possible due to this limitation.
Ideally, I think this would be the best approach:
* CR and SR should be deployed in the same region (no need for a new virtual network).
* SR will need a new subnet for network peering.
* Each DP should have its own network and subnet, peered to the SR subnet. All private endpoints will be created in the DP, attached to the dedicated virtual network of the DP resources.
I don't think this is a priority for a specific milestone; I guess there are few use cases in which we would want partitions in a different region than the control plane.

---

[Issue #182](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/182) · Fix airflow chart to deploy it through helm install/upgrade · 2022-08-23 · Vineeth Guna [Microsoft]

The Airflow charts need to be templatized so they can be used with helm install/upgrade.
This is needed to enable multi-partition support for Airflow.
These changes should not affect the existing Flux deployment.
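Templatizing typically means replacing values hardcoded for the Flux-based deployment with `values.yaml` lookups, so the same chart renders correctly under `helm install/upgrade` per partition. An illustrative fragment (the key names are hypothetical, not the chart's actual keys):

```yaml
# Before: hardcoded for the flux/gitops deployment
#   executor: CeleryExecutor
#   fernetKeySecretName: airflow-secrets
# After: driven from values.yaml so each data partition can override at install time
executor: {{ .Values.airflow.executor | default "CeleryExecutor" }}
fernetKeySecretName: {{ .Values.airflow.fernetKeySecretName | quote }}
```

Because defaults are preserved with `default`, the Flux deployment that supplies no overrides keeps rendering the same manifests as before.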