# infra-azure-provisioning issues
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues

## Incorrect rights and GUID for terraform registration for AAD role
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/263 (Paweł Grudzień, 2023-09-22)

The script common_prepare.sh uses a hardcoded GUID (824c81eb-e3f8-4ee6-8f6d-de7f50d565b7) for assigning Application.ReadWrite.OwnedBy, and that GUID is invalid.
Moreover, even when granting the terraform app registration Application.ReadWrite.OwnedBy, the deployment fails with this error:
```
module.graph_account.azurerm_cosmosdb_gremlin_database.cosmos_dbs[0]: Creation complete after 36s [id=/subscriptions/797a3722-248f-4a96-99b4-25dc6afd2a32/resourceGroups/osdu-mvp-crpriv3-c5mq-rg/providers/Microsoft.DocumentDB/databaseAccounts/osdu-mvp-crpriv3-c5mq-graph/gremlinDatabases/osdu-graph]
module.graph_account.azurerm_cosmosdb_gremlin_graph.cosmos_graphs[0]: Creating...
module.graph_account.azurerm_cosmosdb_gremlin_graph.cosmos_graphs[0]: Still creating... [10s elapsed]
module.graph_account.azurerm_cosmosdb_gremlin_graph.cosmos_graphs[0]: Still creating... [20s elapsed]
module.graph_account.azurerm_cosmosdb_gremlin_graph.cosmos_graphs[0]: Still creating... [30s elapsed]
module.graph_account.azurerm_cosmosdb_gremlin_graph.cosmos_graphs[0]: Creation complete after 37s [id=/subscriptions/797a3722-248f-4a96-99b4-25dc6afd2a32/resourceGroups/osdu-mvp-crpriv3-c5mq-rg/providers/Microsoft.DocumentDB/databaseAccounts/osdu-mvp-crpriv3-c5mq-graph/gremlinDatabases/osdu-graph/graphs/Entitlements]
╷
│ Error: Adding password for application with object ID "bba42f65-4c1d-438a-9458-54baf6ce4fc3"
│
│ with module.ad_application.azuread_application_password.main[0],
│ on ../../../modules/providers/azure/ad-application/main.tf line 98, in resource "azuread_application_password" "main":
│ 98: resource "azuread_application_password" "main" {
│
│ ApplicationsClient.BaseClient.Post(): unexpected status 403 with OData error: Authorization_RequestDenied: Insufficient privileges to complete the operation.
╵
```
After testing, the minimal rights needed turned out to be Application.ReadWrite.All; we were not able to narrow it down to Application.ReadWrite.OwnedBy.
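Whichever permission ends up being granted, the app-role GUID can be resolved at run time instead of being hardcoded. A hedged sketch (the helper names are illustrative; `graph_role_id` requires a logged-in Azure CLI):

```shell
#!/bin/bash
# Illustrative helper: resolve the app-role GUID for a Microsoft Graph
# permission by name. 00000003-0000-0000-c000-000000000000 is the well-known
# appId of the Microsoft Graph service principal.
graph_role_id() {
  az ad sp show --id 00000003-0000-0000-c000-000000000000 \
    --query "appRoles[?value=='$1'].id" -o tsv
}

# Sanity-check that the looked-up value actually is a GUID before wiring it
# into common_prepare.sh, instead of trusting a hardcoded constant.
is_guid() {
  [[ "$1" =~ ^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$ ]]
}
```

Usage would be along the lines of `ROLE_ID=$(graph_role_id Application.ReadWrite.All)` followed by `is_guid "$ROLE_ID" || exit 1`.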
Note that this contradicts Microsoft's documentation for the Azure AD terraform module. At the time of writing, the note about these rights can be found in the module's README.md, under infra\modules\providers\azure\ad-application\README.md (after downloading the module).
(assignees: shivani karipe, saketh somaraju)

## README issues after latest update
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/318 (Dmytro Komisar, 2023-09-21)

The link to "Steps to create Flux Manifest Repository" [here](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blame/master/README.md#L45), pointing to
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/blob/az/sa-fix-documentation/docs/flux.md
is broken.
In the same paragraph: "This step is optional and not recommended one for new installations", but the next step, running common_prepare.sh, fails because of
```bash
./infra/scripts/common_prepare.sh $(az account show --query id -otsv) $UNIQUE $PREFIX
ERROR: GIT_REPO not provided
```
and it looks like the previous step is needed after all, or common_prepare.sh should be fixed.

## [IMPROVEMENT] Lack of Automated Script in Cosmos DB Firewall Update Instructions
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/321 (Paweł Grudzień, 2023-09-20)

**Description:**
The current instructions for updating the Cosmos DB firewall settings do not include an automated method to add the user's current public IP address. Users often encounter issues when their requests originate from an IP that is blocked by the Cosmos DB firewall, as indicated by the error:
```
azure.cosmos.exceptions.CosmosHttpResponseError: (Forbidden) Request originated from IP xx.xx.xx.xx through public internet. This is blocked by your Cosmos DB account firewall settings.
```
**Details:**
When users attempt to connect to Cosmos DB from an unlisted IP address, they receive the above error. This requires them to manually check their public IP and then update the Cosmos DB firewall settings, a process that can be tedious and error-prone.
**Expected Behavior:**
Users should have a seamless way to add their current public IP address to the Cosmos DB firewall settings without having to manually determine their IP and update the settings.
**Actual Behavior:**
Users need to manually determine their public IP and update the Cosmos DB firewall settings, resulting in possible human errors and inefficiencies.
**Steps to Reproduce:**
1. Access Cosmos DB from an IP not listed in the firewall settings.
2. Observe the aforementioned error.
3. Manually determine the public IP.
4. Manually update the Cosmos DB firewall settings to include the new IP.
**Suggested Fix:**
Provide an automated bash script that:
1. Determines the user's public IP.
2. Fetches the existing allowed IPs from the Cosmos DB firewall settings.
3. Adds the new IP to the list if not already present.
4. Updates the firewall settings with the new list.
Here's the script:
```bash
#!/bin/bash
# Ensure required environment variables are set
if [[ -z "$COSMOS_ENDPOINT" || -z "$GROUP" ]]; then
    echo "Please make sure the COSMOS_ENDPOINT and GROUP environment variables are set."
    exit 1
fi
# Extract the Cosmos DB account name from the endpoint URL
COSMOS_DB_ACCOUNT_NAME=$(echo "$COSMOS_ENDPOINT" | awk -F'://' '{print $2}' | awk -F'.' '{print $1}')
# Fetch the public IP address
MY_IP=$(curl -s ifconfig.me)
# Fetch existing allowed IPs from Cosmos DB
EXISTING_IPS=$(az cosmosdb show --name "$COSMOS_DB_ACCOUNT_NAME" --resource-group "$GROUP" --query "ipRangeFilter" -o tsv)
# Check if your IP is already in the list (exact match against the
# comma-delimited entries, so e.g. 1.2.3.4 does not match 1.2.3.45)
if [[ ",$EXISTING_IPS," == *",$MY_IP,"* ]]; then
    echo "Your IP ($MY_IP) is already in the list."
    exit 0
fi
# Combine your IP with the existing IPs
if [ -z "$EXISTING_IPS" ]; then
    NEW_IPS=$MY_IP
else
    NEW_IPS="$EXISTING_IPS,$MY_IP"
fi
# Update the firewall rules
az cosmosdb update --name "$COSMOS_DB_ACCOUNT_NAME" --resource-group "$GROUP" --ip-range-filter "$NEW_IPS"
echo "Firewall rules updated successfully."
```
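One detail worth calling out: testing membership in a comma-separated IP list with a plain substring match can false-positive when one address is a prefix of another (1.2.3.4 vs 1.2.3.45). A standalone sketch of an exact-match check (the function name is illustrative):

```shell
#!/bin/bash
# Exact membership test for a comma-separated IP list: wrap both the list
# and the candidate in commas so only whole entries can match.
ip_in_list() {
  local ip="$1" list="$2"
  [[ ",$list," == *",$ip,"* ]]
}

ip_in_list "1.2.3.45" "10.0.0.1,1.2.3.45" && echo "present"
ip_in_list "1.2.3.4"  "10.0.0.1,1.2.3.45" || echo "absent"
```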
Users just need to set the `COSMOS_ENDPOINT` and `GROUP` environment variables and then run the script.
**Environment:**
- Azure Cosmos DB SDK version: (e.g., 2.14.0, or the version you are referring to)
- Azure CLI version: (e.g., 2.x.x)

## Authentication hangs with AzureRM provider version 2.64.0 in Monitoring Resources terraform and needs update
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/320 (Paweł Grudzień, 2023-09-18)

**Description:**
When using the AzureRM Terraform provider at version `2.64.0`, the Monitoring Resources `terraform apply` script hangs indefinitely without providing any error message. After updating the AzureRM provider to a newer version, the problem is resolved, suggesting an authentication issue with version `2.64.0`. Sadly, I did not capture the logs.
**Details:**
In a Terraform script with the below configuration:
```
terraform {
required_version = ">= 1.3"
backend "azurerm" {
key = "terraform.tfstate"
}
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "=2.64.0"
}
random = {
source = "hashicorp/random"
version = "=2.3.1"
}
}
}
```
The terraform apply command hangs indefinitely during execution. Although no error message was shown in standard logs, verbose logs indicated an authentication error to Azure.
**Expected Behavior:**
The terraform apply should either execute successfully or fail fast with a clear error message.
**Actual Behavior:**
The script hangs indefinitely without any feedback to the user.
**Steps to Reproduce:**
1. Use the AzureRM provider at version `2.64.0` in a Terraform script.
2. Execute the script.
3. Observe that it hangs without any clear error message.
**Workaround:**
Upgrading the AzureRM provider to a newer version (e.g., `3.73.0`) resolves the problem.
**Suggested Fix:**
Update the documentation to pin the latest version of the provider.
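As a sketch of the documentation fix, the provider pin in the configuration above would change to something like the following (the exact constraint is an assumption based on the workaround version):

```
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.73"
    }
  }
}
```

After raising the constraint, `terraform init -upgrade` is needed so Terraform downloads the newer provider build.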
**Environment:**
- Terraform version: (e.g., 1.5.1)
- AzureRM provider version where the issue was observed: `2.64.0`

## Incorrect usage of trim function leads to malformed resource names in monitoring resources terraform
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/319 (Paweł Grudzień, 2023-09-18)

Description:
In the Monitoring Resources main.tf Terraform module, the trim function is used to remove specific suffixes from strings. However, the current usage can remove characters unintentionally, causing malformed resource names in Azure resources.
Details:
The specific instance observed is in the trimming of the -rg suffix from resource group names. The current code uses:
```
central_group_prefix = trim(data.terraform_remote_state.central_resources.outputs.central_resource_group_name, "-rg")
```
The intention is to remove the -rg suffix, but due to the behavior of trim, it also removes any individual -, r, and g characters from the ends of the string, leading to unexpected results.
For instance, a name like "osdu-pl2-crpl2-583g-rg" is trimmed to "osdu-pl2-crpl2-583" instead of the expected "osdu-pl2-crpl2-583g".
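The difference can be reproduced outside Terraform; a small shell sketch emulating the two functions for this specific suffix (function names are illustrative):

```shell
#!/bin/bash
# trim(s, "-rg") strips any run of the characters -, r, g from both ends:
tf_trim() { printf '%s\n' "$1" | sed -E 's/^[-rg]+//; s/[-rg]+$//'; }
# trimsuffix(s, "-rg") removes only the exact literal suffix:
tf_trimsuffix() { printf '%s\n' "${1%-rg}"; }

tf_trim "osdu-pl2-crpl2-583g-rg"        # -> osdu-pl2-crpl2-583
tf_trimsuffix "osdu-pl2-crpl2-583g-rg"  # -> osdu-pl2-crpl2-583g
```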
Expected Behavior:
The -rg suffix should be removed without affecting other characters in the string.
Actual Behavior:
Characters within the -rg suffix are being removed individually if they are at the ends of the string, leading to unexpected results.
Steps to Reproduce:
Use a resource group name like "osdu-pl2-crpl2-583g-rg".
Apply the Terraform module.
Observe that resources dependent on the central_group_prefix variable have the g character missing.
Suggested Fix:
Replace the trim function with the trimsuffix function, which will only remove the exact -rg suffix:
```
central_group_prefix = trimsuffix(data.terraform_remote_state.central_resources.outputs.central_resource_group_name, "-rg")
```
This change should be applied wherever the trim function is used in a similar context.

## Error: Plugin did not respond
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/317 (Yukuo Wang, 2023-09-11)

We have captured several terraform plan failures recently:
│ Error: Plugin did not respond
│
│ with module.system_storage_account.azurerm_storage_account.main,
│ on ../../../modules/providers/azure/storage-account/main.tf line 19, in resource "azurerm_storage_account" "main":
│ 19: resource "azurerm_storage_account" "main" {
│
│ The plugin encountered an error, and failed to respond to the
│ plugin.(*GRPCProvider).ReadResource call. The plugin logs may contain more
│ details.
╷
│ Error: Request cancelled
│
│ with module.keyvault_policy.azurerm_key_vault_access_policy.keyvault[0],
│ on ../../../modules/providers/azure/keyvault-policy/main.tf line 15, in resource "azurerm_key_vault_access_policy" "keyvault":
│ 15: resource "azurerm_key_vault_access_policy" "keyvault" {
│
│ The plugin.(*GRPCProvider).UpgradeResourceState request was cancelled.
╵
Also with stack trace logs:
Stack trace from the terraform-provider-azurerm_v3.39.1_x5 plugin:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4c12582]
goroutine 1950 [running]:
github.com/hashicorp/terraform-provider-azurerm/internal/services/containers.resourceKubernetesClusterRead(0xc001d94480, {0x5d01ea0?, 0xc000737000})
github.com/hashicorp/terraform-provider-azurerm/internal/services/containers/kubernetes_cluster_resource.go:2060 +0x9c2
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).read(0x6f8e340?, {0x6f8e340?, 0xc001fd32c0?}, 0xd?, {0x5d01ea0?, 0xc000737000?})
github.com/hashicorp/terraform-plugin-sdk/v2@v2.24.1/helper/schema/resource.go:712 +0x178
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).RefreshWithoutUpgrade(0xc000b56b60, {0x6f8e340, 0xc001fd32c0}, 0xc001f90750, {0x5d01ea0, 0xc000737000})
github.com/hashicorp/terraform-plugin-sdk/v2@v2.24.1/helper/schema/resource.go:1015 +0x585
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*GRPCProviderServer).ReadResource(0xc00152f980, {0x6f8e340?, 0xc001fd2ea0?}, 0xc001c5a100)
github.com/hashicorp/terraform-plugin-sdk/v2@v2.24.1/helper/schema/grpc_provider.go:613 +0x4a5
github.com/hashicorp/terraform-plugin-go/tfprotov5/tf5server.(*server).ReadResource(0xc001930320, {0x6f8e340?, 0xc001fd2780?}, 0xc001127140)
github.com/hashicorp/terraform-plugin-go@v0.14.1/tfprotov5/tf5server/server.go:748 +0x4b1
github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/tfplugin5._Provider_ReadResource_Handler({0x63c6d80?, 0xc001930320}, {0x6f8e340, 0xc001fd2780}, 0xc001347b20, 0x0)
github.com/hashicorp/terraform-plugin-go@v0.14.1/tfprotov5/internal/tfplugin5/tfplugin5_grpc.pb.go:349 +0x170
google.golang.org/grpc.(*Server).processUnaryRPC(0xc00027a000, {0x6f9e380, 0xc000f9e000}, 0xc002595b00, 0xc001993530, 0xb246a90, 0x0)
google.golang.org/grpc@v1.50.1/server.go:1340 +0xd23
google.golang.org/grpc.(*Server).handleStream(0xc00027a000, {0x6f9e380, 0xc000f9e000}, 0xc002595b00, 0x0)
google.golang.org/grpc@v1.50.1/server.go:1713 +0xa2f
google.golang.org/grpc.(*Server).serveStreams.func1.2()
google.golang.org/grpc@v1.50.1/server.go:965 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
google.golang.org/grpc@v1.50.1/server.go:963 +0x28a
Error: The terraform-provider-azurerm_v3.39.1_x5 plugin crashed!
This is always indicative of a bug within the plugin. It would be immensely
helpful if you could report the crash with the plugin's maintainers so that it
can be fixed. The output above should help diagnose the issue.
By troubleshooting on this, we noticed that there is a bug fix:
Fix nil panic by correcting nil check expression: https://github.com/hashicorp/terraform-provider-azurerm/pull/21850
This fix is included in terraform-provider-azurerm v3.57.0 (May 19, 2023):
https://github.com/hashicorp/terraform-provider-azurerm/blob/v3.57.0/CHANGELOG.md
BUG FIXES:
data.azurerm_kubernetes_cluster - prevent a panic when some values returned are nil (#21850)

## Architecture Change - Central Resources - Add Graph Database
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/77 (Daniel Scholl, 2023-09-06)

The addition of a Graph Database is required in order to support enhanced Entitlements and a new Entitlements Service based on graph database functionality. This database has been determined to be a Cosmos database leveraging the [Azure Cosmos Graph API](https://docs.microsoft.com/en-us/azure/cosmos-db/graph-introduction).
The entitlements database needs to be a single database for the OSDU stamp; it is not part of a Data Partition and is planned to be part of the Central Resources.
---
__Design__
Terraform resources exist in AzureRM for managing a Gremlin graph within a Cosmos account. These resources are different from those used by a SQL database and container. Two options exist for the module work.
1. Enhance the CosmosDB Module to support both SQL and Gremlin Databases.
2. Create a separate module for each database type that is independent.
There are no known advantages at the moment to a single module, so the default decision is to create a new module for this Graph API functionality.
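For reference, a minimal sketch of the two AzureRM resources such a module would wrap. The Terraform names, variables, and the partition key path are illustrative assumptions; the `osdu-graph` and `Entitlements` names match the log output earlier in this list:

```
resource "azurerm_cosmosdb_gremlin_database" "graph" {
  name                = "osdu-graph"
  resource_group_name = var.resource_group_name
  account_name        = var.cosmosdb_account_name
}

resource "azurerm_cosmosdb_gremlin_graph" "entitlements" {
  name                = "Entitlements"
  resource_group_name = var.resource_group_name
  account_name        = var.cosmosdb_account_name
  database_name       = azurerm_cosmosdb_gremlin_database.graph.name
  partition_key_path  = "/partitionKey"
}
```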
_Module Requirements_
- The module should, if possible, be as similar to the Cosmos DB module as possible.
_Template Requirements_
- Database will be named with the suffix of graph to distinguish from table or db
- Database will be created as part of the Central Resource Template
- Database will be locked
- Database location and replication location will be consistent in naming patterns to Data Partitions
- Database by default will use the same type of throughput settings as CosmosDB.
---
__Acceptance Criteria__
1. Architecture Diagram Change
2. Modify or create an infrastructure module responsible for adding Cosmos Graph Database.
3. Modify Central Resources to add the additional database.
4. Ensure all Module Unit Tests Pass
5. Ensure all Template Unit Tests and Integration Tests Pass
6. Update all required documentation
(milestone: January - 21; Daniel Scholl)

## Feature - Security rules for OSDU Infrastructure - Network
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/246 (Arturo Hernandez [EPAM], 2023-08-16)

| Done | Infra Relation | Rule |
|------|----------------|------|
| !740 | NETWORK | ~~Ensure keyvault is recoverable~~ |
| !825 | NETWORK | ~~Ensure that public network access is disabled for Azure Key Vaults~~ |
| !843 | NETWORK | Ensure that Azure CosmosDB does not allow access from all networks |
| !776 | NETWORK | ~~Ensure that public network access is disabled in Redis Cache~~ |
| !776 | NETWORK | ~~Ensure that Redis Cache uses private link~~ |
| !620 #218 | NETWORK | ~~Ensure that Azure Kubernetes Service Private Clusters is enabled~~ |
| !825 | NETWORK | ~~Ensure that Azure Key Vaults use Private Links~~ |
| | NETWORK | Ensure that Postgres DB use Private Links |
| | NETWORK | Ensure that Storage Accounts use Private Links |
| !879 | NETWORK | Ensure that Event Grid uses Private Links |
* [ ] All changes must be well documented, including any expected downtime
* [ ] TF scripts should work without errors in greenfield environments
* [ ] If a TF brownfield apply involves any migration or downtime, it must be documented
* [ ] Check if Cosmos/resource backup policies are affected by private endpoints
(assignees: Arturo Hernandez [EPAM], Igor Zimovets (EPAM))

## Update airflow resource requests and limits
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/176 (Aalekh Jain, 2023-08-16)

The integration tests for the GitLab environment are failing because of timeouts. When the number of DAGs in Airflow increases, there are not enough resources to execute the DAG runs; this slows down execution and leads to timeouts. We therefore need to request more resources for the following:
1. Airflow - WebUI
2. Airflow - Worker
3. Airflow - Scheduler
Link to the MR: !331
cc: @vineethguna @kibattul
(assignee: Aalekh Jain)

## Close Release 0.11
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/196 (MANISH KUMAR, 2023-08-16; assignee: Vivek Ojha)

## Enable BYOAD by adding feature flag for ad application in central resources
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/197 (Vivek Ojha, 2023-08-16)

## Enable XCOM Summary for Manifest Ingestion Dags
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/222 (harshit aggarwal, 2023-08-16)

Enable the XCOM summary for the Manifest Ingestion DAG. The IDs of the records ingested, as well as the ones that were skipped, can then be checked in the XCOM.

## Update Data loading Scripts for CSV/Manifest Ingestion to support packaged Dags
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/220 (harshit aggarwal, 2023-08-16)

Update the data loading scripts for CSV/Manifest ingestion to support packaged DAGs.

## WITSML Parser Dag Loading Scripts
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/221 (harshit aggarwal, 2023-08-16)

Add data loading scripts for the WITSML Parser DAG.

## Upgrade AGIC to 1.4.0 to support Health Probe annotation
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/223 (Sabarish K R E, 2023-08-16)

Upgrade AGIC to 1.4.0 to support the custom Health Probe annotation.

## Create Hierarchical storage account to support File collection
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/224 (Harshit Saxena, 2023-08-16)

To support the file collection feature, we need to initialize Azure Data Lake in the storage account.
MR - https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/merge_requests/570

## [Feature] Airflow2 stage with private endpoints
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/315 (Arturo Hernandez [EPAM], 2023-08-02)

# Airflow2 stage
---
By default airflow2 is deployed at the service resources stage: one airflow is configured for the whole OSDU.
It looks like one airflow2 is not enough for service resources in a multi-partition environment; therefore airflow2 is deployed externally per data partition, in a separated network and subnet (brand new airflow2 resources will be created).
In order to secure and improve performance when using an external airflow2, we need to set up private endpoints for those resources, including a private endpoint for the partition-airflow2 application gateway reachable from the main AKS cluster.
Airflow2 mostly interacts with the storage accounts, so I suspect those private endpoints will be needed as well.
## Airflow2 independent stage
---
I have a strong opinion that airflow2 should be segregated from the partition resources: if there is a need for a new external airflow, it should be created as a separate stage (like data-partition, service, central), i.e. some "airflow" resources stage which provides a configured airflow out of the box.
## ADF Replacement
---
To achieve convergence between ADME and community we might want to start thinking about Azure Data Factory, which is already available in [terraform - AzureRM](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/data_factory.html). We could start this migration smoothly at the data partition level and then at the service resource level.
## Action items
---
@lucynliu @nursheikh I would like to start this discussion here in the forum; it would be nice to work toward convergence between community and ADME. I have the feeling we should get rid of the per-partition Airflow2 resources (including the AKS for airflow) and, as a first stage, consider using ADF per partition as an optional feature, then move forward with ADF at Service Resources (I don't know yet whether ADF per partition is really convenient).
We should also include the optional feature of private endpoints from AKS to ADF/AKS-Airflow in any case.
cc. @lucynliu @vleskiv

## [Feature] Standardize airflow2 resources
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/256 (Arturo Hernandez [EPAM], 2023-08-02)

We are no longer using airflow1: Airflow2 was introduced in Release M9 (0.12) and was slowly adopted by all CSPs, yet we are now in M17 and still maintaining airflow1 resources.
In the dp resources, the `airflow2_enabled`, `deploy_dp_airflow`, and `dp_airflow_aks_version` variables are misleading, as they seem to depend on one another in the data partition stages.
The variable `airflow2_enabled` has no effect if `deploy_dp_airflow` is not also enabled. We have a mess of airflow options, and it would be wise to standardize and refactor all the infra/helm code to use only airflow2 in the upcoming releases if no one is using airflow1 anymore.
cc @lucynliu @nursheikh @shivani_karipe
(assignees: Arturo Hernandez [EPAM], Igor Zimovets (EPAM), shivani karipe)

## Feature - Security rules for OSDU Infrastructure - Encryption
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/245 (Arturo Hernandez [EPAM], 2023-08-01)

From EPAM security recommendations we got the following suggestions for *ENCRYPTION* to comply with:

| Done | Infra Relation | Rule |
|------|----------------|------|
| [ ] | ENCRYPTION | Ensure Storage Service Encryption is enabled for Storage Accounts |
| [ ] | ENCRYPTION | Ensure that Storage Accounts have infrastructure encryption enabled |
| [ ] | ENCRYPTION | Ensure Storage Accounts are using the latest version of TLS encryption |
| [ ] | ENCRYPTION | Ensure that "OS and Data" disks are encrypted with Customer Managed Key |
| [ ] | ENCRYPTION | Ensure that public network access is disabled in Managed Disks |
| [ ] | ENCRYPTION | Ensure that all unattached VM disks are encrypted |
| [ ] | ENCRYPTION | Ensure that Container Registries are configured to disable public network access |
| [ ] | ENCRYPTION | Ensure that Container Registries are encrypted with a customer-managed key |
All changes must be well documented, including any expected downtime.
It would be nice to test this in greenfield environments as well.
(assignees: Arturo Hernandez [EPAM], Igor Zimovets (EPAM), Siarhei Symanovich (EPAM), Aliaksei Kruk2)

## Feature - Security rules for OSDU Infrastructure - Network (ServiceBus PE)
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/260 (Vasyl Leskiv [SLB], 2023-07-13)

Currently the connection from AKS to Service Bus is established through a public endpoint. This has the following impact on highly loaded production:
- Security (public internet traffic)
- Load & auto scaling (AKS SNAT outgoing port limitation)
- Performance (latency)
Switching to private endpoints should resolve the items above.
(milestone: M19 - Release 0.22; assignees: Arturo Hernandez [EPAM], Srinivasan Narayanan, shivani karipe)