# infra-azure-provisioning issues
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues

---

# Airflow Logs getting truncated in Log Analytics
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/234
devesh bajpai · 2022-08-02

Airflow logs created in blob store are sent to Log Analytics
refer : https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/tree/master/source/airflow-function
but it is observed that when an Airflow log entry spans multiple lines, those logs are truncated in Log Analytics
> e.g. Airflow logs in blob store
<pre>
[2022-06-10, 06:23:09 UTC] {validate_schema.py:322} ERROR - Schema validation error. Data field.
[2022-06-10, 06:23:09 UTC] {validate_schema.py:323} ERROR - Manifest kind: osdu:wks:work-product-component--WellboreTrajectory:1.1.0
[2022-06-10, 06:23:09 UTC] {validate_schema.py:324} ERROR - Error: None is not of type 'string'
Failed validating 'type' in schema['properties']['data']['allOf'][3]['properties']['AppliedOperations']['items']:
{'type': 'string'}
On instance['data']['AppliedOperations'][0]:
None
</pre>
> export from Log Analytics
<pre>
--------------------------------------------------------------------------------",INFO
"2022-06-10 06:23:09,305","Error: None is not of type 'string'",ERROR
"2022-06-10 06:23:09,305","Manifest kind: osdu:wks:work-product-component--WellboreTrajectory:1.1.0",ERROR
"2022-06-10 06:23:09,304","Schema validation error. Data field.",ERROR
"2022-06-10 06:23:09,026","Exporting the following env vars:
</pre>
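The truncation is consistent with a forwarder that treats each physical line of the blob as its own record, so the continuation lines of a multi-line message are detached or dropped. A minimal sketch of that failure mode — hypothetical parsing logic for illustration, not the actual airflow-function code:

```python
# Hypothetical per-line forwarder logic (an assumption, not the real code):
# each physical line becomes one Log Analytics record, so a multi-line
# ERROR message keeps only its first line.
def to_log_analytics_records(blob_text: str) -> list:
    records = []
    for line in blob_text.splitlines():
        # Continuation lines such as "Failed validating 'type' in schema..."
        # do not start with a "[timestamp]" prefix and are silently skipped.
        if line.startswith("["):
            records.append({"message": line})
    return records
```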
---

# Airflow Middleware Onboarding
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/1
Kiran Veerapaneni · 2021-02-01

The ingest project requires the use of Airflow as a Middleware layer to be running in AKS so that Ingest Services can leverage Airflow as a Workflow Engine.
- [x] Architecture Design of Required Azure Resources Necessary for Airflow
1. Postgres
2. Redis
3. File Storage
- [x] Host 3rd Party Source Code
1. airflow-function
2. airflow-statsd
- [x] GitLab Pipeline required to containerize and host containers
1. airflow-function
2. airflow-statsd
- [x] Host Helm Charts for installation
1. osdu-airflow
**Automation Onboarding**
- [x] create Pipelines for airflow deployment
- [x] Update the helm template task to run a Python script that adds a namespace to the generated Airflow YAMLs (see the sketch after this list)
- [x] Update the GitOps task to copy the charts generated from the airflow tar.gz into a different folder in the Flux repository
- [x] Execute Installation in Terraform
1. osdu-airflow
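A minimal sketch of what the namespace-injection script checked off above could look like, assuming PyYAML and placeholder arguments (the actual pipeline script may differ):

```python
import sys
import yaml  # PyYAML

def add_namespace(manifest_path: str, namespace: str) -> None:
    """Set metadata.namespace on every document in a rendered YAML file."""
    with open(manifest_path) as f:
        docs = [d for d in yaml.safe_load_all(f) if d]
    for doc in docs:
        doc.setdefault("metadata", {})["namespace"] = namespace
    with open(manifest_path, "w") as f:
        yaml.safe_dump_all(docs, f)

if __name__ == "__main__":
    add_namespace(sys.argv[1], sys.argv[2])  # e.g. airflow.yaml osdu
```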
---
__Acceptance Criteria__
1. Airflow Installs automatically as part of the service_resources template.
2. All Tests Pass
3. All Pipelines Pass
4. Documentation Exists
5. Services are able to leverage the Airflow Workflow Engine

Milestone: December · Assignees: Daniel Scholl, Hema Vishnu Pola [Microsoft] · Author: Daniel Scholl · Due: 2020-12-19

---

# Airflow Pipelines - Include airflow docker image build as part of pipelines
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/211
harshit aggarwal · 2021-11-12

Going forward we plan to install OSDU Python packages as part of the Airflow Docker image itself, rather than via extraPipPackages during Airflow chart deployment. As of now we are manually building the Airflow image; we should automate this by adding a pipeline step that builds the Airflow image with the relevant versions of the Python packages and publishes it to a container registry.
The changes to remove python packages from extraPipPackages are part of this [MR](https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/merge_requests/514).
The image hardcoded in the values file, **msosdu.azurecr.io/airflow-docker-image:v0.10**, already contains the latest release/0.12 Python packages.

---

# Airflow task logging not printing correlation-id
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/134
Kishore Battula · 2021-03-08

Airflow task logs have the correlation-id as None. Ideally this should be the correlation-id with which the workflow service run API was triggered.
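One common way to stamp a correlation-id onto task log records is a `logging.Filter`; the hedged sketch below assumes a hypothetical `CORRELATION_ID` environment variable and is not the actual OSDU implementation:

```python
import logging
import os

class CorrelationIdFilter(logging.Filter):
    """Attach a correlation-id to every log record."""
    def filter(self, record: logging.LogRecord) -> bool:
        # Falls back to "None" -- the symptom this issue reports.
        record.correlation_id = os.environ.get("CORRELATION_ID", "None")
        return True

handler = logging.StreamHandler()
handler.addFilter(CorrelationIdFilter())
handler.setFormatter(logging.Formatter("%(asctime)s [%(correlation_id)s] %(levelname)s - %(message)s"))
logging.getLogger("airflow.task").addHandler(handler)
```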
Assignee: Kishore Battula · Author: Kishore Battula

---

# AKS cluster failing to create for long environment names
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/146
Vineeth Guna [Microsoft] · 2021-06-23

Terraform fails to create an AKS cluster if the environment name is long.
The error thrown by terraform is related to https://docs.microsoft.com/en-us/azure/aks/troubleshooting#what-naming-restrictions-are-enforced-for-aks-resources-and-parameters
**Steps to reproduce**
- Set environment variable UNIQUE=ingestiontestenv
- Terraform workspace name = sr-ingestiontestenv
- Run the service resources terraform pipeline

---

# AKS Jobs don't rerun on redeploy
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/147
Mayank Saggar [Microsoft] · 2021-06-23

On redeploying a service, the jobs related to the service will not rerun if they completed in the previous deployment.
For example, this issue occurs when deploying Airflow: a post-install job that completed in the previous deployment will not redeploy, whereas all the other pods and services will be redeployed. This only occurs when the service was already deployed and a new update or change is rolled out.
For new deployments the jobs will be deployed and the script runs.
A quick workaround is to delete the job and its pod after running the charts/pipeline. Flux, as it monitors the deployment, will see that the job doesn't exist and deploy it.

---

# [AKS Policies] Fix volume types policy to comply with least privilege principle
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/227
Arturo Hernandez [EPAM] · 2022-06-07

Currently the policy applied for "Allowed volume types" allows `*`:
```json
{
  "effect": { "value": "audit" },
  "excludedNamespaces": { "value": ["kube-system", "gatekeeper-system", "azure-arc"] },
  "allowedVolumeTypes": { "value": ["*"] }
}
```
To support the Key Vault and CSI providers, we need to adopt the least privilege principle and get rid of the "all" expression.
Related to #218

---

# App Gateway needs to handle HTTP gracefully
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/27
Komal Makkar · 2021-06-14
## Type
<!-- Please choose the type of ticket. -->
- [x] Feature Request
- [ ] Bug Report
## Priority
- [ ] High
- [ ] Medium
- [ ] Low
Estimating the priority. We will update when we have gauged it.
------------------------
------------------------
## Feature Request
<!-- If this is a feature request, fill up the following -->
__Why is this change needed?__
<!-- Please add relevant details. -->
The HTTP request is not handled. Some services have Integration Tests that are failing with the current infra configuration.
1. There has to be graceful handling of HTTP requests.
2. HTTP might send the request to the service OR redirect to HTTPS.
__Current behavior__
<!-- Please describe the current behavior you observe -->
The Application Gateway is not acknowledging HTTP protocol requests. The connection is timing out.
__Expected behavior__
<!-- Please describe the behavior you anticipate -->
The Application gateway should redirect HTTP to HTTPS.
----------------------------
--------------------------
## Bug Report
<!-- If this is a bug report, fill up the following -->
__Breaking__
<!-- Is the bug breaking something. -->
- [ ] YES
- [ ] NO
__Attached Logs?__
<!-- Please attach relevant logs. -->
- [ ] YES
- [ ] NO
__Reproduction__
<!-- Please mention how often can you reproduce it. -->
__Current behavior__
<!-- Please describe the current behavior you observe -->
__Expected behavior__
<!-- Please describe the behavior you anticipate -->
__Steps to reproduce__
<!-- Please add how to reproduce the bug -->
--------------------------
--------------------------
## Other information
<!-- Any other information that is important to this Issue such as screenshots of how the component looks before and after the change. -->
The application gateway can be configured to route the HTTP requests to HTTPS.
https://docs.microsoft.com/en-us/azure/application-gateway/redirect-http-to-https-portal
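Once a redirect rule is in place, a quick smoke test could look like the following hedged sketch (hypothetical gateway host; `requests` assumed available):

```python
import requests

# Expect the gateway to answer plain HTTP with a redirect to HTTPS.
resp = requests.get("http://<app-gateway-host>/", allow_redirects=False)  # placeholder host
assert resp.status_code in (301, 302, 308), resp.status_code
assert resp.headers["Location"].startswith("https://")
```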
Assignee: Daniel Scholl · Author: Daniel Scholl

---

# App Gateway WAF request size limits possibly don't support necessary http body size requirements
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/25
Daniel Scholl · 2021-06-23

Currently the infrastructure is using WAF v2, which has a max request body limitation of 128 KB. This might not be adequate for what is necessary in a true production-type scenario. Further assessment is being done.
[Documentation](https://docs.microsoft.com/en-us/azure/web-application-firewall/ag/application-gateway-waf-configuration#waf-request-size-limits)
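A simple way to probe the limit is to post a body larger than 128 KB and check for an HTTP 413 response; a hedged sketch with a hypothetical endpoint:

```python
import requests

payload = b"x" * (256 * 1024)  # 256 KB, above the 128 KB WAF v2 default limit
resp = requests.post("https://<app-gateway-host>/api/echo", data=payload)  # placeholder endpoint
print(resp.status_code)  # 413 ("Request Entity Too Large") means the limit was hit
```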
---

# Application Insights (Classic) is retiring on 29 Feb 2024
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/258
Vasyl Leskiv [SLB] · 2023-11-07

https://github.com/azure-deprecation/dashboard/issues/141
As this change has an impact on all services, we have to prepare for it in advance and write a manual (if not covered by pipelines) on how to apply the change on production environments.
It would be good to check if any changes are required in the services' source code (AI library, etc.)

Assignee: Arturo Hernandez [EPAM] · Author: Arturo Hernandez [EPAM]

---

# Arch Change - Data Partition - Ingestion Workflow Database and Storage new collections and fileshares
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/106
Vineeth Guna [Microsoft] · 2023-10-19

Need to create new collections for the R3 ingestion workflow, as there is a change in partition key semantics.
Collections to be created
- WorkflowV2 - (Partition key - /partitionKey)
- WorkflowRunV2 - (Partition key - /partitionKey)
- WorkflowCustomOperatorV2 - (Partition key - /partitionKey)
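A hedged sketch of creating one of these collections with the azure-cosmos Python SDK; account, key, and database names are placeholders, and partition key version 2 is assumed to be what enables the large partition keys required below:

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<key>")
db = client.get_database_client("<database>")
db.create_container_if_not_exists(
    id="WorkflowV2",
    # version=2 opts in to large partition keys (verify against the SDK in use).
    partition_key=PartitionKey(path="/partitionKey", kind="Hash", version=2),
)
```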
File Share folders to be created
- plugins/hooks
- plugins/sensors
All the above collections should support large-size partition keys.

Assignee: Vineeth Guna [Microsoft] · Author: Vineeth Guna [Microsoft]

---

# Arch Change - Ingestion Workflow Cosmos Collections
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/123
Aalekh Jain · 2022-08-23

Need to create new collections for the R3 ingestion workflow, as there is a change in partition key semantics.
Collections to be created
* WorkflowTasksSharingInfoV2- (Partition key - /partitionKey)
The above collection should support large-size partition keys.

Milestone: M4 - Release 0.7 - remove · Assignee: Aalekh Jain · Author: Aalekh Jain

---

# Architecture Change - Central Resources - Add Graph Database
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/77
Daniel Scholl · 2023-09-06

The addition of a Graph Database is required in order to support enhanced Entitlements and a new Entitlements Service based on Graph Database functionality. This database has been determined to be a Cosmos Database and to leverage the [Azure Cosmos Graph API](https://docs.microsoft.com/en-us/azure/cosmos-db/graph-introduction).
The database for entitlements needs to be a single database for the OSDU stamp and is not part of a Data Partition and is planned to be a part of the Central Resources.
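For reference, a hedged sketch of connecting to a Cosmos Gremlin graph with `gremlinpython`; host, database, graph, and key are placeholders:

```python
from gremlin_python.driver import client, serializer

g = client.Client(
    "wss://<account>.gremlin.cosmos.azure.com:443/",  # placeholder Gremlin endpoint
    "g",
    username="/dbs/<database>/colls/<graph>",
    password="<key>",
    message_serializer=serializer.GraphSONSerializersV2d0(),
)
print(g.submit("g.V().count()").all().result())  # smoke-test query
```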
---
__Design__
Terraform Resources exist in AzureRM for managing a Gremlin Graph within a Cosmos Account. These resources are different than those used by a SQL Database and Container. Two options exist for the module work.
1. Enhance the CosmosDB Module to support both SQL and Gremlin Databases.
2. Create a separate module for each database type that is independent.
There are no known advantages at the moment as to why a single module would be of benefit so the default decision is to use a new module for this Graph API functionality.
_Module Requirements_
- The module should, if possible, be as similar to the Cosmos DB module as possible.
_Template Requirements_
- Database will be named with the suffix of graph to distinguish from table or db
- Database will be created as part of the Central Resource Template
- Database will be locked
- Database location and replication location will be consistent in naming patterns to Data Partitions
- Database by default will use the same type of throughput settings as CosmosDB.
---
__Acceptance Criteria__
1. Architecture Diagram Change
2. Modify or create an infrastructure module responsible for adding Cosmos Graph Database.
3. Modify Central Resources to add the additional database.
4. Ensure all Module Unit Tests Pass
5. Ensure all Template Unit Tests and Integration Tests Pass
6. Update all required documentation

Milestone: January - 21 · Assignee: Daniel Scholl · Author: Daniel Scholl

---

# Architecture Change - Data Partition - Add dedicated Storage Account for use by Ingestion Service
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/84
Aalekh Jain · 2021-06-14

### Service name : Ingestion Service
#### Why is this needed
Currently we only have a single container where the data is being stored for all the DAG runs. The data sharing across tasks is done by generating SAS tokens at the container level. This gives any DAG run access to the data of any other DAG run as well. This leads to a **security concern** regarding data storage and hence brings a requirement to change the existing infrastructure.
#### Current behavior
SAS tokens are generated at the container level; this container is dedicated to storing the data required for all the DAG runs.
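A hedged sketch of this container-level SAS pattern with the azure-storage-blob SDK (account name and key are placeholders); any DAG run holding such a token can read every other run's data in the shared container:

```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import ContainerSasPermissions, generate_container_sas

sas = generate_container_sas(
    account_name="<account>",                  # placeholder
    container_name="workflow-tasks-sharing",   # the single container shared by all DAG runs
    account_key="<account-key>",
    permission=ContainerSasPermissions(read=True, write=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)
```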
#### Expected behavior
The new change will add a storage account dedicated to the ingestion workflow, where containers will be created and deleted on the fly.
**Created** - Whenever we have the requirement to share data across tasks in workflow for a particular dag run.
**Deleted** - Once the DAG run is completed either with success or failure the container created for that DAG run is deleted.
As containers are created and deleted on the fly, a dedicated storage account is needed for this use case so that these temporary storage containers don't pollute the existing storage account.
**Other Solutions Considered**
We explored ways to handle this isolation at the directory level, where we would use a single container for storing data for all the DAG runs. There is no support for SAS generation at the directory level, which forced us to go with SAS generation at the container level.
#### Acceptance criteria
1. Adding a new storage account to the existing infra without breaking changes
2. Ensure the unit tests for infra-azure-provisioning pass
3. Update the ingestion service code to reflect the infra changes (`getSignedUrl` for the ingestion service to use the new storage account where the SAS tokens will be generated for the newly created containers)
4. Update all required documentation
5. Update architecture diagram
Storage account config requirements -
1. Replication type - LRS
2. Backup requirements - No backup
3. Data retention requirements - No data retention
## Prerequisites:
> The lock must be removed on the storage account prior to executing this change due to the removal action of a container from the storage account.
## Steps:
**Infrastructure Onboarding**
- [x] Creation of **new Storage Account**
- [x] **Deletion of storage container** - "workflow-tasks-sharing"
- [x] Obtain approval for any infrastructure requirements.
- [x] Implement any required infrastructure changes.
- [x] Obtain approval for merge request(s) containing infrastructure changes.
**Chart Onboarding**
**Integration Test Onboarding**
**Manual Onboarding**
**Automation Onboarding**

Milestone: January - 21 · Assignee: Daniel Scholl · Author: Daniel Scholl

---

# Architecture change - service resources - Add cosmos db and Storage account
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/163
Aman Verma · 2022-08-23

An additional Cosmos DB and Storage Account are needed in the service resources group to support shared schemas. This database/SA would be in addition to all the partition-specific Cosmos DBs/SAs.
---
__Design__
1. We already have a module for cosmos db. The same can be leveraged to create cosmos db in service resources.
2. We already have a module for Storage account. The same can be leveraged to create Storage Account in service resources.
_Module Requirements_
- Required modules are already present
_Template Requirements_
- Database will be named with the suffix of "system" to distinguish from table or db
- Database will be created as part of the service Resource Template
- Database will be locked
- Database location and replication location will be consistent in naming patterns to Data Partitions
- Database by default will use the same type of throughput settings as other CosmosDBs.
- Storage account will be named with the suffix of "system" to distinguish from other SAs
- Storage account will be created as part of the service Resource Template
- Storage account will be locked
- Storage account location and replication location will be consistent in naming patterns to Data Partitions
---
__Acceptance Criteria__
1. Architecture Diagram Change
2. Modify Central service to add the additional database/SA.
3. Ensure all Module Unit Tests Pass
4. Ensure all Template Unit Tests and Integration Tests Pass
5. Update all required documentation
cc: @polavishnu, @manishk

Milestone: M7 - Release 0.10 · Assignee: Aman Verma · Author: Aman Verma

---

# Airflow 2.0 Performance Improvements
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/213
Kishore Battula · 2021-11-17

**Topic:** `Airflow 2.0 Performance Improvements`
**Tasks**
- [ ] Airflow to support 10000 parallel DAG runs at any point in time
- [ ] Airflow autoscaling shouldn't disrupt running workflows as part of scale in.
- [ ] Airflow to support 8M queuing capacity
- [ ] Documentation with the necessary configuration to achieve the above performance targets

---

# Assign Redis db number for service seismic-dms
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/31
Diego Molteni · 2021-06-23

The seismic-dms service requires a Redis db number assigned.

---

# Assign Redis namespace number for services
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/13
ashley kelham · 2021-06-14

Multiple services are using the same Redis instance deployed.
There is a possibility of key conflicts between different services. This could cause data leakage, corruption, etc.
To prevent this we want different services to use different Redis namespaces (no. 0-15).
The infrastructure can assign the Redis database number for individual services to use so each service can just pull the assigned number.
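For example, a service assigned db 3 would connect like this (hedged sketch; host and db number are placeholders):

```python
import redis

# db=3 stands in for whichever of the 16 logical databases (0-15) was assigned.
r = redis.Redis(host="<redis-host>", port=6380, ssl=True, db=3)
r.set("my-service:some-key", "value")  # keys stay isolated from other services' dbs
```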
This keeps a centralized view on the separation, lets us see when we have hit capacity (15), and means each service has less work to do to maintain separation from the others.

Milestone: Sprint 10/25 - 10/31 · Assignee: Daniel Scholl · Author: Daniel Scholl

---

# Authentication hangs with AzureRM provider version 2.64.0 in Monitoring Resources terraform and needs update
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/320
Paweł Grudzień · 2023-09-18

**Description:**
When using the AzureRM Terraform provider at version `2.64.0`, the Monitoring Resources 'terraform apply' script hangs indefinitely without providing any error message. However, after updating the AzureRM provider to a newer version, the problem is resolved, suggesting an authentication issue with version `2.64.0`. I did not capture the logs sadly.
**Details:**
In a Terraform script with the below configuration:
```
terraform {
required_version = ">= 1.3"
backend "azurerm" {
key = "terraform.tfstate"
}
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "=2.64.0"
}
random = {
source = "hashicorp/random"
version = "=2.3.1"
}
}
}
```
The terraform apply command hangs indefinitely during execution. Although no error message was shown in standard logs, verbose logs indicated an authentication error to Azure.
**Expected Behavior:**
The terraform apply should either execute successfully or fail with a clear error message.
**Actual Behavior:**
The script hangs indefinitely without any feedback to the user.
**Steps to Reproduce:**
1. Use the AzureRM provider at version `2.64.0` in a Terraform script.
2. Execute the script.
3. Observe that it hangs without any clear error message.
**Workaround:**
Upgrading the AzureRM provider to a newer version (e.g., `3.73.0`) resolves the problem.
**Suggested Fix:**
Upgrade to the latest version of the provider in documentation.
**Environment:**
- Terraform version: (e.g., 1.5.1)
- AzureRM provider version where the issue was observed: `2.64.0`

---

# Automation gaps in Release process - Phase 1
https://community.opengroup.org/osdu/platform/deployment-and-operations/infra-azure-provisioning/-/issues/231
Krishna Nikhil Vedurumudi · 2022-08-11

- [x] Use helm-charts from the `helm-charts-azure` repository in GitLab pipelines.
- [x] Service CI-CD pipeline will get helm-charts from the "helm-charts-azure" repository in the azure-containerize step.
- [x] The helm-package step should also update "app-version" with the corresponding tag used for the docker image (see the sketch at the end of this issue).
- [x] Use helm upgrade command in "azure-deploy" step to push a new release.
- [x] When the service's CI-CD pipeline is triggered against a release branch or a tag, a "publish" stage should be executed that copies helm charts and docker images from the OSDU GitLab ACR to the "msosdu" ACR.
- [x] CI-CD for publishing non-service helm-charts-azure charts

Milestone: M13 - Release 0.16 · Assignees: Krishna Nikhil Vedurumudi, Arturo Hernandez [EPAM] · Author: Krishna Nikhil Vedurumudi
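A minimal sketch of the app-version bump referenced in the helm-package item above, assuming PyYAML and placeholder paths (the actual pipeline step may differ):

```python
import sys
import yaml  # PyYAML

def set_app_version(chart_yaml_path: str, image_tag: str) -> None:
    """Rewrite appVersion in Chart.yaml to match the docker image tag."""
    with open(chart_yaml_path) as f:
        chart = yaml.safe_load(f)
    chart["appVersion"] = image_tag
    with open(chart_yaml_path, "w") as f:
        yaml.safe_dump(chart, f, sort_keys=False)

if __name__ == "__main__":
    set_app_version(sys.argv[1], sys.argv[2])  # e.g. ./osdu-airflow/Chart.yaml v0.16.0
```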