Audit and Metrics issues
https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues
2023-06-16T21:20:50Z

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/44
Data API - File size transferred - provide alternative to gather stats over certain time period
2023-06-16T21:20:50Z
Debasis Chatterjee

The current implementation gives the size of one or more datasets from Seismic Store (SD_STORE).
It would be more helpful if we could obtain stats for a given period, covering all data uploaded to SD_STORE during that time.
That would help us understand how much data each ingestion added over time.
You may use the same construct as you did for some Platform APIs.
```
{
"metric_id":"cpu/usage_time",
"interval" : "3600",
"start_time":"01/06/23 21:55:19",
"end_time":"02/06/23 01:30:19"
}
```

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/43
Data API - Time to ingest - provide alternative to obtain stats showing multiple ingestions during a period
2023-06-16T21:16:26Z
Debasis Chatterjee

The current implementation gives the timing for one specific ingestion.
It would be more helpful if we could obtain stats for a given period, covering all ingestions performed during that time.
That would help us understand how much time each ingestion takes.
You may use the same construct as you did for some Platform APIs.
```
{
"metric_id":"cpu/usage_time",
"interval" : "3600",
"start_time":"01/06/23 21:55:19",
"end_time":"02/06/23 01:30:19"
}
```

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/45
Data API - File size transferred - provide option for File/Dataset service
2023-06-16T20:43:49Z
Debasis Chatterjee

The current implementation gives the timing for one specific ingestion, and also for Seismic DDMS.
Provide a similar option for files transferred through the normal File/Dataset service.
It would also be helpful if we could obtain stats for a given period, covering all ingestions performed during that time.
That would help us understand how much data each ingestion added.
You may use the same construct as you did for some Platform APIs.
```
{
"metric_id":"cpu/usage_time",
"interval" : "3600",
"start_time":"01/06/23 21:55:19",
"end_time":"02/06/23 01:30:19"
}
```

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/42
Data API - Search speed - provide alternative to gather statistics for certain period
2023-06-16T20:35:02Z
Debasis Chatterjee

The current implementation gives the timing for one specific search.
It would be more helpful if we could obtain stats for a given period, covering all searches performed during that time.
That would help us detect cases where a user's searches take a long time.
You may use the same construct as you did for some Platform APIs.
```
{
"metric_id":"cpu/usage_time",
"interval" : "3600",
"start_time":"01/06/23 21:55:19",
"end_time":"02/06/23 01:30:19"
}
```

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/41
Audit and Metrics is not portable
2023-01-11T17:33:02Z
Morris Estepa

The current implementation of Audit and Metrics requires users to directly modify a Python file (data.py) to get the service to run in a given environment. This makes the service non-portable.
We need to change the service so that it is configurable through environment variables.
Debasis Chatterjee, Chad Leong, Debasis Chatterjee

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/1
Top-level issue to track requirements for metrics, tracing and audit logging
2022-08-23T15:55:57Z
Raj Kannan

**Log, Metric Aggregation**
The OSDU Data platform is a large complex system with multiple services (SLIs) running against multiple Infrastructure components (System Telemetry). Holistic monitoring and event correlation requires aggregating logs and metrics from all of these sources.
- The OSDU operations readiness workstream recommends deploying and configuring a central logging service, decoupled and isolated from the Data Platform implementation, that can act as a simple point of access for filtering, searching, alerting, notification and dashboards.
- Isolation is important so that the logging system is not subject to the same reliability issues as the OSDU data platform.
### Alerts and Notification
- The system should be able to detect slow-burning and fast-burning issues based on thresholds and trends.
### Metric Examples
- availability - the % of successful responses
- latency & performance - the % of requests that complete faster than a target
- freshness - the % of data that is up to date
- correctness - the % of requests that return the correct result
- durability - the % of data that is recorded that can be read successfully
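
As an illustration of how one of these percentages could be computed, here is a minimal sketch of the latency metric. The function name, the millisecond unit and the sample values are assumptions made for the example; the issue does not prescribe them.

```python
from typing import Sequence


def latency_sli(durations_ms: Sequence[float], target_ms: float) -> float:
    """Return the percentage of requests that completed faster than target_ms."""
    if not durations_ms:
        raise ValueError("no requests observed")
    # Count the requests strictly below the target duration.
    fast = sum(1 for duration in durations_ms if duration < target_ms)
    return 100.0 * fast / len(durations_ms)


# Hypothetical sample: three of the four requests beat a 200 ms target.
samples = [120.0, 80.0, 310.0, 95.0]
print(latency_sli(samples, target_ms=200.0))  # 75.0
```

The other examples (availability, freshness, correctness, durability) follow the same "good events / all events" shape with a different definition of "good".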
### Overview from Ops Workstream
![Overview](/uploads/6488c20d2d78430f94afe9093016b62a/image.png)
Please refer to the following links
- [API Logging Requirements for Cyber Security](https://community.opengroup.org/osdu/platform/security-and-compliance/home/-/issues/6)
- [Platform logging requirements for Cyber Security](https://community.opengroup.org/osdu/platform/security-and-compliance/home/-/issues/5)
M1 - Release 0.1

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/40
GCP Endpoints Authentication/Authorization
2022-05-23T16:18:13Z
Siarhei Khaletski (EPAM)

# Context
The GCP/EPAM team has finished onboarding the service.
All endpoints of the service are currently open to the world. In our view, it is not secure to expose information about the GCP environment and its metrics.
We noticed the `auth.py` module, but it seems to be incomplete. From a security perspective, it needs additional attention.
# Issue
The MR !14 has some drawbacks: it uses the `username`, `password`, and `secret` properties to manage token validation (see the URI in the screenshot below).
![Screen_Shot_2022-03-09_at_2.21.43_PM](/uploads/7fa7900c9dac811e89ce6bbd3ac4bdaa/Screen_Shot_2022-03-09_at_2.21.43_PM.png)
For GCP, we can't use an external system to obtain the `x-access-token`.
# Expected Behavior
All GCP endpoints require an `access_token` (not an `id_token`) for user authentication and authorization.
The token should be obtained from the Google OAuth endpoint `https://oauth2.googleapis.com/token`.
On the code level `google.oauth` package can be used for the token validation.
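
A rough sketch of the checks such a validation step might apply is shown below. It assumes the service has already fetched the token's metadata as a dictionary (for example from Google's `tokeninfo` endpoint) and that the metadata carries `aud`, `exp` and `scope` fields. The function name, client ID and scope value are hypothetical, and the sketch makes no network call.

```python
import time
from typing import Mapping, Optional


def is_access_token_acceptable(
    token_info: Mapping[str, str],
    expected_audience: str,
    required_scope: str,
    now: Optional[float] = None,
) -> bool:
    """Check audience, expiry and scope of already-fetched token metadata."""
    now = time.time() if now is None else now
    # The token must have been issued for this service's client ID.
    if token_info.get("aud") != expected_audience:
        return False
    # "exp" is assumed to be a Unix timestamp encoded as a string.
    try:
        expires_at = int(token_info.get("exp", "0"))
    except ValueError:
        return False
    if expires_at <= now:
        return False
    # "scope" is assumed to be a space-delimited list of granted scopes.
    return required_scope in token_info.get("scope", "").split()


# Hypothetical metadata for illustration only.
info = {
    "aud": "example-client-id.apps.googleusercontent.com",
    "exp": "2000000000",
    "scope": "openid https://www.googleapis.com/auth/userinfo.email",
}
print(is_access_token_acceptable(
    info,
    expected_audience="example-client-id.apps.googleusercontent.com",
    required_scope="openid",
    now=1_900_000_000,
))  # True
```

Keeping the check a pure function over the metadata makes it easy to unit test independently of how the token is fetched.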
# Improvement Proposal
Potentially, all user access rights for the `Audit & Metrics` service can be managed by the `OSDU Entitlements service`.
M11 - Release 0.14
Srinivasan Ramamoorthi, Srinivasan Ramamoorthi

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/26
Search Performance Stability
2021-10-18T13:18:03Z
Stephen Whitley (Invited Expert)

## Description
The time to service a repeatable search request as seen from the outside.
## Scope
Search Service
## How Measured
Implement an external Probe to hit the Search Service with a well-known search request.
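
A probe of this kind could be sketched as follows. The request itself is passed in as a callable, so the sketch does not assume any particular Search endpoint or HTTP client; `probe`, `send_request` and the reported statistics are illustrative choices, not part of the service.

```python
import statistics
import time
from typing import Callable, Dict


def probe(send_request: Callable[[], object], runs: int = 5) -> Dict[str, float]:
    """Time a repeatable request several times and summarise the spread."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        send_request()  # the well-known search request goes here
        timings.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(timings),
        # Standard deviation needs at least two samples.
        "stdev_s": statistics.stdev(timings) if len(timings) > 1 else 0.0,
        "max_s": max(timings),
    }


# Stand-in for a real request: sleep for roughly 10 ms.
result = probe(lambda: time.sleep(0.01), runs=3)
print(result)
```

A low standard deviation across runs of the same request would indicate stable search performance; the median alone would hide that variation.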
## Where Measured
Probe

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/31
Service / Environment KPIs
2021-10-18T13:18:00Z
Stephen Whitley (Invited Expert)

## Description
Typical resource consumption rates for the OSDU Services
## Scope
All Services
## How to Measure
Infrastructure metrics for
- CPU
- Memory
- IOPs
upendra kumar, upendra kumar

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/22
Data Ingestion Latency
2021-10-18T13:17:53Z
Stephen Whitley (Invited Expert)

## Description
Loading latency, from request data load to data availability through Search
## Scope
Data ingestion workflow
## How Measured
Time of Index completion - Time of Load request
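
The subtraction above can be sketched as follows, assuming both timestamps have already been extracted from the logs as ISO 8601 strings with an explicit UTC offset. The function name and the sample timestamps are made up for the example.

```python
from datetime import datetime


def ingestion_latency_seconds(load_requested_at: str, index_completed_at: str) -> float:
    """Return index completion time minus load request time, in seconds."""
    requested = datetime.fromisoformat(load_requested_at)
    completed = datetime.fromisoformat(index_completed_at)
    latency = (completed - requested).total_seconds()
    if latency < 0:
        raise ValueError("index completion precedes the load request")
    return latency


# Hypothetical timestamps: indexing finished 4 minutes 30 seconds after the request.
print(ingestion_latency_seconds(
    "2021-10-18T13:00:00+00:00",
    "2021-10-18T13:04:30+00:00",
))  # 270.0
```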
## Where Measured
Indexing Service log
Workflow Service Log

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/34
Data Ingestion Volume
2021-10-18T13:17:37Z
Stephen Whitley (Invited Expert)

## Description
Total number of records (metadata) and content per Ingestion Job
## Scope
- Ingestion workflow
- DDMS

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/25
Average Search Performance
2021-10-18T13:16:38Z
Stephen Whitley (Invited Expert)

## Description
The time to service a search request as seen by the users
## Scope
Search Service
## How Measured
API request/response time
## Where Measured
API Gateway / Load Balancer

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/19
Service Availability
2021-10-18T13:16:24Z
Stephen Whitley (Invited Expert)

## Description
The availability of core services as measured by the percentage of successful requests
## Scope
Core Services
## How computed
Successful http requests / Total http requests
where successful http requests include: 2XX, 4XX & unsuccessful includes 5XX
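
A minimal sketch of this ratio, assuming the gateway can supply a list of response status codes. The issue does not say how 1XX or 3XX responses should be treated, so the sketch leaves them out of both the numerator and the denominator; that choice is an assumption.

```python
from typing import Iterable


def availability(status_codes: Iterable[int]) -> float:
    """Return successful requests / total requests as a fraction."""
    successful = 0
    total = 0
    for code in status_codes:
        if 200 <= code < 300 or 400 <= code < 500:
            # 2XX and 4XX count as successful: the service answered correctly.
            successful += 1
            total += 1
        elif 500 <= code < 600:
            # 5XX counts as unsuccessful.
            total += 1
        # Other classes (1XX, 3XX) are ignored here by assumption.
    if total == 0:
        raise ValueError("no 2XX, 4XX or 5XX responses observed")
    return successful / total


print(availability([200, 201, 404, 500, 302]))  # 0.75
```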
## Where measured
API Gateway

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/32
API Request Activity and Backlog
2021-10-18T13:14:28Z
Stephen Whitley (Invited Expert)

## Description
Monitor the HTTPS Request queue for each endpoint. Provide a moving window count (activity in the last hour, day, week)
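
The moving-window count could be sketched like this for a single endpoint, assuming the request timestamps have already been collected from the API or the logs. `WINDOWS`, `window_counts` and the sample data are illustrative names and values only.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta
from typing import Dict, List

# Window lengths named in the issue: last hour, day and week.
WINDOWS = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


def window_counts(request_times: List[datetime], now: datetime) -> Dict[str, int]:
    """Count requests falling inside each window that ends at `now`."""
    ordered = sorted(request_times)
    upper = bisect_right(ordered, now)  # ignore timestamps after `now`
    counts = {}
    for name, span in WINDOWS.items():
        lower = bisect_left(ordered, now - span)
        counts[name] = upper - lower
    return counts


now = datetime(2021, 10, 18, 12, 0, 0)
times = [
    now - timedelta(minutes=30),
    now - timedelta(hours=5),
    now - timedelta(days=3),
    now - timedelta(days=10),
]
print(window_counts(times, now))  # {'hour': 1, 'day': 2, 'week': 3}
```

Running the same function once per endpoint gives the per-endpoint view the description asks for.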
## Where measured
Either at the API endpoints or the Service logs

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/21
Search Activity
2021-10-18T13:11:19Z
Stephen Whitley (Invited Expert)

## Search Activity
Number of search requests returned as a moving window (1 week).
## Scope
Search Service
## How Measured
Either:
Number of Search requests hitting the platform as measured from the API
or
Number of Search requests mined from the Search Log
## Where Measured
Search API or Search Audit Log

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/39
Elasticity: Autoscaling
2021-10-18T11:43:23Z
Stephen Whitley (Invited Expert)

# SLI Title
## SLI Group
- [ ] Data Access & Egress
- [ ] Data Access Rights
- [ ] Data Governance
- [ ] Data Ingest
- [ ] Data Quality
- [ ] Data Volume
- [ ] Delivery
- [X] Platform Performance
- [X] Platform Cost
- [ ] Platform Traction
- [ ] Search
- [ ] Other
## How Measured
Capture Autoscaling events (scale up/scale down)
## Dependencies

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/38
Data Ingestion Status
2021-10-18T11:41:34Z
Stephen Whitley (Invited Expert)

# Breakdown of Data Ingestion by Status
Error : violates ingestion rule
Failure : The workflow fails
Success : The ingestion job completes correctly.
## SLI Group
- [ ] Data Access & Egress
- [ ] Data Access Rights
- [ ] Data Governance
- [X] Data Ingest
- [ ] Data Quality
- [x] Data Volume
- [ ] Delivery
- [ ] Platform Performance
- [ ] Platform Traction
- [ ] Search
- [ ] Other
## How Measured
Ingestion Logs
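
A breakdown along these lines could be produced as sketched below. The log format is not specified in this issue, so the sketch assumes the status of each ingestion job has already been extracted from the logs as one of the three labels defined above; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# The three statuses defined in this issue.
STATUSES = ("Error", "Failure", "Success")


def status_breakdown(job_statuses: Iterable[str]) -> Dict[str, int]:
    """Count ingestion jobs per status, reporting zero for absent statuses."""
    counts = Counter(job_statuses)
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unexpected statuses: {sorted(unknown)}")
    return {status: counts.get(status, 0) for status in STATUSES}


print(status_breakdown(["Success", "Error", "Success", "Failure"]))
# {'Error': 1, 'Failure': 1, 'Success': 2}
```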
## Dependencies

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/37
SeismicDMS API's Error in Azure Preshipping Environment r3m8.
2021-10-06T18:55:19Z
Ananth T

@manishk @todaiks As per the recent discussion (with Kamlesh) and further execution and analysis, we are unable to move forward with the SeismicDMS APIs in the Azure Preshipping environment r3m8. Only the Legal Tag (New seismic store) request returns a response; the subsequent APIs end with 404 errors.
The detailed results are attached. Kindly review the attached file and reply with a possible solution.
[Azure_Preshipr3m8_Errors_in_SeismicDMS_Collection_1021.pdf](/uploads/e0a6660ba2718c71c0cf47b6eee9af56/Azure_Preshipr3m8_Errors_in_SeismicDMS_Collection_1021.pdf)
MANISH KUMAR, MANISH KUMAR

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/33
Data Ingestion Success Rate (Data Related)
2021-09-09T06:58:30Z
Stephen Whitley (Invited Expert)

## Description
## Scope
Data Ingestion
## How Measured
Gross:
- Total successful Jobs / Jobs
Fine grained:
- Total successful records / total records
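
The two ratios can be sketched as follows. `JobResult` and its fields are hypothetical; they stand in for whatever the ingestion workflow eventually emits per job.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class JobResult:
    """Hypothetical per-job summary taken from ingestion status logs."""
    succeeded: bool
    records_total: int
    records_ok: int


def success_rates(jobs: List[JobResult]) -> Tuple[float, float]:
    """Return (gross, fine-grained) success rates as fractions."""
    if not jobs:
        raise ValueError("no ingestion jobs observed")
    # Gross: successful jobs / all jobs.
    gross = sum(1 for job in jobs if job.succeeded) / len(jobs)
    # Fine grained: successful records / all records across every job.
    total_records = sum(job.records_total for job in jobs)
    ok_records = sum(job.records_ok for job in jobs)
    fine = ok_records / total_records if total_records else 0.0
    return gross, fine


jobs = [
    JobResult(succeeded=True, records_total=100, records_ok=100),
    JobResult(succeeded=False, records_total=50, records_ok=20),
]
print(success_rates(jobs))  # (0.5, 0.8)
```

The two numbers can diverge noticeably: here half the jobs failed, yet 80% of the records were ingested.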
## Where Measured
Status logs from Ingestion workflow / operators
Metrics to be defined and emitted.
Ananth T, Mohd Asad Shaikh, Ananth T

https://community.opengroup.org/osdu/platform/deployment-and-operations/audit-and-metrics/-/issues/35
Platform Data Volume
2021-09-09T06:14:54Z
Stephen Whitley (Invited Expert)

## Description
These are two measurements
1. Total amount of data "known" to the platform
2. Total infrastructure resources consumed by the Data Platform
## Scope
1. Data referred to by the Storage Service
2. Document Store, Blob Store, Elastic Store
## How Measured
1. Query storage service for Records, and chase content data references to capture full content
2. Infrastructure metrics
Mohd Asad Shaikh, upendra kumar, Mohd Asad Shaikh