ADR: Enabling User Context in Ingestion
Currently, Ingestion Jobs in OSDU such as CSV Parser and Manifest Ingestion use a Service Account Token when calling OSDU Service APIs such as Storage or Dataset. As a result, every Authorization check for API Access or Data Level Access (ACL checks in the Storage service) is based on the permission level of the Service Account rather than that of the User who initiated the Ingestion in the first place.
Users therefore indirectly get the highest level of permissions in OSDU, which can be used to modify the data of other users in the system (a scenario from CSV Parser is discussed later to illustrate the issue). This problem is not specific to Ingestion; it can affect any service that performs long running jobs and relies on the Service Account Token. The rest of this ADR uses the Ingestion scenario and related flows to describe the problem and the solution, but the same reasoning can apply to OSDU in general.
A similar problem was raised in the Issue - osdu/platform/data-flow/ingestion/csv-parser/csv-parser!219 (closed)
A temporary fix was proposed in - osdu/platform/data-flow/ingestion/csv-parser/csv-parser!219 (closed)
This ADR aims to solve the problem holistically for similar use cases
Note: In this ADR, every reference to the Service Account Token means the Service Account Token used internally by OSDU Services for Service to Service calls, so it represents a privileged user. We are not referring to any external Service Accounts that Clients might use for external applications
- Why can't these long running Jobs rely on the User Tokens passed in request Headers by the End User? Because the jobs are long running, the Tokens will eventually expire and need renewal, and the System can't renew them as it has no access to User specific Credentials and Auth Codes
Scenario to Understand this Issue
In CSV Ingestion, the IDs of generated Records are predetermined from Natural Keys [Code Link], so two users may ingest records with the same IDs
Now say User A invokes CSV Ingestion and tries to ingest an existing record again (created by a different User B with xyz ACLs). User A, who is effectively trying to update this record, may not have access to the ACLs associated with it. However, since the Ingestion Job uses the Service Account Token, ACL validations will succeed in the Storage service Create/Update Record flow (Service Accounts are part of the users.data.root group, which gives them access over the entire data in the system)
As a result, User A updates records created by User B, causing data loss for the original user. This is not expected behavior and is a major gap in Authorization. A similar issue can happen in Manifest Ingestion if a user tries to ingest a Manifest with the same Record IDs.
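The scenario can be illustrated with a toy model of the ACL check. This is only a sketch under stated assumptions, not the Storage service implementation: the `can_write` helper and every group name other than users.data.root are hypothetical.

```python
# Toy model of the ACL check described above; not the Storage service code.
ROOT_GROUP = "users.data.root"  # group named in this ADR


def can_write(record_owner_acls, caller_groups):
    """Return True if the caller is in the root group or in one of the record's owner ACL groups."""
    if ROOT_GROUP in caller_groups:
        return True
    return bool(set(record_owner_acls) & set(caller_groups))


# Hypothetical group memberships, for illustration only.
record_acls = {"data.team-b.owners"}       # ACLs set by User B on the record
user_a_groups = {"data.team-a.owners"}     # User A is not in User B's ACL group
service_account_groups = {ROOT_GROUP}      # internal Service Account

# Checked against User A's own groups, the update is rejected.
user_a_allowed = can_write(record_acls, user_a_groups)

# Checked against the Service Account, the same update is accepted.
service_account_allowed = can_write(record_acls, service_account_groups)
```

Because the Ingestion Job calls Storage with the Service Account Token, the second check is the one that runs today, which is how User A's update goes through.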
Key Issue to Address
Any Flow invoked by an External User should preserve the user context and use it for Authorization. Hence every User to Service Call, and all subsequent internal Service to Service Calls, should carry the user identity (context)
Asynchronous flows like Indexer Queue, WKS, Notification etc. can still rely on the Internal Service Account for Authorization, as these are background system operations and not user driven
We can leverage the SPI Layer (Service Mesh/API Gateway) responsible for Authentication & Identity Resolution in this scenario. As part of Entitlements V2 onboarding, Authentication and identity resolution were extracted from the Entitlements service, which now expects the identity to be provided in each request. The x-user-id header is an expected header on requests into the service. It carries the identity of the user making the request and is set by the SPI Layer (Service Mesh/Gateway)
Service Side Changes
A new header x-on-behalf-of will be introduced which will store the user identity (context)
Workflow Service will add a new field, user identity (present in DPS Headers), to the Airflow Conf while triggering the DAG Run
Ingestion Jobs (CSV/Manifest) will extract the newly added user identity field from Airflow Context and then set x-on-behalf-of header in the requests before calling any downstream services
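The service side changes above could look roughly like the following. This is a minimal sketch, not the osdu-airflow or Workflow Service code: the function names and the `user_id` conf key are assumptions made for illustration, and only the x-on-behalf-of header name comes from this ADR.

```python
ON_BEHALF_OF_HEADER = "x-on-behalf-of"  # header introduced by this ADR
USER_ID_CONF_KEY = "user_id"            # hypothetical key name inside the Airflow Conf


def build_dag_run_conf(execution_context, user_identity):
    """Workflow Service side: copy the user identity into the conf passed to the DAG Run."""
    conf = dict(execution_context)  # do not mutate the caller's dictionary
    conf[USER_ID_CONF_KEY] = user_identity
    return conf


def build_downstream_headers(dag_run_conf, service_account_token):
    """Ingestion Job side: authenticate as the Service Account, act on behalf of the user."""
    headers = {"Authorization": f"Bearer {service_account_token}"}
    user_identity = dag_run_conf.get(USER_ID_CONF_KEY)
    if user_identity:
        # Only set the header when the Workflow Service supplied an identity.
        headers[ON_BEHALF_OF_HEADER] = user_identity
    return headers


# Example: identity flows from the trigger request into the downstream call headers.
conf = build_dag_run_conf({"Payload": {}}, "user-a@example.com")
headers = build_downstream_headers(conf, "service-account-token")
```

Keeping both helpers in a shared library means individual DAGs never build these headers by hand.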
Change in SPI Layer (Service Mesh)
If the request contains the Internal Service Account Token and the x-on-behalf-of header is not empty or null, then the x-user-id header will be set to the value of the x-on-behalf-of header
Else the x-user-id header will be set by the existing logic
This allows preserving the User Identity (Context) and hence all API Level and Data Level Authorization checks will be performed based on Entitlement Groups of the User rather than Service Account
Authentication can still be carried out using Service Account token as the User was already authenticated when they triggered Workflow API
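The SPI Layer rule above can be sketched as a small function. Each CSP would implement this in its own Service Mesh or Gateway, so this is only an illustration: `resolve_user_id`, its parameters, and the assumption that header names arrive in lower case are all hypothetical.

```python
def resolve_user_id(headers, is_internal_service_account, user_id_from_existing_logic):
    """Return a copy of the request headers with x-user-id set per the rule in this ADR.

    headers: dict of lower-cased request header names to values (assumption).
    is_internal_service_account: True if the token belongs to the Internal Service Account.
    user_id_from_existing_logic: identity the existing logic would produce (e.g. from JWT claims).
    """
    resolved = dict(headers)
    on_behalf_of = (headers.get("x-on-behalf-of") or "").strip()

    if is_internal_service_account and on_behalf_of:
        # Internal call carrying a user context: authorize as that user.
        resolved["x-user-id"] = on_behalf_of
    else:
        # Any other request: keep the existing behaviour.
        resolved["x-user-id"] = user_id_from_existing_logic
    return resolved


internal_with_context = resolve_user_id(
    {"x-on-behalf-of": "user-a@example.com"}, True, "service-account@example.com"
)
internal_without_context = resolve_user_id({}, True, "service-account@example.com")
external_spoof_attempt = resolve_user_id(
    {"x-on-behalf-of": "user-b@example.com"}, False, "user-a@example.com"
)
```

Note that a request without the Internal Service Account Token never has its x-user-id taken from x-on-behalf-of, so a regular user cannot impersonate another user by setting the header.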
Scope of Proposed Solution
The Solution proposed above is for Trusted DAGs only and not for custom private DAGs. By trusted we mean code that has been reviewed and signed off by all Stakeholders, like the code for OSDU services. Trusted DAGs therefore include all the existing community DAGs such as CSV Parser, Manifest, SGY, VDS, WITSML etc.
Going forward we might want to support clients bringing in their own custom private DAGs. The obvious question is then how we will enforce that those new DAGs/Ingestors adhere to the proposed guidelines, including setting the x-on-behalf-of header. We will publish guidelines for developers to follow when bringing new DAGs, but the enforcement angle is still missing and needs to be ironed out. To state it clearly, this ADR does not cover custom private DAGs; a separate ADR and discussion will be proposed for them
Advantages of Above Approach
ACL validations will be performed based on user-id instead of Service Account which resolves the elevated permissions issue
Service side code changes are minimal and will be scoped to setting up request headers/payloads
Implementation of the header logic can be taken up by each CSP as per their Infra
Hardens the Authorization checks as it also ensures that only Users with appropriate API permissions will be able to Trigger Ingestion
No change in service behavior expected
User to Service Calls and Service to Service Calls
This is a proposed enhancement in extension to the above approach to further harden the security of the system. Currently in OSDU deployment, the Internal Service Account Token can be directly used by clients to invoke OSDU APIs
The Privileged Service Account Token should be restricted to use by internal services only, and external users shouldn't be permitted to call OSDU APIs with it. Customers can still create and use any other Service Account for external calls, e.g. for use cases like an external monitoring service. The only restriction is on the usage of the internal account for external calls. This will block any malicious tampering with the x-on-behalf-of header from outside and also increases security in case of an accidental Service Account Token leak.
A mechanism is needed to distinguish external calls from internal calls; this can be handled by each CSP in their own Service Mesh Implementation
Block any external calls made using Internal Service Account Token
Details on response codes can be fleshed out later
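The blocking rule can be written as a single predicate. How a call is classified as external and how the Internal Service Account Token is recognized is left to each CSP, so both inputs below are assumed to be computed elsewhere and the function name is hypothetical. No response code is shown because that detail is still open.

```python
def should_block(is_external_call, is_internal_service_account):
    """Block only external calls that present the Internal Service Account Token."""
    return is_external_call and is_internal_service_account


# Decision table for the four possible combinations.
decisions = {
    ("external", "internal-sa"): should_block(True, True),    # blocked
    ("external", "other-token"): should_block(True, False),   # allowed: user or external service account
    ("internal", "internal-sa"): should_block(False, True),   # allowed: service to service call
    ("internal", "other-token"): should_block(False, False),  # allowed
}
```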
Q - Is there any security Issue by sending the user Identifiers in plain text in request headers during service-to-service communication?
A - Here CSPs can leverage their infrastructure and enable encryption of all the traffic sent between service containers in OSDU
Q - What is the Service Account we are referring to in the ADR?
A - In the ADR, every reference to the Service Account Token means the Service Account Token used internally by OSDU Services for Service to Service calls, so it represents a privileged user. We are not referring to any external Service Accounts/SPN that Clients might use for external applications
Q - Ingestors (DAGs) can execute any piece of Code and the onus is on these DAGs to set the x-on-behalf-of header with the user identity, so how do we ensure the header is indeed set correctly with the right user identity?
A - This question is about consumption, i.e. how the DAGs will consume the proposed design changes. We would leverage core libraries to implement any Http clients and header manipulation logic in the DAGs. For instance, the osdu-airflow library is used across all the DAGs, and the CSV Parser uses the os-core-common Java library. Hence all the logic should be scoped to the common libraries, and developers of new DAGs should consume only these libraries.
This still doesn't resolve the enforcement angle, i.e. how we can detect and reject DAGs that do not adhere to this design. As mentioned, that is not in scope for this ADR
Q - Do we need any validation in Service Mesh to ensure x-on-behalf-of header was correctly set by Ingestors?
A - This is related to the previous question. For the scope of this ADR we assume that DAGs and OSDU services are trusted and won't manipulate the headers incorrectly
Q - Will this handle Scenarios where an Ingestion Job is executed outside of the System Airflow Cluster, for instance by an external application or an external Airflow Cluster?
A - Any outside call to OSDU services using the Internal Service Account will be blocked, so the x-on-behalf-of flow is not involved for external ingestion jobs. The existing logic of JWT claims extraction will be followed instead. These external jobs should either use User tokens or create a new service account for token generation, which automatically takes care of elevated privileges
Q - Are we proposing blocking all Service Account calls from outside as part of Security Enhancement?
A - No, only the external calls using the internal privileged Service Account will be blocked