ADR - Platform Feature Flag Management Standardization
Status
- Proposed
- Trialing
- Under review
- Approved
- Retired
Context
- The OSDU Platform uses the Feature Flag pattern to control software releases: code can be deployed continuously to production environments while access to the functional change it enables is controlled separately. https://osdu.projects.opengroup.org/pmc/work-products/pmc-portal/pmc-policies/main/projects/feature-flag.html
- Feature Flag usage is inconsistent. Some flags are configured via environment variables, which makes them global; this is likely an artifact of how they were implemented rather than a requirement. Examples include Policy integration and Global Status Publishing. osdu/platform&29.
Problem Statement
- Platform maintenance and configuration are not centralized in one location; they are scattered across environment variables, application properties files, and Partition service configurations. Updating some feature flags requires modifying environment settings and redeploying the services.
- There is no proper way to configure a feature flag both globally and per tenant; each flag is either global or per tenant. For example, it is not possible to enable Policy integration for the platform as a whole but disable it for specific tenants.
- The global configurations currently managed through environment variables are too low-level: changing them requires updating config maps and redeploying services.
Proposal
It is proposed to hold feature flags for specific services in Partition info and to reuse the existing implementation with some enhancements. To support several flags for different services, the whole Partition info property set needs to be cached instead of a single property, as was implemented individually for the Policy service (note: that implementation is not used).
Decision
System Partition info should be used to determine whether a feature is disabled globally.
The existing environment variables approach will be deprecated; the system partition will hold all the global flags and variables. These system flags can be overridden by partition configurations, which allows granular feature control while still making it possible to define the behavior of the platform as a whole when granularity is not needed.
The implementation code may be moved to the common library for reuse in different services.
Naming conventions:
- System feature flags: system.feature.[feature_name].enabled. Example: system.feature.policy.enabled
- Partition feature flags: [partition_id].feature.[feature_name].enabled. Example: opendes.feature.policy.enabled
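The naming conventions above can be captured in two small helpers. This is only an illustrative sketch: the function names `system_flag_key` and `partition_flag_key` are hypothetical and do not refer to any existing OSDU library code.

```python
def system_flag_key(feature_name: str) -> str:
    """Build the key of a system (global) feature flag."""
    return f"system.feature.{feature_name}.enabled"


def partition_flag_key(partition_id: str, feature_name: str) -> str:
    """Build the key of a partition-specific feature flag."""
    return f"{partition_id}.feature.{feature_name}.enabled"


# The two examples from the naming conventions:
print(system_flag_key("policy"))                # system.feature.policy.enabled
print(partition_flag_key("opendes", "policy"))  # opendes.feature.policy.enabled
```

Keeping key construction in one place would make it harder for individual services to drift from the convention.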
Generic flow implementation is described in the diagram below:
- Partition info is taken from the cache if present; otherwise it is obtained through a Partition service API call and then cached for performance, with a reasonable, configurable invalidation timeout (5-10 minutes).
- The algorithm checks Partition info for the corresponding feature flags:
  - It checks the system partition flag first; if it is set to false, the feature is considered switched off.
  - Otherwise, it uses the partition-specific flag value.
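The flow above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation. Several details are assumptions that the ADR does not specify: the system partition id `"system"`, the names `PartitionInfoCache` and `is_feature_enabled`, representing Partition info as a flat string-to-string map, and the fallback behavior when a flag is missing (use the system value, then a default).

```python
import time
from typing import Callable, Dict, Optional, Tuple

Properties = Dict[str, str]
SYSTEM_PARTITION_ID = "system"  # assumption: the real id may differ


class PartitionInfoCache:
    """Caches the whole property set of each partition with a TTL."""

    def __init__(self, fetch: Callable[[str], Properties],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # stands in for the Partition service API call
        self._ttl = ttl_seconds  # configurable invalidation timeout
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Properties]] = {}

    def get(self, partition_id: str) -> Properties:
        now = self._clock()
        entry = self._entries.get(partition_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # cache hit, still fresh
        props = self._fetch(partition_id)
        self._entries[partition_id] = (now, props)
        return props


def _as_bool(value: Optional[str]) -> Optional[bool]:
    """Parse a flag value; None means the flag is not set."""
    if value is None:
        return None
    return str(value).strip().lower() == "true"


def is_feature_enabled(cache: PartitionInfoCache, partition_id: str,
                       feature: str, default: bool = False) -> bool:
    system_props = cache.get(SYSTEM_PARTITION_ID)
    system_value = _as_bool(system_props.get(f"system.feature.{feature}.enabled"))
    if system_value is False:
        return False  # switched off globally, partitions cannot re-enable it
    partition_props = cache.get(partition_id)
    partition_value = _as_bool(
        partition_props.get(f"{partition_id}.feature.{feature}.enabled"))
    if partition_value is not None:
        return partition_value
    # Assumption: with no partition-specific flag, fall back to the system
    # value, and to `default` when neither flag is set.
    return system_value if system_value is not None else default
```

With a system flag `system.feature.policy.enabled` set to `true` and `opendes.feature.policy.enabled` set to `false`, this sketch reports the feature as disabled for `opendes` and enabled for any partition that defines no flag of its own, which matches the Policy integration example in the problem statement.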
Rationale
- Standardized feature management makes it possible to introduce new features gradually, for example by enabling them for a single tenant for test purposes.
- Resources can be managed more efficiently. For example, GSM (Global Status Messaging) could be configured per tenant rather than per platform, enabling it only where it is needed.
- All features can be managed at runtime without restarts and redeployments.
- Features can be managed exclusively via API, which would allow an admin UI board to be implemented.
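To illustrate what managing a flag purely via API might look like, the sketch below builds (but does not send) an HTTP request that sets a partition flag. Everything specific here is an assumption to be verified against the Partition service API specification of the target deployment: the endpoint path, the `PATCH` method, the `properties` payload shape with `sensitive` and `value` fields, and the bearer token header. The helper name `build_flag_update_request` is hypothetical.

```python
import json
import urllib.request


def build_flag_update_request(base_url: str, partition_id: str, feature: str,
                              enabled: bool, token: str) -> urllib.request.Request:
    """Build a request that sets one feature flag on a partition."""
    key = f"{partition_id}.feature.{feature}.enabled"
    # Assumed payload shape: a map of property name to {sensitive, value}.
    body = {"properties": {key: {"sensitive": False,
                                 "value": str(enabled).lower()}}}
    return urllib.request.Request(
        # Assumed endpoint path; check the deployed Partition service.
        url=f"{base_url}/api/partition/v1/partitions/{partition_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="PATCH",
    )


# Example: disable Policy integration for the "opendes" partition.
request = build_flag_update_request(
    "https://osdu.example.com", "opendes", "policy", False, "<token>")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. An admin UI would issue the same kind of call, and services would pick up the change once their cached Partition info expires.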
Consequences
- We will have to enforce the use of the new approach for new features that are managed by feature flags. For that, we might need a guide on how to update Partition configurations for dev environments for each CSP.