
GONRG-3993 Migrate csv-parser to anthos (new approach)

Anastasiia Gelmut requested to merge gcp-obm-oqm-migration into master

Type of change

  • Bug Fix
  • Feature

Please provide a link to the GitLab issue or ADR (Architecture Decision Record)

Does this introduce a change in the core logic?

  • [NO]

Does this introduce a change in the cloud provider implementation, if so which cloud?

  • AWS
  • Azure
  • GCP
  • IBM

Does this introduce a breaking change?

  • [NO]

What is the current behavior?

Csv-parser works with Blob stores and messaging services directly.

What is the new/expected behavior?

The csv-parser service will use the EPAM OBM (Blob store) and OQM (message broker) mappers for data management flexibility.

Have you added/updated Unit Tests and Integration Tests?

yes

Any other useful information

Features of implementation

This is a universal solution built on the EPAM OBM and OQM mapper technology. It allows the service to work with various implementations of data stores and message brokers.

Limitations of the current version

In the current version, the mappers are equipped with drivers for the following stores and message brokers:

  • OBM (mapper to Blob stores): Google Cloud Storage (GCS); MinIO
  • OQM (mapper to message brokers): Google PubSub; RabbitMQ

Extensibility

To use any other store or message broker, implement a driver for it. Because the set of drivers is extensible, the solution can be ported to other stores and brokers without modifying the main service code.
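The driver idea can be illustrated with a minimal sketch. It does not reflect the actual EPAM OBM API: `BlobDriver`, `InMemoryBlobDriver` and `CsvFileStore` are hypothetical names, and the in-memory driver only stands in for a real GCS or MinIO driver. The point is that service code depends on an interface, so adding a store means adding an implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical driver contract: each Blob store technology would implement it.
interface BlobDriver {
    void write(String bucket, String key, byte[] content);
    Optional<byte[]> read(String bucket, String key);
}

// Hypothetical in-memory driver, standing in for a real GCS or MinIO driver.
class InMemoryBlobDriver implements BlobDriver {
    private final Map<String, byte[]> blobs = new HashMap<>();

    @Override
    public void write(String bucket, String key, byte[] content) {
        // Store a copy so later changes to the caller's array do not leak in.
        blobs.put(bucket + "/" + key, content.clone());
    }

    @Override
    public Optional<byte[]> read(String bucket, String key) {
        return Optional.ofNullable(blobs.get(bucket + "/" + key)).map(byte[]::clone);
    }
}

// Hypothetical service component: it only knows the BlobDriver interface,
// so switching stores means passing a different driver, not editing this class.
class CsvFileStore {
    private final BlobDriver driver;

    CsvFileStore(BlobDriver driver) {
        this.driver = driver;
    }

    void save(String bucket, String fileName, String csv) {
        driver.write(bucket, fileName, csv.getBytes(StandardCharsets.UTF_8));
    }

    Optional<String> load(String bucket, String fileName) {
        return driver.read(bucket, fileName)
                .map(bytes -> new String(bytes, StandardCharsets.UTF_8));
    }
}
```

For example, `new CsvFileStore(new InMemoryBlobDriver())` works the same way a store backed by any other `BlobDriver` implementation would.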

Mapper tuning mechanisms

This service uses specific implementations of DestinationResolvers based on tenant information provided by the OSDU Partition service. A total of four resolvers are implemented, one for each supported technology, divided into two groups:

for universal technologies:
  • for MinIO: mappers/oqm/MioTenantOqmDestinationResolver.java
  • for RabbitMQ: mappers/oqm/MqTenantOqmDestinationResolver.java
Their algorithm is as follows:
  • the incoming Destination carries the data-partition-id
  • the resolver calls the Partition service and gets PartitionInfo
  • from PartitionInfo, the resolver retrieves the connection properties: URL, username, password, etc.
  • the resolver creates a data source, connects to the resource, and caches the data source
  • the resolver passes the data source to the mapper in the Resolution object
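The universal resolver algorithm above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this merge request: `PartitionClient`, `MqDataSource`, `Resolution`, `TenantMqDestinationResolver` and the `mq.*` property keys are hypothetical, PartitionInfo is modelled as a plain `Map<String, String>`, and the sketch only builds a connection descriptor rather than actually connecting to RabbitMQ.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the Partition service client.
interface PartitionClient {
    Map<String, String> getPartitionInfo(String dataPartitionId);
}

// Hypothetical connection descriptor built from partition properties.
record MqDataSource(String url, String username, String password) {}

// Hypothetical result handed back to the mapper.
record Resolution(String dataPartitionId, MqDataSource dataSource) {}

class TenantMqDestinationResolver {
    // Hypothetical property keys; real keys come from the partition configuration.
    static final String URL_KEY = "mq.url";
    static final String USER_KEY = "mq.user";
    static final String PASSWORD_KEY = "mq.password";

    private final PartitionClient partitionClient;
    // One data source per data-partition-id, reused across calls.
    private final Map<String, MqDataSource> cache = new ConcurrentHashMap<>();

    TenantMqDestinationResolver(PartitionClient partitionClient) {
        this.partitionClient = partitionClient;
    }

    // Input: the data-partition-id carried by the incoming Destination.
    Resolution resolve(String dataPartitionId) {
        // Query the Partition service and build the data source only on first use.
        MqDataSource dataSource = cache.computeIfAbsent(dataPartitionId, this::createDataSource);
        // Hand the data source to the mapper in the Resolution object.
        return new Resolution(dataPartitionId, dataSource);
    }

    private MqDataSource createDataSource(String dataPartitionId) {
        Map<String, String> info = partitionClient.getPartitionInfo(dataPartitionId);
        // A real resolver would also open the broker connection here.
        return new MqDataSource(
                require(info, URL_KEY), require(info, USER_KEY), require(info, PASSWORD_KEY));
    }

    private static String require(Map<String, String> info, String key) {
        String value = info.get(key);
        if (value == null) {
            throw new IllegalStateException("Missing partition property: " + key);
        }
        return value;
    }
}
```

Keying the cache by data-partition-id with `ConcurrentHashMap.computeIfAbsent` means the Partition service is queried once per tenant, and later destinations for the same tenant reuse the stored data source.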
for native Google Cloud technologies:
  • for GCS: mappers/oqm/CsTenantOqmDestinationResolver.java
  • for PubSub: mappers/oqm/PsTenantOqmDestinationResolver.java
Their algorithm is similar, except that they do not receive connection properties from the Partition service, because the location of the resources is unambiguously known: they are in the GCP project. Credentials are not needed either: data is accessed on behalf of the Google Identity service account (SA) under which the service itself runs. Therefore, the resolver takes only the value of the projectId property from PartitionInfo and uses it to connect to the resource in the corresponding GCP project.
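A corresponding sketch for the native Google Cloud resolvers, again only an illustration: `PartitionInfoClient`, `GcpResolution` and `TenantGcpDestinationResolver` are hypothetical names, and PartitionInfo is modelled as a `Map<String, String>`. It reads only the `projectId` property and carries no credentials, since a real client would authenticate as the service account the service runs under.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the Partition service client.
interface PartitionInfoClient {
    Map<String, String> getPartitionInfo(String dataPartitionId);
}

// Hypothetical resolution: only the GCP project is needed. No credentials are
// stored, because access uses the service account the service runs under.
record GcpResolution(String dataPartitionId, String projectId) {}

class TenantGcpDestinationResolver {
    // The only property read from PartitionInfo.
    static final String PROJECT_ID_KEY = "projectId";

    private final PartitionInfoClient partitionClient;
    private final Map<String, GcpResolution> cache = new ConcurrentHashMap<>();

    TenantGcpDestinationResolver(PartitionInfoClient partitionClient) {
        this.partitionClient = partitionClient;
    }

    GcpResolution resolve(String dataPartitionId) {
        return cache.computeIfAbsent(dataPartitionId, id -> {
            // Any other connection properties in PartitionInfo are ignored.
            String projectId = partitionClient.getPartitionInfo(id).get(PROJECT_ID_KEY);
            if (projectId == null) {
                throw new IllegalStateException("Missing projectId for partition " + id);
            }
            return new GcpResolution(id, projectId);
        });
    }
}
```

Compared with the universal resolver, the only per-tenant input here is the project ID. A real GCS or Pub/Sub client would be created for that project using the ambient service-account identity.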
