OpenDES Ingestion Workflow Developer
These stories come from the data workflow team and represent the current capabilities available in OpenDES.
- As a developer, I expect to be able to develop my software with simple, standardized, strongly opinionated, and technology-agnostic interfaces.
- As a developer, I need to be able to ingest data from my company’s proprietary application into the appropriate data store through specialized (and potentially proprietary) DDMS services
- As a developer, I need the ability to ingest structured data from my company’s proprietary application (usually an RDBMS) directly into the Storage service
- As a developer, I need the ability to register a proprietary DDMS for data that doesn’t have a dedicated DDMS, for example, point cloud data (XYZ) with billions of points for a single record
- As a developer, I need the ability to register and de-register, on the fly, multiple proprietary domain file parsers, as well as the ability to automatically invoke one or more parsers depending on the type, source, size, tool, author, etc. of the data being stored in any DDMS, and to deploy and continuously evolve parser capabilities in the live environment
- As a developer, I need the ability to plug in a service that runs various algorithms on each data item as it is stored in the data platform and derives a quality score for it
- As a developer, I need the ability to build a self-service log enrichment process (quality checks, interpretations, etc.) that reacts to a "log measurement stored" data platform event
- As a developer, I need the ability to build a self-service document enrichment process (OCR, classification, extraction, etc.) that reacts to a "document stored" data platform event
- As a developer, I need the ability to make all the data liberated from my application discoverable to all other platform-native applications
- As a developer, I need the ability to build a self-service spatial indexer service that reacts to a "spatial data change" data platform event
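The event-driven stories above can be sketched as a small registry that maps data-platform event kinds to enrichment handlers, which produce derived entities without mutating the original record. This is an illustrative assumption, not the OpenDES API: `StorageEvent`, `EnrichmentRegistry`, and `quality_score` are hypothetical names.

```python
# Hypothetical sketch of an event-driven enrichment dispatcher; the real
# OpenDES event and DDMS APIs are not modeled here.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class StorageEvent:
    """Minimal stand-in for a 'record stored' data platform event."""
    record_id: str
    kind: str                  # e.g. "log-measurement", "document", "spatial"
    attributes: dict = field(default_factory=dict)

class EnrichmentRegistry:
    """Registers/de-registers handlers per event kind and dispatches events."""
    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[StorageEvent], dict]]] = {}

    def register(self, kind: str, handler) -> None:
        self._handlers.setdefault(kind, []).append(handler)

    def deregister(self, kind: str, handler) -> None:
        self._handlers.get(kind, []).remove(handler)

    def dispatch(self, event: StorageEvent) -> list:
        # Each matching handler emits a derived entity; the original
        # record is never modified.
        return [h(event) for h in self._handlers.get(event.kind, [])]

def quality_score(event: StorageEvent) -> dict:
    # Toy scoring rule (illustrative only): penalize missing attributes.
    expected = {"source", "tool", "author"}
    present = expected & event.attributes.keys()
    return {"record_id": event.record_id,
            "score": round(len(present) / len(expected), 2)}

registry = EnrichmentRegistry()
registry.register("log-measurement", quality_score)
result = registry.dispatch(
    StorageEvent("rec-1", "log-measurement", {"source": "rig-a", "tool": "gamma"}))
print(result)  # [{'record_id': 'rec-1', 'score': 0.67}]
```

Registering and de-registering handlers at runtime mirrors the "on the fly" parser story; dispatching on event metadata rather than a fixed type keeps the invocation rules evolvable.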
Added Sept 23, 2020
As a Workflow Manager, I need the ability to:
- Register components (parsers or otherwise) that can be reused and referenced from within multiple workflows.
- Register & trigger a customized workflow based on any number of data factors, not just data type (i.e. avoid architecting a rigid 1:1 relationship between ingestion workflows and data types)
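One way to read the "any number of data factors" requirement is workflow selection by arbitrary predicates over record metadata, rather than a rigid data-type lookup table. The sketch below assumes a hypothetical `WorkflowRouter` class and factor names; it is not part of OpenDES.

```python
# Hypothetical sketch: trigger workflows from predicates over data factors,
# so several workflows can match one record and no 1:1 type mapping exists.
from typing import Callable, Dict, List, Tuple

class WorkflowRouter:
    def __init__(self) -> None:
        # Each rule pairs a predicate over the factor dict with a workflow name.
        self._rules: List[Tuple[Callable[[Dict], bool], str]] = []

    def register(self, predicate: Callable[[Dict], bool], workflow: str) -> None:
        self._rules.append((predicate, workflow))

    def match(self, factors: Dict) -> List[str]:
        # Any number of workflows may match the same record.
        return [name for pred, name in self._rules if pred(factors)]

router = WorkflowRouter()
router.register(lambda f: f.get("type") == "las" and f.get("size_mb", 0) > 500,
                "large-las-ingestion")
router.register(lambda f: f.get("source") == "legacy-archive",
                "legacy-cleanup")
print(router.match({"type": "las", "size_mb": 900, "source": "legacy-archive"}))
# ['large-las-ingestion', 'legacy-cleanup']
```

Because predicates see the whole factor dictionary (type, size, source, tool, author, etc.), triggering rules can be added or changed without restructuring the workflows themselves.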
These guiding principles apply to all of the user stories that support OpenDES behavior:
- Simple, Dedicated and Efficient APIs as ingestion entry points for each data format
- Store original high-fidelity data as-is. Data in its original form must land first in the most appropriate store, e.g. a DLIS file must land in the File DMS and a ZGY seismic survey in the Seismic DMS. Once the data lands in the appropriate DMS, parser/scanner/enrichment processes can be applied to it; the original data remains stored as-is, and those processes output derived entities.
- The framework should inherently enforce the data lifecycle: Original -> Well-Known Structure -> Well-Known Entity
- Extensibility of Parsers/Scanner/Enrichment processes through Configurations/Registrations
- Track flow and lifecycle of data in the data platform
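The lifecycle principle (Original -> Well-Known Structure -> Well-Known Entity) could be enforced with a simple state machine that also tracks the record's history, covering the "track flow and lifecycle" principle. The stage names here come from the principles above, but the `RecordLifecycle` class is an illustrative assumption, not an OpenDES component.

```python
# Illustrative sketch: enforce the Original -> Well-Known Structure ->
# Well-Known Entity progression and record each transition.
ALLOWED = {
    "original": {"well-known-structure"},
    "well-known-structure": {"well-known-entity"},
    "well-known-entity": set(),          # terminal stage
}

class LifecycleError(Exception):
    """Raised when a record attempts an illegal stage transition."""

class RecordLifecycle:
    def __init__(self) -> None:
        self.stage = "original"          # original data always lands first
        self.history = ["original"]      # audit trail of the data's flow

    def advance(self, next_stage: str) -> None:
        if next_stage not in ALLOWED[self.stage]:
            raise LifecycleError(
                f"cannot move from {self.stage!r} to {next_stage!r}")
        self.stage = next_stage
        self.history.append(next_stage)

rec = RecordLifecycle()
rec.advance("well-known-structure")
rec.advance("well-known-entity")
print(rec.history)  # ['original', 'well-known-structure', 'well-known-entity']
```

Keeping the transition table (`ALLOWED`) as data rather than code is in the spirit of the configuration/registration extensibility principle: new stages could be registered without changing the enforcement logic.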
Ability to port Ingestion workflows that currently run on OpenDES into OSDU