OpenDES Data Manager/Administrator User Stories
These stories come from the data workflow team and represent the current capabilities available in OpenDES.
As a data manager or data administrator, I expect to be able to ingest data through several workflows, as follows:
- I need to be able to ingest wellbore data from a single file which may contain one or many data records inside of it that represent a single entity type (e.g. a CSV file that represents wellbores), and have that data available in the Wellbore DDMS (see the ingestion sketch after this list).
- I need to be able to ingest wellbore measurements from a single file which may contain many log channels and/or represent one or more entity types (e.g. a LAS file in LAS 2.0 format) and have that data available in the Wellbore DDMS.
- I need to be able to ingest survey geometries from a single file which may contain one or many files inside it that together represent a single entity type (e.g. a shapefile that represents country geometries), and have that data available in the spatial canvas.
- I need to be able to ingest a single data pack of files which represents a single “dataset” (e.g. several PDFs, ZIPs, CSVs, TIFFs, etc.). Each of the files could contain one or more entity types, be ingested in different ways, and potentially be stored in different data stores, but the pack is collectively managed as a single unit (e.g. a single “ingestion job”).
- I need to be able to track the status of ingestion jobs, including the data flows throughout the system: what’s been ingested into storage, what’s been transformed into the standard schema (“WKS”), what’s been merged into a WKE (if anything), what’s been indexed in search, what’s been indexed into the spatial canvas, what’s been indexed into the analytics canvas, etc. (see the job-tracking sketch after this list).
- I need to be able to perform basic job management functions on ingestion (and downstream) jobs – such as start, pause, resume, and stop.
- I need to be able to understand what data has been ingested directly into the various DDMSs, for example when a data migration writes directly to the Wellbore DDMS.
- I need to be able to manage the schemas of my data that I bring into the system – for example adding an attribute, or declaring frame of reference information such as units, date formats, etc.
- I need the ability to configure mappings from “Raw” to “WKS” and have the data automatically transformed into the standardized schema (WKS) (see the mapping sketch after this list).
- I need the ability to configure merge rules from “WKS” to “WKE” and automatically create the merged records.
- I need the ability to quickly visualize file metadata for large files (~5 GB) so that I can provide the inputs that downstream ingestion needs to happen seamlessly, such as declared units, CRS, and other frame of reference information.
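As a rough illustration of the single-file wellbore story above, the sketch below uploads a CSV and starts an ingestion workflow run over it. This is a minimal sketch only: the endpoint paths, headers, workflow name (csv_wellbore_ingestion), and response field names are assumptions, and the deployed OpenDES/OSDU File and Workflow services may differ.

```python
"""Minimal sketch: ingest a single CSV of wellbores.

Endpoint paths, headers, the workflow name, and response field names are
assumptions for illustration only; the deployed services may differ.
"""
import requests

BASE_URL = "https://osdu.example.com"  # hypothetical platform host
HEADERS = {
    "Authorization": "Bearer <access-token>",  # placeholder credentials
    "data-partition-id": "opendes",            # hypothetical data partition
}


def ingest_wellbore_csv(path: str) -> str:
    """Upload a CSV of wellbore records and start an ingestion workflow run."""
    # 1. Ask the file service for a signed upload location (assumed endpoint).
    loc = requests.get(f"{BASE_URL}/api/file/v2/files/uploadURL", headers=HEADERS)
    loc.raise_for_status()
    upload = loc.json()

    # 2. Push the raw file bytes so the original lands in storage as-is.
    with open(path, "rb") as f:
        requests.put(upload["Location"]["SignedURL"], data=f).raise_for_status()

    # 3. Trigger a (hypothetical) CSV ingestion workflow against the uploaded file.
    run = requests.post(
        f"{BASE_URL}/api/workflow/v1/workflow/csv_wellbore_ingestion/workflowRun",
        headers=HEADERS,
        json={"executionContext": {"fileId": upload["FileID"]}},
    )
    run.raise_for_status()
    return run.json()["runId"]  # assumed response field


if __name__ == "__main__":
    print("Started ingestion run:", ingest_wellbore_csv("wellbores.csv"))
```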
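For the job-tracking and job-management stories, the next sketch polls a run until it finishes and requests a basic control action. The status endpoint, terminal state names, and supported actions are assumptions rather than a documented API.

```python
"""Sketch: track an ingestion run and request a basic control action.

The status endpoint, terminal state names, and supported actions are
assumptions; the real Workflow service may expose different paths and verbs.
"""
import time

import requests

BASE_URL = "https://osdu.example.com"  # hypothetical host
HEADERS = {"Authorization": "Bearer <token>", "data-partition-id": "opendes"}


def wait_for_run(workflow: str, run_id: str, poll_seconds: int = 30) -> dict:
    """Poll an ingestion run until it reaches a terminal state."""
    while True:
        r = requests.get(
            f"{BASE_URL}/api/workflow/v1/workflow/{workflow}/workflowRun/{run_id}",
            headers=HEADERS,
        )
        r.raise_for_status()
        status = r.json()
        if status.get("status") in {"finished", "failed"}:  # assumed terminal states
            return status
        time.sleep(poll_seconds)


def control_run(workflow: str, run_id: str, action: str) -> None:
    """Request 'pause', 'resume', or 'stop' on a run (assumed action names)."""
    requests.put(
        f"{BASE_URL}/api/workflow/v1/workflow/{workflow}/workflowRun/{run_id}",
        headers=HEADERS,
        json={"status": action},
    ).raise_for_status()
```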
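For the Raw-to-WKS mapping story, here is a toy sketch of a declarative mapping and the transform it drives. The mapping format and the WKS attribute names are illustrative assumptions, not the platform’s actual mapping-service schema.

```python
"""Toy sketch of a "Raw" -> "WKS" mapping configuration and its application.

The mapping format and the WKS attribute names below are illustrative
assumptions, not the platform's actual mapping-service schema.
"""

# Hypothetical declarative mapping: raw CSV column -> WKS attribute (+ frame of reference hints).
WELLBORE_MAPPING = {
    "WELL_NAME": {"target": "data.FacilityName"},
    "TVD": {"target": "data.VerticalMeasurement.Depth", "unit": "m"},
    "LAT": {"target": "data.SpatialLocation.Latitude", "crs": "EPSG:4326"},
    "LON": {"target": "data.SpatialLocation.Longitude", "crs": "EPSG:4326"},
}


def to_wks(raw_row: dict, mapping: dict = WELLBORE_MAPPING) -> dict:
    """Apply the mapping to one raw record, producing a nested WKS-style dict."""
    wks: dict = {}
    for source, rule in mapping.items():
        if source not in raw_row:
            continue
        node = wks
        *parents, leaf = rule["target"].split(".")
        for key in parents:  # build the nested structure as needed
            node = node.setdefault(key, {})
        node[leaf] = raw_row[source]
    return wks


if __name__ == "__main__":
    print(to_wks({"WELL_NAME": "A-1", "TVD": 2450.0, "LAT": 29.7, "LON": -95.3}))
```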
Added Sept 23, 2020
- Ingest child objects, in some cases before their parent well, and bind the child to the parent when the parent becomes available in the system (particularly useful when migrating existing data into the platform).
- Perform mass ingestion at a “new environment” scale (minimum 12 million+ records as a bulk load)
- Perform compositing of data records that correspond to the same object but may have been sourced from different existing systems of record (SoRs).
- Specify multiple alias names when creating a well record and search by any one of those alias names (see the alias sketch after this list). I don’t recall whether the latest iteration of OSDU already supports searching for any wells that have a particular alias name while allowing a well to carry multiple indexed aliases; although this is a search-related need, there would be implications for ingestion (and data definitions) to make it possible.
- Perform bulk updates of records based on a condition (e.g. update all wells whose operator is “OPCO USA” to “OPCO”); see the bulk-update sketch after this list.
- Monitor/track the progress of Ingestion jobs (I know that this is being delivered as part of the Ingestion Workflow design, but figured it should be formally captured)
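For the multi-alias story in the list above, here is a sketch of a well record carrying several alias names and a query that matches on any of them. The record shape, the indexed field name (data.NameAliases.AliasName), and the Search endpoint and query syntax are assumptions for illustration.

```python
"""Sketch: a well record with several aliases, and a search on any alias.

The record shape, the indexed field name, and the Search endpoint/query
syntax are assumptions for illustration only.
"""
import requests

BASE_URL = "https://osdu.example.com"  # hypothetical host
HEADERS = {"Authorization": "Bearer <token>", "data-partition-id": "opendes"}

# A well record declaring multiple aliases at creation time (shape is assumed).
well_record = {
    "kind": "osdu:wks:master-data--Well:1.0.0",
    "data": {
        "FacilityName": "North Field A-1",
        "NameAliases": [
            {"AliasName": "A-1"},
            {"AliasName": "NF-A1"},
            {"AliasName": "30/06-A-1"},
        ],
    },
}


def find_wells_by_alias(alias: str) -> list:
    """Return wells whose alias list contains the given name (assumed query syntax)."""
    query = {
        "kind": "osdu:wks:master-data--Well:1.0.0",
        "query": f'data.NameAliases.AliasName:"{alias}"',
    }
    r = requests.post(f"{BASE_URL}/api/search/v2/query", headers=HEADERS, json=query)
    r.raise_for_status()
    return r.json().get("results", [])
```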
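For the conditional bulk-update story, the sketch below finds every well whose operator is “OPCO USA” and rewrites it to “OPCO”. The field name, query syntax, and Storage endpoint are assumptions, and a production job would also need paging, versioning, and error handling.

```python
"""Sketch: conditional bulk update of well records.

Field name, query syntax, and Storage endpoint are assumptions; a production
job would also need paging, versioning, and error handling.
"""
import requests

BASE_URL = "https://osdu.example.com"  # hypothetical host
HEADERS = {"Authorization": "Bearer <token>", "data-partition-id": "opendes"}


def bulk_rename_operator(old: str, new: str) -> int:
    """Update matching well records in place and return how many were changed."""
    # 1. Find candidate records via the search index (assumed query syntax).
    search = requests.post(
        f"{BASE_URL}/api/search/v2/query",
        headers=HEADERS,
        json={
            "kind": "osdu:wks:master-data--Well:1.0.0",
            "query": f'data.Operator:"{old}"',
            "limit": 1000,
        },
    )
    search.raise_for_status()
    records = search.json().get("results", [])

    # 2. Rewrite the attribute and push the records back as new versions.
    for record in records:
        record["data"]["Operator"] = new
    if records:
        requests.put(
            f"{BASE_URL}/api/storage/v2/records", headers=HEADERS, json=records
        ).raise_for_status()
    return len(records)
```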
These guiding principles apply to all of the user stories that support OpenDES behavior:
- Simple, Dedicated and Efficient APIs as ingestion entry points for each data format
- Store original, high-fidelity data as-is. Data in its original form must land first in the most appropriate store, e.g. a DLIS file must land in the File DMS, a ZGY seismic survey must land in the Seismic DMS, etc. Once the data, in its original form, lands in the appropriate DMS, Parsers/Scanners/Enrichment processes can be applied to it. The original data is stored as-is, and the parser/scanner/clean-up processes output derived entities.
- The framework should inherently enforce the data lifecycle: Original → Well Known Structure (WKS) → Well Known Entity (WKE)
- Extensibility of Parsers/Scanners/Enrichment processes through Configurations/Registrations (see the registration sketch after this list)
- Track flow and lifecycle of data in the data platform
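To make the registration principle concrete, here is a sketch of a parser registry expressed purely as configuration, so new formats can be added without changing framework code and the Original → WKS → WKE lifecycle stays visible per format. The registry structure and field names are assumptions, not an actual OpenDES/OSDU registration API.

```python
"""Sketch: parser/enrichment registration as configuration.

The registry structure and field names are assumptions, not an actual
OpenDES/OSDU registration API.
"""

# Hypothetical parser registrations keyed by the original file format they accept.
PARSER_REGISTRY = {
    "LAS2": {
        "handler": "parsers.las2.parse",          # entry point the framework would invoke
        "lands_in": "wellbore-ddms",              # where the original file is stored as-is
        "emits_kind": "osdu:wks:work-product-component--WellLog:1.0.0",
        "lifecycle": ["original", "wks", "wke"],  # stages the framework enforces
    },
    "SHAPEFILE": {
        "handler": "parsers.shapefile.parse",
        "lands_in": "file-dms",
        "emits_kind": "osdu:wks:master-data--GeoPoliticalEntity:1.0.0",
        "lifecycle": ["original", "wks"],
    },
}


def route(file_format: str) -> dict:
    """Look up which registered parser should handle an incoming original file."""
    try:
        return PARSER_REGISTRY[file_format]
    except KeyError:
        raise ValueError(f"No parser registered for format {file_format!r}")
```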
Ability to port Ingestion workflows that currently run on OpenDES into OSDU
For more information, contact @tdixon.