# Open Subsurface Data Universe Open Test Data
Open source project for the Open Subsurface Data Universe (OSDU) containing the open test data contributed to the project or gathered from other open sources.
## Use this initial structure for organizing the repository
### Work with the OSDU developers for any structural changes
```
repo_root/
├── current -> <most-recent-version>
└── <version>/
    ├── 1-data       # originally contributed, wip, and prepared data
    ├── 2-scripts    # scripts to prepare instances and re-project data
    ├── 3-schemas    # schemas associated with each OSDU Resource Type in scope for demo
    └── 4-instances  # instances suitable for loading to OSDU:
                     #   osdu types, reference data, master data, WPC manifests
```
Each `<version>` folder corresponds to a release iteration of the test data, as driven by the versioning of schemas, supported types, and possible extensions to the total data set.
`current` is a symlink to the most recent release of the test data, currently `v2019-09-13`. Newer versions are assumed to be work in progress.
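To check which release `current` points at, the symlink can be read directly; a minimal sketch, assuming it is run from the repository root:

```python
import os

# Resolve the `current` symlink to the release directory it targets,
# e.g. 'v2019-09-13'. Assumes the working directory is the repo root.
print(os.readlink('current'))
```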
JSON files have been formatted for human readability. The Python snippet below was used to format the files and should be run from the directory containing the files to be formatted.
The important parts are the 4-space indent and the alphabetical sorting of keys.
```python
import glob
import json
import os

ROOT_DIR = os.getcwd()
file_paths = glob.glob(os.path.join(ROOT_DIR, '*.json'))

for file_path in file_paths:
    # Load each JSON file, then rewrite it in place with a
    # 4-space indent and alphabetically sorted keys.
    with open(file_path, 'r', encoding='utf-8') as infile:
        file_json = json.load(infile)
    with open(file_path, 'w', encoding='utf-8') as outfile:
        json.dump(file_json, outfile, indent=4, sort_keys=True)
```
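Because the indent is fixed and the keys are sorted, re-running the snippet over already-formatted files produces identical output, so diffs stay limited to real data changes.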
### TNO / Volve Dataset
Significant effort has gone into preparing the original [Volve](https://data.equinor.com/) and [TNO](https://www.nlog.nl/datacenter/) datasets. The prepared dataset is available in an AWS S3 bucket with public-read permission.
To list the contents of the dataset, use the following AWS CLI command.
```bash
aws s3 ls s3://osdu-seismic-test-data/ --no-sign-request
```
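The `--no-sign-request` flag makes the AWS CLI send unsigned requests, so no AWS credentials are needed to read this public bucket.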
To copy the dataset to your local machine, use the following command.
```bash
aws s3 cp s3://osdu-seismic-test-data/[file] [target] --no-sign-request
```
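The bucket can also be read programmatically. Below is a minimal sketch using boto3, assuming the bucket name from the CLI examples above; the unsigned-request configuration is the boto3 equivalent of `--no-sign-request`:

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous (unsigned) S3 client for the public test-data bucket.
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

# List the first few objects in the bucket.
resp = s3.list_objects_v2(Bucket='osdu-seismic-test-data', MaxKeys=10)
for obj in resp.get('Contents', []):
    print(obj['Key'], obj['Size'])
```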