# Open Subsurface Data Universe Open Test data

Open source project for the Open Subsurface Data Universe (OSDU) containing the open test data contributed to the project or gathered from other open sources.

## Use this initial structure for organizing the repository

### Work with the OSDU developers for any structural changes

```
repo_root/
├── current -> <most-recent-version>
└── <version>/
    ├── 1-data      # originally contributed, WIP, and prepared data
    ├── 2-scripts   # scripts to prepare instances and re-project data
    ├── 3-schemas   # schemas associated with each OSDU Resource Type in scope for the demo
    └── 4-instances # instances suitable for loading into OSDU:
                    #   OSDU types, reference data, master data, WPC manifests
```
Each `<version>` folder corresponds to a release iteration of the test data, driven by the versioning of schemas, the supported types, and possible extensions to the total data set.

`current` is a symlink to the most recent release of the test data, currently at `v2019-09-13`. Newer version folders are assumed to be work in progress.
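
For illustration, publishing a new release then amounts to adding a version folder and repointing the symlink. A minimal sketch, with a hypothetical version name:

```bash
# Sketch only: the version name below is illustrative.
mkdir v2020-01-01

# Repoint `current` at the new release (-n avoids descending into the old link).
ln -sfn v2020-01-01 current
```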

### Formatting JSON Files

JSON files have been formatted for human readability. The Python snippet below was used to format the files and should be run from the directory containing the files to be formatted.

The important parts are the 4-space indent and the alphabetical sorting of keys.

```python
import glob
import json
import os

ROOT_DIR = os.getcwd()

# Collect every JSON file in the current working directory.
file_paths = glob.glob(os.path.join(ROOT_DIR, '*.json'))

for file_path in file_paths:
    # Read the JSON document.
    with open(file_path, 'r', encoding='utf-8') as infile:
        file_json = json.load(infile)
    # Rewrite it in place with a 4-space indent and sorted keys.
    with open(file_path, 'w', encoding='utf-8') as outfile:
        json.dump(file_json, outfile, indent=4, sort_keys=True)
```
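
As a usage example, assuming the snippet is saved as `format_json.py` (a hypothetical filename), it can be run against one of the instance folders:

```bash
# Hypothetical script name; run from the folder whose JSON files need formatting.
cd <version>/4-instances
python format_json.py
```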
Significant effort has gone into preparing the original [Volve](https://data.equinor.com/) and [TNO](https://www.nlog.nl/datacenter/) datasets. The prepared dataset can be found in an AWS S3 bucket with public-read permission.

To list the dataset, use the following AWS CLI command.

```bash
aws s3 ls s3://osdu-seismic-test-data/ --no-sign-request
```
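
To walk the full object tree rather than just the top-level prefixes, the AWS CLI's standard `--recursive` flag can be added:

```bash
aws s3 ls s3://osdu-seismic-test-data/ --recursive --no-sign-request
```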

To copy the dataset to the local machine, use the following command.

```bash
aws s3 cp s3://osdu-seismic-test-data/[file] [target] --no-sign-request
```
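
To mirror the entire bucket instead of copying files one at a time, `aws s3 sync` accepts the same anonymous-access flag; the local target directory below is illustrative:

```bash
# Copies everything in the bucket into a local folder (the folder name is an example).
aws s3 sync s3://osdu-seismic-test-data/ ./osdu-seismic-test-data --no-sign-request
```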