# Introduction

Wellbore Domain Data Management Services (Wellbore-DDMS) is one of several backend services that make up the Open Subsurface Data Universe (OSDU) software ecosystem. It is a single, containerized service written in Python that provides an API for wellbore-related data.

[[_TOC_]]

## Install Software and Packages

1. Clone the os-wellbore-ddms [repository](https://community.opengroup.org/osdu/platform/domain-data-mgmt-services/wellbore/wellbore-domain-services.git)
2. Download [Python](https://www.python.org/downloads/) >=3.7
3. Ensure pip, the package installer that ships with Python, is installed and upgraded to the latest version.

      ```bash
      # Windows
      python -m pip install --upgrade pip
      python -m pip --version

      # macOS and Linux
      python3 -m pip install --upgrade pip
      python3 -m pip --version
      ```

4. Using pip, download [FastAPI](https://fastapi.tiangolo.com/), the main framework to build the service APIs. To install fastapi and uvicorn (to work as the server), run the following command:

    ```bash
    pip install "fastapi[all]"
    ```

5. [venv](https://docs.python.org/3/library/venv.html) allows you to manage separate package installations for different projects. It lets you create an isolated "virtual" Python installation and install packages into that virtual environment. venv is included in the Python standard library and requires no additional installation.
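
As a quick sanity check of the prerequisites above, the following sketch reports whether the running interpreter meets the Python >=3.7 requirement and whether it is running inside a virtual environment. The helper names are illustrative and are not part of the Wellbore-DDMS code base.

```python
import sys

MIN_PYTHON = (3, 7)  # minimum version stated in the install steps


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def in_virtualenv() -> bool:
    """Return True inside a venv, where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Python supported:", python_is_supported())
print("Inside a virtual environment:", in_virtualenv())
```

Running it before and after activating a virtual environment should flip the second line from `False` to `True`.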

### FastAPI Dependencies

- [pydantic](https://pydantic-docs.helpmanual.io/): provides data validation using Python type annotations. It enforces type hints at runtime to provide more robust data validation.
  - [dataclasses](https://pydantic-docs.helpmanual.io/usage/dataclasses/): a Python module that provides a decorator and functions for automatically adding generated special methods to user-defined classes.
- [starlette](https://fastapi.tiangolo.com/features/#starlette-features): a lightweight ASGI framework. FastAPI is a subclass of Starlette and inherits features such as WebSocket support, startup and shutdown events, and session and cookie support.

### Additional Dependencies

- [uvicorn](https://www.uvicorn.org/) used as the ASGI server to run the Wellbore-DDMS app
- [cachetools](https://pypi.org/project/cachetools/)
- [pyjwt](https://pypi.org/project/PyJWT/) and [cryptography](https://pypi.org/project/cryptography/) for auth purposes
- [pandas](https://pandas.pydata.org/) and [numpy](https://numpy.org/) for data manipulation
- [pyarrow](https://pypi.org/project/pyarrow/) for loading and saving data in Parquet format
- [opencensus](https://opencensus.io/guides/grpc/python/) for tracing and logging on cloud providers
- [dask](https://docs.dask.org/en/latest/) to manage large amounts of bulk data

### Library Dependencies

- Common parts and interfaces
  - osdu-core-lib-python

- Implementation of blob storage on GCP
  - osdu-core-lib-python-gcp

- Implementation of blob storage and partition service on Azure
  - osdu-core-lib-python-azure

- Client libraries for OSDU data ecosystem services
  - osdu-data-ecosystem-search
  - osdu-data-ecosystem-storage

## Project Startup

### Dask Configuration - Locally
By default, Dask uses all available memory and CPU resources through workers. The number of workers is determined by the number of cores on the local machine.

### Dask Configuration - In a cluster
In a container context such as Kubernetes, we recommend setting the container memory limit to 3Gi of RAM with 4-8 CPUs.
The minimum is 1.2Gi and 1 CPU. Performance will be reduced, but this is enough to handle WellLogs of 10 curves with 1M values each.

Note: container memory is not entirely dedicated to Dask workers; the FastAPI service process also requires some.
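
To put the sizing guidance in perspective, here is a rough back-of-the-envelope estimate of the raw size of the WellLog mentioned above. It assumes each value is stored as an 8-byte float (an assumption for illustration, not something the service guarantees) and ignores index columns and processing overhead, so actual memory use while handling the data will be higher.

```python
def raw_bulk_bytes(curves: int, values_per_curve: int, bytes_per_value: int = 8) -> int:
    """Raw payload size in bytes, assuming fixed-width numeric values."""
    return curves * values_per_curve * bytes_per_value


# WellLog from the text: 10 curves with 1M values each
size = raw_bulk_bytes(curves=10, values_per_curve=1_000_000)
print(f"raw size: {size / 1e6:.0f} MB")  # prints "raw size: 80 MB"
```

Under that assumption the raw data is a small fraction of the 1.2Gi minimum; the remainder is headroom for copies made during processing and for the service process itself.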


### Run the service locally

1. Create a virtual environment in the wellbore project directory. This creates a folder inside the project directory, for example `~/os-wellbore-ddms/nameofvirtualenv`.

    ```bash
    # Windows
    python -m venv env

    # macOS/Linux
    python3 -m venv env
    ```

2. Activate the virtual environment

    ```bash
    # Windows (Git Bash)
    source env/Scripts/activate

    # macOS/Linux
    source env/bin/activate
    ```

3. Install dependencies

    ```bash
    pip install -r requirements.txt
    ```

    Or, for a developer setup, run the following to also install tools that help you work with the code:
    ```bash
    pip install -r requirements.txt -r requirements_dev.txt
    ```
    

4. Run the service

    ```bash
    # Run the service, which defaults to http://127.0.0.1:8080
    python main.py

    # Run on a specific host and port, and enforce dev mode
    python main.py --host MY_HOST --port MY_PORT --dev_mode 1
    ```

    If the host is `127.0.0.1` or `localhost`, dev_mode is automatically set to True.
    The only significant change when dev_mode is on is that configuration errors at startup are logged but do not prevent the service from running, and some implementations can be overridden.
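
The host rule described above can be sketched as follows. This is an illustration of the stated behavior only; `resolve_dev_mode` and `LOCAL_HOSTS` are hypothetical names, not the service's actual implementation.

```python
from typing import Optional

# Hosts for which the README says dev mode is switched on automatically
LOCAL_HOSTS = {"127.0.0.1", "localhost"}


def resolve_dev_mode(host: str, dev_mode_flag: Optional[bool] = None) -> bool:
    """Return the effective dev mode for a host and an optional explicit flag."""
    if host in LOCAL_HOSTS:
        # Local hosts always run in dev mode, whatever the flag says
        return True
    # Any other host only runs in dev mode when explicitly requested
    return bool(dev_mode_flag)


print(resolve_dev_mode("localhost"))
print(resolve_dev_mode("0.0.0.0", dev_mode_flag=True))
print(resolve_dev_mode("0.0.0.0"))
```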

The hosts for the search and storage services have to be provided as environment variables or on the command line.

```bash
python main.py -e SERVICE_HOST_STORAGE https://api.example.com/storage -e SERVICE_HOST_SEARCH https://api.example.com/search
```
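For readers curious how repeated `-e KEY VALUE` pairs like the ones above can be collected, here is a minimal `argparse` sketch. It is not taken from `main.py`; the function name `parse_env_overrides` and the long option `--env` are assumptions made for illustration.

```python
import argparse
import os


def parse_env_overrides(argv):
    """Collect repeated `-e KEY VALUE` pairs into a dictionary."""
    parser = argparse.ArgumentParser(description="illustrative parser, not the real main.py")
    parser.add_argument(
        "-e", "--env",
        nargs=2,                     # each occurrence takes a key and a value
        action="append",             # allow the option to be repeated
        default=[],
        metavar=("KEY", "VALUE"),
    )
    args = parser.parse_args(argv)
    return {key: value for key, value in args.env}


overrides = parse_env_overrides([
    "-e", "SERVICE_HOST_STORAGE", "https://api.example.com/storage",
    "-e", "SERVICE_HOST_SEARCH", "https://api.example.com/search",
])

# Command-line values take precedence over the process environment
effective_env = {**os.environ, **overrides}
print(effective_env["SERVICE_HOST_SEARCH"])
```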

### Connect and Run Endpoints

1. Generate a bearer token, as all APIs except `/about` require authentication.

    - Navigate to `http://127.0.0.1:8080/api/os-wellbore-ddms/docs`. Click `Authorize` and enter your token. This allows authenticated requests.


2. Choose storage option

    Even when the service runs locally, it still relies on the OSDU data ecosystem storage service to store documents and on Google blob storage to store binary data (`bulk data`). You can override this and use your local file system instead by setting the following environment variables:

    - `USE_INTERNAL_STORAGE_SERVICE_WITH_PATH` to store documents in a local folder instead of the OSDU ecosystem storage service.
    - `USE_LOCALFS_BLOB_STORAGE_WITH_PATH` to store bulk data in a local folder instead of Google blob storage.

    ```bash
    # Create temp storage folders
    mkdir tmpstorage
    mkdir tmpblob

    # Set your repo path
    path="C:/source"

    python main.py -e USE_INTERNAL_STORAGE_SERVICE_WITH_PATH $path/os-wellbore-ddms/tmpstorage -e USE_LOCALFS_BLOB_STORAGE_WITH_PATH $path/os-wellbore-ddms/tmpblob
    ```

3. Choose Cloud Provider

    - The service can be run by specifying environment variables, including the cloud provider (`CLOUD_PROVIDER`). The accepted values are `gcp`, `az` or `local`. When a cloud provider is passed as an environment variable, certain additional environment variables become mandatory.

### Setting the Cloud Provider Environment Variables

- The following environment variables are required when the cloud provider is set to GCP:
  - OS_WELLBORE_DDMS_DATA_PROJECT_ID: GCP Data Tenant ID
  - OS_WELLBORE_DDMS_DATA_PROJECT_CREDENTIALS: path to the key file of the service account (SA) used to access the data tenant
  - SERVICE_HOST_SEARCH: The Search Service host
  - SERVICE_HOST_STORAGE: The Storage Service host

  ```bash
  python main.py -e CLOUD_PROVIDER gcp \
  -e OS_WELLBORE_DDMS_DATA_PROJECT_ID projectid \
  -e OS_WELLBORE_DDMS_DATA_PROJECT_CREDENTIALS pathtokeyfile \
  -e SERVICE_HOST_SEARCH search_host \
  -e SERVICE_HOST_STORAGE storage_host
  ```

- The following environment variables are required when the cloud provider is set to Azure:
  - AZ_AI_INSTRUMENTATION_KEY: Azure Application Insights instrumentation key
  - SERVICE_HOST_SEARCH: The Search Service host
  - SERVICE_HOST_STORAGE: The Storage Service host
  - SERVICE_HOST_PARTITION: The Partition Service internal host
  - KEYVAULT_URL: The Key Vault url (needed by the Partition Service)
  - USE_PARTITION_SERVICE: `enabled` when Partition Service is available in the environment. Needs to be `disabled` for `dev` or to run locally.

  ```bash
  python main.py -e CLOUD_PROVIDER az \
  -e AZ_AI_INSTRUMENTATION_KEY instrumentationkey \
  -e SERVICE_HOST_SEARCH search_host \
  -e SERVICE_HOST_STORAGE storage_host \
  -e SERVICE_HOST_PARTITION partition_host \
  -e KEYVAULT_URL keyvault_url \
  -e USE_PARTITION_SERVICE disabled
  ```
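The requirements listed above can be checked before starting the service with a small helper such as the one below. It is a sketch built only from the variable names in this section: the function name, the choice to treat an unset `CLOUD_PROVIDER` as `local`, and the empty requirement list for `local` are assumptions, and the service's own startup validation may differ.

```python
from typing import Dict, List, Mapping

# Mandatory variables per cloud provider, as listed in this README
REQUIRED_VARIABLES: Dict[str, List[str]] = {
    "gcp": [
        "OS_WELLBORE_DDMS_DATA_PROJECT_ID",
        "OS_WELLBORE_DDMS_DATA_PROJECT_CREDENTIALS",
        "SERVICE_HOST_SEARCH",
        "SERVICE_HOST_STORAGE",
    ],
    "az": [
        "AZ_AI_INSTRUMENTATION_KEY",
        "SERVICE_HOST_SEARCH",
        "SERVICE_HOST_STORAGE",
        "SERVICE_HOST_PARTITION",
        "KEYVAULT_URL",
        "USE_PARTITION_SERVICE",
    ],
    "local": [],  # assumption: nothing extra is mandatory for a local run
}


def missing_variables(env: Mapping[str, str]) -> List[str]:
    """Return the mandatory variables that are unset or empty for the chosen provider."""
    provider = env.get("CLOUD_PROVIDER", "local")  # assumption: default to local
    if provider not in REQUIRED_VARIABLES:
        raise ValueError(f"unsupported cloud provider: {provider!r}")
    return [name for name in REQUIRED_VARIABLES[provider] if not env.get(name)]


example_env = {
    "CLOUD_PROVIDER": "gcp",
    "OS_WELLBORE_DDMS_DATA_PROJECT_ID": "projectid",
    "SERVICE_HOST_SEARCH": "search_host",
}
print(missing_variables(example_env))
```

Passing `os.environ` instead of `example_env` checks the current shell environment.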

Note: If you're running locally, you may need to provide environment variables in your IDE. Here is a sample for providing a `.env` file.

By default, all Core Services endpoint values are set to `None` in `app/conf.py`. You can update the `.env` file with the core services endpoints for your cloud provider.

### Create a log record

To create a `WellLog` record, below is a payload sample for the POST `/ddms/v3/welllogs` API. The response will contain an id you can use to create some bulk data.

```json
[
  {
    "acl": {
      "viewers": [
        "data.default.viewers@{{datapartitionid}}.{{domain}}"
      ],
      "owners": [
        "data.default.owners@{{datapartitionid}}.{{domain}}"
      ]
    },
    "data": {
      "Curves": [
        {
          "CurveID": "GR_ID",
          "Mnemonic": "GR",
          "CurveUnit": "{{datapartitionid}}:reference-data--UnitOfMeasure:m:",
          "LogCurveFamilyID": "{{datapartitionid}}:reference-data--LogCurveFamily:GammaRay:"
        },
        {
          "CurveID": "POR_ID",
          "Mnemonic": "NPOR",
          "CurveUnit": "{{datapartitionid}}:reference-data--UnitOfMeasure:m:",
          "LogCurveFamilyID": "{{datapartitionid}}:reference-data--LogCurveFamily:NeutronPorosity:"
        },
        {
          "CurveID": "Bulk Density",
          "Mnemonic": "RHOB",
          "CurveUnit": "{{datapartitionid}}:reference-data--UnitOfMeasure:m:",
          "LogCurveFamilyID": "{{datapartitionid}}:reference-data--LogCurveFamily:BulkDensity:"
        }
      ],
      "WellboreID": "{{datapartitionid}}:master-data--Wellbore:{{wellboreId}}:",
      "CreationDateTime": "2013-03-22T11:16:03Z",
      "VerticalMeasurement": {
        "VerticalMeasurement": 2680.5,
        "VerticalMeasurementPathID": "{{datapartitionid}}:reference-data--VerticalMeasurementPath:MD:",
        "VerticalMeasurementUnitOfMeasureID": "{{datapartitionid}}:reference-data--UnitOfMeasure:ft:"
      },
      "TopMeasuredDepth": 12345.6,
      "BottomMeasuredDepth": 13856.25,
      "Name": "{{welllogName}}",
      "ExtensionProperties": {
        "step": {
          "unitKey": "ft",
          "value": 0.1
        },
        "dateModified": "2013-03-22T11:16:03Z"
      }
    },
    "id": "{{datapartitionid}}:work-product-component--WellLog:{{welllogId}}",
    "kind": "osdu:wks:work-product-component--WellLog:1.0.0",
    "legal": {
      "legaltags": [
        "{{legaltags}}"
      ],
      "otherRelevantDataCountries": [
        "US",
        "FR"
      ]
    },
    "meta": [
      {
        "kind": "Unit",
        "name": "ft",
        "persistableReference": "{\"scaleOffset\":{\"scale\":0.3048,\"offset\":0.0},\"symbol\":\"ft\",\"baseMeasurement\":{\"ancestry\":\"Length\",\"type\":\"UM\"},\"type\":\"USO\"}",
        "propertyNames": [
          "stop.value",
          "elevationReference.elevationFromMsl.value",
          "start.value",
          "step.value",
          "reference.unitKey"
        ],
        "propertyValues": [
          "ft"
        ]
      },
      {
        "kind": "DateTime",
        "name": "datetime",
        "persistableReference": "{\"format\":\"yyyy-MM-ddTHH:mm:ssZ\",\"timeZone\":\"UTC\",\"type\":\"DTM\"}",
        "propertyNames": [
          "dateModified",
          "dateCreated"
        ]
      }
    ]
  }
]
```
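A payload like the one above could be submitted from Python roughly as follows. This sketch only builds the request with the standard library and leaves the actual call commented out. The base URL is the local default used elsewhere in this README, while the token, the partition value and the `data-partition-id` header name are assumptions to adapt to your environment.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080/api/os-wellbore-ddms"  # local default from this README


def build_welllog_request(records, token, data_partition):
    """Build (but do not send) a POST request for the /ddms/v3/welllogs API."""
    return urllib.request.Request(
        url=f"{BASE_URL}/ddms/v3/welllogs",
        data=json.dumps(records).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "data-partition-id": data_partition,  # assumed header name
        },
    )


# A trimmed-down record for illustration; use the full payload above in practice
records = [{"kind": "osdu:wks:work-product-component--WellLog:1.0.0"}]
request = build_welllog_request(records, token="<your-token>", data_partition="<partition>")
print(request.get_method(), request.full_url)

# To send it against a running service (assuming a JSON response body):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```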


### Run with Uvicorn

```bash
uvicorn app.wdms_app:wdms_app --port LOCAL_PORT
```

Then access the app at `http://127.0.0.1:<LOCAL_PORT>/api/os-wellbore-ddms/docs`

### Run with Docker

#### Build Image

```bash
# Set IMAGE_TAG
IMAGE_TAG="os-wellbore-ddms:dev"

# Build Image
docker build -t=$IMAGE_TAG --rm . -f ./build/dockerfile --build-arg PIP_WHEEL_DIR=python-packages
```

#### Run Image

1. Run the image

    Replace the `LOCAL_PORT` value with a local port and the `IMAGE_TAG` value with the image name.

    ```bash
    LOCAL_PORT=<local_port>
    IMAGE_TAG=<image_name>

    docker run -d -p $LOCAL_PORT:8080 -e CLOUD_PROVIDER=local -e USE_LOCALFS_BLOB_STORAGE_WITH_PATH="/tmp" -e USE_INTERNAL_STORAGE_SERVICE_WITH_PATH="/tmp" -e OS_WELLBORE_DDMS_DEV_MODE=True -e USE_PARTITION_SERVICE=disabled $IMAGE_TAG
    ```

2. Access the app at `http://127.0.0.1:<LOCAL_PORT>/api/os-wellbore-ddms/docs`

3. The environment variable `OS_WELLBORE_DDMS_DEV_MODE=1` enables dev mode

4. Logs can be checked by running

    ```bash
    docker logs CONTAINER_ID
    ```

### Run Unit Tests Locally

```bash
# Install test dependencies
pip install -r requirements.txt -r requirements_dev.txt

python -m pytest --junit-xml=unit_tests_report.xml --cov=app --cov-report=html --cov-report=xml ./tests/unit
```

Coverage reports can be viewed after the command is run. The HTML reports are saved in the `htmlcov` directory.
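
The XML report produced by `--cov-report=xml` can also be read programmatically, for instance to print the overall line coverage in a CI job. The sketch below assumes a Cobertura-style report whose root `coverage` element carries a `line-rate` attribute, and it parses an inline sample rather than a real report file.

```python
import xml.etree.ElementTree as ET


def line_rate(xml_text: str) -> float:
    """Return the overall line coverage (0.0 to 1.0) from a Cobertura-style report."""
    root = ET.fromstring(xml_text)
    return float(root.attrib["line-rate"])


# Inline sample standing in for the generated report
sample_report = '<coverage line-rate="0.85" branch-rate="0"></coverage>'
print(f"line coverage: {line_rate(sample_report):.0%}")
```

To use it on a real report, read the file contents first and pass the text to `line_rate`.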

### Run Integration Tests locally

This example runs basic tests using the local filesystem for blob storage and the storage service. There is no search or entitlements service; everything runs locally.

First, create the temp storage folders and run the service.

```bash
mkdir -p tmpstorage
mkdir -p tmpblob
python main.py -e USE_INTERNAL_STORAGE_SERVICE_WITH_PATH $(pwd)/tmpstorage -e USE_LOCALFS_BLOB_STORAGE_WITH_PATH $(pwd)/tmpblob -e CLOUD_PROVIDER local
```

In another terminal, generate a minimum configuration file and run the integration tests.

```bash
cd tests/integration
python gen_postman_env.py --token $(pyjwt --key=secret encode email=nobody@example.com) --base_url "http://127.0.0.1:8080/api/os-wellbore-ddms" --cloud_provider "local" --data_partition "dummy"
pytest ./functional --environment="./generated/postman_environment.json" --filter-tag=basic
```
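The `pyjwt` call in the command above produces a signed test token. For environments where that command-line tool is not available, the following standard-library sketch builds a comparable token, assuming HS256 signing with the same `secret` key and `email` claim. Such a token is only suitable for local testing.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def encode_hs256(payload: dict, key: str) -> str:
    """Build a compact JWT signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    segments = [
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    ]
    signing_input = ".".join(segments)
    signature = hmac.new(
        key.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


token = encode_hs256({"email": "nobody@example.com"}, "secret")
print(token)
```

The printed value can then be passed to `gen_postman_env.py` through `--token`.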

For more information see the [integration tests README](tests/integration/README.md)

### Manage package dependencies

At any time, you may want to ensure your virtual environment is in sync with your requirements specification.
For this you can use:

```bash
pip-sync
```

If you want to work with other requirements files, you can specify them:
```bash
pip-sync requirements.txt requirements_dev.txt
```

If you want to update `requirements.txt` to retrieve the most recent versions, respecting the bounds set in `requirements.in`, you can use:

```bash
pip-compile
```

If you want to update the version of only one dependency, for instance fastapi:

```bash
pip-compile --upgrade-package fastapi
```

For more information: https://github.com/jazzband/pip-tools/

### Debugging
#### Port Forward from Kubernetes

 1. List the pods: `kubectl get pods`
 2. Port forward: `kubectl port-forward pods/POD_NAME LOCAL_PORT:8080`
 3. Access it on `http://127.0.0.1:<LOCAL_PORT>/api/os-wellbore-ddms/docs`

### Tracing

OpenCensus libraries are used to record metrics for incoming requests (execution time, result code, etc.).
At the moment, 100% of the requests are saved.
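
To illustrate what saving 100% of the requests means, here is a plain-Python sketch of rate-based sampling. `RateSampler` is a hypothetical class written for this example; it does not use or reproduce the OpenCensus API.

```python
import random


class RateSampler:
    """Decide per request whether to record it, given a sampling rate in [0, 1]."""

    def __init__(self, rate: float, rng: random.Random = None):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0.0 and 1.0")
        self.rate = rate
        self._rng = rng or random.Random()

    def should_sample(self) -> bool:
        # random() returns a value in [0.0, 1.0), so a rate of 1.0 always samples
        return self._rng.random() < self.rate


sampler = RateSampler(rate=1.0)  # equivalent to saving every request
recorded = sum(sampler.should_sample() for _ in range(1000))
print(f"recorded {recorded} of 1000 requests")
```

Lowering the rate would record proportionally fewer requests, trading trace completeness for lower overhead.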