Changes

ashley kelham · 0e47b64d
--- a/How-to-become-a-DDMS.md
+++ b/How-to-become-a-DDMS.md
+## Characteristics of a DDMS
+A Domain Data Management Service (DDMS) can be seen as any source of truth for data that manages the data life cycle and satisfies certain data access concerns by integrating with the OSDU. 
+
+A DDMS could be a standalone service dedicated to a specific entity type, or a subcomponent of an application or platform. As long as it satisfies the following criteria, it can be considered a DDMS within OSDU:
+
+*  Enforces legal compliance on the data within the DDMS based on OSDU compliance
+*  Enforces data access authorization based on OSDU Entitlements
+*  Makes the data discoverable through the OSDU search service
+*  Makes this data retrievable to other OSDU subscribers through a set of APIs exposed through the Registration service
+
+OSDU solves most of these concerns using Storage Records. A Storage Record is metadata pertaining to the bulk data stored in the DDMS. Creating a Storage Record enforces that ACLs are assigned, checks compliance, and then indexes the record into search, making it discoverable.
+
+As long as the DDMS uses the Storage service correctly and exposes the data it stores to OSDU consumers through a registered set of APIs, it can be classed as a DDMS.
+
+## Granularity
+One of the most important decisions for a DDMS is the granularity at which to apply these concerns to the data it stores.
+
+For instance, a DDMS that relates only to seismic data may choose to ingest data as SEG-Y files. A single legal tag and ACL may then be applied to the whole file during ingestion.
+
+However, it may want this data to be discoverable and retrievable for each trace individually. It would then need to create a separate Storage Record for each trace in the file, so that each one is individually discoverable, and apply the same ACL and legal tag to each. It would also need to expose an API that allows retrieval of individual traces.
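
The per-trace option can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the kind `common:seismicdb:trace:1.0.0`, the `trace` entity type, the `file id/trace index` local id scheme, and the function name are all assumptions made for the example.

```python
from typing import Any


def build_trace_records(
    file_id: str,
    trace_count: int,
    viewers: list[str],
    owners: list[str],
    legal_tags: list[str],
    ddms_id: str,
) -> list[dict[str, Any]]:
    """Build one Storage Record body per trace, all sharing the file's ACL and legal tag."""
    records = []
    for trace_index in range(trace_count):
        records.append(
            {
                # Hypothetical kind; a real DDMS would publish its own schema.
                "kind": "common:seismicdb:trace:1.0.0",
                "acl": {"viewers": list(viewers), "owners": list(owners)},
                "legal": {"legaltags": list(legal_tags)},
                "data": {
                    "ddmsId": ddms_id,
                    "entityType": "trace",
                    # Hypothetical local id scheme: file id plus trace index.
                    "localId": f"{file_id}/{trace_index}",
                },
            }
        )
    return records


records = build_trace_records(
    file_id="survey-42.sgy",
    trace_count=3,
    viewers=["{viewer-acl}"],
    owners=["{owner-acl}"],
    legal_tags=["{legal-tag-name}"],
    ddms_id="{DDMS-ID}",
)
print(len(records), records[0]["data"]["localId"])
```

Each generated record maps to exactly one retrievable trace, so the retrieval API would need to accept the same local id.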
+
+It is important to think about these things up front so you make the correct decisions about how you expose these concerns and integrate with the OSDU core platform. 
+
+The following is the preferred implementation to enable the mandatory characteristics of a DDMS.
+
+## Register as a DDMS
+
+The first step is to register as a DDMS. This makes your DDMS discoverable to clients and presents them with an API definition that tells them how to retrieve the bulk data when a record from your DDMS is discovered.
+
+The only API that needs to be defined is the one that tells the client how to retrieve the bulk data based on an Id.
+
+Note that you can register as much of your API specification as you like. The only requirement is that the method clients should use to retrieve the bulk data is marked with the custom property x-ddms-retrieve-entity: true.
+
+<details><summary>curl</summary>
+
+```
+
+    curl --request POST \
+    --url '/api/register/v1/ddms' \
+    --header 'accept: application/json' \
+    --header 'authorization: Bearer <JWT>' \
+    --header 'content-type: application/json' \
+    --header 'Data-Partition-Id: common' \
+    --data '{
+    "id": "{DDMS-ID}",
+    "name": "logDDMS",
+    "description": "My test ddms.",
+    "contactEmail": "test@test.com",
+    "interfaces": [
+        {
+        "entityType": "wellbore",
+        "schema": {
+            "openapi": "3.0.0",
+            "info": {
+            "description": "This is a sample Wellbore domain DM service.",
+            "version": "1.0.0",
+            "title": "DELFI Data Ecosystem Wellbore Domain DM Service",
+            "contact": {
+                "email": "dataecosystem-sre@slb.com"
+            }
+            },
+            "servers": [
+            {
+                "url": "https://subsurface.data.delfi.slb.com/v1"
+            }
+            ],
+            "tags": [
+            {
+                "name": "wellbore",
+                "description": "Wellbore data type services"
+            }
+            ],
+            "paths": {
+            "/wellbore/{wellboreId}": {
+                "get": {
+                "tags": [
+                    "wellbore"
+                ],
+                "summary": "Find wellbore by ID",
+                "description": "Returns a single wellbore",
+                "operationId": "getWellboreById",
+                "x-ddms-retrieve-entity": true,
+                "parameters": [
+                    {
+                    "name": "wellboreId",
+                    "in": "path",
+                    "description": "ID of wellbore to return",
+                    "required": true,
+                    "schema": {
+                        "type": "string"
+                    }
+                    }
+                ],
+                "responses": {
+                    "200": {
+                    "description": "successful operation",
+                    "content": {
+                        "application/json": {
+                        "schema": {
+                            "$ref": "#/components/schemas/wellbore"
+                        }
+                        }
+                    }
+                    },
+                    "400": {
+                    "description": "Invalid ID supplied"
+                    },
+                    "401": {
+                    "description": "Not authorized"
+                    },
+                    "404": {
+                    "description": "Wellbore not found"
+                    }
+                }
+                }
+            }
+            }
+        }
+        }
+    ]
+    }'
+
+```
+
+</details>
+
+## Create a schema
+
+It is up to the DDMS to determine which properties of the bulk data it wants to push into a Storage Record and make discoverable within OSDU.
+
+The DDMS defines a schema to represent this. The schema lists the properties that will be on the Record and the type of data each one holds.
+
+When deploying your service, you should do the one-time operation of publishing the schema via the Storage APIs. 
+
+<details><summary>curl</summary>
+
+```
+
+    curl --request POST \
+    --url '/api/storage/v2/schemas' \
+    --header 'accept: application/json' \
+    --header 'authorization: Bearer <JWT>' \
+    --header 'content-type: application/json' \
+    --header 'Data-Partition-Id: common' \
+    --data '{
+        "kind": "common:welldb:wellbore:1.0.0",
+        "schema": [
+            {
+            "path": "name",
+            "kind": "string"
+            },
+            {
+            "path": "ddmsId",
+            "kind": "string"
+            },
+            {
+            "path": "localId",
+            "kind": "string"
+            },
+            {
+            "path": "entityType",
+            "kind": "string"
+            }]
+    }'
+
+```
+
+</details>
+
+This allows any Record that references this schema to be indexed in search. Without it, the Record will be published but without any of the data, and it will be hidden by default in search.
+
+Notice that we are also declaring 3 properties:
+
+**ddmsId** is the id used when you register as a DDMS.
+
+**entityType** is the domain object type the data represents, e.g. ‘seismic’, ‘well’.
+
+**localId** is the id of the bulk data as it is referenced in your DDMS. The end user should be able to use this id to retrieve the bulk data from the DDMS API.
+
+These act as well-known properties that should be added to the record by your DDMS. Clients can then use this information to retrieve the data after discovery using the registered DDMS API. Every schema used by a DDMS should declare these properties.
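
A DDMS could guard against forgetting one of these properties with a small check before publishing a schema. The following is only a sketch: the function name and the idea of running it at deployment time are assumptions, and the schema body mirrors the request shown above.

```python
WELL_KNOWN_PATHS = {"ddmsId", "entityType", "localId"}


def missing_well_known_properties(schema_request: dict) -> set[str]:
    """Return the well-known DDMS properties that the schema does not declare."""
    declared = {item["path"] for item in schema_request.get("schema", [])}
    return WELL_KNOWN_PATHS - declared


wellbore_schema = {
    "kind": "common:welldb:wellbore:1.0.0",
    "schema": [
        {"path": "name", "kind": "string"},
        {"path": "ddmsId", "kind": "string"},
        {"path": "localId", "kind": "string"},
        {"path": "entityType", "kind": "string"},
    ],
}

missing = missing_well_known_properties(wellbore_schema)
print("missing properties:", sorted(missing))
```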
+
+## Expose legal tag and ACLs through your APIs
+Unless you have a scenario where you know what legal tag and ACL should be applied to the data you are storing, you will need to expose the legal tag and ACL in the DDMS ingestion APIs. This allows your clients to supply the legal tag and ACL themselves.
+
+You can expose the same interface as the Storage records API, allowing you to assign them to the record you create.
+
+<details>
+
+```
+
+    "acl": {
+      "viewers": ["{viewer-acl}"],
+      "owners": ["{owner-acl}"]
+    },
+    "legal": {
+      "legaltags": ["{legal-tag-name}"]
+    }
+
+```
+</details>
+
+## Optionally expose compliance for derivative data through your APIs
+If you expect derivative data to be stored in your DDMS, you need to expose 2 more properties through your APIs that can be appended to your Storage record.
+
+Again, you can expose the same interface as the Storage records API, allowing you to assign them to the record directly. The 2 properties are:
+
+* otherRelevantDataCountries: The alpha-2 country code of the country where the derivative was created or calculated
+* parents: The record ids and versions of the Records this derivative was created from
+
+If a derivative is being created, a legal tag does not need to be assigned because the derivative inherits it from its parents.
+
+<details><summary>curl</summary>
+
+```
+
+    "legal": {
+      "otherRelevantDataCountries": ["US"]
+    },
+    "ancestry": {
+      "parents": ["{recordid1:version}", "{recordid2:version}"]
+    }
+
+```
+</details>
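
As a rough sketch of how a DDMS might assemble a derivative record from these two properties, the helper below builds a record body with no legal tag of its own. The function name and argument layout are hypothetical; only the `legal` and `ancestry` shapes come from the example above.

```python
from typing import Any


def build_derivative_record(
    kind: str,
    viewers: list[str],
    owners: list[str],
    parents: list[str],
    country: str,
    data: dict[str, Any],
) -> dict[str, Any]:
    """Build a Storage Record body for derivative data."""
    if not parents:
        raise ValueError("a derivative record needs at least one parent")
    return {
        "kind": kind,
        "acl": {"viewers": list(viewers), "owners": list(owners)},
        # No legaltags here: the derivative inherits them from its parents.
        "legal": {"otherRelevantDataCountries": [country]},
        "ancestry": {"parents": list(parents)},
        "data": dict(data),
    }


derivative = build_derivative_record(
    kind="common:welldb:wellbore:1.0.0",
    viewers=["{viewer-acl}"],
    owners=["{owner-acl}"],
    parents=["{recordid1:version}", "{recordid2:version}"],
    country="US",
    data={"name": "well1-derived"},
)
print(derivative["ancestry"]["parents"])
```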
+
+## Ingest data and create a shadow record
+
+Whenever bulk data is ingested, you need to create a shadow record within Storage. This shadow record represents a single retrievable entity from the DDMS (see Granularity section) in a 1:1 relationship.
+
+First, store the data in the DDMS, and then create the shadow Record. This way, the data is never globally discoverable before it is available. If creating the Record fails, e.g. because an invalid legal tag is provided, return the error response to the client and attempt to clean up the data in the DDMS.
+
+Remember, you should append your DDMS Id, entityType and the bulk data’s local id to the Storage record.
+
+<details><summary>curl</summary>
+
+```
+
+    curl --request PUT \
+    --url '/api/storage/v2/records' \
+    --header 'accept: application/json' \
+    --header 'authorization: Bearer <JWT>' \
+    --header 'content-type: application/json' \
+    --header 'Data-Partition-Id: common' \
+    --data '[
+    {
+        "kind": "common:welldb:wellbore:1.0.0",
+        "acl": {
+        "viewers": ["{viewer-acl}"],
+        "owners": ["{owner-acl}"]
+        },
+        "legal": {
+        "legaltags": ["common-sample-legaltag"],
+        "otherRelevantDataCountries": ["FR"]
+        },
+        "data": {
+        "name": "well1",
+        "entityType": "wellbore",
+        "ddmsId": "abcdef",
+        "localId": "123456"
+        }
+    }]'
+
+```
+
+</details>
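
The ordering described in this section (store the bulk data, create the shadow record, clean up on failure) might look like the sketch below. The three callables stand in for the DDMS's own storage layer and its call to the Storage service; they, and the in-memory fakes used to exercise the flow, are assumptions for illustration only.

```python
from typing import Any, Callable


def ingest(
    bulk_data: bytes,
    record: dict[str, Any],
    store_bulk: Callable[[bytes], str],
    create_record: Callable[[dict[str, Any]], str],
    delete_bulk: Callable[[str], None],
) -> str:
    """Store bulk data first, then create the shadow record; undo the store on failure."""
    local_id = store_bulk(bulk_data)
    # Append the bulk data's local id so clients can retrieve it after discovery.
    shadow = {**record, "data": {**record.get("data", {}), "localId": local_id}}
    try:
        return create_record(shadow)
    except Exception:
        # Record creation failed (e.g. invalid legal tag): remove the orphaned bulk data.
        delete_bulk(local_id)
        raise


# In-memory stand-ins used only to demonstrate the control flow.
bulk_store: dict[str, bytes] = {}


def fake_store(data: bytes) -> str:
    local_id = str(len(bulk_store) + 1)
    bulk_store[local_id] = data
    return local_id


def fake_delete(local_id: str) -> None:
    bulk_store.pop(local_id, None)


def failing_create(record: dict[str, Any]) -> str:
    raise ValueError("invalid legal tag")


try:
    ingest(b"bulk bytes", {"data": {"name": "well1"}}, fake_store, failing_create, fake_delete)
except ValueError as error:
    print("ingestion rejected:", error)
print("bulk data left behind:", bulk_store)
```

Re-raising the original error lets the DDMS return the Storage service's response to the client unchanged.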
+
+
+## Perform compliance and ACL checks using shadow records
+
+As mentioned, a DDMS should create a shadow record for each retrievable entity ingested. This can have advantages beyond global discoverability. Whenever you request a storage record, both compliance and entitlements are checked before the data is returned. A DDMS can use this to its advantage.
+
+By forwarding any client request to retrieve the record, you can delegate these responsibilities to the Storage service. If Data Ecosystem returns the Record, the client is allowed to access both the Record and the bulk data, so you can return both to the client, or only the Record.
+
+<details><summary>curl</summary>
+
+```
+
+    curl --request POST \
+    --url '/api/storage/v2/query/records:batch' \
+    --header 'Authorization: Bearer <JWT>' \
+    --header 'Content-Type: application/json' \
+    --header 'Slb-Data-Partition-Id: common' \
+    --header 'slb-frame-of-reference: NONE' \
+    --data '{
+        "records": [
+            "common:test:fetchtest-1",
+            "common:test:fetchtest-2",
+            "common:test:fetchtest-4",
+            "common:test:fetchtest-5",
+            "common:test:fetchtest-6"
+        ]
+    }'
+
+```
+
+</details>
+
+In this scenario, you also don’t need to store the ACL or legal tag information in your DDMS, because those are retrieved directly from the Data Ecosystem in this request. However, you need to either store or be able to generate the Storage record ID needed to retrieve the record for the requested bulk data. Using a deterministic ID for the Storage Record is therefore advisable.
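
One way to obtain a deterministic ID is to derive it from values the DDMS already knows. The sketch below hashes the DDMS id, entity type and local id. The `partition:entityType:hash` layout is an assumption loosely modelled on the record IDs in the example above, not a documented Storage service format, so check the ID rules of your deployment before relying on it.

```python
import hashlib


def deterministic_record_id(partition: str, ddms_id: str, entity_type: str, local_id: str) -> str:
    """Derive the same Storage record ID every time for the same bulk data entity."""
    # Newlines separate the fields so ("ab", "c") and ("a", "bc") hash differently.
    key = f"{ddms_id}\n{entity_type}\n{local_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Assumed ID layout: <partition>:<entity type>:<first 32 hex chars of the hash>.
    return f"{partition}:{entity_type}:{digest[:32]}"


record_id = deterministic_record_id("common", "abcdef", "wellbore", "123456")
print(record_id)
```

With this approach the DDMS can recompute the record ID from an incoming retrieval request instead of keeping a lookup table.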
+
+## Client retrieves the bulk data <a name="retrieve"></a>
+
+Imagine the client discovered a record with the following data:
+
+<details><summary>curl</summary>
+
+```
+
+    "data": {
+      "name": "well1",
+      "entityType": "wellbore",
+      "ddmsId": "abcdef",
+      "localId": "123456"
+    }
+
+```
+
+</details>
+
+They can use the ddmsId property of the data object to retrieve the API definition of the DDMS you registered at the start. 
+
+<details><summary>curl</summary>
+
+```
+
+    curl --request GET \
+    --url '/api/register/v1/ddms/abcdef' \
+    --header 'Authorization: Bearer <JWT>' \
+    --header 'Content-Type: application/json' \
+    --header 'Data-Partition-Id: common' 
+
+```
+</details>
+    
+This returns the registered DDMS along with its API specification. For the specification we registered originally, the result would look like this:
+
+<details>
+
+```
+    {
+        "entityType": "wellbore",
+        "schema": {
+            "openapi": "3.0.0",
+            "info": {
+            "description": "This is a sample Wellbore domain DM service.",
+            "version": "1.0.0",
+            "title": "DELFI Data Ecosystem Wellbore Domain DM Service",
+            "contact": {
+                "email": "dataecosystem-sre@slb.com"
+            }
+            },
+            "servers": [
+            {
+                "url": "https://subsurface.data.delfi.slb.com/v1"
+            }
+            ],
+            "tags": [
+            {
+                "name": "wellbore",
+                "description": "Wellbore data type services"
+            }
+            ],
+            "paths": {
+            "/wellbore/{wellboreId}": {
+                "get": {
+                "tags": [
+                    "wellbore"
+                ],
+                "summary": "Find wellbore by ID",
+                "description": "Returns a single wellbore",
+                "operationId": "getWellboreById",
+                "x-ddms-retrieve-entity": true,
+                "parameters": [
+                    {
+                    "name": "wellboreId",
+                    "in": "path",
+                    "description": "ID of wellbore to return",
+                    "required": true,
+                    "schema": {
+                        "type": "string"
+                    }
+                    }
+                ],
+                "responses": {
+                    "200": {
+                    "description": "successful operation",
+                    "content": {
+                        "application/json": {
+                        "schema": {
+                            "$ref": "#/components/schemas/wellbore"
+                        }
+                        }
+                    }
+                    },
+                    "400": {
+                    "description": "Invalid ID supplied"
+                    },
+                    "401": {
+                    "description": "Not authorized"
+                    },
+                    "404": {
+                    "description": "Wellbore not found"
+                    }
+                }
+                }
+            }
+            }
+        }
+    }
+
+```
+</details>
+
+The client can then use the entityType property to work out which API definition to use, as there may be more than one per registration. They then use the localId property to construct the API call that retrieves the bulk data from the DDMS using the GET API defined.
+
+Using the returned specification and the data we discovered, the API call the user would be expected to make to retrieve the bulk data would be:
+
+<details><summary>curl</summary>
+
+```
+    curl --request GET \
+    --url 'https://subsurface.data.delfi.slb.com/v1/wellbore/123456' \
+    --header 'Authorization: Bearer <JWT>' \
+    --header 'Content-Type: application/json' \
+    --header 'Data-Partition-Id: common' 
+
+```
+
+</details>
+
+Here 123456 is the localId stored in the record's data, and the URL is built from the servers and paths sections of the API specification, using the operation that has the property x-ddms-retrieve-entity.
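
A client could automate this lookup roughly as follows. This is a sketch under stated assumptions: the function name is hypothetical, the input is a list of interface objects shaped like the response above (trimmed here to the fields that matter), and the flagged operation is assumed to be a GET with exactly one path parameter.

```python
from urllib.parse import quote


def build_retrieve_url(interfaces: list[dict], entity_type: str, local_id: str) -> str:
    """Return the URL of the operation flagged x-ddms-retrieve-entity for an entity type."""
    for interface in interfaces:
        if interface.get("entityType") != entity_type:
            continue
        spec = interface["schema"]
        base_url = spec["servers"][0]["url"].rstrip("/")
        for path, path_item in spec["paths"].items():
            operation = path_item.get("get", {})
            if operation.get("x-ddms-retrieve-entity") is True:
                # Assumption: the flagged operation takes exactly one path parameter.
                parameter = operation["parameters"][0]["name"]
                return base_url + path.replace("{" + parameter + "}", quote(local_id, safe=""))
    raise LookupError(f"no retrieval operation registered for {entity_type!r}")


# Trimmed copy of the registered interface shown earlier.
interfaces = [
    {
        "entityType": "wellbore",
        "schema": {
            "servers": [{"url": "https://subsurface.data.delfi.slb.com/v1"}],
            "paths": {
                "/wellbore/{wellboreId}": {
                    "get": {
                        "x-ddms-retrieve-entity": True,
                        "parameters": [{"name": "wellboreId", "in": "path"}],
                    }
                }
            },
        },
    }
]

record_data = {"entityType": "wellbore", "ddmsId": "abcdef", "localId": "123456"}
url = build_retrieve_url(interfaces, record_data["entityType"], record_data["localId"])
print(url)
```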
+
+## Conclusion
+
+This is not the only implementation available to a DDMS, but it is the simplest. A DDMS can choose to use the Entitlements and legal compliance services directly, but this requires greater implementation effort and so is not recommended unless necessary. Whichever approach is chosen, it is important to remember the standard characteristics a DDMS must exhibit to be classed as a DDMS within the OSDU.