## Search service

### Table of contents <a name="TOC"></a>
- [Introduction](#introduction)
- [Search API access](#search-api-access)
- [Permissions](#permissions)
- [Query](#query)
    * [Query by kind](#query-by-kind)
    * [Text Queries](#text-queries)
        + [Examples](#examples)
        + [Grouping](#grouping)
        + [Reserved characters](#reserved-characters)
        + [Wildcards](#wildcards)
        + [Query by `nested` arrays objects](#nested-queries)
    * [Aggregation](#aggregate-queries)
        + [Aggregation by `nested` arrays objects](#nested-aggregation)
    * [Sort](#sort-queries)
        + [Sort by `nested` arrays objects](#nested-sort)
    * [Range Queries](#range-queries)
    * [Geo-Spatial Queries](#geo-spatial-queries)
        + [Geo Distance](#geo-distance)
            - [Distance Units](#distance-units)
        + [Geo Bounding Box](#bounding-box)
        + [Geo Polygon](#geo-polygon)
    * [Cross-Kind Queries](#cross-kind-queries)
- [Query With Cursor](#query-with-cursor)
- [Get indexing status](#get-indexing-status)
- [Version info endpoint](#version-info-endpoint)

## Introduction <a name="introduction"></a>

The Search API provides a mechanism for indexing documents that contain structured data. You can search an index, and organize and present search results. Documents and indexes are saved in a separate persistent store optimized for search operations. The Search API can index any number of documents.

The API supports full-text search on string fields, range queries on date, numeric, or string fields, and geo-spatial search.

## Search API access <a name="search-api-access"></a>

* Required roles

  The Search service requires users to have dedicated roles. Users must be a member of `users.datalake.viewers`, `users.datalake.editors`, or `users.datalake.admins`. Roles can be assigned using the [Entitlements Service](/solutions/osdu/tutorials/core-services/entitlementsservice). Please look at the API documentation for specific requirements.

  In addition to service roles, users __must__ be a member of data groups to access the data.

* Required headers

  The OSDU Data Platform stores data in different partitions, depending on the different accounts in the OSDU system.

  A user may belong to more than one account. As a user, after logging into the OSDU portal, you need to select the account you wish to be active.
  Likewise, when using the Search APIs, you need to specify the active account in the `Data-Partition-Id` header. The correct `Data-Partition-Id` can be obtained from the CFS services. The `Data-Partition-Id` scopes the search to the mapped partition, e.g.
  ```
  Data-Partition-Id: opendes
  ```

* Optional headers

  The Correlation-Id is a traceable ID used to track the journey of a single request. It can be a GUID passed in the `Correlation-Id` header. It is best practice to provide the Correlation-Id so the request can be tracked through all the services.
  ```
  Correlation-Id: 1e0fef08-22fd-49b1-a5cc-dffa21bc0b70
  ```

  If the service is initiating the request, an ID should be generated. If the Correlation-Id is not provided, the service generates a new ID so that the request remains traceable.
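
As a rough illustration of the header rules above, the following Python sketch assembles the headers for a Search request. The helper name `build_search_headers` is hypothetical, and the `Authorization` and `Content-Type` entries simply mirror the curl samples in this document. Generating the Correlation-Id on the client with `uuid4` is an assumption about one reasonable way to produce a GUID, not a requirement of the service.

```python
import uuid


def build_search_headers(token, data_partition_id, correlation_id=None):
    """Return HTTP headers for a Search API call (hypothetical helper)."""
    if not data_partition_id:
        # Data-Partition-Id is listed as a required header.
        raise ValueError("Data-Partition-Id is required")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Data-Partition-Id": data_partition_id,
        # Optional header: create a GUID when the caller supplies none.
        "Correlation-Id": correlation_id or str(uuid.uuid4()),
    }


headers = build_search_headers("<JWT>", "opendes")
```

Passing an explicit `correlation_id` lets a caller reuse the same ID across several services, which is the tracking scenario described above.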

[Back to table of contents](#TOC)

## Permissions <a name="permissions"></a>

| **_Endpoint URL_** | **_Method_** | **_Minimum Permissions Required_** | **_Data Permissions Required_** |
| --- | --- | --- | --- |
| /search/v2/query | POST | users.datalake.viewers | Yes |
| /search/v2/query_with_cursor | POST | users.datalake.viewers | Yes |

[Back to table of contents](#TOC)

## Query <a name="query"></a>

OSDU Data Platform search provides a JSON-style domain-specific language that you can use to execute queries. The query request URL and a sample request are as follows:

```json
POST /api/search/v2/query
{
  "kind": "opendes:welldb:wellbore:1.0.0",
  "query": "data.Status:Active AND nested(data.VerticalMeasurements)",
  "offset": 0,
  "limit": 30,
  "sort": {
    "field": ["id"],
    "order": ["ASC"]
  },
  "queryAsOwner": false,
  "spatialFilter": {
    "field": "data.Location",
    "byBoundingBox": {
      "topLeft": {
        "latitude": 37.450727,
        "longitude": -122.174762
      },
      "bottomRight": {
        "latitude": 36.450727,
        "longitude": -122.174762
      }
    }
  },
  "returnedFields": [ "data.Status" ]
}
```

<details><summary>**Curl**</summary>

```bash
curl --request POST \
  --url '/api/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "opendes:welldb:wellbore:1.0.0",
  "query": "data.Status:Active",
  "offset": 0,
  "limit": 30,
  "sort": {
    "field": ["id"],
    "order": ["ASC"]
  },
  "queryAsOwner": false,
  "spatialFilter": {
    "field": "data.Location",
    "byBoundingBox": {
      "topLeft": {
        "latitude": 37.450727,
        "longitude": -122.174762
      },
      "bottomRight": {
        "latitude": 36.450727,
        "longitude": -122.174762
      }
    }
  },
  "returnedFields": [ "data.Status" ]
}'
```
</details>

__Note:__ Records successfully ingested via the Storage service can take at least 30 seconds to become searchable in OSDU Data Platform. You can check the [index status](#get-indexing-status).

#### Parameters <a name="parameters"></a>

| Parameter | Description |
| :--- | :--- |
| kind | The kind of the record to query, e.g. 'opendes:welldb:wellbore:1.0.0'. kind is a __required__ field, formatted as authority/data-partition-id:data-source-id:entity-type:schema-version. |
| query | Query string based on Lucene query string syntax, supplemented with a specific format for describing queries to fields of object arrays indexed with the `nested` hint. |
| offset | The starting offset from which to return results. |
| limit | The maximum number of results to return from the given offset. If no limit is provided, __10__ items are returned. The maximum number of items that can be fetched by one query is __1000__. To fetch a larger set of items, use the [query_with_cursor](#query-with-cursor) API. |
| sort | Allows you to add one or more sorts on specific fields. The length of `field` and the length of `order` must match. Each order value must be either ASC or DESC (case-insensitive). For details, capabilities, and limitations of this feature, please refer to [Sort](#sort-queries). |
| queryAsOwner | If true, the result only contains the records that the user owns. If false, the result contains all records that the user is entitled to see. Default value is false. |
| spatialFilter | A spatial filter to apply; please see [Geo-Spatial Queries](#geo-spatial-queries). |
| trackTotalCount | If 'true', tracks the accurate count of records matching the query; otherwise returns a partial count. Partial count queries are more performant. Default is 'false', in which case 10000 is returned if more than 10000 records match. |
| aggregateBy | Allows the user to get the unique values of a given field; please see [Aggregate Queries](#aggregate-queries). |
| returnedFields | The fields on which to project the results. |

> __Important:__ Field names in request parameters are case-sensitive.

__Note:__ The sum of `offset` and `limit` cannot exceed 10,000. See [Query With Cursor](#query-with-cursor) for more efficient ways to do deep scrolling.
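
The paging limits stated above (default limit of 10, at most 1000 items per query, and `offset` + `limit` of at most 10,000) can be checked on the client before a request is sent. The sketch below is one way to do that; the function name `check_paging` is hypothetical, and rejecting negative values is an assumption of this example rather than documented service behaviour.

```python
DEFAULT_LIMIT = 10      # used by the service when no limit is given
MAX_LIMIT = 1000        # largest number of items one query can fetch
MAX_WINDOW = 10_000     # upper bound for offset + limit


def check_paging(offset=0, limit=DEFAULT_LIMIT):
    """Validate offset/limit and return them as request-body fields."""
    if offset < 0 or limit < 0:
        # Assumption: negative paging values are never meaningful.
        raise ValueError("offset and limit must not be negative")
    if limit > MAX_LIMIT:
        raise ValueError(f"limit must not exceed {MAX_LIMIT}")
    if offset + limit > MAX_WINDOW:
        raise ValueError("offset + limit must not exceed 10,000; "
                         "use query_with_cursor for deep scrolling")
    return {"offset": offset, "limit": limit}


page = check_paging(offset=9000, limit=1000)
```

A request body can then be built by merging the returned dictionary with `kind` and the other parameters.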

### Query by kind <a name="query-by-kind"></a>

"kind" is a __required__ field, formatted as authority/data-partition-id:data-source-id:entity-type:schema-version. The list of available kinds can be retrieved via the Storage service (GET /query/kinds API). You can search documents by providing only "kind", as shown:

```json
POST /api/search/v2/query
{
  "kind": "opendes:welldb:wellbore:1.0.0"
}
```
<details><summary>**Curl**</summary>

```bash
curl --request POST \
  --url '/api/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "opendes:welldb:wellbore:1.0.0"
}'
```
</details>

The query will return 10 (default limit) documents for the kind.

Wildcard queries on kind are also supported; please look at [Cross-Kind Queries](#cross-kind-queries) for more information.

The OSDU Data Platform indexer also splits "kind" and indexes each part individually. These terms can then be queried through the `query` request parameter, e.g. `opendes:welldb:wellbore:1.0.0` will be indexed as `authority=opendes`, `source=welldb`, `namespace=opendes:welldb`, `type=wellbore` and `version=1.0.0`. OSDU Data Platform can then be queried based on any one of these attributes.
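
To make the split concrete, here is a small Python sketch that derives the same five attributes from a kind string. It only mirrors the mapping described above; the function name `split_kind` is hypothetical and the indexer's actual implementation is not shown in this document.

```python
def split_kind(kind):
    """Split a kind into the attributes described above."""
    parts = kind.split(":")
    if len(parts) != 4:
        raise ValueError("kind must have four ':'-separated parts")
    authority, source, entity_type, version = parts
    return {
        "authority": authority,
        "source": source,
        # namespace is the authority and source joined back together
        "namespace": f"{authority}:{source}",
        "type": entity_type,
        "version": version,
    }


attributes = split_kind("opendes:welldb:wellbore:1.0.0")
```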

### Text Queries <a name="text-queries"></a>

OSDU Data Platform provides comprehensive query options in [Lucene query syntax](https://lucene.apache.org/core/2_9_4/queryparsersyntax.html). The query string is parsed into a series of terms and operators. A term can be a single word, such as "producing" or "well", or a phrase surrounded by double quotes, such as "producing well", which searches for all the words in the phrase, in the same order. The default operator for a query is __OR__.

A field in the document can be searched by using `<field-name>:<value>`. If no field is specified, the query defaults to all queryable fields: it automatically attempts to determine the existing fields in the index’s mapping that are queryable, and performs the search on those fields.

The query language is quite comprehensive and can be intimidating at first glance, but the best way to actually learn it is to start with a few basic examples.

__Note:__ __kind__ is a required parameter and is omitted for brevity in the following examples. Also, all storage record properties are in the 'data' block; any reference to a field inside the block should be prefixed with 'data.'

#### Examples <a name="examples"></a>

* search all fields that contain the text 'well'

```json
{
  "query": "well"
}
```

__Note:__ In the absence of `<field-name>`, the query string automatically attempts to determine the existing fields in the index’s mapping that are queryable, and performs the search on those fields. A search query is more performant if field names are specified in the query instead of searching across all queryable attributes. The following examples cover this:

* where the Basin field contains "Permian"

```json
{
  "query": "data.Basin:Permian"
}
```

* where the Rig_Contractor field contains "Ocean" or "Drilling". OR is the default operator

```json
{
  "query": "data.Rig_Contractor:(Ocean OR Drilling)"
}
```

or

```json
{
  "query": "data.Rig_Contractor:(Ocean Drilling)"
}
```

* where the Rig_Contractor field contains the exact `phrase` "Ocean Drilling"

```json
{
  "query": "data.Rig_Contractor:\"Ocean Drilling\""
}
```

* where any of the fields ValueList.OriginalValue, ValueList.Value or ValueList.AppDataType contains "PRODUCING" or "DUAINE" (note how we need to escape the * with a backslash)

```json
{
  "query": "data.ValueList.\\*:(PRODUCING DUAINE)"
}
```

* where the field Status has any non-null value. The \_exists\_ prefix searches for documents in which the given field exists

```json
{
  "query": "_exists_:data.Status"
}
```

#### Grouping <a name="grouping"></a>

Multiple terms or clauses can be grouped together with parentheses, to form sub-queries

```json
{
  "query": "data.Rig_Contractor:(Ocean OR Drilling) AND Exploration NOT Basin"
}
```

#### Reserved characters <a name="reserved-characters"></a>

If you need to use any of the characters which function as operators in your query itself (and not as operators), then you should escape them with a leading backslash. For instance, to search for (1+1)=2, you would need to write your query as \\(1\\+1\\)\\=2.

The reserved characters are: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /

Failing to escape these special characters correctly could lead to a syntax error which prevents your query from running.

__Note:__ < and > can’t be escaped at all. The only way to prevent them from attempting to create a [range query](#range-queries) is to remove them from the query string entirely.
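
The escaping rules above can be sketched in Python as follows. This is a minimal example under stated assumptions: the function name `escape_query_term` and the field `data.Formula` are hypothetical, and escaping each `&` and `|` individually is a choice made for this sketch. Because `<` and `>` cannot be escaped, the sketch removes them.

```python
import json

# Characters from the reserved list that can be escaped with a backslash.
ESCAPABLE = set('+-=&|!(){}[]^"~*?:\\/')


def escape_query_term(term):
    """Escape reserved characters in a single query term."""
    cleaned = []
    for ch in term:
        if ch in "<>":
            continue  # cannot be escaped, so drop it entirely
        if ch in ESCAPABLE:
            cleaned.append("\\")
        cleaned.append(ch)
    return "".join(cleaned)


escaped = escape_query_term("(1+1)=2")
# json.dumps doubles each backslash, as JSON string syntax requires.
body = json.dumps({"query": f"data.Formula:{escaped}"})
```

Building the request body with `json.dumps` avoids having to double the backslashes by hand.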

#### Wildcards <a name="wildcards"></a>

Wildcard searches can be run on individual terms, using ? to replace a single character, and * to replace zero or more characters.

```json
{
  "query": "data.Rig_Contractor:Oc?an Dr*"
}
```

Be aware that wildcard queries can use an enormous amount of memory and can therefore affect performance. They should be used very sparingly.

__Note:__ Leading wildcards are disabled by OSDU Data Platform Search Service. Allowing a wildcard at the beginning of a word (e.g. "*ean") is particularly heavy, because all terms in the index need to be examined, just in case they match.

#### Date Format <a name="date-format"></a>

If you need to use date in your query, it has to be in one of the following formats

```
 date-opt-time = date-element ['T' [time-element] [offset]]
 
 Example : 2017-12-29T00:00:00.987
 
 Please note that the time element is optional
```
```
 date-element = std-date-element 
  
 std-date-element  = yyyy ['-' MM ['-' dd]]
 
 Example: 2017-12-29
```
```
 time-element = HH [minute-element] | [fraction]
 
 minute-element = ':' mm [second-element] | [fraction]
   
 second-element = ':' ss [fraction]
  
 fraction = ('.' | ',') digit+
   
 offset = 'Z' | (('+' | '-') HH [':' mm [':' ss [('.' | ',') SSS]]])
```

For more info please refer to [Date format](http://www.joda.org/joda-time/apidocs/org/joda/time/format/ISODateTimeFormat.html#dateOptionalTimeParser--)

### Query by `nested` arrays objects <a name="nested-queries"></a>

Starting from OSDU version 0.9.0, `nested` hints can be set on object array nodes in data schemas. This leads to accurate indexing of those array objects in the underlying search backend.

`nested` attributes can be queried via the Search service using the ```nested()``` function:

- for a single-level "nested array":

```json
nested(<path-to-root-nested-array-node>, <root-nested-array-object-fields-query>)
```

- for nested (multi-level) "nested array" queries:

```json
nested(<path-to-root-nested-array-node>, nested(<path-to-subrootA-nested-array-node>, <subrootA-nested-array-object-fields-query>))
```

Multi-level nested queries are not limited in depth; nest them as the schema requires.

The following sections give several examples of single-level and multi-level nested queries. Their syntax is the same as described in the sections above. The only difference is that the conditions are scoped to the fields of the objects in the array named by the first argument of the enclosing `nested(path, (conditions))` function.

For more details on the capabilities and limitations of this feature, please refer to [ArrayOfObjects](#docs/tutorial/ArrayOfObjects.md).

#### Single-level one condition `nested` query

* where `work-product-component--WellboreMarkerSet` has any Marker with MarkerMeasuredDepth field value greater than 10000

```json
{
  "kind" : "osdu:wks:work-product-component--WellboreMarkerSet:1.0.0",
  "query":"nested(data.Markers, (MarkerMeasuredDepth:(>10000)))"
}
```

#### Single-level several conditions `nested` query

* where `work-product-component--WellboreMarkerSet` has any Marker with VerticalMeasurement field value greater than 100 and VerticalMeasurementPathID field value is "osdu-openness:reference-data--VerticalMeasurementPath:ELEV:"

```json
{
    "kind": "osdu:wks:work-product-component--WellboreMarkerSet:1.0.0",
    "query": "nested(data.VerticalMeasurements, (VerticalMeasurement:(>100) AND VerticalMeasurementPathID:\"osdu-openness:reference-data--VerticalMeasurementPath:ELEV:\"))"
}
```

#### Combination of single-level `nested` queries

* where `work-product-component--WellboreMarkerSet` has any Marker with MarkerMeasuredDepth field value greater than 10000 or SurfaceDipAzimuth field value less than 360

```json
{
  "kind" : "osdu:wks:work-product-component--WellboreMarkerSet:1.0.0",
  "query":"nested(data.Markers, (MarkerMeasuredDepth:(>10000))) OR nested(data.Markers, (SurfaceDipAzimuth:(<360)))"
}
```

#### Multi-level `nested` queries

Assume that each Marker object in data.Markers has a nested "Revisions" array of Revision objects with two fields of their own: "RevisionDate" and "RevisionEngineer". An indexed document might then look like this:

```json
{
  "data": {
      ...
      "Markers": [
          {
          ...
          "MarkerMeasuredDepth": 12345.6,
          "PositiveVerticalDelta": 12345.6,
          "Revisions": [
            {
              "RevisionDate": "2020-02-13T09:13:15.55+0000",
              "RevisionEngineer": "John Smith"
            }
          ]
          }
      ]
  }
}
```
```

We then might wish to search for `work-product-component--WellboreMarkerSet` having any Marker revised on a certain date by a certain engineer:

```json
{
  "kind" : "osdu:wks:work-product-component--WellboreMarkerSet:1.0.0",
  "query":"nested(data.Markers, nested(data.Markers.Revisions, (RevisionDate:\"2020-02-13T09:13:15.55+0000\" AND RevisionEngineer:\"John Smith\")))"
}
```

#### Nested and non-nested queries parts combinations

Both types of queries can be combined in one request, e.g.:

```json
{
  "kind" : "osdu:wks:work-product-component--WellboreMarkerSet:1.0.0",
  "query":"data.Name:\"Example Name\" AND nested(data.Markers, (MarkerMeasuredDepth:(>10000)))"
}
```

__Note:__ The supported boolean operators for `nested` queries are `AND`, `OR`, and `NOT`. These operators are case-sensitive.
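
The `nested()` strings shown in the examples above are easy to get wrong by hand, so a small helper can assemble them. The following Python sketch is only an illustration; the function name `nested` is hypothetical and simply reproduces the textual forms used in this section.

```python
def nested(path, condition):
    """Wrap a condition, or an inner nested() call, in a nested() clause."""
    if condition.startswith("nested("):
        # Multi-level form: the inner clause is passed through unchanged.
        return f"nested({path}, {condition})"
    # Single-level form: the conditions are wrapped in parentheses.
    return f"nested({path}, ({condition}))"


single = nested("data.Markers", "MarkerMeasuredDepth:(>10000)")
multi = nested(
    "data.Markers",
    nested("data.Markers.Revisions", 'RevisionEngineer:"John Smith"'),
)
# Nested and non-nested parts joined with an upper-case operator.
combined = f'data.Name:"Example Name" AND {single}'
```

The resulting strings are meant to be used as the value of the `query` request parameter.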

## Aggregation <a name="aggregate-queries"></a>

Aggregation returns the unique values of the field specified by the `aggregateBy` request parameter. It supports string, numeric, and boolean fields. A maximum of 1000 unique values can be returned by this request.

```json
{
  "kind": "opendes:welldb:*:*",
  "aggregateBy": "kind"
}
``` 

### Aggregation by `nested` arrays objects <a name="nested-aggregation"></a>

`nested` attributes can be aggregated by `nested(<path-to-root-nested-array-node>, <root-nested-array-object-fields-query>)` function.

```json
{
  "kind" : "osdu:wks:work-product-component--WellboreMarkerSet:1.0.0", 
  "aggregateBy": "nested(data.Markers, MarkerMeasuredDepth)"
}
```

For more details, ability and limitation about this feature, please refer to [ArrayOfObjects](#docs/tutorial/ArrayOfObjects.md).

## Sort <a name="sort-queries"></a>

A sort query lets you add one or more sorts on specific fields. Each sort can be reversed as well.

The sort feature supports string, int, float, double, long, datetime, nested object, and nested array of objects. Sorting on array of string, geo-point, and geo-shape types is not supported.

Records that either lack the sort field or have an empty value for it are listed last in the result.

For example, consider the following scenario:

1. opendes data partition has two kinds for welldb data source: opendes:welldb:wellbore:1.0.0 and opendes:welldb:well:1.0.0
2. data.Id in opendes:welldb:wellbore:1.0.0 has been ingested as INTEGER, but data.Id in opendes:welldb:well:1.0.0 has been ingested as TEXT
3. opendes:welldb:wellbore:1.0.0 has 10 records in total and 5 of them have an empty value in the data.Id field
4. opendes:welldb:well:1.0.0 also has 10 records in total and all of them have values in the data.Id field

```json
{
  "kind": "opendes:welldb:*:*",
  "sort": {
    "field": ["data.Id"],
    "order": ["ASC"]
  }
}
``` 
The above request payload asks the Search service to sort on "data.Id" in ascending order. The expected response has "totalCount: 10" instead of 20: the 10 returned records come only from opendes:welldb:wellbore:1.0.0, because data.Id in opendes:welldb:well:1.0.0 was ingested as TEXT, which is not currently supported for sorting, so those records are not returned. The 5 records with an empty data.Id value are listed last.

Search results are by default ordered by relevancy `_score` in descending order. Users are not required to provide any sort query for this. Users can also request records in reverse relevancy order:

```json
{
  "kind": "*:*:*:*",
  "query": "well",
  "sort": {
    "field": ["_score"],
    "order": ["ASC"]
  }
}
```

**NOTE:** The Search service does not validate the provided sort field, neither whether it exists nor whether it is of a supported data type. Different kinds may have attributes with the same name but of different data types. Therefore, it is the user's responsibility to be aware of this and validate it in their own workflow.

A sort query can be very expensive, especially if the given kind is too broad (e.g. "kind": "*:*:*:*"). The current time-out threshold is 60 seconds; a 504 error ("Request timed out after waiting for 1m") is returned if the request times out. Make the kind parameter as narrow as possible when using the sort feature.
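
Since the service does not validate sort fields, a client-side check of the `sort` block can catch simple mistakes early. The sketch below enforces the two documented constraints: the `field` and `order` lists must have the same length, and each order must be ASC or DESC in any letter case. The function name `build_sort` is hypothetical, and normalising the order values to upper case is a choice of this example.

```python
def build_sort(fields, orders):
    """Build the 'sort' block of a query request body."""
    if len(fields) != len(orders):
        raise ValueError("field and order must have the same length")
    normalized = []
    for order in orders:
        upper = order.upper()  # order values are case-insensitive
        if upper not in ("ASC", "DESC"):
            raise ValueError(f"unsupported sort order: {order}")
        normalized.append(upper)
    return {"field": list(fields), "order": normalized}


sort_block = build_sort(["data.Id", "_score"], ["asc", "DESC"])
```

The check cannot tell whether a field exists or has a sortable type; that still has to be verified against the schema of each kind.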

### Sort by `nested` arrays objects <a name="nested-sort"></a>

Each `nested` array generally contains several objects. The third parameter of the sorting function, `mode` (as in `nested(path, field, mode)`), controls which array value is picked for sorting the document it belongs to. The `mode` option can have the following values: min, max, avg.

The following example applies two levels of sorting by different fields of the `nested` Markers array objects. The first level uses 'min' mode with ASC sorting order; the second level uses 'max' mode with DESC sorting order.

```json
{
  "kind" : "osdu:wks:work-product-component--WellboreMarkerSet:1.0.0",
  "sort": {
    "field": ["nested(data.Markers, MarkerMeasuredDepth, min)", "nested(data.Markers, SurfaceDipAzimuth, max)"],
    "order": ["ASC", "DESC"]
  }
}
```

For more details, ability and limitation about this feature, please refer to [ArrayOfObjects](#docs/tutorial/ArrayOfObjects.md).

## Range Queries <a name="range-queries"></a>

Ranges can be specified for date, numeric or string fields. Inclusive ranges are specified with square brackets `[min TO max]` and exclusive ranges with curly brackets `{min TO max}`. Here are some of the examples:

* All SpudDate in 2012

```json
{
  "query": "data.SpudDate:[2012-01-01 TO 2012-12-31]"
}
```

* Count 1..5

```json
{
  "query": "data.Count:[1 TO 5]"
}
```

* Count from 10 upwards

```json
{
  "query": "data.Count:[10 TO *]"
}
```

* Ranges with one side unbounded can use the following syntax

```json
{
  "query": "data.ProjDepth:>10"
}
```

* To combine an upper and a lower bound with the simplified syntax, join two clauses with an AND operator

```json
{
  "query": "data.ProjDepth:(>=10 AND <20)"
}
```

* jobStatus tags between IN_PROGRESS & SUCCESS

```json
{
  "query": "tags.jobStatus:{IN_PROGRESS TO SUCCESS}"
}
```
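
The bracket forms used in the examples above can be generated with a small helper. This Python sketch is an illustration only; the function name `range_clause` is hypothetical, and it covers only the case where both ends are inclusive or both are exclusive.

```python
def range_clause(field, low="*", high="*", inclusive=True):
    """Format a range clause; '*' leaves that side unbounded."""
    # Square brackets include the bounds, curly brackets exclude them.
    opening, closing = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{opening}{low} TO {high}{closing}"


year_2012 = range_clause("data.SpudDate", "2012-01-01", "2012-12-31")
ten_upwards = range_clause("data.Count", 10)
between = range_clause("tags.jobStatus", "IN_PROGRESS", "SUCCESS",
                       inclusive=False)
```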

[Back to table of contents](#TOC)


## Geo-Spatial Queries <a name="geo-spatial-queries"></a>

OSDU Data Platform supports geo-point data, expressed as latitude/longitude pairs. The `spatialFilter` and `query` groups in the request have an AND relationship: if both are defined, the search service returns only results that match both.

The queries in this group are [Geo Distance](#geo-distance), [Geo Polygon](#geo-polygon) and [Bounding Box](#bounding-box). Only __one__ spatial criterion can be used when defining a filter.

__Note:__ Geo-spatial fields (which are indexed with a GeoJSON FeatureCollection payload) have a different structure in the Search service query response than in Storage records, because they are optimized for the search use case. They are not valid GeoJSON. To retrieve valid GeoJSON, please use the Storage service's record API.

### Geo Distance <a name="geo-distance"></a>

Filters documents to include only hits that lie within a specific distance from a geo point.

```json
POST /api/search/v2/query
{
  "kind": "opendes:welldb:wellbore:1.0.0",
  "spatialFilter": {
    "field": "data.Location",
    "byDistance": {
      "point": {
        "latitude": 37.450727,
        "longitude": -122.174762
        },
        "distance": 1500
    }
  },
  "offset": 0,
  "limit": 30
}
```

<details><summary>**Curl**</summary>

```bash
curl --request POST \
  --url '/api/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "opendes:welldb:wellbore:1.0.0",
  "spatialFilter": {
    "field": "data.Location",
    "byDistance": {
      "point": {
        "latitude": 37.450727,
        "longitude": -122.174762
        },
        "distance": 1500
    }
  },
  "offset": 0,
  "limit": 30
}'
```
</details>

| Parameter | Description |
| :--- | :--- |
| field | `geo-point` field in the index on which filtering will be performed. |
| distance | The radius, in meters, of the circle centered on the specified location. Points that fall into this circle are considered to be matches. See [Distance Units](#distance-units). |
| point.latitude | Latitude of the center point. |
| point.longitude | Longitude of the center point. |

### Distance Units <a name="distance-units"></a>

The distance parameter is expressed in meters, which is the default unit when no unit is specified.

__Note:__ In the current version, the Search API only supports distance in meters. Distance in other units, such as "1km" or "2mi" (2 miles), will be made available in future versions. The maximum value of distance is 1.5E308.

### Bounding Box <a name="bounding-box"></a>

Filters hits to those whose point location falls within a bounding box.

```json
POST /api/search/v2/query
{
  "kind": "opendes:welldb:wellbore:1.0.0",
  "spatialFilter": {
    "field": "data.Location",
    "byBoundingBox": {
      "topLeft": {
        "latitude": 37.450727,
        "longitude": -122.174762
        },
      "bottomRight": {
        "latitude": 37.438485,
        "longitude": -122.156110
      }
    }
  },
  "offset": 0,
  "limit": 30
}
```

<details><summary>**Curl**</summary>

```bash
curl --request POST \
  --url '/api/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "opendes:welldb:wellbore:1.0.0",
  "spatialFilter": {
    "field": "data.Location",
    "byBoundingBox": {
      "topLeft": {
        "latitude": 37.450727,
        "longitude": -122.174762
        },
      "bottomRight": {
        "latitude": 37.438485,
        "longitude": -122.156110
      }
    }
  },
  "offset": 0,
  "limit": 30
}'
```
</details>

| Parameter | Description | 
| :--- | :--- |
| field | `geo-point` field in the index on which filtering will be performed. |
| topLeft.latitude | latitude of top left corner of bounding box. |
| topLeft.longitude | longitude of top left corner of bounding box. |
| bottomRight.latitude | latitude of bottom right corner of bounding box. |
| bottomRight.longitude | longitude of bottom right corner of bounding box. |
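
A bounding box is easy to specify with its corners swapped, so it can help to check the coordinates before sending the request. The Python sketch below builds the `spatialFilter` block from two (latitude, longitude) pairs. The function name `bounding_box_filter` is hypothetical, and the range and ordering checks are client-side sanity checks assumed for this example; they are not documented service validation. Longitude ordering is deliberately not checked.

```python
def bounding_box_filter(field, top_left, bottom_right):
    """Build a byBoundingBox spatialFilter from (lat, lon) pairs."""
    (top_lat, left_lon), (bottom_lat, right_lon) = top_left, bottom_right
    for lat in (top_lat, bottom_lat):
        if not -90 <= lat <= 90:
            raise ValueError(f"latitude out of range: {lat}")
    for lon in (left_lon, right_lon):
        if not -180 <= lon <= 180:
            raise ValueError(f"longitude out of range: {lon}")
    if top_lat < bottom_lat:
        raise ValueError("topLeft must not be south of bottomRight")
    return {
        "field": field,
        "byBoundingBox": {
            "topLeft": {"latitude": top_lat, "longitude": left_lon},
            "bottomRight": {"latitude": bottom_lat, "longitude": right_lon},
        },
    }


spatial_filter = bounding_box_filter(
    "data.Location", (37.450727, -122.174762), (37.438485, -122.156110)
)
```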

### Geo Polygon <a name="geo-polygon"></a>

Filters hits to those that fall within a closed polygon.

```json
POST /api/search/v2/query
{
  "kind": "opendes:welldb:wellbore:1.0.0",
  "spatialFilter": {
    "field": "data.Location",
    "byGeoPolygon": {
      "points": [
        {"longitude":-90.65, "latitude":28.56},
        {"longitude":-90.65, "latitude":35.56},
        {"longitude":-85.65, "latitude":35.56},
        {"longitude":-85.65, "latitude":28.56},
        {"longitude":-90.65, "latitude":28.56} 
      ]
    }
  },
  "offset": 0,
  "limit": 30
}
```

<details><summary>**Curl**</summary>

```bash
curl --request POST \
  --url '/api/search/v2/query' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "opendes:welldb:wellbore:1.0.0",
  "spatialFilter": {
    "field": "data.Location",
    "byGeoPolygon": {
     "points": [
        {"longitude":-90.65, "latitude":28.56},
        {"longitude":-90.65, "latitude":35.56},
        {"longitude":-85.65, "latitude":35.56},
        {"longitude":-85.65, "latitude":28.56},
        {"longitude":-90.65, "latitude":28.56} 
      ]
    }
  },
  "offset": 0,
  "limit": 30
}'
```
</details>

| Parameter | Description | 
| :--- | :--- |
| field | `geo-point` field in the index on which filtering will be performed. |
| points | list of `geo-point` describing polygon. |
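
In the sample request above, the last point repeats the first one, which closes the polygon. The following Python sketch builds a `byGeoPolygon` block that follows the same convention. The function name `geo_polygon` is hypothetical, and whether the service itself requires the repeated closing point is not stated in this document, so treat the closing step as an assumption.

```python
def geo_polygon(points):
    """Build a byGeoPolygon block from (longitude, latitude) pairs."""
    ring = list(points)
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]  # drop an existing closing point
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three distinct points")
    ring.append(ring[0])  # repeat the first point to close the ring
    return {
        "points": [{"longitude": lon, "latitude": lat} for lon, lat in ring]
    }


polygon = geo_polygon([
    (-90.65, 28.56), (-90.65, 35.56), (-85.65, 35.56), (-85.65, 28.56),
])
```

The result can be placed under `spatialFilter.byGeoPolygon` in the request body.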


[Back to table of contents](#TOC)


## Cross-Kind Queries <a name="cross-kind-queries"></a>

770
OSDU Data Platform search supports cross-kind queries. A typical kind can be formatted as authority/data-partition-id:data-source-id:entity-type:schema-version. Each of the text partitioned by ':' can be replaced with wildcard characters to support cross-kind search.

* Search across all data sources, types, and versions for opendes

```json
{
  "kind": "opendes:*:*:*"
}
```

* Search across all data sources in opendes for type `well` with schema version 1.0.0

```json
{
  "kind": "opendes:*:well:1.0.0"
}
```

* Search across all types and versions for the welldb namespace in opendes

```json
{
  "kind": "opendes:welldb:*:*"
}
```
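
The three patterns above follow one rule: keep the segments you care about and put `*` in the rest. A small Python sketch of that rule is shown below; the helper `cross_kind` and its parameter names are hypothetical and only illustrate how the `kind` string is composed on the client side.

```python
def cross_kind(authority="*", source="*", entity_type="*", version="*"):
    """Compose a kind string; any segment left at its default becomes a wildcard."""
    segments = [authority, source, entity_type, version]
    for segment in segments:
        # ':' separates segments, so it cannot appear inside one.
        if not segment or ":" in segment:
            raise ValueError(f"invalid kind segment: {segment!r}")
    return ":".join(segments)


# The three examples from this section.
print(cross_kind("opendes"))                                       # opendes:*:*:*
print(cross_kind("opendes", entity_type="well", version="1.0.0"))  # opendes:*:well:1.0.0
print(cross_kind("opendes", "welldb"))                             # opendes:welldb:*:*
```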

[Back to table of contents](#TOC)


## Query With Cursor <a name="query-with-cursor"></a>

While a search request returns a single “page” of results, the `query_with_cursor` API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database.

The cursor API is not intended for real-time user requests, but rather for processing large amounts of data.

The [parameters](#parameters) passed in the request body are exactly the same as for the `query` API, except for the `offset` and `cursor` values. Please note that `offset` is not a valid parameter in the `query_with_cursor` API.

__Note:__ The results that are returned from a `query_with_cursor` request reflect the state of the index at the time that the initial search request was made, like a snapshot in time. Subsequent changes to documents (index, update or delete) will only affect later search requests.

To use `query_with_cursor`, send the initial search request to the following endpoint:

```json
POST /api/search/v2/query_with_cursor
{
  "kind": "opendes:welldb:wellbore:1.0.0",
  "query": "data.Status:Active",
  "limit": 30,
  "spatialFilter": {
    "field": "data.Location",
    "byBoundingBox": {
      "topLeft": {
        "latitude": 48.450727,
        "longitude": -122.174762
      },
      "bottomRight": {
        "latitude": 37.450727,
        "longitude": 22.174762
      }
    }
  },
  "returnedFields": [ "data.Status" ]
}
```

<details><summary>**Curl**</summary>

```bash
curl --request POST \
  --url '/api/search/v2/query_with_cursor' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "opendes:welldb:wellbore:1.0.0",
  "query": "data.Status:Active",
  "limit": 30,
  "spatialFilter": {
    "field": "data.Location",
    "byBoundingBox": {
      "topLeft": {
        "latitude": 48.450727,
        "longitude": -122.174762
      },
      "bottomRight": {
        "latitude": 37.450727,
        "longitude": 22.174762
      }
    }
  },
  "returnedFields": [ "data.Status" ]
}'
```
</details>

A successful response to the above request includes a `cursor`, which should be passed to the next call of the `query_with_cursor` API in order to retrieve the next batch of results.

```json
POST /api/search/v2/query_with_cursor
{
  "kind": "opendes:welldb:wellbore:1.0.0",
  "cursor": "cursor-key"
}
```

<details><summary>**Curl**</summary>

```bash
curl --request POST \
  --url '/api/search/v2/query_with_cursor' \
  --header 'accept: application/json' \
  --header 'authorization: Bearer <JWT>' \
  --header 'content-type: application/json' \
  --header 'data-partition-id: opendes' \
  --data '{
  "kind": "opendes:welldb:wellbore:1.0.0",
  "cursor": "cursor-key"
}'
```
</details>

__Caution:__ As subsequent batches of results are retrieved by the `query_with_cursor` API, the cursor value may or may not change. API users should not expect a different cursor value in each `query_with_cursor` response.

__Note:__ To process the next `query_with_cursor` request, the search service keeps the search context alive for 1 minute, which is the time allowed for processing the next batch of results. Each cursor request sets a new expiry time. If the next request is not made within 1 minute, the cursor expires and returns no more results.
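
The request sequence described above (one initial request, then repeated requests carrying the cursor) can be sketched as a small Python loop. This is an illustration under stated assumptions, not a reference client: the function names are hypothetical, the response is assumed to expose `results` and `cursor` fields, and an empty batch or a missing cursor is assumed to mean that no results are left. The stub stands in for the service so the loop can run without a deployment.

```python
import json
import urllib.request


def http_post(url, body, headers):
    """POST a JSON body and decode the JSON response (standard library only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_cursor_results(post, body):
    """Yield every record of a query_with_cursor search, one batch at a time.

    `post` is any callable that takes a request body and returns the decoded
    response, so the loop can be exercised without a live service.
    """
    # offset is not a valid parameter for query_with_cursor, so drop it.
    request_body = {key: value for key, value in body.items() if key != "offset"}
    while True:
        page = post(request_body)
        results = page.get("results") or []
        yield from results
        cursor = page.get("cursor")
        # Assumption: an empty batch or a missing cursor means nothing is left.
        if not results or not cursor:
            return
        # Follow-up requests carry only the kind and the cursor, as shown above.
        request_body = {"kind": body["kind"], "cursor": cursor}


def stub_post(request_body):
    """Stand-in for the service: two batches, then an empty one."""
    if "cursor" not in request_body:
        return {"results": [{"id": "a"}, {"id": "b"}], "cursor": "cursor-key"}
    if not stub_post.second_batch_served:
        stub_post.second_batch_served = True
        # The cursor value may stay the same between responses.
        return {"results": [{"id": "c"}], "cursor": "cursor-key"}
    return {"results": []}


stub_post.second_batch_served = False

records = list(iter_cursor_results(stub_post, {
    "kind": "opendes:welldb:wellbore:1.0.0",
    "query": "data.Status:Active",
    "limit": 30,
}))
print([record["id"] for record in records])
```

Against a real deployment, `post` could be a small wrapper around `http_post` that supplies your own host, the `/api/search/v2/query_with_cursor` path, and the `authorization` and `data-partition-id` headers. Each batch should be processed quickly enough that the next request is sent before the cursor expires.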

[Back to table of contents](#TOC)


## Get indexing status <a name="get-indexing-status"></a>

The Indexer service adds internal metadata to each record to register the status of indexing. The metadata includes the status and the last indexing date and time. This additional meta block helps to see the details of indexing. The format of the index meta block is as follows:

```json
"index": {
    "trace": [
        String,
        String
    ],
    "statusCode": Integer,
    "lastUpdateTime": Datetime
}
```
Example:
```json
{
    "results": [
        {
            "index": {
                "trace": [
                    "datetime parsing error: unknown format for attribute: endDate | value: 9000-01-01T00:00:00.0000000",
                    "datetime parsing error: unknown format for attribute: startDate | value: 1990-01-01T00:00:00.0000000"
                ],
                "statusCode": 400,
                "lastUpdateTime": "2018-11-16T01:44:08.687Z"
            }
        }
    ],
    "totalCount": 31895
} 
```

Details of the index block:
1) trace: This field collects all the issues related to indexing and concatenates them using '|'. This is a String field.
2) statusCode: This field determines the category of the error. This is an integer field. It can have the following values:
    * 200 - All OK
    * 404 - Schema is missing in Schema and Storage service
    * 400 - Some fields were not properly mapped with the schema defined, e.g. the schema defines a field as `int` but the input record had a `text` value for that attribute
3) lastUpdateTime: This field captures the last time the record was updated by the indexer service. This is a datetime field, so you can do range queries on this field.

You can query the index status using the following example query:

```bash
curl --request POST \
  --url /api/search/v2/query \
  --header 'Authorization: Token' \
  --header 'Content-Type: application/json' \
  --header 'Data-Partition-Id: Data partition id' \
  --data '{"kind": "*:*:*:*","query": "index.statusCode:404","returnedFields": ["index"]}'
```

__Note:__ By default, the API response excludes the `index` attribute block. The user must specify `index` in `returnedFields` in order to see it in the response.

The above query returns all records whose index status code is 404, that is, records whose schema is missing. To find records that had problems due to field mismatches, query `index.statusCode:400` instead.
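
Once the `index` block is returned, it can be summarized on the client side. The Python sketch below is only an illustration: the function name `summarize_index_status` is hypothetical, and the sample response is the example shown earlier in this section.

```python
from collections import Counter


def summarize_index_status(response):
    """Count records per statusCode and collect all trace messages."""
    counts = Counter()
    traces = []
    for record in response.get("results", []):
        index = record.get("index")
        if index is None:
            # 'index' is only present when requested through returnedFields.
            continue
        counts[index["statusCode"]] += 1
        traces.extend(index.get("trace", []))
    return counts, traces


sample = {
    "results": [
        {
            "index": {
                "trace": [
                    "datetime parsing error: unknown format for attribute: endDate | value: 9000-01-01T00:00:00.0000000",
                    "datetime parsing error: unknown format for attribute: startDate | value: 1990-01-01T00:00:00.0000000",
                ],
                "statusCode": 400,
                "lastUpdateTime": "2018-11-16T01:44:08.687Z",
            }
        }
    ],
    "totalCount": 31895,
}

counts, traces = summarize_index_status(sample)
print(dict(counts))
for message in traces:
    print(message)
```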

[Back to table of contents](#TOC)

## Version info endpoint <a name="version-info-endpoint"></a>

Provides build and Git-related information for the Search service.

#### Example response:

```json
GET /api/search/v2/info
{
    "groupId": "org.opengroup.osdu",
    "artifactId": "search-gcp",
    "version": "0.10.0-SNAPSHOT",
    "buildTime": "2021-07-09T14:29:51.584Z",
    "branch": "feature/GONRG-2681_Build_info",
    "commitId": "7777",
    "commitMessage": "Added copyright to version info properties file",
    "connectedOuterServices": [
      {
        "name": "elasticSearch",
        "version":"..."
      },
      {
        "name": "redis",
        "version":"..."
      }
    ]
}
```

This endpoint reads information from files generated by the `spring-boot-maven-plugin` and `git-commit-id-plugin` plugins. The paths to the generated files must be specified in the matching properties:
- `version.info.buildPropertiesPath`
- `version.info.gitPropertiesPath`

[Back to table of contents](#TOC)