Search service: queries using wildcards in the query field fail to retrieve results
When we send /query requests that use the * wildcard, the search fails to retrieve all matching results. The simplest case to reproduce is when the * wildcard must match exactly one character: the record is not retrieved. This is inconsistent with the Lucene wildcard documentation, which states that the * wildcard matches zero or more characters.
This occurs in M20.
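For reference, the Lucene wildcard semantics this report relies on can be sketched as a translation to a regular expression: `*` matches zero or more characters and `?` matches exactly one. This is a minimal illustration, not Lucene's implementation. The helper names `lucene_wildcard_to_regex` and `wildcard_matches` are hypothetical, backslash-escaped wildcards are not handled, and matching is assumed to cover the whole, case-sensitive field value.

```python
import re


def lucene_wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a Lucene-style wildcard term into a compiled regex.

    Hypothetical helper for illustration only:
    '*' -> zero or more characters, '?' -> exactly one character,
    every other character is matched literally.
    Backslash escaping of wildcards is NOT handled.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_matches(pattern: str, value: str) -> bool:
    # fullmatch: the pattern must cover the entire value (assumption)
    return lucene_wildcard_to_regex(pattern).fullmatch(value) is not None


print(wildcard_matches("BIR-0*", "BIR-09"))  # True  ('*' matches one character)
print(wildcard_matches("BIR-0*", "BIR-0"))   # True  ('*' matches zero characters)
print(wildcard_matches("BIR-*", "BIR-12"))   # True  ('*' matches two characters)
print(wildcard_matches("BIR-0*", "BIR-12"))  # False
```

Under these semantics, `BIR-0*` is expected to match `BIR-09`, which is exactly the case that returns nothing below.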
Steps to reproduce:
- Create a data partition, e.g. osdu.
- Create a user.
- Add the user to the users.data.root entitlements group in uqfrhblcek.
- Check that the record exists: invoke the /query endpoint as that user, with the data-partition-id header set to osdu and no wildcard:
curl 'https://{baseurl}/api/search/v2/query' \
-H 'authorization: Bearer {authtoken}' \
-H 'content-type: application/json; charset=UTF-8' \
-H 'data-partition-id: osdu' \
--data-raw '{"kind":"*:*:*:*","query":"data.FacilityName:\"BIR-09\"","returnedFields":["data.FacilityName"]}' \
--compressed
One record is retrieved:
{ "results": [ { "data": { "FacilityName": "BIR-09" } } ], "aggregations": null, "totalCount": 1 }
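The double quotes around the search term must be escaped inside the JSON body, which is easy to get wrong by hand. A minimal sketch using only the Python standard library builds the same request as the curl command above and lets the JSON serializer do the escaping. `build_query_request` is a hypothetical helper; `base_url` and `token` stand in for the `{baseurl}` and `{authtoken}` placeholders, and the `*:*:*:*` (all kinds) value for `kind` is an assumption about what the command intends. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_query_request(base_url, token, partition, query,
                        returned_fields=None, kind="*:*:*:*"):
    """Build (but do not send) a POST to the Search v2 /query endpoint.

    Hypothetical helper mirroring the curl commands in this report;
    kind="*:*:*:*" is an assumed default meaning "all kinds".
    """
    body = {"kind": kind, "query": query}
    if returned_fields is not None:
        body["returnedFields"] = list(returned_fields)
    return urllib.request.Request(
        url=f"{base_url}/api/search/v2/query",
        # json.dumps escapes the inner quotes: "BIR-09" -> \"BIR-09\"
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=UTF-8",
            "data-partition-id": partition,
        },
        method="POST",
    )


req = build_query_request(
    "https://example.invalid", "TOKEN", "osdu",
    query='data.FacilityName:"BIR-09"',
    returned_fields=["data.FacilityName"],
)
print(req.data.decode("utf-8"))
# {"kind": "*:*:*:*", "query": "data.FacilityName:\"BIR-09\"", "returnedFields": ["data.FacilityName"]}
```

Sending it would be `urllib.request.urlopen(req)` against a real base URL and token; the wildcard variants below only change the `query` argument.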
- Retrieve records using the * wildcard for multiple characters: invoke the /query endpoint as that user, with the data-partition-id header set to osdu and a wildcard that matches more than one character:
curl 'https://{baseurl}/api/search/v2/query' \
-H 'authorization: Bearer {authtoken}' \
-H 'content-type: application/json; charset=UTF-8' \
-H 'data-partition-id: osdu' \
--data-raw '{"kind":"*:*:*:*","query":"data.FacilityName:\"BIR-*\"","returnedFields":["data.FacilityName"]}' \
--compressed
Multiple records are retrieved:
{ "results": [ { "data": { "FacilityName": "BIR-03" } }, { "data": { "FacilityName": "BIR-02" } }, { "data": { "FacilityName": "BIR-01" } }, { "data": { "FacilityName": "BIR-12" } }, { "data": { "FacilityName": "BIR-10" } }, { "data": { "FacilityName": "BIR-09" } }, { "data": { "FacilityName": "BIR-06" } }, { "data": { "FacilityName": "BIR-04" } }, { "data": { "FacilityName": "BIR-07" } }, { "data": { "FacilityName": "BIR-05" } } ], "aggregations": null, "totalCount": 14 }
- Retrieve records using the * wildcard for a single character: invoke the /query endpoint as that user, with the data-partition-id header set to osdu and a wildcard that matches exactly one character:
curl 'https://{baseurl}/api/search/v2/query' \
-H 'authorization: Bearer {authtoken}' \
-H 'content-type: application/json; charset=UTF-8' \
-H 'data-partition-id: osdu' \
--data-raw '{"kind":"*:*:*:*","query":"data.FacilityName:\"BIR-0*\""}' \
--compressed
No records are retrieved:
{ "results": [], "aggregations": [], "totalCount": 0 }
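Under the documented semantics, `BIR-0*` should match every value returned by the `BIR-*` query that starts with `BIR-0`. A quick check of that expectation, using Python's `fnmatch.fnmatchcase` as a stand-in for Lucene's `*` (an assumption: it is not Lucene, and it also interprets `?` and `[...]`, which do not appear here) and assuming whole-value, case-sensitive matching. Only the first page of the `BIR-*` results (10 of 14) is known, so the count is a lower bound.

```python
import fnmatch
import json

# FacilityName values returned by the "BIR-*" query (first page only,
# 10 of totalCount 14), copied from the report.
bir_names = ["BIR-03", "BIR-02", "BIR-01", "BIR-12", "BIR-10",
             "BIR-09", "BIR-06", "BIR-04", "BIR-07", "BIR-05"]

# Response observed for the "BIR-0*" query, copied from the report.
observed = json.loads('{ "results": [], "aggregations": [], "totalCount": 0 }')

# fnmatchcase: '*' matches zero or more characters, case-sensitively.
expected = sorted(n for n in bir_names if fnmatch.fnmatchcase(n, "BIR-0*"))

print(expected)
# ['BIR-01', 'BIR-02', 'BIR-03', 'BIR-04', 'BIR-05', 'BIR-06', 'BIR-07', 'BIR-09']
print(len(expected), "expected (at least) vs", observed["totalCount"], "returned")
# 8 expected (at least) vs 0 returned
```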
- Additional case in which the * wildcard for multiple characters does not retrieve all records: invoke the /query endpoint as that user, with the data-partition-id header set to osdu and a wildcard that matches more than one character:
curl 'https://{baseurl}/api/search/v2/query' \
-H 'authorization: Bearer {authtoken}' \
-H 'content-type: application/json; charset=UTF-8' \
-H 'data-partition-id: osdu' \
--data-raw '{"kind":"*:*:*:*","query":"data.FacilityName:\"B*-09\"","returnedFields":["data.FacilityName"]}' \
--compressed
Multiple records are retrieved, but some are missing (e.g. BIR-09):
{ "results": [ { "data": { "FacilityName": "K10-B-09" } }, { "data": { "FacilityName": "L10-B-09" } }, { "data": { "FacilityName": "K12-B-09" } }, { "data": { "FacilityName": "P15-RIJN-B-09" } } ], "aggregations": null, "totalCount": 4 }
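When re-running these steps, a small check like the following makes a missing record explicit instead of relying on reading the response by eye. It parses the `B*-09` response shown above and reports which expected `FacilityName` values are absent. The helpers `returned_names` and `missing_names` are hypothetical, and `BIR-09` is used as the expected value only because the report names it as missing.

```python
import json

# Response observed for the "B*-09" query, copied from the report.
response_text = """
{ "results": [
    { "data": { "FacilityName": "K10-B-09" } },
    { "data": { "FacilityName": "L10-B-09" } },
    { "data": { "FacilityName": "K12-B-09" } },
    { "data": { "FacilityName": "P15-RIJN-B-09" } } ],
  "aggregations": null, "totalCount": 4 }
"""


def returned_names(response: dict) -> set:
    """Collect data.FacilityName from each hit in a /query response."""
    return {
        hit.get("data", {}).get("FacilityName")
        for hit in response.get("results", [])
    }


def missing_names(response: dict, expected) -> set:
    """Expected FacilityName values that the response does not contain."""
    return set(expected) - returned_names(response)


response = json.loads(response_text)
print(missing_names(response, ["BIR-09"]))  # {'BIR-09'}
```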