Too many results returned when the BagOfWords feature is enabled
Hi,
When the BagOfWords feature is enabled, some search queries with a "query" filter return too many results.
I've reproduced the issue on several AWS environments, and I don't have this issue if the indexer is deployed with the feature flag featureFlag.bagOfWords.enabled set to False.
I have attached the 3 records and schema I used (these are from the os-search integration tests in testing/integration-tests/search-test-core/src/main/resources/testData/records_1.json).
(I didn't delete these 3 records from the main.osdu-gl.osdu.aws environment, so if you have access to it, you should be able to reproduce these queries.)
Once the records are indexed, issue a search query with the following payload:
{
  "kind": "opendes:search1704732571020:test-data--Integration:1.0.1",
  "query": "OFFICE9"
}
All 3 records are returned instead of 0 (none of the 3 records contains the text "OFFICE9").
The same happens if I use a "valid" query matching at least one record, for example:
{
  "kind": "opendes:search1704732571020:test-data--Integration:1.0.1",
  "query": "OFFICE4"
}
This also returns 3 records instead of 1.
The issue seems to occur only when the query term ends with a digit. If I use a letter suffix instead, it works properly, for example:
{
  "kind": "opendes:search1704732571020:test-data--Integration:1.0.1",
  "query": "OFFICEZ"
}
This correctly returns 0 results.
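The three cases above can be captured in a small regression check. This is only a minimal sketch, not code from the Search service: `build_payload`, `count_mismatches`, and the `search` callable are hypothetical names, and the expected counts (0, 1, 0) are the ones stated in this report. The `search` callable is assumed to take a payload dict and return the list of matching records; how it reaches the Search service is left to the caller.

```python
from typing import Callable, Dict, List

KIND = "opendes:search1704732571020:test-data--Integration:1.0.1"

# Expected hit counts for the 3 test records, as described in this report.
EXPECTED_COUNTS: Dict[str, int] = {"OFFICE9": 0, "OFFICE4": 1, "OFFICEZ": 0}


def build_payload(term: str) -> Dict[str, str]:
    """Build the search payload used in this report for a given query term."""
    return {"kind": KIND, "query": term}


def count_mismatches(search: Callable[[Dict[str, str]], List[dict]]) -> Dict[str, int]:
    """Run every query through `search` (a hypothetical callable) and return
    {term: actual_count} for the terms whose count differs from the expected one."""
    mismatches: Dict[str, int] = {}
    for term, expected in EXPECTED_COUNTS.items():
        actual = len(search(build_payload(term)))
        if actual != expected:
            mismatches[term] = actual
    return mismatches
```

An empty result means every query returned the expected number of records. With the behaviour described above (all 3 records returned for digit suffixes), the result would list OFFICE9 and OFFICE4 with a count of 3 each.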
I have managed to reproduce the issue directly on the Elasticsearch server by using its REST API, so I don't think the issue is with the Search service:
POST https://localhost:9200/opendes-search1704732571020-test-data--integration-1.0.1/_search (I'm using k8s port-forwarding to connect directly to the ES server) with the following payload:
{
  "from": 0,
  "size": 10,
  "timeout": "1m",
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": [
              {
                "query_string": {
                  "query": "OFFICE9"
                }
              }
            ],
            "adjust_pure_negative": true,
            "boost": 1.0
          }
        }
      ]
    }
  }
}
This returns 3 results when BagOfWords is enabled, and only 1 when it is not.
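To compare the two deployments from a script, the request above could be sent with Python's standard library. This is a sketch under assumptions: `build_es_query` and `count_hits` are hypothetical helpers, the base URL would be the port-forwarded address used above, authentication and TLS certificate handling are left out and would need to be added for a real cluster, and the response is assumed to follow the usual Elasticsearch `_search` shape with the matching documents under `hits.hits`.

```python
import json
import urllib.request

INDEX = "opendes-search1704732571020-test-data--integration-1.0.1"


def build_es_query(term: str) -> dict:
    """Build the same request body as in this report for a given query term."""
    return {
        "from": 0,
        "size": 10,
        "timeout": "1m",
        "query": {
            "bool": {
                "must": [
                    {
                        "bool": {
                            "must": [{"query_string": {"query": term}}],
                            "adjust_pure_negative": True,
                            "boost": 1.0,
                        }
                    }
                ]
            }
        },
    }


def count_hits(base_url: str, term: str) -> int:
    """POST the query to the index and return the number of documents in the
    response page (at most `size`, which is enough for the 3 test records)."""
    request = urllib.request.Request(
        f"{base_url}/{INDEX}/_search",
        data=json.dumps(build_es_query(term)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Assumption: no auth and a certificate Python trusts; adapt as needed.
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return len(body["hits"]["hits"])
```

Calling `count_hits("https://localhost:9200", "OFFICE9")` once with the flag enabled and once with it disabled would give the two numbers to compare.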