Too many results returned after enabling the BagOfWords feature

Hi,

When the BagOfWords feature is enabled, some search queries with a "query" filter return too many results. I've reproduced the issue on several AWS environments, and it does not occur if the indexer is deployed with the feature flag featureFlag.bagOfWords.enabled set to False.

I have attached the 3 records and the schema I used (these are from the os-search integration tests in testing/integration-tests/search-test-core/src/main/resources/testData/records_1.json).

records.json schema.json

(I didn't delete these 3 records from the main.osdu-gl.osdu.aws environment, so if you have access to it, you should be able to reproduce these queries.)

Once the records are indexed, issue a search query with the following payload:

{
    "kind": "opendes:search1704732571020:test-data--Integration:1.0.1",
    "query": "OFFICE9"
}

All 3 records are returned instead of 0 (none of the 3 records contains the text "OFFICE9").

The same happens with a "valid" query that matches at least one record, for example:

{
    "kind": "opendes:search1704732571020:test-data--Integration:1.0.1",
    "query": "OFFICE4"
}

This also returns 3 records instead of 1.

This issue seems to occur only when the query term has a digit suffix. With a letter suffix it works properly, for example:

{
    "kind": "opendes:search1704732571020:test-data--Integration:1.0.1",
    "query": "OFFICEZ"
}

This correctly returns 0 results.
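The three cases above can be captured as a small regression check. This is only a sketch in Python: `build_payload`, `find_mismatches`, and the `search` callable are hypothetical names, and `search` stands in for whatever client actually posts the payload to the Search service and returns the number of hits. The expected counts are the ones stated in this report.

```python
from typing import Callable, Dict, List

KIND = "opendes:search1704732571020:test-data--Integration:1.0.1"

# Expected hit counts as described in this report:
# only OFFICE4 matches a record, the other two terms match nothing.
EXPECTED_COUNTS = {"OFFICE9": 0, "OFFICE4": 1, "OFFICEZ": 0}


def build_payload(term: str) -> Dict[str, str]:
    """Build the search payload used in the report for a given query term."""
    return {"kind": KIND, "query": term}


def find_mismatches(search: Callable[[Dict[str, str]], int]) -> List[str]:
    """Run every case through `search` and describe the ones that disagree.

    `search` is a hypothetical callable: it takes a payload and returns
    the number of records the Search service reported for it.
    """
    mismatches = []
    for term, expected in EXPECTED_COUNTS.items():
        actual = search(build_payload(term))
        if actual != expected:
            mismatches.append(f"{term}: expected {expected}, got {actual}")
    return mismatches


def reported_behaviour(payload: Dict[str, str]) -> int:
    """Stand-in that mimics the counts observed with BagOfWords enabled."""
    return 0 if payload["query"] == "OFFICEZ" else 3
```

Calling `find_mismatches(reported_behaviour)` flags the OFFICE9 and OFFICE4 cases and leaves OFFICEZ alone, which matches the behaviour described above. Replacing `reported_behaviour` with a real client call would turn this into a check against a live environment.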

I managed to reproduce the issue directly on the Elasticsearch server using its REST API, so I don't think the issue is in the Search service:

POST https://localhost:9200/opendes-search1704732571020-test-data--integration-1.0.1/_search (I'm using k8s port-forwarding to connect directly to the ES server) with the following payload:

{
    "from": 0,
    "size": 10,
    "timeout": "1m",
    "query": {
        "bool": {
            "must": [
                {
                    "bool": {
                        "must": [
                            {
                                "query_string": {
                                    "query": "OFFICE9"
                                }
                            }
                        ],
                        "adjust_pure_negative": true,
                        "boost": 1.0
                    }
                }
            ]
        }
    }
}

This returns 3 results when BagOfWords is enabled, and only 1 when it is not.
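For anyone who wants to script the same request, here is a minimal Python sketch using only the standard library. The function names are hypothetical, and several things are assumptions not confirmed by this report: that the port-forwarded endpoint accepts the request without extra authentication headers, and that certificate verification may need to be relaxed for a local port-forward (pass your own `ssl.SSLContext` if so). `count_hits` accepts both shapes of `hits.total` (a plain integer or an object with a `value` field), since the ES version is not stated here.

```python
import json
import ssl
import urllib.request
from typing import Any, Dict, Optional

INDEX = "opendes-search1704732571020-test-data--integration-1.0.1"


def build_es_query(term: str) -> Dict[str, Any]:
    """Build the same request body as the payload shown above."""
    return {
        "from": 0,
        "size": 10,
        "timeout": "1m",
        "query": {"bool": {"must": [{"bool": {
            "must": [{"query_string": {"query": term}}],
            "adjust_pure_negative": True,
            "boost": 1.0,
        }}]}},
    }


def count_hits(response: Dict[str, Any]) -> int:
    """Read the hit count whether hits.total is an int or {"value": n}."""
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


def search_index(base_url: str, term: str,
                 context: Optional[ssl.SSLContext] = None) -> int:
    """POST the query to <base_url>/<INDEX>/_search and return the hit count.

    Authentication is deliberately left out; add headers as your cluster requires.
    """
    request = urllib.request.Request(
        f"{base_url}/{INDEX}/_search",
        data=json.dumps(build_es_query(term)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, context=context) as reply:
        return count_hits(json.load(reply))
```

With the port-forward in place, `search_index("https://localhost:9200", "OFFICE9")` would send the request described above and return the number of hits.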

Edited Jan 08, 2024 by Chad Leong