Manifest size calculation is incorrect
Context: EDS ingest calculates the total size of the search results to determine whether it should use the manifest-by-reference DAG. If the search results exceed 12MB, the manifest-by-reference DAG is used (if available).
Problem
When calculating the size of the search results, we noticed the calculated size is significantly greater than expected.
Testing & Root Cause
Below is the same request that EDS makes to the search API. The total size of the content returned is 3.7MB (highlighted by the green box).
The code currently calculates the file size using UTF-32 encoding.
I added additional lines to calculate the size using UTF-8 encoding as that is the encoding of the response from the API.
The results show that UTF-8 encoding matches the content size reported by the API. UTF-32 encoding reports the content to be > 15MB (note: there is a typo in the log statement that should read UTF-32 instead of UTF-16)
As a result, the inflated UTF-32 size exceeds the 12MB threshold and triggers the large data ingestion flow when it doesn't need to.
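The size difference can be reproduced outside EDS with a minimal sketch. The names `SIZE_THRESHOLD_BYTES`, `encoded_size`, and `exceeds_threshold` are hypothetical and not taken from the EDS code, the payload is an ASCII-only stand-in for the 3.7MB search response, and the threshold is assumed to be 12 MiB (the issue only says "12MB").

```python
# Assumption: the threshold is 12 MiB; the issue only states "12MB".
SIZE_THRESHOLD_BYTES = 12 * 1024 * 1024


def encoded_size(text: str, encoding: str = "utf-8") -> int:
    """Return the number of bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))


def exceeds_threshold(text: str, encoding: str = "utf-8") -> bool:
    """Hypothetical check: would this payload take the large-data path?"""
    return encoded_size(text, encoding) > SIZE_THRESHOLD_BYTES


# Hypothetical stand-in for a 3.7MB, ASCII-only search response.
payload = "x" * 3_700_000

utf8_size = encoded_size(payload, "utf-8")
# Python's "utf-32" codec uses 4 bytes per character plus a 4-byte BOM.
utf32_size = encoded_size(payload, "utf-32")

print(f"UTF-8:  {utf8_size} bytes, over threshold: {exceeds_threshold(payload, 'utf-8')}")
print(f"UTF-32: {utf32_size} bytes, over threshold: {exceeds_threshold(payload, 'utf-32')}")
```

For ASCII-only text the UTF-32 size is four times the UTF-8 size plus the byte order mark, which is consistent with a 3.7MB response being reported as roughly 15MB. Real search results that contain non-ASCII characters would show a somewhat smaller ratio, since those characters take more than one byte in UTF-8.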
Ultimately, the request that sends the manifest to the workflow service uses the default encoding (UTF-8). Reference: src/osdu_api/clients/base_client.py · master · OSDU / OSDU Data Platform / System / SDKs / Python SDK (GitLab).
Is there a reason the manifest is being encoded as UTF-32 instead of UTF-8 when evaluating content size?

