CRS Catalog does not return all records, only a maximum of 1000
@fhoueto.amz @gehrmann @debasisc @josh.townsend @sigridmatthes @MonicaJohns
- discuss issue and agree option to fix
- fix/demonstrate coordinate-reference-system works
- fix/demonstrate coordinate-transformation works
- describe clearly in swagger and tutorial what these endpoints do (i.e., paginate to get all results; also check it clearly says that invalid data are excluded automatically)
- closeout
The CRS Catalog was specified with the intent of making life easier for developers who are not used to working with CoordinateReferenceSystem records. In typical use cases, an application fetches a list of all CRS names and then displays, sorts, or filters them so that a user can associate one with a project or dataset. The service is therefore specified to return all records by default, as described in swagger. However, this is not what happens. Consider the following queries as examples:
{{osduonaws_base_url}}/api/crs/catalog/v3/coordinate-reference-system
{
"codeSpace": "EPSG"
}
{{osduonaws_base_url}}/api/crs/catalog/v3/coordinate-reference-system
{
"limit": 99999
}
{{osduonaws_base_url}}/api/crs/catalog/v3/coordinate-reference-system
{
"returnBoundProjectedAndProjectedBasedOnWgs84": true
}
In all of these cases, "totalCount" at the end of the response body is greater than 1000, but the number of records actually returned is 1000. For example, with the standard OSDU-loaded CoordinateReferenceSystems, the API should return approximately 1203 records for the last query.
The reason for this issue is that the Search service query API returns at most 1000 records no matter what limit is specified, and the CRS Catalog wraps the Search query. The current query likely sets "limit": 99999.
The possible solutions to this problem are:
1. OSDU increases the maximum limit to 10000 or 100000. CRS data comprises a few thousand records, so removing the limit or raising it to 10000 would solve the issue. Please see the attached email for some notes on this (it would also help people who use ordinary search). Attachment: https___osdu-community_ideas_aha_io_ideas_IDEA-I-75.msg
2. Let the catalog/v3/coordinate-reference-system API make the first call with a query, as it does currently, but check the TotalCount returned (e.g., TotalCount=1203) against the number of records actually retrieved (e.g., CurrentCount=1000). As long as CurrentCount is less than TotalCount, keep fetching more data using the offset parameter. When done, return all records.
3. Similar to 2, but use query_by_cursor instead of query. I don't see any benefit or difference between this and the offset method described above.
To get this resolved, item 2 can clearly be done and controlled, and will work as a quick fix. In the bigger picture, however, a platform solution (1) appears to be better. A decision or advice is needed from the PMC/CSPs on which approach should be implemented.
In the meantime, an obvious workaround is for the caller to compare the number of retrieved records against the reported totalCount. However, the purpose of the CRS Service is that developers should not have to do this.
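As a rough illustration of that caller-side check, here is a minimal Python sketch. The helper name `is_truncated` and the `results` key for the record list are assumptions made for illustration; only `totalCount` is taken from the responses described above.

```python
def is_truncated(response: dict, records_key: str = "results") -> bool:
    """Return True when fewer records came back than totalCount reports.

    `records_key` is an assumed name for the list of records in the
    response body; adjust it to whatever the endpoint actually returns.
    """
    returned = len(response.get(records_key, []))
    # If totalCount is missing, treat the response as complete.
    return returned < response.get("totalCount", returned)


# A response shaped like the ones described above: 1000 records, totalCount 1203.
print(is_truncated({"results": [{}] * 1000, "totalCount": 1203}))
```

A caller that sees `True` here would need to request the remaining records itself, which is exactly the burden the CRS Service is meant to remove.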
To illustrate solution 2 above, I imagine the code currently does something like:
CRS_list = search/query { some_query }
return CRS_list
The bug fix would then, I imagine, keep fetching records until all are retrieved, like the following:
CRS_list = search/query { some_query }
MyCount = len(CRS_list) // number of records fetched so far
while MyCount < totalCount
// I assume offset is zero-based: the first call returns records 0 to 999, so offset=MyCount=1000 returns records 1000 to 1999 in the second call, etc. If it is one-based, adjust the offset by 1. Do not rely on the current max limit; simply count what gets returned against the total number of records.
NewRecords = search/query { some_query, offset=MyCount }
CRS_list += NewRecords // append the new records
MyCount += len(NewRecords)
endwhile
return CRS_list
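The loop above could look roughly like this in Python. This is only a sketch under stated assumptions, not the actual CRS Catalog code: `fake_search` is a hypothetical stand-in for search/query that caps each page at 1000 records, the `results` key is an assumed name for the record list, and the offset is assumed to be zero-based.

```python
from typing import Callable

PAGE_CAP = 1000  # assumed server-side cap, used only by the fake backend below


def fake_search(query: dict, _records=list(range(1203))) -> dict:
    """Hypothetical stand-in for search/query: honours offset, caps the page size."""
    offset = query.get("offset", 0)
    limit = min(query.get("limit", PAGE_CAP), PAGE_CAP)
    return {"results": _records[offset:offset + limit], "totalCount": len(_records)}


def fetch_all(search: Callable[[dict], dict], query: dict) -> list:
    """Keep requesting pages until totalCount records have been collected."""
    records: list = []
    while True:
        # Zero-based offset assumed: skip exactly what has been fetched so far.
        page = search({**query, "offset": len(records)})
        batch = page["results"]
        records.extend(batch)
        # Stop when everything is fetched, or when a page comes back empty
        # (guards against an endless loop if totalCount is overstated).
        if not batch or len(records) >= page["totalCount"]:
            return records


crs_list = fetch_all(fake_search, {"limit": 99999})
print(len(crs_list))
```

The loop counts what is actually returned rather than relying on the page size, so it keeps working if the platform limit is later raised or removed.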