Poor performance of schema info list endpoint and uniqueness check
The get schema info list request performs poorly, and so does the uniqueness check in the schema creation process. Example request:
curl --location --request GET 'localhost:8080/api/schema-service/v1/schema?authority=SchemaSanityTest' \
--header 'Data-Partition-Id: osdu'
This request accepts the offset and limit parameters. That is fine when the parameters are applied at the data access layer, but in the Schema service they are applied at the core level by design.
The same logic is also used during schema creation: the same methods verify schema uniqueness and check whether breaking changes are present.
This loads a lot of unwanted data. For example, the query shown above fetches over 6500 schema info records from the GCP dev environment, but by default the core service discards most of them and returns only 100 records in the response.
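To make the cost concrete, here is a minimal sketch of paging applied at the core level. It is not the Schema service code: `FakeSchemaStore`, `fetch_all`, and `list_schemas_core_paging` are hypothetical names, and the store is an in-memory stand-in that only counts how many records it had to read.

```python
from dataclasses import dataclass


@dataclass
class FakeSchemaStore:
    """Hypothetical stand-in for the provider (data access) layer."""
    total: int
    records_read: int = 0

    def fetch_all(self, authority: str) -> list[str]:
        # The store knows nothing about paging, so every matching record is read.
        self.records_read += self.total
        return [f"{authority}:schema-{i}" for i in range(self.total)]


def list_schemas_core_paging(store, authority, offset=0, limit=100):
    # Paging applied in the core layer: load everything, then slice.
    everything = store.fetch_all(authority)
    return everything[offset:offset + limit]


store = FakeSchemaStore(total=6500)
page = list_schemas_core_paging(store, "SchemaSanityTest")
print(len(page), store.records_read)  # 100 6500
```

With the numbers from the example, 100 records are returned while 6500 are read, and the cost grows with the number of stored schemas rather than with the page size.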
Issues were previously spotted in the GCP and Azure environments. To fix GCP we manually delete the schemas created by integration tests (ITs); my guess is that the Azure team does the same.
The suggested fix is to pass the limit and offset parameters down to the provider level and use them directly there.
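A rough sketch of what passing the parameters down could look like, under stated assumptions: an in-memory SQLite table stands in for the real provider storage, and the table name `schema_info`, its columns, and the function names are hypothetical. The point is only that limit and offset become part of the query, so the store returns a single page.

```python
import sqlite3


def make_store(total: int) -> sqlite3.Connection:
    # Hypothetical stand-in for a provider's storage backend.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE schema_info (id TEXT PRIMARY KEY, authority TEXT)")
    conn.executemany(
        "INSERT INTO schema_info VALUES (?, ?)",
        (
            (f"SchemaSanityTest:schema-{i:05d}", "SchemaSanityTest")
            for i in range(total)
        ),
    )
    return conn


def list_schemas_provider_paging(conn, authority, offset=0, limit=100):
    # limit and offset go straight into the query, so only the
    # requested page leaves the store.
    rows = conn.execute(
        "SELECT id FROM schema_info WHERE authority = ? "
        "ORDER BY id LIMIT ? OFFSET ?",
        (authority, limit, offset),
    ).fetchall()
    return [row[0] for row in rows]


conn = make_store(6500)
page = list_schemas_provider_paging(conn, "SchemaSanityTest", offset=200, limit=100)
print(len(page), page[0])  # 100 SchemaSanityTest:schema-00200
```

Each provider would need its own equivalent of the LIMIT/OFFSET clause, and a stable sort order is assumed here so that consecutive pages do not overlap.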