
Performance work

Paal Kvamme requested to merge kvamme62/small-cache into master

Add a 256 KB cache to speed up opening a file on the cloud.

ZGY files have multiple consecutive headers that are read one at a time. The reason for this is that the size of each header might not be known until the previous header has been read. On the cloud this is really inefficient, because a read of less than 2 MB usually has about the same round-trip time whether it fetches 8 bytes or 2 MB.

To combat this, implement a very simple, short-lived cache used only while opening a ZGY file. There are no issues with stale data or eviction since the cache is only active during that one function call.
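For illustration only, here is a minimal sketch of the idea in C++. The class name, the `ReadFn` callback, and the hard-coded 256 KB constant are assumptions made up for this sketch, not the actual OpenZGY interfaces: on the first read the wrapper prefetches up to 256 KB, and subsequent header reads that fall inside that window are served from memory with no additional round trips.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical short-lived cache, alive only for the duration of the
// "open file" call. No eviction or invalidation is needed because the
// object is thrown away as soon as the headers have been parsed.
class OpenCache
{
public:
    // "backend" reads "size" bytes at "offset" from the underlying file.
    using ReadFn = std::function<void(std::int64_t offset, void* buf, std::int64_t size)>;

    OpenCache(ReadFn backend, std::int64_t filesize)
        : backend_(std::move(backend)), filesize_(filesize), start_(0) {}

    void read(std::int64_t offset, void* buf, std::int64_t size)
    {
        if (!contains(offset, size)) {
            // Cache miss: prefetch up to 256 KB starting at this offset,
            // clipped to the end of the file, so that the following
            // header reads usually need no further round trips.
            const std::int64_t chunk = 256 * 1024;
            const std::int64_t want =
                std::max(size, std::min(chunk, filesize_ - offset));
            start_ = offset;
            data_.resize(static_cast<std::size_t>(want));
            backend_(start_, data_.data(), want);
        }
        std::memcpy(buf, data_.data() + (offset - start_),
                    static_cast<std::size_t>(size));
    }

private:
    bool contains(std::int64_t offset, std::int64_t size) const
    {
        return !data_.empty()
            && offset >= start_
            && offset + size <= start_ + static_cast<std::int64_t>(data_.size());
    }

    ReadFn backend_;
    std::int64_t filesize_;
    std::int64_t start_;
    std::vector<char> data_;
};
```

In a real implementation the wrapper would simply go out of scope when the open call returns, which is why no staleness handling is needed.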

This change means that in almost all cases all headers will be read at once. For reading an on-prem file this makes no practical difference. For cloud access it saves time.

To keep the implementation simple the cache will not cover all possible cases.

  • If the file is huge, all the headers might not fit in 256 KB. In that case there will be multiple read requests, but still fewer than would be needed if there were no cache.

  • 256 KB was chosen because OpenZGY will pad the header area to a multiple of the brick size, which for older ZGY files can never be less than 64×64×64 samples times one byte per sample, i.e. 256 KB (see the arithmetic after this list). When written to the cloud the header area can be written in a separate segment. This means that requesting e.g. 1 MB for headers may, in the low level code, need to be split into two requests. This isn't an error, it is just inefficient. Files written by OpenZGY may have smaller brick sizes, so this scenario is still possible. But it is very unlikely, because smaller brick sizes will likely cause a severe performance issue when reading bulk data. In that case you have much bigger problems than a single extra read on open. Could I have solved the issue by querying the size of the first segment? Yes, but... keep it simple.
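As a back-of-the-envelope check on the 256 KB figure (illustrative arithmetic only, not code from this change):

```cpp
#include <cstdint>
#include <cstdio>

int main()
{
    // Older ZGY files never use bricks smaller than 64 x 64 x 64 samples,
    // and the narrowest sample type is one byte per sample, so the unit
    // that the header area is padded to is at least:
    const std::int64_t min_bricksize = 64LL * 64 * 64 * 1;  // 262144 bytes
    std::printf("%lld bytes = %lld KB\n",
                static_cast<long long>(min_bricksize),
                static_cast<long long>(min_bricksize / 1024));  // prints 256 KB
    return 0;
}
```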
