ZGY Compression plug-in

Abstract

There was recently a question about using more modern compression algorithms in ZGY, using some kind of plug-in mechanism. Here is an explanation of how this might be implemented.

The request was, "just plug in the new compression code and see if it works".

It isn't quite that simple, but it is definitely doable.

Details

You can stop reading now, unless you need to know why it is a bit more work.

The explanation below has also been added to the main ZGY documentation, since it also describes the existing situation.

Understand that regardless of method, this will require changes to the ZGY file format. If nothing else it will require a field telling which compression algorithm to use. As for the ZGY library, the plug-in mechanism needs to be added.
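As a rough illustration of the two changes named above (a field in the file telling which algorithm was used, and a plug-in lookup in the library), a minimal Python sketch might look like the following. Everything in it is an assumption made for illustration: the names Compressor, register_compressor and lookup_compressor, the numeric ids, and the use of zlib as a stand-in for a more modern algorithm. None of it is taken from the ZGY format or library.

```python
import zlib
from typing import Callable, Dict, NamedTuple


class Compressor(NamedTuple):
    """Hypothetical plug-in record: a name plus a compress/decompress pair."""
    name: str
    compress: Callable[[bytes], bytes]
    decompress: Callable[[bytes], bytes]


# Hypothetical table mapping the algorithm id stored in the file to a plug-in.
_REGISTRY: Dict[int, Compressor] = {}


def register_compressor(algorithm_id: int, plugin: Compressor) -> None:
    """Make a plug-in available under the id that would be written to the file."""
    if algorithm_id in _REGISTRY:
        raise ValueError(f"compression algorithm id {algorithm_id} is already taken")
    _REGISTRY[algorithm_id] = plugin


def lookup_compressor(algorithm_id: int) -> Compressor:
    """Resolve the id read from a file; fail clearly if no plug-in is loaded."""
    try:
        return _REGISTRY[algorithm_id]
    except KeyError:
        raise ValueError(f"no plug-in for compression algorithm id {algorithm_id}") from None


# Id 0 (made up): stub plug-in that stores the data as-is (the uncompressed case).
register_compressor(0, Compressor("none", bytes, bytes))
# Id 1 (made up): zlib, used here only as a stand-in for a newer algorithm.
register_compressor(1, Compressor("zlib", zlib.compress, zlib.decompress))
```

A reader would then take the id from the file and call lookup_compressor(id).decompress(...) on the stored data. A file written with an algorithm the reader does not have fails with a clear error instead of returning garbage.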

What I propose is to modify the "Uncompressed ZGY" format, allowing individual bricks to be compressed instead of stored directly. We might also allow lossless compression of the headers. I don't think this will be a lot of work.

The existing compression algorithm can (I believe) be extracted from the "Compressed ZGY" implementation and made into a plug-in. The legacy "Compressed ZGY" format would be supported only for read, and only in the closed-source accessor. Applications wanting to write compressed files directly can choose "Old Perel Compression" which is more or less the same thing but won't be 100% compatible.

If the above doesn't work, we can keep two distinct formats: a) Uncompressed or compressed via modern plug-in, and b) Compressed in the old fashion. This will be more costly to maintain though.

As far as OpenZGY is concerned, the new plug-in mechanism should definitely be included. We can choose whether to include the old compression algorithm as open source, closed source, or not at all. (Side note: It seems like OpenZGY might only be released to specific clients so the name might be misleading. Maybe just call it NewZGY. Or, worst case, TrashCanZGY).

Here is the reason why the more intuitive approach (modify the existing "Compressed ZGY" format) won't work.

In Compressed ZGY, the separation between compression logic and general file access is not very clean. Details about our particular compression algorithms are visible in places they shouldn't be. The Compressed ZGY format was developed first. When we needed an uncompressed format, it was not practical to make the existing Compressed ZGY format also support uncompressed data, because of the less-than-clean separation. So we ended up with a completely new format for uncompressed data. Users of the API won't notice, but there is a switch at a very high level choosing which of the formats to handle. The two formats share some code, but from an architecture point of view we have two separate formats with a common API.

The reasons why the "Compressed ZGY" file format is a dead end are similar to the reasons why it couldn't just be adapted to handle uncompressed data.

Important assumption: The pluggable compression algorithm must be able to take a single block of bulk data and compress or decompress it without knowing what the rest of the file contains. For example, a Huffman coder cannot require frequency information from the entire survey. If this assumption fails, we need a very different approach.
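The assumption can be stated as a property of the plug-in's interface: compress and decompress each take one block and nothing else. The sketch below shows that property, with zlib standing in for the real algorithm. The function names and the tiny made-up bricks are hypothetical and chosen only for illustration.

```python
import struct
import zlib


def compress_brick(raw: bytes) -> bytes:
    """Compress one brick using nothing but that brick's own bytes."""
    return zlib.compress(raw)


def decompress_brick(stored: bytes) -> bytes:
    """Recover one brick without consulting any other brick or file-wide table."""
    return zlib.decompress(stored)


# Two made-up bricks of 64 float32 samples each, kept tiny for illustration.
brick_a = struct.pack("<64f", *([0.0] * 64))
brick_b = struct.pack("<64f", *(float(i % 8) for i in range(64)))

# Each brick is compressed on its own, so bricks can be written in any order.
stored = {"a": compress_brick(brick_a), "b": compress_brick(brick_b)}

# Any single brick can be read back without touching the others.
assert decompress_brick(stored["b"]) == brick_b
```

zlib satisfies the assumption even though DEFLATE uses Huffman coding, because it derives its code tables from each input on its own. An algorithm that needed statistics gathered over the whole survey would not fit this interface.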

Tasks:

Figure 1: Existing situation

[Diagram not reproduced. Title: Adding pluggable compression to ZGY. Labels: ZGY-Public API; ZGY-Private API; Format switch (not pluggable); Synthetic; Uncompressed; Compressed; Accessor implementation; Accessor implementation tainted with compression knowledge; Compression algorithm; Back end switch (runtime plug-in); Back end switch (compile time plug-in); On-prem file; Cloud Cache; GCS; Seismic Store.]

Figure 2: New design

[Diagram not reproduced. Title: Adding pluggable compression to ZGY. Labels: ZGY-Public API; ZGY-Private API; Format switch (not pluggable); Synthetic; Uncompressed (or maybe not); Compressed (legacy, R/O); Accessor implementation; Accessor implementation tainted with compression knowledge; Compression algorithm; Compression plug-in (new); Stub plug-in (uncompressed); Old compression algorithm; New algorithms; Back end switch (runtime plug-in); Back end switch (compile time plug-in); On-prem file; Cloud Cache; GCS; Seismic Store.]