This is version 0.4 of the document, last updated 2021-07-23.

Copyright

Copyright 2017-2021, Schlumberger

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Intended Audience

This document is written for developers who need to understand some of the finer points of ZGY. It contains notes on several different topics. This is not the place to start if you just want an introduction. The document should be considered an extension of the comments in the source code. If you are not comfortable reading Python and C++ source code you are of course still welcome to continue reading, but you will likely find the content boring and pedantic.

OpenZGY implementation notes

Terminology

ZGY-Public, ZGY-Cloud and ZGY-Internal refer to the old closed-source ZGY library. Having three different APIs is not a good idea; this is part of the technical debt that I am trying to remove.

ZGY-Public
is the old API we allow external clients to use. Also used internally by e.g. the ML project. Available in C++ and as a Python wrapper. This API has no knowledge of seismic store whatsoever. The ZGY-Public module was available in binary form in the "zgy" folder but has been logically deleted. If you need it then do a "git checkout" of an older revision.
ZGY-Internal
is a richer API with some difficult, deprecated, and dangerous entry points. C++ only. Used by Petrel.
ZGY-Cloud
is a plug-in that, just by being linked into the application, makes it possible to use the ZGY-Public or ZGY-Internal API to access files on the cloud. In theory this is a near perfect separation of concerns. In practice the separation is too good: application code does need to be aware of the cloud at some level. So the ZGY-Cloud module, in addition to being a plug-in, comes with its own API. This is where the problems start; working around that "perfect isolation".
OpenZGY
will replace all three of the above.

Differences in the OpenZGY implementation compared to ZGY-Public:

Some of the limitations listed here are already enforced by the existing ZGY-Public API, so they only affect Petrel, which uses the ZGY-Internal API instead.

How the old accessor deals with default values:

Regarding alpha tiles:

Regarding writes:

Resizing a survey:

Multiple passes:

Challenges for writing files with lossy compression:

Explaining multi resolution a.k.a. LOD data:

Details of the seismic store access:

Zero centric coding range:

If floating point bulk data can contain both positive and negative values we must assume that zero has a special importance. This is definitely the case for seismic data. A float zero converted to int and back to float must remain precisely zero to avoid introducing a bias.

The linear transform between actual sample values and the integers in storage is specified as the "coding range" which is often just set to the min and max sample values in the entire survey. In most cases this will not end up zero centric by itself. So the coding range may need to be adjusted slightly.

Example: Consider the case where samples in [-1.0, +3.0) are stored as 200, samples in [+3.0, +7.0) as 201, et cetera.

       199     200     201     202     (int8)
    +-------+-|-----+-------+-------+
    -5     -1 0    +3      +7      +11 (float)
    

So, a floating point value of zero is stored as (int8)200. When read back the (int8)200 is known to correspond to something between -1 and +3. So it is assumed to be the average of those limits, i.e. (float)+1. Not what we want to see.
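The bias above, and the fix of nudging the coding range so that zero lands exactly on an integer code, can be sketched as follows. This is a minimal illustration; the helper names and the exact rounding rule are assumptions, not the actual OpenZGY code.

```python
import numpy as np

# Minimal sketch of a linear 8-bit coding, similar in spirit to the ZGY
# coding range. Helper names are illustrative, not the OpenZGY API.
NCODES = 256  # an 8-bit sample has 256 possible codes

def encode(value, lo, hi):
    slope = (hi - lo) / (NCODES - 1)
    return int(np.round((value - lo) / slope))  # code in 0..255

def decode(code, lo, hi):
    slope = (hi - lo) / (NCODES - 1)
    return lo + code * slope

# A coding range that is not zero centric introduces a bias at zero:
lo, hi = -1.0, 11.0
biased = decode(encode(0.0, lo, hi), lo, hi)    # close to zero, but not zero

# Nudge the range slightly so zero lands exactly on a code: move "lo" to
# the nearest integer multiple of the slope, keeping the slope unchanged.
slope = (hi - lo) / (NCODES - 1)
lo2 = np.round(lo / slope) * slope
hi2 = lo2 + slope * (NCODES - 1)
exact = decode(encode(0.0, lo2, hi2), lo2, hi2)  # exactly 0.0

print(biased, exact)
```

The adjusted range is only a fraction of one code step away from the requested one, so the cost of making the coding zero centric is negligible compared to the bias it removes.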

OpenZGY compression

Compression in OpenZGY is not the same as the old compressed ZGY format. The old implementation treated compressed ZGY as a completely different format that just happened to have the same API as uncompressed. In OpenZGY there is just one file format. Initially identical to uncompressed ZGY, but with a few small changes to allow individual bricks to be compressed.

Historical note: The reason this was not done before is that the compressed ZGY format is much older than the uncompressed version and was already well established when the uncompressed format was introduced. Extending the old compressed format to also allow bricks to be stored uncompressed was not feasible. Taking the time to make the newer ZGY format also support compression was not felt to be urgent.

OpenZGY does not compress headers, only data blocks. Header compression might be added in the future. But that compression is unlikely to make the file noticeably smaller. Just more difficult to parse.

OpenZGY compression is handled by a plug-in to make it fairly simple to add more algorithms in the future. It is highly recommended that any new algorithm be included in the OpenZGY source tree. This ensures that OpenZGY always knows how to decode each brick. If this is not possible then OpenZGY files written with an unrecognized compressor are essentially a proprietary format.

OpenZGY makes a few assumptions about the compressor:

Comparing different compression algorithms

We would like to know the relationship between SNR (quality, as described by signal to noise ratio) and compression (compressed size as a percentage of the size in uncompressed float32). An algorithm that achieves better compression for the same SNR is probably better. But this is tricky to measure.

There is no easy way of calculating the quality. What really matters is whether the noise is low enough not to affect the workflows the data is needed for, e.g. autotracking. Expressing this as a formula is not possible. Even the term signal-to-noise ratio is ambiguous; the choice of how the SNR is computed might affect which algorithm appears to be best.
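To illustrate the ambiguity, here are two equally defensible SNR definitions applied to the same signal and residual. Both formulas are hypothetical illustrations, not OpenZGY's actual quality metric, and they report noticeably different numbers.

```python
import numpy as np

# Two plausible ways to compute an SNR in dB from the same residual.
def snr_rms(signal, residual):
    # Signal strength measured as the RMS of the samples.
    return 20 * np.log10(np.sqrt(np.mean(signal ** 2)) /
                         np.sqrt(np.mean(residual ** 2)))

def snr_peak(signal, residual):
    # Signal strength measured as the peak amplitude instead.
    return 20 * np.log10(np.max(np.abs(signal)) /
                         np.sqrt(np.mean(residual ** 2)))

rng = np.random.default_rng(0)
trace = np.sin(np.linspace(0, 20 * np.pi, 1000)) * np.hanning(1000)
residual = rng.normal(0.0, 0.01, 1000)   # stand-in for compression noise

print(round(snr_rms(trace, residual), 1), "dB vs",
      round(snr_peak(trace, residual), 1), "dB")
```

The peak-based figure is always at least as high as the RMS-based one, so a vendor quoting peak SNR will look better than one quoting RMS SNR on identical data.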

Compression results are also strongly affected by the input data. Data with fewer high frequency components generally compress better. Data that was compressed in an unfortunate manner might compress worse.

Example: You have a float cube that, unbeknownst to you, is a straight copy of an int16 cube. Even the most basic lossless compressor ought to achieve at least 2x compression, bringing the file size back to what it was when stored as 16 bits. A fancier algorithm might not manage this since it might be optimized for more general cases. This makes it difficult to find a "fair" test data set when comparing algorithms.
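The 2x bound in this example can be demonstrated directly: every int16 value is exactly representable in float32 (24 mantissa bits), so the float data round-trips losslessly through int16 and an ideal lossless compressor therefore needs at most half of the float32 bytes. This is a sketch of the principle, not a benchmark of any particular compressor.

```python
import numpy as np

# A float32 cube that is secretly a verbatim copy of int16 data.
original = np.arange(-32768, 32768, dtype=np.int16)  # every int16 value
as_float = original.astype(np.float32)               # 4 bytes per sample
back = as_float.astype(np.int16)                     # 2 bytes per sample

# The round trip is exact, so storing 16 bits per sample loses nothing;
# the float32 representation simply wastes half the space.
print(np.array_equal(original, back), as_float.nbytes // back.nbytes)
```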

Data ranges

ZGY uncompressed v3 files have a statistical range, coding range, and histogram range.

The statistical range works as follows:

The coding range works as follows:

The histogram range works as follows:

Read only ZGY files

Seismic Store has the ability to set a file to read-only mode. This may help performance because less locking is needed and more caching is possible.

The initial plan was to treat all ZGY files on the cloud as immutable. But requirements have changed. OpenZGY now allows updating an existing file in some situations. The requirements / use cases that need to be supported are still not clearly defined. So I will try to define them here and see if anybody protests.

Current implementation

Three additional settings have been added to the IOContext.

setRoAfterWrite(bool)

Set the ZGY file to read-only when done writing it. Has no effect on files opened for read. Defaults to on. Most applications will want this on because most applications do not expect to update ZGY files in place.

forceRoBeforeRead(bool)

Sneak past the mandatory locking in SDAPI by forcing the read-only flag to true on the ZGY file, if needed, on each open for read. This allows for less overhead, more caching, and use of the altUrl feature. This option is useful if the file is logically immutable but was not flagged as such. E.g. the creator forgot to call setRoAfterWrite(true), or the ZGY file was not created by OpenZGY. The option has no effect on files opened for create or update. Caveat: Allowing a read-only open to have a permanent effect on the file being opened is not ideal.

forceRwBeforeWrite(bool)

Dangerous option. Sneak past the mandatory locking in SDAPI by forcing the read-only flag to false on a ZGY file, if needed, that is about to be opened for update. The application must know that the file is not open for read by anybody else. There is also a risk that data might exist in a cache even for a closed file. The application assumes all responsibility.
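The combined behavior of the three settings can be summarized with a toy model. The flag names mirror this document, but the classes and functions below are illustrative only; they are not the real OpenZGY IOContext or the SDAPI interfaces.

```python
# Toy model of the three IOContext settings described above.
class Dataset:
    def __init__(self):
        self.readonly = False  # SDAPI-style read-only flag on the file

def close_after_write(ds, ro_after_write=True):
    # setRoAfterWrite: flag the file read-only once writing is done,
    # so later reads need less locking and can cache more.
    if ro_after_write:
        ds.readonly = True

def open_for_read(ds, force_ro_before_read=False):
    # forceRoBeforeRead: flip the flag on open for read if needed.
    # Caveat: this permanently changes the file being opened.
    if force_ro_before_read and not ds.readonly:
        ds.readonly = True

def open_for_update(ds, force_rw_before_write=False):
    # forceRwBeforeWrite: dangerous; the caller asserts that nobody
    # else has the file open for read.
    if ds.readonly:
        if not force_rw_before_write:
            raise PermissionError("file is flagged read-only")
        ds.readonly = False

ds = Dataset()
close_after_write(ds)                   # default: file ends up read-only
try:
    open_for_update(ds)                 # refused without the override
except PermissionError as err:
    print("refused:", err)
open_for_update(ds, force_rw_before_write=True)
print("readonly after forced update open:", ds.readonly)
```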

Files created by the old ZGY-Cloud library will still be left writable. This means that altUrl will not work for those, unless forceRoBeforeRead is in effect. Hopefully applications will move away from the deprecated ZGY-Cloud fast enough that this will not become a problem.