Commit 082b183e authored by Paal Kvamme

Update documentation.

parent d0f3adf8
@@ -234,8 +234,8 @@ tests to write to.
|Package |linux|windows|read|write|update|seisstore|zfp compress|old compress|
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
-|OpenZGY/C++ |y|y|y|y|-|y|y|N/A|
-|OpenZGY/C++ Python wrapper |y|-|y|y|-|y|y|N/A|
+|OpenZGY/C++ |y|y|y|y|y|y|y|N/A|
+|OpenZGY/C++ Python wrapper |y|-|y|y|y|y|y|N/A|
|OpenZGY/Python |y|y|y|y|N/A?|linux|y|N/A|
|ZGY-Public, ZGY-Cloud |y|y|y|y|y|y|N/A|y|
|Old Python wrapper |y|y|y|y|y|y|N/A|y|
@@ -396,254 +396,211 @@ Other issues: More work for apps.
C++.
</p>
<h2>Support for ZGY file partial update</h2>
<p>
We need to understand how OpenZGY clients plan to use the
library before deciding how flexible we need to be when it
comes to updating existing files, or whether to support
updates at all. If updates are supported, the next question
is how good we need to be at incrementally updating low
resolution bricks.
</p>
<p>
Applications wanting update support for a file stored on the
cloud should normally be able to delete and re-create the file
instead, unless the file is being used as some kind of working
storage that is updated frequently. If the file is updated
<i>really</i> frequently it might even be necessary to defer
updating low resolution data until it is actually accessed.
</p>
<p>
Switching to cloud-enabled ZGY access <b>cannot be made
completely transparent</b> to application code. Especially not
if updating parts of a file. The main problem is that neither
GCS buckets nor Azure storage allow updates. We can allow
updates in the OpenZGY API but each time the ZGY file is
updated the file size on the cloud will grow.
Another can of worms, outside the scope of this document,
is the need for <b>read and write locks</b> for files on the
cloud if ZGY files cannot be treated as immutable.
</p>
<p>
For similar reasons OpenZGY will also leak disk space if a
compressed file is allowed to be opened for update. For
compressed data this happens both in cloud storage and
on-prem.
</p>
<p>
Update support has no obvious issues for on-prem. But do bear
in mind one of the early design decisions: Avoid spending time
implementing something that will only work on-prem, because
soon everybody will be 100% on the cloud anyway.
</p>
<p>
Supporting any kind of partial update means that OpenZGY will
either need to re-generate all the statistics, histogram, and
low resolution bricks, or it will need to do some kind of
incremental update. This is the topic of the explanation that
follows.
</p>
<table border="1" style="border-collapse: collapse">
<tr>
<td style="padding-top: 20px; padding-bottom: 20px">
<img src="update-lowres-fig2-1.png"
width="257" height="172" alt="Alternative 1"/>
</td>
<td>
<dl>
<dt style="padding-bottom: 20px; padding-left: 10px">
All low resolution bricks get updated.
All full resolution data needs to be read.
</dt>
<dd>
This is the existing solution. It assumes that the
application will not update the file. Or that updating
a file will usually change a large part of it. ISSUE:
Very slow if the application wants to update a tiny
part of a very large file. Also leaks more disk space
than the alternatives, although that can be fixed
relatively easily.
</dd>
</dl>
</td>
<td>
Tasks:
<ul>
<li>
Per brick, skip write if contents not changed.
</li>
</ul>
</td>
</tr>
<tr>
<td style="padding-top: 20px; padding-bottom: 20px">
<img src="update-lowres-fig2-2.png"
width="257" height="172" alt="Alternative 2"/>
</td>
<td>
<dl>
<dt style="padding: 20px">
Affected low resolution bricks get updated. Always
re-calculate whole low resolution bricks, which means
that some unchanged data needs to be read, usually
from different LOD levels.
</dt>
<dd>
Keep today's plan C, which recursively calculates the
low resolution bricks top down. Short-cut the algorithm
when all inputs are known to be unchanged. ISSUE:
the simplest implementation would do this per GenLod
buffer, which is larger than one brick.
ISSUE: Not usable for compressed files due to
accumulation of noise. Except possibly if LOD
levels 2 and above are stored uncompressed.
</dd>
</dl>
</td>
<td>
Tasks:
<ul>
<li>
Keep track of dirty bricks.
</li>
<li>
Implement the shortcut.
</li>
<li>
Shortcut per brick.
</li>
</ul>
</td>
</tr>
<tr>
<td style="padding-top: 20px; padding-bottom: 20px">
<img src="update-lowres-fig2-3.png"
width="257" height="172" alt="Alternative 3"/>
</td>
<td>
<dl>
<dt style="padding: 20px">
Only the affected low resolution samples get updated.
A read/modify/write is often needed when updating a
low resolution brick. Sometimes even twice. Old data
is still being read back but this time the read back is
from the bricks about to be updated. There will be less
read back in most cases; the figure is misleading
because it shows a 2d view. The main problem is the
bricks that need more than one read/modify/write.
</dt>
<dd>
Use a completely different algorithm (plan B) for
updates, and we might as well use it for all writes
to uncompressed files. Calculate bottom-up.
If we ever allow updating of compressed files
(unlikely) then this alternative will not work.
ISSUE: More code to write, maintain, and test.
ISSUE: Not usable for compressed files due to
accumulation of noise. Except possibly if LOD
levels 2 and above are stored uncompressed.
</dd>
</dl>
</td>
<td>
Tasks:
<ul>
<li>
Keep track of dirty bricks.
</li>
<li>
Implement plan&nbsp;B.
</li>
</ul>
</td>
</tr>
<tr>
<td style="padding-top: 20px; padding-bottom: 20px">
<img src="update-lowres-fig2-4.png"
width="257" height="171" alt="Alternative 4"/>
</td>
<td>
<dl>
<dt style="padding: 20px">
As alternative 3, but somehow detect which bricks
receive multiple updates and for those somehow
choose a different algorithm that avoids the problem.
</dt>
<dd>
See alternatives 2 and 3.
</dd>
</dl>
</td>
<td>
Tasks:
<ul>
<li>
Keep track of dirty bricks.
</li>
<li>
Implement plan&nbsp;B.
</li>
<li>
Implement additional logic.
</li>
</ul>
</td>
</tr>
<tr>
<td style="padding-top: 20px; padding-bottom: 20px">
<img src="update-lowres-fig2-5.png"
width="215" height="152" alt="Alternative 5"/>
</td>
<td>
<dl>
<dt style="padding: 20px">
Introduce a new &quot;Virtual ZGY&quot; file format to
be used for very large files. This would basically be
a list of regular ZGY files to read the real data
from. OpenZGY would hide that detail from the API. The
individual real ZGY files would be assumed to be small
enough to be deleted and re-created if any part of them
is touched. Incidentally this feature might also
facilitate allowing multiple processes to write to
the (logically) same ZGY file.
</dt>
<dd>
Minor note: If it is desirable to still provide all LOD
levels up to the one where only one brick is required
then the single topmost LOD in each partial file should
be stored uncompressed and should be duplicated in a
single &quot;low-resolution&quot; partial file
holding all LOD levels that are built from data in
multiple files.
</dd>
</dl>
</td>
<td>
Tasks:
<ul>
<li>
Too many steps to list here.
</li>
</ul>
</td>
</tr>
</table>
<p>
Additional issue for all alternatives except the first:
the code must handle incremental update of statistics and
histogram by reading the area to be overwritten and
subtracting its contribution before adding the new data.
There are issues with numerical accuracy, and with what
happens if the new value range is larger than the old so
that the histogram has to be resized.
</p>
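<p>
  The subtract-then-add bookkeeping could look roughly like the
  minimal sketch below. This is illustration only, not the actual
  OpenZGY code; the names are made up and the histogram range is
  assumed to be fixed, which is exactly the resizing issue
  mentioned above.
</p>
<pre>
#include &lt;algorithm>
#include &lt;cstdint>
#include &lt;vector>

// Running statistics plus a fixed-range histogram. Illustration only.
struct RunningStats {
  std::int64_t cnt = 0;
  double sum = 0, ssq = 0;   // sum and sum-of-squares of all samples
  double lo = -1, hi = +1;   // histogram range, assumed fixed here
  std::vector&lt;std::int64_t> bins = std::vector&lt;std::int64_t>(256, 0);

  std::size_t binof(double v) const {
    double f = (v - lo) / (hi - lo) * (bins.size() - 1);
    return f &lt;= 0 ? 0 : std::min((std::size_t)f, bins.size() - 1);
  }

  // sign = +1 adds a sample, sign = -1 removes its old contribution.
  // The repeated subtract and add of doubles is where the numerical
  // accuracy concerns come from.
  void account(double v, int sign) {
    cnt += sign;
    sum += sign * v;
    ssq += sign * v * v;
    bins[binof(v)] += sign;
  }
};

// Overwrite a region: remove the old samples, then add the new ones.
void updateRegion(RunningStats&amp; s, const std::vector&lt;double>&amp; oldData,
                  const std::vector&lt;double>&amp; newData) {
  for (double v : oldData) s.account(v, -1);
  for (double v : newData) s.account(v, +1);
}
</pre>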
<p>
Additional issue for alternative 1: Updating data on the cloud
actually ends up appending data, wasting the space taken by
the old contents. This is how the cloud buckets work and it is
not feasible to work around. So updates need to be kept to a
minimum. The algorithm in the first alternative can end up
overwriting some low resolution bricks with the exact same
content that they already have, so additional code is needed
to prevent that.
</p>
<p>
OpenZGY allows an already written file to be opened for update
and have data appended or (uncompressed only) updated. This
raises the question of when and how low resolution data is
updated.
</p>
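<p>
  For illustration, updating an existing file through the OpenZGY
  C++ API might look like the sketch below. The entry point
  (IZgyWriter::reopen), the builder methods and the finalize()
  defaults are assumptions that should be checked against the
  current api.h; treat this as a sketch, not a reference.
</p>
<pre>
#include &lt;memory>
#include &lt;vector>
#include "openzgy/api.h"   // header path is an assumption

void updateExistingFile()
{
  using namespace OpenZGY;
  // Open an already written file for update instead of creating it.
  std::shared_ptr&lt;IZgyWriter> writer =
    IZgyWriter::reopen(ZgyWriterArgs().filename("existing.zgy"));
  // Overwrite a single brick; allowed for uncompressed files only.
  std::vector&lt;float> brick(64*64*64, 0.0f);
  writer->write(size3i_t{0,0,0}, size3i_t{64,64,64}, brick.data());
  // Rebuild statistics, histogram and low resolution data, then close.
  writer->finalize();   // default arguments assumed
  writer->close();
}
</pre>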
<p>
Applications wanting update support for a file stored on the
cloud should consider very carefully whether they want to use
the partial update feature. See the cloud-specific caveats
discussed below. Often it works better to just delete and
re-create the file instead, unless the file is being used as
some kind of working storage that is updated frequently. If
the file is updated <i>really</i> frequently it might also be
necessary to defer updating low resolution data until it is
actually accessed.
</p>
<p>
When writing to the cloud there are several caveats. Switching
to cloud-enabled ZGY access <b>cannot be made completely
transparent</b> to application code. Especially not if updating
parts of a file. The main problem is that neither GCS buckets
nor Azure storage allow updates. We can allow updates in the
OpenZGY API but each time the ZGY file is updated the file size
on the cloud will grow. Another can of worms, outside the scope
of this document, is the need for <b>read and write locks</b>
for files on the cloud if ZGY files cannot be treated as
immutable.
</p>
<p>
For similar reasons OpenZGY will also leak disk space if a
compressed file is allowed to be opened for update. For
compressed data this happens both in cloud storage and on-prem.
</p>
<h2>Support for incremental build of low resolution data</h2>
<p>
This only applies to files where at least the low resolution
bricks are stored uncompressed. Otherwise the operation would
keep accumulating compression noise.
</p>
<p>
If a file to be updated has a valid set of low resolution
bricks, the OpenZGY library will start tracking changes to the
histogram and statistics in memory, and will maintain a list of
which bricks have been touched. This means that after updating
a file it may be possible to re-calculate low resolution data
only in the areas that have changed. Currently the default is
still to run a complete rebuild on each close. Applications can
request the incremental option in the finalize() call. The
change tracking is not foolproof. Numerical inaccuracy might
affect the histogram. The histogram range cannot grow and the
statistical range cannot shrink.
</p>
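<p>
  The bookkeeping can be pictured as in the sketch below. The real
  implementation keeps its own data structures; the names and the
  granularity (one flag per brick-column) are illustrative only.
</p>
<pre>
#include &lt;cstdint>
#include &lt;set>
#include &lt;utility>

// Track which brick-columns have been written since the last
// finalize. Illustration only.
class DirtyTracker {
  std::set&lt;std::pair&lt;std::int64_t, std::int64_t>> dirty_;  // (i, j)
public:
  // Called from write(): mark every brick-column the region overlaps.
  void noteWrite(std::int64_t i0, std::int64_t ni,
                 std::int64_t j0, std::int64_t nj,
                 std::int64_t bricksize) {
    for (std::int64_t i = i0 / bricksize; i &lt;= (i0+ni-1) / bricksize; ++i)
      for (std::int64_t j = j0 / bricksize; j &lt;= (j0+nj-1) / bricksize; ++j)
        dirty_.insert(std::make_pair(i, j));
  }
  // Consulted by finalize(): unchanged columns can be skipped.
  bool isDirty(std::int64_t i, std::int64_t j) const {
    return dirty_.count(std::make_pair(i, j)) != 0;
  }
  void clear() { dirty_.clear(); }
};
</pre>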
<p>
The incremental algorithm is reasonably efficient. Whenever a
brick-column is processed and the flags say it is unchanged,
instead of calculating the contents recursively it will just
read the decimated data from the layer above. The fact that the
granularity is one brick-column instead of just one brick means
that some redundant computation might still be done. Also,
reading decimated data from the layer above means reading just
1/4 of a brick column. If there are 2 or 3 dirty "sibling
brick-columns", i.e. data that map to the same brick at lod+1,
then the same data gets read more than once; the sketch below
illustrates the mapping. Fixing those issues is probably not
worth the effort.
</p>
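<p>
  The "sibling" remark refers to the usual decimation layout,
  sketched below: four brick-columns at one lod feed the same
  brick-column at lod+1, so each dirty sibling triggers its own
  read of that shared parent.
</p>
<pre>
#include &lt;cstdint>
#include &lt;utility>

// Brick-columns (2i,2j), (2i+1,2j), (2i,2j+1) and (2i+1,2j+1) at one
// lod all decimate into brick-column (i,j) at lod+1. Integer division
// by two is the entire mapping.
std::pair&lt;std::int64_t, std::int64_t>
parentColumn(std::int64_t i, std::int64_t j)
{
  return std::make_pair(i / 2, j / 2);
}
</pre>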
<h2>Allow reading low resolution data from a file still open for write</h2>
<h4>Ambition level 1</h4>
<p>
Add a "lod" argument to IZgyWriter::read().
</p>
<p>
This is a trivial change now that support has been added to the
lower level code. There is even a unit test ready to be enabled
if the change is done. But due to several caveats the feature
should not be enabled unless somebody needs it.
</p>
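<p>
  The change amounts to little more than the signature sketched
  below. The size3i_t typedef and the member list are simplified;
  this illustrates the proposal, not the actual interface.
</p>
<pre>
#include &lt;array>
#include &lt;cstdint>

typedef std::array&lt;std::int64_t, 3> size3i_t;  // simplified

struct IZgyWriter {
  // Existing entry point; implicitly reads full resolution (lod 0).
  virtual void read(const size3i_t&amp; start, const size3i_t&amp; size,
                    float* data) = 0;
  // Proposed: an explicit lod selects decimated data. Keeping the
  // old overload means existing callers compile unchanged.
  virtual void read(const size3i_t&amp; start, const size3i_t&amp; size,
                    float* data, int lod) = 0;
  virtual ~IZgyWriter() {}
};
</pre>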
<p>
The main caveat is that the low resolution data will only be up
to date after a call to finalize(). This puts an additional
burden on the application and a risk of subtle errors if the
application forgets.
</p>
<p>
Another caveat: The first time finalize() is called, the value
range of the histogram gets locked down to the range of samples
seen at that point. It gets unlocked again on a full finalize.
If the application calls the first finalize too early, this is
one more subtle behavior the application needs to be aware of.
</p>
<p>
Another caveat is that if the file is on the cloud then every
finalize will leak some disk space.
</p>
<h4>Ambition level 2</h4>
<p>
Make this transparent to the application by doing an incremental
finalize each and every time the application reads lowres data
from a file open for write. This is not as expensive as it
sounds: if there have been no writes between two reads then the
incremental finalize is a no-op, and being incremental it won't
be that expensive even when something did change. But higher
level lods might be rebuilt fairly often, and the single brick
that is the highest lod will be rebuilt on any change.
</p>
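<p>
  The transparent variant might look like the sketch below;
  finalizeIncremental() and the dirty flag are stand-ins for the
  real change tracking.
</p>
<pre>
#include &lt;array>
#include &lt;cstdint>

typedef std::array&lt;std::int64_t, 3> size3i_t;  // simplified

class TransparentLowresWriter {
  bool dirty_ = false;  // any write since the last (implicit) finalize?
  void finalizeIncremental() { /* rebuild affected lowres bricks */ }
  void readStorage(const size3i_t&amp;, const size3i_t&amp;, float*, int) {}
public:
  void write(/* start, size, data */) { dirty_ = true; }
  void read(const size3i_t&amp; start, const size3i_t&amp; size,
            float* data, int lod) {
    // Bring decimated data up to date before any lowres read. This
    // is a no-op when nothing was written since the previous read.
    if (lod > 0 &amp;&amp; dirty_) {
      finalizeIncremental();
      dirty_ = false;
    }
    readStorage(start, size, data, lod);
  }
};
</pre>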
<p>
nlods() should be changed to return the possible number of lods
even when the lowres data is not current, because the data
magically becomes current when needed.
</p>
<p>
Caveat: The note in ambition level 1 about calling finalize()
too often is worse here since the application is no longer in
control of when finalize is called. As a minimum the finalize on
file close probably needs a full rebuild even when it otherwise
seems safe to do an incremental one. The code should set a
special flag to this effect if the change from "need full" to
"incremental allowed" is caused by one of these implicit
rebuilds. Calling close() then does a full build. On the cloud,
an arbitrary amount of space can be leaked because there is no
control of how many finalize() calls there will be.
</p>
<h4>Ambition level 3</h4>
<p>
Only generate the data actually needed to resolve the user's
read request. After thinking more closely about this I believe
the cost of coding is way higher than the benefits.
</p>
<p>
The current code assumes that if brick X is clean, the
corresponding 1/8th of the parent is up to date. This will no
longer be true. So another bit, "half-dirty", is needed in the
dirty bitmap to note that brick X is clean but its parent is
not, so that if this is a recursive call then X must be treated
as dirty. I think this bit needs to be set in all bricks of the
requested LOD that were dirty, because the in-memory buffer of
decimated data will be discarded (or just not computed). Bricks
in recursively computed lower layers won't need this special
handling. I hope.
</p>
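<p>
  A sketch of the extended bookkeeping, with invented names: two
  bits per brick instead of one.
</p>
<pre>
#include &lt;cstdint>
#include &lt;map>
#include &lt;tuple>

// Per-brick state for ambition level 3. Illustration only.
enum class BrickState : std::uint8_t {
  Clean,      // brick and its contribution to the parent are current
  Dirty,      // brick was rewritten; parent contribution is stale
  HalfDirty   // brick itself is current but its parent is not, so a
              // recursive caller must still treat it as dirty
};

// (lod, i, j, k) -> state; bricks not present are Clean.
typedef std::tuple&lt;int, std::int64_t, std::int64_t, std::int64_t> BrickKey;
typedef std::map&lt;BrickKey, BrickState> BrickStateMap;

// After rebuilding only the bricks needed for one read request, the
// rebuilt bricks become HalfDirty: their data is now valid but their
// parents were deliberately not recomputed.
void markRebuilt(BrickStateMap&amp; states, const BrickKey&amp; key)
{
  states[key] = BrickState::HalfDirty;
}
</pre>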
<p>
The improved algorithm shown below is still not optimal because
if new lowres bricks are required then _calculate() will write
them to file and then the read request will read back that same
data, instead of having _calculate() somehow keep the data in
memory. Implementing that appears to be ridiculously
complicated, both because not all the bricks might have needed
a rebuild and because the rebuild returns both more (full brick
columns) and less (just one brick column at a time) data than
what the application asked for. So copying the result out to
the user's buffer might get pretty complicated.
</p>
<pre>
Actual algorithm:
On read of LOD > 0: Check whether the exact area of the read request is dirty.
IF dirty:
IF no pre-existing histogram range: set special full-rebuild-flag = true.
Instantiate a GenLodC.
IF ambition level == 2:
Invoke _calculate() on top level
ELSE:
Make list of brick-columns at lod overlapping the read request.
FOR EACH brick column:
Invoke _calculate() for this column, passing need_result = false.
Somehow set the special half-dirty flag in the appropriate places.
END IF dirty
</pre>
<h4>Ambition level 4</h4>
<p>
This is orthogonal to the rest.
</p>
<p>
Re-introduce the dynamically widening histogram. This fixes the
caveat of incorrect histograms but will be expensive to code.
</p>
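<p>
  One possible scheme, sketched below with invented names: when a
  sample falls outside the current range, double the range until
  it fits. Doubling merges pairs of old bins into one new bin,
  which keeps the re-binning cheap. Whether this matches the old
  ZGY behavior is not claimed here.
</p>
<pre>
#include &lt;cstddef>
#include &lt;cstdint>
#include &lt;vector>

struct WideningHistogram {
  double lo, hi;
  std::vector&lt;std::int64_t> bins;
  WideningHistogram(double l, double h, std::size_t nbins)
    : lo(l), hi(h), bins(nbins, 0) {}

  void add(double v) {
    while (v > hi) widenUp();
    while (v &lt; lo) widenDown();
    double f = (v - lo) / (hi - lo) * (bins.size() - 1);
    ++bins[(std::size_t)f];
  }

private:
  // Double the range upwards: pairs of old bins merge into one bin
  // in the lower half; the upper half starts out empty.
  void widenUp() {
    std::vector&lt;std::int64_t> wider(bins.size(), 0);
    for (std::size_t i = 0; i &lt; bins.size(); ++i)
      wider[i / 2] += bins[i];
    bins.swap(wider);
    hi = lo + 2 * (hi - lo);
  }
  // Double the range downwards: old bins merge into the upper half.
  void widenDown() {
    std::vector&lt;std::int64_t> wider(bins.size(), 0);
    for (std::size_t i = 0; i &lt; bins.size(); ++i)
      wider[(i + bins.size()) / 2] += bins[i];
    bins.swap(wider);
    lo = hi - 2 * (hi - lo);
  }
};
</pre>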
<h2>Virtual ZGY files</h2>
<img src="update-lowres-fig2-5.png"
width="215" height="152" alt="Alternative 5"/>
<p>
A different approach both to partial updates and incremental
builds is to introduce a new &quot;Virtual ZGY&quot; file format
to be used for very large files. This would basically be a list
of regular ZGY files to read the real data from. Plus histogram
and statistics for the whole file.
</p>
<p>
Incidentally this feature might also facilitate allowing
multiple processes to write to the (logically) same ZGY file in
parallel.
</p>
<p>
The OpenZGY API would hide these details in IZgyReader so
application code wouldn't even know that multiple files were
involved. Writing such files would require application changes.
</p>
<p>
The individual real ZGY files would be assumed to be small
enough to be deleted and re-created if any part of them is
touched, and for a full rebuild of low resolution data to be
done in that file only.
</p>
<p>
Minor note: If it is desirable to still provide all LOD levels
up to the one where only one brick is required then the single
topmost LOD in each partial file should be stored uncompressed
and should be duplicated in a single &quot;low-resolution&quot;
partial file holding all LOD levels that depend on data from
multiple files.
</p>
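<p>
  To make the idea concrete, a manifest for such a format might
  hold something like the strawman below. No such format exists in
  OpenZGY today; every field shown is an assumption.
</p>
<pre>
#include &lt;cstdint>
#include &lt;string>
#include &lt;vector>

// Strawman layout of a "Virtual ZGY" manifest. Illustration only.
struct PartialFile {
  std::string path;        // a regular ZGY file with the real data
  std::int64_t origin[3];  // its position in the combined volume
  std::int64_t size[3];    // its extent, in samples
};

struct VirtualZgyManifest {
  std::vector&lt;PartialFile> parts; // non-overlapping tiles of the survey
  std::string lowresPart;         // optional file holding the shared
                                  // LOD levels built from many tiles
  // Statistics and histogram for the combined volume, updated by
  // subtracting and re-adding one tile's contribution whenever that
  // tile is deleted and re-created.
  std::int64_t samplecount = 0;
  double sum = 0, sumsquared = 0;
  double minvalue = 0, maxvalue = 0;
  std::int64_t histogram[256] = {0};
  double histogram_lo = 0, histogram_hi = 0;
};
</pre>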
</body>
</html>
@@ -1269,9 +1269,10 @@ public:
* that case it is not possible to request a progress callback.
*
* It is valid to call finalize() and then continue writing to the
* file. This might be useful if the application is mixing reads and
* writes in the same open file. The low resolution data will only be
* up to date after a call to finalize().
* file. This is currently not useful. The reason the application
* might want this is to allow reading low resolution data from
* a file that is still open for write. The API blocks this today,
* simply by removing the lod parameter from IZgyWriter::read().
*
* If the processing raises an exception the data is still marked
* as clean. So a second attempt will do nothing unless the
@@ -2233,6 +2233,15 @@ ZgyInternalBulk::_writeAlignedRegion(
* and brick consolidation) and what is best depends on whether
* we are tracking changes (so every brick must be read anyway) and
* whether the file is on the cloud and how its bricks are sorted.
*
* TODO-@@@: If the file is open for update and if old contents are
* only needed to update stats not for r/m/w then try a readconst
* first. Especially if the new data to be written is a large
* writeconst. This can save inflating some scalar buffers, which can
* also reduce the risk of running out of memory in a few special
* cases with a large writeconst that is still smaller than the entire
* survey. Unfortunately this change complicates the r/m/w logic even
* further. It adds a bad code smell.
*/
void
ZgyInternalBulk::writeRegion(