Commit a5c72e15 authored by Paal Kvamme's avatar Paal Kvamme

Update comments.

parent 72e38b66
@@ -36,10 +36,14 @@
*
* Code flow for writing:
*
* The main logic is a couple of methods where each function ends
* with calling the next level down. So they might easily be refactored
* into a set of methods to be called sequentially. Or (horror!) into
* a single function. The order is:
* The main logic is a chain of methods, each of which might have
* ended by calling the next level down. But to make multithreading
* easier to implement (and to obfuscate the code a bit), _writeOneBrick()
* and _writeOneNormalBrick() instead return to the calling
* _writeAlignedRegion() an argument package that should be sent
* to the next method in the chain.
*
* The order of calls is:
*
* writeRegion (arbitrary region, user's value_type)
*
@@ -51,18 +55,15 @@
* Apply splitting into bricks.
* Expand bricks at survey edge to full size unless r/m/w already did.
* Convert sample position to brick number. I.e. divide by bricksize.
* The next layer (_writeOneBrick) can be called with one brick
* at a time so we can re-use the buffer. On the other hand,
* passing a list of bricks or even all the bricks in the request
* might help parallelization.
*
* _writeOneBrick (single brick, native vt)
*
* Apply conversion from scalar to bulk or vice versa in some cases.
* Disallow compression in some circumstances. This is why compress
* should not be done in the higher levels.
* If the data end up constant then call _writeOneConstantBrick
* and we are done. So skip the rest.
* Might convert from a scalar to a regular buffer or vice versa.
* Might decide whether to update an existing brick, and where.
* Might decide to veto the compression.
* The next step depends on whether the buffer ended up scalar
* (_writeOneConstantBrick) or not (_writeOneNormalBrick, eventually
* followed by _writeWithRetry).
*
* _writeOneNormalBrick
*
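The argument-package chaining described in the comment above can be sketched as a simplified, hypothetical C++ fragment. The names mirror the real methods, but `Buffer` and `ArgPack` here are bare stand-ins, not the actual `DataBuffer` and `WriteBrickArgPack`:

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for DataBuffer: only tracks whether it is scalar.
struct Buffer { bool scalar; };

// Hypothetical stand-in for WriteBrickArgPack: carries everything the
// next stage needs, plus (for illustration) which method comes next.
struct ArgPack {
  std::shared_ptr<Buffer> data;
  std::string next;
};

// Like _writeOneBrick(): does not call the next level itself, but returns
// the (possibly modified) argument package to the caller.
ArgPack writeOneBrick(ArgPack args) {
  args.next = args.data->scalar ? "_writeOneConstantBrick"
                                : "_writeOneNormalBrick";
  return args;
}

// Like _writeAlignedRegion(): owns the sequencing, which is what makes it
// feasible to later run the per-brick stages in parallel.
std::string writeAlignedRegion(std::shared_ptr<Buffer> data) {
  ArgPack pack = writeOneBrick(ArgPack{std::move(data), ""});
  // ...would then dispatch on pack.next, eventually _writeWithRetry().
  return pack.next;
}
```

The point of the pattern is that the caller, not the callee, decides what happens next, so the per-brick work becomes a pure transformation of the argument package.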
@@ -180,7 +181,7 @@ struct WriteBrickArgPack
std::array<std::int64_t,3> brickpos;
std::int32_t lod;
std::shared_ptr<DataBuffer> data;
compressor_t compressor; // TODO-minor, caller use std::ref?
compressor_t compressor; // TODO-Low, caller use std::ref?
//bool use_compressor; // or this?
std::int64_t fileoffset;
WriteBrickArgPack(const std::array<std::int64_t,3>& brickpos_in,
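The `TODO-Low, caller use std::ref?` above concerns the cost of copying a possibly stateful compressor functor into every argument pack. A minimal sketch of the idea, assuming `compressor_t` is a `std::function`-like type (the `CountingCompressor` is invented purely to make the copies observable):

```cpp
#include <functional>

using compressor_t = std::function<int(int)>;

// Hypothetical functor that counts copies via a static counter.
struct CountingCompressor {
  static int copies;
  CountingCompressor() = default;
  CountingCompressor(const CountingCompressor&) { ++copies; }
  int operator()(int insize) const { return insize / 2; }  // fake "compression"
};
int CountingCompressor::copies = 0;

// Storing the functor by value copies it into the std::function; wrapping
// it in std::ref stores only a reference_wrapper, so the caller keeps
// ownership and no copy of the functor itself is made.
inline int copiesWhenStored(bool use_ref) {
  CountingCompressor::copies = 0;
  CountingCompressor c;
  compressor_t stored = use_ref ? compressor_t(std::ref(c))
                                : compressor_t(c);
  stored(100);
  return CountingCompressor::copies;
}
```

The trade-off is lifetime: with `std::ref` the caller must guarantee the functor outlives every argument pack that refers to it.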
@@ -321,7 +322,7 @@ public:
/**
* ZgyInternalBulk is used both from ZgyReader, which is in general not
* allowed to change any state to make it thread safe, and from the
* allowed to change any state to help make it thread safe, and from the
* non threadsafe ZgyWriter.
*
* To mitigate the risk of accidentally modifying data using a ZgyReader
@@ -460,7 +461,7 @@ ZgyInternalBulk::readToExistingBuffer(
const double defaultvalue = this->_metadata->ih().defaultvalue();
// Make a separate pass to gather all the bricks we need to read.
// FUTURE: we might fetch some of them in parallel and we might be
// The lower levels might fetch some of them in parallel and we might be
// able to combine bricks to read larger blocks at a time. Both changes
// can have a dramatic performance impact on cloud access.
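The "combine bricks to read larger blocks" idea mentioned above can be sketched as a simple coalescing pass over (offset, size) requests. This is only an illustration of the technique, not the actual OpenZGY consolidation logic:

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

using Request = std::pair<std::int64_t, std::int64_t>;  // file offset, byte count

// Merge requests that turn out to be contiguous after sorting, so that
// e.g. two adjacent 4 KB brick reads become one 8 KB read.
inline std::vector<Request> coalesce(std::vector<Request> reqs) {
  std::sort(reqs.begin(), reqs.end());
  std::vector<Request> out;
  for (const Request& r : reqs) {
    if (!out.empty() && out.back().first + out.back().second == r.first)
      out.back().second += r.second;  // contiguous with previous: extend it
    else
      out.push_back(r);
  }
  return out;
}
```

Fewer, larger reads matter most on cloud backends, where per-request latency tends to dominate over bandwidth.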
@@ -1145,11 +1146,13 @@ ZgyInternalBulk::_setPaddingSamples(
* Done here: write the brick to the physical file or to the cloud.
* Update the lookup table if a new block was allocated.
*
* TODO-Multithreading:
* The operation is currently not thread safe. For cloud writes it would
* not help much to allow multi threading since most writes end up being
* buffered anyway. Supporting parallel writes in that scenario would
* require changes at a lower level. Flushing multiple buffers in parallel,
* require changes at a lower level. Also, SegmentClosedException would
* get really problematic to handle.
*
* TODO-Performance: Instead consider flushing multiple buffers in parallel,
* or (more likely) use some parallel-upload features in GCS / Azure.
* Or maybe have the lower level split each object apart from the one
* with the headers into a fixed number of parts that can be uploaded
@@ -1159,21 +1162,6 @@ ZgyInternalBulk::_setPaddingSamples(
* the two xx_write() calls might help. But that requires a lot of testing.
* Don't do it unless certain there will be a substantial benefit.
*
* Multi-threading padding and compression is a different matter and
* would very likely improve the speed of compressed writes. Coding wise
* it might be better to solve this by doing compression earlier.
* Perhaps even making compression a feature of DataBuffer.
*
* Can I instead queue up writes to be handled later? No. At least not for
* cloud writes which might throw a SegmentIsClosed exception. Recovering
* from that would be next to impossible. Also the lifetime of the rawdata
* pointer is only the duration of this call. Expensive (at runtime)
* changes would be needed in DataBuffer.
*
* Would it help to refactor this function and its callers to operate on
* a list of bricks instead of one? Possibly but the added complexity
* might not be worth it.
*
* Args: brickpos, lod, fileoffset, rawdata, brickstatus.
*/
void
@@ -1213,7 +1201,7 @@ ZgyInternalBulk::_writeWithRetry(const WriteNowArgPack& args)
/**
* Process a single brick that is to be written to storage.
* TODO-High: Fix: Input must be a regular brick, not a scalar.
* Input must be a regular brick, not a scalar.
* In this function:
*
* * Apply padding
@@ -1222,9 +1210,6 @@ ZgyInternalBulk::_writeWithRetry(const WriteNowArgPack& args)
* * Convert DataBuffer to void* + size suitable for the file i/o layer.
*
* Thread safety:
* FUTURE: Change the function to return the modified argument pack
* instead of chaining the next function.
* With that change the following should be true:
* The function is thread safe in the sense that multiple bricks
* belonging to the same write request may be processed in parallel.
* No other changes, such as data being written to the file, are
@@ -1247,12 +1232,8 @@ ZgyInternalBulk::_writeOneNormalBrick(const WriteBrickArgPack& args)
if (_logger(1))
_logger(1, "_writeOneNormalBrick(" + args.toString() + ")\n");
// TODO-Medium: if scalar(data): return data, impl.enum.BrickStatus.Constant
// Make it possible for this function to be called for all data
// in _writeOneBrick(). Then move it even further up to
// _writeAlignedRegion() which will hopefully enable some
// parallelization on write. But there are other ways of achieving this
// as well. E.g. make more of these functions operate on lists.
if (args.data->isScalar())
throw OpenZGY::Errors::ZgyInternalError("Wrong brick type in _writeOneNormalBrick");
if (args.data->size3d() != this->_metadata->ih().bricksize())
throw OpenZGY::Errors::ZgyInternalError("Wrong brick size in _writeOneNormalBrick");
if (args.data->totalsize() != args.data->allocsize())
@@ -1267,17 +1248,6 @@ ZgyInternalBulk::_writeOneNormalBrick(const WriteBrickArgPack& args)
// convenient if we decide to queue the write request.
rawdata_t rawdata(args.data->voidData(), args.data->allocsize() * args.data->itemsize());
// TODO-Compression, might also want to check for all-constant
// by calling _isUsedPartOfBrickAllConstant() and if true, return
// (data.flat[0], impl.enum.BrickStatus.Constant) and do not attempt
// to compress. This function would then be called from
// _writeAlignedRegion() instead of _writeWithRetry().
// There is a slight amount of added work because if the data block
// already exists on file (which we don't know yet) the test would
// not be needed.
// With that change both _writeOneNormalBrick() and _writeOneBrick()
// will need to know the current brick status, and _writeOneNormalBrick()
// would return .Constant for constant data.
BrickStatus brickstatus = BrickStatus::Normal;
if (args.compressor) {
_logger(5, "Attempting compression");
@@ -1404,6 +1374,9 @@ ZgyInternalBulk::_mustLeakOldBrick(
* * Might decide whether to update an existing brick, and where.
* * Might decide to veto the compression.
*
* The veto and the potential conversion to scalar are why compression
* should not be done earlier.
*
* brickpos is given relative to this lod level. For lod 0 the valid
* range is the survey size in bricks. For lod 1 it is half that,
* rounded upwards.
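The valid brickpos range per lod described above (the survey size in bricks at lod 0, halved and rounded upwards for each further level) can be written out as a small helper. This is a sketch of the stated rule, not code from the file:

```cpp
#include <array>
#include <cstdint>

// Number of valid brick positions along each axis at a given lod level,
// assuming the rule described above: lod 0 is the survey size in bricks,
// and each further level halves it, rounding upwards.
inline std::array<std::int64_t,3>
bricksAtLod(std::array<std::int64_t,3> surveyBricks, std::int32_t lod) {
  for (std::int32_t level = 0; level < lod; ++level)
    for (std::size_t dim = 0; dim < 3; ++dim)
      surveyBricks[dim] = (surveyBricks[dim] + 1) / 2;  // halve, round up
  return surveyBricks;
}
```

E.g. a survey of 5 x 4 x 1 bricks at lod 0 has 3 x 2 x 1 valid brick positions at lod 1.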
@@ -1413,9 +1386,6 @@ ZgyInternalBulk::_mustLeakOldBrick(
* The data can be either a scalar or a regular buffer.
*
* Thread safety:
* FUTURE: Change the function to return the modified argument pack
* (or update it in place) instead of chaining the next function.
* With that change the following should be true:
* The function is thread safe in the sense that multiple bricks
* belonging to the same write request may be processed in parallel.
* No other changes, such as data being written to the file, are
@@ -1462,10 +1432,6 @@ ZgyInternalBulk::_writeOneBrick(const WriteBrickArgPack& args_in)
// Caller did not explicitly ask for a constant value,
// but all useable samples (i.e. all samples that are
// inside the survey boundaries) have the same value.
// TODO-Low, the caller could have made this test.
// This would waste some time if it turns out we need
// to expand the brick again. But it might help with
// more efficient parallelization. Or not.
_logger(1, "Implicit store constant.\n");
// Need to change to a real scalar. This is a bit roundabout.
// TODO-Low can I have a DataBuffer::makeSimilar() that retains
@@ -1490,7 +1456,7 @@ ZgyInternalBulk::_writeOneBrick(const WriteBrickArgPack& args_in)
// because if it had been compressed then _mustLeakOldBrick()
// would have returned true and we wouldn't have gotten here.
//
// TODO-High: Fragile code:
// TODO-Medium: Fragile code:
//
// * _mustLeakOldBrick() might change to allow overwriting
// a compressed brick with a smaller one. If so, must
@@ -1599,11 +1565,12 @@ ZgyInternalBulk::_writeAlignedRegion(
// * If multi-threading is disabled at runtime then the "brick" buffer
// can be re-used. _writeOneConstantBrick() or _writeWithRetry(*it)
// would be called inside the loop, one brick at a time.
//
// * Instead of processing every single brick and then writing all
// at the end, consider splitting huge requests into several
// slightly less huge requests and process those in sequence.
// (the same applies to the r/m/w logic).
//
// If the user-provided data is a scalar then it can almost be used as-is,
// re-using the same brick for every write. But the size needs to be
// adjusted to become just a single brick. _writeOneConstantBrick()
@@ -1803,9 +1770,3 @@ ZgyInternalBulk::writeRegion(
}
} // namespace
// Refactoring 12-Jan-2021
// (delete this comment after things check out)
//
// * _writeOneBrick, _writeOneNormalBrick, _writeOneConstantBrick
// now use an argument package as input.