Commit cf9b4d5b authored by Paal Kvamme's avatar Paal Kvamme

Working on making genlod more flexible, needed to reduce granularity.

parent 25cb6976
......@@ -302,7 +302,7 @@ GenLodImpl::call()
<< "\n";
this->_reporttotal(_willneed());
this->_report(nullptr);
this->_calculate(std::array<std::int64_t,3>{0,0,0}, this->_nlods-1);
this->_calculate(index3_t{0,0,0}, this->_bricksize * std::int64_t(2), this->_nlods-1);
return std::make_tuple(this->_stats, this->_histo);
}
......@@ -356,34 +356,27 @@ GenLodImpl::_accumulate(const std::shared_ptr<const DataBuffer>& data)
}
/**
* Read data from the specified (readpos, readlod) and store it back.
 * The function will itself decide how much to read, subject to several
 * constraints. Always read full traces. Size in i and j needs to be
 * 2 * bs * 2^N where bs is the file's brick size in that dimension,
 * clipped to the survey boundaries. This might give an empty result.
*
* If doing an incremental build the minimum size is also the maximum
* size because the buffer size determines the granularity of the
* "dirty" test and that needs to be as small as possible.
* So, size will end up as 2x2 brick columns or less at survey edge.
 * The current code does this also for full rebuilds because it is
 * simpler (no need to figure out the available memory size) and also
 * unlikely to make a noticeable difference in performance. Except
* _possibly_ if the file is optimized for slice access and resides
* on the cloud.
*
* TODO-Performance: Allow the application to configure how much memory
* we are allowed to use. Increase the block size accordingly.
* Larger bricks might help the bulk layer to become more efficient.
*
* When readlod is 0 and the data was read from the ZGY file then the
* writing part is skipped. Since the data is obviously there already.
*
* In addition to reading and writing at the readlod level, the
* method will compute a single decimated buffer at readlod+1 and
* return it. As with the read/write the buffer might be smaller at
* the survey edge. Note that the caller is responsible for storing
* the decimated data.
* Read data from the specified (readpos, readsize, readlod) and store the
* same data back to file, if it wasn't read from file in the first place.
 * The vertical start and size are ignored. Full vertical traces are read.
* The provided readpos and readsize must both be brick aligned.
* The actual size of the returned buffer will be clipped to the survey
* size and might even end up empty. The returned buffer has no padding.
*
* In addition to reading and writing at the readlod level, the method will
* compute a single decimated buffer at readlod+1 and return it. As with
* the read/write the buffer might be smaller at the survey edge. Note that
* the caller is responsible for storing the decimated data.
*
* If doing an incremental build, readsize is best set to one brick-column
* because this determines the granularity of the "dirty" test and that
* needs to be as small as possible. TODO-@@@
*
 * TODO-Performance: If doing a full build it might be a good idea to allow
 * the application to configure how much memory the computation is allowed
 * to use. Increase readsize accordingly. Larger bricks might help the bulk
* layer and the multi-threaded decimation routines become more efficient.
* The gain might not be noticeable though.
*
* Full resolution data (lod 0) will be read from file (plan C) or the
* application (plan D). Low resolution is computed by a recursive call
......@@ -396,16 +389,21 @@ GenLodImpl::_accumulate(const std::shared_ptr<const DataBuffer>& data)
* storing them. For plan B the caller must iterate.
*
 * The function is also responsible for collecting statistics and
 * histogram data. Note that some of the decimation algorithms use the
 * histogram of the entire file. Ideally the histogram of the entire
 * file should be available before decimation starts but that is
 * impractical. At least make sure the histogram update is done early
 * enough and the decimation late enough that the chunk of data being
 * decimated has already been added to the histogram.
*
 * histogram data when doing a full build. For incremental builds
 * it becomes the caller's responsibility to track changes as data
 * is written.
*
* Note that some of the decimation algorithms need the histogram of the
* entire file. Ideally the histogram of the entire file should be
* available before decimation starts but that is impractical. At least
* make sure the histogram update is done early enough and the decimation
* late enough that the chunk of data being decimated has already been
 * added to the histogram. TODO-Low: Read e.g. 5% of all bricks up front
 * to get a better approximation of the histogram and then use that
 * result for the entire lowres generation.
*/
std::shared_ptr<DataBuffer>
GenLodImpl::_calculate(const std::array<std::int64_t,3>& readpos_in, std::int32_t readlod)
GenLodImpl::_calculate(const index3_t& readpos_in, const index3_t& readsize_in, std::int32_t readlod)
{
const std::int64_t lodfactor = std::int64_t(1) << readlod;
const std::array<std::int64_t,3> surveysize =
......@@ -413,15 +411,8 @@ GenLodImpl::_calculate(const std::array<std::int64_t,3>& readpos_in, std::int32_
const std::array<std::int64_t,3> readpos{readpos_in[0], readpos_in[1], 0};
if (readpos[0] >= surveysize[0] || readpos[1] >= surveysize[1])
return nullptr;
// Amount of data to read. The method is allowed to return less,
// or even nothing, but only due to going past the survey edge.
// The choice is currently always the same so it is hard coded
// here instead of being passed as a parameter.
// TODO-@@@: if (_incremental) may want to reduce the block size.
const std::array<std::int64_t,3> chunksize
{this->_bricksize[0] * 2,
this->_bricksize[1] * 2,
surveysize[2]}; // always read full traces.
{readsize_in[0], readsize_in[1], surveysize[2]}; // Read full traces.
const std::array<std::int64_t,3> readsize
{std::min(chunksize[0], (surveysize[0] - readpos[0])),
std::min(chunksize[1], (surveysize[1] - readpos[1])),
......@@ -462,31 +453,37 @@ GenLodImpl::_calculate(const std::array<std::int64_t,3>& readpos_in, std::int32_
{chunksize[0], chunksize[1], 0}};
std::shared_ptr<DataBuffer> hires[4]{nullptr, nullptr, nullptr, nullptr};
// TODO-Performance: If algorithm "C" and readlod==1 it should be
// fairly safe to parallelize this loop, using 4 threads for read.
// Because _write() does nothing at readlod==0. Technically we
// could also have consolidated the 4 lod0 read requests but that
// would mean quite a bit of refactoring.
// Worry: The assumption that _write is a no-op is important,
// so it probably needs to be checked to avoid parallel writes.
// Alternatively, and I suspect this will be more difficult, the code
// might be refactored so the bulk layer sees just a single request
// that is 4x larger. The bulk layer can then do multi threading itself.
// The reason this is better than the first approach is that the bulk
// layer in the cloud case might be able to consolidate more bricks.
// Or, if readlod==1, can I just make one call and skip the paste?
// There are probably some more survey-edge cases I need to handle then.
// Also, carefully analyze how this affects total memory usage.
// Compute the requested result by recursively reading 4 chunks
// at lod-1 and gluing the result together.
//
// This loop should normally not be parallelized or consolidated into a
// single call reading 4x the amount of data. The serial loop is what
// prevents the recursion from trying to read the entire file into memory.
//
// TODO-Performance: A possible exception in algorithm "C" is when
// readlod==1. Caveat: For incremental builds we might not need all 4
// sub-parts. Caveat for multi threading: nested loops, smaller blocks.
// Caveat for replacing with a single call: more special cases to test.
// In particular handling of crops to survey size.
//
// TODO-@@@: If one or more sub-blocks have a nonzero size and are
// not flagged as dirty then reading those sub-blocks can be skipped.
// This means we get a read/modify/write cycle on the block we are
// about to update. That needs to be handled here. Read the old
// contents and pass it down to _paste4() as a preallocated result.
for (int ii=0; ii<4; ++ii)
hires[ii] = this->_calculate(readpos*std::int64_t(2) + offsets[ii], readlod-1);
hires[ii] = this->_calculate(readpos*std::int64_t(2) + offsets[ii],
chunksize, readlod-1);
data = this->_paste4(hires[0], hires[1], hires[2], hires[3]);
// TODO-@@@: Need to check again whether we are dealing with a buffer
// that isn't tagged as scalar but still has all samples set to the
// same value. There are a few cases where this will not be detected.
// The test should be cheap because the expectation is that the test
// will fail and it will typically do that very fast.
}
// TODO-Performance: If parallelizing above, needs to have a test
// if (plan != "C" || readlod > 0)
if (!wasread)
this->_write(readlod, readpos, data);
......
......@@ -128,7 +128,7 @@ protected:
void _accumulateT(const std::shared_ptr<const DataBuffer>& data_in);
void _accumulate(const std::shared_ptr<const DataBuffer>& data);
std::shared_ptr<DataBuffer>
_calculate(const index3_t& readpos_in, std::int32_t readlod);
_calculate(const index3_t& readpos, const index3_t& readsize, std::int32_t readlod);
std::shared_ptr<DataBuffer>
_decimate(const std::shared_ptr<const DataBuffer>& data, std::int64_t lod);
std::shared_ptr<DataBuffer>
......