Commit 9402da40 authored by Paal Kvamme

Minor performance improvement (CPU) by more short cuts in decimation.

parent b7e65108
......@@ -751,11 +751,13 @@ template <typename T, int NDim>
bool
DataBufferNd<T,NDim>::isAllSame(const std::int64_t *used_in) const
{
if (isScalar())
return true;
SimpleTimerEx tt(AdHocTimers::instance().allsame);
// TODO-Low yet another place I wish I hadn't templated on NDim.
std::array<std::int64_t,NDim> used;
for (int dim=0; dim<NDim; ++dim)
used[dim] = used_in[dim];
used[dim] = used_in ? used_in[dim] : _size[dim];
if (used[0] <= 0 || used[1] <= 0 || used[2] <= 0 || isScalar()) {
// Has zero or one sample, so clearly all samples are the same.
......
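The change above makes `used_in` optional: a null pointer now means "check the entire buffer", falling back to the buffer's own size. A minimal stand-alone sketch of that pattern follows; the function name `isAllSame3d` and the flat-array layout are illustrative assumptions, not the library's actual signature:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch: return true if all samples inside the "used" region are equal.
// used_in may be nullptr, meaning "check the whole buffer" (the new behavior).
static bool isAllSame3d(const float* data,
                        const std::int64_t size[3],
                        const std::int64_t* used_in)
{
  std::int64_t used[3];
  for (int dim = 0; dim < 3; ++dim)
    used[dim] = used_in ? used_in[dim] : size[dim];
  if (used[0] <= 0 || used[1] <= 0 || used[2] <= 0)
    return true; // zero samples: trivially all the same
  const float first = data[0];
  for (std::int64_t ii = 0; ii < used[0]; ++ii)
    for (std::int64_t jj = 0; jj < used[1]; ++jj)
      for (std::int64_t kk = 0; kk < used[2]; ++kk)
        if (data[(ii * size[1] + jj) * size[2] + kk] != first)
          return false; // typical case: bail out on the first mismatch
  return true;
}
```

Note that the early exit on the first mismatch is what makes the check cheap in the common case where the samples differ.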
......@@ -360,30 +360,46 @@ GenLodImpl::_accumulate(const std::shared_ptr<const DataBuffer>& data)
}
/**
* Read data from the specified (readpos, readsize, readlod) and store the
* same data back to file, if it wasn't read from file in the first place.
* Read or recursively compute a chunk (usually 1 or 4 brick-columns)
* of data at the specified (readpos, readsize, readlod) and return a
* memory buffer 1/8th of the input size holding a decimated version.
* Also store the chunk that was obtained (i.e. not the decimated data)
* back to file, if it wasn't read from file in the first place.
* If the decimated data is to be saved then the caller must do that.
*
* When asked to read (and implicitly write) the highest LOD level,
* nothing is returned because no more decimation is to be done.
*
* The vertical start and size are ignored. Full vertical traces are read.
* The provided readpos and readsize must both be brick aligned.
* The actual size of the returned buffer will be clipped to the survey
* size and might even end up empty. The returned buffer has no padding.
*
* In addition to reading and writing at the readlod level, the method will
* compute a single decimated buffer at readlod+1 and return it. As with
* the read/write the buffer might be smaller at the survey edge. Note that
* the caller is responsible for storing the decimated data.
* When doing an incremental build, if calculate is called with a clean
* region it will not need to do a recursive call and it will not need
* to write anything. It still needs to return an in-memory buffer of
* decimated data but it can get that by reading 1/8th of a chunk of
* decimated data from the lod+1 level, because when the chunk is clean
* the corresponding part of the lowres brick is still valid.
*
* If calculate is called with a clean region it will not need to do a
* recursive call and it will not need to write. The code has two choices:
* TODO-@@@: Performance: Incremental build: Reading more data than needed.
* Assume the chunk size is one brick-column, which means that any chunk C
* not on the survey edge will have 4 brick-columns as input. If two or
* three of those inputs are marked as clean then there will eventually be
* two or three reads from different parts of C in order to get hold of
* the still valid decimated data it contains. Each read will read just
* one quarter of a brick column. So the low level I/O routine will need
* to read the full brick column and discard 3/4 of the data. The net
* effect is that the I/O layer may be reading the same data 2 or 3 times.
* Note that the issue might not be noticeable in practice because it
* is more likely for a chunk to have either zero dirty inputs, i.e. no
* calculation needed, or all dirty inputs.
* If a fix really is needed, consider:
*
* - Read the requested brick, which is normally only done in LOD0, and
* decimate and return as usual. For LOD0 this is business as usual.
* For low resolution data it saves some I/O and CPU time.
*
* - Read 1/4 of the brick at lod+1 and return this already decimated
* data. This technically reads even less data from the file. But if
* the chunk size is a single brick column this needs to read 1/4
* brick column from file which ends up reading the full columns and
* discarding the surplus. So it saves some CPU but might not save I/O.
* - Rewrite the two lines handling recursive calls to something a lot
* more complicated, implementing some kind of read/modify/write logic.
* - Set chunk size to 4 brick columns. The cure may be worse than the disease.
* - Some kind of caching.
*
* If doing an incremental build, readsize is best set to one brick-column
* because this determines the granularity of the "dirty" test and that
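The comment above says each step returns a buffer 1/8th of the input size: every 2x2x2 cell collapses to one output sample. A minimal stand-alone sketch of one such step follows; this simple averaging, the name `decimate2x2x2`, and the assumption of sizes divisible by 2 are mine for illustration (the real code supports several decimation algorithms and handles survey edges):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of one decimation step: each 2x2x2 input cell becomes one output
// sample (here by averaging), so the output holds 1/8 of the input data.
// Assumes ni, nj, nk are all even; real code must also clip at survey edges.
static std::vector<float> decimate2x2x2(const std::vector<float>& in,
                                        std::int64_t ni,
                                        std::int64_t nj,
                                        std::int64_t nk)
{
  std::vector<float> out((ni / 2) * (nj / 2) * (nk / 2));
  for (std::int64_t ii = 0; ii < ni / 2; ++ii)
    for (std::int64_t jj = 0; jj < nj / 2; ++jj)
      for (std::int64_t kk = 0; kk < nk / 2; ++kk) {
        double sum = 0;
        for (int di = 0; di < 2; ++di)
          for (int dj = 0; dj < 2; ++dj)
            for (int dk = 0; dk < 2; ++dk)
              sum += in[((2 * ii + di) * nj + (2 * jj + dj)) * nk
                        + (2 * kk + dk)];
        out[(ii * (nj / 2) + jj) * (nk / 2) + kk]
            = static_cast<float>(sum / 8.0);
      }
  return out;
}
```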
......@@ -491,12 +507,6 @@ GenLodImpl::_calculate(const index3_t& readpos_in, const index3_t& readsize_in,
hires[ii] = this->_calculate(readpos*std::int64_t(2) + offsets[ii],
chunksize, readlod-1);
data = this->_paste4(hires[0], hires[1], hires[2], hires[3]);
// TODO-@@@: Need to check again whether we are dealing with a buffer
// that isn't tagged as scalar but still has all samples set to the
// same value. There are a few cases where this will not be detected.
// The test should be cheap because the expectation is that the test
// will fail and it will typically do that very fast.
}
if (!wasread)
......@@ -518,7 +528,12 @@ GenLodImpl::_calculate(const index3_t& readpos_in, const index3_t& readsize_in,
#endif
}
}
else if (data->isScalar()) {
else if (data->isScalar() || data->isAllSame(nullptr)) {
// The test for isAllSame() is just a performance tweak.
// It normally returns false, and in that case it typically
// returns very quickly, so there should be no downside.
// data is shrink-to-fit i.e. has no padding at the survey
// edge so isAllSame() doesn't need the "used" parameter.
result = DataBuffer::makeScalarBuffer3d
(data->scalarAsDouble(), returnsize, data->datatype());
}
......
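The fast path added in the last hunk can be sketched stand-alone: when every sample in the chunk has the same value, the result can be represented as a single scalar instead of being decimated sample by sample. The `Buffer` struct and `fastPathOrCopy` below are hypothetical simplifications standing in for `DataBuffer` and the surrounding logic, not the OpenZGY API:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for DataBuffer: either a single scalar value
// or a vector of samples.
struct Buffer {
  bool scalar = false;
  double value = 0.0;
  std::vector<float> samples; // unused when scalar == true
};

// Cheap check, analogous to isScalar() || isAllSame(nullptr) above.
static bool allSame(const Buffer& b)
{
  if (b.scalar)
    return true;
  for (float s : b.samples)
    if (s != b.samples.front())
      return false; // typical case: bail out on the first mismatch
  return true;
}

// If all samples are identical, return a scalar buffer (analogous to
// DataBuffer::makeScalarBuffer3d); otherwise the real code would decimate.
static Buffer fastPathOrCopy(const Buffer& data)
{
  if (allSame(data)) {
    Buffer r;
    r.scalar = true;
    r.value = data.scalar ? data.value
        : data.samples.empty() ? 0.0 : double(data.samples.front());
    return r; // no per-sample decimation needed
  }
  return data; // placeholder for the normal decimation path
}
```

As the commit comment notes, the check usually fails fast, so the tweak costs little on the common path while saving all the decimation work when the chunk is constant.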