Commit 22c0cecf authored by Paal Kvamme's avatar Paal Kvamme
Browse files

Merge branch 'kvamme62/more-parallel' into 'master'

More multi-threading

See merge request !37
parents c1b799d6 c1d5cf2c
Pipeline #24608 passed with stages
in 5 minutes and 39 seconds
......@@ -31,6 +31,34 @@ html/index.html. If you have
installed when building there will also be a single-document pdf
versions of those two next to the .tgz files.
## <span style="color:blue">Performance</span>
The pure Python reference implementation is single threaded.
OpenZGY/C++ will in most cases make use of multiple threads.
Applications are allowed to issue multiple read requests in parallel
to the same open file. Even if they don't, parallelization done inside
OpenZGY ensures that the performance is still reasonable
Read requests to cloud storage will automatically consolidate neighbor
bricks. This makes cloud I/O faster for typical access patterns, since
reading a typical brick (1MB) at a time from cloud storage is very
inefficient. Note that the brick size isn't increased, it just looks
that way. So when you need smaller bricks because you have scattered
access then this will work as well.
Individual read requests will be parallelized inside OpenZGY for
copy-out, int8/int16 to float conversion, compression, linux local
file read, and cloud read. As applicable.
Write requests will be serialized with respect to float to int8/int16
conversion, compression, and upload to cloud where applicable.
Also the algorithms that decimate for low resolution data.
The last two bullets are implemented for local file access but some
are still work in progress for cloud I/O.
## <span style="color:blue">Building and testing the core parts</span>
### Building Linux core
......@@ -487,6 +487,7 @@ ZgyInternalBulk::readToExistingBuffer(
// case the request did in fact include such samples we will
// initialize the entire result buffer to the default value.
// TODO-Performance can I refactor the code so this isn't needed?
// Or can I make it fill only the actual padding area?
result->fill(as_float ? defaultvalue : defaultstorage);
std::vector<ReadRequest> requests;
......@@ -554,7 +555,34 @@ ZgyInternalBulk::readToExistingBuffer(
_logger(2, std::stringstream()
<< requests.size() << " read requests are queued\n");
if (!requests.empty())
this->_file->xx_readv(requests, false, false, true, UsageHint::Data);
this->_file->xx_readv(requests, true, false, true, UsageHint::Data);
// Note-Performance: If passing true in the second arguent above this
// could help performance a lot. Especially for reading compressed files
// where the user sends large requests without multithreading. Also
// when finalizing compressed files. parallel_ok=true will cause the
// decompression step to be multi-threaded. Also the conversion to
// float (if needed) and the copy-out to the applicaton's buffer will
// be multi-threaded. But there are caveats:
// * _deliverOneBrick() must be thread safe.
// * The cloud backend doesn't honor the parallel_ok argument.
// While this would be a very good idea it is also rather difficult
// to implement.
// * There is a risk of creating too many threads if the application
// is doing its own multi threading. Ideally the user should
// be able to configure this.
// * Ditto for the low resolution generator. It can probably speed
// up by reading with 4 (but only 4) threads. So this isn't as
// efficient as setting parallel_ok=true here with respect to
// speeding up compression. But it might help I/O against the
// cloud. Which the setting here won't.
// * See commants in LocalFileLinux::xx_readv() and _deliverOneBrick().
// And GenLodImpl::_calculate().
......@@ -890,6 +918,7 @@ ZgyInternalBulk::_partsNeeded(
* The function will always be called to deliver one full brick. If the raw
* data only contains a single sample then this is a constant-value brick.
* otherwise there should be exactly bricksize-in-bytes available.
* After any decompression has been done.
* If the brick size is (1,1,1) (not even possible right now) then the
* distinction between constant-value and regular goes away.
......@@ -903,7 +932,55 @@ ZgyInternalBulk::_partsNeeded(
* Another caveat: Even though result is reference counted, its buffer
* might end up set to "raw" which means it must not be used after the
* delivery function returns. Mitigated by adding a clear() member to
* DataBuffer. THat doesn't help if somebody already has a raw pointer though.
* DataBuffer. That doesn't help if somebody already has a raw pointer though.
* Thread safety: Yes, but TODO-Worry this is not trivial to verify.
* Multithreading by having multiple read requests from the API layer is
* safe. Those requests don't share any mutable data. Specifically,
* "result" will be different for each read request.
* Multithreading by having the low level xx_readv() deliver the
* requested bricks in parallel is also supposed to be thread safe.
* The result parameter is shared between threads but different
* threads will be writing to different parts of the buffer due to
* "bpos" being different. (Note 1,2). The raw pointer might also be
* referring to shared data. E.g. a cache. (Note 3). But since the
* data will be destined for different parts of the result buffer it
* shouldn't be possible for the same bytes to be sent to different
* threads.
* Note 1:
* No two threads should be trying to write to the same location in
* result. But there are no guarantees of alignment. E.g, thread#1
* might write the first 7 bytes, thread#2 the next 7 bytes. The
* assumption is that if the hardware needs to do a read/modify/write
* due to the memory bus or cache line being wider than 8 bits (which
* they definitely are) then the hardware or the compiler is
* responsible for preventing a race condition. For C++11 this appears
* to be the case. C++11 1.7 [intro.memory]/p2, irrelevant note
* omitted:
* A memory location is either an object of scalar type or a
* maximal sequence of adjacent bit-fields all having non-zero
* width. Two or more threads of execution (1.10) can update
* and access separate memory locations without interfering
* with each other.
* Note (*2):
* In the lower layers there are situations where the same block can
* be requested twice in a single request. The reason for that is that
* the caller needs two regions in the file and padding causes the two
* to overlap. Once we get here the padding shouldn't be visible.
* Note (*3):
* Besides, if xx_readv() was called with immutable_ok=true then there
* will be no race condition because "raw" will only be read from. And
* if immutable_ok=false e.g. because of byteswapping then "raw" needs
* to be a private copy of the data anyway.
......@@ -1599,7 +1676,7 @@ ZgyInternalBulk::_writeAlignedRegion(
std::vector<std::shared_ptr<const WriteBrickArgPack>> const_queue(worksize);
std::vector<std::shared_ptr<const WriteNowArgPack>> normal_queue(worksize);
std::atomic<int> errors(0);
#pragma omp parallel for if(enable_compress_mt())
#pragma omp parallel for if(enable_compress_mt() && worksize > 1)
for (std::int64_t ix = 0; ix < static_cast<std::int64_t>(worksize); ++ix) {
try {
const index3_t surveypos = work[ix]; // user's start i0,j0,k0 rounded down
......@@ -1607,6 +1684,7 @@ ZgyInternalBulk::_writeAlignedRegion(
std::shared_ptr<DataBuffer> brick = constbrick;
if (!brick) {
brick = DataBuffer::makeDataBuffer3d(nullptr, 0, bs, data->datatype());
// TODO-Performance, any way of avoiding the fill()?
brick->copyFrom(data.get(), // source,, // in survey coords
......@@ -1627,7 +1705,7 @@ ZgyInternalBulk::_writeAlignedRegion(
normal_queue[ix] = now;
catch (const std::exception& ex)
catch (const std::exception&)
// No effort to log the actual error or to get the loop to
// end earlier. An exception here really shouldn't happen.
......@@ -1717,22 +1795,10 @@ ZgyInternalBulk::writeRegion(
throw OpenZGY::Errors::ZgyUserError("Invalid data type in writeRegion");
// TODO-Performamce:
// Getting the range (for updating min/max), testing for all in-use
// samples being the same (for storing empty bricks), and downcast
// from float to storage (may skip clipping or isfinite() or both)
// are all somewhat related. The function to get the range might tell
// us whether the buffer has any NaNs or Inf or both. Depending on
// the answer the scale to storage might use short cuts. If there are
// no infinites, no nan, and range inside what the target can accept,
// the conversion might become a lot faster. Another tweak that is
// maybe less clean is to just go ahead and downcast without worrying
// about NaN/Inf and just hope they don't mess up too much.
// TODO-Performamce: Combining range() and _scaleDataToStorage()
// might save some time.
if (!is_storage) {
// TODO-Performance: Multi-threading of _scaleDataToStorage().
// Need to figure out the cutoff where the buffer is too small and the
// overhead of OpenMP starts getting in the way.
SimpleTimer t1(*_ststimer, timers_on());
data = _scaleDataToStorage(data);
......@@ -1794,9 +1860,10 @@ ZgyInternalBulk::writeRegion(
// Or somehow replace all padding samples with defaultvalue.
// The range is in storage values. So e.g. for int8 it will be no
// more than [-128..+127].
// TODO-Performance: Add an OpemMP loop here or in data->range()
// since the caller will always be single threaded. Ideally a new
// rangeMT() should work whether the buffer is contiguous or not.
// Note that range() can be expensive. I have seen it take 5% of
// total time in a copy application. But a single call to range()
// is often fast enough to not warrant an OpemMP loop. See range()
// for more detals.
std::pair<double, double> minmax = data->range();
if (minmax.first <= minmax.second) {
......@@ -27,6 +27,7 @@
#include <sstream>
#include <cmath>
#include <iostream>
#include <iomanip>
namespace InternalZGY {
#if 0
......@@ -462,7 +463,11 @@ DataBufferNd<T,NDim>::DataBufferNd(const std::array<std::int64_t,NDim>& size)
SimpleTimer tt(AdHocTimers::instance().ctor, timers_on());
check_stride(_size, _stride);
memset(_data.get(), 0, make_product(size) * sizeof(T));
// TODO-Worry: Leaving the buffer not initialized might turn an easily
// reproducible bug into a heissenbug. But zeroing the buffers here
// is too wasteful. If you must, then do a fill() just after constructing
// a new buffer,
//memset(_data.get(), 0, make_product(size) * sizeof(T));
template <typename T, int NDim>
......@@ -816,37 +821,52 @@ DataBufferNd<T,NDim>::clear()
* \brief Find lowest and highest sample excluding +/- Inf and NaN.
* \details The OpemMP directive is commented out because it might
* unfortunately turn out to run slower. The inner loop is very
* inexpensive which means the code is sensitive to how much overhead
* OpenMP will add. I have made real-world measurements but only in
* one hardware / OS / compiler environment and only for float input.
* I observed that 1 million elements was the point where
* multi-threading barely started to have a benefit. In a typical
* scenario I don't expect to see more then 5 times that number. That
* is a typical size of one brick-column. With 5 million floats the
* task ran 3.5 times faster using 16 threads. With a less capable
* OpenMP installation it might easily have run slower. It could
* potentially run a *lot* slower. In my test, range() accounted for
* 5% of total elapsed time. So I am, weighing a potential 5%
* performance improvement against the risk of slowing down. I am
* chicken. I hope I can instead parallelize this at a higher level.
* Making this issue moot.
template <typename T, int NDim>
std::pair<double, double>
DataBufferNd<T,NDim>::range() const
SimpleTimer tt(AdHocTimers::instance().range, timers_on());
double min = std::numeric_limits<double>::infinity();
double max = -std::numeric_limits<double>::infinity();
T min = std::numeric_limits<T>::max();
T max = std::numeric_limits<T>::lowest();
if (isScalar()) {
if (IsFiniteT(scalarAsDouble()))
min = max = scalarAsDouble();
if (IsFiniteT(scalarValue()))
min = max = scalarValue();
else if (contiguous()) {
// Short cut: No holes. Treat as 1d.
const T *ptr = data();
const T *end = ptr + allocsize();
for (; ptr < end; ++ptr) {
if (IsFiniteT(*ptr)) {
min = max = *ptr;
// Due to the loop above I can skip the check for min > max
for (; ptr < end; ++ptr) {
const T value = *ptr;
const T * const ptr = data();
const std::int64_t totalsize = allocsize();
//#pragma omp parallel for reduction(min: min) reduction(max: max) if(totalsize >= 1*1024*1024)
for (std::int64_t ii=0; ii<totalsize; ++ii) {
const T value = ptr[ii];
if (!IsFiniteT(value)) continue;
else if (min > value) min = value;
else if (max < value) max = value;
if (min > value) min = value;
if (max < value) max = value;
else {
const T * const ptr = data();
//#pragma omp parallel for -- NOT TESTED YET
for (int ii=0; ii<_size[0]; ++ii) {
for (int jj=0; jj<_size[1]; ++jj) {
for (int kk=0; kk<_size[2]; ++kk) {
......@@ -859,7 +879,7 @@ DataBufferNd<T,NDim>::range() const
return std::pair<double, double>(min, max);
return std::make_pair(static_cast<double>(min), static_cast<double>(max));
template <typename T, int NDim>
......@@ -1040,6 +1060,13 @@ DataBufferNd<T,NDim>::s_scaleToFloat(const DataBuffer* in,
typename dst_type::value_type *dst_ptr = dst->data();
const src_type::value_type *src_ptr = src->data();
const src_type::value_type *src_end = src_ptr + src->allocsize();
#if 0
std::cerr << "@@ Converting" << std::setprecision(14)
<< " " << *src_ptr
<< " to " << static_cast<typename dst_type::value_type>(*src_ptr * a + b)
<< " using factors " << a << " " << b
<< std::endl;
while (src_ptr < src_end)
*dst_ptr++ = static_cast<typename dst_type::value_type>(*src_ptr++ * a + b);
return dst;
......@@ -1087,9 +1114,18 @@ DataBufferNd<T,NDim>::s_scaleFromFloat(const DataBuffer* in,
// for infinite or both is needed.
dst_type::value_type *dst_ptr = dst->data();
const typename src_type::value_type *src_ptr = src->data();
const typename src_type::value_type *src_end = src_ptr + src->allocsize();
while (src_ptr < src_end)
*dst_ptr++ = RoundAndClip<dst_type::value_type>(*src_ptr++ * a + b, defaultstorage);
const std::int64_t totalsize = src->allocsize();
// TODO-Performance: There is a risk of the OpemMP overhead being larger
// then the speedup gained by multiple threads. I ran tests only in one
// environment. It seemed safe by a wide margin if there was at least one
// standard-sized brick (256 KB to 1 MB) being processed. I am still
// worrying, because technically there is no upper limit to how much
// overhead OpenMP might add. While doing serial processing has a fixed
// and not that dramatic cost.
#pragma omp parallel for if(totalsize >=256*1024)
for (std::int64_t ii=0; ii<totalsize; ++ii) {
dst_ptr[ii] = RoundAndClip<dst_type::value_type>(src_ptr[ii] * a + b, defaultstorage);
return dst;
......@@ -35,6 +35,7 @@
#include <unistd.h>
#include <mutex>
#include <atomic>
#include <omp.h>
using OpenZGY::IOContext;
namespace InternalZGY {
......@@ -163,7 +164,7 @@ LocalFileLinux::LocalFileLinux(const std::string& filename, OpenMode mode, const
// Use the SummaryPrintingTimer extension that keeps track of bytes done.
// The instance allocated in the base class constructor is simply dropped.
_rtimer.reset(new SummaryPrintingTimerEx("File::read"));
_rtimer.reset(new SummaryPrintingTimerEx(mode == OpenMode::ReadWrite || mode == OpenMode::Truncate ? "File::reread" : "File::read"));
_wtimer.reset(new SummaryPrintingTimerEx("File::write"));
switch (mode) {
......@@ -295,6 +296,24 @@ LocalFileLinux::xx_read(void *data, std::int64_t offset, std::int64_t size, Usag
* \details: Thread safety: Yes, assuming that the linux ::pread is thread safe.
* If the caller passes parallel_ok=true this means the caller allows and
* even prefers that we deliver each request on a different thread. This
* parallelization comes in addition to allowing multiple reads in parallel
* at the OpenZGY API level.
* Caveat: Consider carefully whether you want both. If the
* application uses OpenMP for multi threading then by default nested
* parallel regions are disabled. You can change this. If the
* application uses some other mechanism than OpenMP used here might
* not realize that it is creating nested loops. Or maybe it does, if
* it uses an application-wide thread pool?
* Caveat: Since finalize() is single threaded then it should probably
* enable parallel here. One problem is that the application might
* still be inside an OpenMP loop, using a lock to make sure that
* finalize() runs unmolested. OpenMP wikk still see it is inside a
* parallel region so it might refuse to make one here.
LocalFileLinux::xx_readv(const ReadList& requests, bool parallel_ok, bool immutable_ok, bool transient_ok, UsageHint usagehint)
......@@ -303,10 +322,67 @@ LocalFileLinux::xx_readv(const ReadList& requests, bool parallel_ok, bool immuta
// If xx_read() is overridden then whoever did that wouldn't expect
// xx_readv() to change. The fact that I choose to implement one in
// terms of the other is an implementation detail.
for (const ReadRequest& r : requests) {
std::unique_ptr<char[]> data(new char[r.size]);
this->LocalFileLinux::xx_read(data.get(), r.offset, r.size, usagehint);, r.size);
if (!parallel_ok || requests.size() < 2) {
for (const ReadRequest& r : requests) {
std::unique_ptr<char[]> data(new char[r.size]);
this->LocalFileLinux::xx_read(data.get(), r.offset, r.size, usagehint);, r.size);
else {
// OpenMP needs signed loop variable on windows.
const std::int64_t requestcount = requests.size();
// Re-use buffers within one thread, to avoid lock contention in
// the CRT. Assume that in almost all cases the requests will have
// the same size. If this is not true and the sizes vary wildly
// then we may be wasting memory here. Even more memory than the
// size being requested.
// TODO-Low, if number of requests per call is typically less than
// the number of available threads then the re-use is pointless.
std::int64_t maxsize = 0;
for (const ReadRequest& r : requests)
maxsize = std::max(maxsize, r.size);
// Cannot use more threads than we have requests, and OpenMP might
// not be smart enough to see this. Definitely not if the parallel
// region starts before the for loop, as is needed to reuse
// buffers. And sorry for the pedantic guard against more than
// 2**31 bricks.
const int threadcount = std::min(std::min(std::numeric_limits<int>::max(), static_cast<int>(requestcount)), omp_get_max_threads());
// Exceptions thrown out of an OpenMP loop are fatal, so I need to
// handle them here.
std::atomic<int> errors(0);
std::string first_error;
#pragma omp parallel num_threads(threadcount)
std::unique_ptr<char[]> data(new char[maxsize]);
#pragma omp for
for (std::int64_t ii=0; ii<requestcount; ++ii) {
if (errors.load() != 0)
const ReadRequest& r = requests[ii];
try {
this->LocalFileLinux::xx_read(data.get(), r.offset, r.size, usagehint);, r.size);
catch (const std::exception& ex) {
if (errors.fetch_add(1) == 0) {
auto what = ex.what();
first_error = std::string(what && what[0] ? what : "EXCEPTION");
} // end omp for
// end parallel
if (errors.load() != 0) {
// The distinction between UserError, EndOfFile, and InternalError
// (and maybe even others) is lost. If it matters I can handle code
// for this as well.
throw OpenZGY::Errors::ZgyInternalError(first_error);
......@@ -144,7 +144,7 @@ FileWithPerformanceLogger::add(const Timer& timer, std::int64_t blocksize)
const double fbin = (value - _histmin) / binwidth;
const int ibin = (fbin < 0 ? 0 :
fbin > (nbuckets-1) ? nbuckets-1 :
_histbins[ibin] += 1;
_nsamples += 1;
_statsum += value;
......@@ -230,7 +230,7 @@ FileWithPerformanceLogger::dumpThroughput(bool clear)
std::lock_guard<std::mutex> lk(_mutex);
std::stringstream ss;
if (_sumtimerbeg < _sumtimerend && _sumbytes > 0) {
const double bytecount = _sumbytes;
const double bytecount = static_cast<double>(_sumbytes);
const double elapsed = _sumtimerend - _sumtimerbeg;
// Note, if you want to emulate an interval timer running
// regardless of whether there was any traffic then round
......@@ -1192,6 +1192,25 @@ SeismicStoreFile::xx_readv(const ReadList& requests, bool parallel_ok, bool immu
this->_config->_debug_trace("readv", /*need=*/asked, /*want=*/realsize,/*parts*/ work.size(), this->_dataset->info()->allSizes(-1));
// Do the actual reading, sequentially, one consolidated chunk at a time.
// TODO-Performance, can this easily be parallelized?
// * I probably need a config option for max threads to avoid
// overloading the data server.
// * Worry: Can there be multiple requests targeting the same area
// of the output buffer? Probably not although there can be multiple
// read requests for the same area of the file.
// If parallel_ok, can I then deliver data as it is received without
// waiting for the last bit? That allows reading and e.g. decompressing
// in parallel. Not a big deal if application has multiple reads in
// flight. Otherwise this might in theory double the speed.
// * Tricky to implement. Receiver doesn't allow partial delivery.
// So if one request requires concatenating data from multiple
// cloud reads then this needs to wait until the end. Or of really
// fancy, keep track of when all the data has need read for each
// of the original requests.
for (const auto& it : work) {
......@@ -1200,6 +1219,25 @@ SeismicStoreFile::xx_readv(const ReadList& requests, bool parallel_ok, bool immu
// TODO-Performance, if parallel_ok, can I parallelize only this
// loop if it gets too difficult to do it inside the above loop?
// This can help speed up ZgyInternalBulk::readToExistingBuffer().
// * At this level. each request might be a large consolidated
// read. This means that in addition to the code here, class
// ConsolidateRequests::_consolidated_delivery might also need to
// change. This is the functor responsible for breaking the jumbo
// requests back into the original requests. It might also need
// to do parallelizing. Which means issues with nested OpenMP
// loops. One unpalatable alternative is to break encapsulation
// and do some of _consolidated_delivery's processing here. Ouch.
// * If I decide I won't even try to do the concurrent read and
// process described earlier then there is a (I think) much
// simpler alternative. Make a "parallelizer" adaptor that first
// does all the reads and then returns them in parallel. I may
// need to allocate yet another buffer for this, though.
std::int64_t pos = 0;
for (const ReadRequest& rr : new_requests) {
std::int64_t this_size = std::max((std::int64_t)0, std::min(rr.size, current_eof - rr.offset));
......@@ -644,6 +644,18 @@ InfoHeaderAccess::bytespersample() const
* The smallest value in storage (e.g. -128) should map to codingrange[0]
* and the largest value (e.g. +127) should map to codingrange[1].
* If file_dtype is a floating point type there is never any conversion.
* Note that the codingrange stored in file is two float32 numbers i.e.
* 24 bits mantissa. This is ok for our purpose because it is only used
* for converting int8 and int16 data so the user doesn't expect anything
* more than 16 bits of precision. But to avoid numerical issues with
* the transform the codingrange should immediately be upcast to float64
* befor being used. Case in point: (double)(hi-lo) is less accurate than
* ((double)hi - (double)lo) when hi,lo are float32. Admittedly only by
* 1 ULPS but the error might be amplified later. Enough to raise errors
* when testing even if the signal still has 16 bits of precision intact.
* See also ZgyInternalBulk._scaleToFloat in OpenZGY/Python.
InfoHeaderAccess::storagetofloat() const
......@@ -652,8 +664,22 @@ InfoHeaderAccess::storagetofloat() const
const RawDataTypeDetails info(this->datatype());
if (info.is_integer) {
const std::array<float,2> range = this->safe_codingrange();
slope = (range[1] - range[0]) / (info.highest - info.lowest);
slope = ((double)range[1]-(double)range[0]) / (info.highest - info.lowest);
intercept = range[0] - slope * info.lowest;
#if 0 // Debug issues with numerical inaccuracy.
static double debug_old_slope = 0, debug_old_intercept = 0;
if (debug_old_slope != slope || debug_old_intercept != intercept) {
std::cerr << "@@ storagetofloat " << std::setprecision(14)
<< " slope " << slope
<< " intercept " << intercept
<< " range " << range[0] << " " << range[1]
<< " test " << slope*-32768+intercept
<< " " << slope*32767+intercept
<< std::endl;
debug_old_slope = slope;
debug_old_intercept = intercept;
return std::array<double,2>{slope,intercept};
......@@ -97,7 +97,7 @@ public:
static void getValue_s(char *buf, int len, const char *name, int count, double total, double adjusted, bool running, bool details, bool msonly);
const char* getValue(bool details = false, bool msonly = false);
// Shouldn't be public but might come in handy if extending Timer.
double getLastStop() const { return end_; }
double getLastStop() const { return static_cast<double>(end_); }
void setVerbose(int v) { verbose_ = v; }
......@@ -44,6 +44,7 @@ void test_databuffer_construct()
//** Allocate a new buffer and take ownership of it.
DataBufferNd<float,3> buffer(std::array<std::int64_t,3>{64, 50, 100});
TEST_CHECK( != nullptr);
TEST_CHECK( == buffer.voidData().get());
......@@ -111,6 +112,7 @@ void test_databuffer_scale()
//** Regular buffer
auto buffer = std::make_shared<intbuffer_t>(size, stride);
buffer->data()[0] = 0;
buffer->data()[1] = 1;
buffer->data()[2] = 7;
......@@ -178,6 +180,7 @@ void test_databuffer_copyfrom()
// A user has requested reading data from survey's origin and with
// size (100, 90, 110). The buffer is allocated by us.
DataBufferNd<float,3> target(std::array<std::int64_t,3>{100,90,110});
std::array<std::int64_t,3> targetorigin{0, 0, 0};
// 64^3 samples worth of -999.25 has been read from (0,0,64) in the survey
......@@ -189,7 +192,7 @@ void test_databuffer_copyfrom()
target.copyFrom(&brick, orig1,, nullptr, nullptr);