Commit 6dc52a79 authored by Paal Kvamme's avatar Paal Kvamme
Browse files

More work on documenting multi-threading support.

parent c2a9e7b6
This diff is collapsed.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-us" lang="en-us">
<head>
<title>Multi-threading in OpenZGY/C++</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<style>
th, td {padding: 2px 10px 2px 10px;}
th.section {padding: 14px 10px 14px 10px;}
p.page, h2.page {page-break-before: always}
p.version {font-style: italic; font-size: smaller; text-align: right;}
</style>
</head>
<body>
<p class="version">Last updated: 2020-11-25</p>
<h1>Multi-threading in OpenZGY/C++ (rejected ideas)</h1>
<h2>Allowing concurrent writes from application code in OpenZGY/C++</h2>
<div style="border: 1px solid black; margin-left: 30px; padding: 5px; width: 400px; color: #c00;">
After analyzing the options in detail, my gut feeling is that
this not the way to go. The file is just here in case I change
my mind. Or to discourage others from trying this.
</div>
<p>
To make the write() method thread safe for application code it
will need to place a lock whenever it needs to access a shared
resource. The risk of forgetting a lock somewhere is high. I
believe it is safer (but still not risk free) is to by default a
per-file lock should be held whenever doing any bulk access.
The lock can then be temporarily dropped then re-acquired when
doing simple operations that clearly do not access shared data.
The risk now is to make sure that general lock is set in the
first place.
</p>
<p>
The general rule will be to set the lock when transitioning from
the api layer to the implementation layer. The lock is optional
when accessing metadata that is immutable once the file has been
opened. A more detailed design is shown in the following figure.
</p>
<img src="multithreading-fig1.png" width="1024" height="768"
alt="Figure 1: call graph showing locks"/>
<p>
To avoid deadlocks, all locks will be assigned a rank. Code that
is holding a lock with rank N is forbidden to call code that
directly or indirectly might cause a lock with the same or
higher rank to be acquired. Or to put it simpler: If you want
multiple locks then you need to acquire them in a specific
order.
</p>
<p>
Note that I won't try to check this at runtime. That is rather
difficult. Also not worth he trouble when there are just a few
locks.
</p>
<p>
The API layer will not have any locks.
</p>
<p>
The implementation layer will have one lock per open file in
ZgyInternalBulk::_bulk_lock (rank 10) to protect reading and
writing data bricks and for reading and changing the lookup
tables.
</p>
<p>
The implementation layer can also use
ZgyInternalBulk::_misc_lock (rank 0) as a very short lived
protection of some data structures. Rank 0 implies that no other
locks may be acquired while holding _misc_lock, while it is safe
to hold any other lock already.
</p>
<p>
The meta part of the implementation layer doesn't in general
need any locking because most the information is immutable after
the file has been opened. The exceptions are handled by finding
out which API functions might trigger changing the value, and
then declaring that as thread safe. Lookup tables (sort of part
of the meta data) are protected by _bulk_lock.
</p>
<p>
This means it is safe to access metadata both with and without
holding _bulk_lock.
</p>
<p>
Since _bulk_lock is owned by ZgyInternalBulk but protects data
in two of the meta headers (lookup tables), it is the
responsibility of ZgyInternalBulk members to make sure the lock
is held when accessing those. And, conversely, no other code
than members of ZgyInternalBulk are allowed to access them. TODO
verify.
</p>
<p>
Any code that implements caching is responsible for protecting
the cache. If a lock is needed then this will have a lower rank
than _bulk_lock. This means that _bulk_lock may or may not be
held when the cache is accessed. AND, it means that while the
cache lock is held, the code is forbidden to call any method
that could potentially cause _bulk_lock to be acquired. Since
the file layer sits below the bulk- and meta layer this ought to
be doable.
</p>
<p>
A file back-end may declare itself as either thread safe or not.
If not thread safe then _bulk_lock will probably be held when
doing reads or writes. But don't count on it. Related:
xx_threadsafe() applies to both read and write. Do I needs more
granularity?
</p>
<p>
Several methods in ZgyReader, ZgyWriter, ZgyUtil will be declared as
not thread safe. No other operations, even those methods normally
considered thread safe, may be pending when these are executed. The
code will not set a lock because that could easily deadlock. There
might be a check though. Set a flag saying that one of these
operations are in progress and by whom. The flag is a simple variable
because it is protected by the main lock. Anybody that sets or
re-acquires the main lock should then immediately check whether the
"owner" flag is set and is not this thread. Throw an error if the
check fails.
</p>
<ul>
<li>(constructor)</li>
<li>(destructor)</li>
<li>finalize()</li>
<li>close()</li>
<li>set_numthreads()</li>
<li>setAnnotationAndWorld()</li>
</ul>
<p>
The following methods might need a private lock.
</p>
<ul>
<li>{set_,}errorflag()</li>
</ul>
<p>
These methods will set the file-level lock: MAKE SURE they are not
accidentally called from below with the lock already held. Yes I have
heard about recursive locks. But those are a major code smell and
suggests your call graph looks like spaghetti.
</p>
<ul>
<li>ZgyInternalBulk::readConstantValue</li>
<li>ZgyInternalBulk::readToExistingBuffer</li>
<li>ZgyInternalBulk::writeRegion</li>
</ul>
<p>
For purposes of locking, the following will be treated as part
of the API layer. I.e. they will not lock, and it is not allowed
to call them with the lock already set.
</p>
<ul>
<li>ZgyInternalBulk::readToNewBuffer</li>
<li>GenLodC::call()</li>
</ul>
<p>
These methods are at a lower level. Unless otherwise noted the
lock should already be held when they are invoked.
</p>
<ul>
<li>ZgyInternalBulk::(most of the private methods)</li>
</ul>
<p>
One important exception: Inside
ZgyInternalBulk::readToExistingBuffer() the lock will be dropped
before _file->xx_readv() is invoked, if the backend is thread
safe. If not then the lock should be dropped inside the
deliverance lambda when the read returns. Either way the
ZgyInternalBulk::_deliverOneBrick() method expects to be called
without the lock.
</p>
<h3>Race condition in read/modify/write</h3>
<p>
If the application uses multi threaded writes of data that is
not aligned to brick boundaries then the library needs to
process these as read/modify/write. This is problematic with
respect to multi threading because the read part might need data
from a brick waiting to be flushed from another thread. There
are ways around this but all are somewhat unpalatable.
</p>
<ul>
<li>
<p>
Do locking at block level, so a read from a brick that is
pending write is blocked. This adds both overhead and
complexity.
</p>
</li>
<li>
<p>
Do locking of rectangular regions, done at the same place
where the regular lock is acquired. This locking is coarser
and thus has less overhead but more expensive to code.
</p>
</li>
<li>
<p>
Only allow muli threaded write when the output is a
compressed local file. Compressed files are not allowed to
(or at least really should not) write misaligned data. This
is what the old ZGY-Public library does. And in the old code
the stand alone compress application was the only one
allowed to do parallel writes. Incidentally, am I sure this
was respected by Petrel? I guess that question is getting
rather academic now.
</p>
</li>
<li>
<p>
Only allow writing misaligned data if the application
explicitly promises not to use multi threading. Throw an
error if the app tries to multi-thread without making that
promise. I don't like this solution because it makes the
application even more aware of the ZGY brick size, Which
ought to have been an implementation detail.
</p>
</li>
<li>
<p>
Whenever a misaligned request is seen, place a super-lock
that eventually causes all the current writes to complete.
Then perform the r/m/w in a single thread, then open up for
regular traffic. This is simpler than "lock rectangular
regions" but does more coarse grained locking. The main
caveat with this approach is that if the application makes
one misaligned write request it is reasonable to assume that
most subsequent requests will also be misaligned. Which
means the execution ends up completely serialized anyway.
</p>
</li>
</ul>
<h3>Blocks written in the wrong order</h3>
<p>
When a zgy file is on the cloud or meant to be moved to the
cloud it needs to have its bricks written in a specific
order for performance. This gets tricky to handle if doing
writes in parallel. A compromise may be needed. Application
must write full brick-columns (ensuring that columns stay
together but giving less opportunities for parallelization.
The file on the other hand allows individual columns to be
stored at random locations. Giving somewhat but not terribly
worse performance.
</p>
<p>
Another possibility is to have the application explicitly
state which regions will have non-constant data. If it kows.
All bricks can then be pre-allocated in the right order
before any real writes start. This does make the api more
difficult to use, though. This is generally considered a bad
thing.
</p>
</body>
</html>
This diff is collapsed.
......@@ -79,11 +79,17 @@ namespace OpenZGY { namespace Impl {
/**
* \brief convert between internal and external data types.
*
* \details Thread safety:
* Thread safe because the class only contains static methods with no state.
*
* \internal
* TODO-Test: Move to a separate file to allow it to be unit tested.
*/
class EnumMapper
{
EnumMapper() = delete;
EnumMapper(const EnumMapper&) = delete;
EnumMapper& operator=(const EnumMapper&) = delete;
public:
static SampleDataType mapRawDataTypeToSampleDataType(InternalZGY::RawDataType);
static UnitDimension mapRawHorizontalDimensionToUnitDimension(InternalZGY::RawHorizontalDimension);
......@@ -122,6 +128,23 @@ public:
* The FileADT and ZgyInternalBulk pointers can be declared here since
* both the reader and the writer needs them. Or removed from here and
* duplicated in ZgyReader and ZgyWriter.
*
* Thread safety: TODO-High: THE FOLLOWING IS NOT ENFORCED.
* The following applies both to this class and derived types.
* Modification may lead to a data race. The intent for this class is
* that any calls a ZgyReader instance might make will not modify data.
* Except possibly changes done while the file is being opened.
* FUTURE: The instance holds one or more pointers to the data it
* needs. The pointer targets expose methods, never data, and make are
* declared const-correct. Both const and mutable data pointers are
* stored. The mutable pointers will be left empty if the file is only
* open for read. Accessing the mutable pointers should be done using
* a wrapper that checks for null. Failure to use the wrapper will not
* compromise thread safety but will cause a null pointer exception
* instead of a proper message if there is a programmer error.
* For additional complication, note that there are links between
* ZgyMeta and ZgyInternalBulk which means the latter would also need
* this kind of double pointers.
*/
class ZgyMeta: virtual public IZgyMeta
{
......@@ -295,6 +318,11 @@ public:
/**
* \brief Add coordinate conversion to the concrete ZgyMeta class.
*
* \details Thread safety: See the base class. This specialization
* does not add any additional concerns because all the new members
* are const. So they are safe to call concurrently with other read
* operations.
*/
class ZgyMetaAndTools: public ZgyMeta, virtual public IZgyTools
{
......@@ -359,6 +387,7 @@ public:
/**
* \brief Concrete implementation of IZgyReader.
* \details Thread safety: See the base class.
*/
class ZgyReader : public ZgyMetaAndTools, virtual public IZgyReader
{
......@@ -516,6 +545,17 @@ public:
/**
* \brief Concrete implementation of IZgyWriter.
*
* \details Thread safety: By design this class is not thread safe.
* Also by design this is not enforced using locks. The calling
* application is assumed to know what is doing. FUTURE: Might
* consider adding locks so that applications that are inherently
* multi threaded won't need to do their own synchronization. But this
* is a slippery slope. Proper locking of write operations means we
* also need read locks on all read operations e.g. to handle
* read/modify/write. Will we need that as well? Probably. Telling the
* application that *some* locking is in place might be a rather bad
* idea.
*/
class ZgyWriter : public ZgyMetaAndTools, virtual public IZgyWriter
{
......@@ -897,6 +937,11 @@ public:
/**
* \brief Concrete implementation of IZgyUtils.
*
* \details Thread safety: Depends on the function being executed.
* Currently implemented methods just forward the requests to SDAPI
* or some other cloud back end. So thread safety depends on the
* cloud provider.
*
* \internal TODO-Low: One utility class per backend plug-in.
* This will be a pain to maintain though; as the
* Python wrapper will need to do the same.
......
......@@ -204,6 +204,8 @@ namespace Impl {
*
* Corresponds to RawDataType used in the ZGY file format.
* Data may be read and written using this type or 32-bit float.
*
* Thread safety: Enums do not have race conditions.
*/
enum class OPENZGY_API SampleDataType
{
......@@ -225,6 +227,8 @@ namespace Formatters {
* length. Vertical length is of course the same as depth. Arguably
* there should have been separate enums for horizontal and vertical
* dimension since the allowed values differ.
*
* Thread safety: Enums do not have race conditions.
*/
enum class OPENZGY_API UnitDimension
{
......@@ -245,6 +249,8 @@ namespace Formatters {
* The "classic" ZGY only uses the first two.
*
* Maps to LodAlgorithm in the implementation layer.
*
* Thread safety: Enums do not have race conditions.
*/
enum class OPENZGY_API DecimationType
{
......@@ -271,6 +277,11 @@ namespace Formatters {
/**
* \brief Statistics of all sample values on the file.
*
* \details Thread safety:
* Modification may lead to a data race. This should not be an issue,
* because instances are only meant to be modified when created or
* copied or assigned prior to being made available to others.
*/
class OPENZGY_API SampleStatistics
{
......@@ -331,6 +342,11 @@ public:
* The range should have the zero-centric property. Zero, if present in the
* range, should map to the center of a bin. Otherwise the histogram close to
* zero might look odd.
*
* Thread safety:
* Modification may lead to a data race. This should not be an issue,
* because instances are only meant to be modified when created or
* copied or assigned prior to being made available to others.
*/
class OPENZGY_API SampleHistogram
{
......@@ -388,6 +404,11 @@ public:
* .zinc(0)
* .corners(ZgyWriterArgs::corners_t{0,0,0,0,0,0,0,0});
* \endcode
*
* Thread safety:
* Modification may lead to a data race. This should not be an issue,
* because instances are only meant to be modified when created or
* copied or assigned prior to being made available to others.
*/
class OPENZGY_API ZgyWriterArgs
{
......@@ -597,6 +618,7 @@ ZgyWriterArgs& datarange(float lo, float hi) { _datarange[0] = lo; _datarange[1]
/**
* \brief Base class of IZgyReader and IZgyWriter.
* \details Thread safety: Interfaces do not have race conditions.
*/
class OPENZGY_API IZgyMeta
{
......@@ -644,6 +666,7 @@ public:
/**
* \brief Base class of IZgyReader and IZgyWriter.
* \details Thread safety: Interfaces do not have race conditions.
*/
class OPENZGY_API IZgyTools : virtual public IZgyMeta
{
......@@ -691,6 +714,8 @@ public:
* Note: Yes, I understand that the open() factory method should not have
* been lexically scoped inside the ostensibly pure %IZgyReader interface.
* But this reduces the number of classes a library user needs to relate to.
*
* Thread safety: Interfaces do not have race conditions.
*/
class OPENZGY_API IZgyReader : virtual public IZgyTools
{
......@@ -727,6 +752,8 @@ public:
* Note: Yes, I understand that the open() factory method should not have
* been lexically scoped inside the ostensibly pure %IZgyWriter interface.
* But this reduces the number of classes a library user needs to relate to.
*
* Thread safety: Interfaces do not have race conditions.
*/
class OPENZGY_API IZgyWriter : virtual public IZgyTools
{
......@@ -762,6 +789,8 @@ public:
* Any operations that don't fit into IZgyReader or IZgyWriter go here.
* Such as deleting a file. Or any other operation that does not need
* the file to be open first.
*
* Thread safety: Interfaces do not have race conditions.
*/
class OPENZGY_API IZgyUtils
{
......@@ -838,6 +867,9 @@ public:
*
* When passing a progress reporter to a function, make sure you do not
* pass the class itself. You need to create an instance of it.
*
* Thread safety:
* TODO-High: NEEDS TO BE MADE THREAD SAFE.
*/
class OPENZGY_API ProgressWithDots
{
......
......@@ -44,6 +44,7 @@ namespace OpenZGY { namespace Errors {
/**
* \brief Base class for all exceptions thrown by %OpenZGY.
* \details Thread safety: Exceptions defined in this module are safe.
*/
class OPENZGY_API ZgyError: public std::runtime_error
{
......@@ -62,6 +63,8 @@ protected:
* In some cases a corrupted file might lead to a ZgyInternalError
* or ZgyEndOfFile being thrown instead of this one. Because it isn't
* always easy to figure out the root cause.
*
* Thread safety: Exceptions defined in this module are safe.
*/
class OPENZGY_API ZgyFormatError: public ZgyError
{
......@@ -80,6 +83,8 @@ public:
* The safe approach is to assume that the error caused the file to
* become corrupted. It is recommended that the application closes and
* deletes the file.
*
* Thread safety: Exceptions defined in this module are safe.
*/
class OPENZGY_API ZgyCorruptedFile: public ZgyError
{
......@@ -94,6 +99,8 @@ public:
* Determining whether a problem is the fault of the calling application
* or the %OpenZGY library itself can be guesswork. Application code
* might choose to treat ZgyUserError and ZgyInternalError the same way.
*
* Thread safety: Exceptions defined in this module are safe.
*/
class OPENZGY_API ZgyUserError: public ZgyError
{
......@@ -111,6 +118,8 @@ public:
*
* A corrupt file might also be reported as ZgyInternalError instead of
* the more appropriate ZgyFormatError.
*
* Thread safety: Exceptions defined in this module are safe.
*/
class OPENZGY_API ZgyInternalError: public ZgyError
{
......@@ -124,6 +133,8 @@ public:
*
* This is always considered an error, and is often due to a corrupted
* ZGY file. So this error should probably be treated as a ZgyFormatError.
*
* Thread safety: Exceptions defined in this module are safe.
*/
class OPENZGY_API ZgyEndOfFile: public ZgyError
{
......@@ -139,6 +150,8 @@ public:
* written had already been flushed. And the cloud back-end does not
* allow writing it again. The calling code, still inside the OpenZGY
* library, should be able to catch and recover from this problem.
*
* Thread safety: Exceptions defined in this module are safe.
*/
class OPENZGY_API ZgySegmentIsClosed: public ZgyError
{
......@@ -154,6 +167,8 @@ public:
* false then the operation in progress will and by throwing this
* exception. Which means that this is not an error; it is a consequence
* of the abort.
*
* Thread safety: Exceptions defined in this module are safe.
*/
class OPENZGY_API ZgyAborted: public ZgyError
{
......@@ -168,6 +183,8 @@ public:
* Raised if some optional plug-in (e.g. some cloud back end or a
* compressor) was loaded or explicitly requested, so we know about
* it, but the plug-in is not operational for some reason.
*
* Thread safety: Exceptions defined in this module are safe.
*/
class OPENZGY_API ZgyMissingFeature: public ZgyError
{
......@@ -182,6 +199,8 @@ public:
* Some error was received from a linux syscall acting on a file.
* For cloud access the actual exception thrown by the back end
* might be reported instead of wrapping it into a ZgyIoError.
*
* Thread safety: Exceptions defined in this module are safe.
*/
class OPENZGY_API ZgyIoError: public ZgyError
{
......
......@@ -185,6 +185,13 @@ struct LutInfoEx : public LookupTable::LutInfo
* whether a destructor is being called due to leaving scope normally
* or due to unwinding an exception. So in the C++ version the code
* should explicitly call disarm() at the end of the critical section.
*
* Thread safety: TODO-High: Calls set_errorflag() which is not thread safe.
* Technically this might not be serious since the method is currently
* only being called while writing. It should probably be fixed anyway.
* The class itself is not thread safe but this should not be an issue,
* because instances are meant to be short lived. Typically used inside
* a method and not acceesible outside.
*/
class ZgyInternalBulk::ErrorsWillCorruptFile
{
......
......@@ -44,6 +44,16 @@ struct LutInfoEx;
/**
* Read or write bulk data. The meta data needs to have been read
* already. The user-callable API will forward its read requests here.
*
* Thread safety:
* \li most data members are only used for write.
* Those don't need to be thread safe.
* \li this->_file points to a FileADT which is already thread safe
* where needed.
* \li this->_metadata is problematic because it is too easy to
* accidentally invoke a method that changes data even if the
* file is open for read only. Probably need a solution similar
* to that in class ZgyMeta in api.cpp. TODO-High implement this.
*/
class OPENZGY_TEST_API ZgyInternalBulk
{
......@@ -92,6 +102,7 @@ public:
public: // actually internal
// TODO-High: Make errorflag thread safe using atomics.
bool errorflag() const { return _is_bad; }
void set_errorflag(bool flag) { _is_bad = flag; }
LoggerFn set_logger(const LoggerFn& logger) {
......
......@@ -32,6 +32,10 @@ namespace InternalZGY {
* \details This class is here just as an example and for documentation.
* Real compression classes may use "copydoc" instead of repeating
* what is explained here.
*
* Thread safety: The plug-ins and the compression algorithms they call
* are all supposed to be thread safe. Registering the factory currently
* is not.
*/
class NullCompressPlugin
{
......
......@@ -55,6 +55,9 @@ namespace {
* On the other hand, the Python bindings for ZFP are way simpler to
* use than the C++ api.
*
* Thread safety: The plug-ins and the compression algorithms they call
* are all supposed to be thread safe. Registering the factory currently
* is not.
*/
class ZfpCompressPlugin
{
......
......@@ -34,6 +34,11 @@ namespace InternalZGY {
*
* The compression and decompression algorithms are completely
* separate but we might as well handle both in the same class.
*
* Thread safety: Not thread safe by design.
* Registering plug-ins is expected to be complete before any
* files are opened. TODO-Low: In this class it shouldn't be much
* of a chore to protect with locks.
*/
/**
......
......@@ -43,6 +43,11 @@ namespace InternalZGY {
* (size[0] - 1, size[1] - 1, ?)
*
* Ref: PetrelOrientationHandling
*
* Thread safety:
* Modification may lead to a data race. This should not be an issue,
* because instances are meant to be short lived, used only inside some
* method with no possibility of accessing it outside.
*/
class OPENZGY_TEST_API OrderedCornerPoints
{
......
......@@ -73,9 +73,12 @@ namespace {
* I did not check this list very thoroughly.
*
* Declaring all timers in one spot is not really needed, at least