Allow limiting the number of OpenMP/ThreadPool threads at runtime (via OMP_NUM_THREADS)
Hello,
Setting OMP_NUM_THREADS=<int> does not seem to have any effect, because of the constants and limiting logic in src/OpenVDS/VDS/WaveletOpenMP.h:
#define WAVELET_OPENMP_SSE_THREAD_COUNT 4
#define WAVELET_OPENMP_MEMORY_THREAD_COUNT 2
namespace Wavelet {
inline int Wavelet_GetEffectiveOpenMPThreadCount(int wantedThreadCount)
{
return std::max(1, std::min(omp_get_num_procs() - 2, wantedThreadCount));
}
}
and how that is used here:
src/OpenVDS/VDS/WaveletAdaptiveLLDecompress.cpp:537: const int threadCount = Wavelet_GetEffectiveOpenMPThreadCount(WAVELET_OPENMP_SSE_THREAD_COUNT);
src/OpenVDS/VDS/WaveletAdaptiveLLDecompress.cpp:666: const int threadCount = Wavelet_GetEffectiveOpenMPThreadCount(WAVELET_OPENMP_SSE_THREAD_COUNT);
src/OpenVDS/VDS/WaveletDecompress.cpp:636: WaveletTransform_InverseTransform_SSE(Wavelet_GetEffectiveOpenMPThreadCount(WAVELET_OPENMP_SSE_THREAD_COUNT), tempBuffer.data(), tempBufferSize, source, m_transformIterations, m_bandSize, m_transformMask, m_allocatedSizeX, m_allocatedSizeXY, m_integerInfo);
The thread count is therefore always derived from omp_get_num_procs() and the hard-coded constant, which is not ideal when the underlying machine is shared between workloads: threads are spawned based on either WAVELET_OPENMP_SSE_THREAD_COUNT (4) or num_procs - 2, whereas we would like to cap the maximum with OMP_NUM_THREADS.
Is this something that can be improved? Or is there something in the design of the WaveletAdaptive implementation that requires at least 4 compute threads?
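To make the request concrete, here is a minimal sketch of the kind of cap I have in mind. It is not the OpenVDS implementation: OmpNumThreadsFromEnv and EffectiveThreadCountWithEnvCap are hypothetical names, and the processor count is passed in as a parameter (standing in for omp_get_num_procs()) so the snippet compiles without OpenMP. It assumes that reading only the first integer of OMP_NUM_THREADS is acceptable. In an OpenMP build, omp_get_max_threads() could probably be used instead, since it reflects OMP_NUM_THREADS.

```cpp
#include <algorithm>
#include <cstdlib>

// Hypothetical helper (not part of OpenVDS): returns the leading positive
// integer of OMP_NUM_THREADS, or 0 if the variable is unset or invalid.
inline int OmpNumThreadsFromEnv()
{
  const char *value = std::getenv("OMP_NUM_THREADS");
  if (!value) return 0;
  char *end = nullptr;
  const long parsed = std::strtol(value, &end, 10);
  if (end == value || parsed < 1) return 0;
  // Clamp to an arbitrary sane upper bound before narrowing to int.
  return static_cast<int>(std::min<long>(parsed, 1L << 20));
}

// Hypothetical variant of Wavelet_GetEffectiveOpenMPThreadCount: same
// clamping as today, plus an upper bound taken from OMP_NUM_THREADS.
// numProcs stands in for omp_get_num_procs().
inline int EffectiveThreadCountWithEnvCap(int wantedThreadCount, int numProcs)
{
  int count = std::max(1, std::min(numProcs - 2, wantedThreadCount));
  const int envLimit = OmpNumThreadsFromEnv();
  if (envLimit > 0) count = std::min(count, envLimit);
  return count;
}
```

With this shape, an unset or invalid OMP_NUM_THREADS keeps the current behaviour, and a set value can only lower the thread count, never raise it above the existing constants.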
Context: we recently observed more than 200 VDSCopy threads on a shared machine with 100+ cores, which caused problems for other workloads and unpredictable bursts of load whenever VDSCopy is invoked.
EDIT: it seems the rest of the threads might be coming from the thread pool. Can we have a runtime limit for this as well?
src/OpenVDS/IO/IOManagerInMemory.cpp:11: , m_threadPool(std::thread::hardware_concurrency())
src/OpenVDS/VDS/VolumeDataRequestProcessor.cpp:1721: , m_threadPool(std::thread::hardware_concurrency())
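For the thread pool, a similar cap could be applied where the pool size is chosen. The sketch below is only an illustration under assumptions: CappedThreadCount is a hypothetical helper, and the environment variable name passed to it is made up for the example (OpenVDS does not define it as far as I know).

```cpp
#include <algorithm>
#include <cstdlib>

// Hypothetical helper (not part of OpenVDS): caps a hardware thread count
// with a positive integer read from the environment variable envName.
// Falls back to the hardware count when the variable is unset or invalid.
inline unsigned CappedThreadCount(unsigned hardwareThreads, const char *envName)
{
  // std::thread::hardware_concurrency() may return 0, so use at least 1.
  const unsigned hw = std::max(1u, hardwareThreads);
  const char *value = std::getenv(envName);
  if (!value) return hw;
  char *end = nullptr;
  const long parsed = std::strtol(value, &end, 10);
  if (end == value || parsed < 1) return hw;
  // The limit can only lower the pool size, never raise it.
  return static_cast<unsigned>(std::min<long>(parsed, static_cast<long>(hw)));
}
```

The constructors above could then use something like m_threadPool(CappedThreadCount(std::thread::hardware_concurrency(), "EXAMPLE_MAX_THREADS")), where the variable name is again just a placeholder.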
I can submit a PR if you like.
Thanks, Filip