feat: sdutil chunking upload
This MR adds support for explicit upfront chunking of input datasets. At this point, only Azure Blob Storage uploads are supported.
- If the `chunk-size` parameter is specified, the file is uploaded as multiple chunks of `size=chunk_size`, where the size value is in MiB (see the sketch after this list): `python sdutil.py cp {{dataset}} sd://datapartition/subproject/dataset --chunk-size=30`
- If no `chunk-size` parameter is specified, the file is uploaded as chunks of size 32 MiB.
- If the `chunk-size` parameter is set to `0`, the file is uploaded as a single object: `python sdutil.py cp {{dataset}} sd://datapartition/subproject/dataset --chunk-size=0`
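For context, a minimal sketch of what block-based chunked upload to Azure Blob Storage can look like with the `azure-storage-blob` SDK. This is not sdutil's actual implementation (sdutil obtains its storage credentials through seismic-store); the connection string, container, blob, and file names below are placeholders.

```python
# Sketch only: block-based chunked upload with the azure-storage-blob SDK.
# Connection string, container, blob, and file names are placeholders.
import uuid

from azure.storage.blob import BlobBlock, BlobClient

MIB = 1024 * 1024


def upload_file(path: str, blob_client: BlobClient, chunk_size_mib: int = 32) -> None:
    """Upload `path` as a single blob (chunk_size_mib == 0) or as staged
    blocks of chunk_size_mib MiB that are committed at the end."""
    if chunk_size_mib == 0:
        # --chunk-size=0: upload the whole file as a single object.
        with open(path, "rb") as data:
            blob_client.upload_blob(data, overwrite=True)
        return

    block_list = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size_mib * MIB)
            if not chunk:
                break
            # Stage each chunk as a block; block ids must be unique per blob.
            block_id = str(uuid.uuid4())
            blob_client.stage_block(block_id=block_id, data=chunk)
            block_list.append(BlobBlock(block_id=block_id))
    # Commit the staged blocks so they become the blob's content.
    blob_client.commit_block_list(block_list)


# Placeholder values, analogous to the `--chunk-size=30` CLI example above.
client = BlobClient.from_connection_string(
    "<connection-string>", container_name="subproject", blob_name="dataset"
)
upload_file("dataset.segy", client, chunk_size_mib=30)
```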