Without subset access and hierarchies, analysis scripts are forced to load far more data than they need.
Chunking along all dimensions is a compromise that works reasonably well for every access pattern.
Because chunks are stored separately, this scales to datasets of any size. In nextGEMS, we are working with a 500 TB dataset.
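
To make the compromise concrete, the following minimal sketch counts how many chunks a read has to touch when an array is chunked along all of its dimensions. The array shape, the chunk shape, and the helper `chunks_touched` are hypothetical and chosen only for illustration; they are not taken from nextGEMS.

```python
def chunks_touched(start, stop, chunk_shape):
    """Count the chunks that intersect the half-open index box [start, stop)."""
    count = 1
    for lo, hi, size in zip(start, stop, chunk_shape):
        # Index of the last chunk touched minus index of the first, plus one.
        count *= (hi - 1) // size - lo // size + 1
    return count


# Hypothetical array with axes (time, y, x); an assumption for this sketch.
shape = (10_000, 100, 100)
chunk_shape = (100, 10, 10)  # chunked along all three dimensions

# Time series: every time step at a single (y, x) grid point.
series = chunks_touched((0, 42, 17), (shape[0], 43, 18), chunk_shape)

# Spatial slice: the full (y, x) plane at a single time step.
plane = chunks_touched((500, 0, 0), (501, shape[1], shape[2]), chunk_shape)

# Total number of chunks in the array, for comparison.
total = chunks_touched((0, 0, 0), shape, chunk_shape)

print(series, plane, total)
```

With this chunk shape, both access patterns touch the same small fraction of the chunks, so neither one is forced to read the whole array.
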

Storage layout, chunk shapes | Read time series (s) | Read spatial slice (s) | Performance bias (slowest / fastest) |
---|---|---|---|
Contiguous favoring time range | 0.013 | 180 | 14000 |
Contiguous favoring spatial slice | 200 | 0.012 | 17000 |
Default (all axes equal) chunks, 4673 x 12 x 16 | 1.4 | 34 | 24 |
36 KB chunks, 92 x 9 x 11 | 2.4 | 1.7 | 1.4 |
8 KB chunks, 46 x 6 x 8 | 1.4 | 1.1 | 1.2 |
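
The chunk sizes named in the table can be checked against the chunk shapes with a short calculation. This is a sketch that assumes 4-byte values (for example 32-bit floats); the table does not state the data type, so `BYTES_PER_VALUE` is an assumption.

```python
from math import prod

BYTES_PER_VALUE = 4  # assumption: 32-bit values; not stated in the table

# Chunk shapes copied from the table rows.
chunk_shapes = {
    "default": (4673, 12, 16),
    "36 KB": (92, 9, 11),
    "8 KB": (46, 6, 8),
}

# Bytes per chunk = number of values in the chunk * bytes per value.
chunk_bytes = {
    name: prod(shape) * BYTES_PER_VALUE for name, shape in chunk_shapes.items()
}

for name, size in chunk_bytes.items():
    print(f"{name}: {chunk_shapes[name]} -> {size} bytes")
```

Under that assumption the two smaller shapes come out at 36432 and 8832 bytes, roughly the 36 KB and 8 KB in the row labels, while the default chunk is about 3.6 MB and much longer along its first axis than along the other two.
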