2024-10-20
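import os
import shutil
import urllib.request

import xarray as xr

# Download and unpack the archive only if it is not already present locally.
if not os.path.exists("some_data"):
    urllib.request.urlretrieve("https://example.org/some_data.zip", "some_data.zip")
    shutil.unpack_archive("some_data.zip", "some_data")

# Open all NetCDF files in the directory as a single dataset.
ds = xr.open_mfdataset("some_data/*.nc")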
vs
e.g. STAC datasets
Metadata in catalogs can be accessed faster
than when buried inside datasets.
This enables fast browsing, search, and quicklook tools.
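A minimal sketch of why catalog metadata is quick to search: the lookup only touches small metadata records and never opens a data file. The `CatalogEntry` class, the `search` helper, and all entry names, URLs and metadata keys are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalog entry: a data location plus lightweight metadata."""
    name: str
    url: str
    metadata: dict = field(default_factory=dict)


# Illustrative entries; names, URLs and metadata keys are made up.
catalog = [
    CatalogEntry("sst_2020", "https://example.org/sst_2020.nc",
                 {"variable": "sea_surface_temperature", "year": 2020}),
    CatalogEntry("sst_2021", "https://example.org/sst_2021.nc",
                 {"variable": "sea_surface_temperature", "year": 2021}),
    CatalogEntry("wind_2021", "https://example.org/wind_2021.nc",
                 {"variable": "wind_speed", "year": 2021}),
]


def search(entries, **criteria):
    """Return entries whose metadata matches all criteria; no data file is opened."""
    return [
        e for e in entries
        if all(e.metadata.get(k) == v for k, v in criteria.items())
    ]


hits = search(catalog, variable="sea_surface_temperature", year=2021)
print([e.name for e in hits])
```

Real catalogs (e.g. STAC) define richer metadata and query interfaces; the point here is only that the search runs on the catalog, not on the datasets.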
Once data is moved, just update the catalog, and users seamlessly access the data from its new location.
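The indirection could look roughly like this. It is a sketch under the assumption that users refer to datasets by a stable name; the `catalog` dictionary, the `locate` helper and the URLs are hypothetical.

```python
# Hypothetical catalog: maps a stable dataset name to its current location.
catalog = {"some_data": "https://example.org/old-storage/some_data.zarr"}


def locate(name):
    """Resolve a dataset name to wherever the catalog currently points."""
    return catalog[name]


def user_script():
    # User code refers to the dataset by name only, never by path or URL.
    return locate("some_data")


before = user_script()

# The data is moved; only the catalog entry is updated.
catalog["some_data"] = "https://example.org/new-storage/some_data.zarr"

after = user_script()  # same user code, new location
```

The user script is unchanged between the two calls; only the catalog entry differs.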
😬
Complex catalog entries can be used to concatenate, mix, slice, etc. a collection of poorly prepared datasets.
This may be better than nothing, but it usually comes at a performance cost.
A list / tree / collection of catalog entries.
May be static, dynamic, searchable, etc.
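One possible way to picture such a structure is a small tree whose nodes hold static entries, nested catalogs, or children generated on demand. The `Entry` and `Catalog` classes, the `generate` hook and all names and URLs below are hypothetical and not taken from any particular catalog library.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Entry:
    """Hypothetical leaf: a named pointer to one dataset."""
    name: str
    url: str


@dataclass
class Catalog:
    """Hypothetical node: holds entries and nested catalogs (a tree)."""
    name: str
    children: list = field(default_factory=list)
    # Optional callable producing children on demand (a "dynamic" catalog).
    generate: Optional[Callable[[], list]] = None

    def walk(self, prefix=""):
        """Yield (path, entry) pairs for every leaf below this node."""
        path = f"{prefix}/{self.name}"
        dynamic = self.generate() if self.generate else []
        for child in self.children + dynamic:
            if isinstance(child, Catalog):
                yield from child.walk(path)
            else:
                yield f"{path}/{child.name}", child


def yearly_entries():
    # Dynamic part: entries are computed when the catalog is walked.
    return [Entry(f"y{year}", f"https://example.org/obs/{year}.nc")
            for year in (2020, 2021)]


root = Catalog("root", children=[
    Entry("readme_data", "https://example.org/readme_data.nc"),
    Catalog("model", children=[Entry("run1", "https://example.org/model/run1.nc")]),
    Catalog("obs", generate=yearly_entries),
])

paths = [path for path, _ in root.walk()]
print(paths)
```

Walking the tree gives a flat list of entries, which is also a natural starting point for making the catalog searchable.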
There are many computing facilities.
We want to work together.