2024-10-20
the time it takes until the analysis plot is ready
Useful output is
written once and
read at least once.
optimize output for analysis
(not write throughput)
(for this talk)
figure from xarray documentation
$ ls *.nc
ngc2009_atm_mon_20200329T000000Z.nc
ngc2009_oce_2d_1h_inst_20200329T000000Z.nc
ngc2009_atm_pl_6h_inst_20200329T000000Z.nc
ngc2009_lnd_tl_6h_inst_20200329T000000Z.nc
ngc2009_lnd_2d_30min_inst_20200329T000000Z.nc
ngc2009_atm_2d_30min_inst_20200329T000000Z.nc
ngc2009_oce_0-200m_3h_inst_1_20210329T000000Z.nc
ngc2009_oce_0-200m_3h_inst_2_20210329T000000Z.nc
ngc2009_oce_moc_1d_mean_20210329T000000Z.nc
ngc2009_oce_2d_1d_mean_20210329T000000Z.nc
ngc2009_oce_ml_1d_mean_20210329T000000Z.nc
ngc2009_oce_2d_1h_mean_20210329T000000Z.nc
...
$ ls *.nc | wc -l
12695
Grid | Cells |
---|---|
1° by 1° | 0.06M |
10 km | 5.1M |
5 km | 20M |
1 km | 510M |
200 m | 12750M |
Screen | Pixels |
---|---|
VGA | 0.3M |
Full HD | 2.1M |
MacBook 13" | 4.1M |
4K | 8.8M |
8K | 35.4M |
It’s impossible to look at the entire globe in full resolution.
Analysis scripts are forced to load way too much data.
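The cell counts in the grid table can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below assumes an Earth surface area of about 510 million km² and square cells of the stated edge length. The function names are made up for illustration, and the 8K pixel count assumes a 8192 × 4320 panel, which matches the 35.4M entry in the screen table.

```python
EARTH_SURFACE_KM2 = 510e6  # approximate surface area of the Earth in km^2


def cells_for_resolution(dx_km: float) -> float:
    """Approximate number of square cells of edge length dx_km covering the globe."""
    return EARTH_SURFACE_KM2 / dx_km**2


def cells_for_latlon(deg: float) -> int:
    """Number of cells on a regular longitude/latitude grid with spacing deg."""
    return round(360 / deg) * round(180 / deg)


grid_cells = {
    "1° by 1°": cells_for_latlon(1.0),
    "10 km": cells_for_resolution(10),
    "5 km": cells_for_resolution(5),
    "1 km": cells_for_resolution(1),
    "200 m": cells_for_resolution(0.2),
}

# Assumption: the 8K row of the screen table refers to 8192 x 4320 pixels.
PIXELS_8K = 8192 * 4320

for name, n in grid_cells.items():
    # Ratio > 1 means several model cells would have to share one screen pixel.
    print(f"{name:>9}: {n / 1e6:10.2f}M cells, {n / PIXELS_8K:8.2f} cells per 8K pixel")
```

Under these assumptions, a global 1 km field already has more than ten cells per pixel of an 8K display.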
Plots by Marius Winkler & Hans Segura
scale analysis with screen size
(instead of with model size)
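One way to make this concrete, assuming the output is stored at several resolutions of a HEALPix grid (the grid used in the dropsonde example in this talk): a HEALPix grid with `nside = 2**zoom` has `12 * nside**2` cells, so a plot can request the coarsest level that still provides about one cell per screen pixel. The helper names and the `max_zoom` cap below are hypothetical, not part of any model output specification.

```python
def healpix_cells(zoom: int) -> int:
    """Number of cells on a HEALPix grid with nside = 2**zoom."""
    nside = 2**zoom
    return 12 * nside**2


def zoom_for_pixels(n_pixels: int, max_zoom: int) -> int:
    """Smallest zoom level with at least n_pixels cells, capped at max_zoom."""
    for zoom in range(max_zoom + 1):
        if healpix_cells(zoom) >= n_pixels:
            return zoom
    # The screen has more pixels than the finest available level has cells.
    return max_zoom


# Hypothetical cap: finest level available in the stored output.
MAX_ZOOM = 10

screens = {"VGA": 640 * 480, "Full HD": 1920 * 1080, "8K": 8192 * 4320}
for name, pixels in screens.items():
    zoom = zoom_for_pixels(pixels, MAX_ZOOM)
    print(f"{name:>8}: zoom {zoom} ({healpix_cells(zoom) / 1e6:.2f}M cells)")
```

With this choice the amount of data loaded for a global plot depends on the display, not on the finest resolution the model was run at.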
Not necessary for the aforementioned.
… but aligns very well.
Select ICON model output at all
dropsonde locations during the EUREC4A field campaign:
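import healpix

# icon (model output) and joanne (dropsonde data) are xarray datasets opened beforehand.
# Map each sonde position (degrees) to its HEALPix cell index in nested ordering.
sonde_pix = healpix.ang2pix(
    icon.crs.healpix_nside, joanne.flight_lon, joanne.flight_lat,
    lonlat=True, nest=True
)
# Take the model time step nearest to each launch, then the cell at each sonde.
icon_sondes = (
    icon[["ua", "va", "ta", "hus"]]
    .sel(time=joanne.launch_time, method="nearest")
    .isel(cell=sonde_pix)
    .compute()
)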
(55 sec, 1 GB, single thread, full code at easy.gems)
(100 ms, 250 MB, single thread)
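A point selection like the one above only needs the storage chunks that contain the requested cells. The sketch below illustrates this with made-up numbers: the chunk size, the grid size, and the clustered pixel indices are all hypothetical, and `chunks_touched` is an illustrative helper, not part of xarray or healpix. In nested ordering, cells that are close on the sphere tend to have nearby indices, so a regional set of points may map to very few chunks.

```python
def chunks_touched(cell_indices, chunk_size):
    """Return the sorted chunk numbers that hold the requested cell indices."""
    return sorted({int(i) // chunk_size for i in cell_indices})


# Hypothetical setup: HEALPix grid with nside = 1024, chunked along the cell axis.
N_CELLS = 12 * 1024**2
CHUNK_SIZE = 4**8  # hypothetical chunk of 65,536 cells

# Hypothetical pixel indices of 1000 measurement locations in one region.
sonde_pix = [5_000_000 + 37 * k for k in range(1000)]

touched = chunks_touched(sonde_pix, CHUNK_SIZE)
total_chunks = N_CELLS // CHUNK_SIZE
print(f"{len(touched)} of {total_chunks} chunks need to be read")
```

How many chunks a real selection touches depends on the actual chunking of the dataset and on how the locations are spread over the globe.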
Output tested on multiple \(\mathcal{O}(\textrm{PB})\)-scale model runs, 100+ users: