Draft Data request
This data request still is in a draft stadium. To improve it, please open an issue (and a matching pull request). It is based on the Dyamond phase 3 request.
- We request data on a hierarchy of HEALPix grids instead of 0.25 degree resolution.
- We added snowfall_flux, liquid_water_content_of_surface_snow, snow_area_fraction_viewable_from_above, soil_liquid_water_content to the request.
- We specify variable names.
- We specify that hourly 2D data should be time means.
Data grid and vertical levels
The data from global models should be provided on the HEALPix grid on zoom level 9 or higher (effective cell size of 13 km). To ease analysis, please provide all HEALPix levels up to this level.
For regional models, we will need further discussion with teams that have a strong experience in intercomparisons of regional models.
3D Output Levels:
The output should be interpolated to the following 25 pressure levels:
import numpy as np
= np.arange(100,900,100)
tr = np.arange(850,1025,25)
lt = np.arange(10,90,20)
ua = sorted({1,5,20,150,250,750}.union(tr,lt,ua)) levels
Data volume
HEALPix grids have 12*4**level
cells, so a level 9 HEALPix grid consists of roughly 3 million cells. It has proven very beneficial for the analysis to also store all lower grid resolutions. This adds approximately 30% to the output volume, and allows prototyping analyses at lower resolution, or generating maps from an amount of data that actually matches the pixels in a plot. For level 9 and all lower levels together, about 4 million floats are needed per 2D slice. Furthermore, we want the 2D fields on 6-hourly interval as well, and all fields also daily. The totals for storing this data (assuming 4 bytes/float, and 50% compression) are
3D: 4.2TB
2D: 2.8TB
total: 7.0TB
See below for the code.
Without the hierarchy, the requirements are:
type | cells / snapshot | MB / snap-shot1 | snapshots | GB / var | vars | GB total |
---|---|---|---|---|---|---|
2D | 3 M | 6 | 365*24 | 52 | 33 | 1650 |
3D | 75 M | 150 | 365*4 | 220 | 12 | 2600 |
Note that for any additional healpix level, the requirements grow by a factor of 4, so a ~6km resolution dataset (HEALPix level 10) already consumes about 20 TB.
File formats
In principle any file format that is compatible with standard software could be used. However, zarr has proven very advantageous, as it allows to
- build large datasets covering anything up to an entire simulation output
- chunk data in all dimensions
If plain zarr 2 is used, data can be read in many programming languages. For C-based software, a recent libnetcdf will do the trick. The downside of this approach is a lot of small files, which can be problematic on HPC systems, especially with inode quota.
Other possible approaches include the use of kerchunk in python for grouping data chunks in (netCDF/HDF5) files into unified datasets that look like zarr to python. Other programming languages / codes can then still make use of the underlying netCDF files.
Variables
For some models, the hydrometeor categories may not map directly onto the specified output. In these cases hydrometeor habits can be left out (for instance if snow and cloud ice are not distinguished), or additional information can be added, e.g., for models with hail. In such cases, please try to follow the CF conventions, and open an issue, so we can amend the table and keep the naming consistent among all teams.
3D Output Variables, write instantaneous values at 6hr interval
CF standard name | short name | units |
---|---|---|
geopotential height | zg | m |
eastward_wind | ua | m s-1 |
northtward_wind | va | m s-1 |
upward_air_velocity | wa | m s-1 |
temperature | ta | K |
relative_humidity | hur | - |
specific_humidity | hus | kg kg-1 |
mass_fraction_of_cloud_liquid_water_in_air | clw | kg kg-1 |
mass_fraction_of_cloud_ice_in_air | cli | kg kg-1 |
mass_fraction_of_rain_in_air | qr | kg kg-1 |
mass_fraction_of_snow_water_in_air | qs | kg kg-1 |
mass_fraction_of_graupel_in_air | qg | kg kg-1 |
2D Output Variables, write at 1hr interval as mean values
CF standard name | short name | units | comment |
---|---|---|---|
atmosphere_mass_content_of_cloud_condensed_water | clwvi | kg m-2 | |
atmosphere_mass_content_of_cloud_ice | clivi | kg m-2 | |
surface_downward_latent_heat_flux | hflsd | W m-2 | direction included in short name |
surface_downward_sensible_heat_flux | hfssd | W m-2 | direction included in short name |
toa_outgoing_longwave_flux | rlut | W m-2 | |
toa_outgoing_longwave_flux_clear_sky | rlutcs | W m-2 | |
toa_incoming_longwave_flux | rldt | W m-2 | |
surface_upwelling_longwave_flux_in_air | rlus | W m-2 | |
surface_upwelling_longwave_flux_in_air_clear_sky | rluscs | W m-2 | |
surface_downwelling_longwave_flux_in_air | rldt | W m-2 | |
surface_downwelling_longwave_flux_in_air_clear_sky | rldscs | W m-2 | |
toa_outgoing_shortwave_flux | rsut | W m-2 | |
toa_outgoing_shortwave_flux_clear_sky | rsutcs | W m-2 | |
toa_incoming_shortwave_flux | rsdt | W m-2 | |
surface_upwelling_shortwave_flux_in_air | rsus | W m-2 | |
surface_upwelling_shortwave_flux_in_air_clear_sky | rsuscs | W m-2 | |
surface_downwelling_shortwave_flux_in_air | rsds | W m-2 | |
surface_downwelling_shortwave_flux_in_air_clear_sky | rsdscs | W m-2 | |
precipitation_flux | pr | kg m-2 s-1 | |
solid_precipitation_flux | prs | kg m-2 s-1 | includes all forms of solid precipitation |
atmosphere_mass_content_of_water_vapor | prw | kg m-2 s-1 | |
surface_air_pressure | ps | Pa | |
air_pressure_at_mean_sea_level | psl | Pa | |
specific_humidity | huss | kg kg-1 | 2m above ground |
air_temperature | tas | K | 2m above ground |
eastward_wind | uas | m s-1 | 10m above ground |
northward_wind | vas | m s-1 | 10m above ground |
surface_temperature | ts | K | |
surface_downward_eastward_stress | tauu | N m-2 | |
surface_downward_northward_stress | tauv | N m-2 | |
cloud_area_fraction | clt | 1 | |
liquid_water_content_of_surface_snow | swe | kg m-2 | short name invented |
snow_area_fraction_viewable_from_above | sncvfa | 1 | short name based on snc for surface_snow_area_fraction |
soil_liquid_water_content | mrso | kg m-2 | short name invented |
sea_ice_area_fraction | siconc | 1 |
2D time-constant Variables
CF standard name | short name | units | comment |
---|---|---|---|
land_area_fraction | sftlf | 1 | |
land_ice_area_fraction | sftgif | 1 | |
surface_altitude | orog | m |
Code for computing the data volume
= 12
vars_3d = 35
vars_2d = 6/24.
interval_3d = 1/24.
interval_2d = 1.
interval_daily = 25
levels_3d
= dict (
params = 9,
max_healpix_level = 365,
duration = 4,
float_precision = .5,
float_compression
)def compute_volume(var_count, levels, interval, max_healpix_level, duration, float_precision, float_compression):
= sum (12 * 4** level for level in range (max_healpix_level + 1))
cells return cells * var_count * levels * duration / interval * float_precision * float_compression
= ( compute_volume(var_count=vars_3d, levels=levels_3d, interval=interval_3d, **params) +
volume_3d =vars_3d, levels=levels_3d, interval=interval_daily, **params))
compute_volume(var_count= (compute_volume(var_count=vars_2d, levels=1, interval = interval_2d, **params) +
volume_2d =vars_2d, levels=1, interval = interval_3d, **params) +
compute_volume(var_count=vars_2d, levels=1, interval = interval_daily, **params))
compute_volume(var_countprint (f'3D: {volume_3d/1024**4:.1f}TB\n2D: {volume_2d/1024**4:.1f}TB\ntotal: {(volume_3d+volume_2d)/1024**4:.1f}TB')
Footnotes
Assuming 4-byte floats and 50% compression↩︎