Draft Data request
This data request is still in a draft stage. To improve it, please open an issue (and a matching pull request). It is based on the DYAMOND phase 3 request, with the following changes:
- We request data on a hierarchy of HEALPix grids instead of 0.25 degree resolution.
- We added snowfall_flux, liquid_water_content_of_surface_snow, snow_area_fraction_viewable_from_above, soil_liquid_water_content to the request.
- We specify variable names.
- We request most 2D fields only as 3-hourly means.
- We specify that hourly 2D data should be instantaneous.
Data grid and vertical levels
The data from global models should be provided on the HEALPix grid at zoom level 10 (effective cell size of 6 km) or zoom level 9 (13 km). To ease analysis, please also provide all coarser HEALPix levels up to this level.
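The coarser levels can be derived from the finest one. The following is a minimal sketch, assuming the field is stored in HEALPix NESTED ordering (the request does not state the ordering) and contains no missing values; the function names are illustrative only.

```python
import numpy as np

def healpix_cell_count(zoom):
    """Number of cells of a global HEALPix grid at the given zoom level."""
    return 12 * 4**zoom

def coarsen_nested(field):
    """Average a NESTED-ordered HEALPix field down by one zoom level.

    In nested ordering, cells 4*i .. 4*i+3 share the parent cell i, and all
    cells have equal area, so a plain mean over groups of four conserves
    the area mean.
    """
    return np.asarray(field, dtype=float).reshape(-1, 4).mean(axis=-1)

def build_hierarchy(field, zoom):
    """Return {zoom_level: field} for the given level and all coarser ones."""
    field = np.asarray(field, dtype=float)
    if field.shape != (healpix_cell_count(zoom),):
        raise ValueError("field size does not match the zoom level")
    hierarchy = {zoom: field}
    for z in range(zoom - 1, -1, -1):
        hierarchy[z] = coarsen_nested(hierarchy[z + 1])
    return hierarchy

# Demonstration on a small grid (zoom level 3 has 768 cells).
rng = np.random.default_rng(0)
demo = rng.random(healpix_cell_count(3))
levels_demo = build_hierarchy(demo, zoom=3)
```

Fields with missing values (for instance land-only variables) would need a masked or NaN-aware mean instead.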
For regional models, we will need further discussion with teams that have strong experience in intercomparisons of regional models.
3D Output Levels:
The output should be interpolated to the following 25 pressure levels:
    import numpy as np

    tr = np.arange(100, 900, 100)   # 100, 200, ..., 800
    lt = np.arange(850, 1025, 25)   # 850, 875, ..., 1000
    ua = np.arange(10, 90, 20)      # 10, 30, 50, 70
    levels = sorted({1, 5, 20, 150, 250, 750}.union(tr, lt, ua))  # 25 levels in total
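Models that run on hybrid or height levels need a vertical interpolation step to deliver these pressure levels. The request does not prescribe a method; the sketch below assumes linear interpolation in log-pressure for a single column and sets target levels outside the column's pressure range (for example below the surface) to NaN. The function name and the sample column are hypothetical.

```python
import numpy as np

def interpolate_to_pressure(values, pressure, target_levels):
    """Interpolate one model column to target pressure levels.

    values, pressure: 1D arrays on model levels (same length; pressure in
    the same units as target_levels). Interpolation is linear in
    log(pressure); targets outside the column's range become NaN.
    """
    values = np.asarray(values, dtype=float)
    pressure = np.asarray(pressure, dtype=float)
    order = np.argsort(pressure)  # np.interp needs increasing x values
    log_p = np.log(pressure[order])
    log_target = np.log(np.asarray(target_levels, dtype=float))
    return np.interp(log_target, log_p, values[order], left=np.nan, right=np.nan)

# Hypothetical column: temperature (K) on five model levels (pressure in hPa).
p_model = np.array([50.0, 200.0, 500.0, 800.0, 950.0])
t_model = np.array([210.0, 220.0, 255.0, 275.0, 285.0])
t_on_levels = interpolate_to_pressure(t_model, p_model, [100, 500, 1000])
```

Here the 1000 hPa value is NaN because the sample column ends at 950 hPa.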
Data volume
HEALPix grids have 12 * 4**level cells, so a level 9 HEALPix grid consists of roughly 3 million cells. It has proven very beneficial for the analysis to also store all lower grid resolutions. This adds about one third to the output volume and allows prototyping analyses at lower resolution, or generating maps from an amount of data that actually matches the number of pixels in a plot. For level 9 and all lower levels together, about 4 million floats are needed per 2D slice. Furthermore, we want the 2D fields at 6-hourly intervals as well, and all fields also at daily intervals. The totals for storing this data (assuming 4 bytes/float and 50% compression) are
3D: 2.8TB
2D: 1.7TB
total: 4.5TB
See below for the code.
Without the hierarchy, the requirements are:
type | cells / snapshot | MB / snapshot ¹ | snapshots | GB / var | vars | GB total |
---|---|---|---|---|---|---|
2D | 3 M | 6 | 365*24 | 52 | 33 | 1650 |
3D | 75 M | 150 | 365*4 | 220 | 12 | 2600 |
Note that for each additional HEALPix level, the requirements grow by a factor of 4, so a ~6 km resolution dataset (HEALPix level 10) already consumes about 20 TB.
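The cell counts, the overhead of the hierarchy, and the factor of 4 per level follow directly from the `12 * 4**level` formula. A small sketch of that arithmetic; the Earth radius is an assumed mean value and the helper names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def cells_at(zoom):
    """Cells of a single HEALPix level."""
    return 12 * 4**zoom

def cells_in_hierarchy(zoom):
    """Cells of the given level plus all coarser levels."""
    return sum(cells_at(z) for z in range(zoom + 1))

def effective_cell_size_km(zoom):
    """Square root of the (equal) cell area, in km."""
    area = 4 * math.pi * EARTH_RADIUS_KM**2 / cells_at(zoom)
    return math.sqrt(area)

for zoom in (9, 10):
    overhead = cells_in_hierarchy(zoom) / cells_at(zoom) - 1
    print(
        f"zoom {zoom}: {cells_at(zoom)} cells, "
        f"{cells_in_hierarchy(zoom)} with hierarchy (+{overhead:.0%}), "
        f"~{effective_cell_size_km(zoom):.1f} km"
    )
```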
File formats
In principle, any file format that is compatible with standard software could be used. However, zarr has proven very advantageous, as it makes it possible to
- build large datasets covering anything up to an entire simulation output
- chunk data in all dimensions
If plain zarr 2 is used, data can be read in many programming languages. For C-based software, a recent libnetcdf will do the trick. The downside of this approach is that it produces a large number of small files, which can be problematic on HPC systems, especially where inode quotas apply.
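How many files a plain zarr 2 directory store produces depends on the chunking, since every chunk of every array becomes one file. The sketch below only counts chunks for two hypothetical layouts of one hourly 2D variable at zoom level 9; the dimension order `(time, cell)` and the chunk shapes are assumptions for illustration, not a recommendation of this request.

```python
import math

def chunk_count(shape, chunks):
    """Number of chunks (one file each in a plain directory store) of one array."""
    return math.prod(math.ceil(n / c) for n, c in zip(shape, chunks))

# One year of hourly data on a zoom level 9 grid, dimensions (time, cell).
shape = (365 * 24, 12 * 4**9)

# Layout A: one time step per chunk, one twelfth of the globe per chunk.
small_chunks = chunk_count(shape, chunks=(1, 4**9))
# Layout B: one day per chunk, the whole globe per chunk.
large_chunks = chunk_count(shape, chunks=(24, 12 * 4**9))

print(f"layout A: {small_chunks} files, layout B: {large_chunks} files")
```

Larger chunks reduce the file count but increase the amount of data read for small selections, so the choice is a trade-off.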
Other possible approaches include the use of kerchunk in Python to group data chunks stored in netCDF/HDF5 files into unified datasets that look like zarr to Python. Other programming languages and codes can then still make use of the underlying netCDF files.
Variables
For some models, the hydrometeor categories may not map directly onto the specified output. In these cases, hydrometeor categories can be left out (for instance, if snow and cloud ice are not distinguished), or additional information can be added, e.g., for models with hail. In such cases, please try to follow the CF conventions and open an issue, so that we can amend the table and keep the naming consistent among all teams.
3D Output Variables, write instantaneous values at 6hr interval
standard name | short name | units | comment |
---|---|---|---|
geopotential height | zg | m | |
eastward_wind | ua | m/s | |
northward_wind | va | m/s | |
upward_air_velocity | wa | m/s | (pick appropriate unit for model) |
lagrangian_tendency_of_air_pressure | wap | Pa/s | alternative to wa |
temperature | ta | K | |
relative_humidity | hur | - | |
specific_humidity | hus | kg kg-1 | |
mass_fraction_hydrometeors | qall | kg kg-1 | names invented |
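For analyses that need to compare `wa` from one model with `wap` from another, the two can be converted approximately. This is a sketch under the hydrostatic approximation with dry-air density; it is not part of the request, and the constants and function names are assumptions.

```python
G = 9.80665     # standard gravity, m s-2
R_DRY = 287.05  # specific gas constant of dry air, J kg-1 K-1

def wa_to_wap(wa, pressure, temperature):
    """Approximate pressure velocity (Pa/s) from vertical velocity (m/s).

    Uses the hydrostatic approximation wap = -rho * g * wa with the dry-air
    density rho = p / (R_d * T); moisture effects are neglected.
    pressure in Pa, temperature in K.
    """
    rho = pressure / (R_DRY * temperature)
    return -rho * G * wa

def wap_to_wa(wap, pressure, temperature):
    """Inverse of wa_to_wap under the same assumptions."""
    rho = pressure / (R_DRY * temperature)
    return -wap / (rho * G)

# Upward motion of 0.1 m/s at 500 hPa and 250 K gives a negative wap.
wap_example = wa_to_wap(0.1, pressure=50000.0, temperature=250.0)
```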
2D Output Variables, write averages at 3hr interval
CF standard name | short name | units | comment |
---|---|---|---|
atmosphere_mass_content_of_cloud_condensed_water | clwvi | kg m-2 | |
atmosphere_mass_content_of_cloud_ice | clivi | kg m-2 | |
surface_downward_latent_heat_flux | hflsd | W m-2 | direction included in short name |
surface_downward_sensible_heat_flux | hfssd | W m-2 | direction included in short name |
toa_outgoing_longwave_flux | rlut | W m-2 | |
toa_outgoing_longwave_flux_clear_sky | rlutcs | W m-2 | |
toa_incoming_longwave_flux | rldt | W m-2 | |
surface_upwelling_longwave_flux_in_air | rlus | W m-2 | |
surface_upwelling_longwave_flux_in_air_clear_sky | rluscs | W m-2 | |
surface_downwelling_longwave_flux_in_air | rlds | W m-2 | |
surface_downwelling_longwave_flux_in_air_clear_sky | rldscs | W m-2 | |
toa_outgoing_shortwave_flux | rsut | W m-2 | |
toa_outgoing_shortwave_flux_clear_sky | rsutcs | W m-2 | |
toa_incoming_shortwave_flux | rsdt | W m-2 | |
surface_upwelling_shortwave_flux_in_air | rsus | W m-2 | |
surface_upwelling_shortwave_flux_in_air_clear_sky | rsuscs | W m-2 | |
surface_downwelling_shortwave_flux_in_air | rsds | W m-2 | |
surface_downwelling_shortwave_flux_in_air_clear_sky | rsdscs | W m-2 | |
precipitation_flux | pr | kg m-2 s-1 | includes all forms of precipitation |
solid_precipitation_flux | prs | kg m-2 s-1 | includes all forms of solid precipitation |
atmosphere_mass_content_of_water_vapor | prw | kg m-2 | |
surface_air_pressure | ps | Pa | |
air_pressure_at_mean_sea_level | psl | Pa | |
specific_humidity | huss | kg kg-1 | 2m above ground |
air_temperature | tas | K | 2m above ground |
eastward_wind | uas | m s-1 | 10m above ground |
northward_wind | vas | m s-1 | 10m above ground |
surface_temperature | ts | K | |
surface_downward_eastward_stress | tauu | N m-2 | |
surface_downward_northward_stress | tauv | N m-2 | |
cloud_area_fraction | clt | 1 | |
liquid_water_content_of_surface_snow | swe | kg m-2 | short name invented |
snow_area_fraction_viewable_from_above | sncvfa | 1 | short name based on snc for surface_snow_area_fraction |
soil_liquid_water_content | mrso | kg m-2 | short name invented |
sea_ice_area_fraction | siconc | 1 |
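Net fluxes are not requested separately, since they can be derived from the components in this table. A minimal sketch using the short names above; the sign conventions follow the standard names (for example, `hflsd` and `hfssd` are positive downward), and the sample numbers are illustrative round values rather than model output.

```python
def toa_net_downward(rsdt, rsut, rlut, rldt=0.0):
    """Net downward radiative flux at the top of the atmosphere (W m-2)."""
    return rsdt + rldt - rsut - rlut

def surface_net_downward_radiation(rsds, rsus, rlds, rlus):
    """Net downward shortwave plus longwave radiation at the surface (W m-2)."""
    return (rsds - rsus) + (rlds - rlus)

def surface_net_downward_energy(rsds, rsus, rlds, rlus, hflsd, hfssd):
    """Net downward energy flux into the surface (W m-2).

    hflsd and hfssd are downward fluxes, so evaporation and upward
    sensible heat transport enter as negative values.
    """
    return surface_net_downward_radiation(rsds, rsus, rlds, rlus) + hflsd + hfssd

# Illustrative round numbers (W m-2), not model output.
toa_net = toa_net_downward(rsdt=340.0, rsut=100.0, rlut=240.0)
sfc_net = surface_net_downward_energy(
    rsds=185.0, rsus=25.0, rlds=345.0, rlus=400.0, hflsd=-85.0, hfssd=-20.0
)
```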
2D Output Variables, write at 1hr interval as instantaneous
This list is designed to include key outputs like accumulated precipitation, surface temperature and surface wind speed, as well as other fields needed by trackers of convective storms and mesoscale convective systems (MCS). For features that require vertical information, we suggest working with the 6-hourly output.
CF standard name | short name | units | comment |
---|---|---|---|
toa_outgoing_longwave_flux | rlut | W m-2 | |
toa_outgoing_shortwave_flux | rsut | W m-2 | |
precipitation_flux* | pr | kg m-2 s-1 | sum of all modes |
air_pressure_at_mean_sea_level | psl | Pa | |
eastward_wind | uas | m s-1 | 10m above ground |
northward_wind | vas | m s-1 | 10m above ground |
surface_temperature | ts | K |
* The average precipitation flux (rate) over the hour is requested. An instantaneous rate for a single timestep is acceptable if that is what the model produces. Please specify which of the two is provided.
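The accumulated precipitation and surface wind speed mentioned for this list follow from the requested fields. A short sketch, assuming `pr` is the hourly mean flux and using the fact that 1 kg m-2 of liquid water corresponds to a depth of 1 mm; the function names are illustrative.

```python
import math

SECONDS_PER_HOUR = 3600.0

def hourly_accumulation_mm(mean_flux):
    """Precipitation accumulated over one hour (mm) from the hourly mean
    precipitation flux (kg m-2 s-1)."""
    return mean_flux * SECONDS_PER_HOUR

def accumulated_precipitation_mm(hourly_mean_fluxes):
    """Total accumulation (mm) over a sequence of hourly mean fluxes."""
    return sum(hourly_accumulation_mm(flux) for flux in hourly_mean_fluxes)

def wind_speed(uas, vas):
    """10 m wind speed (m s-1) from its eastward and northward components."""
    return math.hypot(uas, vas)

# Three hours with a mean rate of 1 mm/h, 2 mm/h and no precipitation.
total_mm = accumulated_precipitation_mm([1 / 3600, 2 / 3600, 0.0])
```

If a model provides instantaneous rates instead of hourly means, the same conversion only gives an estimate of the accumulation.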
2D time-constant Variables
CF standard name | short name | units | comment |
---|---|---|---|
land_area_fraction | sftlf | 1 | |
land_ice_area_fraction | sftgif | 1 | |
surface_altitude | orog | m |
Optional: Specific Requests
Several additional requests have been made by individual researchers. Modeling teams are asked to provide them as compute and human time allow.
Baroclinicity in Storms (Marco Giorgetta, MPI-Met)
The requested instantaneous relative vorticity makes it possible to track storms and to assess their baroclinicity. Three levels, at 300, 500, and 850 hPa, are sufficient for this. Instantaneous 3-hourly data is needed.
CF standard name | short name | units | comment |
---|---|---|---|
atmosphere_relative_vorticity 300hPa | rva300 | s-1 | |
atmosphere_relative_vorticity 500hPa | rva500 | s-1 | |
atmosphere_relative_vorticity 850hPa | rva850 | s-1 |
Individual Convective Cell Tracking (Zhe Feng, PNNL; Will Jones, Oxford)
For convective cell tracking and case studies, a short re-run with 15-minute instantaneous output is requested, at zoom level 9/10 if available. Proposed are two periods of 24 or 48 hours each (length still to be decided), beginning 2020/2/1 and 2020/8/1.
2D instantaneous every 15 minutes
CF standard name | short name | units | comment |
---|---|---|---|
toa_outgoing_longwave_flux | rlut | W m-2 | |
toa_outgoing_shortwave_flux | rsut | W m-2 | |
precipitation_flux | pr | kg m-2 s-1 | sum of all modes |
air_pressure_at_mean_sea_level | psl | Pa | |
eastward_wind | uas | m s-1 | 10m above ground |
northward_wind | vas | m s-1 | 10m above ground |
surface_temperature | ts | K |
3D Output Variables, write instantaneous values every 15 minutes
standard name | short name | units | comment |
---|---|---|---|
geopotential height | zg | m | |
eastward_wind | ua | m/s | |
northward_wind | va | m/s | |
upward_air_velocity | wa | m/s | (pick appropriate unit for model) |
lagrangian_tendency_of_air_pressure | wap | Pa/s | alternative to wa |
temperature | ta | K | |
relative_humidity | hur | - | |
specific_humidity | hus | kg kg-1 | |
mass_fraction_hydrometeors | qall | kg kg-1 | names invented |
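To judge whether such a re-run is affordable, its volume can be estimated in the same way as for the main request. The sketch below assumes 7 2D variables, 8 3D variables on 25 pressure levels (counting `wa` and `wap` as one), 4-byte floats, 50% compression, and storage of the full HEALPix hierarchy; all of these defaults are assumptions, and the function name is hypothetical.

```python
def rerun_volume_bytes(zoom, periods, hours_per_period, vars_2d=7, vars_3d=8,
                       levels_3d=25, bytes_per_float=4, compression=0.5,
                       include_hierarchy=True):
    """Estimated storage (bytes) for 15-minute output of all listed fields."""
    snapshots = periods * hours_per_period * 4  # four outputs per hour
    if include_hierarchy:
        cells = sum(12 * 4**z for z in range(zoom + 1))
    else:
        cells = 12 * 4**zoom
    values_per_snapshot = cells * (vars_2d + vars_3d * levels_3d)
    return values_per_snapshot * snapshots * bytes_per_float * compression

for hours in (24, 48):
    volume = rerun_volume_bytes(zoom=9, periods=2, hours_per_period=hours)
    print(f"2 x {hours} h at zoom level 9: {volume / 1024**4:.2f} TB")
```

At zoom level 10 the estimate is roughly four times larger.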
Code for computing the data volume
    # Number of variables per output group
    vars_3d = 8
    vars_2d_3h = 35
    vars_2d_1h = 7

    # Output intervals in days
    interval_3d = 6/24.
    interval_2d_3h = 1/8.
    interval_2d_1h = 1/24.
    interval_daily = 1.

    levels_3d = 25  # number of pressure levels

    params = dict(
        max_healpix_level = 9,
        duration = 365,          # days
        float_precision = 4,     # bytes per float
        float_compression = .5,  # compressed size / uncompressed size
    )

    def compute_volume(var_count, levels, interval, max_healpix_level, duration, float_precision, float_compression):
        # cells of the finest level plus all coarser levels of the hierarchy
        cells = sum(12 * 4**level for level in range(max_healpix_level + 1))
        return cells * var_count * levels * duration / interval * float_precision * float_compression

    # 3D fields: 6-hourly and daily
    volume_3d = (compute_volume(var_count=vars_3d, levels=levels_3d, interval=interval_3d, **params) +
                 compute_volume(var_count=vars_3d, levels=levels_3d, interval=interval_daily, **params))
    # 2D mean fields: 3-hourly, 6-hourly and daily
    volume_2d_3h = (compute_volume(var_count=vars_2d_3h, levels=1, interval=interval_2d_3h, **params) +
                    compute_volume(var_count=vars_2d_3h, levels=1, interval=interval_3d, **params) +
                    compute_volume(var_count=vars_2d_3h, levels=1, interval=interval_daily, **params))
    # 2D instantaneous fields: hourly
    volume_2d_1h = compute_volume(var_count=vars_2d_1h, levels=1, interval=interval_2d_1h, **params)

    print(f'3D: {volume_3d/1024**4:.1f}TB\n2D: {(volume_2d_3h+volume_2d_1h)/1024**4:.1f}TB \ntotal: {(volume_3d+(volume_2d_3h+volume_2d_1h))/1024**4:.1f}TB')
Footnotes
¹ Assuming 4-byte floats and 50% compression.