Learnings from the 2025 Hackathon

All in all, this was surprisingly successful. There were some technical issues though.

Overall organization

  • Andrew, Pier-Luigi, Sara and Yuting did a great job at holding things together
  • Bjorn did a great job at shaking up structures and pushing things forward
  • Tobi, Lukas, Mark M, and others made technical magic possible.

Local node organization

  • Important things to cover: enough space for the group work (ideally, all groups in one big room, which lowers the physical barrier to interaction across groups).
  • If possible, a couple of small rooms for small-group discussions or Zoom meetings.
  • Very important: snack and coffee/tea breaks in between for networking.

Schedule

  • General impression: The general agenda from past hackathons was a good orientation point. Each node had its own additional program with keynotes or tutorials according to its needs.
  • An idea came up to use some time beforehand for a “mini preparation workshop”: e.g., if the hackathon starts in the afternoon, the morning of the same day could be used for data training, or the science team leads could meet to discuss how to organize themselves.
  • Pan-node syncs were semi-successful. We should promote the syncs more and also clarify the target audience (every participant, or just the node organizers…) and the purpose of these syncs.

Outreach

  • LinkedIn was quite lively. We created a sub-page one month before the event and started with some introductory posts. During and after the event, engagement was very active, not only from the page itself but also from individual people and institutions reacting, reposting, and mentioning hk25.

Inter-node communication

Mattermost

  • Works
  • Invite more people earlier in the process
  • China uses WeChat for most communication (in Chinese, but with auto-translation)

Zoom calls during the preparation phase

  • Double telcos are tiring but necessary.
  • Did anybody read the notes / watch the recordings?

Zoom calls during the hackathon

  • Did not really line up with the schedules (at least at some nodes). Would need earlier planning to be incorporated into the local schedules.
  • Goals should be very clear.
  • There were some difficulties with claiming and changing the host role, which led to complications with recording and opening breakout rooms.

GitHub and Code sharing

  • GitHub seems to be a bit of a barrier for sharing code.
  • Need a concept that scales to 500 people contributing at the same time without established teams or time for learning.
  • If we really want collaboration, we need to train people before the hackathon.
  • Possible rules:
    • No Jupyter notebooks, just plain Python scripts, to avoid GB-sized repositories (# %% cell syntax is fine; see the sketch after this list)
    • Commit and merge frequently
    • Differentiate between libraries, examples, and dumpster
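
As an illustration of the “plain scripts instead of notebooks” rule, a minimal sketch of a script with # %% cell markers; the file name, dataset path, and variable are placeholders:

    # example_zonal_mean.py -- plain script with cell markers instead of a notebook
    # The "# %%" markers let editors (e.g. VS Code, Spyder) run the file cell by cell,
    # while the repository only stores small, diff-friendly text files.

    # %% imports
    import xarray as xr

    # %% load data (path is a placeholder)
    ds = xr.open_zarr("data/example_dataset.zarr")

    # %% quick-look analysis
    zonal_mean = ds["tas"].mean(dim="lon")
    print(zonal_mean)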

Technical aspects

Technical Team

  • Actually have a real team with members from all nodes
  • Meet with this team at EGU/AGU if at all possible
  • Distribute tech team members more widely to foster connections?
  • Might be worth hiring one person to oversee things / fully commit to running this
  • Have a clear contact for each node who leads the preparations and is actually there during the hackathon

Dataset preparation

  • Provide default chunk settings for 2D/3D datasets
  • Zarr with small (regional) chunks works well.
    • The IFS GRIB datasets were very popular in China despite access via EERIE cloud being difficult. This is because IFS is popular, not because GRIB is.
  • Prepare more uniform conversion scripts for better sharing among the teams
  • Try to get datasets ready weeks before the hackathon (maybe practice with sample datasets to get the scripts right in case teams deliver last-minute simulations)
  • Create scripts for consistency checks (e.g. is the crs set?), and encourage their use (see the sketch after this list)
  • A suite of common analyses, perhaps time series or zonal means, could be applied to catch obvious errors.
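
A minimal sketch of such a consistency-check script, assuming Zarr output and an xarray-based workflow; the crs convention and the specific checks are assumptions, not an agreed standard:

    # check_dataset.py -- quick sanity checks before publishing a dataset (sketch)
    import sys

    import xarray as xr

    def check_dataset(path):
        ds = xr.open_zarr(path)
        problems = []

        # 1. Is CRS information recorded? (attribute/variable name is an assumption)
        if "crs" not in ds.variables and "crs" not in ds.attrs:
            problems.append("no crs information found")

        # 2. Are the variables chunked at all?
        for name, var in ds.data_vars.items():
            if var.chunks is None:
                problems.append(f"{name}: not chunked")

        # 3. Cheap "does it look like data" check: the mean of the first time steps
        #    is NaN only if those fields are entirely NaN.
        for name, var in ds.data_vars.items():
            if "time" in var.dims:
                sample = var.isel(time=slice(0, 3)).mean()
                if sample.isnull():
                    problems.append(f"{name}: first time steps are all NaN")

        return problems

    if __name__ == "__main__":
        for problem in check_dataset(sys.argv[1]):
            print("WARNING:", problem)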

Technical instructions

  • Prepare and share instructions on how to work with the data
  • Pre-hackathon telcos on a per-node basis would be good (for most people participating, this is their first hackathon, not their tenth…)

Node Set-up

  • Set up proxies at all nodes and test them against all remote datasets under realistic load before the hackathon
  • Spread users across different Slurm jobs (see the sketch below)
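
One way to spread work across Slurm jobs is to have each participant start their own small Dask cluster via dask-jobqueue instead of sharing one allocation; the queue name and resources below are placeholders, not a recommendation for any particular node:

    # per_user_cluster.py -- sketch: each participant gets their own Slurm-backed Dask cluster
    from dask.distributed import Client
    from dask_jobqueue import SLURMCluster

    # Queue name and resources are placeholders; adjust to the local node's setup.
    cluster = SLURMCluster(
        queue="compute",
        cores=8,
        memory="32GB",
        walltime="02:00:00",
    )
    cluster.scale(jobs=2)  # two independent Slurm jobs per user instead of one shared allocation
    client = Client(cluster)
    print(client.dashboard_link)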

Catalogs

  • intake 0.7 did the trick, but we need a new concept that again includes loading the data
  • intake 0.7 is outdated and will increasingly conflict with Python environments
  • STAC has no concept for loading the data
  • The combined multi-node offline/online mix worked well
  • Ideally, the catalog could do the cascading (see the sketch below).
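
A minimal sketch of the cascading idea, i.e. preferring a local (offline) copy and falling back to a remote (online) one; the paths and URL are placeholders, and ideally this logic would live inside the catalog rather than in user code:

    # cascading_open.py -- sketch: try local copies first, fall back to remote sources
    import xarray as xr

    # Ordered list of candidate locations for the same dataset (placeholders).
    CANDIDATES = [
        "/work/local_mirror/example_dataset.zarr",              # local copy at the node
        "https://data.example-node.org/example_dataset.zarr",   # remote fallback
    ]

    def open_cascading(candidates):
        """Return the dataset from the first candidate that opens successfully."""
        errors = []
        for location in candidates:
            try:
                return xr.open_zarr(location)
            except Exception as err:  # any failure just means "try the next source"
                errors.append(f"{location}: {err}")
        raise OSError("no data source could be opened:\n" + "\n".join(errors))

    ds = open_cascading(CANDIDATES)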

Libraries and tools (e.g. easygems, uxarray)

  • Example access pattern: ds = cat['ERA5'].to_dask().pipe(attach_crs, zoom=zoom).drop_vars(["lat", "lon"], errors="ignore")
  • A CF standard for HEALPix might improve things in the future.
  • XDGGS will help with hierarchies.
  • Full uptake of Python at the Beijing node
  • egh.healpix_view was very popular
  • Much of the work started with regional remaps to lat/lon (for MCS tracking, …). It would be good to provide tools for this (see the sketch below).
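
A minimal sketch of such a regional remap, assuming a field on a nested HEALPix grid with a "cell" dimension; the region, resolution, variable, and zoom level are placeholders:

    # remap_region.py -- sketch: nearest-neighbour sampling from HEALPix cells onto a regular lat/lon grid
    import healpy as hp
    import numpy as np
    import xarray as xr

    def healpix_to_latlon(da, zoom, lon=(100, 140), lat=(-10, 20), res=0.1):
        """Sample a HEALPix (nested) field onto a regular lon/lat grid for the given region."""
        nside = 2 ** zoom
        lons = np.arange(lon[0], lon[1], res)
        lats = np.arange(lat[0], lat[1], res)
        lon2d, lat2d = np.meshgrid(lons, lats)

        # HEALPix cell containing each target grid point (nearest neighbour).
        cells = hp.ang2pix(nside, lon2d, lat2d, nest=True, lonlat=True)

        out = da.isel(cell=xr.DataArray(cells, dims=("lat", "lon")))
        return out.assign_coords(lat=lats, lon=lons)

    # Usage (variable name and zoom are placeholders):
    # pr_regional = healpix_to_latlon(ds["pr"], zoom=8)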