Climate forcing data#
Now that we have all the static data, we can focus on the climate variables. In this notebook we will…
…download ERA5-Land reanalysis data aggregated to our catchment,
…and determine the reference altitude of the data from the geopotential height.
For data preprocessing and download we will again use the Google Earth Engine (GEE) to offload as much as possible to external servers. ERA5-Land is the latest reanalysis dataset from the European Centre for Medium-Range Weather Forecasts (ECMWF), available from 1950 to near real-time. The GEE data catalog summarizes…
ERA5-Land is a reanalysis dataset providing a consistent view of the evolution of land variables over several decades at an enhanced resolution compared to ERA5. ERA5-Land has been produced by replaying the land component of the ECMWF ERA5 climate reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. Reanalysis produces data that goes several decades back in time, providing an accurate description of the climate of the past.
Source: GEE Data Catalog
To get started we read some settings from the config.ini file again:
cloud project name for the GEE access
input/output folders for data imports and downloads
filenames (DEM, GeoPackage)
include future projections or not
show/hide interactive map in notebooks
…and initialize the GEE API.
--- Attempting Earth Engine Setup for project: matilda-edu ---
1. Trying to initialize with existing credentials...
✅ Earth Engine successfully initialized with existing credentials for project: matilda-edu
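As a minimal sketch of the settings step, the snippet below parses a config file with Python's standard `configparser`. The section name, the key names, and the two file names are hypothetical placeholders, since the actual layout of config.ini is not shown here; only the project name `matilda-edu` is taken from the output above. The GEE initialization itself is left out because it requires credentials.

```python
import configparser

# Hypothetical config.ini content: section, keys and file names are assumptions.
SAMPLE_CONFIG = """
[CONFIG]
CLOUD_PROJECT = matilda-edu
INPUT_DIR = input/
OUTPUT_DIR = output/
DEM_FILE = dem.tif
GPKG_FILE = catchment.gpkg
PROJECTIONS = True
SHOW_MAP = False
"""


def load_settings(text):
    """Parse the config text and return the settings as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["CONFIG"]
    return {
        "cloud_project": section["CLOUD_PROJECT"],
        "input_dir": section["INPUT_DIR"],
        "output_dir": section["OUTPUT_DIR"],
        "dem_file": section["DEM_FILE"],
        "gpkg_file": section["GPKG_FILE"],
        # getboolean() accepts True/False, yes/no, on/off, 1/0
        "projections": section.getboolean("PROJECTIONS"),
        "show_map": section.getboolean("SHOW_MAP"),
    }


settings = load_settings(SAMPLE_CONFIG)
print(settings["cloud_project"], settings["projections"], settings["show_map"])
```

Reading the flags with `getboolean()` avoids the common pitfall that the string `"False"` is truthy in Python.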
We can now load the catchment outline from the previous notebook and convert it to an ee.FeatureCollection to use it in GEE.
Set the date range#
If you are only interested in modeling the past, set PROJECTIONS=False in the config.ini to only download reanalysis data for your defined modeling period. Otherwise, all available historical data (since 1979) is downloaded to provide the best possible basis for bias adjustment of the climate scenario data.
The selected date range is 1979-01-01 to 2025-01-01
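The date range logic described above can be sketched as follows. The function name and its arguments are hypothetical, and how the notebook determines the end of the full period is not shown, so it is passed in explicitly here.

```python
from datetime import date

# Start of the full historical download, as stated in the text above.
FULL_PERIOD_START = date(1979, 1, 1)


def select_date_range(projections, model_start, model_end, full_end):
    """Return (start, end) of the reanalysis download.

    projections: True if future scenarios will be bias-adjusted later on.
    model_start, model_end: the user-defined modeling period.
    full_end: end of the full historical period (an assumption of this sketch).
    """
    if projections:
        # Use all available history as the basis for bias adjustment.
        return FULL_PERIOD_START, full_end
    # Only the modeling period is needed for a purely historical run.
    return model_start, model_end


start, end = select_date_range(
    projections=True,
    model_start=date(2000, 1, 1),  # hypothetical modeling period
    model_end=date(2010, 1, 1),
    full_end=date(2025, 1, 1),
)
print(f"The selected date range is {start} to {end}")
```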
ERA5L Geopotential height#
The reference surface elevation of ERA5-Land grid cells cannot be obtained directly, but must be calculated from the geopotential.
This parameter is the gravitational potential energy of a unit mass, at a particular location, relative to mean sea level. It is also the amount of work that would have to be done, against the force of gravity, to lift a unit mass to that location from mean sea level.
The geopotential height can be calculated by dividing the geopotential by the Earth’s gravitational acceleration, g (=9.80665 m s-2). The geopotential height plays an important role in synoptic meteorology (analysis of weather patterns). Charts of geopotential height plotted at constant pressure levels (e.g., 300, 500 or 850 hPa) can be used to identify weather systems such as cyclones, anticyclones, troughs and ridges.
At the surface of the Earth, this parameter shows the variations in geopotential (height) of the surface, and is often referred to as the orography.
Source: ECMWF Parameter Database
Since the ERA5 geopotential height is not available in the GEE Data Catalog, we downloaded it using the ECMWF Copernicus Data Store (CDS) API, converted it to NetCDF (.nc) format, and reuploaded it. Therefore, the file has to be accessed in a similar way to the ice thickness data in Notebook 1.
Dataset file and reference on media server:
| | ref | file_size | file_extension | field8 |
|---|---|---|---|---|
| 0 | 27215 | 8062521 | zip | ERA5_land_Z_geopotential |
The downloaded archive is then unzipped and the NetCDF file is loaded as an xarray dataset for further processing.
Reading file "ERA5_land_Z_geopotential.nc"...
Dataset contains Geopotential in m**2 s**-2 as variable 'z'
The original dataset covers the entire globe, so we crop it to the catchment area plus a 1° buffer zone.
xr.Dataset cropped to bbox[77.06, 41.05, 79.33, 43]
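The cropping idea can be illustrated without any geospatial libraries: extend the catchment bounds by the buffer and keep only the grid coordinates that fall inside. The catchment bounds and the 0.1° grid below are hypothetical values chosen for illustration, not those of the notebook.

```python
def buffered_bbox(bounds, buffer_deg=1.0):
    """Extend (min_lon, min_lat, max_lon, max_lat) by buffer_deg on every side."""
    min_lon, min_lat, max_lon, max_lat = bounds
    return (
        min_lon - buffer_deg,
        min_lat - buffer_deg,
        max_lon + buffer_deg,
        max_lat + buffer_deg,
    )


def coords_in_range(coords, lower, upper):
    """Keep the grid coordinates that lie within [lower, upper]."""
    return [c for c in coords if lower <= c <= upper]


# Hypothetical catchment bounds in degrees (not the notebook's values).
catchment_bounds = (78.0, 42.0, 78.5, 42.5)
bbox = buffered_bbox(catchment_bounds, buffer_deg=1.0)

# Hypothetical global grid with 0.1 degree spacing.
longitudes = [i / 10 for i in range(-1800, 1800)]
latitudes = [i / 10 for i in range(900, -901, -1)]  # north to south

lon_subset = coords_in_range(longitudes, bbox[0], bbox[2])
lat_subset = coords_in_range(latitudes, bbox[1], bbox[3])
print(bbox, len(lon_subset), len(lat_subset))
```

With xarray the same selection is usually written with `ds.sel(...)` and `slice` objects. If the latitude coordinate is stored in descending order, as is common for ECMWF products, the slice has to run from the maximum to the minimum latitude.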
Loading xarray data into GEE requires a small workaround. Credits to Oliver Lopez for this solution.
Converting xarray to numpy array...
Saving data extent and origin...
Converting numpy array to ee.Array...
Done!
If mapping is enabled in the config.ini, you can now display the geopotential data in the GEE map.
Since our workflow will use a lumped model, we will use area-weighted catchment-wide averages of our forcing data. Thus, we also aggregate the geopotential based on the grid cell fractions in the catchment and convert it to geopotential height in meters above sea level. This represents the reference altitude of our forcing data, just as the elevation of a weather station would.
Geopotential mean: 32711.74 m2 s-2
Elevation: 3335.67 m a.s.l.
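The two steps above, area-weighted averaging and conversion to geopotential height, can be sketched in plain Python. The function names are hypothetical and the aggregation in the notebook runs on the GEE servers, so this only illustrates the arithmetic. The weights stand for the fraction of each grid cell that lies inside the catchment.

```python
G0 = 9.80665  # Earth's gravitational acceleration in m s-2, as quoted above


def area_weighted_mean(values, weights):
    """Average grid cell values, weighted by each cell's share of the catchment."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def geopotential_to_height(geopotential):
    """Convert geopotential (m2 s-2) to geopotential height (m)."""
    return geopotential / G0


# Toy example: the second cell covers three times more of the catchment.
print(area_weighted_mean([10.0, 20.0], [0.25, 0.75]))  # 17.5

# Catchment mean geopotential reported above.
elevation = geopotential_to_height(32711.74)
print(f"Elevation: {elevation:.2f} m a.s.l.")  # Elevation: 3335.67 m a.s.l.
```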
ERA5-Land Temperature and Precipitation Data#
Our model only requires temperature and precipitation as inputs. We will download both time series from the ERA5-Land Daily Aggregated - ECMWF Climate Reanalysis dataset (ECMWF/ERA5_LAND/DAILY_RAW) in the Google Earth Engine Data Catalog.
The asset is a daily aggregate of ECMWF ERA5 Land hourly assets. […] Daily aggregates have been pre-calculated to facilitate many applications requiring easy and fast access to the data.
Source: GEE Data Catalog
On the server side, we simply create an ee.ImageCollection with the desired bands (temperature and precipitation) and date range. To calculate area-weighted aggregates we apply an ee.Reducer to each image.
We can then aggregate the results into arrays, download them with .getInfo() and store them as dataframe columns. Depending on the selected date range and server traffic this can take up to a few minutes.
Get timestamps...
Get temperature values...
Get precipitation values...
CPU times: user 269 ms, sys: 1.84 ms, total: 270 ms
Wall time: 3min 5s
The constructed data frame now looks like this:
| | ts | dt | temp | temp_c | prec |
|---|---|---|---|---|---|
| 0 | 283996800000 | 1979-01-01 01:00:00 | 257.392996 | -15.757004 | 0.027381 |
| 1 | 284083200000 | 1979-01-02 01:00:00 | 256.435964 | -16.714036 | 0.004825 |
| 2 | 284169600000 | 1979-01-03 01:00:00 | 257.867342 | -15.282658 | 0.001601 |
| 3 | 284256000000 | 1979-01-04 01:00:00 | 258.322419 | -14.827581 | 0.283469 |
| 4 | 284342400000 | 1979-01-05 01:00:00 | 258.056697 | -15.093303 | 0.108583 |
| ... | ... | ... | ... | ... | ... |
| 16797 | 1735257600000 | 2024-12-27 01:00:00 | 263.114958 | -10.035042 | 0.026132 |
| 16798 | 1735344000000 | 2024-12-28 01:00:00 | 259.717669 | -13.432331 | 0.000925 |
| 16799 | 1735430400000 | 2024-12-29 01:00:00 | 261.213752 | -11.936248 | 0.005297 |
| 16800 | 1735516800000 | 2024-12-30 01:00:00 | 263.517126 | -9.632874 | 1.158257 |
| 16801 | 1735603200000 | 2024-12-31 01:00:00 | 259.652770 | -13.497230 | 1.049000 |
16802 rows × 5 columns
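The derived columns can be reproduced with two small helpers, assuming that `ts` holds milliseconds since the Unix epoch and `temp` is in Kelvin (both consistent with the values in the table). The helper names are hypothetical.

```python
from datetime import date, datetime, timezone


def ms_to_utc(ts_ms):
    """Convert a timestamp in milliseconds since the epoch to a UTC datetime."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)


def kelvin_to_celsius(temp_k):
    """Convert a temperature from Kelvin to degrees Celsius."""
    return temp_k - 273.15


# First row of the table above.
first_dt = ms_to_utc(283996800000)
first_temp_c = kelvin_to_celsius(257.392996)
print(first_dt.isoformat(), round(first_temp_c, 6))

# Sanity check: number of daily values from 1979-01-01 up to, not including, 2025-01-01.
n_days = (date(2025, 1, 1) - date(1979, 1, 1)).days
print(n_days)  # 16802
```

In UTC the first timestamp falls on 1979-01-01 00:00:00. The `dt` column shows 01:00:00 instead, which would be consistent with a conversion in a UTC+1 local time zone; this is an inference from the values, not something the notebook states.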
Let’s plot the full time series.
Store data for next steps#
To continue in the workflow, we write the ERA5 data to a .csv file…
…and update the settings.yml file with the reference altitude of the ERA5-Land data (ele_dat) and refresh output_download.zip with newly acquired data.
Data successfully written to YAML at output/settings.yml
Output folder can be download now (file output_download.zip)
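A minimal sketch of the storage step, using only the standard library: it writes a few rows to a .csv file and rebuilds a zip archive of the output folder. The CSV file name and the selected columns are assumptions, and the YAML update is omitted because it depends on a third-party package. Everything runs in a temporary directory, so no real output is touched.

```python
import csv
import shutil
import tempfile
import zipfile
from pathlib import Path


def write_forcing_csv(rows, path):
    """Write a list of dict rows with the keys dt, temp and prec to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["dt", "temp", "prec"])
        writer.writeheader()
        writer.writerows(rows)


def refresh_zip(folder, zip_base):
    """(Re)create <zip_base>.zip from the folder content and return its path."""
    return shutil.make_archive(str(zip_base), "zip", root_dir=str(folder))


# Two rows taken from the table above; the column selection is an assumption.
rows = [
    {"dt": "1979-01-01", "temp": 257.392996, "prec": 0.027381},
    {"dt": "1979-01-02", "temp": 256.435964, "prec": 0.004825},
]

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "output"
    out_dir.mkdir()
    write_forcing_csv(rows, out_dir / "ERA5L.csv")  # hypothetical file name
    archive = refresh_zip(out_dir, Path(tmp) / "output_download")
    with zipfile.ZipFile(archive) as zf:
        archived_names = zf.namelist()

print(archived_names)
```

`shutil.make_archive` overwrites an existing archive of the same name, which matches the idea of refreshing output_download.zip after each notebook.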
You can now continue with Notebook 3.