Calibrating the MATILDA framework#
In this notebook we will
… set up a glacio-hydrological model with all the data we have collected,
… run the model for the calibration period with default parameters and check the results,
… use a statistical parameter optimization routine to calibrate the model,
… and store the calibrated parameter set for the scenario runs in the next notebook.
The framework for Modeling water resources in glacierized catchments (MATILDA, cryotools/matilda) has been developed for use in this workflow and is published as a Python package. It is based on the widely used HBV hydrological model, extended by a simple temperature-index melt model based on the code of Seguinot (2019). Glacier evolution over time is modeled using a modified version of the Δh approach following Seibert et al. (2018).
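To make the melt component more tangible, here is a minimal sketch of the classic temperature-index (degree-day) approach such melt models build on. It is not MATILDA's actual implementation (which, among other things, works on elevation bands, distinguishes snow and ice, and accounts for refreezing); the values are placeholders, and the names simply mirror the CFMAX/TT parameters that appear later in this notebook.

```python
import numpy as np

def degree_day_melt(temp, tt=0.0, cfmax=2.5):
    """Potential daily melt [mm w.e.] from daily mean air temperature [deg C]:
    melt = CFMAX * max(T - TT, 0)."""
    return cfmax * np.maximum(np.asarray(temp, dtype=float) - tt, 0.0)

# Toy example: one week of daily mean temperatures
temps = [-3.0, -1.0, 0.5, 2.0, 4.5, 1.0, -0.5]
print(degree_day_melt(temps, tt=0.0, cfmax=2.5))
```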
As before, we start by loading configurations such as the calibration period and some helper functions to work with YAML files.
Show code cell source
from tools.helpers import update_yaml, read_yaml, write_yaml
import configparser
import ast
# read local config.ini file
config = configparser.ConfigParser()
config.read('config.ini')
# get output dir and date range from config.ini
dir_input = config['FILE_SETTINGS']['DIR_INPUT']
dir_output = config['FILE_SETTINGS']['DIR_OUTPUT']
date_range = ast.literal_eval(config['CONFIG']['DATE_RANGE'])
print('MATILDA will be calibrated on the period ' + date_range[0] + ' to ' + date_range[1])
MATILDA will be calibrated on the period 1998-01-01 to 2020-12-31
Since HBV is a so-called ‘bucket’ model and all buckets are initially empty, we need to fill them during a setup period of at least one year. If not specified otherwise, the first two years of the DATE_RANGE in the config are used for the setup.
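To see why such a spin-up is needed, consider a toy single-bucket (linear reservoir) model, a strongly simplified stand-in for HBV's storage routines: if the storage starts empty, the simulated outflow is biased low until the bucket has filled to a realistic level. This is purely a conceptual illustration with made-up numbers, not MATILDA code.

```python
import numpy as np

def linear_reservoir(precip, k=0.05, s0=0.0):
    """Toy bucket model: storage S is filled by precipitation P and
    drained by an outflow Q = k * S. Returns the daily outflow series."""
    s, q = s0, []
    for p in precip:
        s += p           # fill the bucket
        out = k * s      # outflow proportional to storage
        s -= out         # drain the bucket
        q.append(out)
    return np.array(q)

rng = np.random.default_rng(42)
precip = rng.gamma(shape=0.5, scale=4.0, size=3 * 365)   # synthetic daily precipitation [mm]

q_empty = linear_reservoir(precip, s0=0.0)    # bucket starts empty
q_full = linear_reservoir(precip, s0=40.0)    # bucket starts pre-filled

# After a spin-up period, both runs converge to the same outflow
print(f'Mean outflow, first month: {q_empty[:30].mean():.2f} vs {q_full[:30].mean():.2f} mm/day')
print(f'Mean outflow, third year:  {q_empty[730:].mean():.2f} vs {q_full[730:].mean():.2f} mm/day')
```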
Show code cell source
import pandas as pd
length_of_setup_period = 2
sim_start = pd.to_datetime(date_range[0]) + pd.DateOffset(years = length_of_setup_period)
set_up_end = sim_start - pd.DateOffset(days = 1)
dates = {'set_up_start': date_range[0],
'set_up_end': str(set_up_end).split(' ')[0], # remove hh:mm:ss
'sim_start': str(sim_start).split(' ')[0], # remove hh:mm:ss
'sim_end': date_range[1]}
for key in dates.keys(): print(key + ': ' + dates[key])
set_up_start: 1998-01-01
set_up_end: 1999-12-31
sim_start: 2000-01-01
sim_end: 2020-12-31
Many MATILDA parameters have been calculated in previous notebooks and stored in settings.yml. We can easily add the modeling periods using a helper function. The glacier profile calculated in Notebook 1 can be imported as a pandas DataFrame and added to the settings dictionary as well.
Finally, we add some optional settings that control the aggregation frequency of the outputs, the choice of graphical outputs, and more.
Show code cell source
update_yaml(dir_output + 'settings.yml', dates)
remaining_settings = {"freq": "M", # aggregation frequency of model outputs (D, M, Y)
"warn": False, # show warnings of subpackages?
"plot_type": "all", # interactive and/or non-interactive plots ('print', 'interactive', 'all')
"elev_rescaling": True} # treat mean glacier elevation as constant or change with glacier evolution
update_yaml(dir_output + 'settings.yml', remaining_settings)
settings = read_yaml(dir_output + 'settings.yml')
glacier_profile = pd.read_csv(dir_output + 'glacier_profile.csv')
settings['glacier_profile'] = glacier_profile
print('MATILDA settings:\n\n')
for key in settings.keys(): print(key + ': ' + str(settings[key]))
Data successfully written to YAML at output/settings.yml
Data successfully written to YAML at output/settings.yml
MATILDA settings:
area_cat: 295.67484249904464
area_glac: 31.829413146585885
ele_cat: 3293.491688025922
ele_dat: 3335.668840874115
ele_glac: 4001.8798828125
elev_rescaling: True
freq: M
lat: 42.18280043250193
plot_type: all
set_up_end: 1999-12-31
set_up_start: 1998-01-01
sim_end: 2020-12-31
sim_start: 2000-01-01
warn: False
glacier_profile: Elevation Area WE EleZone
0 1970 0.000000 0.0000 1900
1 2000 0.000000 0.0000 2000
2 2100 0.000000 0.0000 2100
3 2200 0.000000 0.0000 2200
4 2300 0.000000 0.0000 2300
.. ... ... ... ...
156 4730 0.000023 20721.3700 4700
157 4740 0.000013 14450.2180 4700
158 4750 0.000006 10551.4730 4700
159 4760 0.000000 0.0000 4700
160 4780 0.000002 6084.7456 4700
[161 rows x 4 columns]
Run MATILDA with default parameters#
We will force MATILDA with the pre-processed ERA5-Land data from Notebook 2. Although MATILDA can run without being calibrated on observations, the results would carry extreme uncertainties. Therefore, we recommend using at least discharge observations for your selected point to evaluate the simulations against. Here, we load the discharge observations for the example catchment. As you can see, we have meteorological forcing data from 1979 and discharge data from 1982 onwards. However, most of the glacier datasets used start in 2000/2001, and glaciers are a significant part of the water balance. In its current version, MATILDA is not able to extrapolate glacier cover, so we will only calibrate the model using data from 2000 onwards.
Show code cell source
era5 = pd.read_csv(dir_output + 'ERA5L.csv', usecols=['dt', 'temp', 'prec'])
era5.columns = ['TIMESTAMP', 'T2', 'RRR']
# remove HH:MM:SS from 'TIMESTAMP' column
era5['TIMESTAMP'] = pd.to_datetime(era5['TIMESTAMP'])
era5['TIMESTAMP'] = era5['TIMESTAMP'].dt.date
print('ERA5 Data:')
display(era5)
obs = pd.read_csv(dir_input + 'obs_runoff_example.csv')
print('Observations:')
display(obs)
ERA5 Data:
|   | TIMESTAMP | T2 | RRR |
|---|---|---|---|
| 0 | 1979-01-01 | 257.392996 | 0.027381 |
| 1 | 1979-01-02 | 256.435964 | 0.004825 |
| 2 | 1979-01-03 | 257.867342 | 0.001601 |
| 3 | 1979-01-04 | 258.322419 | 0.283469 |
| 4 | 1979-01-05 | 258.056697 | 0.108583 |
| ... | ... | ... | ... |
| 16066 | 2022-12-27 | 253.990888 | 0.002117 |
| 16067 | 2022-12-28 | 257.364969 | 0.223113 |
| 16068 | 2022-12-29 | 255.975972 | 0.513904 |
| 16069 | 2022-12-30 | 256.748078 | 4.721328 |
| 16070 | 2022-12-31 | 255.077631 | 1.234133 |

16071 rows × 3 columns
Observations:
|   | Date | Qobs |
|---|---|---|
| 0 | 01/01/1982 | 1.31 |
| 1 | 01/02/1982 | 1.19 |
| 2 | 01/03/1982 | 1.31 |
| 3 | 01/04/1982 | 1.31 |
| 4 | 01/05/1982 | 1.19 |
| ... | ... | ... |
| 14240 | 12/27/2020 | 3.25 |
| 14241 | 12/28/2020 | 3.23 |
| 14242 | 12/29/2020 | 3.08 |
| 14243 | 12/30/2020 | 2.93 |
| 14244 | 12/31/2020 | 2.93 |

14245 rows × 2 columns
With all settings and input data in place, we can run MATILDA with default parameters.
Show code cell source
from matilda.core import matilda_simulation
output_matilda = matilda_simulation(era5, obs, **settings)
---
MATILDA framework
Reading parameters for MATILDA simulation
Parameter set:
set_up_start 1998-01-01
set_up_end 1999-12-31
sim_start 2000-01-01
sim_end 2020-12-31
freq M
freq_long Monthly
lat 42.1828
area_cat 295.674842
area_glac 31.829413
ele_dat 3335.668841
ele_cat 3293.491688
ele_glac 4001.879883
ele_non_glac 3208.034151
warn False
CFR_ice 0.01
lr_temp -0.006
lr_prec 0
TT_snow 0
TT_diff 2
CFMAX_snow 2.5
CFMAX_rel 2
BETA 1.0
CET 0.15
FC 250
K0 0.055
K1 0.055
K2 0.04
LP 0.7
MAXBAS 3.0
PERC 1.5
UZL 120
PCORR 1.0
SFCF 0.7
CWH 0.1
AG 0.7
CFR 0.15
hydro_year 10
pfilter 0
TT_rain 2
CFMAX_ice 5.0
dtype: object
*-------------------*
Reading data
Set up period from 1998-01-01 to 1999-12-31 to set initial values
Simulation period from 2000-01-01 to 2020-12-31
Input data preprocessing successful
---
Initiating MATILDA simulation
Recalculating initial elevations based on glacier profile
>> Prior glacier elevation: 4001.8798828125m a.s.l.
>> Recalculated glacier elevation: 3951m a.s.l.
>> Prior non-glacierized elevation: 3208m a.s.l.
>> Recalculated non-glacierized elevation: 3214m a.s.l.
Calculating glacier evolution
*-------------------*
Running HBV routine
Finished spin up for initial HBV parameters
Finished HBV routine
*-------------------*
** Model efficiency based on Monthly aggregates **
KGE coefficient: 0.36
NSE coefficient: -0.95
RMSE: 70.31
MARE coefficient: 0.76
*-------------------*
avg_temp_catchment avg_temp_glaciers evap_off_glaciers \
count 6086.000 6086.000 6086.000
mean -3.486 -7.909 0.790
std 9.213 9.213 1.006
min -25.391 -29.815 0.000
25% -11.674 -16.098 0.000
50% -2.804 -7.229 0.233
75% 5.048 0.624 1.482
max 13.693 9.268 5.627
sum -21213.971 -48135.748 4809.575
prec_off_glaciers prec_on_glaciers rain_off_glaciers \
count 6086.000 6086.000 6086.000
mean 3.180 0.301 2.223
std 4.868 0.466 4.848
min 0.000 0.000 0.000
25% 0.148 0.015 0.000
50% 1.254 0.122 0.000
75% 4.206 0.400 2.091
max 53.990 5.637 53.990
sum 19351.565 1834.596 13528.192
snow_off_glaciers rain_on_glaciers snow_on_glaciers \
count 6086.000 6086.000 6086.000
mean 0.957 0.143 0.159
std 1.969 0.422 0.267
min 0.000 0.000 0.000
25% 0.000 0.000 0.000
50% 0.021 0.000 0.034
75% 0.991 0.012 0.209
max 18.865 5.637 2.472
sum 5823.372 869.854 964.741
snowpack_off_glaciers ... total_refreezing SMB \
count 6086.000 ... 6086.000 6086.000
mean 114.318 ... 0.048 -1.099
std 104.649 ... 0.182 7.064
min 0.000 ... 0.000 -41.107
25% 0.000 ... 0.000 0.000
50% 105.177 ... 0.000 0.226
75% 195.261 ... 0.020 1.843
max 440.068 ... 3.105 23.314
sum 695736.921 ... 295.094 -6691.205
actual_evaporation total_precipitation total_melt \
count 6086.000 6086.000 6086.000
mean 0.779 3.481 1.308
std 0.994 5.328 3.147
min 0.000 0.000 0.000
25% 0.000 0.163 0.000
50% 0.228 1.374 0.000
75% 1.462 4.622 1.019
max 5.627 59.627 24.075
sum 4742.736 21186.161 7957.543
runoff_without_glaciers runoff_from_glaciers runoff_ratio \
count 6086.000 6086.000 6.086000e+03
mean 2.434 0.414 1.386980e+06
std 3.120 0.899 1.081805e+08
min 0.000 0.000 0.000000e+00
25% 0.034 0.000 7.800000e-02
50% 0.680 0.000 6.870000e-01
75% 4.263 0.257 3.334000e+00
max 17.805 7.332 8.439462e+09
sum 14811.605 2520.626 8.441161e+09
total_runoff observed_runoff
count 6086.000 6086.000
mean 2.848 2.003
std 3.632 1.719
min 0.000 0.234
25% 0.034 0.701
50% 0.682 1.125
75% 5.427 3.097
max 20.852 8.971
sum 17332.231 12190.326
[9 rows x 28 columns]
End of the MATILDA simulation
---



The results are obviously far from reality and largely overestimate runoff. The Kling-Gupta efficiency (KGE) rates the result at 0.36, with 1.0 being a perfect match with the observations. We can also see that the input precipitation is much higher than the total runoff. Clearly, the model needs calibration.
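For reference, the KGE (Gupta et al., 2009) combines the correlation, the bias ratio, and the variability ratio between simulation and observations. MATILDA computes the score internally; the minimal sketch below, using made-up runoff values, only serves to make the metric transparent.

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2),
    with r = correlation, alpha = std(sim)/std(obs), beta = mean(sim)/mean(obs)."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Toy example with monthly runoff sums [mm]
obs_q = [10, 20, 60, 80, 40, 15]
sim_q = [12, 30, 90, 120, 70, 20]   # systematic overestimation degrades the score
print(round(kge(sim_q, obs_q), 2))
```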
Calibrate MATILDA#
In order to adjust the model parameters to the catchment characteristics, we will perform an automated calibration. The MATILDA Python package contains a calibration module called matilda.mspot that makes extensive use of the Statistical Parameter Optimization Tool for Python (SPOTPY). As we have seen, there can be large uncertainties in the input data (especially precipitation). Simply tuning the model parameters to match the observed hydrograph may over- or underestimate other runoff contributors, such as glacier melt, to compensate for deficiencies in the precipitation data. Therefore, it is good practice to include additional data sets in a multi-objective calibration. In this workflow we will use:
remote sensing estimates of glacier surface mass balance (SMB) and …
snow water equivalent (SWE) estimates from a dedicated snow reanalysis product.
Unfortunately, both default datasets are limited to High Mountain Asia (HMA). For study sites in other areas, please consult other sources and manually add the target values for calibration in the appropriate code cells. We are happy to include additional datasets as long as they are available online and can be integrated into the workflow.
A thorough calibration requires a large number of model runs and therefore considerable computing power. To obtain robust results you can either
… run this notebook locally on a computer with more cores (ideally a high-performance cluster) or …
… reduce the number of calibration parameters based on their global sensitivity. We will return to this topic later in this notebook.
For now, we will demonstrate how to use the SPOT features and then continue with a parameter set from a large HPC optimization run. If you need help implementing the routine on your HPC, consult the SPOTPY documentation and contact us if you encounter problems.
Glacier surface mass balance data#
There are various sources of SMB records, but only remote sensing estimates provide data for (almost) all glaciers in the target catchment. Shean et al. (2020) calculated geodetic mass balances for all glaciers in High Mountain Asia from 2000 to 2018. For this example (and all other catchments in HMA), we can use their dataset to derive an average annual mass balance over the calibration period.
We pick all individual mass balances that match the glacier IDs in our catchment and calculate the catchment-wide mean. In addition, we use the uncertainty estimate provided in the dataset to derive an uncertainty range.
Show code cell source
import pandas as pd
mass_balances = pd.read_csv(dir_input + '/hma_mb_20190215_0815_rmse.csv', usecols=['RGIId', 'mb_mwea', 'mb_mwea_sigma'])
ids = pd.read_csv(dir_output + '/RGI/Glaciers_in_catchment.csv')
merged = pd.merge(mass_balances, ids, on='RGIId')
mean_mb = round(merged['mb_mwea'].mean() * 1000, 3) # Mean catchment MB in mm w.e.
mean_sigma = round(merged['mb_mwea_sigma'].mean() * abs(mean_mb), 3) # Mean uncertainty of catchment MB in mm w.e.
target_mb = [mean_mb - mean_sigma, mean_mb + mean_sigma]
print('Target glacier mass balance for calibration: ' + str(mean_mb) + ' +-' + str(mean_sigma) + 'mm w.e.')
Target glacier mass balance for calibration: -155.474 +-50.3mm w.e.
Snow water equivalent#
For snow cover estimates, we will use a specialized snow reanalysis dataset from Liu et al. (2021). Details on the data can be found in the dataset documentation and the associated publication.
Unfortunately, access to the dataset requires a (free) registration at NASA’s EarthData portal, which prevents seamless integration. The dataset also consists of large files and requires pre-processing that cannot be done in a Jupyter Notebook. However, we provide the SWEETR tool, a fully automated workflow that you can run on your local computer to download, process, and aggregate the data. Please refer to the dedicated GitHub repository for further instructions.

When you run the SWEETR tool with your catchment outline, it returns cropped raster files such as the one shown above and, more importantly, a time series of catchment-wide daily mean SWE from 1999 to 2016. Just replace the input/swe.csv file with your result. Now we can load the time series as a dataframe.
Show code cell source
swe = pd.read_csv(f'{dir_input}/swe.csv')
Along with the cropped SWE rasters, the SWEETR tool creates binary masks for seasonal and non-seasonal snow. Due to its strong signal in remote sensing data, seasonal snow can be detected more reliably, leading to more robust SWE estimates. The non-seasonal snow, in turn, largely coincides with the glacierized area. Therefore, we calibrate the snow routine by comparing the simulated SWE of the ice-free sub-catchment with the seasonal snow of the reanalysis. Since the latter has a coarse resolution of 500 m, the excluded catchment area is somewhat larger than the RGI glacier outlines (17.2% non-seasonal snow vs. 10.8% glacierized sub-catchment). We therefore use a scaling factor to account for this mismatch in the reference area.
Show code cell source
glac_ratio = settings['area_glac'] / settings['area_cat'] # read glacierized and total area from the settings
swe_area_sim = 1-glac_ratio
swe_area_obs = 0.828 # 1 - non-seasonal snow / seasonal snow
sf = swe_area_obs / swe_area_sim
print('SWE scaling factor: ' + str(round(sf, 3)))
SWE scaling factor: 0.928
The MATILDA framework provides an interface to SPOTPY. Here, we will use the psample() function to run MATILDA with the same settings as before but with varying parameters. To do this, we remove redundant settings and add some new ones specific to the function. Be sure to choose the number of repetitions carefully.
Show code cell source
from tools.helpers import drop_keys
psample_settings = drop_keys(settings, ['warn', 'plots', 'plot_type'])
additional_settings = {'rep': 10, # Number of model runs. For advice, check the documentation of the algorithms.
'glacier_only': False, # True when calibrating an entirely glacierized catchment
'obj_dir': 'maximize', # should your objective function be maximized (e.g. NSE) or minimized (e.g. RMSE)
'target_mb': mean_mb, # Average annual glacier mass balance to target at
'target_swe': swe, # Catchment-wide mean SWE timeseries of seasonal snow to calibrate the snow routine
'swe_scaling': 0.928, # scaling factor for simulated SWE to account for reference area mismatch
'dbformat': None, # Write the results to a file ('csv', 'hdf5', 'ram', 'sql')
'output': None, # Choose where to store the files
'algorithm': 'lhs', # Choose algorithm (for parallelization: mc, lhs, fast, rope, sceua or demcz)
'dbname': 'era5_matilda_example', # Choose name
'parallel': False, # Distribute the calculation on multiple cores or not
# 'cores': 20 # Set number of cores when running in parallel
}
psample_settings.update(additional_settings)
print('Settings for calibration runs:\n')
for key in psample_settings.keys(): print(key + ': ' + str(psample_settings[key]))
Settings for calibration runs:
area_cat: 295.67484249904464
area_glac: 31.829413146585885
ele_cat: 3293.491688025922
ele_dat: 3335.668840874115
ele_glac: 4001.8798828125
elev_rescaling: True
freq: M
lat: 42.18280043250193
set_up_end: 1999-12-31
set_up_start: 1998-01-01
sim_end: 2020-12-31
sim_start: 2000-01-01
glacier_profile: Elevation Area WE EleZone norm_elevation delta_h
0 1970 0.000000 0.0000 1900 1.000000 1.003442
1 2000 0.000000 0.0000 2000 0.989324 0.945813
2 2100 0.000000 0.0000 2100 0.953737 0.774793
3 2200 0.000000 0.0000 2200 0.918149 0.632696
4 2300 0.000000 0.0000 2300 0.882562 0.515361
.. ... ... ... ... ... ...
156 4730 0.000023 20721.3700 4700 0.017794 -0.000265
157 4740 0.000013 14450.2180 4700 0.014235 -0.000692
158 4750 0.000006 10551.4730 4700 0.010676 -0.001119
159 4760 0.000000 0.0000 4700 0.007117 -0.001546
160 4780 0.000002 6084.7456 4700 0.000000 -0.002400
[161 rows x 6 columns]
rep: 10
glacier_only: False
obj_dir: maximize
target_mb: -155.474
target_swe: Date SWE_Mean
0 1999-10-01 0.0003
1 1999-10-02 0.0002
2 1999-10-03 0.0001
3 1999-10-04 0.0001
4 1999-10-05 0.0000
... ... ...
6570 2017-09-26 0.0020
6571 2017-09-27 0.0021
6572 2017-09-28 0.0020
6573 2017-09-29 0.0022
6574 2017-09-30 0.0023
[6575 rows x 2 columns]
swe_scaling: 0.928
dbformat: None
output: None
algorithm: lhs
dbname: era5_matilda_example
parallel: False
With these settings we can start the psample()
to run our model with various parameter combinations. The default parameter boundaries can be found in the MATILDA parameter documentation. If you want to narrow down the parameter space, you can do that using the following syntax. Here, we define custom ranges for the temperature lapse rate and the precipitation correction factor and run a short Latin Hypercube Sampling (LHS) as an example.
Show code cell source
from matilda.mspot_glacier import psample
lim_dict = {'lr_temp_lo': -0.007, 'lr_temp_up': -0.005, 'PCORR_lo': 0.5, 'PCORR_up': 1.5}
best_summary = psample(df=era5, obs=obs, **psample_settings, **lim_dict)
Initializing the Latin Hypercube Sampling (LHS) with 10 repetitions
The objective function will be maximized
Starting the LHS algotrithm with 10 repetitions...
Creating LatinHyperCube Matrix
1 of 10, maximal objective function=-0.123435, time remaining: 00:00:12
Initialize database...
['csv', 'hdf5', 'ram', 'sql', 'custom', 'noData']
2 of 10, maximal objective function=-0.06829, time remaining: 00:00:14
3 of 10, maximal objective function=0.340261, time remaining: 00:00:14
4 of 10, maximal objective function=0.340261, time remaining: 00:00:12
5 of 10, maximal objective function=0.340261, time remaining: 00:00:10
6 of 10, maximal objective function=0.340261, time remaining: 00:00:08
7 of 10, maximal objective function=0.340261, time remaining: 00:00:05
8 of 10, maximal objective function=0.340261, time remaining: 00:00:03
9 of 10, maximal objective function=0.340261, time remaining: 00:00:00
10 of 10, maximal objective function=0.340261, time remaining: 23:59:57
*** Final SPOTPY summary ***
Total Duration: 28.68 seconds
Total Repetitions: 10
Maximal objective value: 0.340261
Corresponding parameter setting:
lr_temp: -0.00589245
lr_prec: 0.000291245
BETA: 4.60577
CET: 0.273043
FC: 460.917
K0: 0.122618
K1: 0.131749
K2: 0.0145173
LP: 0.518594
MAXBAS: 3.44291
PERC: 0.270948
UZL: 315.409
PCORR: 0.6446
TT_snow: -0.566121
TT_diff: 1.07428
CFMAX_snow: 1.87311
CFMAX_rel: 1.70443
SFCF: 0.573149
CWH: 0.144535
AG: 0.311545
CFR: 0.146147
******************************
WARNING: The selected algorithm lhs can either maximize or minimize the objective function. You can specify the direction by passing obj_dir to analyze_results(). The default is 'maximize'.
Best parameter set:
lr_temp=-0.005892454422404972, lr_prec=0.00029124454225974684, BETA=4.605769374855129, CET=0.27304282392950957, FC=460.9167939126607, K0=0.12261785632143476, K1=0.1317486252852554, K2=0.01451727506670979, LP=0.5185938102038344, MAXBAS=3.4429065955557165, PERC=0.2709482237637877, UZL=315.4087901113886, PCORR=0.644600068211386, TT_snow=-0.5661209862496375, TT_diff=1.0742768250311752, CFMAX_snow=1.8731100472578421, CFMAX_rel=1.7044317968894827, SFCF=0.573148791466686, CWH=0.1445349329739672, AG=0.31154507645846624, CFR=0.14614693866226292
Run number 2 has the highest objectivefunction with: 0.3403
Process-based calibration#
Every parameter governs a different aspect of the water balance, and not all parameters affect every calibration variable. Therefore, we propose an iterative, process-based calibration approach in which parameters are calibrated in order of their importance, using different algorithms, objective functions, and calibration variables. Details on the calibration strategy can be found in the model publication, which is currently under review. The individual steps will be implemented in this notebook soon.
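Even though the individual steps are not implemented here yet, the general idea can be sketched with the psample() interface introduced above: calibrate the parameters that dominate one calibration variable, fix them, and move on to the next variable. The step order, parameter groups, targets, and bound values below are purely illustrative placeholders chosen for this sketch and do not reproduce the published strategy; the actual calls are commented out because they would take far too long to run here.

```python
from matilda.mspot_glacier import psample, dict2bounds

# Step 1 (illustrative): tune the precipitation correction against the geodetic
# mass balance estimate derived above, with other parameters at default bounds.
step1_settings = dict(psample_settings, target_mb=mean_mb, target_swe=None)
step1_bounds = {'PCORR_lo': 0.5, 'PCORR_up': 1.5}              # placeholder bounds
# step1 = psample(df=era5, obs=obs, **step1_settings, **step1_bounds)

# Step 2 (illustrative): fix the step-1 result and tune the snow routine
# against the SWE reanalysis time series.
# fixed = dict2bounds({'PCORR': step1['best_param']['PCORR']})
# step2_settings = dict(psample_settings, target_mb=None, target_swe=swe)
# step2 = psample(df=era5, obs=obs, **step2_settings, **fixed)

# Step 3 (illustrative): calibrate the remaining runoff parameters against the
# observed hydrograph with everything calibrated so far held fixed.
# step3 = psample(df=era5, obs=obs, **psample_settings, **fixed)
```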

Run MATILDA with calibrated parameters#
The following parameter set was computed by applying the calibration strategy outlined above on an HPC cluster.
Show code cell source
param = {
'RFS': 0.15,
'SFCF': 1,
'CET': 0,
'PCORR': 0.58,
'lr_temp': -0.006,
'lr_prec': 0.0015,
'TT_diff': 0.76198,
'TT_snow': -1.44646,
'CFMAX_snow': 3.3677,
'BETA': 1.0,
'FC': 99.15976,
'K0': 0.01,
'K1': 0.01,
'K2': 0.15,
'LP': 0.998,
'MAXBAS': 2.0,
'PERC': 0.09232826,
'UZL': 126.411575,
'CFMAX_rel': 1.2556936,
'CWH': 0.000117,
'AG': 0.54930484
}
print('Calibrated parameter set:\n\n')
for key in param.keys(): print(key + ': ' + str(param[key]))
Calibrated parameter set:
RFS: 0.15
SFCF: 1
CET: 0
PCORR: 0.58
lr_temp: -0.006
lr_prec: 0.0015
TT_diff: 0.76198
TT_snow: -1.44646
CFMAX_snow: 3.3677
BETA: 1.0
FC: 99.15976
K0: 0.01
K1: 0.01
K2: 0.15
LP: 0.998
MAXBAS: 2.0
PERC: 0.09232826
UZL: 126.411575
CFMAX_rel: 1.2556936
CWH: 0.000117
AG: 0.54930484
Properly calibrated, the model shows much better results.
Show code cell source
output_matilda = matilda_simulation(era5, obs, **settings, **param)
---
MATILDA framework
Reading parameters for MATILDA simulation
Parameter set:
set_up_start 1998-01-01
set_up_end 1999-12-31
sim_start 2000-01-01
sim_end 2020-12-31
freq M
freq_long Monthly
lat 42.1828
area_cat 295.674842
area_glac 31.829413
ele_dat 3335.668841
ele_cat 3293.491688
ele_glac 4001.879883
ele_non_glac 3208.034151
warn False
CFR_ice 0.01
lr_temp -0.006
lr_prec 0.0015
TT_snow -1.44646
TT_diff 0.76198
CFMAX_snow 3.3677
CFMAX_rel 1.255694
BETA 1.0
CET 0
FC 99.15976
K0 0.01
K1 0.01
K2 0.15
LP 0.998
MAXBAS 2.0
PERC 0.092328
UZL 126.411575
PCORR 0.58
SFCF 1
CWH 0.000117
AG 0.549305
CFR 0.15
hydro_year 10
pfilter 0
TT_rain -0.68448
CFMAX_ice 4.228799
dtype: object
*-------------------*
Reading data
Set up period from 1998-01-01 to 1999-12-31 to set initial values
Simulation period from 2000-01-01 to 2020-12-31
Input data preprocessing successful
---
Initiating MATILDA simulation
Recalculating initial elevations based on glacier profile
>> Prior glacier elevation: 4001.8798828125m a.s.l.
>> Recalculated glacier elevation: 3951m a.s.l.
>> Prior non-glacierized elevation: 3208m a.s.l.
>> Recalculated non-glacierized elevation: 3214m a.s.l.
Calculating glacier evolution
*-------------------*
Running HBV routine
Finished spin up for initial HBV parameters
Finished HBV routine
*-------------------*
** Model efficiency based on Monthly aggregates **
KGE coefficient: 0.89
NSE coefficient: 0.83
RMSE: 20.94
MARE coefficient: 0.25
*-------------------*
avg_temp_catchment avg_temp_glaciers evap_off_glaciers \
count 6086.000 6086.000 6086.000
mean -3.490 -7.915 0.732
std 9.213 9.213 0.858
min -25.396 -29.821 0.000
25% -11.679 -16.105 0.000
50% -2.807 -7.229 0.233
75% 5.042 0.619 1.514
max 13.688 9.264 3.065
sum -21239.814 -48167.692 4452.837
prec_off_glaciers prec_on_glaciers rain_off_glaciers \
count 6086.000 6086.000 6086.000
mean 1.945 0.306 1.355
std 2.939 0.303 2.836
min 0.000 0.000 0.000
25% 0.000 0.106 0.000
50% 0.739 0.187 0.000
75% 2.706 0.385 1.496
max 31.136 3.331 31.136
sum 11835.937 1862.910 8245.379
snow_off_glaciers rain_on_glaciers snow_on_glaciers \
count 6086.000 6086.000 6086.000
mean 0.590 0.138 0.169
std 1.458 0.299 0.216
min 0.000 0.000 0.000
25% 0.000 0.000 0.000
50% 0.000 0.000 0.101
75% 0.331 0.147 0.212
max 15.462 3.331 1.936
sum 3590.557 837.237 1025.673
snowpack_off_glaciers ... total_refreezing SMB \
count 6086.000 ... 6086.000 6086.000
mean 67.800 ... 0.026 -1.306
std 67.907 ... 0.067 7.221
min 0.000 ... 0.000 -36.090
25% 0.000 ... 0.000 -1.906
50% 57.622 ... 0.000 0.988
75% 122.659 ... 0.009 2.094
max 309.720 ... 0.486 18.739
sum 412628.425 ... 157.861 -7945.570
actual_evaporation total_precipitation total_melt \
count 6086.000 6086.000 6086.000
mean 0.515 2.251 0.937
std 0.595 3.242 2.569
min 0.000 0.000 0.000
25% 0.000 0.106 0.000
50% 0.168 0.924 0.000
75% 1.100 3.092 0.847
max 2.011 34.467 22.866
sum 3134.853 13698.846 5703.558
runoff_without_glaciers runoff_from_glaciers runoff_ratio \
count 6086.000 6086.000 6086.000
mean 1.439 0.439 4.681
std 1.006 0.822 7.825
min 0.000 0.000 0.000
25% 0.526 0.000 0.466
50% 1.177 0.000 1.482
75% 2.232 0.525 5.215
max 4.725 5.314 59.982
sum 8756.219 2670.634 28491.045
total_runoff observed_runoff
count 6086.000 6086.000
mean 1.878 2.003
std 1.630 1.719
min 0.000 0.234
25% 0.526 0.701
50% 1.191 1.125
75% 3.169 3.097
max 8.292 8.971
sum 11426.854 12190.326
[9 rows x 28 columns]
End of the MATILDA simulation
---



In addition to the standard plots, we can explore the results in interactive Plotly plots. Go ahead and zoom as you like or select/deselect individual curves.
Show code cell source
output_matilda[9].show()
The same works for the long-term seasonal cycle.
Show code cell source
output_matilda[10].show()
Reducing the parameter space - Sensitivity analysis with FAST#
To reduce the computation time of the calibration procedure, we need to reduce the number of parameters to be optimized. Therefore, we will perform a global sensitivity analysis to identify the most important parameters and set the others to default values. The algorithm of choice will be the Fourier Amplitude Sensitivity Test (FAST), available through the SPOTPY library. As before, we will show the general procedure with a few iterations but display results from extensive runs on an HPC. You can use the results as a guide for your parameter choices, but keep in mind that they depend strongly on your catchment properties, such as elevation range and glacier coverage.
First, we calculate the required number of iterations using a formula from Henkel et al. (2012). We choose the SPOTPY default frequency step of 2 and set the interference factor to a maximum of 4, since we assume a high intercorrelation of the model parameters. The total number of parameters in MATILDA is 21.
Show code cell source
def fast_iter(param, interf=4, freqst=2):
"""
Calculates the number of parameter iterations needed for parameterization and sensitivity analysis using FAST.
Parameters
----------
param : int
The number of input parameters being analyzed.
interf : int
The interference factor, which determines the degree of correlation between the input parameters.
freqst : int
The frequency step, which specifies the size of the intervals between each frequency at which the Fourier transform is calculated.
Returns
-------
int
The total number of parameter iterations needed for FAST.
"""
return (1 + 4 * interf ** 2 * (1 + (param - 2) * freqst)) * param
print('Needed number of iterations for FAST: ' + str(fast_iter(21)))
Needed number of iterations for FAST: 52437
That is a lot of iterations! Running this routine on a single core would take about two days, but can be sped up significantly with each additional core. The setup would look exactly like the parameter optimization before.
Note: No matter what number of repetitions you define, SPOTPY will conduct a multiple of \(k\) model runs, where \(k\) is the number of model parameters. So even if we set rep=10
, the algorithm will run at least 21 times.
Show code cell source
from matilda.mspot_glacier import psample
fast_settings = {'rep': 52437,                          # Choose wisely before running
                 'target_mb': None,
                 'algorithm': 'fast',
                 'dbname': dir_output + 'fast_example', # database name and path
                 'dbformat': 'csv'
                 }
psample_settings.update(fast_settings)
print('Settings for FAST:\n\n')
for key in psample_settings.keys(): print(key + ': ' + str(psample_settings[key]))
# fast_results = psample(df=era5, obs=obs, **psample_settings)
Settings for FAST:
area_cat: 295.67484249904464
area_glac: 31.829413146585885
ele_cat: 3293.491688025922
ele_dat: 3335.668840874115
ele_glac: 4001.8798828125
elev_rescaling: True
freq: M
lat: 42.18280043250193
set_up_end: 1999-12-31
set_up_start: 1998-01-01
sim_end: 2020-12-31
sim_start: 2000-01-01
glacier_profile: Elevation Area WE EleZone norm_elevation delta_h
0 1970 0.000000 0.0000 1900 1.000000 1.003442
1 2000 0.000000 0.0000 2000 0.989324 0.945813
2 2100 0.000000 0.0000 2100 0.953737 0.774793
3 2200 0.000000 0.0000 2200 0.918149 0.632696
4 2300 0.000000 0.0000 2300 0.882562 0.515361
.. ... ... ... ... ... ...
156 4730 0.000023 20721.3700 4700 0.017794 -0.000265
157 4740 0.000013 14450.2180 4700 0.014235 -0.000692
158 4750 0.000006 10551.4730 4700 0.010676 -0.001119
159 4760 0.000000 0.0000 4700 0.007117 -0.001546
160 4780 0.000002 6084.7456 4700 0.000000 -0.002400
[161 rows x 6 columns]
rep: 52437
glacier_only: False
obj_dir: maximize
target_mb: None
target_swe: SWE_Mean
Date
1999-10-01 0.0003
1999-10-02 0.0002
1999-10-03 0.0001
1999-10-04 0.0001
1999-10-05 0.0000
... ...
2017-09-26 0.0020
2017-09-27 0.0021
2017-09-28 0.0020
2017-09-29 0.0022
2017-09-30 0.0023
[6575 rows x 1 columns]
swe_scaling: 0.928
dbformat: csv
output: None
algorithm: fast
dbname: output/fast_example
parallel: False
We ran FAST for the example catchment with the full number of required iterations. The results were saved in CSV files. We can use the spotpy.analyser module to create easy-to-read data frames from the databases. The summary shows the first-order (S1) and the total-order sensitivity index (ST) for each parameter. S1 refers to the variance of the model output explained by a parameter alone, holding all other parameters constant. ST also accounts for the interaction between parameters and is therefore a good measure of the impact of individual parameters on the model output.
Show code cell source
import spotpy
import os
import contextlib
def get_si(fast_results: str, to_csv: bool = False) -> pd.DataFrame:
"""
Computes the sensitivity indices of a given FAST simulation results file.
Parameters
----------
fast_results : str
The path of the FAST simulation results file.
to_csv : bool, optional
If True, the sensitivity indices are saved to a CSV file with the same
name as fast_results, but with '_sensitivity_indices.csv' appended to
the end (default is False).
Returns
-------
pd.DataFrame
A pandas DataFrame containing the sensitivity indices and parameter
names.
"""
if fast_results.endswith(".csv"):
fast_results = fast_results[:-4] # strip .csv
results = spotpy.analyser.load_csv_results(fast_results)
# Suppress prints
with contextlib.redirect_stdout(open(os.devnull, 'w')):
SI = spotpy.analyser.get_sensitivity_of_fast(results, print_to_console=False)
parnames = spotpy.analyser.get_parameternames(results)
sens = pd.DataFrame(SI)
sens['param'] = parnames
sens.set_index('param', inplace=True)
if to_csv:
sens.to_csv(os.path.basename(fast_results) + '_sensitivity_indices.csv', index=False)
return sens
display(get_si(dir_input + 'FAST/' + 'example_fast_nolim.csv'))
| param | S1 | ST |
|---|---|---|
| lr_temp | 0.001027 | 0.028119 |
| lr_prec | 0.000068 | 0.020323 |
| BETA | 0.000406 | 0.044880 |
| CET | 0.000183 | 0.061767 |
| FC | 0.000177 | 0.070427 |
| K0 | 0.000128 | 0.056534 |
| K1 | 0.000611 | 0.096560 |
| K2 | 0.000564 | 0.073538 |
| LP | 0.000296 | 0.087176 |
| MAXBAS | 0.000166 | 0.045120 |
| PERC | 0.000079 | 0.051002 |
| UZL | 0.000129 | 0.056398 |
| PCORR | 0.011253 | 0.857075 |
| TT_snow | 0.000339 | 0.087251 |
| TT_diff | 0.000221 | 0.064408 |
| CFMAX_ice | 0.001746 | 0.121897 |
| CFMAX_rel | 0.000108 | 0.083392 |
| SFCF | 0.007414 | 0.137477 |
| CWH | 0.000612 | 0.063503 |
| AG | 0.000275 | 0.116503 |
| RFS | 0.000089 | 0.098914 |
If you have additional information on certain parameters, limiting their bounds can have a large impact on the sensitivities. For our example catchment, field observations showed that the temperature lapse rate and the precipitation correction were unlikely to exceed a certain range, so we limited the parameter space for both and ran FAST again.
Show code cell source
lim_dict = {'lr_temp_lo': -0.007, 'lr_temp_up': -0.005, 'PCORR_lo': 0.5, 'PCORR_up': 1.5}
# fast_results = psample(df=era5, obs=obs, **psample_settings, **lim_dict)
To see the effect of parameter restrictions on sensitivity, we can plot the indices of both runs. Feel free to explore further by adding more FAST outputs to the plot function.
Show code cell source
import plotly.graph_objs as go
import pandas as pd
import plotly.io as pio
def plot_sensitivity_bars(*dfs, labels=None, show=False, bar_width=0.3, bar_gap=0.6):
"""
Plots a horizontal bar chart showing the total sensitivity index for each parameter in a MATILDA model.
Parameters
----------
*dfs : pandas.DataFrame
Multiple dataframes containing the sensitivity indices for each parameter.
labels : list of str, optional
Labels to use for the different steps in the sensitivity analysis. If not provided, the default
labels will be 'No Limits', 'Step 1', 'Step 2', etc.
bar_width : float, optional
Width of each bar in the chart.
bar_gap : float, optional
Space between bars.
"""
traces = []
colors = ['darkblue', 'orange', 'purple', 'cyan'] # add more colors if needed
for i, df in enumerate(dfs):
df = get_si(df)
if i > 0:
if labels is None:
label = 'Step ' + str(i)
else:
label = labels[i]
else:
label = 'No Limits'
trace = go.Bar(y=df.index,
x=df['ST'],
name=label,
orientation='h',
marker=dict(color=colors[i]),
width=bar_width)
traces.append(trace)
layout = go.Layout(title=dict(text='<b>' +'Total Sensitivity Index for MATILDA Parameters' + '<b>', font=dict(size=24)),
xaxis_title='Total Sensitivity Index',
yaxis_title='Parameter',
yaxis=dict(automargin=True),
bargap=bar_gap,
height=700)
fig = go.Figure(data=traces, layout=layout)
if show:
fig.show()
step1 = dir_input + 'FAST/' + 'example_fast_nolim.csv'
step2 = dir_input + 'FAST/' + 'era5_ipynb_fast_step1.csv'
plot_sensitivity_bars(step1, step2, labels=['No Limits', 'lr_temp and PCORR limited'], show=True)
From this figure, we can easily identify the most important parameters. Depending on your desired accuracy and computational resources, you can choose a sensitivity threshold. Let’s say we want to fix all parameters with a sensitivity index below 0.1. For our example, this will reduce the calibration parameters to 7.
Show code cell source
si_df = get_si(step2)
sensitive = si_df.index[si_df['ST'] > 0.10].values
print('Parameters with a sensitivity index > 0.1:')
for i in sensitive: print(i)
Parameters with a sensitivity index > 0.1:
K1
K2
LP
PCORR
TT_snow
CFMAX_ice
SFCF
To give you an idea of how much this procedure can reduce the calibration time, we run the fast_iter() function again.
Show code cell source
print('Needed number of iterations for FAST with all parameters: ' + str(fast_iter(21)))
print('Needed number of iterations for FAST with the 7 most sensitive parameters: ' + str(fast_iter(7)))
Needed number of iterations for FAST with all parameters: 52437
Needed number of iterations for FAST with the 7 most sensitive parameters: 4935
For a single core this would roughly translate into a reduction from 44h to 4h.
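As a rough sanity check on these numbers, we can redo the arithmetic: the quoted 44 h for 52,437 runs implies roughly 3 s per model run (an assumption that will vary with your machine and settings), and the wall-clock time scales linearly with the number of iterations and inversely with the number of cores. The sketch below reuses the fast_iter() helper defined above.

```python
def fast_runtime(n_params, sec_per_run=3.0, cores=1):
    """Rough FAST wall-clock estimate in hours, assuming ~3 s per model run
    and perfect parallel scaling."""
    return fast_iter(n_params) * sec_per_run / cores / 3600

print(f'21 parameters, 1 core:   {fast_runtime(21):.1f} h')
print(f' 7 parameters, 1 core:   {fast_runtime(7):.1f} h')
print(f' 7 parameters, 20 cores: {fast_runtime(7, cores=20):.1f} h')
```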
To run the calibration with the reduced parameter space, we need to define values for the fixed parameters and use a helper function to translate them into boundary arguments. Here, we simply use the values from our last SPOTPY run (param).
Show code cell source
from matilda.mspot_glacier import dict2bounds
fixed_param = {key: val for key, val in param.items() if key not in sensitive}
fixed_param_bounds = dict2bounds(fixed_param)
print('Fixed bounds:\n')
for key in fixed_param_bounds.keys(): print(key + ': ' + str(fixed_param_bounds[key]))
Fixed bounds:
RFS_lo: 0.15
CET_lo: 0
lr_temp_lo: -0.006
lr_prec_lo: 0.0015
TT_diff_lo: 0.76198
CFMAX_snow_lo: 3.3677
BETA_lo: 1.0
FC_lo: 99.15976
K0_lo: 0.01
MAXBAS_lo: 2.0
PERC_lo: 0.09232826
UZL_lo: 126.411575
CFMAX_rel_lo: 1.2556936
CWH_lo: 0.000117
AG_lo: 0.54930484
RFS_up: 0.15
CET_up: 0
lr_temp_up: -0.006
lr_prec_up: 0.0015
TT_diff_up: 0.76198
CFMAX_snow_up: 3.3677
BETA_up: 1.0
FC_up: 99.15976
K0_up: 0.01
MAXBAS_up: 2.0
PERC_up: 0.09232826
UZL_up: 126.411575
CFMAX_rel_up: 1.2556936
CWH_up: 0.000117
AG_up: 0.54930484
The psample()
setup is then as simple as before.
Show code cell source
new_settings = {'rep': 10, # Number of model runs. For advice check the documentation of the algorithms.
'glacier_only': False, # True when calibrating a entirely glacierized catchment
'obj_dir': 'maximize', # should your objective funtion be maximized (e.g. NSE) or minimized (e.g. RMSE)
'target_mb': -156, # Average annual glacier mass balance to target at
'dbformat': None, # Write the results to a file ('csv', 'hdf5', 'ram', 'sql')
'output': None, # Choose where to store the files
'algorithm': 'lhs', # Choose algorithm (for parallelization: mc, lhs, fast, rope, sceua or demcz)
'dbname': 'era5_matilda_example', # Choose name
'parallel': False, # Distribute the calculation on multiple cores or not
# 'cores': 20 # Set number of cores when running parallel
}
psample_settings.update(new_settings)
best_summary = psample(df=era5, obs=obs, **psample_settings, **fixed_param_bounds)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[23], line 15
1 new_settings = {'rep': 10, # Number of model runs. For advice check the documentation of the algorithms.
2 'glacier_only': False, # True when calibrating a entirely glacierized catchment
3 'obj_dir': 'maximize', # should your objective funtion be maximized (e.g. NSE) or minimized (e.g. RMSE)
(...)
11 # 'cores': 20 # Set number of cores when running parallel
12 }
13 psample_settings.update(new_settings)
---> 15 best_summary = psample(df=era5, obs=obs, **psample_settings, **fixed_param_bounds)
File ~/Seafile/Ana-Lena_Phillip/data/matilda/matilda/mspot_glacier.py:1609, in psample(df, obs, rep, output, dbname, dbformat, obj_func, opt_iter, fig_path, set_up_start, set_up_end, sim_start, sim_end, freq, lat, area_cat, area_glac, ele_dat, ele_glac, ele_cat, glacier_profile, interf, freqst, parallel, cores, save_sim, elev_rescaling, glacier_only, obs_type, target_mb, target_swe, swe_scaling, algorithm, obj_dir, fix_param, fix_val, demcz_args, **kwargs)
1590 setup = spot_setup_glacier(
1591 set_up_start=set_up_start,
1592 set_up_end=set_up_end,
(...)
1605 **kwargs,
1606 )
1608 else:
-> 1609 setup = spot_setup(
1610 set_up_start=set_up_start,
1611 set_up_end=set_up_end,
1612 sim_start=sim_start,
1613 sim_end=sim_end,
1614 freq=freq,
1615 area_cat=area_cat,
1616 area_glac=area_glac,
1617 ele_dat=ele_dat,
1618 ele_glac=ele_glac,
1619 ele_cat=ele_cat,
1620 lat=lat,
1621 interf=interf,
1622 freqst=freqst,
1623 glacier_profile=glacier_profile,
1624 elev_rescaling=elev_rescaling,
1625 target_mb=target_mb,
1626 fix_param=fix_param,
1627 fix_val=fix_val,
1628 target_swe=target_swe,
1629 swe_scaling=swe_scaling,
1630 **kwargs,
1631 )
1633 psample_setup = setup(
1634 df, obs, target_swe, obj_func
1635 ) # Define custom objective function using obj_func=
1636 alg_selector = {
1637 "mc": spotpy.algorithms.mc,
1638 "sceua": spotpy.algorithms.sceua,
(...)
1651 "nsgaii": spotpy.algorithms.nsgaii,
1652 }
TypeError: spot_setup() got an unexpected keyword argument 'RFS_lo'
Now, we save the best parameter set for use in the next notebook. The set is stored in the best_summary
variable.
Note: In our example we will skip this step and use the result from the calibration on an HPC cluster.
Show code cell source
# param = best_summary['best_param']
Finally, we store the parameter set in a .yml
file.
Show code cell source
write_yaml(param, dir_output + 'parameters.yml')
print(f"Parameter set stored in '{dir_output}parameters.yml'")
Show code cell source
import shutil
# refresh `output_download.zip` with data retrieved within this notebook
shutil.make_archive('output_download', 'zip', 'output')
print('Output folder can be downloaded now (file output_download.zip)')
Show code cell source
%reset -f