River flooding is one of the most frequent and devastating weather-related catastrophes, causing thousands of fatalities, displacing millions of people and causing billions of dollars in economic losses every year. Floods also degrade ecosystems and sometimes destroy cultural heritage. Most river flooding is caused by high-intensity and prolonged seasonal precipitation. Sometimes, fluvial flooding is worsened or even caused by human actions, such as deforestation or land-use change (e.g., urbanization). In addition, human-influenced global warming is leading to continental-scale changes in observed flood discharge (Blöschl et al. 2019). With further global warming, floods are expected to become even more destructive and frequent, with severe implications for the environment and society (IPCC 2014). To investigate these future changes in flood magnitude, we can use a global flood modeling chain.
What is a global flood modeling chain and what is it good for?
A global flood modeling chain is a series of computer models used to estimate the societal and economic risks of continental-scale river flooding, now and in the future. The first part of the modeling chain (Figure 1) consists of meteorological data (climate forcing), which is provided either for the past by a climate reanalysis of historical observations or for the future by global climate models. The meteorological data, such as precipitation, serve as input for global hydrological models, which compute the corresponding surface runoff. This runoff is then routed through the continental river network of a global flood model, which finally simulates gridded flood extent and water depth. For more detailed information on river flood modeling, please read this ISIPEDIA article. With such a global flood modeling chain we can simulate historical flood events all over the world. This allows us, for example, to estimate changes in vulnerability or to attribute trends in reported flood damages. By using projections of future climate, we can also anticipate the effect of climate change on fluvial floods.
What do we know about the uncertainty of the modeling chain?
There are still large uncertainties associated with the global flood modeling chain. For example, a comparison of several state-of-the-art global flood models in a case study of three African river sections revealed considerable differences in how accurately observed flood events are reproduced (Bernhofen et al. 2018). The reasons include, among many others, differences in the approaches of the global flood models as well as uncertainties and differences in the underlying datasets. Here, we investigate how the choice of climate forcing and global hydrological model influences the simulated flood extent. Within ISIMIP, multiple climate reanalyses and multiple global hydrological models have been used to force the global flood model CaMa-Flood. However, we do not know to what degree these input combinations are interchangeable, or whether some are superior to others.
So how do we determine the uncertainty regarding the forcing data?
To do so, we prepared a case study of eight historical flood events covering a variety of climatic and hydraulic characteristics (Figure 2). In total, 33 input combinations (three climate reanalysis datasets combined with eleven global hydrological models, all part of ISIMIP) were routed through the global flood model CaMa-Flood, and their performance in simulating flood extent was evaluated against satellite imagery (see Figure 3, left).

Overview of the countries in which the study areas are located: (1) Guatemala, (2) Bolivia, (3) Nigeria, (4) Mozambique, (5) Pakistan, (6) China, (7) Thailand and (8) Australia. Country shapes by www.gadm.org
FIG 2 / Image credit: Mester et al. (2021)

Left: Flooded areas derived from satellite imagery (turquoise) in the study area of Phimai (dashed yellow box) in Thailand in 2010. Right: Satellite imagery (gray) superimposed with the flood extent simulated by the global flood model CaMa-Flood (blue).
FIG 3 / Image credit: Mester et al. (2021)
Model agreement maps are generated by superimposing the flood extent simulated by CaMa-Flood on the extent observed in NASA MODIS satellite imagery (Figure 3, right). This allows us to visually assess spatial differences and identify uncertainty “hotspots”. In addition, two spatial performance metrics quantify the performance numerically. The Critical Success Index (CSI) describes how well the global flood model reproduces the flood extent derived from satellite imagery. It is calculated as the ratio of the intersection area to the union area of the modeled and observed flooded areas. The CSI ranges from 0 to 1, where 1 represents a perfect model “fit” (Sampson et al. 2015). The Bias Score, on the other hand, indicates whether the flood modeling chain tends to over- or underpredict flood extent. A positive score means overprediction and a negative score means underprediction; a Bias Score of zero means that the simulated and observed total flooded areas are equal.
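The two metrics can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study. The function name `flood_scores` is hypothetical, the grids are plain nested lists of 0/1 values, and every cell is assumed to cover the same area. The Bias Score is written here as modeled area divided by observed area, minus one. That is one formula consistent with the description above; the exact definition is not spelled out in this article, so treat it as an assumption.

```python
def flood_scores(modeled, observed):
    """Return (CSI, Bias Score) for two equally sized 0/1 flood grids.

    Assumptions (not taken from the study): all cells have equal area,
    and the Bias Score is modeled area / observed area - 1.
    """
    hits = false_alarms = misses = 0
    for row_m, row_o in zip(modeled, observed):
        for m, o in zip(row_m, row_o):
            if m and o:
                hits += 1            # flooded in both model and observation
            elif m:
                false_alarms += 1    # flooded in the model only
            elif o:
                misses += 1          # flooded in the observation only

    union = hits + false_alarms + misses
    modeled_area = hits + false_alarms
    observed_area = hits + misses

    # Intersection over union; undefined if nothing is flooded anywhere.
    csi = hits / union if union else float("nan")
    # Zero when the total flooded areas match, positive for overprediction.
    bias = modeled_area / observed_area - 1 if observed_area else float("nan")
    return csi, bias


# Toy grids, purely illustrative.
modeled = [[1, 1, 0],
           [0, 1, 0]]
observed = [[1, 0, 0],
            [0, 1, 1]]
csi, bias = flood_scores(modeled, observed)
```

In this toy example the modeled and observed flooded areas are equal, so the Bias Score is zero, yet the CSI is only 0.5 because the flooded cells overlap only partly. This is why the two metrics are read together.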
Finally, we also included the FLOPROS database (Scussolini et al. 2016) in an additional run to test the effect of accounting for known or estimated flood protection levels.
What are the results of our case study?
Figure 4 shows an example of the model agreement maps for a flood event that occurred in 2010 at Phimai, located in the Mun River Basin in Thailand. For each of the three climate reanalyses, a map was created by superimposing the results of the eleven global hydrological models on the satellite imagery. Most flooded parts are simulated by all combinations, represented by orange / red pixels. However, for some parts the agreement among the global hydrological models is rather low (blue / green pixels), and for others no model simulated any flooding at all (only the gray satellite image is visible). The differences among the climate reanalysis datasets are small, with only a few exceptions, e.g., the north-east (top-right) part of the flooded area.

Model agreement maps indicating the flood extent overlap between the 11 global hydrological models and the satellite data for the study region of Phimai in Thailand and three climate forcings (columns). The cell color represents the number of global hydrological models that computed the corresponding cell to be flooded. The underlying flood extent of the satellite imagery (light gray) is assigned a dark color tone where it matches at least one global hydrological model.
FIG 4 / Image credit: Mester et al. (2021)
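The counting behind such an agreement map can be sketched as follows. This is a hedged illustration under simple assumptions: each model run has already been converted to a 0/1 flood mask on the same grid as the satellite-derived mask, and the function name `agreement_map` and the three toy masks are hypothetical.

```python
def agreement_map(model_masks, observed):
    """Count, per cell, how many model runs simulate flooding.

    Returns (counts, missed): `counts` holds the number of models that flag
    each cell as flooded; `missed` marks cells that are flooded in the
    observation but in none of the models.
    """
    rows, cols = len(observed), len(observed[0])
    counts = [
        [sum(1 for mask in model_masks if mask[r][c]) for c in range(cols)]
        for r in range(rows)
    ]
    missed = [
        [bool(observed[r][c]) and counts[r][c] == 0 for c in range(cols)]
        for r in range(rows)
    ]
    return counts, missed


# Three made-up model masks on a 2 x 2 grid, purely illustrative.
masks = [
    [[1, 1], [0, 0]],
    [[1, 0], [0, 0]],
    [[1, 1], [0, 1]],
]
observed = [[1, 1], [1, 0]]
counts, missed = agreement_map(masks, observed)
```

High counts correspond to the orange / red pixels where most models agree, low counts to the blue / green pixels, and the `missed` cells to places where only the satellite image shows flooding.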
The CSI scores for all study regions under the climate forcing GSWP3 are displayed in Figure 5. Each row corresponds to one region and each column to one global hydrological model. Additionally, the median, minimum and maximum CSI score for every region are shown. The CSIs within the orange-dashed box around Phimai (THA) confirm numerically that the differences between the global hydrological models are small, which also holds true for most of the other regions. For Chemba (MOZ), however, the spread among the global hydrological models is high (see last column “Spread GHMs”), which can be explained by the result of the global hydrological model (GHM) VIC: this particular model simulates no flooding at all in the study region. Despite its poor performance for this region, it is not advisable to exclude this model from future analyses, as it generally shows average performance and is even the best model for the region of Dalby (AUS).

Critical success index (CSI) scores for all combinations of global hydrological models and the climate forcing GSWP3. The “Median Region” across the even number of regions is calculated as the mean of the two middle values. The floods in Lokoja (NGA) and Idah (NGA), which happened in 2012, were excluded from the computation as data for GSWP3 and WFDEI was available only until 2010. A black box indicates the best-performing global hydrological model(s) for a given region.
FIG 5 / Image credit: Mester et al. (2021)
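The summary columns of such a score table can be reproduced with a short sketch. The assumptions are that the scores for one region are available as a mapping from model name to CSI, and that the spread is taken as maximum minus minimum (the article does not define it). The function name `summarize_region`, the model names and the numbers are placeholders, not results from the study. Python's `statistics.median` averages the two middle values for an even number of entries, which matches the rule described in the caption.

```python
from statistics import median


def summarize_region(csi_by_ghm):
    """Summarize one region's CSI scores, given as {model name: CSI}."""
    values = list(csi_by_ghm.values())
    lowest, highest = min(values), max(values)
    return {
        "median": median(values),      # mean of the two middle values if even
        "min": lowest,
        "max": highest,
        "spread": highest - lowest,    # assumed definition: max minus min
        # Several models can tie for the best score.
        "best_models": sorted(
            name for name, value in csi_by_ghm.items() if value == highest
        ),
    }


# Made-up scores for four placeholder models; not results from the study.
example = {"ghm_a": 0.5, "ghm_b": 0.25, "ghm_c": 0.0, "ghm_d": 0.75}
summary = summarize_region(example)
```

A single model that simulates no flooding, like `ghm_c` here, widens the spread considerably while the median shifts far less.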
Figure 6 shows a summary of the CSI (left plot) and Bias Score (right plot) results. For every climate reanalysis, the median score across all global hydrological models was first calculated for each region; the boxplots show the distribution of those regional median scores. For the “default” runs without flood protection, the differences among the regions are large and the differences among the climate reanalysis datasets are small, with better CSI scores on average for GSWP3 and WFDEI. However, runs with these two climate forcings also tend to overestimate the flood extent, as indicated by higher, positive Bias Scores. Including flood protection measures (“protect”) leads to an underestimation of flood extent. Performance is thus degraded for many regions, as indicated by negative Bias Scores and lower CSI. This may be explained by the fact that the protection levels are purely model-based for most of the study areas, and no information about actual design standards or corresponding policy regulations was available to inform the estimates provided in the FLOPROS database.

Comparison of CSI and Bias Scores between the default setting (“default”) and the inclusion of spatially explicit flood protection levels from FLOPROS (“protect”) for the climate forcings PGFv2, GSWP3 and WFDEI. Each boxplot covers the median scores of all study regions. The distribution of the results is displayed in four equal-sized groups, framed by the lower whisker, the box bottom, the central line, the box top and the upper whisker. The regions Lokoja (NGA) and Idah (NGA) were excluded from the computation.
FIG 6 / Image credit: Mester et al. (2021)
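The two-step aggregation behind each boxplot can be sketched as below: first the median across models for each region, then a five-number summary across regions. Several things here are assumptions rather than details from the study: the whiskers are taken as minimum and maximum (in line with the "four equal-sized groups" in the caption), quartiles use the "inclusive" method of Python's `statistics.quantiles` (plotting libraries may use other conventions), and the function name, region names and scores are made up.

```python
from statistics import median, quantiles


def boxplot_summary(scores_by_region):
    """Five-number summary of regional median scores.

    `scores_by_region` maps a region name to the list of scores obtained
    with the different hydrological models for one climate forcing.
    """
    # Step 1: one median score per region, taken across models.
    regional = sorted(median(scores) for scores in scores_by_region.values())
    # Step 2: quartiles across regions (quartile convention is an assumption).
    q1, q2, q3 = quantiles(regional, n=4, method="inclusive")
    return {
        "lower_whisker": regional[0],
        "box_bottom": q1,
        "central_line": q2,
        "box_top": q3,
        "upper_whisker": regional[-1],
    }


# Made-up scores for six placeholder regions; not results from the study.
toy_scores = {
    "region_1": [0.0, 0.125, 0.25],
    "region_2": [0.25, 0.25, 0.5],
    "region_3": [0.25, 0.5, 0.75],
    "region_4": [0.5, 0.5, 0.5],
    "region_5": [0.5, 0.75, 1.0],
    "region_6": [0.75, 0.875, 1.0],
}
box = boxplot_summary(toy_scores)
```

Taking the median across models first keeps a single outlying model from dominating a region, so the box mainly reflects differences between regions.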
What are the takeaways from this study?
A global flood modeling chain is a useful, and often indispensable, tool for estimating the societal and economic risk of river flooding, both for the past and the future. In our case study, we showed that for most regions the performance of the global flood modeling chain is relatively insensitive to the choice of the underlying climate reanalysis and global hydrological model. However, for some regions mutually dependent effects can be detected, and individual global hydrological models lead to much lower agreement with satellite observations than the others. The climate reanalysis PGFv2 performs worse than the other two climate forcing datasets for many regions and global hydrological models, but better for some. Surprisingly, the inclusion of spatially explicit flood protection measures worsens the modeling results, which will be investigated further in the future. Since no clear priority can be assigned to any forcing combination, we recommend a multi-model, multi-forcing approach for future studies, especially when there is no prior knowledge about the performance of a particular combination for a specific type of region.
References
Affiliations
1 Potsdam Institute for Climate Impact Research, Member of the Leibniz Association, Potsdam, Germany
2 Potsdam University, Potsdam, Germany