The influence of social and economic change on the consequences of natural
hazards has been a matter of much interest recently. However, there is a lack
of comprehensive, high-resolution data on historical changes in land use,
population, or assets available to study this topic. Here, we present the
Historical Analysis of Natural Hazards in Europe (HANZE) database, which
contains two parts: (1) HANZE-Exposure with maps for 37 countries and
territories from 1870 to 2020 at 100 m resolution and (2) HANZE-Events, a
compilation of past disasters with information on dates, locations, and
losses, currently limited to floods only. The database was constructed using
high-resolution maps of present land use and population, a large compilation
of historical statistics, and relatively simple disaggregation techniques and
rule-based land use reallocation schemes. Data encompassed in HANZE allow one
to “normalize” information on losses due to natural hazards by taking into
account inflation as well as changes in population, production, and wealth.
This database of past events currently contains 1564 records (1870–2016) of
flash, river, coastal, and compound floods. The HANZE database is freely
available at
Natural hazards take place when recurring extremes of the Earth's environment collide with human activities. Beyond the natural or anthropogenic changes to the environment, the extent of those activities has profound effects on the consequences of disasters. Even in a span of a few decades, social, economic, and technological developments drive the constant evolution of exposure and vulnerability to hazards. Therefore, there is growing interest in how much the number of persons and assets at risk has changed over time worldwide (Jongman et al., 2012; Kummu et al., 2016; Schumacher and Strobl, 2011), and what consequences those findings have for observed trends in losses related to natural hazards (Bouwer, 2011; Bouwer et al., 2007; Daniell et al., 2011; Munich Re, 2016; Schiermeier, 2006).
Floods in Europe have received particular attention (Barredo, 2007). Barredo (2009) found that correcting reported flood losses for inflation and economic growth yields no trend for 1970–2006, in contrast to the steep rise in originally reported losses. Similar findings were presented for the United Kingdom, covering the years 1884–2013 (Stevens et al., 2016). Other studies on trends in flood exposure were carried out, e.g. for Austria (Fuchs et al., 2015b), Italy (Domeneghetti et al., 2015), the Netherlands (Jongman et al., 2014), Spain (Barredo et al., 2012), Switzerland (Röthlisberger et al., 2016), and the United Kingdom (Stevens et al., 2015). The importance of population and economic growth, but also land use distribution, has been emphasised (Boudou et al., 2016; Sofia et al., 2017). At the same time, information on past flood losses is being collected in national (Guzzetti and Tonelli, 2004; Haigh et al., 2015) and international databases (Brakenridge, 2017; Guha-Sapir et al., 2017; Munich Re, 2017), including data collected as part of European Union-mandated preliminary flood risk assessments (European Environment Agency, 2015).
Workflow of the HANZE database from input data sets to final exposure maps and flood events database, and an example of how the two components interact to derive normalized flood losses.
However, there are several limitations of the aforementioned studies and databases. Exposure data sets were derived at a variety of spatial and temporal resolutions with different thematic coverage. Within a given country, typically one series of population, gross domestic product (GDP), housing stock, or other variable was used to normalize reported flood losses. This approach neglects substantial variation in development within countries. Also, the availability of past flood damage information is very uneven between countries, and international databases only provide reasonable coverage beginning in the 1980s. The timespan of the studies on exposure is usually limited to the most recent decades, given the lack of adequate data.
A typical source of gridded historical population and land use is HYDE (Klein
Goldewijk et al., 2017), which has a 5
Drawing from recent developments in pan-European demographic and land use mapping, as well as new studies on historical changes in population, production, and wealth, we seek to address the aforementioned weaknesses with a new comprehensive data set. Historical Analysis of Natural Hazards in Europe (HANZE) is a database enabling the study of historical trends and driving factors of vulnerability to natural hazards, with a particular focus on floods. It has two components, namely HANZE-Exposure and HANZE-Events. HANZE-Exposure consists of high-resolution gridded data with information on land use, population, production, and wealth per 100 m grid cell from 1870 to 2020. It allows one to derive potential damages for any past natural hazard with a defined spatial extent. The other component, HANZE-Events, contains information on location, time, and quantitative data on consequences of past natural disasters, currently limited to floods (1870–2016). It is supplemented by economic data necessary for converting nominal monetary losses into a single benchmark. HANZE covers 37 European countries and territories constituting approximately 70 % of the continent's population (Eurostat, 2017). The composition of the domain is detailed in Supplementary File 1 and Fig. S1 in this file.
As presented in Fig. 1, the starting points for constructing the HANZE-Exposure database were a gridded land cover/use map (100 m resolution) and a population map (1 km resolution), both covering the situation in Europe ca. 2011. Based on previously published methods, demographic and economic data were disaggregated to 100 m resolution, and changes in historical land use and population were modelled utilizing a large compilation of historical statistics at the regional level. HANZE-Events was created from a wide array of published sources and databases. The end date of HANZE-Exposure is different from that of HANZE-Events, because exposure data are prepared with a 10-year time step for 1870–1970 and a 5-year time step for 1970–2020. Therefore, a short-term projection for 2020 is necessary to calculate exposure for post-2015 events. It should be noted that the starting year of 1870 was chosen mainly due to data availability.
The creation of HANZE-Exposure data involved four major steps, which are explained below. Main sources and concepts for HANZE-Events are outlined afterwards.
There are very few high-resolution population and land cover/land use maps, and data sets constructed with a certain methodology rarely extend beyond a single time point. Therefore, two maps (one each for population and land cover) for a single year (2011 or 2012) were collected as baseline for the study. All other time points between 1870 and 2020 are calculated from those baseline maps using historical statistics with substantially lower resolution.
The baseline land cover/use is based on CORINE Land Cover (CLC) 2012, version
18.5a (Copernicus Land Monitoring Service, 2017). CLC is a project supervised
by the European Environment Agency. It has so far produced four
pan-European land use maps for 1990, 2000, 2006 and 2012. The maps are
prepared mostly by manual classification of land cover patches from satellite
imagery with a resolution of 25 m or better. For the latest edition, images
collected during 2011–2012 were used. The inventory consists of 44 classes
(Fig. S3). The minimum size of areal features is 25 ha. For linear objects
such as roads, railways, and rivers, a minimum width of 100 m is used. CLC
2012 is first displayed as a vector map, and can then be transformed into a
raster with 100 m resolution. CLC 2012 covers the entire domain with the
exception of Andorra. For this particular country, the land cover/use map was
constructed by overlaying data from four different sources, top to bottom:
(1) CLC 2012 v18.5a, which covers a small strip around the border;
(2) CLC 2000 v18.5, an earlier edition which covers a larger strip around the border (Copernicus Land Monitoring Service, 2017);
(3) OpenStreetMap, accurate as of mid-2016 (Gisgraphy, 2017);
(4) Global Land Cover 2000 (Joint Research Centre, 2015).
The final map for the full domain of 37 countries and territories is
presented in Fig. S2.
The baseline population map is based on the GEOSTAT 2011 population grid,
version 2.0.1 (Eurostat, 2017). This data set has 1 km resolution and for
most countries it represents the actual population enumerated and
georeferenced during the 2011 round of population censuses, complemented by
estimates by the European Commission's Joint Research Centre. This data set is
presented in Fig. S4. For this study, the 1 km grid had to be further
disaggregated to 100 m resolution. Several methods have been proposed for
this procedure and tested for Europe (Gallego, 2010; Gallego et al., 2011).
Here, we combine methods M1 and M3 described in Batista e Silva et
al. (2013). M1 denotes the “limiting variable method” used in cartography
for creating dasymetric maps of population density. The procedure is an
iterative algorithm applied separately for each 1 km grid cell. The steps
are as follows:
First, uniform population density is assigned for each land use class
in a 1 km grid cell: […] where […]
A population density threshold […]
Land use classes are ranked and the subindex […]
Proceeding in order starting with […]
Surplus is then redistributed among the remaining land use classes […]
If after completing all iterations there is still surplus population, i.e.
if […]
The crucial aspect of this method is defining the threshold
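The iterative procedure can be illustrated with a minimal sketch of a generic limiting variable method for a single 1 km grid cell. It is not the exact formulation of Batista e Silva et al. (2013): the function name, the threshold values in the usage example, the ranking of classes by ascending threshold, and the choice to return any unallocated surplus separately are all assumptions made for illustration.

```python
def limiting_variable(population, areas, thresholds):
    """Distribute the population of one 1 km cell over its land use classes.

    population: total persons in the cell
    areas:      {land_use_class: area in ha} inside the cell
    thresholds: {land_use_class: maximum persons per ha} (hypothetical values)
    Returns (persons per class, surplus that no class could absorb).
    """
    # Start from a uniform density over every class present in the cell.
    total_area = sum(areas.values())
    density = {c: population / total_area for c in areas}

    # Assumption: classes are processed from the lowest threshold upwards.
    remaining = sorted(areas, key=lambda c: thresholds[c])
    leftover = 0.0
    while remaining:
        c = remaining.pop(0)
        if density[c] <= thresholds[c]:
            continue
        # Cap the class at its threshold and collect the surplus persons.
        surplus = (density[c] - thresholds[c]) * areas[c]
        density[c] = thresholds[c]
        rest_area = sum(areas[k] for k in remaining)
        if rest_area == 0:
            leftover = surplus  # nothing left to absorb it
            break
        # Spread the surplus evenly over the area of the remaining classes.
        for k in remaining:
            density[k] += surplus / rest_area

    allocation = {c: density[c] * areas[c] for c in areas}
    return allocation, leftover
```

For example, with 1000 persons, areas of 20, 60, and 20 ha for urban, agricultural, and forest land, and hypothetical thresholds of 200, 5, and 1 persons per ha, the sketch assigns 680, 300, and 20 persons respectively.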
The result of the calculation, however, is only the population per land use in each 1 km grid cell. Hence, the population had to be disaggregated further. For this we used an approach similar to method M3. This method redistributes the population proportionally to the level of soil sealing, or imperviousness of the ground. This variable has a range from 0 %, which indicates completely natural surface, to 100 %, which indicates land completely sealed by an artificial surface. This information could not be used directly to redistribute the population as large soil sealing may be caused both by residential and non-residential buildings as well as infrastructure. However, large elements of infrastructure or industry were already taken into account using the limiting variable method.
Data on soil sealing were obtained from the Imperviousness 2012 data set
(Copernicus Land Monitoring Service, 2017). It was created based on
high-resolution satellite photos taken during 2011–2012 in visible and
infrared spectrum. This data set has 100 m resolution, which was resampled
to a 1 km grid, so that average population density in grid cells with given
imperviousness could be calculated. The resulting relationship can be
approximated as a power law function, based on cell imperviousness ranging
from 1 to 96 % (Fig. S5). Cells with 0 % imperviousness should,
in principle, not be inhabited. Additionally, a power law function converges at
0 %. At the opposite end of the scale, almost no 1 km cells have values above
96 %. Hence, the population
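A minimal sketch of redistributing population in proportion to a power law of imperviousness is given below. The coefficients `a` and `b` are placeholders rather than the fitted values from Fig. S5, and both the capping of values above 96 % and the uniform fallback for cells without any sealed surface are assumptions made for illustration.

```python
import numpy as np


def disaggregate_by_imperviousness(pop, imperv, a=1.0, b=1.5):
    """Split `pop` persons over 100 m cells using a power law of imperviousness.

    imperv: imperviousness per 100 m cell, in percent (0-100)
    a, b:   hypothetical power-law coefficients, density ~ a * imperv ** b
    """
    imperv = np.asarray(imperv, dtype=float)
    # Assumption: values above 96 % are treated as 96 %.
    capped = np.minimum(imperv, 96.0)
    # Cells with 0 % imperviousness receive zero weight.
    weights = a * capped ** b
    total = weights.sum()
    if total == 0:
        # Assumption: with no sealed surface at all, spread persons uniformly.
        return np.full(imperv.shape, pop / imperv.size)
    return pop * weights / total
```

In practice the function would be applied to the 100 m cells of one land use class within one 1 km grid cell, using the population assigned to that class by the limiting variable step.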
Disaggregation result and source data for a fragment of the city of
Delft in the Netherlands. The area shown corresponds to a 1 km grid in the
GEOSTAT population data set. In this grid cell, the population at the time of
the 2011 census was 1218. Panel
Reconstruction of exposure for years other than the baseline maps requires historical statistics for several variables. Most of those statistics have been collected at regional level. The Nomenclature of Territorial Units for Statistics (NUTS), 2010 edition (European Union, 2011), was used here to define the regions. This classification has four levels (0, 1, 2, 3), where 0 is the national level and 3 is the finest regional division. Level 3 was chosen for this study, resulting in 1353 regions in the study area (Fig. S6). A vector map of regions was obtained from ESRI (2016) with amendments based on the Eurostat (2017) map in order to fully match the NUTS 2010 classification. Coastlines in the vector map were further adjusted using the aforementioned CLC 2012 map. NUTS favours administrative divisions in defining the regions, though often statistical (analytical) regions are used instead, created by amalgamation of smaller administrative units. It should be noted that NUTS 2010 was used instead of newer editions because 2011 census data, matching the baseline population map, were disseminated using this classification of regions.
All variables collected and used as input to HANZE-Exposure are listed in Table 1. Detailed definitions and concepts for all variables are included in Supplementary File 1. Their utility for the study is explained in the subsequent subsections. In general, all variables were collected from almost 300 sources, so that a time series for one variable for one country was typically merged from several sources. Due to the number of sources and transformations required to complete the database, only the most important methods and sources are mentioned in Supplementary File 1. Full descriptions of sources and methods are included per country, separately for each variable, with the exception of the “forestry index”, “airports”, and “reservoirs” variables, which are described in this manuscript as they were compiled in a more straightforward manner.
Input variables in HANZE-Exposure.
After the baseline maps and a database of historical statistics were completed, changes in land use and population over time were modelled. This was carried out for each of the 1353 NUTS 3 regions separately in a specified order. A summary of the procedure is included in Table 2 and the most important details of the methodology are described below.
Summary of historical land use and population modelling approaches, by CORINE Land Cover classes (see Fig. S3). The number in the first column indicates the order in which the modification of land use and population was done.
Redistribution of population within urban areas and growth of cities were modelled based on two factors: change in urban population size and change in the number of persons per household. Increasing population combined with smaller families in each dwelling has caused a substantial increase in the demand for housing. Between 1870 and 2011, the number of urban households in Europe increased eight-fold. Those extra dwellings were typically constructed outside the urban centres, as existing houses were rarely replaced by bigger ones. Many studies have shown a functional relationship between population density and distance from the city centre (Berry et al., 1963; Anas et al., 1998; Papageorgiou, 2014). Clark (1967) showed that over time the sharp decline in population density with distance has become much less pronounced. This is largely caused by the aforementioned social change: in existing households, families have become smaller, and thus the population declines closer to the centre and the surplus population is accommodated further from the centre in less-developed areas.
In light of the above, the modelling procedure is as follows:
In every urban fabric grid cell where […]
All grid cells in a NUTS 3 region are ranked by distance from urban
centres, where the highest ranked cells are the closest to any urban centre.
Surplus population […] where […]
If […]
If […]
The important aspect influencing the result of this process is the “distance
from urban centre”. Urban networks have several levels of hierarchy, with
large agglomerations influencing population distributions far outside their
borders. Therefore, the distance from urban centre is a weighted sum of three
Euclidean distances from the following:
(1) Centres of large agglomerations, as presented in a shapefile data set from
United Nations (2014), which shows the arbitrary centres of cities with a
population larger than 300 000.
(2) Centroids of population clusters. These clusters were calculated by
Eurostat (2017) from the 1 km population grid. The centroids were weighted
based on the population in each grid cell.
(3) Centroids of patches of urban fabric. The patches were taken from CORINE
Land Cover 2012 (Copernicus Land Monitoring Service, 2017), and centroids are
based on the geometry of those patches.
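The combined distance measure can be sketched as follows. This is only an illustration: the function names are hypothetical, coordinates are assumed to be in a projected (metric) system, the distance for each layer is assumed to be taken to the nearest feature of that layer, and the weights are scaled to sum to one, which does not affect the ranking of grid cells.

```python
import numpy as np


def nearest_distance(cells, centres):
    """Euclidean distance from each cell to the nearest centre of one layer."""
    cells = np.asarray(cells, dtype=float)      # shape (n_cells, 2)
    centres = np.asarray(centres, dtype=float)  # shape (n_centres, 2)
    diff = cells[:, None, :] - centres[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)


def distance_from_urban_centre(cells, agglomerations, clusters, patches,
                               weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the distances to the three layers of urban centres."""
    layers = (agglomerations, clusters, patches)
    return sum(w * nearest_distance(cells, layer)
               for w, layer in zip(weights, layers))
```

Grid cells can then be ranked with `np.argsort` on the returned array, so that cells closest to any urban centre come first.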
Equal weighting of the three layers was found to be optimal by analysing the
approach's accuracy (see Sect. 3.2). After urban fabric and population are
redistributed, changes in area covered by other types of artificial surfaces,
as well as reservoirs, are accounted for (see Table 2). Then, evolution in
cropland area is modelled using an approach similar to the one utilized in the HYDE
database of historical land use and population (Klein Goldewijk et
al., 2011). It involves changing the allocation of croplands over time
according to the land's suitability for agriculture. Therefore, if in time
step
The suitability is a sum of two indicators, which were also used in the HYDE
database. The first indicator is the slope of the terrain (Fig. S7), which is
a serious limiter on agricultural activity, and which was calculated from
EU-DEM data set at 100 m resolution (Eurostat, 2017). We found a close
exponential-type relationship between percentage of area used for croplands
and slope. The second indicator is the crop suitability index for
high-input cereals as calculated by FAO in the Global Agro-Ecological Zones
(GAEZ) database (FAO, 2016; Fischer et al., 2002). The resolution of this
data set is 5
For the slope indicator, the upper bound was set at 0 % slope, while for
the crop suitability index the upper bound was set at the polynomial
function's maximum (approx. 1500). The suitability indicator for croplands
The main drawback of the method is that due to the relatively coarse resolution of the GAEZ data set, there are often many cells with the same rank, and the total area of croplands from the model does not exactly match the data in the historical statistics database. Therefore, when too many cells have the same rank, they are further ranked by the centroid distance (as for urban population), so that agricultural land within a given suitability class is added first closest to urban areas and removed first furthest away from urban areas.
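The ranking with tie-breaking can be sketched as below. This is a simplified illustration, not the model itself: the function name is hypothetical, every non-cropland cell is assumed to be convertible to cropland, and the suitability score is taken as given.

```python
import numpy as np


def reallocate_cropland(is_crop, suitability, distance, target_cells):
    """Return a cropland mask with `target_cells` cells set to True.

    is_crop:      current cropland mask, one value per 100 m cell
    suitability:  suitability score per cell (higher is better)
    distance:     distance from urban centre per cell, used to break ties
    """
    is_crop = np.asarray(is_crop, dtype=bool).copy()
    suitability = np.asarray(suitability, dtype=float)
    distance = np.asarray(distance, dtype=float)
    change = target_cells - int(is_crop.sum())
    if change > 0:
        # Add the most suitable cells first; among ties, the closest first.
        cand = np.flatnonzero(~is_crop)
        order = np.lexsort((distance[cand], -suitability[cand]))
        is_crop[cand[order[:change]]] = True
    elif change < 0:
        # Remove the least suitable cells first; among ties, the furthest first.
        cand = np.flatnonzero(is_crop)
        order = np.lexsort((-distance[cand], suitability[cand]))
        is_crop[cand[order[:-change]]] = False
    return is_crop
```

`np.lexsort` treats its last key as the primary sort key, which is why the suitability array is passed last and the distance array first.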
Modelling the changes in pastures follows the same methodology as croplands,
except that the crop suitability index for cereals was replaced by the same index
for high-input alfalfa (also known as lucerne), a common crop growing in
meadows and pastures (Fig. S9). The suitability indicator for pastures
For a given time step […]
If […]
If […]
Then, the population in all non-urban grid cells was modified according to
the change in average household size, i.e. […] where […]
In the case that the realized […]
Disaggregation of economic data provides estimates of GDP and wealth per grid cell, just like the population and land use data. It was carried out after historical gridded population and land use were obtained. The methodology presented here extends the approach proposed in the European Union's ESPON 2013 Programme (Milego and Ramos, 2011) and some other studies, such as the G-Econ project (Nordhaus and Xi, 2011), in which the GDP is disaggregated proportionally to the population. This approach works well with a relatively coarse resolution of the output grid; however, at 100 m resolution the economic variables are much less connected with the place of residence of the population. On the other hand, all economic activities still require labour input. Using the observation that employees' compensation constitutes approximately half of GDP in European countries (Eurostat, 2017), the GDP and wealth are disaggregated in equal proportion using population and land use. It should be noted that wealth is defined here as tangible, produced, non-financial fixed assets. The composition of wealth is detailed in the Supplement and Table S2.
Table 3 provides a summary of the assumptions behind the disaggregation. Additional assumptions had to be made for the agricultural sector, which is the most dispersed, as almost three-quarters of the study area is covered by agricultural land use or forests. At the same time, farmland and pastures are more productive and contain more assets than forests, especially since trees do not count as fixed assets. However, a breakdown of GDP by agriculture and forestry is not available at regional level, and very limited historical data exist with such detail at national level. Hence, agricultural GDP and wealth at the regional level were broken down into forestry (including logging) and remaining agriculture (including fishing and aquaculture) using the sectoral split at national level in 2011 from Eurostat (2017). The share of forestry in the agricultural sector varies from zero in Malta to 73 % in Sweden.
Disaggregation of economic variables by population and land use classes (CLC: CORINE Land Cover).
Half of the GDP generated by agriculture (excluding forestry), as well as half of the wealth in this sector, was distributed proportionally to the population living in agricultural areas. The other half was distributed equally among CLC classes 211–244 (“agricultural areas”). GDP and wealth in forestry were distributed in the same way, but using CLC classes 311–313 (“forests”). Half of the GDP and half of the wealth in industry and services were distributed proportionally to the population in all grid cells, while the other halves were distributed equally among specific land use classes where a given production is concentrated, as in Table 3.
For the remaining two classes of wealth, the approach was slightly different. The entire wealth in housing (dwellings) was distributed proportionally to the population in all grid cells. The entire value of infrastructure, on the other hand, was distributed equally over selected land use classes: urban fabric, airports, ports, roads, and railway sites (CLC 111, 112 and 122–124).
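The half-and-half split described above can be sketched for one region and one sector as follows. The function name and argument layout are hypothetical, and the class code used for industry in the usage note is only an example, since the actual assignment of classes to sectors is given in Table 3.

```python
import numpy as np


def disaggregate_sector(total, population, land_use, classes,
                        pop_within_classes=False):
    """Split a regional total (GDP or wealth) of one sector over grid cells.

    Half is distributed proportionally to population and half equally over
    the cells belonging to the sector's land use classes.
    pop_within_classes: if True, only population living in those classes is
    used for the population half (as described for agriculture and forestry).
    """
    population = np.asarray(population, dtype=float)
    in_class = np.isin(np.asarray(land_use), list(classes))
    pop = np.where(in_class, population, 0.0) if pop_within_classes else population
    if pop.sum() == 0 or in_class.sum() == 0:
        # The sketch does not cover regions lacking population or sector land.
        raise ValueError("region has no population or no cells of the given classes")
    by_population = 0.5 * total * pop / pop.sum()
    by_land_use = 0.5 * total * in_class / in_class.sum()
    return by_population + by_land_use
```

For agriculture one would call it with `classes=range(211, 245)` and `pop_within_classes=True`; for an industrial sector with, for example, `classes={121}` and the default population over all grid cells. Housing wealth corresponds to a purely population-based split and infrastructure to a purely land-use-based split, so neither is covered by this function.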
HANZE-Events includes information on past damaging floods that occurred in
the domain (37 countries and territories) between 1870 and 2016. Several
rules were applied to determine whether a flood event indicated in sources
should be included in the database, as follows:
(1) At least one of four statistics (area flooded, persons killed, persons
affected, losses) had to be available for a given event. However, if no
persons were known to have been killed or missing in the flood, at least one
of the other statistics had to be available.
(2) Insignificant floods, i.e. events which affected only a small part of one
region, with no fatalities and fewer than 200 persons affected, were not
included.
(3) Available information for a given event had to be sufficient to
assign month, year, country, regions affected, type of flood, and general
cause of the event. Flood source (river/lake/sea name), detailed information
on the cause, and day of the event were not required.
(4) Floods that were caused by insufficient drainage in urban areas not
connected with any river system, floods caused entirely by dam failure
unrelated to a severe meteorological event, or floods caused by geophysical
phenomena (such as tsunamis or […]) were not included.
(5) Flood events that had an impact on more than one country were split per
country as long as data were available on a per-country basis. Otherwise they
were presented as one flood event. Also, in the case of an event affecting
several regions of a country, when the availability of statistics per region
was uneven, the event was split accordingly.
Records of flood events were obtained from a large variety of sources (more than 300), including international and national databases, scientific publications, and news reports. The source of information is indicated per event in the HANZE-Events data set. In the majority of cases, entries taken from international databases were cross-checked with other sources and amended as necessary. Databases particularly worth mentioning are EM-DAT (Guha-Sapir et al., 2017), Dartmouth Flood Observatory (Brakenridge, 2017), NatCatService (Munich Re, 2017), the European Environment Agency database of historical information submitted under the Floods Directive (2015), the national flood databases of France (Lang et al., 2016), Italy (Guzzetti and Tonelli, 2004), Spain (Dirección General de Protección Civil, 2015), and the United Kingdom (Black and Law, 2004; Haigh et al., 2015), and several national and regional preliminary flood risk assessments.
In order to convert reported losses from various currencies and reference years to a single benchmark, information on inflation and currencies was collected. Two tables were prepared and are included with other HANZE input data. The first one includes all currencies that were used in the study area between 1870 and the present, with their names, ISO 4217 codes, starting and ending dates of validity, as well as conversion factors to euro. For countries not currently using the euro, 2011 exchange rates from Eurostat (2017) were used. Information on currencies and conversion factors was mostly gathered from ISO 4217 (ISO, 2017) and GHOC databases (Taylor, 2004), supplemented by various Internet resources.
List of files of the HANZE database. XXXX represents the value indicating the year to which the data set pertains.
The second table contains deflators used to adjust nominal losses to real losses in 2011 prices. The GDP deflator was generally used, as it allowed us to make the loss adjustments consistent with GDP values. Alternative price indices were used only if the GDP deflator was not available, but they were always “anchored” to the GDP deflator series. These other series included indices of consumer, wholesale, retail, or cost-of-living prices. The sources of the data were usually the same as those for the GDP data; they are listed in detail in the data files themselves. It should be noted that the currency conversions and deflators omit four cases of hyperinflation: Germany 1923, Poland 1923, Greece 1944, and Hungary 1946. Inclusion of those cases would cause large distortions to the data series. Hyperinflation periods and resulting currency changes were marked in the data set. The data set also includes deflator series for three former countries – Czechoslovakia, the Soviet Union, and Yugoslavia – as many countries were their constituents in the past.
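How the two tables work together can be shown with a minimal sketch. The function name, argument layout, and the numbers in the usage note are hypothetical, and the sketch assumes a single conversion factor to euro is valid for the currency in which the loss was reported, ignoring chains of historical currency changes.

```python
def loss_in_2011_euro(nominal_loss, year, deflator, to_euro):
    """Convert a nominal loss to euro at 2011 price levels.

    nominal_loss: loss in the currency of the event year
    year:         year in which the loss was reported
    deflator:     {year: price index} for the country, e.g. a GDP deflator
    to_euro:      euro per unit of the reporting currency (conversion factor)
    """
    # Step 1: adjust for inflation between the event year and 2011.
    real_loss = nominal_loss * deflator[2011] / deflator[year]
    # Step 2: convert to euro with the fixed conversion factor.
    return real_loss * to_euro
```

With a hypothetical deflator series of 50 in 1990 and 100 in 2011 and a conversion factor of 0.25 euro per currency unit, a nominal loss of 1000 units in 1990 corresponds to 500 euro in 2011 prices.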
The complete list of HANZE files and their contents is given in Table 4.
Exposure maps at 100 m resolution are provided as GeoTIFF rasters in
ETRS89/LAEA projection, consistent with the INSPIRE European grid. The baseline
maps of land use and population (100 m resolution) are also included. For
the benefit of climate research groups in particular, the data sets are
provided also in aggregated, lower-resolution versions. Two files in netCDF
format are included: 5
Input historical statistics and the HANZE-Events database of past damaging floods are provided as Excel files. The structure of the files with input data is detailed in Tables S3 and S4. Apart from the statistical information, the two files (with demographic/environmental and economic data) each include a table with all sources and transformations made to the data per country, per variable, and per year, as well as a list of references. The contents of the HANZE-Events database, with explanations of all data recorded per event, are shown in Table 5.
Information included in HANZE-Events database.
The accuracy of the data involved in the HANZE database is influenced by three elements: (1) quality of baseline maps and historical statistics; (2) robustness of the methodologies used for disaggregation of data and modelling change in population and land use; and (3) completeness and reliability of the records of past damaging floods.
The baseline land cover/use map, CORINE Land Cover 2012, was employed for this analysis before final validation was made, but subsequently the map was found to have a thematic accuracy of around 90 % (Copernicus Land Monitoring Service, 2017). Still, the thresholds of minimum size (25 ha) and width (100 m) that objects must meet to be included in the map result in the omission of many small objects with large effects on population distribution, e.g. small bodies of water or smaller pieces of infrastructure and villages. It should also be noted that mapping was done by each country independently, and therefore the classification of land use is not always fully consistent between countries, and the thematic accuracy varied from 82 to 97 % between countries. Validation reports are also available for imperviousness layers and elevation models from Copernicus Land Monitoring Service (2017).
The baseline GEOSTAT population grid's accuracy is described in reports by
the provider (Eurostat, 2017). Though for most countries the quality of the
1 km grid is very high, with 98–100 % of a national population
georeferenced, there are exceptions. In Bulgaria, for example, only 57 %
of the population was georeferenced and the remainder was disaggregated from
settlements or local administrative units. In Italy the entire data set was
calculated from enumeration areas, albeit their average size was below
1 km
Historical statistics were compiled from a large variety of sources. Total population figures were mostly available at regional level, while the remaining statistics were usually available only at national level beyond the most recent 2–3 decades. Inevitably, there are inaccuracies from applying national trends at the regional level. Also, economic data series before approx. 1950 for western Europe and 1990 for central Europe are more often than not reconstructions based on ancillary or proxy data. Notwithstanding those limitations, we believe that, for the study area, the HANZE database represents an improvement in resolution and thematic coverage over the HYDE database. A comparison of the number of regional estimates of total and urban population included in both databases is shown in Fig. S12.
Estimates of
In this study, the population distribution was disaggregated from 1 km to 100 m using two methods validated previously in the literature (Batista e Silva et al., 2013). Lack of comparative data at such resolution prevents us from further analysing the quality of the disaggregation. Still, the original resolution is very fine and the refining narrows the distribution of population by eliminating areas that are uninhabitable or very unlikely to be inhabited. There is no comparative information for economic variables downscaled from regional level to gridded data.
Lack of comparable data for validation is also evident for historical land use changes. Some local reconstructions of past land cover/use were made from old maps, but there is limited consistency in classification or minimal mapping units to allow for an accurate comparison. CORINE Land Cover is available for 2000 and 2006, but the indicated changes in land use are often only reclassifications rather than actual developments. Hence, changes in historical cropland and pasture distribution were not validated directly. The general methodology used here, i.e. reallocating croplands and pastures based on land suitability for agriculture, has been extensively utilized in many studies before (Hurtt et al., 2011; Kaplan et al., 2011; Klein Goldewijk and Verburg, 2013; Pongratz et al., 2008; Ramankutty and Foley, 1999). A more detailed uncertainty and sensitivity analysis of the input data and methods would be possible using structured expert judgment (Colson and Cooke, 2017; Cooke and Goossens, 2008).
Some analysis, however, could be made of the historical distribution of urban
population. Estimates with the Clark (1951) model of urban population density are
available for 19 cities; the model considers population distribution in urban
areas as an exponential function:
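For reference, Clark's model is commonly written in the form below. The symbols are assumed here for illustration and may differ from the notation used elsewhere in this section: D(x) is the population density at distance x from the city centre, D_0 is the density at the centre, and b is the density gradient.

```latex
D(x) = D_0 \, e^{-b x}
\qquad\Longleftrightarrow\qquad
\ln D(x) = \ln D_0 - b x
```

The logarithmic form is linear in x, so both parameters can be estimated by ordinary linear regression of log-density on distance.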
A comparison of function parameters is presented in Fig. 3. Overall, a good
fit was achieved for the
Distribution of flood events in HANZE by year and type.
Further validation of historical population grid was done by using
Eurostat-produced estimates of population at local administrative unit (LAU).
This data set (Gløersen and Lüer, 2013) is provided at LAU level 2,
except Denmark, Lithuania, Portugal, and Slovenia, where coarser LAU level 1
data are available; data for microstates, except Liechtenstein, are missing.
Population is provided at census dates or interpolated/extrapolated to six
reference dates (1 January every decade from 1961 to 2011). Data at census
dates were extensively used in the HANZE database by aggregating them to NUTS3
regions. Here, we connected LAUs in the Eurostat data set with a vector map
from Eurostat (2018). For Greece, only a LAU level 1 map was available;
therefore population estimates were aggregated accordingly. Administrative
changes were accounted for to synchronize the population data set and the map,
though a small number of LAUs for Ireland and the United Kingdom could not be
matched between the data sets (as a result, validation was not possible for
region UKK14). The final map has 109 177 units and was then intersected
with population grids for 1960, 1970, 1980, 1990, 2000, and 2010. Then, for
each LAU two measures commonly used for flood map validation were employed
(Alfieri et al., 2014). Test for “correctness” (or “hit rate”
Scores in two measures of accuracy of gridded population estimates (simple average for all LAUs).
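As a rough illustration of how hit-rate-style scores can be computed for population counts, the Python sketch below assumes that the overlap between modelled and observed population in a LAU is the smaller of the two totals. It reports a hit rate (overlap over observed population) and, as an assumed second measure, a score in the style of the critical success index (overlap over the larger of the two totals). These formulas, the function names, and the sample numbers are assumptions for illustration and are not taken from the HANZE documentation.

```python
def lau_scores(modelled, observed):
    """Return (hit_rate, csi_like) in percent for a single LAU.

    modelled: gridded population summed within the LAU
    observed: population reported for the LAU
    Both formulas are assumed adaptations of flood-map validation scores.
    """
    if modelled < 0 or observed <= 0:
        raise ValueError("modelled must be >= 0 and observed must be > 0")
    overlap = min(modelled, observed)  # assumed "correctly placed" population
    hit_rate = 100.0 * overlap / observed
    csi_like = 100.0 * overlap / max(modelled, observed)
    return hit_rate, csi_like


def average_scores(pairs):
    """Simple (unweighted) average of both scores over all LAUs."""
    scores = [lau_scores(m, o) for m, o in pairs]
    n = len(scores)
    mean_hit = sum(s[0] for s in scores) / n
    mean_csi = sum(s[1] for s in scores) / n
    return mean_hit, mean_csi


# Hypothetical (modelled, observed) population pairs for three LAUs.
sample = [(900, 1000), (1200, 1000), (500, 500)]
mean_hit, mean_csi = average_scores(sample)
```

Under these assumptions the hit rate penalizes only underestimation, while the second score also penalizes overestimation, which is why the two are reported together.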
The quality of records of past floods depends on two main factors: (1) completeness (what share of past floods could be traced) and (2) the reliability of information on the location and quantitative data on losses. Completeness varies substantially between countries, few of which maintain publicly available databases of flood losses. Historical information contained in mandatory preliminary flood risk assessments was sometimes very extensive, but often little or no quantitative information on losses was included. International databases of events cover a short time span: EM-DAT nominally starts with the year 1900, but very few floods are included before 1970. NatCatService and EEA's compilation of Floods Directive data have coverage from 1980 and Dartmouth Flood Observatory from 1985. Due to the development of the Internet, the availability of news reports on floods increased substantially from the mid-1990s onwards, though an increasing number of old newspaper articles are being digitized and provide a valuable resource. Under-reporting for central European countries before 1990 is also evident, due to communist-era censorship.
The reliability of past flood loss data remains an open question. Efforts were made to gather multiple sources for past events, especially large ones. In the vast majority of cases, records of floods from international databases can be corroborated by other sources or at least by other international databases. Some records were found to be either dubious or not primarily flood events but rather landslides, as found for Portugal (Zêzere et al., 2014). The most extreme case is a record in EM-DAT, according to which a flood along the Danube in Romania in 1926 caused 1000 deaths. However, the Romanian preliminary flood risk assessment indicates that national literature sources do not contain any mention of flood fatalities in that year (Administraţia Naţională Apele Române, 2009). A calamity of such magnitude, which would have been the deadliest European flood in the past 150 years, must have left a trace in several sources. Therefore, this event was not included in HANZE. Also, there are some cases of floods occurring together with other hazards (windstorms, hail, landslides), where it was not possible to disentangle flood losses from those due to other causes. Therefore, some flood records include or might include those other losses; such records are marked in the database under the “Notes” category. On the other hand, some flash floods were not included if the majority of losses were not caused by floodwater.
In total, HANZE-Events contains 1564 records of floods (Fig. 4), where
157 events (10 %) have information on the flooded area,
1547 (99 %) on persons killed, 682 (44 %) on persons affected, and
560 (36 %) on monetary losses. The known flood consequences amount to
almost 123 000 km
HANZE-Exposure (both input and output data sets),
HANZE-Events and database documentation were uploaded to the 4TU Centre for
Research Data (
The HANZE-Exposure database is intended to provide data allowing one to normalize historical losses related to natural hazards. We hope that it will be useful for researchers studying past occurrences of damaging meteorological, hydrological, or geophysical phenomena. Also, the database could be used to analyse changes in the distribution of population and assets within natural hazard zones, e.g. flood hazard maps. To improve reusability, we provide exposure data in different resolutions and formats, so that the data set can be easily applied regardless of how the extent of events (“footprint”) is defined: a polygon, a raster layer, a country subdivision, or a climate model grid. HANZE-Exposure can also be considered a refinement of the HYDE database for Europe for the past 150 years. In principle, the spatial data sets and the input historical statistics could also be applied for purely socioeconomic research, e.g. studying regional development or land use changes. However, in that case we would urge potential users to first analyse the methodology and data sources contained in the database and its documentation in order to assess whether HANZE-Exposure is suited to their research purposes. For example, the resolution of regional economic data is of crucial importance when analysing the convergence of the levels of economic development between regions. Also, the 100 m resolution of the data set should not be interpreted as a benchmark of its accuracy, as it was chosen to (1) preserve the good representation of urban areas and elements of infrastructure, where most of the population lives and most wealth is accumulated, and (2) align socioeconomic data with pan-European flood maps, which have the same resolution.
HANZE-Events currently encompasses only information on floods, but the same framework could be used for other hazards. The number of casualties or losses in monetary terms can then be corrected (or “normalized”) for changes in currency, inflation, population, or economic growth using HANZE-Exposure. Also, reported losses could be contrasted with potential losses (e.g. exposed population or assets within a flood hazard zone with a given probability of occurrence; Paprotny et al., 2017). Information on relative losses could provide insight into how the vulnerability of a population has changed over time.
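The normalization idea can be sketched in a few lines of Python. The sketch assumes the common multiplicative approach, in which a reported value is scaled by the ratio of price levels and by the ratio of exposure (population or wealth in the affected area) between the event year and a reference year. The function name, parameter names, and numbers are hypothetical; currency conversion is left out, and this is not necessarily the exact procedure intended for HANZE.

```python
def normalize(value, exposure_event, exposure_ref,
              price_index_event=1.0, price_index_ref=1.0):
    """Scale a reported loss or casualty count to reference-year conditions.

    value:             reported loss (event-year prices) or number of casualties
    exposure_event:    population or wealth in the affected area in the event year
    exposure_ref:      the same exposure measure in the reference year
    price_index_event: price level (e.g. a deflator) in the event year
    price_index_ref:   price level in the reference year
    Leave both price indices at 1.0 for non-monetary values such as fatalities.
    """
    if exposure_event <= 0 or price_index_event <= 0:
        raise ValueError("event-year exposure and price index must be positive")
    inflation_factor = price_index_ref / price_index_event
    exposure_factor = exposure_ref / exposure_event
    return value * inflation_factor * exposure_factor


# Hypothetical monetary loss: 10 million in event-year prices, price level
# doubled since then, and wealth in the affected area grew by 50 %.
normalized_loss = normalize(10.0, exposure_event=2.0, exposure_ref=3.0,
                            price_index_event=50.0, price_index_ref=100.0)

# Hypothetical fatalities: 20 deaths, population of the affected area
# grew from 40 000 to 50 000.
normalized_deaths = normalize(20, exposure_event=40_000, exposure_ref=50_000)
```

Keeping the inflation and exposure factors separate makes it easy to apply only one of them, for example when wealth is already expressed in constant prices.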
DP conceived and designed the study, prepared the data sets and drafted a first version of the manuscript. SNJ and OMN helped to design the study. OMN helped to draft the manuscript. All authors revised the manuscript and gave final approval for publication.
The authors declare that they have no conflict of interest.
The HANZE database was prepared with the support of the project “Risk Analysis of Infrastructure Networks in response to extreme weather” (RAIN), which received funding from the European Union's Seventh Framework Programme for Research and Technological Development under grant agreement no. 608166. Further support was provided by the project “Bridging the Gap for Innovations in Disaster Resilience” (BRIGAID), which received funding from the European Union's Horizon 2020 Programme for Research and Innovation under grant agreement no. 700699. The authors would like to thank Antonia Sebastian for her comments on the manuscript.
Edited by: Alexander Gelfan
Reviewed by: two anonymous referees