A global historical data set of tropical cyclone exposure (TCE-DAT)

Tropical cyclones pose a major risk to societies worldwide, with about 22 million directly affected people and damages of USD 29 billion on average per year over the last 20 years. While data on observed cyclone tracks (location of the center) and wind speeds are publicly available, these data sets do not contain information about the spatial extent of the storm or the people and assets exposed. Here, we apply a simplified wind field model to estimate the areas exposed to wind speeds above 34, 64, and 96 knots (kn). Based on available spatially explicit data on population densities and gross domestic product (GDP), we estimate (1) the number of people and (2) the sum of assets exposed to wind speeds above these thresholds, both accounting for temporal changes in the historical distribution of population and assets (TCE-hist) and assuming fixed 2015 patterns (TCE-2015). The associated spatially explicit and aggregated country-event-level exposure data (TCE-DAT) cover the period 1950 to 2015 and are freely available at https://doi.org/10.5880/pik.2017.011 (Geiger et al., 2017c). These data are considered key information to (1) assess the contribution of climatological versus socioeconomic drivers of changes in exposure to tropical cyclones, (2) estimate changes in vulnerability from the difference in exposure and reported damages and calibrate associated damage functions, and (3) build improved exposure-based predictors to estimate higher-level societal impacts such as long-term effects on GDP, employment, or migration. We validate the adequacy of our methodology by comparing our exposure estimates to exposure estimates obtained from reported wind fields available since 1988 for the United States. We expect that the free availability of the underlying model and TCE-DAT will make research on tropical cyclone risks more accessible to non-experts and stakeholders.


Introduction
Tropical cyclones (TCs) are among the most harmful natural disasters worldwide, with USD 29 billion of direct damages and 22 million people affected on average each year (Guha-Sapir, 2017). In addition to these direct damages, tropical cyclones have the potential to exert a negative influence on long-term development, such as dampening of economic output (Hsiang, 2010; Hsiang and Jina, 2014), e.g., through reduced educational achievement, mortality, and displacement, but can also cause indirect benefits such as alleviating drought.
Direct economic losses from TCs show a positive trend over time (MunichRe, 2015) whose attribution to increasing exposure, changing vulnerability, and more extreme hazards is heavily debated (Pielke et al., 2008; Estrada et al., 2015). The attribution is particularly relevant for future projections of TC impacts given expected changes in population numbers and patterns (Jones and O'Neill, 2016), potential increases in hazards under unchecked climate change (Emanuel, 2013), and the future evolution of vulnerabilities (Bakkensen and Mendelsohn, 2016; Geiger et al., 2016, 2017a). Options to gain a better understanding of TC-induced societal risks strongly depend on high-quality observational TC and socioeconomic records. However, availability of data strongly varies over time and space, is limited to certain regions only (Anderson, 2017; Anderson et al., 2017), and data sets can be subject to various reporting biases (Guha-Sapir and Below, 2002; Wirtz et al., 2014). Dealing with these issues can be tedious and even beyond the scope of a researcher's expertise. Moreover, standardized methods of data selection and preparation facilitate the reproducibility and comparability of research results and also accelerate scientific discovery.
Published by Copernicus Publications.
To overcome current limitations, we here provide a globally consistent data set of TC exposure, named TCE-DAT. Exposure in TCE-DAT is defined per TC event as the number of potentially affected people and the sum of potentially affected assets purely due to TC maximum wind speed. Additional impact categories to quantify exposure could account for duration or gustiness of strong winds, TC-related precipitation, and/or storm surges. TCE-DAT covers the period from 1950 to 2015 and provides estimates of exposed population and exposed assets for 2713 individual landfalling TCs with at least 34-knot (kn) 1 min sustained wind speed above land, as documented by the International Best Track Archive for Climate Stewardship (IBTrACS) (Knapp et al., 2010). The data set is created using only publicly available data sources and running the open-source economics of climate adaptation (ECA) tool CLIMADA (Bresch, 2014; Gettelman et al., 2017).
To allow for an assessment of purely physically driven changes in exposure we also provide estimates of the number of people and the sum of assets exposed given fixed 2015 distributions of population and assets. In this regard TCE-DAT extends and complements estimates from the Global Assessment Report on Disaster Risk Reduction (GAR 2015) (UNISDR, 2015), which provides a statistical assessment of exposure given fixed socioeconomic conditions.
In combination with reported damages and number of people affected from other sources, e.g., EM-DAT (Guha-Sapir, 2017) and NATCAT (MunichRe, 2015), TCE-DAT allows for a convenient assessment of historical vulnerabilities finally translating hazard (wind intensities) and exposure into damages or people affected as indicators of societal risks.
In the following we describe the input data sets and our methodology used to create TCE-DAT. We then validate our findings based on exposed population estimates for the United States. We conclude by discussing potential applications of TCE-DAT and comment on its limitations and sources of uncertainty.

CLIMADA - risk modeling
TCE-DAT builds on various TC and socioeconomic data sets that are merged and analyzed using CLIMADA, an open-source probabilistic natural catastrophe risk assessment model (Bresch, 2014). For the definition of natural hazard risk, we follow the definition by the IPCC (2014), where risk is defined as a function of hazard, exposure, and vulnerability, i.e.,

$$\text{risk} = f(\text{hazard}, \text{exposure}, \text{vulnerability}) = \text{probability of hazard} \times f(\text{intensity of hazard}, \text{exposure}, \text{vulnerability}), \tag{1}$$

where the latter three elements constitute severity of the impact. Hazard describes weather events such as storms, floods, drought, or heatwaves both in terms of probability of occurrence and physical intensity (see Sect. 2.3 below). Exposure describes the geographical distribution of people, livelihoods, and assets or infrastructure, or, generally speaking, of all items potentially exposed to hazards, including ecosystems and their services. In the present case, exposure is determined for each TC separately based on the storm's wind field and maximum sustained wind speed (see Sect. 2.2 below). Vulnerability describes how specific exposure will be affected by a specific hazard, i.e., relates the intensity of a given hazard with its impact, such as wind damage to buildings as a function of wind speed or the effect of a flood on a local community and its livelihoods. The damage function hence expresses the specific vulnerability for a given kind of assets. While CLIMADA allows for the implementation of different damage functions translating the intensity of the hazard, exposure, and vulnerabilities into damages and people affected (Gettelman et al., 2017), we only use part of its functionality to solely estimate exposure by using a steplike vulnerability function that is zero below a certain wind speed threshold and unity above. The CLIMADA module ISIMIP v1.0 used to generate TCE-DAT can be found at https://github.com/davidnbresch/climada_module_isimip/releases/tag/v1.0.
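The steplike vulnerability function can be illustrated with a minimal Python sketch. It is not the CLIMADA implementation: the function names and the example values are hypothetical, and treating a wind speed exactly at the threshold as exposed is an assumption.

```python
def step_vulnerability(wind_kn, threshold_kn):
    """Return 1.0 at or above the threshold and 0.0 below it (assumed convention)."""
    return 1.0 if wind_kn >= threshold_kn else 0.0


def exposed_total(winds_kn, values, threshold_kn):
    """Sum the values (people or assets) of all grid cells that count as exposed."""
    return sum(v * step_vulnerability(w, threshold_kn)
               for w, v in zip(winds_kn, values))


# Hypothetical example: four grid cells with peak wind speed (kn) and population.
winds = [20.0, 40.0, 70.0, 100.0]
population = [1000, 2000, 3000, 4000]
exposed = {thr: exposed_total(winds, population, thr) for thr in (34, 64, 96)}
```

With a step function the "damage" reduces to a simple count, which is why the same machinery yields exposed population and exposed assets rather than losses.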

Socioeconomic data
We use socioeconomic data at the grid level with 0.1° × 0.1° resolution. For the attribution of exposed population and assets to different countries we use a country mask with equal resolution.

Spatially explicit population data
Affected population is determined based on the History Database of the Global Environment (HYDE, version 3.2), which is developed under the authority of the Netherlands Environmental Assessment Agency and provides (gridded) time series of population and land use for the last 12 000 years (Klein Goldewijk et al., 2010, 2011). HYDE provides population data with an original resolution of 5 arcmin (0.083°), decennially up to 2000 and annually up to 2015, and is freely available at https://doi.org/10.17026/dans-25g-gez3 (Klein Goldewijk, 2017). Where required we linearly interpolate the data to derive annual distributions, and finally aggregate the numbers to 0.1° resolution.
Earth Syst. Sci. Data, 10, 185-194, 2018, www.earth-syst-sci-data.net/10/185/2018/
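The linear interpolation between decennial grids can be sketched as follows. This is an illustrative example only: grids are represented as flat lists of cell values, and the function name and numbers are hypothetical.

```python
def interpolate_year(year, year0, grid0, year1, grid1):
    """Linearly interpolate cell values between two time steps year0 <= year <= year1."""
    weight = (year - year0) / (year1 - year0)
    return [a + weight * (b - a) for a, b in zip(grid0, grid1)]


# Hypothetical population counts in two grid cells for 1990 and 2000.
pop_1990 = [100.0, 0.0]
pop_2000 = [200.0, 50.0]
pop_1995 = interpolate_year(1995, 1990, pop_1990, 2000, pop_2000)
```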

Spatially explicit assets data
The spatially explicit assets data set is created based on spatially explicit GDP data (in 2005 PPP USD), available decennially between 1850 and 2100 (Geiger, 2017; Geiger and Frieler, 2017; Geiger et al., 2017b; Murakami and Yamagata, 2017). Data from 2010 onwards are based on national GDP time series according to the Shared Socioeconomic Pathways (SSP2) (Dellink et al., 2017; Frieler et al., 2017; Geiger et al., 2017b). Grid-level GDP is downscaled from national GDP estimates, using spatially explicit population estimates and multiple other predictors, e.g., distance to cities and to the coast, road network densities, and others. GDP data, provided with an original resolution of 5 arcmin (0.083°), are linearly interpolated to derive annual distributions for the years from 1950 to 2015. Finally, data are aggregated to 0.1° resolution in the same way as the population data.
To estimate assets distributions from the GDP data we use the Global Wealth Databook 2016 assembled by Credit Suisse (CreditSuisse, 2016) to derive national assets / GDP ratios for the year 2016 for 181 countries. Ratios for missing countries are approximated based on geographically close countries with similar GDP per capita values. Due to a lack of reported asset distributions for other years, we assume national assets / GDP ratios to be constant over the considered time period. The decennially gridded GDP at original resolution and the national assets / GDP ratios are freely available at https://doi.org/10.5880/pik.2017.007 (Geiger et al., 2017b).
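The conversion from gridded GDP to gridded assets amounts to one multiplication per grid cell. The sketch below assumes each cell carries a country code; the country codes, ratios, and GDP values are hypothetical placeholders, not values from the Global Wealth Databook.

```python
def assets_from_gdp(cells, ratios):
    """Scale each cell's GDP by the time-constant assets/GDP ratio of its country.

    cells: list of (country_code, gdp) tuples; ratios: dict country_code -> ratio.
    """
    return [gdp * ratios[country] for country, gdp in cells]


# Hypothetical ratios and grid-cell GDP values (arbitrary units).
ratios = {"AAA": 2.5, "BBB": 4.0}
cells = [("AAA", 4.0), ("BBB", 1.5), ("AAA", 0.0)]
assets = assets_from_gdp(cells, ratios)
```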

Hazard data
IBTrACS provides the most comprehensive global data set of historical tropical cyclone activity (Knapp et al., 2010). We rely on the latest version (v03r09), which includes tropical cyclone records up to the end of 2015. IBTrACS combines TC data from various regional specialized meteorological centers (RSMCs). However, historical TC records from the National Hurricane Center (NHC) of the United States (known as HURDAT), available for the North Atlantic and eastern Pacific, and the Joint Typhoon Warning Center (JTWC), available for the remainder of the world, are regarded as most accurate (Holland and Bruyère, 2014). Whenever possible, we sub-select HURDAT and JTWC data from IBTrACS data, relying on other providers only for otherwise missing events (see Table 1).
The IBTrACS archive originally contains 7019 entries between 1950 and 2015 (3662 between 1980 and 2015). We select 5719 TCs between 1950 and 2015 (3577 TCs between 1980 and 2015) for which all information required to estimate the associated wind fields is available (see Table 2 for the list of required variables) and subsequently filter 2713 events with landfall. Note that most incomplete data entries occur prior to 1980, and in particular for very weak events mostly without landfall. We here define a TC to make landfall if at least one grid cell (of the hazard grid) of the TC's simulated wind field is above land with at least 34 kn maximum winds, thereby also counting TCs without a direct hit by the storm's center as landfalls. This landfall definition also depends on the resolution of the underlying grid. We here use a country mask of 0.1° × 0.1° resolution (360 arcsec) that is upscaled from an original resolution of 150 arcsec to provide the best possible coverage of the coastline. To further reduce inconsistencies with the socioeconomic gridded data we globally extend the land area of the hazard grid by one grid cell (0.1°) into the oceans. Thus, we artificially increase the number of landfalls but, conversely, minimize the number of socioeconomically relevant grid cells that would be labeled as water otherwise. This procedure is particularly relevant for small islands and coastal cities for which the calculation of exposure would otherwise result in a gross underestimation.
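The landfall criterion and the one-cell extension of the land mask can be sketched in a few lines. This is an illustration under stated assumptions rather than the CLIMADA code: the mask is grown in all eight neighboring directions, longitude wrap-around is ignored, and all names and values are hypothetical.

```python
def extend_land(mask):
    """Grow a boolean land mask by one grid cell in all eight directions."""
    rows, cols = len(mask), len(mask[0])
    out = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if mask[i][j]:
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < rows and 0 <= nj < cols:
                            out[ni][nj] = True
    return out


def makes_landfall(wind_kn, land, threshold_kn=34.0):
    """True if at least one land cell experiences winds at or above the threshold."""
    return any(is_land and w >= threshold_kn
               for wind_row, land_row in zip(wind_kn, land)
               for w, is_land in zip(wind_row, land_row))


# Hypothetical 1 x 5 transect: winds decay toward a coast located in the last cell.
wind = [[60.0, 50.0, 40.0, 35.0, 30.0]]
land = [[False, False, False, False, True]]
hit_original = makes_landfall(wind, land)
hit_extended = makes_landfall(wind, extend_land(land))
```

In this toy transect the storm only counts as landfalling once the coast is extended by one cell, which mirrors the stated trade-off of slightly more landfalls in exchange for fewer coastal cells mislabeled as water.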

Wind field modeling
The IBTrACS archive only contains TC center coordinates and other physical variables on a 6 h snapshot basis. A wind field model is required that, based on IBTrACS variables, generates continuous wind fields with realistic distributions of surface winds around the TC center. The spatial extent of a TC is usually described as the sum of the following components: (1) a static circular wind field for each track coordinate, and (2) the translational wind speed component that arises from the TC movement. To estimate the first component several models have been proposed; see, for example, Holland (1980, 2008) and Holland et al. (2010). Here, we apply the improved wind field model by Holland (2008) (named Holland08 in the following), which has been successfully applied in other studies, e.g., Peduzzi et al. (2012). The maximum surface wind $v_m$ defined by Holland08's pressure-gradient model is given as

$$v_m = \left(\frac{b_s}{\rho e}\,\Delta p\right)^{0.5}, \tag{2}$$

where $\rho$ is air density, $e$ is the base of natural logarithms, $\Delta p$ is the pressure drop to the cyclone center as a function of radial distance $r$ in units of radius of maximum winds $r_m$, and $b_s$ is a quantity that depends on higher powers of $\Delta p$, the temporal change in pressure, and the TC's translational speed and latitude; see Holland (2008) for further details.
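As a rough numerical illustration, the sketch below evaluates the commonly quoted form of Holland's pressure-gradient relation, v_m = sqrt(b_s Δp / (ρ e)). The air density default and the example values of b_s and Δp are assumptions chosen for illustration; in Holland (2008) b_s is itself computed from storm properties, which is not reproduced here.

```python
import math


def holland_max_wind(delta_p_pa, b_s, rho=1.15):
    """Maximum surface wind (m/s) from the pressure drop (Pa), b_s, and air density (kg/m^3)."""
    return math.sqrt(b_s * delta_p_pa / (rho * math.e))


# Hypothetical storm: 50 hPa pressure drop and an assumed b_s of 1.5.
v_m = holland_max_wind(delta_p_pa=5000.0, b_s=1.5)
```

For these inputs the result is close to 49 m s⁻¹; the wind speed scales with the square root of both b_s and the pressure drop.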
The second component is added to the first one by quantifying the TC's mean translational speed between two consecutive track coordinates (via an optimized Haversine formula) and by vectorial addition of both wind speed components. We account for the decrease of the translational effect with distance from the TC center by multiplying the translational component by an attenuation factor given as the ratio between the distance to center and rmax; see also Peduzzi et al. (2012). Although this attenuation factor can be thought to resemble surface friction effects, we neither explicitly account for surface friction and the resulting reduction and rotation in the translational speed's magnitude and direction, respectively (Lin and Chavas, 2012), nor do we incorporate that the magnitudes of the motion-induced asymmetries at the surface do not necessarily increase proportionally with the translation speed (Uhlhorn et al., 2014).
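The mean translational speed between two track points can be sketched with the standard Haversine formula. This is not the optimized variant used in CLIMADA; the Earth radius, the function names, and the example coordinates are assumptions, and the 6 h interval follows the IBTrACS snapshot spacing mentioned above.

```python
import math

EARTH_RADIUS_KM = 6371.0       # mean Earth radius (assumed constant)
KM_PER_NAUTICAL_MILE = 1.852


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def translational_speed_kn(lat1, lon1, lat2, lon2, hours=6.0):
    """Mean forward speed in knots between two consecutive track points."""
    return haversine_km(lat1, lon1, lat2, lon2) / KM_PER_NAUTICAL_MILE / hours


# Hypothetical track segment: one degree of longitude along the equator in 6 h.
distance = haversine_km(0.0, 0.0, 0.0, 1.0)
speed = translational_speed_kn(0.0, 0.0, 0.0, 1.0)
```

One degree along the equator is about 111 km, so the example storm moves at roughly 10 kn.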
Our implementation of the Holland08 model (including the translational TC movement) is freely available within the CLIMADA ISIMIP module (https://github.com/davidnbresch/climada_module_isimip/releases/tag/v1.0), which has been used to generate the provided data set. The input variables required to run the Holland08 model are summarized in Table 2.
The Holland08 model works best in the tropics; for TCs with subtropical transition that potentially enter the westerlies of the mid-latitudes we limit the translational wind speed component to 30 kn, thereby removing fast-moving storms that lack TC characteristics.
The present implementation of the Holland08 wind field model generates a complete wind profile for each TC by saving its lifetime's maximum wind speed at each spatial location; 1 min sustained wind speeds below 34 kn (17.5 m s⁻¹) are discarded (see Fig. 1).
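The reduction of a sequence of wind field snapshots to a lifetime-maximum footprint can be sketched as follows. The representation of each snapshot as a flat list of cell values and the choice to store discarded cells as zero are assumptions made for illustration.

```python
def lifetime_max_wind(snapshots, min_kn=34.0):
    """Per-cell maximum over all snapshots; values below min_kn are set to 0.0."""
    n_cells = len(snapshots[0])
    peak = [max(snapshot[i] for snapshot in snapshots) for i in range(n_cells)]
    return [w if w >= min_kn else 0.0 for w in peak]


# Hypothetical storm with two time steps on a three-cell grid (wind in kn).
snapshots = [[10.0, 40.0, 20.0],
             [30.0, 35.0, 70.0]]
footprint = lifetime_max_wind(snapshots)
```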

Overview of TCE-DAT
The final TCE-DAT is freely available in Geiger et al. (2017c). It is created by overlaying the estimated wind fields and the distributions of assets and population. We provide spatially explicit exposure data for each TC but also aggregated data of all nonzero country- and TC-specific exposure values. Two data types are included in TCE-DAT: (1) TCE-hist, where socioeconomic information matches the year of landfall, and (2) TCE-2015, where socioeconomic patterns are fixed at 2015 values. Aggregated TCE-DAT provides estimates of exposed population and exposed assets by event and by country for 34, 64, and 96 kn wind speed thresholds, corresponding to the Saffir-Simpson hurricane scale classification of tropical storm, hurricane, and major hurricane, respectively. TCE-DAT at the grid level provides exact wind speed information and exposed population and assets for each grid coordinate above land. Note that TCE-2015 contains 23 additional entries compared to TCE-hist. This is because population and assets distributions have expanded over time, so that some locations that were not exposed historically would be exposed if all historical TCs were to make landfall in 2015 (as assumed in TCE-2015).

Due to technological innovations the reporting of TCs in the IBTrACS database has improved significantly over time, reaching comprehensive global coverage by 1980 (see also Fig. 2). Compared to basin-wide TC activity, the number of landfalling TCs is smaller and shows greater variability due to underlying climate variability, e.g., driven by the El Niño-Southern Oscillation (ENSO). When using TCE-DAT to analyze trends in TC risk (see Fig. 3), one should be aware of potential underreporting in IBTrACS for earlier periods that might even affect landfalling TCs and can be one reason for trends.
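The aggregation from grid-level exposure to country-event-level totals can be sketched as below. The record layout, the country codes, and the numbers are hypothetical and do not reflect the actual file format of TCE-DAT.

```python
from collections import defaultdict

THRESHOLDS_KN = (34, 64, 96)


def aggregate_exposure(cells):
    """Sum exposed values per (country, threshold) for a single TC.

    cells: list of (country_code, wind_kn, value) tuples for land grid cells.
    Only nonzero totals are returned.
    """
    totals = defaultdict(float)
    for country, wind_kn, value in cells:
        for threshold in THRESHOLDS_KN:
            if wind_kn >= threshold:
                totals[(country, threshold)] += value
    return {key: total for key, total in totals.items() if total > 0}


# Hypothetical footprint of one storm touching three countries.
cells = [("USA", 100.0, 10.0), ("USA", 50.0, 20.0),
         ("CUB", 70.0, 5.0), ("MEX", 20.0, 7.0)]
result = aggregate_exposure(cells)
```

Because the thresholds are nested, the total at 34 kn always includes the cells counted at 64 and 96 kn.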

Limitations of TCE-DAT
We ask each user to consult the list of limitations of TCE-DAT before working with the data.
The IBTrACS archive is the most comprehensive data set of TC activity today. However, before the invention of remote sensing technologies, TC coverage in IBTrACS data is incomplete (see Fig. 2). In particular the Indian Ocean and the southern Pacific Ocean should be treated with care for all events before 1980.
The Holland08 wind field model (as well as other available wind field models) provides a rather generic setup to derive wind fields based on statistical properties of observed TCs. The wind field generated by the model represents a gross approximation of the actually realized wind field. Wind fields of "standard" TCs are more accurately captured by wind field models than TCs with very unusual properties, e.g., Superstorm Sandy in 2012, whose extent was unusually large despite its rather weak winds. Therefore, one should be aware of outliers when analyzing single storm properties from TCE-DAT. Furthermore, our methodology defines exposure solely using the storm's wind field and maximum sustained wind speed. We do not account for additional people and assets in regions that might still be exposed to, for example, severe precipitation and/or storm surges. This is particularly relevant for TCs that cause damage but whose wind field never touches land directly. The same is true for offshore activities (e.g., oil platforms, ships), whose assets remain unresolved by our methodology.
The socioeconomic data have been carefully assembled but still give rise to uncertainties, e.g., caused by linear interpolation between decennial time steps. While population distributions are comparatively well constrained, as subnational population counts have been collected for centuries, the uncertainty in the distribution of GDP is much larger, as reported subnational GDP and assets estimates are still unavailable for most countries at present. Additionally, GDP at the grid level is used to approximate local assets. While this assumption seems reasonable for the spatial resolution used in this work, there might still exist large discrepancies for specific grid cells and economic sectors. Furthermore, due to a lack of data, we use 2016 national assets / GDP ratios to approximate assets structure for all years between 1950 and 2015. As a consequence, the assets value of fast-developing countries might be overestimated for earlier years.

Validation of exposure estimates
TCs and their impacts are comprehensively studied in the United States. We therefore use the United States as a test region to compare TCE-DAT estimates with more comprehensive observational records for storm size and in order to evaluate the reliability of our methodology.
Our validation is based on the extended best track HURDAT (HURDAText) archive. This archive is also maintained by the NHC and, as an extension of the regular HURDAT archive, provides size estimates for most North Atlantic TCs since 1988 for the wind speed thresholds of 34, 50, and 64 kn and for the maximum wind speed (Demuth et al., 2006). No size information is available for intermediate wind speeds. Data by HURDAText are preprocessed (as described in Geiger et al., 2016) and compared to results from TCE-DAT for the variables wind speed at landfall and exposed population at 34, 64, and 96 kn for 87 TCs between 1988 and 2012.
The comparison of the TC's maximum recorded wind speed above land (Fig. 4a) shows a good qualitative agreement between both data sets with a Pearson correlation of r = 0.86. Perfect agreement cannot be expected for several reasons. First, the Holland08 model estimates TC wind speed indirectly based on minimum central pressure, thereby inhibiting a direct comparison of wind speeds at landfall. Second, the HURDAText data set provides observed wind speed in incremental steps (34, 50, 64 kn, and maximum wind speed). For TCs with no direct landfall of the storm's center (near misses) this provides only an approximate value for the real wind speed. As near misses also affect people and assets, however, they are included in TCE-DAT. Therefore, a single grid cell can decide between a miss and a near miss and consequently the results strongly depend on the exact wind field. This also explains why the actual number of TCs with nonzero exposure slightly varies between both data sets (see Fig. 4b). The relatively large difference in numbers of landfall for the 96 kn threshold is due to the fact that the HURDAText archive does not provide size estimates for 96 kn directly but rather for the radii of maximum winds only. Major TCs that do not hit land with their maximum winds are thus only included as TCs exceeding 64 kn despite the fact that a fraction of the wind field above land might well exceed the 96 kn threshold.
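For readers who want to repeat such a comparison with their own data, the Pearson correlation can be computed as in the following sketch. The two input series are invented for illustration and are unrelated to the values behind Fig. 4.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Hypothetical modeled versus observed maximum wind speeds over land (kn).
modeled = [40.0, 60.0, 80.0, 100.0]
observed = [45.0, 55.0, 85.0, 95.0]
r = pearson_r(modeled, observed)
```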
In a next step we compare the obtained exposure measures for different intensity thresholds, both at the individual event and aggregated level (see Fig. 5). This indicates the sensitivity of exposure measures to different surface wind estimates by Holland08 and HURDAText.
For 34 kn winds we find a good agreement (r = 0.83) for exposed population between Holland08 and HURDAText (see Fig. 5a). There are a few outliers where the exposed population based on HURDAText is several orders of magnitude larger than based on Holland08. Such large deviations are, however, expected as individual storms can strongly deviate from regular-sized TCs. Superstorm Sandy, which hit the US east coast in 2012, is a good example: Sandy's wind field of tropical storm force was huge in comparison to mean extensions of comparable events and extended all the way to Florida despite its landfall location in New Jersey. Similar deviations are also reflected in the exposure estimates across all TCs at 34 kn (see Fig. 5b): while mean affected population is comparable there are large deviations for higher percentiles.
Differences in TC exposure derived from observational and approximated wind fields become smaller with increasing intensity (Fig. 5c), and the mean numbers as well as the different percentiles of exposed population across all landfalling TCs between 1988 and 2012 compare well (see Fig. 5d). For 96 kn winds the number of TCs available for comparison is rather small (Fig. 5e, f), and there exists an additional bias as the 96 kn wind speed threshold is not provided in HURDAText explicitly; see discussion above. Nonetheless, and up to one outlier, we find good agreement between the exposure estimates from both data sets.
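The across-event comparison rests on percentiles of exposed population (the 10th, 25th, 50th, 75th, and 90th percentiles are shown in Fig. 5). A minimal sketch of such a summary is given below; it uses linear interpolation between order statistics, which is one of several common percentile conventions and an assumption here, and the exposure values are invented.

```python
def percentile(values, p):
    """Percentile p (0-100) with linear interpolation between sorted values."""
    ordered = sorted(values)
    k = (len(ordered) - 1) * p / 100.0
    lower = int(k)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (k - lower) * (ordered[upper] - ordered[lower])


# Hypothetical exposed population per event (arbitrary units).
exposure = [40.0, 0.0, 20.0, 10.0, 30.0]
summary = {p: percentile(exposure, p) for p in (10, 25, 50, 75, 90)}
```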
Based on the validation exercise for the United States we conclude that there exists a good qualitative and quantitative agreement between risk estimates drawn from the observation-based HURDAText and the generic Holland08 wind field data, despite known shortcomings of the Holland08 wind field model. Consequently, there exists confidence that exposure estimates for other parts of the world and other time periods can be used to approximate exposure given the lack of observed wind fields. Due to the generic wind field modeling approach, however, more confidence should be put into aggregated exposure estimates than single event exposure, in particular if additional information about this event is scarce.

Data availability
TCE-DAT was produced using publicly available data only. In particular, the open-source CLIMADA modeling tool module ISIMIP v1.0 (https://github.com/davidnbresch/climada_module_isimip/releases/tag/v1.0) was used to generate TCE-DAT. Gridded population data are freely available from the HYDE database (Klein Goldewijk, 2017); gridded GDP data and corresponding national GDP/assets conversion factors can be found in Geiger et al. (2017b). In addition to the data sources mentioned above, the already pre-processed socioeconomic data can also be accessed via the input data tab available at https://www.isimip.org/. We created a data collection DOI that assembles all presently available data sets as well as future amendments to TCE-DAT in Geiger et al. (2017c). Currently, this data collection DOI hosts the spatially explicit (Geiger et al., 2017c) and the aggregated (Geiger et al., 2017c) TCE-DAT repositories.

Conclusions
We here provide a new and comprehensive data set TCE-DAT for global historical TC exposure between 1950 and 2015. The data set contains spatially explicit exposure at the grid level and aggregated exposed population and exposed assets by event and country for 5335 events based on 2713 TCs, separating exposure to wind speeds above 34, 64, and 96 kn. This data set provides an assessment by overlaying estimated wind fields with gridded information about population and assets. While this approach has some limitations, in particular potentially large deviations from actually realized exposure for selected events because of the generic wind field model, it also overcomes various other issues that arise due to biased and/or changing reporting standards across time and space. Pure data of exposed population and assets, i.e., relying only on TC properties, are not available elsewhere. As a further benefit, TCE-DAT was created using only freely available input data and established methods and the freely available modeling tool CLIMADA with module ISIMIP.

Figure 5. Comparison of exposed population by event (a, c, e) and across events (b, d, f) for different wind speed thresholds using estimates from the Holland08 wind field model and the observed HURDAText database. In the right panels, boxes (whiskers) indicate the 25-75 % (10-90 %) percentile range, while yellow lines are medians.
In conclusion, this work provides a valuable additional resource to the community studying TC-related impacts, in particular for non-experts in this field. It avoids present endogeneity issues, in particular relevant for econometric assessments of TC impacts, by creating a TC exposure database based on physical storm properties. Based on this data set new insights are expected for global and region-specific vulnerability assessments and the long-run economic consequences of natural disasters in general.
Author contributions. TG and DNB wrote the code and created and analyzed the data set; TG, DNB, and KF designed the research and wrote the paper.
Acknowledgements. Tobias Geiger acknowledges funding through the framework of the Leibniz Competition (SAW-2013-PIK-5 and SAW-2016-PIK-1). We thank Daisuke Murakami and Yoshiki Yamagata for improving and adapting their gridded GDP data to the context of this study and their cooperation in making these data available to the public. We further thank Kirsten Elger from GFZ Data Services for invaluable support in creating the DOI data archives.
Edited by: David Carlson
Reviewed by: James Done and one anonymous referee