The amount of water stored on continents is an important constraint for
water mass and energy exchanges in the Earth system and exhibits large
inter-annual variability at both local and continental scales. From 2002 to
2017, the satellites of the Gravity Recovery and Climate Experiment
(GRACE) mission observed changes in terrestrial water storage (TWS) with an
unprecedented level of accuracy. In this paper, we use a statistical model
trained with GRACE observations to reconstruct past climate-driven changes
in TWS from historical and near-real-time meteorological datasets at daily
and monthly scales. Unlike most hydrological models, which represent water
reservoirs individually (e.g., snow, soil moisture) and usually provide
a single model run, the presented approach directly reconstructs total TWS
changes and includes hundreds of ensemble members which can be used to
quantify predictive uncertainty. We compare these data-driven TWS estimates
with other independent evaluation datasets such as the sea level budget,
large-scale water balance from atmospheric reanalysis, and in situ streamflow
measurements. We find that the presented approach performs overall as well as
or better than a set of state-of-the-art global hydrological models (Water
Resources Reanalysis version 2). We provide reconstructed TWS anomalies at a
spatial resolution of 0.5
Because the amount of freshwater available on land controls the development of natural ecosystems as much as that of human activities, terrestrial water storage (TWS) represents a critical variable of the Earth system. Changes in TWS can be caused by both anthropogenic and natural processes. Natural variability in ocean and atmospheric circulation, such as the El Niño–Southern Oscillation (ENSO), is responsible for anomalies in precipitation that strongly influence water storage (Ni et al., 2017), leading to regional droughts and floods with large impacts on human activities (Veldkamp et al., 2015). At the global scale, climate-driven fluctuations in the total amount of water stored on land have been linked to a wide range of geophysical phenomena, including changes in global mean sea level (Cazenave et al., 2014; Reager et al., 2016; Rietbroek et al., 2016; Dieng et al., 2017), changes in global carbon uptake by land ecosystems (Humphrey et al., 2018), and the motion of the Earth's rotational axis (Adhikari and Ivins, 2016; Youm et al., 2017). In addition to climate-driven natural variability, human activities also influence terrestrial water storage, for instance through groundwater depletion (Rodell et al., 2009; Chen et al., 2016), building of dams (Chao et al., 2008), or the impact of anthropogenic climate change on land ice (Jacob et al., 2012).
From 2002 to 2017, changes in TWS were measured by the GRACE satellites with unprecedented accuracy. Because these observations integrate both natural and anthropogenic effects across all water reservoirs (i.e., soil moisture, groundwater, snow, lakes, wetlands, rivers, and land ice), isolating the contribution of specific reservoirs or the relative importance of natural versus anthropogenic effects remains uncertain and has been the focus of several recent publications (Reager et al., 2016; Eicker et al., 2016; Wada et al., 2016; Fasullo et al., 2016; Felfelani et al., 2017; Getirana et al., 2017; Pan et al., 2017; Andrew et al., 2017; Rodell et al., 2018; Hanasaki et al., 2018; Khaki et al., 2018; WCRP Global Sea Level Budget Group, 2018). In this context, one critical aspect is to model the effect of climate variability on TWS changes. At this time, only global hydrological models and land surface models can provide long-term estimates of natural TWS variability; however, they are usually not calibrated against GRACE measurements and sometimes exhibit large biases in TWS amplitude (Schellekens et al., 2017; Zhang et al., 2017; Scanlon et al., 2018). Typically, only a small number of such model runs is available, so exploring the uncertainty related to the use of different meteorological forcing datasets is not possible. With this paper, we aim to address these shortcomings with a computationally cheap alternative. Unlike hydrological models, which represent physical processes and model water reservoirs individually (e.g., snow, soil moisture, lakes), we train a statistical model to directly reconstruct the total TWS changes from precipitation and temperature information.
The primary objective of this paper is to provide long and consistent time series of climate-driven TWS variability. Although the temporal coverage of GRACE observations will be extended by the GRACE Follow-On mission launched on 22 May 2018, there will be a temporal gap of approximately 1 year between the two missions. The reconstruction provided here is calibrated against GRACE measurements and can be used to interpret this data gap and reconcile the two datasets. In addition, we provide a century-long TWS reconstruction that can be used to study past natural TWS variability. We expect that this product will be relevant to sea level budget studies (Chambers et al., 2016; Cheng et al., 2017; Frederikse et al., 2018; WCRP Global Sea Level Budget Group, 2018), the analysis of climate signals in geodetic time series (in GRACE or in ground GNSS measurements, for example), development of daily hydrological loading models (Dill and Dobslaw, 2013; Moreira et al., 2016), and global to regional assessments of the recurrence of extreme hydrological droughts and their impact on ecosystems (Sheffield and Wood, 2007; Sheffield et al., 2012; Beguería et al., 2014; Griffin and Anchukaitis, 2014; Kusche et al., 2016; Dai and Zhao, 2016; Spinoni et al., 2017; Heim, 2017; Rudd et al., 2017; Sinha et al., 2017; Haslinger and Blöschl, 2017; Um et al., 2017; Bento et al., 2018; D'Orangeville et al., 2018; Huang et al., 2018; Markonis et al., 2018; Anderegg et al., 2018; Gao et al., 2018).
The two different monthly GRACE solutions used here (Table 1) are obtained
using the so-called mass concentration (mascon) technique. This technique
provides estimates of mass changes over small predefined regions, which are
referred to as
Table 1. GRACE datasets used for model calibration.
We use three different precipitation products which are intended to address the
needs of various user communities (Table 2). The multisource
weighted-ensemble precipitation dataset (MSWEP) merges a large number of
existing precipitation products, including satellite-based, rain-gauge-based
and reanalysis products (Beck et al., 2017, 2018). We expect
this dataset to provide a best estimate for the period 1979–2016. The Global
Soil Wetness Project Phase 3 (GSWP3) forcing dataset (Kim,
2017) is based on the 20th Century Reanalysis (20CR) version 2c
(Compo et al., 2011). The
original 20CR precipitation fields produced at a resolution of 2
Table 2. Meteorological forcing datasets.
A simple statistical model is calibrated at each GRACE mascon individually,
meaning that model parameters are space-dependent. One model is calibrated
for each combination of the two GRACE products (Table 1) with the three
precipitation products (Table 2). The meteorological forcing is always
spatially averaged over the spatial footprint of the GRACE mascons. Because
the model described here does not have any explicit constraint in terms of
mass or energy conservation, we refer to it as a statistical model; however,
its formulation is largely inspired by basic principles of hydrological
modeling. Assuming a linear water store model, water outputs are directly
proportional to the storage and inversely proportional to the residence time of the water store (e.g., Beven, 2012), so that the temporal evolution of the storage
can be approximated as
Small (large) values of the residence time indicate that water inputs tend
to leave the reservoir quickly (slowly), through either runoff or
evapotranspiration. Here we introduce seasonal changes in residence time
(e.g., related to snow accumulation during the cold season or increased
evaporative demand during the warm season) using a temperature-dependent
relationship. The residence time used in Eq. (1) is formulated as a function
of de-trended daily air temperature:
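To make the structure of such a store concrete, the following Python sketch steps a single linear store forward in time. It is only an illustration and not the exact formulation of Eqs. (1) and (2): the exponential-decay update, the linear dependence of the residence time on temperature, and the names `a`, `b`, `tau_min`, and `temp_anom` are assumptions introduced here.

```python
import math


def simulate_storage(precip, temp_anom, a, b, s0=0.0, tau_min=1e-6):
    """Daily linear water store with a temperature-dependent residence time.

    Assumed (illustrative) forms, not taken from the paper's equations:
        tau(t) = max(a + b * temp_anom(t), tau_min)   # residence time, days
        S(t)   = S(t-1) * exp(-1 / tau(t)) + P(t)     # exponential decay + input

    precip    : daily precipitation (e.g., mm per day)
    temp_anom : de-trended daily temperature, same length as precip
    a, b      : hypothetical parameters controlling the residence time
    s0        : initial storage
    """
    storage = []
    s = s0
    for p, t in zip(precip, temp_anom):
        tau = max(a + b * t, tau_min)      # keep the residence time positive
        s = s * math.exp(-1.0 / tau) + p   # decay of existing storage, then input
        storage.append(s)
    return storage


# A single 10 mm precipitation pulse drains faster for a short residence time.
fast = simulate_storage([10.0, 0.0, 0.0], [0.0, 0.0, 0.0], a=2.0, b=0.0)
slow = simulate_storage([10.0, 0.0, 0.0], [0.0, 0.0, 0.0], a=20.0, b=0.0)
```

In this sketch, a small residence time makes the pulse decay within a few days, whereas a large residence time retains most of it, which is the behavior described in the text.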
Illustration of the GRACE reconstruction at one given
The initial value of the storage (TWS
The daily water storage time series (Eq. 1) is averaged to monthly temporal
resolution (
The empirical residuals (
Characterization of the empirical model residuals for the
GRACE-REC dataset based on MSWEP precipitation and ERA5 air temperature,
calibrated with the JPL mascons.
To provide a practical solution to this problem, we generate ensemble
members which incorporate the spatial and temporal covariance structure of
the residuals. These ensembles can be easily averaged over any larger area,
and once averaged they provide a predictive spread that is representative
of the aggregated error. In order to generate these ensembles, we present
hereafter a spatial autoregressive (SAR) noise model
(Cressie and Wikle, 2011), which aims at
reproducing the spatial and temporal autocorrelation structure found in the
empirical residuals (
In the SAR model (Cressie and Wikle, 2011),
residuals (
The local autoregressive parameters
Illustration of the spatial autocorrelation of the empirical model
residuals and their representation in the SAR model (for the GRACE-REC
product based on MSWEP and calibrated with JPL mascons).
As mentioned in Sect. 2.3, the Markov chain Monte Carlo (MCMC) procedure
for model parameter estimation additionally provides a distribution of
equally acceptable model parameters (
The result of the above-described procedure is briefly illustrated and
evaluated in Fig. 4. For illustration, Fig. 4a shows the empirical residuals
Output of the SAR model for the generation of random noise
realizations that have a spatiotemporal structure similar to that of the
empirical model residuals (for the GRACE-REC product based on MSWEP and
calibrated with JPL mascons).
The presented method represents one of many possible approaches to the generation of ensemble members. This method has the advantage of reflecting the uncertainty of the reconstruction (compared to GRACE measurements) and mimics the empirical spatiotemporal autocorrelation structure of the errors while requiring only a minimal degree of model complexity and parameterization. We note that while the SAR model also represents errors coming from the GRACE solution itself, it does not include any anisotropic error structure (e.g., due to striping) because of the isotropic nature of Eq. (11). The uncertainty related to the choice of the input precipitation or training GRACE dataset can be explored independently by comparing the six different versions of GRACE-REC (see Table 3).
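The general idea of noise that is correlated in both space and time can be illustrated with the short Python sketch below. It is not the SAR model of this paper: it assumes a first-order autoregressive process in time whose innovations follow an isotropic exponential covariance in planar coordinates, and the names `rho`, `length_scale_km`, and `sigma` are hypothetical parameters chosen for the example.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - acc)
            else:
                lower[i][j] = (matrix[i][j] - acc) / lower[j][j]
    return lower


def correlated_noise(coords_km, n_steps, rho, length_scale_km, sigma, seed=0):
    """Noise realization, AR(1) in time and exponentially correlated in space.

    Assumptions (illustrative only): planar coordinates in km, isotropic
    correlation exp(-distance / length_scale_km), one temporal coefficient rho
    and one standard deviation sigma shared by all locations.
    """
    rng = random.Random(seed)
    n = len(coords_km)
    corr = [[math.exp(-math.dist(coords_km[i], coords_km[j]) / length_scale_km)
             for j in range(n)] for i in range(n)]
    lower = cholesky(corr)

    def spatial_draw():
        # Multiply white noise by the Cholesky factor to impose the correlation.
        white = [rng.gauss(0.0, 1.0) for _ in range(n)]
        return [sum(lower[i][k] * white[k] for k in range(i + 1)) for i in range(n)]

    field = [sigma * s for s in spatial_draw()]   # start from the stationary state
    innovation_scale = sigma * math.sqrt(1.0 - rho ** 2)
    series = [field]
    for _ in range(n_steps - 1):
        field = [rho * f + innovation_scale * s
                 for f, s in zip(field, spatial_draw())]
        series.append(field)
    return series


# 24 monthly noise fields at three hypothetical locations.
noise = correlated_noise([(0, 0), (100, 0), (0, 300)], 24, 0.5, 500.0, 10.0, seed=1)
```

Averaging such realizations over several neighboring locations does not shrink the spread as quickly as independent noise would, which is why a correlated noise model is needed to obtain a realistic aggregated error.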
Table 3. List of the six GRACE-REC datasets available at monthly and daily scales.
Finally, we note that our modeling approach could in principle be evaluated with a cross-validation experiment, using only a subset of the data to calibrate the model parameters and then evaluating the performance against the remaining unused data (as done in Humphrey et al., 2017). However, this would go beyond the scope and objective of this paper, which is to document the generation of the GRACE-REC product. We prefer to evaluate the ability of the final product to extrapolate beyond the model calibration period in later sections by comparing the model predictions with fully independent datasets (Sect. 4.3 to 4.5).
The GRACE-REC data provide de-seasonalized terrestrial water storage (TWS)
anomalies in units of millimeters of water (kg m
Using two different training GRACE datasets (Table 1) and three different precipitation forcing datasets (Table 2), we produce a total of six different GRACE-REC datasets with 100 ensemble members each. For convenience, we also provide smaller summary files which only contain the ensemble mean and 90 % confidence interval.
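For users who work with the full ensemble files, a summary of this kind can be recomputed along the following lines. The sketch assumes that the 90 % interval is bounded by the 5th and 95th percentiles across members and that percentiles are linearly interpolated; both choices, as well as the function names, are assumptions made for illustration.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of pre-sorted values."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def summarize_ensemble(members):
    """Return (mean, 5th percentile, 95th percentile) for every time step.

    members : list of equally long time series, one per ensemble member.
    """
    summary = []
    for values in zip(*members):           # iterate over time steps
        ordered = sorted(values)
        mean = sum(ordered) / len(ordered)
        summary.append((mean, percentile(ordered, 5), percentile(ordered, 95)))
    return summary


# Toy ensemble: 101 members, one time step, values 0, 1, ..., 100.
toy = summarize_ensemble([[float(i)] for i in range(101)])
# toy[0] == (50.0, 5.0, 95.0)
```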
For the daily TWS reconstructions, we only provide the ensemble mean of each GRACE-REC product in order to limit the data size. This ensemble mean is based on ensemble members which sample the parameter uncertainty only (Sect. 2.3.2). The reason for this is that no SAR model (Sect. 2.4.2) can be reliably calibrated at daily resolution as the two training GRACE datasets have monthly resolution. The format is identical to that of the monthly data (Table 3).
For global-scale applications, we provide global averages of the TWS time
series. Global averages are weighted by mascon area and include all land
mascons with or without Greenland and Antarctica (both options are
available). This format is especially suited for sea level and global water
budget studies and units are gigatons of water. To convert gigatons back to
millimeters of global land water, total land area values of 148 940 000 and
132 773 914 km² should be used with and without Greenland and Antarctica, respectively.
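The area weighting and the unit conversion can be sketched in a few lines of Python. The conversion relies only on the facts that 1 Gt of water equals 10^12 kg and that 1 kg of water per square meter corresponds to 1 mm. The sketch assumes that the larger of the two quoted land areas is the one that includes Greenland and Antarctica; the function and constant names are hypothetical.

```python
# Land areas quoted in the text (km^2). Assumption: the larger value includes
# Greenland and Antarctica and the smaller value excludes them.
LAND_AREA_WITH_ICE_SHEETS_KM2 = 148_940_000
LAND_AREA_WITHOUT_ICE_SHEETS_KM2 = 132_773_914


def gigatons_to_mm(mass_gt, area_km2):
    """Convert a water mass (Gt) into an equivalent water height (mm) over an area."""
    kilograms = mass_gt * 1e12          # 1 Gt = 1e12 kg
    square_meters = area_km2 * 1e6      # 1 km^2 = 1e6 m^2
    return kilograms / square_meters    # 1 kg m^-2 of water = 1 mm


def mm_to_gigatons(height_mm, area_km2):
    """Convert an equivalent water height (mm) over an area into a mass (Gt)."""
    return height_mm * area_km2 * 1e6 / 1e12


def area_weighted_mean(values_mm, areas_km2):
    """Average of per-mascon values weighted by mascon area."""
    total_area = sum(areas_km2)
    return sum(v * a for v, a in zip(values_mm, areas_km2)) / total_area


# 148.94 Gt spread over the full land area corresponds to 1 mm of water.
one_mm = gigatons_to_mm(148.94, LAND_AREA_WITH_ICE_SHEETS_KM2)
```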
Although linear trends are removed during model calibration (Eq. 6), potential TWS trends caused by decadal variability and long-term changes in precipitation are not removed from the final dataset (Eq. 8) and can be substantial. By definition, any trend found in the reconstructed TWS products is caused by a trend in the underlying precipitation forcing (since the time-varying residence time uses de-trended temperature and there is no limit to storage capacity). Thus the reconstructed TWS trends mainly depend on the trends initially present in the driving precipitation data (see Sect. 4.1.2 for an example at global scale).
With these elements in mind, it should be clear that there will be differences between the trends found in GRACE and the trends found in the reconstruction. Such discrepancies are expected because the reconstruction does not represent several sources of long-term changes in TWS, including, for instance, land ice melt, dams, anthropogenic water depletion (Reager et al., 2016; Felfelani et al., 2017; Rodell et al., 2018), or long-term changes in evaporative demand. Consequently, trends in GRACE-REC cannot be directly evaluated against the trends from GRACE itself. Indeed, when we compute trends over the period 2003–2014 (Figs. S2 and S3 in the Supplement), we find that reconstructed trends are consistent with GRACE trends only over certain regions, likely due to the reasons mentioned above (linear trends simulated by the WRR2 models are also shown in Fig. S4).
As illustrated in Humphrey et al. (2017), the reconstruction can be used to remove the precipitation-driven variability from the original GRACE time series in order to better isolate and quantify other sources of long-term changes (such as anthropogenic impacts). However, users interested in computing long-term TWS trends from this dataset should always proceed with caution as the dataset was not evaluated for trends. For regional analyses, we recommend using the model ensembles to obtain a range of possible trends and thus better assess the uncertainty. More generally, we highlight that the quality of the reconstruction is strongly dependent on the quality of the input precipitation forcing and on the adequacy of an exponential decay model for representing water storage behavior. For instance, routing of water through the river system is not represented and might be important over certain regions. Section 4.1 provides global maps of model performance that can guide regional applications.
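One simple way to obtain such a range of trends is to fit a linear trend to each ensemble member separately and to summarize the resulting slopes. The Python sketch below uses an ordinary least-squares slope and reports the minimum, median, and maximum across members; the choice of these summary statistics and the function names are assumptions made for illustration.

```python
import statistics


def linear_trend(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    cov = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    var = sum((t - t_mean) ** 2 for t in times)
    return cov / var


def trend_range(times, members):
    """Return (minimum, median, maximum) of the trends of all ensemble members."""
    slopes = [linear_trend(times, member) for member in members]
    return min(slopes), statistics.median(slopes), max(slopes)


# Three toy members with slopes of 1, 2, and 3 units per time step.
years = [0.0, 1.0, 2.0, 3.0]
toy_members = [[0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0], [0.0, 3.0, 6.0, 9.0]]
low, mid, high = trend_range(years, toy_members)
```

The spread between `low` and `high` then gives a first indication of how strongly a regional trend estimate depends on the individual ensemble realization.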
Correlation (of de-seasonalized, de-trended anomalies) between
GRACE-REC and GRACE JPL mascons
Nash–Sutcliffe efficiency (of de-seasonalized, de-trended
anomalies) between GRACE-REC and GRACE JPL mascons
In this section, the ensemble mean of GRACE-REC is compared against GRACE
observations. Note that this does not constitute an independent evaluation
because GRACE-REC is calibrated with GRACE data (comparisons with
independent sources are provided in Sect. 4.3 to 4.5). We evaluate model
performance with the Pearson correlation coefficient (Fig. 5) and the
Nash–Sutcliffe efficiency (Fig. 6). Model performance is highest
in regions with dense meteorological observing systems (e.g., Europe, western
Russia, North America, India, Australia) where we expect precipitation
datasets to have the highest accuracy. Over South America and central
Africa, the performance of the century-long reconstruction (GSWP3-based
products, Figs. 5c, d and 6c, d) is slightly inferior to that of multisource
and reanalysis precipitation datasets such as MSWEP and ERA5. Interestingly,
there is no clear difference in performance when GRACE-REC is calibrated
with the 3
We compare these performance metrics with the scores obtained by hydrological models and land surface models of the Water Resources Reanalysis version 2 (WRR2) (Schellekens et al., 2017; Dutra et al., 2017), which were also forced with MSWEP precipitation. Compared to the simple modeling approach used in GRACE-REC, WRR2 models are forced with additional meteorological information (such as radiation and humidity), were calibrated using various data streams, sometimes including GRACE observations (Dutra et al., 2017; Decharme et al., 2011, 2012, 2016; Vergnes et al., 2014; Krinner et al., 2005; de Rosnay et al., 2002; Van Der Knijff et al., 2010; Döll et al., 2009; Sutanudjaja et al., 2011, 2014; van Beek and Bierkens, 2008; van Beek et al., 2011; Wada et al., 2011, 2014; van Dijk et al., 2013, 2014), and are potentially able to resolve more complex processes that are relevant for TWS, such as snow dynamics, the effect of vegetation phenology on evapotranspiration, and runoff routing through the river system. We calculate TWS in WRR2 models by summing over all simulated water reservoirs (this includes soil moisture, snow, groundwater, and surface waters whenever these are represented in the models). It is important to underline that unlike WRR2 models, GRACE-REC is directly calibrated to reproduce GRACE observations. Therefore, GRACE-REC should be interpreted here as a benchmark, indicative of the performance that is at least achievable for a given precipitation dataset. In terms of Nash–Sutcliffe efficiency, GRACE-REC often obtains better scores than the WRR2 models (Fig. 7a). This is because the reconstruction better fits the local amplitude and variance of the observed TWS signal, as already diagnosed in previous work (Humphrey et al., 2017). We note that the reconstructions driven with ERA5 precipitation are most often superior to those driven with the other two precipitation datasets.
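The two performance metrics used in this section can be written compactly in Python, following their standard definitions. The toy example illustrates why a model with the correct timing but a biased amplitude can reach a perfect correlation and still obtain a poor Nash–Sutcliffe efficiency; the function names are chosen for this example only.

```python
import math


def pearson_r(obs, sim):
    """Pearson correlation coefficient between observed and simulated anomalies."""
    n = len(obs)
    obs_mean = sum(obs) / n
    sim_mean = sum(sim) / n
    cov = sum((o - obs_mean) * (s - sim_mean) for o, s in zip(obs, sim))
    obs_ss = sum((o - obs_mean) ** 2 for o in obs)
    sim_ss = sum((s - sim_mean) ** 2 for s in sim)
    return cov / math.sqrt(obs_ss * sim_ss)


def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit and 0 matches the skill
    of always predicting the observed mean."""
    obs_mean = sum(obs) / len(obs)
    error_ss = sum((o - s) ** 2 for o, s in zip(obs, sim))
    obs_ss = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - error_ss / obs_ss


# Correct timing but doubled values: perfect correlation, poor efficiency.
observed = [1.0, 2.0, 3.0, 4.0]
doubled = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(observed, doubled)          # 1.0
nse = nash_sutcliffe(observed, doubled)   # -5.0
```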
Global-area-weighted box plots of the performance metrics shown in
Figs. 5 and 6 for GRACE-REC datasets (blue), and comparison with the
performance of global hydrological models participating in the Earth2Observe
Water Resources Reanalysis version 2 (WRR2) (orange). Dark colors indicate
the performance obtained when comparing against
Global averages of all GRACE-REC products are illustrated in Fig. 8a.
Differences caused by different precipitation forcing datasets are much
greater than the differences related to different GRACE training datasets.
This is particularly true for long-term (
Comparisons with the de-trended GRACE global average are shown in Fig. 8b, c. We find that all GRACE-REC products produce a very similar inter-annual variability at the global scale and compare well against actual global mean GRACE, without applying any global constraint to the locally calibrated statistical model. Correlations between global means of GRACE-REC and global means of GRACE are larger than 0.75 (Fig. 9a) (evaluated over the common period 2003–2014). Compared to global means from the WRR2 models, GRACE-REC is on average better correlated (Fig. 9a) to the observed GRACE global mean and has a lower root-mean-square error (Fig. 9b), regardless of the GRACE dataset used for evaluation.
Agreement of the global average of different TWS model estimates (from GRACE-REC, blue, and WRR2 models, orange) with the observed TWS anomalies from JPL (squares) and GSFC (crosses) solutions.
We compare the daily GRACE-REC products with a Kalman smoothed daily GRACE
solution named ITSG-Grace2018 (Kurtenbach et al., 2012; Mayer-Gürr et al., 2018).
While this daily GRACE solution contains
significant information on the sub-monthly variability of TWS, the increased
temporal resolution comes at the cost of spatial resolution, which is on the
order of 500 km for this particular product (note that the solution is also
correlated in time as a result of the Kalman smoothing). As illustrated in
Fig. 10a, there can be a good agreement between GRACE-REC and
ITSG-Grace2018 for sub-monthly variability when daily averages are computed
over large regions (here the Mississippi basin). Figure 10b, c provide a
summary of the agreement between GRACE-REC and ITSG-Grace2018 at a daily
scale, as well as a comparison with the performance of WRR2 models. Due to
the coarse resolution of the ITSG-Grace2018 product, the comparison (Fig. 10b, c) is
conducted at a spatial resolution of 5
Together with changes in ocean heat content, changes in the amount of water
stored on land are responsible for a large fraction of the year-to-year
variability in global mean sea level (Boening et al., 2012; Cazenave et
al., 2014; WCRP Global Sea Level Budget Group, 2018). Because changes in land
water storage result in
opposite changes in ocean mass, the sea level budget provides an independent
means of evaluating various estimates of global mean TWS variability. Here we
assess the ability of terrestrial water storage products (GRACE, GRACE-REC,
and the WRR2 models) to close the sea level budget at the inter-annual timescale. We use de-seasonalized and de-trended global mean sea level (GMSL)
from satellite altimetry (Beckley et al., 2017) and steric
height estimates (GMSL
Over moderately large river basins (
We evaluate TWS products using a recently updated basin-scale water balance dataset (BSWB) (Hirschi and Seneviratne, 2017), which covers 341 catchments and is based on ERA-Interim reanalysis data (Dee et al., 2011) and runoff observations from the Global Runoff Data Centre (GRDC). The temporal coverage of BSWB estimates at each river basin thus depends on the availability of runoff data and does not always cover the GRACE time period. As a caveat, we note that BSWB should not be viewed as entirely independent of WRR2 models or as a ground truth. This is because moisture fluxes from ERA-Interim are not only influenced by the assimilated atmospheric profile information but are also dependent on the underlying land surface model (TESSEL), which is similar to WRR2 models in many aspects. All WRR2 models also used ERA-Interim as forcing data for all meteorological variables except for precipitation.
As illustrated in Fig. 12a for the Ob basin, we find that the reconstructed
TWS compares relatively well with BSWB estimates. Overall, all TWS products
considered here (including the GRACE data) seem to compare relatively
well with BSWB (Fig. 12b, c). We note that GRACE-REC products calibrated on
GSFC seem to compare slightly better with BSWB than the JPL-based products.
This might be because of the higher spatial sampling of the GSFC mascons
(1
In this section, we compare reconstructed TWS against streamflow
observations over the period 1901 to 2010. Streamflow and TWS of course
represent different variables with different units; however, we expect that
their temporal dynamics will correlate at the yearly scale, as illustrated
for the river Thames in Fig. 13a, b. Because observed streamflow is one of the
few water cycle variables available prior to 1980, it provides an
independent and useful means of evaluating the century-long reconstruction.
We use streamflow observations collected by the Global Streamflow Indices
and Metadata Archive (GSIM) (Do et al., 2018a; Gudmundsson et al., 2018).
From the 30 959 available stations, we keep stations with a basin size smaller
than 10 000 km
We find that TWS anomalies from both WRR2 models and GRACE-REC compare well
with yearly streamflow variability over the period 1980–2010 (Fig. 13c).
Reconstructions based on the GSFC products tend to perform slightly better,
again likely because of their higher spatial sampling (1
The presented dataset is publicly available
(
All datasets used in this paper are available at the following locations: GSWP3 (
We present a statistical reconstruction of climate-driven terrestrial water storage changes at daily and monthly resolution in six different configurations which cover three different time periods (Table 3). We evaluate the performance of this reconstruction and show that its overall accuracy is reasonable compared to other estimates of TWS variability available from global hydrological models. We also highlight the versatility and robustness of our approach by comparing our estimates with independent observations of Earth system variables outside of the calibration period.
The supplement related to this article is available online at:
VH and LG developed the approach. VH performed the analyses, produced the dataset and wrote the paper with feedback from LG.
The authors declare that they have no conflict of interest.
We thank Sonia Seneviratne for critical feedback and support of this work. We thank Hyungjun Kim for developing the GSWP3 forcing and providing us with early access to the data. We thank Richard Wartenburger for technical support. Model developers and data providers are also gratefully acknowledged for sharing their data.
This research has been supported by the European Research Council (DROUGHT-HEAT, grant no. 617518) and by the Swiss National Science Foundation (grant no. P400P2_180784).
This paper was edited by Christian Voigt and reviewed by three anonymous referees.