Global gridded precipitation over land : a description of the new GPCC First Guess Daily product



Introduction
Besides evaporation and condensation, precipitation is one of the three main processes within the hydrological cycle. It is the main source of fresh water for rivers, streams and lakes and is responsible for the (re-)distribution of fresh water over the globe. Unfortunately, precipitation events are strongly variable in space and time and the global distribution [...] extreme events and to reduce secondary damage, there is a large range of applications that depend on precipitation monitoring products. Furthermore, precipitation monitoring products are required to improve the understanding of the water cycle, the occurrence of extreme events, and the Earth's climate.
Therefore, the Global Precipitation Climatology Centre (GPCC) was founded in 1989 in support of the World Meteorological Organization (WMO). It is a contribution to the World Climate Research Programme (WCRP) and to the Global Climate Observing System (GCOS). The objective of GPCC is to provide high quality global precipitation analyses over land for monitoring and research of the Earth's climate. Rain gauge data from different sources have been collected and the database, which is updated regularly, has grown into one of the largest of its kind. Great importance is attached to the quality of the data, which undergo different quality checks before use. Different gridded monthly precipitation products based on these data are released on a regular basis to address the different needs regarding quality and time of availability. A detailed description of all products and GPCC's quality control procedures is given in Becker et al. (2013) and Schneider et al. (2013), respectively. However, for many applications such as monitoring the statistics of extreme events, determining the occurrence of individual heavy precipitation events, or comparison to model output or satellite data, a time resolution higher than one month is required. There are some other gauge-based gridded precipitation products with a temporal [...] However, their uncertainty is particularly high due to the indirect precipitation estimation method (Petty and Krajewski, 1996; Kidd et al., 2011). Moreover, the relatively short existence of satellites yields only a few time series exceeding 30 yr. Radar estimates have a very high spatial and temporal resolution but are not available for large parts of the globe. Moreover, their time series are even shorter than the satellite-based ones.
Gridded products of rain gauge measurements also have their disadvantages, namely a coarse distribution that results in an underestimation of the true precipitation amount by about 5 % (Legates and DeLiberty, 1993).
A combination of station and satellite measurements (for example the Global Precipitation Climatology Project (GPCP), Huffman et al., 2001) enhances the product quality, although some disadvantages remain, e.g. the adjustment of satellite data in sparsely sampled or unsampled regions. Furthermore, it is difficult to give quantitative error estimates for a merged product. Intercomparisons of different products identify biases (Yilmaz et al., 2005) but give no guidance on which product to prefer.
With its new "First Guess Daily" product, the GPCC aims to address the need for a global precipitation product with daily resolution as well as information on the uncertainty. This paper describes the new product, which was released in April 2013 and is available for the days since 1 January 2009. An update is provided three to five days after the end of each month. The spatial resolution is 1° latitude by longitude and the product covers the entire land surface except Antarctica. In addition to the precipitation estimate, an estimate of the uncertainty is provided. As the name implies, it is a quick analysis of the daily precipitation amounts, designed to be released in near real-time for applications that need up-to-date data. The advantage of this approach is its high timeliness, but this comes at the cost of a lower number of input stations and a lower level of quality control: only data from SYNOP messages, which are reported via the Global Telecommunication System (GTS), are used, and they undergo only a preliminary automatic quality control (see Sect. 3).
First we describe a comparison of different interpolation schemes to find the best performing one for daily data (Sect. 2). An overview of the input data and quality control is given in Sect. 3. The calculation of the "First Guess Daily" is described in Sect. 4 and how to access the data set in Sect. 5. An outlook on scheduled future daily precipitation analyses is given in Sect. 6. Finally, we summarize our findings.

Several methods for gridding daily precipitation data have been applied and compared to find the best performing scheme for interpolating daily precipitation amounts: a modified SPHEREMAP interpolation as it is used for the monthly products [...]. For all methods a minimum of 4 and a maximum of 10 neighbouring stations were used. The search radius is chosen depending on the mean station density (Shepard, 1968) and increases for each grid cell independently with decreasing station density. Cross-validation was used for quality evaluation.
For the interpolation two approaches were applied: interpolation of totals and interpolation of anomalies. In the interpolation of totals, the daily precipitation amounts were interpolated as measured. In contrast, in the interpolation of anomalies, the precipitation anomaly or fraction was calculated as the ratio of the daily precipitation total to the monthly precipitation total at the station. After interpolation, the anomalies were multiplied with the measured monthly total at the station to obtain the daily total.
To cross-validate the different interpolation schemes, the precipitation total of one station was left out and the precipitation at this station was interpolated from the surrounding stations. By comparing the measured and the interpolated precipitation at this station, different error estimates were calculated: mean absolute error (MAE), mean squared error (MSE, given in Table 2), and the observed and interpolated precipitation at the station. This was done for all available stations for every day of the year 2008 (the year with the largest number of available stations).
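The leave-one-out procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the interpolator is a plain inverse-distance weighting on planar coordinates, used here as a stand-in and not the SPHEREMAP or Kriging schemes compared in this paper, and the function names and sample stations are hypothetical.

```python
import math


def idw(target, stations, power=2.0):
    """Inverse-distance-weighted estimate at `target` (stand-in interpolator).

    `stations` is a list of (x, y, value) tuples in planar coordinates.
    """
    num = 0.0
    den = 0.0
    for x, y, value in stations:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a station
        w = d ** -power
        num += w * value
        den += w
    return num / den


def leave_one_out(stations, interpolate=idw):
    """Leave each station out in turn and return (MAE, MSE) of the estimates."""
    errors = []
    for i, (x, y, observed) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]
        estimate = interpolate((x, y), others)
        errors.append(estimate - observed)
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mae, mse


# Hypothetical daily totals (mm) at four stations.
stations = [(0.0, 0.0, 2.0), (1.0, 0.0, 4.0), (0.0, 1.0, 0.0), (1.0, 1.0, 6.0)]
mae, mse = leave_one_out(stations)
```

Any interpolation scheme with the same call signature can be passed as `interpolate`, so the same loop can compare several schemes on identical data.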
Kriging (Krige, 1981; Gandin, 1993) is a stochastic approach for the interpolation of data. The interpolation theory was developed in the geosciences by the French mathematician Georges Matheron and named in honour of the South African mining engineer Daniel Krige. It is now a popular interpolation method used for many applications in the geosciences. Like almost all interpolation algorithms, Kriging uses weights that depend on the distance between the measurement location and the target location. The weights decrease with increasing distance. The particular approach of Kriging is to calculate the weights according to a (moderately) data-driven function rather than an arbitrary function as in SPHEREMAP (Shepard, 1968; Willmott et al., 1985). The spatial correlation structure of the data is used to determine the weighting function.
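To make the idea of data-driven weights concrete, the following is a minimal sketch of ordinary point Kriging. It rests on several assumptions that are not taken from the GPCC implementation: an exponential covariance model, planar coordinates in km, a unit sill, and the correlation length of about 347 km quoted later in this paper. All function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def covariance(h, sill=1.0, corr_length=347.0):
    """Assumed exponential covariance as a function of distance h (km)."""
    return sill * math.exp(-h / corr_length)


def ordinary_kriging(target, stations, values):
    """Return (estimate, kriging variance, weights) at `target`."""
    n = len(stations)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Station-to-station covariances, bordered by the unbiasedness constraint.
    a = [[covariance(dist(stations[i], stations[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Station-to-target covariances; the final 1 forces the weights to sum to 1.
    b = [covariance(dist(s, target)) for s in stations] + [1.0]
    solution = solve(a, b)
    weights, mu = solution[:n], solution[n]
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = covariance(0.0) - sum(w * c for w, c in zip(weights, b[:n])) - mu
    return estimate, variance, weights


stations = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
values = [1.0, 2.0, 3.0, 4.0]
estimate, variance, weights = ordinary_kriging((50.0, 50.0), stations, values)
```

Note that the Kriging variance depends only on the station geometry and the covariance model, not on the measured values. Block Kriging would additionally average the station-to-target covariances over the points of the block.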
In order to find the optimal interpolation setting and weighting function for Kriging, different autocorrelation parameters were applied and the results were compared. In the first approach, the same autocorrelation parameters were used for each grid point irrespective of the region or station density (parameters summarized in Table 3). The second approach was to calculate individual autocorrelation parameters for each grid point whenever enough data were available. If the data coverage does not support a grid point specific diagnosis of the autocorrelation parameters, the lookup table for autocorrelation parameters (Table 1 in Kottek and Rubel, 2007), which depends on the Koeppen-Geiger climate classification, is applied instead (see Fig. 4 for the spatial distribution of climate zones). It appears that the first approach, with only one set of parameters for all grid points, shows the best results. We assume the reason is that in most regions there are not enough measurements to calculate reliable autocorrelation parameters because precipitation is too variable. Even if not only the stations within one grid cell but all stations within a climatic area and for a longer time period are considered, the spatial structure is not sufficiently represented and the calculated parameters vary strongly. These results have also been found for other datasets (e.g. Hofstra et al., 2008).
The comparison of these methods shows that their performance does not differ significantly if they are compared within the same climate zone (see Table 2). Because precipitation totals, the size of the climate zones and the number of stations differ greatly between climate zones, the MSE also differs strongly between zones, as it is not normalized to take this into account. Depending on the region, data density, time period or amount of precipitation, different interpolation schemes perform better, but all in all their results are similar. Even the choice between interpolation of anomalies and of absolute values does not strongly influence the results in many regions and makes little difference when comparing the globally calculated mean squared error (MSE, Table 2). Nevertheless, ordinary point Kriging shows slightly better results than the other interpolation methods, and the MSE for the interpolation of anomalies is slightly lower than for absolute values in nearly all cases. Especially in regions with very sparse data and strongly varying climate zones, such as central Africa, South America and parts of Southeast Asia, the anomaly interpolation partly makes up for the missing values (Figs. 2 and 3). Hence ordinary block Kriging with anomalies was adopted for the daily precipitation products (see Sect. 4.3). Additionally, it has the advantage that an error estimate is provided implicitly. These results are supported by results found for other regions and datasets (Dubrule, 1984; Tabios and Salas, 1985; Dirks et al., 1998; Hofstra et al., 2008; Herrera et al., 2012).

3 Data source and quality control
The GPCC database consists of a large amount of daily and monthly data from several different data sources such as national meteorological and hydrological services, global and regional data collections (e.g., Food and Agriculture Organization of the United Nations (FAO), Global Historical Climatology Network (GHCN), Climatic Research Unit (CRU), European Climate Assessment & Dataset (ECA&D)) and near real-time data from WMO-GTS (SYNOP reports, CLIMAT messages for monthly data). Monthly data have been collected since the foundation of GPCC in 1989, whereas the collection of daily data has only been started recently. All GPCC products for monthly totals are based on the monthly data including a "Climatology". A detailed description of the complete monthly data, the database and the monthly GPCC products is given in Becker et al. (2013) and Schneider et al. (2013).

Data source "First Guess Daily"
As the "First Guess Daily" product is a near real-time GTS-based product, only data from SYNOP reports can be used. The WMO-Vol. A (WMO) lists all currently operating SYNOP stations. Most of them report precipitation and these are utilized for the "First Guess Daily". Currently there are roughly 6000 to nearly 8000 stations with daily SYNOP data each month (see Fig. 1, light blue line) for the processed period. These stations are spread around the globe, but the station density is high in some regions (e.g. Europe) and low in others (e.g. Russia, Northern Canada). The coverage across many African, Asian and South American regions is especially weak in both number and availability (see Fig. 5, top right). For each day and each station, the reported precipitation amounts for the specific time intervals (1 h, 3 h, 6 h, 12 h, 18 h or 24 h) are added up to a daily sum for the climatological day.

Quality control "First Guess Daily"
Within the processing routine for SYNOP reports, quality checks are made, including a plausibility check for consistency of periods that overlap in time and a check for obvious coding errors. Format errors are also detected and corrected, for example when the weather group is shifted to the precipitation group. Additionally, the reported weather groups are used to evaluate the plausibility of the reported precipitation amount as well as to complete missing precipitation values when the weather group indicates no precipitation.
Wrong precipitation codings are corrected. For example, if the weather group indicates light rain and the precipitation group contains RRR = 919, then RRR is changed to 991. Decoded, this means the precipitation total is changed from 919 mm to 0.1 mm.
As precipitation measurements can overlap in time, the precipitation total of the shorter report has to be equal to or lower than that of the longer report. For example, a 12 h report with 20 mm can include a 6 h report with 20 mm or less.
The combination of precipitation and weather group, if available, is also checked. If the weather group indicates no rain and the precipitation group reports rain, then the precipitation is left out. Before interpolation, the daily totals calculated from the SYNOP data are loaded into the database and the station metadata are checked (location and confusion with other stations).
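Two of the checks above can be sketched as follows. The decoding follows our understanding of the RRR convention in the WMO code tables (001 to 988 are whole millimetres, 989 means 989 mm or more, 990 is a trace, 991 to 999 are 0.1 to 0.9 mm). Treating a trace as 0.0 mm is an assumption made here, and the function names are hypothetical rather than taken from the GPCC processing routine.

```python
def decode_rrr(rrr):
    """Decode a SYNOP RRR code figure (0..999) to a precipitation amount in mm."""
    if not 0 <= rrr <= 999:
        raise ValueError("RRR must be between 000 and 999")
    if rrr <= 989:
        return float(rrr)  # whole millimetres; 989 stands for 989 mm or more
    if rrr == 990:
        return 0.0  # trace, treated as zero here (assumption)
    return (rrr - 990) / 10.0  # 991..999 encode 0.1..0.9 mm


def overlap_consistent(short_total_mm, long_total_mm):
    """A shorter period contained in a longer one cannot have a larger total."""
    return short_total_mm <= long_total_mm


# The example from the text: 919 decodes to 919 mm, the corrected 991 to 0.1 mm.
wrong_mm = decode_rrr(919)
corrected_mm = decode_rrr(991)

# A 6 h total of 20 mm is consistent with a 12 h total of 20 mm, 25 mm is not.
ok = overlap_consistent(20.0, 20.0)
bad = overlap_consistent(25.0, 20.0)
```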

"First Guess Daily" product -calculation of the gridded data set
For the "First Guess Daily" product the reported SYNOP precipitation measurements are converted to areal means on regular grid cells with a spatial resolution of 1° latitude by longitude. In addition to the total precipitation values in mm day⁻¹, two different error estimates and the number of measurements per grid cell are provided.

Background climatology
All monthly GPCC products were interpolated as anomalies from long-term means, called "Climatology". Briefly, this "Climatology" focuses on the period 1951 to 2000 but also uses other periods if the mentioned one is not available. A detailed description of the "Climatology" is given in Schneider et al. (2013).

Interpolation of monthly data
The GPCC "First Guess" product (Ziese et al., 2011) is used for the monthly gridded data. This product is based on SYNOP reports only, which undergo the same quality control as for the "First Guess Daily" (see Sect. 3.2), and is available three to five days after the end of each month. Monthly totals are calculated if the station has at least 70 % data coverage. Depending on the amount of missing data, the monthly total is extrapolated. For the calculation of the gridded monthly totals, the anomalies relative to the long-term means at the station are interpolated to a regular grid using a modified SPHEREMAP scheme. The monthly totals are then calculated as the sum of the interpolated anomalies and the "Climatology" for each grid point. A detailed description of the "First Guess" product and the interpolation method is given in Becker et al. (2013).
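The two rules stated above, the 70 % coverage threshold and the addition of the "Climatology" to the interpolated anomaly, can be written down as a small sketch. The function names and the sample numbers are hypothetical, and the extrapolation of incomplete months is deliberately left out because its details are not given here.

```python
def monthly_total_available(n_days_reported, n_days_in_month, min_coverage=0.70):
    """True if the station reaches the required monthly data coverage."""
    return n_days_reported / n_days_in_month >= min_coverage


def gridded_monthly_total(interpolated_anomaly_mm, climatology_mm):
    """Monthly total at a grid point: interpolated anomaly plus climatology."""
    return interpolated_anomaly_mm + climatology_mm


# A station with 21 of 30 days reaches exactly 70 % coverage and is kept.
keep = monthly_total_available(21, 30)
# An interpolated anomaly of -12.5 mm on a climatological mean of 80 mm.
total_mm = gridded_monthly_total(-12.5, 80.0)
```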

Interpolation of daily data
For the "First Guess Daily" product the irregularly spaced observations of precipitation are interpolated to a regular grid using ordinary block Kriging. The parameters used are summarized in Table 3. The block size of the applied Kriging scheme is 1°. The advantage of block Kriging in comparison to point Kriging is that it implicitly calculates the areal precipitation for the block size. To this end, the precipitation is calculated at several points within the block surrounding the grid point and then a weighted sum of these totals is calculated. This is equivalent to interpolating with point Kriging at several locations around the grid point and calculating a weighted mean of the results to obtain the areal precipitation.
For each grid box that contains at least a fraction of land, the nearest neighbouring measurements are chosen. Precipitation values of stations located within a distance of less than 1 km of each other are averaged so that the impact of these measurements is not overestimated and no station is included twice by mistake. Depending on their distance to the grid point and the chosen autocorrelation function, interpolation weights are calculated and the estimate of the precipitation within the grid box is computed. This is also done for areas with sparse station density by increasing the search radius for neighbouring stations (Shepard, 1968). In those regions the search radius can be much larger than the correlation length of precipitation (around 347 km, see also Table 3) and therefore the uncertainty of the calculated precipitation can be as high as the calculated precipitation itself. At the same time, this approach allows a complete coverage of the precipitation estimate over the land surface. No information is provided for grid boxes that do not contain land surface. A detailed description of the theory of ordinary block Kriging and the implementation of Kriging in Fortran is given in Rubel (1996).
The processing of the "First Guess Daily" product is based on an anomaly interpolation method. The daily precipitation anomalies are calculated as the daily total divided by the monthly total at the station. Stations without monthly totals (monthly data coverage less than 70 %) are excluded from the analysis. These ratios are interpolated to the grid and the results are multiplied with the gridded monthly totals from the "First Guess" in order to obtain absolute daily precipitation values. Because the "First Guess" is used as background for the "First Guess Daily", the "Climatology" is also included indirectly in the "First Guess Daily". Therefore, results in data-sparse regions are more reliable than with a simple interpolation of daily totals. The disadvantage of the interpolation of anomalies is that the daily product can only be calculated upon completion of the monthly product, which is about three to five days after the end of each observation month. Therefore, a future near real-time global availability (in the order of one day) of the GTS-based daily precipitation analysis would require a much more homogeneous global coverage of GTS stations.
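The arithmetic of the anomaly method can be illustrated with a short sketch. The interpolation of the ratios to the grid is omitted, so the gridded ratio below is simply a given number. The function names, the handling of a completely dry month (ratio set to zero) and the sample values are assumptions for illustration only.

```python
def daily_ratio(daily_mm, monthly_mm):
    """Daily anomaly as the fraction of the station's monthly total.

    Returns None for stations without a monthly total, which are excluded.
    """
    if monthly_mm is None:
        return None
    if monthly_mm == 0.0:
        return 0.0  # dry month: no precipitation on any day (assumption)
    return daily_mm / monthly_mm


def daily_grid_total(interpolated_ratio, gridded_monthly_mm):
    """Absolute daily total: interpolated ratio times gridded monthly total."""
    return interpolated_ratio * gridded_monthly_mm


# A station reporting 12 mm on one day of a month with 60 mm in total.
ratio = daily_ratio(12.0, 60.0)
# Suppose the ratio interpolated to a grid cell is also 0.2 and the
# monthly "First Guess" total of that cell is 50 mm.
grid_mm = daily_grid_total(0.2, 50.0)
```

Because the ratios of all days of a month sum to one at each station, the daily grid totals stay tied to the monthly background field.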

Uncertainty information
Depending on the station density and on the structure and amount of precipitation in a region, the uncertainties of the gridded product can be very high. To inform the user about the quality of the gridded data, information on the uncertainties and the number of stations per grid cell is provided with every product file. Information on the uncertainty is provided in two different parameters.
1. The Kriging uncertainty (Fig. 5, bottom right) depends mostly on the interstation distance (the autocorrelogram), the size of the grid and the number of measurements. It is a result of the interpolation equations of Kriging and can be interpreted as the percentage of the variance (Rubel, 1996); it is not an absolute error estimate. Furthermore, the amount of precipitation does not influence the Kriging error. Users should consult this information to identify problematic areas.
2. An absolute value of the uncertainty is given by the standard deviation (Fig. 5, bottom left), calculated according to Yamamoto (2000). It depends on the measured precipitation values. The differences between the values of the surrounding measurements and the interpolated value are calculated and weighted with the same weights as applied for the interpolation. Furthermore, in contrast to the Kriging error, this error estimate is always zero for regions where no precipitation is interpolated or measured. Further possibilities of calculating an uncertainty estimate have been implemented and validated with cross-validation (the regular standard deviation and the combination of the Yamamoto standard deviation and the Kriging error). However, due to the different concepts of the Kriging error and the Yamamoto standard deviation, we have decided to show them both with the aforementioned user advice instead of a blended parameter.
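The weighted standard deviation described in item 2 can be sketched as follows. This is our reading of the description above, not the GPCC code: the squared differences between the neighbouring measurements and the interpolated value are weighted with the interpolation weights, which are assumed here to be non-negative and to sum to one. The function name and sample values are hypothetical.

```python
import math


def interpolation_std(weights, values):
    """Return (estimate, weighted standard deviation around the estimate)."""
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * (z - estimate) ** 2 for w, z in zip(weights, values))
    return estimate, math.sqrt(max(variance, 0.0))


# Two equally weighted neighbours with 0 mm and 4 mm.
estimate, std = interpolation_std([0.5, 0.5], [0.0, 4.0])

# All neighbours dry: estimate and standard deviation are both zero.
dry_estimate, dry_std = interpolation_std([0.25, 0.25, 0.25, 0.25], [0.0] * 4)
```

Unlike the Kriging uncertainty, this quantity depends on the measured values, which is why it vanishes wherever all contributing stations report no precipitation.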

Finally, a third piece of uncertainty information is provided with the "First Guess Daily" product in terms of the number of measurements per grid cell (Fig. 5, top right). This number is the actual number of measurements within the grid cell and must not be mistaken for the number of measurements used for the interpolation to that grid cell. It gives the user a rough impression of the distribution of the measurements and therefore of the quality of the product at a certain location.

How to access
All gridded GPCC products are available free of charge via the public GPCC webpage: ftp://ftp-anon.dwd.de/pub/data/gpcc/html/download_gate.html. No registration is required to download the data.
Daily data are provided as a single netCDF file for each month, containing the gridded data (total precipitation, number of measurements and both uncertainty estimates) for each day of the month. "First Guess Daily" is provided on a regular global grid (−180 to 180° longitude and −90 to 90° latitude) with a grid size of 1° latitude by longitude. This grid has no projection! One netCDF file has a size of about 3 to 3.5 MB.
netCDF is a standardized, self-describing binary file format (netCDF). The coordinates of each grid cell and the missing value are coded in the header of each netCDF file and can be read by suitable software. We give the coordinates of the centre of each grid cell. The missing value is −99999.99, but visualization software can display it with other values. Table 4 summarizes the applied grid and Table 5 the variables in the netCDF file. Some software to analyse and convert netCDF files is listed at the "First Guess Daily" website, which is referenced by doi:10.5676/DWD_GPCC/FG_D_100.
Data from sources other than GTS take longer to arrive at GPCC. Also, their quality control at GPCC is more extensive than for the SYNOP data. For daily data the quality control comprises, in addition to the metadata check, a check of consistency with monthly data, a check of the assignment of the measurement to the correct day and a statistical check for outliers and data errors. Only quality-controlled data are loaded into the database for further use. Figure 1 shows the number of measurements from each data source within the database. Daily precipitation products based on all available data will be available in non real-time (see below).
Two further daily precipitation products are currently under development at GPCC: "Full Data Daily" and a combined product with HOAPS (Andersson et al., 2010a,b). These products will be released by GPCC in order to satisfy the need for high quality global precipitation products that utilize all available stations with daily precipitation totals which have undergone a strict quality control.

As the GPCC's database of measurements is growing, there will be a "Full Data Daily" product in the near future which will be based on all data available at the time of processing. As mentioned above, all data will undergo a thorough quality control. An update will be provided whenever the amount of data has increased significantly. The "Full Data Daily" product will be provided with spatial resolutions of 2.5° as well as 1° latitude by longitude. The same interpolation method as for the "First Guess Daily" will be applied.

Merged GPCC-HOAPS product
The subproject aims to establish a new gridded precipitation dataset primarily constructed to study the decadal behaviour of precipitation. The new product will be a combination of a GPCC-produced in situ measurement analysis over land ("Full Data Daily") and a satellite-based precipitation product developed within DAPACLIP. The satellite-based product largely relies on experience gained during the generation of the Hamburg Ocean Atmosphere Parameters and Fluxes from Satellite Data (HOAPS), which was originally developed at the University of Hamburg and the Max Planck Institute for Meteorology in Hamburg (Andersson et al., 2010a,b) and is now released by EUMETSAT's Satellite Application Facility on Climate Monitoring (CM SAF, Schulz et al., 2009; Schröder et al., 2013). Both products are combined in such a manner that the gap along the coastline between the products is filled in an optimum way and the estimate of the uncertainties is preserved.
HOAPS is based on satellite data from the SSM/I sensor, which are available for the time period from 1988 until 2008. Therefore, the merged product will be generated for this time period, too. It will become available in the second half of 2014 with two different spatial resolutions, 1° and 2.5° latitude by longitude, with global coverage, and additionally with a spatial resolution of 0.5° latitude by longitude for Europe.
Moreover, a special projection on the CORDEX grid (0.22° resolution latitude by longitude) will be provided for Europe.

Reference information on the new Global Precipitation Climatology Centre (GPCC) "First Guess Daily" product is provided. It is designed to address the need for a globally gridded, land-surface, in situ measured precipitation product with a high temporal resolution, provided in near real-time. In order to achieve the latter requirement, only data available via GTS can be considered and only a rough automatic quality control can be conducted. Therefore, this product provides only a first estimate of the precipitation fields of each day. Additionally, three different uncertainty parameters are provided. The "First Guess Daily" is interpolated by applying ordinary block Kriging to anomalies. These anomalies are calculated relative to the monthly total at the station. For the gridded daily total, the interpolated anomalies are multiplied with the gridded monthly total from the "First Guess".
A comparison of different interpolation schemes was also given. Ordinary point Kriging of anomalies has the best global performance of all tested interpolation schemes.
Due to its daily resolution, the "First Guess Daily" can be applied to study extreme precipitation events on a global scale. Monitoring of dry spells and droughts is also possible. It can be used to adjust remotely sensed data to in situ measurements.
Acknowledgements. First of all, we are most grateful to the data suppliers, which are to the largest extent the National Meteorological and/or Hydrological Services around the world, but also some other institutes. These data contributions have put GPCC in the position to provide the global precipitation analyses described in this document, and we are looking forward to their further contributions, which are crucial in order to maintain and enhance GPCC's level of products in terms of scope and quality.