A description of the global land-surface precipitation data products of the Global Precipitation Climatology Centre with sample applications including centennial (trend) analysis from 1901–present

and Water Cycle Experiment (GEWEX) of WCRP (Rudolf, 1995). The series has been complemented backwards to 1979 by another preliminary gauge product using the same analysis method but a reduced input data set (Xie et al., 1996).
The GPCC effort was initially set up as a scientific project in support of the GPCP effort. In view of the high quality of its first delivery for GEWEX, the GPCC was institutionalized at the Deutscher Wetterdienst upon a WMO request for long-term operation of GPCC. Subsequently, GPCC has been integrated into new permanent instruments such as the WMO GCOS. Since 1999, GPCC has been one of the two global GCOS Surface Network Monitoring Centres (GSNMCs), with special emphasis on precipitation. The other GSNMC, responsible for air temperature monitoring, is operated by the Japan Meteorological Agency (JMA).
In parallel with these developments, GPCC has successively extended the temporal coverage of its analysis products backward, from originally 1986 to 1951 and 1901 in the years 2004 and 2008, respectively (see also Fig. 1). Data for earlier periods are also available in the data archive, but so far GPCC has decided to refrain from analyses prior to 1901.
Of course, the GPCC portfolio of gridded global datasets of monthly terrestrial precipitation based on gauge data was and is not unique as such. Similar data archives and products have been compiled and published by the Climate Research Unit (CRU) of the University of East Anglia (New et al., 1999, 2000, 2001, 2002), by Peterson et al. (1997, 1998) based on the Global Historical Climatology Network (GHCN) data set, by Hijmans et al. (2005), and by Mitchell and Jones (2005), all for a number of atmospheric ECVs including precipitation. For precipitation only there are also the datasets published by Dai et al. (1997) and Matsuura and Willmott (2012). A strength of some of these data sets lies in the public availability of both the gridded products and the underlying original station observations. This is in distinct contrast to the GPCC data products, where the latter cannot be provided for many stations because GPCC does not claim copyrights on acquired data; the same is true for the non-global APHRODITE data sets published by Yatagai et al. (2009, 2012). Therefore, GPCC applies a general policy not to pass on any original station data but to forward such requests to the original suppliers, if possible. On the other hand, the GPCC data archive is by far the largest world-wide for monthly precipitation, exceeding the global precipitation data coverage of all aforementioned data sets by at least a factor of two and partly much more. The non-claiming of copyrights on the original data is certainly a key to this success.
In line with the scope of the ESSD journal, this paper serves as a reference publication to describe the multi-decadal and partly centennial data products published by the [...]
The fifth product, the Homogenized Precipitation Analysis Product (HOMPRA), the successor of the VASClimO Product published by Beck et al. (2005) and still available from ftp://ftp.dwd.de/pub/data/gpcc/vasclimo/, could not be completed before submission of this paper. Therefore, only basic features will be described here (Sect. 7.4), while a thorough description will be published in a follow-up paper accompanying the issuance of HOMPRA.
The issuance of the DOI references implies that ISO 19115 compliant metadata is provided under URLs constructed from the DOI preceded by http://data.datacite.org. For example, the metadata for the "GPCC Climatology Version 2011" at 0.25° resolution is available from http://data.datacite.org/10.5676/DWD_GPCC/CLIM_M_V2011_025. Moreover, the DOI-referenced GPCC products are included in the dataset catalogue of the Climate Data Centre (CDC) of Deutscher Wetterdienst. This catalogue disseminates ISO 19139 compliant metadata on its data sets through the GeoNetwork software application. For example, the GPCC Climatology Version 2011 products are documented under http://cdc.dwd.de/catalogue/srv/de/main.home?uuid=de.dwd.gpcc.climatology.v2011.
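The URL construction described above can be illustrated with a minimal Python sketch. The function name `metadata_url` is hypothetical, and the DOI is written with underscores, which is an assumption about the original identifier; whether the service still resolves such URLs is not checked here.

```python
# Base URL named in the text; the helper itself is illustrative only.
DATACITE_PREFIX = "http://data.datacite.org/"


def metadata_url(doi: str) -> str:
    """Return the metadata URL obtained by prefixing a DOI with the base URL."""
    # Strip stray whitespace and a leading slash so the parts join cleanly.
    return DATACITE_PREFIX + doi.strip().lstrip("/")


if __name__ == "__main__":
    # DOI string assumed to use underscores (hypothetical spelling).
    print(metadata_url("10.5676/DWD_GPCC/CLIM_M_V2011_025"))
```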
In this GPCC reference paper the underlying data base and its provenance are described thoroughly in Sect. 2, followed by a brief description of the data quality control (QC) applied to the station data in Sect. 3. The QC issues are elaborated in a companion paper by Schneider et al. (2012). Here, we will focus on the description of the gridded [...] past century. Finally, Sect. 8 informs on access methods to the data sets and provides basic user advice.

GPCC's rain gauge data base
The accuracy of rain gauge based precipitation analyses mainly depends on the spatial density of the stations used. For example, in order to calculate monthly area-mean precipitation on a 2.5° × 2.5° grid with a sampling error of less than 10 %, 8 to 16 stations per grid cell are needed, depending on the variability of the precipitation in the region analysed (Jenne and Joseph, 1985; Rudolf et al., 1994). A sampling error of 10 % has also been the accuracy requirement of the GPCP (WMO, 1990), which corresponds to a global requirement of 40 000 homogeneously distributed stations worldwide.
The rain gauge data received so far by GPCC can be divided into the part received in near-real time (Sect. 2.1) through the Global Telecommunication System (GTS) and the much bigger part collected offline (Sect. 2.2). In total, data of more than 85 000 stations have been integrated at least once throughout the centennial reanalysis period starting in year 1901. This corresponds to a success rate of 57 % or 34 %, depending on which estimate of the total number of gauges operated world-wide in national meteorological or hydrological observation networks is taken: 150 000 according to New et al. (2001) or 250 000 according to Strangeways (2007). However, these figures differ depending on the required duration of the longest uninterrupted time series of each station used in the GPCC analysis. Applying a 10 yr minimum constraint, as used as a screening criterion for the cadre data set of the background climatology GPCC-CLIM, the number of eligible stations already drops to slightly more than 67 000. Requiring coverage of fixed 30 yr reference periods, the number drops further to approximately 35 000 for the WMO standard period. As will be discussed in further detail in Sect. 6, this situation has driven decisions on the length of the reference period chosen for the GPCC Climatology.
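To make the screening by the longest uninterrupted time series more concrete, the following Python sketch counts the longest run of consecutive months with data for one station and applies a minimum-length constraint. The function names and the representation of a station record as (year, month) tuples are assumptions for illustration; the actual GPCC screening is not documented at this level of detail here.

```python
def longest_run_months(months):
    """Length (in months) of the longest run of consecutive months with data.

    `months` is an iterable of (year, month) tuples, month in 1..12.
    """
    # Map each (year, month) to a running month index and drop duplicates.
    idx = sorted({year * 12 + (month - 1) for year, month in months})
    best = run = 0
    prev = None
    for k in idx:
        # Extend the current run if this month directly follows the previous one.
        run = run + 1 if prev is not None and k == prev + 1 else 1
        best = max(best, run)
        prev = k
    return best


def is_eligible(months, min_years=10):
    """True if the longest uninterrupted series spans at least `min_years`."""
    return longest_run_months(months) >= 12 * min_years


if __name__ == "__main__":
    # Hypothetical station with complete data for 1951-1960.
    record = [(y, m) for y in range(1951, 1961) for m in range(1, 13)]
    print(longest_run_months(record), is_eligible(record))
```

A station with a single missing month in the middle of such a record would fail the 10 yr constraint in this sketch, since the run is split into two shorter pieces.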

ESSDD 5,2012 A description of the global land-surface precipitation data products

Near real-time GTS data base
If real-time access to the station data is required, for example to issue monitoring products suitable for a watch function, the number of actually available stations drops dramatically to a subset of about 8000 stations, out of the 12 000 stations listed in WMO Volume A (WMO, 2011a), that are currently exchanged internationally between the National Hydro Meteorological Services (NHMSs) on a regular basis. These data are disseminated in near real-time by the NHMSs via the (World Weather Watch) GTS. Monthly precipitation data from the following three sources are routinely obtained at GPCC within about one month after observation and can thus be used for the early analyses.

Meteorological synoptic data (SYNOP) received at DWD, Offenbach
The SYNOP data received at DWD forms the primary GTS source. Its primary purpose is the analysis of global current weather charts and initialization of numerical weather prediction models. For GPCC purposes only the precipitation-related components of the SYNOP code are evaluated as follows:

- The precipitation group t_R RRR, with t_R = time interval (t_R can be 1, 2, 3, 6, 9, 12, 15, 18 or 24) and RRR = precipitation total for the interval t_R in mm, or in tenths of mm for precipitation amounts less than 1 mm (RRR ≥ 990)
- The weather group wwW1W2, where ww and W1, W2 describe the observed current weather (e.g. ww = 65: heavy rainfall) and the past weather.
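The encoding of RRR described in the precipitation group can be sketched as follows. This is a minimal illustration that follows only the rule stated above (whole mm below 990, tenths of mm from 990 upwards); the function name `decode_rrr` is hypothetical, and any further special code values of the SYNOP code are deliberately not handled.

```python
def decode_rrr(rrr: int) -> float:
    """Convert a three-digit RRR code into a precipitation total in mm.

    Assumption taken from the description above: codes below 990 are whole
    millimetres, codes 990..999 encode amounts below 1 mm in tenths of mm.
    """
    if not 0 <= rrr <= 999:
        raise ValueError("RRR must be a three-digit code between 0 and 999")
    if rrr >= 990:
        # 990 -> 0.0 mm, 991 -> 0.1 mm, ..., 999 -> 0.9 mm
        return (rrr - 990) / 10.0
    return float(rrr)


if __name__ == "__main__":
    for code in (25, 990, 995):
        print(code, decode_rrr(code))
```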
The limited number of GTS stations makes their data particularly precious, and GPCC always aims to make maximum use of the SYNOP data. To this end, the GPCC data processing routine includes some automatic quality checks and corrections to rescue damaged SYNOP messages: [...]
In the framework of the regulated global data exchange, monthly climatic data for more than 2000 stations are disseminated by the countries via GTS as CLIMAT bulletins. The CLIMAT bulletins include monthly means or totals for a number of variables compiled from reprocessed SYNOP observations collected by the publisher. The data are known to be of high quality because some control of quality and completeness has been performed on them. However, some errors still occur, partly caused by the manual coding process. Therefore, the data are checked by GPCC upon arrival for typical coding errors, completeness and consistency. The plausibility of the monthly precipitation is examined using additional information that is also part of the CLIMAT bulletin, e.g. the number of days with precipitation above 1 mm and the quintile of the monthly data with regard to the frequency distribution, making it possible to recognize and flag questionable data. The resulting quality-controlled CLIMAT precipitation data serve as reference data during GPCC's QC procedures, explained in much more detail by Schneider et al. (2012).

As a third source, GPCC utilizes the monthly precipitation data of the Climate Prediction Center (CPC), Washington DC, hosted by the National Oceanic and Atmospheric Administration (NOAA), which are mainly based on SYNOP data. The global SYNOP data collective received by CPC through the GTS does not fully overlap the collective received at Offenbach, and thus features unique data contributions. While the CPC receives more data for the Americas, Eastern Russia and some African regions, the DWD reaches a much higher data density over Europe. The CPC procedure to estimate monthly precipitation totals, and especially to fill gaps in the SYNOP precipitation series, is different from the GPCC method described before. In addition to GPCC's method, CPC includes precipitation data statistically estimated from the qualitative weather observations ww and W1, W2, and extrapolates to the full month even if only relatively few observations are available. Therefore GPCC ranks redundant CPC-SYNOP data below the DWD ones but still applies cross-checking of both data sets to detect trivial data transmission or encoding/decoding errors.
In order to obtain the best possible spatial data coverage at the earliest time, as required by the GPCP and other users, the GPCC merges for its Monitoring Product the monthly totals from all three GTS data sources CLIMAT, DWD-SYNOP and CPC (Fig. 2) after each of them has been loaded into GPCC's relational data base management system (RDBMS), from where they are subsequently available for the monthly near-real-time GPCC Monitoring Product and other analyses.
The near real-time data base provides a sufficient basis for quantitative precipitation estimates in some regions, if the grid resolution is not too high. Therefore, the GTS-based GPCC products are only offered on 2.5° and 1.0° latitude-longitude grids but not at 0.5°, in contrast to the reanalysis products, which also utilize non-GTS data. Moreover, the number of stations per grid cell is provided as additional information with every GPCC product, to allow for an easy assessment of its potential reliability.
Within the data pool, the CLIMAT data, after a quality check, are assigned a higher quality and therefore provide a reference for the quality assessment of the SYNOP-based data. The earliest GPCC product (First Guess) is publicly available for all months since August 2004 and utilizes just the DWD SYNOP-based monthly precipitation totals from approximately 6000 stations (Fig. 2). Most recently this number has increased to more than 6800 stations. For the GPCC Monitoring Product, issued two months later than the corresponding First Guess Product, almost 8000 stations are utilized nowadays, starting from approximately 6000 GTS stations in 1986.
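The ranking of the three GTS sources in the merge can be illustrated by a small Python sketch. It assumes that the ranking acts as a simple preference order (CLIMAT first, then DWD-SYNOP, then CPC-SYNOP) when several sources report the same station and month; the data layout and the function name `merge_monthly` are hypothetical, and the cross-checking and QC steps described in the text are not reproduced.

```python
# Preference order assumed from the text: CLIMAT is the reference,
# and redundant CPC-SYNOP data rank below the DWD ones.
PRIORITY = ("CLIMAT", "DWD-SYNOP", "CPC-SYNOP")


def merge_monthly(totals_by_source):
    """Merge monthly totals from several sources for one month.

    `totals_by_source` maps a source name to {station_id: total_mm}.
    Returns {station_id: (total_mm, source)} using the highest-ranked
    source that reports a value for the station.
    """
    merged = {}
    # Walk from lowest to highest priority so better sources overwrite.
    for source in reversed(PRIORITY):
        for station, value in totals_by_source.get(source, {}).items():
            if value is not None:
                merged[station] = (value, source)
    return merged


if __name__ == "__main__":
    month = {
        "CLIMAT": {"10637": 42.0},
        "DWD-SYNOP": {"10637": 41.5, "10384": 30.2},
        "CPC-SYNOP": {"10384": 29.0, "72405": 88.1},
    }
    print(merge_monthly(month))
```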

GPCC full data base
All other data not exchanged through the GTS have originally been collected by WMO NHMSs for the specific purposes of the host countries, and their exchange is subject to particular national rules. Still only a few NHMSs publish their national data without restrictions or copyright claims through the Web. Therefore GPCC needs to perform data acquisition in order to access the many more stations that are not reporting through the GTS. Major sources are:
i. National data contributions by WMO Members (158 countries and 31 regional suppliers so far, bringing the number of national sources to 189, see Appendix A)
ii. Data collections of some international regional projects (e.g. SE Asia, Africa, Former Soviet Union)
iii. Global data collection of the Climate Research Unit (CRU, Norwich, UK; New et al., 2002)
iv. Global data collection of the UN Food and Agricultural Organisation (FAO, Rome, Italy)
v. Collection of the Global Historical Climatology Network (GHCN, NCDC Asheville, USA; Peterson et al., 1997, 1998)
National data contributions are normally acquired through bilateral correspondence of GPCC with the responsible national agencies. Moreover, all WMO Members are informed by circular letters of WMO about the international task of the GPCC and the corresponding data requirements. The GPCC has no funds for purchasing data, nor even for covering any shipping costs. The data delivered are contributions of the countries to the international task of the GPCC and are restricted to the defined purpose. Accordingly, GPCC globally applies a policy to respect the copyrights of every data supplier and to publish only gridded products derived from the data but not the original data itself. While this data policy can be criticized as contradicting open access data policies, it has yielded a data base of double and triple size (Fig. 3), respectively, compared with the most popular data sets of the Global Historical Climatology Network (GHCN; Peterson et al., 1997, 1998) and the Climate Research Unit (CRU) of the University of East Anglia (New et al., 2002), which also supply the original data to the community. In order to warrant transparency on our methods without passing data to third parties, any scientist is invited to inspect the GPCC archive and its methods on site at the Headquarters of the Deutscher Wetterdienst.
Original data are provided from a national source (i) and originate directly from the institution (e.g. a WMO NHMS) that has actually carried out the measurement. They constitute the core part of the GPCC data base (Fig. 3). To be comprehensive in its approach, the GPCC also integrates other global precipitation data collections from sources (iii) to (v) as well as several regional data sets. For example, a data set from Nicholson (1979) comprising unique precipitation data across Africa was integrated in 2010, as well as an update of the data set of Pavel ("Pasha") Groisman (NCDC, 2005) for the countries of the Former Soviet Union. As a result of these efforts the GPCC holds the worldwide largest and most comprehensive collection of monthly precipitation data, which is continuously updated and extended.
All precipitation data received are stored in source-specific slots within the RDBMS, and the corresponding meta information and quality indices are assigned to the data. The eight aforementioned major sources (CLIMAT, DWD-SYNOP, CPC-SYNOP, National, Regional, CRU, FAO and GHCN) are considered. Figure 3 displays the temporal evolution of the number of monthly precipitation station data in the GPCC data base from the different sources during the period 1901-2011.
The volume and timeliness of the individual data provisions differ largely. The time delay results in an increase of the number of stations available for the analyses when looking back from year 2011, where basically the GTS coverage determines the total number of available stations, to the month of best data coverage, i.e. June 1986, with monthly precipitation data available for 47 400 stations. Looking further into the past, there is a drop from 47 228 stations in January 1986 to 41 285 in December 1985, followed by an increase further backward to another local maximum of 45 869 stations for June 1970. This drop is a remnant of the initial project phase, in which only months since January 1986 were considered. GPCC will ultimately fill this gap with future data acquisition. The data coverage for all months before June 1970 shows the typical behaviour of almost monotonically decreasing data availability due to loss of, or not yet performed rescue and digitization of, historic data records. Only World Wars I and II briefly interrupt this monotonic decrease with the age of the data records.
Apart from the number of stations successfully recruited, the homogeneity of their global distribution is another quality criterion for a climate data set. Therefore, Fig. 4 demonstrates how the coverage of the GPCC data base is composed (as of July 2012) with regard to the six regions defined by the WMO Regional Associations (RAs). Interestingly, the total numbers are quite similar across all RAs except RA1 (Africa) for the periods before World War II, whereas a spread occurs for the periods after 1950, with RA4 (North America) and RA6 (Europe) showing the highest numbers up to 10 603 [...] Fig. 4 demonstrates that the time elapsing until the arrival of historic data is much shorter for RA4 and RA6, where it takes 5-7 yr, in distinct contrast to RA1-3, where the increase through historic data is much smaller and it can easily take 20 yr before data arrive at the GPCC. This situation challenges all efforts to achieve a geo-temporally homogeneous station coverage, which is a prerequisite for highly reliable gridded trend analysis products based solely on in-situ data.
As listed in Appendix A, GPCC also holds records from the 19th century, but the overall coverage is currently too low to justify a reanalysis of these early periods. Further successes in data acquisition and rescue might change this situation in the mid-term future.
All in all, acquisition and integration of the additional non-GTS data from sources (i) to (v) take much longer than through the GTS, but even delays of some years are to be accepted for the sake of a high-quality and reliable quantitative gridded reanalysis, which is crucial for global climate variability and hydrological studies. Therefore, the processing of the individual data collectives is a continuous GPCC activity and requires a number of steps:
a. Identification of the file content (variables, period), general structure and specific [...]
f. Semi-automatic quality control of the monthly precipitation data based on a comparison of the data from the different sources with respect to the spatial and statistical data structure.
Apparently redundant data from different sources for the same stations and time allow for cross-comparison, quality control and assessment of the accuracy of the data to be selected for analysis. This quality-controlled merging of data from all eight sources leads to the best possible and most comprehensive data base. The semi-automated QC system applied is detailed in Schneider et al. (2012). All products are generated from this data base by selecting data with respect to data quality and product specifications.
The spatial distribution of 6325 stations for the GTS data basis and of all 46 711 stations available for a well-covered month (July 1987) is shown in the left column of Fig. 5, with large data-sparse regions, in particular across parts of Africa, Central and South America, East and Central Asia. The spatial distribution of 7964 stations for the GTS data basis forming the August 2011 monitoring product versus the more than 67 200 stations available for the GPCC climatology of the month of August is shown on the right-hand side of Fig. 5. The row-by-row comparison in Fig. 5 demonstrates how the time constraint affects the data availability and station density; the column-by-column comparison reveals, in the top row, the limited temporal homogeneity of the GTS data coverage and, in the bottom row, the improved data coverage for the longer integration period of a climatology versus a monthly reanalysis, which serves as an argument for the anomaly interpolation method introduced in Sect. 4.

So the data coverage is very different depending on whether the data collection takes place with a time constraint (in online mode) or if time is a less important criterion. As will be shown in Sect. 7, both modes have their applications. In all cases the availability of a reliable background climatology is crucial for the quality of the analysed product.
During the last two decades the set of GPCC data and products has continuously grown, both in temporal coverage and in the extent and quality of the underlying data base. Until the end of 2003, the period covered by the GPCC reanalysis products reached back from present to just 1986, when the GPCP project was started. Later, in 2004 and 2008, GPCC extended this period back to 1951 and 1901, respectively, as shown in Fig. 1, which depicts the evolution of the GPCC Monthly Precipitation Database across the dates of issuance of the latest five versions of its Full Data Reanalysis Product (GPCC FD). This product is only updated after substantial growth of the data base. It can be seen that the starting period of GPCC, 1986-2001, is still the period with the highest number of station data. However, a larger increase of the number of stations available for the periods before 1986 and after 2001 is visible, in particular for the updates from Version 3 to 5. The gap between 1986 and the years before is thus almost closed with the most recent Version 6, issued in December 2011 and discussed in this paper. Moreover, the numbers of 30 000, 35 000 and 40 000 stations are exceeded for the 56, 45 and 31 yr periods 1950-2005, 1959-2003 and 1962-1992, respectively, making those periods particularly reliable for analyses of means, anomalies, variability and even trends of global land-surface precipitation.
Figure 6 shows the evolution of the number of station months in the GPCC Monthly Precipitation Database (decades with data from 1901 onwards) during the period August until December 2011. It indicates that the extension of the GPCC data base concerning historical data (data before year 1951) started in 2007. The historical extension of the GPCC data base during the last 8 yr is clearly visible when looking at the decades with data before year 1981.
Altogether, the number of station months tripled from 13 to 40 million, making GPCC the host of the worldwide largest and most comprehensive collection of monthly precipitation data, which is continuously extended. [...]
Green, blue and magenta colours indicate grids with a nominal sampling error of less than 10 % of the precipitation total on the grid according to Jenne and Joseph (1985). This criterion is missed across vast areas, in particular during the first two decades (Fig. 7a, b), but later the world-wide best data coverage of the GPCC is good enough for a fair spatial homogeneity of the station density. Figure 7h shows the consequences of the rather limited number of available GTS stations, leading to a wide-spread exceedance of the 10 % sampling error criterion. Comparison of Fig

Data processing and quality control
The collected data are imported into a relational data base, where they are kept in eight separate source-specific slots. This methodology allows for a source-specific cross-comparison of the data. As none of the sources is error free, each source is allowed to provide the reference information on a case-by-case basis. This is realized by a comparative analysis of data entries from different sources for the same or neighbouring stations, the latter only in cases that remain ambiguous if only the station itself is considered. Typical errors identified during data import are factor 10 errors (caused by a format shift or coding errors), factor 2.54 or factor 25.4 errors due to wrong inch-to-mm conversions, shifts of the reference time, or geo-reference errors that had affected the data already before arrival at the GPCC. Any time new data are imported into the data base, an elaborate procedure is applied to compare the accompanying metadata of the pertinent stations with the metadata already available for these stations in the data base. In case of discrepancies (e.g. deviating coordinates), external geographical sources of information are utilized to decide whether a correction of the metadata information in the data base is required or not. Moreover, the precipitation data to be imported are compared against background statistics. Exceptional values are checked and either confirmed, corrected if possible, or flagged as erroneous and thereby excluded from the analyses. This approach requires a high level of human interaction, due to the complexity of the error analysis, which varies strongly from case to case in the absence of generally valid screening criteria. Despite all corrections applied by the GPCC, a set of the original data is also kept, allowing backtracking of all corrections. A detailed account of GPCC's data processing and quality control is presented by Schneider et al. (2012).
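As an illustration of how the typical factor errors named above might be spotted, the following Python sketch compares a newly imported monthly total with a reference value for the same station and month (for example from another source) and reports whether their ratio is close to one of the known factors. The function name, the inclusion of the inverse factors and the 5 % tolerance are assumptions made for this sketch; the semi-automatic GPCC procedure involves human judgement and is not reproduced here.

```python
# Factors named in the text, plus their inverses (an assumption of this sketch).
KNOWN_FACTORS = (10.0, 0.1, 2.54, 1 / 2.54, 25.4, 1 / 25.4)


def suspect_factor(value, reference, rel_tol=0.05):
    """Return the known factor that value/reference is close to, else None.

    `value` is the total to be checked, `reference` a trusted total for the
    same station and month; `rel_tol` is a hypothetical relative tolerance.
    """
    if value <= 0 or reference <= 0:
        # Ratios are meaningless for zero or negative totals.
        return None
    ratio = value / reference
    for factor in KNOWN_FACTORS:
        if abs(ratio - factor) <= rel_tol * factor:
            return factor
    return None


if __name__ == "__main__":
    print(suspect_factor(254.0, 10.0))  # ratio near 25.4
    print(suspect_factor(12.0, 10.0))   # no known factor
```

A flagged value would still need confirmation, since a genuine extreme month can coincidentally produce a ratio near one of these factors.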

Calculation of gridded precipitation data sets (interpolation method)
The calculation of area means on the grid cells from gauge observations consists of three major steps: the interpolation from stations to regular latitude-longitude grid points staggered at 0.25° resolution, the calculation of area-mean precipitation for grid cells sized 0.25° (GPCC-CLIM) or 0.5° (GPCC-FD, MP), and the assessment of area-mean precipitation for larger grid cells (0.5°, 1° or 2.5°) or other areas (e.g. river basins).

Interpolation of gauge data onto regular grid points
For the GPCC (background) climatology and the full data reanalysis products on a 0.25° latitude-longitude grid, GPCC still prefers the very robust empirical interpolation method SPHEREMAP. The method constitutes a spherical adaptation (Willmott et al., 1985) of Shepard's empirical weighting scheme (Shepard, 1968), which takes into account:
a. the distances of the stations to the grid point (for a limited number of nearest stations),
b. the directional distribution of stations in relation to the grid point (in order to avoid an overweighting of clustered stations), and
c. the gradients of the data field in the grid point environment.
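Component (a) of the scheme, the distance weighting on the sphere, can be sketched in Python as a plain inverse-distance interpolation using great-circle distances. This is a deliberately simplified illustration and not SPHEREMAP itself: the directional term (b) and the gradient term (c) are omitted, and the weighting exponent, the Earth radius and the default of 16 nearest stations are illustrative choices.

```python
import math


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def idw(grid_lat, grid_lon, stations, n_nearest=16, power=2.0):
    """Inverse-distance weighted value at a grid point.

    `stations` is a list of (lat, lon, value) tuples; only the `n_nearest`
    closest stations contribute.
    """
    ranked = sorted(
        (great_circle_km(grid_lat, grid_lon, lat, lon), value)
        for lat, lon, value in stations
    )[:n_nearest]
    if not ranked:
        raise ValueError("no stations available")
    if ranked[0][0] == 0.0:
        # A station located exactly on the grid point determines the value.
        return ranked[0][1]
    weights = [dist ** -power for dist, _ in ranked]
    return sum(w * v for w, (_, v) in zip(weights, ranked)) / sum(weights)


if __name__ == "__main__":
    demo = [(0.0, 1.0, 10.0), (0.0, -1.0, 20.0), (5.0, 5.0, 80.0)]
    print(idw(0.0, 0.0, demo))
```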
This choice was made in 1991 following external studies (Legates, 1987; Bussieres and Hogg, 1989) and internal inter-comparison studies (Rudolf et al., 1994) indicating that the SPHEREMAP method is particularly suitable for the analysis of a global precipitation climatology. In an inter-comparison study of four different interpolation schemes (Bussieres and Hogg, 1989) it was the best of the empirical schemes and performed almost as well as Optimum Interpolation.
Willmott et al. (1985) apply a weighting method for all stations beyond a minimum distance to the grid point (ε1, circles filled blue in Fig. 9). However, if closer stations are found, their method relies only on those stations and applies a simple arithmetic mean to them, while neglecting all stations outside this environment. This leads to neglecting many potentially useful stations and information in areas of high station density. Therefore GPCC has introduced the following modifications to interpolate data of stations surrounding each point of the stereographic GPCC product grids:

- Vicinities are defined by concentric circles of different radii (see Fig. 9) defining threshold distances to the grid point considered
- A second distance (ε2, circles filled green in Fig. 9) is introduced, defined by 50 % of the grid cell size (depicted by grid lines in Fig. 9). This approach still leaves up to 21.5 % of the stations unprocessed, but any larger circle would lead to a double use of stations, as the green circles in Fig. 9 would start to overlap each other
- ε1 is defined by 10 % of the grid size instead of Eq. (14) in Willmott et al. (1985)
- The simple arithmetic mean method is now only applied if stations are found within the vicinity defined by radius ε1 but not within the wider (green) circle of ε2
- In all other cases, stations are interpolated with the original weighting method, even those located closer than ε1
- For the normalization in the weighting method, we keep Shepard's method for the calculation of the combined weighting (term w in Eq. (7) of Willmott et al., 1985 and t in the first equation on page 520 of Shepard, 1968, respectively)
- The radius around the grid point beyond which the weighting of a station reaches zero is determined as published by Shepard (1968)
In view of the potentially high number of stations involved in the interpolation process (> 10 000), it is advisable to introduce an intelligent search algorithm to identify for each grid cell the closest stations to be utilized for the interpolation. Instead of ranking the distances across all stations, we apply in advance a clustering of stations on 2° × 2° sized grid cells and limit the search algorithm to a search window sized 1.75 times the cluster size (here 3.5°) at the equator. Towards higher latitudes the size of the search window is conserved in the canonical manner by scaling with the cosine of the latitude. The 175 % oversizing of the search window ensures that it does not miss stations just outside the closest cluster. The target number of stations to be ranked is 16.
If the original search window contains fewer stations than this target number, the window is doubled in size and the search is repeated until the target number is reached. For latitudes higher than 87.5°, the whole area is regarded as one cluster, giving one for each pole region.
Since 2008, when Version 4 of the precipitation reanalysis was issued, the GPCC has enhanced its gridding method to a climate anomaly method. This became possible because the GPCC data base allowed for the first time the calculation of the GPCC climatology product (Version 2008), to be utilized as background field for the anomaly method. Now the anomalies, instead of the absolute precipitation totals, can be interpolated with the methods described in Sect. 4.1, yielding product-specific gridded deviations from the climatology that are subsequently superimposed on the gridded global background. For earlier versions this methodology was not applicable, as there were too few stations with sufficiently long data series to calculate climatological normals for a reliable gridded global background. The method is most beneficial in data-scarce (under-sampled) regions, as we will demonstrate in the sampling error Sect. 5.2.
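The principle of the climate anomaly method can be summarized in a short Python sketch: station anomalies are formed against the station normals, the anomalies are interpolated to the grid, and the result is added to the gridded background climatology. The function and parameter names are hypothetical, the interpolator is passed in as a callable so that any scheme can be plugged in, and the clipping of negative totals to zero is an assumption of this sketch rather than a documented GPCC step.

```python
def anomaly_analysis(stations, grid_points, background, interpolate):
    """Gridded totals from interpolated anomalies plus a background field.

    stations:    list of (lat, lon, observed_total, climatological_normal)
    grid_points: list of (lat, lon)
    background:  gridded climatology, one value per grid point
    interpolate: callable(lat, lon, [(lat, lon, value), ...]) -> float
    """
    # Step 1: anomalies (observation minus normal) at the stations.
    anomalies = [(lat, lon, obs - normal) for lat, lon, obs, normal in stations]
    result = []
    for (lat, lon), clim in zip(grid_points, background):
        # Step 2: interpolate anomalies; step 3: superimpose on the background.
        total = clim + interpolate(lat, lon, anomalies)
        # Assumption: precipitation totals cannot be negative.
        result.append(max(0.0, total))
    return result


def mean_interpolator(lat, lon, points):
    """Trivial stand-in interpolator: mean of all station values."""
    return sum(value for _, _, value in points) / len(points)


if __name__ == "__main__":
    obs = [(50.0, 8.0, 120.0, 100.0), (51.0, 9.0, 90.0, 70.0)]
    print(anomaly_analysis(obs, [(50.5, 8.5)], [85.0], mean_interpolator))
```

In a data-sparse region the interpolated anomaly tends towards small values, so the analysis falls back on the climatological background instead of on distant absolute totals, which is the benefit referred to above.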

SPHEREMAP vs. Kriging interpolation
A statistical interpolation method according to Rubel and Hantel (2001), which basically constitutes ordinary block kriging (Krige, 1951), is also implemented at GPCC as an alternative method. Figure 10 shows the difference between the SPHEREMAP and kriging methods for the particularly challenging real-time monitoring product analysis at 1° spatial resolution. It should be noted that for this comparison the absolute values, not the anomalies, have been interpolated. Besides the difference field plotted in the upper right part and the monitoring product from both interpolations (lower row), the number of stations included in the analysis is shown in the upper left graph. The example demonstrates that deviations occur only in data-scarce areas where the minimum constraint of 4 stations per grid cell is not fulfilled. However, the deviations are only substantial in regions where station scarcity and high precipitation variability come together, i.e. the equatorial region of South America and Central Africa plus the southern rim of the Himalaya. Moreover, isolated stations with no neighbouring stations across long distances naturally produce differences (see Greenland or patches across northern Siberia). In these areas the kriging method features somewhat smoother patterns and is thus an alternative to SPHEREMAP worth considering. Therefore, the GPCC kriging method, an adaptation of the ordinary block kriging introduced by Rubel and Hantel (2001), is the method currently tested for the daily products under development at GPCC, where the issue of under-sampling and intermittency is more severe. Daily GPCC products are not part of this paper but will be discussed in a separate paper upon their first issuance.
On the other hand, many station-scarce areas also exhibit very small deviations, which gives confidence in the quality of the SPHEREMAP interpolation, applied in particular for the non-real-time GPCC full data reanalysis and the GPCC climatology, which can utilize a data coverage that is overall five and eight times better, respectively (Fig. 11). Comparison of Fig. 11b with Fig. 10b demonstrates the diminishing effect of the interpolation method with increasing data density and sampling.

Calculation of area-mean precipitation for the high resolution mother grid (0.25° or 0.5° lat./long.)
For the GPCC products based also on historic data, the first area-average precipitation is calculated as the arithmetic mean of the interpolated data from (up to) four grid points representing the corners of a 0.25° × 0.25° grid cell. In doing so, only those corners located across land are used, so the mean represents the land-surface precipitation. The 0.25° grid cell means only represent the land-surface proportion of the total grid cell area as derived from USGS GTOPO-30 (USGS, 2012) data projected on the same evaluation grid. For the real-time products the mother grid has 0.5° resolution instead of 0.25°, reflecting the weaker data coverage.
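The corner averaging can be sketched as follows. The array layout (values on the grid points, one land flag per grid point) and the function name `cell_means_from_corners` are assumptions made for illustration only.

```python
import numpy as np

def cell_means_from_corners(corner_values, corner_is_land):
    """Mean of the (up to four) land corners of each grid cell.

    corner_values  : (ny + 1, nx + 1) interpolated values at the grid points
    corner_is_land : boolean array of the same shape
    Returns an (ny, nx) array; cells without any land corner are NaN.
    """
    v = np.where(corner_is_land, corner_values, 0.0)  # ocean corners add nothing
    m = corner_is_land.astype(float)
    total = v[:-1, :-1] + v[:-1, 1:] + v[1:, :-1] + v[1:, 1:]
    count = m[:-1, :-1] + m[:-1, 1:] + m[1:, :-1] + m[1:, 1:]
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)
```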

Reduction of the gridded precipitation for coarser grids
To allow use of the precipitation totals for multiple purposes, the GPCC also calculates and publishes area-average precipitation on coarser grids (1.0° or 2.5°), derived from the 0.5° mother grid means. In doing so it is important to take into account both the convergence of the meridians at high latitudes and the relative land-surface proportion of the grid cells used. The following formula applies:

$$\mathrm{PAM} = \frac{\sum_{i}\sum_{j} \langle SP \rangle_{i,j}\, LP_{i,j}\, \cos\varphi_{j}}{\sum_{i}\sum_{j} LP_{i,j}\, \cos\varphi_{j}}$$

with
PAM: precipitation area mean
<SP>: mean scaled (gridded) precipitation
LP: relative land-surface proportion
ϕ: latitude
i, j: horizontal indices counting positive eastward and northward, respectively

Figure 12 illustrates the grid topology applied for the GPCC procedure. Any user calculating a global mean land-surface precipitation from a gridded data product (e.g. GPCC FD) needs to apply the same land-surface percentage correction. Otherwise, reproducing a global land-surface mean precipitation with standard tools such as the "fldmean" command of the Climate Data Operators (CDO, Schulzweida et al., 2011) can lead to substantial deviations (easily 20 mm).

ii. Stochastic sampling errors due to a sparse network density and/or uneven distribution of measurement sites (spatially heterogeneous data density). They mainly depend on the density of the gauge locations and the variability of the precipitation field according to the climatic/orographic conditions (WMO, 1985; Rudolf and Schneider, 2005; Rudolf et al., 1994).
iii. Residual errors, e.g. resulting from spatial and temporal discontinuities of precipitation measurements associated with changes of observational methods and differences of observational techniques used in different countries (homogeneity).
To address these problems the GPCC provides a gridded quantification for the following errors:
i. The systematic gauge-measuring error, as (a) a climatic or (b) a SYNOP-derived correction:
a. Climatic correction: the error has been estimated for long-term mean precipitation (Legates and Willmott, 1990) and is provided as a climatic mean correction factor for each calendar month. The error, and thus the required correction, is large in snow regions and in cold seasons.
b. SYNOP-derived correction: with the GPCC MP available for all months since January 2007, an on-event correction method for systematic gauge-measuring errors is also available at GPCC (Fuchs et al., 2001). This correction is usually smaller than the climatological correction; however, it is still a rough bias estimate based only on wind, weather, temperature and humidity data retrieved from synoptic observations of ca. 6000 stations available worldwide. These corrections have been calculated for the GTS-based Monitoring Product (Schneider et al., 2011a, b), publicly available for all months since January 2007.
ii. The sampling error of gridded monthly precipitation data has been quantified by GPCC for various regions of the world. Based on statistical experiments using data from very dense networks, the relative sampling error of gridded monthly precipitation is between ±7 % and ±40 % of the true area-mean if 5 rain gauges are used; with 10 stations the error can be expected to lie between ±5 % and ±20 % (Rudolf et al., 1994). The error range for a given number of stations represents the spatial variability of precipitation in the considered region. In the next Sect. 5.2 we provide a systematic assessment of the sampling error.
iii. The residual errors, mainly related to the data homogeneity issue, are addressed by construction of a special homogenized precipitation analysis (HOMPRA) data set that relies on a carefully chosen sub-set of stations featuring time series of particular length, completeness and temporal homogeneity. The method has been introduced by Beck et al. (2005) for the construction of the VASClimO data set (based on a sub-set of some 9300 stations), to be replaced soon by HOMPRA that will build on more than 16 [...] (H. Österle, personal communication, 2008, 2010) to remove stations with obvious jumps.
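The kind of statistical experiment behind the sampling-error estimate in item ii can be sketched as repeated random sub-sampling of a dense network. The sketch below is a simplification under stated assumptions: it compares plain arithmetic means rather than gridded analyses, and the function name `relative_sampling_error` and its parameters are hypothetical.

```python
import numpy as np

def relative_sampling_error(dense_values, n_gauges, n_trials=1000, seed=0):
    """Relative error of sub-sampled means against the dense-network mean.

    dense_values : 1-D array of monthly totals from a very dense network
    n_gauges     : number of gauges drawn (without replacement) per trial
    Returns an array of n_trials relative errors (fraction of the reference).
    """
    rng = np.random.default_rng(seed)
    reference = dense_values.mean()  # taken as the "true" area mean
    errors = np.empty(n_trials)
    for t in range(n_trials):
        subset = rng.choice(dense_values, size=n_gauges, replace=False)
        errors[t] = (subset.mean() - reference) / reference
    return errors
```

The spread of the returned errors, for instance summarized by percentiles of `np.abs(errors)`, can then be compared for 5 and 10 gauges and for regions with different precipitation variability.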

A systematic assessment of the sampling error
In order to perform a quantitative assessment of the sampling error of the GPCC products as a function of station density and of the gridding (interpolation) method applied, we introduce here two standard sampling error metrics, the mean square error (MSE) and the mean absolute error (MAE), as follows

$$\mathrm{MSE} = \frac{1}{n}\sum_{k=1}^{n} (y_k - o_k)^2, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{k=1}^{n} \left| y_k - o_k \right|$$

with o_k and y_k denoting the observed and the interpolated value at station k of the n stations in total. In the following these metrics have been utilized to calculate the sampling error of arbitrarily resampled data sets according to the Jackknife error approach (Miller, 1974) [...]

Since the introduction of the anomaly-based interpolation method in 2008, the monthly GPCC climatology product serves as the background field for the analysis and is thus of central importance for all products. Whenever the GPCC data base has grown substantially through successful acquisition and pre-processing of further historic data, including the quality assurance and control performed on the data as described in [...]
-For each of these stations a time series is constructed from the up to eight source-specific time series available.

-Depending on the lengths of the time series yielded, climatic normals are constructed for the reference periods 1951-2000, 1931-1960, 1951-1980, 1961-1990 and 1971-2000. It should be noted, however, that we still accept for each month missing data of up to 10 yr in total.
-If none of these periods is covered by the series examined, a climatic normal is still calculated for arbitrary periods still divided into a long and an arbitrary category.
-Subsequently the 12 monthly minima and maxima of each station's time series for the period 1901-2010 are calculated to be available for interactive sanity checks.
-These 24 station specific ( [...]
-In doing so, necessary corrections (e.g. relocation of a station) are fed back into the data base and the associated stations are reprocessed (which means re-examination from the first bullet of this list).
-Based on the corrected data, station-specific climatic normals are calculated for the reference periods possible (preferably 1951-2000).
-Subsequently these normals are loaded to the DB to make them available for gridding with SPHEREMAP, yielding the climatology product at 0.25° spatial resolution.
-Finally, reduction of the high-resolution gridded product to 0.5°, 1.0° and 2.5° grids is performed with the same methods as described in Sect. 4.5.

Note: GPCC's monthly precipitation analysis products described in the following section are based on anomalies from climatological normals. For the FD product only anomalies at the stations were utilized. If a station has no station-based climatological normal, the MP and FG products also use anomalies relative to the climatological value of the grid cell containing the station. The anomalies are spatially interpolated using the analysis method SPHEREMAP, and the gridded anomaly analyses are then superimposed on GPCC's corresponding background climatology.
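The calculation of a station's monthly normals for one reference period, including the tolerance of up to 10 missing years per calendar month mentioned in the list above, can be sketched as follows. The data layout and the function name `monthly_normals` are assumptions for illustration; the actual GPCC processing involves further checks not shown here.

```python
import numpy as np

def monthly_normals(years, monthly_totals, start, end, max_missing_years=10):
    """Twelve climatic normals for the reference period start..end.

    years          : 1-D integer array, one entry per row of monthly_totals
    monthly_totals : (n_years, 12) array, NaN marks a missing month
    A calendar month with more than max_missing_years missing values
    (including years absent from the record) yields NaN.
    """
    in_period = (years >= start) & (years <= end)
    block = monthly_totals[in_period]
    missing = np.isnan(block).sum(axis=0)
    missing = missing + ((end - start + 1) - block.shape[0])  # absent years
    counts = (~np.isnan(block)).sum(axis=0)
    sums = np.where(np.isnan(block), 0.0, block).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    return np.where(missing <= max_missing_years, means, np.nan)
```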

The GPCC products and their major sample applications and capabilities
Plenty of applications of the GPCC data products have been documented and published (Oldenborgh et al., 2012; Parker et al., 2012; Hennon et al., 2011; Rubel and Kottek, 2010; Yatagai et al., 2009, 2012; Dinku et al., 2008; Gruber and Levizzani, 2008; Kaspar and Cubasch, 2008; Wild et al., 2008; Kottek and Rubel, 2007; Rajeevan et al., 2005; Rudolf and Rubel, 2005). In order to address the wide spectrum of users the GPCC has designed four different gridded monthly precipitation products optimized for partly competing requirements related to the purpose of product use. We categorize the product requirements as follows:
-Timeliness to support watch functions such as drought monitoring
-Quality and high availability at reasonable timeliness to serve as reference in-situ data set for regularly issued satellite-based products
-Accuracy via high station density to provide for a minimized sampling error for water resources assessment and case studies
-Homogeneity of station time series to construct a product suitable for trend analysis

The GPCC first guess product (addressing timeliness)
This global gridded product of the monthly precipitation, provided on one lat-long grid of 1.0° resolution (Ziese et al., 2011), is based on interpolated precipitation anomalies from more than 6000 stations worldwide. Data sources are synoptic weather observation reports (SYNOP) received at DWD via the WMO GTS, and climatic mean (mainly 1951-2000, or other reference periods as described before) monthly precipitation totals extracted from GPCC's global normals collection. An automatic-only QC is applied to these data. Since August 2004, GPCC First Guess monthly precipitation analyses are available within 3 to 5 days after the end of an observation month. Figure 14 illustrates a typical drought monitoring application of the First Guess Product: monthly totals are accumulated for a certain period prior to the assessed date for a chosen region, here Portugal.
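The accumulation used in such a drought-monitoring view can be sketched in a few lines. The sketch assumes that regional monthly totals and the matching climatological normals are already available as aligned arrays; the function name `accumulated_precipitation` is hypothetical.

```python
import numpy as np

def accumulated_precipitation(monthly_totals, monthly_normals, n_months):
    """Accumulate the last n_months of totals and of the matching normals.

    Both inputs are 1-D arrays aligned in time (same months, same order).
    Returns (observed sum, expected sum, observed as percent of expected).
    """
    observed = float(np.sum(monthly_totals[-n_months:]))
    expected = float(np.sum(monthly_normals[-n_months:]))
    return observed, expected, 100.0 * observed / expected
```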

The GPCC monitoring product (addressing quality and timeliness)
This global gridded product of the monthly precipitation (Schneider et al., 2011a, b) is based on SYNOP and monthly CLIMAT reports received in near real-time via GTS from ca. 7000-8000 stations (after high-level QC) and is available within two months after the observation month on two lat-long grids of 2.5° and 1.0° resolution. This is the GPCC product with the longest history: operational monthly analysis started in 1986 and has been updated every month since then. The analyses are based on automatic and intensive manual quality control of the input data. In general the GPCC MP is known as the best regularly issued, publicly available, in-situ and GTS-based monthly land-surface precipitation reference product.

Major sample application: calibration of satellite based data products
The GPCC Monitoring Product is the in situ component to the satellite-gauge combined precipitation analyses of GPCP (Huffman et al., 1995; Adler et al., 2003) and of CMAP (Xie and Arkin, 1997). Figure 15 shows an example visualization of the GPCP satellite-gauge product in terms of the anomaly against a GPCP 1961-1990 climatology for the El Niño (top plot) and La Niña (bottom plot) controlled southern hemispheric years ending in June 1998 and 2000, respectively. Across the land-surfaces each product relied on the twelve GPCC monthly monitoring products.

Auxiliary sample application: early annual reporting and monitoring
The gridded product is also utilized for the annual WMO statement on the status of the global climate (WMO, 2011b) and the BAMS Annual State of Climate (Parker et al., 2012; Hennon et al., 2011). Early assessments of larger-scale extreme events like the Pakistan flooding in 2010 or the Thailand flooding in 2011 (Oldenborgh et al., 2012) also rely on this high-availability and high-quality product.

The GPCC full data reanalysis (addressing accuracy)
This global gridded product of monthly precipitation (Schneider et al., 2011c-e) is based on near-real-time and non-real-time data. These are data from NMHS, regional and global data collections, CLIMAT bulletins and values calculated from SYNOP reports. It uses the same stations applied to calculate the GPCC Climatology product, i.e. more than 67 200 stations for Version 6. Grid resolutions are 0.5°, 1.0° and 2.5°. The QC is extended by an additional manual control. Upon substantial improvements of the data base a new version of this product is released, which happens approximately every 1-2 yr.

Sample application: verification of reanalysis products
Global reanalysis products like the ERA-Interim reanalysis (Dee et al., 2011) are becoming more and more popular for hindcasting the most recent decades of the global climate and for serving as geo-temporally homogeneous and contiguous reference data sets for the validation of climate prediction models. However, the quality of the precipitation data in these model reanalyses requires particular attention as precipitation is not a diagnostic but a prognostic parameter in the underlying global numerical weather prediction models utilized. Hence there is a need for purely observational analysis products of precipitation. It is the GPCC FD that proves to have a particular strength here (Simmons et al., 2010; Simmons, 2011). Its 110-yr coverage will also allow provision of reference information for the currently running ERA-CLIM effort (http://www.era-clim.eu/) to reanalyse the entire 20th century.

Sample application: analysis of historic global precipitation and the global hydrological cycle
The GPCC FD is well suited to provide reference information on the precipitation for certain periods and regions of interest, because it is optimized for station density and thus for a minimized sampling error of the interpolation. This also makes it most suitable for studying the global water cycle and Arctic precipitation (Mächel et al., 2012) and for deriving mean precipitation across regional and global river catchments.

An excellent sanity check for the centennial GPCC FD product is to diagnose regions that are sensitive to indices of natural large-scale variability of pressure patterns known to govern precipitation patterns, such as the North Atlantic Oscillation (NAO) index and the El Niño Southern Oscillation (ENSO) index (SOI). To do so we have correlated the monthly resolved temporal evolution of precipitation for every 0.5° grid cell of the GPCC FD product against the negative SOI (Fig. 16) and NAO (Fig. 17) [...] exposed. Even the narrow area along the Pacific coastline of Ecuador is well resolved. For the NAO-sensitive areas (Fig. 17) the typical north-south gradient from positive to negative correlation across Europe is clearly visible for the annual average and in particular during wintertime. Interestingly, this analysis also exposes other regions world-wide where the attribution to NAO is less straightforward. The strength of both analyses lies in the fact that they are based on a purely observational data set and span a 110-yr period, making the result accordingly reliable. GPCC will publish a thorough discussion of these regional sensitivities of precipitation in a separate paper.
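A grid-cell-wise correlation of this kind can be sketched with NumPy. The sketch assumes a (time, lat, lon) precipitation array and an index series of matching length, and it computes a plain Pearson correlation without any significance test; the function name `gridcell_correlation` is hypothetical. To correlate against the negative SOI, the index would be passed as `-soi`.

```python
import numpy as np

def gridcell_correlation(precip, index):
    """Pearson correlation of each grid cell's time series with an index.

    precip : (n_time, n_lat, n_lon) array of monthly precipitation
    index  : (n_time,) array, e.g. NAO index or negative SOI
    Returns an (n_lat, n_lon) map; cells with zero variance are NaN.
    """
    p_anom = precip - precip.mean(axis=0)
    i_anom = index - index.mean()
    cov = np.tensordot(i_anom, p_anom, axes=(0, 0))        # sum over time
    denom = np.sqrt((i_anom**2).sum() * (p_anom**2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(denom > 0, cov / denom, np.nan)
```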

Sample application: trend analysis despite limited data homogeneity
Due to the GPCC success in data acquisition, the data coverage (Figs. 1, 3, and 4) of the GPCC FD product no longer features a strong inhomogeneity at the end of 1986, in contrast to predecessor versions (e.g. Version 4, see Fig. 1). A robust trend analysis requires the scrutiny of a comprehensive station data homogenization, as currently carried out for the HOMPRA data set. Nevertheless, the GPCC FD product can also be applied for trend analysis if the method of Sen (1968) is utilized, which according to the interpretation of Huxol (2007) is rather robust against inhomogeneities. This is done while bearing in mind the inhomogeneity issue during interpretation of the results. A thorough trend analysis shall be provided in a separate paper; here we only show the 110- and 55-yr trends across the globe at 0.5° resolution (Fig. 18).
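Sen's (1968) slope estimator is the median of the slopes between all pairs of points in a time series, which is what makes it insensitive to individual outliers. A minimal sketch for a single time series follows; the function name `sen_slope` is hypothetical, and the sketch is not the GPCC implementation.

```python
import numpy as np

def sen_slope(t, y):
    """Median of all pairwise slopes (y_j - y_i) / (t_j - t_i), j > i.

    Pairs containing NaN values or identical times are skipped.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(t), k=1)        # all index pairs with i < j
    keep = ~np.isnan(y[i]) & ~np.isnan(y[j]) & (t[j] != t[i])
    slopes = (y[j][keep] - y[i][keep]) / (t[j][keep] - t[i][keep])
    return np.median(slopes)
```

Applied per grid cell to annual totals, with `t` in years, the result is a trend in precipitation units per year. The number of pairs grows quadratically with the length of the series, which is unproblematic for 110 annual values.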

The GPCC Homogenized Precipitation Analysis (HOMPRA; addressing homogeneity)
While the GPCC FD product involves all available stations with time series longer than 10 yr, this constraint is still not strong enough to warrant a data coverage that is stable across all times of multi-decadal or centennial studies of variability and trends of precipitation. And even if a longer time constraint is applied, the lack of homogeneity of long-term series of in-situ precipitation observations remains a challenge to be met by appropriate detection and, if possible, correction to ultimately allow for a robust trend analysis. Currently the GPCC is developing its new Homogenized Precipitation Analysis (HOMPRA) product, which is based on a limited data collective of little more than 16 350 stations that feature an above 90 % availability of data across the 55-yr period 1951-2005. For these stations an automated version of the homogenization tool PRODIGE (Caussinus and Mestre, 2004; Mestre, 2004) developed by Rustemeier et al. (2012) is applied. Unfortunately the evaluation could not be finalized before the submission of this paper, which was timed so that it would still be eligible for assessment by the authors of Chapter 2 of the WG-I part of the 5th assessment report of the IPCC. Therefore, the station-based trend analysis shown in Fig. 19 did not undergo the scrutiny and correction of Auto-PRODIGE. Notably, areas of positive trends match to a good extent those already identified by the evaluation of the non-homogenized GPCC FD product. In addition, the positive trends across Northern and Western Australia also identified by the VASClimO analysis appear again in the subset of the HOMPRA stations. Moreover, the cluster of stations with positive total trends for periods within 1951-2005 across the US (Fig. 19) covers a much bigger area compared to the GPCC FD based trend analysis (Fig. 18) for periods 1901-2011 and 1951-2011, respectively [...] and precipitation, leading to very local effects that depend on the elevation, surrounding orography and exposure of each station.
This can also induce inhomogeneities when stations are relocated. Only with completion of the HOMPRA data set can a robust gridded trend analysis updating the VASClimO data set be provided. For the time being the user is referred to the VASClimO data set also provided through the GPCC products download gate. The access to all GPCC products is specified in the following Sect. 8.
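The availability criterion used to pre-select the HOMPRA stations can be sketched as a simple mask. The sketch assumes one row per station covering exactly the months of 1951-2005, with NaN marking missing months; the function name `select_stations` is hypothetical, and the homogeneity testing itself is not part of the sketch.

```python
import numpy as np

def select_stations(monthly_data, min_availability=0.9):
    """Boolean mask of stations with data availability above the threshold.

    monthly_data : (n_stations, n_months) array, NaN marks a missing month
    """
    availability = 1.0 - np.isnan(monthly_data).mean(axis=1)
    return availability > min_availability
```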

Access methods
The different gridded monthly precipitation data sets of GPCC, as well as the GPCP aforementioned products, the "GPCC Visualizer" and current documentation of each product, for the users' convenience.

User advice
Whenever considering usage of GPCC gridded land-surface precipitation products:
-Check which product is most suitable for the application purpose with regard to the priority of timeliness, regional accuracy, or homogeneity.
-Pay attention to the accuracy-related information provided by the GPCC (number of stations per grid, systematic error). Check the error range by consideration of the systematic error estimates and the regional number of stations used.
-Do not compare regional area-means which are calculated from data sets on different grid resolutions. The rough approximation of coastlines may cause relevant deviations between 2.5° and 1.0° based area means. If you use standard software tools, please note that they have their own approaches to consider the land-surface percentage, thus yielding means that are potentially tool specific.

-When analysing long-term climate variability and changes do not combine different GPCC products available for different periods, which may cause discontinuities in time. Only a homogenized product like the HOMPRA product under development is fully adjusted to support long-term precipitation trend analyses.
-For periods where both the FD V6 and the MP V4 product are available, rather refer to the FD product, which is always based on more stations than the MP product. Only if you need to reproduce a GPCP product, reference to the MP product is meaningful.
-Reference to the GPCC through citations of this publication is requested from the users, and feedback about the application of the products to http://gpcc@dwd.de is very welcome.
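The advice on area means can be made concrete with a short sketch of a land-surface area mean that weights each grid cell by its relative land-surface proportion and by the cosine of its latitude. The array layout and the function name `land_area_mean` are assumptions for illustration; the sketch is not the GPCC code, and generic field-mean tools may weight cells differently.

```python
import numpy as np

def land_area_mean(precip, land_fraction, lat_deg):
    """Area mean weighted by land-surface proportion and cos(latitude).

    precip        : (n_lat, n_lon) gridded precipitation, NaN where undefined
    land_fraction : (n_lat, n_lon) relative land-surface proportion in [0, 1]
    lat_deg       : (n_lat,) latitudes of the grid-cell centres in degrees
    """
    weights = land_fraction * np.cos(np.deg2rad(lat_deg))[:, None]
    valid = ~np.isnan(precip) & (weights > 0)   # skip ocean and missing cells
    return np.sum(precip[valid] * weights[valid]) / np.sum(weights[valid])
```

Omitting the land-fraction weight gives coastal cells, which are only partly land, the same influence as fully continental cells, which is one way in which tool-specific means can differ.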

Conclusions
Reference information on the most recent versions of the four gridded monthly observational data sets on the global gauge-based land-surface precipitation constructed and published by the Global Precipitation Climatology Centre (GPCC) is provided. Each of the four data set products is optimized for a specific purpose where either best stability and representativeness (Climatology Version 2011), high accuracy (Full Data Reanalysis Version 6.0), high availability and reasonable timeliness (Monitoring Product Version 4.0), or high timeliness (First Guess Product) are the major requirements. For all products GPCC claims to serve the best possible observational gridded monthly land- [...] data base of quality controlled rain gauge data collected from eight different sources allowing also for cross-checking of redundant data gathered from multiple sources. With the Digital Object Identifiers (DOIs) issued for each product and spatial resolution of the gridded data sets we hope to have provided a repository of high quality gridded precipitation analysis across the past 110 yr from present back to year 1901, for the general public as well as for the scientific user community including the authors of the 5th assessment report of the IPCC. The data sets are accompanied by all essential information on their genesis and the corresponding ISO compliant metadata. All gridded products of the GPCC are provided through a download gateway under ftp://ftp.dwd.de/pub/data/gpcc/html/download gate.html hosted permanently by the Deutscher Wetterdienst.
Utilization examples of the GPCC products encompass case studies of specific events in the near or long term past, identification of ENSO and NAO sensitive precipitation regions and trend analysis across periods that can be chosen to be up to 110 yr long, starting from year 1901. Moreover, the GPCC suite of documented gridded products establishes a homogeneous reference data base for model validation and for cross-comparison and calibration with non in-situ based data sets.

Outlook
A fifth product, optimized for homogeneity, is in preparation; until then its predecessor VASClimO V1.1, used extensively for the 4th assessment report, remains a recommended reference and is also available through the aforementioned GPCC download gateway. The replacement product HOMPRA (Homogenized Precipitation Analysis) will be superior in terms of the number of supporting stations, which is effectively doubled (given the too high density of stations across central Europe for VASClimO), and the homogenization methodology utilized. The publication of HOMPRA is scheduled for year 2013. Since 2011 GPCC has also commenced analysis of daily precipitation within an effort to combine a daily version of the HOAPS-3 product (HOAPS-4) with a daily GPCC precipitation analysis. First prototype results are expected to become available in year 2013, but providing a purely observational gridded daily data product at a reasonable quality and reliability remains a challenge. A major prerequisite for future enhancements of daily products lies in the success in data acquisition, as the demand on station number and density is much higher for daily data.

Acknowledgements
We are grateful for the contribution of Hermann Österle, Potsdam Institute for Climate Impact Research (PIK), who has de facto served as beta tester of Versions 4 and 5 of the Full Data Product, while checking them for the existence of inhomogeneities. His work has substantially supported the quality control and improvement of the GPCC products.
We are also grateful to colleagues from Deutscher Wetterdienst, namely Hermann Mächel for sharing his expertise in data analysis and quality control, Peter Stender for administration of the data acquisition, and Tanja Winterrath for a thorough review of the manuscript.

Bussieres, N. and Hogg, W.: The objective analysis of daily rainfall by distance weighting schemes on a mesoscale grid, Atmos. Ocean, 27, 521-541, 1989.
Caussinus, H. and Mestre, O.: Detection and correction of artificial shifts in climate series, J.