SPREAD: a high-resolution daily gridded precipitation dataset for Spain – an extreme events frequency and intensity overview

A high-resolution daily gridded precipitation dataset was built from raw data of 12 858 observatories covering a period from 1950 to 2012 in peninsular Spain and 1971 to 2012 in the Balearic and Canary islands. The original data were quality-controlled and gaps were filled on each day and location independently. Using the serially complete dataset, a grid with a 5 × 5 km spatial resolution was constructed by estimating daily precipitation amounts and their corresponding uncertainty at each grid node. Daily precipitation estimations were compared to original observations to assess the quality of the gridded dataset. Four daily precipitation indices were computed to characterise the spatial distribution of daily precipitation and nine extreme precipitation indices were used to describe the frequency and intensity of extreme precipitation events. The Mediterranean coast and the Central Range showed the highest frequency and intensity of extreme events, while the number of wet days and dry and wet spells followed a north-west to south-east gradient in peninsular Spain, from high to low values in the number of wet days and wet spells and the reverse for dry spells. The use of the total available data in Spain, the independent estimation of precipitation for each day and the high spatial resolution of the grid allowed for a precise spatial and temporal assessment of daily precipitation that is difficult to achieve when using other methods, pre-selected long-term stations or global gridded datasets. The SPREAD dataset is publicly available at https://doi.org/10.20350/digitalCSIC/7393.


Published by Copernicus Publications.

Introduction
Daily precipitation is a key variable for understanding the behaviour of extreme weather events and the severe impacts they cause on hydrological systems, natural systems and human societies. These impacts can be considered in regional and local plans, which can help to mitigate major disasters if the correct environmental data are available. Unfortunately, raw climatic information is usually highly fragmented in time and discontinuous in space. Quality control and reconstruction processes are therefore in high demand, as are final products such as serially continuous observational series or gridded datasets. High-resolution spatiotemporal precipitation datasets are useful tools for land management, and their availability over a complete region can be useful to many other fields because of the relevance of precipitation in disciplines such as hydrology (for instance in water resources management at catchment scale) (e.g. Werner and Cannon, 2016; Kay et al., 2015; Lorenz et al., 2014; Maurer et al., 2002), environmental risks such as droughts (e.g. Touchan et al., 2011; Vicente-Serrano et al., 2010a; Bordi et al., 2009; Andreadis et al., 2005), floods (e.g. Hartmann and Andresky, 2013; Rojas et al., 2011) or wildfires (Westerling et al., 2003), groundwater recharge (e.g. Döll and Fiedler, 2008) or agricultural applications (e.g. Mishra and Cherkauer, 2010; Lo et al., 2007). With the development of these kinds of datasets, researchers can improve their studies not only in analytic climatology but also in environmental studies, habitat modelling, climate change model validation, relations between climate and vector-borne diseases, crop forecasting, agroclimatic selection of plant landraces, palaeoclimatic analysis supported by the observational period, etc. Gridded datasets provide valuable products that can be used for both scientific and decision-making policy purposes.
The number and density of climate observatories are finite. Therefore, there have been many contributions over the last decade to solve this problem by creating high-resolution and world-based gridded datasets based on the (geo)statistical relations of the observatories (New et al., 2002; Hijmans et al., 2005; Di Luzio et al., 2008; Harris et al., 2014; Schamm et al., 2014). However, these global datasets are often not optimal for regional analysis, where a higher resolution both in space and time is required (Herrera et al., 2012). Increasing temporal resolution from monthly or annual to daily scale allows analysis of other relevant components of the climate, such as the extreme precipitation events or the length and intensity of dry spells that precede most of the annual hydrological processes. In this respect, daily precipitation grids of different spatial resolutions have been developed at the global (Piper and Stewart, 1996; Menne et al., 2012) or regional scale (Frei and Schär, 1998; Eischeid et al., 2000; Kyriakidis et al., 2001; Rubel and Hantel, 2001; Klein-Tank et al., 2002; Hewitson and Crane, 2005; Liebmann and Allured, 2005; Haylock et al., 2008; Klok and Klein-Tank, 2009; Mekis and Vincent, 2011; Hwang et al., 2012; Yatagai et al., 2012; Jones et al., 2013; Chaney et al., 2014; Isotta et al., 2014; Hernández et al., 2016).
A few daily datasets have also been made for Spain or some of its regions (Vicente-Serrano et al., 2010b; Herrera et al., 2012). These datasets are useful to analyse global precipitation, which in Spain has a wide range of different spatial and temporal distributions (Esteban-Parra et al., 1998; Rodríguez-Puebla et al., 1998; Belo-Pereira et al., 2011; De Luis et al., 2011; González-Hidalgo et al., 2011; Cortesi et al., 2014). But they are also suitable to analyse spatial patterns in daily characteristics of precipitation through extreme indices that allow comparisons between regions. In Spain, these indices have been applied to mainland Spain (Martín-Vide, 2004; Herrera et al., 2012; Merino et al., 2015) and to specific regions (Casas et al., 2007; Martínez et al., 2007; Rodrigo and Trigo, 2007; López-Moreno et al., 2010). All of these datasets used the precipitation estimations to compute climatic indices. However, uncertainty in precipitation estimation is considerably higher than in temperature estimation. Firstly, raingauge measurements can be notoriously uncertain, especially in certain circumstances such as windy conditions (Rodda and Dixon, 2012) that can lead to significant undercatches in highly exposed areas (Rodda and Smith, 1986). Secondly, spatial variability of daily precipitation can be extremely high under certain atmospheric conditions such as convective processes that can occur at a very local scale. Whereas the sum of these uncertainties is very difficult to estimate in practice, the uncertainties of these estimations are expected to decrease as the density of observations used to compute the models increases (Tveito et al., 2008; Hofstra et al., 2010). This issue has important implications for subsequent climatic analyses and needs to be taken into account (Beguería et al., 2015).
In this paper, we present SPREAD (Spanish PREcipitation At Daily scale), a new high-resolution daily gridded precipitation dataset for Spain. Thirteen daily and extreme precipitation indices were calculated as an example of applicability, characterising daily precipitation distribution (4 indices to characterise the spatial distribution of daily precipitation and 9 for extreme precipitation). The grid was computed for the 1950-2012 period covering peninsular Spain and for 1971-2012 for the Balearic and Canary islands. It was built from 12 858 original stations by creating reference values (RVs) using generalised linear models (GLMs) based on the 10 nearest observations using altitude, longitude and latitude as covariates. All the calculations were developed with reddPrec (https://cran.r-project.org/web/packages/reddPrec), an R package containing the required functions to reconstruct original daily precipitation series and create grids (Serrano-Notivoli et al., 2017a).
This paper is organised as follows. Section 2 describes the original dataset. Section 3 shows the methods applied for the reconstruction and the gridding process considering uncertainties of estimates. Also, the statistical procedures are explained here. Section 4 shows the results of the reconstruction, gridding procedure and spatial distribution of daily and extreme precipitation characteristics in Spain. Section 5 specifies the availability of the dataset. The results are discussed in Sect. 6, with some final remarks in Sect. 7.
Most of the data were provided by the Spanish Meteorological Agency (AEMET), but we also used data from regional hydrological and meteorological services, and from the national agronomic network (Table 1). The greatest part of the information comes from manual stations; the automated ones entered service in the mid-1990s, making up 23 % of the total in the AEMET network in 2012. The mean length of the original series was 18.8 years, and only 17 of the 12 858 original observatories covered the whole period 1950-2012. However, the spatial distribution of the observatories showed a remarkable density (Fig. 1 main map), which is useful to make proper reconstructions. Although the information recovered from the raw databases of the meteorological offices was precipitation, in some cases this can include both rainfall and the water equivalent of snow if the source did not make the distinction.

The reference values (RVs) as a general tool for quality control and reconstruction
The key process for climate reconstruction is based on the calculation of individual RVs for each day and location, according to the information available from the closest neighbouring stations. GLMs are used to compute the RVs using the precipitation data (occurrence and magnitude) of the 10 nearest neighbours as the dependent variable and the geographic information of each station (latitude, longitude and altitude) as the independent variables. As the data availability varies from day to day, the selected neighbour stations also vary. Since independent models are constructed for each location and day, the estimated parameters of the models (reflecting the influence of the independent variables and their probability to influence the occurrence and magnitude of precipitation) may also vary with measurement day and location.
Including three factors in the model increases its sensitivity, so that it can reflect local changes in precipitation patterns. Because this method is based on local and independent data points across time, there are no restrictions imposed by the length or structural characteristics of the series, allowing efficient use of all of the available information.
The computation of each individual RV is based on two predicted values: (i) a binomial prediction (BP) of the probability of occurrence of a wet day and (ii) a magnitude prediction of precipitation (MP), in the case where a wet day is predicted. The combination of these two values (RV = MP if BP > 0.5, else RV = 0) produces the estimate (RV) and its corresponding standard error for each day and location. Further details on the statistical procedures are described and discussed in Serrano-Notivoli et al. (2017a, b).
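As an illustration only, the combination rule stated above can be sketched in Python. The function name `reference_value` and its `threshold` argument are assumptions made for this sketch and are not part of reddPrec; BP and MP are taken here as already computed by the two models, and the numbers are made up.

```python
def reference_value(bp: float, mp: float, threshold: float = 0.5) -> float:
    """Combine the binomial prediction (BP) and magnitude prediction (MP).

    Hypothetical helper: RV = MP if BP > threshold, else RV = 0.
    """
    if not 0.0 <= bp <= 1.0:
        raise ValueError("bp must be a probability between 0 and 1")
    return mp if bp > threshold else 0.0


# Three illustrative day/location pairs: (BP, MP in mm); values are made up.
predictions = [(0.92, 7.4), (0.50, 1.1), (0.08, 0.3)]
rvs = [reference_value(bp, mp) for bp, mp in predictions]
```

Note that the comparison is strict, so a day with BP exactly equal to 0.5 is treated as dry in this sketch.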
The obtained RVs were first used to develop a quality control (QC) test in order to detect anomalous data in the original dataset, and then to estimate precipitation at all missing locations and days for the whole dataset. Finally, a 5 × 5 km grid was built for the whole of Spain and the entire period of reconstructed stations. The reconstruction and gridding processes were applied using the R package reddPrec following the methodology described in Serrano-Notivoli et al. (2017a).
The QC process detected and removed suspect data by comparing daily values registered at each station with predicted values calculated from its surrounding observations. Five criteria were used to flag and remove suspect data from the original dataset: (1) suspect data: the observed value was over zero and all of its 10 nearest observations were zero; (2) suspect zero: the observed value was zero and all of its 10 nearest observations were over zero; (3) suspect outlier: the magnitude of the observed value was 10 times higher or lower than that predicted by its 10 nearest observations; (4) suspect dry day: the observed value was zero, the wet probability was over 99 %, and the predicted magnitude was over 5 mm; and (5) suspect wet day: the observed value was over 5 mm, the dry probability was over 99 %, and the predicted magnitude was under 0.1 mm.
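The five criteria listed above can be expressed compactly in code. The following Python sketch is a reading of the text, not the reddPrec implementation: the function name `qc_flags`, its arguments and the choice to return every matching label (rather than stopping at the first) are assumptions, as is skipping the outlier test when either the observation or the prediction is zero.

```python
from typing import List, Sequence


def qc_flags(obs: float, neighbours: Sequence[float],
             wet_prob: float, pred_mag: float) -> List[str]:
    """Return the labels of all QC criteria met by one daily observation.

    obs        observed precipitation (mm)
    neighbours observations at the 10 nearest stations (mm)
    wet_prob   predicted probability of a wet day (0 to 1)
    pred_mag   predicted precipitation magnitude (mm)
    """
    flags = []
    if obs > 0 and all(v == 0 for v in neighbours):
        flags.append("suspect data")
    if obs == 0 and all(v > 0 for v in neighbours):
        flags.append("suspect zero")
    # Assumption: the ratio test only makes sense when both values are > 0.
    if obs > 0 and pred_mag > 0 and (obs > 10 * pred_mag or obs < pred_mag / 10):
        flags.append("suspect outlier")
    if obs == 0 and wet_prob > 0.99 and pred_mag > 5:
        flags.append("suspect dry day")
    # A dry probability over 99 % is the same as a wet probability under 1 %.
    if obs > 5 and wet_prob < 0.01 and pred_mag < 0.1:
        flags.append("suspect wet day")
    return flags
```

For example, `qc_flags(3.0, [0.0] * 10, 0.0, 0.0)` returns `["suspect data"]`, while an observation of 2 mm surrounded by a mix of dry and wet neighbours with a predicted magnitude of 1.5 mm returns an empty list.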
Once the QC was completed, a new set of RVs was calculated using the curated dataset. Since the RVs were calculated for all days and locations, including those for which an observation exists but without using that observation, the comparison between RVs and the corresponding observed values constitutes a leave-one-out cross-validation (LOO-CV) process. A number of goodness of fit statistics were therefore used for assessing the quality of the estimated values (RVs): the mean absolute error (MAE), as a measure of the error magnitude; the mean error (ME) and the ratio of means (RM = mean of estimations / mean of observations), as a measure of bias; and the ratio of the standard deviations (RSD = SD of estimations / SD of observations), as a measure of bias in the variance. These statistics were computed for monthly aggregates and for 13 daily and extreme precipitation indices (described in Sect. 3.3).
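The four statistics defined above are simple to compute from paired observations and estimates. The sketch below uses only the Python standard library; the function name `fit_stats` is hypothetical, and the sign convention for ME (estimate minus observation) is an assumption, since the text does not state it.

```python
import statistics
from typing import Dict, Sequence


def fit_stats(obs: Sequence[float], est: Sequence[float]) -> Dict[str, float]:
    """Goodness-of-fit statistics for paired observed and estimated values."""
    if len(obs) != len(est) or len(obs) < 2:
        raise ValueError("obs and est must have the same length (>= 2)")
    n = len(obs)
    errors = [e - o for o, e in zip(obs, est)]  # assumed sign: est - obs
    return {
        "MAE": sum(abs(d) for d in errors) / n,  # error magnitude
        "ME": sum(errors) / n,                   # bias (additive)
        "RM": statistics.mean(est) / statistics.mean(obs),    # bias (ratio)
        "RSD": statistics.stdev(est) / statistics.stdev(obs),  # variance bias
    }


# Made-up example values (mm).
stats = fit_stats(obs=[0, 2, 4, 6], est=[1, 2, 3, 6])
```

With these made-up values the estimates are unbiased in the mean (ME = 0, RM = 1) but slightly too smooth (RSD below 1), which is the kind of pattern RSD is meant to reveal.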
Additionally, scatterplots were made between observations and estimations of daily precipitation at the station locations using the number of zero precipitation days (dry days), daily means (considering all days), medians of the wet days and the 95th percentile of wet days, and the Pearson's correlation statistic was computed in each case. Finally, the missing values in the original data series were filled with the reference values (RV). From 12 858 original observatories, we reconstructed those that had more than 10 years of original data (7604 stations). This guaranteed a reliable reconstructed series with enough observations to compute the grid.
Earth Syst. Sci. Data, 9, 721-738, 2017, www.earth-syst-sci-data.net/9/721/2017/

Gridding and uncertainty
The same procedure based on the calculation of RVs was used to build a 5 × 5 km spatial resolution grid. For each point of the grid (x, y and z) and each day of the total period, the RV was computed based on the data of the 10 closest reconstructed stations.
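Selecting the closest stations for a grid node can be sketched as follows. This is a minimal illustration under stated assumptions: great-circle (haversine) distance on latitude and longitude is used here for convenience and may differ from the distance used in reddPrec, and the names `haversine_km`, `nearest_stations` and the optional `max_radius_km` cap are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_stations(point: Tuple[float, float],
                     stations: Dict[str, Tuple[float, float]],
                     k: int = 10,
                     max_radius_km: Optional[float] = None) -> List[str]:
    """Return the ids of the k stations closest to point, nearest first."""
    ranked = sorted(
        (haversine_km(point[0], point[1], lat, lon), sid)
        for sid, (lat, lon) in stations.items()
    )
    if max_radius_km is not None:
        ranked = [(d, sid) for d, sid in ranked if d <= max_radius_km]
    return [sid for _, sid in ranked[:k]]


# Made-up station coordinates (lat, lon) around a grid node at 40 N, 3 W.
demo = {"a": (40.1, -3.0), "b": (41.0, -3.0), "c": (40.5, -3.0)}
closest = nearest_stations((40.0, -3.0), demo, k=2)
```

In a daily workflow the `stations` mapping would contain only the stations with data on that day, so the selected neighbours can change from one day to the next.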
As a measure of uncertainty, we computed the standard error (in mm) for each RV. Using the ratio between this error and the RV we obtained the relative error (expressed in percentage) in each index and aggregation (monthly, annual and seasonal).
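The relative error described above is a plain ratio. A minimal Python sketch follows; the function name `relative_error_pct` is hypothetical, and returning `None` when the estimate is zero is an assumption made here to avoid a division by zero, not something the text specifies.

```python
from typing import Optional


def relative_error_pct(se_mm: float, rv_mm: float) -> Optional[float]:
    """Relative error (%) as the ratio of the standard error to the estimate."""
    if rv_mm == 0:
        return None  # assumption: undefined when the estimate is zero
    return 100.0 * se_mm / rv_mm


# Made-up values: a 6 mm estimate with a 1.5 mm standard error.
example = relative_error_pct(1.5, 6.0)
```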

Applications: daily mean and extreme precipitation indices
As examples of possible applications of the gridded dataset, four indices using daily precipitation were calculated to characterise the spatial distribution of daily precipitation, and nine more indices to characterise its extremes (Table 2). Most of these indices are included in the suite of extreme precipitation and temperature indices (Zhang et al., 2011) developed by the WMO Expert Team on Climate Change Detection and Indices (ETCCDI).
They have been applied in previous works to assess the distribution of extreme events in many areas (e.g. Donat et al., 2013, 2014; Keggenhof et al., 2014; Asadieh and Krakauer, 2015; Sanogo et al., 2015; Yin et al., 2015; Sigdel and Ma, 2016). All the indices were computed at the annual scale and the average annual values were calculated.
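To make the index definitions concrete, the sketch below computes several of them for one year of daily data in Python. It is an illustration under assumptions: Table 2 is not reproduced here, so the wet-day threshold (P > 0, adjustable through `wet_threshold`) and the strict "more than 10 mm" and "more than 20 mm" comparisons follow the wording of this paper and may differ from the ETCCDI reference definitions; the function names are hypothetical.

```python
import statistics
from typing import Dict, Iterable, Sequence


def longest_run(flags: Iterable[bool]) -> int:
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def annual_indices(daily: Sequence[float], wet_threshold: float = 0.0) -> Dict[str, float]:
    """A subset of the daily and extreme indices for one year of data (mm)."""
    is_wet = [p > wet_threshold for p in daily]
    wet = [p for p in daily if p > wet_threshold]
    # Maximum precipitation accumulated over any 5 consecutive days.
    if len(daily) >= 5:
        rx5 = max(sum(daily[i:i + 5]) for i in range(len(daily) - 4))
    else:
        rx5 = sum(daily)
    return {
        "NWD": len(wet),                                  # number of wet days
        "PMED": statistics.median(wet) if wet else 0.0,   # median on wet days
        "SDII": statistics.mean(wet) if wet else 0.0,     # mean on wet days
        "RX1": max(daily),                                # max 1-day amount
        "RX5": rx5,                                       # max 5-day amount
        "R10mm": sum(p > 10 for p in daily),              # days over 10 mm
        "R20mm": sum(p > 20 for p in daily),              # days over 20 mm
        "CDD": longest_run(not w for w in is_wet),        # longest dry spell
        "CWD": longest_run(is_wet),                       # longest wet spell
    }


# A made-up 10-day series standing in for a full year.
idx = annual_indices([0, 0, 12, 25, 3, 0, 0, 0, 1, 0])
```

Averaging each index over all years of the record gives the mean annual values that the maps describe.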

Reconstruction of the observational dataset and grids
The quality control process flagged and removed an annual average of 2.4 % of data in peninsular Spain, 1.7 % in the Balearic Islands and 1.8 % in the Canary Islands. There were no major differences in the number of removed data between years. A slight increase was observed in the first 20 years in peninsular Spain data (Fig. 2a), while from 1971 to the end of the period the number of removed data barely changed.
In the Balearic Islands the number of removed data was more variable (Fig. 2b). Suspect data and zeros were usually detected because they represented low precipitation values with estimates from 0 to 1 or 2 mm, so they were included in one of these two criteria. The low numbers of detected outliers and suspect dry and wet days were probably due to the configuration of the islands, which are small and have a high density of observatories even at high elevations. In the Canary Islands, although the dataset contained more data series than in the Balearic Islands, and the number of available daily stations was very variable, the quality control process removed a similar number of data over the collection period (Fig. 2c). This was also the area with the fewest suspect zeros of the three.
A complete 5 × 5 km spatial resolution grid was calculated based on the reconstructed station series. Daily precipitation was estimated from 1950 to 2012 in peninsular Spain and from 1971 to 2012 in the Balearic and Canary Islands. The standard error of the model used to compute the estimations was calculated as a measure of uncertainty for each day and grid point.

Observations-estimation comparison
Daily precipitation values were estimated at the same locations and days as the observed dataset for comparison purposes. This section shows the results of this comparison.

Wet/dry estimation
The number of observed zero precipitation days (dry days) in the entire study area was 57 761 815 and the number of estimated ones (only for corresponding days with observations) was 57 773 250 (a ratio of 0.9998), so it can be concluded that this method is not biased in the prediction of wet/dry days. The comparison between the original dataset and the corresponding estimates showed a high correlation in peninsular Spain, the Balearic Islands and the Canary Islands (Pearson correlation coefficients of 0.83, 0.85 and 0.73, respectively), with similar frequency distribution by station (see Fig. S1 in the Supplement). Terming the wet days as positive (observed P > 0) and the dry days as negative (observed P = 0), the true negative rate p (RV = 0 | P = 0) was over 94 % in all cases (Table 3), and the true positive rate or sensitivity p (RV > 0 | P > 0) was over 79 % except in the Canary Islands, where it decreased to 70 %. To a large extent, the false negatives p (RV = 0 | P > 0) and false positives p (RV > 0 | P = 0) were due to the prediction of precipitation on days with low amounts. In events with very low precipitation amounts, the day was likely to be estimated as dry, despite the fact that the station could register a minimum quantity of rain (usually under 1 or 2 mm). This causes only a slight difference in amounts, but it becomes more distinct in the dry/wet accuracy assessment.
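The rates discussed above follow directly from a 2 × 2 contingency table of observed and estimated wet/dry days. The Python sketch below shows the arithmetic on made-up values; the function name `wet_dry_rates` and the dictionary keys are hypothetical, and the dry-day ratio is taken as observed over estimated dry days, which matches the 0.9998 figure quoted in the text.

```python
from typing import Dict, Optional, Sequence


def wet_dry_rates(obs: Sequence[float], est: Sequence[float]) -> Dict[str, Optional[float]]:
    """True negative/positive rates with wet days (P > 0) as the positive class."""
    tp = sum(1 for o, e in zip(obs, est) if o > 0 and e > 0)
    tn = sum(1 for o, e in zip(obs, est) if o == 0 and e == 0)
    fp = sum(1 for o, e in zip(obs, est) if o == 0 and e > 0)
    fn = sum(1 for o, e in zip(obs, est) if o > 0 and e == 0)
    est_dry = tn + fn  # days estimated as dry
    obs_dry = tn + fp  # days observed as dry
    return {
        "TNR": tn / (tn + fp) if (tn + fp) else None,  # p(RV = 0 | P = 0)
        "TPR": tp / (tp + fn) if (tp + fn) else None,  # p(RV > 0 | P > 0)
        "dry_ratio": obs_dry / est_dry if est_dry else None,
    }


# Made-up daily values (mm) for eight station-days.
rates = wet_dry_rates(obs=[0, 0, 0, 1.2, 5, 0, 3, 0.2],
                      est=[0, 0, 0.4, 1.0, 4, 0, 0, 0])
```

In this toy case the two misses are a 3 mm and a 0.2 mm day; in the dataset itself, the text attributes most misclassifications to days with very small amounts.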

Magnitude estimation
The comparison between amounts of observed and estimated precipitation showed a high correlation in daily means (daily mean precipitation by stations, considering the whole series), in daily medians on wet days (only considering days with P > 0 in observations and estimations) and in the 95th percentile of wet days (Fig. 3). Daily precipitation means (considering dry and wet days) reached the maximum correlation between observations and estimations (Fig. 3a, d, g), decreasing in daily precipitation medians on wet days (P > 0) (Fig. 3b, e, h) and when considering only the daily precipitation over the 95th percentile on wet days (Fig. 3c, f and i).
However, in all cases the values of the Pearson correlation coefficient were over 0.93. In the Canary Islands the goodness of fit was lower than in the other areas. The Canary Islands have high orographic and climatic variability, which, together with the lower availability of data, is thought to have contributed to this result. Most of these islands have their own climate, with large differences between their two sides (north-south in eastern islands and east-west in western islands). As the number of observatories on the Canary Islands was limited and the estimates were from a low density of stations, a greater radius was used to get the minimum number of stations required to run the model (maximum 100 km). This can lead to a more inaccurate estimation, although the aggregation by days instead of stations showed a better agreement, with correlations over 0.96 in most of the cases (see Fig. S2).
The histograms of estimated and observed precipitation (Fig. 4) showed a good general agreement. There was a slight overestimation of the values below 1 mm, and a slightly flatter distribution around the mean. The agreement between the histograms was high above 10 to 20 mm. The differences found in the lowest values (under 0.1 mm) were due in large part to the fact that estimates below 0.1 mm were allowed, whereas this value was the minimum measurement in the observed dataset. These results were similar in the peninsular Spain, Balearic Islands and Canary Islands datasets.
One problem common to most observational datasets is the uneven distribution of the stations with respect to altitude. Overall, the high-elevation areas tend to be under-represented in the datasets due to a lower spatial density of stations. This could result in biases in any derived datasets. For instance, in Spain only around 2 % of the stations are above 1500 m a.s.l., which represents 4 % of the Spanish territory (Table 4). The RM showed that there were no substantial biases (values close to 1) up to 1500 m a.s.l., and only a slight overestimation of 6-7 % above this altitude in peninsular Spain. In the Balearic Islands there were under- and overestimations above 500 m a.s.l., but the number of stations is too low to consider these values representative. In the Canary Islands precipitation was slightly underestimated (−8 %) at higher altitudes (> 2000 m a.s.l.). Assessing the monthly aggregates of daily precipitation (Table 5), we found ratios of means (RM) very close to 1, indicating the absence of systematic biases, with the exception of November and December, which showed a slight underestimation. The ratio of the standard deviations (RSD) was also very close to 1 in peninsular Spain and the Balearic Islands, indicating no biases in the variance estimation, although there was a small underestimation in November and December. The RSD was more variable in the Canary Islands, with an overestimation between 10 and 20 % in the summer (June to August), and an underestimation of around 10 % in November and December. Very low values of MAE were also found (average of 9.89, 8.98 and 5.99 in peninsular Spain, the Balearic and Canary islands, respectively), as well as of ME (average of −0.41, −0.29 and −0.73). Summer months usually receive low precipitation (most of them zero) and the events are typically of small spatial extent, leading to higher uncertainty of the estimations and thus to a poorer fit between observed and estimated precipitation. However, the RM was near to 1 in all cases, indicating the absence of bias in the estimations despite the variable uncertainty.
The development of a spatially and temporally complete gridded dataset allowed the assessment of the characteristics of daily precipitation over Spain. For that, 13 daily and extreme precipitation indices were computed. The comparison between the observed and the estimated values of the indices showed values of RM and RSD near to 1 in all the considered spatial units and indices (Table 6), indicating no substantial biases in the mean and variance of the estimated indices. However, RX1 showed a slight underestimation in peninsular Spain and the Balearic Islands, as did R20mm in peninsular Spain and SDII in the Canary Islands, where the number of wet days (NWD) was overestimated. Overall, all the indices were similar at observed stations and in their corresponding estimates; the Canary Islands showed the largest differences, as was also the case at the monthly scale (Table 5).

Spatial distribution and uncertainty in daily precipitation
The map of daily mean precipitation intensity (PMED, computed as the median precipitation on wet days) (Fig. 5a) showed three areas with maximum values (Central Range, Pyrenees and Baetic Range). Overall, PMED was higher (> 8 mm) in the south-western sector of peninsular Spain, in the north-west and in the Pyrenees. The rest of the territory reached values between 4 and 6 mm. The NWD showed a strong gradient (Fig. 5c) from the north-west, with more than 100 days of precipitation, to the south-east, where the lowest values were fewer than 30 days of precipitation per year. This value was the most frequent in the Canary Islands except on the island of La Palma (extreme west) and in the highest areas of the other islands (over 1500 m a.s.l.), where rainfall occurrence reached almost 90 days per year. The rest of peninsular Spain and the Balearic Islands showed average values between 50 and 70 days.
The mean length of the dry spells (CDDm) (Fig. 5e) showed a similar gradient. The northern sector reached values under 5 days, increasing towards the south-east, where the average length of consecutive dry days was more than 30 days, as in most of the Canary Islands.
NWD, CDDm and mean consecutive wet days (CWDm) characterised daily precipitation frequency, showing that in the south-east of peninsular Spain and the Canary Islands the average number of rainy days per year was fewer than 30 and, when rain occurred, the mean length of the events was less than 1.5 days. Conversely, the mean number of wet days in almost the entire Cantabrian fringe was over 120 days per year, and the mean number of consecutive dry days was under 5 days. The uncertainty of estimates in daily precipitation indices was spatially variable, but in all cases it had an increasing gradient from north-west to south-east in peninsular Spain, especially in PMED and NWD. This uncertainty, which was not necessarily similar to the distribution of its corresponding variable, indicates the reliability of the results of the indices. The higher values in most of the indices occurred in the south-east.

Spatial distribution and uncertainty in daily extreme precipitation
Mean precipitation on wet days (SDII) in Spain ranged between 5 and more than 25 mm (Fig. 6). The lowest values were distributed in the northern and southern plateaus and at the bottom of the Ebro Valley, unlike the south-east of peninsular Spain, which had higher values similar to the eastern coast, where the total rainfall was low but daily precipitation intensity was very high. The normal values in the Pyrenees were higher than 15 mm for each wet day, especially in the western half, where the Atlantic influence is strongest. Despite this, the highest values were in the Central Range, with more than 25 mm per wet day, making it the area with the most precipitation on wet days in the whole country.
The highest values of mean maximum precipitation in one day (RX1) (Fig. 6) were concentrated mainly in the highest areas, which create orographic barriers. The Central Range was the only zone in Spain that reached values higher than 200 mm, with values decreasing to 40 mm at lower elevations. This pattern was replicated in most of the high-elevation areas of peninsular Spain and the islands, but also along the Mediterranean coast, which is characterised by a high frequency of extreme events. The distribution of the maximum precipitation in five days (RX5) (Fig. 6) was very similar, but with a smoother gradient. As the variability in the extreme nature of the RX5 was less intense, the spatial distribution was more homogeneous with milder differences between the regions.
A high number of days per year with more than 10 mm of precipitation (R10mm, Fig. 6) is relatively frequent in Spain, especially under the Mediterranean climate that covers a large part of peninsular Spain and the Balearic Islands. Overall, the spatial distribution of this index was very similar to that of mean annual precipitation in Spain, with the highest values in the north-west and at high elevations. Considering events with precipitation over 20 mm (R20mm), the spatial pattern mimicked that of the R10mm.
The uncertainty distribution was very similar for SDII, RX1, and RX5, with higher values along the Mediterranean coast, where intense precipitation is more frequent and, consequently, the differences between neighbouring observatories were also higher. R10mm had a low and homogeneous uncertainty all over Spain and R20mm had very low values over central and southern peninsular Spain, in the Balearic Islands and in the eastern Canary Islands.
A clear latitudinal gradient over peninsular Spain was evident for the mean annual maximum consecutive dry days (CDD) index, with values exceeding 100 days in the south in contrast with fewer than 20 days in the north (Fig. 7, top). The maximum consecutive wet days (CWD) index (Fig. 7) had a strong longitudinal gradient, from fewer than 5 days in the east to more than 16 days in the west. The Balearic Islands showed a latitudinal gradient in both indices, with more CDD (> 60 days) and fewer CWD (< 5 days) in the south, coinciding with lower elevations. The Canary Islands showed similar behaviour on all individual islands, with the maximum values of CDD (> 110 days) and minimum values of CWD (< 5 days).
The 95th percentile of precipitation (R95) showed the maximum values at high-elevation areas and on the eastern and southern sides of the Mediterranean coast. The central region of peninsular Spain was more homogeneous, with lower values, coinciding with a more continental precipitation regime. The uncertainty values were very low here, which showed the reliability of the estimations of this index. The percentage contribution of precipitation over the 95th percentile to the annual precipitation total (R95rel) showed different patterns, more extreme in eastern peninsular Spain and the western Canary Islands. More than 30 % of the precipitation along the Mediterranean coast corresponds to events with amounts of precipitation over the 95th percentile. These values are also common in the Balearic Islands, where all areas had R95rel values over 20 %. This spatial distribution represents the extreme character of daily precipitation, especially in Mediterranean areas and in the Ebro Valley.

Discussion
High-resolution gridded datasets are useful for regional analysis of daily precipitation, but the accuracy of the estimates depends mainly on the number of available observatories and on the estimation method. Although the method used to build this grid makes independent calculations for each grid point and day, the results showed coherent patterns in the spatial distributions of all indices at regional scales. Some basic parameters of the reconstruction methodology have a key influence on the dissimilarity between different datasets. The selection of one specific method from the many gridding interpolation methods that can be applied to precipitation may change the final result (e.g. Creutin and Obled, 1982; Hartkamp et al., 1999; Vicente-Serrano et al., 2003; Dobesch et al., 2007; Hofstra et al., 2008; Hwang et al., 2012; Brunetti et al., 2014; Militino et al., 2015; Contractor et al., 2015; Herrera et al., 2016). For example, Robeson and Ensor (2006) and Ensor and Robeson (2008) argued that the use of geostatistical interpolators for daily precipitation leads to a higher frequency of low-precipitation values while greatly reducing the extreme events. In addition, the high flexibility in the independent variables across the sites allows for a reasonable estimation of the uncertainty, which is very important for producing datasets that will feed further analyses. Local regressions have been widely used to model daily precipitation with different approaches (Buishand and Tank, 1996; Rajagopalan and Lall, 1998; Marquínez et al., 2003; Simolo et al., 2010; Tardivo and Verti, 2014; Partal et al., 2015), and in this case we used them to compute, from all reconstructed stations, a high-resolution grid, estimating separately the probability of a wet/dry day occurrence and the precipitation amount. This two-step procedure avoids excessive smoothing of the estimated precipitation fields (Robeson and Ensor, 2006). Furthermore, this individualised calculation allows for an easy update of the dataset, since single days can be reconstructed individually and added to the pre-existing dataset. We also added a measure of uncertainty, which is a substantial improvement over previous gridded datasets. Uncertainty (which we express by means of the standard error) quantifies the reliability of the estimated data in a way that can be carried through to further calculations such as the daily precipitation indices explored in this article. Our uncertainty estimation arises from a local interpolation, so it varies spatially and from one day to the next, reflecting the changes in conditions that affect the estimates.
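The two-step idea (first decide whether the day is wet or dry, then estimate the amount, each with its own uncertainty) can be illustrated with a minimal sketch. It is not the local-regression method used to build SPREAD: inverse-distance weighting is swapped in here for brevity, and the function name `two_step_estimate`, the 0.5 wet-day cut-off and the spread-based standard error are all assumptions made for illustration only.

```python
import math


def two_step_estimate(target, stations, k=10, wet_cutoff=0.5):
    """Estimate one day's precipitation at `target` = (x, y).

    `stations` is a list of (x, y, precip_mm) tuples for that day.
    Returns (wet_probability, amount_mm, standard_error_mm).
    Inverse-distance weighting stands in for the local regressions
    described in the text (an assumption of this sketch).
    """
    # Keep the k nearest stations and weight them by inverse squared distance.
    nearest = sorted(stations, key=lambda s: math.dist(target, s[:2]))[:k]
    weights = [1.0 / (math.dist(target, s[:2]) ** 2 + 1e-9) for s in nearest]
    total_w = sum(weights)

    # Step 1: wet-day probability as the weighted share of wet neighbours.
    p_wet = sum(w for w, s in zip(weights, nearest) if s[2] > 0) / total_w

    # Step 2: weighted mean amount and a rough standard error from the
    # weighted spread of the neighbours around that mean.
    mean_amount = sum(w * s[2] for w, s in zip(weights, nearest)) / total_w
    variance = sum(
        w * (s[2] - mean_amount) ** 2 for w, s in zip(weights, nearest)
    ) / total_w
    std_error = math.sqrt(variance / len(nearest))

    # A day classified as dry is set to zero instead of a small smoothed amount.
    amount = mean_amount if p_wet >= wet_cutoff else 0.0
    return p_wet, amount, std_error
```

Because each call uses only the stations of a single day, a new day can be estimated and appended without recomputing earlier ones, which mirrors the update property described above.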
Some previous datasets provide daily and/or extreme precipitation indices for Spain, and the different methodologies used to compute them are a key cause of the differences among their results. Restricting the input to the longest precipitation series, and the spatial resolution chosen for the final grid, produce very different results. For instance, considering the global datasets, Sillmann et al. (2013) showed for Spain (eight grid points covering the whole of peninsular Spain) RX5 values between 50 and 200 mm using the HadEX2 dataset and between 40 and 75 mm using the CMIP5 dataset. In the present work, these values ranged between 50 and more than 300 mm over more than 20 000 grid points. Values of this magnitude correspond to the maxima that HadEX2 and CMIP5 show for monsoon areas in southern Asia. Similarly, May (2007), using the HIRHAM model, showed SDII values for Spain between 4.5 and 12.5 mm, while in the present work we found that this index can reach values over 25 mm, especially in the Central Range. Schamm et al. (2014) used the GPCC (Global Precipitation Climatology Centre) dataset to show values of daily mean precipitation intensity (PMED) between 0.5 and less than 10 mm, while we found a wider range between 3 and more than 15 mm. Some previous works in Spain that used a lower spatial resolution and a lower number of observatories (Herrera et al., 2012; Merino et al., 2015) showed overestimated values of extreme precipitation indices in some areas (especially in the north-west of the Iberian Peninsula) but smoothing in others (e.g. the Central Range). López-Moreno et al. (2010) showed values and spatial patterns in northeast Spain similar to those in the present work for NWD, SDII, CDD and CWD. Despite the relatively high-density station dataset (217 stations) used in their study, they rejected most of the original stations in favour of only the longest series, resulting in a smoothed spatial distribution of the indices, probably also due to the interpolation method (not indicated). Martínez et al. (2007), using 75 stations, showed for Catalonia (northeast peninsular Spain) 95th percentile precipitation values ranging from 20 to 70 mm, which are very similar to those from the SPREAD dataset.
All of these works made sub-optimal use of the available data. The values obtained in all of them were correct for a broad-scale view of precipitation distribution, but daily precipitation requires the highest possible density of observations in order to obtain a proper characterisation of its spatial distribution, especially for extreme precipitation. This work provides a representation of the local variability of extremes by using all the available information and applying a local reconstruction method. If the spatial resolution is increased for a regional study, based on the use of more precipitation data, the results are less smoothed, as shown by Pereira et al. (2016), who used 36 stations in Sierra Nevada (southern peninsular Spain) to compute NWD, R10mm and R20mm, obtaining values similar to those in this work and a precise spatial distribution. The use of the complete information of the precipitation network in Spain provided a more detailed precipitation distribution over time and space. Although only a few stations covered the complete period, the use of short data series helped to estimate the missing precipitation values in longer ones, which were used to build the whole grid. A high number of grid points (higher spatial resolution) in combination with a low-density station network could lead to higher uncertainties. This work aimed to reach a compromise between both factors by using a high number of stations and a medium-high spatial resolution. In addition, the magnitude of the uncertainty indicates the reliability of each estimate. A higher uncertainty indicates larger differences among the data used to estimate precipitation, and these differences can increase when fewer stations are available.
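To make the indices compared above concrete, the following sketch computes several of them from one daily series at a single grid point. It assumes the common ETCCDI-style definitions and a 1 mm wet-day threshold; the exact definitions used for SPREAD are those of Table 2 and may differ, and the names `longest_run` and `precipitation_indices` are hypothetical.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def precipitation_indices(daily_mm, wet_mm=1.0):
    """Indices for one period (e.g. one year) of daily totals in mm.

    `wet_mm` is the wet-day threshold; 1 mm is an assumption of this sketch.
    """
    wet = [p >= wet_mm for p in daily_mm]
    nwd = sum(wet)
    wet_total = sum(p for p in daily_mm if p >= wet_mm)
    # Largest total over any 5 consecutive days (whole series if shorter).
    rx5 = max(
        sum(daily_mm[i:i + 5]) for i in range(max(1, len(daily_mm) - 4))
    )
    return {
        "NWD": nwd,                                  # number of wet days
        "SDII": wet_total / nwd if nwd else 0.0,     # mean amount on wet days
        "RX1": max(daily_mm),                        # maximum 1-day amount
        "RX5": rx5,                                  # maximum 5-day amount
        "R10mm": sum(p >= 10 for p in daily_mm),     # days with >= 10 mm
        "R20mm": sum(p >= 20 for p in daily_mm),     # days with >= 20 mm
        "CDD": longest_run(not w for w in wet),      # longest dry spell
        "CWD": longest_run(wet),                     # longest wet spell
    }
```

Applied to every grid point, a routine of this kind shows why station density matters: RX1 and RX5 depend on single events, so any smoothing of the daily field lowers them directly.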

Data availability
The SPREAD dataset is freely available in the web repository of the Spanish National Research Council (CSIC). It can be accessed through https://doi.org/10.20350/digitalCSIC/7393, and cited as Serrano-Notivoli et al. (2016). The data are arranged in six files (daily precipitation estimations and their uncertainties for peninsular Spain, Balearic Islands and Canary Islands) in NetCDF format, which allows for easy processing in scientific analysis software (e.g. R, Python) and GIS (list of compatible software at http://www.unidata.ucar.edu/software).

Conclusions
A high-resolution daily precipitation dataset for Spain (SPREAD) is presented. Based on all the available daily precipitation information, a 5 × 5 km spatial resolution grid was built using the reddPrec R package (Serrano-Notivoli et al., 2017a). The original dataset of observations was quality-controlled and the missing values were estimated using the 10 surrounding stations for each day and location to obtain a serially complete dataset from 1950 to 2012 in peninsular Spain and from 1971 to 2012 in the Balearic and Canary islands. From this dataset, individual daily precipitation estimations were computed for each grid point, resulting in a gridded dataset which was subsequently used to compute four daily precipitation indices and nine extreme precipitation indices.
PMED showed the highest values in the Central Range and other elevated areas, while NWD, CDDm and CWDm followed a north-west to south-east gradient in peninsular Spain, from high to low values in NWD and CWDm and the reverse in CDDm. The south-east of the Iberian Peninsula and the Canary Islands were the driest areas in Spain, with fewer than 30 wet days per year and an average maximum annual dry spell length of more than 18 days. These regions registered a mean wet event duration of fewer than 2 days.
Extreme precipitation indices showed that the Mediterranean coast is more active in these kinds of events, but also that the highest values of SDII, RX1, RX5, R10mm, R20mm and R95 are concentrated in a north-south band of northwest peninsular Spain and, especially, in the Central Range. These results have revealed areas with maximum values not detected in former studies, emphasising the importance of using all available observatories and a sensible methodology that does not produce excessive smoothing while being able to capture local and day-to-day variability.

Figure 1. Location of the precipitation stations used (a), location of Spain within Europe (b) and geographical references used in the text (c). Number of daily available observatories (grey lines) and its 365-day moving average (black lines) in the Canary Islands (d), Balearic Islands (e) and peninsular Spain (f).

R. Serrano-Notivoli et al.: SPREAD: a high-resolution daily gridded precipitation dataset

Figure 2. Data removed by each criterion (suspect data, suspect zero, suspect outlier, suspect dry day and suspect wet day): (a) peninsular Spain, (b) Balearic Islands and (c) Canary Islands.

Figure 3. Scatterplots and Pearson correlation coefficients between observations and estimations of daily precipitation in peninsular Spain (upper row: a, b, c), Balearic Islands (middle row: d, e, f) and Canary Islands (bottom row: g, h, i). Dots represent the stations and colours indicate the density. Daily precipitation means (left column: a, d, g), daily precipitation medians in wet days (central column: b, e, h) and daily precipitation over the 95th percentile (right column: c, f, i) are shown.

Figure 4. Histograms of observed and predicted daily precipitation frequency in (a) peninsular Spain, (b) the Balearic Islands and (c) the Canary Islands.

Figure 5. Daily precipitation indices (a, c, e, g) and their uncertainty (b, d, f, h). PMED: daily mean precipitation intensity; NWD: number of wet days; CDDm: mean consecutive dry days; CWDm: mean consecutive wet days.

Table 2. Computed indices over the daily gridded dataset.

Table 3. Accuracy of the wet/dry day estimates: percent observed dry (P = 0) and wet (P > 0) days, and percent predicted dry (RV = 0) and wet (RV > 0) days on observed dry and wet days.

Table 4. The leave-one-out cross-validation (LOO-CV) statistics showing the goodness of fit between observations and estimations of daily precipitation separated by altitudes (m a.s.l.). IP: Iberian Peninsula; BI: Balearic Islands; CI: Canary Islands; N: number of stations; MAE: mean absolute error; ME: mean error; %OBS: percentage of observed precipitation; %PRE: percentage of predicted precipitation; RM: ratio of means; RSD: ratio of standard deviations. Results were rounded to 2 decimal places. Altitude columns: 0-100, > 100, > 300, > 500, > 700, > 900, > 1100, > 1300, > 1500.

Table 5. The leave-one-out cross-validation (LOO-CV) statistics showing the goodness of fit between observations and estimates of monthly aggregates. IP: Iberian Peninsula; BI: Balearic Islands; CI: Canary Islands; MAE: mean absolute error; ME: mean error; RM: ratio of means; RSD: ratio of standard deviations. Results were rounded to 2 decimal places.

Table 6. The leave-one-out cross-validation (LOO-CV) statistics showing the goodness of fit between observations and estimates of daily and extreme precipitation indices. IP: Iberian Peninsula; BI: Balearic Islands; CI: Canary Islands; MAE: mean absolute error; ME: mean error; RM: ratio of means; RSD: ratio of standard deviations. Results were rounded to 2 decimal places.
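The goodness-of-fit statistics named in the captions of Tables 4 to 6 can be sketched as follows for paired observed and estimated values. The sign and direction conventions are assumptions of this sketch (ME as estimate minus observation, RM and RSD as estimate over observation), and the function name `cv_statistics` is hypothetical; the values reported in the tables follow the authors' own conventions.

```python
import statistics


def cv_statistics(observed, predicted):
    """Goodness-of-fit statistics between paired observations and estimates.

    Assumed conventions: errors are predicted minus observed, and the
    ratios are predicted over observed.
    """
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    return {
        "MAE": sum(abs(e) for e in errors) / n,   # mean absolute error
        "ME": sum(errors) / n,                    # mean error (bias)
        "RM": statistics.mean(predicted) / statistics.mean(observed),
        "RSD": statistics.stdev(predicted) / statistics.stdev(observed),
    }
```

Under these conventions an RSD below 1 would indicate that the estimates vary less than the observations, which is one way to detect the smoothing discussed in the text.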