Lake surface water temperatures of European Alpine lakes (1989–2013) based on the Advanced Very High Resolution Radiometer (AVHRR) 1 km data set

Abstract. Lake water temperature (LWT) is an important driver of lake ecosystems and has been identified as an indicator of climate change. Consequently, the Global Climate Observing System (GCOS) lists LWT as an essential climate variable. Although long in situ time series of LWT exist for some European lakes, many lakes are not observed at all or only irregularly, which makes these observations insufficient for climate monitoring. Satellite data can provide the information needed. However, only a few satellite sensors offer time series covering 25 years or more. The Advanced Very High Resolution Radiometer (AVHRR) is among these and has been flown as a heritage instrument for almost 35 years. It will be carried on for at least ten more years, offering a unique opportunity for satellite-based climate studies. Herein we present a satellite-based lake surface water temperature (LSWT) data set for European water bodies in or near the Alps based on the extensive AVHRR 1 km data record (1989–2013) of the Remote Sensing Research Group at the University of Bern. It has been compiled from AVHRR/2 (NOAA-07, -09, -11, -14) and AVHRR/3 (NOAA-16, -17, -18, -19 and MetOp-A) data. The high accuracy needed for climate-related studies requires careful pre-processing and consideration of the atmospheric state. The LSWT retrieval is based on a simulation-based scheme making use of the Radiative Transfer for TOVS (RTTOV) Version 10 together with ERA-Interim reanalysis data from the European Centre for Medium-range Weather Forecasts. The resulting LSWTs were extensively compared with in situ measurements from lakes with sizes between 14 and 580 km², and the resulting biases and RMSEs were found to be within the range of −0.5 to 0.6 K and 1.0 to 1.6 K, respectively.
The upper limits of the reported errors can more likely be attributed to uncertainties in the comparison between in situ and satellite observations than to inaccuracies of the satellite retrieval. An inter-comparison with the standard Moderate-resolution Imaging Spectroradiometer (MODIS) Land Surface Temperature product exhibits RMSEs and biases in the range of 0.6 to 0.9 K and −0.5 to 0.2 K, respectively. The cross-platform consistency of the retrieval was found to be within ∼0.3 K. For one lake, the satellite-derived trend was compared with the trend of in situ measurements, and both were found to be similar. Thus, orbital drift does not cause artificial temperature trends in the data set. A comparison with LSWT derived through global sea surface temperature (SST) algorithms shows lower RMSEs and biases for the simulation-based approach. An ongoing project will apply the developed method to retrieve LSWT for all of Europe to derive the climate signal of the last 30 years. The data are available at doi:10.1594/PANGAEA.831007.


Introduction
The interest in lake surface water temperature (LSWT) is manifold. The temperature of lakes is an important parameter for lake ecosystems, influencing the dynamics of physicochemical reactions, the concentration of dissolved gases (e.g. oxygen), and vertical mixing (Delpla et al., 2009). Due to the high specific heat capacity of water, even small temperature changes may already have irreversible effects on the lacustrine system. All these effects ultimately influence the quality of lake water, depending on parameters such as lake size and volume (Delpla et al., 2009, and references therein).
Numerous studies (e.g. Adrian et al., 2009; Williamson et al., 2009) mention lake water temperature as an indicator of climate change, and the Global Climate Observing System (GCOS) implementation plan (GCOS-138, 2010) states that "observing the surface temperature of lakes […] can serve as an indicator for regional climate monitoring". Recent studies (e.g. Austin and Colman, 2007; Hook, 2009, 2010; Lenters et al., 2012) have shown that many lakes are warming more rapidly than the ambient air, and more work is needed to explain these differences. This warming trend also affects the onset of freezing and the duration of ice cover of many lakes, especially at northern latitudes and in mountainous regions (Jensen et al., 2007; Dibike et al., 2011).
Besides the climatic and ecological importance of water temperatures, LSWT is also of interest for modelling purposes, since sufficiently large water bodies influence mesoscale weather development and LSWT can be assimilated into regional numerical weather prediction models (Balsamo et al., 2012) to make regional forecasts more accurate.
In contrast to in situ observations, satellite imagery offers the possibility to derive spatial patterns of LSWT variability. Moreover, although long in situ time series exist for some European lakes (e.g. Livingstone and Dokulil, 2001; Livingstone, 2003), the temperatures of many lakes are not monitored at all or only irregularly, which makes these observations insufficient for climate monitoring. GCOS-154 (2011) further states that trial products of satellite-based LSWT would be desirable.
The Remote Sensing Research Group at the University of Bern (RSGB), Switzerland, is hosting a large data set from the Advanced Very High Resolution Radiometer (AVHRR), a heritage instrument which has now been flown for almost 35 years on the National Oceanic and Atmospheric Administration (NOAA) Polar Operational Environmental Satellites (POES) and on the Meteorological Operational Satellites (MetOp) from the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT). It will be carried on for at least ten more years, thus offering a unique opportunity for satellite-based climate studies.
Nowadays, several satellite-based LSWT data sets are available (e.g. Politi et al., 2012; MacCallum and Merchant, 2012; Schneider and Hook, 2010), but most of them cover only large lakes (with a surface area of > 500 km²). Oesch et al. (2005) successfully demonstrated that LSWT can also be retrieved for smaller lakes such as the majority of the European water bodies in or near the Alps. This data set, however, is only available for a limited time period and, more importantly, the technique applied was developed for the retrieval of sea surface temperatures (SSTs), which may lead to biases in the retrieved temperatures. More recent retrievals (e.g. MacCallum and Merchant, 2012; Hulley et al., 2011) are lake-specific, taking the lake altitude (i.e. thickness of the atmosphere) and local meteorological conditions into account.
The data set presented herein is based on a regionally optimised technique and covers lakes with sizes of 14 km² and larger for the period 1989–2013. The Radiative Transfer for TOVS (RTTOV) software package and European Centre for Medium-range Weather Forecasts (ECMWF) reanalysis data were used to improve the retrieved LSWT by correcting for atmospheric water vapour effects.
The following section specifies the lake locations and data used to derive the proposed data set. Section 3 explains the LSWT retrieval in more detail. In Sect. 4, we present a comparison of the satellite-based LSWTs with in situ measurements for various sample lakes and with another satellite product. Section 5 addresses the issue of compiling a time series from several satellites. The last section summarises the findings and gives a short outlook on future activities.

Data
This section lists the Central European lakes included in the proposed data set and the lakes for which in situ data are available for the comparison with the LSWT retrieval, describes the other satellite data used for an inter-satellite comparison, and provides a detailed description of the satellite data used for deriving LSWT.

Table 1. List of European lakes in or near the Alps included in the data set, showing their area, volume, and altitude according to Dokulil (2007), Bavarian Lakes, LfU Bayern (2014), Beiwl and Mühlemann (2008), and Swiss Lakes, BFS (2014). The numbers in the first column (ID) indicate the position of the lake in Fig. 1. The geographic coordinates indicate the position in the lake (centre of a 3 × 3 pixel matrix) for which the satellite data were extracted to create the proposed LSWT data set.

Lakes
The data set includes all major lakes located in or near the European Alps (25; cf. Fig. 1 and Table 1), with sizes from 14 km² (Lake Sempach) to 580 km² (Lake Geneva). Including lakes of various sizes, and thus with different morphological characteristics, within one region could, for instance, be useful for investigating whether these lakes react similarly to the changing climate. Global satellite-based LSWT data sets (e.g. Schneider and Hook, 2010), however, include only the two largest (> 500 km²) of them, whereas 18 of the 25 lakes presented herein have sizes between 14 and 100 km² and 5 of them cover areas between 100 and 370 km². Another global data set, from the ARC-Lake project (ATSR Reprocessing for Climate: Lake Surface Water Temperature & Ice Cover; MacCallum and Merchant, 2012), has recently been extended to include some of the larger lakes with sizes < 500 km². Artificial water bodies and lakes used for hydro-electric power generation are not included. In addition, Lake Woerth has been excluded from the data set, since AVHRR cannot properly resolve this narrow and elongated lake. According to the European Environment Agency (Stanners and Bourdeau, 1995), about 16 000 lakes in Europe are larger than 1 km², ∼2000 of them are larger than 10 km², ∼150 of them cover between 100 and 400 km² (excluding man-made reservoirs), and 24 of them cover areas > 400 km². With this study we want to demonstrate the potential of deriving LSWT for climatological studies from satellite data for the many lakes in Europe within the size range between approximately 15 km² (depending on the shape) and 500 km², which is the limit used in the study of Schneider and Hook (2010). Of course, such a data set can hardly be provided on a global scale (with ∼12 300 inland waters in the range of 10 to 100 km²; Reynolds, 2007), but it offers great potential for regional or even continental-scale climate analyses.

In situ data
The lakes in Switzerland are a representative sample of Central European lakes in terms of size, shape, altitude, and climatic conditions. Therefore, the LSWT retrieval was compared with in situ data only for Swiss lakes. Table 2 lists the lakes and locations with in situ data that were available for the comparison with the retrieved LSWT. In contrast to the lakes of the proposed data set (Table 1), for which the temperature has been extracted from the lake centre, the locations with in situ observations (Table 2) represent a heterogeneous data set in terms of both spatial and temporal sampling. Some sites are near-shore, whereas other measurements were taken in the centre of the lake. The sampling frequency ranges from hourly, through daily minimum and maximum, daily (one measurement per day), and weekly (one measurement per week), to monthly (one measurement per month). For the largest lakes (Geneva, Constance), several locations with hourly or daily bulk measurements (0.5–1 m depth) covering most of the period between 1989 and 2013 were usable. The other (smaller) lakes are usually profiled vertically once per month, except for Lake Zurich, which is profiled daily to biweekly. Although many of the lakes profiled once per month provide observations since the late 1980s or early 1990s, only a few coincident in situ and satellite observations were found.

Table 2. Summary of lakes and locations with in situ observations of water temperature used for the inter-comparison with the satellite retrieval. The two values for the size of Lake Constance indicate the area of the entire lake and the area of the subsection of Lake Überlingen. Abbreviations for the various locations are used to easily identify the chosen data set.

MODIS data
Comparing in situ measurements with satellite-based LSWT can give a first impression of the quality of a satellite retrieval; however, such a comparison includes several sources of uncertainty: (i) the different scale of the observations (point vs. areal measurement), (ii) the differing depth of the measurement (bulk vs. skin temperature), and (iii) differences in time and/or space (e.g. near-shore in situ observations). For these reasons, we also performed an inter-comparison between the standard Terra and Aqua Moderate Resolution Imaging Spectroradiometer (MODIS) Land Surface Temperature and Emissivity (LST/E) 5-Minute Level 2 Swath 1 km data set (MOD_L2 and MYD_L2, Version 5; Wan and Dozier, 1996) and the AVHRR-based LSWT data set proposed in this study for the years 2000 (2002 for Aqua) to 2011. MODIS has characteristics similar to those of AVHRR in terms of spatial and temporal resolution, scan angle, and swath width. Major differences are the more accurate calibration of MODIS and its larger number of spectral channels (36 vs. 5 or 6 for AVHRR), which can be used to discriminate between cloudy and clear-sky pixels.
To compare the two data sets, a 3 × 3 pixel matrix in the centre of each lake (cf. lake locations in Table 1) was extracted, and the average values of this area were compared for all concurrent overpasses with a maximum time difference of Δt = ±15 min. Since the MOD_L2/MYD_L2 data sets do not include a water mask, only the largest lakes were investigated, to avoid land pixels in the MODIS pixel matrix.
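The collocation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing code used for the data set: each sensor's lake samples are assumed to be reduced already to 3 × 3 box means (the hypothetical helper `box_mean`, which ignores masked pixels stored as NaN) and given as `(datetime, value)` pairs; the hypothetical `match_overpasses` then pairs each AVHRR sample with the nearest MODIS sample within ±15 min.

```python
from datetime import datetime, timedelta

import numpy as np

MAX_DT = timedelta(minutes=15)  # maximum allowed time difference between overpasses


def box_mean(image, row, col):
    """Mean of the 3 x 3 pixel box centred on (row, col); NaN marks masked pixels."""
    box = np.asarray(image, dtype=float)[row - 1:row + 2, col - 1:col + 2]
    return float(np.nanmean(box))


def match_overpasses(avhrr, modis, max_dt=MAX_DT):
    """Pair each AVHRR sample with the closest-in-time MODIS sample within max_dt.

    avhrr, modis: lists of (datetime, value) tuples, e.g. 3 x 3 box means.
    Returns a list of (avhrr_value, modis_value, time_difference_in_minutes).
    """
    pairs = []
    for t_a, v_a in avhrr:
        best = None
        for t_m, v_m in modis:
            dt = abs(t_m - t_a)
            if dt <= max_dt and (best is None or dt < best[0]):
                best = (dt, v_m)
        if best is not None:
            pairs.append((v_a, best[1], best[0].total_seconds() / 60.0))
    return pairs


# Example with made-up values (K): only the first AVHRR sample has a MODIS match.
avhrr = [(datetime(2005, 7, 1, 10, 0), 295.1), (datetime(2005, 7, 2, 10, 0), 294.0)]
modis = [(datetime(2005, 7, 1, 10, 10), 295.4), (datetime(2005, 7, 2, 11, 0), 293.5)]
print(match_overpasses(avhrr, modis))  # [(295.1, 295.4, 10.0)]
```

A brute-force search is adequate for the few overpasses per day over one lake; for long records, sorting both lists by time and scanning them together would avoid the quadratic loop.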

AVHRR data
AVHRR data from the NOAA and MetOp satellites acquired between 1989 and 2013 at full resolution (1.1 km × 1.1 km at nadir) have been used for this data set. NOAA-15 has not been considered, since its data have been found to be of lower quality than those from the other satellites (e.g. Cao et al., 2001; Wu et al., 2009). MetOp-B was launched only in September 2012 and has not been considered in the current data set. Figure 2 gives an overview of data availability per satellite and month for this period in the archive of the Remote Sensing Research Group at the University of Bern. The current version of the algorithm includes both daytime and night-time data. Up to NOAA-14, the satellites carried the five-channel AVHRR/2 sensor (0.6, 0.8, 3.7, 11.0, 12.0 µm); from NOAA-15 onwards, the six-channel AVHRR/3 (0.6, 0.8, 1.6, 3.7, 11.0, 12.0 µm) has been flown. For the lake surface water temperature retrieval, channel 4 ($T_4$; ∼11 µm) and channel 5 ($T_5$; ∼12 µm) are used in the split-window equation (cf. Sect. 3), whereas channel 1 ($R_{0.6}$; ∼0.6 µm), channel 2 ($R_{0.8}$; ∼0.8 µm), and channel 3 ($T_3$; ∼3.7 µm) provide additional information for the cloud mask and quality assessment. The pre-processing of the data, including calibration of the visible channels, geocoding, orthorectification, and cloud and cloud shadow detection, is described in more detail by Hüsler et al. (2011) and Khlopenkov et al. (2010). For the retrieval of LSWT, thermal calibration and stability are key issues, and the procedure to convert the data from raw sensor counts to brightness temperatures differs from the description in these publications. These differences and their effects are discussed in the following section.

Thermal calibration
In contrast to AVHRR channels 1 ($R_{0.6}$) and 2 ($R_{0.8}$), which use vicarious calibration (Yu and Wu, 2009) based on stable reflectance targets and inter-satellite comparisons (Heidinger et al., 2010), the thermal channels (3/3B, 4, and 5) are calibrated on board using two reference measurements, one of an internal calibration target and the other of deep space (Goodrum et al., 1999). This would be expected to yield a high-quality output signal. However, errors during signal transmission and reception, as well as solar contamination during the measuring cycle (at low sun elevation), can corrupt the sensor signal substantially (e.g. Trishchenko, 2002; Wu et al., 2009). The standard NOAA calibration technique (Goodrum et al., 1999) accounts only for minor fluctuations, by averaging the calibration information (the sensor signal when viewing deep space and the internal calibration target, and the measured temperature of the internal calibration target) over a few scan cycles. Thus, such signal corruptions partially or fully propagate into the final brightness temperatures and may lead to errors of up to a few Kelvin (Trishchenko, 2002).
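The two-point on-board calibration can be illustrated with a simplified linear scheme: the radiance of the internal calibration target (ICT) follows from its measured temperature via the Planck function, a gain is derived from the ICT and deep-space counts, and scene counts are converted to radiance and then back to brightness temperature. This is a sketch under stated assumptions, not the NOAA operational procedure, which additionally applies a non-linearity correction and channel-specific band-correction coefficients (Goodrum et al., 1999). Here the deep-space radiance is assumed to be zero, the channel-4 central wavenumber of about 927 cm⁻¹ is an assumed typical value, and all function names are hypothetical.

```python
import numpy as np

# Planck constants in the units commonly used for AVHRR thermal channels
C1 = 1.1910427e-5  # mW / (m^2 sr cm^-4)
C2 = 1.4387752     # cm K


def planck_radiance(wavenumber, temperature):
    """Blackbody radiance (mW m^-2 sr^-1 (cm^-1)^-1) at wavenumber (cm^-1) and temperature (K)."""
    return C1 * wavenumber**3 / np.expm1(C2 * wavenumber / temperature)


def inverse_planck(wavenumber, radiance):
    """Brightness temperature (K) for a given radiance at wavenumber (cm^-1)."""
    return C2 * wavenumber / np.log1p(C1 * wavenumber**3 / radiance)


def calibrate_scan(counts, c_space, c_ict, t_ict, wavenumber, r_space=0.0):
    """Linear two-point calibration of one scan line (simplified, hypothetical helper).

    c_space, c_ict: (averaged) counts when viewing deep space and the ICT
    t_ict: ICT temperature from its on-board thermometers (K)
    r_space: radiance assigned to deep space (assumed 0 here)
    """
    r_ict = planck_radiance(wavenumber, t_ict)
    # Radiance per count; its sign depends on the instrument's count convention
    gain = (r_ict - r_space) / (c_ict - c_space)
    radiance = r_space + gain * (np.asarray(counts, dtype=float) - c_space)
    return inverse_planck(wavenumber, radiance)


wn = 927.0  # assumed channel-4 central wavenumber (cm^-1)
bt = calibrate_scan([420.0, 400.0], c_space=990.0, c_ict=400.0, t_ict=288.0, wavenumber=wn)
# The second pixel has the ICT counts, so it maps back to the ICT temperature (288 K).
```

The sketch also shows why corrupted calibration samples matter: any error in `c_space`, `c_ict` or `t_ict` changes the gain and therefore shifts every brightness temperature of the scan line.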
Although no independent information is available to estimate signal errors, the calibration information provided during each scan cycle (the signals from deep space and from the internal calibration target, as well as the measured temperature of the target) can be analysed for consistency, since the nature of the calibration targets should lead to stable calibration signals that change only slowly in time (Trishchenko, 2002). Trishchenko (2002) therefore proposed a method to better control unwanted fluctuations during the calibration cycle using a multi-stage filtering technique that combines robust statistical methods with Fourier transform filtering. For more details, the reader is referred to the original publication.
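A much simplified two-stage filter in the spirit of this approach is sketched below. Spikes in a calibration time series (e.g. the per-scan ICT or deep-space counts) are first flagged with a robust median/MAD criterion and replaced by linear interpolation, and the cleaned series is then low-pass filtered by discarding high-frequency Fourier components. The threshold `k`, the retained fraction `keep_fraction`, and the use of a single global median are illustrative assumptions; this is not Trishchenko's (2002) exact algorithm, and a running (windowed) robust estimate would be more appropriate for long series with drift.

```python
import numpy as np


def robust_despike(signal, k=3.5):
    """Flag samples more than k robust standard deviations from the median and
    replace them by linear interpolation between the remaining good samples."""
    x = np.asarray(signal, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    sigma = 1.4826 * mad if mad > 0 else np.std(x)  # MAD scaled to a Gaussian SD
    if sigma > 0:
        bad = np.abs(x - med) > k * sigma
    else:
        bad = np.zeros(x.size, dtype=bool)
    y = x.copy()
    if bad.any() and (~bad).any():
        idx = np.arange(x.size)
        y[bad] = np.interp(idx[bad], idx[~bad], x[~bad])
    return y, bad


def fft_lowpass(signal, keep_fraction=0.05):
    """Keep only the lowest-frequency Fourier components (slow calibration drift)."""
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(x)
    n_keep = max(1, int(len(spectrum) * keep_fraction))
    spectrum[n_keep:] = 0.0
    return np.fft.irfft(spectrum, n=x.size)


def filter_calibration_series(signal, k=3.5, keep_fraction=0.05):
    """Two-stage filter: robust despiking followed by Fourier low-pass filtering."""
    despiked, bad = robust_despike(signal, k)
    return fft_lowpass(despiked, keep_fraction), bad


# Synthetic ICT counts with noise and one corrupted scan (made-up values)
rng = np.random.default_rng(1)
ict_counts = 400.0 + rng.normal(0.0, 0.5, 500)
ict_counts[120] += 40.0
filtered, flagged = filter_calibration_series(ict_counts)
```

Despiking before the Fourier step matters: a single large spike would otherwise leak into all frequencies and smear over the whole low-passed series.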
For the proposed data set, we implemented this technique and assessed its effect on the brightness temperatures of channels 4 ($T_4$) and 5 ($T_5$) and on the final LSWT retrieval. Figure 3 shows the differences in the resulting channel brightness temperatures for NOAA-16 between 2001 and 2004 by comparing the average scene brightness temperatures (from North Cape/Scandinavia to northern Africa) obtained with the standard NOAA calibration method (orig.) and with the adjusted calibration (filt.) after Trishchenko (2002). Periods with signal corruption occur frequently, and the two calibration techniques lead to temperature differences of several Kelvin. It should be noted, however, that the NOAA-16 data exhibit more of these spikes than those of other satellites, which might be related to the problems this satellite had with its scan motor (cf. NOAA Satellite and Information System, Office of Satellite Operations, NOAA-16 AVHRR Subsystem Summary, http://www.oso.noaa.gov/poesstatus/). Most of the AVHRR-carrying satellites used for this data set show intermittent periods with corrupted signals. Figure 4 demonstrates the effect of using either the original NOAA (orig.) or the adjusted (filt.) calibration method on the final LSWT for a sample period in January 2002. The observations highlighted with the orange circle were corrupted, and the adjusted method is capable of retaining a reliable signal. The proposed data set has been prepared with the adjusted calibration technique described above.
3 Lake water temperature retrieval

Split-window approach
The top-of-atmosphere spectral radiance $L_{\mathrm{toa}}$ measured by a satellite sensor can be formulated in a simplified way as (Anding and Kauth, 1970)

$$L_{\mathrm{toa}} = \varepsilon\, L_{\mathrm{sfc}}(T)\, \tau + L_{\mathrm{atm}}, \qquad (1)$$

where, for the sake of brevity, the wavelength λ is omitted, ε is the surface emissivity, $L_{\mathrm{sfc}}(T)$ is the emitted blackbody radiance at temperature T, τ is the transmittance of the atmosphere, and $L_{\mathrm{atm}}$ is the radiance emitted by the atmospheric constituents.
Water acts almost as a black body in the thermal infrared (TIR) region, with ε ∼ 0.99 at 11 µm (Masuda, 2006). If the atmosphere were totally transparent, LSWT could be measured directly from the water-leaving radiance using a single TIR measurement. Atmospheric trace gases (mainly water vapour), however, absorb and emit radiation and thus alter the water-leaving radiance, so that the sensor measures a combined atmosphere-surface signal; this is why an atmospheric correction is necessary. Anding and Kauth (1970) found that the difference between two neighbouring TIR channels (e.g. $T_4$ and $T_5$) is proportional to the correction needed, which is the basis for the so-called split-window technique.
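The proportionality found by Anding and Kauth (1970) can be illustrated with a textbook linearisation. This is a simplified sketch under stated assumptions, not the retrieval used for this data set: ε = 1; an optically thin atmosphere with transmittance $\tau_i \approx 1 - k_i W$ in channel $i$, where $W$ is the total column water vapour and $k_i$ an effective absorption coefficient; atmospheric emission $L_{\mathrm{atm}} \approx (1-\tau_i)\,B_i(T_a)$ with a single effective atmospheric temperature $T_a$; and brightness temperatures that vary linearly with radiance.

```latex
\begin{aligned}
T_i &\approx \tau_i\,T_s + (1-\tau_i)\,T_a, \qquad i \in \{4, 5\}\\
T_s - T_i &\approx (1-\tau_i)\,(T_s - T_a) \approx k_i\,W\,(T_s - T_a)\\
\frac{T_s - T_4}{T_s - T_5} &\approx \frac{k_4}{k_5}
\quad\Longrightarrow\quad
T_s \approx T_4 + \frac{k_4}{k_5 - k_4}\,(T_4 - T_5)
\end{aligned}
```

Because water vapour absorbs more strongly at 12 µm than at 11 µm ($k_5 > k_4$), $T_4 > T_5$ over water under typical conditions, and the correction added to $T_4$ is positive and proportional to $T_4 - T_5$. Real atmospheres violate these idealisations, which is why the coefficients are fitted in practice.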
Figure 4. […] (Goodrum et al., 1999) and adjusted (right, Trishchenko, 2002) calibration methods applied to NOAA-16 AVHRR data in January 2002. RT and NN stand for the RTTOV-10 and NOAA NESDIS based retrievals of lake surface water temperature (LSWT), respectively. Encircled in orange is a case for which the sensor signal was corrupted during the on-board calibration procedure.

Linear and non-linear correction approaches have been proposed in the past (e.g. Walton et al., 1998). Results from several studies (e.g. Li et al., 2001; Oesch et al., 2005) have shown that the linear version of the multi-channel split-window equation gives slightly better results (lower biases) than the non-linear equation of Walton et al. (1998). Thus, similar to Hulley et al. (2011), we use the linear multi-channel equation

$$\mathrm{LSWT} = a + b\,T_4 + c\,(T_4 - T_5) + d\,(T_4 - T_5)\left[\sec(\theta_v) - 1\right], \qquad (2)$$

where $T_4$ and $T_5$ are the brightness temperatures of AVHRR channels 4 (∼11 µm) and 5 (∼12 µm), $\sec(\theta_v)$ is the secant of the viewing angle $\theta_v$, and a to d are the split-window coefficients.
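Assuming the linear multi-channel equation takes the common form LSWT = a + b·T4 + c·(T4 − T5) + d·(T4 − T5)·(sec θv − 1), its evaluation for a single pixel or an array of pixels can be sketched as below. The function name and the example coefficients are hypothetical; in this data set the coefficients are derived daily, separately for day and night.

```python
import numpy as np


def split_window_lswt(t4, t5, view_angle_deg, coeffs):
    """Linear multi-channel split-window LSWT (assumed form of the equation).

    t4, t5: brightness temperatures of AVHRR channels 4 and 5 (K)
    view_angle_deg: satellite viewing angle theta_v (degrees)
    coeffs: (a, b, c, d) split-window coefficients
    """
    a, b, c, d = coeffs
    t4 = np.asarray(t4, dtype=float)
    t5 = np.asarray(t5, dtype=float)
    dt = t4 - t5                                                # water-vapour signal
    sec_term = 1.0 / np.cos(np.radians(view_angle_deg)) - 1.0  # zero at nadir
    return a + b * t4 + c * dt + d * dt * sec_term


# Hypothetical coefficients, for illustration only
coeffs = (0.0, 1.0, 2.0, 0.5)
print(split_window_lswt(290.0, 289.0, 0.0, coeffs))  # 292.0 at nadir
```

The (sec θv − 1) factor vanishes at nadir and grows with the viewing angle, so the last term adds extra correction for the longer slant path through the atmosphere.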
Coefficients a to d can be derived either by fitting satellite observations to in situ observations (in situ-based) or by applying radiative transfer (RT) codes to a set of representative LSWTs to create a database of simulated satellite observations and fitting the coefficients to these simulations (radiative transfer-based). Various studies (e.g. Oesch et al., 2005; Politi et al., 2012) have shown that LSWT over Europe can be derived with reasonable accuracy using global split-window approaches designed for SST retrievals, which are obtained by comparing in situ observations of ocean water temperature with satellite data. These methods, however, are tuned to the global atmospheric conditions over ocean surfaces, which may differ substantially from the continental conditions over some inland water bodies. Therefore, other studies (e.g. Hulley et al., 2011; MacCallum and Merchant, 2012) have developed more accurate LSWT retrievals that use radiative transfer codes and atmospheric data from numerical weather prediction (NWP) reanalyses and/or analyses to better reproduce the atmospheric conditions in such regions and to account for the lake-specific altitude. In addition, the latter methods have the advantage of being completely independent of in situ data and are therefore also applicable far away from in situ observations, whereas Politi et al. (2012), for instance, use in situ observations to adapt their retrievals to local effects.
To derive coefficients a to d of Eq. (2) for the proposed data set, we made use of a simulation-based data set for the European lakes in or near the Alps. For this, we used a representative set of LSWTs together with atmospheric profiles (21 pressure levels) of temperature and relative humidity, as well as the mean sea level pressure at lake height and the 10 m wind speed. These data are available from the ECMWF ERA-Interim reanalysis (Dee et al., 2011) and were fed into the fast Radiative Transfer for TOVS Version 10 (RTTOV-10; Saunders et al., 2012) to create a database of simulated satellite observations. For every cloud-free satellite overpass, the LSWT input into RTTOV-10 was set to the NWP 2 m temperature $T_{2\,\mathrm{m}}$ ± 10 K in increments of 5 K (i.e. five values per overpass). This was done for different regions north and south of the Alps. In addition, for each overpass we used eight different viewing angles $\theta_v$ between 0 and 60°. Changes in the emissivity of lake water due to varying view geometry or enhanced wind speed are considered in the RTTOV-10 simulations.
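The simulation set-up can be sketched as a simple enumeration of input cases: five LSWT values (T2m − 10 K to T2m + 10 K in 5 K steps) times eight viewing angles per cloud-free overpass. The text states only that the eight angles lie between 0 and 60°; even spacing is an assumption here, the function name is hypothetical, and the actual RTTOV-10 call is not shown.

```python
from itertools import product

import numpy as np

LSWT_OFFSETS_K = np.arange(-10.0, 10.0 + 1e-9, 5.0)  # T2m - 10 K ... T2m + 10 K in 5 K steps
VIEW_ANGLES_DEG = np.linspace(0.0, 60.0, 8)         # eight angles; even spacing is an assumption


def simulation_cases(t2m_per_overpass):
    """Enumerate (overpass index, LSWT input, viewing angle) for each cloud-free overpass.

    Each tuple would be combined with that overpass's ERA-Interim profile and passed
    to the radiative transfer model (not shown here).
    """
    cases = []
    for overpass_id, t2m in enumerate(t2m_per_overpass):
        for offset, angle in product(LSWT_OFFSETS_K, VIEW_ANGLES_DEG):
            cases.append((overpass_id, float(t2m + offset), float(angle)))
    return cases


cases = simulation_cases([280.0, 285.0])
print(len(cases))  # 80 = 2 overpasses x 5 LSWT values x 8 angles
```

Spreading the LSWT inputs around the local 2 m temperature keeps the simulated surface-air temperature contrasts within a realistic range for each overpass.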
Finally, we derived daily split-window coefficients for the period 1989–2013 by applying a robust multiple linear regression between the simulated satellite data and the LSWT, using ±180 days of simulations around each day to calculate that day's coefficients. We tested shorter and longer time windows but obtained the most accurate results (lowest bias) with this one. To account for differences in atmospheric stratification between daytime and night-time (especially close to the ground), we derived the coefficients for both periods of the day separately. The intrinsic error of the split-window equation using this ±180 d window is mostly in the range of 0.1 to 0.25 K, with a few cases of up to 0.4 K.
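The daily fitting step could look roughly like the sketch below: simulations within ±180 days of the target day and from the same part of the day (day or night) are selected, and the coefficients are estimated by a robust regression. The text does not specify the robust estimator; Huber-weighted iteratively reweighted least squares (IRLS) is assumed here, as is the split-window form a + b·T4 + c·(T4 − T5) + d·(T4 − T5)(sec θv − 1). All function and argument names are hypothetical.

```python
import numpy as np


def design_matrix(t4, t5, view_angle_deg):
    """Columns for the assumed form a + b*T4 + c*(T4 - T5) + d*(T4 - T5)*(sec(theta_v) - 1)."""
    dt = t4 - t5
    sec_term = 1.0 / np.cos(np.radians(view_angle_deg)) - 1.0
    return np.column_stack([np.ones_like(t4), t4, dt, dt * sec_term])


def huber_irls(X, y, delta=1.345, n_iter=50, tol=1e-8):
    """Robust fit of y ~ X @ beta by Huber-weighted iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
    for _ in range(n_iter):
        r = y - X @ beta
        scale = 1.4826 * np.median(np.abs(r - np.median(r)))  # robust residual scale (MAD)
        if scale == 0:
            break
        u = np.abs(r) / (delta * scale)
        w = 1.0 / np.maximum(u, 1.0)  # Huber weights: 1 for small residuals, down-weighted otherwise
        sw = np.sqrt(w)
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


def daily_coefficients(day, sim_day, t4, t5, angle, lswt, is_night, night, half_window=180):
    """Coefficients (a, b, c, d) for one day from simulations within +/- half_window days
    and from the same part of the day (night=True or False)."""
    sel = (np.abs(sim_day - day) <= half_window) & (is_night == night)
    X = design_matrix(t4[sel], t5[sel], angle[sel])
    return huber_irls(X, lswt[sel])
```

Calling `daily_coefficients` once per day and per `night` value yields the day- and night-time coefficient series; the Huber weighting limits the influence of simulations that fit the linear form poorly.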
In contrast to in situ-based split-window approaches, for which the temperature retrieved from a satellite instrument reflects the fitted bulk water temperatures ($T_{\mathrm{bulk}}$), an RT-based approach retrieves the water temperature of a layer close to the surface (within a few µm), the so-called skin temperature ($T_{\mathrm{skin}}$). Depending on the meteorological conditions (e.g. incoming solar radiation, sensible and latent heat flux, wind; Fairall et al., 1996a) and the depth of the bulk measurement, the temperature difference $\Delta T = T_{\mathrm{skin}} - T_{\mathrm{bulk}}$ can be up to a few Kelvin (e.g. Wilson et al., 2013). Several parameterisations exist to correct for this effect (e.g. Fairall et al., 1996b), but they require additional input data describing the meteorological conditions, which are most often not available […] where $\Delta T$ is the skin-to-bulk temperature difference and $U_{10}$ is the wind speed 10 m above ground taken from the ERA-Interim data set. Although this correction has been derived from ocean data and may not be appropriate for lakes under all circumstances, it reduces the bias between LSWT and in situ observations in the study region by ∼0.2 K. Wilson et al. (2013) found a different behaviour of the skin effect on the high-altitude Lake Tahoe than that reported in Minnett et al. (2011). The surroundings and valleys of the European Alps, however, experience moister conditions than Lake Tahoe.

Quality testing
After the retrieval of LSWT, several tests applied to the data ensure that the resulting temperatures are not contaminated by cloudy or land surface pixels. These tests use the information generated by the Cloud and Surface Parameter Retrieval (CASPR; Key, 2002) and the cloud shadow mask (Simpson and Stitt, 1998), as well as additional tests that have been introduced to enhance the quality of cloud and land detection over (small) inland water bodies. During daytime, water surfaces are generally characterised by low reflectance values in the visible ($R_{0.6}$) and near-infrared ($R_{0.8}$), with $R_{0.8} < R_{0.6}$ caused by the higher absorption of radiation at longer wavelengths, whereas over land surfaces chlorophyll absorption leads to $R_{0.6} < R_{0.8}$. This information can be used for a simple discrimination of land and water pixels during daytime. A threshold of $R_{0.8} < 0.08$ turned out to be appropriate for the study region and removed most of the previously undetected cloud pixels. We applied an additional test to identify mixed (land and water) pixels, making use of the ratio between the $R_{0.8}$ and $R_{0.6}$ channels (cf. Schwab et al., 1999). For cloud-free pixels fully covered with water, the $R_{0.8}/R_{0.6}$ ratio is typically less than unity. Schwab et al. (1999) applied a threshold of 0.75 to exclude cloudy pixels. In our study region this value turned out to be too strict: especially for small water bodies, the ratio under cloud-free conditions (verified by visual inspection of the data) was often found to be between 0.75 and 1.0. Therefore, we adjusted this threshold to 1.0, although this might cause some misclassification over large lakes. The land-water mask has been derived from a combination of a Moderate Resolution Imaging Spectroradiometer (MODIS) reference image and the Global Self-consistent, Hierarchical, High-resolution Shoreline Database (GSHHS; Wessel and Smith, 1996). Pixels not fully covered by water are masked out.
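The two daytime tests can be expressed as a simple pixel mask, sketched below under stated assumptions: reflectances are given as fractions (0 to 1), both conditions ($R_{0.8} < 0.08$ and $R_{0.8}/R_{0.6} < 1.0$) must hold for a pixel to be kept, and the static land-water mask is supplied as a boolean array. The function and constant names are hypothetical, and the CASPR and cloud-shadow tests are not reproduced.

```python
import numpy as np

R08_MAX = 0.08    # near-infrared reflectance threshold (daytime)
RATIO_MAX = 1.0   # R0.8 / R0.6 threshold adopted here (Schwab et al., 1999, used 0.75)


def daytime_water_mask(r06, r08, water_mask):
    """True for pixels that are inside the land-water mask and pass both daytime tests.

    r06, r08: reflectances of AVHRR channels 1 (0.6 um) and 2 (0.8 um), as fractions
    water_mask: True for pixels fully covered by water (e.g. from a static mask)
    """
    r06 = np.asarray(r06, dtype=float)
    r08 = np.asarray(r08, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(r06 > 0, r08 / r06, np.inf)  # undefined ratio fails the test
    return np.asarray(water_mask, dtype=bool) & (r08 < R08_MAX) & (ratio < RATIO_MAX)


# Made-up pixels: clear water, mixed land/water, bright (cloudy), outside the water mask
mask = daytime_water_mask([0.05, 0.05, 0.10, 0.04], [0.03, 0.06, 0.09, 0.05],
                          [True, True, True, False])
```

Only the first pixel passes: the second has a ratio above unity, the third exceeds the near-infrared threshold, and the fourth lies outside the land-water mask.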
Table 3. Statistical results of the scatter plots in Fig. 5, showing the comparison between in situ observations (OBS) and the regional LSWT retrieval based on RTTOV-10 (RT-lswt) for various in situ locations (cf. Table 2). Shown are the slope (k) and offset (d) of the linear regression equation; the coefficient of determination as the square of the correlation coefficient ($R^2$); the root-mean-square error (RMSE); the bias as the mean temperature difference and standard deviation ($\Delta T \pm \sigma_{\Delta T}$) between OBS and RT-lswt; and the number of coincident observations (N).

The LSWT retrieval is restricted to −5 °C ≤ LSWT ≤ 35 °C, which is a meaningful range for the investigated area; colder surfaces are either cloudy, frozen, or caused by sensor errors (Kilpatrick et al., 2001). The local standard deviation $\sigma_{3\times3}$ is calculated for each pixel if at least 2 out of 9 pixels are available in the 3 × 3 pixel matrix. The higher the value of $\sigma_{3\times3}$, the more likely a pixel is contaminated with clouds. Similar to Schneider and Hook (2010), we apply a threshold of $\sigma_{3\times3}$ ≤ 1.0 K. As highlighted in other studies (e.g. Oesch et al., 2005; Kilpatrick et al., 2001), increasing $\theta_v$ leads to erroneous retrievals due to the larger instantaneous field of view (IFOV; pixel size of ∼2.2 km along the scan direction at $\theta_v$ > 45°), which causes distortions towards the edges of the satellite imagery, increased errors in the split-window equation, and a longer atmospheric path length of the lake-leaving radiance. Therefore, retrievals with $\theta_v$ > 45° were discarded from further analysis.
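The range, local-variability, and viewing-angle tests can be combined into one quality mask, as sketched below. The sketch assumes LSWT fields in °C with NaN for missing pixels and uses the population standard deviation (ddof = 0) for $\sigma_{3\times3}$; the text does not specify this choice. Function and constant names are hypothetical.

```python
import warnings

import numpy as np

LSWT_MIN_C, LSWT_MAX_C = -5.0, 35.0  # plausible LSWT range (deg C)
MAX_SIGMA_K = 1.0                     # threshold on the local 3 x 3 standard deviation
MAX_VIEW_ANGLE_DEG = 45.0             # retrievals at larger viewing angles are discarded


def local_std_3x3(field, min_valid=2):
    """Standard deviation in the 3 x 3 neighbourhood of every pixel (NaN = missing).

    Returns NaN where fewer than min_valid of the 9 pixels are valid.
    """
    f = np.asarray(field, dtype=float)
    rows, cols = f.shape
    padded = np.pad(f, 1, constant_values=np.nan)
    # Stack the 9 shifted views so that axis 0 runs over the neighbourhood
    stack = np.stack([padded[i:i + rows, j:j + cols] for i in range(3) for j in range(3)])
    n_valid = np.sum(~np.isnan(stack), axis=0)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)  # all-NaN neighbourhoods
        sd = np.nanstd(stack, axis=0)
    return np.where(n_valid >= min_valid, sd, np.nan)


def quality_mask(lswt_c, view_angle_deg):
    """True where a pixel passes the range, local-variability, and viewing-angle tests."""
    lswt_c = np.asarray(lswt_c, dtype=float)
    sd = local_std_3x3(lswt_c)
    with np.errstate(invalid="ignore"):  # comparisons with NaN evaluate to False
        ok_range = (lswt_c >= LSWT_MIN_C) & (lswt_c <= LSWT_MAX_C)
        ok_sd = sd <= MAX_SIGMA_K
    ok_angle = np.asarray(view_angle_deg, dtype=float) <= MAX_VIEW_ANGLE_DEG
    return ok_range & ok_sd & ok_angle
```

A single warm or cold outlier (for example a partly cloudy pixel) raises $\sigma_{3\times3}$ for itself and its neighbours, so the whole affected neighbourhood is rejected rather than only the outlier.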
Specular reflection of sunlight (sun glint) over water surfaces leads to highly reflecting regions under particular observation and sun geometries. This effect is most pronounced in the visible and short-wave infrared region, whereas its influence in the spectral range of AVHRR channels 4 (11 µm) and 5 (12 µm) is almost negligible; in rare cases, sun glint might cause a temperature deviation of a few tenths of a Kelvin. We evaluated the effect by comparing in situ observations with satellite-based water temperatures with and without sun glint. Excluding the sun glint area lowers the root-mean-square error (RMSE) and bias by 0.1 to 0.2 K; however, excluding these pixels also reduces the number of usable LSWTs substantially (by > 50 %). For this reason, we decided to keep pixels affected by sun glint.

Figure 5. Scatter plots with the comparison between in situ observations and the regional LSWT retrieval based on RTTOV-10 (RT-lswt). The upper and middle panels show the comparison between individual satellites and in situ measurements, whereas the lower panel shows all satellites together, since only few in situ measurements were available for these lakes. Shown are the linear regression equation (dash-dotted), the 95 % confidence interval of the regression line, the coefficient of determination as the square of the correlation coefficient ($R^2$), the root-mean-square error (RMSE), the bias as the mean temperature difference and standard deviation ($\Delta T \pm \sigma_{\Delta T}$) between OBS and RT-lswt, and the number of coincident observations. In situ locations are indicated in the graph titles and are explained in Table 2.

Earth Syst. Sci. Data www.earth-syst-sci-data.net

In situ data
The proposed data set was extensively compared with in situ data from various lakes (cf. Table 2). Applying the optimised split-window approach based on RTTOV-10 (RT-lswt) reduces the bias of the retrieved LSWT compared to a global SST approach (e.g. Oesch et al., 2005). To demonstrate this effect, we also applied the split-window approach presented in Oesch et al. (2005), which is based on the global NOAA National Environmental Satellite, Data, and Information Service (NESDIS) SST product.
First, we focus on the inter-comparison between RT-lswt and in situ observations. Figure 5 and Table 3 show selected results for various satellites and locations obtained with RT-lswt. The upper row presents the scatter plots between NOAA-17 (AVHRR/3) LSWT retrievals and hourly in situ measurements from Lake Geneva (left, cf. EPFL in Table 2), Lake Constance at the location of Lake Überlingen (centre, KONS) and the Harbour of Bregenz (right, BDS1). The centre row shows the results for the same in situ locations, but for the data of NOAA-14 carrying the AVHRR/2 sensor. Overall, these plots demonstrate good agreement between in situ (OBS) and satellite (SAT) temperatures, with a coefficient of determination (R²) of 0.95 or higher. The bias, calculated here as the mean difference and standard deviation between OBS and SAT, ranges between 0.3 ± 1.1 K at BDS1 and −0.5 ± 1.1 K at EPFL for NOAA-17, and between −0.5 ± 1.2 K (EPFL) and 0.4 ± 1.1 K (BDS1) for NOAA-14. Negative biases mean that satellite-derived values are higher than in situ observations. EPFL and BDS1 represent retrievals for large lakes, whereas the station KONS is located in a fjord-like part of Lake Constance, called Lake Überlingen, which is merely 2 to 3 km wide and about 21 km long (∼ 60 km²). This demonstrates the potential of the AVHRR-based retrieval with the 1 km resolution data set: even for such a narrow water body, reasonable and accurate temperature retrievals are possible.
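The statistics reported for each match-up (R², RMSE, bias ± standard deviation, number of pairs) can be computed as in the following minimal sketch. It follows the sign convention stated above (bias = mean of OBS − SAT, so negative values mean the satellite is warmer); the use of the sample standard deviation is an assumption, since the text does not specify it.

```python
import math
from statistics import mean, stdev


def matchup_stats(obs, sat):
    """Statistics for coincident in situ (OBS) and satellite (SAT) temperatures in K.

    bias = mean(OBS - SAT); r2 = square of the Pearson correlation coefficient.
    """
    if len(obs) != len(sat) or len(obs) < 3:
        raise ValueError("need at least three matched pairs of equal length")
    diff = [o - s for o, s in zip(obs, sat)]
    mo, ms = mean(obs), mean(sat)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sat))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sat)
    r = cov / math.sqrt(var_o * var_s)
    return {
        "bias": mean(diff),
        "sd": stdev(diff),  # sample standard deviation of the differences
        "rmse": math.sqrt(mean([d * d for d in diff])),
        "r2": r * r,
        "n": len(diff),
    }
```

Note that a constant offset affects the bias and RMSE but not R², which is why a high R² alone does not prove an unbiased retrieval.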
Even smaller lakes or lakes with more complex topographic conditions can be used for LSWT retrieval, owing to precise geocoding and orthorectification. This is demonstrated with the results from Lake Sempach (SPS; 14 km²), Lake Murten (MRS; 23 km²) and Lake Thun (TNS; 48 km²). The scatter plots in the lower row show the comparison between in situ profiles taken once per month and satellite-derived temperatures of these three water bodies. As stated in Sect. 2, due to the low observation frequency, only few coincident data points were found; therefore, all satellites are combined into a single panel and the statistics were calculated using all data pairs. Again, values of R² > 0.94 indicate that the satellite retrievals are reasonable, although the limited number of coincident pairs limits the robustness of the statistics (wider confidence intervals than for hourly observations). The biases (−0.3 ± 1.3 K to 0.3 ± 1.6 K) are moderately higher than for larger lakes. We attribute part of the larger error to the fact that these comparisons are based on observations taken on the same day, even if the time difference was several hours. Although this introduces a larger uncertainty than the comparison with hourly data, it at least gives an indication of the performance of the LSWT retrieval. Restricting the comparison to a smaller time difference would have left almost no concurrent data points. The results from the narrow Lake Überlingen, which is similar in size and shape to these three small lakes (cf. KONS in Fig. 5), support the assumption that the larger error might be caused by the potentially larger time difference between the observations. To further assess the impact of such a time difference, we used the hourly resolved data set at KONS to calculate the spread between daily minimum and maximum temperature at 0.5 m depth.
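The daily spread at KONS described above can be derived from hourly records with a simple group-by-day operation. This is a minimal sketch, assuming the records are available as (timestamp, temperature) pairs; the completeness threshold `min_count` is a hypothetical parameter that prevents days with large gaps from yielding artificially small ranges.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, Tuple


def daily_ranges(records: Iterable[Tuple[datetime, float]],
                 min_count: int = 20) -> Dict[date, float]:
    """Daily temperature spread (max - min) from hourly records.

    records: (timestamp, temperature) pairs, e.g. hourly values at 0.5 m depth.
    Days with fewer than min_count values (hypothetical threshold) are skipped.
    """
    by_day = defaultdict(list)
    for timestamp, temperature in records:
        by_day[timestamp.date()].append(temperature)
    return {
        day: max(values) - min(values)
        for day, values in sorted(by_day.items())
        if len(values) >= min_count
    }
```

Comparing this spread with the time difference between a same-day in situ profile and a satellite overpass gives a rough upper bound on the mismatch that timing alone can introduce.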
The resulting range is typically between 1 and 3 K and can reach 4 to 5 K (cf. Fig. 6a), especially during summer months; it is mostly driven by the radiation budget and the meteorological conditions (Hook et al., 2003; Minnett et al., 2011). Periods with large amounts of incoming solar radiation and/or calm conditions cause large diurnal temperature variations, whereas on cloudy (low incoming solar radiation) or windy days the energy is redistributed more evenly in the uppermost layers, leading to a lower spread (Oesch et al., 2005). Therefore, the uncertainty introduced into the inter-comparison by using data with time differences of several hours is larger than for the hourly resolved data. In addition, the data pairs are biased toward summer observations, because periods with persistent low-level cloud cover are frequent during winter months. Comparing night-time data would generally lead to lower differences (e.g. Wilson et al., 2013); however, the monthly profiles were all taken during daytime. Not only differences in observation time introduce uncertainty into the analysis, but also physical differences between the measurement techniques. Whereas in situ measurements are often carried out at a depth of 0.5 to 1.0 m, satellite sensors observe a sub-micron (skin) layer at the water surface. Although we did not have in situ profiles from the water surface (skin layer) to deeper layers available to exactly quantify the resulting difference, some information about the potential impact can be seen from Fig. 6b. The curve depicts the instantaneous temperature differences between 0.5 and 0.9 m depth at KONS for the period 2004–2007, which frequently exceed 0.5 K, especially during summer months. The differences are largest for calm and cloud-free situations with high incoming solar radiation. Thus, we applied a skin-to-bulk correction (Minnett et al., 2011), as described in Sect. 3, which lowers the bias between in situ and satellite-based temperatures (∼ 0.2 K) by adjusting the satellite-based retrieval towards bulk temperatures. A third source of uncertainty arises from the fact that some of the in situ measurements used for the inter-comparison are taken close to or directly at the shore of the lakes. Although the corresponding satellite pixels were extracted as near as possible, the two locations might be a few kilometres apart, depending on the complexity of the shoreline. To estimate the impact of this uncertainty, several measurements in a specific region would be necessary; however, such a data set was not available for this study.

… the 95 % confidence interval of the regression line, the coefficient of determination as the square of the correlation coefficient (R²), the root-mean-square error (RMSE), the bias as the mean temperature difference and standard deviation (ΔT ± σΔT) between MODIS and RT-lswt, and the number of coincident observations (within ±15 minutes). The position of the lakes is shown in Table 1 and Fig. 1.

Table 4. Statistical results of the comparison between Terra/Aqua MODIS LST/E (MOD11_L2/MYD11_L2) and the regional AVHRR-based LSWT retrieval for various lakes (cf. Table 1). Shown are the slope (k) and offset (d) of the linear regression equation; the coefficient of determination as the square of the correlation coefficient (R²); the root-mean-square error (RMSE); the bias as the mean temperature difference and standard deviation (ΔT ± σΔT) between AVHRR and MODIS; and the number of coincident observations (N).

MODIS data
To overcome some of the uncertainties in the comparison between in situ observations and satellite retrieval, we also compared the proposed data set with the MOD11_L2 (Terra) and MYD11_L2 (Aqua) LST/E products. Figure 7 displays all concurrent (±15 min) and valid satellite observations, and Table 4 lists the corresponding statistics. Both retrievals are in good agreement, with RMSEs between 0.6 and 0.9 K and biases between −0.5 and 0.2 K. AVHRR-based LSWTs tend to be slightly warmer than MODIS-based temperatures, particularly at higher temperatures, with a slope between 1.0 and 1.06. However, the inter-comparison between in situ measurements and AVHRR does not indicate that AVHRR-based temperatures are generally too warm during summer.
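Selecting concurrent AVHRR and MODIS observations within ±15 min is a time-window match-up. The sketch below pairs each observation of one sensor with the closest-in-time observation of the other and discards pairs outside the tolerance. It assumes both series already refer to the same lake location; spatial co-location and quality screening are outside its scope.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def match_up(times_a: Sequence[datetime], values_a: Sequence[float],
             times_b: Sequence[datetime], values_b: Sequence[float],
             tolerance: timedelta = timedelta(minutes=15)) -> List[Tuple[datetime, float, float]]:
    """Pair each observation of sensor A with the closest-in-time observation of
    sensor B, keeping only pairs within +/- tolerance.

    times_b must be sorted in ascending order. A B observation may be reused
    for several A observations (no one-to-one constraint).
    """
    pairs = []
    for ta, va in zip(times_a, values_a):
        i = bisect_left(times_b, ta)
        best = None  # (time difference, index in B)
        # Only the neighbours just before and at/after ta can be closest.
        for j in (i - 1, i):
            if 0 <= j < len(times_b):
                dt = abs(times_b[j] - ta)
                if dt <= tolerance and (best is None or dt < best[0]):
                    best = (dt, j)
        if best is not None:
            pairs.append((ta, va, values_b[best[1]]))
    return pairs
```

Widening `tolerance` to a whole day reproduces the kind of same-day matching used for the monthly in situ profiles, at the cost of the larger timing uncertainty discussed earlier.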
The remaining differences between AVHRR and MODIS could be caused by a slightly different performance of the split-window algorithms (intrinsic error), differences in calibration accuracy, or a different performance of the automated cloud classification. Regarding the latter, we found by visual inspection several scenes in which clouds were not properly detected in the MODIS LST/E product during night-time; these scenes were excluded from the comparison. Likewise, the AVHRR cloud detection scheme also misclassified cloudy pixels as clear-sky in some cases; these scenes were not excluded from the comparison.

Multi-satellite time series
To generate a time series of LSWT for the period between 1989 and 2013, data from several AVHRRs flown on various NOAA and MetOp satellites are necessary. As a consequence, the stability of the retrieval (consistency of the resulting data) is crucial. In addition, the early satellites in particular experienced a strong orbital drift (Ignatov et al., 2004). For afternoon platforms like NOAA-11 and NOAA-14, this could lead to an artificial trend, as over the lifespan of the satellites the local time of observation tended to shift toward late afternoon or evening. Both effects are discussed in the following paragraphs.

Table 5. Statistical parameters for all satellites at Lake Constance (BDS1 and BDS2) and Lake Geneva (EPFL) using the regional LSWT retrieval based on RTTOV-10 (RT-lswt) and NOAA NESDIS (NN-lswt; Oesch et al., 2005). Shown are the coefficient of determination as the square of the correlation coefficient (R²); the root-mean-square error (RMSE); the bias as the mean temperature difference and standard deviation (ΔT ± σΔT) between OBS and LSWT; and the number of coincident observations. The comparison for NOAA-11 and -12 at EPFL is missing due to the lack of in situ data for that time.

To evaluate the general cross-platform stability of the retrieval, Table 5 gives an overview of the inter-comparison statistics for the regionally adapted retrieval (RT-lswt) for each satellite. In addition, the results for the global approach (based on NOAA NESDIS, NN-lswt; Oesch et al., 2005) are listed to enable a comparison between both methods. Although the results for Lake Constance at BDS1 and BDS2 are rather similar, the regional method RT-lswt generally outperforms the global approach NN-lswt, as indicated by lower biases and RMSEs (this also holds true for all other lakes and locations). The drop in RMSE at Lake Constance between the BDS1 and BDS2 comparisons can be attributed to the change from daily (BDS1) to hourly (BDS2) in situ data.
Considering the EPFL comparison for …, and MetOp-A, a period for which the time sampling and location of the in situ measurements have not changed, the LSWT retrieval is stable across the different satellites within ∼ 0.3 K (RMSE and bias). The same comparison for NN-lswt shows a stability within ∼ 0.6 K. One problem with the NOAA NESDIS approach is that the split-window coefficients have not been calculated in a consistent manner for all satellites, resulting in uncertainties in the final LSWTs.
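The cross-platform stability quoted above can be read as the spread (maximum minus minimum) of the per-satellite bias and RMSE at one station. A minimal sketch, assuming the match-ups are available as (satellite, OBS, SAT) tuples; the spread metric is an interpretation of "stable within ∼ 0.3 K", not a definition given in the text.

```python
import math
from collections import defaultdict


def per_satellite_stats(matchups):
    """matchups: iterable of (satellite, obs, sat) tuples in K.

    Returns {satellite: (bias, rmse)} with bias = mean(obs - sat).
    """
    diffs = defaultdict(list)
    for platform, obs, sat in matchups:
        diffs[platform].append(obs - sat)
    return {
        p: (sum(d) / len(d), math.sqrt(sum(x * x for x in d) / len(d)))
        for p, d in diffs.items()
    }


def spread(stats):
    """Max - min of bias and of RMSE across satellites."""
    biases = [b for b, _ in stats.values()]
    rmses = [r for _, r in stats.values()]
    return max(biases) - min(biases), max(rmses) - min(rmses)
```

Running the same functions on RT-lswt and NN-lswt match-ups for an unchanged in situ station allows the two retrievals to be compared on equal terms.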
Analysing only the scatter plots can mask the effect of orbital drift, since ideally only concurrent observations with the smallest time difference are compared with each other (except for daily or monthly data). Consequently, a time series is more useful to evaluate whether artificial warming or cooling trends can be detected in the satellite-retrieved surface water temperatures. For this reason, a monthly mean temperature time series covering the period from 1989 to 2009 was created from the BDS1 and BDS2 data and compared with the satellite time series for the same location and time period. Similar to Schneider and Hook (2010), we used robust locally weighted regression smoothing (LOWESS; Cleveland, 1979) to overcome sampling biases caused by cloud-related data gaps in the LSWT time series. Figure 8 displays the comparison of the monthly means at Lake Constance between in situ measurements (BDS1 and BDS2) and the LOWESS-filtered satellite time series. The agreement is very good, with R² = 0.99; the RMSE and bias are 0.9 K and 0.5 ± 0.7 K, respectively. We then aggregated the monthly means into seasonal means by averaging January–February–March (winter), April–May–June (spring), July–August–September (summer) and October–November–December (autumn), following the procedure of Schneider and Hook (2010). Finally, the trends were estimated by ordinary linear regression. Figure 9 shows the seasonal means and linear trends for Lake Constance; solid lines indicate the satellite-derived seasonal averages and linear trends, dash-dotted lines those derived from the in situ data. The differences between in situ and satellite trends are similar to those presented by Schneider and Hook (2010).
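The robust LOWESS smoothing mentioned above (Cleveland, 1979) can be sketched as follows: each point is re-estimated by a tricube-weighted local linear fit over its nearest neighbours, and bisquare robustness weights computed from the residuals then down-weight outliers in further passes. This is a plain-Python illustration; the smoothing fraction, the number of robustness iterations and the fallback when all neighbours are down-weighted are assumptions, and how the smoothed values were turned into monthly means is not reproduced here.

```python
import math
from statistics import median


def _tricube(u):
    # Tricube kernel for the neighbourhood (distance) weights.
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def _bisquare(u):
    # Bisquare function for the robustness weights.
    return (1.0 - u * u) ** 2 if u < 1.0 else 0.0


def _local_fit(x, y, i, r, robust):
    """Weighted linear fit around x[i]; returns the fitted value at x[i]."""
    dists = [abs(xj - x[i]) for xj in x]
    h = sorted(dists)[r - 1] or 1e-12  # distance to the r-th nearest neighbour
    w = [_tricube(d / h) * rw for d, rw in zip(dists, robust)]
    if sum(w) == 0.0:
        # Safeguard (assumption): all neighbours were flagged as outliers,
        # so fall back to the distance weights alone.
        w = [_tricube(d / h) for d in dists]
    sw = sum(w)
    xm = sum(wj * xj for wj, xj in zip(w, x)) / sw
    ym = sum(wj * yj for wj, yj in zip(w, y)) / sw
    sxx = sum(wj * (xj - xm) ** 2 for wj, xj in zip(w, x))
    if sxx <= 1e-12:
        return ym  # degenerate neighbourhood: weighted mean only
    slope = sum(wj * (xj - xm) * (yj - ym) for wj, xj, yj in zip(w, x, y)) / sxx
    return ym + slope * (x[i] - xm)


def lowess(x, y, frac=0.3, iterations=3):
    """Robust LOWESS smoothing of y(x); x may be irregular (e.g. decimal years)."""
    n = len(x)
    if n == 0:
        return []
    r = min(n, max(2, math.ceil(frac * n)))
    robust = [1.0] * n
    fitted = list(y)
    for it in range(iterations + 1):
        fitted = [_local_fit(x, y, i, r, robust) for i in range(n)]
        if it == iterations:
            break
        residuals = [yi - fi for yi, fi in zip(y, fitted)]
        s = median(abs(e) for e in residuals)
        if s <= 1e-12 * max(1.0, max(abs(v) for v in y)):
            break  # residuals are numerically zero: nothing to down-weight
        robust = [_bisquare(abs(e) / (6.0 * s)) for e in residuals]
    return fitted
```

The quadratic cost of this direct implementation is irrelevant for a few hundred observations per lake; for longer series, an established implementation (for example the one in statsmodels) would be preferable.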
The observed trends between +0.01 °C yr⁻¹ (summer) and +0.12 °C yr⁻¹ (spring) are slightly lower than those of Schneider and Hook (2010), but the presented time series is shorter, and the trends fit well into the range reported by Adrian et al. (2009), who found a temperature increase of 0.054 °C yr⁻¹ based on in situ measurements over the last 30 years. According to these results, the drifting orbits of the satellites do not affect the final LSWT data set.
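The seasonal aggregation (JFM, AMJ, JAS, OND of the same calendar year) and the ordinary least-squares trend described above can be sketched as follows. The input format, a mapping from (year, month) to a monthly mean, and the rule that only seasons with all three months are kept are assumptions made for illustration.

```python
from collections import defaultdict

# Seasons as defined in the text: JFM winter, AMJ spring, JAS summer, OND autumn.
SEASON_OF_MONTH = {1: "winter", 2: "winter", 3: "winter",
                   4: "spring", 5: "spring", 6: "spring",
                   7: "summer", 8: "summer", 9: "summer",
                   10: "autumn", 11: "autumn", 12: "autumn"}


def seasonal_means(monthly):
    """monthly: {(year, month): monthly mean temperature}.

    Returns {season: {year: mean}}, keeping only complete seasons (assumption).
    """
    buckets = defaultdict(list)
    for (year, month), value in monthly.items():
        buckets[(SEASON_OF_MONTH[month], year)].append(value)
    out = defaultdict(dict)
    for (season, year), values in buckets.items():
        if len(values) == 3:
            out[season][year] = sum(values) / 3.0
    return dict(out)


def linear_trend(series):
    """Ordinary least-squares slope (units per year) of a {year: value} series."""
    years = sorted(series)
    n = len(years)
    xm = sum(years) / n
    ym = sum(series[yr] for yr in years) / n
    sxx = sum((yr - xm) ** 2 for yr in years)
    sxy = sum((yr - xm) * (series[yr] - ym) for yr in years)
    return sxy / sxx
```

Because the slope is fitted per season, a missing season in one year simply drops that point from the regression rather than biasing the annual cycle.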

Summary and conclusions
The radiative transfer-based LSWT retrieval presented herein is a state-of-the-art method to derive lake water temperature from daytime and night-time AVHRR data independently of in situ measurements. Similar to other studies, we have shown that such an approach leads to more accurate LSWT retrievals than a method designed for global SST retrieval. The pre-processing of AVHRR data is an important first step: although the thermal channels of AVHRR feature on-board calibration, special treatment of the sensor signal is needed to guarantee a good quality of the observed brightness temperatures.
The inter-comparison with in situ observations shows biases in the range of −0.5 to 0.6 K and RMSEs of 1.0 to 1.6 K. Potential error sources are the intrinsic error of the split-window equation in use (0.1–0.4 K), uncertainties in the spatio-temporal match-up between satellite and in situ measurements, and undetected cloud pixels (especially thin cirrus clouds). Results for small (down to 14 km²) and medium-sized lakes are similar to those for large lakes like Lake Constance and Lake Geneva; this also highlights the need for precise geocoding and orthorectification of AVHRR data. A comparison with the Terra/Aqua MODIS LST/E product shows good agreement between both data sets, with RMSEs between 0.6 and 0.9 K and biases between −0.5 and 0.2 K.
Creating a time series from several satellites requires good cross-platform consistency. The stability of the LSWT retrieval was found to be on the order of ∼ 0.3 K. Moreover, orbital drift, which was especially pronounced for the early satellites, could potentially introduce artificial warming or cooling trends. However, with data from Lake Constance we were able to demonstrate that no artificial trend due to orbital drift is visible if daytime and night-time data are used together, and that the resulting trends are similar to those observed in other studies. Thus, the proposed data set could help to analyse the warming trends of lakes in or near the Alps over the past 25 years and how lakes with different morphological characteristics react to climate change.
The inter-comparison with in situ data demonstrated that AVHRR data not only provide spatially consistent information on LSWT, but also enable the extension of in situ time series back in time. Therefore, this data set can be seen as an important contribution to climate observations, especially since many lakes, for example in Switzerland, are only monitored on an irregular basis.
The current version of the data set is available for all major lakes in or near the European Alps with sizes between 14 and 580 km² for the period from 1989 to 2013. Further improvements to the data set will be its extension back in time (early 1980s) and spatially to the main water bodies of the whole of Europe, since AVHRR is the only sensor offering such long time series. The influence of volcanic aerosols on the retrieval will also be evaluated.