Journal cover Journal topic
Earth System Science Data The data publishing journal
Journal topic
Earth Syst. Sci. Data, 11, 493–514, 2019
https://doi.org/10.5194/essd-11-493-2019
Earth Syst. Sci. Data, 11, 493–514, 2019
https://doi.org/10.5194/essd-11-493-2019

16 Apr 2019

16 Apr 2019

# Theia Snow collection: high-resolution operational snow cover maps from Sentinel-2 and Landsat-8 data

Theia Snow collection: high-resolution operational snow cover maps from Sentinel-2 and Landsat-8 data
Simon Gascoin1, Manuel Grizonnet2, Marine Bouchet1, Germain Salgues2,3, and Olivier Hagolle1,2 Simon Gascoin et al.
• 1CESBIO, Université de Toulouse, CNRS/CNES/IRD/INRA/UPS, Toulouse, France
• 2CNES, Toulouse, France
• 3Magellium, Toulouse, France

Correspondence: Simon Gascoin (simon.gascoin@cesbio.cnes.fr)

Abstract

The Theia Snow collection routinely provides high-resolution maps of the snow-covered area from Sentinel-2 and Landsat-8 observations. The collection covers selected areas worldwide, including the main mountain regions in western Europe (e.g. Alps, Pyrenees) and the High Atlas in Morocco. Each product of the Theia Snow collection contains four classes: snow, no snow, cloud and no data. We present the algorithm to generate the snow products and provide an evaluation of the accuracy of Sentinel-2 snow products using in situ snow depth measurements, higher-resolution snow maps and visual control. The results suggest that the snow is accurately detected in the Theia snow collection and that the snow detection is more accurate than the Sen2Cor outputs (ESA level 2 product). An issue that should be addressed in a future release is the occurrence of false snow detection in some large clouds. The snow maps are currently produced and freely distributed on average 5 d after the image acquisition as raster and vector files via the Theia portal (https://doi.org/10.24400/329360/F7Q52MNK).

1 Introduction

The snow cover is an important driver of many ecological, climatic and hydrological processes in mountain regions and in high latitude areas. In these regions, in situ observations are generally insufficient to characterize the high spatial variability of the snowpack properties. The snow cover was included as one of the 50 essential climate variables (ECVs) to be monitored by satellite remote sensing by the Global Observing System for Climate (GCOS) in accordance with the Committee on Earth Observation Satellites (CEOS) agencies. ECVs are intended to support the work of the United Nations Framework Convention on Climate Change and the Intergovernmental Panel on Climate Change.

The snow cover is not a variable sensu stricto, but an object which can be characterized through many variables, including snow-covered area (SCA), fractional area (fSCA), albedo, liquid water content, snow depth and snow water equivalent . An international survey by the Cryoland consortium (http://cryoland.enveo.at/consortium, last access: 12 April 2019) shed light on the user requirements for snow services based on satellite remote sensing. SCA and fSCA products were ranked as the most important by the respondents . The survey also indicated the need for operational products provided on a daily basis, with latency times shorter than 12 h. High-resolution data down to 50 m resolution were sought by road and avalanches authorities . The respondents requested regional products, e.g. on the scale of entire mountain ranges like the Alps or even the whole of Europe, and preferred the Universal Transverse Mercator (UTM) projection . At the national level in France there is also a need for operational high-resolution snow-covered area maps as revealed by the recent roadmap for satellite applications issued by the French Government (Plan d'applications satellitaires 2018Commissariat général au développement durable2018). Based on a wide panel of end users, this plan selected the monitoring of the snow-covered area in French national parks as a one of the key actions which should be achieved using Earth observation satellites in the near future (2018–2022).

Operational SCA maps have been generated from satellite observations since the 1960s (Ramsay1998). Current SCA and fSCA products are mostly derived from MODIS data , but their spatial resolution (1 km to 250 m) can be too coarse for various applications, especially in mountain regions where the snow cover properties vary on scales of 10 to 100 m (Blöschl1999). High-resolution (30 m) snow cover maps can be generated from Landsat images but the low temporal revisit of the Landsat mission (16 d) is an important limitation for snow cover monitoring, especially considering that the cloud probability can exceed 50 % in temperate mountains . Since 2017, with the launch of Sentinel-2B, the Copernicus Sentinel-2 mission offers the unique opportunity to map the snow cover extent at 20 m resolution with a revisit time of 5 d (cloud permitting) . The combination of Sentinel-2 and Landsat-8 data provides the opportunity for even more frequent observations of the snow cover with a global median average revisit interval of 2.9 d .

The principles of snow detection from multispectral optical imagery is well established since the pioneering work of with the Landsat Thematic Mapper. Today, the challenge for scientists and end users is rather to cope with the large amount of data that is generated by a mission like Sentinel-2. The generation of a single snow map from a Sentinel-2 level-1C tiled product (monodate orthorectified image expressed in top-of-atmosphere reflectance) involves downloading a 700+ Mb zip file (once uncompressed, the product contains 12 folders and 108 files, including 15 raster files and 13 XML metadata files). Since March 2018, ESA began the operational processing of level 2A (L2A) products (monodate orthorectified images expressed in surface reflectance, including a cloud mask). Each ESA L2A product also includes a snow cover mask. However, the size of a single L2A product before unzipping can exceed 1 Gb (15 folders, 137 files). In addition, the quality of the ESA L2A snow mask can be improved for two reasons: (i) ESA's L2A algorithm is a general-purpose algorithm, which was not optimized for snow detection, and (ii) because ESA's L2A processor treats each image independently (i.e. mono-date approach, Louis et al.2016), the output cloud mask has a lower accuracy than a cloud mask generated by a multi-temporal algorithm .

In this article we introduce the Theia Snow collection, a new collection of snow cover maps, which are derived from Sentinel-2 at 20 m resolution in an operational context. The Theia Snow collection was recently upgraded to also provide snow cover maps at 30 m resolution from Landsat-8 using the same algorithm. The data are currently being produced and freely distributed via the Theia land data centre.1 The snow retrieval algorithm is based on the Normalized Difference Snow Index (Dozier1989) but also uses a digital elevation model to better constrain the snow detection. We first describe the algorithm (Sect. 2) and its implementation in the let-it-snow processor (LIS, Sect. 2.4.4). Then, we provide a detailed description of the product characteristics (coverage, period, format, etc.; Sect. 3). In Sect. 4, we present an evaluation of Theia snow products using in situ snow depth measurements, very high-resolution clear-sky snow maps and visual control. We finally discuss the main limitations and potential applications of the Theia Snow collection.

2 Algorithm

## 2.1 Scope

An algorithm was designed to determine the snow presence or absence from Sentinel-2 and Landsat-8 observations outside areas of dense forest with the following requirements:

• It should be scalable, i.e. allow the processing of large areas (104 km2) with a reasonable computation cost (typically less than 1 h for a single product).

• It should be robust to seasonal and spatial variability of the snow cover and land surface properties.

• It should maximize the number of pixels that are classified as snow or no snow.

• It is preferable to falsely classify a pixel as cloud than falsely classify a pixel as snow or no snow.

## 2.2 Input

The algorithm works with multispectral remotely sensed images, which include at least a channel in the visible part of the spectrum and a channel near 1.5 µm (referred to as shortwave-infrared or SWIR). It takes the following as input:

• a L2A product, including

• the cloud and cloud shadow mask (referred to as an “L2A cloud mask” in the following),

• the green, red and SWIR bands from the flat-surface reflectance product. These images are corrected for atmospheric and terrain slope effects . The slope correction is important in mountain regions since it enables us to use the same detection thresholds whatever the sun-slope geometry.

• A digital elevation model (DEM).

## 2.3 Pre-processing

The red and green bands are resampled with the cubic method to a pixel size of 20 m by 20 m to match the resolution of the SWIR band. The DEM is generated from the Shuttle Radar Topography Mission seamless DEM by cubic spline resampling to the same 20 m resolution grid.

## 2.4 Algorithm description

### 2.4.1 Snow detection

The snow detection is based on the Normalized Difference Snow Index (NDSI,  Dozier1989) and the reflectance in the red band. The NDSI is defined as

$\begin{array}{}\text{(1)}& \text{NDSI}=\frac{{\mathit{\rho }}_{\mathrm{green}}-{\mathit{\rho }}_{\mathrm{SWIR}}}{{\mathit{\rho }}_{\mathrm{green}}+{\mathit{\rho }}_{\mathrm{SWIR}}},\end{array}$

where ρgreen (or ρSWIR) is the slope-corrected surface reflectance in the green band (or SWIR at 1.6 µm). The NDSI expresses the fact that only snow surfaces are very bright in the visible and very dark in the shortwave infrared. Turbid water surfaces like some lakes or rivers may also have a high NDSI value; hence a additional criterion on the red reflectance is used to avoid false snow detection in these areas. A cloud-free pixel is classified as snow if the following condition is true:

$\begin{array}{}\text{(2)}& \left(\text{NDSI}>{n}_{i}\right)\phantom{\rule{0.25em}{0ex}}\text{and}\phantom{\rule{0.25em}{0ex}}\left({\mathit{\rho }}_{\mathrm{red}}>{r}_{i}\right),\end{array}$

where ni and ri are two parameter pairs (i.e. $i\in \mathit{\left\{}\mathrm{1},\mathrm{2}\mathit{\right\}}$) as explained below. Otherwise the pixel is marked as “no snow”. The values of the parameters are provided in Table 1.

### 2.4.2 Snow line elevation

The algorithm works in two passes. The snow detection (Sect. 2.4.1) is performed a first time using parameters n1 and r1, which are set in such way as to minimize the number of false snow detections (Table 1). This first pass enables us to estimate the minimum elevation of the snow cover zs, above which a second pass will be performed with less conservative parameters values n2 and r2 (Table 1). An example of the evolution of the snow line elevation over the tile 31TCH in the Pyrenees is shown in Fig. 1 and the corresponding snow maps are shown in Fig. 1. Based on regional analyses of the snow line elevation variability in mountain ranges, (e.g. Gascoin et al.2015; Krajčí et al.2016), we assume that large altitudinal variations of the snow line elevation are not likely on the scale of a tile of 110 km by 110 km. Therefore, the minimum snow elevation zs is considered uniform on the scale of the processed image.

Figure 1Time series of snow and cloud cover area distribution by elevation band over tile 31TCH (tile location in Fig. 5). The dashed red line indicates the position of zs as determined by the LIS algorithm. The corresponding snow maps are shown in Fig. 2.

Figure 2Time series of snow maps over tile 31TCH (tile location in Fig. 5). The corresponding charts of snow and cloud cover area distribution by elevation band are shown in Fig. 1.

Before proceeding to pass 2, the total snow fraction in the image after pass 1 is computed. If this snow fraction is below ft (Table 1), then pass 2 is skipped, as the sample of snow pixels is not considered statistically significant for determining the snow line elevation. Otherwise, the pass 2 is activated. For that purpose, the DEM is used to segment the image in elevation bands with a fixed height of dz (Table 1). Then, the fraction of the cloud-free area that is covered by snow in each band (after pass 1) is computed. The algorithm finds the lowest elevation band b at which the snow cover fraction is greater than a given fraction fs (Table 1). The value of zs is the lower edge of the elevation band that is two elevation bands below band b.

The cloud mask in the input L2A product is conservative because (i) it is computed at a coarser resolution and (ii) it was developed to remove surface reflectance variations due to cloud contamination. However, the scattering of some thin clouds is low in the SWIR, green and red bands, which are used for snow detection (Sect. 2.4.1). Hence, the human eye can see the snow (or the absence thereof) through these semi-transparent clouds in a colour composite. In addition, the L2A cloud mask tends to falsely classify the edges of the snow cover as cloud. Therefore, the algorithm includes some additional steps to recover those pixels from the L2A cloud mask and reclassify them as snow or no snow. This step is important because it substantially increases the number of observations as specified in Sect. 2.1.

A pixel from the L2A cloud mask cannot be reclassified as snow or no snow if any of these conditions are satisfied:

• it is coded as “cloud shadow” in the L2A cloud mask;

• it is coded as “high-altitude cloud” (or “cirrus”) in the L2A cloud mask;

• it is not a “dark cloud” (see below).

The cloud shadows are excluded because the signal-to-noise ratio can be very low in these areas. The high clouds are excluded because they can have a similar spectral signature as the snow cover, i.e. a high reflectance in the visible and a low reflectance in the SWIR. This type of cloud are detected in Sentinel-2 and Landsat-8 L2A products based on the spectral band centred on the 1.38 µm wavelength .

We select only the dark clouds as possible reclassification areas, because the NDSI test is robust to the snow–cloud confusion in this case. The dark clouds are defined using a threshold in the red band after downsampling the red band by a factor rf using the bilinear method. This resampling is applied to smooth locally anomalous pixels, following the MAJA algorithm, which performs the cloud detection at 240 m for L2A products . Therefore, if a (non-shadow, non-high-cloud) cloud pixel has a red reflectance at this coarser resolution that is lower than rD (Table 1), then it is temporarily removed from the cloud mask and proceeds to the pass 1 snow detection. The new cloud mask at this stage is the pass 1 cloud mask in Fig. 3.

After passing the passes 1 and 2, some former cloud pixels (pixels that were originally marked as cloud in the L2A cloud mask) will not be reclassified as snow because they did not fulfil the conditions of Eq. (2). These pixels are flagged as cloud in the final snow product if they have a reflectance in the red that is greater than rB (Table 1). Otherwise they are classified as no snow. Here, the red band at 20 m resolution is used to allow an accurate detection of the snow-free areas near the snow cover edges that the L2A product tend to falsely mark as cloud. The resulting cloud mask is the pass 2 cloud mask in Fig. 3.

Figure 3Flow chart of the snow detection algorithm. Table 1 gives the description and value of the algorithm parameters (written in red in this chart). MUSCATE is the scheduler which manages the L2A production for Theia.

Table 1LIS algorithm parameters description and default values for Theia's L2A Sentinel-2 (and Landsat-8) products.

### 2.4.4 Implementation

The algorithm was implemented in an open-source processor called let-it-snow (LIS). LIS is written in Python 2.7 and C++ and relies on the Orfeo ToolBox and GDAL libraries. GDAL is used for input and output operations, image resampling and metadata access . Orfeo Toolbox enables us to make the computations with good performances under memory constraints (i.e. the amount of available memory can be predefined), which is critical for operational production . LIS takes as input a digital elevation model and a Theia L2A product from Sentinel-2 MSI, Landsat-8 OLI, SPOT-4 HRVIR or SPOT-5 HRG . The spatial resolution and central wavelength of the spectral bands used by LIS for each sensor are given in Table 2. It is also compatible with Landsat-8 USGS level 2 products and Sentinel-2 ESA's Sen2Cor level 2A products .

The parameter values were set based on previous studies with Landsat and by visually checking many snow maps and snow fraction histograms. From this set of a priori parameter values, only r2 was adjusted based on the analysis of a first batch of products to enhance the snow detection on shaded slopes.

Table 2Spatial resolution and central wavelength of the spectral bands used by LIS for each compatible sensor.

3 Data description

The Theia Snow collection data can be freely accessed using a web browser by connecting to http://theia.cnes.fr (last access: 12 April 2019) or using this free command-line tool: https://github.com/olivierhagolle/theia_download (last access: 12 April 2019). The user must first create an account in Theia to have the permission to download the data.

The Theia Snow collection is organized following the tiling system of Sentinel-2 (Fig. 4). Each tile covers a square area of 110 km by 110 km in the UTM coordinate system. There are currently 127 tiles in the Theia Snow collection, mostly over the mountain ranges of western continental Europe (France, Spain, Switzerland, Italy, western Austria). The snow products are also provided for southern Quebec in Canada, the Issyk-Kul lake catchment in Kyrgyzstan and the Kerguelen Islands. These extra-European sites were selected to support specific ongoing projects in relation with Theia. An opportunistic tile is produced in central Nevada, USA, because this tile is already available as a level-2A version for calibration purposes. The Theia Snow collection covers a number of mountain ranges with seasonal snow cover (Table 3).

Figure 4Coverage of the Theia Snow collection tiles.

Table 3Non-exhaustive list of mountain ranges covered by the Theia Snow collection and corresponding tiles. The tiles marked with an asterisk were made available before the start of the operational production (see Sect. 3).

* These tiles can be viewed on an interactive map on this page: https://theia.cnes.fr/atdistrib/rocket/\#/sites (last access: 12 April 2019).

Figure 5Map of the first two products in the Theia Snow collection and comparison with MOD10A1.005 snow product of the same day. The map background is the colourized version of the EU DEM.

The Theia Snow collection products are available for 127 tiles, starting on December 2017. At the time of writing there were over 17 000 available products. A set of 1300 products tagged as version 1.0 are available between November 2015 and June 2017 for a subset of 15 tiles (Table 3). These products were produced in pre-operational using a different configuration of LIS, which underestimated the snow-covered area on shaded slopes. All other products were generated with the same configuration as presented in this article. The products after December 2017 can still have a different version tag because of changing versions in the upstream L2A product. However, the different L2A versions have a low impact on the snow products. It is expected that the cloud mask has improved gradually with time, due to algorithms enhancements in the successive L2A versions and also because of the increasing revisit frequency of the Sentinel-2 mission (the full 5 d revisit became nominal above all land masses in March 2018). The different versions of the Theia Snow collection are listed and updated on this page: http://www.cesbio.ups-tlse.fr/multitemp/?page_id=13180 (last access: 12 April 2019).

The snow products are currently routinely generated at CNES using the MUSCATE scheduler, which also manages the L2A production for Theia. MUSCATE performs a series of operations in a high-performance computing environment: (i) download the level 1C product as soon it is available from the French mirror of the Sentinel products (PEPS, Plateforme d'Exploitation des Produits Sentinel), (ii) process and distribute the L2A product and (iii) process and distribute the snow product. This workflow is optimized to reduce the lag time between the acquisition and the distribution of the products. As an example, in April 2018, nearly 80 % of the snow products were made available online 5 d or less after the date of the Sentinel-2 acquisition (Fig. 6). It is expected that this time lag should reach a median value of 2 d thanks to the increasing performance of MUSCATE.

Figure 6Cumulative probability of the time lags between the Sentinel-2 acquisition and the distribution date of the corresponding snow product, computed for all snow products with acquisition date in April 2018.

Each snow product is distributed as a zipped file which contains raster and vector files. The file base name is the product ID, i.e. productId.zip, where the productId is a character string which is constructed as follows:

$\begin{array}{ll}& \mathtt{\text{productId = Satellite_AcquisitionDate_}}\\ & \mathtt{\text{_L2B-SNOW_TileName_D_Version}}.\end{array}$

For example, the eastern snow product in Fig. 5 was extracted from a file named

$\begin{array}{ll}& \mathtt{\text{SENTINEL2A_20151130-105641-486_L2B-}}\\ & \mathtt{\text{-SNOW_T31TDH_D_V1-0.zip}},\end{array}$

which indicates that this product was generated from a Sentinel-2A image acquired on 30 November 2015 at 10:56:41.486 UTC and covers tile 31TDH. The version of the product is 1.0 (see above).

The size of each zipped product varies depending on the complexity of the snow and cloud mask but generally ranges between 10 and 100 Mb. After extracting the product, the following files are created:

• productID_SNW_R2.tif is an 8 bit single-band GeoTiff image which provides the following classification for each pixel:

• 0: no snow,

• 100: snow,

• 205: cloud including cloud shadow,

• 254: no data.

• productID_SNW_R2.{shp,shx,dbf,prj} is a vector file in the ESRI shapefile format, which contains a vectorized version of the productID_SNW_R2.tif using the polygon geometry.

• productID_CMP_R2.tif is an 8 bit RGB GeoTiff image which shows the outlines of the snow mask (green lines) and cloud mask (magenta lines) on a false-colour composite of the input L2A image (RGB image made with SWIR, red and green bands). This composition was chosen because RGB composites using the SWIR-band image are useful for discriminating the snow cover from the snow-free areas and from the clouds . Hence, this image mainly allows the user to visually check the consistency of the snow and cloud mask.

• productID_MTD_ALL.xml is a metadata file.

• productID_QKL_ALL.jpg is a quick picture of the productID_SNW_R2.tif image made using this colour map (8 bit RGB code in parentheses): snow in cyan (0,255,255), cloud in white (255,255,255), no snow in grey (119,119,119) and no data in black (0,0,0) (see an example in Fig. 7).

• DATA/productID_HIS_R2.txt (since product version 1.4) is a text file indicating the cloud, snow and no-snow fraction area by elevation bins (Sect. 2).

• MASKS/productID_EXS_R2.tif (in product version 1.0, it is located in the root of the zipped file) is an 8 bit GeoTiff image for expert evaluation, where each bit indicates the value of an intermediate computation mask (Sect. 2):

• bit 1: snow (pass 1),

• bit 2: snow (pass 2),

• bit 3: clouds (pass 1),

• bit 4: clouds (pass 2),

• bit 5: clouds (initial all cloud).

For most applications, only files with the SNW suffix should be useful.

The shapefile and raster images are referenced in WGS-84 UTM system with the zone number given by the first two digits of the tile name. All raster files have a 20 m resolution. Note that no-snow pixels can be any surface, including water surface. No data pixels are the pixels outside of the acquisition segment.

Figure 7Example of a snow map (SNW file) and its corresponding colour composite (CMP file). In the snow map the snow is represented in cyan, no snow in grey and cloud in white. In the RGB composite, the snow-covered areas are delineated in magenta, while the clouds are delineated in green (Theia snow product of tile 32TNS on 27 May 2017).

4 Evaluation

In this section we present an evaluation of the Theia Snow collection. We first present the methods developed to do this evaluation and then the results. The evaluation was based only on Sentinel-2 products, because the production for Landsat-8 had just started at the time of writing and thus the Landsat-8 products represented a very minor fraction of the Theia Snow collection.

## 4.1 Method

### 4.1.1 Comparison with in situ snow depth measurements

We collected all available snow depth measurements within tiles 31TGM, 31TGL, 31TGK, 32TLS, 32TLR, 32TLQ, 31TDH, 31TCH, 30TYN and 30TNX from the Météo-France database between 1 September 2017 and 31 August 2018. These 10 tiles cover most of the French Alps and Pyrenees (Fig. 8). We obtained 120 snow depth time series with at least one measurement. We gathered all available Sentinel-2 snow products for these tiles over the same period, i.e. a total of 1134 products. Then, we extracted the pixel values at the location of each snow measurement station for all dates. When a station is located in two tiles we only kept the data from the first tile. We selected the snow depth measurements which were collected on the same day of a Theia snow product and discarded the measurements corresponding to a cloud detection in the Theia snow product. The snow depth measurements were converted to snow presence and absence using a threshold of SD0=0 m. This eventually allowed us to compute a confusion matrix between a set of snow presence and absence data from in situ measurement and a set of simultaneous snow presence and absence data from the Theia Snow collection across the French Alps and Pyrenees. However, previous studies comparing MODIS binary snow products with ground measurements showed that a value of 0 m may not be optimal ; therefore the sensitivity of the results to SD0 was tested by recomputing the confusion matrix for 1 cm increments of SD0 from 0 to 1 m.

Figure 8Map showing the location of the stations in the réseau nivo-météorologique (Météo-France snow observation network) used for this study (source: Météo-France) and the corresponding Sentinel-2 tiles.

### 4.1.2 Comparison with snow maps of higher spatial resolution

We used SPOT-6 and SPOT-7 images with a resolution of 1.5 m in panchromatic and 6 m in multispectral (blue, green, red, near-infrared) to evaluate the accuracy of the snow detection in Theia snow products. For this comparison, the LIS processor was run with its current operational configuration (i.e. the configuration used to routinely produce the Theia snow products since December 2017 Table 1). SPOT-6 and SPOT-7 sensors acquire images with a large radiometric depth coded in 16 bits, which enables us to identify surface features in alpine regions with dark shaded slopes and bright snow surfaces. We searched the catalogue of the Kalideos database for available SPOT images that could match a cloud-free (or nearly cloud-free) Theia snow map over the French Alps region (https://alpes.kalideos.fr, last access: 12 April 2019). We identified six pairs of images in 2016 and 2017 with a maximum time lag of 6 d (Table 4, Fig. 9). The SPOT images were obtained as orthorectified products from the Kalideos database. The SPOT images were used to generate reference snow maps by a pixel-based supervised classification. For each SPOT image, we manually drew about 15 polygons of homogeneous of snow and no-snow surfaces using the panchromatic image. The few cloud pixels in the SPOT images were also manually delineated with a large buffer to restrict the classification to strictly cloud-free areas. Colour composites made from the multispectral images were also used as a visual support to help the snow and cloud classification. These polygons were then used to extract training samples from the SPOT products. The samples were taken from both the panchromatic and the multispectral images. We tested a number of classification algorithms by splitting the polygons in calibration and validation data sets. The detail of this analysis is not shown here, but it allowed us to conclude that the random forest classifier was the best choice for our purpose, given its good accuracy and its numerical efficiency (Bouchet2018).

In addition, we generated the snow maps from the same Sentinel-2 level 1 products using ESA's Sen2Cor version 2.5 . The Sen2Cor output snow masks are expected to be nearly identical to the ones included in the distributed L2A products by ESA, since the same processor is used. We have generated the L2A products ourselves with Sen2Cor for this study, because the L2A products were not yet available when we made this analysis (ESA started operational production of Sentinel-2 data at level 2A in 2017). The Theia and Sen2Cor snow maps were compared to the reference snow maps from SPOT images using standard metrics derived from the confusion matrix (accuracy, F1 score, kappa, false-detection rate and false-negative rate).

Table 4Pairs of Sentinel-2 and SPOT-6/7 products used for the evaluation of the snow detection accuracy (see also Fig. 9).

Figure 9Location of the five SPOT-6/7 products used for the evaluation of the snow detection accuracy and the corresponding Sentinel-2 tiles (see also Table 4).

### 4.1.3 Evaluation metrics

From the confusion matrices of Sect. 4.1.1 and 4.1.2, we derived the following metrics: accuracy (the proportion of the total number of predictions that were correct), false-positive rate (FPR, i.e. the proportion of no-snow pixels that were incorrectly classified as snow), false-negative rate (FNR, i.e. the proportion of snow pixels that were incorrectly classified as no snow), F1 score (harmonic average of the precision and recall) and kappa coefficient (Cohen1960).

### 4.1.4 Visual verification

The evaluation methods presented above undersample the actual resolution of the Theia snow collection, both in space (only 120 points of comparisons in Sect. 4.1.1) and time (only six dates of comparisons in Sect. 4.1.2). Given that we do not have a more extensive validation data set, we used a time series of 64 Theia snow products from 6 July 2015 to 2 July 2017 over tile 31TCH to control the consistency of the snow and cloud masks based on the visual inspection of the false-colour composites. This approach is efficient for evaluating the frequency of gross errors on the tile scale, i.e. large patches of false snow or false no-snow detection .

## 4.2 Results

### 4.2.1 Comparison with in situ snow depth measurements

The confusion matrix between the Theia snow products and in situ snow depth measurements is given in Table 5. We find 1414 pairs of data for the study period (i.e. there are 1414 individual snow depth measurements for which a cloud-free retrieval can be found in the Theia Snow collection on the same day). From this confusion matrix, the accuracy (proportion of correct classifications) is 94 % and the kappa coefficient is 0.83, which indicates excellent agreement according to . The false-positive rate (2.8 %) is lower than the false-negative rate (6.7 %), which means that the Theia snow product is more likely to underestimate than to overestimate the snow detection at the station locations. The false-negative rate decreases if SD0 is set to a higher value; however the false-positive rate also increases in such a way that an optimum is reached at SD0=2 cm (Fig. 10).

The comparisons of each individual time series before matching the data by date is shown in Appendix Appendix B. These plots illustrate that the high revisit time of Sentinel-2 enables us to capture the seasonal cycle of the snow cover well. Even intermediate melt-out events at lower elevation stations can be identified over the course of the snow season.

Table 5Confusion matrix between the Theia snow products and in situ snow depth data (SD).

Figure 10Sensitivity of the agreement between the in situ and Theia snow product to SD0 in metres (threshold to convert the measured snow depth to snow presence or absence). The inset shows a close-up of the region of 0–0.2 m.

Table 6Results of the validation of Theia (LIS 1.2.1) and ESA (Sen2Cor 2.5) snow products using SPOT-6/7 reference snow maps.

### 4.2.2 Comparison with snow maps of higher spatial resolution

Table 6 shows that the MAJA–LIS workflow provides a better detection of the snow cover than the Sen2Cor processor in all the studied cases. Although the improvement in the accuracy coefficient is low in some cases (31TGL/2017-03-11, 31TGK/2016-08-13) or even slightly negative in one case (32TLS/2016-10-12), the F1 score and kappa coefficient of the Theia snow products are always greater than or equal to those of Sen2Cor. The differences are significant for four products among the six tested. The lower accuracy in the case of the 32TLS/2016-10-12 product is due to a higher false-negative rate. The false-positive rates in the Theia snow products are generally lower than the false-negative rates, which is consistent with the previous evaluation using in situ data. The spatial analysis of the errors indicates that the false-positives (i.e. overdetection of snow pixels) in the Theia products are mostly located near the snow cover edges (Fig. A1). This suggests that these errors might not be due to a false detection of snow (e.g. confusion with a lake or a cloud surface) but are rather due to the resolution discrepancy with the reference data. By contrast, the spatial comparison of the Sen2Cor products with the reference data shows large patches of false negatives (omission of snow pixels) in the snow-covered area, which are typically due to the omission of snow pixels in shaded areas (Fig. A2).

### 4.2.3 Visual verification

We found 4 products among 64 with significant patches of false-positive pixels (i.e. pixels falsely classified as snow). These false snow areas are exclusively located in cloud areas (Fig. 11). False-negative pixels (i.e. pixels falsely classified as snow-free) can be found by looking at higher-resolution data along the snow cover edges, but we did not find large areas of false-negative pixels.

Figure 11Full time series of Theia snow products from 6 July 2015 to 2 July 2017 over tile 31TCH. Four products with visually evident false snow detections are framed in red and shown at higher resolution. Red arrows indicate the location of the false snow patches, while the green arrow indicates a true snow patch. The product of 16 April 2017 is shown as a reference to help the visualization of the errors.

5 Discussion

The evaluation of the Theia Snow collection using both in situ and remote-sensing reference data sets indicates an excellent accuracy of the snow detection, albeit with a tendency to snow underdetection. The omission of snow pixels can be due to the presence of trees or shadows. In the case of the in situ comparison, part of the errors may be explained by the uncertainty in the geolocalization of the Sentinel-2 data, which is close to 11 m at 95 % for both satellites according to the latest Sentinel-2 L1C data quality report (MPC Team2018). In addition, in situ measurements may not be representative of the snow conditions within the Sentinel-2 pixel. However, the sensitivity analysis of the accuracy and kappa coefficient to the SD0 value suggests that the discrepancy between the scale of the in situ observations and the scale of the pixel observation is not significant, since the optimal SD0 is close to 0 m. In contrast, a similar analysis with lower-resolution snow products from MODIS found an optimal SD0 of 0.15 m . This illustrates how high-resolution snow products from Sentinel-2 enable us to partly resolve the typical scale issue between in situ and remote-sensing products (Blöschl1999).

The visual inspection of the products reveals that the main issue in some Theia snow products is rather the occurrence of false positives due to the confusion with cloud surfaces. This problem can be due to two main factors:

• Cold clouds: cold clouds have a spectral signature that is close to the snow cover since they contain ice crystals (high NDSI). If these clouds are not accurately detected by the level 2A processor then the LIS algorithm will also classify them as snow after pass 1 (Sect. 2). As a result the snow line elevation zs is not well estimated, which can generate erroneous snow patches within these clouds, even at low elevation (e.g. product of 20 July 2017 in Fig. 11).

• Shaded clouds: the three-dimensional structure of the clouds can form shadows within the cloud cover area (e.g. product of 4 March 2017 in Fig. 11). These shaded cloud areas have a lower reflectance in the visible wavelengths and therefore can be considered as dark clouds in the LIS algorithm, provided that the shaded cloud cover area is large enough to significantly reduce the reflectance in the red band, even after the downsampling step (Sect. 2).

Among 64 products, we find 4 cases with significant false snow cover detection in this tile (Fig. 11). Although it is not shown here, we have made this exercise over other tiles in the Alps and Atlas mountains, and we estimate that this proportion is similar in these areas. Regarding this issue we mainly rely on our visual judgment using colour composites to assess the ability of the algorithm to discriminate the cloud and the snow cover. However, to further quantify the algorithm performance and eventually optimize its parameters, we would need a database of classified imagery with labelled regions of cloud and snow cover in many different cases.

Both Sen2Cor and LIS use the NDSI to detect the snow cover; however the algorithms differ in many aspects. The better performances of LIS in Sect. 4.2.2 can be related to the mono-date approach for atmospheric correction and cloud detection in Sen2Cor, which can cause snow–cloud confusion in the L2A product, while LIS uses L2A products from the more accurate multi-temporal MAJA processor . In addition, a unique feature of the LIS algorithm is the use of the snow line elevation concept to improve the robustness of the snow detection in mountain regions. In addition, LIS was designed to retrieve snow below thin clouds and to reclassify snow pixels which are frequently found near the snow cover edges in L2A products. Apart from the MAJA processing, however, the effect of these algorithmic differences could be less evident in flat areas.

6 Code and data availability

The Theia snow collection can be accessed and cited using this digital object identifier: https://doi.org/10.24400/329360/F7Q52MNK . The let-it-snow source code is available under the GNU Affero General Public License v3.0 in this repository: https://gitlab.orfeo-toolbox.org/remote_modules/let-it-snow/ (last access: 12 April 2019).

7 Conclusions

The Theia Snow collection is a free collection of snow products which indicate the presence or absence of snow on the land based on Sentinel-2 (20 m) and Landsat-8 (30 m) data. Most of the snow products are derived from Sentinel-2. However, in late September 2018, Landsat-8 snow products started being routinely added to the Theia Snow collection for the titles in France only. Previous Landsat-8 data were also reprocessed back to March 2017. At the time of writing (15 March 2019), for example in tile T31TCH in the Pyrenees, 20.5 % of all available products were derived from Landsat-8 data (63 among 244). In addition, the processing is done to facilitate the combination of Landsat-8 and Sentinel-2 snow products, since the data format and the tiling scheme are the same (only the pixel size differs). Although the co-registration of Sentinel-2 and Landsat-8 data can be still problematic , the combination of “analysis ready” Landsat-8 and Sentinel-2 is important for increasing the number of cloud-free observations.

The evaluation presented in this article indicate that the Theia Snow product allows an accurate detection of the snow cover (and of the absence of snow). However, it remains a preliminary assessment that should be extended in the future. For example, the evaluation was based only on Sentinel-2 products, since the integration of Landsat-8 data started during the writing of this paper and should be extended to Landsat-8 products. However, the processing is done with the same algorithm, which is based on the calculation of the NDSI from slope-corrected surface reflectance (level 2A data). In addition, the evaluation was limited to the French Alps and French Pyrenees, while the collection covers other mountain regions. Last, the spatial evaluation using higher-resolution remote-sensing data was focused on the accuracy of the snow detection using very high-resolution clear-sky images, while, as discussed above, the main difficulty is not to detect the snow when the sky is cloud-free but rather to avoid false snow detection within clouds. The reduction of the false-positive rate is the main challenge for the future developments in the LIS algorithm. In the meantime, we also welcome feedback from other users.

The Theia Snow collection is based on optical observations; therefore it is not adapted to the detection of the snow cover in dense forest areas where the ground is obstructed by the canopy . This may typically occur in evergreen conifer forests of alpine regions (e.g. Alps, Pyrenees). We have noted that the snow detection can be successful even in alpine forests, but we lack data that can be used for quantifying its accuracy. Therefore it is recommended to use a land cover map to exclude these forest areas from the analysis. For the tiles in France, this can be done using the Theia land cover map , which is distributed at the same spatial resolution. In the Pyrenees, the mean altitude of zero-degree isotherm in winter is close to the treeline elevation at 1600 m; therefore the impact of the forest cover is limited . The LIS algorithm may also fail to detect the snow cover on steep, shaded slopes if the solar elevation is very low (typically below 20). This can occur in midlatitude areas in winter. In this case, the slope correction in the L2A product is generally not applied. This is indicated in the L2A mask but not in the snow product. A future release of the Theia Snow collection should include this information. However, the visual inspection of many images in regions of complex topography gives us the impression that the snow detection algorithm already performs well on shaded slopes. We are planning to investigate this issue further using alternative approaches .

In spite of all these limitations, the Theia snow products have already been successfully used for the evaluation of the MODIS snow products and the assimilation of snow-covered area data into a snowpack model in the High Atlas . Given that the snow cover is a key driver of many natural processes in mountains regions, we envision various potential applications of the Theia snow products, including the modelling of the distribution of the permafrost in mountain regions, the validation and calibration of hydrological models in snow-dominated catchments and the spatial modelling of biodiversity and productivity of ecosystems in mountain regions.

Appendix A: Spatial comparison between Theia and Sen2Cor snow products with SPOT-6/7 reference data

Figure A1Theia snow products vs. SPOT-6/7 snow maps (green: true negative, white: true positive, red: false positive, yellow: false negative).

Figure A2Sen2cor snow product vs. SPOT-6/7 snow maps (green: true negative, white: true positive, red: false positive, yellow: false negative).

Appendix B: Time series of Theia snow products and snow depth records at 120 snow observation stations from 1 September 2017 to 31 August 2018

Figure B1Magenta dots: snow presence (a) or absence (b) in the Theia snow collection. Black line: snow depth time series.

Figure B2Magenta dots: snow presence (a) or absence (b) in the Theia snow collection. Black line: snow depth time series.

Figure B3Magenta dots: snow presence (a) or absence (b) in the Theia snow collection. Black line: snow depth time series.

Figure B4Magenta dots: snow presence (a) or absence (b) in the Theia snow collection. Black line: snow depth time series.

Author contributions
Author contributions.

SG and OH defined the LIS algorithm. MG and GS worked on the implementation and development of the LIS processor. MG, MB and SG worked on the validation of the snow products. SG wrote the manuscript with inputs from all the co-authors.

Competing interests
Competing interests.

The authors declare that they have no conflict of interest.

Acknowledgements
Acknowledgements.

The development and production of the Theia Snow collection is supported by the CNES. We thank Tristan Klempka for his contribution to earlier developments of the LIS processor. We thank Marc Leroy, Arnaud Sellé and Philippe Pacholczyk for the coordination of the Theia project and Joelle Donadieu and Céline l'Helguen for the transfer to the production at MUSCATE. This manuscript was greatly improved thanks to the constructive comments of Gaia Piazzi, Javier Herrero, Jeff Dozier and an anonymous referee.

Review statement
Review statement.

This paper was edited by Alexander Kokhanovsky and reviewed by Gaia Piazzi, Javier Herrero, Jeff Dozier, and one anonymous referee.

References

Baba, M. W., Gascoin, S., and Hanich, L.: Assimilation of Sentinel-2 Data into a Snowpack Model in the High Atlas of Morocco, Remote Sensing, 10, 1982, https://doi.org/10.3390/rs10121982, 2018. a

Baetens, L., Desjardins, C., and Hagolle, O.: Validation of Copernicus Sentinel-2 Cloud Masks Obtained from MAJA, Sen2Cor, and FMask Processors Using Reference Cloud Masks Generated with a Supervised Active Learning Procedure, Remote Sensing, 11, 433, https://doi.org/10.3390/rs11040433, 2019. a

Blöschl, G.: Scaling issues in snow hydrology, Hydrol. Process., 13, 2149–2175, 1999. a, b

Bouchet, M.: Validation et amélioration des produits Surfaces Enneigées Sentinel-2, Zenodo, https://doi.org/10.5281/zenodo.1446460, 2018. a

Cohen, J.: A coefficient of agreement for nominal scales, Educ. Psychol. Meas., 20, 37–46, 1960. a

Commissariat général au développement durable: Plan d'applications satellitaires 2018, available at: https://www.ecologique-solidaire.gouv.fr/sites/default/files/Plan% 20d% E2% 80% 99applications% 20satellitaires% 202018.pdf (last access: 12 April 2019), 2018. a

Dozier, J.: Spectral signature of alpine snow cover from the Landsat Thematic Mapper, Remote Sens. Environ., 28, 9–22, 1989. a, b, c

Drusch, M., Bello, U. D., Carlier, S., Colin, O., Fernandez, V., Gascon, F., Hoersch, B., Isola, C., Laberinti, P., Martimort, P., Meygret, A., Spoto, F., Sy, O., Marchese, F., and Bargellini, P.: Sentinel-2: ESA's Optical High-Resolution Mission for GMES Operational Services, Remote Sens. Environ., 120, 25–36, https://doi.org/10.1016/j.rse.2011.11.026, 2012. a

Fleiss, J. L., Levin, B., and Paik, M. C.: Statistical methods for rates and proportions, John Wiley & Sons, 2013. a

Frei, A., Tedesco, M., Lee, S., Foster, J., Hall, D., Kelly, R., and Robinson, D. A.: A review of global satellite-derived snow products, Adv. Space Res., 50, 1007–1029, https://doi.org/10.1016/j.asr.2011.12.021, 2012. a

Gao, B.-C., Goetz, A. F., and Wiscombe, W. J.: Cirrus cloud detection from airborne imaging spectrometer data using the 1.38 µm water vapor band, Geophys. Res. Lett., 20, 301–304, 1993. a

Gascoin, S., Hagolle, O., Huc, M., Jarlan, L., Dejoux, J.-F., Szczypta, C., Marti, R., and Sánchez, R.: A snow cover climatology for the Pyrenees from MODIS snow products, Hydrol. Earth Syst. Sci., 19, 2337–2351, https://doi.org/10.5194/hess-19-2337-2015, 2015. a, b, c, d, e, f

Gascoin, S., Grizonnet, M., Klempka, T., and Salgues, G.: Algorithm theoretical basis documentation for an operational snow cover product from Sentinel-2 and Landsat-8 data (Let-it-snow), https://doi.org/10.5281/zenodo.1414452, 2018. a

Gascoin, S., Grizonnet, M., Hagolle, O., and Salgues, G.: Theia Snow collection, CNES for THEIA Land data center, https://doi.org/10.24400/329360/f7q52mnk, 2018. a

GDAL/OGR contributors: GDAL/OGR Geospatial Data Abstraction software Library, Open Source Geospatial Foundation, http://gdal.org (last access: 12 April 2019), 2018. a

Grizonnet, M., Michel, J., Poughon, V., Inglada, J., Savinaud, M., and Cresson, R.: Orfeo ToolBox: open source processing of remote sensing images, Open Geospatial Data, Software and Standards, 2, 15, https://doi.org/10.1186/s40965-017-0031-6, 2017. a

Hagolle, O., Huc, M., Pascual, D. V., and Dedieu, G.: A multi-temporal method for cloud detection, applied to FORMOSAT-2, VENμS, LANDSAT and SENTINEL-2 images, Remote Sens. Environ., 114, 1747–1755, 2010. a, b

Hagolle, O., Huc, M., Desjardins, C., Auer, S., and Richter, R.: MAJA Algorithm Theoretical Basis Document, https://doi.org/10.5281/zenodo.1209633, 2017. a, b

Hall, D., Riggs, G., Salomonson, V., DiGirolamo, N., and Bayr, K.: MODIS snow-cover products, Remote Sens. Environ., 83, 181–194, 2002. a

Inglada, J., Vincent, A., Arias, M., Tardy, B., Morin, D., and Rodes, I.: Operational High Resolution Land Cover Map Production at the Country Scale Using Satellite Image Time Series, Remote Sensing, 9, 95, https://doi.org/10.3390/rs9010095, 2017. a

Jarvis, A., Guevara, E., Reuter, H. I., and Nelson, A. D.: Hole-filled SRTM for the globe: version 4: data grid, CGIAR Consortium for Spatial Information, Washington D.C., USA, 2008. a

Klein, A. and Barnett, A.: Validation of daily MODIS snow cover maps of the Upper Rio Grande River Basin for the 2000-2001 snow year, Remote Sens. Environ., 86, 162–176, https://doi.org/10.1016/S0034-4257(03)00097-X, 2003. a

Krajčí, P., Holko, L., and Parajka, J.: Variability of snow line elevation, snow cover area and depletion in the main Slovak basins in winters 2001–2014, J. Hydrol. Hydromech., 64, 12–22, 2016. a

Li, J. and Roy, D. P.: A Global Analysis of Sentinel-2A, Sentinel-2B and Landsat-8 Data Revisit Intervals and Implications for Terrestrial Monitoring, Remote Sensing, 9, 902, https://doi.org/10.3390/rs9090902, 2017. a

Louis, J., Debaecker, V., Pflug, B., Main-Korn, M., Bieniarz, J., Mueller-Wilm, U., Cadau, E., and Gascon, F.: Sentinel-2 Sen2Cor: L2A Processor for Users, in: Proc. 'Living Planet Symposium 2016', Prague, Czech Republic, 9–13 May 2016, ESA SP-740, 9–13, 2016. a, b, c

Malnes, E., Buanes, A., Nagler, T., Bippus, G., Gustafsson, D., Schiller, C., Metsämäki, S., Pulliainen, J., Luojus, K., Larsen, H. E., Solberg, R., Diamandi, A., and Wiesmann, A.: User requirements for the snow and land ice services – CryoLand, The Cryosphere, 9, 1191–1202, https://doi.org/10.5194/tc-9-1191-2015, 2015. a, b, c

Masson, T., Dumont, M., Mura, M., Sirguey, P., Gascoin, S., Dedieu, J.-P., and Chanussot, J.: An Assessment of Existing Methodologies to Retrieve Snow Cover Fraction from MODIS Data, Remote Sensing, 10, 619, https://doi.org/10.3390/rs10040619, 2018. a

Metsämäki, S., Pulliainen, J., Salminen, M., Luojus, K., Wiesmann, A., Solberg, R., Böttcher, K., Hiltunen, M., and Ripper, E.: Introduction to GlobSnow Snow Extent products with considerations for accuracy assessment, Remote Sens. Environ., 156, 96–108, 2015. a

MPC Team: Sentinel-2 L1C Data Quality Report, European Space Agency, Tech. Rep. 31, 80 pp., 2018. a

Painter, T. H., Rittger, K., McKenzie, C., Slaughter, P., Davis, R. E., and Dozier, J.: Retrieval of subpixel snow covered area, grain size, and albedo from MODIS, Remote Sens. Environ., 113, 868–879, 2009. a

Parajka, J. and Blöschl, G.: Spatio-temporal combination of MODIS images–potential for snow cover mapping, Water Resour. Res., 44, W03406, https://doi.org/10.1029/2007WR006204, 2008. a

Ramsay, B. H.: The interactive multisensor snow and ice mapping system, Hydrol. Process., 12, 1537–1546, 1998.  a

Sirguey, P., Mathieu, R., Arnaud, Y., Khan, M., and Chanussot, J.: Improving MODIS Spatial Resolution for Snow Mapping Using Wavelet Fusion and ARSIS Concept, IEEE Geosci. Remote S., 5, 78–82, https://doi.org/10.1109/LGRS.2007.908884, 2008. a

Sirguey, P., Mathieu, R., and Arnaud, Y.: Subpixel monitoring of the seasonal snow cover with MODIS at 250 m spatial resolution in the Southern Alps of New Zealand: Methodology and accuracy assessment, Remote Sens. Environ., 113, 160–181, 2009. a

Storey, J., Roy, D. P., Masek, J., Gascon, F., Dwyer, J., and Choate, M.: A note on the temporary misregistration of Landsat-8 Operational Land Imager (OLI) and Sentinel-2 Multi Spectral Instrument (MSI) imagery, Remote Sens. Environ., 186, 121–122, https://doi.org/10.1016/j.rse.2016.08.025, 2016. a

U.S. Geological Survey Earth Resources Observation And Science Center: Landsat OLI Level-2 Surface Reflectance (SR) Science Product, https://doi.org/10.5066/f78s4mzj, 2014. a

Vidot, J., Bellec, B., Dumont, M., and Brunel, P.: A daytime VIIRS RGB pseudo composite for snow detection, Remote Sens. Environ., 196, 134–139, https://doi.org/10.1016/j.rse.2017.04.028, 2017. a, b

Xin, Q., Woodcock, C. E., Liu, J., Tan, B., Melloh, R. A., and Davis, R. E.: View angle effects on MODIS snow mapping in forests, Remote Sensing of Environment, 118, 50–59, https://doi.org/10.1016/j.rse.2011.10.029, 2012. a

Zhu, Z., Wang, S., and Woodcock, C. E.: Improvement and expansion of the Fmask algorithm: Cloud, cloud shadow, and snow detection for Landsats 4–7, 8, and Sentinel 2 images, Remote Sens. Environ., 159, 269–277, 2015. a

Theia is a French inter-agency organization designed to foster the use of Earth observation for land surface monitoring for academics and public policy actors.