Upscaled diurnal cycles of land-atmosphere fluxes : a new global half-hourly data product

Interactions between the biosphere and the atmosphere can be well characterized by fluxes between the two. In particular, carbon and energy fluxes play a major role for understanding biogeochemical processes on ecosystem level or global scale. However, the fluxes can only be measured at individual sites, e.g., by eddy covariance towers, and an upscaling of these local observations is required to analyze global patterns. Previous work focused on upscaling monthly, eight-day, or daily average values and global maps for each flux have been provided accordingly. In this paper, we raise the upscaling of carbon 5 and energy fluxes between land and atmosphere to the next level by increasing the temporal resolution to subdaily scales. We provide continuous half-hourly fluxes for the period from 2001 to 2014 at 0.5◦ spatial resolution, which allows for analyzing diurnal cycles globally. The dataset contains four fluxes: gross primary production (GPP), net ecosystem exchange (NEE), latent heat (LE), and sensible heat (H). We propose two prediction approaches for the diurnal cycles based on large-scale regression models and compare them in extensive cross-validation experiments using different sets of predictor variables. 10 We analyze the results for a set of FLUXNET tower sites showing the suitability of our approaches for this upscaling task. Finally, we have selected one approach to calculate the global half-hourly data products based on predictor variables from remote sensing and meteorology at daily resolution as well as half-hourly potential radiation. In addition, we provide a derived product that only contains monthly average diurnal cycles, which is a lightweight version in terms of data storage that still enables to study the important characteristics of diurnal courses globally. We recommend to primarily use these monthly 15 average diurnal cycles, because they are less affected by the impacts of day-to-day variation, observation noise, and shortterm fluctuations on subdaily scales compared to the plain half-hourly flux products. The global half-hourly data products are available at https://doi.org/10.17871/BACI.224.


Introduction
Understanding the coupling of the atmosphere and the biosphere is key to understanding Earth system dynamics and ultimately to predict future trajectories based on dynamic and fully coupled Earth system models (Bonan, 2008).Observations of energy and carbon fluxes obtained by the eddy covariance technique have revealed major insights into landatmosphere interactions (see the overview by Balddocchi, 2014), but the measurements are local by nature and it remains difficult to derive global inferences.To overcome this limitation, continental to global scale products of biosphereatmosphere fluxes have been produced using machine learning techniques that combine flux tower measurements, observations from remote sensing, and climate data (Jung et al., 2009;Papale et al., 2015).These products proved to be useful, for example, in terms of assessing large-scale patterns of biosphere-atmosphere fluxes with climate data (Jung et al., 2010) or to provide cross-consistency checks for processmodel simulations (Bonan et al., 2011).The general principle of this upscaling approach has been to exploit relation-P.Bodesheim et al.: Upscaled diurnal cycles ships between climate or satellite-based driver variables like temperature or leaf area index, and the targeted biosphereatmosphere flux (Xiao et al., 2012).In the first ("training") step, a machine learning model of the flux data is established based on the driver variables across a regional or global network of towers.In the second ("production"1 ) step, the model is applied to large spatial domains where only gridded estimates of the drivers are available.Machine learning techniques are very effective here since they are fully dataadaptive, do not require initial assumptions on functional relationships, and can cope with nonlinear dependencies.
One of the first upscaling papers by Jung et al. (2009) deals with empirical upscaling of monthly average values of gross primary production (GPP) obtained from a biosphere model.They propose using a model tree ensemble approach to perform the predictions and introduce both a new model tree induction algorithm and a specific ensemble approach.Later, Beer et al. (2010) estimated GPP for different biomes, focussing on global median annual GPP derived using different prediction approaches.Covering a larger number of variables, Jung et al. (2011) produced global flux products at 0.5 • spatial resolution for monthly average values of GPP, terrestrial ecosystem respiration (TER), net ecosystem exchange (NEE), latent heat (LE), and sensible heat (H).Their findings were confirmed by a comprehensive cross-validation analysis using FLUXNET2 towers.In the latest study of Jung et al. (2017), they investigate the dependencies of changes in temperature and water availability on the interannual variability in carbon fluxes both locally and globally using their upscaled data products and process-based global land models.
There exist further upscaling approaches in the literature based on support vector regression models (Yang et al., 2007;Ueyama et al., 2013;Ichii et al., 2017) that estimate carbon fluxes on both regional and continental scales.The work of Xiao et al. (2008Xiao et al. ( , 2010) ) deals with estimating carbon fluxes for the United States using data from MODIS and Ameri-Flux.Only recently, a systematic comparison of different regression algorithms for predicting carbon and energy fluxes has been carried out by Tramontana et al. (2016).They were interested in the best prediction performances for estimating GPP, TER, NEE, LE, and H, as well as net radiation at either 8-day or daily temporal resolution.In their crossvalidation analysis, they found that prediction performance varies only slightly among different regression algorithms from machine learning.However, they could show that accuracies clearly differ between the individual fluxes, meaning that some fluxes are harder to estimate than others, which is probably due to a lack of information in the set of explanatory variables.
Upscaling flux tower measurements represents a "bottomup" approach whereas the "top-down" atmospheric CO 2 inversions have been used for assessing the net carbon exchange between the land and atmosphere.The atmospheric inversions use measurements of CO 2 in the atmosphere and prior information, e.g., on anthropogenic emissions, together with wind fields and a transport model to infer the landatmosphere net CO 2 flux.Due to the relatively sparse atmospheric CO 2 station network and uncertainties in atmospheric transport, such inversion methods cannot precisely provide located estimates but rather assessments for broad regions.The complementary perspectives, strengths, and uncertainties in the "bottom-up" and "top-down" approaches make it particularly appealing to bring these two together.Therefore, we will compare our results for upscaling NEE with those from an ensemble of atmospheric inversions by Peylin et al. (2013) in Sect.6.4.
Today, global flux products feature, at best, a daily temporal resolution as presented by Tramontana et al. (2016).This is partly due to rapidly growing computational issues in the training and production step scaling quadratically with spatial resolution.In addition, consistent global long-term products of driver data with hourly or higher temporal resolutions are lacking or are not readily available.Upscaling half-hourly carbon and energy fluxes raises previous upscaling approaches to the next level by increasing the temporal resolution to subdaily timescales.
Furthermore, there is a need for a global data product of half-hourly fluxes.Such a data product would allow for characterizing subdaily variations in the diurnal cycles at places where no towers are currently installed.Please note that we use the term diurnal cycle to name the full 24 h period of 48 half-hourly values per day.In the literature, this is sometimes rather called a diel cycle, e.g., by Walter et al. (2005); Halsey et al. (2010); Cyronak et al. (2012), which can be separated into a diurnal pattern (daytime) and a nocturnal pattern (nighttime).However, in the community of land-atmosphere exchange research the term diurnal cycle is more common and we will use it throughout this paper.If we want to indicate only those parts that correspond to light-driven processes like carbon fixation in GPP, we explicitly call this daytime GPP (and use the term nighttime to refer to the rest of the day).Furthermore, we use diurnal courses as a synonym for diurnal cycles.
Characterizing typical subdaily flux patterns is critically needed for certain satellite remote sensing applications.For example, the interpretation of satellite retrievals of suninduced fluorescence as proxy for photosynthesis (Guanter et al., 2014;Sun et al., 2015) or integrated atmospheric column carbon dioxide (XCO 2 ) at certain overpass times (usually around mid-day) requires consideration of strong diurnal variations of biosphere-atmosphere carbon fluxes.Another research area where half-hourly data products would be a crucial piece of information is land-atmosphere feedback modeling studies.The derived products could allow for checking the cross-consistency, since many processes governing land-atmosphere interactions, e.g., related to the formation of heavy rainfall or heat waves, in fact operate at subdaily timescales (Dirmeyer et al., 2012).
In view of the need for global high-frequency flux data, we aim at increasing the temporal resolution of data-driven carbon and energy flux products to subdaily timescales by estimating half-hourly values at global scale.We tackle the problem of predicting diurnal cycles with half-hourly values globally for both carbon and energy fluxes between biosphere and atmosphere by treating the upscaling task as a large-scale regression problem.From the machine learning perspective, the random forest regression framework serves as a basis for our computations due to its good performance and suitable scaling properties with respect to large data sets.We test two approaches for estimating half-hourly GPP with random forest models and evaluate both of them using a leave-one-site-out cross-validation strategy for a large set of FLUXNET sites.We produce derived global products with 0.5 • spatial and half-hourly temporal resolution for GPP and NEE as well as for LE and H covering the years 2001 to 2014.For the sake of clarity, some figures in this paper only show the results obtained for GPP although similar plots can easily be created for the other three fluxes that have been considered.Thus, GPP serves as the running example throughout this paper.The choice of GPP was somewhat arbitrary but it also constitutes the upscaled carbon flux that received most attention in the past.
The following sections are organized as follows.First, we introduce the data base that is used in our study by describing both site-level and global forcing data (Sect.2).Then, we explain the methodological background (Sect.3) and the algorithmic concept (Sect.4) of the proposed upscaling approaches in detail.In Sect.5, extensive evaluations and comparisons of the different upscaling strategies are presented based on leave-one-site-out cross-validation, which validate the proposed approach and the derived global products.Afterwards, we present the empirical results at global scale in Sect.6 and highlight intrinsic features of the new data sets.Finally, we discuss both our findings and possible improvements for future applications (Sect.8).The global data sets presented in this paper are freely available to any interested user (see Sect. 7).

Data sources
In this section, we shortly describe the two data sources we are using in our studies.For learning the relationships between predictor variables and the target fluxes as well as for the cross-validation experiments, we make use of sitelevel data extracted at FLUXNET sites that are equipped with eddy covariance towers (Sect.2.1).To perform global upscaling of diurnal cycles, we require gridded data products of the predictor variables at a global scale.The latter are described in Sect.2.2.

Site-level data
Fluxes at half-hourly resolution are currently only achieved by eddy covariance instruments that provide local measurements and spatial extensions are so far only possible by deployment of those instruments on globally distributed towers.Based on these in situ observations, we aim at predicting half-hourly fluxes globally and therefore also rely on the data obtained by the eddy covariance method at different sites.The eddy covariance method (Baldocchi et al., 1988;Aubinet et al., 2012) has revolutionized the study of landatmosphere interactions by offering a means of continuously observing net land-atmosphere fluxes of CO 2 , latent heat, and sensible heat (Balddocchi, 2014).By now, the flux towers are running for sufficient time to enable studies about the interannual variability in land-surface dynamics, but the temporal representativeness remains highly uneven (Chu et al., 2017).In our studies, we rely on data from 222 FLUXNET eddy covariance towers (see Appendix D for a full list of involved sites).All towers are typically equipped with a suite of comparable micrometeorological devices; i.e., instrumentation and data outputs are similar enough, such that training of machine learning methods on data from multiple different sites is possible.Gross carbon fluxes can be derived using different flux partitioning methods as described, for example, by Reichstein et al. (2005) or Lasslop et al. (2010), and here we rely on the former method.In all our experiments, we only make use of measured fluxes; i.e., no gap-filling has been applied and gaps in the half-hourly flux data have simply been ignored (more information on the distribution of data gaps can be found in Appendix B).
As predictor variables, we use the ones selected by Tramontana et al. (2016, Table 2) in the RS+METEO setup that they use for estimating fluxes at daily resolution.For convenience, we have reproduced this table and put it in Appendix C.Besides the plant functional type (PFT), the variables contain remote sensing data from MODIS satellites and meteorological data either in situ measured at the flux tower locations or from long-term time series of the ERA-Interim data set at daily resolution.It should be noted that only the mean seasonal cycles (and derived properties like amplitude, minimum, mean, and maximum) are taken into account for the vegetation indices (normalized difference vegetation index -NDVI, enhanced vegetation index -EVI) as well as for the normalized difference water index (NDWI), the fraction of absorbed photosynthetically active radiation (fAPAR), and the land surface temperature (LST).In contrast, the actual values of air temperature, global radiation, potential radiation, relative humidity, and of different water availability indices have been used.For detailed descriptions, we refer to the corresponding sections in the paper of Tramontana et al. (2016, Sect. 2.1.3 and 2.1.4

Methodological background: random forest regression
Ensemble methods are powerful machine learning tools that combine the outputs of many individual prediction models to obtain more accurate estimations for a target variable.The random forest approach (Breiman, 2001) denotes a typical example, which consists of a set of randomized decision trees.Decision trees in general can be built for classification or regression purposes and they are therefore also called classification trees or regression trees.Multiple decision trees form a decision forest and learning their decision rules typically involves some randomization, which leads to the name randomized decision forest or short random forest.In the following, the concepts of learning and testing randomized decision trees for regression tasks are briefly summarized, because they denote the essential parts of random forest regression.The reader who is familiar with the technical details of random forest regression can skip this section and may directly continue with the proposed upscaling approaches in Sect. 4.Besides the work of Breiman (1996Breiman ( , 2001)), detailed background information about randomized decision trees and random forests can be found in various machine learning textbooks, e.g., those from Mitchell (1997, chap.3), Bishop (2006, Sect. 14.3 and 14.4), Hastie et al. (2009, chap. 9 and10), andMurphy (2012, Sect. 16.2 and16.4).Furthermore, many applications are found in the area of computer vision and medical image analysis (Criminisi and Shotton, 2013).Specific use cases of random forests are also 3 Data from CRUNCEPv6 have been obtained via personal correspondence with Nicolas Viovy (email: nicolas.viovy@lsce.ipsl.fr).land-cover classification (Gislason et al., 2006), high-density biomass estimations (Mutanga et al., 2012), and classification purposes in ecology like the prediction of plant species presence (Cutler et al., 2007) among others.Of course, previous work on the upscaling of fluxes at coarser temporal resolutions also made use of random forests (Tramontana et al., 2016;Jung et al., 2017) or related model tree ensembles (Jung et al., 2009(Jung et al., , 2010(Jung et al., , 2011)).Hence, random forests and tree-based machine learning techniques in general are applied in a broad range of diverse research fields.

Randomized decision tree
Given a training set X = x (i)  ∈ IR D : i = 1, 2, . .., N of N samples with each sample x being a vector consisting of D predictor variables x 1 , x 2 , . .., x D and a corresponding real-valued target variable y ∈ IR with observations y 1 , y 2 , . .., y N ∈ IR for the N training samples, the goal is to find a set of rules that allow for predicting y based on x.In the case of a decision tree, these rules are binary tests for individual predictor variables with simple thresholds.A hierarchical tree structure is built as shown in Fig. 1 by selecting at each node i a predictor variable d i ∈ {1, 2, . .., D} and a threshold t i ∈ IR.The estimate of a node i is the average value y i of the observations computed from training samples that reach this node.The first node of a decision tree, called root node, contains all training samples, and hence the overall mean value y 1 = 1 N N n=1 y n of observations y n from all N training samples is an extremely coarse approximation that needs to be refined depending on the constellation of the input variables x.
Starting at node 1 in Fig. 1, the set of training samples is partitioned into two subsets, represented by nodes 2 and 3, based on the result of the binary test x d 1 ≤ t 1 .Both nodes, node 2 and node 3, have associated predicted outputs y 2 and y 3 that are computed as the average observation of samples that reach the corresponding node.The split parameters d 1 and t 1 are optimized such that the mean squared error for the training samples is minimized given the respective predictions from node 2 or node 3.Such splits are then computed for nodes 2 and 3 as well as for further derived nodes until a stopping criterion is fulfilled.Typical stopping criteria are (i) a split would create nodes with less than N min samples, (ii) the variance of the observations from samples in a node is smaller than some threshold σ 2 min , or (iii) a maximum depth d max of the tree is reached.The depth of a tree is defined as the largest distance of a node to the root of the tree.Values of the parameters N min , σ 2 min , and d max can be changed to obtain either smaller or larger trees, which allows for controlling the runtime of the algorithm and the trade-off concerning generalization and overfitting.
It is usually the case that multiple stopping criteria are tested and if one of them is fulfilled, the current node is not split but becomes a leaf node that stores a final output prediction.Learning a decision tree therefore consists of comput- ing split parameters until only leaf nodes remain that are not split any further (Fig. 1).Hence, one distinguishes between split nodes as the inner nodes of a tree and leaf nodes, each of which contains the estimated output for any sample that reaches this node.
To reduce overfitting to the training set, the learning process is carried out in a stochastic manner by introducing several types of randomization.Whenever split parameters need to be identified, only a random subset of the D predictor variables is taken into account.Furthermore, only a fixed amount of randomly chosen thresholds is tested.Both randomization techniques also lead to reduced computation costs compared to exhaustive search.To predict the output y * of a test sample x * , it is passed through the tree according to the evaluation of the split functions at the inner nodes starting at the root node.This is done until a leaf node is reached, whose precomputed output y is assigned to x.However, more accurate predictions can be achieved by considering an ensemble of randomized decision trees.

Random forest as an ensemble of randomized decision trees
In his work about random forests, Breiman (2001) makes use of a technique called bagging that he has introduced before (Breiman, 1996).Bagging is an acronym for bootstrap aggregating (Breiman, 1996)  Each tree in the ensemble is learned separately and independent from the other trees.Due to the involved randomization techniques during learning of a single tree described before, different trees contain different binary tests and provide different estimates for a single input sample x.The individual predictions of each tree are then aggregated to obtain a final result, which is typically carried out by simple averaging as shown in Fig. 2.However, the number of trees N tree is a hyperparameter whose value needs to be chosen in advance but good assignments depend on various aspects.Since Breiman (1996) pointed out that bagging leads to predictions which are more stable compared to a single model, especially if the decision function of the single model is highly instable with respect to the training set, a larger number of trees is in favor of higher stability.On the other hand, more trees are causing higher computational costs during both learning and testing.In addition, a saturation effect for the prediction accuracy can typically be observed for an increasing number of trees.Hence, accuracies obtained by cross-validation for different numbers of trees can help to identify this saturation and a proper value for N tree .

Methods for upscaling diurnal cycles
The problem of upscaling diurnal cycles of carbon and energy fluxes can be formulated as a large-scale regression task, i.e., estimating half-hourly fluxes for every grid cell of the globe based on a set of predictor variables.These predictor variables typically encode climate conditions or Earth observations obtained from remote sensing at the corresponding spatial positions.However, the temporal resolutions of variables can be different, not only between the target flux (half-hourly) and a predictor variable (e.g., daily) but also among different predictor variables (e.g., daily and halfhourly).Therefore, two prediction approaches for upscaling diurnal cycles are presented in Sect.4.1 and 4.2, respectively, which account for this mismatch of temporal resolutions.Although both approaches can be equipped with any regression algorithm, we have decided to use the RDF as a nonlinear method, which has been summarized in the previous section.The main reasons for this choice are the fast learning and testing algorithms, because the upscaling tasks involve a huge number of samples such that learning nonlinear kernel methods for regression like Gaussian processes (Rasmussen and Williams, 2006) are impractical due to both memory demand and computation time.Furthermore, Tramontana et al. (2016) have shown in their cross-validation experiments that the accuracies for estimating fluxes vary only slightly among different machine learning methods.

First prediction approach: an individual regression model for every half hour of the day
Recall from the beginning of Sect. 4 that the two main challenges for upscaling diurnal cycles to global scale are the huge amount of data which needs to be handled as well as the mismatch of temporal resolutions between predictor variables and the target fluxes.The first approach for predicting diurnal cycles has the advantage that it allows for using only predictors of daily temporal resolution.This is very important, because daily (average) values are often less noisy with respect to measurement noise and the availability of daily values is much higher compared to half-hourly values, especially when considering global products with values for every grid cell.Furthermore, variables derived from remote sensing are often limited to daily temporal resolution meaning that they are not more frequent than once per 24 h.Therefore, the first prediction approach involves learning an individual regression model for each half hour of the day and as indicated at the beginning of Sect.4, RDF regression models are used for handling large-scale data.A schematic overview for a single day and a diurnal cycle of GPP is shown in Fig. 3.Even if one uses only predictor variables of daily temporal resolution which can be treated as constant for the whole day, different values of the target flux for different half hours of the day can be estimated.The reason is that the 48 different RDF models are learned with different values for the target output variable y, although the same values for the predictor variables x are used.For example, an RDF model that is learned for a half hour during night only covers the rather small range of observations y that can be observed at this time, while the range of observations around noon is typically much larger, especially during the growing season.Hence, the 48 RDF models and their estimated outputs differ only because of different observations y that are provided during learning together with the same set of samples X .Of course, it is also possible to incorporate predictor variables at half-hourly temporal resolution, which would directly fit to the resolution of the target flux.Such predictor variables could further enhance the distinction of individual half hours of a day and could lead to more accurate estimations.How-Figure 3. Visualization of the first prediction approach: an individual RDF regression model has been learned for each half hour of a day by just using training data from the corresponding half hour.Here, the predictions of half-hourly fluxes for a single day are visualized.The predictor variables are passed to the individual RDF regression models indicated by the arrows above the RDF models.Each RDF model computes an output for the corresponding half hour, which is shown by the arrows below the RDF models, such that the diurnal cycle is estimated by a conjunction of 48 different predictions from 48 different regression models.Note that this approach allows for predicting diurnal cycles only based on predictors with daily resolution (by ignoring the vertical colored bars in the upper part).However, predictors with half-hourly resolution can also be incorporated (by adding the corresponding half-hourly values indicated by the vertical colored bars).ever, they are optional and not required for this prediction approach as indicated in Fig. 3.

Second prediction approach: a single regression model suitable for all half hours of the day
In contrast to the first prediction approach, the second approach only uses a single regression model that is able to estimate different values for different half hours of the same day.
It is then necessary that the distinction between these half hours is somehow encoded in the predictor variables, which is not the case if only predictors of daily resolution are incorporated.Therefore, this approach requires at least one predictor variable at half-hourly temporal resolution.Fortunately, the potential radiation (R pot ) can be calculated globally at half-hourly resolution, because it only depends on the time as well as the solar angle that is defined given the spatial position via latitude and longitude.Thus, the second approach with a single model, as visualized in Fig. 4, is therefore also applicable for upscaling diurnal cycles to global scale.Again, we make use of an RDF regression model due to its largescale capabilities.
In addition, besides the potential radiation, its first-order temporal derivative can also be incorporated as an additional half-hourly predictor.This allows for a distinction between morning and afternoon via the sign of the derivative as well as for the distinction between day and night.The latter is achieved because R pot is zero for many consecutive hours during night leading also to a value of zero for its derivative in contrast to nonzero values of the derivative in the daytime.For all our computations, we have always included this derivative in the case when we also used R pot .The nice property of this approach is that information about the physical relationships between the predictor variables and the fluxes can be shared among different half hours during learning of the single regression model, which is not the case for individual models as mentioned in the previous section.This second prediction approach therefore seems to be more plausible from a physical perspective, because the distinction between different half hours of the day is made based on the data, e.g., (potential) radiation, and not enforced by learning independent regression models for each half hour.
Although meteorological variables such as air temperature or vapor pressure deficit (VPD) as well as incoming radiation are also potential candidates for predictors that encode subdaily variations in the fluxes, they are currently only available with a half-hourly resolution at individual sites, e.g., also measured at eddy covariance towers.Due to the missing half-hourly meteorological data products at a global scale, it is not possible to use these information for the global upscaling.However, since we are interested in whether such data products could be beneficial for upscaling diurnal cycles, we use the corresponding site-level data in our cross-validation analysis to get further insights.Hence, meteorological variables measured at the eddy covariance towers of FLUXNET can still be used for validating the upscaling approaches and evaluations of cross-validation experiments are presented in the next section.

Assessing different upscaling strategies with leave-one-site-out cross-validation
The global products presented in this paper cover diurnal cycles of four fluxes: GPP, NEE, LE, and H.For each of these fluxes, we have consistently performed cross-validation experiments but the results presented in the following only consider GPP as a running example.We have decided to apply RDF models for regression due to its efficient training and testing algorithms, even in the case of large-scale data, as well as its good performance for upscaling daily mean values of GPP (Tramontana et al., 2016).Each RDF was trained with 100 randomized decision trees, because we observed a saturation effect for the prediction performance in preliminary experiments when increasing the number of decision trees.Further parameters have been set to its default values in Matlab's TreeBagger function, e.g., a minimum leaf size of five samples, since we hardly observed any changes in the overall performances when varying the parameter settings.Performances are measured using the Nash-Sutcliffe modeling efficiency (Nash and Sutcliffe, 1970) as well as the root-mean-square error (RMSE) based on a leave-one-siteout cross-validation scheme.The Nash-Sutcliffe modeling efficiency, from now on simply called modeling efficiency, has been introduced by Nash and Sutcliffe in the context of river flow forecasting but it is often also used as an evaluation criterion in other applications that involve the prediction of variables, especially in related upscaling tasks (Jung et al., 2011;Tramontana et al., 2016).A perfect match between observations and predictions would be reflected by a modeling efficiency of 1, whereas always using the mean of all observations as an estimate translates to a modeling efficiency of 0 and negative modeling efficiencies indicate that the learned model performs worse than always assigning the mean of the observations.The modeling efficiency is related to the fraction of explained variance and to the coefficient of determination (R 2 ).While values close to 1 are preferred for the modeling efficiency, RMSE is a non-negative measure with an optimal value of 0.
The motivation for the leave-one-site-out evaluation as a special case of cross-validation is twofold.First, we want to evaluate regression models that have been learned from as many observations as possible and based on training sets that are most similar to the training set that will be used to compute the global products, which will incorporate all the available data from all FLUXNET sites.Second, we intend to mimic a realistic scenario most similar to the upscaling task by predicting fluxes at locations where no training data has been taken from.As a consequence, we predict fluxes at one FLUXNET site using a regression model learned with all observations from all the remaining FLUXNET sites.After doing this for each individual site, we concatenate all site-specific predictions to form a long vector of predictions that can be compared to the corresponding observations measured at the corresponding sites.This allows for a general evaluation of the prediction approaches in a site-independent manner.

Overview of experiments
We start with a short overview of the experiments that have been conducted in order to clarify our ideas and motivations behind them.In Sect.5.2, we compare the two different prediction approaches for upscaling diurnal cycles that have been introduced in Sect. 4. Furthermore, we focus on comparing different sets of predictor variables, e.g., the effect of meteorological variables at half-hourly resolution on the prediction performances.Evaluations of the prediction performance for monthly average diurnal cycles derived from the half-hourly values are shown in Sect.5.3.These average diurnal cycles per month nicely summarize the fluxes over a longer time period (one month) by still keeping a halfhourly pattern that allows for monitoring subdaily variations.In addition, averaging diurnal cycles for a specific month removes noise in the individual half-hourly measurements and reduces the effects of day-to-day variability, e.g., caused by cloud coverage, which allows for comparing the main characteristics of the observations and the predictions for the selected month.The evaluations of monthly average diurnal cycles play an important role with regard to our provided data products, since we also prepare derived products that only contain these monthly average patterns.With two additional experiments presented in Sect.5.4, we want to demonstrate that the quality of our achieved predictions is not inherently limited by the presented upscaling approaches but rather by missing site-specific information and latent driving forces that are not encoded in the set of predictor variables that has been used.This is not a specific problem of upscaling diurnal cycles of fluxes at half-hourly resolution but a general challenge for all upscaling approaches that deal with carbon and energy fluxes, also at coarser timescales.

Improved predictions by using half-hourly meteorological data
In the following, we compare the results of our presented prediction approaches for half-hourly GPP depending on different sets of predictor variables, which have been obtained by using the leave-one-site-out strategy explained in the beginning of Sect. 5.As the core for all sets of predictors, we include those variables that have been used for upscaling daily mean GPP values by Tramontana et al. (2016).In fact, we use exactly the same set of predictors that corresponds to the RS+METEO setup which has been defined by Tramontana et al. (2016, Table 2) and this table can also be found in Appendix C. Given the explanations of the first prediction approach in Sect.4.1 with individual regression models for each half hour, we can directly use these predictor variables for estimating half-hourly GPP values.However, we also added the potential incoming radiation R pot at half-hourly resolution to encode subdaily variations in the predictors as well as its first temporal derivative to distinguish between morning and afternoon.Furthermore, we have tested a third set of predictors by additionally incorporating meteorological variables with half-hourly resolution measured at FLUXNET tower sites.The added variables are air temperature, vapor pressure deficit, and incoming global radiation.In a nutshell, the three sets of predictor variables consist of (i) daily predictors, (ii) daily predictors + half-hourly R pot , and (iii) daily predictors + halfhourly R pot + half-hourly meteorological predictors, besides static predictors like PFT that are used in all three cases.The second and third set are also used in the experiments for the second prediction approach that includes only a single regression model, because half-hourly information is encoded in some of the predictor variables.In Fig. 5, we have visualized the results for all sites as well as for selected FLUXNET towers.Only for comparison to the leave-one-site-out experiments, we also included the modeling efficiency of a gap-filling algorithm (Reichstein et al., 2005) as a potential upper bound for our predictions.In fact, each measured value is also estimated by a gap-filling algorithm that makes use of flux measurements at the same site under similar climate conditions and hence provides only a theoretical baseline, because it can not be applied for predicting fluxes at locations without any observations.First, we focus on the leftmost group of bars in Fig. 5, which shows the modeling efficiencies for all sites.Looking at the results for the first prediction approach with individual models for each half hour, including half-hourly R pot only slightly improves the average performance (0.67 compared to 0.66), which is probably also caused by the stochastic nature of the RDF learning algorithm.However, including the meteorological predictors at subdaily temporal resolution leads to an increase in the performance to 0.70 modeling efficiency.A similar improvement can be observed for the second prediction approach with a single model for all half hours of the day, because the modeling efficiency increases from 0.67 to 0.71 when including half-hourly meteorological data.This highlights that varying subdaily meteorological conditions has a clear impact on predicting the diurnal cycles of GPP fluxes.
On the one side, half-hourly R pot has almost no influence on the accuracy of the predictions but on the other side, it allows for applying the second prediction approach with only a single regression model.Hence, it may seem more natural from a physical perspective to distinguish individual half hours of the day by the provided half-hourly R pot rather than enforcing the distinction by separately learned regression models.Comparing both prediction strategies, they achieve similar performances when using the same set of predictor variables.Since it is more convenient from a technical perspective to only handle a single regression model instead of 48 different models, the evaluations in the following sections will focus on the second prediction approach with a single RDF model that is suitable for predicting values at every half hour of the day.It is interesting to note that relative performance differences between the two prediction approaches and among the different sets of predictor variables look very similar when considering single sites only.In all our crossvalidation experiments, the best prediction accuracies are always achieved by including half-hourly meteorological variables in the set of predictors.However, absolute performance values vary among sites.As shown in Fig. 5, the accuracies at the sites CA-Man and DE-Hai are between 0.80 and 0.90 modeling efficiency, whereas lower performances (between 0.60 and 0.80) have been achieved for predicting GPP at FR-Pue, IT-Cpz, and US-Goo.Moreover, the fluxes at US-Var seem to be very difficult to estimate, since only modeling efficiencies between 0.20 and 0.30 were obtained.
To further highlight the difference in the predictions when half-hourly meteorology is encoded in the driver variables, we visualize all half-hourly estimations over one year at a specific site using fingerprint plots.A fingerprint in this context is a plot with 365 rows corresponding to 365 days of a year and 48 columns corresponding to 48 half hours of each day such that one fingerprint contains all half-hourly values of a whole year and shows characteristic patterns for the selected site, e.g., length of the growing season.In Fig. 6, the estimations of half-hourly GPP with and without half-hourly meteorological predictors are shown in two individual fingerprint plots and their difference is indicated in a third plot.As expected, the predictions based on half-hourly meteorology contain much more short-term fluctuations during single days, whereas smoother estimations are obtained when only half-hourly R pot is used as a subdaily driver.This can also be observed from the difference of the two fingerprints.Hence, half-hourly meteorological predictor variables are required to better capture high-frequency changes in the fluxes on subdaily timescales.In the following, we take a closer look at average diurnal cycles per month.

Analyzing average diurnal cycles per month
For visual inspection purposes, it is useful to look at average diurnal cycles for individual months at specific sites.Example plots are shown in Fig. 7.They show that our predictions are able to produce the typical shapes of diurnal courses which are in line with corresponding observations.For the depicted predictions, only observations from other sites have Earth Syst.Sci.Data, 10, 1327Data, 10, -1365Data, 10, , 2018 www.earth-syst-sci-data.net/10/1327/2018/ ] 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00   been used to learn the regression models.It is important to note that averaging diurnal cycles within a month reduces noise in the observations as well as in the predictions of a single day, but also smoothens high-frequency short-term fluctuations, e.g., due to (partial) cloud coverage, and yet decreases the influence of day-to-day variations.Hence, these mean diurnal courses are more stable and an evaluation of averaged predictions with respect to averaged observations for all sites leads to larger modeling efficiencies compared to those reported in the previous section.An overview of modeling efficiencies when comparing all half-hourly values versus only looking at the average diurnal cycles is given in Table 1.
In fact, modeling efficiencies for monthly average diurnal cycles increase on average across all sites to a range between 0.78 and 0.80 depending on the set of predictor vari-ables with the best results being accomplished again by incorporating half-hourly meteorological data.This can also be observed from Fig. 8, which is organized in the same way as Fig. 5 but contains the achieved modeling efficiencies for comparing monthly average diurnal cycles of observations and predictions.For the monthly mean diurnal courses, the difference between only using half-hourly R pot or also including half-hourly meteorology is not so large anymore compared to the evaluations for all half-hourly values.This holds for both the overall accuracies for all sites as well as for single selected sites.As previously mentioned, the reason is that averaging fluxes within a month reduces the effect of short-term fluctuations on subdaily timescales.Therefore, if one is only interested in monthly average diurnal cycles, the results obtained by using daily predictors and halfhourly R pot are only slightly worse compared to including ] 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 half-hourly meteorology and for some sites, the prediction performances are even on the same level of accuracy.This is important to know, since we also provide a derived product from our global half-hourly fluxes that contains the monthly average diurnal cycles globally at the same spatial resolution (Sect.6).However, the average diurnal cycles can also be used to identify potential problems of the predictions.In Fig. 9, mean diurnal courses of several months at the sites FR-Pue and IT-Cpz are shown.It can be observed for both sites that the averaged observations are lower in the summer months compared to the corresponding predictions.In other words, the regression models overestimate GPP during these months.We believe that this is caused by the fact that our current prediction models are not able to cope with seasonal droughts, which is not a specific problem of diurnal upscaling but a challenge that every upscaling approach for carbon and energy fluxes needs to tackle.Although the observations show decreased productivity due to drought stress in summer, the regression models still estimate large amplitudes of the diurnal cycles, i.e., a larger productivity.One reason for this behavior could be the insufficient characterization of water availability that is present in the set of predictor variables.Currently, we plan to investigate this issue in further research.In the following, we show that our current sets of predictor variables are lacking some site-specific information, probably not only with respect to water storage capacities.

Are we missing (site-specific) information in the predictors?
In order to gain any insights into whether site-specific information is currently not well represented in the predictors, we have conducted two auxiliary experiments.During the first experiment, we additionally estimate GPP fluxes at each site in a leave-one-month-out setup and compare the resulting predictions with those of the leave-one-site-out setup.For the leave-one-month-out estimations, we learn and test regression models for each month at each site separately.Furthermore, each regression model for each month is only learned with data from the same site but measured in different months (and years).Hence, the regression models are highly site- 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 specific, since only correspondences between predictor variables and GPP fluxes at a single site are used and predictions are made at the same site but in a different time period.As a result, we have observed improved flux estimations, which is shown exemplarily in Fig. 10 for IT-Cpz.It can be clearly seen that the gaps between averaged observations and averaged predictions are getting smaller and mostly almost disappear; i.e., the predictions match the observations much better in the leave-one-month-out setup.In terms of modeling efficiency, the performances increase to a range between 0.75 and 0.79 when comparing all individual half-hourly predictions from the leave-one-month-out setup at all sites with the observations (best performance with leave-one-site-out is 0.70).Regarding the comparison of averaged predictions and averaged observations within each month as presented in the previous section, the leave-one-month-out setup leads to modeling efficiencies between 0.87 and 0.89.This is clearly larger than the results of the leave-one-site-out-experiment (best performance: 0.80).Table 2 allows for a direct com-parison of the results from the leave-one-site-out and the leave-one-month-out experiments using both prediction approaches with the best set of predictor variables, i.e., daily predictor variables, half-hourly R pot , and half-hourly meteorological variables.This table also contains the prediction performances obtained from a second experiment, in which we have used the daily GPP as an additional daily predictor for our regression models in the leave-one-site-out setup.Of course, this is only possible in the cross-validation analysis where we actually have the daily averages of GPP, but the following evaluation reveals interesting insights.Using the daily average GPP basically incorporates information about the amplitudes of the diurnal cycles, hence drought effects of reduced productivity can directly be observed in this additional predictor variable.First of all, it can be seen in Fig. 11  00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 00:00 06:00 12:00 18:00 24:00 Figure 11.Improvements in the initial estimations (a) at FR-Pue can be observed when using daily GPP as an additional daily predictor if it would be available a priori (b).Hence, the problems with seasonal drought effects would be greatly reduced in the leave-one-site-out setup for every half hour of the diurnal cycle in the case when an accurate estimate of the daily average value is given.
by the regression models.Since the daily GPP as an additional predictor constrains the size of the peak in a diurnal cycle, the predictions become much more powerful and the characteristic shapes of the diurnal cycles can be produced.The modeling efficiencies are even larger than those obtained with the leave-one-month-out setup.They are in the range of 0.83 to 0.87 for all half-hourly values at all sites, which is comparable with the performance of the gap-filling algorithm that has been included as an additional reference in Fig. 5 as well as in Table 2.The gap-filling also achieves a 0.87 modeling efficiency; i.e., the upper performance limit shown as a green bar in the leftmost group in Fig. 5 can be obtained by including the daily GPP as an additional predictor.Regarding monthly averaged diurnal cycles, a modeling efficiency of up to 0.94 is obtained by the regression models that use daily GPP as an additional predictor, while gapfilling reaches 0.93.This is also summarized in Table 2.
From this experiment, we can conclude that the problems for predicting diurnal cycles of GPP are mainly caused by the lack of estimating the daily mean GPP properly.If the daily mean is given, predictions of half-hourly values are much more accurate.Hence, the main problems for the upscaling of half-hourly fluxes are not related to producing the right shapes of the diurnal courses, but turn out to be problems of estimating the correct amplitudes.These are then the same problems as for upscaling daily average values (or fluxes at coarser timescales) and are not introduced by the step of going to a larger temporal resolution in terms of half hours.

Key insights from the cross-validation experiments
In this section, we want to shortly summarize the main findings from our cross-validation experiments.First, we have seen that it does not really matter which of the two proposed prediction approaches we are using, since prediction performances hardly differed between the single model approach and the individual model approach.We prefer to use the single model approach, because it seems to be more plausible from a physical perspective to make distinctions between half hours of a day by the information encoded in the pre-Table 2. Comparing modeling efficiencies (and RMSE in µmol m −2 s −1 ) of the two auxiliary experiments (leave-one-month-out setup and including the daily GPP as an additional predictor in the leave-one-site-out setup) to the best performances obtained with the leave-one-siteout experiments by using daily predictor variables, half-hourly potential radiation, and half-hourly meteorological variables.

Comparing all
Comparing dictor variables and half-hourly R pot can always be used for this purpose.Second, including half-hourly meteorological information in the predictors clearly helps to improve the prediction performances for fluxes on the half-hourly timescale.However, for monthly average diurnal cycles the differences are not so prominent anymore and estimations based on halfhourly R pot as the only predictor at half-hourly resolution may be sufficient for analyzing the monthly patterns.Third, we have shown that the main problem for upscaling halfhourly fluxes is not the fact that we increase the temporal resolution, since we are able to reproduce the characteristic subdaily patterns.Moreover, we are lacking additional information in the predictors that encode site-specific characteristics as well as certain special conditions like seasonal droughts.This currently prevents us from obtaining the correct day-to-day variability and also, in the end, the correct interannual variability.However, these are also problems that need to be tackled when an upscaling of carbon and energy fluxes at coarser timescales is considered (Tramontana et al., 2016).In the following section, we summarize the results from our cross-validation experiments for all the four fluxes (GPP, NEE, LE, H) with the setup that has been used to compute the global half-hourly data products.

The selected approach for computing the global products
While the previous sections validate the presented prediction approaches and point to potential problems in the estimation of half-hourly fluxes, we also decided to produce the first global products of half-hourly GPP and NEE, as well as LE and H that will be described in the next section.So far, the analyses have shown that best predictions are obtained by incorporating meteorological variables at half-hourly resolution, but such data products are not available at a global scale.Therefore, we have computed the global products only based on the daily predictors of the RS+METEO setup from Tramontana et al. (2016, Table 2), which can also be found in Appendix C, as well as using half-hourly values of R pot and its first temporal derivative.The used data sources have been described in Sect. 2 and the set of daily predictors varies between carbon and energy fluxes as indicated within the aforementioned table.Furthermore, we have decided to use the second prediction approach (Sect.4.2) by learning one single regression model that is suitable for all half hours of the day.For us, it seems more natural from a physical perspective to distinguish between different half hours of a day by (potential) radiation as an additional variable rather than enforcing the distinction with individual models for each half hour as it is done in the first prediction approach (Sect.4.1).
In Table 3, we report the corresponding prediction performances from the leave-one-site-out cross-validation experiments for this setup, i.e., for the selected set of predictor variables and the single regression model approach.The modeling efficiencies for both individual half-hourly values and monthly average diurnal cycles are stated.Comparing these values, we observe that the accuracies for predicting energy fluxes are higher compared to those for the carbon fluxes.Half-hourly values of the sensible heat flux can be best estimated by achieving a modeling efficiency of 0.77 across all sites.On the other hand, net ecosystem exchange has only been predicted with a modeling efficiency of 0.61.This performance is lower compared to the one for gross primary production (0.67), probably due to missing information in the predictor variables for the respiration component of NEE.For all four fluxes, the modeling efficiencies are higher when comparing monthly average diurnal cycles of observations and predictions.The main reasons, as also mentioned in Sect.5.3, are the reduction of noise and the smoothing of short-term fluctuations at subdaily timescales due to the averaging.In the following section, we present the global half-Earth Syst.Sci.Data, 10, 1327Data, 10, -1365Data, 10, , 2018 www.earth-syst-sci-data.net/10/1327/2018/ hourly flux products that have been calculated with the upscaling approach and the setup of this section.

Global flux products with half-hourly resolution
For each of the four fluxes ( In addition to the provided half-hourly data, we also offer derived products containing the monthly average diurnal cycles of the four fluxes for the 14 years that are covered by the half-hourly product.For the potential user of the data, it will be much more convenient to directly obtain the monthly average diurnal cycles compared to downloading the much larger half-hourly data product and computing the monthly averages afterwards.Furthermore, the monthly average diurnal cycles are more robust, which has also been shown by larger modeling efficiencies in the experimental evaluations, e.g., as listed in Table 1.Since only daily predictor variables and half-hourly R pot are used to estimate the halfhourly fluxes, short-term fluctuations on subdaily timescales due to cloud cover and other effects can not be captured by the current version of the product.Therefore, also day-today variations may not be represented accordingly.However, the averaging to create monthly mean diurnal cycles reduces the impact of these factors and additionally smoothens errors due to observation noise.As a consequence, we recommend to primarily use the monthly average diurnal cycles because of larger robustness and stability.In the following, we show some characteristics of the computed global flux products at half-hourly resolution, which can only be calculated due to the subdaily timescale.

Global maps and fingerprints
Cutouts of the global products are visualized in Fig. 12, where we have selected 14 June 2014 at 13:00 UTC in the time domain.Global maps of GPP and NEE are shown in the top row of Fig. 12 and one can nicely distinguish daytime from nighttime for individual regions around the world.Furthermore, selected locations are highlighted and all halfhourly values of the year 2014 for these grid cells are summarized in fingerprint plots, which allow for identifying different characteristics at the corresponding places due to the different patterns in these plots.The fingerprints provide a nice overview of the half-hourly fluxes over a whole year and different lengths of the growing season as well as varying lengths of the day (time between sunrise and sunset) can directly be observed.Corresponding maps for LE and H with fingerprint plots for the same locations are shown in the bottom row of Fig. 12. Larger values of LE compared to H at this specific point in time can be observed in western, central, and eastern Europe as well as in the tropical regions of Africa, whereas it is vice versa on the Iberian Peninsula as well as in the northern and southern regions of Africa.

Maximum diurnal amplitudes within a month
Besides the fingerprint plots summarizing a whole year of half-hourly values for a specific location, it is also possible to compute diurnal amplitudes for each grid cell from the global products.We again picked GPP acting as an example for all the fluxes and determined maximum diurnal amplitudes within each month.In Fig. 13, the maximum diurnal amplitudes of GPP are visualized for June and December 2014 with a logarithmic color scale.These months have been chosen to indicate differences between summer and winter.The biosphere at the Northern Hemisphere is quite active in June showing large amplitudes, whereas maximum amplitudes are close to zero at most of the grid cells of the Northern Hemisphere in December.In the tropics, amplitudes of GPP do not vary much between June and December with values around 30 µmol m −2 s −1 .As expected, the behavior in the Southern Hemisphere is opposite to the Northern Hemisphere; i.e., the productivity in the Southern Hemisphere is higher in December compared to June.

Maximum half-hourly fluxes
Furthermore, we have been interested in the maximum flux at each spatial position.These statistics have been calculated among all the years 2001 to 2014 to produce a single map of maximum half-hourly values for each flux.The results are shown in Fig. 14.Those maximum values denote the capabilities of each flux at each grid cell.For GPP, the hot spots with maximum capacities are in the corn belt of the USA, in eastern China, and in the tropical regions.Largest values of NEE are obtained in the tropics as well, especially in the Amazon.Regarding the energy fluxes, it is not so easy to identify single hot spot regions since large values of LE or H are widespread.However, distinct spatial patterns can be observed in all maps of the maximum fluxes.

Comparison with ensemble of atmospheric inversions
Finally, we want to compare our global product of NEE with an ensemble of atmospheric CO 2 inversions that contains CarbonTracker (Peters et al., 2007), CarbonTracker Europe (Peters et al., 2010), the Jena CarboScope inversion scheme (s99_V3.6)from Rödenbeck (2005), and MACC-II of Chevallier et al. (2010) that has also been used by Peylin et al. (2013)   In the spatial domain, we have then performed an averaging step according to the 11 TransCom land regions (Gurney et al., 2002): North American Boreal, North American Temperate, South American Tropical, South American Temperate, Northern Africa, Southern Africa, Eurasian Boreal, Eurasian Temperate, Tropical Asia, Australia, and Europe.The upper panel in Fig. 15 contains the MSC of our NEE product for the 11 TransCom land regions as well as the average MSC from the ensemble of atmospheric inversions.For the latter, the shaded regions are spanned by minimum and maximum values per month.Since it is known that upscaling methods tend to overestimate the carbon uptake of the biosphere (too large carbon sink) compared to atmospheric inversions, we have also subtracted the mean value from all curves in the upper panel and show the differences from the corresponding mean values in the lower half in order to see whether the patterns in the MSC are matching between the upscaled product and the inversions.One can nicely see from the plots in the lower half that this is the case for most of the regions, except for the tropics where seasonality is also small.North American Boreal North American Temperate

Europe
Ensemble of atmospheric inversions Our data product In the upper panel, we also observe the largest discrepancies between the upscaled NEE product and the results from the atmospheric inversions in the tropical regions.However, this is a known problem for the upscaling approaches that rely on flux tower measurements (Jung et al., 2011;Zscheischler et al., 2017) and not a specific problem of our product.The reason is not yet clarified in the community but one issue is related to the fact that there are only very few flux tower sites in tropical regions, which can also be observed from Fig. A1.For some regions, such as Southern Africa and South American Temperate, mismatches between the inversions and the flux tower upscaling might also be due to contributions of fire emissions which are "seen" by the atmosphere but not in our approach.Overall, the patterns of the MSC in most of the regions are very similar to the results of the atmospheric inversions.This is remarkable given that the two approaches and data sources are entirely independent.Thus, our upscaling product has the potential to provide further constraints for the atmospheric inversion methods with the benefit of high resolution in both space and time.
We use the atmospheric inversions as an independent benchmark here, even though a number of uncertainties also apply to those.To put the agreement of the upscaling with the inversions into context of agreement among inversions, we display the values of pairwise correlation coefficients for the MSC of the different inversion methods together with the correlation coefficients between the MSC of each inversion approach and our upscaling product for each TransCom land region in Fig. 16.Overall, the agreement of our upscaling approach with the inversions is comparable with the agreement among different inversions suggesting promise for a synergistic joint usage of both approaches.There is very large consistency among inversions and with the upscaling approach for the regions North American Boreal, North American Temperate, Eurasian Boreal, and Europe.Interestingly, the upscaling approach is more consistent with the inversions than among the inversions for the Eurasian Temperate region.We observe that the seasonality of NEE in the tropical regions is not well constrained by each approach.

Data availability and usage notes
The calculated global half-hourly flux products are publicly available for free at https://doi.org/10.17871/BACI.224(Bodesheim et al., 2017) under the creative commons license CC BY 4.04 .More precisely, gridded products at 0.5 • spatial resolution and half-hourly temporal resolution are provided covering GPP, NEE, LE, and H for the years 2001 to 2014.In addition, a derived product of monthly average diurnal cycles globally for these four fluxes and the given range of years at the same spatial resolution has been prepared for download as well.It is much more convenient for the user to just download the lightweight data of the monthly averages instead of getting the half-hourly data of much larger file size and then performing the averaging on the local machine.As mentioned in the paper, the monthly average diurnal cycles are primarily recommended for usage, since this derived product turned out to be more robust.
Please note that all data files of GPP can contain slightly negative values, which seems to be implausible at first glance.However, these negative values mainly occur during nighttime and are the result of an artifact in the flux partitioning method at site level carried out for the FLUXNET eddy covariance tower network, where observed NEE is separated into GPP and ecosystem respiration.The negative values from the sites are part of the training set for the proposed upscaling approach, and therefore the machine learning model can produce negative GPP values for similar environmental conditions as well.Since our data products are obtained by an entirely data-driven machine learning approach, the observational error at site level (that also causes negative nighttime GPP at the sites) propagates to global scale.Hence, dealing with negative GPP observations is not only a problem in our global data products but also occurs when working with site-level data.Neglecting negative values at the sites during model learning or manually setting them to zero would lead to biased regression models and setting negative estimations to zero would cause biased predictions.We therefore decided to keep negative values both in the training set and in our provided global data products.If these negative values are causing trouble within any application that builds on our data products, they can easily be set to zero by the user as an appropriate post-processing step.However, the user should keep in mind that this leads to an overall bias within the data product.
The data products are stored as individual files for each variable and each year that has been considered.We have chosen the platform-independent NetCDF5 file format and software packages exist in many scientific programming languages (including Matlab, python, and R) for easy data access.

Conclusions and future work
In this paper, we have shown how to perform an upscaling of half-hourly carbon and energy fluxes from local in situ measurements to global scale.We have introduced two general prediction approaches to estimate half-hourly values mainly from predictor variables at coarser temporal resolution.Since the problem has been formulated as a large-scale regression task, we have been working with random forest regression, although other regression algorithms could be applied as well.Our prediction approaches have been validated by a set of cross-validation experiments employing a leaveone-site-out strategy for the FLUXNET towers that provide the observations.As a result of our analyses, we have presented global flux products at half-hourly temporal resolution for the years 2001 to 2014 covering four important variables: gross primary production, net ecosystem exchange, latent heat, and sensible heat.Detailed descriptions of the experimental setup for the cross-validation as well as for the computations that have led to the global products were given as well.Concerning the global data products, we have also shown derived statistics like maximum diurnal amplitudes of a month as well as maximum half-hourly fluxes at each spatial position.These properties can only be computed from data products with subdaily temporal resolution showing the benefits of our contributions.
In future work, we aim at improving the prediction performance of half-hourly fluxes in various ways.First, we plan to add additional sources of information to the drivers by extending the set of predictor variables to cover further relevant aspects for the individual fluxes like water availability or soil properties.This would allow for tackling difficult scenarios like seasonal droughts, where the current approaches have shown larger errors in the prediction.Second, we also want to incorporate the history of the predictor variables in order to account for lagged effects.So far, samples are treated independently in the prediction but their temporal context due to the time series characteristics may provide additional knowledge that can be exploited for the estimation of fluxes.Third, subdaily meteorology could be included in the calculations of the global products by incorporating the new generation of meteorological reanalysis data of ERA5 at an hourly timescale that will be released in the near future or by exploiting observations from geostationary satellites.

Appendix A: FLUXNET tower locations
Since our machine learning models for the upscaling tasks depend on in situ measurements from FLUXNET towers that are required to create the training set, it is necessary to look at the spatial distribution of these towers to judge on the adequacy of the global data products.In Fig. A1, we display the FLUXNET tower locations of the 222 sites that have been used in this paper by superimposing them on the maximum half-hourly GPP map of Fig. 14.One can clearly observe a bias in the distribution of sites, since most of the towers are located in Europe and North America.Hence, regions with similar climatic conditions are well represented by our global data products compared to ecosystems that are also less captured by the site network, e.g., the tropical regions.Please note that this bias is not a specific problem for the upscaling approaches and global data products of this paper, because it affects all upscaling studies and corresponding data products (Jung et al., 2011;Tramontana et al., 2016;Zscheischler et al., 2017).Thus, it does not only influence the diurnal cycle of fluxes, which is the key contribution of this paper, but rather remains a more general issue.We also refer to previous studies about the representativeness of FLUXNET, see for example the work of Reichstein et al. (2014), who make use of climate-space mappings to illustrate the impact of an imbalanced distribution of sites on carbon uptake.In Sect.2.1, we mentioned that we have ignored gaps in the half-hourly flux data.From a machine learning perspective, we can anyway only do the training on the data we have, independent of their distribution in time.Furthermore, we have decided to not include gap-filled data in order to prevent the machine learning model to adapt too much to the gap-filling method.However, we have found that within all the site days that we have taken into account, there are roughly 35 % gaps and Fig. B1 shows their distribution among the half hours.
One can clearly observe a nighttime dominance of the gaps.For GPP, this is not a big problem, because it is assumed to be zero anyhow.Considering NEE, the absolute fluxes are also smaller during night compared to daytime observations.The nighttime dominance of gaps arises from less turbulence during these hours and this is an inherent problem of the measurement devices that we cannot resolve.However, it should be noted that such a biased distribution of gaps does not directly lead to a model bias, as it would be the case, for example, for linear methods.Since we have picked random forests as a nonlinear machine learning technique, our derived models are less biased for imbalanced data because the final estimations in the leaf nodes of the decision trees are made locally in predictor space by considering mean values from samples that fall into the respective leaf node.Hence, they are independent from samples that are far away in predictor space but could potentially have higher or lower density.
In addition, we have also carried out preliminary experiments where we have only used site days with no gaps, i.e., where all 48 half-hourly values have been available.This has then reduced the overall number of training samples massively and has clearly reduced prediction performance, most likely due to worse generalization abilities because the reduced training data did not capture all environmental conditions sufficiently well.

Figure 4 .
Figure 4. Visualization of the second prediction approach: a single RDF regression model is able to predict the flux at every half hour of the day if at least one predictor variable has a half-hourly temporal resolution (such as the potential radiation R pot ).This is different from the first prediction approach shown in Fig.3, because the whole training data of all half hours are used to learn a single RDF regression model.The upper part of the figure shows the composition of the input data (predictor variables) that are required for predicting the half-hourly values of a single day.For each half hour, the corresponding values of the predictors with half-hourly resolution are added to the predictors with coarser temporal resolution (indicated by the vertical color bars).We then have 48 distinct samples due to the distinction from the half-hourly predictor variables and these samples are delivered to the single RDF regression model indicated by the arrows in the upper part.For each sample, the RDF model calculates an output value (lower part) that denotes the estimate for the corresponding half hour within the diurnal cycle.

Figure 6 .
Figure 6.Fingerprint plots of half-hourly GPP fluxes estimated for US-SO2 in 2004 with leave-one-site-out cross-validation which show that short-term fluctuations on subdaily timescales are captured better when half-hourly meteorological predictors have also been included (a) compared to only using half-hourly R pot (b).The difference of the first two plots (c) also emphasizes this observation.

Figure 7 .
Figure 7.Some example sites with average diurnal cycles for different months comparing two prediction approaches with the observations.

Figure 8 .
Figure 8. Prediction performances for monthly average diurnal cycles of GPP are shown in the same way as the accuracies for all half-hourly values in Fig. 5 (a modeling efficiency; b RMSE in µmol m −2 s −1 ).

Figure 9 .
Figure 9. Average diurnal cycles of two sites showing the problems with seasonal droughts.The error in the prediction of the half-hourly fluxes increases during hot and dry summers for both sites, FR-Pue and IT-Cpz.

Figure 10 .
Figure 10.Comparison between leave-one-site-out (a) and leave-one-month-out (b) at IT-Cpz.It can be observed that site-specific training in the leave-one-month-out setup reduces the prediction errors during seasonal droughts.Thus, the drought effects only lead to problems when training across sites and predicting fluxes in the leave-one-site-out setup or for the upscaling when fluxes are estimated at locations where no towers exist.
that using the daily GPP as an additional predictor clearly improves the predictions at FR-Pue during summer months.Especially the decreased productivity in July and August 2005 can be nicely predicted Earth Syst.Sci.Data, 10, 1327-1365, 2018 www.earth-syst-sci-data.net/10/1327model, daily predictors + half-hourly Rpot Predictions with single model, daily predictors + half-hourly Rpot + half-hourly meteorological predictors GPP

Figure 12 .
Figure 12.The global maps show estimated values for half-hourly GPP (a), NEE (b), LE (c), and H (d) on 14 June 2014 at 13:00 UTC.In addition, fingerprints for selected grid cells are used to visualize half-hourly values for each day of the year.The dot in each fingerprint marks the value that is shown in the global map.Note that the fingerprints display different extensions of the growing season in different regions and the global maps allow for distinguishing between daytime (e.g., in Europe and Africa) and nighttime (e.g., in East Asia, Australia, and in the western parts of North America).

Figure 13 .
Figure 13.Maximum diurnal amplitudes of GPP within a month are shown for June 2014 (a) and December 2014 (b).Differences between summer and winter for both the Northern Hemisphere and the Southern Hemisphere as well as (almost) constant productivity in tropical regions can be observed from both maps.Note the logarithmic color scale.

Figure 14 .
Figure 14.Maximum half-hourly values of GPP (a), NEE (b), LE (c), and H (d) during the years 2001 to 2014 are shown for each grid cell.

Figure 15 .
Figure 15.Comparison of our upscaled global NEE fluxes (red lines) with an ensemble of atmospheric CO 2 inversions (blue lines) considering the mean seasonal cycle (MSC) for 11 TransCom regions estimated from the years 2001 to 2010.For the ensemble of atmospheric inversions, the average MSC among the ensemble members is shown together with shaded regions spanned by minimum and maximum values within the ensemble.

Figure 16 .
Figure 16.Distributions of correlation coefficients for the mean seasonal cycles in the 11 TransCom regions comparing either individual ensemble members from the set of atmospheric inversions or each ensemble member with our upscaled NEE product.
Figure A1.The superimposed FLUXNET tower locations of the sites used in this paper clearly show the biased distribution of underlying in situ measurements due to better spatial coverage of regions in Europe and North America compared to the rest of the world.

Figure B1 .
Figure B1.The distribution of gaps in the flux data of 129 456 site days that we have used from 222 FLUXNET towers clearly shows a nighttime dominance of the gaps.In total, there are roughly 35 % of gaps in this half-hourly flux data.
In order to compute the global flux products at half-hourly resolution via upscaling, we require the predictor variables mentioned in the previous section at global scale, i.e., the variables of the RS+METEO setup from TableC1in Appendix C, which has been reproduced from Table2 of Tra- ). www.earth-syst-sci-data.net/10/1327/2018/ Earth Syst.Sci.Data, 10, 1327-1365, 2018 1330 P. Bodesheim et al.: Upscaled diurnal cycles 2.2 Global forcing data General structure of a decision tree for regression: binary splits with thresholds for individual predictor variables will be used to navigate a sample x to a leaf node that stores a continuous estimate for the target variable.
Predicting the output y * of a sample x * with a randomized decision forest is carried out by averaging the individual predictions obtained from the T decision trees in the ensemble.
P.Bodesheim et al.:Upscaled diurnal cycles Individual models, daily predictors + half-hourly Rpot Individual models, daily predictors + half-hourly Rpot + half-hourly meteorological predictors Single model, daily predictors + half-hourly Rpot Single model, daily predictors + half-hourly Rpot + half-hourly meteorological predictors Gap-filling performance for comparison Individual models, daily predictors + half-hourly Rpot Individual models, daily predictors + half-hourly Rpot + half-hourly meteorological predictors Single model, daily predictors + half-hourly Rpot Single model, daily predictors + half-hourly Rpot + half-hourly meteorological predictors Gap-filling performance for comparison Prediction performances (a modeling efficiency; b RMSE in µmol m −2 s −1 ) for individual half-hourly values of GPP depending on different sets of predictor variables are shown.We compare our two proposed prediction approaches (individual RDF models and single RDF model) and also include the results of a gap-filling algorithm for comparison.Looking at the results of all sites as well as at sitespecific performances, we observe that meteorological predictor variables at half-hourly resolution clearly improve the accuracies of the estimations.Site acronyms: Canada Manitoba Black Spruce (CA-Man), Germany Hainich (DE-Hai), France Puechabon (FR-Pue), Italy Castelporziano (IT-Cpz), USA Mississippi Goodwin Creek (US-Goo), USA California Vaira Ranch (US-Var).

Table 1 .
Modeling efficiencies (and RMSE in µmol m −2 s −1 ) for the predictions of all sites obtained from the leave-one-site-out experiments are summarized and here we differentiate between comparing all individual half-hourly values with the observations and only looking at monthly average diurnal cycles.Individual models, daily predictors + half-hourly Rpot Individual models, daily predictors + half-hourly Rpot + half-hourly meteorological predictors Single model, daily predictors + half-hourly Rpot Single model, daily predictors + half-hourly Rpot + half-hourly meteorological predictors Gap-filling performance for comparison www.earth-syst-sci-data.net/10/1327/2018/ Earth Syst.Sci.Data, 10, 1327-1365, 2018 1338 P. Bodesheim et al.: Upscaled diurnal cycles

Table 3 .
Prediction performances in terms of modeling efficiency (and RMSE in µmol m −2 s −1 ) are estimated from the leave-one-site-out cross-validation experiments with the setup that has been used to compute the global half-hourly products for the four fluxes.
GPP, NEE, LE, H), we have learned a single regression model for all half hours based on all available half-hourly values of the corresponding flux at the 222 FLUXNET sites listed in Appendix D, i.e., one regression model for GPP, one for NEE, one for LE, and one for H.These models are then used to compute halfhourly fluxes globally with 0.5 • spatial resolution and continuously from 2001 (1 January) to 2014 (31 December) using global forcing data described in Sect.2.2.As mentioned in the previous section, we have used the daily predictors of the RS+METEO setup fromTramontana et al. (2016, Table 2)as well as half-hourly values of R pot and its first temporal derivative.Note that this table has been reproduced in Appendix C for faster reference.Furthermore, it should be noted that the global products have been initially calculated such that they are tiled by PFT.The final flux for each point in space and time has then been determined as a weighted sum depending on the percentage of each PFT to be present in the corresponding grid cell.When looking at annual sums of the half-hourly data products, we observe that these sums are pretty constant among the different years for the individual fluxes.On average, we get 125.94Pg C for GPP and −21.42 Pg C for NEE as well as 182.22 ZJ for LE and 144.79 ZJ for H.

Table D1 .
This is a list of 222 FLUXNET sites from which we have used data in our study.