Tropospheric water vapour isotopologue data ( H 162 O , H 182 O , and HD 16 O ) as obtained from NDACC / FTIR solar absorption spectra

We report on the ground-based FTIR (Fourier transform infrared) tropospheric water vapour isotopologue remote sensing data that have been recently made available via the database of NDACC (Network for the Detection of Atmospheric Composition Change; ftp://ftp.cpc.ncep.noaa.gov/ndacc/MUSICA/) and via doi:10.5281/zenodo.48902. Currently, data are available for 12 globally distributed stations. They have been centrally retrieved and quality-filtered in the framework of the MUSICA project (MUlti-platform remote Sensing of Isotopologues for investigating the Cycle of Atmospheric water). We explain particularities of retrieving the water vapour isotopologue state (vertical distribution of H16 2 O, H 18 2 O, and HD 16O) and reveal the need for a new metadata template for archiving FTIR isotopologue data. We describe the format of different data components and give recommendations for correct data usage. Data are provided as two data types. The first type is best-suited for tropospheric water vapour distribution studies disregarding different isotopologues (comparison with radiosonde data, analyses of water vapour variability and trends, etc.). The second type is needed for analysing moisture pathways by means of {H2O,δD}-pair distributions. Published by Copernicus Publications. 16 S. Barthlott et al.: Water vapour isotopologue data from NDACC/FTIR spectra


Introduction
Simultaneous observations of different tropospheric water isotopologues can provide valuable information on moisture source, transport, cloud processes, and precipitation (e.g.Dansgaard, 1964;Gat, 2000;Yoshimura et al., 2004).For identifying and analysing different tropospheric moisture pathways, the distribution of {H 2 O, δD} pairs (e.g.Galewsky et al., 2005;Noone, 2012;González et al., 2016) or deuterium excess (d = δD − 8δ 18 ; e.g.Pfahl and Sodemann, 2014;Steen-Larsen et al., 2014;Aemisegger et al., 2014) are particularly promising.The formula of deuterium excess is based on definitions used in studies such as Craig (1961a, b) and Dansgaard (1964).The δ notation is used to express the relation of the observed isotopologue ratio to the standard ratio VSMOW (Vienna Standard  VSMOW − 1.In recent years, there has been significant progress in measuring the tropospheric water vapour isotopologues; remote sensing observations are particularly interesting since they can provide data for the free troposphere and they can be performed continuously (for cloud-free conditions).During MUSICA (MUlti-platform remote Sensing of Isotopologues for investigating the Cycle of Atmospheric water), a method has been developed to obtain tropospheric water vapour profiles as well as {H 2 O, δD} pairs from groundbased FTIR (Fourier transform infrared) and space-based IASI (Infrared Atmospheric Sounding Interferometer) observations.The ground-based FTIR spectra are measured in the framework of the NDACC (Network for the Detection of Atmospheric Composition Change, www.ndacc.org).These spectra are of very high quality and have been available at some stations since the 1990s, thus offering longterm data records, which are of particular interest to climatological studies or for assessing the stability of longterm satellite data records.Network-wide consistent quality of the MUSICA NDACC/FTIR water vapour isotopologue data is ensured by central data processing and quality control (e.g.all data are quality-filtered by the XCO 2 method as presented in Barthlott et al., 2015).The high quality of the MUSICA ground-and space-based remote sensing data has been demonstrated by empirical validation studies (Schneider et al., 2016).
In this paper, we present the MUSICA NDACC/FTIR data as provided recently via the NDACC database (see also Barthlott et al., 2016).Our objective is to make a data user aware of the nature of the data and to give recommendations and explanations for correct data usage.Section 2 briefly describes the particularity of retrievals of the water vapour isotopologue state.It is shown that for achieving an optimal estimation of the tropospheric water vapour isotopologue state, we have to work with full state vectors consisting of different isotopologues, i.e. a state vector consisting of several trace gases.As a result, water vapour isotopologue data cannot be provided in exactly the same data format as other FTIR data.In Sect. 3 we describe the data format used for providing this new data via the NDACC database.Section 4 gives some insight into the data characteristics and recommendation for working with this new data set.In Sect.6, we conclude and summarise our results.The summarised guidelines for correct data usage can be found in Table 3.

Optimal estimation of the water vapour isotopologue state
2.1 The principle of optimal estimation retrieval methods Atmospheric remote sensing retrievals characterise an atmospheric state from a measured spectrum.However, such an inversion problem is often ill-posed (a lot of different atmospheric states can explain the measured spectrum).Consequently, for solving this problem, some kind of regularisation is required.This can be introduced by means of a cost function: Here, the first term is a measure of the difference between the measured spectrum (y) and the spectrum simulated for a given atmospheric state (x), where F represents the forward model, which simulates a spectrum y for a given state x, taking into account the actual measurement noise level (S is the measurement noise covariance).Vector p represents auxiliary atmospheric parameters (like temperature) or instrumental characteristics (like the instrumental line shape).The second term of the cost function (1) is the regularisation term.
It constrains the atmospheric solution state (x) towards an a priori most likely state (x a ), where kind and strength of the constraint are defined by the a priori covariance matrix S a .The constrained solution is reached at the minimum of the cost function (1).This method of updating the knowledge about the a priori state with information from a measurement is known as the optimal estimation method, which is a standard remote sensing retrieval method (see Rodgers, 2000, for more details) in which validity of the optimal estimation solution strongly depends on a correct and comprehensive description of the a priori state by means of x a and S a .For readers that are not familiar with remote sensing mathematics, we added some mathematical foundations in Appendix A.

Correct description of the water vapour isotopologue state
Water vapour isotopologues with reasonably strong and welldiscernible spectral infrared signatures are H Earth Syst.Sci.Data, 9, 15-29, 2017 www.earth-syst-sci-data.net/9/15/2017/ Tropospheric water vapour shows a strong variation (in space and time) and it can be better described by a lognormal than by a normal distribution (Hase et al., 2004;Schneider et al., 2006).Furthermore, the different isotopologues vary mostly in equal measure, i.e. their variations are strongly correlated and the variation in the isotopologue ratios is much smaller.For instance, the variation in ln HD 16   (Craig and Gordon, 1965).An elegant method for correctly describing the nature of the full water vapour isotopologue state is to work on a logarithmic scale (due to the log-normal distribution characteristic) and with the following three states: the humidity-proxy state: The water vapour isotopologue state can be expressed on the basis of ln H 16 2 O , ln H 18 2 O , ln HD 16 O or on the basis of the proxies of {humidity, δD, d}.Both expressions are equivalent.Each basis has the dimension nol × 3, where nol is the number of levels of the radiative transfer model atmosphere and three different combinations of the three different isotopologues have to be considered.In the following, the full water vapour isotopologue state vector expressed on the ln H 16 2 O , ln H 18 2 O , ln HD 16 O basis and the {humidity, δD, d}-proxy basis will be referred to as x l and x l , respectively, where index "l" stands for logarithmic scale.A basis transformation can be achieved by operator P: (5) Here, the nine matrix blocks have the dimension nol × nol, 1 stands for an identity matrix, and the state vectors x l and x l are related by Similarly, covariance matrices can be expressed in the two basis systems and the respective matrices S l , and S l are related by The variation in humidity, δD, and deuterium excess have different magnitudes.This can be well considered by giving the a priori covariance matrix in the {humidity, δD, d}-proxy basis: This assumes that humidity, δD, and deuterium excess vary independently, which is actually not the case in the lower/middle troposphere.However, the correlation between humidity, δD, and deuterium excess is a detail.It is important that the different magnitudes of variability are taken into account (the entries of S l a hum are much larger than the entries of S l a δD and S l a d ).The covariance matrix S l a can be calculated as (according to Eq. 7) This matrix S l a correctly captures the covariance of the a priori state in the ln H 16  2 O , ln H 18 2 O , ln HD 16 O basis and it is important for a correct formulation of the cost function (1) and thus for setting up a correctly constrained optimal estimation retrieval of the water vapour isotopologue state.
Matrix S l a reveals the complex covariances between the state vector components of x l , in which it has to be considered that the entries of S l a hum are much larger than the entries of S l a δD and significantly larger than the entries of S l a d , which means that the nine nol × nol blocks of S l a have almost the same entries.The variations in ln H 16 2 O , ln H 18 2 O , and ln HD 16 O are strongly correlated and we cannot work with individual state vectors that report the states of the different isotopologues independently.Instead, we have to describe the full isotopologue state by a single state vector with the dimension nol × 3.This is the reason why MUSICA NDACC/FTIR water vapour isotopologue data cannot be provided in the same data format as other NDACC/FTIR S. Barthlott et al.: Water vapour isotopologue data from NDACC/FTIR spectra data.A slightly extended data format is needed, which is described in the next section.

The MUSICA (v2015) ground-based NDACC/FTIR retrieval setup
In Schneider et al. (2012), the MUSICA NDACC/FTIR retrieval setup, such as interfering gases and temperature fit, is described in great detail.Here, we focus on the modification made for the retrieval version (v2015).
For the previous retrieval version, we used 11 spectral windows with lines of water vapour isotopologues (see Fig. 2 of Schneider et al., 2012).For the final MUSICA retrieval version (v2015), we removed windows with strong H 16 2 O lines and replaced them by windows with weaker lines.By this modification, we want to make sure that even for very humid sites observed spectral lines do not saturate.Furthermore, we add a second window where a H 18 2 O signature dominates.The nine spectral water vapour isotopologue windows are depicted in Fig. 1 for an observation and fit, which are typical for the different NDACC stations.In addition, we fit the three spectral windows with the CO 2 lines, which is beneficial for atmospheric temperature retrievals (the CO 2 lines are the same as for the previous retrieval version: between 2610.35 and 2610.8, 2613.7 and 2615.4, and 2626.3 and 2627.0 cm −1 ; not shown).All of these spectral windows are observed within NDACC filter #3.
For v2015, we perform an optimal estimation of the {humidity, δD, d}-proxy states as explained in Sect.2.2.This is a further development of the previous retrieval version (there the optimal estimation was made for the {humidity, δD}-proxy states; Sects.3 and 4 of Schneider et al., 2012).Consistent with the previous version, we perform simultaneous but individual fits (no cross-constraints) for profiles of the water vapour isotopologue H 17 2 O, temperature, and interfering species CO 2 , O 3 , N 2 O, CH 4 , and HCl.
For the previous version, we used HITRAN 2008 line parameters (Rothman et al., 2009), whereas for v2015 we work with HITRAN 2012 parameters (Rothman et al., 2013).For both versions, the water vapour isotopologues parameters have been adjusted for the speed-dependent Voigt line shape (Schneider et al., 2011).For further optimisation of the HI-TRAN parameters, we used high-quality H 2 O and δD aircraft in situ profile references, coincident FTIR spectra, and FTIR spectra measured at three rather distinct sites (Izaña on Tenerife, subtropical ocean; Karlsruhe, central Europe; and Kiruna, northern Sweden).This method for line parameter optimisation by means of atmospheric spectra is described in detail in Schneider and Hase (2009) and Schneider et al. (2011).We changed line intensities and broadening parameters by about 5-10 %, which is in agreement with the uncertainty values as given in the HITRAN parameter files.More details on the modification of the line parameters as shown in Fig. 1 are given in Appendix B.
The empirical assessment study of Schneider et al. (2016) suggests an accuracy for the MUSICA (v2015) NDACC/FTIR H 2 O and δD products of about 10 % and 10 ‰, respectively.

Data set description
MUSICA NDACC/FTIR water vapour isotopologue data are available via the NDACC database and can be accessed via ftp://ftp.cpc.ncep.noaa.gov/ndacc/MUSICA/and via doi:10.5281/zenodo.48902.The data are provided as HDF4 files and they have been generated in compliance with GEOMS (Generic Earth Observation Metadata Standard).
For isotopologue data, a new metadata template, GEOMS-TE-FTIR-ISO-001, has been set up.It is almost identical to the GEOMS-TE-FTIR template (used for all other FTIR data provided via NDACC in HDF4 format) but has the additional variable "CROSSCORRELATE.N".For MUSICA data, this new variable has three entries: "H 16 2 O", "H 18 2 O", and "HD 16 O".This enables the three trace gases H 16  2 O, H 18 2 O, and HD 16 O to be provided as one full state vector (vector with nol × 3 elements) together with their full averaging kernels and full error covariances (both of (nol × 3) × (nol × 3) dimension) in one data file.To be compliant with GEOMS, the data are provided on a linear scale (and not on a logarithmic scale on which the inversion is performed; see Sect. 2).
All the isotopologue data have been normalised with respect to their natural isotopologue abundances.These natural abundances are 0.997317 for H 16  2 O, 0.002000 for H 18 2 O, and 3.106930 × 10 −4 for HD 16 O.This normalisation means that δD can directly be calculated as HD 16  The number of stations contributing to the MUSICA NDACC/FTIR data set is gradually increasing and Table 1 gives an overview of the current status (status for January 2016) together with mean DOFS (degrees of freedom of signal) for the two data types.The stations are well distributed from the Arctic to the Antarctic and in some cases offer data from the late 1990s onward.A further extension of this data set to other sites or for some stations to measurements made  in the beginning of the 1990s is feasible but was not possible within the MUSICA project.

Averaging kernels
An averaging kernel describes how a retrieved state vector responds to variations in the real atmospheric state vector.The MUSICA NDACC/FTIR isotopologue state vector consists of three trace gases and the corresponding full averaging kernel matrix A consists of nine blocks, each of which is a nol × nol matrix.In total, A has the dimension (nol × 3) × (nol × 3): The three blocks along the diagonal describe the direct responses: block A 11 for the response of the H

Error covariances
For error calculations, the same uncertainty sources are assumed for all MUSICA NDACC/FTIR stations and are grouped into statistical and systematic errors.An overview of the uncertainty assumptions is given in Table 2. Error covariances for each of these uncertainty sources are calculated according to Rodgers (2000).Total error covariances are ob- Figure 2. Column entries of the nine blocks of the full averaging kernel matrix A (see Eq. 10).Shown are some columns of the kernel matrix for 0.5 km (black), 2.4 km (red), 4.9 km (green), 8.0 km (blue), and 13.6 km (light blue).This kernel is for the retrieval with the spectrum and fit as shown in Fig. 1.
tained as the sum of the individual error covariances, calculated separately for statistical and systematic errors.Full error covariance matrices have the same matrix dimension as the full averaging kernel matrix and provide information about the errors and how the errors between different altitudes as well as between the three different isotopologues are correlated.

Column densities
Partial and total column densities are also provided as well as column sensitivity averaging kernels and column error covariances.Partial columns are calculated for the layers between the nol atmospheric levels.A full water vapour isotopologue partial column state vector has (nol − 1) × 3 elements (i.e.nol −1 elements for each of the three isotopologues).The column sensitivity averaging kernel has the dimension ((nol − 1) × 3) × 3 and describes the sensitivities of the total column retrieval product (columns of H 16 2 O, H 18 2 O, and HD 16 O) with respect to real atmospheric variations in the isotopologues' partial columns.Column error covariances are provided for statistical and systematic errors independently and are matrices with the dimension 3 × 3.They describe total column errors for the three isotopologues and how these errors are correlated.
All column data are calculated from mixing ratio data (mixing ratio state vector, error covariances, and averaging kernels).For a conversion of mixing ratios (ppmv) to number densities (molec.cm −3 ), we use the ideal gas law and work with retrieved temperature profiles.Temperature and pressure profiles are also provided in the HDF files.
Please note: since GEOMS requires the same number of elements of column data and the respective mixing ratio data, value 0 has been added for all column data values at level nol.

Recommendations for data usage
In order to be compliant with GEOMS, data are provided for the full state vector consisting of the H 16 2 O, H 18 2 O, and HD 16 O state vectors and in mixing ratios (ppmv).However, actually the MUSICA NDACC/FTIR retrieval works on a logarithmic scale and performs an optimal estimation of combined isotopologue states (see Sect. 2).These details have to be considered when working with these data.

Transfer to a logarithmic scale
For operations with averaging kernels (e.g. when adjusting model data to the sensitivity of a remote sensing system), it has to be considered that the retrieval works on a logarithmic scale, because only on this scale are linearity assumptions valid.Therefore, it is strongly recommended to transfer the averaging kernels to a logarithmic scale; in doing so, it has to be considered that the derivatives are calculated for the state as given by the retrieval state vector x.The full retrieval state vector consists of the retrieved H 16 2 O, H 18 2 O, and HD 16 O states, i.e. it is a vector with nol × 3 elements x i (with i between 1 and nol × 3).The full averaging kernel matrix has the dimension (nol × 3) × (nol × 3). Figure 2   full averaging kernel matrix provided on a linear scale (A).It has the entries a i, j (i and j are between 1 and nol × 3, where i is the row index and j the column index).Entry a i, j is a derivative, describing how the retrieved state vector element i responds to a variation in the real atmospheric state vector element j .Since ∂ ln x = ∂x x , entry a i, j has to be modified for operations on a logarithmic scale as follows: Here, a l i, j are the entries of the full averaging kernel matrix transferred to a logarithmic scale (A l ), which is plotted in Fig. 3.This matrix A l is the averaging kernel matrix for a scale on which the linearity assumptions are valid and on which kernel operations should be performed.
For the purpose of error analyses, it is also very useful to transfer the error covariances on a logarithmic scale.On this scale, the δD and deuterium-excess error covariances can be made available in an elegant manner (see Sect. 4.4).The covariance matrix entry s i, j is the covariance between the state vector elements i and j (the value of these elements is x i and x j , respectively).On a logarithmic scale, the covariance matrix entry is The covariance matrices on a logarithmic scale (S l ) are composed of the entries s l i, j .

Comparison studies
For many purposes, remote sensing data may be compared to other data.For instance, they may be compared to vertically resolved observational data in order to empirically assess the quality of the different data sets, or they are compared to model data in order to investigate model performances.For such comparisons, it is important to consider the sensitivity of the remote sensing system.While highly resolved profile data or model data generally capture atmospheric signals well even on rather small scales, remote sensing data report atmospheric signals according to the averaging kernel.In order to make different data sets comparable, we have to convolve the data with the averaging kernels.A full water vapour isotopologue state vector obtained from highly resolved profile measurements or model calculations (x l data ) will be observed by a remote sensing system as x l data, conv : Here, x l a means the full water vapour isotopologue a priori state vector on a scale.ing the averaging kernel blocks A l 11 , A l 12 , and A l 13 (these blocks are depicted in Fig. 3).

4.3
(see Eqs. 3 and 4).Here, index a identifies a priori data.Like all full isotopologue state vectors, x l RS has nol × 3 components.The first section of nol components is determined by the in situ measured ln H 16 2 O profile.The second section of nol components is also determined by the in situ measured ln H 16 2 O profile, but we have to consider an uncertainty of these vector components according to the uncertainty covariance matrix 1 8 2 (S l a δD + S l a d ) (see Eq. 14).The uncertainty is due to the fact that a radiosonde does not measure H 18 2 O.The third section of nol components is similarly determined by the in situ measured ln H 16 2 O profile, but with an uncertainty according to the uncertainty covariance matrix S l a δD (see Eq. 15), in which the uncertainty is caused by missing HD 16 O radiosonde measurements.Hence, a ln H 16  2 O radiosonde profile is equivalent to a ln H 16  2 O remote sens-ing observation of in which there is an uncertainty in the equivalency of which is due to missing radiosonde observations of H 18 2 O and HD 16 O.
Equation 16 reveals that the nol × nol averaging kernel matrix A l H 2 O = A l 11 + A l 12 + A l 13 is a good proxy for the remote sensing system's sensitivity with respect to H 16 2 O and thus water vapour in general (about 99.7 % of all water vapour molecules are H 16 2 O isotopologues).The columns of the matrix A l H 2 O are plotted in Fig. 4.

Utility of the {humidity, δD, d }-proxy basis and data a posteriori processing
The MUSICA NDACC/FTIR retrieval performs an optimal estimation of the humidity, δD, and deuterium-excess proxy states.Although optimal estimation of these states is made in a single retrieval process, it is made for each of the three states independently, meaning, for instance, that the estimation is not optimal for the {H 2 O, δD} pairs.The reason is that the remote sensing system's sensitivity for the humidity state is much higher than for the δD-state (and significantly higher than for the deuterium-excess state).The problem becomes clearly visible by transforming the full averaging kernel matrix onto the {humidity, δD, d}-proxy basis: Earth Syst.Sci.Data, 9, 15-29, 2017 www.earth-syst-sci-data.net/9/15/2017/ S. Barthlott et al.: Water vapour isotopologue data from NDACC/FTIR spectra 23 13 , with the A l blocks as plotted in Fig. 3.
Here, A l is the full averaging kernel matrix in the {humidity, δD, d}-proxy basis.It is depicted in Fig. 5, which clearly reveals larger sensitivity for humidity (kernel block A l 11 ) than for δD (kernel block A l 22 ).Data with this characteristic cannot be used in the context of {H 2 O, δD}-pair distribution analyses.
A transformation of the error covariance matrices onto the {humidity, δD, d}-proxy basis (for transformation operation, see Eq. 7) is very helpful for analysing the error characteristics of the H 2 O, δD, and deuterium-excess data because error covariances expressed in the {humidity, δD, d}-proxy basis are good proxies for the error covariances of H 2 O, δD, and deuterium excess (for more details see discusion in Sect.4.2 of Schneider et al., 2012).

A posteriori processing for a quasi-optimal estimation of H 2 O, δD pairs
During MUSICA, an a posteriori processing method for obtaining a quasi-optimal estimation product of {H 2 O, δD} pairs has been developed.The a posteriori processing brings about a moderate reduction in the H 2 O sensitivity and in the δD cross dependencies on H 2 O (Sect.4.2 in Schneider et al., 2012).The operation has to be performed with logarithmicscale full state vector, averaging kernel matrix, and covariance matrices (x l , A l and S l ): A and Here, x l * , A l * , and S l * are a posteriori-processed logarithmic-scale full state vector, averaging kernel matrix, and covariance matrices, respectively.Operator P is introduced by Eq. ( 5) and the a posteriori operator C is Figure 6 depicts the a posteriori processed averaging kernel A l * .It is obvious that the processing ensures that the sensitivity for humidity (kernel block A l * 11 ) and for δD (kernel block A l * 22 ) are almost identical.Data with this characteristic are provided on a linear scale in the "ftir.iso.post.h2o"HDF files and they are well suited for {H 2 O, δD}-pair distribution analyses.

A posteriori processing for a quasi-optimal
estimation of H 2 O, δD, d triplets Figure 6 reveals that kernel block A l * 33 is still rather different from the kernel blocks A l * and A l * 22 .Furthermore, there is a rather large cross dependency of δD on deuterium excess (kernel block A l * 32 ).This means that a posteriori correction with operator C according to Eq. ( 22) is not sufficient for providing deuterium excess data that can be analysed together with H 2 O and δD and that is of sufficient quality.For such purpose, the a posteriori treatment has to be stronger, which can be achieved by using the following operator C: Figure 7 depicts the a posteriori processed averaging kernel A l * by using C from Eq. ( 23).This treatment ensures that the three blocks A l * 11 , A l * 22 , and A l * 33 are almost identical.Nevertheless, the retrieved deuterium excess shows still some cross dependency on δD (averaging kernel block A l * 32 ).Currently, these data are not provided via the NDACC database, mainly because it has so far not been possible to empirically prove the quality of this deuterium-excess remote sensing data.In any case, interested users are welcome to investigate the quality of these data.The required a posteriori processing can be made by using the data provided in "ftir.iso.h2o"HDF files and by following the description given in this paper (if unclear, please contact the MUSICA team).

Conclusions
For a correct optimal estimation retrieval of the full water vapour isotopologue state, we have to consider that atmospheric variations in different isotopologues are strongly correlated.This strong correlation is then also present in the retrieved state vectors and it has to be considered when interpreting averaging kernels and error covariances.As a consequence, it makes little sense to provide different isotopologues in the form of individual states and via individual data sets.Instead, it is essential that water vapour iso-topologues are made available as single full state vectors together with their full averaging kernels and error covariances.The standard GEOMS metadata template for FTIR data on the NDACC database (called GEOMS-TE-FTIR) does not allow data to be provided in such a format and a slight extension of the standard FTIR template has been made (the modified template is called GEOMS-TE-FTIR-ISO-001).The MUSICA NDACC/FTIR data are now available in this new data format on the NDACC database and via doi:10.5281/zenodo.48902.The extended template can also

Appendix A: Mathematical foundations of atmospheric remote sensing
Atmospheric remote sensing means that the atmospheric state is retrieved from the radiation measured after having interacted with the atmosphere.For a mathematical treatment, vectors are used for describing the atmospheric state and the measured radiation.Hence, vector and matrix algebra is the tool used in the context of remote sensing retrievals and remote sensing product characterisation.In the following, we briefly explain the connections between vector and matrix algebra and atmospheric remote sensing that are relevant for our paper.For a more detailed introduction, please refer to Rodgers (2000).For a general introduction to vector and matrix algebra, dedicated textbooks should be consulted (e.g.Herstein and Winter, 1988;Strang, 2016).
A1 Atmospheric state vector and measurement vector (x and y): We are interested in the vertical distribution of H 16 2 O, H 18 2 O, and HD 16 O and our atmospheric state is a vector whose components represent the H 16 2 O, H 18 2 O, and HD 16 O concentrations at different atmospheric altitudes.The atmospheric state vector is generally written as x.Our measurements are high-resolution infrared spectra and the different spectral bins of a measured spectrum are represented by the components of the measurement vector.This vector is generally written as y.
A2 A priori state vector and covariance matrix (x a and S a ) The interaction of radiation with the atmosphere is modelled by a radiative transport model (also called a forward model, F ) and allows relating the measurement vector and the atmospheric state vector by We measure y and are interested in x.However, a direct inversion of Eq. (A1) is generally not possible, because there are many atmospheric states x that can explain one and the same measurement y.This means we face an ill-posed problem and for the inversion we need an additional constraint or regularisation.This leads us to the optimal estimation method.For this method we set up a cost function that combines the information provided by the measurement with a priori known characteristics of the atmospheric state (see Eq. 1 in Sect.2.1).This a priori knowledge about the atmospheric state is collected as the mean and covariances in form of the a priori state vector (x a ) and the a priori covariance matrix (S a ).By minimising the cost function (Eq.1), we get the most probable atmospheric state for the given measurement, i.e. the optimally estimated atmospheric state.

A3 Averaging kernel matrix (A)
The averaging kernel relates the real atmospheric state to the atmospheric state as provided by the remote sensing retrieval (see, for instance, Eq. 13).The averaging kernel matrix element (i, j , i.e. element in line i and column j of the matrix) reveals how a change in the real atmospheric state vector component j will affect the retrieved atmospheric state vector component i.The averaging kernel is an important component of a remote sensing retrieval and it is calculated by the retrieval code by combining the forward model calculations with the retrieval constraints: K is the Jacobian matrix (derivatives that capture how the measurement vector will change for changes in the atmospheric state) and provided by the forward model.S and S a are measurement noise covariance matrix and a priori atmospheric state covariance matrix.
A4 Basis systems for describing the atmospheric state A coordinate system for a vector space is called a basis and consists of linearly independent unit vectors.Vector spaces can be equivalently described by different bases.One possibility for describing the atmospheric water vapour isotopologue state is to use three independent basis systems for the H 16 2 O, H 18 2 O, and HD 16 O state vectors.This is what we call in the paper the ln H 16 2 O , ln H 18 2 O , ln HD 16 O basis.Another possibility is to describe the atmospheric water vapour isotopologue state by means of three basis systems, whose respective vector spaces vary almost independently in the real atmosphere.This is what we call in the paper the {humidity, δD, d}-proxy basis.The transformation between the two different basis systems is achieved by the transformation operator P (see Eq. 5).
Both basis systems are equivalent, but using both of them helps to correctly constrain the inversion problem and to adequately describe the characteristics of the remote sensing product.The retrieval processor works in the ln H 16 2 O , ln H 18 2 O , ln HD 16 O basis and we need to formulate S a in this basis according to Eq. ( 9).Furthermore, for the NDACC database we provide the data in this basis (or actually in the H 16 2 O, H 18 2 O, HD 16 O basis; for transfer from a logarithmic basis to linear basis, see Sect.4.1) in order to achieve compliance with standard metadata templates.However, it is the {humidity, δD, d}-proxy basis system in which the data product can be best described.That is the reason why in the paper we switch between the two basis systems and between logarithmic and linear scales.
plots the Earth Syst.Sci.Data, 9, 15-29, 2017 www.earth-syst-sci-data.net/9/15/2017/ Mean Ocean Water), where Two different data types are provided.The first type is stored in HDF files called "ftir.iso.h2o".These files report the best estimate of the H 16 2 O state, but its usefulness for isotopologue studies (e.g. in the context of {H 2 O, δD}-distribution plots) is significantly compromised.The second type is stored in HDF files called "ftir.iso.post.h2o".This data type should be used for analysing {H 2 O, δD}-distribution plots.Section 4 gives more details on the different data types.

Table 1 .
List of current MUSICA NDACC/FTIR sites (ordered from north to south) and available MUSICA data record.DOFS (type 1) reports the typical trace of A l H 2 O and DOFS (type 2) that of A l * 11 ≈ A l * 22 for optimal estimation of {H 2 O, δD} pairs (example kernel plotted in Fig.5).

Table 2 .
Uncertainty sources used for the error estimation.The second column gives the assumed uncertainty value and the third column the assumed partitioning between statistical and systematic sources.
Column entries of the nine blocks of the logarithmic-scale kernel matrix in the {humidity, δD, d}-proxy state (A l ; see Eq. 18).Same as Fig.5but after a posteriori processing with C from Eq. 22 (A l * ; see Eq. 20).