An updated and improved version of a global, vertically
resolved, monthly mean zonal mean ozone database has been calculated –
hereafter referred to as the BSVertOzone (Bodeker Scientific Vertical Ozone) database. Like its predecessor, it
combines measurements from several satellite-based instruments and ozone
profile measurements from the global ozonesonde network. Monthly mean zonal
mean ozone concentrations in mixing ratio and number density are provided in
5

Ozone is a greenhouse gas, and changes in stratospheric ozone concentrations
have an effect on surface climate. Ozone changes can result in direct
radiative forcing changes

Vertically resolved ozone databases are not only useful for prescribing ozone concentrations in climate models but can also be used for climate model evaluation and development. When chemistry–climate models (CCMs) are run with specified dynamics (e.g., wind and temperature fields from reanalyses), the resulting ozone distributions are as close to the real atmospheric distribution of ozone as CCMs can be expected to simulate them. Comparisons with observation-based, vertically resolved ozone databases can reveal possible model deficiencies in simulating chemical and dynamical processes. Diagnosing the problems occurring in the specified dynamics model simulations can inform a process-oriented validation of a free-running CCM and thereby improve the quality of projections.

In preparation for the World Meteorological Organization/United Nations
Environment Programme (WMO/UNEP) Scientific Assessment of Ozone Depletion
2014, the communities involved in making satellite-based and ground-based
ozone measurements decided to intensify the preparation of ozone databases
that consist of multiple data sources from different platforms for
stratospheric ozone variability investigations and trend detection. Several
ozone databases were created that combine (merge) measurements from (i) the
same type of instruments that were flown on different satellites

When combining measurements from different data sources, providing realistic
uncertainty estimates on every value of the final data product (either
calculated from the different data sources or estimated using statistical
methods) becomes more and more complex. However, realistic estimates of
uncertainties on every datum are necessary to be able to estimate resultant
uncertainties on ozone trends calculated from those data. This is
particularly important when seeking to detect the small but expected signal
of ozone recovery due to the reductions in ozone-depleting substances

Here, BSVertOzone v1.0 (Bodeker Scientific Vertical Ozone, hereafter referred
to as BSVertOzone) is described, which is an updated and further developed
version of the BDBP (Binary Database of Profiles) v1.1.0.6 that is described
in

The original BDBP v1.1.0.6

In late 2012, an improved and updated version of the SAGE II data set was
released (version 7.0;

As a final quality check, ozone values from all data sources were used to
calculate monthly mean zonal mean climatologies at each level and latitude
bin. If individual values in these latitude bins exceed the respective mean
by 3

BDBP v1.1.0.6 covered the period 1979 to 2007 since the main satellite data
sources for that database, i.e., SAGE II and HALOE, ended in 2005. After 2005,
only ozonesonde profiles were included in BDBP v1.1.0.6. To extend
BSVertOzone to 2016, it was necessary to add new satellite measurements and
additional ozonesonde profiles to the database. To ensure sufficient overlap
with SAGE II and HALOE measurements, preference was given to instruments that
provide measurements starting in 2005 or earlier and extend to the end of
2016. Although it has a somewhat coarser vertical resolution than the other
data sources used in the BDBP v1.1.0.6 database, the large data quantity and
high data quality of the Microwave Limb Sounder (MLS) make it an attractive
target data source for incorporation into the BSVertOzone database. The
usefulness of MLS ozone data has already been shown through its use in
several other combined ozone databases, e.g., GOZCARDS

The MLS instrument sits on NASA's Aura satellite, which was launched in
mid-July 2004 and remains operational to date (see Fig.

The screening of the MLS ozone measurements is based on the official MLS v4.2
data description document provided by JPL
(

One of the main foci of BSVertOzone is tracing all sources of uncertainty
from the individual measurements through to the final monthly mean zonal mean
ozone values. The uncertainty estimates that are provided with the individual
measurements obtained from each satellite instrument shown in Fig.

Temporal

Ozonesondes remain the only source of ozone measurements throughout the
troposphere (see Fig.

The monthly mean zonal means comprising the BSVertOzone
database are provided in 5

For simplicity, any reference to either a specific geopotential height or pressure level is hereafter referred to as “level”.

Quantifying offsets and drift between different
measurement systems can be made far more robust by using an independent data
source, especially when temporal and spatial coincidences between the two
measurement systems are sparse. If the independent data source has high
spatial and temporal sampling and covers the combined range of the two
measurement systems to be homogenized, it can be used as a transfer standard.
The independent data source does not need to be quantitatively exact but does
need to capture the spatial and temporal morphology of the underlying
measurements. Output from either a chemistry–climate model that has
been nudged towards observed meteorology or output from a
chemistry-transport model can meet these requirements. The bias and drift
correction applied here is based on a homogenization approach that uses a
regression model together with global vertically resolved ozone
concentrations as simulated by the chemistry-transport model TOMCAT/SLIMCAT
for the period 1980 to 2016 (see Fig.

TOMCAT/SLIMCAT (hereafter referred to as SLIMCAT) is an
offline three-dimensional (3D) chemistry-transport model (CTM)

Using CTM output as an evaluation and adjustment tool for coarsely
distributed global ozone measurements is not a novel idea. In

Flow chart describing the different modification and adjustment steps
that are applied to the ozone measurements before they are used in the monthly
mean zonal mean calculation. Note that ISBC refers to the inter-satellite bias
correction, which is described in Sect.

As shown in several recent studies

The homogenization of the satellite-based measurements that contribute to
BSVertOzone is a sequential process where each measurement from a selected
satellite instrument is adjusted with respect to the standard, hereafter
referred to as the inter-satellite bias correction, ISBC (see
Fig.

Calculate differences between individual ozone measurements from the standard and the CTM-simulated ozone values.

Calculate an error-weighted, latitude-weighted (based on 1

Fit a linear regression model to the calculated monthly mean zonal mean differences
(hereafter referred to as modeled differences) to obtain an analytical representation of
the difference field that can be evaluated at any latitude and time;
(

Repeat steps 1 to 3 using the measurements from the target new data source that
requires bias correction to obtain an analytical representation for its difference
field:

Calculate an adjusted measurement at the location and time of a given satellite measurement using

Incorporate the adjusted measurements

Repeat steps 1 to 6 for all data sources to be included in the BSVertOzone database.
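The role of the CTM as a transfer standard in the steps above can be illustrated with a minimal sketch. The equation for the adjusted measurement is not reproduced in this text, so the additive form used below is an assumption, as are the function name adjust_measurement and all numerical values (synthetic mixing ratios in ppmv). The sketch only shows why the CTM does not need to be quantitatively exact: its bias appears in both difference fields and cancels.

```python
def adjust_measurement(o3_target, diff_target_fit, diff_standard_fit):
    """Shift a target-instrument measurement onto the scale of the standard.

    diff_target_fit and diff_standard_fit are the modeled
    (instrument minus CTM) differences evaluated at the latitude and time
    of the measurement. ASSUMPTION: the correction is additive.
    """
    return o3_target - (diff_target_fit - diff_standard_fit)


def demo(ctm_bias):
    """Synthetic example: both instruments observe the same true value."""
    truth = 5.0                # hypothetical true mixing ratio
    ctm = truth + ctm_bias     # CTM value with an arbitrary bias
    standard = truth + 0.1     # standard instrument reads slightly high
    target = truth + 0.3       # target instrument reads higher still
    adjusted = adjust_measurement(target, target - ctm, standard - ctm)
    return adjusted, standard


adjusted, standard = demo(ctm_bias=-0.4)
print(adjusted, standard)
```

In this idealized case the adjusted target value coincides with the standard regardless of the CTM bias. In practice the difference fields are regression fits rather than exact point differences, so the cancellation is only approximate.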

The number of Fourier and Legendre polynomial expansion terms used to model the
monthly mean zonal mean differences depends on the individual differences
provided as input to the regression model, as each satellite instrument
provides measurements with a different spatial and temporal coverage. The
number of expansion terms is therefore made adaptive to avoid overfitting. Four
Fourier pairs and eight Legendre polynomials are the default expansions for
the offset term, while four Legendre polynomials and no Fourier pairs are the
default for the trend term. The default expansions are considered to overfit
the difference fields if the maximum or minimum value of the modeled field
exceeds 150 % of the maximum or minimum value of the original data. In that
case, a new candidate model is
generated by decreasing the degree of one of the expansions (e.g., one scenario
would have three instead of four Fourier pairs for the offset term, with the
rest of the offset and trend expansions remaining unchanged). The Akaike
information criterion
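The adaptive expansion described above can be sketched as follows. This is not the authors' implementation. The following are assumptions made for illustration: the tensor-product combination of seasonal and latitudinal terms, the use of the sine of latitude as the Legendre argument, the reading of "eight Legendre polynomials" as degrees 0 to 7, the handling of signs in the 150 % test, and the least-squares form of the Akaike information criterion. The names design_matrix, is_overfitted and aic are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def design_matrix(t_years, lat_deg, n_fourier=4, n_leg_offset=8, n_leg_trend=4):
    """Offset term (Fourier x Legendre) plus trend term (Legendre only)."""
    t = np.asarray(t_years, dtype=float)
    x = np.sin(np.deg2rad(np.asarray(lat_deg, dtype=float)))  # maps to [-1, 1]
    season = [np.ones_like(t)]
    for k in range(1, n_fourier + 1):
        season += [np.sin(2 * np.pi * k * t), np.cos(2 * np.pi * k * t)]
    season = np.column_stack(season)                   # (n, 1 + 2 * n_fourier)
    leg_off = legendre.legvander(x, n_leg_offset - 1)  # (n, n_leg_offset)
    leg_trd = legendre.legvander(x, n_leg_trend - 1)   # (n, n_leg_trend)
    offset = (season[:, :, None] * leg_off[:, None, :]).reshape(t.size, -1)
    trend = (t - t.mean())[:, None] * leg_trd
    return np.hstack([offset, trend])


def is_overfitted(modeled, data, factor=1.5):
    """True if the modeled field overshoots the data range by the given factor."""
    hi, lo = data.max(), data.min()
    upper = hi + (factor - 1.0) * abs(hi)
    lower = lo - (factor - 1.0) * abs(lo)
    return bool(modeled.max() > upper or modeled.min() < lower)


def aic(data, modeled, n_params):
    """Akaike information criterion for a Gaussian least-squares fit."""
    n = data.size
    rss = float(np.sum((data - modeled) ** 2))
    return n * np.log(rss / n) + 2 * n_params


# Synthetic difference field: 10 years of monthly values in 5 latitude bins.
rng = np.random.default_rng(0)
t = np.repeat(1985 + np.arange(120) / 12.0, 5)
lat = np.tile(np.array([-60.0, -30.0, 0.0, 30.0, 60.0]), 120)
data = (0.2 * np.sin(2 * np.pi * t) + 0.1 * np.sin(np.deg2rad(lat))
        + 0.02 * rng.standard_normal(t.size))

X = design_matrix(t, lat, n_fourier=1, n_leg_offset=2, n_leg_trend=1)
coef, *_ = np.linalg.lstsq(X, data, rcond=None)
modeled = X @ coef
print(X.shape, is_overfitted(modeled, data), aic(data, modeled, X.shape[1]))
```

A reduced expansion is used in the example because five latitude bins cannot constrain eight Legendre polynomials. With real data, candidate models with one expansion reduced at a time could be compared through their aic values.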

In the generation of a homogenized data set, in addition to adjusting
measurements from different measurement systems to account for bias and
drifts, the uncertainties on those measurements also need to be revised since
the application of these adjustments introduces additional uncertainty.
Following error propagation rules, the uncertainty on the adjusted
measurements

While this re-evaluation of the measurement uncertainties does not include
the effects of uncertainties in the CTM output, the effects of CTM
uncertainties on the adjustment of the satellite data are minor, since the
CTM data are only used as a transfer standard and, as can be seen from
Eq. (

A bootstrap method

Fit the regression model (as described above) to the monthly mean zonal means of the differences between the measurements of the standard and the CTM data.

Subtract the regression model fit from the difference field to obtain the residuals.

To each residual data point, add a randomly sampled value from a normal distribution with a mean of zero and a standard deviation of the uncertainty on the monthly mean zonal mean differences associated with the selected data point. This step represents the influence of measurement uncertainty on the residuals.

For each monthly mean zonal mean bin, randomly select one modified residual value and add it to the monthly mean zonal mean of the differences. Do this for all bins to generate a new difference data set which, while having the same underlying structure as the original signal, now has different random noise. Then fit the regression model again, resulting in a new modeled difference field.

Repeat steps 2–4 many times (e.g., 200) to generate 200 estimates of the
modeled difference field. Calculate the standard deviation of those 200 modeled
difference fields to obtain the estimated uncertainty on the modeled difference field

To create a homogeneous database, each measurement and its
uncertainty, on a specific level, and in a specific latitude band, is
adjusted using the ISBC method described above. The uncertainties on the
corrections applied are included in the total uncertainty for each individual
data point. For the final calculation of monthly mean zonal mean values,
additional data filtering was applied, similar to the filtering described in

Monthly mean zonal mean ozone mixing ratios from different data
sources (color coded as shown in the legends) at 182

Monthly mean zonal mean ozone values obtained from different data sources
pre- and post-homogenization, for an example level and latitude band, are
shown in Fig.

The time series of ozone in Fig.

The calculated monthly mean zonal mean ozone time series, combining all
measurements from different data sources, are shown in Fig.

The homogeneous database of monthly mean zonal means constitutes Tier 0 of BSVertOzone.

Monthly mean zonal mean ozone mixing ratio at 182

To generate a global, gap-free monthly mean zonal mean
ozone data set, all values from the Tier 0 database are used, and the
missing monthly mean zonal mean values are estimated from correlations
against a total column ozone (TCO) database. This Tier 0.5 data set includes the
full range of measurement variability and is created as an intermediate step
for the calculation of the Tier 1 data where a least squares regression
model is used to attribute variability to various known forcing factors for
ozone (Sect.

The first step in creating the Tier 0.5 data set is to regress the monthly
mean zonal mean ozone at 20

The TCO database used here is described in detail in

The regression model fit coefficients in Eq. (

The result is a pre-filled ozone data set, where filled values include some
indication of the true month-to-month variability as suggested by the TCO
month-to-month variability. Monthly mean zonal mean ozone values at
20

This Tier 0.5 data set was then used as input to a least squares regression model to generate the Tier 1.1 to Tier 1.4 data sets described in the next section. It describes the full natural variability and is therefore particularly useful for CCM evaluation studies when the model runs with prescribed dynamics.
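The gap-filling idea behind Tier 0.5 can be sketched with a deliberately simplified regression. The actual regression equation is not reproduced in this text, so the straight-line relationship between ozone at one level and TCO used below is an assumption, as are the function name fill_gaps_with_tco and the synthetic values.

```python
import numpy as np


def fill_gaps_with_tco(ozone, tco):
    """Fill NaN gaps in an ozone series from a straight-line fit against TCO.

    Observed values are kept unchanged; only missing months are estimated.
    ASSUMPTION: ozone = a + b * TCO within one latitude band and level.
    """
    ozone = np.asarray(ozone, dtype=float)
    tco = np.asarray(tco, dtype=float)
    have = ~np.isnan(ozone)
    X = np.column_stack([np.ones(have.sum()), tco[have]])
    coef, *_ = np.linalg.lstsq(X, ozone[have], rcond=None)
    filled = ozone.copy()
    filled[~have] = coef[0] + coef[1] * tco[~have]
    return filled


# Synthetic example: six months of TCO (DU) with two missing ozone values.
tco = np.array([300.0, 310.0, 290.0, 320.0, 305.0, 295.0])
truth = 1.0 + 0.01 * tco
ozone = truth.copy()
ozone[[2, 4]] = np.nan

filled = fill_gaps_with_tco(ozone, tco)
print(filled)
```

Because the filled values follow the TCO series, they carry some of its month-to-month variability, which is the property of Tier 0.5 described above.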

Monthly mean zonal mean ozone mixing
ratios at 20

The methodology used to generate the Tier 1.1 to Tier 1.4
data sets is much the same as that described for the previous version
of the database (BDBP v1.1.0.6) in

The least squares regression model that was applied to the Tier 0.5 data
consists of eight basis functions, viz.

a constant offset that is expanded in a Fourier series to represent the mean annual cycle;

an EESC (equivalent effective stratospheric chlorine) term that differs with age of air;

a linear trend term;

a quasi-biennial oscillation (QBO) basis function that was specified as the monthly mean 50

a second QBO basis function, that is mathematically orthogonalized to the first, to account for QBO lag variations with latitude and level;

an El Niño–Southern Oscillation (ENSO) term;

a solar cycle term; and

a Mt. Pinatubo term that accounts for the enhancement of stratospheric aerosols after the Mt. Pinatubo eruption in 1991.
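The orthogonalization of the second QBO basis function is not specified further in the list above. One common way to obtain such a function, shown here purely as an assumption, is to take a lagged copy of the first basis function and remove its projection onto the first. The function name orthogonalize, the 28-month synthetic period and the 7-month lag are illustrative choices, not values from the database.

```python
import numpy as np


def orthogonalize(first, candidate):
    """Remove from candidate its projection onto first (both mean-centered)."""
    first = first - first.mean()
    candidate = candidate - candidate.mean()
    return candidate - (candidate @ first) / (first @ first) * first


# Synthetic stand-in for a monthly QBO wind index over 20 years.
months = np.arange(240)
qbo1 = np.sin(2 * np.pi * months / 28.0)
qbo2 = orthogonalize(qbo1, np.roll(qbo1, 7))

print(float(qbo2 @ qbo1))
```

Fitting both functions lets the regression represent a QBO signal of arbitrary phase, which is how lag variations with latitude and level can be accommodated.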

Similar to

Tier 1.1 (anthropogenic): This data set is calculated by summing up the contributions from the offset, EESC, and linear trend basis functions.

Tier 1.2 (natural): This data set is calculated by summing up the contributions from the offset, QBO, ENSO, and solar cycle basis functions.

Tier 1.3 (natural and volcanoes): This data set is calculated by summing up the contributions from the offset, QBO, ENSO, solar cycle, and Mt. Pinatubo volcanic eruption basis functions.

Tier 1.4 (all): This data set is calculated by summing up the contributions from all basis functions.
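The composition of the four Tier 1.x data sets listed above can be written compactly. In this sketch the fitted contribution of each basis function is assumed to be available as an array, the "qbo" entry stands for the sum of both QBO basis function contributions, and the names TIER_TERMS and build_tier as well as the random placeholder values are hypothetical.

```python
import numpy as np

# Basis-function contributions summed for each tier, as listed in the text.
TIER_TERMS = {
    "1.1": ["offset", "eesc", "trend"],
    "1.2": ["offset", "qbo", "enso", "solar"],
    "1.3": ["offset", "qbo", "enso", "solar", "pinatubo"],
    "1.4": ["offset", "eesc", "trend", "qbo", "enso", "solar", "pinatubo"],
}


def build_tier(contributions, tier):
    """Sum the fitted contributions that make up the requested tier."""
    return sum(contributions[name] for name in TIER_TERMS[tier])


# Placeholder contributions for 12 months in one latitude band and level.
rng = np.random.default_rng(0)
contributions = {name: rng.standard_normal(12) for name in TIER_TERMS["1.4"]}

tiers = {tier: build_tier(contributions, tier) for tier in TIER_TERMS}
print({tier: values.shape for tier, values in tiers.items()})
```

Differences between tiers isolate individual effects. For example, Tier 1.3 minus Tier 1.2 leaves only the Mt. Pinatubo contribution.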

Tier 0 and 0.5 of BSVertOzone for the latitude band 30 to
35

Same as Fig.

As the Tier 1.x data sets are output from a regression model, they do not capture real-world year-to-year variability, only the variability for which basis functions are included in the regression model. These data sets are optimized for use in comparisons with CCM simulations that do not exhibit the same unforced variability as reality. They can be used for different purposes, e.g., to compare ozone radiative forcing with and without the effects of changes in EESC and greenhouse gases on ozone.

Ozone concentrations as extracted from BDBP v1.1.0.6 Tier 1.4 (red line) and BSVertOzone Tier 1.4 (blue line) for three different pressure levels and three different latitude bands.

Comparisons between SWOOSH (black line), BSVertOzone Tier 0.5 (blue line), and BSVertOzone Tier 1.4 (red line) for three different latitude bands and three different pressure levels.

Two examples of the Tier 0 database, and Tier 0.5, together with the
differences between Tier 1.x and Tier 0.5 data sets for the latitude bands
30 to 35

The Tier 0 and Tier 0.5 data sets both show considerably more variability
than the Tier 1.4 data set as the regression model is not capable of tracking
all of the variability in Tier 0 and Tier 0.5 (non-systematic differences
shown in the lower panels in Figs.

Due to the implemented improvements in the
construction of the BSVertOzone database over the previous version BDBP
v1.1.0.6, differences between both databases are to be expected. Ozone
concentrations from 1979 to 2016 as extracted from Tier 1.4 of both databases
at three different pressure levels and three different latitude bands are
shown in Fig.

Most measurements available for the latitude band and level described here
(85 to 90

BSVertOzone Tier 0.5 and Tier 1.4 are in very close
agreement, as would be expected. Tier 0.5 shows more interannual variability
since its missing monthly mean zonal mean values are filled with regression
model output describing the relationship between monthly mean ozone values
from Tier 0 and monthly mean total column ozone values (see Sect.

BSVertOzone v1.0, which is described in this paper, is
archived and publicly available at Zenodo (Zenodo is a research data
repository that was created by OpenAIRE and the European Organization for
Nuclear Research, CERN) with the DOI number

An updated
and further developed version of the vertically resolved ozone database, the
BDBP v1.1.0.6

As for the BDBP, BSVertOzone provides different tier data sets

Tier 0 contains the monthly mean zonal mean values that are directly calculated from the individual (adjusted) data sources, containing data gaps where no measurements were available.

Tier 0.5 monthly mean zonal means represent an intermediate filled
data set that is calculated from Tier 0 data. Missing monthly mean zonal
mean ozone values are filled with regression model output describing the
relationship between monthly mean ozone values from Tier 0 and monthly mean
total column ozone values obtained from the total column ozone database
described in

Tier 1.1 to Tier 1.4 are based on multiple linear regression model output. They differ in the combination of the contributions of the different basis functions used in the regression model. The ozone variability in these data sets is reduced compared to Tier 0 and Tier 0.5, since they describe only the variability for which basis functions were included in the regression model. Tier 1.4 in particular is therefore well suited for evaluating CCM output, where the CCM is not nudged to real-world dynamics.

A clear improvement compared to BDBP v1.1.0.6 is the provision of uncertainty estimates on each monthly mean zonal mean for all tiers. These uncertainties combine the uncertainties that are provided with each individual measurement and the uncertainties introduced by applying the homogenization method. The provided uncertainties are essential for more realistic comparisons with CCM simulations, and results of ozone variability analyses can be interpreted with more information about the confidence in the results.

There are several improvements that could be implemented in preparing the
measurements and in the homogenization method used. In the current version
(v1.0) of BSVertOzone, the global troposphere is only covered by ozonesonde
profile measurements. These profiles are available for many decades (see
Sect.

MLS is the only source of stratospheric ozone data for the last 10 years that is included in the current version of the database. As long as MLS is active and measures ozone, BSVertOzone will be updated regularly to include these MLS measurements. However, when MLS stops measuring ozone, alternative, and possibly additional, data sources for stratospheric ozone will need to be added to BSVertOzone to ensure a continuous time series of vertically resolved ozone into the future. Measurements from NASA's Ozone Mapping Profiler Suite (OMPS), from SCISAT's Atmospheric Chemistry Experiment Fourier transform spectrometer (ACE-FTS), or from the recently launched SAGE III instrument on the International Space Station (ISS) would be possible candidates to be included in BSVertOzone.

In addition to including more ozone measurements from different instruments, there
are some improvements in the processing of the measurements that are planned
to be implemented in the future. Firstly, as the CTM output used here as a
transfer standard to homogenize the satellite and ozonesonde measurements has
a temperature bias due to the underlying meteorological ERA-Interim
reanalysis (see Sect.

All available measurements for each latitude band, each level, and each month
are most likely not evenly distributed spatially and temporally, which can
result in a skewed (unrepresentative) monthly mean value and an
underestimation of the monthly mean uncertainty. The individual ozone
measurements should therefore undergo a spatial and temporal bias correction
before monthly mean zonal means are calculated, to represent the monthly
distribution correctly. Additionally, it might be necessary to consider
possible existing spatial and temporal autocorrelations between individual
data points. As mentioned in Sect.

In the upper stratosphere and mesosphere, ozone formation and destruction
happen so fast that ozone concentrations follow the availability of sunlight. As a result,
diurnal variations in ozone concentrations are observable in the upper
stratosphere and lower mesosphere

BH wrote the paper with support and input from all other co-authors. BH, SK, GEB, and JL were involved in combining the satellite measurements and performed the computations required for the generation of the BSVertOzone database. KN was involved in developing the software for the homogenization method described in the paper. Comparison between the BSVertOzone and SWOOSH data sets was performed by SMD. The SLIMCAT simulations were performed by MPC and SSD, who also provided the model output data for this study. MD assisted in writing the paper and provided valuable discussions around the methodology.

The authors declare that they have no conflict of interest.

This work was conducted under subcontract to NIWA under the Deep South National Science Challenge (CO1X1445). The SLIMCAT modeling work was supported by the NERC National Centre for Atmospheric Science (NCAS). We thank Wuhu Feng for help with SLIMCAT. The simulations were performed on the national Archer and Leeds ARC HPC facilities. We would like to thank Lucien Froidevaux for providing the measurement uncertainties of MLS ozone data and many helpful discussions. We would also like to thank two anonymous reviewers for their helpful comments. The article processing charges for this open-access publication were covered by a Research Centre of the Helmholtz Association.

Edited by: David Carlson
Reviewed by: two anonymous referees