Quality control procedures and methods of the CARINA database

T. Tanhua, S. van Heuven, R. M. Key, A. Velo, A. Olsen, and C. Schirnick

Leibniz-Institut für Meereswissenschaften, Marine Biogeochemie, Kiel, Germany
Department of Ocean Ecosystems, University of Groningen, Groningen, The Netherlands
Atmospheric and Oceanic Sciences Program, Princeton Univ., Princeton, NJ 08544, USA
Instituto de Investigaciones Marinas CSIC; Eduardo Cabello 6, 36208 Vigo, Spain
Bjerknes Centre for Climate Research, UNIFOB AS, Allégaten 55, 5007 Bergen, Norway and Department of Chemistry, University of Gothenburg, 41296 Göteborg, Sweden

Received: 18 June 2009 – Accepted: 4 August 2009 – Published: 20 August 2009
Correspondence to: T. Tanhua (ttanhua@ifm-geomar.de)
Published by Copernicus Publications.

assure the highest possible quality and consistency. All CARINA data were subject to primary QC, a process in which data are studied in order to identify outliers and obvious errors. Additionally, secondary QC was performed for several of the measured parameters in the CARINA database. Secondary QC is a process in which the data are objectively studied in order to quantify systematic differences in the reported values. This process involved crossover analysis; as a second step, the offsets derived from the crossover analysis were used to calculate corrections of the parameters measured on individual cruises using least squares models. Significant biases found in the data have been corrected in the data products, i.e. three merged data files containing measured, calculated and interpolated data for each of the three regions (Arctic, Atlantic and Southern Ocean). Here we report on the technical details of the quality control and on the tools that were developed and used during the project, including procedures for crossover analysis and least squares models. We also describe an interactive website for uploading results, plots, comments etc., which was of critical importance for the success of the project.

Introduction
CARINA is a database of carbon and carbon-relevant data from hydrographic cruises in the Arctic, Atlantic and Southern Oceans. The project started as an essentially informal, unfunded project in Delmenhorst, Germany, in 1999 during the workshop on "CO2 in the North Atlantic", with the main goal of creating a uniformly formatted database of carbon-relevant variables in the ocean to be used for accurate assessments of oceanic carbon inventories and uptake rates. The collection of data and the quality control of the data have been a main focus of the CARINA project. Both primary and secondary QC of the data have been performed. Experience with the previous GLODAP synthesis project (Key et al., 2004) demonstrated that a consistent data product can be produced from data collected on many different cruises by many different laboratories in different regions. It is essential, however, that the results obtained by the different methods of quality control are compared and systematically assessed. This assessment [...] in real time. In addition to the interaction over the web portal and email communication, three workshops were held where practical matters and the adjustments of data were discussed and agreed on. This report provides an overview of the methods and techniques used for the quality control of the database. A more comprehensive description of the complete CARINA database can be found in Key et al. (2009) as well as in other papers in this special issue.

Data Provenance and Structure
The CARINA database includes data and metadata from 188 oceanographic cruises or projects, i.e. entries to the cruise summary table (http://cdiac.esd.ornl.gov/oceans/CARINA/Carina_table.html). A few of the cruises listed in the table are collections of several cruises or time series stations. In addition, 52 reference cruises are included in the secondary QC to ensure consistency with historical data (i.e. WOCE/GLODAP). These reference cruises are not included in the CARINA database, but on several occasions adjustments are suggested for them that differ from those applied to the GLODAP database. Due to the volume of the data set, the CARINA data are divided into three regions: Arctic Mediterranean Seas (AMS), Atlantic Ocean (ATL) and Southern Ocean (SO), each with a working group that carried out the secondary QC. A few of the CARINA cruises are common to the ATL and SO groups, and a few are common to the ATL and AMS groups. In these cases the groups agreed on all adjustments, which provides a consistency check on the secondary QC between the CARINA working groups.
The CARINA database consists of two parts. The first part is the individual cruise files, where all the data reported by the measurement teams are stored. Quality flags accompany the data. In many cases the flags are those originally reported; in other cases the flags were assigned by R. Key. These files are in WHP (WOCE Hydrographic Program) exchange format. The first lines of each file are the condensed metadata. There are no calculated or interpolated values in the individual cruise files, except for pressure calculated from depth. No adjustments have been applied to any of these values, with the exception that all pH measurements were converted to the seawater pH scale at 25 °C. Recently, the international ocean carbon measurement community decided that new pH measurements would be reported on the total hydrogen scale, but that decision occurred too late to be incorporated into this project.

The second part of CARINA consists of three merged data files, one each for the Atlantic Ocean, Arctic Mediterranean Seas and Southern Ocean regions. These files contain all the CARINA data judged to be "good", and also include: 1) interpolated values for nutrients, oxygen and salinity if those data were missing and the interpolation could be made according to certain criteria, as described in Key et al. (2009); 2) calculated carbon parameters, e.g. if Total Carbon Dioxide (TCO2) and Total Alkalinity (TA) were measured, pH was calculated; and 3) CTD salinity in place of bottle salinity where the latter was missing or bad, if possible. Values for any of these cases have been given the quality flag "0". In many cases there are additional parameters in the individual cruise files which have not been included in the secondary QC, such as Δ14C, δ13C and SF6. Some of these are included in the merged data files as well.
A significant and time-consuming part of the CARINA synthesis was the data collection, the merging of subsets of data from individual cruises and the conversion of units to a common format/standard, see Key et al. (2009). The next step in the synthesis was the quality control (QC). Our quality control procedures comprise two distinct steps. First, the reported measurements are studied in order to identify outliers and obvious errors, i.e. primary QC. Second, we quantify systematic differences in the reported values in a process called secondary QC. Essentially, primary QC is a check of precision and secondary QC is a check of accuracy. These QC processes were performed on the data sets reported by the measurement teams, and are distinct from the quality assurance (QA) procedures originally performed by each cruise measurement team in order to ensure sufficient quality, which are part of the data collection and analysis procedures.

Primary QC is a process in which data are studied in order to identify outliers and obvious errors. While the methods used to identify questionable or bad data are objective, the actual flag assignment is subjective and is also highly dependent on the overall measurement precision of each parameter for each cruise. These outliers are either flagged, or the data are revised via direct contact with the data generators, for instance by calibrating total alkalinity values against CRMs that had been analyzed but were not yet certified for TA at the time of the cruise. During the WOCE program a system was developed to indicate the quality of each measured datum in a cruise data set. The system amounts to flagging each datum with an integer. Nine different flag values were defined (1-9, see Table 1 for definitions). This system, which has been continued in subsequent major ocean sampling programs and was used for the GLODAP data synthesis, was adopted for CARINA. It is important to note that data flagging, or primary quality control, deals only with data precision. The techniques used to assign these flags are usually insensitive to data bias, particularly in cases where all of the measurements of any parameter for a cruise are biased relative to results from other cruises.

A large number of the CARINA cruises predate the custom of data flagging. Consequently, as the raw data files were accumulated, flags were initialized with flag value 2 (good) for each measured variable (except temperature) or 9 (not measured) when there was no measured value. Subsequently, the various measured parameters were analyzed with the same software and techniques used for modern cruises. Essentially, various property-property plots are examined for small groupings of stations, looking for outliers. Notes were kept for each outlier in each plot.
Measurements that were outliers, generally in more than one type of plot, were flagged questionable (3) or bad (4), with the difference between these two flags being one of degree. Whenever possible, the data values flagged 3/4 were reported back to the person originally responsible for the measurements for confirmation. In cases of disagreement, the choice of the data generator was used. In general, whenever a datum was borderline between good and questionable, the good flag was retained. Additionally, during the flagging significantly more variability was allowed for points that were in the upper thermocline or near-bottom. Near-surface values were virtually never flagged 3/4, the only exceptions being totally unrealistic values and/or obvious clerical errors.
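The flag initialization and the later 3/4 assignment described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the use of `None` for a missing value and the example numbers are hypothetical, and the choice of which samples to flag remains the analyst's subjective decision.

```python
from typing import Optional

# WOCE flag values named in the text
GOOD, QUESTIONABLE, BAD, NOT_MEASURED = 2, 3, 4, 9

def initial_flag(value: Optional[float]) -> int:
    """Return 2 (good) for a reported value and 9 (not measured) for a missing one."""
    return NOT_MEASURED if value is None else GOOD

def flag_outliers(flags, questionable=(), bad=()):
    """Return a copy of flags with analyst-selected sample indices set to 3 or 4."""
    updated = list(flags)
    for i in questionable:
        if updated[i] == GOOD:
            updated[i] = QUESTIONABLE
    for i in bad:
        if updated[i] in (GOOD, QUESTIONABLE):
            updated[i] = BAD
    return updated

# Made-up oxygen values; the second sample was not measured.
oxygen = [251.3, None, 248.9, 310.0]
flags = [initial_flag(v) for v in oxygen]
# The analyst judged the last sample an outlier in more than one type of plot.
flags = flag_outliers(flags, questionable=[3])
print(flags)
```

Samples flagged 9 are never promoted to 3 or 4 in this sketch, since there is no value to judge.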

This method of data flagging is quite subjective regardless of who assigns the flags. With very few exceptions, all of the CARINA cruises were flagged at Princeton (R. Key). This does not mean that the flags were assigned correctly, but it does ensure that the assignment was done as consistently as possible and, additionally, that the flags were consistent with those assigned to the carbon parameters during preparation of GLODAP (Key et al., 2004). The only flags not assigned at Princeton were associated either with very recent cruises (generally WOCE or CLIVAR) or with data streams from labs which had participated in WOCE and routinely assigned flags to their own data (mostly CFC data from M. Rhein and/or R. Steinfeldt). Whenever data were submitted with flags, the flag values were simply checked for obvious/clerical errors. The CARINA data product incorporates one additional flag with value zero (0). This flag was also used in GLODAP. The zero flag indicates a datum that "could have been measured", but was approximated in some manner. There are three different uses for the zero flag in the data products: 1) instances where bottle salinity was missing or bad and consequently replaced with CTD salinity, 2) interpolated values for salinity, oxygen or nutrients, and 3) calculated carbon parameters.
The secondary quality control procedures used here (discussed below) critically examine the data using techniques quite different from the routine primary QC methods. In some cases additional data points are identified that are apparently spurious. In these cases the unusual data were reported back to Princeton to have the initial flag values reconsidered. Once all of the flag values were final, each cruise file was submitted to national data centers (CCHDO and CDIAC).

Secondary quality control
Secondary quality control is a process in which the data are objectively studied in order to quantify systematic biases in the reported values. The identified data biases are then subjectively compared to predetermined accuracy limits. Special consideration is given to the fact that some of the regions studied are known to have had real temporal change over the time period covered by the various cruises. Obviously, one does not want secondary QC to "erase" real temporal change. The nature of the secondary QC procedure is such that various data recording errors/outliers are also identified in the process, thus complementing the initial primary QC. Data from cruises that show significant bias are given an adjustment (either multiplicative or additive) that is applied to the data product (i.e. the merged data files for the three regions), but not to the individual cruise files (those remain as reported, with measured data). Data of lower than acceptable quality are also identified; questionable data are not included in the data product (but remain in the individual cruise files, with appropriate quality flags). The parameters considered in the secondary QC are salinity (and in a few cases CTD salinity), oxygen, nitrate, phosphate, silicate, alkalinity, total inorganic carbon (DIC), pH, CFC-11, CFC-12, CFC-113 and CCl4.

The most important tool in the secondary quality control of the CARINA data was the crossover analysis. Other approaches that were used include multiple linear regression (MLR) analysis and relations between measured parameters, such as CFC-11 vs. CFC-12. This report focuses on the crossover analysis and the least squares models (inversions) that followed to determine the corrections/correction factors needed to [...]

Crossover analysis
Crossover analysis is an objective comparison of deep water data from one cruise with data from other cruises in the same area. Crossover analysis has been performed earlier on, for instance, the WOCE and JGOFS data sets (e.g. Sabine et al., 1999; Gouretski and Jancke, 2001; Johnson et al., 2001; Sabine et al., 2005; see also http://cdiac.esd.ornl.gov/oceans/glodap/crossover.html), where the concept was laid out. These results have increased the internal consistency of, for instance, the GLODAP data set. In CARINA we have used the basic concept of crossover analysis. The result of a crossover analysis is an offset. Offsets are defined as the difference between two cruises, A and B, derived from a crossover analysis. If the offset for cruise A (relative to cruise B) is less than zero (or unity, for multiplicative parameters), then cruise A data would have to be increased in order to be consistent with cruise B (or vice versa). The offsets were quantified as multiplicative factors for nutrients, oxygen and CFCs, and as additive constants for salinity, DIC and alkalinity. There are several reasons for this division between additive and multiplicative offsets. Firstly, multiplicative offsets eliminate the problem of potentially negative values for any variable with measured concentrations close to zero, e.g. nutrients in surface water, or oxygen in low-oxygen areas. Also, for nutrient and oxygen analyses, problems in standardization are the most likely source of error, hence a multiplicative offset is deemed appropriate. For DIC, alkalinity and salinity an additive adjustment seemed most likely, due to, for instance, biases in the reference material used. Similarly, since pH is a logarithmic unit, only additive offsets can be considered. In the crossover analysis, clustering (cluster analysis) was often performed.
This refers to a subroutine in the crossover analysis for cruises that fall in more than one region, or for crossovers that cover such a large and diverse area that the crossover is divided into several "sub-crossovers", i.e. clusters. When we discuss crossover analysis in the following, clustering is included, i.e. the crossover analysis between two cruises takes the geographic distribution of the cruises into consideration, either manually or automatically. The station distribution in CARINA was such that the definition of "same area" was variable and defined subjectively on a case-by-case basis, but the compared stations were normally within 2 degrees of latitude, i.e. ∼222 km. Mostly, the two cruise tracks cross each other, hence the name "crossover analysis", but they can also be repeat or parallel cruise tracks, which is often the case for the CARINA data. We identified and analyzed more than 2100 individual crossovers for the CARINA data set.
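The sign convention for offsets described above can be illustrated with a small sketch. The function names and the example values are hypothetical; the only facts taken from the text are that the offset of cruise A relative to cruise B is a difference for additive parameters and a ratio for multiplicative ones, and that a value below zero (or below unity) means cruise A would have to be increased.

```python
def additive_offset(value_a: float, value_b: float) -> float:
    """Offset of cruise A relative to B for salinity, DIC, alkalinity or pH."""
    return value_a - value_b

def multiplicative_offset(value_a: float, value_b: float) -> float:
    """Offset of cruise A relative to B for nutrients, oxygen or CFCs."""
    if value_b == 0:
        raise ValueError("multiplicative offset is undefined for a zero reference")
    return value_a / value_b

def bring_to_reference(value_a: float, offset: float, multiplicative: bool) -> float:
    """Remove the offset from a cruise A value so that it matches cruise B."""
    return value_a / offset if multiplicative else value_a - offset

# Made-up deep-water means: alkalinity in umol/kg, nitrate in umol/kg.
alk_offset = additive_offset(2298.0, 2302.0)     # negative: A must be increased
no3_offset = multiplicative_offset(19.6, 20.0)   # below unity: A must be increased
print(alk_offset, no3_offset)
print(bring_to_reference(2298.0, alk_offset, multiplicative=False))
print(bring_to_reference(19.6, no3_offset, multiplicative=True))
```

A multiplicative offset cannot turn a small positive concentration negative, which is one of the reasons given above for using it for nutrients and oxygen.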
Since the upper water column is more sensitive to variability on various time-scales than the deep ocean, only the deep part of the water column was considered for the analysis. For the ATL and SO regions, this minimum depth (i.e. pressure) was 1500 dbar. However, due to the deep mixed layers sometimes found in the AMS region, the minimum depth was set to 1900 dbar for the Nordic Seas region, as this is deeper than the ventilation depths observed over the time span covered by CARINA (Ronski and Budeus, 2005). The crossover analysis was performed on either pressure, density (i.e. sigma-4) or potential temperature surfaces. To account for vertical shifts of properties (e.g. due to internal waves), the crossover analysis is normally done on density surfaces. However, in the Nordic Seas the crossover analysis was made on depth surfaces due to the small density gradients there. Arguments for the use of theta over sigma include the superior measurement accuracy of temperature and its independence of other parameters (in particular, biases in salinity are impossible to clarify in sigma-space, since sigma itself is a function of salinity). One of the collections of software routines that were developed for the CARINA effort determined offsets in each of the three spaces simultaneously, and its results allow for comparison. The quality of the determinations of offsets, as expressed by the standard deviation, was generally highest using density, followed closely by theta, with comparisons in depth-space regularly yielding spurious results. In most cases, however, there were only small differences in the offsets calculated in the three different ways, which gives us some confidence in the analysis, see below.
As a first step in the crossover analysis, the profiles of the parameter in question for all stations included in a crossover were interpolated with a Piecewise Cubic Hermite Interpolating scheme. An important feature of this algorithm is that interpolated values almost never exceed the range spanned by the data points. That is, the Hermitian function has a low tendency to "ring". Large vertical gaps in the data were not interpolated, the definition of "large" being depth-dependent; larger gaps were allowed in the deeper part of the profile. In the case of density, the profiles were interpolated in such a way that the interpolated values were roughly equally distributed in depth space. Practically, this was done by letting a density profile that was typical for the area determine the interpolation distances in density space. This was done in order to have sufficient weight (or "interpolated data points") in the deep water column, where the properties of the water are supposed to change only slowly and where the density gradients are often small. Essentially, we distinguish between three routines for the crossover analyses carried out during the CARINA project: the manual crossover routine and the two automatic routines, running-cluster and the cnaX scripts, see below. These codes and routines evolved during the project, so that manual crossover results obtained in the early part of the project were not necessarily made with the same assumptions as those made in the later part of the project. This also applies to the automated routines, but in this case the full data set was analyzed again when new routines were developed. However, even if the codes are slightly different, the results were mostly very similar, as we will see below.
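The interpolation step described above can be sketched in a few lines of standard-library Python. This is not the CARINA code (which was written in matlab): it is a simplified piecewise cubic Hermite interpolant with Fritsch-Carlson style slopes, the end slopes are plain one-sided secants, and the depth-dependent gap rule in `default_max_gap` is a purely hypothetical placeholder, since the text does not give the actual criteria.

```python
from bisect import bisect_right

def pchip_slopes(x, y):
    """Node slopes after Fritsch and Carlson; x must be strictly increasing."""
    h = [x[i + 1] - x[i] for i in range(len(x) - 1)]
    d = [(y[i + 1] - y[i]) / h[i] for i in range(len(x) - 1)]
    m = [0.0] * len(x)
    m[0], m[-1] = d[0], d[-1]  # simple one-sided end slopes (a simplification)
    for k in range(1, len(x) - 1):
        if d[k - 1] * d[k] > 0:  # same sign: weighted harmonic mean of the secants
            w1 = 2 * h[k] + h[k - 1]
            w2 = h[k] + 2 * h[k - 1]
            m[k] = (w1 + w2) / (w1 / d[k - 1] + w2 / d[k])
        # otherwise the node is a local extremum or flat, and the slope stays 0
    return m

def default_max_gap(pressure):
    """Hypothetical rule: the tolerated data gap grows with pressure (dbar)."""
    return 300.0 + 0.2 * pressure

def interpolate_profile(pressure, values, targets, max_gap=default_max_gap):
    """Interpolate values onto targets; None marks extrapolation or a large gap."""
    m = pchip_slopes(pressure, values)
    measured = dict(zip(pressure, values))
    result = []
    for q in targets:
        if q in measured:                     # a sampled level needs no interpolation
            result.append(measured[q])
            continue
        if q < pressure[0] or q > pressure[-1]:
            result.append(None)               # never extrapolate
            continue
        i = bisect_right(pressure, q) - 1
        h = pressure[i + 1] - pressure[i]
        if h > max_gap(pressure[i]):
            result.append(None)               # gap between samples is too large
            continue
        t = (q - pressure[i]) / h
        h00 = (1 + 2 * t) * (1 - t) ** 2      # cubic Hermite basis functions
        h10 = t * (1 - t) ** 2
        h01 = t * t * (3 - 2 * t)
        h11 = t * t * (t - 1)
        result.append(h00 * values[i] + h10 * h * m[i]
                      + h01 * values[i + 1] + h11 * h * m[i + 1])
    return result

pressure = [1500.0, 2000.0, 2500.0, 3500.0]   # dbar, made-up example profile
salinity = [34.90, 34.92, 34.92, 34.95]
print(interpolate_profile(pressure, salinity, [1750.0, 2250.0, 3000.0]))
```

Because the slopes are set to zero at flat or extremal nodes, the interpolated values stay within the range of the neighbouring samples, which is the "low tendency to ring" noted in the text.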

Due to the large number of crossovers in CARINA, the task of manually generating crossovers and entering the results in an online table soon became overwhelming in terms of workload. Even though the process of manually generating the crossovers led to quality-checked results, the process of manually entering the values in a table is prone to typos and errors that might bias the results of the inversions. Even though the automatically generated crossovers were, in general, used for the inversions, the manually generated crossovers were invaluable as reference points in the decision process for suggesting adjustments. The matlab routines used for the secondary QC of CARINA, as well as all figures generated during the secondary QC, can be viewed [...]

Manually generated crossovers
For the manually performed crossovers, an average profile (and its standard deviation) was calculated using all stations in each cruise that were closer to any station of the other cruise than the minimum horizontal distance. For this averaging process, the interpolated profiles were used. These two average profiles were then compared to each other in a second step, and the weighted offset and standard deviation of the crossover were calculated. The disadvantage of this method is that the crossover can cover a large area with potentially very different hydrography, for instance for repeat cruise tracks, and the offset for the crossover might thus be biased. In this case, groups of stations for each cruise within a hydrographic regime were sought, and a crossover was divided into several sub-crossovers, "clusters". The analyst then had the choice of using the average of all clusters to calculate the crossover offset, or of discarding clusters in areas of known high variability, such as just south of the Greenland-Scotland ridge. The analyst then entered the results from the crossover analysis on the CARINA website, and uploaded the figure for the crossover.

The "running-cluster" crossover routine
The crossover version called "running-cluster" was mainly used for automatically generated crossovers. In this routine the interpolated profile from each station in cruise A was compared to each interpolated profile from all cruise B stations within the maximum distance for a valid crossover, and a difference profile was calculated for each such pair. This process was repeated for each station in cruise A, which normally results in several difference profiles (up to hundreds for a repeat section). The crossover offset and its standard deviation were calculated as the weighted mean and standard deviation of the difference profiles of each crossover pair, i.e. the parts of the profile with low variability (often, but not always, the deeper parts) weigh more heavily in the calculation. This way of performing the crossover analysis has the advantage that only individual stations within the minimum horizontal distance are compared to each other. Hence, even for repeat cruises covering several different oceanographic regimes, the offset between these cruises can be calculated in a straightforward manner. An example of a crossover is shown in Fig. 1.

cnaX crossover routine
This routine is essentially a fully automated implementation of the manual approach described above. Please note that, for the sake of readability, these paragraphs deal only with the determination of offsets in depth-space, but the software concurrently calculates offset profiles in theta- and sigma4-space as well. The "cnaX" collection of matlab routines was developed over several months during the CARINA project and incorporates many of the concepts mentioned above into a fully automatic analysis of the CARINA input dataset. The level of automation is variable and is set pre-run by the user. In its fastest form, user input is restricted to setting thresholds for criteria relating to cruise and sample inclusion and to the automatic quality assessment of results. When set up, a full run from raw-data loading to output of a final set of recommendations for cruise parameter adjustments takes about 8 h on a regular computer for the complete CARINA data set. This includes the production of several tens of thousands of figures and dozens of tables, useful for tracing the script's steps to determine problem areas. At the highest level of user interaction, each of the steps concerning clustering and quality assessment requires user input; a full run may take up to several days to complete this way. This method generally results in a somewhat reduced uncertainty in the final output, since badly determined offsets that may slip through the automatic quality check can be caught by the user's manual interaction. In either form, the routine first loads all individual cruise data. During this step the user is allowed to specify subdivision of certain cruises if sufficient reason exists to assume changes in instrument calibration during, for instance, port calls or instrument changes. [...] for each cruise for possible user inspection. After this a crude scan is made to check which cruises share a geographical area, and therefore may cross each other.
During a subsequent refining step these potential matches are more closely analyzed. Stations are only considered for possible relevance for crossover analysis when certain criteria for parameter availability, minimum sample depth and number of samples are met. This approach saves considerable time over a full station-vs.-station proximity check.
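A two-stage search of this kind can be sketched as follows. The text does not say how the crude scan is implemented, so the padded bounding-box test used here is an assumption, as are all function names and the example coordinates; the 222 km limit corresponds to the roughly 2 degrees of latitude mentioned earlier. The criteria on parameter availability, sample depth and sample count are left out.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def distance_km(p, q):
    """Great-circle (haversine) distance between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def bounding_box(stations):
    lats = [lat for lat, _ in stations]
    lons = [lon for _, lon in stations]
    return min(lats), max(lats), min(lons), max(lons)

def may_cross(stations_a, stations_b, margin_deg=2.0):
    """Crude scan: do the padded bounding boxes overlap? Ignores the date line."""
    a, b = bounding_box(stations_a), bounding_box(stations_b)
    # Widen the longitude margin towards the poles, where meridians converge.
    max_abs_lat = max(abs(v) for v in (a[0], a[1], b[0], b[1]))
    lon_margin = margin_deg / max(cos(radians(max_abs_lat)), 0.01)
    return (a[0] - margin_deg <= b[1] and b[0] - margin_deg <= a[1]
            and a[2] - lon_margin <= b[3] and b[2] - lon_margin <= a[3])

def station_pairs(stations_a, stations_b, max_km=222.0):
    """Refining step: index pairs (i, j) of stations closer than max_km."""
    if not may_cross(stations_a, stations_b):
        return []  # skip the full station-vs-station check
    return [(i, j)
            for i, p in enumerate(stations_a)
            for j, q in enumerate(stations_b)
            if distance_km(p, q) <= max_km]

cruise_a = [(60.0, -20.0), (61.0, -20.0)]   # made-up station positions
cruise_b = [(60.5, -19.0), (70.0, -19.0)]
cruise_c = [(-40.0, 10.0)]
print(station_pairs(cruise_a, cruise_b))
print(station_pairs(cruise_a, cruise_c))
```

The cheap box test rejects cruise pairs in different basins before any station distances are computed, which is where the time saving comes from.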
Knowing which cruises can be compared with which others (i.e. the cruise-pairs), a clustering method is employed to group stations from cruise A in close proximity to stations of cruise B into distinct geographical clusters. This clustering is either user-controlled, allowing the analyst to specify further clusters, reduce clustering or discard clusters, or fully automatic, in which case stations are progressively further clustered until the spatial dimensions of each cluster meet pre-set criteria. Limits also apply to the minimum number of stations per cluster that should be retained. The results of the clustering operations are stored in tabular form as well as drawn on maps. After all cruise-pairs have had their "crossover stations" geographically clustered, offsets are determined for each parameter for each cluster. As a first step the profiles are interpolated to about 75 levels in the range from 2000 m to the deepest sample of either of the cruises, with gradually decreasing intervals with increasing depth. Offsets are determined and expressed both as additive offsets and as multiplicative offsets, taking the considerations outlined above into account. These calculations closely follow the procedure outlined by Johnson et al. (2001).
After this "discrete clusters" method of determining offsets, a "running cluster" approach is taken (as detailed above and comparable to the "running-cluster" routines) for another determination of offsets. For cruise-pairs in which both cruises have several dozen stations in close proximity to stations of the other cruise (e.g. repeat lines), the running cluster routine potentially results in several hundred station-pairs. Of these, only the 100 most closely spaced station-pairs are further considered. The measurements at each of these stations are interpolated to ∼75 depth levels. For each station-pair the offset profile is determined by subtraction/division of B's interpolated profile from/by A's. The offset profiles are subsequently averaged to get the mean offset profile (MOP) and an associated offset standard deviation profile (OSDP) for this cruise-pair. The interpolated values in the MOP are averaged, weighted by the OSDP, resulting in a weighted mean offset (WMO) for this cruise-pair. The uncertainty of the value thus determined is expressed by the associated weighted mean offset standard deviation (WMOSD). It is these last two values that are used as input for the inversion. As mentioned above, the routine concurrently determines WMO and WMOSD in depth-, theta- and sigma4-space. The determined WMO with the smallest associated WMOSD (generally the one in sigma4-space) is considered to be the best estimate of the cruise-pair's offset.
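The chain from offset profiles to MOP, OSDP, WMO and WMOSD can be sketched as below. The text states only that the MOP is averaged and weighted by the OSDP; the inverse-variance weights, the definition of the WMOSD as the weighted spread of the MOP about the WMO, and the small floor on the standard deviation are assumptions made for this illustration, as are the function name and the example values.

```python
from math import sqrt
from statistics import mean, stdev

def weighted_mean_offset(offset_profiles, min_sd=1e-6):
    """Reduce station-pair offset profiles (equal length, at least two) to a WMO.

    Returns (mop, osdp, wmo, wmosd).
    """
    levels = list(zip(*offset_profiles))            # regroup values by depth level
    mop = [mean(level) for level in levels]         # mean offset profile
    osdp = [max(stdev(level), min_sd) for level in levels]  # offset std. dev. profile
    weights = [1.0 / sd ** 2 for sd in osdp]        # assumed inverse-variance weights
    total = sum(weights)
    wmo = sum(w * m for w, m in zip(weights, mop)) / total
    wmosd = sqrt(sum(w * (m - wmo) ** 2 for w, m in zip(weights, mop)) / total)
    return mop, osdp, wmo, wmosd

# Three made-up additive offset profiles (A minus B) on three depth levels.
profiles = [
    [-2.0, -1.0, -1.5],
    [-4.0, -1.2, -1.4],
    [-3.0, -0.8, -1.6],
]
mop, osdp, wmo, wmosd = weighted_mean_offset(profiles)
print(mop, osdp, wmo, wmosd)
```

In this example the shallowest level has the largest scatter between station-pairs and therefore contributes least to the weighted mean, in line with the statement that the low-variability part of the profile weighs more heavily.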

The overall quality of the offset determined this way is assessed through the use of several conditional statements considering the number of stations, samples and difference profiles, and the OSDP. Alternatively, the user can manually rate the quality of the determined offset. This quality assessment and the WMOSD are later used in the inversion for weighting the offsets. The running-cluster routine was determined to yield superior results to the discrete-cluster routine, as it offers more rapid processing and is more objective. The results of the cnaX routines were made available to the analysts on the website and used for the determination of adjustments.

Inversions
A second step in the secondary QC uses the offsets and standard deviations derived from the crossover analysis to calculate corrections of the parameters measured on individual cruises using least squares models (Wunsch, 1996), i.e. inversions, following the methodology described in Johnson et al. (2001). The inversion produces a correction factor (for multiplicative corrections) or a correction (for additive corrections) for each of the cruises in the analysis. Indiscriminate application of these factors might produce the most uniform data set, but this would also remove real temporal trends, an undesirable side effect. Therefore the corrections and correction factors were manually evaluated; those that were actually applied to the data product are called adjustments, to avoid confusion with the corrections suggested by the inversion process. Johnson et al. (2001) presented three models of different complexity to adjust five parameters for World Ocean Circulation Experiment (WOCE) cruises in the Pacific Ocean. The conclusion of Johnson et al. (2001) was that the intermediate-complexity model, Weighted Least Squares (WLSQ), performed most satisfactorily for this analysis.

In this model the standard deviation of each crossover is included in the calculation, but no a priori assumptions are made regarding the quality of the measurements. In the slightly more complex model, Weighted Damped Least Squares (WDLSQ), a priori assumptions on the quality of the data are made; essentially, the maximum allowed range of adjustments (the model error) is set for each cruise. As Johnson et al. (2001) point out, this limitation tends to decrease the adjustments of individual cruises at the cost of the overall performance of the model. Finally, the simplest of the models, Simple Least Squares (SLSQ), does not take the uncertainty of the offset values into account, and is considered too simple.
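To make the difference between WLSQ and WDLSQ concrete, the following sketch solves a tiny additive inversion with the standard library only. It is not the formulation of Johnson et al. (2001) or the CARINA matlab code: the function names are hypothetical, the sign convention (offset = cruise a minus cruise b, corrections chosen so that the corrected cruises agree) follows the offset definition given earlier, the zero-sum constraint used in the undamped case is an assumption needed because crossovers only constrain differences between cruises, and the damping term is one plausible way to express a per-cruise model error.

```python
def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: are all cruises linked by crossovers?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x

def invert_offsets(n_cruises, crossovers, model_error=None):
    """Additive corrections from crossovers given as (a, b, offset, sd).

    Minimizes sum(((offset + x[a] - x[b]) / sd)**2), plus sum((x[i] / e[i])**2)
    when model_error is given (damped case).
    """
    normal = [[0.0] * n_cruises for _ in range(n_cruises)]
    rhs = [0.0] * n_cruises
    for a, b, offset, sd in crossovers:
        w = 1.0 / sd ** 2
        normal[a][a] += w
        normal[b][b] += w
        normal[a][b] -= w
        normal[b][a] -= w
        rhs[a] -= w * offset
        rhs[b] += w * offset
    if model_error is None:
        # Undamped: pin the free constant by requiring the corrections to sum to zero.
        for i in range(n_cruises):
            for j in range(n_cruises):
                normal[i][j] += 1.0
    else:
        for i, e in enumerate(model_error):
            normal[i][i] += 1.0 / e ** 2
    return solve_linear(normal, rhs)

# Made-up example: cruise 0 reads 4 units high and cruise 2 reads 2 units low.
crossovers = [(0, 1, 4.0, 1.0), (1, 2, 2.0, 1.0), (0, 2, 6.0, 1.0)]
wlsq = invert_offsets(3, crossovers)
wdlsq = invert_offsets(3, crossovers, model_error=[10.0, 0.1, 10.0])
print(wlsq)
print(wdlsq)
```

With no damping the corrections are spread over all three cruises. Giving cruise 1 a small model error keeps its correction close to zero, so the other two cruises are pulled towards it.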
For instance, both the work of Johnson et al. (2001) and Lamb, et al. (2002) dealt 15 with a set of cruises with presumable similar quality since they all aimed at meeting the WOCE Hydrographic Program (WHP) standards. The CARINA data base, in contrast, is a collection of cruises that covers more than 2 decades with different scope and standards of the measurements. The quality of the measurements is more heterogeneous for CARINA than for the WOCE data set. Therefore, the CARINA group 20 preformed both WLSQ and WDLSQ inversions of the crossover results, and the results from both were considered in determining the adjustment of a cruise. Since there is such a heterogeneity among the CARINA data, for the North Atlantic and Southern Ocean areas, a number of core cruises were identified, see above, where the quality of the data were, in general, assumed to follow WHP targets. The core cruises thus 25 includes the reference cruises (i.e. WOCE/GLODAP cruises) for the SO area; for the NA area a number of non-reference cruises were additionally included as core-cruises (Tanhua et al., 2009). These core cruises were generally given low model error for the WDLSQ analysis whereas the non-core cruises were given high model errors, i.e. 2,2009 Quality control procedures and methods of the CARINA database Interactive Discussion the corrections of the non-core cruises were allowed to be larger than those for the core cruises, Table 2. In this way, the model tended to adjust the CARINA dataset towards the values of the core-cruises. In most cases, however, there were only small differences between the WLSQ and WDLSQ inversions. As the CARINA project didn't blindly follow the inversion results we used both the WLSQ and WDLSQ models for the 5 analysis; see Fig. 2 for an example of the result of an inversion for the multiplicative parameters for the CARINA-ATL data set. There are some important additions/alterations to the inversion methods used in CARINA compared to Johnson et al. 
(2001). Firstly, since the time span over which the hydrographic surveys took place is large (up to 3 decades) and trends in deep water properties can be expected on this time frame over large parts of the CARINA domain, a time factor, $K_T$, was weighted into the inversions. This factor was calculated as unity plus 0.1 times the time in years between the two cruises in a crossover, i.e. $K_T = 1 + \Delta\mathrm{year} \times 0.1$, and the standard deviation of the crossover was multiplied by it. This reduces the impact of a crossover on the inversion if the time elapsed between the two cruises is large. One can argue that the time factor should be larger for more active parts of the ocean, for instance the Labrador Sea, than for "calmer" parts of the ocean, such as the subtropical eastern Atlantic. However, since such a classification tends to be rather arbitrary and difficult to implement, no such area-dependent weighting was applied for CARINA. This was instead done manually by the analyst in the final determination of the adjustment. That is, data from a "variable" area were generally allowed larger offsets (i.e. higher suggested corrections from the inversion) than data from a "quiet" area of the ocean before an adjustment was accepted.
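As a small worked illustration of the time factor, the sketch below computes $K_T$ and the inflated crossover standard deviation. The function names and the example years are hypothetical, and taking the absolute value of the time difference is an assumption (the text only speaks of the time between the two cruises).

```python
def time_factor(year_a: float, year_b: float, rate: float = 0.1) -> float:
    """K_T = 1 + (years between the two cruises) * 0.1."""
    return 1.0 + abs(year_a - year_b) * rate


def inflated_std(std: float, year_a: float, year_b: float) -> float:
    """Crossover standard deviation multiplied by K_T."""
    return std * time_factor(year_a, year_b)


# Hypothetical crossover: cruises from 1991 and 2001, std of 1.5 units.
k_t = time_factor(1991, 2001)
print(k_t, inflated_std(1.5, 1991, 2001))
```

A crossover between cruises ten years apart thus enters the inversion with twice its standard deviation; if weights are taken as inverse variances (an assumption here), that corresponds to one quarter of the weight of a same-year crossover.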

Determination of adjustments
Many potential sources of uncertainty can complicate an otherwise straightforward assessment of cruise biases. Examples include: 1) temporal variability and long-term trends in parameter values at a particular location in the ocean, 2) drifting or variable measurement precision and accuracy during a cruise, 3) profile interpolation errors, and 4) differences between routines used to determine offsets. Since biases cannot be determined with absolute accuracy, the CARINA working group agreed to correct only biases greater than a certain threshold value (Table 2). These thresholds reflect the expected minimum bias that should be possible to determine with reasonable certainty. For highly variable regions or for cruises with few deep data points, larger uncertainty was allowed in the manual evaluation of corrections. Aided by the corrections suggested by the inversions and the offsets of all crossovers for a cruise/parameter combination, the analyst scrutinized the potential bias for each cruise and parameter. Particular emphasis was put on crossovers with core cruises and on crossovers with good statistics (i.e. a larger number of stations/samples) in "quiet" parts of the ocean (i.e. with less variability). In many cases, the analyst considered additional evidence, such as the relation to another parameter (e.g. nitrate/phosphate or CFC11/CFC12 ratios) and whether or not certified reference material (CRM) was used. Finally, an adjustment was suggested by the analyst and entered into the online adjustment table. Only significant adjustments were applied, generally rounded to the nearest full percent, µmol/kg or ppm (for salinity). This somewhat subjective process makes the CARINA project different from most previously published consistency analyses of hydrographic data. The subjectivity possibly makes CARINA more prone to errors made by the analyst, but at the same time makes the CARINA adjusted data potentially more robust.
In case of doubt, the CARINA team always tried to err on the side of not making an adjustment. The CARINA adjustment decision mechanism was similar to that of GLODAP, but the CARINA mechanics used to quantify the adjustment were much more sophisticated than those of GLODAP.
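The purely numerical part of this decision (no adjustment below the threshold, otherwise rounding to a full unit) can be sketched as follows. This is only an illustration: the function name and the example threshold of 4 units are hypothetical (the actual thresholds are those of Table 2), and the sketch deliberately ignores the additional evidence that the analysts weighed manually.

```python
def proposed_adjustment(correction: float, threshold: float,
                        step: float = 1.0) -> float:
    """Return 0 below the threshold, otherwise the correction rounded to `step`.

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    if abs(correction) < threshold:
        return 0.0
    return round(correction / step) * step


# Hypothetical suggested corrections for one parameter, threshold of 4 units.
for suggested in (2.6, 4.3, -6.7):
    print(suggested, proposed_adjustment(suggested, threshold=4.0))
```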
The lines of evidence for an adjustment (or the absence thereof) can be found at http://cdiac.ornl.gov/oceans/CARINA/CARINA QC.html in the form of comments and relevant figures, see Sect. 6. These adjustments were vetted during the final CARINA workshop in Paris in June of 2008. Additionally, a second crossover analysis and inversion was made using the adjusted CARINA data. Any adjustments larger than the threshold were scrutinized again by the analysts. In a few cases, this step revealed cruises for which the adjustment needed revision. A few changes to the adjustments agreed on during the Paris meeting were made in consultation between the three group leaders.

Evaluation of the methods used
Since several different routines and approaches were used for the secondary QC, we have evaluated the consistency of the different methods and analyzed how these differences affect the secondary QC.

Manual vs. automatic crossovers
Both manually and automatically generated crossovers were used for the analysis. In order to evaluate any biases between the two methods, we have plotted the difference between manually performed crossovers and crossovers generated automatically with the running-cluster routine for the Atlantic subset of CARINA in Fig. 3. The manual results are found in the crossover table on the CARINA website, see below. These crossovers were generated by a number of different analysts, potentially using different maximum horizontal length scales, clustering, minimum depth etc. Even though there are a number of data points that are significantly different from zero, the mean difference is close to zero for all parameters except phosphate. The few data points with large deviations from zero difference are due to mis-entered values in the crossover table and to crossovers that were performed on few data points, i.e. that have low significance. It is clear that there are some differences in the results from the manually and automatically generated crossovers, but the overall difference is small, although it could potentially be important for some cruises.
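A comparison of this kind reduces to a few summary statistics over the crossovers that both methods have in common. The sketch below is an assumption about how such a comparison could be set up, not the routine used for Fig. 3; the dictionary layout (offsets keyed by a pair of cruise identifiers) and the example values are hypothetical.

```python
import math
import statistics
from typing import Dict, Tuple

Offsets = Dict[Tuple[str, str], float]  # (cruise_a, cruise_b) -> offset


def compare_methods(first: Offsets, second: Offsets) -> Dict[str, float]:
    """Mean, standard deviation and RMSE of the differences for shared crossovers."""
    shared = sorted(set(first) & set(second))
    diffs = [first[key] - second[key] for key in shared]
    return {
        "n": float(len(shared)),
        "mean": statistics.mean(diffs),
        "std": statistics.stdev(diffs),  # sample std; needs at least 2 crossovers
        "rmse": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
    }


# Hypothetical offsets from a manual and an automatic analysis.
manual = {("A", "B"): 1.0, ("A", "C"): 2.0, ("B", "C"): 0.5}
automatic = {("A", "B"): 0.0, ("A", "C"): 3.0, ("C", "D"): 1.0}
summary = compare_methods(manual, automatic)
print(summary)
```

Only crossovers present in both sets enter the statistics, so a method that finds more crossovers does not bias the comparison by itself.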

Automatic crossover routines
The offsets determined by the two different automatic crossover routines discussed above (Sects. 4.1.2 and 4.1.3) are compared to each other to evaluate any biases, Fig. 4. The first observation is that the cnaX routine generally found more crossovers than the RunningCluster (RC) routine. This is mostly due to the spatial distance within which to search for crossovers being larger for cnaX than for RC; cnaX will thus find more crossovers, at an increased risk of introducing errors due to spatial variability. The second observation is that performance differs between parameters. For instance, determinations of oxygen offsets by the two routines agree to approximately 1%, while silicate offsets are determined with a much lower agreement of about 7% (measured as the standard deviation of the difference between the two methods), Fig. 4. Given that each routine operates in the same way for all parameters, the difference must be caused by the silicate data themselves. Profile interpolation is assumed not to be the cause of the problem, since the vertical sample distribution is in most cases identical for nutrients and oxygen, both generally being sampled at full resolution. A possible cause of the difference is high station-to-station variability of silicate values, due either to analytical difficulties or to large natural variability of silicate values in the deep water. This would make the result of the offset determinations strongly dependent on the stations included in the crossover. The second alternative is certainly possible considering the large difference in silicate concentrations between the southern and northern end-members of the Atlantic deep waters. In this case the different "search radius" would be important for the result.
On the other hand, offsets determined for alkalinity are in very good agreement between the two routines, suggesting that although large biases between cruises exist for this parameter, the alkalinity values within a cruise are measured with a constant bias. Taking the RMSE of the difference between the offsets determined by the two routines as an upper limit of detection for biases, we conclude that the thresholds used by the CARINA team for applying adjustments are on the optimistic side for silicate and ...

Adjustments
As discussed above, the results of the inversions were only used as a guide by the analyst when determining the adjustments, and almost no adjustments below the threshold were applied to the data product. The inversions often suggest corrections that are smaller than these limits. In order to evaluate the effects of the threshold, we analyzed how the weighted mean offsets for all crossovers and parameters in the CARINA data set are affected by using two different sets of adjustments: in the first case all the corrections suggested by the inversion were blindly applied to all the data; in the second case we applied only adjustments larger than the threshold (see Table 2). The overall performance is somewhat better if all corrections are applied, Table 3, but the difference between the two cases is not particularly large. Thus, application of all corrections suggested by the inversion results in the most internally consistent dataset, but implicitly suggests a confidence in the adjustments that is not warranted for a relatively small gain in internal consistency. An example of the effect of the adjustments on the offsets for individual crossovers can be seen in Fig. 5, where the offsets of all crossovers for alkalinity are plotted, both before any adjustments are applied and after adjustments larger than the threshold are applied. The mean absolute offset clearly decreases on application of the adjustments.
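The two cases can be illustrated with a minimal sketch for an additive parameter. It is not the calculation behind Table 3: the inverse-variance weighting, the use of absolute offsets, the sign convention (offset = cruise a minus cruise b, adjustment added to each cruise) and all example values are assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple

# (cruise_a, cruise_b, offset = a - b, standard deviation); hypothetical layout
Crossover = Tuple[int, int, float, float]


def weighted_mean_abs_offset(crossovers: Sequence[Crossover],
                             adjustments: Dict[int, float]) -> float:
    """Inverse-variance weighted mean of |offset| after applying adjustments."""
    numerator = 0.0
    denominator = 0.0
    for a, b, offset, std in crossovers:
        residual = offset + adjustments.get(a, 0.0) - adjustments.get(b, 0.0)
        weight = 1.0 / std ** 2
        numerator += weight * abs(residual)
        denominator += weight
    return numerator / denominator


def above_threshold(corrections: Dict[int, float],
                    threshold: float) -> Dict[int, float]:
    """Keep only corrections whose magnitude reaches the threshold."""
    return {k: v for k, v in corrections.items() if abs(v) >= threshold}


# Hypothetical crossovers and inversion-suggested corrections.
crossovers = [(0, 1, -2.0, 0.5), (1, 2, 2.4, 0.5), (0, 2, 0.3, 0.5)]
suggested = {1: -2.0, 2: 0.3}

unadjusted = weighted_mean_abs_offset(crossovers, {})
all_applied = weighted_mean_abs_offset(crossovers, suggested)
thresholded = weighted_mean_abs_offset(crossovers, above_threshold(suggested, 1.0))
print(unadjusted, thresholded, all_applied)
```

In this toy example, applying every suggested correction gives the smallest remaining offset, while applying only the correction above the threshold already removes most of the inconsistency.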

We have also directly compared the suggested corrections derived from the cnaX scripts with the adjustments that were actually applied to the CARINA data product, Fig. 6. The differences between the corrections and adjustments are a measure of the subjectivity of the CARINA secondary QC. There is generally a reasonable agreement between adjustments and corrections. However, this should not be considered an indication of the correctness of the adjustments, since the two measures are not independent: the results from the inversions were generally followed with small modifications. Note that there are several points with an adjustment value of zero for which a significant correction was suggested by the inversion (crosses in Fig. 6), particularly for cruises conducted in areas of known high temporal variability. The reverse is also true, although less often, i.e. an adjustment has been made even though the inversion suggests a correction smaller than the threshold. Further inversions could have been performed (with the adjusted system) to find new model solutions that iteratively maximize the internal consistency of the system. A first step in this iterative process was taken, but as we found that it did not significantly improve the result above the level of uncertainty, this was not pursued further.

The web-based crossover workspace
The CARINA team included a large number of scientists from all over the world working simultaneously on the quality control, and rapid communication of individual efforts and results was important for the success of the project. To facilitate this, an interactive internet-based platform was developed. To the user, the website provides important functions and tables, of which the crossover and adjustment tables are the most important. We will first briefly discuss some of the functionality of the website from a user's perspective, and will then describe its basic architecture. A non-interactive version of the website, with all the information used by the CARINA team, is available at http://cdiac.ornl.gov/oceans/CARINA/Carina inv.html. The crossover table provides the interface to enter offset values for crossovers for any of the 14 parameters considered for the secondary QC. Generic information on a crossover, such as position, number of stations etc., can also be entered. Furthermore, files can be uploaded and comments can be posted. This allows several investigators to work on crossover analysis simultaneously without duplicating work and enables communication of information. The adjustment table is similar in its functionality. Adjustment values, files and comments can be entered or uploaded for any of the 14 parameters, or for the cruise as a whole. Furthermore, a quality flag (either "good" or "poor") is assigned for each cruise/parameter combination. The user can search the ... There is also a link to the relevant readme (i.e. condensed cruise metadata) for each cruise entry in the adjustment table.
The cruise and ship tables provide easy access to basic information for each cruise or ship, if that information is not found in the readme files. More importantly, they provide a means to keep track of the aliases for different cruises, i.e. old versions of the EXPOCODE or project names associated with a cruise (this information is also displayed in the adjustment table). The information in the crossover and adjustment tables can be exported as csv files that can be used by other applications. For instance, the adjustments are used for creation of the merged data product, where the adjustments in the table are applied to the data in the individual cruise files, or the manually entered crossover values can be downloaded for the inversions. Another important aspect of the website is the possibility to post larger files and data volumes. This enables, for instance, rapid upload of inversion results that cannot be attributed to any specific cruise. Draft versions of manuscripts etc. were also posted on the website.

The CARINA website project started with the initial requirement of at least two tables: one where crossover results could be stored, and one for adjustment values of individual cruises. Supplemental information such as figures, comments and ReadMe files could be stored on the platform as well and be linked to submitted data. It soon became clear that this website would accumulate thousands of individual data points for crossovers and adjustments, and even more supplementary files and updates of these. All of this would then have to be available to all users during the data compilation and evaluation process. Manual creation or maintenance of contents and links would thus be impossible. Moreover, the need arose to batch-submit a large number of automatically generated figures and data calculated by user scripts, which should be automatically processed and reflected throughout the application.
The greatest demand (and challenge) was the linkage between an individual datum and multiple pieces of supplementary information, with the ability to "share" these relations in other contexts. For example, a supplemental file uploaded to the crossover analysis of salinity (which involves two cruises) should be available in the context of the salinity adjustment value for each of the two cruises, and whenever the file is updated this would have to be reflected in every shared relation. To accommodate these demands, we decided in favor of open source software in order to be able to freely use and distribute the application, particularly after termination of this project and in case offline usage is needed. We chose to use Ruby on Rails, which is based on the object-oriented programming language Ruby, as the web application framework and PostgreSQL as the relational database for storage of data and information snippets. Uploaded files are stored in the file system, while their respective metadata are kept in the database.

Following the Ruby on Rails framework, the CARINA application is implemented using the model-view-controller (MVC) architecture, which provides an out-of-the-box basic skeleton of all necessary methods to create, read, update or delete (CRUD) datasets of a particular model (i.e. a corresponding table) and to build HTML forms or pages to edit or display datasets in the end user's browser. Due to the conventions of the Rails framework, all methods necessary for a quick and effortless implementation of links between a datum and its supplementing information are available as soon as the database models are set up such that they reflect the real-world relations of the material in use. Although not mandatory, the use of AJAX (Asynchronous JavaScript and XML) in the CARINA web application greatly improved usability and performance.

All information of the CARINA web application is stored in a total of eleven database tables: ships, countries, cruises, crossovers, regions, adjustments, attachments, comments, postings, readmes and users. Datasets are then handled by Rails as objects with automatically created methods. This provides access both to a dataset's parameters (i.e. columns) and to other related datasets; these are also treated as objects. This allows syntactically simple access to the parameters/attributes (i.e. columns) of datasets and their supplements as well as to the relations between different models. It also avoids formulation of any SQL-based queries or handling of interactions with the back-end database. We have used polymorphic relations for comments, readmes, postings and attachments. This allows a single model (e.g. attachment, representing an uploaded file) to be used for datasets of different models to which files can be uploaded (i.e. attached). In the CARINA site, most models can thus have multiple files uploaded to a single dataset. We extended this feature with an additional attribute for attachment and comment records that holds the parameter to which an uploaded file or a comment is attached. Thus, a file uploaded to a crossover can be specifically "attached" to any parameter, for instance salinity.
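The idea of a polymorphic relation with an extra parameter attribute can be illustrated outside of Rails with a small in-memory sketch. It is written in Python rather than Ruby and is not the CARINA schema: the class, the field names (`owner_type`, `owner_id`, `parameter`) and the file names are hypothetical. The only convention borrowed from Rails is that a polymorphic record stores both the type and the id of the record it belongs to.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attachment:
    """One uploaded file that can belong to a record of any model."""
    filename: str
    owner_type: str                  # e.g. "Crossover" or "Adjustment"
    owner_id: int                    # id of the owning record in its own table
    parameter: Optional[str] = None  # optional parameter, e.g. "salinity"


def attachments_for(store: List[Attachment], owner_type: str, owner_id: int,
                    parameter: Optional[str] = None) -> List[Attachment]:
    """Return the files attached to one record, optionally for one parameter."""
    return [
        item for item in store
        if item.owner_type == owner_type
        and item.owner_id == owner_id
        and (parameter is None or item.parameter == parameter)
    ]


# Hypothetical uploads: two files on crossover 7, one on adjustment 7.
store = [
    Attachment("xover_sal.ps", "Crossover", 7, "salinity"),
    Attachment("xover_oxy.ps", "Crossover", 7, "oxygen"),
    Attachment("notes.txt", "Adjustment", 7),
]
salinity_files = attachments_for(store, "Crossover", 7, "salinity")
print([item.filename for item in salinity_files])
```

Because the owner is identified by type and id together, crossover 7 and adjustment 7 do not collide, and a single attachment table can serve every model.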

In numerous cases crossovers are assigned to more than one working group (see Sect. 2 on data provenance). It was required that each working group could store values for their regional crossover separately. This is accomplished by insertion of two additional crossovers as subsets with a simple parent-child relation to the parental crossover (parent id as foreign key), representing the two data provenances. Members of either working group can click a dynamically inserted link in the list view whenever subsets are present; both subsets are then retrieved from the server and displayed in context without reloading the entire page. One of the subsets is always assigned as the primary subset, which gives it priority for automatic display of values: as long as a parameter value is not finalized and entered in the primary (i.e. parent) dataset, the web application displays the value of the parameter from the primary subset or, if no value has been entered there, from the secondary subset. A trailing "c" as subscript to the value indicates such a copy. Another parent-child relation is used for the adjustments table when a cruise or campaign has to be split into sections due to temporal variations in the proper adjustment value (e.g. a mid-cruise change of standards or different instrument performance after a port call). In this case multiple copies of the dataset are inserted, but as there is no unique solution to this case, multiple datasets are either exported or displayed and sorted according to the sets of stations to which they apply. ... of the uploading user, and will "attach" each discovered plot file to the appropriate crossover (or adjustment). Uploaded figures are commonly provided in PostScript format, unsuitable for display in web browsers.
These files are automatically converted to PNG files using the ImageMagick utilities and are available for quick views of the figure via an autogenerated web link, while the original PostScript file is only sent to the user when a download is explicitly requested. Special forms allow users to make a selection of crossovers or adjustments which they wish to export and download. An entry which is not a number may be exported as a string (e.g. NaN) or as a special number (e.g. −999) based on individual user settings. Similarly, overview lists of all comments posted to each adjustment can be generated and saved to disk.

Fig. 4. Comparison of the offsets determined by the cnaX method and the Running-Cluster (RC) method for the SO and ATL areas (black dots with gray error bars). The square in the middle of each panel indicates the minimum adjustment that was applied to the data. The numbers in the upper left corner state the number of crossovers that the different methods found; the numbers in the lower right corner state the $R^2$ value of the linear fit and the RMSE of the difference between the methods. The methods are generally in good agreement, with the exception of silicate, where the standard deviation of the differences between the two methods is around 7%. See text for discussion.

Fig. 5. Offsets for individual crossovers for alkalinity as determined by the cnaX routine. The gray circles are the offsets before adjustment; the black crosses are the offsets after application of the adjustments suggested by the inversion that are larger than the threshold value. Both sets of offsets are sorted independently of each other, and the uncertainty of the crossovers is only shown for the uncorrected crossovers. The right panel shows the relative distribution of offsets (gray line before adjustment; black line after adjustment). This analysis also includes the GLODAP reference cruises.

Fig. 6. A comparison between the corrections suggested by the cnaX routine and the adjustments applied to the CARINA data product. Black dots denote that an adjustment was applied; black crosses that no adjustment was applied. The square in the middle of each panel indicates the minimum adjustment that was applied to the data. The numbers in the upper left corner state the number of cruises for which a correction was suggested by the cnaX method ($N_{cnaX}$) and the number of adjustments applied to the CARINA data ($N_{carina}$); the numbers in the lower right corner state the $R^2$ value of the linear fit and the RMSE of the difference between the methods for those cruises where an adjustment was applied.