The CARINA data synthesis project: introduction and overview

The original goal of the CARINA (Carbon in Atlantic Ocean) data synthesis project was to create a merged, calibrated data set from open ocean subsurface measurements by European scientists that would be generally useful for biogeochemical investigations in the North Atlantic and, in particular, for studies involving the carbon system. Over time the geographic extent expanded to include the entire Atlantic, the Arctic and the Southern Ocean, and the international collaboration broadened significantly. In this paper we give a brief history of the project, a general overview of the data included and an outline of the procedures used during the synthesis. The end result of this project was a set of three data products, one for each of the listed ocean regions. It is critical that anyone who uses any of the CARINA data products recognize that the data products are not simply concatenations of the originally measured values. Rather, the data have been through an extensive calibration procedure designed to remove measurement bias and bad data. In addition, a significant fraction of the individual values in the data products were derived either by direct calculation or by some means of approximation. These data products were constructed for basin scale biogeochemical investigations and may be inappropriate for investigations involving small areal extent or similar detailed analyses. More information on specific parts of this project can be found in companion articles in this issue. In particular, Tanhua et al. (2010) and Tanhua (2009) describe the procedures and software used to remove measurement bias from the original data. The three data products and a significant volume of supporting information are available from the CARINA web site hosted by the Carbon Dioxide Information Analysis Center (CDIAC: http://cdiac.esd.ornl.gov/oceans/CARINA/Carina_inv.html). Anyone wanting to use the data is advised to get the highest version number of each data product. 
Incremental versions represent either corrections or additions. The web site documents specifics of the changes.

... cycle in these two parameters is retained. The near surface DIC data from this region have the same trend. Similar analyses with other parameters and other regions demonstrate that the 2nd QC procedure has not "erased" strong temporal signals. Investigation of more subtle signals such as the expected temporal increase in near surface DIC due to anthropogenic CO₂ will require more careful analysis.


Background
Historically, the vast majority of chemical oceanographic investigations have focused on problems that had the scale of an ocean basin or smaller. There were multiple reasons for this restricted view, including a lack of financial resources, a lack of manpower, and the fact that very limited data sharing occurred between individual researchers. Some data sets were submitted to national data centers; however, many were not, and the level of quality control possible at the national data repositories is limited. The end result was that no really high quality biogeochemical ocean data set with global scope existed.
The GEOSECS program (Geochemical Ocean Sections) was conceived in 1967 and carried out during the 1970s. GEOSECS sampling consisted of 312 stations distributed approximately along the center of each major ocean basin. Many parameters were analyzed in addition to the common hydrographic measurements, i.e. pressure, temperature, salinity, oxygen, and the macro nutrients nitrate, silicate and phosphate. Most remarkable about GEOSECS was the extremely high quality of the measurements, in some cases equivalent to the best data being generated today. Also revolutionary was the fact that the entire data set was available to the public in a reasonably short time. It is not an overstatement to say that GEOSECS revolutionized chemical oceanography. The greatest limitation of GEOSECS is the fact that it only provided a two dimensional picture of chemical distributions in the global ocean. The data were not sufficient to generate property distributions on horizontal surfaces. Global property integrals computed from the data had significant errors (Peacock, 2004). During the 1980s the TTO (Transient Tracers in the Ocean) and SAVE (South Atlantic Ventilation Experiment) programs extended the GEOSECS view to three dimensions for the Atlantic. Station spacing was still sparse; however, the individual station locations were chosen so that the combined data could be used to produce property maps on potential density surfaces with reasonable interpolation error (e.g. Kawase and Sarmiento, 1985). The number of measured parameters was significantly smaller than for GEOSECS, but the data quality was again remarkably high, and the data were made public.
Two other transitions resulted from these programs. The first was that nutrient and oxygen data were reported in micromoles per kilogram rather than in micromoles or millilitres per liter. This change was based on chemical arguments and has been adopted by subsequent large-scale programs. Unfortunately, this transition has not been universal. Second, the data were presented in a format designed for computer access. By today's standards, the formats were far from ideal, but they were carefully thought out and the format "flaws" were largely a result of computer limitations.
TTO and SAVE organizers had planned to extend the programs to the other oceans; however, this never materialized.
In the late 1980s WOCE (World Ocean Circulation Experiment) and JGOFS (Joint Global Ocean Flux Study) began. Unlike the previous studies, both of these had international organization and participation. Both programs had accuracy goals for every measured parameter, both required that the data be released quickly for public use in uniform-format computer-accessible files, and both had standard reporting units for every measurement. WOCE protocol had the additional requirement that each measurement in a bottle data set (except CTD derived temperature and pressure) be assigned an integer quality flag. The flag values were determined either by first hand knowledge of the analysis, or by "data experts" after a data set was submitted. This data flagging procedure has come to be called "primary quality control" or simply "1st QC". Primary quality control is largely a measure of the precision of a particular measurement rather than accuracy. The WOCE data flags have been used by many subsequent programs.
WOCE originated as a physical oceanographic program with sampling designed to optimize global transport calculations. The occupied sections were either meridional or zonal and had dense sampling along the sections relative to previous studies (∼30 nautical mile station spacing; 24 to 36 bottle samples per station; high accuracy CTD records). In addition to the common hydrographic measurements, a subset of the samples was analyzed for transient tracers (including ³H, ³He, ¹³C and ¹⁴C). JGOFS was a process oriented investigation and included repeated sampling at a few locations. The JGOFS locations were chosen for specific hydrographic and biogeochemical conditions. JGOFS measurements included the common hydrographic parameters, but focused on less common biogeochemical measurements. Critical to the CARINA project, JGOFS also funded the analysis of carbon system parameters (total inorganic carbon, DIC; total alkalinity, ALK; pH; and/or the partial pressure (or fugacity) of dissolved carbon dioxide) on WOCE cruises.
Many of the papers in this special issue discuss total inorganic carbon and/or total alkalinity data. In these papers, as well as within the chemical oceanographic community, there is no standard abbreviation for these two parameters. Total inorganic carbon is abbreviated as DIC, TCO₂, C_T, etc. Total alkalinity is abbreviated as Alk, ALK, A_T, TA, etc. Regardless of the abbreviation used, in the CARINA papers all refer to exactly the same thing. Efforts to standardize these abbreviations have failed.
Concurrent with WOCE sampling came the general acceptance that human activities, most importantly the release of CO₂ into the atmosphere by burning fossil fuels, had the potential to alter global climate. By the end of WOCE one of the largest uncertainties in global climate change studies was the inventory of anthropogenic CO₂ stored in the ocean. Accurate quantification of this inventory was the primary motivation for GLODAP (Global Ocean Data Analysis Project). GLODAP was a formally organized and funded collaboration. Most of the GLODAP team members were US scientists, but the project included participation by scientists from Australia, Japan, Korea and Europe. To achieve the stated goal, the first requirement was a high quality, uniformly calibrated global data set that included carbon system measurements and ancillary data. The core data for GLODAP were provided by WOCE and JGOFS. The uniform calibration requirement led to the development (or adoption) of various techniques designed to quantify (and subsequently correct) measurement bias that existed between various cruise data sets. The data bias existed because there were no universal standards for most of the needed measurements (e.g. nutrients, oxygen, carbon system measurements). The quantification of measurement bias has come to be known as secondary quality control or simply "2nd QC". Details of the GLODAP 2nd QC procedures can be found in the literature (Sabine et al., 2005) and at the CDIAC web site (http://cdiac.esd.ornl.gov/oceans/glodap/Glodap_home.htm). For the carbon system, most of the data bias was eliminated by the availability, part way through the WOCE sampling, of CRMs (Certified Reference Materials), which were devised, prepared and distributed by A. Dickson (Dickson, 1990; Dickson et al., 2003; Dickson, http://andrew.ucsd.edu/co2qc/index.html). 
The GLODAP team did not have the manpower to do complete 2nd QC on all of the parameters included in the data products, but rather adopted results from previous studies where available (Gouretski and Jancke, 2001; Johnson et al., 2001; C. Mordy and L. Gordon, personal communication to R. Key, 2003).
Once the GLODAP team had completed the 2nd QC work, they produced two data products. The first was a set of three merged calibrated data sets, one each for the Atlantic, Indian and Pacific Oceans. These compilations used a simplified set of quality flags (a subset of the WOCE flags), had all questionable/bad data removed, included interpolated values for missing salinity, oxygen and nutrient data, and reduced the carbon measurements to ALK and DIC (by calculation from whatever carbon pair was measured). The second product was a series of objectively mapped property distributions. The maps used the same grid spacing and depth levels as previous work (e.g. Levitus, 1982 and subsequent revisions) for compatibility. The maps were then integrated to provide inventories (for the region covered by the data) for DIC, ALK, natural ¹⁴C, bomb-produced ¹⁴C, anthropogenic CO₂, CFC-11 and CFC-12 (Table 1). These inventories were not quite global since GLODAP included very little data from the Arctic Mediterranean Seas. Reasonable extrapolations were subsequently made to extend the data to the remainder of the global ocean, producing the first data-based anthropogenic CO₂ global ocean inventory using the method of Gruber (1998). The same data have been used with different methods to calculate alternate anthropogenic CO₂ inventory estimates (McNeil et al., 2003; Waugh et al., 2006). The GLODAP data products were released to the scientific community immediately, and have subsequently been very widely used for varied biogeochemical and physical investigations by modelers and data analysts (Orr et al., 2001, 2005; Feely et al., 2002, 2004; Lee et al., 2006; Matsumoto, 2007; McNeil et al., 2007; Mikaloff-Fletcher et al., 2006; Roussenov et al., 2004; Sarmiento et al., 2007; Sweeney et al., 2007; Vazquez et al., 2009; and many others).
While quite successful, GLODAP did not cover all ocean areas. The only data in the collection from latitudes north of approximately 60°N were a few GEOSECS and TTO stations in the Nordic Seas. GLODAP included no data from the Arctic Ocean or the Gulf of Mexico, only a couple of stations in the Caribbean Sea, one GEOSECS station from the Mediterranean Sea, etc. Some of the research referenced above also demonstrated that the data density in the North Atlantic was exceptionally sparse relative to the concentration gradients and complicated physics encountered there. These deficiencies were partially responsible for the CARINA project.

History of the CARINA project
Unlike GLODAP, the CARINA project began as an informal collaboration with very limited funding. The project was started by D. Wallace and L. Mintrop, and had an organizational meeting at Delmenhorst, Germany in 1999. Subsequently, funding was obtained from German JGOFS to support Mintrop who acted as data collector. Participation was voluntary and consisted mostly of European scientists. Participating scientists were required to submit their historical data sets that included either subsurface carbon system measurements or underway surface pCO₂ data. The last meeting of this group was held in 2002. By that time the group had accumulated subsurface data from approximately 30 cruises (excluding those that were in GLODAP) and twice that number of underway data sets. The funding ended in March 2003 and, unfortunately, the support level was insufficient to do much more than amass and catalog the submitted data.
In 2004 the original CARINA data collection was transferred to CDIAC. This was about the same time that the North Atlantic GLODAP data deficiencies were recognized. Consequently, a copy of the CARINA bottle data was transferred to Princeton for data assessment and quality control.
In January 2005 the EU funded CARBOOCEAN program began. This consortium consists of more than 40 research groups and includes most of the original CARINA scientists. CARBOOCEAN is an integrated program with the aim of making an accurate assessment of oceanic sources and sinks of carbon over space and time. It focuses on the Atlantic and Southern Ocean and on a time interval of −200 to +200 years from the present. All funded CARBOOCEAN partners are required to make public all historical data and new data after a two year proprietary period. During workshops held in the first two years of CARBOOCEAN, the CARINA project was reactivated and additional data sets were collected.
In June 2007 the CARBOOCEAN/CARINA scientists met in Laugarvatn, Iceland to discuss methods and responsibilities for the CARINA data synthesis. By that time, the CARINA collection had grown to approximately 80 cruises. During this meeting the group decided to extend the original scope of CARINA to include the entire Atlantic, the Arctic and the Southern Ocean. Various team and project leader assignments were:
- Data collection, 1st QC and production of final data products: R. Key and X. Lin
The team also decided to include data from CLIVAR (Climate Variability and Predictability) repeat hydrography cruises (http://www.clivar.org/carbon_hydro/hydro_table.php) that were final and that were in one of the focus regions. Since the new CLIVAR data were known to be of high quality, those data, along with WOCE results, would serve as "master cruises" for the data calibration (2nd QC) phase of the synthesis. The areal expansion of the project led to a flood of new data and a final total of 188 cruises. The CARINA station locations are shown in Fig. 1. The CARINA web site (http://cdiac.esd.ornl.gov/oceans/CARINA/Carina_inv.html) includes links to the original cruise data files (via the Cruise Summary Table), the resulting data products and publications, and detailed information on the quality control procedures used.

Instrumentation
Data included in the CARINA data products span almost 30 years of measurements. Rather than attempt to summarize the specific methods and instruments in this document, we have included this information in the individual cruise file headers. For many cruises additional information can be found in the individual final cruise reports and other documentation provided with the cruise data. In many instances, a full description of the methods and instruments can be found in the footnotes to the Cruise Summary Table at the CARINA web site that refer to specific publications. Certainly the most important changes in methods and instrumentation are the adoption of CRM for standardization of ALK and DIC measurements, the development of the SOMMA-type analyzer (Johnson et al., 1998 and references cited therein) for DIC and the shift from electrode based to spectrophotometric pH determination (Clayton and Byrne, 1993). All three began to be used in the early 1990s.

CARINA data assembly and synthesis
Here we describe the data collection and synthesis steps used for this project. Many of the procedures used during CARINA were adopted from GLODAP; however, the number of cruises included in CARINA combined with the additional manpower and funding available from the CARBOOCEAN contract allowed improvements. The most significant changes were: (a) more parameters were subjected to 2nd QC by the project participants; (b) software was designed to automate portions of the 2nd QC procedures; (c) work was coordinated among the different groups and within groups by means of a web site; (d) pH was included in the final data products along with ALK and DIC; (e) fully formatted versions of all the individual cruise files were submitted to both CCHDO (CLIVAR & Carbon Hydrographic Data Office: http://whpo.ucsd.edu/) and CDIAC for archive and distribution; and (f) a significant collection of references to literature describing the individual cruise results was compiled. This effort led to two distinct results. The first is a set of individual cruise files with the measured data converted to common units, having quality flags added for all parameters, and accompanied by metadata. All of the individual cruise files are in "WHP-Exchange" format (Swift, 2008). This format is a standard that was developed during the 1990s and has since become widely accepted. It is a comma separated data file with formal column header names and units, and it can include metadata within the header. The second is a set of three data products (Arctic Mediterranean Seas, AMS; Atlantic Ocean, ATL; and Southern Ocean, SO) that have been fully calibrated (i.e. measurement bias removed via 2nd QC) and include some calculated values. For CARINA we defined Arctic Mediterranean Sea(s) to include the main Arctic basin and all adjacent seas southward to approximately 60°N. 
Thus the AMS region includes the Nordic Seas (down to the Greenland-Iceland-Scotland Ridge) on the Atlantic side and the Bering Sea on the Pacific side. The format for the data products is simple comma separated records with a single header record defining the included values. The header does not include units since everything is standard (as defined for the Exchange format). Additionally, the data products are purely numeric other than the single header record.
The CARINA data products are compatible with the three GLODAP data products, but they are not identical, differing somewhat in column order and included parameters. We plan to merge CARINA and GLODAP once the initial scientific analysis of CARINA is completed.

Collection and 1st QC
The most time consuming portion of the CARINA synthesis was data assembly. Investigators who had participated in data collection and/or made the measurements submitted most of the data sets. Along with the data file, submitters were asked to supply references to any publication(s) that had resulted from the data. Whenever they existed, final cruise reports were obtained. The remaining data sets were obtained by "discovery", which amounted to scanning publications for mention of other cruises, following up on data discussed at CARBOOCEAN and other meetings, and similar searches. Once discovered, either the chief scientist or another cruise participant was contacted for a copy of the data and any existing documentation. In most cases, a complete copy of the cruise data set was not available. In these instances the missing data were sought from the principal investigator(s) (PI) responsible for that data. Though the effort was not completely successful, we tried to obtain all of the bottle measurements from each cruise. As the data were collected, we also obtained permission from each PI to release his/her data to the public. In a few cases electronic versions of the data did not exist and the results were manually entered into the existing files.
For all of the CARINA cruises the following conventions were used for station information. Only one location was recorded for each station of each cruise. When multiple casts were collected, the location and date of the first cast was used for the entire station. Locations were stored as decimal degrees with negative values for west longitude and south latitude. For many of the cruises bottom depth was not recorded for each station. In these cases bottom depth was first approximated from a global (0.25 degree resolution) topography. This depth was then compared to the deepest sample pressure at the station. Whichever was greater, the topographic depth or the deepest sample pressure +10 was recorded for the water depth. These bottom "depths" are not meant for research purposes, but rather to enable drawing approximate bottom topography for section plots.
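The bottom depth rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function and argument names are hypothetical, and depth (m) and pressure (dbar) are treated as interchangeable, as the text implicitly does for this plotting-only quantity.

```python
def station_bottom_depth(recorded_depth, topo_depth, deepest_sample_pressure,
                         margin=10.0):
    """Return a bottom "depth" suitable for drawing section plots.

    recorded_depth: depth reported with the station, or None if missing.
    topo_depth: depth taken from a gridded global topography (assumed input).
    deepest_sample_pressure: largest sample pressure at the station.
    margin: the +10 offset mentioned in the text.
    """
    if recorded_depth is not None:
        # A reported bottom depth is kept unchanged.
        return recorded_depth
    # Otherwise use whichever is greater: topography or deepest sample + margin.
    return max(topo_depth, deepest_sample_pressure + margin)
```

For example, a station with no recorded depth, a topographic depth of 4000 and a deepest sample at 4100 would be assigned 4110, which keeps every plotted sample above the drawn bottom.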
For most cruises multiple files with different subsets of the data were collected. The first synthesis task, and the most error prone, was merging data from these subsets. File merging is a quick and easy computer matching procedure whenever adequate sample identification is given. However, for most of the CARINA cruises the identification information was either incomplete or totally missing. In these instances the data files were manually merged based on available information. The manual merges, which consisted of multiple cut and paste operations, were made especially tedious by the fact that "intended bottle depth", "bottle pressure" and "bottle depth" were often used synonymously. In the many instances where the cast and bottle information was missing, values were fabricated to ease subsequent discussion of specific results among various project participants and to make the files more consistent in format with modern oceanographic records. Such fabrication is noted in the metadata header of the final format files submitted to the data centers. Alphabetic station names were converted to numeric and unnecessarily complex station numbers were simplified. These alterations were documented in the file header information.
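When sample identification is complete, the "computer matching procedure" mentioned above amounts to a keyed merge. The following is a minimal sketch, not the software used for CARINA; the key columns (station, cast, bottle) and the row layout as dictionaries are assumptions made for illustration.

```python
def merge_bottle_files(*subsets):
    """Merge several lists of sample records into one record per sample.

    Each record is a dict that must carry 'station', 'cast' and 'bottle'
    (hypothetical column names); all other columns are combined.
    """
    merged = {}
    for rows in subsets:
        for row in rows:
            key = (row["station"], row["cast"], row["bottle"])
            # Later files add columns to the record created by earlier files.
            merged.setdefault(key, {}).update(row)
    # Return records ordered by station, cast and bottle.
    return [merged[key] for key in sorted(merged)]
```

A record that appears in only one subset is kept with whatever columns it has, so gaps remain visible rather than being silently dropped.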
Immediately after merging, cruise data were read into the same data system used for the GLODAP collection. There, units were converted to match those used during the WOCE program. Most commonly, this amounted to converting oxygen and nutrient data from milliliters per liter and micromoles per liter into micromoles per kilogram (µmol/kg). Unfortunately, there is no standard method for this conversion. For this work the most common method was to use density calculated from measured salinity for each sample with an assumed lab temperature (default of 22 °C) and pressure (1 atmosphere). In cases where the concentration was reported in standard units (µmol/kg) the conversion method is unknown, but simple division by a constant assumed density (often 1.025) is common. Regardless of method, this conversion error is less than the measurement errors, so we consider this inconsistency to be bothersome, but minor. Another source of error that we were not able to completely eliminate is the possibility of erroneous units for the nutrients, i.e. that data were given in volumetric units instead of the stated gravimetric units, or vice versa. Both cases would cause an offset of 2-3%.
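As a rough illustration of the simplest conversion mentioned above (division by a constant assumed density), consider the sketch below. It is not the per-sample density method used for most CARINA cruises; the function names are hypothetical, and the factor 44.66 µmol of O₂ per mL is the commonly used value based on the molar volume of oxygen at standard temperature and pressure.

```python
O2_UMOL_PER_ML = 44.66  # µmol O2 per mL O2 gas at STP (commonly used factor)

def per_liter_to_per_kg(value_umol_per_l, density_kg_per_l=1.025):
    """Convert µmol/L to µmol/kg by dividing by an assumed constant density."""
    return value_umol_per_l / density_kg_per_l

def oxygen_ml_per_l_to_umol_per_kg(o2_ml_per_l, density_kg_per_l=1.025):
    """Convert dissolved oxygen from mL/L to µmol/kg (constant-density sketch)."""
    return per_liter_to_per_kg(o2_ml_per_l * O2_UMOL_PER_ML, density_kg_per_l)
```

With a density of 1.025 kg/L, a value that is mislabeled as gravimetric when it is actually volumetric differs by about 2.5%, which is consistent with the 2-3% offset quoted above.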
Another complication arose with nitrate data. In ideal cases nitrate and nitrite measurements were reported separately. In others only nitrate was reported or only the combination of nitrate plus nitrite. Finally, in a few instances nitrate plus nitrite was reported along with values for nitrite. For the last example the nitrite values were simply subtracted from the reported nitrate plus nitrite values. For cases where only nitrate plus nitrite was reported we had a choice: carry an additional parameter (i.e. NO₃ + NO₂ in addition to nitrate) or simply rename the data nitrate (ignoring the nitrite contribution in the upper water column). Both choices are problematic. We chose the latter for CARINA cruises (both original cruise files and final data products).
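The nitrate handling rules above can be summarized as a small decision function. This is a sketch of the logic as described in the text, with hypothetical argument names; `None` stands for a value that was not reported.

```python
def resolve_nitrate(nitrate=None, nitrite=None, nitrate_plus_nitrite=None):
    """Return the value stored as "nitrate" under the rules described above."""
    if nitrate is not None:
        # Nitrate reported on its own: use it directly.
        return nitrate
    if nitrate_plus_nitrite is None:
        # Nothing usable was reported.
        return None
    if nitrite is not None:
        # Sum and nitrite both reported: subtract nitrite from the sum.
        return nitrate_plus_nitrite - nitrite
    # Only the sum reported: rename it nitrate, ignoring the nitrite part.
    return nitrate_plus_nitrite
```

The last branch is the compromise chosen for CARINA, so near-surface "nitrate" values from such cruises may include a small nitrite contribution.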
Chlorofluorocarbon data in the CARINA collection cover the time span from 1982 to 2005 and were originally reported on either the SIO-93 or SIO-98 scale. All of these (CFC-11, CFC-12, CFC-113, CCl₄) were converted to the SIO-98 scale (Prinn et al., 2000). SF₆ data are reported on the NOAA-GMD 2000 calibration scale.
Reported pH data were also converted to a uniform scale and temperature. The CARINA data span 1977-2005. Over that time pH measurements have been made with radically different techniques and the results reported on three different pH scales: National Bureau of Standards scale (NBS), seawater scale (SWS) and total hydrogen scale (TOT). Values are also reported at various temperatures (measurement temperature, some arbitrarily chosen temperature or in situ temperature). The difference between these scales isn't too large, but it is significantly larger than the precision/accuracy of modern spectrophotometric techniques. All of the measured pH data were converted to SWS at 25 °C in both the individual cruise files and in the final products. While we were producing the data products, a new version of the handbook of best practices for ocean carbon measurements was published (Dickson et al., 2007). This handbook suggests that the preferred pH scale is the total hydrogen scale; however, by that point it was already too late for our project. Velo et al. (2009) give the conversion functions and additional details for this work.
Historically, salinity has been analyzed on every bottle sample from a CTD/Rosette cast. The bottle salinity results were calibrated by analyzing seawater standards. The calibrated bottle salinity values were subsequently used to calibrate the CTD conductivity probe. Also, because bottle salinity can routinely be measured with high precision, the bottle salinity data provide the best check that a sample bottle closed properly and at the desired depth (for most ocean regions). That is, bottle salinity is the best way to identify mis-trips and leaking sample bottles for most of the global ocean. On many of the CARINA cruises, bottle salinity was only analyzed with sufficient frequency to calibrate the CTD. Without bottle salinity, identification of mis-trips and leaking sample bottles is reduced to an educated guess, at best. An additional problem with many of these data sets was that bottle salinity and CTD salinity values were not discriminated. That is, it was impossible to determine which of the two was included in a data file. When we could not determine if a set of values was CTD or bottle salinity, we assumed that it was bottle salinity. Therefore it is virtually certain that some of the bottle salinity data are actually CTD salinity. See also the discussion below on special steps taken with salinity data during production of the final data products. In general, the treatment of salinity data in CARINA could be labeled sloppy. We wouldn't argue with that; however, this wasn't due to lack of effort: we did the best we could. We also believe that the salinity data in CARINA are adequate for "normal" chemical oceanographic applications. We do not know whether or not the salinity data will be of sufficient quality for detailed physical oceanographic applications.
The next step in the synthesis was 1st QC, the assigning of a data quality flag to each measured value. This is a process by which individual data points are closely scrutinized. 
It is a method of improving precision and removing spurious data. Details of this procedure are in Tanhua et al. (2010).
The 2nd QC procedures (discussed in Tanhua et al., 2010) critically examine data using different techniques than 1st QC. The goal of 2nd QC is to quantify measurement bias. In some cases additional spurious data points were identified during 2nd QC, and the initial flag values were altered appropriately. Once all of the flag values were final, each cruise file was submitted to national data centers (CCHDO and CDIAC). Data bias identified during 2nd QC was corrected in the final data products, but these adjustments were not applied to the individual cruise data sets.
The CARINA data product incorporates one additional flag with value zero (0). This flag was also used in GLODAP.
The zero flag indicates a datum that "could have been measured", but was approximated in some manner. There are three different uses for the zero flag in the data products:
- Instances where bottle salinity was missing or bad and consequently was replaced with CTD salinity.
- Interpolated values for salinity, oxygen or nutrients.
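The first use of the zero flag listed above can be illustrated with a short sketch. It assumes WOCE-style bottle flag values (2 for acceptable, 9 for no data), which are not spelled out in this paper, and the function name is hypothetical.

```python
FLAG_APPROXIMATED = 0  # the additional CARINA/GLODAP flag described above
FLAG_GOOD = 2          # assumed WOCE-style "acceptable" flag
FLAG_MISSING = 9       # assumed WOCE-style "no data" flag

def fill_salinity(bottle_salinity, bottle_flag, ctd_salinity):
    """Return (salinity, flag) for the data product.

    Good bottle salinity is kept; a missing or bad bottle value is replaced
    by CTD salinity and flagged 0; otherwise the value stays missing.
    """
    if bottle_salinity is not None and bottle_flag == FLAG_GOOD:
        return bottle_salinity, FLAG_GOOD
    if ctd_salinity is not None:
        return ctd_salinity, FLAG_APPROXIMATED
    return None, FLAG_MISSING
```

Users who need only directly measured values can then filter on the flag rather than on the value itself.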

2nd QC
While 1st QC is designed to improve the overall precision of a data set, 2nd QC procedures are designed to quantify measurement bias. That is, the goal of 2nd QC is to improve the accuracy of a data set. Measurement bias is rather common with nutrient and oxygen measurements because certified standards are not routinely used. The very best nutrient measurements can have precision better than 1%, but the accuracy is seldom better than 2%. The same condition existed for ALK and DIC measurements until the early 1990s when CRMs were developed. From GEOSECS to WOCE, ALK and DIC measurement precision improved from 5-10 to 4-5 µmol/kg. The best CLIVAR data now have precision of <2 µmol/kg. Prior to CRM development, however, it wasn't uncommon for these measurements to have a bias of >20 µmol/kg. The use of CRMs has lowered that to <5 µmol/kg. The 2nd QC is based on the initial assumption that abyssal waters are at steady state. That is, deep water concentrations are invariant over time for a given location. This assumption was reasonable for the WOCE cruises included in GLODAP since the collection period only spanned a few years and few of the cruise track intersections occurred in regions with strong horizontal abyssal concentration gradients. This is not the case for CARINA. Many publications have clearly demonstrated that the abyssal steady state assumption is false over the time interval spanned by CARINA data and especially for some of the regions sampled by CARINA cruises (i.e. the far North Atlantic, the Labrador Sea and the Nordic Seas). Decadal change due to anthropogenic and natural forcing was one of the CARBOOCEAN/CARINA focus areas, so all of the scientists involved in 2nd QC were aware of the potential to erase real changes when attempting to correct measurement bias.
The 2nd QC normally consisted of two steps: quantification of the relative measurement offsets between different cruises, and assignment of an adjustment factor to data deemed to have a measurement bias that exceeded a predetermined limit. The first step was objective; the second was subjective and influenced by the experience of the scientists involved and the knowledge that real temporal changes were expected for some regions. Offset was determined using variants of the crossover technique developed for GLODAP (Key et al., 2004; Sabine et al., 2005) and different forms of the inversion methods derived by Gouretski and Jancke (2001) and Johnson et al. (2001). The 2nd QC methods are discussed in detail by Tanhua et al. (2010). 2nd QC tests were run for salinity, oxygen, nutrients, DIC, ALK, pH, CFC-11, CFC-12, CFC-113 and CCl₄. For the carbon system parameters, additional tests were possible using calculated values. For example, if DIC and ALK were measured, calculated pH could be compared to measured pH from another cruise. To demonstrate the validity of this comparison, we compared calculated to measured parameters for one Atlantic cruise that had very high quality measurements for three carbon system parameters (Cruise #86; 33RO20030604; Fig. 2). Regardless of the pair used for the calculation, the mean difference between the measured and calculated values was statistically indistinguishable from zero and the standard deviation of the difference was not much larger than the measurement precision. This comparison provides strong evidence that the calculation error is insignificant and that calculated carbon parameters can be used for 2nd QC investigations. If a calculated carbon parameter is biased, the implication is that one of the input parameters is biased. The 2nd QC procedures yield an offset for virtually every cruise. 
In some previous studies (Gouretski and Jancke, 2001 and Johnson et al., 2001), in order to be as objective as possible, all of the determined offsets were corrected. This will produce a combined data set with the lowest combined variance between cruises. In GLODAP and CARINA a more subjective approach was used. First, only those offsets that exceeded a predetermined minimum value were considered for correction. Second, all offsets that exceeded the threshold were examined by the working groups prior to assigning a final adjustment value. This subjective approach was necessary because the different 2nd QC procedures often gave different results and because some of the parameters were expected to change with time. This issue is discussed in detail in the accompanying methods paper and in each of the regional CARINA papers in this issue. The minimum offsets considered for adjustment are given in Table 1. All of the details of the crossover checks, inversion results and final adjustments are available at the CARINA web site.
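The two-step logic described above (an objective offset estimate, followed by a threshold test that only nominates a cruise for expert review) can be illustrated with a minimal Python sketch. The function names, the example differences and the 4 µmole/kg threshold are hypothetical; the actual minimum offsets are those of Table 1, and the real crossover and inversion procedures are described by Tanhua et al. (2010).

```python
from statistics import mean, stdev

def crossover_offset(deep_differences):
    """Mean and standard deviation of deep-water differences (cruise A minus cruise B)."""
    return mean(deep_differences), stdev(deep_differences)

def needs_review(offset, minimum_offset):
    """True when an offset is large enough to be considered for adjustment.

    Exceeding the threshold does not trigger an automatic correction;
    it only marks the offset for examination by a working group.
    """
    return abs(offset) > minimum_offset

# Hypothetical DIC differences (umole/kg) at a single crossover
diffs = [5.1, 6.3, 4.8, 5.9, 5.4]
offset, spread = crossover_offset(diffs)
flagged = needs_review(offset, minimum_offset=4.0)  # assumed threshold
```

Keeping the threshold test separate from the adjustment itself mirrors the subjective second step: the code can only say that an offset deserves attention, not that it is measurement bias rather than real change.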
In a few instances 2nd QC and associated investigations determined that all of the measurements of some parameter from a cruise could not be adequately adjusted. The reasons varied, but included strongly conflicting 2nd QC results, extremely noisy data and similar problems. In these cases the entire set of parameter measurements was discarded from the data product. Instances of this are indicated in the on-line version of the adjustment table by the lower case letter "o" in the flag column for each parameter instead of the normal check mark ( √ ) which indicates acceptable results. If this table is downloaded these two adjustment quality flags are translated into "3" and "2", respectively. The decision to discard an entire set of measurements was made independently from the individual datum 1st QC flags.

Construction of the data products
The CARINA project resulted in three data collections or products: the Arctic Mediterranean Seas (AMS), the Atlantic Ocean and Mediterranean Sea (ATL), and the Southern Ocean (SO). The divisions between the regions were approximately 60° N (the Greenland-Scotland Ridge in the Atlantic and the Aleutians in the Pacific) and 30° S. Cruises which spanned a division line were generally included in both collections. Each cruise in the collection was assigned an EXPOCODE (Swift, 2008). These codes provide a unique identifier and are composed of the NODC (National Ocean Data Center) platform code for the research vessel (http://www.nodc.noaa.gov/General/NODC-Archive/platformlist.txt) followed by the date when the cruise left port. The NODC code is composed of a 2 digit country code and a 2 character (number or letter) ship code. For example, a cruise that started on 3 October 1999 aboard the Norwegian vessel Haakon Mosby would have EXPOCODE 58AA19991003. All of the cruises were then sorted by EXPOCODE, numbered sequentially, and a Cruise Summary Table (CST) was created (http://cdiac.esd.ornl.gov/oceans/CARINA/Carina table.html). The last 5 entries in the CST are not single cruises, but cruise collections representing a single investigator (#'s 184 and 185) or a single project (#'s 186-188). Assignment of an EXPOCODE in these 5 cases was inappropriate so they were simply named. The data for these 5 collections were not segregated into individual cruise files because we thought the data more valuable as a collection and because the limited amount of data for each individual cruise did not warrant the increased record keeping that would have been required. The three data products include only the sequential cruise number, not the EXPOCODE, so that the data records could remain purely numeric. Lookup tables are provided along with the data products so that the cruise number can be matched to the EXPOCODE.
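The EXPOCODE construction described above is simple enough to sketch in a few lines of Python. The function names are illustrative only and are not part of any CARINA software; the example reproduces the Haakon Mosby case from the text.

```python
from datetime import date

def make_expocode(nodc_platform_code, departure):
    """Join a 4-character NODC platform code and the departure date (YYYYMMDD)."""
    if len(nodc_platform_code) != 4:
        raise ValueError("NODC platform code must be 2-character country + 2-character ship")
    return nodc_platform_code.upper() + departure.strftime("%Y%m%d")

def split_expocode(expocode):
    """Return (country code, ship code, departure date) from a 12-character EXPOCODE."""
    departure = date(int(expocode[4:8]), int(expocode[8:10]), int(expocode[10:12]))
    return expocode[:2], expocode[2:4], departure

code = make_expocode("58AA", date(1999, 10, 3))
```

Because the date is written year first, sorting the codes as plain strings groups cruises by platform and then orders them chronologically, which is consistent with the sequential numbering described above.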
The Cruise Summary Table (CST) contains a wealth of additional information. Along with the EXPOCODE the second column also lists aliases. Aliases include names used by the original investigators for the cruise or project and in some cases WOCE line designations (e.g. for cruise #4 the "WOCE SR04e"). The third column (Area) refers to the CARINA region (and data product) with: 1 = ATL, 2 = SO, 3 = ATL & SO, 4 = AMS and 5 = AMS & ATL. The numbers under the parameter columns indicate the number of stations that have the particular measurement. Two entries under the parameter columns have a different meaning. Very few cruises in this collection included discrete pCO2 sampling. For these few, a numeric entry is the station count. A "U" entry, however, indicates that underway pCO2 measurements were made. The CARINA work does not include underway data. Underway pCO2 data are being compiled by another team (SOCAT; Surface Ocean CO2 Atlas Project; http://ioc3.unesco.org/ioccp/Synthesis.html#SOCAT). A "C" entry in the CST under the pH, CT or AT column indicates that
The data products do not contain all of the measurements from all of the cruises. Rather we narrowed the total list of different measurements down to those that were commonly measured or would be useful for carbon system calculations using current methods. The list of retained parameters is given in Table 2. This table also translates the parameter names in the products to the "official" Exchange format nomenclature and it gives units for the measurements. This naming convention was selected so that the CARINA data products matched the GLODAP data products as closely as possible.
With a few minor changes the CARINA data products were constructed with the same software used for GLODAP. The procedure is semi-automated and execution amounts to manually calling several programs in sequence with the appropriate options set for each program. With the exception of one step, all of the code was developed and runs on the same computer used for archiving the master version of each cruise data file. All of this code is written in S-Plus (Version 3.4 release 1 for Sun SPARC; TIBCO Spotfire, previously Insightful®). Below, each step of the procedure is briefly described.
The cruises included in the CARINA data products generally exclude those that were included in GLODAP. This was done primarily to facilitate later merging of these two data products. There are, however, 3 exceptions: 06MT19941012, 06MT19941115 and 74DI19970807 (Cruise Numbers 12, 13 and 171 respectively). These cruises were added to CARINA because additional parameters critical to the CARINA goals became available after GLODAP was published. The CARINA 2nd QC, however, made full use of many of the GLODAP cruises and details are given in many of the accompanying publications in this issue.
Table 2 (fragment). Potential density entries, all in kg m−3: Potential Density relative to 0 dB; sigma1, Potential Density relative to 1000 dB; sigma2, Potential Density relative to 2000 dB; sigma3, Potential Density relative to 3000 dB; sigma4, Potential Density relative to 4000 dB.

Concatenation and adjustment
Program makeocean is the main routine for building merged calibrated data products. Input includes: (1) a list of cruise names, (2) a list of parameters to be included in the data product, (3) a list of parameters that were considered for adjustment and (4) the name of the table that contains all of the various parameter adjustment factors. In sequence, each cruise file is first read and then reduced to the list of measured parameters that are included in the output product. Any parameter (and accompanying flag) that is in the include list, but not in the cruise data set is generated and filled with null values (NA; −999 on output). The parameter columns are then sorted into the same order as the input parameter list. Finally, any necessary adjustments (multiplicative or additive) are taken from the adjustment table and applied. The result is two files: one with station information and a second with data.
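The per-cruise steps of makeocean (reduce to the include list, fill absent parameters with NA, order the columns, apply the adjustment) can be sketched as follows. This is not the S-Plus code; it is a Python illustration in which the parameter names, the adjustment values and the use of None for NA are all assumptions.

```python
NA = None  # stands in for the S-Plus NA value

def align_parameters(record, include):
    """Keep only the included parameters, in the include-list order, NA if absent."""
    return {name: record.get(name, NA) for name in include}

def apply_adjustment(value, kind, factor):
    """Apply a multiplicative or additive adjustment; NA stays NA."""
    if value is NA:
        return NA
    if kind == "multiplicative":
        return value * factor
    if kind == "additive":
        return value + factor
    raise ValueError("unknown adjustment kind: " + kind)

def write_value(value):
    """NA is written as -999 on output."""
    return -999 if value is NA else value

# Hypothetical sample and adjustment table entries for one cruise
include = ["oxygen", "nitrate", "tco2"]
adjustments = {"oxygen": ("multiplicative", 1.02), "tco2": ("additive", -4.0)}
sample = {"tco2": 2154.0, "oxygen": 250.0, "chlorophyll": 0.3}

aligned = align_parameters(sample, include)
adjusted = {name: apply_adjustment(value, *adjustments.get(name, ("additive", 0.0)))
            for name, value in aligned.items()}
output_row = [write_value(adjusted[name]) for name in include]
```

In this sketch the unlisted chlorophyll value is dropped, nitrate is created as NA and written as −999, and the two adjusted parameters come out in the order of the include list.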
The two files are checked for missing value numbers (−9, −999, etc.) that may have resulted from other software and these are replaced with NA. Care is required with the station file since −9 is a possible real value for latitude and longitude. Consequently, the very few latitude and longitude values that were exactly −9 were changed to −9.00001. This change is scientifically inconsequential.
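A minimal Python sketch of this clean-up step is given below. The set of sentinel values is an assumption (the text names only −9 and −999 explicitly), None stands in for NA, and the function names are illustrative.

```python
# Assumed sentinel values; the text lists -9 and -999 and implies others.
MISSING_SENTINELS = {-9.0, -99.0, -999.0, -9999.0}

def protect_coordinate(value):
    """Nudge a real latitude/longitude of exactly -9 so it is not read as missing."""
    return -9.00001 if value == -9.0 else value

def clean_value(value):
    """Replace a missing-value sentinel with NA (None); keep everything else."""
    return None if value in MISSING_SENTINELS else value

# Station coordinates are protected first, then screened like any other column.
longitude = clean_value(protect_coordinate(-9.0))
nitrate = clean_value(-999.0)
```

The order matters: a coordinate that is truly −9 has to be nudged before the sentinel screen runs, otherwise a valid position would be discarded.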
Finally, the compiled data were subjected to a very coarse primary QC to eliminate any highly anomalous data points that had not previously been discovered. This check was made by plotting all values of each parameter against pressure. For most parameters a few points were noted. These few anomalous points were removed from the data product. With this procedure, it is far more likely that questionable values were retained than good data eliminated, but the latter is still possible.

Flag simplification
Program flagmod simplifies the full set of WOCE quality control flag values (Table 3) to a minimum subset. The rationale is to make the data products easily usable to the widest audience without losing information that is critical to a large merged data set. The following transformations to the flags (and values) in the merged data file were made:
1. flag 0, 2, 9: no change to data or flag
2. flag 3, 4, 5 (questionable, bad, not reported): existing data values reset to NA and flags to 9
3. flag 6: reset to 2 with no change to data value
4. to correct flag errors which occurred at any step, the data are searched for NA and the flag associated with NA is set to 9.
Table 3 (fragment). WOCE quality control flags: 4, Clearly bad result; 5, Value not reported; 6, Average of replicate; 7, Not used; 8, Not used; 9, Not measured.
The final result should be a file that only has flag values 0, 2, or 9. This procedure is not perfect. It is impossible to predict all the possible typographical errors in files of this size. While it is trivially easy to identify the unique flag values in the combined data set it can be extremely tedious to identify the exact location of the error and know the appropriate correction.
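The four flagmod transformations, and the check for unexpected flag values mentioned above, can be expressed compactly. The following Python sketch is an illustration only (flagmod itself is S-Plus code); None stands in for NA and the function names are hypothetical.

```python
def simplify_flag(value, flag):
    """Reduce a (value, WOCE flag) pair to the subset of flags 0, 2 and 9."""
    if flag in (3, 4, 5):        # questionable, bad, not reported
        return None, 9
    if flag == 6:                # average of replicates is treated as good
        flag = 2
    if value is None:            # any NA must carry the "not measured" flag
        flag = 9
    return value, flag

def unexpected_flags(pairs):
    """Unique flag values that remain outside the expected set after simplification."""
    return {flag for _, flag in pairs} - {0, 2, 9}

raw = [(2154.2, 2), (2160.9, 3), (2151.7, 6), (None, 2), (34.91, 0), (2.1, 7)]
simplified = [simplify_flag(value, flag) for value, flag in raw]
leftover = unexpected_flags(simplified)
```

The sketch leaves flags that the text does not mention (for example 7) untouched, so they appear in `leftover`. This corresponds to the situation described above: finding the stray flag values is easy, deciding what the correct value should have been is not.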

Salinity and miscellaneous corrections
For CARINA we decided that a sample must have pressure and temperature to have any value. Basically, we assumed that if either of these values was missing then something had gone critically wrong with that sample. Consequently, if either temperature or pressure was missing, then all data for that sample bottle were set to NA and the flags to 9. Fortunately, there were very few instances. Salinity data are also critical, but the circumstances are different. For CARINA we chose bottle salinity in preference to CTD derived salinity. Some original data files contained bottle measurements only, others contained CTD salinity values only, others contained both, and many files had salinity values with the source not identified. When the source was not identified, we assumed that the values were bottle salinity.
Up to this point the two types of salinity data were both retained and stored separately. Here we made two assumptions: first, that any CTD salinity was better than nothing, and second, that any existing salinity was better than what could be interpolated. Both assumptions should usually be true even with uncorrected CTD salinity. Consequently, wherever bottle salinity was missing and a CTD salinity value existed, the CTD salinity (and flag) was copied into the bottle salinity data slot. The rationale for this procedure was to make the data easier to use without incurring errors that would be significant for most applications. This procedure probably added noise to the salinity data, but one might expect the noise to be pseudo-random for the entire data set.
Table 4. Interpolation zones and limits. Zones and limits were determined by experimentation. For each interpolated value the adjacent measured values (above and below) can be separated by no more than the corresponding limit for the interpolated value to be deemed acceptable.
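The bottle/CTD salinity fallback described above, together with the rule that a sample without pressure or temperature is voided, can be sketched in Python as follows. The dictionary keys and function names are assumptions made for the illustration, and None stands in for NA.

```python
def merge_salinity(bottle, bottle_flag, ctd, ctd_flag):
    """Prefer bottle salinity; copy the CTD value and its flag only when bottle is missing."""
    if bottle is None and ctd is not None:
        return ctd, ctd_flag
    return bottle, bottle_flag

def void_if_incomplete(sample):
    """Set every value to NA and every flag to 9 when pressure or temperature is missing.

    `sample` maps a parameter name to a (value, flag) pair.
    """
    if sample["pressure"][0] is None or sample["temperature"][0] is None:
        return {name: (None, 9) for name in sample}
    return sample

filled = merge_salinity(None, 9, 34.912, 2)      # CTD value fills the gap
kept = merge_salinity(34.905, 2, 34.912, 2)      # bottle value is preferred
voided = void_if_incomplete({"pressure": (None, 9),
                             "temperature": (2.1, 2),
                             "tco2": (2154.0, 2)})
```

Copying the CTD flag along with the value keeps the quality information attached to the number that actually ends up in the bottle salinity slot.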

Interpolation
Many of the procedures used to interpret biogeochemical data involve various property-property plots or linear least squares fitting procedures. Since the highest priority application for the CARINA data set was oceanic carbon chemistry, we did not want to exclude relatively expensive carbon measurements from such analyses only because the sample was not analyzed for salinity, oxygen or one of the nutrients. Consequently, we made the same decision as was made during the GLODAP effort and interpolated missing values for salinity, oxygen, nitrate, phosphate and/or silicate where it was reasonable to do so. The GLODAP algorithm was used. That is, a quasi-Hermitian piecewise polynomial was fit to existing data and that fit used to approximate missing values. The distance over which interpolation was allowed varied with pressure in the water column and by region. The zones and limits were determined by experiment and consensus between Princeton and the four area team leaders. Table 4 summarizes the pressure zones and the maximum allowable data separation for each zone. Extrapolation was not allowed. These interpolated values were assigned a zero flag value.
While this procedure has proven to be very reliable, it is not perfect. Unusual sample distributions combined with the nature of the fitting function can generate anomalous values. In particular for the CARINA cruises it was not uncommon to have multiple samples at very similar pressures for a given station. This situation was virtually never encountered with GLODAP sampling. The Hermite fitting function is not prone to "ring"; however, when adjacent samples are extremely close together the function can give spurious results. Consequently, the interpolated values generated with the Hermitian scheme were compared to values derived by simple linear interpolation. In cases where the Hermitian approximation differed from the linear approximation by more than 1%, the linear value was chosen. An example of this is shown in Fig. 3. Even these precautions will not cover all questionable interpolations; therefore, after the interpolation step was completed, the combined (measured + interpolated) parameters were checked and the obvious fliers eliminated from the data set. This check was very crude with the result that the final data set undoubtedly contains a few anomalous interpolated values.
Figure 3. Illustration of interpolation. The black dots are measured data. The red boxes and blue x's are interpolated values at the indicated pressures using the Hermitian and linear fitting functions, respectively. Note that there are two measurements near 3000 dB and that these measured values are very nearly identical. The close proximity (in pressure) of these two measurements causes the Hermitian fitting function to "ring" thus producing the errant interpolated value near 3100 dB. In cases such as this, when the two fitting functions produce results that differ by more than 1%, the linear interpolation is used. For all the other cases shown the difference is less than 1% and the approximation from the Hermitian function is used. All of the interpolated points shown in this example pass the "maximum measured data separation distance" test described in the text and in Table 4.
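The acceptance rules for an interpolated value (no extrapolation, a maximum separation between the bracketing measurements, and a fallback to the linear estimate when the two fits disagree by more than 1%) can be sketched in Python as follows. The Hermitian fit itself is not reproduced; its result is simply passed in. The separation limit is supplied by the caller because the actual zone limits are those of Table 4, and measuring the 1% difference relative to the linear value is an assumption.

```python
from bisect import bisect_left

def linear_interpolate(pressures, values, target, max_separation):
    """Linear estimate at `target`, or None if the rules forbid interpolation.

    `pressures` must be sorted in ascending order and match `values` element-wise.
    """
    if target < pressures[0] or target > pressures[-1]:
        return None                      # extrapolation is not allowed
    i = bisect_left(pressures, target)
    if pressures[i] == target:
        return values[i]                 # a measurement exists at this pressure
    p_above, p_below = pressures[i - 1], pressures[i]
    if p_below - p_above > max_separation:
        return None                      # bracketing data too far apart
    weight = (target - p_above) / (p_below - p_above)
    return values[i - 1] + weight * (values[i] - values[i - 1])

def choose_estimate(hermite_value, linear_value, tolerance=0.01):
    """Use the Hermitian estimate unless it differs from the linear one by more than 1%."""
    if hermite_value is None or linear_value is None:
        return None
    if abs(hermite_value - linear_value) > tolerance * abs(linear_value):
        return linear_value
    return hermite_value

# Hypothetical deep salinity profile (pressure in dB)
pressures = [2000.0, 3000.0, 4000.0]
salinity = [34.90, 34.92, 34.90]
linear = linear_interpolate(pressures, salinity, 2500.0, max_separation=1500.0)
```

A call such as `choose_estimate(35.50, linear)` returns the linear value, since the hypothetical Hermitian result of 35.50 differs from it by more than 1%, which is the situation illustrated near 3100 dB in Fig. 3.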
As an experiment, the data shown in Fig. 3 were also fitted with spline, spline under tension, "csakm" (from the IMSL FORTRAN library; Virtual Numerics, Inc.), and "loess" (from the S-Plus library; see Cleveland and Devlin, 1988) functions. The first 3 showed "ringing" equal to or worse than the Hermitian function. The "loess" fit does not ring, but is overly smoothed. For this example an obvious "fix" would be to average the two data points that are so close to each other (near 3000 dB) and use the average as input to the fitting routine. Such an averaging scheme for data that are nearly co-located would be a good modification to the interpolation software. The problem is that one has to define "close" and that definition will certainly vary with pressure and geographic location. If one only had 10 or 100 interpolations then the interpolation procedure could be visually monitored; however, with more than 84 000 possible interpolations that was not practical. Therefore, the required software development and testing has been left as a future exercise.
We are aware that myriad other interpolation algorithms exist. Only those mentioned were tested and we do not imply that the method used is the "best" (however one might choose to define best). We do feel that the interpolation is worthwhile and that the method used is both reasonable and adequate. In the end, the limits over which interpolation is allowed tend to be more important than the fitting algorithm.

Basic calculations
The existing data were used to calculate values for potential temperature, potential density relative to 0, 1000, 2000, 3000 and 4000 dBar, and apparent oxygen utilization (AOU) using the same algorithms used for GLODAP. Additionally, sample depth was approximated for all samples using a simple function based on pressure and latitude (in cases where only depth was available, pressure was approximated using a similar function). These parameters were added to each data file.
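The text does not say which pressure-to-depth function was used. As an illustration only, the Python sketch below assumes the widely used formula from the UNESCO (1983) algorithms for seawater properties (the method of Saunders and Fofonoff), which depends only on pressure and latitude.

```python
import math

def depth_from_pressure(pressure_dbar, latitude_deg):
    """Approximate depth (m) from pressure (dbar) and latitude (degrees).

    Assumed formula: UNESCO (1983) pressure-to-depth conversion for a
    standard ocean. It is not necessarily the function used for CARINA.
    """
    p = pressure_dbar
    x = math.sin(math.radians(latitude_deg)) ** 2
    # Gravity variation with latitude, plus a mean increase with pressure
    gravity = 9.780318 * (1.0 + (5.2788e-3 + 2.36e-5 * x) * x) + 1.092e-6 * p
    numerator = (((-1.82e-15 * p + 2.279e-10) * p - 2.2512e-5) * p + 9.72659) * p
    return numerator / gravity

deep = depth_from_pressure(10000.0, 30.0)
```

With these coefficients a pressure of 10000 dbar at 30° latitude corresponds to a depth of about 9712.65 m, which is the check value published with the formula.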

Carbon calculations
All of the various carbon calculations in CARINA used the MATLAB® translation (http://cdiac.esd.ornl.gov/oceans/co2rprt.html) of the code originally developed by Lewis and Wallace (1998, same link). CARINA used the same constants used for GLODAP (most importantly, the Dickson and Millero (1987) refit of Mehrbach et al. (1973), but see also van Heuven et al. (2009)). This decision is supported by significant literature (e.g. Lee et al., 1996; Wanninkhof et al., 1998; McElligot et al., 1998; Millero et al., 2002; Mojica Prieto and Millero, 2002). Others have suggested different constants and given new fits to old data, but these studies were either vetted on a regional scale rather than globally or offered only very minimal improvement. The CARINA team carbon experts decided that the potential for minor improvement was less important than being consistent with values calculated during GLODAP since data from the two collections will undoubtedly be used together.

Partial pressure
The partial pressures of CCl4 and SF6 were calculated based on the solubility equations given by Warner et al. (1995), Bu and Warner (1995), Bullister and Wisegarver (1998) and Bullister et al. (2002). The partial pressure values and fractional equilibrium relative to the atmosphere at sampling time were extremely useful in the 2nd QC procedures for these parameters (Steinfeldt et al., 2010). Note that the GLODAP data products included "simple" CFC ages rather than partial pressures.

Data product parameter accuracy
Stated simply, it is impossible to determine the general accuracy of the various parameters included in the CARINA data products. Precision estimates could be calculated for various subsets of the data, however those results would have limited, if any, value. In lieu of such numbers, we investigated the "internal consistency" of the data products. Details of these estimates are given in Tanhua et al. (2010; Table 3). This exercise clearly demonstrated that the internal consistency of the data product was significantly better than for the original data. Excluding oxygen and nutrient data (since there are no "standards") the consistency values could loosely be interpreted as an upper limit of accuracy. This approximation is an upper limit since some of the variance included in the internal consistency calculation is due to real change. Conversely, if the 2nd QC procedure removed real change signals rather than measurement bias, then the internal consistency calculation would imply that the data in the products are "better" than they really are.

Lessons learned
Two things are clear. The CARINA project both benefited from and improved upon GLODAP techniques. The most significant improvements include development of software to automate much of the 2nd QC work and consequently being able to carry out 2nd QC on a larger subset of the total parameter set. This software also allowed the CARINA team to derive either additive or multiplicative adjustment factors for the various parameters. Experience has shown that multiplicative adjustments are superior to additive adjustments for oxygen and nutrients in particular (the additive nutrient adjustments used in GLODAP occasionally generated negative near surface concentrations!). As with GLODAP, CARINA 2nd QC demonstrated that different analytical techniques can yield different results with respect to data adjustments. We believe that retaining human control is preferable to fully automated analysis for data such as these.
Certainly the most glaring shortcoming for many of the cruise data sets was that complete records were not retained with the data. Prior to the WOCE program in the 1990s final cruise reports were not produced for many cruises. This was particularly prevalent when the cruise was manned by a single group from one institution. This situation was exacerbated by the fact that the data from most of the CARINA cruises were held exclusively in the collection of individual scientists. By the time the data were released for inclusion in this data product many of the people who had made the measurements were no longer working in the field. Fortunately, these practices are slowly ending. The CARBOOCEAN program requires that all funded projects report data within 2 years after the cruise. For CLIVAR, shipboard measurements are made public immediately and final data are required within 6 months after the cruise (except for shore based measurements). This paradigm shift from "proprietary forever" to rapid public availability carries the risk that another scientist will publish data before the PI responsible for the data has a chance. This occurrence is, however, extremely rare. Rapid public scrutiny of data more commonly results in elimination of data errors and new collaborative research opportunities. Timely data reporting ensures that sufficient metadata can still be obtained if it is not originally provided.
The development of CRM for the calibration of ALK and DIC was noted as one of the most important developments with carbon system measurements for GLODAP. The same is true for CARINA. CRM are readily available and reasonably priced. Production of a high quality ALK and/or DIC data set requires frequent CRM analysis.
pH measurements were rarely made during the WOCE program and the few measurements that were made were not included in GLODAP. Rather in GLODAP, pH and DIC were used to calculate ALK. With the CARINA collection pH was frequently measured. Additionally, since GLODAP was completed the issue of ocean acidification has attracted significant attention. Finally, the spectrophotometric measurement technique has become common and is far superior to electrode based measurements. One result of this history is that reporting requirements for pH data were not previously standardized. When CARINA began, the most accepted scale for oceanographic measurements was the seawater scale. During this project, however, agreement was finally reached that pH data should be reported on the total hydrogen ion scale at some specified temperature (generally 25 °C). By the time this decision was made, it was too late to change all of the CARINA data sets. Consequently, all CARINA pH values (both in the cruise files and in the data products) are reported on the seawater scale at 25 °C.
For GLODAP, the need for nutrient standards similar to the carbon CRMs was noted. Progress has been made (Aoyama et al., 2008; Aminot and Kirkwood, 1995), but so far, the use of nutrient "CRMs" has not been generally adopted. Analysis of the CARINA data makes it abundantly clear that this practice must stop. The community must adopt a set of CRMs and those "standards" should be used on every cruise. This change in methodology is absolutely critical if we are ever to understand subtle changes in nutrient distributions and stoichiometric ratios in a changing ocean environment.
The development of a dedicated web site for the CARINA work was extremely helpful. This site allowed team members to easily share data and ideas and provided a location to store all of the QC output and final adjustment tables. Now that the project is finished all of the CARINA website materials are being transferred to CDIAC for archive and public access.
The CARINA data products represent the work of hundreds of scientists. The project has now extended for a decade with the final effort requiring half that time. The original goal, to assemble a collection of European data that would be useful to study the inorganic carbon system in the North Atlantic Ocean, was significantly expanded and, we believe, successfully completed. Not only were the data assembled, but the most critical parameters were subjected to very careful analysis to remove various data biases. An independent analysis of the CARINA data product would undoubtedly show that overall the data quality of CARINA is not as high as GLODAP. This was expected. The CARINA cruises cover a longer time interval and more importantly the cruises were primarily carried out by individual scientists operating in small groups rather than being the result of a globally organized survey effort. Regardless, the secondary quality control activities have resulted in a data product that is sufficiently accurate for modern analyses including climate change issues. Equally important is the fact that CARINA both supplements and extends the global coverage provided by GLODAP. Chemical oceanographers now have a very nice data set covering the northern North Atlantic and Nordic Seas, the beginning of coverage for the Arctic Ocean, and significantly more data for the Southern Ocean. Additionally, while the CARINA calibration techniques differed somewhat from those of GLODAP, the two data sets are thought to be compatible without alteration for large scale investigations.
The CARINA QC and adjustment procedures risk removing real signals from the original data. Without a much larger and higher initial quality data set such removal would be impossible to detect. As others use these data for independent research projects additional information will be gained. However, temporal signals still exist in the data products. As an example Fig. 4 shows boxplots of near-surface (0-25 dB) nitrate and AOU (apparent oxygen utilization) data from the Nordic Seas region taken from the AMS data product. The data were taken after all adjustments had been applied. No interpolated values were included in this analysis. AOU was used rather than oxygen to remove the temperature dependence of oxygen solubility. It is abundantly clear that the seasonal cycle has not been removed from these data. A similar seasonal cycle exists for the near surface DIC data from this region, however, without removing the seasonal cycle, the expected anthropogenic increase is not readily apparent for these surface waters (it is visible in deep water). Detailed analyses are required to identify subtle signals. Such studies are planned, but not discussed here.
The seasonal signal demonstrated in Fig. 4 is so strong that it is not the most convincing demonstration that 2nd QC did not remove real signals from the data products. Figure 5 illustrates a much sterner test. Here, deep water DIC data from the same region as Fig. 4 are summarized by measurement year. A significant fraction of the data variability for each year is due to spatial variability. Even though this test is crude, the increasing concentration trend with time is clearly evident and statistically significant at a very high confidence level. The DIC increase rate derived from these combined data (0.33 µmole/kg/yr) is less than that derived from the near bottom data in the Irminger Sea time series (0.8 µmole/kg/yr; cruise #185). Again, detailed investigation will be required to determine if the difference in increase rate is real or due to the averaging incurred in the trend shown in Fig. 5.
The next planned step is to merge CARINA with GLODAP. Tests show the two data products to be consistently calibrated. The merge is, however, non-trivial because of differences in the parameters included and various detail differences such as sample indexing.
Figure 4. These two boxplots were generated using measured values from the AMS data product. The data selected are from the upper 25 m of the Nordic Seas region. The box widths are proportional to the number of data included. Even though a substantial fraction of the data were adjusted as part of the 2nd QC work, the seasonal cycle in these two parameters is retained. The near surface DIC data from this region have the same trend. Similar analyses with other parameters and other regions demonstrate that the 2nd QC procedure has not "erased" strong temporal signals. Investigation of more subtle signals such as the expected temporal increase in near surface DIC due to anthropogenic CO2 will require more careful analysis.
Figure 5. This boxplot shows deep water DIC measurements from the same region as Fig. 4. A significant fraction of the spread indicated by each box is due to spatial variability. In spite of the crude nature of this summary, the average concentration increase over time is statistically significant at a very high confidence level. The increase rate derived here is only about half that found for the Irminger Basin alone. For this discussion the important point is that the secondary QC adjustments have not erased subtle large scale temporal signals.
Acknowledgements. The CARINA project was highly collaborative, however, the participants only represent a small fraction of the scientists and ship's crew members who were involved in generating the data. We have tried to acknowledge those people in the CST, but there are unintentional omissions. We consider the online version of the CST to be a dynamic document and will correct/add any such errors/omissions. The scope of this project would have been impossible without the generous support of the CARBOOCEAN contract that supported many of the CARINA participants and provided significant travel support for Key