A uniform, quality controlled Surface Ocean CO 2 Atlas (SOCAT)

A well-documented, publicly available, global data set of surface ocean carbon dioxide (CO 2 ) parameters has been called for by international groups for nearly two decades. The Surface Ocean CO 2 Atlas (SOCAT) project was initiated by the international marine carbon science community in 2007 with the aim of providing a comprehensive, publicly available, regularly updated, global data set of marine surface CO 2 , which had been subject to quality control (QC). Many additional CO 2 data, not yet made public via the Carbon Dioxide Information Analysis Center (CDIAC), were retrieved from data originators, public websites and other data centres. All data were put in a uniform format following a strict protocol. Quality control was carried out according to clearly defined criteria. Regional specialists performed the quality control, using state-of-the-art web-based tools, specially developed for accomplishing this global team effort. SOCAT version 1.5 was made public in September 2011 and holds 6.3 million quality controlled surface CO 2 data points from the global oceans and coastal seas, spanning four decades (1968 to 2007). Three types of data products are available: individual cruise files, a merged complete data set and gridded products. With the rapid expansion of marine CO 2 data collection and the importance of quantifying net global oceanic CO 2 uptake and its changes, sustained data synthesis and data access are priorities.


Motivation
The net absorption of CO 2 by the oceans, caused by rising atmospheric CO 2 concentrations since the industrial revolution, has been responsible for removing CO 2 equivalent to approximately 50 % of the fossil fuel and cement manufacturing emissions or about 30 % of the total anthropogenic emissions, including land use change (Sabine et al., 2004). Because of the availability of the carbonate ion, an important species of the dissolved inorganic carbon pool, and carbonate sediments, the oceans have a tremendous CO 2 uptake capacity and will, on timescales of ten to hundred thousand years, absorb all but a small fraction of the fossil CO 2 that has been and will be emitted (Archer et al., 1997). Meanwhile the changes in ocean CO 2 uptake, relying on factors such as ocean circulation and biology, will be among the decisive factors for the evolution of future atmospheric CO 2 concentrations and climate development (e.g., Friedlingstein et al., 2006; Riebesell et al., 2009).
Presently there are two types of globally coordinated efforts that seek to resolve the dynamics of ocean CO 2 uptake through observations: repeat hydrography and surface ocean CO 2 observations (Gruber et al., 2010; Sabine et al., 2010). While repeat hydrography aims to assess variations in the ocean inventory of CO 2 on decadal timescales, surface ocean observations may resolve variations on seasonal to interannual timescales due to the higher sampling frequency. This high sampling frequency has been made possible by the advent of autonomous instruments and sensors for the near-continuous determination of surface water CO 2 , which may be installed on commercial seagoing vessels giving an observational repeat rate of a few weeks, depending on ship schedule (Cooper et al., 1998; Pierrot et al., 2009), or on moorings (Merlivat and Brault, 1995; DeGrandpré et al., 2000; Friederich et al., 2008; Wada et al., 2011). Moorings or drifting platforms provide observations on sub-diurnal timescales (e.g., Körtzinger et al., 2008; Leinweber et al., 2009; Merlivat et al., 2009; Parard et al., 2010), while underway observations increase spatial coverage.
These technological developments have led to a rapid increase in new surface ocean CO 2 data being collected each year. This is reflected in the number of data underlying the successive surface ocean pCO 2 (partial pressure of CO 2 ) climatologies of Takahashi et al. (1997, 2002, 2009a, b, 2011), increasing from 0.25 million for the 1997 edition to 5.2 million in 2011. Presently over a million observations are being made each year (Sabine et al., 2010). In order to deal with these data effectively and to maximise their scientific use, the international ocean carbon research community initiated the Surface Ocean CO 2 Atlas (SOCAT) project in 2007 (IOCCP, 2007). The aims of SOCAT were threefold. Firstly, SOCAT aimed to merge all available surface ocean CO 2 data into one uniformly formatted, quality controlled, publicly available database with regular updates. The second aim of SOCAT was to secure the long-term storage of each data set together with its required documentation (metadata). Finally, the community sought to realise a transparent and traceable approach for the handling, quality control and integration of surface ocean CO 2 data, which may be managed by the community on a routine basis in the future.
The first version of SOCAT (version 1.5) was made public on 14 September 2011 during "The ocean carbon cycle at a time of change: Synthesis and Vulnerabilities" meeting at the UNESCO (United Nations Educational, Scientific and Cultural Organization), Paris (Bakker et al., 2012). This SOCAT version comprises 6.3 million surface water CO 2 data from 1851 voyages made between 1968 and 2007, covering the global oceans and coastal seas (Figs. 1 and 2). Three data products are available: (1) cruise data files of quality controlled surface water f CO 2 (fugacity of CO 2 , similar to partial pressure) data, including the CO 2 values as reported by the investigator, (2) globally and regionally aggregated files of these f CO 2 data, and (3) a collection of gridded products providing averaged f CO 2 with minimal interpolation (Sabine et al., 2013). This article describes the history of SOCAT (Sect. 2), the procedures adopted in SOCAT for retrieving data (Sect. 3), for formatting (Sect. 3) and quality controlling these data (Sect. 4). The article introduces SOCAT data products and where they can be accessed (Sect. 5). An accompanying article (Sabine et al., 2013) describes the gridding procedures. The SOCAT website (www.socat.info) provides documentation on SOCAT, as well as links to sites with SOCAT data products. This article concludes with lessons learned from this first SOCAT version and recommendations for future SOCAT releases (Sect. 6).

History of SOCAT
In the late 1990s attempts were made by the SCOR-IOC (Scientific Committee on Oceanic Research-Intergovernmental Oceanographic Commission) committee on ocean CO 2 , the forerunner of the IOCCP (International Ocean Carbon Coordination Project), to assemble a comprehensive, well documented, publicly available data set of surface ocean f CO 2 for the global oceans and coastal seas. Efforts for encouraging data submission to a central location, the Carbon Dioxide Information Analysis Center, were partly successful. In 2004 the marine carbon community agreed on recommendations for the reporting of surface water CO 2 data and metadata (IOCCP, 2004). However, most data gatherers did not strictly follow these. Only a subset of all global surface water CO 2 data were made publicly available via CDIAC, with many data only available via the investigators, institute websites and national or world data centres.
Over the past decades several attempts have been made to establish a global surface ocean CO 2 database. In the late 1990s, Taro Takahashi from Lamont-Doherty Earth Observatory (LDEO) compiled an initial data set and updated this collection in 2002 and every year from 2007 onwards (Takahashi et al., 1997, 2002, 2009a, 2011). The primary reason for this effort was the creation of global climatologies of air-sea CO 2 fluxes (Takahashi et al., 1997, 2002, 2009b). This LDEO database was made public in 2007 and is currently being updated on an annual basis. The data treatment is based upon Takahashi's long experience. The LDEO database includes pCO 2 from discrete and continuous measurements. The most recent version of the LDEO data set has 5.2 million pCO 2 data from the global oceans and coastal seas from 1957 to 2010 (Takahashi et al., 2011).
In 2001, Bakker began to assemble a surface ocean CO 2 data set by putting public data from CDIAC into a uniform format, as part of the European Union (EU) project ORFOIS (Origin and fate of biogenic particle Fluxes in the Ocean and their Interaction with the atmospheric CO 2 concentration as well as the marine Sediment). Pfeil and Olsen streamlined and expanded this effort within the EU project CarboOcean from 2005 onwards. They compiled public surface ocean CO 2 data held at CDIAC, PANGAEA - Data Publisher for Earth & Environmental Science (an International Council for Science (ICSU) World Data Center, formerly the World Data Center for Marine Environmental Sciences, WDC-MARE) and elsewhere into a common format f CO 2 database based on the recommended formats for data and metadata reporting (IOCCP, 2004).
The Surface Ocean CO 2 Atlas was initiated at the Surface Ocean CO 2 Variability and Vulnerability (SOCOVV) meeting by the international ocean carbon research community (Table 1) (IOCCP, 2007). The SOCAT project agrees well with the objectives of the joint Carbon Implementation Plan of the Surface Ocean Lower Atmosphere Study (SOLAS) and Integrated Marine Biogeochemistry and Ecosystem Research (IMBER) (IMBER, 2005). SOCAT was given the specific objectives of developing two data products (IOCCP, 2007):
- A quality controlled f CO 2 data set made publicly available on a regular basis following agreed procedures and regional review;
- A gridded product consisting of monthly surface f CO 2 means (including number of data points and standard deviation) on a 1° latitude by 1° longitude grid with no interpolation.
A gridded surface ocean f CO 2 product was deemed to be more useful than air-sea CO 2 flux estimates for modelling and other purposes (IOCCP, 2007). Regional groups and a global group for coordination were formed (Table 2).
A series of meetings was held in which SOCAT gradually took shape and in which the regional groups coordinated their work (Table 1) (IOCCP, 2007, 2008, 2009a, b, 2010a). The SOCAT community evaluated existing data compilations and selected the data collection by Pfeil and Olsen as the basis for SOCAT (IOCCP, 2008). The focus for SOCAT has been the assembly of publicly available data (including metadata), standardisation of the file formats, recalculation of consistent and uniform surface water f CO 2 data, and basic and secondary level quality control (Sects. 3, 4 and 5).
SOCAT is independent from the LDEO database (Takahashi et al., 2011), but has a large overlap in its original data. SOCAT only includes surface water CO 2 values, measured in near-continuous operation or in discrete samples with an equilibrator system or a spectrophotometer and reported as xCO 2 (CO 2 mixing ratio), pCO 2 or f CO 2 (Sect. 3). SOCAT does not include f CO 2 recalculated from dissolved inorganic carbon, alkalinity or pH.

SOCAT groups
Roughly 45 international, seagoing marine carbon scientists and data managers from 12 countries actively participated in the assembly and quality control of SOCAT version 1.5. These participants were organised into regional groups and a global group (Table 2). The regional groups were responsible for quality control of the data in their region. Regional groups were formed for the coastal seas (north of 30° S), the North Atlantic Ocean (north of 30° N, including the Atlantic Arctic Ocean), the Tropical Atlantic Ocean (30° N to 30° S), the North Pacific Ocean (north of 30° N, including the Pacific Arctic Ocean), the Tropical Pacific Ocean (30° N to 30° S), the Indian Ocean (north of 30° S) and the Southern Ocean (south of 30° S, including coastal waters). Coastal regions were initially defined by bathymetry (shallower than 200 m) for regions north of 30° S (IOCCP, 2008). This definition was later replaced by a criterion of distance from a major land mass (less than 400 km) in order to better reflect the environmental significance of these regions as continental margins. Figure 3 shows these oceanic and coastal regions in SOCAT.
SOCAT has been a large, complex undertaking and has involved activities focused on: data retrieval, assembling data in a uniform format, recalculating surface water f CO 2 using the same agreed-upon protocol, defining SOCAT QC criteria, developing the QC cookbook and Matlab QC code, making SOCAT available via the Live Access Server (LAS) for QC and public release, data QC, gridding SOCAT, making SOCAT documentation and products available via the web, designing the SOCAT logo, internal communication, organisation of SOCAT meetings, and liaising with the international marine carbon community. Numerous colleagues have played a role in these activities (Table 3). The SOCAT global group initially had five members and has gradually been expanded to reflect the increasing complexity of the tools and products in SOCAT (Table 2).
The data in SOCAT are a synthesis of 4 decades of seagoing fieldwork by numerous scientists from 12 countries. Various instruments have been used to obtain these data and only the basic principles will be summarised here. Further information is available in the metadata, which accompany individual cruise files at PANGAEA (doi:10.1594/PANGAEA.769638) (Sect. 5.2).
The seawater f CO 2 values included in SOCAT have been measured according to one of the following two principles: (1) analysis of the CO 2 content in an air sample in equilibrium with a large volume of seawater or (2) calculation of the seawater f CO 2 from the colour response of an acid-base indicator dye (sulfonephthalein) in contact with seawater across a CO 2 permeable membrane. The analysis of the CO 2 content in an air sample in equilibrium with a large volume of seawater is recommended in the standard work by Dickson et al. (2007). The CO 2 concentration in the air sample is determined through either gas chromatography (Weiss et al., 1981) or infrared analysis (Takahashi, 1961). The equilibration of air and water can be carried out in an equilibrator in a flow-through system (Takahashi, 1961; Wanninkhof and Thoning, 1993; Cooper et al., 1998; Pierrot et al., 2009). Only the latter (i.e., continuous data) are included in SOCAT. Flow-through systems combined with a non-dispersive infrared (IR) detector are by far the most common type in operation. Flow-through systems are routinely deployed on commercial vessels (e.g., Cooper et al., 1998; Olsen et al., 2008; Watson et al., 2009), research vessels (e.g., Lefèvre et al., 1994; Skjelvan et al., 1999; Bakker et al., 2008), and on moored platforms (e.g., Friederich et al., 2008; Wada et al., 2011). Intercomparison experiments have taken place on a number of occasions (e.g., Körtzinger et al., 1996, 2000). The indicator-based, spectrophotometric determination of f CO 2 has been developed for moored and drifting platforms (Lefèvre et al., 1993). Prominent examples of these are the CARIOCA (Carbon Interface Ocean Atmosphere) buoy (Merlivat and Brault, 1995) and the SAMI (Submersible Autonomous Moored Instrument) pCO 2 instrument (DeGrandpré et al., 2000). These instruments have been deployed in many ocean regions (e.g., Hood et al., 1999; Bakker et al., 2001; Körtzinger et al., 2008; Lefèvre et al., 2008).

Data harmonisation and basic quality control
All data files available for SOCAT were first converted to a common file structure. This also included discarding data not directly relevant for surface ocean CO 2 , e.g., meteorological parameters like wind speed and direction, whenever these were supplied in the file. Next, the unit of each parameter was checked and converted into the agreed standard unit, if required (e.g., conversion of atmospheric pressure from atmospheres to hPa, and of latitude and longitude to decimal degrees). For around 10 % of the cruises, different versions of the data had been obtained from various sources. In these cases only the most recent version was included in SOCAT in consultation with the data originator.
Basic, primary quality control was carried out at this stage. Outliers and unrealistic values in date, time, position, intake temperature, salinity, atmospheric pressure and surface water CO 2 were identified. The criteria were that ship speeds calculated from position should be realistic, that atmospheric pressures should be between 800 hPa and 1100 hPa and that the dates should exist. Rapid changes in intake or equilibrator temperature of several degrees, in salinity of several units or in surface f CO 2 of several hundreds of microatmospheres were also questioned, except for data in coastal or ice-covered regions. Whenever several such data points were encountered, the data originator was contacted and this often resulted in resubmission of an updated (corrected) version. In some cases several iterations were required, making this a time-consuming task. In a few cases interaction with the data originator was not possible, and obviously bad data were removed from the data file.
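The basic checks on date, pressure and ship speed can be sketched in a few lines of Python. This is an illustration only, not the code used for SOCAT: the function names, the haversine distance, the Earth radius and in particular the 60 km per hour speed ceiling are assumptions introduced here, since the text only asks for "realistic" ship speeds. The pressure window of 800 to 1100 hPa is the one stated above.

```python
import math
from datetime import datetime

P_MIN_HPA, P_MAX_HPA = 800.0, 1100.0  # pressure window stated in the text
MAX_SPEED_KMH = 60.0                  # assumed ceiling; the text only says "realistic"
EARTH_RADIUS_KM = 6371.0              # mean Earth radius, assumed for this sketch


def date_exists(year, month, day):
    """Return True if the calendar date is real (rejects e.g. 30 February)."""
    try:
        datetime(year, month, day)
        return True
    except ValueError:
        return False


def pressure_ok(p_hpa):
    """Atmospheric pressure must lie between 800 and 1100 hPa."""
    return P_MIN_HPA <= p_hpa <= P_MAX_HPA


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two positions in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def speed_ok(fix_a, fix_b, max_kmh=MAX_SPEED_KMH):
    """Check the speed implied by two (datetime, lat, lon) fixes."""
    (t_a, lat_a, lon_a), (t_b, lat_b, lon_b) = fix_a, fix_b
    hours = abs((t_b - t_a).total_seconds()) / 3600.0
    dist = great_circle_km(lat_a, lon_a, lat_b, lon_b)
    if hours == 0.0:
        return dist == 0.0  # a position jump with no elapsed time is suspect
    return dist / hours <= max_kmh
```

Records failing any of these checks would be candidates for follow-up with the data originator rather than for automatic removal.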
In version 2 of SOCAT this class of quality control will be used to assign quality flags to individual data points, using the conventions of the World Ocean Circulation Experiment (WOCE): flag 2 (good), flag 3 (questionable) or 4 (bad). Only a very small number of WOCE flags 3 and 4 are found in the version 1.5 data collection.

f CO 2 (re-)calculations
The final stage of the SOCAT data assembly was the (re-)calculation of f CO 2 values at sea surface (or intake) temperature in order to ensure a uniform representation of CO 2 concentration. The conversions from xCO 2 and pCO 2 were carried out using a single set of equations with a clear hierarchy for the preferred CO 2 input parameter (Table 4). We used the equations recommended by Dickson et al. (2007). For the conversion of dry CO 2 mole fraction to partial pressure at 100 % humidity the equation is
$$(p\mathrm{CO}_2)^{\mathrm{wet}}_{T_{\mathrm{equ}}} = (x\mathrm{CO}_2)^{\mathrm{dry}}_{T_{\mathrm{equ}}} \left(P_{\mathrm{equ}} - p\mathrm{H_2O}\right), \tag{1}$$
where P equ is the pressure in the equilibrator. The water vapour pressure pH 2 O (in atm) is calculated as
$$p\mathrm{H_2O} = \exp\left[24.4543 - 67.4509\,\frac{100}{T^{K}_{\mathrm{equ}}} - 4.8489\,\ln\left(\frac{T^{K}_{\mathrm{equ}}}{100}\right) - 0.000544\,S\right], \tag{2}$$
where T K equ is the measurement (or equilibrator) temperature in Kelvin and S is sample salinity. For the conversion of pCO 2 values into f CO 2 the equation is
$$(f\mathrm{CO}_2)^{\mathrm{wet}}_{T_{\mathrm{equ}}} = (p\mathrm{CO}_2)^{\mathrm{wet}}_{T_{\mathrm{equ}}} \exp\left\{\frac{\left[B(\mathrm{CO}_2, T^{K}_{\mathrm{equ}}) + 2\left(1 - (x\mathrm{CO}_2)^{\mathrm{wet}}_{T_{\mathrm{equ}}}\right)^{2} \delta(\mathrm{CO}_2, T^{K}_{\mathrm{equ}})\right] P_{\mathrm{equ}}}{R\,T^{K}_{\mathrm{equ}}}\right\}, \tag{3}$$
where xCO 2 wet T equ is the wet CO 2 mole fraction at the equilibrator temperature and R is the gas constant. The virial coefficients for CO 2 , B(CO 2 , T K equ ) and δ(CO 2 , T K equ ) (cm 3 mol −1 ), are given by
$$B(\mathrm{CO}_2, T^{K}_{\mathrm{equ}}) = -1636.75 + 12.0408\,T^{K}_{\mathrm{equ}} - 3.27957 \times 10^{-2}\,(T^{K}_{\mathrm{equ}})^{2} + 3.16528 \times 10^{-5}\,(T^{K}_{\mathrm{equ}})^{3}, \tag{4}$$
$$\delta(\mathrm{CO}_2, T^{K}_{\mathrm{equ}}) = 57.7 - 0.118\,T^{K}_{\mathrm{equ}}. \tag{5}$$
Whenever conversion of the measurement (equilibrator) temperature (T equ ) to the sea surface temperature (SST) was required, we used the equation of Takahashi et al. (1993) with both temperatures in the same unit:
$$(f\mathrm{CO}_2)^{\mathrm{wet}}_{\mathrm{SST}} = (f\mathrm{CO}_2)^{\mathrm{wet}}_{T_{\mathrm{equ}}} \exp\left[0.0423\,(\mathrm{SST} - T_{\mathrm{equ}})\right]. \tag{6}$$
The Takahashi et al. (1993) temperature correction is preferred, as it does not require knowledge of the alkalinity and dissolved inorganic carbon content of the water and was determined for isochemical conditions, while other temperature corrections (Gordon and Jones, 1973; Weiss et al., 1982; Copin-Montégut, 1988, 1989; Goyet et al., 1993) were not.
Table 4. Surface water CO 2 parameters reported in the original data files, which have been used for the calculation of recommended f CO 2 (fCO2 rec) at sea surface (or intake) temperature. The parameters are listed in order of preference (with index 1 as the most preferred). The index has been reported in the SOCAT global and regional output files as "fCO2 source" (Table 5). Ancillary parameters have been used for NCEP (National Centers for Environmental Prediction) atmospheric pressure (Kalnay et al., 1996) and WOA (World Ocean Atlas) salinity (Antonov et al., 2006).
Table 5 footnotes. Averaged: indicator that data were averaged for version 1.5 (footnote 1). * refers to data reported by the data originator. 1 Individual cruise data files in version 1.4 may contain multiple entries for a given time stamp. Multiple entries for a given time stamp have been averaged in the global and regional concatenated files in version 1.5 (Sects. 5.1, 5.2 and 5.3). 2 If the intake depth has not been reported by the data originator, we assume an intake depth of 5 m.
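The conversion chain from dry xCO 2 to f CO 2 at sea surface temperature can be sketched as follows. This is a minimal illustration under stated assumptions, not the SOCAT code: it uses the formulations commonly cited with Dickson et al. (2007), namely the water vapour pressure of Weiss and Price (1980), the virial coefficients of Weiss (1974) and the 0.0423 per °C coefficient of Takahashi et al. (1993). Pressures are assumed to be in atm, the gas constant is taken as 82.0578 cm 3 atm mol −1 K −1 , and all function names are hypothetical.

```python
import math

R_GAS = 82.0578  # gas constant in cm3 atm mol-1 K-1 (assumes pressures in atm)


def ph2o_atm(t_kelvin, salinity):
    """Water vapour pressure over seawater in atm (Weiss and Price, 1980)."""
    return math.exp(24.4543
                    - 67.4509 * (100.0 / t_kelvin)
                    - 4.8489 * math.log(t_kelvin / 100.0)
                    - 0.000544 * salinity)


def pco2_wet(xco2_dry_ppm, p_equ_atm, t_equ_c, salinity):
    """pCO2 (uatm) at 100 % humidity from the dry mole fraction (ppm)."""
    t_k = t_equ_c + 273.15
    return xco2_dry_ppm * (p_equ_atm - ph2o_atm(t_k, salinity))


def fco2_from_pco2(pco2_uatm, p_equ_atm, t_equ_c):
    """fCO2 (uatm) from pCO2 using the virial coefficients of Weiss (1974)."""
    t_k = t_equ_c + 273.15
    b = (-1636.75 + 12.0408 * t_k
         - 3.27957e-2 * t_k ** 2 + 3.16528e-5 * t_k ** 3)
    delta = 57.7 - 0.118 * t_k
    xco2_wet = pco2_uatm * 1e-6 / p_equ_atm  # wet mole fraction (dimensionless)
    exponent = (b + 2.0 * (1.0 - xco2_wet) ** 2 * delta) * p_equ_atm / (R_GAS * t_k)
    return pco2_uatm * math.exp(exponent)


def to_sst(fco2_tequ, sst_c, t_equ_c):
    """Temperature correction from equilibrator temperature to SST."""
    return fco2_tequ * math.exp(0.0423 * (sst_c - t_equ_c))


def fco2_at_sst(xco2_dry_ppm, p_equ_atm, t_equ_c, sst_c, salinity):
    """Full chain: dry xCO2 -> wet pCO2 -> fCO2 at T_equ -> fCO2 at SST."""
    p = pco2_wet(xco2_dry_ppm, p_equ_atm, t_equ_c, salinity)
    f = fco2_from_pco2(p, p_equ_atm, t_equ_c)
    return to_sst(f, sst_c, t_equ_c)
```

Keeping each step as its own function mirrors the hierarchy of input parameters: a file that reports pCO 2 rather than xCO 2 would simply enter the chain at `fco2_from_pco2`.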
Altogether 6 different surface ocean CO 2 parameters were reported by the data originators, notably xCO 2 , pCO 2 and f CO 2 , either at sea surface (or intake) temperature or at equilibrator (or measurement) temperature (Table 4). The (re-)calculations of f CO 2 at sea surface temperature were implemented following these strict guidelines:
1. Whenever possible, (re-)calculate f CO 2 .
2. The preferred starting point for the calculations is xCO 2 , next pCO 2 , and finally f CO 2 .
3. Minimize the use of external data required to complete the calculations.
Thus, f CO 2 was recalculated if xCO 2 , pCO 2 , and f CO 2 , as well as all parameters required to calculate f CO 2 were available in the file. However, f CO 2 was not recalculated if f CO 2 was reported, but pressure or salinity were not, as Eqs. (1), (2) and (3) could not be applied without resorting to external data. If only surface water f CO 2 at sea surface temperature was provided (as is the case for CARIOCA data and other spectrophotometric measurements), no recalculation was carried out. If f CO 2 was not provided, f CO 2 was always calculated, even if use of external data was necessary. Table 4 lists the parameters that went into the f CO 2 calculations and the preference (or hierarchy) of the different calculation methods. The f CO 2 values, which have been (re-)calculated following the preferred method (lowest index number in Table 4), are reported as the recommended f CO 2 (fCO2 rec) values in each SOCAT output file (Table 5). The calculation method is indicated (as fCO2 source) in the regional and global synthesis files of SOCAT version 1.5 (Table 5).
Two external parameters were used for the recalculations of f CO 2 , when necessary: climatological monthly mean salinity was obtained from the World Ocean Atlas (WOA) 2005 (Antonov et al., 2006). Sea level pressure (SLP) was acquired from the NCEP/NCAR (National Centers for Environmental Prediction/National Center for Atmospheric Research) project (Kalnay et al., 1996), provided on a 6-hourly, global, 2.5° latitude by 2.5° longitude grid. Whenever NCEP/NCAR SLP or reported atmospheric pressure was used in the calculations (as opposed to equilibrator pressure), 3 hPa were added to account for the slight overpressure normally maintained in ships (Takahashi et al., 2009b). Surface water CO 2 data without accompanying SST were suspended from SOCAT, as f CO 2 is highly sensitive to temperature fluctuations.

Naming convention
Each cruise was assigned a unique cruise identifier, an Expocode (Swift, 2008), to remove the ambiguities of the commonly used informal cruise names and to identify duplicate versions of data. The first two characters of a twelve-character Expocode identify the country code of the vessel and are followed by the two-character National Oceanographic Data Center (NODC) vessel code. The final eight characters denote the starting date of the measurements of the cruise (as YYYYMMDD). For instance, 06MT19920510 means that this cruise was conducted on the German (06) research vessel Meteor (MT) and that the first measurement was reported for 10 May 1992. Both the Expocode and the original cruise name are provided in all SOCAT output files (Sect. 5), such that cruises can be retrieved using the Expocode as well as the vessel-specific or investigator-specific naming convention (M21/3 for the above example). The Expocode has not been used for buoys, since no NODC vessel code is available for these.
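The structure of an Expocode lends itself to a short parsing sketch. The example below is illustrative only: the `Expocode` tuple and the `parse_expocode` function are hypothetical names introduced here, and the code checks nothing beyond the twelve-character layout and the validity of the start date described above.

```python
from datetime import date, datetime
from typing import NamedTuple


class Expocode(NamedTuple):
    country: str  # two-character country code of the vessel
    vessel: str   # two-character NODC vessel code
    start: date   # date of the first measurement of the cruise


def parse_expocode(code: str) -> Expocode:
    """Split a twelve-character Expocode into its three components."""
    if len(code) != 12 or not code[4:].isdigit():
        raise ValueError(f"not a twelve-character Expocode: {code!r}")
    # strptime raises ValueError for dates that do not exist
    start = datetime.strptime(code[4:], "%Y%m%d").date()
    return Expocode(country=code[:2], vessel=code[2:4], start=start)


meteor = parse_expocode("06MT19920510")
print(meteor.country, meteor.vessel, meteor.start.isoformat())
```

For the example from the text this yields country code 06, vessel code MT and a start date of 10 May 1992.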

SOCAT secondary quality control
An important aim of SOCAT was to establish and implement community agreed secondary quality control (QC) procedures for f CO 2 data. Procedures for secondary QC were established at several SOCAT workshops (IOCCP, 2008, 2009b, 2010b) and were summarized in the SOCAT QC Cookbook. Secondary quality control was carried out by the SOCAT regional groups. The following sections provide an overview of the secondary quality control procedures in SOCAT and their implementation.

SOCAT secondary quality control procedures
Secondary quality control was carried out in SOCAT by assigning a quality flag to each cruise. The cruise flags provide information on the expected quality of the f CO 2 data in the different cruises. These are based on (i) an evaluation of the procedures and instruments used to measure the data, (ii) the availability of documentation enabling this evaluation, i.e., metadata, (iii) (whenever possible) a comparison with other data collected in the same region in the same period, and (iv) an assessment of data quality. The cruise flags and the formal criteria used to assign them are provided in Table 6. Only cruises with cruise flags A, B, C or D are included in the SOCAT products (Sect. 5).
Table 6. Criteria for assigning cruise flags, based on the expected quality of the recommended f CO 2 data (revised after Olsen and Metzl, 2009). All criteria need to be met for assigning a cruise flag. SOP is Standard Operating Procedures (Dickson et al., 2007). QC is quality control.

N (No flag)
No cruise flag has yet been given to this cruise.

U (Update)
The cruise data have been updated.
No cruise flag has yet been given to the revised data.

In order to achieve cruise flag A, the data had to be accompanied by "complete metadata documentation", the measurement techniques had to follow "approved methods or SOP criteria" (Standard Operating Procedures), extended QC had to be carried out and deemed acceptable, and the data had to compare reasonably with other data from the same region. Moving from the A through to the D flag implies that the data did not meet one or several of these criteria. Hence, if the data were found to be sufficiently documented, obtained according to approved methods, and the data quality was deemed acceptable, but the data could not be compared to other data from the same region (since no other data were available), they would be assigned cruise flag B. If the sampling techniques did not follow approved methods, a flag C was assigned, and if the metadata documentation was incomplete, the data were assigned a cruise flag D. In addition to the A to D flags, it was intended that a flag F should be assigned to cruises for which the extended QC revealed that the data were non-acceptable. In practice, however, such cruises were suspended (Flag S). The flag S was assigned to cruises which were suspended with the aim of updating the cruise data in future (often by the individual PIs after SOCAT QC revealed issues that could be fixed) and the flag X was assigned to data that were identified as duplicates of other data in SOCAT. To streamline the workflow we also used flags N, for newly added cruises with no cruise flag yet, and U, for cruises that had been updated.
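The A to D logic described in the text reduces to a short decision cascade. The sketch below is a simplification written for illustration, not the procedure of Table 6: the function name and its four boolean inputs are hypothetical, the workflow flags N, U and X are left out, and the requirement that data be based on xCO 2 analysis for flags C and D is not modelled.

```python
def assign_cruise_flag(qc_acceptable, metadata_complete,
                       follows_sop, comparison_ok):
    """Return a cruise flag (A, B, C, D or S) from four yes/no judgements.

    Simplified reading of the text: each failed criterion moves the
    cruise one step down, and unacceptable data lead to suspension.
    """
    if not qc_acceptable:
        return "S"  # suspended, pending a revised submission
    if not metadata_complete:
        return "D"  # incomplete metadata documentation
    if not follows_sop:
        return "C"  # methods did not follow approved methods or SOP criteria
    if not comparison_ok:
        return "B"  # no satisfactory comparison with other data available
    return "A"


print(assign_cruise_flag(qc_acceptable=True, metadata_complete=True,
                         follows_sop=True, comparison_ok=False))
```

In practice each of the four inputs is itself the outcome of an expert evaluation, so a function like this only records the final bookkeeping step.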

Approved methods or SOP criteria
By approved methods or SOP criteria, required for flags A and B, we mean the recommendations of a 2002 workshop on underway f CO 2 systems (Atlantic Oceanographic and Meteorological Laboratory, 2002), as well as those of Dickson et al. (2007). Adhering to these methods results in f CO 2 data with an accuracy of 2 µatm or better. Seven SOP criteria need to be fulfilled for a cruise flag A or B in SOCAT:
1. The data are based on xCO 2 analysis, not f CO 2 calculated from other carbon parameters, such as pH, alkalinity or dissolved inorganic carbon;
2. Continuous CO 2 measurements have been made, not discrete CO 2 measurements;
3. The detection is based on an equilibrator system and is measured by infrared analysis or gas chromatography;
4. The calibration has included at least 2 non-zero gas standards, traceable to World Meteorological Organisation (WMO) standards;
5. The equilibrator temperature has been measured to within 0.05 °C accuracy;
6. The intake seawater temperature has been measured to within 0.05 °C accuracy;
7. The equilibrator pressure has been measured to within 0.5 hPa accuracy.
Note that criterion 1 also needs to be fulfilled for flags C and D.
A satisfactory comparison with other data was required for flag A. This was carried out using either comparison with all other data previously obtained in the region or, if available, a set of formally defined cross-overs. The formally defined cross-overs were identified using a criterion that combined separation in space and separation in time into a single value. The algorithm that was used treated 1 day of separation in time as equivalent (heuristically) to 30 km of separation in space, i.e., if dx is the distance between points from two cruises in km, and dt is the separation between the same two points in days, then the separation between these two points (in km) would be given as $\left[dx^{2} + (30\,dt)^{2}\right]^{1/2}$. The cross-over distance separating two cruises, dc, is the smallest value found comparing all pairs of points between two cruises. If a cross-over distance between two cruises was zero, a cruise had in most cases been erroneously duplicated, and the oldest version of the cruise data was excluded (flag X) in consultation with the data originator. Where the cross-over distance was relatively small, meaningful QC insights were often found by comparing observations from the two cruises. The LAS (Sect. 4.2.2) offered QC operators a means to compare cruise pairs with a small cross-over distance between them. No strict criteria were defined for judging the quality and significance of cross-overs.
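A brute-force version of the cross-over search can be sketched as follows. It is an illustration under stated assumptions, not the algorithm as implemented for SOCAT: it takes the combined separation to be the Euclidean norm of the spatial distance in km and the time difference scaled by 30 km per day (consistent with the stated equivalence of 1 day and 30 km), computes distances with the haversine formula on a sphere of radius 6371 km, and represents each observation as a hypothetical (latitude, longitude, time in days) tuple.

```python
import math

KM_PER_DAY = 30.0         # 1 day of separation counts as 30 km (from the text)
EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two positions in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def separation_km(p, q):
    """Combined space-time separation of two (lat, lon, time_days) points."""
    dx = haversine_km(p[0], p[1], q[0], q[1])
    dt = abs(p[2] - q[2])
    return math.hypot(dx, KM_PER_DAY * dt)


def crossover_distance(cruise_a, cruise_b):
    """Smallest separation over all pairs of points from two cruises."""
    return min(separation_km(p, q) for p in cruise_a for q in cruise_b)
```

The double loop costs one evaluation per pair of points, which is acceptable for a sketch; for millions of observations a spatial index or a coarse pre-filter on time would be needed. A result of zero points to a duplicated cruise, as described above.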
The comparison with all data collected in a region was implemented in the Matlab QC toolbox for SOCAT (Sect. 4.2.3). This toolbox prompts the user to define the region of interest on a map, hence allowing the QC operator to use his/her expert knowledge of the regional characteristics during this process. When the region has been defined, the toolbox produces figures that compare the data subject to QC with all other data in the region in the time and space domains, as well as in SST space.

Metadata
Complete metadata documentation was required for flags A, B and C. By complete we mean that all the following information must be supplied:
5. The type of reported CO 2 data (xCO 2 , pCO 2 , f CO 2 );
6. The number of CO 2 standards used with their approximate CO 2 mixing ratio and traceability;
7. A list of sensors and their accuracy, notably for:
a. The equilibrator and seawater intake temperature;
b. The equilibrator pressure.
Salinity does not need to be highly accurate for meeting the 2 µatm criterion, as the sensitivity of the xCO 2 to f CO 2 calculation is small (for example xCO 2 of 360 ppm at 20 °C and 1 atm yields f CO 2 of 347.22 µatm and 347.24 µatm at salinity 30 and 35, respectively). The metadata information had to either appear in the metadata themselves or in a publication cited in the metadata.

Extended quality control deemed acceptable
Flags A, B, C and D all required an extended QC with acceptable results. This extended QC included checks of the sampling positions and time, atmospheric pressures, salinity, intake and equilibrator temperatures, as well as recommended f CO 2 data, and included also a comparison with other data from the same region, if possible. The parameters were checked for range and occurrence of sudden, unrealistic jumps, and data from multiple streams were compared whenever possible (equilibrator, atmospheric and NCEP pressure; measured and WOA salinity; intake and equilibrator temperatures). Criteria for comparison of the intake and equilibrator temperatures were defined by the Southern and Indian Ocean SOCAT groups at their joint workshop in 2010 (IOCCP, 2010b):
- Warming should be less than 3 °C;
- Warming rate should be less than 1 °C h −1 , unless a rapid temperature front is apparent;
- Warming outliers should be less than 0.3 °C, compared to background data.
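The three temperature criteria can be turned into a simple screening routine. The sketch below rests on several assumptions that are not in the text: warming is taken as equilibrator temperature minus intake temperature, the warming rate as the change in warming between consecutive samples divided by the elapsed time in hours, and the background as the median warming of the series. The function and constant names are hypothetical, and flagged samples are meant for inspection by a QC operator (for instance to recognise a temperature front), not for automatic rejection.

```python
from statistics import median

MAX_WARMING_C = 3.0   # warming should be less than 3 degC
MAX_RATE_C_PER_H = 1.0  # warming rate should be less than 1 degC per hour
MAX_OUTLIER_C = 0.3   # outliers should be less than 0.3 degC from background


def warming_checks(samples):
    """Screen (time_h, intake_c, equ_c) samples sorted by time.

    Returns a list of (index, reason) pairs for samples that exceed a limit.
    """
    warming = [equ - intake for _, intake, equ in samples]
    background = median(warming)  # assumed definition of "background data"
    issues = []
    for i, w in enumerate(warming):
        if w >= MAX_WARMING_C:
            issues.append((i, "warming"))
        if abs(w - background) >= MAX_OUTLIER_C:
            issues.append((i, "outlier"))
        if i > 0:
            dt = samples[i][0] - samples[i - 1][0]
            if dt > 0 and abs(w - warming[i - 1]) / dt >= MAX_RATE_C_PER_H:
                issues.append((i, "rate"))
    return issues
```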
Apart from these, no strict criteria for QC were defined for the extended QC across all SOCAT groups. This will be improved in future versions of SOCAT. If the data from a campaign were by and large of unacceptable quality, a cruise flag S was assigned. Whenever a large number (> 50, as a guideline) of non-acceptable data were found, the data file was suspended (Flag S), while the data contributor was invited to submit a suitably revised version of the data. If revised data were made available before the SOCAT quality control had been completed and were deemed of good quality, the data were included in version 1.5. Other resubmitted data will be included in the quality control for future SOCAT versions.
If it was not possible to establish contact with the data originator, or if the number of unacceptable data was sufficiently small (typically less than 50), WOCE flags 3 (questionable) or 4 (bad) were assigned to each unacceptable f CO 2 recommended value (Table 5). While WOCE flags 3 and 4 were assessed during version 1 quality control, virtually all such flags were unintentionally reset to flag 2 (good) in the version 1 data products. The WOCE flags 3 and 4 assigned during version 1 quality control will be applied in the SOCAT version 2 products.
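The choice between suspending a cruise and flagging individual values can be summarised as a small decision rule. The sketch below assumes that exactly 50 unacceptable values does not trigger suspension (the text gives 50 only as a guideline); the function name and return strings are illustrative.

```python
SUSPENSION_GUIDELINE = 50  # guideline value from the text, not a strict limit


def qc_action(n_unacceptable, originator_reachable):
    """Suggest a QC action for a cruise with n_unacceptable bad data points."""
    if n_unacceptable == 0:
        return "no action"
    if n_unacceptable > SUSPENSION_GUIDELINE and originator_reachable:
        return "suspend cruise (flag S) and invite resubmission"
    # Few unacceptable data, or no contact with the data originator
    return "assign WOCE flag 3 or 4 to each unacceptable value"


print(qc_action(120, originator_reachable=True))
print(qc_action(120, originator_reachable=False))
print(qc_action(12, originator_reachable=True))
```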

Offline quality control
A set of Matlab routines for data evaluation was available for offline QC (Olsen and Pierrot, 2010). These routines create a series of plots that support QC. The toolbox prompts the user to define the region of interest on a map, allowing the QC operator to apply his/her expert knowledge of the regional characteristics during the QC process. Once the region has been defined, the toolbox produces figures that compare the data subject to QC with all other data in the region in the time and space domains as well as in sea surface temperature space. Examples include a series of property-property plots, a series of plots of property versus time, and a series of plots comparing the f CO 2 data for the cruise subject to QC with all other data obtained in the region (as defined by the QC operator).

Secondary quality control in practice
Secondary quality control by the regional groups
The regional groups had the responsibility for secondary QC of all cruises crossing their region. Regional SOCAT QC operators carried out secondary quality control and assigned flags to each cruise during the QC process upon evaluation of the data and metadata. The recommended f CO 2 and supporting data were made available via the Live Access Server during quality control. Data were evaluated according to the procedures in the SOCAT cookbook. The QC was carried out in a variety of ways, either online via the LAS (Sect. 4.3.2) or offline (Sect. 4.3.3).

Live Access Server for quality control
The Live Access Server is a web server designed by NOAA PMEL (National Oceanic and Atmospheric Administration, Pacific Marine Environmental Laboratory) to provide access to geo-referenced scientific data (http://ferret.pmel.noaa.gov/LAS). Cruise data and metadata were ingested into a relational database and made available to the regional teams for evaluation through a version of the LAS, which had been enhanced with SOCAT quality control tools. Contents of the database included recommended f CO 2 values, ancillary parameters, cruise metadata, and reference variables drawn from other sources (Sect. 5.4). The LAS enabled QC operators to query the data collection using criteria of region, time period, seasonality, cruise and ship identifiers, and ranges of data values. The scientists could select data from one or more cruises, evaluate the data within the LAS and/or download subsets as compressed files for offline QC. The LAS offered QC evaluation tools, such as interactive property-property plots and co-inspection of cruises identified by the cross-over analysis (Sect. 4.2.1). The LAS provided access to the cruise metadata, which had to be evaluated as part of the QC. It also allowed uploading of ancillary documentation about the cruises and QC findings. The QC operators entered cruise flags and WOCE flags with comments explaining the rationale for their evaluations on the LAS during quality control. The flags and comments are available via the Cruise Data Viewer (Sect. 5.5). The system alerted QC operators when conflicting QC evaluations had been entered, allowing SOCAT scientists to evaluate and resolve these conflicts.

Offline quality control
A set of Matlab routines for data evaluation was available for offline QC (Olsen and Pierrot, 2010). These routines create a series of property-property plots, enabling QC operators to compare data from cruises in the same region. The f CO 2 is plotted and colour coded according to the input parameter used (xCO 2 , pCO 2 , f CO 2 ) in the (re-)calculation of recommended f CO 2 (Sect. 3.3). Examples include a figure comparing the f CO 2 versus sea surface temperature of a particular cruise to that for other cruises in the region. A second plot compares the monthly average and spread of the data in a box plot.

Suspended cruises and conflicting cruise flags
During the primary and secondary quality control, cruises were suspended from SOCAT (cruise flag "S" in Table 6), as minor and major flaws in the CO 2 data or in the data necessary for the (re-)calculation of f CO 2 became apparent. Data contributors were informed of these suspensions and were invited to resubmit their data upon making relevant corrections to the original data. In many cases data were resubmitted to SOCAT.
Most cruises cross multiple regions, e.g., the coastal region and the North Atlantic Ocean. In SOCAT QC, a cruise needed to receive a cruise flag for each region that it crosses. A final check in the quality control consisted of checking conflicting cruise flags (Bakker). Most "conflicting" cruise flags reflected the absence of quality control in one region. These conflicts were resolved by carrying out appropriate QC and entering the missing cruise flags. Few truly conflicting cruise flags were encountered and in all cases a satisfactory solution was found.
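The final check for conflicting cruise flags can be sketched as follows. The region names, the dictionary layout and the function name are hypothetical; the sketch only distinguishes regions where a flag is still missing from regions whose entered flags disagree.

```python
def find_flag_issues(regions_crossed, flags_by_region):
    """Report missing and disagreeing cruise flags for one cruise.

    regions_crossed : list of region names the cruise passes through
    flags_by_region : dict mapping region name -> cruise flag entered there
    """
    missing = sorted(r for r in regions_crossed if r not in flags_by_region)
    entered = sorted({flags_by_region[r] for r in regions_crossed if r in flags_by_region})
    return {
        "missing_regions": missing,       # QC not yet carried out in these regions
        "entered_flags": entered,
        "true_conflict": len(entered) > 1,  # different flags in different regions
    }


report = find_flag_issues(
    ["Coastal", "North Atlantic"],
    {"North Atlantic": "B"},
)
print(report)
```

In this example the apparent conflict is simply a missing flag for the coastal region, which corresponds to the most common case described above.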

SOCAT cruises, versions, and time stamps
SOCAT data are publicly available via the SOCAT website (www.socat.info) as individual cruise data files (SOCAT version 1.4) (Sect. 5.2) and as regional and global, concatenated files (SOCAT version 1.5) (Sect. 5.3). SOCAT versions 1.4 and 1.5 include all cruises with a cruise flag A, B, C or D. A table of these cruises is available at doi:10.1594/PANGAEA.769638 and provides information about the investigator, research vessel, Expocode, original cruise naming, metadata (as reported by the investigator), and temporal and geographical coverage. Through PANGAEA, SOCAT is fed into the ICSU World Data System (WDS). The Global Earth Observation System of Systems (GEOSS), which is being built by the Group on Earth Observations (GEO), makes SOCAT available to other research communities.
The individual cruise data files (version 1.4) record observation time stamps at a resolution of integer minutes, rounding off the seconds, when they were available. Some cruises have multiple recommended f CO 2 values for a given time stamp (around 5 % of the observations). Individual cruise data files (Sect. 5.2) contain all recommended f CO 2 data, including multiple values per minute. However, handling multiple entries for the same time stamp can be problematic for some software programs. The SOCAT global group decided to average multiple entries within a given minute for the regional and global synthesis files (Sect. 5.3) as a pragmatic solution to this issue. Table 5 lists the contents of the SOCAT files in versions 1.4 and 1.5. Matlab code by Pierrot and Landschützer for reading these files is available via the SOCAT website or directly at CDIAC (http://cdiac.ornl.gov/ftp/oceans/SOCATv1.5/).
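The averaging of multiple entries per minute can be sketched as below. The record layout (cruise identifier, time stamp, f CO 2), the placeholder cruise identifier and the function name are assumptions for illustration; the sketch truncates seconds to obtain the minute, which may differ from the rounding applied in the SOCAT files.

```python
from collections import defaultdict
from datetime import datetime


def average_per_minute(records):
    """Average fCO2 values that share a cruise and a one-minute time stamp.

    records : iterable of (cruise_id, datetime, fco2) tuples
    returns : sorted list of (cruise_id, minute, mean_fco2) tuples
    """
    groups = defaultdict(list)
    for cruise_id, stamp, fco2 in records:
        minute = stamp.replace(second=0, microsecond=0)  # drop the seconds
        groups[(cruise_id, minute)].append(fco2)
    return [
        (cruise_id, minute, sum(values) / len(values))
        for (cruise_id, minute), values in sorted(groups.items())
    ]


records = [
    ("CRUISE_X", datetime(2005, 6, 1, 12, 0, 10), 350.0),
    ("CRUISE_X", datetime(2005, 6, 1, 12, 0, 40), 352.0),
    ("CRUISE_X", datetime(2005, 6, 1, 12, 1, 5), 353.0),
]
averaged = average_per_minute(records)
for row in averaged:
    print(row)
```

After averaging, each cruise has at most one value per time stamp, which avoids the software problems mentioned above at the cost of some sub-minute detail.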

Individual cruise data files (version 1.4)
Individual cruise data files (version 1.4) with cruise flags A, B, C and D are available via PANGAEA (doi:10.1594/PANGAEA.767698). These cruise data files include all recommended f CO 2 data with WOCE flags 2 (good), 3 (questionable) and 4 (bad), without listing these WOCE flags. Cruise data files archived at PANGAEA have not been averaged to remove multiple entries per minute (Sect. 5.1).
The individual cruise data files provide access to the metadata, the original CO 2 parameter(s) (as reported by the investigator), which were used to (re-)calculate f CO 2 (Sect. 3.3), and the (re-)calculated and quality controlled f CO 2 data. The files contain these additional parameters: WOA salinity (Antonov et al., 2006), NCEP/NCAR sea level pressure (Kalnay et al., 1996) and ETOPO2 (2006) bathymetry. Each individual cruise data file has been assigned a digital object identifier (doi) for citation and transparency. Table 5 lists the parameters in the cruise data files.

SOCAT global and regional files (version 1.5)
Regional and global concatenated files (version 1.5) have been merged from the individual cruise data files for a subset of SOCAT parameters (Table 5). These concatenated files only contain recommended f CO 2 data with a WOCE flag 2 from cruises with a flag A, B, C or D. Table 5 lists the parameters in these regional and global synthesis files. Some changes have been applied relative to SOCAT version 1.4 (Sects. 5.1 and 5.2). Notably, multiple entries with the same time stamp were averaged for the global and regional synthesis files (Sect. 5.1).
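The selection rule for the concatenated files amounts to a simple filter. In the sketch below the field names `cruise_flag` and `woce_flag` are hypothetical and do not correspond to column names in the SOCAT files.

```python
ACCEPTED_CRUISE_FLAGS = {"A", "B", "C", "D"}
WOCE_GOOD = 2


def select_for_synthesis(rows):
    """Keep rows with a good WOCE flag from cruises with flag A, B, C or D."""
    return [
        row for row in rows
        if row["cruise_flag"] in ACCEPTED_CRUISE_FLAGS and row["woce_flag"] == WOCE_GOOD
    ]


rows = [
    {"cruise_flag": "B", "woce_flag": 2, "fco2": 351.2},
    {"cruise_flag": "B", "woce_flag": 3, "fco2": 612.0},  # questionable value
    {"cruise_flag": "S", "woce_flag": 2, "fco2": 349.8},  # suspended cruise
]
selected = select_for_synthesis(rows)
print(selected)
```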
Additional parameters have been added to the regional and global, concatenated files. These include Julian day (day of year), interpolated atmospheric xCO 2 extracted from GLOBALVIEW-CO2 (2008), WOA salinity, NCEP/NCAR sea level pressure and ETOPO2 bathymetry. The global and regional files specify which reported CO 2 variable was used for (re-)calculation of recommended f CO 2 (Sect. 3.3; Table 5). Every line of the concatenated files contains a doi-string, which provides a link to the individual cruise data file with the original CO 2 parameter(s) and metadata at PANGAEA (Sect. 5.2).
The regional and global concatenated files (version 1.5) are publicly available as "compressed zip" text files via CDIAC (http://cdiac.ornl.gov/ftp/oceans/SOCATv1.5/), in Ocean Data View (ODV) format (http://odv.awi.de/en/data/ocean/socat_v15_fco2_data/) and via the interactive Cruise Data Viewer (Sect. 5.4). NetCDF files (Eaton et al., 2011) will be made available in the future. The text files exist as one very large global file, and as subset files per region, with no overlap between the regions. The latter means that data of a given cruise may have been divided into several regional files (for example North Atlantic, Tropical Atlantic and Coastal region).

Cruise Data Viewer (version 1.5)
The LAS Cruise Data Viewer provides interactive access to SOCAT version 1.5 on a Live Access Server. It provides all of the output capabilities described in Sect. 4 as tools for the SOCAT QC-ers, except for the ability to enter QC flags and comments. The Cruise Data Viewer also supplies variables from other sources that provide scientific context useful to users of the f CO 2 data: atmospheric xCO 2 values interpolated from GLOBALVIEW-CO2 (2008), WOA 2005 salinity, NCEP/NCAR sea level pressure values (Kalnay et al., 1996), and bathymetry from ETOPO2 (2006).
The Cruise Data Viewer allows the inclusion of WOCE flag 3 (questionable) or 4 (bad) data when viewing or downloading data. When subsets are downloaded from the Cruise Data Viewer, each data line contains a doi-string that links directly to the relevant cruise data file with its original reported CO 2 parameters at PANGAEA. A "Table of Cruises" is available from the Cruise Data Viewer and lists the cruise flags, QC comments and SOCAT QC-er for each cruise. The Cruise Data Viewer can be accessed via the SOCAT website or directly at http://ferret.pmel.noaa.gov/SOCAT_cruise_viewer/.

Gridded products (version 1.5)
The gridded products provide values at a 1° latitude by 1° longitude resolution using monthly, annual, decadal and monthly climatological timescales, and at a 0.25° latitude by 0.25° longitude resolution with monthly time resolution for coastal analysis (Sabine et al., 2013). The recommended f CO 2 with a WOCE flag 2 were gridded by two algorithms: (1) averages giving equal weight to each observation in a cell, and (2) averages giving equal weight to each cruise that passed through a cell. Mean, extremes and standard deviations of f CO 2 are provided. Other statistical measures include the number of cruises per cell, the number of observations per cell and measures of the degree to which the f CO 2 averaged values may be biased from the cell centre. The SOCAT version 1.5 gridded products have not been corrected for any temporal increase in surface water f CO 2 . Gridded fields are available as NetCDF files from CDIAC (http://cdiac.ornl.gov/ftp/oceans/SOCATv1.5/SOCATv1.5_Gridded_Dat/) and via the interactive Gridded Data Viewer. For more details, refer to the accompanying paper by Sabine et al. (2013).
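The difference between the two gridding algorithms can be shown with a minimal sketch. It bins observations into cells by flooring latitude and longitude and ignores the time dimension; the record layout and function name are assumptions, and the actual gridding procedure is described by Sabine et al. (2013).

```python
import math
from collections import defaultdict


def grid_fco2(observations, cell_deg=1.0):
    """Grid fCO2 with equal weight per observation and per cruise.

    observations : iterable of (cruise_id, lat, lon, fco2) tuples
    returns      : dict mapping (lat_index, lon_index) -> statistics dict
    """
    per_cell = defaultdict(lambda: defaultdict(list))
    for cruise_id, lat, lon, fco2 in observations:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        per_cell[cell][cruise_id].append(fco2)

    gridded = {}
    for cell, cruises in per_cell.items():
        values = [v for vals in cruises.values() for v in vals]
        cruise_means = [sum(vals) / len(vals) for vals in cruises.values()]
        gridded[cell] = {
            "obs_weighted_mean": sum(values) / len(values),              # algorithm (1)
            "cruise_weighted_mean": sum(cruise_means) / len(cruise_means),  # algorithm (2)
            "n_obs": len(values),
            "n_cruises": len(cruises),
        }
    return gridded


observations = [
    ("cruise_1", 10.2, -30.5, 340.0),
    ("cruise_1", 10.4, -30.4, 340.0),
    ("cruise_1", 10.6, -30.3, 340.0),
    ("cruise_2", 10.8, -30.2, 380.0),
]
gridded = grid_fco2(observations)
print(gridded[(10, -31)])
```

In this made-up cell the densely sampled cruise dominates the observation-weighted mean (350 µatm), whereas the cruise-weighted mean (360 µatm) treats both cruises equally, which illustrates why both products are offered.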

Gridded Data Viewer (version 1.5)
The interactive LAS Gridded Data Viewer enables users to explore the gridded SOCAT fields. The viewer displays maps and time series for the specific region or period selected. Sequences of fields can be viewed as animations. Simple statistics, such as means, extremes, variance and counts, may be requested of the data. By requesting counts of the number of observations and cruises, a user is able to explore the global coverage of the SOCAT collection. Figure 4, obtained by this means, illustrates the north-south distribution of cruises in the years 2000 through 2007. The gridded viewer also supplies 1° latitude by 1° longitude marine surface variables from ICOADS (2008) that provide useful scientific context when exploring f CO 2 : surface air temperature, sea level pressure, sea surface temperature, and surface wind speed. The Gridded Data Viewer can be accessed at http://ferret.pmel.noaa.gov/SOCAT_gridded_viewer/ or via the link on the SOCAT website.

Lessons learned
SOCAT took four years to assemble and has been a large, international, collaborative effort of the marine carbon research community. SOCAT version 1.5 is the culmination of much hard work in data collection, data assembly and quality control by many seagoing marine carbon scientists around the world.
Lessons learned and improvements for future SOCAT releases have been discussed at the Surface Ocean CO 2 Data-to-Flux workshop (IOCCP, 2013). The lessons include a strong need for automating SOCAT with respect to data submission, metadata submission and quality control. The automation and other improvements will reduce the amount of work required for creating SOCAT data products and SOCAT quality control, while at the same time speeding up the whole process with the aim to provide regular updates.
The SOCAT global group, upon consultation with regional group leaders, has decided to start work on SOCAT version 2, while in parallel automating SOCAT for version 3. Data submission to SOCAT version 2 was closed on 31 December 2011. SOCAT version 2 products will report time in seconds as reported in the original data files to remove the need to calculate averaged data. Regular SOCAT releases are envisaged, e.g., every two years from SOCAT version 3 onwards. Such regular future SOCAT releases will require sustained funding for key players.
Colleagues are strongly encouraged to make public their surface water f CO 2 data and accompanying documentation from the global oceans and coastal seas, preferably via CDIAC (http://cdiac.ornl.gov/oceans/submit.html) for inclusion in future SOCAT releases. Data and metadata should be reported in the IOCCP (2004) recommended formats, which are also listed on the CDIAC website.

Automation of SOCAT procedures
Automation of the submission of data and metadata will include prompt feedback to the data originator on unrealistic data and property-property plots of the data, such that the data originator can carry out primary and initial secondary quality control. Such automation will facilitate harmonisation of the data for SOCAT and will strongly reduce the number of cruises suspended from SOCAT during secondary quality control.
In the future, new cruises will be added to the LAS at regular (e.g., two-monthly) intervals, enabling QC operators to carry out regular SOCAT QC. The Live Access Server will be modified to automatically generate typical property-property plots for secondary QC. The LAS will be enhanced with features to enter cruise flags and QC comments for multiple cruises (e.g., on the same vessel).

SOCAT products for assessing the ocean carbon sink
The release of SOCAT version 1.5 represents a milestone in ocean carbon research. Research using SOCAT will highlight the response of surface water f CO 2 and the oceanic CO 2 sink to increasing levels of atmospheric CO 2 in a changing climate. The SOCAT products can be used in studies of spatial and temporal (seasonal, interannual and decadal) variability and trends in surface water f CO 2 . The SOCAT products will enable validation of model distributions of surface water f CO 2 and air-sea CO 2 fluxes. SOCAT will aid process studies of oceanic f CO 2 variability, e.g., in the North Atlantic, in the Pacific Ocean, in coastal seas, in the Arctic Ocean, in seasonally ice-covered Southern Ocean regions, near remote islands and oceanographic fronts. The SOCAT products may be used to create monthly basin-wide f CO 2 maps for the most data-rich basins by a range of techniques such as neural networks, statistical techniques and algorithms (e.g., Lefèvre et al., 2005; Telszewski et al., 2009). These f CO 2 maps can be used for calculating basin-wide monthly air-sea CO 2 fluxes, which may constrain atmospheric inversions for global atmospheric carbon budgets. Study of length scales of f CO 2 variability will provide information on the minimum sampling coverage required for quantifying the oceanic CO 2 sink with sufficient accuracy (e.g., Lenton et al., 2009). It is expected that the regular SOCAT releases will become a crucial tool in quantification of changes in oceanic CO 2 uptake and in global climate research. Increasing the number of surface ocean CO 2 data has in the past significantly modified the estimate of the oceanic CO 2 sink (e.g., Takahashi et al., 2009b). SOCAT and its future development will contribute to further enhancing the reliability of such assessments.