A multi-decade record of high-quality f CO 2 data in version 3 of the Surface Ocean CO 2 Atlas (SOCAT)

The Surface Ocean CO2 Atlas (SOCAT) is a regularly updated synthesis of quality-controlled f CO2 (fugacity of carbon dioxide) values for the global surface oceans and coastal seas. Version 3 of SOCAT has 14.7 million f CO2 values from 3646 data sets covering the years 1957 to 2014. This latest version has an additional 4.6 million f CO2 values relative to version 2 and extends the record from 2011 to 2014. Version 3 also significantly increases the data availability for 2005 to 2013. SOCAT has an average of approximately 1.2 million surface water f CO2 values per year for the years 2006 to 2012. Quality and documentation of the data have improved. A new feature is the data set quality control (QC) flag of E for data from alternative sensors and platforms. The accuracy of surface water f CO2 has been defined for all data set QC flags. Automated range checking has been carried out for all data sets during their upload into SOCAT. The upgrade of the interactive Data Set Viewer (previously known as the Cruise Data Viewer) allows better interrogation of the SOCAT data collection and rapid creation of high-quality figures for scientific presentations. Automated data upload has been launched for version 4 and will enable more frequent SOCAT releases in the future. High-profile scientific applications of SOCAT include quantification of the ocean sink for atmospheric carbon dioxide and its long-term variation, detection of ocean acidification, as well as evaluation of coupled-climate and ocean-only biogeochemical models. Users of SOCAT data products are urged to acknowledge the contribution of data providers, as stated in the SOCAT Fair Data Use Statement. This ESSD (Earth System Science Data) “living data” publication documents the methods and data sets used for the assembly of this new version of the SOCAT data collection and compares these with those used for earlier versions of the data collection (Pfeil et al., 2013; Sabine et al., 2013; Bakker et al., 2014).
Individual data set files, included in the synthesis product, can be downloaded here: doi:10.1594/PANGAEA.849770. The gridded products are available here: doi:10.3334/CDIAC/OTG.SOCAT_V3_GRID.

Data coverage and parameter measured
Repository references:
Individual data set files and synthesis product: doi:10.1594/PANGAEA.849770
Gridded products: doi:10.3334/CDIAC/OTG.SOCAT_V3_GRID
Available at http://www.socat.info/
Coverage: 79° S to 90° N, 180° W to 180° E
Location name: Global Oceans and Coastal Seas
Date/time start: 21 October 1957
Date/time end: 4 October 2014


Introduction
The oceans represent a vast reservoir for carbon, mainly in the form of dissolved inorganic carbon (DIC), made up of the species bicarbonate, carbonate and dissolved carbon dioxide (CO 2 ). This carbon reservoir is in contact with the much smaller reservoir of CO 2 in the atmosphere via air-sea gas exchange.
Emissions of CO 2 by human activity, such as fossil fuel burning, cement manufacturing and changes in land use, are rapidly increasing the atmospheric concentration of this long-lived greenhouse gas. The oceans are taking up about 26 % of the global CO 2 emissions with ocean uptake estimated at 2.6 ± 0.5 Pg C yr −1 for the time period 2005 to 2014 (Le Quéré et al., 2015b). This ocean carbon sink slows down the rate of climate change caused by human activity. Ocean carbon uptake changes ocean carbonate chemistry, notably by reducing ocean pH and the carbonate ion concentration, a process known as ocean acidification and sometimes referred to as "the other CO 2 problem" (Turley, 2005;Henderson, 2006;Doney et al., 2009a). These changes in ocean chemistry are expected to affect key physiological processes of marine organisms, such as calcification, growth, development and survival (Kroeker et al., 2013). Ocean acidification is likely to have far-reaching impacts on marine organisms and marine biodiversity, with the effects expected to first be felt in the polar oceans (Orr et al., 2005).
The annual change in marine carbonate chemistry resulting from net ocean carbon uptake is small in comparison to its natural variation. A mean annual increase of 1.5 µatm has been estimated in surface ocean f CO 2 (fugacity of CO 2 ) for the period from 1970 to 2007, which is superimposed on large seasonal variation, here defined as the difference between winter and summer values, of, for example, 120 µatm in the seasonally ice-covered Southern Ocean and 160 µatm in Georgia Basin (E. M. . The annual increase also occurs against a background of large spatial variation of, for example, 140 µatm in different regions of the Southern Ocean in spring (Bakker et al., 2008; E. M. Jones et al., 2015). Similarly, seasonal variation of 0.04 in surface pH in the subtropical North Atlantic Ocean (González-Dávila et al., 2007) is 20 times the mean annual decrease in surface ocean pH at a rate of −0.002 yr −1 (Lauvset et al., 2015).
Seasonal and spatial variation in surface water f CO 2 and pH tend to be larger in coastal waters than in the open ocean, as a result of relatively strong tidal forces, large temperature changes, freshwater and other terrestrial inputs, and strong primary production in coastal waters (e.g. Simpson and Sharples, 2012). This is illustrated by an f CO 2 decrease of 250 µatm from winter to summer at a coastal site near Antarctica (Legge et al., 2015) and spatial variation of up to 200 µatm within the North Sea (Thomas et al., 2004; Omar et al., 2010). Arctic coastal and shelf seas equally have large spatial (> 500 µatm within the region in summer), seasonal (300 µatm) and year-to-year variation (100 µatm) in surface water f CO 2 (Fransson et al., 2006, 2009). Surface water f CO 2 may range from less than 200 to 800 µatm (or even 1200 µatm) over short time (days) and space scales (less than 10 nm) in the upwelling system of the US west coast (Hales et al., 2005, 2012; Harris et al., 2013, supplemental figure).

The annual changes in surface ocean f CO 2 and pH exhibit spatial and temporal variation. Basin-specific rates in the f CO 2 increase vary from 1.2 to 2.1 µatm yr −1 for the years 1970 to 2007, with higher rates of 2.3 to 3.3 µatm yr −1 at different mooring sites in the equatorial Pacific Ocean for the more recent period of 1997 to 2011 (Sutton et al., 2014a). The annual pH decreases at rates of −0.0013 yr −1 in the South Pacific Ocean (for 1998 to 2012) to −0.0026 yr −1 in the Irminger Sea (for 1982 to 2006), while annual pH changes vary from −0.0018 to −0.0026 yr −1 for moorings in the equatorial Pacific Ocean for 1997 to 2011 (Sutton et al., 2014a).
Here it is worth noting that such rates of change vary with the start date and period used for the calculation as a result of interannual to decadal variability (McKinley et al., 2011).
Modelling has long been a primary tool for quantification of the ocean carbon sink (e.g. Le Quéré et al., 2014) and ocean acidification (Orr et al., 2005). The availability of large surface ocean CO 2 data synthesis products, such as the Lamont Doherty Earth Observatory (LDEO) surface ocean pCO 2 (partial pressure of CO 2 ) database and the Surface Ocean CO 2 Atlas (SOCAT) (Pfeil et al., 2013; Sabine et al., 2013; Bakker et al., 2014; this study), now enables data-based estimates of the ocean carbon sink, as well as direct model-to-data comparison for surface ocean f CO 2 and ocean carbon sink estimates (Le Quéré et al., 2014, 2015a; Séférian et al., 2014; Turi et al., 2014). A challenge for data-based estimates of the ocean carbon sink is the gap-filling required for times and locations without surface ocean f CO 2 data. Different techniques and assumptions are applied for doing this; however, the resulting estimates of the ocean carbon sink differ considerably between the methods, especially in data-sparse regions, such as the South Pacific Ocean. Recent data-based studies highlight large year-to-year, decadal and longer-term variation in surface ocean f CO 2 with consequent variation in the global ocean CO 2 sink (Fay et al., 2014; Landschützer et al., 2014, 2015). Several model-to-data comparison studies suggest that models underestimate the spatial and temporal variation in surface ocean f CO 2 and the ocean carbon sink (Turi et al., 2014; Rödenbeck et al., 2015). Such results could only be achieved because of the huge progress that has been made in data collection efforts like SOCAT.
The Global Carbon Budget provides an annual estimate of the carbon sinks and sources for the atmosphere (Le Quéré et al., 2014, 2015a). The land carbon sink is determined as a residual of the other terms in the budget, namely the atmospheric and ocean components and land-use change. Thus, quantification of the ocean carbon sink is critical to resolving the Global Carbon Budget. Ocean carbon sink estimates based on the LDEO and SOCAT synthesis products have been included in recent versions of the Global Carbon Budget (Sect. 7.3) (Le Quéré et al., 2014, 2015a).

The above highlights the need for long-term sustained, accurate observations over the entire surface ocean and synthesis of the marine carbonate chemistry measurements for quantification of trends in the ocean carbon sink and ocean acidification. This has been eloquently expressed for in situ observations of the climate system by Carl Wunsch and colleagues (Wunsch et al., 2013):

No substitute exists for adequate observations. [...] Models will evolve and improve, but, without data, will be untestable, and observations not taken today are lost forever. [...] Today's climate models will likely prove of little interest in 100 years. But adequately sampled, carefully calibrated, quality controlled, and archived data for key elements of the climate system will be useful indefinitely.
In 2007, the international marine carbon community decided to create a quality-controlled, publicly available synthesis product of surface ocean CO 2 for the global oceans and coastal seas (IOCCP, 2007;Doney et al., 2009b). The Surface Ocean CO 2 Atlas provides regular updates of (1) a synthesis product of surface ocean f CO 2 measurements and (2) a gridded product of surface ocean f CO 2 values (without interpolation to grid cells with no measurements).
Both SOCAT data products cover the global oceans and coastal seas. Version 1 of SOCAT was made available in 2011 (Sabine et al., 2013), followed by the release of version 2 in 2013 and of version 3 in 2015 (this study). The Surface Ocean CO 2 Atlas (http://www.socat.info/) provides a key synthesis data set of surface ocean f CO 2 for global and regional scientific studies of the ocean carbon sink and ocean acidification. The SOCAT data collection only contains original surface water CO 2 data, as reported by the data originator, as input values. Thus, the SOCAT data collection does not contain CO 2 values processed by secondary data sources. The SOCAT data products only contain surface water f CO 2 values from xCO 2 (mole fraction), pCO 2 or f CO 2 measurements. SOCAT does not include surface water f CO 2 calculated from the other seawater carbonate system parameters, such as pH, dissolved inorganic carbon or total alkalinity. Almost all f CO 2 values in SOCAT have been collected on ships by determination of the CO 2 concentration in the headspace of an equilibrator with a continuous seawater flow (Bakker et al., 2014). Shipboard systems for equilibrators generally use gas chromatography or infrared detection to determine the CO 2 concentration in headspace air (Pierrot et al., 2009). SOCAT versions 2 and 3 also have data sets from fixed moorings and drifting buoys with measurements made by an equilibrator system with infrared detection or by a membrane spectrophotometer. The SOCAT data collection includes a small number of historical, discrete surface water f CO 2 measurements.

Earth Syst. Sci. Data, 8, 383-413, 2016, www.earth-syst-sci-data.net/8/383/2016/
Two large surface ocean CO 2 data synthesis products, the LDEO and SOCAT synthesis products, are now available (Sabine et al., 2013; Bakker et al., 2014; this study). While there is substantial overlap in the data sets they contain, the LDEO and SOCAT synthesis products are independent and differ in their data treatment and quality control. There is no intention to merge the LDEO and SOCAT synthesis products, which from a SOCAT perspective would not meet its aim of full documentation and coherence of data treatment and quality control. That said, the SOCAT data managers regularly check which data sets are in the LDEO data product, but are not (yet) included in SOCAT, and invite the data providers to submit their original data sets to SOCAT. In reverse, SOCAT expects data providers to make their original data sets public as part of the submission to SOCAT or upon publication of the SOCAT version of which these data sets are part (Sect. 6.1). The frequent SOCAT releases therefore increase the data availability in general, including for the LDEO data product. Overall, both data products reinforce each other. Furthermore, the existence of the two data products with slightly different time lines enables the use of independent data from the LDEO data set (i.e. data not (yet) included in SOCAT) in testing interpolation methods built using SOCAT and vice versa.
SOCAT version 3 was made public during the SOCAT and SOCOM (Surface Ocean pCO 2 Mapping Intercomparison) Event on 7 September 2015 (SOCAT and SOCOM, 2015). The event was part of the Surface Ocean Lower Atmosphere Study (SOLAS) Open Science Conference in Kiel, Germany. This manuscript documents SOCAT version 3, while highlighting the key differences with respect to version 2 (Sect. 2). The SOCAT Fair Data Use Statement is presented in Sect. 3. This is followed by a description of data upload, quality control (Sect. 4) and the data products available for version 3 (Sect. 5). We also look forward towards ongoing developments affecting future SOCAT versions, notably automated data upload, inclusion of additional parameters and annual releases (Sect. 6). The article ends with an assessment of the impact and scientific applications of SOCAT to date (Sect. 7) and concluding remarks (Sect. 8). This publication will be updated regularly using the format of the ESSD (Earth System Science Data) "living data" to document the SOCAT versions and significant changes in the data collection, data upload, quality control and data products. This is the first version of the SOCAT "living data" and is closely associated with earlier ESSD publications describing SOCAT versions 1 (Sabine et al., 2013) and 2.

Characteristics of SOCAT version 3 and key differences to version 2
Version 3 of the Surface Ocean CO 2 Atlas includes 14.7 million surface water f CO 2 values over the time period 1957 to 2014 for the oceans and coastal seas around the world (Figs. 1 and 2; Table 1). The f CO 2 values are from 3646 data sets, collected on ships (3504 cruises), moorings (123) and drifters (19). The 3646 data sets include 3640 data sets with a WOCE (World Ocean Circulation Experiment) flag of 2 (good), available in all data products, as well as six data sets with a WOCE flag of 3 (questionable), only available in some data products, if selected. Version 3 is an update of version 2 with an additional 4.6 million f CO 2 values from 986 data sets. Version 3 takes the start of the data record backwards from 1968 to 1957 by adding four historic cruises. It also extends the data collection forward by adding 1.

New in version 3 is an accuracy criterion for all surface ocean f CO 2 values, described by data set quality control (QC) flags of A to E, for accuracies of 2 (A, B), 5 (C, D) and 10 µatm (E) (Wanninkhof et al., 2013b). Flag A now also requires a high-quality cross-over with another data set. The introduction of a lower-accuracy, data set quality control flag of E (accuracy of f CO 2 values better than 10 µatm) enables the inclusion of calibrated f CO 2 measurements made by alternative sensors and on alternative platforms (Wanninkhof et al., 2013b). Version 3 has significantly more data sets from fixed moorings (123 data sets) and drifting buoys (19) than version 2 (28 and 3 data sets, respectively). These measurements were made by an equilibrator system with infrared detection (e.g. Johengen, 2010; Sutton et al., 2014b) or a membrane spectrophotometer (e.g. Boutin and Merlivat, 2009; Merlivat et al., 2015).

Table 1 (recovered rows only; for each item the version 2 entry is followed by the version 3 entry).

(Item label not recovered)
Artificial seconds were added for concurrent entries. A WOCE flag of 4 was given to duplicate times in data sets with fewer than 50 equal time stamps (Table 7).

Upload Dashboard
Version 2: Not available.
Version 3: Single platform for data upload, f CO 2 rec calculation and automated data checks.

Data upload
Version 2: Bulk data upload on quality control system.
Version 3: All data sets in versions 1, 2 and 3 were uploaded on the Upload Dashboard.

Calculation of f CO 2 rec
Version 2: In Matlab, prior to bulk data upload.
Version 3: On the Upload Dashboard with Ferret scripts for all data in versions 1, 2 and 3.

Automated data checks
Version 2: Not available.
Version 3: Automated checks after calculation of f CO 2 rec for all new and updated data sets. WOCE flags of 4 were assigned in specific cases (Table 7).

Quality Control Editor
Version 2: As in version 1.
Version 3: After automated checks. Upgraded search options and graphical interface. Data set QC flag needs to match QC criteria (tick boxes).

Data set QC flags in data products
Version 2: Flags of A-D.
Version 3: Flags of A-E. Revised data set QC criteria (Table 2) applied to all new and updated data sets.

Flag A
Version 2: Needs a cross-over (an acceptable comparison with other data).
Version 3: Needs a high-quality cross-over.

Flags A, B
Accuracy equilibrator pressure ≤ 0.5 hPa. Six other SOP criteria apply.

Flag C
Version 2: Did not follow approved methods or SOP criteria.
Version 3: Did or did not follow approved methods or SOP criteria.

Flags C, D
Version 2: Accuracy f CO 2 rec not specified.
Version 3: Accuracy f CO 2 rec ≤ 5 µatm.

Flag E
Version 2: Not available.
Version 3: Accuracy f CO 2 rec ≤ 10 µatm, mainly for alternative sensors and platforms with in situ calibration and full documentation.

WOCE flags for f CO 2 rec
Version 2: Flag of 2 (good) as a default. Manual entry of flags of 3 (questionable) and 4 (bad).
Version 3: Flag of 2 as a default. Flags of 4 given during automated data checks (Table 7). Quality control comment added during manual entry of flags of 3 and 4.

Synthesis products
Version 2: Data sets with flags of A-D and f CO 2 rec with a WOCE flag of 2 in synthesis and gridded files and as default elsewhere.
Version 3: Data sets with flags of A-E made public (Table 8). Data sets with flags of A-D and f CO 2 rec with a WOCE flag of 2 in synthesis and gridded files. Data sets with a flag of E and f CO 2 rec with a flag of 2 in a separate synthesis file. Contents of files downloadable from the Data Set Viewer have been streamlined (Table 9).

Gridded products
Version 2: Missing grid cells in cruise-weighted gridded products (versions 1 and 2). A gridded product of means per climatological month is available.
Version 3: Correction of data-set-weighted gridded products (version 3).
Overall, the quality of the data is comparable to that of version 2, with a small improvement in the documentation of the individual data sets. In version 3, 14 % of the data sets (509 data sets) received a quality control flag of A, 35 % (1260 data sets) a flag of B, 23 % (840) a flag of C and 27 % (990) a flag of D. This compares to 17 % (454 data sets), 31 % (834), 18 % (491) and 33 % (881), respectively, in version 2. The percentage of data sets receiving a flag of A or B is remarkably similar between both versions (49 % in version 3, 48 % in version 2). The small reduction in the percentage of data sets with a flag of D (27 % in version 3, 33 % in version 2), which implies incomplete metadata, highlights an improvement in the documentation accompanying individual data sets. A total of 41 data sets (1 %) received a flag of E; most of these are sensor data, but they also include a small number of valuable historic data sets with an accuracy deemed better than 10 µatm.
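The flag statistics quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary and function names are hypothetical, the counts are the ones quoted in the text, and the accuracies are those stated for flags A to E earlier in this section.

```python
# Accuracy of f CO2 rec implied by each data set QC flag (µatm), as stated in the text.
ACCURACY_UATM = {"A": 2, "B": 2, "C": 5, "D": 5, "E": 10}

# Number of data sets per flag, as quoted in the text.
V3_COUNTS = {"A": 509, "B": 1260, "C": 840, "D": 990, "E": 41}
V2_COUNTS = {"A": 454, "B": 834, "C": 491, "D": 881}


def flag_percentages(counts):
    """Return the share of data sets per flag, rounded to whole percent."""
    total = sum(counts.values())
    return {flag: round(100 * n / total) for flag, n in counts.items()}


def share_a_or_b(counts):
    """Return the rounded percentage of data sets flagged A or B."""
    total = sum(counts.values())
    return round(100 * (counts["A"] + counts["B"]) / total)


if __name__ == "__main__":
    print("version 3:", flag_percentages(V3_COUNTS), "A+B:", share_a_or_b(V3_COUNTS))
    print("version 2:", flag_percentages(V2_COUNTS), "A+B:", share_a_or_b(V2_COUNTS))
```

Rounding each share independently reproduces the percentages given in the paragraph, including the 49 % and 48 % of data sets with a flag of A or B.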
Version 3 represents a major step towards the automation of the SOCAT data and metadata upload and quality control in future versions. A new interface, the SOCAT Upload Dashboard, hosts data and metadata upload, (re)calculation of f CO 2 , automated data checks, data visualisation and submission to the quality control system in a single application (Table 1). A prototype of the SOCAT Upload Dashboard was used for data upload for version 3 (Sect. 4.1) and (re)calculation of f CO 2 (Sect. 4.2). All data sets were run across a newly developed, automated data checker for identification of values that were out of range (Sect. 4.3). As a result, issues identified during data upload were already corrected prior to entry on the quality control system. The search capabilities and graphical interface of the quality control system and the associated Data Set Viewer (previously known as the Cruise Data Viewer) were upgraded (Sects. 4.4 and 5.4). Version 4 will see enhanced implementation of SOCAT automation by enabling data providers to upload their data using the SOCAT Upload Dashboard and submission onto the SOCAT QC Editor (Sect. 6.1).
The publicly accessible, user-friendly and interactive Data Set Viewer now allows selection of f CO 2 values by data set, year, month, region, data provider, vessel or platform name, country of the vessel's or platform's flag, data set quality control flag, WOCE flag and SOCAT version, as well as setting of limits on data ranges. The graphical tools of the Data Set Viewer (access via http://www.socat.info/) for SOCAT version 3 have been extended (Figs. 1, 3 and 4). Users can now set fixed colour scales and create high-quality, publishable images.
A small error was detected in the gridded data products of SOCAT versions 1 and 2 (Sect. 5.5). In short, the data-set-weighted f CO 2 values (formerly known as cruise-weighted f CO 2 values) in these products were found to have missing values for a small number of grid cells, as a result of an inconsistency between the algorithms used for computing the weighted and unweighted gridded products. The inconsistency affected both time and position. This error was corrected in the gridded data products for version 3. Note that this error remains present in the gridded products for versions 1 and 2.

Table 2. Data set quality control (QC) flags in version 3 (Wanninkhof et al., 2013b). All criteria need to be met for assigning a flag of A to E. Data sets with flags of A to E have been made public. Data sets with a flag of A to D are included in the global synthesis and gridded products (Table 8). Changes relative to versions 1 and 2 are in bold. Flag (ID) refers to the data set quality control flag with its numerical identifier (ID) provided between brackets. Calculation of "recommended f CO 2 " (f CO 2 rec) is explained in Sect. 4.2.
(2) A high-quality cross-over (footnotes 1 and 2) with another data set is available.
(5) Data set QC was deemed acceptable.
(4) Data set QC was deemed acceptable.
(2) Did or did not follow approved methods/SOP criteria.
(4) Data set QC was deemed acceptable.
(2) Did or did not follow approved methods/SOP criteria.
(4) Data set QC was deemed acceptable.
(4) Data set QC was deemed acceptable.
S (15) (Suspend) (1) More information is needed for data set before flag can be assigned.
(3) Data are being updated (part or the entire data set).
X (15) (Exclude) The data set duplicates another data set in SOCAT.
N (No flag) No data set flag has yet been given to this data set.
U (Update) The data set has been updated. No data set flag has yet been given to the revised data.

1 A cross-over between two data sets is defined as an equivalent distance of less than 80 km. This criterion combines distance and time as $[x^2 + (t \times 30)^2]^{0.5} \le 80$ with distance x in kilometres and time t in days. One day of separation in time is equivalent (heuristically) to 30 km of separation in space.
2 A high-quality cross-over is defined as a cross-over between two data sets with a maximum cross-over equivalent distance of 80 km, a maximum difference in sea surface temperature of 0.3 °C and a maximum f CO 2 rec difference of 5 µatm. Inconclusive cross-overs with the temperature or f CO 2 rec difference between the data sets exceeding 0.3 °C or 5 µatm, respectively, do not receive a flag of A. High-quality cross-overs are rare in coastal waters, near sea ice and in regions of freshwater influence, as a result of high spatial variation, not for lack of measurement quality (Sect. 4.4).
3 Seven approved methods or SOP (standard operating procedure) criteria need to be fulfilled for a data set quality control flag of A and B (Sect. 4.4) (after Pfeil et al., 2013). In version 3, the accuracy requirement for equilibrator pressure has been relaxed to 2.0 hPa from 0.5 hPa in earlier SOCAT versions. The six other criteria are the same in SOCAT versions 1, 2 and 3.
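The cross-over criteria in footnotes 1 and 2 of Table 2 can be written as a short check. This is a minimal sketch rather than the SOCAT implementation: the function names are hypothetical, and the time separation is taken in days, as implied by the statement that one day of separation is equivalent to 30 km.

```python
import math

KM_PER_DAY = 30.0        # heuristic space-time equivalence from footnote 1
MAX_EQUIV_DIST_KM = 80.0  # cross-over limit from footnote 1
MAX_SST_DIFF_C = 0.3      # high-quality limit from footnote 2
MAX_FCO2_DIFF_UATM = 5.0  # high-quality limit from footnote 2


def equivalent_distance_km(dx_km, dt_days):
    """Combine separation in space (km) and time (days) into one distance."""
    return math.hypot(dx_km, dt_days * KM_PER_DAY)


def is_crossover(dx_km, dt_days):
    """True if two measurements are close enough to count as a cross-over."""
    return equivalent_distance_km(dx_km, dt_days) <= MAX_EQUIV_DIST_KM


def is_high_quality_crossover(dx_km, dt_days, sst_diff_c, fco2_diff_uatm):
    """True if a cross-over also meets the temperature and f CO2 rec limits."""
    return (
        is_crossover(dx_km, dt_days)
        and abs(sst_diff_c) <= MAX_SST_DIFF_C
        and abs(fco2_diff_uatm) <= MAX_FCO2_DIFF_UATM
    )


if __name__ == "__main__":
    # Two days apart at the same position: 60 km equivalent distance.
    print(equivalent_distance_km(0.0, 2.0), is_crossover(0.0, 2.0))
    # 60 km and two days apart: about 84.9 km, so not a cross-over.
    print(equivalent_distance_km(60.0, 2.0), is_crossover(60.0, 2.0))
```

With these definitions, measurements taken more than about 2.7 days apart can never form a cross-over, whatever their spatial separation.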
In summary, SOCAT version 3 is a significant update of version 2. It provides a 58-year record (1957-2014) of 14.7 million surface ocean f CO 2 values for the global oceans and coastal seas. It has higher-quality data with better documentation than version 2. Addition of a flag of E has enabled inclusion of calibrated f CO 2 values from alternative sensors and platforms. All surface ocean f CO 2 values now have an accuracy estimate, embedded in the data set QC flag. Automated quality control checks during version 3 data upload have identified outliers. The graphical interface of the Data Set Viewer has been vastly improved. These characteristics of version 3 are described in more detail in Sects. 4 to 6.

Fair Data Use Statement for SOCAT version 3
The Surface Ocean CO 2 Atlas provides access to a vast amount of surface ocean CO 2 data from the global oceans and coastal seas, painstakingly collected by marine carbon scientists around the world over 58 years. These data sets represent an important scientific output by these scientists. Individual researchers and the marine carbon community make these data public in SOCAT, such that they are available for scientific research and for informing policy (Sects. 7 and 8). Nonetheless, it is important that the data providers receive credit for the data that they collected. This will provide data providers with vital evidence of how their data are being used, enabling successful funding applications for future data collection. Furthermore, the assembly, quality control and archiving of SOCAT data products involve many data managers and scientists (Tables 3 and 4). Planning meetings and community events have proved effective in informing SOCAT contributors and users, in discussing SOCAT progress and in setting SOCAT strategy (Table 5).
The SOCAT Fair Data Use Statement therefore contains an urgent request to generously acknowledge the contribution by SOCAT data contributors and investigators. Ideally users will invite large data providers to contribute to regional studies and, if they do, to co-author relevant papers. Citation of relevant scientific articles by data providers is good scientific practice. The following Fair Data Use Statement applies to SOCAT data products (SOCAT, 2016):

The synthesis and gridded SOCAT products are a result of scientific effort by data providers, data managers and quality controllers. It is important that users of the SOCAT products fairly acknowledge this effort. This will help generate funding for continuation of observational products and promote further sharing of data. We expect the following from users of SOCAT data products:
1. To generously acknowledge the contribution of SOCAT data providers and investigators in the form of invitation to co-authorship, reference to relevant scientific articles by data providers or by naming the data providers in the acknowledgements. Specifically, in regional studies, users should invite large data providers, who frequently possess valuable expert knowledge on data and region, to collaborate at an early stage, which may lead to an invitation of co-authorship. We recognise that co-authorship is only justified in the case of a significant scientific contribution to a publication and that provision of data on its own does not warrant co-authorship.
2. To cite SOCAT and its data products as follows:

The revision follows concerns raised by SOCAT data providers and discussions among SOCAT scientists at two recent community events (SOCAT, 2014a; SOCAT and SOCOM, 2015).

Data retrieval and data upload on the SOCAT Upload Dashboard
In version 3, new and updated data sets were obtained from the Carbon Dioxide Information Analysis Centre (CDIAC), PANGAEA ® and public websites. In addition, many data sets were directly submitted to SOCAT. As well as 887 new data sets, version 3 also contains 1258 updated versions of data sets previously submitted to versions 1 and 2, with revised metadata or data. Some of these were updates of data sets previously suspended from SOCAT (e.g. Table 10 in Bakker et al., 2014). As in previous versions, all new and updated data sets were put in a uniform format. Similar to version 2, an expocode was assigned to all data sets, including moorings and drifters. In general, an expocode consists of 12 characters, describing the country, the vessel or platform, and the data set start day (Swift, 2008). The expocode 320620090306, for example, indicates a data set collected on the US (32) ship R/V Nathaniel B. Palmer (06) with the first day of the cruise on 6 March 2009. There are a few exceptions to this. If two American mooring data sets (which always start with 3164) have the same start date, they will end with "−1" and "−2", corresponding to an expocode of 14 characters.
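The expocode layout described above can be illustrated with a small parser. This is a sketch under stated assumptions, not SOCAT code: the function name and the returned dictionary keys are hypothetical, vessel codes are assumed to be two digits or upper-case letters, and the suffix is accepted with either a hyphen or a minus sign.

```python
import re
from datetime import datetime

# Two characters for country, two for vessel or platform, eight digits for the
# start day (YYYYMMDD), and an optional "-1" / "-2" style suffix.
_EXPOCODE = re.compile(r"([0-9A-Z]{2})([0-9A-Z]{2})(\d{8})(?:[-−](\d+))?")


def parse_expocode(expocode):
    """Split an expocode into country, vessel, start date and optional suffix."""
    match = _EXPOCODE.fullmatch(expocode)
    if match is None:
        raise ValueError(f"not a recognised expocode: {expocode!r}")
    country, vessel, day, suffix = match.groups()
    return {
        "country": country,
        "vessel": vessel,
        # strptime raises ValueError for impossible dates such as 20090230.
        "start_date": datetime.strptime(day, "%Y%m%d").date(),
        "suffix": int(suffix) if suffix is not None else None,
    }


if __name__ == "__main__":
    # Example from the text: US (32), R/V Nathaniel B. Palmer (06), 6 March 2009.
    print(parse_expocode("320620090306"))
    # Hypothetical mooring expocode with a suffix (14 characters).
    print(parse_expocode("316420100101-1"))
```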
In version 3, the SOCAT data managers used the new SOCAT Upload Dashboard for upload of data and metadata (Table 1). All data sets previously included in versions 1 and 2 were also uploaded, automatically screened for obvious outliers and added to version 3 via the SOCAT Upload Dashboard (Table 1). This new capability is an important step in the ongoing SOCAT automation effort and integrates data and metadata upload, (re)calculation of f CO 2 , automated data checks, data visualisation and data submission in a single application which is tightly coupled to the SOCAT QC Editor. Once fully operational in version 4, the Upload Dashboard will allow data providers to upload, verify and submit their data for SOCAT quality control.
Not all data sets had time stamps which included seconds. In such cases, multiple occurrences of a time stamp were often present. Artificial seconds were added to data sets with 50 or more duplicate time stamps. For these data sets, evenly distributed artificial seconds were added for each equal time stamp. However, if there were fewer than 50 duplicate times in a data set, a WOCE flag of 4 was generated for the f CO 2 rec values (or "recommended" f CO 2 values; see Sect. 4.2) with duplicate time stamps during the automated data checks (Sect. 4.3). Adding artificial seconds is time-consuming and there was insufficient time available for adding artificial seconds to all duplicate times in all data sets.
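The treatment of duplicate time stamps described above can be sketched as follows. The sketch makes several assumptions that are not stated in the text: time stamps have one-minute resolution, the number of duplicate time stamps is counted as the number of records that share a time stamp with at least one other record, and the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

GOOD, BAD = 2, 4  # WOCE flags used in the text


def handle_duplicate_times(times, threshold=50):
    """Return (adjusted_times, woce_flags) for a list of minute-resolution times.

    With `threshold` or more duplicated records, artificial seconds are spread
    evenly over each group of equal time stamps. Otherwise the times are left
    unchanged and the duplicated records receive a WOCE flag of 4.
    """
    counts = Counter(times)
    n_duplicates = sum(n for n in counts.values() if n > 1)
    adjusted = list(times)
    flags = [GOOD] * len(times)

    if n_duplicates >= threshold:
        seen = Counter()
        for i, t in enumerate(times):
            n = counts[t]
            if n > 1:
                # k-th record in a group of n gets k * 60 / n seconds.
                adjusted[i] = t + timedelta(seconds=60.0 * seen[t] / n)
                seen[t] += 1
    else:
        for i, t in enumerate(times):
            if counts[t] > 1:
                flags[i] = BAD
    return adjusted, flags


if __name__ == "__main__":
    t0 = datetime(2009, 3, 6, 12, 0)
    sample = [t0, t0, t0, t0 + timedelta(minutes=1)]
    print(handle_duplicate_times(sample))               # duplicates flagged 4
    print(handle_duplicate_times(sample, threshold=3))  # seconds 0, 20 and 40 added
```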

(Re)calculation of f CO 2
Data providers reported CO 2 values as xCO 2 , pCO 2 and/or f CO 2 , at the equilibration temperature (Tequ) and/or the sea surface temperature (SST or intake temperature). In order to ensure a coherent SOCAT synthesis product, surface water f CO 2 values at sea surface temperature were recalculated from the reported CO 2 values using a strict calculation protocol with the following procedure (quoting Pfeil et al., 2013):
1. when possible, (re)calculate f CO 2 ;
2. the preferred starting point for the calculations is xCO 2 , then pCO 2 , and finally f CO 2 ;
3. minimise the use of external data required to complete the calculations.
In total, 14 algorithms were used for (re)calculating these "recommended" f CO 2 (f CO 2 rec) values from the xCO 2 , pCO 2 and/or f CO 2 values reported by the data providers (Table 6). The particular algorithm used for a given data set is included in the data products (Sect. 5). Equations recommended by Dickson et al. (2007) were applied for the conversion of the dry CO 2 mole fraction to pCO 2 , for the calculation of the water vapour pressure and for the correction of pCO 2 to f CO 2 . The temperature correction suggested by Takahashi et al. (1993) was used to correct for temperature change between the seawater intake and the equilibrator. Atmospheric pressure from reanalysis and climatological values of salinity were used in the calculation if in situ values had not been reported (Table 6). The 2014 version of the atmospheric pressure data product was used (NCEP, 2014), which is an update of the 2012 data product used in the previous SOCAT version (NCEP, 2012). Sea surface salinity was from the World Ocean Atlas (WOA) 2005 (Antonov et al., 2006). Full details on the external pressure and salinity products are in the footnotes of Table 9. Note that the use of external atmospheric pressure data would rule out data set quality control flags of A and B during subsequent quality control, while use of external salinity values would not affect the data set quality control flag (Sect. 4.4).

Table 6. Algorithms and surface water CO 2 parameters used in the calculation of recommended f CO 2 (f CO 2 rec) at sea surface temperature in version 3 (after Pfeil et al., 2013). Algorithm 1 was the preferred method, followed by algorithm 2 and so forth. The algorithm used for each data set is stated in the output files (Table 9). In the case of incomplete reporting, NCEP (National Centers for Environmental Prediction) atmospheric pressure (Kalnay et al., 1996; NCEP, 2014) and WOA (World Ocean Atlas) 2005 salinity (Antonov et al., 2006) were applied.
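The chain of conversions described above (dry xCO 2 to pCO 2 , pCO 2 to f CO 2 , then correction to sea surface temperature) can be sketched as below for the case where xCO 2 at equilibrator temperature is the starting point. This is an illustration, not the SOCAT Ferret code: the function names are hypothetical, pressures are assumed to be in atmospheres (1 atm = 1013.25 hPa), and the coefficients are the commonly used forms quoted from memory, so they should be checked against Dickson et al. (2007) and Takahashi et al. (1993) before any real use.

```python
import math

R_GAS = 82.0578  # gas constant in cm^3 atm mol^-1 K^-1 (assumed units)


def water_vapour_pressure_atm(t_kelvin, salinity):
    """Saturation water vapour pressure over seawater (atm)."""
    return math.exp(
        24.4543
        - 67.4509 * (100.0 / t_kelvin)
        - 4.8489 * math.log(t_kelvin / 100.0)
        - 0.000544 * salinity
    )


def pco2_from_xco2(xco2_dry_ppm, p_equ_atm, t_equ_kelvin, salinity):
    """pCO2 (µatm) in water-saturated air from the dry mole fraction (ppm)."""
    return xco2_dry_ppm * (p_equ_atm - water_vapour_pressure_atm(t_equ_kelvin, salinity))


def fco2_from_pco2(pco2_uatm, xco2_dry_ppm, p_equ_atm, t_equ_kelvin):
    """fCO2 (µatm) from pCO2 using virial coefficients for CO2 in air."""
    t = t_equ_kelvin
    b = -1636.75 + 12.0408 * t - 3.27957e-2 * t**2 + 3.16528e-5 * t**3
    delta = 57.7 - 0.118 * t
    x = xco2_dry_ppm * 1e-6
    return pco2_uatm * math.exp((b + 2.0 * (1.0 - x) ** 2 * delta) * p_equ_atm / (R_GAS * t))


def fco2_at_sst(fco2_equ_uatm, sst_c, t_equ_c):
    """Correct fCO2 from equilibrator temperature to sea surface temperature."""
    return fco2_equ_uatm * math.exp(0.0423 * (sst_c - t_equ_c))


if __name__ == "__main__":
    # Illustrative inputs: 380 ppm, 1 atm, equilibrator at 20.0 °C, SST 19.5 °C, S = 35.
    p = pco2_from_xco2(380.0, 1.0, 293.15, 35.0)
    f = fco2_from_pco2(p, 380.0, 1.0, 293.15)
    print(p, f, fco2_at_sst(f, 19.5, 20.0))
```

For these illustrative inputs the fugacity correction lowers the value by roughly 0.3 %, and half a degree of warming in the equilibrator is undone by a correction of about 2 %.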
An important change relative to earlier versions is that the (re)calculation in version 3 took place using Ferret scripts on the new SOCAT Upload Dashboard after data upload (Sect. 4.1), rather than in Matlab before the bulk upload of the data ( Table 1). The implementation of the Ferret scripts enables full integration of SOCAT data submission, (re)calculation of f CO 2 and quality control on a single software platform. This streamlines and simplifies the SOCAT data flow. The Matlab code used for the (re)calculation in versions 1 and 2 was transferred to Ferret scripts on the Upload Dashboard for version 3. The new Ferret scripts were checked by comparing f CO 2 rec values in version 2 calculated using Matlab and new values calculated using Ferret. Almost all new values were within 0.01 µatm of the value calculated in Matlab, if not identical to it. Significant changes (smaller than 5 µatm) for less than 200 data points were attributed to changes in atmospheric pressure from reanalysis (Table 1).

Automated data checks
A newly developed, automated data checker performed checks on parameters directly influencing the position, time or calculation of f CO 2 rec values (Tables 1 and 7). A WOCE flag of 4 (meaning a bad data point) was assigned to all f CO 2 rec values with an incorrect position or time stamp or otherwise identified as inaccurate. These automated checks were carried out on all data in version 3 after (re)calculation of f CO 2 rec and before submission to the quality control system.
Unintentionally, WOCE flags of 4 were also assigned for values which were out of range in parameters which do not directly affect f CO 2 rec values, such as wind speed and ship direction (Table 7). This resulted in a WOCE flag of 4 being given to some good-quality f CO 2 rec values in newly added and updated data sets in version 3. The criteria for the automated checks will be reconsidered for version 4.
Automated data checks were also performed for data sets previously included in versions 1 and 2 (and not updated in version 3). For these data sets all WOCE flags of 4 assigned by the automated data checker, other than for duplicate time stamps, were removed to preserve the data sets as reported for version 2.

Secondary quality control
Secondary quality control is a key part of the creation of a high-quality data synthesis product. During secondary quality control, scientists, also known as quality controllers, assess the quality of each new and updated data set by following a checklist of specific criteria, while also examining the documentation of the data, known as metadata, for completeness. The quality controllers assign a data set quality control flag to each data set, based on their findings ( Table 2).
The SOCAT quality control system has been upgraded (Table 1), as part of the ongoing SOCAT automation. In particular, the ease of use, search options and visualisation tools have been improved. Other modifications are that the quality control criteria used for setting the data set quality control flag now must be specified (by a tick box system) and that a comment needs to be entered when assigning a WOCE flag (Table 1). Text relating to the tick boxes and the comments accompanying WOCE flags are incorporated into the quality control comments.
Earth Syst. Sci. Data, 8, 383-413, 2016 www.earth-syst-sci-data.net/8/383/2016/
The definitions of the data set quality control flags in version 3 have been revised relative to versions 1 and 2 (Tables 1 and 2) (Wanninkhof et al., 2013b). These revised QC criteria were applied to all new and updated data sets in version 3, but not retrospectively to data sets included in earlier versions, unless data providers had updated these. Version 3 has data set quality control flags of A to E and WOCE flags of 2, 3 and 4 for individual f CO 2 rec values. For a data set to obtain a data set quality control flag, it needs to meet all the criteria of that specific data set flag (Table 2).
All data set flags now have an accuracy requirement for the f CO 2 rec values. Previously, flags of C and D did not have an accuracy requirement. In version 3, the requirements are an accuracy of better than 2 µatm for flags of A and B, of better than 5 µatm for flags of C and D, and of better than 10 µatm for a flag of E (Table 2). The accuracy requirement takes precedence over the criteria that follow (Wanninkhof et al., 2013b), implying that, if the accuracy requirement is not met, a data set is given a data set flag with a lower accuracy requirement, appropriate to the accuracy of the data set. Seven approved methods or SOP (standard operating procedure) criteria need to be met for a data set quality control flag of A and B (after Pfeil et al., 2013):
1. The data are based on xCO 2 analysis, not f CO 2 calculated from the other carbon parameters pH, total alkalinity and dissolved inorganic carbon.
2. Continuous CO 2 measurements have been made, not discrete CO 2 measurements.
3. The CO 2 detection is based on an equilibrator system and is performed by infrared analysis or gas chromatography.
4. The calibration has included at least two non-zero gas standards, traceable to World Meteorological Organization (WMO) standards.
5. The equilibrator temperature has been measured to within 0.05 °C accuracy.
6. The intake seawater temperature has been measured to within 0.05 °C accuracy.
7. The equilibrator pressure has been measured to within 2.0 hPa accuracy.
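The seven criteria listed above amount to a simple all-or-nothing check. The Python sketch below shows one way such a check could look; the `DataSetMetadata` class and all of its field names are hypothetical and do not correspond to actual SOCAT metadata keys. Meeting these criteria is necessary but not sufficient for a flag of A or B, since the accuracy requirement takes precedence.

```python
from dataclasses import dataclass


@dataclass
class DataSetMetadata:
    # All field names are hypothetical; they are not SOCAT metadata keys.
    based_on_xco2: bool               # criterion 1
    continuous: bool                  # criterion 2
    equilibrator_with_ir_or_gc: bool  # criterion 3
    nonzero_wmo_standards: int        # criterion 4: number of non-zero standards
    t_equ_accuracy_c: float           # criterion 5: equilibrator temperature
    sst_accuracy_c: float             # criterion 6: intake temperature
    p_equ_accuracy_hpa: float         # criterion 7: equilibrator pressure


def meets_sop_criteria(m: DataSetMetadata) -> bool:
    """Return True only if all seven version 3 SOP criteria are met."""
    return (
        m.based_on_xco2
        and m.continuous
        and m.equilibrator_with_ir_or_gc
        and m.nonzero_wmo_standards >= 2
        and m.t_equ_accuracy_c <= 0.05
        and m.sst_accuracy_c <= 0.05
        and m.p_equ_accuracy_hpa <= 2.0
    )
```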
The requirement regarding the accuracy of the equilibrator pressure has been relaxed to an accuracy of 2.0 hPa in version 3, replacing the earlier requirement of 0.5 hPa, as an accuracy of 2.0 hPa in pressure is sufficient for achieving an accuracy of 2.0 µatm in f CO 2 (Wanninkhof et al., 2013b). The six other criteria are the same in all SOCAT versions. In version 3, a high-quality cross-over has become a prerequisite for a data set flag of A, replacing the earlier requirement of "an acceptable comparison (or cross-over) with other data" (Wanninkhof et al., 2013b). As in previous versions, a cross-over is defined by an equivalent distance of less than 80 km between two data sets. This criterion combines distance and time as $[x^2 + (t \times 30)^2]^{0.5} \le 80$, with distance x in kilometres and time t in days. One day (or 24 h) of separation in time is equivalent (heuristically) to 30 km of separation in space. According to this definition, the maximum time separation (at a spatial distance of 0 km) is 64 h for a cross-over to occur, since 80 km divided by 30 km per day is about 2.67 days. The new definition of a high-quality cross-over between two data sets requires that differences in sea surface temperature and f CO 2 rec between the data sets do not exceed 0.3 °C and 5 µatm, respectively. These criteria reflect the test for a high-quality cross-over between two data sets with a flag of A or B, i.e. each with an accuracy for f CO 2 rec of better than 2 µatm or a joint accuracy of better than 4 µatm with 1 µatm added to account for differences in time and space. A temperature difference of 0.3 °C roughly corresponds to an f CO 2 difference of 5 µatm. "Inconclusive" cross-overs, where differences in temperature or f CO 2 rec exceed these values, do not qualify for a data set flag of A in version 3.
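The cross-over test can be written down in a few lines. The following Python sketch evaluates the equivalent-distance criterion and the additional thresholds for a high-quality cross-over, using the numbers given in the text (30 km per day, 80 km, 0.3 °C and 5 µatm). The function names are hypothetical, and the sketch takes the spatial and temporal separations as given rather than computing them from positions and time stamps.

```python
import math

KM_PER_DAY = 30.0     # one day of separation counts as 30 km (heuristic)
MAX_EQUIV_KM = 80.0   # maximum equivalent distance for a cross-over


def equivalent_distance_km(dx_km: float, dt_days: float) -> float:
    """Combine separation in space (km) and time (days) into one distance."""
    return math.hypot(dx_km, dt_days * KM_PER_DAY)


def is_crossover(dx_km: float, dt_days: float) -> bool:
    """True if two data sets pass the equivalent-distance criterion."""
    return equivalent_distance_km(dx_km, dt_days) <= MAX_EQUIV_KM


def is_high_quality_crossover(
    dx_km: float, dt_days: float, d_sst_c: float, d_fco2_uatm: float
) -> bool:
    """True if the cross-over also agrees within 0.3 degC and 5 uatm."""
    return (
        is_crossover(dx_km, dt_days)
        and abs(d_sst_c) <= 0.3
        and abs(d_fco2_uatm) <= 5.0
    )


if __name__ == "__main__":
    # Largest time separation at zero spatial distance, in hours.
    print(MAX_EQUIV_KM / KM_PER_DAY * 24.0)
```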
It is worth noting that meaningful high-quality cross-overs are rarely found in coastal waters, near sea ice and in regions of freshwater influence (ROFIs), as a result of high spatial variation in sea surface temperature and f CO 2 rec, not for lack of measurement quality. Even if a small number of sea surface temperature and f CO 2 rec values are within 0.3 °C and 5 µatm, this tends to be a coincidence rather than a meaningful correspondence between data sets. This can be illustrated for the US research ships Nathaniel B. Palmer and Lawrence M. Gould, which have frequent high-quality cross-overs in the open Southern Ocean but few high-quality cross-overs near Palmer station, where they both make port calls.
In version 3, a data set with a flag of C "did or did not follow approved methods or SOP criteria" (Wanninkhof et al., 2013b). This is an amendment from the earlier requirement that the data set "did not follow approved methods or SOP criteria". The new flag of E enables inclusion of f CO 2 values from calibrated alternative sensors and platforms (Wanninkhof et al., 2013b). A flag of E requires complete metadata and a demonstrable accuracy for f CO 2 rec of better than 10 µatm by in situ calibration of the sensor. The WOCE flags for individual f CO 2 rec values are defined as 2 (good), 3 (questionable) and 4 (bad) in versions 1, 2 and 3. New is the requirement to add a comment when assigning WOCE flags of 3 and 4 (Table 1).
As in version 2, five additional guidelines were considered for open-ocean f CO 2 rec values, away from sea ice. The guidelines were used for assigning data set quality control flags and WOCE flags (after Pfeil et al., 2013):
1. warming between the seawater intake and the equilibrator should be less than 3 °C;
2. warming rate should be less than 1 °C h⁻¹, unless a sharp temperature front is apparent;
3. warming outliers should be less than 0.3 °C, compared to background data;
4. cooling between the seawater intake and the equilibrator is unlikely in high-latitude oceans for an indoor measurement system;
5. zero or constant temperature difference between the equilibrator and seawater intake usually indicates the absence of SST values.

As for SOCAT version 2, quality controllers were organised into eight regions, each with a group lead (Table 4). The eight regions included the coastal and marginal seas, the Arctic Ocean, the North and tropical Atlantic, the North and tropical Pacific, the Indian Ocean, and the Southern Ocean. The quality controllers gave data sets a quality control flag for each region they crossed. As a final step, the data set quality control flags for the different regions had to be reconciled.
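Some of the open-ocean temperature guidelines given earlier in this section lend themselves to a simple screening step. The Python sketch below covers guidelines 1, 4 and 5 only, for a single pair of intake and equilibrator temperatures; it is a hypothetical aid for a quality controller and not the procedure used in SOCAT. In particular, it reports any cooling, although the guideline applies to high-latitude oceans and indoor systems, and it cannot detect a constant non-zero offset.

```python
def check_warming(sst_c: float, t_equ_c: float) -> list[str]:
    """Return guideline warnings for one intake/equilibrator temperature pair.

    Hypothetical helper: warming is defined as equilibrator minus intake
    temperature. An empty list means none of the screened guidelines is hit.
    """
    warnings = []
    warming = t_equ_c - sst_c
    if warming >= 3.0:
        # Guideline 1: warming should be less than 3 degC.
        warnings.append("warming of 3 degC or more")
    elif warming < 0.0:
        # Guideline 4: cooling is unlikely for an indoor system at high latitude.
        warnings.append("cooling between intake and equilibrator")
    elif warming == 0.0:
        # Guideline 5: zero difference usually indicates absent SST values.
        warnings.append("zero temperature difference, SST may be absent")
    return warnings
```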

Overview of data products
In essence, the data products and data platforms are the same as for earlier SOCAT versions with some modifications (Table 8). Improvements include a major upgrade of the search and visualisation capabilities of the Data Set Viewer (previously known as the Cruise Data Viewer) and uniform contents for the files downloadable from the Data Set Viewer (Tables 1 and 9). Access to the data products is via the SOCAT website (http://www.socat.info/) and the web addresses for the individual data platforms (Table 8). Quality-controlled recommended surface ocean f CO 2 measurements in a uniform format are available in individual data set files, in regional and global synthesis files and in gridded form (Table 8). These three data products can be accessed via the user-friendly, interactive online Data Set Viewer and Gridded Data Viewer, by downloading data files, or in Ocean Data View (Schlitzer, 2015). Similar to earlier versions, data sets with a quality control flag of A to D and recommended f CO 2 values with a WOCE flag of 2 (good) are included in the synthesis files and gridded products. Data sets with a flag of E are available in a separate synthesis file. Data set flags of A to E and a WOCE flag of 2 for f CO 2 values are the default setting for the Data Set Viewer. Quality control comments can be accessed via the Data Set Viewer (Table 8). While the SOCAT data products include seawater temperature and salinity, as these are required for (re)calculation of f CO 2, these two parameters have not been quality-controlled to the high standards required by the physical oceanographic community (SOCAT, 2014a).

As in earlier versions, each individual data set has a digital object identifier (DOI), which provides a direct link to the metadata, including the name and affiliation of the data provider. This DOI for the data set is available for each recommended surface ocean f CO 2 value in the synthesis files. This enables users to easily identify the data provider and to gain access to the original data set and to detailed information on the data set, including any relevant peer-reviewed journal articles that we are aware of. The Data Set Viewer now enables searching of the data collection by data provider. Data providers are also prominently displayed in the Table of Datasets (access via the Data Set Viewer) (Table 8). A more detailed description of the data products follows.

Table 8. Key characteristics of the SOCAT data products in version 3 (Sect. 5) (after Bakker et al., 2014). Data products differ in whether they include data sets with flags of A to D or A to E and f CO 2 rec values with a WOCE flag of 2 only or 2 to 4. Two data products provide access to metadata. Quality control comments are available via the http://ferret.pmel.noaa.gov/SOCAT_Data_Viewer/; select "Data Set", then "Cruise data" and "SOCAT v3 data collection".

5 (Antonov et al., 2006), available at http://www.nodc.noaa.gov/OC5/WOA05/woa05nc.html, using the data set s0112an1.nc from the "monthly" link at http://data.nodc.noaa.gov/opendap/woa/WOA05nc/ (last access: 1 September 2015). This data set is identical to that in SOCAT version 2.
3 Atmospheric pressure extracted from the NCEP/NCAR (National Centers for Environmental Prediction/National Center for Atmospheric Research) 40-Year Reanalysis Project on a 6-hourly, global, 2.5° latitude by 2.5° longitude grid (Kalnay et al., 1996; NCEP, 2014). This is an update relative to the 2012 data set (NCEP, 2012) used in SOCAT version 2.
4 Bathymetry extracted from ETOPO2 (2006) 2 min Gridded Global Relief Data. This data set is identical to that in SOCAT version 2.
5 GLOBALVIEW-CO2 (2014), downloading the "surface" reference type gives the sine function of latitude versus time for the reference marine boundary layer. This is an update relative to the 2012 version used in SOCAT version 2.
6 Individual data set files contain all f CO 2 rec data. Synthesis files at CDIAC and via ODV contain data sets with a flag of A-D and f CO 2 rec values with a WOCE flag of 2 (Table 6).

Individual data set files
Individual data set files are available for all data sets with flags of A, B, C, D and E. Each individual data set has a DOI. The files contain all original CO 2 measurements and recommended f CO 2 values with a WOCE flag of 2, 3 and 4 (Table 8), as set by the data originator, by the automated range checker or during the secondary quality control. The files also contain other parameters, such as atmospheric pressure from reanalysis, climatological salinity and the atmospheric CO 2 mole fraction. Metadata reported by the data provider accompany the files and links to the original data sets are provided. The files are available in text format at PANGAEA ® (https://doi.org/10.1594/PANGAEA.849770).

Global synthesis product
The global and regional synthesis files contain recommended f CO 2 values with a WOCE flag of 2 (good) for data sets with flags of A, B, C and D (Table 8). A separate synthesis file is available for data sets with a flag of E. Each line of the global and regional synthesis files contains the DOI for the corresponding individual data set, as archived at PANGAEA ®, thus enabling retrieval of metadata, name of the data provider and the original CO 2 values reported by the data provider (Table 9) (Sect. 5.2). Global and regional files are available as compressed zip text files via CDIAC (http://cdiac.ornl.gov/ftp/oceans/SOCATv3/). Matlab code is available for reading these text files. Regional files for the SOCAT regions (Table 4) only contain data for a specific region with no overlap, so that many data sets on moving ships are split between several regional files. The global synthesis product for data sets with flags of A to D is also available in Ocean Data View format (https://odv.awi.de/en/data/ocean/socat_fCO2_data) (Schlitzer, 2015).

Subsetting the global synthesis product
The interactive Data Set Viewer (http://ferret.pmel.noaa.gov/SOCAT_Data_Viewer/) has powerful search capabilities and an attractive graphical interface following the upgrade for version 3 (Tables 1 and 8). The SOCAT Data Viewer now hosts the Data Set Viewer and the Gridded Data Viewer on a single software platform. The move of the Data Set Viewer onto this platform in version 3 streamlines access to the SOCAT synthesis and gridded products via a Live Access Server (LAS). The move and upgrade of the Data Set Viewer accompany that of the closely associated SOCAT quality control system (Sects. 2 and 4.4).
The Data Set Viewer enables subsetting of the global SOCAT data collection. The default setting is for data sets with flags of A to E and "valid" f CO 2 values with a WOCE flag of 2 for years 1957 to 2014, corresponding to 3640 data sets for version 3. Recommended f CO 2 values with flags of 3 and 4 can also be selected. In the Data Set Viewer, the user can select data sets by, for example, year, month, region, platform/vessel, "valid" values, data provider, data set flag, WOCE flag and SOCAT version. It is also possible to define limits for the values shown. Maps of surface ocean f CO 2 demonstrate the data distribution, as well as temporal and spatial variation in surface ocean f CO 2 for the selected data sets (Figs. 1, 3 and 4). High-quality figures can be rapidly created for scientific presentations to fellow scientists, funding agencies and policy makers. Scatter plots or property-property plots, available via the Correlation Viewer, can be used to depict any two variables of a data set or data sets, enabling further investigation. Examples are figures of f CO 2 or sea surface temperature as a function of time, salinity or latitude.
The data shown on the Data Set Viewer have been subsampled for system efficiency, such that only part of the data are shown. Visual display of these data sets on maps in the Data Set Viewer is subject to further improvement, as the interpolation of sparse data ignores topographic features. As a result cruise tracks occasionally appear to cross land. This issue does not affect the data sets themselves. The Table of Datasets (previously known as the Table of Cruises) can be accessed from the Data Set Viewer. It provides access to the original CO 2 measurements; f CO 2 values with a WOCE flag of 2, 3 and 4; metadata; comments entered during quality control; and thumbnail plots (Table 8) (Sect. 4.4). Thumbnail plots consist of a series of scatter plots for key parameters in an individual data set and are useful for obtaining a quick overview of a data set. Both the Data Set Viewer and the Table of Datasets allow download of data sets in NetCDF and text format (Tables 8 and 9). All downloadable files now contain the same parameters (Table 9).
The performance speed of the Data Set Viewer may be slower if the full SOCAT data collection is accessed. Subsetting the data collection by decade or region considerably improves the system speed of the Data Set Viewer. Updates of web browsers occasionally result in less than perfect web access to the Data Set Viewer. In such cases, another web browser may provide better access. The web manager (socat.support@noaa.gov) may have useful advice.

Gridded products
The protocol for the creation of gridded f CO 2 products in version 3 follows that for versions 1 and 2, as described by Sabine et al. (2013). The gridded products have a 1° latitude by 1° longitude resolution with a higher resolution of 0.25° latitude by 0.25° longitude for coastal seas. Recommended surface ocean f CO 2 values from 1970 to 2014 with a WOCE flag of 2 from data sets with flags of A to D have been used for the gridded products. The gridded products have no interpolation, i.e. there is no gap-filling and grid cells without f CO 2 values are empty. No correction is made for the long-term increase in surface ocean f CO 2.
Gridded f CO 2 values are reported as unweighted means and as data-set-weighted means. In an unweighted mean, all f CO 2 values in a grid cell have equal weight for calculating the mean. In a data-set-weighted mean, the f CO 2 values are first averaged per data set for each grid cell, and these data set means are then averaged. In version 3, a small error in the procedure for creating the gridded data products was corrected. The error had caused a small reduction in the number of grid cells with data in the data-set-weighted product for versions 1 and 2. The corrected gridded files for version 3 were made public on 2 November 2015.
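The difference between the two means can be illustrated for a single grid cell. The Python sketch below is a minimal illustration with made-up numbers; the function name and the input layout are hypothetical and do not reflect the code used to create the SOCAT gridded products.

```python
from collections import defaultdict
from statistics import mean


def cell_means(observations: list[tuple[str, float]]) -> tuple[float, float]:
    """Return (unweighted mean, data-set-weighted mean) for ONE grid cell.

    observations: list of (data_set_id, fco2_uatm) pairs falling in the cell.
    """
    # Unweighted: every fCO2 value counts equally.
    unweighted = mean(value for _, value in observations)

    # Data-set-weighted: average per data set first, then average those means,
    # so a data set with many values does not dominate the cell.
    per_data_set = defaultdict(list)
    for data_set_id, value in observations:
        per_data_set[data_set_id].append(value)
    weighted = mean(mean(values) for values in per_data_set.values())

    return unweighted, weighted


if __name__ == "__main__":
    # Illustrative values only: data set "a" has three values, "b" has one.
    cell = [("a", 350.0), ("a", 352.0), ("a", 354.0), ("b", 400.0)]
    print(cell_means(cell))
```

In this example the single value from data set "b" has the same weight as the three values from data set "a" in the data-set-weighted mean, so that mean lies closer to 400 µatm than the unweighted mean does.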
Gridded products are available per decade, per year and monthly per year (Table 10). A monthly climatological f CO 2 product has not been made available for version 3, out of concern that such a product without a correction for the long-term increase in f CO 2 could be misinterpreted. Gridded f CO 2 values may have temporal bias, for example, if only summertime f CO 2 values are available for a grid cell in the annual gridded product. Several auxiliary variables are reported per grid cell, for example the number of data sets and observations and the standard deviation in the unweighted and weighted f CO 2 mean values (Table 10).
Gridded products are available in NetCDF format at CDIAC (http://cdiac.ornl.gov/ftp/oceans/SOCATv3/SOCATv3_Gridded_Dat/) (Table 8). Matlab code is available for reading the files. The Gridded Data Viewer (http://www.socat.info/; select "Gridded Data Viewer") provides easy access to the gridded data products, as well as comparison to gridded products from earlier versions. Figures 5 and 6 have been made with the gridded data product.
Future developments

Direct data upload and annual SOCAT releases
The SOCAT automation system was formally launched on 7 September 2015 (SOCAT and SOCOM, 2015). Data providers can now directly upload, check and submit their data on the SOCAT quality control system for future SOCAT versions. The SOCAT automation was first discussed at the 2011 Data2Flux Workshop in Paris (SOCAT, 2011). The automation system was designed at the 2012 Automation Planning Meeting (SOCAT, 2012a) and approved shortly afterwards by global and regional group leads (SOCAT, 2012b) (Table 5). The automation system has been implemented in the background, with all the work for the biennial SOCAT releases of versions 2 and 3 taking place in the foreground (Bakker et al., 2014, this study). This considerable achievement has been made possible by the hard work and planning of the NOAA-PMEL and University of Washington Live Access Server team and other members of the SOCAT automation team (Table 3).

The new automation system allows data providers to upload their data, to check their data with the automated data checker and to visualise their data. Finally, if the data provider deems the data of good quality, he or she can submit them to the SOCAT quality control system. As part of the data submission to SOCAT, the data provider is encouraged to make the original data set public, for example at CDIAC (SOCAT and SOCOM, 2015), either immediately or upon the release of the SOCAT version of which the data set is part. The automation system will enable annual SOCAT releases. The timetable for future SOCAT versions envisages that data upload will end in late January of each year and quality control in late March for a release in summer later that year. With the new system it is now possible to upload and submit data to SOCAT, while quality control of previously submitted data sets is in progress. Thus, both data upload and quality control can now be carried out in parallel. Data upload and quality control for the next SOCAT version will start as soon as they have finished for the preceding version. Thus, the automation system will enable rolling, continuous data upload and quality control, as well as annual SOCAT releases.

The system for automated data upload is under continuous improvement. Metadata templates and upload will be integrated into the SOCAT data upload system. Other planned improvements include searchable information for funding agency and entry of preliminary data set flags by the data provider. A number of additional features are being considered for future SOCAT versions, some of which may be implemented as early as version 4. These are discussed below.

[Figure caption fragment: (Tans and Keeling, 2016). Note the changing scale on the y axis. Similar figures have been made for versions 1 and 2 (Sabine et al., 2013).]
[Figure caption fragment: (Bakker et al., 2014). The higher resolution of 0.25° × 0.25°, available for coastal seas (Sect. 5.5), is not shown.]

Atmospheric CO 2 values
Data providers can now submit measurements of atmospheric CO 2 mole fraction, made in parallel to surface water f CO 2 . A separate WOCE flag will be created for measurements of the atmospheric CO 2 mole fraction in future SOCAT versions. Once quality control has been carried out on the atmospheric CO 2 measurements, such values will be included in the SOCAT data products.
In future, atmospheric f CO 2 will be calculated from atmospheric xCO 2 values, both from the measurements and from GLOBALVIEW-CO2 (2014) values. New graphics will enable comparison of surface ocean f CO 2 values to atmospheric f CO 2 values. The graphs will become an important quality control tool. Future data products will contain atmospheric f CO 2 values calculated from atmospheric measurements and from GLOBALVIEW-CO2, in addition to the atmospheric mole fractions from GLOBALVIEW-CO2 already part of the SOCAT data products (Table 9).

Additional surface water parameters
In 2014, SOCAT scientists decided to allow inclusion of additional surface water parameters accompanying surface water f CO 2 values in SOCAT data output files (SOCAT, 2014a). Such additional parameters might include dissolved inorganic carbon, total alkalinity, pH, nutrients, methane (CH 4) and nitrous oxide (N 2 O) concentrations. SOCAT scientists will not carry out quality control on these additional parameters, but would welcome collaboration with other communities taking responsibility for this. These additional parameters will be made public in parallel to the official SOCAT releases. The extra parameters will be posted in separate data files to emphasise that they have not been quality-controlled. A SOCAT and MEMENTO (MarinE MethanE and NiTrous Oxide; Bange et al., 2009) working group is considering the way forward for surface water CH 4 and N 2 O measurements (SOCAT and SOCOM, 2015).

Impact and scientific highlights of SOCAT
A multi-decade record of surface ocean f CO 2 values
SOCAT provides a record of the history of surface ocean CO 2 research (Fig. 3). Initial, exploratory surface water CO 2 measurements in the late 1950s, 1960s and 1970s were followed by more frequent CO 2 data collection on research ships in the 1980s and large (inter)national research programs, such as the World Ocean Circulation Experiment (WOCE), the Joint Global Ocean Flux Study (JGOFS) and the Tropical Atmosphere Ocean (TAO) network in the 1990s. The operation of CO 2 instruments on ships that are part of the Carbon Voluntary Observing Ships (Carbon VOS) programme, also referred to as the Ships Of Opportunity Programme (SOOP), strongly increased the number of available f CO 2 values from the 1990s onwards. Data availability in the SOCAT collection has increased 4-fold from 0. The seasonal distribution of surface ocean f CO 2 values in the relatively data-rich decade from 2000 to 2009 is shown in Fig. 4. This figure highlights the lack of winter data in the high-latitude oceans, as well as the opposing seasonal cycle of surface ocean f CO 2 in the subtropical and temperate oceans (Takahashi et al., 2002). The distribution of surface ocean f CO 2 values per decade clearly shows the long-term increase in surface ocean f CO 2 (Fig. 5), while suggesting that surface ocean f CO 2 has increased more slowly than the atmospheric CO 2 concentration since the 1990s. Figure 6 visualises the data availability as the number of months in each 1° latitude by 1° longitude grid cell with f CO 2 values since 1970, both as unique months and as total months.

Impact of SOCAT
SOCAT and its data products are cited or named in influential international reports, in more than 100 peer-reviewed scientific publications, PhD and master's theses, book chapters and numerous other publications, as listed on the SOCAT website (http://www.socat.info/publications.html). Figure 7 shows the rapid increase in such publications, since the initiation of SOCAT in 2007 (IOCCP, 2007) and the first SOCAT release in 2011 (Sabine et al., 2013). The SOCAT data collection forms the basis of several data products (http://www.socat.info/products.html) and diverse scientific applications. These include a dozen mapping products of surface ocean pCO 2 and air-sea CO 2 fluxes for the global oceans (see overview in . The SOCAT gridded product and one data product based on SOCAT are integrated with the ESMValTool (Eyring et al., 2016) for routine evaluation of Earth system models. For the same purpose, the SOCAT gridded product is currently being integrated into the Obs4MIPs (Observations for Model Intercomparison Projects) data repository (Ferraro et al., 2015). Citation of SOCAT in high-impact reports, scientific applications of SOCAT and scientific findings based on SOCAT are discussed below.
The importance of the SOCAT synthesis is highlighted by its citation in three categories of high-impact reports, notably reports on ocean observing systems, assessments of climate change and global carbon budgeting, including carbon observing strategies, and ocean acidification studies.
-Reports on ocean observing systems include publications from OceanObs'09 Monteiro et al., 2010), the Framework for Ocean Observing (FOO, 2012), the Tropical Pacific Observing System 2020  and the 2nd International Indian Ocean Expedition . mental Panel on Climate Change) report    (Newton et al., 2014) and the Secretariat of the Convention on Biodiversity (2014).

Scientific applications of SOCAT
SOCAT is used for a variety of scientific applications (Fig. 7b), some of which imply a wider relevance for SOCAT data products than envisaged during the creation of SOCAT (IOCCP, 2007). Scientific applications of SOCAT include figures of surface ocean CO 2 observations; use of SOCAT tools and protocols; use of surface ocean f CO 2 in diverse environmental studies; model-data comparison, model evaluation and data assimilation; detection of ocean acidification trends; regional process studies of surface ocean f CO 2 ; quantification of coastal ocean carbon sinks and sources; quantification of the ocean carbon sink and its variation; quantification of the land carbon sink.
These applications are roughly listed in order of the increasing importance of the SOCAT synthesis for the studies. The use of the SOCAT data collection in peer-reviewed, scientific publications is evolving. Initial publications made reference to the ongoing synthesis activity. Actual use of the SOCAT data collection started as soon as version 1 was released in 2011 (Sabine et al., 2013). Studies that heavily rely on SOCAT data products, such as modelling, ocean acidification trend analysis and carbon budgeting, represent one-third to half of the scientific publications citing or naming SOCAT from 2013 onwards.
Examples of scientific applications of SOCAT are given below. There is no strict separation between the different types of applications identified here, with several studies belonging to more than one type of application. Many of the studies use surface ocean pCO 2 values, derived from the f CO 2 values reported in SOCAT data products.
Figures of surface ocean CO 2 observations. Newly created figures based on the SOCAT data collection and existing figures from SOCAT publications have been used in scientific publications. Such figures generally highlight the availability or lack of surface ocean CO 2 data in specific regions or seasons or over time (Chierici et al., 2012;Regnier et al., 2013;Wanninkhof et al., 2013a;Ciais et al., 2014;Majkut et al., 2014a;Brévière et al., 2015;Hofmann et al., 2015).
Use of SOCAT tools and protocols. A variety of tools and protocols has been developed in SOCAT. One of these is a continental margin mask, which defines coastal waters as waters within 400 km of land. Evans and Mathis (2013) and Evans et al. (2015) use this continental margin mask. Other studies have adopted SOCAT protocols for calculation of f CO 2 (Ulfsbo et al., 2014) and quality control (Sutton et al., 2014b).
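The 400 km criterion can be illustrated with a simple distance test. The sketch below is not the SOCAT mask itself: it assumes a hypothetical list of land coordinates, uses great-circle (haversine) distances on a spherical Earth, and all function and parameter names are illustrative.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def is_coastal(lat, lon, land_lat, land_lon, threshold_km=400.0):
    """True if (lat, lon) lies within threshold_km of any land point."""
    distances = haversine_km(lat, lon,
                             np.asarray(land_lat), np.asarray(land_lon))
    return bool(np.min(distances) <= threshold_km)


# Hypothetical land point at 0 N, 0 E
near = is_coastal(0.0, 1.0, [0.0], [0.0])    # about 111 km away
far = is_coastal(0.0, 10.0, [0.0], [0.0])    # about 1112 km away
```

A gridded mask would apply the same test to every grid cell once and store the result, rather than recomputing distances for each observation.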
Use of surface ocean f CO 2 in diverse environmental studies. Regional f CO 2 values from SOCAT are used in diverse environmental studies with topics ranging from ocean acidification to genomics, gas transfer velocity and evaluation of independent measurements (Blomquist et al., 2014; Larsen et al., 2014; Holding et al., 2015; Marrec et al., 2015; Bonou et al., 2016; Reum et al., 2016). Reum et al. (2016) assess the co-variance between pCO 2, pH and other environmental parameters with the aim of improving the design of future ocean acidification incubation experiments. Larsen et al. (2014) establish a significant correlation between gene expression for the relative turnover (synthesis or consumption) of CO 2 and surface ocean f CO 2. SOCAT f CO 2 values are also used for evaluation of surface ocean f CO 2 estimates from eddy correlation (Blomquist et al., 2014) or from other carbonate parameters (Bonou et al., 2016) and for evaluation of regression parameterisations (Marrec et al., 2015; Xu et al., 2016).
Model-to-data comparison, model evaluation and data assimilation. SOCAT data products are used for model-to-data comparison, model evaluation and data assimilation in coupled and ocean-only biogeochemical models. Model-to-data comparisons of surface water f CO 2 have been carried out for seasonal (Tjiputra et al., 2012; Arruda et al., 2015) to multi-year timescales (McKinley et al., 2016). In several studies, model data are subsampled to surface ocean pCO 2 observations from SOCAT (Tjiputra et al., 2014; Turi et al., 2014). Cooley et al. (2015) evaluate surface ocean pCO 2 values from an integrated assessment model with pCO 2 observations from SOCAT and other sources. SOCAT data products are supporting model evaluation in the context of the Coupled Model Intercomparison Project (CMIP) and beyond (Eyring et al., 2016). The SOCAT data collection is used for assimilation of surface ocean pCO 2 values in global ocean biogeochemical models (While et al., 2012; Bertino, 2013, as cited in Gehlen et al., 2015). Ocean biogeochemical models have many applications, such as quantification and attribution of trends in the ocean carbon sink (Le Quéré et al., 2014, 2015a; Séférian et al., 2014) and forecasting population dynamics of sea scallops, which are the basis of important commercial fisheries (Cooley et al., 2015).
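Subsampling model output to the observations can be sketched as a nearest-neighbour lookup in time and space. The example below is illustrative only and does not reproduce the method of any cited study: it assumes a hypothetical model field on a regular grid with a monthly time index, and all names are invented for the example.

```python
import numpy as np


def subsample_model_to_obs(model, grid_lat, grid_lon,
                           obs_time_idx, obs_lat, obs_lon):
    """Pick the model value in the grid cell nearest to each observation.

    model        : array (n_time, n_lat, n_lon), e.g. monthly pCO2
    grid_lat/lon : 1-D arrays of grid-cell centres in degrees
    obs_time_idx : integer array, time index of each observation
    obs_lat/lon  : 1-D arrays of observation positions in degrees
    """
    lat_idx = np.abs(grid_lat[None, :] - obs_lat[:, None]).argmin(axis=1)
    # Wrap longitude differences into [-180, 180) so that 359 E and 1 W
    # are treated as neighbours
    dlon = (grid_lon[None, :] - obs_lon[:, None] + 180.0) % 360.0 - 180.0
    lon_idx = np.abs(dlon).argmin(axis=1)
    return model[obs_time_idx, lat_idx, lon_idx]


# Hypothetical 2-month model field on a coarse 3 x 4 grid
model = np.arange(24, dtype=float).reshape(2, 3, 4)
grid_lat = np.array([-30.0, 0.0, 30.0])
grid_lon = np.array([0.0, 90.0, 180.0, 270.0])

sampled = subsample_model_to_obs(
    model, grid_lat, grid_lon,
    obs_time_idx=np.array([1, 0]),
    obs_lat=np.array([28.0, -1.0]),
    obs_lon=np.array([-85.0, 100.0]),
)
```

Statistics computed from the subsampled values can then be compared with the same statistics from the observations, so that both share the same sampling bias.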
Detection of ocean acidification trends. A number of studies estimate trends in surface ocean pH or the carbonate concentration by combining SOCAT f CO 2 values with another carbonate parameter (Lauvset and Gruber, 2014; Freeman and Lovenduski, 2015; Lauvset et al., 2015).
Regional process studies of surface ocean f CO 2. Several authors investigate regional processes driving temporal or spatial variation in surface ocean f CO 2 and CO 2 air-sea fluxes. Examples are for the Subantarctic Indian Ocean (Lourantou and Metzl, 2011) and the eastern equatorial Pacific Ocean (Walker Brown et al., 2015).
Quantification of coastal ocean carbon sinks and sources. SOCAT data products are used for quantification of CO 2 sources and sinks in coastal seas. Such studies are regional or global in extent (Signorini et al., 2013; Laruelle et al., 2014, 2015).
Quantification of the ocean carbon sink and its variation. An important application of the SOCAT data collection is quantification of the ocean carbon sink on seasonal to multi-year timescales with a mapping or gap-filling method. Such studies may be regional or global in extent. Studies tend to be either for the coastal seas (Signorini et al., 2013) or for the open ocean. The studies interpolate sparse pCO 2 data from a SOCAT or LDEO synthesis product in time and space by a gap-filling method. Approaches include statistical interpolation (Goddijn-Murphy et al., 2015), multiple linear regression (Signorini et al., 2013; Iida et al., 2015), neural network approaches (Nakaoka et al., 2013; Sasse et al., 2013; Zeng et al., 2014) and model-based regression and tuning (Valsala and Maksyutov, 2010; Majkut et al., 2014b). Mapping methods may be specific to individual regions ("biomes") (Signorini et al., 2013; Landschützer et al., 2014) or may apply to the full (global) domain (e.g. Rödenbeck et al., 2013). Most of these approaches use additional parameters with good data coverage during the gap-filling process, for example satellite-derived sea surface temperature and chlorophyll a, as well as sea surface salinity and mixed layer depth from reanalysis. Many mapping methods use a time-dependent variable, such as time itself or the steadily increasing atmospheric CO 2 mole fraction, in order to be able to reproduce a long-term increase in surface ocean pCO 2. The Surface Ocean pCO 2 Mapping Intercomparison (SOCOM; http://www.bgc-jena.mpg.de/SOCOM/) compares the surface ocean pCO 2 distribution and air-sea CO 2 fluxes in 14 data-based mapping products, 10 of them using SOCAT. The methods vary in their characteristics, making them suitable for different space and timescales. The SOCOM initiative aims to quantify uncertainties and to identify common features in the gap-filling methods. The first SOCOM results highlight considerable differences between mapping products, especially in data-sparse regions.
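The multiple linear regression approach to gap filling can be sketched as follows. This is a minimal illustration on synthetic data, not the method of any particular mapping product: the predictors (sea surface temperature, log chlorophyll and time), the coefficients used to generate the data, and all names are assumptions made for the example.

```python
import numpy as np


def fit_mlr(predictors, pco2):
    """Least-squares fit of pCO2 = c0 + c1*x1 + c2*x2 + ..."""
    design = np.column_stack([np.ones(len(pco2)), predictors])
    coeffs, *_ = np.linalg.lstsq(design, pco2, rcond=None)
    return coeffs


def predict_mlr(coeffs, predictors):
    """Evaluate the fitted regression at new predictor values."""
    design = np.column_stack([np.ones(predictors.shape[0]), predictors])
    return design @ coeffs


# Synthetic data (invented values, for illustration only)
rng = np.random.default_rng(0)
n = 200
sst = rng.uniform(0.0, 30.0, n)        # sea surface temperature, deg C
log_chl = rng.uniform(-2.0, 1.0, n)    # log10 chlorophyll a
years = rng.uniform(0.0, 20.0, n)      # time since start of record, yr
predictors = np.column_stack([sst, log_chl, years])
pco2 = (330.0 + 1.5 * sst - 10.0 * log_chl + 1.8 * years
        + rng.normal(0.0, 2.0, n))

# Pretend that only about 30 % of the points were actually observed
observed = rng.random(n) < 0.3
coeffs = fit_mlr(predictors[observed], pco2[observed])

# Fill the gaps where predictors exist but pCO2 was not measured
filled = predict_mlr(coeffs, predictors[~observed])
rmse = np.sqrt(np.mean((filled - pco2[~observed]) ** 2))
```

The time predictor plays the role of the time-dependent variable mentioned above: without it, the regression could not reproduce a long-term increase in surface ocean pCO 2.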
The high-profile Global Carbon Budget uses ocean biogeochemical models for estimating trends in the global ocean carbon sink (Le Quéré et al., 2014, 2015a). Recent budgets also consider observation-based estimates of the ocean carbon sink using the LDEO and SOCAT synthesis products (Park et al., 2010; Landschützer et al., 2014, 2015; Rödenbeck et al., 2014). The 2015 Global Carbon Budget assesses the uncertainty in the ocean carbon sink by comparing model results to observation-based estimates (Le Quéré et al., 2015b).
Earth Syst. Sci. Data, 8, 383-413, 2016 www.earth-syst-sci-data.net/8/383/2016/
Quantification of the land carbon sink. Quantification of the ocean carbon sink is critical to resolving the Global Carbon Budget and underpins the estimate of the land carbon sink (Le Quéré et al., 2014, 2015a). In addition, quantification of ocean-atmosphere CO 2 fluxes in space and time provides priors for atmospheric inversion, thus improving estimates of the land carbon sink (Van der Laan et al., 2014).

Scientific findings obtained using the SOCAT data collection
This section provides an overview of scientific findings obtained using the SOCAT data collection.
Model-to-data comparison. Schuster et al. (2013) carry out a comparison of CO 2 air-sea fluxes for the Atlantic Ocean from data-based methods, ocean biogeochemical models, ocean inversion, and atmospheric inversions. The seasonal cycle and year-to-year variation in the fluxes differ between the various methods for most Atlantic regions. Two studies subsample model pCO 2 data to surface ocean pCO 2 observations derived from SOCAT. The authors conclude that ocean biogeochemical models on average underestimate the spatial and temporal variation in regional and global surface ocean pCO 2 by 10 to 40 % (Turi et al., 2014). This corroborates the SOCOM finding that ocean biogeochemical models underestimate the year-to-year and decadal variation in the global air-sea CO 2 flux. However, at least one model-to-data comparison study concludes that the Community Earth System Model captures the annual to 30-year variability in the ocean carbon cycle at regional to global scales (McKinley et al., 2016). Landschützer et al. (2015) demonstrate how ocean carbon observations are delivering new insights into large and globally significant decadal changes in the ocean carbon sink. The variability in the ocean carbon sink in regions like the Southern Ocean is not apparent in modelled estimates of ocean carbon uptake or from atmospheric inverse calculations (e.g. Lenton et al., 2013).
Detection of ocean acidification trends. SOCAT-based research indicates a decrease in global surface ocean pH at a rate of −0.0018 ± 0.0004 yr −1 for 1991 to 2011 with significant decreases in 70 % of all ocean regions.
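To put the reported rate into perspective, the short calculation below converts it into a cumulative pH change over the 20-year period and the corresponding relative change in hydrogen ion concentration. It is a back-of-the-envelope sketch that assumes a constant linear trend and scales the uncertainty linearly with time; the variable names are illustrative.

```python
# Reported trend in global surface ocean pH, 1991 to 2011
rate_per_year = -0.0018        # pH units per year
rate_uncertainty = 0.0004      # pH units per year
n_years = 2011 - 1991          # length of the period in years

# Cumulative change, assuming a constant linear trend
delta_ph = rate_per_year * n_years
delta_ph_uncertainty = rate_uncertainty * n_years

# pH = -log10([H+]), so a pH change of delta_ph multiplies [H+]
# by 10 ** (-delta_ph)
h_ion_relative_increase = 10.0 ** (-delta_ph) - 1.0
```

Under these assumptions the cumulative change is about −0.036 ± 0.008 pH units, which corresponds to an increase in hydrogen ion concentration of roughly 8 to 9 %.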
Data-based carbon budgeting. Using SOCAT and other data sources, Regnier et al. (2013) estimate that anthropogenic activities may have increased open-ocean outgassing of land-derived carbon by 0.1 Pg C yr −1. The global CO 2 sink in continental shelf seas has been estimated as 0.4 Pg C yr −1 and 0.19 ± 0.05 Pg C yr −1 (Laruelle et al., 2014).
Several mapping studies highlight large year-to-year variation in air-sea CO 2 fluxes in the tropical Pacific Ocean (Rödenbeck et al., 2014, 2015). This variation is closely related to the El Niño-Southern Oscillation (ENSO) (Feely et al., 1999; Inoue et al., 2001; Rödenbeck et al., 2014). The variation in the equatorial Pacific Ocean roughly corresponds to 40 % of the interannual variation in the global ocean carbon sink, which has been estimated as 0.31 Pg C yr −1.
The SOCOM comparison of mapping methods identifies an increase in the global ocean carbon sink by 1 Pg C decade −1 since 2000. About half of this increase in the global ocean carbon sink originates south of 35° S in the Southern Ocean.

Conclusions
SOCAT version 3 represents an important release of the SOCAT data collection, by creating a 58-year data record and by adding many additional data sets for recent years. The new data set flag of E in version 3 now enables inclusion of calibrated surface ocean f CO 2 measurements from alternative sensors (with an accuracy of better than 10 µatm) made on alternative platforms, such as moorings and drifters, in remote and less remote ocean regions. This article provides an ESSD "living data" update of SOCAT version 3. The launch of the SOCAT automation system will enable annual SOCAT releases from version 4 onwards.
The rapid growth of scientific publications using SOCAT (Fig. 7) demonstrates the importance of this synthesis activity by the international marine carbon community. The SOCAT data collection is being used in high-impact, scientific applications such as evaluation of ocean biogeochemical models, carbon budgeting, and trend analysis of the ocean carbon sink and ocean acidification. SOCAT-based studies have informed the Paris climate negotiations, as the 2015 Global Carbon Budget was released at the 21st Conference of the Parties of the United Nations Framework Convention on Climate Change (Le Quéré et al., 2015b).
However, despite much progress in data synthesis, major uncertainties remain in observation-based studies of the ocean carbon sink and ocean acidification due to (1) inadequate spatial and seasonal data coverage, (2) short data records, and (3) uncertainty in the correction for "natural", pre-industrial oceanic outgassing of land-derived CO 2 (Jacobson et al., 2007) and any anthropogenic perturbation of this outgassing (Regnier et al., 2013). Data coverage is particularly poor in the Indian Ocean, the Southern Hemisphere oceans and coastal seas and in the high-latitude oceans, notably in ice-covered regions and in winter (Figs. 3, 4 and 6).
The above reinforces the need for the continuing collection and synthesis of accurate, well-calibrated and well-documented observations and investment in high-quality surface ocean CO 2 measurements on autonomous platforms. Adequate resources need to continue to be made available for data collection, quality control and data synthesis. Systems should be automated whenever possible. The SOCAT data synthesis highlights the success of a bottom-up approach with buy-in from the international marine carbon community and endorsement by IOCCP, SOLAS and IMBER.

Data availability
This manuscript describes how the synthesis product has been created (Sect. 4) and how the individual data set files, synthesis files and gridded products can be accessed (Sect. 5; Table 8). Individual data set files, all combined forming the synthesis product, can be downloaded here: doi:10.1594/PANGAEA.849770. Global and regional files are available as compressed zip text files via CDIAC (http://cdiac.ornl.gov/ftp/oceans/SOCATv3/). The global synthesis product for data sets with flags of A to D is also available in Ocean Data View format (https://odv.awi.de/en/data/ocean/socat_fCO2_data). The gridded products are available here: doi:10.3334/CDIAC/OTG.SOCAT_V3_GRID. Further details are in Sects. 4 and 5.