Surface Ocean CO 2 Atlas (SOCAT) gridded data products



Introduction
Human industrial and agricultural activities have caused the global atmospheric carbon dioxide (CO 2) concentration to increase from about 280 parts per million (ppm) prior to the industrial revolution to a 2011 value of about 390 ppm (Tans and Keeling, 2013). Atmospheric CO 2 concentrations are now higher than experienced on Earth for more than 800 000 yr and are expected to continue increasing in the foreseeable future (Lüthi et al., 2008). The rate of CO 2 release into the atmosphere is also thought to be unprecedented in Earth's history (Kump et al., 2009). As of the mid-1990s, the oceans had absorbed approximately half of the CO 2 released from fossil fuel use and cement manufacturing over the previous 200 yr. As a consequence of ocean CO 2 uptake, the chemistry of the oceans is presently changing at a rate exceeding any believed to have occurred for at least the past 20 million years.
Although the magnitude of the CO 2 uptake by the ocean presently is reasonably well constrained by models and observations (e.g., Gruber et al., 2009), the inter-annual variability of the CO 2 flux is still poorly known. Knowledge of variability is of particular importance to predict changes in the ocean-atmosphere fluxes in response to global change, and lack of this knowledge currently limits our ability to accurately verify the partitioning of fossil fuel CO 2 between the ocean and the terrestrial biosphere and to realistically estimate future atmospheric CO 2 levels. To document the changing patterns of air-sea CO 2 exchange requires an extensive observational program.
The ocean CO 2 research community has responded by initiating internationally organized observation programs, and the number of annual surface CO 2 observations has been growing exponentially since the 1960s, such that today well over one million observations are reported to data centers each year (Sabine et al., 2010). The latest published global flux map, based on a compilation of approximately three million measurements collected between 1970 and 2007, provides information on the monthly patterns of air-sea CO 2 fluxes during a "normal" non-El Niño year taken to be 2000.
The tremendous increase in the number of annual observations provides exciting opportunities to look at the patterns of air-sea CO 2 fluxes in greater detail and to understand the seasonal to inter-annual variations and the mechanisms controlling them. As a complement to Takahashi's work to update the CO 2 climatology, there is an ongoing international effort to synthesize all the available surface CO 2 data into a quality controlled database, along with uniform metadata that can be used to examine surface CO 2 variability over a range of temporal and spatial scales (SOCAT; http://www.socat.info/).
The core Surface Ocean CO 2 Atlas (SOCAT) data set (version 1.5) is a global compilation of underway surface water CO 2 data with 7.8 million measurements (6.3 million with f CO 2 values) from 1851 cruises run between 1968 and 2007 by more than 10 countries (Fig. 1). SOCAT brings together, in a common format, all publicly available surface ocean CO 2 data, including the Arctic and the coastal seas. All measurements are evaluated for data quality using methods that are transparent and fully documented (Pfeil et al., 2013). The observations in the core SOCAT data set are sparse and unevenly distributed in time and space. To simplify exploration of the information from the collection of observations, a standard gridded representation of the SOCAT data with minimal interpolation was generated. The gridded product contains mean and extreme f CO 2 (CO 2 fugacity) values for every 1° × 1° grid cell (¼° × ¼° for coastal regions) that had measurements in a given month. Grid cells with no measurements in a given month and year were left blank. Reducing the original data down to 1° × 1° average values will facilitate comparisons with models and analyses of large scale patterns of variability.

Core data set construction
The construction of the core SOCAT data set is described in detail by Pfeil et al. (2013), so only a basic description is given here. The SOCAT data set was compiled from data sets of surface CO 2 measurements made through 2007, which were either publicly available or obtained upon request. These include measurements from research ships, volunteer observing ships and CO 2 moorings. Once collected, all files had to be put into a common format. This involved not only restructuring the data columns, but also recalculating the f CO 2 values using standardized approaches (Pierrot et al., 2009). Missing parameters like salinity and atmospheric pressure were extracted from standard global datasets (WOA 2005, Antonov et al., 2006; NCEP/NCAR, Kalnay et al., 1996), in addition to bathymetry added from ETOPO2. To ensure that each cruise was given a unique identifier in the database, all cruises were given an EXPOCODE that identifies the vessel, country, and date of the first cruise measurements (e.g., 06MT19910903 = German cruise on the Meteor starting 3 September 1991).
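As an illustration of the identifier scheme, the example EXPOCODE above can be split into its parts. The sketch below assumes a fixed layout of two characters for the country, two for the vessel, and eight digits for the start date (YYYYMMDD), which matches the example but is not stated as a general rule here; the names `Expocode` and `parse_expocode` are hypothetical.

```python
from datetime import date, datetime
from typing import NamedTuple


class Expocode(NamedTuple):
    country_code: str  # "06" in the example above (German cruise)
    vessel_code: str   # "MT" in the example above (the Meteor)
    start_date: date   # date of the first cruise measurements


def parse_expocode(expocode: str) -> Expocode:
    """Split an EXPOCODE, assuming a 2 + 2 + 8 character layout."""
    if len(expocode) != 12:
        raise ValueError(f"expected 12 characters, got {len(expocode)}")
    # strptime raises ValueError if the last 8 characters are not a real date
    start = datetime.strptime(expocode[4:], "%Y%m%d").date()
    return Expocode(expocode[:2], expocode[2:4], start)


example = parse_expocode("06MT19910903")
```

Rejecting strings whose date part does not parse mirrors, in a small way, the initial checks for dates that do not exist.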
An initial set of quality control checks was performed to identify any unrealistic information (e.g., sample locations on land, dates and times that do not exist), any large "fliers" in the data and any duplicated data or cruise information. Where problems were discovered, the initial quality control checks resulted either in the removal of entire cruises (e.g., in the case of a duplicated cruise) or in the flagging of individual data points with a bad or questionable value. Although the flagging of individual data points can be somewhat subjective, we made an effort to be very conservative and only flag observations that were well beyond realistic values. Flag numbers 2, 3, and 4 were used to indicate good, questionable, and bad values, respectively.
For the second level quality control checks, the database was divided into regions: North Atlantic (including Arctic), Tropical Atlantic, North Pacific, Tropical Pacific, Indian Ocean, Southern Ocean, and coastal. Small groups of carbon scientists that specialize in each region were given responsibility for examining the data quality in each region. Standardized procedures and tools were used for the 2nd level quality control (2nd QC) in each of the regions. These evaluations included a careful assessment of the metadata to determine if the best practice methods (Dickson et al., 2007, http://cdiac.ornl.gov/oceans/Handbook_2007.html) were followed and properly documented. The f CO 2 data were also examined for consistent patterns and reasonable or expected thermodynamic relationships when compared to measured temperature and salinity. Data consistency was also assessed by comparing measurements from multiple cruises in the same area at about the same time. The regional groups used their scientific experience in the area to determine the appropriate temporal and spatial scales for the consistency evaluations.
Cruises were categorized as A-D, F, S or X based on the evaluations by the regional groups (Olsen and Metzl, 2010). There are four categories of generally acceptable data that range from A (followed approved methods, metadata documentation complete, 2nd QC performed and deemed acceptable, and comparison with other data performed and deemed acceptable) to D (unsure whether approved methods were completely followed, metadata documentation was incomplete, but 2nd QC was performed and deemed acceptable). There are three categories of data that were not deemed acceptable in their current form. Category F is for data where the 2nd QC revealed problems with data quality. Category S is for cruises that are temporarily suspended because the data contain unacceptable problems but are in the process of being updated by the data provider. The intent is that these cruises will be included in the next version of SOCAT. Category X is for data that have been excluded, for example if they are a duplicate of an existing SOCAT data set.
Because of the extensive amount of work involved in formatting and carefully checking data quality, no data collected after December 2007 were added to this (first) release of the database. After each of the regional groups had categorized all of the cruises, a global group examined the entire database for consistency and resolved occasional conflicts between regional assessments of cruises that crossed regional boundaries.

Mapping procedures
The gridded SOCAT product was derived by combining all SOCAT v1.5 data collected within a 1° × 1° box during a specific month (e.g., between 60-61° N, 30-31° W, for January 2007). Data within 400 km of a significant land mass were assigned to the coastal region and were grouped into ¼° × ¼° boxes. Only data with a secondary quality flag between "A" and "D" were included. Grid cells that had no measurements in a given month were not assigned a value. Given the very limited data in the first two years of the core data set, the gridded product starts in 1970.
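The binning step described above can be sketched as follows. This is a minimal illustration rather than the code used to build the product: the record fields (`lat`, `lon`, `year`, `month`, `fco2`, `qc_flag`) and the function names are hypothetical, and the 400 km coastal test is not implemented. A caller would pass `size=0.25` for records already known to be coastal.

```python
import math
from collections import defaultdict

ACCEPTED_FLAGS = {"A", "B", "C", "D"}  # cruise categories kept in the gridded product


def cell_key(lat, lon, year, month, size=1.0):
    """Return (south edge, west edge, year, month) of the box containing a point."""
    south = math.floor(lat / size) * size
    west = math.floor(lon / size) * size
    return (south, west, year, month)


def bin_observations(observations, size=1.0):
    """Group fCO2 values by grid cell and month, skipping unaccepted cruises."""
    cells = defaultdict(list)
    for obs in observations:
        if obs["qc_flag"] not in ACCEPTED_FLAGS:
            continue
        key = cell_key(obs["lat"], obs["lon"], obs["year"], obs["month"], size)
        cells[key].append(obs["fco2"])
    # Cells with no measurements never get a key, so they stay unassigned.
    return dict(cells)


observations = [
    {"lat": 60.4, "lon": -30.7, "year": 2007, "month": 1, "fco2": 350.0, "qc_flag": "A"},
    {"lat": 60.9, "lon": -30.1, "year": 2007, "month": 1, "fco2": 360.0, "qc_flag": "D"},
    {"lat": 60.5, "lon": -30.5, "year": 2007, "month": 1, "fco2": 999.0, "qc_flag": "F"},
    {"lat": 10.2, "lon": 20.3, "year": 2007, "month": 2, "fco2": 370.0, "qc_flag": "B"},
]
cells = bin_observations(observations)
```

The first two records fall in the 60-61° N, 30-31° W box for January 2007; the third is dropped because its cruise category is F.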
The primary purpose of the gridding is to provide regularly spaced f CO 2 values that can be used for mapping, creating comparisons to models, or in other applications where highly structured values would be useful. The choice of grid size and temporal resolution was intended to stay within the average correlation length scale for surface ocean CO 2 and provide a product that could be directly combined with other routine gridded products available at the same resolution (e.g., World Ocean Atlas).
Two types of f CO 2 values are reported: an unweighted mean and a cruise-weighted mean. Measurements collected from different ships can have very different temporal resolutions. Some ships record values every minute and others only take a reading once per hour. If two cruises pass through the same grid cell, one with one minute resolution and another with one hour resolution, the mean of all the measurements would be strongly biased toward the higher resolution data set. In some cases this biasing may not be appropriate. Therefore, the cruise-weighted mean first averages all the data obtained on a given cruise within a grid cell, and then averages the cruise means within the grid cell. This gives equal weight to all the cruises regardless of the original temporal resolution. Generally these values are very similar, but in some grid cells there are significant differences (e.g., some cells in the North Pacific).
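The difference between the two means can be made concrete with a small sketch. It assumes each measurement in a grid cell is available as a `(cruise_id, fco2)` pair; the cruise identifiers and values are invented for illustration.

```python
from collections import defaultdict
from statistics import fmean


def unweighted_mean(samples):
    """Mean of every measurement in the cell, regardless of cruise."""
    return fmean([value for _, value in samples])


def cruise_weighted_mean(samples):
    """Average within each cruise first, then average the cruise means."""
    by_cruise = defaultdict(list)
    for cruise_id, value in samples:
        by_cruise[cruise_id].append(value)
    return fmean([fmean(values) for values in by_cruise.values()])


# One cruise logging every minute for an hour, another logging once per hour.
samples = [("cruise_fast", 360.0)] * 60 + [("cruise_slow", 380.0)]

plain = unweighted_mean(samples)          # pulled toward the 60 fast readings
weighted = cruise_weighted_mean(samples)  # each cruise counts once
```

Here the unweighted mean stays close to 360 µatm, while the cruise-weighted mean is the midpoint of the two cruise means, 370 µatm.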
Other diagnostic information is also included to help the user understand any implicit biases in the data. For example, the total number of cruises and the total number of observations within a grid cell are provided. Technically, only one measurement is required to produce a value in a grid cell. One could argue that one measurement cannot adequately represent an entire 1° × 1° area over a whole month. With the cruise and total number of observations, one could designate a minimum number of cruises or data required to make an acceptable average and filter the data accordingly.
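Such a filter might look like the sketch below. The field names `n_cruises` and `n_obs` are hypothetical stand-ins for the cruise and observation counts in the gridded product, and the thresholds are arbitrary illustrations, not recommended values.

```python
def filter_cells(cells, min_cruises=1, min_obs=1):
    """Keep only grid cells that meet both minimum-count thresholds."""
    return {
        key: cell
        for key, cell in cells.items()
        if cell["n_cruises"] >= min_cruises and cell["n_obs"] >= min_obs
    }


# Keys are (south edge, west edge, year, month); values hold the cell statistics.
gridded = {
    (60.0, -31.0, 2007, 1): {"mean_fco2": 355.0, "n_cruises": 2, "n_obs": 61},
    (10.0, 20.0, 2007, 2): {"mean_fco2": 370.0, "n_cruises": 1, "n_obs": 1},
}
kept = filter_cells(gridded, min_cruises=2, min_obs=10)
```

With these thresholds the single-measurement cell is discarded and only the cell sampled by two cruises remains.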
The minimum, maximum, and standard deviation of the f CO 2 values within each grid cell are also provided. These values can supply useful information on the distribution of data within a grid cell. For example, Fig. 2 shows the distribution of the non-zero standard deviations of the weighted and unweighted f CO 2 values. The average standard deviation of all the non-zero unweighted mean f CO 2 values is 5.0 µatm. The average standard deviation for the cruise-weighted mean is only slightly lower at 4.9 µatm. The relatively low standard deviations suggest that the average values are reasonably robust. Another potential issue with the mean f CO 2 values is whether the cruises adequately cover the grid cell area. To assess the area covered, the average latitude and longitude offset from cell center was calculated. If the maximum deviation from the center point is 0.5 in latitude and longitude, then a triangle with a hypotenuse of $m$ gives the maximum possible offset ($m = (0.5^2 + 0.5^2)^{0.5} = 0.7071$). The average combined offset for all grid cells was 0.34 ± 0.14, representing a standard Gaussian distribution of cruise locations in the cells (Fig. 3).
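The offset diagnostic can be sketched as below. The text does not spell out how the latitude and longitude offsets were combined, so this takes one plausible reading as an assumption: the mean absolute latitude offset and the mean absolute longitude offset are combined as the two legs of a right triangle. The function name and sample positions are hypothetical.

```python
import math
from statistics import fmean

# Largest possible offset in a 1 x 1 degree cell: the corner, about 0.7071 degrees.
MAX_OFFSET = math.hypot(0.5, 0.5)


def combined_offset(points, center_lat, center_lon):
    """Combine mean |lat| and mean |lon| offsets from the cell center (assumed method)."""
    mean_dlat = fmean([abs(lat - center_lat) for lat, _ in points])
    mean_dlon = fmean([abs(lon - center_lon) for _, lon in points])
    return math.hypot(mean_dlat, mean_dlon)


# Two sample positions in the 60-61 N, 30-31 W cell, whose center is (60.5, -30.5).
points = [(60.25, -30.75), (60.75, -30.25)]
offset = combined_offset(points, 60.5, -30.5)
```

Both sample positions sit a quarter of a degree from the center in each direction, so the combined offset is half of the maximum possible value.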

Spatial and temporal coverage
The number of CO 2 measurements made in the surface ocean annually has increased dramatically over the past 40 yr. Between the early 1990s and the early 2000s, approximately 900 000 new measurements in total were added to the database. Today, nearly a million measurements are collected every year. Figure 4 shows a map of the total number of surface f CO 2 values per grid cell over the observational period and the percentage of surface ocean grid cells per latitude band sampled over this time. At ∼30° N, nearly 100 % of the surface ocean grids have been sampled at least once within the nearly 40 yr time period covered in this data set. The North Atlantic and North Pacific have at least some measurements in the majority of grid cells. The best coverage appears to be in the high latitude North Atlantic and in the Equatorial Pacific. A much smaller percentage has been covered in the Southern Hemisphere (generally around 50 %). The map reveals some very significant areas in the South Pacific, Indian and even Atlantic that have no observations over this time period (white spaces in Fig. 4). Breaking the data down into seasons or months shows even more sporadic coverage.
The monthly and seasonal variability can be explored in the 12 month climatology data product. This product averages all f CO 2 values in a given grid cell for each month regardless of the year. Figure 5 shows the number of unique months out of the year that a grid cell has observations in the nearly 40 yr database. While spatial coverage has increased dramatically over the last few decades, a majority of the surface ocean has only been sampled for f CO 2 during one month out of the year (blue colors in Fig. 5). The volunteer observing ship (VOS) effort to put underway CO 2 instruments onto commercial ships has dramatically increased the total number of observations each year and the monthly coverage along certain shipping lines (red colors in Fig. 5); however, the figure also illustrates the need for even more observations, particularly in the Southern Hemisphere.
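The climatology and the months-sampled count can be sketched as follows, assuming records of the form `(cell, year, month, fco2)`. The record layout and function names are hypothetical, and whether the released product averages individual values or monthly cell means is not settled here; the sketch averages individual values.

```python
from collections import defaultdict
from statistics import fmean


def monthly_climatology(records):
    """Average all values per (cell, calendar month), ignoring the year."""
    by_month = defaultdict(list)
    for cell, _year, month, value in records:
        by_month[(cell, month)].append(value)
    return {key: fmean(values) for key, values in by_month.items()}


def months_sampled(records):
    """Count how many distinct calendar months each cell has data for."""
    months = defaultdict(set)
    for cell, _year, month, _value in records:
        months[cell].add(month)
    return {cell: len(seen) for cell, seen in months.items()}


records = [
    ((60.0, -31.0), 1995, 1, 340.0),
    ((60.0, -31.0), 2005, 1, 360.0),
    ((60.0, -31.0), 2005, 7, 330.0),
    ((10.0, 20.0), 2001, 3, 370.0),
]
climatology = monthly_climatology(records)
coverage = months_sampled(records)
```

In this toy input the first cell has a January value built from two different years and is sampled in two calendar months, while the second cell is sampled in only one.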

Decadal trends
To facilitate the exploration and use of the temporal richness of SOCAT, the monthly gridded data have also been binned into annual and decadal averages. Figure 6 shows histograms of the decadal mean f CO 2 values for each decade globally as well as broken down into the Northern Hemisphere and Southern Hemisphere. As atmospheric CO 2 increases with time so do the peak histogram f CO 2 values. This increase is expected since globally the surface water CO 2 is thought to track the atmospheric increase. However, one must be careful to remember that since there is essentially no interpolation in this data set, there are spatial and temporal biases inherent in these gridded products. For example, the data from the 1970s appear to show a bimodal distribution but this is likely an artifact of the limited data coverage at that time.
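A decadal binning of monthly cell means might look like the sketch below. It assumes the monthly values are averaged directly into decades without an intermediate annual step, which is an assumption rather than a documented detail of the product; the input layout and names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def decade_of(year):
    """Map a year to the first year of its decade, e.g. 1978 -> 1970."""
    return (year // 10) * 10


def decadal_means(monthly_means):
    """Average monthly cell means into (cell, decade) bins."""
    bins = defaultdict(list)
    for (cell, year, _month), value in monthly_means.items():
        bins[(cell, decade_of(year))].append(value)
    return {key: fmean(values) for key, values in bins.items()}


cell = (60.0, -31.0)
monthly_means = {
    (cell, 1972, 1): 320.0,
    (cell, 1978, 6): 330.0,
    (cell, 1985, 3): 345.0,
}
decadal = decadal_means(monthly_means)
```

Because no gaps are filled, a decadal value built from only a few months, as in this toy input, carries the same sampling biases noted above.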

Comparison to previous work
This product is meant to complement the similar products available to the community (e.g., Takahashi et al., 2009; Key et al., 2004). This is the most comprehensive collection of surface CO 2 data available as it is compiled with all publicly available data using transparent quality control procedures. There is no explicit interpolation in time or space during gridding, which keeps the gridded product as close to the original data as possible. The only other data set with comparable quantity and quality control checks is the pCO 2 climatology of Takahashi et al. (2009).
To generate the well-known Takahashi climatology, all data had to be normalized to a common non-El Niño year and interpolated in space to fill the entire grid. While these steps were important to generate a robust fully covered map, much of the temporal richness of the data was lost and uncertainty was introduced with the temporal and spatial interpolation approaches. The SOCAT product has many of the same original data as the Takahashi data set, but the objective was not to generate a climatology representing a single year. The SOCAT gridded product can be used to further investigate the seasonal, inter-annual and decadal variability in the data set. It also allows the community to explore alternative ways of making the time and space interpolations required to generate global flux maps.
The grid cell structure was selected to coincide with the World Ocean Atlas and GLODAP gridded products (e.g., Garcia et al., 2010; Key et al., 2004). This will allow direct comparison of synthesis products that can be used to investigate controls on surface CO 2 and drivers of variability. Care must be taken, however, when comparing products from different sources that may represent different temporal and spatial scales.
The grid is also easily adapted for direct comparison with a range of numerical model products. The hope is that these data will provide important initialization and validation fields for a range of carbon cycle models. The use of standardized formatting and consistent approaches for compiling the data and making standardized calculations should make the data more accessible to those that do not work with carbon data every day.

Online access to SOCAT gridded data and services
The gridded fields of Table 1 are available in a variety of formats (e.g., NetCDF) from the SOCAT project page at http://www.socat.info, with guidance on using and citing the data. The project Web page also features an online gridded data viewer based upon the NOAA/PMEL Live Access Server (LAS) (Hankin et al., 2002). The viewer provides custom visualizations of the data such as maps and time series plots, and performs simple analyses such as averages computed over time ranges or spatial areas. A short instructional video is available on the SOCAT project Web page. Figure 1 is an example product of this system: a time average of the 12-month gridded climatology product.

Conclusions
The SOCAT gridded data is the second data product to come from the SOCAT project. The first was a quality controlled data set at the originally collected time and space resolution with 6.3 million f CO 2 observations collected between 1968 and 2007. Recognizing that some groups may have trouble working with millions of measurements, the SOCAT gridded product was generated to provide a robust regularly spaced f CO 2 product with minimal spatial and temporal interpolation, which should be easier to work with for many applications. Gridded SOCAT is rich with information that has not been fully explored yet (e.g., regional differences in the seasonal cycles), but also contains biases and limitations that the user needs to recognize and address (e.g., local influences on values in some coastal regions). Despite this synthesis effort, understanding surface CO 2 variability is still a data-limited problem. More data are being collected every day. Plans are in the works to update the current SOCAT with more data. As future SOCAT data sets become available, the SOCAT gridded product will also be updated. Further automation of data submission and quality control in SOCAT will enable future, prompt SOCAT releases. To keep abreast of the latest developments in SOCAT, please visit www.SOCAT.info.