Global ocean particulate organic carbon flux merged with satellite parameters

Particulate organic carbon (POC) flux estimates derived from POC concentration observations from sediment traps and 234Th are compiled across the global ocean. The compilation includes six time series locations: CARIACO, K2, OSP, BATS, OFP, and HOT. Efficiency of the biological pump of carbon to the deep ocean depends largely on biologically mediated export of carbon from the surface ocean and its remineralization with depth; thus biologically related parameters that can be estimated from satellite observations were merged at the POC observation sites. Satellite parameters include net primary production, percent microplankton, sea surface temperature, photosynthetically active radiation, diffuse attenuation coefficient at 490 nm, euphotic zone depth, and climatological mixed layer depth. Of the observations across the globe, 85 % are concentrated in the Northern Hemisphere with 44 % of the data record overlapping the satellite record. Time series sites accounted for 36 % of the data, while 71 % of the data are measured at ≥ 500 m with the most common deployment depths between 1000 and 1500 m. This data set is valuable for investigations of CO2 drawdown, carbon export, remineralization, and sequestration. The compiled data can be freely accessed at doi:10.1594/PANGAEA.855600.


Introduction
Field estimates of particulate organic carbon (POC) flux have been made over many decades in the interest of understanding the biological pump of carbon to the deep ocean. While there have been a variety of new techniques to quantify POC flux, sediment traps have been the most extensive temporally and geographically, and 234Th has improved data resolution in the upper 500 m of the water column. POC flux depends largely on the biologically mediated export of carbon from the surface ocean and its remineralization with depth; thus capturing biological variables associated with POC flux is essential to understanding flux variability. Here we compile POC flux estimated from sediment traps and 234Th from around the globe, drawn from public repositories and directly from the literature. We then match the POC flux observations with biological and physical parameters determined from satellite imagery along with mixed layer depth (MLD) climatology. See Table 1 for a list of products and units.
Understanding the impact of surface processes on the export of organic carbon at depth has been an ongoing challenge in the oceanographic community since the Joint Global Ocean Flux Study (JGOFS). Continued efforts with the upcoming Export Processes in the Ocean from RemoTe Sensing (EXPORTS) program along with the Pre-Aerosol, Clouds and ocean Ecosystem (PACE) satellite mission seek to connect remotely sensed estimates of net primary production, particle size distribution, phytoplankton carbon, biomass, and community composition to water column carbon processes. To do this, existing data sources capturing water column processes need to be compiled and synthesized. Our data set provides researchers with access to a comprehensive historical data set of POC flux throughout the global
Published by Copernicus Publications.
(Maritorena et al., 2002), diffuse attenuation coefficient at 490 nm (Kd(490)) (O'Reilly et al., 2000), and photosynthetically available radiation (PAR) (Frouin et al., 2002). At the time of writing, only 8 % of the publicly available POC observations were measured beyond 2008, when the MODerate resolution Imaging Spectroradiometer (MODIS) replaced the SeaWiFS record, and thus we focus our data compilation here solely on SeaWiFS. Net primary production (NPP) estimates from the Vertically Generalized Production Model (VGPM) (Behrenfeld and Falkowski, 1997) are obtained from http://www.science.oregonstate.edu/ocean.productivity/ (9 km, 8-day resolution). SeaWiFS data products and NPP are retrieved as the median of a 5 × 5 pixel box (2025 km²) centered on each POC flux location (Bailey and Werdell, 2006). AVHRR Pathfinder Version 5 (4 km, 8-day resolution) sea surface temperature (SST) imagery was acquired from the US National Oceanographic Data Center and GHRSST (http://pathfinder.nodc.noaa.gov) (Casey et al., 2010).
To match the spatial resolution of SeaWiFS as much as possible, SST was retrieved as the median of an 11 × 11 pixel box (1936 km²) centered on each POC flux location. The Mouw and Yoder (2010) approach is used for satellite retrieval of phytoplankton size classes from SeaWiFS imagery (9 km, monthly resolution). The imagery files were obtained from doi:10.1594/PANGAEA.860474. The method estimates the percentage of microplankton (Sfm) from satellite imagery of remote sensing reflectance (Rrs(λ)). This is an absorption-based approach where the chlorophyll-specific absorption spectra for phytoplankton size class extremes, pico- (0.2-2 µm) and microplankton (> 20 µm), are weighted by Sfm (Ciotti et al., 2002; Ciotti and Bricaud, 2006). Briefly, Sfm is estimated from a look-up table containing simulated chlorophyll [Chl], absorption due to dissolved and detrital material at 443 nm (acdm(443)), Rrs(λ), and Sfm. For a given pixel, satellite-estimated [Chl] and acdm(443) (Maritorena et al., 2002) are used to narrow the search space within the look-up table. Of the remaining options, the closest simulated Rrs(λ) to the satellite-observed Rrs(λ) is selected and the associated Sfm is assigned. Sfm is retrieved on a monthly timescale as the median of a 5 × 5 pixel box (2025 km²) centered on each POC flux location. Export depth is often chosen as either the base of the euphotic zone or MLD (Lutz et al., 2007; Lam et al., 2011); thus both are compiled here. The depth of the euphotic zone was determined from Kd(490) (O'Reilly et al., 2000) as 4.6/Kd(490) (Morel and Berthon, 1989) from 8-day SeaWiFS data products.
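The look-up table search described above can be sketched as follows. This is a minimal illustration, not the Mouw and Yoder (2010) implementation: the table layout, the function name `retrieve_sfm`, the relative tolerances used to narrow the search space, and the Euclidean distance used to compare spectra are all assumptions made for the example, and the toy table values are invented.

```python
import math

def retrieve_sfm(lut, chl, acdm443, rrs_obs, chl_tol=0.1, acdm_tol=0.1):
    """Return the Sfm of the LUT entry whose simulated Rrs best matches rrs_obs.

    lut: list of dicts with keys 'chl', 'acdm443', 'rrs' (list of floats), 'sfm'.
    chl_tol, acdm_tol: relative tolerances (hypothetical values) used to
    narrow the search space before spectra are compared.
    """
    # Step 1: keep only entries close to the satellite-estimated [Chl] and acdm(443).
    candidates = [
        row for row in lut
        if abs(row["chl"] - chl) <= chl_tol * chl
        and abs(row["acdm443"] - acdm443) <= acdm_tol * acdm443
    ]
    if not candidates:
        return None  # no entry in the narrowed search space
    # Step 2: pick the closest simulated spectrum (Euclidean distance is an assumption).
    best = min(candidates, key=lambda row: math.dist(row["rrs"], rrs_obs))
    return best["sfm"]

# Toy table with invented values, for illustration only.
toy_lut = [
    {"chl": 0.5, "acdm443": 0.02, "rrs": [0.006, 0.004, 0.002], "sfm": 0.1},
    {"chl": 0.5, "acdm443": 0.02, "rrs": [0.005, 0.004, 0.003], "sfm": 0.6},
    {"chl": 5.0, "acdm443": 0.10, "rrs": [0.002, 0.003, 0.004], "sfm": 0.9},
]
sfm = retrieve_sfm(toy_lut, chl=0.52, acdm443=0.021,
                   rrs_obs=[0.0051, 0.0040, 0.0029])
```

Narrowing on [Chl] and acdm(443) first keeps the spectral comparison restricted to entries that are physically plausible for the pixel; how wide that window should be is not specified here and is left as a parameter.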
MLD estimates are obtained from the IFREMER/LOS Mixed Layer Depth Climatology group (http://www.ifremer.fr/cerweb/deboyer/mld) from density profiles using a variable density threshold equivalent to 0.2 °C, which accounts for both changes in temperature and salinity (level 3, monthly climatology, 1° resolution; de Boyer Montégut et al., 2004; Mignot et al., 2007). We retrieve monthly MLD climatology for each pixel containing a POC flux location (1° resolution).
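The pixel-box matchups and the euphotic depth conversion can be sketched in a few lines. This is a minimal illustration under stated assumptions: the image is a plain nested list, missing pixels (for example cloud-covered) are marked with `None`, the box is clipped at image edges, and all function names are hypothetical rather than taken from the processing code behind the data set.

```python
from statistics import median

def box_median(grid, row, col, half_width):
    """Median of valid pixels in a (2*half_width + 1)^2 box centered on (row, col).

    half_width=2 gives a 5 x 5 box, half_width=5 an 11 x 11 box, and
    half_width=0 the single pixel containing the location.
    Missing pixels are assumed to be stored as None and are skipped.
    """
    values = []
    for r in range(row - half_width, row + half_width + 1):
        for c in range(col - half_width, col + half_width + 1):
            # Clip the box at the image edge (an assumption of this sketch).
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                if grid[r][c] is not None:
                    values.append(grid[r][c])
    return median(values) if values else None

def euphotic_depth(kd490):
    """Euphotic zone depth (m) as 4.6 / Kd(490), with Kd(490) in m^-1."""
    return 4.6 / kd490

def box_area_km2(n_pixels, resolution_km):
    """Area covered by an n x n pixel box at the given pixel resolution."""
    return (n_pixels * resolution_km) ** 2

# Box areas quoted in the text: 5 x 5 at 9 km and 11 x 11 at 4 km.
seawifs_area = box_area_km2(5, 9)   # 2025 km^2
sst_area = box_area_km2(11, 4)      # 1936 km^2
```

The 11 × 11 box at 4 km spans 44 km per side, close to the 45 km side of the 5 × 5 box at 9 km, which is why the two footprints are comparable.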

POC flux data
POC sediment trap data are acquired from public repositories and published literature (Table 2; Fig. 1). Estimates from 234Th measurements are also acquired to improve the resolution of observations in the upper 500 m of the water column (Dunne et al., 2005; Henson et al., 2012; Guidi et al., 2015). These represent 4 % of the total data set. Collected field estimates of POC flux derived from 234Th maintain the original authors' analysis, where POC flux is retrieved based on 234Th activity in the water column accounting for the ratio of POC to 234Th concentration (Buesseler and Boyd, 2009). Both sediment trap and 234Th methodologies have documented challenges associated with accurately retrieving POC flux and characterizing uncertainty. Sediment traps have possible biases associated with the interaction of hydrodynamics with trap design, the capture of zooplankton ("swimmers"), and incomplete preservation of material. 234Th-based measurements have biases associated with accounting for local advection, quantifying particulate adsorption, and variability in the POC : 234Th ratio. See the discussions of Buesseler (1991), Buesseler et al. (2000), Lee et al. (1992), Murray et al. (1996), Quay (1997), and van der Loeff et al. (2006) for in-depth analyses of these issues. A significant number of studies occurred prior to the launch of SeaWiFS in September 1997 (see Honjo et al., 2008, and references therein). While we compiled observations across all available time frames, greater focus is placed on collecting data concurrent with the satellite record to allow corresponding imagery-based environmental parameters to be matched. Overall, the data set comprises a total of 15 792 individual measurements at 673 unique locations with 6842 (43 %) collected during the satellite record.
In the interest of matching the timescale of POC flux to satellite-derived products to the greatest degree possible, we focused on collecting short-term sediment trap deployments with individual cup intervals of 30 days or less. The majority of the data set (14 555 measurements or 92 %) fell into this category with a median cup interval of 14 days and a standard deviation of 6 days. Data are skewed towards shorter deployments with 59 % of qualified measurements deployed 14 days or less and 93 % deployed 20 days or less.

Fluxes of other constituents, uncertainty estimates, and metadata
Where readily available, we collect concurrent flux estimates of other organic and inorganic components in addition to POC flux including particulate inorganic carbon, particulate nitrogen and phosphorus, calcium carbonate, biogenic silica, trace metals, and phytoplankton pigments (Table 1). These data are included to explore relationships between POC export and remineralization and ballasting materials. Where reported by the original authors, we include uncertainty estimates for measured fluxes in the compilation. We also collect and include metadata as reported by the original authors. At a minimum, we require each observation be associated with latitude and longitude, deployment date, and depth to be included in the data set. Other information, such as sediment trap type and trap funnel area, is included where available. The majority of measurements (58 %) were not associated with a reported total water depth. Bathymetry was retrieved for POC flux locations from the ETOPO1 1 arcmin Global Relief Model (Amante and Eakins, 2009) from the single pixel containing the measurement location. Locations close to shore were sometimes classified as being on land by ETOPO1; bathymetry is excluded in these cases.

Results
The deployment, retrieval, and analysis of sediment trap and 234Th samples represent a significant expenditure of both effort and resources, and projects are often funded on a short-term local/regional basis (Honjo et al., 2008). This is reflected in the patchy distribution of observations across the globe in multiple dimensions: space, time, and vertical resolution (Fig. 1). Collection efforts are more prevalent in the Northern Hemisphere, with 63 % of unique station locations comprising 85 % of total observations falling north of the Equator (Fig. 2a and b). Long-term oceanographic time series locations at BATS/OFP, CARIACO, K2, OSP, and HOT (all in the Northern Hemisphere) collectively account for 36 % of the total data set. If time series locations are removed, 77 % of remaining observations still concentrate north of the Equator. The most sampled regions in the Northern Hemisphere are at midlatitudes, with a quarter of the data set (discounting time series locations) falling between 30 and 40° N (Fig. 2b). In the Southern Hemisphere, data are concentrated at higher latitudes, with a little over half of collected measurements derived from the Southern Ocean at ≥ 60° S. In both hemispheres, the second-most sampled latitudes are near the Equator (10° N-10° S). The data set spans 4 decades from 1976 to 2012 with the majority of efforts (62 %) deployed between 1990 and 2000 (Fig. 2, Table 2). In addition, 43 % of the measurements were collected after September 1997, when the SeaWiFS mission was launched. Prior to SeaWiFS, 79 % of observations are in the Northern Hemisphere (Fig. 2c). After September 1997, the latitudinal distribution becomes even more skewed with 93 % of the observations in the Northern Hemisphere concurrent with the satellite record (Fig. 2d).
While 43 % of the data were observed during the continuous satellite era, not all observations had coincident imagery. Here we define coincident as retrieved satellite observations within the same month as sediment trap deployment or 234Th measurement for a given POC flux location. We consider only the Sfm and NPP imagery for this purpose as they are representative of phytoplankton surface processes and the NPP product already requires SST and [Chl] imagery as inputs. This reduces the total satellite era observations from 6842 to 3722, a drop in total contribution from 43 to 24 %. These are spread over 121 unique locations (Fig. 3). Of the coincident observations, 95 % are in the Northern Hemisphere primarily between 10 and 50° N, with the majority found between 30 and 40° N (Fig. 2e). Data sets in some regions of the ocean (e.g., the equatorial Pacific and the Arabian Sea in Fig. 1) have no satellite overlap (Fig. 3).
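The monthly coincidence criterion can be sketched as below. This is a minimal illustration, assuming that an observation counts as coincident only when both Sfm and NPP have a valid retrieval in the calendar month of deployment (the text does not state whether both or either product is required) and that the deployment start date alone determines the month. The function and variable names are hypothetical.

```python
from datetime import date

def is_coincident(deploy_date, valid_months_by_product):
    """True if every listed product has a valid retrieval in the deployment month.

    valid_months_by_product: dict mapping a product name to the set of
    (year, month) tuples with a valid retrieval at the POC flux location.
    Requiring all products is an assumption of this sketch.
    """
    key = (deploy_date.year, deploy_date.month)
    return all(key in months for months in valid_months_by_product.values())

# Hypothetical retrieval record for one location.
retrievals = {
    "sfm": {(1998, 5), (1998, 6)},
    "npp": {(1998, 6), (1998, 7)},
}
june_ok = is_coincident(date(1998, 6, 12), retrievals)  # both products present
may_ok = is_coincident(date(1998, 5, 20), retrievals)   # NPP missing in May

# Share of the full data set, from the counts reported in the text.
satellite_era_pct = round(100 * 6842 / 15792)  # 43
coincident_pct = round(100 * 3722 / 15792)     # 24
```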
The depth resolution of the observations is important for investigators interested in fitting export flux relationships (Martin et al., 1987; Lima et al., 2014). The greatest variability in POC flux is found in the first 500 m of the water column (Lam et al., 2011; Fig. 4). Considering all POC observations together, median POC flux rapidly diminishes from 160 mg C m⁻² d⁻¹ in the upper 100 m to 30 mg C m⁻² d⁻¹ at 500 m and 6 mg C m⁻² d⁻¹ at 1000 m. Below 1000 m, the average POC flux is 3 mg C m⁻² d⁻¹ (Fig. 4).
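As an example of the kind of export flux relationship such depth-resolved data support, the sketch below fits the power-law attenuation form commonly attributed to Martin et al. (1987), F(z) = F(z0) (z/z0)^(-b), by ordinary least squares in log-log space. This is an illustration only: the reference depth z0 = 100 m, the unweighted fit, and the function name are assumptions, and the data used here are synthetic rather than taken from the compilation.

```python
import math

def fit_martin_b(depths, fluxes, z0=100.0):
    """Fit log(F) = log(F0) - b * log(z / z0) by least squares.

    depths in m, fluxes in any consistent unit (e.g. mg C m^-2 d^-1).
    Returns (F0, b), where F0 is the fitted flux at the reference depth z0.
    """
    x = [math.log(z / z0) for z in depths]
    y = [math.log(f) for f in fluxes]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic profile generated with a known exponent, to check the fit.
true_f0, true_b = 100.0, 0.858
depths = [100.0, 200.0, 500.0, 1000.0, 2000.0]
fluxes = [true_f0 * (z / 100.0) ** (-true_b) for z in depths]
f0_fit, b_fit = fit_martin_b(depths, fluxes)
```

Applied to real profiles, the fitted exponent depends strongly on how many observations fall in the upper 500 m, where most of the attenuation occurs, which is one reason the depth distribution of the compilation matters.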
Overall, 70 % of the compiled data set is measured at ≥ 500 m (Fig. 5). Thus, the upper water column close to the depth of export is relatively underrepresented. To increase depth resolution, we consider 234Th and sediment traps together (Dunne et al., 2005; Guidi et al., 2015). Guidi et al. (2015) also merged data from the underwater vision profiler (UVP), which is not included in this compilation as it has not yet been released into a public archive. Shallow observations are critical for capturing the impact of phytoplankton on POC export flux as these data are most connected to surface processes. By adding 234Th measurements to the data set, 249 locations gain depths in the upper water column < 500 m. 234Th data contribute 32 % of all POC flux estimates resolved at depths between 100 and 200 m (Fig. 5a). Overall, the most common deployment depths are between 1000 and 1500 m (14 %) followed by 200 to 300 m (11 %) and then 3000 to 3500 m (9 %) (Fig. 5b). The dominance of the 1000 to 1500 m observation depth is weighted to the presatellite era (Fig. 5c). During the satellite era, 200 to 300 m (6 %) became the most sampled depth, largely due to persistent time series observations at BATS and OSP, followed closely by the 1000 to 1500 and 3000 to 3500 m bins (5 % each), again the result of time series observations at CARIACO and OFP (Fig. 5d). Reasonable depth resolution is found in the observations coincident with satellite matchups (Fig. 5d).

Conclusions
This data set is the most comprehensive compilation of POC flux across the globe that we are aware of. By providing merged coincident satellite imagery products, the data set can immediately be used to link phytoplankton surface processes with POC flux. Due to rapid remineralization within the first 500 m of the water column, shallow observations from 234Th are helpful to supplement the more extensive sediment trap record. The spatial and depth distribution of the compilation also offers guidance for decisions on future POC flux observing investments.

Data availability
The data set contains 15 792 individual POC flux estimates at 674 unique locations collected between 1976 and 2012. Where available, the flux of other minerals is also reported. 43 % (6842) of POC flux measurements overlap with the SeaWiFS satellite record (September 1997 to December 2010). Satellite parameters in this compilation include chlorophyll concentration, net primary production, sea surface temperature, diffuse attenuation coefficient, euphotic depth, photosynthetically active radiation, and microplankton fraction. Estimated mixed layer depths and bathymetry are also provided. Parameters associated with observation sites are extracted as the median of a 5 × 5 (chlorophyll concentration, NPP, Kd(490), PAR, and Sfm), 11 × 11 (SST), or 1 × 1 (MLD, bathymetry) pixel box. The compiled data are available on PANGAEA (https://www.pangaea.de/): doi:10.1594/PANGAEA.855600 (Mouw et al., 2016).