Global marine plankton functional type biomass distributions: coccolithophores

Abstract. Coccolithophores are calcifying marine phytoplankton of the class Prymnesiophyceae. They are considered to play an important role in the global carbon cycle through the production and export of organic carbon and calcite. We have compiled observations of global coccolithophore abundance from several existing databases as well as individual contributions of published and unpublished datasets. We make conservative estimates of carbon biomass using standardised conversion methods and provide estimates of uncertainty associated with these values. The quality-controlled database contains 57 321 individual observations at various taxonomic levels. This corresponds to 11 503 observations of total coccolithophore abundance and biomass. The data span a time period of 1929–2008, with observations from all ocean basins and all seasons, and at depths ranging from the surface to 500 m. Highest biomass values are reported in the North Atlantic, with a maximum of 127.2 μg C L−1. Lower values are reported for the Pacific (maximum of 20.0 μg C L−1) and Indian Ocean (up to 45.2 μg C L−1). Maximum biomass values show peaks around 60° N and between 40 and 20° S, with declines towards both the equator and the poles. Biomass estimates between the equator and 40° N are below 5 μg C L−1. Biomass values show a clear seasonal cycle in the Northern Hemisphere, reaching a maximum in the summer months (June–July). In the Southern Hemisphere the seasonal cycle is less evident, possibly due to a greater proportion of low-latitude data. The original and gridded datasets can be downloaded from Pangaea (doi:10.1594/PANGAEA.785092).


Introduction
Marine plankton are the main drivers of the global marine cycling of elements such as carbon, nitrogen and phosphorus, primarily through the process of carbon fixation and nutrient uptake during primary production and subsequent export of organic matter to the deep ocean. Modern marine ecosystem models seek to represent the functional diversity of marine plankton using the concept of plankton functional types (PFTs; Iglesias-Rodríguez, 2002; Le Quéré et al., 2005). PFTs are groups of plankton with defined biogeochemical functions, for example calcification, DMS production or nitrogen fixation. The inclusion of these groups in marine ecosystem models provides great potential for improving our understanding of marine processes (see for example Dutkiewicz et al., 2012; Marinov et al., 2010; Vogt et al., 2010; Manizza et al., 2010), but has also highlighted a need for extensive observational datasets for model parameterisation and validation (Hood et al., 2006; Le Quéré et al., 2005; Anderson, 2005).
The MARine Ecosystem DATa (MAREDAT) project (as part of the MARine Ecosystem Model Intercomparison Project, MAREMIP) seeks to compile global biomass data for PFTs commonly represented in marine ecosystem models: silicifiers, calcifiers (including coccolithophores, pteropods and foraminifera), DMS-producers, pico-phytoplankton, diazotrophs, bacteria, and three zooplankton size classes (micro-, meso- and macrozooplankton). A summary of the findings for all groups is presented in Buitenhuis et al. (2013).
This paper presents a database of global coccolithophore biomass distributions compiled as part of the MAREDAT effort. The coccolithophores are a globally occurring group of calcifying phytoplankton of the class Prymnesiophyceae (Jordan et al., 2004; Winter and Siesser, 1994; Thierstein and Young, 2004). They are thought to play an important role in the global carbon cycle due to their contribution to primary production and export as well as through calcite production (Iglesias-Rodríguez, 2002; Hay, 2004; Jin et al., 2006), with blooms of over 100 000 km² observed in some ocean regions (Brown and Yoder, 1994; Holligan et al., 1993). The coccolithophores have received considerable attention in recent years due to their potential sensitivity to climate change and particularly ocean acidification (Doney et al., 2009). The decrease in carbonate saturation state in the oceans caused by rising atmospheric CO2 is generally expected to have negative effects on calcifying marine organisms due to the increasing energetic cost of calcification (Hofmann et al., 2010). There have, however, been mixed results from experimental and field studies of coccolithophores, with some showing a negative effect of ocean acidification (e.g. Beaufort et al., 2011; Riebesell and Zondervan, 2000) whereas others show no change or even increased calcification and production (Langer et al., 2006; Iglesias-Rodríguez et al., 2008). Changes in ocean temperature, stratification and nutrient supply are also expected to affect coccolithophore distributions, although again the direction of this change is unclear (Hood et al., 2006; Iglesias-Rodríguez, 2002). Given these uncertainties, it is more important than ever to understand the current distribution of coccolithophores in the global oceans.
Remote sensing approaches are frequently used to study the distribution of coccolithophore blooms (e.g. Smyth, 2004; Brown and Yoder, 1994; Iglesias-Rodríguez, 2002; Hirata et al., 2011). The reflective properties of the calcite-based coccoliths allow blooms to be observed in satellite images (Holligan et al., 1983), providing great potential for improving our understanding of coccolithophore distributions on a global scale. There are, however, several limitations to this approach. Firstly, satellite images pick up the optical properties of the calcite-based coccoliths themselves and do not distinguish between living cells and detached coccoliths (Tyrell and Merico, 2004). Secondly, satellite data are limited to waters within the optical depth of the satellite and provide no information as to the vertical structure of cells within the water column or cells occurring below this depth. Finally, more detailed taxonomic information cannot yet be obtained from satellite images. There is, therefore, a continuing need for in situ observations of coccolithophores in order to better understand their distribution, ecology and contribution to global plankton biomass.
This database compiles existing published and unpublished coccolithophore abundance data and provides standardised biomass estimates using species-specific conversion factors. We also provide a detailed discussion of our conversion methods and quality control procedures and discuss the uncertainties associated with the biomass values. Although this dataset was born from the needs of the modelling community, we anticipate that it will be of use to scientists from a range of fields including biological oceanography, marine ecology, biogeochemistry and remote sensing.

Origin of data
Our data consist of abundance measurements obtained from several existing databases (NMFS-COPEPOD, BODC, OBIS, OCB DMO, Pangaea, WOD09, OOV), as well as published and unpublished data from a number of contributing authors (P. Ajani datasets, sorted in temporal order. The database contains 58 384 data points when all counts of individual taxa are considered separately, which equates to 11 503 samples of total coccolithophore abundance collected from 6741 depth-resolved stations. Abundance data were standardised to units of cells per litre, and ancillary data such as temperature, salinity, chlorophyll and nutrients were retained where available.

Biomass conversion
To convert the abundance data (cell counts per unit volume) to biomass estimates (expressed as the concentration of organic carbon per unit water volume), we first multiplied the abundance data by the average biovolume for each species, and then multiplied the resulting biovolume concentration by the average organic carbon content per biovolume.
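The two multiplications described above can be sketched as follows. This is a minimal illustration rather than the code used to build the database: the function name and the example numbers are hypothetical, and a constant carbon content per unit biovolume is assumed purely to keep the unit bookkeeping visible.

```python
def biomass_ug_c_per_litre(cells_per_litre, biovolume_um3_per_cell, carbon_pg_per_um3):
    """Organic carbon concentration (ug C per litre) for one taxon in one sample.

    All three arguments are illustrative inputs; none of the example values
    used below is taken from the database.
    """
    # cells L-1 x um3 cell-1 -> biovolume concentration in um3 L-1
    biovolume_concentration = cells_per_litre * biovolume_um3_per_cell
    # um3 L-1 x pg C um-3 -> pg C L-1; 1 ug = 1e6 pg
    return biovolume_concentration * carbon_pg_per_um3 / 1e6


# Hypothetical sample: 1e6 cells per litre, 100 um3 per cell, 0.1 pg C per um3.
example = biomass_ug_c_per_litre(1.0e6, 100.0, 0.1)
```

Total coccolithophore biomass for a sample would then be the sum of such values over all taxa counted in that sample.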
We determined cell biovolumes for each of the taxonomic groups reported in the database based on an extensive literature survey. Coccolithophore taxonomy has been subject to numerous revisions over the time span of the dataset, making it challenging to match historical data to current species names and descriptions. For consistency, data entries were matched to currently accepted species names following the taxonomic scheme of Jordan et al. (2004) wherever possible. Where full taxonomic information was not provided, data were matched to the lowest taxonomic group possible. Data that could not be assigned to a particular taxonomic group were categorised as unidentified coccolithophores. We identified a total of 195 taxonomic groups for this dataset (Table A3), ranging from identifications at the sub-species to the family level. Morphotype information is reported for Emiliania huxleyi in only one dataset, and we have therefore chosen to use a single biomass conversion factor for all occurrences of this species. Additionally, 2258 samples consisted of combined counts of coccolithophores without further size or taxonomic information, and 1988 samples contained at least some counts of unidentified or partially identified coccolithophores. For our biomass conversions, we began by converting only cell counts for which full species or sub-species identifications were provided. Each species or sub-species was assigned an idealised shape (e.g. sphere, prolate spheroid, cone) based on the work of Hillebrand et al. (1999) and Sun (2003) as well as species descriptions in the literature. We then estimated cell dimensions (e.g. diameter, length, width) for each taxonomic group in order to calculate cell biovolumes (units: µm³).
Cytoplasm dimensions have been published for very few coccolithophore species, with species descriptions usually providing the more easily observed coccosphere dimensions only. Observations of 16 species of coccolithophore from laboratory and field studies show cytoplasm diameter varying from 30 to 90 % of the total coccosphere diameter, depending on the species and level of calcification (Table 2); naked coccolithophores have also been observed for some species, although they are relatively rare in field samples (Frada et al., 2012). While these 16 species represent only a small fraction (10 %) of the species represented in the database, they include some of the more dominant coccolithophores in terms of both abundance and frequency of observation: these 16 species together account for an average of 75 ± 32 % of coccolithophore abundance per sample (median = 92 %), and we therefore consider them to be reasonably representative for the purposes of estimating coccolithophore biomass.
Given the lack of data and the lack of consistency among the few available cytoplasm measurements, we chose to estimate coccolithophore biovolumes by assuming cytoplasm dimensions to be 60 % of the mean coccosphere dimensions for all species. This value represents the midpoint of the observed ratios of cytoplasm to total coccosphere diameter. These calculations can be expected to overestimate organic biomass for species with a higher ratio of coccosphere to cytoplasm volume, and underestimate biomass for species with a lower ratio. Biovolumes are calculated based on the mid-point of coccosphere dimensions. Uncertainty ranges are provided using biovolumes and biomasses calculated from 0.6 × minimum coccosphere dimensions and 0.6 × maximum coccosphere dimensions.
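For a species idealised as a sphere, the scaling described above can be illustrated with a short sketch. The function names and the 5–10 µm example diameter range are hypothetical; only the factor of 0.6 and the use of minimum, mid-point and maximum coccosphere dimensions come from the text.

```python
import math

CYTOPLASM_FRACTION = 0.6  # cytoplasm dimension as a fraction of coccosphere dimension (from the text)


def sphere_volume(diameter_um):
    """Volume of a sphere in um3 from its diameter in um."""
    return math.pi / 6.0 * diameter_um ** 3


def biovolume_range(min_diameter_um, max_diameter_um, fraction=CYTOPLASM_FRACTION):
    """Return (minimum, mean, maximum) cytoplasm biovolume in um3.

    The "mean" value uses the mid-point of the coccosphere diameter range;
    the minimum and maximum use the ends of that range.
    """
    mid_diameter_um = 0.5 * (min_diameter_um + max_diameter_um)
    diameters = (min_diameter_um, mid_diameter_um, max_diameter_um)
    return tuple(sphere_volume(fraction * d) for d in diameters)


# Hypothetical species with coccosphere diameters of 5-10 um.
v_min, v_mean, v_max = biovolume_range(5.0, 10.0)
```

Because volume scales with the cube of the diameter, a factor of two between minimum and maximum diameter becomes a factor of eight between minimum and maximum biovolume, which is why the uncertainty ranges on biomass are wide.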
The range of coccosphere dimensions (e.g. diameter, length, width) for each species or sub-species in the database was determined based on a literature survey (Table A3). For some datapoints, coccosphere dimensions were provided alongside abundance data. In these cases the provided measurements were used in preference to our literature-based values. Biovolume estimates were then further converted to carbon biomass (units: µg C L−1) using the prymnesiophyte-specific conversion factor developed by Menden-Deuer and Lessard (2000). Biovolume and biomass values based on the mid-point are hereafter referred to as "mean" biovolume and biomass. We assess the likely over- or underestimation of our mean biomass estimates for different species of coccolithophore through a comparison with direct biomass measurements as well as biomass values calculated from measured cytoplasm dimensions for 16 species (Table 2).
For 23 species only a single set of dimensions or a single biovolume value was reported in the literature. In these cases, we have assumed the reported values to be the mean estimates. Minimum and maximum biovolume values were estimated for these species based on the ratios of minimum and maximum biovolume to mean biovolume observed for all other species in the database. These ratios were found to be 0.5 (± standard deviation of 0.2) for minimum biovolume/mean biovolume, and 2.1 (±0.8) for maximum biovolume/mean biovolume. For cell counts with identifications only to the level of genus or family, or for combined counts of multiple species, we calculate minimum and maximum biomass values per cell based on the absolute minimum and maximum of all species reported for that taxonomic group.
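The ratio-based range for species with a single reported value can be written as a small sketch. The function name and the example input are hypothetical; the two ratios are the mean values quoted above, and their standard deviations are not propagated here.

```python
MIN_TO_MEAN = 0.5  # minimum biovolume / mean biovolume (reported standard deviation 0.2)
MAX_TO_MEAN = 2.1  # maximum biovolume / mean biovolume (reported standard deviation 0.8)


def biovolume_range_from_single_value(mean_biovolume_um3):
    """Return (minimum, mean, maximum) biovolume in um3 for a species
    with only one reported value, which is treated as the mean estimate."""
    return (
        MIN_TO_MEAN * mean_biovolume_um3,
        mean_biovolume_um3,
        MAX_TO_MEAN * mean_biovolume_um3,
    )


# Hypothetical species with a single reported biovolume of 100 um3.
low, mid, high = biovolume_range_from_single_value(100.0)
```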
Mean biomass values per cell were calculated by taking the mean of all reported biomass values for species within the taxonomic group. Taking the mean of the biomass values avoided weighting mean biomass values towards a single large species. For some genera, however, insufficient species-level data were available to calculate biomass using this approach. In these cases we were able to obtain a range of coccosphere dimensions from the literature, and calculated biovolumes and biomasses based on the mid-point of these values as detailed above for the species-specific cell counts. For cell counts of unidentified coccolithophores, we have chosen to use a spherical coccosphere with diameter of 10 µm (cell diameter of 6 µm) to calculate our mean biovolume and biomass estimates. This value was selected based on the diameters of species most commonly occurring in the database. The large uncertainty associated with this value is taken into account by providing minimum and maximum biovolume and biomass estimates based on the absolute minimum and maximum values across all species in the database. Following the biomass conversions, data were compiled to total coccolithophore biomass per sample for the purposes of further analyses. Further taxonomic information is reported in the attached dataset (doi:10.1594/PANGAEA.785092) and coccolithophore biodiversity patterns will be discussed in O'Brien et al. (2013).
Earth Syst. Sci. Data, 5, 259-276, 2013, www.earth-syst-sci-data.net/5/259/2013/
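The group-level rule described in this section for counts identified only to genus or family level (absolute minimum, mean of species means, absolute maximum) can be sketched as below. The function name, the data layout and the example values are hypothetical.

```python
from statistics import mean


def group_biomass_range(species_ranges):
    """Combine per-cell biomass ranges of the species in one genus or family.

    species_ranges is a list of (minimum, mean, maximum) tuples, one per species.
    Returns the absolute minimum, the unweighted mean of the species means,
    and the absolute maximum.
    """
    minima, means, maxima = zip(*species_ranges)
    return min(minima), mean(means), max(maxima)


# Two hypothetical species in one genus (per-cell biomass in pg C).
genus_range = group_biomass_range([(1.0, 2.0, 4.0), (5.0, 10.0, 30.0)])
```

In this sketch each species contributes equally to the group mean, regardless of how often it occurs in the database.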

Quality control
Our quality control procedure flagged data based on a number of criteria, with flag values (1-4) provided in the data table. Flag 1 was applied to 33 samples that included observations of the species Thoracosphaera heimii. This species was originally thought to be a coccolithophore, but further investigations have shown it to be a calcified dinoflagellate cyst (Tangen et al., 1982). Flag 2 was applied to 205 samples for which only biomass values were provided, without corresponding cell counts. Flag 3 was applied to 482 samples with integrated water column values rather than discrete depth measurements, or to samples for which no depth information was provided. Flag 4 was assigned to outliers identified by the statistical analyses outlined below.
For the next stage of the quality control process, we removed samples with flags 2 and 3 and corrected samples with flag 1 to remove counts of T. heimii. For the remaining 9194 non-zero samples, we used Chauvenet's criterion to identify statistical outliers in the log-normalized biomass data (Buitenhuis et al., 2013; Glover et al., 2011). Based on this analysis, we identified one sample with a biomass value with probability of deviation from the mean greater than 1/2n, with n = 8997 being the number of non-zero samples (two-sided z score: |zc| = 4.03). This sample is denoted by a flag value of 4.
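A minimal sketch of Chauvenet's criterion on log-transformed values is given below, using only the Python standard library. The function names are hypothetical, and the use of the sample standard deviation and base-10 logarithms are assumptions, since the text does not specify them. The critical value is found by requiring the two-sided tail probability to equal 1/(2n), so that each tail holds 1/(4n).

```python
import math
from statistics import NormalDist, mean, stdev


def chauvenet_critical_z(n):
    """Two-sided critical z score for which P(|Z| > z) = 1 / (2n)."""
    # Each tail of the standard normal holds half of 1/(2n), i.e. 1/(4n).
    return -NormalDist().inv_cdf(1.0 / (4.0 * n))


def chauvenet_outliers(values):
    """Return the values whose log10 z score exceeds the critical value.

    values must be strictly positive (non-zero biomass) and not all equal.
    """
    logs = [math.log10(v) for v in values]
    mu, sigma = mean(logs), stdev(logs)  # sample standard deviation (assumption)
    z_crit = chauvenet_critical_z(len(logs))
    return [v for v, x in zip(values, logs) if abs(x - mu) / sigma > z_crit]


# With n = 8997 the critical value is close to the 4.03 quoted in the text.
z_for_database = chauvenet_critical_z(8997)
```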
An additional flag column denotes the quantification method used for determining coccolithophore abundance. Of the 9193 non-zero samples included in the database, 4209 are known to have been analysed using light microscopy, 500 using scanning electron microscopy (SEM) and 197 with flow cytometry. For the remaining 4287 the method is unknown. Coccolithophore counts from SEM are consistently higher than those obtained using light microscopy due to the better identification of smaller and more fragile species. For example, Bollmann et al. (2002) found that species such as syracosphaerids, small reticulofenestrids, small gephyrocapsids and holococcolithophores are likely to be missed in light microscopy analyses. Cell density has been shown to differ up to 23 % between the two methods when analysing samples with large numbers of small species such as E. huxleyi, Gephyrocapsa ericsonii and G. protohuxleyi.
Figure 1. Boxplots depicting distributions of non-zero biomass estimates for different quantification methods: light microscopy (LM), scanning electron microscopy (SEM), unknown method and flow cytometry (FC). Horizontal lines depict the median, boxes depict the interquartile range (25th to 75th percentiles) and points marked beyond the whiskers of the plot are outliers (points falling greater than 1.5 times the interquartile range below the 25th percentile or above the 75th percentile).
We have made a statistical comparison of abundance and biomass values to determine whether a systematic bias can be associated with the enumeration method for samples in our database (Table 3, Fig. 1). Our comparison of coccolithophore abundance and biomass shows greater differences between methods than would be expected from previous comparisons of enumeration methods, but we suggest that these differences are likely to be at least partially explained by real differences in coccolithophore abundance and community composition. For example, we expect that SEM is more likely to be used for samples with a known portion of small coccolithophores which are difficult to identify or enumerate using light microscopy alone. Although median biomass from SEM studies is higher by a factor of four than the median for light microscopy studies, the highest values reported in the dataset are from light microscopy studies. Since the quantification method is unknown for nearly 50 % of samples, we have chosen to retain SEM data in the gridded dataset and all analyses, though users may access a subset of these data from the raw file. In contrast, we have excluded 199 datapoints collected using flow cytometry from the gridded dataset. These values are significantly higher again than those collected using either SEM or light microscopy.
Based on our full quality control procedure we removed a total of 888 flagged samples for the purposes of our analyses, and a further 32 samples were corrected to remove the contribution of T. heimii to total coccolithophore biomass (note: one sample contained data for T. heimii only). All data are included in the published raw dataset in the event that a user has different requirements for the quality control procedure, while the gridded dataset contains the unflagged datapoints only.
An additional column in the raw dataset denotes the taxonomic level to which coccolithophores are identified, as this has a major influence on the level of uncertainty associated with our biomass calculations. Coccolithophores identified to species level are denoted by the flag value 0, those identified to genus or family level as flag value 1, and unidentified coccolithophores as flag value 3. If coccosphere dimensions are known, cells identified to genus or family level receive flag value 2, and unidentified coccolithophores receive flag value 4. All samples of unidentified or partially identified coccolithophores have been included in our analyses and in the gridded file.
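The taxonomic-level flag described above can be summarised as a small lookup. This is an illustrative sketch only: the function name and the string labels for the identification level are hypothetical and do not correspond to column names in the published data file.

```python
def taxonomic_flag(level, dimensions_known=False):
    """Return the taxonomic-level flag (0-4) described in the text.

    level is one of "species", "genus", "family" or "unidentified"
    (hypothetical labels); dimensions_known states whether coccosphere
    dimensions were reported with the count.
    """
    if level == "species":
        return 0
    if level in ("genus", "family"):
        return 2 if dimensions_known else 1
    if level == "unidentified":
        return 4 if dimensions_known else 3
    raise ValueError(f"unknown identification level: {level!r}")


# A genus-level count without reported coccosphere dimensions.
flag = taxonomic_flag("genus")
```

Higher flag values do not imply higher uncertainty in a simple ordered way: flags 2 and 4 mark counts whose size is constrained by measured dimensions, whereas flags 1 and 3 rely on literature-based size ranges.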
Several datasets report biomass values in addition to abundance data. While we have chosen to use our own conversion methods for consistency, it is likely that the original biomass values are based on more accurate estimates of cell size. All original biomass values are included in the submitted database and can be substituted for our estimates if desired.

Results
Excluding flagged data, the database contains coccolithophore biomass observations for 11 503 samples, collected from 6741 depth-resolved stations (Fig. 2). Highest coccolithophore abundance is 9.8 × 10⁶ cells L−1. 2507, or 21.8 % of samples, were found to be zero values. These data were retained in the dataset, since confirmed zero values hold valuable information for the study of plankton distributions. There is, however, inconsistency in the reporting of zero values in plankton datasets: often abundance data are reported only for a limited range of target groups that are expected to be present. There is also likely to be a bias due to sampling focusing on areas where coccolithophores are expected to occur. Values reported in the subsequent sections are therefore calculated based on non-zero data only. Where zero datapoints are included, this value follows in parentheses.
Arithmetic mean values are reported plus or minus one standard deviation. We also provide median biomass values, as these are less influenced by high values and provide a better representation of the central tendency of the data.

Spatial and temporal coverage
The database includes non-zero coccolithophore observations from the surface to a depth of 500 m (Fig. 3b, (Table 4). 31.6 % of non-zero data are from the Atlantic Ocean, 40.2 % from the Pacific Ocean and 10.4 % from the Indian Ocean. Despite the high number of observations reported from the Pacific compared to the Atlantic, the spatial coverage of this ocean basin is relatively poor, with many observations limited to intensively studied regions in Peruvian and Japanese coastal waters. 9.9 % of non-zero observations are from the polar regions, with 5.1 % from the Southern Ocean and 4.8 % from Arctic waters. Coccolithophores are reported to be present in only one sample below 60° S (
Strong differences can be observed between the Atlantic and Pacific Ocean, with Atlantic biomass values reaching 127.2 µg C L−1 (mean 1.7 ± 7.5, median 0.12 µg C L−1) compared to just 20.0 µg C L−1 in the Pacific (mean 0.3 ± 0.9, median 0.04 µg C L−1). The relatively poor spatio-temporal coverage of Pacific Ocean observations, however, may contribute to this discrepancy. Indian Ocean biomass values reach a maximum of 45.2 µg C L−1, with a mean of 1.1 ± 3.4 and median of 0.03 µg C L−1.
In the Southern Ocean, the maximum biomass value reported is 6.5 µg C L−1, mean biomass is 0.19 ± 0.58 µg C L−1 and median biomass is 0.04 µg C L−1. Higher values are recorded in the Arctic Ocean, with a maximum of 98.9 µg C L−1, mean of 0.78 ± 5.7 µg C L−1 and median of 0.05 µg C L−1.

Depth distribution
Highest biomass values are reported in surface waters and decline with depth (Figs. 3b, 6), although biomass values of up to 23 µg C L−1 are still reported at 100 m depth. Mean biomass for the surface layer (0-10 m) is 0.9 ± 5.2 µg C L−1 and median biomass is 0.09 µg C L−1. Biomass values below 200 m reach a maximum of 0.01 µg C L−1. The deepest observations of coccolithophores are at 500 m depth, with biomasses reaching a maximum of just 0.004 µg C L−1.

Seasonal distribution
The data show a clear seasonal cycle in the Northern Hemisphere, with biomass values reaching just 1.1 µg C L−1 in December and over 100 µg C L−1 in the summer months (June–July; Fig. 7). In the Southern Hemisphere the seasonal cycle is less evident, possibly due to the greater contribution of data from low latitudes where seasonal changes are less pronounced.

Uncertainty
The expected uncertainty associated with our conversions of cell abundance to carbon biomass due to varying cell size is depicted in Fig. 8. Biomass estimates are best constrained where detailed taxonomic information is available, and for samples containing species for which a limited size range has been reported. Very high uncertainty (range of biomass values greater than 5000 % of the mean biomass) is associated with counts of unidentified coccolithophores. This is to be expected given the large range of sizes reported for the approximately 200 known coccolithophore species (see Appendix Table A3).
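The size-related uncertainty can be expressed as a percentage of the mean biomass, as in the sketch below. The function name is hypothetical, and taking the "range" to be the difference between the maximum and minimum biomass estimates is an assumption about how the percentage quoted above is defined.

```python
def relative_range_percent(min_biomass, mean_biomass, max_biomass):
    """Range of biomass estimates (maximum - minimum) as a percentage of the mean."""
    if mean_biomass <= 0:
        raise ValueError("mean biomass must be positive")
    return 100.0 * (max_biomass - min_biomass) / mean_biomass


# Hypothetical datapoint with minimum, mean and maximum biomass of 1, 2 and 4 ug C per litre.
percent = relative_range_percent(1.0, 2.0, 4.0)
```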
An additional source of uncertainty, however, is the estimation of cell biovolumes from coccosphere dimensions, which is more difficult to quantify. A comparison of our biomass estimates based on coccosphere dimensions with estimates from available cytoplasm dimensions suggests that we may be underestimating coccolithophore biomass values by a factor of up to 5 for some species (Table 2). It is worth noting, however, that the cytoplasm dimensions considered here are based on either culture specimens (Stoll et al., 2002) or a small number of field samples from the Icelandic Basin (Poulton et al., 2010) and the Mauritanian Upwelling (Franklin et al., 2009). For one of the best-studied species, E. huxleyi, our mean biomass estimate of 13 pg C cell−1 falls within the range of published carbon measurements of 7.8 to 27.9 pg C cell−1 (Fernández et al., 1993; van Bleijswijk et al., 1994; Verity et al., 1992), while our estimates from the cytoplasm measurements in Table 2 show much lower values of 3.5-3.7 pg C cell−1.

Discussion
There are many sources of uncertainty associated with our calculations. We have attempted to quantify the uncertainty associated with variable cell dimensions by providing minimum and maximum biomass values for each datapoint, but this does not represent the full range of uncertainty associated with our biomass values.
The estimation of cell biovolumes from coccosphere dimensions is likely to result in additional errors which are at present difficult to quantify. A more accurate estimation of coccolithophore biomass will only be possible with improved understanding of coccolithophore cytoplasm dimensions (e.g. Stoll et al., 2002), and we highlight this as a key data requirement for improved estimates of coccolithophore biomass from abundance data. While the routine measurement of coccolithophore cell dimensions is a time-consuming process, there also appears to be potential to estimate cell size from coccolith length (Henderiks and Pagani, 2007; Henderiks, 2008).
Few observations of coccosphere dimensions are reported in the literature for most species, and the number of cells that have been studied to derive the given ranges is rarely reported. Measurements are often from a single geographical location, meaning that size variation between strains is not accounted for. There is additionally inconsistency as to whether the range of coccosphere sizes reported is the full range of sizes that occurs or only those most commonly observed. A further source of uncertainty is the generalisation of at times complex geometry to fit a particular geometric form.
The uncertainty ranges provided around our biomass estimates are intended to reflect the influence of cell size on coccolithophore biomass. Since these are based on cytoplasm dimensions estimated from total coccosphere size, it is unclear whether biomass values towards the high end of our uncertainty range are biologically realistic. We may expect larger coccospheres to be characterised by a greater proportion of inorganic carbon rather than reflecting a constant ratio of cytoplasm : coccosphere dimensions.
In addition to the errors introduced by the biomass conversion process, a considerable degree of uncertainty is already associated with the cell abundance data. Coccolithophores can be quantified using several techniques, including visual or automated identification from scanning electron microscopy, regular light microscopy and light microscopy using cross-polarised light. Additionally, samples can be prepared for light microscopy either by filtration or by using the Utermöhl sedimentation method (Utermöhl, 1958). Reid (1980) and Bollmann et al. (2002) both concluded that inverted light microscopy is unreliable for determining cell densities of small coccolithophores.
Despite these limitations, the Utermöhl method of sedimentation and inverted light microscopy remains widely used in studies investigating phytoplankton assemblages, and any compilation of global coccolithophore distributions would be incomplete without these data. Cell counts from SEM can additionally be unreliable at high cell densities, where shed coccoliths can lead to difficulties in distinguishing individual coccospheres (A. Poulton, personal observation). The synthesis of datasets obtained from these different methods would be greatly improved by further comparative studies similar to those carried out by Bollmann et al. (2002), as it is currently unclear to what extent small and rare species are being overlooked in different ocean regions as a result of these methodological differences.
Users of the gridded data file should also take into consideration the sparse nature of the original data. Often monthly mean gridded values have been derived from relatively few individual datapoints that do not represent the full range of values that occur in a given location. We expect to see a bias toward higher biomass values, given that studies are often conducted in locations and times of year when blooms are expected to occur.
We have not included estimates of inorganic carbon content in the database, as we do not feel that useful estimates of coccolithophore calcite can currently be provided from the abundance data. The ratio of inorganic : organic carbon has been shown to vary considerably with environmental and growth conditions (Zondervan, 2007), with ratios for the species E. huxleyi alone ranging from 0.26 to 2.3 (van Bleijswijk et al., 1994;Paasche, 2002). While some estimates have been made of the relationship between inorganic and organic carbon for E. huxleyi-dominated communities (e.g. Fernández et al., 1993;Poulton et al., 2010), the relationship of calcite content to biomass for other coccolithophore communities remains less well understood.
The biomass estimates presented here represent a first attempt to assess global coccolithophore biomass distributions. While we recognise that the uncertainties associated with these biomass estimates are significant, we nevertheless feel that they provide a more informative dataset than would a compilation of abundance data alone given the large size variation among coccolithophore species. The coccolithophores present particular challenges for the compilation and synthesis of diverse datasets due to the wide range of methods used for their quantification as well as the limited understanding of cell dimensions. The strong biases associated with the different methods highlight the need for coccolithophore abundance data to be published alongside appropriate metadata to allow users to assess data quality. This database represents the largest effort to date to compile coccolithophore abundance observations and provide standardised biomass estimates to the scientific community. We report our biovolume and biomass conversion procedures in detail and discuss the associated uncertainties. We anticipate that this dataset, together with others from the MAREDAT special issue, will be a valuable resource for studies of plankton distributions and ecology and in particular for the evaluation and development of marine ecosystem models. While data are clearly lacking for certain regions, the dataset nevertheless represents the largest available compilation of global coccolithophore abundance and biomass. We hope to improve the spatial and temporal coverage of the dataset as well as the accuracy of biomass conversions as additional data become available in the future.

A1 Data table
A full data table containing all biomass data points can be downloaded from the data archive PANGAEA (doi:10.1594/PANGAEA.785092). The data file contains longitude, latitude, depth, sampling time, abundance counts and biomass concentrations, as well as the full data references.

A2 Gridded netcdf biomass product
Monthly mean biomass data have been gridded onto a 360 × 180° grid, with a vertical resolution of 33 depth levels (equivalent to World Ocean Atlas depths) and a temporal resolution of 12 months (climatological monthly means). This dataset is provided in netcdf format for easy use in model evaluation exercises. The netcdf file can be downloaded from PANGAEA (doi:10.1594/PANGAEA.785092). This file contains total and non-zero abundance and biomass values. For all fields, the means, medians and standard deviations resulting from multiple observations in each of the 1° pixels are given. The ranges in biomass values due to uncertainties in cell size are not included as variables in the netcdf product, but are given as ranges (minimum cell biomass, maximum cell biomass) in the data table.
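The horizontal and monthly binning described above can be illustrated with a short sketch. It is not the gridding code used for the netcdf product: the function names are hypothetical, the index convention (cell 0 starting at 180° W and 90° S) is an assumption, and the assignment of observations to the 33 depth levels is omitted.

```python
import math
from collections import defaultdict
from statistics import mean, median


def grid_index(lon, lat, month):
    """Map a position and calendar month (1-12) to (month, lat, lon) indices
    on a 12 x 180 x 360 grid of 1-degree cells."""
    lon_index = int(math.floor(lon + 180.0)) % 360      # 180 E wraps to the first cell
    lat_index = min(int(math.floor(lat + 90.0)), 179)   # 90 N falls in the last cell
    return month - 1, lat_index, lon_index


def monthly_cell_stats(observations):
    """Group (lon, lat, month, biomass) tuples by grid cell and month.

    Returns a dict mapping grid indices to (mean, median, number of observations).
    """
    cells = defaultdict(list)
    for lon, lat, month, biomass in observations:
        cells[grid_index(lon, lat, month)].append(biomass)
    return {key: (mean(vals), median(vals), len(vals)) for key, vals in cells.items()}


# Three hypothetical observations; the first two share a cell and month.
stats = monthly_cell_stats([
    (10.2, 45.7, 6, 1.0),
    (10.8, 45.1, 6, 3.0),
    (-30.5, 60.2, 7, 5.0),
])
```

Keeping the number of observations per cell alongside the mean and median makes it easier to see where a gridded value rests on very few datapoints.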