Data compilation on the biological response to ocean acidification: an update

The exponential growth of studies on the biological response to ocean acidification over the last few decades has generated a large amount of data. To facilitate data comparison, a data compilation hosted at the data publisher PANGAEA was initiated in 2008 and is updated on a regular basis (doi:10.1594/PANGAEA.149999). By January 2015, a total of 581 data sets (over 4 000 000 data points) from 539 papers had been archived. Here we present the developments of this data compilation 5 years after its first description by Nisumaa et al. (2010). Most of the study sites from which data have been archived are in the Northern Hemisphere, and the amount of data archived from studies in the Southern Hemisphere and polar oceans is still relatively low. Data from 60 studies that investigated the response of a mix of organisms or natural communities were all added after 2010, indicating a welcome shift from the study of individual organisms to communities and ecosystems. The initial imbalance of considerably more data archived on calcification and primary production than on other processes has improved. There is also a clear tendency towards more data archived from multifactorial studies after 2010. For easier and more effective access to ocean acidification data, the ocean acidification community is strongly encouraged to contribute to the data archiving effort, to help develop standard vocabularies describing the variables and to define best practices for archiving ocean acidification data.


Introduction
The release of carbon dioxide (CO₂) into the atmosphere by human activities results in an increased flux of CO₂ into a mildly alkaline ocean, resulting in an increase in the concentration of inorganic carbon, and a reduction in pH, carbonate ion concentration, and the capacity of seawater to buffer changes in its chemistry. These changes are collectively known as ocean acidification (Gattuso et al., 2014).
Investigations of the effect of ocean acidification on marine organisms and ecosystems have a relatively short history. A wide range of sensitivities to projected rates of ocean acidification exists within and across diverse groups of organisms, with a trend for greater sensitivity in early life stages (Kroeker et al., 2013). Several meta-analyses reveal a pattern of positive and negative impacts (Gattuso et al., 2014), but key uncertainties remain in our understanding of the impacts on organisms, life histories and ecosystems.
The number of papers addressing biological responses to ocean acidification has increased steeply in the past decade, from 18 papers per year in 2004 to 365 papers in 2014 (Gattuso and Hansson, 2011; OA-ICC bibliographic database, www.tinyurl.com/oaicc-biblio). It is challenging to compare and synthesize the results of these papers for two reasons. First, data are not easily discoverable and accessible because they are either not archived or archived in different data repositories in varying formats. Second, the carbonate chemistry and ancillary data are often reported in different units and scales, and calculated using different sets of constants, making the carbonate chemistry across studies inconsistent. For example, it is crucial to report the pH scale used, since pH reported on the free scale is about 0.11 and 0.12 units higher than on the total and seawater scales, respectively (Zeebe and Wolf-Gladrow, 2001).
A compilation of ocean acidification biological response data was initiated by the European Network of Excellence for Ocean Ecosystems Analysis (EUR-OCEANS) and the European Project on Ocean Acidification (EPOCA) in 2008 (Nisumaa et al., 2010). This effort ended in 2012 when the EPOCA project came to an end but was resumed in the framework of the Ocean Acidification International Coordination Centre (OA-ICC) in 2013, in close collaboration with Xiamen University and the world data centre PANGAEA. The goal of the OA-ICC data compilation is to gather data on the biological response to ocean acidification (carbonate chemistry, biogeochemical processes and ancillary data) from published articles and to make them available in a coherent format to the scientific community. As part of the effort, the carbonate system variables are recalculated in a consistent way. Data from papers that report at least two carbonate chemistry parameters, as well as temperature and salinity, are included in the compilation. The data compilation has been very well received by the ocean acidification community. It has been used in three meta-analyses (Kroeker et al., 2010, 2013; Liu et al., 2010) and a modelling study (Muller and Nisbet, 2014), and has been cited in six other publications (Fiorini et al., 2011; Hendriks and Duarte, 2010; Hoppe et al., 2012; Koeve and Oschlies, 2012; Meyer and Riebesell, 2014; Rokitta, 2012). We report here on the developments of this database 5 years after its first description by Nisumaa et al. (2010). This update is timely, since 500 data sets from 439 papers (81 % of the total number included in the compilation) have been archived since the publication of Nisumaa et al. (2010).

Data
The compilation process described in Nisumaa et al. (2010) was followed to maintain consistency. Briefly, papers on the biological response to ocean acidification were identified by searching the OA-ICC news stream (http://news-oceanacidification-icc.org/) or, for older papers, the OA-ICC bibliographic database. Papers were excluded from the compilation when they only used levels of partial pressure of carbon dioxide (pCO₂) or pH which are not consistent with present-day levels or future scenarios. For example, data collected with a control condition with pCO₂ values below about 100 µatm or above 1700 µatm, or pHT (pH on the total scale) values below about 7.5 or above 8.5, were excluded unless they are environmentally relevant at the study location. Data were requested from the authors by email. Data were either provided by authors, extracted from tables in the original paper, or downloaded from other data repositories such as the Biological and Chemical Oceanography Data Management Office (BCO-DMO), the British Oceanographic Data Centre (BODC) and the Australian Antarctic Data Centre (AADC). All the data sets archived in the framework of the OA-ICC (since 2013) as well as the projects EPOCA/EUR-OCEANS (2008-2012) have been given the tag "Ocean Acidification International Coordination Centre (OA-ICC)" in PANGAEA.

Figure 1. Cumulative number of papers for which data have been included in the compilation ("archived"), papers for which data could not be obtained ("not obtained"), papers which reported fewer than two carbonate system parameters ("incomplete") and papers for which the data have been lost ("lost").
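The screening rule for control conditions described in this section can be sketched in a few lines of Python. This is only an illustration: the thresholds are the approximate values quoted in the text, whereas the function names and the `locally_relevant` flag are hypothetical and do not describe the curators' actual workflow, which also relies on expert judgement.

```python
from typing import Optional

# Approximate control-condition limits quoted in the text ("about").
PCO2_MIN_UATM = 100.0
PCO2_MAX_UATM = 1700.0
PH_T_MIN = 7.5
PH_T_MAX = 8.5


def control_out_of_range(pco2_uatm: Optional[float] = None,
                         ph_total: Optional[float] = None) -> bool:
    """Return True if a reported control value lies outside the quoted limits.

    Values that were not reported (None) are simply ignored.
    """
    pco2_bad = (pco2_uatm is not None
                and not PCO2_MIN_UATM <= pco2_uatm <= PCO2_MAX_UATM)
    ph_bad = (ph_total is not None
              and not PH_T_MIN <= ph_total <= PH_T_MAX)
    return pco2_bad or ph_bad


def is_excluded(pco2_uatm: Optional[float] = None,
                ph_total: Optional[float] = None,
                locally_relevant: bool = False) -> bool:
    """Out-of-range controls are excluded unless the conditions are
    environmentally relevant at the study location (hypothetical flag)."""
    return control_out_of_range(pco2_uatm, ph_total) and not locally_relevant


if __name__ == "__main__":
    print(is_excluded(pco2_uatm=400.0, ph_total=8.05))           # typical control
    print(is_excluded(pco2_uatm=2500.0))                         # extreme control
    print(is_excluded(pco2_uatm=2500.0, locally_relevant=True))  # e.g. naturally high-CO2 site
```

Keeping the "environmentally relevant" exception as an explicit argument makes it visible that the numerical limits alone do not decide whether a paper is excluded.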
Between the beginning of this activity in 2008 and January 2015, 1026 relevant papers were identified. A total of 581 data sets (over 4 000 000 data points) were archived from 539 of these papers. Data from a single paper can be archived as several data sets (e.g. http://doi.pangaea.de/10.1594/PANGAEA.777725), which explains why 581 data sets were archived from 539 papers. Data from the remaining 487 papers could not be added to the compilation for the following reasons (Fig. 1): (1) fewer than two carbonate system parameters were measured in 176 papers, preventing the recalculation of the carbonate chemistry; (2) data from 295 papers could not be obtained from the authors; and (3) data from 16 papers were lost by authors.
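The bookkeeping behind these numbers can be checked with a short script. The counts are those quoted in the text; the variable names are ours and purely illustrative.

```python
# Counts quoted in the text (status: January 2015).
papers_identified = 1026
papers_archived = 539
data_sets_archived = 581
papers_archived_since_2010 = 439
papers_not_archived = {"incomplete": 176, "not obtained": 295, "lost": 16}

papers_remaining = papers_identified - papers_archived
share_archived = 100 * papers_archived / papers_identified
share_since_2010 = 100 * papers_archived_since_2010 / papers_archived

# The three reasons should account for every paper that was not archived.
assert papers_remaining == sum(papers_not_archived.values())

print(f"papers not archived: {papers_remaining}")
print(f"share of relevant papers archived: {share_archived:.0f} %")
print(f"share of archived papers added since Nisumaa et al. (2010): "
      f"{share_since_2010:.0f} %")
```

The script reproduces the 487 papers that could not be archived, the 53 % of relevant papers included in the compilation and the 81 % of archived papers added since Nisumaa et al. (2010).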
The first papers that investigated a biological response to decreased pH (e.g. Vernon, 1895) predate the definition of the pH scale by Sørensen (1909) and are therefore not suitable for inclusion in the data compilation. The earliest data included in the compilation were published in 1967 (Traganza, 1967). No data from the papers published during 1968-1993 could be archived because data were lost or could not be obtained. Data from 41 papers published between 1994 and 2006 were archived. The quantity of data archived for a given year follows the increase in publication rate of biological response papers, rising from 11 papers archived for 2007 to 110 papers for 2014. More than 90 % of the data archived in the compilation come from papers published after 2007.
All the references of papers included in the data compilation have been tagged with the keyword "OAICCdb" in the OA-ICC bibliographic database (www.tinyurl.com/oaicc-biblio). The OA-ICC bibliographic database was used to retrieve statistical information on the type of papers from which the archived data originated. Keywords describing the geographical region, type of organism and biological process studied, other factors manipulated, as well as the country of affiliation of the first author were extracted from the bibliographic database for statistical analysis. Results are presented as the percentage of papers from which data were archived. Information on salinity, temperature and carbonate chemistry data archived as part of the data compilation was extracted from the PANGAEA data warehouse. The analyses of these parameters were based on the percentage of data sets or count of data points.
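A minimal sketch of a per-keyword tally of this kind is given below. The input structure (one collection of keywords per paper) and the sample records are hypothetical; they do not reflect the export format of the OA-ICC bibliographic database. The sketch assumes that a paper may carry several keywords, in which case the percentages add up to more than 100 %.

```python
from collections import Counter
from typing import Dict, Iterable


def keyword_percentages(papers: Iterable[Iterable[str]]) -> Dict[str, float]:
    """Percentage of papers tagged with each keyword.

    Each paper counts once per keyword, however often the keyword is repeated.
    """
    keyword_sets = [set(keywords) for keywords in papers]
    n_papers = len(keyword_sets)
    if n_papers == 0:
        return {}
    counts = Counter(k for keywords in keyword_sets for k in keywords)
    return {k: 100 * c / n_papers for k, c in counts.items()}


# Hypothetical records: the keywords attached to four papers.
example = [
    {"calcification", "growth"},
    {"growth"},
    {"calcification"},
    {"respiration", "growth"},
]

for keyword, pct in sorted(keyword_percentages(example).items()):
    print(f"{keyword}: {pct:.0f} %")
```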

Geographical coverage
In the OA-ICC data compilation, the location of study sites indicates where the studied organisms were collected or the location of the natural communities investigated. The geographical location is not always clearly indicated in the papers, and the bibliographic database does not provide geographical information for experiments using organisms which have been cultured for a long time in the laboratory or collected from commercial hatcheries. The geographical areas which are best covered are the North Atlantic Ocean, North Pacific Ocean, South Pacific Ocean and Mediterranean Sea (33, 19, 17 and 10 %, respectively, of the papers indicating a geographical location). The Arctic Ocean, Baltic Sea, Southern Ocean, Red Sea, Indian Ocean and South Atlantic Ocean collectively represent only 21 % of the papers. In summary, most of the study sites to date have been in the Northern Hemisphere, and the number of studies from the Southern Hemisphere and polar oceans is relatively low (Fig. 2). Data from 35 studies performed in the Arctic Ocean were archived, 22 of them from the EPOCA mesocosm experiment carried out in Kongsfjorden (Spitsbergen) in 2010 (Riebesell et al., 2013). There were data from 12 papers from the Southern Ocean, all added after the publication of Nisumaa et al. (2010). This remains a low number considering that polar regions are particularly vulnerable to ocean acidification (Orr et al., 2005; Steinacher et al., 2009).

Taxonomic coverage
Phytoplankton (in 20 % of the papers in the data compilation) and corals (18 %) are still the best represented taxonomic groups (Fig. 3), although their percentages came down from 39 and 29 %, respectively, of the papers from which data were archived before 2010 (Nisumaa et al., 2010). The relative number of papers studying other taxonomic groups went up: molluscs (8 to 16 %), macroalgae (7 to 11 %), crustaceans (5 to 8 %), echinoderms (3 to 8 %) and fish (2 to 6 %). Sixty studies (11 %) investigated the response of a mix of organisms or natural communities, while there were no such studies from which data were archived before 2010, indicating a clear shift away from the study of individual organisms to communities and ecosystems, as has been recommended to close existing gaps in ocean acidification science (Riebesell and Gattuso, 2015). This is important as data on competitive and trophic interactions are key to better predict future impacts of ocean acidification. The number of papers concerning prokaryotes, protists, zooplankton and others (annelids, viruses, etc.) is relatively low (less than 8 % for each group).

Biological processes
The most studied biological processes are morphology, calcification and growth, representing 33, 30 and 26 %, respectively, of the papers from which data were archived (Fig. 4). Physiology, photosynthesis, mortality, reproduction and respiration are also well represented (13 to 23 %), followed by primary production, performance, dissolution and nitrogen fixation. Other processes such as community composition, abundance, adaptation, nutrient uptake, toxicity and bleaching are represented in 26 % of the papers from which data were archived (see the detailed description of keywords in the user instructions of the OA-ICC bibliographic database, www.tinyurl.com/oaicc-biblio). All processes, except calcification and primary production, are better represented today than they were before 2010, indicating that the initial imbalance (of considerably more studies on calcification and primary production than on other processes) has improved.

Multiple factors
There is a clear tendency towards more multifactorial studies after 2010. Even though the majority of the compiled papers have only manipulated the carbonate chemistry, their relative contribution in the data compilation decreased from 81 % (pre-2010) to 69 %. The main other factors studied in addition to the carbonate system are temperature (14 % of papers), nutrients (6 %) and light (5 %) (Fig. 5). Few papers manipulated oxygen, metals or toxic compounds. Twenty papers have also reported the combined effect of changes in carbonate chemistry with two or three other factors. In general, data from studies on multifactorial impacts are still limited in the data compilation. Addressing such impacts is one of the major challenges faced by future ocean acidification research (Riebesell and Gattuso, 2015).

Countries of first-author affiliation
Based on first-author affiliation, most of the data compiled are from articles contributed by European countries (59 %; Fig. 6). Within Europe, 41 % of the papers were published by German scientists and 19 % by UK scientists. The USA (18 %) and Australia (8 %) also contribute substantially to the data compilation. There are still many papers by authors from these two countries for which data could not be obtained (87 and 49 papers, respectively).

Measured carbonate chemistry variables
A complete and consistent set of carbonate system variables was calculated with the R package seacarb (Gattuso et al., 2015) as described by Nisumaa et al. (2010), in order to provide coherent information on the chemistry. The recalculated parameters were archived together with the original ones and were flagged accordingly ("Calculated using seacarb after Nisumaa et al., 2010" together with a reference to the data curator responsible for the recalculation). Total alkalinity (AT) is the most frequently measured carbonate chemistry variable (79 % of the data sets; Fig. 7). The other variables measured include pH (70 %), dissolved inorganic carbon (CT, 36 %) and the partial pressure of carbon dioxide (pCO₂, 8 %). Out of the 70 % of data sets that measured pH, 38 % reported pH on the National Bureau of Standards (NBS) scale (also referred to as the National Institute of Standards and Technology, NIST, scale), 30 % on the total scale, 1 % on the seawater scale (SWS) and 1 % on the free scale (Fig. 7). Although data sets with pH reported on the NBS scale are still more numerous than those on the total scale, the ratio between them has decreased from 2.2 (pre-2010) to 1.2. The pH value on the total scale is 0.15 units lower than on the NBS/NIST scale and 0.01 units higher than on the SWS scale (Dickson, 2010), which makes the direct comparison of experimental results difficult. All other scales are converted to the total scale in this compilation, as recommended in the Guide to Best Practices in Ocean Acidification Research and Data Reporting (Dickson, 2010).
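To illustrate why the pH scale must be reported, the sketch below applies the approximate offsets quoted in this paper to express a pH value on the total scale. It is not the procedure used for the compilation, which relies on seacarb: treating the offsets as constants is a simplifying assumption, since the actual differences between scales depend on temperature, salinity and, for the NBS scale, on the measurement itself. The function and dictionary names are hypothetical.

```python
import math

# Approximate offsets (pH units) added to a reported value to express it on
# the total scale. Values are those quoted in the text; using them as
# constants is an assumption made for illustration only.
OFFSET_TO_TOTAL = {
    "total": 0.0,
    "nbs": -0.15,   # total scale is about 0.15 units lower than NBS/NIST
    "sws": +0.01,   # total scale is about 0.01 units higher than SWS
    "free": -0.11,  # free scale is about 0.11 units higher than total
}


def approx_ph_total(ph: float, scale: str) -> float:
    """Rough conversion of a pH value to the total scale (illustration only)."""
    key = scale.strip().lower()
    if key == "nist":  # NIST is another name for the NBS scale
        key = "nbs"
    if key not in OFFSET_TO_TOTAL:
        raise ValueError(f"unknown pH scale: {scale!r}")
    return ph + OFFSET_TO_TOTAL[key]


if __name__ == "__main__":
    # The same nominal reading differs by up to 0.16 units between scales.
    for scale in ("NBS", "free", "total", "SWS"):
        print(scale, round(approx_ph_total(8.10, scale), 2))
    print(math.isclose(approx_ph_total(8.20, "NBS"), 8.05))
```

A difference of 0.15 units between the NBS and total scales is of the same order as many experimental pH treatments, which is why unreported scales hamper comparison across studies.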

Temperature and salinity coverage
The temperature values reported range from −3 to 34 °C, but most temperature data are between 18 and 21 °C (Fig. 8a) and few values are below 0 °C or above 30 °C. Salinity ranges from 4 to 65, with 60 % of the data points between 32 and 36 (Fig. 8b). The geographical distribution of the papers considered (Fig. 2) shows that the study sites are distributed over all the oceans of the world, including the high-salinity Red Sea and the low-salinity Baltic Sea. High-salinity lagoons and low-salinity estuarine areas have also been studied, which can explain the large range of salinity in the compilation. There are 36 data points with salinity greater than 50; they were derived from a paper on brine algae collected from sea ice in Antarctica (McMinn et al., 2014).

Carbonate chemistry variables coverage
The pHT, pCO₂, AT, CT, aragonite (Ωa) and calcite (Ωc) saturation states and carbonate ion (CO₃²⁻) data included in this compilation span the range of the average open-ocean surface values from the Last Glacial Maximum (LGM) to 2100 (Fig. 8). The largest number of data points lie between pHT 7.65 and 7.95, pCO₂ 500 and 1000 µatm and CT 2100 and 2200 µmol kg⁻¹ (Fig. 8c, d, f). This indicates that most studies used carbonate chemistries that are environmentally relevant. The ambient pCO₂ level is often used as the control treatment, so many pHT, pCO₂ and CT data points are close to the present-day values (8.07, 384 µatm and 2064 µmol kg⁻¹). Few data points are close to the values for the LGM and 1766, because few studies used preindustrial or "glacial" pCO₂ levels (267 and 180 µatm, respectively) as treatments. Some data points are indicative of carbonate chemistry that is not environmentally relevant (i.e. pCO₂ below 150 µatm or above 2000 µatm, pHT below 7.35 or above 8.45). This is because some studies used not only pCO₂ and pHT levels relevant to past or future scenarios as treatments, but also extreme values. Since AT is unaffected by the uptake of CO₂, it has been recommended to keep it constant in perturbation experiments, for example by bubbling seawater with CO₂-enriched air (Gattuso and Lavigne, 2009). In the OA-ICC data compilation, most AT data points range from 2250 to 2350 µmol kg⁻¹, which is close to the average present-day open-ocean surface value (2325 µmol kg⁻¹) (Fig. 8e). However, some studies manipulated the carbonate chemistry by adding strong acids and bases without restoring total alkalinity (see Gattuso et al., 2010), which significantly altered AT. The number of studies of this kind has decreased since the publication of the Guide to Best Practices in Ocean Acidification Research and Data Reporting (Cornwall and Hurd, 2016). In addition, some low-AT data points were archived from studies performed in low-salinity areas, such as the Baltic Sea and estuarine areas. The concentration of CO₃²⁻, as well as Ωa and Ωc, exhibit similar distribution patterns, with most data points close to their average values in surface seawater for 2100 (Fig. 8g, h, i).

… 1.1 of Gattuso and Hansson, 2011).

Earth Syst. Sci. Data, 8, 79-87, 2016 www.earth-syst-sci-data.net/8/79/2016/

Recommendations
As a new scientific field such as ocean acidification develops, the amount of scientific data grows considerably and the need for data archiving greatly increases to ensure that data are easily accessible for analysis and reuse. Many journals, such as the Proceedings of the Royal Society B: Biological Sciences, PLoS ONE and Nature, require or encourage authors to archive data in public data repositories. Several data journals were launched to help this process. For example, the present journal was launched by the European Geosciences Union in 2008 and Scientific Data by the Nature Publishing Group in 2014. Many funding agencies also have requirements on data archiving. The European Research Council (ERC) mandates that funded scientists deposit primary data in relevant databases as soon as possible after data collection. In the United States, researchers seeking funding from the National Science Foundation (NSF) are required to submit a data management plan as a supplement to the grant application. Research projects such as the European FP7 project EPOCA have provided guidelines on data reporting and discussed the benefits of data archiving (Pesant et al., 2010). Although data archiving not only benefits the scientific community but is also a great way to advertise scientific work, get more citations (Piwowar et al., 2007) and initiate new collaborations, collecting data from authors remains difficult. It is a slow process and only data from 53 % of the relevant papers could be included in the compilation. We therefore encourage the international ocean acidification community to actively participate in the data archiving effort.
Another challenge is the use of consistent variable names, as papers sometimes report the same variable under different names, for example "respiration" and "oxygen consumption". It is recommended to develop standard vocabularies describing the variables and to use them in the metadata template to ensure effective data searching (Jiang et al., 2015). At an expert meeting on the management of ocean acidification biological response data organized by the Ocean Acidification International Coordination Centre in 2014, ocean acidification scientists and data managers from major data centres agreed to develop a list of the most common biological parameters from ocean acidification studies (Hansson et al., 2014). Journals are invited to make sure that variables of the carbonate chemistry are reported in full and with the required level of detail (see Supplement). Furthermore, some data have previously been published in PANGAEA by data curators of other projects (e.g. the project "Biological Impacts of Ocean Acidification", BIOACID), or in other databases (e.g. the Biological and Chemical Oceanography Data Management Office, British Oceanographic Data Centre, Australian Antarctic Data Centre), which created duplicates when they were archived again for the OA-ICC data compilation with recalculated carbonate chemistry data. To detect duplication, we recommend that the community create unique identifiers for data sets on the biological response to ocean acidification. It is also hoped that all projects and databases can define best practices for archiving ocean acidification data and cooperate to avoid duplication. To this end, and to meet the need for easier and more effective access to ocean acidification data, data managers of various institutions are planning to create a joint portal ("one-stop shop") for access to global ocean acidification data by linking different data repositories (Garcia et al., 2015; Hansson et al., 2014). The Ocean Acidification International Coordination Centre recently organized a follow-up workshop to the 2014 meeting to work on the technical details of the one-stop shop data portal.
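As a purely illustrative sketch of the two recommendations above, the following Python snippet maps variable names onto a small controlled vocabulary and flags records that may describe the same data set. Everything in it is an assumption made for the example: the synonym table, the record fields, the DOIs and the matching key (source-paper DOI plus normalised variable names) are hypothetical and do not represent an agreed community standard or the practice of any repository.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical synonym table; a real vocabulary would be agreed by the community.
SYNONYMS = {
    "oxygen consumption": "respiration",
    "o2 consumption": "respiration",
}

DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def canonical_variable(name: str) -> str:
    """Lower-case, collapse whitespace and map known synonyms to one term."""
    key = " ".join(name.lower().split())
    return SYNONYMS.get(key, key)


def normalise_doi(doi: str) -> str:
    """Strip common resolver prefixes; DOIs are compared case-insensitively."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def find_possible_duplicates(records: Iterable[dict]) -> List[List[dict]]:
    """Group records sharing a source-paper DOI and the same set of variables."""
    groups: Dict[Tuple[str, FrozenSet[str]], List[dict]] = defaultdict(list)
    for record in records:
        key = (normalise_doi(record["source_doi"]),
               frozenset(canonical_variable(v) for v in record["variables"]))
        groups[key].append(record)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical records; the DOIs are made up for the example.
records = [
    {"repository": "A", "source_doi": "doi:10.1000/example.1",
     "variables": ["Respiration", "Growth"]},
    {"repository": "B", "source_doi": "https://doi.org/10.1000/EXAMPLE.1",
     "variables": ["growth", "oxygen  consumption"]},
    {"repository": "A", "source_doi": "doi:10.1000/example.2",
     "variables": ["Calcification"]},
]

for group in find_possible_duplicates(records):
    print([r["repository"] for r in group])
```

Because one paper can legitimately yield several data sets, a key of this kind can only flag candidates for manual checking; a persistent identifier assigned at archiving time would be more robust.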

Figure 2. (a) Location of organisms and natural communities from which data have been archived and (b) distribution of locations according to latitude.

Figure 3. Taxonomic coverage of the papers from which data have been included in the OA-ICC data compilation, compared to those archived prior to 2010. Categories marked with * were not present in Nisumaa et al. (2010).

Figure 4. Biological processes reported in the papers from which data have been included in the OA-ICC data compilation, compared to those archived prior to 2010. Other processes studied in the papers from which data were archived before 2010 comprise variables such as shell length, width and linear extension of molluscs, bleaching, invasion, orientation, different blood cell concentration, concentration in the tissues of organisms, rate of nitrogen fixation (Nisumaa et al., 2010). The categories marked with * were not present in Nisumaa et al. (2010) or their definition was different. "Morphology" was included in "Other processes" in Nisumaa et al. (2010) whereas it is considered as a distinct category in the present paper.

Figure 5. Papers that have manipulated the carbonate chemistry alone and those that have manipulated the carbonate chemistry as well as other variables.CC, carbonate chemistry.The categories marked with * were not present in Nisumaa et al. (2010).

Figure 6. Countries of affiliation of first author of papers from which data were archived and papers for which data could not be obtained.

Figure 7. Variables of the seawater carbonate system reported in the data sets.
