Global Inventory of Gas Geochemistry Data from Fossil Fuel, Microbial and Burning Sources, version 2017

The concentration of atmospheric methane (CH 4 ) has more than doubled over the industrial era. To help constrain global and regional CH 4 budgets, inverse (top-down) models incorporate data on the concentration and stable carbon ( δ 13 C) and hydrogen ( δ 2 H) isotopic ratios of atmospheric CH 4. These models depend on accurate δ 13 C and δ 2 H end-member source signatures for each of the main emissions categories. Compared with meticulous measurement and calibration of isotopic CH 4 in the atmosphere, there has been relatively less effort to characterize globally representative isotopic source signatures, particularly for fossil fuel sources. Most global CH 4 budget models have so far relied on outdated source signature values derived from globally non-representative data. To correct this deficiency, we present a comprehensive, globally representative end-member database of the δ 13 C and δ 2 H of CH 4 from fossil fuel (conventional natural gas, shale gas, and coal), modern microbial (wetlands, rice paddies, ruminants, termites, and landfills and/or waste) and biomass burning sources. Gas molecular compositional data for fossil fuel categories are also included with the database. The database comprises 10 706 samples (8734 fossil fuel, 1972 non-fossil) from 190 published references. Mean (unweighted) δ 13 C signatures for fossil fuel CH 4 are significantly lighter than values commonly used in CH 4 budget models, thus highlighting potential underestimation of fossil fuel CH 4 emissions in previous CH 4 budget models. This living database will be updated every 2–3 years to provide the atmospheric modeling community with the most complete CH 4 source signature data possible. Database digital object identifier (DOI): https://doi.org/10.15138/G3201T.


Introduction
Methane (CH 4 ) is a potent greenhouse gas that accounts for approximately 20 % (0.48 W m −2 ) of anthropogenic greenhouse gas radiative forcing in the lower atmosphere (Ciais et al., 2013). Atmospheric CH 4 levels have more than doubled over the industrial era, increasing from about 700 ppb in the year 1750 to > 1800 ppb today (Saunois et al., 2016). Atmospheric CH 4 stabilized from 2000 to 2006 and increased again after 2007 (Nisbet et al., 2014; Dlugokencky et al., 2011). Specific contributions of natural and anthropogenic sources of CH 4 to this renewed increase, and to the global CH 4 budget in general, remain unclear (Kirschke et al., 2013; Saunois et al., 2016). Wetlands and agriculture have been suggested as dominant sources of renewed increases in CH 4 emissions (Dlugokencky et al., 2009, 2011; Bousquet et al., 2011; Bloom et al., 2010; Nisbet et al., 2014; Patra et al., 2016; Schaefer et al., 2016). The recent surge in unconventional oil and gas development in North America and growing awareness of CH 4 emissions from oil and gas infrastructure (Howarth et al., 2011; Karion et al., 2013; Brandt et al., 2014; Bruhwiler et al., 2017) inform alternative explanations for the increase in atmospheric CH 4 (Hausmann et al., 2016; Helmig et al., 2016; Rice et al., 2016). Finally, increasing emissions of coal-related CH 4 , particularly from China (Bergamaschi et al., 2013; Nisbet et al., 2014), and changes in oxidative sinks (Rigby et al., 2017) have been hypothesized as other possible reasons for the recent increase in atmospheric CH 4.
Despite the critical importance of accurate source signature data, there has been no recent comprehensive effort to define globally representative CH 4 source signatures for the atmospheric modeling community (Table 1). Early studies from the 1980s and early 1990s provided tables of average values for each of the various CH 4 source categories, typically with little or no metadata on sample size or geographic origin (Deines, 1980; Quay et al., 1988; Stevens and Engelkemeir, 1988; Whiticar, 1989, 1993). Subsequent studies referred back to the original data tables with little accounting of sample size, error and/or range, or geographic and geological representation (e.g., Fung et al., 1991; Levin, 1994; Ferretti et al., 2005; Quay et al., 1999; Mikaloff-Fletcher et al., 2004; Bousquet et al., 2006). Other top-down studies have often assumed a set of canonical end-member values used in previous modeling studies, without reference to the primary data (Gupta et al., 1996; Tyler et al., 1999; Houweling et al., 2000; Lassey et al., 2007; Neef et al., 2010; Monteil et al., 2011). Moreover, model sensitivity to source signature values is rarely tested (e.g., Schwietzke et al., 2014a, 2016; Rice et al., 2016).
There is in fact much literature on the molecular and isotopic composition of natural and anthropogenic sources of CH 4 , going back decades. The literature has grown significantly since the early studies of the 1980s from which most canonical source signature values were originally derived. This paper describes a global database of δ 13 C CH 4 , δ 2 H CH 4 , and C 2 H 6 : CH 4 source signatures for fossil fuel, microbial and biomass burning sources of CH 4 compiled from public domain sources. Data distributions are discussed within the context of existing and evolving natural gas genetic origin frameworks (Schoell, 1983; Whiticar et al., 1986; Whiticar, 1989, 1999; Etiope, 2015; Milkov et al., 2017). The database is intended primarily for use by atmospheric scientists working on top-down modeling of CH 4 emissions on regional to global scales. This "living" database will be updated every 2-3 years so that the modeling community has access to the most up-to-date and comprehensive collection of CH 4 source signature data available. The database may also prove useful for petroleum geoscientists interested in genetic characterization of natural gas across different basins and formations. Hydrogeochemists may use the database for analyzing the origin and fate of hydrocarbon gases in groundwater in specific oil- and gas-producing basins.

Database version
The 2017 version of the source signature database can be accessed from the NOAA Earth Systems Research Laboratory at this link: https://www.esrl.noaa.gov/gmd/ccgg/arc/?id=123. This version supersedes an earlier version (Sherwood et al., 2016) published as a complement to Schwietzke et al. (2016). Whereas the previous version reported values of δ 13 C CH 4 only, the 2017 version expands the range of geochemical parameters, as described in Sect. 2.4 below. Other minor changes to the database are noted in the database "Readme" file.

Types of gas
The database is separated into fossil fuel and non-fossil fuel sources of CH 4. Fossil fuel sources comprise conventional natural gas, coal gas, and shale gas. Shale gas is included as a separate category because of growing interest in CH 4 emissions associated with this form of unconventional gas production. Both conventional and shale gas include natural gas coproduced with oil. Coal gas includes both coal mine gases and coal bed methane. All three fossil fuel gas types are representative of reservoir gases measured from producing or previously producing oil or gas wells or coal mines. Data from exploratory wells were excluded, as these are not broadly representative of atmospheric emissions. The database does not currently distinguish between oil-associated and non-oil-associated gas or between different ranks of coal (i.e., lignite, bituminous, and anthracite). However, the database includes the locations of each sample, which may be used to make this distinction based on activity data (e.g., production based on coal rank at a given coal mine). Pipeline (processed) distribution gases are not included in the database, primarily due to lack of data availability. Users of this database should be aware that, due to preferential stripping of alkane components, processed gases may have different molecular compositions than the reservoir gases represented herein. Also, the molecular composition of distribution gases in any region may change over time (Schwietzke et al., 2014b). In contrast to the intentional changes of the molecular composition of natural gas, isotopic signatures are thought to be relatively unaffected by gas processing except for mixing of two or more isotopic end members (Schoell et al., 1993). Geological seepage gases, i.e., the natural source component of the fossil fuel category (Etiope et al., 2008; Etiope, 2009, 2015), are not included in this database. A global database of onshore seeps is discussed in Etiope (2015) and available from CGG (2015). The composition of seepage gases and their influence on the global CH 4 budget is the subject of ongoing research.
Non-fossil fuel sources of CH 4 in the database consist of modern microbial sources and biomass burning. Modern microbial data are from rice paddies, ruminants (C3- and C4-plant-eating cattle, sheep, goats, and their manure), termites, waste and/or landfills, and wetlands (bogs and/or peat, deltas, estuaries, floodplains, lagoons, lakes, marshes, ponds, rivers, swamps and tundra). Biomass burning data are from brush, forests and/or woodlands, grasses, and pastures.

Data gathering
Data were obtained from the peer-reviewed literature, conference proceedings, graduate theses, and government reports and databases. Government databases include the US Geological Survey (USGS) Energy Geochemistry Database (https://energy.usgs.gov/GeochemistryGeophysics/GeochemistryLaboratories/GeochemistryLaboratories-GeochemistryDatabase.aspx), the Geological Survey of the Netherlands (NLOG) database (available by request through http://nlog.nl/en/gas-properties), and the Geoscience Australia (ORGCHEM) database (available by request through http://www.ga.gov.au/search/index.html#/). Google Scholar, Web of Science, the American Association of Petroleum Geologists (AAPG; http://www.datapages.com/), and the Society of Petroleum Engineers (SPE; http://www.spe.org) were used to search for data. The use of English language search tools presented an unavoidable bias in data gathering. Searches focused on publications with gas isotopic data. Since gas compositional analysis is a prerequisite for subsequent isotopic analysis in most laboratories, gas compositional data are included with δ 13 C CH 4 and δ 2 H data if reported in the original source. Note that the literature contains far more publications with gas compositional data alone. All of the data can be traced back to original sources using the references provided. To maintain data transparency, industry proprietary data were excluded.
The database is separated into fossil fuel and non-fossil fuel (modern microbial and biomass burning sources) data tables for two practical reasons. First, the petroleum geochemistry literature tends to report analyses for discrete samples, for example, production gas analyses from individual wellheads or analyses from discrete stratigraphic horizons in a wellbore. By contrast, the literature on non-fossil fuel sources of CH 4 more commonly reports statistical summaries (e.g., multiple measurements at a given location and time) as opposed to discrete sample data; because of this, the non-fossil data comprise n = 1972 measurements represented in 107 rows of data. Second, fossil fuel data usually include gas composition of C 2 + alkanes and non-alkane gases and isotopic compositions of C 2 + alkanes. The non-fossil fuel literature rarely reports data on these additional parameters, even though microbial processes in fact produce C 2 +, albeit in negligible quantities (< 0.1 %) compared to CH 4 (Oremland, 1981; Ladygina et al., 2006; Xie et al., 2013). Rather than trying to fit these two fundamentally different types of data into a common table format, they are presented separately.

Analytical parameters
Table 2 lists analytical parameters included in the database. For fossil fuel gases, parameters include molar percent composition of non-alkane gases (N 2 , O 2 , CO 2 , Ar, H 2 , H 2 S, He) and C 1 to C 6 alkanes (CH 4 , C 2 H 6 , C 3 H 8 , iso-C 4 H 10 , n-C 4 H 10 , iso-C 5 H 12 , n-C 5 H 12 , C 6 H 14 ) as well as δ 13 C and δ 2 H isotopic ratios of C 1 to C 5 alkanes. Though less commonly used in 3-D inverse modeling studies of the global CH 4 budget, alkane compositions are important for source attribution in regional air quality and emissions studies (Karion et al., 2013; Pétron et al., 2014; Peischl et al., 2015; Kort et al., 2016). The δ 13 C and δ 2 H isotopic signatures of C 2 + alkanes may also prove useful as source tracers with future advances in analytical instrumentation. For non-fossil fuel samples, δ 13 C and δ 2 H of CH 4 are the only parameters provided in the database.

Stable isotope notation and standardization
Stable isotopic data are reported in conventional delta notation: δX = (R sample / R standard − 1) × 1000, in which δX = δ 13 C or δ 2 H and R = 13 C / 12 C or 2 H / 1 H, respectively. δ 13 C data are reported on the Pee Dee Belemnite/Vienna Pee Dee Belemnite (PDB/VPDB) scale and δ 2 H data are reported on the Vienna Standard Mean Ocean Water (VSMOW) scale. The Vienna version of the PDB scale, signifying that the original PDB reference material used to define the scale ran out and was replaced with the NBS-19 reference material, is nominally identical to the previous PDB scale (Gröning, 2004). For references in which the scales were not stated explicitly, we assume the use of the PDB/VPDB and VSMOW scales, based on the fact that the use of PDB to define the δ 13 C scale and VSMOW to define the δ 2 H scale goes back to the 1950s and early 1960s (Craig, 1953, 1961) and that the oldest reference in the database (Dubrova and Nesmelova, 1968) postdates formal recognition of these scales.
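As a minimal sketch of the delta notation defined above, the following converts an isotope ratio to a per mil delta value and back. The function names and the numerical standard ratio are illustrative assumptions; the ratio shown is a placeholder, not a certified VPDB or VSMOW value.

```python
def delta_permil(r_sample: float, r_standard: float) -> float:
    """Delta value (per mil) of a sample ratio relative to a standard ratio."""
    return (r_sample / r_standard - 1.0) * 1000.0


def ratio_from_delta(delta: float, r_standard: float) -> float:
    """Invert the delta notation to recover the sample isotope ratio."""
    return (delta / 1000.0 + 1.0) * r_standard


# Placeholder standard ratio, chosen only for illustration (not a certified value).
R_STANDARD = 0.0112

# A sample whose heavy-to-light ratio is 5 % below the standard
# has a delta value of about -50 per mil.
r_sample = 0.95 * R_STANDARD
print(f"{delta_permil(r_sample, R_STANDARD):.1f}")
```

Because delta values are defined relative to a standard, values reported on different scales cannot be compared without first confirming which standard ratio was used.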
It should be noted that stable isotope laboratories calibrate their data against working and/or secondary standards that have been tied to the PDB/VPDB and VSMOW scales (e.g., Dai et al., 2012). The NG-1, NG-2, and NG-3 suite of natural gas isotopic standards served this purpose beginning around the year 1984 but have since been exhausted (Hut, 1987). The current lack of International Atomic Energy Agency (IAEA) or National Institute of Standards and Technology (NIST) isotopic standards for natural gas or methane remains an ongoing problem. Unfortunately, this level of analytical detail often goes unreported in the database references.

Data screening
Data screening for the fossil fuel data consisted of the following steps. (1) Location metadata (country, state or region, basin, formation) were checked for logical compatibility. (2) To aid searching for basin-specific data, wherever possible fossil fuel data were assigned to a corresponding sedimentary basin in the Robertson Tellus Sedimentary Basins of the World map (available at http://www.datapages.com/gis-map-publishing-program/gis-open-files/global-framework/robertson-tellus-sedimentary-basins-of-the-world-map). (3) Data duplicates were merged. This step was particularly important for the USGS Energy Geochemistry Database as it includes data from several other sources including the Gas Research Institute report on US natural gas analyses (Jenden and Kaplan, 1989), peer-reviewed papers, and other USGS data reports. For merged duplicates, references to both sources are provided. (4) Obvious outliers, such as individual gas concentrations greater than 100 %, O 2 concentrations greater than 21 %, total gas compositions summing to greater than 100 % (plus 10 % to allow for analytical and rounding errors), and positive values of δ 13 C and δ 2 H were omitted. For the non-fossil fuel data, no data-screening steps were taken; data are provided as originally reported in the respective sources.
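The outlier criteria in step (4) can be sketched as a simple record filter. This is an illustration under stated assumptions, not the procedure actually used to build the database: the column names are hypothetical, the total-composition limit is read as 110 %, and only the CH 4 isotope columns are checked.

```python
from typing import Mapping, Optional

# Hypothetical column names for molar percent compositions and CH4 isotope ratios.
GAS_COLUMNS = (
    "N2", "O2", "CO2", "Ar", "H2", "H2S", "He",
    "CH4", "C2H6", "C3H8", "iC4H10", "nC4H10", "iC5H12", "nC5H12", "C6H14",
)
DELTA_COLUMNS = ("d13C_CH4", "d2H_CH4")


def passes_screening(record: Mapping[str, Optional[float]]) -> bool:
    """Return False if a record matches any of the step (4) outlier criteria."""
    concentrations = [record[c] for c in GAS_COLUMNS if record.get(c) is not None]
    # Any single gas concentration above 100 %.
    if any(value > 100.0 for value in concentrations):
        return False
    # O2 above its atmospheric abundance of 21 %.
    o2 = record.get("O2")
    if o2 is not None and o2 > 21.0:
        return False
    # Total composition above 100 % plus a 10 % allowance (assumed to mean 110 %).
    if sum(concentrations) > 110.0:
        return False
    # Positive delta values for CH4.
    if any(record.get(c) is not None and record[c] > 0.0 for c in DELTA_COLUMNS):
        return False
    return True


sample = {"CH4": 95.0, "C2H6": 3.0, "O2": 0.5, "d13C_CH4": -45.0}
print(passes_screening(sample))
```

Missing values are skipped rather than treated as failures, since many source publications report only a subset of the parameters.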

Data quality
This database was not subject to a data quality assessment. The data were generated from countless laboratories in different countries over a span of 5 decades. Source publications also span a wide range of academic rigor, from conference proceedings to peer-reviewed journals. Milkov (2010) analyzed natural gas data from the West Siberia Basin and found that Soviet-era papers from the 1970s reported δ 13 C CH 4 values that were too negative by ∼ 7 ‰ compared to data generated in the late 1990s by US, German, and Russian labs, while Soviet-era papers from the 1980s reported values that were too positive by ∼ 4.5 ‰. We make no attempts to correct for these systematic errors; rather we caution users of this database to evaluate and use the data appropriately. By sheer number of samples (n = 10 706) and data sources, systematic errors inherent in any single dataset average out over the whole database, while random errors have a negligible impact on measures of central tendency.

Data summary
Fossil fuel sources comprise 8734 data records from 149 published sources. Table 3 provides a summary of the number of countries, basins, fields, formations, and published sources by gas type (conventional gas, coal gas, shale gas) and specified analytical parameter (δ 13 C CH 4 , δ 2 H CH 4 , C 2 H 6 : CH 4 ). Non-fossil fuel sources comprise 1972 data records from 41 published sources.
Representativeness of the fossil fuel data was assessed against global production statistics (Fig. 3). This was done at the level of individual countries owing to difficulty in obtaining production statistics at the sub-country level for all the countries in the database. We note that reservoir gases vary compositionally and isotopically within individual countries, basins, fields, and formations (Fig. 4). Within an individual formation, for example, natural gas can range from microbial gas in shallow and/or thermally immature areas, to oil-associated gas in deeper or thermally mature areas, to unassociated dry gas in thermally over-mature areas. Similarly, the type (i.e., rank) of coal gas data presented for any specific country may not be representative of the dominant coal type produced in that country. Despite isotopic and compositional variability within countries, country-level analysis is the finest practical spatial resolution that can be assessed for the global dataset. Shale gas was excluded from this analysis of representativeness since shale gas production is limited mostly to Canada and the US. For the parameter δ 13 C CH 4 , the database is representative of 84 % of global natural gas production and 80 % of global coal production for the time period 2000-2015. For conventional gas, the countries with the highest numbers of samples with δ 13 C CH 4 are the US (n = 2042), China (834), Russia (556), Canada (402), and Australia (400) (Fig. 3). Countries with no conventional gas data include Algeria, Malaysia, Turkmenistan, the United Arab Emirates, and Venezuela, which together account for 12.2 % of global natural gas production. For coal gas, the countries with the largest sample sizes include the US (722), China (196), Australia (110), and Poland (105) (Fig. 3). Countries with no coal gas data representation include India, Indonesia, Kazakhstan, Ukraine, and Colombia, which together account for 14.5 % of global coal production. For the parameter δ 2 H CH 4 , the database is representative of 73 % of global natural gas production and 74 % of global coal production. For C 2 H 6 : CH 4 ratio data, the database is representative of 76 % of global natural gas production and 31 % of global coal production. Sample biases can be mitigated by weighting values by each country's fraction of global gas or coal production (Schwietzke et al., 2016) or by other methods suited to the specific data use.
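The production weighting mentioned above can be sketched as a weighted mean of country-level signatures. This is a simplified illustration, not the full method of Schwietzke et al. (2016); the function name, country labels, and all numbers are hypothetical.

```python
from typing import Dict


def production_weighted_mean(
    country_means: Dict[str, float], production: Dict[str, float]
) -> float:
    """Weight each country's mean signature by its share of production.

    Countries lacking either a signature or a production figure are skipped.
    """
    shared = [c for c in country_means if c in production]
    total = sum(production[c] for c in shared)
    if total <= 0:
        raise ValueError("total production must be positive")
    return sum(country_means[c] * production[c] for c in shared) / total


# Hypothetical country-mean d13C-CH4 values (per mil) and production (arbitrary units).
means = {"A": -40.0, "B": -50.0}
output = {"A": 3.0, "B": 1.0}
print(production_weighted_mean(means, output))  # -42.5
```

In this example the unweighted mean would be −45 ‰; weighting shifts the result toward the larger producer, which is how heavily sampled but small producers are kept from dominating the estimate.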
Representativeness is generally poorer for the non-fossil data, owing in part to the smaller total sample sizes and the lack of data for several key areas. For example, there are few microbial or biomass burning data from Southeast Asia and Africa, two areas of significant wetland, termite, and biomass burning CH 4 emissions. Arctic wetlands are also underrepresented in the database. These areas constitute important data gaps that should be targeted for more intensive data mining and/or future field studies.

Genetic characterization
Figure 5 shows a natural gas genetic characterization plot of δ 13 C CH 4 versus δ 2 H CH 4 , first presented in Whiticar et al. (1986) and modified in Whiticar (1989, 1999). The characterization frameworks in Fig. 5 and in other plots of δ 13 C CH 4 versus alkane molecular compositions (Bernard, 1978; Schoell, 1983; Faber and Stahl, 1984) were originally developed by researchers at the German Federal Institute for Geosciences and Natural Resources in the 1970s and early 1980s. These plots were derived largely from proprietary industry data. Because the data could not be publicized, the characterization plots were published without showing the underlying data used in their development. These characterization schemes are still widely used to this day, despite the fact that the literature data on gas isotope ratios and compositions have expanded by orders of magnitude since the 1980s.
Figure 5 shows the distribution of conventional gas, coal gas, and shale gas in relation to the major genetic fields: thermogenic, microbial CO 2 reduction, and microbial fermentation. It also shows the field for gases from geothermal, hydrothermal, and crystalline rocks. Overall, the low percentage of samples falling outside any of the principal genetic fields in Fig. 5 indicates that this original classification scheme captures essentially the full range of isotopic variability in natural gases; however, the breakdown of sample counts by genetic origin changes with revision to the classic characterization scheme. For example, while the canonical thermogenic field assumes a δ 13 C value of −50 or −55 ‰ as the limit between thermogenic and microbial CH 4 (Stahl, 1974; Schoell, 1983; Whiticar et al., 1986), recent work extends the thermogenic field to isotopically lighter values; see below.
Figure 6 shows a more recent version of the δ 13 C CH 4 versus δ 2 H CH 4 plot, updated in Etiope (2015) based on a previous, unpublished version of a fossil fuel reservoir dataset. This diagram distinguishes more types of thermogenic gas, following Etiope and Sherwood Lollar (2013) and Hunt (1996), and reports an updated genetic field for abiotic gas, i.e., gas formed by chemical reactions of inorganically derived gases such as carbon dioxide (CO 2 ) and hydrogen (H 2 ) and not from degradation of organic matter (Etiope and Schoell, 2014). The thermogenic field in Fig. 6 extends to δ 13 C = −67 ‰ due to the existence of low-maturity thermogenic gas (Rowe and Muehlenbachs, 1999; Milkov and Dzou, 2007) and secondary alterations (biodegradation; Milkov, 2010, 2011) that would otherwise be mistaken for primary microbial gas.
Earth Syst. Sci. Data, 9, 639-656, 2017, www.earth-syst-sci-data.net/9/639/2017/
Of the 8734 fossil fuel samples in the database, a subset of n = 2861 have both δ 13 C and δ 2 H data and are thus represented on the plot. For conventional gas (n = 1951 δ 13 C − δ 2 H data pairs), a majority (78 %) of the samples plot within the thermogenic field. A smaller percentage of samples plot within the microbial field (17 %) or the abiotic field (5 %). For coal gas (n = 511), data are more evenly distributed between thermogenic (56 %) and microbial (39 %) fields, with a smaller percentage falling within the abiotic (2 %) field. Because of overlapping genetic fields, percentages sum to > 100 %. Additionally, it is important to note that conventional or coal gases falling within the abiotic field actually have a dominant thermogenic origin: these δ 13 C-enriched gases are, in fact, mainly from over-mature (late-stage catagenesis) source rocks from northwestern Germany (Rotliegend) and China (Songliao and Tarim basins). Further refinement of the genetic characterization plot should therefore account for these late-stage thermogenic gases. Shale gas data (n = 396) fall almost entirely within the thermogenic field (91 %), with the majority of the data clustered toward the dry gas (T D in Fig. 6) end of the thermogenic maturity spectrum. Non-fossil source data (rice paddies, ruminants, waste, wetlands, termites) plot entirely within the microbial fermentation field. Biomass burning has a characteristically enriched isotopic signature, falling within the abiotic field despite a fundamentally different generation pathway compared to abiotic natural gas. A revision of the genetic diagram is in fact in progress (Milkov et al., 2017), and statistics of our database will be readjusted, taking into account this new reassessment of microbial versus thermogenic isotopic genetic characterization.

Importance of isotopically light natural gas and coal gas
A long-standing view in the petroleum geochemical literature held that "more than 20 % of the world's discovered gas reserves are of biogenic origin" (Rice and Claypool, 1981). This biogenic gas was loosely defined by cutoffs of δ 13 C CH 4 < −55 ‰ and < 2 % C 2 + alkanes (C 2 H 6 through pentane (C 5 H 12 )). For conventional natural gas in the current database, 14 % of the samples have δ 13 C CH 4 < −55 ‰ and 23 % have a C 2 H 6 : CH 4 ratio < 0.02. These percentages bracket the original Rice and Claypool (1981) estimate. However, it is now known that natural gas within the δ 13 C CH 4 and % C 2 + cutoffs encompasses primary microbial gas (i.e., biogenic gas in Rice and Claypool, 1981; formed from microbial CO 2 reduction and methyl fermentation in shallow sediments), secondary microbial gas (formed from biodegradation of thermogenic hydrocarbons; Zengler et al., 1999; Head et al., 2003; Jones et al., 2008), and low-maturity thermogenic gas (Rowe and Muehlenbachs, 1999; Milkov and Dzou, 2007). Analysis of the δ 13 C and molecular ratios of C 2 + alkanes and CO 2 is often the only means of distinguishing between these three types of gas (Milkov, 2011). At the global level, primary and secondary microbial gases are thought to account for ∼ 3-4 % and ∼ 5-11 %, respectively, of conventional recoverable natural gas reserves (Milkov, 2011). Secondary microbial gas accounts for a larger share of global conventional gas production: giant Cenomanian gas pools of secondary microbial CH 4 (mean [...]
Figure 6 caption (partially recovered): [...] Hunt (1996) and Milkov (2011) and abiotic gas from Etiope and Sherwood Lollar (2013) and Etiope and Schoell (2014). The reversed vertical and horizontal axes as compared to Fig. 5 follow conventions established previously to emphasize abiotic fields. M: microbial; T: thermogenic; A: abiotic; MCR: microbial CO 2 reduction; MAF: microbial acetate fermentation; ME: microbial in evaporitic environment; T O : thermogenic with oil; T C : thermogenic with condensate; T D : dry thermogenic; T H : thermogenic with high-temperature CO 2 -CH 4 equilibration; T LM : thermogenic low maturity; GV: geothermal-volcanic systems; S: serpentinized ultramafic rocks; PC: Precambrian crystalline shields.
Microbial methanogenesis is even more significant for coals (Rice, 1993), with an approximately even distribution between thermogenic and microbial genetic origins (Figs. 5, 6). The two largest coal mines in the world (North Antelope Rochelle and Black Thunder mines) are located in the Powder River Basin, Wyoming, US. Coal gas from these formations is microbial (fermentation) in origin (mean δ 13 C CH 4 = −59.1 ‰, n = 267; mean δ 2 H CH 4 = −309.6 ‰, n = 118). However, as discussed above, we note that some gas, traditionally considered microbial because of its low δ 13 C values, may actually have a thermogenic origin. Coals can also generate secondary microbial gas (Scott et al., 1994).
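The cutoff statistics discussed in this section (for example, the share of samples with δ 13 C CH 4 < −55 ‰ or C 2 H 6 : CH 4 < 0.02) amount to counting the fraction of reported values below a threshold. The sketch below assumes missing values are stored as `None`; the function name and sample values are hypothetical and are not drawn from the database.

```python
from typing import Iterable, Optional


def fraction_below(values: Iterable[Optional[float]], cutoff: float) -> float:
    """Fraction of reported (non-missing) values that fall below a cutoff."""
    reported = [v for v in values if v is not None]
    if not reported:
        raise ValueError("no reported values")
    return sum(1 for v in reported if v < cutoff) / len(reported)


# Hypothetical d13C-CH4 values (per mil) and C2H6:CH4 ratios, for illustration only.
d13c_ch4 = [-60.0, -40.0, None, -58.0, -30.0]
c2_c1_ratio = [0.001, 0.08, 0.015, None, 0.12]

print(fraction_below(d13c_ch4, -55.0))    # 0.5
print(fraction_below(c2_c1_ratio, 0.02))  # 0.5
```

As the text stresses, a sample passing these cutoffs is not necessarily primary microbial gas; secondary microbial and low-maturity thermogenic gases fall in the same range.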

Data distributions
Figures 7 and 8 show normalized probability distributions of δ 13 C CH 4 and δ 2 H CH 4 for fossil fuel and modern microbial processes (with their respective subcategories) and biomass burning sources of CH 4 .The distributions show wide overlap between different CH 4 source categories, thus highlighting the critical need for robust weighting schemes that result in globally or regionally representative measures of central tendency (discussed below).
Data distributions for modern microbial processes have relatively normal distributions with tight overlap between the different subcategories.The distributions for biomass burning show characteristic bimodality, caused by differences between isotopically lighter C3 and isotopically heavier C4 vegetation.Fossil fuel δ 13 C and δ 2 H exhibit left-skewed (conventional and shale gas) or bimodal (coal) distributions arising from the presence of microbial and low-maturity thermogenic gas as described above.This also leads to relatively wider data ranges than the non-fossil categories.
Figure 7 also indicates the δ 13 C of atmospheric CH 4 (∼ −53.6 ‰) before fractionation by photodegradation, calculated as measured atmospheric δ 13 C CH 4 (mean −47.3 ‰ in the year 2016; White et al., 2017) plus an average fractionation factor ε = −6.3 ± 0.8 ‰ (Schwietzke et al., 2016). The δ 13 C of the atmosphere before fractionation represents the "hinge point" upon which CH 4 emissions fluxes are estimated by isotopic mass balance (e.g., Whiticar and Schaefer, 2007). Modern microbial processes have δ 13 C CH 4 signatures falling to the left of the hinge point; thus, lower δ 13 C CH 4 requires lower emissions to isotopically balance fossil fuel and biomass burning sources; higher δ 13 C CH 4 requires higher emissions. Conversely, fossil fuel and biomass burning source categories have δ 13 C CH 4 signatures falling to the right of the hinge point; thus, lower δ 13 C CH 4 requires higher emissions; higher δ 13 C CH 4 requires lower emissions. Biomass burning falls furthest from the hinge point (mean δ 13 C CH 4 = −26.2 ± 4.8 ‰, unweighted by proportion of C3 and C4 vegetation). Therefore, it has the most leverage on the isotopic mass balance.
In Fig. 8 the pre-fractionation hinge point is more poorly constrained, owing to greater uncertainty in measured atmospheric δ 2 H CH 4 (−95 ± 5 ‰) and, more importantly, uncertainty in the estimated fractionation factor ε = −235 ± 80 ‰ (Gierczak et al., 1997).Modern microbial δ 2 H CH 4 signatures are within the range of the estimated pre-fractionation atmosphere.Biomass burning and fossil fuel signatures fall to the right of the hinge point.Hence, lower δ 2 H CH 4 requires higher emissions and higher δ 2 H CH 4 requires lower emissions for both these categories.
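The hinge-point arithmetic and its effect on emissions estimates can be illustrated with a deliberately simplified two-end-member mass balance. The measured values and fractionation factors are those quoted above; the end-member signatures and function names are hypothetical, and biomass burning is ignored, so this is a sketch of the reasoning rather than a budget calculation.

```python
def pre_fractionation(measured_delta: float, epsilon: float) -> float:
    """Atmospheric signature before sink fractionation: measured value plus epsilon (per mil)."""
    return measured_delta + epsilon


def fossil_fraction(hinge: float, d_microbial: float, d_fossil: float) -> float:
    """Fossil share of emissions in a two-end-member isotopic mass balance.

    Solves hinge = f * d_fossil + (1 - f) * d_microbial for f.
    """
    return (hinge - d_microbial) / (d_fossil - d_microbial)


# Values quoted in the text.
d13c_hinge = pre_fractionation(-47.3, -6.3)   # about -53.6 per mil
d2h_hinge = pre_fractionation(-95.0, -235.0)  # about -330 per mil

# Hypothetical end-member signatures (per mil), for illustration only.
D_MICROBIAL = -62.0
f_heavy = fossil_fraction(d13c_hinge, D_MICROBIAL, d_fossil=-40.0)
f_light = fossil_fraction(d13c_hinge, D_MICROBIAL, d_fossil=-44.0)
print(f"fossil share with heavier fossil signature: {f_heavy:.2f}")
print(f"fossil share with lighter fossil signature: {f_light:.2f}")
```

With the hinge point and microbial signature held fixed, moving the fossil signature closer to the hinge point (lighter) raises the fossil share needed to balance the budget, which is the direction of the effect described in the text.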
These results highlight the possibility that widespread use of too-heavy δ 13 C CH 4 and δ 2 H CH 4 fossil fuel source signatures could have led to systematic underestimation of fossil fuel emissions in the CH 4 budget literature.Indeed, Schwietzke et al. (2016) reanalyzed the global CH 4 budget using weighted source signature data calculated from an earlier version of this database (Sherwood et al., 2016) and showed that total fossil fuel emissions (excluding geological seepage) are about 50 % higher than previously estimated.
Database users are encouraged to adopt appropriate weighting criteria for estimating spatially averaged source signatures. For instance, at the global level, Schwietzke et al. (2016) developed a method to weight fossil fuel δ 13 C CH 4 data at the country level and non-fossil fuel δ 13 C CH 4 data at the emissions subcategory level. Weighting fossil fuel δ 13 C CH 4 data at the basin level may be practical for some countries with a sufficient sample size. Basin-level gas production statistics may be used in the weighting procedure as a proxy for basin-level CH 4 emissions. However, note that basin-level CH 4 emissions may be correlated with basin-level δ 13 C CH 4. A basin with mature dry gas and no associated oil production (and thus relatively heavy δ 13 C CH 4 ) typically employs less gas processing infrastructure (gas separators, combustors, storage tanks) than a basin with associated gas production (and thus relatively light δ 13 C CH 4 ). The former is therefore likely to emit less CH 4 per unit of gas production than the latter. This is substantiated by CH 4 emissions estimates from multiple US oil and gas basins. For example, the dry gas basins Marcellus Shale and Fayetteville are estimated to emit on average 0.3 and 1.9 %, respectively, per unit of gas produced (Peischl et al., 2015), whereas the wet gas Denver and Uinta basins emit on average 4.1 and 8.9 %, respectively, per unit of gas produced (Karion et al., 2013; Pétron et al., 2014). Thus, using gas production statistics to weight individual basins without knowledge of the respective CH 4 emissions may introduce biases.
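The potential bias from production-only weighting can be sketched by comparing it with weighting by estimated emissions (production times loss rate). The loss rates below are the ones quoted in the text; the production figures and basin-mean δ 13 C CH 4 values are hypothetical placeholders chosen only to show the direction of the effect.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted arithmetic mean; weights must sum to a positive number."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


# Loss rates (% of gas produced) quoted in the text.
loss_rate_percent = {"Marcellus": 0.3, "Fayetteville": 1.9, "Denver": 4.1, "Uinta": 8.9}

# Hypothetical production (arbitrary units) and basin-mean d13C-CH4 (per mil).
production = {"Marcellus": 1.0, "Fayetteville": 1.0, "Denver": 1.0, "Uinta": 1.0}
d13c = {"Marcellus": -38.0, "Fayetteville": -38.0, "Denver": -46.0, "Uinta": -46.0}

basins = list(loss_rate_percent)
values = [d13c[b] for b in basins]

by_production = weighted_mean(values, [production[b] for b in basins])
by_emissions = weighted_mean(
    values, [production[b] * loss_rate_percent[b] / 100.0 for b in basins]
)
print(f"production-weighted: {by_production:.1f} per mil")
print(f"emissions-weighted:  {by_emissions:.1f} per mil")
```

When the isotopically lighter wet gas basins also have the higher loss rates, the emissions-weighted signature comes out lighter than the production-weighted one, so production weighting alone would bias the signature heavy.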

Conclusions
The database described here is the most comprehensive CH 4 source signature database compiled to date. For the fossil fuel category (conventional gas, shale gas, and coal gas), the data comprise 8734 unique records representing 84 and 73 % (respectively for δ 13 C CH 4 and δ 2 H CH 4 ) of global conventional natural gas production and 80 and 74 % (respectively for δ 13 C CH 4 and δ 2 H CH 4 ) of global coal production at the country level. For the non-fossil category (rice paddies, ruminants, termites, landfills and/or waste, wetlands, and biomass burning), the data comprise 1972 records from 19 countries on five continents. Even so, additional data may help further reduce uncertainty in the global CH 4 budget, especially for regionally distinct CH 4 source attribution. In particular, additional wetland (especially Arctic) and ruminant δ 13 C CH 4 data are needed given their large contribution to the global CH 4 budget. Database users are encouraged to adopt appropriate weighting criteria to account for variability in emissions specific to each source category.
Unweighted mean δ 13 C CH 4 and δ 2 H CH 4 signatures for the non-fossil subcategories are generally within a few per mil of typical values used in the CH 4 budget modeling literature. Unweighted mean δ 13 C CH 4 and δ 2 H CH 4 signatures for the fossil category, by contrast, are significantly lighter than the canonical values, particularly for coal gas. The origin of this bias is unknown but may be caused in part by a tendency among CH 4 budget modelers to reference other modeling studies instead of the primary literature on isotopic characterization of natural gas. In addition, an evolving understanding of natural gas genetic origins blurs the traditional cutoffs between microbial (biogenic) and thermogenic natural gas: fossil fuel CH 4 is not exclusively thermogenic, and the δ 13 C CH 4 of thermogenic CH 4 can be < −55 ‰.
Finally, the database includes a relatively new category of fossil fuel CH 4 , shale gas; these data will become more useful as this resource assumes an increasing share of global natural gas production. The availability of gas molecular concentrations will provide additional end-member constraints on fossil fuel emissions on global and regional scales. This living database will be updated every 2–3 years to provide a comprehensive and up-to-date resource for the CH 4 modeling community.

Figure 4. Strip chart of conventional gas δ 13 C CH 4 by continent and sedimentary basin, demonstrating high levels of variability within individual basins.

Figure 5. Genetic characterization plot of δ 13 C CH 4 versus δ 2 H CH 4 showing data distributions with respect to genetic domains, as traced from Whiticar (1999). The atmospheric value represents global average atmospheric CH 4 in the year 2015.

Figure 7. Normalized probability density distributions for the δ 13 C CH 4 of microbial, fossil, and biomass burning sources of methane. The flux-weighted average of all sources produces a mean atmospheric δ 13 C CH 4 of ∼ −53.6 ‰, as inferred from measured atmospheric δ 13 C CH 4 and isotopic fractionation associated with photochemical methane destruction (see text).

Figure 8. Normalized probability density distributions for the δ 2 H CH 4 of microbial, fossil, and biomass burning sources of methane. The flux-weighted average of all sources produces a mean atmospheric δ 2 H CH 4 of between −245 and −415 ‰, as inferred from measured atmospheric δ 2 H CH 4 and isotopic fractionation associated with photochemical methane destruction (see text).

Table 1. Representative list of atmospheric modeling studies in which isotopic ratios were used to constrain emissions from fossil fuel sources of CH 4 , showing values of δ 13 C CH 4 and δ 2 H CH 4 used and the source of those values.

Table 3. Fossil fuel data: number of countries, basins, fields, formations, and references by gas type and specified chemical parameter (δ 13 C CH 4 , δ 2 H CH 4 , and C 2 H 6 : CH 4 ). Does not account for unknown and/or unspecified basins, fields, or formations.

Table 5. Database summary statistics (unweighted) by gas type and parameter.