A spatial database of wildfires in the United States, 1992-2011



Introduction
The statistical analysis of wildfire activity has long been a critical component of national wildfire planning, operations, and research in the United States (US) (Show and Kotok, 1923; Brown, 1959; Hardy and Hardy, 2007). The analysis of historical fire and weather records, for example, is currently integral to national fire danger rating applications (Andrews and Bradshaw, 1997; Bradshaw and McCormick, 2000; Andrews et al., 2007), fire-potential forecast models (Andrews et al., 2007), and several widely used geospatial fire modeling systems (e.g., Stratton, 2006; Sanborn, 2009; Finney et al., 2011). These operational systems are relied upon to generate consistent national data for risk assessment, planning, budget formulation, and decision support at multiple scales (Buckley et al., 2006; Hardy and Hardy, 2007; Calkin et al., 2010; Wolf and Buckley, 2010; Finney et al., 2011; Noonan-Wright et al., 2011; Thompson et al., 2011; Miller and Ager, 2012; Scott et al., 2012; WRSC, 2012). Outside of the operational realm, spatiotemporal analyses of US wildfire activity are increasingly used to characterize local, regional, and national patterns and trends as they relate to factors such as climate, population, land use, and fire policy, and to predict how wildfire activity and values at risk may be influenced by changes in those factors (e.g., McKenzie et al., 2003; Gedalof et al., 2005; Stephens, 2005; Collins et al., 2006; Westerling et al., 2006; Parisien and Moritz, 2007; Miller et al., 2008; Littel et al., 2009; Preisler et al., 2009; Davis and Miller, 2010; Parisien et al., 2012). US wildfire activity statistics have been reported in various forms since the early 20th century.
State- and national-level estimates of wildfire numbers and area burned, for example, are available for circa 1912 to 1997 in US Department of Agriculture, Forest Service (USFS) wildfire statistics publications (e.g., USFS, 1998a) and from circa 1999 to present from the Predictive Services Intelligence Section at the National Interagency Coordination Center (e.g., NICC, 2012b). The USFS wildfire statistics, often referred to as "Smokey Bear Reports", summarize wildfire activity by US state and therefore lend themselves only to analyses at the level of the state or interstate region. The NICC statistics, published in Wildland Fire Summary and Statistics Annual Reports, are based on calendar-year summaries of wildfire activity from the interagency Situation Report/209 (SIT/209) application (USFS, 2009). The Situation Report (SIT) module of the SIT/209 application keeps a running tally of wildfires and area burned by agency unit (including participating federal, state, local, and private entities) based on daily activity reports from dispatch offices during fire season and weekly reports otherwise (NIFC, 2011; NICC, 2012a). Detailed information regarding individual large fires and other significant incidents is entered separately into the Incident Command System-209 (ICS-209) module of the SIT/209 application, generally by local dispatch personnel or by the team managing the incident (NIFC, 2011; NICC, 2012a). In sum, the ICS-209 provides detailed, incident-specific information, but only for significant events (e.g., large fires), while the SIT provides daily and cumulative (year-to-date) total fire counts and area-burned estimates for all fires reported by dispatch offices, summarized by agency unit.
The SIT/209 system was designed for tactical support (e.g., to help determine firefighting priorities and resource needs), and while its estimates of wildfire numbers and area burned may be the best available at the time of need (i.e., to characterize the current "situation"), they are not necessarily complete or accurate and thus should be considered initial figures only (NFAEB, 2007; NIFC, 2011). Internal inconsistencies (i.e., in the SIT versus ICS-209 modules) and inconsistencies with other data sources (e.g., agency fire reports, described below) are not necessarily reconciled before the final NICC annual wildfire activity statistics are published (C. Leonard, personal communication, 2011). Despite their potential weaknesses, the published NICC numbers have commonly been used to characterize national, regional, and subregional (e.g., state) wildfire activity levels in recent decades (e.g., Andrews, 2005; Hammer et al., 2009; Urbanski et al., 2009; Kolden and Brown, 2010; Reid et al., 2010; Thomas and Butry, 2012).
Incident-level wildfire reporting also occurs within each of the five major US federal agencies with wildland-fire-management programs, as they are required to complete Individual Fire Reports for all fires under federal protection or on federal ownership and to enter that information into their respective systems of record. These "final fire reports" are intended to be the authoritative sources of wildland fire activity statistics for the federal agencies. The USFS uses the FIRESTAT application (USFS, 2003) to transmit and archive data entered from the FS-5100-29 Individual Fire Report form (Donoghue, 1982a; USFS, 2000) into the National Interagency Fire Management Integrated Database (NIFMID) (USFS, 1998b; Bunton, 2000). The USFS fire reports in NIFMID, which is accessible via the national Fire and Aviation Management Web Applications site (FAMWEB, https://fam.nwcg.gov/fam-web/), currently date back to 1970. The four major US Department of the Interior (USDI) agencies with wildland fire programs all use the DI-1202 Individual Fire Report form, but the Bureau of Indian Affairs (BIA), Bureau of Land Management (BLM), and National Park Service (NPS) enter and store the information in the Wildland Fire Management Information (WFMI) system, while the US Fish and Wildlife Service (FWS) archives its data using a separate application, the Fire Management Information System (FMIS). The USDI fire reports date back to the 1960s. Most state and some local entities have independent reporting systems of their own, with records for various periods. The states of California and Oregon, for example, maintain publicly accessible incident-level databases that are distinct from each other and the federal systems. In addition, there are currently two national non-federal reporting systems, which are used increasingly either in concert with, or in lieu of, a state-maintained system.
The National Association of State Foresters (NASF) database is intended to be a clearinghouse of state-fire-service wildfire records, accessible via the FAMWEB data warehouse, while the National Fire Incident Reporting System (NFIRS) of the US Fire Administration (USFA) is intended to capture incident information, including wildfire data, from US fire departments (i.e., city, district, county), as overseen by state fire marshal offices (Hall and Harwood, 1989; Thomas and Butry, 2012). Although NFIRS is administered by a federal agency (USFA), we refer to it as a non-federal system, because it includes fire reports from local departments rather than from federal agencies. (Abbreviations, acronyms, and aliases used repeatedly throughout this paper are defined in Appendix A.) While incident-level reporting and corporate data warehousing (e.g., via FAMWEB, WFMI; see Bunton, 2000) have facilitated intra-agency analyses of recent decades' wildfire activity, data from the various disparate systems cannot be readily integrated for a true interagency analysis of historical wildfire activity from the official systems of record. Fire data from FAMWEB and WFMI, for example, can be imported in just a few steps into FireFamilyPlus (FFP), which is the analysis system commonly used for US fire danger rating and other historical fire-weather analyses (Bradshaw and McCormick, 2000), but FAMWEB has FFP-ready non-federal data (including detailed instructions for importing them) currently available for California only. Moreover, while FFP checks for duplicate fire records upon import, it will not necessarily catch redundant incident information from different systems of record. The onus, therefore, is on the user intending to analyze wildfire occurrence to check for and purge redundant records of the same fire after pooling data from different sources.
The redundant records will tend to be of multijurisdictional incidents (e.g., automatic aid incidents, large fires impacting multiple ownerships) with responses reported by several firefighting entities (see Bunton, 1999; Artley, 2009).
The problems with data compilation from the multiple wildfire systems of record affect far more than FFP users, however, and there have been repeated calls for a single fire-occurrence database to support national operations and research. In 1995, the Federal Wildland Fire Policy recognized that "accurate, organized, and accessible information about natural/cultural resources and fire activities is the basis for coordinated agency program decisions and is crucial to effective and efficient program management" and called for federal agencies to "standardize fire statistics and develop an easily accessible common database" (USDI and USDA, 1995). The necessary business analysis subsequently was performed with oversight from the National Wildfire Coordinating Group (NWCG). The resulting National Interagency Fire Statistics Information Project (NIFSIP) delivered business-process and conceptual data models for an interagency fire-reporting system that could be coordinated with non-federal cooperators and allow upward reporting of federal information to NFIRS (USFS, 1998c; Bunton, 1999), but no system was actually developed. The need for one persisted, however, and the 2001 update to the 1995 Federal Wildland Fire Policy again included a call for development of "coordinated databases for federal fire information that support fire program development and [policy] implementation" (USDI et al., 2001). By that time, however, it was becoming increasingly clear just how daunting that task would be, especially if non-federal data integration was to remain part of the ultimate goal.
Not long after the 2001 fire policy update was released, reports from three notable and independent efforts to compile wildland fire occurrence data from the various systems of record were published. In 2002, Westerling et al. (2003) compiled more than 400 000 wildland fire records spanning 20-30 yr from the USFS, BLM, BIA, and NPS for western US fire-climate analyses. Due to "data quality [sic] concerns", BLM reports of fires that occurred prior to 1980 were excluded from the research data set, which subsequently contained approximately 300 000 fire records from western federal lands for the period 1980-2000. The climatological analyses of Westerling et al. (2003) hinged on having fire location information more specific than state, county, or agency unit. They explained that the quality of the location data for some of the records, particularly the "older" subset, "constrains a comprehensive, regional-scale analysis to a 1-degree [approximately 111 km] grid resolution." Westerling et al. (2003) did not indicate whether further quality-control measures were taken to remove redundant records from the resulting gridded data set of fires and area burned. Brown et al. (2002) reported to the NWCG specifically regarding the completeness and quality of the fire data from the US federal systems of record. They evaluated all agency fire-report data from 1970-2000 but, like Westerling et al. (2003), considered the USDI fire-occurrence data to "effectively start in 1980" due to "very minimal" reporting prior. Brown et al. (2002) deemed fire records "usable" if they included apparently accurate (or correctable) values for the following data elements: (1) discovery date, (2) location (latitude/longitude), (3) total area burned, and (4) cause. Locations that could be converted to latitude/longitude from a Public Land Survey System (PLSS) section (2.6 km² grid) identifier were considered viable.
Their quality-control efforts included checking for duplicate records (both intraagency and interagency). A duplicate, or identical, record was defined by Brown et al. (2002) as having "exactly the same values for all fields" as another record in the data set. Of the 657 949 federal fire records evaluated, only 538 809 could be flagged as usable and non-redundant per their criteria. While their work was invaluable in pointing out problems related to the completeness, quality, and consistency of core elements of wildfire data in the federal systems of record, Brown et al. (2002) considered the data set of Schmidt et al. (2002), which focused on a shorter period of record, to be the only "quality controlled [sic] historical observed individual federal fire occurrence set" available at the time.
The work of Schmidt et al. (2002) was particularly remarkable because it included efforts to compile core data from 1986 through 1996 from federal and non-federal fire reporting systems, further illuminating the great inconsistencies between the systems and underscoring both the need for and the challenge of building a national wildfire-occurrence database. Like Westerling et al. (2003) and Brown et al. (2002), Schmidt et al. (2002) considered fire location to be a core data element, as their intended use of the data set was for geospatial fire analysis and risk assessment. They sought the same elements as Brown et al. (2002) plus an identifying fire number and, where available, fire name and containment date. While they acquired non-federal data from all conterminous states except for Nevada, several states lacked fire records for entire years or lacked values for core data elements, including area burned. Moreover, the non-federal fire records were often georeferenced only to the county level, if at all. While they expected there to be redundant records from the federal and non-federal systems, they could not identify them "because fire locations are generally imprecise (to the nearest [PLSS] section), and not all database fields that could aid in tracking duplicates are fully populated" (Schmidt et al., 2002). They invested two and a half person-years to compile the 11 years' worth of national fire-occurrence data and concluded that even basic estimates of total fire numbers and area burned nevertheless would be restricted to the state level in many cases and, moreover, be compromised by both missing and redundant records. Building on the initial efforts of the NIFSIP and informed to some degree by the work of Brown et al. (2002) and Schmidt et al. (2002), progress toward a national fire-reporting system ostensibly has been made over the last decade (see NWCG, 2003; NFAEB, 2007). Still, no such system exists.
In 2003, a prototype for the national Fire Program Analysis (FPA) system was initiated, and it was recognized that several components of that system would rely on historical wildfire activity data from both federal and non-federal systems of record. FPA is a national, interagency application intended to evaluate the effectiveness of alternative fire-management strategies and thereby support national planning activities and budget development (see Mavsar et al., 2013). FPA requires wildfire activity data as inputs to, and for evaluation of output from, its Initial Response Simulator (see Fried and Fried, 2010) and Large Fire Module (see Finney et al., 2011). For these purposes, the wildfire records must include, at minimum, (1) location, at least as precise as PLSS section, (2) discovery date, and (3) final fire size. Some additional elements such as fire name, cause, and containment date are sought but not required for a record to be considered "viable". Interagency guidance to FPA was to exclude fire records older than 1992 from consideration due to concerns about both the quality and the completeness of the data (J. Fotjik, personal communication, 2012).
FPA was released in 2008, and the first national analysis was completed in 2009 (USDI, 2009). At that time the system drew upon a wildfire data set spanning 1992-2008 that included records from the federal systems of record as well as the NASF database and NFIRS. The data were compiled and quality-checked in a manner that expanded upon the approaches used by Brown et al. (2002) and Schmidt et al. (2002). The original process, documented in an unpublished technical guide (FPA, 2010), was adapted in 2010 to include additional non-federal data from state-maintained systems of record and to more thoroughly screen the compiled data set for redundant records. Data for 2009-2010 were added during the winter of 2011. Data for 2011 and previously absent records from prior years were acquired from the NASF database via FAMWEB and added during the winter of 2012. The resulting national spatial database of US wildfires 1992-2011, referred to as the FPA Fire-Occurrence Database (FPA FOD), is presented and described here. Only the subset of basic elements used by FPA is included, but additional attributes can be drawn from the source systems using the identifier of the original record, which is retained in the FPA FOD. Record identifiers from the ICS-209 application and the satellite-derived Monitoring Trends in Burn Severity (MTBS) national fire-perimeter data set (Eidenshink et al., 2007) are also included for a subset of the fires, providing, in essence, bridges to those information systems. We evaluate the completeness of the resulting data set by comparing estimates of wildfire numbers and area burned from the FPA FOD with published wildfire activity statistics by state and year.

Data sources
Wildfire records for the period 1992-2011 were acquired from federal, non-federal, and interagency systems of record (Table 1). Records were required to have values for the following attributes to be considered candidates for inclusion in the FPA FOD: (1) location at least as precise as PLSS section, (2) discovery date, and (3) final fire size. Candidate non-federal records were available from the two national, non-federal reporting systems (NASF, NFIRS) only for a subset of states and years, largely due to limited state and local participation and a lack of sufficient location information in submitted fire reports. Additional candidate non-federal data were acquired from state-maintained and interagency systems to augment the subset drawn from NASF and NFIRS. FPA staff associated with the Alabama Forestry Commission (AFC) provided AFC data for 2003-2009, and the Texas A&M Forest Service (TFS) provided a compiled data set of state and local wildfires reported to the TFS since 2005. Due to time constraints, data were not sought directly from all other US states and territories. Instead, we drew non-federal wildfire data that were readily available online and capitalized on past and present efforts to compile non-federal fire records, obtaining multi-state data sets from the other projects. Publicly available wildfire data were downloaded from the Alaska Interagency Coordination Center (http://fire.ak.blm.gov/), the Oregon Department of Forestry (http://www.oregon.gov/ODF/GIS/gisdata.shtml), the Virginia Department of Forestry (http://www.dof.virginia.gov/gis/dwnload/index.htm), and the Wisconsin Department of Natural Resources (ftp://gomapout.dnr.state.wi.us/geodata/forestry/fire_occurrence.zip). Data from the California Department of Forestry and Fire Protection were downloaded from FAMWEB (http://fam.nwcg.gov/fam-web/weatherfirecd/fire_files.htm).
Multi-state non-federal data sets for at least part of the period of interest were obtained from Schmidt et al. (2002), the Southern Wildfire Risk Assessment (SWRA; see Buckley et al., 2006), the MTBS project, and the USFS Eastern Forest Environmental Threat Assessment Center (EFETAC) (Table 1).
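The three-attribute candidacy rule stated above (location, discovery date, final fire size) can be illustrated with a short Python sketch. This is only a sketch under stated assumptions and not the procedure actually used to build the FPA FOD: the dictionary keys (`latitude`, `longitude`, `plss_section`, `discovery_date`, `fire_size`) are hypothetical names chosen for illustration, and a record is treated as adequately located if it carries either a coordinate pair or a PLSS section identifier.

```python
from typing import Any, Mapping


def has_usable_location(record: Mapping[str, Any]) -> bool:
    """True if the record has a point location or a PLSS section identifier.

    The key names are hypothetical; source systems label these fields differently.
    """
    has_point = (
        record.get("latitude") is not None
        and record.get("longitude") is not None
    )
    has_section = bool(record.get("plss_section"))
    return has_point or has_section


def is_candidate(record: Mapping[str, Any]) -> bool:
    """Apply the three required-attribute checks described in the text."""
    return (
        has_usable_location(record)
        and bool(record.get("discovery_date"))
        and record.get("fire_size") is not None
    )
```

A record georeferenced only to the county level, for example, would fail `has_usable_location` and would not be treated as a candidate.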

Data processing and quality control
Table 1. Sources of wildfire data in the FPA FOD. Although the National Fire Incident Reporting System is administered by a federal agency (USFA), we refer to it as a non-federal system, because it includes fire reports from local departments rather than from federal agencies. We distinguish the Alaska Interagency Coordination Center and ICS-209 as interagency systems, because they are sources of federal, state, and local reports.

The data sought from each system of record are listed in Table 2. Data were not formatted consistently across the various systems of record, and the following transformations, intended to comply as best as possible with NWCG data standards (http://www.nwcg.gov/pms/stds/standards/), were made when necessary. Of course, our ability to conform the data (see Table 2) to NWCG standards was constrained by the level of adherence to those standards within the source reporting systems throughout the period of record. After making any necessary geographic transformations, locations were formatted as latitude and negative longitude in decimal degrees, based on the North American Datum 1983. Precision to eight decimal places was retained, when available, for the sake of consistency with the source information. However, both the accuracy and precision of the location estimates are generally much lower than that implied by the stored coordinate information, which, for example, may have been calculated from a PLSS section centroid. Dates were formatted as mm/dd/yyyy, and time as hhmm (using the 24 h clock). Final fire size was formatted to indicate area in acres (1 acre = 0.405 hectare), which is the NWCG standard. The full precision of the estimate in the fire report was retained, which may or may not meet the NWCG standard of tenth-acre precision for all fires less than one acre. If the source database did not include a code or number that uniquely identified each record, we created one by either concatenating data elements or simply auto-numbering the records.
We cross-walked the reporting agency and unit to the active NWCG Unit Identifier standard (see NWCG, 2012b), and we include the active Unit Identifier data set that we downloaded from https://www.nifc.blm.gov/unit_id/Publish.html on 5 March 2012 and used for our cross-walk as a lookup table, NWCG_UnitIdActive_20120305, in the final database. We removed any extraneous leading and trailing characters from the FireCode (https://www.firecode.gov/index.cfm?action=login), which is a standardized four-digit federal accounting code assigned to fires and included in most federal fire reports that date back to circa 2003, providing a link to the agencies' financial systems (USDI and USDA, 2013). Fire names were converted to uppercase, but otherwise unadjusted. Some fire names had clearly been truncated, for example, by the source system or by some data entry or transfer process, but we made no attempt to "correct" or harmonize them. The NWCG data standard for fire cause is currently pending, so we cross-walked the cause descriptions from the various systems of record to a set of codes and names (Table 3) based on those used by the USFS for statistical analysis (USFS, 2003). Values indicating landowner at the fire's location (i.e., owner at origin) were cross-walked to a set of names and codes (Table 4) based on the NWCG standard landowner categories (http://www.nwcg.gov/pms/stds/standards/land-owner-kind-category_v1-0.htm). While based on existing conventions, the full suite of cause and owner codes and values to which we cross-walked (Tables 3 and 4) includes categories that we added in order to more fully accommodate the range of information in the original records. The US state (or territory) in which the fire occurred was not nominally designated in all fire reports. In those cases, we populated that field with the name of the state to which the reporting unit is tied, so that we obtained a nominal locality designation for all records to use for quality-control purposes (i.e., to compare with the point locations; see Sect. 2.2.2). State names were converted to standard two-letter alphabetic Federal Information Processing Standards (FIPS) codes (NIST, 1987). County (or equivalent) names were cross-walked to the three-digit FIPS county codes (NIST, 1990). Allowable fire types, in terms of management strategy, were coded as follows: 1 - actionable (i.e., suppression or appropriate management response taken), 2 - natural out, and 4 - fuels management (unplanned ignitions only). Planned ignitions (i.e., prescribed fires), except those that escaped and ultimately required suppression response, were intentionally excluded from the FPA FOD. Protection type, specified only in the federal fire reports, was coded as indicated in Table 5.

Table 2. Data elements extracted from wildfire reports and used to populate the FPA FOD.

Item | Description
Location * | Point of origin of the fire, at least as precise as PLSS Section.
Discovery Date * | The date that the fire was discovered or confirmed to exist.
Final Fire Size * | Area within the final perimeter of the fire.
Record Identifier | Code or number that uniquely identifies the record within the source database.
Reporting Agency | Identifier for the reporting agency.
Reporting Unit | Identifier for the reporting unit within the agency.
Local Fire Report ID | Number or code that uniquely identifies a fire report for a particular unit and a particular calendar year.
Local Incident ID | Number or code that uniquely identifies an incident for a particular local fire-management organization within a particular calendar year.
FireCode | Code used within the interagency wildland fire community to track and compile cost information for emergency fire suppression expenditures.
Fire Name | The name of the incident.
Discovery Time | Time of day that the fire was discovered or confirmed to exist.
Fire Cause | The reported cause of the fire.
Contain Date | Date on which the fire was declared contained.
Contain Time | Time of day that the fire was declared contained.
Owner | Name of primary owner or entity responsible for managing the land at the point of origin of the fire at the time of the incident.
State | Name of the state in which the fire is reported to have burned (or originated).
County | County in which the fire is reported to have burned (or originated).
Fire Type | Type of fire, in terms of management response.
Protection Type | Entity responsible for fire protection at the point of origin.

Table 5 (partial).

Code | Description
 | Other (non-agency) land protected by the agency under a cooperative agreement, memorandum of understanding, interagency mutual aid agreement, or contract
9 | Naturally ignited wildland fire for which the appropriate fire-management response is based on objectives from an approved Fire Management Plan (FMP) (i.e., "fire used for resource benefit" or "wildland fire use" when fire type = 4)
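The date, time, and area conventions described in this section (mm/dd/yyyy, hhmm on the 24 h clock, and acres) lend themselves to a brief Python illustration. The sketch below is hypothetical and is not the tooling used for the FPA FOD; the function names and the idea of passing each source system's own format string are assumptions made for the example. The conversion factor is the one quoted in the text (1 acre = 0.405 hectare).

```python
from datetime import datetime

HECTARES_PER_ACRE = 0.405  # conversion factor as quoted in the text


def format_date(value: str, source_format: str) -> str:
    """Re-express a source date string as mm/dd/yyyy."""
    return datetime.strptime(value, source_format).strftime("%m/%d/%Y")


def format_time(value: str, source_format: str) -> str:
    """Re-express a source time string as hhmm on the 24 h clock."""
    return datetime.strptime(value, source_format).strftime("%H%M")


def hectares_to_acres(hectares: float) -> float:
    """Convert an area in hectares to acres without rounding,
    so the full precision of the reported estimate is retained."""
    return hectares / HECTARES_PER_ACRE
```

For instance, a source system that stores ISO-style dates could be handled with `format_date("2005-07-04", "%Y-%m-%d")`, while a system using another layout would pass a different format string.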

Error checking
K. C. Short: A spatial database of wildfires in the United States

Candidate fire records from each of the systems of record were examined for spatial errors via overlays with geographic information system (GIS) data sets delineating US state boundaries (http://nationalatlas.gov/mld/statesp.html) and the boundaries of the fire planning units (FPUs) used by FPA. Some egregious spatial errors, including transposition of coordinates, were revealed through initial visual inspection of the mapped points, and, when the solution was obvious (as in the case of transposed latitude and longitude), the data were corrected. After all possible location information had been salvaged, fires that still mapped outside FPU boundaries or outside the state or territory expected from the fire report were flagged. All US states and territories are fully contained within the collective extent of the FPU boundaries, and fires outside FPU boundaries were excluded from the FPA FOD because (1) they were presumed to be located incorrectly (i.e., mapping offshore or in another country), and (2) all FPA fire analyses are FPU-based and therefore the records would be excluded de facto. Fires from the non-federal systems of record flagged because they mapped outside of the expected state were likewise presumed to be located incorrectly (or to be a large, multijurisdictional incident that would be redundantly reported in another system) and were therefore excluded from the FPA FOD. Federal fires (fires from a federal system of record) flagged for the same reason were not summarily excluded, however, because some agency units span or have fire protection responsibility or cooperative agreements in more than one state (e.g., NPS Appalachian National Scenic Trail, BLM Miles City Field Office) and some nominal state designations were based on the state designation in the unit name, which may not reflect the true state location of the fire.
The flagged subset of suspect federal fires was therefore visually inspected and only records with obvious spatial errors (e.g., fires < 40 hectares mapping several states away from the expected domain) were excluded from further processing. In some cases, fires did not map within the state expected from the fire report, but did map within the domain of the interstate reporting unit (e.g., fires reported from Dinosaur National Monument mapping in Colorado and Utah, across which the unit spans) or were responded to under cooperative agreement or as a threat to the unit's land. In other cases, fires mapped near enough to the proclaimed state or unit such that the mismatch was ostensibly due to imprecision of the reported location. We did find fires that were clearly mislocated because they mapped over water, but we retained them if they fell within the expected domain of state or FPU. Clearly erroneous dates (e.g., 1/1/1901, 12/6/4320) were excluded when we set our date range to 1992-2011. We checked for and omitted containment dates and times that preceded the fire discovery dates and times.
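Some of the checks described above can be expressed as simple rules, sketched here in Python. This is an illustrative assumption rather than the procedure actually followed, which relied on GIS overlays and visual inspection. In particular, the transposition test below is a deliberately conservative heuristic (a "latitude" beyond ±90 degrees cannot be a latitude), and the function names are hypothetical.

```python
from datetime import datetime
from typing import Optional, Tuple

FIRST_YEAR, LAST_YEAR = 1992, 2011  # period of record stated in the text


def fix_transposed(lat: float, lon: float) -> Tuple[float, float]:
    """Swap coordinates only in the obvious case where the stored latitude
    is outside [-90, 90] and the stored longitude would be a valid latitude."""
    if abs(lat) > 90 and abs(lon) <= 90:
        return lon, lat
    return lat, lon


def in_period(discovery: datetime) -> bool:
    """True if the discovery date falls within the 1992-2011 period of record."""
    return FIRST_YEAR <= discovery.year <= LAST_YEAR


def clean_containment(
    discovery: datetime, containment: Optional[datetime]
) -> Optional[datetime]:
    """Omit (return None for) a containment date/time that precedes discovery."""
    if containment is not None and containment < discovery:
        return None
    return containment
```

Restricting records to the period of record also discards clearly erroneous dates such as 1/1/1901 as a side effect, which matches the behavior described in the text.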
To avoid confusion regarding the numbers 0 and 1, the FireCode system does not generate codes containing the characters "O" or "I" (USDI and USDA, 2013). However, we found codes in several hundred fire records with an O or I that had been changed from a 0 or 1 at some point in the reporting process. We changed the incorrect letters to the correct numbers in the FPA FOD so that we could fully leverage FIRE_CODE in our quality-control processes (see Sects. 2.2.4 and 2.2.5).
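The letter-to-number correction just described amounts to a character substitution, which might be sketched in Python as follows. The function name is hypothetical, and two details are assumptions made for the example: the "extraneous leading and trailing characters" are taken to be whitespace, and codes are uppercased before substitution.

```python
# Map the letters that the FireCode system never generates to the digits
# they were presumably mistyped from.
_LETTER_TO_DIGIT = str.maketrans({"O": "0", "I": "1"})


def clean_fire_code(raw: str) -> str:
    """Strip surrounding whitespace, uppercase, and replace O/I with 0/1."""
    return raw.strip().upper().translate(_LETTER_TO_DIGIT)
```

Because valid FireCodes never contain "O" or "I", the substitution cannot corrupt a correctly entered code.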

Data compilation and derivation of additional elements
All viable data extracted from each of the systems of record were compiled in a Microsoft® Access 2010 database. Records from each of the systems were appended into a single Fires table with the schema shown in Table 6. As data were appended, the FOD_ID was assigned with an autonumbering function. Either the numeric FOD_ID or the alphanumeric FPA_ID can be used as the table's primary key, as each uniquely identifies records in the database. It is the FPA_ID, however, that contains the necessary information to link back to the original data set. It consists of the unique identifier acquired (or created by concatenating elements) from the source system, with leading characters added to ensure that it remained unique after being pooled with records from other systems. The fields SOURCE_SYSTEM and SOURCE_SYSTEM_TYPE provide the system information summarized in Table 1 for each record. While fire year is included as an attribute separate from discovery date in most source systems, to ensure logical consistency it was populated in the Fires table directly from DISCOVERY_DATE, as was day of year (DOY) for discovery and containment dates. FIRE_SIZE_CLASS was derived from FIRE_SIZE using the proposed NWCG standard class breaks and codes (http://www.nwcg.gov/pms/stds/fire_size_class/values.pdf).
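Deriving fire year, day of year, and size class from the stored values can be sketched in Python as below. This is a hedged illustration only. The class breaks are not listed in the text; the ones used here are the commonly cited NWCG breaks in acres (A: up to 0.25; B: up to but not including 10; C: to 100; D: to 300; E: to 1000; F: to 5000; G: 5000 and larger) and should be treated as an assumption, as should the output key names `FIRE_YEAR` and `DISCOVERY_DOY`.

```python
from datetime import date
from typing import Dict

# Exclusive upper bounds (acres) for classes B-F; assumed NWCG class breaks.
_UPPER_BOUNDS = [(10, "B"), (100, "C"), (300, "D"), (1000, "E"), (5000, "F")]


def fire_size_class(acres: float) -> str:
    """Return the size-class letter for a final fire size in acres."""
    if acres <= 0.25:
        return "A"
    for upper, code in _UPPER_BOUNDS:
        if acres < upper:
            return code
    return "G"


def derive_date_fields(discovery: date) -> Dict[str, int]:
    """Populate fire year and day of year directly from the discovery date."""
    return {
        "FIRE_YEAR": discovery.year,
        "DISCOVERY_DOY": discovery.timetuple().tm_yday,
    }
```

Computing the year from the date itself, rather than copying a separately stored fire-year attribute, is what guarantees that the two fields cannot disagree.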

Removing redundant records
Once the data were consistently formatted and pooled together, we used a multistep process to identify and purge redundant fire records. In addition to the potential for multiple reports of the same fire to appear within and among the various systems of record (Bunton, 1999;Brown et al., 2002;Schmidt et al., 2002), there is also the potential for redundancy from the ways that fire complexes are handled in the reporting process.
It is a fairly straightforward database exercise to identify fire records that are "duplicates" sensu Brown et al. (2002), because they have identical values for location, discovery date, total area burned, and cause. However, due to inconsistencies in data collection, formatting, and storage requirements, as well as information errors, the reporting systems are rife with records that are redundant but by no means identical, and there is no simple process to remove them before or after pooling the data.
Examples of redundant fire records extracted from the reporting systems are provided in Table 7. The process of Brown et al. (2002) likely would have flagged cases 1 and 2 as intra-agency and interagency duplicates, respectively, but overlooked the others in Table 7, because their coordinates, discovery dates, and area burned values do not match exactly. Fire cause is often unreported, especially in the nonfederal systems, or may differ among reports due to the information or reporting options available at the time of filing (see Donoghue, 1982b); thus, we did not enlist this field to assist in identifying redundant records. Instead, we primarily leveraged latitude, longitude, discovery date, fire size, and, when available, fire name and FireCode. We ran a series of database queries, explained as follows: we began with the most basic checks for duplicate records and progressed through steps that, as we cast a widening net, required increasing visual inspection to ensure that records flagged for exclusion from the final data set by the queries were indeed redundant with others to be retained. All records with flags that persisted to the end of the process (i.e., through step 10 below) were ultimately purged from the database.
We readily identified sets of redundant records like cases 1 and 2 in Table 7 by querying the Fires table for separate entries with identical values for all five of the following elements: latitude, longitude, discovery date, fire size, and name (step 1). Once the records were flagged accordingly, we selected only one from each set to retain.

Table 6 (excerpt). Element name, data type, and description.
FPA_ID (Text): Unique identifier that contains information necessary to track back to the original record in the source data set. Can be used as primary key in lieu of FOD_ID.
SOURCE_SYSTEM_TYPE * (Text): Type of source database or system from which the record was drawn (federal, non-federal, or interagency).
SOURCE_SYSTEM * (Text): Name of or other identifier for source database or system from which the record was drawn.
NWCG_REPORTING_AGENCY * (Text): Active NWCG Unit Identifier for the agency preparing the fire report.
NWCG_REPORTING_UNIT_ID * (Text): Active NWCG Unit Identifier for the unit preparing the fire report.
NWCG_REPORTING_UNIT_NAME * (Text): Active NWCG Unit Name for the unit preparing the fire report.
SOURCE_REPORTING_UNIT (Text): Code for the agency unit preparing the fire report, based on code/name in the source data set.
SOURCE_REPORTING_UNIT_NAME (Text): Name of reporting agency unit preparing the fire report, based on code/name in the source data set.
LOCAL_FIRE_REPORT_ID (Text): Number or code that uniquely identifies an incident report for a particular reporting unit and a particular calendar year.
LOCAL_INCIDENT_ID (Text): Number or code that uniquely identifies an incident for a particular local fire-management organization within a particular calendar year.
FIRE_CODE (Text): Code used within the interagency wildland fire community to track and compile cost information for emergency fire-suppression expenditures.
FIRE_NAME (Text): The name of the incident, from the fire report (primary) or ICS-209 report (secondary).
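The step 1 query can be sketched as a grouping on the five matching elements. This is an illustration in Python rather than the Access queries actually used; the LATITUDE and LONGITUDE field names and the sample records are assumptions.

```python
from collections import defaultdict


def exact_match_sets(records):
    """Return lists of FOD_IDs whose records agree on all five step 1 elements."""
    groups = defaultdict(list)
    for rec in records:
        key = (
            rec["LATITUDE"],
            rec["LONGITUDE"],
            rec["DISCOVERY_DATE"],
            rec["FIRE_SIZE"],
            rec["FIRE_NAME"],
        )
        groups[key].append(rec["FOD_ID"])
    # Only sets with two or more members are candidates for flagging.
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical records for illustration.
base = {"LATITUDE": 45.1, "LONGITUDE": -110.2, "DISCOVERY_DATE": "2001-07-04",
        "FIRE_SIZE": 10.0, "FIRE_NAME": "BEAR"}
sample = [dict(base, FOD_ID=1), dict(base, FOD_ID=2),
          dict(base, FOD_ID=3, FIRE_NAME="ELK")]
print(exact_match_sets(sample))
```

Each returned set would then be reduced to a single retained record according to the retention rules.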
When we identified a set of federal and non-federal wildfire records that were redundant, we always retained a federal record, because the federal records tend to be more fully attributed (e.g., fire name and cause more consistently populated). When records from the federal reporting systems were redundant, we attempted to retain only the record from the agency unit that indicated it had protection responsibility for the fire. We based that determination on a code created by concatenating the fire type and protection type (Table 5) values in the Fires table. A fire type-protection type (FTPT) code of 11, for example, indicates an actionable fire on agency land protected by that same agency. A decision matrix was generated with help from agency representatives and used to identify the FTPT code of the record to be retained in the FPA FOD from a given set including two or more redundant federal fire reports (Table 8). The matrix indicates that, for example, records with FTPT codes of 11 are to be retained over all others, because that code indicates that the entity with protection responsibility is the reporting entity. When redundant federal records indicated the same FTPT for the same fire (e.g., multiple federal units reporting the same fire and using FTPT = 13, because, although it burned multiple federal ownerships, it was the responsibility of a non-federal entity), or when sets included only non-federal records, we simply retained the first record of the fire as it occurred in the compiled data set, sorted in ascending order on FOD_ID. The FOD_ID was auto-assigned in ascending order as records were appended to the Fires table, and federal records were appended in the following order: (1) FS-FIRESTAT, (2) DOI-WFMI, (3) FWS-FMIS. The USFS records, for example, were assigned lower FOD_IDs than the DOI records and would therefore be retained in lieu of redundant DOI records with the same FTPT indicated. Records were "retained" by removing the flag that had been assigned in step 1. We then repeated the process (i.e., step 1), ignoring any records with a persisting flag, after rounding latitude and longitude to two decimal places to identify additional sets of redundant records illustrated by case 3 (Table 7, step 2).

Table 8. Decision matrix indicating the fire type-protection type code of the record to be retained in the FPA FOD from a set including two or more redundant federal fire reports. The first digit of the code indicates fire type, where 1 = actionable (i.e., suppression or appropriate management response taken), 2 = natural out, and 4 = fuels management (unplanned ignitions only). The second digit of the code indicates protection type, as defined in Table 5.

       11   12   13   14   15   16   19   21   22   23   25   26   49
  11   11   11   11   11   11   11   11   11   11   11   11   11   11
  12   11   12   12   12   12   16   19   12   12   12   12   12   49
  13   11   12   13   14   13   16   19   21   13   13   13   13   49
  14   11   12   14   14   14   16   14   14   14   14   14   14   14
  15   11   12   13   14   15   16   19   21   22   23   15   26  ...
  16  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
  19  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
  21   11   12   13   14   21   16   19   21   21   21   21   21   49
  22   11   12   13   14   22   16   19   21   22   22   22   26   49
  23   11   12   13   14   23   16   19   21   22   23   23   26   49
  25   11   12   13   14   15   16   19   21   22   23   25   26   49
  26   11   12   13   14   26   16   19   21   26   26   26   26   49
  49   11   49   49   14   49   49   19   49   49   49   49   49   49
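The retention rules described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure as implemented: the value "FED" for SOURCE_SYSTEM_TYPE, the FTPT key, and the `matrix` argument (a caller-supplied mapping from an unordered pair of FTPT codes to the code to retain) are all hypothetical. Only the rules stated explicitly in the text are hard-coded: federal over non-federal, FTPT 11 over all others, and lowest FOD_ID on ties.

```python
def choose_record_to_retain(redundant_set, matrix=None):
    """Pick the one record to keep from a set of redundant records."""
    matrix = matrix or {}
    # Rule 1: prefer federal records whenever any are present.
    federal = [r for r in redundant_set if r.get("SOURCE_SYSTEM_TYPE") == "FED"]
    pool = sorted(federal or redundant_set, key=lambda r: r["FOD_ID"])
    keep = pool[0]
    for rec in pool[1:]:
        a, b = keep.get("FTPT"), rec.get("FTPT")
        if a == b:
            continue  # Rule 3: same FTPT, the lower FOD_ID stays.
        if "11" in (a, b):
            winner = "11"  # Rule 2: code 11 is retained over all others.
        else:
            # Other pairs defer to the supplied matrix; default keeps current.
            winner = matrix.get(frozenset((a, b)), a)
        if winner == b:
            keep = rec
    return keep


# Hypothetical set: one non-federal and two federal reports of the same fire.
example = [
    {"FOD_ID": 1, "SOURCE_SYSTEM_TYPE": "NONFED"},
    {"FOD_ID": 2, "SOURCE_SYSTEM_TYPE": "FED", "FTPT": "13"},
    {"FOD_ID": 3, "SOURCE_SYSTEM_TYPE": "FED", "FTPT": "11"},
]
print(choose_record_to_retain(example)["FOD_ID"])
```

Passing the matrix in as data keeps the agency-derived decisions separate from the selection logic, so the table can be revised without touching the code.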
Fire name was not consistently populated in the systems of record, and steps 3 and 4 in our process entailed repeating steps 1 and 2, respectively, with fire name excluded from the queries to flag record sets like cases 4-7 in Table 7. Some fires flagged in steps 3 and 4, which appeared redundant because, according to the reports, they were the same size and discovered on the same date and at essentially the same location, were likely "false positives" due to imprecision in reported fire location. For example, fires located only as precisely as PLSS section generally have a reported latitude and longitude that merely represent the coordinates of the section centroid, and all fires occurring within that section are thereby apt to have matching location information. Therefore, PLSS section-located fires that were discovered on the same date and that reached the same final size are likely to be flagged as redundant in steps 3 and 4. We presumed, however, that fires greater than 40 hectares flagged in steps 3 and 4 were far more likely to be redundant (i.e., unlikely to occur in multiples on the same day within the same 2.6 km² section) than smaller fires, and to avoid purging potentially legitimate records from the database, we only flagged fires < 40 hectares if they matched in terms of size, date, and location, and were obtained from different agencies or systems of record (e.g., Table 7, case 7). All sets flagged in steps 3 and 4 were visually inspected for false positives before proceeding. The name and protection type of the FWS-reported fire in case 7, for example, indicated to us that the record was of a mutual aid fire that was indeed redundant with the record of an unnamed fire reported by the Florida Forest Service (FFS; formerly the Florida Division of Forestry, or DOF), and per our processing rules, only the federal record was retained.
Step 5 was another variation on the previous steps, with coordinates for the persisting subset of data rounded to a single decimal place and records flagged if they matched in terms of name, date, and generalized location (but not fire size). This step, again with visual confirmation, allowed us to identify sets of redundant records illustrated by cases 8 and 9 (Table 7), in which the records did not have the same reported location or fire size. In step 6, we relaxed the date requirement to year and identified sets of redundant records shown in case 10 (Table 7), which matched in terms of name, year, general location (coordinates rounded to one decimal place), and fire size greater than 4 hectares. In step 7, we grouped records of fires greater than or equal to 405 hectares by (1) year and FPU and (2) year and state, and queried and visually inspected the groups to identify sets of redundant records based on matching or similar names, dates, and/or fire sizes (Table 7, cases 11-15).
In step 8, we included fires of all sizes and based our search for redundant records only on name, year, and general location (coordinates rounded to one decimal place), which returned a list of several thousand fires, generally less than 405 hectares. This again required visual inspection to ensure that non-redundant fires were retained, as it is not uncommon for different fires, especially smaller ones, that occur in the same general location throughout the year to be repeatedly assigned the same generic place name (e.g., "Bombing Range", "Roadside"). Case 16 (Table 7) provides an example of a set of records that was flagged in step 8 and deemed redundant after visual inspection, although they do not report the same date, size, or precise location. The protection type (5) and ownership (USFS) of the BLM-reported fire in case 16, for example, indicated to us that the BLM response was to a perceived threat and the fire report was indeed redundant with the record reported by the USFS, which indicated it had protection responsibility; per our processing rules, only the USFS record was retained.
In the next step, step 9, which was the most laborious, we grouped all fires (regardless of size) by year and general location (coordinates rounded to one decimal place) and identified groups that included records from multiple sources (i.e., different systems of record). We then sorted the records within each group by date and fire size and visually inspected more than 150 000 records, scanning for redundant records that had passed through our screening up to that point. Case 17 (Table 7) provides an example of a set of records identified in this way that had not yet been flagged because they are reports of a < 405-hectare fire that agree only on month and location at one decimal place; but given their size (> 324 hectares) and the similar dates, we concluded that they reported the same incident.
In step 10, we used FIRE_CODE to check for redundant reports from the federal systems of record. The formal FireCode system dates back only to 2003, and FireCode is not consistently populated for all fires from the federal systems since circa 2003, so it is of limited use in identifying potential duplicates. Moreover, FIRE_CODE alone is insufficient for this purpose, because fires in complexes (see Sect. 2.2.5) and groups of small (i.e., < 40 hectare, or "miscellaneous ABC") fires are often assigned the same accounting code in a given fiscal year. We queried the subset of data that persisted through step 9 and generated a list of potentially redundant records based on FIRE_CODE, which was visually inspected to avoid discarding non-redundant fires grouped for accounting purposes, and those within complexes, which were dealt with separately as described in the following section.

Fire complexes
Operationally, a wildfire complex consists of two or more fires located in the same general vicinity and assigned to a single incident commander or unified command (NWCG, 2012a). In other words, a fire complex effectively comprises multiple fires managed as one large incident. It is not uncommon, however, to find fire reports for the complex as well as reports for all or some of its constituent, or subordinate, incidents in the systems of record. Because complexes can cover very large areas (e.g., the 2004 Taylor Highway complex in Alaska was reported at more than 500 000 hectares), estimates of total area burned from a database with records of complexes as well as their subordinate fires are apt to be highly inflated. When complexes were identified in the reporting systems, either via the incident name or by some database flag, we used that information to populate the COMPLEX_NAME field in the Fires table (Table 6). While that approach identified some of the complexes in the data set, it did little to help us identify and label each of the subordinate fires within complexes when they, too, were reported.
Because they generally are significant incidents, most federal and many non-federal fire complexes from circa 1999 forward should have an ICS-209 report in the SIT/209 database. Individual large incidents within the complex should be named, with area burned for each reported, under Remarks (NWCG, 2012a). In 2010 the SIT/209 database began including a separate incident complex table that names subordinate fires by complex. Unfortunately, the incident numbers used in the ICS-209 records (see NIFC, 2011) are not required to be included in the agency fire reports, which means that there is no straightforward way to join the records together. The LOCAL_INCIDENT_NUMBER in the Fires table may contain a component of the alphanumeric ICS-209 incident number (i.e., the numeric portion), but the LOCAL_INCIDENT_NUMBER rarely includes all of the information and is rarely formatted in the manner necessary to join the Fires to the ICS-209 records. The formatting and information content of the incident number used in the ICS-209 reports (e.g., XX-XXX-######) is, however, generally consistent with that of the incident order number in the FireCode system, and FIRE_CODE is an attribute in our data set. We were able to acquire federal incident records for 2003-2011 from the FireCode system, and we used those to populate a temporary INCIDENT_NUMBER field in the Fires table (Table 6). We intentionally excluded incident order numbers that were incomplete (e.g., XX-XXX-).
According to recent interagency guidance, all federal wildfires within a complex are to retain their original FireCodes and incident numbers for accounting purposes (NWCG, 2011). However, when we linked our incident records into FireCode and extracted incident names, we found it not uncommon for all incidents within a complex to be assigned the same FireCode and given the complex name. Therefore, simply by linking into the FireCode system, we were able to further populate the COMPLEX_NAME field in the Fires table.
The FireCode system dates back only to 2003, and not all federal reports of wildfires from 2003-2011 were FireCode-populated. For records for which we could not populate INCIDENT_NUMBER via FireCode, we scoured the LOCAL_INCIDENT_NUMBER and FIRE_NAME fields for what appeared to be components of the ICS-209 incident numbers and, usually after some reformatting or concatenation with other elements of the record (e.g., State and Unit ID), we were able to use those components to populate the INCIDENT_NUMBER field further.
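The screening and reassembly of incident numbers can be sketched as follows. The pattern takes the XX-XXX-###### example literally (two letters, three alphanumeric characters, six digits), which is an assumption; real unit identifiers may not all fit that shape. The function names and sample values are hypothetical.

```python
import re

# Assumed literal reading of the "XX-XXX-######" layout.
_INCIDENT_PATTERN = re.compile(r"[A-Z]{2}-[A-Z0-9]{3}-\d{6}")


def is_complete_incident_number(value):
    """True only if the value has all three parts in the assumed layout."""
    if not value:
        return False
    return _INCIDENT_PATTERN.fullmatch(value.strip().upper()) is not None


def build_incident_number(state, unit_id, local_number):
    """Concatenate state, unit ID, and a zero-padded local number, or None."""
    digits = re.sub(r"\D", "", str(local_number))
    if not digits or len(digits) > 6:
        return None
    candidate = f"{state.strip().upper()}-{unit_id.strip().upper()}-{digits.zfill(6)}"
    return candidate if is_complete_incident_number(candidate) else None


print(build_incident_number("mt", "bdf", "123"))  # hypothetical components
```

Returning None for anything that does not assemble cleanly mirrors the choice to exclude incomplete numbers rather than risk a false join.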
We then attempted to use the newly assigned INCIDENT_NUMBER in the Fires table to link to the ICS-209 records and extract additional information about complexes and subordinate incidents. We populated the field ICS_209_INCIDENT_NAME accordingly. Although we expected all federal fires > 405 hectares to have an ICS-209 report, we were unable to extract ICS-209 incident names for all records with a non-null INCIDENT_NUMBER in the Fires table that met those criteria. Our inability to populate names in those cases was either due to an apparent absence of ICS-209 reports for the fires or to our inability to derive matching incident numbers from the Fires table. We then scoured the ICS-209 incident records for the subset of all fires > 405 hectares from 1999 to 2011 in the Fires table and populated the INCIDENT_NUMBER and ICS_209_INCIDENT_NAME fields for as many additional records as possible, partially through a series of database queries and partially through visual inspection of the data. When we located records of the same event that did not link originally due to differences in incident numbers, we overwrote the INCIDENT_NUMBER in the Fires table with that from the ICS-209 system in order to ensure that INCIDENT_NUMBER could serve as a bridge to the ICS-209 information, at least for most fires in the FPA FOD > 405 hectares. Due to inconsistencies in the ways that incident numbers are populated in various information systems, we determined that INCIDENT_NUMBER, as we had derived and vetted it, could only reliably be used to join to the ICS-209 records, and we therefore changed the name to ICS_209_INCIDENT_NUMBER and deleted entries that did not join, or joined incorrectly, to ICS-209 records. We then leveraged information in the ICS-209 data set to continue populating COMPLEX_NAME.
The SIT/209 databases for 2010 and 2011 include an incident complex table that names subordinate fires by complex, and we capitalized on that information. For the years 1999-2009, however, there is no easy way to find and extract the names of individual incidents within complexes because they are listed, usually among copious other notes, in a memo field (i.e., a data type intended to store large amounts of text). It was therefore essentially a manual exercise to populate COMPLEX_NAME from the ICS-209 records (when this was possible).
After leveraging the FireCode and ICS-209 systems, we turned to a third national data set to identify additional fires in complexes. The MTBS perimeter data set, which currently spans 1984-2011, was used to further populate COMPLEX_NAME for fires pre-dating the other two systems and for later fires that our process heretofore had missed. In addition to the FireCode and ICS-209 systems, the MTBS project consults several other sources (e.g., the Wildland Fire Decision Support System, InciWeb, field units) to determine the appropriate complex designations for the fires it has mapped, dating back to the mid-1980s (B. Quayle, personal communication, 2013). The MTBS data set identifies fires in complexes via the FIRE_NAME attribute, and in cases in which the fires within a complex remained distinct (unmerged), the complex name is listed, and the name of the subordinate (i.e., individually mapped) fire is indicated parenthetically (e.g., Canyon Complex [Bear]). It was possible to correctly derive the MTBS_ID (Table 6) for several thousand records by concatenating agency, unit, FireCode, and discovery date from the fire records. Fire names and sizes were used to verify that the correct ID had been derived. MTBS_IDs for another several thousand records were populated manually. MTBS_FIRE_NAME (Table 6) was populated once the records could be joined, and missing complex names were extracted from MTBS_FIRE_NAME.
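Extracting a complex name from an MTBS-style fire name can be sketched with a regular expression. The sketch assumes the subordinate fire name appears in trailing parentheses or square brackets, as in the example above, and that complex names contain the word "complex"; neither is guaranteed for every MTBS record, and the function name is hypothetical.

```python
import re

# Complex name, then the subordinate fire in trailing () or [].
_NAME_WITH_SUBORDINATE = re.compile(
    r"^(?P<complex>.+?)\s*[\[(](?P<fire>[^\])]+)[\])]\s*$"
)


def split_mtbs_fire_name(name):
    """Return (complex_name, subordinate_fire_name); either may be None."""
    text = name.strip()
    match = _NAME_WITH_SUBORDINATE.match(text)
    if match and "COMPLEX" in match.group("complex").upper():
        return match.group("complex"), match.group("fire").strip()
    if "COMPLEX" in text.upper():
        return text, None  # a complex mapped as a single fire
    return None, None      # not identifiable as part of a complex


print(split_mtbs_fire_name("Canyon Complex [Bear]"))
```

Names that fail both tests are left unassigned rather than guessed at, which fits the largely manual verification described above.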
Once COMPLEX_NAME was populated as completely as possible, we used it to flag records like those illustrated by case 19 (step 11) in Table 7. In that case, the subordinate fires merged into one and continued to grow before reaching the final fire size. We therefore retained only one record of the three in that example, that of the JULY 4TH COMPLEX, in order to accurately reflect the total area burned. In cases in which the total area burned by the complex was adequately accounted for by the individual reports of subordinate fires, we preferentially retained the individual fire records in lieu of the single complex record in order to avoid losing information.
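The choice between a complex record and its subordinate records can be sketched as a comparison of areas. The text does not define when subordinate reports "adequately" account for the complex total, so the 5 % tolerance below is purely an assumption for illustration, as are the function name and sample sizes.

```python
def records_to_retain(complex_record, subordinate_records, tolerance=0.05):
    """Keep subordinate records if their summed area covers the complex total.

    Otherwise (e.g., the fires merged and kept growing) keep only the
    complex record so that total area burned is not understated.
    """
    complex_size = complex_record["FIRE_SIZE"]
    subordinate_total = sum(r["FIRE_SIZE"] for r in subordinate_records)
    if subordinate_records and subordinate_total >= (1 - tolerance) * complex_size:
        return list(subordinate_records)
    return [complex_record]


# Hypothetical sizes: the subordinate fires merged and grew to 900.
merged = records_to_retain(
    {"FIRE_NAME": "EXAMPLE COMPLEX", "FIRE_SIZE": 900.0},
    [{"FIRE_NAME": "A", "FIRE_SIZE": 200.0}, {"FIRE_NAME": "B", "FIRE_SIZE": 300.0}],
)
print([r["FIRE_NAME"] for r in merged])
```

Either branch keeps exactly one accounting of the burned area, which is the point of the step: the complex and its subordinate fires are never both retained.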

Completeness evaluation
We attempted to evaluate, at least nominally, the completeness of the resulting data set by comparing estimates of annual fire numbers and area burned, by state, from the FPA FOD with other published estimates. To avoid errors associated with mapping imprecision, the FPA FOD estimates were compiled by state based on the nominal STATE attribute, rather than the point location of the fire, although those assignments agree in 99.9 % of the records.
Because the published estimates of annual wildfire numbers and area burned can differ considerably among sources due to inconsistencies and errors in measurement and reporting (e.g., see Urbanski et al., 2009), several sources of reference estimates were included in our assessment. We consider agreement in estimates of the same metrics from the FPA FOD and a given reference source as a proxy for "completeness" with respect to the latter. How accurately the reference estimates reflect actual wildfire activity is unknown; however, none are presumed to represent the true values, and therefore completeness, in fact, cannot be known by way of this assessment, or, indeed, at all. In other words, agreement of estimates from the different sources implies nothing about their accuracy.

Sources of reference estimates
Interagency estimates by state of annual fire numbers and area burned for 1992-2011 were obtained from the USFS wildfire activity statistics published until 1997 and the NICC annual wildfire statistics available for 1999-2011. (State-level wildfire activity statistics for 1998 are not available from the NICC; C. Leonard, personal communication, 2011.) The USFS wildfire activity statistics include reported estimates of the number of wildfires and area burned on lands qualifying for federal, state, and local wildfire protection, as required by the Cooperative Forestry Assistance Act of 1978. The data were provided by the USFS, the USDI agencies (bureaus), and the state forester or equivalent state official (e.g., state fire marshal). The NICC estimates are based on year-end summaries of wildfires and area burned according to the SIT reports. During the active fire season (i.e., preparedness level > 1), SIT reports are required on a daily basis from federal units with fire protection responsibility as well as any non-federal (i.e., state, county, or local) units with protection responsibility for lands under federal ownership (per cooperative agreement). The use of the SIT reporting system is optional for non-federal units with wildfire protection responsibility for non-federal lands. Voluntary SIT reporting by non-federal units may result in considerable underestimates of total wildfire numbers and area burned appearing in the NICC annual reports.
If the NICC estimates are low due to limited non-federal reporting through the SIT application in years and for states from which we were able to acquire viable data from the nonfederal systems of record, we would expect lack of agreement in those cases (i.e., FPA FOD estimates exceeding those from NICC), particularly for states with the majority of their wildland area under non-federal fire protection. To identify states and years for which the NICC numbers appear low due to underreporting of non-federal fires, we relied on several additional sources of published estimates of wildfire numbers and area burned, as available.
Although the USFS ceased publication of annual wildfire reports in 1997, the agency is still required to collect wildfire activity statistics from state and local units that it supports under its State and Private Forestry Cooperative Fire Assistance Program (USFS, 2010). Each participating state's fire marshal or other authority uses the Annual Wildfire Summary Report (AWSR), form FS-3100-8, to provide the necessary summary information, including the numbers and area burned by wildfires responded to by state and local firefighting agencies (USFS, 2010). We extracted wildfire counts and area burned estimates from all AWSRs available from the FAMWEB Data Warehouse, by state, for the period 1992-2011.
Finally, we extracted any all-lands estimates of wildfires and area burned that were published or otherwise made publicly available by the states themselves or other interagency groups. We were able to find such estimates, purported to be for all ownerships and spanning the entire period of interest (1992-2011), from the Alaska Interagency Coordination Center (AICC), the California Department of Forestry and Fire Protection (CDF), the Southwest Geographic Area Coordination Center (SWCC, covering Arizona and New Mexico), and the South Carolina Forestry Commission (SCFC). Wildfire activity statistics for state and local ownerships were obtained from the Texas A&M Forest Service (TFS) for 2005-2011.
In sum, we used the USFS/NICC estimates as our default for comparison to the FPA FOD data summarized by state and year. When any of the USFS/NICC estimates appeared to underestimate wildfire numbers or area burned within a state-year based on publicly available estimates from another authoritative source, we used the latter for the assessment described in the following section.

Methods of assessment
We compared estimates of wildfire numbers and area burned from the FPA FOD to those from the reference sources; this amounted to a comparison of measurement methods, for which many statistical analyses are inappropriate (Bland and Altman, 1986). For example, given the generally high interannual variability in fire numbers and area burned, annual wildfire activity metrics from different sources can be highly correlated but have very poor agreement, making correlations irrelevant. Because we were interested in the similarity of estimates (in relative rather than absolute terms), we calculated, for each state and year, the ratio of (1) wildfire numbers and (2) wildfire area burned estimated from the FPA FOD and the same metrics from the reference source. We did not expect perfect agreement in any case, due to measurement inconsistencies and errors. However, if the FPA FOD and the national interagency statistics were based on similar levels of federal and non-federal reporting, and the FPA FOD was nominally complete in that regard, then the estimates of fires and area burned from the FPA FOD should agree with the national interagency estimates reasonably well (e.g., ±20 %) and the ratio of FPA FOD numbers to the reference (REF) estimates should be close to 1. Averaged across years, the FOD/REF ratio essentially provides an index of agreement for a given period. However, in cases in which the FPA FOD includes records from non-federal sources unaccounted for in the reference estimates (e.g., data from local fire departments) for some years and not others, the average of the ratios can be very near to 1 despite poor agreement in estimates for individual years. We therefore used the ratio simply to score states on a scale of 0-10, by limiting the maximum value of the ratio to 1 and then averaging across years and multiplying the resulting value by 10. 
We used the scaling factor to make the result appear less like the average of the unadjusted ratios in order to avoid confusion. Low scores indicate states for which the FPA FOD appears relatively incomplete for the period of assessment based on national published estimates of wildfire numbers or area burned, while high scores indicate states for which the FPA FOD yields estimates of those metrics that tend to meet or exceed those from the reference source(s). We used the entire 20 yr span, 1992-2011, as one period of assessment and the recent 10 yr span, 2002-2011, as a second period, because we expected differences largely due to the increased use of non-federal reporting systems in the latter period.
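The scoring procedure can be written out as a short sketch: cap each annual FOD/REF ratio at 1, average across years, and multiply by 10. The function name and the sample values are hypothetical, and the skipping of years with a missing or zero reference estimate is an assumption, since the text does not say how such years were handled.

```python
def agreement_score(fod_by_year, ref_by_year):
    """Score agreement on a 0-10 scale from capped annual FOD/REF ratios."""
    capped = []
    for year, ref in ref_by_year.items():
        if not ref or year not in fod_by_year:
            continue  # assumption: years without a usable reference are skipped
        capped.append(min(fod_by_year[year] / ref, 1.0))
    if not capped:
        return None
    return 10 * sum(capped) / len(capped)


# Hypothetical state: half the reference count one year, 1.5 times the next.
fod = {2002: 50, 2003: 150}
ref = {2002: 100, 2003: 100}
print(agreement_score(fod, ref))
```

In this example the unadjusted ratios (0.5 and 1.5) average to exactly 1, which would suggest perfect agreement, whereas the capped score of 7.5 reflects the shortfall in the first year.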

Results
A total of nearly 2.6 million US wildland fire records were obtained from the (non-independent and generally overlapping) sources listed in Table 1 and considered for inclusion in the FPA FOD. The bulk of the data that we acquired were processed in 2010, at which point we limited our focus to records only from the years 1992-2008. At that time, we identified approximately 1.2 million wildfire records from that 17 yr time span that met our geospatial and information requirements, and which we compiled into a single database. Via our process, we identified approximately 120 000, or 10 %, of those records as redundant with others in the data set and subsequently purged them from the final database, thereby eliminating 22 million hectares from the original 58 million hectare data set, or 38 % of the total. In other words, the redundancy we identified in the unprocessed, compiled data set inflated wildfire numbers by a factor of 1.1 and inflated the estimate of wildfire area burned by a factor of 1.6. During the winter months of 2011-2012, we processed and added data for 2009-2011 as well as newly available records from the non-federal systems that were known to be missing from the 1992-2008 data set. The resulting FPA FOD, spanning 1992-2011, includes nearly 1.6 million wildfire records, which collectively account for approximately 46 million hectares burned during the 20 yr period. The data set includes wildfire records from each of the 50 US states and from the District of Columbia and Puerto Rico.
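The percentages and inflation factors quoted above follow from the rounded totals, as the short calculation below shows. It uses only the approximate figures given in the text; the variable names are hypothetical.

```python
# Approximate totals for the 1992-2008 compilation, as reported in the text.
records_total = 1_200_000
records_redundant = 120_000
area_total_mha = 58.0       # million hectares before purging
area_redundant_mha = 22.0   # million hectares removed

share_of_records = records_redundant / records_total
share_of_area = area_redundant_mha / area_total_mha

# Inflation factor = unprocessed total / total remaining after purging.
inflation_of_numbers = records_total / (records_total - records_redundant)
inflation_of_area = area_total_mha / (area_total_mha - area_redundant_mha)

print(f"{share_of_records:.0%} of records, {share_of_area:.0%} of area")
print(f"numbers x{inflation_of_numbers:.1f}, area x{inflation_of_area:.1f}")
```

The much larger effect on area than on fire numbers is consistent with redundancy being concentrated among large fires and complexes.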
The years and states for which at least a subset of data from one or more of the non-federal reporting systems is included in the final database are identified by the boldface values in Tables 9 and 10. Twenty-one states, including Alaska, afforded at least some viable and non-redundant non-federal data for the entire 20 yr span. Considering just the 10 yr span, 2002-2011, the number of states affording some viable and non-redundant non-federal data for that entire period rises to 36.
The maps in Fig. 1 show the locations of all of the wildfire records by year from the conterminous US included in the FPA FOD. A map of the locations of all FPA FOD records in the conterminous US is juxtaposed with a map of land mapped as burnable wildland surface-fuel types (per Scott and Burgan, 2005) in the LANDFIRE Refresh 2008 (LF_1.1.0b; see Ryan and Opperman, 2013) data set in Fig. 2. Figure 3 shows the locations of wildfires in the conterminous US, 1992-2011, reported as greater than or equal to 405 hectares in the (A) FPA FOD and (B) MTBS perimeter data set, for comparison.

Agreement with national estimates
Published USFS wildfire activity statistics for 1992-1997 approximate the number of US wildfires and total area burned during that six-year period as 626 000 and 9 million hectares, respectively, averaging to about 104 000 fires and 1.5 million hectares per year. Approximations for 1998-2011 from NICC Predictive Services are 1.1 million wildfires and 38 million hectares total, averaging to about 80 000 fires and 2.7 million hectares per year for that 14 yr period. Annual estimates of US wildfire area burned from the FPA FOD align well with the national estimates for the entire 20 yr period (Fig. 4b), but annual fire numbers from the FPA FOD are generally consistent only with the numbers reported for 1998-2011 by NICC (Fig. 4a). Annual fire numbers estimated from the FPA FOD range from 65 to 70 % of the USFS estimates for the period 1992-1997 (Fig. 4a).

Figure 2 (caption, partial). Points depicting the locations of fires are not to scale. Areas with few or no fires represented in the data set may have afforded little viable data for inclusion in the FPA FOD, or, alternatively, may be areas with little or no burnable land cover. Non-burnable fuel types are those with "insufficient wildland fuel to carry wildland fire under any condition", and include urban or suburban development, agricultural land maintained in a non-burnable condition, snow/ice, open water, and bare ground (Scott and Burgan, 2005).

Agreement with published estimates of wildfire numbers and area burned is generally evident for states and years from which non-federal data were available and incorporated into the FPA FOD (Tables 9 and 10). Agreement scores for the 21 states from which we acquired viable non-federal data for all 20 yr ranged from 8.1 to 9.9 for wildfire numbers (Table 9) and 7.3-9.9 for wildfire area burned (Table 10).
Our index of agreement is generally high for states and regions with a large proportion of land area administered by one or more of the five primary federal agencies with wildland fire-management programs (Figs. 5 and 6), as they are least affected by missing non-federal records. Twelve of the 17 states in the western region scored 8.2 or higher for the 20 yr period for wildfire numbers and area burned, while the region's west-central states and Hawaii, which have less land under federal administration than Alaska and the far-western states in the conterminous US, scored the lowest of the western region (Tables 9 and 10, Fig. 5a and c). Nine of the ten US states with the lowest 20-year scores, for either metric, are in the northeastern region (Tables 9 and 10, Fig. 5a and c). While they are included in Tables 9 and 10, scores for the states of Iowa, Illinois, Kansas, New York, and Texas are not indicated in Fig. 5, because they are misleading due to reporting biases evident in the FPA FOD and the reference sources (which we describe in the next section and illustrate in Figs. 7-10).

Discussion
Nearly 100 years ago, Show and Kotok (1923) argued that "successful [wildfire] protection depends on a critical study of past performances. For this purpose, the importance of accurate and complete records of fires cannot be overemphasized." These sentiments have been echoed repeatedly since, including in 1959, by A. A. Brown, who contended that "the discovery of new facts ferreted out of reliable records of our day-to-day experience in [wild]fire-control operations can point the way to improvements in management, training, methods, and techniques." More recently, Brown et al. (2002) pointedly characterized wildfire-occurrence data as "the most important data that a fire management [sic] agency can utilize." A wealth of information is collected in federal, state, and local fire reports, but even the most rudimentary interagency analyses of wildfire numbers and area burned from the systems of record have been unfortunately stymied to some degree by their disunity. While necessarily incomplete in some aspects, the database presented here is intended to facilitate fairly high-resolution geospatial analysis of US fire activity over the past two decades, based on available information from the authoritative systems of record.
The inherent incompleteness of this data set is largely a function of the lack of viable non-federal wildfire records from certain states and years. These records are lacking for one of two reasons: (1) fire reports had not been entered into the systems of record that we accessed, or (2) the archived fire reports lacked values for one or more of the data elements required for inclusion in the FPA FOD (i.e., fire location at least as precise as PLSS section, discovery date, and final fire size). The location requirement was the primary cause for omission of non-federal records from the FPA FOD. Despite increased use of the national non-federal reporting systems (see Thomas and Butry, 2012), precise fire location information is not consistently required by those systems and thus not included in many of the available wildfire records. For example, about half of the wildfire records initially acquired in 2010 from the NASF database and 92 % of those from the NFIRS wildland fire module lacked location information that met our criteria. When we acquired data again in 2012 from the NASF database via the FAMWEB data warehouse, the share of records without viable location information had dropped to 24 %. Of the wildfire records acquired most recently (in 2012) from NFIRS (for the year 2010), 82 % still lacked location information that met our criteria. According to Thomas and Butry (2012), data entry into the wildland fire module of NFIRS, which is where precise location and size information can be entered for wildfires, is "not required by fire departments, so it is rarely completed." The impact of missing non-federal data on the utility of the FPA FOD for analyses of wildfire activity will depend on the domain of interest and the federal, state, and local wildland firefighting roles and responsibilities in the states within that domain.
Federal agencies that manage and administer large tracts of public or tribal-trust lands are also authorized to, and generally do, provide wildfire protection for those lands, either directly or indirectly through contracts and agreements with other organizations (Artley, 2009). Federal reporting of those activities has been generally consistent and complete since circa 1992 (S. Larrabee, personal communication, 2010), and those records should be relatively complete in the FPA FOD. The degree to which state entities have legislatively accepted wildfire protection responsibility for state and private lands (i.e., non-federal and nontribal lands outside incorporated municipalities) varies by state (Artley, 2009). Only fires to which the state responds, whether for initial attack or large fire support, are expected to be reported in the state systems of record. The FFS, for example, has assumed direct wildland firefighting responsibility for all state and private lands in the state of Florida, while the Washington State Department of Natural Resources (WDNR) and Oklahoma Forestry Services (OFS) are examples of state entities that provide direct protection on non-federal lands only in designated areas (Artley, 2009). Unlike records from the FFS, which should map statewide, those from WDNR and OFS are expected to cluster in certain regions (i.e., forested lands in Washington and eastern portions of Oklahoma, respectively) and indeed do (see Fig. 1).
In certain states, including Colorado, Kansas, and Texas, the local fire departments are the official primary and initial responders to non-federal wildfires, with state (and federal) agencies authorized to provide support to the local authorities when local response capabilities are surpassed (e.g., by large fires) (Artley, 2009;TFS, 2012). The degree of protection responsibility assumed by the state firefighting entities is not necessarily consistent over time, however, and some of these role changes are evident in the availability of wildfire records for certain states. In Texas, for example, the area for which the Texas A&M Forest Service (TFS) provides cooperative fire protection began expanding in the mid-1990s from state and private wildlands in eastern Texas (i.e., east of Interstate 45) to the entire state, per state law passed in 1993 (TFS, 2009). The TFS estimates that it still only responds to about 15 % of the wildfires that occur each year in the state, but because many of the incidents that it is called upon to assist with are very large, TFS records typically account for 70 % of the total area burned, including, since circa 1998, incidents in central and western Texas (TFS, 2012).
Fig. 5 caption (partial): [...] Tables 9 and 10). Low scores indicate states for which the FPA FOD appears relatively incomplete for the period of assessment based on national published estimates of wildfire numbers (A, B) or area burned (C, D), while high scores indicate states for which the FPA FOD yields estimates of those metrics that tend to meet or exceed those from the reference source(s). Scores for IA, IL, KS, NY, and TX are omitted because they are misleading due to reporting biases evident in the FPA FOD and the reference sources (see Figs. 7-10).
For Texas and other states in which local entities bear responsibility for initial attack on non-federal wildlands, no data set purporting to represent all lands can be deemed complete, particularly with regard to wildfire numbers, without viable local wildfire records. In general, non-federal fire reporting has been on the rise over the past several decades, and users of national data sets like the FPA FOD must beware of local reporting biases in addition to those of state entities in order to avoid drawing spurious conclusions when analyzing the data. Apparent trends in the numbers and area burned by wildfires, for example, may be the result of multiple factors, including changes in climate, fuels, demographics (e.g., population density), fire-management policies (Johnston and Klick, 2012), and -as we underscore here -levels of reporting.
Fig. 7 caption (partial): [...] Fig. 1, and from the FPA FOD. Estimates from Annual Wildfire Summary Reports (AWSR), which should cover state and private lands only, are included as well. Viable non-federal data were available and included in the FOD for all years, and FPA FOD estimates agree well with AWSR numbers throughout the entire period. Estimates from all sources agree well for the period 1992-1999, but are based only on federal records and records from fires responded to by the state's forest ranger force. Beginning in 2000, numbers from the AWSR and the FPA FOD increase with increased reporting from local fire departments, which remain unaccounted for by the NICC estimates throughout the entire period.
While the US fire departments' system of record, NFIRS, dates back to the mid-1970s, its use has always been voluntary and therefore subject to the vagaries of voluntary reporting. It is estimated that, by the late 1980s, only one third of all fires -of any type -responded to by US fire departments were accounted for in the system (Hall and Harwood, 1989). That percentage is estimated to have risen to 65 % by 2010 (Thomas and Butry, 2012). Of all fires reported in NFIRS, only a fraction are indicated as involving wildland fuels. Using a national estimates approach adapted from Hall and Harwood (1989), Thomas and Butry (2012) determined that an average of nearly 117 000 fires in wildland fuels were responded to annually by US fire departments from 2002 to 2006 and that the data entered into NFIRS represented just a subset of those, growing from 11 % in 2002 to 27 % in 2006. Thus, even if all of the wildfires in the NFIRS database could be incorporated in the FPA FOD, the resulting database would provide an incomplete, and inconsistent, account of US wildfire activity from the standpoint of local responses. But, as noted above, of the growing subset of wildfires responded to by local departments and reported in NFIRS, only a small fraction met our geospatial and information standards. However, we appear to have acquired and incorporated viable records of some "local fires" for certain states and years from outside NFIRS, thereby augmenting that subset in the final FPA FOD.
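To make the effect of changing reporting levels concrete, the NFIRS figures from Thomas and Butry (2012) quoted above can be turned into a small calculation. The sketch below is illustrative only. It assumes, as a simplification not made in the source, that the true number of wildland-fuel fires was the same in 2002 and 2006 (the reported 2002-2006 average), so that any change in the apparent count comes from reporting alone. The function and variable names are hypothetical.

```python
# Figures quoted in the text (Thomas and Butry, 2012).
AVG_ANNUAL_WILDLAND_FIRES = 117_000          # "nearly 117 000" per year, 2002-2006
REPORTED_SHARE = {2002: 0.11, 2006: 0.27}    # share of those fires entered into NFIRS


def apparent_count(true_count, reported_share):
    """Number of fires that would appear in a reporting system."""
    return true_count * reported_share


def apparent_change(share_start, share_end):
    """Relative change in apparent counts when true activity is constant."""
    return share_end / share_start - 1.0


# ASSUMPTION: true activity is held constant at the multi-year average.
counts = {
    year: apparent_count(AVG_ANNUAL_WILDLAND_FIRES, share)
    for year, share in REPORTED_SHARE.items()
}
growth = apparent_change(REPORTED_SHARE[2002], REPORTED_SHARE[2006])

for year, count in counts.items():
    print(f"{year}: about {count:,.0f} fires would appear in NFIRS")
print(f"Apparent change with constant true activity: {growth:.0%}")
```

Under that assumption the apparent count more than doubles between the two years even though the underlying activity does not change, which is the kind of reporting-driven trend that analysts need to rule out.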
Fig. 9 caption (partial): Estimates from the USFS wildfire activity statistics reports were used for 1992-1997 reference numbers, which indicate that an average of about 1300 and 600 wildfires occurred annually in Iowa and Illinois, respectively, during that period. Estimates from NICC Predictive Services were used for 1999-2011 and estimates from Annual Wildfire Summary Reports (AWSR), which should cover state and private lands only, are included, starting in 2005 for Iowa and in 2008 for Illinois. For both states, the FPA FOD estimates for 1992-2000 are based on reports from the federal systems of record only. While we were able to incorporate some viable data from the non-federal systems for both states beginning in 2001, and the FPA FOD and NICC estimates align fairly well in certain years during the period 1999-2004 for Iowa and 1999-2006 for Illinois, those estimates are based largely, if not solely, on federal reporting. Furthermore, the number of records in the FPA FOD and the wildfire numbers reported by NICC are less than 70 % of the AWSR numbers for Iowa 2005-2007. The FPA FOD estimates agree best with the Illinois AWSR numbers 2008-2011. However, we presume that those too are underestimates of wildfire activity on all ownerships, which may be best approximated by the earlier USFS counts. An upward trend toward those earlier numbers is evident for Illinois in 2008-2010, ostensibly due to increased non-federal reporting.
Fig. 10 caption (partial): While we were able to incorporate some viable data for Kansas from the non-federal systems for all but 1992 and 2009-2011 (Table 9), the resulting set of FPA FOD records for the state is clearly far from complete. Agreement with the reference estimates improves 2004-2011 not due to acquisition of substantially more viable non-federal data, but instead due to decreased reporting of non-federal fire activity to NICC.
Although Thomas and Butry (2012) contend that national wildfire activity statistics based on federal and state incident reporting "do not account for wildland fires originating within municipal jurisdictions (i.e., towns and communities)", mutual aid fires and other local incidents with federal or state responders will, in fact, likely be reported in the respective agency systems of record. In other cases, however, there are clear signals in the data that indicate that local incidents, whether or not responded to by the state fire service, have been included in the state system(s) of record, and, subsequently, the FPA FOD. The number of wildfire records we acquired from the state and federal systems for New York and Texas, for example, begins to inflate at points in time, evidently corresponding to increased reporting of local incidents in the state systems (Figs. 7 and 8). According to the New York State Department of Environmental Conservation (NYDEC), its state forest ranger division reported 279 wildfires, on average, for the period 1988-2012 (NYDEC, 2013). The NYDEC distinguishes those fires from the annual average of 5500 "wildfires, brush fires, grass fires, or other outdoor fires" that the New York State (NYS) Office of Fire Prevention and Control reports that NYS fire departments responded to from 2002 to 2011 (NYDEC, 2013). The FPA FOD numbers are consistent with the NYDEC (i.e., state) estimates prior to 2000, after which the FPA FOD wildfire count climbs to a maximum of about 7700 in 2005, with no addition of data from the NFIRS system, suggesting that the NASF database, and thereby the FPA FOD, began including local (i.e., fire department) reports in 2000 (Fig. 7). In the case of Texas, fire departments began using a reporting system maintained by the TFS in 2005 (TFS, 2012), and the Texas wildfire records for 2005-2008 that we acquired from the NASF database and those for 2009-2011 that we acquired directly from the TFS evidently included at least some, if not most, of those local reports, as the annual FPA FOD numbers jump from an average of about 1500 to 13 000 starting in 2005 (Fig. 8). The pre-2005 numbers average about 12 % of the 2005-2011 numbers, which is generally consistent with the TFS (state fire service) estimate that it responds to only about 15 % of all wildfires in the state each year (TFS, 2012). The inclusion of local records for 2005-2011 in the FPA FOD is likewise evident from changes in the spatial extent of the data for Texas: starting in 2005, coverage markedly increases in the central and western portions of the state (Fig. 1), which have effectively remained outside of the purview of TFS fire protection, except in cases of cooperative assistance for large incidents. Local numbers appear to be included in the NICC estimates for Texas only from 2008 to 2010, as county data.
In the cases of both New York and Texas, the FPA FOD estimates of wildfire numbers and area burned agree fairly well with reference estimates across all years, as apparent from their relatively high scores in Tables 9 and 10. However, the overall agreement is misleading if used as a proxy for completeness of the data set, due to the fluctuations in local reporting evident during the period of record (Figs. 7 and 8).
There are other cases in which agreement between metrics estimated from the FPA FOD and reference sources clearly does not translate to completeness, because, again, the reference estimates themselves are evidently incomplete. Most notable are the cases of Iowa, Illinois, and Kansas, which, along with New York and Texas, were omitted from the maps in Fig. 5 because their scores were misleading. As with New York and Texas, there are periods during which the FPA FOD estimates of wildfire numbers for Iowa, Illinois, and Kansas align well with reference estimates, but those estimates are based largely, if not solely, on federal reporting and therefore biased low (Figs. 9 and 10). That is indeed the case for Iowa during the period 1999-2004 and for Illinois during 1999-2007 (Fig. 9). For both states, the FPA FOD estimates likewise agree relatively well with the generally higher AWSR numbers for 2008-2011, which are non-federal estimates only. We presume that those too are underestimates of wildfire activity on all ownerships, which may be best approximated by the earlier USFS counts (Fig. 9). An upward trend toward those earlier USFS numbers is evident for Illinois in 2008-2010, ostensibly due to increased non-federal reporting during that period (Fig. 9). For Kansas, we acquired very few non-federal records that met our criteria for inclusion in the FOD, and therefore all FPA FOD estimates for that state are biased low despite fairly good agreement with NICC numbers for the period 2004-2011, which too are based on scant non-federal reporting (Fig. 10).
Ignoring the five states just noted, which are difficult to score due to reporting biases, we have tried to identify states and years in which there is relatively good FPA FOD agreement with numbers and area burned estimates published elsewhere (Fig. 5). Keeping in mind the inherent incompleteness of local records for most, if not all, states and years, the FPA FOD should be best suited for all-lands analyses of wildfire activity in areas within states or interstate regions (e.g., ecoregions) with relatively high scores (e.g., > 8) for wildfire numbers and/or area burned. Those high-scoring states include most of those in the Great Lakes, southeastern, and western regions (including Alaska), as well as a handful of others. Moreover, the FPA FOD should be complete or near complete in terms of data compiled from the federal systems of record, which alone lends itself to analyses that were previously limited to data from a single agency (e.g., USFS) due to difficulty compiling records from the disparate federal systems (e.g., Stephens, 2005). Regardless of its apparent appropriateness for a given analysis, users must carefully critique the FPA FOD to recognize and understand any potential limitations of data obfuscated by a relatively high score. Likewise, users may find value in at least a subset of data from low-scoring states, but the onus is on the users to critique them fully.
Examples of questions to ask before proceeding with any analysis of wildfire activity using the FPA FOD include the following:
1. Are wildfire records from multiple federal agencies and/or non-federal sources required to answer the question(s) at hand?
If yes, then the scores presented here should provide a starting point for determining an appropriate spatial domain and/or identifying potential issues with a predetermined domain for analysis. If no, and the question at hand can be answered by directly consulting the system of record of a single federal agency, for example, then we recommend using that authoritative source rather than the FPA FOD. The FPA FOD is intended to facilitate interagency or all-lands analyses, not to replace or trump existing systems of record, which remain in authority. In other words, if an analysis of wildfire activity solely on USFS lands is to be performed, then the USFS system of record, FIRESTAT/NIFMID, should be consulted.
2. What spatial resolution is required for the analysis?
The FPA FOD should provide point locations of wildfires at least as precise as a PLSS section (2.6 km² grid). But many non-federal records that were excluded from the database due to imprecise fire location information could be used directly from the source systems for analyses at, for example, the county level (e.g., per the Cohesive Strategy, WRSC, 2012). If the analysis does require precise wildfire location information, analysts must bear in mind that the coordinates provided in the FPA FOD may or may not represent actual ignition points, or even fall within the actual burn perimeter, due to reporting inconsistencies and imprecise georeferencing. Moreover, the spatial impacts of large fires, which, by definition, burn far from their ignition points, can be characterized imprecisely at best with this or any point-based reporting data set. Burned-area estimates from the FPA FOD will be necessarily georeferenced to the contributing wildfires' ostensible points of origin or nominal domains (e.g., state, reporting unit). Without fire footprints and temporal progression information, one cannot assert for a time period of interest that a given area burned, for example, in the state of Nevada; rather, an estimate from the FPA FOD would represent area burned by fires reported as starting (or having been discovered) within the specified time period and spatial domain. The FPA FOD is therefore most useful for characterizing the statistical properties of fires reported as starting at a given place and time. Supplemental information about spatial and temporal impacts of large fires can be found in the ICS-209 records and MTBS data set using the links provided in the FPA FOD.
Caption fragment: [...] USDI and USDA, 2012). Federal Information Processing Standards (FIPS) codes are used to identify states (NIST, 1987).
[...] due to the lack of viable local wildfire records.
Simply because so many fires responded to by local fire services go unreported (Thomas and Butry, 2012), it is impossible to characterize patterns or trends having to do with wildfire numbers on all US lands based on fire reporting. A true all-lands analysis may be possible only with the aid of satellite detections (e.g., MODIS; see Urbanski et al., 2009); however, it is difficult, if not impossible, to confidently tease apart planned from unplanned ignitions (i.e., prescribed and agricultural burns versus wildfires) in remotely sensed data. Yet, we expect that, at the very least, the FPA FOD can be used to faithfully characterize patterns of wildfire area burned within high-scoring states or interstate regions (Fig. 5c and d). We expect that this generally holds true regardless of missing local data, because the largest 5 % or so of all fires generally account for upwards of 85 % of total area burned (Strauss et al., 1989;Stocks et al., 2003), and these large fires tend to be multijurisdictional events that are responded to and reported by entities other than (although perhaps in addition to) local fire departments (Bunton, 1999;Artley, 2009). Even so, users should check other data sources and consult with local authorities to confirm that no significant wildfires are missing from the FPA FOD for their focal area before performing analyses of wildfire area burned. A relatively low ratio for a given state-year in Table 10 could guide analysts in this regard, keeping in mind that it is not uncommon to find quite disparate area burned estimates for fires reported and/or mapped in the various systems and applications, indicating that there is much uncertainty inherent in those estimates (Urbanski et al., 2009).
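The expectation that a small share of fires dominates total area burned can be checked for any subset of records before an area-burned analysis is attempted. The following is a minimal sketch, assuming only that final fire sizes are available as a list of numbers. The function name, the default 5 % threshold as a parameter, and the sample sizes are hypothetical and are not taken from the FPA FOD.

```python
import math


def top_share_of_area(fire_sizes, top_fraction=0.05):
    """Fraction of total area contributed by the largest fires.

    `top_fraction` is the share of fires, by count, treated as "largest".
    """
    if not fire_sizes:
        raise ValueError("fire_sizes must not be empty")
    ordered = sorted(fire_sizes, reverse=True)
    # Always count at least one fire so small samples still yield a value.
    n_top = max(1, math.ceil(top_fraction * len(ordered)))
    total = sum(ordered)
    if total == 0:
        return 0.0
    return sum(ordered[:n_top]) / total


# Made-up sizes (hectares) for illustration: nine small fires, one large one.
sample_sizes = [1.0] * 9 + [91.0]
share = top_share_of_area(sample_sizes)
print(f"Largest fires account for {share:.0%} of area burned in this sample")
```

A share well below the range cited above for a given state or region would not by itself indicate missing records, but it may be a reasonable prompt to look for large incidents that are absent from the subset.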
It is also important that users recognize that the data elements included in the FPA FOD are just a core subset of what may reside in each system of record, and additional elements may be necessary or desirable for the analysis at hand. It may be important, for example, to screen or stratify records based on fire-management objectives or initial suppression strategies (see Johnston and Klick, 2012), which are not elements in the FPA FOD, but may be indicated, albeit inconsistently, in the original, source data sets. For large or otherwise significant wildfires occurring since 1999, the ICS-209 records in the SIT/209 databases accessible via FAMWEB should provide some of the most detailed and consistently available information regarding suppression and other management actions taken during those incidents (see NIFC, 2011;NWCG, 2012a), and where ICS_209_INCIDENT_NUMBER and ICS_209_INCIDENT_NAME are populated in the FPA FOD, those attributes, along with FIRE_YEAR, can be used to join the two data sets. Users can then define their own criteria for distinguishing different fire types, as relevant to their analyses. Given careful consideration of the assessment to be performed and a thorough understanding of the data and their limitations, the FPA FOD should provide unprecedented one-stop access to spatially explicit US wildfire records for 1992-2011 to support statistical analyses of data from the authoritative systems of record. Acquisition, standardization, error checking, compilation, scrubbing, and evaluation of these data required a tremendous effort, but the effort was necessary for the analyses required by FPA.
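The join described above, on ICS_209_INCIDENT_NUMBER, ICS_209_INCIDENT_NAME, and FIRE_YEAR, might be sketched as follows. Only those three FPA FOD attribute names come from the text. The ICS-209 column names (INCIDENT_NUMBER, INCIDENT_NAME, REPORT_YEAR), the FOD_ID and STRATEGY fields, and all record values are hypothetical placeholders, since the layout of the SIT/209 tables is not described here.

```python
def join_fod_to_ics209(fod_records, ics209_records):
    """Pair FPA FOD records with ICS-209 records sharing number, name, and year."""
    # Index the ICS-209 side by the three-part key (column names are assumed).
    index = {}
    for rec in ics209_records:
        key = (rec["INCIDENT_NUMBER"], rec["INCIDENT_NAME"], rec["REPORT_YEAR"])
        index.setdefault(key, []).append(rec)

    pairs = []
    for fod in fod_records:
        number = fod.get("ICS_209_INCIDENT_NUMBER")
        name = fod.get("ICS_209_INCIDENT_NAME")
        if not number or not name:
            continue  # link attributes are not populated for this fire
        for ics in index.get((number, name, fod["FIRE_YEAR"]), []):
            pairs.append((fod, ics))
    return pairs


# Made-up records for illustration only.
fod_records = [
    {"FOD_ID": 1, "FIRE_YEAR": 2006,
     "ICS_209_INCIDENT_NUMBER": "XX-000001", "ICS_209_INCIDENT_NAME": "EXAMPLE A"},
    {"FOD_ID": 2, "FIRE_YEAR": 2006,
     "ICS_209_INCIDENT_NUMBER": None, "ICS_209_INCIDENT_NAME": None},
    {"FOD_ID": 3, "FIRE_YEAR": 2007,
     "ICS_209_INCIDENT_NUMBER": "XX-000001", "ICS_209_INCIDENT_NAME": "EXAMPLE A"},
]
ics209_records = [
    {"INCIDENT_NUMBER": "XX-000001", "INCIDENT_NAME": "EXAMPLE A",
     "REPORT_YEAR": 2006, "STRATEGY": "placeholder"},
]

pairs = join_fod_to_ics209(fod_records, ics209_records)
print(len(pairs), "matched record pair(s)")
```

Including the year in the key guards against incident numbers or names that recur in different years, as in the third sample record, which shares a number and name with the first but is not matched.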
Assuming that the need for fully scalable US wildfire activity statistics from the systems of record will persist, efforts like those described here, which essentially expand upon similar undertakings in past decades (e.g., Brown et al., 2002;Schmidt et al., 2002), must necessarily continue until at least the most basic elements of wildland fire reporting are truly standardized and a national system of record is established and used by federal, state, and local firefighting entities. Progress toward those goals appears promising under the auspices of the Integrated Reporting of Wildland-Fire Information (IRWIn) project, which is sponsored by the NWCG and intended to provide an "integrated and coordinated process for collecting and reporting [wildland fire] incident/event data" (USDI, 2013). In addition to enforcing NWCG data standards, IRWIn, as proposed, would provide linkages between wildland fire dispatch and reporting systems, including those of non-federal entities, using a global unique identifier for each incident. Indeed, IRWIn may provide the ultimate way forward in national wildland fire reporting and statistical analyses. In the meantime, the FPA FOD is intended to help fulfill information needs heretofore met with great difficulty or not at all.