Over 10 million seawater temperature records for the United Kingdom Continental Shelf between 1880 and 2014 from 17 Cefas (United Kingdom government) marine data systems

The datasets described here bring together quality-controlled seawater temperature measurements from over 130 years of departmental government-funded marine science investigations in the UK (United Kingdom). Since before the foundation of a Marine Biological Association fisheries laboratory in 1902 and through subsequent evolutions as the Directorate of Fisheries Research and the current Centre for Environment Fisheries & Aquaculture Science, UK government marine scientists and observers have been collecting seawater temperature data as part of oceanographic, chemical, biological, radiological, and other policy-driven research and observation programmes in UK waters. These datasets start with a few tens of records per year, rise to hundreds from the early 1900s, thousands by 1959, and hundreds of thousands by the 1980s, peaking with > 1 million for some years from 2000 onwards. The data source systems vary from time series at coastal monitoring stations or offshore platforms (buoys), through repeated research cruises or opportunistic sampling from ferry routes, to temperature extracts from CTD (conductivity, temperature, depth) profiles, oceanographic, fishery and plankton tows, and data collected from recreational scuba divers or electronic devices attached to marine animals. The datasets described have not been included in previous seawater temperature collation exercises (e.g. International Comprehensive Ocean–Atmosphere Data Set, Met Office Hadley Centre sea surface temperature data set, the centennial in situ observation-based estimates of sea surface temperatures), although some summary data reside in the British Oceanographic Data Centre (BODC) archive, the Marine Environment Monitoring and Assessment National (MERMAN) database and the International Council for the Exploration of the Sea (ICES) data centre. 
We envisage the data primarily providing a biologically and ecosystem-relevant context for regional assessments of changing hydrological conditions around the British Isles, although cross-matching with satellite-derived data for surface temperatures at specific times and in specific areas is another area in which the data could be of value (see e.g. Smit et al., 2013). Maps are provided indicating geographical coverage, which is generally within and around the UK Continental Shelf area, but occasionally extends north from Labrador and Greenland to east of Svalbard and southward to the Bay of Biscay. Example potential uses of the data are described using plots of data in four selected groups of four ICES rectangles covering areas of particular fisheries interest. The full dataset enables extensive data synthesis, for example in the southern North Sea where issues of spatial and numerical bias from a data source are explored. The full dataset also facilitates the construction of long-term temperature time series and an examination of changes in the phenology (seasonal timing) of ecosystem processes. This is done for a wide geographic area with an exploration of the limitations of data coverage over long periods. Throughout, we highlight and explore potential issues around the simple combination of data from the diverse and disparate sources collated here. The datasets are available on the Cefas Data Hub (https://www.cefas.co.uk/cefas-data-hub/). The referenced data sources are listed in Sect. 5. Published by Copernicus Publications. 28 D. J. Morris: Seawater temperatures around the UK, 1880–2014


Introduction
The measurement of surface and subsurface seawater temperature has been a standard activity for a significant proportion of marine researchers for the past 200 years. From the physical oceanographer to the marine chemist to the marine biologist, the original purposes for such measurements range from a desire to determine the physical properties and movements of seawater to understanding how temperature influences the distribution of marine species, their migration, growth, and reproduction, and, as a dominant feature of the collected works herein, the impacts of and upon commercial activities such as fishing. Furthermore, accurate sea temperature data are necessary for a wide range of applications, from providing boundary conditions for numerical hydrodynamic models and weather prediction systems, to assessing the performance of long-term climate modelling and understanding the drivers of observed changes in marine ecosystems. The importance of sea surface temperature (SST) to climate science is reflected in its designation as an "essential climate variable" of the Global Climate Observing System (Bojinski et al., 2014).
The Marine Biological Association (MBA) of the United Kingdom was established in 1884 in order "to foster the study of marine life, both for its scientific interest and because of the need to know more about the life histories and habitats of food fishes". In 1902 a dedicated fisheries laboratory was established in the Port of Lowestoft by the MBA together with the UK Board of Trade. This was the UK's primary contribution to the newly founded International Council for the Exploration of the Sea (ICES). From its inception, the laboratory in Lowestoft has collected information on fish stocks surrounding the British Isles, but also water temperatures at the surface and near the seabed. Much of the information collected by the Lowestoft laboratory over the past 115 years has never been made publicly available, but these datasets are now the subject of legacy data rescue (Wyborn et al., 2015) as part of a drive for "open data" within the UK government. This paper is one result of that ongoing effort. In their Preamble, Griffin and the CODATA DAR-TG (2015) describe the unglamorous reality of legacy data rescue and the reasons why heritage data are not as readily accessible as the term "archive" might imply. The approach taken here is to turn, in their terminology, old data into new data and to present, explore, and explain the new data so that they can be used within a context that includes the diverse and disparate reasons for which the old data were collected and the differences and limitations of the acquisition and measurement techniques of the day.
The methods of measuring seawater temperature range from the simple thermometer to the ubiquitous presence on a modern marine research vessel of a conductivity, temperature, and depth (CTD) instrument of some kind. Such activities have, for well over 100 years, formed a routine part of the sea-going and observational work of the MBA Lowestoft substation and its successors. In 1910 the Lowestoft laboratory transferred to the Board of Agriculture and Fisheries where it then became a Fisheries laboratory under MAFF (Ministry of Agriculture, Fisheries and Food) in 1920. From 1955 it was known as the DFR (Directorate of Fisheries Research); see Lee (1992) and Graham (1953). It now continues as Cefas (Centre for Environment, Fisheries & Aquaculture Science) under Defra (Department for Environment, Food and Rural Affairs), with a remit focusing on the UK Continental Shelf and occasional forays into more distant waters for projects supporting UK government priorities.
Data holdings within this institution extend back beyond 1902 although these form only a very small part of the collated temperature dataset described here. The historic focus of our marine research has been biological, specifically fisheries related, but this has changed as both government policy needs and interests have widened. Figure 1 shows the RV Huxley, which was deployed between 1902 and 1909, with Fig. 2 highlighting the differences between the adapted trawlers of early years and the current bespoke research vessel, the RV Cefas Endeavour, which started service with Cefas in 2003. A wider, historic, institutional context for the 17 data sources described here is available in Cefas (2014). The methods of measuring seawater temperature have ranged from simple mercury thermometers deployed in buckets of seawater, to pumped seawater systems on research vessels (see Kent and Taylor, 2006, for an exploration of these methods of measurement), to the ubiquitous presence on most modern research vessels of CTD instruments or, more recently, autonomous surveillance buoys, gliders, profilers, and electronic devices attached to animals. Much has been written about difficulties in calibrating information from these various data sources; see, for example, Matthews (2013) and Kennedy et al. (2011a, b). Subtle differences in the methodologies for calibrating such disparate measurements have been found to greatly impact reconstructions of time series of global climate warming (Karl et al., 2015). Both issues with ship data sources, including the change from bucket samples to engine intake thermometers, and, more relevant here, the increase in data density with time as buoy-mounted observation systems were deployed have been specifically identified as sources of time-dependent bias in the global SST record. We explore such possible data bias in general terms along with examinations of the effects of data source, time dependencies, location, and numerical bias.
Many different data portals and data syntheses now exist housing collated maritime temperature records, the most notable including the International Comprehensive Ocean-Atmosphere Data Set (ICOADS; Freeman et al., 2017), the NOAA Extended Reconstruction Sea Surface Temperature (ERSST; https://www.esrl.noaa.gov/psd/data/gridded/data.noaa.ersst.v4.html) dataset, the Hadley Centre SST gridded dataset derived from observations in ICOADS (HadSST3; Kennedy et al., 2011a, b), and the Japanese Meteorological Agency centennial observation-based estimates of SSTs (COBE-SST; http://ds.data.jma.go.jp/tcc/tcc/products/elnino/cobesst/cobe-sst.html). All of these are composite SST series that ingest data from multiple different instrument platforms (ships, buoys, and some satellite data in the case of COBE-SST) and from different measurement methods to create consistent long-term time series (see Hausfather et al., 2017). Analysis of these long-term historic datasets shows that the sea surface temperatures around the British Isles have warmed at rates up to 6 times greater than the global average (Dye et al., 2013). Indeed, this region has been identified as one of 20 "hot-spots" of marine climate change globally based on an analysis of trends in ocean temperature (Hobday and Pecl, 2014).
Numerically, the data presented here start with tens of observations per year, rising to hundreds from the early 1900s, to thousands by 1959, to hundreds of thousands by the 1980s, peaking with > 1 million for some years from 2000. The majority of the data included in this paper originate from modern research and monitoring programmes executed by scientists using appropriate QA-QC (quality assurance and quality control) processes for their designated purposes, which did not include the extensive sharing and repurposing of the current day.
In this paper, 17 separate data systems are described, comprised of more than 10 million individual temperature measurements. Most are from the seas around the British Isles (ICES areas IV, VI, and VII) but there are some additional measurements in the Bay of Biscay (ICES area VIII), off Labrador and southern Greenland (ICES area XIV) and in the Norwegian and Barents seas (ICES areas I and II); see Fig. 3 (ICES, International Council for the Exploration of the Sea). Dann et al. (2015) specifically recognise the challenges of using "data available from different surveys [that] have been collected for different purposes, using different gears and different sampling strategies over time". They were working on fish and their aim was "to provide a broad view of regional, depth related ... and temporal patterns ... by integrating as much information as possible". This paper collates and makes readily accessible data that can contribute significantly to such integrations of seawater temperature.
www.earth-syst-sci-data.net/10/27/2018/ Earth Syst. Sci. Data, 10, 27-51, 2018
The data collection programmes that act here as data sources were designed to measure temperature for a specific purpose (physical oceanographic measurements and as part of Cefas SmartBuoy programmes focusing on nutrient levels or as a directly relevant contextual measurement, e.g. WaveNet and RV Cefas Endeavour FerryBox). Other datasets arise from research for which temperature data are collected for general context and interpretation. Two data sources are from citizen science, although the Coastal Temperature Network (CTN), which was established in the mid-1960s (with individual datasets going back over 100 years), preceded the term whilst also relying on volunteers. The majority of these temperature datasets have been previously analysed and integrated into a myriad of diverse and disparate reports and scientific papers, often in the form of summary tables and figures or as contributions to understanding the environment of fish and other biota. Most of the recent data now reside in numerous operational database systems, whilst a significant proportion of the rest now exist in organised and documented electronic forms thanks to recent legacy data rescue efforts by Cefas; all are available through the published discovery metadata Cefas Data Hub (http://data.cefas.co.uk), the UK Government Metadata Portal (https://data.gov.uk/data/search), and the MEDIN Metadata Portal (http://portal.oceannet.org/search/full).
The Cefas Data Hub extends the search for discovery metadata to include direct access to data. It provides direct access to extracts from Cefas operational databases to facilitate data reuse beyond the original purpose. This paper takes an additional step and makes comprehensive, quality-assured extracts for this key physical parameter readily available and easily accessible in simple text files of seawater temperature data, with each record standing alone and not associated with bespoke and specialist data formats. Throughout, we highlight and explore potential issues around the simple combination of data from the diverse and disparate sources collated here.
This paper focuses on seawater temperature data but we recognise the value of assembling and publishing co-located data, such as salinity and the presence of species (in the case of the plankton dataset), amongst other parameters. The Cefas Data Hub currently holds published data in source formats with the intention of making these and other datasets more accessible by using transformations similar to those executed here.

Overview of the basic characteristics of the seas covered by the dataset
Most of the data are from the seas around the British Isles (ICES areas IV, VI, and VII) but there are some additional measurements in the Bay of Biscay (ICES area VIII), off Labrador and southern Greenland (ICES area XIV) and in the Norwegian and Barents seas (ICES areas I and II); see Fig. 3. The International Council for the Exploration of the Sea (ICES) produces an annual report on the marine climate of the North Atlantic (the ICES Report on Ocean Climate). This gives a broad description of the oceanography of this region and documents the year-by-year variations using a set of hydrographic stations collected by the international community (Larsen et al., 2016). They describe the variation in the northern North Atlantic and sub-Arctic seas where the North Atlantic Current provides a source of heat and salt along the eastern margin into the Barents Sea and entry to the Arctic Ocean. Along the western margin, the Arctic influence of cold and fresh conditions extends from the Fram Strait to Cape Farewell. At the southern part of the region covered by the Cefas temperature data from the western channel down to Iberia, the influence of subtropical waters is more evident.
The combination of gyres and the North Atlantic Current places the UK shelf waters at the boundary between temperate and subpolar waters exerting a heavy influence on the variability of conditions in the Greater North Sea and Celtic Seas.

The Greater North Sea
The temperature of the Greater North Sea is controlled by the seasonal cycle of heat exchange with the atmosphere, the vertical mixing in the water column, and the circulation of waters from the North Atlantic. The annual mean temperature generally increases from the south (in the English Channel) to the north (near Shetland), but this pattern is not representative of all seasons. During the winter the shallow waters in the southern North Sea that are furthest from the influence of the inflowing North Atlantic waters tend to be the coolest in the entire Greater North Sea.
Northern North Sea. Modified Atlantic water flows into the region via the Fair Isle current, maintaining relatively warm winter temperatures, typically 6 to 9 °C minimum with a decrease to the south as water from the Atlantic is cooled by the atmosphere and the depth shallows. Summer temperatures are typically 12 to 14 °C near the surface with a cooling influence evident from the North Atlantic inflow, and the region generally stratifies.
Southern North Sea. The southern North Sea is shallow, mostly less than 50 m in depth, and furthest from the inflows and influence of Atlantic water. Temperature minima in winter are typically 4 to 8 °C; they depend strongly on the weather in any one year and on depth (shallower → cooler). Likewise, the typical summer maxima of 16 to 19 °C depend on the weather and strongly on depth (shallower → warmer).
English Channel. From depths of less than 50 m near the coast and the Dover Strait, the channel deepens westwards to 100 m. The influence of Atlantic water also increases towards the west and only some parts in the very west stratify in the summer. Thus minimum winter temperatures, typically 5 to 8 °C, are strongly dependent on the weather in any one year and on depth. Summer maximum temperatures are typically 16 to 19 °C.
The Greater North Sea near-bottom temperatures differ from SST due to stratification, which takes place only during the summer. Where the region does stratify (in the northern North Sea and at the very western part of the English Channel), summer temperatures near the bottom remain cool until the breakdown of stratification in the autumn.

The Celtic Seas
The various temperature and salinity characteristics of the Celtic Seas reflect the inhomogeneity of the region, which ranges from enclosed shallow-shelf sea with large river catchments to deep oceanic waters and spans a wide range of latitudes. Surface temperature is controlled by a balance of seasonal heating, vertical mixing, and the circulation of Atlantic water, with the relative importance depending on local depth, tides, wind, and exposure to the ocean.
Celtic Sea. Sea temperatures are strongly related to the weather in any one year and to water depth. The climate being strongly maritime, typical winter minima are 8 to 11 °C and summer maxima are 14 to 18 °C. The seasonal cycle of near-bed temperature in this part of the region is controlled by the vertical mixing. When well mixed vertically in the winter, its temperature is similar to that at the surface. During the summer the area stratifies and near-bed temperatures do not reach the temperature maxima of the surface; the maximum annual temperature here is typically reached in October when the heat of surface waters is fully mixed down.
Irish Sea. Temperatures depend strongly on the weather in any one year and on water depth. Typical winter minima are 4 to 8 °C and summer maxima are 14 to 18 °C. As elsewhere, temperatures also depend on whether the area stratifies. The area is well mixed vertically in winter and typical winter minima match the SST at 4 to 8 °C. In the areas that stay well mixed throughout the year, summer maxima of 14 to 18 °C are typical, while areas that stratify in the summer reach their annual maximum of 13 to 15 °C in autumn when the heat of surface waters is fully mixed down.
Minches and western Scotland. There is some influence of (modified) Atlantic water arriving from the west. Resulting typical winter minimum temperatures are 6 to 8 °C and summer maxima are 13 to 15 °C in well mixed areas or 11 to 13 °C where stratified. Typically, there is summer stratification in the deep waters away from islands and north of the Islay front (west of Islay to Ireland).
Scottish Continental Shelf. Except for shallow areas near coasts, there is summer stratification. Temperature minima in winter are typically 9 to 10 °C at the shelf edge but 6 to 9 °C elsewhere; they depend on the weather in any one year, on depth, and on travel time for any Atlantic water arriving from the shelf edge. Summer maxima are typically 12 to 14 °C for surface water.

Data sources
The 17 source systems are the following.
1. The Cefas Coastal Temperature Network (CTN) is comprised of time series of measurements from a number of long-term recording stations around the coast of England and Wales, with measurements provided by volunteers and external suppliers who have agreed that their data can be published as part of the network (Jones, 1981). See also Joyce (2006), Jones and Jeffs (1991), Ellett and Jones (1994), and Norris (2001). In Joyce (2006), Appendix A, Table 8, and the associated figures show data at Brancaster that result in a yearly anomaly from a base period of 4-5 °C. These data have been excluded from this compilation.
2. The Cefas Fishing Survey System (FSS) is a purpose-built database used to hold and maintain Cefas fish survey data, primarily from government-mandated surveys.
3. The Cefas Oceanographic Archive (OA) is a system for managing data from a CTD system deployed during traditional oceanographic water-column profiling.
4. The Cefas Plankton Analysis System contains data from the sampling of plankton, which has been carried out by Cefas since the 1940s. In recent decades, sampling has mainly been concentrated on fish eggs and larvae and other zooplankton. Pre-egg survey temperature data are profiles from stations. Egg survey temperature data are from a sensor attached to the net. Plankton samples were collected using high-speed towed nets that capture plankton from the surface to near the seabed. At each sampling position the sampler was deployed in an oblique tow from the surface to within approximately 2 m of the seabed. Veering and hauling speeds were manually adjusted with the aim of sampling each depth band equally. Since the early 1980s CTD sensor packages have been fitted to the plankton samplers to continuously monitor temperature and salinity throughout each deployment, with positions interpolated from start and end times and positions.
(Jones and Jeffs, 1991; see https://www.cefas.co.uk/cefas-data-hub/sea-temperature-and-salinity-trends/data-sets/ for full descriptions of sites and routes) and from Cefas research vessel surface logger systems. The surface logger data were used, stored, and processed as part of the vessel management system and were normally run during cruises.
9. The Cefas Electronic Data Storage-Tag Database supports the deployment of electronic tags that record temperature and depth. These tags were attached to or implanted into several species. The data provided here are from cod caught in the southern North Sea between 1999 and 2009 (for methods see Neat et al., 2014). Data from tags that were returned from recaptured cod were downloaded and the depth time series was used to estimate daily geographic location. This was done by matching the tidal and maximum depth data to known dates and locations as per the method described in Pedersen et al. (2008). Temperature data from each tag were binned into 10 m depth intervals and then averaged. Cod were at liberty to move at will, so the geographic and vertical sampling is not regularised to a grid or vertical stratification. The data describe the temperature data sampled by a total of 90 cod and are comprised of temperature data collected on a total of 10 446 days. Methods used to capture and tag cod are found in Righton et al. (2010) and Neat et al. (2014). Summary data are published in Neat and Righton (2007) and Righton et al. (2010).

The Cefas Fisheries
10. Citizen Science Diver Recorded Temperatures come from a data source that differs from the others in this collection because it arises from an investigation into the potential for citizen science to contribute to assessments of the marine environment. The dataset is derived from a database containing over 7000 records of temperature data collected from temperature-compensated dive computers. The lowest temperature is recorded from the thermal sensor. This resulted in a quality-assured dataset of just over 5000 records (including freshwater and lake data). The subset of the global dataset provided covers the UK shelf. See Azzopardi and Sayer (2012) and Sayer and Azzopardi (2014) for additional information. Data accuracy for some instruments is limited to 1 °C.
11. The Cefas Lowestoft Sample Data Management System (LSDM) was the primary system used before and throughout the 1990s by Cefas (Lowestoft) to manage water sample processing and data. Its function was to provide a vehicle for the management of the ingestion, analysis, and recording of measurements on marine samples ranging from oceanographic water samples through sediments to "environmental materials" and radiological samples; see Sutton (1993) for an example of the supporting role of LSDM in relation to the usually high-level scientific measurement systems of the day and Sauer et al. (2002) for an example of its pivotal role in quality-assured processes and analyses. As the work profile for the Ministry of Agriculture, Fisheries and Food's Directorate of Fisheries Research changed followed by the creation of Cefas and then Defra, the need for a centralised system for the management of an extensive suite of physical samples decreased. LSDM was closed in 2015 with chemical data transferred to other systems. The temperature data held included the historical ferry routes and historical CTN data, both covered separately. The remainder from a variety of programmes and cruises are presented in this section.
12. The Mnemiopsis Ecology Modelling and Observation Project (MEMO) was part of a wider sampling programme in collaboration with Ifremer and ULCO (France), ILVO (Belgium), and Deltares (Netherlands).
The data collected were used to produce models, such as an individual biological model and hydrodynamic, ecosystem, and socioeconomic models; see Collingridge et al. (2014) and van der Molen et al. (2015). These increased the understanding of the life cycle of warty comb jellyfish (Mnemiopsis leidyi).
The project collected samples for the analysis of fish larvae and fish eggs, microzooplankton and mesozooplankton, and phytoplankton. Samples were collected using a 200 µm mesh ring net of 0.5 m diameter (for zooplankton samples) and physical data were collected via a CTD attached to a ring net.
13. The Cefas Multibeam Acoustics Sound Velocity Profile Temperature Data comes from the RV Cefas Endeavour, which has been routinely deploying multibeam acoustic measurement techniques since 2005, with particular emphasis being placed on habitat mapping projects (Brown and Vanstaen, 2008). As part of the calibration of the various acoustic systems, a CTD cast is performed at relevant stations to provide temperature data for the necessary calculation of sound velocity.
14. Intensive plankton surveys off the north-east coast of England in 1976 were comprised of a series of 12 cruises carried out in 1976 by DFR staff to investigate the distribution, abundance, mortality, and main predators of planktonic fish eggs and larvae of important commercial fish species (e.g. plaice, cod; Harding and Nichols, 1987). Measurements of surface water temperature and salinity and bottom temperature were carried out at each sampling station on a planned survey grid.
15. The RV Cefas Endeavour FerryBox Monitoring System was installed in 2009. Unlike most FerryBox systems (http://www.ferrybox.org and specifically the systems described at http://noc.ac.uk/ocean-watch/shallow-coastal-seas/ferrybox), RV Cefas Endeavour runs a combination of regular (usually annual) monitoring cruises in UK shelf waters (with a focus on ICES-mandated surveys for fisheries assessments) and bespoke research cruises. This provides widespread coverage with some repeat components in time and space.
16. Cefas ScanFish was a programme that deployed a high-performance towed undulating CTD, initially to aid the understanding of the coupling between physical and biological processes (Brown et al., 1996). It was towed behind the vessel at approximately 8 kn and undulated from the near surface (∼ 4 m) to within a few metres (∼ 5 m) of the bed, down to water depths of 135 m. The vertical ascent rate was controlled so that each undulation covered a horizontal distance of 1 km regardless of water depth.
17. The Cefas ESM2 Profiler-mini CTD Logger is a Cefas-developed micro-logger for applications requiring a small low-power logger with integrated sensors and battery. It has standard sensors for conductivity, temperature, depth, optical backscatter, and roll and pitch. It was initially developed to be a handheld profiler that could be used from small boats and/or when a conventional large rosette could not be used. It is now used routinely in place of traditional CTD equipment (data held in source system 3) and widely used on RV Cefas Endeavour research cruises to provide profiles of the water column for fisheries and plankton work (replacing or supplementing data in sources 2 and 4).
The date ranges and numbers of observations for each data source are summarised in Table 1.
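The depth binning described for the data storage-tag source (source 9) can be illustrated with a minimal sketch. It assumes bins aligned to multiples of 10 m and a plain arithmetic mean within each bin; the function name `bin_mean_temperature` and the sample readings are hypothetical and are not taken from the Cefas processing chain.

```python
from collections import defaultdict
from statistics import mean


def bin_mean_temperature(records, bin_width_m=10.0):
    """Average temperatures within fixed-width depth bins.

    records: iterable of (depth_m, temperature_c) pairs.
    Returns {bin_lower_edge_m: mean_temperature_c}, ordered by depth.
    Assumption: bin edges sit at multiples of bin_width_m (0, 10, 20, ...).
    """
    bins = defaultdict(list)
    for depth_m, temp_c in records:
        # Floor division maps 0-9.99 m to the 0 m bin, 10-19.99 m to the 10 m bin.
        lower_edge = int(depth_m // bin_width_m) * bin_width_m
        bins[lower_edge].append(temp_c)
    return {edge: mean(temps) for edge, temps in sorted(bins.items())}


# Hypothetical tag readings: (depth in metres, temperature in degrees C).
readings = [(3.0, 12.0), (8.0, 11.0), (14.0, 10.0), (19.5, 9.0), (25.0, 8.5)]
binned = bin_mean_temperature(readings)
```

Because the cod moved freely, the number of readings per bin will differ widely between tags and days, so a reuser may wish to keep the per-bin counts alongside the means.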

Data components and methods
Each specialist data collection system is described in detail in the appropriate metadata. The data files have been extracted from the source to provide the following (with field names in parentheses): 1. Cefas data source reference number (Source); 2. date and time of measurement (Time); The Ref1 and Ref2 fields were extracted from the source data files and provide an operational context (where this is appropriate and/or available) for the original source data, e.g. cruise and station. The Sample and Measure fields provide information on the acquisition of data and are included specifically to facilitate understanding and removal of sample bias and autocorrelation effects. The accuracy of the data is described in the metadata accompanying the data files. The number of decimal places provided reflects the source files and can generally be taken as a realistic indication of the accuracy of the position, depth, and temperature. Note that all data have standardised formats and trailing zeroes do not imply increased accuracy.
The methods used to measure parameters over the time span of the datasets vary widely in their resolution (the smallest change that can be measured), precision (the repeatability of the system used), and accuracy (the closeness of the measurement to the actual value). The data provided reflect our best estimates of accuracy when transforming the data from a wide variety of bespoke measuring, recording, and [...] Such are the "statistical" perils of data reuse.

Source
This denotes which of the 17 data sources the record was extracted from. This field allows data to be integrated across data sources whilst retaining a reference to the source and originating resolution, precision, accuracy, and original purpose for each of the records. A significant numerical majority of data extracted from the data sources come from sensors and platforms that will be familiar to a reader around the time of publication. However, historical data, whilst of particular interest, come with historical navigation, sensors, data gathering methods, and platforms. The following sections describe differences that a reuser of data should take into account.
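As an illustration of how the Source field lets a reuser inspect each data source before pooling records, the following sketch summarises a small text extract by source. Only the field names Source and Time come from the text; the `Temperature` column name, the comma delimiter, the timestamp format, and every value in `SAMPLE` are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extract. "Temperature", the delimiter and all values are
# assumptions for illustration; they are not the published file layout.
SAMPLE = """Source,Time,Temperature
1,1975-06-01T12:00:00Z,12.4
1,1975-06-02T12:00:00Z,12.6
3,1975-06-01T08:03:20Z,11.9
"""


def summarise_by_source(text):
    """Return {source: (record_count, mean_temperature)} for a text extract."""
    temps = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        temps[int(row["Source"])].append(float(row["Temperature"]))
    return {src: (len(v), sum(v) / len(v)) for src, v in sorted(temps.items())}


summary = summarise_by_source(SAMPLE)
```

Comparing record counts per source in this way is one simple check on the numerical bias that a single high-frequency source can introduce into a pooled mean.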

Time
Across the data sources, dates and times have been recorded in a variety of ways. We have made the reasonable assumption that all times recorded used Zulu as the time zone, which equates to GMT and now UTC. Date and time were usually recorded for individual measurements unless the operational systems, such as point source data buoys, average the data at collection. Where times are not specifically recorded (usually old, shore- or vessel-based manual records) they are taken as standard for the particular source; daily reports are allocated as 12:00, morning as 08:00, and afternoon as 16:00 as best approximations for likely collection times. Some datasets take observations at local high tide. Some CTD profiles provide a start time only; depth and temperature measurements are allocated a time by interpolation using a standardised rate of descent (0.25 m s⁻¹). The plankton data (source 4) required positional interpolation based on start and end times and positions.
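The time allocation for CTD profiles with only a start time can be sketched as follows. This is a minimal illustration in Python, not the ingestion code used for the datasets: the function name `profile_times` and the example cast are hypothetical, and the sketch assumes the cast starts at the surface and descends at the constant 0.25 m s⁻¹ quoted above.

```python
from datetime import datetime, timedelta, timezone

DESCENT_RATE_M_PER_S = 0.25  # standardised rate of descent quoted in the text

# Default clock times for records with no explicit time (values from the text).
DEFAULT_TIMES = {"daily": "12:00", "morning": "08:00", "afternoon": "16:00"}


def profile_times(start, depths_m, rate=DESCENT_RATE_M_PER_S):
    """Allocate a UTC time to each depth of a downcast that has only a start time.

    Assumes the cast starts at 0 m and descends at a constant rate.
    """
    return [start + timedelta(seconds=d / rate) for d in depths_m]


# Made-up cast: three measurements at 0, 15 and 30 m.
start = datetime(1995, 6, 1, 10, 0, tzinfo=timezone.utc)
times = profile_times(start, [0.0, 15.0, 30.0])
print([t.strftime("%H:%M:%S") for t in times])
```

At this rate a measurement at 30 m is stamped two minutes after the start of the cast, so the allocated times matter mainly for ordering records within a profile rather than for any seasonal analysis.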

Latitude and longitude
An informed use of the datasets requires an understanding of the changes in methods of measurement of location over time. Past practice separated the detailed recording of navigational data and associated uncertainties from the provision of positions to researchers. The former has not been specifically preserved.
The earliest research records consist of data from lightships which, we assume, were reasonably accurately located. We think, based upon historical statements on intentions of best practice, that navigation on the early vessels engaged in research and monitoring would generally have followed good practice at the time (Lee, 1992, p. 173). When in range, research vessels would have used coastal navigation techniques, including physical aids to navigation, wherever possible and positional accuracy would depend upon the navigational chart's hydrographic survey. In addition, accurately surveyed depth contours were used as position lines when useful and practical (Graham, 1953). Locations close to charted objects would have been more reliable, precise, and accurate.
Beyond coastal waters where astronomical navigation was used, positional accuracies might have been "of the order of one or two miles" (Captain R. Jolliffe, personal communication, 2017) with uncertainties deriving from the ability of the navigator, the feasibility of sextant observations in weather, and the accuracy of navigational tables. Star sights (taken at dawn and dusk when the horizon and astronomical bodies were both visible) would provide two fixes per day. Morning sun sights run up to noon latitude would give a total of up to three fixes per day. In a chapter on navigation errors, the Royal Navy (2008) indicates an accuracy of 2 miles for an experienced navigator. From fixes of whatever sort, dead reckoning (DR) or estimated positions (EPs) would be applied to derive a station position where no actual fix was possible. DR is a process of calculating a position using distance and direction from the start, whilst EP applied corrections for the set (direction) and drift (speed) of the prevailing current. Both were probably used depending on circumstances and needs, but no records of when and where are available. Pawsey et al. (1920) report that during investigations of Lousy Bank in 1920, taking observations for station fixes based on the sun and/or three stars was the preferred method, but if the weather was inclement and they had no other option, they used DR but "with concerns about strong currents".
Civilian Decca navigation systems (in general use from the late 1940s to ∼ 2000) offered positional accuracies of the order of ∼ 200 m to 3 miles depending on the distance from the base stations. The longer-range Loran systems (in general use from ∼ 1974 to ∼ 2010) were less accurate.
Satellite navigation began with the Transit system in the late 1970s, giving global coverage and a fix at intervals, depending upon satellite availability, of anywhere between 1 and 6 h. Continuous positional information became available in the 1990s with the advent of the US Navstar GPS system. GPS accuracy depended, in part, on the application of selective availability (SA), which degraded the accuracy of the system for civilian use to between 30 and 100 m. DFR used differential GPS services to overcome this problem from about 1992, improving accuracy to the order of tens of metres. In 2000 the US government abandoned SA, making standard GPS accurate to within about 15-20 m. RV Cefas Endeavour routinely achieves positional accuracies of 5 m, improving to less than 10 cm if differential GPS services are used, e.g. on bathymetric surveys.
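To make the distinction between DR and EP concrete, the following Python sketch uses the textbook plane-sailing approximation (one nautical mile taken as one minute of latitude). It is an illustration only: the function names are hypothetical, "set" is taken as the direction towards which the current flows, and nothing here is drawn from the historical navigation records, which have not been preserved.

```python
import math


def dead_reckoning(lat, lon, course_deg, distance_nm):
    """Plane-sailing DR: advance (lat, lon) by a distance along a true course.

    One nautical mile is taken as one minute of latitude; adequate for the
    short runs between fixes, not for ocean passages.
    """
    course = math.radians(course_deg)
    dlat = distance_nm * math.cos(course) / 60.0
    mean_lat = math.radians(lat + dlat / 2.0)
    dlon = distance_nm * math.sin(course) / (60.0 * math.cos(mean_lat))
    return lat + dlat, lon + dlon


def estimated_position(lat, lon, course_deg, speed_kn, hours, set_deg, drift_kn):
    """EP: the DR position shifted by the current (set = direction, drift = speed)."""
    dr_lat, dr_lon = dead_reckoning(lat, lon, course_deg, speed_kn * hours)
    return dead_reckoning(dr_lat, dr_lon, set_deg, drift_kn * hours)


# Made-up run: 6 h due north at 10 knots from 50° N, 0° E,
# with a 1 knot current setting due east.
dr = dead_reckoning(50.0, 0.0, 0.0, 60.0)
ep = estimated_position(50.0, 0.0, 0.0, 10.0, 6.0, 90.0, 1.0)
print(dr, ep)
```

In this made-up run the EP lies about 6 nautical miles east of the DR position, which is larger than the 1-2 mile fix accuracy quoted above and shows why strong currents were a concern when only DR was available.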
We make a reasonably secure assumption that the reference coordinate system used from the adoption of satellite navigation was the default of the system: WGS72 and then WGS84.
Other than the stated increase in accuracy with time from miles to hundreds to tens to single metres, we cannot be clearer on the actual positions of samples other than to note that the positions have been extracted "as is" and converted to decimal degrees where needed.
In addition to errors in measurement, positional data also suffer from potential human error, conversion errors and errors in electronic storage and display. Latitudes and longitudes are presented as a best estimate representing actual likely accuracy, e.g. 4 dp (∼ 4-11 m depending on location) or 3 dp (∼ 40-110 m). A position originally recorded in degrees, minutes, and integer seconds (2 dp for decimal degrees) would be accurate to ∼ 400 m to 1 km.
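The link between decimal places and ground distance can be checked with a short calculation. The Python sketch below assumes a spherical Earth with 60 nautical miles (of 1852 m) per degree of latitude; the function name `resolution_m` and the choice of 55° N as an example latitude are illustrative assumptions.

```python
import math

# 60 nautical miles of 1852 m per degree of latitude (spherical approximation).
METRES_PER_DEGREE_LAT = 60 * 1852.0


def resolution_m(decimal_places, latitude_deg):
    """Return the (north-south, east-west) ground distance of one unit in the
    last decimal place of a decimal-degree coordinate."""
    step_deg = 10.0 ** -decimal_places
    ns = step_deg * METRES_PER_DEGREE_LAT
    ew = ns * math.cos(math.radians(latitude_deg))
    return ns, ew


for dp in (2, 3, 4):
    ns, ew = resolution_m(dp, 55.0)  # 55° N is an arbitrary UK-shelf latitude
    print(f"{dp} dp: {ns:.0f} m north-south, {ew:.0f} m east-west")
```

The east-west figure shrinks with the cosine of latitude, which is why a single number of decimal places corresponds to a range of distances rather than one value.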
The long-term electronic data storage tags for fish do not use GPS but indirect interpolations of position from depth and time.

Depth
This is the depth at which the sample (physical or direct measurement) was taken. The main measurement devices use pressure suitably corrected for temperature for a depth below the surface. "Surface" temperatures feature widely in the records and are taken as 0 m although there are clear sources of error with the position of the sensors (both depth and temperature) on the relevant instrument and/or sampling device. Again, these surface measurements can be affected by wind, wave, and tide. "Bottom" temperature is less used in the data sources but features for profiles and tows. Its meaning varies from the maximum depth of sample measurement (in the water column) to the measurement taken when the sampling gear is on the seabed (where the sensors may be of the order of 1 m plus above the seabed).
Depths are as recorded with an accuracy of rounded integers (or ±0.1 m for some profiles).
The NOAA bathymetric data used to create the maps used in this paper allow for the interrogation of "water depth" by using the R package marmap (Pante and Simon-Bouhet, 2013). This was used as part of the quality control process in which positional data alone were insufficient to ensure an appropriate location.

tC
Values are in degrees centigrade. The accuracy of the seawater temperature measurements varies and is summarised in
The combination of MPT and SYS indicates a stationary data acquisition system that may need to be treated in a way that allows for data density bias and autocorrelation.

Measure
The Measure codes are the following: MAN (manual) and INS (instrument).

Ref1, Ref2
These fields record contextual data from the source systems with Ref1 providing a high-level aggregation and Ref2 a lower-level grouping. They allow data to be manipulated or interpreted in relation to their source and any relevant breakdown in activities of the operations of the source system. (https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/198752/13-744-shakespeare-review-of-public-sector-information.pdf), which states the following: "A National Data Strategy for publishing PSI should include a twin-track policy for data release, which recognises that the perfect should not be the enemy of the good: a simultaneous 'publish early even if imperfect' imperative AND a commitment to a 'high quality core' ... get it all out and then improve".
The use of original, archived source data files means that any specialist QA-QC processes applied "upstream" during the original uses of the data are covered in general in the relevant publications, but the details of the data QA-QC processes deployed are not necessarily available. Much of the archived data is historic, and the original focus was often on highly specific measurement protocols in which temperature was either a core or a peripheral parameter. Where it was core (the bulk of the data), it formed part of physical oceanographic investigations that used a series of electronic measurement systems that were advanced and accurate at the time. Each had bespoke acquisition and processing systems, which ultimately created an archive with a reasonably consistent approach but over 10 often subtly different formats. Where it was peripheral, data accuracy is reduced by dint of the sensors used and the calibrations employed. Formats again vary, from sensors on fishing trawls feeding into an operational database to sensors on plankton tows feeding into a large and diverse spreadsheet archive over 2 decades.
Data assembly, transformation, and scrutiny were as follows.
-Identification of Cefas data sources with public seawater temperature data and assembly of relevant datasets from source archives and extraction from operational databases.
-Extraction of required elements, primarily from text files and spreadsheets, including derivation of positions and time from start and end data where required and the reformatting of date and time from several different formats.
-The checking of date and time data consisted of format transformations which picked up systematic source differences and manual adjustments where, for example, sensor logging was not capable of recognising date changes during deployment and/or issues with early PCs, which had similar problems when interfacing with instruments.
-The checking of location by plotting on maps followed by the identification and, in some cases, the removal of plotted positions that indicated errors in the often manual recording of position. Positions on land indicated either a hemisphere recording error or omission or a manual positional recording error. Where the former were encountered and obvious, the relevant cruise reports were checked and adjustments to the extracted data were made. Where the latter were encountered, entire stations or sets of stations (probably associated with a watch) were omitted.
-Seawater temperature data included instrument and manual values indicating sensor errors, and these were screened by an initial ingestion filter (rejecting values < −2.5 °C or ≥ 35 °C), followed by specific checks of temperatures > 25 °C to remove erroneous values. These ranged from single, starting data points possibly arising from exposure to the air to transposition errors for which values of 30 in, e.g. winter, indicated a storage or transposition error in and from the raw data files, usually associated with conductivity. Detection of such high values resulted in a reassessment of the bespoke ingestion programmes and a rerun to correct errors and maximise data ingestion. Sequential temperature difference plots were used to identify large changes in temperature over short time periods. In some cases, these apparent anomalies were artefacts of this simple analysis, with two sequential data points coming from different vessels in different hemispheres on different days. In other cases, this plot identified datasets, usually profiles, in which substantial sections of a profile differed markedly from the rest. These were removed. Plots of temperature against time and monthly average temperatures also highlighted potentially anomalous data, e.g. 4 °C measurements at the surface in summer and significantly higher averages compared to surrounding data. The former were resolved by the identification of an unexplained switch in one source's recording date format from DD/MM/YYYY to MM/DD/YYYY, with the days and months involved (e.g. 31/08 to 09/01 rather than the correct 01/09) not triggering date ingestion format check errors.
-Other test plots highlighted 0 °C data near the surface in summer in the North Sea. These were identified as sensor, transmission, transcription, or storage errors because the value 0.0 appeared in data sequences of, for example, 10.1, 10.2, and 10.3. These were also removed.
-Early plots of what became Fig. 16 indicated unseasonally high or low temperatures (e.g. UK Continental Shelf near-surface waters with 14-15 °C in February and 1-1.5 °C in June) and apparent outliers. These prompted a final systematic check of the fully assembled data by plotting data by month, followed by the identification of suspect data. This was then replotted by individual source to provide a context against which to evaluate apparent outliers. Unseasonally high and low data revealed as outliers in the source dataset were removed. Other outlier data were removed where appropriate, although the majority of apparent high and low outliers (see Fig. 16) were attributed to sources and sites that included shallow and relatively isolated water bodies.
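Three of the screening steps listed above lend themselves to a compact illustration. The Python sketch below is not the bespoke ingestion code: the function names are hypothetical, the −2.5 and 35 °C limits are those quoted in the list, and the 5 °C neighbour threshold used to spot embedded zeros is an assumption made for the example.

```python
T_MIN, T_MAX = -2.5, 35.0  # ingestion limits quoted in the text (°C)


def passes_ingestion_filter(t_c):
    """Keep values in [-2.5, 35); reject < -2.5 and >= 35 °C."""
    return T_MIN <= t_c < T_MAX


def isolated_zeros(series, min_neighbour=5.0):
    """Indices of exact 0.0 values flanked by clearly non-zero neighbours,
    e.g. 10.2, 0.0, 10.3. The 5 °C neighbour threshold is an assumption."""
    return [
        i for i in range(1, len(series) - 1)
        if series[i] == 0.0
        and series[i - 1] > min_neighbour
        and series[i + 1] > min_neighbour
    ]


def is_ambiguous_date(day, month):
    """True if swapping day and month still gives a valid, different date, so a
    DD/MM to MM/DD switch would pass a format check unnoticed."""
    return day <= 12 and day != month


print(passes_ingestion_filter(35.0))            # False: 35 °C is rejected
print(isolated_zeros([10.1, 10.2, 0.0, 10.3]))  # [2]
print(is_ambiguous_date(1, 9))                  # True: 01/09 also reads as 9 January
```

The last function shows why the date format switch went undetected: 31/08 cannot be misread, but any date in the first 12 days of a month parses validly in either convention, so only the temperature plots revealed the problem.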
Given the wide variety of sources and, in a lot of cases, the non-physical oceanographic focus for the data-generating activities, a formal and rigid retrospective application of oceanographic data quality control procedures was not applied across the board. However, where appropriate they were applied at the source, e.g. the CTD and ScanFish data (sources 3 and 16). In both cases the relevant standard IOC methodology was applied. For the remaining sources, the descriptions above cover the intent of such standards, specifically basic checks for all data types, e.g. date and time, latitude and longitude, position (must not be on land), and other relevant checks, such as impossible speed, spike, global range, regional range, and check for duplicates.
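Of the checks named above, the "impossible speed" test is perhaps the least self-explanatory, so a hedged Python sketch follows. It is a generic version of the idea rather than the IOC procedure itself: the function names and the 40 km/h ceiling are assumptions chosen for illustration, and the records are made up.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
MAX_SPEED_KMH = 40.0  # assumed ceiling for a research vessel (about 21 knots)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two positions in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def impossible_speed(rec_a, rec_b, max_kmh=MAX_SPEED_KMH):
    """Flag consecutive (time, lat, lon) records from one platform whose
    implied speed exceeds max_kmh. Records with identical times are flagged
    only if the positions differ."""
    (t1, la1, lo1), (t2, la2, lo2) = rec_a, rec_b
    dist = haversine_km(la1, lo1, la2, lo2)
    hours = abs((t2 - t1).total_seconds()) / 3600.0
    if hours == 0:
        return dist > 0
    return dist / hours > max_kmh


# Made-up stations one hour apart; the third has its longitude sign flipped.
a = (datetime(2000, 1, 1, 0, 0), 52.0, 2.0)
b = (datetime(2000, 1, 1, 1, 0), 52.1, 2.0)
c = (datetime(2000, 1, 1, 1, 0), 52.0, -2.0)
print(impossible_speed(a, b), impossible_speed(a, c))
```

A hemisphere recording error of the kind described earlier moves a station by hundreds of kilometres between consecutive records, which a check like this catches even when the erroneous position still falls in the sea.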
Best efforts have been made to remove all obvious errors, but it is possible that some remain amongst the 10 million plus data points made available here.Please contact data.manager@cefas.co.uk to report any errors; these will be corrected and the source files on Cefas Data Hub and the relevant metadata will be updated on confirmation of any error.The same contact can be used if external users of the data wish to explore collaboration or need assistance with interpretation.

Bias estimation
The provision of these raw data is "as measured" with appropriate metadata to allow subsequent scientific trend analysis to be performed, which would usually include additional scrutiny for systematic bias.The main exercise here is to identify and facilitate access to a large source of hitherto unavailable data that is as yet unseen and unscrutinised by the broader community.
An assessment of accuracy and bias has been conducted by the data creators for some of the sources included here. For example, for source 10, we referenced Wright et al. (2016), who examined whether the temperature data derived from hundreds of recreational scuba divers and many different models of dive computer were consistent with global sea temperature datasets. Similarly, temperature sensors on Cefas SmartBuoys and WaveNet platforms (sources 6 and 7) are calibrated annually at Cefas against certified platinum resistance thermometers. Data are subject to a full quality assurance procedure which assigns flags to poor-quality data (e.g. for sensor malfunction or drift; see https://www.cefas.co.uk/cefas-data-hub/dois/cefas-smartbuoy-monitoring-network/).
We note that ICOADS and other collated datasets (e.g. HadSST) tend to carry out their own systematic bias correction routines whenever new data are uploaded or admitted. Our intention is to make our data available so that they can be easily included (by other authors) in platforms such as the ones listed (ICOADS, COBE-SST, ERSST, and HadSST3).
Within the text of the paper we include references to papers that discuss bias correction (e.g. Mathews, 2013; Kennedy et al., 2011a, b; Karl et al., 2015; Hausfather et al., 2017), but we leave it to those who might make use of the data to judge what procedures might be necessary for their own purposes.

Data summary by source
Table 1 provides summary metadata for each of the 17 source datasets, including their temporal coverage, the number of data points, and the type of measurement (e.g. fixed station, CTD profile, electronic device attached to an animal, etc.). Sources 1 and 2 provide the longest time series of measurements (each more than 100 years), but more recent data systems, e.g. sources 6, 7, and 8 (autonomous surveillance systems) and the undulating tow systems for plankton (4) and oceanography (16), contribute the bulk of the assembled observations.
Table 2 provides an overview of the estimated actual accuracy of the data by data source. Information on sensor resolution, accuracy, and precision is available in the relevant data source metadata or in any cited publications and/or associated documents. Where sensor resolution, precision, and calibration are unclear or unknown, conservative estimates are made based on local knowledge from internal records or cruise participants.

Summary of sources, geographic range, depth range, and temporal coverage used in data subsets
Example potential uses of the data and subsets are described using plots of data in four selected groups of four ICES rectangles covering areas of particular fisheries interest. The full dataset enables extensive data synthesis, for example in the southern North Sea where issues of spatial and numerical bias from a data source are explored. The full dataset also facilitates the construction of long-term temperature time series and an examination of changes in the phenology (seasonal timing) of ecosystem processes for a wide geographic area with an exploration of the limitations of data coverage over long periods.
Table 3 provides a summary of the subsetting of the data undertaken to illustrate potential uses and limitations of a simplistic approach to synthesis and analysis. Source is a key variable with, in this case, potentially significant temporal, spatial, and sensor resolution differences. The intervals used to subsample the data reflect the requirements of visualisation and plotting rather than any intrinsic temperature-related aspect. The highlighted geographic areas were selected to illustrate data coverage and any issues of numerical, spatial, and temporal bias. The depth ranges used reflect a primary interest in sea surface temperatures with 44 % of the data falling within a 0-5 m depth. The time range selections primarily reflect data availability.

Data summary by location
Figure 3 shows the location of measurements across all 17 data sources. It is clear that the majority of coverage is of the English Channel, the North, Irish, and Celtic seas, and the UK Continental Shelf area, reflecting historic work focused on fisheries, plankton, and oceanography as part of repeated survey programmes or bespoke research. The data from around Svalbard, Greenland, and Labrador reflect the historic interest in cod fisheries around the Arctic and the physical oceanography in those regions (see Townhill et al., 2015).
Figure 4 provides an overview of the relative data density in the English Channel, the North, Irish, and Celtic seas, and the UK Continental Shelf area. It highlights the numerical dominance of point source data, e.g. autonomous SmartBuoys (source 6, primarily in the North and Irish seas), data from WaveNet (source 7, off the east and west coasts of Scotland), and the single year (2014) of near-continuous (1 min) data from the Coastal Temperature Network at the Port of Dover. Areas of scientific interest in the Celtic Sea (mainly source 4, plankton studies) and the North Sea (a combination of oceanographic studies, sources 3 and 16; vessel-mounted data from sources 8 and 15 and general purpose CTD data from source 17) provide more widespread but significant data densities. Subsequent sections explore data availability by source, time, geographic location, and depth in more detail.

Data summary by year
Figure 5 illustrates the inherent differences in the data coverage with time throughout the 134 years covered, with low but increasing numbers of annual records between 1880 and 1956 and an increase of 2 orders of magnitude between the 1980s and around the year 2000. This is followed by a further order of magnitude increase as a result of the introduction of autonomous monitoring platforms that make measurements on an hourly or even minute-by-minute basis in some cases. These platforms were also deployed in research roles on the North Dogger Bank and Oyster Grounds.
Other seawater temperature data compilations (e.g. HadSST3) show similar data acquisition trends. There are challenges when attempting to reconstruct long-term trends in a region, as many thousands of records may derive from one particular sampling locality, with very few data points elsewhere (see below and e.g. MacKenzie and Schiedek, 2007).

Data summary by depth
Figure 6 illustrates data coverage by depth. Figure 6a shows data between the surface and 10 m with high numbers (10⁵ to 10⁶) reflecting the preponderance of automated data collection platforms and vessel-mounted loggers. Figure 6b shows coverage between 10 and 100 m, and Fig. 6c shows data from 100 to 250 m covering the continental shelf break. Data coverage drops considerably with increasing depth. It is important to note that most of the existing data portals containing seawater temperature measurements (e.g. ERSST, HadSST3, COBE-SST) only accommodate records at the sea surface. The World Ocean Database (https://www.nodc.noaa.gov/OC5/WOD/pr_wod.html) and the Met Office EN4 database (https://www.metoffice.gov.uk/hadobs/en4/) do contain subsurface data and ICES (http://www.ices.dk/marine-data/data-portals/Pages/ocean.aspx) attempts to provide insights into near-seabed temperature conditions in certain geographical areas, but data are generally sparser than for the surface. Argo is a global array of 3800 free-drifting profiling floats that measure the temperature and salinity of the upper 2000 m of the ocean. Argo deployments began in 2000, and by November 2007, the millionth profile was collected, greatly increasing the knowledge base with regard to open-ocean and deep-water temperature conditions (see Riser et al., 2016).
The emergence of novel undulating platforms, such as ScanFish (source 16), electronic instruments attached to animals (source 9), and more recently autonomous gliders, will steadily increase the availability of measurements at depth, as will opportunistic data obtained from recreational scuba divers (source 10).

Data summary by ICES statistical rectangle group: areas of fisheries interest
To demonstrate data coverage in more detail, groups of four ICES rectangles of particular fisheries interest were selected with summary plots of the available "near-surface" data (0-5 m). This depth range specifically includes the large datasets from vessel-mounted pumped seawater systems. The four areas shown in Fig. 7 are (from N, W, S, and E) Liverpool Bay (Irish Sea), Haig Fras (Celtic Sea), Brixham (English Channel), and the Thames Estuary and the East Anglian coast (southern North Sea).
Liverpool Bay is an inshore area of langoustine (Nephrops), herring, and plaice fisheries but also an area characterised by major development of offshore wind farms in recent years. The ICES rectangles selected are 35E5, 35E6, 36E5, and 36E6 with a geographic bounding box of 54° N, 3° W, 53° N, and 5° W. They include extensive sampling along the North Wales coast as part of fisheries research projects and surveys centred on Red Wharf Bay in the 1960s. Figure 7 shows the intensive sampling efforts that occurred throughout the 1960s and 1970s and again after 2000 when the autonomous Liverpool Bay SmartBuoy (source 6) was installed, taking hundreds of new measurements each day. A number of long-term Coastal Temperature Network (source 1) monitoring stations have existed in this area, notably at Wylfa, Amlwch, Moelfre, and Bangor.
The ICES rectangles in the Celtic Sea (29E1, 29E2, 30E1, 30E2; geographic bounding box of 51° N, 7° W, 50° N, 9° W) were selected because this is known as an important area for cod, hake, angler fish, and megrim. The selected area includes Haig Fras, a 45 km long submarine granitic rocky outcrop which, because of the diverse fauna associated with its bedrock reef habitat, is protected as a special area of conservation (SAC). Other seawater temperature records have only been collected on an occasional basis in this region, although more surveys have been conducted in recent years associated with the designation of this feature as a new marine protected area.
Brixham is now one of the most important fishing ports in England and home to major beam-trawl fishing fleets. Important sole, plaice, and lemon sole fisheries exist inshore, and a cuttlefish fishery extends offshore. The ICES rectangles selected are 28E6, 28E7, 29E6, and 29E7 with a geographic bounding box of 50°30′ N, 2° W, 49°30′ N, and 4° W. Temperature sampling in this region, particularly in recent years, has generally been focussed around the annual Channel Groundfish Surveys, with a particular concentration of data measurements in quarter 1 (March) and quarter 3 (July).
The Thames Estuary and East Anglian coast are important for sea bass, sole, and elasmobranch fisheries. The ICES rectangles selected are 32F1, 32F2, 33F1, and 33F2 with a geographic bounding box of 52°30′ N, 3° E, 51°30′ N, and 1° E. Some of the longest-running time series exist for this region, in particular from the Coastal Temperature Network (source 1) monitoring stations that have existed at Bradwell since 1964, Leigh on Sea and Southwold since 1966, and Sizewell since 1967. Earlier temperature measurements were taken primarily during fisheries research surveys and, in addition, regular sampling was begun aboard the Harwich to Rotterdam ferry after 1970. A major intensification of sampling occurred after 2000 following the installation of the autonomous Warp and Gabbard SmartBuoys (source 6).
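The bounding boxes quoted for the four groups follow from the ICES rectangle codes themselves. The Python sketch below decodes a rectangle code under our understanding of the ICES convention (rows 0.5° high numbered northwards from 36° N; columns 1° wide, with the letter giving the 10° band and the digit the degree within it, letter I unused). The function names are hypothetical and column A, which is a special case, is deliberately left out.

```python
LON_LETTERS = "BCDEFGHJKLM"  # letter I is not used; column A is omitted here


def ices_rectangle_bounds(code):
    """Return (south, west, north, east) in decimal degrees for a code such as '35E5'."""
    row, letter, digit = int(code[:2]), code[2].upper(), int(code[3])
    south = 36.0 + (row - 1) * 0.5  # rows are 0.5° high; row 01 starts at 36° N
    west = -40.0 + 10 * LON_LETTERS.index(letter) + digit  # columns are 1° wide
    return south, west, south + 0.5, west + 1.0


def group_bounds(codes):
    """Bounding box (south, west, north, east) of several rectangles."""
    boxes = [ices_rectangle_bounds(c) for c in codes]
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


print(group_bounds(["35E5", "35E6", "36E5", "36E6"]))  # Liverpool Bay
```

Applied to the four groups above, this decoding reproduces the bounding boxes given in the text, e.g. 53-54° N and 5-3° W for Liverpool Bay, which provides a quick consistency check when subsetting the data by rectangle.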

Southern North Sea geographic data coverage: spatial, source, and numerical bias
The southern North Sea is an area of particular interest because it is one of the regional seas that is reported to have warmed the most dramatically over the 20th century (Dye et al., 2013; Hobday and Pecl, 2014). Figure 8 shows the geographic distribution of Cefas near-surface (between 0 and 5 m) seawater temperature data (specifically chosen to include data from vessel-mounted pumped systems). It also shows a clear geographical bias in terms of data coverage in the selected offshore area (geographic bounding box of 54° N, 4° E, 52° N, 2° E). This does not overlap with the Thames Estuary and East Anglian coast data plot above.
The area selected specifically includes data from autonomous platforms to highlight potential issues with data density in any reuse of this data.
Figure 8 shows concentrations of measurements around major offshore fishing grounds on the North Norfolk sandbanks (e.g. Leman Ground, Smiths Knoll, Swarte Bank, Indefatigable Banks), line transects across the North Sea from ferry routes, ScanFish and the FerryBox system (sources 8, 15 and 16), and a background pattern of gridded stations from the ICES International Bottom Trawl Survey Programme (source 2).
The distribution of numbers of data points within this area led to the sources being grouped as follows: > 100 000 data points (represented as red in Figs. 9 and 10), ≥ 30 000 and ≤ 50 000 (represented as blue in Figs. 9 and 10), ≥ 2000 and ≤ 6000 (represented as green in Figs. 9 and 10), and < 2000 (removed from this analysis to aid clear visualisation).
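The grouping by data volume can be written as a small lookup. The Python sketch below uses the thresholds stated above; the function name, the per-source counts, and the "unassigned" label for counts that fall between the published bands are illustrative assumptions, not values from the datasets.

```python
def count_group(n_points):
    """Map a source's number of data points to the plotting group in the text.

    Returns None for sources dropped from the analysis (< 2000 points) and
    'unassigned' for counts that fall between the published bands.
    """
    if n_points > 100_000:
        return "red"
    if 30_000 <= n_points <= 50_000:
        return "blue"
    if 2_000 <= n_points <= 6_000:
        return "green"
    if n_points < 2_000:
        return None
    return "unassigned"


# Hypothetical counts per source number, for illustration only.
counts = {6: 250_000, 8: 42_000, 2: 3_500, 10: 150}
groups = {src: count_group(n) for src, n in counts.items()}
print(groups)
```

The gaps between the bands reflect how the sources in this area happen to cluster by size; a reuser working with a different area would need to choose bands to suit the counts found there.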
Figure 9 breaks down the temporal and numerical coverage of the data illustrated in Fig. 8, illustrating the temporal dominance of source 8 (ferry routes and surface logger systems) and the combined, post-2000 numerical dominance of the single SmartBuoy and WaveNet moored autonomous platforms (sources 6 and 7), both located in the western part of the selected area.

Southern North Sea data coverage by number and time
Figure 10a illustrates the numerical dominance of sources 6 and 7 highlighted above. Figure 10b combines plots of the selected seawater temperature records with time, using the colours from Fig. 8 to further clarify the temporal influences of major data sources. Several patterns can be discerned. Firstly, a slight upward trend is apparent across the whole 100-year time series with generally warmer temperatures at the end of the 20th century compared to the beginning. There is an absence of data from the periods of both World Wars when the DFR research vessels were requisitioned by the Admiralty for war service, mines were installed in coastal waters, and all research at the Lowestoft laboratory ceased.
Several extremely cold winters are apparent, most obviously the winter of 1962-1963 (also known as the "Big Freeze"), which was one of the coldest winters on record.In February to March 1963, seawater along the coasts of Essex and Kent froze over and catches of dead fish (particularly sole) were recorded throughout much of the region (Woodhead, 1964).
It is also clear that from around 2000 onwards, winter minima rarely fall below around 5 °C. It is not clear whether this is related to the beginning of the operational deployments of SmartBuoy and WaveNet stations by Cefas around this time.
In addition to the potential influences of data volumes with time on, e.g. trend interpretation, there are potential geographic and depth biases associated with source. These are illustrated in Fig. 11, which partitions the data shown in Fig. 7 by time (focusing on the period after the year 2000 identified in Fig. 10b) and by depth; 95 % of all the available data in the selected area are between 0 and 5 m, with 90 % of the 0-5 m data in the top 1 m.
Figure 11a shows the geographical distribution of data post-2000 between 1 and 5 m, whilst Fig. 11b shows data between 0 and 1 m (dominated by sources 6 and 7). The locations of the two autonomous monitoring stations are shown as orange spots in Fig. 11b; both are in the south-west quadrant of the selected area. This provides a numerical, geographical, and depth bias in the data available since 2000. These factors would need to be taken into account in any investigation into the causes of the absence of minimum annual data less than around 5 °C, e.g. using models that allow for spatio-temporal trends and correlation. It is beyond the scope of this paper to construct the statistical models necessary to clarify the influences of data availability in space, time, and number; however, we do provide a further, simple examination of the potential effects of depth, location, and data number bias.
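As a very simple illustration of how numerical bias from high-frequency platforms can distort a summary statistic, the Python sketch below compares an annual mean computed from all records with one in which each source first contributes a single daily mean. This thinning is only one basic option chosen for illustration, not a method applied in this paper, and it does not replace the spatio-temporal models mentioned above; the function name and the records are made up.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def annual_means(records, thin=True):
    """Annual mean temperature from (source, date, tC) records.

    With thin=True each source contributes one value per day (its daily mean),
    so a buoy logging every minute counts no more than a single ship station.
    """
    if thin:
        daily = defaultdict(list)
        for source, day, t_c in records:
            daily[(source, day)].append(t_c)
        values = [(day.year, mean(ts)) for (_, day), ts in daily.items()]
    else:
        values = [(day.year, t_c) for _, day, t_c in records]
    by_year = defaultdict(list)
    for year, t_c in values:
        by_year[year].append(t_c)
    return {year: mean(ts) for year, ts in by_year.items()}


# Made-up day: nine buoy readings of 6.0 °C and one ship station at 4.0 °C.
day = date(2005, 1, 10)
records = [(6, day, 6.0)] * 9 + [(2, day, 4.0)]
raw = annual_means(records, thin=False)
thinned = annual_means(records, thin=True)
print(raw, thinned)
```

In this made-up case the unthinned mean is pulled towards the buoy value simply because the buoy reports more often, which is the kind of effect that needs to be separated from any real change in winter minima.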
Figure 12 compares the seawater temperature records of the data in the selected area post-2000.Figure 12a shows data that do not come from the two autonomous monitoring stations, whilst Fig. 12b does.The patterns in the plots of individual data points are similar with some higher individual readings in Fig. 12a, possibly reflecting data acquired at the surface where aerial exposure during deployment is a known possible influence.
Figure 13 explores the potential influence of numerical differences in data numbers with time using all available data in the selected area of the southern North Sea to calculate annual seawater temperature statistics.It plots annual statistics (all sources, all depths) as points before 1955 when data are particularly sparse.This limited data coverage gives rise to apparent anomalies with maximum average temperatures below 10 • C in the early 1930s and one year below 5 • C in the early 1950s.Post-1955, the increase in data volumes provides a more coherent picture (plotted as points and lines), reflecting to some degree the trend in increasing maximum and mean temperatures expected from the scientific papers cited above.The observed winter of the "Big Freeze" in the early 1960s is again very clear.However, the post-2000 absence of data below 5 • C at the surface (shown in Fig. 10b) is not reflected in the annual minimum data for all depths.
Figure 14 illustrates the depth component of the data sources in a small selected geographic "belt". Source 3 (Oceanographic Archive) is represented by a vertical CTD profile. Source 4 (Plankton Analysis System) shows temperature data gathered during a "V" profile plankton tow.
Figure 15 illustrates some of the characteristics of the data sources. Source 4, the Plankton Analysis System, provides more data at depth and this is illustrated in the south-western quadrant, an area of particular interest for plankton studies. Further north, routes to and from a series of set stations (source 8) provide data from the late 1950s to the mid-1990s. In the North Sea, the bulk of data offshore and at depth come from an extensive series of ScanFish tows (source 16; see Brown et al., 1996).

The data subsets described above comprise surface measurements and temperatures at depth, so it is possible to extract time series with different depth bands to illustrate the breadth and depth of the data coverage with time; see Fig. 16. There are apparent artefacts in Fig. 16

Figure 16 clearly shows the annual cycle of seawater temperatures around the British Isles and interesting features such as the run of three cold winters (1985-1987) followed by three warm winters (1988-1990) plus warm summers (1995, 2006). The datasets are very comprehensive for the sea surface (0-5 m depth) but are sparser for deeper depths (in this case 20-25 m). Typically, and as expected, sea surface temperatures are slightly higher than temperatures at depth in this region.

Dulvy et al. (2008) have shown that many fish in the North Sea have responded to rising seawater temperatures by shifting their distributions into deeper and therefore cooler waters. They suggested that the whole North Sea demersal fish assemblage deepened by ∼ 3.6 m per decade in response to climate change between 1980 and 2004. Such long-term trends have been associated with a number of observed changes in biological systems, including a clear seasonal shift to the earlier appearance of fish larvae at Helgoland Roads in the southern North Sea (Greve et al., 2005), linked to marked changes in zooplankton composition and sea surface temperature in this region (Beaugrand et al., 2002). Greve et al. (2005) suggested that in 10 cases, both the "start of season" and "end of season" (the Julian dates on which 15 and 85 % of all larvae were recorded, respectively) were correlated with sea surface temperature.
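To make the "start of season" and "end of season" definitions concrete, the sketch below finds the first day of the year on which the cumulative larval count reaches 15 and 85 % of the annual total. Treating the thresholds as "first day at or above the fraction" is an assumption of this example; the exact convention used by Greve et al. (2005) is not stated here. The function name and the sample counts are hypothetical.

```python
def season_bounds(daily_counts, start_frac=0.15, end_frac=0.85):
    """Return (start_day, end_day) for one year of larval counts.

    daily_counts: iterable of (day_of_year, count) pairs.
    Each bound is the first day on which the cumulative count reaches
    the given fraction of the annual total (an assumed convention).
    """
    ordered = sorted(daily_counts)
    total = sum(count for _, count in ordered)
    if total <= 0:
        raise ValueError("no larvae recorded")
    start_day = end_day = None
    running = 0
    for day, count in ordered:
        running += count
        if start_day is None and running >= start_frac * total:
            start_day = day
        if end_day is None and running >= end_frac * total:
            end_day = day
    return start_day, end_day


# Invented counts for illustration only.
counts = [(100, 10), (110, 20), (120, 40), (130, 20), (140, 10)]
bounds = season_bounds(counts)
print(bounds)
```

Dates derived in this way for each year could then be compared with a sea surface temperature series for the same region and period.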
Similarly, ichthyoplankton sampling suggests that winter-breeding species in the English Channel region also spawn earlier in cooler years, while summer-spawning fish tend to spawn later (Genner et al., 2010). Phenology is the study of the timing of recurrent biological events, such as the return of migrating species or the first flowering of certain trees each year. Though most examples of phenological change in the literature have been drawn from terrestrial systems, the year-class size of marine fish is greatly influenced by the timing of spawning and the resulting match-mismatch with their prey and predators (Cushing, 1990), which are in turn greatly influenced by seawater temperatures.
The data now readily available here can contribute to further explorations of these changes, although we note that the low average mid-depth seawater temperatures for the month of December in 2007 and 2009 arise from single data points forming that average. The high data point for mid-water in April 2011 comes from a diver. The following statistics (Table 4) are derived for the data used in Fig. 17. They indicate the importance of the statistical modelling outlined above, especially for earlier periods and for large areas.
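The point about monthly averages formed from single data points can be illustrated with a minimal sketch that computes monthly means per depth band and keeps the number of records behind each mean. The depth bands follow the 0-5 m and 20-25 m subsets used in the text; the record layout, the function names, the inclusive band limits, and the sample values are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

# Depth bands used in the text; inclusive limits are an assumption.
DEPTH_BANDS = {"surface": (0.0, 5.0), "mid-water": (20.0, 25.0)}


def depth_band(depth_m):
    """Return the band name for a depth, or None if outside both bands."""
    for name, (top, bottom) in DEPTH_BANDS.items():
        if top <= depth_m <= bottom:
            return name
    return None


def monthly_means(records):
    """Map (band, year, month) to (mean temperature, record count).

    records: iterable of (year, month, depth_m, temp_c) tuples.
    """
    groups = defaultdict(list)
    for year, month, depth_m, temp_c in records:
        band = depth_band(depth_m)
        if band is not None:
            groups[(band, year, month)].append(temp_c)
    return {key: (mean(temps), len(temps)) for key, temps in groups.items()}


def poorly_supported(means, min_count=2):
    """List the (band, year, month) keys whose mean rests on too few records."""
    return sorted(key for key, (_, n) in means.items() if n < min_count)


# Invented records for illustration only.
records = [
    (2007, 12, 22.0, 4.0),  # single mid-water record
    (2007, 12, 2.0, 8.0),
    (2007, 12, 3.0, 9.0),
    (2007, 12, 10.0, 7.0),  # between the bands, so ignored
]
means = monthly_means(records)
flagged = poorly_supported(means)
print(means)
print("means based on a single record:", flagged)
```

Carrying the count through to any plot or table lets a reader distinguish a well-sampled month from one represented by a single cast or dive.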

Data availability
Data are available from the Cefas Data Hub.
The contents of the Cefas Data Hub website are provided as part of the Cefas role as a Defra agency under the Defra Open Data Strategy.
Cefas requires users to make their own decisions regarding the accuracy, reliability, and applicability of information provided. The data provided by the Cefas Data Hub are believed by Cefas to be reliable for their original purposes and are accompanied by discovery metadata that provide a copy of the information available to Cefas scientists, describing the original purposes of data collection. It is the responsibility of the data user to take this information into account when reusing data. Regardless of any quality control processes, Cefas does not accept any liability for the use of the data provided; use is at the users' own risk. Cefas does not give any warranty as to the quality or accuracy of the information or the medium on which it is provided or its suitability for any use. All implied conditions relating to the quality or suitability of the information and the medium and all liabilities arising from the supply of the information (including any liability arising from negligence) are excluded to the fullest extent permitted by law.
The use of data from the Cefas Data Hub requires that the correct and appropriate interpretation is solely the responsibility of the data users, that results, conclusions, and/or recommendations derived from the data do not imply endorsement from Cefas, that data sources must be acknowledged, preferably using a formal citation, that data users must respect all restrictions on the use of data such as for commercial purposes, and that data may only be redistributed, i.e. made available in other data collections or data portals, with the prior written consent of Cefas.

Conclusions
This data rescue, assembly, integration, and publication exercise stemmed from what seemed at the time to be a relatively simple plea made at an internal workshop to make all temperature datasets held within the Lowestoft laboratory available via a common data portal. What emerged was a general realisation that there were 17 separate data systems, each containing records of varying quality, on paper and stored electronically in a myriad of different formats and archaic file types, some of which could no longer be easily read without bespoke computer software. Potentially valuable information was collected for various operational reasons over the past 134 years, but every system was tailored for a specific purpose. Where temperature was specifically measured by oceanographers, some form of CTD was deployed, and in these cases semi-standardised data were often transferred to national repositories, for example the British Oceanographic Data Centre (https://www.bodc.ac.uk/data/bodc_database/ctd/) or the ICES Data Centre. However, in most cases, the data described here have never been made publicly available before, except within the context of summary outputs from the individual research projects published in peer-reviewed journal articles.

The internal workshop wanted "all the temperature data in one place in the same format" so that anyone could use it. The initiating request for access without having to understand the originating formats was driven primarily by requirements for studying long-term climate change but also encompassed biological and ecological uses and work on linked data. These requirements became even more pressing given a UK government-wide drive to make publicly funded scientific datasets available. Whomersley et al. (2015) describe the reuse of data by specialists who did not need to understand the dataset and format or the associated limitations. This paper has taken a step further and decomposed the original data formats with a view to making the seawater temperature data more accessible and available, thereby widening access and reuse.

Earth Syst. Sci. Data, 10, 27-51, 2018, www.earth-syst-sci-data.net/10/27/2018/
In June 2015 Defra's Secretary of State, Elizabeth Truss, announced her vision for the future of British food, farming, and the natural environment, stating that "at least 8000 datasets - will be made freely available to the public, putting Britain at the forefront of the data revolution". She stated that "vast data reserves from Defra are set to transform the world of food and farming in the single biggest government data giveaway the UK has ever seen". As a result of this initiative, Cefas has released more than 1950 individual datasets via the Cefas Data Hub (www.cefas.co.uk/cefas-data-hub/), a majority of which currently provide data in the original format.
The data presented here have not been corrected or adjusted in any way to take account of the different sampling methodologies used, as has been attempted for the most well-known data collation efforts such as ERSST, HadSST3, and COBE-SST (see Mathews, 2013; Kennedy et al., 2011a, b). Inherent biases have been partially addressed by the provision of contextual fields (Source, Sample, Measure, Ref1, Ref2), and areas for easy but potentially misleading uses of the data have been explored above. Some of the datasets described here have contributed to the ICES Report on Ocean Climate (IROC), which provides summary information on climatic conditions in the North Atlantic on an annual basis (see https://ocean.ices.dk/iroc/).
The archive of processed Coastal Temperature Network data has been widely cited (see https://scholar.google.co.uk/citations?hl=en&user=GkV5fMwAAAAJ&view_op=list_works). This paper has made the underlying data readily available (source 1). Other datasets, such as source 10, which comprises temperature and depth records obtained from recreational scuba divers via a citizen science project (see Wright et al., 2016), represent a hitherto largely untapped resource for oceanographic researchers.
Author contributions. SD, LF, and OW provided data, data processing, and deeper insights into specialist areas along with SF, who also advised on the choice of ICES rectangles to demonstrate data in areas of high fisheries interest. JP was the Cefas staff member who, in frustration during a workshop, asked "Why can't I just get access to all Cefas temperature data?". This paper provides the requested access and expands on "just". DJM took on the challenge, identified and ingested the data, decomposed the temperature data from their multiple originating formats, ran the QA-QC, and prepared the paper. DM provided statistical inputs regarding the selection and presentation of data designed to illustrate the effects of bias and spatial and temporal influence. SR (through Cefas Seedcorn) provided the internal funding (part of Cefas Seedcorn DP705 "Delivering Linked Data"), early comments on the form of the paper, the support and resources needed to build the Cefas Data Hub, and ongoing support for what turned out to be more than a year of effort.

Figure 3. Overview of the locations of Cefas seawater temperature measurements with plotted point intensity reflecting data density.

Figure 4. Overview of the relative data density in the English Channel, the North, Irish, and Celtic seas, and the UK Continental Shelf area.

Figure 5. Illustration of data coverage with time: (a) 1880-1956 and (b) 1957-2014 (note the order of magnitude differences in counts).

Fig. 6d, which illustrates data availability in the hundreds and then tens per 1 m bin for depths below 250 m. Most of the sampling programmes involving the Lowestoft laboratory over the past 130+ years have focussed exclusively on the continental shelf, where the most productive commercial fish stocks exist and water depths rarely exceed 200 m. Only occasional forays have been made into the deeper North Atlantic, and these records are contained primarily in sources 3 and 11. It is important to note that most of the existing data portals containing seawater temperature measurements (e.g. ERSST, HadSST3, COBE-SST) only accommodate records at the sea surface. The World Ocean Database (https://www.nodc.noaa.gov/OC5/WOD/pr_wod.html) and the Met Office EN4 database (https://www.metoffice.gov.uk/hadobs/en4/) do contain subsurface data and ICES (http://www.ices.dk/marine-data/data-portals/Pages/ocean.aspx) attempts to provide insights into near-seabed temperature conditions

Figure 8. Illustration of near-surface (0-5 m) data coverage in the southern North Sea. Plotted point intensity reflects data density. Note that the area inside the West Frisian Islands is primarily sandbanks and reclaimed land, not sea.

Figure 9. Near-surface (0-5 m) data coverage and temperatures in the southern North Sea by source and time. Note the different timescales for each data source.

Figure 11. Illustration of potential numerical, geographical, and depth biases associated with data source in the southern North Sea from the year 2000 on: (a) 1-5 m and (b) 0-1 m (primarily autonomous platforms, sources 6 and 7). Plotted point intensity reflects data density.

Figure 12. Plot of seawater temperature in the southern North Sea against time post-2000. (a) Data from sources other than autonomous platforms. (b) Data from the two autonomous monitoring stations in the selected area.

Figure 13.Average (green), minimum (blue), and maximum (red) annual temperatures for the southern North Sea including all sources and all depths.

Figure 14. Selected small-scale location illustrating the diversity of data sources available and their associated depth profiles. Note that the data illustrated were not collected at the same time.

Figure 15.Illustration of the distribution of near-surface data (0-5 m, in red) and mid-water data (20-25 m, in blue) for the bulk of the UK Continental Shelf.

4.9 Distribution and patterns in seawater data for the bulk of the UK Continental Shelf area for the near surface and mid-water

This section widens the geographic coverage of the data exploration to the bulk of the assembled data (Fig. 15; see also Figs. 3 and 4 for context). We retain the near-surface 0-5 m (red) subsetting and extend it to "mid-water" at 20-25 m (blue). As already shown in the Fig. 7a (Liverpool Bay) subset, near-surface data coverage is extensive in the Irish Sea

Figure 17.Average near-surface and mid-water seawater temperature around the British Isles by month from 1880 to 2015 (surface 0-5 m in red, mid-water 20-25 m in blue).

4.10 Surface and mid-water seawater temperature around the British Isles from 1880 to 2015

Figure 17 shows the near-surface and mid-water seawater temperatures for seas around the British Isles (UKCS area 48 to 58 °N and 10 °W to 10 °E) from 1880 to 2015 plotted by month. It shows that, for most months of the year, the sea surface temperatures around the British Isles increased throughout the 20th century, with stronger upward trends in the spring and summer months (March to August) and smaller increases in autumn and winter (September to February).

Table 1. Summary metadata for the 17 seawater temperature sources.
use systems (some data were presented with decimal places beyond those implied by statements regarding accuracy of measurement or, in the case of position, than is known to have been feasible at the time of collection). QA-QC processes for the sources were, and are, appropriate for their particular requirements. The data published here have been subjected to additional checks in the form of minimum and maximum and outlier detection plus location plotting. These uncovered a variety of data quality issues, primarily around location but also showing sensor-related data issues. Best efforts have been made to ensure the data are clean, reliable, and representative of what was measured. A degree of selection bias is inherent in this data compilation exercise ranging from what was originally done and where and when, to what was reasonably accessible for compilation, what was removed on the grounds of quality control and uncertainty regarding validity, to what users select and do with the data.
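The kind of minimum, maximum, and outlier checks mentioned above could be sketched as below. This is an illustrative example only and not the QA-QC procedure applied to the published data: the plausible temperature range, the use of a median absolute deviation (MAD) score, the 3.5 cut-off, and the function names are all assumptions chosen for the sketch, and the sample values are invented.

```python
from statistics import median


def range_flags(temps_c, lower=-2.0, upper=35.0):
    """Flag values outside an assumed plausible seawater range (in deg C)."""
    return [not (lower <= t <= upper) for t in temps_c]


def mad_outlier_flags(temps_c, threshold=3.5):
    """Flag values whose MAD-based robust score exceeds the threshold.

    The method and the threshold are illustrative choices, not taken
    from the paper.
    """
    centre = median(temps_c)
    mad = median([abs(t - centre) for t in temps_c])
    if mad == 0:
        return [False] * len(temps_c)  # no spread, nothing to flag
    return [0.6745 * abs(t - centre) / mad > threshold for t in temps_c]


# Invented readings for illustration only.
temps = [9.8, 10.1, 10.0, 10.2, 9.9, 25.0, 99.9]
out_of_range = range_flags(temps)
outliers = mad_outlier_flags(temps)
for t, bad_range, bad_score in zip(temps, out_of_range, outliers):
    print(t, "range" if bad_range else "", "outlier" if bad_score else "")
```

A flagged value is a candidate for inspection rather than automatic removal; as noted elsewhere in the paper, isolated bodies of water can legitimately be warmer or cooler than the surrounding sea.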
Table 2 and detailed in the metadata for each data source.

Table 2. Summary of the estimated actual accuracy of seawater temperatures by data source.
They also provide ready links to other documentation and context, e.g. cruise reports and other data types that may be available. Direct reconnection to the originating data source is, of course, available through time and position. Since 2009 the terms Cruise and Survey have become interchangeable for the RV Cefas Endeavour, with the latter mandated at the time of writing.

Table 3. Summary of data sources: geographic, depth, and temporal ranges for the subsetted data used in the figures.
ues that appear to be outliers (see above). High and low data points in Fig. 16 illustrate the importance of recognising the source of the data. Source 1 (Coastal Temperature Network) and source 5 (Fisheries Ecology Research Programme) both have data from relatively isolated bodies of water that have higher and lower temperatures than the surrounding sea (e.g. North Norfolk coast and South Wales inlets respectively). In addition, there are other source-affected influences on patterns and plots. In August 2001, for example, the surface data were primarily coastal in the south and in the Liverpool Bay area. The mid-water data were in the eastern North Sea, were dominated by ScanFish measurements, and were in and around the thermocline. In August 2009 the mid-water data were from CTD casts in the North Sea as far north as the Orkney Islands, whilst the surface data were coastal and in or south of the Humber Estuary. In the summer of 2012, the majority of surface data were from the RV Cefas Endeavour FerryBox system (source 10), which recorded tracks across the North Sea, again as far north as the Orkney Islands, whilst the mid-water data were dominated by citizen science diver data, especially on the coast of Northern Ireland.

Table 4. Rounded statistical summary of data used to calculate monthly averages.