Processing of water level derived from water pressure data at the Time Series Station Spiekeroog

The quality of water level time series data varies strongly, with periods of high- and low-quality sensor data. In this paper we present the processing steps used to generate high-quality water level data from water pressure measured at the Time Series Station (TSS) Spiekeroog. The TSS is positioned in a tidal inlet between the islands of Spiekeroog and Langeoog in the East Frisian Wadden Sea (southern North Sea). The processing steps cover sensor drift, outlier identification, interpolation of data gaps and quality control. A central step is the removal of outliers. For this, an absolute gradient threshold of 0.25 m per 10 min was selected; as the quality control shows, this threshold still retains the rise and fall of the water level during extreme events. A second important step is the interpolation of data gaps, which generates trustworthy data with high certainty. Applying these methods, a 10-year data set (December 2002–December 2012) of water level information at the TSS was processed, resulting in a 7-year time series (2005–2011). Supplementary data are available at doi:10.1594/PANGAEA.843740.


Introduction
The Time Series Station (TSS) Spiekeroog measures different time series data in the East Frisian Wadden Sea. The station is positioned in the tidal inlet between the East Frisian islands Spiekeroog and Langeoog (Fig. 1, left) and has been measuring hydrographic, atmospheric and biological parameters since 2002 (Reuter et al., 2009) as part of a long-term study of biogeochemical processes on tidal flats (Rullkötter, 2009). The goal of the Time Series Station was to provide a platform at a fixed position for measurements throughout the whole year. Autumn and winter months were of particular interest because of extreme events. Such time series are often analysed to hindcast past events and trends (Visser et al., 1996; Gräwe and Burchard, 2012). Because of their long duration, these data sets are often affected by sensor degradation and maintenance, resulting in time series that combine periods of high- and low-quality data. Different processing steps are needed to assess the quality of the time series data and to correct low-quality sections. In this work we describe the processing of water pressure data recorded at the TSS Spiekeroog. The method emphasises sensor drift and the handling of outliers and data gaps. Finally, the quality of the data is verified by frequency and storm surge analyses.

Instrumentation
The pressure sensor of the TSS Spiekeroog is installed in a tube approximately 1.5 m above the sea floor (Fig. 1, right). The axis of the tube is aligned with the main current directions during ebb and flood. A total of three different sensors were used for the pressure measurements, with up to two measuring at the same time. Two of the sensors (ID 40202 and 40801) were PDCR 901 (Druck Limited, England, Table 1). The third (ID 40301) was a PDCR 4000 (General Electric Co., USA). One of the PDCR 901 sensors (ID 40801) was […]. Data were pre-processed and stored on the TSS. The pre-processing includes the conversion from measured currents to pressure data and the binning of the measurement data to 1 min. At regular intervals the data were copied to a land-based computer. From this system the data were exported at a 10 min interval for the period from 2003 to 2012 as monthly files.
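The on-station pre-processing (1 min bins, 10 min export) can be sketched as below. The text does not say how the 10 min export selects values, so this sketch assumes arithmetic-mean binning and simply keeps every tenth 1-min value; the function name `bin_means` and the synthetic 1 Hz input are illustrative assumptions, not the station's documented software.

```python
import numpy as np

def bin_means(t_s, values, width_s):
    """Average samples into consecutive, non-overlapping bins of width_s seconds.

    Returns bin start times and bin means; bins without samples are NaN.
    """
    t_s = np.asarray(t_s, dtype=float)
    values = np.asarray(values, dtype=float)
    start = t_s.min()
    idx = np.floor((t_s - start) / width_s).astype(int)
    n_bins = idx.max() + 1
    sums = np.bincount(idx, weights=values, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    means = np.full(n_bins, np.nan)
    has_data = counts > 0
    means[has_data] = sums[has_data] / counts[has_data]
    return start + width_s * np.arange(n_bins), means

# Hypothetical 1 Hz pressure readings over 30 min (dbar)
t_raw = np.arange(0, 1800)                      # seconds
p_raw = 10.0 + 0.001 * t_raw
t_1min, p_1min = bin_means(t_raw, p_raw, 60)    # 1-min bins, as on the station
t_10min, p_10min = t_1min[::10], p_1min[::10]   # assumed export: every 10th value
```

If the export instead averaged over 10 min, `bin_means(t_raw, p_raw, 600)` would be the corresponding one-line change.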

Data Provenance and Structure
The data used in this paper are available at PANGAEA (doi:10.1594/PANGAEA.843740). The data are divided into yearly sections. Each section provides the original water pressure data (L1), the water level after subtraction of a trend and removal of outliers (L2), and the final product of this paper, the water level data with interpolated gaps (L3). In addition, each value is flagged with a bit-coded value (Table 2) that depends on the operations applied to it.
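Because the flag codes of Table 2 are not reproduced here, the following sketch uses a hypothetical bit layout to show how bit-coded flags combine and decode; the member names and bit positions are assumptions, not the PANGAEA flag definitions.

```python
from enum import IntFlag

class QualityFlag(IntFlag):
    """Hypothetical bit layout; the actual codes are defined in Table 2."""
    NONE = 0
    TREND_SUBTRACTED = 1   # bit 0
    OUTLIER_REMOVED = 2    # bit 1
    INTERPOLATED = 4       # bit 2
    SUPPORTING_POINT = 8   # bit 3

def describe(code):
    """List the names of all operations encoded in an integer flag value."""
    flag = QualityFlag(code)
    # Skip the zero member; keep every single-bit member contained in `flag`
    return [member.name for member in QualityFlag if member and member in flag]

# A value that was detrended and later interpolated:
code = int(QualityFlag.TREND_SUBTRACTED | QualityFlag.INTERPOLATED)   # 5
```

Each operation sets its own bit, so a value can carry any combination of operations in a single integer.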

Methods
The data processing method is divided into four steps. For these steps, meta-information about the time series, the measurement station and nearby measurements is required. The steps are the following:
1. Subtraction of a trend
2. Removal of outliers
3. Calculation of supporting points and interpolation of missing data
4. Quality control of processed data

Subtraction of a trend
First, the water level time series is divided into sections. The divisions depend on the maintenance of the pressure sensor. Figure 2 shows the water pressure data (blue) and sensor maintenance (red) for the TSS Spiekeroog; the maintenance dates are given in Tables A1 and A2 in Appendix A. Each section is analysed for a trend by first calculating a running mean over all data points and then deriving the trend of the result. These trends can result from short-term and long-term changes. Short-term changes arise from diurnal and semi-diurnal tidal cycles and wind stress. Long-term changes can result from sensor drift, biofouling, longer tidal cycles or sea level changes due to climate change.
If a strong decreasing trend is detected (Fig. 2, end of 2011), this trend is subtracted from the time series, because it is most probably caused by sensor drift or biofouling. From all other sections only the mean water level is subtracted. What was subtracted from each section is listed in Tables A1 and A2 in Appendix A.
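A minimal sketch of the per-section trend step, assuming gap-free 10 min data within one maintenance section: a running mean over roughly one tidal day suppresses the tides, a straight line is fitted to it, and either that line or the section mean is subtracted. The window length and the slope that counts as a "strong decreasing trend" are hypothetical values; the text gives no numbers, and the sections of the paper were assessed individually (Tables A1 and A2).

```python
import numpy as np

def running_mean(x, window):
    """Centred running mean; the output is shorter by window - 1 samples."""
    return np.convolve(x, np.ones(window) / window, mode="valid")

def correct_section(t_days, level, window=149, slope_limit=-0.001):
    """Remove either a linear trend or the mean from one maintenance section.

    t_days: sample times in days; level: water level in m (gap-free here).
    window: running-mean length in samples; 149 x 10 min (about 24.8 h) is an
            assumed choice that averages out semi-diurnal and diurnal tides.
    slope_limit: hypothetical slope (m per day) below which the trend is
                 treated as sensor drift or biofouling.
    """
    t_days = np.asarray(t_days, dtype=float)
    level = np.asarray(level, dtype=float)
    smooth = running_mean(level, window)
    half = (window - 1) // 2
    t_smooth = t_days[half:half + smooth.size]   # centre time of each window
    slope, intercept = np.polyfit(t_smooth, smooth, 1)
    if slope < slope_limit:
        return level - (slope * t_days + intercept), "trend"
    return level - level.mean(), "mean"

# Synthetic 30-day section at 10 min resolution with an M2-like tide
t = np.arange(0, 30, 1 / 144)
tide = 1.2 * np.cos(2 * np.pi * t / (12.42 / 24))
drifting, kind = correct_section(t, tide - 0.01 * t)   # strong negative drift
steady, kind2 = correct_section(t, tide + 0.3)         # constant offset only
```

Fitting the line to the running mean rather than to the raw data keeps the large tidal signal from biasing the slope estimate.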
During this step, the pressure data are also converted to water level, using the Gibbs SeaWater (GSW) Oceanographic Toolbox (V3.01, 11 May 2011) for MATLAB (R2014b, The MathWorks). The toolbox is based on the International Thermodynamic Equation of Seawater 2010 (TEOS-10; IOC, SCOR and IAPSO, 2010).
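The GSW toolbox converts pressure to height using TEOS-10. As a much simpler stand-in, and not the method used in this work, the hydrostatic relation h = Δp / (ρ g) with constant density and gravity gives a first-order approximation. The values ρ = 1025 kg m⁻³ and g = 9.81 m s⁻², the optional subtraction of atmospheric pressure and the function name are assumptions.

```python
RHO_SEAWATER = 1025.0   # kg m^-3, assumed constant density
G = 9.81                # m s^-2, assumed constant gravity
PA_PER_DBAR = 1.0e4     # 1 dbar = 10 kPa

def water_column_m(p_dbar, p_atm_dbar=0.0, sensor_height_m=0.0):
    """Hydrostatic water level from sensor pressure (simplified, not TEOS-10).

    p_atm_dbar: atmospheric pressure to subtract if the sensor reads absolute pressure.
    sensor_height_m: height of the sensor above the chosen reference level.
    Works on floats and NumPy arrays alike.
    """
    return (p_dbar - p_atm_dbar) * PA_PER_DBAR / (RHO_SEAWATER * G) + sensor_height_m
```

With these constants, 1 dbar corresponds to about 0.99 m of water; the later mean or trend subtraction removes any constant offset such as the sensor height.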

Removal of outliers
The next step of the data processing is the removal of outliers. Outliers occur, for example, during sensor maintenance. To detect probable outliers, the rate of water level change (gradient) between two adjacent data points is calculated. The distribution of these gradients is shown in the histogram in Fig. 3. Most values (99.96 %) have an absolute gradient of at most 0.25 m over Δt = 10 min. Consequently, an absolute gradient exceeding 0.25 m per 10 min very probably indicates an outlier, which is then removed. In addition, each year is inspected visually for outliers missed by the gradient method.
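The gradient criterion can be sketched as follows. Comparing each value with the last accepted value, rather than with its raw neighbour, is an assumption made here so that a single spike does not also remove the valid value after it; the sketch further assumes that the first value of the series is valid. The threshold of 0.025 m min⁻¹ equals the 0.25 m per 10 min used in the text.

```python
import numpy as np

def remove_gradient_outliers(level, dt_min=10.0, max_rate=0.025):
    """Replace values that change faster than max_rate (m per min) with NaN.

    Each value is compared with the last accepted value, scaled by the elapsed
    time. Existing NaNs (gaps) are kept and not counted as outliers.
    Returns the cleaned copy and the number of removed values.
    """
    cleaned = np.asarray(level, dtype=float).copy()
    n_removed = 0
    last = None                     # index of the last accepted value
    for i, value in enumerate(cleaned):
        if np.isnan(value):
            continue
        if last is not None:
            rate = abs(value - cleaned[last]) / (dt_min * (i - last))
            if rate > max_rate:
                cleaned[i] = np.nan
                n_removed += 1
                continue
        last = i
    return cleaned, n_removed
```

For example, in the series 0.0, 0.1, 1.5, 0.2 m only the 1.5 m spike is removed, because 0.2 m is then compared with 0.1 m over 20 min.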

Calculation of supporting points and interpolation of missing data
The removed outliers and the gaps in the original time series represent missing information, which makes the data set more difficult to interpret, especially since many methods of time series analysis require evenly sampled data (Karl et al., 1982). These gaps can be filled by interpolation. However, for gaps longer than a tidal cycle, which is dominated by the M2 tide, the interpolated values might be wrong, especially when linear interpolation is used. Spline interpolation can also produce wrong data by overestimating or underestimating the zenith points (the extrema at high and low tide). To prevent wrong data in these cases, zenith points representing the water level at high or low tide can be calculated and used in the interpolation. These supporting points are calculated by comparison with water level data from nearby measurement stations. Here, the comparison was made with water level data from Neuharlingersiel (53°42′06″ N, 7°42′15″ E), measured with a pressure sensor that can fall dry during spring low tide. Values recorded while the sensor is dry are not selected. For consistency, the first two processing steps are also applied to the data from Neuharlingersiel.
The two data sets are compared in two ways. The first is a direct comparison of the two curves to determine similarities in the tidal range and differences at high and low water; this provides a possible offset and scale factor. The second is a cross-correlation analysis (MATLAB R2014b, function xcorr) to determine the time lag between the two measurement sites. With this information, auxiliary supporting points for the interpolation can be calculated.
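The two comparisons with Neuharlingersiel can be illustrated with a short sketch: a lag estimate from the peak of the cross-correlation (NumPy's `np.correlate` in place of MATLAB's `xcorr`) and the derivation of supporting points from high and low waters at Neuharlingersiel. The neighbour-based extremum detection, the `dry_level` of −2 m (taken from the constant values mentioned in the discussion), the offset convention and the function names are assumptions for illustration.

```python
import numpy as np

def lag_samples(reference, other):
    """Lag (in samples) of `other` relative to `reference` at the cross-correlation peak.

    Positive result: `other` is later (e.g. high water at Neuharlingersiel after the TSS).
    Both series must be gap-free, evenly sampled and of equal length.
    """
    v = np.asarray(reference, dtype=float)
    a = np.asarray(other, dtype=float)
    c = np.correlate(a - a.mean(), v - v.mean(), mode="full")
    return int(np.argmax(c)) - (v.size - 1)   # index v.size - 1 is zero lag

def supporting_points(t_min, nhs_level, lag, dt_min=10.0, offset=0.0, dry_level=-2.0):
    """Hypothetical supporting points for the TSS from Neuharlingersiel extrema.

    Local maxima/minima are taken as high/low waters, readings at or below
    dry_level (sensor fallen dry) are discarded, times are shifted earlier by
    the lag, and the mean level offset (Neuharlingersiel minus TSS) is removed.
    """
    x = np.asarray(nhs_level, dtype=float)
    i = np.arange(1, x.size - 1)
    is_max = (x[i] > x[i - 1]) & (x[i] >= x[i + 1])
    is_min = (x[i] < x[i - 1]) & (x[i] <= x[i + 1])
    idx = i[is_max | is_min]
    idx = idx[x[idx] > dry_level]
    t_points = np.asarray(t_min, dtype=float)[idx] - lag * dt_min
    return t_points, x[idx] - offset
```

The simple neighbour comparison only works on clean series; real 10 min data would need smoothing or a minimum spacing between detected high and low waters.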
Tides can be described as a combination of cosine functions. Therefore, using a spline interpolation (MATLAB R2014b, function interp1 with the spline method), the missing data points can be interpolated with a high certainty of obtaining trustworthy data. For gaps longer than a tidal cycle (12.5 h), the previously calculated supporting points are used to avoid larger overestimation and underestimation.
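A sketch of the gap filling, assuming SciPy is available: `scipy.interpolate.CubicSpline` uses not-a-knot end conditions by default, as does MATLAB's `interp1(..., 'spline')`. Following the text, supporting points are only added inside gaps longer than one tidal cycle (12.5 h = 750 min). The function name and the tie rule (a measured value wins over a supporting point at the same time) are illustrative choices.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def fill_gaps(t, level, t_support=(), v_support=(), min_gap=750.0):
    """Fill NaNs in `level` with a not-a-knot cubic spline.

    Supporting points are only used if they fall inside a gap longer than
    min_gap (same unit as t). t must be strictly increasing.
    """
    t = np.asarray(t, dtype=float)
    level = np.asarray(level, dtype=float)
    t_support = np.asarray(t_support, dtype=float)
    v_support = np.asarray(v_support, dtype=float)
    ok = ~np.isnan(level)
    t_valid, v_valid = t[ok], level[ok]

    # Length of the gap each supporting point lies in (0 if outside the data range)
    pos = np.searchsorted(t_valid, t_support)
    inside = (pos > 0) & (pos < t_valid.size)
    gap = np.zeros(t_support.shape)
    gap[inside] = t_valid[pos[inside]] - t_valid[pos[inside] - 1]
    use = inside & (gap > min_gap)

    knots_t = np.concatenate([t_valid, t_support[use]])
    knots_v = np.concatenate([v_valid, v_support[use]])
    order = np.argsort(knots_t, kind="stable")            # measured values first on ties
    knots_t, first = np.unique(knots_t[order], return_index=True)
    knots_v = knots_v[order][first]

    spline = CubicSpline(knots_t, knots_v)                # default bc_type: 'not-a-knot'
    filled = level.copy()
    filled[~ok] = spline(t[~ok])
    return filled
```

Without the supporting points, the spline across a gap longer than a tidal cycle can only bend a few times and misses the intermediate high and low waters, which is the failure mode the text describes.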

Quality control of processed data
Finally, to ensure the quality of the processed data, a Fourier harmonic analysis and a storm flood analysis were performed. The Fourier harmonic analysis is intended to show that all main tidal frequencies can be found and to reveal the differences between the original and the processed data. The storm flood analysis searches for severe short-term rises in the water level data. The Federal Maritime and Hydrographic Agency (BSH, "Bundesamt für Seeschifffahrt und Hydrographie") has published thresholds for different magnitudes of storm floods at the German North Sea coast (BSH, 2015). These classify a weak storm flood as a water level between 1.5 and 2.5 m above mean high water, a severe storm flood as between 2.5 and 3.5 m, and a very severe storm flood as more than 3.5 m above mean high water.
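The BSH classes translate into a simple rule on the height of each high water above mean high water. The sketch below assumes the high waters have already been extracted from the series; the handling of values exactly at 1.5, 2.5 and 3.5 m is not stated in the text and is an assumption here, as are the function names and the example high waters.

```python
def classify_storm_flood(high_water_m, mean_high_water_m):
    """Classify one high water by its height above mean high water (BSH classes).

    Boundary handling (>= at 1.5 and 2.5 m, > at 3.5 m) is an assumption.
    Returns None if the high water is not a storm flood.
    """
    excess = high_water_m - mean_high_water_m
    if excess > 3.5:
        return "very severe"
    if excess >= 2.5:
        return "severe"
    if excess >= 1.5:
        return "weak"
    return None

def find_storm_floods(high_waters_m, mean_high_water_m):
    """Return (index, class) for every high water that qualifies as a storm flood."""
    events = []
    for i, hw in enumerate(high_waters_m):
        category = classify_storm_flood(hw, mean_high_water_m)
        if category is not None:
            events.append((i, category))
    return events

# Hypothetical high waters (m) with the mean high water of 1.33 m reported for the TSS
events = find_storm_floods([1.9, 3.0, 4.0, 5.0], 1.33)
```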

Results
The processing steps were applied to the whole time series shown in Fig. 2. While applying the first three steps, it became obvious that large amounts of data and/or comparison measurements were missing in the first 2 years (2003 and 2004) and at the end of 2012. Therefore, these 3 years were excluded from further data processing in this work.
Examples of the results of the first three processing steps for data from 2008 are shown in Fig. 4. The quality control uses the fully processed data set (2005–2011).

Data processing
Figure 4 presents the time series at different stages of the data processing for the period from 1 January to 31 December 2008. The top graph shows the measured pressure data before validation (blue) and the times of pressure sensor maintenance (red). A shift in the water level is apparent at the second vertical red line, and a gap in the time series follows the third.
The middle graph in Fig. 4 shows the time series after subtraction of a linear trend and removal of outliers. Two details of this graph are important. First, after removal of the trend, the whole time series has a mean value of zero, which means that the shifts visible in the top graph have been removed. Second, some outliers that occurred near the shift have also been removed. In the whole time series, 12 402 of 368 064 data points (3.37 %) are missing after the removal of outliers.
Earth Syst.Sci.Data, 7, 289-297, 2015 www.earth-syst-sci-data.net/7/289/2015/
The bottom graph of Fig. 4 shows the comparison with the water level data from Neuharlingersiel and the result of the interpolation of missing data. Figure 5 shows similar data for a shorter period (15-31 June 2007). Both figures show only small differences in amplitude and timing between the data from Neuharlingersiel and the Time Series Station. The cross correlation yielded a time lag of 20 min, meaning that high and low water occur earlier at the TSS than at Neuharlingersiel. A further comparison of individual high and low tides at both stations showed that the delay mostly varies between 0 and 40 min, with some outliers in both directions. This comparison also provided the mean differences and standard deviations between the two stations for high water (0.00 ± 0.17 m) and low water (−0.03 ± 0.20 m).

Data quality
Figure 6 shows the Fourier transform of the processed water level data (black) and the original data (red) for the years 2005 to 2011. The most influential tidal frequency is that of the M2 tide, together with other semi-diurnal frequencies. The next most influential frequencies are the diurnal tides around the K1 tide. Equally important for shallow coastal areas are the M4 and M6 overtides (Stanev et al., 2014). All of these peaks are much more pronounced in the processed time series than in the original. The highest peak of the original time series lies at the very beginning of the graph, at zero frequency, and represents the mean of the data.
Figure 7 presents the storm flood analysis of the processed water level data. To identify and characterize the storm floods, the mean high water at the TSS was calculated as 1.33 m. On this basis, six storm floods were found at the TSS Spiekeroog, all of which can be characterized as weak storm floods. The data from Neuharlingersiel (mean high water: 1.33 m) show eight storm events in the same period: six can be characterized as weak, one as severe and one as very severe storm floods. Table 3 lists the water levels above mean high water during the storm floods and the names of the storms.

Discussion

Subtraction of a trend
In the first processing step, a piecewise linear trend or mean was subtracted. This makes outliers easier to identify, but it also makes the data set less suitable for analysing long-term sea level changes. The manual of the pressure sensors used states a sensor degradation of ±0.1 % of full scale per year, with a full scale of 30 dbar (Table 1). In winter 2008/2009, however, the water level data showed a decrease of 0.8 dbar within 8 months. By contrast, the "Niedersächsischer Landesbetrieb für Wasserwirtschaft, Küsten- und Naturschutz" (NLWKN) reports that the water level is increasing (NLWKN, 2006). A possible reason for the decreasing values is biofouling, which is especially severe in spring and summer.
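The specification quoted above can be put in numbers. Taking ±0.1 % of the 30 dbar full scale per year, the permitted drift is 0.03 dbar per year (roughly 3 cm of water), whereas the observed decrease of 0.8 dbar in 8 months corresponds to about 1.2 dbar per year, some 40 times more. This supports attributing the decrease to something other than the specified sensor degradation, such as biofouling. The annualisation assumes a constant rate over the 8 months.

```python
FULL_SCALE_DBAR = 30.0        # full scale of the sensor (Table 1)
SPEC_DRIFT_FRACTION = 0.001   # ±0.1 % of full scale per year (sensor manual)

spec_drift = SPEC_DRIFT_FRACTION * FULL_SCALE_DBAR   # dbar per year
observed_drift = 0.8 / (8 / 12)                      # 0.8 dbar in 8 months, per year
ratio = observed_drift / spec_drift                  # observed vs. specified drift

print(f"spec {spec_drift:.3f} dbar/yr, observed {observed_drift:.2f} dbar/yr, ratio {ratio:.0f}")
```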

Removal of outliers
A total of 149 outliers were removed, representing only 0.04 % of the 368 064 measured values. A larger portion of data points (12 253, 3.33 %) is missing because of gaps in the observations, which can be caused by power failures or computer problems at the TSS. The comparatively small number of outliers implies that the measurement system is very robust. However, a gradient threshold for outlier detection can also remove correct data if the water level is indeed rising very fast. A comparison between Fig. 4 and the results of the storm flood analysis (Fig. 7 and Table 3) shows that no values were removed during extreme events. In addition, Fig. 3 shows that about 99.96 % of all values at the TSS Spiekeroog have an absolute gradient below 0.25 m per 10 min. Taken together, these results indicate that a gradient threshold of 0.025 m min⁻¹ (0.25 m per 10 min) is an acceptable choice for the measurement station in this study.
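The counts quoted in this section are mutually consistent, as a short bookkeeping check shows: 368 064 values are exactly the number of 10 min samples in 2005–2011 (2556 days, including the leap year 2008), and the removed outliers plus the gap values give the 12 402 missing points reported in the results. The constant and function names are illustrative.

```python
TOTAL_VALUES = 368_064    # 10 min values, 2005-2011
OUTLIERS = 149
GAP_VALUES = 12_253

def percent(n, total=TOTAL_VALUES):
    """Share of `total` in percent, rounded to two decimals."""
    return round(100.0 * n / total, 2)

# Simple divisible-by-4 leap rule; sufficient for 2005-2011
days = sum(366 if year % 4 == 0 else 365 for year in range(2005, 2012))
samples_per_day = 24 * 60 // 10
missing = OUTLIERS + GAP_VALUES

print(days * samples_per_day)                                    # 368064
print(percent(OUTLIERS), percent(GAP_VALUES), percent(missing))  # 0.04 3.33 3.37
print(0.25 / 10)                                                 # threshold in m per min
```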

Calculation of supporting points and interpolation of missing data
A comparison between the time series from Spiekeroog and Neuharlingersiel shows that both are in good agreement. Constant values below −2 m at Neuharlingersiel are explained by the sensor falling dry during low water at spring tides. If one of these values were taken as a supporting point, the interpolated water level could be overestimated. The cross-correlation analysis reveals a 20 min time lag between the Time Series Station Spiekeroog and Neuharlingersiel, because the tidal wave travelling through the North Sea reaches the station first. Comparing individual high and low waters at both stations, the difference mostly varies between 0 and 40 min, with outliers in both directions. The local tidal chart (BSH, 2011), in contrast, gives a delay between Spiekeroog and Neuharlingersiel of only 1 to 5 min. The discrepancy could result from the 10 min sampling interval, which allows errors of ±5 min.
Another possibility is the influence of shallow-water constituents and of wind on the tidal signal. A scaling factor for the water level data from Neuharlingersiel was not needed, because only small differences were detected at high and low tide. Sturges (1983) suggests that data can still be interpolated with gaps as long as one third of the time series and with as many as half of the values missing. In this time series, only 3.37 % of the data are missing. Using the supporting points, it was possible to interpolate the missing data with reasonable results also for longer gaps (> 7 h). This is supported by the fact that no additional extreme events were identified at the TSS and by the similarity between the measured and interpolated data (Fig. 5).

Data quality
The Fourier analysis has shown two pronounced differences between the original and processed data.The first is the high peak at the beginning of the curve with a frequency of 0 representing the mean of the original data.The second difference is the strength of the tidal peaks.For the original time series data these peaks are not as distinct as for the processed data.These differences result from the previous processing steps and illustrate their usefulness.
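The zero-frequency peak can be reproduced with a one-sided amplitude spectrum. The sketch uses a synthetic series (a 0.3 m mean plus M2-like and K1-like cosines, all assumed values) rather than the station data: bin 0 equals the series mean, and subtracting the mean removes that peak while the tidal peaks remain. The function assumes gap-free, evenly sampled input; its name is illustrative.

```python
import numpy as np

def amplitude_spectrum(level, dt_hours=1 / 6):
    """One-sided amplitude spectrum of a gap-free, evenly sampled series.

    Returns frequencies in cycles per day and amplitudes in the units of `level`.
    Bin 0 holds the absolute mean; the Nyquist bin (even length) is not corrected.
    """
    x = np.asarray(level, dtype=float)
    n = x.size
    amp = 2.0 * np.abs(np.fft.rfft(x)) / n
    amp[0] /= 2.0                                  # zero frequency is not split into +/- f
    freq_cpd = np.fft.rfftfreq(n, d=dt_hours) * 24.0
    return freq_cpd, amp

# Synthetic 60-day series at 10 min resolution (hours)
t_h = np.arange(0, 60 * 24, 1 / 6)
level = (0.3 + 1.2 * np.cos(2 * np.pi * t_h / 12.42)
         + 0.1 * np.cos(2 * np.pi * t_h / 23.93))
f, a = amplitude_spectrum(level)                   # with mean: large bin 0
f0, a0 = amplitude_spectrum(level - level.mean())  # mean removed
peak = f[1:][np.argmax(a[1:])]                     # strongest non-zero frequency (about M2)
```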
Discrepancies in the storm flood events between the TSS Spiekeroog and Neuharlingersiel originate from the different positions of the measurement sites.During storm floods the wind blew from northern or western directions forcing the water into the harbour of Neuharlingersiel where it can accumulate and lead to increased water levels.The water level is also increasing at the measurement station but here it is not possible for the water to pile up resulting in lower water levels than at Neuharlingersiel.

Conclusions
Processing the water level data resulted in a relevant long-term data set. Here it should be emphasised that:
- From 99.96 % of the time series data only a linear trend was subtracted, which leads to no difference in the ratio between the values.
- During the removal of outliers, no values were deleted during extreme events. This was derived from a comparison of storm floods between the Time Series Station Spiekeroog and Neuharlingersiel.
- The calculation of supporting points yielded a mean time lag of 20 min between the TSS Spiekeroog and Neuharlingersiel. This is larger than comparable values from the BSH and needs further analysis.
- The interpolated data follow the same trend as the measured data at the TSS Spiekeroog.
- A spectral analysis has shown that all major tidal frequencies can be found. The storm flood analysis found six events at the Time Series Station and eight at Neuharlingersiel; the difference results from the different positions of the measurement stations.

Figure 1. Left: Time Series Station Spiekeroog in the tidal inlet between the East Frisian islands Spiekeroog and Langeoog. Right: schematic of the Time Series Station Spiekeroog with attached sensors; T: temperature sensor; C: conductivity sensor; P: pressure sensor; ADCP: acoustic Doppler current profiler; MST: multispectral transmissometer (Badewien et al., 2009).

Figure 2. Measured water pressure data (blue) at the Time Series Station Spiekeroog before validation, and times of sensor maintenance (red vertical lines).

Figure 3. Histogram of the gradient between adjacent data points. Vertical lines indicate the 0.25 m per 10 min threshold for the removal of outliers.

Figure 4. Validation process for the year 2008. Top: the time series before validation (blue) and sensor maintenance (red vertical lines). Middle: the time series (blue) and outliers (red) after the first two steps of the validation. Bottom: the time series after interpolation (blue) and the comparison data from Neuharlingersiel (red).

Figure 5. All three curves show data for 2 weeks in June 2007. The blue curve shows the data after removal of a trend and outliers, and the red curve the interpolated data at the TSS Spiekeroog. The green curve shows data from Neuharlingersiel.

Figure 6. Fast Fourier transform (FFT) of the original (red) and processed (black) water level at the Time Series Station Spiekeroog. K1: lunisolar diurnal constituent; M2: principal lunar semi-diurnal constituent; M4 and M6: shallow-water overtides of the principal lunar semi-diurnal constituent.

Figure 7. Storm flood analysis of the water level data from the Time Series Station Spiekeroog (top) and Neuharlingersiel (bottom). Green marker: weak storm flood; red marker: severe storm flood; black marker: very severe storm flood.

Table 1. Pressure sensors used between 2002 and 2012 at the Time Series Station Spiekeroog (FS: full scale).

Table 2. Flag codes.

Table 3. Water levels of storm floods above mean high water at Neuharlingersiel and at the Time Series Station Spiekeroog between 2005 and 2011. Water levels between 1.5 and 2.5 m characterise a weak storm flood, between 2.5 and 3.5 m a severe storm flood, and above 3.5 m a very severe storm flood.

Table A1. Maintenance times of the water level pressure sensor between December 2002 and January 2009. Dates marked with "*" were not used during the trend removal.

Table A2. Maintenance times of the water level pressure sensor between April 2009 and November 2012. Dates marked with "*" were not used during the trend removal.