Standardization of a geo-referenced fishing data set for the Indian Ocean bigeye tuna, Thunnus obesus (1952–2014)

Geo-referenced catch and fishing effort data of the bigeye tuna fisheries in the Indian Ocean over 1952–2014 were analyzed and standardized to facilitate population dynamics modeling studies. During this 62-year historical period of exploitation, many changes occurred both in the fishing techniques and in the monitoring of activity. This study includes a series of processing steps used for standardization of spatial resolution, conversion and standardization of catch and effort units, raising of geo-referenced catch to the nominal catch level, screening and correction of outliers, and detection of major catchability changes over long time series of fishing data, i.e., the Japanese longline fleet operating in the tropical Indian Ocean. A total of 30 fisheries were finally determined from longline, purse seine and other-gears data sets, of which 10 longline and 4 purse seine fisheries represented 96 % of the whole historical geo-referenced catch. Nevertheless, one-third of total nominal catch is still not included due to a total lack of geo-referenced information and would need to be processed separately, according to the requirements of the study. The geo-referenced records of catch, fishing effort and associated length frequency samples of all fisheries are available at doi:10.1594/PANGAEA.864154.


Introduction
Bigeye tuna is one of the most valuable tropical tuna species that has been exploited in the Indian Ocean by international industrial longline fleets since the 1950s and by purse seine fishery since 1980 (IOTC, 2015; Miyake et al., 2004; Sharma et al., 2014). During 1952–2014, over 4 million tonnes of bigeye tuna were removed from the Indian Ocean, 74 % of it by longline fishing. The longline fishery developed historically through the expansion of the Japanese fishery from 1952, after the lifting of the virtual lines set in place at the end of the Second World War, which had restricted its fishing activity to Japanese waters only (Haward and Bergin, 2001; Miyake et al., 2004; Okamoto et al., 2004). In the early period (1952), Japanese longline was concentrated in the eastern Indian Ocean (Menard et al., 2007; Mohri and Nishida, 1999). A few years later, bigeye tuna also became exploited by longline fleets from Korea in 1965 and Taiwan in 1967 (Miyake et al., 2004).
Since then, 13 longline fleets have been declared as fishing bigeye tuna within the Indian Ocean. These are the Seychelles, China, Australia, La Réunion (France), South Africa, Mauritius, Thailand, Portugal, Mayotte (France), the Maldives, Malaysia, India and the Philippines.
More recently, purse seine fishing has become responsible for a significant percentage of bigeye tuna catch in the Indian Ocean, especially in the juvenile age classes, in contrast with longline fisheries targeting adult fish (IOTC, 2015). These surface fisheries started operations in the 1980s, when the French purse seine fleet moved from the eastern Atlantic Ocean to the Indian Ocean (Allen, 2010; Majowski, 2007). They were joined by the Spanish and Japanese and then Thai, Seychelles and Korean purse seine fleets. Target species of purse seiners are skipjack (SKJ) and yellowfin (YFT) for the canning industry, but bigeye tuna (BET) were also caught in small proportions in the early period of exploitation, when purse seine vessels operated mainly in association with tuna schools (free swimming schools: FS). With the introduction of the fish aggregating device (FAD) fishing technique in the 1990s, the purse seine catch of juvenile bigeye tuna increased significantly, representing nearly half of total bigeye catch in recent years (Davies et al., 2014; Fonteneau et al., 2013; IOTC, 2015; Kaplan et al., 2014).
Intensive exploitation by longliners and increasing fishing mortality of juveniles in the last two decades by purse seiners fishing on FADs have reduced bigeye tuna stock in the Indian Ocean to a level close to its maximum sustainable yield (IOTC, 2015). However, the uncertainty in the stock assessment studies for this species is substantial and needs to be reduced by improving both data sets and models. Until now, most tuna stock assessment studies have used nominal catch aggregated at basin scale. New modeling approaches, however, require spatially disaggregating the fishing data either between a few large geographical regions (e.g., Multifan-CL; Hampton and Fournier, 2001) or at a spatial resolution of 1 to a few degrees (e.g., SEAPODYM: Lehodey et al., 2015; APECOSM-E: Dueri et al., 2012). These higher resolution data sets are also needed to investigate species habitats, allowing catch per unit of effort (CPUE) standardization and more generally the study of relationships between the species distribution and the variability of climate and environment. These studies require standardized data sets of historical catch data, allowing inter-comparisons of results.
The secretariat of the Indian Ocean Tuna Commission (IOTC) collects and publishes fishing data (catch, effort and size frequency of catch) for stock assessment analyses and estimations of fishing mortality. There are two data sets providing nominal and geo-referenced data. The nominal catch data set is the official annual catch declaration by each member country to the IOTC. It gives total annual catch by species and by fishing gear. However, there is no geo-referenced information on where the fish are caught. This information is partially provided in the second data set that gives subsamples of monthly geo-referenced catch and effort by fleet.
A key objective of the present study, by examining all available information, is to build a geo-referenced data set, i.e., with monthly catch spatially distributed, that matches the total (nominal) catch for fleets of fishing countries providing both geo-referenced and nominal catches, taking into account as far as possible the size selectivity of the fishing gears. This requires revising catch, effort and length frequency data of bigeye fishing in the Indian Ocean available from the public IOTC database (www.iotc.org), using a careful screening, standardization and validation approach. There are many problems with such long time series of data due to changes in fishing practices and data reporting. The data sets are constructed from various spatial resolutions ranging between 1° × 1° and 10° × 20°. Catch and effort data are derived from various types of fishing gears characterized by different fishing methods and target species. Consequently, the catchability, a key coefficient that links the catch to fishing effort and fish abundance, varies from one fishing mode to another. Over time, a variety of catch and effort units have been used that prevent long time series analyses. Finally, for studies requiring computing total fishing mortality, the geo-referenced catch data need to be raised to match the nominal catch.
Therefore, the objective is to provide a standardized data set (with a definition of the longline, purse seine and other-gears fisheries) to researchers from various disciplines who may not have the necessary expertise in fisheries sciences to interpret these data correctly. Several steps are described, including the homogenization of spatial resolutions, the standardization of catch unit, the raising of catch data to fit the total nominal catch, the standardization of effort unit, and, finally, the analysis of data time series and fisheries history to detect major changes in catchability.
Standardized data sets resulting from this study are provided in ASCII format on PANGAEA (doi:10.1594/PANGAEA.864154).

Data
Nominal and geo-referenced data (catch, effort and size frequency of catch) are freely available on the IOTC website (http://www.iotc.org/data/datasets). The catch and effort data were classified according to three groups of gear type: longline, purse seine, and other gears. Bigeye tuna data were extracted from each group and analyzed separately. A screening of the geo-referenced data set using a topographic mask led to excluding 6.87 % of longline, 1 % of purse seine and 0.97 % of other-gears data with position incorrectly located on land (i.e., the entire cell at a given resolution is on land).

Longline
The initial geo-referenced data set of longline bigeye tuna catch included five categories: longline targeting bigeye tuna (LL), longline targeting swordfish (ELL), fresh tuna longline (FLL), exploratory fishing longline (LLEX) and longline targeting shark (SLL). The last category was ignored since it contained only five bigeye catch observations. In the four remaining categories, 93 % of bigeye tuna catch was due to the LL category, including Japanese, Taiwanese, Korean, Seychelles, Chinese, Thai, Mauritian, Maldivian, and Philippine fleets. The ELL category contributed 4.4 % of longline bigeye catch data, from Australian, La Réunion, Seychelles, South African, Portuguese, Mayotte and Mauritian fleets. The FLL fleets of Taiwan, China and Malaysia caught the remaining 2.6 %, and LLEX contained only a few Indian longline data (0.05 % of total longline data).
The vast majority of longline data (92.68 %) were structured in 5° × 5° grid cells. The remaining data were at a resolution of 1

Purse seine
Geo-referenced purse seine fishing data consist of large (PS) and small purse seine (PSS). PS has a carrying capacity of about 1000-1500 t, while PSS has less than about 200-250 t (Joseph, 2003). The PS consists of geo-referenced data from fleets of Spain, France, the Seychelles, Japan, Mauritius, Thailand, Korea, the former Soviet Union, NEIPS and NEISU. NEIPS data are those collected by European scientists onboard non-European vessels, while NEISU data were collected by Russian scientists from purse seine vessels of Liberia, Belize and Panama. The small purse seiners data consisted only of Indonesian observations. Almost all (> 99.9 %) purse seine fishing data were at the resolution of 1° × 1°, and a very small number of data had a resolution of 5° × 5°. These data were subdivided between sets on free schools (FS), sets associated with artificial (FAD) or natural logs (LS), mixed strategy (MIX) and unknown sets (UNCL). There were 11 % of sets purely on free schools and 70.8 % associated with logs. A very small number of data (< 0.2 %) for small purse seiners were reported as unknown. The remainder (17.9 %) consisted of large purse seiners operating either on free schools or logs but reporting a single fishing effort without distinction of the fishing strategy. The purse seine fishery is dominated by Spanish and French fleets, which together provide 65.9, 65.5 and 59.5 % of FS, LS and MIX sets, respectively.
Catches of the purse seine data were uniformly expressed in total weight. The number of fishing hours (FHOURS) is the most used unit of fishing effort (87.3 %), followed by the number of fishing days (FDAYS, 7.6 %) and the number of days at sea (DAYS, 4.0 %). A very small number of records (1.1 %) used number of sets (SETS) or number of trips (TRIPS). The fishing effort unit can change for the same fleet over certain periods of time. The Spanish fleet reported effort in FDAYS until 1990 but in FHOURS after this year. Similarly, three periods occurred for the Japanese fleet with effort in days at sea (1989-1999), fishing days (2000-2010) and sets (2011-2014). The Thai fleet had only 2 years of data with 2006 in fishing days and 2009 in sets.

Other gears
The other fishing gears associated with bigeye tuna catch are coastal longline (LLCO) in the Maldives; gillnet (GILL) from the Taiwanese fleet; a combination of gillnet and longline (GL) used in Sri Lanka; hand line (HAND) and baitboat (BB) both used in the Maldives and Australia; troll line (TROL) from the Maldives, Australia and Indonesia; hand line and troll line (HATR) from La Réunion and Australia; and sport fishing (SPOR) in South Africa. Some records from Sri Lanka have unknown gear (UNCL). From these various categories, coastal longline and gillnet represented respectively 53.5 and 23.6 % of all records. Spatial resolutions used were either 1° × 1° (62.2 %) or 5° × 5° (37.8 %) grid cells, with the lower resolution used by Taiwanese GILL, La Réunion HATR, Sri Lankan GL, Sri Lankan UNCL, South African SPOR and Indonesian TROL.
Catches of this group were declared in total weight and the fishing effort was composed of various units: number of hooks (HOOKS), number of days with the net in the water (NETS), number of fishing days (FDAYS), number of trips (TRIPS), number of boats (BOATS) and number of days at sea (DAYS).

Length frequency
The IOTC also maintains a database of length frequency of catch collected onboard fishing vessels by observers or during landing operations. These data provide key information for population dynamics models, as well as for extrapolation of nominal catch to subsampled spatial distributions. The length-frequency catch data are first aggregated either monthly or quarterly to assist in the catch data standardization. For the final data set provided with this study, they are all aggregated on a quarterly basis. All bigeye tuna size samples were measured in centimeter fork length (FL). The original size data were distributed in 150 classes starting at 10 cm length with 2 cm intervals between each class. In this study the maximum size was limited to 200 cm, a limit used in most stock assessment studies (IOTC, 2015; Langley et al., 2013), since only very few larger fish are caught. These size frequencies of catch were associated with their corresponding fisheries.
Spatial resolution included both regular and irregular cells. Regular cells can be 1° × 1

Standardization of spatial resolution
The main spatial resolution used for geo-referenced catch and effort declaration is 5° × 5° for longline and 1° × 1° for both purse seine and other-gears data. These resolutions were selected as representative of these three types of fishery, and data that were not provided at these resolutions were converted to these respective reference spatial resolutions, either by aggregating catch and effort when resolution was higher or, conversely, by dividing the catch and effort equally in the case of the original lower resolution. All longitude and latitude references were adjusted to the center of each cell.
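The two conversions described above can be sketched as follows. This is a minimal illustration, not the processing code used for the data set: the function names, the record layout (longitude, latitude, catch, effort) and the restriction to square cells are assumptions made for the example, and irregular cells such as 10° × 20° would need separate handling.

```python
import math
from collections import defaultdict

def to_cell_center(lon, lat, res):
    """Return the center of the res-degree cell that contains (lon, lat)."""
    return (math.floor(lon / res) * res + res / 2.0,
            math.floor(lat / res) * res + res / 2.0)

def aggregate(records, res):
    """Sum catch and effort of finer-resolution records into res-degree cells.

    records: iterable of (lon, lat, catch, effort) tuples (hypothetical layout).
    """
    cells = defaultdict(lambda: [0.0, 0.0])
    for lon, lat, catch, effort in records:
        key = to_cell_center(lon, lat, res)
        cells[key][0] += catch
        cells[key][1] += effort
    return {key: tuple(total) for key, total in cells.items()}

def split_equally(lon_c, lat_c, catch, effort, src_res, dst_res):
    """Divide a coarse square cell centered at (lon_c, lat_c) equally
    among the dst_res-degree sub-cells it contains."""
    n = int(round(src_res / dst_res))
    lon0 = lon_c - src_res / 2.0   # western edge of the coarse cell
    lat0 = lat_c - src_res / 2.0   # southern edge of the coarse cell
    share = 1.0 / (n * n)
    return [(lon0 + (i + 0.5) * dst_res, lat0 + (j + 0.5) * dst_res,
             catch * share, effort * share)
            for i in range(n) for j in range(n)]
```

Splitting a 5° cell into 1° cells and aggregating the result back to 5° returns the original totals, which is a convenient consistency check for either direction.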

Conversion of longline catch unit
Length-frequency data were used to convert catch declared in numbers of individuals into catch in weight. The number to weight conversion is based on the length-weight relationship $w = aL^b$, with w = weight (kg), $a = 3.661 \times 10^{-5}$, L = fork length (cm), and b = 2.901 (Nakamura and Uchiyama, 1966). The Japanese and Taiwanese longline length-frequency data were averaged to construct annual and single weight conversion factors on eight regions (see below). When temporal occurrence of catch and annual weight factors did not exist, the catches were converted using a single weight factor.
This individual to weight conversion concerned the portion (30 %) of geo-referenced catch data of La Réunion ELL fleet that were declared in number of individual fish. It was also used for the geo-referenced catch data expressed in number of individual tuna in the Japanese LL (100 %) and Korean LL (75 %) fleets, before being raised to the nominal catch level (see below). However, given the importance of these two longline fleets, fisheries catches in number of individuals are also provided unchanged.
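As an illustration of this conversion, the sketch below applies the length-weight relationship to a length-frequency sample to obtain a mean individual weight, then converts a catch in number of fish to tonnes. The coefficients are those quoted above; the function names, the dictionary layout and the choice to average weights over the sample (rather than taking the weight at the mean length) are assumptions of the example.

```python
A = 3.661e-5   # coefficient a (Nakamura and Uchiyama, 1966), weight in kg
B = 2.901      # exponent b, fork length in cm

def weight_kg(fork_length_cm):
    """Individual weight (kg) from fork length (cm): w = a * L**b."""
    return A * fork_length_cm ** B

def mean_weight_kg(length_freq):
    """Mean individual weight of a length-frequency sample.

    length_freq: dict mapping fork-length class (cm) to number of fish
    (hypothetical layout).
    """
    n_fish = sum(length_freq.values())
    if n_fish == 0:
        raise ValueError("empty length-frequency sample")
    total = sum(weight_kg(length) * count
                for length, count in length_freq.items())
    return total / n_fish

def catch_in_tonnes(n_fish, length_freq):
    """Convert a catch in number of individuals to tonnes."""
    return n_fish * mean_weight_kg(length_freq) / 1000.0
```

With these coefficients a 100 cm fish weighs a little over 23 kg, so a record of 1000 fish of that size would correspond to roughly 23 t.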

Raising geo-referenced catch and effort data to nominal catch level
The IOTC geo-referenced data set provides a large subset of the total catch declared as nominal catch by each country and fleet. To compute total fishing mortality from geo-referenced fishing data, this catch needs to be raised to the level of nominal catch. This is a data processing step sometimes conducted directly by national fisheries statistics services before being provided to the regional fisheries management organizations (RFMOs; Fonteneau et al., 2013). When the difference between total annual nominal and geo-referenced catch was above 5 % for a given fleet, we used a raising factor I and added the product of I with the annual catch difference to the monthly catch of geo-referenced cell i, j. The factor I used to distribute the total annual catch differences was computed for each fleet and gear type using the following equation:

$$I_{i,j,m} = \frac{C_{i,j,m}}{\sum_{i,j,m} C_{i,j,m}},$$

where $C_{i,j,m}$ is the catch in the cell of indices i, j of a given month m, and the sum runs over all cells and months of the year.
The same approach and factor were used to raise the fishing effort associated with the catch $C_{i,j,m}$.
Unlike in geo-referenced data, the nominal catch data for purse seiners did not discriminate between type of sets (i.e., FS or LS). To maintain this key information in the geo-referenced data set, the difference between total annual nominal and geo-referenced catch was divided in proportion to the share of each set type available in the geo-referenced data set.
For the Japanese and Korean longline catch data expressed in number of fish (see below), we provide both the original catch data in number of individuals and the catch converted to weight and raised to nominal catch level (Sect. 2.3).
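The raising step can be sketched as below for one fleet and one year. The sketch assumes that the factor I of a record is its share of the annual geo-referenced catch, that the 5 % threshold is measured relative to the nominal catch, and that effort is scaled by the same ratio as catch so that CPUE is preserved; these are interpretations made for the example, and the function name and data layout are hypothetical.

```python
def raise_to_nominal(geo_catch, geo_effort, nominal_total, tolerance=0.05):
    """Raise geo-referenced catch and effort to the nominal catch level.

    geo_catch, geo_effort: dicts keyed by (i, j, month) for one fleet and
    one year (hypothetical layout). Returns two new dicts.
    """
    geo_total = sum(geo_catch.values())
    if geo_total <= 0:
        raise ValueError("no geo-referenced catch to raise")
    diff = nominal_total - geo_total
    # Leave the data unchanged when the gap is within the tolerance.
    if abs(diff) <= tolerance * nominal_total:
        return dict(geo_catch), dict(geo_effort)
    scale = nominal_total / geo_total
    raised_catch, raised_effort = {}, {}
    for key, catch in geo_catch.items():
        share = catch / geo_total              # assumed raising factor I
        raised_catch[key] = catch + share * diff
        # Assumption: effort is scaled by the same ratio, keeping CPUE fixed.
        raised_effort[key] = geo_effort[key] * scale
    return raised_catch, raised_effort
```

Adding a record's share of the difference is equivalent to multiplying its catch by the ratio of nominal to geo-referenced totals, so the raised catches sum to the nominal catch by construction.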

Detection and correction of outliers
An outlier screening based on the Hampel identifier method (Pearson, 2011) and using catch per unit of effort (CPUE) was conducted. This process was conducted for each sub-data set characterized by the same gear, flag, and catch and effort units. Outliers were defined on the basis of a threshold value t. A CPUE $x_k$ is defined as an outlier if

$$|x_k - x^*| > t\,S,$$

where $x^* = \mathrm{median}\{x_k\}$ and S is the scale estimate from the median absolute deviation from median (MADM):

$$S = 1.4826\,\mathrm{median}\{|x_k - x^*|\}.$$

The threshold value was adjusted for each sub-fleet to avoid excessive removal, in practice no more than ∼ 5 % of each sub-fleet data set. Following the robust procedure proposed by Davies and Gather (1993), this method was used within a loop until no outliers remained in the data set. For CPUE records detected as outliers, the effort was corrected relative to the mean local CPUE of the neighboring non-outlier observations, with the condition that they occurred in the same month within a defined maximum radius. An iterative algorithm allowed for selection of the first two adjacent non-outlier CPUE values to compute the local mean CPUE.
When the neighboring observations were not available within the defined radius, the outlier record was moved to a separate fishery where only catch values are retained. This approach was chosen to avoid a loss of information on the total catch. It is preferable to modify the effort because its value does not directly influence the stock variation (Maunder and Punt, 2004; Maunder et al., 2006).
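The screening loop and the effort correction can be sketched as follows. The sketch uses the standard Hampel identifier (median and MADM scaled by 1.4826) and repeats it on the remaining records until nothing more is flagged; the function names are hypothetical, and the search for neighboring records in space and time is left out, so the neighbor CPUE values are simply passed in.

```python
import statistics

def hampel_outliers(cpue, t=3.0):
    """Return the indices flagged by an iterated Hampel identifier."""
    active = list(range(len(cpue)))
    flagged = set()
    while len(active) > 2:
        values = [cpue[k] for k in active]
        med = statistics.median(values)
        # MADM scale estimate; 1.4826 makes it consistent with the
        # standard deviation for normally distributed data.
        scale = 1.4826 * statistics.median([abs(v - med) for v in values])
        if scale == 0:
            break
        new = [k for k in active if abs(cpue[k] - med) > t * scale]
        if not new:
            break
        flagged.update(new)
        active = [k for k in active if k not in flagged]
    return flagged

def corrected_effort(catch, neighbor_cpues):
    """Effort that gives an outlier record the mean CPUE of its neighbors,
    leaving the catch unchanged."""
    local_cpue = statistics.mean(neighbor_cpues)
    return catch / local_cpue
```

Only the effort is rewritten, so the total catch of the data set is the same before and after the correction.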

Standardization of effort units
When possible, fishing effort units were converted to the reference units, i.e., number of hooks for longline and number of fishing hours for purse seine.This was possible when the different units used by a fleet also included the reference unit.
In that case, the conversion was based on the ratio calculated from mean CPUE of reference and targeted period. When there was no reference unit, the reference was obtained from another fleet with similar characteristics (i.e., similar fishing gear and tuna target). As for the conversion of catch units, when spatiotemporal occurrences of effort in both units did not exist at the original resolution, the conversion was performed by testing decreasing resolution and eventually by using a monthly climatological value.
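One way to read the CPUE-ratio conversion is sketched below: effort reported in another unit is rescaled so that, at the mean CPUE of the reference period, it would produce the same catch. The function names and the use of a mean of record-level CPUE values are assumptions of the example; the exact averaging used for the data set is not specified here.

```python
from statistics import mean

def mean_cpue(catches, efforts):
    """Mean of record-level CPUE (catch / effort), skipping zero effort."""
    return mean([c / e for c, e in zip(catches, efforts) if e > 0])

def to_reference_effort(effort_other, cpue_other, cpue_ref):
    """Express effort in reference units.

    cpue_other: mean CPUE of the period reported in the other unit.
    cpue_ref:   mean CPUE of the period reported in the reference unit.
    The returned effort yields the same catch at the reference CPUE.
    """
    return effort_other * cpue_other / cpue_ref
```

For example, with a hypothetical mean CPUE of 2.4 t per fishing day and 0.2 t per fishing hour, 10 fishing days would be converted to about 120 fishing hours.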

Time series analysis to detect major changes in Japanese longline fishery
Over the historical industrial fishing period since the 1950s, changes in tuna fishing technologies and tuna market demand have significantly modified the fishing strategy of longline fleets. The introduction of monofilament for the mainline, allowing deeper longline sets (Okamoto and Shono, 2006; Okamoto et al., 2001); the installation of super-cold freezers for fish storage (Haward and Bergin, 2001; Matsumoto et al., 2013; Okamoto and Shono, 2006; Ward and Hindmarsh, 2007); and increasing market demand for sashimi (Miyake et al., 2004; Sakagawa et al., 1987) have led to stronger targeting of bigeye tuna. These changes particularly affected the Japanese longline fleet, which has the longest periods of exploitation and the largest market demand (Haward and Bergin, 2000; Lee et al., 2005; Yeh and Chang, 2013). Consequently, the catchability of the fishing gears and thus the CPUE were modified over time (Fonteneau et al., 2000; Maunder et al., 2006). Therefore, spatiotemporal variability in the Japanese longline CPUE time series was analyzed using a spatial stratification into eight large regions as proposed for stock assessment of Indian Ocean bigeye tuna (Kolody et al., 2010) and CPUE standardization (Matsumoto et al., 2015) studies, but with extended north and south boundaries to include all longline data.
Abrupt changes in temporal trends of CPUE were sought using the breaks for additive seasonal and trend (BFAST) method, which is widely applied for detection of long-term changes (Forkel et al., 2013; de Jong et al., 2012; Lambert et al., 2013; Verbesselt et al., 2010a, b, 2015; Watts and Laffan, 2014). BFAST decomposes a time series ($Y_t$) into the sum of its seasonal ($S_t$), trend ($T_t$) and residual ($e_t$) components. A break is defined when the slopes in the trends of adjacent periods are significantly different (de Jong et al., 2012). The BFAST method requires defining one parameter, either the minimum duration of the time series before a potential break or the maximum number of breakpoints allowed to be detected within the time series (de Jong et al., 2012). Both approaches were tested in this study. Since BFAST cannot accommodate missing values within time series data, the values were replaced by monthly climatological (monthly average) CPUEs when only a few of them were missing; otherwise, the time series was cut.
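The gap-filling step that precedes the break detection can be sketched as below. This illustrates only the replacement of missing monthly CPUE values by the climatological mean of the same calendar month; it does not implement BFAST itself, and the function names and the use of None for missing values are assumptions of the example.

```python
def monthly_climatology(months, values):
    """Mean CPUE per calendar month, ignoring missing (None) values."""
    sums, counts = {}, {}
    for month, value in zip(months, values):
        if value is not None:
            sums[month] = sums.get(month, 0.0) + value
            counts[month] = counts.get(month, 0) + 1
    return {month: sums[month] / counts[month] for month in sums}

def fill_gaps(months, values):
    """Replace missing values by the climatological mean of their month.

    A value stays None when its calendar month was never observed.
    """
    clim = monthly_climatology(months, values)
    return [clim.get(month) if value is None else value
            for month, value in zip(months, values)]
```

A series that still contains None after this step has gaps that the climatology cannot fill, which corresponds to the case where the time series has to be cut.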

Results
Based on the nominal catch data, the majority of bigeye tuna landings in the Indian Ocean are provided by industrial longline (74.2 %), followed by purse seine (18.5 %) and other gears (7.3 %). The Taiwanese, Japanese and Korean longline fleets together captured 68 % of longline catch. The Japanese fleet started to capture bigeye tuna in 1952, followed by the Taiwanese in 1954 and the Korean in 1965. The catch was largely due to the Japanese fleet from 1952 to the mid-1970s. Then, Korea until the mid-1980s and Taiwan became two other major players in the longline fishery (Fig. 1). Finally, the Indonesian fresh longline and the NEI.FROZEN longline fleets respectively contributed 11 and 5.7 % of longline catch, but their geo-referenced catch data are unavailable. The nominal catches of the latter fleet were estimated by the IOTC secretariat from various non-reporting longline flags, including Honduras, Belize, Equatorial Guinea and Panama (Fig. 1).
Nominal catch by industrial purse seiners was dominated by two European fleets: the Spanish (33.7 % of purse seine catch) and the French (25.2 %). From the early 1980s to the mid-1980s, the French fleet dominated this fishery, and then until the mid-1990s the catches from both fleets were at a similar level. Since then, the Spanish fleet has contributed the largest annual bigeye tuna catch for this fishing gear (Fig. 1). The Indonesian small purse seine, the Seychelles and the NEIPS (see Sect. 2.12) have respectively contributed 13.8, 9, and 7.2 % of purse seine catch (Fig. 1). Unfortunately, nominal catches of the Indonesian fleet were not accompanied by geo-referenced data sets.
After catch conversion from number of individuals to weight, the differences with nominal catch are relatively small (11 %) for the Japanese LL and much higher for the Korean LL (40.7 %), especially before the mid-1980s and with a remaining data gap in the geo-referenced data set during 1988-1991. For La Réunion ELL, there is a large difference only during a few years (2005-2008) (Fig. 2).
Over the whole period of exploitation, the longline bigeye catch distribution has covered all of the Indian Ocean basin up to 50° S but with the maximum catch coming from the tropical region 10° N-15° S (Fig. 3). For the purse seine fishery the catch was also concentrated in the tropical region, but more particularly in the western Indian Ocean (Fig. 4). The other-gears group had activities concentrated in the central and southern Indian Ocean. Coastal longline and unknown gears captured bigeye tuna in the central Indian Ocean, and they together contributed 64.3 % of other-gears catch. Bigeye tuna catches from gillnet (32.6 % of other-gears catch) are distributed in the southern Indian Ocean (Fig. 4).

Raising of geo-referenced fishing data to nominal catch level
Geo-referenced fishing data from 14 longline fleets, 5 purse seine fleets and 4 other gear fleets required raising to the nominal catch level (Table 1). Unfortunately, the various fleets that do not provide any geo-referenced information cannot be processed here to provide spatially explicit distributions of catch. These fleets represent 33 % of the total nominal catch over the whole historical fishing period (Table 1) with the biggest catch contribution from Indonesian FLL (30 %), NEI.FROZEN LL (12.8 %), Indonesian LLCO (8.4 %), Indonesian PSS (7.7 %), NEI FRESH FLL (5.4 %) and NEI Indonesian FLL (5.2 %) (Fig. 5). These data would require special treatment according to the type of study (see Sect. 5).

Earth Syst. Sci. Data, 9, 163-179, 2017, www.earth-syst-sci-data.net/9/163/2017/

Longline
A total of 2571 outliers were detected from the longline data set. Using a threshold classically fixed to a value of 3, the Japanese LL fleet (31 963 records) and the Taiwanese LL fleet (24 918 records) contributed two-thirds of this total with respectively 1218 and 491 outliers (Table S1 in Supplement). Fishing effort of 2359 outliers (i.e., 91.7 % of the total) was corrected using a maximum radius of 15° from the position of the outliers. For 53.4 and 47.8 % of the corrected outliers in the fleets using respectively number of individuals or total weight as catch units, a radius of 5° was sufficient to estimate the mean CPUE from neighboring points. However, for a few outliers, it was not possible to correct the fishing effort and thus the catch was simply moved to the outlier longline data file for which there is low confidence. This was the case, for example, for two outliers in the Japanese LL fleet detected in the southwestern Indian Ocean during summer of 1974 and 1980. These outliers had extremely high bigeye catches relative to a small fishing effort, leading to a factor of > 100 compared with the mean CPUE of neighboring records of the same month. The impact of the correction of fishing effort on detected outliers can be illustrated with the distribution of variance of the CPUE (Fig. 6).

[Table column headers: Fleet; Time period; Total nominal catch (tonnes); Difference with geo-referenced catch (%). Table body not recovered.]

[...] changes in the CPUE time series, with the largest occurring in 1992. The mean annual CPUE in 2012 is the highest of the whole 47-year time series before and after correction (Fig. 7).

Purse seine
There were 3472 outliers detected in the purse seine data set. For the largest fleets, i.e., the Spanish PS-LS-FHOURS (17 586 data) and the French PS-LS-FHOURS (14 118 data), the outlier threshold value was set to 5 to avoid having too selective a criterion since variability in purse seine fishing CPUE can be much higher than with longline. With this threshold 5.7 and 5 % of the records of the Spanish and French fleets, respectively, were detected as outliers (Table S1). Fishing efforts of these outliers were corrected with the mean CPUE of neighboring records in a maximum radius (r) of 5° (the resolution for purse seine data being 1°). For the associated log data set (LS), 94.5 % of detected outliers were corrected: 31.8 % using the mean CPUE within a radius of 1°, 36.9 % with r = 2°, 20.2 % with r = 3°, 6.9 % with r = 4°, and 4.1 % with r = 5°. Using also a maximum radius of 5°, it was possible to correct 74 % of the total associated mixed strategy (MIX) fishery's outliers, 59.6 % of the total associated free schools' (FS) outliers, and 83.3 % of uncategorized outliers (UNCL). The non-corrected outliers (438 records) were kept in a separate fishery file ("Purse seine outliers"). The impact of this outlier screening and correction on time series CPUE is shown for Spanish and French log-associated purse seine fleets in Fig. 7. Unlike with longline data, the effect was more uniformly distributed over time. Despite the correction concerning only 5 % of the data, the change was also stronger than in the case of longline data. For the French fleet, however, the difference with the corrected series decreased in the early 2000s and remained in its smallest range of deviation after 2005. For this fleet, the correction reduced variances of CPUE, particularly in the eastern Indian Ocean (Fig. 8).

Other gears
A total of 564 outliers were detected from the other-gears group, from which 346 (61.3 %) were corrected using neighboring points within a maximum radius of 5° (78.6 % within a radius of 3°). Non-corrected outliers (219 records) were kept in a separate data file ("Other-gears outliers").

Effort unit
Longline fishing efforts were converted to number of hooks from fishing days for the Portuguese ELL (2006, 2007, 2013, 2014), Mayotte ELL, Philippine LL, and Thai LL (2012, 2013), as well as from the number of sets for the Thai LL (2007, 2008). These efforts were converted using the mean CPUE ratio for the period available with the reference unit. For the Mayotte ELL and the Philippine LL, which only provided 1 year of fishing-days data, the ratio was respectively calculated from the Portuguese ELL and the Thai LL number of hooks fishery.
Three Spanish and two Japanese purse seine fleets required effort standardization. The Spanish efforts were standardized to number of fishing hours. It can be checked that converted efforts of the LS and the FS sub-data sets fall within the ranges of the reference efforts (Table S2). The efforts of Japanese and Thai LS were standardized to number of fishing days. For the other-gears group, efforts of the Maldivian coastal longline and the La Réunion hand line and troll line were standardized to number of hooks.

Detected breaks in Japanese longline fishery
From the eight regions (see Fig. 3) defined to investigate historical changes in fishing practices of the Japanese longline fishery, region VII was excluded from the BFAST analysis because of too many missing values during the time periods 1955-1961, 1972-1990, and 2007-2014. The monthly CPUE time series for the seven remaining regions varies between 605 months (∼ 50 years) and 746 months (∼ 62 years). The fishery in region VIII has the longest series, from November 1952 to December 2014 (Table 2).
For the BFAST parameterization, we tested a minimum duration of time series between 10 and 25 years or a maximum number of breakpoints between one and four. There was no change in the BFAST results for a value above four. The period of 10 years was selected because we sought a break corresponding to long-term change in fishing strategy, while the 25-year period corresponds to the maximum length that can be selected to detect at least one break. Detected breaks were considered very robust when they were detected in at least 75 % of the tests carried out with the two parameterization approaches. Based on these thresholds, two very robust time breaks were detected in the northwestern (region I) and eastern tropical (region V) Indian Ocean (Table 2 and Fig. 3).

Length frequency data
The available length-frequency data coincide with the definition of 18 fisheries. These are L1-L10, L12, S13, S15, S16, S19, S20, O22, and O29. For the longline fisheries, the highest number of size data is from the tropical Taiwanese LL (Fishery L5), with 9583 samples over 1980-2014, and the large fish (mean fork length > 140 cm) were measured in the western Indian Ocean (Fig. 10). For the purse seine fisheries, the largest number of samples, 11 433 over 1984-2014, comes from the log-associated fishery (S13). Mean fork lengths of catch higher than 52 cm in this fishery are distributed over the central and eastern Indian Ocean (Fig. 10).

Data availability
The geo-referenced bigeye tuna catches and fishing efforts, along with the associated compilation of length-frequency data (Wibawa et al., 2016), resulting from the standardization procedures described in this study are archived in the freely accessible PANGAEA repository (doi:10.1594/PANGAEA.864154).

Discussion
Most stock assessment studies of bigeye tuna conducted by the Indian Ocean Tuna Commission (IOTC) have been based on nominal fishing data aggregated either over the whole oceanic basin or a few large areas and geo-referenced data of the few main fisheries used to provide relative abundance indices (e.g., Kolody et al., 2010; Langley et al., 2013; Matsumoto et al., 2015; Yeh and Chang, 2015). The comprehensive geo-referenced fishing data set prepared here makes it possible to envisage future stock assessment studies accounting for more detailed spatial structures, which is a key issue for highly migratory species like tunas (Maunder et al., 2014; Punt et al., 2014; Sharma et al., 2014; Lehodey et al., 2014). The first key objective in building this data set was to raise the available subsampled geo-referenced catch and effort data to the nominal level to account for all fishing mortality of the fleets in spatially explicit stock assessment studies. This has been achieved through a careful extrapolation by crossing three data sets containing total aggregated (nominal) catch, subsampled geo-referenced catch and effort, and length frequencies of catch. The other objectives were to standardize the different units to avoid a multiplication of fisheries and to screen the data robustly to remove conspicuous errors. Obviously, these data and their treatment here remain subject to several sources of uncertainty that are discussed below.
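The idea of raising geo-referenced catch to the nominal level can be sketched as follows. This is a simplified illustration that assumes a proportional raising factor per fleet and year (nominal catch divided by the sum of geo-referenced catch); the record layout and the function name are hypothetical, and the actual procedure used in the study may differ in its stratification.

```python
def raise_to_nominal(georef, nominal):
    """Scale geo-referenced catches so each (fleet, year) total matches nominal.

    georef:  list of dicts with keys "fleet", "year", "cell", "catch" (tonnes)
    nominal: dict mapping (fleet, year) -> nominal catch in tonnes
    """
    # Sum the geo-referenced catch per fleet and year.
    totals = {}
    for rec in georef:
        key = (rec["fleet"], rec["year"])
        totals[key] = totals.get(key, 0.0) + rec["catch"]

    raised = []
    for rec in georef:
        key = (rec["fleet"], rec["year"])
        # Records stay unchanged when no nominal value is available
        # or when the geo-referenced total is zero.
        if key in nominal and totals[key] > 0:
            factor = nominal[key] / totals[key]
        else:
            factor = 1.0
        raised.append({**rec, "catch": rec["catch"] * factor,
                       "raising_factor": factor})
    return raised


# Hypothetical fleet "X": 40 t geo-referenced in two cells, 80 t nominal.
sample = [
    {"fleet": "X", "year": 2000, "cell": "A", "catch": 30.0},
    {"fleet": "X", "year": 2000, "cell": "B", "catch": 10.0},
]
raised_sample = raise_to_nominal(sample, {("X", 2000): 80.0})
```

The spatial pattern of the geo-referenced subsample is preserved, since every cell of a given fleet and year is multiplied by the same factor.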
Despite our efforts in this study to process all available data, a substantial amount of declared catch has no geo-referenced information at all. These catches would therefore need to be processed according to the type of study and use. For instance, in stock assessment studies based on a few large areas, these catch data are recorded within the area of the countries concerned (Langley, 2016). At higher spatial resolution, a more detailed analysis should be conducted to allocate catch to coastal, exclusive economic zone or offshore fishing grounds.

Catch of the Asian longline and European purse seine fleets
The detailed and careful analysis of IOTC bigeye fishing data has shown several inconsistencies that we tried to resolve in the best possible way. Given the importance of Asian longline fleets in the history of bigeye tuna exploitation and the extensive series of geo-referenced catch subsamples declared in number of individual fish for Japanese and Korean longline fleets, we decided to provide both the original data in numbers and a conversion to weight raised to the nominal catch level. Unfortunately, length-frequency data did not cover the whole period of the Japanese and Korean catch, leading to the application of a less accurate single weight conversion for some periods, e.g., 1952-1964 for the Japanese LL. Nevertheless, before submitting its national annual fishing statistics to the IOTC, the Japanese National Research Institute of Far Seas Fisheries (NRIFSF) applies a raising procedure to provide geo-referenced data consistent with the nominal catch declaration (Matsumoto et al., 2013). Therefore, its geo-referenced number of individual tuna should be consistent with declared nominal catch. The Korean National Fisheries Research and Development Institute (NFRDI) has aggregated catch from fishermen's logbooks into monthly 5° × 5° cells (Lee et al., 2014), but it is unclear whether these catches were raised to nominal catch. As reported by Chassot et al. (2015), a raising procedure is also conducted by fishery scientists involved in the IOTC statistical working group to match geo-referenced purse seine catch data to the level of nominal catch. This is confirmed by the good match between geo-referenced and nominal catch data that we obtained for the purse seine fisheries. This is not the case, however, for the small European longline fleets (La Réunion, Mayotte and Portugal), for which a raising procedure was applied in this study.
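The conversion of catch in numbers to catch in weight can be sketched as below. This is an illustration only: it assumes a length-weight relationship of the form W = a L^b applied to a length-frequency sample, with a single fallback mean weight when no sample is available. The coefficients `a` and `b`, the fallback weight, and the function names are placeholder assumptions, not values used in the study.

```python
def mean_weight_kg(length_counts, a=2.0e-5, b=3.0):
    """Mean individual weight (kg) from a length-frequency sample.

    length_counts: dict mapping fork length (cm) -> number of fish measured
    a, b: placeholder coefficients of W = a * L**b (illustrative values only)
    """
    n = sum(length_counts.values())
    if n == 0:
        return None
    total = sum(count * a * length ** b
                for length, count in length_counts.items())
    return total / n


def numbers_to_tonnes(n_fish, length_counts=None, fallback_kg=40.0):
    """Convert a catch in numbers to tonnes.

    Uses the sample-based mean weight when a length-frequency sample exists,
    otherwise a single fallback mean weight (hypothetical value).
    """
    weight = mean_weight_kg(length_counts) if length_counts else None
    if weight is None:
        weight = fallback_kg
    return n_fish * weight / 1000.0


# Hypothetical stratum: 1000 fish, four fish measured (three at 100 cm, one at 120 cm).
with_sample = numbers_to_tonnes(1000, {100: 3, 120: 1})
without_sample = numbers_to_tonnes(1000)
```

The fallback branch corresponds to the less accurate single weight conversion mentioned above, which ignores changes in the size composition of the catch.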

Data screening of large longline and purse seine fleets
There are various potential sources of mistakes along the chain of fishing data reporting and different approaches to check and screen these data. For instance, Japanese fishery scientists check the effort data of the longline logbooks and remove those with less than 200 or more than 5000 hooks (Hoyle et al., 2015). In this study, we employed a robust outlier filtering method (Hampel identifier method) based on CPUE to detect anomalous data. Then, instead of removing the catch and effort observations of outliers, the fishing effort value was corrected relative to the nearest-neighbor CPUE values in order to avoid, as far as possible, an underestimation of the catch, which is key information for fishing mortality estimates. When it was impossible to correct the fishing effort in the absence of neighboring values, the catch observation was retained in a special fishery (outliers) file, which keeps track of all declared catches. Among the largest anomalies detected with this filtering method, there is a high peak of CPUE in 2003 for the Taiwanese longline fleet that has already been identified in previous analyses and is potentially linked to misreporting of logbook data that occurred among the Taiwanese fleets operating in the Pacific, Atlantic and Indian oceans (see Hoyle et al., 2015). Unusually high CPUEs were detected in 1999 in the Spanish and French log-associated purse seine sets. Since this was observed in both fleets, it is likely that this particular year was indeed highly favorable. Despite a threshold value set to 5 in the Hampel identifier method for these purse seine data, a substantial number of effort data were classified as outliers and corrected. It is possible that these high peaks in CPUE variability reflect some heterogeneity in the fleets, e.g., due to the few super-seiners (> 2000 gross tonnage) and super-super-seiners (> 3500 gross tonnage) used by the Spanish and French fleets (Lopez et al., 2014; Davies et al., 2014). Nevertheless, for fishing data analyses, and particularly stock assessment studies, it seems more appropriate to adjust the fishing effort relative to neighboring CPUE of the same fleet (for the same month) while keeping the catch unchanged.
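The screening logic can be sketched in a few lines. This is a minimal illustration of a Hampel identifier (median and scaled median absolute deviation) applied to CPUE, followed by an effort correction that leaves the catch unchanged. It assumes that all records passed to the function belong to the same fleet and month, that efforts are strictly positive, and that the reference CPUE is the median of the non-outlier values; the definition of neighboring values used in the study may differ. Function names and example values are hypothetical.

```python
from statistics import median


def hampel_flags(values, threshold=3.0):
    """Flag values farther than `threshold` scaled MADs from the median."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    if scale == 0:
        return [False] * len(values)
    return [abs(v - med) > threshold * scale for v in values]


def correct_efforts(records, threshold=3.0):
    """Replace the effort of CPUE outliers so that CPUE matches its neighbors.

    records: list of dicts with "catch" and "effort" (effort > 0 is assumed).
    The catch is never modified; only the effort of flagged records changes.
    """
    cpue = [r["catch"] / r["effort"] for r in records]
    flags = hampel_flags(cpue, threshold)
    reference = [c for c, flagged in zip(cpue, flags) if not flagged]
    if not reference:
        return [{**r, "corrected": False} for r in records]
    ref_cpue = median(reference)

    out = []
    for rec, flagged in zip(records, flags):
        if flagged:
            out.append({**rec, "effort": rec["catch"] / ref_cpue,
                        "corrected": True})
        else:
            out.append({**rec, "corrected": False})
    return out


# Hypothetical stratum: the last record has a CPUE about ten times the others.
stratum = [
    {"catch": 1000.0, "effort": 1000.0},
    {"catch": 1200.0, "effort": 1000.0},
    {"catch": 900.0, "effort": 1000.0},
    {"catch": 1100.0, "effort": 1000.0},
    {"catch": 10000.0, "effort": 1000.0},
]
corrected = correct_efforts(stratum, threshold=5.0)
```

Correcting the effort rather than dropping the record keeps the declared catch in the data set, which matters when the catch is later used to estimate fishing mortality.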

Fishing effort of purse seine and change in catchability on the Japanese longline
While the number of hooks seems a reliable measure of fishing effort for passive fishing gears like longlines, it is much more difficult to define a consistent fishing effort unit for purse seiners. When fishing days are used, the time spent searching for tuna schools can be highly variable depending on the skills of the skipper, the technology used, the engine power, and the communications between boats. When only fishing hours are used, the effort unit is supposed to be independent of such variability, though there is still some uncertainty about what is included in this time of fishing activity. The effort of French purse seiners in the geo-referenced IOTC data set had already been standardized entirely to number of fishing hours through re-processing of data for the period 1981-1990, when efforts were not declared in this unit (Chassot et al., 2013). For the Spanish fleets we similarly converted the effort to number of fishing hours for the period 1984-1990 to obtain a homogeneous series based on the same unit. The comparison of both series in fishing hours showed that the French fleet had a lower annual total effort than the Spanish fleet except at the beginning of the fishery between 1981, 1989 and 1990. This is consistent with the number of purse seine vessels of both fleets operating during these years, ranging from 21 to 26 and 12 to 21 for French and Spanish fleets, respectively, until the mid-1980s but increasing to 26 for the Spanish fleet during 1989-1990 (Pianet et al., 2008).
Over long historical periods, a fishery is potentially subject to strong changes due to exploitation, market and technological evolutions. In addition to potentially modifying the measure of fishing effort, these changes can also alter the catchability for a given species. The Japanese longline fleet has been the most important bigeye tuna fishery in the Indian Ocean. It has provided the longest time series, starting in the early 1950s, which has a major influence on all stock assessment studies. Important changes have been documented for this fleet. Until the mid-1950s, the fleet was still limited to the eastern Indian Ocean (south of Java). Thereafter the fishing ground expanded into the central and western tropical Indian Ocean (Mohri and Nishida, 1999). In the 1970s and 1980s, with an increasing market demand for sashimi, the introduction of monofilament allowed the line to be set at greater depths to target bigeye tuna. This produced major changes in the catchability of this species (Campbell et al., 2001; Okamoto et al., 2001; Ward and Hindmarsh, 2007; Hoyle et al., 2015).
The detailed analysis of CPUE by geographical strata conducted in the present study allowed the timing of this change to be identified. Consistent breakpoints were identified in the northwestern and eastern tropical Indian Ocean. The break is detected earlier in the eastern than in the western tropical regions. This is consistent with Okamoto et al. (2001), who reported that the use of deep tuna longline started south of Java and west of Sumatra around 1977 and then extended to the western equatorial Indian Ocean. Once the major breaks were identified by region, it was possible to aggregate the subsets into more homogeneous longline fisheries based on their average CPUE.
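Assigning monthly records to a historical period on the basis of region-specific breaks can be sketched as below. The break dates are those reported in the results for regions I, III and V; treating the break month as the last month of the first period follows the period definitions given there. The function name and the labels, including the one used for regions without a detected break, are hypothetical choices of this sketch.

```python
# Break dates (year, month) reported for the Japanese longline fishery.
BREAKS = {
    "I": (1980, 10),   # October 1980
    "III": (1977, 8),  # August 1977
    "V": (1977, 5),    # May 1977
}


def period_label(region, year, month):
    """Return the historical period of a monthly record for a given region."""
    brk = BREAKS.get(region)
    if brk is None:
        return "no_break"  # region without a detected break
    # Tuples compare year first, then month; the break month itself
    # belongs to the first period.
    return "first" if (year, month) <= brk else "second"


labels = [
    period_label("I", 1980, 10),
    period_label("I", 1980, 11),
    period_label("V", 1977, 6),
    period_label("VIII", 1960, 1),
]
```

Records labeled this way can then be grouped by period before computing average CPUE for the aggregated fisheries.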

Perspectives
Finally, a total of 30 fisheries were defined to cover the whole period of exploitation of bigeye tuna in the Indian Ocean since 1952, with their associated catch length frequency data.
Further disaggregation is certainly possible to obtain still more homogeneous fisheries data sets, but a balance is needed with a reasonable number of fisheries that can be handled in further studies of complex spatial fish dynamics. If necessary, the number of fisheries could be limited to the main fleets that extracted most of the bigeye tuna catch over the historical period of exploitation. For instance, the first 10 longline fisheries (L1-L10) together with the first 4 purse seine fisheries (S13-S16) represent 96 % of the total bigeye tuna geo-referenced catch in the Indian Ocean during 1952-2014 (Table 3). But in other contexts, even small domestic fisheries representing a very small portion of catch can provide useful information on the distribution of the species. Therefore, the data set proposed here is intended as a practical and useful geo-referenced representation of the historical distribution of bigeye catch over its modern history of exploitation. Some uncertainties have been described and need to be accounted for when using these data. The uncertainty of fishing mortality for certain fleets due to unreported geo-referenced catch should be addressed in future data sets. Catch monitoring in some countries has been slow to implement or is still nonexistent, especially for artisanal fleets that may nevertheless contribute a substantial catch due to a large number of small boats. This is likely the case for the artisanal Iranian and Pakistani driftnet fleets or the Sri Lankan gill net fleet (IOTC, 2015); for the purse seine fleet of Iran; and for distant-water longline fleets of India, Indonesia, Malaysia, and the Philippines. Finally, the strongest uncertainty is obviously for illegal catch. While there is no available estimate of illegal bigeye tuna catch for the whole Indian Ocean, it seems that illegal fishing increased in the eastern Indian Ocean during the 1980s-2000s, whereas it decreased in the western region (Agnew et al., 2009).
Hopefully, new communication technologies should facilitate the improvement of fishing data statistics and the control of illegal fishing. However, strong networks of observers and port samplers will continue to be required to monitor these fisheries and provide the most critical information for assessing their impact on the stocks.

Figure 2. Total annual converted weight catch (red bars) and total annual nominal catch (solid blue line).

Figure 3. Spatial distribution of bigeye tuna catch by longline fishing gears (total catch over 1952-2014). (a) Catch from the Japanese and Korean fleets expressed in number of individual tuna. (b) Catch from the remaining longline fleets expressed in tonnes.

Figure 5. (a) Nominal catch (lines) and raised geo-referenced catch (bars) of the fisheries described and provided in this study. (b) Remaining nominal catch declared to IOTC without geo-referenced information and not provided in this study. Codes: FLL, fresh tuna longline; NEI FRESH FLL, catch from non-reporting fresh tuna longline vessels; NEI Indonesia FLL, catch from non-reporting Indonesian fresh tuna longline vessels operating within its exclusive economic zone; NEI.FROZEN LL, catch from non-reporting longline vessels; PSS, small purse seine; LLCO, coastal longline.

Figure 6. Spatial variances in catch per unit effort (CPUE) computed from the Japanese longline fleet before (a) and after (b) correcting or eliminating outliers.

Figure 7. CPUE time series before (dashed red line) and after (solid blue line) correction of efforts of outliers for the Japanese and Taiwanese longline and for the Spanish and French purse seine fleets fishing on logs (LS).

Figure 8. Spatial variances in catch per unit effort (CPUE) computed from French log-associated purse seine before (a) and after (b) correcting or eliminating outliers. To enhance visibility, the variances are presented in 5° × 5° cells.
. 9). They occurred in October 1980 in region I and May 1977 in region V. A third breakpoint was detected in region III in August 1977. As a consequence, the Japanese longline fishing data of regions I, III, and V were merged into two historical periods. The first period includes data of the period March 1955-October 1980 in region I, data of November 1952-May 1977 in region V and data of January 1955-August 1977 in region III. The second period includes remaining Japanese longline fishing data corresponding to tropical (regions II, III, and IV) and subtropical (regions VI, VII, and VIII) fisheries (Table

Figure 9. Monthly CPUE time series, trend and detected break over the western (regions I and III) and eastern tropical Indian Ocean (region V).

Figure 10. Distributions of bigeye tuna size derived from catch sampling in (a) the tropical Taiwanese longline fishery (L5) and (b) the purse seine log-associated fishery (S13). The original data of size frequency from both fisheries are spatially distributed in 5° × 5° cells.
° × 1°. Several fleets (La Réunion ELL, Indian LLEX, Mauritian LL, Seychelles ELL, Thai LL and South African ELL) provided 5° × 5° data in certain years and 1° × 1° in others. The 20° × 10° resolution was used only by the Mayotte fleet. IOTC provides two types of bigeye tuna catch unit: total weight and numbers of individuals. Four categories can be differentiated: catch declaration only in numbers (Japanese LL: 41.7 %), both in total weight and numbers for the same period (Taiwanese LL and FLL: 34.8 %), total weight and numbers for different periods (Korean LL and La Réunion ELL: 13.2 %), or only in weight (all remaining fleets: 10.3 %). Effort units are expressed in number of hooks (99.3 % of data), number of fishing days (0.58 %) and number of sets (0.16 %). The Mayotte ELL and the Philippine LL used only fishing days. Thai LL reported in fishing days in 2013-2014 but also in number of sets in 2007-2008 and in number of hooks in 2011. Portuguese ELL declared effort in number of hooks in 2008, 2010 and 2011 but in number of fishing days in 2006-2007 and 2013-2014.
° × 1° (7.30 %) and 20° × 10°. The Maldives LL and Mauritius ELL provided all their data at resolution 1 E), the eastern Indian Ocean (F57; east of 80° E), the Indian Ocean northwest (IONW; west of 80° E and north of the Equator), the Indian Ocean northeast (IONE; east of 80° E and north of the Equator), the Indian Ocean southwest (IOSW; west of 80° E and south of the Equator), and the Indian Ocean southeast (IOSE; east of 80° E and south of the Equator).

Table 2. Results of the BFAST analysis of Japanese time series. Breakpoints with at least 50 % detections in both parameterization approaches are highlighted in bold.