
ICES WGMG REPORT 2008 ICES RESOURCE MANAGEMENT COMMITTEE

ICES CM 2008/RMC:03

REF: ACOM

Report of the Working Group on Methods of Fish Stock Assessments (WGMG)

7-16 October 2008

Woods Hole, USA

International Council for the Exploration of the Sea Conseil International pour l’Exploration de la Mer

H. C. Andersens Boulevard 44–46 DK‐1553 Copenhagen V Denmark Telephone (+45) 33 38 67 00 Telefax (+45) 33 93 42 15 www.ices.dk info@ices.dk

Recommended format for purposes of citation:

ICES. 2008. Report of the Working Group on Methods of Fish Stock Assessments (WGMG), 7‐16 October 2008, Woods Hole, USA. ICES CM 2008/RMC:03. 147 pp.

For permission to reproduce material from this publication, please apply to the General Secretary.

The document is a report of an Expert Group under the auspices of the International Council for the Exploration of the Sea and does not necessarily represent the views of the Council.


Contents

Executive summary

1 Introduction
  1.1 Terms of Reference (ToRs)
  1.2 Report structure

2 Working papers
  2.1 WP 1 – Chris Legault: An MSE Wrapper in Toolbox, Or: Does Splitting Surveys Really Fix Retrospective Patterns?
    2.1.1 Abstract
    2.1.2 Summary of discussion
  2.2 WP 2 – Coby Needle: Thoughts about spatial management evaluation
    2.2.1 Abstract
    2.2.2 Summary of discussion
  2.3 WP 3 – Coby Needle: Recent modifications to North Sea haddock MSE
    2.3.1 Abstract
    2.3.2 Summary of discussion
  2.4 WP 4 – José de Oliveira and co‐authors: Evaluation of proposed amendments to the North Sea Cod Recovery Plan
    2.4.1 Abstract
    2.4.2 Summary of discussion
  2.5 WP 5 – Lionel Pawlowski and co‐authors: MSE using three approaches for Bay of Biscay anchovy
    2.5.1 Abstract
    2.5.2 Summary of discussion
  2.6 WP 6 – Jan Jaap Poos and co‐authors: ITQs, effort allocation and high grading in mixed fisheries
    2.6.1 Abstract
    2.6.2 Summary of discussion
  2.7 WP 7 – Chris Legault: Report of the GARM Retrospective Working Group
    2.7.1 Abstract
    2.7.2 Summary of discussion
  2.8 WP 8 – Chris Legault: Is advice sensitive to which “retro fix” is used?
    2.8.1 Abstract
    2.8.2 Summary of discussion
  2.9 WP 9 – Jan Jaap Poos and co‐author: Comprehensive discard reconstruction and abundance estimation using flexible selectivity functions
    2.9.1 Abstract
    2.9.2 Summary of discussion
  2.10 WP 10 – Anders Nielsen: Contemporary implementation of a state‐space stock assessment model
    2.10.1 Abstract
    2.10.2 Summary of discussion
  2.11 WP 11 – Chris Legault and Liz Brooks: Incorporation of bootstrap‐derived landings uncertainty in VPA
    2.11.1 Abstract
    2.11.2 Summary of discussion
  2.12 WP 12 – Noel Cadigan: On the foundations for inference in fish stock assessment by sequential population analysis
    2.12.1 Abstract
    2.12.2 Summary of discussion
  2.13 WP 13 – Carmen Fernández and co‐authors: A Bayesian stock assessment model incorporating discards estimates in some years
    2.13.1 Abstract
    2.13.2 Summary of discussion
  2.14 WP 14 – Coby Needle: Developments in SURBA: uncertainty estimation and new implementations
    2.14.1 Abstract
    2.14.2 Summary of discussion
  2.15 WP 15 – Benoit Mesnil: Detecting changes in time‐trends (FISBOAT)
    2.15.1 Abstract
    2.15.2 Summary of discussion
  2.16 WP 16 – Joachim Gröger: Analysis of interventions and structural breaks
    2.16.1 Abstract
    2.16.2 Summary of discussion

3 Subgroup 1: Management strategy evaluation and retrospective bias
  3.1 Management Strategy Evaluations
    3.1.1 Full vs. shortcut
    3.1.2 Splitting survey series in response to retrospective patterns
  3.2 Analyzing and summarizing MSE outputs
    3.2.1 PCA and Cluster analysis
    3.2.2 Factor sensitivities and some methods of summarization
    3.2.3 Conclusions
    3.2.4 Recommendations
  3.3 Catch advice when different methods are used to account for retrospective bias
    3.3.1 Methods
    3.3.2 Results
    3.3.3 Discussion
    3.3.4 Conclusions
    3.3.5 Recommendations
  3.4 A state‐space fish stock assessment model applied to a dataset with a strong retrospective pattern
    3.4.1 Motivation
    3.4.2 Model and data
    3.4.3 Results
    3.4.4 Discussion
    3.4.5 Conclusions
    3.4.6 Recommendations
  3.5 Recommendations regarding retrospective patterns in stock assessment

4 Subgroup 2: Uncertainty in stock assessment models
  4.1 SURBA
    4.1.1 Methods
    4.1.2 Results
    4.1.3 Conclusions
    4.1.4 Recommendation
  4.2 State‐space fish stock assessment model
    4.2.1 Motivation
    4.2.2 Model
    4.2.3 Results
    4.2.4 Conclusions
  4.3 Stock assessment models incorporating partial information about discards
    4.3.1 Introduction to the problem
    4.3.2 A model based on age selectivity smoothing via splines
    4.3.3 A model based on autoregressive‐in‐time age selectivities
    4.3.4 Description of the experiment considered
    4.3.5 Detailed results under scenario 1: full time‐series
    4.3.6 Detailed results under scenario 3: many missing years
    4.3.7 Comparing the three scenarios
    4.3.8 Conclusions
    4.3.9 Research recommendations

5 Subgroup 3: Detecting changes in stock productivity
  5.1 Introduction
  5.2 Description of Methods
    5.2.1 Methods of Statistical Process Control (SPC)
    5.2.2 Analysis of structural breaks using econometric techniques
    5.2.3 Analysis of interventions using an ARIMAX approach
    5.2.4 Other trend detection methods
    5.2.5 A simple graphical method to detect shifts: the traffic light plot
  5.3 Application of change‐detection methods
    5.3.1 North Sea cod
    5.3.2 North Sea haddock
    5.3.3 General comment regarding the intervention and structural break models:
  5.4 Conclusions
  5.5 Recommendations

6 Conclusions
  6.1 Future directions for WGMG

7 References

Annex 1: List of participants

Annex 2: WGMG Terms of Reference for the next meeting

Annex 3: Recommendations

Annex 4: Working Papers


Executive summary

Subgroup 1: Management Strategy Evaluations (MSEs) and Retrospective Bias

Comparing MSEs with and without full assessments

Perceptions of the impact of harvest‐control rules (HCRs) on the underlying population can change with different levels of approximation in the MSE, particularly when considering the longer term. These changes are not always obvious when considering only individual distributions of quantities of interest (e.g. SSB and landings yield). It is possible that using an approximated MSE (e.g. one omitting both the assessment and the intermediate‐year lag) could lead to perceptions of superior performance of one HCR relative to another, in terms of summary statistics, that would not be reached if a full MSE were conducted.

Splitting survey series in response to retrospective patterns

In all cases examined, splitting the survey series produced fishing mortality rates in the population much closer to the target value than ignoring the source of the retrospective pattern in the original assessment. This conclusion held regardless of the source of the retrospective pattern and of whether it was a step change or a gradual change. However, the best performance was found for models which most closely met the assumption underlying the split survey approach: a change in survey catchability or a sudden step change in process. Splitting of surveys is not recommended as a routine fix for all assessments exhibiting retrospective patterns: external information should be used to guide the decision about how to address an assessment with a retrospective pattern.

Analyzing and summarizing MSE outputs

Conducting an MSE can be approached as a two‐step procedure. The first step would be done among the scientists, and would address the issue of model dimensionality: which factors are important, and how many levels and/or iterations should be retained for the ‘final’ MSE. The second step involves summarizing the information for managers and the general public, and this should be done graphically. All results of the MSE could be put into an appendix table for persons interested in the fine details.

Catch advice when different methods are used to account for retrospective bias

Catch advice in the cases examined was relatively insensitive to the assumption made to address the retrospective patterns, being always lower than the original catch advice from the unadjusted base assessment as a consequence of the direction of the retrospective patterns.

A state-space fish stock assessment model applied to a dataset with a strong retrospective pattern

The use of random effects in state‐space models for fisheries stock assessment appears to be a promising avenue for future research as a consequence of the limited number of parameters and quick run times to produce both point estimates and measures of uncertainty. These models can produce results without retrospective patterns in cases where traditional VPA assessments exhibited strong retrospective patterning.


Recommendations regarding retrospective patterns in stock assessment

WGMG considered the recommendations presented to the 2008 US Groundfish Assessment Review Meeting (GARM) Data Meeting regarding retrospective patterns in stock assessment. Several of these were accepted as appropriate, and are summarized in Annex 3.

Subgroup 2: Uncertainty in stock assessment models

The SURBA assessment method can provide unbiased estimates of population quantities when information is available on the relative catchability of the survey or tuning index. However, even in this unrealistically good situation, SURBA CIs can have poor simulated coverage properties. Most notably, CIs for total mortalities were too wide, and CIs for recruitments were too narrow. However, CIs for SSB were fairly reasonable.
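What "simulated coverage" measures can be illustrated with a toy example that is unrelated to SURBA itself. The sketch below is an assumption-laden illustration, not the subgroup's procedure: it repeatedly simulates data with a known true value, builds a nominal 95% interval each time, and reports the fraction of intervals that contain the truth. The function name `coverage` and all numerical settings are hypothetical. The interval used here (a normal approximation for a mean) is known to be too narrow for small samples, so its coverage falls below the nominal level, which is the kind of behaviour such a check is meant to reveal.

```python
import math
import random

def coverage(n_sims, n_obs, true_mean, sd, z=1.96, seed=0):
    """Fraction of simulated normal-approximation intervals that contain true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, sd) for _ in range(n_obs)]
        mean = sum(sample) / n_obs
        var = sum((x - mean) ** 2 for x in sample) / (n_obs - 1)
        half_width = z * math.sqrt(var / n_obs)   # nominal 95% half-width
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / n_sims

# Hypothetical settings: many observations vs. very few observations
cov_large = coverage(n_sims=2000, n_obs=50, true_mean=10.0, sd=2.0)
cov_small = coverage(n_sims=2000, n_obs=5, true_mean=10.0, sd=2.0)
print(f"coverage with 50 observations: {cov_large:.3f}")
print(f"coverage with 5 observations:  {cov_small:.3f}")
```

Intervals that are too wide would show the opposite symptom: coverage above the nominal level.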

SURBA was also sensitive to penalty function weights, especially when relative catchability is estimated. This particularly affected estimates of mortality at younger ages, and biomass and SSB estimates. We cannot provide any guidance on how to choose these weights in practice, other than trial and error. SURBA estimates of (relative) recruitment and total mortality at older ages seemed more reliable.

A random‐effects approach is a more objective way to deal with controlling the variation in high‐dimensional parameters. The variance of the random effects is analogous to the penalty weight, and these variances can be estimated (i.e. chosen objectively).
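The analogy can be made concrete under a simple assumption that is not taken from the report: suppose a parameter vector $\theta_1, \dots, \theta_n$ is smoothed either by a quadratic penalty on first differences with weight $\lambda$, or by treating the differences as Gaussian random effects with variance $\sigma^2$. The symbols $\ell$, $\lambda$, $\sigma^2$ and the random-walk structure are illustrative choices only.

```latex
% Penalised approach: the weight \lambda is chosen by the analyst
\ell_{\mathrm{pen}}(\theta)
  = \ell(\theta \mid y) - \lambda \sum_{i=2}^{n} (\theta_i - \theta_{i-1})^2

% Random-effects approach: \theta_i - \theta_{i-1} \sim N(0, \sigma^2)
\ell_{\mathrm{joint}}(\theta, \sigma^2)
  = \ell(\theta \mid y)
  - \frac{1}{2\sigma^2} \sum_{i=2}^{n} (\theta_i - \theta_{i-1})^2
  - \frac{n-1}{2} \log\!\left(2\pi\sigma^2\right)

% Matching the quadratic terms gives \lambda = 1 / (2\sigma^2);
% \sigma^2 is then estimated from the marginal likelihood
L(\sigma^2) = \int \exp\!\left\{ \ell_{\mathrm{joint}}(\theta, \sigma^2) \right\} \, d\theta
```

In this sketch a large penalty weight corresponds to a small random-effects variance, and maximizing the marginal likelihood over $\sigma^2$ replaces the trial-and-error choice of $\lambda$.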

A state-space stock assessment model

New algorithms and software tools capable of optimizing a full state‐space stock assessment model in minutes rather than in hours allow for the investigation of the frequentist properties of such models. It can be concluded that the simple confidence intervals for average fishing mortality and spawning‐stock biomass are too narrow. It is recommended that more advanced ways of constructing confidence intervals (profile likelihood, or simulation‐based methods) be investigated.

More generally, state‐space fish stock assessment models are worthy of further investigation, as they are able to separate process and observation noise and they avoid arbitrary smoothing parameters and ad‐hoc weighting of different data sources.

Stock assessment models incorporating partial information about discards

The two stock assessment models presented are able to incorporate discards data (and other kinds of catch data, such as bycatch) that may be available in just some of the assessment years. The models produce stock assessments that incorporate all the available catch information while, at the same time, complete time‐series of model estimates are obtained for discards (and any other components of the catch incorporated in the model, such as bycatch). The model estimates have associated confidence (in the maximum likelihood setting) or posterior probability (in the Bayesian setting) intervals and the stock assessment results also incorporate this uncertainty.

The finding that the model predictions are relatively robust against removal of part of the discards data relies entirely on the quality of the survey tuning indices. It appears that there is enough information in these to do a reasonable reconstruction of the discards. This is of course no guarantee that the same is true for all fish stocks. Also, the reconstruction of the discards is only possible in years where survey tuning series exist. This may hamper reconstructing discards for very long time‐series.

Subgroup 3: Detecting changes in stock productivity

The test applications implemented show that available change‐detection methods can help to resolve a problem regularly encountered by assessment Expert Groups: that of deciding when the underlying recruitment state has changed to a lower level. For both NS cod and NS haddock, the methods were fully consistent in the identification of breakpoints in recruitment series despite their distinct theoretical frameworks. Beyond the advantage of permitting an objective choice of breakpoints, the methods are also very simple to implement.
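As a generic illustration of how simple a change‐detection rule can be, the sketch below implements a one‐sided CUSUM, a standard tool from statistical process control. It is not the subgroup's implementation (the methods actually applied are described in Section 5), and the function name `cusum_lower`, the allowance `k`, the threshold `h`, the reference mean and standard deviation, and the series values are all hypothetical.

```python
def cusum_lower(series, reference_mean, reference_sd, k=0.5, h=4.0):
    """Return the first index at which a downward CUSUM signals, or None."""
    s = 0.0
    for t, x in enumerate(series):
        z = (x - reference_mean) / reference_sd   # standardized deviation
        s = min(0.0, s + z + k)                   # accumulate only shortfalls beyond k
        if s < -h:
            return t
    return None

# Made-up log-recruitment anomalies: stable for ten years, then a drop
series = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1,
          -1.8, -2.1, -1.5, -2.3, -1.9]

signal_year = cusum_lower(series, reference_mean=0.0, reference_sd=1.0)
print("first signal at index:", signal_year)
```

The rule signals a few observations after the shift begins, because evidence has to accumulate past the threshold; lowering `h` shortens that delay at the cost of more false alarms.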

1 Introduction

1.1 Terms of Reference (ToRs)

The Working Group on Methods of Fish Stock Assessments [WGMG] (Chair: Coby Needle, UK) met in Woods Hole, USA from 7–16 October 2008 to:

a) develop or verify methods and software for evaluating harvest control rules;

b) develop methods for detecting, quantifying, communicating and correcting retrospective bias and noise, and consider implications of retrospective bias for management advice;

c) develop methods for the appropriate incorporation and estimation of uncertainty in stock assessment methods;

d) develop methods to detect persistent changes in stock productivity, including time‐series approaches;

e) examine ways to accommodate zero observations in stock assessment methods.

WGMG reported by 16 November 2008 for the attention of the Resource Management Committee and ACOM.

1.2 Report structure

Section 2 provides abstracts and rapporteurs’ notes for the working papers that were presented to the meeting. The WG proceeded to work via three subgroups, as follows:

• Subgroup 1: management strategy evaluation and retrospective bias. This covered ToRs (a) and (b), and is reported in Section 3 of this report.

• Subgroup 2: uncertainty in stock assessment methods. This covered ToR (c) and some aspects of ToR (e). The report of this subgroup is given in Section 4.

• Subgroup 3: detecting change. This addressed ToR (d) and is presented in Section 5.

Section 6 summarizes a discussion that was held on the future direction of the group. Annex 1 lists the participants of the meeting, Annex 2 gives the proposed Terms of Reference for the next meeting of WGMG, and Annex 3 collates recommendations (both for WGMG members and other ICES Expert Groups). Finally, Annex 4 includes the full text of two of the working papers that were presented to WGMG.

2 Working papers

The following table summarizes the working papers presented to WGMG:

| Number | Name | Title | ToR | Paper | Presentation |
|--------|------|-------|-----|-------|--------------|
| 1 | Chris Legault | An MSE Wrapper in Toolbox, Or: Does Splitting Surveys Really Fix Retrospective Patterns? | A | No | Yes |
| 2 | Coby Needle | Thoughts about spatial management evaluation | A | No | Yes |
| 3 | Coby Needle | Recent modifications to North Sea haddock MSE | A | Yes | Yes |
| 4 | José de Oliveira and co‐authors | Evaluation of proposed amendments to the North Sea Cod Recovery Plan | A | Yes | Yes |
| 5 | Lionel Pawlowski and co‐authors | MSE using three approaches for Bay of Biscay anchovy | A | No | Yes |
| 6 | Jan Jaap Poos and co‐authors | ITQs, effort allocation and high grading in mixed fisheries | A | Yes | Yes |
| 7 | Chris Legault | Report of the GARM Retrospective Working Group | B | Yes | Yes |
| 8 | Chris Legault | Is advice sensitive to which “retro fix” is used? | B | No | Yes |
| 9 | Jan Jaap Poos and co‐author | Comprehensive discard reconstruction and abundance estimation using flexible selectivity functions | C | Yes | Yes |
| 10 | Anders Nielsen | Contemporary implementation of a state‐space stock assessment model | C | No | Yes |
| 11 | Chris Legault and Liz Brooks | Incorporation of bootstrap‐derived landings uncertainty in VPA | C | No | Yes |
| 12 | Noel Cadigan | On the foundations for inference in fish stock assessment by sequential population analysis | | Yes | Yes |
| 13 | Carmen Fernández and co‐authors | A Bayesian stock assessment model incorporating discards estimates in some years | | Yes | Yes |
| 14 | Coby Needle | Developments in SURBA: uncertainty estimation and new implementations | C | No | Yes |
| 15 | Benoit Mesnil | Detecting changes in time‐trends (FISBOAT) | D | No | Yes |
| 16 | Joachim Gröger | Analysis of interventions and structural breaks | D | No | Yes |

2.1 WP 1 – Chris Legault: An MSE Wrapper in Toolbox, Or: Does Splitting Surveys Really Fix Retrospective Patterns?

2.1.1 Abstract

A number of programs from the NOAA Fisheries Toolbox (NFT) were combined to create a management strategy evaluation (MSE) with the specific purpose of evaluating one method of removing retrospective patterns: splitting survey time‐series. The NFT PopSim program was the operating model determining the true population for the simulations. Three different sources of retrospective pattern were introduced in the PopSim calculations: 1) misreporting of catch, 2) a change in the natural mortality rate M, or 3) a change in the survey catchability q. Changes in catch or natural mortality could occur either at the start of the time‐series or in the recent part of the time‐series. The NFT Adapt VPA (virtual population analysis) program was used to assess the stock, and the NFT AgePro (age‐based projection) program predicted the catch two years in advance under the management rule setting the quota according to fishing mortality F = 0.25. A small amount of implementation error (CV = 10%) in the amount of fish actually caught was incorporated in the MSE.
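The quota‐setting and implementation‐error steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the NFT AgePro code: the quota is computed from the standard Baranov catch equation at F = 0.25, and the implementation error is assumed to be lognormal with CV = 10% (the abstract gives the CV but not the distribution). The function names, the natural mortality of 0.2, and the numbers, selectivities and weights at age are hypothetical.

```python
import math
import random

def baranov_catch(numbers, selectivity, weight, f_full, m):
    """Catch in weight from the Baranov catch equation, summed over ages."""
    total = 0.0
    for n, s, w in zip(numbers, selectivity, weight):
        f = f_full * s                    # age-specific fishing mortality
        z = f + m                         # total mortality
        total += w * n * (f / z) * (1.0 - math.exp(-z))
    return total

def realised_catch(quota, cv, rng):
    """Apply mean-preserving lognormal implementation error (assumed form)."""
    sigma = math.sqrt(math.log(1.0 + cv * cv))
    return quota * math.exp(rng.gauss(0.0, sigma) - 0.5 * sigma * sigma)

# Hypothetical estimated numbers at age, selectivity at age and weight at age
numbers = [1000.0, 600.0, 350.0, 200.0]
selectivity = [0.2, 0.6, 1.0, 1.0]
weight = [0.3, 0.8, 1.5, 2.2]

quota = baranov_catch(numbers, selectivity, weight, f_full=0.25, m=0.2)
rng = random.Random(1)
caught = realised_catch(quota, cv=0.10, rng=rng)
print(f"quota = {quota:.1f}, realised catch = {caught:.1f}")
```

In a full MSE loop the realised catch, not the quota, would be removed from the operating‐model population, so the fishing mortality experienced by the true stock can differ from the target even when the assessment is unbiased.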

For each of the six cases (base case, change catch early, change catch recent, change M early, change M recent, increase q), two MSEs were conducted: one with the surveys treated as a full time‐series and the other splitting the time‐series in the same year that the actual change in catch, M or q occurred. The response variables for each of the 100 realizations were annual landings, total landings over the eleven‐year horizon, annual change in landings, spawning‐stock biomass (SSB) in the population, SSB estimated in the final VPA for each realization minus the true SSB, and the fully selected fishing mortality rate which occurred in the population. In all cases, the full survey time‐series exhibited a retrospective pattern (although the retrospective pattern in the “increase catch recent” case went away by the final years as a consequence of a coding error) and splitting the survey series removed (in most cases) or reduced (the “increase M recent” case) the retrospective pattern. The estimated populations were biased relative to the truth in the split survey series cases, except for the “change in q” case where the “fix” matched the source. However, once the corresponding management action had been applied, the biases cancelled out and splitting the survey series produced the correct fishing mortality rate in the population in each case except the “increase recent catch” case, where the F in the population was below the management target (but this case needs to be repeated with the coding error fixed). There was either no change or a small decrease in total landings comparing the full time‐series with the split survey series, but the annual changes in landings were much less variable in the split survey series cases relative to the full time‐series cases. The true SSB was higher in the split survey series cases. Thus, by all measures, splitting the survey series produced better management advice than ignoring the retrospective pattern in the assessments, even for the cases when the actual source of the retrospective pattern was a change in catch or M.

2.1.2 Summary of discussion

It was noted that random perturbations to data are unlikely to generate a strong retrospective pattern. It is easy to remove a retrospective bias, but doing so does not guarantee that the assessment will be correct.

A question was asked about what the “truth” was in the projections. The response was that the truth is known, but is different for each run, and was not plotted in the presentation. In response to questions about the harvest control rule and the stock‐recruitment model, it was explained that a simple F = 0.25 control rule was used, and a Beverton‐Holt stock‐recruitment model was used to generate simulated data, but a simpler model was used to do projections in the assessment. Some participants felt it would also be informative to examine trends in annual “perceived” SSB.

It was suggested that splitting the VPA catchabilities (q) is similar to running the VPA only for the period after the “change” causing the retrospective pattern. However, it may be less desirable to shorten the time‐series for VPA in many cases because a longer time‐series is more useful for deriving limit reference points, etc. The efficacy of splitting may depend on whether the split occurs in the converged part of the VPA, or not. In the studies presented it was noted that splitting q reduced the retrospective pattern; however, some participants indicated that the reverse can also occur.

A general discussion followed on fisheries management based on F control rules. It was felt that this approach may be more robust to model‐misspecification, and this is consistent with the results presented in the talk.

A problem was noted regarding the recent catch misreporting scenario, and the presenter indicated that he would look into this further.

The main conclusion was that splitting VPA q can work, as long as the real problem or source of the retrospective pattern is a single, sudden change in a population or sampling process that is identified roughly in the right year and in the converged part of the VPA.
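The case in which the fix matches the source (a single step change in survey catchability) can be illustrated with a toy calculation. The sketch below is not the Adapt VPA tuning procedure: it assumes abundance is known exactly, as it approximately is in the converged part of a VPA, and estimates catchability as the geometric mean of the index‐to‐abundance ratio. The year of the change, the catchability values, the noise level and the abundance trajectory are all hypothetical.

```python
import math
import random

def estimate_log_q(index, abundance):
    """Mean log ratio of survey index to abundance: a simple estimate of log q."""
    ratios = [math.log(i / n) for i, n in zip(index, abundance)]
    return sum(ratios) / len(ratios)

rng = random.Random(42)
years = 30
split_year = 20   # hypothetical year in which survey catchability doubles
true_q = [0.5 if t < split_year else 1.0 for t in range(years)]
abundance = [1000.0 * math.exp(-0.02 * t) for t in range(years)]
index = [q * n * math.exp(rng.gauss(0.0, 0.05)) for q, n in zip(true_q, abundance)]

# One catchability for the whole series vs. a separate value after the split
q_full = math.exp(estimate_log_q(index, abundance))
q_late = math.exp(estimate_log_q(index[split_year:], abundance[split_year:]))

# Survey-implied abundance in the final year under each assumption
n_true = abundance[-1]
n_full = index[-1] / q_full
n_split = index[-1] / q_late
print(f"true N = {n_true:.0f}, full-series q gives {n_full:.0f}, split q gives {n_split:.0f}")
```

With a single catchability the estimate averages the two regimes, so the recent survey values are read as more fish than are really there. Estimating a separate catchability after the split year removes that bias in this idealized setting; it says nothing about cases where the true cause is a change in catch or M.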

Summary of potential new work:

1) Look at annual assessment bias in the terminal year;

2) Sensitivity of the approach to the year of split in relation to the source of the retrospective bias, particularly if this year is in the unconverged part of the VPA;

3) How would the approach work if there was a gradual change in a population process?

4) What would be the effect of TAC constraints instead of, or in addition to, an F control rule?

2.2 WP 2 – Coby Needle: Thoughts about spatial management evaluation

2.2.1 Abstract

In recent years, a great deal of effort has been spent evaluating fisheries management plans, focussing (in Europe at least) on cod, haddock, flatfish, saithe, herring and mackerel. Management plans are all different, in one respect or another, and the evaluations have been correspondingly rather specific in the methods used. They all share one feature, however: they all assume that the behaviour of fishermen will not change when confronted with changes in management or stock abundance. That is, the gear used, the discard pattern and the spatial distribution of effort will stay broadly the same despite radical changes in regulations or fish numbers (one exception is the flatfish study summarized in WP 6 which models shifts in fishing location). These assumptions are clearly not met in practice: fishermen will change what they do to suit the prevailing situation they find themselves in. An example of the impact of this kind of assumption is demonstrated in WP 3.

We can consider two possible approaches to incorporating changes in fleet behaviour, although there are doubtless others. The first is to model behaviour explicitly using economic utility functions, as in WP 6. This is an approach with a strong pedigree, but it does require us to assume that we know what the likely response of the fleet will be to change. The second is to explore the range of responses via simulation game playing, in which the player is asked to achieve an economic target (say, maximum profit or minimum risk) while conducting a virtual fishery in a generated world. The responses of players to perturbations (such as a closed area, a discard ban, the appearance of a competitor etc) can be collated over time, and a model of likely real behaviour can be constructed on the basis of this information. The resultant model could then be used to address questions about the probable utility of spatial (and other) management instruments. The implementation of this second approach, which is the subject of this short presentation, is in the very early stages of development and needs considerable work before it can be considered as a valid analysis tool.

Several interesting points were noted. Fleet behaviour in response to closed areas can vary: in the North Sea vessels tend to move to a different area, whereas American vessels are often observed to ring the edges of closed areas. The modelling approach can be very computer‐intensive (and thereby slow), although the gaming approach has the advantage of being a simple and quick characterization from which much can be learned (although it relies on the availability of willing test candidates). In any case, the goal of a simulation game would be to learn about behaviours which would subsequently be implemented in a model.

2.3 WP 3 – Coby Needle: Recent modifications to North Sea haddock MSE

2.3.1 Abstract

Since the last WGMG meeting in March 2007, the management strategy evaluation of the EU‐Norway management plan for North Sea haddock has undergone two main development phases. First, the revised plan adopted in January 2007 including a TAC constraint and a sliding‐F rule was evaluated during spring of 2008, leading ICES to advise that the plan had a low risk of resulting in biomass falling below Blim. Second, the question of interannual quota flexibility was addressed. In other words, is there a sustainability implication of allowing countries to borrow 10% of next year’s quota to be fished this year, or bank 10% of this year’s quota to be fished next year? The evaluation concluded that there is very little change in risk if this facility is permitted.

The haddock evaluations incorporate an assessment module and assume constant proportion discards‐at‐age, and the combination of these two factors results in a simulation in which biomass is underestimated while a large year class is passing through the fishery. This leads in turn (via the sliding‐F rule) to management outcomes which are more precautionary than they need to be. The questions for WGMG are 1) is it necessary and/or desirable to include live assessments in an MSE, and 2) can modelling of discards be improved?

It was clarified that the TAC constraint in the haddock management plan does not apply if B < Bpa, and that in the projections the discards were taken to be a fixed fraction of landings.

Questions were asked about how to manage banking and borrowing in an MSE, and whether there are interactions between such quota flexibility and assumed stock‐recruit relationships. More specifically, under what kind of SR relationship does banking and borrowing “work”? In response it was suggested that for North Sea haddock the stronger issue is discarding.

2.4 WP 4 – José de Oliveira and co-authors: Evaluation of proposed amendments to the North Sea Cod Recovery Plan

2.4.1 Abstract

ICES were requested to evaluate an EC proposal for cod recovery plans, a request that was later extended to include a Norwegian proposal. The time frame from agreement of terms of reference to delivery of advice was very tight (two months), which meant that a comprehensive analysis was not possible. The approach was to agree a set of specifications and to code these up in FLR. The analysis focused on North Sea cod, with the aim of investigating the consequences of each plan in terms of biological risks, yields (particularly longer term) and stability of catches by conducting a Management Strategy Evaluation (MSE). The design included four operating models: two possible sources of additional mortality (misreported catch, varying natural mortality) with each subject to two recruitment scenarios (recent low or historical high). In addition, there were three observation error models (adjusting for misreported catch, adjusting for natural mortality, or making no adjustment) and the two proposed HCRs (EU and Norway), which each included some form of constraint on interannual variation in the TAC, giving 3 × 2 = 6 management procedures. This gave in all 4 operating models × 6 management procedures = 24 combinations. Two additional combinations were considered, where for the misreported catch‐historical high recruitment operating model and misreported catch observation error model combination, each HCR was considered without TAC constraints, giving a grand total of 26 combinations. Results for each HCR‐observation error model are compared across the different operating models, and presented in a table for selected summary statistics. The conclusion of the report presented by ICES was that “…both the EC and the Norwegian proposed Recovery/Management Plans are likely to recover the North Sea cod stock.”
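The count of scenarios in the design can be checked by enumerating it. The sketch below is only a bookkeeping illustration; the label strings are shorthand invented here, and the evaluation itself was coded in FLR, not Python.

```python
from itertools import product

# Operating models: 2 sources of additional mortality x 2 recruitment scenarios.
mortality_sources = ["misreported catch", "varying natural mortality"]
recruitment = ["recent low", "historical high"]
operating_models = list(product(mortality_sources, recruitment))

# Management procedures: 3 observation error models x 2 HCRs.
observation_models = ["adjust for misreported catch",
                      "adjust for natural mortality",
                      "no adjustment"]
hcrs = ["EU", "Norway"]
management_procedures = list(product(observation_models, hcrs))

# Full factorial design (all with TAC constraints).
combinations = list(product(operating_models, management_procedures))

# Two extra runs: each HCR without TAC constraints, for one OM and one observation model.
extra_runs = [(("misreported catch", "historical high"),
               ("adjust for misreported catch", hcr, "no TAC constraint"))
              for hcr in hcrs]

all_runs = combinations + extra_runs
print(len(operating_models), len(management_procedures), len(combinations), len(all_runs))
```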

The use of different symbols for points in the P(SSB>Blim) vs. catch plots was questioned. The reply was that these show different operating models.

The main conclusion reported from the evaluation was that the authors could not determine which recovery plan was better. A possible deficiency in the recovery plans was that the time to recovery was not specified. However, it was felt that it was useful to go through the process, and now that the code is available managers are going to rethink recovery plans. Weighting of operating models was another issue that requires further consideration.

Implications for WGMG: The code to do the work is available. Are there simple metrics/graphics/diagnostics that could be used to assist in differentiating the two HCRs? PCA or some type of cluster analysis are potentially useful approaches. Also, how should risk be summarized?

A discussion followed on differences in management objectives between the US and Europe.

2.5 WP 5 – Lionel Pawlowski and co-authors: MSE using three approaches for Bay of Biscay anchovy

2.5.1 Abstract

STECF has been requested to make a proposal to the European Commission about the implementation of a long‐term management plan for Bay of Biscay anchovy. This presentation is a summary of the group’s efforts to evaluate the biological, economic and social impacts of a management strategy based on a provisional TAC automatically defined by a harvest control rule over a 10‐year period.

Simulations of SSB were provided by a Bayesian biomass model. Simulations were carried out for an average recruitment, based on the observations since 1987, and for a low recruitment, the situation of the stock since 2002. For both levels of recruitment, three different HCRs with additional constraints on TAC (minimum TAC to open fishery and/or maximum TAC to protect the stock) were detailed. The study also included scenarios with changes in allocation among countries, fleets and semesters.

The economic component involves production, market price and the proportion of fishing effort allocated to anchovy as functions of TAC. Simulations were provided for a range of harvest rates between 10% and 100% of the TAC. The evolution of SSB, catches and risks of falling below Blim and closure were presented. While risks and catches rise with harvest rates, setting a maximum TAC reduces the maximum attainable catch and decreases interannual variability in TACs. A minimum TAC does not alter mean catch but increases the probability of closure. For the same risk level, catches are higher when TAC is not capped. The framework also shows a strong sensitivity to the level of recruitment.
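A minimal sketch of a harvest control rule with the two kinds of TAC constraint described above is given below. The functional form (TAC proportional to estimated SSB), the closure behaviour below the minimum TAC, and all numbers are assumptions for illustration; the actual rules evaluated by the group are not specified in this summary.

```python
def provisional_tac(ssb, harvest_rate, min_tac=None, max_tac=None):
    """Hypothetical HCR: TAC proportional to SSB, with optional bounds.

    If the unconstrained TAC falls below min_tac the fishery is closed (TAC = 0);
    if it exceeds max_tac it is capped at max_tac.
    """
    tac = harvest_rate * ssb
    if min_tac is not None and tac < min_tac:
        return 0.0          # fishery does not open
    if max_tac is not None and tac > max_tac:
        return max_tac      # cap to protect the stock
    return tac

# Illustrative values only (tonnes).
print(provisional_tac(30000.0, 0.3))                    # unconstrained
print(provisional_tac(30000.0, 0.3, min_tac=10000.0))   # closure triggered
print(provisional_tac(200000.0, 0.3, max_tac=33000.0))  # capped
```

Under this form, a minimum TAC can only add closures and a maximum TAC can only lower the largest catches, which is consistent in direction with the qualitative results reported above.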

Further work should focus on improving the bioeconomic coupling and developing social aspects. Currently the framework assumes that all the TAC is taken in a given year, which is probably not realistic. A further step will need to include some feedback effects of the market on the allocation of the fishing effort to anchovy.

The stock‐recruitment (SR) model was questioned; in particular, why use one at all because the data are really noisy. A participant responded that it might account for some of the SR relationship, and that an SR model might do better in estimating recruitment at low stock size, which is where the stock currently is. It was questioned whether environmental effects exist in the SR relationship. It was explained that in the past this was tried, and results were not always reliable. It was suggested to look at a linear regression approach to fitting the SR relationship, but also account for autocorrelation in residuals. It was concluded that it is not a good idea to pretend that recruitment can be modelled. It seems that recruitment model scenarios (e.g. low, medium, or high) are more relevant when an SR relationship is not evident in data.

It was suggested that the management plan should work whether recruitment is low or high. The TAC limit in the management plan was discussed. In the past the limit was not an issue, but in the simulated projections the TAC limit had a large effect on the probability of SSB being less than Blim. What will happen in reality when the TAC limit is reached? Will the fleet stop fishing?

Questions were asked about how the probabilities from the MSE were computed. It was explained that they were aggregated over years. This led to discussion about how many years a management plan is evaluated for, and how probabilities should be computed (i.e. annually or not). This can be done several ways, and we need to be clear about how the probabilities are computed. The method used may depend on what the management imperative is.
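The difference between the ways of computing risk can be made concrete with simulated output. The sketch below uses randomly generated SSB trajectories and a made‐up Blim, so the values are purely illustrative; only the three summary definitions matter.

```python
import numpy as np

rng = np.random.default_rng(1)
n_iter, n_years = 1000, 10
blim = 21000.0  # hypothetical limit reference point

# Hypothetical simulated SSB: one row per iteration, one column per year.
ssb = rng.lognormal(mean=np.log(30000.0), sigma=0.4, size=(n_iter, n_years))
below = ssb < blim

p_pooled = below.mean()           # aggregated over all iterations and years
p_annual = below.mean(axis=0)     # one probability per projection year
p_any = below.any(axis=1).mean()  # at least one year below Blim in the period

print(round(p_pooled, 3), np.round(p_annual, 3), round(p_any, 3))
```

The pooled value equals the mean of the annual values, whereas the probability of falling below Blim at least once grows with the length of the evaluation period, so the three summaries are not interchangeable.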

There are many options and output measures for an MSE. It would be useful to study how to compare different rules, particularly using graphical methods.

2.6 WP 6 – Jan Jaap Poos and co-authors: ITQs, effort allocation and high grading in mixed fisheries

2.6.1 Abstract

Many fisheries are managed by Total Allowable Catches (TACs) and a substantial part by individual quotas. This output management has not been successful in mixed fisheries if fishers may continue fishing while discarding marketable fish. We analyse the effects of individual quotas on spatial and temporal effort allocation and over‐quota discarding in a multispecies fishery. Using a spatially explicit dynamic state variable model, the optimal fishing strategy of fishers constrained by annual individual quotas, facing uncertainty in catch rates, is studied. Individual fishers will move away from areas with high catches of the restricted quota species and, depending on the cost of fishing, will stop fishing in certain periods of the year. Individual vessels will discard marketable fish, but only after the individual quota for the species under consideration has been reached. These results are in line with the observations on the effort allocation and discarding in the Dutch beam trawl fleet. The models we present can be used to predict the outcomes of fisheries management, and are thus a useful tool for fisheries scientists and managers.

It was suggested that there is a requirement to model fishermen’s behaviour in an MSE (see WP 2).

The discussion focused on whether fishermen behave “rationally” with respect to fishing, and if they fish to maximize profit. It was suggested that it is useful to study this with fishermen, using their responses to scenarios, etc. It was noted that often when fishery regulations are implemented there is an expectation that F will reduce in the future; however, this rarely has occurred. This suggests that it is very difficult to model fishermen’s behaviour.

2.7 WP 7 – Chris Legault: Report of the GARM Retrospective Working Group

2.7.1 Abstract

This report summarizes a wide range of work related to retrospective patterns in stock assessment conducted by US scientists for the 2008 Groundfish Assessment Review Meeting (GARM) Methods Meeting, culminating in conclusions and recommendations. A retrospective pattern is a systematic inconsistency among a series of estimates of population size, or related assessment variables, based on increasing periods of data (Mohn 1999). This pattern of change in estimated values can have severe consequences for management of a stock, potentially resulting in depletion of a stock although the assessments indicate the targets are being met. Retrospective patterns have been observed in some but not all of the stocks in New England, as well as other stocks around the world. Retrospective patterns are not limited to virtual population analysis, having been observed in a wide range of models including statistical catch‐at‐age models. Instead, retrospective patterns are an indication that something is inconsistent in the data or model assumptions. However, retrospective patterns are just one diagnostic for stock assessments and lack of a retrospective pattern does not necessarily imply that all is well.

Simulation analyses have demonstrated a number of sources for retrospective patterns, including missing catch, an increase in natural mortality rate, or a change in survey catchability. The GARM working group examined a number of potential methods to try to determine the source of a retrospective pattern using simulated data, but was unable to do so. However, the working group found it does appear possible to identify the timing of an intervention which leads to the retrospective pattern in some cases. Similarly, a number of methods were examined to fix retrospective patterns. While the fixes did in fact remove the retrospective pattern, the new assessment was not always closer to the truth than the original assessment, even if the diagnostics of the new model were good. This means that caution must be exercised when applying any fix to an actual assessment to remove the retrospective pattern.

The GARM working group recommends that stock assessment scientists always check for the presence of a retrospective pattern and that a strong retrospective pattern is grounds to reject the assessment model as an indication of stock status or the basis for management advice. The working group also recommended future research to be conducted on the topic to define objective criteria for acceptance of an assessment with retrospective patterns and to determine what type and level of adjustment in management advice is appropriate through management strategy evaluations.

There was a question about whether the disconnection in the closed area example was a consequence of a mismatch in the index catchability q or of the F constraint on older ages. It was suggested that the mismatch could be a consequence of the F constraint. A subgroup will very readily resolve this point.

Regarding the GARM working group recommendation to look for alternative models if a retrospective pattern was detected, it was asked why one would not also have a look back at the data? It was explained that this particular recommendation focused on looking back to the model assumptions and model misspecification, although of course the data should be considered as well.

Regarding recommendation 7, clarification was sought as to the point of converting the indices to swept‐area abundances. It was explained that when the effect of splitting the surveys was examined, the new q estimate changed; however, when the survey units were mean number per tow, the value of q had no interpretation. However, when the survey was scaled to swept‐area biomass, then q has an interpretation, and a “red flag” is raised when q is estimated to have a value greater than 1. In reality, we still have questions about herding, the calculation for minimum swept‐area, etc. which makes interpretation of q difficult.

A WGMG member mentioned that in the past, he had tried to find a retrospective pattern in the CSA model but could not create one. It was mentioned that in Woods Hole, a GARM simulation looked at model complexity and found that all models, even simple ones, exhibited strong estimation biases when factors that lead to retrospective patterns in VPA are present. The same WGMG member further elaborated that although he found no retrospective pattern, there was still bias in his investigation. In response, the presenter suggested that the appearance of a retrospective pattern is at least a visible warning that something is wrong, and one can then look more closely at the data or model misspecification.

Regarding recommendations 3 and 4, it was asked whether anything more specific had been developed. The presenter explained that in terms of deciding whether to correct a retrospective pattern, the current protocol at Woods Hole compares the suggested rho adjustment with the plot of SSB vs. with error bars; if the rho adjusted value falls outside of the error bars, then the retrospective pattern is assumed to be more than just noise.

The WGMG Chair suggested that a working group look at the recommendations made in the presentation to determine if the group agrees with them and whether the recommendations would be appropriate to suggest for ICES advice (see Section 3).

2.8 WP 8 – Chris Legault: Is advice sensitive to which “retro fix” is used?

2.8.1 Abstract

Two separate approaches were used to address the question in the title. The first compared status determination criteria (F/Fmsy and SSB/SSBmsy) of a number of Northeast US stocks for three situations: 1) a base run, 2) when the base run was adjusted to account for any retrospective patterning, and 3) if necessary when the survey time‐series were split to remove the retrospective pattern. The adjustment for status determination criteria was based on a seven‐year peel and calculation of Mohn’s rho, the difference between the terminal (final year) estimate from an assessment with limited years of data and the corresponding estimate from the assessment with the full years of data. In stock assessments which did not exhibit strong retrospective patterns, the rho‐adjusted base case status determination was within the bounds of uncertainty of the base case. In stock assessments which did exhibit strong retrospective patterns, the rho‐adjusted base case status determination was similar to the split survey series status determination, and quite different from the base case status determination. The two independent approaches to account for the retrospective pattern produced similar advice.
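The peel‐based calculation can be sketched as follows. This assumes the commonly used form of Mohn’s rho as the average relative difference across peels, and a rho adjustment that divides the terminal estimate by (1 + rho); neither detail is spelled out in the abstract, and the numbers are invented.

```python
def mohns_rho(full_series, peeled_terminal):
    """Average relative difference between peeled terminal estimates and the
    estimates for the same years from the full assessment.

    full_series: dict mapping year -> estimate from the full-data assessment.
    peeled_terminal: dict mapping terminal year of each peel -> its terminal estimate.
    """
    rel = [(est - full_series[year]) / full_series[year]
           for year, est in peeled_terminal.items()]
    return sum(rel) / len(rel)

def rho_adjust(terminal_estimate, rho):
    """Assumed adjustment: scale the terminal estimate by 1 / (1 + rho)."""
    return terminal_estimate / (1.0 + rho)

# Hypothetical SSB: the full assessment says 100 in every year, while each of the
# seven peels overestimated its own terminal year by 20%.
full = {year: 100.0 for year in range(2000, 2008)}
peels = {year: 120.0 for year in range(2000, 2007)}  # seven-year peel

rho = mohns_rho(full, peels)
adjusted = rho_adjust(120.0, rho)
```

With these made‐up inputs rho is 0.2, so a terminal estimate of 120 would be adjusted down to 100.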

The second approach utilized a number of different methods to account for the retrospective pattern from a single‐stock assessment and compared the resulting catch advice. The six methods were 1) base case with retrospective adjustment, 2) reduce the reported catch early in the time‐series, 3) increase reported catch in the recent years of the time‐series, 4) reduce M early in the time‐series, 5) increase M in the recent years of the time‐series, and 6) split the survey time‐series. In cases 2‐6, the same timing of the split was used based on a moving window analysis, approximately half way through the time‐series. The current F and SSB varied by a factor of three among the six cases, as did the SSB and MSY reference points, while the estimated F reference point was the same for all cases except the “change M recent” case. The initial catch advice also varied by a factor of three among the six methods. However, a number of adjustments to the initial catch advice must be made to account for the retrospective pattern, assumed changes in reporting, or management limits. Making retrospective adjustments for the base case and “decrease early M” case (a retrospective pattern was still present), adjusting the “increase recent catch” case to account for the misreporting, and limiting the “decrease catch early” and “increase M recent” cases produced catch advice that was much more consistent. All the approaches to account for the retrospective pattern produced lower catch advice than the original base advice. Catch advice is relatively consistent no matter which approach is used to account for a retrospective pattern.

Clarification was sought as to how the catch advice was derived. For example, in the Change C.recent case, the level of catch had to be inflated by a factor of 3. The presenter clarified that although the predicted catch in 2008 was 10.5, it was scaled down to the magnitude of reported catches. Basically, because catch needed to be inflated by a factor of 3 to reduce the retrospective pattern, then the catch advice was scaled similarly. Therefore, 10.5/3 gave an adjusted 2008 catch of 3.5. A follow‐up question was asked what happened if you made no retrospective adjustment. The presenter responded that if no retrospective adjustment had been made, then advice would be from the Base case estimate of Catch 2008, which was 6.7.
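The scaling described in this exchange amounts to the following arithmetic, using the values quoted in the discussion (units are not given in the source, and the variable names are introduced here for illustration).

```python
# Values quoted in the discussion of the "Change C.recent" case.
catch_inflation_factor = 3.0   # factor applied to recent reported catch
predicted_catch_2008 = 10.5    # model prediction on the inflated-catch scale
base_case_catch_2008 = 6.7     # advice with no retrospective adjustment

# Advice is expressed on the scale of reported catch, so divide by the same factor.
adjusted_catch_2008 = predicted_catch_2008 / catch_inflation_factor

print(adjusted_catch_2008, base_case_catch_2008)
```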

A question was asked regarding the stock status across the various models. The speaker clarified that the estimated status was relatively consistent, except in the Change Mrecent scenario where the F reference point was unrealistically high.

A WGMG member recollected that at last year’s meeting, a conclusion was that you may not be successful in correcting the retrospective issue, but here it sounds like the conclusion is that the catch advice may turn out to be insensitive. The presenter agreed, reiterating that in this set of simulations, it appeared that if one follows the logic of a given retrospective‐fix all of the way through to the final catch advice, then that advice may be consistent.

A WGMG member asked why the presenter chose to apply the SSBrho to adjust catch rather than the Frho. The presenter responded that he thought the SSB would be more related to MSY (catch) because it is a measure of abundance. Ultimately, there are other ways to do it, but none of them have been evaluated for this presentation. Often, the rho for F and SSB was similar (though usually with opposite signs), although they can differ when different age ranges are used for the two metrics.

2.9 WP 9 – Jan Jaap Poos and co-author: Comprehensive discard reconstruction and abundance estimation using flexible selectivity functions

2.9.1 Abstract

The additional mortality caused by discarding may hamper a sustainable use of marine resources, especially if it is not accounted for in stock assessment and fisheries management. Assessment of the stock size requires information on the total catch, consisting of both landings and discards. Generally, long and relatively precise time‐series on age‐structured landings exist, but historical discards estimates are often lacking or imprecise.

The flatfish fishery in the North Sea is a mixed fishery targeting a range of flatfish species, mainly sole and plaice. Owing to the gear characteristics and a Minimum Landing Size (MLS) for the species, considerable discarding occurs, especially for juvenile plaice. Consistent discard sampling by on‐board observers is only available since 1999, from a limited number of commercial fishing trips.

In the paper reported to WGMG (Aarts and Poos 2008), the authors develop a statistical catch‐at‐age model with flexible selectivity functions to reconstruct historical discards and estimate stock abundance. They do not rely on simple predefined selectivity ogives, but use spline smoothers to capture the unknown non‐linear selectivity and discard patterns, and allow these patterns to vary in time.

The model is fitted to the age‐structured landings, discards and survey data. This statistical population dynamic model also allows selection of the most appropriate model complexity and estimates of uncertainty for all population and fishery characteristics, according to objective criteria. We discuss the results and possible extensions of this model and make comparisons to the current perceptions of the stock dynamics.

WGMG discussed the issue of handling zeros in this model. The presenter reported that they just added a constant initially, moving subsequently to half the smallest observation for each age. In any case, there are so few zeros in the example given that it doesn’t really matter how they are handled. Adding 1 to all observations to account for zeros was a problem, because it removed variance at older ages, and it was suggested that it would be safer to treat zeros as missing or (even better) to make appropriate distributional assumptions. A WGMG member found that adding constants artificially imposes structure (e.g. for successive “zeros”, adding a constant imposes a “no‐trend” structure).
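The loss of variance from adding a constant can be seen in a small example. The sketch below uses invented catch numbers at an old age and assumes the observations enter the model on a log scale; it also shows the “half the smallest observation” replacement mentioned by the presenter.

```python
import math
from statistics import pvariance

def log_variance(values, constant=0.0):
    """Variance of log(value + constant) across observations."""
    return pvariance([math.log(v + constant) for v in values])

def half_min_positive(values):
    """Replacement value for zeros: half the smallest positive observation."""
    return 0.5 * min(v for v in values if v > 0)

# Hypothetical small catch numbers at an old age (no zeros, to allow the raw log).
old_age_catch = [0.2, 0.5, 0.1, 0.8, 0.3]

var_raw = log_variance(old_age_catch)
var_plus_one = log_variance(old_age_catch, constant=1.0)

# Adding 1 compresses small values towards log(1) = 0 and shrinks the variance.
print(round(var_raw, 3), round(var_plus_one, 3))
print(half_min_positive([0.0, 0.4, 0.2]))
```

Because the constant is large relative to the observations at older ages, nearly all of the between‐year signal is removed, which is the problem noted in the discussion.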

The approach to modelling selectivity in the paper may be inflexible because it is unidirectional (spline parameters change linearly over time).

The reported flat likelihood surface seems counter‐intuitive given the relatively tight confidence bounds in SSB. It can be explained by correlation among estimates, as the F and N at‐age estimates have relatively large confidence bounds. The authors examined this aspect in detail and found some parameters that could be removed as a consequence of high correlation; this improved the fitting, but they were still left with an overall flat likelihood surface. The main problem is still inventing discards on the basis of relatively little information, as seen by large confidence intervals for discard estimates early in the time‐series.

It would be an instructive exercise to take an assessment with full time‐series of discards and randomly drop a number of years to see if this model (or the Bayesian model in WP 13) can accurately estimate the values. It is still likely that we cannot truly discern between discards, black landings, and changes in natural mortality early in the time‐series.

2.10 WP 10 – Anders Nielsen: Contemporary implementation of a state-space stock assessment model

2.10.1 Abstract

A full state‐space fish stock assessment model was introduced as an alternative to the commonly used (semi‐) deterministic approaches and fully parameterized statistical models. State‐space stock assessment models were introduced by Gudmundsson (1987, 1994) and used by Fryer (2001). Previous implementations have been based on the extended Kalman filter, which uses a first order Taylor approximation of the non‐linear parts of the model. The current implementation is based on the Laplace approximation which is better suited to handle non‐linearities. The current implementation has been applied to three different cod stocks.

A state‐space assessment model is proposed, because it has a number of appealing properties: It is a full statistical model and as such quantification of uncertainties is an integrated part of the model. It allows selectivity to evolve gradually in the data period. It is able to handle missing data (e.g. missing catches in a year). Finally it has fewer model parameters than other statistical assessment models, as quantities such as fishing mortalities (F) and stock sizes (N) are included in this model as so‐called random effects.

The model was validated by comparison to existing assessments and via simulated data. It was concluded that the state‐space assessment model was identifiable with respect to the model parameters, and capable of reconstructing the time‐series of spawning‐stock biomass (SSB) and average fishing mortality and quantifying their uncertainties. It was able to separate observation noise from process error, and if the model was restricted to having zero observation noise on catches it produced results similar to the (semi‐) deterministic approaches.

Why would one assume that fishing mortality is a random walk?

Why would one assume the variance parameters are constant over time?

ADMB is used and it was pointed out that ADMB is incredibly useful for problems involving estimation of more than a few parameters. The treatment of log F as a random walk was of interest in that F could be related to real changes in effort or management quotas over time. The author pointed out that modelling F as a random walk is unique to Gudmundsson (1994) and Fryer’s work. It was further noted that abrupt changes in F from one year to the next might not be modelled well with the random walk process. There was some confusion on the ability to estimate initial numbers for a given cohort. The author answered that these are predicted values. Some thought that confidence intervals seemed too optimistic (i.e. small), but the author thought they were comparable in scale to those derived from bootstrap results for VPA‐based models. Furthermore, the truth was “almost” always (e.g. ~95%) inside the confidence interval. The model could also be modified to detect change points (changes in the system behaviour).
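To illustrate what treating log F as a random walk implies, the sketch below simulates such a process and adds observation noise to a quantity derived from it. It is a simulation only, with invented standard deviations and starting value; it does not reproduce the estimation (Laplace approximation in ADMB) used in the working paper.

```python
import numpy as np

rng = np.random.default_rng(42)
n_years = 30
sigma_process = 0.1  # hypothetical SD of annual random-walk steps in log F
sigma_obs = 0.2      # hypothetical SD of observation noise on the log scale

# State equation: log F_y = log F_{y-1} + process error.
log_f = np.empty(n_years)
log_f[0] = np.log(0.5)
for year in range(1, n_years):
    log_f[year] = log_f[year - 1] + rng.normal(0.0, sigma_process)
f = np.exp(log_f)

# Observation equation (illustrative): a noisy log-scale index of the true state.
log_obs = log_f + rng.normal(0.0, sigma_obs, size=n_years)

# Largest year-to-year change in log F that this process produced.
max_step = np.abs(np.diff(log_f)).max()
print(round(float(max_step), 3))
```

With a small process standard deviation the simulated steps stay small, which is why an abrupt management‐driven change in F may be smoothed over unless the process variance is allowed to be large or a change point is modelled.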

2.11 WP 11 – Chris Legault and Liz Brooks: Incorporation of bootstrap-derived landings uncertainty in VPA

2.11.1 Abstract

A three‐level bootstrap was used to estimate uncertainty in landings at‐age for a number of Northeast US groundfish stocks. Bootstrapping was applied at the port level by market category, for lengths within port sample, and ages within length samples. This process mimics the standard approach to estimating landings at‐age and has been incorporated into the software used for this purpose. A separate analytic approach to estimating uncertainty in landings at‐age was conducted for two stocks and produced similar results, confirming the applicability of bootstrapping for this situation. This bootstrapping process has been used to quantify the level of uncertainty in landings at‐age relative to other sources in the stock assessments, to provide guidance on setting the plus group in the assessment, and to provide feedback to the port samplers in terms of modifying sampling requests by stock. The bootstrapping process also lends itself to use in a VPA where the catch matrix can be replaced by bootstrapped values for each year to generate uncertainty in estimated parameters such as fishing mortality, total stock size, and spawning‐stock biomass. Results were compared to the standard approach to estimate uncertainty in VPA, bootstrapping only the tuning indices, as well as a combination run where both catch and indices were bootstrapped.
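A much simplified sketch of a three‐level resampling scheme of this kind is shown below for a single market category. The data structures, the function names and the way the age‐length key is applied are assumptions for illustration; expansion to total landings weight and the handling of multiple market categories are omitted, and the software actually used is not described in this summary.

```python
import random
from collections import Counter

def resample(items, rng):
    """Draw len(items) elements from items with replacement."""
    return [rng.choice(items) for _ in items]

def bootstrap_replicate(port_samples, ages_at_length, rng):
    """One bootstrap replicate of proportions at age.

    port_samples: list of port samples, each a list of fish lengths.
    ages_at_length: dict mapping length -> list of ages read at that length.
    """
    # Level 3: resample aged fish within each length to get a bootstrap age-length key.
    key = {length: Counter(resample(ages, rng))
           for length, ages in ages_at_length.items()}
    numbers_at_age = Counter()
    for sample in resample(port_samples, rng):   # Level 1: port samples
        for length in resample(sample, rng):     # Level 2: lengths within sample
            n_aged = sum(key[length].values())
            for age, n in key[length].items():
                numbers_at_age[age] += n / n_aged
    total = sum(numbers_at_age.values())
    return {age: numbers_at_age[age] / total for age in sorted(numbers_at_age)}

# Hypothetical data: three port samples (lengths in cm) and ages read by length.
port_samples = [[40, 42, 45], [42, 45, 50], [40, 50, 50]]
ages_at_length = {40: [2, 2, 3], 42: [2, 3, 3], 45: [3, 4], 50: [4, 5, 5]}

rng = random.Random(1)
replicates = [bootstrap_replicate(port_samples, ages_at_length, rng)
              for _ in range(200)]
```

The spread of each age’s proportion across `replicates` gives a bootstrap estimate of its uncertainty, and the same replicates can be used to examine correlations among ages.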

Bootstrapping only the indices produced uncertainty in only the most recent years as a consequence of the convergence properties of VPA. Bootstrapping the landings produced uncertainty throughout the entire time‐series. The uncertainty in the terminal year estimates was slightly less for the catch bootstrap relative to the index bootstrap. Bootstrapping both catch and indices produced slightly more uncertainty in the terminal year than either individual case, but the amount of uncertainty was much less than the sum of the two individual sources.

A commonly seen feature of the landings samples as measured by either the bootstrap process or an analytical derivation is positive correlations among estimated values at‐age. This is attributable to the likelihood of sampling similar ages, especially old ages, among trips due to the overlap of length at‐age. The multinomial assumption is often used in statistical catch‐at‐age models to fit proportions of catch at‐age. However, the multinomial distribution has only negative or zero correlations among ages. To test if this assumption caused different results, two sets of data were created based on multinomial distributions, with an effective sample size of either 75 or 200 fish. These samples were expanded to the total catch weight for that year. As expected, larger samples had lower uncertainty in the VPA estimates. The trends differed slightly from the catch bootstrap case though, indicating that the positive correlations among ages in the catch bootstrap are important. The differences in trend were not so severe as to conclude that the multinomial assumption is fatally flawed, but more research should be conducted regarding this assumption in statistical catch‐at‐age models.
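The sign of the multinomial correlations follows directly from the standard multinomial moments: for proportions p_i and sample size n, Var(n_i) = n p_i (1 - p_i) and Cov(n_i, n_j) = -n p_i p_j, so the correlation between two ages is negative and does not depend on n. The short sketch below evaluates this for a hypothetical set of proportions at age (the values are illustrative, not taken from the working paper) and shows how the CV of an estimated proportion shrinks between effective sample sizes of 75 and 200.

```python
import math

def multinomial_correlation(p):
    """Correlation matrix of multinomial counts with cell probabilities p."""
    n_ages = len(p)
    corr = [[1.0] * n_ages for _ in range(n_ages)]
    for i in range(n_ages):
        for j in range(n_ages):
            if i != j:
                # Cov = -n p_i p_j and Var = n p_i (1 - p_i); n cancels out.
                corr[i][j] = -math.sqrt(
                    p[i] * p[j] / ((1.0 - p[i]) * (1.0 - p[j])))
    return corr

def proportion_cv(p_i, effective_n):
    """CV of an estimated proportion under multinomial sampling."""
    return math.sqrt((1.0 - p_i) / (effective_n * p_i))

proportions = [0.1, 0.3, 0.4, 0.2]   # hypothetical proportions at age
corr = multinomial_correlation(proportions)
cv_low_n = proportion_cv(proportions[0], 75)
cv_high_n = proportion_cv(proportions[0], 200)
```

Every off-diagonal entry of `corr` is negative, which is why a multinomial resampling scheme cannot reproduce the positive correlations among ages seen in the catch bootstrap.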

A WG member asked how the effective sample sizes for the multinomial distribution were chosen. The presenter explained that they were chosen because they are “rules of thumb” for low vs. high assumptions of uncertainty, and have been commonly used in assessments.

A question was asked whether the years in which the multinomial and catch bootstrap results diverged were the years in which the positive correlations were largest. The presenter confirmed that this was the case.

A WG member pointed out that an EU project on estimating uncertainty concluded that model misspecification rather than data uncertainty made a bigger difference. Although inclusion of uncertainty in catch increased the size of confidence intervals within a given model, the trends and advice were not impacted. Another WG member mentioned that he had conducted a similar simulation, but used a CV to alter the CAA each year; however, he did not constrain total annual catch to have the same total weight. It was also mentioned that there are a lot of studies on this same topic that can be found in ICES papers and the journal Fisheries Research.

2.12 WP 12 – Noel Cadigan: On the foundations for inference in fish stock assessment by sequential population analysis

2.12.1 Abstract

We consider the basis for statistical inference in sequential population analysis (SPA). We focus on the type of SPA routinely used in Northwest Atlantic groundfish stock assessments. We show that current methods used for SPA statistical inferences weight survey indices differently than the implicit and intuitive weighting that many fisheries scientists use to track cohorts in survey data. Scientists focus on the ages that tend to be caught well in the survey, whereas common SPA fitting methods focus on ages not well caught. We present an alternative method based on a different stochastic model for survey indices that is more consistent with the implicit methods used by fisheries scientists. This alternative approach is based on a detailed consideration of the sources of variability in survey indices. In addition, we also consider the issue of conditional statistical inference, and how this applies to SPA. Appropriate conditioning makes frequentist inferences relevant, and we suggest that it may be appropriate to condition on the total survey catches‐at‐age, over all years; however, further study is required.

A WGMG member suggested that the author was not aging all fish, only subsampling upwards. How would that sampling fit into the conditioning idea? Is there a way to fit an effective sample size? Sampling is going to vary from year to year.

The presenter replied that this would indeed generate additional (and unknown) complexity to consider; currently, this process is not considered in the models.

In addition, subsampling and the use of age‐length keys to get age distributions of the samples may bias the variance estimates that are now done by age. This would not be a problem if the entire catch were sampled.

Ideally, we would like to bring in more information in our stock assessments, like tagging, or catch‐at‐age by gear type information; the SPA does not necessarily allow for this. Other methods are probably more appropriate for this. A Bayesian or state space approach may be the way forward.

We should try to make our inferences as informative as possible. There is probably information in the catch at‐age matrix that is not used in the SPA.

2.13 WP 13 – Carmen Fernández and co-authors: A Bayesian stock assessment model incorporating discards estimates in some years

2.13.1 Abstract

A Bayesian age‐structured stock assessment model was presented that takes into account the information available about discards and is able to handle gaps in the time‐series of discards estimates. The mechanism for doing this is to explicitly incorporate a term in the model reflecting mortality due to discarding, and to make appropriate assumptions about how this mortality may change over time. The result is a stock assessment that takes due account of discards while, at the same time, producing a complete time‐series of discards estimates. The method is applied to the hake stock in ICES divisions VIIIc and IXa, which experiences very high discarding on the younger ages. The stock is fished by Spain and Portugal, and for these countries there are only discards estimates in recent and non‐coincident years. Two runs of the model were performed; one assuming zero discards and another one incorporating discards. Results were compared and discussed and possible implications for management briefly commented on.
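One common way to represent a mortality term for discarding is to split total mortality into landed, discarded and natural components within a Baranov-type catch equation. The sketch below assumes that form purely for illustration; the actual model equations of the working paper are not given in this report, and the function name and all numerical values are hypothetical.

```python
import math

def catch_components(n_start, f_land, f_disc, m):
    """Split one year's deaths among landings, discards and natural causes.

    Assumes a Baranov-type formulation with total mortality
    Z = F_landings + F_discards + M (an assumption for this sketch).
    """
    z = f_land + f_disc + m
    deaths = n_start * (1.0 - math.exp(-z))
    return {
        "landings": deaths * f_land / z,
        "discards": deaths * f_disc / z,
        "natural": deaths * m / z,
        "survivors": n_start * math.exp(-z),
    }

# Hypothetical young age group with heavy discarding.
young = catch_components(n_start=1000.0, f_land=0.1, f_disc=0.4, m=0.2)
```

Under this kind of formulation, years without discard observations still receive a discard estimate, because the discard mortality term is part of the population dynamics rather than an add-on to the catch data.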

The method appears to be comprehensive and provides safeguards lacking in other discard reconstruction methods. However, discards are linked to effort in the fisheries landing the stock, so the model would be problematic if discards come from other fisheries that do not land the given stock.

Survey residuals do not change, because the autocorrelation estimates are close to 1 (B‐adapt in reverse) and there are few data in the 1991‐2001 period.

One could attempt cross‐validation, dropping one data point at a time to check its influence on model estimates.

It would be useful to try the method on a stock with a long time‐series of data. WGMG concluded that the method should do well when provided with good survey information.

2.14 WP 14 – Coby Needle: Developments in SURBA: uncertainty estimation and new implementations

2.14.1 Abstract

Stock assessments based on research vessel surveys or other fishery‐independent sources of information are becoming increasingly important as drivers of fisheries management advice, in Europe and elsewhere. In some cases this approach has arisen as a consequence of stringent management measures which have led to less reliable commercial catch and effort data; in others, the stock trends indicated by fishery‐independent data are used as informative counterparts to more traditional assessment methods. An important feature of any survey‐based method should be the estimation of the variance (or distribution) of output quantities such as mortality and abundance. There are many ways to estimate these variances. In this paper (Needle and Hillary 2007), we explore the characteristics of five such methods, namely the analytic delta method, residual and data bootstraps, parametric multinomial resampling, and Bayesian methods. We use each approach to analyse the variance and bias properties of linear and non‐linear model fits to simple bivariate data, before considering the implications for a simple separable survey‐based stock assessment model (SURBA). We conclude that variance estimators need to be considered carefully for such models, as the incorrect choice can result in misleading fisheries management advice. Recent work, including implementations in Excel and R, is also reported.

There are a number of different ways in which this question could be addressed, and WGMG members suggested the use of ADModelBuilder, state‐space approaches, and modified (optimized) SURBA implementations in SAS and R as possibilities. Comments focused mainly on ways in which the SURBA method could be improved, rather than methods for estimating variance, but this is probably appropriate – before considering variance, it is important to understand whether or not the method produces biased results.

With sufficient replicates, the parameter simulation approach should give results identical to those of the delta method, and it is much easier to code.
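The comparison between the two approaches can be illustrated with a toy case. The sketch below assumes a single parameter with a normal sampling distribution and the derived quantity g(theta) = exp(theta); both the function and the numbers are hypothetical and are not taken from SURBA. The delta method uses the first derivative at the estimate, while parameter simulation draws parameter values and evaluates the function directly.

```python
import math
import random
import statistics

def delta_method_variance(mu, sd):
    """First-order (delta method) variance of exp(theta)."""
    # g(theta) = exp(theta), so g'(mu) = exp(mu).
    return (math.exp(mu) * sd) ** 2

def simulated_variance(mu, sd, n_draws, rng):
    """Variance of exp(theta) from parameter draws theta ~ N(mu, sd)."""
    draws = [math.exp(rng.gauss(mu, sd)) for _ in range(n_draws)]
    return statistics.variance(draws)

rng = random.Random(42)
v_delta = delta_method_variance(1.0, 0.05)
v_sim = simulated_variance(1.0, 0.05, 20000, rng)
```

With a small parameter standard error the two estimates agree closely, because the function is nearly linear over the range of the draws. For strongly non-linear functions or large parameter uncertainty the first-order approximation would be expected to drift away from the simulated value.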

The interpretations of Bayesian and frequentist approaches are different, and we should not interpret Bayesian properties in a frequentist sense. We could derive Bayesian CIs that are similar to frequentist ones, but this is not necessarily sensible.

Optimised R coding could help speed up SURBA.

It is difficult to draw inferences for management from wide CIs – managers have to deal with this though, and uncertainty is not necessarily a hindrance to advice.

2.15 WP 15 – Benoit Mesnil: Detecting changes in time-trends (FISBOAT)

2.15.1 Abstract

Detecting changes in time‐series data collected to monitor a process is a fairly common problem in many areas (manufacturing and quality control, health and medicine, econometrics, environment, etc.), and a wide array of methods has been developed over decades to deal with the issue. The presentation summarized an exploration of some of these methods, carried out during the EU project FISBOAT (Fishery Independent Survey Based Operational Assessment Tools).

One family of methods is known as Statistical Process Control (SPC), and includes many tools, among which the Control Charts routinely used in industry since the 1930s. These are graphical procedures used to monitor a process online, i.e. the chart is updated with every new observation and an alarm is raised if the chart crosses some predefined control limit; otherwise, wait for the next observation. The charts’ parameters can be tuned to achieve the desired trade‐off between detecting worrisome changes quickly and keeping the risk of false alarms low. Performance of control charts is commonly measured with reference to their Average Run Length (ARL), which is the expected number of sampling events until an alarm (false or justified) is raised; ARLs for given parameter settings can be looked up from published tables or computed with adequate software.
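For the simplest type of control chart the ARL has a closed form, which makes the trade-off concrete. The sketch below assumes a two-sided Shewhart-type chart on independent, standardized normal observations (an assumption chosen for illustration, not the chart examined in the presentation). If each observation raises an alarm with probability p, the run length is geometric and the ARL is 1/p.

```python
import math

def shewhart_arl(limit_sd, shift_sd=0.0):
    """ARL of a two-sided chart with limits at +/- limit_sd.

    Observations are assumed independent and normal with unit standard
    deviation; shift_sd is the true shift of the mean in sd units.
    """
    def upper_tail(x):
        # P(Z > x) for a standard normal variable.
        return 0.5 * math.erfc(x / math.sqrt(2.0))

    p_alarm = upper_tail(limit_sd - shift_sd) + upper_tail(limit_sd + shift_sd)
    return 1.0 / p_alarm

arl_in_control = shewhart_arl(3.0)               # false alarms only
arl_shifted = shewhart_arl(3.0, shift_sd=1.0)    # mean shifted by one sd
```

With one observation per year, an in-control ARL of several hundred means false alarms are very rare, but the shifted ARL shows that a one-sd change would also take decades to detect on average with this kind of chart, which is part of the motivation for charts that accumulate evidence over time.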

The focus was on the Decision Interval form of the Cumulative Sum (CUSUM) chart, which is designed to detect small persistent changes; it is also appropriate when one has a single measurement at each sampling event (e.g. one annual survey index or assessment output). Applications on simulated and real (IBTS recruitment index for cod) data showed that the CUSUM was effective in detecting changes even in a background of large variability. A difficulty that was identified for fisheries – or marine ecosystems – applications is the need to characterize an in‐control state or period, for which reference mean and standard deviation can be computed and used to standardize the full stream of observations: surveys typically span a relatively short period of time, when stocks and ecosystems were already altered.
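A decision-interval CUSUM of the kind described can be sketched in a few lines. The version below is a textbook two-sided form, assuming observations standardized with a reference mean and standard deviation, an allowance k and a decision interval h. The settings k = 0.5 and h = 4 and the data are illustrative choices, not those used in the presentation.

```python
from statistics import mean, stdev

def cusum_decision_interval(series, ref_mean, ref_sd, k=0.5, h=4.0):
    """Two-sided decision-interval CUSUM; returns the chart and alarm times."""
    upper, lower = 0.0, 0.0
    path, alarms = [], []
    for t, x in enumerate(series):
        z = (x - ref_mean) / ref_sd          # standardize with reference state
        upper = max(0.0, upper + z - k)      # accumulates upward deviations
        lower = min(0.0, lower + z + k)      # accumulates downward deviations
        path.append((upper, lower))
        if upper > h or lower < -h:
            alarms.append(t)
            upper, lower = 0.0, 0.0          # restart the chart after an alarm
    return path, alarms

# Hypothetical index: ten in-control years followed by a persistent increase.
reference = [10, 12, 9, 11, 10, 8, 11, 9, 10, 10]
recent = [12, 13, 12, 13, 12, 13]
path, alarms = cusum_decision_interval(
    reference + recent, mean(reference), stdev(reference))
```

In this example the chart stays quiet during the reference period and raises its first alarm in the third year after the shift. The dependence on `reference` illustrates the difficulty noted above: the result hinges on being able to nominate an in-control period.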

The presentation also showed trials with some tools belonging to the Change‐Point family of methods, taking advantage of their availability in R packages (strucchange and bcp). In contrast with SPC methods, these are “off‐line” methods: that is, the analysis is conducted after all data have been collected. The tools tried were able to identify the date of change in the simulated data but were less convincing in applications to the NS cod IBTS data. However, no firm conclusion should be drawn from such a superficial exploration.

Some useful lessons for ICES were suggested:

1 ) Detection of changes or trends in an underlying process is a problem common to many fields and disciplines, which has mobilized the attention of eminent specialists. It would be of advantage for ICES to consider the methods developed in other fields; thoroughly validated methods can be borrowed in particular from the SPC world. Expertise on these methods is likely to exist on the environmental side of ICES (notably WGSAEM); it would be beneficial to improve the collaborations between the methodological support groups of the fisheries and environment research within ICES.

2 ) Most change detection methods can provide assessments of the state of systems in a fully objective way (no subjective settings by users) and similar conclusions from the same data can be reached by independent experts. This is a considerable advantage in a context characterized by wide uncertainties on cause‐effect relations, large natural and measurement variability, and heated controversies. We note, however, that the notions of “worrisome change”, “acceptable risk of false alarm” or “desired performance” of the detection procedures are largely political issues that need to be decided in partnership with managers and stakeholders.

3 ) SPC methods accept that large inherent variability may exist in the reference state and strive to filter this out before raising an alarm. Moreover, they make it explicit that the risk of false alarms (which come with a cost) should be kept minimal. Over‐reactive advice to fisheries managers has at times cast assessment noise straight into the decision system, resulting in long‐lasting damage to the credibility of scientists. Methods that deliberately avoid this happening should be helpful.

Can this approach be used to detect when ICES advice should be reopened or not due to incoming year‐class strength? The idea is to see if assessment WGs are just jumping at noise, rather than responding to true change. However, there is inertia in all the examples provided, so they cannot respond quickly without greatly increasing the risk of false alarm.

Could this approach be used to determine whether to use a lower or higher value for recruitment in the terminal year of an assessment? It is difficult to anticipate the answer, but the presenter suggested that he might be able to look at the issue during this meeting.

It is not clear how well these approaches work when there is a slow trend over time instead of a step change. We could detrend first and look at residuals. In fisheries, we could look at residuals to a stock‐recruitment relationship instead of recruitment directly, for example. The main difficulty in fisheries is that we have only one observation each year instead of being able to collect samples every five minutes as they do in industrial settings.

It is possible to use these methods with the output from assessments, e.g. SURBA or XSA, to see if there has been a “significant” change in SSB.

Have these methods been tested on haddock‐like cases with strong pulse changes? No, but it would be easy enough to try it.

2.16 WP 16 – Joachim Gröger: Analysis of interventions and structural breaks

2.16.1 Abstract

I briefly summarized the features and potential of four different classes (types) of detection instruments for fisheries problems in WP16:

1 ) Econometric methods to detect structural breaks (Multiple regression models).

2 ) Time series methods to detect interventions (ARIMAX models).

3 ) The analysis of means (ANOM).

4 ) Illustrative graphical methods such as traffic light plots that help to identify and locate potential changes through quantile‐based colour coding of time‐series.
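The quantile-based colour coding behind a traffic light plot can be sketched as below. The choice of terciles, the colour names and the convention that high values are "green" are assumptions made for illustration; the working paper's own coding scheme is not described in this report.

```python
import statistics

def traffic_light(series):
    """Colour each value by the tercile of the series it falls in."""
    # statistics.quantiles with n=3 returns the two tercile cut points.
    lower_cut, upper_cut = statistics.quantiles(series, n=3)
    colours = []
    for value in series:
        if value < lower_cut:
            colours.append("red")
        elif value > upper_cut:
            colours.append("green")
        else:
            colours.append("amber")
    return colours

index = [5, 1, 9, 3, 7, 2]   # hypothetical annual index values
colours = traffic_light(index)
```

Plotting the resulting colours along the time axis makes runs of one colour, and hence candidate locations of a change, easy to see by eye.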

It should be mentioned that methods 1 to 3 are “static” methods assuming that the location of the break or intervention is known beforehand. Given this, these methods needed to be made “iterative” first by allowing them to scan the time‐series and find (estimate) the prospective location of a potential structural break or intervention. This was done by using a search algorithm that basically looks for the best value of a quality‐of‐fit criterion. I used the mean squared error (MSE) in cases where the degrees of freedom were stable and Akaike’s information criterion (AIC) otherwise. At the same time I tested the structural break (intervention) variable(s) for significance and plotted the resulting p values associated with the potential break along with the values of the quality‐of‐fit criterion. I implemented these algorithms for all procedures in SAS, Version 9.1.3. Besides using a specific real‐world dataset (births, New York, 320 days in 1965) that was manipulated to mimic different situations by creating different types of structural breaks and interventions (different combinations of changes in trends and levels), the procedures were also applied to real North Sea cod (IBTS, 1st quarter, age 1 data, 1971 – 2008) and haddock data (recruitment data, 1963 ‐ 2007) during the WGMG meeting.
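The scanning idea can be illustrated with the simplest possible break model. The sketch below assumes a single shift in level (no trend and no ARIMAX terms), fits a separate mean on each side of every candidate break and keeps the location with the smallest mean squared error. It is written in Python rather than SAS, omits the significance tests described above, and uses hypothetical data.

```python
def scan_for_level_break(series, min_segment=3):
    """Scan candidate break locations for a single shift in mean level."""
    n = len(series)
    results = []
    for tau in range(min_segment, n - min_segment + 1):
        before, after = series[:tau], series[tau:]
        mean_before = sum(before) / len(before)
        mean_after = sum(after) / len(after)
        sse = (sum((x - mean_before) ** 2 for x in before)
               + sum((x - mean_after) ** 2 for x in after))
        # The number of parameters is the same for every tau, so MSE is
        # a fair criterion here; AIC would be needed if it varied.
        results.append((tau, sse / n))
    best_tau, best_mse = min(results, key=lambda r: r[1])
    return best_tau, best_mse, results

# Hypothetical series with an upward shift after the sixth observation.
series = [5.1, 4.8, 5.3, 4.9, 5.0, 5.2, 7.9, 8.2, 8.0, 7.8, 8.1, 8.3]
best_tau, best_mse, profile = scan_for_level_break(series)
```

Plotting `profile` against the candidate location gives the kind of quality-of-fit curve described in the text, with a clear minimum where the break is strong relative to the noise.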

How many data points are needed for one of these more in‐depth methods to work? It depends on the type (order) of the process. If it is a simple process, then it is similar to a regression model. It may be necessary to difference the time‐series to make it essentially stationary; other approaches to detrending are also possible.
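Differencing, one standard way of removing a trend before fitting a time-series model, can be sketched as follows. The function name and the data are illustrative only.

```python
def difference(series, order=1):
    """Return the series differenced `order` times."""
    out = list(series)
    for _ in range(order):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out

linear_trend = [2, 4, 6, 8, 10]
quadratic_trend = [1, 4, 9, 16, 25]
d1 = difference(linear_trend)               # constant after one difference
d2 = difference(quadratic_trend, order=2)   # constant after two differences
```

Each round of differencing costs one observation, which matters for the short annual series typical of fisheries data.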

Can these methods be used in real time or are they better when they have the full time‐series? The biggest problem is if the intervention point is close to the end of the time‐series, but this is true for all methods, not just the ones presented here.

An advantage of time‐series analysis is that several data points can be forecast, so the future effect of an intervention at a given time can be seen. A transfer function has been used to forecast herring recruitment.

Plenty of methods are available. An awkward problem with these methods is inference: the intervention must be strong enough to be distinguished from normal variability. Some of the methods presented here could detect even a rather small intervention.

Can these methods detect pulses if there is more than one breakpoint?

3 Subgroup 1: Management strategy evaluation and retrospective bias

Participants: Chris Legault (Subgroup Chair), José De Oliveira, Liz Brooks, Lionel Pawlowski, Chris Darby, Anders Nielsen.

3.1 Management Strategy Evaluations

3.1.1 Full vs. shortcut

3.1.1.1 Methods

When performing Management Strategy Evaluations (MSEs) of proposed management plans, where the management plan relies on an assessment model coupled with a short‐term forecast and application of a Harvest Control Rule (HCR), an approach that has been used in the past is to approximate the management plan for the purposes of the evaluation. This approximation typically takes the form of simulating the behaviour of the assessment model by generating values directly from the operating model (the underlying “truth”) with variance and bias that is assumed to reflect the behaviour of the assessment model. A further approximation is to ignore the short‐term forecast required for the year following the final assessment data year but preceding the year for which a TAC is needed, known as the intermediate year. Short‐term forecast assumptions can differ markedly from the operating model, with potentially serious consequences for the performance of HCRs being evaluated. These consequences could remain hidden if the intermediate year lag is ignored when conducting an MSE. For example, Figure 3.1.1.1(a) shows trajectories for a full MSE, whereas Figure 3.1.1.1(b) shows the corresponding trajectories for an MSE where both the assessment and intermediate‐year lags are ignored. The trajectories show quite different behaviour, the full MSE showing cyclical behaviour, which does not appear in the approximated MSE. In this case the approximation produces a different perception of how the HCR impacts the underlying “true” population.

This section provides an initial look at the consequences of short‐cutting an MSE by approximating the management plan in two steps: first the assessment is omitted, and then the intermediate year lag is removed. The basis for the work is the cod recovery plan evaluations performed by ICES‐AGCREMP (2008), and relies on data from ICES‐WGNSSK (2008). Evaluations are based on the FLR‐framework (Kell et al., 2007). Only a subset of the options presented in ICES‐AGCREMP (2008) is considered here, namely an operating model where additional mortality is caused by misreporting or by natural mortality, and where the management procedure (assessment, short‐term forecast and HCR) makes the correct adjustment. In both these cases, the HCR applies a TAC constraint where appropriate, but a third case is also considered, where for the operating model with catch misreporting, the HCR does not apply a TAC constraint. These three options are evaluated for the EU and Norway HCRs, making a total of 6 options. Results for these six options are obtained for each of three cases, namely the full MSE, the MSE where the assessment is ignored, and the MSE where both the assessment and the intermediate year lag are ignored.
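Why a time lag in the management loop can change the dynamics may be seen in a deliberately simple toy model. The sketch below is not the FLR evaluation and has nothing to do with the cod operating models: it assumes a deterministic logistic population and a constant harvest-rate rule, and all parameter values are hypothetical. The only difference between the two runs is whether the rule sees the current biomass (`lag=0`, the shortcut) or the biomass from two years earlier (`lag=2`, mimicking the gap between the last data year and the TAC year).

```python
def simulate(years, lag, r=0.5, k=1000.0, harvest_rate=0.2, b0=300.0):
    """Toy feedback loop: logistic growth with a lagged harvest rule."""
    biomass = [b0]
    catches = []
    for t in range(years):
        # The rule sees the biomass from `lag` years ago (0 = no lag).
        perceived = biomass[max(0, t - lag)]
        catch = harvest_rate * perceived
        growth = r * biomass[t] * (1.0 - biomass[t] / k)
        biomass.append(biomass[t] + growth - catch)
        catches.append(catch)
    return biomass, catches

shortcut_b, shortcut_c = simulate(30, lag=0)
lagged_b, lagged_c = simulate(30, lag=2)
```

With these settings both runs settle at the same equilibrium, but the lagged run overshoots it and oscillates on the way, whereas the run without a lag approaches it smoothly. An evaluation that drops the lag would therefore not show the oscillation at all.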

3.1.1.2 Results

Results are given in Figures 3.1.1.2 to 3.1.1.4 for two quantities of interest in three years (SSB and landings yield in years 2010, 2015 and 2020) for the three operating model‐TAC constraint combinations considered. Although comparisons would usually be made between HCRs for any given operating model, of interest in these plots is how the distributions of each quantity change for each HCR under the different levels of approximation. The plots show only slight changes to the median and spread of these distributions for the cases where the TAC constraint is applied (Figures 3.1.1.2‐3). More marked differences result when the TAC constraint is removed (Figure 3.1.1.4), with a lower median and tighter spread for later years for the EU rule, but there is a less consistent picture for the Norway rule because of the cyclical behaviour shown in Figure 3.1.1.4.

Another important consideration for approximation is whether the same conclusions are reached in terms of relative HCR performance when approximate MSEs are conducted compared to when a full MSE is performed. Figure 3.1.1.5 plots the 6 operating model‐TAC constraint‐HCR options considered in Figures 3.1.1.2 to 3.1.1.4 for two composite summary statistics, namely one where normalized landings yield and the probability of SSB being above Blim are combined with equal weighting, and another where only the landings yield associated with SSB values above Blim are considered. These composite summary statistics attempt to reflect the type of choices managers would make between competing objectives (e.g. maximizing yield and minimizing the probability of SSBs falling below Blim) when selecting HCRs. These plots indicate that different conclusions in terms of the relative performance of HCRs may well be reached under the different approximation cases, particularly considering performance in the longer term (2012 onwards). For example, in Figure 3.1.1.5(a), one could conclude that there is little to choose between the EU and Norway rules when the operating model is catch misreporting (compare c.1.N.c.85 with c.1.E.c.85), but the Norway rule performs much better in terms of this composite statistic for the same operating model under the approximate MSE (compare c.1.N.c.85 with c.1.E.c.85, Figure 3.1.1.5(c)). In this case, the approximate MSE leads to one rule being clearly favoured over another, whereas this would not have been the case had a full MSE been performed.
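The two composite statistics can be sketched for one option and one year as below. The report does not state how landings yield was normalized or exactly how yield below Blim was treated, so this sketch assumes division by a reference yield and counts yield as zero in realizations where SSB is not above Blim. The function name and all values are hypothetical.

```python
def composite_stats(ssb, yields, blim, yield_ref):
    """Two illustrative composite statistics over MSE realizations."""
    n = len(ssb)
    # Probability that SSB is above Blim across realizations.
    p_above = sum(s > blim for s in ssb) / n
    # Mean landings yield scaled by a reference yield (assumed normalization).
    norm_yield = (sum(yields) / n) / yield_ref
    equal_weight = 0.5 * norm_yield + 0.5 * p_above
    # Yield counted only in realizations where SSB is above Blim.
    safe_yield = sum(y for s, y in zip(ssb, yields) if s > blim) / n
    return equal_weight, safe_yield

# Hypothetical realizations for one HCR option in one year.
ssb = [80.0, 120.0, 150.0, 60.0]
yields = [40.0, 50.0, 60.0, 30.0]
equal_weight, safe_yield = composite_stats(ssb, yields, blim=70.0, yield_ref=60.0)
```

Because both statistics mix yield and risk, two HCRs can rank differently on them even when their separate SSB and yield distributions look similar, which is the effect discussed in the text.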

3.1.1.3 Conclusions

• Perceptions of the impact of HCRs on the underlying population can change with different levels of approximation in the MSE, particularly when considering the longer term (Figure 3.1.1.1). These changes are not always obvious when considering only individual distributions of quantities of interest (e.g. SSB and landings yield).

• It is possible that using an approximated MSE (e.g. omitting both the assessment and intermediate‐year lag) could lead to perceptions of superior performance of one HCR relative to another, in terms of summary statistics, that would not be reached if a full MSE were conducted.

3.1.1.4 Recommendations

Although approximated MSEs may provide similar results to full MSEs in the short term, it is recommended that, on the basis of differences between approximated and full MSEs in the longer term demonstrated in this study, full MSEs should be conducted in order to evaluate whether HCRs are able to meet longer term objectives.

Figure 3.1.1.1. Comparison of trajectories for a full MSE (a), and an approximated MSE where both the assessment and intermediate‐year lag are omitted (b). The red trajectories are the median and interquartile trajectories for the underlying operating model, the blue ones for the management procedure, and the grey ones show actual realizations of operating model trajectories.

Figure 3.1.1.2. Additional mortality due to catch misreporting, TAC constraints applied. Estimates of SSB (left) and landings yield (right) for three years (2010, 2015, and 2020). The plots compare three levels of MSE approximation: HCR.2 = full MSE, HCR.3 = assessment omitted, HCR.4 = assessment and intermediate‐year lag omitted. The HCRs considered are EU and Norway.

Figure 3.1.1.3. Additional mortality due to natural mortality, TAC constraints applied. Estimates of SSB (left) and landings yield (right) for three years (2010, 2015, and 2020). The plots compare three levels of MSE approximation: HCR.2 = full MSE, HCR.3 = assessment omitted, HCR.4 = assessment and intermediate‐year lag omitted. The HCRs considered are EU and Norway.

Figure 3.1.1.4. Additional mortality due to misreported catch, TAC constraints omitted. Estimates of SSB (left) and landings yield (right) for three years (2010, 2015, and 2020). The plots compare three levels of MSE approximation: HCR.2 = full MSE, HCR.3 = assessment omitted, HCR.4 = assessment and intermediate‐year lag omitted. The HCRs considered are EU and Norway.

Figure 3.1.1.5. Comparison of 3 approximation cases (top: full MSE, middle: assessment omitted, bottom: assessment and intermediate‐year lag omitted) for two composite summary statistics.

3.1.2 Splitting survey series in response to retrospective patterns

3.1.2.1 Methods

A management strategy evaluation study was conducted to examine whether splitting survey series would produce improved catch advice for assessment situations where retrospective patterns were present. The comparison was made between splitting the survey series and not taking any action to address the retrospective pattern. Alternative methods to address the retrospective pattern, such as estimating unreported catch in recent years, were not examined in this exercise due to time limitations.

A number of NOAA Fisheries Toolbox programs were used in the study with the simulated population following characteristics similar to the Georges Bank yellowtail flounder stock assessment. Live stock assessments and projections were included in the MSE, and the management strategy was simply to fish at F=0.25 in the projection years. Four sources were used to cause retrospective patterning in the unmodified VPAs: 1) change in survey catchability, 2) decrease in M for early years, 3) increase in M for recent and projection years, and 4) overreporting of catch in early years. Each of the four sources initially assumed a large step change, but additional examinations allowed for gradual changes between states. A fifth source, underreporting of recent and projection catch, was attempted, but coding issues prevented inclusion of this source in the results.
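What splitting a survey series means for the catchability parameter can be illustrated in isolation. The sketch below assumes the survey index is proportional to abundance (index = q × abundance) and estimates q as the mean of the log ratios, once for the whole series and once for each block of a split series. It treats abundance as known, whereas in a VPA abundance and q are estimated together, and the function names and data are hypothetical rather than taken from the Toolbox programs.

```python
import math

def log_mean_q(index, abundance):
    """Catchability as the back-transformed mean of log(index / abundance)."""
    log_ratios = [math.log(i / n) for i, n in zip(index, abundance)]
    return math.exp(sum(log_ratios) / len(log_ratios))

def fit_catchability(index, abundance, split_at=None):
    """One q for the full series, or one q per block if split_at is given."""
    if split_at is None:
        return [log_mean_q(index, abundance)]
    return [log_mean_q(index[:split_at], abundance[:split_at]),
            log_mean_q(index[split_at:], abundance[split_at:])]

# Hypothetical series in which catchability doubles after the third year.
abundance = [100.0, 120.0, 90.0, 110.0, 95.0, 105.0]
index = [50.0, 60.0, 45.0, 110.0, 95.0, 105.0]
single_q = fit_catchability(index, abundance)
split_q = fit_catchability(index, abundance, split_at=3)
```

A single q lands between the two true values, so the index is misread in both periods, while the split fit recovers each period's catchability at the cost of an extra parameter.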

In each MSE, 100 realizations were generated and distributions of the following response variables were measured during the feedback loop of projected years: 1) realized F in the population, 2) spawning‐stock biomass in the population, 3) the difference between the SSB estimated in the final VPA and the true SSB, 4) annual landings, 5) change in annual landings, and 6) total landings over all years in the projection horizon. These response variables measure how well the management strategy works in terms of achieving its goal of F=0.25, how the population responds to the management rule when the VPA is potentially biased, and any costs to the fishery in terms of reduced landings or increased variability in annual landings. An initial base case which did not exhibit a retrospective pattern was also examined to see if splitting surveys in situations when it was not necessary caused problems.

3.1.2.2 Results

The base case situation which did not exhibit a retrospective pattern did not differ whether the survey series were kept together or split (Figure 3.1.2.1). There was a slight increase in the uncertainty of the response variables, as expected because there are unnecessary parameters being estimated, but this was quite minor.

In all other cases examined, splitting the survey series produced F in the population much closer to the target level, caused the true SSB to grow to a higher level, had smaller annual changes in landings, and had only slight or no decrease in total landings during the projection horizon (Figures 3.1.2.2‐3.1.2.7). For the gradual change in survey catchability case, three splits for the surveys were examined: at the start, middle, and end of the period of change. Splitting the surveys at the end of the change period produced the best management advice, in terms of achieving the target F of 0.25 (Figure 3.1.2.8), but all three performed much better than ignoring the source of the retrospective pattern (Figure 3.1.2.2 top left panel).

3.1.2.3 Discussion

Fishing mortality rates in 2007 and 2008 are the same in the population between the full time‐series and split time‐series for all cases due to initialization of MSE. This is why there is no variability in the underlying true parameters for the realized fishing mortality rate in the population, but landings and SSB do vary among realizations.

When the true source of the retrospective pattern differed from the fix, biases in the VPA could be quite strong. However, the feedback loop of projections from the VPA with split survey series produced management advice much closer to the desired level than ignoring the retrospective pattern. This does not mean that other methods to remove the retrospective pattern would not perform well. In fact, other work presented in this report suggests that any of the methods to address the retrospective pattern will produce correct advice (Section 3.3). However, if external information is available regarding the source of the retrospective pattern, it should be used to guide choice of methods to produce an assessment without a retrospective pattern: a survey split may not always be appropriate.

3.1.2.4 Conclusions

In all cases examined, splitting the survey series produced fishing mortality rates in the population much closer to the target value than ignoring the source of the retrospective pattern in the original assessment. This conclusion held independent of the source of the retrospective pattern and whether it was a step change or gradual change. However, the best performance was found for models which most closely met the assumption underlying the split survey approach: a change in survey catchability or a sudden step change in process. Splitting of surveys is not recommended as a routine fix for all assessments exhibiting retrospective patterns: external information should be used to guide the decision about how to address an assessment with a retrospective pattern.

3.1.2.5 Recommendations

WGMG recommends that a wider range of situations should be examined in this MSE approach to evaluating retrospective patterns in stock assessments. These include: alternative fixes to the retrospective patterns (especially changes in catch reporting), stocks with more ages and lower fishing mortality rates, and large recruitment pulses.

Figure 3.1.2.1. Comparison of 100 management strategy evaluations for the base case when the surveys were maintained and when the surveys were split between 1994 and 1995. There are six paired comparisons with the full time‐series on the left and the split survey series on the right for each pair. The horizontal red line in the Realised F in Population plots denotes the target F in the MSE (0.25).

Figure 3.1.2.2. Comparison of 100 management strategy evaluations for the increase in survey catchability case when the surveys were maintained and when the surveys were split between 1994 and 1995. There are six paired comparisons with the full time‐series on the left and the split survey series on the right for each pair. The horizontal red line in the Realised F in Population plots denotes the target F in the MSE (0.25).

Figure 3.1.2.3. Comparison of 100 management strategy evaluations for the decrease early M case when the surveys were maintained and when the surveys were split between 1994 and 1995. There are six paired comparisons with the full time‐series on the left and the split survey series on the right for each pair. The horizontal red line in the Realised F in Population plots denotes the target F in the MSE (0.25).

Figure 3.1.2.4. Comparison of 100 management strategy evaluations for the increase recent M case when the surveys were maintained and when the surveys were split between 1994 and 1995. There are six paired comparisons with the full time‐series on the left and the split survey series on the right for each pair. The horizontal red line in the Realised F in Population plots denotes the target F in the MSE (0.25).

Figure 3.1.2.5. Comparison of 100 management strategy evaluations for the decrease early catch case when the surveys were maintained and when the surveys were split between 1994 and 1995. There are six paired comparisons with the full time‐series on the left and the split survey series on the right for each pair. The horizontal red line in the Realised F in Population plots denotes the target F in the MSE (0.25).

Figure 3.1.2.6. Comparison of 100 management strategy evaluations for the gradual increase in survey catchability case when the surveys were maintained and when the surveys were split be‐tween 1994 and 1995. There are six paired comparisons with the full time‐series on the left and the split survey series on the right for each pair. The horizontal red line in the Realised F in Popula‐tion plots denotes the target F in the MSE (0.25).

Figure 3.1.2.7. Comparison of 100 management strategy evaluations for the gradual increase in M case when the surveys were maintained and when the surveys were split between 1994 and 1995. There are six paired comparisons with the full time‐series on the left and the split survey series on the right for each pair. The horizontal red line in the Realised F in Population plots denotes the target F in the MSE (0.25).

Figure 3.1.2.8. Comparison of realized F in the populations when survey catchabilities gradually increase and the survey split is made at the start, middle, or end of the period over which the survey catchabilities change.

3.2 Analyzing and summarizing MSE outputs

MSE studies produce enormous amounts of output, which can present a challenge when interpreting, summarizing and conveying information to managers. WGMG focused on the results produced by two of the MSE presentations: North Sea cod (José de Oliveira) and Bay of Biscay anchovy (Lionel Pawlowski). At the outset, the group discussed trying to develop a suite of tools, algorithms, analyses, and graphical displays that would permit a quick visual synthesis of the information, and which did not rely on an exhaustive scrutiny of an overwhelming table. The examples offered below are a first attempt at addressing this task: however, the “best” way to display results will depend on the specifics of a given analysis, and the particular question being asked.

3.2.1 PCA and Cluster analysis

Summary tables from MSEs are often difficult to interpret due to the large number of scenarios evaluated and the variety of outputs generated. It is desirable to identify which parameters have the strongest impacts on a management plan and also to be able to group scenarios having similar results together, in order better to summarize the results for the target audience. Furthermore, because MSEs generally require a substantial amount of time for model runs, such summaries can be used to determine which scenarios to concentrate on by highlighting those for which the impact of changing some parameters is known to be very strong. Knowing the sensitivity of the MSE for some parameters can also provide some guidance to improve the models involved.

A principal component analysis (PCA) and a cluster analysis were performed on a subset of outputs from the MSE of the Bay of Biscay anchovy (WP5) which consisted of 32 scenarios (Table 3.2.1.1). The following outputs were used for the analyses: median SSB, median SSB at 10 years, average catches (and their standard deviations), probabilities of falling below Blim and of closure (once and during the decade), total duration of the periods during which SSB is below Blim, total duration of closure and time necessary to recover to Blim. Although harvest rate is another parameter of the MSE, ranging between 0 and 1 in increments of 0.1, only runs with harvest rate set to 1.0 were selected in order to keep the number of scenarios tractable. As changing the harvest rate has an obvious impact on the outputs, it is advisable to check results with other subsets of data. The consistency of the results has been checked with other rates (0.2, 0.4, 0.6, and 0.8).

PCA (Figure 3.2.1.1) and cluster analysis (Figure 3.2.1.2) identified 7 and 8 groups of scenarios, respectively. Results show that changing the type of allocation of TAC between France and Spain from constant to variable has almost no effect on the results. In most cases, except for simulations with low recruitment and HCR rule A, setting a minimum TAC of 7000 t to open the fishery does not substantially affect the results. By contrast, simulations are highly dependent on the choice of recruitment model. Setting a ceiling of 33000 t for the TAC and the type of HCR have major effects on the outputs.
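The grouping step of such a cluster analysis can be sketched as follows. This is a minimal illustration only: the scenario names, the output values, the function names and the choice of average linkage are all assumptions made for the example, and are not the WP5 outputs or necessarily the method used by the authors. Outputs are standardized first so that variables on large scales (SSB, catch) do not dominate the distances.

```python
import math

# Hypothetical outputs per scenario: (median SSB, mean catch, P(SSB < Blim)).
# Names and values are invented for illustration only.
scenarios = {
    "low_const": (30000.0, 9000.0, 0.40),
    "low_var": (30500.0, 9100.0, 0.39),
    "ricker_const": (80000.0, 25000.0, 0.05),
    "ricker_var": (79000.0, 24800.0, 0.06),
}

def standardize(table):
    """Scale each output column to zero mean and unit standard deviation."""
    cols = list(zip(*table.values()))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return {name: tuple((x - m) / s for x, m, s in zip(row, means, sds))
            for name, row in table.items()}

def average_linkage(z, n_groups):
    """Merge the two closest clusters repeatedly until n_groups remain."""
    clusters = [[name] for name in z]

    def dist(a, b):
        # Mean Euclidean distance over all cross-cluster pairs.
        pairs = [(p, q) for p in a for q in b]
        return sum(math.dist(z[p], z[q]) for p, q in pairs) / len(pairs)

    while len(clusters) > n_groups:
        candidates = [(i, j) for i in range(len(clusters))
                      for j in range(i + 1, len(clusters))]
        i, j = min(candidates, key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters.pop(j)
    return [sorted(c) for c in clusters]

groups = average_linkage(standardize(scenarios), n_groups=2)
print(groups)
```

In this invented example the scenarios separate by recruitment model rather than by allocation type, which mirrors the kind of conclusion drawn above; with real MSE output the number of groups would be chosen from the dendrogram.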

Results from PCA and cluster analysis may be quite difficult to use for people with little or no statistical background. Therefore it is advisable to consider how including these analyses in a final report of an MSE would add a clear message for the target audience (which could be managers, fishermen or the general public). Those analyses are intended to help scientists to divide the final results into major groups of simulations instead of producing an extensive list of results from each scenario.

Table 3.2.1.1. Excerpt from the summary table of MSE on Bay of Biscay anchovy. PCA and cluster analysis were performed on this subset of data.

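[Plot: axis label rot[, 2], ticks from -0.4 to 0.4. Points are labelled by scenario code (ACLNN, ACL3N, ACLN7, ACL37, AVLNN, AVL3N, AVLN7, AVL37, BCLNN, BCL3N, BCLN7, BCL37, BVLNN, BVL3N, BVLN7, BVL37). Annotated groups: A/B - Ricker, no TACmax; A/B - Ricker, TACmax; B - Low; A - Low, TACmin; A - Low, no TACmin.]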

Figure 3.2.1.1. Loadings plots (principal components 2 and 3) of the PCA applied to MSE for the Bay of Biscay anchovy.

[Dendrogram: branches are labelled by recruitment model (Low, Ricker), HCR rule (A), TACmax (33000 t, No) and TACmin (7000 t, No).]

Figure 3.2.1.2. Cluster dendrogram of the MSE of the Bay of Biscay anchovy.

3.2.2 Factor sensitivities and some methods of summarization

3.2.2.1 Sensitivity of output to MSE Factors (ANOVA)

Using the Bay of Biscay anchovy case study (WP5), first‐level effects were evaluated with an ANOVA in PROC GLM (part of the SAS package). The response variables were yield (or landings), discards, catch (yield+discard), spawning‐stock biomass (SSB), and recruitment. Explanatory factors in the linear model were the operating model (OM), the observation error model (OEM), the alpha‐multiplier in the Ricker stock recruit function (SR), the harvest control rule (HCR), and whether or not there was a constraint on the minimum TAC (minTAC). The partial sums of squares (Type III) were scaled relative to the total sums of squares as a method of determining the proportion of variability that an individual factor explained, given that all factors were in the linear model. No higher order interactions were considered: however, they are clearly important as the single order effects generally explained a very small proportion of total model variance. The motivation for considering only first order effects was to determine whether there were any factors that could potentially be dropped from future MSEs based on very low relative explanatory contributions. As an illustration, Table 3.2.2.1 summarizes the proportion of total model error explained by each factor for the response variables yield and SSB. Alternatively, this information could be assessed graphically (Figure 3.2.2.1). The conclusion from this example is that the operating model (OM) and the multiplier applied to alpha in the Ricker stock recruit function (SR) were responsible for explaining the most variation in results, although the degree of explanation depended on the response variable considered. The observation error model (OEM) explained a small amount of variation in SSB. The harvest control rule (HCR) does not appear to explain much variation.
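The scaling of factor sums of squares by the total can be illustrated with a small sketch. It assumes a balanced design, in which the Type III sums of squares for main effects coincide with the classical main‐effect sums of squares computed from marginal means; this is a simplification and not a substitute for PROC GLM. The factor levels, the response values and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical balanced design: one response value per (OM, SR) combination.
# Factor names follow the text; the numbers are invented for illustration.
runs = [
    {"OM": "catch", "SR": "1.0", "yield": 110.0},
    {"OM": "catch", "SR": "0.5", "yield": 90.0},
    {"OM": "m", "SR": "1.0", "yield": 80.0},
    {"OM": "m", "SR": "0.5", "yield": 60.0},
]

def proportion_explained(runs, factors, response):
    """Return {factor: main-effect SS / total SS}, assuming a balanced design."""
    values = [r[response] for r in runs]
    grand = sum(values) / len(values)
    total_ss = sum((v - grand) ** 2 for v in values)
    out = {}
    for f in factors:
        by_level = defaultdict(list)
        for r in runs:
            by_level[r[f]].append(r[response])
        # Squared deviation of each marginal mean from the grand mean,
        # weighted by the number of runs at that level.
        ss = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in by_level.values())
        out[f] = ss / total_ss
    return out

props = proportion_explained(runs, ["OM", "SR"], "yield")
print(props)
```

Because the invented data are purely additive, the two proportions sum to one here; in the analysis described above the main effects explained only a small share of the total, which points to interactions and between‐iteration variability.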

The influence of levels within factors could be evaluated by examining marginal means (Table 3.2.2.2). From this sort of analysis, one could determine if there are levels within a factor that could be dropped from future considerations.

Note that these results are not for a particular year; rather they are summarized over all years in the MSE. Subsequent analyses could consider interaction terms, which could allow the same sort of winnowing and possible justification for not conducting a full factorial simulation design. WGMG suggested that other techniques from the field of data mining, such as dendrograms, could provide insight regarding the similarity of influence among model factors.

3.2.2.2 How many iterations are enough? (Subsampling results)

By their very nature, MSEs are time consuming, so the subgroup pursued another angle on reducing the dimension of these simulations by considering the number of realizations of each strategy and whether information is lost if fewer runs are made. The North Sea cod MSE (WP4) was carried out using 250 iterations for each combination of model factors (a total of 26 different models). Box percentile plots were made for the response variables (yield and SSB in year 2025) with all 250 iterations, and for subsamples of 100 and 50 (Figure 3.2.2.2). For SSB, the smallest sample size appeared to cover the same interquartile range; however, the extreme values in the upper tail were not part of the range. For yield, it appeared that the 75th percentile was also dependent on the number of iterations considered.
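A subsampling check of this kind can be sketched as below. The simulated values are a hypothetical stand‐in for the 250 iterations of one scenario (lognormal noise around an arbitrary level); the seed, the distribution and the summary statistics chosen are assumptions for illustration, not the WP4 results.

```python
import random
import statistics

def quartiles(values):
    """Return the 25th, 50th and 75th percentiles."""
    return statistics.quantiles(values, n=4)

rng = random.Random(2008)  # fixed seed so the sketch is repeatable
# Hypothetical stand-in for 250 iterations of SSB in one scenario.
full = [100000.0 * rng.lognormvariate(0.0, 0.4) for _ in range(250)]

summary = {}
for size in (250, 100, 50):
    # Subsample without replacement from the full set of iterations.
    sub = full if size == len(full) else rng.sample(full, size)
    summary[size] = {"quartiles": quartiles(sub), "max": max(sub)}

for size, stats in summary.items():
    print(size, [round(q) for q in stats["quartiles"]], round(stats["max"]))
```

Comparing the quartiles and the maximum across sample sizes gives a quick indication of whether the central range is stable while the upper tail is being lost, which is the pattern described for SSB above.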

Recognizing that trade‐offs must be made between model dimensionality and the number of iterations, one can envision an iterative process where important factors are identified through ANOVA or other analysis, levels within factors are examined (and either dropped or new levels suggested), and then the number of iterations is examined. From such a pilot study, one can settle on the suite of factors and levels and perform subsequent MSEs with the appropriate iteration number.

3.2.2.3 Summarizing multiple results simultaneously (4-D plots)

It was desired to be able to compare model results in several dimensions and at the same time to be able to evaluate trade‐offs between results. An example of a plot that allows the trade‐offs between yield and SSB to be evaluated through time and across the levels of the factor OEM (Observation Error Model) is given in Figure 3.2.2.3. Contours of SSB levels are overlaid on yield contours, and one can see that level 2 of OEM maximized both yield and SSB, although SSB is maximized by year 2020 while the yield is maximized a few years later.

Note that in this example, the factor levels for OEM are categorical, and this plot is interpolating between these categories. In this situation, the interpolation is inappropriate because there is no natural continuum between categories. If a different MSE had been run, and if the plotted factor had levels which were numeric (e.g. values of steepness between 0.2 and 1.0 in increments of 0.2), then the interpolation would be more meaningful. Therefore, one should interpret these plots more carefully when the factors are categorical rather than numeric.

3.2.2.4 Visualizing trade-offs through time and using weighting factors to guide management decision (Contour plots)

As an alternative to overlaying contours for two response variables (as in the 4‐D plots of Section 3.2.2.3), one could weight each of the response variables and plot the weighted sum. This was explored for output from the North Sea cod MSE (WP4 and Section 3.1.1) with weights that were illustrative rather than prescriptive. As many of the response variables are related, the correlations were examined and only three of the outputs were retained as being “more independent” relative to some of the other response variables (Table 3.2.2.3). Catch, discard, and yield have obvious correlations as catch is the sum of yield and discard. SSB and total biomass also have obvious correlations, and discard and recruitment were correlated because most of the discards were on young of the year. The subgroup selected three response variables (yield, probability that SSByear>Blim, and recruitment) as possible outputs of interest to management. Their lower relative correlations removed concern that the same variable would get ‘extra weight’ if another response that was highly correlated had been included. Two separate weight vectors were applied across three response variables: w1=[0.5, 0.5, 0.0] and w2=[0.5, 0.25, 0.25]. These two weighting structures give equal weight to the fishery (as measured by yield) and to the population (as measured either by SSB alone, or as SSB and recruitment). Because yield and recruitment are on absolute scales, and SSB is measured with a probability, it was necessary to put all response variables on a similar scale before applying the weights. Both yield and recruitment were normalized to the range [0, 1] within years over all models (m=26) and iterations (i=250) as follows:

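$$\tilde{y}_{m,year,i} = \frac{y_{m,year,i} - \min_{m,i}\left(y_{m,year,i}\right)}{\max_{m,i}\left(y_{m,year,i}\right) - \min_{m,i}\left(y_{m,year,i}\right)}$$

$$\tilde{r}_{m,year,i} = \frac{r_{m,year,i} - \min_{m,i}\left(r_{m,year,i}\right)}{\max_{m,i}\left(r_{m,year,i}\right) - \min_{m,i}\left(r_{m,year,i}\right)}$$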

The weights, w1 and w2, were then applied to the normalized response variables and the mean of the weighted sum of these variables for a given model was plotted for each year (Figure 3.2.2.4). In this case, the different weighting options produced similar qualitative conclusions between models, but incorporating recruitment into the weighted sum produced a lower overall scale in the results.
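The normalization and weighting can be sketched for a single year as follows. The weight vectors are those given in the text, in the order (yield, probability that SSB>Blim, recruitment); the model labels, the data values and the function names are hypothetical, and the real calculation pooled 26 models and 250 iterations per year.

```python
# results[model][iteration] = (yield, P(SSB > Blim), recruitment) for one year.
# Model labels and numbers are hypothetical.
results = {
    "model_1": [(100.0, 0.9, 300.0), (80.0, 0.9, 200.0)],
    "model_2": [(60.0, 0.6, 250.0), (40.0, 0.6, 100.0)],
}

def min_max(values):
    """Scale values to [0, 1] using the pooled minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def weighted_score(results, weights):
    """Mean weighted sum per model for one year."""
    keys = [(m, i) for m in results for i in range(len(results[m]))]
    y = min_max([results[m][i][0] for m, i in keys])  # pooled over models and iterations
    r = min_max([results[m][i][2] for m, i in keys])
    p = [results[m][i][1] for m, i in keys]           # a probability, already on [0, 1]
    scores = {m: [] for m in results}
    for (m, _), yi, pi, ri in zip(keys, y, p, r):
        scores[m].append(weights[0] * yi + weights[1] * pi + weights[2] * ri)
    return {m: sum(s) / len(s) for m, s in scores.items()}

w1 = (0.5, 0.5, 0.0)
w2 = (0.5, 0.25, 0.25)
score_w1 = weighted_score(results, w1)
score_w2 = weighted_score(results, w2)
print(score_w1)
print(score_w2)
```

With these invented numbers the ranking of the two models is the same under both weightings, while the scores under w2 are lower overall; different data could of course change either outcome.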

A second method of summarizing results for yield and SSB over models and years was attempted. In this case, the yield for a given model, iteration and year was multiplied by 1 if the SSB in that case exceeded Blim, or by 0 if it did not. The mean of this ‘success weighted’ yield ($\hat{y}$) across iterations for a given model was then calculated yearly:

$$\hat{y}_{m,year} = \frac{1}{n}\sum_{i=1}^{n} p_{m,year,i}\, y_{m,year,i},$$

where n is the number of iterations, and pm,year,i=1 if SSBm,year,i>Blim or pm,year,i=0 otherwise.

Contour plots of success weighted yield allowed model comparisons through time (Figure 3.2.2.5). This type of plot can be interpreted as the mean expected yield for successful harvest strategies.
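A minimal sketch of the success weighted yield for one model and year is given below. It follows the verbal description (the mean, across all iterations, of yield counted only when SSB exceeds Blim); the Blim value, the data and the function name are hypothetical.

```python
B_LIM = 70000.0  # hypothetical limit reference point, for illustration only

def success_weighted_yield(yields, ssbs, b_lim):
    """Mean over all iterations of yield, counted only when SSB exceeds Blim."""
    indicators = [1 if ssb > b_lim else 0 for ssb in ssbs]
    return sum(p * y for p, y in zip(indicators, yields)) / len(yields)

# Four hypothetical iterations for one model and year.
yields = [20000.0, 30000.0, 25000.0, 10000.0]
ssbs = [90000.0, 60000.0, 150000.0, 75000.0]

y_hat = success_weighted_yield(yields, ssbs, B_LIM)
print(y_hat)  # 13750.0
```

The second iteration has the highest yield but falls below the assumed Blim, so it contributes nothing to the mean; a strategy that achieves high yield only by depleting the stock is penalized accordingly.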

WGMG discussed these contour plots in terms of dividing rows to compare a particular MSE factor. For example, to compare harvest control rules, one could divide the results for EU models by those for Norway models where all other factors were alike. This would then produce a contour plot where results greater than 1 indicated that the factor in the numerator outperformed the factor in the denominator. WGMG had many ideas about alternative ways to summarize these data, but ultimately the comparison depends on the question being asked by the scientist or manager.
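The row division described above can be sketched as follows, for two models that differ only in the harvest control rule. The years, the values and the function name are hypothetical.

```python
# Success weighted yield by (HCR, year), all other factors held equal.
# The numbers are hypothetical.
swy = {
    ("EU", 2015): 50.0, ("EU", 2020): 66.0,
    ("Norway", 2015): 40.0, ("Norway", 2020): 60.0,
}

def hcr_ratio(table, numerator, denominator):
    """Ratio of two HCR rows by year; values above 1 favour the numerator."""
    years = sorted({yr for (hcr, yr) in table if hcr == numerator})
    return {yr: table[(numerator, yr)] / table[(denominator, yr)] for yr in years}

ratio = hcr_ratio(swy, "EU", "Norway")
print(ratio)
```

A contour plot of such ratios over years and over the remaining factor combinations would show at a glance where one rule outperforms the other.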

3.2.2.5 Application to Bay of Biscay Anchovy

For the Bay of Biscay MSE (WP5), WGMG investigated the possibilities of summarizing results and risks through a series of simple plots. Summary tables are hard to read, while plots can intuitively provide information about 1) the relative performances of harvest control rules against each other and 2) how changing one parameter of the fishery may affect catches, biomass and risks.

A common question when comparing HCRs is “which rule is better?” There is often no straight answer to this as it requires managers to define what they expect from the management plan. This leaves to scientists the task of providing managers with a summary of the relative performances of the HCRs. This is done here through average catch levels, median values of SSB, probabilities of being below Blim and risks of closure of the fishery. For the same harvest rate (11 different values) and constraints on TAC (4 combinations of minimum and maximum TAC), the above‐mentioned variables for each rule are simply compared with each other through scatterplots (Figure 3.2.2.6).

Overall, those plots show that average catch levels are higher with rule B for any harvest rate. The median values of the SSB are higher for rule A. The probability of falling below Blim is higher with rule B. As most series of points overlap each other, it appears that the different constraints on TAC do not substantially affect these variables. However, the probability of closure clearly depends on those constraints. When no minimum TAC is set, the fishery is open for any non‐zero value of TAC. In that case, the risks of closure are higher with rule B. When a TAC over 7000 t is necessary to open the fishery, the risks are higher for rule B when harvest rates are over 0.5. Below that value, risks are higher with rule A. This behaviour is explained by the fact that closure in the case of low TAC depends on the product of SSB and harvest rate. With low SSB and high harvest rates, rule B is more risky. With high SSB but low harvest rates, risks are higher for rule A. Capping or not capping the TAC at 33000 t does not appear to have substantial effects on the results.
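The reading of such a rule A versus rule B scatterplot can be expressed numerically as the share of paired points lying above the 1:1 line. The sketch below uses hypothetical catch values for three harvest rates under one TAC constraint; the function name is also an assumption.

```python
# Paired outputs for the same harvest rate and TAC constraint.
# Columns: harvest rate, mean catch under rule A, mean catch under rule B.
# The numbers are hypothetical.
pairs = [
    (0.2, 8000.0, 9000.0),
    (0.5, 15000.0, 17000.0),
    (0.8, 20000.0, 22500.0),
]

def share_above_diagonal(pairs):
    """Fraction of points where rule B exceeds rule A (above the 1:1 line)."""
    return sum(1 for _, a, b in pairs if b > a) / len(pairs)

share_b_higher = share_above_diagonal(pairs)
print(share_b_higher)  # 1.0
```

A share of 1.0 corresponds to all points lying on one side of the diagonal, as described for average catch; a share near 0.5 would indicate that the ranking of the rules depends on the harvest rate, as described for the probability of closure with a minimum TAC.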

The second series of figures (Figures 3.2.2.7 and 3.2.2.8) aims at replacing the summary table by two sets of interconnected plots. “Interconnected” means those plots are arranged to act as a graphical ‘lookup table’: among rules and constraints, by knowing the value of one factor (among harvest rate, average catches, median SSB and probability of being below Blim), the respective values of all the other factors can be graphically determined. Eight series of points (the combinations of the 2 rules and the 4 different constraints on TAC) are represented in each plot.

Figure 3.2.2.7 summarizes the principal outputs from the MSE, which are the average catch levels, median SSB and the probability of falling below Blim. Figure 3.2.2.8 focuses on risks according to the SSB: total duration of periods where SSB is below Blim, time necessary to recover to Blim and duration of closure.

For each figure, the reader can start a search from any plot. For example, s/he may want to know what the catch might be for a given level of SSB. The value of the average catch is determined graphically from the plot in the upper right corner. By moving horizontally to the left, the reader will find the corresponding harvest rate for the same catch level. By going down vertically to the lower left plot, the probability of falling below Blim is given for that harvest rate. This probability could also have been estimated from the plot in the lower right corner from the estimate of SSB. On a paper version of a report about MSE, the reader will only require a ruler to find all parameters. The use of plots instead of tables also shows directly all the options for the fisheries according to risks or expected catches. From the previous example, it is easy to see that getting higher SSB means lower catches, lower harvest rates and lower probabilities of falling below Blim.
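The ‘ruler’ lookup across interconnected plots amounts to piecewise‐linear interpolation along one series. The sketch below starts from a median SSB, recovers the harvest rate, and then reads off catch and risk; the series values and the function names are hypothetical and stand for a single rule and TAC constraint.

```python
# One hypothetical series (one rule and one TAC constraint).
harvest_rate = [0.2, 0.4, 0.6, 0.8, 1.0]
median_ssb = [90000.0, 80000.0, 70000.0, 60000.0, 50000.0]
mean_catch = [8000.0, 14000.0, 18000.0, 21000.0, 23000.0]
p_below_blim = [0.05, 0.10, 0.20, 0.30, 0.45]

def interpolate(x, xs, ys):
    """Piecewise-linear lookup of y at x; xs may be increasing or decreasing."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if min(x0, x1) <= x <= max(x0, x1):
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the plotted range")

def lookup_from_ssb(ssb):
    """Read harvest rate, catch and risk off the series for a given median SSB."""
    rate = interpolate(ssb, median_ssb, harvest_rate)
    return {
        "harvest_rate": rate,
        "mean_catch": interpolate(rate, harvest_rate, mean_catch),
        "p_below_blim": interpolate(rate, harvest_rate, p_below_blim),
    }

point = lookup_from_ssb(75000.0)
print(point)
```

The same interpolation could underlie an interactive version of the plots, in which moving a pointer along one axis updates the values read from the others.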

For the second figure (3.2.2.8), SSB needs to be known or estimated from the previous figure. From that value, the durations of closure and the total period where SSB is below Blim can directly be estimated by the two plots on the left of the figure. The time to recover to Blim is easily determined from the plots on the right side of the figure.

The working group commented that an interactive version of those plots would be useful for managers. The user would select a management strategy and, using scrollbars, pointers would move across plots and give the corresponding values of each variable. Due to time constraints, it was not possible to develop such a product during the meeting.

Table 3.2.2.1. Proportion of total error explained by each factor, calculated as (Type III SS)/(Total SS), for the response variables yield and spawning‐stock biomass (SSB).

          PROPORTION OF TOTAL SS EXPLAINED
FACTOR    YIELD    SSB
OEM       0.001    0.015
OM        0.056    0.056
HCR       0.000    0.001
SR        0.037    0.058
minTAC    0.000    0.000

Table 3.2.2.2. Marginal means of response variables to different levels of MSE factors (option LSMEANS in PROC GLM, SAS Institute).

FACTOR  LEVEL  YIELD    CATCH    DISCARDS  REC      SSB      BIOMASS
OEM     catc   87,368   111,083  23,714    273,824  74,569   307,055
OEM     m      92,762   114,157  21,394    308,786  276,440  567,033
OEM     wg     90,897   113,903  23,006    291,989  159,969  420,491
OM      cat    111,581  135,692  24,111    286,482  328,135  602,963
OM      m      69,104   90,402   21,299    296,585  12,518   260,090
HCR     EU     90,430   113,128  22,698    281,277  151,930  399,675
HCR     Norw   90,255   112,967  22,712    301,790  188,723  463,378
SR      1      107,583  133,034  25,451    365,343  330,526  671,089
SR      0.5    73,102   93,060   19,959    217,724  10,126   191,964

Table 3.2.2.3. Correlation between response variables in the North Sea cod MSE.

          YIELD  CATCH  DISCARDS  REC    SSB    BIOMASS
yield     1.00   0.99   0.55      0.19   0.41   0.46
catch     0.99   1.00   0.66      0.26   0.35   0.42
discards  0.55   0.66   1.00      0.51   -0.11  0.04
rec       0.19   0.26   0.51      1.00   -0.07  0.15
ssb       0.41   0.35   -0.11     -0.07  1.00   0.95
biomass   0.46   0.42   0.04      0.15   0.95   1.00

[Plot: Proportion of Total SS Explained (Type III) by Factor (OEM, OM, HCR, SR, minTAC); series: yield, ssb.]

Figure 3.2.2.1. Proportion of total error explained by each factor, calculated as (Type III SS)/(Total SS), for the response variables yield and spawning‐stock biomass (SSB).

Figure 3.2.2.2. Subsampling from the full 250 iterations to examine the effect on the distribution of spawning‐stock biomass (SSB) in year 2025 (left) or yield in 2025 (right).

Figure 3.2.2.3. Example of a 4‐D plot that summarizes two response variables of interest (yield and SSB) as overlaid contours. Note that the factor on the y‐axis is categorical with only three levels; thus, the plotting function has interpolated between these levels.

Figure 3.2.2.4. Contour plots of three response variables (yield, probability that SSByear>Blim, and recruitment level). The weights applied were arbitrary and for illustration only. Model factor levels are concatenated to the labels seen on the y‐axis, with a ‘.’ separating factors. The order of concatenation was OM (catch or natural mortality), the scalar applied to alpha in the Ricker (1.0 or 0.5), the harvest control rule (EU, or Norway), the observation error model (catch, natural mortality, or the working group status quo), and the scalar on the TAC (0.85 or 0.001).

Figure 3.2.2.5. Contour plot of Blim adjusted yield over time by management strategy factors. Within each of the 250 iterations for a particular model, yieldyear was multiplied by 1 if SSByear>Blim or by 0 otherwise. The value plotted is the mean (over 250 iterations) of this Blim adjusted yield.

[Scatterplot panels with axes labelled by rule (A, B): median SSB, P(SSB<Blim) and P(Closure); series: [7000t-33000t], [0-33000t], [7000t-no TAC max], Unlimited TAC.]

Figure 3.2.2.6. Comparison of performances between harvest control rules A and B for the Bay of Biscay anchovy MSE.

[Plot panels: axes include Harvest Rate (0 to 1) and Median SSB (0 to 100000); series: A [7000-33000t], A [0-33000t], A [7000-no max], A Unlimited TAC, B [7000-33000t], B [0-33000t], B [7000-no max], B Unlimited TAC.]

Figure 3.2.2.7. Interconnected plots summarizing the principal results from the MSE of the Bay of Biscay anchovy. The blue dotted arrows indicate the links between variables and how the figure should be read to estimate all variables.

[Plot panels: axes include Time to recover to Blim (0 to 3.5) and Median SSB (0 to 100000); series: A [7000-33000t], A [0-33000t], A [7000-no max], A Unlimited TAC, B [7000-33000t], B [0-33000t], B [7000-no max], B Unlimited TAC.]

Figure 3.2.2.8. Interconnected plots summarizing the risks for the MSE of the Bay of Biscay anchovy.

3.2.3 Conclusions

Conducting an MSE can be approached in a two‐step procedure. The first step would be done among the scientists, and would address the issue of model dimensionality: which factors are important, and how many levels and/or iterations should be retained for the ‘final’ MSE. The second step involves summarizing the information for managers and the general public, and this should be done graphically. All results of the MSE could be put into an appendix table for persons interested in the fine details.

3.2.4 Recommendations

WGMG suggests that graphical displays of large tables be considered for other ICES WGs involved in MSEs as they are a faster and easier product to interpret.

3.3 Catch advice when different methods are used to account for retrospective bias

3.3.1 Methods

Legault (WP1) calculated the adjustments to input data and model assumptions required to remove the retrospective bias within a series of assessment estimates and the subsequent consequences of making the adjustments in the provision of advice. It was found that either increasing the level of natural mortality or catch in recent years or splitting the survey time‐series could remove the retrospective patterns.

In the following section, three stock assessments which exhibited retrospective patterns were used as case studies to examine how catch advice varied with the method used to address the retrospective pattern. These case studies are not the actual stock assessments, but rather are illustrative of how such analyses can be conducted. The North Sea example examines annual changes in recent catch through the B‐ADAPT estimation process, block changes in recent M, and splitting the surveys. The two US examples examine block changes in catch and M, either early or recent, and splitting the surveys. The base case assessment catch advice was also adjusted by the SSB retrospective pattern and all the adjustments were compared to the base case catch advice without any adjustment to account for the retrospective pattern. In each case, reference points were re‐estimated, and catch advice for next year was generated assuming a target F and either constant catch or constant F between the last year in the VPA and the current year.

3.3.2 Results

The North Sea and Skagerrak cod assessment uses the B‐ADAPT model to estimate unallocated mortality that could arise from additional discarding, natural mortality and/or unrecorded catch. The relevant ICES working group (WGNSSK; ICES‐WGNSSK 2008) has noted that the estimation of unallocated removals, based on an assumption of constant survey catchability, removes the retrospective bias in the assessment and results in consistent fishing mortality and spawning biomass estimates from year to year. In order to examine the sensitivity of the advice provided for the North Sea cod to the alternative model assumptions, an exercise similar to that of Legault (WP1) was carried out. Adjustments to natural mortality and splitting of the survey time‐series were explored until the retrospective bias in the assessment during the last seven years was removed. The implications of adjusting the cod stock forecast using the alternative assumptions were then examined.

Corrections to the level of catch required have previously been examined by WGNSSK. The estimated unallocated proportion of the catch increased to its highest level in 2002, when the TAC was halved, and has since declined. A short‐term forecast (Table 3.3.1) with the input derived from the catch‐corrected assessment, based on status quo fishing in the year following the assessment and fishing at F0.1 (= 0.14) two years after the assessment, resulted in catch advice of 21 kt at F0.1 before correction for the unallocated mortality and 14 kt (a 33% reduction) after removal of the average unallocated removals from the total. This advice was similar to that based on the uncorrected catch data adjusted using a bias correction applied to the forecast catch based on the retrospective correction made to SSB each year.

Adjusting natural mortality by similar levels to those of the estimated catch bias did not remove the retrospective pattern in the assessment. Ad hoc exploration of the application of a constant change in natural mortality established that, in order to remove the retrospective bias, it was necessary to increase natural mortality by a factor of 2.5 from 2000 to 2008. The increase in natural mortality is considerably greater than the adjustment applied to the catch data. A forecast at status quo fishing in the year following the assessment and fishing at F0.1 (= 0.36, based on the revised natural mortality) two years after the assessment resulted in catch advice of 27 kt.

Splitting the survey series did not remove the retrospective pattern completely from the time‐series of assessment results; year‐to‐year bias was still present. For the cod assessment it would appear that an assumption of a constant change in catchability is not the complete solution to the retrospective bias. A forecast made on the assumption that splitting the survey could correct the problem, fishing at F0.1 (0.14) two years after the assessment, resulted in catch advice of 14 kt.

The Georges Bank yellowtail flounder‐like stock assessment (Table 3.3.2) produced a wide range of initial catch advice, from 3.1 to 10.5 kt. However, adjusting the “increase recent catch” case to account for the large underreporting in recent years (as implied by the need to increase catch threefold to remove the retrospective pattern) reduced the largest catch considerably. Adjusting the base case and the “change M early” case (which did not remove the retrospective pattern) also reduced the catch advice considerably. Finally, in the US, MSY is a cap to annual catch and so limited the “change catch early” and “change M recent” cases. The final catch advice was much more consistent than the original range of catch advice and all values were well below the initial catch advice from the base assessment. This consistency was also evident in Legault (WP1), and results from carrying changes all the way through to advice.

The Gulf of Maine winter flounder‐like stock assessment (Table 3.3.3) was quite similar to the Georges Bank yellowtail flounder‐like results. There was a wide range of terminal year F and SSB estimates and a wide range of reference points. Again, the “change M early” case did not remove the retrospective pattern and resulted in similar status to the base case assessment. However, adjusting these two cases to account for the retrospective pattern would produce the same stock status of F > Fmsy and SSB < ½ SSBmsy as the remaining four cases. The initial catch advice varied widely among the six cases, but once the same adjustments were made as in the Georges Bank yellowtail flounder‐like case, the final catch advice was much more consistent and always lower than the initial catch advice from the base case assessment.

3.3.3 Discussion

The one approach that consistently produced different catch advice was changing recent M. The cases examined required large increases in recent M (2.5‐ to 4‐fold) in order to remove the retrospective pattern. However, this implies radically different population dynamics in the stock and would be difficult to explain on biological grounds. It is unclear why larger changes in M than in catch are needed to remove the retrospective pattern.

The two US stocks had moving window analyses performed previously to determine the appropriate year in which to change catch, change M, or split the surveys. This analysis was not available for the North Sea stock, which caused some difficulties in determining when to split the surveys or adjust M. Using the B‐ADAPT approach, annual changes in catch can be estimated that are smaller than the block changes used in the US stocks.

When addressing retrospective patterns in a stock assessment, it is important to be consistent and follow through on the logic of the adjustments. For example, when missing catch is assumed as the source of the retrospective pattern, the catch advice must account for this missing catch when setting the quota. This assumes that whatever caused the retrospective pattern will continue in the near future.

3.3.4 Conclusions

Catch advice in the cases examined was relatively insensitive to the assumption made to address the retrospective patterns, being always lower than the original catch advice from the unadjusted base assessment due to the direction of the retrospective patterns.

WGMG recommends that benchmark assessments conduct this sort of sensitivity analysis when assessments exhibit retrospective patterns to determine the robustness of management advice to different approaches of removing retrospective patterns.

Table 3.3.1. Comparison of catch advice and derived parameters for a number of approaches to account for the retrospective pattern in the base assessment using data similar to the North Sea cod stock assessment.

Table 3.3.2. Comparison of catch advice and derived parameters for a number of approaches to account for the strong retrospective pattern in the base assessment using data similar to the Georges Bank yellowtail flounder stock assessment.

                        Base    Change    Change     Change    Change     Split
                        Case    C early   C recent   M early   M recent   Surveys
Fref                    0.27    0.27      0.27       0.27      0.30       0.23
Catch 2008              6.7     3.1       10.5       5.7       4.6        3.2
rho (SSB)               1.91    -         -          1.44      -          -
rho adjusted Catch      2.30    -         -          2.34      -          -
Catch Over-reporting    -       -         3          -         -          -
adjusted catch          -       -         3.5        -         -          -
MSY limit               -       2.34      -          -         2.86       -
Final 2008 Catch        2.3     2.3       3.5        2.3       2.9        3.2
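The adjustments in Table 3.3.2 can be retraced with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the form of the rho adjustment (dividing the initial catch by 1 + rho) is an assumption inferred from the tabulated values rather than a formula stated in the text.

```python
def rho_adjusted_catch(catch, rho):
    """Assumed rho adjustment: scale the initial catch advice by 1 / (1 + rho)."""
    return catch / (1.0 + rho)


def misreporting_adjusted_catch(catch, factor):
    """Divide advice by the catch multiplier that was needed to remove the pattern."""
    return catch / factor


def msy_capped_catch(catch, msy):
    """MSY acts as a cap on annual catch."""
    return min(catch, msy)


# Values taken from Table 3.3.2 (catch in kt).
final_catch = {
    "BaseCase": rho_adjusted_catch(6.7, 1.91),
    "Change C early": msy_capped_catch(3.1, 2.34),
    "Change C recent": misreporting_adjusted_catch(10.5, 3),
    "Change M early": rho_adjusted_catch(5.7, 1.44),
    "Change M recent": msy_capped_catch(4.6, 2.86),
    "Split Surveys": 3.2,  # no adjustment applied
}

for case, value in final_catch.items():
    print(f"{case:16s} {value:.2f}")
```

Under these assumptions the six final values fall between roughly 2.3 and 3.5 kt, compared with 3.1 to 10.5 kt before adjustment.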

Table 3.3.3. Comparison of catch advice and derived parameters for a number of approaches to account for the strong retrospective pattern in the base assessment using data similar to the Gulf of Maine winter flounder stock assessment.

Parameter             BaseCase  Decrease     Increase      Decrease  Increase  Split    range/
                                Catch Early  Catch Recent  M Early   M Recent  Surveys  mean
SSB2007               2621      1019         4080          2207      2830      1037     1.33
F2007                 0.1237    0.4418       0.4418        0.1513    0.169     0.4352   1.08
SSBmsy                4315      2270         9079          3325      1075      3786     2.01
Fmsy                  0.2753    0.281        0.281         0.2757    1.5052    0.2805   2.55
MSY                   1019      541          2164          786       861       900      1.55
Status
SSB/SSBmsy            0.61      0.45         0.45          0.66      2.63      0.27     2.79
F/Fmsy                0.45      1.57         1.57          0.55      0.11      1.55     1.51
Catch Advice (assume C2008 = C2007, F2008 = Fmsy)
Initial               990       377          1509          817       1968      376      1.58
SSB rho               2.15      0.14         0.14          1.53      0.01      0.29
rho adj Catch         315       -            -             323       -         -
catch overreporting   -         -            4             -         -         -
adjusted Catch        -         -            377           -         -         -
MSY limited           -         -            -             -         861       -
Final Catch           315       377          377           323       861       376      1.25
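The final column of Table 3.3.3 appears to be the range of the six cases divided by their mean. That reading is an assumption, checked here against three rows of the table; the helper name is hypothetical.

```python
def range_over_mean(values):
    """Spread of a row relative to its average: (max - min) / mean."""
    return (max(values) - min(values)) / (sum(values) / len(values))


# Rows of Table 3.3.3, in column order BaseCase ... Split Surveys.
ssb_2007 = [2621, 1019, 4080, 2207, 2830, 1037]
initial_catch = [990, 377, 1509, 817, 1968, 376]
final_catch = [315, 377, 377, 323, 861, 376]

for label, row in [("SSB2007", ssb_2007),
                   ("Initial catch", initial_catch),
                   ("Final catch", final_catch)]:
    print(f"{label:14s} {range_over_mean(row):.2f}")
```

The drop in this statistic from the initial to the final catch row is one way to express the statement that the adjusted advice is more consistent across cases.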

3.4 A state-space fish stock assessment model applied to a dataset with a strong retrospective pattern

3.4.1 Motivation

One of the unique features of the state‐space fish stock assessment model (WP 10; Section 4.2.2 this report) is that it allows the selectivity pattern to evolve over the entire time‐series. WGMG investigated whether a dataset with a strong retrospective pattern in F and SSB when assessed with a standard virtual population analysis would still exhibit a retrospective pattern when analyzed by the state‐space assessment model.

3.4.2 Model and data

The model is fully described in Section 4.2.2 of this document. The data are derived from an existing assessment of Georges Bank yellowtail flounder, provided by Chris Legault.

3.4.3 Results

There was no retrospective pattern in the estimated SSB from the state‐space model and only a slight hint of retrospective patterning in estimated F (Figure 3.4.1). The uncertainty was quite high in both SSB and F and encompassed the results from the retrospective runs with fewer years of data. The standardized residuals were well behaved in terms of relative magnitude among data sources, but did exhibit some patterning in terms of positive and negative blocks in both catch and index fits (Figure 3.4.2). The fishing mortality rates by age all showed a declining trend over the time‐series, but there was some change in selectivity at‐age over time (Figure 3.4.3).

3.4.4 Discussion

The small number of parameters in the random‐effects approach to this state‐space model enabled quick application of the model to this dataset. In contrast, application of a traditional statistical catch‐at‐age model with many more parameters would require a considerable amount of time to set up input, and numerous runs to determine appropriate weightings of different components of the objective function. The ADMB‐RE coding provided uncertainty estimates with short run‐times (a few minutes) that would require many hours of MCMC evaluations, even using an ADMB statistical catch‐at‐age model.

The trends in F and SSB resulting from this analysis lie in between those from VPA analyses using the standard inputs and those from splitting the survey series. However, the standard inputs result in a strong retrospective pattern, while splitting the survey series removes the retrospective pattern but implies a large change in survey catchability. This model addresses the conflict between the catch and survey data, which leads to the retrospective pattern in the base VPA, by adjusting both fits. However, while the retrospective pattern has been removed, there remains patterning in the residuals which would require further exploration. It is exactly the conflict between the catch and survey data which results in fairly wide confidence regions.

There were also a few unexpected results from this application which would require further exploration before this assessment could be put forward for use by managers. Specifically, the process error for abundance N at age 1 (that is, the amount of uncertainty in the stock‐recruitment relationship) was much less (approximately half) than the process error for all other ages. The much smaller process error for the stock‐recruitment relationship relative to survivorship for all other ages has not been observed in this model previously and warrants further research with simulated data.

3.4.5 Conclusions

The use of random effects in state‐space models for fisheries stock assessment appears to be a promising avenue for future research due to the limited number of parameters and quick run times to produce both point estimates and measures of uncertainty. These models can produce results without retrospective patterns in cases where traditional VPA assessments exhibited strong retrospective patterning.

WGMG recommends further exploration of these models with simulated data containing known sources of retrospective patterns, with emphasis on examining the trade‐offs between bias and uncertainty in estimated quantities.

Figure 3.4.1. Retrospective plot of SSB and F4_5. The thick solid line is the estimate from the entire data series, and the shaded area is the corresponding 95% confidence region. Each of the thin solid lines is an estimate from a retrospective run using data only up to the year of the red dot at the end of the line. The dashed lines are 95% confidence limits of the corresponding retrospective estimates.

Figure 3.4.2. Standardised residuals of the log‐catches and log‐indices.

Figure 3.4.3. The estimated fishing mortality time‐series by age.

3.5 Recommendations regarding retrospective patterns in stock assessment

WGMG examined a set of recommendations presented to the 2008 US Groundfish Assessment Review Meeting (GARM) Data Meeting regarding retrospective patterns in stock assessment. After reflection on results from the ICES WGMG in this and previous meetings, the following recommendations were agreed to:

1 ) Always check for the presence of a retrospective pattern as a matter of quality assurance (for both update and benchmark assessments). A strong retrospective pattern is a warning flag that the assumed processes in the model are not stationary. The presence and implications of a retrospective pattern as a source of uncertainty in the assessment should be clearly communicated to managers.

2 ) If a model shows a retrospective pattern, then consider alternative models or model assumptions such as changes in survey catchability, splitting surveys, adjusting M or adjusting catch. Although many working group analyses have demonstrated that it is usually possible to identify the timing associated with a change, it is not possible to identify the cause. Therefore, biological and fishery considerations should be explored as a basis for adjustments for retrospective patterns.

3 ) The methods working group is encouraged to develop objective and consistent criteria for the acceptance of assessments with retrospective patterns. When a moderate retrospective pattern is encountered:

3.1 ) Consider an alternative states of nature approach to advice. For example, there may be different hypotheses about natural mortality. To explore ‘alternative states,’ one could perform the assessment with M and again with 0.5*M and compare the management advice that each assessment would produce.

3.2 ) Investigate the performance of alternative methods for retrospective adjustments through management strategy evaluations.

3.3 ) Evaluate the change in catch advice under different methods to address the retrospective pattern.

4 Subgroup 2: Uncertainty in stock assessment models

Participants: Carmen Fernández (Subgroup Chair), Noel Cadigan, Anders Nielsen, Tim Miller, Jan Jaap Poos.

4.1 SURBA

In this section the efficacy of standard confidence intervals (CIs) is investigated for a highly parameterized stock assessment model. The SURBA model (Needle 2008) was chosen because it is a relatively simple model to implement, and is highly parameterized. The analyses were motivated by WP 14, which presented hypotheses about variance estimation in SURBA. The original intention of the subgroup had been to consider different methods of estimating variance, but issues quickly arose with bias in SURBA estimates, which took precedence.

Two new implementations of SURBA were studied, namely a SAS version and an ADModelBuilder (ADMB) version. The methods implemented differed only in the way parameters were penalised to control their variation. Work was also begun on converting the existing Fortran‐90 implementation to a dynamic link library that could be called from within R, but this has not yet been completed and is not reported further here.

4.1.1 Methods

4.1.1.1 SAS version of SURBA

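An implementation in SAS (using PROC NLMIXED) used the following fit function:

$$
\mathrm{fit}(\theta) = \sum_{a,y}\left\{\log(2\pi) + \log(\sigma^2) + \sigma^{-2} e_{a,y}^2\right\} + \lambda_f \sum_{y>1} \Delta_{f_y}^2 + \lambda_s \sum_{a>1} \Delta_{s_a}^2 + \lambda_q \sum_{a>1} \Delta_{q_a}^2,
$$

where

$$
\begin{aligned}
e_{a,y} &= \log(I_{a,y}) - \log(q_a N_{a,y}),\\
N_{a,y} &= N_{a,y}(N_y, f_y, s_a), \text{ a function of the } N_y\text{'s, } f_y\text{'s, and } s_a\text{'s},\\
\Delta_{f_y} &= \log(f_y) - \log(f_{y-1}), \quad y > 1,\\
\Delta_{s_a} &= \begin{cases}
\log(s_{a+1}) + \log(s_{a-1}) - 2\log(s_a), & a = 2,\ldots,4,\\
\log(s_a) - \log(s_{a-1}), & a > 4,
\end{cases}\\
\Delta_{q_a} &= \begin{cases}
\log(q_{a+1}) + \log(q_{a-1}) - 2\log(q_a), & a = 2,\ldots,4,\\
3\log(q_a), & a > 4.
\end{cases}
\end{aligned}
$$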

Ia,y denotes a survey index. This fit function is similar to a log‐likelihood with some parameters treated like random effects, but with fixed λ weights. In the simulations the λ’s were each set to 0.001, which seemed to give realistic variations in estimates based on some trial simulated datasets. The problem of choosing penalty weights will be discussed in Section 4.1.1.2.

SURBA is a relative model and some parameters are confounded. In the separable Z model, sa and fy are confounded, and to make these parameters identifiable we set s6 = 1. Also, qa’s and recruitments are confounded, so we set q1 = 1. Hence, the SURBA recruitment estimates are at the same scale as survey recruitment. Other estimates are relative to this scale.

The penalty function, $\Delta_{f_y}^2$, used for the $f_y$ parameters favours constant values for these parameters; that is, the penalty function is zero only when the $f_y$ parameters are all equal. For ages 4 and less, the penalty functions on the $s_a$ and $q_a$ parameters favour a linear function in age; that is, the penalty function is zero when the $s_a$ and $q_a$ parameters are linear in $a$. For older ages we shrunk $\log(q_a)$ to zero because of apparent confounding with mortalities at younger ages. That is, SURBA can produce similar fits to the survey indices by using 1) high mortality on younger ages and high q’s on older ages, or 2) low mortality on younger ages and low q’s on older ages. With parameter setting 1) the population size at older ages is underestimated because of the high mortality cohorts experience at younger ages, and consequently the survey indices at older ages then appear to have a higher catchability, q. With parameter setting 2), the opposite occurs and population size at older ages is overestimated.
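The behaviour of these penalties can be illustrated numerically. The following is a minimal sketch, not the SAS code: the function name is hypothetical, and the case boundaries (second differences for ages 2 to 4, first differences of log s and 3 log q above age 4) are assumptions read from the fit function in this section. Differences are taken on the log scale.

```python
def penalty_sums(log_f, log_s, log_q, last_linear_age=4):
    """Return the sums of squared penalty terms for f, s and q.

    log_f is indexed by year; log_s and log_q by age, with index 0 = age 1.
    """
    # Year effects: first differences, zero only if all f values are equal.
    d_f = [log_f[y] - log_f[y - 1] for y in range(1, len(log_f))]

    d_s, d_q = [], []
    for age in range(2, len(log_s) + 1):
        i = age - 1  # list index of this age
        if age <= last_linear_age:
            # Second differences: zero when the log values are linear in age.
            d_s.append(log_s[i + 1] + log_s[i - 1] - 2.0 * log_s[i])
            d_q.append(log_q[i + 1] + log_q[i - 1] - 2.0 * log_q[i])
        else:
            # Older ages: smooth s, shrink log(q) towards zero.
            d_s.append(log_s[i] - log_s[i - 1])
            d_q.append(3.0 * log_q[i])

    def sum_sq(values):
        return sum(v * v for v in values)

    return sum_sq(d_f), sum_sq(d_s), sum_sq(d_q)


# Log-scale values from Table 4.1.1 and a constant year effect.
log_q = [-1.5, -1.0, -0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
log_s = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.0, 0.0, 0.0]
pen_f, pen_s, pen_q = penalty_sums([0.0] * 31, log_s, log_q)
print(pen_f, pen_s, pen_q)
```

With a constant year effect the f penalty vanishes, and the only non-zero contributions come from the kinks in the age patterns around ages 4 and 5. Each sum would then be multiplied by its λ weight before entering the fit function.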

4.1.1.2 ADMB version of SURBA

The equations provided in Needle (2008) were generally followed for implementation of SURBA in the AD Model Builder code. Important differences include the objective function that is minimized and treatment of the penalty terms in the objective function for the yearly mortality components ($f_y$) and the age‐specific mortality components ($s_a$).

We evaluated the behaviour of three different objective functions in AD Model Builder. In all three, the indices‐at‐age are treated as IID lognormal random variables and the corresponding log‐likelihood comprises the objective function component for these data. Similarly, the differences of $f_y$ across years and $s_a$ across ages are treated as IID lognormal random variables with mean zero. The set of $f_y$ values treated as data is the same as in Equation 2.13 in Needle (2008). The values of $s_a$ for ages 2,…,7 are treated as data. In the first objective function (O1), arbitrary scalars of 0.01 and 0.1 are multiplied with the log‐likelihood components for $f_y$ and $s_a$, respectively, but variance parameters ($\sigma_f^2$ and $\sigma_s^2$) are estimated. In the second objective function (O2), no scalars are provided (i.e. equal weights to the age‐specific index data), but variance parameter estimates are calculated analytically as

$$
\hat{\sigma}_f^2 = \frac{1}{n_f}\sum_{y}\left(\log f_{y+1} - \log f_y\right)^2
\quad\text{and}\quad
\hat{\sigma}_s^2 = \frac{1}{n_s}\sum_{a}\left(\log s_{a+1} - \log s_a\right)^2,
$$

respectively, where each sum runs over the $n_f$ (or $n_s$) differences that are treated as data. Finally, in the third objective function (O3), we assumed fixed values of $\sigma_f^2 = 100$ and $\sigma_s^2 = 100$ with no additional scalars.

Our criteria for evaluating the behaviour of the three objective functions included relative bias and confidence interval (CI) coverage.

4.1.1.3 Confidence Intervals and Simulations

The CI method studied was based on a normal approximation of the Z‐statistic,

$$
Z = \frac{g(\hat{\theta}) - g(\theta)}{\mathrm{s.e.}\{g(\hat{\theta})\}},
$$

where $g(\hat{\theta})$ is an estimate of some model result $g(\theta)$, which is itself a function of the model parameters $\theta$. The standard error of $g(\hat{\theta})$, denoted as $\mathrm{s.e.}\{g(\hat{\theta})\}$, was obtained using the delta method (as used in Needle, 2008). The Z‐statistic confidence interval for $g(\theta)$ is

$$
g_L(\hat{\theta}) = g(\hat{\theta}) - Z_{1-\alpha/2}\,\mathrm{s.e.}\{g(\hat{\theta})\}, \qquad
g_U(\hat{\theta}) = g(\hat{\theta}) + Z_{1-\alpha/2}\,\mathrm{s.e.}\{g(\hat{\theta})\}.
$$

The “accuracy” of this CI method was assessed using simulations. One thousand datasets were generated, and $\{g_L(\hat{\theta}), g_U(\hat{\theta})\}$ was computed for each dataset. If the CIs are accurate then the true value should lie within the intervals in 100(1‐α)% of the simulations. If the intervals are 2‐sided accurate then the true value should lie outside either bound in only 100α/2% of the simulations. A variety of model outputs were examined: annual estimates of 1) recruitment, 2) total abundance, 3) spawning‐stock biomass (SSB), 4) total biomass, 5) average z for ages 1‐4, and 6) average z for ages 5‐8.
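The exceedance check described above can be sketched in a few lines. This is a toy illustration, not the SURBA simulation: it estimates the mean of a normal sample rather than a stock quantity, and all function names and settings (sample size, seed, true mean) are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def z_interval(estimate, std_error, alpha=0.05):
    """Two-sided Z-statistic interval: estimate -/+ Z(1 - alpha/2) * s.e."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate - z * std_error, estimate + z * std_error


def exceedance_percentages(intervals, true_value):
    """Percent of intervals with the true value below the lower bound,
    and percent with the true value above the upper bound."""
    n = len(intervals)
    below = sum(1 for lower, _ in intervals if true_value < lower)
    above = sum(1 for _, upper in intervals if true_value > upper)
    return 100.0 * below / n, 100.0 * above / n


def simulate_intervals(true_mean, sd, n_obs, n_datasets, seed):
    """Generate datasets and compute one interval per dataset."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(n_datasets):
        sample = [rng.gauss(true_mean, sd) for _ in range(n_obs)]
        intervals.append(z_interval(mean(sample), stdev(sample) / math.sqrt(n_obs)))
    return intervals


intervals = simulate_intervals(true_mean=10.0, sd=3.0, n_obs=25, n_datasets=1000, seed=1)
lower_pct, upper_pct = exceedance_percentages(intervals, 10.0)
print(f"lower exceedance {lower_pct:.1f}%, upper exceedance {upper_pct:.1f}%")
```

For a 2‐sided accurate 95% interval both percentages would be close to 2.5; values well above or below that on one side point to bias or a non‐normal Z‐statistic.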

All parameters are estimated on the log‐scale in the AD Model Builder implementation and we calculate corresponding asymmetric confidence intervals by exponentiating the approximate 95% confidence interval of the log‐estimator $\hat{X}$ as:

$$
CI\{\exp(\hat{X})\} = \exp\{\hat{X} \pm z_{0.975}\, SE(\hat{X})\}.
$$

The motivation behind the exponentiation is that the log‐estimators are apt to have distributions closer to Gaussian. In the SAS version, CIs for annual recruitment, total abundance, biomass, and SSB were derived from log results, but Z confidence intervals were not.
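As a small worked illustration of the exponentiated interval, the sketch below back‐transforms a log‐scale estimate and its standard error. The function name and the numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist


def log_scale_ci(log_estimate, log_se, level=0.95):
    """Back-transform a symmetric interval on the log scale.

    Returns (lower, upper) = exp(X -/+ z * SE(X)), with z the normal quantile
    at 0.5 + level / 2 (0.975 for a 95% interval).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (math.exp(log_estimate - z * log_se),
            math.exp(log_estimate + z * log_se))


# Hypothetical example: an estimate of 100 with a log-scale standard error of 0.3.
lower, upper = log_scale_ci(math.log(100.0), 0.3)
print(f"{lower:.1f} to {upper:.1f}")
```

The resulting interval is asymmetric around the point estimate on the natural scale (wider above than below) and can never include negative values.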

The data were generated using a very simple model, chosen to be the same as the SURBA assessment model. This was intended to remove the impact of model misspecification and focus solely on the accuracy of the CIs. Note that it would be fortuitous if CIs work for a misspecified model, but this is usually not expected. The population model was purely deterministic, with separable total mortality, $Z_{a,y} = s_a f_y$. Some population quantities are shown in Figure 4.1.1 and Table 4.1.1. The population was modelled for 31 years and 8 ages. Survey data were generated from the observation equation,

$$
I_{a,y} = q_a N_{a,y} \exp(\varepsilon_{a,y}), \qquad \varepsilon_{a,y} \sim N(0, \sigma), \qquad \sigma = 0.3.
$$

The random survey index (I) observations have approximately a 30% CV. An example simulated dataset is shown in Figure 4.1.2.
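The data‐generating process can be sketched as follows. This is an assumption‐laden illustration rather than the simulator used by the subgroup: the recruitment and yearly mortality series are hypothetical constants (the actual series are only shown in Figure 4.1.1), the first‐year age structure is built from first‐year mortality, and no plus group is used. The catchability and selectivity values are those of Table 4.1.1.

```python
import math
import random

# Table 4.1.1 values (log catchability and log selectivity at ages 1 to 8).
LOG_Q = [-1.5, -1.0, -0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
LOG_S = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.0, 0.0, 0.0]


def simulate_population(recruits, f_by_year, log_s):
    """Deterministic numbers-at-age with separable mortality Z[a, y] = s[a] * f[y]."""
    s = [math.exp(v) for v in log_s]
    n_years, n_ages = len(recruits), len(s)
    numbers = [[0.0] * n_ages for _ in range(n_years)]
    for y in range(n_years):
        numbers[y][0] = recruits[y]
    # Assumption: first-year age structure follows from first-year mortality.
    for a in range(1, n_ages):
        numbers[0][a] = numbers[0][a - 1] * math.exp(-s[a - 1] * f_by_year[0])
    # Cohorts decline along the diagonal in later years.
    for y in range(1, n_years):
        for a in range(1, n_ages):
            numbers[y][a] = numbers[y - 1][a - 1] * math.exp(-s[a - 1] * f_by_year[y - 1])
    return numbers


def simulate_survey(numbers, log_q, sigma, rng):
    """Observation equation: I = q * N * exp(eps), eps normal with s.d. sigma."""
    return [
        [math.exp(log_q[a]) * n_ay * math.exp(rng.gauss(0.0, sigma))
         for a, n_ay in enumerate(row)]
        for row in numbers
    ]


recruits = [1000.0] * 31   # hypothetical constant recruitment, 31 years
f_by_year = [0.5] * 31     # hypothetical yearly mortality effect
population = simulate_population(recruits, f_by_year, LOG_S)
survey = simulate_survey(population, LOG_Q, sigma=0.3, rng=random.Random(2008))
```

Repeating the last line with different seeds gives the replicate datasets over which bias and interval coverage can be summarised.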

SURBA cannot estimate the scale of the simulated population, because it provides only relative estimates of stock size. In fact, based on the q1 = 1 constraint on catchability we used, SURBA estimated stock size times q1. Hence, the coverage properties for stock size CIs were evaluated based on population values scaled by q1. This does not apply to total mortalities, which SURBA estimates on an absolute scale.

Table 4.1.1. Parameters for population simulator.

Age   log(qa)   log(sa)   Maturity   Weight
1     -1.5      -2.0      0          2
2     -1.0      -1.5      0          3
3     -0.5      -1.0      0.25       4
4     0         -0.5      0.5        5
5     0         0         0.75       6
6     0         0         1          7
7     0         0         1          8
8     0         0         1          9

Figure 4.1.1. Population to generate simulated data.

Figure 4.1.2. Simulated survey dataset.

4.1.2 Results

4.1.2.1 ADMB version of SURBA

For two of the three objective functions, we found some subsets of parameter estimators were confounded for some datasets. For O1, all parameters were estimable for all 1000 simulated datasets, but some parameters were inestimable for 23 (2.3%) of the datasets using O2 and 7 (0.7%) of the datasets using O3.

There was very little difference in the bias of the estimators among any of the objective functions (Figure 4.1.3). The relative biomass and spawning biomass were negligibly biased whereas there was strong positive bias in the estimator of average Z over ages 1 to 4 (greater than 50%). Some positive bias was observed in the estimator of average Z over ages 5 to 8 in the early years of the time‐series, but this reduced over the last 20 years of the time‐series.

Unlike the bias of the point estimators, there were large differences in bias of the confidence interval coverage between objective function O2 and objective functions O1 and O3 (Figure 4.1.4). That is, CI estimators corresponding to objective functions O1 and O3 were similar and negatively biased for relative biomass (and SSB) and mortality estimators whereas those corresponding to O2 were positively biased for relative biomass estimators and negligibly biased for average Z estimators. The over‐coverage of CI estimators for relative biomass (and SSB) using O2 is due to the positive bias of corresponding standard error estimators (Figure 4.1.5). Moreover, the positive bias of standard error estimators for average mortality parameters using O2 compensates for the negative bias in corresponding point estimators to yield nearly appropriate confidence interval coverage for most years.

The negligible bias of the relative biomass and spawning biomass appears to be due to the weighting of older aged fish in the estimator and the negligible bias in the estimator of average Z over ages 5 to 8.

4.1.2.2 SAS version of SURBA

Catchability Known

The first simulation was based on the true values of the catchability parameters, scaled so that q1 = 1. The results are shown in Figures 4.1.6 to 4.1.8. The median bias was very small for all population quantities (Figure 4.1.6). The CIs (Figure 4.1.7) were reasonably accurate for total abundance and biomass, and SSB. However, the CIs were too wide for total mortality, which is why the simulated exceedances were smaller than the nominal value of 2.5%. If the CIs were 2‐sided accurate then the simulated exceedances should equal 2.5%. The CIs also seemed too narrow for recruitment because the sum of the lower and upper simulated exceedances was greater than 5%, indicating that the probability that population recruitment was within the confidence interval was less than 95%. The bias divided by the standard deviation (SE standardized bias) directly affects CI performance. These biases (Figure 4.1.8) were not large, however, so the problem with the CIs seems more related to the lack of normality in the distribution of the Z‐statistics.

Catchability Estimated

The results in Figure 4.1.9 show that estimates of abundance, biomass, SSB, and total mortality at ages 1–4 were substantially biased. The bias is caused by confounding between q’s and z’s at the younger ages. The penalty function we used led to underestimation of q’s and z’s at ages 1–4. The low z’s estimated for these ages subsequently led to overestimation of population size at older ages. However, estimates of recruitment and total mortality on ages 5–8 were much less biased.

The CIs (Figure 4.1.10) performed very poorly for abundance, biomass, SSB, and total mortality at ages 1–4. This is expected given the bias in their estimates. Similar to the previous analysis, the CIs were too wide for Z at ages 5–8, usually leading to low exceedance probabilities, especially for the lower bound. During the first 10 years of the time‐series the upper bound was too small, which seems related to negative bias in these years (Figure 4.1.11). The CIs for recruitment were reasonably accurate for most of the time‐series, but not in the first 10 years. This is due to bias in the estimates for these years (Figure 4.1.11).

4.1.3 Conclusions

Two problems were apparent in the accuracy of standard confidence intervals (CIs) in a highly parameterized model. They are:

1 ) Bias in parameter estimates, caused by confounding or poor identification of parameters with the available data. This is the main reason for inaccurate confidence intervals.

2 ) Even when parameters can be estimated unbiasedly, the information in the data on these parameters can be limited. In this situation CI statistics such as the Z‐statistic (see Section 4.1.1.3) can have a non‐normal distribution and variability greater than one. The distribution of the Z‐statistic may also depend on other unknown parameters (i.e. it is not a pivotal statistic), and estimates of these parameters may themselves involve substantial uncertainty.

SURBA can provide unbiased estimates of population quantities when information is available on the relative catchability of the survey or tuning index. However, even in this unrealistically good situation, SURBA CIs can have poor simulated coverage properties. Most notably, CIs for total mortalities were too wide, and CIs for recruitments were too narrow. However, CIs for SSB were fairly reasonable.

SURBA was also sensitive to penalty function weights, especially when relative catchability is estimated. This particularly affected estimates of mortality at younger ages, biomass, and SSB. The penalty weights are usually chosen subjectively and many choices can lead to biased parameter estimates and poor confidence intervals (the latter primarily due to biased estimation of standard errors). When the variance parameters for the variation in $f_y$ and $s_a$ (i.e. the penalty weights) were estimated, the standard error estimators were strongly positively biased, whereas when fixed weights of some type were assigned to those objective function components, there was mild negative bias of the standard error estimators. The important result of these analyses is that weighting of non‐data objective function components and whether weights are used or not have important effects on the inferences made for SURBA‐type models. We cannot provide any guidance on how to choose these weights in practice, other than trial and error. SURBA estimates of recruitment (relative) and total mortality at older ages seemed more reliable.

A random‐effects approach is a more objective way to deal with controlling the variation in high‐dimensional parameters. The variance of the random effects is analogous to the penalty weight, and these variances can be estimated (i.e. chosen objectively).

4.1.4 Recommendation

Investigate the utility of random‐effects or other suitable approaches to 1) reduce the dimension of highly parameterized fisheries models, 2) reduce bias in estimates of important population quantities, 3) reduce bias in standard errors of these estimators, and 4) improve the accuracy of confidence intervals.

Figure 4.1.3. Relative bias of estimators of yearly parameters provided by three alternative objective functions (O1, O2 and O3 are black, red and blue, respectively) for the ADMB SURBA model. Horizontal black line corresponds to unbiasedness. Panels: Relative Biomass, Relative SSB, Zbar(1-4), Zbar(5-8); horizontal axis: year, 1980 to 2010.

Figure 4.1.4. Coverage probability of 95% confidence intervals for estimators of yearly parameters provided by three alternative objective functions (O1, O2 and O3 are black, red and blue, respectively) for the ADMB SURBA model. Horizontal black line corresponds to the expected coverage probability. Panels: Relative Biomass, Relative SSB, Zbar(1-4), Zbar(5-8); horizontal axis: year, 1980 to 2010.

Figure 4.1.5. Relative bias of standard error estimators of yearly parameters provided by three alternative objective functions (O1, O2 and O3 are black, red and blue, respectively) for the ADMB SURBA model. Horizontal black line corresponds to unbiasedness. Panels: Relative Biomass, Relative SSB, Zbar(1-4), Zbar(5-8); horizontal axis: year, 1980 to 2010.

Figure 4.1.6. Medians of the simulation population estimates (points) as estimated by the SAS SURBA implementation. True population values are plotted as red lines. Panels include: Recruitment, Abundance, Biomass, Ave Z ages 5-8, Ave Z ages 1-4; horizontal axis: year, 1980 to 2010.

Figure 4.1.7. Simulated 95% confidence interval exceedances, or the percent of times the true population value was less than the lower bound (solid circles), or greater than the upper bound (open circles), from SAS SURBA. Panels include: Recruitment, Abundance, Biomass, Ave Z ages 5-8, Ave Z ages 1-4; horizontal axis: year, 1980 to 2010.

Figure 4.1.8. Simulated median SE standardized bias, which is the bias divided by the standard error (SE), from SAS SURBA. Panels: Recruitment, Abundance, SSB, Biomass, Ave Z ages 5-8, Ave Z ages 1-4; horizontal axis: year, 1980 to 2010.

Figure 4.1.9. Simulated median population estimates (points) from SAS SURBA. True population values are plotted as red lines. Panels include: Recruitment, Abundance, Biomass, Ave Z ages 5-8, Ave Z ages 1-4; horizontal axis: year, 1980 to 2010.

Figure 4.1.10. Simulated confidence interval exceedances, or the percent of times the true value was less than the lower bound (solid circles), or greater than the upper bound (open circles), from SAS SURBA. Panels include: Recruitment, Abundance, Biomass, Ave Z ages 5-8, Ave Z ages 1-4; horizontal axis: year, 1980 to 2010.

Figure 4.1.11. Simulated median SE standardized bias, which is the bias divided by the standard error (SE), from SAS SURBA. Panels: Recruitment, Abundance, SSB, Biomass, Ave Z ages 5-8, Ave Z ages 1-4; horizontal axis: year, 1980 to 2010.

4.2 State-space fish stock assessment model

4.2.1 Motivation

Deterministic procedures, where, for instance, commercial catches are assumed known without uncertainties are frequently used by fish stock managers, and in cases where fully stochastic models are used, the number of model parameters is often approaching the total number of observations.

Figure 4.2.1.1. Simulation of a random walk with observation noise added.

Fish stock assessment models are fairly complex systems, so in order to motivate the state‐space approach consider the following example: Observations $Y$ are generated from $\lambda_0 = 0$, $\lambda_i = \lambda_{i-1} + \eta_i$, $Y_i = \lambda_i + \varepsilon_i$, where $i = 1, \ldots, 50$, $\eta_i \sim N(0, \sigma_p^2)$, and $\varepsilon_i \sim N(0, \sigma_o^2)$, all independent. The underlying unobserved quantities $\lambda$ are to be estimated.
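The example system is easy to simulate. In the sketch below the function name, the seed and the two standard deviations are hypothetical choices, since the text does not state the values used for Figure 4.2.1.1.

```python
import random


def simulate_random_walk(n=50, sd_process=1.0, sd_obs=1.0, seed=1):
    """Simulate lambda_i = lambda_{i-1} + eta_i and Y_i = lambda_i + eps_i.

    Starts from lambda_0 = 0; returns (lambdas, observations) for i = 1..n.
    """
    rng = random.Random(seed)
    lambdas, observations = [], []
    current = 0.0  # lambda_0
    for _ in range(n):
        current += rng.gauss(0.0, sd_process)                      # process noise
        lambdas.append(current)
        observations.append(current + rng.gauss(0.0, sd_obs))      # observation noise
    return lambdas, observations


lambdas, observations = simulate_random_walk()
```

Only `observations` would be available to an analyst; `lambdas` plays the role of the unobserved quantity to be recovered.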

If we approached this system by a deterministic method (pretending zero observation noise) the logical estimator for the underlying $\lambda_i$ is the corresponding observed value $Y_i$. This would naturally lead to a more fluctuating estimated time‐series than the true underlying $\lambda$ if the observation noise is in fact not zero. This would not use the information fully, as it does not take advantage of the correlation between neighboring lambdas. Finally, this approach makes it impossible to quantify uncertainties in the estimated values within the model.

If we approached this system by a fully parametric statistical model we would have to add some model assumptions to make the model identifiable. One option would be to assume that $\lambda_1 = \lambda_2$, $\lambda_3 = \lambda_4$, …, $\lambda_{49} = \lambda_{50}$. This pairwise coupling is naturally arbitrary, and other assumptions could have been chosen, but it illustrates the trade‐off that we face. If we choose small $\lambda$‐groups (here pairs), we get highly uncertain estimates, as the ratio between number of parameters and number of observations is high. If we use large $\lambda$‐groups, we get highly biased estimates because the lambdas we are assuming to be identical are in fact very different.
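The grouping trade‐off can be made concrete with a small helper that estimates each group of lambdas by the mean of the observations in that group. This is only an illustration of the coupling idea; the function name and the toy data are hypothetical.

```python
def grouped_estimates(observations, group_size):
    """Estimate each lambda by the mean of the observations in its group.

    group_size = 1 reproduces the deterministic approach (estimate = observation);
    group_size = 2 is the pairwise coupling; larger groups smooth more.
    """
    estimates = []
    for start in range(0, len(observations), group_size):
        block = observations[start:start + group_size]
        block_mean = sum(block) / len(block)
        estimates.extend([block_mean] * len(block))
    return estimates


toy = [1.0, 3.0, 5.0, 9.0]
print(grouped_estimates(toy, 1))  # [1.0, 3.0, 5.0, 9.0]
print(grouped_estimates(toy, 2))  # [2.0, 2.0, 7.0, 7.0]
print(grouped_estimates(toy, 4))  # [4.5, 4.5, 4.5, 4.5]
```

Small groups track the data closely but average over few observations; a single large group averages over many observations but forces all lambdas to one value.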

The third approach presented in this section is a state‐space model. In a state‐space model, the underlying process (here $\lambda$) is considered a random variable that is not observed. The only thing observed is a derived variable subject to measurement noise. The model parameters (here $\sigma_p^2$ and $\sigma_o^2$) are estimated in the marginal distribution of the observations $Y$, then the unobserved random variables $\lambda$ are predicted via their conditional distribution given $Y$.
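For this linear Gaussian example the marginal likelihood and the conditional distribution of the lambdas can be computed exactly with a Kalman filter and smoother. The sketch below is a swapped‐in illustration for the simple example only, assuming $\lambda_0 = 0$ is known; it is not the Laplace approximation used by the assessment model in this section, and the function names and toy data are hypothetical.

```python
import math


def kalman_filter(observations, var_process, var_obs):
    """Random walk plus noise. Returns (marginal log-likelihood of the
    observations, filtered means, filtered variances)."""
    state_mean, state_var = 0.0, 0.0  # lambda_0 = 0, known exactly
    loglik = 0.0
    means, variances = [], []
    for obs in observations:
        pred_var = state_var + var_process         # one-step prediction of lambda_i
        innovation = obs - state_mean
        innovation_var = pred_var + var_obs
        loglik -= 0.5 * (math.log(2.0 * math.pi * innovation_var)
                         + innovation ** 2 / innovation_var)
        gain = pred_var / innovation_var
        state_mean = state_mean + gain * innovation
        state_var = (1.0 - gain) * pred_var
        means.append(state_mean)
        variances.append(state_var)
    return loglik, means, variances


def rts_smoother(means, variances, var_process):
    """Backward pass: mean and variance of each lambda given all observations."""
    s_means, s_vars = list(means), list(variances)
    for i in range(len(means) - 2, -1, -1):
        pred_var = variances[i] + var_process
        weight = variances[i] / pred_var
        s_means[i] = means[i] + weight * (s_means[i + 1] - means[i])
        s_vars[i] = variances[i] + weight ** 2 * (s_vars[i + 1] - pred_var)
    return s_means, s_vars


toy_obs = [0.3, -0.1, 0.8, 1.2, 0.9]
loglik, f_means, f_vars = kalman_filter(toy_obs, var_process=0.5, var_obs=0.2)
s_means, s_vars = rts_smoother(f_means, f_vars, var_process=0.5)
```

Maximising `loglik` over the two variances corresponds to estimating the model parameters in the marginal distribution of the observations; the smoothed means and variances then give predictions of the lambdas with their uncertainty.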

Models based on unobserved random variables are widely used in other quantitative sciences, for instance, agricultural, economic, and medical studies. Part of the reason unobserved random variables are not widely used in fisheries science is that fish stock assessment models are fairly complex. Using unobserved random variables in this setting is computer intensive, and the software tools and algorithms to make this feasible have been lagging. State‐space models were introduced in fisheries by Gudmundsson (1987, 1994) and Fryer (2001). Both used the extended Kalman filter to compute the likelihood. The model presented here uses new software (the random effects module for AD Model Builder), which uses a combination of automatic differentiation and the Laplace approximation (MacKay 2003) to solve high dimensional non‐linear models with unobserved random effects efficiently.

4.2.2 Model

The model is a state‐space model. The states $\alpha$ are the log‐transformed stock sizes $\log N_1, \ldots, \log N_A$ and fishing mortalities $\log F_{i_1}, \ldots, \log F_{i_n}$ corresponding to different age classes and total international catches. In any given year $y$ the state is the combined vector $\alpha_y = (\log N_1, \ldots, \log N_A, \log F_{i_1}, \ldots, \log F_{i_n})'$. The transition equation describes the distribution of the next year's state from a given state in the current year. The following is assumed:

$$\alpha_y = T(\alpha_{y-1}) + \eta_y$$

The transition function $T$ is where the stock equation and assumptions about the stock‐recruitment relationship enter the model. The equations are:

$$\log N_{1,y} = \log\big(R(w_{1,y-1}\, p_{1,y-1}\, N_{1,y-1} + \cdots + w_{A,y-1}\, p_{A,y-1}\, N_{A,y-1})\big)$$

$$\log N_{a,y} = \log N_{a-1,y-1} - F^{(\cdot)}_{a-1,y-1} - M_{a-1}, \quad 2 \le a \le A$$

$$\log F_{a,y} = \log F_{a,y-1}, \quad 1 \le a \le A$$

Here $M_a$ is the age specific natural mortality parameter, which is most often assumed known from outside sources. $F^{(\cdot)}_{a-1,y-1}$ is the total fishing mortality, which includes fishing mortality from fleets both with and without effort information. The function $R$ describes the relationship between stock and recruitment. The parameters of the chosen stock‐recruitment function are estimated within the model. Often it is assumed that certain $F_a$ parameters are identical (e.g. $F_{A-1} = F_A$).

The prediction noise $\eta$ is assumed to be uncorrelated Gaussian with zero mean, and three separate variance parameters: for recruitment $\sigma_R^2$, for survival $\sigma_S^2$, and for the yearly development in fishing mortality $\sigma_F^2$.

This completes the description of the unobserved state process. One distinct feature of this model is that the survival process is stochastic. Stock assessment methods frequently assume a deterministic survival process, which means that full knowledge of $N_a$, $M_a$, and $F_a$ in the previous year implies full knowledge of $N_{a+1}$ in the current year. This assumption originates from purely deterministic assessment methods where $F_a$ was considered equivalent to a known catch that was simply subtracted from $N_a$.

In fully parametric statistical stock assessment models the assumption of deterministic survival is combined with structural assumptions on the $F$ parameters (e.g. multiplicative), which is inconsistent, as an approximated $F$ cannot give an exactly known number of survivors. In this model $F_a$ is considered a mortality rate, and even full knowledge of $N_a$, $M_a$, and $F_a$ in the previous year only gives a prediction of $N_{a+1}$ in the current year, and the uncertainty of this prediction is estimated within the model.
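A minimal sketch of one year step of the state process, assuming one fishing mortality state per age, no plus‐group, and a Beverton‐Holt form for the stock‐recruitment function. The function names, the Beverton‐Holt parameters and the reading of the arrays `w` and `p` as the age‐specific quantities entering the argument of the recruitment function are assumptions made for illustration, not the configuration used by WGMG.

```python
import numpy as np

def beverton_holt(ssb, alpha=5.0, beta=0.01):
    """Hypothetical stock-recruitment function R; its form is an assumption here."""
    return alpha * ssb / (1.0 + beta * ssb)

def transition(log_n, log_f, m, w, p, recruit=beverton_holt):
    """Deterministic part T of the state equation for one year step.

    log_n, log_f : log stock numbers and log fishing mortalities by age (length A)
    m, w, p      : natural mortality and the age-specific factors in the argument of R
    """
    n, f = np.exp(log_n), np.exp(log_f)
    new_log_n = np.empty_like(log_n)
    new_log_n[0] = np.log(recruit(np.sum(w * p * n)))   # recruitment from last year's stock
    new_log_n[1:] = log_n[:-1] - f[:-1] - m[:-1]        # survivors move up one age class
    new_log_f = log_f.copy()                            # random walk: expected log F unchanged
    return new_log_n, new_log_f

def step(log_n, log_f, m, w, p, sd_r, sd_s, sd_f, rng):
    """One stochastic transition: T(alpha_{y-1}) plus independent Gaussian process noise."""
    mean_n, mean_f = transition(log_n, log_f, m, w, p)
    sd_n = np.r_[sd_r, np.full(len(log_n) - 1, sd_s)]   # recruitment noise, then survival noise
    return (mean_n + rng.normal(0.0, sd_n),
            mean_f + rng.normal(0.0, sd_f, size=len(log_f)))
```

Setting `sd_s` to zero recovers the deterministic survival assumption discussed above, so the two views differ only in whether that variance is fixed at zero or estimated.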

The observation part of the state‐space model describes the distribution of the observations for a given state $\alpha_y$. Here the vector of all observations from a given year $y$ is denoted $x_y$. The elements of $x_y$ are residual log‐landings $\log C^{(o)}_{a,y}$ (which equals total landings if no other commercial fleets are present), log‐catches from commercial fleets with effort data $\log C^{(f)}_{a,y}$, and log‐indices from scientific surveys $\log I^{(s)}_{a,y}$. The combined observation equation is:

$$x_y = O(\alpha_y) + \varepsilon_y$$

The observation function $O$ consists of the familiar catch equations for fleets and surveys, and independent measurement noise $\varepsilon_y$ with separate variance parameters for separate fleets and surveys. An expanded view of the observation equation becomes:

$$\log C^{(o)}_{a,y} = \log\left(\frac{F_{a,y}}{Z_{a,y}}\,\big(1 - e^{-Z_{a,y}}\big)\, N_{a,y}\right) + \varepsilon^{(o)}_{a,y}$$

$$\log C^{(f)}_{a,y} = \log\left(\frac{E^{(f)}_{y} Q^{(f)}_{a}}{Z_{a,y}}\,\big(1 - e^{-Z_{a,y}}\big)\, N_{a,y}\right) + \varepsilon^{(f)}_{a,y}$$

$$\log I^{(s)}_{a,y} = \log\left(Q^{(s)}_{a}\, e^{-Z_{a,y} D^{(s)}/365}\, N_{a,y}\right) + \varepsilon^{(s)}_{a,y}$$

Here $Z$ is the total mortality rate $Z_{a,y} = M_a + F_{a,y} + \sum_f E^{(f)}_{y} Q^{(f)}_{a}$, $D^{(s)}$ is the number of days into the year where the survey $s$ is conducted, and $Q^{(f)}_{a}$ and $Q^{(s)}_{a}$ are model parameters describing catchabilities. Finally $\varepsilon^{(o)}_{a,y} \sim N(0, \sigma_o^2)$, $\varepsilon^{(f)}_{a,y} \sim N(0, \sigma_f^2)$, and $\varepsilon^{(s)}_{a,y} \sim N(0, \sigma_s^2)$ are all assumed independent.
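The observation function can be sketched as follows for a single year. The function name, argument names and array layout are assumptions made for illustration; the measurement noise terms are left out, so the function returns only the expected log observations.

```python
import numpy as np

def expected_log_obs(log_n, f_resid, m, effort, q_fleet, q_survey, survey_day):
    """Expected log observations O(alpha_y) for one year.

    log_n      : log stock numbers by age, shape (n_ages,)
    f_resid    : fishing mortality by age of the residual fleet (no effort data)
    m          : natural mortality by age
    effort     : effort per fleet with effort data, shape (n_fleets,)
    q_fleet    : fleet catchabilities by age, shape (n_fleets, n_ages)
    q_survey   : survey catchabilities by age, shape (n_surveys, n_ages)
    survey_day : day of the year of each survey, shape (n_surveys,)
    """
    n = np.exp(log_n)
    f_fleet = effort[:, None] * q_fleet                 # effort times catchability
    z = m + f_resid + f_fleet.sum(axis=0)               # total mortality by age
    removed = (1.0 - np.exp(-z)) * n / z                # factor shared by all catch equations
    log_c_resid = np.log(f_resid * removed)
    log_c_fleet = np.log(f_fleet * removed)
    # Surveys see the stock after a fraction survey_day / 365 of the year's mortality.
    log_index = np.log(q_survey * np.exp(-z * survey_day[:, None] / 365.0) * n)
    return log_c_resid, log_c_fleet, log_index
```

Because every fleet shares the same total mortality, the catches of all fleets add up to the usual total catch for the combined fishing mortality.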

4.2.3 Results

WGMG investigated how the state‐space model, which is based on a random walk assumption, would react to a sudden jump in the underlying fishing mortality. Two scenarios were constructed. In the first scenario fishing mortality was doubled in the middle of the data period, and in the second scenario fishing mortality was doubled in the last data year. Doubling fishing mortality is a large model violation (compare Figures 4.2.3.1 and 4.2.3.2).

Figure 4.2.3.1. Three cases (one in each column) simulated from the same model as is used to estimate. The red line is the simulated truth, black line is the model prediction, and the shaded area is the estimated 95% point wise confidence intervals.

The state‐space fish stock assessment model was able to follow the jump in fishing mortality, and the resulting drop in SSB, in the scenario where the change was in the middle of the data period (Figure 4.2.3.2), but not in the scenario where the jump was in the final year (Figure 4.2.3.3). The state‐space model used was not altered in any way to accommodate the sudden jump. If sudden large changes in fishing mortality were suspected in real assessment data, the model could (and should) be configured to allow a shift in the fishing mortality pattern, or, if the sudden change was caused by a change in effort, the model should be configured to include effort data.

Figure 4.2.3.2. Three cases (one in each column) simulated with a sudden doubling in fishing mortality in the middle of the data period. The red line is the simulated truth, black line is the model prediction, and the shaded area is the estimated 95% point wise confidence intervals.

Figure 4.2.3.3. Three cases (one in each column) simulated with a sudden doubling in fishing mortality in the last year of the data period. The red line is the simulated truth, black line is the model prediction, and the shaded area is the estimated 95% point wise confidence intervals.

WGMG also decided to investigate the frequentist properties of the confidence intervals produced by the state‐space fish stock assessment model. Starting with the estimated model parameters from the North Sea Cod assessment, 1000 datasets were generated from the true model. From each of these datasets the model parameters were estimated, states were predicted, and the confidence intervals for $\bar{F}_{2-4}$ and SSB were computed via the delta method. The same model was thus used to generate populations and to estimate parameters.

The simulated cover probabilities ranged from 80% to 95% and averaged around 85%, which is lower than the expected 95% (Figure 4.2.3.4). The confidence intervals are constructed assuming that the uncertainties in the model parameters (not the random effects) can be neglected, and this is likely what is causing the confidence intervals to be too narrow.
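The kind of coverage calculation described above can be sketched with a deliberately simple analogue. This is not the state‐space model: it is a toy example (all names and values are assumptions) in which an interval for a mean is built by plugging in an estimated standard error as if it were known, which is one way of neglecting parameter uncertainty.

```python
import numpy as np

def coverage(truth, lower, upper):
    """Fractions of cases where the truth lies below, inside and above the interval."""
    truth, lower, upper = map(np.asarray, (truth, lower, upper))
    below = np.mean(truth < lower)
    above = np.mean(truth > upper)
    return below, 1.0 - below - above, above

rng = np.random.default_rng(2008)
n_sim, n_obs, mu, sigma = 20000, 5, 0.0, 1.0
samples = rng.normal(mu, sigma, size=(n_sim, n_obs))   # many datasets from the "true model"
est = samples.mean(axis=1)
se = samples.std(axis=1, ddof=1) / np.sqrt(n_obs)

# Plug-in interval: treats the estimated standard error as if it were known exactly.
below, inside, above = coverage(mu, est - 1.96 * se, est + 1.96 * se)
```

In this toy setting the fraction `inside` falls noticeably short of the nominal 95%, which is the same qualitative pattern as reported for the state‐space model.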

Figure 4.2.3.4. Cover probability based on 1000 simulations from the true model. Lines indicate probability of the simulated truth being below (red), above (green), and inside (black) the estimated 95% point wise confidence interval.

4.2.4 Conclusions

New algorithms and software tools capable of optimizing a full state‐space stock assessment model in minutes rather than in hours allow for the investigation of frequentist properties. It can be concluded that the simple confidence intervals for average fishing mortality and spawning‐stock biomass are too narrow, covering only about 85% as opposed to the desired 95%. It is recommended that more advanced ways of constructing confidence intervals (profile likelihood, or simulation based methods) are investigated.

State‐space fish stock assessment models are worthy of further investigation, as they are able to separate process and observation noise and they avoid arbitrary smoothing parameters and ad‐hoc weighting of different data sources.

4.3 Stock assessment models incorporating partial information about discards

4.3.1 Introduction to the problem

Discarding is the practice of returning an unwanted section of the catch back to the sea during fishing operations. Discards not only include non‐commercial species, but also commercial species that are below minimum landing size (MLS) or less profitable owing to market conditions and quota restrictions (Catchpole et al., 2005). Discards represent a significant proportion of global marine catches and are generally considered to constitute waste, or suboptimal use of fishery resources (Kelleher 2004).

Accounting for discards in the exploitation of fish stocks starts with their estimation and use in stock assessment. This often requires information on the total catch, consisting of both landings and discards. Discard surveys in recent years reveal that discards may cover a substantial part of the catch and for some stocks may even exceed the landings. Generally, long time‐series on age‐structured landings (based on port sampling) exist, but historical estimates on discards are often lacking. In addition, the existing estimates often cover a small fraction of the fleet and may therefore be imprecise.

Here, we present two similar methods that allow us to conduct stock assessments when there is information about discards in only some years. They are both based on statistical catch‐at‐age models. These models describe simple age‐structured populations and use existing data on landings, discards and surveys to estimate the parameters and their uncertainties in the population model. These models are thus similar to (for example) the work described by Punt et al. (2006). In addition to stock assessments, the methods also provide a reconstruction (point estimates) of the historical discards time‐series and their associated uncertainties. Below, we describe the two approaches in turn.

In order to test the performance of models of this type with respect to their ability to reconstruct discards, we use data from a stock where long time‐series exist: North Sea haddock. Discards, landings, industrial bycatch and survey series are available from 1978 onwards. First, the models are fitted to the entire existing datasets to evaluate the ability of the models to explain the observations. Subsequently, discard estimates from the available datasets are removed in order to test the ability of the models to estimate the parameters and uncertainties. This reflects the situation in which the two models are actually used to reconstruct historical discards estimates. For all runs, the reconstruction can be compared to the actual observations (even when these are treated as missing for the purpose of the analysis).

4.3.2 A model based on age selectivity smoothing via splines

The first approach uses splines to capture the complex age dependent processes that shape selectivity and discarding. It follows closely the approach described by Aarts and Poos (in prep; WP 9).

Here we give only a condensed summary of the method, which is described in detail in WP 9. The WP is set in the context of the North Sea plaice stock. However, as is shown here, the concept can be translated to other fisheries for which discard time‐series exist.

The model is a statistical catch‐at‐age model, in which the population numbers for each age a at time t are estimated using the recruitment at a given year N1,t and an exponential decay function for the decrease in population numbers owing to the integral of the total mortality Za,t at‐age a and time t:

In turn, Za,t is composed of the instantaneous natural mortality rate Ma and the fishing mortality rate Fa,t.

The fishing mortality Fa,t is the result of the catchability q, the amount of fishing effort et and the selectivity pattern fa,t, such that

In this equation, the catchability q is the extent to which a stock is susceptible to fishing. The fishing effort et is the total amount of fishing in a year, and varies each year (hence the subscript t). With the available data it is only possible to estimate the product of these two. The selectivity pattern fa,t defines the relative proportion of age classes in the catch. This age dependent selectivity is the result of several processes. The mixture of these processes makes the fishing selectivity a complex function of age, and specifying an a priori shape may not fully address the multitude of processes that take place in shaping its functional form. Therefore, we used a smooth function of age, using 4 b‐spline basis functions hk(a) (de Boor 2001). These functions can be viewed as four transformations of the explanatory variable a. Each basis function hk(a) is weighted by a constant bk,t. Summing these weighted functions results in the complex smooth function of age, such that

The inverse logit ensures that fa,t takes values between 0 and 1. Temporal changes in the overlap of spatial distribution between fishing effort and the different age classes of the fish population can result in changes of the selectivity pattern. This is captured by modelling the weighting constants as a function of time, hence the subscript t in bk,t.

From Na,t , the expected catch Ca,t for age a and year t can be calculated as follows
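A sketch of the relationships described in this subsection, written in common statistical catch‐at‐age notation. The exact forms used in WP 9 are assumed here rather than quoted, in particular the recursive form of the survival equation and the Baranov form of the catch equation:

```latex
\begin{align}
N_{a+1,t+1} &= N_{a,t}\,\exp(-Z_{a,t}), & Z_{a,t} &= M_a + F_{a,t}, \\
F_{a,t} &= q\, e_t\, f_{a,t}, & f_{a,t} &= \operatorname{logit}^{-1}\Big(\sum_{k=1}^{4} b_{k,t}\, h_k(a)\Big), \\
C_{a,t} &= \frac{F_{a,t}}{Z_{a,t}}\,\big(1 - \exp(-Z_{a,t})\big)\, N_{a,t}.
\end{align}
```

Applying the first relationship repeatedly along a cohort expresses the numbers at age a as the recruitment of that cohort multiplied by the exponential of minus the accumulated total mortality.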

The catch consists of discards Da,t, landings La,t and industrial bycatch Ia,t. We assume that an age‐dependent fraction da,t of the catch is discarded, such that

$$D_{a,t} = d_{a,t}\, C_{a,t}\, (1 - i_{a,t})$$

$$L_{a,t} = (1 - d_{a,t})\, C_{a,t}$$

$$I_{a,t} = d_{a,t}\, C_{a,t}\, i_{a,t}$$

The discard selection pattern and industrial bycatch selection pattern are also modelled using b‐splines, with inverse logit transformation to ensure selectivity between 0 and 1.
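A minimal sketch of a spline‐based selection pattern, assuming cubic B‐splines with no interior knots (which gives exactly four basis functions) over ages 1 to 7. The knot placement, the weights and the function name are hypothetical; the report does not state the knots used.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Cox-de Boor recursion; returns an array of shape (len(x), n_basis)."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    n0 = len(knots) - 1
    # Degree-0 basis: indicator function of each knot interval.
    B = np.zeros((len(x), n0))
    for j in range(n0):
        B[:, j] = (knots[j] <= x) & (x < knots[j + 1])
    # Close the last non-empty interval on the right so the end point is covered.
    last = np.max(np.nonzero(knots[:-1] < knots[1:])[0])
    B[x == knots[-1], last] = 1.0
    for d in range(1, degree + 1):
        Bn = np.zeros((len(x), n0 - d))
        for j in range(n0 - d):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            if left_den > 0:
                Bn[:, j] += (x - knots[j]) / left_den * B[:, j]
            if right_den > 0:
                Bn[:, j] += (knots[j + d + 1] - x) / right_den * B[:, j + 1]
        B = Bn
    return B

ages = np.arange(1, 8)                       # ages 1 to 7
knots = np.r_[[1.0] * 4, [7.0] * 4]          # no interior knots: 4 cubic basis functions
H = bspline_basis(ages, knots, degree=3)     # h_k(a), shape (7, 4)
b_t = np.array([-3.0, 1.0, 2.0, -1.0])       # hypothetical weights b_{k,t} for one year
f_t = 1.0 / (1.0 + np.exp(-(H @ b_t)))       # inverse logit keeps the pattern in (0, 1)
```

The same construction, with its own set of weights, would apply to the discard and industrial bycatch patterns; letting the weights change with time gives a time‐varying pattern.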

Finally, the survey catchabilities for the different surveys are assumed to be constant in time, with selection patterns fitted by b‐splines. Because the different surveys sample the population at different times of year, the survey estimates are transformed to the first of January.

The likelihood function is based on the discards, landings, bycatch and surveys. The likelihood is maximized using the Broyden‐Fletcher‐Goldfarb‐Shanno (BFGS) quasi‐Newton (variable metric) algorithm. Starting values were selected uniformly (within appropriate boundaries). In order to improve the chance that the global maximum of the likelihood function is found, at least 20 different sets of starting values are used, and the highest maximum is retained.

Maximizing the likelihood function results in maximum likelihood parameter estimates and the variance‐covariance matrix is derived from the inverse of the Hessian.

It is now possible to draw random values from a multivariate normal distribution with those means and variance‐covariances, and use the resulting random realization to estimate the population and fisheries characteristics of interest. Each randomization leads to different predictions for the population characteristics. Repeating such randomizations 1000 times can provide an estimate of variance. The 2.5% and 97.5% quantiles of these predictions approximate the corresponding 95% confidence intervals for these variables, and therefore provide the required measure of uncertainty.
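The estimation and uncertainty steps described above can be sketched on a toy problem. The model below (log abundance declining linearly with age) stands in for the catch‐at‐age likelihood and is purely hypothetical, as are all names and values; the sketch also uses the inverse‐Hessian approximation that SciPy's BFGS routine builds up during the search, which is cruder than a Hessian evaluated at the optimum.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
x = np.arange(1.0, 11.0)
true_a, true_z, true_sd = 100.0, 0.5, 0.1
y = np.log(true_a) - true_z * x + rng.normal(0.0, true_sd, x.size)   # toy data

def negloglik(theta):
    """Negative Gaussian log-likelihood of the toy model (additive constants dropped)."""
    log_a, z, log_sd = theta
    resid = y - (log_a - z * x)
    return x.size * log_sd + 0.5 * np.sum(resid ** 2) / np.exp(2.0 * log_sd)

# 20 sets of starting values drawn uniformly within (hypothetical) boundaries.
starts = rng.uniform(low=[0.0, 0.0, -3.0], high=[8.0, 2.0, 1.0], size=(20, 3))
fits = [minimize(negloglik, s, method="BFGS") for s in starts]
fits = [f for f in fits if np.isfinite(f.fun)]
best = min(fits, key=lambda f: f.fun)        # lowest negative log-likelihood

cov = best.hess_inv                          # approximate variance-covariance matrix
draws = rng.multivariate_normal(best.x, cov, size=1000)

# Derived quantity of interest for each draw: predicted abundance at x = 5.
pred = np.exp(draws[:, 0] - draws[:, 1] * 5.0)
lo, hi = np.percentile(pred, [2.5, 97.5])    # approximate 95% interval
```

Any quantity that can be computed from the parameters (spawning‐stock biomass, mean fishing mortality, reconstructed discards) can be passed through the same draws to obtain its interval.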

4.3.3 A model based on autoregressive-in-time age selectivities

The model to be described here follows closely the one developed by Fernández et al. (2008), presented as WP 13, where full details can be found. It is a statistical catch‐at‐age model, with population dynamics modelled forwards in time, and it is fitted using Bayesian methods. The general approach taken for choosing prior distributions was to centre them at values deemed reasonable, which requires some knowledge of the stock being modelled, and to assign them large CVs (coefficients of variation), so as to prevent them from having too strong an impact on posterior inferences.

Modelling starts by setting prior distributions on recruitment (say, age 0 individuals) in each of the assessment years, and on numbers of individuals in each of the ages (including the plus‐group) in the first assessment year. Lognormal prior distributions were chosen.

Given yearly recruitments, numbers‐at‐age in the first year, fishing and natural mortality rates, population dynamics are assumed to be deterministic, with coherent treatment of the plus‐group.

The ideas to deal with fishing mortality due to discarding are presented in full detail in the WP and here a condensed summary is given. The WP is set in the context of a hake stock fished by Spain and Portugal, but it is straightforward to translate the ideas to other stocks and fisheries. The core of these ideas was already presented by Punt et al. (2006) and they have also been developed in a slightly different direction in the model described in the previous subsection of this report.

Consider a stock fished by two different fleets, say F1 and F2. Fishing mortality caused by discarding is handled following 3 key ideas.

First: To decompose the total fishing mortality rate for age a in year y, F(y,a), as

F(y,a) = FL(y,a) + FDF1(y,a) + FDF2(y,a),

where FL(y,a) denotes fishing mortality corresponding to the landings, and FDF1(y,a) and FDF2(y,a) fishing mortality corresponding to discards from fleets F1 and F2, respectively.

Second: To relate each type of catch data (landed numbers‐at‐age, discarded numbers‐at‐age from F1, discarded numbers‐at‐age from F2) to the underlying population abundances N(y,a) and the appropriate term in F(y,a), using the Baranov catch equation and assuming lognormal errors. For example, for the landed numbers‐at‐age, L(y,a), this is

L(y,a) ~ Lognormal(N(y,a) [1‐exp(‐Z(y,a))] FL(y,a)/Z(y,a), CV[L(a)] ),

where Z(y,a) denotes the total (fishing plus natural) mortality rate and the two parameters of the Lognormal distribution correspond to its median and CV, respectively. There are similar equations for the discarded numbers‐at‐age from fleets F1 and F2. It is important to note that in these Baranov catch equations, the landings are related to FL(y,a) and the discards from fleets F1 and F2 to FDF1(y,a) and FDF2(y,a), respectively.
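A minimal sketch of these observation equations, assuming the standard Baranov form in which each catch component is the number of deaths multiplied by the ratio of its own fishing mortality to the total mortality Z = F + M. Function and argument names are hypothetical, and the conversion from CV to log‐scale standard deviation is the usual lognormal relationship rather than something stated in the WP.

```python
import numpy as np

def expected_catches(n, m, f_land, f_disc1, f_disc2):
    """Median landed and discarded numbers-at-age from the Baranov catch equation."""
    f_total = f_land + f_disc1 + f_disc2          # decomposition of F(y,a)
    z = f_total + m                               # total mortality
    deaths = n * (1.0 - np.exp(-z))               # all deaths during the year
    components = (("landings", f_land), ("discards_F1", f_disc1), ("discards_F2", f_disc2))
    return {name: deaths * f / z for name, f in components}

def draw_lognormal(median, cv, rng):
    """Draw from a lognormal distribution parameterized by its median and CV."""
    sd_log = np.sqrt(np.log(1.0 + cv ** 2))       # log-scale sd implied by the CV
    return median * np.exp(rng.normal(0.0, sd_log, size=np.shape(median)))
```

By construction the three components add up to the total catch implied by F(y,a), so the decomposition changes how the catch is attributed but not how many fish die.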

We are mainly concerned with the situation where there are estimates of landed numbers‐at‐age in all of the assessment years, but there are gaps in the time‐series of discarded numbers‐at‐age estimates. The gaps for F1 and F2 could be in the same or in different years. This causes problems for the estimation of the full matrices of fishing mortality rates FDF1(y,a) and FDF2(y,a), applying the third key idea.

Third: To reduce the dimensionality of FDF1(y,a) and FDF2(y,a). This could be done in different ways. Here, separability‐type assumptions were considered, as follows:

FL(y,a) = f(y) rL(y,a)

FDF1(y,a) = f(y) rDF1(y,a)

FDF2(y,a) = f(y) rDF2(y,a)

where f(y) is a common factor related to overall yearly fishing effort, and rL(y,a), rDF1(y,a) and rDF2(y,a) relate to fishery selectivity‐at‐age for landings and discards from F1 and F2, respectively. To get an identifiable model, one might set rL(y,aref) = 1 for some reference age aref.

For many stocks, it will be reasonable to assume that selectivities‐at‐age vary smoothly over time, without big changes from one year to the next. In that case, it is sensible to choose an autoregressive model (call it a prior distribution in a Bayesian setting or a random effects model in a classical setting) for each of log(rL(y,a)), log(rDF1(y,a)) and log(rDF2(y,a)). A separate AR(1) model has been chosen for each of these series considering each age separately (although some ages were taken to have the same autocorrelation parameter controlling the AR(1) process). When the autocorrelation parameter of the AR(1) process goes to 0, selectivities‐at‐age are independent from year to year, whereas when the autocorrelation parameter goes to 1, the selectivities‐at‐age are constant in time (i.e. the model tends to a separable model). The multiplicative factor f(y) was also assigned an AR(1) process in log‐scale.
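The two limiting cases can be illustrated by simulating a log‐selectivity series for one age. The sketch assumes a stationary AR(1) parameterization with a fixed marginal standard deviation, so that the innovation variance shrinks as the autocorrelation approaches 1; the exact parameterization and priors used in WP 13 are not reproduced here, and all names and values are hypothetical.

```python
import numpy as np

def simulate_ar1(n_years, rho, mean=0.0, sd=0.3, rng=None):
    """Stationary AR(1) path with autocorrelation rho and marginal standard deviation sd."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(n_years)
    x[0] = mean + sd * rng.normal()                       # draw from the marginal distribution
    innov_sd = sd * np.sqrt(1.0 - rho ** 2)               # keeps the marginal variance fixed
    for t in range(1, n_years):
        x[t] = mean + rho * (x[t - 1] - mean) + innov_sd * rng.normal()
    return x

rng = np.random.default_rng(13)
log_r_independent = simulate_ar1(30, rho=0.0, rng=rng)    # no year-to-year memory
log_r_smooth = simulate_ar1(30, rho=0.9, rng=rng)         # slowly varying selectivity
log_r_constant = simulate_ar1(30, rho=1.0, rng=rng)       # constant in time (separable limit)
```

With `rho=1.0` the innovation standard deviation is exactly zero, so the series stays at its first value, which corresponds to the separable model mentioned above.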

4.3.4 Description of the experiment considered

During this WG, it was decided to test the performance of both models (spline‐based and autoregressive‐in‐time) on the North Sea haddock dataset. For this stock, there are complete time‐series of estimates of three types of catch data: landed, discarded and industrial bycatch numbers‐at‐age (which will be referred to as “current estimates” henceforth), which are summed up before applying XSA in the current accepted ICES assessment. With the two models considered here, each of these three types of catch data has its own observation equation, so that they are not summed before entering them in the assessment models.

The models were tested under three different scenarios. In all of them it was assumed that a complete time‐series of landed numbers‐at‐age was available (years 1978–2007).

• Scenario 1: Full time‐series. Estimates of discarded and industrial bycatch numbers‐at‐age available in all years (1978–2007)

• Scenario 2: Some missing years. Estimates of discarded and industrial bycatch numbers‐at‐age available in some years (1980, 1983, 1986, 1989, 1991, 1993, 1995, 1997, 1999–2007)

• Scenario 3: Many missing years. Estimates of discarded and industrial bycatch numbers‐at‐age available in only a few years (1992, 1995, 1997, 1999–2007)

The scenarios were selected to represent decreasing levels of available information, with Scenario 3 deemed to be the most realistic for many stocks, where there is information on discards in the last decade or so, but only very limited information in earlier years. In line with the decreasing levels of information available as input data, it is expected that model estimates will be further away from the “current estimates” when going from Scenario 1 to Scenarios 2 and 3. Due to space considerations, detailed results will be presented only for Scenarios 1 (no missing data) and 3 (deemed to be the most realistic); Scenario 2 results will be intermediate between these two. Summary plots of results for all three scenarios will also be presented.

4.3.5 Detailed results under scenario 1: full time-series

Results from the spline-based model

If all data for discards, industrial bycatch and landings for North Sea Haddock are used, the b‐spline based model seems to fit the catch‐at‐age data reasonably well (see Figure 4.3.1). However, several of the “current estimates” are far outside of the confidence bounds of the model estimates. For example, the discards model estimate in 1978 is much lower than the observation, while the discard model estimate in 2000 is much higher than the observation. Also, both observations are far outside of the estimation confidence bounds. The large difference in 2000 is owing to the fact that the model fits too high discards of age 1 in that particular year. With respect to the industrial bycatch, there is a structural overestimation of the industrial bycatches, especially in the period up to 2000. Also, the observations do not lie within the confidence intervals of the model estimates.

Figure 4.3.1. North Sea haddock. B‐spline model estimates of landings, discards and industrial bycatch estimates and confidence bounds. Triangles in the top of the panels indicate the year for which discard data were put into the model.

The raw residuals of the model fits with respect to the discards, landings and industrial bycatch (Figure 4.3.2) indicate that the variance in the discards and industrial bycatch increases with age. In contrast, the variance in the landings‐at‐age data is much smaller, and lowest for the intermediate ages. The residuals for the estimates from the discards model fits exhibit strings of positive or negative residuals in some of the younger ages. This indicates that the single parameter that makes the spline parameters dependent on time does not fully explain the changes that have been going on in the fishery. Hence, the model can probably be improved by increasing the number of parameters that make the model selectivities dependent on time. This hypothesis can be tested by rerunning the model with such a setup, and comparing the AIC value for the different models.

Figure 4.3.2. North Sea haddock. B‐spline model raw log‐residuals for the discards, landings and industrial bycatch estimates for ages 1 to 7 (left set of panels) and raw log‐residuals for the different survey tuning series (right panels).

The spawning‐stock biomass, mean fishing mortality and recruitment estimates generated from the model (Figure 4.3.3) indicate a large year class occurred in 1999, followed by several years of low recruitment. Mean fishing mortality has been fluctuating around 0.8, with confidence bounds between 0.7 and 0.9. Since the large 1999 year class, the fishing mortality has come down, but has been increasing again to current levels of approximately 0.5, with confidence bounds between 0.4 and 0.55. The SSB has fluctuated throughout the time‐series, with a high in 2002, when SSB reached levels of approximately 350 000 tonnes. The SSB in 2007 is estimated to be between 100 000 and 150 000 tonnes. These estimates are in line with the current perceptions of the stock dynamics based on XSA by the ICES WG that deals with North Sea haddock (ICES WGNSSK 2008).

Figure 4.3.3. Estimates of North Sea haddock SSB, F and recruitment from the spline based model with full data. Triangles in the top of the panels indicate the year for which discard data were put into the model.

Results from the autoregressive-in-time model

As this model was fitted using Bayesian methods, a joint posterior distribution for all unknown quantities has been obtained. Simulated values from the posterior distribution were generated using an MCMC (Markov chain Monte Carlo) algorithm. This was done with the (free) software WinBUGS, called from R using the package R2WinBUGS. As MCMC algorithms simulate the posterior distribution in a dependent fashion (each simulated value depends on the previous one), long simulation runs are necessary to get a reasonable approximation to the posterior distribution. At the same time, WinBUGS was found to be rather slow, which limited the length of the simulation runs that could be performed during the Methods WG meeting. Each simulation run presented in this report consisted of a burn‐in period (an initial number of draws to be discarded, to mitigate the effect of start‐up values in the MCMC algorithm) of 1600 iterations, followed by 4000 additional iterations, of which every second one was recorded in order to reduce autocorrelation in the recorded values. Each of these simulation runs took about 12 hours on a standard desktop PC. No convergence or autocorrelation problems were apparent in the recorded values, although this issue was not studied in full detail. With more time available, this would have been examined more carefully and it would probably have been desirable to conduct longer simulation runs, particularly for the estimation of extreme (2.5% and 97.5%) posterior quantiles.
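The burn‐in and thinning scheme described above can be sketched with a generic random‐walk Metropolis sampler. This is not the algorithm WinBUGS applies to the assessment model: the sampler, the standard normal target and all names are assumptions chosen only to show how 1600 burn‐in iterations, 4000 further iterations and recording every second value lead to 2000 recorded draws.

```python
import numpy as np

def metropolis(log_post, start, n_burn, n_iter, thin, step=1.0, rng=None):
    """Random-walk Metropolis sampler with burn-in and thinning for a scalar parameter."""
    rng = np.random.default_rng() if rng is None else rng
    x, lp = start, log_post(start)
    kept = []
    for i in range(n_burn + n_iter):
        prop = x + step * rng.normal()                    # symmetric proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:          # accept or reject
            x, lp = prop, lp_prop
        # Discard the burn-in, then record every `thin`-th value.
        if i >= n_burn and (i - n_burn) % thin == thin - 1:
            kept.append(x)
    return np.array(kept)

draws = metropolis(lambda v: -0.5 * v * v, start=5.0,     # toy target: standard normal
                   n_burn=1600, n_iter=4000, thin=2,
                   rng=np.random.default_rng(0))
median, lo, hi = np.percentile(draws, [50, 2.5, 97.5])    # posterior median and 95% interval
```

Because the recorded values remain autocorrelated, the tail quantiles `lo` and `hi` are estimated less precisely than the median, which is why longer runs are desirable for the 2.5% and 97.5% quantiles.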

Figure 4.3.4 displays estimated stock trends (posterior medians and 95% probability intervals), which are, for the most part, consistent with the results from the current ICES XSA assessment. A notable difference with the ICES XSA assessment is the increase in SSB estimated by the autoregressive‐in‐time model for 2007, which is not seen in the XSA assessment. Given the little time available during the Methods WG, and that comparison with XSA results was not a main focus of this work, potential causes of this difference were not explored.

Figure 4.3.4. North Sea haddock, autoregressive‐in‐time model. Full time‐series, stock trends. Panels: SSB, recruitment and Fbar(2‐4), each showing estimates and 95% probability intervals.

Figures 4.3.5 and 4.3.6 display, respectively, raw and standardized residuals of log(landed numbers‐at‐age), log(discarded numbers‐at‐age) and log(by‐caught numbers‐at‐age). Raw residuals are defined as the posterior median of (observed value ‐ estimated value from model), whereas standardized residuals are defined as the posterior median of (observed value ‐ estimated value from model)/(estimated standard deviation of the Normal observation equation in log‐scale). As there are no landings of age 0 in any of the assessment years, this was taken to be the case also in the model which, therefore, did not produce any fitted values. This explains the absence of residuals for landed numbers of age 0.
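The two residual definitions can be sketched as follows, assuming the posterior draws of the fitted log values and of the log‐scale standard deviations are stored as arrays. The function name and array layout are hypothetical, and observations of zero (for which the log is undefined) would need to be excluded beforehand.

```python
import numpy as np

def residual_summaries(log_obs, log_fit_draws, sd_draws):
    """Posterior-median raw and standardized residuals on the log scale.

    log_obs       : observed log numbers-at-age, shape (n_years, n_ages)
    log_fit_draws : fitted log values per posterior draw, shape (n_draws, n_years, n_ages)
    sd_draws      : log-scale observation sd per draw and age, shape (n_draws, n_ages)
    """
    raw = log_obs[None, :, :] - log_fit_draws             # one residual array per draw
    standardized = raw / sd_draws[:, None, :]             # scale by that draw's sd
    return np.median(raw, axis=0), np.median(standardized, axis=0)
```

Standardized residuals that are consistently much smaller than 1 in absolute value for a given age suggest that the standard deviation for that age is estimated on the high side.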

From the raw residuals plot (Figure 4.3.5), it is clear that model fit follows most closely the landed numbers‐at‐age, ages 0 to 4 of the discarded numbers‐at‐age and ages 1 to 4 of by‐caught numbers‐at‐age. The older ages show larger residuals, which is not surprising as the input data numbers for those ages are usually low (including many zeros), so those ages are, presumably, less well sampled. The poor residual pattern for the age 0 bycatch is strongly related to the fact that there is a sudden drop in the bycatch input data values, which become zero in the last four years. The autoregressive‐in‐time model does not appear to cope well with such a strong and sudden drop.

Figure 4.3.6 shows standardized residuals. The poor behaviour of age 0 bycatch residuals is again seen in this graph. The standard deviation appears to have been overestimated for certain ages (deduced from the very small standardized residuals), but, on the whole, model fit is considered to be acceptable.

Figure 4.3.5. North Sea haddock, autoregressive‐in‐time model. Full time‐series, raw residuals of log(numbers‐at‐age). Panels: discards, industrial bycatch (IBC) and landings.

Figure 4.3.6. North Sea haddock, autoregressive‐in‐time model. Full time‐series, standardized residuals of log(numbers‐at‐age). Panels: discards, industrial bycatch (IBC) and landings.

Figure 4.3.7 presents yearly landings in weight (solid circles) and fitted values from the model (posterior medians, open circles), as well as 95% posterior probability intervals. Figures 4.3.8 and 4.3.9 present similar results for discards and bycatch. The model fit appears to be quite reasonable for the landings and even better for the discards. On the other hand, bycatch tends to be overestimated (model estimates often larger than the input data values) and the 95% probability intervals have very long right tails. Further investigation of this issue indicated that this behaviour was coming mostly from the age 0 individuals and that it is related to the sudden drop in the input data from large positive to zero values in the last four years. It is reasonable to expect that this bad behaviour for the bycatch would not happen had this sudden drop in bycatch input data values not occurred.

Figure 4.3.7. North Sea haddock, autoregressive‐in‐time model. Full time‐series. Observed (solid circles) and fitted (open circles) yearly weight landed, with 95% probability intervals.

Figure 4.3.8. North Sea haddock, autoregressive‐in‐time model. Full time‐series. Observed (solid circles) and fitted (open circles) yearly discarded weight, with 95% probability intervals.

Figure 4.3.9. North Sea haddock, autoregressive‐in‐time model. Full time‐series. Observed (solid circles) and fitted (open circles) yearly by‐caught weight, with 95% probability intervals.

4.3.6 Detailed results under scenario 3: many missing years

In the scenario where most of the discard data from the beginning of the time‐series are removed, the landings estimated by the model and the related uncertainty do not change substantially (Figure 4.3.10). However, the uncertainty bounds around the discards estimates (which are now not in the model), have become wider, reflecting the uncertainty that the model has to reconstruct estimates for discard data in the absence of historical information. This results in most discard observations now being within the uncertainty bounds. Much of the dynamics in the discards time‐series with respect to the median estimates are unchanged compared to the model with full information. However, the underestimation of the median model discards at the beginning of the time‐series increased with the removal of the discard data. The estimation of the industrial bycatch data seems most problematic. The overestimation at the beginning of the time‐series increased. Although the confidence bounds around the median estimates have also increased, the observations are still outside of the confidence bounds.

Figure 4.3.10. North Sea haddock, b‐spline model. Landings, discards and industrial bycatch estimates and confidence bounds. Triangles at the top of the panels indicate the years for which discard data were put into the model.

The raw residuals of the industrial bycatch estimates (Figure 4.3.11) indicate that the overestimation by the model is strongest at ages 1 and 2.

Figure 4.3.11. North Sea haddock, b‐spline model. Raw log‐residuals for the discards, landings and industrial bycatch estimates for ages 1 to 7 (left set of panels) and raw log‐residuals for the different survey tuning series (right panels).

The recruitment estimates do not change substantially when the discard data are removed (Figure 4.3.12). The median estimates of mean F have increased, with historical levels fluctuating around 1.0, and the confidence intervals for mean F have widened, but the overall pattern is similar to that in the full data scenario. The SSB estimates have not changed substantially compared to the full data scenario.

Figure 4.3.12. Estimates of North Sea haddock SSB, F and recruitment from the spline‐based model with least discard data. Triangles at the top of the panels indicate the years for which discard data were put into the model.

Figure 4.3.13 displays stock trends. The perception of stock trends is not very different from that under Scenario 1 (full time‐series). The most obvious difference appears to be in the recruitment estimate for 1994 (one of the years with missing discards and bycatch data), which is now estimated to be considerably larger, with a large increase in the associated uncertainty too.

[Figure panels: SSB, Recruitment and Fbar(2-4), each showing estimates and 95% probability intervals; years 1980–2005]

Figure 4.3.13. North Sea haddock, autoregressive‐in‐time model. Many missing years. Stock trends.

Raw and standardised residuals of log(landed numbers‐at‐age), log(discarded numbers‐at‐age) and log(by‐caught numbers‐at‐age) are presented in Figures 4.3.14 and 4.3.15, respectively. On the whole, the residuals show characteristics similar to those found under Scenario 1 (full time‐series).

[Figure panels: Discards (raw), IBC (raw) and Landings (raw); years 1980–2005]

Figure 4.3.14. North Sea haddock, autoregressive‐in‐time model. Many missing years. Raw residuals of log(numbers‐at‐age).

[Figure panels: Discards (stand), IBC (stand) and Landings (stand); years 1980–2005]

Figure 4.3.15. North Sea haddock, autoregressive‐in‐time model. Many missing years. Standardised residuals of log(numbers‐at‐age).

Figure 4.3.16 displays observed (solid circles) and estimated (posterior median, open circles) landed weight by year, with 95% posterior probability intervals. Figures 4.3.17 and 4.3.18 present similar plots for yearly weights of discards and bycatch, respectively. In these two figures, the years with available data are denoted by solid circles. For the years without available data, the “current estimates” (by which we mean those values treated as input data under Scenario 1, many of which are treated as missing under Scenario 3) can be deduced by following the dashed line.

Compared with Scenario 1 (full time‐series), there is very little difference in the estimates of landings weight (Figures 4.3.7 and 4.3.16). There is, however, much more difference in the posterior distributions of discards, which now have much wider 95% probability intervals (Figures 4.3.8 and 4.3.17). Figure 4.3.17 shows that the model estimates (posterior medians) of discards are smaller than the “current estimates” in the early part of the time‐series, when all discards data are missing, although the “current estimates” are inside the 95% posterior probability intervals in almost all years. It is also clear from Figure 4.3.17 that these probability intervals are much narrower in the years with discards data than in the years where such data are missing. In terms of yearly bycatch, the values estimated by the model under Scenario 3 are much larger than the “current estimates” (Figure 4.3.18), in some years by many orders of magnitude, and the 95% posterior probability intervals are generally very wide. Once again, the 95% posterior probability intervals contain the “current estimates” in most years.

[Figure panel: Landings, observed (solid circles), estimated (open circles) and 95% probability intervals; years 1980–2005]

Figure 4.3.16. North Sea haddock, autoregressive‐in‐time model. Many missing years. Observed and fitted yearly landed weight.

[Figure panel: Discards, observed (solid circles), estimated (open circles) and 95% probability intervals; years 1980–2005]

Figure 4.3.17. North Sea haddock, autoregressive‐in‐time model. Many missing years (discards data in 1992, 1995, 1997, 1999–2007). Observed and fitted yearly discarded weight.

[Figure panel: Bycatch, observed (solid circles), estimated (open circles) and 95% probability intervals; years 1980–2005]

Figure 4.3.18. North Sea haddock, autoregressive‐in‐time model. Many missing years (bycatch data in 1992, 1995, 1997, 1999–2007). Observed and fitted yearly by‐caught weight.

4.3.7 Comparing the three scenarios

In order to facilitate comparison of results across the 3 different scenarios considered throughout, two different plots are presented:

1 ) Time series of (Model estimated landings – Observed landings)/Observed landings, under each of the 3 scenarios.

2 ) Time series of (Model estimated landings – Observed landings)/Interquantile range, under each of the 3 scenarios, where the interquantile range is defined as the distance between the model point estimate and the extreme of the 95% interval that is closest to the observed landings value.

Similar plots are produced for discards and bycatch.
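The two comparison statistics listed above can be sketched in a few lines of Python. This is only an illustration: the function and argument names are hypothetical, and the “closest” interval bound is interpreted here as the bound lying on the same side of the point estimate as the observation (an assumption, chosen because it makes values inside (‐1,1) correspond to observations inside the 95% interval).

```python
def relative_error(estimate, observed):
    """(model estimate - observed) / observed."""
    return (estimate - observed) / observed


def scaled_error(estimate, observed, lower, upper):
    """(model estimate - observed) / distance from the point estimate to the
    95% interval bound on the same side as the observation (assumed reading
    of the 'interquantile range' defined in the text)."""
    if observed >= estimate:
        half_width = upper - estimate
    else:
        half_width = estimate - lower
    return (estimate - observed) / half_width


# Hypothetical numbers: point estimate 100 with 95% interval (80, 150).
print(relative_error(100.0, 120.0))             # relative deviation from the observation
print(scaled_error(100.0, 120.0, 80.0, 150.0))  # |value| < 1: observation inside the interval
print(scaled_error(100.0, 70.0, 80.0, 150.0))   # |value| > 1: observation outside the interval
```

Under this reading, an absolute value below 1 means that the observed (or “current estimate”) value lies inside the 95% interval for that year.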

Since going from Scenario 1 to Scenarios 2 and 3 corresponds to a decrease in the level of information that is input into the model, it is expected that performance will “deteriorate” (in the sense of model estimated values being further away from the “current estimates”) when considering the scenarios in order 1, 2, 3.

Figure 4.3.19. North Sea haddock, b‐spline model. (Model estimated discards – observed discards)/observed discards (left panel), with scenarios 1, 2 and 3 depicted by solid, dashed and dotted lines, respectively. (Model estimated discards – observed discards)/interquantile range (right panel), with scenarios 1, 2 and 3 depicted by black, dark grey and light grey dots, respectively.

From Figure 4.3.19 (upper plot), it is clear that there is not much difference between the three scenarios in terms of model estimates of yearly discards in weight, with the large majority of observed values falling inside the corresponding 95% confidence bounds.

For discards, there is no clear indication of which scenario fits the data best, although the underestimation is strongest in the scenario with least data. The difference between the three scenarios is much less clear when the width of the 95% probability interval is also taken into consideration (Figure 4.3.19, lower plot). The wider confidence intervals obtained under scenarios 2 and 3 bring the observations within the confidence bounds, reflecting the larger uncertainty.

From Figures 4.3.20 and 4.3.21, it is clear that there is not much difference between the three scenarios in terms of model estimates of yearly landings in weight, with the large majority of observed values falling inside the corresponding 95% posterior probability interval.

For yearly discards, model estimates are closest to “current estimates” under Scenario 1, with the exception of a few years (see Figure 4.3.22), although this difference between the three scenarios is much less clear when the width of the 95% probability interval is also taken into consideration (Figure 4.3.23). This is a consequence of the wider posterior probability intervals obtained under scenarios 2 and 3.

In terms of bycatch, model estimates are almost always above the “current estimates”, with the difference becoming larger when going through the scenarios in the order 1, 2 and 3 (Figure 4.3.23). In 1992, the model estimate under scenario 3 is much closer to the “current estimate” than the model estimate under scenario 2. This might seem counterintuitive, since scenario 3 has more missing years of data than scenario 2. However, 1992 is the only year where discards and bycatch data are treated as missing under scenario 2 but not under scenario 3, which explains the result. The differences between the three scenarios are much less pronounced when the uncertainty in the estimation is also taken into account, as Figure 4.3.24 illustrates.

We note that under all three scenarios, the vast majority of values in the interquantile‐range plots (Figures 4.3.21, 4.3.22 and 4.3.24) are inside the bounds (‐1,1), indicating that “current estimates” are generally inside the 95% posterior probability intervals derived from the autoregressive‐in‐time model.

[Figure panel: Landings, (Estimate/Observation) - 1; years 1980–2005]

Figure 4.3.20. North Sea haddock, autoregressive‐in‐time model. (Model estimated landings – Observed landings)/Observed landings (Scenarios 1, 2 and 3 depicted by solid, dashed and dotted lines, respectively).

[Figure panel: Landings, (Model estimate - Observed value)/(Interquantile range); years 1980–2005]

Figure 4.3.21. North Sea haddock, autoregressive‐in‐time model. (Model estimated landings – Observed landings)/Interquantile range (Scenarios 1, 2 and 3 depicted by solid, dashed and dotted lines, respectively).

[Figure panel: Discards, (Model estimate/“Current estimate”) - 1; years 1980–2005]

Figure 4.3.22. North Sea haddock, autoregressive‐in‐time model. (Model estimated discards – “Current estimates”)/“Current estimates” (Scenarios 1, 2 and 3 depicted by solid, dashed and dotted lines, respectively).

[Figure panel: Discards, (Model estimate - “Current estimate”)/(Interquantile range); years 1980–2005]

Figure 4.3.22. North Sea haddock, autoregressive‐in‐time model. (Model estimated discards – “Current estimates”)/Interquantile range (Scenarios 1, 2 and 3 depicted by solid, dashed and dotted lines, respectively).

[Figure panel: Bycatch, (Model estimate/“Current estimate”) - 1; years 1980–2005]

Figure 4.3.23. North Sea haddock, autoregressive‐in‐time model. (Model estimated bycatch – “Current estimates”)/“Current estimates” (Scenarios 1, 2 and 3 depicted by solid, dashed and dotted lines, respectively).

[Figure panel: Bycatch, (Model estimate - “Current estimate”)/(Interquantile range); years 1980–2005]

Figure 4.3.24. North Sea haddock, autoregressive‐in‐time model. (Model estimated bycatch – “Current estimates”)/Interquantile range (Scenarios 1, 2 and 3 depicted by solid, dashed and dotted lines, respectively).

4.3.8 Conclusions

The two stock assessment models presented here are able to incorporate discards data (and other kinds of catch data, such as bycatch) that may be available in just some of the assessment years. The models produce stock assessments that incorporate all the available catch information while, at the same time, yielding complete time‐series of model estimates for discards (and any other components of the catch incorporated in the model, such as bycatch). The model estimates have associated confidence (in the maximum likelihood setting) or posterior probability (in the Bayesian setting) intervals, and the stock assessment results also incorporate this uncertainty.

The models fill in the gaps in the time‐series of discards and/or bycatch by reducing the number of parameters in the selectivities‐at‐age of landings, discards and/or bycatch. One of the models presented here does this by considering smoothing splines over the ages and a time‐trend of parametric form, whereas the other model uses autoregressive processes in time for the selectivities of each age separately. The main information that helps in reconstructing the unobserved component of catch comes from tuning series of constant catchability over time spanning the assessment years.

The models were fitted under different statistical paradigms: whereas the spline‐based model fit was achieved by maximum likelihood, the autoregressive‐in‐time model was analysed in a Bayesian context. This difference in statistical approaches, however, is not expected to have a major impact on the conclusions drawn from the analyses, particularly since the prior distributions chosen in the Bayesian model were quite wide (only slightly informative), so that the results obtained are expected to be quite similar to those that would have been obtained under a maximum likelihood analysis of the same model.

Three scenarios were analysed in order to examine the behaviour of the models under different levels of available information. The scenarios are in the context of the North Sea haddock stock, for which complete time‐series of discards and bycatch numbers‐at‐age (denoted as “current estimates”) exist. In Scenario 1, full time‐series of discards and bycatch were assumed to be available. In Scenario 2, several of the discards and bycatch data values were assumed to be missing. In Scenario 3, even more of the discards and bycatch data values were assumed to be missing. In the two missing data scenarios, it was assumed that all values from 1999 onwards were available and that data were missing in earlier years, as this is deemed to be the more realistic situation. Unfortunately, although the setup of the scenarios is similar for the two approaches, in the case of the spline‐based model only the discard estimates were removed. In addition, in all three scenarios, a full time‐series of landings data was assumed to exist.

For the autoregressive‐in‐time model, the three scenarios behaved similarly in terms of fitting the landings data, as Figures 4.3.20 and 4.3.21 indicate. For the discards data, model point estimates were closer to “current estimates” under Scenario 1 (full time‐series of input data) than under Scenarios 2 and 3 (Figure 4.3.22). However, no major differences between scenarios were detected once the differences between model point estimates and “current estimates” were rescaled to account for the widths of the 95% posterior probability intervals (Figure 4.3.22). This is due to the wider probability intervals obtained under Scenarios 2 and 3. In terms of bycatch, model point estimates were generally above the “current estimates” and this effect became very strong in scenarios 2 and 3 (Figure 4.3.23). The differences between the three scenarios became much less pronounced, however, once the differences between model point estimates and “current estimates” were rescaled to account for the widths of the 95% posterior probability intervals. From this we conclude that, although model point estimates (posterior medians) of discards and bycatch tend to move further away from the “current estimates” as the number of missing years in the data increases (and, indeed, for the bycatch they can be off by many orders of magnitude), these deviations are captured by the width of the 95% posterior probability intervals, which almost always contain the “current estimates”. In some sense, the model implicitly acknowledges a lack of knowledge about discards and, particularly, about bycatch in some years, and automatically incorporates this uncertainty into the assessment results. Since the 95% posterior probability intervals have been found to contain the “current estimates” almost always, there is no strong reason to suspect serious biases in the assessment results when the Bayesian autoregressive‐in‐time model is used.

For the spline‐based model, the proposed estimation model seems to capture most of the dynamics of the input data. The main problem lies in the estimation of the industrial bycatch, which was added ad hoc during the WGMG meeting to accommodate the specific North Sea haddock catch‐at‐age data. In retrospect, more model parameters could be included in the selection patterns that define the landings, discards and industrial bycatches. The model would then have more parameters to estimate, but the relatively data‐rich haddock stock can support many free parameters. Also, although the discards are relatively well estimated, in some years the observed discards are rather far outside the confidence bounds. This could be addressed by adding more process knowledge of discarding into the model. However, in this exercise we tried to test the appropriateness of a simple model.

When the discard data are removed from the observations that the model is fitted to, most of the estimates stay relatively close to those based on the full dataset. The uncertainty bounds for several relevant estimates widen, reflecting the lack of data needed to estimate these parameters precisely. The perception of the population dynamics seems to confirm the current perception of the North Sea demersal working group (ICES‐WGNSSK 2008). The estimated average fishing mortality seems to increase with the removal of data in the earlier time period but, in general, the uncertainty estimates for these parameters are large.

The finding that the model predictions are relatively robust against removal of part of the discards data relies entirely on the quality of the survey tuning indices. It appears that these contain enough information for a reasonable reconstruction of the discards. This is of course no guarantee that the same is true for all fish stocks. Also, the reconstruction of the discards is only possible in years where survey tuning series exist. This may hamper reconstructing discards for very long time‐series.

One of the drawbacks of the method presented here is that the approach cannot distinguish between changes in natural mortality and changes in discarding practices. The model only sees the historical estimates of total removals implied by the surveys. Because the natural mortality estimates are fixed in these models (generally based on historical estimates), changes in natural mortality will be interpreted as changes in discarding.

4.3.9 Research recommendations

The performance of the two models (spline‐based and autoregressive in time) considered for stock assessment purposes in the presence of discards information in only some years has only been tested in a limited number of cases. Hence, a wider‐ranging simulation exercise would be appropriate before their widespread use can be recommended. Nonetheless, from the work done so far, conducting stock assessments using these models is a possible way forward when information on discards and/or bycatch is missing for certain years, and for explicit incorporation of uncertainty in estimates. This holds provided that the key model assumptions are met (in particular that selectivities‐at‐age do not change abruptly between consecutive years and that yearly fishing effort can be treated as a common factor for all the catch types).

The main use of these models should be in providing stock assessments that can incorporate discards information in just some years. Whether they can “correctly” reconstruct historical discarded values (or any other missing components of the catch) is less clear but, at least, they are able to incorporate the available information regarding discards, and they acknowledge the great uncertainty about past discarded values by estimating very wide confidence (or posterior probability) intervals. This uncertainty will then automatically be incorporated in the assessment results.

The fact that the models explicitly have terms to account for discards (and/or other missing components of the catch) makes them useful for evaluation of management strategies that incorporate ways to manage discards.

5 Subgroup 3: Detecting changes in stock productivity

Participants: Benoit Mesnil (Subgroup Chair), Joachim Gröger.

5.1 Introduction

Detection of changes in the behaviour of a process based on past observations or on ongoing monitoring data is a routine concern in many fields of activity (e.g. epidemiology, econometrics, environmental surveillance, etc.). Since undesirable changes may have serious social or economic implications, decision‐makers have for decades turned to statisticians to provide reliable methods for prompt and sure detection of such changes. The literature on this issue is very extensive, and numerous methods have been proposed and discussed by experts. Since the detection problem in the fisheries context is not essentially different from that in other fields, most of these methods should be directly relevant to fisheries applications. However, WGMG was not in a position to review the full array of available methods, keeping in mind that it was the first time that it was asked to address this term of reference. Therefore, only a subset of such methods is considered here.

Before starting to identify shifts in time‐series, some questions need to be addressed in order to choose the right shift detection procedure:

a ) What is the specific purpose of the exercise?
    i ) Purely detecting one or more shift(s) in
        1 ) time‐series?
        2 ) other types of data?
    ii ) Fitting a (good) model to the disturbed time‐series and, by forecasting, diagnosing potential future effects of the shift?
    iii ) Removing the shift(s) detected for a correction of the time‐series (shift correction)?

b ) Of what type is the shift: shock‐like (impulse, pulse), step‐like or continuous?
    i ) Is it a shift in the mean (level)?
    ii ) Is it a shift in the time‐trend?
    iii ) Is it a change in the variance (heteroscedasticity)?

c ) How strong is the shift / effect / response relative to the “normal” fluctuation?
    i ) Is the underlying time‐series stationary?
    ii ) What are the assumptions / limitations of the identification methods?

Here we review and also explore the applicability of four different classes of instruments to fisheries problems:

1 ) Methods of statistical process control (SPC);
2 ) Econometric methods to detect structural breaks (multiple regression models);
3 ) Time series methods to detect interventions (ARIMAX models);
4 ) Illustrative graphical methods such as traffic light plots, which help to identify and locate potential changes through quantile‐based colour coding of time‐series.

A description of the methods and of their theoretical bases is provided in Section 5.2. It should be mentioned that methods 2) and 3) are “static” methods which assume that the location of (respectively) the break and the intervention is known beforehand. These methods therefore need to be made “iterative”, in that they scan the time‐series to find (estimate) the location of the structural break or intervention. This is done here with a search algorithm that looks for the best value of a quality‐of‐fit criterion. We use the mean squared error (MSE) along with Akaike’s information criterion, while at the same time testing whether the structural break or intervention is significant. Section 5.3 then gives examples of applications to two “familiar” North Sea stocks.

5.2 Description of Methods

5.2.1 Methods of Statistical Process Control (SPC)

SPC methods have been routinely used in manufacturing contexts since the 1930s to signal anomalies in production processes. They have also found applications in many other domains, including environmental monitoring, but fisheries applications have been rare (Nicholson 1984; Scandol 2003, 2005). They include a variety of tools (see e.g. Montgomery 2005), among which control charts are the most commonly used. Control charts (CC) are graphical displays of some summary statistic of the observed data (e.g. indicators) against the order index of the sample (e.g. time), together with reference “marks” based on the in‐control mean and variance. They are designed to detect whether the current data indicate a worrisome change in process output that requires corrective action. Different variants of CC are designed to deal with continuous or binary data, proportions, attributes, univariate or multivariate measurements, etc.; also, specific CC can be used to monitor shifts in the mean level or in the variance, and combinations of these are sometimes recommended to improve detection ability (e.g. Stoumbos et al. 2003). Despite their apparent simplicity, CC have strong statistical bases and their properties have been ascertained by eminent experts.

Use of CC is exemplified here with one variant, the “decision interval” (aka “tabular”) form of the Cumulative Sum (CUSUM) CC. Its features are treated in detail in the book of Hawkins and Olwell (1998), who recommend it for the type of data handled in marine applications (e.g. survey data).

The first step is to characterize the in‐control state of the system, when it is deemed to function within desired specifications, typically from a pilot study or over a selected “reference period”. The associated measurements are used to compute the in‐control mean μ and standard deviation σ. Then, as subsequent measurements $x_i$ are taken, it is convenient to standardize them as $z_i = (x_i - \mu)/\sigma$. The decision‐interval CUSUM works by recursively accumulating positive and negative deviations separately with two statistics:

$$S_i^+ = \max\left[0,\; S_{i-1}^+ + z_i - k\right]$$

for positive deviations (“one‐sided upper CUSUM”), and

$$S_i^- = \min\left[0,\; S_{i-1}^- + z_i + k\right]$$

for negative deviations (“one‐sided lower CUSUM”), with starting values normally set as $S_0^+ = S_0^- = 0$. A CUSUM chart is obtained by plotting these statistics against i. If measurements tend to stay above the in‐control mean, the upper CUSUM $S^+$ develops an upward trend; likewise, the lower CUSUM $S^-$ shows a downward trend if observations are consistently below the mean.

The chart’s parameter k (usually called the reference value, or the allowance) is related to the size of the smallest shift in the level of z that one wishes to detect quickly. The decision rule is to declare an out‐of‐control state whenever $S^+$ exceeds the decision interval h or $S^-$ falls below –h. The values chosen for the parameters h and k, both measured in standard deviation units, determine the performance of the CC. There is no theoretical objection to setting different h‐k pairs for the upper and lower CUSUMs if changes in one direction matter more than in the other. The CUSUM chart together with its k and h parameters defines an SPC monitoring scheme.
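The recursions and the decision rule described above can be sketched as follows. This is a minimal illustration, not the implementation used by the Working Group; the function name, the data and the parameter values are hypothetical, and a single (k, h) pair is assumed for both directions.

```python
def cusum(x, mu, sigma, k, h):
    """Decision-interval CUSUM on standardised data.

    Returns (upper, lower, alarms): the S+ and S- series and the 0-based
    indices at which S+ exceeds h or S- falls below -h.
    """
    s_up, s_lo = 0.0, 0.0          # starting values S0+ = S0- = 0
    upper, lower, alarms = [], [], []
    for i, xi in enumerate(x):
        z = (xi - mu) / sigma      # standardise with in-control mean and sd
        s_up = max(0.0, s_up + z - k)
        s_lo = min(0.0, s_lo + z + k)
        upper.append(s_up)
        lower.append(s_lo)
        if s_up > h or s_lo < -h:
            alarms.append(i)
    return upper, lower, alarms


# Hypothetical series: in control for two steps, then a sustained +2 sd shift.
upper, lower, alarms = cusum([0, 0, 2, 2, 2, 2], mu=0.0, sigma=1.0, k=0.5, h=4.0)
print(upper)   # [0.0, 0.0, 1.5, 3.0, 4.5, 6.0]
print(alarms)  # [4, 5]
```

In this sketch the statistics are not reset after an alarm; whether to reset them is a design choice of the monitoring scheme.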

The performance of control charts is generally evaluated in terms of their run length (RL). The run length is the number of sampling events that elapse between the start of monitoring and the first alarm. It is a random variable whose probability distribution depends on the process and the chart parameters. Its expectation, called the Average Run Length (ARL), is commonly used as a summary measure of performance. The notation ARL(δ) designates the ARL of an SPC scheme for detecting a change of size δ (in σ units) occurring in the process mean level. Thus, ARL(0) is the ARL of a scheme when the process actually stays in control all the time (in‐control, or IC, ARL); yet, due to the inherent variability of the process, an alarm may be raised by chance alone when the chart is updated with a new datum. In other words, ARL(0) is the average time until a false alarm is raised, which should ideally be large. Conversely, if the mean of the process distribution shifts from μ to μ+δ due to an anomaly, the chart should detect this quickly, implying a short ARL(δ).

Chart parameters can be tuned to achieve a desired compromise between a long ARL(0) and a short ARL(δ), i.e. a low false alarm rate vs. fast detection. ARL values for given h, k and δ can be looked up in tables or provided by computer routines (see e.g. Supplementary material with Mesnil and Petitgas, in press). There is broad consensus in the SPC literature that k should be set at half the value of the change of interest; too small values of k should be avoided. h is then adjusted to the desired rate of false alarm; larger values of k and h lead to larger ARLs. It must be emphasized that the statistical properties of the CUSUM, and in particular the ARL values tabulated in most manuals, depend on the following assumptions being met: (1) the monitored data are normally distributed; (2) the residual variation is not correlated; (3) the in‐control parameters are known rather than estimated. Adequate treatment can be applied to approach (1) and perhaps (2), but (3) may be more difficult for our applications. Violation of any of these assumptions reduces the actual in‐control ARL compared to the “clean” case, i.e. the risk of false alarm is in practice greater than indicated by tabulated ARLs. An ad hoc remedy is to take relatively large h values. Conservative advice is to use (k,h) parameters giving large in‐control RLs: ARL > 20 years and 25th percentile of the RL distribution > 10 years.
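As an alternative to tables, the ARL of a given (k, h) pair can be approximated by simulation. The sketch below is only illustrative: it assumes the “clean” case (independent, standard normal data with known in‐control parameters), a two‐sided scheme with the same k and h in both directions, and a shift δ that is present from the start of monitoring. The function names, the number of runs and the seed are hypothetical choices.

```python
import random


def run_length(k, h, delta, rng, max_len=100_000):
    """Number of observations until the first two-sided CUSUM alarm."""
    s_up, s_lo = 0.0, 0.0
    for i in range(1, max_len + 1):
        z = rng.gauss(delta, 1.0)  # standardised datum, mean shifted by delta
        s_up = max(0.0, s_up + z - k)
        s_lo = min(0.0, s_lo + z + k)
        if s_up > h or s_lo < -h:
            return i
    return max_len  # censored: no alarm within max_len observations


def simulate_arl(k, h, delta, n_runs=500, seed=1):
    """Monte Carlo estimate of ARL(delta) for the scheme defined by (k, h)."""
    rng = random.Random(seed)
    lengths = [run_length(k, h, delta, rng) for _ in range(n_runs)]
    return sum(lengths) / n_runs


arl_in_control = simulate_arl(k=0.5, h=4.0, delta=0.0)
arl_shifted = simulate_arl(k=0.5, h=4.0, delta=1.0)
print(arl_in_control, arl_shifted)  # the in-control ARL should be much the longer
```

The same simulated run lengths can be used to approximate percentiles of the RL distribution, and the data‐generating line can be modified to explore the effect of violating the assumptions listed above.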

Perhaps the main limitation in fisheries applications is our poor ability to characterize the reference state, which is “estimated” from a finite set of existing data, without possibility of replication, instead of being determined through planned experiments (industry) or with reference to norms (pollution, health). However, the reference state does not imply perfect stability and the process may show substantial variability even when deemed well behaved. The goal of control charts is to pinpoint those events where the state of the system deviates beyond the domain of its inherent variability.

An alternative worth considering is the Exponentially Weighted Moving Average (EWMA) chart, which also uses the information from the full series through the recursion:

$$E_i = \lambda x_i + (1-\lambda) E_{i-1}$$

with $E_0$ set at the target value or at the mean over the reference period. The parameter λ (0 < λ ≤ 1) determines the weight placed on each datum; simple calculations show that the weight decreases geometrically with the age of the data, and that the weights sum to unity. As with the CUSUM, the control limits are set at some multiple h of the standard deviation of $E_i$; if the observations are independent random variables with variance σ², the variance of $E_i$ is given by:

$$\sigma_{E_i}^2 = \sigma^2 \, \frac{\lambda}{2-\lambda}\left[1 - (1-\lambda)^{2i}\right]$$

Note that the width of the control interval varies with i; as the scheme is maintained, the width expands from an initially narrow band and tends to an asymptote for large i (steady state control limits) at:

$$\mu_0 \pm h\,\sigma\sqrt{\frac{\lambda}{2-\lambda}}$$

The choice of h is also based on trade‐offs between in‐control and out‐of‐control ARL, which depend on λ. Popular choices of λ are in the range 0.05 ≤ λ ≤ 0.25, and smaller values are preferred for detecting smaller shifts; small values also improve the EWMA’s robustness to non‐normality of the data. In many respects, the EWMA performs very much like the CUSUM, but with more inertia in catching up with changes when λ is small. A feature of interest is that the value $E_i$ is in effect a forecast of where the process mean will be at time i+1; its performance can thus also be evaluated by comparing $E_i$ with the subsequent data value $x_{i+1}$.
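The EWMA recursion with its time‐varying control limits can be sketched as below. This is a minimal illustration assuming independent observations with known in‐control mean and standard deviation; the function name, the data and the values of λ and h are hypothetical.

```python
import math


def ewma_chart(x, mu0, sigma, lam, h):
    """Return a list of (E_i, lower_limit, upper_limit, alarm) for i = 1..n."""
    e = mu0                        # E_0 set at the target value
    out = []
    for i, xi in enumerate(x, start=1):
        e = lam * xi + (1.0 - lam) * e
        # standard deviation of E_i for independent data with variance sigma^2
        sd = sigma * math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i)))
        lo, hi = mu0 - h * sd, mu0 + h * sd
        out.append((e, lo, hi, e < lo or e > hi))
    return out


# Hypothetical series with a sustained +2 sd shift from the first observation.
for e, lo, hi, alarm in ewma_chart([2.0] * 5, mu0=0.0, sigma=1.0, lam=0.2, h=3.0):
    print(round(e, 3), round(lo, 3), round(hi, 3), alarm)
```

The printed limits widen with i towards the steady state value, and the alarm flag marks the first point at which $E_i$ leaves the control interval.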

5.2.2 Analysis of structural breaks using econometric techniques

The analysis of structural breaks was originally introduced into econometric theory to test whether the structure of linear regression models fitted to economic data is actually linear. The concept extends ordinary multiple regression with metric explanatory variables: by using categorical variables, it allows linear segments with changing slopes and/or intercepts to be combined piecewise within one model. The idea is to use (preferably binary) indicator (dummy) variables as explanatory variables that help to detect and mimic structural breaks in the linear relationship between the response and the exogenous variable(s). The segments between the structural breaks are, however, supposed to be continuous and linear. Two principal cases need to be distinguished: methods that allow changes only in the slope of the segments, and methods that allow changes in both the intercept and the slope. The first type is called piecewise regression models, which are in fact special cases of so‐called spline functions. The second type is called switching regression methods. Both types can be further subdivided into sub‐classes depending on the number of breakpoints to be identified. For further details regarding the theory of structural breaks as part of linear regression modelling, see Lütkepohl (1993), Pindyck and Rubinfeld (1991) and Fahrmeir et al. (1996).

5.2.2.1 Piecewise linear regressions (PLR)

In general, piecewise linear regressions only allow variation in the slope(s) of the "regression pieces" involved (not the intercept). The type of piecewise regression may be distinguished by the number of breaks assumed or modelled. Two examples are outlined in the following: a piecewise linear regression with one structural break and one with two. Several example types of structural breaks are given in Figure 1 of WP 16 (see specifically (a) and (e) to (i)).

PLR with 1 structural break

For a persistent effect that changes its trend after the breakpoint, the piecewise regression may be specified as:

$$C_t = \beta_1 + \beta_2 t + \beta_3 (t - t_0) D_0 + \varepsilon_t$$

with the βs being the linear model parameters, εt the error term, t the time, t0 the point in time at which the break is assumed to have occurred and D0 a binary term that models the break. In such a case D0 may be formulated as

$$D_0 = \begin{cases} 1 & \text{if } t \ge t_0 \\ 0 & \text{otherwise} \end{cases}$$

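If the effect is rather pulse- or shock-like, it may be modelled as an immediate peaking effect at solely one specific point in time, i.e.

$$C_t = \beta_1 + \beta_2 t + \beta_3 D_0 + \varepsilon_t \quad \text{with} \quad D_0 = \begin{cases} 1 & \text{if } t = t_0 \\ 0 & \text{otherwise} \end{cases}$$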

PLR with 2 structural breaks

Sometimes more than one structural break may occur. In such a case the above linear piecewise regression concept may be extended with further binary indicator variables Di, for instance, in the following way with two binary dummy variables D0 and D1:

$$C_t = \beta_1 + \beta_2 t + \beta_3 (t - t_0) D_0 + \beta_4 (t - t_1) D_1 + \varepsilon_t$$

The two structural breaks at times t0 and t1 may be, for instance, specified as continuous effects such as

$$D_0 = \begin{cases} 1 & \text{if } t \ge t_0 \\ 0 & \text{otherwise} \end{cases} \qquad D_1 = \begin{cases} 1 & \text{if } t \ge t_1 \\ 0 & \text{otherwise} \end{cases}$$
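As a minimal sketch of how the hinge regressors (t − t_j) D_j enter the model, the following builds one design-matrix row per time point and evaluates the expected value for given parameters. The coding D_j = 1 for t ≥ t_j is an assumption (the choice between ≥ and > does not affect the hinge term, which is zero at t = t_j); the function names, break years and parameter values are hypothetical.

```python
def hinge_terms(t, breaks):
    """Regressors (t - t_j) * D_j with D_j = 1 if t >= t_j, else 0 (assumed coding)."""
    return [(t - tj) if t >= tj else 0.0 for tj in breaks]


def plr_design_row(t, breaks):
    """Design-matrix row [1, t, (t - t_0) D_0, (t - t_1) D_1, ...]."""
    return [1.0, float(t)] + hinge_terms(t, breaks)


def plr_predict(t, betas, breaks):
    """Expected value beta_1 + beta_2 t + sum_j beta_(3+j) (t - t_j) D_j."""
    return sum(b * v for b, v in zip(betas, plr_design_row(t, breaks)))


# Hypothetical parameters: slope 1.0 before 1990, 1.0 - 1.5 = -0.5 between
# 1990 and 2000, and -0.5 + 0.5 = 0.0 from 2000 onwards.
betas = [0.0, 1.0, -1.5, 0.5]
breaks = [1990, 2000]
for year in (1989, 1990, 1991, 2000, 2001):
    print(year, plr_predict(year, betas, breaks))
```

Because each hinge term is zero at its own breakpoint, the fitted line stays continuous and only the slope changes, which is the defining property of the piecewise model.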

5.2.2.2 Switching linear regression method

The piecewise regression concept does not allow intercepts and slopes varying at the same time. If for whatever reason the intercept may also change then the switching regression concept can help to formulate such an effect. Here both the intercept and the slope are allowed to change. As an example see the following equation:

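$$C_t = \beta_1 + \beta_2 t + \beta_3 D_0 + \beta_4 t D_0 + \varepsilon_t \quad \text{with} \quad D_0 = \begin{cases} 1 & \text{if } t \ge t_0 \\ 0 & \text{otherwise} \end{cases}$$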

and εt being the error term.
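A minimal sketch of the switching regression follows, assuming D0 = 1 for t ≥ t0 and ordinary least squares solved through the normal equations. The helper `ols`, the time index and the parameter values are hypothetical; the data are noise-free so that the fit should recover the parameters up to rounding error.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= f * A[col][j]
            c[r] -= f * c[col]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def switching_row(t, t0):
    """Design row [1, t, D_0, t * D_0] with D_0 = 1 if t >= t0 (assumed coding)."""
    d0 = 1.0 if t >= t0 else 0.0
    return [1.0, float(t), d0, t * d0]


# Hypothetical, noise-free example: intercept and slope both change at t0 = 10.
t0 = 10
true_betas = [2.0, 0.5, 3.0, -0.4]
X = [switching_row(t, t0) for t in range(20)]
y = [sum(b * v for b, v in zip(true_betas, row)) for row in X]
estimates = ols(X, y)
print([round(b, 6) for b in estimates])
```

In this parameterization the jump in the fitted line at the break is β3 + β4 t0, and the slope after the break is β2 + β4.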

5.2.2.3 Pros and Cons

• The regression methods as described above are relatively objective. Used within the time domain, their general potential is nearly the same on average.

• The regression methods appear to be relatively flexible tools in that they allow for setting up the detection of the shift in different ways using one or more dummy variables that may be coded in different ways (as examples, see the equations related to D0 or D1 above).

• The regression methods are designed to predict the future and as such can forecast the potential future effect of the shift(s). This can also be done in terms of best and/or worst case scenarios. One way to do this is to specify a shift close to the end of the time‐series as being persistent.

5.2.3 Analysis of interventions using an ARIMAX approach

Interventions are used for modelling events that occur at specific times. Intervention models may be seen as a special kind of regression model in which the explanatory variables are binary indicator variables (dummy variables) taking on the values 0 or 1. Regression models may in turn be interpreted more generally as special cases of an autoregressive integrated moving average model (the ARIMA or Box-Jenkins model). In contrast to "normal" regression, ARIMA models allow for autocorrelation in the error term εt. This is why the name "intervention model" is mainly used in pure time-series analysis, specifically in the context of ARIMA modelling. In this context, intervention models or interrupted time-series models may be seen as special transfer functions in which the exogenous variable (the input series) is not metric but an indicator variable containing discrete values that flag the occurrence of an event affecting the response series. The event is thus an intervention in, or an interruption of, the normal evolution of the response time-series, which, in the absence of the intervention, is usually assumed to be a pure ARIMA process.

Intervention models can be used both to model and forecast the response series and to analyze the impact of the intervention. When the focus is on estimating the effect of the intervention, the process is often called intervention analysis or interrupted time-series analysis. Several example types of interventions are displayed in Figure 1 of WP 16 (see specifically (a) to (d)). Two principal types of interventions may be distinguished:

• Impulse interventions (also known as pulse or point interventions) and

• Continuing interventions.

Continuing interventions can be further classified into:

• Step interventions and

• Ramp interventions.

In general, ARIMA models are of the form ARIMA(p, q, d), in which p, q and d are the orders of the underlying process identified (p for the AR component, q for the MA component and d for the order of differencing). Intervention models are special cases of a more general type of ARIMA model, called ARIMAX or transfer function models, which can contain one or more exogenous variables. In the case of intervention models these variables are indicator or dummy variables instead of metric ones. These dummy variables are denoted Di in the following (and more specifically D0 if only one indicator variable is given). It should be noted that both linear regression models and ARIMAX models are designed for prognostic purposes. This is why they may be advantageous over standard QC or SPC methods: they are able to predict a future effect of the intervention, so that interventions occurring towards the end of the time-series may be evaluated by diagnosing their impact on the future. This is not possible with standard QC methods.

Further details regarding the theory of interventions within the context of ARIMA modelling can be found in Schlittgen (2001), Schlittgen and Streitberg (2001) as well as in Lütkepohl (1996). The SAS/ETS Software (1991) gives some illustrative examples on how to do this type of modelling using SAS. Gröger and Rumohr (2006), Gröger et al. (2007) and Gröger and Rohlf (2007) present solutions to climate linked benthic and fisheries problems as part of more general transfer function models.

5.2.3.1 Impulse interventions

This type of intervention addresses a one-time event (an impulse or pulse). To mark an intervention at that specific point in time, denoted as t0 in the following, the input variable D0 (indicator variable, dummy variable) takes on the value of 1 only at t0 and 0 elsewhere. Intervention variables of this kind are sometimes called impulse, pulse or point functions and consequently are of the following form:

$$D_0 = \begin{cases} 1 & \text{if } t = t_0 \\ 0 & \text{otherwise} \end{cases}$$

5.2.3.2 Continuing Interventions

Other interventions can be continuing, in which case the input variable flags periods before and after the intervention. Two main types may be distinguished from each other:

1 ) Step Interventions: Step interventions are continuing, and the input time‐series D0 flags periods after the intervention. For a step intervention, before time t0, values of the intervention variables are zero then step to a constant level thereafter, hence

$$D_0 = \begin{cases} 1 & \text{if } t \ge t_0 \\ 0 & \text{otherwise} \end{cases}$$

2 ) Ramp Interventions: A ramp intervention is a continuing intervention that increases linearly after the intervention point in time t0. For a ramp intervention, values of the intervention variable D0 are zero before time t0 and increase linearly, in proportion to elapsed time, thereafter. Hence,

$$D_0 = \begin{cases} t - t_0 & \text{if } t > t_0 \\ 0 & \text{otherwise} \end{cases}$$
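The three codings can be written down directly. The sketch below assumes the step takes effect at t0 itself (D0 = 1 for t ≥ t0); the function names and the example years are hypothetical.

```python
def pulse(t, t0):
    """Impulse (pulse, point) intervention: 1 only at t0."""
    return 1 if t == t0 else 0


def step(t, t0):
    """Step intervention: 0 before t0, constant level 1 from t0 on (assumed coding)."""
    return 1 if t >= t0 else 0


def ramp(t, t0):
    """Ramp intervention: 0 up to t0, then increasing linearly with elapsed time."""
    return t - t0 if t > t0 else 0


years = list(range(1995, 2001))
t0 = 1997
print([pulse(t, t0) for t in years])
print([step(t, t0) for t in years])
print([ramp(t, t0) for t in years])
```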

5.2.3.3 The steps of an ARIMAX approach

The approach may be described by simplifying the notation using the abbreviation ARIMAX(p, q, d; Di), in which p denotes the order of the AR component, q that of the MA component, d the order of differencing and Di (or more specifically D0) the exogenous binary indicator variable coded as above. More than one exogenous indicator variable may be included if necessary. Given that the input series is binary, the ARIMAX modelling steps may be summarized as follows:

1 ) Stabilizing a non-stationary response time-series (differencing of order d); if the response is differenced, the dummy variable(s) also need to be differenced using the same order d.

2 ) Estimating a preliminary intervention model with an appropriate transfer function form, identified as described above.

3 ) Identifying the error process for the model from plots of the autocorrelation (ACF) and the partial autocorrelation functions (PACF) for the residuals from the preliminary model.

4 ) Estimating the parameters of the final intervention model, including the error process, by including the orders of p and q identified in the previous step.

5 ) Applying final model diagnostics.

6 ) Performing forecasts (effect scenarios).
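Steps 1 and 3 can be illustrated with a small sketch; it does not fit an ARIMAX model. The functions `difference` and `acf` are hypothetical helpers, and the sample autocorrelation uses the usual estimator normalized by the lag-0 sum of squares. The example shows why the dummy variable has to be differenced together with the response: after first-order differencing a step dummy becomes a pulse and a ramp dummy becomes a step.

```python
def difference(series, d=1):
    """Apply first differences d times (order-d differencing)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def acf(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag, normalized by the lag-0 sum of squares."""
    n = len(series)
    m = sum(series) / n
    dev = [v - m for v in series]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


step_dummy = [0, 0, 1, 1, 1, 1]   # step intervention at the third time point
ramp_dummy = [0, 0, 0, 1, 2, 3]   # ramp starting after the third time point
print(difference(step_dummy))      # the differenced step is a pulse
print(difference(ramp_dummy))      # the differenced ramp is a step
print(acf([1, -1, 1, -1], 1))      # strongly negative lag-1 autocorrelation
```

In practice the ACF and PACF of the residuals from the preliminary model would be inspected to choose p and q, and the estimation itself would be done with dedicated time-series software.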

5.2.3.4 Pros and Cons

• ARIMAX based methods are more or less objective instruments.

• ARIMAX based methods are relatively flexible in that they allow for setting up the detection of the shift in different ways using one or more dummy variables that may be coded in different ways (see Figure 1 of WP 16 and the above equations related to D0).

• If the time-series to be studied shows signs of an autoregressive and/or a moving average process component, ARIMAX based methods are superior to regression methods, as their parameter estimates would be unbiased in such a case.

• As with regression methods, ARIMAX based methods are designed to predict the future and as such can forecast the potential future effect of the shift (intervention). Again, this may be done on the basis of best and/or worst case scenarios. One way to do this is to model an intervention close to the end of the time-series as being persistent. This is the reason why ARIMAX methods may be superior to simple QC methods. However, while regression methods are trend depicting methods, ARIMAX models try to reconstruct the entire time-series.

• One trade-off is that ARIMAX models may be more data demanding than other methods such as the regression methods described above. This is mainly driven by the underlying process, which determines the orders of p, q and d, and also by the number of parameters to be estimated for the dummy variables included.

5.2.4 Other trend detection methods

This section briefly reports on two methods for detecting trends developed or explored during the EU FISBOAT project.

Most trend detection methods are best suited to trends persisting over long time periods, e.g. a decade or so. Trenkel and Rochet (in press) developed a method to test for trends over the recent past (last three to five years). In brief, it goes through five steps:

1 ) Fit a smoother to the whole available indicator time-series to obtain a smoothed series;

2 ) Test whether the smoother provides a satisfactory fit to the data; if yes, proceed with step 3, otherwise the method is unsuitable for the data;

3 ) Calculate first and second derivatives of the smoothed time-series for all years (including years with no data);

4 ) Carry out a parametric bootstrap of the indicator time-series to obtain resampled indicator time-series and repeat steps 1 to 3;

5 ) Carry out two sets of intersection-union tests, the formulation of which depends on the sign of change one is looking for (see paper for details).

An application of the method to IBTS data for 33 North Sea taxa indicated that the intersection-union test was able to pick up far more changes than both a linear regression and a Mann-Kendall test, in particular for a short time horizon of three years. An advantage over the latter methods is also that it accounts for the uncertainty in the survey indices.

Although fisheries scientists typically prefer modelling, i.e. parametric methods, for assessing trends in fish stocks, trends could sometimes be established with less reliance on assumptions about the data and models if nonparametric methods were used. Cotter (in press) reviewed alternative nonparametric tests and compared them using IBTS indices for North Sea cod; the tests included the runs test, Mann-Kendall's K, Spearman's rho, the Jonckheere test, Theil's or Sen's slope estimator, Cochran's Q, and the Dietz-Killeen multivariate trend test. Cotter concludes that each method has strengths and weaknesses, and suggests applying an assortment of nonparametric methods when checking trends.
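Two of the nonparametric tools listed above are simple enough to sketch. The following computes the Mann-Kendall statistic S (the sum of the signs of all pairwise differences) and the Theil-Sen slope (the median of all pairwise slopes), using their textbook definitions; it is not a reproduction of Cotter's implementation, the function names are hypothetical, and the significance test for S is omitted.

```python
from statistics import median


def mann_kendall_s(x):
    """Mann-Kendall S: sum over all pairs i < j of sign(x[j] - x[i])."""
    s = 0
    for i in range(len(x) - 1):
        for j in range(i + 1, len(x)):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)
    return s


def sen_slope(x, t=None):
    """Theil-Sen slope: median of the pairwise slopes (x[j] - x[i]) / (t[j] - t[i])."""
    if t is None:
        t = list(range(len(x)))  # equally spaced observations by default
    slopes = [(x[j] - x[i]) / (t[j] - t[i])
              for i in range(len(x) - 1) for j in range(i + 1, len(x))]
    return median(slopes)


index = [1, 3, 5, 7]  # toy, strictly increasing series
print(mann_kendall_s(index), sen_slope(index))
```

A strongly positive or negative S relative to the number of pairs, n(n − 1)/2, points to a monotonic trend; neither quantity relies on a distributional assumption for the observations.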

5.2.5 A simple graphical method to detect shifts: the traffic light plot

Traffic light plots are very illustrative for recognizing overall patterns in multiple time-series. They display colour-coded quantiles or percentiles (for instance, quintiles) of normalized variables in order to illustrate pronounced changes in their values. They provide a simple visual inspection method for observing changes in patterns such as regime shifts or oscillations, and allow correspondences between variables to be identified simultaneously when their traffic lights are plotted alongside each other. For an example of the usefulness of traffic light plots with regard to fisheries related problems see Gröger and Rohlf (2007). An example of traffic light plots is given in Figure 2 of WP 16.

Pros and Cons of the traffic light approach are:

• As a graphical tool this method may be quite illustrative, giving a good overall picture of multiple time-series simultaneously, and can help to locate a potential shift in a preliminary way.

• However, this method is not an objective instrument for deciding exactly where a shift occurred, as it does not allow for statistical testing. Interpreting traffic light plots always involves a good portion of the reader's subjectivity; this can be alleviated by using the breakpoint detection methods above for setting the colour codes.
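The colour coding behind a traffic light plot can be sketched as follows. The quintile rule used here (class from the share of strictly smaller values) and the colour names are assumptions made for illustration; other quantile conventions would give slightly different class boundaries.

```python
def quintile_classes(values):
    """Assign each value a quintile class 1 (lowest fifth) to 5 (highest fifth).

    The class is based on the number of strictly smaller values in the series;
    this simple rank rule is an assumption, not the rule used in WP 16.
    """
    n = len(values)
    classes = []
    for v in values:
        rank = sum(1 for w in values if w < v)
        classes.append(rank * 5 // n + 1)
    return classes


# Hypothetical colour scale, from low (red) to high (green).
COLOURS = {1: "red", 2: "orange", 3: "yellow", 4: "light green", 5: "green"}

series = [5, 1, 3, 2, 4]
print([COLOURS[c] for c in quintile_classes(series)])
```

Applying the same function to each variable and stacking the resulting colour rows by year gives the layout of a traffic light plot.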

5.3 Application of change-detection methods

WGMG examined the utility of the methods described above in the context of a specific question. When deciding on plausible values for incoming recruitment in catch forecasts, assessment WGs often have to choose between using the geometric mean (GM) over the full time-series or, when a persistent change is suspected, over a shorter recent period; in the latter case, choosing the period is often subjective. The question thus was: can those methods help to make that choice as objective as possible?
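The two options can be compared with a few lines of code. This is a minimal sketch in which the break year is assumed to be supplied by one of the change-detection methods; the function names and the toy numbers are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean, computed as exp(mean(log(values))); values must be positive."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def gm_options(recruitment, years, break_year):
    """GM over the full series and over the period from break_year onwards."""
    recent = [r for r, y in zip(recruitment, years) if y >= break_year]
    return geometric_mean(recruitment), geometric_mean(recent)


# Toy series with a hundredfold drop in level from 1997 (illustrative only).
years = [1995, 1996, 1997, 1998]
recruitment = [100.0, 100.0, 1.0, 1.0]
full_gm, recent_gm = gm_options(recruitment, years, break_year=1997)
print(full_gm, recent_gm)
```

When a persistent downward shift has occurred, the full-series GM overstates the recruitment to be expected, which is why the choice of period matters for the forecast.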

Two test cases were considered: (1) North Sea cod recruitment, based on 1st Quarter IBTS indices for age 1; for consistency with use of these data by WGNSSK, indices prior to 1983 were ignored; (2) North Sea haddock recruitment (age 0) since 1963, based on VPA results from the 2008 assessment, excluding the 2008 datum.

5.3.1 North Sea cod

5.3.1.1 CUSUM control charts

The indices were log-transformed to approach normality. The data are plotted in Figure 5.3.1.1, which does not reveal a clear pattern of change (e.g. a low value similar to the low 2003 index had already been observed in 1985).

Figure 5.3.1.1. NS cod. Log(age 1 IBTS indices) 1983–2008. (Plot title: IBTS Cod - Log R; x-axis: year.)

Based on expert judgement, the years 1983-1994 were selected as the reference period needed to compute the "in-control" mean and standard deviation. Moving the final year by ± 2 years did not change the conclusions. An indicative value for the allowance parameter k is half the mean outside the reference period, expressed in SD units relative to the reference mean (0.3 SD units in this case). An alternative value of 0.5 was also considered, on the assumption that only larger changes were to be detected; it also reduces the risk of declaring a change when there is none. Again, there was no difference in the conclusions, and the choice k = 0.5 was retained. The CUSUM chart is shown in Figure 5.3.1.2. The lower CUSUM develops a consistent dive from 1997 on, indicating that a persistent reduction in the mean of the recruitment process has taken place since that year. The upper CUSUM only indicates transient and inconsequential upward changes. Given the steep change between 2002 and 2003, a wide range of values for the control limit h would have triggered an alarm in that period, i.e. some 6-7 years after the change. For illustration, an h of -2 is shown on the figure, corresponding to a reasonable compromise in terms of in-control and out-of-control ARLs (about 38 and 4, respectively), but other choices would lead to the same conclusion.

Figure 5.3.1.2. NS cod. CUSUM chart of IBTS age 1 indices. (Plot title: IBTS Cod - LogR - Ref: 83-94, k = 0.5, h = 2 <=> ARL in/out ~ 38/4; legend: upper, lower, h; x-axis: year.)

WGMG noted that, with these parameters, the delay until detection was relatively large. This is mostly due to a succession of variable year classes around 2000, causing some hesitation in the CUSUM. In any case, it was noted that it would be imprudent to declare a persistent change unless its existence was confirmed over a sufficient number of years.

In conclusion, the CUSUM results in Figure 5.3.1.2 indicate that a change did occur in the time‐series, specifically in 1997, and also that this change was not detectable as such until 2002‐2003. Given that an alarm was triggered by 2002‐2003 based on the entire series, the question was raised as to whether the chart would have been able to detect a change by the time of the alarm. Trials with IBTS data up to 2002 and the same chart’s parameters indicated that the results would have been ambiguous. However, the addition of the 2003 data was sufficient to ascertain that a persistent change had been ongoing since 1997.
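The chart logic can be sketched as follows, assuming the standard tabular CUSUM on values standardized with the reference-period mean and standard deviation, with the lower CUSUM kept non-positive so that an alarm corresponds to crossing −h. The function names and the toy data are hypothetical; the sketch does not reproduce the IBTS analysis.

```python
from statistics import mean, stdev


def cusum(x, reference, k=0.5):
    """Upper and lower tabular CUSUMs of x, standardized on the reference period."""
    m, s = mean(reference), stdev(reference)
    up, lo = 0.0, 0.0
    upper, lower = [], []
    for xi in x:
        z = (xi - m) / s               # deviation in reference SD units
        up = max(0.0, up + z - k)      # accumulates upward deviations beyond k
        lo = min(0.0, lo + z + k)      # accumulates downward deviations beyond k
        upper.append(up)
        lower.append(lo)
    return upper, lower


def first_alarm(lower, h):
    """Index of the first point where the lower CUSUM falls below -h, else None."""
    for i, c in enumerate(lower):
        if c < -h:
            return i
    return None


reference = [-1.0, 0.0, 1.0]                      # toy reference period (mean 0, SD 1)
series = reference + [-2.0, -2.0, -2.0]           # persistent drop afterwards
upper, lower = cusum(series, reference, k=0.5)
print(lower, first_alarm(lower, h=2.0))
```

The delay between the start of the drop and the alarm in the toy series mirrors the detection delay discussed above: a larger h lengthens it, while a smaller h raises the risk of false alarms.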

5.3.1.2 Piecewise regression approach

To make the approach more flexible, for the analysis of North Sea cod data in contrast to the examples in the theoretical section above, a piecewise regression approach of the following form has been chosen:

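$$Y_t = \beta_1 + \beta_2 t + \beta_3 D_0 + \beta_4 D_1 + \varepsilon_t$$

with

$$D_0 = \begin{cases} 1 & \text{if } t = t_0 \\ 0 & \text{otherwise} \end{cases} \qquad D_1 = \begin{cases} 1 & \text{if } t \ge (t_0 + 1) \\ 0 & \text{otherwise} \end{cases}$$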

in which Y denotes the age 1 IBTS index. Here D0 picks up a potential impulse at t0, while D1 takes care of a potential continuous effect induced by D0 immediately after the impulse has occurred. This setup means that the impulse and its consequence (assuming a potentially persistent effect) are modelled independently. The trend lines before and after the impulse are therefore handled independently and may thus differ from each other in slope and intercept. The shift detection algorithm can then be summarized as follows. While t0 is moved iteratively over the time-series (in increments of 1 year), a quality-of-fit criterion (Akaike's information criterion, AIC) is recorded at each step. In addition, the significance of the pulse at t0 and of the change in level before and after the pulse is recorded and tested (F tests). At each iterative step, all observations and the fitted model with its 95% confidence intervals are plotted over time. At the end of the iterative process, the resulting marginal p values of both F tests are plotted over time along with the AIC. The decision criterion is to look for the lowest AICs that coincide with statistically significant pulses and statistically significant before/after level shifts; in combination, these indicate the location(s) of a significant shift. For the cod case this is illustrated by Figure 5.3.1.2.1:

Figure 5.3.1.2.1. Summary of the shift detection process using the regression approach for North Sea cod. (a) Traffic light plot of quintiles of cod ages 1 to 6. (b) Resulting plot of decision criteria: the continuous red line indicates the AIC behaviour over time where, in contrast to the open circles, the blue filled circle indicates that this impulse was statistically significant. The grey needles indicate the p values of the F test for comparing before and after impulse levels; the upper horizontal line marks the given level of α = 0.05. (c) Plot of observations and fitted model for t0 = 1997, in which the significant shift as indicated by (b) occurred; the continuous grey line indicates the plot of observations over time, the continuous red line indicates the regression model fitted to the observations, the two continuous orange lines indicate the upper and lower 95% confidence limits for the expected value (predicted line segments), and the three blue lines (trend line: continuous line; upper and lower 95% prediction limits: dashed lines) display the overall trend in the data series.

The traffic light plot in Figure 5.3.1.2.1(a) indicates some fluctuations during the period 1971 to 1977 and seemingly a bigger change in the age 1 index in 1997. As the initial fluctuations in age 1 cannot be well interpreted, as with the CUSUM plot, the regression analysis was performed for years 1978 onwards. Figure 5.3.1.2.1(b) indicates that a significant shift occurred in 1997 (as for CUSUM). It is the year with the best fit (the adjusted R2 indicates that more than 64% of the variance is explained, about three to four times higher than the other adjusted R2 values; the AIC is at its minimum and at the same time shows the biggest change relative to the AIC values before and after). Moreover, the impulse was statistically significant (p < .0001) and the p values (grey needles) indicate a significant change in the level before and after the impulse (the p values of 4 consecutive years fall below the horizontal orange line in 5.3.1.2.1(b)). This can also be inferred from Figure 5.3.1.2.1(c), in which the regression segments before and after the spike (peak) at t0 differ significantly from each other: neither line segment is covered by the confidence interval of the other.
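The scanning part of the shift detection algorithm can be sketched as follows. The sketch fits the model with a pulse D0 at t0 and a step D1 from t0 + 1 at every candidate t0 and records a Gaussian AIC of the form n ln(RSS/n) + 2p (an assumed, commonly used form); the F tests and plots described above are omitted. The helper `ols`, the restriction of t0 to interior points and the toy data are hypothetical.

```python
import math


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= f * A[col][j]
            c[r] -= f * c[col]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def design(n, i0):
    """Rows [1, t, D_0, D_1]: pulse at index i0, step from index i0 + 1 onwards."""
    return [[1.0, float(t), 1.0 if t == i0 else 0.0, 1.0 if t > i0 else 0.0]
            for t in range(n)]


def aic_scan(y):
    """AIC of the pulse-plus-step model for every interior candidate break index."""
    n = len(y)
    aic = {}
    for i0 in range(2, n - 2):  # interior points only, so the design stays full rank
        X = design(n, i0)
        b = ols(X, y)
        rss = sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
                  for row, yi in zip(X, y))
        aic[i0] = n * math.log(rss / n) + 2 * len(b)
    return aic


# Toy series: level 5, a spike at index 10, then level 2 (deterministic wiggle added).
wiggle = [0.1 * ((7 * t) % 5 - 2) for t in range(20)]
y = [(5.0 if t < 10 else 2.0) + wiggle[t] for t in range(20)]
y[10] = 8.0
aic = aic_scan(y)
best_index = min(aic, key=aic.get)
print(best_index)
```

In the full procedure the index with the lowest AIC would be retained as a shift only if the pulse and the before/after level change were also statistically significant.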


5.3.1.3 ARIMAX approach

The results regarding the ARIMAX approach are only given here for the cod case. The reason is that in the haddock case no autocorrelation could be found (the haddock case study lacked both an autoregressive and a moving average component: p=0, q=0), so that the specification of the ARIMAX model was identical to the regression approach above. In the North Sea cod case, however, the data appeared non-stationary and we found 2nd order autocorrelation. Given this, the following subset ARIMAX design was chosen for the analysis of North Sea cod data:

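Y = ARIMAX(p=(2), q=0, d=1; D0, D1)

with

$$D_0 = \begin{cases} 1 & \text{if } t = t_0 \\ 0 & \text{otherwise} \end{cases} \qquad D_1 = \begin{cases} 1 & \text{if } t \ge (t_0 + 1) \\ 0 & \text{otherwise} \end{cases}$$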

as exogenous transfer variables. In this formulation, Y denotes the age 1 IBTS index. The analysis was performed on the 1st order differences. As in the regression model above, D0 picks up a potential impulse at t0 while D1 takes care of a potential continuous effect induced by D0 after the impulse has occurred. Apart from the difference in the assumptions regarding the modelling process itself, the decision criteria are constructed in the same way as in the regression case above and are thus not repeated here.

Figure 5.3.1.3.1. Summary of the shift detection process using the ARIMAX model for North Sea cod. (a) Traffic light plot of quintiles of cod ages 1 to 6. (b) Resulting plot of decision criteria: the continuous red line indicates the AIC behaviour over time where in contrast to the open circles the two blue filled circles here indicate that the impulses associated were statistically significant. The grey needles indicate the p values of the F test for comparing before and after impulse levels, the upper horizontal line marks the given level of α = 0.05. (c) Plot of observations and fitted model for t0 = 1997 in which the significant shift as indicated by (b) occurred; the continuous grey line indicates the plot of observations over time, the continuous red line indicates the regression model fitted to the observations, the two continuous orange lines indicate the upper and lower 95% prediction limits for the individual value, the three blue lines (trend line: continuous line; upper and lower 95% prediction limits: dashed lines) display the overall trend in the data series, the vertical continuous black line separates given from predicted data.

Figure 5.3.1.3.1(b) looks very similar to Figure 5.3.1.2.1(b). As with the regression approach, Figure 5.3.1.3.1(b) clearly indicates that a significant shift occurred in 1997. It is the year with the best fit (the AIC is at its minimum and at the same time shows the biggest change relative to the AIC values before and after that pulse). Here too, the impulse was statistically significant (p < .0001) and the p values (grey needles) indicate a significant change in the abundance level before and after the impulse (as in the regression case, the p values of 4 consecutive years fall below the horizontal orange line in 5.3.1.3.1(b)). The ARIMAX based figure, however, indicates that the impulse in 1997 seems to be stronger than suggested by the regression based plot above, as three AIC values are located on the spike marking 1997, of which two are significant.


5.3.2 North Sea haddock

5.3.2.1 CUSUM control charts

Recruitment estimates from the VPA summary were log-transformed. A plot of the data shows the wide variability typical of that stock, but no clear indication that recent years have been special (Figure 5.3.2.1).

Figure 5.3.2.1. NS haddock. Log(Rvpa) 1963-2007. (Plot title: NS haddock - Log Rvpa; x-axis: year.)

The years 1963-1990 were selected for the reference period. Trials with recruitment data since 1981 and a reference period 1981-1990 gave the same conclusions regarding recent changes. Two values (0.3 and 0.5) for the allowance parameter k were tried, and led to the same conclusion; thus the value of 0.5 was retained as it is less likely to result in false detection. The CUSUM chart is shown in Figure 5.3.2.2. The lower CUSUM indicates a persistent reduction in the mean level of recruitment since 2000, and the relatively strong 2005 year class was not sufficient to neutralize the chart. The slope is so steep between 2000 and 2004 that the detection ability would not have been sensitive to the choice of the control limit h over a wide range. A value of -2.3 was adopted for the illustration, corresponding to an in-control ARL of 54 years and an out-of-control ARL of 4 years; that value would also have avoided triggering alarms for the earlier transient changes in both the lower and upper CUSUMs. Overall, several combinations of settings for the chart would all lead to the conclusion that recruitment since 2000 has a different character than in the earlier period.

Figure 5.3.2.2. NS haddock. CUSUM chart of recruitment estimates. (Plot title: NS Haddock - LogR - Ref: 63-90, k = 0.5, h = 2.3; legend: upper, lower, h; x-axis: year.)

5.3.2.2 Piecewise regression approach

To make the approach more flexible, for the analysis of North Sea haddock (years 1963 – 2007) in contrast to the examples in the theoretical section above, a piecewise regression approach of the following form has been chosen:

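$$Y_t = \beta_1 + \beta_2 t + \beta_3 D_0 + \beta_4 D_1 + \varepsilon_t$$

with

$$D_0 = \begin{cases} 1 & \text{if } t = t_0 \\ 0 & \text{otherwise} \end{cases} \qquad D_1 = \begin{cases} 1 & \text{if } t \ge (t_0 + 1) \\ 0 & \text{otherwise} \end{cases}$$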

in which Y denotes the recruitment estimate. Here D0 picks up a potential impulse at t0, while D1 takes care of a potential continuous effect induced by D0 immediately after the impulse has occurred. This setup means that the impulse and its consequence (assuming a potentially persistent effect) are modelled independently. The trend lines before and after the impulse are therefore handled independently and may thus differ from each other in slope and intercept. The shift detection algorithm can then be summarized as follows. While t0 is moved iteratively over the time-series (in increments of 1 year), a quality-of-fit criterion (Akaike's information criterion, AIC) is recorded at each step. In addition, the significance of the pulse at t0 and of the change in level before and after the pulse is recorded and tested (F tests). At each iterative step, all observations and the fitted model with the 95% confidence intervals around the regression lines are plotted over time. At the end of the iterative process, the resulting marginal p values of both F tests are plotted over time along with the AIC. The decision criterion is to look for the lowest AICs that coincide with statistically significant pulses and statistically significant before/after level shifts; in combination, these indicate the location(s) of a significant shift. For the haddock case this is illustrated by Figure 5.3.2.2.1:

Figure 5.3.2.2.1. Summary of the shift detection process using the regression approach for North Sea haddock. (a) Traffic light plot of quintiles of recruitment (R), total-stock biomass (TSB), spawning-stock biomass (SSB) and catch (C). (b) Resulting plot of decision criteria: the continuous red line indicates the AIC behaviour over time where, in contrast to the open circles, the two blue filled circles indicate that these impulses were statistically significant. The grey needles indicate the p values of the F test for comparing before and after impulse levels; the upper horizontal line marks the given level of α = 0.05. (c) Plot of observations and fitted model for t0 = 1967, in which the first significant shift as indicated by (b) occurred; the continuous grey line indicates the plot of observations over time, the continuous red line indicates the regression model fitted to the observations, the two continuous orange lines indicate the upper and lower 95% confidence limits for the expected value (predicted line segments), and the three blue lines (trend line: continuous line; upper and lower 95% prediction limits: dashed lines) display the overall trend in the data series.


The traffic light plot in Figure 5.3.2.2.1(a) indicates some fluctuations during the years 1963 to 1974 and seemingly a bigger change in 1999. Figure 5.3.2.2.1(b) indicates that two statistically significant impulses occurred, in 1967 and 1999 (p < .0001). While no significant level change before and after the impulse could be detected for 1999, the situation for the 1967 peak is different. The p values (grey needles) indicate a significant (persistent) change when comparing the before and after levels for this impulse (the p values of 2 consecutive years fall below the horizontal orange line and fluctuate around this line for about 10 years thereafter). It is also the year with the best fit (the adjusted R2 indicates that more than 74% of the variance is explained, about three to four times higher than the other adjusted R2 values; the AIC is at its minimum at the same point and shows the biggest change relative to the AIC values before and after).

5.3.3 General comment regarding the intervention and structural break models

At present, the entire period before the pulse is compared with the entire period thereafter. Within the iteration this means that all values before the pulse are aggregated, including any significant previous pulses, and are then compared with all the values after the pulse under consideration. To avoid this, the approach may be modified by using a fixed reference period that is compared with the iteratively changing period after the pulse. Another approach would be to set a window of a specific size, say five or more values before and after the break, and to test for a potential level change within this window, i.e. between the level right before and right after the impulse. While this gives the user more flexibility, it also involves more subjectivity, because decisions are required on what constitutes an appropriate reference period or window size. This, however, also applies to the CUSUM methods applied here, which likewise require a reference period to be defined from which the mean and variance are taken to obtain the expected values for monitoring the process.
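The window variant can be sketched in a few lines. The function below compares the mean level in a window of fixed width before the candidate pulse with the mean in a window of the same width after it, leaving out the pulse itself; the function name, the exclusion of the pulse point and the toy data are assumptions, and no significance test is included.

```python
def window_level_change(y, i0, width):
    """Mean of the `width` values after index i0 minus the mean of the `width` before.

    The value at i0 (the candidate pulse) is excluded from both windows.
    """
    before = y[max(0, i0 - width):i0]
    after = y[i0 + 1:i0 + 1 + width]
    if not before or not after:
        raise ValueError("window falls outside the series")
    return sum(after) / len(after) - sum(before) / len(before)


# Toy series: level 5, a spike at index 5, then level 2.
series = [5.0, 5.0, 5.0, 5.0, 5.0, 9.0, 2.0, 2.0, 2.0, 2.0, 2.0]
print(window_level_change(series, i0=5, width=3))
```

The width plays the same role as the reference period in the CUSUM: it has to be chosen by the analyst, which is the source of subjectivity noted above.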

5.4 Conclusions

The test applications above show that available change-detection methods can help to resolve a problem regularly encountered by assessment WGs. For both stocks, they were fully consistent in the identification of breakpoints in recruitment series despite their distinct theoretical frameworks. Beyond the advantage of permitting an objective choice of breakpoints, the methods are very simple to implement.

In this instance, the methods were applied to series of indices or estimates. However, they can equally be used with residuals about a given functional relation, e.g. residuals about a stock‐recruitment relationship when dealing with recruitment data. Although the example focused on detection of mean levels, it is recalled that specific variants of the methods can deal with changes in variance.

5.5 Recommendations

Assessment WGs are encouraged to further explore the application of these (and other) relatively simple methods of change detection, which enable choices on inputs to forecasts to be made in an objective and replicable way.

Expertise in these methods can certainly be found among ICES scientists involved in environmental surveillance, and notably in WGSAEM. ICES is in a unique position to facilitate cross‐fertilization between the methods support groups of its fisheries and environment branches, and practical ways of enhancing collaboration should be implemented.

6 Conclusions

The conclusions of the meeting are summarized in the Executive Summary at the beginning of this report.

6.1 Future directions for WGMG

The morning of the second day of the meeting included a discussion on the possible future directions of WGMG, and how the group could make the best contribution to the work of the broader ICES community. The meeting was joined for this session by Mike Sissenwine (Chair of ACOM).

Since being reconvened in 2001, the work undertaken by WGMG has considered four main themes:

1 ) Testing the behaviour of stock assessment methods under model mis‐specification;

2 ) Evaluating management strategies and harvest control rules;

3 ) Commenting on software testing;

4 ) Considering the estimation of variance.

There have been exceptions, notably a special request in the 2003 meeting to provide advice on suitable assessment models for Norwegian spring‐spawning herring and blue whiting, but overall WGMG has been focused on these four broad issues.

However, the question remains: how much of the recent work of WGMG has actually changed how ICES (and ICES' working groups) operate? Some of the management strategy evaluation development work has fed into the advisory area, and developments of stock assessment methods initiated by WGMG have made a difference (for example, the BADAPT and SURBA methods), but much of the output of WGMG has not had a strong influence beyond the confines of the meeting. The discussion was wide‐ranging, but concentrated on how this could be changed. The challenge is to continue strong threads of research within the group while avoiding marginalisation.

Better communication between WGMG and the wider assessment community is clearly required. One way to achieve this could be for WGMG to act (at least in part) as a method exploration and development service for the series of benchmark assessment meetings that are planned to be held each year by ICES. In this context, a possible schedule would be as follows:

1 ) Well in advance of the WGMG meeting in October, the WGMG Chair would approach the chairs of the forthcoming benchmark meetings (and, ideally, the relevant stock assessment scientists) to discuss and determine the key methodological issues for those benchmarks. For example, a flatfish benchmark group might be concerned with the estimation and modelling of discards, while a roundfish group might consider stock structure to be a priority. Many groups would have concerns about commercial catch data and be interested in conducting survey‐based assessments. In some cases, the generation and presentation of management advice would be the pressing concern. These are just a few of the many possibilities, and the WGMG Chair would have to be careful to ensure a focus on a limited number each year. This approach assumes that the subjects of the benchmark meetings are determined early in the preceding year.

2 ) These discussions would provide the basis for the WGMG ToRs for the October meeting. The WGMG Chair would circulate these ToRs as widely as possible, to try to bring together a group with the skills and interests necessary to address the relevant issues.

3 ) WGMG would then meet in October (or around then) to consider ways in which to improve the methods available for the key issues. The intention would be to feed these into the subsequent benchmark meetings in the early spring of the following year, but only in a general sense. Rather than do the work of the benchmark groups for them, WGMG would provide general advice on how to deal with generic issues (such as modelling discards).

This scheme should ensure that WGMG remains relevant to, and focused on, the key stock assessment issues prevalent within ICES, while also allowing the flexibility for some regular themes to be continued (in as much as they are relevant to a forthcoming benchmark).

A final point was raised about the large number of projects funded by the EU and others around the world which generate potentially useful approaches that never see the inside of an assessment meeting. The flexible‐yet‐focused approach outlined above would provide an opening for some of these to be implemented in an ICES context. This may require participation by ICES in such funded projects to facilitate cross‐fertilization.

7 References

Aarts, G., and Poos, J.‐J. 2008. Comprehensive discard reconstruction and abundance estimation using flexible selectivity functions. ICES WGMG 2008, WP 9.

Aarts, G., and Poos, J.‐J. 2008. Comprehensive discard reconstruction and abundance estimation using flexible selectivity functions. In prep.

Catchpole, T., Frid, C., and Gray, T. 2005. Estimating discards using selectivity data: the effects of including discard data in assessments of the demersal fisheries in the Irish Sea. Journal of Northwest Atlantic Fishery Science, 19, 91–102.

Cotter, A. J. C. in press. Nonparametric statistical methods for assessing trends. Aquatic Living Resources, 21 (special on‐line issue).

Fahrmeir L., Hamerle, A., and Tutz, G. 1996. Multivariate statistische Verfahren. Walter de Gruyter, Berlin, Germany.

Fernández, C., Cerviño, S., Pérez, N., and Jardim, E. 2008. A Bayesian stock assessment model incorporating discards estimates in some years. ICES WGMG 2008, WP 13.

Fryer, R. F. 2001. TSA: Is it the way? Working Paper to the ICES Working Group on Methods of Fish Stock Assessments, Copenhagen, December 2001.

Gröger, J. P., and Rumohr, H. 2006. Modelling and forecasting long‐term dynamics of Western Baltic macrobenthic fauna in relation to climate signals and environmental change. Journal of Sea Research 55:266‐277.

Gröger, J. P., Winkler, H., and Rountree, R. A. 2007. Population dynamics of pikeperch (Sander lucioperca) and its linkage to fishery driven and climatic influences in a southern Baltic lagoon of the Darss‐Zingst Bodden Chain. Journal of Fisheries Research 84:189–201.

Gröger, J. P., and Rohlf, N. 2007. Shedding light on recruitment mysteries: internal and external signals in the stock‐recruitment relationship of North Sea herring. ICES CM 2007/H:01, 26 pp.

Gudmundsson, G. 1987. Time series models of fishing mortality rates. ICES C.M. D:6.

Gudmundsson, G. 1994. Time series analysis of catch‐at‐age observations. Appl. Statist. 43:117‐126.

Hawkins, D. M., and Olwell, D. H. 1998. Cumulative sum charts and charting for quality improvement. Statistics for engineering and physical sciences, Springer, New York.

ICES. 2008. Report of the Ad hoc Group on Cod Recovery Management Plan Request (AG‐CREMP), 18–19 August 2008. ICES CM 2008/ACOM: 61 (Draft).

ICES. 2008. Report of the North Sea and Skagerrak Demersal Fisheries Working Group (WGNSSK), 7–13 May 2008. ICES CM 2008/ACOM:09 (Draft).

Kell, L.T., Mosqueira, I., Grosjean, P., Fromentin, J‐M., Garcia, D., Hillary, R., Jardim, E., Mardle, S., Pastoors, M.A., Poos, J.J., Scott, F., and R.D. Scott 2007. FLR: an open‐source framework for the evaluation and development of management strategies. ICES Journal of Marine Science 64: 640‐646.

Kelleher, K. 2004. Discards in the world’s marine fisheries. An update. FAO Fisheries Technical Paper No. 470, Rome, 131 pp.

Lütkepohl, H. 1993. Introduction to multiple time‐series analysis. Springer‐Verlag, Berlin, Germany.

MacKay, D. J. C. 2003. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge.

Mesnil, B., and Petitgas, P. in press. Detection of changes in time‐series of indicators using CUSUM control charts. Aquatic Living Resources, 21 (special online issue), DOI: 10.1051/alr:2008.

Mohn, R. 1999. The Retrospective Problem in Sequential Population Analysis: An Investigation Using Cod Fishery and Simulated Data. ICES Journal of Marine Science, 56, 473–488.

Montgomery, D. C. 2005. Introduction to Statistical Quality Control, 5th Edition. Wiley, New York.

Needle, C. L. 2008. Survey‐based fish stock assessment with SURBA. Fisheries Research Services Marine Laboratory, Aberdeen, Scotland.

Needle, C. L., and Hillary, R. 2007. Estimating uncertainty in nonlinear models: Applications to survey‐based assessments. ICES CM 2007/O:36.

Nicholson, M. D. 1984. Some applications of CUSUM techniques in fisheries research. International Council for the Exploration of the Sea, CM 1984/D:5: 10 p.

Pindyck, R. S., and Rubinfeld, D. L. 1991. Econometric Models and Forecasts. McGraw Hill, New York, 596 pp.

Punt, A.E., Smith, D.C., Tuck, G.N., and Methot, R.D. 2006. Including discard data in fisheries stock assessments: Two case studies from south‐eastern Australia. Fisheries Research, 79, 239‐250.

SAS/ETS Software 1991. Applications Guide 1, Version 6, 1st Edition: Time Series Modelling and Forecasting, Financial Reporting and Loan Analysis, Cary, NC: SAS Inc., 380 pp.

Scandol, J. P. 2003. Use of cumulative sum (CUSUM) control charts of landed catch in the management of fisheries. Fisheries Research, 64: 19‐36.

Scandol, J. P. 2005. Use of quality control methods to monitor the status of fish stocks. In Fisheries Assessment and Management in Data‐Limited Situations, pp 213‐233. Ed. by G. H. Kruse, V. F. Galluci, D. E. Hay, R. I. Perry, R. M. Peterman, T. C. Shirley, P. D. Spencer, B. Wilson, and D. Woodby. AK‐SG‐05‐02, Alaska Sea Grant College Program, University of Alaska Fairbanks.

Schlittgen, R. 2001. Angewandte Zeitreihenanalyse. Oldenbourg Verlag, München, Germany.

Schlittgen, R., and Streitberg, B. H. J. 2001. Zeitreihenanalyse. Oldenbourg Verlag, München, Germany.

Stoumbos, Z. G., Reynolds, M. R. Jr., and Woodall, W. H. 2003. Control chart schemes for monitoring the mean and variance of processes subject to sustained shifts and drifts. In Handbook of Statistics, Vol. 22, pp 553‐571. Ed. by R. Khattree and C. R. Rao. Elsevier

Trenkel, V.M., and Rochet, M.‐J. accepted. Intersection‐union tests for characterizing recent changes in smoothed indicator time‐series. Ecological Indicators.

Annex 1: List of participants

NAME ADDRESS PHONE/FAX EMAIL

Anders Nielsen DTU‐Aqua Charlottenlund Slot Jægersborg Allé 1 DK‐2920 Charlottenlund Denmark

an@aqua.dtu.dk

Benoit Mesnil IFREMER BP 21105 F44311 – Nantes Cedex 3 France

Tel: +33 240 374 009

Benoit.Mesnil@ifremer.fr

Carmen Fernández

Instituto Español de Oceanografía Cabo Estai, Canido Apdo. 1552 36200 Vigo Spain

Tel: +34 986 492 111 Fax: +34 986 498 626

carmen.fernandez@vi.ieo.es

Chris Darby CEFAS Lowestoft Laboratory Pakefield Road Lowestoft Suffolk NR33 7BA UK

Tel: +44 1502 524329 Fax: +44 1502 524511

chris.darby@cefas.co.uk

Chris Legault NMFS Northeast Fisheries Science Center 166 Water Street Woods Hole, MA 02543 USA

Tel: +1 508 495 2025 Fax: +1 508 495 2393

Chris.Legault@noaa.gov

Coby Needle (Chair)

FRS Marine Laboratory PO Box 101 375 Victoria Road Aberdeen AB11 9DB Scotland

Tel: +44 1224 295456 Fax: +44 1224 295511

needlec@marlab.ac.uk

Jan Jaap Poos IMARES Haringkade 1 1976 CP Ĳmuiden The Netherlands

janjaap.Poos@wur.nl

Joachim Gröger Institut für Seefischerei Palmaille 9 D‐22767 Hamburg Germany

Tel: +49 40 38905 266 Fax: +49 40 38905 263

joachim.groeger@t‐online.de joachim.groeger@uti.bund.de

José De Oliveira

CEFAS Lowestoft Laboratory Pakefield Road Lowestoft Suffolk NR33 7BA UK

Tel: +44 1502 527727 Fax: +44 1502 524511

jose.deoliveira@cefas.co.uk

Lionel Pawlowski

IFREMER 8 Rue François Toullec 56100 Lorient France

Tel: +33 297 873846 Fax: +33 297 873836

Lionel.Pawlowski@ifremer.fr

Liz Brooks NMFS Northeast Fisheries Science Center 166 Water Street Woods Hole, MA 02543 USA

Tel: +1 508 495 2238 Fax: +1 508 495 2393

Liz.Brooks@noaa.gov

Mike Sissenwine

ICES H.C. Andersens Boulevard 44‐46 DK‐1553 Copenhagen V Denmark

Tel: +45 3338 6700 Fax: +45 3393 4215

M_Sissenwine@surfglobal.net

Noel Cadigan DFO Northwest Atlantic Fisheries Science Center 80 White Hills Road East PO Box 5667 St. John’s, NL Canada

Tel: +1 709 772 5028

Noel.Cadigan@dfo‐mpo.gc.ca

Tim Miller NMFS Northeast Fisheries Science Center 166 Water Street Woods Hole, MA 02543

Tel: +1 508 495 2365 Fax: +1 508 495 2393

Timothy.J.Miller@noaa.gov

Mike Sissenwine (ACOM Chair) also attended the meeting during one morning in the first week, to participate in the discussion on the future directions of WGMG.

Annex 2: WGMG Terms of Reference for the next meeting

The Working Group on Methods of Fish Stock Assessment [WGMG] (Chair: C. L. Needle, UK) will meet in Nantes from 13–22 October 2009 to:

a ) Develop methods for fish stock assessment and advice that are applicable to benchmark assessment meetings in 2010, as determined in consultation with relevant benchmark and assessment WG chairs and stock assessors.

WGMG will report by 20 November 2009 for the attention of the Advisory and Resource Management Committees.

Supporting Information

Priority: The work of this group is essential to ICES to progress in the development of methods for fish stock assessment and advice.

Scientific justification and relation to action plan:

Action Plan No: XXXX.

Term of Reference a) Much of the recent output of WGMG has not had a strong influence beyond the confines of the meeting. The challenge is to try to continue strong threads of research within the group while avoiding marginalisation. Better communication between WGMG and the wider assessment community is clearly required. One way to achieve this could be for WGMG to act (at least in part) as a method exploration and development service for the series of benchmark assessment meetings that are planned to be held each year by ICES. In this context, a possible schedule would be as follows:

1 ) Well in advance [six months] of the WGMG meeting in October, the WGMG Chair would approach the chairs of the forthcoming benchmark and assessment meetings (and, ideally, the relevant stock assessment scientists) to discuss and determine the key methodological issues for those benchmarks. The WGMG Chair would have to be careful to ensure a focus on a limited number each year.

2 ) These discussions would provide the basis for the WGMG ToRs for the October meeting. The WGMG Chair would circulate these ToRs as widely as possible, to try to bring together a group with the skills and interests necessary to address the relevant issues.

3 ) WGMG would then meet in October (or around then) to consider ways in which to improve the methods available for the key issues. The intention would be to feed these into the subsequent benchmark meetings in the early spring of the following year, but only in a general sense. Rather than do the work of the benchmark groups for them, WGMG would provide general advice on how to deal with generic issues (such as modelling discards).

This scheme should ensure that WGMG remains relevant and focused to the key stock assessment issues prevalent within ICES, while also allowing the flexibility for some regular themes to be continued (in as much as they are relevant to a forthcoming benchmark).

Resource requirements:

Participants: The Group is well‐manned by regular members. However, it may benefit from some wider participation to deal with specific issues arising relevant to subsequent benchmark meetings.

Secretariat facilities:

None required.

Financial: No financial implications.

Linkages to advisory committees:

ACOM has strongly supported the work of this group and has worked actively in formulating the ToRs for recent meetings. WGMG will report to ACOM at its autumn meeting in 2009.

Linkages to other committees or groups:

WGMG will report to the Resource Management Committee at the ICES ASC in 2009.

Linkages to other organizations:

There is similar work going on within ICCAT and NAFO. Coordination should be assured.

Annex 3: Recommendations

RECOMMENDATION FOR FOLLOW UP BY:

1. (Section 3.1.1) Management strategy evaluations can contain full assessment models or simpler approximations to the assessment process. Although approximated MSEs may provide similar results to full MSEs in the short term, it is recommended that full MSEs be conducted in order to evaluate whether HCRs are able to meet longer term objectives.

Expert Groups conducting management strategy evaluations.

2. (Section 3.1.2) A wider range of situations should be examined in an MSE approach to evaluating retrospective patterns in stock assessments before firm conclusions can be drawn. These include: alternative corrections for the retrospective patterns (especially changes in catch reporting), stocks with more ages and lower fishing mortality rates, and stocks with large recruitment pulses.

3. (Section 3.2) WGMG suggests that graphical displays of large tables be considered for other ICES WGs involved in MSEs as they are a faster and easier product to interpret.

Expert Groups conducting management strategy evaluations.

4. (Section 3.3) When assessments exhibit retrospective patterns, WGMG recommends that benchmark assessment groups conduct sensitivity analyses exploring different methods of addressing the problem. This would facilitate a determination of the robustness of management advice to different approaches of removing retrospective patterns.

Benchmark assessment WGs.

5. (Section 3.4) WGMG recommends further exploration of random‐effects state‐space models with simulated data containing known sources of retrospective patterns with an emphasis on examining the trade‐offs between bias and uncertainty in estimated quantities.

6. (Section 3.5) WGMG is encouraged to develop objective and consistent criteria for the acceptance of assessments with retrospective patterns. When a moderate retrospective pattern is encountered, assessment and benchmark WGs should a) consider alternative states of nature; b) investigate the performance of alternative methods for retrospective adjustments through management strategy evaluations; and c) evaluate the change in catch advice under different methods to address the retrospective pattern. Biological and fishery considerations should be explored as a basis for adjustments for retrospective patterns.

WGMG; assessment and benchmark WGs.

7. (Section 4.1) Investigate the utility of random‐effects or other suitable approaches to 1) reduce the dimension of highly parameterized fisheries models and 2) reduce bias in estimates of important population quantities, 3) reduce bias in standard errors of these estimators, and 4) improve the accuracy of confidence intervals.

8. (Section 4.3) Carry out a wider‐ranging simulation exercise for the “discards‐missing” models presented. Conducting stock assessments using these models may be a possible way forward when information on discards and/or bycatch is missing for certain years and for explicit incorporation of uncertainty in estimates, and this could be considered in forthcoming benchmarks.

WGMG; benchmark WGs.

9. (Section 5) Assessment WGs are encouraged to further explore the application of these (and other) relatively simple methods of change detection, which enable choices on inputs to forecasts to be made in an objective and replicable way. Expertise in these methods can certainly be found among ICES scientists involved in environmental surveillance, and notably in WGSAEM. ICES is in a unique position to facilitate cross‐fertilization between the methods support groups of its fisheries and environment branches, and practical ways of enhancing collaboration should be implemented.

Assessment WGs; WGSAEM.

Annex 4: Working Papers

These are to be appended by the ICES Secretariat. The group was unable to do this as the Working Papers are only available in PDF format. The documents to be appended are:

WP 13 – Carmen Fernández, Santiago Cerviño, Nélida Pérez and Ernesto Jardim: A Bayesian stock assessment model incorporating discards estimates in some years

WP 16 – Joachim Gröger: Analysis of interventions and structural breaks

A BAYESIAN STOCK ASSESSMENT MODEL INCORPORATING DISCARDS ESTIMATES IN SOME YEARS

Carmen Fernández(1), Santiago Cerviño(1), Nélida Pérez(1) and Ernesto Jardim(2)

(1) Instituto Español de Oceanografía, Cabo Estai–Canido, Apdo. 1552, 36200 Vigo, Spain

(2) IPIMAR, Lisbon, Portugal

Abstract

We develop a Bayesian age-structured stock assessment model that takes into account the information available about discards and is able to handle gaps in the time series of discards estimates. The mechanism for doing this is to explicitly incorporate a term in the model reflecting mortality due to discarding, and to make appropriate assumptions about how this mortality may change over time. The result is a stock assessment that takes due account of discards while, at the same time, producing a complete time series of discards estimates. The method is applied to the hake stock in ICES divisions VIIIc and IXa, which experiences very high discarding on the younger ages. The stock is fished by Spain and Portugal, and for these countries there are only discards estimates in recent and non-coincident years. Two runs of the model are performed, one assuming zero discards and another one incorporating discards. Results are compared and discussed and possible implications for management briefly commented on.

1. INTRODUCTION

Discarding, referring to the practice of returning (dead or dying) fish to sea, is a big problem in many fisheries worldwide. Discarding practices can be due to many different reasons, for example, not being legal to land the fish (if it is below some established minimum landing size) or the fish having a low market value. Discarding is particularly harmful for the sustainability of biological populations, as it tends to concentrate mostly on young immature fish, which are killed before they have had a chance to reproduce. Regulatory efforts are constantly being made in order to try and minimise discarding practices, with e.g. the European Commission being actively involved in that effort.

A recognised problem in many stock assessments is that they do not take discards into account and implicitly assume that discards are non-existent, which can be very far from the truth. A main reason for not incorporating discards in assessments is that information about them is usually rather limited. Whereas many stock assessments use landings data going back a few decades, most countries have only started discards sampling programmes in the last few years. The fact that discards time series are incomplete seriously hampers their incorporation into stock assessments.

In this paper we develop a Bayesian age-structured assessment model that is able to incorporate discards estimates coherently when they are available in just some years. The main mechanism for achieving this is to explicitly incorporate a term in the model to account for fishing mortality due to discarding, and to make appropriate assumptions about how it may change over time. In this way, it is possible to get around the problem of gaps in the time series of discards estimates. This approach can also handle other deficiencies in the discards data, for example, the situation where estimates of discards are available for just some of the countries (or, more generally, some fleets) that prosecute a particular fishery, or when there are estimates of discards for different fleets in years that do not coincide. The stepping stone to achieve this is, again, to be able to make appropriate assumptions about mortality due to discarding that allow the gaps to be filled in. The main product of our work is the stock assessment itself. As a by-product, a complete time series of discards estimates is also obtained.

The model developed here falls within the general class of statistical catch-at-age models, where uncertainty in catch values is incorporated explicitly in the model by means of a so-called observation equation. Statistical catch-at-age models have a long tradition in stock assessments in some parts of the world, although they are used infrequently in ICES assessments. The idea of separating the landed and (various) discarded components of the catch, assigning a separate observation equation to each component, was already suggested by Punt et al. (2006) and the ideas developed here are within their general approach. Other recent work in that direction is in Aarts and Poos (2008).

The paper will be built around the assessment of the hake (Merluccius merluccius) stock in ICES divisions VIIIc and IXa, essentially corresponding to the Atlantic waters of the Iberian Peninsula. This stock is exploited by Spain and Portugal. Its status was assessed by ICES in 2008 using a Bayesian statistical catch-at-age model (ICES, 2008). Landings data start in year 1982. In the assessment performed by ICES, the fishery selectivity for each age was assumed to be constant during each of two different periods of time, with a changepoint in year 1994. A fully separable model was not considered to be appropriate given the changes that the fishery has experienced, particularly during the 1990s. Five separate time series were assumed to provide relative stock abundance indices in several time periods (the so-called “tuning series”). There are estimates of Spanish discards for years 1994, 1997, 1999, 2000 and 2003-2007, and of Portuguese discards for 2004-2007, but these were not incorporated in the ICES assessment (which implicitly assumed zero discards). Figure 1 displays landed (in black) and estimated discarded (in red) numbers-at-age for this stock. For discards, we only plot the last four years because we are adding up the Spanish and Portuguese discards (for the purpose of this plot only) and they are only available during 2004-2007. The plot clearly shows that discards of ages 0-2 are very substantial, raising concern about the possible consequences of ignoring this big source of mortality in the assessment. In this paper, we will incorporate the available discards information in the assessment in a coherent fashion, and will compare the results with those obtained under the zero discards assumption.

Full details of the Bayesian assessment model that we propose are given in Section 2. The basic model (the one that would be used if zero discards were assumed) is quite similar to the one used in the ICES 2008 assessment, although some improvements have been introduced, particularly in what concerns the modelling of fishery selectivities-at-age over time. However, as noted before, our main contribution here is the further development of the model to incorporate the available discards information in a coherent fashion. The details of how we do this are provided in Subsections 2.4 and 2.5. Section 3 applies the model to the hake stock and discusses the results. Section 4 presents conclusions and indicates directions of future work.

2. ASSESSMENT MODEL

We consider an age-structured population dynamics model, with forward dynamics. Throughout, N(y, a) will denote the number of individuals of age a in year y. For the time being, the years considered will be labelled as y = Y0, . . . , Y and the ages as a = 0, . . . , A+, where A − 1 is the last true age and A+ is a plus group, consisting of individuals of age A and older. The fitting methodology will be Bayesian. Model details, including prior distributions, are described in the sequel.

2.1. Prior distribution on numbers of age a = 0 (recruitment) in years y = Y0, . . . , Y :

$$N(y, 0) \sim \text{Log-Normal}\big(\text{median} = \mathrm{med}_{\mathrm{rec}},\ \text{CV} = \mathrm{cv}_{\mathrm{rec}}\big), \tag{1}$$

where medrec and cvrec are values to be chosen by the analyst, which should be deemed to be realistic for the stock being assessed. The larger the value chosen for cvrec, the wider the prior distribution of recruitment, hence indicating less prior knowledge.
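As an illustration of how a median and a CV determine a log-normal prior, the following sketch converts the pair into the mean and standard deviation of the logarithm and draws recruitments from the prior. It assumes the usual relation between the CV and the log-scale variance, var = log(1 + CV^2), which matches the variance term used for the selectivity priors in Subsection 2.4; the function names and the numerical values are hypothetical.

```python
import math
import random


def lognormal_params(median, cv):
    """Return (mu, sigma) of log(X) for a log-normal X with this median and CV."""
    mu = math.log(median)                      # median of X is exp(mu)
    sigma = math.sqrt(math.log(1.0 + cv ** 2)) # CV^2 = exp(sigma^2) - 1
    return mu, sigma


def draw_recruitment(median, cv, n_years, rng):
    """Draw independent prior recruitments N(y, 0), one per year."""
    mu, sigma = lognormal_params(median, cv)
    return [rng.lognormvariate(mu, sigma) for _ in range(n_years)]


# Hypothetical values: prior median 100 (arbitrary units), CV of 1.
rng = random.Random(2008)
print(lognormal_params(100.0, 1.0))
print(draw_recruitment(100.0, 1.0, 5, rng))
```

A larger CV gives a larger log-scale standard deviation and therefore a wider prior, in line with the remark above.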

2.2. Prior distribution on numbers of ages a = 1, . . . , A+ in initial year y = Y0:

$$N(Y_0, a) \sim \text{Log-Normal}\big(\text{median} = \mathrm{med}_{\mathrm{init}}(a),\ \text{CV} = \mathrm{cv}_{\mathrm{init}}\big), \tag{2}$$

where the subscript “init” stands for initial year. The median of the prior distribution will logically be age-dependent, but, for simplicity, the same prior coefficient of variation will be used for all ages. Again, medinit(a) and cvinit are values to be chosen by the analyst. For the hake assessment, we have applied the following idea to choose medinit(a):

Instead of choosing medinit(a) directly, think of a reasonable value for F-at-age in the initial year, say Finit(a), and define:

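$$\mathrm{med}_{\mathrm{init}}(a) = \mathrm{med}_{\mathrm{rec}} \exp\Big\{-aM - \sum_{j=0}^{a-1} F_{\mathrm{init}}(j)\Big\}, \quad \text{if } 1 \le a \le A-1, \tag{3}$$

$$\mathrm{med}_{\mathrm{init}}(A+) = \frac{\mathrm{med}_{\mathrm{rec}} \exp\Big\{-AM - \sum_{j=0}^{A-1} F_{\mathrm{init}}(j)\Big\}}{1 - \exp\{-M - F_{\mathrm{init}}(A+)\}}, \tag{4}$$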

where M is the assumed value of natural mortality.
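The construction of the prior medians in the initial year can be sketched as below. The sketch assumes that the sums in (3) and (4) run from age 0 up to the age before the one considered, i.e. that a cohort of age a has experienced a years of natural mortality plus the fishing mortalities Finit(0), . . . , Finit(a − 1); the function name and the input values are hypothetical.

```python
import math


def initial_medians(med_rec, m, f_init):
    """Prior medians of N(Y0, a) for a = 1, ..., A+.

    f_init[a] is the assumed initial-year F at age a for a = 0, ..., A,
    where the last entry (index A) refers to the plus group.
    Returns a dict keyed by age, with key A standing for the plus group.
    """
    plus = len(f_init) - 1          # index A of the plus group
    medians = {}
    cum_f = 0.0                     # running sum of F_init(0..a-1)
    for a in range(1, plus):        # true ages 1, ..., A-1
        cum_f += f_init[a - 1]
        medians[a] = med_rec * math.exp(-a * m - cum_f)
    cum_f += f_init[plus - 1]       # sum now covers ages 0, ..., A-1
    # Geometric-series correction for all ages A and older.
    medians[plus] = (med_rec * math.exp(-plus * m - cum_f)
                     / (1.0 - math.exp(-m - f_init[plus])))
    return medians


# Hypothetical inputs: M = 0.2 and a flat F of 0.3 on ages 0 to 3+.
print(initial_medians(1000.0, 0.2, [0.3, 0.3, 0.3, 0.3]))
```

The division in the plus-group term sums the contributions of all cohorts of age A and older under constant mortality, which is why Finit(A+) must be strictly positive whenever M is zero.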

2.3. Population dynamics:

Given the yearly recruitments, numbers-at-age in the initial year, fishing mortality F(y, a) and total mortality Z(y, a) rates, the population dynamics are assumed to be deterministic, governed by the equations:

$$N(y, a) = N(y-1, a-1) \exp\{-Z(y-1, a-1)\}, \quad \text{if } 1 \le a \le A-1, \tag{5}$$

$$N(y, A+) = N(y-1, A-1) \exp\{-Z(y-1, A-1)\} + N(y-1, A+) \exp\{-Z(y-1, A+)\},$$

where Z(y, a) = M + F(y, a).
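A minimal sketch of these forward dynamics is given below, assuming a single natural mortality value for all years and ages and list-based storage with the plus group in the last position; the function name `project_numbers` and the example inputs are hypothetical.

```python
import math


def project_numbers(recruits, n_init, f, m):
    """Project numbers-at-age forward in time.

    recruits[y] : N(y, 0) for each year (position 0 is the initial year Y0)
    n_init[a-1] : N(Y0, a) for a = 1, ..., A+ (last entry is the plus group)
    f[y][a]     : fishing mortality for year y and age a = 0, ..., A+
    m           : natural mortality, assumed constant here
    Returns n[y][a] with the plus group in the last column.
    """
    n_years = len(recruits)
    plus = len(n_init)              # column index of the plus group
    n = [[0.0] * (plus + 1) for _ in range(n_years)]
    n[0] = [recruits[0]] + list(n_init)
    for y in range(1, n_years):
        n[y][0] = recruits[y]
        for a in range(1, plus):    # true ages: survivors of the cohort
            n[y][a] = n[y - 1][a - 1] * math.exp(-(m + f[y - 1][a - 1]))
        # Plus group: survivors entering from age A-1 plus surviving members.
        n[y][plus] = (n[y - 1][plus - 1] * math.exp(-(m + f[y - 1][plus - 1]))
                      + n[y - 1][plus] * math.exp(-(m + f[y - 1][plus])))
    return n


# Hypothetical example: three years, ages 0, 1 and 2+.
f_example = [[0.1, 0.3, 0.3]] * 3
for row in project_numbers([100.0, 120.0, 90.0], [60.0, 40.0], f_example, 0.2):
    print(row)
```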

2.4. Modelling F (y, a) to take account of discards:

In the model so far, F(y, a) represents the fishing mortality rate. We now decompose this fishing mortality into terms corresponding to landings and discards. To focus ideas, we consider the stock of hake, whose fishery is pursued by Spain and Portugal. For this stock we have data on landed numbers-at-age for the whole of the stock, Spanish discarded numbers-at-age (for just some years) and Portuguese discarded numbers-at-age (for a subset of the years with data on Spanish discards). Hence, it seems natural to decompose the total fishing mortality rate as:

$$F(y, a) = F_L(y, a) + F_{SPD}(y, a) + F_{PD}(y, a), \tag{6}$$

where FL, FSPD and FPD are the fishing mortality rates corresponding to landings, Spanish discards and Portuguese discards, respectively. Obviously, for other stocks, the partition of F(y, a) into components will be different. Certain terms in the F(y, a) decomposition could correspond to discards from different countries or from different segments of the commercial fishery. The main point is that F(y, a) should be decomposed into disjoint terms that add up to the total fishing mortality rate for the stock.

Since there are many gaps in the time series of Spanish and Portuguese discards estimates, it is not possible to estimate the full matrices of parameters FSPD(y, a) and FPD(y, a), and their dimensions will be reduced by applying separability-type assumptions. This could be done in various ways, depending on what is considered more reasonable for the stock at hand. For the hake stock, currently evaluated at ICES using a separable model with 2 different separability time-periods, we propose a slightly different approach, which we consider to be more realistic:

Three time periods are considered, where in the first and third periods selectivities-at-age are assumed to be constant over time and in the intermediate period selectivities-at-age are assumed to be autoregressive in time. The reason for this choice is that the hake fishery is thought to have been rather stable during the first and third periods (years 1982-1990 and 2001-2007), whereas it has undergone several changes (progressive enforcement of the minimum landing size, a progressive reduction of the long-line fleet and a change of mesh size) during the intermediate period (1991-2001). Hence, it is expected that selectivity would have been changing, albeit in a reasonably smooth manner, over time during the 1990s.

To translate the ideas from the previous paragraph into a modelling framework, we first define

FL(y, a) = f(y) rL(y, a),

FSPD(y, a) = f(y) rSPD(y, a), (7)

FPD(y, a) = f(y) rPD(y, a),

where f(y) is a common factor related to overall yearly fishing effort. Now consider

Y0 < Y1 < Y2 < Y,

where, as already indicated, Y0 and Y are the initial and final assessment years, and Y1 < Y2 are the two intermediate years that are used to define the three time periods. We indicate how rL(y, a) is modelled, with the same modelling procedure also to be applied to rSPD(y, a) and rPD(y, a):

First period y = Y0, . . . , Y1 (selectivities-at-age constant over time):

rL(Y0, a) ∼ Log − Normal(median = medrL(a), CV = cvrL(a)) (8)

For y = Y0 + 1, . . . , Y1 : rL(y, a) = rL(Y0, a) (9)

Intermediate period y = Y1 + 1, . . . , Y2 (selectivities-at-age autoregressive in time according to an AR(1) model):

$$
\log r_L(y,a) \sim \mathrm{Normal}\Big(\text{mean} = (1-\rho_L)\,\log \mathit{medr}_L(a) + \rho_L\,\log r_L(y-1,a),\ \ \text{var} = (1-\rho_L^2)\,\log\big(1+\mathit{cvr}_L(a)^2\big)\Big) \tag{10}
$$

Final period y = Y2 + 1, . . . Y (selectivities-at-age constant over time):

rL(y, a) = rL(Y2, a). (11)

medrL(a), cvrL(a), medrSPD(a), cvrSPD(a), medrPD(a) and cvrPD(a) are the medians and coefficients of variation of the marginal prior distributions of rL(y, a), rSPD(y, a) and rPD(y, a), and must be chosen by the analyst. We will explain later how the medians were chosen for the hake stock. The width of the prior distributions (controlled by their CVs) relative to the information content of the data (captured by the shape and width of the likelihood function) will determine how much impact the prior distribution has on posterior results. In general, the wider the prior, the less its impact on the posterior distribution.

The autocorrelation parameters are assigned the following prior distributions:

ρL ∼ Uniform(0, 1), ρSPD ∼ Uniform(0, 1), ρPD ∼ Uniform(0, 1). (12)

Note that as the autocorrelation parameters approach 0, selectivities-at-age approach the situation of complete independence from one year to the next, whereas as the autocorrelation approaches 1, selectivities-at-age approach the case where they are constant over time.
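The three-period prior in equations (8) to (11) can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the function name `simulate_selectivity`, its arguments and the numerical values in the example call are hypothetical choices made for illustration, and the draw is from the prior only (no data are involved).

```python
import math
import random


def simulate_selectivity(years, y1, y2, med, cv, rho, rng):
    """Draw one prior trajectory r(y) for a single age (illustrative only).

    years  : consecutive assessment years Y0, ..., Y
    y1, y2 : last year of the first period and last year of the AR(1) period
    med, cv: median and CV of the marginal log-normal prior
    rho    : AR(1) autocorrelation on the log scale
    """
    sigma2 = math.log(1.0 + cv ** 2)  # log-scale variance implied by the CV
    r = {}
    # Equation (8): a single log-normal draw for the first year.
    r0 = math.exp(rng.gauss(math.log(med), math.sqrt(sigma2)))
    for y in years:
        if y <= y1:
            r[y] = r0                      # equation (9): constant first period
        elif y <= y2:
            # Equation (10): AR(1) step on the log scale.
            mean = (1.0 - rho) * math.log(med) + rho * math.log(r[y - 1])
            sd = math.sqrt((1.0 - rho ** 2) * sigma2)
            r[y] = math.exp(rng.gauss(mean, sd))
        else:
            r[y] = r[y2]                   # equation (11): constant final period
    return r


rng = random.Random(1)                     # fixed seed so the draw is repeatable
years = list(range(1982, 2008))
traj = simulate_selectivity(years, 1990, 2001, med=0.61, cv=1.0, rho=0.9, rng=rng)
```

Setting `rho` close to 1 in this sketch makes the intermediate years nearly identical to the first period, while `rho` close to 0 makes them nearly independent draws, in line with the remark above.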

The factor f(y), common to the three components of the fishing mortality, will also be estimated. An identifiability problem arises, since multiplying f(y) by any value and dividing the three selectivity terms by the same value leaves the equations in (7) invariant. To solve this we have chosen a reference age aref and set rL(y, aref) = 1 in all years. The choice of reference age is arbitrary, but it seems convenient for interpretability to choose it as the first age for which there are no discards (this is assuming that discards occur only at the younger ages). With this choice, f(y) is interpreted as the fishing mortality of the reference age, i.e. f(y) = F(y, aref). A Normal autoregressive in time (AR(1)) model seems like a sensible prior distribution for log(f(y)). Hence,

f(y) ∼ Log − Normal(median = medf , CV = cvf ) (13)

and there is an autocorrelation in time parameter, ρf, for log(f(y)). Suitable values of medf and cvf should be chosen by the analyst. For the autocorrelation parameter we have taken the prior distribution

ρf ∼ Uniform(0, 1). (14)

A final modelling aspect, at least in the case of the hake stock, is that Spanish discards are assumed to be non-existent above a certain age and the same happens for Portuguese discards (although the age for which it happens is different from the Spanish one). Hence, for the appropriate Spanish and Portuguese ages, it is assumed that rSPD(y, a) = 0 and rPD(y, a) = 0.

2.5. Observation equations for commercial landed and discarded numbers-at-age:

So far we have set up the age-structured population dynamics model assumptions, including prior distributions on the parameters to be estimated. Now we need to consider the information provided by the available data and how this relates to the underlying population abundances and model parameters. This will be done via so-called observation equations, which provide stochastic links between observed data and model abundances and parameters.

In terms of commercial fishery data, we have landed numbers-at-age, L(y, a), in all years. For the hake stock, we also have Spanish discarded numbers-at-age, SPD(y, a), although in only some years, and Portuguese discarded numbers-at-age, PD(y, a), in a subset of the Spanish years. Each of these three sources of information will have its own observation equation, for each age, as follows:

$$
L(y,a) \sim \mathrm{Log\text{-}Normal}\!\left(\text{median} = N(y,a)\,\big[1-\exp\{-Z(y,a)\}\big]\,\frac{F_L(y,a)}{Z(y,a)},\ \ \mathrm{CV} = cv_L\right) \tag{15}
$$

$$
SPD(y,a) \sim \mathrm{Log\text{-}Normal}\!\left(\text{median} = N(y,a)\,\big[1-\exp\{-Z(y,a)\}\big]\,\frac{F_{SPD}(y,a)}{Z(y,a)},\ \ \mathrm{CV} = cv_{SPD}\right) \tag{16}
$$

$$
PD(y,a) \sim \mathrm{Log\text{-}Normal}\!\left(\text{median} = N(y,a)\,\big[1-\exp\{-Z(y,a)\}\big]\,\frac{F_{PD}(y,a)}{Z(y,a)},\ \ \mathrm{CV} = cv_{PD}\right) \tag{17}
$$

The medians of the Log-Normal distributions correspond to the values of landings, Spanish and Portuguese discards according to the model, which are obtained by applying the Baranov catch equation to the population abundances using the appropriate terms of the fishing mortality rates. Hence, landings are related to population abundances via FL(y, a), Spanish discards via FSPD(y, a) and Portuguese discards via FPD(y, a). This is crucial for coherent treatment of the three sources of information, as it avoids the relatively common procedure of treating the landings as if they were total catch (which implicitly assumes that discards = 0).
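The way the Baranov catch equation splits total removals among landings and the two discard components can be sketched as follows. This is an illustration only: the function name `baranov_components`, the dictionary keys and the numerical inputs are hypothetical, and the total mortality is assumed here to be Z = M + F, with F the sum of the partial fishing mortalities.

```python
import math


def baranov_components(n, m, f_parts):
    """Expected numbers removed by each fishing mortality component.

    n       : abundance at the start of the year
    m       : natural mortality rate
    f_parts : dict mapping component name -> partial fishing mortality rate
    """
    z = m + sum(f_parts.values())            # assumed total mortality Z = M + F
    per_unit_rate = n * (1.0 - math.exp(-z)) / z
    # Each component takes a share of the total deaths proportional to its rate.
    return {name: per_unit_rate * f for name, f in f_parts.items()}


parts = baranov_components(
    n=1000.0,
    m=0.2,
    f_parts={"landings": 0.30, "spanish_discards": 0.15, "portuguese_discards": 0.05},
)
```

In this sketch the three medians add up to the total catch implied by the summed fishing mortality, so treating landings alone as total catch would attribute too little mortality to fishing whenever the discard terms are non-zero.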

The coefficients of variation cvL, cvSPD and cvPD are assigned prior distributions. For computational simplicity, a Gamma prior distribution is set on the precision (inverse of variance) of the Normal observation equations when considering the data on logarithmic scale. Hence, we will have:

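$$
\begin{aligned}
1/\log(1 + cv_L^2) &\sim \mathrm{Gamma}(\text{shape} = s1_L,\ \text{rate} = s2_L) \\
1/\log(1 + cv_{SPD}^2) &\sim \mathrm{Gamma}(\text{shape} = s1_{SPD},\ \text{rate} = s2_{SPD}) \\
1/\log(1 + cv_{PD}^2) &\sim \mathrm{Gamma}(\text{shape} = s1_{PD},\ \text{rate} = s2_{PD})
\end{aligned} \tag{18}
$$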

where the hyperparameters of the Gamma prior distributions can be chosen on the basis of the implied prior values for cvL, cvSPD and cvPD. For example, taking s1L = 4 and s2L = 0.345, the prior distribution of cvL has a median value of 0.31 and (0.20, 0.61) as its 95% central credible interval. This is the choice we have made in this paper for landings, Spanish and Portuguese discards.
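The implied prior on a CV can be checked numerically from the Gamma prior on the log-scale precision. The sketch below is not the authors' code: it uses only the Python standard library, relies on the closed-form (Erlang) distribution function that is valid because the shape parameter is an integer, and all function names are hypothetical.

```python
import math


def erlang_cdf(x, shape, rate):
    """CDF of a Gamma distribution with integer shape (Erlang distribution)."""
    lam = rate * x
    term, total = 1.0, 1.0
    for n in range(1, shape):
        term *= lam / n          # lam**n / n!
        total += term
    return 1.0 - math.exp(-lam) * total


def erlang_quantile(p, shape, rate):
    """Invert the CDF by bisection."""
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, shape, rate) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, shape, rate) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def implied_cv_quantile(p, shape, rate):
    """p-quantile of the CV implied by a Gamma prior on 1 / log(1 + cv^2)."""
    # The CV decreases as the precision increases, so the p-quantile of the CV
    # corresponds to the (1 - p)-quantile of the precision.
    tau = erlang_quantile(1.0 - p, shape, rate)
    return math.sqrt(math.exp(1.0 / tau) - 1.0)


median_cv = implied_cv_quantile(0.5, 4, 0.345)
lower_cv = implied_cv_quantile(0.025, 4, 0.345)
upper_cv = implied_cv_quantile(0.975, 4, 0.345)
```

With shape 4 and rate 0.345, the three quantities computed here should agree, after rounding to two decimals, with the median and 95% interval quoted in the text.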

2.6. Observation equations for relative indices of stock abundance:

Another important source of information to enter the stock assessment model is in the form of time series of relative indices of abundance-at-age (often denoted as “tuning series”). For the hake stock, there are two such indices obtained from Spanish and Portuguese research surveys and three additional tuning series corresponding to CPUE of commercial trawl fleets.

Let Iff(y, a) be a relative abundance index corresponding to “tuning fleet” ff, which operates during the fraction of the year (αff, βff) ⊆ (0, 1). For the years and ages in which the index is available, we consider the observation equation

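$$
I_{ff}(y,a) \sim \mathrm{Log\text{-}Normal}\!\left(\text{median} = q_{ff}(a)\,N(y,a)\,\frac{\exp\{-\alpha_{ff} Z(y,a)\} - \exp\{-\beta_{ff} Z(y,a)\}}{(\beta_{ff}-\alpha_{ff})\,Z(y,a)},\ \ \mathrm{CV} = cv_{ff}\right) \tag{19}
$$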

The median of the Log-Normal distribution corresponds to the average stock abundance during the period in which the fleet operates, multiplied by the age-specific catchability factor qff(a). It is assumed that the fleet catchability-at-age is constant over time, but unknown. A log-Normal prior distribution is set as follows

qff (a) ∼ Log − Normal(median = medqff , CV = cvqff ), (20)

with the values of medqff and cvqff to be chosen by the analyst. It is quite often assumed that catchability remains constant above a certain age (so-called “q-plateau”).
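The median in the observation equation (19) is the catchability times the average abundance over the part of the year in which the tuning fleet operates. A minimal sketch of that calculation is given below; the function name `expected_index` and the example values are hypothetical and serve only to illustrate the formula.

```python
import math


def expected_index(q, n, z, alpha, beta):
    """Median of the index: q times average abundance over (alpha, beta).

    q           : catchability-at-age
    n           : abundance at the start of the year
    z           : total mortality rate over the year
    alpha, beta : start and end of the fleet's operating period (fractions of the year)
    """
    # Average of n * exp(-z * t) over t in (alpha, beta).
    avg_survival = (math.exp(-alpha * z) - math.exp(-beta * z)) / ((beta - alpha) * z)
    return q * n * avg_survival


# A fleet operating all year sees the annual average abundance ...
whole_year = expected_index(q=1.0, n=1000.0, z=0.5, alpha=0.0, beta=1.0)
# ... whereas a very short survey around t = 0.75 sees roughly n * exp(-0.75 * z).
short_survey = expected_index(q=1.0, n=1000.0, z=0.5, alpha=0.75, beta=0.750001)
```

The two example calls show the limiting behaviour: for a fleet operating all year the expression reduces to the usual mean abundance N[1 − exp(−Z)]/Z, and for a very short survey it approaches the abundance at the survey date.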

Similarly to what was done for the CV of the observation equations of landings and discards, a Gamma prior distribution will be set on the precision (inverse of variance) of the Normal observation equations when considering the abundance index data on logarithmic scale. Hence, we will have:

1/ log(1 + cv2ff ) ∼ Gamma(shape = s1ff , rate = s2ff ) (21)

where, again, the hyperparameters of the Gamma prior distribution can be chosen on the basis of the implied prior values for cvff.

A similar procedure is repeated for each of the tuning fleets.

3. RESULTS

The model described in Section 2 has been applied to the stock of hake in ICES divisions VIIIc and IXa. The data available consist of:

* Estimated landed numbers-at-age, L(y, a), for years y = 1982, . . . , 2007 and ages a = 0, . . . , 8+.

* Estimates of Spanish discarded numbers-at-age, SPD(y, a), for years y = 1994, 1997, 1999, 2000, 2003-2007 (Spanish discards occur between ages 0 and 3, although for age 3 they are very minor).

* Estimates of Portuguese discarded numbers-at-age, PD(y, a), for years y = 2004-2007 (Portuguese discards occur between ages 0 and 6, although starting from age 3 they are very minor).

Five tuning series:

* Indices of abundance-at-age from Spanish survey (labelled SP), for years y = 1983, . . . , 2007 (excluding 1987) and ages a = 0, . . . , 4.

* Indices of abundance-at-age from Portuguese survey (labelled POCT), for years y = 1989, . . . , 2007 and ages a = 0, . . . , 4 (ages 0 and 1 missing in some years).

* CPUE from A Coruña trawl fleet (labelled C85), for years y = 1985, . . . , 1993 and ages a = 2, . . . , 8+.

* CPUE from A Coruña trawl fleet (labelled C94), for years y = 1994, . . . , 2007 and ages a = 3, . . . , 8+.

* CPUE from Portuguese trawl fleet (labelled P95), for years y = 1995, . . . , 2007 and ages a = 2, . . . , 8+.

The current assessment performed by ICES for this stock uses all the above listed information except for the discards estimates (it implicitly assumes that discards = 0). To assess the effect that accounting for discards has on the assessment, we perform two runs of the model described in Section 2: one of them (Run 1) incorporates the discards information, whereas the other one (Run 2) assumes that discards = 0.

Table 1 presents the choices made for the hyperparameter values of the prior distributions. As a general rule, the prior distributions have been centred on values that were felt to be reasonable according to current knowledge about this stock and its fishery, while, at the same time, they were taken to be fairly wide (large CVs), so as to prevent them from having an unduly high influence on posterior results. For each age, the prior medians medrL(a), medrSPD(a) and medrPD(a) of the selectivities of landings, Spanish discards and Portuguese discards were chosen as the proportion landed, discarded by Spain and discarded by Portugal for that age, averaged over the years for which there are estimates of the three quantities (2004-2007).

Figure 2 presents stock trends in SSB (tons), recruitment (thousands of age 0 individuals) and Fbar (average F over ages 2 to 5) for the runs assuming discards = 0 (in black) and including discards information (in red). Whereas including discards information in the assessment has negligible impact on estimates of SSB, the effect on the recruitment estimates is very pronounced, with no overlap at all between the 90% posterior probability intervals of the two runs in most of the years. Fbar is slightly underestimated in the run that assumes discards = 0, with respect to the run that includes discards information.

Figures 3 to 19 all refer to the run that incorporates information on discards (Run 1).

Figure 3 presents the evolution of the commercial fishery selectivities-at-age over time. The two time periods (1982-1990, 2001-2007) of constant selectivities and the intermediate autoregressive period (1991-2001) can be seen clearly in the graphs. Over time, selectivity has decreased for ages 0-2, increased for ages 3 and 4, remained roughly level for ages 5 and 6, and decreased for ages 7 and 8+. The decrease in selectivity for the younger ages could be due to the fleet now targeting them less, since regulations no longer allow landing them, as well as to the increase in mesh size. The decrease for ages 7 and 8+ could reflect the reduction of the long-line component of the fishery.

Figure 4 displays the fishing mortality-at-age corresponding to discards for the stock. This is defined as FD = FSPD + FPD. This mortality rate is substantial for ages 0-2 and virtually negligible for ages 4 and older.

Figure 5 presents FD/F, which corresponds to the probability that a fish is discarded given that it has been caught. It is clear that this probability has increased over time for ages 0 to 2. At present, practically all of the individuals caught of age 0 and about 90% and 30% of those caught of ages 1 and 2, respectively, are discarded.

Figures 6-8 present residuals of landings, Spanish and Portuguese discards, all in logarithmic scale. Residuals of landings do not show any worrying patterns, with the only noticeable feature being the larger magnitude of those corresponding to age 0. This is unsurprising because landings of this age have decreased very substantially over time and have been very low for approximately one decade, implying noisier landings data for that age. Residuals of Spanish discards also look good. For these residuals, the outlier seen for age 3 in 2004 corresponds to a 0 discards estimate which, for practical purposes, was replaced by a very small value. Residuals for Portuguese discards do not look particularly good, being mostly either all positive or all negative for each given age.

As a by-product of the analysis, we can estimate the discarded numbers-at-age for the stock in all of the years. This simply corresponds to the posterior distribution of

$$
D_{mod}(y,a) = N(y,a)\,\big[1-\exp\{-Z(y,a)\}\big]\,\frac{F_D(y,a)}{Z(y,a)},
$$

where, as indicated before, FD = FSPD + FPD. Results are presented in Figure 9. The 90% posterior probability intervals are generally very wide, reflecting large uncertainty associated with the discards estimates. This is unsurprising given the scarcity of discards input data for this stock. Nonetheless, the results correspond to the discards values that are most coherent with the various sources of information that were input into the model (as well as with the model and prior assumptions, of course).

Figures 10-14 give the residuals of the five tuning fleets, all in logarithmic scale. These figures are presented mostly for completeness. They do not show any striking or particularly worrying patterns. The smaller magnitude of the residuals from tuning fleet C85 (A Coruña trawl fleet during 1985-1993) indicates that the model fit follows most closely the signal coming from this fleet.

Figures 15-19 display the prior (in red) and posterior (in black) distributions of the log(catchability-at-age) for each of the five tuning fleets. The posterior distributions are much more concentrated than the prior and centred at different places. This indicates that the prior distribution of catchabilities has not had substantial impact on the results. We note that the posterior distribution of catchability is always the same for ages 6 and older, since we have chosen to have a plateau starting from age 6.

Figures 20-22 again compare results from the run assuming discards = 0 (in black, in the graphs) and the run that incorporates discards information (in red, in the graphs). The comparison is now focused on quantities that constitute input data for stock projections into the future, which are essential for evaluating alternative stock management scenarios. In particular, we display survivors-at-age at the beginning of the year following the assessment (Figure 20), selectivity-at-age of the commercial fishery for the most recent period (Figure 21), the average Fbar of the last three years (Figure 22) and a stock-recruitment plot based on posterior medians (Figure 22). It is clear that there are substantial differences in several of these quantities depending on whether discards are assumed to be 0 or information about discards is incorporated into the analysis. The effect those differences would have when considering stock projections into the future should be examined, although we consider this to be outside the focus of the present paper.

4. CONCLUSIONS

We have developed a Bayesian stock assessment model that can coherently use discards information even when it is available in only some years. The result is a stock assessment that takes due account of the available information about discards and, as a by-product, a complete time series of discards estimates is obtained. The basic idea used to achieve this was to relate each piece of information (be it landings or discards) to an appropriate term of the total fishing mortality F. Gaps in the time series of discards estimates were essentially filled by making appropriate assumptions about how the corresponding terms in F may have varied over time.

The ideas were applied to the hake stock in ICES divisions VIIIc and IXa, fished by Spain and Portugal. There were Spanish discards estimates in only some years and Portuguese discards estimates in a subset of the Spanish years. This lack of coincidence in the years does not cause methodological problems, as Spanish and Portuguese discards estimates enter separately in the model (they are not summed in the input data). The idea could even be extended to having a third country (or fleet) in the fishery for which there was no information at all on discards, provided realistic assumptions could be made about the discards fishing mortality for that country or fleet. Making such assumptions, however, may not be easy.

When comparing the results from a run that assumed zero discards with one incorporating the discards information, we have found (in the context of the hake stock) virtually no difference in the estimates of SSB. By far the biggest differences were in the estimates of recruitment (age 0 individuals), with 90% posterior probability intervals that did not overlap in most years. Fishing mortality was always a bit lower in the run that assumed zero discards.

Continuing with the comparison of the runs with and without discards, we have also found substantial changes in some of the quantities that constitute the input data for stock projections into the future. For the hake stock, by incorporating discards we estimate larger abundances for the younger ages, as well as larger F values, and we also obtain a somewhat different perception of the stock-recruitment curve. The impact that these changes may have on predictions of future stock trajectories and how this may alter the perception of how best to manage the stock is an important subject of future research.

Since a fishing mortality term for discards is now explicitly incorporated in the model, it is possible to evaluate the impact of different levels of discarding on future population trends, hence providing the necessary ingredients to be able to evaluate management strategies concerning discards.

REFERENCES:

Aarts, G. and Poos, J.J. (2008). Comprehensive discard reconstruction and abundance estimation using flexible selectivity functions. Manuscript.

ICES (2008). Report on the working group on the assessment of Southern Shelf stocks of hake, monk and megrim.

Punt, A.E., Smith, D.C., Tuck, G.N. and Methot, R.D. (2006). Including discard data in fisheries stock assessments: Two case studies from south-eastern Australia. Fisheries Research, 79, 239-250.

Table 1: Prior settings for the runs performed (for Run 2, only differences with Run 1 are indicated). “NA” stands for Not Applicable.

Parameter                      RUN 1 (includes discards)    RUN 2 (discards=0)
M                              0.2
medrec                         80000
cvrec                          1
Finit(a), a = 0, 1             0.3
Finit(a), a > 1                0.5
cvinit                         2
medrL(0)                       0.0003
medrL(1)                       0.008
medrL(2)                       0.61
medrL(3)                       0.97
medrL(a), a = 4, 5, 6          0.99
medrL(a), a = 7, 8+            1
cvrL(0)                        5
cvrL(a), a ≥ 1                 1
medrSPD(0)                     0.2661                       NA
medrSPD(1)                     0.28                         NA
medrSPD(2)                     0.18                         NA
medrSPD(3)                     0.01                         NA
rSPD(a), a > 3                 0                            NA
cvrSPD(a), all a               1                            NA
medrPD(0)                      0.7336                       NA
medrPD(1)                      0.64                         NA
medrPD(2)                      0.21                         NA
medrPD(3)                      0.02                         NA
medrPD(a), a = 4, 5, 6         0.01                         NA
rPD(a), a > 6                  0                            NA
cvrPD(a), all a                1                            NA
medf                           0.6
cvf                            1
s1L                            4
s2L                            0.345
s1SPD                          4                            NA
s2SPD                          0.345                        NA
s1PD                           4                            NA
s2PD                           0.345                        NA
s1ff, all tuning fleets        4
s2ff, all tuning fleets        0.345
medqff, all tuning fleets      exp(−7)
cvqff, all tuning fleets       12

[Figure 1: Numbers (thousands) landed (black) and discarded (red); one panel per age, x-axis: year.]

[Figure 2: Stock trends assuming discards=0 (black) and incorporating discards (red); panels: SSB, Recruits, Fbar(2−5); posterior 90% probability intervals.]

[Figure 3: Evolution of fishery age selectivity over time; panels: F(y,a)/Fbar(y) for ages a = 0, . . . , 8+.]

[Figure 4: Discards F-at-age; panels: FD(y,a) for ages a = 0, . . . , 8+.]

[Figure 5: Probability of discarding caught individuals (each panel is one age); panels: FD(y,a)/F(y,a) for ages a = 0, . . . , 8+.]

[Figure 6: Residuals of log(landed numbers-at-age); panels: log(L(y,a)) for ages a = 0, . . . , 8+.]

[Figure 7: Residuals of log(Spanish discarded numbers-at-age); panels: log(SPD(y,a)) for ages a = 0, . . . , 3.]

[Figure 8: Residuals of log(Portuguese discarded numbers-at-age); panels: log(PD(y,a)) for ages a = 0, . . . , 6.]

[Figure 9: Discarded numbers-at-age according to model; panels: Dmod(y,a) for ages a = 0, . . . , 6; posterior 90% probability intervals.]

[Figure 10: Residuals of log(abundance index from Spanish survey) (SP); one panel per age.]

[Figure 11: Residuals of log(abundance index from Portuguese survey) (POCT); one panel per age.]

[Figure 12: Residuals of log(CPUE from Coruña trawl) (C85); one panel per age.]

[Figure 13: Residuals of log(CPUE from Coruña trawl) (C94); one panel per age.]

[Figure 14: Residuals of log(CPUE from Portuguese trawl) (P95); one panel per age.]

[Figure 15: Prior (red) and posterior (black) log(catchability-at-age) of Spanish survey (SP).]

[Figure 16: Prior (red) and posterior (black) log(catchability-at-age) of Portuguese survey (POCT).]

[Figure 17: Prior (red) and posterior (black) log(catchability-at-age) of Coruña trawl (C85).]

[Figure 18: Prior (red) and posterior (black) log(catchability-at-age) of Coruña trawl (C94).]

[Figure 19: Prior (red) and posterior (black) log(catchability-at-age) of Portuguese trawl (P95).]

[Figure 20: Survivors assuming discards=0 (black) and incorporating discards (red); panels: posterior distributions of N(2008,a), in thousands, for ages a = 1, . . . , 8+.]

[Figure 21: Selectivity-at-age during 2001−2007, assuming discards=0 (black) and incorporating discards (red); panels: posterior distributions of F(2007,a)/Fbar(2007) for ages a = 0, . . . , 8+.]

[Figure 22: Results assuming discards=0 (black) and incorporating discards (red); panels: posterior distribution of average Fbar over 2005−2007, and Stock-Recruitment plot based on posterior medians (x-axis: SSB (tons)).]

On the Analysis of Interventions and Structural Breaks in Time Series Preferably Using Iterative Methods

Working Document for WGMG (Woods Hole, USA, 6.-16.10.2008)

By Joachim Gröger

Institute for Sea Fisheries, Hamburg, Germany

1. Introduction to the detection of regime shifts, structural breaks and interventions

A key objective in European fishery management is to maximize landings (or economic value) on a sustainable and precautionary basis. Precaution and sustainability, however, can be interpreted in various ways subject to different constraints. One important constraint is the limited understanding of the processes that influence the size of exploited fish populations and how the stock fluctuations interact with exogenous factors. In their essay on regime shifts, Rothschild and Shannon (2004) pointed out that: “Multi-decadal fluctuations in fish-population abundance … are often dramatic in magnitude. … . Understanding the variability in fish populations related to regime shifts is complicated because the abundance of fish populations is driven by both environmental forcing and fishing. … . New insights into the cause … will be valuable because managers will be able to adjust fishing effort to match the productivity of the ocean environment.” Thus regime shifts in ecosystems have profound implications for sustainability. Consequently, there is a great need for specific indicators and instruments that are capable of detecting regime shifts in real systems.

While such problems have long been considered in statistical quality control, similar questions also arise in the fisheries context. Moreover, a specific research need has emerged more recently: given that new fish stock related data arrive more or less continuously, it seems appealing to set up a monitoring approach and to check whether incoming data are consistent with a previous level or previously established relationship.

Before starting to identify regime shifts, a few questions need to be addressed in order to choose the right procedure:

1. What is the specific purpose of the exercise?
   a. Purely detecting regime shift(s)?
      i. in time series?
      ii. in other types of data?
   b. Fitting a (good) model to the disturbed time series and, by forecasting, diagnosing potential future effects of the shift?
   c. Removing the shift(s) detected for a correction of the time series (shift correction)?
2. Of what type is the shift: shock-like (impulse, pulse), step-like or continuous (see the examples in Fig. 1 for an explanation)?
   a. Is it a shift in the mean (level)?
   b. Is it a shift in the time trend?
   c. Is it a change in the variance (heteroscedasticity) (see Fig. 3b as an example)?
3. How strong is the shift / effect / response relative to the “normal” fluctuation?
4. Is the underlying time series stationary (see Fig. 3b as an example)?
5. What are the assumptions / limitations of the identification methods used?

Here I briefly summarize the features and potential of four different classes (types) of detection instruments for fisheries problems:

Method 1: Econometric methods to detect structural breaks (multiple regression models).
Method 2: Time series methods to detect interventions (ARIMAX models).
Method 3: The analysis of means (ANOM).
Method 4: Illustrative graphical methods such as traffic light plots that help to identify and locate potential changes through quantile-based colour coding of time series.

It should be mentioned that methods 1 to 3 are “static” methods assuming that the location of the break or intervention is known beforehand. These methods therefore need to be made “iterative” so that they scan the time series to find (estimate) the prospective location of a potential structural break or intervention. This can be done by using a search algorithm that looks for the best value of a quality-of-fit criterion; as an example see Fig. 3. I implemented those algorithms for all procedures as described below in SAS, Version 9.1.3. As the criterion, the mean squared error (MSE) can be used if the degrees of freedom are stable; if not, Akaike’s information criterion (AIC) can be used instead. It is further necessary to test at the same time whether the structural break or intervention was significant by, for instance, studying the p value(s) associated with the potential break and plotting these along with the quality-of-fit criteria.

The following paragraphs summarize very briefly important features of methods 1, 2 and 4 as listed above. Method 3 is skipped here as it belongs to the class of QC methods that will be considered more thoroughly by Benoit Mesnil in his presentation and working document. A PowerPoint presentation associated with this WD has been created as well. Further results will be derived during the WGMG meeting by applying the methods as described below to North Sea cod (IBTS, 1st quarter, age 1 data, 1971 - 2008) and haddock data (recruitment data, 1963 - 2007).

2. Analysis of structural breaks

The analysis of structural breaks was originally introduced into econometric theory to test whether the structure of linear regression models fitted to economic data is actually linear.
This concept is an extension of the simple multiple regression case with metric explanatory variables, as it allows linear segments with changing slopes and/or intercepts to be combined piecewise as part of one model using categorical variables. The idea is to use, preferably, binary indicator (dummy) variables as explanatory variables that help to detect and mimic structural breaks in the linear relationship between the response and the exogenous variable(s). However, the segments between the structural breaks are supposed to be continuous and linear. Two principal cases need to be distinguished: methods that allow only changes in the slope of the segments, and methods that allow changes in both the intercept and the slope. The first type is called piecewise regression models, which are in fact special cases of so-called spline functions. The second type is called switching regression methods. Both types can be further sub-divided into smaller sub-classes depending on the number of break points to be identified. We have studied the following three versions. For further details regarding the theory of structural breaks as part of linear regression modelling see Lütkepohl (1993), Pindyck and Rubinfeld (1991) and Fahrmeir et al. (1996).

2.1 Piecewise linear regressions (PLR)

In general, piecewise linear regressions only allow the slope(s) of the “regression pieces” involved to vary (not the intercept). The type of piecewise regression may be distinguished by the number of breaks assumed/modelled. In the following, two examples are outlined: a piecewise linear regression with 1 and one with 2 structural breaks. Several example types of structural breaks are given in Fig. 1 (see specifically (a), (e) to (i)).

2.1.1 PLR with 1 structural break

For a persistent effect that changes its trend after the break point the piecewise regression may be specified as:

$$C_t = \beta_1 + \beta_2 t + \beta_3 (t - t_0) D_0 + \varepsilon$$

with the β’s being the linear model parameters, ε the error term, t the time, t0 the point in time at which the break is assumed to have occurred and D0 a binary term that models the break. In such a case D0 may be formulated as

$$D_0 = \begin{cases} 1 & \text{if } t > t_0 \\ 0 & \text{otherwise} \end{cases}$$

If the effect is rather pulse or shock-like it may be modelled as an immediate peaking effect at solely one specific point in time, i.e.

$$C_t = \beta_1 + \beta_2 t + \beta_3 D_0 + \varepsilon$$

with

$$D_0 = \begin{cases} 1 & \text{if } t = t_0 \\ 0 & \text{otherwise} \end{cases}$$
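The one-break model and the iterative search for the break location can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the SAS implementation used for this WD. The function names (`plr_design`, `fit_mse`, `scan_break`), the `margin` parameter and the synthetic data are assumptions made for the example only. Because every candidate model has the same three parameters, the degrees of freedom are stable and MSE serves as the quality-of-fit criterion.

```python
import numpy as np

def plr_design(t, t0):
    """Design matrix for C_t = b1 + b2*t + b3*(t - t0)*D0, with D0 = 1 if t > t0."""
    t = np.asarray(t, dtype=float)
    d0 = (t > t0).astype(float)
    return np.column_stack([np.ones_like(t), t, (t - t0) * d0])

def fit_mse(t, c, t0):
    """Ordinary least squares fit for a given break point; returns (beta, mse)."""
    c = np.asarray(c, dtype=float)
    X = plr_design(t, t0)
    beta, *_ = np.linalg.lstsq(X, c, rcond=None)
    resid = c - X @ beta
    mse = float(resid @ resid) / (c.size - X.shape[1])
    return beta, mse

def scan_break(t, c, margin=3):
    """Try each interior time point as t0 and keep the one with the smallest MSE.

    `margin` (hypothetical) keeps a few observations on either side of the break.
    """
    candidates = np.asarray(t, dtype=float)[margin:-margin]
    results = [(fit_mse(t, c, t0)[1], t0) for t0 in candidates]
    best_mse, best_t0 = min(results)
    return best_t0, best_mse

# Synthetic series (illustrative only): the trend changes by -1.2 after t0 = 25
rng = np.random.default_rng(1)
t = np.arange(1, 41, dtype=float)
true_t0 = 25.0
c = 2.0 + 0.5 * t - 1.2 * (t - true_t0) * (t > true_t0) + rng.normal(0.0, 0.3, t.size)

best_t0, best_mse = scan_break(t, c)
beta, _ = fit_mse(t, c, best_t0)
print("estimated break:", best_t0, "coefficients:", beta)
```

Plotting the MSE of every candidate against time, rather than keeping only the minimum, gives a curve of the kind described for Fig. 3 (c).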

2.1.2 PLR with 2 structural breaks

Sometimes more than one structural break may occur. In such a case the above linear piecewise regression concept may be extended by further binary indicator variables Di, for instance, in the following way with two binary dummy variables D0 and D1:

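$$C_t = \beta_1 + \beta_2 t + \beta_3 (t - t_0) D_0 + \beta_4 (t - t_1) D_1 + \varepsilon$$

The two structural breaks at time points t0 and t1 may be, for instance, specified as continuous effects such as

$$D_0 = \begin{cases} 1 & \text{if } t > t_0 \\ 0 & \text{otherwise} \end{cases}$$

$$D_1 = \begin{cases} 1 & \text{if } t > t_1 \\ 0 & \text{otherwise} \end{cases}$$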
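To see how the coefficients translate into segment slopes, the two-break model can be written out segment by segment. The following is a worked restatement of the equation above under the additional assumption that t0 < t1; it introduces no new parameters.

$$E[C_t] = \begin{cases} \beta_1 + \beta_2 t & \text{if } t \le t_0 \\ (\beta_1 - \beta_3 t_0) + (\beta_2 + \beta_3)\, t & \text{if } t_0 < t \le t_1 \\ (\beta_1 - \beta_3 t_0 - \beta_4 t_1) + (\beta_2 + \beta_3 + \beta_4)\, t & \text{if } t > t_1 \end{cases}$$

Thus β3 and β4 are the changes in slope at t0 and t1, respectively, and adjacent segments take the same value at each break point, so the fitted line stays continuous.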

2.2 Switching linear regression method

The piecewise regression concept does not allow intercepts and slopes to vary at the same time. If, for whatever reason, the intercept may also change, then the switching regression concept can be used to formulate such an effect. Here both the intercept and the slope are allowed to change. As an example see the following equation:

$$C_t = \beta_1 + \beta_2 t + \beta_3 D_0 + \beta_4 t D_0 + \varepsilon$$

with

$$D_0 = \begin{cases} 1 & \text{if } t > t_0 \\ 0 & \text{otherwise} \end{cases}$$
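Because the switching model has one more parameter than the one-break PLR, the two are better compared by AIC than by MSE. The sketch below fits both to the same synthetic series with a known break. It is written in Python with NumPy for illustration only. The helper names (`gaussian_aic`, `ols`) and the data are assumptions. The AIC is the usual Gaussian least-squares form, n·ln(RSS/n) + 2k, which is correct up to an additive constant.

```python
import numpy as np

def gaussian_aic(resid, k):
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant."""
    n = resid.size
    return n * np.log(resid @ resid / n) + 2 * k

def ols(X, c):
    """Least-squares coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, c, rcond=None)
    return beta, c - X @ beta

t = np.arange(1, 41, dtype=float)
t0 = 20.0                                  # break point assumed known here
d0 = (t > t0).astype(float)

# Synthetic series (illustrative only): after t0 the intercept term shifts by 10
# and the slope by -0.4, i.e. a jump of 10 - 0.4 * 20 = 2 at the break.
rng = np.random.default_rng(7)
c = 1.0 + 0.3 * t + 10.0 * d0 - 0.4 * t * d0 + rng.normal(0.0, 0.3, t.size)

# Piecewise model: slope change only, line stays continuous at t0
X_plr = np.column_stack([np.ones_like(t), t, (t - t0) * d0])
# Switching model: intercept and slope may both change
X_sw = np.column_stack([np.ones_like(t), t, d0, t * d0])

beta_plr, resid_plr = ols(X_plr, c)
beta_sw, resid_sw = ols(X_sw, c)

# k counts the regression coefficients plus the error variance
aic_plr = gaussian_aic(resid_plr, X_plr.shape[1] + 1)
aic_sw = gaussian_aic(resid_sw, X_sw.shape[1] + 1)
print("AIC piecewise:", aic_plr, "AIC switching:", aic_sw)
print("switching coefficients:", beta_sw)
```

With a genuine level jump in the data, the switching model should give the lower AIC. If the data are continuous at the break, the extra parameter is penalized instead.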

3. Analysis of interventions

Interventions are used for modelling events that occur at specific times. Intervention models may be seen as a special kind of regression model in which the explanatory variables are binary indicator variables (dummy variables) taking on the values 0 or 1. Furthermore, regression models may be interpreted more generally as special cases of autoregressive integrated moving average models (ARIMA or Box-Jenkins models). This is the reason why the name “intervention model” is mostly used in pure time series analysis, and there specifically in the context of ARIMA modelling. In that context, intervention models or interrupted time series models may be seen as special transfer functions in which the exogenous variable (the input series) is not a metric variable but an indicator variable containing discrete values that flag the occurrence of an event affecting the response series. The event is thus an intervention in, or an interruption of, the normal evolution of the response time series, which, in the absence of the intervention, is usually assumed to be a pure ARIMA process.

Intervention models can be used both to model and forecast the response series and to analyze the impact of the intervention. When the focus is on estimating the effect of the intervention, the process is often called intervention analysis or interrupted time series analysis. Several example types of interventions are displayed in Fig. 1 (see specifically (a) to (d)).

Two principal types of interventions may be distinguished:
• Impulse interventions (also known as pulse or point interventions) and
• Continuing interventions.

Continuing interventions can be further classified into:
• Step interventions and
• Ramp interventions.

In general ARIMA models are of the form ARIMA (p, q, d), in which p is the lag order of the AR component, q the lag order of the MA component and d the order of differencing (many texts write the same three orders as (p, d, q)).
Intervention models are special cases of a more general type of ARIMA model, called ARIMAX or transfer function models, which can contain one or more exogenous variables. In the case of intervention models these variables are indicator or dummy variables instead of metric ones. These dummy variables will be denoted as D0 in the following. A simplified notation for intervention models would thus be ARIMAX (p, q, d, D0).

It should be noted that both linear regression models and ARIMAX models are designed for prognostic purposes. This is why they may have an advantage over standard QC or SPC methods: both are able to predict a future effect of the intervention. In particular, interventions that occur towards the end of the time series may be evaluated by diagnosing their impact on the future. This is not possible with standard QC methods.

Further details regarding the theory of interventions within the context of ARIMA modelling can be found in Schlittgen (2001), Schlittgen and Streitberg (2001) and Lütkepohl (1996). The SAS/ETS Software (1991) gives some illustrative examples of how to do this type of modelling using SAS. Gröger and Rumohr (2006), Gröger et al. (2007) and Gröger and Rohlf (2007) present solutions to climate-linked benthic and fisheries problems as part of more general transfer function models.

3.1 Impulse interventions

This type of intervention addresses a one-time event (an impulse or pulse). To mark an intervention at that specific point in time, denoted as t0 in the following, the input variable D0 (indicator variable, dummy variable) takes on the value 1 only at t0 and 0 elsewhere. Intervention variables of this kind are sometimes called impulse, pulse or point functions and consequently are of the following form:

$$D_0 = \begin{cases} 1 & \text{if } t = t_0 \\ 0 & \text{otherwise} \end{cases}$$

3.2 Continuing interventions

Other interventions can be continuing, in which case the input variable flags periods before and after the intervention. Two main types may be distinguished:

1. Step interventions: Step interventions are continuing, and the input time series D0 flags periods after the intervention. For a step intervention, the values of the intervention variable are zero before time t0 and step to a constant level thereafter, hence

$$D_0 = \begin{cases} 1 & \text{if } t > t_0 \\ 0 & \text{otherwise} \end{cases}$$

2. Ramp interventions: A ramp intervention is a continuing intervention that increases linearly after the intervention point in time t0. For a ramp intervention, the values of the intervention variable D0 are zero before time t0 and increase linearly thereafter, that is, proportional to time. Hence,

$$D_0 = \begin{cases} t - t_0 & \text{if } t > t_0 \\ 0 & \text{otherwise} \end{cases}$$
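The three codings of D0 defined in Sections 3.1 and 3.2 are easy to generate for any time index. The following Python/NumPy sketch shows one way to do so. The function names `pulse`, `step` and `ramp` are chosen for this example and are not taken from the SAS implementation.

```python
import numpy as np

def pulse(t, t0):
    """Impulse intervention: 1 only at t = t0, 0 elsewhere."""
    return (np.asarray(t) == t0).astype(float)

def step(t, t0):
    """Step intervention: 0 up to and including t0, 1 thereafter."""
    return (np.asarray(t) > t0).astype(float)

def ramp(t, t0):
    """Ramp intervention: 0 up to and including t0, then t - t0."""
    t = np.asarray(t, dtype=float)
    return np.where(t > t0, t - t0, 0.0)

t = np.arange(1, 9)   # time index 1..8
t0 = 4                # intervention point

print(pulse(t, t0))   # [0. 0. 0. 1. 0. 0. 0. 0.]
print(step(t, t0))    # [0. 0. 0. 0. 1. 1. 1. 1.]
print(ramp(t, t0))    # [0. 0. 0. 0. 1. 2. 3. 4.]
```

Any of these vectors can then be used as the input series D0 of an intervention model, or as the dummy term of the regression models in Section 2.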

3.3 Modelling interventions using the ARIMA approach

The approach may be described using the simplified notation ARIMAX (p, q, d, D0), in which p denotes the order of the AR component, q that of the MA component, d the order of differencing and D0 the exogenous binary indicator variable coded as given above. In contrast to the standard ARIMA modelling procedure, the ARIMAX modelling steps for a binary input series may be summarized as follows:

1. stabilizing a non-stationary response time series (differencing of order d)
2. estimating a preliminary intervention model with an appropriate transfer function form, identified as described above
3. identifying the error process for the model from plots of the autocorrelation (ACF) and the partial autocorrelation functions (PACF) for the residuals from the preliminary model
4. estimating the parameters of the final intervention model, including the error process, by including the identified correct orders of p and q from the previous step
5. applying final model diagnostics
6. forecasting (effect scenarios).
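Steps 1 to 3 of this procedure can be illustrated without dedicated time series software. The sketch below is a deliberately reduced example in Python with NumPy. It is not a full ARIMAX fit and not the SAS procedure used for this WD. It assumes d = 1, a step intervention, and that the input series is differenced with the same order as the response, so that the coefficient keeps its meaning as a level shift. The `acf` helper and the synthetic random-walk data are assumptions made for the illustration. Steps 4 to 6 would require a proper ARMA estimation routine and are omitted.

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation function for lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = x @ x
    return np.array([1.0 if k == 0 else (x[k:] @ x[:-k]) / denom
                     for k in range(nlags + 1)])

# Synthetic non-stationary response: random walk plus a level shift of 8 after t0
rng = np.random.default_rng(3)
n, t0 = 120, 80
t = np.arange(1, n + 1)
d0 = (t > t0).astype(float)                    # step intervention
y = np.cumsum(rng.normal(0.0, 1.0, n)) + 8.0 * d0

# Step 1: stabilize the response by differencing (d = 1);
# the differenced step input is a pulse at t0 + 1
dy = np.diff(y)
dd0 = np.diff(d0)

# Step 2: preliminary intervention model on the differenced scale
X = np.column_stack([np.ones(n - 1), dd0])
beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
resid = dy - X @ beta

# Step 3: residual ACF; lags outside the approximate 95% bounds
# hint at AR or MA terms still to be included
r = acf(resid, nlags=10)
bound = 1.96 / np.sqrt(resid.size)
print("estimated level shift:", beta[1])
print("lags outside bounds:", [k for k in range(1, 11) if abs(r[k]) > bound])
```

In this example the estimated shift rests on a single differenced observation, so it is noisy. This is one reason why the error process identified in step 3 should be included in the final model of step 4.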

4. A simple graphical method to detect shifts: the traffic light plot

Traffic light plots are very illustrative for recognizing overall patterns in multiple time series. They display colour coded quantiles or percentiles (for instance, quintiles) of normalized variables in order to illustrate pronounced changes in their values. They provide a simple visual inspection method to observe changes in patterns such as regime shifts or oscillations, and they allow correspondences between variables to be identified simultaneously when their traffic lights are plotted alongside each other. For an example of the usefulness of traffic light plots with regard to fisheries related problems see Gröger and Rohlf (2007). An example of traffic light plots is given in Fig. 2.

5. Some Pros and Cons

This section lists some of the pros and cons without claiming completeness:

• Methods 1 to 3 are more or less objective methods. Used within the time domain their potential is nearly the same on average.

• Methods 1 and 2 seem to be relatively flexible in that they allow the detection of the shift to be set up in different ways, using either one dummy variable that may be coded in different ways (as outlined in Fig. 1 and the above equations related to D0) or several of these at the same time to mimic various structural breaks or interventions simultaneously.

• Methods 1 and 2 are both designed to predict the future and as such can forecast the potential future effect of the shift (structural break or intervention), possibly in terms of best and/or worst case scenarios. One way to do this is to model a shift (structural break, intervention) close to the end of the time series as being persistent, based on the various types given in Fig. 1. This is the reason why these two methods may be superior to simple QC methods, of which Method 3 is an example. While Method 1 is a trend depicting method, Method 2 tries not only to model the trend but to reconstruct the entire time series. However, Method 1 may be considered a special case of Method 2, which is more general.

• Method 4 is a purely graphical tool and as such can give a good overall picture simultaneously over multiple time series. These plots require some subjective interpretation by the reader but can nevertheless help to locate a potential shift in a preliminary way.

• Method 2 is more data demanding than Method 1. At the same time it may be more informative than Method 1 in that it does not only model and predict a trend but reconstructs the whole time series.
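The colour coding behind the traffic light plots of Section 4 reduces to assigning each normalized value to a quantile class. A minimal Python/NumPy sketch of that step follows. The function name `quintile_classes` and the synthetic input series are assumptions for illustration, and the plotting itself is left out.

```python
import numpy as np

def quintile_classes(x):
    """Normalize a series and assign each value a quintile class, 0 (lowest) to 4 (highest)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)             # normalized variable
    edges = np.quantile(z, [0.2, 0.4, 0.6, 0.8])   # quintile boundaries
    return np.digitize(z, edges)

# Two synthetic series over the same 30 time steps (illustrative only)
rng = np.random.default_rng(11)
years = np.arange(1979, 2009)
series_a = np.where(years > 1995, 2.0, 0.0) + rng.normal(0.0, 0.5, years.size)
series_b = np.sin(np.arange(years.size) / 4.0) + rng.normal(0.0, 0.2, years.size)

# One row per variable, one column per time step; each cell holds a colour class
panel = np.vstack([quintile_classes(series_a), quintile_classes(series_b)])
print(panel)
```

Mapping the five classes to five colours and drawing the rows of `panel` one above the other gives the side-by-side layout described in Section 4. A shift then appears as a persistent change of colour along a row.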

6. References

Fahrmeir, L., Hamerle, A., and Tutz, G. 1996. Multivariate statistische Verfahren. Walter de Gruyter, Berlin, Germany.
Gröger, J. P., and Rohlf, N. 2007. Shedding light on recruitment mysteries: internal and external signals in the stock-recruitment relationship of North Sea herring. ICES CM 2007/H:01. 26 pp.
Gröger, J. P., and Rumohr, H. 2006. Modelling and forecasting long-term dynamics of Western Baltic macrobenthic fauna in relation to climate signals and environmental change. Journal of Sea Research 55:266–277.
Gröger, J. P., Winkler, H., and Rountree, R. A. 2007. Population dynamics of pikeperch (Sander lucioperca) and its linkage to fishery driven and climatic influences in a southern Baltic lagoon of the Darss-Zingst Bodden Chain. Fisheries Research 84:189–201.
Lütkepohl, H. 1993. Introduction to multiple time-series analysis. Springer-Verlag, Berlin, Germany.
Pindyck, R. S., and Rubinfeld, D. L. 1991. Econometric Models and Economic Forecasts. McGraw Hill, New York. 596 pp.
SAS/ETS Software. 1991. Applications Guide 1, Version 6, 1st Edition: Time Series Modelling and Forecasting, Financial Reporting and Loan Analysis. SAS Inc., Cary, NC. 380 pp.
Schlittgen, R. 2001. Angewandte Zeitreihenanalyse. Oldenbourg Verlag, München, Germany.
Schlittgen, R., and Streitberg, B. H. J. 2001. Zeitreihenanalyse. Oldenbourg Verlag, München, Germany.

7. Figures

Fig. 1. Types of effects and shifts, respectively.

[Figure not reproduced. Title: Potential types of effects / shifts. Panels a. to d. and e. to i.; group labels: Level changing effects, Trend changing effects.]

Fig. 2. Examples of traffic light plots displaying colour coded quintiles of the two climate indices AMO and PDO (red = max, dark blue = min).

[Figure not reproduced. Title: Climate Example (AMO, PDO). Horizontal axis of each panel: Jan to Dec. AMO = Atlantic Multi-decadal Oscillation (sea surface temperature (SST) index); PDO = Pacific Decadal Oscillation Index (SST anomalies for the Pacific Ocean to the north of 20N latitude).]

Fig. 3. Example of how to use an iterative method to screen a time series and detect potential changes (structural breaks, interventions). (a) displays the algorithm that moves a window or dummy coded shift variable (indicator variable) over the time series to be investigated. (b) shows the births of New York in 1965 as an example of a non-stationary time series. (c) shows the end result of the screening process displayed in (a) as a plot of MSE (the quality-of-fit criterion used in this case) over time; the minimum MSE indicates the most likely position of the structural break or intervention in time.

[Figure not reproduced. Panel (a) labels: Dummy variable(s), Window, Time series, "MSE, AIC, ..."; panels (b) and (c); axis label: Timestep (ticks 140 to 180).]
