BlueBRIDGE receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 675680 www.bluebridge-vres.eu
CMSY Workshop
Gianpaolo Coro
Verhulst (1844) Model of Population Growth
The Schaefer Model (1954)
Fmsy = ½ rmax
Bmsy = ½ k
http://onlinelibrary.wiley.com/doi/10.1111/faf.12190/full
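As a minimal sketch of the Schaefer reference points listed above, the Python function below derives Fmsy, Bmsy and MSY from r and k. The function name and the example values are illustrative assumptions and are not taken from the CMSY code base; MSY = r k / 4 follows from multiplying Fmsy by Bmsy.

```python
def schaefer_reference_points(r, k):
    """Return Schaefer reference points for growth rate r and carrying capacity k."""
    if r <= 0 or k <= 0:
        raise ValueError("r and k must be positive")
    fmsy = 0.5 * r       # Fmsy = 1/2 r
    bmsy = 0.5 * k       # Bmsy = 1/2 k
    msy = fmsy * bmsy    # MSY = r * k / 4
    return {"Fmsy": fmsy, "Bmsy": bmsy, "MSY": msy}


# Hypothetical stock with r = 0.5 and k = 1000 (arbitrary units).
points = schaefer_reference_points(r=0.5, k=1000.0)
```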
CMSY
An Open-source software for data-limited stock assessment
https://github.com/SISTA16/cmsy
From Catch-MSY to CMSY
• Catch-MSY gave robust estimates of MSY, but biased estimates of r (too low) and k (too high).
• Catch-MSY could not reliably predict biomass.
• CMSY overcomes the bias and gives reasonable estimates of Fmsy and Bmsy.
• CMSY gives reasonable estimates of biomass.
Input
https://github.com/SISTA16/cmsy
https://github.com/SISTA16/cmsy/blob/master/CMSY_UserGuide_24Oct16.docx
Resilience    Prior r range
High          0.6 – 1.5
Medium        0.2 – 0.8
Low           0.05 – 0.5
Very low      0.015 – 0.1
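The resilience classes above map directly to prior ranges for r. A small lookup such as the following could make that mapping explicit; the names `PRIOR_R_RANGE` and `prior_r_range` are hypothetical and only illustrate the table, they are not part of the CMSY scripts.

```python
# Prior r ranges per resilience class, copied from the table above.
PRIOR_R_RANGE = {
    "High": (0.6, 1.5),
    "Medium": (0.2, 0.8),
    "Low": (0.05, 0.5),
    "Very low": (0.015, 0.1),
}


def prior_r_range(resilience):
    """Return the (low, high) prior range of r, ignoring case and outer spaces."""
    key = resilience.strip().capitalize()
    if key not in PRIOR_R_RANGE:
        raise ValueError(f"unknown resilience class: {resilience!r}")
    return PRIOR_R_RANGE[key]
```

For the herring example in the ID file (Resilience: Medium), `prior_r_range("Medium")` gives the range 0.2 to 0.8.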
ID File:
stock: her-47d3
Name: Herring in Sub-area IV, Divisions VIId & IIIa (autumn-spawners)
EnglishName: Atlantic herring
ScientificName: Clupea harengus
Source: www.ices.dk
Resilience: Medium
StartYear: 1947
EndYear: 2013
Biomass status beginning: Good/Bad
Biomass status end: Good/Bad
Type: Biomass/CPUE/none
Possible Crash: No
Biomass status beginning/end: estimated status of the biomass at the beginning and the end of the time series

Time Series File:
Stock ID    Year    Catch     Biomass/CPUE
her-47d3    1947    581760    7053257
her-47d3    1948    502100    6362933
her-47d3    1949    508500    6070794
her-47d3    1950    491700    6119555
her-47d3    1951    600400    6199629
her-47d3    1952    664400    6058665
her-47d3    1953    698500    5950584
her-47d3    1954    762900    5809471
…           …       …         …
Output
Illex coindetii, broadtail shortfin squid
Analysis charts Management charts
CMSY - Approach
• Given a catch trend, estimate the best pair of values for the intrinsic rate of increase (r) and the carrying capacity (k) that generated the trend
• Goal: estimate r and k.
Constraint: the Schaefer function
CMSY has a double approach: Monte Carlo Analysis and Bayesian Schaefer Model
Step 1: sample all possible r and k pairs compliant with the Schaefer function and the priors
Step 2: resample in the lower tip. We search for the mean of maximum viable r-values
Step 3: divide the tip in 25 ranges
Step 4: take the median of the non-empty ranges
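Step 1 can be sketched as a simple filter: sample candidate (r, k) pairs, project the biomass with the Schaefer function for the observed catches, and keep only the pairs whose trajectory stays viable. The sketch below covers only that first step and rests on several assumptions that are not in the slides: uniform sampling inside the prior ranges, a fixed starting relative biomass, a constant hypothetical catch series, and the function names themselves. Steps 2 to 4 (resampling the tip and taking the median of the ranges) are not reproduced.

```python
import random


def schaefer_biomass(r, k, b0, catches):
    """Project biomass with b[t+1] = b[t] - c[t] + r * b[t] * (1 - b[t] / k)."""
    biomass = [b0]
    for c in catches:
        b = biomass[-1]
        biomass.append(b - c + r * b * (1.0 - b / k))
    return biomass


def viable_pairs(catches, r_range, k_range, start_rel, end_rel_range,
                 n=20000, seed=1):
    """Keep sampled (r, k) pairs whose trajectory never collapses and ends
    inside the prior range of relative biomass (b / k)."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        r = rng.uniform(*r_range)   # assumption: uniform prior on r
        k = rng.uniform(*k_range)   # assumption: uniform prior on k
        traj = schaefer_biomass(r, k, start_rel * k, catches)
        if min(traj) <= 0:
            continue                # stock crashed: pair is not viable
        if end_rel_range[0] <= traj[-1] / k <= end_rel_range[1]:
            kept.append((r, k))
    return kept


catches = [80.0] * 30  # hypothetical constant catch series, 30 years
pairs = viable_pairs(catches, r_range=(0.2, 0.8), k_range=(500.0, 5000.0),
                     start_rel=0.8, end_rel_range=(0.3, 0.7))
```

Plotting the surviving pairs in log r versus log k space would typically show the triangular cloud whose tip the later steps search.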
Result by CMSY analysis
[Figure legend: true value, Monte Carlo approach]
$MSY = \frac{r_{best}\, k_{best}}{4}$
Monte Carlo Analysis
Bayesian Schaefer Analysis
• In the case the Biomass or CPUE trends are available, CMSY increases the precision of the estimation:
• Goal: estimate r and k.
Constraint: the Schaefer function
Issues
Simple curve fitting does not work
Estimate after curve fitting
1. Clustering Analysis (DBScan)
2. Trapezoidal density over the best fit r-k line
3. Gaussian Mixtures
4. Viable pairs densities
Gm of the largest cluster
Simulation of r density
Search in the tip of the r-k triangle
Other unpromising approaches
Difficulty of the problem
At each step of the sampling process:
• The biomass values are strongly correlated with each other
• An iterative fitting model should
• approximate the complete biomass curve using better and better r and k values
• produce a new biomass curve correlated to the previous biomass curve
• account for time dependency between the samples of one curve
Brain signals
Robotics
Biology
Statistics
Speech processing
Mathematics
Promising approach: Markov Chain Monte Carlo methods
MCMC and the Schaefer function
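$\theta = \{\alpha, k, r, b_0, b_1, b_2, \ldots, b_T\}$
[Figure: graph of the model variables $\alpha$, $k$, $r$ and $b_0, b_1, \ldots, b_T$, shown against the Schaefer function]
$b_{t+1} = b_t - c_t + r\, b_t \left(1 - \frac{b_t}{k}\right)$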
• The Schaefer formula is used as likelihood(s)
• Priors are required for k and r
At each step, the MCMC produces samples for the parameters in θ, where T is the last time step of the biomass trend
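$\theta^{0} = \{\alpha^{0}, k^{0}, r^{0}, b_0^{0}, b_1^{0}, b_2^{0}, \ldots, b_T^{0}\}$
$\theta^{1}, \theta^{2}, \theta^{3}, \theta^{4}, \ldots$
After M steps…
$\theta^{M} = \{\alpha^{M}, k^{M}, r^{M}, b_0^{M}, b_1^{M}, b_2^{M}, \ldots, b_T^{M}\}$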
Hierarchical model for the variables
Details in Coro G. Gibbs Sampling with JAGS: Behind the Scenes. Technical report, 2017, CNR PUMA, cnr.isti/2017-B5-001
http://puma.isti.cnr.it/dfdownload.php?ident=/cnr.isti/2017-B5-001&langver=it&scelta=Metadata
https://www.researchgate.net/publication/313905185_Gibbs_Sampling_with_JAGS_Behind_the_Scenes
• Simulating a biomass trend by means of an MCMC requires the model to produce, at each step of the sampling process, a new biomass time series by means of new values assigned to model variables
• At each step the MCMC tries to simulate the whole biomass time series using new values for r and k
• The newly picked values are constrained by the Schaefer function and by the prior probability distributions that we assume for the r and k variables
• MCMC accounts for these constraints during the fitting phase. After several sampling and adjustment steps, the model finds the variable values that produce the best approximation of the target biomass trend
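$\theta^{0} = \{\alpha^{0}, k^{0}, r^{0}, b_0^{0}, b_1^{0}, b_2^{0}, \ldots, b_T^{0}\}$
$\theta^{1} = \{\alpha^{1}, k^{1}, r^{1}, b_0^{1}, b_1^{1}, b_2^{1}, \ldots, b_T^{1}\}$
….
$\theta^{M} = \{\alpha^{M}, k^{M}, r^{M}, b_0^{M}, b_1^{M}, b_2^{M}, \ldots, b_T^{M}\}$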
MCMC and the Schaefer function
MCMC using Gibbs Sampling
• The user takes model variables and designs a graph of the constraints between the variables
• The system writes a posterior probability density in terms of priors, likelihoods and conditionals
• The model samples variable values from each factor, using approximate or analytical forms of these factors
• At each variable sampling step, the model fixes the values of the other variables
• After several steps the values are likely to converge to the best estimate
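The idea of sampling one variable at a time while the others are held fixed can be illustrated on a toy target. The sketch below is a Gibbs sampler for a standard bivariate normal with correlation rho, where both conditionals are known analytically. It is not the CMSY or JAGS model; the target distribution, function name and parameter values are assumptions chosen only to show the mechanics.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples=20000, burn_in=1000, seed=7):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each update draws one variable from its conditional distribution
    while the other variable is held fixed at its current value.
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)   # conditional standard deviation
    x, y = 0.0, 0.0
    samples = []
    for step in range(n_samples + burn_in):
        x = rng.gauss(rho * y, sd)    # x | y ~ N(rho * y, 1 - rho^2)
        y = rng.gauss(rho * x, sd)    # y | x ~ N(rho * x, 1 - rho^2)
        if step >= burn_in:
            samples.append((x, y))
    return samples


samples = gibbs_bivariate_normal(rho=0.8)
mean_x = sum(x for x, _ in samples) / len(samples)
mean_xy = sum(x * y for x, y in samples) / len(samples)  # estimates rho
```

After the burn-in, the chain's sample moments approach those of the target, which is the sense in which the values "converge" in the list above.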
…
Best estimate set
(Markov Chain)
Step 1: consider the complete r,k space. Use the CMSY points as background reference only
Step 2: iteratively produce points that are compliant with the observed Schaefer function and the priors
Step 3: concentrate the search in the accumulation area
Step 4: take the geometric mean in the accumulation area
Bayesian Schaefer Model (BSM) estimate
proxies
1. Defining the form of the distributions of the priors was crucial!
This was done using 50 simulated stocks for which r and k were known
2. Defining the initial ranges of the parameters is important
This is done by the stock “expert” when indicating the prior knowledge in the ID file
3. A good balance was found between prior knowledge and knowledge from the data
This was done by testing the model for several years in Workshops and in focus groups
Key aspects of CMSY
CMSY on simulated data
• CMSY was tested against 50 simulated stocks where true r, k, MSY and biomass were known
• Monte Carlo analysis included the true r-k in 100% of the cases. BSM was used as coherence check
CMSY applications
ICES:
WKLife IV meeting (27-31 Oct. 2014): CMSY was applied to all the data-limited stocks proposed by ICES.
http://ices.dk/sites/pub/Publication%20Reports/Expert%20Group%20Report/acom/2014/WKLIFE4/wklifeIV_2014.pdf
WKLife V meeting (5-9 Oct. 2015): CMSY was applied to all the data-limited stocks proposed by ICES.
http://ices.dk/sites/pub/Publication%20Reports/Expert%20Group%20Report/acom/2015/WKLIFEV/wklifeV_2015.pdf
FAO:
Assessed CMSY among the best performing data-limited stock models
http://www.fao.org/docrep/019/i3491e/i3491e.pdf
Is building a Web interface to produce fisheries management reports using CMSY
http://data.d4science.org/UHZhM2pVWW1IOXRjZk9qTytQTndqaUpjamJScDg0VVVHbWJQNStIS0N6Yz0
Oceana:
Based on CMSY, an Oceana study (on 400 stocks) found that fish catches in European waters could increase by 57% if stocks were managed sustainably
http://oceana.org/press-center/press-releases/oceana-study-finds-fish-catches-european-waters-could-increase-57-if
Results on European stocks
R. Froese, C. Garilao, H. Winker, G. Coro, N. Demirel, A. Tsikliras, D. Dimarchopoulou, G. Scarcella, A. Sampang-Reyes (2016) http://eu.oceana.org/sites/default/files/stockstatusreport_newversion_0.pdf
Full Oceana report and status of EU stocks
European Stocks in 2013-2015
[Figure: axis labels "Management Decision" and "F & Reproduction & Growth"]
Analysis of 397 stocks in European Seas and adjacent waters. Froese et al. 2016.
Exploitation of 397 stocks in European Seas in 2013-2015. Note that different types of overexploitation overlap, so the numbers do not add up to 100%. Froese et al. 2016
Status of 397 stocks in European Seas 2013-2015. Froese et al. 2016
Froese et al. 2016
Compliance with the Common Fisheries Policy of the European Union (CFP 2013) by Ecoregion 2013-2015
1. Take the estimated biomass of the stocks in a certain region
2. Evolve the relative biomasses in time starting from values in the neighbourhoods of B/Bmsy, F and Fmsy considering different F scenarios
3. For each evolution, cluster the B/Bmsy values and then average the values
4. Average the averages of each evolved variable, and estimate the confidence intervals
5. Plot the averaged evolutions
Producing multi-species future fisheries scenarios
$\frac{B_{t+1}}{B_{msy}} = \frac{B_t}{B_{msy}} + 2 F_{msy} \frac{B_t}{B_{msy}} \left(1 - \frac{B_t}{2 B_{msy}}\right) - \frac{B_t}{B_{msy}} F_t$
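The projection equation for B/Bmsy can be iterated directly for a given F scenario. The following sketch does this for a single stock with constant F; the value of Fmsy, the starting point B/Bmsy = 0.5 and the function name are hypothetical, and the clustering and averaging across stocks described in the steps are left out.

```python
def project_relative_biomass(b_rel0, fmsy, f, years):
    """Iterate b = B/Bmsy with
    b[t+1] = b[t] + 2 * Fmsy * b[t] * (1 - b[t] / 2) - b[t] * F."""
    series = [b_rel0]
    for _ in range(years):
        b = series[-1]
        series.append(b + 2.0 * fmsy * b * (1.0 - b / 2.0) - b * f)
    return series


fmsy = 0.2  # hypothetical Fmsy for one stock
# Fishing scenarios expressed as multiples of Fmsy, starting from B/Bmsy = 0.5.
scenarios = {mult: project_relative_biomass(0.5, fmsy, mult * fmsy, years=200)
             for mult in (0.5, 0.8, 0.95)}
```

With constant F the recursion settles at B/Bmsy = 2 - F/Fmsy, so in this sketch the 0.5 Fmsy scenario ends at a higher relative biomass than the 0.95 Fmsy scenario.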
Percentage of Stocks at or above Bmsy
Best rebuilding under the 0.5 Fmsy scenario, worst under the 0.95 Fmsy scenario
Rainer Froese – Presentation at the EU Parliament 27/02/2017
Percentage of Depleted Stocks
Best rebuilding under the 0.5 Fmsy scenario, worst under the 0.95 Fmsy scenario
Profitability
Good profits for the 0.5 – 0.8 Fmsy scenarios. Low profit for the 0.95 Fmsy scenario.
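$\pi_t = \frac{F_t}{F_{msy}} \left( \frac{B_t}{B_{msy}} - \frac{\left(1 - \frac{\mu_{mean}}{100}\right) \left(\frac{C}{MSY}\right)_{mean}}{\left(\frac{F}{F_{msy}}\right)_{mean}} \right)$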
Analysis of current (2013 -2015) and potential catches for 397 stocks in European Seas. Because of trophic interactions, all stocks cannot support maximum yields simultaneously. Froese et al. 2016.
Comments on the multi-species application of CMSY (1/2)
Species interactions and environmental impact are implicitly considered in surplus production models by the rate of net productivity (r), which summarizes natural mortality such as caused by predation by other species, somatic growth such as modulated by available food sources, and recruitment such as impacted by environmental conditions and by parental egg production. CMSY accounts explicitly for reduced recruitment at small stock sizes*.
*R. Froese, N. Demirel, G. Coro, K. Kleisner, H. Winker, Estimating fisheries reference points from catch and resilience. Fish Fish., (in press) 10.1111/faf.12190
J.T. Schnute, L.J. Richards, “Surplus production models” in Handbook of Fish Biology and Fisheries, P.J.B. Hart, J.D. Reynolds, Eds. (Blackwell, 2002), vol. 2, pp. 105–126.
T.J. Quinn, R.B. Deriso, Quantitative fish dynamics (Oxford University Press, NY, 1999)
Compared with age-structured models where exploitation is typically reported for a narrow range of fully selected age classes, surplus production models estimate exploitation as total catch to biomass ratio.
This is similar to using the mean exploitation rate across all age classes weighted by their respective contribution to the catch. If the catch consists to a large part of juveniles that are only partly selected by the gear, then the overall rate of fishing mortality strongly underestimates the fishing mortality of the fully selected older year classes.
To address the underestimation of fishing mortality in fully selected age classes, CMSY reduces the estimate of Fmsy as a linear function of biomass below 0.5 Bmsy.
$F_{reduced} = 2 \frac{B_t}{B_{msy}} F \;\mid\; \frac{B_t}{B_{msy}} < 0.5$
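The linear reduction can be written as a small piecewise function. In the sketch below, `f` stands for the F on the right-hand side of the formula (read as the Fmsy estimate, following the preceding text); the function name and the choice to return `f` unchanged at or above 0.5 Bmsy are assumptions made for illustration.

```python
def reduced_f(f, b_rel):
    """Scale f linearly with b_rel = B/Bmsy when b_rel < 0.5.

    At b_rel = 0.5 the factor 2 * b_rel equals 1, so the function is
    continuous where the two branches meet.
    """
    if b_rel < 0:
        raise ValueError("relative biomass must be non-negative")
    if b_rel < 0.5:
        return 2.0 * b_rel * f
    return f
```

For example, a stock at a quarter of Bmsy has its value halved, while a stock at or above 0.5 Bmsy keeps the original value.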
Comments on the multi-species application of CMSY (2/2)
A collaborative approach to CMSY
Big Data
1. Large volume
2. High generation velocity
3. Large variety
4. Untrustworthiness (veracity)
5. High complexity (variability)
6. Understandable value
Big Data: a dataset with large volume, variety, generation velocity, containing complex and untrustworthy information that requires nonconventional methods to extract, manage and process information within a reasonable time.
New Science Paradigms
Open Science: make scientific research, data and dissemination accessible to all levels of an inquiring society, amateur or professional.
Keywords: Open Access, Open research, Open Notebook Science
E-Science: computationally intensive science is carried out in highly distributed network environments that use large data sets and require distributed computing and collaborative tools.
Keywords: Provenance of the scientific process, Scientific workflows
Science 2.0: process and publish large data sets using a collaborative approach. Share from raw data to experimental results and processes. Support collaborative experiments and Reproducibility-Repeatability-Reusability (R-R-R) of Science.
Keywords: collaborative and repeatable Science
Requirements for IT systems
• Support collaborative research and experimentation
• Implement Reproducibility-Repeatability-Reusability of Science
• Allow sharing data, processes and findings
• Grant free access to the produced scientific knowledge
• Tackle Big Data challenges
• Sustainability: low operational costs, low maintenance prices
• Manage heterogeneous data/processes access policies
• Meet industrial processes requirements
Distributed e-Infrastructures
e-Infrastructures enable researchers at different locations across the world to collaborate in the context of their home institutions or in national or multinational scientific initiatives.
• People can work together having shared access to unique or distributed scientific facilities (including data, instruments, computing and communications).
Examples:
Belief, http://www.beliefproject.org/
OpenAire, http://www.openaire.eu/
i-Marine, http://www.i-marine.eu/
EU-Brazil OpenBio, http://www.eubrazilopenbio.eu/
D4Science.org – Hybrid Data Infrastructure
Unified Resource Space
Powered by gCube
Enables
Integrates
D4Science.org Infrastructure
WPS
Variety/Veracity, Volume, Velocity/Variability
1. External Systems:
• Storage
• Computations
• Data services
2. Integration services:
• Manage external systems
• Harmonise data
• Host data and processes
• Support adaptability
3. Infrastructure resources:
• Manage security
• Expose Integration services
• Support information exchange between services
Data
Computational Infrastructures
Computational Services
A system of systems
Virtual Research Environments
Integrates
D4Science.org Infrastructure
Unified Resource Space
Powered by gCube
Enables
VRE VRE VRE
WPS
• Define sub-communities
• Allow temporary dedicated assignment of computational, storage, and data resources
• Manage policies
• Support data and information sharing
Virtual Research Environments
Innovative, web-based, community-oriented, comprehensive, flexible, and secure working environments.
• Communities are provided with applications to interact with the VRE services
• Client services are provided both with APIs (Java, R) and simple HTTP-REST interfaces
D4Science.org Services
Mediators / Adapters
Data Analytics Services Data Space Services
Infrastructures and Service Providers
Collaborative Services Core Services
Resources Mgr
Catalogue
HN
AAA
VRE Mgr
Social Networking Workspace Users Mgmt
Standard based (e.g. CWS)
Ad-hoc mediators
Search
Access
Storage
Dashboard
Algorithms
Workflows
Browse
Publish
Curation
Researchers
D4Science supports scientists in several domains
1. More than 25 000 taxonomic studies per month
www.i-marine.eu
2. More than 60 000 species distribution maps produced and hosted
www.d4science.eu
3. Used to build a pan-European geothermal energy map
www.egip.d4science.org
4. Processing and management of heterogeneous environmental and Earth system data
www.envriplus.eu
5. Enhances communication and exchange in Linguistic Studies, Humanities, Cultural Heritage, History and Archaeology
www.parthenos-project.eu
Society and citizens
1. CNR Smart Campus - PISA: a Smart City experiment to optimise the use of resources and reduce the environmental impact, whilst increasing the quality of life and work. www.smart-applications.area.pi.cnr.it
2. SoBigData EU project: create the Social Mining & Big Data Ecosystem, a research infrastructure for ethic-sensitive scientific discoveries and advanced applications of social data mining. www.sobigdata.eu
data storage and mining of the large data information flow on parking, buildings and mobility
computational platform and cloud storage to integrate data mining processes and host data and results, VA enabler
Policy Makers
1. D4Science hosts and runs the CMSY model to assess the health status of fisheries stocks
http://www.cnr.it/news/index/news/id/5987
CMSY model
2. D4Science supports the identification of Marine Protected Areas to reduce adverse impact of human activities (e.g. fishing, aquaculture, tourism) on ecosystems, and to ensure these activities are properly embedded in policy frameworks.
http://www.bluebridge-vres.eu/services/protected-area-impact-maps
Companies
1. Predict aquaculture revenue and business development
www.bluebridge-vres.eu
2. Host and process satellite data from Copernicus
3. Collect logs from experts and centralize the network of information
4. Self-service integration of algorithms to enable Cloud computation
services.d4science.org
Education
Lecture-style: the emphasis given to the course topics varies with the audience
Interactive: after each explained topic, students do experiments
Experimental: students reproduce the experiment shown by the teacher and possibly repeat it on their own data
Social: students communicate via messaging or VRE discussion panel
• 1 course/year in Pisa
• 1 course/year in Paris
• 12 courses in Copenhagen
www.bluebridge-vres.eu
International Council for the Exploration of the Sea
• 38 courses all over the world, +1000 attendees
Numbers
• +2000 scientists in 44 countries,
• integrating +50 heterogeneous data providers,
• executing +25,000 processes/month,
• providing access to over a billion quality records in repositories worldwide,
• 99.7% service availability.
• +50 VREs hosted
Statistical Manager
D4Science Computational Facilities
Sharing
Setup and execution
Computing Platform
Coro, G., Candela, L., Pagano, P., Italiano, A., & Liccardo, L. (2015). Parallelizing the execution of native data mining algorithms for computational biology. Concurrency and Computation: Practice and Experience, 27(17), 4630-4644.
Collaborative experiments
WS
Shared online folders
Inputs
Outputs
Results
Computational system
In the e-Infrastructure
Through third party software
Process description:
http://dataminer-d-d4s.d4science.org/wps/WebProcessingService?Request=DescribeProcess&Service=WPS&Version=1.0.0&gcube-token=d7a4076c-e8c1-42fe-81e0-bdecb1e8074a&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.generators.CMSY
Process execution:
http://dataminer-d-d4s.d4science.org/wps/WebProcessingService?request=Execute&service=WPS&Version=1.0.0&gcube-token=d7a4076c-e8c1-42fe-81e0-bdecb1e8074a&lang=en-US&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.generators.CMSY&DataInputs=IDsFile=http://goo.gl/9rg3qK;StocksFile=http://goo.gl/Mp2ZLY;SelectedStock=HLH_M07
R/JAVA Client Guide:
https://wiki.gcube-system.org/gcube/How_to_Interact_with_the_Statistical_Manager_by_client#WPS_Client
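The execution URL above follows the WPS 1.0.0 key-value convention, so it can be assembled programmatically. The sketch below only builds the URL with the Python standard library and sends no request. The endpoint, process identifier and input names are copied from the URLs above; the function name and the placeholder token are hypothetical. Note that `urlencode` percent-encodes the `DataInputs` value, which differs from the literal form shown above; it is assumed here that the service decodes it in the usual way.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

WPS_ENDPOINT = "http://dataminer-d-d4s.d4science.org/wps/WebProcessingService"
CMSY_ID = ("org.gcube.dataanalysis.wps.statisticalmanager."
           "synchserver.mappedclasses.generators.CMSY")


def build_execute_url(token, ids_file, stocks_file, selected_stock):
    """Assemble a WPS Execute URL for the CMSY process (no request is sent)."""
    data_inputs = ";".join([
        f"IDsFile={ids_file}",
        f"StocksFile={stocks_file}",
        f"SelectedStock={selected_stock}",
    ])
    params = {
        "request": "Execute",
        "service": "WPS",
        "Version": "1.0.0",
        "gcube-token": token,
        "lang": "en-US",
        "Identifier": CMSY_ID,
        "DataInputs": data_inputs,
    }
    return WPS_ENDPOINT + "?" + urlencode(params)


# "YOUR-GCUBE-TOKEN" is a placeholder; use the token issued for your account.
url = build_execute_url("YOUR-GCUBE-TOKEN", "http://goo.gl/9rg3qK",
                        "http://goo.gl/Mp2ZLY", "HLH_M07")
query = parse_qs(urlsplit(url).query)  # decoded view, handy for checking
```

The resulting URL could then be opened with any HTTP client, or the same parameters passed to the R/Java clients described in the guide.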
Interfaces
Web Processing Service
Web Interfaces
QGIS
WPS
REST
I.S.
Infrastructure
Infrastructure resources
Geospatial data
External infra.
WPS
Advantages of integrations
• The process is available as-a-Service
• Invoked via communication standards
• Higher computational capabilities
• Automatic creation of a Web interface
• Provenance management
• Storage of results on a high-availability system
• Collaboration and sharing
• Re-usability, e.g. from other software (e.g. QGIS)
Innovation through integration
Vision: integration, sharing, and remote hosting help to inform people and support decision making
Using CMSY
https://i-marine.d4science.org/group/drumfish/drumfish
Thank you!