Spatial modelling etc
David L Miller & Mark V Bravington
International Whaling Commission Scientific Committee 2017
Why are we here / why did we do this?
Stratified Horvitz-Thompson is the workhorse of many abundance estimates
How is H-T going to fail?
When do we need to use spatial models?
What are “Bad surveys”?
Report: http://converged.yt/papers/iwc-2017-ht.pdf
Overview
Today:
1. What's wrong with H-T?
2. Spatial models overview
3. What can go wrong with spatial models?
4. Testing designs in R
5. Model checking for spatial models
Tomorrow:
1. What we missed, what is hard
2. Try out tester on your data
3. Other methods/software, future work
4. Guidelines
Practicalities
Try not to talk for more than an hour without a break
We both have funny accents, yell if you don't understand!
There is maths: don't worry
This is not a distance sampling course! This material usually takes 4 days+ to teach
This will not prepare you to analyse spatial data
BUT you can do this in St Andrews this summer!
creem2.st-andrews.ac.uk
Why are we interested in spatially-explicit estimation?
Inferential aims
Part I
Horvitz-Thompson estimation: the good, the bad and the ugly
Horvitz-Thompson-like estimators
Rescale the (flat) density and extrapolate
$$\hat{N} = \frac{\text{study area}}{\text{covered area}} \sum_{i=1}^{n} \frac{s_i}{\hat{p}_i}$$
$s_i$ are group/cluster sizes
$\hat{p}_i$ is the detection probability (from distance sampling)
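A minimal sketch of the estimator above in R, to show the two steps (inflate by detectability, then rescale by area). All the numbers are made up for illustration: `group_size`, `p_hat`, `study_area` and `covered_area` are hypothetical, and in a real analysis the detection probabilities would come from a fitted detection function.

```r
# Hypothetical data: sizes of the detected groups and their estimated
# detection probabilities (invented values, not from any survey).
group_size <- c(1, 3, 2, 1, 5, 2)
p_hat      <- c(0.6, 0.6, 0.5, 0.6, 0.5, 0.6)

# Hypothetical areas, in the same units (e.g. km^2).
study_area   <- 1000
covered_area <- 100

# Step 1: inflate each detected group by 1 / p_hat to estimate the
# number of animals inside the covered area.
N_covered <- sum(group_size / p_hat)

# Step 2: rescale from the covered area to the whole study area.
# This is where the (flat) density assumption comes in.
N_hat <- (study_area / covered_area) * N_covered
N_hat
```

The rescaling in step 2 is only justified when every point in the study area had the same chance of being covered.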
Variance of H-T
Multiple sources of randomness in H-T equation:
- detectability, $\hat{p}_i$
- $n$, dealt with as $n/L$, encounter rate
- group size, $s$
Hidden in this formula is a simple assumption
Probability of sampling every point in the study area is equal
Is this true? Sometimes.
If (and only if) the design is randomised
Many faces of randomisation
What does this randomisation give us?
Coverage probability
H-T estimator assumes even coverage
(or coverage that you can estimate)
Otherwise not really valid
Estimating coverage
We can estimate coverage of a non-uniform design!
In Distance!
Example from BC, Canada in this paper:
A complex survey plan
Thomas, Williams and Sandilands (2007)
Different areas require different strategies
Zig-zags, parallel lines, census
Analysis in Distance
Sideline: alternative terminology
“A design is an algorithm for laying down samplers in the survey area”
“A realization (from that algorithm) is called a survey plan”
Len Thomas (Talk @ CREEM 2004)
H-T estimation again
Can't estimate with H-T without coverage
“Fixed” “designs” violate assumptions
Some animals have $\mathbb{P}(\text{included}) = 0$
“Deteriorate” pooling robustness property
What can we do?
More on variance
Encounter rate variance $\approx$ variation of $n_j / l_j$ about $n / L$ ($n_j$ detections on transect $j$ of length $l_j$)
Within-transect variation can be bad
e.g., N-S transect, N-S density gradient
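To make "variation of $n_j / l_j$ about $n / L$" concrete, here is a sketch of one commonly used between-transect estimator (the form labelled R2 in Fewster et al, 2009). The function name `er_var` and the transect data are hypothetical. Note that it only sees variation between transects, so a gradient running along each transect is invisible to it.

```r
# Encounter rate variance, between-transect form:
#   var(n/L) = K / (L^2 (K - 1)) * sum_j l_j^2 (n_j/l_j - n/L)^2
# n_j: detections per transect, l_j: transect lengths (same order).
er_var <- function(n_j, l_j) {
  K <- length(n_j)   # number of transects
  n <- sum(n_j)      # total detections
  L <- sum(l_j)      # total line length
  K / (L^2 * (K - 1)) * sum(l_j^2 * (n_j / l_j - n / L)^2)
}

# Hypothetical survey: 5 transects
n_j <- c(4, 0, 7, 2, 5)
l_j <- c(10, 8, 12, 9, 11)

er     <- sum(n_j) / sum(l_j)          # encounter rate n/L
var_er <- er_var(n_j, l_j)
cv_er  <- sqrt(var_er) / er            # CV of the encounter rate
c(encounter_rate = er, variance = var_er, cv = cv_er)
```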
Stratification
If we suspect density changes, we can stratify!
Pre or post hoc (spatial and non-spatial)
I am going to stop talking very soon
Summary
H-T is a spatial model (sort of)
Violated an assumption if no randomness
Hard to assess how bad this is
Fewster et al (2009) and Fewster (2011) give variance approaches
Part II
Spatial models
Spatial models of distance sampling data
Collect spatially referenced data
Why not make spatially-explicit models?
Go beyond stratified estimates
Relate environmental covariates to counts
This is the rosy picture talk
We'll talk about the grim reality later
Example data in this talk
Sperm whales off the US east coast
Hang out near canyons, eat squid
Surveys in 2004, US east coast
Combination of data from 2 NOAA cruises
Thanks to Debi Palka, Lance Garrison for data. Jason Roberts for data prep.
Example data
Model formulation
Pure spatial, pure environmental, mixed?
May have some prior knowledge
Biology/ecology
What are drivers of distribution?
Inferential aim
Abundance
Ecology
Density surface models
Hedley and Buckland (2004)
Miller et al. (2013)
Ignoring group size (more on that tomorrow)
Physeter catodon by Noah Schlottman
How do we model that?
SPOILER ALERT: your model is probably just a very fancy GLM
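Generalised additive models (in 1 slide)
Taking the previous example…
$$\mathbb{E}(n_j) = A_j \hat{p}_j \exp\left[ \beta_0 + \sum_k s_k(z_{kj}) \right]$$
$n_j \sim$ some count distribution
$A_j$: area of segment
$\hat{p}_j$: probability of detection in segment
$\exp$: (inverse) link function
$\beta_0 + \sum_k s_k(z_{kj})$: model terms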
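The "very fancy GLM" point can be sketched in base R: if the smooths $s_k$ are swapped for a single linear term, the formula above is a Poisson GLM with $\log(A_j \hat{p}_j)$ as an offset. Everything here is simulated and hypothetical (the `segs` data frame, the constant `p_hat`, the "true" coefficients); it is a stand-in for the structure, not the dsm implementation.

```r
set.seed(42)

# Hypothetical segment data: area A_j, detection probability p_hat_j
# and one scaled covariate z_j (called depth here).
n_seg <- 200
segs <- data.frame(
  area  = runif(n_seg, 5, 10),
  p_hat = 0.5,
  depth = runif(n_seg, 0, 1)
)

# Simulate counts from E(n_j) = A_j * p_hat_j * exp(beta_0 + beta_1 * z_j),
# i.e. the slide's formula with a linear term in place of a smooth.
mu <- segs$area * segs$p_hat * exp(-1 + 1.5 * segs$depth)
segs$count <- rpois(n_seg, mu)

# A_j * p_hat_j enters as an offset on the log (link) scale,
# so the coefficients describe density rather than raw counts.
fit <- glm(count ~ depth + offset(log(area * p_hat)),
           family = poisson, data = segs)
coef(fit)
```

Replacing `depth` with a smooth of depth (and `glm` with a GAM fitter) gives the model on the slide.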
What about those s thingys?
Covariates
space, time, environmental (remotely sensed?) data
Modelling smooths
1-dimension: not much difference
2D more tricky
edge effects
tricky boundaries
more tomorrow
Now going to do some maths…
(ignore at will)
How do we build them?
Functions made of other, simpler functions
Basis functions, $b_k$
Estimate $\beta_k$
$$s(x) = \sum_{k=1}^{K} \beta_k b_k(x)$$
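A small sketch of "functions made of other, simpler functions". It uses a cubic B-spline basis from the `splines` package (shipped with R) as an assumed stand-in; mgcv's default thin plate regression splines are a different basis, but the sum has the same form. The coefficients in `beta` are arbitrary, chosen by hand rather than estimated.

```r
library(splines)  # part of the standard R distribution

x <- seq(0, 1, length.out = 100)
K <- 6

# Column k of B holds the basis function b_k evaluated at each x.
B <- bs(x, df = K, intercept = TRUE)

# Arbitrary coefficients beta_k (in a real fit these are estimated).
beta <- c(0.5, 2, -1, 1.5, 0, 1)

# s(x) = sum_k beta_k * b_k(x), computed as a matrix-vector product.
s_x <- as.vector(B %*% beta)

# Plot the weighted basis functions and the resulting smooth.
matplot(x, sweep(B, 2, beta, "*"), type = "l", lty = 2,
        ylab = "beta_k * b_k(x)", ylim = range(s_x, 0))
lines(x, s_x, lwd = 2)
```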
Straight lines vs. interpolation
Want a line that is “close” to all the data
Don't want interpolation: we know there is “error”
Balance between interpolation and generality
How wiggly is a function?
Making wigglyness matter
Fit needs to be penalised
Something like:
$$\int_{\mathbb{R}} \left( \frac{\partial^2 s(x)}{\partial x^2} \right)^2 dx$$
(Can always re-write this in the form $\beta^T S \beta$)
Estimate the $\beta_k$ terms but penalise objective
“closeness to data” + penalty (REML/ML)
Smoothing parameter
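A hand-rolled sketch of what the smoothing parameter does. Two simplifications are assumed here and are not what mgcv does: the penalty matrix `S` is a discrete second-difference approximation (P-spline style) to the integrated squared second derivative, and "closeness to data" is plain least squares with the smoothing parameter `lambda` fixed by hand, rather than estimated by REML/ML. Data are simulated.

```r
library(splines)
set.seed(1)

# Simulated data: a sine curve plus noise.
n <- 200
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

# Basis matrix and a second-difference penalty matrix S,
# so that t(beta) %*% S %*% beta measures wigglyness.
K <- 20
B <- bs(x, df = K, intercept = TRUE)
D <- diff(diag(K), differences = 2)
S <- crossprod(D)

# Minimise ||y - B beta||^2 + lambda * beta' S beta for a given lambda.
fit_pen <- function(lambda) {
  beta <- solve(crossprod(B) + lambda * S, crossprod(B, y))
  list(beta    = beta,
       penalty = as.numeric(t(beta) %*% S %*% beta),
       rss     = sum((y - B %*% beta)^2))
}

wiggly <- fit_pen(1e-4)  # almost unpenalised
smooth <- fit_pen(1e2)   # heavily penalised

c(wiggly_penalty = wiggly$penalty, smooth_penalty = smooth$penalty)
c(wiggly_rss = wiggly$rss, smooth_rss = smooth$rss)
```

A larger `lambda` trades a worse fit to the data (higher residual sum of squares) for a less wiggly function (lower penalty).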
Sideline: GAMs are Bayesian models
Generally:
penalties are improper prior precision matrices
(nullspace gives improper priors)
Using shrinkage smoothers:
proper priors
empirical Bayes interpretation
Beyond univariate smooths?
Can build (anisotropic) tensor product terms
Take 2 or more univariate terms
Thin plate regression splines allow multivariate terms (isotropic)
Spatial smoothing
Can just smooth in space
Valid abundance estimation technique
Useful for EDA for env. cov. models (hard day 2!)
Not good for extrapolations
Basis choice can matter!
Why GAMs are cool...
Fancy smooths (cyclic, boundaries, …)
Fancy responses (exp family and beyond!)
Random effects (by equivalence)
Markov random fields
Correlation structures
See Wood (2006/2017) for a handy intro
Let's fit a model
dsm is based on mgcv by Simon Wood
library(dsm)
# ddf.obj takes a fitted detection function; df_hr, segs and obs
# are assumed to have been set up already
# environmental covariates
dsm_env_tw <- dsm(count~s(Depth) + s(NPP) + s(SST),
                  ddf.obj=df_hr,
                  segment.data=segs,
                  observation.data=obs,
                  family=tw())
# space
dsm_xy_tw <- dsm(count~s(x, y),
                 ddf.obj=df_hr,
                 segment.data=segs,
                 observation.data=obs,
                 family=tw())
Simple! Done?
NO
More on model checking later...
Predictions/abundance estimates
Grid of covariates must be available
Predict within survey area
Extrapolate outside (with caution)
Working on a grid of cells
Plot is s(x,y) + s(Depth)
Add up to get abundance
Estimating variance
Uncertainty from:
detection function parameters
spatial model
Need to propagate uncertainty!
Methods in dsm
Bravington, Hedley & Miller (in prep)
Plotting uncertainty
Maps of coefficient of variation
CV for given stratum (better)
Visualisation is hard
Communicating uncertainty
Are animations a good way to do this?
Simulate from posterior parameter distribution
$\beta \sim N(\hat{\beta}, \hat{\Sigma})$
Some features (e.g. shelf, N-S gradient) stick out
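A sketch of simulating from $\beta \sim N(\hat{\beta}, \hat{\Sigma})$ and turning each draw into an abundance estimate. A simulated Poisson GLM stands in for the spatial model, and `segs`, `grid` and all coefficients are hypothetical. This only covers uncertainty from the count model; detection function uncertainty would still need to be propagated separately.

```r
set.seed(123)

# Stand-in model: Poisson GLM on simulated segment data.
n_seg <- 150
segs <- data.frame(area = 10, depth = runif(n_seg))
segs$count <- rpois(n_seg, segs$area * exp(-2 + 1 * segs$depth))
fit <- glm(count ~ depth + offset(log(area)), family = poisson, data = segs)

# Hypothetical prediction grid: 50 cells of area 25 with known covariate.
grid <- data.frame(area = 25, depth = seq(0, 1, length.out = 50))
X <- model.matrix(~ depth, grid)

beta_hat  <- coef(fit)
Sigma_hat <- vcov(fit)

# Draw from N(beta_hat, Sigma_hat) via the Cholesky factor:
# if Z has iid N(0,1) entries and t(R) %*% R = Sigma, rows of Z %*% R
# have covariance Sigma.
n_draw <- 1000
R <- chol(Sigma_hat)
Z <- matrix(rnorm(n_draw * length(beta_hat)), nrow = n_draw)
beta_sim <- sweep(Z %*% R, 2, beta_hat, "+")  # one draw per row

# Predict density per cell for each draw, multiply by cell area, add up.
N_sim <- as.vector(exp(beta_sim %*% t(X)) %*% grid$area)
N_hat <- sum(grid$area * exp(X %*% beta_hat))

c(N_hat = N_hat, cv = sd(N_sim) / mean(N_sim))
```

Each row of `exp(beta_sim %*% t(X))` is one plausible density surface, which is what an animation of uncertainty would cycle through.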
I am going to stop talking very soon
Summary
Build models in stages (detection function + GAM)
Counts are functions of covariates
Pure spatial models
Environmental covariate models
Mix?!
Fit/check using dsm
Most of the theory is resolved, applications are hard
Part III
H-T or spatial or give up?
Spatial models can help
Spatial modelling can give unbiased abundance estimates
even with uneven coverage
limits to extrapolation
Very even coverage => H-T?
“Evenness” is subtle: detectability has an effect
e.g., weather bad in east
Weather or distribution?
Weather has a big effect on detectability
Need to record during survey
Disambiguate between distribution/detectability
Potential confounding can be BAD
Visibility during POWER 2014
Thanks to Hiroto Murase and co for this data!
Covariates can make a big difference!
Other stuff
Spatial modelling won't solve all your problems
Design issues
Ludicrous extrapolation
Survey plan not robust to weather issues
Non-uniform distribution wrt sampler
Migration
Spatial models alone can't solve these issues
Spatial modelling won't solve all your problems
Violations of survey procedure
Following animals
Responsive movement
Guarding the trackline
Group size estimation
Spatial models alone can't solve these issues
Spatial modelling won't solve all your problems
Detection functions
Not enough observations
Uncertain species ID
Group size
Spatial models alone can't solve these issues
@kitabet
Should everything be spatial?
Do you have enough observations?
If the surveys do look good (even coverage, etc)
Is it worth re-analysing from H-T?
Point estimates similar?
Variance may well be different?
I am going to stop talking very soon
Summary
Spatial models don't solve all problems
Complex models can lead to complex issues
Recording weather conditions is important
You can always give up!
Part IV
Testing designs
What can we do?
Take a survey and simulate
Is H-T robust?
How do different spatial models compare?
Only thinking about total abundance & CV
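A toy version of "take a survey and simulate", in base R only and much simpler than anything ltdesigntester does. The assumptions are all made up for illustration: a unit-square study area, density increasing from west to east, one N-S strip transect of half-width `w` with perfect detection inside the strip, and a strip-based H-T estimator. It compares a randomised transect position with a fixed one.

```r
set.seed(2017)

N_true <- 1000   # true abundance in the unit square
w      <- 0.05   # strip half-width
n_sim  <- 500    # number of simulated surveys

# Animal x-positions with density proportional to x (W-E gradient).
# Only x matters because the transect spans the full N-S extent.
sim_pop_x <- function(N) sqrt(runif(N))

# Strip H-T estimate: count animals in the strip, then rescale by
# study area (1) over covered area (2 * w).
ht_strip <- function(x, centre, w) {
  n <- sum(abs(x - centre) <= w)
  n * 1 / (2 * w)
}

# Randomised design: transect position drawn afresh for each survey.
est_random <- replicate(n_sim,
  ht_strip(sim_pop_x(N_true), runif(1, w, 1 - w), w))

# "Fixed" design: transect always at x = 0.2, in the low-density west.
est_fixed <- replicate(n_sim,
  ht_strip(sim_pop_x(N_true), 0.2, w))

c(truth = N_true,
  random_mean = mean(est_random), random_cv = sd(est_random) / mean(est_random),
  fixed_mean  = mean(est_fixed),  fixed_cv  = sd(est_fixed) / mean(est_fixed))
```

In this setup the fixed design gives a tight but badly biased answer, while the randomised design is close to unbiased on average but very variable from one realization to the next.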
Software
ltdesigntester (based on DSsim by Laura Marshall, CREEM)
Set up simulations, test what can be done
Most of the work needs to be done in GIS
Survey shapefiles, covariates etc
Import to R, runs models, shows output
github.com/dill/ltdesigntester
Setting up a survey simulation
Density
Grid in polygon of study area
Either specify simple gradient or use other tools to make complex density
Density as grid
Design
Generate using GIS/Distance
Export to shapefile
Detection function
Functional form (half-normal, hazard-rate)
Parameters (scale, shape)
Truncation
(Covariates via multiple functions, more later)
Specification to simulation
Generate multiple realizations
Analyse each with many models
Different spatial, H-T
Compare results
Test models
Spatial smoothers
thin plate spline, bs="tp" (Wood, 2003)
thin plate spline with shrinkage, bs="ts" (Marra & Wood, 2011)
Duchon spline, bs="ds", m=c(1, 0.5) (Miller et al., 2014)
tensor of thin plate spline (with and without rotated covariates)
Stratified estimates
Horvitz-Thompson (with and without covariates)
stratified Horvitz-Thompson (with and without covariates)
Comparing performance
Important caveats
No model checking
Dependent on “good” detection specs
No group size model
No $g(0)$ or availability
Quick example code
library(ltdesigntester)
# setup a simulation
my_sim <- build_sim(design_path="path/to/shp",
                    dsurf=density_surface_matrix,
                    n_grid_x=dsurf_dim_x,
                    n_grid_y=dsurf_dim_y,
                    n_pop=true_N,
                    df=detection_function_specs,
                    region="path/to/shp")
# run it!
res <- do_sim(nsim=number_of_sims,
              scenario=my_sim,
              pred_dat=prediction_data_frame,
              ...)
We made a big deal about weather earlier...
We can add covariates too (a wee bit unwieldy at the moment)
Build multiple detection functions/sims in list()
Covariates vary according to:
logit function E-W (can set pars, 2 state)
set values in segment data (already observed)
I am going to stop talking very soon
Summary
We can test multiple (simple) scenarios
Assumption of simple gradients
Models likely won't work for difficult stuff if they don't work for simple things
What will work/what won't
Simple summary plots
Better than the rest $\neq$ good
Part V
Model checking for DSMs
Model checking
Count distribution
Basis complexity
Model (term) selection
Sensitivity
Observed vs. expected
Cross-validation (replicability)
(Plus all the usual stuff for detection functions!)
Count distributions
Response is a count
Often, it's mostly zero
Aggregations occur at scales smaller than spatial model
Want response distribution that deals with that
Could mess up variance if ignored
Linked to segmenting
Flexible mean-variance relationship
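Negative binomial
$\text{Var}(\text{count}) = \mathbb{E}(\text{count}) + \kappa \mathbb{E}(\text{count})^2$
Estimate $\kappa$
Is quadratic relationship a “strong” assumption?
Similar to Poisson: $\text{Var}(\text{count}) = \mathbb{E}(\text{count})$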
Tweedie distribution
$\text{Var}(\text{count}) = \phi \mathbb{E}(\text{count})^q$
Common distributions are sub-cases:
$q = 1 \Rightarrow$ Poisson
$q = 2 \Rightarrow$ Gamma
$q = 3 \Rightarrow$ inverse-Gaussian
We are interested in $1 < q < 2$
(here $q = 1.2, 1.3, \ldots, 1.9$)
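To see how differently these mean-variance relationships behave, the three variance functions can be tabulated side by side. This is a sketch: the function names `var_pois`, `var_nb` and `var_tw` and the parameter values (`kappa = 0.5`, `phi = 2`, `q = 1.2`) are arbitrary choices for illustration, not estimates from any model.

```r
# Variance as a function of the mean for each response distribution.
var_pois <- function(mu) mu
var_nb   <- function(mu, kappa) mu + kappa * mu^2
var_tw   <- function(mu, phi, q) phi * mu^q

# A range of expected counts per segment, from mostly-zero to large.
mu <- c(0.1, 1, 5, 20)

data.frame(
  mean    = mu,
  poisson = var_pois(mu),
  negbin  = var_nb(mu, kappa = 0.5),
  tweedie = var_tw(mu, phi = 2, q = 1.2)
)
```

At small means the three are similar, but the negative binomial variance grows quadratically with the mean while a Tweedie with $1 < q < 2$ grows more slowly.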
Basis complexity
Before: $s(x) = \sum_{k=1}^{K} \beta_k b_k(x)$
How big should k be? “Big enough”
Penalty takes care of the rest
?gam.check gives useful output (also residual checks etc)
gam.check text output
gam.check(dsm_env_tw)

Method: REML   Optimizer: outer newton
full convergence after 8 iterations.
Gradient range [-3.139726e-08,2.036272e-08]
(score 375.9503 & scale 4.316452).
Hessian positive definite, eigenvalue range [0.5725432,298.5906].
Model rank = 28 / 28

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

            k'   edf k-index p-value
s(Depth) 9.000 4.049   0.814    0.36
s(NPP)   9.000 2.846   0.779    0.04
s(SST)   9.000 4.916   0.771    0.04
Tobler's first law of geography
“Everything is related to everything else, but near things are more related than distant things”
Tobler (1970)
Implications of Tobler's law
What can we do about this?
Careful inclusion of terms
Test for sensitivity (lots of models)
Fit models using robust criteria (REML)
Test for concurvity (mgcv::concurvity, dsm::vis.concurvity)
Term selection
(approximate) $p$ values (Marra & Wood, 2012)
path dependence issues
shrinkage methods (Marra & Wood, 2011)
ecological-level term selection
which biomass measure?
include spatial smooth or not?
Observed vs. expected
Diagnostic: compare observed vs. expected counts
Compare for different covariates/aggregations
In next dsm, obs_exp() does this
Going back to those rough POWER models…
> obs_exp(b, "beaufort")
               1        2       34
Observed 3.00000 10.00000 80.00000
Expected 6.97715 12.42649 83.03773
> obs_exp(b_nc, "beaufort")
                1        2       34
Observed 3.000000 10.00000 80.00000
Expected 8.478759 17.00705 73.23535
Cross-validation
How well does the model reproduce what we saw?
Leave out one area, re-fit model, predict to new data
Wenger & Olden (2012) have good spatial examples
Cross-validation example
I am going to stop talking very soon
2 (or more)-stage models
Not “cool” (statistically), but…
Multi-stage models are handy!
Understand and check each part
Split your modelling efforts amongst people
Conclusions
This methodology is general
Bears, birds, beer cans, Loch Ness monsters…
Models are flexible!
Linear things, smooth things, random effect things (and more)
If you know GLMs, you can get started with DSMs
Mature theoretical basis, still lots to do
Active user community, active software development
Resources
distancesampling.org/R/
distancesampling.org/workshops/duke-spatial-2015/
Thanks!
Slides w/ references available at converged.yt
References
Fewster, R.M., Buckland, S.T., Burnham, K.P., Borchers, D.L., Jupp, P.E., Laake, J.L., et al. (2009). Estimating the Encounter Rate Variance in Distance Sampling. Biometrics, 65, 225–236.
Fewster, R.M. (2011). Variance Estimation for Systematic Designs in Spatial Surveys. Biometrics, 67, 1518–1531.
Hedley, S.L., & Buckland, S.T. (2004). Spatial models for line transect sampling. Journal of Agricultural, Biological, and Environmental Statistics, 9(2).
Marques, T.A., Thomas, L., Fancy, S.G., & Buckland, S.T. (2007). Improving estimates of bird density using multiple-covariate distance sampling. The Auk, 124(4).
Marra, G., & Wood, S.N. (2011). Practical variable selection for generalized additive models. Computational Statistics and Data Analysis, 55(7).
Marra, G., & Wood, S.N. (2012). Coverage Properties of Confidence Intervals for Generalized Additive Model Components. Scandinavian Journal of Statistics, 39(1).
Wenger, S.J., & Olden, J.D. (2012). Assessing transferability of ecological models: an underappreciated aspect of statistical validation. Methods in Ecology and Evolution, 3, 260–267.
Handy awkward question answers
Don't throw away your residuals!
gam.check
rqgam.check (Dunn and Smyth, 1996)
Penalty matrix
For each $b_k$ calculate the penalty
Penalty is a function of $\beta$: $\lambda \beta^T S \beta$
$S$ calculated once
smoothing parameter ($\lambda$) dictates influence
How wiggly are things?
We can set basis complexity or “size” ($k$)
Maximum wigglyness
Smooths have effective degrees of freedom (EDF)
EDF < $k$
Set $k$ “large enough”
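A sketch of why EDF sits below $k$ once a penalty is applied. For a penalised least-squares fit, the EDF can be computed as the trace of $(B^T B + \lambda S)^{-1} B^T B$. The setup is assumed for illustration and is not mgcv's: a B-spline basis of size `K = 10` evaluated at simulated `x`, with a second-difference penalty matrix `S`, and the helper `edf` is hypothetical.

```r
library(splines)
set.seed(3)

x <- sort(runif(200))
K <- 10                                          # basis size, the "k" above
B <- bs(x, df = K, intercept = TRUE)             # basis matrix
S <- crossprod(diff(diag(K), differences = 2))   # second-difference penalty

# Effective degrees of freedom for a given smoothing parameter:
# trace of (B'B + lambda * S)^{-1} B'B.
edf <- function(lambda) {
  BtB <- crossprod(B)
  sum(diag(solve(BtB + lambda * S, BtB)))
}

lambdas  <- c(1e-6, 1, 1e6)
edf_vals <- sapply(lambdas, edf)
data.frame(lambda = lambdas, edf = edf_vals)
```

With almost no penalty the EDF is close to `K`; as `lambda` grows it falls towards 2, the dimension of the part of the basis this penalty leaves alone (straight lines). So `K` only needs to be large enough, and the penalty handles the rest.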
Let's talk about detectability
Detectability
Distance sampling
“Fit to the histogram”
Model:
$$\mathbb{P}[\text{animal detected} \mid \text{animal at distance } y] = g(y; \theta)$$
Calculate the average probability of detection:
$$\hat{p} = \frac{1}{w} \int_0^w g(y; \hat{\theta}) \, dy$$
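A sketch of the average detection probability for one particular choice of $g$: the half-normal, $g(y; \sigma) = \exp(-y^2 / 2\sigma^2)$. The scale `sigma_hat` and truncation distance `w` are made-up values; in practice $\hat{\theta}$ comes from fitting the detection function to the observed distances. The integral is done numerically and checked against the closed form available for the half-normal.

```r
# Half-normal detection function: probability of detecting an animal
# at perpendicular distance y, given scale parameter sigma.
g_hn <- function(y, sigma) exp(-y^2 / (2 * sigma^2))

# Average detection probability out to truncation distance w:
# p_hat = (1 / w) * integral from 0 to w of g(y) dy.
p_hat <- function(sigma, w) {
  integrate(g_hn, lower = 0, upper = w, sigma = sigma)$value / w
}

# Hypothetical fitted scale and truncation distance (same units).
sigma_hat <- 0.5
w         <- 1.5

p_num <- p_hat(sigma_hat, w)

# Closed form for the half-normal, via the normal CDF.
p_closed <- sigma_hat * sqrt(2 * pi) * (pnorm(w / sigma_hat) - 0.5) / w

c(numerical = p_num, closed_form = p_closed)
```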
Distance sampling (extensions)
Covariates that affect detectability (Marques et al, 2007)
Perception bias ($g(0) < 1$) (Burt et al, 2014)
Availability bias (Borchers et al, 2013)
Detection function formulations (Miller and Thomas, 2015)
Measurement error (Marques, 2004)
Figure from Marques et al (2007)
That's not really how the ocean works...
Availability
We can only see whales at the surface
What proportion of the time are they there?
Acoustics
Tags (DTAGs etc)
Behavioural studies
Fixed correction to $\hat{p}$?
Model via fancy Markov models (Borchers et al, 2013)
Picture from University of St Andrews Library Special Collections