Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
ComputationalStatistics,RandFunctions
NielsRichardHansen
September3,2018
Loading[MathJax]/jax/output/CommonHTML/jax.js
ComputationalStatisticsProblems
Computenonparametricestimates.
Computeintegrals(andprobabilities)
Ef(X) = ∫ f(X)dP P(X ∈ A) = ∫ (X∈A ) dP .
Optimizethelikelihood.
Methods
Numericallinearalgebra.
MonteCarlointegration.
EM-algorithm,stochasticgradient.
( )
Example,aminoacidangles
qplot(phi, psi, data = phipsi) qplot(phi, psi, data = phipsi2)
Ramachandranplot
hist(phipsi$phi, prob = TRUE)rug(phipsi$phi)
hist(phipsi$psi, prob = TRUE)rug(phipsi$psi)
Example,aminoacidangles
lines(density(phipsi$phi), col = "red", lwd = 2)
lines(density(phipsi$psi), col = "red", lwd = 2)
Example,aminoacidangles
StatisticaltopicsofthecourseSmoothing:whatdoesdensitydo?
Howdowecomputenonparametricestimators?Howdowechoosetuningparameters?
Simulation:howdoweefficientlysimulatefromatargetdistribution?
HowdoweassessresultsfromMonteCarlomethods?Whatifwecannotcomputethedensity?
Optimization:howdowecomputetheMLE?
Whatifwecannotcomputethelikelihood?Howtodealwithverylargedatasets?
ComputationaltopicsofthecourseImplementation:writingstatisticalsoftware.
RdatastructuresandfunctionsS3objectorientedprogramming
Correctness:doestheimplementationdotherightthing?
testingdebuggingaccuracyofnumericalcomputations
Efficiency:minimizememoryandtimeusage.
benchmarkcodeforcomparisonprofilecodeforidentifyingbottlenecksoptimizecode(Rcpp)
PrerequisitesinRGoodworkingknowledgeof:
Datastructures(vectors,lists,dataframes).
Controlstructures(loops,if-then-else).
Functioncalling.
Interactiveandscriptusages(source)ofR.
Youdon'tneedtobeanexperiencedprogrammer.
AssignmentsThe8assignmentscovering4topicswillformthebackboneforthecourse.Manylecturesandpracticalswillbebuildaroundtheseassignments.
Youallneedtoregister(inAbsalon)forthepresentationofoneassignmentsolution.
Presentationsaredoneingroupsoftwo-threepersons.OnfourWednesdaystherewillbepresentationswithdiscussionandfeedback.Fortheexamyouneedtopreparefourindividualpresentations,oneforeachtopicassignment.
ExamForeachofthefourtopicsyouchooseoneoutoftwoassignmentstopreparefortheexam.
Theexamassessmentisbasedonyourpresentationonthebasisoftheentirecontentofthecourse.Getstartedimmediatelyandworkcontinuouslyontheassignmentsasthecourseprogresses.
RprogrammingRfunctionsarefundamental.Theydon'tdoanythingbeforetheyarecalledandthecallisevaluated.
AnRfunctiontakesanumberofarguments,andwhenafunctioncallisevaluateditcomputesareturnvalue.
AnRprogramconsistsofahierarchyoffunctioncalls.Whentheprogramisexecuted,functioncallsareevaluatedandreplacedbytheirreturnvalues.
ImplementationsofRfunctionsarecollectedintosourcefiles,whichcanbeorganizedintoRpackages.
AnRscript(orRMarkdowndocument)isacollectionofRfunctioncalls,which,whenevaluated,computeadesiredresult.
Rprogrammingincludesactivitiesatmanydifferentlevelsofsophisticationandabstraction.
RprogrammingRprogrammingcanbe
writingRscripts(fordataanalysis),interactivelyrunningscriptsandbuildingreportsorotheroutput.
writingRfunctionstoeaserepetitivetasks(avoidcopy-paste),toabstractcomputations,toimproveoverviewbymodularizationetc.
developingRpackagestoeasedistribution,toimproveusabilityanddocumentation,toclarifydependenciesetc.
WritinganRfunctionAfunctionthatcountsthenumberofzeroesinavector.
countZero <- function(x) { nr <- 0 for(i in seq_along(x)) ## 1:length(x) wrong if length(x) is 0 nr <- nr + (x[i] == 0) ## logical coerced to numeric nr}
Theimplementationworksinmostprogramminglanguages:
Initializeacountertobe0.Loopthroughtheelementsofthevector.Incrementthecounterwheneveranelementequalszero.Returnthevalueofthecounter.
TestinganRfunctiontests <- list( c(0, 0, 0, 0, 25), c(0, 1, 10, 0, 5, -4, 2.5), c(2, 1, 10), c()) ## Extreme casesapply(tests, countZero)
## [1] 4 2 0 0
Wecanmanuallyinspectthattheresultsarecorrect,orwecandoitprogrammatically.
all(sapply(tests, countZero) == c(4, 2, 0, 0))
## [1] TRUE
WritinganRfunction,version2countZero2 <- function(x) sum(x == 0)
Testing:
all(sapply(tests, countZero) == sapply(tests, countZero2))
## [1] TRUE
identical( sapply(tests, countZero), sapply(tests, countZero2))
## [1] FALSE
Resultsareequalbutnotidentical!?typeof(sapply(tests, countZero))
## [1] "double"
countZero <- function(x) { nr <- 0L ## counter now of type integer! for(i in seq_along(x)) nr <- nr + (x[i] == 0) nr}
typeof(sapply(tests, countZero))
## [1] "integer"
Resultsarenowidenticalidentical( sapply(tests, countZero), sapply(tests, countZero2))
## [1] TRUE
VectorizedRcodecountZero2
## function(x)## sum(x == 0)## <bytecode: 0x7faffc41ecf0>
Theexpressionx == 0testseachentryofthevectorxforbeingequalto0andreturnsavectoroflogicals.
Thesumfunctioncomputesandreturnsthesumofallelementsinavector.Logicalsarecoercedtointegers.
Thevectorizedimplementationisshortbutexpressiveandeasytounderstand.
Thevectorizedcomputationsareperformedbycompiledcode.TheyarefastertoevaluatethanRcode.
WritingvectorizedcoderequiresalargerknowledgeofRfunctions.
Exercises1.1,1.2,1.3
FunctionsRfunctionsconsistofthreecomponents.
foo <- function(x) { x^2 + x + 1}
Thebodyofthefunction,whichisthecodeinthefunction.
Theargumentlistthatthefunctiontakes.
Theenclosingenvironmentwherethefunctionwascreated.
Functionsbody(foo)
## {## x^2 + x + 1## }
formals(foo) ## formal argument list
## $x
environment(foo) ## enclosing environment
## <environment: R_GlobalEnv>
FunctionevaluationWhenfunctionsarecalledthebodyisevaluatedwithformalargumentsreplacedbyactualarguments(values).
foo(10) ## body evaluated with x = 10
## [1] 111
Theresultisthesameas:
{x <- 10; x^2 + x + 1}
## [1] 111
FunctionbodyThefunctioncomponentscanbemanipulatedlikeotherRcomponents.
body(foo) <- quote({y*x^2 + x + 1})foo ## new body
## function (x) ## {## y * x^2 + x + 1## }
ThefunctionquotepreventsRfromevaluatingtheexpression{y*x^2 +x + 1}andsimplyreturnstheunevaluatedRexpression,whichthenreplacesthebodyofthefunctionfoo.
FunctionevaluationWhathappenswhenwetrytoevaluatethenewfunction?
foo(10)
## Error in foo(10): object 'y' not found
y <- 2 ## y is in the global environmentfoo(10)
## [1] 211
Variablesthatarenotassignedinthebodyorareinthelistofformalargumentsaresearchedforinthefunctionsenclosingenvironment.
ChangingtheenclosingenvironmentTheenclosingenvironmentcanbechanged,andvaluescanbeassignedtovariablesinenvironments.
environment(foo) <- new.env()foo(10)
## [1] 211
assign("y", 1, environment(foo))
foo(10) ## What is the result and why?
## [1] 111
ChangingtheargumentlistTheargumentlistcanbechangedaswell.
formals(foo) <- alist(x = , y = )
foo(10) ## What will happen?
## Error in foo(10): argument "y" is missing, with no default
foo(10, 2) ## Better?
## [1] 211
HowfunctionsworkinprincipleWhenafunctioniscalled,theactualargumentsarematchedagainsttheformalarguments.
Expressionsintheactualargumentlistareevaluatedinthecallingenvironment.
Thenthebodyofthefunctionisevaluated.
Variablesthatarenotargumentsordefinedinsidethebodyaresearchedforintheenclosingenvironment.
Thevalueofthelastexpressioninthebodyisreturnedbythefunction.
foo(10, y^2)
## [1] 411
ScopingrulesForavariablewedistinguishbetweenthesymbol(they)anditsvalue(2,say).Therulesforfindingthesymbolarecalledscopingrules,andRimplementslexicalscoping.
Symbolsresideinenvironments,andwhenafunctioncallisevaluatedRsearchesforsymbolsintheenvironmentwherethefunctionisdefinedandnotwhereitiscalled.
Inpractice,symbolsarerecursivelysearchedforbystartinginthefunctionsevaluationenvironment,thenitsenclosingenviromentandsoonuntiltheemptyenvironment.
Forinteractiveusagelexicalscopingcanappearstrange.Setting"globalvariables"mightnotalwayshavethe"expectedeffect".
The"expectation"istypicallyadynamicscopingrule,whichisdifficulttopredictandreasonaboutasaprogrammer.
Functionsasfirst-classobjectsAfunctionisanRlanguageobjectthatcanbetreatedlikeotherobjects.Itisaso-calledfirst-classobject.Thelanguagecancreateandmanipulatefunctionsduringexecution.
Thismeansthatfunctionscanbearguments.
head(optim, 6)
## ## 1 function (par, fn, gr = NULL, ..., method = c("Nelder-Mead", ## 2 "BFGS", "CG", "L-BFGS-B", "SANN", "Brent"), lower = -Inf, ## 3 upper = Inf, control = list(), hessian = FALSE) ## 4 { ## 5 fn1 <- function(par) fn(par, ...) ## 6 gr1 <- if (!is.null(gr))
Functionsasfirst-classobjectsFunctionscanbereturnvalues,likeinthefunctionoperatorVectorize.
## ## 1 function (FUN, vectorize.args = arg.names, SIMPLIFY = TRUE, ## 2 USE.NAMES = TRUE) ## 3 { ## 4 ... ## 17 FUNV <- function() { ## 18 args <- lapply(as.list(match.call())[-1L], ## 19 eval, parent.frame()) ## 20 ... ## 24 do.call("mapply", c(FUN = FUN, args[dovec], ## 25 MoreArgs = list(args[!dovec]), SIMPLIFY = SIMPLIFY, ## 26 USE.NAMES = USE.NAMES)) ## 27 } ## 28 formals(FUNV) <- formals(FUN) ## 29 FUNV
Chapters10-12inAvdRcoverfunctionalprogramminginR,butwillnotbedealtwithsystematicallyinthiscourse.
ComputingonthelanguageRcanextractandcomputewithitsownlanguageconstructs.Thiscanbeusedformetaprogramming;writingprogramsthatwriteprograms.
Someconcreteexamplesofcomputingonthelanguageareinplotfunctionsforlabelconstruction,andinsubsetandlmforimplementingnon-standardevaluation,whichcircumventstheusualscopingrules.
TheformulainterfaceisanexampleofadomainspecificlanguageimplementedinRformodelspecificationthatusesvarioustechniquesforcomputingonthelanguage.
Chapters13-15inAvdRcoverthesetechniquesingreaterdetail,butwillnotbedealtwithsystematicallyinthiscourse.
ReturnvaluesAfunctioncanreturnanykindofRobjecteitherexplicitlybyareturnstatementorasthevalueofthelastevaluatedexpressioninthebody.
Manyfunctionsreturnbasicdatastructuressuchasanumber,avector,amatrixoradataframe.Somefunctionsreturnotherfunctions.
Listscanbegivenaclasslabel("integrate"below).ThisispartoftheS3objectsysteminR,whichwillbecoverednextweek.
ReturnvaluesMultiplereturnvaluesthatdon'tfitintothebasicdatastructurescanbereturnedasalist.
integrate(sin, 0, 1)
## 0.4596977 with absolute error < 5.1e-15
str(integrate(sin, 0, 1))
## List of 5## $ value : num 0.46## $ abs.error : num 5.1e-15## $ subdivisions: int 1## $ message : chr "OK"## $ call : language integrate(f = sin, lower = 0, upper = 1)## - attr(*, "class")= chr "integrate"