34
Computational Statistics, R and Functions Niels Richard Hansen September 3, 2018 Loading [MathJax]/jax/output/CommonHTML/jax.js

Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

ComputationalStatistics,RandFunctions

NielsRichardHansen

September3,2018

Loading[MathJax]/jax/output/CommonHTML/jax.js

Page 2: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

ComputationalStatisticsProblems

Computenonparametricestimates.

Computeintegrals(andprobabilities)

Ef(X) = ∫ f(X)dP P(X ∈ A) = ∫ (X∈A ) dP .

Optimizethelikelihood.

Methods

Numericallinearalgebra.

MonteCarlointegration.

EM-algorithm,stochasticgradient.

( )

Page 3: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

Example,aminoacidangles

Page 4: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

qplot(phi, psi, data = phipsi) qplot(phi, psi, data = phipsi2)

Ramachandranplot

Page 5: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

hist(phipsi$phi, prob = TRUE)rug(phipsi$phi)

hist(phipsi$psi, prob = TRUE)rug(phipsi$psi)

Example,aminoacidangles

Page 6: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

lines(density(phipsi$phi), col = "red", lwd = 2)

lines(density(phipsi$psi), col = "red", lwd = 2)

Example,aminoacidangles

Page 7: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

StatisticaltopicsofthecourseSmoothing:whatdoesdensitydo?

Howdowecomputenonparametricestimators?Howdowechoosetuningparameters?

Simulation:howdoweefficientlysimulatefromatargetdistribution?

HowdoweassessresultsfromMonteCarlomethods?Whatifwecannotcomputethedensity?

Optimization:howdowecomputetheMLE?

Whatifwecannotcomputethelikelihood?Howtodealwithverylargedatasets?

Page 8: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

ComputationaltopicsofthecourseImplementation:writingstatisticalsoftware.

RdatastructuresandfunctionsS3objectorientedprogramming

Correctness:doestheimplementationdotherightthing?

testingdebuggingaccuracyofnumericalcomputations

Efficiency:minimizememoryandtimeusage.

benchmarkcodeforcomparisonprofilecodeforidentifyingbottlenecksoptimizecode(Rcpp)

Page 9: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

PrerequisitesinRGoodworkingknowledgeof:

Datastructures(vectors,lists,dataframes).

Controlstructures(loops,if-then-else).

Functioncalling.

Interactiveandscriptusages(source)ofR.

Youdon'tneedtobeanexperiencedprogrammer.

Page 10: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

AssignmentsThe8assignmentscovering4topicswillformthebackboneforthecourse.Manylecturesandpracticalswillbebuildaroundtheseassignments.

Youallneedtoregister(inAbsalon)forthepresentationofoneassignmentsolution.

Presentationsaredoneingroupsoftwo-threepersons.OnfourWednesdaystherewillbepresentationswithdiscussionandfeedback.Fortheexamyouneedtopreparefourindividualpresentations,oneforeachtopicassignment.

Page 11: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

ExamForeachofthefourtopicsyouchooseoneoutoftwoassignmentstopreparefortheexam.

Theexamassessmentisbasedonyourpresentationonthebasisoftheentirecontentofthecourse.Getstartedimmediatelyandworkcontinuouslyontheassignmentsasthecourseprogresses.

Page 12: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

RprogrammingRfunctionsarefundamental.Theydon'tdoanythingbeforetheyarecalledandthecallisevaluated.

AnRfunctiontakesanumberofarguments,andwhenafunctioncallisevaluateditcomputesareturnvalue.

AnRprogramconsistsofahierarchyoffunctioncalls.Whentheprogramisexecuted,functioncallsareevaluatedandreplacedbytheirreturnvalues.

ImplementationsofRfunctionsarecollectedintosourcefiles,whichcanbeorganizedintoRpackages.

AnRscript(orRMarkdowndocument)isacollectionofRfunctioncalls,which,whenevaluated,computeadesiredresult.

Rprogrammingincludesactivitiesatmanydifferentlevelsofsophisticationandabstraction.

Page 13: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

RprogrammingRprogrammingcanbe

writingRscripts(fordataanalysis),interactivelyrunningscriptsandbuildingreportsorotheroutput.

writingRfunctionstoeaserepetitivetasks(avoidcopy-paste),toabstractcomputations,toimproveoverviewbymodularizationetc.

developingRpackagestoeasedistribution,toimproveusabilityanddocumentation,toclarifydependenciesetc.

Page 14: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

WritinganRfunctionAfunctionthatcountsthenumberofzeroesinavector.

countZero <- function(x) { nr <- 0 for(i in seq_along(x)) ## 1:length(x) wrong if length(x) is 0 nr <- nr + (x[i] == 0) ## logical coerced to numeric nr}

Theimplementationworksinmostprogramminglanguages:

Initializeacountertobe0.Loopthroughtheelementsofthevector.Incrementthecounterwheneveranelementequalszero.Returnthevalueofthecounter.

Page 15: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

TestinganRfunctiontests <- list( c(0, 0, 0, 0, 25), c(0, 1, 10, 0, 5, -4, 2.5), c(2, 1, 10), c()) ## Extreme casesapply(tests, countZero)

## [1] 4 2 0 0

Wecanmanuallyinspectthattheresultsarecorrect,orwecandoitprogrammatically.

all(sapply(tests, countZero) == c(4, 2, 0, 0))

## [1] TRUE

Page 16: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

WritinganRfunction,version2countZero2 <- function(x) sum(x == 0)

Testing:

all(sapply(tests, countZero) == sapply(tests, countZero2))

## [1] TRUE

identical( sapply(tests, countZero), sapply(tests, countZero2))

## [1] FALSE

Page 17: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

Resultsareequalbutnotidentical!?typeof(sapply(tests, countZero))

## [1] "double"

countZero <- function(x) { nr <- 0L ## counter now of type integer! for(i in seq_along(x)) nr <- nr + (x[i] == 0) nr}

typeof(sapply(tests, countZero))

## [1] "integer"

Page 18: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

Resultsarenowidenticalidentical( sapply(tests, countZero), sapply(tests, countZero2))

## [1] TRUE

Page 19: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

VectorizedRcodecountZero2

## function(x)## sum(x == 0)## <bytecode: 0x7faffc41ecf0>

Theexpressionx == 0testseachentryofthevectorxforbeingequalto0andreturnsavectoroflogicals.

Thesumfunctioncomputesandreturnsthesumofallelementsinavector.Logicalsarecoercedtointegers.

Thevectorizedimplementationisshortbutexpressiveandeasytounderstand.

Thevectorizedcomputationsareperformedbycompiledcode.TheyarefastertoevaluatethanRcode.

WritingvectorizedcoderequiresalargerknowledgeofRfunctions.

Page 20: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

Exercises1.1,1.2,1.3

Page 21: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

FunctionsRfunctionsconsistofthreecomponents.

foo <- function(x) { x^2 + x + 1}

Thebodyofthefunction,whichisthecodeinthefunction.

Theargumentlistthatthefunctiontakes.

Theenclosingenvironmentwherethefunctionwascreated.

Page 22: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

Functionsbody(foo)

## {## x^2 + x + 1## }

formals(foo) ## formal argument list

## $x

environment(foo) ## enclosing environment

## <environment: R_GlobalEnv>

Page 23: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

FunctionevaluationWhenfunctionsarecalledthebodyisevaluatedwithformalargumentsreplacedbyactualarguments(values).

foo(10) ## body evaluated with x = 10

## [1] 111

Theresultisthesameas:

{x <- 10; x^2 + x + 1}

## [1] 111

Page 24: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

FunctionbodyThefunctioncomponentscanbemanipulatedlikeotherRcomponents.

body(foo) <- quote({y*x^2 + x + 1})foo ## new body

## function (x) ## {## y * x^2 + x + 1## }

ThefunctionquotepreventsRfromevaluatingtheexpression{y*x^2 +x + 1}andsimplyreturnstheunevaluatedRexpression,whichthenreplacesthebodyofthefunctionfoo.

Page 25: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

FunctionevaluationWhathappenswhenwetrytoevaluatethenewfunction?

foo(10)

## Error in foo(10): object 'y' not found

y <- 2 ## y is in the global environmentfoo(10)

## [1] 211

Variablesthatarenotassignedinthebodyorareinthelistofformalargumentsaresearchedforinthefunctionsenclosingenvironment.

Page 26: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

ChangingtheenclosingenvironmentTheenclosingenvironmentcanbechanged,andvaluescanbeassignedtovariablesinenvironments.

environment(foo) <- new.env()foo(10)

## [1] 211

assign("y", 1, environment(foo))

foo(10) ## What is the result and why?

## [1] 111

Page 27: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

ChangingtheargumentlistTheargumentlistcanbechangedaswell.

formals(foo) <- alist(x = , y = )

foo(10) ## What will happen?

## Error in foo(10): argument "y" is missing, with no default

foo(10, 2) ## Better?

## [1] 211

Page 28: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

HowfunctionsworkinprincipleWhenafunctioniscalled,theactualargumentsarematchedagainsttheformalarguments.

Expressionsintheactualargumentlistareevaluatedinthecallingenvironment.

Thenthebodyofthefunctionisevaluated.

Variablesthatarenotargumentsordefinedinsidethebodyaresearchedforintheenclosingenvironment.

Thevalueofthelastexpressioninthebodyisreturnedbythefunction.

foo(10, y^2)

## [1] 411

Page 29: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

ScopingrulesForavariablewedistinguishbetweenthesymbol(they)anditsvalue(2,say).Therulesforfindingthesymbolarecalledscopingrules,andRimplementslexicalscoping.

Symbolsresideinenvironments,andwhenafunctioncallisevaluatedRsearchesforsymbolsintheenvironmentwherethefunctionisdefinedandnotwhereitiscalled.

Inpractice,symbolsarerecursivelysearchedforbystartinginthefunctionsevaluationenvironment,thenitsenclosingenviromentandsoonuntiltheemptyenvironment.

Forinteractiveusagelexicalscopingcanappearstrange.Setting"globalvariables"mightnotalwayshavethe"expectedeffect".

The"expectation"istypicallyadynamicscopingrule,whichisdifficulttopredictandreasonaboutasaprogrammer.

Page 30: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

Functionsasfirst-classobjectsAfunctionisanRlanguageobjectthatcanbetreatedlikeotherobjects.Itisaso-calledfirst-classobject.Thelanguagecancreateandmanipulatefunctionsduringexecution.

Thismeansthatfunctionscanbearguments.

head(optim, 6)

## ## 1 function (par, fn, gr = NULL, ..., method = c("Nelder-Mead", ## 2 "BFGS", "CG", "L-BFGS-B", "SANN", "Brent"), lower = -Inf, ## 3 upper = Inf, control = list(), hessian = FALSE) ## 4 { ## 5 fn1 <- function(par) fn(par, ...) ## 6 gr1 <- if (!is.null(gr))

Page 31: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

Functionsasfirst-classobjectsFunctionscanbereturnvalues,likeinthefunctionoperatorVectorize.

## ## 1 function (FUN, vectorize.args = arg.names, SIMPLIFY = TRUE, ## 2 USE.NAMES = TRUE) ## 3 { ## 4 ... ## 17 FUNV <- function() { ## 18 args <- lapply(as.list(match.call())[-1L], ## 19 eval, parent.frame()) ## 20 ... ## 24 do.call("mapply", c(FUN = FUN, args[dovec], ## 25 MoreArgs = list(args[!dovec]), SIMPLIFY = SIMPLIFY, ## 26 USE.NAMES = USE.NAMES)) ## 27 } ## 28 formals(FUNV) <- formals(FUN) ## 29 FUNV

Chapters10-12inAvdRcoverfunctionalprogramminginR,butwillnotbedealtwithsystematicallyinthiscourse.

Page 32: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

ComputingonthelanguageRcanextractandcomputewithitsownlanguageconstructs.Thiscanbeusedformetaprogramming;writingprogramsthatwriteprograms.

Someconcreteexamplesofcomputingonthelanguageareinplotfunctionsforlabelconstruction,andinsubsetandlmforimplementingnon-standardevaluation,whichcircumventstheusualscopingrules.

TheformulainterfaceisanexampleofadomainspecificlanguageimplementedinRformodelspecificationthatusesvarioustechniquesforcomputingonthelanguage.

Chapters13-15inAvdRcoverthesetechniquesingreaterdetail,butwillnotbedealtwithsystematicallyinthiscourse.

Page 33: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

ReturnvaluesAfunctioncanreturnanykindofRobjecteitherexplicitlybyareturnstatementorasthevalueofthelastevaluatedexpressioninthebody.

Manyfunctionsreturnbasicdatastructuressuchasanumber,avector,amatrixoradataframe.Somefunctionsreturnotherfunctions.

Listscanbegivenaclasslabel("integrate"below).ThisispartoftheS3objectsysteminR,whichwillbecoverednextweek.

Page 34: Computational Statistics, R and Functionsweb.math.ku.dk/~richard/courses/compstat2018/Intro_R_Functions.p… · Implementations of R functions are collected into source files, which

ReturnvaluesMultiplereturnvaluesthatdon'tfitintothebasicdatastructurescanbereturnedasalist.

integrate(sin, 0, 1)

## 0.4596977 with absolute error < 5.1e-15

str(integrate(sin, 0, 1))

## List of 5## $ value : num 0.46## $ abs.error : num 5.1e-15## $ subdivisions: int 1## $ message : chr "OK"## $ call : language integrate(f = sin, lower = 0, upper = 1)## - attr(*, "class")= chr "integrate"