Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
AnintroductiontoRJorgeCimentadaandBasilioMoreno
6thofJuly2019
RstudioisjustanicesoftwaretorunR!
Afactorisjustawayofsayingthatavariablehas
uniquevalues!Andtheycanbeordered.
elm<-c("Good","Bad","Medium")(elm_factor<-factor(elm,levels=c("Bad","Medium","Good"),ordered=T))
[1]GoodBadMediumLevels:Bad<Medium<Good
Thishasconsequencestable(elm)
elmBadGoodMedium111
table(elm_factor)
elm_factorBadMediumGood111
HowtoinstallR?Luckily,youguyshaveRandRstudioinstalled,soyou
don'thavetoworryaboutthis!
Butifyouwanttoinstallitathome,pleasefollow
ThatguidecanhelpyouinstallRRstudioAndswirl,apackageinwhichyoucoulddoabunchofexercisesashomework!
thisguide
WhatisR?Risaprogramminglanguagedesignedtododata
analysis,usuallyinteractive.
Rishelpfulfor..Gettingthatdarnexcel/statafileintoR(importing)Turningthatveryuglydatasetintosomethingtoworkwith(datacleaning)Automatingyourweeklyreports(automatingtasks)Analyzingdata(modeling)Creatingnicelyformatteddocuments(communicatingresults)Buildingyourowncommandstodospecificthings(functions)
Buildingverycreativegraphics
Amongmanythings…
Andso..whatisRstudio?
Andso..whatisRstudio?
Let'sgettoitthen!Risaninteractivelanguage.Thatmeansthatifyoutype
anumber,youwillgetanumber.#Input10
[1]10
#Input5
[1]5
IntroductiontoRobjectsRisalsoacalculator
TrytypingtheseoperationsinR:5+510-510*520/10(10*20)-5/2+22^3
Beforewecontinue,whattypeofoperationsarethese?
Answersinnextslide!
IntroductiontoRobjectsAdditionSubtractionMultiplicationDivisionAcombinationofallExponentiation
NumbersinRarecallednumerics.
Forexample:is.numeric(10)is.numeric(10+20)is.numeric(10/2)
IntroductiontoRobjectsHavingsinglenumbers,like10,isnotveryuseful.
Wewantsomethingsimilartoacolumnofadataset,likeageorincome.
Wecandothatwithc(),whichstandsforconcatenate.
Readthisexpressionas:concatenatethesenumbersintoasingleobject.
c(32,34,18,22,65)
[1]3234182265
IntroductiontoRobjectsWecanalsogiveitaname,likeage.
Whydidn'ttheresultgetprinted?Whereisthisageobjectat?Whatisformallytheageobject?
age<-c(32,34,18,22,65)
IntroductiontoRobjectsWejustcreatedourfirstvariable!Thetypical
SAS/Excel/Statacolumn.
InR,theseobjectsarecalled'vectors'.
Vectorscanhaveseveralflavours:Numerics(wejustsawone)LogicalsCharactersFactors
IntroductiontoRobjectsSupposetheseagesbelongtocertainpeople.Wecan
createacharactervectorwiththeirnames.
Followingthisguideline,createityourself.Createacharactervectorwithc()IncludethenamesPaul,Maria,Andres,RobertoandAliciainsidewrapeverynameinquoteslikethis“Paul”,“Maria”,etc…ThiswillmakeRunderstandthatinputascharacters.
IntroductiontoRobjectsAnswer:
Wecanalsogiveitaname,likeparticipants.
c("Paul","Maria","Andres","Robert","Alicia")
[1]"Paul""Maria""Andres""Robert""Alicia"
participants<-c("Paul","Maria","Andres","Robert","Alicia")
IntroductiontoRobjectsCharactervectorsarefilledbystrings,like“Paul”or
“Maria”.
Canwedooperationswithstrings?
Makessense..wecan'taddanyletters.
Alright,we'reset.Concatenatethenumericvectorageandparticipants.
"Paul"+"Maria"
Errorin"Paul"+"Maria":non-numericargumenttobinaryoperator
IntroductiontoRobjects
What'stheproblemwiththisresult?
c(age,participants)
[1]"32""34""18""22""65""Paul""Maria"[8]"Andres""Robert""Alicia"
ThisbreaksanRlaw!Wejoinedanumericvectorandacharactervector.
VectorscanONLYbeofoneclass.c(1,"one")#forcestocharactervector
[1]"1""one"
c(1,"1")#notethatthefirstoneisanumeric,whilethesecondisacharacter
[1]"1""1"
IntroductiontoRobjectsNow,whichofthesepeoplehasanageabove20?
That'salogicalvector.
Contrarytocharacterandnumericvectors,logicalvectorscanonlyhavethreevalues:
TRUEFALSENA(whichstandsfor“Notavailable”.)
age>20
[1]TRUETRUEFALSETRUETRUE
IntroductiontoRobjectslogicalscanbecreatedmanuallyorwithalogical
statement.
Theaboveexpressiontestsforthelogicalstatement.
Forexample,
c(TRUE,FALSE,TRUE,TRUE)
[1]TRUEFALSETRUETRUE
age<60
[1]TRUETRUETRUETRUEFALSE
3234182265TRUETRUETRUETRUEFALSE
IntroductiontoRobjectsYoucanalsowriteTorFasshortabbreviationsofTRUE
andFALSE.
Whichiscomparing:
Butbehindthescenes,TRUEandTarejusta1andFandFALSEarejusta0.
Whatistheresultofthis?
c(T,T,F,T)==c(TRUE,TRUE,FALSE,TRUE)
[1]TRUETRUETRUETRUE
TRUETRUEFALSETRUE"T""T""F""T"
T+5TRUE-5FALSE+TRUET+T-FALSE
IntroductiontoRobjectsNowthatyouknowthat..whatwouldbetheclassofthe
followingvectors?
numeric:TRUEiscoercedto1character:“FALSE”isastring,can'tbeturnedtoanumberlogical:bothelementsarelogical!numeric:FALSEiscoercedto0
c(5,TRUE)c(5,"FALSE")c(FALSE,TRUE)c(1,FALSE)
IntroductiontoRobjectsWhatdoweknowsofar?
NumericvectorsCharactervectorsLogicalvectorsHowtoassignanametothesevectorsVectorscancontainonlyoneclassofdata
What'smissing?Factors
IntroductiontoRobjectsFactorsareR'swayofstoringcategoricalvariables.
Categoriessuchas:'Male'and'Female'or'Married'and'Divorced''Good','Middle'and'High'
gender<-c("Male","Female","Male","Male","Female")
#Canbeturnedinto
gender<-factor(gender)
IntroductiontoRobjects
IntroductiontoRobjectsFactorsareusefulforsomespecificoperationslike:
ChangingorderoflevelsfortermsinmodellingChangingorderofaxislabelsinplotsAmongotherthings..
Inmanycasesyoucanusecharacterstodowhatyouwouldwantwithfactors!
IntroductiontoRobjectsNow,haveyounoticedthatwe'vebeenassigningnames
tothings?
Thenameageholdsalltheseelementsinside.Howdoweknowwhereallthevariableswe'vecreatedare?
Let'saskRwhatobjectscanbelistedfromourworkspaceorenvironment.
age
[1]3234182265
ls()
[1]"age""elm""elm_factor""gender"[5]"lgl""participants"
IntroductiontoRobjectsSofar,wehaveabunchofvariablesscatteredaround
ourworkspace.Thisisusuallynothewaytogo!Wewanttogroupsimilarthingsinthesameplace.
AdataframeisusuallytheprimarystructureofanalysisinR
our_df<-data.frame(name=participants,age=age,gender=gender,age_60=lgl)our_df
nameagegenderage_601Paul32MaleTRUE2Maria34FemaleTRUE3Andres18MaleTRUE4Robert22MaleTRUE5Alicia65FemaleFALSE
IntroductiontoRobjectsIt'simportantthatyouunderstandthethingthatdefines
adataframe.Adataframehasrowsandcolumns,moretechnicallycalleddimensions.Dataframeshavetwodimensions.
dim(our_df)
[1]54
nrow(our_df)
[1]5
ncol(our_df)
[1]4
IntroductiontoRobjectsDataframesareverydistinctivebecausetheycanhold
anytypeofvector.
Matricescannot!
Matricesareverysimilartodataframes.Theyhavesamenumberofdimensions.Youcanchooserows/columnsinsimilarways.
our_matrix<-matrix(1:20,ncol=4,nrow=5)our_matrix
[,1][,2][,3][,4][1,]161116[2,]271217[3,]381318[4,]491419[5,]5101520
IntroductiontoRobjectsFinally,we'remissingthesecretingridientthedifferentiatesbothmatricesanddataframes.
Lists
IntroductiontoRobjectsThinkoflistsasabagthatcanstoreanything.
Thisisabagthathas3objects.AcharachterAfactorAnumeric
our_list<-list(names=participants,gender=gender,age=age)our_list
$names[1]"Paul""Maria""Andres""Robert""Alicia"
$gender[1]MaleFemaleMaleMaleFemaleLevels:FemaleMale
$age[1]3234182265
IntroductiontoRobjectsThinkoutsidethebox…whenIsayanything,Imean
ANYTHING!complex_list<-list(df=our_df[1:3,],matrix=our_df[1:3,],avg_age=mean(age))complex_list
$dfnameagegenderage_601Paul32MaleTRUE2Maria34FemaleTRUE3Andres18MaleTRUE
$matrixnameagegenderage_601Paul32MaleTRUE2Maria34FemaleTRUE3Andres18MaleTRUE
$avg_age[1]34.2
IntroductiontoRobjectsTosumup,thesearethe4typesofdatastructures
availableinR.
IntroductiontoRobjectsNowI'mgonnarockyourworld…
Adataframeisalist(becauseitcanhaveanyclass)witharowandcolumndimensions.
data.frame(our_list)
namesgenderage1PaulMale322MariaFemale343AndresMale184RobertMale225AliciaFemale65
Tobecontinued….