8/19/2019 04a Functions for NAs Strings Etc
Data Analysis & DataScience with R
Functions for dealing with NAs, NULLs, dates, strings, regular
expressions, etc.
by Marin Fotache
Al.I. Cuza University of Iași
Faculty of Economics and Business Administration
Department of Accounting, Information Systems and Statistics
R script associated with this presentation
04a_functions_for_NAs_strings_etc.R
http://1drv.ms/1E1m81i
Web sites with R tutorials for system functions
http://www.sr.bham.ac.uk/~ajrs/R/r-function_list.html
Two main types of missing values
NA:
◦ Stands for Not Available
◦ Is the equivalent of NULL in relational databases
◦ When importing data from Excel, tab-delimited files, etc., unknown
values are usually represented by NAs
◦ Not to be confused with the "NA" string (which sometimes occurs
when importing)
◦ Main function: is.na()
NULL:
◦ Completely different from NULLs in relational databases
◦ Within a vector, an element can be NA but not NULL (NULL is an
empty, zero-length object); if used inside a vector, a NULL element
simply disappears
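The distinctions above can be tried out with a few lines of base R. This is only a sketch: the vectors and the small CSV text are made up for illustration, and the "n/a" token is an assumed example of how a source file might mark unknown values.

```r
# Hypothetical numeric vector with one missing element.
x <- c(10, NA, 30)
is.na(x)                  # FALSE  TRUE FALSE

# The string "NA" is ordinary text; only the unquoted NA is missing.
s <- c("a", "NA", NA)
is.na(s)                  # FALSE FALSE  TRUE

# On import, na.strings lists the text tokens that should become NA.
# (csv_text and the "n/a" token are illustrative assumptions.)
csv_text <- "id,score\n1,5\n2,n/a\n3,7"
imported <- read.csv(text = csv_text, na.strings = c("NA", "n/a"))
is.na(imported$score)     # FALSE  TRUE FALSE

# NULL cannot be kept as a vector element; it is silently dropped.
length(c(1, NULL, 3))     # 2
```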
Function is.na
Create a very simple vector:
> y <- c(1, 2, 3, NA)
> is.na(y)
[1] FALSE FALSE FALSE TRUE
Function na.fail
Function na.fail checks if there are NA values in a dataset.
na.fail will generate an error if there is at least one NA
within one of the columns of the data set.
Data frames student_gi, patientdata, mpi,
FuelEfficiency.new, ToyotaCorolla do not contain NA
values, so na.fail returns them unchanged:
> na.fail(student_gi)
     name age scholarship lab_assessment final_grade
(all rows of student_gi are printed, since none contains an NA)
Function na.fail (cont.)
Data frame leadership contains at least one NA value,
so na.fail will generate an error:
> na.fail(leadership)
Error in na.fail.default(leadership) : missing values in object
> leadership
  manager date country gender age q1 q2 q3 q4 q5
(the rows of leadership are printed; one of them has NA in q4 and q5)
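Both behaviours of na.fail can be reproduced on small data. The sketch below uses two made-up data frames (complete_df and gappy_df are hypothetical, not the data sets from the slides) and wraps the failing call in tryCatch so that a script can carry on after the error.

```r
# Hypothetical data frames, one without and one with a missing value.
complete_df <- data.frame(id = 1:3, score = c(7, 9, 8))
gappy_df    <- data.frame(id = 1:3, score = c(7, NA, 8))

# With no NA present, na.fail() hands the object back unchanged.
checked <- na.fail(complete_df)
identical(checked, complete_df)

# With an NA present, na.fail() stops with an error; tryCatch() captures
# the error message instead of aborting the script.
msg <- tryCatch(
  {
    na.fail(gappy_df)
    "no missing values"
  },
  error = function(e) conditionMessage(e)
)
msg
```

A check of this kind is useful at the top of an analysis script, where silently propagated NAs would be harder to trace later.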
Check NAs for a range of elements
Display, for each student, if the e-mail is missing or
not:
> head(studs201…)
> is.na(studs201…[, "email"])
Display only the students with a missing e-mail
address:
> studs201…[is.na(studs201…$email), ]
Display, for each observation, if variables q1:q5 are
NA in data frame leadership:
> is.na(leadership[, 6:10])
        q1    q2    q3    q4    q5
[1,] FALSE FALSE FALSE FALSE FALSE
[2,] FALSE FALSE FALSE FALSE FALSE
[3,] FALSE FALSE FALSE FALSE FALSE
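The same three operations can be sketched on a small, self-contained data frame. The survey data frame and its column names below are hypothetical and stand in for the students and leadership data used on the slide.

```r
# Hypothetical survey data; names and values are illustrative only.
survey <- data.frame(
  id    = 1:4,
  email = c("a@example.com", NA, "c@example.com", NA),
  q1    = c(5, 3, NA, 4),
  q2    = c(4, NA, 2, 5),
  stringsAsFactors = FALSE
)

# One logical value per row: is the e-mail missing?
missing_email <- is.na(survey$email)

# Keep only the rows whose e-mail is missing.
no_email <- survey[missing_email, ]

# A logical matrix for a range of columns (here columns 3 to 4).
na_matrix <- is.na(survey[, 3:4])
na_matrix
```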
Counting NAs
Counting the number of NA values within an
entire data frame is possible with function sum
and the following syntax:
sum(is.na(the.data.frame))
How many NA values are there in data frame
leadership?
> sum(is.na(leadership))
[1] 2
How many NA values are there in data frame
comp?
> sum(is.na(comp))
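Counting works because is.na returns TRUE/FALSE values, which sum treats as 1/0. A minimal sketch on a made-up data frame (scores is hypothetical) also shows the per-column and per-row variants, which use colSums and rowSums on the same logical matrix.

```r
# Hypothetical data frame with three missing values.
scores <- data.frame(
  q1 = c(5, NA, 3),
  q2 = c(NA, NA, 4),
  q3 = c(1, 2, 3)
)

total_na  <- sum(is.na(scores))       # whole data frame: 3
na_by_col <- colSums(is.na(scores))   # per variable:     q1 = 1, q2 = 2, q3 = 0
na_by_row <- rowSums(is.na(scores))   # per observation:  1, 2, 0
```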
Function complete.cases
Display observations/rows which have at least one
NA:
> leadership[!complete.cases(leadership), ]
  manager date country gender age q1 q2 q3 q4 q5
(one row is returned; its q4 and q5 values are NA)
> comp[!complete.cases(comp), ]
Counting how many observations/rows have at
least one NA and how many have no NAs (are
complete cases):
> table(complete.cases(leadership))
FALSE  TRUE
    1     …
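A small sketch of complete.cases on made-up data follows; the staff data frame is hypothetical. complete.cases returns one logical value per row, so it can be used both to filter rows and, through table, to count them.

```r
# Hypothetical data frame: rows 2 and 3 each contain one NA.
staff <- data.frame(
  id  = 1:4,
  age = c(32, NA, 25, 39),
  q1  = c(5, 3, NA, 3)
)

ok <- complete.cases(staff)        # TRUE FALSE FALSE TRUE

incomplete_rows <- staff[!ok, ]    # rows with at least one NA
complete_rows   <- staff[ok, ]     # rows with no NA at all

counts <- table(ok)                # FALSE: 2, TRUE: 2
counts
```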
Counting/displaying NAs for variables and ranges
How many students have no e-mail address?
◦ Using sum...
> sum(is.na(studs201…$email))
◦ ...or table
> table(is.na(studs201…$email))
(both calls report the same number of missing e-mail addresses; table also counts the non-missing ones)
Display the observations in which at least one of
the variables q1:q5 is NA in data frame leadership:
> leadership[!complete.cases(leadership[6:10]), ]
  manager date country gender age q1 q2 q3 q4 q5
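Restricting complete.cases to a subset of columns means that NAs elsewhere in the row are ignored. The sketch below assumes a hypothetical staff data frame in which age has an NA that should not flag the row, because only q1 and q2 are checked.

```r
# Hypothetical data: row 2 has an NA in age, rows 3 and 4 in q1/q2.
staff <- data.frame(
  id  = 1:4,
  age = c(32, NA, 25, 39),
  q1  = c(5, 3, NA, 3),
  q2  = c(4, 5, 5, NA)
)

q_cols <- c("q1", "q2")

# Rows with at least one NA among the selected columns only.
incomplete_q <- staff[!complete.cases(staff[, q_cols]), ]
incomplete_q$id                      # 3 4 (row 2 is not flagged)

# An equivalent formulation that counts the NAs per row.
same_rows <- staff[rowSums(is.na(staff[, q_cols])) > 0, ]
identical(incomplete_q, same_rows)
```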
NULLs
Completely different from databases (NA is pretty
close to the concept of NULL in relational databases)
> v.with.null
[1] 1 2 3 5 6
NULLs (cont.)
> length(v.with.na)
[1] 6
> length(v.with.null)
[1] 5
> sum(v.with.na)
[1] NA
> sum(v.with.null)
[1] 17
In a data frame, a variable/column set to NULL also
disappears:
> names(adl2013_stud)
[1] "Nr" "NumePren" "Matricol" "Email"
> adl2013_stud$Nr <- NULL
> names(adl2013_stud)
[1] "NumePren" "Matricol" "Email"
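The contrast between NA and NULL can be reproduced end to end. In the sketch below the two vector definitions are assumptions (the assignments themselves are not shown on the slides), and the students data frame is a hypothetical stand-in for adl2013_stud.

```r
# Assumed definitions: the same values, with NA or NULL in fourth position.
v.with.na   <- c(1, 2, 3, NA, 5, 6)
v.with.null <- c(1, 2, 3, NULL, 5, 6)

length(v.with.na)              # 6: NA occupies a position
length(v.with.null)            # 5: NULL has vanished
sum(v.with.na)                 # NA: one unknown makes the sum unknown
sum(v.with.na, na.rm = TRUE)   # 17: NA removed before summing
sum(v.with.null)               # 17

# Hypothetical data frame: assigning NULL to a column removes it.
students <- data.frame(
  Nr    = 1:2,
  Name  = c("Ana", "Ion"),
  Email = c("a@example.com", NA),
  stringsAsFactors = FALSE
)
students$Nr <- NULL
names(students)                # "Name" "Email"
```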
Functions for managing string variables
Base R
◦ nchar
◦ substr
◦ strsplit
◦ paste
◦ paste0
◦ sprintf
Package stringr
◦ str_c(): string concatenation (base R: paste())
◦ str_length(): number of characters (base R: nchar())
◦ str_sub(): extracts substrings (base R: substring())
◦ str_dup(): duplicates characters (base R: strrep(), R 3.3.0 or later)
◦ str_trim(): removes leading and trailing whitespace (base R: trimws(), R 3.2.0 or later)
◦ str_pad(): pads a string (no direct base R equivalent; formatC() can pad to a fixed width)
◦ str_wrap(): wraps a string paragraph (base R: strwrap())
◦ word(): extracts words from a string (no base R equivalent)
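The base R functions listed above can be tried on a single string. This sketch uses base R only, so it runs without installing stringr; the value of full_name is made up, and the stringr names in the comments are the counterparts from the list.

```r
full_name <- "Popescu Ion"   # hypothetical value

n_chars <- nchar(full_name)                # 11  (stringr: str_length)
surname <- substr(full_name, 1, 7)         # "Popescu"  (stringr: str_sub)
parts   <- strsplit(full_name, " ")[[1]]   # "Popescu" "Ion"

# paste() separates with a space by default, paste0() with nothing.
reversed <- paste(parts[2], parts[1])      # "Ion Popescu"  (stringr: str_c)
ids      <- paste0("stud", 1:3)            # "stud1" "stud2" "stud3"

# sprintf() fills a template: %s for strings, %d for integers.
label <- sprintf("%s has %d characters", full_name, n_chars)

# trimws() drops leading and trailing whitespace (stringr: str_trim).
trimmed <- trimws("  padded  ")            # "padded"
```

Note that strsplit returns a list with one element per input string, which is why the first element is extracted with [[1]].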
Some web pages for processing strings in R
Gaston Sanchez - Handling and Processing Strings in R
http://gastonsanchez.com/Handling_and_Processing_Strings_in_R.pdf
John Myles
Regular expressions
Vital for text/string/web searching
Implemented in almost every programming language
In SQL the basic mechanism is rudimentary and based on operators: LIKE, ILIKE, SIMILAR TO
R has full support for regular expressions
◦ Functions in base R:
grep, grepl,
sub, gsub,
regexpr, gregexpr
◦ Functions in package stringr:
str_detect
str_extract, str_extract_all, str_match, str_match_all
str_locate, str_locate_all
str_replace, str_replace_all
str_split, str_split_fixed
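A short sketch of the base R functions named above follows. The strings and the deliberately simple e-mail pattern are illustrative assumptions; the pattern is not a complete e-mail validator.

```r
# Hypothetical input strings.
emails <- c("ana@example.com", "not an e-mail", "ion.pop@uni.example.org")

# Simplified pattern: some allowed characters, an @, then more characters.
pattern <- "^[[:alnum:]._-]+@[[:alnum:].-]+$"

is_email    <- grepl(pattern, emails)   # TRUE FALSE TRUE (one flag per string)
which_email <- grep(pattern, emails)    # 1 3 (positions of the matches)

# sub() replaces the first match, gsub() replaces every match.
domains   <- sub("^.*@", "", emails[is_email])   # "example.com" "uni.example.org"
txt       <- "room 12, floor 3"
no_digits <- gsub("[0-9]+", "", txt)             # "room , floor "

# regexpr() locates the first match, gregexpr() all matches;
# regmatches() extracts the matched text from those positions.
first_number <- regmatches(txt, regexpr("[0-9]+", txt))         # "12"
all_numbers  <- regmatches(txt, gregexpr("[0-9]+", txt))[[1]]   # "12" "3"
```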
Web sites/video tutorials for regular expressions (general and R-specific)
Basics of regular expressions (in general and in R):
http://www.rexegg.com/regex-quickstart.html
http://www.r-bloggers.com/regular-expression-and-associated-functions-in-r/
http://www.r-bloggers.com/r-talk-on-regular-expressions-regex/
Regular Expressions
https://www.youtube.com/watch?v=NvHjYOilOf8