Data Science: Advanced-R Boot Camp, Data Reshaping and Subsetting. Chuck Cartledge, PhD. 23 February 2020.

  • 1/17

    Data Science: Advanced-R Boot Camp
    Data Reshaping and Subsetting

    Chuck Cartledge, PhD

    23 February 2020

  • 2/17

    Intro. Simple ways Data frame reshaping Hands-on Q & A Conclusion References Files

    Table of contents (1 of 1)

    1 Intro.

    2 Simple ways

    3 Data frame reshaping

    4 Hands-on
      Looking at “old” data

    5 Q & A

    6 Conclusion

    7 References

    8 Files

    © Old Dominion University

  • 3/17

    What are we going to cover?

    We’re going to talk about moulding the data we have into data we want.

    Look at lots of different ways to modify data

    Look at lots of different ways to extract data

    Look at lots of different ways to reshape data

  • 4/17

    Atomic vectors, and lists

    Subscripts: positive, negative, ordered, duplicate, logical, named

    x
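
    The subscript types listed on this slide can be sketched as follows. The vector `x` and the list `lst` below are hypothetical stand-ins, since the slide's own example did not survive; only base R indexing is used.

    ```r
    # Hypothetical named vector used only for illustration
    x <- c(a = 2.1, b = 4.2, c = 3.3, d = 5.4)

    x[c(3, 1)]        # positive: positions 3 and 1, in that order
    x[-c(3, 1)]       # negative: everything except positions 3 and 1
    x[order(x)]       # ordered: elements sorted by value
    x[c(1, 1)]        # duplicate: the same position returned twice
    x[x > 3]          # logical: elements where the condition is TRUE
    x[c("a", "d")]    # named: elements selected by name

    # Lists accept the same subscripts; [ keeps the list, [[ extracts the element
    lst <- list(a = 1:3, b = "text")
    lst["a"]          # a list of length one
    lst[["a"]]        # the integer vector 1:3
    ```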

  • 5/17

    Data frames behave differently

    1 When subsetting with a single index, they behave like lists and index the columns, so df[1:2] selects the first two columns.

    2 When subsetting with two indices, they behave like matrices, so df[1:3, ] selects the first three rows (and all the columns).

    rm(list=ls())

    df
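
    A minimal sketch of the two behaviours described above, assuming a small made-up data frame (the column names `x`, `y`, and `z` are illustrative, not taken from the slide):

    ```r
    # Hypothetical data frame for illustration
    df <- data.frame(x = 1:4, y = 4:1, z = letters[1:4])

    cols <- df[1:2]           # one index: list-style, first two columns
    rows <- df[1:3, ]         # two indices: matrix-style, first three rows

    one_col_df  <- df["x"]    # list-style selection stays a data frame
    one_col_vec <- df[, "x"]  # matrix-style selection of one column simplifies to a vector
    ```

    The last two lines show a practical consequence of the difference: the single-index form always returns a data frame, while the two-index form drops to a vector when only one column is selected.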

  • 6/17

    Make things go away, or not

    Assigning the reserved value NULL to an element/dimension will remove that element/dimension

    To keep an element/dimension and store the reserved value NULL in it, enclose the NULL in a list()

    x
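
    The difference between removing an element and storing NULL in it can be sketched like this. The objects `x`, `y`, and `df` are hypothetical examples, not the slide's original code.

    ```r
    # Removing: assigning NULL deletes the element
    x <- list(a = 1, b = 2, c = 3)
    x[["b"]] <- NULL
    names(x)                  # "a" "c"

    # Keeping: wrapping NULL in list() stores it as the element's value
    y <- list(a = 1, b = 2, c = 3)
    y["b"] <- list(NULL)
    names(y)                  # "a" "b" "c"

    # The same removal works on a data frame column
    df <- data.frame(p = 1:3, q = 4:6)
    df$q <- NULL
    names(df)                 # "p"
    ```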

  • 7/17

    A lookup table based on abbreviations

    Use a look-up table to translate from abbreviation to full text. The look-up table has named entries that correspond exactly with the items to be “looked-up.” The entire column is returned for each matched entry.

    rm(list=ls())

    x
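
    One way to build such a look-up table is a named character vector indexed by the abbreviations. This is a sketch under assumed data: the codes and labels below are illustrative and are not the slide's original values.

    ```r
    # Hypothetical abbreviations to translate
    codes <- c("m", "f", "u", "f", "f", "m")

    # Named entries must match the abbreviations exactly
    lookup <- c(m = "Male", f = "Female", u = NA)

    # Indexing by name returns one entry per abbreviation;
    # unname() drops the abbreviation labels from the result
    full <- unname(lookup[codes])
    full
    ```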

  • 8/17

    cbind() and rbind()

    The simplest case is when we have two datasets with either identical columns (both the number and the names) or the same number of rows. In this case, either rbind() (identical columns) or cbind() (same number of rows) works great.

    sport
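
    A short sketch of both cases, using made-up data frames (`scores_a`, `scores_b`, and `grades` are hypothetical names chosen for illustration):

    ```r
    # Identical columns (number and names): stack rows with rbind()
    scores_a <- data.frame(id = 1:2, score = c(90, 85))
    scores_b <- data.frame(id = 3:4, score = c(70, 95))
    all_scores <- rbind(scores_a, scores_b)

    # Same number of rows: place columns side by side with cbind()
    grades <- data.frame(grade = c("A", "B", "C", "A"))
    with_grades <- cbind(all_scores, grades)
    ```

    Note that cbind() pairs rows purely by position, so it is only appropriate when both inputs are already in the same row order.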

  • 9/17

    Different types of data set joins

    Generally there are seven different types:

    1 Inner: rows whose keys are common to both sets

    2 Left outer: all rows in left, with matching data from right where it exists

    3 Right outer: all rows in right, with matching data from left where it exists

    4 Full outer: all rows in right and left, matched where keys are common

    5 Right anti: rows in right that have no match in left

    6 Left anti: rows in left that have no match in right

    7 Anti inner: rows in left or right that have no match in the other set (full outer minus inner)

    Image from [1].

  • 10/17

    Same image.

    Image from [1].

  • 11/17

    Translate SQL joins into R

    Generally there are seven different types:

    1 Inner

    2 Left outer

    3 Right outer

    4 Full outer

    5 Right anti

    6 Left anti

    7 Anti inner

    See code in attached "snippet" file.

    Ideas from [2].
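
    The attached snippet file is not reproduced here. As a hedged sketch of how the seven join types might be written in base R, the following uses merge() and %in% on two made-up data frames; `left`, `right`, and the column names are assumptions for illustration only.

    ```r
    # Hypothetical inputs sharing a "key" column
    left  <- data.frame(key = c(1, 2, 3), l_val = c("a", "b", "c"))
    right <- data.frame(key = c(2, 3, 4), r_val = c("x", "y", "z"))

    inner       <- merge(left, right, by = "key")                # keys 2, 3
    left_outer  <- merge(left, right, by = "key", all.x = TRUE)  # keys 1, 2, 3
    right_outer <- merge(left, right, by = "key", all.y = TRUE)  # keys 2, 3, 4
    full_outer  <- merge(left, right, by = "key", all = TRUE)    # keys 1, 2, 3, 4

    # The anti joins have no merge() argument; filter on key membership instead
    left_anti  <- left[!left$key %in% right$key, ]               # key 1
    right_anti <- right[!right$key %in% left$key, ]              # key 4
    anti_inner <- full_outer[!full_outer$key %in% inner$key, ]   # keys 1, 4
    ```

    Unmatched columns in the outer joins are filled with NA, which is how the rows that appear in only one input can be recognised afterwards.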

  • 12/17

    Extracting data from data frame

    Lots of different ways:

    Direct selection based on logicals

    base::subset(x, subset, select, drop = FALSE, ...) uses a logical expression to select rows, and select to choose columns

    base::transform(data, ...) creates new columns or modifies existing ones

    plyr::arrange(df, ...) orders the rows of a data frame by its columns

    reshape2::melt(data, ...) generic function that calls specific methods based on data type

    reshape2::dcast(data, formula, fun.aggregate, ...) casts “melted” data into a data frame

    ChickWeight[(ChickWeight$Diet == 4) & (ChickWeight$Time == 21), ]

    subset(ChickWeight, Diet == 4 & Time == 21)

    subset(airquality, Temp > 80, select = c(Ozone, Temp))

    with(airquality, subset(Ozone, Temp > 80))

    transform(airquality, new = -Ozone, Temp = (Temp - 32) / 1.8)

    library(plyr)  # provides arrange() and desc()

    arrange(mtcars, cyl, disp)

    arrange(mtcars, cyl, desc(disp))

    See embedded file.
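
    The examples on this slide do not exercise melt() or dcast(). A minimal sketch of the pair, assuming the reshape2 package is installed and using the built-in airquality data, might look like this (the names `aq_long` and `aq_wide` are illustrative):

    ```r
    library(reshape2)  # assumed to be installed; provides melt() and dcast()

    # Wide to long: one row per (Month, Day, variable) measurement,
    # dropping missing values
    aq_long <- melt(airquality, id.vars = c("Month", "Day"), na.rm = TRUE)

    # Long to wide: one row per Month, one column per measured variable,
    # aggregating the daily values with mean()
    aq_wide <- dcast(aq_long, Month ~ variable, fun.aggregate = mean)
    ```

    The formula puts the identifier to keep as rows on the left and the column that supplies the new column names on the right.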

  • 13/17

    Looking at “old” data

    With the Motor Trend cars dataset:

    Write a script that:

    Converts the mtcars dataset wt column into pounds

    Identifies the most fuel efficient vehicle by transmission type and number of carburetors

    Creates a data frame with all the column data for the vehicles identified in the previous requirement
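
    One possible approach to the exercise, offered as a sketch rather than the intended solution. It assumes that wt is recorded in units of 1000 lb (as stated in the mtcars documentation), that "most fuel efficient" means highest mpg, and that ties within a group should all be kept; `cars`, `best_mpg`, and `most_efficient` are hypothetical names.

    ```r
    # Requirement 1: convert wt from units of 1000 lb to pounds
    cars <- transform(mtcars, wt = wt * 1000)

    # Requirement 2: highest mpg within each transmission type (am)
    # and carburetor count (carb); ave() returns the group maximum per row
    best_mpg <- ave(cars$mpg, cars$am, cars$carb, FUN = max)

    # Requirement 3: all columns for the vehicles that reach their group maximum
    most_efficient <- cars[cars$mpg == best_mpg, ]
    most_efficient[order(most_efficient$am, most_efficient$carb), ]
    ```

    The same result could be reached with aggregate() followed by merge(); ave() is used here because it keeps the row names, which hold the vehicle names.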

  • 14/17

    Q & A time.

    Q: What was the greatest achievement in taxidermy?

    A: The Royal Canadian Mounted Police.

  • 15/17

    What have we covered?

    Looked at different ways to create data frames, from raw data or other data frames

    Looked at different functions that do the same things

    Next: String manipulations

  • 16/17

    References (1 of 1)

    [1] deadman87, Sql joins explained (x-post r/sql), https://www.reddit.com/r/programming/comments/1xlqeu/sql_joins_explained_xpost_rsql/, 2014.

    [2] Dan Goldstein, How to join (merge) data frames (inner, outer, left, right), https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right, 2009.

  • 17/17

    Files of interest

    1 Code snippets

    ## First codes

    rm(list=ls())

    x