Big Data: Data Analysis Boot Camp
Introduction and Overview
Chuck Cartledge, PhD
29 March 2019
© Old Dominion University


  • 1/15

    Introduction Overview Administrivia Process Overview Q & A Conclusion References Files Vita

    Big Data: Data Analysis Boot Camp
    Introduction and Overview

    Chuck Cartledge, PhD

    29 March 2019

    © Old Dominion University

  • 2/15


    Table of contents (1 of 1)

    1 Introduction
        The global view
    2 Overview
        The world from 50,000 feet.
        Text
    3 Administrivia
        Miscellaneous and necessary things
    4 Process Overview
    5 Q & A
    6 Conclusion
    7 References
    8 Files
    9 Vita

  • 3/15


    The global view

    Big Data: Data Analysis Boot Camp

    We will cover aspects common to all Big Data investigations, including: defining Big Data, surveying tools and techniques for processing Big Data, and visualizing selected aspects of Big Data.

    The emphasis of the camp is on understanding what Big Data data analysis is, beyond the marketing hype of the 3Vs of volume, variety, and velocity.

    Image from [1].

    More detailed information at: https://www.odu.edu/cepd/bootcamps/data-analysis

  • 4/15


    The world from 50,000 feet.

    Things we’ll be covering over the next three days:

    Friday
    1 Administrivia
    2 What is BD?
    3 What is R?
    4 Looking at the built-in iris and Titanic datasets

    Saturday
    1 Visualizing data with different packages
    2 Exploring cluster analysis (of different types)
    3 Linear regression and some variants
    4 Classification techniques
    5 Text analysis
    6 Serial vs. parallel processing

    Sunday
    1 R limitations
    2 R and Hadoop
    3 R and SQL and No-SQL DBMs
    4 Hands-on with real-world crime data
    5 Wrap-up

  • 5/15


    Text

    Not required reading, but referenced throughout our time together.

    Learning Predictive Analytics with R (LPAR)

    Big Data Analytics with R (BDAR)

    Not necessary, but really helpful.

  • 6/15


    Text

    Code samples

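    There are lots. And they look like this:

    # Load the package that provides the example datasets
    library(cluster.datasets)
    # Make the 1970 US city crime dataset available
    data(all.us.city.crime.1970)
    crime = all.us.city.crime.1970
    # Pairwise scatter plots of columns 5 through 10
    plot(crime[5:10])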

    Available in a separate file embedded in each presentation.


  • 7/15


    Miscellaneous and necessary things

    All things related to paperwork.

    Parking – front and back without permits
    Breaks – yes, we’ll have them.
    Lunch – yes, places nearby: right at the main light to “fast food”
    Text books – recommended but not necessary; have good ideas, techniques
    Non-credit option
    Credit option – two additional assignments
    Hours – 9AM to 5PM with a break for lunch
    Sunday access – yes, check in with security
    Soft copies – all presentations and software are available
    Computer logins and passwords – will be coordinated
    Break room – across hall
    Bathrooms – around elevator

    Other things as well.

  • 8/15


    Miscellaneous and necessary things

    Soft copies available from Internet

    All information (presentations, scripts, and data) is available on your VM desktop (static)

    All information is available via the Internet (dynamic)

    Errata updated nightly

    I’m not a web designer, nor do I play one on TV.

    http://www.cs.odu.edu/~ccartled/Teaching/

  • 9/15


    Miscellaneous and necessary things

    Same image.

    http://www.cs.odu.edu/~ccartled/Teaching/


  • 10/15


    How do Data Wrangling, Analysis, and Visualization fit together?

    Notionally, there are three distinct phases in data analysis.

    1 Data wrangling – getting the raw data into a usable form

    2 Data analysis – evaluating and understanding the data

    3 Data visualization – presenting the analytical results in an intelligible manner

    Management continues across all phases. The other phases may overlap.
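
The three phases above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not material from the camp: it uses R's built-in iris dataset, and the variable names (raw, clean, measurements, species_means) are hypothetical.

```r
# Illustrative only: the three phases applied to the built-in iris data.
# All variable names here are hypothetical.

# 1. Data wrangling: keep complete rows and pick the numeric measurements.
raw <- iris
clean <- raw[complete.cases(raw), ]
measurements <- clean[, c("Sepal.Length", "Sepal.Width",
                          "Petal.Length", "Petal.Width")]

# 2. Data analysis: mean of each measurement, grouped by species.
species_means <- aggregate(measurements,
                           by = list(Species = clean$Species),
                           FUN = mean)

# 3. Data visualization: bar chart of mean petal length per species.
barplot(species_means$Petal.Length,
        names.arg = as.character(species_means$Species),
        ylab = "Mean petal length (cm)",
        main = "iris: mean petal length by species")
```

In practice the phases overlap: a first plot often reveals problems that send you back to the wrangling step.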


  • 11/15


    Q & A time.

    Q: What is the square root of 4b²?
    A: To be or not to be.

  • 12/15


    What have we covered?

    Where we are.
    Where we’re going.
    How we’ll get there.

    Now!! On to exploring the world of Big Data!


  • 13/15


    References (1 of 1)

    [1] Vangie Beal, Big Data, https://www.webopedia.com/TERM/B/big_data.html, 2017.

  • 14/15


    Files of interest

    1 Code snippets


  • 15/15


    Who am I?

    Father

    Husband (only 42 years, but it seems longer)

    PhD, Computer Science, 2014

    CAPT, USN retired 2004 (31+ years)

    Professional software developer (38 years)

    A perennial student

    1st computer: 1970, donated ICBM guidance computer, machine code, paper/mylar tape, and drum memory

    Interests: autonomic systems, real-time applications, distributed processing, long-term preservation of digital data, Big Data
