1
DBAs and R - At the Intersection of Oracle and Unstructured Data
2
Introduction
3 3
Today’s Objectives
§ Understand the R language.
§ Understand data visualization and its value.
§ Learn the basic constructs of R.
§ See R in action via a demo.
§ Learn how Oracle is integrating R into its relational database product line.
4 4
Robert Dawson – Meta7
§ Oracle Master Consultant, Meta7
§ AVP, Enterprise Databases, Oppenheimer Funds, Denver, CO
§ Oracle DBA, Janus Capital, Denver, CO
§ Oracle Application DBA, Blue Cross Blue Shield, Denver, CO
5 5
Ashokkumar Sivasankaran – ACXIOM
§ Senior Team Leader and Database Architect, Acxiom ITO
§ OCE RAC Expert & OCP Database Administrator 7.3 to 11g
§ ITIL V3 Foundation Certified
§ Member, Chicago Oracle User Group
§ Chicago “RAC Attack” Instructor
6
About You.
7 7
About You. How do you learn?
Verbal Learner: Do you like to read and access content on media?
Visual Learner: Do you like to digest information from charts, diagrams, timelines or maps?
Kinesthetic Learner: Do you enjoy hands-on activities involving movement?
8 8
Think about these three questions:
1. What is your learning style?
2. What is the learning style of your boss?
3. What is the learning style of your “customer”?
9
DBAs and R - At the Intersection of Oracle and Unstructured Data
10 10
The United States of Data(bases)
The Mainframe Colonies
The Relational Heartland
The NoSQL Outpost
The Hadoop States
Somewhere at the Intersection of Relational and Unstructured...
11
The Big Data Story
12 12
The Big Data Landscape
13 13
R is NOT a Big Data Tool
1. It’s a Data Tool.
2. Leveraged by Data Scientists, Analysts, Developers, Engineers, Planners and Researchers.
3. Open Source.
4. Processes large sets of data fast!
14 14
What Data Tools are we using?
15 15
Telling the Visual Data Story...
“The most common data display is a noun accompanied by a number. For example, a medical patient's current level of glucose is reported in a clinical record as a word and number.” – Edward Tufte
Source: http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR
16
Stay Relevant.
17 17
The Data-Driven Organization
Warby Parker - New York Online Sunglass Company
www.warbyparker.com
Carl Anderson @LeapingLlamas
18 18
The Data-Driven Organization
“People want to move from a culture of reporting to a culture of analytics”
19 19
Are you a Data Driven DBA?
Data questions about your Oracle databases:
1. What are the average IOPS for your databases during peak load?
2. How many average active sessions does your primary DB support?
3. What is the typical HCC compression ratio for your Exadata storage?
4. How many executions does your top SQL complete every day?
5. What is the average DOP for your SQL statements?
6. What are your average and max CPU utilization?
7. How many hours a month do you spend performing refreshes?
8. How many Oracle cores are you utilizing today?
How are you making your next hardware purchase decision?
How do you know you are ready to expand?
20 20
What’s your learning Style?
Verbal Learner
Visual Learner
Kinesthetic learner
You | Boss | Customer
21 21
Traditional DBA Reporting Tools
§ AWR Reports
§ Vendor Tools
§ OEM Graphics
§ ADDM Report
22 22
R – AWR Reporting PDF File Load.
23
R In Action: Demo
24 24
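R Language Basics
§ Descended from the S language developed at Bell Labs; R itself was created in the early 1990s by Ross Ihaka and Robert Gentleman at the University of Auckland
§ Open Source
§ Runs on Windows, OS X, Linux, Unix
§ Interpreted language, not compiled
§ Session-based
§ http://www.r-project.org/
§ 5,000 packages available. CRAN: Comprehensive R Archive Network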
25 25
Key Components of R
1. Simple Data: Vectors
2. Compound Data Stored in: data.frame, matrix, list
3. Functional Programming
4. Shared Code: Packages
5. Graphic Packages: qplot(), ggplot(), hist()
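The components listed above can be sketched in a few lines of base R. This is a minimal illustration, not part of the original demo: the numbers are borrowed from the first six rows of the deck's AWR sample, and the object names (`waits`, `bundle`, `col_types`) are made up for the example.

```r
# Simple data: an atomic vector (values from the demo's WAIT_COUNT column).
wait_count <- c(3, 1, 255, 23, 33, 100)

# Vectorized arithmetic: no explicit loop is needed.
share <- wait_count / sum(wait_count)

# Compound data: a data.frame holds equal-length columns of different types.
waits <- data.frame(
  event_name = c(rep("db file scattered read", 2),
                 rep("db file sequential read", 4)),
  wait_time_milli = c(1, 2, 1, 2, 4, 8),
  wait_count = wait_count
)

# A matrix holds a single type in two dimensions.
m <- matrix(1:6, nrow = 2)

# A list can mix anything, including other structures.
bundle <- list(name = "awr", data = waits, dims = dim(m))

# Functional programming: functions are values, so class() can be
# passed to sapply() and applied to every column.
col_types <- sapply(waits, class)
print(col_types)
```

Packages and graphics (items 4 and 5) build on the same structures: a plotting function typically takes a vector or data.frame like the ones above as input.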
26 26
Things to Remember about R-Basics
R is not Perl, Sed or Awk.
1. Data.Frames = Tables
2. Package-based.
3. Use help()
4. Graphics are ‘Packages’
27 27
R Development Tools
28 28
Demo: Data Load from Excel File (5000 rows)
> library(xlsx)  # provides read.xlsx2()
> awr_data <- read.xlsx2("awr-io-waits.csv.xlsx", 1,
+     colClasses = c(snap_id="numeric", wait_class="character", event_name="character",
+                    wait_time_milli="numeric", wait_count="numeric"))
> str(awr_data)
'data.frame': 5000 obs. of 5 variables:
 $ SNAP_ID        : num 8195 8195 8195 8195 8195 ...
 $ WAIT_CLASS     : Factor w/ 3 levels "Commit","System I/O",..: 3 3 3 3 3 3 3 3 3 2 ...
 $ EVENT_NAME     : Factor w/ 4 levels "db file scattered read",..: 1 1 2 2 2 2 2 2 2 3 ...
 $ WAIT_TIME_MILLI: num 1 2 1 2 4 8 16 32 64 1 ...
 $ WAIT_COUNT     : num 3 1 255 23 33 100 118 70 16 585 ...
Key things to remember:
✓ Columns are variables
✓ Rows are observations
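For readers without the spreadsheet, the same kind of load can be tried with base R alone. The sketch below is an assumption-laden stand-in for the demo: it reads inline CSV text (only the six rows shown in the deck's head() output) with read.csv() instead of read.xlsx2(), and the names `csv_text` and `awr_sample` are invented for the example.

```r
# Six rows copied from the deck's head(awr_data) output, as inline CSV text.
csv_text <- "SNAP_ID,WAIT_CLASS,EVENT_NAME,WAIT_TIME_MILLI,WAIT_COUNT
8195,User I/O,db file scattered read,1,3
8195,User I/O,db file scattered read,2,1
8195,User I/O,db file sequential read,1,255
8195,User I/O,db file sequential read,2,23
8195,User I/O,db file sequential read,4,33
8195,User I/O,db file sequential read,8,100"

# colClasses is matched by column name here, so the names must match the
# header exactly (upper case in this sample).
awr_sample <- read.csv(
  text = csv_text,
  colClasses = c(SNAP_ID = "numeric", WAIT_CLASS = "factor",
                 EVENT_NAME = "factor", WAIT_TIME_MILLI = "numeric",
                 WAIT_COUNT = "numeric")
)

# Columns are variables, rows are observations.
str(awr_sample)
```

Declaring the column classes up front avoids a later conversion step and makes it obvious which columns are categories (factors) and which are measurements.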
29 29
Demo: Data Head
> head(awr_data)
  SNAP_ID WAIT_CLASS              EVENT_NAME WAIT_TIME_MILLI WAIT_COUNT
1    8195   User I/O  db file scattered read               1          3
2    8195   User I/O  db file scattered read               2          1
3    8195   User I/O db file sequential read               1        255
4    8195   User I/O db file sequential read               2         23
5    8195   User I/O db file sequential read               4         33
6    8195   User I/O db file sequential read               8        100
30 30
Demo: Simple Table Group by w/ Pie Graph
§ > table(awr_data$WAIT_CLASS)
    Commit System I/O   User I/O
      1197       1378       2425
§ > pie(table(awr_data$WAIT_CLASS))
31 31
Demo: Group by Pie Chart > table(awr_data$EVENT_NAME)
db file scattered read db file sequential read log file parallel write log file sync
714 1711 1378 1197
> pie(table(awr_data$EVENT_NAME))
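For a DBA, table() reads most naturally as a COUNT(*) ... GROUP BY. The sketch below makes that mapping explicit on a six-row stand-in for the demo data and adds the SUM() equivalent with aggregate(). The data frame `waits` and the result names are illustrative assumptions, not objects from the original session.

```r
# Six-row stand-in for awr_data (values from the deck's head() output).
waits <- data.frame(
  EVENT_NAME = factor(c(rep("db file scattered read", 2),
                        rep("db file sequential read", 4))),
  WAIT_COUNT = c(3, 1, 255, 23, 33, 100)
)

# Roughly: SELECT event_name, COUNT(*) FROM waits GROUP BY event_name
counts <- table(waits$EVENT_NAME)

# Roughly: SELECT event_name, SUM(wait_count) FROM waits GROUP BY event_name
totals <- aggregate(WAIT_COUNT ~ EVENT_NAME, data = waits, FUN = sum)

# Share of rows per event, as percentages (what the pie chart visualizes).
pct <- round(100 * prop.table(counts), 1)

print(counts)
print(totals)
print(pct)
```

Note that table() counts rows, not waits; when the question is "which event waited most often", the aggregate() sum over WAIT_COUNT is usually the number to chart.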
32 32
Demo: Table Graphic Plot
> plot(awr_data$EVENT_NAME)
33 33
Demo: Table Plot Multicolumn.
> plot(awr_data$EVENT_NAME,awr_data$WAIT_COUNT)
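The two plot() calls in these demos produce different chart types because plot() dispatches on the class of its first argument: a factor alone gives a bar chart of level counts, and a factor paired with a numeric vector gives one box plot per level. The sketch below reproduces both on a six-row stand-in (the `waits` data frame is an assumption for illustration) and draws to a null PDF device so it runs without a display.

```r
# Six-row stand-in for awr_data; EVENT_NAME must be a factor for this dispatch.
waits <- data.frame(
  EVENT_NAME = factor(c(rep("db file scattered read", 2),
                        rep("db file sequential read", 4))),
  WAIT_COUNT = c(3, 1, 255, 23, 33, 100)
)

pdf(NULL)  # null graphics device: nothing is written to disk

# Factor only: bar chart of how many rows fall in each level.
plot(waits$EVENT_NAME)

# Factor plus numeric: one box plot of WAIT_COUNT per event.
plot(waits$EVENT_NAME, waits$WAIT_COUNT)

# The same box plot via the formula interface; the return value holds
# the statistics that were drawn.
box <- boxplot(WAIT_COUNT ~ EVENT_NAME, data = waits)

invisible(dev.off())

print(box$names)  # group labels, in factor-level order
print(box$n)      # number of observations per group
```

If a column was loaded as character rather than factor, wrapping it in factor() before plotting restores this behavior.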
34
Oracle R Extension: Use R with Oracle
35 35
Some Limitations Data Analysts Face with R
1. Memory-based processing.
2. Data extraction is time-consuming and painful!
3. Data security is not included in the program.
4. Programming is “ad hoc”, not “production-ready”.
5. Users are typically not “IT”.
Oracle doesn’t have these limitations.
36 36
The Oracle R Products
Oracle R Distribution
Oracle R Enterprise (Oracle Advanced Analytics option)
Oracle R Advanced Analytics for Hadoop (Big Data Connectors)
ROracle (Package)
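As a rough idea of what the ROracle package makes possible, the sketch below defines a helper that would pull wait-histogram rows straight from the database instead of going through a spreadsheet. Several things here are assumptions rather than content from the slides: the helper name `fetch_event_histogram`, the choice of the DBA_HIST_EVENT_HISTOGRAM view as the source of the demo data, and the connection parameters. The block only defines the function; nothing connects until it is called with real credentials on a machine with ROracle and an Oracle client installed.

```r
# Hypothetical helper: fetch one AWR snapshot's wait histogram via ROracle.
# Sketch only; defining it does not open any connection.
fetch_event_histogram <- function(username, password, dbname, snap_id) {
  if (!requireNamespace("ROracle", quietly = TRUE)) {
    stop("The ROracle package (and an Oracle client) must be installed")
  }

  con <- DBI::dbConnect(ROracle::Oracle(),
                        username = username,
                        password = password,
                        dbname = dbname)
  # Always release the connection, even if the query fails.
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  # Assumed source view; querying it requires the appropriate license
  # and privileges.
  sql <- "SELECT snap_id, wait_class, event_name, wait_time_milli, wait_count
            FROM dba_hist_event_histogram
           WHERE snap_id = :1"

  # Bind the snapshot id rather than pasting it into the SQL text.
  DBI::dbGetQuery(con, sql, data = data.frame(snap_id = snap_id))
}
```

A call such as `awr_data <- fetch_event_histogram("scott", "...", "orcl", 8195)` (placeholder credentials) would return a data.frame with the same five columns the demo loads from Excel.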
37 37
R on TechNet
38