Hello, and welcome to this online, self-paced lesson entitled Producing Graphs in R. This session is part of an 8-lesson tutorial series on Oracle R Enterprise.

My name is Brian Pottle. I will be your guide for the next 45 minutes of interactive lecture and review on this lesson.

ORE - Producing Graphs - 1

Before we begin, now might be a good time to take a look at some of the features of this Flash-based course player. Feel free to skip this slide and start the lecture if you’ve attended similar Oracle Self Study courses in the past.

To your left, you will find a hierarchical course outline. This course enables and even encourages you to go at your own pace, which means you are free to skip over topics you already feel confident about, jump right to a feature that really interests you, or go back and review topics that were already covered. Simply click on a course section to expand its contents and then select an individual slide. However, note that by default we will automatically walk you through the entire course without requiring you to use the outline.

Standard Flash player controls are also found at the bottom of the player, including pause, previous, and next buttons. There is also an interactive progress bar to fast forward or rewind the current slide. Interactive slides may have additional controls and buttons along with instructions on how to use them.

Also found at the bottom of the player is a panel containing any additional reference notes for the current slide. Feel free to read these reference notes at the conclusion of the course, in which case you can minimize this panel and restore it later. Or, if you prefer, you can pause and read them as we go along.

Various handouts may be available from the Attachments button, including the audio narration scripts for this course.

The course will now pause, so feel free to take some time and explore the interface. Then when you’re ready to continue, click the next button below or alternatively click the Module 1 slide in the course outline at left.

Producing Graphs in R is the fourth lesson of eight self-study sessions on Oracle R Enterprise.

Let’s take a look at the topics for this lesson.

This lesson includes four topics:

• First, you’ll learn how to create common graph types, including scatter plots, pie charts, histograms, and box plots. You also learn how to annotate graphs with labels and legends. Finally, you learn how to write graphs to a file.

• Second, you’ll learn about a few more advanced graph types, including perspective graphs, plotting linear models, smooth scatter plots, quality control charts, and Pareto charts.

• Third, you’ll learn about the structure of the graphic packages that are available to R, and examine some of the functionality associated with a few of these packages.

• Finally, you’ll examine some of the graphics functions that are available through ORE.

So first, an overview of some of the common graph types.

As mentioned in a previous lesson, data graphics and visualization help convey information faster than other means.

You can explore many online resources about R graphics:

• Perform a web search on “R graphics.”

• Visit the link that is shown here.

• Use the many resources available via the R help system.

Now let’s take a look at some specific examples of graphs that you can create in R.

First, let’s look at a simple scatter plot.

We’re plotting two sets of points, pressure versus temperature, by using a data set named “pressure” that comes with R.

• First, we type plot(pressure) at the command line.

• Then, we use the text() function to specify a caption for the graph. The text is centered on the x and y coordinates that are passed to text().

R automatically takes care of the axis labels for the scatter plot graph. Later, we’ll examine how you can customize the labels.

You can also try plotting with different types.

• “l” is for line.

• “b” is for both line and point.

• “h” is for histogram-like vertical lines.

As noted in the comment in the first example, “p” is the default type, and it stands for point.
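
The scatter plot steps above can be sketched as follows. This is a minimal sketch rather than the code on the slide: the caption text and its position (the midpoint of each axis range, stored in the hypothetical variables mid_x and mid_y) are assumptions.

```r
# 'pressure' ships with R: temperature versus vapor pressure of mercury.
plot(pressure)                       # type = "p" (points) is the default

# Assumed caption position: the midpoint of each axis range.
mid_x <- mean(range(pressure$temperature))
mid_y <- mean(range(pressure$pressure))
text(mid_x, mid_y, labels = "Pressure versus temperature")  # assumed caption

# The other plot types mentioned in the narration.
plot(pressure, type = "l")           # line
plot(pressure, type = "b")           # both line and point
plot(pressure, type = "h")           # histogram-like vertical lines
```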

For simple pie charts, let’s look at sales percentage figures that are assigned to the paint.sales variable.

Then, use the pie() function to create the pie chart.

• The first pie chart uses a gray-scale sequence with seven shades of gray, ranging from .5 to 1.0. This sequence is specified in the col parameter.

• Next, assign explicit colors to the values in paint.sales by using the names() function. In this example, red is assigned to the first value, green to the second value, and so on.

• Then, in the second pie chart, use the seven colors that were assigned to the values when drawing the chart.

• In the last chart, generate colors by using the rainbow() function and specifying 12 colors.
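
Here is one way the three pie charts might be written. The sales figures and the color names after red and green are invented for illustration; only the function names come from the narration.

```r
# Hypothetical sales shares for seven paint colors (values are assumptions).
paint.sales <- c(0.25, 0.20, 0.15, 0.15, 0.10, 0.10, 0.05)

# First chart: seven shades of gray ranging from 0.5 to 1.0.
pie(paint.sales, col = gray(seq(0.5, 1.0, length.out = 7)))

# Name each value after a color, then reuse the names as fill colors.
names(paint.sales) <- c("red", "green", "blue", "yellow",
                        "orange", "purple", "brown")
pie(paint.sales, col = names(paint.sales))

# Last chart: colors generated by rainbow(); only the first seven are used.
pie(paint.sales, col = rainbow(12))
```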

You can customize graphs in R in various ways.

One way is labeling, where you can specify the main title, the subtitle, the axis labels, and the range for each axis.

The legend() function has many options. See help on the legend() function for its various options, as shown in the slide.

From a performance perspective, consider the amount of data that you provide when creating graphs:

• If you have lots of data, R can take a long time to plot the graph.

• In addition, when the graph is rendered, overplotting of data points can make the graph too dense to be useful, especially with scatter plots. Sampling is a common technique to avoid overplotting; otherwise, your graph may end up filled with points that shed no light on patterns in the data.

• You may also want to consider sampling or filtering data for a better view.

So, now let’s look at a few examples of labeling graphs.

Here, let’s look at a few labeling examples with histograms.

The code at the top uses the stats package to draw 1000 random values from a chi-square distribution with 3 degrees of freedom. The result is stored in the “x” vector.

Then, we create histograms of this data.

In the first histogram example, the hist() function:

• Takes the vector x

• Specifies density (not frequency)

• And also specifies the range of “y” values to be from 0 to .25

In the second histogram example, the hist() function takes an additional parameter, adding “My New Histogram” as the main title to the graph.

By contrast, in the first graph, notice the default title of “Histogram of x,” where x is simply the name of the first argument provided to the hist() function.

Building on the previous example, this third histogram adds:

• A subtitle

• Specific labels for the x and y axes

• A fitted curve for the chi-square distribution

• And a legend

As you can see, you can customize graphs in a variety of ways.
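
The three histogram variations can be sketched like this. The seed, subtitle, axis labels, and legend text are assumptions; the slide code may differ in its details.

```r
set.seed(1)                          # assumed seed, for reproducibility only
x <- rchisq(1000, df = 3)            # 1000 chi-square values, 3 degrees of freedom

# First histogram: density instead of frequency, y range from 0 to 0.25.
hist(x, freq = FALSE, ylim = c(0, 0.25))

# Second histogram: the same call with an explicit main title.
hist(x, freq = FALSE, ylim = c(0, 0.25), main = "My New Histogram")

# Third histogram: subtitle, axis labels, density curve, and legend.
hist(x, freq = FALSE, ylim = c(0, 0.25),
     main = "My New Histogram",
     sub  = "1000 random chi-square values",     # assumed subtitle
     xlab = "Value", ylab = "Density")           # assumed axis labels
curve(dchisq(x, df = 3), add = TRUE, col = "red", lwd = 2)
legend("topright", legend = "Chi-square density, df = 3",
       col = "red", lwd = 2)
```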

This example shows a scatter plot matrix, including a linear model line and a lowess curve. In data analysis, lowess is a technique for producing a “smooth” set of values from a scatter plot between two variables.

Using the pairs() function, identify the prestige, income, and education columns from the Duncan data set, which we have read in and attached to our session.

Inside the pairs() function, use the panel argument, which enables you to specify how you want each nondiagonal panel drawn. In this case, draw:

• The points

• The corresponding linear model associated with the points, and

• The lowess curve

For the diagonal panels, specify a histogram of the values for each column. Turn off the axes, because the pairs() function already accounted for them.

The pairs() function is a good example of a function that allows other functions to be passed in. As you have learned, functions in R are just objects that may be passed to other functions.
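
To make the idea of passing functions into pairs() concrete, here is a hedged sketch. The Duncan data set is not part of base R, so three columns of the built-in mtcars data stand in for it; panel_fit and panel_hist are hypothetical helper names, and the histogram panel follows the pattern used in the pairs() help page rather than the slide code.

```r
# Off-diagonal panels: points, a linear model line, and a lowess curve.
panel_fit <- function(x, y, ...) {
  points(x, y, ...)
  abline(lm(y ~ x), col = "red")
  lines(lowess(x, y), col = "blue", lty = "dashed")
}

# Diagonal panels: a histogram scaled to fit the panel, drawn without axes.
panel_hist <- function(x, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))            # restore the panel coordinates afterwards
  par(usr = c(usr[1:2], 0, 1.5))
  h <- hist(x, plot = FALSE)
  heights <- h$counts / max(h$counts)
  n <- length(h$breaks)
  rect(h$breaks[-n], 0, h$breaks[-1], heights, col = "gray")
}

# Assumption: mtcars columns replace prestige, income, and education.
pairs(mtcars[, c("mpg", "hp", "wt")],
      panel = panel_fit, diag.panel = panel_hist)
```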

The next slide examines a box plot graph. But first, let’s discuss the structure of this graph type.

A box plot provides a compact representation of various aspects of a variable’s distribution. It:

• Facilitates comparison among multiple variables, and

• Depicts a limited number of quantiles to summarize the variable

The basic elements of the box plot include:

• The interquartile range (IQR), which measures the spread of the distribution between the first and third quartiles. This range represents the middle 50% of the data.

• The median position, or middle value, which indicates the skew of the data. This position shows whether the data is more concentrated toward the high end or the low end. In this example, the median sits closer to the third quartile, which means that the values between the median and the third quartile are packed more tightly than the values between the first quartile and the median, so the longer tail is at the low end.

• The notch, which provides an approximate 95% confidence interval around the median. It can help you see whether two variables are likely to have truly different median values.

• Outliers, which are indicated by dots above or below the whiskers. These are the values that lie more than 1½ times the IQR beyond the first or third quartile.
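
The box plot elements can be checked by hand on a small vector. The data values below are invented for illustration, and the variable names are hypothetical. Note that boxplot() works from hinges, which can differ slightly from quantile() results on other data.

```r
# Hypothetical sample with one extreme value.
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)

q   <- quantile(x, c(0.25, 0.50, 0.75))   # first quartile, median, third quartile
iqr <- q[[3]] - q[[1]]                    # interquartile range

# Values beyond 1.5 times the IQR from the quartiles are drawn as outliers.
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[3]] + 1.5 * iqr
outliers <- x[x < lower_fence | x > upper_fence]

boxplot(x, horizontal = TRUE)             # the outlier appears as a separate dot
```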

Now, let’s look at a graph that uses this representation.

Here is a box plot example that looks at tooth growth in guinea pigs, depending on their dosage of ascorbic acid versus orange juice. This example is available in help for the boxplot() function, as indicated at the top of the slide.

Some key points in this example are as follows:

Two parameters control the layout of the boxes:

• The boxwex parameter scales the width of the boxes in the graph.

• The at parameter indicates where the boxes should be placed on the x-axis.

Also, notice the subset option in both uses of the boxplot() function. This option identifies the subset of data to be used for the box plot.

In the first use of the boxplot() function:

• The subset option specifies the ascorbic acid (vitamin C) supplement, with a color of yellow.

• You also see labeling options that were discussed previously.

In the second use of the boxplot() function:

• The add parameter means that this second use of boxplot should be added to the previous one.

• Notice also the subset parameter, used this time to specify the comparative supplement: Orange Juice.

Finally, the legend() function adds a legend to the interior of the graph.
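
For reference, the two boxplot() calls and the legend can be written along the lines of the example in the boxplot() help page. Treat the exact titles, limits, and legend position as assumptions, because the slide itself is not reproduced here.

```r
# First call: ascorbic acid (supp == "VC"), boxes shifted slightly left.
boxplot(len ~ dose, data = ToothGrowth,
        boxwex = 0.25, at = 1:3 - 0.2,
        subset = supp == "VC", col = "yellow",
        main = "Guinea Pigs' Tooth Growth",
        xlab = "Vitamin C dose mg", ylab = "tooth length",
        xlim = c(0.5, 3.5), ylim = c(0, 35), yaxs = "i")

# Second call: orange juice (supp == "OJ"), added to the existing plot.
boxplot(len ~ dose, data = ToothGrowth, add = TRUE,
        boxwex = 0.25, at = 1:3 + 0.2,
        subset = supp == "OJ", col = "orange")

# Legend placed inside the plot region.
legend(2, 9, c("Ascorbic acid", "Orange juice"),
       fill = c("yellow", "orange"))
```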

R also allows you to write graphs to a file in any of the file formats listed in the slide. Each file format has a corresponding R function.

In the code example:

• Use the pdf() function to specify an output file in pdf format.

• At the end of the graph commands, use the dev.off() function to stop the graphic output and close the output file.

• Within the graph creation code, two graphs that you have seen previously are defined and executed. As each graph is completed, it is automatically written to the specified output file.
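
A minimal sketch of the pdf() and dev.off() pattern follows. The output file name and the choice of the two graphs are assumptions.

```r
# Hypothetical output file in the session's temporary directory.
out_file <- file.path(tempdir(), "graphs.pdf")

pdf(out_file)                        # open the PDF device
plot(pressure)                       # first graph goes to page 1
hist(rchisq(1000, df = 3), freq = FALSE,
     main = "My New Histogram")      # second graph goes to page 2
dev.off()                            # stop graphic output and close the file
```

Other formats follow the same pattern with their own device functions, such as png() or jpeg().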

In this second section, let’s examine a few slightly more sophisticated graph types, including perspective graphs, plotting linear models, smooth scatter plots, quality control charts, and Pareto charts.

You can use the persp() function to draw perspective plots. As shown, use the example() function to see examples of perspective graphs.

The slide displays one such example, using the persp() function to create a mathematical surface graph.
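
A surface of this kind can be sketched as follows. The function being plotted, the grid size, and the viewing angles are assumptions in the style of the persp() help examples.

```r
# Grid of x and y values.
x <- seq(-10, 10, length.out = 30)
y <- x

# Assumed surface: a damped ripple around the origin.
ripple <- function(x, y) {
  r <- sqrt(x^2 + y^2)
  10 * sin(r) / r
}
z <- outer(x, y, ripple)             # evaluate the surface on the grid
z[is.na(z)] <- 1                     # guard against 0/0 if the grid hits the origin

persp(x, y, z, theta = 30, phi = 30, expand = 0.5, col = "lightblue")

# To browse the built-in examples interactively, run: example(persp)
```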

This slide shows examples of two types of graphs:

• First, we have three examples of perspective graphs.

• Then, five examples of bar plot graphs.

In both cases, the example() function is invoked by specifying the name of the appropriate graphics function.

Use the plot() function to plot results for linear models that were fitted with the lm() function. The plot method for lm objects takes a parameter, which, that lets you create up to six different plots.

This example uses the LifeCycleSavings data set to build a linear model to predict sr, the savings ratio, and it uses several predictors.

Four of the six possible plots are displayed:

• First, a plot of the residuals against fitted values

• Second, a scale-location plot of the square root of the absolute value of residuals against fitted values

• Third, a normal Q-Q plot

• Fourth, a plot of residuals against leverages

Not shown here are two other options:

• A plot of Cook’s distance versus row labels

• A plot of Cook’s distances against leverage divided by 1 minus the leverage

In statistics, when you perform a least squares regression analysis, use Cook’s distance to estimate the influence of a data point.

The graphs you just saw can also be put into a single plot. Use the par() function to set or query graphical parameters. These parameters can be passed to the par() function in “tag = value” form, or they can be passed as a list of tagged values.

In this example, the par() function indicates that you want a 2 by 2 matrix of graphs, where the text and symbol sizing should be 60% of normal. The margins around the graph should be 6 lines of text on three sides and 4 lines on the other.

In the plot() function, id.n indicates the number of points to be labeled in each plot, starting with the most extreme.

Use help(plot.lm) for a full description of the arguments.
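
The linear model plots can be sketched like this. The set of predictors and the id.n value are assumptions, and the margin settings are left out because the narration does not say which side gets the smaller margin.

```r
# Fit a linear model for the savings ratio (predictors are assumed).
lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

# 2 by 2 layout with text and symbols at 60% of normal size.
op <- par(mfrow = c(2, 2), cex = 0.6)
plot(lm.SR, id.n = 3)                # four default plots; label the 3 most extreme points
par(op)                              # restore the previous settings

# The two plots that are not shown by default.
op <- par(mfrow = c(1, 2))
plot(lm.SR, which = c(4, 6))         # Cook's distance plots
par(op)
```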

The smoothScatter() function produces a smoothed, colored density representation of the scatter plot, obtained through a kernel density estimate.

In the first example of the smoothScatter() function, the nrpoints parameter specifies the number of points to be superimposed on the density image. Adding points to the plot allows for the indication of outliers. If all points are to be plotted, use nrpoints = Inf.

In the second smoothScatter() function example, the default value of 100 is applied for nrpoints.

In the last example:

• The colorRampPalette() function specifies a color-specific output for the smoothScatter graph.

• This color specification is passed to the smoothScatter() function by the colramp parameter.

Check out help(smoothScatter) for more details on this function.
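
The three smoothScatter() calls might look like the following sketch. The simulated data, the seed, and the palette colors are assumptions. The function relies on the KernSmooth package, which ships with standard R installations.

```r
set.seed(42)                                   # assumed seed
n  <- 10000
x1 <- matrix(rnorm(n), ncol = 2)               # cluster around (0, 0)
x2 <- matrix(rnorm(n, mean = 3, sd = 1.5), ncol = 2)  # wider cluster around (3, 3)
x  <- rbind(x1, x2)

smoothScatter(x, nrpoints = 0)                 # density image only, no points
smoothScatter(x)                               # default: 100 points superimposed

# Custom color ramp passed through the colramp parameter.
Lab.palette <- colorRampPalette(c("blue", "orange", "red"), space = "Lab")
smoothScatter(x, colramp = Lab.palette)
```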

Quality control charts are tools that you use to determine if a manufacturing or business process is in a state of statistical control.

Use the qcc graphics package to create quality control charts. In this package, the qcc() function creates an object of class qcc to perform statistical quality control. You can use the qcc object to plot a range of quality control charts.

This example and the following one illustrate that domain-specific graphics are included in open source CRAN packages, which can be leveraged by ORE.

In this example:

• Load the qcc library. Then, identify and attach the pistonrings data set, and read in the first six rows of the data set.

• Next, use the qcc.groups() function to read the data columns into a variable named diameter.

• Then, use the par() function to specify the output of two charts in a 2 x 1 form.

• Finally, we create two qcc objects, one for each chart shown at right.

- The top chart is an “xbar” chart, which focuses on the mean of the variable being considered. In this case, it’s the diameter of the piston rings.

- The second chart is an “s” chart, which deals with the standard deviations of the continuous variable.

The Pareto chart highlights the most important factor among a possibly large set of factors, such as causes of defects or customer complaints.

R produces a proper Pareto chart, which includes the following elements:

• Both bars and a line graph: The individual values are represented in descending order by frequency. The cumulative total is represented by the line.

• The left vertical axis often contains the frequency of occurrence, but it can also represent cost or another metric.

• The right vertical axis is the cumulative percentage of the total number of occurrences.

In this example:

1. A vector of five values is read into the “defect” variable.

2. Then, names are provided for each value.

3. Finally, a Pareto chart is generated for this data by using the pareto.chart() function.
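
To show what a Pareto chart is made of, here is a sketch that builds one with base R graphics only. It is a stand-in for the pareto.chart() function from the qcc package, not a copy of it, and the defect counts and names are illustrative assumptions.

```r
# Hypothetical defect counts with a name for each value.
defect <- c(80, 27, 66, 94, 33)
names(defect) <- c("price code", "schedule date", "supplier code",
                   "contact num.", "part num.")

sorted  <- sort(defect, decreasing = TRUE)     # bars in descending order
total   <- sum(sorted)
cum_pct <- cumsum(sorted) / total * 100        # cumulative percentage

op <- par(mar = c(8, 4, 3, 4))                 # room for rotated labels and right axis
mids <- barplot(sorted, las = 2, ylim = c(0, total),
                ylab = "Frequency", main = "Pareto chart (base R sketch)")
lines(mids, cumsum(sorted), type = "b", pch = 19)   # cumulative total line
axis(4, at = total * seq(0, 1, by = 0.25),
     labels = paste0(seq(0, 100, by = 25), "%"))    # right axis in percent
par(op)
```

With these made-up counts, the first three bars account for 80% of all defects, which is the kind of statement a Pareto chart is meant to support.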

Let’s use this chart as a representation for another example. In the new example:

• The bars refer to customer complaints and indicate reasons for customer churn.

• Because the reasons are in descending order, you’d have a good understanding of the most significant reasons.

• Using these graph results, you could also make a statement that “fixing the top three issues will address 80% of customer complaints.”

This third section provides an overview of some graphics packages that are available to R, including some code examples.

To orient you to the scope of the R graphics system, let’s look at the overall graphics package structure.

The R graphics system has five core packages, which are represented by the blue ovals. The yellow ovals are examples of packages that extend the core packages. Some packages provide their own graphics system, and these are shown in green.

Functions like plot(), hist(), and persp() are from the graphics package.

For more information on any of these packages, see the CRAN website or simply do a web search.

Now, let’s take a quick look at three of the packages: lattice, ggplot2, and plotrix.

The lattice package is an implementation of Trellis graphics for R. Here are some of its characteristics:

• It is a high-level data visualization system that has a reputation for being powerful and elegant.

• It emphasizes the use of multivariate data.

• It allows you to partition your data and graph the individual partitions to help you understand data relationships better.

• It’s designed to meet most of your graphical needs with minimal parameter tuning, which means that the default settings are excellent.

• It can be easily extended and customized to handle nonstandard requirements.

The main functions of the lattice package are listed here, including xyplot() for bivariate scatter plots and time-series plots, bwplot() for box-and-whisker plots, and many others.

All of the functions respond to a common set of arguments that control conditioning, layout, aspect ratio, legends, axis annotation, and other details in a consistent manner.

Use the example() function to see code examples for these functions. Here, we show how to display examples for the xyplot() function.
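
As a small illustration of conditioning in lattice, the sketch below partitions the built-in mtcars data by number of cylinders. The data set and the labels are assumptions, not the slide example.

```r
library(lattice)                     # lattice ships with standard R installations

# Scatter plot of mileage versus weight, one panel per cylinder count.
p_xy <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
               layout = c(3, 1),
               xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
print(p_xy)                          # lattice plots are objects and must be printed

# Box-and-whisker plot of mileage for each cylinder count.
p_bw <- bwplot(factor(cyl) ~ mpg, data = mtcars, xlab = "Miles per gallon")
print(p_bw)

# To browse the built-in examples interactively, run: example(xyplot)
```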

The ggplot2 package is an unusual graphics package, because it is a plotting system for R that is based on a grammar of graphics. Using the grammar of graphics design, ggplot2 proposes to take the best parts of the base and lattice graphics packages.

Although it may present a steep initial learning curve, the ggplot2 package does provide a powerful model that produces complex, multilayered graphics. Generally, the default ggplot2 graphics look more refined than those of other graphics packages.

The ggplot2 grammar of graphics paradigm uses a small set of functions, each of which produces a different kind of plot component. Rather than employing a wide variety of distinct functions to produce different types of graphs (as most other graphics packages do), ggplot2 uses a set of primitives that you can combine in interesting ways to produce a wide range of graphs.

You follow three primary steps to create graphs with ggplot2:

1. First, define the data and create an empty plot object, by using the ggplot() function.

2. Then, specify graphic shapes, called geoms, and add them to the plot.

3. Finally, you specify features, called aesthetics, to represent data values.

Next, we’ll look at simple ggplot2 examples.

The first example shows the three-step process for creating a plot of temperature versus pressure by using ggplot2.

1. First, use the ggplot() function to define the data and create an empty plot object. In this case, the data set is named “pressure.”

2. Second, add a graphic shape by using geom_point().

3. Third, create aesthetics by using the aes() function. In this case, assign temperature to the “x” variable and pressure to the “y” variable. Notice that the plus (+) operation is overloaded to work with these graphic objects.

In the second example, ggplot2 provides a function called qplot(), which simplifies the more primitive aspects of ggplot2. In this way, the qplot() function is similar to the other graphics packages. This qplot() example provides the data set and the basic aesthetics as parameters to the function.

Both command formats create the same graph.
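
The three-step form can be sketched as follows. The code is wrapped in a check because ggplot2 is a CRAN package that may not be installed; the variable name p_points is hypothetical.

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Step 1: define the data and create an empty plot object.
  # Steps 2 and 3: add a point geom whose aesthetics map the columns to x and y.
  p_points <- ggplot(pressure) +
    geom_point(aes(x = temperature, y = pressure))

  print(p_points)                    # ggplot objects are drawn when printed
}
```

Recent ggplot2 releases mark qplot() as deprecated, so the layered form above is the safer one to learn first.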

The last example shows a few ways to customize the qplot() function:

• A title is added by using the main option.

• The geom contains both point and line, so that the points are connected by the lines.

• And the line type (lty) is set to dashed.

This example produces the second graph.
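
The same customizations can also be expressed in the layered grammar. This is an equivalent sketch rather than the qplot() call from the slide, and the title text is an assumption.

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  p_custom <- ggplot(pressure, aes(x = temperature, y = pressure)) +
    geom_point() +                       # the points
    geom_line(linetype = "dashed") +     # dashed lines connecting the points
    ggtitle("Pressure versus temperature")   # assumed title

  print(p_custom)
}
```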

The plotrix package has a wide range of graphical functions. One interesting function is polar.plot(). It displays radial lines, symbols, or a polygon, which are centered at the midpoint of a plot frame in a circle.

In this example:

• Compute the average revenue per country.

• Then, after loading the plotrix package, take the last 20 countries alphabetically, and set up the data for plotting. Also compute the spacing degrees, which depend on the number of rows. In this case “n” is equal to 20, so each country will be allocated 18 degrees in the circle.

• Then, using the polar.plot() function:

- Specify a start at 90 degrees, with label points plotted clockwise, and a line width of 2.

- Indicate where the labels should be placed and what the labels should look like.

- Finally, specify the rp.type to indicate whether to plot radial lines, symbols, or a polygon. In this case, specify a polygon.

Use the help system to see all options that can be used with polar.plot().

This last section examines several ORE graphic functions that you can use when creating R graphs.

This is the set of ORE graphics functions that have been overloaded to be compatible with ore.frame objects. Because these functions can use ore.frames directly, they can access data that resides in Oracle Database.

In some cases, summary data is retrieved for plotting; in other cases, the data is pulled into the R environment.

We’ve actually already touched on some of these functions, such as boxplot(), hist(), pairs(), and plot(). Now, let’s look at several more, which are highlighted in the slide in red.

This simple example shows the arrows() and segments() functions.

The first primitive is for annotating your graph with arrows. You do this by indicating the set of points that connect the arrows.

• First, plot some x, y data.

• Then, add arrows, coloring them with alternating blue, red, and green.

In similar fashion, add segments. Here, they are pink.
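
The arrows() and segments() steps can be sketched with ordinary R vectors. The random data and the seed are assumptions, and this sketch uses base R objects rather than ore.frame objects.

```r
set.seed(7)                          # assumed seed
x <- runif(12)
y <- rnorm(12)
i <- order(x, y)                     # sort the points from left to right
x <- x[i]
y <- y[i]

plot(x, y, main = "arrows() and segments()")

# Arrows from each point to the next, cycling through three colors.
s <- seq_len(length(x) - 1)
arrows(x[s], y[s], x[s + 1], y[s + 1], col = c("blue", "red", "green"))

# Pink segments from each point to the point two positions ahead.
s <- s[-length(s)]
segments(x[s], y[s], x[s + 2], y[s + 2], col = "pink")
```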

The next primitive we’ll look at is lines. The example shows a variety of related functions. All except the xspline() and polygon() functions are commented out.

First, create the data set with a sequence of points from 60 to 360 in increments of 30 degrees. Then, create the coordinates by using the cos() function for x and the sin() function for y.

Use the plot() function just to plot the individual data points.

Then, show a variety of line functions that can be used on this data, including lines(), segments(), arrows(), rect(), polypath(), and rasterImage().

In addition, we show two graphs:

• Create the first graph with the xspline() function, and add a continuous line shape.

• Create the second graph with the polygon() function, and use red as the interior color.
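
A sketch of the two graphs follows. The shape value passed to xspline() and the line color are assumptions, and the data lives in plain R vectors rather than in an ore.frame.

```r
# Angles from 60 to 360 degrees in steps of 30, converted to radians.
deg   <- seq(60, 360, by = 30)
theta <- deg * pi / 180
x <- cos(theta)
y <- sin(theta)

# First graph: points joined by a smooth open X-spline.
plot(x, y)
xspline(x, y, shape = 1, open = TRUE, border = "blue")

# Second graph: the same points as a polygon with a red interior.
plot(x, y)
polygon(x, y, col = "red")
```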

In addition to the primitives, you can use more advanced plotting capabilities, including conditional density plots.

In these examples, we use the cdplot() function to compute the plots of conditional density that describe how the conditional distribution of the categorical “y” variable changes over a numerical “x” variable. These examples generate graphs similar to those shown on the right.

Refer to the online help for more details on conditional density plots and the other, more advanced ORE graphic plotting functions.
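
A conditional density plot needs a categorical response and a numerical predictor. The sketch below uses the built-in mtcars data as an assumed stand-in for the slide data, with a hypothetical gearbox column, and runs against a regular data.frame rather than an ore.frame.

```r
# Turn the 0/1 transmission flag into a labeled factor.
cars <- transform(mtcars,
                  gearbox = factor(am, levels = c(0, 1),
                                   labels = c("automatic", "manual")))

# How the share of each gearbox type changes as car weight increases.
cdplot(gearbox ~ wt, data = cars,
       xlab = "Weight (1000 lbs)", ylab = "Gearbox")
```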

Radio button (only one correct choice) quiz:

A: Incorrect. Although ORE graphics functions are compatible with data.frames, the primary purpose of overloading these functions is to make them compatible with ore.frames. This enables ORE graphics functions to use ore.frames, which gives them direct access to data that resides in Oracle Database.

B: Correct. Because these functions can use ore.frames, they can directly access data that resides in Oracle Database.

In this lesson, you learned:

• How to create common graph types, including scatter plots, pie charts, histograms, and box plots; and how to annotate graphs with labels and legends.

• How to create a few more advanced graph types, including perspective graphs, plotting linear models, smooth scatter plots, quality control charts, and Pareto charts.

• About the structure of the graphic packages that are available to R, and about some of the functionality that is associated with a few of these packages.

• And finally, about some of the graphics functions that are available through ORE.

You’ve just completed “Producing Graphs in R”. Please move on to the next lesson in the series: “The ORE Transparency Layer”.
