
Source: files.meetup.com/1736007/CinDay RUG.pdf


Copyright © 2009 Jim Holtman

R Does Pivot Tables and More

Jim Holtman

[email protected]


Agenda

R Does Pivot Tables

Sparklines (Edward Tufte)

Misc. Graphics

Questions


Pivot Tables & More

This talk was prompted by John Van Wagenen’s CMG2008 paper “Pivot Tables/Charts – Magic Beans Without Living in a Fairy Tale”.

Pivot tables are a nice way to slice, dice and aggregate data.

I had been doing similar things in R, so his paper motivated me to write a paper on another way to get the same information.

I have used his data to illustrate how to do these techniques in R.

The following slides walk through some examples.


Excel Spreadsheet

CSV File Exported from above (10,696 data lines)


Excel Pivot Table Generated from the Data

Read John’s paper for the procedure for generating the pivot table in Excel


This is what the data objects in R look like.


“Casting” New Data

From the same ‘melt’ data, I can create a daily summary and add an indicator for PRIME time:
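The script shown on this slide did not survive extraction. As a rough sketch of the idea only, the following uses base R's `tapply` as a stand-in for the reshape package's cast step. The data frame `usage`, its column names, and the 08:00 to 16:59 definition of PRIME time are all assumptions made for illustration, not the talk's actual data.

```r
# Hypothetical long-format ("molten") data: one row per system, date and hour.
# All column names and values are assumptions for illustration.
usage <- expand.grid(hour = 0:23,
                     date = c("2008-01-07", "2008-01-08"),
                     system = c("sysA", "sysB"),
                     stringsAsFactors = FALSE)
usage$cpu <- 10 + usage$hour   # made-up CPU value that rises during the day

# Assumed definition of PRIME time: 08:00 through 16:59.
usage$period <- ifelse(usage$hour >= 8 & usage$hour < 17, "PRIME", "NONPRIME")

# Daily summary: rows are dates, columns are periods, cells are mean CPU.
daily <- with(usage, tapply(cpu, list(date = date, period = period), mean))
print(daily)
```

The same long-format data can be summarized along any other combination of columns by changing the list passed to `tapply`, which is the point the slide makes about casting.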


Excel Spreadsheet (24,560 data points)

Pivot Table

Chart


R Script

[Figure: pie chart “Breakdown by Shifts” with slices HOLIDAY, PERIOD2, PERIOD3, PRIME and WEEKEND]

It took 0.6 seconds to read in 24,560 lines of data, summarize by shift, and create the pie chart.
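The read, summarize and plot steps could look something like the sketch below. It generates a stand-in CSV file because the original data is not available; the column names `shift` and `cpu.sec` and the random values are assumptions, and the timing on any given machine will differ from the figure quoted on the slide.

```r
# Build a hypothetical 24,560-row CSV so the example is self-contained.
set.seed(42)
shifts <- c("HOLIDAY", "PERIOD2", "PERIOD3", "PRIME", "WEEKEND")
jobs <- data.frame(shift = sample(shifts, 24560, replace = TRUE),
                   cpu.sec = rexp(24560, rate = 1 / 30))
csv <- tempfile(fileext = ".csv")
write.csv(jobs, csv, row.names = FALSE)

# Time the three steps together: read, summarize by shift, draw the pie chart.
elapsed <- system.time({
  batch <- read.csv(csv)
  by.shift <- tapply(batch$cpu.sec, batch$shift, sum)
  pie(by.shift, main = "Breakdown by Shifts")
})
print(elapsed["elapsed"])
```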


“batch” Data Object in R


EDA on the “batch” Data

[Figure: three histograms with Frequency on the y-axis: batch$cpu.hrs (x-axis 0 to 20), batch$cpu.hrs[batch$cpu.hrs < 0.03], and batch$cpu.hrs[batch$cpu.hrs < 0.005]]
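The panel titles are the default titles that `hist` produces, so the three plots correspond to calls like the ones below. The `batch` data frame here is a made-up, heavily right-skewed stand-in (an assumption), used only so the calls run on their own.

```r
# Hypothetical stand-in for the "batch" object: most jobs use very little CPU.
set.seed(7)
batch <- data.frame(cpu.hrs = rexp(5000, rate = 200))

# Full range first, then zoom in on the small values where most jobs fall.
hist(batch$cpu.hrs)
hist(batch$cpu.hrs[batch$cpu.hrs < 0.03])
h <- hist(batch$cpu.hrs[batch$cpu.hrs < 0.005])
```

Successively narrowing the subset is a quick way to see structure that the long tail hides in the first plot.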


Summarize by Prod & Dev (3rd character)

Excel Spreadsheet Pivot Table

[Chart From Pivot Table: CPU seconds by month, 5/1/2007 through 6/1/2008, for DEV and PROD; y-axis 0 to 2,500,000]


Summarize by Prod & Dev Using R
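The R script for this slide was an image and is not recoverable. A minimal sketch of the same kind of summary follows, assuming the job name's third character is "P" for production and "D" for development; that coding, the job names, and the column names are all hypothetical.

```r
# Hypothetical batch records; the job names and "P"/"D" coding are assumptions.
batch <- data.frame(
  job = c("ABP001", "ABD002", "XYP003", "XYD004", "ABP005", "XYD006"),
  date = as.Date(c("2007-05-03", "2007-05-17", "2007-06-02",
                   "2007-06-20", "2007-06-25", "2007-05-09")),
  cpu.sec = c(120, 30, 200, 45, 80, 15)
)

# Classify each job by the third character of its name.
batch$env <- ifelse(substr(batch$job, 3, 3) == "P", "PROD", "DEV")

# Label each record with the first day of its month.
batch$month <- format(batch$date, "%Y-%m-01")

# Pivot: rows are DEV/PROD, columns are months, cells are total CPU seconds.
by.month <- tapply(batch$cpu.sec, list(batch$env, batch$month), sum)
print(by.month)

# Side-by-side bars per month, one bar for each environment.
barplot(by.month, beside = TRUE, legend.text = TRUE, las = 2,
        ylab = "Total CPU Seconds")
```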


Chart from R

[Figure: chart of Total CPU Seconds by month, 2007-05-01 through 2008-06-01, for DEV and PROD; y-axis 0 to 2,000,000]


Pivot Table Summary

R & Excel (and other products) can produce summaries that are equivalent to “pivot tables”

In R it is easy to automate the scripts and run through a set of files and quickly produce output in various formats: PDF, PNG for web pages, WMF for inclusion in WORD/PowerPoint documents, …

The interactive nature of R makes it easy to do EDA (exploratory data analysis) on your data.
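As an illustration of the automation point, the sketch below loops over a directory of CSV files and writes one PDF chart per file. The directory, file names and columns are invented for the example. Only `pdf()` is used here because it is available on every platform; `png()` could be substituted for web output, and `win.metafile()` for WMF on Windows.

```r
# Create two hypothetical input files so the loop has something to process.
in.dir <- file.path(tempdir(), "batch_logs")
dir.create(in.dir, showWarnings = FALSE)
for (i in 1:2) {
  write.csv(data.frame(shift = c("PRIME", "WEEKEND", "PRIME"),
                       cpu.sec = c(10, 20, 30) * i),
            file.path(in.dir, sprintf("system%d.csv", i)),
            row.names = FALSE)
}

# Process every CSV file the same way and write one chart per file.
files <- list.files(in.dir, pattern = "\\.csv$", full.names = TRUE)
outputs <- character(0)
for (f in files) {
  dat <- read.csv(f)
  totals <- tapply(dat$cpu.sec, dat$shift, sum)
  out <- sub("\\.csv$", ".pdf", f)
  pdf(out)                      # open the output device
  barplot(totals, main = basename(f), ylab = "CPU seconds")
  dev.off()                     # close it so the file is written
  outputs <- c(outputs, out)
}
print(basename(outputs))
```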


Sparklines

Invented by Edward Tufte, a well-known expert on data visualization; see www.edwardtufte.com for more examples.

Inspired by Ron Kaminski’s CMG2008 paper


Sparklines from ‘vmstat’ data

A script on the production systems logs the ‘vmstat’ data to a file every 30 seconds. This data is used to create the daily and monthly utilization charts for a system.

The data was used to create “sparklines” of the 19 variables in the log file shown below.
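One way to draw sparklines with base graphics is sketched below: each variable gets its own thin, axis-free panel with its name on the left and its range on the right. The helper `sparklines` and the simulated values are assumptions for illustration, and only three vmstat-style columns are shown rather than all 19; this is not the script used in the talk.

```r
# Hypothetical one-day log at 30-second intervals (2880 samples).
set.seed(3)
n <- 2880
vm <- data.frame(
  r    = rpois(n, 2),                                  # run queue length
  free = 5000 + cumsum(rnorm(n)),                      # free memory
  us   = pmin(100, pmax(0, 40 + cumsum(rnorm(n))))     # user CPU percent
)

# Draw one small line chart per column, stacked vertically, with no axes.
sparklines <- function(df) {
  old <- par(mfrow = c(ncol(df), 1), mar = c(0.2, 6, 0.2, 8))
  on.exit(par(old))
  for (v in names(df)) {
    y <- df[[v]]
    plot(y, type = "l", axes = FALSE, xlab = "", ylab = "", lwd = 0.5)
    mtext(v, side = 2, las = 1, line = 1, cex = 0.7)
    mtext(sprintf("min %.0f  max %.0f", min(y), max(y)),
          side = 4, las = 1, line = 0.5, cex = 0.6)
    points(which.max(y), max(y), pch = 20, col = "red", cex = 0.6)
  }
  invisible(sapply(df, range))   # return each column's min and max
}

rng <- sparklines(vm)
```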


Monthly Data

I have used “levelplot” to show 3D data: day of the month on the y-axis, time of day on the x-axis, and color to represent the z value, which here is the CPU utilization.

Sparklines for the month’s performance of the system were plotted next to the levelplot for comparison.

Both presentation methods allow you to look for patterns. Which do you find the easiest to see patterns in?

Sparklines would make an interesting presentation of yearly data. The example just duplicates the monthly data to provide an idea of what it might look like.
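A levelplot of this kind can be sketched with the lattice package's `levelplot` function, as below. The utilization values are simulated with a made-up daytime bump (an assumption), and the plot is skipped if lattice is not installed.

```r
# Hypothetical month of hourly CPU utilization, busier from 08:00 to 16:59.
set.seed(11)
util <- expand.grid(hour = 0:23, day = 1:31)
busy <- util$hour >= 8 & util$hour < 17
util$cpu <- pmin(100, pmax(0, 30 + 40 * busy + rnorm(nrow(util), sd = 10)))

# Time of day on the x-axis, day of month on the y-axis, color for CPU.
if (requireNamespace("lattice", quietly = TRUE)) {
  p <- lattice::levelplot(cpu ~ hour * day, data = util,
                          xlab = "Hour of day", ylab = "Day of month",
                          main = "CPU utilization (%)")
  print(p)
}
```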


‘levelplot’ and sparklines of the same monthly utilization data.


Transaction Data

Consolidated ~79K transactions into 10 transaction groups and 10 user pools to make the reports easier to see.

The data has the user, transaction name, start time and end time.

Response time was calculated from the start and end times.

The next slides look at this data with stacked bar charts and mosaic plots.

Pivot Table of User/Transaction Counts
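The response calculation and the count table might be sketched as follows. The simulated records, the column names and the date are assumptions; only the User.NN and Tran.NN labels follow the slides.

```r
# Hypothetical transaction log: user, transaction name, start and end time.
set.seed(5)
n <- 1000
start <- as.POSIXct("2009-03-02 08:00:00", tz = "UTC") +
  sort(runif(n, 0, 8 * 3600))
trans <- data.frame(
  user  = sprintf("User.%02d", sample(1:10, n, replace = TRUE)),
  tran  = sprintf("Tran.%02d", sample(1:10, n, replace = TRUE)),
  start = start,
  end   = start + rexp(n, rate = 1 / 2)
)

# Response time in seconds is end minus start.
trans$response <- as.numeric(difftime(trans$end, trans$start, units = "secs"))

# Pivot table of counts: one row per user, one column per transaction group.
counts <- table(User = trans$user, Tran = trans$tran)
print(addmargins(counts))
```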


Stacked Bar Chart of Transaction Count by User

[Figure: stacked bars for User.01 through User.10, with segments Tran.01 through Tran.10; y-axis: Total Transactions, 0 to 15,000]


Stacked Bar Chart/Mosaic Chart

A stacked bar chart lets you see who the busy users are in terms of number of transactions.

A mosaic chart shows the same data, but the “area” of each box is proportional to the count. The y-axis range is the same for all data elements.

It is sometimes easier to see the ratios (mix) between the transactions a user runs; a different mix may denote a different role for that user. For example, User.06 (lowest count) has a higher ratio of Trans.06, Trans.09 and Trans.10 than User.08 (highest count).
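Both chart types are available in base R as `barplot` and `mosaicplot`. The sketch below draws them from the same count table and also computes each user's transaction mix with `prop.table`; the simulated counts are an assumption and will not reproduce the User.06 versus User.08 pattern described above.

```r
# Hypothetical counts: some users are much busier than others.
set.seed(9)
n <- 2000
user <- sprintf("User.%02d", sample(1:10, n, replace = TRUE, prob = 10:1))
tran <- sprintf("Tran.%02d", sample(1:10, n, replace = TRUE))
counts <- table(User = user, Tran = tran)

# Stacked bars: one bar per user, segments are transaction groups.
barplot(t(counts), las = 2, legend.text = TRUE,
        ylab = "Total Transactions",
        main = "Stacked Bar Chart of Transaction Count by User")

# Mosaic plot: box area is proportional to the count in each cell.
mosaicplot(counts, las = 2,
           main = "Number of Transactions by User")

# Each user's mix: share of that user's transactions in each group.
mix <- prop.table(counts, margin = 1)
print(round(mix, 2))
```

In the mosaic plot the column widths carry the per-user totals, so the vertical splits show the mix directly, which is the comparison the slide describes.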


Mosaic Plot of the Number of Transactions by User - Area Proportional to Count

[Figure: mosaic plot; x-axis: User (User.01 through User.10); y-axis: Tran (Trans.01 through Trans.10)]


Summary

Short introduction to R that will hopefully whet your appetite to look at what R might be able to do for you.

Shown how R can be used to generate summaries equivalent to pivot tables in Excel.

Examples of sparklines and mosaic plots that help to visualize data in some different ways.

E-mail me some of your data, and an idea of what you would like summarized, and I will try to show how R can do some basic processing on it.


References

[1] J. Van Wagenen, “Pivot Tables/Charts – Magic Beans Without Living in a Fairy Tale”, CMG 2008

[2] R. Kaminski, “Automating Process Pathology Detection – Rule Engine Design Hints”, CMG 2008

[3] R Development Core Team, “R: A Language and Environment for Statistical Computing”, ISBN 3-900051-07-0, http://www.R-project.org

[4] J. Holtman, “Using R for System Performance Analysis”, CMG 2004

[5] J. Holtman, “Visualization Techniques for Analyzing Patterns in System Performance Data”, CMG 2005

[6] N. J. Gunther, “Guerrilla Capacity Planning”, Springer-Verlag, Heidelberg, Germany, 2007

[7] H. Wickham, “Reshaping data with the reshape package”, Journal of Statistical Software, 21(12), 2007

[8] W. N. Venables and B. D. Ripley, “Modern Applied Statistics with S”, Fourth Edition, Springer, 2002, ISBN 0-387-95458-0

[9] E. Tufte, “Beautiful Evidence”, Graphics Press, 2006

[10] P. Spector, “Data Manipulation with R (Use R)”, Springer, 2009, ISBN 978-0387747309