UNLOCKING THE SECRETS HIDDEN IN YOUR DATA Data Analysis

Page 1: UNLOCKING THE SECRETS HIDDEN IN YOUR DATA Data Analysis

UNLOCKING THE SECRETS HIDDEN IN YOUR DATA

Data Analysis

Page 2:

Why Do Data Analysis?

Avoids incorrect assumptions

Does the data make sense?

Which one is better?

time    elevation
0       400000
10      399000
20      393000
30      384000

[Figure: two plots of elevation vs. time (0 to 30), each with a fitted trendline]

Cubic fit: $y = 0.3333x^3 - 35x^2 + 216.67x + 400000$, $R^2 = 1$

Quadratic fit: $y = -20x^2 + 60x + 400100$, $R^2 = 0.9988$

Page 3:

Why Do Data Analysis?

Are your assumptions correct?

Did you collect enough data?

If this is a model of a falling body, which is better?

Be careful: what's better mathematically is not always better scientifically.

Quadratic fit: $y = -20x^2 + 60x + 400100$, $R^2 = 0.9988$

Cubic fit: $y = 0.3333x^3 - 35x^2 + 216.67x + 400000$, $R^2 = 1$

[Figure: both fitted curves extended over time 0 to 90; vertical axis 250000 to 430000]
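The comparison of the two trendlines can be reproduced outside Excel. The following is a minimal sketch, assuming the four (time, elevation) points are 0, 10, 20, 30 and 400000, 399000, 393000, 384000 as read from the earlier slide. The helper names (`polyfit_exact`, `evaluate`, `r_squared`) are made up for this illustration, and exact fractions are used only to avoid rounding error; this is not necessarily how Excel computes its trendlines.

```python
from fractions import Fraction

# (time, elevation) points as read from the slide's data table.
TIMES = [0, 10, 20, 30]
ELEVATIONS = [400000, 399000, 393000, 384000]


def polyfit_exact(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients, lowest power first."""
    n = degree + 1
    # Augmented normal equations (A^T A | A^T y) for the Vandermonde matrix A.
    rows = [
        [Fraction(sum(x ** (i + j) for x in xs)) for j in range(n)]
        + [Fraction(sum(y * x ** i for x, y in zip(xs, ys)))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


def evaluate(coeffs, x):
    """Value of the polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def r_squared(coeffs, xs, ys):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = Fraction(sum(ys), len(ys))
    ss_res = sum((y - evaluate(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


quadratic = polyfit_exact(TIMES, ELEVATIONS, 2)
cubic = polyfit_exact(TIMES, ELEVATIONS, 3)

for name, coeffs in (("quadratic", quadratic), ("cubic", cubic)):
    print(name, [float(c) for c in coeffs],
          "R^2 =", round(float(r_squared(coeffs, TIMES, ELEVATIONS)), 4),
          "value at t = 90:", float(evaluate(coeffs, 90)))
```

With these four points the cubic passes through every point, so its R^2 is 1, while the quadratic scores 0.9988. Extended to t = 90, the quadratic gives 243500 and keeps decreasing, whereas the cubic gives 379000 because it has turned back upward. If the data describe a falling body, as the slide's question suggests, the "perfect" cubic is the less believable model.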

Page 4:

Ways to Analyze Data

Plotting Data
  Ways to visually understand data

Statistics
  Makes it easier to compare data: Mean, Median, Mode
  Makes it clear if you have NOISY data: Range, Variance, Standard Deviation

[Figure: Pink and Blue data series with Mean Pink and Mean Blue lines; x axis 0 to 60, y axis 0 to 30]

Page 5:

Ways to Analyze Data

Derivatives (Slopes)
  Tell if changes in parameters affect data
  Parameter 2 has a greater effect than Parameter 1
  Get more information from data

[Figure: Base Case, Parameter 1, and Parameter 2 lines; x axis 0.00 to 12.00, y axis 0 to 4; annotations: Slope = 0.08, Slope = 0.16, Slope = 0.39, "Great Derivative"]

Page 6:

Plotting Data – Extracting from NetLogo

Two ways

1st way: Write code to extract the data you want (see File Output Example in the Code Examples)
  Open the file in the setup procedure
  Create a write-to-file procedure

Page 7:

Plotting Data – Extracting from NetLogo

2nd way: Extract data from NetLogo graphs
  Have NetLogo generate the graph on the Interface page (example on later slide)
  Create a setup-plot procedure and a do-plot procedure
  Call the setup-plot procedure in the setup procedure
  Call the do-plot procedure in the go procedure

Page 8:

Plotting Data – Extracting from NetLogo

Run the model until sufficient data is obtained
(PC) Right Click on Graph/(Mac)
Select Export
Choose a location and file name, then select Save
An Excel file is created (next slide)
  Contains all the information in the plot and the input parameters used.
  Contains excess information about the plot (color, pen down, mode, interval…)

LET’S DO IT – Open Rabbits Grass Weeds

Page 9:

Plotting Data – Extracting from NetLogo

This is what you need

Page 10:

Plotting Data – Different Types of Plots

All plots from http://www.statcan.ca

Pie Charts – music preference

Pets purchased at pet store

Bar Charts – preferred snacks

Page 11:

Plotting Data – Different Types of Plots

All plots from http://www.statcan.ca

Line Graphs – cell phone use (http://www.statcan.ca)

Scatter Plots (http://en.wikipedia.org/wiki/Scatterplot)

Page 12:

Plotting Data – Activity in Excel

Open File Car Data
Insert Chart
Select type of chart: XY Scatter
Select Data Range
  Highlight data to be plotted

LET’S DO IT

Page 13:

Plotting Data – Activity in Excel

Label each data series
Label the graph and axes
Select where you want the graph to be (on that worksheet or on another worksheet in the same file)

Page 14:

Statistics

Statistics help you
  Summarize data
  Describe data
  Analyze data

[Figure: Noisy and Noisier data series; x axis 0 to 60, y axis 2 to 22]

Hard to describe the difference between the two data sets

[Figure: the same two series with added lines for Mean (both), Noisy ± 2SD, and Noisier ± 2SD]

Now it is easy to summarize, describe, and analyze the data. The blue and the pink data have the same AVERAGE value (mean), but the blue data is “NOISIER” (greater standard deviation). Therefore…
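As a rough illustration of the same idea in code, the sketch below compares two made-up series that share a mean but differ in spread. The values and the names `noisy` and `noisier` are hypothetical and are not the data behind the slide's plots. The sample standard deviation (`statistics.stdev`) is used here as an assumption.

```python
import statistics

# Hypothetical series: same mean, different amount of noise.
noisy = [11, 13, 12, 11, 13, 12]
noisier = [6, 18, 12, 4, 20, 12]

for name, series in (("Noisy", noisy), ("Noisier", noisier)):
    mean_value = statistics.mean(series)
    sd = statistics.stdev(series)  # sample standard deviation (assumption)
    # The mean +/- 2 SD band, like the extra lines drawn on the plot.
    low, high = mean_value - 2 * sd, mean_value + 2 * sd
    print(f"{name}: mean = {mean_value}, SD = {sd:.2f}, "
          f"mean - 2SD = {low:.2f}, mean + 2SD = {high:.2f}")
```

Both series report the same mean, so only the standard deviation (and the width of the ± 2SD band) separates them.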

Page 15:

Statistics – How to Calculate in Excel

+, -, *, / are used for addition, subtraction, multiplication, and division.

Each cell has a label based on its column and row.

Use cell references instead of numbers to perform calculations. Example: =(A4+B4)/C4

To perform a calculation on an entire column, copy and paste the equation. Warning: this changes the cell references on each line.

To fix a specific cell, use the $ symbol. Example: =(A4+B4)/$C$1

Excel has many built-in statistical functions

Makes life easy!

Page 16:

Statistics – Measurements of Central Tendency

Mean (Average), Median, and Mode

Definitions
  Mean (Average) – Sum divided by the number of data points
  Median – Middle data point when arranged from highest to lowest
  Mode – Most frequent value

Use data set to calculate Mean (Average), Median, Mode, Max and Min
  Select the cell where you want the value of the function to appear
  Select Insert, then Function
  Select Statistical
  Select the function wanted (AVERAGE, MEDIAN, or MODE), then hit OK
  Select the range of data you want to analyze by clicking on the range symbol and highlighting the range
  Hit Enter or OK

LET’S DO IT
  StarLogo TNG: Fish and Plankton
  NetLogo: Rabbits and Grass
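For readers who want to check the Excel results another way, here is a minimal sketch of the same measures using Python's standard `statistics` module. The list `rabbit_counts` is hypothetical and merely stands in for a column of exported model data.

```python
import statistics

# Hypothetical rabbit counts, standing in for a column of exported model data.
rabbit_counts = [120, 135, 150, 135, 160, 145, 135]

mean_value = statistics.mean(rabbit_counts)      # counterpart of Excel's AVERAGE
median_value = statistics.median(rabbit_counts)  # counterpart of Excel's MEDIAN
mode_value = statistics.mode(rabbit_counts)      # counterpart of Excel's MODE
max_value = max(rabbit_counts)                   # counterpart of Excel's MAX
min_value = min(rabbit_counts)                   # counterpart of Excel's MIN

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
print("max:", max_value, "min:", min_value)
```

For this made-up list the mean is 140, while the median and the mode are both 135, which shows that the three measures of central tendency need not agree.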

Page 17:

Statistics – Measurements of Data Spread

Range, Variance and Standard Deviation

Definitions
  Range = maximum - minimum
  Variance = measures the noise of the data around the mean value.
  Standard Deviation (S) is the square root of the variance. It is the most commonly used measure of spread (same units as the data).

Another reason to use S (for roughly normally distributed data):
  ~68% of the data are in the interval Mean – S to Mean + S
  ~95% of the data are in the interval Mean – 2 S to Mean + 2 S
  ~99.7% of the data are in the interval Mean – 3 S to Mean + 3 S

EXCEL does it for you!!!

[Figure: Rabbit Population; Number of Rabbits (0 to 300) vs. Ticks (0 to 2000); series: Rabbits, Mean, Mean - 2 S, Mean + 2 S]

LET’S DO IT
  StarLogo TNG: Fish and Plankton
  NetLogo: Rabbits and Grass
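The spread measures defined on this slide can be sketched in a few lines of Python. The `samples` list and the helper `fraction_within` are hypothetical, introduced only for illustration. The sketch assumes the sample variance and sample standard deviation (dividing by n - 1), which is what Excel's VAR and STDEV functions compute.

```python
import statistics

# Hypothetical measurements; replace with a column of exported model data.
samples = [10, 12, 14, 16, 18]

data_range = max(samples) - min(samples)
mean_value = statistics.mean(samples)
variance = statistics.variance(samples)  # sample variance, divides by n - 1
std_dev = statistics.stdev(samples)      # square root of the sample variance


def fraction_within(data, center, half_width):
    """Fraction of values lying in [center - half_width, center + half_width]."""
    inside = [v for v in data if center - half_width <= v <= center + half_width]
    return len(inside) / len(data)


within_1s = fraction_within(samples, mean_value, std_dev)
within_2s = fraction_within(samples, mean_value, 2 * std_dev)

print("range:", data_range)
print("variance:", variance)
print("standard deviation:", round(std_dev, 3))
print("fraction within mean +/- S:", within_1s)
print("fraction within mean +/- 2 S:", within_2s)
```

Five values are far too few to show the 68% and 95% pattern; the percentages only emerge with many data points, such as a long model run, and only when the data are roughly bell-shaped.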

Page 18:

Derivatives

What are Derivatives?
  A simple calculation using data
  Instantaneous rate of change = SLOPE

Why use Derivatives?
  Get more information from data
  More ways to compare data

Car moving down a road
  Data = the distance traveled
  Velocity = the 1st derivative of distance
  Acceleration = the 2nd derivative of distance = the 1st derivative of velocity

[Figure: three stacked plots against Time (0 to 12): Distance (0 to 40); Velocity (0 to 8), the slope of distance; Acceleration (-4 to 2), the slope of velocity]

Page 19:

How to Calculate a Derivative

Mathematically (x = position, t = time):

$$\frac{dx}{dt} \approx \frac{\Delta x}{\Delta t} = \frac{x_2 - x_1}{t_2 - t_1}$$

(You don’t have to use this)

In Excel (use this in Excel), assuming time is in column A and position is in column B:

=(B3-B2)/(A3-A2)

LET’S DO IT
  StarLogo TNG: Fish and Plankton
  NetLogo: Rabbits and Grass
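The slope calculation can also be sketched in Python, applying the difference of positions divided by the difference of times to each pair of neighbouring rows. The function name `finite_differences` and the data are hypothetical; the distances are simply time squared so the result is easy to check by hand.

```python
def finite_differences(ts, xs):
    """Slope between each pair of neighbouring points: (x2 - x1) / (t2 - t1)."""
    points = list(zip(ts, xs))
    return [(x2 - x1) / (t2 - t1)
            for (t1, x1), (t2, x2) in zip(points, points[1:])]


# Hypothetical car data: distance = time squared.
times = [0, 1, 2, 3, 4]
distances = [0, 1, 4, 9, 16]

# Velocity = 1st derivative of distance (one value per interval).
velocities = finite_differences(times, distances)

# Acceleration = 1st derivative of velocity. Each velocity is assigned to the
# time at the end of its interval, as a formula placed on the later row would be.
accelerations = finite_differences(times[1:], velocities)

print("velocities:", velocities)
print("accelerations:", accelerations)
```

Each differencing step shortens the list by one: five distances give four velocities and three accelerations. For this made-up data the velocities are 1, 3, 5, 7 and the acceleration is a constant 2.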