UNLOCKING THE SECRETS HIDDEN IN YOUR DATA PART 1 Data and Data Analysis


UNLOCKING THE SECRETS HIDDEN IN YOUR DATA

PART 1

Data and Data Analysis


Data

What is Data?

Data is information gathered from observation, experimentation, or modeling.
- Qualitative – not precise (usually descriptive)
- Quantitative – precise (usually numeric)
- The output of your model (e.g., number of healthy agents, number of infected agents, time…)


Data

How do we gather data?

Data collection is the systematic recording of information while changing variables (a variable is a quantity that may assume any given value or set of values).
- Collect the output of the model (e.g., number of healthy agents, number of infected agents, time…) while changing its variables (number of devils, number initially infected).


Data

Why should we get data?
- To answer questions
- To develop understanding
- To validate experiments

What should we do with data?
- Display – usually graph it to make trends easier to see
- Analysis – use math skills to uncover patterns and trends in data sets
- Interpretation – involves possible explanations of those patterns and trends


Extracting Data from StarlogoTNG

There are three ways to extract data from Starlogo TNG:
- Collect the data by hand
- Create a chart in Starlogo TNG and extract the data to Excel
- Create a table in Starlogo TNG and extract the data to Excel


Why Should We Display Data?

What did you see?
- Makes your data visible
- Helps find obvious patterns
- Does the data make sense?
- Are your assumptions correct?
- Did you collect enough data?

[Figure: Rabbit Population. x-axis: Time; y-axis: Number of Rabbits]


Why Should We Analyze Data?

What does it mean?
- Is there more information in the data?
  - Emergent behavior
  - Unexpected patterns
- Was the hypothesis correct?

Why does it matter?
- Draw conclusions from data
  - More grass gives more rabbits
- To help you answer questions
- Provide visible evidence and support for our conclusions to your audience (e.g., Challenge judges)
- Validity of model, experiment, theory, …

[Figure: Rabbit Population. x-axis: Time; y-axis: Number of Rabbits]


Ways to Analyze Data

Plotting Data
- Ways to visually understand data

Statistics
- Makes it easier to compare data: Mean, Median, Mode
- Makes it clear if you have NOISY data: Range, Variance, Standard Deviation

[Figure: Pink and Blue data series plotted with Mean Pink and Mean Blue lines]


Ways to Analyze Data

Derivatives (Slopes)
- Tell if changes in parameters affect data
  - Parameter 2 has a greater effect than Parameter 1
- Get more information from data

[Figure: Base Case, Parameter 1, and Parameter 2 series with slope labels "Slope = 0.08", "Slope = 0.16", and "Slope = 0.39"; annotation: "Great Derivative"]
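One way to put numbers on a statement like "Parameter 2 has a greater effect than Parameter 1" is to fit a straight line to each data series and compare the slopes. The sketch below is only an illustration: the function name `fit_slope` and all of the data values are invented for the example and are not taken from the slides.

```python
def fit_slope(xs, ys):
    """Return the least-squares slope of a straight line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical data: the same x values measured under three settings.
xs = [0, 2, 4, 6, 8, 10]
base_case = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
parameter_1 = [1.0, 1.4, 1.8, 2.2, 2.6, 3.0]
parameter_2 = [1.0, 1.8, 2.6, 3.4, 4.2, 5.0]

slopes = {
    "Base Case": fit_slope(xs, base_case),
    "Parameter 1": fit_slope(xs, parameter_1),
    "Parameter 2": fit_slope(xs, parameter_2),
}
for name, slope in slopes.items():
    print(f"{name}: slope = {slope:.2f}")
```

The series with the steepest fitted slope is the one where the output responds most strongly to the change. In Excel, the built-in SLOPE function performs the same straight-line fit.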


Collecting Data: Variable Sweeping

Did you collect enough data?
- Did you vary the parameters throughout their ranges?
- If you have sliders (input variables) in your program, you need data for the full range of those sliders.
- Minimum of 3 runs for a single variable (low, medium, high).
- With more than one slider (variable), you must vary them separately. With 2 variables, that means perhaps 9 runs (3 settings × 3 settings).
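A variable sweep can be planned before any model is run. The sketch below reads "perhaps 9 runs" as every combination of 3 settings for each of 2 variables, which is an interpretation of the slide. The slider values are invented for illustration.

```python
from itertools import product

# Hypothetical slider settings (low, medium, high) for two input variables.
initial_population = [50, 150, 250]
percent_infected = [10, 50, 90]

# Every combination of the two sliders: 3 x 3 = 9 runs.
sweep = list(product(initial_population, percent_infected))

for run_number, (population, infected) in enumerate(sweep, start=1):
    print(f"Run {run_number}: population={population}, percent infected={infected}")
```

Each printed line is one row to fill in on the data sheet. Repeating every row several times multiplies the total number of runs accordingly.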


Collecting Data from Starlogo TNG

Gathering data by hand
- Tasmanian Devils
- Variable sweep
  - More than one variable
  - Multiple runs at each variable combination
  - Average the data


Collecting Data from Starlogo TNG

Let's Do It
- Open Tasmanian Devil
- Run a section of the data sheet
- Do variable sweep
  - Initial Population
  - Initial Percent Infected
- Multiple runs at each set of variables
- Collect output in data sheet
  - Number healthy after 200 ticks


Collecting Data from Starlogo TNG

Put Data into Excel

Calculate Averages
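The averaging step can also be sketched outside Excel. In the example below, each variable combination maps to the results of its repeated runs, and the average is taken per combination. The dictionary layout and every number in it are hypothetical stand-ins for a hand-filled data sheet.

```python
from statistics import mean

# Hypothetical data sheet: (initial population, percent infected) -> number
# healthy after 200 ticks, one entry per repeated run. Values are invented.
data_sheet = {
    (50, 10): [38, 41, 35],
    (50, 50): [12, 9, 15],
    (150, 10): [97, 110, 102],
    (150, 50): [30, 27, 36],
}

# Average the repeated runs for each variable combination.
averages = {settings: mean(runs) for settings, runs in data_sheet.items()}

for (population, infected), average in averages.items():
    print(f"population={population}, infected={infected}%: average healthy = {average}")
```

This is the same calculation as applying Excel's AVERAGE function to each group of repeated runs.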


Collecting Data from Starlogo TNG

Make a Summary Table

Create XY Charts


Collecting Data from Starlogo TNG

Make a 3D Chart


Plotting Data – Extracting from Starlogo TNG

Data can be extracted from a graph or a table in Starlogo TNG.
- Create a graph using the line graph block
- Put reset clock on Setup block to clear and reset graph

LET’S DO IT – Tasmanian Devils !!


Plotting Data – Extracting from StarlogoTNG

After program is run:
- Click on graph in Spaceland
- Save File – Excel file

LET’S DO IT – Tasmanian Devils !!


Data Analysis: Plotting Data – Types of Plots

All plots from http://www.statcan.ca

Pie Charts – music preference

Pets purchased at pet store

Bar Charts – preferred snacks


Data Analysis: Plotting Data – Types of Plots

All plots from http://www.statcan.ca

XY Graphs – cell phone use http://www.statcan.ca

Scatter Plots http://en.wikipedia.org/wiki/Scatterplot


Plotting Data – Activity in Excel

- Open Tasmanian Devil Export file (csv file) by double clicking on the file
- In Excel: Insert Chart
- Select type of chart: XY Scatter
- Hit the Next button

LET’S DO IT


Plotting Data – Activity in Excel

Select Data Range

Highlight data to be plotted


Plotting Data – Activity in Excel

- Label each data series
- NEXT – Label Graph and Axis


Plotting Data – Activity in Excel

Choose where you want the graph to be

Get your graph

[Figure: Tasmanian Devil Population. x-axis: Time; y-axis: Population; series: Sick, Healthy]


Plotting Data – Extracting from Netlogo

Two ways

1st way: Write code to extract the data you want – see File Output Example in the Code Examples
- Open file in setup procedure
- Create a write-to-file procedure


Plotting Data – Extracting from Netlogo

2nd way: Extract data from Netlogo graphs
- Have Netlogo generate graph on Interface page (example on later slide)
- Create a setup-plot procedure and a do-plot procedure
- Call the setup-plot procedure in setup procedure
- Call do-plot procedure in go procedure


Plotting Data – Extracting from Netlogo

- Run model until sufficient data obtained
- (PC) Right click on graph / (Mac) Control click on graph
- Select Export
- Choose location and file name, then select Save
- Excel file is created (next slide)
  - Contains all the information in the plot and input parameters used
  - Contains excess information about the plot (color, pen down, mode, interval…)

LET’S DO IT – Open Rabbits Grass Weeds


Plotting Data – Extracting from Netlogo

This is what you need


Statistics

Statistics help you:
- Summarize data
- Describe data
- Analyze data

[Figure: Noisy and Noisier data series plotted together]

Hard to describe the difference between the two data sets.

[Figure: the same Noisy and Noisier series with lines for Mean (both), Noisy + 2SD, Noisy - 2SD, Noisier + 2SD, and Noisier - 2SD]

Now it is easy to summarize, describe, and analyze the data. The blue and the pink data have the same average value (mean), but the blue data is “noisier” (greater standard deviation). Therefore…


Statistics – How to Calculate in Excel

+, -, *, / are used for addition, subtraction, multiplication, and division.

Each cell has a label based on the column and row.

Use cells to perform calculations instead of numbers. Example: =(A4+B4)/C4

Perform calculations on an entire column: copy and paste the equation. Warning: this changes the row number in each cell reference for each line.

Fix a specific cell: use the $ symbol. Example: =(A4+B4)/$C$1

Excel has many built-in statistical functions.

Makes life easy!


Calculate in Excel Activity

- Open a blank spreadsheet in Excel
- Create 2 columns of numbers
- Then add, subtract, multiply, and divide the first row
- Copy and paste the formulas


Statistics – Measurements of Central Tendency

Mean (Average), Median, and Mode

Definitions
- Mean (Average) – Sum divided by the number of data points
- Median – Middle data point when arranged from highest to lowest
- Mode – Most frequent value

Use data set to calculate Mean (Average), Median, Mode, Max and Min
- Select the cell where you want the value of the function to appear
- Select Insert, then Function
- Select Statistical
- Select the function wanted (AVERAGE, MEDIAN, or MODE), then hit OK
- Select the range of data you want to analyze by clicking on the range symbol and highlighting the range
- Hit Enter or OK

LET’S DO IT:
- StarlogoTNG: Fish and Plankton data
- Netlogo: Rabbits and Grass data
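For readers who want to check the Excel results another way, the same measures can be computed with Python's standard `statistics` module. This is a minimal sketch. The list `rabbits` is made-up data, not output from the Fish and Plankton or Rabbits and Grass models.

```python
from statistics import mean, median, multimode

# Hypothetical rabbit counts; not taken from the slides.
rabbits = [120, 135, 150, 150, 160, 175, 150, 190]

print("Mean:", mean(rabbits))          # sum divided by the number of data points
print("Median:", median(rabbits))      # middle value of the sorted data
print("Mode(s):", multimode(rabbits))  # most frequent value(s), as a list
print("Max:", max(rabbits))
print("Min:", min(rabbits))
```

With an even number of data points there is no single middle point, so `median` returns the average of the two middle values. `multimode` returns a list because a data set can have more than one most frequent value.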


Statistics – Measurements of Data Spread

Range, Variance and Standard Deviation

Definitions
- Range = maximum - minimum
- Variance measures the noise of the data around the mean value (the average squared distance from the mean).
- Standard Deviation (S) is the square root of the variance. It is the most commonly used measure of spread (same units as the data).

Another reason to use S (for data that roughly follow a bell curve):
- ~68% of the data are in the interval Mean – S to Mean + S
- ~95% of the data are in the interval Mean – 2 S to Mean + 2 S
- ~99.7% of the data are in the interval Mean – 3 S to Mean + 3 S

EXCEL does it for you!!!

[Figure: Rabbit Population. x-axis: Ticks; y-axis: Number of Rabbits; series: Rabbits, Mean, Mean - 2 S, Mean + 2 S]

LET’S DO IT:
- StarlogoTNG: Fish and Plankton data
- Netlogo: Rabbits and Grass data
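The spread measures can be sketched in Python as well. The helper `spread_summary` and both data sets are hypothetical. The data sets were chosen so that they share a mean but differ in scatter. The sketch uses the population formulas, which divide by the number of data points, so results may differ slightly from Excel functions that use the sample formulas.

```python
from statistics import mean, pstdev, pvariance


def spread_summary(data):
    """Return range, variance, standard deviation, and the share of points within 2 S."""
    center = mean(data)
    s = pstdev(data)  # population standard deviation (like Excel's STDEV.P)
    within_2s = sum(1 for x in data if abs(x - center) <= 2 * s) / len(data)
    return {
        "range": max(data) - min(data),
        "variance": pvariance(data),
        "std_dev": s,
        "within_2s": within_2s,
    }


# Two hypothetical data sets with the same mean but different scatter.
noisy = [11, 13, 12, 11, 13, 12, 11, 13]
noisier = [8, 16, 12, 7, 17, 12, 9, 15]

for name, data in (("Noisy", noisy), ("Noisier", noisier)):
    print(name, "mean =", mean(data), spread_summary(data))
```

Both sets have the same mean, so only the range, variance, and standard deviation show that the second set is noisier.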


Derivatives

What are Derivatives?
- A simple calculation using data
- Instantaneous rate of change = SLOPE

Why use Derivatives?
- Get more information from data
- More ways to compare data

Car moving down a road
- Data = the distance traveled
- Velocity = the 1st derivative of distance
- Acceleration = the 2nd derivative of distance = the 1st derivative of velocity

[Figure: three panels plotted against Time: Distance, Velocity (slope of distance), and Acceleration (slope of velocity)]
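With recorded data, a derivative can be approximated as the slope between neighboring data points. The sketch below applies that idea to the car example. The function name `rate_of_change` and the distance values are invented for illustration.

```python
def rate_of_change(times, values):
    """Approximate the derivative as the slope between neighboring data points."""
    return [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    ]


# Hypothetical car data: distance traveled, recorded once per second.
times = [0, 1, 2, 3, 4, 5]
distance = [0, 1, 4, 9, 16, 25]

# 1st derivative of distance. Each slope is assigned to the end of its
# time interval, which is a simplification.
velocity = rate_of_change(times, distance)

# 1st derivative of velocity = 2nd derivative of distance.
acceleration = rate_of_change(times[1:], velocity)

print("Velocity:", velocity)
print("Acceleration:", acceleration)
```

Each differencing step produces one fewer value than it was given, so the acceleration list is two entries shorter than the distance list. The same calculation works in Excel by subtracting neighboring cells and dividing by the time step.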


A Note on Randomness

This data is not RANDOM

Random means that there is an equal probability of getting each outcome (like rolling a die)

There is scatter in the data, but it is not random.

[Figure: Noisy and Noisier data series]


Other Things to Think About

Is there “scatter” in your model?
- Evaluate how the “scatter” affects your results – repeat model runs
- Make sure you get enough data to get good statistics

Did you collect enough data?
- Did you let the model run long enough?
- Has the model reached “equilibrium”?

[Figure: Rabbit Population over 60 ticks. x-axis: Ticks; y-axis: Number of Rabbits]

[Figure: Rabbit Population over 1000 ticks. x-axis: Ticks; y-axis: Number of Rabbits]
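One rough way to ask whether a run has gone on long enough is to compare the average of the most recent ticks with the average of the ticks just before them. The sketch below is a simple heuristic, not a method from the slides. The function name `has_leveled_off`, the window size, the tolerance, and both data series are assumptions made for the example.

```python
from statistics import mean


def has_leveled_off(values, window=50, tolerance=0.05):
    """Heuristic: compare the mean of the last window with the window before it."""
    if len(values) < 2 * window:
        return False  # not enough data to compare two windows
    recent = mean(values[-window:])
    previous = mean(values[-2 * window:-window])
    if previous == 0:
        return recent == 0
    # Leveled off if the two window means differ by no more than the tolerance.
    return abs(recent - previous) / abs(previous) <= tolerance


# Hypothetical runs: one stopped while still growing, one run much longer.
short_run = [5 * t for t in range(60)]
long_run = [min(5 * t, 300) for t in range(1000)]

print("Short run leveled off:", has_leveled_off(short_run, window=20))
print("Long run leveled off:", has_leveled_off(long_run))
```

Averaging over a window smooths out tick-to-tick scatter. A population that keeps oscillating may need a window at least as long as one full cycle for this check to be meaningful.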