UNLOCKING THE SECRETS HIDDEN IN YOUR DATA
PART 1
Data and Data Analysis
Data
What is Data?
Data is information gathered from observation, experimentation or modeling.
Qualitative – not precise (usually descriptive)
Quantitative – precise (usually numeric)
For a model, the data is its output (e.g. number of healthy agents, number of infected agents, time…)
Data
How do we gather data?
Data collection is the systematic recording of information while changing variables (a variable is a quantity that may assume any given value or set of values).
Collect the model output (e.g. number of healthy agents, number of infected agents, time…) while changing the model's variables (e.g. number of devils, number initially infected)
Data
Why should we get data?
To answer questions
To develop understanding
To validate experiments
What should we do with data?
Display – usually graph it to make trends easier to see
Analysis – use math skills to uncover patterns and trends in data sets
Interpretation – involves possible explanations for those patterns and trends
Extracting Data from StarlogoTNG
There are three ways to extract data from Starlogo TNG:
Collect the data by hand
Create a chart in Starlogo TNG and extract the data to Excel
Create a table in Starlogo TNG and extract the data to Excel
Why Should We Display Data?
What did you see?
Makes your data visible
Helps find obvious patterns
Does the data make sense?
Are your assumptions correct?
Did you collect enough data?
[Figure: Rabbit Population – Number of Rabbits vs. Time]
Why Should We Analyze Data?
What does it mean?
Is there more information in the data, such as emergent behavior or unexpected patterns?
Was the hypothesis correct?
Why does it matter?
Draw conclusions from data (e.g. more grass gives more rabbits)
Help you answer questions
Provide visible evidence and support for your conclusions to your audience (e.g. Challenge judges)
Test the validity of the model, experiment, theory, …
[Figure: Rabbit Population – Number of Rabbits vs. Time]
Ways to Analyze Data
Plotting Data – ways to visually understand data
Statistics
Mean, Median, Mode – make it easier to compare data
Range, Variance, Standard Deviation – make it clear if you have NOISY data
[Figure: Pink and Blue data series with their means (Mean Pink, Mean Blue)]
Ways to Analyze Data
Derivatives (Slopes)
Tell whether changes in parameters affect the data (here, Parameter 2 has a greater effect than Parameter 1)
Get more information from data
[Figure: Base Case, Parameter 1 and Parameter 2 data series, with fitted slopes of 0.08, 0.16 and 0.39]
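The slopes on this slide come from fitting a straight line to each data series. Below is a minimal Python sketch of that calculation, assuming an ordinary least-squares fit (the fit behind Excel's SLOPE function). The function name `slope` and the example numbers are made up for illustration and are not the data behind the figure.

```python
def slope(xs, ys):
    """Return the least-squares slope of ys versus xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of (x - mean_x)(y - mean_y) divided by sum of (x - mean_x)^2
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    if den == 0:
        raise ValueError("all x values are equal; slope is undefined")
    return num / den

# Hypothetical example data (not the data in the figure)
time = [0, 2, 4, 6, 8, 10]
base_case = [1.0, 1.2, 1.3, 1.5, 1.6, 1.8]
print(round(slope(time, base_case), 2))  # prints 0.08
```

If one parameter setting produces a larger slope over the same time span, the output changes faster, and that is how a greater effect of that parameter shows up.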
Collecting Data: Variable Sweeping
Did you collect enough data? Did you vary the parameters throughout their ranges?
If you have sliders (input variables) in your program, you need data for the full range of those sliders.
Minimum 3 runs for a single variable (low, medium, high)
With more than one slider (variable), vary each one over its range at every setting of the other: 2 variables need perhaps 9 runs (3 × 3)
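One easy way to plan a variable sweep is as a grid of every slider combination. Below is a minimal Python sketch. It assumes two sliders from the Tasmanian Devil activity, each with three hypothetical settings (low, medium, high). The values are placeholders, so replace them with your own slider ranges.

```python
from itertools import product

# Hypothetical slider settings (low, medium, high) for two model variables
initial_population = [50, 100, 150]
initial_percent_infected = [5, 10, 20]

# Every combination of the two sliders: 3 x 3 = 9 settings
sweep = list(product(initial_population, initial_percent_infected))

for run_number, (pop, pct) in enumerate(sweep, start=1):
    print(f"Run {run_number}: initial population={pop}, initial % infected={pct}")
```

If the model has scatter, repeat each of the 9 settings several times.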
Collecting Data from Starlogo TNG
Gathering Data by hand – Tasmanian Devils
Variable sweep
More than one variable
Multiple runs at each variable combination
Average the data
Collecting Data from Starlogo TNG
Let's Do It
Open Tasmanian Devil
Run one section of the data sheet
Do a variable sweep over Initial Population and Initial Percent Infected
Do multiple runs at each set of variables
Collect the output in the data sheet: number healthy after 200 ticks
Collecting Data from Starlogo TNG
Put Data into Excel
Calculate Averages
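Averaging the repeated runs for each variable combination gives one number per setting, and that number goes into the summary table. Below is a minimal Python sketch of the same calculation. The rows are invented example values standing in for your data sheet. The tuple layout (initial population, initial percent infected, number healthy after 200 ticks) is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data sheet rows: (initial population, initial % infected, healthy after 200 ticks)
runs = [
    (100, 10, 62), (100, 10, 58), (100, 10, 60),
    (100, 20, 41), (100, 20, 45), (100, 20, 43),
]

# Group the repeated runs by their variable combination
grouped = defaultdict(list)
for pop, pct, healthy in runs:
    grouped[(pop, pct)].append(healthy)

# Summary table: one average per combination
summary = {setting: mean(values) for setting, values in grouped.items()}
for (pop, pct), avg in sorted(summary.items()):
    print(f"{pop:>4} devils, {pct:>3}% infected -> average healthy {avg}")
```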
Collecting Data from Starlogo TNG
Make a Summary Table
Create XY Charts
Collecting Data from Starlogo TNG
Make a 3D Chart
Plotting Data – Extracting from Starlogo TNG
Data can be extracted from a graph or a table in Starlogo TNG
Create a graph using the line graph block
Put the reset clock block in the Setup block to clear and reset the graph
LET’S DO IT – Tasmanian Devils!!
Plotting Data – Extracting from Starlogo TNG
After the program has run, click on the graph in Spaceland
Save File – Excel file
LET’S DO IT – Tasmanian Devils!!
Data Analysis: Plotting Data – Types of Plots
All plots from http://www.statcan.ca
Pie Charts – music preference
Pets purchased at pet store
Bar Charts – preferred snacks
Data Analysis: Plotting Data – Types of Plots
All plots from http://www.statcan.ca
XY Graphs – cell phone use http://www.statcan.ca
Scatter Plots http://en.wikipedia.org/wiki/Scatterplot
Plotting Data – Activity in Excel
Open the Tasmanian Devil export file (csv file) by double-clicking on it
In EXCEL – Insert Chart, then select the type of chart: XY Scatter
Hit the Next button
LET’S DO IT
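To check the exported columns outside Excel, you can read the file into one list per column with Python's built-in csv module. Below is a minimal sketch. The column names time, healthy and sick and the three data rows are hypothetical, so match them to the header row of your own export.

```python
import csv
import io

# Stand-in for an exported csv file; the column names are hypothetical
exported = """time,healthy,sick
0,95,5
1,93,7
2,90,10
"""

def read_columns(text):
    """Return a dict mapping each column name to a list of floats."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return columns

data = read_columns(exported)
# x values (time) and one y series (healthy), ready for an XY scatter plot
print(data["time"], data["healthy"])
```

For a real export, open the file with `open(path, newline="")` and pass the result of `f.read()` to `read_columns`.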
Plotting Data – Activity in Excel
Select Data Range
Highlight data to be plotted
Plotting Data – Activity in Excel
Label each data series
NEXT – Label Graph and Axes
Plotting Data – Activity in Excel
Choose where you want the graph to be
Get your graph
[Figure: Tasmanian Devil Population – Population vs. Time, Sick and Healthy series]
Plotting Data – Extracting from Netlogo
Two ways
1st way: Write code to extract the data you want – see File Output Example in the Code Examples
Open the file in the setup procedure
Create a write-to-file procedure
Plotting Data – Extracting from Netlogo
2nd way: Extract data from Netlogo graphs
Have Netlogo generate a graph on the Interface page (example on later slide)
Create a setup-plot procedure and a do-plot procedure
Call the setup-plot procedure in the setup procedure
Call the do-plot procedure in the go procedure
Plotting Data – Extracting from Netlogo
Run the model until sufficient data is obtained
(PC) Right-click on the graph / (Mac) Control-click on the graph
Select Export
Choose a location and file name, then select Save
An Excel file is created – Next Slide
It contains all the information in the plot and the input parameters used.
It also contains excess information about the plot (color, pen down, mode, interval…)
LET’S DO IT – Open Rabbits Grass Weeds
Plotting Data – Extracting from Netlogo
This is what you need
[Figure: Noisy and Noisier data series with their shared mean (Mean (both)) and Mean + 2SD / Mean - 2SD lines for each series]
Statistics
Statistics help you summarize, describe and analyze data
[Figure: Noisy and Noisier data series]
Without statistics, it is hard to describe the difference between the two data sets.
With statistics, it is easy to summarize, describe and analyze the data: the blue and the pink data have the same AVERAGE value (mean), but the blue data is “NOISIER” (greater standard deviation). Therefore…
Statistics – How to Calculate in Excel
+, -, *, / are used for addition, subtraction, multiplication and division.
Each cell has a label based on the column and row.
Use cell references instead of numbers to perform calculations. Example: =(A4+B4)/C4
To perform a calculation on an entire column, copy and paste the formula. Warning: this shifts the cell references for each row.
To fix a specific cell, use the $ symbol. Example: =(A4+B4)/$C$1
Excel has many built in statistical functions
Makes life easy!
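The $ rule for copying formulas can be modeled in a few lines. Below is a toy Python sketch, not Excel's own logic. When a formula is copied down, it shifts every row number that has no $ in front of it and leaves $-fixed rows alone. It handles only copying down (not across). The function name copy_down is hypothetical, and the simple pattern would also misread function names that end in digits, such as LOG10.

```python
import re

# Matches a cell reference such as A4, $C$1, B$2 or $D7
CELL_REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")

def copy_down(formula, rows):
    """Toy model of copying a formula down `rows` rows: row numbers
    without a $ shift, row numbers fixed with $ stay the same."""
    def shift(match):
        col_lock, col, row_lock, row = match.groups()
        new_row = int(row) if row_lock else int(row) + rows
        return f"{col_lock}{col}{row_lock}{new_row}"
    return CELL_REF.sub(shift, formula)

print(copy_down("=(A4+B4)/C4", 1))    # =(A5+B5)/C5
print(copy_down("=(A4+B4)/$C$1", 1))  # =(A5+B5)/$C$1
```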
Calculate in Excel Activity
Open a blank spreadsheet in Excel
Create 2 columns of numbers
Then add, subtract, multiply and divide the numbers in the first row
Copy and paste the formulas down the columns
Statistics – Measurements of Central Tendency
Mean (Average), Median, and Mode
Definitions
Mean (Average) – sum divided by the number of data points
Median – middle data point when arranged in order (highest to lowest or lowest to highest); with an even number of data points, it is the average of the two middle points
Mode – most frequent value
Use the data set to calculate the Mean (Average), Median, Mode, Max and Min:
Select the cell where you want the value of the function to appear
Select Insert, then Function
Select Statistical
Select the function wanted (AVERAGE, MEDIAN, or MODE), then hit OK
Select the range of data you want to analyze by clicking on the range symbol and highlighting the range
Hit Enter or OK
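These definitions translate directly into code. Below is a minimal Python sketch that computes the same quantities as Excel's AVERAGE, MEDIAN, MODE, MAX and MIN for a small made-up data set. The function names are our own. Python's standard statistics module also provides equivalent mean, median and mode functions.

```python
from collections import Counter

def mean(values):
    """Sum divided by the number of data points."""
    return sum(values) / len(values)

def median(values):
    """Middle value after sorting; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def mode(values):
    """Most frequent value (on a tie, the value seen first wins)."""
    return Counter(values).most_common(1)[0][0]

data = [3, 7, 7, 2, 9, 4]  # made-up example values
print(mean(data), median(data), mode(data), max(data), min(data))
```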
LET’S DO IT: Starlogo TNG: Fish and Plankton data; Netlogo: Rabbits and Grass data
Statistics – Measurements of Data Spread
Range, Variance and Standard Deviation
Definitions
Range = maximum – minimum
Variance = roughly the average squared distance of the data from the mean (Excel's VAR divides by n – 1 rather than n); it measures the noise of the data around the mean value.
Standard Deviation (S) is the square root of the variance. It is the most commonly used measure of spread (same units as the data).
Another reason to use S: for roughly bell-shaped (normally distributed) data,
~68% of the data are in the interval Mean – S to Mean + S
~95% of the data are in the interval Mean – 2 S to Mean + 2 S
~99.7% of the data are in the interval Mean – 3 S to Mean + 3 S
EXCEL does it for you!
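Excel's VAR and STDEV (VAR.S and STDEV.S in newer versions) compute the sample variance and sample standard deviation, which divide by n – 1. Below is a minimal Python sketch of the same spread measures using the standard statistics module. It also counts how many points fall between Mean – 2 S and Mean + 2 S. The data values are illustrative only.

```python
import statistics

data = [12, 15, 11, 14, 13, 16, 10, 14, 13, 12]  # illustrative values only

data_range = max(data) - min(data)
variance = statistics.variance(data)  # sample variance (divides by n - 1), like Excel VAR.S
s = statistics.stdev(data)            # sample standard deviation, like Excel STDEV.S
m = statistics.mean(data)

# Fraction of points inside Mean - 2S .. Mean + 2S
inside = sum(1 for x in data if m - 2 * s <= x <= m + 2 * s) / len(data)
print(data_range, round(variance, 3), round(s, 3), inside)
```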
[Figure: Rabbit Population – Number of Rabbits vs. Ticks, with Rabbits, Mean, Mean - 2 S and Mean + 2 S lines]
LET’S DO IT: Starlogo TNG: Fish and Plankton data; Netlogo: Rabbits and Grass data
Derivatives
What are Derivatives?
A simple calculation using data
Instantaneous rate of change = SLOPE (with data, estimated as the change in value divided by the change in time between neighboring points)
Why use Derivatives?
Get more information from data
More ways to compare data
Example: a car moving down a road
Data = the distance traveled
Velocity = the 1st derivative of distance
Acceleration = the 2nd derivative of distance = the 1st derivative of velocity
[Figure: three panels vs. Time – Distance; Velocity (slope of distance); Acceleration (slope of velocity)]
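With measured data, each derivative is estimated as a slope between neighboring points: the change in value divided by the change in time. Below is a minimal Python sketch of this finite-difference approach for the car example. The distance values are hypothetical and chosen so the results are easy to check by hand.

```python
def finite_difference(times, values):
    """Slope between consecutive data points: (change in value) / (change in time).
    Returns one slope per interval, so the result is one element shorter."""
    return [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    ]

# Hypothetical car data: distance traveled at each time
time = [0, 1, 2, 3, 4]
distance = [0, 1, 4, 9, 16]

velocity = finite_difference(time, distance)          # 1st derivative of distance
# time[1:] has one entry per velocity value; with evenly spaced times only the spacing matters
acceleration = finite_difference(time[1:], velocity)  # 2nd derivative of distance
print(velocity)      # [1.0, 3.0, 5.0, 7.0]
print(acceleration)  # [2.0, 2.0, 2.0]
```

Each pass produces one fewer value than its input. The velocity list is therefore one shorter than the distance list, and the acceleration list is one shorter again.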
A Note on Randomness
This data is not RANDOM
Random here means that each outcome has an equal probability (like rolling a fair die)
There is scatter in the data, but it is not random
[Figure: Noisy and Noisier data series]
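A short simulation of a fair die shows what "equal probability of each outcome" looks like. Below is a minimal Python illustration, added here and not part of any model. The fixed seed makes the run repeatable. Each face should come up roughly one sixth of the time.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the run is repeatable
rolls = [random.randint(1, 6) for _ in range(6000)]
counts = Counter(rolls)

# For a fair die each face should appear roughly 6000 / 6 = 1000 times
for face in range(1, 7):
    print(face, counts[face])
```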
Other Things to Think About
Is there “scatter” in your model? Evaluate how the “scatter” affects your results – repeat model runs. Make sure you get enough data to get good statistics.
Did you collect enough data? Did you let the model run long enough? Has the model reached “equilibrium”?
[Figure: Rabbit Population – Number of Rabbits vs. Ticks, over 60 ticks and over 1000 ticks]
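One rough way to check for “equilibrium” is to compare the average of the most recent stretch of ticks with the average of the stretch just before it. Below is a minimal Python sketch of that heuristic. The function name looks_settled, the window of 20 ticks and the 5% tolerance are all hypothetical choices. A series that is still drifting slowly may pass the test, so also look at the plot.

```python
from statistics import mean

def looks_settled(series, window, tolerance):
    """Rough check: compare the means of the last two windows of data.
    Returns True if they differ by at most `tolerance` times the earlier mean."""
    if len(series) < 2 * window:
        return False  # not enough data to compare two windows
    earlier = mean(series[-2 * window:-window])
    latest = mean(series[-window:])
    return abs(latest - earlier) <= tolerance * abs(earlier)

growing = [10 * t for t in range(1, 61)]            # still climbing
steady = [200 + (-1) ** t * 5 for t in range(60)]   # oscillates around 200
print(looks_settled(growing, 20, 0.05), looks_settled(steady, 20, 0.05))
```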