Basic Statistics

by Anna Alberts and Michael Peters

Funded by: H2020 EU research and innovation programme

Statistics is the science of collecting, organising, analysing, and presenting data

Overview of Today

● Statistics gone wrong

● Basic Statistical Measures

● Correlation, Regression, Causation and some trends

Break

● Hands-on: work out basic measures on example data and on your own dataset

● Wrap up

Descriptive Statistics

What is in my dataset?

● Measures of Central Tendency

○ What is typical, on average?

● Measures of Spread

○ How are the values spread out?

Measures of Central Tendency

● Mean: the average

○ The sum of all data points divided by the number of data points

● Median: the value in the middle

○ Sort all N values; the median is the value at the halfway position (with an even N, the average of the two middle values)

● Mode: the most frequent value

○ The data point that appears most often

○ Very useful with qualitative data or rankings
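The three measures above can be tried out in a few lines of Python. This is only a minimal sketch using the standard `statistics` module; the numbers and colour names are made-up example data, not taken from the workshop.

```python
from statistics import mean, median, mode

# Hypothetical example values (invented for illustration).
values = [2, 3, 3, 5, 7, 8, 14]

average = mean(values)       # sum of all values divided by how many there are
middle = median(values)      # middle value once the list is sorted
most_common = mode(values)   # value that appears most often

# The mode also works for qualitative data, where a mean makes no sense.
favourite_colour = mode(["red", "blue", "red", "green"])

print(average, middle, most_common, favourite_colour)
```

Note how the single large value 14 pulls the mean above the median; that gap is often a first hint of skewed data.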

Measures of Spread

● Range: the distance from minimum to maximum

● Variance: the average of the squared differences from the mean

● Standard deviation:

○ Measures how far values typically lie from the mean, that is, what counts as “normal” or expected

○ Square root of the variance
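As a rough illustration of the spread measures above, the sketch below computes them both by hand and with Python's `statistics` module. It assumes the data is the whole population (hence `pvariance` and `pstdev`), and the values are invented for the example.

```python
from statistics import mean, pvariance, pstdev

# Hypothetical example values (invented for illustration).
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: distance from the smallest to the largest value.
value_range = max(values) - min(values)

# Variance by hand: the average of the squared differences from the mean.
centre = mean(values)
manual_variance = sum((x - centre) ** 2 for x in values) / len(values)

# The same quantities from the standard library (population versions).
variance = pvariance(values)
std_dev = pstdev(values)   # square root of the variance

print(value_range, manual_variance, variance, std_dev)
```

If the data were only a sample from a larger population, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would be the usual choice instead.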

Exercise - Real Life Statistics

- Write down your height

- Line up from tallest to shortest: the range

- Now find the others with your height and stand behind each other

- The rows form a “normal distribution”

- Who is an outlier?

Sheet for illustration!
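The same exercise can be mimicked on a computer. The sketch below is only an illustration: the heights are made up, and the "more than two standard deviations from the mean" rule for outliers is one common convention assumed here, not something prescribed by the exercise.

```python
from collections import Counter
from statistics import mean, pstdev

# Made-up heights in cm for a small group (illustration only).
heights_cm = [158, 165, 165, 170, 170, 170, 170, 175, 175, 180, 198]

# "Tallest to shortest": the range of the group.
height_range = max(heights_cm) - min(heights_cm)

# "Stand behind each other": count how many people share each height
# and draw one '#' per person.
counts = Counter(heights_cm)
histogram_lines = [f"{h} cm | " + "#" * counts[h] for h in sorted(counts)]
print("\n".join(histogram_lines))

# Assumed rule of thumb: an outlier is more than 2 standard deviations
# away from the mean.
centre = mean(heights_cm)
spread = pstdev(heights_cm)
outliers = [h for h in heights_cm if abs(h - centre) > 2 * spread]
print("range:", height_range, "outliers:", outliers)
```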

Normal Distribution

Central limit theorem: averages of independent random variables approach a normal distribution when the number of observations is high enough

[Figure: bell curve; x-axis: value, y-axis: frequency]

Examples: blood pressure, human height, IQ scores, test scores
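The central limit theorem can be made visible with a small simulation. The sketch below is an assumption-laden toy example: it averages 30 simulated dice rolls, repeats that 2000 times, and prints a crude text histogram. The dice, the sample sizes, and the seed are all arbitrary choices for illustration.

```python
import random
from collections import Counter
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the sketch is repeatable


def average_of_dice(n_dice):
    """Average of n_dice simulated fair six-sided dice rolls."""
    return mean([random.randint(1, 6) for _ in range(n_dice)])


# A single die is uniformly distributed, but averages of many dice pile up
# around 3.5 in a bell shape.
averages = [average_of_dice(30) for _ in range(2000)]

centre = mean(averages)
spread = pstdev(averages)

# For a normal distribution roughly two thirds of the values lie within
# one standard deviation of the mean.
share_within_one_sd = sum(abs(a - centre) <= spread for a in averages) / len(averages)

# Crude text histogram in quarter-point bins (one '#' per 20 averages).
bins = Counter(round(a * 4) / 4 for a in averages)
for value in sorted(bins):
    print(f"{value:4.2f} | " + "#" * (bins[value] // 20))
```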

Correlation

Mutual relation of two or more things (dictionary.com)

● Two variables X and Y

● Y = dependent variable (to be explained)

● X = independent variable (explaining Y)

○ Y = Shoe size

○ X = Length of foot (in cm)

● How are they related?

Average Income per Age

● Y = average monthly net income in euros

● This is what “classic” statistical software (Stata) produces

Regression

A statistical process for estimating relationships among variables.

● “Trying to make sense of data”

● Fitting a line through the “cloud” of data points

● A tool in the search for causal relationships; a fitted line alone does not prove causation
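"Fitting a line" can be sketched with ordinary least squares, the most common way to do it (the slides do not name a specific method, so this choice is an assumption). The foot lengths and shoe sizes below are invented example data, and `fit_line` is a hypothetical helper written for this illustration.

```python
def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented example data: X = length of foot in cm, Y = shoe size.
foot_cm = [22.0, 23.5, 25.0, 26.5, 28.0]
shoe_size = [35, 37, 39, 42, 44]

slope, intercept = fit_line(foot_cm, shoe_size)
print(f"shoe size = {slope:.2f} * foot length + {intercept:.2f}")

# Use the fitted line to estimate the shoe size for a 24 cm foot.
estimate = slope * 24.0 + intercept
print(round(estimate, 1))
```

The slope reads as "each extra centimetre of foot length goes with about this many extra shoe sizes" in the example data.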

Correlation coefficients: quantify how strongly two or more random variables are related

● Range between -1 (perfect negative relationship) and 1 (perfect positive relationship); 0 means no linear relationship

● http://guessthecorrelation.com/
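One widely used correlation coefficient is Pearson's r, sketched below for two variables. Choosing Pearson's r is an assumption (the slides do not say which coefficient is meant), `pearson_r` is a helper written for this illustration, and the data is invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented example data.
rising = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # perfect positive line
falling = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # perfect negative line
noisy = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])

print(rising, falling, noisy)
```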

Inferences from Data

“Correlation does not imply causation.”

● Stork population and human birth rates correlate

● Expected relationship?

● Underlying factor: both the stork population and the birth rate are higher in the countryside

● Control variables → use your common sense!
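The stork example can be imitated with simulated data. Everything in the sketch below is invented for illustration: the regions, the stork counts, and the birth rates are random numbers in which "rural" drives both variables while storks and births are otherwise unrelated. Controlling for the rural/urban split makes the correlation largely disappear.

```python
import random
from math import sqrt

random.seed(1)  # fixed seed so the sketch is repeatable


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Simulated regions: rural areas get more storks AND more births,
# but within a region type the two are independent noise.
regions = []
for _ in range(500):
    rural = random.random() < 0.5
    storks = (20 if rural else 2) + random.gauss(0, 2)
    births = (14 if rural else 9) + random.gauss(0, 1)
    regions.append((rural, storks, births))


def stork_birth_correlation(rows):
    return pearson_r([r[1] for r in rows], [r[2] for r in rows])


overall = stork_birth_correlation(regions)
rural_only = stork_birth_correlation([r for r in regions if r[0]])
urban_only = stork_birth_correlation([r for r in regions if not r[0]])

print(f"all regions: {overall:.2f}")
print(f"rural only:  {rural_only:.2f}")
print(f"urban only:  {urban_only:.2f}")
```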

Causation

Inferring a “causal” dependence of different variables

Think medicine:

● Falling from a tree and breaking your arm

○ Clear causal effect

● Claim that variable x determines outcome y

○ Foot length determines shoe size

○ Storks deliver the babies

● Can you control for other factors?

● Does the relationship hold up against reasonable doubt?

Exercise

Excel or Google Spreadsheets

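● Find the mean: =AVERAGE(Value 1:Value 10), where “Value 1:Value 10” stands for your cell range, such as A1:A10

● Find the median: =MEDIAN(Value 1:Value 10)

● Find the mode: =MODE(Value 1:Value 10)

● Variance: =VAR(Value 1:Value 10) treats the data as a sample; =VARP(Value 1:Value 10) gives the population variance

● Standard deviation: =STDEVP(Value 1:Value 10) for the population; =STDEV(Value 1:Value 10) for a sample

● Try out different plots via “Insert, Chart”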

http://bit.ly/2fVrepJ - World Bank Unemployment Data

http://bit.ly/2fUbbKM - World Bank GDP Growth

Wrap Up

Basic Statistical Concepts:

● Measures of Central Tendency and Measures of Spread

● Aka what is in my dataset

Correlation, Causation, Regression

http://tylervigen.com/spurious-correlations

Statistics Cheat Sheet

● High correlation does not imply causation

○ but there is often a reason for the correlation (storks, babies, rural areas)

● Do not trust small samples!

○ Population vs. sample size

○ Statistical studies with fewer than 100 observations are dubious; n > 1000 is better

■ For social science surveys: sample sizes of n > 90

○ Law of large numbers → with more observations the sample average settles near the true average (outliers have little effect)

● Control factors

○ Was something obvious left out? Think critically

● Wording

○ Careful scientists rarely use words such as “definitely”, “absolutely”, …

● Visualization

○ X and Y axes: do they start at 0?

○ Misleading presentation: ask yourself, do I understand the graph?

○ If something does not add up, it might actually be wrong!
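Why small samples deserve less trust can be sketched with a simulation. The population below is invented (100,000 random "heights" around 170), and `spread_of_sample_means` is a hypothetical helper: it draws many samples of a given size and reports how far apart their averages land.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is repeatable

# Invented population: 100,000 values centred on 170 with some scatter.
population = [random.gauss(170, 10) for _ in range(100_000)]


def spread_of_sample_means(sample_size, repeats=200):
    """Draw `repeats` random samples and return the gap between the
    largest and the smallest sample average."""
    means = [mean(random.sample(population, sample_size)) for _ in range(repeats)]
    return max(means) - min(means)


# The larger the sample, the closer the sample averages stay together.
spreads = {n: spread_of_sample_means(n) for n in (10, 100, 1000)}
for n, gap in spreads.items():
    print(f"n = {n:4d}: sample averages differ by up to {gap:.2f}")
```

With n = 10 two studies of the same population can disagree by a wide margin; with n = 1000 they agree closely.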

Symbolism

x usually stands for the variable

n = the number of observations

X₄ = the value of variable X for observation number 4

Squared: X²

Square root: √

∑ : sigma = the sum of all the observations
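Put together, these symbols give the usual formulas for the mean, the variance, and the standard deviation. The notation below is a common textbook convention assumed here, not taken from the slides: the index i, the bar over x for the mean, and σ for the standard deviation are additions for illustration, and the variance is the population version (dividing by n).

```latex
% Mean: sum of all observations divided by their number
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

% Variance: average of the squared differences from the mean
\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2

% Standard deviation: square root of the variance
\sigma = \sqrt{\sigma^2}
```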

Thanks!

Find us on [email protected]

@anna_alberts @miguelitoj89

@okfde