Lecture 1
Dustin Lueker
Statistical terminology
Descriptive methods
Probability and distribution functions
Estimation (confidence intervals)
Hypothesis testing
Inferential methods for two samples
Simple linear regression and correlation
STA 291 Summer 2008 Lecture 1
Research in all fields is becoming more quantitative
◦ Look at research journals
◦ Most graduates will need to be familiar with basic statistical methodology and terminology
Newspapers, advertising, surveys, etc.
◦ Many statements contain statistical arguments
Computers make complex statistical methods easier to use
Statistics are often used in an incorrect and misleading manner
Purposely misused
◦ Companies/people wanting to further their agenda
◦ Cooking the data: completely making up data
◦ Massaging the numbers
Unintentionally misused
◦ Using inappropriate methods
It is vital to understand a method before using it
Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data
Applicable to a wide variety of academic disciplines
◦ Physical sciences
◦ Social sciences
◦ Humanities
Statistics are used for making informed decisions
◦ Business
◦ Government
Population
◦ Total set of all subjects of interest
◦ Entire group of people, animals, products, etc. about which we want information
Elementary Unit
◦ Any individual member of the population
Sample
◦ Subset of the population from which the study actually collects information
◦ Used to draw conclusions about the whole population
Variable
◦ A characteristic of a unit that can vary among subjects in the population/sample
◦ Ex: gender, nationality, age, income, hair color, height, disease status, state of residence, grade in STA 291
Parameter
◦ Numerical characteristic of the population
◦ Calculated using the whole population
Statistic
◦ Numerical characteristic of the sample
◦ Calculated using the sample
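To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population of ten exam scores is invented for illustration, and the seed and the sample size n = 4 are arbitrary choices; the point is only which data each number is computed from.

```python
import random
import statistics

# Hypothetical population: exam scores of every student in a small course.
population = [72, 85, 90, 64, 78, 88, 95, 70, 81, 67]

# Parameter: computed from the whole population.
population_mean = statistics.mean(population)

# Statistic: computed from a random sample of n = 4 students.
random.seed(291)  # fixed seed so the sample is reproducible
sample = random.sample(population, k=4)
sample_mean = statistics.mean(sample)

print("parameter (population mean):", population_mean)  # 79
print("statistic (sample mean):", sample_mean)
```

Re-running with a different seed changes the statistic but never the parameter, which is one way to see that a statistic varies from sample to sample while a parameter is fixed.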
Why take a sample? Why not take a census? Why not measure all of the units in the population?
◦ Accuracy: we may not be able to find every unit in the population
◦ Time: speed of response from units
◦ Money
◦ Infinite population
◦ Destructive sampling or testing
University Health Services at UK conducts a survey about alcohol abuse among students
◦ 200 of the students are sampled and asked to complete a questionnaire
◦ One question is “have you regretted something you did while drinking?”
What is the population? Sample?
Descriptive Statistics
◦ Summarizing the information in a collection of data
Inferential Statistics
◦ Using information from a sample to make conclusions/predictions about the population
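A short descriptive-statistics sketch in Python, using the standard-library statistics module on a hypothetical sample (the eight study-time values are made up for illustration):

```python
import statistics

# Hypothetical sample: hours per week that 8 sampled students spend studying.
hours = [5, 8, 3, 10, 6, 7, 4, 9]

# Descriptive statistics: numbers that summarize this collection of data.
summary = {
    "n": len(hours),
    "mean": statistics.mean(hours),
    "median": statistics.median(hours),
    "min": min(hours),
    "max": max(hours),
}
print(summary)  # {'n': 8, 'mean': 6.5, 'median': 6.5, 'min': 3, 'max': 10}
```

Every number here describes only the eight students in the data; using them to claim something about all students would be inferential.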
The Current Population Survey of about 60,000 households in the United States in 2002 distinguishes three types of families: Married-couple (MC), Female householder and no husband (FH), Male householder and no wife (MH)
It indicated that 5.3% of “MC”, 26.5% of “FH”, and 12.1% of “MH” families have annual income below the poverty level◦ Are these numbers statistics or parameters?
The report says that the percentage of all “FH” families in the USA with income below the poverty level is at least 25.5% but no greater than 27.5%◦ Is this an example of descriptive or inferential statistics?
Univariate data
◦ Consists of observations on a single attribute
Multivariate data
◦ Consists of observations on several attributes
◦ Special case: bivariate data, which consists of observations on two attributes
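One way to picture the difference is by how many values each observation holds. This Python sketch uses made-up measurements on three hypothetical students; the variable and function names are illustrative only.

```python
# Made-up measurements on three hypothetical students.

# Univariate: one attribute (height, in inches) per observation.
heights = [64.0, 70.5, 68.0]

# Bivariate: two attributes (height in inches, weight in pounds) per observation.
height_weight = [(64.0, 130), (70.5, 175), (68.0, 150)]

# Multivariate: several attributes per observation.
records = [
    {"height": 64.0, "weight": 130, "age": 19, "hair_color": "brown"},
    {"height": 70.5, "weight": 175, "age": 22, "hair_color": "black"},
    {"height": 68.0, "weight": 150, "age": 20, "hair_color": "red"},
]

def attributes_per_observation(observation) -> int:
    """Count the attributes recorded in a single observation."""
    if isinstance(observation, (tuple, dict)):
        return len(observation)
    return 1  # a bare number is a single attribute

print(attributes_per_observation(heights[0]))        # 1
print(attributes_per_observation(height_weight[0]))  # 2
print(attributes_per_observation(records[0]))        # 4
```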
Quantitative or Numerical
◦ Variables whose values are numbers
Qualitative or Categorical
◦ Variables whose values are categories rather than numbers
Nominal
◦ Gender, nationality, hair color, state of residence
Nominal variables have a scale of unordered categories. It does not make sense, for example, to say that green hair is greater/higher/better than orange hair.
Ordinal
◦ Disease status, company rating, grade in STA 291
Ordinal variables have a scale of ordered categories; they are often treated in a quantitative manner (A = 4.0, B = 3.0, etc.). One unit can have more of a certain property than another unit does.
Quantitative
◦ Age, income, height
Quantitative variables are measured numerically; that is, for each subject a number is observed. The scale for quantitative variables is called the interval scale.
A study about oral hygiene and periodontal conditions among institutionalized elderly measured the following
◦ Nominal (Qualitative): Requires assistance from staff? Yes, No
◦ Ordinal (Qualitative): Plaque score: No visible plaque, Small amounts of plaque, Moderate amounts of plaque, Abundant plaque
◦ Interval (Quantitative): Number of teeth
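The three scales can be mirrored in Python types. In this sketch the class names and the sample resident are hypothetical: a plain Enum stands in for the nominal variable because Python defines no ordering between its members, while an IntEnum stands in for the ordinal plaque score because its members compare like integers. The integers 0 to 3 encode only the order of the levels, not equal distances between them.

```python
from enum import Enum, IntEnum

class Assistance(Enum):
    """Nominal: 'Requires assistance from staff?' The categories have no order."""
    YES = "Yes"
    NO = "No"

class PlaqueScore(IntEnum):
    """Ordinal: ordered categories; the integers encode order only."""
    NO_VISIBLE = 0
    SMALL = 1
    MODERATE = 2
    ABUNDANT = 3

def is_ordered(a, b) -> bool:
    """Return True if Python allows a < b, i.e. the categories carry an order."""
    try:
        _ = a < b
        return True
    except TypeError:
        return False

# One hypothetical resident of the study (values invented for illustration).
resident = {
    "assistance": Assistance.YES,    # nominal
    "plaque": PlaqueScore.MODERATE,  # ordinal
    "teeth": 14,                     # interval (quantitative)
}

print(is_ordered(Assistance.YES, Assistance.NO))            # False
print(is_ordered(PlaqueScore.SMALL, PlaqueScore.ABUNDANT))  # True
```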
A birth registry database collects the following information on newborns
◦ Birthweight: in grams
◦ Infant’s Condition: Excellent, Good, Fair, Poor
◦ Number of prenatal visits
◦ Ethnic background: African-American, Caucasian, Hispanic, Native American, Other
What are the appropriate scales? Quantitative (Interval) or Qualitative (Ordinal, Nominal)
Statistical methods vary for quantitative and qualitative variables
Methods for quantitative data cannot be used to analyze qualitative data
Quantitative variables can be treated in a less quantitative manner
◦ Height: measured in cm/in, an interval (quantitative) variable
◦ Can be treated as qualitative
   Ordinal: Short, Average, Tall
   Nominal: “<60in or >72in” vs. “60in-72in”
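A minimal sketch of collapsing height (interval) into the two qualitative versions above. The nominal cutoffs of 60in and 72in come from the slide; using the same cutoffs for Short/Average/Tall is an assumption, since the slide does not state them, and the example heights are made up.

```python
def height_ordinal(inches: float) -> str:
    """Ordinal version. ASSUMPTION: same 60in/72in cutoffs as the nominal version."""
    if inches < 60:
        return "Short"
    if inches <= 72:
        return "Average"
    return "Tall"

def height_nominal(inches: float) -> str:
    """Nominal version: outside the 60-72in range vs. inside it."""
    return "<60in or >72in" if inches < 60 or inches > 72 else "60in-72in"

heights = [58.5, 64.0, 70.2, 75.1]  # hypothetical measurements
print([height_ordinal(h) for h in heights])  # ['Short', 'Average', 'Average', 'Tall']
print([height_nominal(h) for h in heights])  # ['<60in or >72in', '60in-72in', '60in-72in', '<60in or >72in']
```

Note that the conversion only goes one way: once heights are stored as "Short"/"Average"/"Tall", the original measurements cannot be recovered.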
Try to measure variables in as much detail as possible
◦ Quantitative
More detailed data can be analyzed in further depth
◦ Caution: Sometimes ordinal variables are treated as quantitative (ex: GPA)
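GPA is the standard case of an ordinal variable treated as quantitative. In this sketch, A = 4.0 and B = 3.0 follow the slides, while the C, D and F values are assumed from the usual 4-point scale; the function name gpa is hypothetical, and real GPAs are usually weighted by credit hours, which this unweighted version ignores. Averaging only makes sense if the step from C to B is "the same size" as the step from B to A, which is an assumption imposed by the coding, not a property of the letter grades.

```python
import statistics

# Letter grades mapped to grade points. A and B follow the slides;
# ASSUMPTION: C, D and F use the usual 4-point scale.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def gpa(grades):
    """Unweighted GPA: treats ordinal grades as if they were interval-scale numbers."""
    return statistics.mean(GRADE_POINTS[g] for g in grades)

print(gpa(["A", "B", "B", "C"]))  # 3.0
```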
A variable is discrete if it can take on only a finite (or countable) number of distinct values
◦ Gender
◦ Nationality
◦ Hair color
◦ Disease status
◦ Grade in STA 291
◦ Favorite MLB team
Qualitative variables are discrete
Continuous variables can take on an infinite continuum of possible real-number values
◦ Time spent studying for STA 291 per day
   43 minutes, 2 minutes, 27.487 minutes, 27.48682 minutes
◦ Any value can be subdivided into more precise values; therefore the variable is continuous
Discrete or continuous?
◦ Number of children in a family
◦ Distance a car travels on a tank of gas
◦ % grade on an exam
Quantitative variables can be discrete or continuous
Age, income, height?
◦ Depends on the scale
Age is potentially continuous, but usually measured in years (discrete)
Simple random sample (SRS)
◦ Each possible sample of a given size has the same probability of being selected
◦ The sample size is usually denoted by n
Population of 4 students: Alf, Buford, Charlie, Dixie
Select an SRS of size n = 2 to ask them about their smoking habits
◦ 6 possible samples of size 2
{A,B}, {A,C}, {A,D}, {B,C}, {B,D}, {C,D}
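The list of six samples can be generated rather than written out by hand. The count is the binomial coefficient "N choose n" = 4!/(2!·2!) = 6 for N = 4 and n = 2, and under simple random sampling each sample has probability 1/6. A Python sketch using itertools.combinations and math.comb (Python 3.8 or later):

```python
from itertools import combinations
from math import comb

population = ["Alf", "Buford", "Charlie", "Dixie"]
n = 2  # sample size

# Every possible sample of size n (order within a sample does not matter).
samples = list(combinations(population, n))

# The count agrees with the binomial coefficient N choose n.
assert len(samples) == comb(len(population), n) == 6

# Under simple random sampling each sample is equally likely.
prob_each = 1 / len(samples)

for s in samples:
    print(s)
print("probability of each sample:", prob_each)
```

The samples come out in the same order as the list above, from ('Alf', 'Buford') to ('Charlie', 'Dixie').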
Each of the six possible samples has to have the same probability of being selected
◦ How could we do this?
◦ Roll a die
◦ Use a random number generator
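Both selection methods can be sketched in Python. The function name select_by_die is hypothetical; random.randint(1, 6) plays the role of a fair die, and random.choice is the random-number-generator route. Die faces are matched to samples in the order listed for Alf, Buford, Charlie and Dixie ({A,B} is face 1, ..., {C,D} is face 6).

```python
import random
from itertools import combinations

population = ["Alf", "Buford", "Charlie", "Dixie"]
samples = list(combinations(population, 2))  # the 6 possible samples of size 2

def select_by_die(roll: int) -> tuple:
    """Map a die face (1-6) to one of the six samples, in listing order."""
    if not 1 <= roll <= 6:
        raise ValueError("a die roll must be between 1 and 6")
    return samples[roll - 1]

# Method 1: roll a (simulated) fair die; each face, hence each sample, has probability 1/6.
roll = random.randint(1, 6)
print("die shows", roll, "->", select_by_die(roll))

# Method 2: let a random number generator pick one of the samples directly.
print("random choice ->", random.choice(samples))
```

For larger populations, listing every sample quickly becomes impractical; random.sample(population, k=n) draws an SRS of size n directly, since it samples without replacement and every subset of size n is equally likely.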
Convenience sample
◦ Selecting subjects that are easily accessible to you
Volunteer sample
◦ Selecting the first two subjects who volunteer to take the survey
What are the problems with these samples?
◦ They may not properly represent the population
◦ Bias
Examples: mall interview, street corner interview