
Dr. Janet Winter, [email protected] Stat 200 Page 1

Chapter 1: The Nature of Probability and Statistics

Learning Objectives

Upon successful completion of Chapter 1, you will have applicable knowledge of the following concepts:

• Statistics: An Overview and Description
• Types of Variables
• Measurement Scales
• Sampling Methods or Methods to Select Subjects for Samples
• Experimental Studies

I. An Overview and Description

A. Statistics is the science of conducting studies to:

• Collect
• Organize
• Summarize
• Analyze
• Draw conclusions from data

B. Where do you use statistics?

• Sports
• Business
• Research
• Public Health

C. Why do we use statistics?

• To be able to read and understand statistical studies
• To conduct research
• To become a better consumer and citizen

D. Two branches of statistics:

• Descriptive statistics (Unit 1) consists of the collection, organization, summarization, and presentation of data.

• Inferential statistics (Units 3 & 4) uses data to generalize from a sample to its population, conduct hypothesis tests to determine relationships among variables, and estimate parameters.


E. Basic Vocabulary Review

• A variable is a characteristic or attribute of interest that can assume different values. It is the question asked.

• Data are the values that the variables have assumed (answers to the question).

• A data set is a collection of data values; each value is called a data value or datum.

• The population is all subjects of interest to the study.

• The sample is a part of the population or a subgroup (subset) of the subjects from the population.

• A parameter is a numerical summary of all data from the population (e.g., the population mean μ).

• A statistic is a numerical summary of the data from the sample (e.g., the sample mean x̄).

Symbols for Specific Parameters and Statistics

Type                  Parameter   Statistic
Mean                  μ           x̄
Size                  N           n
Variance              σ²          s²
Standard Deviation    σ           s
Proportion            P           p̂
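To make the parameter/statistic distinction concrete, the sketch below computes each pair from the symbol table for a small population and a sample drawn from it. The ages, the "aged 30 or older" proportion, the sample size of 4, and the seed are hypothetical, chosen only for illustration. It uses Python's standard `statistics` module, where `pvariance`/`pstdev` divide by N (population formulas) and `variance`/`stdev` divide by n − 1 (sample formulas).

```python
import random
import statistics

# Hypothetical population: ages of all 10 members of a small club (made-up data).
population = [19, 22, 22, 25, 30, 31, 35, 40, 41, 55]

# Parameters describe the whole population (variance divides by N).
N = len(population)
mu = statistics.mean(population)              # population mean, μ
sigma2 = statistics.pvariance(population)     # population variance, σ²
sigma = statistics.pstdev(population)         # population standard deviation, σ
P = sum(age >= 30 for age in population) / N  # population proportion aged 30+

# Statistics describe a sample drawn from the population (variance divides by n - 1).
random.seed(1)                                # fixed seed so the run is repeatable
sample = random.sample(population, 4)
n = len(sample)
x_bar = statistics.mean(sample)               # sample mean, x̄
s2 = statistics.variance(sample)              # sample variance, s²
s = statistics.stdev(sample)                  # sample standard deviation, s
p_hat = sum(age >= 30 for age in sample) / n  # sample proportion, p̂

print(f"N={N}  mu={mu}  sigma^2={sigma2:.2f}  sigma={sigma:.2f}  P={P}")
print(f"n={n}  x_bar={x_bar}  s^2={s2:.2f}  s={s:.2f}  p_hat={p_hat}")
```

Running it twice gives the same sample because of the seed; changing the seed changes the statistics but never the parameters.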

II. Variables

A. Types of Variables

I. Qualitative – Data can be placed in distinct categories, according to some characteristic or attribute.

Note: qualitative data are used to find the count and proportion for each category.

a) Examples:

• Hair color
• Gender
• TV ownership
• Do you own a car?
• Employment status (full time, part time, not employed)
• Social Security number
• License plate number
• Computer account number
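Because qualitative data can only be tallied, a summary consists of the count and proportion for each category. The sketch below uses made-up responses to "What is your hair color?" and Python's `collections.Counter`; the data are hypothetical. Averaging these values would be meaningless, which is why the note on qualitative data limits the summary to counts and proportions.

```python
from collections import Counter

# Hypothetical answers to "What is your hair color?" from 8 students (made-up data).
hair_colors = ["brown", "black", "brown", "blond", "red", "brown", "black", "blond"]

counts = Counter(hair_colors)   # count for each category
total = len(hair_colors)
proportions = {color: count / total for color, count in counts.items()}

# List categories from most to least frequent.
for color, count in counts.most_common():
    print(f"{color:<6} count={count}  proportion={proportions[color]:.3f}")
```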


II. Quantitative – Data are numerical, representing counts or measurements, and can be ordered and ranked.

Note: quantitative data values can be used to find the average, standard deviation, and variance.

a) Types of Quantitative Variables:

1) Discrete – counts, or data with gaps between the possible values. Example: shoe size
2) Continuous – measurements, or data that can assume an infinite number of values between two endpoints. Example: foot length

b) Examples of Quantitative Variables:

• How many TVs do you own? (discrete)
• How many cars does your family own? (discrete)
• What is your salary for the year? (discrete)
• What is your weight? (continuous)
• How far do you live from campus? (continuous)
• What is your blood pressure? (continuous)

B. Exercise with Types of Variables

Directions: Identify each of the following variables as Qualitative; Quantitative, discrete; or Quantitative, continuous. Use the chart provided below to write in your answers.

• Hair color
• License plate number
• Computer password
• Zip code
• Shoe size
• Foot length
• Shirt size (S, M, L)
• Height
• Shirt size (10, 12, 14, etc.)
• Time to drive to campus

*Answer key is located at the end of the document.

Qualitative          Quantitative, Discrete          Quantitative, Continuous


C. Measurement Scales for Variables

I. Types of Measurement Scales

a) Nominal – only classifies data into mutually exclusive (non-overlapping), exhaustive categories in which no order or ranking can be imposed on the data. Example: Social Security number

b) Ordinal – classifies data into categories that can be ranked; however, precise differences between the ranks do not exist. Example: Shirt size (S, M, L)

c) Interval – ranks data with precise differences between the data values; however, there is no meaningful zero. Example: Shoe size

d) Ratio – possesses all the characteristics of interval measurement and also has a true zero. Example: Foot length

Note: This level of measurement is called the ratio level because the zero starting point makes ratios meaningful.

II. Levels of Measurement Data

• Nominal – Categories only. Data cannot be arranged in an ordering scheme.
  Example: bear encounter states (5 New York, 20 Idaho, 40 Wyoming). Categories or names only.

• Ordinal – Categories are ordered, but differences can’t be found or are meaningless.
  Example: bears according to aggressiveness (5 not aggressive, 20 somewhat aggressive, 40 highly aggressive). An order is determined by “not,” “somewhat,” “highly.”

• Interval – Differences are meaningful, but there is no natural starting point, and ratios are meaningless.
  Example: bear den temperatures (5°F, 20°F, 40°F). 0°F doesn’t mean “no heat.” 40°F is not twice as hot as 20°F.

• Ratio – There is a natural zero starting point, and ratios are meaningful.
  Example: bear migration distances (5 miles, 20 miles, 40 miles). 40 miles is twice as far as 20 miles.

(Triola & Triola, 2006)
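The interval/ratio distinction can be checked numerically. The sketch below reuses the bear den temperatures and migration distances: it compares a ratio computed in °F with the same ratio on the Rankine scale, whose zero is absolute zero (°R = °F + 459.67), and compares a distance ratio in miles with the same ratio in kilometers. The helper name `fahrenheit_to_rankine` is only illustrative.

```python
# Interval scale: °F has an arbitrary zero, so ratios of °F readings are not meaningful.
# Rankine (°R = °F + 459.67) uses Fahrenheit-sized degrees but starts at absolute zero.
def fahrenheit_to_rankine(temp_f: float) -> float:
    return temp_f + 459.67

den_low_f, den_high_f = 20.0, 40.0

naive_ratio = den_high_f / den_low_f   # 2.0, an artifact of where 0°F happens to sit
true_ratio = fahrenheit_to_rankine(den_high_f) / fahrenheit_to_rankine(den_low_f)

# Differences ARE meaningful on an interval scale: both scales agree on 20 degrees.
difference_f = den_high_f - den_low_f
difference_r = fahrenheit_to_rankine(den_high_f) - fahrenheit_to_rankine(den_low_f)

# Ratio scale: distance has a true zero, so the ratio survives a change of units.
KM_PER_MILE = 1.609344
dist_short_mi, dist_long_mi = 20.0, 40.0
ratio_miles = dist_long_mi / dist_short_mi
ratio_km = (dist_long_mi * KM_PER_MILE) / (dist_short_mi * KM_PER_MILE)

print(f"40°F / 20°F = {naive_ratio:.4f}, but on the Rankine scale = {true_ratio:.4f}")
print(f"difference: {difference_f} F-degrees = {difference_r:.2f} R-degrees")
print(f"40 mi / 20 mi = {ratio_miles}, in km = {ratio_km:.4f}")
```

The °F ratio of 2 shrinks to about 1.04 once a true zero is used, while the distance ratio stays at 2 in either unit.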


III. Exercise 01 with Measurement Scales

Directions: Identify each of the following variables as Nominal, Ordinal, Interval, or Ratio. Use the chart provided below to write in your answers.

• Zip code
• Gender
• Grade
• IQ
• Eye color
• SAT score
• Rating
• Height
• Ranking
• Temperature (F, C)
• Weight
• Time

Nominal          Ordinal          Interval          Ratio

*Answer key is located at the end of the document.

D. Exercise 02 with Measurement Scales – Transportation Table

Directions: The table shows the number of job-related injuries in each transportation industry for 1998. Refer to this table to answer the following five questions.

Industry      Number of injuries
Railroad      4520
Intercity     5100
Subway        6850
Trucking      7144
Airline       9950

1. What are the variables under study?

2. Classify each of the variables as Quantitative, continuous; Quantitative, discrete; or Qualitative.

3. Identify the level of measurement for each variable.

4. The railroad is shown as the safest transportation industry. Does that mean railroads have fewer accidents than the other industries?

5. What factors other than safety influence a person’s choice of transportation?

***Answers are located at the end of this document.


III. Samples (part of the population, a subgroup or subset of the population)

A. Sampling Methods (ways to select subjects or participants)

I. Random samples – number each subject in the population; select the subjects whose numbers match with numbers from a random number table.

II. Systematic samples – number each subject in the population; then select every kth subject.

III. Stratified samples – divide the population into subgroups according to some characteristic that is important to the study, then sample randomly from each subgroup.

IV. Cluster samples – randomly select entire intact groups, called clusters, that represent the population, and use all members of the selected clusters.
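The four methods above can be sketched with Python's standard `random` module. This is an illustrative sketch, not a prescribed procedure: the population of 100 numbered students, the class-standing strata, the course-section clusters, and all function names are hypothetical, and the random starting point for the systematic sample is an assumption (the method as stated only requires selecting every kth subject). A seeded `random.Random` object stands in for a random number table.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subject has an equal chance, like matching numbers from a random number table."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Take every kth subject, starting at a random position among the first k (assumption)."""
    k = len(population) // n            # sampling interval
    start = rng.randrange(k)            # random start in [0, k)
    return population[start::k][:n]

def stratified_sample(strata, n_per_stratum, rng):
    """Sample randomly within each subgroup (stratum) and combine the results."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of each chosen cluster."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

# Hypothetical population: 100 students numbered 1-100 (made-up data).
rng = random.Random(42)
students = list(range(1, 101))

# Hypothetical strata: class standing, 25 students each.
strata = {
    "freshman": students[0:25],
    "sophomore": students[25:50],
    "junior": students[50:75],
    "senior": students[75:100],
}

# Hypothetical clusters: 10 course sections of 10 students each.
clusters = {f"section_{i + 1}": students[i * 10:(i + 1) * 10] for i in range(10)}

print("random:    ", sorted(simple_random_sample(students, 8, rng)))
print("systematic:", systematic_sample(students, 8, rng))
print("stratified:", sorted(stratified_sample(strata, 2, rng)))
print("cluster:   ", sorted(cluster_sample(clusters, 1, rng)))
```

The stratified sample is guaranteed to include students from every class standing, while the cluster sample consists of one whole section; that contrast is the practical difference between the two methods.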

B. Why Use Samples Instead of Populations?

• Saves time and money
• Experiment can include more detail
• It is effective

C. SRS (Simple Random Sample)

• Required for most statistical procedures
• If data are not collected correctly, the study is useless.
• Every possible sample of size n has the same chance of being selected.
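The defining property of an SRS (every possible sample of size n is equally likely) can be checked by simulation. The sketch below uses a hypothetical five-subject population with n = 2: it lists all C(5, 2) = 10 possible samples with `itertools.combinations`, then tallies many draws made with `random.Random.sample`. Each sample should turn up close to 1/10 of the time.

```python
import itertools
import math
import random
from collections import Counter

# Hypothetical tiny population of 5 subjects; sample size n = 2.
population = ["A", "B", "C", "D", "E"]
n = 2

# All possible samples of size n (order does not matter).
possible_samples = list(itertools.combinations(population, n))
print(len(possible_samples), "possible samples; math.comb gives", math.comb(len(population), n))

# Draw many simple random samples and tally how often each one appears.
rng = random.Random(2024)
trials = 100_000
tally = Counter(tuple(sorted(rng.sample(population, n))) for _ in range(trials))

expected = 1 / math.comb(len(population), n)   # equal chance for every sample
for sample in possible_samples:
    print(sample, f"observed {tally[sample] / trials:.3f} vs expected {expected:.3f}")
```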

D. Ways to Collect Data from Participants

I. Surveys – the researcher asks questions using a personal interview, a telephone interview, or written questions.

II. Observational study – the researcher observes and draws conclusions based on the observations.

III. Experimental study – the researcher manipulates one of the variables and determines how the manipulation influences other variables.

E. Kinds of Variables in Experimental Studies:

• A variable is the characteristic of interest or the question asked.

• Independent or explanatory variable – the variable manipulated by the researcher.

• Dependent, outcome, or response variable – the variable that changes because of the manipulation of the independent variable.


F. Some Problems with Experimental Studies

• Hawthorne effect – the subjects know they are participating in an experiment and change their behavior in ways that affect the results of the study.

• Confounding variable – a variable that influences the dependent or outcome variable but cannot be separated from the independent variable (e.g., IQ, or previous knowledge of or experience with the dependent variable).

• Experimenter effect – the experimenter unintentionally influences the dependent variable or outcome of the experiment.

G. Controlling Effects

• Single blind – subjects do not know whether they are in the experimental or control group.

• Double blind – neither the participants nor the experimenter knows who is in the experimental or control group.

H. Errors

• Sampling error – caused by chance fluctuations; it is the difference between the sample statistic and the population parameter.

• Non-sampling error – caused when sample data are collected incorrectly.
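Sampling error can be seen directly by comparing sample statistics with a known population parameter. In the sketch below, the population of 1,000 exam scores, the sample size of 25, and the seed are all hypothetical. Because every sample is drawn correctly at random, the differences it prints are sampling error only; a non-sampling error (for example, a mis-entered score) would add a further discrepancy that more sampling cannot remove.

```python
import random
import statistics

# Hypothetical population: 1,000 made-up exam scores, seeded so results repeat.
rng = random.Random(7)
population = [rng.randint(40, 100) for _ in range(1000)]
mu = statistics.mean(population)   # parameter: population mean μ

errors = []
for trial in range(1, 6):
    sample = rng.sample(population, 25)   # a correctly drawn random sample
    x_bar = statistics.mean(sample)       # statistic: sample mean x̄
    sampling_error = x_bar - mu           # sampling error = statistic - parameter
    errors.append(sampling_error)
    print(f"sample {trial}: x_bar = {x_bar:.2f}, sampling error = {sampling_error:+.2f}")
```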

IV. Some Misuses of Statistics

(Read the sections in the textbook that describe the misuses of statistics.)

A. Suspect Samples

• Very small samples
• Biased samples
• Volunteer samples

B. Other Issues

• Ambiguous averages
• Detached statistics
• Implied connections
• Misleading graphics
• Faulty survey questions


V. Computers and Calculators

• Calculators and computers simplify statistical computations and save time.

• A calculator with statistical functions is required for this class; you will not be able to do the work in this course without it.

• The TI-83 is strongly recommended and will be the only calculator demonstrated in class. If you choose to use a different calculator, you will be responsible for learning how to use it on your own.

VI. Conclusion

• The applications of statistics are many and varied. You encounter statistics when reading newspapers or magazines, listening to the radio, or watching television.

• Statistics has improved health care, business, social science, and many other areas of life.

ANSWER KEYS TO EXERCISES

Types of Variables (or Types of Data)

*Exercise: Identify each variable as Qualitative or Quantitative

Qualitative: Hair color, Computer password, License plate number, Shirt size (S, M, L), Zip code

Quantitative, Discrete: Shoe size, Shirt size (10, 12, 14, etc.)

Quantitative, Continuous: Foot length, Height, Time to drive to campus

Exercise: Measurement Scales

**Exercise: Identify each of the following variables as Nominal, Ordinal, Interval, or Ratio. Use the chart provided below to write in your answers.

Nominal: Zip code, Gender, Eye color

Ordinal: Grade, Rating, Ranking

Interval: IQ, SAT score, Temperature (F, C)

Ratio: Height, Time, Weight


Exercise: Measurement Scales – Transportation Table

***Exercise: The table shows the number of job-related injuries in each transportation industry for 1998. Refer to this table to answer the following five questions.

Industry      Number of injuries
Railroad      4520
Intercity     5100
Subway        6850
Trucking      7144
Airline       9950

1. What are the variables under study?

Answer: Industry and Number of injuries.

2. Classify each of the variables as Quantitative, continuous; Quantitative, discrete; or Qualitative. Answer: Industry is Qualitative. Number of injuries is Quantitative, discrete.

3. Identify the level of measurement for each variable.

Answer: Industry is Nominal. Number of injuries is Ratio.

Additional questions to consider (these were reflective questions):

4. The railroad is shown as the safest transportation industry. Does that mean railroads have fewer accidents than the other industries?

5. What factors other than safety influence a person’s choice of transportation?

Works Cited

Triola, Marc M., M.D., and Mario F. Triola. Biostatistics for the Biological and Health Sciences. New York: Pearson Education, Inc., 2006.