FUNDAMENTALS - · PDF fileexplain how variables can be characterized as either categorical or continuous. This lesson further describes two important types of data sets ... F-Test,


FUNDAMENTALS OF RESEARCH METHODOLOGY

Dr. Kamini Khanna

Associate Professor, Institute of Management Studies and Research,

Bharati Vidyapeeth, Navi Mumbai.

MUMBAI NEW DELHI NAGPUR BENGALURU HYDERABAD CHENNAI PUNE LUCKNOW AHMEDABAD ERNAKULAM BHUBANESWAR INDORE KOLKATA GUWAHATI

© Author

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording and/or otherwise, without the prior written permission of the publisher.

First Edition : 2015

Published by: Mrs. Meena Pandey for Himalaya Publishing House Pvt. Ltd., “Ramdoot”, Dr. Bhalerao Marg, Girgaon, Mumbai - 400 004. Phone: 022-23860170/23863863; Fax: 022-23877178; E-mail: [email protected]; Website: www.himpub.com

Branch Offices:

New Delhi: “Pooja Apartments”, 4-B, Murari Lal Street, Ansari Road, Darya Ganj, New Delhi - 110 002. Phone: 011-23270392, 23278631; Fax: 011-23256286

Nagpur: Kundanlal Chandak Industrial Estate, Ghat Road, Nagpur - 440 018. Phone: 0712-2738731, 3296733; Telefax: 0712-2721216

Bengaluru: No. 16/1 (Old 12/1), 1st Floor, Next to Hotel Highlands, Madhava Nagar, Race Course Road, Bengaluru - 560 001. Phone: 080-22286611, 22385461, 4113 8821, 22281541

Hyderabad: No. 3-4-184, Lingampally, Besides Raghavendra Swamy Matham, Kachiguda, Hyderabad - 500 027. Phone: 040-27560041, 27550139

Chennai: New-20, Old-59, Thirumalai Pillai Road, T. Nagar, Chennai - 600 017. Mobile: 9380460419

Pune: First Floor, "Laksha" Apartment, No. 527, Mehunpura, Shaniwarpeth (Near Prabhat Theatre), Pune - 411 030. Phone: 020-24496323/24496333; Mobile: 09370579333

Lucknow: House No. 731, Shekhupura Colony, Near B.D. Convent School, Aliganj, Lucknow - 226 022. Phone: 0522-4012353; Mobile: 09307501549

Ahmedabad: 114, “SHAIL”, 1st Floor, Opp. Madhu Sudan House, C.G. Road, Navrang Pura, Ahmedabad - 380 009. Phone: 079-26560126; Mobile: 09377088847

Ernakulam: 39/176 (New No. 60/251) 1st Floor, Karikkamuri Road, Ernakulam, Kochi - 682011. Phone: 0484-2378012, 2378016; Mobile: 09387122121

Bhubaneswar: 5 Station Square, Bhubaneswar - 751 001 (Odisha). Phone: 0674-2532129; Mobile: 09338746007

Indore: Kesardeep Avenue Extension, 73, Narayan Bagh, Flat No. 302, IIIrd Floor, Near Humpty Dumpty School, Indore - 452 007 (M.P.). Mobile: 09303399304

Kolkata: 108/4, Beliaghata Main Road, Near ID Hospital, Opp. SBI Bank, Kolkata - 700 010. Phone: 033-32449649; Mobile: 7439040301

Guwahati: House No. 15, Behind Pragjyotish College, Near Sharma Printing Press, P.O. Bharalumukh, Guwahati - 781009 (Assam). Mobile: 09883055590, 08486355289, 7439040301

DTP by: OM Graphics, Bhandup, Amir.

Printed at: M/s Sri Sai Art Printer Hyderabad. On behalf of HPH.

PREFACE

The book titled ‘Fundamentals of Research Methodology’ covers the problems faced by students during the analysis and interpretation of data. The main problem students confront during data analysis is the choice of statistical tools according to the characteristics of the data. If the wrong tool is applied, then the data analysis as well as its interpretation will certainly be wrong, which causes a serious problem. In this book, all these aspects are covered and presented with the help of graphs, diagrams, charts and tables, making them easier even for a layman to understand. This book is not confined to business or economics; it is also helpful in other fields such as health science and medical sciences.

The topics covered are:

(1) Chapter 1 encompasses the relevance of variables in research. To understand the characteristics of variables and how we use them in research, this chapter is divided into three main sections. First, we illustrate the role of dependent and independent variables. Second, we discuss the difference between experimental and non-experimental research. Finally, we explain how variables can be characterized as either categorical or continuous. This lesson further describes two important types of data sets: populations and samples.

(2) Planning a study. To derive conclusions from data, we need to know how the data were collected; that is, we need to know the methods of data collection. Along with this, the chapter covers measurement and sampling error (bias and measurement errors).

(3) Scales of measurement and their calculation. Measurement scales are used to categorize and/or quantify variables. This lesson describes the different scales of measurement that are commonly used in statistical analysis (nominal, ordinal, interval, ratio scales, etc.) and how we calculate them. Further, this chapter clarifies measurement error in detail (reliability, validity and sensitivity).

(4) How to describe data patterns in statistics (dot plots, boxplots, segmented bar charts, etc.). One of the most convincing and appealing ways in which statistical results may be presented is through diagrams and graphs. There are numerous ways in which statistical data may be displayed pictorially, such as different types of diagrams, graphs and maps. Very often, the problem is that of selecting the best out of several methods that may be available. This is a difficult task and requires a great deal of artistic talent and imagination on the part of the individual or agency engaged in the preparation of diagrams and graphs. An attempt is made in this chapter to illustrate some of the major types of diagrams, graphs and maps frequently used in presenting statistical data.

(5) Univariate, bivariate and multivariate analysis of data. Once the raw data are collected from both primary and secondary sources, the next step is analyzing them so as to draw logical inferences. The data collected in a survey could be voluminous, depending upon the size of the sample. In a typical research study, there may be a large number of variables that the researcher needs to analyze. The analysis could be univariate, bivariate or multivariate in nature. This chapter emphasizes the application of tools, that is, which test applies where, depending on the nature of the data. Further, the chapter also explains the effects of changing units, the transformation of data, and how to interpret the data.

(6) Normality of data. Normality analysis is a statistical method that shows whether or not the data related to the study variables came from a normally distributed population. The chapter provides comprehensive knowledge of the different tools (skewness, kurtosis, Kolmogorov-Smirnov and Shapiro-Wilk) used to determine whether parametric or non-parametric statistics are appropriate.

(7) Test of significance. On the basis of the normality test, we divide the set of statistical tests into parametric and non-parametric tests. When data are normally distributed, we generally apply a parametric test; otherwise, we apply a non-parametric test. An entire chapter is devoted to the analysis of the parametric and non-parametric methods actively used by researchers.

(8) Standard error. The last chapter of this book describes various topics that are relevant for the actual conduct of research. It provides an exhaustive and comprehensive view of the various steps of a research process, from problem identification to hypothesis development, i.e., developing the statement to be tested for acceptance or rejection.

Author

Dr. Kamini Khanna

Associate Professor, Institute of Management Studies and Research,

Bharati Vidyapeeth, Navi Mumbai.

CONTENTS

CHAPTER 1: Types of Variables

Dependent and Independent Variables, Experimental and Non-experimental Research Variables, Categorical and Continuous Variables, Ambiguities in Classifying a Type of Variables, What are Variables?, Qualitative vs. Quantitative Variables, Discrete vs. Continuous Variables, Univariate vs. Bivariate Data (Variables), Population vs. Sample, Sampling with Replacement and Without Replacement.

CHAPTER 2: Planning a Study: Survey, Data Collection Methods

Methods of Data Collection, Data Collection Methods: Pros and Cons, Survey Sampling Methods, Population Parameter vs. Sample Statistic, Probability vs. Non-probability Samples, Non-probability Sampling Methods, Bias in Survey Sampling, Bias Due to Unrepresentative Sample, Bias Due to Measurement Error, Sampling Error and Survey Bias, What is an Experiment?, Parts of an Experiment, Characteristics of a Well-designed Experiment, Experimental Design, An Experimental Design Example, Completely Randomized Design, Randomized Block Design, Matched Pairs Design.

CHAPTER 3: Scales of Measurement in Statistics

Properties of Measurement Scales, Types of Scale of Measurement, Nominal Scale of Measurement, Ordinal Scale of Measurement, Interval Scale of Measurement, Ratio Scale of Measurement, Likert Scale, Semantic Differential Scale, The Difference between Likert and Semantic Differential Scale, Stapel Scale, Measurement Error, Criteria for Good Measurement, Reliability, Validity and Sensitivity, Split-half Reliability Method, Validity: Content Validity, Concurrent Validity, Predictive Validity, Sensitivity.

CHAPTER 4: Describing Data Patterns in Statistics

Pattern in Data, Centre, Spread, Shape, Unusual Features, What is a Dotplot?, Dotplot Overview, Dotplot Example, Bar Charts and Histograms, The Difference between Bar Charts and Histograms, Boxplots (Box and Whisker Plots), Boxplot Basics, How to Interpret a Boxplot?, Cumulative Frequency Plots, Frequency vs. Cumulative Frequency, Absolute vs. Relative Frequency, Discrete vs. Continuous Variables, What is a Scatterplot?, How to Read a Scatterplot?, Patterns of Data in Scatterplots, Categorical Data: One-way Tables in Statistics, Two-way Tables in Statistics, Segmented Bar Charts.

CHAPTER 5: Univariate, Bivariate and Multivariate Analysis of Data

Descriptive vs. Inferential Analysis, Descriptive Analysis, Inferential Analysis, Measures of Central Tendency: The Mean, Median and Mode, The Mean and the Median, The Mean vs. the Median, Use of Mode, Effect of Changing Units, Skewness, How to Measure Variability in Statistics (Measure of Dispersion)?, The Range, The Interquartile Range (IQR), The Variance, Difference between Mean and Standard Deviation, The Coefficient of Variance, The Standard Deviation (SD), Effect of Changing Units in Variance, Correlation Coefficient, How to Interpret a Correlation Coefficient?, Scatterplots and Correlation Coefficients, How to Calculate a Correlation Coefficient?, Coefficient of Correlation and Probable Error, Conditions for the Use of Probable Error, What is Linear Regression?, Prerequisites for Regression?, The Least Squares Regression Line, How to Define a Regression Line?, Properties of the Regression Line, The Coefficient of Determination, Standard Error, Simple Linear Regression Example, Problem Statement, How to Find the Regression Equation?, How to Use the Regression Equation?, How to Find the Coefficient of Determination?, Residual Analysis in Regression, Residuals, Residual Plots, Transformations to Achieve Linearity, What is a Transformation to Achieve Linearity?, Methods of Transforming Variables to Achieve Linearity, How to Perform a Transformation to Achieve Linearity?, A Transformation Example, Influential Points in Regression, Outliers, Influential Points.

CHAPTER 6: Normal Distribution (Normality of the Data)

Normal Curve, Normal or Gaussian Distribution, Standard Scores (z-Scores), Interpretation of z-scores, Normality Test.

CHAPTER 7: Test of Significance: Parametric and Non-Parametric Test

Difference between Parametric and Non-parametric Test, Parametric Test, Student’s t-distribution, Why Use the t-distribution, Degrees of Freedom, Properties of the t-distribution, When to Use the t-distribution?, Applications of the t-distribution, F-Test, F-distribution, Z-test, Chi-Square Distribution (χ²), The Chi-Square Distribution, Cumulative Probability and the Chi-Square Distribution, Conditions for Applying χ² Test, Sign Test, Points to Remember While Using Sign Test, Claims Involving Matched Pairs, Guidelines for Performing a Paired-Sample Sign Test, Mann-Whitney U Test for Independent Samples, The Wilcoxon Signed Rank Test, The Kruskal-Wallis Test.

CHAPTER 8: Standard Error

Standard Error, Notation, Standard Deviation of Sample Estimates, Standard Error of Sample Estimates, Finite Population Correction (fpc), What is a Confidence Interval?, How to Interpret Confidence Intervals?, Confidence Interval Data Requirements, How to Construct a Confidence Interval?, What is Hypothesis Testing?, Statistical Hypotheses, Hypothesis Tests, Decision Errors, Decision Rules, One-tailed and Two-tailed Tests, Power of a Hypothesis Test, Effect Size, Factors that Affect Power, How to Test Hypotheses?, How to Conduct Hypothesis Tests?, Interpretation of the P-value.

Statistical Tables

CHAPTER 1

TYPES OF VARIABLES

All experiments examine some kind of variable(s). A variable is not only something that we measure, but also something that we can manipulate and something we can control for. To understand the characteristics of variables and how we use them in research, this chapter is divided into three main sections. First, we illustrate the role of dependent and independent variables. Second, we discuss the difference between experimental and non-experimental research. Finally, we explain how variables can be characterised as either categorical or continuous.

DEPENDENT AND INDEPENDENT VARIABLES

An independent variable, sometimes called an experimental or predictor variable, is a variable that is being manipulated in an experiment in order to observe the effect on a dependent variable, sometimes called an outcome variable.

Imagine that a tutor asks 100 students to complete a maths test. The tutor wants to know why some students perform better than others. Whilst the tutor does not know the answer to this, she thinks that it might be because of two reasons: (1) some students spend more time revising for their test; and (2) some students are naturally more intelligent than others. As such, the tutor decides to investigate the effect of revision time and intelligence on the test performance of the 100 students. The dependent and independent variables for the study are:

Dependent Variable: Test Mark (measured from 0 to 100)

Independent Variables: Revision time (measured in hours) and Intelligence (measured using IQ score)

The dependent variable is simply that: a variable that is dependent on an independent variable(s). For example, in our case, the test mark that a student achieves is dependent on revision time and intelligence. Whilst revision time and intelligence (the independent variables) may (or may not) cause a change in the test mark (the dependent variable), the reverse is implausible. In other words, whilst the number of hours a student spends revising and a student’s IQ score may (or may not) change the test mark that a student achieves, a change in a student’s test mark has no bearing on whether a student revises more or is more intelligent (this simply doesn’t make sense).

Therefore, the aim of the tutor’s investigation is to examine whether these independent variables (revision time and IQ) result in a change in the dependent variable, the students’ test scores. However, it is also worth noting that whilst this is the main aim of the experiment, the tutor may also be interested to know if the independent variables (revision time and IQ) are also connected in some way.

In the section on experimental and non-experimental research that follows, we find out a little more about the nature of independent and dependent variables.

EXPERIMENTAL AND NON-EXPERIMENTAL RESEARCH

Experimental research: In experimental research, the aim is to manipulate an independent variable(s) and then examine the effect that this change has on a dependent variable(s). Since it is possible to manipulate the independent variable(s), experimental research has the advantage of enabling a researcher to identify a cause and effect between variables. For example, take our example of 100 students completing a maths exam where the dependent variable was the exam mark (measured from 0 to 100), and the independent variables were revision time (measured in hours) and intelligence (measured using IQ score). Here, it would be possible to use an experimental design and manipulate the revision time of the students. The tutor could divide the students into two groups, each made up of 50 students. In “group one”, the tutor could ask the students not to do any revision. Alternatively, “group two” could be asked to do 20 hours of revision in the two weeks prior to the test. The tutor could then compare the marks that the students achieved.
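The two-group design described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `assign_groups`, the student ID numbers and the fixed seed are hypothetical, and random assignment is an assumption of the sketch (the text says only that the tutor divides the students into two groups of 50).

```python
import random


def assign_groups(student_ids, seed=None):
    """Split students into two equal-sized groups at random.

    Random assignment is an assumption of this sketch; the text only
    says the tutor divides the 100 students into two groups of 50.
    """
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # First half: "group one" (no revision).
    # Second half: "group two" (20 hours of revision).
    return shuffled[:half], shuffled[half:]


students = range(1, 101)  # hypothetical ID numbers for the 100 students
group_one, group_two = assign_groups(students, seed=42)
print(len(group_one), len(group_two))
```

Assigning students by chance rather than by choice means that other factors, such as IQ, are likely to be spread similarly across the two groups, so a difference in marks can more plausibly be attributed to revision time.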

Non-experimental research: In non-experimental research, the researcher does not manipulate the independent variable(s). This is not to say that it is impossible to do so, but it will be either impractical or unethical. For example, a researcher may be interested in the effect of illegal, recreational drug use (the independent variable(s)) on certain types of behaviour (the dependent variable(s)). However, whilst possible, it would be unethical to ask individuals to take illegal drugs in order to study what effect this had on certain behaviours. As such, a researcher could ask both drug and non-drug users to complete a questionnaire that had been constructed to indicate the extent to which they exhibited certain behaviours. Whilst it is not possible to identify the cause and effect between the variables, we can still examine the association or relationship between them.

In addition to understanding the difference between dependent and independent variables, and experimental and non-experimental research, it is also important to understand the different characteristics amongst variables. This is discussed next.

CATEGORICAL AND CONTINUOUS VARIABLES

Categorical variables are also known as discrete or qualitative variables. Categorical variables can be further categorized as nominal, ordinal or dichotomous.

Nominal variables are variables that have two or more categories, but which do not have an intrinsic order. For example, a real estate agent could classify their types of property into distinct categories such as houses, condos, co-operatives or bungalows. So, “type of property” is a nominal variable with four categories called houses, condos, co-operatives and bungalows. Of note, the different categories of a nominal variable can also be referred to as groups or levels of the nominal variable. Another example of a nominal variable would be classifying where people live in the USA by state. In this case, there will be many more levels of the nominal variable (50 in fact).

Dichotomous variables are nominal variables which have only two categories or levels. For example, if we were looking at gender, we would most probably categorize somebody as either “male” or “female”. This is an example of a dichotomous variable (and also a nominal variable). Another example might be if we asked a person if they owned a mobile phone. Here, we may categorize mobile phone ownership as either “Yes” or “No”. In the real estate agent example, if type of property had been classified as either residential or commercial, then “type of property” would be a dichotomous variable.

Ordinal variables are variables that have two or more categories, just like nominal variables, except that the categories can also be ordered or ranked. So, if you asked someone if they liked the policies of the Democratic Party and they could answer either “Not very much”, “They are OK” or “Yes, a lot”, then you have an ordinal variable. Why? Because you have three categories, namely “Not very much”, “They are OK” and “Yes, a lot”, and you can rank them from the most positive (Yes, a lot), to the middle response (They are OK), to the least positive (Not very much). However, whilst we can rank the levels, we cannot place a “value” on them; we cannot say that “They are OK” is twice as positive as “Not very much”, for example.
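The practical consequence of ordering can be shown with a short Python sketch. The survey responses below are made up for illustration, and the names `LEVELS` and `RANK` are hypothetical. Because the categories can be ranked, a median response is meaningful; for a purely nominal variable only the most common category (the mode) would be.

```python
from statistics import median_low, mode

# Response categories from the text, ordered from least to most positive.
LEVELS = ["Not very much", "They are OK", "Yes, a lot"]
RANK = {label: position for position, label in enumerate(LEVELS)}

# Hypothetical survey responses, invented for this illustration.
responses = [
    "Yes, a lot", "They are OK", "Not very much",
    "They are OK", "Yes, a lot", "They are OK",
]

# The ranks carry order only; they are not measured quantities.
ranks = sorted(RANK[response] for response in responses)

median_response = LEVELS[median_low(ranks)]  # needs the ordering
most_common = mode(responses)                # needs no ordering

print(median_response, "|", most_common)
```

Note that the sketch never averages the ranks: the gap between rank 0 and rank 1 is not assumed to equal the gap between rank 1 and rank 2.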

Continuous variables are also known as quantitative variables. Continuous variables can be further categorized as either interval or ratio variables.

Interval variables are variables whose central characteristic is that they can be measured along a continuum and they have a numerical value (for example, temperature measured in degrees Celsius or Fahrenheit). So, the difference between 20°C and 30°C is the same as 30°C to 40°C. However, temperature measured in degrees Celsius or Fahrenheit is NOT a ratio variable.

Ratio variables are interval variables, but with the added condition that 0 (zero) of the measurement indicates that there is none of that variable. So, temperature measured in degrees Celsius or Fahrenheit is not a ratio variable because 0°C does not mean there is no temperature. However, temperature measured in Kelvin is a ratio variable as 0 Kelvin (often called absolute zero) indicates that there is no temperature whatsoever. Other examples of ratio variables include height, mass, distance and many more. The name “ratio” reflects the fact that you can use the ratio of measurements. So, for example, a distance of 10 metres is twice the distance of 5 metres.
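The difference between interval and ratio scales can be checked numerically. The sketch below is an illustration only: the two temperatures are arbitrary example values and the helper `celsius_to_kelvin` is a hypothetical name. Differences agree on both scales, but the ratio computed in Celsius does not survive conversion to Kelvin.

```python
def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to Kelvin."""
    return celsius + 273.15


cold_c, warm_c = 10.0, 20.0  # arbitrary example temperatures

# Differences are meaningful on both scales (the interval property).
diff_c = warm_c - cold_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios are meaningful only where zero means "none" (the ratio property).
ratio_c = warm_c / cold_c  # 2.0, yet 20°C is not "twice as hot" as 10°C
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)  # about 1.035

print(diff_c, round(diff_k, 2), ratio_c, round(ratio_k, 3))
```

The Celsius ratio of 2.0 is an artefact of where the scale places its zero; on the Kelvin scale the warmer temperature is only about 3.5% higher.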

AMBIGUITIES IN CLASSIFYING A TYPE OF VARIABLES

In some cases, the measurement scale for data is ordinal, but the variable is treated as continuous. For example, a Likert scale that contains five values (strongly agree, agree, neither agree nor disagree, disagree, and strongly disagree) is ordinal. However, where a Likert scale contains seven or more values (strongly agree, moderately agree, agree, neither agree nor disagree, disagree, moderately disagree, and strongly disagree), the underlying scale is sometimes treated as continuous (although whether you should do this is a cause of great dispute).

It is worth noting that how we categorise variables is somewhat of a choice. Whilst we categorised gender as a dichotomous variable (you are either male or female), social scientists may disagree with this, arguing that gender is a more complex variable involving more than two distinctions and including levels such as genderqueer, intersex and transgender. At the same time, some researchers would argue that a Likert scale, even with seven values, should never be treated as a continuous variable.

WHAT ARE VARIABLES?

In statistics, a variable has two defining characteristics:

♦ A variable is an attribute that describes a person, place, thing, or idea.

♦ The value of the variable can “vary” from one entity to another.

For example, a person’s hair colour is a potential variable, which could have the value of “blond” for one person and “brunette” for another.

QUALITATIVE VS. QUANTITATIVE VARIABLES

Variables can be classified as qualitative (categorical) or quantitative (numeric).

Qualitative: Qualitative variables take on values that are names or labels. The colour of a ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, and terrier) would be examples of qualitative or categorical variables.

Quantitative: Quantitative variables are numeric. They represent a measurable quantity. For example, when we speak of the population of a city, we are talking about the number of people in the city, a measurable attribute of the city. Therefore, population would be a quantitative variable.

In algebraic equations, quantitative variables are represented by symbols (e.g., x, y, or z).

DISCRETE VS. CONTINUOUS VARIABLES

Quantitative variables can be further classified as discrete or continuous. If a variable can take on any value between its minimum value and its maximum value, it is called a continuous variable; otherwise, it is called a discrete variable.

Some examples will clarify the difference between discrete and continuous variables.

Suppose the fire department mandates that all fire fighters must weigh between 150 and 250 pounds. The weight of a fire fighter would be an example of a continuous variable, since a fire fighter’s weight could take on any value between 150 and 250 pounds.

Suppose we flip a coin and count the number of heads. The number of heads could be any integer value between 0 and plus infinity. However, it could not be any number between 0 and plus infinity. We could not, for example, get 2.3 heads. Therefore, the number of heads must be a discrete variable.
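The two examples above can be simulated in a few lines of Python. This is a sketch under assumptions: the seed, the choice of 10 flips and the uniform distribution of weights are arbitrary choices made only so that the example runs; nothing in the text says weights are uniformly distributed.

```python
import random

rng = random.Random(0)  # fixed seed, chosen only for reproducibility

# Discrete: the number of heads in 10 flips is always a whole number.
heads = sum(rng.random() < 0.5 for _ in range(10))

# Continuous: a weight may take any real value between 150 and 250 pounds.
# (A uniform distribution is assumed purely for illustration.)
weight = rng.uniform(150, 250)

print(heads, weight)
```

The count of heads is stored as an integer, while the simulated weight is a floating-point number that can fall anywhere in the allowed range.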

UNIVARIATE VS. BIVARIATE DATA (VARIABLES)

Statistical data are often classified according to the number of variables being studied.

Univariate Data: When we conduct a study that looks at only one variable, we say that we are working with univariate data. Suppose, for example, that we conducted a survey to estimate the average weight of high school students. Since we are only working with one variable (weight), we would be working with univariate data.

Bivariate Data: When we conduct a study that examines the relationship between two variables, we are working with bivariate data. Suppose we conducted a study to see if there were a relationship between the height and weight of high school students. Since we are working with two variables (height and weight), we would be working with bivariate data.

Multivariate Data: When we conduct a study that examines the relationship between more than two variables (for example, height, weight, nutritional status, IQ and dropout rate), we are dealing with multivariate data.

POPULATION VS. SAMPLE

The study of statistics revolves around the study of data sets. This lesson describes two important types of data sets: populations and samples. Along the way, we introduce simple random sampling, the main method used in this chapter to select samples.

The main difference between a population and a sample has to do with how observations are assigned to the data set.

A population includes all of the elements from a set of data.

A sample consists of one or more observations from the population.

Depending on the sampling method, a sample can have fewer observations than the population, the same number of observations, or more observations. More than one sample can be derived from the same population.

Other differences have to do with nomenclature, notation, and computations. For example,

A measurable characteristic of a population, such as a mean or standard deviation, is called a parameter; but a measurable characteristic of a sample is called a statistic.

We will see in future lessons that the mean of a population is denoted by the symbol μ; but the mean of a sample is denoted by the symbol x̄ (x-bar).

We will also learn in future lessons that the formula for the standard deviation of a population is different from the formula for the standard deviation of a sample.
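As a preview of that difference, here is a minimal Python sketch. The data values are hypothetical. It uses the standard convention that the population formula divides the sum of squared deviations by N while the sample formula divides by n - 1; this convention is stated here as background and is not quoted from the chapter.

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7]  # hypothetical measurements

mean = statistics.mean(data)
squared_deviations = sum((x - mean) ** 2 for x in data)

# Treating the data as an entire population: divide by N.
population_sd = math.sqrt(squared_deviations / len(data))

# Treating the data as a sample from a larger population: divide by n - 1.
sample_sd = math.sqrt(squared_deviations / (len(data) - 1))

# Python's statistics module provides both versions.
assert math.isclose(population_sd, statistics.pstdev(data))
assert math.isclose(sample_sd, statistics.stdev(data))

print(round(population_sd, 3), round(sample_sd, 3))  # 1.708 1.871
```

For the same numbers, the sample standard deviation is always the larger of the two, because it divides by a smaller quantity.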

We can explain this concept with the help of simple random sampling. A sampling method is a procedure for selecting sample elements from a population. Simple random sampling refers to a sampling method that has the following properties:

The population consists of N objects.

The sample consists of n objects.

All possible samples of n objects are equally likely to occur.

An important benefit of simple random sampling is that it allows researchers to use statistical methods to analyze sample results. For example, given a simple random sample, researchers can use statistical methods to define a confidence interval around a sample mean. Statistical analysis is not appropriate when non-random sampling methods are used.

There are many ways to obtain a simple random sample. One way would be the lottery method. Each of the N population members is assigned a unique number. The numbers are placed in a bowl and thoroughly mixed. Then, a blind-folded researcher selects n numbers. Population members having the selected numbers are included in the sample.
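The lottery method can be imitated in Python. This is a sketch, not a prescribed procedure: the function name `lottery_sample`, the population of 20 numbered members and the seed are hypothetical, and the sketch assumes that a drawn number is set aside rather than returned to the bowl.

```python
import random
from math import comb


def lottery_sample(population, n, seed=None):
    """Draw a simple random sample of n distinct population members."""
    rng = random.Random(seed)
    # random.sample selects n distinct members, with every subset of
    # size n equally likely, much like drawing numbers from a bowl.
    return rng.sample(population, n)


population = list(range(1, 21))  # N = 20 hypothetical member numbers
sample = lottery_sample(population, n=5, seed=1)

print(sample)
print(comb(len(population), 5))  # number of possible samples: 15504
```

With N = 20 and n = 5 there are 15504 possible samples when the order of selection is ignored, and simple random sampling gives each of them the same chance of being chosen.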

SAMPLING WITH REPLACEMENT AND WITHOUT REPLACEMENT

Suppose we use the lottery method described above to select a simple random sample. After we pick a number from the bowl, we can put the number aside or we can put it back into the bowl. If we put the number back in the bowl, it may be selected more than once; if we put it aside, it can be selected only one time.

When a population element can be selected more than one time, we are sampling with replacement. When a population element can be selected only one time, we are sampling without replacement. In the next chapter, we discuss data collection methods in detail.
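Both procedures are available in Python's standard library, as the following sketch shows. The bowl of five numbers, the seed and the sample sizes are hypothetical values chosen for illustration.

```python
import random

rng = random.Random(7)  # fixed seed for a repeatable illustration
bowl = [1, 2, 3, 4, 5]  # hypothetical numbered population members

# Without replacement: each number is set aside once drawn,
# so it can appear at most once in the sample.
without_replacement = rng.sample(bowl, k=5)

# With replacement: each number goes back into the bowl, so repeats are
# possible and the sample can even be larger than the population.
with_replacement = rng.choices(bowl, k=8)

print(without_replacement, with_replacement)
```

Drawing eight numbers from a bowl of five is possible only with replacement, which is how a sample can contain more observations than the population it was drawn from.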