The Who, What, Why, Where, When and How of Data

BMGT220 - Lecture 1



Business Statistics


Statistics consists of two parts:
• Descriptive statistics (coping with lots of numbers)
  - Draw a picture (graphs, charts, etc.)
  - Calculate a few numbers that summarize the data (mean, median, percentile)
• Inferential statistics
  - How can we make decisions and predictions about a population when we have data for relatively few subjects from that population?
  - We need to generalize the facts we learn from a sample (i.e., a part of the population) to the entire population.
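As a minimal sketch of the descriptive side, Python's standard `statistics` module can compute the summary numbers named above. The nine heights are hypothetical values made up for illustration, and quartiles (the 25th, 50th and 75th percentiles) stand in for "percentile".

```python
import statistics

# Hypothetical heights in inches (made-up data for illustration only)
heights = [62, 64, 65, 67, 68, 70, 71, 72, 74]

mean_height = statistics.mean(heights)      # arithmetic average
median_height = statistics.median(heights)  # middle value of the sorted data
# Quartile cut points: the 25th, 50th and 75th percentiles
quartiles = statistics.quantiles(heights, n=4)

print(f"mean = {mean_height:.2f}, median = {median_height}")
print(f"quartiles = {quartiles}")
```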

Data Types
Let's consider the following questions:
• What is your sex?
• How tall are you? (inches)
• What year are you in school?
• What is your major?
• How many miles do you travel to UMD each day?
• What is your GPA?

Variables
Each question measures some aspect of you.
• Variable: the aspect/characteristic that differs from subject to subject, individual to individual.
  Example: Age, Sex, Major, ...
• Data: the values of the variables.
  Example: 20, Male, English, ...

Two Types of Variables
• Quantitative or numerical variables: numbers, measurements
  Examples: age, height, miles traveled
• Qualitative or categorical variables: classify each observation
  Examples: sex, year in school, major

Quantitative or Numerical Variables
• Discrete variables: there is a natural gap between the values
  Examples: number of children, number of credit cards
• Continuous variables: the values can be arbitrarily close together
  Examples: weight, height, age

Qualitative or Categorical Variables
• Ordinal variables: categories that have a natural ordering; numbers could be assigned to the categories
  Class: 1 = Freshman, 2 = Sophomore, 3 = Junior, 4 = Senior
  Grade: A, B, C, D, F (GPA)
  Preference: Strongly Agree, Agree, Disagree, Strongly Disagree
• Nominal variables: categories that have no natural ordering
  Major: business, mathematics, history
  Eye color: blue, green, black
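One way to see the ordinal/nominal distinction in code is with Python's `enum` module, as a sketch: `IntEnum` carries the 1 to 4 coding of class year from the slide, so ordering comparisons work, while a plain `Enum` for eye color defines no ordering at all. The class and member names are hypothetical choices for illustration.

```python
from enum import Enum, IntEnum

class ClassYear(IntEnum):
    """Ordinal: categories have a natural order, coded 1-4 as on the slide."""
    FRESHMAN = 1
    SOPHOMORE = 2
    JUNIOR = 3
    SENIOR = 4

class EyeColor(Enum):
    """Nominal: categories have no natural order."""
    BLUE = "blue"
    GREEN = "green"
    BLACK = "black"

# Ordering comparisons are meaningful for the ordinal variable...
print(ClassYear.JUNIOR > ClassYear.FRESHMAN)  # True

# ...but not for the nominal one: plain Enum defines no ordering, so < raises TypeError
try:
    EyeColor.BLUE < EyeColor.GREEN
except TypeError as err:
    print("Nominal categories cannot be ordered:", err)
```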

Types of Variables: Summary
Variables
• Qualitative: nominal, ordinal
• Quantitative: discrete, continuous

Examples (What are the types?)
1. Appraisal of a company's inventory level (excellent, good, fair, poor)
2. Mode of transportation to work (automobile, bicycle, bus, subway, walk)
3. Speed of a vehicle
4. The number of persons in each family
5. The diagnostic test for pneumonia (symptoms present, symptoms absent)

Answers:
1. Qualitative, ordinal
2. Qualitative, nominal
3. Quantitative, continuous
4. Quantitative, discrete
5. Qualitative, nominal

Interval data
• No meaningful zero point; you can't multiply or divide (ratios are not meaningful), but the difference between two values is meaningful
  Example: temperature
Ratio data
• Meaningful zero point; you can multiply and divide
  Examples: income, weight, height
Time series data
• Data values ordered over time
Cross-sectional data
• Data values observed at a single point in time
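A small worked check of why ratios fail for interval data: converting Fahrenheit to Celsius shifts the zero point, so "twice as hot" changes with the scale, while a comparison of differences does not. Weight (ratio data) keeps its ratios under a change of units. The specific temperatures and weights are hypothetical, and 2.20462 lb/kg is an approximate conversion factor.

```python
def fahrenheit_to_celsius(f):
    """Interval scale: the zero point shifts (32 °F = 0 °C), not just the unit."""
    return (f - 32) * 5 / 9

def kg_to_lb(kg):
    """Ratio scale: 0 kg = 0 lb, so converting only rescales (factor is approximate)."""
    return kg * 2.20462

# Is 80 °F "twice as hot" as 40 °F? The ratio depends on the scale used...
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)
print(ratio_f, ratio_c)  # 2.0 in Fahrenheit, about 6.0 in Celsius

# ...but comparing differences gives the same answer on both scales.
diff_f = (80 - 60) / (60 - 40)
diff_c = (fahrenheit_to_celsius(80) - fahrenheit_to_celsius(60)) / (
    fahrenheit_to_celsius(60) - fahrenheit_to_celsius(40)
)
print(diff_f, diff_c)  # both about 1.0

# For ratio data, "twice as heavy" holds in any unit.
print(100 / 50, kg_to_lb(100) / kg_to_lb(50))  # both about 2.0
```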

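Sales (in $10,000s)
              2003   2004   2005   2006
New York       435    460    475    490
Dallas         320    345    375    395
Seattle        405    390    410    395
Orlando        260    270    285    280

Time series data: one city's sales across the years (a row of the table)
Cross-sectional data: all cities' sales in a single year (a column of the table)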
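The distinction can be made concrete by slicing the sales table two ways, as a sketch: following one city across the years gives a time series (a row), and reading every city in one year gives cross-sectional data (a column). The figures are the ones in the table; the variable names are illustrative.

```python
# Sales (in $10,000s) from the table
years = [2003, 2004, 2005, 2006]
sales = {
    "New York": [435, 460, 475, 490],
    "Dallas": [320, 345, 375, 395],
    "Seattle": [405, 390, 410, 395],
    "Orlando": [260, 270, 285, 280],
}

# Time series: one city followed across the years (a row)
new_york_series = dict(zip(years, sales["New York"]))

# Cross-sectional: every city observed in a single year (a column)
col = years.index(2005)
cross_section_2005 = {city: row[col] for city, row in sales.items()}

print(new_york_series)
print(cross_section_2005)
```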

Surveys and Sampling
Key Ideas
Idea 1: Examine a Part of the Whole
• The first idea is to draw a sample.
• Goal: learn about an entire population of individuals when examining all of them is not feasible.
• Examine a smaller group of individuals, called a sample, chosen from the population.
• Samples that over- or underemphasize some characteristics of the population are said to be biased.

Bias
• The sample doesn't represent the population.
• Trouble! Generalizations are no longer valid, and conclusions may no longer be true.

Sources of Bias
• Selection bias
  - A problem in the sampling scheme: a systematic tendency to exclude one kind of individual from the survey
  - A difference between the population of interest and the effective population
• Non-response bias
  - Subjects don't answer
  - Subjects skip questions
• Response bias
  - Subjects lie
  - Interviewer effect

Telephone Poll Bias
• Selection bias: cell phones, multiple phones, who answers the phone
• Non-response bias: answering machines, social life
• Response bias

Why Those Internet Polls Are Worthless
• Self-selected sample
• More passionate = more likely to respond
• Minority opinion = more passion
• Result: the poll can show the opposite of the truth. This is selection bias.
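The self-selection effect can be sketched with a small simulation. Everything in it is an assumption made up for illustration: a population in which about 30% hold the minority opinion, and response rates of 50% for that passionate minority versus 10% for everyone else. Under those assumed rates, the minority view shows up as the majority among respondents.

```python
import random

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population: about 30% hold the (passionate) minority opinion.
POPULATION_SIZE = 100_000
population = [rng.random() < 0.30 for _ in range(POPULATION_SIZE)]  # True = minority opinion

# Assumed response rates: passionate people are far more likely to reply.
RESPOND_IF_MINORITY = 0.50
RESPOND_IF_MAJORITY = 0.10

# Each person decides independently whether to answer the internet poll.
responses = [
    opinion
    for opinion in population
    if rng.random() < (RESPOND_IF_MINORITY if opinion else RESPOND_IF_MAJORITY)
]

true_share = sum(population) / len(population)
poll_share = sum(responses) / len(responses)
print(f"true share: {true_share:.2f}, internet-poll share: {poll_share:.2f}")
```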

Idea 2: Randomize
• Randomization can protect you against factors that you know are in the data. It can also help protect against factors you are not even aware of.
• Randomization protects against bias in how the sample is selected.
• Randomizing makes sure that, on average, the sample looks like the rest of the population.
• Sample-to-sample differences are referred to as sampling error.
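Sampling error and the "on average" claim can be illustrated by repeatedly drawing random samples from one population, as a sketch. The population (the numbers 0 through 9,999), the sample size of 100, and the 1,000 repetitions are hypothetical choices. Individual sample means vary from sample to sample, yet their average lands close to the population mean.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical population of 10,000 values; its mean is a parameter.
population = list(range(10_000))
population_mean = statistics.mean(population)  # 4999.5

# Draw many random samples of size 100; each sample mean is a statistic.
sample_means = [statistics.mean(rng.sample(population, 100)) for _ in range(1_000)]

# Sampling error: individual sample means differ from sample to sample...
print(min(sample_means), max(sample_means))
# ...but on average they land close to the population mean.
print(statistics.mean(sample_means), population_mean)
```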

Idea 3: The Sample Size Is What Matters
• How large a random sample do we need for the sample to be reasonably representative of the population?
• It is the size of the sample, not the size of the population, that makes the difference in sampling.
  Exception: if the population is small enough that the sample is more than 10% of the whole population, the population size can matter.
• The fraction of the population that you have sampled does not matter. It is the sample size itself that is important.
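To put numbers on Idea 3, the sketch below uses the textbook standard error of a sample proportion with the finite population correction sqrt((N - n)/(N - 1)). That formula is standard but is not stated in this lecture, and p = 0.5 is an assumed worst case. With the same n, the population size barely matters; with the same sampling fraction, a larger n is far more precise; once the sample is a large share of a small population, N does matter.

```python
import math

def standard_error(n, N, p=0.5):
    """Standard error of a sample proportion for a random sample of n out of N,
    including the finite population correction sqrt((N - n) / (N - 1))."""
    fpc = math.sqrt((N - n) / (N - 1))
    return math.sqrt(p * (1 - p) / n) * fpc

# Same sample size, very different population sizes: nearly identical precision.
print(standard_error(1_000, 1_000_000))    # about 0.0158
print(standard_error(1_000, 300_000_000))  # about 0.0158

# Same sampling fraction (1%), different sample sizes: very different precision.
print(standard_error(100, 10_000))         # about 0.050
print(standard_error(10_000, 1_000_000))   # about 0.005

# Exception: the sample is 50% of a small population, so N matters.
print(standard_error(1_000, 2_000))        # about 0.0112
```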

Population vs. Sample
• Population: the entire group of individuals in which we are interested but can't usually assess directly.
  Examples: all voters in the US, Visa cardholders in D.C., all packages at a UPS center
• A parameter is a number describing a characteristic of the population.
• Sample: the part of the population we actually examine and for which we do have data. How well the sample represents the population depends on the sample design.
• A statistic is a number describing a characteristic of a sample.

Sampling Techniques
• Nonstatistical sampling: convenience, voluntary
• Statistical sampling: simple random, systematic, stratified, cluster

Convenience
• Collected in the most convenient manner for the researcher (ask whoever is around)
• Bias: opinions are limited to the individuals present
Voluntary
• Individuals choose to be involved. These samples are very susceptible to bias because different people are motivated to respond or not. Often called public opinion polls, these are not considered valid or scientific.
• Bias: the sample design systematically favors a particular outcome

Statistical Sampling (Probability Sampling)
• Individuals in the sample are chosen based on known or calculable probabilities.
• Types: simple random, systematic, stratified, cluster

Simple Random Sampling
• Every possible sample of a given size has an equal chance of being selected.
• The simplest way to obtain a sample is to draw names out of a hat.
• The sample can be obtained using a table of random numbers or a computer random number generator.

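Sampling Frame
• A list of the population
  Examples: phone book, registered voter list, membership lists
• The frame determines the effective population: only individuals on the list can be sampled.
• A simple random sample (SRS) picks equally from the whole frame.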
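A computer random number generator makes a simple random sample a one-liner, as a sketch. `random.sample` draws without replacement so that every subset of the requested size is equally likely, which is the SRS property. The membership list of 500 IDs and the sample size of 25 are hypothetical; note that only names on the frame can ever be chosen.

```python
import random

# Hypothetical sampling frame: a membership list of 500 IDs.
# Anyone missing from this list can never be selected (effective population).
frame = [f"member-{i:03d}" for i in range(500)]

rng = random.Random(2024)  # seeded so the draw can be reproduced

# Draw without replacement; every subset of size 25 is equally likely.
srs = rng.sample(frame, k=25)
print(srs[:5])
```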

Stratified Random Sampling
• Divide the population into subgroups (called strata) according to some common characteristic, e.g., gender, income level.
• Select a simple random sample from each subgroup.
• Combine the samples from the subgroups into one.
[Figure: population divided into 4 strata, with a sample drawn from each]
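The three steps above can be sketched as a small function. The population of 400 (id, income level) pairs with four income strata is hypothetical, and taking the same number (5) from every stratum is an assumed equal allocation; the slide does not say how many to draw per stratum.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Split the population into strata using stratum_of(), take a simple
    random sample of n_per_stratum from each stratum, and combine them."""
    strata = {}
    for individual in population:
        strata.setdefault(stratum_of(individual), []).append(individual)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

# Hypothetical population: (id, income level) pairs in 4 income strata
levels = ["low", "lower-middle", "upper-middle", "high"]
population = [(i, levels[i % 4]) for i in range(400)]

rng = random.Random(1)
sample = stratified_sample(population, lambda person: person[1], 5, rng)
print(len(sample))  # 20: five from each of the 4 strata
```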

Cluster Sampling
• Divide the population into several clusters, each representative of the population (e.g., county).
• Select a simple random sample of clusters.
• All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique.
[Figure: population divided into 16 clusters; randomly selected clusters form the sample]
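A sketch of cluster sampling with 16 clusters, matching the figure's count. The county names, 50 people per county, picking 4 clusters, and drawing 10 people per chosen cluster are hypothetical choices. The key point is that the random draw is over clusters, not individuals; the code shows both options from the slide: keep everyone in the chosen clusters, or subsample within them.

```python
import random

# Hypothetical population grouped into 16 clusters (e.g., counties), 50 people each
clusters = {
    f"county-{c:02d}": [f"county-{c:02d}/person-{p:02d}" for p in range(50)]
    for c in range(16)
}

rng = random.Random(3)

# Stage 1: a simple random sample of clusters (not of individuals)
chosen = rng.sample(sorted(clusters), k=4)

# Option A: use every item in the selected clusters
one_stage = [person for county in chosen for person in clusters[county]]

# Option B: take a simple random sample of 10 within each selected cluster
two_stage = [person for county in chosen for person in rng.sample(clusters[county], 10)]

print(chosen, len(one_stage), len(two_stage))  # 4 clusters, 200 and 40 people
```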

Systematic Random Sampling
• Decide on the sample size: n
• Divide the ordered (e.g., alphabetical) frame of N individuals into groups of k individuals: k = N/n
• Randomly select one individual from the 1st group.
• Select every kth individual thereafter.
Example: N = 64, n = 8, k = 8 (the figure highlights the first group)
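The procedure above, with the slide's N = 64 and n = 8, can be sketched as follows. The frame of names is hypothetical, and when N is not an exact multiple of n this sketch simply uses k = N // n (integer division), which keeps every index inside the frame.

```python
import random

def systematic_sample(frame, n, rng):
    """Systematic random sample: k = N // n, a random start in the
    first group of k, then every kth individual after that."""
    k = len(frame) // n                  # integer division if N is not a multiple of n
    start = rng.randrange(k)             # random position 0..k-1 within the first group
    return [frame[start + i * k] for i in range(n)]

frame = sorted(f"person-{i:02d}" for i in range(64))  # hypothetical ordered frame, N = 64
rng = random.Random(5)
sample = systematic_sample(frame, 8, rng)             # n = 8, so k = 8
print(sample)
```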

Exit Polls
• Stratify on states.
• Choose a simple random sample of polling places in each state. The number of polling places is proportional to the number of voters in each state.
• Choose a simple random sample of voters as they leave the polling place.
• Where is the bias?
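The "proportional to the number of voters" step can be sketched as an allocation followed by a simple random sample of polling places within each state (stratum). The three states, their voter counts, the 200 polling places per state, and the total of 100 places are all hypothetical. Rounding with the largest-remainder method is my choice for making the counts add up exactly; the slide does not specify a rounding rule.

```python
import random

def allocate_proportionally(voters_by_state, total_places):
    """Split total_places across states in proportion to voters, using
    largest-remainder rounding so the counts add up exactly."""
    grand_total = sum(voters_by_state.values())
    quotas = {s: total_places * v / grand_total for s, v in voters_by_state.items()}
    allocation = {s: int(q) for s, q in quotas.items()}  # round down first
    leftover = total_places - sum(allocation.values())
    # Hand out the remaining places to the largest fractional remainders
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - allocation[s], reverse=True)
    for s in by_remainder[:leftover]:
        allocation[s] += 1
    return allocation

# Hypothetical voter counts and polling-place lists for three states
voters = {"State A": 5_000_000, "State B": 3_000_000, "State C": 2_000_000}
polling_places = {s: [f"{s} place {i}" for i in range(200)] for s in voters}

allocation = allocate_proportionally(voters, 100)
rng = random.Random(11)
# Within each state (stratum), a simple random sample of polling places
chosen_places = {s: rng.sample(polling_places[s], k) for s, k in allocation.items()}
print(allocation)  # {'State A': 50, 'State B': 30, 'State C': 20}
```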


Sources of Bias
• Selection bias: Are the pollsters more likely to choose women or young people? Pollsters were not allowed at certain polling places.
• Non-response bias: Are Republicans (Democrats) more suspicious of pollsters, and more likely to refuse to answer the questions?
• Response bias: Do people know who they voted for? Are they lying to the pollster?

Sample Surveys
• A sample survey is designed to ask questions of a small group of people in order to learn something about the entire population.
• A main objective is to collect accurate and reliable data so that we can make appropriate decisions.

Survey Design
• Define the issue
  - What are the purpose and objectives of the survey?
  - How will the survey be administered? (e.g., phone, email, face to face)
• Define the population of interest
• Develop survey questions
  - Make questions clear and unambiguous
  - Use universally accepted definitions
  - Limit the number of questions
• Pre-test the survey
  - Pilot test with a small group of participants
  - Assess clarity and length
• Determine the sample size and sampling method
• Select the sample and administer the survey


Types of Questions
• Closed-end questions: select from a short list of defined choices
  Example: Major: __business __liberal arts __science __other
• Open-end questions: respondents are free to respond with any value, words, or statement
  Example: What did you like best about this course?

Demographic Questions
• Questions about the respondent's personal characteristics
  Example: Gender: __Female __Male