Business Statistics
The Who, What, Why, Where, When and How of Data
Statistics consists of two parts:
- Descriptive statistics (coping with lots of numbers)
  - Draw a picture (graphs, charts, etc.)
  - Calculate a few numbers that summarize the data (mean, median, percentiles)
- Inferential statistics
  - How can we make decisions and predictions about a population when we have data for relatively few subjects from that population?
  - We need to generalize the facts we learn from a sample (i.e., a part of the population) to the entire population.
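The descriptive summaries named above (mean, median, percentile) can be computed with Python's standard `statistics` module. This is a minimal sketch; the commute distances are hypothetical values invented for illustration, loosely modeled on the "miles to UMD" survey question.

```python
import statistics

# Hypothetical data: daily commute distances (miles) for ten students.
miles = [3, 5, 8, 2, 12, 7, 5, 20, 4, 6]

mean = statistics.mean(miles)      # arithmetic average
median = statistics.median(miles)  # middle value of the sorted data

# quantiles(..., n=4) returns the three quartile cut points
# (25th, 50th and 75th percentiles); the middle one equals the median.
q1, q2, q3 = statistics.quantiles(miles, n=4)

print(f"mean={mean}, median={median}, quartiles={q1}, {q2}, {q3}")
```

Note how the single long commute (20 miles) pulls the mean above the median; that is one reason to report more than one summary number.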
Data Types
Let's consider the following questions:
1. What is your sex?
2. How tall are you? (inches)
3. What year are you in school?
4. What is your major?
5. How many miles do you travel to UMD each day?
6. What is your GPA?
Variables
Each question measures some aspect of you.
- Variable: the aspect or characteristic that differs from subject to subject (individual to individual), e.g., age, sex, major, ...
- Data: the values of the variables, e.g., 20, Male, English, ...
Two Types of Variables
- Quantitative or numerical variables: numbers, measurements (age, height, miles traveled)
- Qualitative or categorical variables: classify each observation (sex, year in school, major)
Quantitative or Numerical Variables
- Discrete variables: there is a natural gap between the values (number of children, number of credit cards)
- Continuous variables: the values can be arbitrarily close together (weight, height, age)
Qualitative or Categorical Variables
- Ordinal variables: categories that have a natural ordering; numbers can be assigned to the categories
  - Class: 1 = Freshman, 2 = Sophomore, 3 = Junior, 4 = Senior
  - Grade: A, B, C, D, F (GPA)
  - Preference: Strongly Agree, Agree, Disagree, Strongly Disagree
- Nominal variables: categories that have no natural ordering
  - Major: business, mathematics, history
  - Eye color: blue, green, black
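A small sketch of why the ordinal/nominal distinction matters in practice. It assumes the 1 to 4 class-year coding from the slide; the student and major lists are hypothetical. Ordinal codes support ordering and "at least" comparisons, while nominal categories only support counting.

```python
from collections import Counter

# Ordinal: class year has a natural order, so numeric codes preserve it.
CLASS_CODE = {"Freshman": 1, "Sophomore": 2, "Junior": 3, "Senior": 4}

students = ["Junior", "Freshman", "Senior", "Sophomore", "Junior"]  # hypothetical
ordered = sorted(students, key=CLASS_CODE.get)

# "At least a Junior" is a meaningful question for an ordinal variable.
at_least_junior = sum(CLASS_CODE[s] >= CLASS_CODE["Junior"] for s in students)

# Nominal: majors have no natural order. Codes could be assigned, but
# "history > business" would be meaningless, so we only count categories.
majors = ["business", "history", "mathematics", "business"]  # hypothetical
major_counts = Counter(majors)

print(ordered, at_least_junior, major_counts)
```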
Types of Variables: Summary
Variables
- Qualitative: nominal, ordinal
- Quantitative: discrete, continuous
Examples (What are the types?)
1. Appraisal of a company's inventory level (excellent, good, fair, poor)
2. Mode of transportation to work (automobile, bicycle, bus, subway, walk)
3. Speed of a vehicle
4. The number of persons in each family
5. The diagnostic test for pneumonia (symptoms present, symptoms absent)

Answers:
1. Qualitative, ordinal
2. Qualitative, nominal
3. Quantitative, continuous
4. Quantitative, discrete
5. Qualitative, nominal
- Interval data: no meaningful zero point; you can't multiply or divide, but the difference between two values is meaningful (temperature in degrees Celsius or Fahrenheit)
- Ratio data: meaningful zero point; you can multiply and divide (income, weight, height)
- Time series data: data values ordered over time
- Cross-sectional data: data values observed at a single point in time
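To see why ratios fail on an interval scale, compare 10 and 20 degrees Celsius. This sketch uses the standard conversion K = C + 273.15; the Kelvin scale has a true zero, so it is ratio data. The specific temperatures and weights are arbitrary illustrative values.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return c + 273.15

low_c, high_c = 10.0, 20.0

difference_c = high_c - low_c   # 10 degrees: differences are meaningful
naive_ratio = high_c / low_c    # 2.0, but 20 C is NOT "twice as hot" as 10 C

# The same two temperatures on a scale with a meaningful zero:
kelvin_ratio = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)  # about 1.035
difference_k = celsius_to_kelvin(high_c) - celsius_to_kelvin(low_c)  # still 10

# Weight is ratio data: 0 kg means "no weight", so 80 kg really is twice 40 kg.
weight_ratio = 80.0 / 40.0

print(difference_c, naive_ratio, round(kelvin_ratio, 3), weight_ratio)
```

The difference survives the change of scale, but the ratio does not, which is exactly the interval-versus-ratio distinction.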
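Sales (in $10,000s)
City        2003  2004  2005  2006
New York     435   460   475   490
Dallas       320   345   375   395
Seattle      405   390   410   395
Orlando      260   270   285   280

Each row (one city across the years) is time series data; each column (all cities in one year) is cross-sectional data.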
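Using the sales figures from the table, a time series is one city read across the years, and a cross section is one year read across the cities. A minimal sketch; the helper names `time_series` and `cross_section` are hypothetical.

```python
YEARS = [2003, 2004, 2005, 2006]

# Sales in $10,000s, taken from the table.
SALES = {
    "New York": [435, 460, 475, 490],
    "Dallas":   [320, 345, 375, 395],
    "Seattle":  [405, 390, 410, 395],
    "Orlando":  [260, 270, 285, 280],
}

def time_series(city):
    """One city observed at several points in time (a row of the table)."""
    return list(zip(YEARS, SALES[city]))

def cross_section(year):
    """All cities observed at a single point in time (a column of the table)."""
    i = YEARS.index(year)
    return {city: values[i] for city, values in SALES.items()}

print(time_series("New York"))
print(cross_section(2006))
```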
Surveys and Sampling
Key Idea 1: Examine a Part of the Whole
The first idea is to draw a sample.
- Goal: learn about an entire population of individuals, but examining all of them is not feasible.
- Examine a smaller group of individuals, called a sample, chosen from the population.
- Samples that over- or underemphasize some characteristics of the population are said to be biased.
Bias
The sample doesn't represent the population. Trouble!
- Generalizations are no longer valid.
- Conclusions may no longer be true.
Sources of Bias
- Selection bias: a problem in the sampling scheme; a systematic tendency to exclude one kind of individual from the survey. It is the difference between the population of interest and the effective population.
- Non-response bias: subjects don't answer, or skip questions.
- Response bias: subjects lie; interviewer effect.
Telephone Poll Bias
- Selection bias: cell phones, multiple phones, who answers the phone
- Non-response bias: answering machines, social life
- Response bias
Why Those Internet Polls Are Worthless
- Self-selected sample
- More passionate = more likely to respond
- Minority opinion = more passion
- Result: the poll can show the opposite of the truth
Selection Bias
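The "opposite of the truth" effect can be shown with simple arithmetic. All numbers below are hypothetical assumptions: a population of 1,000 in which 70% mildly favor a proposal, and a passionate 30% minority that opposes it and is four times as likely to vote in an online poll.

```python
# Hypothetical population (made-up numbers for illustration).
population = {"favor": 700, "oppose": 300}

# Assumed response rates: the passionate minority responds far more often.
response_rate = {"favor": 0.10, "oppose": 0.40}

# Expected number of poll responses from each group.
responses = {view: round(n * response_rate[view]) for view, n in population.items()}
total_responses = sum(responses.values())

true_share_favor = population["favor"] / sum(population.values())  # 0.70
poll_share_favor = responses["favor"] / total_responses            # 70 / 190

print(f"truth: {true_share_favor:.0%} favor; poll: {poll_share_favor:.0%} favor")
```

A 70% majority appears as a minority of about 37% among the respondents, because who answers is decided by the respondents rather than by chance.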
Key Idea 2: Randomize
Randomization can protect you against factors that you know are in the data. It can also help protect against factors you are not even aware of.
- Randomization protects against selection bias.
- Randomizing makes sure that, on average, the sample looks like the rest of the population.
- Sample-to-sample differences are referred to as sampling error.
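A sketch contrasting a non-random choice with randomization. It assumes a hypothetical frame listed in a systematic order (here simply the values 1 to 1000, as if sorted by income), so "take the first 50 on the list" is badly biased. Several random samples of the same size land near the population mean, and the spread among them is sampling error.

```python
import random
import statistics

# Hypothetical frame in systematic order: small values come first.
population = list(range(1, 1001))
true_mean = statistics.mean(population)   # parameter: 500.5

# Non-random choice: whoever is at the top of the list.
first_50 = population[:50]
biased_mean = statistics.mean(first_50)   # 25.5, far from the truth

# Randomized choice: five independent simple random samples of size 50.
rng = random.Random(2024)                 # seeded only so reruns match
random_means = [statistics.mean(rng.sample(population, 50)) for _ in range(5)]

print("truth:", true_mean)
print("first 50 on the list:", biased_mean)
print("random samples:", [round(m, 1) for m in random_means])
```

The random sample means differ from one another (sampling error), but they scatter around 500.5 instead of sitting systematically off to one side (bias).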
Key Idea 3: The Sample Size Is What Matters
How large a random sample do we need for the sample to be reasonably representative of the population?
- It is the size of the sample, not the size of the population, that makes the difference in sampling.
- Exception: if the population is small enough that the sample is more than 10% of the whole population, the population size can matter.
- The fraction of the population that you have sampled does not matter. It is the sample size itself that is important.
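One way to check this claim is the standard error of a sample proportion with the finite population correction, SE = sqrt(p(1 - p)/n) * sqrt((N - n)/(N - 1)). This formula is a standard textbook result brought in here, not something stated on the slides; the sketch assumes simple random sampling without replacement and the worst case p = 0.5.

```python
import math

def se_proportion(n: int, N: int, p: float = 0.5) -> float:
    """Standard error of a sample proportion for an SRS of size n
    drawn without replacement from a population of size N."""
    fpc = math.sqrt((N - n) / (N - 1))   # finite population correction
    return math.sqrt(p * (1 - p) / n) * fpc

n = 1_000
for N in (10_000, 1_000_000, 300_000_000):
    print(f"N = {N:>11,}: SE = {se_proportion(n, N):.4f}")
```

With n = 1,000 the standard error is essentially the same whether the population is one million or three hundred million. Only when the sample is a sizable fraction of the population (here 10% of N = 10,000) does the correction noticeably shrink it.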
Population vs. Sample
- Population: the entire group of individuals in which we are interested but can't usually assess directly. Examples: all voters in the US, Visa card holders in D.C., all packages at a UPS center.
  - A parameter is a number describing a characteristic of the population.
- Sample: the part of the population we actually examine and for which we do have data. How well the sample represents the population depends on the sample design.
  - A statistic is a number describing a characteristic of a sample.
Sampling Techniques
- Nonstatistical sampling: convenience, voluntary
- Statistical sampling: simple random, systematic, stratified, cluster
Nonstatistical Sampling
- Convenience: collected in the most convenient manner for the researcher (ask whoever is around).
  Bias: opinions are limited to the individuals present.
- Voluntary: individuals choose to be involved. These samples are very susceptible to bias because different people are motivated to respond or not. Often called public opinion polls, they are not considered valid or scientific.
  Bias: the sample design systematically favors a particular outcome.
Statistical Sampling
Individuals in the sample are chosen based on known or calculable probabilities.
Statistical (probability) sampling methods: simple random, systematic, stratified, cluster
Simple Random Sampling
- Every possible sample of a given size has an equal chance of being selected.
- The simplest way to obtain a sample is to draw names out of a hat.
- The sample can be obtained using a table of random numbers or a computer random number generator.
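A computer version of "drawing names out of a hat". Python's `random.sample` selects without replacement so that every subset of the requested size is equally likely, which matches the definition of a simple random sample. The membership-list frame and the helper name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct individuals from the frame; every subset of
    size n has the same chance of being chosen."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame, e.g. a membership list of 200 people.
frame = [f"member_{i:03d}" for i in range(1, 201)]

sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```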
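Sampling Frame
- A list of the individuals in the population, e.g., a phone book, a registered voter list, membership lists
- The frame determines the effective population: only individuals on the list can be sampled.
- An SRS (simple random sample) picks equally from the whole frame.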
Stratified Random Sampling
- Divide the population into subgroups (called strata) according to some common characteristic, e.g., gender or income level.
- Select a simple random sample from each subgroup.
- Combine the samples from the subgroups into one.
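The three steps of stratified sampling map directly onto code. This sketch assumes equal allocation (the same number drawn from every stratum) and that every stratum has at least that many members; the population of (id, income level) pairs and the function name are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    rng = random.Random(seed)

    # Step 1: divide the population into strata.
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)

    # Step 2: take a simple random sample within each stratum.
    # Step 3: combine the per-stratum samples into one sample.
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

# Hypothetical population: 90 people, 30 in each income level.
levels = ("low", "middle", "high")
population = [(i, levels[i % 3]) for i in range(90)]

sample = stratified_sample(population, lambda p: p[1], 5, seed=3)
print(Counter(level for _, level in sample))
```

Unlike a single SRS, this guarantees that every income level appears in the sample.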
[Figure: population divided into 4 strata, and the resulting sample]
Cluster Sampling
- Divide the population into several clusters, each representative of the population (e.g., counties).
- Select a simple random sample of clusters.
- All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique.
[Figure: population divided into 16 clusters; randomly selected clusters for the sample]
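A sketch of one-stage cluster sampling: randomly choose whole clusters, then keep every item inside them. The 16 counties mirror the figure, but the household names and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """clusters maps a cluster name to its list of items.
    Select clusters by simple random sampling and keep all of their items."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical: 16 counties with 10 households each.
counties = {
    f"county_{c:02d}": [f"c{c:02d}_hh{h}" for h in range(10)]
    for c in range(1, 17)
}

selected = cluster_sample(counties, 4, seed=5)
households = [hh for members in selected.values() for hh in members]
print(sorted(selected), len(households))
```

The randomness is in which clusters are picked, not in which items are picked within a cluster.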
Systematic Random Sampling
- Decide on the sample size: n.
- Divide the ordered (e.g., alphabetical) frame of N individuals into groups of k individuals: k = N/n.
- Randomly select one individual from the 1st group.
- Select every kth individual thereafter.
[Figure example: N = 64, n = 8, k = 8, with the first group marked]
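The systematic procedure, using the slide's example N = 64, n = 8, k = 8. This sketch assumes integer division for k (exact when N is a multiple of n) and trims to n items in case it is not; the frame of numbers 1 to 64 stands in for an ordered list of names.

```python
import random

def systematic_sample(frame, n, seed=None):
    N = len(frame)
    k = N // n                   # interval between selections
    rng = random.Random(seed)
    start = rng.randrange(k)     # random position within the first group of k
    return frame[start::k][:n]   # that individual, then every kth one after it

frame = list(range(1, 65))       # ordered frame with N = 64
sample = systematic_sample(frame, 8, seed=11)
print(sample)
```

Only the starting point is random; once it is chosen, the rest of the sample is fixed. That is why the frame's ordering should not follow a cycle of length k.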
Exit Polls
- Stratify on states.
- Choose a simple random sample of polling places in each state. The number of polling places is proportional to the number of voters in each state.
- Choose a simple random sample of voters as they leave the polling place.
Where is the bias?
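Allocating polling places in proportion to voters usually produces fractional quotas, so some rounding rule is needed. The slides do not say which rule pollsters use; this sketch swaps in the largest-remainder method as one reasonable choice. The state names and voter counts are hypothetical.

```python
import math

def allocate_proportionally(voters_by_state, total_places):
    """Split total_places across states in proportion to their voters,
    using the largest-remainder rule so the counts add up exactly."""
    total_voters = sum(voters_by_state.values())
    quotas = {s: total_places * v / total_voters for s, v in voters_by_state.items()}
    alloc = {s: math.floor(q) for s, q in quotas.items()}

    # Hand out the places lost to rounding down, largest fractional part first.
    leftover = total_places - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc

# Hypothetical voter counts (in thousands), not real election data.
voters = {"State A": 5_200, "State B": 3_100, "State C": 1_700}
places = allocate_proportionally(voters, 40)
print(places)
```

A simple random sample of that many polling places would then be drawn within each state.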
Sources of Bias
- Selection bias: Are the pollsters more likely to choose women or young people? Pollsters were not allowed at certain polling places.
- Non-response bias: Are Republicans (Democrats) more suspicious of pollsters, and more likely to refuse to answer the questions?
- Response bias: Do people know who they voted for? Are they lying to the pollster?
Sample Surveys
- A sample survey is designed to ask questions of a small group of people in order to learn something about the entire population.
- A main objective is to collect accurate and reliable data so that we can make appropriate decisions.
Survey Design
- Define the issue
  - What are the purpose and objectives of the survey?
  - How will the survey be administered? (e.g., phone, email, face to face)
- Define the population of interest
- Develop survey questions
  - Make questions clear and unambiguous
  - Use universally accepted definitions
  - Limit the number of questions
- Pre-test the survey
  - Pilot test with a small group of participants
  - Assess clarity and length
- Determine the sample size and sampling method
- Select the sample and administer the survey
Types of Questions
- Closed-end questions: select from a short list of defined choices.
  Example: Major: __ business __ liberal arts __ science __ other
- Open-end questions: respondents are free to respond with any value, words, or statement.
  Example: What did you like best about this course?
- Demographic questions: questions about the respondent's personal characteristics.
  Example: Gender: __ Female __ Male