Statistics in Theses

Dr. John P. Abraham, Professor

University of Texas Pan American


Describe an egg

Students try to do this

Differences in description

Children’s view
Adults’ view
Shopper’s view
Seller’s view
Producer’s view
Chicken’s view
Biologist’s view
Dietician’s view
Chemist’s view

Measurements

A description needs to be based on measurements
Errors in measurements

Descriptive statistics

Summarizing a collection of data in a clear and understandable way.

Numerical
Graphical

Numerical descriptive statistics

Spread
  Range
  Semi-interquartile range
  Standard deviation

Central tendency
  Mean
  Median
  Mode
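The measures of spread and central tendency listed above can be computed with Python's standard library. This is a minimal sketch: the data values are hypothetical and chosen only for illustration, and the quartiles use the `statistics.quantiles` default ("exclusive") method, which is one of several common conventions.

```python
import statistics

# Hypothetical sample of nine measurements (illustrative values only).
data = [2, 4, 4, 5, 7, 8, 9, 10, 12]

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Spread
value_range = max(data) - min(data)
q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles, default "exclusive" method
semi_iqr = (q3 - q1) / 2                     # semi-interquartile range
std_dev = statistics.stdev(data)             # sample standard deviation

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={value_range} semi-IQR={semi_iqr} std dev={std_dev:.2f}")
```

Other quartile conventions can give slightly different semi-interquartile ranges on small samples, so a thesis should state which one was used.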

Inferential Statistics

Infer about a population based on a sample
Infer about the future based on the past

Hypothesis testing using variables

A variable is a characteristic of an object of study that can be measured.
The measurements will be different for different objects.
Can be quantitative or qualitative
Can be independent or dependent
Can be continuous or discrete (for example, a 1 to 5 ranking is discrete)

Necessity for control

What is a control group?

A control group study uses a control group to compare to an experimental group in a test of a causal hypothesis.

The control and experimental groups must be identical in all relevant ways except for the introduction of a suspected causal agent into the experimental group.

For example, if 'C' causes 'E', when we introduce 'C' into the experimental group but not into the control group, we should find 'E' occurring in the experimental group at a significantly greater rate than in the control group.

Significance is measured by relation to chance: if an event is not likely due to chance, then its occurrence is significant.

Double blind study

A control group test where neither the evaluator nor the subject knows which items are controls.

A randomized test is one that randomly assigns items to the control and the experimental groups.

The purpose of controls, double-blind, and randomized testing is to reduce error, self-deception and bias.
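Random assignment as described above can be sketched in a few lines. The subject IDs, the `randomize` helper, and the fixed seed are all hypothetical, introduced only for illustration. In a real study the assignment key would be held by someone other than the evaluator, so that the test stays double-blind.

```python
import random

def randomize(subjects, seed=None):
    """Shuffle the subjects and split them into two equal-sized groups."""
    rng = random.Random(seed)   # a seed makes the assignment reproducible
    shuffled = list(subjects)   # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject IDs S01 .. S20.
subjects = [f"S{i:02d}" for i in range(1, 21)]
control, experimental = randomize(subjects, seed=42)

print("control:     ", sorted(control))
print("experimental:", sorted(experimental))
```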

Placebo

Many control group studies use a placebo in control groups to keep the subjects in the dark as to whether they are being given the causal agent that is being tested.

For example, both the control and experimental groups will be given identical looking pills in a study testing the effectiveness of a new drug. Only one pill will contain the agent being tested; the other pill will be a placebo.

In a double-blind study, the evaluator of the results would not know which subjects got the placebo until his or her evaluation of observed results was completed. This is to avoid evaluator bias from influencing observations and measurements.

Inferential statistics

We use inferential statistics to make inferences from our data to more general conditions; we use descriptive statistics simply to describe what's going on in our data.

We use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance in this study.

T-test

Compare the average performance of two groups on a single measure to see if there is a difference.

You might want to know whether there is a difference between girls and boys in their math abilities.

Whenever you wish to compare the average performance between two groups you should consider the t-test for differences between groups.

Is there a difference?

How about here?

T-test example

The Acme Company has developed a new battery. The engineer in charge claims that the new battery will operate continuously for at least 7 minutes longer than the old battery.

To test the claim, the company selects a simple random sample of 100 new batteries and 100 old batteries. The old batteries run continuously for 190 minutes with a standard deviation of 20 minutes; the new batteries, 200 minutes with a standard deviation of 40 minutes.

Test the engineer's claim that the new batteries run at least 7 minutes longer than the old. Use a 0.05 level of significance. (Assume that there are no outliers in either sample.)

See next slide

4 steps needed

(1) State the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis. Here μ1 is the mean running time of the new batteries and μ2 is the mean running time of the old batteries.
Null hypothesis: μ1 - μ2 >= 7
Alternative hypothesis: μ1 - μ2 < 7

Formulate an analysis plan. For this analysis, the significance level is 0.05. Using sample data, we will conduct a two-sample t-test of the null hypothesis.

Analyze sample data. Using sample data, we compute the standard error (SE), degrees of freedom (DF), and the t-score test statistic (t).
SE = sqrt(s1²/n1 + s2²/n2) = sqrt(40²/100 + 20²/100) = sqrt(20) = 4.472
DF is approximately 146 (unequal-variance approximation)
t = [ (x1 - x2) - d ] / SE = [(200 - 190) - 7] / 4.472 = 3/4.472 = 0.67

Interpret results. Because the alternative hypothesis is one-tailed (μ1 - μ2 < 7), the P-value is the probability of a t-score below 0.67, which is 0.75. Since the P-value (0.75) is greater than the significance level (0.05), we cannot reject the null hypothesis.
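The arithmetic in the battery example can be checked with a short script. This is a sketch under stated assumptions. It uses the unequal-variance (Welch) formulas for SE and DF, which reproduce the 4.472 on the slide. It also approximates the lower-tail P-value with the standard normal distribution instead of the t distribution, which is reasonable only because DF is large (about 146). The variable names are my own.

```python
import math
from statistics import NormalDist

# Sample summaries from the example (group 1 = new batteries, group 2 = old).
n1, mean1, sd1 = 100, 200.0, 40.0
n2, mean2, sd2 = 100, 190.0, 20.0
d = 7.0        # hypothesized difference under the null hypothesis
alpha = 0.05   # significance level

v1, v2 = sd1**2 / n1, sd2**2 / n2
se = math.sqrt(v1 + v2)                                        # standard error
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))    # Welch degrees of freedom
t = ((mean1 - mean2) - d) / se                                 # t-score

# Lower-tail P-value, P(T < t). ASSUMPTION: normal approximation to the
# t distribution, acceptable here because df is large.
p_approx = NormalDist().cdf(t)

print(f"SE={se:.3f} DF={df:.1f} t={t:.2f} P~{p_approx:.2f}")
print("reject H0" if p_approx < alpha else "cannot reject H0")
```

For small samples the normal approximation is not adequate, and a t-distribution table or a statistics package should be used instead.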

Standard Score

Problem: A national achievement test is administered annually to 3rd graders. The test has a mean score of 100 and a standard deviation of 15. If Jane's z-score is 1.20, what was her score on the test?

From the z-score equation, we know
z = (X - μ) / σ
where z is the z-score, X is the value of the element, μ is the mean of the population, and σ is the standard deviation.

Solving for Jane's test score (X), we get
X = (z * σ) + μ = (1.20 * 15) + 100 = 18 + 100 = 118
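The z-score equation and its inverse are easy to express as two small functions. This is a sketch, and the function names are my own, chosen for illustration.

```python
def z_from_score(x, mu, sigma):
    """z = (X - mu) / sigma"""
    return (x - mu) / sigma

def score_from_z(z, mu, sigma):
    """X = z * sigma + mu (the z-score equation solved for X)"""
    return z * sigma + mu

# Jane's score: z = 1.20 on a test with mean 100 and standard deviation 15.
jane_score = score_from_z(1.20, mu=100, sigma=15)
print(round(jane_score, 2))
```

Converting the result back with `z_from_score(jane_score, 100, 15)` should return approximately 1.20, which is a quick consistency check.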

Probability

Mathematically, the probability that an event will occur is expressed as a number between 0 and 1. Notationally, the probability of event A is represented by P(A).

A coin is tossed three times. What is the probability that it lands on heads exactly one time?

If you toss a coin three times, there are a total of eight possible outcomes. They are: HHH, HHT, HTH, THH, HTT, THT, TTH, and TTT. Of the eight possible outcomes, three have exactly one head. They are: HTT, THT, and TTH. Therefore, the probability that three flips of a coin will produce exactly one head is 3/8 or 0.375.
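The counting argument above can be reproduced by enumerating every outcome. This sketch assumes a fair coin, so that all eight outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# All sequences of three tosses, e.g. "HHT".
outcomes = ["".join(seq) for seq in product("HT", repeat=3)]

# Outcomes with exactly one head.
one_head = [o for o in outcomes if o.count("H") == 1]

# ASSUMPTION: fair coin, so probability = favorable / total.
probability = Fraction(len(one_head), len(outcomes))

print(outcomes)
print(one_head)
print(probability, float(probability))
```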

ANOVA (Analysis of Variance)

Gives a statistical test of whether the means of several groups are all equal.

MANOVA (multivariate analysis of variance)

Multivariate analysis of variance (MANOVA) is used when there is more than one dependent variable.
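For the one-way ANOVA case (a single dependent variable), the test statistic is the ratio of between-group to within-group variance. The sketch below computes that F statistic by hand. The three groups are hypothetical numbers chosen only to keep the arithmetic easy to follow, and the function name is my own.

```python
def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA on a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} with df = (2, 6)")
```

The F value is then compared with the critical value of the F distribution for the chosen significance level and degrees of freedom. That lookup is left to a table or a statistics package.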

Correlation

Statistical correlation is a statistical technique which tells us if two variables are related.

If the change in one variable is accompanied by a change in the other, then the variables are said to be correlated. For example, family income and family expenditure are correlated, as are price and demand.

Compute correlations only between variables that have a meaningful connection. One could compute 'r' between shoe size and intelligence, or between height and income, but irrespective of the value of 'r' the result makes no sense and is hence termed a chance or nonsense correlation.

r Value

In general, r > 0 indicates a positive relationship, r < 0 indicates a negative relationship, and r = 0 indicates no linear relationship between the variables. Here r = +1.0 describes a perfect positive correlation and r = -1.0 describes a perfect negative correlation.

Value of r                      Strength of relationship
-1.0 to -0.5 or 0.5 to 1.0      Strong
-0.5 to -0.3 or 0.3 to 0.5      Moderate
-0.3 to -0.1 or 0.1 to 0.3      Weak
-0.1 to 0.1                     None or very weak
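A short sketch of how r can be computed and read against the strength table. The income and expenditure figures are hypothetical. Because the table's ranges share their end points, the code assigns a boundary value such as 0.5 to the stronger category, which is my assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Label |r| using the table; boundaries go to the stronger label (assumption)."""
    a = abs(r)
    if a >= 0.5:
        return "strong"
    if a >= 0.3:
        return "moderate"
    if a >= 0.1:
        return "weak"
    return "none or very weak"

# Hypothetical family income and expenditure (same units, illustrative only).
income = [30, 40, 50, 60, 70]
expenditure = [25, 32, 41, 47, 55]

r = pearson_r(income, expenditure)
print(f"r = {r:.3f} ({strength(r)})")
```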

Analysis of Covariance

ANOVA mixed with regression analysis.

ANCOVA tests whether certain factors have an effect on the outcome variable after removing the variance for which quantitative predictors (covariates) account.

Suppose you analyze the results of a clinical trial of three types of treatment of a disease - "Placebo", "Drug 1", and "Drug 2". The results are three sets of survival times, corresponding to patients from the three treatment groups. The question of interest is whether there is a difference between the three types of treatment in the average survival time.

ANCOVA cont.

You might use analysis of variance to answer this question. But, if you have supplementary information, for example, each patient's age, then analysis of covariance allows you to adjust the treatment effect (survival time, in this case) to a particular age, say, the mean age of all patients. Age in this case is a "covariate": it is not related to treatment, but can affect the survival time. This adjustment allows you to reduce the observed variation between the three groups caused not by the treatment itself but by variation of age.

Regression Analysis

Regression analysis provides a "best-fit" mathematical equation for the relationship between the dependent variable (response) and independent variable(s) (covariates).

In linear regression, the function is a linear (straight-line) equation. For example, if we assume the value of an automobile decreases by a constant amount each year after its purchase, and for each mile it is driven, we can create a formula to find the value.
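As an illustration of the automobile example, the sketch below fits a straight line by ordinary least squares using a single predictor, the age of the car. The prices are hypothetical and deliberately exactly linear so the result is easy to verify. Adding mileage as a second predictor would need multiple regression, which is not shown here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data: car age in years and resale value in dollars.
age_years = [0, 1, 2, 3, 4]
value = [20000, 18000, 16000, 14000, 12000]

slope, intercept = fit_line(age_years, value)
predicted_at_6 = intercept + slope * 6   # extrapolated value after 6 years

print(f"value = {intercept:.0f} + ({slope:.0f}) * age")
print(f"predicted value at 6 years: {predicted_at_6:.0f}")
```

Real resale data would not fall exactly on a line, and the fitted slope would then be the average yearly loss in value rather than an exact one.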

Summarize the course

Why use SharePoint services?

You will have several faculty members on your committee.
All of them will have to comment on and correct your thesis.
It is the best way to make appointments with many people.
One central repository for all your files.
Different versions are kept. In case of a mistaken edit, you can go back.

Why review different theses

Discussed style
Discussed chapters
Discussed content
How to get ideas for your research from suggestions

References

Discussed different types of references and what is acceptable and what is not.
Discussed plagiarism at length
Discussed how to quote and how to cite

Theses and Project

Differences
Similarities
Report writing

Formal research studies

Hypothesis formulation
Collect raw data
Conduct statistical analysis
Make conclusions
Report
