USING UNIVARIATE STATISTICAL ANALYSIS IN BUSINESS RESEARCH

Hypothesis Testing

Classification of Univariate Methods

How can we benefit from hypothesis tests?

Let us assume that the Ankara municipality has decided not to allow bakers to produce loaves of bread weighing less than 300 grams. Later, somebody complains that baker X is producing bread under 300 grams. What should we do?

We took a sample of 100 loaves and weighed them. The average weight was 295 grams and the standard deviation was 3.45 grams. Is this enough to make a decision?

Of course not.

Why?

Because it is not scientific to make a decision just by looking at descriptive statistics.

So, what should we do?

We need to make use of hypothesis-testing techniques.
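The bread question is posed but not computed. A minimal sketch of the one-sample z test (described in the steps that follow), assuming a left-tailed alternative H1: μ < 300 g and α = 0.05, neither of which the text fixes; the helper name one_sample_z is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(sample_mean, mu0, sd, n):
    """z statistic for a sample mean against a claimed population mean mu0."""
    standard_error = sd / sqrt(n)
    return (sample_mean - mu0) / standard_error

# Bread example: n = 100 loaves, mean 295 g, standard deviation 3.45 g, limit 300 g
z = one_sample_z(295, 300, 3.45, 100)
# Assumed left-tailed test (H1: mu < 300) at an assumed alpha of 0.05
critical = NormalDist().inv_cdf(0.05)   # about -1.645
print(round(z, 2), round(critical, 3))
print("reject H0" if z < critical else "accept H0")
```

Under these assumptions the statistic is far beyond the critical value, so the sample would support the complaint.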

Stages of Hypothesis Testing

Step 1: Formulate the hypothesis

Formulate the null and alternative hypotheses. A null hypothesis H0 is a statement of the status quo, one of no difference or no effect.

If the null hypothesis is not rejected, no changes will be made.

An alternative hypothesis H1 is one in which some difference or effect is expected.

Accepting the alternative hypothesis will lead to changes in opinions or actions. Thus, the alternative hypothesis is the opposite of the null hypothesis. A statistical test can have one of two outcomes: the null hypothesis is rejected and the alternative hypothesis accepted, or the null hypothesis is accepted (not rejected) and the alternative hypothesis is rejected.

One-tailed test: A test of the null hypothesis where the alternative hypothesis is expressed directionally, e.g. H1: μ > 300 g or H1: μ < 300 g.

Two-tailed test: A test of the null hypothesis where the alternative hypothesis is not expressed directionally, e.g. H1: μ1 ≠ μ2.

[Figure: "Reject" and "Accept" regions for one-tailed and two-tailed tests]

Step 2: Select an appropriate statistical technique

To test the null hypothesis, it is necessary to select an appropriate statistical technique. The researcher should take into consideration how the test statistic is computed and the sampling distribution that the sample statistic (e.g. the mean) follows.

The test statistic measures how close the sample has come to the null hypothesis.

If the sample size is large (n > 30) and the population variance is known, the z test is applied. If the sample is small (n ≤ 30) and the population variance is unknown, the t test is applied.

Step 3: Choose the level of significance

Whenever we draw inferences about a population, there is a risk that an incorrect conclusion will be reached. Two types of error can occur.

Type I error: An error that occurs when the sample results lead to the rejection of a null hypothesis that is in fact true. Also called alpha error (α).

Level of significance: The probability of making a Type I error.

Type II error: An error that occurs when the sample results lead to acceptance of a null hypothesis that is in fact false. Also called beta error (β).

Commonly used significance levels:

α = 0.01: high (strict) level of significance

α = 0.05: medium level of significance

α = 0.10: low level of significance
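A small simulation can make the meaning of α concrete. The sketch below assumes a normal population in which H0 is actually true (μ = 300, σ = 3.45, borrowed from the bread example) and a two-tailed z test with known σ; it counts how often the test wrongly rejects H0, and that rate should come out close to α. The function name and all simulation settings are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_i_error_rate(alpha=0.05, n=30, trials=10000, seed=1):
    """Estimate how often a two-tailed z test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    mu0, sigma = 300.0, 3.45            # hypothetical population where H0 holds
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > critical:           # wrongly rejecting a true H0
            rejections += 1
    return rejections / trials

for a in (0.01, 0.05, 0.10):
    print(a, type_i_error_rate(alpha=a))
```

A stricter α (0.01) produces fewer false rejections, at the price of a higher risk of a Type II error.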

Step 4: Collect the data and calculate the test statistic

Sample size is determined after taking into account the desired α and β errors and other qualitative considerations, such as budget constraints. Then the required data are collected and the value of the test statistic is computed.

See p. 569.

Step 5: Determine the probability (critical value)

Alternatively, the critical value of z that leaves an area of 0.05 to its right lies between 1.64 and 1.65 and equals 1.645.

Note that, in determining the critical value of the test statistic, the area to the right of the critical value is either α or α/2. It is α for a one-tailed test and α/2 for a two-tailed test.
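A short sketch, using the standard-library statistics.NormalDist, showing where 1.645 comes from and how the α versus α/2 rule changes the critical value; the function name z_critical is an illustrative choice.

```python
from statistics import NormalDist

def z_critical(alpha, tails):
    """Critical z: area alpha in one tail, or alpha/2 in each of two tails."""
    area_right = alpha if tails == 1 else alpha / 2
    return NormalDist().inv_cdf(1 - area_right)

for alpha in (0.01, 0.05, 0.10):
    print(alpha, round(z_critical(alpha, 1), 3), round(z_critical(alpha, 2), 3))
```

For α = 0.05 this gives 1.645 (one-tailed) and 1.96 (two-tailed), the table values used in the examples.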

Steps 6 and 7: Compare the probability or critical values and make the decision

If the calculated z or t value is smaller than the table value (in absolute value), H0 is accepted and H1 rejected.

If the calculated z or t value is bigger than the table value (in absolute value), H0 is rejected and H1 accepted.

Calculation of Hypothesis Tests

[Diagram: non-parametric vs. parametric tests; parametric tests with averages or with percentages; N known or N not known]

Parametric Tests

[Diagram: one population vs. two populations]

Parametric, Average and one population

Two-tailed: H0: μ = μ0, H1: μ ≠ μ0
Right-tailed: H0: μ = μ0, H1: μ > μ0
Left-tailed: H0: μ = μ0, H1: μ < μ0

where μ0 is the claimed average.

A pharmaceutical company produces pain-relief pills and claims that its pills relieve pain faster than others. A consumer association attempted to test this claim. They gave the drug to 100 people and found that the pain disappeared after 28 minutes on average, with a standard deviation of 10 minutes. For other pain relievers, the known average time to relieve a headache is 30 minutes. What should the association conclude about the company's claim? Use a significance level of 0.01.

Example 1

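H0 is accepted and H1 rejected.

This means that the firm's claim is not supported at the 0.01 significance level.

At which significance level could the average of 28 minutes be accepted as support for the claim?

The z table can be used: the row 2.0 and column .00 give a cumulative probability of 0.9772.

1 − 0.9772 = 0.0228 ≈ 0.023 is therefore this level: the claim would be accepted at any α above about 0.023 (for example α = 0.05), but not at α = 0.01.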
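The arithmetic behind Example 1 can be sketched as follows, assuming a left-tailed alternative H1: μ < 30 (the company claims a shorter time) and the standard error s/√n.

```python
from math import sqrt
from statistics import NormalDist

n, sample_mean, sd, mu0, alpha = 100, 28, 10, 30, 0.01
z = (sample_mean - mu0) / (sd / sqrt(n))    # (28 - 30) / (10 / 10) = -2.0
critical = NormalDist().inv_cdf(alpha)      # left tail, about -2.326
p_value = NormalDist().cdf(z)               # area to the left of z, about 0.0228
print(z, round(critical, 3), round(p_value, 4))
print("reject H0" if z < critical else "accept H0")
```

The p-value is the same 0.023 read from the z table: below 0.05, but above 0.01.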

Example 2

A battery producer claimed in one of its ads that its batteries last longer than other brands. After a complaint by competitors, a study was designed to test the claim. 25 batteries were bought at random from different markets. Their average lifetime was 17 minutes with a standard deviation of 3 minutes, while the average for other brands is 15 minutes. Is the producer right or wrong? Test at the 0.05 significance level.

To find the table value, the degrees of freedom must be calculated: df = n − 1 = 25 − 1 = 24. At the 0.05 significance level (one-tailed), the t table value is 1.71.

Because the calculated value is bigger than the table value, H0 is rejected and H1 is accepted. In other words, the producer's claim is right.
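The calculated t value for Example 2 is not shown in the text. A minimal sketch, assuming a right-tailed alternative H1: μ > 15 and the standard error s/√n; Python's standard library has no t distribution, so the table value 1.71 from the text is hard-coded.

```python
from math import sqrt

n, sample_mean, sd, mu0 = 25, 17, 3, 15
df = n - 1                                  # 24 degrees of freedom
t = (sample_mean - mu0) / (sd / sqrt(n))    # (17 - 15) / (3 / 5), about 3.33
t_table = 1.71                              # one-tailed, alpha = 0.05, df = 24 (from the text)
print(df, round(t, 2))
print("reject H0" if t > t_table else "accept H0")
```

Some textbooks divide by √(n − 1) instead, which gives about 3.27; the decision is the same either way.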

Parametric, Average and two populations

It is often important to know whether two samples chosen from different populations are similar or not. For example: does the average wage differ between two cities?

[Figure: two populations of sizes N1 and N2, from which samples of sizes n1 and n2 are drawn]

A firm has two different sales regions and wants to know whether the average weekly sales are equal. Two samples were drawn from these two regions. The average sales and standard deviations are as follows:

Example

Because H1 is an inequality (≠), the test is two-tailed. In this case 0.05/2 = 0.025. To find the table value: 1 − 0.025 = 0.975, which gives 1.96 from the z table.

Because the calculated value (1.93) is smaller than the table value (1.96), H0 is accepted and H1 is rejected.

In other words, the average sales are similar in both regions.
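The regional sales figures are not given in this text, so the sketch below, a minimal version of the large-sample two-sample z statistic assuming independent samples, is run on hypothetical numbers only and does not reproduce the 1.93 above. The function name two_sample_z is illustrative.

```python
from math import sqrt

def two_sample_z(mean1, mean2, sd1, sd2, n1, n2):
    """z statistic for the difference between two independent sample means."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error

# Hypothetical figures, NOT the firm's data
z = two_sample_z(mean1=100, mean2=90, sd1=20, sd2=20, n1=50, n2=50)
print(z)
print("reject H0" if abs(z) > 1.96 else "accept H0")   # two-tailed, alpha = 0.05
```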

Example

A tissue producer, MELTEM, advertised its product as the most absorbent tissue on the market. Its competitor, NAZIK, asked a research firm to test this claim, and the research firm selected 20 packages of MELTEM and 30 packages of NAZIK from a market and tested them. How can we design the problem?

Because H1 is directional, the test is one-tailed (left-tailed); because two samples are compared, the degrees of freedom are df = n1 + n2 − 2 = 20 + 30 − 2 = 48. At the 0.01 significance level, the table value is 2.41.

Because the calculated value (2.77) is bigger than the table value (2.41), H0 is rejected and H1 is accepted.

In other words, MELTEM's tissue absorbs more than NAZIK's.
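The absorbency measurements are not given in the text, so this sketch of the pooled-variance two-sample t statistic (assuming equal population variances, which matches df = n1 + n2 − 2) uses hypothetical numbers and does not reproduce the 2.77 above. With MELTEM's mean subtracted first the statistic is positive; a left-tailed formulation subtracts in the other order, so its magnitude is what gets compared with 2.41.

```python
from math import sqrt

def pooled_t(mean1, mean2, sd1, sd2, n1, n2):
    """Two-sample t statistic with pooled variance; df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Hypothetical absorbency figures for 20 MELTEM and 30 NAZIK packages
t, df = pooled_t(mean1=10, mean2=8, sd1=2, sd2=2, n1=20, n2=30)
t_table = 2.41                  # one-tailed, alpha = 0.01, df = 48 (from the text)
print(df, round(t, 2))
print("reject H0" if abs(t) > t_table else "accept H0")
```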

Parametric, Ratio with one population

It is not always possible to find the arithmetic mean and standard deviation of the population. A ratio (proportion) test is a way around this obstacle. The following formula can be used, where p is the sample proportion, n the sample size and P0 the commonly accepted (claimed) proportion:

Z = (p − P0) / √(P0·q0 / n)

P0 + q0 = 1

(probability that the event occurs) + (probability that the event does not occur) = 1

SEHER is a company that produces soap. Its market share was 15%. After a comprehensive new advertising campaign, the company wanted to measure its market share again. In a survey of 1,000 people, 155 were found to use the SEHER brand of soap. Was the campaign successful?

H0: P = 0.15 (the new market share is still 15%)
H1: P > 0.15 (the new market share is more than 15%)

Z = (0.155 − 0.15) / √(0.15 × 0.85 / 1000) ≈ 0.443

This is a right-tailed test at the 0.05 significance level. From 1 − 0.05 = 0.95, the table value is 1.65. Because the calculated value is smaller than the table value, we accept H0 and reject H1. In other words, the market share is still 15%.
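The SEHER calculation can be reproduced with a short sketch of the one-proportion z test; the helper name one_proportion_z is illustrative, and the critical value comes from statistics.NormalDist rather than the printed table.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """z statistic comparing a sample proportion with a claimed proportion p0."""
    p_hat = successes / n
    q0 = 1 - p0                          # P0 + q0 = 1
    return (p_hat - p0) / sqrt(p0 * q0 / n)

z = one_proportion_z(155, 1000, 0.15)    # SEHER survey
critical = NormalDist().inv_cdf(0.95)    # right-tailed, alpha = 0.05
print(round(z, 3), round(critical, 3))
print("reject H0" if z > critical else "accept H0")
```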

Parametric, Ratio with two independent populations

A study was conducted on the clothing habits of young people in Kazakhstan and Azerbaijan. The rates of wearing jeans are shown in the table. Is there a difference between the two countries' young people in the rate of wearing jeans?

P1: the rate of wearing jeans among Kazakh youth; P2: the rate of wearing jeans among Azerbaijani youth

H0: P1 = P2
H1: P1 ≠ P2

Two-tailed test: 0.05/2 = 0.025. To find the table value: 1 − 0.025 = 0.975, which gives 1.96. Because the calculated value (4.675) is bigger than the table value, H0 is rejected and H1 is accepted. Thus the rates are not equal: the rate among Kazakh youth (160/200 = 80%) is higher than among Azerbaijani youth (110/190 ≈ 58%).
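A sketch of the two-proportion z test, assuming the pooled proportion is used in the standard error under H0: P1 = P2. On the counts above it gives about 4.73, slightly different from the 4.675 quoted, probably because of rounding or a different standard-error formula in the original calculation; the decision is the same.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: P1 = P2, using the pooled proportion under H0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error

z = two_proportion_z(160, 200, 110, 190)   # Kazakh vs. Azerbaijani youth
critical = NormalDist().inv_cdf(0.975)     # two-tailed, alpha = 0.05
print(round(z, 2), round(critical, 2))
print("reject H0" if abs(z) > critical else "accept H0")
```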

Non-Parametric Single-Sample Hypothesis Testing

Kolmogorov–Smirnov (K–S) Test

The Kolmogorov–Smirnov test (K–S test) is a nonparametric test for the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test), or to compare two samples (two-sample K–S test).

The Kolmogorov–Smirnov statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples.

The null distribution of this statistic is calculated under the null hypothesis that the samples are drawn from the same distribution (in the two-sample case) or that the sample is drawn from the reference distribution (in the one-sample case).

In each case, the distributions considered under the null hypothesis are continuous distributions but are otherwise unrestricted.


The K–S test measures the difference between a theoretical distribution and an observed distribution. Ai denotes the theoretical cumulative relative frequency of category i, and Oi the observed (sample) cumulative relative frequency of the same category. The K–S statistic is the largest absolute difference between Ai and Oi:

K–S = max |Ai − Oi|

H0: Observed value = Theoretical value
H1: Observed value ≠ Theoretical value

EXAMPLE: Construction firm X produces five different house types. The firm wonders whether there are differences in preference among the house types. 50 people were questioned in a survey: 4 preferred type A, 20 type B, 8 type C, 6 type D and 12 type E. Is there a significant difference between the preferences?

H0: The preferences are equal (observed value = theoretical value).

H1: The preferences differ from one another (observed value ≠ theoretical value).

Theoretical frequency = 50/5 = 10 per house type

The largest K–S value is 0.12. With a significance level of 0.05 and n = 50, the table value is 1.36/√50 ≈ 0.19. This value is bigger than 0.12, so H0 is accepted. In other words, there are no significant differences among the preferences.
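The K–S computation for the house-type example can be sketched as below, assuming equal theoretical shares per category (as in H0) and the large-sample critical value 1.36/√n for α = 0.05 used in the text; the function name ks_statistic is illustrative.

```python
from math import sqrt

def ks_statistic(observed, expected_share=None):
    """Largest absolute gap between theoretical and observed cumulative relative frequencies."""
    n = sum(observed)
    k = len(observed)
    if expected_share is None:
        expected_share = [1 / k] * k      # equal preferences under H0
    cum_obs = cum_theo = 0.0
    largest = 0.0
    for count, share in zip(observed, expected_share):
        cum_obs += count / n              # Oi
        cum_theo += share                 # Ai
        largest = max(largest, abs(cum_theo - cum_obs))
    return largest

counts = [4, 20, 8, 6, 12]                # house types A to E
d = ks_statistic(counts)
critical = 1.36 / sqrt(sum(counts))       # alpha = 0.05
print(round(d, 2), round(critical, 2))
print("reject H0" if d > critical else "accept H0")
```

The largest gap occurs at type A (0.20 theoretical vs. 0.08 observed), matching the 0.12 in the text.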

Runs Test

The runs test (Wald–Wolfowitz test), named after Abraham Wald and Jacob Wolfowitz, is a non-parametric statistical test that checks a randomness hypothesis for a two-valued data sequence. More precisely, it can be used to test the hypothesis that the elements of the sequence are mutually independent.

The test does not assume that the positive and negative elements have equal probabilities of occurring; it only assumes that the elements are independent and identically distributed. If the number of runs is significantly higher or lower than expected, the hypothesis of statistical independence of the elements may be rejected.

A run is a maximal segment of adjacent equal elements. For example, the sequence "++++−−−+++−−++++++−−−−" consists of six runs.

Runs tests can be used to test:

1. The randomness of a distribution, by taking the data in the given order and marking with + the data greater than the median and with − the data less than the median (values equal to the median are omitted).

2. Whether a function fits a data set well, by marking the data exceeding the function value with + and the other data with −. For this use, the runs test, which takes into account the signs but not the distances, is complementary to the chi-square test, which takes into account the distances but not the signs.
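The median-marking procedure in point 1 and the counting of runs can be sketched as follows; the function names are illustrative, ASCII "-" stands in for "−", and the numeric data list is hypothetical.

```python
from statistics import median

def signs_about_median(values):
    """Mark values above the median '+' and below it '-'; values equal to it are dropped."""
    m = median(values)
    return "".join("+" if v > m else "-" for v in values if v != m)

def count_runs(signs):
    """A run is a maximal block of identical consecutive symbols."""
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

print(count_runs("++++---+++--++++++----"))       # the example sequence: 6 runs
print(signs_about_median([3, 8, 1, 9, 5, 7, 2]))  # hypothetical data
```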

H0: The observations were selected randomly from the population.

H1: The observations were not selected randomly from the population.

D: the number of runs (groups of consecutive identical observations) in the sample

Example: A grocery store wants to know whether the gender of the customers entering the store is distributed randomly. 35 customers were observed and their gender was recorded.

Sample size n = 35
Male n1 = 18
Female n2 = 17
Number of runs D = 14

H0 is accepted and H1 rejected: the gender of the customers is distributed randomly.
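The text does not show how D = 14 was evaluated; the original may have used a table of critical run counts. As one possible check, here is a sketch using the large-sample normal approximation to the runs test, assuming a two-tailed test at α = 0.05.

```python
from math import sqrt
from statistics import NormalDist

def runs_test_z(n1, n2, runs):
    """Large-sample normal approximation to the Wald-Wolfowitz runs test."""
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    return (runs - expected) / sqrt(variance)

z = runs_test_z(n1=18, n2=17, runs=14)      # 18 men, 17 women, D = 14
critical = NormalDist().inv_cdf(0.975)       # two-tailed, alpha = 0.05
print(round(z, 2))
print("reject H0" if abs(z) > critical else "accept H0")
```

About 18.5 runs are expected under randomness; 14 is fewer, but not significantly so (|z| ≈ 1.54 < 1.96), consistent with the conclusion above.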
