Quantitative Analysis

By Dr. Jatin Thukral

CFA Level I

Outline

• Probability Concepts

• Common Probability Distributions

• Sampling and Estimation

• Hypothesis Testing

Probability Concepts

• A random variable is an uncertain quantity

• An outcome is an observed value of a random variable

• An event is a single outcome or a set of outcomes

• Mutually exclusive events cannot occur at the same time; exhaustive events are those that together include all possible outcomes

Defining Properties of Probability

• The probability of occurrence of any event Ei is between 0 and 1

0 ≤ P(Ei) ≤ 1

• If a set of events E1, E2, E3, …, EN is mutually exclusive and exhaustive, then

∑P(Ei) = 1

Types of Probabilities

• Empirical probability: based on historical data

• A priori probability: based on reasoning

• Subjective probability: based on personal judgment

Probability in terms of Odds

• Alternate way of expressing probabilities

• The odds for an event with probability 0.2 are 0.2/(1 – 0.2) = 0.25, that is, 1 to 4
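As a minimal sketch, the conversion between a probability and its odds can be written as two small functions. The function names are illustrative and do not come from the slides.

```python
def odds_for(p):
    """Odds for an event with probability p, expressed as p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return p / (1 - p)


def prob_from_odds(odds):
    """Invert the conversion: probability implied by odds of `odds` to 1."""
    return odds / (1 + odds)


odds = odds_for(0.2)            # the slide's example: 0.2 / 0.8
p_back = prob_from_odds(odds)   # recovers the original probability
print(odds, p_back)
```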

Unconditional and Conditional Probabilities

• Unconditional Probability: The marginal probability that an event occurs regardless of any past or future event

• Conditional Probability: The probability of any event given that another event(s) has occurred or will occur

An example illustrating unconditional, conditional and joint probabilities

Rules of Probability

• Multiplication Rule (joint probability): P(AB) = P(A|B) P(B)

• Addition Rule: P(A or B) = P(A) + P(B) – P(AB)

• Total Probability Rule: P(A) = P(A|B1) P(B1) + P(A|B2) P(B2) + … + P(A|BN) P(BN), where B1, B2, …, BN is a mutually exclusive and exhaustive set of outcomes

Applications of Probability Rules

• Multiplication rule is used to compute joint probability

• Addition rule is used to determine the probability that at least one of two events occurs

• Joint probability of any number of independent events is calculated as

P(ABCD)=P(A) P(B) P(C) P(D)

Independent Events

• Independent Events: Occurrence of one event has no influence on occurrence of other events

P(A|B) = P(A)

Expected Value

Expected value is the weighted average of the possible outcomes of a random variable, where the weights are the probabilities that the outcomes will occur
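The definition above translates directly into a probability-weighted sum. The outcomes and probabilities below are hypothetical values chosen only for illustration.

```python
# Hypothetical one-year returns and their probabilities (not from the slides).
outcomes = [-0.10, 0.05, 0.20]
probabilities = [0.25, 0.50, 0.25]

# The probabilities of a mutually exclusive, exhaustive set must sum to 1.
assert abs(sum(probabilities) - 1.0) < 1e-12

# Expected value: each outcome weighted by its probability.
expected_value = sum(p * x for p, x in zip(probabilities, outcomes))
print(expected_value)
```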

Diagramming an investment problem using a tree diagram

Covariance

• Covariance is a measure of how two assets move together

• Variance is the covariance of an asset with itself

• Covariance can range from negative infinity to positive infinity

Correlation Coefficient

• Correlation is covariance divided by product of standard deviations

• Measures the strength of linear relationship between two random variables

• Has no units and ranges from -1 to +1

• If Corr(Ri,Rj) = 1, the random variables have perfect positive correlation

Portfolio Expected Value and Variance

• Portfolio Expected Value: E(Rp) = ∑ wi E(Ri), where wi is the portfolio weight of asset i

• Portfolio Variance: Var(Rp) = ∑i ∑j wi wj Cov(Ri,Rj)
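The formulas on this slide were lost in extraction. Assuming the standard definitions (expected return as the weighted average of asset expected returns, variance as the double sum of weight products times covariances), a two-asset sketch looks like this. All numbers are hypothetical.

```python
# Hypothetical two-asset portfolio (weights, expected returns, volatilities).
weights = [0.6, 0.4]
expected_returns = [0.08, 0.12]
std_devs = [0.10, 0.20]
correlation = 0.3

# Covariance = correlation x product of standard deviations.
cov_12 = correlation * std_devs[0] * std_devs[1]
cov_matrix = [
    [std_devs[0] ** 2, cov_12],
    [cov_12, std_devs[1] ** 2],
]

# Expected return: weighted average of the asset expected returns.
portfolio_return = sum(w * r for w, r in zip(weights, expected_returns))

# Variance: sum over all pairs (i, j) of w_i * w_j * Cov(R_i, R_j).
portfolio_variance = sum(
    weights[i] * weights[j] * cov_matrix[i][j]
    for i in range(len(weights))
    for j in range(len(weights))
)
print(portfolio_return, portfolio_variance)
```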

Bayes’ Formula

• Bayes’ formula is used to update a given set of prior probabilities for a given event in response to the arrival of new information: P(Event | Information) = P(Information | Event) × P(Event) / P(Information)
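A minimal sketch of such an update, assuming the usual form of Bayes' formula and using the total probability rule to obtain the unconditional probability of the new information. The scenario and all numbers are hypothetical.

```python
def total_probability(priors, likelihoods):
    """P(A) = sum of P(A|Bi) * P(Bi) over mutually exclusive, exhaustive Bi."""
    return sum(p * l for p, l in zip(priors, likelihoods))


def bayes_update(priors, likelihoods):
    """Posterior P(Bi|A) = P(A|Bi) * P(Bi) / P(A) for every scenario Bi."""
    p_a = total_probability(priors, likelihoods)
    return [p * l / p_a for p, l in zip(priors, likelihoods)]


# Hypothetical scenarios: B1 = economy expands, B2 = economy contracts.
priors = [0.6, 0.4]          # P(B1), P(B2)
likelihoods = [0.7, 0.2]     # P(A|B1), P(A|B2), A = "earnings beat forecast"

p_a = total_probability(priors, likelihoods)
posteriors = bayes_update(priors, likelihoods)
print(p_a, posteriors)
```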

Labeling

• There are n items and each can receive one of k different labels

• The number that receive label 1 is n1, the number that receive label 2 is n2, and so on, such that n1 + n2 + … + nk = n

• The total number of ways that the labels can be assigned is n! / (n1! × n2! × … × nk!)

Combination Formula

• A special case of labeling where k = 2, with n1 = r and n2 = n – r

• Equivalent to choosing r items from a set of n items

• nCr = n! / [(n – r)! r!] is the number of possible ways of selecting r items from n items when the order of selection is not important

Permutation Formula

• nPr = n! / (n – r)! is the number of possible ways to select r items from a set of n items when the order of selection is important
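The three counting rules can be checked numerically with Python's standard library (`math.comb` and `math.perm` require Python 3.8 or later). The helper `labeling_count` is an illustrative name, and the item counts are hypothetical.

```python
from math import comb, factorial, perm


def labeling_count(counts):
    """Ways to assign labels when counts[i] items receive label i (n!/(n1!...nk!))."""
    ways = factorial(sum(counts))
    for c in counts:
        ways //= factorial(c)   # stays an exact integer at every step
    return ways


# 8 items: 4 get label 1, 3 get label 2, 1 gets label 3.
ways_three_labels = labeling_count([4, 3, 1])

# Combination as the k = 2 special case of labeling: choose 3 of 8.
ways_choose = labeling_count([3, 5])
assert ways_choose == comb(8, 3)

# Permutation: order matters, so there are r! orderings per combination.
ways_ordered = perm(8, 3)
assert ways_ordered == comb(8, 3) * factorial(3)

print(ways_three_labels, ways_choose, ways_ordered)
```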

Probability Distribution

• A probability distribution describes the probabilities of all the possible outcomes for a random variable

• A discrete random variable is one for which the number of possible outcomes can be counted

• A continuous random variable has an uncountably infinite number of possible outcomes, even between finite lower and upper bounds

Probability Function

• A probability function p(x) specifies the probability that a random variable is equal to a specific value x

• The two key properties are 0 ≤ p(x) ≤ 1 and ∑ p(x) = 1

Probability Density Function (pdf) and Cumulative Distribution Function (cdf)

• A pdf is a function f(x) that generates the probability that outcomes of a continuous distribution lie within a particular range

• A cdf F(x) gives the probability that a random variable X takes on a value less than or equal to a specific value x: F(x) = P(X ≤ x)

Pdf and Cdf of Normal Distribution

Discrete Uniform Random Variable and a Binomial Random Variable

• A discrete uniform random variable: probabilities for all possible outcomes are equal

• A binomial random variable is the number of “successes” in a given number of trials when outcome can be either “success” or “failure”

Binomial Random Variable

• The probability of success p is constant for each trial and the trials are independent

• For number of trials = 1, the binomial random variable is called a Bernoulli random variable

• The expected value of a binomial random variable is E(X) = np, where n is the number of trials
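As an illustrative check, the binomial probabilities can be computed from the combination formula, assuming the standard probability function nCx × p^x × (1 – p)^(n – x). Summing x × p(x) over all outcomes then reproduces n × p. The values of n and p are hypothetical.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 10, 0.3   # hypothetical: 10 trials, 30% chance of success per trial
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Expected value computed from the definition (probability-weighted average).
mean_from_pmf = sum(x * px for x, px in enumerate(pmf))
print(sum(pmf), mean_from_pmf, n * p)
```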

Stock Price Movements as Binomial Tree

Continuous Uniform Distribution

• Outcomes can only occur between some lower limit a and some upper limit b

• P(X < a) = P(X > b) = 0

• P(x1 ≤ X ≤ x2) = (x2 – x1) / (b – a)

Normal Distribution

• Completely described by its mean µ and variance σ², stated as X ~ N(µ, σ²)

• Skewness = 0, so the distribution is symmetric about its mean: P(X ≤ µ) = P(X ≥ µ) = 0.5

• Kurtosis = 3

• A linear combination of (jointly) normally distributed random variables is also normally distributed

• Probabilities of outcomes farther above and below the mean get smaller but never reach zero

Multivariate Distributions

• Specifies the probabilities associated with a group of random variables

• Meaningful only if behavior of one random variable depends on behavior of others

• A multivariate normal distribution is completely defined by three sets of parameters: the means, the variances, and the pairwise correlations of the individual random variables

Confidence Interval

• A confidence interval is a range of values around the expected outcome within which we expect the actual outcome to fall a specified percentage of the time

Standard Normal Distribution

• A standard normal distribution is a normal distribution that has been standardized so that it has mean zero and standard deviation 1

• Standardization is the process of converting an observed value x of a random variable to its z-value: z = (x – µ)/σ

Using the z-table

• Consider EPS that is normally distributed with µ = 6 and σ = 2. What is the probability that EPS ≥ 9.70?

• The z-value of EPS = 9.70 is z = (9.70 – 6)/2 = 1.85

• From the z-table, F(1.85) = 0.9678

• P(EPS ≥ 9.70) = 1 – 0.9678 = 0.0322
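The table lookup can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later), which stands in here for the printed z-table. Variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 6.0, 2.0      # EPS mean and standard deviation from the example
x = 9.70

# Standardize the observation, then evaluate the standard normal cdf.
z = (x - mu) / sigma
p_below = NormalDist().cdf(z)     # F(z), the value a z-table would give
p_above = 1 - p_below             # probability that EPS is at least 9.70

print(round(z, 2), round(p_below, 4), round(p_above, 4))
```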

Shortfall Risk and Safety-First Ratio

• Shortfall risk is the probability that a return will fall below a target value over a given time period

• Roy’s safety-first criterion states that the optimal portfolio minimizes the probability that the return falls below a minimum threshold level RL, i.e., minimize P(Rp < RL)

• If returns are normally distributed, then this is equivalent to maximizing the safety-first ratio SFRatio = [E(Rp) – RL] / σp
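A minimal sketch of the criterion, assuming the usual safety-first ratio (expected return minus threshold, divided by standard deviation) and normally distributed returns, so that shortfall risk equals the standard normal cdf evaluated at minus the ratio. The two portfolios and the threshold are hypothetical.

```python
from statistics import NormalDist


def safety_first_ratio(expected_return, threshold, std_dev):
    """Distance of the expected return above the threshold, in standard deviations."""
    return (expected_return - threshold) / std_dev


threshold = 0.03   # hypothetical minimum acceptable return
portfolios = {     # name: (expected return, standard deviation), hypothetical
    "A": (0.09, 0.12),
    "B": (0.11, 0.20),
}

ratios = {
    name: safety_first_ratio(mu, threshold, sd)
    for name, (mu, sd) in portfolios.items()
}

# Under normality, P(return < threshold) = F(-SFRatio).
shortfall = {name: NormalDist().cdf(-r) for name, r in ratios.items()}

# The preferred portfolio has the highest ratio (lowest shortfall risk).
best = max(ratios, key=ratios.get)
print(ratios, shortfall, best)
```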

Lognormal Distributions

• The lognormal distribution is generated by the function e^x, where x is normally distributed

Monte Carlo Simulations

• Technique based on repeated generation of one or more factors that affect security values

• A probability distribution is assigned to each factor

• Each set of randomly generated factors is used with a pricing model to value the security

• Limitations: fairly complex; results no better than the assumptions on factor distributions
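The steps above can be sketched for a single risk factor. This example is an assumption-laden illustration, not a method taken from the slides: the factor is a stock price assumed to be lognormally distributed at the horizon, the security is a European call option, and the "pricing model" is simply the discounted average payoff. All parameter values are hypothetical.

```python
import math
import random


def simulate_call_value(s0, strike, rate, vol, horizon, n_paths, seed=0):
    """Monte Carlo value of a European call under an assumed lognormal price."""
    rng = random.Random(seed)                  # seeded for reproducibility
    drift = (rate - 0.5 * vol ** 2) * horizon
    diffusion = vol * math.sqrt(horizon)
    total_payoff = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)                             # random factor draw
        terminal_price = s0 * math.exp(drift + diffusion * z)
        total_payoff += max(terminal_price - strike, 0.0)   # value this scenario
    # Average the scenario values and discount back to today.
    return math.exp(-rate * horizon) * total_payoff / n_paths


estimate = simulate_call_value(
    s0=100.0, strike=100.0, rate=0.05, vol=0.20, horizon=1.0,
    n_paths=50_000, seed=42,
)
print(estimate)
```

Changing the assumed volatility or drift changes the estimate directly, which illustrates the limitation noted above: the result is only as good as the assumed factor distribution.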

Historical Simulation

• Based on actual changes in risk factors over some time period

• Advantage of using the actual distribution of risk factors and not estimations

• Limitation compared to Monte Carlo simulation is that it cannot answer “What if” questions

Sampling

• Simple random sampling: selecting a sample such that each item has the same chance of being selected

• Systematic sampling: selecting every mth item

• Sampling error: difference between a sample statistic (mean, variance) and its population parameter

• Sampling distribution of a sample statistic: a probability distribution of all possible sample statistics computed from a set of equal-sized samples

Stratified Random Sampling

• Population is separated into smaller groups based on one or more distinguishing characteristics

• From each group, or stratum, a random sample is taken and results are pooled

• Size of sample from each stratum depends on size of stratum relative to population

• Often used in bond indexing

Time-Series and Cross-Sectional Data

• Time-series data: observations taken over a period of time at specific and equally spaced time intervals

• Cross-sectional data: observations taken at a single point in time

Central Limit Theorem

• For simple random samples of size n from a population with mean µ and variance σ², the sampling distribution of the sample mean approaches N(µ, σ²/n) as n becomes large

• Extremely useful: the normal distribution is easy to apply in hypothesis testing and in constructing confidence intervals

• Inference can be made irrespective of the population distribution if n ≥ 30

Standard Error of Sample Mean

• It is the standard deviation of the distribution of the sample means
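A small simulation can illustrate the central limit theorem and the standard error together. The sketch draws many samples from a deliberately skewed population (exponential with mean 1 and standard deviation 1, an arbitrary choice) and compares the standard deviation of the sample means with σ/√n. Sample size, number of samples, and seed are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(7)     # seeded for reproducibility
n = 36                     # observations per sample
n_samples = 5000           # number of independent samples

# Each sample mean is the average of n draws from the skewed population.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to mu = 1
observed_se = statistics.stdev(sample_means)     # spread of the sample means
theoretical_se = 1.0 / math.sqrt(n)              # sigma / sqrt(n)

print(mean_of_means, observed_se, theoretical_se)
```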

Point Estimates and Confidence Interval

• Point estimates are single values used to estimate population parameters. For example, the sample mean is a point estimate of the population mean

• Confidence interval estimates result in a range of values within which the actual parameter will lie with probability 1-α. Here, α is called the level of significance and 1-α is called degree of confidence

Construction of Confidence Intervals

point estimate ± (reliability factor × standard error)

where

Point estimate = value of a sample statistic

Reliability factor = a number that depends on the sampling distribution of the point estimate and the probability that the point estimate falls within the confidence interval

Standard error = standard error of the point estimate

Desirable Properties of an Estimate

• Unbiased: Expected value of the estimator is equal to the parameter you are trying to estimate

• Efficient: Variance of its sampling distribution is smaller than that of all other unbiased estimators

• Consistent: Accuracy of parameter estimate increases as the sample size increases

Student’s t-distribution

• Bell-shaped distribution, symmetric about its mean

• Appropriate distribution to use when constructing confidence intervals based on small samples (n < 30) from populations with unknown variance and a normal (or approximately normal) distribution

• Also appropriate when the variance is unknown and the sample size is large enough that the CLT ensures the sampling distribution is approximately normal

Properties of Student’s t-distribution

• Symmetrical about its mean

• Defined by a single parameter, the degrees of freedom, equal to n – 1

• More probability in the tails than the normal distribution

• As the degrees of freedom get larger, it approaches the standard normal distribution

• Fatter tails => more difficult to reject the null hypothesis

Distributions for Different Degrees of Freedom

Calculating Confidence Interval

• Population is normally distributed with known variance: x̄ ± z(α/2) × σ/√n
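The formula for this case did not survive extraction. Assuming the standard construction (sample mean plus or minus the z reliability factor times σ/√n), a sketch using the standard library follows. The sample values are hypothetical.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    alpha = 1 - confidence
    reliability_factor = NormalDist().inv_cdf(1 - alpha / 2)   # z for alpha/2
    standard_error = sigma / math.sqrt(n)
    half_width = reliability_factor * standard_error
    return sample_mean - half_width, sample_mean + half_width


# Hypothetical sample: mean 80, known population sigma 15, 36 observations.
lower, upper = z_confidence_interval(80.0, 15.0, 36, confidence=0.95)
print(round(lower, 2), round(upper, 2))
```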

Calculating Confidence Interval

• Population is normally distributed with unknown variance: x̄ ± t(α/2) × s/√n, with n – 1 degrees of freedom

Constructing Confidence Intervals

• Unlike the standard normal distribution, the reliability factor for the t-distribution depends on the sample size (through the degrees of freedom), so reliability factors must be looked up in a t-table

• Owing to fatter tails of the t-distribution, confidence intervals using t-distribution are wider/more conservative

Selecting the Appropriate Test Statistic

Biases in Estimation Based on Samples

• Data-Mining Bias: Repeatedly searching the same data for patterns until one that appears significant is found, so the result may be due to chance

• Sample Selection Bias: Some data is systematically excluded because of lack of availability

• Survivorship Bias: Most common form of sample selection bias; only data on surviving entities (for example, funds still in existence) is considered

• Look-Ahead Bias: A relationship is tested based on data not available on test date

• Time-Period Bias: Period over which data is gathered is too short or too long

Hypothesis Testing Procedure

Null Hypothesis and Alternative Hypothesis

• The null hypothesis H0 is the hypothesis that the researcher wants to reject. For example, H0: µ = µ0, where µ0 is the hypothesized value of the population mean

• The alternate hypothesis Ha is what is concluded if there is sufficient evidence to reject the null hypothesis

Two-Tailed Tests of Hypotheses

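• A two-tailed test for the population mean is H0: µ = µ0 versus Ha: µ ≠ µ0

• The general decision rule for a two-tailed test is: reject H0 if the test statistic > upper critical value or the test statistic < lower critical value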

Two-Tailed Test Using Standard Normal(z) Distribution

• The decision rule for a two-tailed z-test at α = 0.05 is: reject H0 if the test statistic < –1.96 or the test statistic > 1.96

Example of Two-Tailed Test

• Data gathered on daily returns for 250 days; mean daily return = 0.1%, sample std. dev. = 0.25%. Belief: mean daily return ≠ 0

• Null Hypothesis: H0: µ = 0 versus Ha: µ ≠ 0

• Decision Rule: reject H0 if the test statistic < –1.96 or > 1.96 (α = 0.05)

• Standard error = 0.0025/√250

• Test Statistic = (0.001 – 0)/(0.0025/√250) = 6.32

• Since 6.32 > 1.96, reject the null hypothesis
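The arithmetic in this example can be verified with a few lines of Python. The inputs are the figures given above; the 5% significance level is inferred from the 1.96 critical value, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 250                 # number of daily observations
sample_mean = 0.001     # 0.1% mean daily return
sample_std = 0.0025     # 0.25% sample standard deviation
hypothesized_mean = 0.0
alpha = 0.05            # assumed significance level (implied by 1.96)

standard_error = sample_std / math.sqrt(n)
test_statistic = (sample_mean - hypothesized_mean) / standard_error

# Two-tailed critical value: alpha is split equally between the tails.
critical_value = NormalDist().inv_cdf(1 - alpha / 2)
reject_null = abs(test_statistic) > critical_value

print(round(test_statistic, 2), round(critical_value, 2), reject_null)
```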

One-Tailed Hypothesis Test

• The null and alternative hypotheses are either H0: µ ≤ µ0 versus Ha: µ > µ0 (upper tail) or H0: µ ≥ µ0 versus Ha: µ < µ0 (lower tail)

• Using Standard Normal(z) Distribution

Test Statistic

• Standard Error when population σ is known: σ/√n

• Standard Error when population σ is not known: s/√n, where s is the sample standard deviation

Critical Value and Decision Rule

• A test statistic is a random variable that may follow one of several distributions: the t-distribution, the z-distribution, the chi-square distribution and the F-distribution

• The critical value for the appropriate test statistic depends on its distribution

• The decision rule: if the test statistic is (greater than, less than) the critical value X, reject the null hypothesis

Type I and Type II Errors

• Type I error: Rejection of H0 when it is actually true

• Type II error: Failure to reject H0 when it is false

• The significance level is the probability of making a Type I error and is designated by α

• Power of a test is the probability of correctly rejecting H0 when it is false

• Therefore, Power of a test = 1 – P(Type II error)

Relation Between Confidence Intervals and Hypothesis Tests

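• A confidence interval is determined as [sample statistic – (critical value)(standard error)] ≤ population parameter ≤ [sample statistic + (critical value)(standard error)]

• The above expression can be rewritten as –critical value ≤ test statistic ≤ +critical value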

p-value and Hypothesis Testing

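• The p-value is the probability, computed assuming H0 is true, of obtaining a test statistic at least as extreme as the one observed; it is the smallest level of significance at which H0 can be rejected

• For example, significance level = 5% (95% confidence), test statistic = 2.3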
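The slide does not state which distribution or how many tails its example uses. Assuming a two-tailed z-test at a 5% significance level (95% confidence), the p-value for a test statistic of 2.3 could be computed as follows.

```python
from statistics import NormalDist

test_statistic = 2.3
alpha = 0.05   # assumed significance level (95% confidence)

# Two-tailed p-value: probability of a z at least this far from zero
# in either direction, computed under the null hypothesis.
p_value = 2 * (1 - NormalDist().cdf(abs(test_statistic)))

# H0 is rejected whenever the p-value is below the significance level.
reject_null = p_value < alpha
print(round(p_value, 4), reject_null)
```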

Population mean: when to use t-test?

• Use a t-test when the population variance is unknown and either n ≥ 30, or n < 30 but the distribution is approximately normal

• A t-statistic with n – 1 degrees of freedom is computed as t = (x̄ – µ0) / (s/√n)

Population mean: when to use z-test?

• Use a z-test when the population is normally distributed and the variance is known

• A z-statistic is computed as z = (x̄ – µ0) / (σ/√n)

Population mean: when to use z-test?

• Also, use a z-test when the sample size is large and the variance is unknown

• The corresponding z-statistic is computed as z = (x̄ – µ0) / (s/√n)

Critical z-Values
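The table on this slide was lost in extraction. Assuming it listed standard normal critical values at the common significance levels, those values can be regenerated from the inverse cdf. The dictionary layout is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

# alpha -> (two-tailed critical value, one-tailed critical value)
critical_values = {}
for alpha in (0.10, 0.05, 0.01):
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)   # alpha split across tails
    one_tailed = std_normal.inv_cdf(1 - alpha)       # all of alpha in one tail
    critical_values[alpha] = (two_tailed, one_tailed)

for alpha, (two_tailed, one_tailed) in critical_values.items():
    print(f"alpha={alpha:.2f}  two-tailed=±{two_tailed:.3f}  one-tailed={one_tailed:.3f}")
```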

Equality Test of Two Population Means

• Three possible hypothesis structures

• Two possible t-statistics

– Unknown variances but assumed equal

– Unknown variances assumed unequal

Unknown Variances Assumed Equal

Unknown Variances Assumed Unequal
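The formulas for both cases were lost in extraction. The sketch below uses the textbook pooled-variance t-statistic for the equal-variance case and, for the unequal-variance case, the Welch-Satterthwaite approximation for the degrees of freedom, which may differ in detail from the formula on the original slide. The hypothesized difference in means is assumed to be zero and the sample figures are hypothetical.

```python
import math


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """t-statistic and degrees of freedom, variances assumed equal."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    std_error = math.sqrt(pooled_var / n1 + pooled_var / n2)
    return (mean1 - mean2) / std_error, df


def unequal_variance_t(mean1, s1, n1, mean2, s2, n2):
    """t-statistic and approximate degrees of freedom, variances not assumed equal."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    std_error = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (mean1 - mean2) / std_error, df


# Hypothetical samples: (mean, standard deviation, size) for each group.
t_equal, df_equal = pooled_t(0.10, 0.20, 30, 0.06, 0.25, 40)
t_unequal, df_unequal = unequal_variance_t(0.10, 0.20, 30, 0.06, 0.25, 40)
print(t_equal, df_equal, t_unequal, df_unequal)
```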

Paired Comparisons Test

• Tests whether the mean of the differences between paired observations from two dependent samples is different from a hypothesized value (usually zero)

• Two-sided Test

• One-sided Test

t-statistic for Paired Comparisons Test

Test Variance of Normally-Distributed Population

• Chi-square test

– Two-tailed test

– One-tailed test

Decision Rule for a Two-Tailed Chi-Square Test

Chi-Square Test Statistic

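• The chi-square test statistic with n-1 degrees of freedom is computed as χ² = (n – 1)s² / σ0², where s² is the sample variance and σ0² is the hypothesized population variance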
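Assuming the standard form of the statistic, (n – 1) times the sample variance divided by the hypothesized variance, the computation is a one-liner. The sample values are hypothetical, and the critical values would still come from a chi-square table with n – 1 degrees of freedom.

```python
def chi_square_statistic(n, sample_variance, hypothesized_variance):
    """Chi-square statistic for a test of a single population variance."""
    return (n - 1) * sample_variance / hypothesized_variance


n = 25                             # hypothetical sample size
sample_variance = 0.04 ** 2        # sample standard deviation of 4%
hypothesized_variance = 0.05 ** 2  # hypothesized standard deviation of 5%

chi2 = chi_square_statistic(n, sample_variance, hypothesized_variance)
degrees_of_freedom = n - 1
print(chi2, degrees_of_freedom)
```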

Equality of Variance of Two Normally Distributed Populations

• F-distributed test statistic

– Two-tailed test

– One-tailed test

• F-statistic computed as F = s1² / s2², the ratio of the two sample variances

F-test

• Always put the larger variance in the numerator

• n1 – 1 and n2 – 1 are the degrees of freedom used to identify the critical value from the F-table
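A minimal sketch of the convention above: the helper places the larger sample variance in the numerator and returns the matching degrees of freedom for the table lookup. The function name and sample figures are hypothetical.

```python
def f_statistic(var_a, n_a, var_b, n_b):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


# Hypothetical samples: variance and size for each population.
f_stat, df_numerator, df_denominator = f_statistic(
    var_a=0.04 ** 2, n_a=41,
    var_b=0.05 ** 2, n_b=31,
)
print(f_stat, df_numerator, df_denominator)
```

Because the larger variance is always on top, the statistic is at least 1, so only the upper critical value from the F-table is needed.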

Parametric and Nonparametric Tests

• Parametric Tests: Rely on assumptions regarding the distribution of the population and are specific to population parameters

• Nonparametric Tests: Either do not consider a particular population parameter or have few assumptions about the population that is sampled
