Core Deposits Sensitivity and Survival Analysis Laura Roberts Hugh Blaxall Brian Velligan Sept. 13, 2010


DESCRIPTION

This presentation shows financial managers how to predict how long accounts are likely to stay open, using a statistical survival-analysis model.


Page 1: Core deposits sensitivity and survival analysis

Core Deposits Sensitivity and Survival Analysis

Laura Roberts, Hugh Blaxall, Brian Velligan

Sept. 13, 2010

Page 2: Core deposits sensitivity and survival analysis

Research question 1a

• Question 1a. How can we visually summarize account duration?

Page 3: Core deposits sensitivity and survival analysis

Research question 1b

• Question 1b. How can we predict the length of time a person will keep a core account open (account duration)? We cannot simply compute an average of account durations because we do not know how far into the future current accounts will “survive.” Simple means will produce a negatively biased estimate.

• Perhaps we can revise our question to read, “What is the probability a person will keep an account open for a specific period of time?” This new question allows us to use survival analysis, hazard probabilities, and risk functions to get a detailed picture of account duration.

Page 4: Core deposits sensitivity and survival analysis

Question 1b (continued)

• Can we create a model that uses time and other indicators (e.g., the interest rate, or the change in the interest rate, on the account) as predictors of account duration? This is a more sophisticated question for another time; food for thought for now…

Page 5: Core deposits sensitivity and survival analysis

Question 1c

• 1c – How can we summarize typical account duration with a single index? Remember, means and other simple averages will not do the trick, because we do not know how long the still-open accounts will stay open…

Page 6: Core deposits sensitivity and survival analysis

What is the best statistical tool for answering each question?

• Question 1a – To visually summarize duration, use a histogram of the frequency of each duration for censored and uncensored accounts. I’ll show you how to do this.

• Question 1b - To predict duration, use survival analysis.

• Question 1c – For a single index, we can use the median lifetime, the time by which half of the accounts have closed (i.e., the estimated survival probability has fallen to 0.5)…more on this…

Page 7: Core deposits sensitivity and survival analysis

Background for Study

• 1. Use a multi-cohort design, such as accounts opened between 1972 and 1977 and studied until 1984.

• 2. Measure duration of each account.

• 3. Predict length of time until a given event, in this case, closing of the account.

• 4. Some people will not close the account within the time period of observation. These people (accounts) are considered to be censored.

Page 8: Core deposits sensitivity and survival analysis

© Willett, Harvard University Graduate School of Education, 04/13/23

S052/II.2(b) – Slide 8

Dataset acctdur.txt

Overview Discrete-time, person-level dataset on the duration of accounts opened between 1972 and 1977 and followed without interruption until 1984.

Source Bank records.

Sample size 3941 accounts.

More Info Singer & Willett, 2003

Let’s examine an example …

Note on the labeling of the discrete time “bins”: we regarded an account’s first year as its zeroth year. If the account was then closed sometime during the following year, it was classified as having a duration of one year and as having closed in “year one.”

S052/II.2(a1): Introducing The Central Concepts In Classical Survival Analysis Introducing A Dataset On Account Duration


“Multiple Cohort” Sample Design

Be aware that multiple annual cohorts of accounts are pooled together into this single sample:

• Cohorts entered the sample sequentially between 1972 and 1977.*

• All cohorts were followed until the end of 1984.

Important Distinction You Must Keep In Mind

The two “modern” approaches to survival analysis differ in how they require duration to be measured:

• In discrete-time survival analysis, time is measured in discrete units, such as semesters, years, etc.

• In continuous-time survival analysis, time can be measured to any level of precision.

Research Question

Whether, and if so when, are accounts closed?

Page 9: Core deposits sensitivity and survival analysis


The dataset is straightforward, containing IDs and length of account, with one small hitch …

Structure of Dataset

Col# | Var Name | Variable Description | Variable Metric/Labels
1 | ID | Customer identification code. | Integer
2 | acctopen | Number of years that the account remained open or, for accounts still open at the end of the study in 1984, the number of years until the account was censored. | Integer
3 | CENSOR | Dummy variable indicating whether an account was censored by the end of data collection in 1984. | Dichotomous: 0 = not censored, 1 = censored

S052/II.2(a1): Introducing The Central Concepts In Classical Survival Analysis The Difficult Problem of Censoring!!!


There is a problem that is intrinsic to survival data, and it is illustrated in this dataset: the event of interest in the study is “closing an account,” but not every customer (account) actually experiences this event while being observed by the researchers. We say that these accounts are “censored” by the end of data collection.

And, of course, some of the censored accounts will eventually experience the event of interest, just not while the researchers are watching! Ignoring this can seriously bias estimates of time-to-event. And, given that time-to-event is the focus of our research question, we need to figure out how to deal with this!

Page 10: Core deposits sensitivity and survival analysis


One sensible thing you can do is display the frequency with which each account duration occurs, in a vertical histogram that includes all the accounts in the sample, both censored and uncensored.

I created this vertical histogram by typing the frequency of each account duration into an Excel spreadsheet. You can create similar vertical histograms in SAS too, but they are not as pretty.
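If you prefer to script the counts rather than type them into a spreadsheet, here is a minimal sketch in Python. It assumes each account is stored as a hypothetical (duration, censored) pair and uses only the 15 rows of the person-level dataset printed later in this deck (IDs 1 to 16; ID 11 is not shown there), so the counts are illustrative, not the full 3941-account histogram. Bars are drawn horizontally as text: `#` for a closed account, `o` for a censored one.

```python
from collections import Counter

# (acctopen, censored) for the person-level rows shown in this deck
# (IDs 1-16; ID 11 is not shown there). Illustrative sample only.
records = [
    (1, False), (2, False), (1, False), (1, False), (12, True),
    (1, False), (12, True), (1, False), (2, False), (2, False),
    (7, False), (12, True), (1, False), (12, True), (12, True),
]

# Tally how often each duration occurs, separately by censoring status.
uncensored = Counter(d for d, is_censored in records if not is_censored)
censored = Counter(d for d, is_censored in records if is_censored)

# One row per year: '#' marks a closed (uncensored) account, 'o' a censored one.
for year in range(1, 13):
    bar = "#" * uncensored[year] + "o" * censored[year]
    print(f"{year:>2} | {bar}")
```

Keeping the two counters separate mirrors the advice above to show censored and uncensored accounts together while still being able to tell them apart.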

Note the impact of the multi-cohort research design: accounts in the later cohorts (opened up to 1977) can be observed for only about seven years before data collection ends in 1984, so any of them still open at that point is a censored case.
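A quick way to see why the later cohorts produce censored cases is to compute the longest duration each cohort could possibly be observed. This sketch assumes the labeling convention described earlier (the opening year is year 0, so an account still open at the end of 1984 has been observed for 1984 minus its opening year); `END_OF_STUDY` and `max_observable_duration` are hypothetical names.

```python
END_OF_STUDY = 1984  # all cohorts were followed until the end of 1984


def max_observable_duration(open_year: int) -> int:
    """Longest duration (in years) observable for an account opened in
    open_year, assuming the opening year counts as year 0."""
    return END_OF_STUDY - open_year


for cohort in range(1972, 1978):  # cohorts entered between 1972 and 1977
    print(cohort, max_observable_duration(cohort))
```

Under that assumption the 1972 cohort can be followed for all 12 periods of the life table, while the 1977 cohort can be followed for only 7, which matches the point in the life table where the risk set first shrinks by more than the number of closures.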

S052/II.2(a1): Introducing The Central Concepts In Classical Survival Analysis Exploring the Account Data ANSWER TO RESEARCH QUESTION 1a


Page 11: Core deposits sensitivity and survival analysis


S052/II.2(a1): Introducing The Central Concepts In Classical Survival Analysis Exploring the Data


Here are two hopeless strategies for dealing with censoring when summarizing account duration …

If you take the average of the durations of only the uncensored accounts, their mean account duration is 3.73 years, which is a negatively biased estimate of the average population account duration: the longest-lasting accounts are exactly the ones still open (censored), so they are left out.

If instead we set the duration of each censored account to its longest observed duration (the point at which it was censored), the mean account duration for all accounts is 6.31 years. This too is a negatively biased estimate of true duration: it is biased downward as soon as even one censored account actually stays open longer than its censored duration.
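To make the two strategies concrete, the sketch below applies both to the person-level rows shown in this deck (IDs 1 to 16; ID 11 is not shown). These 15 records are only an illustration, so the results are not the 3.73 and 6.31 years quoted for the full sample; the function names are hypothetical.

```python
# (acctopen, censored) for the person-level rows shown in this deck.
records = [
    (1, False), (2, False), (1, False), (1, False), (12, True),
    (1, False), (12, True), (1, False), (2, False), (2, False),
    (7, False), (12, True), (1, False), (12, True), (12, True),
]


def mean_uncensored_only(recs):
    """Strategy 1: drop censored accounts entirely (negatively biased)."""
    durations = [d for d, is_censored in recs if not is_censored]
    return sum(durations) / len(durations)


def mean_censoring_as_closing(recs):
    """Strategy 2: treat each censoring time as if it were a closing time
    (still negatively biased)."""
    return sum(d for d, _ in recs) / len(recs)


print(round(mean_uncensored_only(records), 2))       # 1.9
print(round(mean_censoring_as_closing(records), 2))  # 5.27
```

Both numbers are lower bounds of a sort: neither uses the information that the censored accounts lasted at least 12 years and possibly much longer.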

Page 12: Core deposits sensitivity and survival analysis


Dataset Acctdur_PP.txt

Overview Person-period dataset containing the same information as the person-level dataset Acctdur.txt, on the duration of accounts opened between 1972 and 1977 and followed without interruption until 1984.

Source Bank records.

Sample size 24,875 annual person-period records.

More Info Singer & Willett, 2003

You can resolve these problems by working with your data in a different format: re-format the data into a person-period format. In a person-period dataset, you can estimate a different class of summary statistics that address the “whether” and “when” questions:

• Hazard probability.

• Survival probability.

• Median lifetime.

S052/II.2(a1): Introducing The Central Concepts In Classical Survival Analysis Resolving The Problem Of Censoring By Working In A Person-Period Dataset


Notice that the name of the dataset is different.

Here’s a clue to the difference between the person-level and the person-period dataset…

Page 13: Core deposits sensitivity and survival analysis


Person-Level Dataset

ID | acctopen | CENSOR
1 | 1 | Not censored
2 | 2 | Not censored
3 | 1 | Not censored
4 | 1 | Not censored
5 | 12 | Censored
6 | 1 | Not censored
7 | 12 | Censored
8 | 1 | Not censored
9 | 2 | Not censored
10 | 2 | Not censored
12 | 7 | Not censored
13 | 12 | Censored
14 | 1 | Not censored
15 | 12 | Censored
16 | 12 | Censored
Etc.

Person-Period Dataset

ID | PERIOD | EVENT
1 | 1 | 1
2 | 1 | 0
2 | 2 | 1
3 | 1 | 1
4 | 1 | 1
5 | 1 | 0
5 | 2 | 0
5 | 3 | 0
5 | 4 | 0
5 | 5 | 0
5 | 6 | 0
5 | 7 | 0
5 | 8 | 0
5 | 9 | 0
5 | 10 | 0
5 | 11 | 0
5 | 12 | 0
6 | 1 | 1
7 | 1 | 0
7 | 2 | 0
7 | 3 | 0
7 | 4 | 0
7 | 5 | 0
7 | 6 | 0
7 | 7 | 0
7 | 8 | 0
7 | 9 | 0
7 | 10 | 0
7 | 11 | 0
7 | 12 | 0
Etc.

In a person-period dataset:

• Each person has one row of data for each time period.

• Their data record continues until the time period in which they either experience the event of interest or are censored.

S052/II.2(a1): Introducing The Central Concepts In Classical Survival Analysis Inspecting the Person-Period Dataset


Account #2 is not censored, so it experiences the event of interest (i.e., it closes) in the 2nd year.

Account #7 is censored: it never experiences the event of interest (i.e., it never closes) in any of the 12 years during which accounts are observed.
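The expansion from the person-level layout to the person-period layout is mechanical. Here is a minimal sketch, assuming person-level rows are held as hypothetical (id, acctopen, censored) tuples: each account gets one row per period up to acctopen, and EVENT is 1 only in the closing period of an uncensored account. Applied to IDs 1 to 7, it should reproduce the 30 person-period rows listed above.

```python
def to_person_period(person_level):
    """Expand person-level rows (id, acctopen, censored) into
    person-period rows (id, period, event)."""
    rows = []
    for acct_id, acctopen, censored in person_level:
        for period in range(1, acctopen + 1):
            # EVENT = 1 only in the last observed period of an account that
            # closed; censored accounts have EVENT = 0 in every period.
            event = 1 if (period == acctopen and not censored) else 0
            rows.append((acct_id, period, event))
    return rows


# IDs 1-7 from the person-level dataset above.
person_level = [
    (1, 1, False), (2, 2, False), (3, 1, False), (4, 1, False),
    (5, 12, True), (6, 1, False), (7, 12, True),
]
person_period = to_person_period(person_level)
print(len(person_period))  # 30
print(person_period[:5])   # [(1, 1, 1), (2, 1, 0), (2, 2, 1), (3, 1, 1), (4, 1, 1)]
```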

Page 14: Core deposits sensitivity and survival analysis


EVENT (Did customer close account in this time period?) by PERIOD (current time period): frequency (column %)

PERIOD | No close | Close | Total
1 | 3485 (88.43%) | 456 (11.57%) | 3941
2 | 3101 (88.98%) | 384 (11.02%) | 3485
3 | 2742 (88.42%) | 359 (11.58%) | 3101
4 | 2447 (89.24%) | 295 (10.76%) | 2742
5 | 2229 (91.09%) | 218 (8.91%) | 2447
6 | 2045 (91.75%) | 184 (8.25%) | 2229
7 | 1922 (93.99%) | 123 (6.01%) | 2045
8 | 1563 (95.19%) | 79 (4.81%) | 1642
9 | 1203 (95.78%) | 53 (4.22%) | 1256
10 | 913 (96.31%) | 35 (3.69%) | 948
11 | 632 (97.53%) | 16 (2.47%) | 648
12 | 386 (98.72%) | 5 (1.28%) | 391
Total | 22668 | 2207 | 24875

Here’s the Life Table – a Two-Way Contingency Table Analysis of EVENT by PERIOD …

S052/II.2(a1): Introducing The Central Concepts In Classical Survival Analysis Beginning Of The Life Table Analysis – Estimating The Sample Hazard Probability


We can use these frequencies to estimate a hazard probability that describes the “risk of closing” in each time period.

Hazard probability is the (conditional) probability that an account will experience the event of interest (i.e., close) in a particular time period, given that it has “survived” up until that period.

In discrete time period #1, for instance, there are 3941 accounts “at risk of closing.” Of this “risk set” of accounts, 456 were observed to close. Hence, the probability that an account will close in this period, given that it entered it, is 456/3941, or 0.1157. So, the sample hazard probability in discrete time period #1 is

$\hat{h}(t_1) = 456/3941 = 0.1157$
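The same division can be repeated for every period. The sketch below copies the risk-set sizes (the column totals) and the closure counts from the life table and divides one by the other; it is plain Python for illustration, and the list names are our own.

```python
# Risk set entering each period and closures in that period (periods 1-12),
# copied from the life table above.
at_risk = [3941, 3485, 3101, 2742, 2447, 2229, 2045, 1642, 1256, 948, 648, 391]
closed = [456, 384, 359, 295, 218, 184, 123, 79, 53, 35, 16, 5]

# Sample hazard h(t_j) = (number closing in period j) / (number at risk in j).
hazard = [c / n for c, n in zip(closed, at_risk)]

for period, h in enumerate(hazard, start=1):
    print(period, round(h, 4))
```

Rounded to four decimals, these match the “Close” column percentages of the life table divided by 100.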

Page 15: Core deposits sensitivity and survival analysis



And the sample hazard probabilities for discrete time periods #4, #5, #6 and #7…

S052/II.2(a1): Introducing The Central Concepts In Classical Survival Analysis Continuing The Life Table Analysis – Estimating Sample Hazard Probability


Something different is happening here in the Life Table.

What is it? Why is it occurring? Is it a problem?

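One way to start answering these questions is to check whether every account still open at the end of a period shows up in the next period's risk set. The sketch below, again using the life-table counts, computes that shortfall, i.e., the number of accounts censored at the end of each period; after period 12 the study ends, so all remaining accounts count as censored. This is our own arithmetic on the table, offered as a hint rather than as the authors' answer.

```python
# Risk set entering each period and closures in that period (periods 1-12),
# copied from the life table above.
at_risk = [3941, 3485, 3101, 2742, 2447, 2229, 2045, 1642, 1256, 948, 648, 391]
closed = [456, 384, 359, 295, 218, 184, 123, 79, 53, 35, 16, 5]

censored = []
for j in range(len(at_risk)):
    still_open = at_risk[j] - closed[j]  # accounts open at the end of period j
    # Accounts that reach the next period; none continue past period 12.
    next_risk_set = at_risk[j + 1] if j + 1 < len(at_risk) else 0
    censored.append(still_open - next_risk_set)

for period, n_censored in enumerate(censored, start=1):
    print(period, n_censored)
```

On this reading, no accounts are censored in periods 1 to 6; the first censored accounts drop out at the end of period 7, consistent with the multi-cohort design. The 2207 closures plus the 1734 censored accounts add back up to the 3941 accounts in the sample.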

Page 16: Core deposits sensitivity and survival analysis


Conclusion? The hazard probability gives the risk of closing in each year after an account is opened.

Collect the sample hazard probabilities together and plot them as a sample hazard function …

S052/II.2(a1): Introducing The Central Concepts In Classical Survival Analysis Plotting the Sample Hazard Function


Page 17: Core deposits sensitivity and survival analysis


Time Period | Sample Hazard Probability h(t) | Sample Survival Probability S(t)
0 | | 1.0000
1 | 0.1157 | 0.8843
2 | 0.1102 | 0.7869
3 | 0.1158 | 0.6958
4 | 0.1076 | 0.6209
5 | 0.0891 | 0.5656
6 | 0.0825 | 0.5189
7 | 0.0601 | 0.4877
8 | 0.0481 | 0.4642
9 | 0.0422 | 0.4446
10 | 0.0369 | 0.4282
11 | 0.0247 | 0.4177
12 | 0.0128 | 0.4123

Once you have the sample hazard probabilities, you can cumulate them to get sample survival probabilities …

Sample Survival Probability

Survival probability in any time period is the probability of “surviving” beyond that period (i.e., the probability of not experiencing the event of interest until after that period).

S052/II.2(a1): Introducing The Central Concepts In Classical Survival Analysis Continuing The Life Table Analysis – Estimating Sample Survival Probability


Here, all accounts survived the 0th time period, so the estimated sample survival probability in the 0th period is 1.000.


The estimated hazard probability suggests that a proportion of 0.1157 of accounts in the 1st period risk set will “die” in the 1st period (i.e., close).


Because a proportion of 0.1157 of the risk set will “die” in the 1st period, we know that (1 - 0.1157), or 0.8843, of the 1st period risk set will survive.

In other words, 0.8843 of the entering “1.0000” will remain “alive” beyond the 1st time period (and will therefore be potentially available to close at some later time).

The sample survival probability in the 1st time period is therefore 0.8843 × 1.0000, or:

$\hat{S}(t_1) = 0.8843$

Page 18: Core deposits sensitivity and survival analysis



And, the estimated survival probability in discrete time period #2…

S052/II.2(a1): Introducing The Central Concepts In Classical Survival Analysis Continuing The Life Table Analysis – Estimating Sample Survival Probability


Here, according to the estimated sample survival probability, a proportion of 0.8843 of the accounts survived the 1st time period.

The estimated hazard probability suggests that a proportion of 0.1102 of accounts in the 2nd period risk set will “die” in the 2nd period (i.e., close).


Because a proportion of 0.1102 of the risk set will “die” in the 2nd period, we know that (1 - 0.1102), or 0.8898, of the 2nd period risk set will survive.

In other words, a proportion of 0.8898 of the entering “0.8843” will remain “alive” beyond the 2nd time period (and be potentially available to close later).

The sample survival probability in the 2nd time period is therefore 0.8898 × 0.8843, or:

$\hat{S}(t_2) = 0.7869$

Page 19: Core deposits sensitivity and survival analysis



And, the estimated survival probability in discrete time period #3 … etc.

S052/II.2(a1): Introducing The Central Concepts In Classical Survival Analysis Continuing The Life Table Analysis – Estimating Sample Survival Probability


Here, according to the estimated sample survival probability, a proportion of 0.7869 of the accounts survived the 2nd time period.


The estimated hazard probability suggests that a proportion of 0.1158 of accounts in the 3rd period risk set will “die” in the 3rd period (i.e., close).


Because a proportion of 0.1158 of the risk set will “die” in the 3rd period, we know that (1 - 0.1158), or 0.8842, of the 3rd period risk set will survive.

In other words, a proportion of 0.8842 of the entering “0.7869” will remain “alive” beyond the 3rd time period (and be potentially available to close later).

The sample survival probability in the 3rd time period is therefore 0.8842 × 0.7869, or:

$\hat{S}(t_3) = 0.6958$

Page 20: Core deposits sensitivity and survival analysis


Time Period | Sample Hazard Probability h(t) | Sample Survival Probability S(t)
$t_{j-1}$ | | $\hat{S}(t_{j-1})$
$t_j$ | $\hat{h}(t_j)$ | $\hat{S}(t_j)$

As a general principle, the estimated survivor probability in any time period j can be found by substituting into a simple little rule …


S052/II.2(a1): Introducing The Central Concepts In Classical Survival Analysis Simple Rule For Estimating Sample Survival Probability


So, in general, in any time period j:

$\hat{S}(t_j) = [1 - \hat{h}(t_j)]\,\hat{S}(t_{j-1})$
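The rule translates directly into a loop over periods; unrolled, it is equivalent to $\hat{S}(t_j) = \prod_{i=1}^{j}[1 - \hat{h}(t_i)]$ with $\hat{S}(t_0) = 1$. The sketch below applies it to the life-table counts, using unrounded hazards; rounding the results to four decimals should reproduce the survival column of the table above. The list names are our own.

```python
# Risk set entering each period and closures in that period (periods 1-12),
# copied from the life table above.
at_risk = [3941, 3485, 3101, 2742, 2447, 2229, 2045, 1642, 1256, 948, 648, 391]
closed = [456, 384, 359, 295, 218, 184, 123, 79, 53, 35, 16, 5]

survival = [1.0]  # S(t_0) = 1: every account survives period 0
for c, n in zip(closed, at_risk):
    h = c / n                                # sample hazard h(t_j)
    survival.append((1 - h) * survival[-1])  # S(t_j) = [1 - h(t_j)] S(t_{j-1})

for period, s in enumerate(survival):
    print(period, round(s, 4))
```

Computing each hazard from the raw counts, rather than from the rounded hazards, avoids accumulating rounding error across the twelve multiplications.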

Page 21: Core deposits sensitivity and survival analysis


Plotting the sample survival probabilities against time period provides the sample survivor function.

Typical monotonically decreasing survivor function … The estimated median lifetime is 6.6 years: the point at which half of the accounts are “still alive” (i.e., the estimated survival probability falls to 0.5).
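The slides do not spell out how the 6.6 was obtained. A common approach, assumed here, is to interpolate linearly between the last period whose survival probability is at least 0.5 and the first period below 0.5; Singer & Willett (2003) describe an estimator of this kind. Using the rounded survival probabilities from the table, this sketch (with a hypothetical `median_lifetime` helper) gives a value of about 6.6.

```python
def median_lifetime(survival):
    """Linearly interpolate the time at which S(t) crosses 0.5.

    survival[j] is the sample survival probability at the end of period j,
    with survival[0] = 1.0.
    """
    for j in range(1, len(survival)):
        if survival[j] < 0.5:
            m = j - 1  # last period with S(t_m) >= 0.5
            return m + (survival[m] - 0.5) / (survival[m] - survival[j])
    return None  # S(t) never drops below 0.5 within the observed periods


# Rounded sample survival probabilities for periods 0-12 from the table.
survival = [1.0000, 0.8843, 0.7869, 0.6958, 0.6209, 0.5656, 0.5189,
            0.4877, 0.4642, 0.4446, 0.4282, 0.4177, 0.4123]
print(round(median_lifetime(survival), 2))  # 6.61
```

Returning `None` when the curve never falls below 0.5 reflects a real limitation: with heavy censoring, the median lifetime may simply not be estimable from the observed periods.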

S052/II.2(a1): Introducing The Central Concepts In Classical Survival Analysis Plotting the Sample Survivor Function And Estimating Median Lifetime

Page 22: Core deposits sensitivity and survival analysis

Research Question 2 for Next Time…

• Question 2. How can we predict core deposit interest rates?

• A. From the prime interest rate?

• B. From a market interest rate?

• 1. Can we predict the core deposit interest rate from 3-month LIBOR (one index of market interest rates)?

• 2. From lagged LIBOR indices?

• 3. Are there other market interest rate indices we want to include to predict the core deposit interest rate?