View
0
Download
0
Category
Preview:
Citation preview
Mining Event Histories
Mining Event Histories:Some New Insights on Personal Swiss Life Courses
Gilbert Ritschard
Dept of Econometrics and Laboratory of Demography, University of Genevahttp://mephisto.unige.ch
PaVie Seminar, Lausanne, October 22, 2008
19/11/2008gr 1/95
Mining Event Histories
My talk is about life courses,So, let me start with an example of scientific life course
date event1970-1979 Studies in econometrics1980-1992 Mathematical Economics1985-... Work with Social scientists (Family studies)
Interest in Statistics for social sciences1990-1995 Interest in Neural Networks2000-... KDD and data mining (Clustering, supervised learning)2003-... Work with historians, demographers, psychologists
(longitudinal data)2005-... KDD and Data mining approaches
for analysing life course data2007-... Start a SNF project on “Mining Event Histories”
19/11/2008gr 2/95
Mining Event Histories
My talk is about life courses,So, let me start with an example of scientific life course
date event1970-1979 Studies in econometrics1980-1992 Mathematical Economics1985-... Work with Social scientists (Family studies)
Interest in Statistics for social sciences1990-1995 Interest in Neural Networks2000-... KDD and data mining (Clustering, supervised learning)2003-... Work with historians, demographers, psychologists
(longitudinal data)2005-... KDD and Data mining approaches
for analysing life course data2007-... Start a SNF project on “Mining Event Histories”
19/11/2008gr 2/95
Mining Event Histories
My talk is about life courses,So, let me start with an example of scientific life course
date event1970-1979 Studies in econometrics1980-1992 Mathematical Economics1985-... Work with Social scientists (Family studies)
Interest in Statistics for social sciences1990-1995 Interest in Neural Networks2000-... KDD and data mining (Clustering, supervised learning)2003-... Work with historians, demographers, psychologists
(longitudinal data)2005-... KDD and Data mining approaches
for analysing life course data2007-... Start a SNF project on “Mining Event Histories”
19/11/2008gr 2/95
Mining Event Histories
Outline
1 Sequence Analysis in Social Sciences
2 Survival Trees
3 Characterizing, rendering and clustering sequence data
4 Mining Frequent Episodes
19/11/2008gr 3/95
Mining Event HistoriesSequence Analysis in Social Sciences
Table of content
1 Sequence Analysis in Social Sciences
2 Survival Trees
3 Characterizing, rendering and clustering sequence data
4 Mining Frequent Episodes
19/11/2008gr 4/95
Mining Event HistoriesSequence Analysis in Social SciencesMotivation
Section content
1 Sequence Analysis in Social SciencesMotivationWhat kind of data?Issues with life course dataMethods for Longitudinal Data
19/11/2008gr 5/95
Mining Event HistoriesSequence Analysis in Social SciencesMotivation
Motivation
Individual life course paradigm.Following macro quantities (e.g. #divorces, fertility rate, meaneducation level, ...) over timeinsufficient for understanding social behavior.Need to follow individual life courses.
Data availabilityLarge panel surveys in many countries(SHP, CHER, SILC, GGP, ...)Biographical retrospective surveys (FFS, ...).Statistical matching of censuses, population registers and otheradministrative data.
19/11/2008gr 6/95
Mining Event HistoriesSequence Analysis in Social SciencesMotivation
Motivation
Individual life course paradigm.Following macro quantities (e.g. #divorces, fertility rate, meaneducation level, ...) over timeinsufficient for understanding social behavior.Need to follow individual life courses.
Data availabilityLarge panel surveys in many countries(SHP, CHER, SILC, GGP, ...)Biographical retrospective surveys (FFS, ...).Statistical matching of censuses, population registers and otheradministrative data.
19/11/2008gr 6/95
Mining Event HistoriesSequence Analysis in Social SciencesMotivation
Motivation
Need for suited methods for discovering interesting knowledgefrom these individual longitudinal data.Social scientists use
Essentially Survival analysis (Event History Analysis)More rarely sequential data analysis (Optimal Matching,Markov Chain Models)
Could social scientists benefit from data-mining approaches?Which methods?Are there specific issues with those methods for socialscientists?
19/11/2008gr 7/95
Mining Event HistoriesSequence Analysis in Social SciencesMotivation
Motivation
Need for suited methods for discovering interesting knowledgefrom these individual longitudinal data.Social scientists use
Essentially Survival analysis (Event History Analysis)More rarely sequential data analysis (Optimal Matching,Markov Chain Models)
Could social scientists benefit from data-mining approaches?Which methods?Are there specific issues with those methods for socialscientists?
19/11/2008gr 7/95
Mining Event HistoriesSequence Analysis in Social SciencesMotivation
Motivation
Need for suited methods for discovering interesting knowledgefrom these individual longitudinal data.Social scientists use
Essentially Survival analysis (Event History Analysis)More rarely sequential data analysis (Optimal Matching,Markov Chain Models)
Could social scientists benefit from data-mining approaches?Which methods?Are there specific issues with those methods for socialscientists?
19/11/2008gr 7/95
Mining Event HistoriesSequence Analysis in Social SciencesMotivation
Motivation: KD in Social sciences
In KDD (Knowledge discovery in databases) and data mining,focus on prediction and classification.Improve prediction and classification errors.
In Social science, aim is understanding/explaining (social)behaviors.Hence focus is on process rather than output.
19/11/2008gr 8/95
Mining Event HistoriesSequence Analysis in Social SciencesMotivation
Motivation: KD in Social sciences
In KDD (Knowledge discovery in databases) and data mining,focus on prediction and classification.Improve prediction and classification errors.
In Social science, aim is understanding/explaining (social)behaviors.Hence focus is on process rather than output.
19/11/2008gr 8/95
Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?
Section content
1 Sequence Analysis in Social SciencesMotivationWhat kind of data?Issues with life course dataMethods for Longitudinal Data
19/11/2008gr 9/95
Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?
What kind of data?
What kind of data are we dealing with?Mainly categorical longitudinal data describing life courses
Data can be in different forms ...
19/11/2008gr 10/95
Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?
What kind of data?
What kind of data are we dealing with?Mainly categorical longitudinal data describing life courses
Data can be in different forms ...
19/11/2008gr 10/95
Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?
What kind of data?
What kind of data are we dealing with?Mainly categorical longitudinal data describing life courses
Data can be in different forms ...
19/11/2008gr 10/95
Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?
ontology of longitudinal data (Aristotelean tree)
Longitudinal data
States
one state per time unit t
not
several states at each t
not
not
Events
time stamped events
notevent sequence
not
notspell duration
not
19/11/2008gr 11/95
Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?
Alternative views of Individual Longitudinal Data
Table: Time stamped events, record for Sandra
ending secondary school in 1970 first job in 1971 marriage in 1973
Table: State sequence view, Sandra
year 1969 1970 1971 1972 1973civil status single single single single marriededucation level primary secondary secondary secondary secondaryjob no no first first first
19/11/2008gr 12/95
Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?
Alternative views of Individual Longitudinal Data
Table: Time stamped events, record for Sandra
ending secondary school in 1970 first job in 1971 marriage in 1973
Table: State sequence view, Sandra
year 1969 1970 1971 1972 1973civil status single single single single marriededucation level primary secondary secondary secondary secondaryjob no no first first first
19/11/2008gr 12/95
Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?
Transforming time stamped events into state sequencesExample: the “BioFam” data
Data from the retrospective survey conducted in 2002 by theSwiss Household Panel (SHP)(with support of Federal Statistical Office, Swiss NationalFund for Scientific Research, University of Neuchatel.)Retrospective survey: 5560 individualsRetained familial life events: Leaving Home, First childbirth,First marriage and First divorce.Age 15 to 45 → 2601 remaining individuals, born between1909 et 1957.
19/11/2008gr 13/95
Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?
Transforming time stamped events into state sequencesExample: the “BioFam” data
Data from the retrospective survey conducted in 2002 by theSwiss Household Panel (SHP)(with support of Federal Statistical Office, Swiss NationalFund for Scientific Research, University of Neuchatel.)Retrospective survey: 5560 individualsRetained familial life events: Leaving Home, First childbirth,First marriage and First divorce.Age 15 to 45 → 2601 remaining individuals, born between1909 et 1957.
19/11/2008gr 13/95
Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?
Transforming time stamped events into state sequencesExample: the “BioFam” data
Data from the retrospective survey conducted in 2002 by theSwiss Household Panel (SHP)(with support of Federal Statistical Office, Swiss NationalFund for Scientific Research, University of Neuchatel.)Retrospective survey: 5560 individualsRetained familial life events: Leaving Home, First childbirth,First marriage and First divorce.Age 15 to 45 → 2601 remaining individuals, born between1909 et 1957.
19/11/2008gr 13/95
Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?
Transforming time stamped events into state sequencesExample: the “BioFam” data
Data from the retrospective survey conducted in 2002 by theSwiss Household Panel (SHP)(with support of Federal Statistical Office, Swiss NationalFund for Scientific Research, University of Neuchatel.)Retrospective survey: 5560 individualsRetained familial life events: Leaving Home, First childbirth,First marriage and First divorce.Age 15 to 45 → 2601 remaining individuals, born between1909 et 1957.
19/11/2008gr 13/95
Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?
Creating state sequences
Example of time stamped data:individual LHome marriage childbirth divorce
1 1989 1990 1992 NA
19/11/2008gr 14/95
Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?
Deriving the states
Need one state for each combination of events:
LHome marriage childbirth divorce0 no no no no1 yes no no no2 no yes yes/no no3 yes yes no no4 no no yes no5 yes no yes no6 yes yes yes no7 yes/no yes yes/no yes
19/11/2008gr 15/95
Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?
From events to states
Example of transformation :events:
individual LHome marriage childbirth divorce1 1989 1990 1992 NA
states:
individual ... 1988 1989 1990 1991 1992 1993 ...1 ... 0 0 1 3 3 6 ...
19/11/2008gr 16/95
Mining Event HistoriesSequence Analysis in Social SciencesIssues with life course data
Section content
1 Sequence Analysis in Social SciencesMotivationWhat kind of data?Issues with life course dataMethods for Longitudinal Data
19/11/2008gr 17/95
Mining Event HistoriesSequence Analysis in Social SciencesIssues with life course data
Issues with life course data
Incomplete sequencesCensored and truncated data:Cases falling out of observation before experiencing an event ofinterest.Sequences of varying length.
Time varying predictors.Example: When analysing time to divorce, presence of childrenis a time varying predictor.
Data collected by clustersExample: Household panel surveys.Multi-level analysis to account for unobserved sharedcharacteristics of members of a same cluster.
19/11/2008gr 18/95
Mining Event HistoriesSequence Analysis in Social SciencesIssues with life course data
Issues with life course data
Incomplete sequencesCensored and truncated data:Cases falling out of observation before experiencing an event ofinterest.Sequences of varying length.
Time varying predictors.Example: When analysing time to divorce, presence of childrenis a time varying predictor.
Data collected by clustersExample: Household panel surveys.Multi-level analysis to account for unobserved sharedcharacteristics of members of a same cluster.
19/11/2008gr 18/95
Mining Event HistoriesSequence Analysis in Social SciencesIssues with life course data
Issues with life course data
Incomplete sequencesCensored and truncated data:Cases falling out of observation before experiencing an event ofinterest.Sequences of varying length.
Time varying predictors.Example: When analysing time to divorce, presence of childrenis a time varying predictor.
Data collected by clustersExample: Household panel surveys.Multi-level analysis to account for unobserved sharedcharacteristics of members of a same cluster.
19/11/2008gr 18/95
Mining Event HistoriesSequence Analysis in Social SciencesIssues with life course data
Multi-level: Simple linear regression example
y = 3.2 + 0.2 x
y = 6.2 - 0.8 x
y = 15.6 - 0.8 x
y = 12.5 - 0.8 x
0
1
2
3
4
5
6
7
8
9
1 3 5 7 9 11 13 15
Education
Chi
ldre
n
19/11/2008gr 19/95
Mining Event HistoriesSequence Analysis in Social SciencesMethods for Longitudinal Data
Section content
1 Sequence Analysis in Social SciencesMotivationWhat kind of data?Issues with life course dataMethods for Longitudinal Data
19/11/2008gr 20/95
Mining Event HistoriesSequence Analysis in Social SciencesMethods for Longitudinal Data
Classical statistical approachesSurvival Approaches
Survival or Event history analysis (Blossfeld and Rohwer, 2002)Focuses on one event.Concerned with duration until event occursor with hazard of experiencing event.
Survival curves: Distribution of duration until event occurs
S(t) = p(T ≥ t) .
Hazard models: Regression like models for S(t, x) or hazardh(t) = p(T = t | T ≥ t)
h(t, x) = g(t, β0 + β1x1 + β2x2(t) + · · ·
).
19/11/2008gr 21/95
Mining Event HistoriesSequence Analysis in Social SciencesMethods for Longitudinal Data
Classical statistical approachesSurvival Approaches
Survival or Event history analysis (Blossfeld and Rohwer, 2002)Focuses on one event.Concerned with duration until event occursor with hazard of experiencing event.
Survival curves: Distribution of duration until event occurs
S(t) = p(T ≥ t) .
Hazard models: Regression like models for S(t, x) or hazardh(t) = p(T = t | T ≥ t)
h(t, x) = g(t, β0 + β1x1 + β2x2(t) + · · ·
).
19/11/2008gr 21/95
Mining Event HistoriesSequence Analysis in Social SciencesMethods for Longitudinal Data
Survival curves (Switzerland, SHP 2002 biographical survey)
Women
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 10 20 30 40 50 60 70 80
AGE (years)
Surv
ival
pro
babi
lity
Leaving home Marriage 1st Chilbirth Parents' deathLast child left Divorce Widowing19/11/2008gr 22/95
Mining Event HistoriesSequence Analysis in Social SciencesMethods for Longitudinal Data
Analysis of sequences
Frequencies of given subsequencesEssentially event sequences, e.g. (First job → Marriage).Subsequences considered as categories ⇒ Methods forcategorical data apply (Frequencies, cross tables, log-linearmodels, logistic regression, ...).
Markov chain modelsState sequences.Focuses on transition rates between states.Does the rate also depend on previous states?How many previous states are significant?
Optimal Matching (Abbott and Forrest, 1986) .State sequences.Edit distance (Levenshtein, 1966; Needleman and Wunsch,1970) between pairs of sequences.Clustering of sequences.
19/11/2008gr 23/95
Mining Event HistoriesSequence Analysis in Social SciencesMethods for Longitudinal Data
Typology of methods for life course data
IssuesQuestions duration/hazard state/event sequencingdescriptive • Survival curves: • Frequencies of given
Parametric patterns(Weibull, Gompertz, ...) • Optimal matching
and non parametric clustering, MDS(Kaplan-Meier, Nelson- • Rendering sequencesAalen) estimators. • Discovering typical
episodescausality • Hazard regression models • Markov models
(Cox, ...) • Mobility trees• Survival trees • Discriminating episodes
• Sequence HeterogeneityAnalysis (Anova)
19/11/2008gr 24/95
Mining Event HistoriesSurvival Trees
Table of content
1 Sequence Analysis in Social Sciences
2 Survival Trees
3 Characterizing, rendering and clustering sequence data
4 Mining Frequent Episodes
19/11/2008gr 25/95
Mining Event HistoriesSurvival TreesMarriage survival, SHP biographical data
Section content
2 Survival TreesMarriage survival, SHP biographical dataSurvival Tree PrincipleExampleSocial Science Issues
19/11/2008gr 26/95
Mining Event HistoriesSurvival TreesMarriage survival, SHP biographical data
SHP biographical retrospective surveyhttp://www.swisspanel.ch
SHP retrospective survey: 2001 (860) and 2002 (4700 cases).We consider only data collected in 2002.Data completed with variables from 2002 wave (language).
Characteristics of retained data for divorce(individuals who get married at least once)
men women TotalTotal 1414 1656 30701st marriage dissolution 231 308 539
16.3% 18.6% 17.6%
19/11/2008gr 27/95
Mining Event HistoriesSurvival TreesMarriage survival, SHP biographical data
SHP biographical retrospective surveyhttp://www.swisspanel.ch
SHP retrospective survey: 2001 (860) and 2002 (4700 cases).We consider only data collected in 2002.Data completed with variables from 2002 wave (language).
Characteristics of retained data for divorce(individuals who get married at least once)
men women TotalTotal 1414 1656 30701st marriage dissolution 231 308 539
16.3% 18.6% 17.6%
19/11/2008gr 27/95
Mining Event HistoriesSurvival TreesMarriage survival, SHP biographical data
Distribution by birth cohortBirth year
year
Fre
quen
cy
1910 1920 1930 1940 1950 1960
010
020
030
040
050
0
19/11/2008gr 28/95
Mining Event HistoriesSurvival TreesMarriage survival, SHP biographical data
Marriage duration until divorceSurvival curves
0 8
0.85
0.9
0.95
1
vie
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0 10 20 30 40
prob
. de
surv
Durée du mariage, Femmes
1942 et avant
1943-1952
1953 et après
0 8
0.85
0.9
0.95
1
vie
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0 10 20 30 40pr
ob. d
e su
rvDurée du mariage, Hommes
1942 et avant
1943-1952
1953 et après
0 8
0.85
0.9
0.95
1
vie
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0 10 20 30 40
prob
. de
surv
Durée du mariage, Femmes
1942 et avant
1943-1952
1953 et après
19/11/2008gr 29/95
Mining Event HistoriesSurvival TreesMarriage survival, SHP biographical data
Marriage duration until divorceHazard model
Discrete time model (logistic regression on person-year data)exp(B) gives the Odds Ratio, i.e. change in the odd h/(1− h)when covariate increased by 1 unit.
exp(B) Sig.birthyr 1.0088 0.002university 1.22 0.043child 0.73 0.000language unknwn 1.47 0.000
French 1.26 0.007German 1 refItalian 0.89 0.537
Constant 0.0000000004 0.000
19/11/2008gr 30/95
Mining Event HistoriesSurvival TreesSurvival Tree Principle
Section content
2 Survival TreesMarriage survival, SHP biographical dataSurvival Tree PrincipleExampleSocial Science Issues
19/11/2008gr 31/95
Mining Event HistoriesSurvival TreesSurvival Tree Principle
Survival trees: Principle
Target is survival curve or some other survival characteristic.Aim: Partition data set into groups thatdiffer as much as possible (max between class variability)
Example: Segal (1988) maximizes difference in KM survivalcurves by selecting split with smallest p-value of Tarone-WareChi-square statistics
TW =∑
i
wi
(di1 − E(Di )
)(w2
i var(Di ))1/2
are as homogeneous as possible (min within class variability)Example: Leblanc and Crowley (1992) maximize gain indeviance (-log-likelihood) of relative risk estimates.
19/11/2008gr 32/95
Mining Event HistoriesSurvival TreesSurvival Tree Principle
Survival trees: Principle
Target is survival curve or some other survival characteristic.Aim: Partition data set into groups thatdiffer as much as possible (max between class variability)
Example: Segal (1988) maximizes difference in KM survivalcurves by selecting split with smallest p-value of Tarone-WareChi-square statistics
TW =∑
i
wi
(di1 − E(Di )
)(w2
i var(Di ))1/2
are as homogeneous as possible (min within class variability)Example: Leblanc and Crowley (1992) maximize gain indeviance (-log-likelihood) of relative risk estimates.
19/11/2008gr 32/95
Mining Event HistoriesSurvival TreesSurvival Tree Principle
Survival trees: Principle
Target is survival curve or some other survival characteristic.Aim: Partition data set into groups thatdiffer as much as possible (max between class variability)
Example: Segal (1988) maximizes difference in KM survivalcurves by selecting split with smallest p-value of Tarone-WareChi-square statistics
TW =∑
i
wi
(di1 − E(Di )
)(w2
i var(Di ))1/2
are as homogeneous as possible (min within class variability)Example: Leblanc and Crowley (1992) maximize gain indeviance (-log-likelihood) of relative risk estimates.
19/11/2008gr 32/95
Mining Event HistoriesSurvival TreesExample
Section content
2 Survival TreesMarriage survival, SHP biographical dataSurvival Tree PrincipleExampleSocial Science Issues
19/11/2008gr 33/95
Mining Event HistoriesSurvival TreesExample
Divorce, Switzerland, Differences in KM Survival Curves I
Zoom
� � � � � � � � � � �
� � � � � � � � � � � � � �
� � � � � � � � � �
� � � � � � � � � � � �
� � � � � � � � � � � � � � �
� � � � � � � � �
� � � � � � � � � � � �
� � � � � � � �� � � � � � � � �
� � � � � � � � � �
� � � � � � � � � � � �
� � � � � � � �� � � � � � � � �
� � � � � � � � � �
� � � � � � � � � � � �
� � � � � � � �� � � � � � � � �
� � � � � � � � �
� � � � � � � � � � � �
� � � � � � � �� � � � � � �
� � � � � � � � � �
� � �
� � � � � � � � � � � �
� � � � � �� � � � � � � � � �
� � � � � � � � � � �
� � � � ! " " #$ � "
� � � � � � � � � � � �
� � � � � � �� � � � � � � � �
� � � � � � � � � �
% & � � � � � � � # � � � ' � � � # � � � �
� � � � � � � �
% & � � � � � � � � # � � ' � � � # � � � � % & � � � � � � � # � � � ' � � � # � � � �
� � � � � � � � � � � �
� � � � � � �� � � � � � � �
� � � � � � � � � �
� � � � � � � � � � �
� � � � � � � � �� � � � � � � � � �
� � � � � � � � � �
� � � � � � � � ( � ) � *� � � � � � � � � �
� � � � � � � �
% & � � � � � � � # � � � � ' � � � # � � � �
� � � � � � � � � � �
� � � � � � � � �� � � � � � � �
� � � � � � � � �
� � � � � � � � � � � �
� � � � � � �� � � � � � �
� � � � � � � � �
$ � "� �
� � � � � � � �
% & � � � � � � � # � � � ' � � � # � � �
� � � � � � � � � � � �
� � � � � � �� � � � � � � �
� � � � � � � � � �
� � � � � � � � � � � �
� � � � � � � �� � � � � � � � �
� � � � � � � � � �
$ � "� �
� � � � � � � �
% & � � � � � � � # � � � � ' � � � # � � �
+ � + �
+
+ � + + � + �19/11/2008gr 34/95
Mining Event HistoriesSurvival TreesExample
Divorce, Switzerland, Differences in KM Survival Curves II
0 10 20 30 40
0.5
0.6
0.7
0.8
0.9
1.0
Cohort <=1940 & Non French Speaking & University
Cohort <=1940 & Non French Speaking & < University
Cohort <=1940 & French Speaking
Cohort > 1940 & No Child & University
Cohort > 1940 & No Child & < University
Cohort > 1940 & Child & German or Italian Speaking
Cohort > 1940 & Child & French or Unknown Speaking
19/11/2008gr 35/95
Mining Event HistoriesSurvival TreesExample
Divorce, Switzerland, Relative risk
� � � � � � �
� � � � � � � � � � � � � � �
� � � � � �
� � � � � � � � � � � � � �
� � � �
� � � � � � � � � �
� � � � �� � � � � � � � �
� � � � � � � � � � �
� � � � � � � � �� �
� � � � � � � � � �
� � � � � � � �
� � � � � � � � � � � � � � � � � � � � �
� � � � �
� � � � � � � � � � � � � �
� � � � � � � �
� � � � � � � � � � � � � �
� � � � � � �
� � � � � � � � � � � � � �
� � � � � � �
� � � � � � � � � � � � �
� � � � � � �
� � � � � � � � � � � � � �
19/11/2008gr 36/95
Mining Event HistoriesSurvival TreesExample
Hazard model with interaction
Adding interaction effects detected with the tree approachimproves significantly the fit (sig ∆χ2 = 0.004)
exp(B) Sig.born after 1940 1.78 0.000university 1.22 0.049child 0.94 0.619language unknwn 1.50 0.000
French 1.12 0.282German 1 refItalian 0.92 0.677
b_before_40*French 1.46 0.028b_after_40*child 0.68 0.010
Constant 0.008 0.00019/11/2008gr 37/95
Mining Event HistoriesSurvival TreesSocial Science Issues
Section content
2 Survival TreesMarriage survival, SHP biographical dataSurvival Tree PrincipleExampleSocial Science Issues
19/11/2008gr 38/95
Mining Event HistoriesSurvival TreesSocial Science Issues
Issues with survival trees in social sciences
1 Dealing with time varying predictorsSegal (1992) discusses few possibilities, none being reallysatisfactory.Huang et al. (1998) propose a piecewise constant approachsuitable for discrete variables and limited number of changes.Room for development ...
2 Multi-level analysisHow can we account for multi-level effects in survival trees,and more generally in trees?Conjecture: Should be possible to include unobserved sharedeffect in deviance-based splitting criteria.
19/11/2008gr 39/95
Mining Event HistoriesSurvival TreesSocial Science Issues
Issues with survival trees in social sciences
1 Dealing with time varying predictorsSegal (1992) discusses few possibilities, none being reallysatisfactory.Huang et al. (1998) propose a piecewise constant approachsuitable for discrete variables and limited number of changes.Room for development ...
2 Multi-level analysisHow can we account for multi-level effects in survival trees,and more generally in trees?Conjecture: Should be possible to include unobserved sharedeffect in deviance-based splitting criteria.
19/11/2008gr 39/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence data
Table of content
1 Sequence Analysis in Social Sciences
2 Survival Trees
3 Characterizing, rendering and clustering sequence data
4 Mining Frequent Episodes
19/11/2008gr 40/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataLife trajectories
Section content
3 Characterizing, rendering and clustering sequence dataLife trajectoriesCharacterizing sets of sequences
EntropyLongitudinal within sequence entropies
Distances between sequences: ClusteringHeterogeneity analysis and sequence discriminationMultidimensional Scaling representation of sequences
19/11/2008gr 41/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataLife trajectories
Sequence analysis
Survival approaches not useful in a unitary (holistic)perspective of the whole life course.Sequence analysis of whole collection of life events bettersuited for such holistic approach (Billari, 2005).
Rendering sequencesColorize your life courses
Results from the analysis of the retrospective Swiss HouseholdPanel (SHP) survey.Focus on visualization of life course data.
19/11/2008gr 42/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataLife trajectories
Sequence analysis
Survival approaches not useful in a unitary (holistic)perspective of the whole life course.Sequence analysis of whole collection of life events bettersuited for such holistic approach (Billari, 2005).
Rendering sequencesColorize your life courses
Results from the analysis of the retrospective Swiss HouseholdPanel (SHP) survey.Focus on visualization of life course data.
19/11/2008gr 42/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataLife trajectories
Sequence analysis
Survival approaches not useful in a unitary (holistic)perspective of the whole life course.Sequence analysis of whole collection of life events bettersuited for such holistic approach (Billari, 2005).
Rendering sequencesColorize your life courses
Results from the analysis of the retrospective Swiss HouseholdPanel (SHP) survey.Focus on visualization of life course data.
19/11/2008gr 42/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataLife trajectories
Sequence analysis
Survival approaches not useful in a unitary (holistic)perspective of the whole life course.Sequence analysis of whole collection of life events bettersuited for such holistic approach (Billari, 2005).
Rendering sequencesColorize your life courses
Results from the analysis of the retrospective Swiss HouseholdPanel (SHP) survey.Focus on visualization of life course data.
19/11/2008gr 42/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataLife trajectories
Evolution tendencies in familial life course trajectories
Sequence analysis techniques permit to test hypotheses aboutevolution in these familial life trajectories. (Elzinga and Liefbroer,2007):
De-standardization: Some states and events of familial life areshared by decreasing proportions of the population, occur atmore dispersed ages and their duration is also more scattered.De-institutionalization: Social and temporal organization oflife courses becomes less driven by normative, legal orinstitutional rules.Differentiation: Number of distinct steps lived by individualincreases.
19/11/2008gr 43/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataLife trajectories
Evolution tendencies in familial life course trajectories
Sequence analysis techniques permit to test hypotheses aboutevolution in these familial life trajectories. (Elzinga and Liefbroer,2007):
De-standardization: Some states and events of familial life areshared by decreasing proportions of the population, occur atmore dispersed ages and their duration is also more scattered.De-institutionalization: Social and temporal organization oflife courses becomes less driven by normative, legal orinstitutional rules.Differentiation: Number of distinct steps lived by individualincreases.
19/11/2008gr 43/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataLife trajectories
Evolution tendencies in familial life course trajectories
Sequence analysis techniques permit to test hypotheses aboutevolution in these familial life trajectories. (Elzinga and Liefbroer,2007):
De-standardization: Some states and events of familial life areshared by decreasing proportions of the population, occur atmore dispersed ages and their duration is also more scattered.De-institutionalization: Social and temporal organization oflife courses becomes less driven by normative, legal orinstitutional rules.Differentiation: Number of distinct steps lived by individualincreases.
19/11/2008gr 43/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences
Section content
3 Characterizing, rendering and clustering sequence dataLife trajectoriesCharacterizing sets of sequences
EntropyLongitudinal within sequence entropies
Distances between sequences: ClusteringHeterogeneity analysis and sequence discriminationMultidimensional Scaling representation of sequences
19/11/2008gr 44/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences
Characterizing sets of sequences
Sequence of transversal measures (between entropy, ...)id t1 t2 t3 · · ·1 B B D · · ·2 A B C · · ·3 B B A · · ·
Summary of longitudinal measures (sequence entropy, ...)id t1 t2 t3 · · ·1 B B D · · ·2 A B C · · ·3 B B A · · ·
Other global characteristics: Central sequence, Sequencediversity, ...
19/11/2008gr 45/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences
Characterizing sets of sequences
Sequence of transversal measures (between entropy, ...)id t1 t2 t3 · · ·1 B B D · · ·2 A B C · · ·3 B B A · · ·
Summary of longitudinal measures (sequence entropy, ...)id t1 t2 t3 · · ·1 B B D · · ·2 A B C · · ·3 B B A · · ·
Other global characteristics: Central sequence, Sequencediversity, ...
19/11/2008gr 45/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences
Characterizing sets of sequences
Sequence of transversal measures (between entropy, ...)id t1 t2 t3 · · ·1 B B D · · ·2 A B C · · ·3 B B A · · ·
Summary of longitudinal measures (sequence entropy, ...)id t1 t2 t3 · · ·1 B B D · · ·2 A B C · · ·3 B B A · · ·
Other global characteristics: Central sequence, Sequencediversity, ...
19/11/2008gr 45/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences
Entropy
Entropy: Measure of uncertainty regarding state predictability.
pi , proportion of cases (or time points) in state i .Shannon h(p) =
∑i −pi log2(pi )
Other types of entropies: Quadratic (Gini), Daroczy, ...Two ways of using entropies.
(Transversal) entropy of the state at each time (age) point:Entropy increases with diversity of states observed at eachtime point (age).(Longitudinal) entropy of each individual sequences: Entropyincreases with diversity of states during the observed lifecourse and varies with the time spend in each state.
19/11/2008gr 46/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences
Entropy
Entropy: Measure of uncertainty regarding state predictability.
pi , proportion of cases (or time points) in state i .Shannon h(p) =
∑i −pi log2(pi )
Other types of entropies: Quadratic (Gini), Daroczy, ...Two ways of using entropies.
(Transversal) entropy of the state at each time (age) point:Entropy increases with diversity of states observed at eachtime point (age).(Longitudinal) entropy of each individual sequences: Entropyincreases with diversity of states during the observed lifecourse and varies with the time spend in each state.
19/11/2008gr 46/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences
Entropy
Entropy: Measure of uncertainty regarding state predictability.
pi , proportion of cases (or time points) in state i .Shannon h(p) =
∑i −pi log2(pi )
Other types of entropies: Quadratic (Gini), Daroczy, ...Two ways of using entropies.
(Transversal) entropy of the state at each time (age) point:Entropy increases with diversity of states observed at eachtime point (age).(Longitudinal) entropy of each individual sequences: Entropyincreases with diversity of states during the observed lifecourse and varies with the time spend in each state.
19/11/2008gr 46/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences
Illustrative data
Data from the 2002 SHP biographical surveyInterested in relationship between
Cohabitational trajectories (10 states)Professional trajectories (8 states)
We use the coding retained by Gauthier (2007)Focus on ages 20 to 45 (sequence length = 26 years)1503 cases (751 women, 752 men)
19/11/2008gr 47/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences
Transversal entropy at each time (age) point
Living Arrangement Trajectories
Age
Ent
ropy
A20 A23 A26 A29 A32 A35 A38 A41 A44
0.3
0.4
0.5
0.6
0.7
0.8
Professional Trajectories
Age
Ent
ropy
A20 A23 A26 A29 A32 A35 A38 A41 A44
0.3
0.4
0.5
0.6
0.7
0.8 1910−1940 1941−1950 1951−1957
19/11/2008gr 48/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences
Transversal entropy at each time (age) point
Men : Living Arrangement Trajectories
Age
Ent
ropy
A20 A23 A26 A29 A32 A35 A38 A41 A44
0.3
0.4
0.5
0.6
0.7
0.8
Women : Living Arrangement Trajectories
Age
Ent
ropy
A20 A23 A26 A29 A32 A35 A38 A41 A44
0.3
0.4
0.5
0.6
0.7
0.8 1910−1940 1941−1950 1951−1957
19/11/2008gr 49/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences
Transversal entropy at each time (age) point
Men: Professional Trajectories
Age
Ent
ropy
A20 A23 A26 A29 A32 A35 A38 A41 A44
0.2
0.3
0.4
0.5
0.6
0.7
0.8
Women: Professional Trajectories
Age
Ent
ropy
A20 A23 A26 A29 A32 A35 A38 A41 A44
0.2
0.3
0.4
0.5
0.6
0.7
0.8 1910−1940 1941−1950 1951−1957
19/11/2008gr 50/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences
Hypothesis about longitudinal entropies
Cohabitational and professional life trajectoriesbecome less stablemore diversified
Their entropy tends to increase for younger generations.Are increases in professional trajectories related to increases incohabitational trajectories?
19/11/2008gr 51/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences
Hypothesis about longitudinal entropies
Cohabitational and professional life trajectoriesbecome less stablemore diversified
Their entropy tends to increase for younger generations.Are increases in professional trajectories related to increases incohabitational trajectories?
19/11/2008gr 51/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences
Hypothesis about longitudinal entropies
Cohabitational and professional life trajectoriesbecome less stablemore diversified
Their entropy tends to increase for younger generations.Are increases in professional trajectories related to increases incohabitational trajectories?
19/11/2008gr 51/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences
Entropy of cohabitational trajectories
●●●●●●●●●●●●●●
●
●●
●
●●●●●●●
1910−1940 1941−1950 1951−1957
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Men: Living Arrangment Trajectories
●
1910−1940 1941−1950 1951−19570.
00.
10.
20.
30.
40.
50.
60.
7
Women: Living Arrangment Trajectories
p(F > f ) = .000∗∗∗ p(F > f ) = .073∗all 2 by 2 differences significant coh3 significantly (.02) different from coh1
19/11/2008gr 52/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences
Entropy of professional trajectories
●●
●
●
●
1910−1940 1941−1950 1951−1957
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Men: Professional Trajectories
1910−1940 1941−1950 1951−19570.
00.
10.
20.
30.
40.
50.
60.
7
Women: Professional Trajectories
p(F > f ) = .002∗∗∗ p(F > f ) = .001∗∗∗coh3 not significantly different from coh2
19/11/2008gr 53/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences
Correlation between cohabitational and professional entropies
Overall Men Women1910-1940 0.08 ∗ 0.11 ∗ 0.19 ∗∗∗1941-1950 0.12 ∗∗ 0.14 ∗∗ 0.30 ∗∗∗1951-1957 0.15 ∗∗ 0.25 ∗∗∗ 0.31 ∗∗∗
19/11/2008gr 54/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering
Section content
3 Characterizing, rendering and clustering sequence dataLife trajectoriesCharacterizing sets of sequences
EntropyLongitudinal within sequence entropies
Distances between sequences: ClusteringHeterogeneity analysis and sequence discriminationMultidimensional Scaling representation of sequences
19/11/2008gr 55/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering
Clustering, Multidimensional scaling and more
Once you are able to compute 2 by 2 distances betweensequences you can among others:Cluster sequencesAnalyse the trajectory heterogeneity (Generalized ANOVA)Make scatter plot representation of sets of sequences usingmultidimensional scaling.
19/11/2008gr 56/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering
Clustering, Multidimensional scaling and more
Once you are able to compute 2 by 2 distances betweensequences you can among others:Cluster sequencesAnalyse the trajectory heterogeneity (Generalized ANOVA)Make scatter plot representation of sets of sequences usingmultidimensional scaling.
19/11/2008gr 56/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering
Clustering, Multidimensional scaling and more
Once you are able to compute 2 by 2 distances betweensequences you can among others:Cluster sequencesAnalyse the trajectory heterogeneity (Generalized ANOVA)Make scatter plot representation of sets of sequences usingmultidimensional scaling.
19/11/2008gr 56/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering
Clustering, Multidimensional scaling and more
Once you are able to compute 2 by 2 distances betweensequences you can among others:Cluster sequencesAnalyse the trajectory heterogeneity (Generalized ANOVA)Make scatter plot representation of sets of sequences usingmultidimensional scaling.
19/11/2008gr 56/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering
Distances between sequences
Edit distance (known as Optimal matching in Social sciences)(Levenshtein, 1966; Needleman and Wunsch, 1970; Abbott andForrest, 1986)
d(x , y) Total cost of insert, deletion and substitution changesrequired to transform sequence x into y .Different solutions depending on indel and substitution costs.
Other metrics proposed by (Elzinga, 2008)LCP: Longest common prefix (also longest common postfix)LCS: Longest common subsequence(same as OM with indel cost = 1, and substitution cost = 2).NMS: Number of matching subsequences...
Elzinga (2008) proposes a nice formalization of these metrics.
19/11/2008gr 57/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering
Distances between sequences
Edit distance (known as Optimal matching in Social sciences)(Levenshtein, 1966; Needleman and Wunsch, 1970; Abbott andForrest, 1986)
d(x , y) Total cost of insert, deletion and substitution changesrequired to transform sequence x into y .Different solutions depending on indel and substitution costs.
Other metrics proposed by (Elzinga, 2008)LCP: Longest common prefix (also longest common postfix)LCS: Longest common subsequence(same as OM with indel cost = 1, and substitution cost = 2).NMS: Number of matching subsequences...
Elzinga (2008) proposes a nice formalization of these metrics.
19/11/2008gr 57/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering
Clustering with OM distances: Dendrograms[1
][1
92]
[290
][3
19]
[360
][1
197]
[133
3][1
412]
[103
6][5
04]
[532
][2
04]
[388
][1
472]
[128
0][3
43]
[40]
[633
][4
2][5
20]
[632
][3
57]
[833
][4
1][6
42]
[944
][1
046]
[112
3][6
73]
[257
][6
85]
[795
][1
068]
[107
4][3
18]
[508
][1
069]
[138
][1
161]
[306
][1
220]
[352
][5
43]
[517
][1
188]
[162
][8
19]
[135
9][6
37]
[409
][9
41]
[121
5][7
06]
[101
1][1
375]
[221
][9
86]
[255
][4
37]
[940
][5
76]
[126
9][4
55]
[139
1][7
60]
[136
4][5
52]
[227
][7
79]
[281
][3
27]
[144
5][3
26]
[270
][1
315]
[465
][1
494]
[743
][1
493]
[120
3][1
355]
[114
][1
022]
[102
3][1
23]
[588
][2
58]
[124
8][4
28]
[210
][1
387]
[415
][7
52]
[712
][9
97]
[292
][1
331]
[524
][5
05]
[957
][3
17]
[439
][9
29]
[955
][1
438]
[528
][1
039]
[122
2][2
73]
[492
][5
65]
[658
][5
42]
[560
][8
44]
[135
6][6
34]
[933
][1
214]
[104
0][1
071]
[145
0][3
24]
[132
6][4
12]
[109
3][7
92]
[15]
[175
][2
24]
[783
][1
304]
[561
][1
075]
[122
][1
283]
[924
][6
40]
[641
][7
26]
[101
2][1
303]
[143
3][1
063]
[998
][4
38]
[463
][9
70]
[105
2][1
330]
[143
6][1
446]
[137
1][3
1][6
7][4
88]
[782
][1
089]
[110
7][1
274]
[80]
[163
][3
40]
[556
][5
74]
[110
8][1
297]
[143
2][5
53]
[402
][7
89]
[519
][9
87]
[103
7][1
279]
[222
][3
07]
[499
][1
201]
[444
][8
18]
[456
][6
60]
[956
][9
64]
[995
][1
374]
[26]
[157
][8
11]
[104
7][5
5][7
9][6
6][8
13]
[468
][1
148]
[116
5][1
357]
[97]
[692
][1
135]
[101
7][1
485]
[269
][1
219]
[113
6][1
133]
[464
][5
30]
[127
1][4
71]
[765
][1
281]
[149
][1
81]
[784
][8
07]
[981
][1
358]
[639
][1
175]
[111
8][1
117]
[747
][7
64]
[133
2][2
40]
[500
][5
39]
[503
][1
256]
[123
4][1
496]
[132
7][1
2][1
8][4
5][4
6][5
6][6
3][9
6][1
32]
[144
][1
59]
[169
][1
80]
[189
][2
15]
[246
][2
63]
[264
][2
95]
[297
][3
58]
[429
][4
41]
[442
][4
57]
[467
][5
06]
[558
][5
62]
[568
][5
69]
[585
][5
95]
[596
][5
98]
[664
][7
05]
[717
][7
46]
[761
][7
72]
[775
][7
76]
[777
][8
55]
[863
][8
67]
[879
][8
97]
[950
][9
96]
[100
5][1
067]
[108
3][1
084]
[110
0][1
101]
[111
3][1
114]
[114
4][1
147]
[125
8][1
260]
[134
4][1
345]
[135
4][1
376]
[149
8][1
499]
[72]
[414
][1
353]
[59]
[938
][4
93]
[945
][5
59]
[850
][2
96]
[144
3][1
019]
[111
6][6
2][5
84]
[952
][2
17]
[251
][9
92]
[583
][6
50]
[675
][1
60]
[494
][1
079]
[250
][6
71]
[23]
[119
1][1
64]
[350
][8
37]
[140
5][1
406]
[141
][7
98]
[566
][9
37]
[103
8][1
259]
[165
][7
90]
[16]
[623
][1
225]
[425
][1
321]
[20]
[22]
[814
][1
264]
[225
][2
37]
[110
9][1
18]
[619
][1
018]
[118
5][9
31]
[216
][3
92]
[146
5][9
34]
[141
9][1
15]
[126
8][1
143]
[122
1][4
96]
[511
][8
62]
[184
][1
142]
[121
0][4
35]
[346
][6
69]
[138
0][5
18]
[115
4][6
36]
[114
9][6
97]
[899
][1
119]
[117
][3
05]
[405
][4
66]
[932
][9
36]
[228
][8
75]
[781
][1
45]
[202
][1
273]
[145
7][1
70]
[147
8][4
79]
[688
][1
322]
[155
][1
93]
[943
][1
207]
[226
][9
06]
[119
6][1
9][7
78]
[127
6][2
35]
[313
][6
96]
[243
][6
74]
[763
][8
96]
[921
][5
34]
[918
][2
38]
[834
][1
488]
[510
][8
68]
[128
4][1
313]
[145
4][8
02]
[766
][1
173]
[144
0][2
36]
[342
][7
33]
[799
][1
470]
[801
][2
84]
[331
][3
02]
[976
][1
213]
[720
][7
70]
[137
][5
36]
[117
9][6
01]
[100
9][1
307]
[294
][9
02]
[137
3][2
06]
[332
][1
430]
[578
][1
073]
[917
][1
302]
[131
1][3
0][1
20]
[690
][8
40]
[418
][4
58]
[128
7][1
177]
[232
][2
85]
[145
5][3
11]
[132
0][4
51]
[809
][6
83]
[124
7][1
79]
[947
][8
15]
[126
3][8
80]
[905
][1
21]
[140
3][2
86]
[140
8][7
08]
[554
][1
34]
[434
][6
57]
[762
][2
76]
[110
4][3
83]
[2]
[651
][6
52]
[102
][9
12]
[201
][2
75]
[391
][1
395]
[133
6][3
90]
[134
0][1
458]
[39]
[627
][1
155]
[820
][1
453]
[50]
[110
3][1
88]
[507
][1
98]
[86]
[711
][1
19]
[128
5][1
349]
[291
][8
00]
[448
][8
49]
[604
][1
126]
[148
9][6
0][4
00]
[404
][5
81]
[913
][4
13]
[871
][1
111]
[771
][8
66]
[119
0][5
79]
[139
7][1
398]
[38]
[985
][7
04]
[124
6][3
61]
[698
][7
10]
[124
5][1
97]
[616
][1
473]
[127
0][9
61]
[836
][8
9][2
47]
[804
][5
45]
[106
][1
267]
[105
][1
024]
[659
][1
085]
[207
][1
346]
[135
2][3
20]
[107
0][2
34]
[373
][3
74]
[573
][1
474]
[106
1][1
449]
[120
2][1
312]
[239
][6
26]
[299
][9
60]
[105
6][1
125]
[14]
[680
][8
89]
[885
][3
62]
[821
][5
3][3
96]
[575
][4
20]
[136
5][6
67]
[54]
[888
][5
72]
[145
6][1
54]
[729
][2
18]
[125
5][1
080]
[118
0][1
206]
[140
][4
19]
[486
][8
60]
[916
][7
3][1
102]
[701
][8
91]
[214
][1
416]
[848
][5
44]
[107
2][3
28]
[138
8][7
93]
[980
][1
360]
[602
][6
72]
[823
][5
93]
[756
][1
235]
[132
9][1
94]
[622
][5
16]
[681
][3
63]
[116
4][5
14]
[702
][7
85]
[138
1][8
12]
[141
3][1
157]
[407
][6
03]
[978
][9
94]
[835
][1
242]
[147
5][4
69]
[113
9][1
277]
[883
][6
87]
[144
2][5
1][1
01]
[85]
[287
][1
090]
[443
][9
04]
[325
][6
78]
[753
][1
409]
[146
8][1
492]
[112
][1
041]
[230
][1
350]
[769
][1
305]
[547
][1
097]
[132
3][9
90]
[108
8][1
128]
[101
3][1
059]
[136
8][1
029]
[131
][9
22]
[100
6][3
67]
[126
1][1
56]
[476
][2
78]
[886
][1
50]
[662
][3
66]
[314
][1
160]
[724
][4
78]
[830
][8
53]
[48]
[118
3][1
193]
[725
][1
288]
[293
][1
112]
[125
2][5
57]
[262
][1
053]
[120
8][5
02]
[768
][8
10]
[124
9][6
1][8
32]
[143
][1
168]
[546
][3
36]
[703
][2
41]
[427
][2
67]
[105
5][1
447]
[100
3][1
167]
[119
5][1
39]
[116
9][9
93]
[109
1][8
39]
[620
][6
35]
[213
][1
158]
[977
][1
134]
[107
6][1
239]
[136
1][1
53]
[629
][1
174]
[828
][8
87]
[100
4][1
166]
[454
][1
393]
[890
][9
35]
[132
8][2
77]
[645
][1
186]
[104
5][1
317]
[146
1][5
09]
[907
][9
08]
[563
][1
057]
[666
][1
184] [3]
[323
][6
53]
[654
][1
266]
[126
][1
394]
[354
][2
83]
[911
][1
015]
[134
1][5
26]
[138
5][3
6][1
29]
[523
][1
78]
[110
6][2
56]
[242
][8
45]
[846
][1
13]
[203
][7
74]
[322
][1
087]
[309
][1
316]
[694
][1
194]
[127
8][6
93]
[126
5][9
19]
[803
][2
74]
[738
][1
205]
[339
][5
40]
[894
][3
71]
[125
0] [7]
[424
][1
82]
[308
][1
490]
[44]
[282
][3
00]
[148
6][6
08]
[100
7][1
189]
[142
6][3
64]
[140
0][7
94]
[124
4][9
82]
[5]
[121
1][4
89]
[723
][1
044]
[112
9][1
427]
[329
][4
03]
[132
4][2
89]
[399
][4
30]
[408
][7
35]
[927
][8
70]
[105
4][1
121]
[127
2][1
290]
[35]
[580
][5
87]
[142
0][2
98]
[303
][3
33]
[341
][1
152]
[453
][1
423]
[621
][1
081]
[379
][7
00]
[452
][4
90]
[431
][5
70]
[676
][7
07]
[21]
[49]
[148
7][1
159]
[137
0][6
5][1
00]
[88]
[791
][4
83]
[668
][7
18]
[594
][8
54]
[825
][1
343]
[529
][8
38]
[104
8][1
229]
[43]
[744
][4
26]
[369
][1
131]
[436
][1
431]
[123
3][1
096]
[87]
[773
][8
72]
[137
8][1
3][5
82]
[605
][1
30]
[416
][2
48]
[749
][9
53]
[954
][2
11]
[113
8][6
86]
[119
2][3
4][4
47]
[103
4][2
52]
[882
][1
87]
[138
6][6
10]
[527
][8
95]
[147
7][2
33]
[116
2][1
163]
[930
][1
231]
[142
2][3
21]
[140
2][4
84]
[577
][1
362]
[37]
[498
][1
300]
[52]
[312
][1
026]
[112
7][1
292]
[93]
[107
][7
48]
[841
][1
301]
[541
][1
294]
[133
8][8
4][1
141]
[910
][1
286]
[140
1][1
47]
[345
][9
68]
[249
][1
031]
[136
7][1
74]
[126
2][6
25]
[691
][7
54]
[145
2][3
44]
[102
1][5
67]
[847
][7
13]
[118
7] [8]
[555
][1
212]
[355
][1
241]
[69]
[571
][3
3][1
467]
[714
][1
48]
[245
][2
72]
[822
][8
65]
[121
7][1
464]
[387
][6
18]
[141
0][2
08]
[900
][1
91]
[624
][6
48]
[117
8][3
16]
[334
][1
216]
[125
4][5
91]
[989
][1
0][9
4][9
5][3
56]
[377
][4
60]
[551
][5
92]
[649
][6
77]
[928
][9
42]
[109
8][1
124]
[121
8][1
407]
[142
8][9
46]
[70]
[115
6][5
99]
[140
4][1
83]
[109
2][7
80]
[521
][8
06]
[817
][9
65]
[105
1][1
237]
[138
3][7
1][1
172]
[338
][1
451]
[423
][1
425]
[135
][9
01]
[151
][1
060]
[533
][7
88]
[372
][1
335]
[401
][6
47]
[564
][1
153]
[874
][1
318]
[138
2][9
1][1
314]
[158
][1
309]
[150
1][2
71]
[966
][2
59]
[148
1][1
001]
[398
][4
46]
[732
][8
58]
[983
][1
36]
[537
][6
09]
[351
][3
86]
[432
][6
89]
[750
][6
06]
[612
][6
15]
[728
][7
55]
[103
3][1
243]
[145
9][7
36]
[903
][1
014]
[113
7][4
7][1
04]
[172
][3
76]
[394
][5
89]
[878
][1
035]
[123
8][1
348]
[142
9][1
480]
[149
7][4
77]
[116
][6
65]
[859
][1
150]
[138
4][1
424]
[330
][7
37]
[105
8][4
17]
[550
][1
463]
[607
][8
3][3
35]
[133
][3
75]
[487
][1
25]
[661
][1
293]
[111
][8
29]
[742
][9
79]
[108
2][1
130] [4]
[385
][5
38]
[684
][7
40]
[876
][1
414]
[141
5][1
46]
[185
][9
23]
[974
][3
10]
[353
][7
30]
[613
][6
14]
[124
][2
65]
[406
][4
62]
[915
][9
91]
[127
5][1
500]
[186
][2
79]
[347
][3
93]
[459
][6
70]
[864
][8
84]
[898
][9
09]
[962
][9
63]
[103
2][1
049]
[110
5][1
176]
[125
7][1
295]
[597
][1
008]
[17]
[119
9][6
46]
[106
5][1
306]
[133
4][2
4][1
351]
[114
0][1
77]
[861
][1
441]
[27]
[209
][4
50]
[90]
[231
][7
22]
[739
][9
58]
[107
7][3
70]
[881
][1
227]
[122
8][1
363]
[365
][1
434]
[411
][1
122]
[395
][9
73]
[797
][1
392]
[474
][7
67]
[481
][7
31]
[869
][1
151]
[25]
[472
][8
51]
[972
][1
016]
[137
2][1
90]
[513
][1
095]
[148
4][3
37]
[525
][9
59]
[118
1][1
310]
[92]
[102
0][1
337]
[137
9][5
12]
[104
2][1
094]
[827
][1
389]
[969
][1
200]
[497
][8
26]
[106
2][1
435]
[147
6][6
8][9
8][4
10]
[656
][7
09]
[719
][1
099]
[129
8][1
469]
[288
][3
01]
[758
][4
70]
[515
][4
61]
[824
][5
86]
[123
2][8
31]
[139
9][7
8][8
57]
[939
][1
28]
[843
][1
396]
[786
][1
291]
[176
][4
82]
[535
][1
96]
[110
][7
21]
[926
][9
71]
[107
8][3
04]
[449
][8
05]
[102
5][1
064]
[111
5] [6]
[57]
[127
][1
42]
[200
][2
53]
[260
][2
68]
[397
][6
31]
[638
][6
43]
[644
][8
42]
[892
][9
49]
[102
8][1
030]
[112
0][1
145]
[114
6][1
198]
[120
4][1
282]
[150
3][7
5][8
2][1
08]
[440
][4
91]
[751
][7
96]
[134
2][1
71]
[548
][1
027]
[106
6][3
59]
[873
][1
209]
[378
][1
002]
[58]
[161
][1
66]
[168
][1
73]
[195
][2
23]
[261
][4
22]
[475
][4
80]
[590
][6
00]
[617
][6
28]
[699
][7
15]
[787
][8
56]
[877
][9
67]
[984
][1
000]
[105
0][1
224]
[130
8][1
319]
[141
7][1
482]
[148
3][1
491]
[150
2][2
9][1
67]
[199
][3
68]
[389
][4
33]
[445
][4
85]
[611
][6
30]
[734
][1
086]
[113
2][1
170]
[132
5][1
377]
[144
8][1
110]
[143
7][1
296]
[109
][2
20]
[229
][2
80]
[384
][4
21]
[655
][9
20]
[975
][2
19]
[380
][1
223]
[147
1][7
4][7
6][7
7][2
12]
[266
][5
49]
[759
][9
48]
[104
3][1
289]
[143
9] [9]
[473
][8
08]
[988
][1
182]
[123
0][1
299]
[64]
[315
][1
52]
[816
][1
444]
[381
][4
95]
[501
][7
45]
[852
][9
51]
[999
][1
010]
[125
3][1
366]
[142
1][1
466]
[757
][9
9][6
63]
[205
][7
41]
[11]
[103
][2
44]
[348
][6
79]
[682
][8
93]
[925
][1
226]
[123
6][1
251]
[133
9][1
347]
[139
0][1
479]
[149
5][2
8][3
2][2
54]
[522
][5
31]
[716
][9
14]
[117
1][1
240]
[136
9][1
411]
[141
8][1
460]
[146
2][8
1][3
49]
[695
][3
82]
[727
]
050
0010
000
1500
020
000
Cohabitational trajectories, Ward method
(OM Distances, Indel=1, Subst. Cost based on Trans. Rate)Individual trajectories
Hei
ght
1 2 5 6 7 13 24 29 30 34 36 37 38 42 43 46 50 53 62 64 68 69 76 84 86 92 95 97 101
104
112
116
117
118
120
124
129
137
139
149
156
158
163
166
168
172
173
177
181
188
190
197
202
206
210
214
215
217
220
224
228
229
230
232
235
236
246
249
254
257
258
264
269
271
272
275
276
278
282
287
289
294
296
297
302
307
308
314
315
318
322
325
330
332
333
337
343
347
348
351
352
353
360
362
365
367
369
372
375
376
378
384
392
393
399
403
412
415
419
421
426
428
434
438
441
451
453
457
461
463
464
466
468
477
478
483
485
489
490
492
494
495
499
505
506
507
514
518
521
525
526
528
532
533
538
540
543
549
550
552
554
558
561
562
566
568
569
570
576
578
579
582
586
588
592
594
595
596
598
599
604
606
613
616
626
629
631
632
633
638
640
648
653
657
661
662
664
669
671
676
679
684
685
689
691
693
700
705
708
710
712
717
720
733
734
736
737
738
740
747
748
752
753
754
756
758
761
769
794
798
799
800
801
802
803
805
808
816
820
827
828
835
840
850
853
856
858
866
868
870
871
875
877
881
884
888
890
892
894
899
902
904
909
913
914
917
924
925
929
937
938
940
942
943
945
948
949
952
958
960
964
966
969
970
971
972
973
974
976
978
983
984
987
990
991
995
996
1000
1004
1006
1011
1017
1018
1020
1023
1024
1025
1027
1032
1033
1034
1038
1040
1043
1051
1057
1059
1066
1070
1071
1073
1075
1077
1087
1089
1090
1094
1096
1112
1113
1114
1115
1120
1121
1123
1128
1130
1152
1157
1162
1166
1169
1171
1175
1176
1183
1184
1185
1187
1188
1189
1191
1195
1197
1199
1201
1204
1208
1211
1213
1215
1219
1222
1228
1238
1243
1247
1253
1258
1261
1262
1263
1265
1267
1270
1274
1275
1277
1279
1282
1290
1297
1302
1303
1307
1309
1313
1318
1320
1324
1325
1329
1330
1332
1333
1339
1341
1343
1348
1358
1360
1363
1365
1366
1367
1371
1374
1376
1377
1383
1387
1392
1396
1398
1404
1409
1410
1415
1417
1418
1419
1420
1422
1425
1427
1428
1431
1432
1438
1441
1444
1445
1447
1448
1449
1450
1456
1463
1464
1470
1473
1476
1479
1489
1490
1494
1495
1498
1499
1501
1408 12 15 89 98 114
150
174
192
260
281
355
391
410
465
471
475
523
531
539
567
600
617
619
635
636
646
651
658
659
678
713
765
789
807
809
818
842
847
886
906
955
1007
1081
1099
1104
1110
1135
1182
1236
1250
1288
1295
1345
1350
1354
1382
1388
1389
1401
1403
1405
1413
1430
1439
1442
1471
1484 87 30
512
0719
610
6911
94 498
1047 64
912
317
119
920
985
525
084
332
065
610
09 922
1052
1165 3
110
1911
01 143
1393 91
135
427
453
447
260
270
785
112
5612
64 182
216
238
266
313
383
406
470
724
766
773
860
915
919
956
1002
1030
1055
1056
1063
1144
1147
1198
1210
1272
1314
1322
1327
1446
1452 99
2 48 331
481
6114
35 179
993
1352 13
444
241
790
710
1513
53 270
1496 79
191
898
1 463
750
491
021
212
35 373
444
723
625
1370 45
973
569
027 41 23
425
127
943
956
367
268
168
373
978
210
2910
9211
2511
4311
9612
0212
3213
0013
4914
86 40
502
82 146
285
370
519
557
781
811
830
837
1045
1048
1091
1203
1223
1229
1242
1455 9
122 20
549
654
559
371
574
175
182
586
111
7011
7312
3412
4412
76 138
1064 72
123 67 75 34
535
936
144
559
060
871
974
376
786
487
310
6111
3113
0413
3413
7314
81 47 385
1337
730
824
335
1159
1381 630
810
1072 91
695
4 10 336
155
364
982
170
786
931
327
509
19 610
1039
1306 93
068
057
413
1114
8513
75 3510
37 255
624
17 20 49 55 110
221
277
310
407
517
839
876
1323
1368 5
792
183
614
16 59 901
1126 43
079
099
7 7857
210
36 7919
484
810
010
08 838
780
119
620
660
262
328
947
1200
1257 62
226
711
90 857 74
655
1317
1468
164
729
905
564
1326
1328
1174
1453 3
772
418
1231 95
013
1213
3814
9110
7610
5311
3414
143
114
54 284
1108
1158 28
679
612
17 280
565
1139
1378 20
445
678
512
9810
41 241
686
529
1399 560
703
1237 26 169
544
1119
985
1466
1168 60 244
1097 24
563 67
311 56 13
114
422
322
730
031
734
935
736
639
740
141
342
342
943
344
945
448
648
851
551
653
654
255
357
357
564
164
466
367
470
976
077
778
481
382
283
492
693
496
399
911
0011
0511
4511
5411
6012
0912
1412
4912
6912
7312
8713
0514
0714
5914
7214
8314
9715
02 72 159
273
524
696
1078
1083
1412 11
148
010
35 559
1179 84
512
614
577
530
180
474
4 88 201
440
776
252
701
1137
157
731
1013
1284 53
098
821
376
387
998
651 13
238
069
277
812
59 295
611
1423 79
789
710
5810
7911
77 189
381
424
455
479
556
121
771
208
727
770
1281 41
410
058 81 99 482
218
779
416
826
953
812
832
1268
154
994
831
1192
420
891
535
687
476
634
883
793
547
1357
764
1088
1474 69
769
851
158
915
311
33 312
1141
334
603
338
1218
1347 37
773 42
714
214
75 107
462
903
371
702
1085
1239 50
057
710
7411
63 21
1167
1193 82
355
511
4913
1514
88 4512
52 239
342
382
762
833
510
8516
528
897
736
377
411
07 374 94
1461 81
911
4810
4614
36 7710
6034
051
311
0213
56 844
1103 89
616
754
682
158
598
910
2211
1113
1613
31 814
1434 19
850
387
288
713
3633
963
914
6511
1811
51 726
667
1117 90
013
7210
5413
6414
92 32 247
583
1361 72
512
21 54
222
1385 66
624
087
434
467
784
112
46 311
1240
324
704
319
1255
1248
458
512
409
927
1180
1245
914
013
4611
81 1639
510
26 309
405
746
957
1310
103
647
1344 85
254
110
65 732
788
1062 44 368
920
422
1283
1308 65
213
91 125
1362
1095
127
147
1129 86
226
878
740
825
693
649
713
55 668
80 469
1395
178
1294
1164 29
912
830
654
893
526
394
446
714
51 6579
588
272
869
913
42 152
303
1116
584
1429 10
218
387
810
01 316
527
675
829
979
1254 39 52 66 113
219
446
493
597
654
749
854
908
959
1028
1289 58 28
310
0310
1411
0611
3212
2012
7113
94 321
1042
1296 23
713
5910
4913
5114
8710
912
216
232
945
249
161
574
510
1014
7715
03 358
975
1044 59
110
1619
575
593
910
6813
1914
58 341
1227
1402
1421 75
038
962
816
196
835
091
274
275
718
584
696
134
645
0 9668
811
5513
35 105
980
207
1379
1278
1340 62
724
211
40 759
1127 89
829
111
2211
524
843
584
986
993
341
132
362
320
326
532
692
312
0514
93 443
614
694
1457
1500 19
310
50 711
642
2513
69 893
1280 33 148
191
394
1156
1380 96
512
41 93 390
447
618
650
932
1299
1462 10
815
126
161
211
53 356
551
621
1233
387
716
941
474
670
28 487
259
1478
1293
133
298
386
520
537
682
951
1012
1084
1251
1460 18
710
2113
0113
014
43 211
184
253
398
501
643
967
1146
1216
1482 39
622
629
011
5014
2410
8611
2413
90 425
437
460
865
867
885
928
1172
1321
1480 14 571
1260 99
810
31 70 581
817
508
1206 16
012
24 473
580
645
176
718
889
7110
67 607
783
1142 66
571
429
313
84 863
400
946
1225 18 379
605
1292
1406 89
511
3813
86 83
292
587
1098
1433 20
076
810
8211
7814
6714
6922
514
00 484
722
243
1426
1437 90 706
1285 88
030
410
612
86 609
1093
1411
1161 79
213
514
40 186
1186
231
601
1230
1136 136
436
815
1109
1226
175
522
388
962
180
402
859
1212
1414 23
369
580
613
9710
80 404
1291 43
244
812
66
050
0010
000
1500
020
000
Professional trajectories, Ward method
(OM Distances, Indel=1, Subst. Cost based on Trans. Rate)Individual trajectories
Hei
ght
LA Prof19/11/2008gr 58/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering
LA, State distribution by age, within cluster
A20 A23 A26 A29 A32 A35 A38 A41 A44
Type 1 : Conjugal Trajectories (16 %)
Fre
q. (
n=23
5)
0.0
0.2
0.4
0.6
0.8
1.0
A20 A23 A26 A29 A32 A35 A38 A41 A44
Type 2 : Parental Trajectories, Slow Transition (19 %)
Fre
q. (
n=28
5)
0.0
0.2
0.4
0.6
0.8
1.0
A20 A23 A26 A29 A32 A35 A38 A41 A44
Type 3: Parental Trajectories, Fast Transition (48 %)
Fre
q. (
n=71
4)
0.0
0.2
0.4
0.6
0.8
1.0
A20 A23 A26 A29 A32 A35 A38 A41 A44
Type 4 : Nestalgic Trajectories (7 %)
Fre
q. (
n=11
0)
0.0
0.2
0.4
0.6
0.8
1.0
A20 A23 A26 A29 A32 A35 A38 A41 A44
Type 5 : Solo or Reconstituted Family Trajectories (11 %)
Fre
q. (
n=15
9)
0.0
0.2
0.4
0.6
0.8
1.0
Biological father and motherOne biological parentOne biological parent with her/his partnerAloneWith partnerPartner and biological childPartner and non biological childBiological child and no partnerFriendsOther
19/11/2008gr 59/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering
LA, Most frequent sequences by cluster
3.4%
3.4%
3%
2.5%
2.5%
2.1%
2.1%
2.1%
2.1%
1.7%
Type 1 : Conjugal Trajectories (16 %)
Fre
q. (
n=23
5)
A20 A23 A26 A29 A32 A35 A38 A41 A44
1.1%
1.1%
1.1%
1.1%
0.7%
0.7%
0.7%
0.7%
0.7%
0.7%
Type 2 : Parental Trajectories, Slow Transition (19 %)
Fre
q. (
n=28
5)
A20 A23 A26 A29 A32 A35 A38 A41 A44
4.5%
3.5%
2.5%
2.4%
2.4%
2.2%
2%
1.8%
1.7%
1.5%
Type 3: Parental Trajectories, Fast Transition (48 %)
Fre
q. (
n=71
4)
A20 A23 A26 A29 A32 A35 A38 A41 A44
61.8%
1.8%1.8%0.9%0.9%0.9%0.9%0.9%0.9%0.9%
Type 4 : Nestalgic Trajectories (7 %)
Fre
q. (
n=11
0)
A20 A23 A26 A29 A32 A35 A38 A41 A44
3.1%
1.9%
1.9%
1.9%
1.9%
1.3%
1.3%
1.3%
1.3%
1.3%
Type 5 : Solo or Reconstituted Family Trajectories (11 %)
Fre
q. (
n=15
9)
A20 A23 A26 A29 A32 A35 A38 A41 A44
Biological father and motherOne biological parentOne biological parent with her/his partnerAloneWith partnerPartner and biological childPartner and non biological childBiological child and no partnerFriendsOther
19/11/2008gr 60/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering
LA, Sequence diversity within cluster
19/11/2008gr 61/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering
LA, Birth year distribution by clusterType 1 : Conjugal Trajectories (16 %)
Birth year
Den
sity
1910 1920 1930 1940 1950 1960
0.00
0.01
0.02
0.03
0.04
0.05
0.06
Type 2 : Parental Trajectories, Slow Transition (19 %)
Birth year
Den
sity
1910 1920 1930 1940 1950 1960
0.00
0.01
0.02
0.03
0.04
0.05
0.06
Type 3: Parental Trajectories, Fast Transition (48 %)
Birth year
Den
sity
1910 1920 1930 1940 1950 1960
0.00
0.01
0.02
0.03
0.04
0.05
0.06
Type 4 : Nestalgic Trajectories (7 %)
Birth year
Den
sity
1910 1920 1930 1940 1950 1960
0.00
0.01
0.02
0.03
0.04
0.05
0.06
Type 5 : Solo or Reconstituted Family Trajectories (11 %)
Birth year
Den
sity
1910 1920 1930 1940 1950 1960
0.00
0.01
0.02
0.03
0.04
0.05
0.06
Overall
Birth year
Den
sity
1910 1920 1930 1940 1950 1960
0.00
0.01
0.02
0.03
0.04
0.05
0.06
19/11/2008gr 62/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering
Prof, State distribution by age, within cluster
A20 A23 A26 A29 A32 A35 A38 A41 A44
Type 1 : Full Time Trajectoires (53 %)
Fre
q. (
n=79
5)
0.0
0.2
0.4
0.6
0.8
1.0
A20 A23 A26 A29 A32 A35 A38 A41 A44
Type 2 : Mixed Part Time − Home Trajectories (13 %)
Fre
q. (
n=15
5)
0.0
0.2
0.4
0.6
0.8
1.0
A20 A23 A26 A29 A32 A35 A38 A41 A44
Type 3 : At Home Trajectories (16 %)
Fre
q. (
n=27
7)
0.0
0.2
0.4
0.6
0.8
1.0
A20 A23 A26 A29 A32 A35 A38 A41 A44
Type 4 : Part Time Trajectories (7 %)
Fre
q. (
n=10
1)
0.0
0.2
0.4
0.6
0.8
1.0
A20 A23 A26 A29 A32 A35 A38 A41 A44
Type 5 : Missing Data (11 %)
Fre
q. (
n=17
5)
0.0
0.2
0.4
0.6
0.8
1.0
MissingFull timePart timeNegative breakPositive breakAt homeRetiredEducation
19/11/2008gr 63/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering
Prof, Most frequent sequences by cluster
56.6%
8.4%
3.8%
2.8%2.4%2.3%1.9%1.8%0.8%0.6%
Type 1 : Full Time Trajectoires (53 %)
Fre
q. (
n=79
5)
A20 A23 A26 A29 A32 A35 A38 A41 A44
2.6%
2.6%
2.6%
1.9%
1.9%
1.9%
1.9%
1.9%
1.3%
1.3%
Type 2 : Mixed Part Time − Home Trajectories (13 %)
Fre
q. (
n=15
5)
A20 A23 A26 A29 A32 A35 A38 A41 A44
5.4%
4%
4%
3.2%
3.2%
3.2%
2.9%
2.2%
2.2%
2.2%
Type 3 : At Home Trajectories (16 %)
Fre
q. (
n=27
7)
A20 A23 A26 A29 A32 A35 A38 A41 A44
5%
5%
5%
4%
2%
2%
2%
2%
2%
2%
Type 4 : Part Time Trajectories (7 %)
Fre
q. (
n=10
1)
A20 A23 A26 A29 A32 A35 A38 A41 A44
34.9%
4.6%
3.4%
3.4%
2.9%
2.3%
2.3%1.7%1.7%1.7%
Type 5 : Missing Data (11 %)
Fre
q. (
n=17
5)
A20 A23 A26 A29 A32 A35 A38 A41 A44
MissingFull timePart timeNegative breakPositive breakAt homeRetiredEducation
19/11/2008gr 64/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering
Prof, Sequence diversity within cluster
19/11/2008gr 65/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering
Prof, Birth year distribution by clusterType 1 : Full Time Trajectoires (53 %)
Birth year
Den
sity
1910 1920 1930 1940 1950 1960
0.00
0.01
0.02
0.03
0.04
0.05
0.06
Type 2 : Mixed Part Time − Home Trajectories (13 %)
Birth year
Den
sity
1910 1920 1930 1940 1950 1960
0.00
0.01
0.02
0.03
0.04
0.05
0.06
Type 3 : At Home Trajectories (16 %)
Birth year
Den
sity
1910 1920 1930 1940 1950 1960
0.00
0.01
0.02
0.03
0.04
0.05
0.06
Type 4 : Part Time Trajectories (7 %)
Birth year
Den
sity
1910 1920 1930 1940 1950 1960
0.00
0.01
0.02
0.03
0.04
0.05
0.06
Type 5 : Missing Data (11 %)
Birth year
Den
sity
1910 1920 1930 1940 1950 1960
0.00
0.01
0.02
0.03
0.04
0.05
0.06
Overall
Birth year
Den
sity
1910 1920 1930 1940 1950 1960
0.00
0.01
0.02
0.03
0.04
0.05
0.06
19/11/2008gr 66/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataHeterogeneity analysis and sequence discrimination
Section content
3 Characterizing, rendering and clustering sequence dataLife trajectoriesCharacterizing sets of sequences
EntropyLongitudinal within sequence entropies
Distances between sequences: ClusteringHeterogeneity analysis and sequence discriminationMultidimensional Scaling representation of sequences
19/11/2008gr 67/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataHeterogeneity analysis and sequence discrimination
Heterogeneity of set of sequences
Sum of squares can be expressed in terms of the distancesbetween each pair of points
SS =n∑
i=1(yi − y)2 =
1n
n∑i=1
n∑j=i+1
(yi − yj)2
=1n
n∑i=1
n∑j=i+1
dij
Setting dij to the OM, LCP, LCS, ... distance, we get ameasure of diversity or heterogeneity of sequences.Can apply ANOVA principle to sequences.
19/11/2008gr 68/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataHeterogeneity analysis and sequence discrimination
Heterogeneity of set of sequences
Sum of squares can be expressed in terms of the distancesbetween each pair of points
SS =n∑
i=1(yi − y)2 =
1n
n∑i=1
n∑j=i+1
(yi − yj)2
=1n
n∑i=1
n∑j=i+1
dij
Setting dij to the OM, LCP, LCS, ... distance, we get ameasure of diversity or heterogeneity of sequences.Can apply ANOVA principle to sequences.
19/11/2008gr 68/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataHeterogeneity analysis and sequence discrimination
Heterogeneity of set of sequences
Sum of squares can be expressed in terms of the distancesbetween each pair of points
SS =n∑
i=1(yi − y)2 =
1n
n∑i=1
n∑j=i+1
(yi − yj)2
=1n
n∑i=1
n∑j=i+1
dij
Setting dij to the OM, LCP, LCS, ... distance, we get ameasure of diversity or heterogeneity of sequences.Can apply ANOVA principle to sequences.
19/11/2008gr 68/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataHeterogeneity analysis and sequence discrimination
Heterogeneity analysisProfessional trajectories by sex
19/11/2008gr 69/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataHeterogeneity analysis and sequence discrimination
Sequence Tree
19/11/2008gr 70/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataMultidimensional Scaling representation of sequences
Section content
3 Characterizing, rendering and clustering sequence dataLife trajectoriesCharacterizing sets of sequences
EntropyLongitudinal within sequence entropies
Distances between sequences: ClusteringHeterogeneity analysis and sequence discriminationMultidimensional Scaling representation of sequences
19/11/2008gr 71/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataMultidimensional Scaling representation of sequences
Multidimensional Scaling: Principle
Let D be a distance matrix between sequences.D computed using OM, LPS, LCS, ... metrics.Multidimensional Scaling consists in
Finding a set of real valued variables (f1, f2) such that theδij =
√(fi1 − fj1)2 + (fi2 − fj2)2 best approximate the
distances between sequences.Plotting the points in the (f1, f2) space.
19/11/2008gr 72/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataMultidimensional Scaling representation of sequences
Multidimensional Scaling: Principle
Let D be a distance matrix between sequences.D computed using OM, LPS, LCS, ... metrics.Multidimensional Scaling consists in
Finding a set of real valued variables (f1, f2) such that theδij =
√(fi1 − fj1)2 + (fi2 − fj2)2 best approximate the
distances between sequences.Plotting the points in the (f1, f2) space.
19/11/2008gr 72/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataMultidimensional Scaling representation of sequences
Multidimensional Scaling: Principle
Let D be a distance matrix between sequences.D computed using OM, LPS, LCS, ... metrics.Multidimensional Scaling consists in
Finding a set of real valued variables (f1, f2) such that theδij =
√(fi1 − fj1)2 + (fi2 − fj2)2 best approximate the
distances between sequences.Plotting the points in the (f1, f2) space.
19/11/2008gr 72/95
Mining Event HistoriesCharacterizing, rendering and clustering sequence dataMultidimensional Scaling representation of sequences
Multidimensional Scaling
−20 −10 0 10 20
−20
−10
010
2030
40
Multidimensional scaling representation, colored cluster of cohabitation
MDS axis 1
MD
S a
xis
2
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
● ●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●●
●
● ●
●
●●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●●
●
●
●
●
●●
●
●●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
Type 1 : Conjugal Trajectories (16 %)Type 2 : Parental Trajectories, Slow Transition (19 %)Type 3: Parental Trajectories, Fast Transition (48 %)Type 4 : Nestalgic Trajectories (7 %)Type 5 : Solo or Reconstituted Family Trajectories (11 %)
−20 −10 0 10 20
−20
−10
010
2030
40
Multidimensional scaling representation, colored cluster of professional trajectories
MDS axis 1
MD
S a
xis
2
●●● ●
● ●
●●●● ●
●●●● ●●●
●●
●
●
●
●●●●●●●●●
●
● ●●
●●●
●
●●● ●●●●●
● ●●●●●●●●●
●
●
●
●● ●
●●●●
●●●
● ●●●●●●
●
●
●
●
●
● ●
●●
●
●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●●
Type 1 : Full Time Trajectoires (53 %)Type 2 : Mixed Part Time − Home Trajectories (13 %)Type 3 : At Home Trajectories (16 %)Type 4 : Part Time Trajectories (7 %)Type 5 : Missing Data (11 %)
19/11/2008gr 73/95
Mining Event HistoriesMining Frequent Episodes
Table of content
1 Sequence Analysis in Social Sciences
2 Survival Trees
3 Characterizing, rendering and clustering sequence data
4 Mining Frequent Episodes
19/11/2008gr 74/95
Mining Event HistoriesMining Frequent Episodes
Mining Frequent Episodes
(Time stamped) event sequencesWhat can we expect from frequent episodes mining?
GSP (Srikant and Agrawal, 1996)MINEPI, WINEPI (Mannila et al., 1997)TCG, TAG (Bettini et al., 1996)SPADE (Zaki, 2001)
Are there specific issues when applying these methods insocial sciences?
19/11/2008gr 75/95
Mining Event HistoriesMining Frequent Episodes
Mining Frequent Episodes
(Time stamped) event sequencesWhat can we expect from frequent episodes mining?
GSP (Srikant and Agrawal, 1996)MINEPI, WINEPI (Mannila et al., 1997)TCG, TAG (Bettini et al., 1996)SPADE (Zaki, 2001)
Are there specific issues when applying these methods insocial sciences?
19/11/2008gr 75/95
Mining Event HistoriesMining Frequent EpisodesWhat Is It About?
Section content
4 Mining Frequent EpisodesWhat Is It About?Example: Counting Alternate Episode StructuresFrequent and discriminant episodes
19/11/2008gr 76/95
Mining Event HistoriesMining Frequent EpisodesWhat Is It About?
Frequent episodes. What is it?
Episode: Collection of events occurring frequently together.Mining typical (frequent) episodes:
Specialized case of mining frequent itemsets.Time dimension ⇒ Partially ordered events.
More complex than unordered itemsets: User mustspecify time constraints (and episode structure constraints).select a counting method.
19/11/2008gr 77/95
Mining Event HistoriesMining Frequent EpisodesWhat Is It About?
Frequent episodes. What is it?
Episode: Collection of events occurring frequently together.Mining typical (frequent) episodes:
Specialized case of mining frequent itemsets.Time dimension ⇒ Partially ordered events.
More complex than unordered itemsets: User mustspecify time constraints (and episode structure constraints).select a counting method.
19/11/2008gr 77/95
Mining Event HistoriesMining Frequent EpisodesWhat Is It About?
Episode structure constraints
For people who leave home within 2 years from their 17, what aretypical events occurring until they get married and have a firstchild?
LH,17
w = 2
??
w = 1
C1
M(0, 4)
(0,3)
(0, 1, 10)
elastic
event constraints
parallel
node constraint
edge constraints
19/11/2008gr 78/95
Mining Event HistoriesMining Frequent EpisodesWhat Is It About?
Episode structure constraints
For people who leave home within 2 years from their 17, what aretypical events occurring until they get married and have a firstchild?
LH,17
w = 2
??
w = 1
C1
M(0, 4)
(0,3)
(0, 1, 10)
elastic
event constraints
parallel
node constraint
edge constraints
19/11/2008gr 78/95
Mining Event HistoriesMining Frequent EpisodesWhat Is It About?
Episode structure constraints
For people who leave home within 2 years from their 17, what aretypical events occurring until they get married and have a firstchild?
LH,17
w = 2
??
w = 1
C1
M(0, 4)
(0,3)
(0, 1, 10)
elastic
event constraints
parallel
node constraint
edge constraints
19/11/2008gr 78/95
Mining Event HistoriesMining Frequent EpisodesWhat Is It About?
Counting methods (Joshi et al., 2001)
20 21 22 23 24
U UUC C C
Searching (U,C)min gap= 1, max gap= 2, win size= 2
indiv. with episode COBJ = 1
windows with episode CWIN = 3
min win. with episode CminWIN = 2
distinct occurrences CDIS_o = 5
dist. occ. without overlap CDIS = 3
19/11/2008gr 79/95
Mining Event HistoriesMining Frequent EpisodesWhat Is It About?
Counting methods (Joshi et al., 2001)
20 21 22 23 24
U UUC C C
Searching (U,C)min gap= 1, max gap= 2, win size= 2
indiv. with episode COBJ = 1
windows with episode CWIN = 3
min win. with episode CminWIN = 2
distinct occurrences CDIS_o = 5
dist. occ. without overlap CDIS = 3
19/11/2008gr 79/95
Mining Event HistoriesMining Frequent EpisodesWhat Is It About?
Counting methods (Joshi et al., 2001)
20 21 22 23 24
U UUC C C
Searching (U,C)min gap= 1, max gap= 2, win size= 2
indiv. with episode COBJ = 1
windows with episode CWIN = 3
min win. with episode CminWIN = 2
distinct occurrences CDIS_o = 5
dist. occ. without overlap CDIS = 3
19/11/2008gr 79/95
Mining Event HistoriesMining Frequent EpisodesWhat Is It About?
Counting methods (Joshi et al., 2001)
20 21 22 23 24
U UUC C C
Searching (U,C)min gap= 1, max gap= 2, win size= 2
indiv. with episode COBJ = 1
windows with episode CWIN = 3
min win. with episode CminWIN = 2
distinct occurrences CDIS_o = 5
dist. occ. without overlap CDIS = 3
19/11/2008gr 79/95
Mining Event HistoriesMining Frequent EpisodesExample: Counting Alternate Episode Structures
Section content
4 Mining Frequent EpisodesWhat Is It About?Example: Counting Alternate Episode StructuresFrequent and discriminant episodes
19/11/2008gr 80/95
Mining Event HistoriesMining Frequent EpisodesExample: Counting Alternate Episode Structures
Example: Counting alternate structures (COBJ, no max gap)
0%
5%
10%
15%
20%
25%
30%
Child <
Marr
iage
Marriag
e < C
hild
Child =
Marr
iage
Child <
Job
Job <
Chil
d
Child =
Job
Child <
Edu
c end
Educ e
nd <
Child
Child =
Edu
c end
Marriag
e < Jo
b
Job <
Marr
iage
Marriag
e = Jo
b
Marriag
e < Edu
c end
Educ e
nd <
Marriag
e
Marriag
e = Edu
c end
Job <
Edu
c end
Educ e
nd <
Job
Job =
Edu
c end
Switzerland, SHP 2002 biographical survey (n = 5560).19/11/2008gr 81/95
Mining Event HistoriesMining Frequent EpisodesFrequent and discriminant episodes
Section content
4 Mining Frequent EpisodesWhat Is It About?Example: Counting Alternate Episode StructuresFrequent and discriminant episodes
19/11/2008gr 82/95
Mining Event HistoriesMining Frequent EpisodesFrequent and discriminant episodes
Frequent episodes, cohabitational and professional trajectories0.
00.
10.
20.
30.
40.
50.
6
(Bio
logi
cal f
athe
r an
d m
othe
r)
(With
par
tner
>P
artn
er a
nd b
iolo
gica
l chi
ld)
(Bio
logi
cal f
athe
r an
d m
othe
r)−
(With
par
tner
>P
artn
er a
nd b
iolo
gica
l chi
ld)
(Alo
ne>
With
par
tner
)
(Bio
logi
cal f
athe
r an
d m
othe
r>W
ith p
artn
er)
(Bio
logi
cal f
athe
r an
d m
othe
r)−
(Bio
logi
cal f
athe
r an
d m
othe
r>W
ith p
artn
er)
(Bio
logi
cal f
athe
r an
d m
othe
r>A
lone
)
(Bio
logi
cal f
athe
r an
d m
othe
r)−
(Bio
logi
cal f
athe
r an
d m
othe
r>A
lone
)
(Bio
logi
cal f
athe
r an
d m
othe
r>P
artn
er a
nd b
iolo
gica
l chi
ld)
(Bio
logi
cal f
athe
r an
d m
othe
r)−
(Bio
logi
cal f
athe
r an
d m
othe
r>P
artn
er a
nd b
iolo
gica
l chi
ld)
(Bio
logi
cal f
athe
r an
d m
othe
r>W
ith p
artn
er)−
(With
par
tner
>P
artn
er a
nd b
iolo
gica
l chi
ld)
(Bio
logi
cal f
athe
r an
d m
othe
r)−
(Bio
logi
cal f
athe
r an
d m
othe
r>W
ith p
artn
er)−
(With
par
tner
>P
artn
er a
nd b
iolo
gica
l chi
ld)
(Alo
ne>
With
par
tner
)−(W
ith p
artn
er>
Par
tner
and
bio
logi
cal c
hild
)
(Bio
logi
cal f
athe
r an
d m
othe
r)−
(Alo
ne>
With
par
tner
)
(Bio
logi
cal f
athe
r an
d m
othe
r>A
lone
)−(A
lone
>W
ith p
artn
er)
0.0
0.1
0.2
0.3
0.4
0.5
(Ful
l tim
e)
(Edu
catio
n)
(Edu
catio
n>F
ull t
ime)
(Edu
catio
n)−
(Edu
catio
n>F
ull t
ime)
(Ful
l tim
e>A
t hom
e)
(Ful
l tim
e)−
(Ful
l tim
e>A
t hom
e)
(Ful
l tim
e>P
art t
ime)
(At h
ome>
Par
t tim
e)
(Mis
sing
)
(Ful
l tim
e)−
(Ful
l tim
e>P
art t
ime)
(Ful
l tim
e>A
t hom
e)−
(At h
ome>
Par
t tim
e)
(Ful
l tim
e)−
(At h
ome>
Par
t tim
e)
(Par
t tim
e>F
ull t
ime)
(Ful
l tim
e)−
(Ful
l tim
e>A
t hom
e)−
(At h
ome>
Par
t tim
e)
19/11/2008gr 83/95
Mining Event HistoriesMining Frequent EpisodesFrequent and discriminant episodes
Discriminant episodes, professional trajectories, women
FullTime Mixed AtHome PartTime
(Full time>At home)
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
FullTime Mixed AtHome PartTime
(Full time)−(Full time>At home)
0.0
0.1
0.2
0.3
0.4
0.5
FullTime Mixed AtHome PartTime
(At home>Part time)
0.0
0.1
0.2
0.3
0.4
FullTime Mixed AtHome PartTime Missing
(Full time>Part time)
0.0
0.1
0.2
0.3
0.4
0.5
Mixed AtHome PartTime
(Full time>At home)−(At home>Part time)
0.00
0.05
0.10
0.15
0.20
0.25
0.30
Mixed AtHome PartTime
(Full time)−(At home>Part time)
0.00
0.05
0.10
0.15
0.20
0.25
19/11/2008gr 84/95
Mining Event HistoriesMining Frequent EpisodesFrequent and discriminant episodes
Discriminant episodes, professional trajectories, men
FullTime Mixed Missing
(Missing)
0.0
0.1
0.2
0.3
0.4
FullTime Mixed PartTime Missing
(Education>Missing)
0.00
0.05
0.10
0.15
0.20
0.25
0.30
0.35
FullTime Mixed PartTime Missing
(Full time)
0.0
0.1
0.2
0.3
0.4
0.5
0.6
FullTime Mixed PartTime Missing
(Education>Full time)
0.00
0.05
0.10
0.15
0.20
0.25
0.30
0.35
FullTime Mixed PartTime Missing
(Education)−(Education>Full time)
0.00
0.05
0.10
0.15
0.20
0.25
0.30
FullTime Mixed PartTime Missing
(Education)
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
19/11/2008gr 85/95
Mining Event HistoriesSummary
Summary
Data mining approaches (survival trees, clustering sequences,frequent episodes) have promising future in life courseanalysis.
Complement classical statistical outcomes with new insights.
Their use within social sciences raises specific issues:Accounting for multi-level effects when growing survival tree ormining association rules.Handling time varying predictors in survival trees.Selecting relevant counting methods (event dependent)?Suitable criteria for measuring association strength betweenfrequent episodes....
19/11/2008gr 86/95
Mining Event HistoriesSummary
Summary
Data mining approaches (survival trees, clustering sequences,frequent episodes) have promising future in life courseanalysis.
Complement classical statistical outcomes with new insights.
Their use within social sciences raises specific issues:Accounting for multi-level effects when growing survival tree ormining association rules.Handling time varying predictors in survival trees.Selecting relevant counting methods (event dependent)?Suitable criteria for measuring association strength betweenfrequent episodes....
19/11/2008gr 86/95
Mining Event HistoriesSummary
Our TraMineR R-package
Let me finish with an Add ...
TraMineR, a free life trajectory mining toolfor the free open source R statistical environment.downloadable from http://cran.r-project.org (CRAN)see also http://mephisto.unige.ch/biomining
19/11/2008gr 87/95
Mining Event HistoriesSummary
Our TraMineR R-package
Let me finish with an Add ...
TraMineR, a free life trajectory mining toolfor the free open source R statistical environment.downloadable from http://cran.r-project.org (CRAN)see also http://mephisto.unige.ch/biomining
19/11/2008gr 87/95
Mining Event HistoriesSummary
Our TraMineR R-package
Let me finish with an Add ...
TraMineR, a free life trajectory mining toolfor the free open source R statistical environment.downloadable from http://cran.r-project.org (CRAN)see also http://mephisto.unige.ch/biomining
19/11/2008gr 87/95
Mining Event HistoriesSummary
Our TraMineR R-package
Let me finish with an Add ...
TraMineR, a free life trajectory mining toolfor the free open source R statistical environment.downloadable from http://cran.r-project.org (CRAN)see also http://mephisto.unige.ch/biomining
19/11/2008gr 87/95
Mining Event HistoriesSummary
Thank You!Thank You!
19/11/2008gr 88/95
Mining Event HistoriesAppendixZoomed tree
Divorce, Switzerland, Differences in KM Survival CurvesI
Return
� � � � � � � � � � �
� � � � � � � � � � � � � �
� � � � � � � � � �
� � � � � � � � � � � �
� � � � � � � � � � � � � � �
� � � � � � � � �
� � � � � � � � � � � �
� � � � � � � �� � � � � � � � �
� � � � � � � � � �
� � � � � � � � � � � �
� � � � � � � �� � � � � � � � �
� � � � � � � � � �
� � � � � � � � � � � �
� � � � � � � �� � � � � � � � �
� � � � � � � � �
� � � � � � � � � � � �
� � � � � � � �� � � � � � �
� � � � � � � � � �
� � �
� � � � � � � � � � � �
� � � � � �� � � � � � � � � �
� � � � � � � � � � �
� � � � ! " " #$ � "
� � � � � � � � � � � �
� � � � � � �� � � � � � � � �
� � � � � � � � � �
% & � � � � � � � # � � � ' � � � # � � � �
� � � � � � � �
% & � � � � � � � � # � � ' � � � # � � � � % & � � � � � � � # � � � ' � � � # � � � �
� � � � � � � � � � � �
� � � � � � �� � � � � � � �
� � � � � � � � � �
� � � � � � � � � � �
� � � � � � � � �� � � � � � � � � �
� � � � � � � � � �
� � � � � � � � ( � ) � *� � � � � � � � � �
� � � � � � � �
% & � � � � � � � # � � � � ' � � � # � � � �
� � � � � � � � � � �
� � � � � � � � �� � � � � � � �
� � � � � � � � �
� � � � � � � � � � � �
� � � � � � �� � � � � � �
� � � � � � � � �
$ � "� �
� � � � � � � �
% & � � � � � � � # � � � ' � � � # � � �
� � � � � � � � � � � �
� � � � � � �� � � � � � � �
� � � � � � � � � �
� � � � � � � � � � � �
� � � � � � � �� � � � � � � � �
� � � � � � � � � �
$ � "� �
� � � � � � � �
% & � � � � � � � # � � � � ' � � � # � � �
+ � + �
+
+ � + + � + �19/11/2008gr 89/95
Mining Event HistoriesAppendixZoomed tree
Return
� � � � � � � � � � �
� � � � � � � � � � � � � �
� � � � � � � � � �
� � � � � � � � � � � �
� � � � � � � � � � � � � � �
� � � � � � � � �
� � � � � � � � � � � �
� � � � � � � �� � � � � � � � �
� � � � � � � � � �
� � � � � � � � � � � �
� � � � � � � �� � � � � � � � �
� � � � � � � � � �
� � � � � � � � � � � �
� � � � � � � �� � � � � � � � �
� � � � � � � � �
� � � � � � � � � � � �
� � � � � � � �� � � � � � �
� � � � � � � � � �
� � �
� � � � � � � � � � � �
� � � � � �� � � � � � � � � �
� � � � � � � � � � �
� � � � ! " " #$ � "
� � � � � � � � � � � �
� � � � � � �� � � � � � � � �
� � � � � � � � � �
% & � � � � � � � # � � � ' � � � # � � � �
� � � � � � � �
% & � � � � � � � � # � � ' � � � # � � � � % & � � � � � � � # � � � ' � � � # � � � �
� � � � � � � � � � � �
� � � � � � �� � � � � � � �
� � � � � � � � � �
� � � � � � � � � � �
� � � � � � � � �� � � � � � � � � �
� � � � � � � � � �
� � � � � � � � ( � ) � *� � � � � � � � � �
� � � � � � � �
% & � � � � � � � # � � � � ' � � � # � � � �
� � � � � � � � � � �
� � � � � � � � �� � � � � � � �
� � � � � � � � �
� � � � � � � � � � � �
� � � � � � �� � � � � � �
� � � � � � � � �
$ � "� �
� � � � � � � �
% & � � � � � � � # � � � ' � � � # � � �
� � � � � � � � � � � �
� � � � � � �� � � � � � � �
� � � � � � � � � �
� � � � � � � � � � � �
� � � � � � � �� � � � � � � � �
� � � � � � � � � �
$ � "� �
� � � � � � � �
% & � � � � � � � # � � � � ' � � � # � � �
+ � + �
+
+ � + + � + �
19/11/2008gr 89/95
Mining Event HistoriesAppendixFor Further Reading
For Further Reading I
Abbott, A. and J. Forrest (1986). Optimal matching methods forhistorical sequences. Journal of Interdisciplinary History 16,471–494.
Bettini, C., X. S. Wang, and S. Jajodia (1996). Testing complextemporal relationships involving multiple granularities and itsapplication to data mining (extended abstract). In PODS ’96:Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGARTsymposium on Principles of database systems, New York, pp.68–78. ACM Press.
19/11/2008gr 90/95
Mining Event HistoriesAppendixFor Further Reading
For Further Reading II
Billari, F. C. (2005). Life course analysis: Two (complementary)cultures? Some reflections with examples from the analysis oftransition to adulthood. In R. Levy, P. Ghisletta, J.-M. Le Goff,D. Spini, and E. Widmer (Eds.), Towards an InterdisciplinaryPerspective on the Life Course, Advances in Life CourseResearch, Vol. 10, pp. 267–288. Amsterdam: Elsevier.
Blossfeld, H.-P. and G. Rohwer (2002). Techniques of EventHistory Modeling, New Approaches to Causal Analysis (2nded.). Mahwah NJ: Lawrence Erlbaum.
Elzinga, C. H. (2008). Sequence analysis: Metric representationsof categorical time series. Sociological Methods and Research.forthcoming.
19/11/2008gr 91/95
Mining Event HistoriesAppendixFor Further Reading
For Further Reading III
Elzinga, C. H. and A. C. Liefbroer (2007). De-standardization offamily-life trajectories of young adults: A cross-nationalcomparison using sequence analysis. European Journal ofPopulation 23, 225–250.
Gauthier, J.-A. (2007). Empirical categorizations of socialtrajectories: A sequential view on the life course. Thèse,Université de Lausanne, Faculté des sciences sociales et politique(SSP), Lausanne.
Huang, X., S. Chen, and S. Soong (1998). Piecewise exponentialsurvival trees with time-dependent covariates. Biometrics 54,1420–1433.
19/11/2008gr 92/95
Mining Event HistoriesAppendixFor Further Reading
For Further Reading IV
Joshi, M. V., G. Karypis, and V. Kumar (2001). A universalformulation of sequential patterns. In Proceedings of theKDD’2001 workshop on Temporal Data Mining, San Fransisco,August 2001.
Leblanc, M. and J. Crowley (1992). Relative risk trees for censoredsurvival data. Biometrics 48, 411–425.
Levenshtein, V. (1966). Binary codes capable of correctingdeletions, insertions, and reversals. Soviet Physics Doklady 10,707–710.
Mannila, H., H. Toivonen, and A. I. Verkamo (1997). Discovery offrequent episodes in event sequences. Data Mining andKnowledge Discovery 1(3), 259–289.
19/11/2008gr 93/95
Mining Event HistoriesAppendixFor Further Reading
For Further Reading VNeedleman, S. and C. Wunsch (1970). A general method
applicable to the search for similarities in the amino acidsequence of two proteins. Journal of Molecular Biology 48,443–453.
Segal, M. R. (1988). Regression trees for censored data.Biometrics 44, 35–47.
Segal, M. R. (1992). Tree-structured methods for longitudinaldata. Journal of the American Statistical Association 87(418),407–418.
Srikant, R. and R. Agrawal (1996). Mining sequential patterns:Generalizations and performance improvements. In P. M. G.Apers, M. Bouzeghoub, and G. Gardarin (Eds.), Advances inDatabase Technologies – 5th International Conference onExtending Database Technology (EDBT’96), Avignon, France,Volume 1057, pp. 3–17. Springer-Verlag.
19/11/2008gr 94/95
Mining Event HistoriesAppendixFor Further Reading
For Further Reading VI
Zaki, M. J. (2001). SPADE: An efficient algorithm for miningfrequent sequences. Machine Learning 42(1/2), 31–60.
19/11/2008gr 95/95
Recommended