51
Introduction to Triangulated Graphs Tandy Warnow

Introduction to Triangulated GraphsRec-I-DCM3 significantly improves performance Comparison of TNT to Rec-I-DCM3(TNT) on one large dataset 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

  • IntroductiontoTriangulatedGraphs

    TandyWarnow

  • Topicsfortoday

    •  Triangulatedgraphs:theoremsandalgorithms(Chapters11.3and11.9)

    •  Examplesoftriangulatedgraphsinphylogenyestimation(Chapters4.8,11.3-11.5)

  • Triangulated(i.e.,Chordal)Graphs

    •  Definition:Agraphistriangulatedifithasnosimplecyclesofsizefourormore.

  • DCMsareDivide-and-Conquerstrategies!

  • DCMsforphylogenyreconstruction

    •  Defineatriangulatedgraphsothatitsverticescorrespondto

    theinputtaxa(orsequences)•  Decomposethegraphintooverlappingsubgraphs,thus

    decomposingthetaxaintooverlappingsubsets.•  Applythe“basemethod” toeachsubsetoftaxa,toconstruct

    asubsettree•  Applyasupertreemethodtothesubsettreestoobtaina

    singletreeonthefullsetoftaxa.

  • DCMs(Disk-CoveringMethods)

    •  DCMsforpolynomialtimemethodsimprovetopologicalaccuracy(empiricalobservation)andhaveprovabletheoreticalguaranteesunderMarkovmodelsofevolution.

    •  DCMsforhardoptimizationproblemsreducerunningtimeneededtoachievegoodlevelsofaccuracy(empiricalobservation)

  • DecomposingTriangulatedGraphs

    Max Clique Decomposition Separator-component Decomposition

    Given:TriangulatedgraphG=(V,E)Output:DecompositionoftheverticesintooverlappingsubsetsRequire:Polynomialtime!Technique:Usespecialpropertiesabouttriangulatedgraphs

  • SimplicialVertices

    Definition:LetG=(V,E)beagraph,andletvbeavertexinV.Thenvissimplicialifitssetofneighbors(i.e.,Γ(v))isaclique.Todo:•  Giveexampleofagraphthathasnosimplicialvertices.

    •  Giveexampleofagraphwhereeveryvertexissimplicial.

  • PerfectEliminationOrdering

    Definition:LetG=(V,E)beagraphonnvertices.Aperfecteliminationorderingisanorderingoftheverticesv1,v2,…,vnofGsothateachvertexviissimplicialinthegraphinducedon{vi+1,vi+2,…,vn}.Theorems:•  Everytriangulatedgraphhasasimplicialvertex.•  Infact,everytriangulatedgraphthatisnotacliquehastwonon-adjacentsimplicialvertices.

  • PerfectEliminationOrdering

    Definition:LetG=(V,E)beagraphonnvertices.Aperfecteliminationorderingisanorderingoftheverticesv1,v2,…,vnofGsothateachvertexviissimplicialinthegraphinducedon{vi+1,vi+2,…,vn}.Theorems(Rose1970):AgraphGistriangulatedifandonlyifithasaperfecteliminationordering.Furthermore,givenatriangulatedgraph,aperfecteliminationorderingcanbefoundinpolynomialtime.

  • Somepropertiesofchordalgraphs

    •  Theorem:EverychordalgraphG=(V,E)hasatmost|V|maximalcliques,andthesecanbefoundinpolynomialtime:–  Maxcliquedecomposition.

    •  Provethisusingtheexistenceofaperfecteliminationordering.

  • Somepropertiesofchordalgraphs

    •  Theorem:EverychordalgraphG=(V,E)hasatmost|V|maximalcliques,andthesecanbefoundinpolynomialtime:–  Maxcliquedecomposition.

    •  Provethisusingtheexistenceofaperfecteliminationordering.

  • Somepropertiesofchordalgraphs

    •  Everychordalgraphthatisnotacliquehasavertexseparatorthatisamaximalclique,anditcanbefoundinpolynomialtime:–  Separator-componentdecomposition.

  • Somepropertiesofchordalgraphs

    •  Everychordalgraphhasatmostnmaximalcliques,andthesecanbefoundinpolynomialtime:Maxcliquedecomposition.

    •  Everychordalgraphthatisnotacliquehasavertexseparatorthatisamaximalclique,anditcanbefoundinpolynomialtime:Separator-componentdecomposition.

  • DecomposingTriangulatedGraphs

    Max Clique Decomposition Separator-component Decomposition

    Given:TriangulatedgraphG=(V,E)Output:DecompositionoftheverticesintooverlappingsubsetsRequire:Polynomialtime!Technique:Usespecialpropertiesabouttriangulatedgraphs

  • DCMsareDivide-and-Conquerstrategies!

  • Howtocombinesubsettrees?

    •  Everytriangulatedgraphhasaperfecteliminationordering:–  enablesustomergecorrectsubtreesandgetacorrectsupertreeback,ifsubtreesarebigenough(sothattheycontainalltheshortquartettrees).

  • TriangulatedGraphsandTrees

    Theorem(Gravil1974,Buneman1974):AgraphGistriangulatedifandonlyifGistheintersectiongraphofasetofsubtreesofatree.Proof:Onedirectioniseasy,andtheotherisnot…

  • ExamplesofTriangulatedGraphs

    •  ThresholdgraphsTG(D,q):Disadditiveandqisanyrealnumber,and(x,y)isanedgeifandonlyifD[x,y]

  • ExamplesofTriangulatedGraphs

    •  ThresholdgraphsTG(D,q):Disadditiveandqisanyrealnumber,and(x,y)isanedgeifandonlyifD[x,y]

  • DCM1-boostingdistance-basedmethods[Nakhlehetal.ISMB2001]

    • Theorem(Warnowetal.,SODA2001):DCM1-NJconvergestothetruetreefrompolynomiallengthsequences

    NJ DCM1-NJ

    0 400 800 1600 1200 No. Taxa

    0

    0.2

    0.4

    0.6

    0.8

    Erro

    r Rat

    e

  • ExamplesofTriangulatedGraphs

    •  ShortSubtreeGraphsSSG(T,w):Tisatreewithedge-weightingw,andeveryshortquartetcontributesa4-clique.

    •  Theorem:ForalltreesTwithedgeweightingw,SSG(T,w)isadditive.

    •  Proof:Usethefactthatallsubtreeintersectiongraphsaretriangulated.

  • Rec-I-DCM3 significantly improves performance

    Comparison of TNT to Rec-I-DCM3(TNT) on one large dataset

    0

    0.02

    0.04

    0.06

    0.08

    0.1

    0.12

    0.14

    0.16

    0.18

    0.2

    0 4 8 12 16 20 24

    Hours

    Average MP score above

    optimal, shown as a percentage of

    the optimal

    Current best techniques (TNT)

    Rec-I-DCM3(TNT)

  • ExamplesofTriangulatedGraphs

    •  Character-stateintersectiongraphsfromperfectphylogenies.

    •  Theorem:Forallperfectphylogenies,thecharacterstateintersectiongraph(wherenodescorrespondtocharacterstatesandedgescorrespondtoanytwostatesatanynodeinthetree–internalandleaf)istriangulated.

    •  Proof:Usethefactthatallsubtreeintersectiongraphsaretriangulated.

  • “Homoplasy-Free”Evolution(perfectphylogenies)

    YESNO

  • PerfectPhylogeny

    •  AphylogenyTforasetSoftaxaisaperfectphylogenyifeachstateofeachcharacteroccupiesasubtree(nocharacterhasback-mutationsorparallelevolution)

    30

  • Perfectphylogenies,cont.

    •  A=(0,0),B=(0,1),C=(1,3),D=(1,2)hasaperfectphylogeny!

    •  A=(0,0),B=(0,1),C=(1,0),D=(1,1)doesnothaveaperfectphylogeny!

  • Aperfectphylogeny

    •  A=00•  B=01•  C=13•  D=12•  E=03•  F=13

    A

    B

    C

    D

  • Aperfectphylogeny

    •  A=00•  B=01•  C=13•  D=12•  E=03•  F=13

    A

    B

    C

    D

    E F

  • ThePerfectPhylogenyProblem

    •  GivenasetSoftaxa(species,languages,etc.)determineifaperfectphylogenyTexistsforS.

    •  TheproblemofdeterminingwhetheraperfectphylogenyexistsisNP-hard(McMorrisetal.1994,Steel1991).

  • TriangulatedGraphs

    •  Agraphistriangulatedifithasnosimplecyclesofsizefourormore.

  • TriangulatedGraphs

    Theorem(Gravil1974,Buneman1974):AgraphGistriangulatedifandonlyifGistheintersectiongraphofasetofsubtreesofatree.Proof:Onedirectioniseasy,andtheotherisnot…

  • PerfectPhylogeniesandTriangulatedColoredGraphs

    •  SupposeMisacharactermatrixandTisaperfectphylogenyforM.

    •  ThenletM’betheextensionofMtoincludetheadditional“species”addedattheinternalnodes.

    •  CharacterStateIntersectionGraphGbasedonM’:–  ForeachcharacteralphaandstateiinM’,giveavertexv(alpha,i)andcolorthevertexwiththecolorforalpha.

    –  Putedgesbetweentwoverticesiftheyshareanyspecies.–  Gistriangulatedandproperlycolored.

  • PerfectPhylogeniesandTriangulatedColoredGraphs

    •  SupposeMisacharactermatrixandTisaperfectphylogenyforM.

    •  ThenletM’betheextensionofMtoincludetheadditional“species”addedattheinternalnodes.

    •  ThecharacterstateintersectiongraphGbasedonM’istriangulatedandproperlycolored.(Why?)

    •  ButifwehadbaseditonMitmightnothavebeentriangulated.(Why?)

  • Aperfectphylogeny

    •  A=00•  B=01•  C=13•  D=12•  E=03•  F=13Drawthecharacterstateintersectiongraph.

    A

    B

    C

    D

    E F

  • Matrixwithaperfectphylogeny

    c1c2c3s1321s2122s3113s4211

    Draw the perfect phylogeny and compute the sequences at the internal nodes.

  • Matrixwithaperfectphylogeny

    c1c2c3s1321s2122s3113s4211

    Draw the character state intersection graph for the extended matrix (including the sequences at the internal nodes).

  • Thepartitionintersectiongraph

    “Yes”InstanceofPP:c1c2c3s1321s2122s3113s4211

  • Triangulatingcoloredgraphs

    •  LetG=(V,E)beagraphandcbeavertexcoloringofG.ThenGcanbec-triangulatedifasupergraphG’=(V,E’)existsthatistriangulatedandwherethecoloringcisproper.

    •  Inotherwords,Gcanbec-triangulatedifandonlyifwecanaddedgestoGtomakeittriangulatedwithoutaddingedgesbetweenverticesofthesamecolor.

  • Agraphthatcanbec-triangulated

  • Agraphthatcanbec-triangulated

    A vertex-colored graph G=(V,E) can be c-triangulated if a supergraph G’=(V,E’) exists that is triangulated and where the coloring is proper.

  • Agraphthatcannotbec-triangulated

  • TriangulatingColoredGraphs(TCG)

    TriangulatingColoredGraphs:givenavertex-coloredgraphG,determineifGcanbec-triangulated.

  • ThePPandTCGProblems

    •  Buneman’sTheorem:AperfectphylogenyexistsforasetSifandonlyiftheassociatedcharacterstateintersectiongraphcanbec-triangulated.

    •  ThePPandTCGproblemsarepolynomiallyequivalentandNP-hard.

  • Ano-instanceofPerfectPhylogeny

    •  A=00•  B=01•  C=10•  D=11

    0 1

    0

    1

    Aninputtoperfectphylogeny(left)offoursequencesdescribedbytwocharacters,anditscharacterstateintersectiongraph.Notethatthecharacterstateintersectiongraphis2-colored.

  • SolvingthePPProblemUsingBuneman’sTheorem

    “Yes”InstanceofPP:c1c2c3s1321s2122s3113s4211

  • SolvingthePPProblemUsingBuneman’sTheorem

    “Yes”InstanceofPP:c1c2c3s1321s2122s3113s4211

  • Somespecialcasesareeasy

    •  Binarycharacterperfectphylogenysolvableinlineartime

    •  r-statecharacterssolvableinpolynomialtimeforeachr(combinatorialalgorithm)

    •  Twocharacterperfectphylogenysolvableinpolynomialtime(produces2-coloredgraph)

    •  k-characterperfectphylogenysolvableinpolynomialtimeforeachk(producesk-coloredgraphs--connectionstoRobertson-Seymourgraphminortheory)

  • EarlyHistory•  LeQuesne(1969,1972,1974,1977):initialformulationofperfectphylogenies•  Estabrook(1972),Estabrooketal.(1975):mathematicalfoundationsofperfect

    phylogenies•  McMorris(1997):binarycharactercompatibility•  Felsenstein(1984):reviewpaper•  Estabrook&Landrum,Fitch1975:compatibilityoftwocharacters•  Steel(1992)andBodlaenderetal.(1992):NP-hardness•  Buneman(1974):reducedperfectphylogenytotriangulatingcoloredgraphs•  KannanandWarnow(1992):establishedequivalenceofTCGandPP

    SeeT.Warnow,1993.Constructingphylogenetictreesefficientlyusingcompatibilitycriteria.NewZealandJournalofBotany,31:3,pp.239-248(linkedoffmyhomepage)forasurveyoftheearlyliteratureandthefullcitations.

  • Literaturesample•  R.AgarwalaandD.Fernandez-Baca,1994.Apolynomial-timealgorithmfortheperfectphylogenyproblem

    whenthenumberofcharacterstatesisfixed.SIAMJournalonComputing,23,1216–1224.•  R.AgarwalaandD.Fernandez-Baca,1996.Simplealgorithmsforperfectphylogenyandtriangulating

    coloredgraphs.InternationalJournalofFoundationsofComputerScience,7,11–21.•  H.L.Bodlaender,M.R.Fellows,MichaelT.Hallett,H.ToddWareham,andT.Warnow.2000.Thehardnessof

    perfectphylogeny,feasibleregisterassignmentandotherproblemsonthincoloredgraphs,TheoreticalComputerScience244(2000)167-188

    •  H.L.BodlaenderandT.KIoks,1993:Asimplelineartimealgorithmfortriangulatingthree-coloredgraphs.JournalofalgorithmsJ5:160-172.

    •  D.Fernandez-Baca,2000.Theperfectphylogenyproblem.Pages203–234of:Du,D.-Z.,andCheng,X.(eds),SteinerTreesinIndustries.KluwerAcademicPublishers.

    •  D.Gusfield,1991.Efficientalgorithmsforinferringevolutionarytrees.Networks21,19–28.•  R.IduryandA.Schaffer.1993:Triangulatingthree-coloredgraphsinlineartimeandlinearspace.SIAM

    journalondiscretemathematics6:289-294.•  S.KannanandT.Warnow,1992.Triangulating3-coloredgraphs.SIAMJ.onDiscreteMathematics,Vol.5

    No.2,pp.249-258(alsoSODA1991)•  S.KannanandT.Warnow,1997.Afastalgorithmforthecomputationandenumerationofperfect

    phylogenieswhenthenumberofcharacterstatesisfixed.SIAMJ.Computing,Vol.26,No.6,pp.1749-1763(alsoSODA1995)

    •  F.R.McMorris,T.Warnow,andT.Wimer,1994.TriangulatingVertexColoredGraphs.SIAMJ.onDiscreteMathematics,Vol.7,No.2,pp.296-306(alsoSODA1993).

  • ApplicationsofPerfectPhylogeny

    •  TumorPhylogenetics(MohammedEl-KebirwilltalkthisonApril10-17,2018)

    •  HistoricalLinguistics(IwilltalkaboutthisonMarch15,2018)

    •  PopulationgeneticsandHaplotypeinference