Assessment of FM and FM/TBM modeling in CASP13 · 2018-12-12

Assessment of FM and FM/TBM modeling in CASP13

Luciano A. Abriata and Matteo Dal Peraro

Laboratory for Biomolecular Modeling (LBM), Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL)

Acknowledgements

• CASP13 (and CASP12) organizers: Andriy, John, Torsten, Krzysztof, Maya and Anna!

• Prediction Center

• Previous assessors and all predictors

• Great and constructive experience

• Exciting to witness 2 consecutive huge transformations in protein structure prediction

CASP13 tertiary structure track: 32 FM & 13 FM/TBM (+4 FM-special)

Details on classification by Lisa Kinch & Andriy Kryshtafovych

# groups/servers: 107/39

# models (1/5): 7542/35982

Part 1: EU-specific evaluations

Strategy for EU-specific evaluations

Target EUs, models and tables from the Prediction Center

~half of the models for initial inspection, represented by top GDT_TS models

Clustering at 3 Å

Web app for interactive navigation of model clusters: 6 main scores, others available too

JS + HTML

Designation of best cluster(s) for each EU

Visual inspection

Model(s) designated best for each EU

Further evaluation of models in best cluster, if worthwhile
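
The selection and clustering steps of this strategy can be sketched as follows. This is only an illustration under stated assumptions, not the assessors' actual pipeline: the slides do not say which clustering algorithm was used, so a simple greedy (leader) scheme over a precomputed pairwise RMSD matrix is assumed here. The function names `select_top_half` and `cluster_models` and all numbers are hypothetical.

```python
def select_top_half(gdt_ts):
    """Return the indices of the top half of models by GDT_TS, best first."""
    order = sorted(range(len(gdt_ts)), key=lambda i: gdt_ts[i], reverse=True)
    return order[: max(1, len(order) // 2)]


def cluster_models(indices, rmsd, cutoff=3.0):
    """Greedy leader clustering (an assumed scheme, not from the slides).

    `indices` must be sorted best-first; `rmsd[i][j]` is the pairwise
    RMSD in Å. Each model joins the first cluster whose leader lies
    within `cutoff`; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list whose first element is its leader
    for i in indices:
        for cluster in clusters:
            if rmsd[i][cluster[0]] <= cutoff:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Toy data (hypothetical): GDT_TS for 6 models and a symmetric RMSD matrix.
gdt_ts = [62.0, 60.5, 41.0, 58.0, 30.0, 25.0]
rmsd = [
    [0.0, 1.2, 6.0, 5.5, 9.0, 9.5],
    [1.2, 0.0, 6.3, 5.1, 9.2, 9.1],
    [6.0, 6.3, 0.0, 7.0, 8.0, 8.5],
    [5.5, 5.1, 7.0, 0.0, 9.4, 9.9],
    [9.0, 9.2, 8.0, 9.4, 0.0, 2.0],
    [9.5, 9.1, 8.5, 9.9, 2.0, 0.0],
]
top = select_top_half(gdt_ts)
clusters = cluster_models(top, rmsd)
print(top, clusters)
```

Processing models best-first means each cluster leader is the highest-GDT_TS member of its cluster, which is a convenient representative for visual inspection.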

Chin-Hsien Tai, Hongjun Bai, Todd J. Taylor, and Byungkook Lee, Proteins 2013

CASP12-like web app: facilitates assessment, and is easily opened to the public
http://lucianoabriata.altervista.org/papersdata/casp12fmassessment/casp12-fm-fmtbm-assessment-3Aclusters.html

NEW: more scores, servers shown in a distinct color, and auxiliary web apps built also for models clustered at 1 Å and for analysis with no splitting

Clear best

These look best but they aren't

Unclear best, visual inspection of multiple models

Very bad

Very good

Excellent

Examples of correlation plots

→ GDT_TS & QCS turn out to be the two most informative scores, in our experience

* For QCS see Cong et al., Bioinformatics 2011

Many good models

Always high r

Different scores propose different best models

GDT_TS & QCS indeed grouped separately in analysis by Olechnovic et al., Bioinformatics 2018
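
To illustrate how two scores can correlate strongly and still nominate different best models, here is a minimal sketch. The model names and score values are made up for illustration and are not CASP13 data; `pearson_r` is a hypothetical helper implementing the standard Pearson correlation coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical scores for four models of one evaluation unit.
models = ["m1", "m2", "m3", "m4"]
gdt_ts = [50.0, 48.0, 40.0, 30.0]
qcs = [70.0, 80.0, 65.0, 50.0]

r = pearson_r(gdt_ts, qcs)
best_by_gdt_ts = models[gdt_ts.index(max(gdt_ts))]
best_by_qcs = models[qcs.index(max(qcs))]
print(round(r, 2), best_by_gdt_ts, best_by_qcs)
```

In this toy case the correlation is high, yet the top model by GDT_TS differs from the top model by QCS, which is the situation where visual inspection guided by several scores becomes necessary.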

Examples of correlation plots

Top GDT_TS | Target | Top QCS, second GDT_TS

HHscore 13.98, LGA 73.5, Neff/L (HHblits) 0.01

T0991-D1 (FM)

TS366_3 (top by DFM, designated best)

Importance of guiding visual assessment by multiple scores

GDT_TS 37.4, QCS 68.8, DFM 0.82

Target

T1010-D1 (FM)

TS117_1

T0990-D3 (FM)

Target | TS043_1 (designated best)

HHscore 2.76, LGA 23.8, Neff/L (HHblits) 0.2

HHscore 0.69, LGA 39.5, Neff/L (HHblits) 0.07

Several very hard targets with folds captured

GDT_TS 50, QCS 80

GDT_TS 50, QCS 80

Only two very difficult EUs with no best model

T0981-D2 (FM)

All scores low; here the model of highest GDT_TS looks reasonable but is missing the last strand, which is separated in sequence. And models that are complete are too bad…

T0989-D2 (FM)

Long extended N-terminus and C-terminal beta hairpin, neither is well positioned; but the central beta sheet is quite good in some models.

HHscore 14, LGA 63.8, Neff/L (HHblits) 0.07

HHscore 4.9, LGA 55.1, Neff/L (HHblits) 0.01

Impact of progress in CASP13:

Examples of “FM-special” targets for which full models were very good

Example: T0953s2 (D1: FM/TBM, D2 & D3: FM)

TS117_4 (top by TM, 2.53 Å RMSD over 61% of sequence)

TS224_3 (top by GDT_TS)

Target by EU (D1, D2, D3)

Example: T1000 (D1: TBM, not evaluated; D2: FM)

TS043_1 (top by GDT_TS, scores quite good by all metrics)

GDT_TS 69.5, QCS 90.7

Parts missing in experimental target structure

There's an X-ray structure of D1

89% of residues < 2 Å

Notable progress in CASP13:

12 hard EUs for which many groups reached near-atomistic resolution

T0968s2-D1 (FM)

TS043_1-D1 (12 models)

2.33 Å over full sequence (115 residues)

HHscore 19, LGA 50, Neff/L (HHblits) 1.23

GDT_TS 80 & QCS 90

T0970-D1 (FM/TBM)

TS043_2-D1 (5 models plus 4 from TS347)

2.78 Å over 89% of sequence (total 96 residues)

HHscore 17, LGA 67, Neff/L (HHblits) 1.61

GDT_TS 80 & QCS 90

T1001-D1 (FM)

TS222_4-D1 (106 models)

2.32 Å over full sequence (139 residues)

HHscore 11, LGA 55, Neff/L (HHblits) 0.04

GDT_TS 74 & QCS 93

T1008-D1 (FM/TBM)

TS281_1-D1 (126 models)

1.14 Å over full sequence (77 residues)

HHscore 61, LGA 74, Neff/L (HHblits) 0.01

GDT_TS 91 & QCS 95

NMR/MD

Part 2: Rankings

Ranking based on Z-scores of GDT_TS & QCS

Ranking = sum of Z-scores combined from GDT_TS & QCS (as these are by far the two most informative scores to guide visual assessment) on all models submitted as #1, for TBM/FM, FM and FM_sp target EUs, and considering sum of Z-scores > -2.

Ranking is very robust: rankings with GDT_TS only or QCS only return the same top groups.
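
A minimal sketch of such a Z-score ranking is given below. It rests on assumptions not spelled out in the slides: Z-scores are computed per target EU across the groups' first models, the GDT_TS and QCS Z-scores are added, and the "> -2" condition is read as flooring each per-target sum at -2. Other readings (for example, discarding values below the threshold) are possible. The names `z_scores` and `rank_groups` and the input numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def z_scores(values):
    """Z-scores of `values` relative to their own mean and std deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def rank_groups(scores, floor=-2.0):
    """Rank groups by summed Z-scores over all target EUs.

    `scores` maps target -> {group: (gdt_ts, qcs)} for models submitted
    as #1. Assumption: each per-target sum of the two Z-scores is
    floored at `floor` before being accumulated.
    """
    totals = defaultdict(float)
    for per_group in scores.values():
        groups = list(per_group)
        z_gdt = z_scores([per_group[g][0] for g in groups])
        z_qcs = z_scores([per_group[g][1] for g in groups])
        for g, zg, zq in zip(groups, z_gdt, z_qcs):
            totals[g] += max(zg + zq, floor)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical first-model scores for three groups on two target EUs.
scores = {
    "T1": {"A": (80, 90), "B": (50, 60), "C": (20, 30)},
    "T2": {"A": (70, 85), "B": (60, 70), "C": (10, 25)},
}
ranking = rank_groups(scores)
print(ranking)
```

Flooring limits how much a single failed prediction can penalize a group, so the ranking rewards consistently good models rather than punishing occasional outliers.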

All groups

Servers

Notable highlights: groups not in the top 5 who provided the only best models for some targets (upon visual evaluation)

• ZHOU-SPOT for T0998-D1: alone & considerably better than runners-up

• Jones-UCL for T1010-D1: alone & considerably better than runners-up

• RaptorX-DeepModeller for T0949

• KIAS-Gdansk for T0957s1-D1

• BAKER for T0975-D1

• Venclovas for T0991-D1

QCS 78

QCS 81

Part 3: Progress

Progress in Free Modeling (FM/TBM not considered)

Notes:

- Exact definition of FM EUs might vary from year to year

- CASP12 and CASP13 EUs are of roughly similar difficulty

[Figures: "Median +/- Median Deviation for GDT_TS of best models" and "Boxplots of GDT_TS distributions for best models". Axes: GDT_TS of all best models per CASP edition (1 to 12, 1994 to 2016); median GDT_TS of best models vs. year (1992 to 2016). Annotation: "Coevolution-based contact prediction methods in literature".]

CASP13
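
The "median +/- median deviation" summary used in the progress plots can be computed as in the sketch below. It assumes that "median deviation" means the median absolute deviation from the median, which the slides do not state explicitly. The edition labels and GDT_TS values are invented for illustration and are not CASP data.

```python
from statistics import median


def median_and_mad(values):
    """Return (median, median absolute deviation from the median)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med, mad


# Hypothetical GDT_TS values of the best model for each FM EU.
best_gdt_ts = {
    "edition A": [30, 35, 40, 55, 60],
    "edition B": [45, 50, 65, 70, 80],
}
summary = {name: median_and_mad(vals) for name, vals in best_gdt_ts.items()}
for name, (med, mad) in summary.items():
    print(f"{name}: {med} +/- {mad}")
```

Medians are less sensitive than means to a few unusually easy or hard targets, which matters when the number of FM EUs per edition is small.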

Global analyses 2 - Progress

Machine Learning for molecular modeling

Possible sources of improvement: alignment depth, existing templates, domain size?

CASP12

• From CASP12 to CASP13, significant improvement in performance

• Do some predictors have access to special, close metagenomics databases?

CASP13

Key conclusions from CASP13 on the tertiary structure prediction track

• Yet another significant improvement in prediction quality, mainly due to the rise of machine learning methods combined with coevolution-based contact prediction

• Reaching nearly atomistic resolution of the backbone for some very difficult EUs (<150 residues) by many groups!

• Predictions are so good that splitting EUs is in some cases not necessary

• Alignment depth allows for better top models than in CASP12, but methods now seem to need fewer sequences

• Templates of poor sequence similarity might be better identified than in CASP12

• Remaining limitations: domain size and alignment depth
