Tutorial: Rosetta tools for structure determination in cryoEM density
Brandon Frenz, Ray Y.-R. Wang, Frank DiMaio
Last updated: March 2019
This tutorial is intended to introduce users to several different ways Rosetta may be used to solve various structure determination tasks given 3-5 Å cryoEM density data. It is not intended to replace the user's guide, available at https://www.rosettacommons.org/manuals/latest/main/.
The tutorial is split up into four parts.
1. An introduction to Rosetta in general, showing how one may score structures and minimize structures guided by experimental density data
2. Our model rebuilding protocol (RosettaCM), where one wishes to recombine homologous structures, and rebuild small missing regions (<12 residues)
3. An advanced application of RosettaCM to determine the sequence threading of a model
4. Our model completion tools, where one wishes to complete a partial model built by the de novo tool or wishes to rebuild large missing regions (12 or more residues)
In each scenario, we present the most basic usage of Rosetta for the task, and then describe additional options that may be useful. Command-line flags and input scripts are provided in shaded boxes, with boldfaced text indicating parameters of note. These parameters are described in the text following the command line.
Note: in all sections, you will need to update the command scripts to point at your installation of Rosetta and the Rosetta database.
1) Rosetta and electron density basics
This section provides a brief introduction to using Rosetta, and an overview of using density data within Rosetta.
Overview of Rosetta
The Rosetta documentation is a good source of additional information on several of the tools described in this document. This is available at https://www.rosettacommons.org/docs/latest/Home.
The tools described in this document use the RosettaScripts framework, described at https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/RosettaScripts. Briefly, this allows protocols to be defined as a series of atomic "Movers" which manipulate a structure. The format is as follows:
<ROSETTASCRIPTS>
  <SCOREFXNS>
  </SCOREFXNS>
  <MOVERS>
  </MOVERS>
  <PROTOCOLS>
  </PROTOCOLS>
</ROSETTASCRIPTS>
Each block contains information used in running the protocol: <SCOREFXNS> and <MOVERS> are used to declare score functions and movers, while <PROTOCOLS> is where the steps of the protocol are enumerated.
Rosetta tools are run via the command line, with flags controlling general program behavior. Many of the flags specifically for density refinement are outlined in the sections following.
Density scoring in Rosetta
Agreement to density is implemented in Rosetta as an additional energy term. Rosetta assesses agreement to density by computing the density that one would expect to see, given a model, and measuring the agreement of the expected and experimental density.
elec_dens_fast: This score term is recommended for nearly all uses of density refinement in Rosetta. It uses interpolation on a precomputed grid of per-atom scores to approximate the density correlations. This version is significantly faster (~10x) than the exact scoring term, and is very highly correlated with it.
This energy term may be provided to Rosetta in two ways. First, it may be set in a RosettaScripts XML file given as input:
<Reweight scoretype="elec_dens_fast" weight="35.0"/>
For non-RosettaScripts applications, the following flag controls the density scoring function weight:
-edensity::fastdens_wt 35.0

The recommended weight varies depending on the density map resolution, starting model quality, and protocol. Section 2 describes how the weights may be tuned. However, the following are good rules of thumb for setting the density weight within Rosetta:
• At resolutions better than 2.5 Å: an elec_dens_fast weight of 65.0 is generally reasonable.
• At resolutions between 2.5 Å and 3.5 Å: an elec_dens_fast weight of 50.0 is generally reasonable.
• At resolutions worse than 3.5 Å: an elec_dens_fast weight of 35.0 is generally reasonable.
• In centroid mode: an elec_dens_fast weight of 10.0 is generally reasonable.
At very low resolutions (worse than 6 Å), the weight may need to be further reduced. In general, if the Rosetta energies are positive (or significant outliers are flagged by Molprobity or other validation programs), then the weights need to be reduced.
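The rules of thumb above can be written down as a small lookup. This is only a sketch for picking a starting value: the function name suggest_density_weight is hypothetical (it is not part of Rosetta), and the handling of the exact boundary values 2.5 Å and 3.5 Å is an assumption, since the text does not say which bin they belong to.

```python
def suggest_density_weight(resolution_angstrom, centroid=False):
    """Return a starting elec_dens_fast weight from the tutorial's rules of thumb.

    Hypothetical helper, not a Rosetta function. The value is only a starting
    point and should be lowered if energies come out positive or validation
    tools flag outliers.
    """
    if centroid:
        return 10.0
    if resolution_angstrom < 2.5:
        return 65.0
    # Assumption: the boundary values 2.5 and 3.5 fall in the middle bin.
    if resolution_angstrom <= 3.5:
        return 50.0
    return 35.0


if __name__ == "__main__":
    for reso in (2.0, 3.0, 4.4):
        print(reso, suggest_density_weight(reso))
```

For maps worse than about 6 Å the returned 35.0 should be treated as an upper bound, since the text notes the weight may need further reduction there.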
In addition to the score terms above, there are also several flags that control map scoring behavior. Maps are read into Rosetta using either the flag:
-edensity::mapfile mapfile.mrc
Or from XML:
<LoadDensityMap name="loaddens" mapfile="mapfile.mrc"/>
Maps may be in either CCP4 or MRC format (the map type is automatically detected from the header info).
The resolution of the map, used when comparing calculated to experimental density, is specified with the flag:
-edensity::mapreso 5.0
Maps may also be resampled to reduce memory usage and runtime. This is done through the flag:
-edensity::grid_spacing 2.0
Notice that this value should never be more than half the given resolution and, if using the fast scoring function, never more than a third of the resolution. For both parameters, the default is generally fine (don't resample, and assume the resolution is ~3x the grid sampling).
Finally, one may choose to calculate density using either cryoEM or X-ray scattering factors. At low resolution this probably makes little difference, but it might at resolutions better than about 3.5 Å. The default is to use X-ray scattering factors; to turn on cryoEM scattering factors instead, use the following flag:
-edensity::cryoem_scatterers
Example 1A: Scoring a PDB in Rosetta with density
Most simply, one may wish to score a model using Rosetta's energy function including the density terms. This is easily accomplished using the score_jd2 application. A sample command line to rescore the structure in density is given in 1_rosetta_basics/A_run_rescore.sh. It illustrates the use of various density flags to provide Rosetta with experimental density information.
$ROSETTA3/source/bin/score_jd2.macosclangrelease \
    -database $ROSETTA3/database/ \
    -in::file::s 1isrA.pdb 1issA.pdb \
    -ignore_unrecognized_res \
    -edensity::mapfile 1issA_6A.mrc \
    -edensity::mapreso 5.0 \
    -edensity::grid_spacing 2.0 \
    -edensity::fastdens_wt 35.0 \
    -edensity::cryoem_scatterers \
    -crystal_refine
Some flags of note are boldfaced above. First, the input structure is provided with the flag -in::file::s. This is common to many Rosetta applications, and more than one input may be provided; each will be processed independently. The flags beginning with -edensity:: tell Rosetta about the density map into which the model is being fit: the name of the map file (in CCP4 or MRC format), the resolution of the map, the grid sampling of the map (which should never be more than half the resolution), and the weights on the various fit-to-density scoring functions. These same flags are reused by many different protocols. Finally, the flag -crystal_refine turns on several density-related options related to PDB reading and writing, and should always be used when refining against density data.
Note: The input PDB must be aligned to the density map using some external tool. Optionally, Rosetta will rigid-body minimize the structure into density before rescoring if the flag -edensity::realign min is given to the application. If this is done, the flag -out::pdb will write the minimized structure to a PDB file.
This command line outputs a score file, score.sc, that gives, for each structure specified with -in::file::s, the score with respect to each term in Rosetta's energy function. The meaning of individual score terms as well as an overview of the Rosetta energy function can be found in the paper:
Alford RF, Leaver-Fay A, Jeliazkov JR, O'Meara MJ, DiMaio FP, Park H, Shapovalov MV, Renfrew PD, Mulligan VK, Kappel K, Labonte JW, Pacella MS, Bonneau R, Bradley P, Dunbrack RL Jr, Das R, Baker D, Kuhlman B, Kortemme T, Gray JJ. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J Chem Theory Comput. 2017 Jun 13;13(6):3031-3048.
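To work with score.sc outside of the shell, the file can be read into one dictionary per model. The sketch below assumes the usual layout in which every relevant line starts with "SCORE:", the first such line holds the column names, and the last column is a text "description"; the function name parse_score_file and the numbers in EXAMPLE are made up for illustration and are not output from this tutorial.

```python
def parse_score_file(text):
    """Parse Rosetta score-file text into a list of {column: value} dicts.

    Hypothetical helper. Assumes lines of interest start with "SCORE:" and
    that the first such line is the header naming each column.
    """
    header = None
    rows = []
    for line in text.splitlines():
        if not line.startswith("SCORE:"):
            continue  # skip e.g. a leading "SEQUENCE:" line
        fields = line.split()[1:]
        if header is None:
            header = fields
            continue
        row = {}
        for name, value in zip(header, fields):
            # Keep the model name as text; everything else is numeric.
            row[name] = value if name == "description" else float(value)
        rows.append(row)
    return rows


# Invented values, shown only to illustrate the layout.
EXAMPLE = """SEQUENCE:
SCORE: total_score elec_dens_fast fa_atr description
SCORE: -123.4 -80.0 -300.5 1isrA_0001
SCORE: -110.0 -95.5 -280.0 1issA_0001
"""

if __name__ == "__main__":
    for row in parse_score_file(EXAMPLE):
        print(row["description"], row["total_score"], row["elec_dens_fast"])
```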
Examples 1B and 1C: Simple refinement into density using RosettaScripts and relax
In this section we introduce RosettaScripts by way of a very simple refinement-into-density example. RosettaScripts provides an XML scripting interface to Rosetta that allows fine-grained control of protocols. The syntax is fully described in the Rosetta documentation; however, a very brief introduction is provided here. The basic syntax for the XML is illustrated here (1_rosetta_basics/B_relax_density.xml):
<ROSETTASCRIPTS>
  <SCOREFXNS>
    <ScoreFunction name="dens" weights="beta_cart">
      <Reweight scoretype="elec_dens_fast" weight="35.0"/>
      <Set scale_sc_dens_byres="R:0.76,K:0.76,E:0.76,D:0.76,M:0.76, C:0.81,Q:0.81,H:0.81,N:0.81,T:0.81,S:0.81,Y:0.88,W:0.88, A:0.88,F:0.88,P:0.88,I:0.88,L:0.88,V:0.88"/>
    </ScoreFunction>
  </SCOREFXNS>
  <MOVERS>
    <SetupForDensityScoring name="setupdens"/>
    <LoadDensityMap name="loaddens" mapfile="1issA_6A.mrc"/>
    <FastRelax name="relaxcart" scorefxn="dens" repeats="2" cartesian="1"/>
  </MOVERS>
  <PROTOCOLS>
    <Add mover="setupdens"/>
    <Add mover="loaddens"/>
    <Add mover="relaxcart"/>
  </PROTOCOLS>
  <OUTPUT scorefxn="dens"/>
</ROSETTASCRIPTS>
There are three "blocks" of declarations in this script. In the first, <SCOREFXNS>…</SCOREFXNS>, the score functions to be used throughout the protocol are declared; in the second, <MOVERS>…</MOVERS>, movers (atomic operations that modify a structure) are declared; finally, in the third, <PROTOCOLS>…</PROTOCOLS>, a full protocol is declared as a sequence of movers.
In this particular example, we declare a single score function, dens, which starts from the score function beta_cart (a default score function whose details are not important here), and turns on elec_dens_fast, the fit-to-density score, with a weight of 35. We then declare three movers, SetupForDensityScoring, LoadDensityMap, and FastRelax, which set up the loaded structure for density scoring, load a map into memory, and then refine the structure using the FastRelax protocol. The declared score function, dens, is used as an input to the FastRelax mover.
Finally, note the additional block:
<Set scale_sc_dens_byres="R:0.76,K:0.76,E:0.76,D:0.76,M:0.76, C:0.81,Q:0.81,H:0.81,N:0.81,T:0.81,S:0.81,Y:0.88,W:0.88, A:0.88,F:0.88,P:0.88,I:0.88,L:0.88,V:0.88"/>
This adjusts the per-residue sidechain density weights. It is recommended to always use these weights when refining against cryoEM density.
To run this script, we use the following command line (1_rosetta_basics/B_relax_density.sh):
$ROSETTA3/source/bin/rosetta_scripts.macosclangrelease \
    -database $ROSETTA3/database/ \
    -in::file::s 1isrA.pdb \
    -parser::protocol ex_B1_run_RS_relax_density.xml \
    -ignore_unrecognized_res \
    -edensity::mapreso 5.0 \
    -edensity::cryoem_scatterers \
    -crystal_refine \
    -out::suffix _relax \
    -beta
Note: We do not have to specify the density weight or the map file on the command line, since they are handled within the XML file. However, other density options must be specified on the command line. When using RosettaScripts, the density weights must be specified in the XML; the input map may be specified either way.
Finally, in the previous XML file, the tag cartesian=1 appears, which refines the structure in Cartesian space. Rosetta also allows refinement in torsional space, which may be better for capturing domain motion, and for further reduction in model parameters against low-resolution data. To enable torsional refinement (1_rosetta_basics/C_relax_tors_density.xml), we make three small changes to the XML (the weights, repeats, and cartesian attributes):
<ROSETTASCRIPTS>
  <SCOREFXNS>
    <ScoreFunction name="dens" weights="beta">
      <Reweight scoretype="elec_dens_fast" weight="35.0"/>
      <Set scale_sc_dens_byres="R:0.76,K:0.76,E:0.76,D:0.76,M:0.76, C:0.81,Q:0.81,H:0.81,N:0.81,T:0.81,S:0.81,Y:0.88,W:0.88, A:0.88,F:0.88,P:0.88,I:0.88,L:0.88,V:0.88"/>
    </ScoreFunction>
  </SCOREFXNS>
  <MOVERS>
    <SetupForDensityScoring name="setupdens"/>
    <LoadDensityMap name="loaddens" mapfile="1issA_6A.mrc"/>
    <FastRelax name="relaxcart" scorefxn="dens" repeats="5" cartesian="0"/>
  </MOVERS>
  <PROTOCOLS>
    <Add mover="setupdens"/>
    <Add mover="loaddens"/>
    <Add mover="relaxcart"/>
  </PROTOCOLS>
  <OUTPUT scorefxn="dens"/>
</ROSETTASCRIPTS>
Cartesian versus torsional refinement
One of the strengths of Rosetta is its ability to perform torsion-space refinement, which can be incredibly valuable in capturing things like domain motion of proteins, which are simple moves in torsion space but can be complex in Cartesian space. The optimal type of refinement for a particular problem depends on the system itself, the map resolution, and the quality of the starting model. A few general tips:
• Several (2-4) repeats of torsion-space refinement followed by 1 repeat of Cartesian-space refinement is generally a good strategy.
• For very large (1000 residue+) systems or very poor quality input models (many clashes), Cartesian refinement alone is better behaved.
2) Model rebuilding with RosettaCM
In this scenario, we introduce a tool, RosettaCM, for building missing portions of a model guided by density data. While primarily geared towards comparative modeling, it may also be useful for building portions of a protein that are disordered when crystallized or difficult regions in hand-built models. We first introduce the basic rebuilding protocol, then show how the tool may also be used to:
• Combine pieces from multiple template models guided by density
• Rebuild with user-defined restraints
• Iteratively rebuild models in difficult cases
As a running example, we use horse spleen apoferritin and the deposited map (EMD-2788).
Example 2A: Preparing templates for use in RosettaCM
In many cases, much of the setup work is handled by a script, setup_RosettaCM.py, in RosettaTools (a separate repository available from rosettacommons.org). This script takes an input alignment in a variety of formats, and prepares the inputs automatically. It is executed by running the command:
setup_RosettaCM.py \
    --fasta t20s.fasta \
    --alignment tmpl.fasta \
    --alignment_format fasta \
    --templates tmpl.pdb \
    --rosetta_bin ~/Rosetta/main/source/bin \
    --verbose
Inputs include the full-length fasta, an alignment file (in either fasta, ClustalW, or HHSearch format), and the corresponding template PDB files. This script will prepare all the necessary inputs in order to run RosettaCM.
Alternately, the setup may be performed manually. In this case, since we are using some nonstandard features (symmetry and density), and we have two chains in the asymmetric unit, we will do this; alternately, the inputs from the previous step may be used as a starting point and subsequently modified.
Identifying alignments with hhpred
We first need to identify homologous sequences. To do this, we use the webserver hhpred (https://toolkit.tuebingen.mpg.de/). We enter our sequence (seg.fasta) into the web form and click submit. We get results:
In this case, there are many homologous
We need to convert this alignment to a format Rosetta can understand. I have included a script (scripts/prepare_hybridize_from_hhsearch.pl) that automates this, although it may be performed manually with a text editor as well.
Download the alignment by clicking "Raw Output" and then "Download" (or see the tutorial file seq.hhr).
Most of these hits are very high sequence identity, making the modelling problem trivial. We are going to focus on modelling starting from two distant structures of bacterioferritins:
…
60 3UOI_V Bacterioferritin (E.C.1 99.7 9E-18 1.9E-22 114.8 19.5 156 3-168 1-156
…
62 3GVY_C Bacterioferritin; bacte 99.7 3.6E-17 7.5E-22 112.0 19.3 154 6-169 2-155
…
Using a text editor, edit the file seq.hhr, removing all but these two alignments (or see the file seq_edit.hhr).
Next, convert these alignments to Rosetta format using the given script (A_convert_hhr_file.sh). In addition to converting the alignment file, it will also download the template files necessary for the next step. Run this script without input arguments, and an output, alignment.filt, is produced:
## 1XXX_ 3uoiV_201
# hhsearch
scores_from_program: 0 1.00
2 IRQNYSTEVEAAVNRLVNLYLRASYTYLSLGFYFDRDDVALEGVCHFFRELAEEKREGAERLLKMQNQRGGRALFQDLQKPSQDEWGTTLDAMKAAIVLEKSLNQALLDLHALGSAQADPHLCDFLESHFLDEEVKLIKKMGDHLTNIQRLVGSQAGLGEYLFERL
0 --MQGDPDVLRLLNEQLTSELTAINQYFLHSKMQDN--WGFTELAAHTRAESFDEMRHAEEITDRILLLDGLPNYQRIGSLRI--GQTLREQFEADLAIEYDVLNRLKPGIVMCREKQDTTSAVLLE-KIVADEEEHIDYLETQLELMDK-----LGEELYSAQCV
--
## 1XXX_ 3gvyC_202
# hhsearch
scores_from_program: 0 1.00
5 NYSTEVEAAVNRLVNLYLRASYTYLSLGFYFDRDDVALEGVCHFFRELAEEKREGAERLLKMQNQRGGRALFQDLQKPSQDEWGTTLDAMKAAIVLEKSLNQALLDLHALGSAQADPHLCDFLESHFLDEEVKLIKKMGDHLTNIQRLVGSQAGLGEYLFERLT
0 QGDAKVIEYLNAALRSELTAVSQYWLHYRLQED--WGFGSIAHKSRKESIEEMHHADKLIQRIIFLGGHPNLQRLNPLRI--GQTLRETLDADLAAEHDARTLYIEARDHCEKVRDYPSKMLFE-ELIADEEGHIDYLETQIDLMGS-----IGEQNYGMLNAK
--
In this format, the first line is '##' followed by a code for the target and one for the template. The second line identifies the source of the alignment; the third should be kept as it is. The fourth line is the target sequence and the fifth is the template; the number at the start of each is an 'offset', identifying where the sequence starts. However, the number doesn't use the PDB resid but just counts residues starting at 0. The sixth line is '--'. Multiple alignments may be concatenated in a single file, with the template code identifying the template corresponding to each alignment.
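As a check on a hand-edited alignment file, the six-line records described above can be parsed and sanity-checked before running Rosetta. This is a minimal sketch under the assumption that each record is laid out exactly as described (one line each for the header, source, scores, target, template, and '--'); parse_alignments is a hypothetical helper, not a Rosetta tool, and the short sequences in the usage example are invented.

```python
def parse_alignments(text):
    """Parse concatenated six-line alignment records into dictionaries.

    Hypothetical helper; assumes the record layout described in the text.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    records = []
    i = 0
    while i + 5 < len(lines):
        if not lines[i].startswith("##"):
            i += 1
            continue
        target_code, template_code = lines[i][2:].split()
        # lines[i + 1] is the alignment source, lines[i + 2] the score line.
        target_offset, target_seq = lines[i + 3].split()
        template_offset, template_seq = lines[i + 4].split()
        if lines[i + 5] != "--":
            raise ValueError("record for %s is not terminated by '--'" % template_code)
        if len(target_seq) != len(template_seq):
            raise ValueError("aligned strings differ in length for %s" % template_code)
        aligned = sum(1 for a, b in zip(target_seq, template_seq)
                      if a != "-" and b != "-")
        records.append({
            "target": target_code,
            "template": template_code,
            "target_offset": int(target_offset),      # 0-based residue count
            "template_offset": int(template_offset),  # 0-based residue count
            "aligned_positions": aligned,
        })
        i += 6
    return records


if __name__ == "__main__":
    demo = "## 1XXX_ tmplA_201\n# hhsearch\nscores_from_program: 0 1.00\n2 ACDEFG\n0 --DEYG\n--\n"
    print(parse_alignments(demo))
```

The length check catches the most common manual-editing mistake, where a residue or gap character is dropped from only one of the two aligned strings.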
Example 2B: Run partial threading and dock models into density
RosettaCM takes as inputs partially threaded models, that is, models where aligned positions have their residue identities remapped, and unaligned residues are not present. To generate these models from an alignment file and template, we can run the Rosetta command (3_model_rebuilding/A_partialthread.sh):
$ROSETTA3/source/bin/partial_thread.macosclangrelease \ -database ~/Rosetta/main/database/ \ -in::file::fasta seq.fasta \ -in::file::alignment alignments.filt \ -in::file::template_pdb 3uoiV.pdb 3gvyC.pdb pdb
This will output two partially threaded models, 3uoiV_201.pdb and 3gvyC_202.pdb, that will be used as input for RosettaCM.
The final step of the method is to align the partially threaded models into the density map. This can be done most easily using Chimera's "fit into map" tool. It may be easiest to align one partial thread into the density and then align the other model to that. Aligned versions of the templates are included as 3uoiV_201_aln.pdb and 3gvyC_202_aln.pdb.
Example 2C: Running RosettaCM as a monomer.
For our first step, we will be modelling the monomer structure using RosettaCM. While the assembly is symmetric, and the next part will be carried out in the context of the assembly, it may be useful in some cases to model individual components of larger assemblies. Such modelling is much faster, allowing for much greater conformational sampling, and it is often useful to model individual subunits before modelling the entire complex.
Like the methods introduced in Scenario 1, RosettaCM is controlled through an XML script using RosettaScripts. The XML is as follows (2_model_rebuilding/C_rosettaCM_singletarget.xml):
<ROSETTASCRIPTS>
  <TASKOPERATIONS>
  </TASKOPERATIONS>
  <SCOREFXNS>
    <ScoreFunction name="stage1" weights="score3" symmetric="1">
      <Reweight scoretype="atom_pair_constraint" weight="0.1"/>
      <Reweight scoretype="elec_dens_fast" weight="10"/>
    </ScoreFunction>
    <ScoreFunction name="stage2" weights="score4_smooth_cart" symmetric="1">
      <Reweight scoretype="atom_pair_constraint" weight="0.1"/>
      <Reweight scoretype="elec_dens_fast" weight="10"/>
    </ScoreFunction>
    <ScoreFunction name="fullatom" weights="beta_cart" symmetric="1">
      <Reweight scoretype="atom_pair_constraint" weight="0.1"/>
      <Reweight scoretype="elec_dens_fast" weight="35"/>
      <Set scale_sc_dens_byres="R:0.76,K:0.76,E:0.76,D:0.76,M:0.76, C:0.81,Q:0.81,H:0.81,N:0.81,T:0.81,S:0.81,Y:0.88,W:0.88, A:0.88,F:0.88,P:0.88,I:0.88,L:0.88,V:0.88"/>
    </ScoreFunction>
  </SCOREFXNS>
  <FILTERS>
  </FILTERS>
  <MOVERS>
    <Hybridize name="hybridize" stage1_scorefxn="stage1" stage2_scorefxn="stage2" fa_scorefxn="fullatom" batch="1">
      <Template pdb="3uoiV_201_aln.pdb" weight="1.0" cst_file="AUTO"/>
      <Template pdb="3gvyC_202_aln.pdb" weight="1.0" cst_file="AUTO"/>
    </Hybridize>
  </MOVERS>
  <PROTOCOLS>
    <Add mover="hybridize"/>
  </PROTOCOLS>
</ROSETTASCRIPTS>
The main work is done through a single mover, Hybridize, which handles all stages of model-building. Input structures are specified via Template lines (in this case there are two). For each template line, we specify the pdb input, as well as a couple of other parameters: a weight (the relative frequency with which we sample each template) and a constraint file (setting this to "auto" sets up automatic constraints to the template, while setting this to "none" turns off all constraints; user-defined constraints are described later).
A few notes about using multiple models with hybridize:
• With density, we need to ensure that all input models are aligned to the density. This can be done using Chimera's alignment tools. It may be easier to align a single model to the density and then align all other models to this model.
• In each trajectory, a starting model is chosen at random; the constraints and symmetry from this selected model are chosen at the start of each run. If we wish to use a portion of a model, but do not want to use its symmetry or constraints, we can assign it a weight of 0: backbone conformations from this model will be used in conformational sampling, but the symmetry and constraints will never be used.
• Similarly, gaps in the selected starting model are rebuilt before recombination occurs. If one of the templates has poor coverage, but provides valuable structural features, it should be used, but with weight 0.
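To make the meaning of the template weights concrete, the sketch below turns a set of weights into the fraction of trajectories expected to start from each template. It only illustrates the "relative frequency" reading of the weight described above; it is not Rosetta's internal code, the function name is hypothetical, and the third template file name is invented.

```python
def starting_template_probabilities(weights):
    """Convert template weights into starting-model selection probabilities.

    Illustration only (not Rosetta code). A template with weight 0 is never
    picked as the starting model, although, per the text, its backbone
    conformations are still used during sampling.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one template needs a positive weight")
    return {name: weight / total for name, weight in weights.items()}


if __name__ == "__main__":
    probs = starting_template_probabilities({
        "3uoiV_201_aln.pdb": 1.0,
        "3gvyC_202_aln.pdb": 1.0,
        "low_coverage_tmpl.pdb": 0.0,  # hypothetical extra template
    })
    for name, p in probs.items():
        print(name, p)
```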
Given this XML, RosettaCM is then run with the following command line (C_rosettaCM_singletarget.sh):
$ROSETTA3/source/bin/rosetta_scripts.macosclangrelease \
    -database $ROSETTA3/database/ \
    -in:file:fasta t20s.fasta \
    -parser:protocol C_rosettaCM_singletarget.xml \
    -nstruct 5 \
    -relax:jump_move true \
    -relax:dualspace \
    -out::suffix _singletgt \
    -edensity::mapfile t20S_41A_half1.mrc \
    -edensity::mapreso 5.0 \
    -edensity::cryoem_scatterers \
    -beta \
    -default_max_cycles 200
The input command is similar to those seen before, but with a few key differences. First, the input to Rosetta is specified with -in:file:fasta rather than -in:file:s. Also note that the input argument -nstruct 5 is given, telling Rosetta to generate 5 models for each process. Generally, hundreds to thousands of models are necessary to sufficiently sample conformational space; more and longer regions to rebuild require more models.
Job distribution
It is generally useful to sample ~100 models from each starting point. For this purpose, it may be useful to run multiple jobs in parallel. To prevent output structures from clobbering one another, the flag -out::suffix may be used, with each separate job given a different suffix.
For example, on a 16-core machine, we may specify -out::suffix _$1, then (using GNU parallel) run the following:
parallel -j16 ./C_rosettaCM_monomer.sh {} ::: {1..16}
Finally, GNU parallel allows launching of jobs remotely if SSH keys have been set up for passwordless login. To run:
parallel -S 16/node1,16/node2,16/node3,16/node4 --workdir . ./C_rosettaCM_monomer.sh {} ::: {1..48}
This will instead launch 48 jobs spread across four machines. See the GNU parallel documentation (https://www.gnu.org/software/parallel/) for more information.
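Where GNU parallel is not available, the same single-machine pattern (one job per core, each with its own numeric argument that the run script turns into a unique -out::suffix) can be approximated with the Python standard library. This is a sketch under the assumption that the run script takes the job index as its first argument, as in the GNU parallel example; the helper names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_job_commands(script, n_jobs):
    """Return one command per job; the job index becomes the script's $1."""
    return [[script, str(index)] for index in range(1, n_jobs + 1)]


def run_jobs(commands, max_workers):
    """Run the commands concurrently and return their exit codes in order."""
    def run_one(command):
        return subprocess.run(command).returncode

    # Threads are sufficient here: each one just waits on an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    commands = build_job_commands("./C_rosettaCM_monomer.sh", 16)
    print(commands[0], "...", commands[-1])
    # Uncomment to launch 16 jobs at once on a 16-core machine:
    # exit_codes = run_jobs(commands, max_workers=16)
```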
Analyzing results and model selection
While this is an active topic of research, generally (once a density weight has been chosen) we select the best models from the full set by optimizing both model geometry and fit to density. Model geometry may be evaluated using Rosetta energies after subtracting density energies, which may be done by inspecting the score*.sc files produced as output. Density fit may be evaluated using the density energy in Rosetta as well as FSCs using the ReportFSC mover (not covered in this tutorial; see part 2 of the main tutorial).
No matter the selection criteria, the top models (5-10) should be inspected for model convergence as well as visually inspected for density map agreement.
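One simple way to combine the two criteria is to keep only models that rank near the top by both geometry (total energy minus density energy) and density energy. The sketch below is one possible heuristic, not the authors' selection procedure; it assumes each model is given as a dictionary with total_score, elec_dens_fast, and description keys taken from a score file, that the score file reports the density term as its weighted contribution to the total, and that lower values are better for both quantities.

```python
def rank_models(rows, density_term="elec_dens_fast", top_n=10):
    """Return descriptions of models in the top_n by both geometry and density.

    Hypothetical heuristic. geometry = total_score - density energy, which
    assumes the density column is already weighted in the score file.
    """
    scored = []
    for row in rows:
        density = row[density_term]
        scored.append({
            "description": row["description"],
            "geometry": row["total_score"] - density,
            "density": density,
        })
    by_geometry = sorted(scored, key=lambda r: r["geometry"])[:top_n]
    by_density = sorted(scored, key=lambda r: r["density"])[:top_n]
    good_geometry = {r["description"] for r in by_geometry}
    # Report in order of density fit, best first.
    return [r["description"] for r in by_density if r["description"] in good_geometry]


if __name__ == "__main__":
    # Invented numbers for illustration only.
    models = [
        {"description": "A", "total_score": -100.0, "elec_dens_fast": -60.0},
        {"description": "B", "total_score": -90.0, "elec_dens_fast": -70.0},
        {"description": "C", "total_score": -25.0, "elec_dens_fast": -10.0},
    ]
    print(rank_models(models, top_n=2))
```

Whatever the automatic ranking returns, the shortlisted models still need the convergence and visual checks described above.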
Example 2D: Running RosettaCM with symmetry.
Next, we need to set up symmetric modeling with RosettaCM. We use a script, make_symmdef_file.pl, to generate a symmetry definition file for use in Rosetta. A straightforward way to prepare its input is to use Chimera to dock the necessary chains into density. This script's required inputs depend on the underlying symmetry:
• For cyclic (C) and dihedral (D) symmetries, we only need a single "primary chain" and an adjacent chain in each point group;
• For helical symmetries, we need an adjacent chain in the layer (if there is one) and an adjacent chain up the helical axis;
• For other symmetries, we need all chains adjacent to a single subunit.
Since this case falls into the last category (for examples with C and D symmetry see the main tutorial), we need to create a PDB file that contains one chain plus all adjacent chains docked into density. An example, 3gvyC_symm_r.pdb, is included.
Chimera can be useful here as well. From within Chimera, we can run the following command to generate symmetry for this case:
sym #1 group O center 88.9,88.9,88.9
Either way, save the Chimera files in a single output file and then relabel chains using the included script:
scripts/relabel_chains.pl 3gvyC_symm.pdb
To generate our Rosetta symmetry file from this input, we then simply have to run the command (D_make_symmdef.sh):
$ROSETTA3/source/src/apps/public/symmetry/make_symmdef_file.pl \
    -m pseudo -a A \
    -p 3gvyC_symm_r.pdb > ferritin.symm
Since we have already created the input templates using the partial_thread application, we simply need to use the output of the partial threading together with the symmetry definition file.
We then need to make two small modifications to our inputs:
...
    <Hybridize name="hybridize" stage1_scorefxn="stage1" stage2_scorefxn="stage2" fa_scorefxn="fullatom" batch="1">
      <Template pdb="3uoiV_201_aln.pdb" weight="1.0" cst_file="AUTO" symmdef="ferritin.symm"/>
      <Template pdb="3gvyC_202_aln.pdb" weight="1.0" cst_file="AUTO" symmdef="ferritin.symm"/>
    </Hybridize>
...
3) Advanced modelling: using partial_thread and relax to determine sequence threading
In this section, we will use the same tools introduced in the previous sections to tackle a more challenging problem, determining the alignment of sequence to a backbone model. This is based on Egelman et al., Structure, 2015.
For this example we have a map (left) that clearly identifies helices in the density. However, the threading of sequence is ambiguous: it is not known which is the N- and which is the C-terminus, and there are only 24 resolved residues, compared to 29 amino acids in the sequence.
However, since the helix orientations are straightforward, we can brute-force this problem. We create three models:
1. polyA_symm.pdb, in which two helices are docked, from which we can get the symmetry definition file
2. polyA_ctermin.pdb, a monomer in one orientation
3. polyA_ntermin.pdb, a monomer in the other orientation
Example 3A: Build the symmetry definition file.
As in section two, we start by building the symmetry definition file:
$ROSETTA3/source/src/apps/public/symmetry/make_symmdef_file.pl \
    -m HELIX -a J -b K \
    -p polyA_symm.pdb -r 1000 -t 8 > h.symm
Since we have helical symmetry, some of the options are a bit different. "-m HELIX" specifies that we run in helical mode, and the arguments "-a J -b K" indicate the "primary chain" (J) and the chain up the helical axis (K). Finally, the argument "-t 8" indicates how many subunits to generate in each direction.
When running in this mode, note:
• Rosetta outputs a file, polyA_symm_model_JK.pdb, of the symmetry it identifies. You should ensure that this makes sense given the map.
• Rosetta outputs the helical parameters inferred from the model, including the helical rise and the subunits per turn. This should match what was determined experimentally.
[Figure: test map FSC (10-4.4 Å) versus helix placement and shift (-3 to 3); panels: N-term in, C-term in.]
Running this command produces a symmetry definition file, "h.symm", to be used as input in subsequent steps.
Finally, we need to make a small edit to this file for density refinement. Change:
set_dof JUMP_0_0_0 z(2.20334489451302) angle_z
set_dof JUMP_0_0_0_to_com x(17.509223919948)
set_dof JUMP_0_0_0_to_subunit angle_x angle_y angle_z
To:
set_dof JUMP_0_0_0_to_com x y z
set_dof JUMP_0_0_0_to_subunit angle_x angle_y angle_z
That is, delete the first line (which allows refinement of the symmetry operators), and replace the fixed x offset on the JUMP_0_0_0_to_com line with "x y z".
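The same edit can be scripted so that it is applied identically each time the symmetry file is regenerated. The sketch below assumes the set_dof lines are named exactly as in the listing above; the function name is hypothetical, and writing the result to h_edit.symm is a choice made here for illustration.

```python
def edit_symmdef_for_density(text):
    """Apply the density-refinement edit to symmetry-definition text.

    Hypothetical helper: drops the 'set_dof JUMP_0_0_0 ...' line and frees
    x, y and z on the JUMP_0_0_0_to_com line. All other lines are kept.
    """
    edited = []
    for line in text.splitlines():
        fields = line.split()
        if fields[:2] == ["set_dof", "JUMP_0_0_0"]:
            continue  # remove refinement of the symmetry operators
        if fields[:2] == ["set_dof", "JUMP_0_0_0_to_com"]:
            line = "set_dof JUMP_0_0_0_to_com x y z"
        edited.append(line)
    return "\n".join(edited) + "\n"


if __name__ == "__main__":
    original = (
        "set_dof JUMP_0_0_0 z(2.20334489451302) angle_z\n"
        "set_dof JUMP_0_0_0_to_com x(17.509223919948)\n"
        "set_dof JUMP_0_0_0_to_subunit angle_x angle_y angle_z\n"
    )
    print(edit_symmdef_for_density(original), end="")
    # To edit a real file (file names are an assumption):
    # with open("h.symm") as src, open("h_edit.symm", "w") as dst:
    #     dst.write(edit_symmdef_for_density(src.read()))
```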
Example 3B: Generate the partial threads.
In this case, we use the partial_thread tool introduced last section to generate all of the different sequence threadings we are going to model. The included script, scripts/generate_threadings.pl, will be used to generate the input alignment file, though it may also be done manually using a text editor.
The resulting alignment file (alignment.filt):
## 1XXX ctermin_0
#
scores_from_program: 0.0
0 QARILEADAEILRAYARILEAHAEILRAQ
0 AAAAAAAAAAAAAAAAAAAAAAAAA----
--
## 1XXX ntermin_0
#
scores_from_program: 0.0
0 QARILEADAEILRAYARILEAHAEILRAQ
0 AAAAAAAAAAAAAAAAAAAAAAAAA----
--
…
## 1XXX ntermin_1
#
scores_from_program: 0.0
0 QARILEADAEILRAYARILEAHAEILRAQ
0 -AAAAAAAAAAAAAAAAAAAAAAAAA---
--
…
## 1XXX ntermin_2
#
scores_from_program: 0.0
0 QARILEADAEILRAYARILEAHAEILRAQ
0 --AAAAAAAAAAAAAAAAAAAAAAAAA--
--
…
The alignment file simply slides the sequence along the input poly-alanine model.
We then run the partial_thread application on this model, producing a total of 10 input models:
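The sliding can be reproduced in a few lines, which is useful when adapting the approach to a different sequence or model length. This is a sketch of the idea only, not the included generate_threadings.pl script; it assumes the record layout shown in the listing above and a 25-residue poly-alanine model (the length of the template strings in that listing). The function name is hypothetical.

```python
def sliding_alignments(sequence, model_length, tag):
    """Build one alignment record per register of a poly-Ala model.

    Hypothetical helper. The model is slid along the sequence one residue
    at a time, with gaps ('-') padding the unmodelled positions.
    """
    n_offsets = len(sequence) - model_length + 1
    records = []
    for offset in range(n_offsets):
        trailing = len(sequence) - model_length - offset
        template = "-" * offset + "A" * model_length + "-" * trailing
        records.append("\n".join([
            "## 1XXX %s_%d" % (tag, offset),
            "#",
            "scores_from_program: 0.0",
            "0 " + sequence,
            "0 " + template,
            "--",
        ]))
    return records


if __name__ == "__main__":
    seq = "QARILEADAEILRAYARILEAHAEILRAQ"
    records = []
    for tag in ("ctermin", "ntermin"):
        records.extend(sliding_alignments(seq, 25, tag))
    print(len(records), "threadings")
    print(records[1])
```

With a 29-residue sequence and a 25-residue model there are five registers per orientation, which is consistent with the ten input models mentioned in the text.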
$ROSETTA3/source/bin/partial_thread.macosclangrelease \ -database ~/Rosetta/main/database/ \
-in::file::fasta seq.fasta \ -in::file::alignment alignments.filt \
Example 3C: Refine all the models.
In the final step, we refine each of the models against the density map, using the same relax script that was used in part one of the tutorial (with some modifications for symmetry).
The command line (C_relax_density.sh):
$ROSETTA3/source/bin/rosetta_scripts.macosclangrelease \
    -database ~/Rosetta/main/database/ \
    -render_density \
    -in::file::s ntermin_*.pdb ctermin_*.pdb \
    -parser::protocol C_relax_density.xml \
    -ignore_unrecognized_res \
    -edensity::mapreso 3.8 \
    -edensity::cryoem_scatterers \
    -crystal_refine \
    -beta \
    -out::suffix _relax \
    -default_max_cycles 200
And the XML file:
<ROSETTASCRIPTS>
  <SCOREFXNS>
    <ScoreFunction name="dens" weights="beta_cart">
      <Reweight scoretype="elec_dens_fast" weight="35.0"/>
      <Set scale_sc_dens_byres="R:0.76,K:0.76,E:0.76,D:0.76,M:0.76,C:0.81,Q:0.81,H:0.81,N:0.81,T:0.81,S:0.81,Y:0.88,W:0.88,A:0.88,F:0.88,P:0.88,I:0.88,L:0.88,V:0.88"/>
    </ScoreFunction>
  </SCOREFXNS>
  <MOVERS>
    <SetupForSymmetry name="setupdens" definition="h_edit.symm"/>
    <LoadDensityMap name="loaddens" mapfile="emd_6123.map"/>
    <FastRelax name="relaxtors" scorefxn="dens" repeats="1" cartesian="0"/>
    <FastRelax name="relaxcart" scorefxn="dens" repeats="1" cartesian="1"/>
  </MOVERS>
  <PROTOCOLS>
    <Add mover="setupdens"/>
    <Add mover="loaddens"/>
    <Add mover="relaxtors"/>
    <Add mover="relaxcart"/>
  </PROTOCOLS>
  <OUTPUT scorefxn="dens"/>
</ROSETTASCRIPTS>
Note the way that symmetry information is loaded, with the SetupForSymmetry mover above. Additionally, looking at the protocol shows that we perform one cycle of torsion refinement, followed by one cycle of Cartesian refinement.
Finally, we can analyze results by looking at the output *.sc score files. For each threading, they show a score breakdown of each of the threaded models. We can evaluate these results using the command:
grep SCORE: *.sc | grep -v desc| sort -nk 2
This command sorts the outputs by total energy. What does this show? What if we sort by density energy instead?
4) Building large segments using RosettaES
RosettaCM (section 2) is a powerful tool for rebuilding small segments guided by density. However, it deals poorly with model completion of large segments of protein. These may arise in several cases:
1. Homology models (particularly distant ones) may have large insertions, or even entire domains that are lacking.
2. The models produced from denovo_density may be missing significant fractions of the backbone.
3. It may be difficult to manually trace long stretches of low local resolution into density.
To address these issues, we have developed a tool called Rosetta Enumerative Sampling (RosettaES), which uses an ensemble search algorithm to determine a large number of conformations that are consistent with both the density and the Rosetta energy function. This tool can be used on partial models from the denovo_density application, an incomplete homology model, or any other starting structure.
RosettaES model building consists of three steps. Initially, a preparation step builds the fragments that are to be used in conformational sampling. Then a rebuilding step will identify each unassigned segment in the initial model and build an ensemble of possible solutions for each. Finally, a combination step finds all the consistent subsets of interactions, and refines all such models (if there is only one segment, the script simply refines all structures in the ensemble). In this combination step, if assembly fails to find a consistent set of solutions, an additional round of sampling will be carried out, forcing different solutions than the previous model.
Note that a full tutorial of RosettaES is given in section five of the main tutorial; in this "mini tutorial," we will only be using this tool to rebuild a single missing segment from a model.
Compared to the other sections, the workflow is a bit more complicated when extended to multiple compute cores. To handle job distribution we have included a Python script, RunRosettaES.py, that manages this job distribution among available CPUs on a single machine. (The script is included as part of Rosetta, in /main/source/scripts/python/public/EnumerativeSampling, as well as in this tutorial.) For dealing with job schedulers or clusters incompatible with this script, section 5E gives an overview of job distribution with RosettaES.
Step 4A. Fragment Picking
The first step involves selection of "fragment files," which predict backbone conformation from local sequence. We have a custom algorithm for fragment picking in RosettaES. These fragments will need to be generated before running RosettaES; the following command will generate these files (A_PickFragments.sh):
$ROSETTA3/source/bin/grower_prep.default.macosclangrelease \
    -pdb input.pdb \
    -in::file::fasta t20sA.fasta \
    -fragsizes 3 9 \
    -fragamounts 100 20
This will generate 100 3-residue fragments and 20 9-residue fragments at each position, in files named 100.3mers and 20.9mers, that are then used in subsequent steps of the rebuilding process.
Step 4B. Generate conformations for the missing segment
The grower considers assigning positions for each unassigned segment of density (that is, each stretch of amino acids present in the fasta file but missing from the input structure). Each segment is referred to using a segment id, in which each segment is numbered from N- to C-terminus (with multiple chains given in order in the input fasta file). The script is run in two parts: first, the script is run once for each segment to rebuild; then, the script is run in "assembly mode" given the outputs produced by rebuilding each segment individually. Thus, for rebuilding the two segments in the test case, the script is called three times: once to build each segment, and once to assemble the results.
In the first step, we perform conformational sampling for a difficult segment in apoferritin, generating an ensemble of putative solutions. This can be done by calling the command (B_SampleSegment.sh):
python RunRosettaES.py \
    -rs runES.sh \
    -x RosettaES.xml \
    -f seq.fasta \
    -p difficult_loop.pdb \
    -d ../2_rosettaCM_apoferritin/emd_2788.map \
    -l 1 \
    -c 16 \
    -n loop_1
The arguments to this program are as follows:
• -rs runES.sh - the script that is launched on each core and contains Rosetta flags and inputs
• -x RosettaES.xml - the XML script describing parameters for conformational sampling (see below)
• -f t20sA.fasta - the input fasta file (with chainbreaks specified by '/')
• -p input.pdb - the input pdb file. This needs to match the input sequence, and all residues present in the fasta but absent in the PDB will get built.
• -d T20S_48A_alpha_chainA.mrc - the input density map
• -l 1 - the segment id of the segment to rebuild. This command should be called once for each segment to rebuild, varying this argument from 1 to N
• -c 16 - the number of compute cores to use
• -n loop_1 - the output tag for this job (results will be placed in a folder with this name). Tags should be unique for each segment.
The input XML file exposes key parameters for conformational sampling. In the tutorial, this file, RosettaES.xml, contains a block:
...
    <FragmentExtension name="ext" fasta="full.fasta" scorefxn="dens" censcorefxn="cendens" beamwidth="32" dumpbeam="0" samplesheets="1" read_from_file="0" continuous_weight="0.3" looporder="1" comparatorrounds="100" windowdensweight="30" readbeams="%%readbeams%%" storedbeams="%%beams%%" steps="%%steps%%" pcount="%%pcount%%" filterprevious="%%filterprevious%%" filterbeams="%%filterbeams%%">
      <Fragments fragfile="100.3mers"/>
      <Fragments fragfile="20.9mers"/>
    </FragmentExtension>
...
The sampling behavior of RosettaES is controlled by the block above. Many of the tags in this block (fasta, dumpbeam, read_from_file, storedbeams, steps, pcount, filterprevious, comparatorrounds, and filterbeams) are used by the job distribution script to pass results from one step to the next, and they should be left as-is.
Others are user-specified, and can be modified based on the size of the loop and resolution of the data:
• beamwidth: controls the maximum number of solutions to be held at each step. Setting the value higher will increase runtime but may improve accuracy.
• windowdensweight: the relative contribution of density in model selection
For many cases, the default parameters are sufficient. However, if the segment to grow is long (50+ residues), you may need to increase beamwidth; if the density is low resolution, you might need to decrease windowdensweight to 15 or 20.
Several options should rarely be modified, but may need to be in specific cases:
• samplesheets: Controls whether or not beta sheet sampling should be performed. It is recommended to use this except when working with symmetric systems.
• continuous_weight: Controls the penalty on discontinuous density. Setting this value to 1 will completely remove any penalty on discontinuous density; setting it closer to 0 will increase the penalty. You may wish to raise this value to 0.7 (or more) if you anticipate that the segment you are trying to model does not follow a continuous path of density.
Finally, the option comparatorrounds is used in multi-segment assembly (see section 5C).
After running the script with this XML, there are two important intermediate output files, placed in the folder loop_1 (the argument to -n):
• .lps (for loop partial solution) files, which are then combined in step 5C, in cases where there are multiple segments to model
• loop_1/beam_X.txt files, where X corresponds to the number of residues added to the segment. These are generated as the search adds residues, and are used to pass information from one step to the next (as additional residues are added in a single segment).
Finally, while in most cases users will want to wait for a run to finish before inspecting the beam, the sampling results can also be inspected while the code is running. The current output ensemble can be saved as PDB files with the command (B2_InspectIntermediates.sh):
python RunRosettaES.py \
    -rs runES.sh \
    -x RosettaES.xml \
    -f seq.fasta \
    -p difficult_loop.pdb \
    -d ../2_rosettaCM_apoferritin/emd_2788.map \
    -l 1 \
    -db loop_1/beam_17.txt
Note that the number of the beam file (17) corresponds to the total number of residues built. Intermediate results (after growing N residues) can be inspected by changing this to a lower number (e.g., beam_14.txt shows solutions after 14 residues have been rebuilt).
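When checking on a running job, it helps to find the most advanced beam file automatically, since sorting the names alphabetically would place beam_3.txt after beam_17.txt. The helper below is a hypothetical convenience, not part of RunRosettaES.py; it assumes only the beam_X.txt naming described above.

```python
import re


def latest_beam_file(filenames):
    """Return (residues_built, filename) for the highest-numbered beam file.

    Hypothetical helper. Returns None when no name matches beam_X.txt.
    """
    best = None
    for name in filenames:
        match = re.search(r"beam_(\d+)\.txt$", name)
        if match is None:
            continue  # e.g. .lps files in the same folder
        built = int(match.group(1))
        if best is None or built > best[0]:
            best = (built, name)
    return best


if __name__ == "__main__":
    import glob
    # The folder name follows the -n loop_1 tag used in this tutorial.
    print(latest_beam_file(glob.glob("loop_1/beam_*.txt")))
```

The returned filename can then be passed to the -db argument shown above.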