Tutorial: Rosetta tools for structure determination in cryoEM density

Brandon Frenz, Ray Y.-R. Wang, Frank DiMaio

Last updated: August 2018

This tutorial introduces several ways Rosetta may be used to solve structure determination tasks given 3-5Å cryoEM density data. It is not intended to replace the user's guide, available at https://www.rosettacommons.org/manuals/latest/main/.

The tutorial is split into five parts:

1. An introduction to Rosetta in general, showing how one may score structures and minimize structures guided by experimental density data
2. Our fragment-based refinement protocol for refinement against 3-5Å EM density
3. Our model rebuilding protocol (RosettaCM), where one wishes to recombine homologous structures and rebuild small missing regions (<12 residues)
4. Our de novo model-building tools, where one wishes to rebuild missing regions of a structure (for example, from a homology model)
5. Our model completion tools, where one wishes to complete a partial model built by the de novo tool or wishes to rebuild large missing regions (12 or more residues)

In each scenario, we present the most basic usage of Rosetta for the task, and then describe additional options that may be useful. Command-line flags and input scripts are provided in shaded boxes, with boldfaced text indicating parameters of note. These parameters are described in the text following the command line.

Note: in all sections, you will need to update the command scripts to point at your installation of Rosetta and the Rosetta database.

Which section should one read?

• If you want a straightforward introduction to scoring and basic refinement of structures in Rosetta, read Section 1.
• If you have a model that you want to refine into a 3-4.5Å density map, read Section 2.
• If you have a density map at 3-4.5Å (or worse!), one or more partial models, and you want to combine the models and rebuild short insertions and deletions, read Section 3.
• If you have a 3-4.5Å density map with no homology information available, and want to build a model, read Section 4.
• If you have a 3-4.5Å density map and a very incomplete model you want to complete, read Section 5.

Other notes.

For all the applications in this tutorial, it is recommended that you download the latest weekly release of Rosetta.

Also, for brevity, some of the command lines and XML scripts have been trimmed in this document. The tutorial files contain the full command lines and XML scripts; if something is omitted in this document, it should not be changed from the value in the tutorial file.



1) Rosetta and electron density basics

This section provides a brief introduction to using Rosetta, and an overview of using density data within Rosetta.

Overview of Rosetta

The Rosetta documentation is a good source of additional information on several of the tools described in this document. This is available at https://www.rosettacommons.org/docs/latest/Home.

The tools described in this document use the RosettaScripts framework, described at https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/RosettaScripts. Briefly, this allows protocols to be defined as a series of atomic "Movers" which manipulate a structure. The format is as follows:

    <ROSETTASCRIPTS>
        <SCOREFXNS>
        </SCOREFXNS>
        <MOVERS>
        </MOVERS>
        <PROTOCOLS>
        </PROTOCOLS>
        <OUTPUT />
    </ROSETTASCRIPTS>

Each block contains information used in running the protocol: <SCOREFXNS> and <MOVERS> are used to declare score functions and movers, while <PROTOCOLS> is where the steps of the protocol are enumerated.

Rosetta tools are run via the command line, with flags controlling general program behavior. Many of the flags specific to density refinement are outlined in the following sections.

Density scoring in Rosetta

Agreement to density is implemented in Rosetta as an additional energy term. Rosetta assesses agreement to density by computing the density that one would expect to see, given a model, and measuring the agreement of the expected and experimental density.

elec_dens_fast

This score term is recommended for nearly all uses of density refinement in Rosetta. It uses interpolation on a precomputed grid of per-atom scores to approximate the density correlations. This version is significantly faster (~10x) than the exact scoring term, and is very highly correlated with it.

These energy terms may be provided to Rosetta in two ways. First, the weight may be set in a RosettaScripts XML file given as input:

    <Reweight scoretype="elec_dens_fast" weight="35.0"/>

For non-RosettaScripts applications, the following flag controls the density scoring function weight:

    -edensity::fastdens_wt 35.0

The recommended weights for each of these terms vary depending on the density map resolution, starting model quality, and protocol. Section 2 describes how the weights may be tuned. However, the following are good rules of thumb for setting the density weight within Rosetta:

• At resolutions better than 2.5Å: an elec_dens_fast weight of 65.0 is generally reasonable.
• At resolutions between 2.5Å and 3.5Å: an elec_dens_fast weight of 50.0 is generally reasonable.
• At resolutions worse than 3.5Å: an elec_dens_fast weight of 35.0 is generally reasonable.
• In centroid mode (described in part 4): an elec_dens_fast weight of 10.0 is generally reasonable.

At very low resolutions (worse than 6Å), the weight may need to be further reduced. In general, if the Rosetta energies are positive (or significant outliers are flagged by MolProbity or other validation programs), then the weights need to be reduced.
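The rules of thumb above can be captured in a small helper, which is convenient when preparing command lines for many maps. This is only a sketch in Python and not part of Rosetta: the function name suggested_density_weight is hypothetical, and treating the 2.5Å and 3.5Å boundaries as belonging to the middle range is our assumption, since the text does not say which side they fall on.

```python
def suggested_density_weight(resolution, centroid=False):
    """Return the tutorial's rule-of-thumb elec_dens_fast weight.

    resolution -- map resolution in Angstroms
    centroid   -- True when scoring in centroid mode
    """
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    if centroid:
        return 10.0
    if resolution < 2.5:
        return 65.0
    if resolution <= 3.5:  # assumption: both boundaries fall in this range
        return 50.0
    # Worse than 3.5 A. Beyond ~6 A the tutorial says the weight may need to
    # be reduced further but gives no number, so none is invented here.
    return 35.0


if __name__ == "__main__":
    for reso in (2.0, 3.4, 4.1):
        print(reso, suggested_density_weight(reso))
```

The returned value is a starting point only; Section 2 describes how the weight may be tuned against an independent map.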

In addition to the score terms above, there are also several flags that control map scoring behavior. Maps are read into Rosetta using either the flag:

    -edensity::mapfile mapfile.mrc

Or from XML:

    <LoadDensityMap name="loaddens" mapfile="mapfile.mrc"/>

Maps may be in either CCP4 or MRC format (the map type is automatically detected from the header info).

The resolution of the map, used when comparing calculated to experimental density, is specified with the flag:

    -edensity::mapreso 5.0

Maps may also be resampled to reduce memory usage and runtime. This is done through the flag:

    -edensity::grid_spacing 2.0

Notice that this value should never be more than half the given resolution and, if using the fast scoring function, never more than a third of the resolution. For both parameters, the default is generally fine (don't resample, and assume the resolution is ~3x the grid sampling).
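The two bounds on grid spacing are easy to check before launching a job. The sketch below is a plain Python illustration of the rule stated above; the function names max_grid_spacing and grid_spacing_ok are hypothetical and nothing here calls Rosetta.

```python
def max_grid_spacing(resolution, fast_scoring=True):
    """Largest grid spacing (in Angstroms) allowed by the rule of thumb.

    The spacing should be at most half the resolution, or at most a third
    of the resolution when the fast density score (elec_dens_fast) is used.
    """
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    return resolution / (3.0 if fast_scoring else 2.0)


def grid_spacing_ok(resolution, grid_spacing, fast_scoring=True):
    """True if grid_spacing respects the bound for the given resolution."""
    return 0 < grid_spacing <= max_grid_spacing(resolution, fast_scoring)


if __name__ == "__main__":
    print(grid_spacing_ok(5.0, 2.0, fast_scoring=False))
    print(grid_spacing_ok(5.0, 2.0, fast_scoring=True))
```

For a 5.0Å map sampled at 2.0Å, this check passes the half-resolution bound but not the one-third bound that applies to elec_dens_fast, so a finer spacing (or the default) may be the safer choice in that case.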

Finally, one may choose to calculate density using either cryoEM or X-ray scattering factors. At low resolution this probably makes little difference, but it might at resolutions better than about 3.5Å. The default is to use X-ray scattering factors; to turn on cryoEM scattering factors instead, use the following flag:

    -edensity::cryoem_scatterers

Example 1A: Scoring a PDB in Rosetta with density

Most simply, one may wish to score a model using Rosetta's energy function including the density terms. This is easily accomplished using the score_jd2 application. A sample command line to rescore the structure in density is given in 1_rosetta_basics/A_run_rescore.sh. It illustrates the use of various density flags to provide Rosetta with experimental density information.

    $ROSETTA3/source/bin/score_jd2.macosclangrelease \
        -database $ROSETTA3/database/ \
        -in::file::s 1isrA.pdb 1issA.pdb \
        -ignore_unrecognized_res \
        -edensity::mapfile 1issA_6A.mrc \
        -edensity::mapreso 5.0 \
        -edensity::grid_spacing 2.0 \
        -edensity::fastdens_wt 35.0 \
        -edensity::cryoem_scatterers \
        -crystal_refine

Several of these flags deserve comment. First, the input structure is provided with the flag -in::file::s. This is common to many Rosetta applications, and more than one input may be provided; each will be processed independently. The flags beginning with -edensity:: tell Rosetta about the density map into which the model is being fit: the name of the map file (in CCP4 or MRC format), the resolution of the map, the grid sampling of the map (which should never be more than half the resolution), and the weights on the various fit-to-density scoring functions. These same flags are reused for many different protocols in addition to relax. Finally, the flag -crystal_refine turns on several density-related options related to PDB reading and writing, and should always be used when refining against density data.

Note: The input PDB must be aligned to the density map using some external tool. Optionally, Rosetta will rigid-body minimize the structure into density before rescoring if the flag -edensity::realign min is given to the application. If this is done, the flag -out::pdb will write the minimized structure to a PDB file.

This command line outputs a score file, score.sc, that gives, for each structure specified with -in::file::s, the score with respect to each term in Rosetta's energy function. The meaning of individual score terms, as well as an overview of the Rosetta energy function, can be found in the paper:

Alford RF, Leaver-Fay A, Jeliazkov JR, O'Meara MJ, DiMaio FP, Park H, Shapovalov MV, Renfrew PD, Mulligan VK, Kappel K, Labonte JW, Pacella MS, Bonneau R, Bradley P, Dunbrack RL Jr, Das R, Baker D, Kuhlman B, Kortemme T, Gray JJ. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J Chem Theory Comput. 2017 Jun 13;13(6):3031-3048.

Example 1B: Simple refinement into density using RosettaScripts and relax

In this section we introduce RosettaScripts by way of a very simple refinement-into-density example. RosettaScripts provides an XML scripting interface to Rosetta that allows fine-grained control of protocols. The syntax is fully described in the Rosetta documentation; however, a very brief introduction is provided here. The basic syntax for the XML is illustrated here (1_rosetta_basics/B_relax_density.xml):

    <ROSETTASCRIPTS>
        <SCOREFXNS>
            <ScoreFunction name="dens" weights="beta_cart">
                <Reweight scoretype="elec_dens_fast" weight="35.0"/>
                <Set scale_sc_dens_byres="R:0.76,K:0.76,E:0.76,D:0.76,M:0.76, C:0.81,Q:0.81,H:0.81,N:0.81,T:0.81,S:0.81,Y:0.88,W:0.88, A:0.88,F:0.88,P:0.88,I:0.88,L:0.88,V:0.88"/>
            </ScoreFunction>
        </SCOREFXNS>
        <MOVERS>
            <SetupForDensityScoring name="setupdens"/>
            <LoadDensityMap name="loaddens" mapfile="1issA_6A.mrc"/>
            <FastRelax name="relaxcart" scorefxn="dens" repeats="2" cartesian="1"/>
        </MOVERS>
        <PROTOCOLS>
            <Add mover="setupdens"/>
            <Add mover="loaddens"/>
            <Add mover="relaxcart"/>
        </PROTOCOLS>
        <OUTPUT scorefxn="dens"/>
    </ROSETTASCRIPTS>

There are three "blocks" of declarations in this script. In the first, <SCOREFXNS>…</SCOREFXNS>, the score functions to be used throughout the protocol are declared; in the second, <MOVERS>…</MOVERS>, movers (atomic operations that modify a structure) are declared; finally, in the third, <PROTOCOLS>…</PROTOCOLS>, a full protocol is declared as a sequence of movers.

In this particular example, we declare a single score function, dens, which uses the score function beta_cart (a default score function that needs no further attention here), and turns on elec_dens_fast, the fit-to-density score, with a weight of 35. We then declare three movers, SetupForDensityScoring, LoadDensityMap, and FastRelax, which set up the loaded structure for density scoring, load a map into memory, and then refine the structure using the FastRelax protocol. The declared score function, dens, is used as an input to the FastRelax mover.

To run this script, we use the following command line (1_rosetta_basics/B_relax_density.sh):

    $ROSETTA3/source/bin/rosetta_scripts.macosclangrelease \
        -database $ROSETTA3/database/ \
        -in::file::s 1isrA.pdb \
        -parser::protocol ex_B1_run_RS_relax_density.xml \
        -ignore_unrecognized_res \
        -edensity::mapreso 5.0 \
        -edensity::cryoem_scatterers \
        -crystal_refine \
        -out::suffix _relax \
        -beta

Note: We do not have to specify the density weight or the map file on the command line, since they are handled within the XML file. However, other density options must be specified on the command line. When using RosettaScripts, the density weights must be specified in the XML; the input map may be specified either way.

Finally, in the previous XML file, the tag cartesian=1 appears, which refines the structure in Cartesian space. Rosetta also allows refinement in torsional space, which may be better for capturing domain motion, and for further reduction in model parameters against low-resolution data. To enable torsional refinement (1_rosetta_basics/C_relax_tors_density.xml), we make three small changes to the XML (the score function weights, and the repeats and cartesian attributes of FastRelax):

    <ROSETTASCRIPTS>
        <SCOREFXNS>
            <ScoreFunction name="dens" weights="beta">
                <Reweight scoretype="elec_dens_fast" weight="35.0"/>
                <Set scale_sc_dens_byres="R:0.76,K:0.76,E:0.76,D:0.76,M:0.76, C:0.81,Q:0.81,H:0.81,N:0.81,T:0.81,S:0.81,Y:0.88,W:0.88, A:0.88,F:0.88,P:0.88,I:0.88,L:0.88,V:0.88"/>
            </ScoreFunction>
        </SCOREFXNS>
        <MOVERS>
            <SetupForDensityScoring name="setupdens"/>
            <LoadDensityMap name="loaddens" mapfile="1issA_6A.mrc"/>
            <FastRelax name="relaxcart" scorefxn="dens" repeats="5" cartesian="0"/>
        </MOVERS>
        <PROTOCOLS>
            <Add mover="setupdens"/>
            <Add mover="loaddens"/>
            <Add mover="relaxcart"/>
        </PROTOCOLS>
        <OUTPUT scorefxn="dens"/>
    </ROSETTASCRIPTS>

2) Model refinement via iterative local rebuilding

In this section, we introduce our cryoEM refinement protocol, which uses an iterative local rebuilding procedure to escape local minima during refinement. The section is divided into two parts: in the first, we introduce the method for non-symmetric systems; in the second, we describe how to use this method for symmetric systems.

As a running example, we refine models of the transmembrane region of the TRPV1 ion channel, using a 3.4Å cryoEM single particle reconstruction (M. Liao, E. Cao, D. Julius, Y. Cheng, Nature, 2013), and the deposited model (id: 3j5p) as a starting model. We will first refine this asymmetrically, and then introduce symmetric refinement.

Example 2A: Asymmetric refinement into cryoEM density

A summary of the XML used for refinement (2_cryoem_refinement/A_asymm_refine.xml) is shown below, followed by a brief description of the movers and options available.

    <ROSETTASCRIPTS>
        <SCOREFXNS>
            <ScoreFunction name="cen" weights="score4_smooth_cart">
                <Reweight scoretype="elec_dens_fast" weight="20"/>
            </ScoreFunction>
            <ScoreFunction name="dens_soft" weights="beta_soft">
                <Reweight scoretype="cart_bonded" weight="0.5"/>
                <Reweight scoretype="pro_close" weight="0.0"/>
                <Reweight scoretype="elec_dens_fast" weight="%%denswt%%"/>
            </ScoreFunction>
            <ScoreFunction name="dens" weights="beta_cart">
                <Reweight scoretype="elec_dens_fast" weight="%%denswt%%"/>
                <Set scale_sc_dens_byres="R:0.76,K:0.76,E:0.76,D:0.76,M:0.76, C:0.81,Q:0.81,H:0.81,N:0.81,T:0.81,S:0.81,Y:0.88,W:0.88, A:0.88,F:0.88,P:0.88,I:0.88,L:0.88,V:0.88"/>
            </ScoreFunction>
        </SCOREFXNS>
        <MOVERS>
            <SetupForDensityScoring name="setupdens"/>
            <LoadDensityMap name="loaddens" mapfile="%%map%%"/>
            <SwitchResidueTypeSetMover name="tocen" set="centroid"/>
            <MinMover name="cenmin" scorefxn="cen" type="lbfgs_armijo_nonmonotone"
                max_iter="200" tolerance="0.00001" bb="1" chi="1" jump="ALL"/>
            <CartesianSampler name="cen5_50" automode_scorecut="-0.5" scorefxn="cen"
                mcscorefxn="cen" fascorefxn="dens_soft" strategy="auto" fragbias="density"
                rms="%%rms%%" ncycles="200" fullatom="0" bbmove="1" nminsteps="25" temp="4"
                fraglens="7" nfrags="25"/>
            <CartesianSampler name="cen5_60" automode_scorecut="-0.3" scorefxn="cen"
                mcscorefxn="cen" fascorefxn="dens_soft" strategy="auto" fragbias="density"
                rms="%%rms%%" ncycles="200" fullatom="0" bbmove="1" nminsteps="25" temp="4"
                fraglens="7" nfrags="25"/>
            <CartesianSampler name="cen5_70" automode_scorecut="-0.1" scorefxn="cen"
                mcscorefxn="cen" fascorefxn="dens_soft" strategy="auto" fragbias="density"
                rms="%%rms%%" ncycles="200" fullatom="0" bbmove="1" nminsteps="25" temp="4"
                fraglens="7" nfrags="25"/>
            <CartesianSampler name="cen5_80" automode_scorecut="0.0" scorefxn="cen"
                mcscorefxn="cen" fascorefxn="dens_soft" strategy="auto" fragbias="density"
                rms="%%rms%%" ncycles="200" fullatom="0" bbmove="1" nminsteps="25" temp="4"
                fraglens="7" nfrags="25"/>
            <ReportFSC name="report" testmap="%%testmap%%" res_low="10.0" res_high="%%reso%%"/>
            <BfactorFitting name="fit_bs" max_iter="50" wt_adp="0.0005" init="1" exact="1"/>
            <FastRelax name="relaxcart" scorefxn="dens" repeats="1" cartesian="1"/>
        </MOVERS>
        <PROTOCOLS>
            <Add mover="setupdens"/>
            <Add mover="loaddens"/>
            <Add mover="tocen"/>
            <Add mover="cenmin"/>
            <Add mover="relaxcart"/>
            <Add mover="cen5_50"/>
            <Add mover="relaxcart"/>
            <Add mover="cen5_60"/>
            <Add mover="relaxcart"/>
            <Add mover="cen5_70"/>
            <Add mover="relaxcart"/>
            <Add mover="cen5_80"/>
            <Add mover="relaxcart"/>
            <Add mover="relaxcart"/>
            <Add mover="report"/>
        </PROTOCOLS>
        <OUTPUT scorefxn="dens"/>
    </ROSETTASCRIPTS>

The protocol is a bit involved, and is described in what follows. The first thing to note is the use of macros like "%%denswt%%". These are script variables whose values may be specified on the command line through the flag -parser::script_vars, for example -parser::script_vars denswt=25.0. The protocol above uses these macros in place of the parameters that users are most likely to change; other parameters should be left as is except by advanced users.
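To make the macro mechanism concrete, the following Python sketch mimics the textual replacement of %%name%% macros by name=value pairs. It is an illustration only and not Rosetta's implementation; the function names parse_script_vars and substitute_script_vars are hypothetical. A check like this can be handy for confirming that every macro in a script has a value before a long run is launched.

```python
import re


def parse_script_vars(args):
    """Turn ['denswt=35', 'rms=1.5'] into {'denswt': '35', 'rms': '1.5'}."""
    return dict(arg.split("=", 1) for arg in args)


def substitute_script_vars(xml_text, script_vars):
    """Replace every %%name%% macro in xml_text with its value.

    Raises KeyError if the script uses a macro that was not given a value.
    """
    def replace(match):
        name = match.group(1)
        if name not in script_vars:
            raise KeyError("no value given for %%" + name + "%%")
        return str(script_vars[name])

    return re.sub(r"%%(\w+)%%", replace, xml_text)


if __name__ == "__main__":
    snippet = '<Reweight scoretype="elec_dens_fast" weight="%%denswt%%"/>'
    values = parse_script_vars(["denswt=35", "rms=1.5"])
    print(substitute_script_vars(snippet, values))
```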

The main addition is the CartesianSampler mover. This mover iteratively rebuilds the structure locally, in user-specified or automatically determined regions. A brief description of the arguments to this mover:

name="cen5_70": the name of the mover, referred to in the <PROTOCOLS> block (this can be anything)

strategy="auto": the strategy to use in selecting what to rebuild. One of:

• auto: select regions automatically based on density fit & local strain (using the cutoff in automode_scorecut, e.g., automode_scorecut="-0.1")
• user: manually specify residues (using the flag residues, e.g., residues="32A-48A")
• rama: select regions automatically based on rama score only

rms="1.5": the cutoff on similarity when locally rebuilding. Increasing this value will increase model diversity (allowing worse starting models to be refined but requiring additional sampling)

ncycles="200": the number of rebuilding cycles to consider. Increasing this will increase runtime and slightly increase model diversity.

fraglens="7": the segment length to replace. This must be an odd number from 5-13, and increasing this value will increase model diversity significantly.

The remaining options should never be changed.

Another option is passed to the density scoring via the <Set scale_sc_dens_byres=.../> tag. In the refinement protocol, this sets a per-amino-acid sidechain reweighting. The weights shown in this example were determined by fitting these parameters using structures refined into several 3-5Å cryoEM density maps; the end result is a slight downweighting of sidechain density, particularly for charged sidechains. This should not be changed except by advanced users.

The MinMover first minimizes the structure using a low-resolution energy function (cen). We have found this step is most useful for improving protein backbone geometry, particularly with hand-traced models. This low-resolution score function uses the centroid representation, which is enabled by the SwitchResidueTypeSetMover.

The BfactorFitting mover fits real-space atomic B factors to maximize model-map correlation. A constraint enforcing nearby atoms to take the same B factors is also employed, and the weight on this term is controlled with the wt_adp term (0.0005 is generally well-behaved). Finally, init=1 means to do a quick scan of overall B factors before beginning refinement; if there is more than one call to this mover in a single trajectory, then only the first needs to have init=1. exact=1 should always be used.

Finally, the ReportFSC mover assesses model agreement to the map used for fitting, as well as to an independent map, using the integrated FSC over high-resolution shells. We have found that integrating from 10Å to the resolution of the data is best for model discrimination.

Finally, this protocol is executed using the following command (2_cryoem_refinement/A_asymm_refine.sh):

    $ROSETTA3/source/bin/rosetta_scripts.macosclangrelease \
        -database $ROSETTA3/database/ \
        -in::file::s 3j5p_transmem_A.pdb \
        -parser::protocol A_asymm_refine.xml \
        -parser::script_vars denswt=35 rms=1.5 reso=3.4 map=half1_34A.mrc testmap=half2_34A.mrc \
        -ignore_unrecognized_res \
        -edensity::mapreso 3.4 \
        -default_max_cycles 200 \
        -edensity::cryoem_scatterers \
        -beta \
        -out::suffix _asymm \
        -crystal_refine

Two sets of parameters should be changed when adapting this run for other systems. The first is the input structure, which is specified with the flag -in::file::s. The second is the set of parameters passed through to the script (using the macro replacement machinery of RosettaScripts). Three of these describe the input maps:

    map=half1_34A.mrc: the map to refine against
    testmap=half2_34A.mrc: an independent map for validation (if not using split maps, just provide the same map as the previous argument)
    reso=3.4: the resolution of the data (note that this needs to be provided twice in the command line, once for scoring and once for reporting)

The other two are parameters to the algorithm:

    denswt=35: the weight on the experimental density data
    rms=1.5: the amount of deviation to allow in fragment insertion moves (larger values will lead to more model deviation)

The density weight used above works reasonably well as a starting point, but one might want to explore several different values using an independent reconstruction. Manual inspection of output models for MolProbity score, free FSC, and (free FSC - work FSC) should provide clues as to which weight works best.

Much of the work in this section is described in the reference:

Wang RY, Song Y, Barad BA, Cheng Y, Fraser JS, DiMaio F. Automated structure refinement of macromolecular assemblies from cryo-EM maps using Rosetta. Elife. 2016 Sep 26;5. pii: e17219.

Job distribution

It is generally useful to sample ~100 models from each starting point. For this purpose, it may be useful to run multiple jobs in parallel. To prevent output structures from clobbering one another, the flag -out::suffix may be used to give each separate job a different suffix.

For example, on a 16-core machine, we may specify -out::suffix _$1 in the script, then (using GNU parallel) run the following:

    parallel -j16 ./A_asymm_refine.sh {} ::: {1..16}

Finally, GNU parallel allows launching of jobs remotely if SSH keys have been set up for passwordless login. To run:

    parallel -S 16/node1,16/node2,16/node3,16/node4 --workdir . ./B1_rosettaCM_singletarget.sh {} ::: {1..48}

This will instead launch 48 jobs spread across four machines. See the GNU parallel documentation (https://www.gnu.org/software/parallel/) for more information.

Analyzing results and model selection

While this is an active topic of research, the general approach, once a density weight has been chosen, is to select the best models from the full set by optimizing both model geometry and free FSC values. Model geometry may be evaluated using either: a) MolProbity, or b) Rosetta energies after subtracting density energies. The latter may be done by inspecting the score*.sc files produced as output.
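Option (b) can be scripted. The Python sketch below reads the text of a score file and returns, for each model, the total score with the density term subtracted. It rests on several assumptions that should be checked against your own output: that score lines start with "SCORE:", that the first such line names the columns, that the columns include total_score, elec_dens_fast and description, and that the per-term values in the file are already weighted. The function name non_density_scores is hypothetical.

```python
def non_density_scores(score_file_text, density_terms=("elec_dens_fast",)):
    """Map each model name to its total score minus the density terms.

    score_file_text -- full text of a score file
    density_terms   -- column names to subtract (assumed already weighted)
    """
    header = None
    scores = {}
    for line in score_file_text.splitlines():
        if not line.startswith("SCORE:"):
            continue  # skip SEQUENCE: and any other non-score lines
        fields = line.split()[1:]
        if header is None or fields == header:
            header = fields  # first (or repeated) header row names the columns
            continue
        row = dict(zip(header, fields))
        density = sum(float(row[term]) for term in density_terms if term in row)
        scores[row["description"]] = float(row["total_score"]) - density
    return scores


if __name__ == "__main__":
    example = (
        "SEQUENCE:\n"
        "SCORE: total_score elec_dens_fast fa_atr description\n"
        "SCORE: -500.0 -300.0 -200.0 model_0001\n"
    )
    print(non_density_scores(example))
```

Lower values indicate better geometry under this measure, so models can be sorted on the returned numbers directly.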

Using the above script, the free FSC value may be determined from the output PDB files. The header contains a line like:

    REMARK 1 FSC[mask=4.45657](10:3) = 0.590966 / 0.591017

The two numbers at the end of the line are the "work" and "free" integrated FSC values, respectively.

Generally, we select the best 20% of models by geometry, and then select the best overall by free FSC. The top 5 models should be checked for convergence and visually inspected for agreement with the density map.
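The selection procedure just described can be sketched in a few lines of Python. The regular expression is written against the single example REMARK line shown above, so other header layouts may need adjusting; the function names read_fsc and select_models are hypothetical, and the geometry score may be any measure where lower is better (for example, a MolProbity score or the Rosetta energy without density terms).

```python
import re

# Matches e.g. "REMARK 1 FSC[mask=4.45657](10:3) = 0.590966 / 0.591017"
FSC_LINE = re.compile(
    r"^REMARK\s+\d+\s+FSC\[[^\]]*\]\([^)]*\)\s*=\s*(\S+)\s*/\s*(\S+)"
)


def read_fsc(pdb_text):
    """Return (work_fsc, free_fsc) from PDB text, or None if no FSC line."""
    for line in pdb_text.splitlines():
        match = FSC_LINE.match(line)
        if match:
            return float(match.group(1)), float(match.group(2))
    return None


def select_models(models, geometry_fraction=0.2, n_keep=5):
    """Keep the best fraction by geometry, then rank those by free FSC.

    models -- list of (name, geometry_score, free_fsc) tuples, where a lower
              geometry_score and a higher free_fsc are better
    """
    by_geometry = sorted(models, key=lambda m: m[1])
    n_shortlist = max(1, int(len(by_geometry) * geometry_fraction))
    shortlist = by_geometry[:n_shortlist]
    return sorted(shortlist, key=lambda m: m[2], reverse=True)[:n_keep]


if __name__ == "__main__":
    line = "REMARK 1 FSC[mask=4.45657](10:3) = 0.590966 / 0.591017"
    print(read_fsc(line))
```

The returned shortlist still needs the manual checks for convergence and map agreement; the script only narrows down which models to look at.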

Example 2B: Symmetric refinement into cryoEM density

As TRPV1 is a symmetric system, to correctly evaluate its energetics we need to model it with symmetry-related copies present. This may be done through a two-step process: first, we run the make_symmdef_file.pl script to prepare a description of the symmetry of the system in a Rosetta-readable format. Next, we enable symmetric scoring and optimization within the XML file.

The information that Rosetta needs to know about a symmetric system is encoded in the symmetry definition file. It tells Rosetta: (a) how to score a structure symmetrically from only asymmetric unit interactions, and (b) how the rigid-body degrees of freedom are allowed to move to maintain the symmetry of the system.

To aid in creating a symmetry definition file from a symmetric (or near-symmetric) PDB, an application, make_symmdef_file.pl, has been included in src/apps/public/symmetry. To generate the symmetry definition file for TRPV1, we run the command in 2_cryoem_refinement/B1_make_symmdef.sh:

    $ROSETTA3/source/src/apps/public/symmetry/make_symmdef_file.pl \
        -m NCS -a A -i B \
        -p 3j5p_transmem.pdb -r 1000 > TRPV1.symm

This script needs a few pieces of information: with -m, the type of symmetry to generate (here NCS); with -a, the primary chain (here A); with -i, an adjacent chain in each symmetry group, separated by spaces (here just B); and with -p, the input PDB file. For Cn symmetries, only one adjacent chain is given; for Dn, two are given. Finally, with -r, we give the contact distance between a neighbor chain and the primary chain necessary to include that subunit explicitly (here, 1000, to ensure every symmetrically related copy is included). If the input system is not perfectly symmetric, the script will make a symmetrical version of it (sometimes significantly perturbing it in the process). There are many other options, including forcing symmetrical order, and helical and higher-order symmetries; see the documentation.

In addition to the definition file written to STDOUT, the script will also write a file 3j5p_transmem_symm.pdb, containing the symmetrized version of the input file, and a file 3j5p_transmem_INPUT.pdb, which contains only the primary chain, to be used as input (in addition to the symmetry definition file).

The symmetry definition file looks something like this:

    symmetry_name TRPV1__4
    E = 4*VRT0_base + 4*(VRT0_base:VRT3_base) + 2*(VRT0_base:VRT2_base)
    anchor_residue CoM
    ...
    set_dof JUMP0_to_com x(11.7023996817515)
    set_dof JUMP0_to_subunit angle_x angle_y angle_z
    ...

The omitted sections describe a system of virtual residues that maintain the symmetry of the system, and they generally should remain unedited.

The two set_dof lines should be edited. There are two possibilities when refining symmetric structures into density:

A) we don't want to refine the rigid body orientation of the entire system; we know the symmetry axes and we don't want them to move

B) we do want to refine the orientation of the entire system, including symmetry axes

Generally, in cryoEM, where the maps are symmetrically averaged and the symmetry is known, we want to use strategy A. However, in some cases (for example, if our starting model was not perfectly symmetric) we want to use strategy B. In both cases, a minor edit to the set_dof lines in the symmdef file is necessary.

For strategy A, because we are using density, we need to change the first set_dof line to the following:

    set_dof JUMP0_to_com x y z

For strategy B, we leave the two lines unchanged and instead add a third line:

    set_dof JUMP0 x y z angle_x angle_y angle_z

For Dn symmetries, the changes are similar, except in A the jump name is JUMP0_0_to_com. The rest of this section uses strategy A; the edited symmetry definition file is in 2_cryoem_refinement/C4_edit.symm.
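The two edits are small enough to make by hand, but when preparing many systems they can be automated. The Python sketch below applies either strategy to the text of a symmetry definition file. It is an illustration under stated assumptions: the function name edit_set_dof is hypothetical, and placing the extra strategy B line directly after the last existing set_dof line is our choice, since the text only says to add a third line.

```python
def edit_set_dof(symmdef_text, strategy, com_jump="JUMP0_to_com"):
    """Return symmdef_text with the set_dof edit for strategy 'A' or 'B'.

    com_jump -- name of the jump to the center of mass; pass
                "JUMP0_0_to_com" for Dn symmetries under strategy A
    """
    if strategy not in ("A", "B"):
        raise ValueError("strategy must be 'A' or 'B'")
    lines = symmdef_text.splitlines()
    dof_rows = [i for i, line in enumerate(lines)
                if line.split()[:1] == ["set_dof"]]
    if not dof_rows:
        raise ValueError("no set_dof lines found")
    if strategy == "A":
        # Replace the center-of-mass line so x, y and z are all free.
        for i in dof_rows:
            if lines[i].split()[1:2] == [com_jump]:
                lines[i] = "set_dof " + com_jump + " x y z"
                break
        else:
            raise ValueError("no set_dof line for " + com_jump)
    else:
        # Leave existing lines alone and add a whole-system degree of freedom.
        lines.insert(dof_rows[-1] + 1,
                     "set_dof JUMP0 x y z angle_x angle_y angle_z")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = (
        "symmetry_name TRPV1__4\n"
        "set_dof JUMP0_to_com x(11.7023996817515)\n"
        "set_dof JUMP0_to_subunit angle_x angle_y angle_z\n"
    )
    print(edit_set_dof(example, "A"), end="")
```

It is worth comparing the edited file against the original with a diff before use, since the virtual-residue sections must remain untouched.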

Once a symmetry definition file has been generated, refining structures symmetrically in Rosetta is straightforward. The changes to the previous XML file are indicated below (see 2_cryoem_refinement/B2_symm_refine.xml):

    ...
    <ScoreFunction name="cen" weights="score4_smooth_cart" symmetric="1">
    <ScoreFunction name="dens_soft" weights="beta_soft" symmetric="1">
    <ScoreFunction name="dens" weights="talaris2013_cart" symmetric="1">
    ...
    <SetupForSymmetry name="setupsymm" definition="%%symmdef%%"/>
    ...
    <SymMinMover name="cenmin" scorefxn="cen" type="lbfgs_armijo_nonmonotone"
        max_iter="200" tolerance="0.00001" bb="1" chi="1" jump="ALL"/>

In all three declared score functions, symmetric=1 must be given. Additionally, the SetupForDensityScoring mover must be replaced with the SetupForSymmetry mover. Finally, the MinMover must be replaced with its symmetric counterpart, SymMinMover.

The command for running this script is largely the same as before, with a few additions:

    $ROSETTA3/source/bin/rosetta_scripts.macosclangrelease \
        -database $ROSETTA3/database/ \
        -in::file::s 3j5p_transmem_A.pdb \
        -parser::protocol B2_symm_refine.xml \
        -parser::script_vars \
            denswt=25 rms=1.5 reso=3.4 map=half1_34A.mrc \
            testmap=half2_34A.mrc symmdef=TRPV1_edit.symm \
        -ignore_unrecognized_res \
        -score_symm_complex false \
        -edensity::mapreso 3.4 \
        -default_max_cycles 200 \
        -edensity::cryoem_scatterers \
        -beta \
        -out::suffix _symm \
        -crystal_refine

The script variable symmdef=TRPV1_edit.symm passes the symmetry definition file to Rosetta. The value of the flag -score_symm_complex depends on whether strategy A or B was employed above: for A, false should be used; for B, true should be used.

Note: The input PDB (-in::file::s) is of the monomer (that is, the asymmetric unit).

3) Model rebuilding guided by experimental density data

In this scenario, we introduce a tool, RosettaCM, for building missing portions of a model guided by density data. While primarily geared towards comparative modeling, it may also be useful for building portions of a protein that are disordered when crystallized, or difficult regions in hand-built models. We introduce the basic rebuilding protocol, then show how the tool may also be used to:

• Combine pieces from multiple template models guided by density
• Rebuild with user-defined restraints
• Iteratively rebuild models in difficult cases

As a running example, we use the 20S proteasome (Xueming Li et al., Nature Methods, 2013), where only a subset of particles were used, resulting in a 4.1Å reconstruction. We build models starting from a homologous structure (pdb id: 1iru; 25%/32% sequence identity to chains A/B).

Example 3A: Preparing templates for use in RosettaCM

In many cases, much of the setup work is handled by a script, setup_RosettaCM.py, in Rosetta Tools (a separate repository available from rosettacommons.org). This script takes an input alignment in a variety of formats, and prepares the inputs automatically. It is executed by running the command:

setup_RosettaCM.py \
    --fasta t20s.fasta \
    --alignment tmpl.fasta \
    --alignment_format fasta \
    --templates tmpl.pdb \
    --rosetta_bin ~/Rosetta/main/source/bin \
    --verbose

Inputs include the full-length fasta, an alignment file (in either fasta, ClustalW, or HHSearch format), and the corresponding template PDB files. This script will prepare all the necessary inputs in order to run RosettaCM.

Alternately, the setup may be performed manually. In this case, since we are using some nonstandard features (symmetry and density), and we have two chains in the asymmetric unit, we will do this; alternately, the inputs from the previous step may be used as a starting point and subsequently modified. We first convert our alignment to Rosetta format (scenario2_model_rebuilding/20S_1iru.ali):

## 1XXX_ 1iruAH_thread
# hhsearch
scores_from_program: 0 1.00
0 TVFSPDGRLFQVEYAREAVKK-GSTALGMKFANGVLLISDKKVRSRLIEQNSIEKIQLIDDYVAAVTSGLVADAR...
0 TIFSPEGRLYQVEYAFKAINQGGLTSVAVRGKDCAVIVTQKKVPDKLLDSSTVTHLFKITENIGCVMTGMTADSR...
--

In this format, the first line is '##' followed by a code for the target and one for the template. The second line identifies the source of the alignment; the third line should be kept as it is. The fourth line is the target sequence and the fifth is the template; the leading number on each is an 'offset', identifying where the sequence starts. This number does not use the PDB resid, but simply counts residues starting at 0. The sixth line is '--'. Multiple alignments may be concatenated in a single file, with the template code identifying the template corresponding to each alignment.
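To make the six-line layout concrete, here is a minimal Python sketch that reads one or more alignment blocks of the form described above. It is not part of Rosetta: the function name parse_rosetta_alignment and the dictionary keys are illustrative choices, and the sequences in the example are shortened placeholders rather than the real 20S alignment.

```python
def parse_rosetta_alignment(text):
    """Parse Rosetta-format alignment text into a list of dictionaries."""
    # Drop blank lines so concatenated blocks may be separated freely.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 6 != 0:
        raise ValueError("expected six lines per alignment block")
    entries = []
    for i in range(0, len(lines), 6):
        head, source, scores, target, template, end = lines[i:i + 6]
        if not head.startswith("##") or end != "--":
            raise ValueError("malformed alignment block")
        target_code, template_code = head[2:].split()
        t_off, t_seq = target.split(None, 1)
        m_off, m_seq = template.split(None, 1)
        if len(t_seq) != len(m_seq):
            raise ValueError("aligned sequences differ in length")
        entries.append({
            "target": target_code,
            "template": template_code,
            "source": source.lstrip("#").strip(),
            "scores": scores,
            # Offsets count residues from 0; they are not PDB resids.
            "target_offset": int(t_off),
            "template_offset": int(m_off),
            "target_seq": t_seq,
            "template_seq": m_seq,
        })
    return entries


# Shortened placeholder sequences, for illustration only.
example = """## 1XXX_ 1iruAH_thread
# hhsearch
scores_from_program: 0 1.00
0 TVFSPDGRLF-QVEY
0 TIFSPEGRLYGQVEY
--
"""
entries = parse_rosetta_alignment(example)
print(entries[0]["target"], entries[0]["template"], entries[0]["target_offset"])
```

A check like the length comparison above can catch a mis-edited alignment before it is passed to partial threading.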

RosettaCM takes as inputs partially threaded models, that is, models where aligned positions have their residue identities remapped, and unaligned residues are not present. To generate these models from an alignment file and template, we can run the Rosetta command (3_model_rebuilding/A_partialthread.sh):

$ROSETTA3/source/bin/partial_thread.macosclangrelease \
    -database $ROSETTA3/database/ \
    -in::file::fasta t20s.fasta \
    -in::file::alignment 20S_1iru.ali \
    -in::file::template_pdb 1iruAH_aln.pdb

This will output a partially threaded model in 1iruA_thread.pdb that is correctly numbered for input into RosettaCM.

Next, we need to set up symmetric modeling with RosettaCM. As in Scenario 1, we use the make_symmdef_file.pl script in order to generate a symmetry definition file for use in Rosetta. A straightforward way to do so is to use Chimera to dock the necessary chains into density. We need a single "primary chain" and an adjacent chain in each point group; since the proteasome features D7 symmetry, that means we need an adjacent chain in the 7-fold complex, as well as a chain in the opposite ring. An example has been created in scenario2_model_rebuilding/setup_symm.pdb, where three copies of the threaded model have been docked into density with Chimera. To generate our D7 symmetry file from this input, we then simply have to run the command (3_model_rebuilding/B_make_symmdef.sh):

~/rosetta_source/src/apps/public/symmetry/make_symmdef_file.pl \
    -m NCS -a A -i B C \
    -p setup_symm.pdb -r 1000 > D7.symm

Since we have already created the input templates using the partial_thread application, we can ignore the setup_symm_INPUT.pdb file and use the output of partial_thread as the input. However, we still need to align all the threaded models to this input structure. This can be done either with the program TMalign (external to Rosetta) or by using Chimera to dock the individual threaded models into density. In this case, where we have just one template, it has already been aligned to the template in scenario2_model_rebuilding/tmpl_thread_aln.pdb.

As in Scenario 1, we need to make a small edit to the symmetry definition file for density refinement. Change the following lines:

set_dof JUMP0_0_to_com x(35.3434689631743)
set_dof JUMP0_0_to_subunit angle_x angle_y angle_z
set_dof JUMP0_0 x(39.2905097135684) angle_x

To (3_model_rebuilding/D7_edit.symm):

set_dof JUMP0_0_to_com x y z
set_dof JUMP0_0_to_subunit angle_x angle_y angle_z

Note: The 20S proteasome case we are using contains two chains in the asymmetric unit. To specify this as input to RosettaCM, we need to separate the chains in the fasta file with the slash character '/'. This is really only necessary in the fasta provided as input to RosettaCM (next step); however, there is no harm in doing this in every step.

Example 3B: Running RosettaCM using a single template model as input.

Like the methods introduced in Scenario 1, RosettaCM is controlled through an XML script using RosettaScripts. The XML is as follows (3_model_rebuilding/C_rosettaCM_singletarget.xml):


<ROSETTASCRIPTS>
  <TASKOPERATIONS>
  </TASKOPERATIONS>
  <SCOREFXNS>
    <ScoreFunction name="stage1" weights="score3" symmetric="1">
      <Reweight scoretype="atom_pair_constraint" weight="0.1"/>
      <Reweight scoretype="elec_dens_fast" weight="10"/>
    </ScoreFunction>
    <ScoreFunction name="stage2" weights="score4_smooth_cart" symmetric="1">
      <Reweight scoretype="atom_pair_constraint" weight="0.1"/>
      <Reweight scoretype="elec_dens_fast" weight="10"/>
    </ScoreFunction>
    <ScoreFunction name="fullatom" weights="beta_cart" symmetric="1">
      <Reweight scoretype="atom_pair_constraint" weight="0.1"/>
      <Reweight scoretype="elec_dens_fast" weight="35"/>
      <Set scale_sc_dens_byres="R:0.76,K:0.76,E:0.76,D:0.76,M:0.76,C:0.81,Q:0.81,H:0.81,N:0.81,T:0.81,S:0.81,Y:0.88,W:0.88,A:0.88,F:0.88,P:0.88,I:0.88,L:0.88,V:0.88"/>
    </ScoreFunction>
  </SCOREFXNS>
  <FILTERS>
  </FILTERS>
  <MOVERS>
    <Hybridize name="hybridize" stage1_scorefxn="stage1" stage2_scorefxn="stage2"
        fa_scorefxn="fullatom" batch="1" stage1_increase_cycles="1.0"
        stage2_increase_cycles="1.0" linmin_only="0" realign_domains="0">
      <Template pdb="1iruA_thread.pdb" weight="1.0" cst_file="AUTO" symmdef="D7_edit.symm"/>
    </Hybridize>
  </MOVERS>
  <PROTOCOLS>
    <Add mover="hybridize"/>
  </PROTOCOLS>
</ROSETTASCRIPTS>

The main work is done through a single mover, Hybridize, which handles all stages of model-building. Input structures are specified via Template lines (in this case there is only one). For each template line, we specify the pdb input, as well as a couple of other parameters: a weight (the relative frequency with which we sample each template); a constraint file (setting this to "auto" sets up automatic constraints to the template, while setting this to "none" turns off all constraints; user-defined constraints are described later); and an (optional) symmetry definition file.

Note: Be sure that your templates are aligned to the density!

Given this XML, RosettaCM is then run with the following command line (3_model_rebuilding/B1_rosettaCM_singletarget.sh):

$ROSETTA3/source/bin/rosetta_scripts.macosclangrelease \
    -database $ROSETTA3/database/ \
    -in:file:fasta t20s.fasta \
    -parser:protocol C_rosettaCM_singletarget.xml \
    -nstruct 50 \
    -relax:jump_move true \
    -relax:dualspace \
    -out::suffix _singletgt \
    -edensity::mapfile t20S_41A_half1.mrc \
    -edensity::mapreso 5.0 \
    -edensity::cryoem_scatterers \
    -beta \
    -default_max_cycles 200

The input command is similar to those seen before, but with a few key differences. First, the input to Rosetta is specified with -in:file:fasta rather than -in:file:s. Also note that the input argument -nstruct 50 is given, telling Rosetta to generate 50 models for each process. Generally, hundreds to thousands of models are necessary to sufficiently sample conformational space; more and longer regions to rebuild require more models.

Running without symmetry

Running without symmetry requires only two small changes to the XML file:

• Remove the attribute symmdef="D7_edit.symm" from the Template line
• Remove the three symmetric="1" attributes from the score functions

Job distribution

As with section 2, a combination of -out::suffix and GNU parallel is useful for distributing jobs. For example, one may replace the -out::suffix line above with -out::suffix _$1, then launch jobs with:

parallel -j16 ./B1_rosettaCM_singletarget.sh {} ::: {1..16}

The total number of structures generated is the number of structures specified with -nstruct times the number of jobs launched (in this instance, 50 structures times 16 jobs = 800 structures). Depending on the runtime per structure (which varies with structure size) and the number of CPUs available, both of these numbers may be adjusted.

Finally, GNU parallel allows launching of jobs remotely if SSH keys have been set up for passwordless login. To run:

parallel -S 16/node1,16/node2,16/node3,16/node4 --workdir . ./B1_rosettaCM_singletarget.sh {} ::: {1..48}

This will instead launch 48 jobs spread across four machines. See the GNU parallel documentation for more information.

Example 3C: Running RosettaCM using multiple template models as input.

One of the strengths of RosettaCM is its ability to make use of multiple template structures, and to recombine portions of these models during conformational sampling. This is particularly useful when multiple homologous structures are available, some with closer sequence identity, and some with more complete coverage. The protocol allows the combination of features of both models.

To make use of this feature, we simply add additional template lines in the input XML. In this case, we add the template 1ryp (scenario3_model_rebuilding/C1_rosettaCM_multitarget.xml):

...
<Hybridize name="hybridize" stage1_scorefxn="stage1" stage2_scorefxn="stage2"
    fa_scorefxn="fullatom" batch="1" stage1_increase_cycles="1.0"
    stage2_increase_cycles="1.0" linmin_only="0" realign_domains="0">
  <Template pdb="1iruA_thread.pdb" weight="1.0" cst_file="auto" symmdef="D7_edit.symm"/>
  <Template pdb="1rypA_thread.pdb" weight="1.0" cst_file="auto" symmdef="D7_edit.symm"/>
</Hybridize>
...

The rest is handled automatically by the protocol. However, there are a few caveats when using multiple input structures:

• With density, we need to ensure that all input models are aligned to the density. This can be done using either TMalign or Chimera's alignment tools.

• In each trajectory, a starting model is chosen at random; the constraints and symmetry from this selected model are chosen at the start of each run. If we wish to use a portion of a model, but do not want to use its symmetry or constraints, we can assign it a weight of 0: backbone conformations from this model will be used in conformational sampling, but the symmetry and constraints will never be used.

• Similarly, gaps in the selected starting model are rebuilt before recombination occurs. If one of the templates has poor coverage, but provides valuable structural features, it should be used, but with weight 0.
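The effect of template weights on starting-model choice can be illustrated with a small Python simulation. This is only a sketch of the weighting behavior described above, not the Hybridize implementation; the function name pick_starting_template and the third file name, fragment_only.pdb, are hypothetical.

```python
import random


def pick_starting_template(templates, rng):
    """Pick one template with probability proportional to its weight."""
    names = [name for name, _ in templates]
    weights = [weight for _, weight in templates]
    return rng.choices(names, weights=weights, k=1)[0]


# Two templates with equal weight, plus a hypothetical low-coverage
# template given weight 0 so it is never used as the starting model.
templates = [
    ("1iruA_thread.pdb", 1.0),
    ("1rypA_thread.pdb", 1.0),
    ("fragment_only.pdb", 0.0),
]

rng = random.Random(0)  # fixed seed so the simulation is repeatable
picks = [pick_starting_template(templates, rng) for _ in range(1000)]
counts = {name: picks.count(name) for name, _ in templates}
print(counts)
```

In the simulation the weight-0 template is never drawn as a starting model, which mirrors the statement that its symmetry and constraints are never used while its backbone conformations remain available for sampling.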

Example 3D: Running RosettaCM with user-specified constraints.

Another strength of RosettaCM is the ability to make use of additional experimental information that provides restraints over conformational space. While previously we have used cst_file=auto to automatically generate constraints from template structures, if experiments provide distance constraints (or some other positional restraint), we may make use of them in model rebuilding as well.

The Rosetta documentation provides a good overview of the types of constraints that may be used, with a number of different constraint types and functional forms possible. For this demo, we will assume we have knowledge of the distance between residues 107 and 143 that we want to use during rebuilding.

This information can be encoded in a constraint file (scenario3_model_rebuilding/D1_constraints.cst):

AtomPair CA 107 CA 143 HARMONIC 5.0 1.0

Note: The numbering of residues is based upon the order in the input fasta file (and does not reset between chains!).
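Because the numbering runs continuously across chains, it can be helpful to compute constraint residue numbers from a chain and a position within that chain. The following Python sketch does this for a fasta sequence whose chains are separated by '/'. The helper names global_residue_number and atom_pair_line are hypothetical, and the two-chain sequence is a toy example rather than the 20S sequence.

```python
def global_residue_number(fasta_sequence, chain_index, residue_in_chain):
    """Convert (chain, position in chain) to continuous fasta numbering.

    Both chain_index and residue_in_chain are 1-based. Chains are
    separated by '/' and the numbering does not reset between them.
    """
    chains = "".join(fasta_sequence.split()).split("/")
    if not 1 <= chain_index <= len(chains):
        raise ValueError("chain_index out of range")
    if not 1 <= residue_in_chain <= len(chains[chain_index - 1]):
        raise ValueError("residue_in_chain out of range")
    offset = sum(len(chain) for chain in chains[:chain_index - 1])
    return offset + residue_in_chain


def atom_pair_line(res_i, res_j, distance, sd):
    """Format a CA-CA harmonic distance constraint line."""
    return f"AtomPair CA {res_i} CA {res_j} HARMONIC {distance:.1f} {sd:.1f}"


# Toy two-chain sequence (10 residues per chain), for illustration only.
toy_sequence = "ACDEFGHIKL/MNPQRSTVWY"
res_i = global_residue_number(toy_sequence, 1, 4)   # residue 4 of chain 1
res_j = global_residue_number(toy_sequence, 2, 3)   # residue 3 of chain 2
line = atom_pair_line(res_i, res_j, 5.0, 1.0)
print(line)
```

With the toy sequence, the third residue of the second chain becomes residue 13, since the ten residues of the first chain are counted first.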

We then replace the cst_file=auto lines in the XML with our own constraint file (scenario3_model_rebuilding/D1_constraints.xml):

...
<Template pdb="tmpl_thread_aln.pdb" weight="1.0" cst_file="D1_constraints.cst" symmdef="D7_edit.symm"/>
...

We can then rebuild and refine as before.

Example 3E: Model selection and running RosettaCM iteratively

With possibly hundreds of generated models, there are a few strategies to identify the best-sampled models. Generally, models should be filtered on two different criteria, the total score and the density score, in some way. We often select the best 10-20% of models based on total score, and then sort these models by density score, but visual inspection of the best models by both criteria can often be beneficial in difficult cases.
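The two-stage selection described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a Rosetta utility: it assumes the score file uses "SCORE:" lines with a header row, that the columns are named total_score, elec_dens_fast, and description, and that lower values are better for both. The score values and model names in the example are made up.

```python
def read_scores(text):
    """Read 'SCORE:' lines; the first one is taken as the column header."""
    header = None
    rows = []
    for line in text.splitlines():
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()[1:]
        if header is None:
            header = fields
        else:
            rows.append(dict(zip(header, fields)))
    return rows


def select_models(rows, top_frac=0.2, total_col="total_score",
                  dens_col="elec_dens_fast"):
    """Keep the best top_frac by total score, then sort by density score."""
    by_total = sorted(rows, key=lambda row: float(row[total_col]))
    n_keep = max(1, int(len(by_total) * top_frac))
    return sorted(by_total[:n_keep], key=lambda row: float(row[dens_col]))


# Made-up scores, for illustration only.
score_text = """SEQUENCE:
SCORE: total_score elec_dens_fast description
SCORE: -350.0 -120.0 S_singletgt_0001
SCORE: -340.0 -150.0 S_singletgt_0002
SCORE: -300.0 -200.0 S_singletgt_0003
SCORE: -360.0 -100.0 S_singletgt_0004
"""
best = select_models(read_scores(score_text), top_frac=0.5)
print([row["description"] for row in best])
```

Note that model 0003 has the best density score in this toy table but is excluded by the total-score filter, which is the purpose of filtering on total score first.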

Finally, one strategy for solving difficult structures is to apply RosettaCM iteratively. Using the above criteria, we can select the best 5-10 models from the first round of refinement, and feed them as inputs into the next round. Models can be selected by energy by looking at the score column in the output .sc files.


This is very briefly illustrated in the following XML (scenario3_model_rebuilding/E1_rosettaCM_iter.xml):

...
<Hybridize name="hybridize" stage1_scorefxn="stage1" stage2_scorefxn="stage2"
    fa_scorefxn="fullatom" batch="1">
  <Template pdb="expected_outputs/S_multitgt_0001_A.pdb" weight="1.0" cst_file="NONE" symmdef="D7_edit.symm"/>
  <Template pdb="expected_outputs/S_multitgt_0002_A.pdb" weight="1.0" cst_file="NONE" symmdef="D7_edit.symm"/>
  <Template pdb="expected_outputs/S_multitgt_0003_A.pdb" weight="1.0" cst_file="NONE" symmdef="D7_edit.symm"/>
</Hybridize>
...

There is also some manipulation of input models that can prove beneficial. If one wants to force rebuilding of some segment of backbone, one can simply delete those residues in all input models. Similarly, if one wants to force some region to adopt the conformation taken from one input model, one can delete that region from all other input models.

Example 3F: Running with Ligands and/or Nucleic Acids

RosettaCM can be run with ligands as well as nucleic acids. While nucleic acids are read by Rosetta natively, ligands require an additional input to Rosetta, a params file.

For ligands, a mol2 file of the ligand which contains hydrogen atoms is required. The program OpenBabel (http://openbabel.org/wiki/Main_Page) is capable of both converting from PDB to mol2 and adding hydrogens to a molecule, given a PDB file of the ligand.

Given a mol2 file, molfile_to_params.py, included in Rosetta at $ROSETTA3/source/scripts/python/public/, will generate a params file. To use:

python $ROSETTA3/source/scripts/python/public/molfile_to_params.py \
    --keep-names --clobber --centroid XXX.mol2 -p XXX -n XXX

Replace "XXX" with the ligand's three-letter code. This script will create an XXX.params file as well as several other files. The params files can be passed into Rosetta using the flags:

-extra_res_fa XXX.fa.params -extra_res_cen XXX.cen.params

If there is more than one unique ligand, run the script on each unique ligand, and pass a list of params files using each of the flags.

When modelling with ligands or nucleic acids in RosettaCM, two additional things are needed:

1. The ligand or nucleic acids must be added to the END of each template file with a weight > 0
2. An additional flag needs to be added to Hybridize:

<Hybridize name="hybridize" stage1_scorefxn="stage1" stage2_scorefxn="stage2" fa_scorefxn="fullatom" batch="1" add_hetatm="1">


4) De novo model-building guided by experimental density data

In this scenario, we introduce a tool, denovo_density, aimed at automatically building backbone and placing sequence in 3-4.5Å cryoEM density maps. This tool is primarily intended for cases where a model is to be built with no known structural homologues. It is relatively expensive computationally, and consists of four basic steps:

• Search for local backbone "fragments" in the density map
• Score the "compatibility" of sets of placed fragments
• Monte Carlo sampling for the "maximally compatible" fragment set
• Consensus assignment from the best-scoring Monte Carlo trajectories

The input of the method is a segmented density map, and the sequence contained in this region. The output of the method is a "partial model" that, ideally, places 70-80% of the sequence into the density map. This partial model can then be used as input for RosettaCM (see section 3 of the tutorial).

This method is under constant development. Current limitations of the approach (hopefully to be addressed in future revisions) include:

• The code assumes segmented density maps. It cannot handle symmetry, and poorly handles unsegmented density maps. Results are best when the map is segmented to contain only one copy of the residues getting built.

• It has only been tested building proteins of ~600 residues or less. While it should conceptually scale to larger proteins, this is untested. Furthermore, with larger proteins, the memory usage of steps 2 and 3 increases significantly, so care must be taken.

• It currently does not identify and build ligands or nucleic acids.

This section of the tutorial walks through these four steps of the protocol. As a running example, we again use the 20S proteasome (Xueming Li et al., Nature Methods, 2013), in this case using a 4.8Å reconstruction determined without using motion correction. We will pretend that known homologous structures are not available, and instead will build models into density de novo.

Input file preparation: download fragment files from Robetta

Before running the method, a user must first create a "fragment file" that predicts local backbone conformations given the amino-acid sequence. The easiest way to do so is to submit your sequence at http://robetta.bakerlab.org/.

Alternately, the Rosetta user's guide describes how fragment files may be built locally: (https://www.rosettacommons.org/manuals/archive/rosetta3.5_user_guide/dc/d10/app_fragment_picker.html)

Step 4A. Local fragment search

In the first part of the procedure, we search the density map for each sequence-predicted backbone fragment. This part, like all the steps in this section, uses a Rosetta application, denovo_density.

The command to run fragment searching for a single residue (4_denovo_demo/A_search.sh):


$ROSETTA3/source/bin/denovo_density.macosclangrelease \
    -in::file::fasta t20sA.fasta \
    -fragfile ./t001_.25.9mers \
    -mapfile ./T20S_48A_alpha_chainA.mrc \
    -n_to_search 500 -n_filtered 2500 -n_output 100 \
    -bw 16 \
    -atom_mask_min 2 \
    -atom_mask 3 \
    -clust_radius 3 \
    -clust_oversample 4 \
    -point_radius 3 \
    -movestep 1 \
    -delR 2 \
    -frag_dens 0.8 \
    -ncyc 3 \
    -min_bb false \
    -pos $1 \
    -out:file:silent round1/t20s.$1.silent

Most of the arguments shown here should be left as-is. However, there are a few, highlighted above in boldface, that you might want to change:

-n_to_search 500 -n_filtered 2500

These flags control the number of translations to search, and the number of intermediate solutions to keep. As a rule of thumb, these should be about 2 and 10 times the number of residues in the map, respectively.
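As a worked example of this rule of thumb, the short Python sketch below computes the two values from the number of residues. The function name suggested_search_sizes is hypothetical; it simply applies the 2x and 10x multipliers stated above.

```python
def suggested_search_sizes(n_residues):
    """Apply the rule of thumb: about 2x and 10x the residue count."""
    return {
        "n_to_search": 2 * n_residues,
        "n_filtered": 10 * n_residues,
    }


# The example map contains 221 residues.
sizes = suggested_search_sizes(221)
print(sizes)
```

For 221 residues this gives 442 and 2210; the values in the command above (500 and 2500) are somewhat larger, in keeping with "about".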

Make note of the following flag:

-pos $1

This flag tells the code to only search for fragments at the assigned position. This allows for parallelization of the script, by running separate jobs for each position in the protein. ($1 means the script takes the position as an input argument.)

For this step, you need to run the script once for each position in the protein. This can be done very simply with the bash command (for this case, the 221 indicates there are 221 residues in the protein):

for i in `seq 1 221`; do ./A_search.sh $i; done

The output of this script is a single file for each position in the protein that identifies the placement and configuration of each docked fragment. These files are used as input for the next step of the process.

However, as each position runs independently, and each position might take 30 minutes to an hour for the search, you will probably want to parallelize this over many processors.

Job distribution

As this is the most computationally intensive step, it makes sense to parallelize it by running this procedure separately for each position in the protein. Using GNU parallel:

parallel -j16 ./A_search.sh {} ::: {1..221}

GNU parallel allows launching of jobs remotely if SSH keys have been set up for passwordless login. To run:


parallel -S 16/node1,16/node2,16/node3,16/node4 --workdir . ./A_search.sh {} ::: {1..221}

This will instead spread the 221 jobs across four machines. See the GNU parallel documentation for more information.

Other useful options

There are two options controlling fragment placement that may also be useful in cases where there is some previous knowledge about the structure in question. The first of these deals with the case where the backbone structure is known (or at least somewhat known) but registration of the sequence with the backbone model is ambiguous. In this case, a known backbone model can be provided with the flag:

-ca_positions backbone.pdb

In this case, the code will only consider fragments centered on the C-alpha positions from the input model. This offers a significant speedup as well as a reduced search space.

Alternately, if part of the structure is known in advance, it may be provided with the flag:

-startmodel start.pdb

In this case, the matching routine will match only the native fragments covered by startmodel, ensuring that these positions will be maintained throughout the refinement. When using this option, there are two important things to keep in mind:

• The numbering in the PDB file must match the numbering of the input fasta
• Any continuous segments shorter than 9 amino acids in the input file will get ignored
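Before passing a start model, it may be worth checking which of its segments fall below the 9-residue limit. The Python sketch below groups residue numbers into continuous runs and separates those shorter than 9. It assumes continuity can be judged from residue numbering alone (the code may apply other criteria), and the helper names and residue ranges are hypothetical.

```python
def continuous_segments(residue_numbers):
    """Group residue numbers into (start, end) runs of consecutive values."""
    segments = []
    for num in sorted(set(residue_numbers)):
        if segments and num == segments[-1][1] + 1:
            segments[-1][1] = num          # extend the current run
        else:
            segments.append([num, num])    # start a new run
    return [(start, end) for start, end in segments]


def split_by_length(segments, min_len=9):
    """Separate runs that meet the minimum length from those that do not."""
    used = [(s, e) for s, e in segments if e - s + 1 >= min_len]
    ignored = [(s, e) for s, e in segments if e - s + 1 < min_len]
    return used, ignored


# Hypothetical start model containing residues 5-29, 40-45, and 60-68.
numbers = list(range(5, 30)) + list(range(40, 46)) + list(range(60, 69))
used, ignored = split_by_length(continuous_segments(numbers))
print("used:", used)
print("ignored:", ignored)
```

In this example the six-residue run 40-45 would be flagged as too short, so it could be extended or removed before running the search.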

Step 4B. Placed fragment scoring

In this step, we want to take the placements from the previous step and score them for compatibility. The outputs from step A are used as inputs in this step (4_denovo_demo/B_score.sh):

$ROSETTA3/source/bin/denovo_density.macosclangrelease \
    -mode score \
    -in::file::silent round1/t20s*silent \
    -scorefile round1/scores1 \
    -n_matches 50

Highlighted in bold are the input files (-in::file::silent), which are the outputs from the previous step, and the score file to be written (-scorefile), which is the output of this step. If the output is written to a separate folder, you will need to point the command line to this alternate location. This step is relatively fast (less than 5 minutes) and can be run on a single processor.

Step 4C. Monte Carlo fragment assembly

In the third step, we use the outputs from the previous two steps, and try to generate a "maximally consistent" fragment assignment. It uses Monte Carlo sampling and a score function assessing fragment compatibility to identify this fragment set. The command line (4_denovo_demo/C_assemble.sh):

$ROSETTA3/source/bin/denovo_density.macosclangrelease \
    -mode assemble \
    -nstruct 5 \
    -in::file::silent round1/t20s*silent \
    -scorefile round1/scores1 \
    -assembly_weights 4 20 6 \
    -null_weight -150 \
    -out:file:silent round1/assembled.$1 \
    -scale_cycles 1 \
    -mute core

As with step B, the outputs of the previous two steps need to be provided as inputs:

-in::file::silent round1/t20s*silent -scorefile round1/scores1

The script then writes a single file for each job:

-out:file:silent round1/assembled.$1

Finally, each job will generate several (in this case 5) independent trajectories:

-nstruct 5

It is recommended to generate a total of 1000 independent trajectories. As with step A, this can be somewhat time-consuming (though not as time-consuming as step A). Therefore it is recommended to parallelize this as before:

parallel -j16 ./C_assemble.sh {} ::: {1..200}

Or across multiple machines:

parallel -S 16/node1,16/node2,16/node3,16/node4 --workdir . ./C_assemble.sh {} ::: {1..200}
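The job count of 200 in the commands above follows from the recommended 1000 trajectories divided by -nstruct 5. A minimal Python sketch of that arithmetic, using a hypothetical helper jobs_needed that rounds up when the division is not exact:

```python
def jobs_needed(total_trajectories, nstruct):
    """Number of jobs needed so that jobs * nstruct >= total_trajectories."""
    if nstruct < 1:
        raise ValueError("nstruct must be at least 1")
    # Ceiling division using integer arithmetic only.
    return -(-total_trajectories // nstruct)


n_jobs = jobs_needed(1000, 5)
print(n_jobs)
```

If -nstruct is changed, the upper bound of the {1..200} range passed to GNU parallel would need to change accordingly.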

Step 4D. Consensus assignment

The final step of the protocol is to identify the consensus assignment from the lowest-scoring Monte Carlo trajectories. This is done using the following command (4_denovo_demo/D_consensus.sh):

$ROSETTA3/source/bin/denovo_density.macosclangrelease \
    -mode consensus \
    -in::file::silent round1/assembled.*silent \
    -consensus_frac 1.0 -energy_cut 0.05 \
    -mute core

The output trajectories of the previous step are provided as input to this script with -in::file::silent. This script looks for a consensus assignment in the best-scoring trajectories, and will output a PDB file, S_0001.pdb.


This PDB file contains sequence placed into density. Ideally, the model at this point is more than 70% complete, and this file can then be used as input to RosettaCM (see section 3). If instead this structure contains a reasonable partial model, but with less than 70% coverage, the iterative approach of the next section can further improve the coverage of the partial model.

Step 4E. Iterative assembly to increase model coverage

In some cases, it may be necessary to iterate refinement, as subsequent rounds of de novo building may trace portions of the model that could not be placed in previous rounds. The following command line illustrates how assembly may be iterated (4_denovo_demo/E_search_iter.sh):

$ROSETTA3/source/bin/denovo_density.macosclangrelease \
    -in::file::fasta t20sA.fasta \
    -fragfile ./t001_.25.9mers \
    -startmodel round1_model.pdb \
    -mapfile ./T20S_48A_alpha_chainA.mrc \
    -n_to_search 250 -n_filtered 1250 -n_output 50 \
    -bw 16 \
    -atom_mask_min 2 \
    -atom_mask 3 \
    -clust_radius 3 \
    -clust_oversample 4 \
    -point_radius 3 \
    -movestep 1 \
    -delR 2 \
    -frag_dens 0.8 \
    -ncyc 3 \
    -min_bb false \
    -pos $1 \
    -out:file:silent round2/t20s.$1.silent

This step is nearly identical to step A, with two key changes highlighted in bold. The first change indicates that the output of the previous step is to be used as an initial model:

-startmodel round1_model.pdb

Following inspection, this model may also be manually edited. The second change makes sure the outputs don't clobber the outputs from the previous step:

-out:file:silent round2/t20s.$1.silent

As with step A, this should be parallelized over many processors, e.g., using GNU parallel:

parallel -S 16/node1,16/node2,16/node3,16/node4 --workdir . ./E_search_iter.sh {} ::: {1..221}

Once complete, steps B, C, and D may then be followed in order, making sure that the input and output files are updated to indicate the round 2 models.

NOTE: At each step, be certain that your outputs are not overwriting one another. Once the partial model has been calculated, it is safe to delete the intermediate files created as part of the construction process. In this case, the same output file names may be reused (thus, the same scripts can be used for each iteration aside from the first).


5) Completing partial models guided by experimental density data

While the previous workflows have addressed model building and model refinement, none of the aforementioned tools deals with completion of large segments of protein. These may arise in several cases:

1. Homology models (particularly distant ones) may have large insertions, or even entire domains that are lacking.
2. The models produced from denovo_density may be missing significant fractions of the backbone.
3. It may be difficult to manually trace long stretches of low local resolution into density.

To address these issues, we have developed a tool called Rosetta Enumerative Sampling, which uses an ensemble search algorithm to determine a large number of conformations that are consistent with both the density and the Rosetta energy function. This tool can be used on partial models from the denovo_density application, an incomplete homology model, or any other starting structure.

RosettaES runs best when working with data at resolutions of 5Å or better, with segments to rebuild shorter than 50 residues. However, with very large amounts of sampling (e.g., ensemble sizes > 250), reliable models may be produced with segments longer than 100 residues. At resolutions worse than 5Å, this tool may be unreliable. The method can be used on both segmented and unsegmented density maps; however, removal of density belonging to parts of the structure not being modeled may improve results.

RosettaES model building consists of three steps. Initially, a preparation step builds the fragments that are to be used in conformational sampling. Then a rebuilding step will identify each unassigned segment in the initial model and build an ensemble of possible solutions for each. Finally, a combination step finds all the consistent subsets of interactions, and refines all such models (if there is only one segment, the script simply refines all structures in the ensemble). In this combination step, if assembly fails to find a consistent set of solutions, an additional round of sampling will be carried out, forcing different solutions than the previous model.

Compared to the other sections, the workflow is a bit more complicated when extended to multiple compute cores. To handle job distribution we have included a python script, RunRosettaES.py, that manages this job distribution among available CPUs on a single machine. (The script is included as part of Rosetta, in /main/source/scripts/python/public/EnumerativeSampling, as well as in this tutorial.) For dealing with job schedulers or clusters incompatible with this script, section 5E gives an overview of job distribution with RosettaES.

Step 5A. Fragment Picking

The first step, much like Scenario 4, involves selection of "fragment files," which predict backbone conformation from local sequence. Unlike Scenario 4, we have a custom algorithm for fragment picking. These fragments will need to be generated before running RosettaES; the following command will generate these files (5_rosettaES/A_PickFragments.sh):

$ROSETTA3/source/bin/grower_prep.default.macosclangrelease \
    -pdb input.pdb \
    -in::file::fasta t20sA.fasta \
    -fragsizes 3 9 \
    -fragamounts 100 20

This will generate 100 3-residue fragments and 20 9-residue fragments, named 100.3mers and 20.9mers, that are then used in subsequent steps of the rebuilding process.


Step 5B. Generate Possible Conformations For Each Segment

The grower considers assigning positions for each unassigned segment of density (that is, each stretch of amino acids present in the fasta file but missing from the input structure). Each segment is referred to using a segment id, in which each segment is numbered from N- to C-terminus (with multiple chains given in order in the input fasta file). The script is run in two parts: first, the script is run once for each segment to rebuild; then, the script is run in "assembly mode" given the outputs produced by rebuilding each segment individually. Thus, for rebuilding the two segments in the test case, the script is called three times: once to build each segment, and once to assemble the results.

In the first step, we perform conformational sampling of each of the two segments, generating an ensemble of putative solutions for each. This can be done by calling the command (5_rosettaES/B1_SampleSegment1.sh):

python RunRosettaES.py \
    -rs runES.sh \
    -x RosettaES.xml \
    -f t20sA.fasta \
    -p input.pdb \
    -d T20S_48A_alpha_chainA.mrc \
    -l 1 \
    -c 16 \
    -n loop_1

The arguments to RunRosettaES.py are as follows:

• -rs runES.sh - the script that is launched on each core and contains Rosetta flags and inputs
• -x RosettaES.xml - the XML script describing parameters for conformational sampling (see below)
• -f t20sA.fasta - the input fasta file (with chain breaks specified by '/')
• -p input.pdb - the input pdb file. This needs to match the input sequence, and all residues present in the fasta but absent in the PDB will get built.
• -d T20S_48A_alpha_chainA.mrc - the input density map
• -l 1 - the segment id of the segment to rebuild. This command should be called once for each segment to rebuild, varying this argument from 1 to N
• -c 16 - the number of compute cores to use
• -n loop_1 - the output tag for this job (results will be placed in a folder with this name). Tags should be unique for each segment.

The input XML file exposes key parameters for conformational sampling. In the tutorial, this file, 5_rosettaES/RosettaES.xml, contains a block:

...
<FragmentExtension name="ext" fasta="full.fasta" scorefxn="dens" censcorefxn="cendens"
    beamwidth="32" dumpbeam="0" samplesheets="1" read_from_file="0"
    continuous_weight="0.3" looporder="1" comparatorrounds="100" windowdensweight="30"
    readbeams="%%readbeams%%" storedbeams="%%beams%%" steps="%%steps%%" pcount="%%pcount%%"
    filterprevious="%%filterprevious%%" filterbeams="%%filterbeams%%">
  <Fragments fragfile="100.3mers"/>
  <Fragments fragfile="20.9mers"/>
</FragmentExtension>
...

The sampling behavior of RosettaES is controlled by the block above. Many of the attributes in this block (fasta, dumpbeam, read_from_file, storedbeams, steps, pcount, filterprevious, comparatorrounds, and filterbeams) are used by the job distribution script to pass results from one step to the next, and they should be left as-is.


Others are user-specified, and can be modified based on the size of the loop and resolution of the data:

• beamwidth: controls the maximum number of solutions to be held at each step. Setting the value higher will increase runtime but may improve accuracy.
• windowdensweight: the relative contribution of density in model selection

For many cases, the default parameters are sufficient. However, if the segment to grow is long (50+ residues), you may need to increase beamwidth; if the density is low resolution, you might need to decrease windowdensweight to 15 or 20.

Several options should rarely be modified, but may need to be in specific cases:

• samplesheets: Controls whether or not beta sheet sampling should be performed. It is recommended to use this except when working with symmetric systems.

• continuous_weight: Controls the penalty on discontinuous density. Setting this value to 1 will completely remove any penalty on discontinuous density; setting it closer to 0 will increase the penalty. You may wish to raise this value to 0.7 (or more) if you anticipate the segment you are trying to model does not follow a continuous path of density.
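As an illustration of adjusting the user-specified options, the sketch below edits attribute values in a copy of the XML text. The helper set_attribute is hypothetical (it is not part of Rosetta or of the tutorial scripts), the tag is abbreviated, and the beamwidth of 64 is an arbitrary illustrative value; editing the file by hand works equally well.

```python
import re


def set_attribute(xml_text: str, name: str, value: str) -> str:
    """Return xml_text with the first name="..." attribute set to value."""
    pattern = re.compile(r'\b' + re.escape(name) + r'="[^"]*"')
    replacement = '{}="{}"'.format(name, value)
    # A function replacement avoids any backslash interpretation in value.
    new_text, count = pattern.subn(lambda match: replacement, xml_text, count=1)
    if count == 0:
        raise KeyError("attribute not found: " + name)
    return new_text


# Abbreviated stand-in for the FragmentExtension tag shown earlier.
block = ('<FragmentExtension name="ext" beamwidth="32" '
         'continuous_weight="0.3" windowdensweight="30">')

# Hypothetical case: a long segment in a low-resolution map.
tuned = set_attribute(block, "beamwidth", "64")
tuned = set_attribute(tuned, "windowdensweight", "20")
print(tuned)
```

Only the first occurrence of each attribute is changed, and the %%...%% placeholders used by the job distribution script are left untouched.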

Finally, the option comparatorrounds is used in multi-segment assembly (see section 5C).

After running the script with this XML, there are two important intermediate output files, placed in the folder loop_1 (the argument to -n):

• .lps (for loop partial solution) files, which are then combined in step 5C, in cases where there are multiple segments to model

• taboo/beamX.txt files, where X corresponds to the number of residues added to the segment. These are generated as the search adds residues, and are used to pass information from one step to the next (as additional residues are added in a single segment).
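To see how far a segment has grown, the beam files in the taboo folder can be ordered by the number in their names. The sketch below is a minimal example that assumes the names follow exactly the beamX.txt pattern described above; the helper names are hypothetical, and other files in the folder are ignored.

```python
import re
from pathlib import Path

# Matches names such as beam17.txt, but not beam_0.txt or beam1.1.txt.
BEAM_NAME = re.compile(r"^beam(\d+)\.txt$")


def residues_built(filename):
    """Return X for a name of the form beamX.txt, or None otherwise."""
    match = BEAM_NAME.match(filename)
    return int(match.group(1)) if match else None


def latest_beam_file(taboo_dir):
    """Return (X, path) for the beam file with the largest X, or None."""
    best = None
    for path in Path(taboo_dir).iterdir():
        count = residues_built(path.name)
        if count is not None and (best is None or count > best[0]):
            best = (count, path)
    return best


if __name__ == "__main__":
    folder = Path("loop_1/taboo")
    if folder.is_dir():
        print(latest_beam_file(folder))
```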

Note: This process should then be repeated for all remaining segments to rebuild. In the tutorial, the command 5_rosettaES/B2_SampleSegment2.sh builds conformations for the second segment in this file. All segments can be sampled independently of one another, so if many compute nodes are available, each segment can be sampled simultaneously on separate nodes.

Finally, while in most cases users will want to take these models into the assembly step (part 5C), if there is only one segment to rebuild, or if you want to inspect the sampling results, the final output ensemble can be saved as PDB files with the command (5_rosettaES/B1.2_InspectIntermediates.sh):

    python RunRosettaES.py \
        -rs runES.sh \
        -x RosettaES.xml \
        -f t20sA.fasta \
        -p input.pdb \
        -d T20S_48A_alpha_chainA.mrc \
        -l 1 \
        -db loop_1/taboo/beam17.txt

Note that the number of the beam file (17) corresponds to the total number of residues built. Intermediate results (after growing N residues) can be inspected by changing this to a lower number (e.g., beam14.txt shows solutions after 14 of the 17 residues have been rebuilt).
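Because the sampling command is called once per segment, with -l running from 1 to N and a unique -n tag each time, the per-segment command lines can be generated rather than typed. The following sketch only builds and prints the commands, using the tutorial's file names; the function names and the loop_<id> tag scheme for segments beyond the first are assumptions made for illustration.

```python
import shlex
from typing import List


def segment_command(segment_id: int, cores: int = 16) -> List[str]:
    """Build the sampling command for one segment, using the tutorial inputs."""
    return [
        "python", "RunRosettaES.py",
        "-rs", "runES.sh",
        "-x", "RosettaES.xml",
        "-f", "t20sA.fasta",
        "-p", "input.pdb",
        "-d", "T20S_48A_alpha_chainA.mrc",
        "-l", str(segment_id),          # segment to rebuild
        "-c", str(cores),               # number of compute cores
        "-n", "loop_{}".format(segment_id),  # unique output tag per segment
    ]


def all_segment_commands(n_segments: int, cores: int = 16) -> List[List[str]]:
    """One command per segment id, from 1 to n_segments inclusive."""
    return [segment_command(i, cores) for i in range(1, n_segments + 1)]


if __name__ == "__main__":
    for command in all_segment_commands(2):
        print(" ".join(shlex.quote(part) for part in command))
```

Each printed line can be submitted to a separate node, since the segments are sampled independently.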


Step 5C. Find a set of consistent conformations.

In cases where there are multiple interacting segments, we want to find all nonclashing combinations. This step will take the loop partial solution (lps) files generated in step B and use a Monte Carlo Assembly (MCA) algorithm in order to identify sets of solutions that are self-consistent. This section assumes that all missing segments have been built in step B. To run this assembly, we perform conformational sampling using the script, passing the .lps files generated in step B (5_rosettaES/C_AssembleResults.sh):

    python RunRosettaES.py \
        -rs runES.sh \
        -x RosettaES.xml \
        -f t20sA.fasta \
        -p input.pdb \
        -d T20S_48A_alpha_chainA.mrc \
        -lps loop_*/lps*.txt

The flag -lps points to the outputs of the individual segment jobs. As before, the script will use (and modify) an input XML file. For assembly, only a single parameter is important to modify: comparatorrounds="100". This parameter controls the number of Monte Carlo trajectories (it is unlikely you will need to change this parameter).

The output of this step is a set of PDB files, placed in the working directory with names of the form aftercomparator_RRR_XXX.pdb, where RRR is the trajectory id and XXX is the energy. A text file recommendation.txt will be written that reports the clash score of the best model.

If the number in recommendation.txt is above +100, it is recommended that you perform additional rounds of sampling. To do so, additional potential solutions should be sampled by repeating step B and providing the same directory name as input, without deleting the intermediate files. The script will then enter that directory and use the already-computed solutions (stored in the folder "taboo") to guide sampling toward previously unexplored regions. See section D for more details.

If the number in recommendation.txt is below +100, then models should be run through Rosetta refinement to accurately rank them. That can be done using this command (5_rosettaES/D_RefineOutput):

    python RunRosettaES.py \
        -rlxs runrelax.sh \
        -x RosettaES.xml \
        -p input.pdb \
        -d T20S_48A_alpha_chainA.mrc \
        -c 16 \
        -rp aftercomparator_*.pdb

Here, runrelax.sh contains a command for relaxing structures in Rosetta.
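The decision between resampling and refinement, and a first ordering of the assembled models, can be sketched as follows. This is an illustration only: it assumes RRR is an integer and XXX a plain decimal number in the file name, and the tutorial does not say what to do at exactly +100 (here that case is sent to refinement). All function names are hypothetical.

```python
import re

# Assumed layout: aftercomparator_<trajectory id>_<energy>.pdb
MODEL_NAME = re.compile(r"^aftercomparator_(\d+)_(-?\d+(?:\.\d+)?)\.pdb$")


def parse_model_name(filename):
    """Return (trajectory_id, energy), or None if the name does not match."""
    match = MODEL_NAME.match(filename)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def rank_models(filenames):
    """Return the matching file names sorted from lowest to highest energy."""
    scored = []
    for name in filenames:
        parsed = parse_model_name(name)
        if parsed is not None:
            scored.append((parsed[1], name))
    return [name for _, name in sorted(scored)]


def next_action(clash_score, threshold=100.0):
    """Above the threshold: repeat step B; otherwise: run refinement."""
    return "resample" if clash_score > threshold else "refine"


print(rank_models(["aftercomparator_1_-10.5.pdb", "aftercomparator_2_-20.0.pdb"]))
print(next_action(150.0), next_action(40.0))
```

The value passed to next_action would be read by hand from recommendation.txt, whose exact layout is not described here.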

Step 5D. Interpreting results

RosettaES will produce 100 models as output (or whatever is specified in comparatorrounds), with scores in the file score.sc. When first looking at the output from a run, it is good to visually inspect the lowest-energy 5-10 structures. Ideally, these lowest-energy models will be very tightly converged, but the lack of convergence does not indicate failure in sampling, and the presence of convergence does not necessarily indicate the solution is correct. Instead, the lowest-scoring models should be examined with attention to the following:


Insufficient sampling. Models should initially be inspected for clearly incorrect features that might not have been properly penalized by Rosetta. These include:

• unexplained density, particularly small sidechains placed into large density protrusions
• regions with poor fit to density
• unresolved clashes

If these features are present in the lowest-energy models, it suggests that the conformational sampling performed by RosettaES may not have been sufficient. There are several ways to address this issue. The first is to increase conformational sampling. This can be done by: a) increasing the beamwidth parameter and rerunning anew, b) performing additional rounds of taboo search (taboo sampling occurs automatically if the "taboo/" folder in the running directory is populated with beam files), and/or c) reducing the search space by eliminating regions of density.

The latter can be performed by manually removing parts of the map you do not wish to sample or by including additional portions of the model. For example, if you are building a model that has 4 missing regions and you believe you have accurately sampled 3 of them (they do not possess any of the pathologies listed above and fit the density well), you can combine them by treating them as templates for RosettaCM (use a fasta file with the missing segment removed) and use the result as input for RosettaES to build the last region.

Unresolved residues. RosettaES will always attempt to build all residues present in the fasta file; however, in many cases not all residues will be resolved in the map (particularly at termini). Because RosettaES will heavily penalize models that do not fit the density, you will often find models that are "overly compacted" at termini or internal loops, to try to squeeze these residues into density.

If this happens at termini, it is suggested that you examine intermediate structures that have yet to attempt to assign these unresolved residues in order to find a good model. If this happens internally (or if you have a good idea a priori what residues should be modeled), these regions can be removed from the input fasta. If internal segments are deleted, be sure to treat the deletion correctly, by putting a '/' in the fasta file.

Step 5E. Customizing job distribution.

Job distribution in RosettaES is complicated, since each "growing step" can be parallelized, but subsequent steps need all the information from the previous step. Consequently, the script provided (RosettaES.py) manages jobs by calling Rosetta jobs at each round, collecting and combining the results of the previous round, then splitting input files for the next round. On systems where it is not possible to run this script, this section describes in some detail what the script is doing.

To manage this, the python script uses the provided XML file as input, rewriting several parameters depending on the protocol step. These varying parameters include:

• readbeams, a boolean option that tells Rosetta whether it should load intermediate solutions
• beamfile, the name of the beam file that stores the intermediate solutions
• steps, how many residues to add (0 means only do filtering, otherwise set to 1; setting to "-1" will build all missing residues)
• pcount, used to uniquely tag output files from each core
• filterprevious, a boolean option that controls whether intermediate solutions should be read for taboo search
• filterbeams, the filename of the intermediate solutions for use in taboo.
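One way to picture this rewriting is as filling the %%name%% placeholders that appear in the tutorial's XML block. The sketch below is an assumption about the mechanism, not a description of the provided script: fill_placeholders is a hypothetical helper, and the mapping of the %%beams%% placeholder to the beam file name is inferred from the XML block. Rosetta can also substitute such placeholders itself through its -parser:script_vars option; which route the script takes is not stated here.

```python
import re

PLACEHOLDER = re.compile(r"%%(\w+)%%")


def fill_placeholders(template, values):
    """Replace every %%name%% in template with values[name]."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError("no value supplied for placeholder " + key)
        return str(values[key])
    return PLACEHOLDER.sub(substitute, template)


# Attribute fragment copied from the FragmentExtension block in the tutorial XML.
template = ('readbeams="%%readbeams%%" storedbeams="%%beams%%" '
            'steps="%%steps%%" pcount="%%pcount%%" '
            'filterprevious="%%filterprevious%%" filterbeams="%%filterbeams%%"')

# Values the tutorial lists for the very first launch of a segment.
first_launch = {
    "readbeams": "0", "beams": "na", "steps": "1",
    "pcount": "1", "filterprevious": "0", "filterbeams": "na",
}

print(fill_placeholders(template, first_launch))
```

A missing value raises an error rather than leaving a placeholder in the XML, which would otherwise fail later inside Rosetta.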

In order to build a missing segment, the script will perform the following steps:


1. Launch a job with readbeams="0", beamfile="na", steps="1", pcount="1", filterprevious="0", filterbeams="na". This will produce a file named beam1.1.txt from Rosetta.

2. Parse the beam1.1.txt file and split into N files (where N is the number of parallel jobs to run). New files are labeled beam_$r.$i.txt where $r is the number of residues added so far, and $i ranges from 1 to N.

3. Submit N Rosetta jobs with pcount set as the job number, readbeams set to "true", and beamfile set to the corresponding output of step 2.

4. Output files are parsed by the script and compiled into a single file with the name beam$r.txt, with $r the number of residues grown.

5. Rosetta is run on a single core to filter the aggregate solution set. Here, readbeams="1", beamfile="beam$r.txt", and steps="0". A file named beam_0.txt will have the filtered results.

6. Repeat steps 2-5 using beam_0.txt as the input for step 2. This is done until the segment is complete and Rosetta produces an empty file called finished.txt. The presence of this file triggers the program to perform one final round of filtering and exit.
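For users writing their own job manager, the parameter settings for each round of the steps above can be tabulated as in the sketch below. Only the values stated in the steps are filled in; parameters that a step does not mention are omitted rather than guessed. The function names are hypothetical, and the file-name patterns are copied literally from the step descriptions.

```python
def initial_settings():
    """Step 1: the first launch for a segment."""
    return {"readbeams": "0", "beamfile": "na", "steps": "1",
            "pcount": "1", "filterprevious": "0", "filterbeams": "na"}


def split_file_names(residues_added, n_jobs):
    """Step 2: names of the N split files, beam_$r.$i.txt for i = 1..N."""
    return ["beam_{}.{}.txt".format(residues_added, i)
            for i in range(1, n_jobs + 1)]


def parallel_settings(residues_added, n_jobs):
    """Step 3: one settings dict per parallel job."""
    names = split_file_names(residues_added, n_jobs)
    return [{"pcount": str(i), "readbeams": "true", "beamfile": name}
            for i, name in enumerate(names, start=1)]


def filter_settings(residues_added):
    """Step 5: single-core filtering of the merged file beam$r.txt."""
    return {"readbeams": "1",
            "beamfile": "beam{}.txt".format(residues_added),
            "steps": "0"}


if __name__ == "__main__":
    print(initial_settings())
    for settings in parallel_settings(residues_added=1, n_jobs=4):
        print(settings)
    print(filter_settings(residues_added=1))
```

Splitting and merging the beam files themselves (steps 2 and 4) depend on the beam file layout, which is not described in this section, so they are left out of the sketch.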

The last step of RosettaES will additionally produce a file with the name "lpsfile_$s.0.txt", used for assembly. To run assembly with these files, first combine them into a single file, as described below, and run with read_from_file="filename". This new file should start with a number corresponding to the total number of missing segments; then, for each missing segment, provide a number for the total solutions in that segment, followed by the solution information contained in the lpsfile_$s.0.txt described above. Segments should be arranged to match the order in which they occur in the fasta.