2016 BD2K Investments in Training Report


January 2016

BD2K Investments in Training

Executive Summary

Training is a major limiting factor in extracting knowledge from data, and it is therefore a significant part of the Big Data to Knowledge (BD2K) Initiative. The primary goals of BD2K in training are to increase the number of biomedical data scientists and to improve the data science skills of all biomedical scientists. Because training needs vary greatly with an individual's prior background and intended use of data science, the BD2K investments in training are also varied. For individuals who are primarily biomedical scientists and do not intend to become data science specialists, BD2K supports courses and educational resources that enable participants to become conversant in data science and to acquire the skills to use data science methods. For individuals who wish to become specialists in biomedical data science, BD2K supports training programs for predoctoral students, research rotations for early-career scientists, and career development awards for postdocs and more senior researchers. To foster new teams of biomedical scientists and data scientists, BD2K supports the QuBBD (Quantitative Approaches to Biomedical Big Data) Program jointly with the National Science Foundation.

Training awards were among the first BD2K awards issued (in September 2014), and they are starting to bear fruit. For example, the first BD2K Open Educational Resources, in the form of Massive Open Online Courses (MOOCs), have been released and already count thousands of graduates; three summer courses were offered and filled beyond capacity; and individuals supported by career development awards have transitioned to more secure positions. Although the awards made in FY14-16 are promising, additional investments are needed to keep pace with the fast-changing field of data science.

For BD2K-supported resources to have maximal impact, they need to be findable, accessible, interoperable, and reusable (FAIR). To help biomedical scientists find and access the most appropriate data science educational resources, the BD2K Training Coordination Center (TCC) is developing an Educational Resource Discovery Index in collaboration with international partners. Through the TCC and the other training awards, BD2K aims to improve the ability of the entire biomedical science community, whether specialists in biomedical science or in data science, to make use of the growing volume and complexity of data.

Overall Goals

Focusing on training was one of the main recommendations of the June 2012 Data and Informatics Working Group Report. It is also one of the major thrust areas of the BD2K program and the office of the Associate Director for Data Science (ADDS). Training currently accounts for 15% of the BD2K budget and is expected to ramp up to about 20%. The term "training" is meant to encompass training, education, and workforce development that provide learners, at any career level, with either foundational knowledge or skills for immediate use. There are two main goals for training: to improve big data skills in all biomedical scientists, and to increase the number of people who specialize in biomedical data science. Each goal includes sub-goals:

1) To improve big data skills of all biomedical scientists
   a. Support training opportunities, both in-person and online
   b. Ensure training opportunities and resources are more readily discovered and accessed
   c. Enhance diversity in the biomedical and biomedical data science workforces

2) To increase the number of biomedical data scientists
   a. Establish biomedical data science as a career path
   b. Foster collaborations between biomedical scientists and data scientists

To accomplish these goals, the training portfolio is diverse. Although ten funding opportunity announcements (FOAs) were issued in FY14 and FY15, they cluster into five groups:

• R25 awards for courses and resources about data management and data science (4 FOAs were issued to allow for different budgetary categories and structures)

• R25 awards to enhance diversity (denoted dR25 to distinguish them from the other R25s)

• T32/T15 training programs (3 FOAs to support both new programs and supplements to existing T32s and T15s)

• K01 Career Development awards

• U24 Training Coordination Center (TCC)

A total of 67 awards were issued by BD2K in these five FOA clusters. In addition, each of the BD2K Centers of Excellence has a training component. Because the Centers' training is diverse and focuses mainly on specific tools developed by the Centers, it is not included in this report. Collectively, the five BD2K training FOA clusters reach an audience of varying experience levels and intentions, from undergraduates to senior faculty and instructors who will be teaching data science. Some of the awards are targeted at particular career levels: for example, the dR25 awards are for undergraduates, the T32/T15 programs are for predoctoral trainees, and the K01 awards are for postdocs and beyond. Other awards, such as the U24 TCC and the R25s, serve a broader range of experience levels.

Each cluster addresses one or more goals.

Because BD2K is a trans-NIH program, with funds coming from all Institutes and Centers (ICs) and the Common Fund, IC input has been actively solicited. The FOAs were developed by a trans-NIH group of program directors, first called the "BD2K Training Subcommittee" and later the "BD2K Training Program Management Group". This group has welcomed all comers from all ICs and the NIH Office of the Director (OD), and it sought to recruit members through two presentations to the Training Advisory Committee (TAC). All programmatic aspects, including FOA development, review attendance, pay plan development, and decisions about award management, have been handled by this trans-NIH group, which includes 13 ICs, the Common Fund, and the Office of Behavioral and Social Sciences Research. Day-to-day management of the awards is distributed across 6 ICs, balancing the aim of involving as many ICs as possible against the need to ensure that grantees are treated uniformly.

Goal 1: To Improve Big Data Skills of Biomedical Scientists

To improve the big data skills of all biomedical scientists, training opportunities need to be available, and biomedical scientists need to be able to find and access the ones that best fit their needs. To this end, BD2K supports the development of training opportunities and their dissemination to large numbers of learners, as well as infrastructure for discovering them.

To help biomedical scientists find, access, and choose training opportunities, the Training Coordination Center is creating an Educational Resource Discovery Index (ERuDIte). ERuDIte is planned to be a discovery index that organizes pointers to educational content using metadata that describe each educational resource. Using and extending common metadata is being pursued through an international collaboration between the TCC and ELIXIR, a European federation of organizations that build infrastructure for the life sciences. ERuDIte, when combined with a knowledge map that shows how Big Data skills relate to one another, may form the basis of a personalized learning system that lets biomedical scientists efficiently acquire new skills to tackle Big Data.

BD2K supports the creation of the content that ERuDIte organizes primarily through R25s for open educational resources (OER) and short courses, using FOAs HG14-008, HG14-009, LM15-001, and LM15-002. Educational content may also come from other sources, including non-BD2K ones. Some of the BD2K Centers, each of which has a training component, are producing educational content such as TED-like talks. Curricula for data science courses may come from the BD2K T32/T15 training programs, which were given the opportunity to apply for $20K in funds to develop and share the curricula of new courses. The number of educational resources supported by BD2K, or even by NIH, is dwarfed by the number supported elsewhere, whether by the National Science Foundation, the Department of Education, foundations, universities, or private industry. Educational resources from all of these sources will be included as content in ERuDIte.

The BD2K Open Educational Resources, Short Courses, and Diversity programs all aim to introduce biomedical scientists to data management and data science. These programs were designed to be flexible, allowing for innovation and for specialization to a particular audience, domain science, or area of data science. Although a few of the funded programs are confined to a narrow audience or scientific area, most are for general audiences and multiple data types. In FY14 and FY15, a total of 29 R25 programs were funded, and these programs have a broad geographic reach. More detail about the audience, scientific breadth, and reach follows.
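To make the idea of a metadata-driven discovery index concrete, the following is a minimal, hypothetical Python sketch of an index that stores pointers to educational resources and looks them up by topic. It is not ERuDIte's design: the record fields (`title`, `url`, `audience`, `fmt`, `topics`), the class names, the resource titles, and the placeholder URLs are all assumptions for illustration, not the ERuDIte or ELIXIR metadata schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ResourceRecord:
    """Hypothetical metadata record: a pointer to a resource, not the content itself."""
    title: str
    url: str            # where the resource lives (placeholder URLs below)
    audience: str       # hypothetical field, e.g. "graduate", "faculty", "instructor"
    fmt: str            # hypothetical field, e.g. "MOOC", "short course", "curriculum"
    topics: frozenset   # skill topics the resource covers


class DiscoveryIndex:
    """Minimal inverted index (topic -> records); an illustration, not ERuDIte."""

    def __init__(self) -> None:
        self._by_topic = defaultdict(list)

    def add(self, record: ResourceRecord) -> None:
        # Register the record under every topic listed in its metadata.
        for topic in record.topics:
            self._by_topic[topic].append(record)

    def search(self, topic: str, audience: Optional[str] = None) -> List[ResourceRecord]:
        # Look up pointers by topic, optionally filtering on the audience field.
        hits = self._by_topic.get(topic, [])
        if audience is not None:
            hits = [r for r in hits if r.audience == audience]
        return sorted(hits, key=lambda r: r.title)


# Usage with made-up records (hypothetical titles and example.org URLs).
index = DiscoveryIndex()
index.add(ResourceRecord("Intro Statistics for Genomics", "https://example.org/a",
                         "graduate", "MOOC", frozenset({"statistics", "genomics"})))
index.add(ResourceRecord("Data Management Toolkit", "https://example.org/b",
                         "instructor", "curriculum", frozenset({"data management", "metadata"})))
index.add(ResourceRecord("Machine Learning Short Course", "https://example.org/c",
                         "faculty", "short course", frozenset({"machine learning", "statistics"})))

print([r.title for r in index.search("statistics")])
# ['Intro Statistics for Genomics', 'Machine Learning Short Course']
print([r.title for r in index.search("statistics", audience="faculty")])
# ['Machine Learning Short Course']
```

The index holds only metadata and URLs, mirroring the "pointers to educational content" idea, so resources from BD2K and non-BD2K sources could be registered the same way; a real system would also need a shared vocabulary for topics, which is what common metadata standards are meant to provide.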

Audience

The R25 awards address a variety of educational levels (undergraduates to senior faculty) and intended uses (end users or instructors).

• Five programs focus on helping instructors of advanced undergraduate and early graduate courses. Examples include:

o A train-the-trainers course in biomedical data science for instructors, who will collectively develop a curriculum for undergraduates at non-research-intensive colleges

o Curricular materials (slides, assessments, reading suggestions) that can be used and adapted by other instructors

o A toolkit to help librarians teach data management to biomedical scientists

• Seven programs target undergraduates directly through summer programs (including didactic and research experiences) that aim either to recruit data science students into biomedical science or to expose biomedical students to data management and data science. Four of the seven programs have a primary goal of enhancing diversity through partnerships with BD2K Centers.

• The remaining 16 programs focus directly on graduate students or more advanced learners. Although the data management and data science material is introductory, it is new to most biomedical scientists, so learners are just as likely to be senior faculty as graduate students.

The R25 programs are a main component of BD2K's diversity efforts. Four undergraduate programs aim to enhance diversity in the biomedical workforce through partnerships between the BD2K Centers of Excellence and low-resourced institutions. The partnerships support the development of curricula and research experiences for undergraduates and faculty from low-resourced institutions. Collectively, these four programs reach 134 students over the course of 5 years; however, the number of students touched by the improved curricula and strengthened faculty is far greater. In addition to the four programs funded explicitly for diversity, other R25s make serious efforts to recruit and train underrepresented minorities (URMs). For example, in its first offering, the short course from Oregon Health & Science University trained 9 URMs out of 17 total participants. Although the R25 programs form the core of BD2K's diversity efforts, URMs can also be supported by BD2K through the T32/T15 training programs, which must have "diversity recruitment and retention plans," and through diversity supplements (PA-15-322).

Scientific Breadth

• Domain Focus: Although the majority of the programs aim for a general biomedical audience, some focus on a particular data type. About a quarter of the programs focus on genomics, and another quarter focus on imaging, clinical, or population data. The majority use multiple data types and are for a general biomedical audience.

Areas with few applications, and hence few funded applications, are the clinical and population sciences. Within these areas, there is a notable dearth of activity in mHealth. To encourage applications in the clinical and population sciences, BD2K will work closely with relevant Institutes, Centers, and Offices, such as NCATS and OBSSR, to ensure that any new funding opportunity announcements contain language encouraging applications in these areas and are widely advertised to the appropriate communities.

• Data Management and Data Science: Collectively, the awards span a broad range of topics, including data management, data exploration, data representation, computing, data modeling, and data visualization. Each of these topics can be further broken down as follows.

o Data management: reuse of data, data standards, locating and accessing data and tools, organizing and curating data through ontologies and the use of metadata

o Computing: distributed or parallel computing, workflows, programming, algorithms, optimization, and natural language processing

o Data representation: data structures and databases

o Data exploration: data munging and preparation, exploratory data analysis

o Data modeling: probability, stochastic modeling, introductory statistics, advanced statistics (e.g., multiple testing, dimension reduction), machine learning, experimental design, Bayesian methods, reproducible research, network models

o Data visualization and communication

Although collectively the R25 awards span the range above, the topics are often covered with little depth, and some are covered by only one Open Educational Resource. These tend to be the more technical topics (e.g., optimization and stochastic modeling) or specialized ones, such as ELSI (ethical, legal, and social implications) and team science.

Reach

• Eleven programs with in-person components are spread evenly among the East Coast, the West Coast, and the middle of the country. Based on planned enrollments, the 11 programs will serve over 250 participants in the summer of 2016.

o The three programs funded in FY14 held courses in the summer of 2015. Collectively, they reached undergraduates, PhD students, and faculty from over 30 different universities.

o Demand for the 2015 in-person courses was high: the programs with limited slots reported acceptance rates of 35% and 14%. Another program gave support to a limited number of students but opened a large auditorium for the course.

• Fourteen of the awards are online Open Educational Resources. These reach large numbers of students and instructors, providing great value per student.

o For example, about 5,000 students completed the first 8 courses in Rafa Irizarry's series of biomedical data science MOOCs in their first offering, amounting to about $40 per student based on an NIH investment of $200K (year 1 direct costs).
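The per-student figure for the MOOC series follows from dividing the year-1 direct costs by the approximate number of students who completed the courses in the first offering:

```latex
\text{cost per student} \approx
  \frac{\$200{,}000 \ \text{(NIH, year 1 direct costs)}}{5{,}000 \ \text{students}}
  = \$40
```

Because the numerator counts only year-1 direct costs and the denominator only first-offering completers, this is a rough value-per-student indicator rather than a full cost accounting.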

Although BD2K is supporting the development and discovery of training resources, continued support in this area is needed for several reasons: gaps in content coverage exist (e.g., methods for mHealth data, algorithms and optimization methods, advanced statistics, network models); demand for in-person courses continues to exceed supply; different ways of explaining material resonate with different learners; and materials need to be updated to account for new science and new developments in the understanding of learning.

Goal 2: To Increase the Number of Biomedical Data Scientists

To increase the number of biomedical data scientists, trainees need to gain the appropriate skills, want to work in biomedical science, and have an appropriate place to work. Because all of these trainees have self-selected toward biomedical science, BD2K's focus is on helping trainees acquire the appropriate skills initially and use those skills in the long run. Trainees may be:

1) predoctoral students, who gain foundations in biomedical data science through T32/T15 training programs;
2) postdocs and faculty, trained in either biomedical science or data science, who recognize the need to complement their existing knowledge and skills through K01 career development awards; or
3) students or faculty who need specialized training that is unavailable locally but attainable through a Research Rotation.

Retaining trainees is both important and challenging, given the demand for data science skills across sectors. Although retention is being addressed primarily through providing support, BD2K is also interested in facilitating conversations with university leaders about career paths for open science and data science.

Support for the further development of data scientists into biomedical data scientists is provided through K01 awards. Before blending biomedical and data science knowledge, data scientists may be able to contribute immediately to biomedical problems by working in teams with biomedical scientists. Such teams are being fostered through joint NSF/BD2K support in the QuBBD program, which builds collaborations between data scientists and biomedical scientists.

T32/T15 Training Programs

Predoctoral trainees are supported through new T32 programs and supplements to existing T32 or NLM T15 programs: 6 were funded in FY14, and another 10 will be funded in FY15, for a total of 84 trainees. Since many of the departments in which the programs reside are new, not all programs requested the maximum of 6 slots. Because the field is nascent, there is not yet full agreement on what the core competencies are. However, areas common to most BD2K training programs include modern statistics (e.g., handling multiple comparisons, high-dimensional data analysis, spatial-temporal correlation), computational techniques (e.g., using cloud and parallel computing, optimization, and algorithms), and machine learning. Most programs are creating new courses to integrate these topics and to focus class time on those that need to be learned didactically. Below is a word cloud of the core courses of the programs.

K01 Career Development

BD2K is supporting 21 postdocs and faculty with mentored career development awards (K01). The PIs come from diverse backgrounds:

• 9 are physicians, with specialties in hematology/oncology, neurology, neuroradiology, surgery, urologic surgery, pulmonary and critical care medicine, and internal medicine

• 7 have primarily quantitative or computational backgrounds, with degrees in Electrical Engineering and Computer Science, Physics, Nuclear Physics, and Biomedical Engineering

• 3 have backgrounds in fields that blend the biomedical and computational sciences (molecular genetics, bioinformatics, and computational biochemistry)

• 2 are behavioral or social scientists (Social Epidemiology, Quantitative Psychology)

The group of K01 awardees is diverse not just in scientific background but also demographically and geographically:

• Nine out of 21 are female.

• They work at 18 unique institutions.

Through a sustained period of research career development and training, the K01 awardees will gain the knowledge and skills necessary to launch independent research careers. The goal is for them to become competitive for new research project grant (e.g., R01) funding in the area of Big Data science.

Research Rotations

The BD2K Training Coordination Center has three main goals: 1) to coordinate by increasing communication across the BD2K-funded training awards; 2) to develop a discovery index for educational resources; and 3) to coordinate research rotations that facilitate access to appropriate expertise. The "research rotations" aim to match trainees who need specialized training with experts willing and able to provide it. The trainees are expected to be mainly graduate students, postdocs, or junior faculty. Implementation details of the research rotations are still being developed, and evaluation measures are being designed along with the program.

NSF/NIH Quantitative Approaches to Biomedical Big Data (QuBBD) Program

Some problems will require a new model of leadership. Particularly when very diverse skills need to be brought to bear on a problem, teams of individuals with complementary expertise will be needed. Such teams are being fostered through a partnership between NSF and NIH called the QuBBD program. This program consists of a series of Innovation Labs and the funding of planning grants. Innovation Labs are week-long mentored workshops that catalyze interdisciplinary teams and speed up the development of each team's research program. BD2K ran a pilot Innovation Lab in July 2015; by the end of the week, 12 new interdisciplinary teams were prepared to submit grant applications together. These teams submitted applications for small planning grants, as did newly formed teams that had not gone through the Innovation Lab. Because an established team can do much of its work virtually, the teams fostered by the QuBBD program draw in a wide range of talent, much of it from schools not otherwise represented within BD2K. These teams include data scientists in physically isolated locations as well as biomedical scientists who are unable to find data science collaborators because of high demand. The Summer 2015 Innovation Lab was successful, and the key to its success was the participation of mentors who come from a variety of backgrounds and are leaders in their respective fields. The mentors guide teams through the iterative ideation process, offering feedback on research ideas throughout the week. Continuation of the QuBBD program is under discussion.

Summary

The BD2K training programs described in this document are early efforts to address a need identified by the Advisory Committee to the Director's Data and Informatics Working Group. They serve both biomedical data science specialists and specialists in other biomedical areas, and they span the educational pipeline from undergraduates to faculty. These programs are "early efforts" because much work remains to meet both goals. To quickly raise the baseline data science skills of a wide variety of biomedical scientists, early resources focused on broad, generally applicable topics; later resources might address more specialized topics. Likewise, to jump-start the increase in the number of biomedical data scientists, some of the training programs are modifications of existing programs, building on existing infrastructure and courses. As the community converges on the core competencies of the field, biomedical data science training programs will likely evolve and may end up bearing little resemblance to the programs initially funded.

The BD2K Initiative is in its infancy, and the training programs, along with other BD2K programs, are contributing to the development of the field of biomedical data science in the US and across the world. Although the awarded grants are confined to US institutions, their reach extends far beyond through the development of open educational resources and contributions to the global conversation about the field of biomedical data science. In addition, international collaborations surrounding the discovery of open educational resources for biomedical data science have begun. The BD2K program aims to improve the ability of the biomedical workforce to use Big Data both today and tomorrow, both in the US and across the world.

Geographic Distribution of BD2K Training Awards
