2013-03-15 - Institut Jacques Monod - bioinfoclub

1. Doing computational science better
  • Some sources of inspiration
  • Some tools
  • Getting help
  • Your turn

2. Some sources of inspiration

3. Education
A Quick Guide to Organizing Computational Biology Projects
William Stafford Noble (1, 2)
1 Department of Genome Sciences, School of Medicine, University of Washington, Seattle, Washington, United States of America
2 Department of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America

Introduction

Most bioinformatics coursework focuses on algorithms, with perhaps some components devoted to learning programming skills and learning how to use existing bioinformatics software. Unfortunately, for students who are preparing for a research career, this type of curriculum fails to address many of the day-to-day organizational challenges associated with performing computational experiments. In practice, the principles behind organizing and documenting computational experiments are often learned on the fly, and this learning is strongly influenced by personal predilections as well as by chance interactions with collaborators or colleagues.

The purpose of this article is to describe one good strategy for carrying out computational experiments. I will not describe profound issues such as how to formulate hypotheses, design experiments, or draw conclusions. Rather, I will focus on relatively mundane issues such as organizing files and directories and documenting [...]

[...] understanding your work or who may be evaluating your research skills. Most commonly, however, that someone is you. A few months from now, you may not remember what you were up to when you created a particular set of files, or you may not remember what conclusions you drew. You will either have to then spend time reconstructing your previous experiments or lose whatever insights you gained from those experiments.

This leads to the second principle, which is actually more like a version of Murphy's Law: Everything you do, you will probably have to do over again. Inevitably, you will discover some flaw in your initial preparation of the data being analyzed, or you will get access to new data, or you will decide that your parameterization of a particular model was not broad enough. This means that the experiment you did last week, or even the set of experiments you've been working on over the past month, will probably need to be redone. If you have organized and documented your work clearly, then repeating the experiment with the new [...]

[...] under a common root directory. The exception to this rule is source code or scripts that are used in multiple projects. Each such program might have a project directory of its own.

Within a given project, I use a top-level organization that is logical, with chronological organization at the next level, and logical organization below that. A sample project, called msms, is shown in Figure 1. At the root of most of my projects, I have a data directory for storing fixed data sets, a results directory for tracking computational experiments performed on that data, a doc directory with one subdirectory per manuscript, and directories such as src for source code and bin for compiled binaries or scripts.

Within the data and results directories, it is often tempting to apply a similar, logical organization. For example, you may have two or three data sets against which you plan to benchmark your algorithms, so you could create one directory for each of them under data. In my experience, this approach is risky, because the logical structure of your final [...]

4. 
Education
A Quick Guide to Organizing Computational Biology Projects

Figure 1. Directory structure for a sample project. Directory names are in large typeface, and filenames are in smaller typeface. Only a subset of the files are shown here. Note that the dates are formatted <year>-<month>-<day> so that they can be sorted in chronological order. The source code src/ms-analysis.c is compiled to create bin/ms-analysis and is documented in doc/ms-analysis.html. The README files in the data directories specify who downloaded the data files from what URL on what date. The driver script results/2009-01-15/runall automatically generates the three subdirectories split1, split2, and split3, corresponding to three cross-validation splits. The bin/parse-sqt.py script is called by both of the runall driver scripts. doi:10.1371/journal.pcbi.1000424.g001

[...] with this approach, the distinction between data and results may not be useful. Instead, one could imagine a top-level directory called something like experiments, with subdirectories with names like 2008-12-19. Optionally, the directory name might also include a word or two indicating the topic of the experiment therein. In practice, a single experiment will often require more than one day of work, and so you may end up working a few days or more before creating a new [...]

The Lab Notebook

In parallel with this chronological directory structure, I find it useful to maintain a chronologically organized lab notebook. This is a document that resides in the root of the results directory and that records your progress in detail. Entries in the notebook should be dated, and they should be relatively verbose, with links or embedded images or tables displaying the results of the experiments [...]

These types of entries provide a complete picture of the development of the project over time. In practice, I ask members of my research group to put their lab notebooks online, behind password protection if necessary. When I meet with a member of my lab or a project team, we can refer to the online lab notebook, focusing on the current entry but scrolling up to previous entries as necessary. The URL can also be provided to remote collabo[...]

5. 
Education
A Quick Guide to Organizing Computational Biology Projects

In each results folder:
  • script: getResults.rb or WHATIDID.txt
  • intermediates
  • output

6. Best Practices for Scientific Computing
Greg Wilson, D.A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H.D. Haddock, Katy Huff, Ian M. Mitchell, Mark D. Plumbley, Ben Waugh, Ethan P. White, Paul Wilson
Affiliations: Software Carpentry, University of Ontario Institute of Technology, [...] State University, Software Sustainability Institute, Space Telescope [...], University of Toronto, Monterey Bay Aquarium Research Institute, University of Wisconsin, University of British Columbia, [...] Mary University of London, University College London, [...] University, and University of Wisconsin
arXiv:1210.0530v3 [cs.MS] 29 Nov 2012

Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists' productivity and the reliability of their software.

Software is as important to modern scientific research as telescopes and test tubes. From groups that work exclusively on computational problems, to traditional laboratory and field scientists, more and more of the daily operation of science revolves around computers. This includes the development of new algorithms, managing and analyzing the large amounts of data that are generated in single research projects, and combining disparate datasets to assess synthetic problems. Scientists typically develop their own software for these purposes because doing so requires substantial domain-specific [...]

[...] and open source software development [61 [...]ical studies of scientific computing [4, 31, [...] development in general (summarized in [...] practices will guarantee efficient, error-fr[...]ment, but used in concert they will red[...] errors in scientific software, make it easie[...] the authors of the software time and effo[...] focusing on the underlying scientific ques[...]

1. Write programs for people, not co[...]
Scientists writing software need to write [...]cutes correctly and can be easily read and [...] programmers (especially the author's fut[...] cannot be easily read and understood it is [...] to know that it is actually doing what it i[...] be productive, software developers must t[...] aspects of human cognition into account [...] human working memory is limited, huma[...]

7. Best Practices for Scientific Computing

8. Best Practices for Scientific Computing
  • 1. Write programs for people, not computers.

9. Best Practices for Scientific Computing
  • 1. Write programs for people, not computers.
  • 2. Automate repetitive tasks.

10. Best Practices for Scientific Computing
  • 1. Write programs for people, not computers.
  • 2. Automate repetitive tasks.
  • 3. Use the computer to record history.

11. Best Practices for Scientific Computing
  • 1. Write programs for people, not computers.
  • 2. Automate repetitive tasks.
  • 3. Use the computer to record history.
  • 4. Make incremental changes.

12. Best Practices for Scientific Computing
We describe a set of best practices for scientic softwarearXiv:1210.0530v3 [cs.MS] 29 Nov 2012Scientists spend an increasing amount of time building and using research and software development [61 and open source experience,mdevelopment that have solid foundations in ical studies of scientic computing [4, 31,software. However, most scientists are never taught how to do thiseciently. As a improve are unaware of tools and practices thatand the reliability of theirand that result, many scientists productivitye development in general (summarized insoftware. describe a set of best practices for scientic software practices will guarantee ecient, error-frtwould allow them to write more reliable and maintainable code withless eort. Wement, but used in concert they will red fdevelopment that have solid foundations in research and experience,and the not computers. errors in scientic software, make it easieand that improve scientists productivitypeople, reliability of their 1. Write programs forthe authors of the software time and eo Software is as important to modern focusing on the underlying scientic quessoftware.2. Automate repetitive tasks.scientic research astelescopesasand computer to record history. as that work exclusively3. Use important to tubes. From groupsSoftware is the test modern scientic research1telescopes andMaketubes. From groups that work exclusively4.test incremental changes.on computationalto traditional laboratory and eld 1. laboratory andpeople, not c problems, to traditional Write programs for eldScientists writing software need to writeSon computational problems, control.5. Use versionscientists, more and more of the daily operation of science re- operation of science re-scientists, more and more of the daily cutes correctly and can be easily read andvolves around computers. This includes the development of cvolves around computers. 
This includes the development ofnew algorithms, managing and analyzing the large amountsprogrammers (especially the authors futof data algorithms, managing and analyzing the large amountscannot be easily read and understood it ispnew disparate datasets to assess synthetic problems. that are generated in single research projects, andto know that it is actually doing what it icombiningbe productive, software developers must t cof Scientists that are generated in single research human cognition andaccountdata typically develop their own software for these aspects of projects, intopurposes because doing so requires substantial domain-specichuman working memory is limited, humat 13. ([email protected]), University of Wisconsin ([email protected] Practices for Scientic ComputingMary University of London ([email protected]), Unive Greg Wilson , ([email protected]), and Hong , Matt of , Richard T. (wilsUniversity D.A. Aruliah , C. Titus Brown , Neil P. ChueUniversityDavisWisconsin Guy ,Steven H.D. Haddock , Katy Hu , Ian M. Mitchell , Mark D. Plumbley , Ben Waugh ,Ethan P. White , Paul Wilson Software Carpentry ([email protected]), University of Ontario Institute of Technology (Dhavide.AruScientists spend an increasing amount of time building and usingaState University ([email protected]), Software Sustainability Institute ([email protected]), Space Telescope([email protected]), University of Toronto ([email protected]), Monterey Bay Aquarium Research Institutesoftware. However, most scientists are never taught how to do thisi([email protected]), University of Wisconsin ([email protected]), University of British Columbia (mieciently. As a result, many are unaware of tools and practices thatdMary University of London ([email protected]), University College London ([email protected]),University ([email protected]), and University of Wisconsin ([email protected])would allow them to write more reliable and maintainable code withpless eort. 
We describe a set of best practices for scientic softwarearXiv:1210.0530v3 [cs.MS] 29 Nov 2012Scientists spend an increasing amount of time building and using research and software development [61 and open source experience,mdevelopment that have solid foundations in ical studies of scientic computing [4, 31,software. However, most scientists are never taught how to do thiseciently. As a improve are unaware of tools and practices thatand the reliability of theirand that result, many scientists productivitye development in general (summarized insoftware. describe a set of best practices for scientic software practices will guarantee ecient, error-frtwould allow them to write more reliable and maintainable code withless eort. Wement, but used in concert they will red fdevelopment that have solid foundations in research and experience,and the not computers. errors in scientic software, make it easieand that improve scientists productivitypeople, reliability of their 1. Write programs forthe authors of the software time and eo Software is as important to modern focusing on the underlying scientic quessoftware.2. Automate repetitive tasks.scientic research astelescopesasand computer to record history. as that work exclusively3. Use important to tubes. From groupsSoftware is the test modern scientic research1telescopes andMaketubes. From groups that work exclusively4.test incremental changes.on computationalto traditional laboratory and eld 1. laboratory andpeople, not c problems, to traditional Write programs for eldScientists writing software need to writeSon computational problems, control.5. Use versionscientists, more and more of the daily operation of science re- operation of science re-scientists, more and more of the daily cutes correctly and can be easily read andvolves aroundDont repeat yourself (or others).6. computers. This includes the development ofcvolves around computers. 
This includes the development ofnew algorithms, managing and analyzing the large amountsprogrammers (especially the authors futof data algorithms, managing and analyzing the large amountscannot be easily read and understood it ispnew disparate datasets to assess synthetic problems. that are generated in single research projects, andto know that it is actually doing what it icombiningbe productive, software developers must t cof Scientists that are generated in single research human cognition andaccountdata typically develop their own software for these aspects of projects, intopurposes because doing so requires substantial domain-specichuman working memory is limited, humat 14. ([email protected]), University of Wisconsin ([email protected] Practices for Scientic ComputingMary University of London ([email protected]), Unive Greg Wilson , ([email protected]), and Hong , Matt of , Richard T. (wilsUniversity D.A. Aruliah , C. Titus Brown , Neil P. ChueUniversityDavisWisconsin Guy ,Steven H.D. Haddock , Katy Hu , Ian M. Mitchell , Mark D. Plumbley , Ben Waugh ,Ethan P. White , Paul Wilson Software Carpentry ([email protected]), University of Ontario Institute of Technology (Dhavide.AruScientists spend an increasing amount of time building and usingaState University ([email protected]), Software Sustainability Institute ([email protected]), Space Telescope([email protected]), University of Toronto ([email protected]), Monterey Bay Aquarium Research Institutesoftware. However, most scientists are never taught how to do thisi([email protected]), University of Wisconsin ([email protected]), University of British Columbia (mieciently. As a result, many are unaware of tools and practices thatdMary University of London ([email protected]), University College London ([email protected]),University ([email protected]), and University of Wisconsin ([email protected])would allow them to write more reliable and maintainable code withpless eort. 
We describe a set of best practices for scientic softwarearXiv:1210.0530v3 [cs.MS] 29 Nov 2012Scientists spend an increasing amount of time building and using research and software development [61 and open source experience,mdevelopment that have solid foundations in ical studies of scientic computing [4, 31,software. However, most scientists are never taught how to do thiseciently. As a improve are unaware of tools and practices thatand the reliability of theirand that result, many scientists productivitye development in general (summarized insoftware. describe a set of best practices for scientic software practices will guarantee ecient, error-frtwould allow them to write more reliable and maintainable code withless eort. Wement, but used in concert they will red fdevelopment that have solid foundations in research and experience,and the not computers. errors in scientic software, make it easieand that improve scientists productivitypeople, reliability of their 1. Write programs forthe authors of the software time and eo Software is as important to modern focusing on the underlying scientic quessoftware.2. Automate repetitive tasks.scientic research astelescopesasand computer to record history. as that work exclusively3. Use important to tubes. From groupsSoftware is the test modern scientic research1telescopes andMaketubes. From groups that work exclusively4.test incremental changes.on computationalto traditional laboratory and eld 1. laboratory andpeople, not c problems, to traditional Write programs for eldScientists writing software need to writeSon computational problems, control.5. Use versionscientists, more and more of the daily operation of science re- operation of science re-scientists, more and more of the daily cutes correctly and can be easily read andvolves aroundDont repeat yourself (or others).6. computers. This includes the development ofcvolves 7. Plan for mistakes. around computers. 
This includes the development ofnew algorithms, managing and analyzing the large amountsprogrammers (especially the authors futof data algorithms, managing and analyzing the large amountscannot be easily read and understood it ispnew disparate datasets to assess synthetic problems. that are generated in single research projects, andto know that it is actually doing what it icombiningbe productive, software developers must t cof Scientists that are generated in single research human cognition andaccountdata typically develop their own software for these aspects of projects, intopurposes because doing so requires substantial domain-specichuman working memory is limited, humat 15. ([email protected]), University of Wisconsin ([email protected] Practices for Scientic ComputingMary University of London ([email protected]), Unive Greg Wilson , ([email protected]), and Hong , Matt of , Richard T. (wilsUniversity D.A. Aruliah , C. Titus Brown , Neil P. ChueUniversityDavisWisconsin Guy ,Steven H.D. Haddock , Katy Hu , Ian M. Mitchell , Mark D. Plumbley , Ben Waugh ,Ethan P. White , Paul Wilson Software Carpentry ([email protected]), University of Ontario Institute of Technology (Dhavide.AruScientists spend an increasing amount of time building and usingaState University ([email protected]), Software Sustainability Institute ([email protected]), Space Telescope([email protected]), University of Toronto ([email protected]), Monterey Bay Aquarium Research Institutesoftware. However, most scientists are never taught how to do thisi([email protected]), University of Wisconsin ([email protected]), University of British Columbia (mieciently. As a result, many are unaware of tools and practices thatdMary University of London ([email protected]), University College London ([email protected]),University ([email protected]), and University of Wisconsin ([email protected])would allow them to write more reliable and maintainable code withpless eort. 
We describe a set of best practices for scientic softwarearXiv:1210.0530v3 [cs.MS] 29 Nov 2012Scientists spend an increasing amount of time building and using research and software development [61 and open source experience,mdevelopment that have solid foundations in ical studies of scientic computing [4, 31,software. However, most scientists are never taught how to do thiseciently. As a improve are unaware of tools and practices thatand the reliability of theirand that result, many scientists productivitye development in general (summarized insoftware. describe a set of best practices for scientic software practices will guarantee ecient, error-frtwould allow them to write more reliable and maintainable code withless eort. Wement, but used in concert they will red fdevelopment that have solid foundations in research and experience,and the not computers. errors in scientic software, make it easieand that improve scientists productivitypeople, reliability of their 1. Write programs forthe authors of the software time and eo Software is as important to modern focusing on the underlying scientic quessoftware.2. Automate repetitive tasks.scientic research astelescopesasand computer to record history. as that work exclusively3. Use important to tubes. From groupsSoftware is the test modern scientic research 1telescopes andMaketubes. From groups that work exclusively4.test incremental changes.on computationalto traditional laboratory and eld 1. laboratory andpeople, not c problems, to traditional Write programs for eldScientists writing software need to writeSon computational problems, control.5. Use versionscientists, more and more of the daily operation of science re- operation of science re-scientists, more and more of the daily cutes correctly and can be easily read andvolves aroundDont repeat yourself (or others).6. computers. This includes the development of cvolves 7. Plan for mistakes. around computers. 
This includes the development ofnew algorithms, managing and analyzing the large amountsprogrammers (especially the authors futof data algorithms, managing andworksand pcannot be easily read and understood it isnew 8. Optimize software only after it analyzingknow that it is actually doing what it i that are generated in single research projects, correctly.the large amountstocombining disparate datasets to assess synthetic problems.be productive, software developers must tcof Scientists that are generated in single research human cognition andaccountdata typically develop their own software for these aspects of projects, intopurposes because doing so requires substantial domain-specichuman working memory is limited, huma t 16. ([email protected]), University of Wisconsin ([email protected] Practices for Scientic ComputingMary University of London ([email protected]), Unive Greg Wilson , ([email protected]), and Hong , Matt of , Richard T. (wilsUniversity D.A. Aruliah , C. Titus Brown , Neil P. ChueUniversityDavisWisconsin Guy ,Steven H.D. Haddock , Katy Hu , Ian M. Mitchell , Mark D. Plumbley , Ben Waugh ,Ethan P. White , Paul Wilson Software Carpentry ([email protected]), University of Ontario Institute of Technology (Dhavide.AruScientists spend an increasing amount of time building and usingaState University ([email protected]), Software Sustainability Institute ([email protected]), Space Telescope([email protected]), University of Toronto ([email protected]), Monterey Bay Aquarium Research Institutesoftware. However, most scientists are never taught how to do thisi([email protected]), University of Wisconsin ([email protected]), University of British Columbia (mieciently. 
As a result, many are unaware of tools and practices thatdMary University of London ([email protected]), University College London ([email protected]),University ([email protected]), and University of Wisconsin ([email protected])would allow them to write more reliable and maintainable code withpless eort. We describe a set of best practices for scientic softwarearXiv:1210.0530v3 [cs.MS] 29 Nov 2012Scientists spend an increasing amount of time building and using research and software development [61 and open source experience,mdevelopment that have solid foundations in ical studies of scientic computing [4, 31,software. However, most scientists are never taught how to do thiseciently. As a improve are unaware of tools and practices thatand the reliability of theirand that result, many scientists productivitye development in general (summarized insoftware. describe a set of best practices for scientic software practices will guarantee ecient, error-frtwould allow them to write more reliable and maintainable code withless eort. Wement, but used in concert they will red fdevelopment that have solid foundations in research and experience,and the not computers. errors in scientic software, make it easieand that improve scientists productivitypeople, reliability of their 1. Write programs forthe authors of the software time and eo Software is as important to modern focusing on the underlying scientic quessoftware.2. Automate repetitive tasks.scientic research astelescopesasand computer to record history. as that work exclusively3. Use important to tubes. From groupsSoftware is the test modern scientic research 1telescopes andMaketubes. From groups that work exclusively4.test incremental changes.on computationalto traditional laboratory and eld 1. laboratory andpeople, not c problems, to traditional Write programs for eldScientists writing software need to writeSon computational problems, control.5. 
Use versionscientists, more and more of the daily operation of science re- operation of science re-scientists, more and more of the daily cutes correctly and can be easily read andvolves aroundDont repeat yourself (or others).6. computers. This includes the development of cvolves 7. Plan for mistakes. around computers. This includes the development ofnew algorithms, managing and analyzing the large amountsprogrammers (especially the authors futof data algorithms, managing andworksand pcannot be easily read and understood it isnew 8. Optimize software only after it analyzingknow that it is actually doing what it i that are generated in single research projects, correctly.the large amountstocombining disparate datasets to assess synthetic problems.9. Document the designown software single research projects, and must tand purpose ofthese rather than itssoftware developerscode be productive, mechanics. cof Scientists that are generated in fordata typically develop theirpurposes because doing so requires substantial domain-specicaspects of human cognition into accounthuman working memory is limited, huma t 17. ([email protected]), University of Wisconsin ([email protected] Practices for Scientic ComputingMary University of London ([email protected]), Unive Greg Wilson , ([email protected]), and Hong , Matt of , Richard T. (wilsUniversity D.A. Aruliah , C. Titus Brown , Neil P. ChueUniversityDavisWisconsin Guy ,Steven H.D. Haddock , Katy Hu , Ian M. Mitchell , Mark D. Plumbley , Ben Waugh ,Ethan P. White , Paul Wilson Software Carpentry ([email protected]), University of Ontario Institute of Technology (Dhavide.AruScientists spend an increasing amount of time building and usingaState University ([email protected]), Software Sustainability Institute ([email protected]), Space Telescope([email protected]), University of Toronto ([email protected]), Monterey Bay Aquarium Research Institutesoftware. 
However, most scientists are never taught how to do thisi([email protected]), University of Wisconsin ([email protected]), University of British Columbia (mieciently. As a result, many are unaware of tools and practices thatdMary University of London ([email protected]), University College London ([email protected]),University ([email protected]), and University of Wisconsin ([email protected])would allow them to write more reliable and maintainable code withpless eort. We describe a set of best practices for scientic softwarearXiv:1210.0530v3 [cs.MS] 29 Nov 2012Scientists spend an increasing amount of time building and using research and software development [61 and open source experience,mdevelopment that have solid foundations in ical studies of scientic computing [4, 31,software. However, most scientists are never taught how to do thiseciently. As a improve are unaware of tools and practices thatand the reliability of theirand that result, many scientists productivitye development in general (summarized insoftware. describe a set of best practices for scientic software practices will guarantee ecient, error-frtwould allow them to write more reliable and maintainable code withless eort. Wement, but used in concert they will red fdevelopment that have solid foundations in research and experience,and the not computers. errors in scientic software, make it easieand that improve scientists productivitypeople, reliability of their 1. Write programs forthe authors of the software time and eo Software is as important to modern focusing on the underlying scientic quessoftware.2. Automate repetitive tasks.scientic research astelescopesasand computer to record history. as that work exclusively3. Use important to tubes. From groupsSoftware is the test modern scientic research 1telescopes andMaketubes. From groups that work exclusively4.test incremental changes.on computationalto traditional laboratory and eld 1. 
laboratory andpeople, not c problems, to traditional Write programs for eldScientists writing software need to writeSon computational problems, control.5. Use versionscientists, more and more of the daily operation of science re- operation of science re-scientists, more and more of the daily cutes correctly and can be easily read andvolves aroundDont repeat yourself (or others).6. computers. This includes the development of cvolves 7. Plan for mistakes. around computers. This includes the development ofnew algorithms, managing and analyzing the large amountsprogrammers (especially the authors futof data algorithms, managing andworksand pcannot be easily read and understood it isnew 8. Optimize software only after it analyzingknow that it is actually doing what it i that are generated in single research projects, correctly.the large amountstocombining disparate datasets to assess synthetic problems.9. Document the designown software single research projects, and must tand purpose ofthese rather than itssoftware developerscode be productive, mechanics. cof Scientists that are generated in fordata typically develop theirpurposes because doing so code reviews.10. Conduct requires substantial domain-specic aspects of human cognition into accounthuman working memory is limited, huma t 18. Ruby.(or maybe python) 19. Ruby.(or maybe python)Friends dont let friends do Perl - reddit user 20. Programming better beingable to use understand and improve your code in 6 months & in 60 years - approximate Damian Conway 21. Programming better beingable to use understand and improve your code in 6months & in 60 years - approximate Damian Conway variable naming 22. Programming better beingable to use understand and improve your code in 6months & in 60 years - approximate Damian Conway variable naming coding width: 100 characters 23. 
Programming better beingable to use understand and improve your code in 6months & in 60 years - approximate Damian Conway variable naming coding width: 100 characters indenting 24. Programming better beingable to use understand and improve your code in 6months & in 60 years - approximate Damian Conway variable naming coding width: 100 characters indenting Followconventions -eg Google R Style or https://github.com/hadley/devtools/wiki/ Style 25. Programming better beingable to use understand and improve your code in 6months & in 60 years - approximate Damian Conway variable naming coding width: 100 characters indenting Followconventions -eg Google R Style or https://github.com/hadley/devtools/wiki/ Style Versioning: DropBox & http://github.com/ 26. Programming better beingable to use understand and improve your code in 6months & in 60 years - approximate Damian Conway variable naming coding width: 100 characters indenting Followconventions -eg Google R Style or https://github.com/hadley/devtools/wiki/ Style Versioning: DropBox & http://github.com/ Automatedtesting. e.g.: 27. Programming better beingable to use understand and improve your code in 6months & in 60 years - approximate Damian Conway variable naming coding width: 100 characters indenting Followconventions -eg Google R Style or https://github.com/hadley/devtools/wiki/ Style Versioning: DropBox & http://github.com/preprocess_snps creates MyFile.tex=# this is equivalent to SweaveOpts{...} ### in shell:opts_chunk$set(fig.path=figure/minimal-, fig.align=center, fig.show=hold)options(replace.assign=TRUE,width=90)@ pdflatex MyFile.textitle{A Minimal Demo of knitr}# --> creates MyFile.pdfauthor{Yihui Xie}maketitleYou can test if textbf{knitr} works with this minimal demo. OK, letsget started with some boring random numbers:=set.seed(1121)(x=rnorm(20))mean(x);var(x)@The first element of texttt{x} is Sexpr{x[1]}. 
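To make the "Automated testing" point concrete, here is a minimal sketch in Ruby using Minitest. The name `preprocess_snps` is taken from the slide, but its behaviour here (dropping records with a missing genotype call and normalising the rest) is purely an assumption for illustration, as is the record layout with `:id` and `:genotype` keys.

```ruby
require "minitest/autorun"

# Hypothetical helper: the name comes from the slide, the behaviour is assumed.
# Drops records whose genotype call is missing or blank, and returns copies of
# the remaining records with the genotype stripped of whitespace and upcased.
def preprocess_snps(records)
  records
    .reject { |r| r[:genotype].nil? || r[:genotype].strip.empty? }
    .map { |r| r.merge(genotype: r[:genotype].strip.upcase) }
end

class PreprocessSnpsTest < Minitest::Test
  def test_drops_missing_calls
    input = [
      { id: "rs1", genotype: "AG" },
      { id: "rs2", genotype: nil },
      { id: "rs3", genotype: "  " }
    ]
    assert_equal ["rs1"], preprocess_snps(input).map { |r| r[:id] }
  end

  def test_normalises_case_and_whitespace
    input = [{ id: "rs1", genotype: " ag " }]
    assert_equal [{ id: "rs1", genotype: "AG" }], preprocess_snps(input)
  end

  def test_does_not_modify_its_input
    input = [{ id: "rs1", genotype: "ag" }]
    preprocess_snps(input)
    assert_equal "ag", input.first[:genotype]
  end
end
```

Running the file with `ruby` executes all three tests. Rerunning them after every change is what lets you keep improving the code without silently breaking it.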
34–35. knitr (Sweave): Analyzing & Reporting in a single file.

    ### in R:
    library(knitr)
    knit("MyFile.Rnw")
    # --> creates MyFile.tex

    ### in shell:
    pdflatex MyFile.tex
    # --> creates MyFile.pdf

MyFile.Rnw:

    \documentclass{article}
    \usepackage[sc]{mathpazo}
    \usepackage[T1]{fontenc}
    \usepackage{url}
    \begin{document}
    <<...>>=
    # this is equivalent to \SweaveOpts{...}
    opts_chunk$set(fig.path='figure/minimal-', fig.align='center', fig.show='hold')
    options(replace.assign=TRUE,width=90)
    @
    \title{A Minimal Demo of knitr}
    \author{Yihui Xie}
    \maketitle
    You can test if \textbf{knitr} works with this minimal demo. OK, let's
    get started with some boring random numbers:
    <<...>>=
    set.seed(1121)
    (x=rnorm(20))
    mean(x);var(x)
    @
    The first element of \texttt{x} is \Sexpr{x[1]}. Boring boxplots
    and histograms recorded by the PDF device:
    <<...>>=
    ## two plots side by side
    par(mar=c(4,4,.1,.1),cex.lab=.95,cex.axis=.9,mgp=c(2,.7,0),tcl=-.3,las=1)
    boxplot(x)
    hist(x,main='')
    @
    Do the above chunks work? You should be able to compile the \TeX{} …

MyFile.pdf (rendered): A Minimal Demo of knitr. Yihui Xie. February 26, 2012. You can test if knitr works with this minimal demo. OK, let's get started with some boring random numbers: set.seed(1121); (x …; mean(x) gives ## [1] 0.3217; var(x) gives ## [1] 0.5715. The first element of x is 0.145. Boring boxplots and histograms recorded by the PDF device: ## two plots side by side (option fig.show='hold').
[Figure: boxplot of x and histogram of x (y-axis: Frequency), side by side]

36–37. Plotting in R
R's graphs suck:
    • embarrassingly ugly
    • require tweaking in Illustrator --> hard to automate.
    • counterintuitive & inconsistent API --> hard to switch between e.g. histogram and density plot.
    • hard to customize.
--> Need for something beautiful, easy & effortless.

38–40. ggplot2: beautiful & (almost) effortless R plots

    ggplot(mtcars, aes(factor(cyl))) + geom_bar()

[Figure: bar chart of count by factor(cyl)]

    ggplot(mtcars, aes(factor(cyl), fill=factor(gear))) + geom_bar()

[Figure: bar chart of count by factor(cyl), bars filled by factor(gear): 3, 4, 5]

41. + layout_karyogram(dn, aes(fill = exReg, color = exReg), geom = "rect")
Scale for x is already present. Adding another scale for x, which will replace the existing scale.
SOFTWARE, Open Access
(dn)$pvalue