Molecular Evolution, Genomic Analysis and FreeBSD Joseph Mingrone Department of Mathematics and Statistics Dalhousie University



About the Bielawski Group

• Small group (5 to 10 members)

• Molecular Evolution and Genomic Analysis

  • We analyze DNA sequences to make inferences

• The research is purely computational

• Multidisciplinary


Outline

• Hardware

• Research tracks

  • Modelling evolution at the molecular level

  • Microbiome / Metagenomics

• Software, design decisions, observations


Hardware: Computing Cluster (Awarnach)

• Purchased from Sun in 2006

• Sun Fire V40z master node (two dual-core Opteron 870 CPUs, 16 GB ECC RAM)

• Twenty compute nodes (X4100), each with two dual-core Opteron 270 CPUs, 4 GB ECC RAM, two 73 GB disks and four gigabit Ethernet ports

• 48-port gigabit SMC switch


Hardware: Storage

• Asus RS300-E7-PS4 1U server

• E3-1230V2 Xeon CPU

• Four Intel 60 GB SSDs (520 Series)

• LSI 9205-8e SAS controller

• Supermicro SC847E16-RJB0D1 JBOD with 10 WD30EFRX 2TB hard drives (ZFS raidz)


Hardware: New Compute Node

• Four 12-core 6348 CPUs

• 256 GB RAM


Track 1: Background

• Studying evolution is studying the past

  • Clues in the present day are used to infer past processes

• Morphology from the fossil record

• Studies with some organisms

• Genetic material contains many markers of events in evolutionary history


Track 1: Molecular Evolution Modelling

• Classify selection pressure at the protein and amino acid level

• Purifying Selection

• Neutral Evolution

• Positive Selection

• Usually most interested in detecting sites under positive selection


Track 1: Arms Race


Track 1: Analysis of HIV Genes

• Viral envelope (Env) 3/91

• DNA polymerase (Pol) 11/947

• Viral infectivity factor (Vif) 10/192


Track 1: Background (The Central Dogma of Biology)

• DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine (G), or Thymine (T)

• Codons specify amino acids, the building blocks of protein


Track 1: Redundant Code

• 4^3 = 64 possible codons (61 sense), 20 amino acids

• Code is redundant

• A nucleotide substitution may or may not change the amino acid
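
The counts on this slide can be checked with a short sketch. It assumes the standard genetic code (NCBI translation table 1); the names `CODON_TABLE` and `is_synonymous` are illustrative and do not come from the talk.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), one letter per codon in
# the order TTT, TTC, TTA, TTG, TCT, ...; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 4^3 codons to its amino acid (or "*" for stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(codon_a: str, codon_b: str) -> bool:
    """Return True when both codons encode the same amino acid."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]


sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
amino_acids = {aa for aa in CODON_TABLE.values() if aa != "*"}

print(len(CODON_TABLE), len(sense_codons), len(amino_acids))  # 64 61 20
print(is_synonymous("CCC", "CCG"))  # True: both encode proline
print(is_synonymous("CAC", "CAG"))  # False: histidine vs. glutamine
```

The two calls at the end show the redundancy in action: a change at the third codon position leaves proline unchanged, while a similar change in a histidine codon alters the amino acid.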


Track 1: A Measure of Selection Pressure

dS = rate of synonymous substitutions

dN = rate of nonsynonymous substitutions

dN/dS = 1: Neutral Evolution

dN/dS < 1: Purifying Selection

dN/dS > 1: Positive Selection

dN/dS = ω

A measure of the strength and direction of selection pressure
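
As a rough illustration of the three regimes, the following sketch maps a pair of rate estimates to a label. The function name `classify_selection` and the `tol` parameter are hypothetical; in practice, whether ω differs from 1 is usually assessed with a statistical test rather than by comparing a point estimate.

```python
def classify_selection(dn: float, ds: float, tol: float = 0.0) -> str:
    """Label the selection regime implied by omega = dN/dS.

    `tol` is an illustrative tolerance around omega = 1; it is an
    assumption of this sketch, not part of the definition.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio dN/dS")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        return "neutral evolution"
    return "positive selection" if omega > 1.0 else "purifying selection"


print(classify_selection(0.1, 0.5))  # purifying selection
print(classify_selection(0.9, 0.3))  # positive selection
```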


Track 1: Markov Process

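\[
q_{ij} =
\begin{cases}
0 & \text{if codons } i \text{ and } j \text{ differ at more than one position} \\
\pi_j & \text{for a synonymous transversion} \\
\kappa \pi_j & \text{for a synonymous transition} \\
\omega \pi_j & \text{for a nonsynonymous transversion} \\
\omega \kappa \pi_j & \text{for a nonsynonymous transition}
\end{cases}
\]

Here \(\pi_j\) is the equilibrium frequency of codon \(j\), \(\kappa\) is the transition/transversion rate ratio, and \(\omega\) is dN/dS.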
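
A minimal sketch of these rate entries follows. It assumes the standard codon-model reading, in which the five values correspond, in order, to: more than one nucleotide difference, synonymous transversion, synonymous transition, nonsynonymous transversion, and nonsynonymous transition. The function `rate` and its parameter names are illustrative, the standard genetic code is assumed, and the caller is expected to pass sense codons only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
PURINES = {"A", "G"}


def is_transition(a: str, b: str) -> bool:
    """True for A<->G or C<->T; assumes a and b are different bases."""
    return (a in PURINES) == (b in PURINES)


def rate(i: str, j: str, pi_j: float, kappa: float, omega: float) -> float:
    """Off-diagonal rate q_ij from codon i to codon j.

    Returns 0.0 unless i and j differ at exactly one position (so i == j
    also gives 0.0 here; diagonal entries are normally set separately so
    that each row of the rate matrix sums to zero).
    """
    diffs = [(a, b) for a, b in zip(i, j) if a != b]
    if len(diffs) != 1:
        return 0.0
    q = pi_j
    if is_transition(*diffs[0]):
        q *= kappa  # transitions are scaled by kappa
    if CODON_TABLE[i] != CODON_TABLE[j]:
        q *= omega  # nonsynonymous changes are scaled by omega
    return q


# Illustrative parameter values, not estimates from any data set.
for target in ("CAT", "CAG", "CGC", "CGG"):
    print("CAC ->", target, rate("CAC", target, pi_j=0.02, kappa=2.0, omega=0.5))
```

Multiplying by κ and ω independently reproduces all four nonzero cases without enumerating them separately.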


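Track 1: Estimating Model Parameters is Computationally Intense

TGCCACCCCGGC...

TGCCAGCCCGGC...

AGCCACCCCGGC...

TGCCACCCCGGC...

TGCCACCTCGGC...

\[
q_{ij} =
\begin{cases}
0 & \text{if codons } i \text{ and } j \text{ differ at more than one position} \\
\pi_j & \text{for a synonymous transversion} \\
\kappa \pi_j & \text{for a synonymous transition} \\
\omega \pi_j & \text{for a nonsynonymous transversion} \\
\omega \kappa \pi_j & \text{for a nonsynonymous transition}
\end{cases}
\]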


Track 2: Microbiome / Metagenomics

• Culturing diverse microbial communities in the lab can be difficult

• Sequence DNA as it exists in the natural environment (metagenomics)

• Understand associations between changes in microbiomes and changes in complex systems

• BiomeNet, BioMiCo: group microbial communities according to properties

El-Swais, Heba, et al. “Seasonal assemblages and short-lived blooms in coastal north-west Atlantic Ocean bacterioplankton.” Environmental Microbiology (2014).

Shafiei, Mahdi, et al. “BioMiCo: a supervised Bayesian model for inference of microbial community structure.” Microbiome 3.1 (2015): 1-15.

Shafiei, Mahdi, et al. “BiomeNet: A Bayesian model for inference of metabolic divergence among microbial communities.” (2014): e1003918.


Research and Computing Technology are Coupled

[Figure: three series plotted on a log10 scale (0 to 10) against year (1970 to 2010): Citations, Moore's Law, and Sequenced DNA. Labelled processors: Intel 4004, Zilog Z80, ARM 1, Pentium, Atom, IBM z13.]

• Sequenced DNA: nucleotides stored in the EMBL Nucleotide Sequence Database

• Moore's Law: number of transistors in current Intel PC processors


Software

• Originally Solaris; since 2009, FreeBSD releases (7.x through 10.1), with STABLE on the storage server and the new compute node

• Poudriere, VirtualBox (MATLAB)

• ZFS (master node and storage server), NFS

• math/R, lang/perl5.20

• print/texlive-full

• biology/paml / Proteus

• biology/ncbi-blast+, DIAMOND, HUMAnN, QIIME, MetaPhlAn, BiomeNet, BioMiCo, BMTagger, PEAR


Software: security/tmux-cssh


Software: sysutils/ganglia-monitor-core, sysutils/ganglia-webfrontend


Software: Basic Unix Tools


Software: Batch Submission, Resource Management with Sun Grid Engine


[Figure: histogram, "Distribution of Port Makefile Size". x-axis: Number of Lines (0 to 1000); y-axis: Frequency (0 to 12000). Annotated point: sysutils/sge62 (257).]


Software: sysutils/slurm-hpc (slurm-wlm)

• Despite the name, Simple Linux Utility for Resource Management (Slurm) is portable: “Portability: Written in C with a GNU autoconf configuration engine. While initially written for Linux, Slurm has been ported to a diverse assortment of systems.”

• “Sequoia, an IBM BlueGene/Q system at Lawrence Livermore National Laboratory with 1.6 petabytes of memory, 96 racks, 98,304 compute nodes, and 1.6 million cores, with a peak performance of over 17.17 Petaflops.”


Software: Porting Software Written by Biologists


Other Cluster Options


Thank You

Questions?

Image credits:

Arms race: www.inkcintc.com.au

HIV images: http://evolution.berkeley.edu/evolibrary/article/medicine_04