Introduction to Molecular Biology, Genetics and Genomics

Sushmita Roy

www.biostat.wisc.edu/bmi576/

sroy@biostat.wisc.edu

September 6, 2012

BMI/CS 576

Goals for today

• Molecular biology crash course:
– The different parts of a cell
– DNA, RNA, chromosomes, nucleus, cytoplasm
– Biochemical entities of a cell: mRNA, proteins, metabolites
– Genes, heredity, transcription, translation, gene regulation, gene expression, alternative splicing

• Genomics crash course:
– Genomes, functional genomics, other omes, networks

Organization of biological information

Organism

Tissue

Gene

Chromosome

Cell

http://publications.nigms.nih.gov/thenewgenetics/chapter1.html

The central dogma of Molecular biology

DNA

RNA

Proteins

Transcription

Translation

Image from the DOE Human Genome Program, http://www.ornl.gov/hgmis

DNA

• Short for deoxyribonucleic acid

• Composed of small chemical units called nucleotides, each carrying one of four bases:
– adenine (A), cytosine (C), guanine (G) and thymine (T)
– ATGC is the alphabet

• DNA is double stranded: made up of two twisting strands

• Each strand of DNA is a string composed of the four letters: A, C, G, T
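
Because a strand is just a string over the letters A, C, G, T, ordinary string operations apply to it directly. A minimal sketch in Python; the function names and the example fragment are illustrative and not part of the course material:

```python
from collections import Counter

DNA_ALPHABET = set("ACGT")


def is_valid_dna(seq: str) -> bool:
    """True if seq is non-empty and contains only A, C, G, T."""
    return bool(seq) and set(seq.upper()) <= DNA_ALPHABET


def base_counts(seq: str) -> dict:
    """Number of occurrences of each base, reported in A, C, G, T order."""
    counts = Counter(seq.upper())  # missing bases count as 0
    return {base: counts[base] for base in "ACGT"}


fragment = "ATGCGT"  # made-up example fragment
print(is_valid_dna(fragment))
print(base_counts(fragment))
```

A string containing U would be rejected here, since U belongs to the RNA alphabet rather than the DNA one.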

DNA is a double helical molecule

DNA molecules consist of two strands arranged in a double helix

• DNA is made up of nucleotides

The double-helical structure allows the DNA molecule to store and pass on genetic information with great precision

James Watson, Francis Crick, Maurice Wilkins and Rosalind Franklin

Watson-Crick Base Pairs

A always bonds to T

C always bonds to G

This is called base pairing.

A and G are double-ringed structures called purines.

C and T are single-ringed structures called pyrimidines.
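
The pairing rule can be written as a lookup table. The sketch below is only an illustration with made-up names; it builds the partner of a strand base by base and checks that every pair joins one purine with one pyrimidine:

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}
PURINES = {"A", "G"}      # double-ringed
PYRIMIDINES = {"C", "T"}  # single-ringed


def complement(strand: str) -> str:
    """Partner bases of a strand, listed in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand.upper())


def pairs_mix_ring_types() -> bool:
    """True if every base pair joins one purine with one pyrimidine."""
    return all((a in PURINES) != (b in PURINES) for a, b in PAIR.items())


print(complement("ATGC"))
print(pairs_mix_ring_types())
```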

5’ and 3’ of a DNA molecule

• The backbone of this molecule has alternating sugar and phosphate groups

• Each strand of DNA has a “direction”
– at one end, the terminal carbon atom in the backbone is the 5’ carbon atom of the terminal sugar
– at the other end, the terminal carbon atom is the 3’ carbon atom of the terminal sugar

• Therefore we can talk about the 5’ and the 3’ ends of a DNA strand
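
Sequences are conventionally written 5’ to 3’, and the two strands of a double helix run in opposite directions. Reading the partner strand in its own 5’ to 3’ direction therefore means complementing each base and reversing the order. A minimal sketch, with an illustrative function name:

```python
# Translation table mapping each base to its pairing partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand_5_to_3: str) -> str:
    """Return the opposite strand, also written 5' to 3'."""
    return strand_5_to_3.upper().translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```

Applying the function twice returns the original strand, which is a quick sanity check on any implementation.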

DNA stores the blueprint of an organism

• The heredity molecule

• Has the information needed to make an organism

• Base pairing enables self-replication:
– one strand has all the information needed to rebuild the other

Chromosomes

• All the DNA of an organism is divided up into individual chromosomes

• prokaryotes (single-celled organisms lacking nuclei) typically have a single circular chromosome

• eukaryotes (organisms with nuclei) have a species-specific number of chromosomes

Image from www.genome.gov

DNA packaging in Chromatin

• DNA is very long (roughly 2 m per human cell), while the cell is very small

• Chromosomes compress the DNA molecule about 50,000-fold

• The collection of DNA and proteins is called chromatin

Different organisms have different numbers of chromosomes

Organism       # of chromosomes
Yeast          32
Human          46
Fly            8
Mouse          40
Arabidopsis    10
Worm           12

Genes

• genes are the basic units of heredity

• a gene is a sequence of bases that specifies a protein or an RNA

• the human genome comprises ~ 25,000 protein-coding genes (still being revised)

• One gene can have many functions

• One function can require many genes

…GTATGTCTAAGCCTGAATTCAGTCTGCTTTAAACGGCTTC…

Structure of genes

DNA

Gene    Non-coding    Promoter

Gene A Gene B Gene C

Genomes

• A genome is the complete complement of DNA for a given species

• the human genome consists of 2 × 23 chromosomes

• every cell (except egg and sperm cells and mature red blood cells) contains the complete genome of an organism

Some Greatest Hits

Genome                        Where                      Year
H. influenzae                 TIGR                       1995
E. coli K-12                  Wisconsin                  1997
S. cerevisiae (yeast)         internat. collab.          1997
C. elegans (worm)             Washington U./Sanger       1998
D. melanogaster (fruit fly)   multiple groups            2000
E. coli O157:H7 (pathogen)    Wisconsin                  2000
H. sapiens (that’s us)        internat. collab./Celera   2001
Mus musculus (mouse)          internat. collaboration    2002
Rattus norvegicus (rat)       internat. collaboration    2004

Some Genome Sizes

Genome            # base pairs
HIV               9750
E. coli           4.6 million
S. cerevisiae     12 million
C. elegans        97 million
D. melanogaster   137 million
Human             3.1 billion

Number of sequenced genomes

The central dogma of Molecular biology

DNA

RNA

Proteins

Transcription

Translation

RNA

• RNA is like DNA except:
– single stranded
– U is used in place of T

• a strand of RNA can be thought of as a string composed of the four letters: A, C, G, U

Transcription

• In eukaryotes: happens inside the nucleus

• RNA polymerase is an enzyme that builds an RNA strand from a gene

• RNA Pol II is recruited to specific parts of the genome in a condition-specific way

• Transcription factor proteins are responsible for recruiting Pol II

• RNA that is transcribed from a gene is called messenger RNA (mRNA)
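
At the level of strings, transcription can be sketched in two equivalent ways: copy the coding strand with U in place of T, or pair each base of the template strand with its RNA partner. The code below is a simplified illustration with hypothetical function names. It ignores promoters, polymerase recruitment and RNA processing, and it assumes the template is supplied 3’ to 5’ so that the mRNA reads 5’ to 3’:

```python
# RNA partner of each DNA base on the template strand.
_TEMPLATE_TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe_coding_strand(coding_5_to_3: str) -> str:
    """mRNA has the coding strand's sequence, with U in place of T."""
    return coding_5_to_3.upper().replace("T", "U")


def transcribe_template_strand(template_3_to_5: str) -> str:
    """mRNA built by base pairing against the template strand."""
    return template_3_to_5.upper().translate(_TEMPLATE_TO_RNA)


print(transcribe_coding_strand("ATGGTG"))
print(transcribe_template_strand("TACCAC"))
```

Both calls describe the same gene fragment, so they produce the same mRNA string.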

Transcription: Process of turning DNA into RNA

mRNA

The central dogma of Molecular biology

DNA

RNA

Proteins

Transcription

Translation

Translation

• Process of turning mRNA into proteins.

• Happens inside the cytoplasm in ribosomes

• ribosomes are the machines that synthesize proteins from mRNA

• Translation process reads one codon at a time

• translation begins with the start codon

• translation ends with the stop codon
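
The steps above can be sketched as a loop over codons. This is a simplified illustration, not a model of the ribosome: it assumes the standard genetic code, starts at the first AUG it finds, and stops at the first in-frame stop codon. The names `CODON_TABLE` and `translate` are made up for this example:

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):  # one codon at a time
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGUGCUGUCUUAAGG"))  # made-up mRNA
```

The result uses the one-letter amino acid codes, so the made-up mRNA above yields a four-residue peptide starting with methionine (M).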

Translation happens in ribosomes

Codons

• Each triplet of bases is called a codon

• How many codons are possible?

• Each codon codes for a particular amino acid (or signals a stop)
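
The counting question can be checked by enumeration: with four bases and three positions there are 4 × 4 × 4 = 64 possible codons. A short sketch using the RNA alphabet:

```python
from itertools import product

# Every ordered triplet over the four RNA bases.
codons = ["".join(triplet) for triplet in product("ACGU", repeat=3)]

print(len(codons))
print(codons[:4])
```

Since 64 codons cover only 20 amino acids plus the stop signal, several codons map to the same amino acid.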

The Genetic Code

Codons and Reading Frames

Alanine

Threonine
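
The same sequence can be split into codons in three ways, depending on where reading starts. A small sketch, assuming an RNA string and a made-up example sequence in which GCU codes for alanine and ACU for threonine:

```python
def reading_frames(seq: str) -> list:
    """Split seq into complete codons for each of the three start offsets."""
    seq = seq.upper()
    return [
        [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        for offset in range(3)
    ]


for offset, frame in enumerate(reading_frames("GCUACUGCU")):
    print(offset, frame)
```

Only the first frame reads alanine, threonine, alanine. Shifting the start by one or two bases gives entirely different codons.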

Proteins

• Proteins are long strings composed of amino acids

• There are 20 standard amino acids

Amino Acids

Alanine Ala A

Arginine Arg R

Aspartic Acid Asp D

Asparagine Asn N

Cysteine Cys C

Glutamic Acid Glu E

Glutamine Gln Q

Glycine Gly G

Histidine His H

Isoleucine Ile I

Leucine Leu L

Lysine Lys K

Methionine Met M

Phenylalanine Phe F

Proline Pro P

Serine Ser S

Threonine Thr T

Tryptophan Trp W

Tyrosine Tyr Y

Valine Val V

Proteins are the workhorses of the cell

• structural support

• storage of amino acids

• transport of other substances

• coordination of an organism’s activities

• response of cell to chemical stimuli

• movement

• protection against disease

• selective acceleration of chemical reactions

Proteins are complex molecules

• Primary amino acid sequence

• Secondary structure

• Tertiary structure

• Quaternary structure

Some well-known proteins

Hemoglobin: carries oxygen

Insulin: metabolism of sugar

Actin: maintenance of cell structure

Hemoglobin protein HBA1

>gi|224589807:226679-227520 Homo sapiens chromosome 16, GRCh37.p9 Primary Assembly

1 CCCACAGACT CAGAGAGAAC CCACCATGGT GCTGTCTCCT GACGACAAGA CCAACGTCAA

61 GGCCGCCTGG GGTAAGGTCG GCGCGCACGC TGGCGAGTAT GGTGCGGAGG CCCTGGAGAG

121 GATGTTCCTG TCCTTCCCCA CCACCAAGAC CTACTTCCCG CACTTCGACC TGAGCCACGG

181 CTCTGCCCAG GTTAAGGGCC ACGGCAAGAA GGTGGCCGAC GCGCTGACCA ACGCCGTGGC

241 GCACGTGGAC GACATGCCCA ACGCGCTGTC CGCCCTGAGC GACCTGCACG CGCACAAGCT

301 TCGGGTGGAC CCGGTCAACT TCAAGCTCCT AAGCCACTGC CTGCTGGTGA CCCTGGCCGC

361 CCACCTCCCC GCCGAGTTCA CCCCTGCGGT GCACGCCTCC CTGGACAAGT TCCTGGCTTC

421 TGTGAGCACC GTGCTGACCT CCAAATACCG TTAAGCTGGA GCCTCGGTGG CCATGCTTCT

481 TGCCCCTTTG G

DNA sequence (491 bp)

>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens GN=HBA1 PE=1 SV=2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR

Amino acid sequence (142 aa)

Protein 3D structure

RNA Processing in Eukaryotes

• eukaryotes are organisms that have enclosed nuclei in their cells

• in many eukaryotes, RNAs consist of alternating exon/intron segments

• exons are the coding parts

• introns are spliced out before translation
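
Splicing can be pictured as keeping the exon segments of the transcript and joining them in order. The sketch below assumes the exon coordinates are already known (0-based, end-exclusive) and uses a made-up transcript whose intron starts with GU and ends with AG. Finding splice sites is not attempted here:

```python
def splice(pre_mrna: str, exons: list) -> str:
    """Join exon segments, given as (start, end) pairs in transcript order."""
    return "".join(pre_mrna[start:end] for start, end in exons)


exon1, intron, exon2 = "AUGGCC", "GUAAGUUUUCAG", "ACUUAA"  # illustrative
pre_mrna = exon1 + intron + exon2

mature = splice(pre_mrna, [(0, 6), (18, 24)])
print(mature)
```

Passing a different list of exon intervals to the same function gives a different mature mRNA from one gene, which is the idea behind alternative splicing.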

RNA Splicing

RNA Genes

• not all genes encode proteins

• for some genes the end product is RNA
– ribosomal RNA (rRNA), which includes major constituents of ribosomes
– transfer RNAs (tRNAs), which carry amino acids to ribosomes
– microRNAs (miRNAs), which play an important regulatory role in various plants and animals
– lincRNAs (long intergenic non-coding RNAs), which play important regulatory roles

Central Dogma revisited

DNA

RNA

Proteins

Transcription

Translation

ncRNA, miRNA, rRNAs

Non-coding RNA processing

Summary

• Key concepts in molecular biology
– Central Dogma
– DNA, RNA, proteins
– Chromosomes, Nucleus, Ribosomes

• Important processes
– Transcription
– Translation
– RNA splicing

Functional Genomics

• Aims to characterize the genes and proteins of an organism in an unbiased way using high-throughput technologies

• Really focused on “beyond the genetic sequence”

• What does a piece of DNA do?
– Gene, regulatory element, a mutation

• Has generated large collections of “omics” datasets
– Gene expression
– Protein expression
– Metabolite levels

Metabolites

• Metabolism:
– A set of chemical processes in cells
– Needed for sustaining life

• Metabolites: small molecules that are intermediates of metabolism
– Sugar
– Glycerol

• Metabolic pathway
– A series of linked chemical reactions in a cell

The Tri-Carboxylic Acid cycle

Metabolites

Enzyme

Courtesy KEGG Pathways

Yeast metabolic pathways

Context-specific expression of a cell

• The DNA is static

• But the set of mRNAs may differ by cell type, environment, and time point

• A key process is gene regulation
– determines which genes are expressed when

Environmental signal

Transcriptional gene regulation

• Key control process that determines what genes are expressed when

• Requires
– RNA Polymerase
– Transcription factors
– Energy

http://www.youtube.com/watch?v=WsofH466lqk

Transcriptional gene regulation

Transcription factor level (trans)

HSP12

Transcription factor binding sites (cis)

mRNA levels

P2P1

Promoter

Regulation of GAL genes

• GAL genes are required for yeast to grow on galactose

• There are 4 genes that are metabolic
– GAL1, GAL10, GAL2 and GAL7

• There are three that are regulatory
– GAL4, GAL80 and GAL3
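
One common textbook simplification of this switch, stated here as an assumption rather than taken from the slides: Gal4p activates the metabolic GAL genes, Gal80p blocks Gal4p, and Gal3p relieves that block when galactose is present. Glucose repression and all quantitative detail are ignored. As a Boolean sketch with hypothetical function and parameter names:

```python
def metabolic_gal_genes_on(galactose: bool,
                           gal4: bool = True,
                           gal80: bool = True,
                           gal3: bool = True) -> bool:
    """Simplified on/off state of GAL1, GAL10, GAL2 and GAL7.

    Each gal* flag says whether that regulator is functional.
    """
    # Assumption: Gal3p inactivates Gal80p only when galactose is present.
    gal80_blocking = gal80 and not (gal3 and galactose)
    # Assumption: Gal4p activates transcription unless Gal80p blocks it.
    return gal4 and not gal80_blocking


print(metabolic_gal_genes_on(galactose=False))
print(metabolic_gal_genes_on(galactose=True))
```

In this toy model, removing GAL80 switches the metabolic genes on regardless of galactose, while removing GAL4 keeps them off.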

Regulation of GAL genes

No Galactose

In Galactose

A metabolic GAL gene

Transcriptome

• The entire set of RNA products in a cell

• A cell can decide to make more or less of a particular RNA
– Levels change

• Its constituents are context-specific

• Context is determined by the environment of a cell
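
A change in RNA level between two contexts is often summarized as a log2 fold change. The sketch below uses invented expression values for two yeast genes purely for illustration. The pseudocount of 1 is one common convention to avoid dividing by zero, not something prescribed by the slides:

```python
import math


def log2_fold_change(level_a: float, level_b: float,
                     pseudocount: float = 1.0) -> float:
    """log2 of level_b relative to level_a, with a pseudocount added to both."""
    return math.log2((level_b + pseudocount) / (level_a + pseudocount))


# Hypothetical mRNA levels: (without galactose, with galactose)
levels = {"GAL1": (3.0, 1023.0), "ACT1": (300.0, 300.0)}

for gene, (before, after) in levels.items():
    print(gene, round(log2_fold_change(before, after), 2))
```

Positive values mean more RNA in the second context, negative values mean less, and zero means no change.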

Transcriptional Regulatory networks

• The entire set of interactions between TFs and genes in an organism

• The transcriptome is the output of a regulatory network
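
A regulatory network of this kind can be stored as a mapping from each transcription factor to the set of genes it regulates. In the sketch below, the GAL4 edges follow the GAL example from this lecture, while `TF_X` and its targets are placeholders invented for illustration:

```python
# Directed edges: transcription factor -> target genes.
network = {
    "GAL4": {"GAL1", "GAL10", "GAL2", "GAL7"},
    "TF_X": {"HSP12", "GAL1"},  # hypothetical regulator and edges
}


def regulators_of(gene: str, net: dict) -> list:
    """All transcription factors with an edge into the given gene."""
    return sorted(tf for tf, targets in net.items() if gene in targets)


def out_degree(tf: str, net: dict) -> int:
    """Number of genes a transcription factor regulates."""
    return len(net.get(tf, set()))


print(regulators_of("GAL1", network))
print(out_degree("GAL4", network))
```

Queries like these, which ask for the factors controlling a gene or the number of genes a factor controls, are typical first steps when exploring such a network.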

Image courtesy: Dr. Mike Snyder, http://compbio.pbworks.com/w/page/16252928/Transcription%20Regluatory%20Network#1

Understanding cells requires an iterative approach spanning multiple levels

Ideker et al., Science 2002

Summary

• Cells are made up of many different molecular entities

• Functional genomics enables us to identify these entities

• Cells function via the interaction of these entities

• Putting it together into comprehensive models is a major goal of systems biology
