L20Biol261Genomics2014F


  • 8/9/2019 L20Biol261Genomics2014F


    Lecture 22, Genomics

    Genetic Analyses

    (1) FORWARD analysis began with classical transmission methods, tracing the inheritance of mutant alleles or chromosomes.

    (2) REVERSE analysis begins with mutating a reported gene sequence (knockout), then tracing its inheritance and identifying its phenotypic expression.

    "I find it sometimes very difficult to tell what someone means when they talk about genes because we don't share the same definition," says developmental geneticist William Gelbart of Harvard University in Cambridge, Massachusetts.

    You have already struggled with multiple definitions of a gene, based on transmission and cytological genetics. Now, using your understanding of molecular genetics, propose a gene definition and explain why it would be practical, comprehensive, and consistent with the requirements of a gene (see lecture 11).


    Increasingly, forward genetics, change-of-function insertions, and gene knockouts (or, eventually, gene replacement) all depend on knowing something about the sequence and location of a gene in a genome.

    TODAY: Genomics and its derivatives


    Genome: a (one) complete set of an organism's genetic information or, usually, one complete set of chromosomes (monoploid); sometimes, the nuclear DNA content.

    The original definition of Genomics included:
    (1) Mapping chromosomes
    (2) Sequencing chromosomes and identifying genes
    (3) Analyzing the functions of entire genomes

    Currently genomics is divided into several fields of study:
    Structural genomics - the study of genome structure
    Comparative genomics - the study of genome diversity/evolution
    Bioinformatics - information from sequence structure
    Functional genomics - the transcriptome (the complete set of RNAs transcribed from a genome) and the proteome (the complete set of proteins encoded by a genome)


    But first, Sequencing the Human Genome

    Clone by Clone method: Map first, sequence later (publicly funded Human Genome Project)
    Shotgun method: Sequence first, map later (Celera corporation)


    Genetic maps of chromosomes are based on recombination frequency between markers:

    Low density: for most visibly expressed genes in eukaryotic breeding studies, about 1% recombination is the practical limit of resolution, set by the number of progeny ("bodies") you have to score.

    Higher density genetic maps use restriction sites and gene localization probes as landmarks.
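The practical limit on low density maps follows from counting: a recombination frequency is the number of recombinant progeny divided by the total progeny scored, so rare recombinants require many progeny. A minimal Python sketch of that arithmetic is below; the function names and the 95% detection threshold are illustrative assumptions, not part of the lecture.

```python
import math

def recombination_frequency(recombinants: int, total_progeny: int) -> float:
    """Fraction of scored progeny that are recombinant between two markers."""
    if total_progeny <= 0:
        raise ValueError("total_progeny must be positive")
    return recombinants / total_progeny

def progeny_needed(rf: float, confidence: float = 0.95) -> int:
    """Smallest number of progeny n such that at least one recombinant is
    seen with the given probability: 1 - (1 - rf)**n >= confidence.
    The 0.95 default is an assumed threshold for illustration."""
    if not 0 < rf < 1:
        raise ValueError("rf must be between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - rf))

# Hypothetical cross: 12 recombinants among 1000 scored progeny.
print(recombination_frequency(12, 1000))
# Progeny to score to be likely to see even one recombinant at 1%.
print(progeny_needed(0.01))
```

Measuring a 1% frequency with any precision takes many more progeny than simply detecting one recombinant, which is why breeding studies rarely resolve markers closer than this.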


    Genetic maps of chromosomes start with ordinal distance based on recombination frequency between visible markers.

    Low density cytogenetic (cytological, ideogram) maps are based on the location of markers within or near microscopically visible cytological features.


    Low density genetic maps of chromosomes are based on recombination frequency between visible markers.

    Cytogenetic (cytological, ideogram) maps are based on the location of markers within or near cytological features.

    High density maps integrate cytological and physical maps.

    Anchor markers correlate the cytogenetic maps to the physical maps. Chromosome fragments can be identified by migration pattern (RFLP), or tagged using PCR to amplify short, unique (200-500 bp) Sequence Tagged Sites (STS), short cDNA sequence probes (Expressed Sequence Tags, or EST), and Short (or Simple) Sequence Length Polymorphisms (SSLP, short repetitive elements). These tagged fragments of known sequence can be related to the cytogenetic map by probing with complementary sequence.

    Physical maps are measured in base pairs, kilobase pairs or megabase pairs. They often show the location of overlapping genomic fragment clones (contigs) and unique sequences (STS).


    1996-7: Anchoring the physical map: 2335 microsatellite (SSLP) sites, 16,000 STS-marked loci and RFLPs were used to map 1600 human genes.

    Figure: Human chromosome 1 (fig 4-20), with restriction fragments labeled.


    Ordered Clone by Clone method (map first, sequence later): Screen large clones (BAC) from a chromosome library for known sites (restriction sites, known genes, or other sequence) to anchor the map.

    HindIII digest, then agarose gel electrophoresis on the fragmented clone: stain, and characterize fragments by migration distance.

    Do two clones share part of their fragment pattern? Shared fragments indicate overlap (different clones share sequence). Use the overlap to orient the clone fragments into a map.
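The fingerprint comparison described above can be sketched in a few lines of Python. This is an illustration only: the clone names, the fragment sizes, and the rule "two or more shared fragments suggests overlap" are assumptions chosen for the example, and real gels would need a size tolerance rather than exact matching.

```python
def shared_fragments(fingerprint_a, fingerprint_b):
    """Fragment sizes (bp) that appear in both restriction fingerprints."""
    return sorted(set(fingerprint_a) & set(fingerprint_b))

def likely_overlap(fingerprint_a, fingerprint_b, min_shared=2):
    """Call two clones overlapping if they share enough fragments.
    min_shared=2 is an assumed cutoff for illustration."""
    return len(shared_fragments(fingerprint_a, fingerprint_b)) >= min_shared

# Hypothetical HindIII fragment sizes (bp) for three BAC clones.
clones = {
    "BAC_A": [1200, 3400, 5100, 800],
    "BAC_B": [5100, 800, 2300, 950],
    "BAC_C": [2300, 950, 4100],
}

for first, second in [("BAC_A", "BAC_B"), ("BAC_B", "BAC_C"), ("BAC_A", "BAC_C")]:
    shared = shared_fragments(clones[first], clones[second])
    print(first, second, shared, likely_overlap(clones[first], clones[second]))
```

In this made-up data BAC_A overlaps BAC_B and BAC_B overlaps BAC_C, while BAC_A and BAC_C share nothing, so the clones order as A, B, C.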


    Physical maps are built by reconstructing the order of fragments cut by restriction enzymes. The first cloning vector is usually a YAC. For example, five YACs were known to hybridize to one chromosome band (17q2). A restriction enzyme cutting an 8 base palindrome sequence, which has a low sequence probability (on average one site every 4^8, or about 66,000, bases), was used to cut the chromosome fragment. The fragments were denatured, and the 5 single stranded radioactive YACs were hybridized to blots of the digest to visualize the target chromosome fragments. The autoradiogram is below. What is the order of the band fragments?

    Figure labels: chromosome band map of RFLP fragments (1, 2, 3). The exposed photographic image shows lanes (columns) corresponding to the same chromosome DNA tested with 5 different YAC probes.
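The figure of roughly 66,000 bases between cuts comes from the chance of finding one specific 8 base sequence at a given position. A short Python sketch of that calculation follows, assuming all four bases are equally frequent and independent (a simplification; real genomes deviate from it). The function names and the 1 Mb example region are illustrative.

```python
def expected_site_spacing(site_length: int) -> int:
    """Average number of bases between occurrences of one specific
    recognition sequence, assuming equal and independent base frequencies."""
    return 4 ** site_length

def expected_fragment_count(region_bp: int, site_length: int) -> float:
    """Rough number of fragments expected from digesting a region."""
    return region_bp / expected_site_spacing(site_length)

print(expected_site_spacing(8))   # 8 base cutter: 65536
print(expected_site_spacing(6))   # 6 base cutter: 4096
print(expected_fragment_count(1_000_000, 8))  # hypothetical 1 Mb region
```

A rare 8 base cutter therefore gives a handful of large fragments per megabase, few enough to order by hybridization.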


    Map first, sequence later: the clone by clone, or ordered clone, approach (public).

    Minimum tiling path: the fewest clones necessary to get a complete sequence.
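One way to see what a minimum tiling path means is to treat each mapped clone as an interval on the physical map and pick the fewest intervals that cover the region. The Python sketch below uses the standard greedy interval-cover strategy; it is an illustration under assumed inputs (clone names and kb coordinates are hypothetical), not the procedure used by the public project.

```python
def minimum_tiling_path(clones, region_start, region_end):
    """Greedy interval cover: from the current position, always take the
    clone that starts at or before it and extends furthest to the right.

    clones maps a clone name to its (start, end) position on the map."""
    path = []
    position = region_start
    while position < region_end:
        best = None
        for name, (start, end) in clones.items():
            if start <= position < end:
                if best is None or end > clones[best][1]:
                    best = name
        if best is None:
            raise ValueError(f"gap in clone coverage at position {position}")
        path.append(best)
        position = clones[best][1]
    return path

# Hypothetical BAC positions in kb along a 500 kb region.
bacs = {
    "BAC_1": (0, 150),
    "BAC_2": (100, 300),
    "BAC_3": (120, 260),
    "BAC_4": (250, 400),
    "BAC_5": (280, 500),
}
print(minimum_tiling_path(bacs, 0, 500))  # ['BAC_1', 'BAC_2', 'BAC_5']
```

In practice neighbouring clones on the path must share enough sequence to confirm the join, so a real selection would also require a minimum overlap between consecutive clones.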


    Whole genome Shotgun method

    Small insert clones are prepared directly from DNA and sequenced.

    Overlap: the overlap is determined directly by sequence, not indirectly by fragment length.

    Most of the genome is sequenced many (10-15) times to get the correct overlap.
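Sequencing the genome "many times" is usually expressed as coverage: total bases read divided by genome size. The Python sketch below shows that arithmetic together with the common Poisson approximation for the fraction of bases missed by every read. The 3 billion bp genome and 500 bp read length are assumed round numbers for illustration, not figures from the lecture.

```python
import math

def reads_needed(genome_bp: int, read_bp: int, coverage: float) -> int:
    """Number of reads N such that N * read_bp / genome_bp >= coverage."""
    return math.ceil(coverage * genome_bp / read_bp)

def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson estimate of the fraction of bases covered by no read,
    assuming reads land uniformly at random."""
    return math.exp(-coverage)

genome_bp = 3_000_000_000   # assumed genome size
read_bp = 500               # assumed read length

print(reads_needed(genome_bp, read_bp, 10))   # 60000000 reads for 10x
print(expected_uncovered_fraction(1))         # about 0.37 at 1x
print(expected_uncovered_fraction(10))        # about 0.000045 at 10x
```

The steep drop in uncovered sequence between 1x and 10x is one reason shotgun projects oversample; repeats, discussed later, are the gaps that coverage alone does not close.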


    (2) Paired end reads: each clone is primed from the two different ends of the vector, whose sequence is known, and the intervening sequence is amplified by PCR, producing an end tagged linear sequence. Multiple end tagged inserts may then be overlapped to produce a sequence contig.


    Where do the fragments overlap?


    The overlap of sequences in regions of identity can be used to make contiguous sequences.

    (1) GATCTCGCCGCGTTGGAGAAGGACTACGAGGAGGTTGGCTCTGAGTCCGAC

    TCTGAGTCCGACCCGTATCC

    (2) ATGATGATGATGAGGATGGCGATGATGGTGACGAGTACTAG

    AGGATGGCGATGATGGTGACGAGTACTAGAGGAGTCGTCGTCGTCTGGGGGCT

    (3) TGATGTTCTGTGTGTCAAGGCCTGATTGATAACTGCTGCTATCCCATGATCTGCCAGTGT
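The overlaps in pairs (1) and (2) above can be found mechanically by looking for the longest end of one read that matches the start of the next. The Python sketch below does exactly that with exact string matching; the function names and the minimum overlap of 5 bases are assumptions for illustration, and real assemblers also tolerate sequencing errors and check the reverse complement.

```python
def overlap_length(left: str, right: str, min_overlap: int = 5) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0

def merge(left: str, right: str, min_overlap: int = 5) -> str:
    """Join two reads into one contig, writing the shared region once."""
    k = overlap_length(left, right, min_overlap)
    if k == 0:
        raise ValueError("no overlap of the required length")
    return left + right[k:]

# Read pairs (1) and (2) from the slide.
read_1a = "GATCTCGCCGCGTTGGAGAAGGACTACGAGGAGGTTGGCTCTGAGTCCGAC"
read_1b = "TCTGAGTCCGACCCGTATCC"
read_2a = "ATGATGATGATGAGGATGGCGATGATGGTGACGAGTACTAG"
read_2b = "AGGATGGCGATGATGGTGACGAGTACTAGAGGAGTCGTCGTCGTCTGGGGGCT"

print(overlap_length(read_1a, read_1b))  # 12
print(merge(read_1a, read_1b))
print(overlap_length(read_2a, read_2b))  # 29
print(merge(read_2a, read_2b))
```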


    Problem? Repetitive sequences leave gaps. Using clones containing fragments of different sizes (from different restriction enzymes), there will be overlap.


    Sequencing DNA fragments


    DNA SEQUENCING - Sanger Method: A cloned fragment of DNA is sequenced by using:

    (1) a specific primer piece of DNA (oligonucleotide) to replicate the DNA from a known, pre-defined starting point.
    (2) a spike of radioactive dideoxynucleotides (ddATP, ddCTP, ddGTP or ddTTP) added to an excess of normal dATP, dCTP, dGTP and dTTP, plus DNA polymerase. The ddNTPs are randomly incorporated and terminate strand elongation.
    (3) electrophoresis to visualize the DNA.
    (4) reading the fragment order (by size).
    (5) reconstructing the complementary original sequence.
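The read-out logic of steps (2) to (5) can be sketched in Python. This is a simplified model under stated assumptions: the template is a short hypothetical sequence written 3' to 5', primer length is ignored, and every possible termination product is assumed to be present, so the random incorporation is replaced by a complete enumeration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesized_strand(template_3to5: str) -> str:
    """New strand (5' to 3') made by polymerase copying the template."""
    return "".join(COMPLEMENT[base] for base in template_3to5)

def termination_lengths(template_3to5: str) -> dict:
    """For each ddNTP reaction, the lengths of the terminated fragments:
    a ddA fragment ends wherever the new strand has an A, and so on."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized_strand(template_3to5), start=1):
        lanes[base].append(length)
    return lanes

def read_gel(lanes: dict) -> str:
    """Read fragments from shortest to longest to spell the new strand."""
    base_at_length = {n: base for base, sizes in lanes.items() for n in sizes}
    return "".join(base_at_length[n] for n in sorted(base_at_length))

template = "TACGGATC"                      # hypothetical template, 3' to 5'
lanes = termination_lengths(template)      # one lane per ddNTP
new_strand = read_gel(lanes)               # step (4): order by size
original = "".join(COMPLEMENT[b] for b in new_strand)  # step (5)

print(lanes)
print(new_strand)
print(original == template)
```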


    Automated Sequencing uses fluorescent tags for each ddNTP reaction. The sequencing reaction can be done in a tube, and it is read by a light detector.


    Pyrosequencing requires single strand DNA (template), a DNA primer, DNA polymerase and dNTPs. Read the sequence by the chemiluminescence, powered by ATP produced by sulfurylase.


    The basic techniques for sequencing entire genomes:

    (1) libraries (whole genome) (2) cloning vectors (3) PCR (4) DNA

    sequencing machines (5) chromosome maps (6) computers


    The DNA sequence is the base for computer-assisted analyses

    Structural genomics involves the analysis of gene sequence, gene number, order and physical nature of chromosomes.
    Comparative genomics - similarity and divergence among genes with a similar function in different species
    Bioinformatics is the use of computer analysis for structural or functional genomics.
    Proteomics - the study of all the proteins of an organism.
    Transcriptomics - transcript studies
    Functional genomics studies the function of genes: gene expression, interactions between genes and proteins, and between proteins


    Genomics is the study of genomes in their entirety.

    Most bacteria are now known by their sequence or partial sequence; viruses can be sequenced in a day or two.

    Over 100 eukaryotic genomes have been sequenced, including:
    Human, Mouse, Yeast, Several fungi
    Malaria parasite, Mosquito
    Arabidopsis, Rice
    Poplar tree

    Many other species have a great deal of cDNA and gene sequences, especially ESTs - partial sequences of cDNA clones.


    93% gene similarity - many mutations are rearrangements


    Bioinformatics: (broadly) computational challenges in biology, or (narrowly) the information content of the genome. A first objective is the identification of binding sites or the functional elements: gene annotation.


    Bioinformatics (the information content) collates multiple sources of information, including comparative genomics (BLAST search), cDNA sequence and ORFs, to annotate a candidate gene sequence.

    Figure label: Sequence information
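Of the evidence sources listed for annotation, the open reading frame (ORF) is the simplest to compute directly from sequence. The Python sketch below scans the three forward reading frames for a start codon followed in frame by a stop codon. It is a minimal illustration: the example sequence and the minimum length are hypothetical, the reverse strand is not searched, and introns are ignored, so it applies to cDNA or prokaryotic sequence rather than raw eukaryotic genomic DNA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(sequence: str, min_codons: int = 3):
    """Return (start, end, frame) for each ATG...stop ORF on the forward
    strand. `end` is exclusive and includes the stop codon."""
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ORF
    return orfs

candidate = "CCATGGCTGAATTCTAAGG"   # hypothetical candidate sequence
for start, end, frame in find_orfs(candidate):
    print(frame, start, end, candidate[start:end])
```

An ORF alone is weak evidence; the slide's point is that it becomes convincing when a cDNA and a BLAST hit in another species agree with it.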


    A coding transcriptome (transcriptomics) represents that small percentage of the genetic code that is (apparently) transcribed into RNA molecules, estimated to be less than 5% of the genome in humans (Adams J. (2008) Nature Education 1:1).

    It now appears that the majority of the human genome is transcribed (including introns and intervening sequence), and the vast majority of sequences are non-protein-coding (Frith et al., 2005). The proportion of transcribed sequences that are non-protein-coding appears to be greater in mammals than in nematodes or Drosophila.

    Frith et al., 2005. E.J.H.G. 13:894-897


    The ENCODE PROJECT: identify and map all the transcribed regions of the human genome, including regulatory regions, replication origins, DNA methylation and histone methylation sites, etc.

    The pilot project found, among other interesting findings:

    On average, per coding region:
    - There are 5.4 different transcripts per coding region; over half showed transcription from both strands (exact reverse direction complements).
    - 63% of the mouse genome is transcribed; 1-2% have recognizable exons, 41% span introns, 22% span intergenic regions.
    - The majority of the human genome is transcribed.
    - Genome: islands of protein coding sequence, with interwoven and overlapping transcription units spanning the genome.

    Gene definition? MicroRNA - regulatory gene?


    Proteomics - the study of the proteome, the sequence and expression of all proteins

    25,000-35,000 genes, 100,000-300,000 different proteins? (alternative splicing, RNA editing, or alternative transcription initiation and termination sites). Unknown fraction


    Genomic sequencing has made possible a new approach to genetics called functional genomics, which focuses on genome-wide patterns of gene expression and the mechanisms by which gene expression is coordinated.

    DNA microarray (or chip) - a flat surface about the size of a postage stamp with up to 100,000 distinct spots, each containing a different immobilized oligonucleotide DNA sequence, representing all the known genes in a genome or all the known cDNA from a genome, suitable for hybridization with DNA or RNA isolated from cells growing under different conditions.

    Functional Genomics - study of expression patterns


    Figure labels: unknown, known.

    Relate - which genes are active in the tissue

    Every yeast gene was cloned, and samples are spotted onto glass slides.


    Binding (color) intensity indicates RNA concentration - gene activity
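Spot intensities are usually compared as ratios between a sample and a reference, reported on a log2 scale so that a doubling and a halving are symmetric. The Python sketch below shows that conversion. The gene names, intensity values, and the pseudocount of 1 are made up for illustration, and real analyses also subtract background and normalize between arrays.

```python
import math

def log2_ratio(sample: float, reference: float, pseudocount: float = 1.0) -> float:
    """log2 of sample over reference intensity. The pseudocount (an assumed
    value) keeps the ratio defined when a spot reads zero."""
    return math.log2((sample + pseudocount) / (reference + pseudocount))

# Hypothetical spot intensities: (sample, reference) per gene.
spots = {
    "gene_A": (799.0, 99.0),    # brighter in the sample
    "gene_B": (99.0, 799.0),    # brighter in the reference
    "gene_C": (499.0, 499.0),   # unchanged
}

for gene, (sample, reference) in spots.items():
    print(gene, round(log2_ratio(sample, reference), 2))
```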


    DNA chips use synthetic DNA, oligonucleotides, that can be spotted at a density of 10^6/cm^2, so fragments from all human or other eukaryotic organisms can be annealed to the chip.


    Figure: Transcriptional regulation of ~2500 genes showing significant changes during the first 2.75 hours of C. elegans development.
    Time: minutes before (-2) to minutes after (165) the 4 cell stage in development (gastrulation).
    Transcript abundance is shown relative to a non-dividing cell.


    A glossary of types of DNA sequence:

    1. Full length cDNA - complement of the mRNA
    2. Full length (eukaryotic) gene clone (exons, introns, flanking regions)
    3. Restriction fragments, Restriction Fragment Length Polymorphisms (RFLPs)
    4. SNPs - Single Nucleotide Polymorphisms
    5. PCR clone - partial gene sequence
    6. Large genomic clones: BACs or YACs may have the sequence of many adjacent genes
    7. Satellite DNA - mid and highly repetitive DNA, including VNTRs, mini and microsatellite DNA
    8. STS (sequence tagged sites) - short unique sequences used to hybridize to chromosomes
    9. Expressed sequence tags (ESTs) - partial sequences of cDNAs used as probes to ID chromosome locations, correlate RFLP and cytological maps
    10. SSLP (short (simple) sequence length polymorphisms) - short repetitive sequences used to anchor a map
