Human Genome Structure and Organization Bert Gold, Ph.D., F.A.C.M.G.

Page 1

Human Genome Structure and Organization

Bert Gold, Ph.D., F.A.C.M.G.

Page 2

Genetic Variation

Phenotype: Expression of the genotype (modified by the environment). The structural or functional nature of an individual. Includes:

• appearance, physical features, organ structure
• biochemical, physiologic nature

Genotype: Genetic status; the alleles an individual carries.

Page 3

Learning Objectives

Recap and Update Public and Private Human Genome Project Status

Provide Reminders of Necessary Background for Genetic Disease Association and Linkage Studies

Page 4

Definitions

• Penetrance - The probability that an individual who is ‘at-risk’ for the disorder (i.e., carries the disease allele) develops (expresses) the condition. May be age dependent.
• Expression - The characteristics of a trait or disease that are outwardly expressed, e.g., myotonic dystrophy: myotonia, cataracts, narcolepsy, frontal balding, infertility.
• Ascertainment - The method used in gathering genetic data. Study conclusions differ depending on how affected individuals entered the study.
• Phenocopy - An individual whose phenotype, under the influence of non-genetic agents, resembles the one normally caused by a specific genotype in the absence of those agents.
• Pleiotropy - The capacity of an allele to produce more than one effect, i.e., to manifest its expression in the structure and/or function of more than one organ system or tissue.
• Recurrence Risk - Likelihood that a relative of a proband for a rare disease will have the same disease.

Page 5

Penetrance and Expressivity

• Penetrance: Proportion that expresses a trait
– Complete: P = 1.0 or 100%
– Incomplete (“reduced”): P < 1.0 or < 100%
• Expressivity: Severity of the phenotype
– Expressivity may vary between families (interfamilial) or within families (intrafamilial)
• TRY NOT TO CONFUSE “VARIABLE EXPRESSIVITY” WITH “INCOMPLETE PENETRANCE”
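As a minimal sketch of the penetrance definition above, the proportion can be computed from carrier counts. The function name and the counts are hypothetical and chosen only for illustration.

```python
def penetrance(affected_carriers: int, total_carriers: int) -> float:
    """Proportion of individuals carrying the at-risk genotype who express the trait."""
    if total_carriers <= 0:
        raise ValueError("total_carriers must be positive")
    if not 0 <= affected_carriers <= total_carriers:
        raise ValueError("affected_carriers must be between 0 and total_carriers")
    return affected_carriers / total_carriers


# Hypothetical counts, for illustration only.
p = penetrance(affected_carriers=80, total_carriers=100)
label = "complete" if p == 1.0 else "incomplete (reduced)"
print(f"P = {p:.2f}: {label} penetrance")
```

Expressivity is a separate question: it describes how severe the phenotype is among the carriers who are affected, and is not captured by this proportion.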

Page 6

Chromosomes, Genes and Proteins

Genes are on Chromosomes

Genes may encode proteins or RNA

Page 7

Non-coding RNA ‘genes’

• tRNAs (497 genes were counted; 821 when genes and pseudogenes are counted together)
– tRNAs found are consistent with wobble base pairing
– Codon bias only roughly correlated with tRNA distribution
• rRNAs
• small nucleolar RNAs (snoRNAs)
• snRNAs (spliceosome constituents)
• 7SL RNA
• telomerase RNA
• Xist transcript
• Vault RNA

Page 8

tRNAs

Page 9

Some chromosomes are richer in genes than others

[Figure: number of nucleotides in exons (y-axis, 0 to 3,500,000) by chromosome (x-axis, chromosome 1 through X)]

Page 10

HOXA, HOXB, HOXC and HOXD are in regions with a particularly low density of repeats. This is believed to result from the presence of cis-acting elements in this vicinity.

Page 11

Proteins demonstrate patterns and similarity of function

Page 12

Functionally and structurally similar proteins are organized into families

e.g., E.C., SWISS-PROT, TrEMBL

Page 13

In silico approaches to characterize genes include:

• Pfam, searchable via HMMER
• Other in silico collections include:
– PRINTS
– PROSITE
– SMART
– BLOCKS
• Creation of an Integrated Protein Index (IPI)

Page 14

How many genes are there?

Estimates from the Public Program
– RefSeq
– Exons
– Introns
– Average Sizes
– Coding Sequences (CDS)
– Alternative splice products (about 3%)
– Creation of an Integrated Gene Index (IGI)
– Genscan to Ensembl to Pfam via GeneWise (31,778)
– Could be as low as 24,500 using overprediction corrections.

Page 15

Estimates from Celera

• 25,086 in Assembly 3

Page 16

Pre-existing estimates

• W. Gilbert’s back of the envelope calculation

• Reassociation Kinetics

• Estimates from DoubleTwist using PromoterInspector plus

• Unpublished estimates from Human Genome Sciences

Page 17

Size of Genes:

• Largest: Dystrophin, 2.7 Mb
• Titin
– 80,780 bp coding
– 178 exons
– largest single exon 17,106 bp

Page 18

GENE HOMOLOGS, ORTHOLOGS, PARALOGS

• Vacuolar sorting machinery in yeast
• ABC gene superfamily
• Ig gene superfamily
• FGF superfamily
• Intermediate filament superfamily
• PROTEIN FAMILY EXPANSION APPEARS TO BE A PRIMARY EVOLUTIONARY MECHANISM

Page 20

GENE ONTOLOGY

• Standard Vocabulary

• Hierarchy of terms (Directed ACYCLIC Graph)

• Ashburner et al., Nature Genetics 25:25-29 (2000)

• ‘Bushy’ model
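A directed acyclic graph differs from a simple tree in that a term may have more than one parent. The sketch below illustrates this with a hypothetical five-term vocabulary; the term names are invented for illustration and are not actual Gene Ontology entries.

```python
# Hypothetical mini-vocabulary: each term maps to its list of parent terms.
PARENTS = {
    "function": [],
    "enzyme": ["function"],
    "receptor": ["function"],
    "kinase": ["enzyme"],
    "receptor_kinase": ["kinase", "receptor"],  # two parents: a DAG, not a tree
}


def ancestors(term: str, parents: dict) -> set:
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


print(sorted(ancestors("receptor_kinase", PARENTS)))
```

A gene annotated with a specific term is implicitly annotated with all of that term's ancestors, which is why the upward traversal is the basic operation on such a hierarchy.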

Page 21

Horizontal Transfer controversy

• One of the major conclusions of the Public Genome effort, published in the Feb. 15, 2001 issue of Nature, was: “Hundreds of human genes appear likely to have resulted from horizontal transfer from bacteria at some point in the vertebrate lineage. Dozens of genes appear to have been derived from transposable elements”
• This has now been widely disputed and is believed to result from:
– Microbial contaminants in the sequence.
– Bacterial gene integration into pre-vertebrates
– And:
• “The more probable explanation for the existence of genes shared by humans and prokaryotes, but missing in nonvertebrates, is a combination of evolutionary rate variation, the small sample of nonvertebrate genomes, and gene loss in the nonvertebrate lineages.”
-Salzberg et al., Science

Page 22

Splice Pattern, 98% GT-AG
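The GT-AG pattern refers to the first two and last two nucleotides of an intron, read 5' to 3' on the coding strand. A minimal sketch of that check follows; the function name and the example sequences are hypothetical.

```python
def has_canonical_splice_sites(intron: str) -> bool:
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Hypothetical intron sequences, for illustration only.
print(has_canonical_splice_sites("gtaagtctttcag"))  # GT...AG intron
print(has_canonical_splice_sites("GCAAGTTTTCAG"))   # GC...AG intron, non-canonical
```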

Page 23

Chromatin Structure

• Euchromatin

• Heterochromatin

• Nucleosomes

Page 24

Chromosome Facts

• Chromosomes replicate during S phase

• Chromosomes recombine during Pachytene

• Recombination is an obligate activity

• Sex chromosomes recombine with each other

Page 25

Cytogenetics is done by Karyotyping

• Chromosomes are chemically frozen in metaphase
• Must be carried out on dividing cells
• Microfilament inhibitors
• Microtubule inhibitors
• Membrane lysis
• Pronase, trypsin digest
• Giemsa stain
• G-bands correspond to regions of relatively low GC content

http://genome.ucsc.edu/goldenPath/mapPlots/
http://genome.ucsc.edu/goldenPath/hgTracks.html

Page 26

Cell Division: Meiosis

– Segregation
• Defined: Alleles are paired; gametes receive one of each.
• Exceptions: trisomy and uniparental disomy
– Independent Assortment
• Gene pairs segregate independently
• Exception: linkage

Page 27

Meiosis Creates Gametes

And provides a basis for genetic recombination!

Page 28

Genetic Recombination

• Crossing Over
• Resolution
• Recombinant Chromosomes
– OBLIGATE ACTIVITY
– FEMALE RECOMB. RATES HIGHER THAN MALE
– INCREASED RATES AT TELOMERES
– PARADOX: SHORT ARMS SHOW MORE THAN LONG ARMS
– 1 cM is 1 Mb on long arms, but short arms are 2 cM per Mb and the Yp-Xp pseudoautosomal region is 20 cM per Mb.
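The regional rates quoted above can be turned into a rough conversion from physical to genetic distance. This is only a sketch: it assumes a constant rate within each region, which is a simplification, and the region labels and function name are hypothetical.

```python
# Approximate recombination rates from the slide, in cM per Mb.
CM_PER_MB = {
    "long_arm": 1.0,
    "short_arm": 2.0,
    "pseudoautosomal_Xp_Yp": 20.0,
}


def genetic_distance_cm(physical_mb: float, region: str) -> float:
    """Estimate genetic distance (cM), assuming a uniform rate across the region."""
    if physical_mb < 0:
        raise ValueError("physical_mb must be non-negative")
    return physical_mb * CM_PER_MB[region]


for region in CM_PER_MB:
    print(region, genetic_distance_cm(2.5, region), "cM over 2.5 Mb")
```

The same 2.5 Mb interval spans very different genetic distances depending on where it lies, which is the practical consequence of the short-arm and pseudoautosomal rate differences.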

Page 29

INCREASED RATES AT TELOMERES

Page 30

PARADOX: SHORT ARMS SHOW MORE THAN LONG ARMS

Page 31

Genes

• Units of heredity
• Encode proteins (and some RNAs)
• Human genetics is the study of gene variation in humans
• ‘Gene’ as a term is used ambiguously to refer both to the ‘locus’ and the ‘allele’, i.e., there is only one locus but two alleles in a given individual.
• Sequencing in both genome projects took place upon multiple alleles; this has led to some assembly confusions.
• Ultimately want a haploid genome map.

Page 32

The Human Genome Project

• International public effort commencing in 1990 to sequence the entire human genome by 2005.
• STS approach chosen in 1991
• Private effort launched in 1998 by Celera using ‘Shotgun’ cloning

Page 33

BAC clones, sequenced into BAC end reads, and assembled into ‘contigs’

Page 34

Markerless ‘contigs’ in the Celera assembly are called ‘Scaffolds’

Page 35

Markers are BAC ends in the ‘shotgun’

Page 36

Mate pair reads provided the core of Celera sequence

Page 37

Draft human genome sequences complete by February 2001.

• Published simultaneously in Feb. 2001
– Public Sequence in NATURE (409: 745-964)
– Celera Sequence in SCIENCE (291: 1145-1434)

Page 38

Greater than 50% of sequence is repetitive

Page 39

45% of the human genome is derived from transposable elements

• Long Interspersed Elements: LINEs (21% of genome)
– LINE1: Some still active, autonomous, consist of two ORFs (one is a pol).
– LINE2
– LINE3
• Short Interspersed Elements: SINEs (13% of genome)
– ALU: Some still active, use L1 enzymes to replicate
– MIR
– Ther2/MIR3
• LTR Retroposons
– Consist of gag and pol
– Protease, rt, RNAseH, integrase all encoded
– Reverse transcription occurs cytoplasmically, using a tRNA to prime replication
• DNA Transposons

Page 40

98.5 % of sequence is non-coding.

Approximately 1/3 of the human genome is transcribed (public guess).

Page 41

Allelism

• Alternate forms of a gene
• Recessive disease, e.g., Sickle Cell, CFTR
• Dominant disease, e.g., Achondroplasia, Tuberous Sclerosis

Page 42

Heterozygote or Homozygote

• Genotype 1,2 (heterozygote) or 1,1 (homozygote)
• Heterogeneity or homogeneity of alleles at a locus

Page 43

Genetic Markers

• RFLPs
• VNTRs (STRs)
• Microsatellites
• STSs
• SNPs
• “Tools” used to find disease genes
• “Flags” with locations throughout the genome

Page 44

Polymorphism Information Content versus Heterozygosity (PIC vs. het)

• Determining heterozygosity from SNP rare allele frequency

• Information Content in SNPs versus STRs
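The two measures can be compared numerically. The sketch below assumes Hardy-Weinberg proportions, takes expected heterozygosity as 1 minus the sum of squared allele frequencies (2pq for a biallelic SNP), and uses the standard PIC definition of Botstein et al. (1980). The allele frequencies are hypothetical.

```python
def heterozygosity(freqs):
    """Expected heterozygosity: 1 - sum(p_i^2). Equals 2pq for two alleles."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC: heterozygosity minus sum over allele pairs i<j of 2 * p_i^2 * p_j^2."""
    n = len(freqs)
    correction = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return heterozygosity(freqs) - correction


# Hypothetical markers: a SNP with rare-allele frequency 0.2,
# and an STR with ten equally frequent alleles.
snp = [0.8, 0.2]
str_marker = [0.1] * 10
print("SNP: het = %.4f, PIC = %.4f" % (heterozygosity(snp), pic(snp)))
print("STR: het = %.4f, PIC = %.4f" % (heterozygosity(str_marker), pic(str_marker)))
```

A biallelic SNP cannot exceed a heterozygosity of 0.5 (reached at p = q = 0.5), whereas a multiallelic STR can approach 1, which is why a single STR is generally more informative than a single SNP.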

Page 45

Typology of SNPs

• Type I: Coding, non-synonymous, non-conservative
• Type II: Coding, non-synonymous, conservative
• Type III: Coding, synonymous
• Type IV: Non-coding, 5’-UTR
• Type V: Non-coding, 3’-UTR
• Type VI: Other non-coding
• Type I and Type II SNPs have lower heterozygosity than other SNPs, presumably as a result of selective pressure.
– About 25% of type I and type II SNPs have minor allele frequencies > 15%
– About 60% have minor allele frequencies < 5%

Page 46

Mutation

• Occurs more often during male meiosis
• Occurs more often in ‘long genes’
• More easily detected in Dominant Diseases
– Achondroplasia
– Duchenne Muscular Dystrophy
• May often involve CpG mutating to TpG

Page 47

Autosomal Recessive Inheritance

• Two copies of the mutant allele are required to be affected
• Carriers have one copy of the mutation and are unaffected
• On average, 25% of offspring of two carriers will be affected
• Males and females affected in equal number
• E.g., Sickle Cell, beta-thalassemia, CF
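The 25% figure follows from enumerating the four equally likely allele combinations in a carrier-by-carrier mating. A minimal sketch, with "A" as the normal allele and "a" as the recessive disease allele (symbols chosen here for illustration):

```python
from collections import Counter
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict:
    """Punnett square: probability of each offspring genotype.

    Each parent is a two-character genotype such as "Aa"; every gamete
    combination is treated as equally likely.
    """
    counts = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


# Two unaffected carriers.
result = offspring_genotypes("Aa", "Aa")
print(result)
print("Affected (aa):", result["aa"])
```

The same enumeration gives 50% carriers (Aa) and 25% non-carriers (AA), and it applies to each pregnancy independently.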

Page 48

X Linked Recessive (Sex Linked)

• Females rarely affected

• No male to male transmission

• Affected males transmit gene to all daughters

• E.g., Duchenne Muscular Dystrophy, Hemophilia A

Page 49

Autosomal Dominant Inheritance

• Each child at 50% risk

• Does not skip generations

• Often, lethal in double dose

• Large genetic load

Page 50

X-linked Dominant Pedigree

• Example is Hypophosphatemic, Vitamin D Resistant Rickets
• Distinguished from Autosomal Dominant by:
– No male-to-male transmission
– All daughters of affected fathers are affected
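Both distinguishing features follow from the fact that a father passes his X chromosome to every daughter and his Y chromosome to every son. The sketch below enumerates the offspring of an affected father and an unaffected mother; the allele symbols ("D" for an X carrying the dominant disease allele, "n" for a normal X, "Y" for the Y chromosome) are hypothetical labels.

```python
from itertools import product


def x_linked_offspring(father, mother):
    """All (maternal, paternal) sex-chromosome combinations, equally likely."""
    return list(product(mother, father))


def is_son(child):
    return "Y" in child


def is_affected_dominant(child):
    """X-linked dominant: one copy of the disease allele is sufficient."""
    return "D" in child


# Affected father (D, Y) and unaffected mother (n, n).
children = x_linked_offspring(father=("D", "Y"), mother=("n", "n"))
sons = [c for c in children if is_son(c)]
daughters = [c for c in children if not is_son(c)]

print("All daughters affected:", all(is_affected_dominant(d) for d in daughters))
print("Any son affected:", any(is_affected_dominant(s) for s in sons))
```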

Page 51

IMPORTANT NOTE:

Dominant and Recessive refer to the phenotypic expression of alleles, NOT to intrinsic characteristics of gene loci.

Page 52

Inheritance Pattern Complexities

• Pseudodominant Transmission of a Recessive
• Pseudorecessive Transmission of a Dominant
– Misassigned paternity, causal heterogeneity, incomplete penetrance, germline mosaicism
• Mosaicism
• Mitochondrial Inheritance
• Penetrance and Expressivity
– Semi-dominant, gender-influenced, age-related, transmission-related, imprinting
• Uniparental Disomy (UPD)
• Environmental effects, phenocopies

Page 53

Preview of linkage analysis

• Characterizing Human Genetics:
– Long generation time
– Inability to control matings
– Inability to control study population
– Inability to control exposures to environmental conditions
– It is possible to define phenotypes well!
– Can study genetic structures through family history
– Link phenotypes and genetic structures through statistical methods