Amia tb-review-09


Translational Bioinformatics 2009: The Year in Review

Russ B. Altman, MD, PhD

Stanford University

Wednesday, March 18, 2009

Thanks!

• Casey Overby

• Gil Omenn

• Iddo Friedberg

• Howard Bilofsky

• Brad Malin

• Phil Bourne

• Ted Shortliffe

• Ramon Felciano

• Atul Butte

• Bernie Daigle

• Chirag Patel

• Sarah Aerni

• David Chen

• Joel Dudley

• Alex Morgan

• Yves Lussier

• Andrea Califano


Goals

• Provide an overview of the major scientific events, trends and publications in translational bioinformatics

• Create a “snapshot” of what seems to be important in March 2009 for the amusement of future generations.

• Marvel at the progress made and the opportunities ahead.


Process

1. Think about what has had early impact

2. Think about sources to trust

3. Solicit advice from colleagues

4. Surf online resources

5. Select some papers to highlight in ~2 slides and others in less than 1 slide.


Caveats

• Strictly considered papers from 3/1/08 to 3/16/09 (one exception)

• Focused on human biology and clinical implications (except really important model systems): molecules, clinical data, informatics.

• Considered both data sources and informatics methods (and their combination)

• Tried to avoid simply following the crowd.


Final list

• 70 semi-finalist papers

• 19 presented here briefly

• 11 others mentioned

• This talk and bibliography will be made available on the conference website.


Final categories

• Literature analysis

• Genetic Privacy

• Genes x Environment

• Genes + drugs/small molecules

• Schizophrenia

• Networks for understanding disease

• Stem cell biology

• Potpourri


But first, a lesson in how to make impact...

“A recipe for high impact” (Cokol et al, Genome Biology, 2007)

• Used MeSH heading usage to define temperature and novelty

• Temperature: high for papers using popular concepts

• Novelty: high for papers using new MeSH headings

• Separately applied to methodology and topic

• Conclusion: high impact factor goes with high topic temperature and medium-to-low method temperature


Method temperature vs. topic temperature


“A recent advance in the automatic indexing of the biomedical literature”

(Neveol et al, J. Biomed. Info.)

• An NLM team used natural language processing to recommend MeSH main heading/subheading pairs

• 48% precision, 30% recall compared to human indexers

• Currently deployed and in use.
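The slide reports precision and recall separately. As a rough single-number summary (not reported on the slide), their harmonic mean, the F1 score, can be computed directly from those two figures:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall of the NLM recommendations vs. human indexers (from the slide)
print(f"F1 = {f1_score(0.48, 0.30):.2f}")  # F1 = 0.37
```

The harmonic mean sits closer to the weaker of the two numbers, so the low recall dominates the summary.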


Genetic Privacy


“Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays”

(Homer et al, PLoS Genetics)

• Previously assumed: reporting pooled data (e.g., SNP allele frequencies for 1000 individuals) would be safe

• This paper demonstrated that, given a person's genotype along with population and mixture allele frequencies, one can infer with high confidence whether that person contributed DNA to the mixture


Key idea: 500K noisy measurements = certainty.
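A minimal simulation of this key idea, using (as we understand it) the distance statistic described by Homer et al. in simplified form: for each SNP, D = |Y - Ref| - |Y - Mix|, where Y is the person's allele frequency (0, 0.5 or 1). A clearly positive mean of D across many SNPs, measured with a one-sample t statistic, suggests the person is in the mixture. The pool size, SNP count, uniform allele frequencies and the equally sized reference sample are all assumptions; no real genotypes are used.

```python
import numpy as np


def homer_t_statistic(y, ref_freq, mix_freq):
    """One-sample t statistic of D = |Y - Ref| - |Y - Mix| across SNPs.

    y:        the person's allele frequency at each SNP (0, 0.5 or 1)
    ref_freq: allele frequencies in an independent reference sample
    mix_freq: allele frequencies reported for the pooled mixture
    A clearly positive statistic means the person sits closer to the
    mixture than to the reference, i.e. evidence of membership.
    """
    d = np.abs(y - ref_freq) - np.abs(y - mix_freq)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))


rng = np.random.default_rng(42)
n_snps, pool_size = 500_000, 1000

# Hypothetical true allele frequencies; everything below is simulated.
p = rng.uniform(0.05, 0.95, n_snps)
ref_freq = rng.binomial(2 * pool_size, p) / (2 * pool_size)

member = rng.binomial(2, p) / 2     # genotype of someone inside the pool
others = rng.binomial(2 * (pool_size - 1), p) / (2 * (pool_size - 1))
mix_freq = (member + (pool_size - 1) * others) / pool_size

outsider = rng.binomial(2, p) / 2   # genotype of someone outside the pool

print(f"member t = {homer_t_statistic(member, ref_freq, mix_freq):.1f}")
print(f"outsider t = {homer_t_statistic(outsider, ref_freq, mix_freq):.1f}")
```

Each SNP shifts the mixture toward a member by only about 1/1000 of the person's deviation from the population, far below the sampling noise, but averaging over hundreds of thousands of SNPs separates the member from the outsider. Reducing `n_snps` weakens the separation.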


“On Jim Watson’s APOE status: genetic information is hard to hide”

(Nyholt et al, Eur. J. of Human Genetics)

• Watson’s genome was published, but the APOE sequence (associated with Alzheimer’s disease) was redacted (the E4 allele increases risk; E2 decreases it)

• Investigators showed that SNPs in LD (correlated) with the APOE markers make imputation trivial (but did not report the inferred allele)

• The JWGB removed an additional 2 Mb of sequence


“Human genomes as email attachments”

(Christley et al, Bioinformatics)

• About 99% of a genome is identical to the reference, so only differences are stored, relative to a reference genome and a reference SNP database

• Encode position offsets, a Huffman code for the sequence, and dbSNP references when possible

• Watson genome: 3.3 million SNPs, 220K indels

• 3.2 GB reduced to 84 MB (relative to the reference genome) and to 4.1 MB (full compression)
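The core trick, storing only differences from a reference, can be sketched with delta-encoded SNP positions packed into variable-length integers. This is an assumed simplification, not the authors' format: it handles only SNPs on a single sequence, packs each alternate base into the low two bits, and omits the Huffman coding, indels and dbSNP lookups listed above.

```python
BASES = "ACGT"


def encode_varint(n: int) -> bytes:
    """LEB128-style encoding: 7 bits per byte, high bit set if more bytes follow."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return bytes(out)


def decode_varints(data: bytes) -> list[int]:
    values, n, shift = [], 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(n)
            n, shift = 0, 0
    return values


def encode_snps(snps: list[tuple[int, str]]) -> bytes:
    """snps: (position, alt_base) pairs sorted by position."""
    out, prev = bytearray(), 0
    for pos, alt in snps:
        # Small gaps between nearby SNPs need few bytes; base lives in the low 2 bits.
        out += encode_varint((pos - prev) * 4 + BASES.index(alt))
        prev = pos
    return bytes(out)


def decode_snps(data: bytes) -> list[tuple[int, str]]:
    snps, pos = [], 0
    for value in decode_varints(data):
        gap, base = divmod(value, 4)
        pos += gap
        snps.append((pos, BASES[base]))
    return snps


snps = [(10_468, "T"), (10_492, "C"), (1_000_000, "G")]   # hypothetical variants
encoded = encode_snps(snps)
print(len(encoded))  # 8 bytes for three SNPs
assert decode_snps(encoded) == snps
```

Positions are stored as gaps because consecutive SNPs are usually close together, so most gaps fit in one or two bytes instead of a fixed four-byte coordinate.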


Gene x Environment


“Prevalence in the United States of selected candidate gene variants: Third National Health and Nutrition Examination Survey, 1991-1994”

(Chang et al, American J. of Epidemiology)

• NHANES: an ongoing population study with exceptionally detailed documentation of health

• 7159 participants

• Allele frequencies for 90 variants, 50 genes from 6 pathways: nutrient metabolism, immunity & inflammation, xenobiotic metabolism, DNA repair, blood pressure, oxidative stress
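Allele frequencies like these are derived from genotype counts, and a Hardy-Weinberg chi-square check is a common sanity test on them. The counts below are hypothetical, and this sketch ignores the NHANES sampling weights that population-level estimates from the survey require.

```python
def allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Frequency of allele A from genotype counts (two alleles per person)."""
    n = n_aa + n_ab + n_bb
    return (2 * n_aa + n_ab) / (2 * n)


def hardy_weinberg_chi2(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square of observed genotype counts vs. Hardy-Weinberg expectations."""
    n = n_aa + n_ab + n_bb
    p = allele_frequency(n_aa, n_ab, n_bb)
    q = 1 - p
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    observed = [n_aa, n_ab, n_bb]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for 1000 participants
print(allele_frequency(360, 480, 160))       # allele A frequency 0.6
print(hardy_weinberg_chi2(400, 400, 200))    # large value: heterozygote deficit
```

The first set of counts matches Hardy-Weinberg proportions exactly (chi-square of 0); the second has the same allele frequency but too few heterozygotes.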


C in VDR rs731236


“The ‘etiome’: identification and clustering of human disease etiological factors”

(Liu et al, BMC Bioinformatics & AMIA SUMMIT)

• Identified 3342 etiological MeSH headings associated with 3159 diseases

• Defined etiology based on the UMLS Semantic Network

• Identified 1100 genes associated with 1034 diseases

• Jointly clustered diseases, genes, and etiologies as a basis for understanding environmental influences on genetic diseases


The “ACE/Hypertension/Diet” cluster


The “p53/cancer/toxin” cluster


Genes + Drugs/Small Molecules


“Drug target identification using side-effect similarity”

(Campillos et al, Science)

• Used similarity of side-effect phenotypes to infer whether two drugs share a target

• Applied to 746 drugs to create a drug-drug network of 1018 relationships

• 261 relationships between chemically dissimilar molecules (20 tested; 13 validated by binding, 11 with reasonable affinity)
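The underlying intuition, that drugs with overlapping side-effect profiles may share a target, can be illustrated with a plain Jaccard similarity over side-effect sets. The paper's actual scoring is more involved than this; the unweighted Jaccard index, the 0.5 threshold and the drug names here are stand-ins for illustration only.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_shared_targets(side_effects: dict[str, set[str]], threshold: float = 0.5):
    """Return drug pairs whose side-effect similarity meets the threshold."""
    pairs = []
    for d1, d2 in combinations(sorted(side_effects), 2):
        score = jaccard(side_effects[d1], side_effects[d2])
        if score >= threshold:
            pairs.append((d1, d2, round(score, 2)))
    return pairs


# Hypothetical drugs and side-effect terms
side_effects = {
    "drug_A": {"nausea", "headache", "dizziness", "insomnia"},
    "drug_B": {"nausea", "headache", "dizziness", "rash"},
    "drug_C": {"rash", "photosensitivity"},
}
print(candidate_shared_targets(side_effects))  # [('drug_A', 'drug_B', 0.6)]
```

Pairs flagged this way are only hypotheses; as the slide notes, candidate relationships still had to be tested in binding assays.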


Drug-drug network


Rabeprazole (a proton pump inhibitor, PPI) and neurological effects


“Estimation of the warfarin dose with clinical and pharmacogenetic data”

(International Warfarin Pharmacogenetics Consortium, New Eng. J. Med.)

• 5000+ patients pooled from 21 sites in 9 countries

• Clinical variables plus genotypes for CYP2C9 and VKORC1

• Pharmacogenetic dosing equation outperformed a clinical-only algorithm

• Patients at the high- and low-dose extremes benefit the most
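The IWPC pharmacogenetic algorithm is a linear model for the square root of the weekly warfarin dose, with terms for clinical variables and for CYP2C9 and VKORC1 genotypes. The sketch below mirrors only that structure: every coefficient is a made-up placeholder, not the published value.

```python
# HYPOTHETICAL placeholder coefficients: they show the model's shape, not IWPC values.
INTERCEPT = 5.0
PER_DECADE_OF_AGE = -0.25
PER_CM_HEIGHT = 0.01
PER_KG_WEIGHT = 0.01
VKORC1_EFFECT = {"G/G": 0.0, "A/G": -0.8, "A/A": -1.7}
CYP2C9_EFFECT = {"*1/*1": 0.0, "*1/*2": -0.5, "*1/*3": -0.9,
                 "*2/*2": -1.1, "*2/*3": -1.9, "*3/*3": -2.3}


def weekly_dose_mg(age_years: float, height_cm: float, weight_kg: float,
                   vkorc1: str, cyp2c9: str) -> float:
    """Predict sqrt(weekly dose) with a linear model, then square it."""
    sqrt_dose = (INTERCEPT
                 + PER_DECADE_OF_AGE * (age_years / 10)
                 + PER_CM_HEIGHT * height_cm
                 + PER_KG_WEIGHT * weight_kg
                 + VKORC1_EFFECT[vkorc1]
                 + CYP2C9_EFFECT[cyp2c9])
    return sqrt_dose ** 2


# Same clinical profile, different genotypes
for vkorc1, cyp2c9 in [("G/G", "*1/*1"), ("A/A", "*3/*3")]:
    print(vkorc1, cyp2c9, round(weekly_dose_mg(60, 170, 70, vkorc1, cyp2c9), 1))
```

With these placeholder numbers, genotype alone moves the prediction roughly tenfold for an identical clinical profile; the real magnitudes are in the published model.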


“Metabolomics analysis reveals large effects of gut microflora on mammalian blood metabolites”

(Wikoff et al, PNAS)

• Mass spectrometry (MS) study of plasma metabolites in colonized and germ-free (uncolonized) clonal mice

• Metabolite levels were markedly affected, particularly those of amino acids

• Phase II metabolizing enzyme response to microflora observed

• Implies that host genotype is only one consideration in predicting the presence and metabolism of xenobiotics


Two systems for drug metabolism


“Metabolic profiles delineate potential role for sarcosine in prostate cancer progression”

(Sreekumar et al, Nature)

• Combined HPLC and mass spectrometry to profile 1126 metabolites across 262 clinical samples (tissue, urine, plasma)

• Metabolic signatures distinguished normal, BPH, localized cancer and metastatic cancer

• Sarcosine, an N-methyl derivative of glycine, was increased in invasive cancer

• Adding sarcosine, or knocking down the enzyme that degrades it, induced an invasive phenotype in benign cells!


Metabolite levels


Sarcosine levels measured


Schizophrenic results on...Schizophrenia


“No significant association of 14 candidate genes with schizophrenia in a large European ancestry sample: implications for psychiatric genetics”

(Sanders et al, Am. J. Psychiatry)

• 1870 cases with schizophrenia, 2002 controls

• 789 SNPs in 14 genes with previously reported associations

• No SNP or haplotype reached genome-wide significance; all results were compatible with chance.
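Why "hits precisely as expected by chance": with 789 tests at a nominal p < 0.05, roughly 39 SNPs are expected to look "significant" even if none is truly associated. The expected count holds regardless of correlation between tests, but the Bonferroni threshold is conservative, and the "at least one hit" probability below assumes independent tests, which SNPs in LD are not.

```python
n_tests = 789          # SNPs tested (from the slide)
alpha = 0.05           # nominal per-test significance level

expected_chance_hits = n_tests * alpha            # expected nominal hits under the null
bonferroni_threshold = alpha / n_tests            # per-test cutoff for family-wise alpha
p_any_hit_if_independent = 1 - (1 - alpha) ** n_tests

print(f"expected hits by chance: {expected_chance_hits:.2f}")
print(f"Bonferroni per-test threshold: {bonferroni_threshold:.1e}")
print(f"P(at least one nominal hit), independent tests: {p_any_hit_if_independent:.3f}")
```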


Hits precisely as expected by chance


“Rare structural variants disrupt multiple genes in neurodevelopmental pathways in schizophrenia”

(Walsh et al, Science)

• Novel deletions and duplications in 20% of cases vs. 5% of controls

“A genome-wide investigation of SNPs and CNVs in schizophrenia”

(Need et al, PLoS Genetics)

• 8 cases, but 0 controls, with deletions > 2 Mb

• No evidence of preferential disruption of schizophrenia pathways


Networks that tell us about disease...


“Network-based global inference of human disease genes”

(Wu et al, Molecular Systems Biology)

• CIPHER tool to predict and rank potential disease genes

• Based on the intuition that similar phenotypes are linked to functionally related genes

• Reports ranked candidate genes across the human genome for 5000 phenotypes

• Builds on previous work to create a new classification of disease based on molecular data.


Relating phenotypes to genes


“Genetic-linkage mapping of complex hereditary disorders to a whole-genome molecular-interaction network”

(Iossifov et al, Genome Res.)

• Problem: multifactorial, heterogeneous disorders

• Novel framework combines the linkage formalism with molecular interaction networks (built by text mining)

• Better statistics through grouping of genes

• Applied to autism, bipolar disorder and schizophrenia

• Found shared gene targets


“A systems biology approach to prediction of oncogenes and molecular perturbation targets in B-cell lymphomas”

(Mani et al, Mol. Sys. Bio.)

• Method to detect cancer-causing genetic lesions

• Looks for molecular interactions that become dysregulated in specific tumors

• Algorithm identifies aberrant genes using mutual information to measure gain or loss of correlation

• Applied to a B-cell interactome, with strong predictive performance
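The core quantity, mutual information (MI) between two genes' expression levels, can be compared between normal and tumor samples to flag an interaction that gains or loses correlation. This sketch uses a simple histogram MI estimator on simulated data; the estimator, bin count and data are assumptions, and the published algorithm goes further by scoring genes rather than single interactions.

```python
import numpy as np


def mutual_information(x: np.ndarray, y: np.ndarray, bins: int = 8) -> float:
    """Plug-in MI estimate (nats) from a 2-D histogram of paired samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nonzero = pxy > 0
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))


rng = np.random.default_rng(1)
n = 500

# Simulated expression: the gene pair is co-regulated in normal samples...
x_normal = rng.normal(size=n)
y_normal = x_normal + 0.3 * rng.normal(size=n)
# ...but the relationship is lost in tumors (hypothetical data).
x_tumor = rng.normal(size=n)
y_tumor = rng.normal(size=n)

delta = mutual_information(x_tumor, y_tumor) - mutual_information(x_normal, y_normal)
print(f"change in MI (tumor - normal): {delta:.2f} nats")  # negative = loss of correlation
```

A positive change would indicate gain of correlation; plug-in histogram estimates are biased upward for small samples, so real analyses need a null distribution or permutation test.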


“Systems biology approach predicts immunogenicity of the yellow fever vaccine in humans”

(Querec et al, Nat. Immunology)

• Yellow fever vaccine is old and effective, but was developed entirely empirically

• Goal: understand the underlying biology using microarray expression and cytokine activities

• Combined data predicted the T-cell response correctly in 90% of an independent sample

• Predicted neutralizing antibody formation with 100% accuracy


Strong up-regulated network


Separating CD8 response


“Analysis of Drosophila segmentation network identifies a JNK pathway factor overexpressed in kidney cancer”

(Liu et al, Science)

• Initial goal: build a network of fly segmentation genes, whose human homologs are associated with cancer

• Combined gene expression, chromatin IP, literature mining, and yeast two-hybrid results to build a network of genes involved in segmentation control

• Found a major hub in the resulting network: D-SPOP, which modulates the JNK pathway (implicated in cancers)

• H-SPOP, the human homolog, was tested and found to be a biomarker in 99% of clear cell renal cell carcinomas
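Merging edges from several evidence types and ranking genes by degree (number of partners) is one simple way to spot a hub like the one found here. The gene names and edges below are hypothetical, and degree is an assumed stand-in for whatever hub criterion the authors actually used.

```python
from collections import Counter


def merge_networks(*edge_lists):
    """Union of undirected edges from several evidence sources, duplicates removed."""
    return {frozenset(edge) for edges in edge_lists for edge in edges if edge[0] != edge[1]}


def rank_hubs(edges, top=3):
    """Rank nodes by degree in an undirected edge set."""
    degree = Counter(node for edge in edges for node in edge)
    return degree.most_common(top)


# Hypothetical edges from three evidence types
expression = [("gene_hub", "gene_1"), ("gene_hub", "gene_2"), ("gene_1", "gene_2")]
chip = [("gene_hub", "gene_3"), ("gene_2", "gene_hub")]   # repeats an expression edge
two_hybrid = [("gene_hub", "gene_4"), ("gene_3", "gene_4")]

network = merge_networks(expression, chip, two_hybrid)
print(len(network), rank_hubs(network, top=1))  # 6 [('gene_hub', 4)]
```

Storing edges as `frozenset` pairs makes `("a", "b")` and `("b", "a")` the same edge, so support from multiple sources is not double counted in the degree.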


D-SPOP is a hub in JNK network


H-SPOP stains clear cell carcinoma specifically


Stem cell biology


“Integration of external signaling pathways with core transcriptional network in embryonic stem cells”

(Chen et al, Cell)

• The stem cell miracle continues: introducing 4 transcription factors into somatic cells can transform them into pluripotent stem cells.

• Investigators used ChIP-seq to map the targets of 13 transcription factors implicated in stem cell transformation

• Specific genomic regions are co-targeted by different TFs.

• An understanding of ES-cell regulation is coming...
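Co-targeting of genomic regions by different TFs can be quantified by asking what fraction of one TF's ChIP-seq peak summits lie within a fixed window of another TF's summits. The coordinates and the 500 bp window are hypothetical, and the sketch assumes all peaks are on one chromosome.

```python
from bisect import bisect_left


def colocalized_fraction(peaks_a: list[int], peaks_b: list[int], window: int = 500) -> float:
    """Fraction of TF-A summits with a TF-B summit within +/- window bp."""
    if not peaks_a:
        return 0.0
    sorted_b = sorted(peaks_b)
    hits = 0
    for pos in peaks_a:
        # First TF-B summit at or after pos - window; co-localized if it is <= pos + window.
        i = bisect_left(sorted_b, pos - window)
        if i < len(sorted_b) and sorted_b[i] <= pos + window:
            hits += 1
    return hits / len(peaks_a)


# Hypothetical summit positions for two TFs on the same chromosome
tf_a = [1_000, 5_000, 20_000, 40_000]
tf_b = [1_200, 5_600, 19_800, 90_000]
print(colocalized_fraction(tf_a, tf_b))  # 0.5
```

Binary search keeps each lookup logarithmic in the number of TF-B peaks, which matters with tens of thousands of peaks per factor; comparing the fraction against shuffled peak positions would show whether the overlap exceeds chance.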


TF binding is co-localized


Binding motifs defined


The mother of TF networks


Great work skipped...

• Biosurveillance of emerging biothreats using scalable genotype clustering

• The infinite sites model of genome evolution

• Predicting unobserved phenotypes for complex traits from whole-genome SNP data

• Cost effective strategies for completing the interactome

• Relating protein pharmacology by ligand chemistry

• Better bioinformatics through usability analysis

• Dynamic modularity in protein interaction networks predicts breast cancer outcome

• A burst of segmental duplications in the genome of the African great ape ancestor

• DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome

• A universal mechanism ties genotype to phenotype in trinucleotide diseases

• Global diversity in the human salivary microbiome


“The Human Protein Atlas--a tool for pathology”

(Ponten et al, J. Pathology)

• 6122 antibodies (representing 5011 proteins) applied to 708 tissues. Data available.


2008 Crystal ball...

• Sequencing makes a comeback (watch out, microarrays...)

• Translational science projects will create astounding data sets (hopefully available) to catalyze research

• GWAS will continue to proliferate

• Consumer-oriented genetics will create demand for online resources for interpretation

• Difficult decisions about when and how to bring new molecular diagnostics into practice.


2009 Crystal ball...

• Focus on mechanism in interpreting genetic associations

• More sophisticated methods to find signal in GWAS, including data integration

• Cellular dynamics of expression, metabolites, proteins

• Multiple human & cancer genome sequences

• Consumer sequencing (vs. genotyping)


Thanks. See you in 2010!

russ.altman@stanford.edu
