Comparative network analysis of neurological disorders focuses the genome-wide search for autism genes.

Dennis P. Wall, PhD
Center for Biomedical Informatics
http://wall.hms.harvard.edu

Outline

• Rationale & Biological Significance (30 mins)

• Present status (5 mins)

• Project Plan (25 mins)

Introduction

• Polygenic & Multigenic

• Many genes have been linked to autism

• Few genes have been replicated across studies

• Difficult for a single researcher to grasp the complexity of the autism gene landscape

Statistics: U.S. number of cases, 1992-2006

http://www.fightingautism.org

Behavioral overlap with other disorders

[Figure: Autism shown overlapping with the following disorders]

• Fragile X
• Rett Syndrome
• Tuberous Sclerosis
• Angelman Syndrome
• Schizophrenia
• Epilepsy
• Seizure Disorder
• Mental Retardation
• Others??

Approach

• Build the network of all genes implicated in Autism to date

• Conduct large comparative analysis of Autism and other neurological disorders at the level of genes, biological processes, and networks

• Leverage existing research on Autism-related disorders to find new genetic leads.

Building Gene Lists for All Neurological Disorders (433)

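[Diagram: a disease source (NINDS) supplies disorder names (Asperger, Fragile X, Tourette’s, OCD, …); gene-disease sources (OMIM, GeneCards) are queried with those names; the result is a disease gene database of per-disorder gene lists (Autism, OCD, Epilepsy, Ataxia, …)]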

ADHD

Tourette Syndrome

Attention Deficit Hyperactivity Disorder

Primary Lateral Sclerosis

Neurotoxicity

Down Syndrome

Dementia

Alzheimers Disease

Alzheimer Disease

Brain Injury

Stroke

Multiple Sclerosis

Systemic Lupus Erythematosus

Cerebral Palsy

Erbs Palsy

Neuronal Migration Disorders

West Syndrome

De Morsiers Syndrome

Williams Syndrome

Hydrocephalus

Encephalopathy

Huntington Disease

Epilepsy

Schizophrenia

Asperger Syndrome

Angelman Syndrome

Autism

Rett Syndrome

Hypotonia

Infantile Hypotonia

Spasticity

Microcephaly

Mental Retardation

Fragile X

Ataxia

Hypoxia

Seizure Disorder

Tuberous Sclerosis

Obsessive Compulsive Disorder

Major Depression

Migraine

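Autism Cluster

[Figure: disorders clustered on a binary disorder-by-gene matrix (rows: disorders, columns: genes), e.g. 1100100101…, 1110101011…, 1001010100…, 1001011101…, 1101011101…; the group of disorders that clusters with autism is labeled the Autism Cluster]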
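A minimal sketch of how a binary disorder-by-gene matrix like the one pictured could be built and compared. The gene names (G1 to G5), the gene assignments, and the choice of Jaccard distance are all assumptions made for illustration; the slides do not state which distance or clustering method was used.

```python
def binary_matrix(gene_lists):
    """Return the sorted gene universe and one 0/1 row per disorder."""
    genes = sorted(set().union(*gene_lists.values()))
    rows = {d: [int(g in s) for g in genes] for d, s in gene_lists.items()}
    return genes, rows

def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two 0/1 vectors of equal length."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either if either else 0.0

# Hypothetical gene lists (placeholder gene names, not real associations).
gene_lists = {
    "Autism": {"G1", "G2", "G3", "G4"},
    "Rett Syndrome": {"G1", "G2", "G3"},
    "Fragile X": {"G2", "G4"},
    "Migraine": {"G5"},
}

genes, rows = binary_matrix(gene_lists)
distances = {
    d: jaccard_distance(rows["Autism"], row)
    for d, row in rows.items() if d != "Autism"
}
closest = min(distances, key=distances.get)
print(closest, distances[closest])
```

Disorders with small distances to Autism would be the natural candidates for an "autism cluster"; a full analysis would feed the pairwise distances into a hierarchical clustering routine.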

Network Construction

• Data derived from STRING (http://string.embl.de/)

• Integration of p-p interaction (interactome), co-expression (transcriptome), orthology (orthologome), text (bibliome), and other lines of evidence.

• Focus on creating networks of possible interactions within a normal cell using classification methods (random forests)

[Figure: lines of evidence combined for each protein pair A, B]

• Sequence coevolution
• P-P interaction
• Text (aka bibliome), e.g. "FXYD1 is identified as a MeCP2 target gene whose de-repression may directly contribute to Rett syndrome neuronal pathogenesis"
  http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0030043
• Correlated expression

Random Forest Decision

[Figure: decision trees split on evidence features D1 to D5 (example feature vector {1,0,2,1,0}) and vote Yes or No on whether A and B interact]
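The voting idea can be sketched with a deliberately tiny stand-in for a random forest: depth-1 trees (stumps), each fit on a bootstrap sample using one randomly chosen evidence feature, with a majority vote at prediction time. The feature values, labels, and the assumption that more evidence means "more likely to interact" are all hypothetical; the slides do not give the training data or the implementation that was used.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Pick the cut on one feature that misclassifies the fewest rows.
    Assumes higher evidence values favour an interaction."""
    values = sorted({r[feature] for r in rows})
    # Candidate cuts: midpoints between consecutive observed values.
    cuts = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
    def errors(cut):
        return sum((r[feature] > cut) != y for r, y in zip(rows, labels))
    return feature, min(cuts, key=errors)

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # bootstrap sample
        feature = rng.randrange(len(rows[0]))            # random feature per tree
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Majority vote: True means 'A and B interact'."""
    votes = Counter(row[f] > cut for f, cut in forest)
    return votes[True] > votes[False]

# Hypothetical evidence vectors (D1..D5) for protein pairs.
rows = [
    [1, 0, 2, 1, 0], [2, 1, 1, 0, 1], [1, 2, 2, 1, 1], [2, 1, 0, 2, 1],  # interacting
    [0, 0, 0, 1, 0], [0, 1, 0, 0, 0], [1, 0, 0, 0, 0], [0, 0, 1, 0, 0],  # not interacting
]
labels = [True] * 4 + [False] * 4

forest = fit_forest(rows, labels)
print(predict(forest, [2, 2, 2, 2, 2]), predict(forest, [0, 0, 0, 0, 0]))
```

A production random forest grows deeper trees and samples a feature subset at every split; this sketch only shows the bootstrap-and-vote structure.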

Networks for all AC disorders

Network size per disorder (N = nodes, E = edges):

• Hypoxia (586N/4359E)
• Fragile X (97N/100E)
• Tuberous Sclerosis (110N/204E)
• Hypotonia (154N/208E)
• Autism (145N/164E)
• Microcephaly (135N/166E)
• Rett (48N/74E)
• Angelman (51N/57E)
• Spasticity (62N/40E)
• Mental Retardation (573N/1035E)
• Ataxia (428N/1489E)
• Seizure Disorder (35N/13E)
• Inf. Hypotonia (29N/16E)
• Asperger (15N/9E)

autworks.hms.harvard.edu

Multi-disorder component of autism (MDAG)

• 66 out of 127 genes involved in at least one member of the autism cluster

• Highly connected component of the autism network

Biological Process | p value | MDAG genes
transmission of nerve impulse | 3.00E-11 | ABAT, ALDH5A1, APOE, CHRNA4, CHRNA7, GABRA5, GABRB3, GABRG2, GATA3, GRIN2A, MAOA, MET, NF1, NTF5, SCN1A, SLC6A4, TH, TPH1, TSC1
nervous system development | 3.29E-11 | ALDH5A1, APOE, ARX, BTD, CHRNA4, DAB1, DCX, FMR1, FOXP2, GABRA5, GATA3, GRIN2A, HOXA1, MAP2, MECP2, MET, NDN, NF1, NTF5, PAX3, PTEN, RELN, TSC1, UBE3A, VLDLR
synaptic transmission | 7.68E-10 | ABAT, ALDH5A1, APOE, CHRNA4, CHRNA7, GABRA5, GABRB3, GABRG2, GATA3, GRIN2A, MAOA, MET, NF1, NTF5, SLC6A4, TH, TPH1
cell-cell signaling | 3.12E-09 | ABAT, ADM, ALDH5A1, APOE, CHRNA4, CHRNA7, GABRA5, GABRB3, GABRG2, GATA3, GRIN2A, MAOA, MET, NF1, NTF5, SCN1A, SLC6A4, SSTR5, TH, TPH1, TSC1
brain development | 2.64E-06 | ARX, DAB1, DCX, FOXP2, GABRA5, HOXA1, MET, NF1, RELN, TSC1, UBE3A
generation of neurons | 2.43E-05 | APOE, ARX, DAB1, DCX, MAP2, MECP2, MET, NDN, NF1, NTF5, PTEN, RELN, VLDLR
regulation of cell proliferation | 2.45E-04 | ADM, ARX, CHRNA7, DHCR7, FOXP2, GRPR, MECP2, MET, NDN, NF1, PAX3, PTEN, SSTR5, TSC1
cell migration | 3.93E-03 | ARX, DAB1, DCX, MET, NDN, NF1, PAX3, PTEN, RELN, VLDLR
homeostasis | 1.90E-02 | ADM, APOE, ARX, CHRNA4, CHRNA7, GRIN2A, MBD1, NDN, NF1, SCN1A, SLC40A1, SSTR5, TH
cell morphogenesis | 1.94E-02 | APOE, ARX, ATP10A, DCX, MAP2, MECP2, NDN, PTEN, RELN, TSC1
ion transport | 2.74E-02 | ARX, CACNA1D, CHRNA4, CHRNA7, GABRA5, GABRB3, GABRG2, GRIN2A, MECP2, MET, SCN1A, SLC40A1, TSC1
cell differentiation | 4.35E-02 | ADM, APOE, ARX, DAB1, DCX, DHCR7, EXT2, FXR1, GATA3, GLO1, GRIN2A, MAP2, MECP2, MET, NDN, NF1, NTF5, PAX3, PTEN, RELN, TSC1, VLDLR

Significantly enriched MDAG processes

• Ion Transport (P = 2.7E-02)
• Cell Proliferation (P = 2.45E-04)
• CNS Development (P = 3.29E-11)
• Synaptic Transmission (P = 7.68E-10)

• Fisher’s exact test
• Bonferroni adjustment
• 14648 biological processes from Gene Ontology tested
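A small sketch of the two statistical steps named above: a one-sided Fisher's exact test for enrichment (the upper tail of the hypergeometric distribution) followed by a Bonferroni adjustment. The counts in the example are made up; only the number of tested processes (14648) comes from the slide, and whether the reported p values used exactly this one-sided form is an assumption.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact p value: P(X >= k), where X counts annotated
    genes in a list of n genes drawn from N genes, K of which are annotated."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)

def bonferroni(p, n_tests):
    """Multiply by the number of tests and cap at 1."""
    return min(1.0, p * n_tests)

# Hypothetical counts: 2 of 3 listed genes carry an annotation that
# 4 of 10 background genes carry.
p = fisher_enrichment_p(k=2, n=3, K=4, N=10)
print(p, bonferroni(p, 14648))
```

With 14648 processes tested, a raw p value has to be below roughly 3.4E-06 to stay under 0.05 after a Bonferroni adjustment.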

Process-Driven Predictions

[Diagram: Biological Processes (CNS development, Synaptic Transmission, Ion Transport, Cell Proliferation, …) are traced through Autism Cluster Disorders (Fragile X, Tuberous Sclerosis, Seizure Disorder, Mental Retardation, …) to Putative New Genes]

• 64 new genes, all of which occur in 2 or more of the Autism Cluster Disorders
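One way such a process-driven filter could look, as a sketch: keep genes that belong to an enriched process, occur in at least two autism cluster disorders, and are not already on the autism list. The gene names are placeholders and the exact selection rule is an assumption; the slide only states that the new genes occur in 2 or more cluster disorders.

```python
from collections import Counter

def process_driven_candidates(cluster_gene_lists, autism_genes, process_genes,
                              min_disorders=2):
    """Genes in an enriched process, seen in >= min_disorders cluster
    disorders, and not yet linked to autism."""
    counts = Counter(g for genes in cluster_gene_lists.values() for g in set(genes))
    return sorted(
        g for g, n in counts.items()
        if n >= min_disorders and g in process_genes and g not in autism_genes
    )

# Hypothetical inputs (placeholder gene names).
cluster_gene_lists = {
    "Fragile X": {"G1", "G2", "G3"},
    "Tuberous Sclerosis": {"G2", "G3", "G4"},
    "Seizure Disorder": {"G3", "G5"},
}
autism_genes = {"G3"}
process_genes = {"G1", "G2", "G3", "G4"}

candidates = process_driven_candidates(cluster_gene_lists, autism_genes, process_genes)
print(candidates)
```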

Experimental Validation

• GEO series GSE6575 (from UC Davis M.I.N.D. Institute)
• White blood cell Affymetrix U133plus2.0
• 17 samples of autistic children without regression
• 18 children with regression
• 9 children with mental retardation or developmental delay
• 12 typically developing children from the general population

Blood for Brain

Autism without regression (17)

Autism with regression (18)

Data-driven approach to FDR detection can be ineffective

• Standard data-driven application of false discovery rate control yields few genes below an FDR threshold of 0.05 (with these data, only 2 genes survive).

• This is common when the signal is weak and the background noise is large (e.g. microarray experiments).
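To illustrate why the size of the tested gene set matters, here is a sketch of the Benjamini-Hochberg adjustment applied first to a whole list of p values and then to a smaller, pre-selected subset. The slides do not say which FDR procedure was used, so Benjamini-Hochberg is an assumption, and the p values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p values: four candidate genes plus one unrelated gene.
all_genes = [0.001, 0.008, 0.039, 0.041, 0.6]
candidates_only = all_genes[:4]

survive_all = sum(q < 0.05 for q in benjamini_hochberg(all_genes))
survive_candidates = sum(q < 0.05 for q in benjamini_hochberg(candidates_only))
print(survive_all, survive_candidates)
```

In this toy case two genes pass when all five are tested and four pass when testing is restricted to the candidates. On a real array with tens of thousands of probes, the penalty for testing everything is far larger.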

Results of process-driven search

• 43 Process-derived gene predictions had FDR-adjusted p values <0.05

• Highly significant rate of validation -- 65% of predictions confirmed by expression data

Network-Driven Predictions

Results of network-driven search

• 267 occurred in 1 autism cluster disorder

• 58 occurred in 2

• 17 in 3

• 3 in 4 sibling disorders

• A total of 345 new predictions

Results of network-driven search

• 301 had FDR-adjusted p values <0.05

• 90% (!) of predictions verified by expression data

[Figure: plot with an "average distance" axis; tick values not recoverable]

Prior knowledge focuses whole-genomic search

• 43 process-derived gene predictions had FDR-adjusted p values < 0.05 (65% of predictions confirmed)

• 301 network-derived gene predictions had FDR-adjusted p values < 0.05 (90% of predictions confirmed)

The rate of validation in both cases is significantly non-random

Top 20 genes occurring in 3 or more Autism Sibling Disorders

For many of these candidates, their roles in neurological impairment have been studied in autism cluster disorders, but not in autism.

Molecular Triangulation

[Figure: candidate genes linked to the autism sibling disorders in which they have been studied]

Genes shown: SLC16A2, OPHN1, AR, PAFAH1B1, FLNA, SLC6A8, MYO5A, FXN, L1CAM

Disorders shown: Mental Retardation, Fragile X, Hypotonia, Ataxia, Hypoxia, Microcephaly, Rett Syndrome, Spasticity, Tuberous Sclerosis

GO biological process enrichment

- cytoskeleton organization
- cell organization/biogenesis
- cell communication
- cell motility

Conclusions

• Previous research has implicated between 100 and 1500 genes as contributors to the molecular physiology of Autism.

• Our knowledge-driven approach provides a logical means to filter the genome-wide search.

Conclusions

• Global “ask” swamped by noisy signal

• Informed, knowledge-driven “ask” results in biologically significant gene predictions

• Comparative analysis of Autism with related neurological disorders provides a focused search for novel gene candidates

Autworks

• Autworks is a web-driven navigation system that allows any researcher to view and search through the network of genes implicated in autism and related neurological disorders

• Built to aid and abet the role of serendipity and inspiration for researchers working on autism and other complex neuro diseases.

• http://autworks.hms.harvard.edu

Autworks now

The Plan

• Bring our analytical strategies and Autworks to the cloud
– Beef up underbelly using AWS storage and the Amazon “Turkforce”
– Scale up comparative network analysis
– Enlarge validation database, verify/re-verify computational predictions, robustify the candidates

Aim 1: Build the neurological disease “gene core” of the Autworks relational database

Database | Description | Stats
Database of Genomic Variants | A curated catalogue of structural variation in the human genome | ~31615 total entries (indels, inversions, and copy number variation)
dbSNP | NCBI’s central repository for both single base nucleotide substitutions and short deletion and insertion polymorphisms | ~6,136,008 SNPs for Human
Chromosomal Variation in Man* | Searchable reference of chromosomal variation | > 3000 links to publications describing 30 different types of chromosomal variation in human disease
Human Gene Mutation Database* | Established for the study of mutational mechanisms in human genes | 62901 mutations in public release
OMIM* | NCBI’s compendium of human genes and genetic phenotypes | 12,634 genes for ~2459 phenotypes/diseases
GeneCards* | Searchable database of human genes that provides concise genomic, proteomic, transcriptomic, genetic and functional information | All known and predicted human genes with summaries of known disease association
SNPedia* | SNPedia shares information about the effects of variations in DNA, citing peer-reviewed scientific publications | 4621 SNPs

* Can be queried with a disease or gene term

Aim 1: Steps

(1) Extract the entire set of neurological disorders listed by NINDS (currently 433) to ensure that we can find any and all commonalities to Autism.

(2) Mine all databases in the table above that can be searched using a disease term as the query, specifically the Online Mendelian Inheritance in Man (OMIM), GeneCards, Chromosomal Variation in Man, the Human Gene Mutation Database (HGMD), and SNPedia.

(3) Combine and import the features from each of the online resources into a relational database that will become the backend of Autworks, being careful to remove any redundancies.

(4) Cross-reference resources to comprehensively populate the data model.
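Steps (3) and (4) can be sketched as a merge that normalizes gene and disease names, removes redundant records, and remembers which sources reported each pair. The input records, placeholder gene names, and normalization rules (upper-case gene symbols, lower-case disease names) are assumptions for illustration; no real database API is called.

```python
def merge_gene_disease(sources):
    """sources maps a source name to an iterable of (gene, disease) pairs.
    Returns {(gene, disease): set of source names}, with duplicates collapsed."""
    merged = {}
    for source, pairs in sources.items():
        for gene, disease in pairs:
            key = (gene.strip().upper(), disease.strip().lower())
            merged.setdefault(key, set()).add(source)
    return merged

# Hypothetical query results (placeholder genes G1, G2).
sources = {
    "OMIM": [("G1", "Autism"), ("g2", "Rett Syndrome")],
    "GeneCards": [("G1 ", "autism"), ("G2", "Autism")],
}

merged = merge_gene_disease(sources)
for (gene, disease), found_in in sorted(merged.items()):
    print(gene, disease, sorted(found_in))
```

Pairs reported by more than one source end up with several entries in their source set, which is one simple form of the cross-referencing described in step (4).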

Gene-disease data model “Gene Core”

Field | Description
Gene | Official gene symbol from HUGO
Variant ID | Unique identifier (e.g., RS#, SS#, etc.)
Variant Type | SNP, CNV, Indel, etc.
Genomic Location | Chromosomal coordinates (hg build 36)
Source | Database(s) from which the gene and/or gene variant was derived
OMIM score | Confidence score used by OMIM
Polyphen score | Score indicating severity of mutation
Disease | Autism and related neurological disorders
PubmedID | Article(s) describing the genetic variant

This data model will share much in common with the Variome project’s database
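The fields above could map onto a single relational table, sketched here with Python's built-in sqlite3 module. The table name, column names, column types, and uniqueness rule are assumptions, as are the example variant identifier and source; only the SHANK3, Autism, and PMID 17173049 values appear elsewhere in these slides.

```python
import sqlite3

SCHEMA = """
CREATE TABLE gene_core (
    gene             TEXT NOT NULL,  -- official HUGO gene symbol
    variant_id       TEXT NOT NULL,  -- e.g. an RS# or SS# identifier
    variant_type     TEXT,           -- SNP, CNV, Indel, ...
    genomic_location TEXT,           -- chromosomal coordinates (hg build 36)
    source           TEXT NOT NULL,  -- database the record came from
    omim_score       TEXT,
    polyphen_score   REAL,
    disease          TEXT NOT NULL,
    pubmed_id        TEXT,
    UNIQUE (gene, variant_id, source, disease)  -- assumed redundancy rule
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical record: the variant ID and source are placeholders.
record = ("SHANK3", "rs-example", "SNP", None, "OMIM", None, None,
          "Autism", "17173049")
insert = "INSERT OR IGNORE INTO gene_core VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)"
conn.execute(insert, record)
conn.execute(insert, record)  # a redundant import is silently skipped

count = conn.execute("SELECT COUNT(*) FROM gene_core").fetchone()[0]
print(count)
```

The key columns are declared NOT NULL because SQLite treats NULLs as distinct in a UNIQUE constraint, which would let redundant rows through.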

[Diagram: literature-mining pipeline. Medline abstracts are filtered by candidate gene (GeneTagger) and by MeSH term (MeSH Major Topics); an annotator checks accuracy through the BioNotate system. Results: gene-gene and gene-disease corpora. Labels shown: ABI1, Shank3, Autism]

Example abstracts:

• PMID: 17173049 SHANK3 (also known as ProSAP2) regulates the structural organization of dendritic spines and is a binding partner of neuroligins; genes encoding neuroligins are mutated in autism and Asperger syndrome. Here, we report that a mutation of a single copy of SHANK3 on chromosome 22q13 can result in language and/or social communication disorders...

• PMID: 17304222 We identified an important component for controlled actin assembly, abelson interacting protein-1 (Abi-1), as a binding partner for the postsynaptic density (PSD) protein ProSAP2/Shank3. During early neuronal development, Abi-1 is localized in neurites and growth cones; at later stages, the protein is enriched in dendritic spines and PSDs…

Can we Turkify this process???

Aim 2: Build interaction & network cores for Autworks

Database | Description
Protein-Protein Interaction | Derived directly from STRING [18]. STRING incorporates >80,000 p-p interactions from numerous sources including MINT [24], HPRD [25], BIND [26], DIP [27], BioGrid [28, 29], KEGG [30], and Reactome [31]. These databases contain records from two-hybrid assays, synthetic lethality assays, mass spectrometry, co-immunoprecipitation, and more.
Phylogenetic Profiles | We will take the union of evidence from STRING and evidence from RoundUP, which was built by the PI and has greater coverage than STRING’s orthology information (21,000+ unique phylogenetic profiles for more than 30 eukaryotic organisms [2]). Phylogenetic profiles are commonly used to predict functional relationships between proteins [32, 33].
Gene Ontology (GO) | GO [34, 35] contains >923034 unique biological process, function, and cellular component terms. A shared process, function, and/or cellular location can be used to predict protein-protein interaction. This has been incorporated into STRING.
Co-Expression | We will combine data from STRING with our own in-house co-expression database, ChipperDB [23]. ChipperDB contains a sizable portion of NCBI’s Gene Expression Omnibus [36]. Co-expression is a proven method for predicting shared function and protein-protein interaction [37].
Bibliome | Statistically relevant co-occurrences of gene names, and semantically specified interactions found via Natural Language Processing [16].
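As a rough illustration of the bibliome idea, the sketch below counts how often pairs of terms are mentioned in the same abstract. It is plain whole-word matching on two sentences quoted earlier in these slides; the statistical relevance testing and NLP-based interaction extraction mentioned in the table are not attempted, and the term list is an assumption.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrences(abstracts, terms):
    """Count, for each pair of terms, the abstracts that mention both."""
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        found = sorted(
            t for t in terms
            if re.search(r"\b" + re.escape(t.lower()) + r"\b", lowered)
        )
        counts.update(combinations(found, 2))
    return counts

abstracts = [
    "SHANK3 (also known as ProSAP2) regulates the structural organization of "
    "dendritic spines; genes encoding neuroligins are mutated in autism.",
    "We identified abelson interacting protein-1 (Abi-1) as a binding partner "
    "for the postsynaptic density (PSD) protein ProSAP2/Shank3.",
]
terms = ["SHANK3", "Abi-1", "autism"]  # illustrative term list

counts = cooccurrences(abstracts, terms)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

A realistic pipeline would also resolve gene synonyms (ProSAP2 and SHANK3 name the same gene) and compare observed counts with what chance co-mention would produce.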

[Diagram: GO, Co-Ex, P-P intx, Bibliome, and Phylo-profiles feed a Classifier, which produces the Interaction Core; the Network core holds per-disorder networks (Ataxia, Mental Retardation, Autism)]

Can we “cloud” it up???

Aim 3: comparative network analysis on the cloud

- Find disease-filtered interacting partners
- Find shortest paths between candidates
- Find minimal subnetworks
- Verify and reconstruct networks appropriately

[Diagram: networks for Schizophrenia, Autism, Rett Syndrome, Angelman Syndrome, and Mental Retardation combine into the Genetic Landscape of Autism (Autism Diseaseome)]
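Finding shortest paths between candidate genes can be sketched with a breadth-first search over an unweighted interaction network. The network below uses placeholder node names and is not drawn from any real interaction data; a weighted network would need a different algorithm such as Dijkstra's.

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency lists from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list of one shortest path, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical interaction network (placeholder gene names).
graph = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")])
print(shortest_path(graph, "A", "D"))
```

Genes that sit on many short paths between known candidates are natural members of a minimal connecting subnetwork.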

Acknowledgments

• Zak Kohane
• Matt Huyck
• Tom Monaghan
• Todd DeLuca
• Nieves Mendizabel
• Paco Esteban
• Joaquin Goni
• Alal Eran
• Michal Galdzicki
• Lou Kunkel
• Alexa McCray
• Leon Peshkin