osama-jomaa
Mouse Models in Research
Inexpensive
Shares 99% of its genome with humans
Fewer ethical concerns than other mammal models
Short generation times
Small
The Mouse Trap. The Danger of Using one Lab Animal to Study Every Disease. Daniel Engber. http://www.slate.com/articles/health_and_science/the_mouse_trap/2011/11/lab_mice_are_they_limiting_our_understanding_of_human_disease_.html. November 16, 2011
Designer Mice for Human Research
Photo taken from “Designer mice for human disease - A close view of Nobel Laureate : Oliver Smithies” Yau-Sheng Tsai, Pei-Jane Tsai, Man-Jin Jiang, Cherng-Shyang Chang. http://proj.ncku.edu.tw/research/commentary/e/20071116/2.html December 9, 2014
The Mouse Model Is Not Perfect, Though
Photo taken from: The Mouse Trap. The Danger of Using one Lab Animal to Study Every Disease. Daniel Engber. http://www.slate.com/articles/health_and_science/the_mouse_trap/2011/11/lab_mice_are_they_limiting_our_understanding_of_human_disease_.html. November 16, 2011
Correlation of Mouse Models with Equivalent Human Diseases
Photo taken from “Genomic responses in mouse models poorly mimic human inflammatory diseases.” Seok, Warren, and Others. Proceedings of the National Academy of Sciences. 110, no. 9 (2013): 3507-3512.
Rank correlation (R²)
Percentage of genes changed in the same direction
Proposed Research
What? Classify the Mouse-Human scientific literature in PubMed into different areas of research
How? Citation Networks + MeSH Thesaurus
Why? Identify and study the popular areas of Mouse-Human research
Proposed Research
What? Classify the proteins in the Mouse-Human citation pairs into different biological systems
How? Protein Co-occurrence Networks + Gene Ontology
Why? Investigate the biological systems and proteins for which Mouse is used as a model organism for Human
Agenda
1. PubMed Articles Classification
   1. Collect Mouse and Human Papers
   2. Build a Citation Network
   3. Classify the Cit-Net Using MeSH Thesaurus
   4. Stats Study on MeSH Disease Classification
2. PubMed Proteins Analysis
   1. Collect Human Proteins and Annotation Data
   2. Build the Entity Co-occurrence Networks
   3. Classify PCoC Networks Using Gene Ontology
3. Summary
Getting Mouse and Human PubMed IDs
UniProt-GOA → Mouse PubMed Identifiers (PMIDs), Human PubMed Identifiers (PMIDs)
1. Get Mouse & Human papers from UniProt
2. Query the PubMed API for the citation list of each article
3. Parse the PubMed XML response and get the citation list
...
<CitationList>
<PMID> 342342 </PMID><PMID> 423545 </PMID><PMID> 432598 </PMID>
</CitationList>
...
Very few PubMed articles have the citation list in their XML file!
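Step 3 can be sketched in a few lines of Python. This is a minimal illustration, assuming the response looks like the schematic on the slide: the `<CitationList>` and `<PMID>` element names are taken from the slide, the `<Article>` wrapper and the function name `extract_cited_pmids` are hypothetical, and the element names in real PubMed XML may differ.

```python
import xml.etree.ElementTree as ET


def extract_cited_pmids(xml_text: str) -> list[str]:
    """Return the PMIDs found under every <CitationList> element.

    An article without a <CitationList> yields an empty list, which is
    the case the slide warns about.
    """
    root = ET.fromstring(xml_text)
    pmids = []
    for citation_list in root.iter("CitationList"):
        for pmid in citation_list.iter("PMID"):
            if pmid.text and pmid.text.strip():
                pmids.append(pmid.text.strip())
    return pmids


# Schematic response modelled on the slide; <Article> is an assumed wrapper.
sample = """
<Article>
  <CitationList>
    <PMID> 342342 </PMID><PMID> 423545 </PMID><PMID> 432598 </PMID>
  </CitationList>
</Article>
"""

print(extract_cited_pmids(sample))
print(extract_cited_pmids("<Article/>"))
```

Returning an empty list rather than raising makes it easy to count how many articles lack a citation list.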
Getting Mouse and Human Citation List from Scopus
UniProt-GOA → Mouse PubMed Identifiers (PMIDs), Human PubMed Identifiers (PMIDs)
1. Get Mouse & Human papers from UniProt
2. Compose an HTTP GET request with the PMIDs
3. Parse the Scopus JSON response and get the citation list
...
{"CitationList": [{"PMID": 342342}, {"PMID": 423545}, {"PMID": 432598}]}
...
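Parsing the JSON response might look like the sketch below. It assumes the payload is well-formed JSON in which `CitationList` is an array of objects with a `PMID` field, following the schematic on the slide; the real Scopus response layout is not shown in the slides and may differ, and the function name is hypothetical.

```python
import json


def cited_pmids_from_json(payload: str) -> list[int]:
    """Return the PMIDs in the (assumed) "CitationList" array of a response."""
    data = json.loads(payload)
    return [entry["PMID"] for entry in data.get("CitationList", [])]


# Schematic payload modelled on the slide, written as valid JSON.
sample = '{"CitationList": [{"PMID": 342342}, {"PMID": 423545}, {"PMID": 432598}]}'

print(cited_pmids_from_json(sample))
```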
Building the Citation Network
[Figure: citation network whose nodes are Human (H) and Mouse (M) papers. Edge types: M → H, H → H, H → M, M → M]
[Pie chart: Mouse Inter and Intra Citations. Legend: Mouse-Human Citations, Mouse-Mouse Citations, Mouse-Others Citations. Slice labels: 62%, 3%, 34%]
[Pie chart: Human Inter and Intra Citations. Legend: Human-Others Citations, Human-Human Citations, Human-Mouse Citations. Slice labels: 34%, 62%, 4%]
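The inter- and intra-species citation shares in such charts can be computed from a labelled edge list. The sketch below uses toy data, not the study's numbers; it assumes each paper carries a species label ("M" or "H") and treats any cited paper without such a label as "Other". All names are hypothetical.

```python
from collections import Counter


def edge_type_shares(species, citations):
    """Percentage of citations from Mouse and Human papers by target type.

    species:   dict mapping PMID -> "M" or "H"
    citations: iterable of (citing_pmid, cited_pmid) pairs
    """
    counts = {"M": Counter(), "H": Counter()}
    for citing, cited in citations:
        src = species.get(citing)
        if src not in counts:
            continue  # only Mouse and Human papers act as citing nodes
        dst = species.get(cited)
        if dst not in ("M", "H"):
            dst = "Other"
        counts[src][dst] += 1
    shares = {}
    for src, counter in counts.items():
        total = sum(counter.values())
        shares[src] = (
            {dst: 100.0 * n / total for dst, n in counter.items()} if total else {}
        )
    return shares


# Toy network: papers 1 and 4 are Mouse, 2 and 3 are Human, 99 is neither.
species = {1: "M", 2: "H", 3: "H", 4: "M"}
citations = [(1, 2), (1, 4), (1, 3), (1, 99), (2, 3), (2, 1)]

shares = edge_type_shares(species, citations)
print(shares["M"])
print(shares["H"])
```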
Medical Subject Headings
Controlled vocabulary used to index PubMed articles
Stored in a DAG-like structure
16 top-level concepts at the root
Includes ~27K concepts (MeSH descriptors) altogether
We used MeSH to group the Mouse and Human papers in the citation network into classes of research
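MeSH Structure Example
[Figure: Stomach Neoplasms has more than one path to the top of the hierarchy]
Neoplasms → Neoplasms by Site → Digestive System Neoplasms → Gastrointestinal Neoplasms → Stomach Neoplasms
Digestive System Diseases → Gastrointestinal Diseases → Stomach Diseases → Stomach Neoplasms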
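Because a term can have several parents, assigning a paper to research areas means walking every path upward. The sketch below is a toy version: the `PARENTS` table contains only the terms named on the slide, its edges are one reading of the figure, and the function names are hypothetical.

```python
# Child -> parents table for the terms on the slide (assumed edges).
PARENTS = {
    "Stomach Neoplasms": ["Gastrointestinal Neoplasms", "Stomach Diseases"],
    "Gastrointestinal Neoplasms": ["Digestive System Neoplasms", "Gastrointestinal Diseases"],
    "Stomach Diseases": ["Gastrointestinal Diseases"],
    "Digestive System Neoplasms": ["Neoplasms by Site", "Digestive System Diseases"],
    "Gastrointestinal Diseases": ["Digestive System Diseases"],
    "Neoplasms by Site": ["Neoplasms"],
}


def ancestors(term, parents):
    """All terms reachable by following parent links upward from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def root_terms(term, parents):
    """Ancestors (or the term itself) that have no parent in this toy graph."""
    return {t for t in ancestors(term, parents) | {term} if not parents.get(t)}


print(sorted(root_terms("Stomach Neoplasms", PARENTS)))
print(sorted(root_terms("Stomach Diseases", PARENTS)))
```

A paper indexed with "Stomach Neoplasms" therefore falls into two areas in this toy graph, which is why the structure is a DAG rather than a tree.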
To Do: Place in research areas
[Figure: the Human (H) and Mouse (M) papers of the citation network grouped into MeSH disease areas: Digestive System Diseases, Eye Diseases, Virus Diseases, Immune System Diseases, Cardiovascular Diseases, Skin Diseases]
GenBank
1. Get the Human protein sequences and their papers
Protein: NP_e342 | PMID: 432432
kicgdkssgihygvitcegckgffrrsqqc
Protein: NP_452u1 | PMID: 483232
Adtltytlglsdgqlplgaspdlpeasacp…..
...
2. Group the proteins by their PMID
PMID: 3213414
NP_u4323: sgihygvitcegckgffrrsqqc
NP_i4322: lplgaspdlpeasacfewrwts
NP_w3421: kicgdkssgihygvitceg
PMID: 2346414
NP_ti3423: vitcegckgckgffrrsqqc
NP_q4322f: ygvitcegeasacfewrwts
NP_x342u2: kicgdkssgihygvitceg
3. Intersect the GenBank papers with Scopus citations
NP_u4323: sgihygvitcegckgffrrsqqc
NP_i4322: lplgaspdlpeasacfewrwts
NP_w3421: kicgdkssgihygvitceg
NP_ti3423: vitcegckgckgffrrsqqc
NP_q4322f: ygvitcegeasacfewrwts
NP_x342u2: kicgdkssgihygvitceg
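Steps 2 and 3 amount to a group-by followed by a set intersection. The sketch below assumes each GenBank record has already been reduced to a (protein ID, PMID, sequence) tuple and that the cited PMIDs from Scopus are available as a set; the identifiers are the placeholder ones from the slide and the function names are hypothetical.

```python
from collections import defaultdict


def group_by_pmid(records):
    """Group (protein_id, pmid, sequence) records into {pmid: {protein_id: sequence}}."""
    grouped = defaultdict(dict)
    for protein_id, pmid, sequence in records:
        grouped[pmid][protein_id] = sequence
    return dict(grouped)


def restrict_to_cited(grouped, cited_pmids):
    """Keep only the papers whose PMID also appears in the citation data."""
    return {pmid: prots for pmid, prots in grouped.items() if pmid in cited_pmids}


records = [
    ("NP_u4323", "3213414", "sgihygvitcegckgffrrsqqc"),
    ("NP_i4322", "3213414", "lplgaspdlpeasacfewrwts"),
    ("NP_ti3423", "2346414", "vitcegckgckgffrrsqqc"),
    ("NP_e342", "432432", "kicgdkssgihygvitcegckgffrrsqqc"),
]
cited_pmids = {"3213414", "2346414"}  # assumed to come from the Scopus step

grouped = group_by_pmid(records)
kept = restrict_to_cited(grouped, cited_pmids)
print(sorted(kept))
print(sorted(kept["3213414"]))
```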
Removing Redundancies
Use CD-HIT with similarity threshold = 0.9
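One way to run this step from a script is to build the CD-HIT command line and hand it to `subprocess`. The sketch below assumes the `cd-hit` executable is on the PATH; the file names are hypothetical, and the word size `-n 5` is the value the CD-HIT documentation recommends for identity thresholds between 0.7 and 1.0, not something stated on the slide.

```python
import shutil
import subprocess


def cdhit_command(fasta_in, fasta_out, identity=0.9):
    """Build the cd-hit argument list for a given sequence identity threshold."""
    return ["cd-hit", "-i", fasta_in, "-o", fasta_out, "-c", str(identity), "-n", "5"]


def run_cdhit(fasta_in, fasta_out, identity=0.9):
    """Run cd-hit, failing early if the executable is not installed."""
    if shutil.which("cd-hit") is None:
        raise RuntimeError("cd-hit executable not found on PATH")
    subprocess.run(cdhit_command(fasta_in, fasta_out, identity), check=True)


# Only the command is printed here; nothing is executed.
print(" ".join(cdhit_command("human_proteins.fasta", "human_proteins_nr90.fasta")))
```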
Gene Ontology
Photo taken from: Gene Ontology Consortium. Ontology Structure. http://geneontology.org/page/ontology-structure Last access December 13, 2014
Gene Ontology Annotation
Example: cytochrome c
Molecular Function: oxidoreductase activity
Biological Process: oxidative phosphorylation
Cellular Component: mitochondrial matrix
Getting GO Terms
Proteins → FASTA File → BLAST DB → GO terms per protein
1. Create BLAST query in FASTA format
2. Create BLAST Database from Swissprot Human Flat File
3. Do BLAST with e-value = 10^-8
4. Parse the BLAST XML response and get the GO terms for the top hits
Input:
NP_u4323: sgihygvitcegckgffrrsqqc
NP_i4322: lplgaspdlpeasacfewrwts
NP_w3421: kicgdkssgihygvitceg
NP_ti3423: vitcegckgckgffrrsqqc
NP_q4322f: ygvitcegeasacfewrwts
NP_x342u2: kicgdkssgihygvitceg
Output:
NP_u4323: GO1, GO5, GO4
NP_i4322: GO5, GO9
NP_w3421: GO4, GO6
...
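Steps 1 and 4 can be illustrated without running BLAST itself. The sketch below writes the query in FASTA format and transfers GO terms from the best hit that passes the e-value cutoff. It assumes the BLAST XML has already been reduced to (query, subject, e-value) tuples; the subject IDs and GO labels are placeholders in the style of the slide, and all function names are hypothetical.

```python
def to_fasta(proteins, width=60):
    """Render {protein_id: sequence} as FASTA text, wrapping sequence lines."""
    lines = []
    for protein_id, sequence in proteins.items():
        lines.append(f">{protein_id}")
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


def transfer_go_terms(hits, go_annotations, max_evalue=1e-8):
    """Assign each query the GO terms of its lowest e-value hit under the cutoff.

    hits:           iterable of (query_id, subject_id, evalue)
    go_annotations: dict mapping subject_id -> list of GO terms
    """
    best = {}
    for query, subject, evalue in hits:
        if evalue <= max_evalue and (query not in best or evalue < best[query][1]):
            best[query] = (subject, evalue)
    return {q: sorted(go_annotations.get(s, [])) for q, (s, _) in best.items()}


fasta_text = to_fasta({"NP_w3421": "kicgdkssgihygvitceg"})

hits = [
    ("NP_u4323", "SP_A", 1e-30),  # best hit for NP_u4323
    ("NP_u4323", "SP_B", 1e-12),
    ("NP_i4322", "SP_C", 1e-3),   # fails the 1e-8 cutoff
]
go_annotations = {"SP_A": ["GO1", "GO5", "GO4"], "SP_B": ["GO9"]}

go_terms = transfer_go_terms(hits, go_annotations)
print(fasta_text, end="")
print(go_terms)
```

Queries with no hit under the cutoff are simply absent from the result, so unannotated proteins can be counted separately.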
To Do: Place in Protein Biological Systems
[Figure: proteins grouped by GO term: lactase activity, serotonin receptor activity, signal sequence binding, signal transducer activity, nucleotide binding, ATP binding]
Summary
Cit-Net connects citing Mouse papers with cited Human papers in the PubMed database
MeSH is used to classify the citation network nodes into different classes of research
PCoC network connects the proteins in the citing Mouse papers with proteins in the cited Human papers
GO is used to group the P-P and P-C-P network nodes into different classes of MFs, BPs and CCs
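The PCoC network described in the summary could be assembled as in the sketch below. This is one plausible reading, not the author's implementation: every protein in a citing paper is linked to every protein in the paper it cites, and an edge weight counts how many citation pairs support the link. The paper and protein identifiers are toy values and the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def build_pcoc_edges(citations, proteins_by_pmid):
    """Count protein pairs (citing-paper protein, cited-paper protein).

    citations:        iterable of (citing_pmid, cited_pmid)
    proteins_by_pmid: dict mapping pmid -> list of protein IDs
    """
    edges = Counter()
    for citing, cited in citations:
        citing_proteins = proteins_by_pmid.get(citing, ())
        cited_proteins = proteins_by_pmid.get(cited, ())
        for pair in product(citing_proteins, cited_proteins):
            edges[pair] += 1
    return edges


# Toy data: two Mouse papers (m1, m2) cite one Human paper (h1).
citations = [("m1", "h1"), ("m2", "h1")]
proteins_by_pmid = {"m1": ["A", "B"], "m2": ["A"], "h1": ["X"]}

edges = build_pcoc_edges(citations, proteins_by_pmid)
print(dict(edges))
```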
Timetable
Jan Feb Mar Apr May
Database Creation and Data migration
Citation Network Classification
PCoC Networks Building
PCoC Networks Classification
PCoC Networks Analysis