Data Manipulation: Molecular Online and Server Tools & BioExtract Server
Theme: FXN Gene and Pancreatic Cancer.
Etienne Z. Gnimpieba
BRIN WS 2013
Mount Marty College – June 24th 2013
Etienne.gnimpieba@usd.edu
Review: Databases
Metabolic:
• SABIO-RK (check with Brent)
• KEGG (check with Brent)
• HMDB (hmdb.ca, contact for API)
• SMPDB (http://www.smpdb.ca)
• BioModels
• drugDB
• BRENDA (check with Brent)
• [Mathi's project]
Protein:
• ExPASy DB collection (UniProt, …)
• PDB
• SBKB
• STRING
Genomic:
• GEO
• GenBank
• GO
• EBI ArrayExpress & Gene Atlas
Phenomic:
• PhenomicDB
• Phenoscape
Active Network Extraction & Analysis
• Reactome Functional Interaction network
• Extract mutated, overexpressed, underexpressed, expanded/deleted genes; add linker genes
• Disease subnetwork
• Apply community clustering algorithms
• Disease “modules”
• Disease gene prediction, sample classification, hypothesis generation
Pancreatic Cancer Module Map (43 Cases)
Modules:
• p53, SMAD, TGFβ, TNF signaling
• KRAS, MAPK signaling
• Heterotrimeric G-protein signaling
• Rho GTPase signaling
• Transcription & translation
• Cell cycle
• Wnt & Cadherin signaling
• Hedgehog signaling
• Transcription
• Zinc fingers
• Ca2+ signaling
Non-silent mutations:
• blue – in primary tumour only
• green – in xenograft only
• red – in primary & xenograft
Credit: Christina Yung / Bioinformatics.ca
Molecular Biology Databases
• Bibliographic: MEDLINE, PubMed, EMBASE, BIOSIS, CAB International, AGRICOLA
• Taxonomic: NEWT, The Tree of Life, Species 2000, IOPI, ITIS
• Nucleotide: INSDC, EMBL, DDBJ, NCBI, GenBank
• Genomic: SPGP, AceDB, HIV-SD, Ensembl, WormBase, FlyBase, MGD, SGD, EBI (Genome server, Karyn’s genome), RGD
• Protein:
  - Primary protein sequence: UniProt/Swiss-Prot, PIR
  - Specialized protein sequence: GOA, ENZYME, InterPro, PDB, Integr8, MEROPS LIGAN, EMP, DCHGR
  - Secondary and structure protein: PROSITE, PRINTS, Pfam, BLOCKS, SBASE
• Metabolic pathway: KEGG, EcoCyc, BRENDA, ENZYME, BIOMODEL, REACTOME
Sequence type: accession number format (example)
• DNA sequence from GenBank, EMBL or DDBJ: 1 letter + 5 digits (U43752) or 2 letters + 6 digits (AF462052)
• GenPept sequence from GenBank, EMBL or DDBJ: 3 letters + 5 digits (AAF46449)
• Protein sequence from Swiss-Prot: 1 letter + 5 digits (Q16595)
• Protein sequence from the Protein Research Foundation: 6 or 7 digits + 1 letter (2808353A)
• RefSeq sequence: 2 letters + _ + 6 or more digits (mRNA: NM_******; protein: NP_******)
• Protein sequence from the Protein Data Bank (PDB): 1 digit + 3 letters (2EFF)
• Protein sequence from the Molecular Modeling DataBase (MMDB): MMDB ID + more than 4 digits (MMDB ID 767744)
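The accession formats listed above can be turned into a small lookup. The following is a minimal sketch in Python: the regular expressions are transcribed from the slide and are simplified assumptions, not the databases' official accession grammars, and the name `candidate_types` is hypothetical. Because "1 letter + 5 digits" describes both a nucleotide and a Swiss-Prot accession, the function returns every candidate type rather than a single answer.

```python
import re

# Patterns transcribed from the accession table; simplified on purpose.
# They are assumptions for illustration, not official accession grammars.
ACCESSION_PATTERNS = [
    ("GenBank/EMBL/DDBJ nucleotide", r"[A-Z]\d{5}|[A-Z]{2}\d{6}"),
    ("GenPept protein", r"[A-Z]{3}\d{5}"),
    ("Swiss-Prot protein", r"[A-Z]\d{5}"),
    ("Protein Research Foundation protein", r"\d{6,7}[A-Z]"),
    ("RefSeq mRNA", r"NM_\d{6,}"),
    ("RefSeq protein", r"NP_\d{6,}"),
    ("PDB structure", r"\d[A-Z]{3}"),
]


def candidate_types(accession):
    """Return every sequence type whose format matches the accession."""
    acc = accession.strip().upper()
    return [name for name, pattern in ACCESSION_PATTERNS
            if re.fullmatch(pattern, acc)]


for example in ["U43752", "AF462052", "AAF46449", "Q16595", "2808353A", "2EFF"]:
    print(example, candidate_types(example))
```

An ambiguous result (two candidate types) is a signal to check which database the accession came from before using it.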
Review: data format
FASTA header lines:
>gi|XXXX|XXX
>sp|XXXX|XXX
Field labels: Gene Info number, accession number, species reference
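FASTA header lines such as `>gi|…` and `>sp|…` are pipe-delimited, so they can be split into fields with a few lines of Python. This is a minimal sketch: the function name `parse_fasta_header` is hypothetical, the example header is illustrative, and the code assumes there are no spaces inside the identifier block (everything after the first space is treated as free-text description).

```python
def parse_fasta_header(line):
    """Split a pipe-delimited FASTA header into tag, fields and description."""
    if not line.startswith(">"):
        raise ValueError("a FASTA header line must start with '>'")
    # The identifier block ends at the first space; the rest is description.
    identifier, _, description = line[1:].strip().partition(" ")
    fields = [part.strip() for part in identifier.split("|")]
    return {"tag": fields[0], "fields": fields[1:], "description": description}


# Illustrative UniProt-style header for the frataxin entry Q16595.
header = parse_fasta_header(">sp|Q16595|FRDA_HUMAN Frataxin, mitochondrial")
print(header["tag"], header["fields"], header["description"])
```

The first field after the tag is the accession number in `sp` headers; in `gi` headers the layout differs, so the field order should be checked against the source database.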
Review: Online Programs & Algorithms
Biological sequences and data can be analyzed in many ways with bioinformatics tools. They can be read, assembled, compared, mapped, predicted, designed, modeled…
1. Nucleotide and protein sequence searching (blastall, SSEARCH for local FASTA searches, GLSEARCH for global)
2. Multiple sequence alignment (ClustalW2, MView, …)
3. Pairwise sequence alignment (Needle for global, LALIGN for local)
4. Protein functional analysis (SMART, Phobius, InterProScan)
5. Functional genomic tools (R-tools, SAIL, EFOtools)
6. Molecular structure analysis (PDBeFold, QuaternaryStructure, …)
7. Scientific literature text mining (EBIMed, Whatizit)
8. Sequence translation (Transeq, readseq, Backtranseq, …)
9. Data retrieval and ID mapping (dbfetch, ENA/SRA, SRS, PICR)
10. Protein structure prediction tools
11. …
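To make the "sequence translation" category concrete, here is a minimal Python sketch of what a tool such as Transeq does conceptually: it reads a nucleotide sequence three bases at a time and maps each codon to an amino acid. It uses the standard genetic code and a single forward reading frame; the function name `translate` and the choice of "X" for unrecognized codons are assumptions of this sketch, not the behavior of any listed tool.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence):
    """Translate a DNA or RNA sequence in frame 1; '*' marks a stop codon."""
    seq = sequence.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for ambiguous codons
        for i in range(0, usable, 3)
    )


print(translate("ATGTGGACT"))  # MWT
```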
Boolean operators and symbols
• AND: term1 and term2 must both exist in the searched documents
• OR: term1 or term2 must exist
• NOT: term1 must not be present in any of the displayed documents
• ALL: term1 must be present in all of the displayed documents
• + term1: the document must contain term1
• - term1: the document must not contain term1
• XXX*: any characters are accepted after XXX
• XX?YX: any character is accepted in place of Y
Examples:
• FXN [AND] gene [NOT] Frataxin: all data related to the FXN gene except those concerning the frataxin protein
• ataxia + apraxia + gene: all genes related to ataxia and apraxia
• Ada* [AUTH]: all authors whose names begin with Ada
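The effect of AND, NOT and the wildcard symbols can be illustrated on a toy document list. The sketch below is not how any database search engine is implemented; the names `matches` and `search` and the sample documents are hypothetical. It uses Python's `fnmatchcase`, whose `*` and `?` wildcards behave like the truncation symbols described here.

```python
import re
from fnmatch import fnmatchcase


def matches(text, required=(), excluded=()):
    """AND over `required` terms, NOT over `excluded` terms (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())

    def present(term):
        # '*' matches any run of characters, '?' matches exactly one.
        return any(fnmatchcase(word, term.lower()) for word in words)

    return (all(present(term) for term in required)
            and not any(present(term) for term in excluded))


def search(documents, required=(), excluded=()):
    return [doc for doc in documents if matches(doc, required, excluded)]


# Toy documents, made up for this example.
docs = [
    "FXN gene expression in pancreas",
    "FXN gene encodes the frataxin protein",
    "Ataxia and apraxia gene panel",
]
print(search(docs, required=["FXN", "gene"], excluded=["frataxin"]))
print(search(docs, required=["atax*"]))
```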
Similarity search tools
• BLAST (Basic Local Alignment Search Tool): compares a protein or DNA sequence to other sequences
• FASTA (FAST-ALL): fast protein or nucleotide comparison
• Global match: aligns all residues of one sequence with all residues of the other
• Local match: finds a region in one sequence that matches a region in the other
• Motif match: finds matches of a short sequence in one or more regions internal to a longer sequence; a match can be perfect or can contain mismatches, insertions, or deletions
• Multiple alignment: a mutual alignment of many sequences
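The difference between a global and a local match shows up clearly in the alignment score. The sketch below computes both with textbook dynamic programming (Needleman-Wunsch for global, Smith-Waterman for local) using a linear gap penalty. The scoring values (+1 match, -1 mismatch, -2 gap) and the function name `alignment_scores` are assumptions chosen for illustration; BLAST and FASTA use faster heuristics rather than this exhaustive calculation.

```python
def alignment_scores(a, b, match=1, mismatch=-1, gap=-2):
    """Return (global_score, local_score) for sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    glob = [[0] * cols for _ in range(rows)]
    loc = [[0] * cols for _ in range(rows)]
    # A global alignment pays for gaps at the sequence ends.
    for i in range(1, rows):
        glob[i][0] = i * gap
    for j in range(1, cols):
        glob[0][j] = j * gap
    best_local = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + s,
                             glob[i - 1][j] + gap,
                             glob[i][j - 1] + gap)
            # A local alignment may restart anywhere, so scores never drop below 0.
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + s,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])
    return glob[-1][-1], best_local


print(alignment_scores("ACGT", "TTACGTTT"))  # (-4, 4)
```

In the example, the short sequence matches perfectly inside the longer one, so the local score is high while the global score is penalized for the four unaligned flanking bases.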
Review: Sequence Analysis
Sequence alignment: assignment of residue-to-residue correspondence
• Determine phylogenetic relationships by analyzing similarity and homology
  - Similarity: observation or measurement of resemblance and difference
  - Homology: the sequences and the organisms in which they occur are descended from a common ancestor. Homology must be inferred from the observation of similarity.
• Determine whether a protein (or a gene) is related to a larger group of proteins
• Verify whether a mutated residue is conserved within species
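Checking whether a residue is conserved comes down to inspecting one column of a multiple alignment. A minimal Python sketch follows, assuming the sequences are already aligned to equal length with "-" for gaps; the function name `column_conservation` and the toy alignment are made up for illustration and are not real frataxin data.

```python
from collections import Counter


def column_conservation(aligned_sequences, position):
    """Return (most common residue, fraction of sequences carrying it)."""
    column = [seq[position] for seq in aligned_sequences]
    residues = [r for r in column if r != "-"]  # gaps do not count as residues
    if not residues:
        return None, 0.0
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(column)


# Toy alignment, made up for this example.
alignment = ["MWTLG", "MWSLG", "MW-LG", "MWTLG"]
print(column_conservation(alignment, 0))  # ('M', 1.0)
print(column_conservation(alignment, 2))  # ('T', 0.5)
```

A fraction near 1.0 suggests a conserved position, where a mutation is more likely to matter; the threshold for "conserved" is a judgment call and is not fixed here.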
Context
Molecular Online Tools and Server
0. Specification & Aims
Statement of problem / Case study: The FXN gene provides instructions for making a protein called frataxin. This protein is found in cells throughout the body, with the highest levels in the heart, spinal cord, liver, pancreas, and the muscles used for voluntary movement (skeletal muscles). Within cells, frataxin is found in energy-producing structures called mitochondria. Although its function is not fully understood, frataxin appears to help assemble clusters of iron and sulfur molecules that are critical for the function of many proteins, including those needed for energy production. Mutations in the FXN gene cause Friedreich ataxia, a genetic condition that affects the nervous system and causes movement problems. Most people with Friedreich ataxia begin to experience the signs and symptoms of the disorder around puberty.
Keywords:
• Bio: FXN, frataxin, pancreatic cancer, CDKN4
• Math: HMM
• Informatics: programming, bioinformatics tools, getting and exporting data
Reduced expression of frataxin is the cause of Friedreich's ataxia (FRDA), a lethal neurodegenerative disease. How about liver cancer?
Aim: The purpose of this lab is to introduce online biological exploration tools for large-scale studies of human data (metabolic, proteomic, genomic, …). We apply them to the FXN gene and pancreatic cancer. The goal is to understand how a researcher can identify and cross-reference the biological knowledge available in data banks.
Acquired skills
Online and server tools:
- Query biological databases (FASTA, HTML, txt, figure formats)
- Sequence tools (protein and gene): alignment (showalign, ClustalW2), similarity, …
- Manage data results (select, keep, map, export)
- Build and reuse workflows
Biological Hypothesis
[Figure panels: FXN on chromosome 9; frataxin molecule structure (PyMOL); pancreatic cancer; pancreas anatomy; biological DB tools]
Resolution Process
T1. Metabolomics
Objective: Use metabolic data repositories to understand the frataxin protein mechanism.
T1.1. Finding the enzyme and pathway related to frataxin using KEGG
T1.2. Finding the reaction involving frataxin using Reactome
T1.3. Using BRENDA for enzyme data on frataxin
T1.4. Using collected data for analysis
T1.5. Redo the process with pancreatic cancer

T2. Genome exploration
Objective: Use Ensembl to locate FXN on the human genome and identify the genes implicated in pancreatic cancer.
T2.1. Locate a given gene on the human genome
T2.2. Get a genomic sequence from NCBI
T2.3. Get the protein data and sequence from EBI
T2.4. Save the exported sequence data in the data folder

T3. Sequence manipulation
Objective: Find similar sequences using BLAST tools and align the given sequences.
T3.1. Find similar sequences using the BLAST tool
T3.2. Align the generated sequences with the ClustalW tool
T3.3. Visualize the result as a phylogenetic tree in Jalview

T4. Protein Data and Structural Biology Knowledge
Objective: Provide a protein-level study of frataxin and its connection with pancreatic cancer (functional and structural data).
T4.1. Structural knowledge on frataxin using SBKB
T4.2. Using UniProt for the frataxin protein study
T4.3. Protein-protein interaction using STRING
T4.4. Using the same method for pancreatic cancer and comparing

T5. BioExtract server
Objective: Use a server tool to optimize the data manipulation process, applied on the BioExtract server.
T5.1. Server initialization
T5.2. Pancreatic cancer & frataxin (FXN)
T5.3. Mapping, alignment
T5.4. Workflow save & reuse

Results
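Task T2.1 (locating a gene on the human genome) is done through the Ensembl web interface in the lab, but the same lookup can be scripted. The sketch below is one possible approach, assuming the public Ensembl REST service at rest.ensembl.org and its `lookup/symbol` endpoint; the function names are hypothetical, and the response fields `seq_region_name`, `start` and `end` are assumptions that should be checked against the current Ensembl REST documentation. The fetch needs network access.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

ENSEMBL_REST = "https://rest.ensembl.org"  # assumed public endpoint


def lookup_url(symbol, species="homo_sapiens"):
    """Build the symbol-lookup URL for a gene in a given species."""
    return f"{ENSEMBL_REST}/lookup/symbol/{quote(species)}/{quote(symbol)}"


def locate_gene(symbol, species="homo_sapiens"):
    """Return (chromosome, start, end) for a gene symbol; needs network access."""
    request = Request(lookup_url(symbol, species),
                      headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=30) as response:
        record = json.load(response)
    # Field names are assumptions about the JSON response.
    return record["seq_region_name"], record["start"], record["end"]


print(lookup_url("FXN"))
```

Calling `locate_gene("FXN")` would then return the chromosome and coordinates reported by Ensembl, which can be compared with what the genome browser shows.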