Bioinformatics Data Manipulation: Molecular Online Tools & BioExtract Server

Theme: FXN Gene and Pancreatic Cancer

Lab #1

Etienne Z. Gnimpieba, BRIN WS 2013

Mount Marty College – June 24th 2013



Context

0. Specification & Aims


Statement of problem / Case study: The FXN gene provides instructions for making a protein called frataxin. This protein is found in cells throughout the body, with the highest levels in the heart, spinal cord, liver, pancreas, and muscles used for voluntary movement (skeletal muscles). Within cells, frataxin is found in energy-producing structures called mitochondria. Although its function is not fully understood, frataxin appears to help assemble clusters of iron and sulfur molecules that are critical for the function of many proteins, including those needed for energy production. Mutations in the FXN gene cause Friedreich ataxia, a genetic condition that affects the nervous system and causes movement problems. Most people with Friedreich ataxia begin to experience the signs and symptoms of the disorder around puberty.

Bioinformatics Molecular Online Tools and Server

Keywords: Bio: FXN, frataxin, pancreatic cancer, CDKN2A. Math: HMM. Informatics: programming, bioinformatics tools, getting and exporting data.

Reduced expression of frataxin is the cause of Friedreich's ataxia (FRDA), a lethal neurodegenerative disease. How about liver cancer?

Aim: This lab introduces online tools for exploring large-scale human biological data (metabolic, proteomic, genomic, …). We apply them to the FXN gene and pancreatic cancer to show how a researcher can cross-reference the biological knowledge available in data banks.

Acquired skills

Online and server tools:
- Query biological DB (FASTA, HTML, txt, figure formats)
- Sequence tools (protein and gene): alignment (showalign, clustalw2), similarity, …
- Manage data results (select, keep, map, export)
- Build and reuse workflows

Biological Hypothesis

[Figures: FXN on chromosome 9; frataxin molecule structure (PyMOL); pancreatic cancer; pancreas anatomy; biological DB tools]

Resolution Process

T1. Metabolomics
Objective: Use metabolic data repositories to understand the frataxin protein mechanism.
T1.1. Finding the Enzyme and Pathway related to Frataxin using KEGG
T1.2. Finding the Reaction involved with Frataxin using Reactome
T1.3. Using BRENDA for enzyme data on Frataxin
T1.4. Using Collected Data for Analysis
T1.5. Redo the process with Pancreatic Cancer Results

T2. Genome Exploration
Objective: Use Ensembl to localize FXN on the human genome and identify the genes implicated in pancreatic cancer.
T2.1. Locate a given gene on the human genome
T2.2. Get a genomic sequence from NCBI
T2.3. Get the protein data and sequence from EBI
T2.4. Save the exported sequence data in the data folder

T3. Sequences Manipulation
Objective: Find similar sequences using BLAST tools and align the given sequences.
T3.1. Find similar sequences using the BLAST tool
T3.2. Align generated sequences with the ClustalW tool
T3.3. Visualize the result as a phylogenetic tree in Jalview

T4. Protein Data and Structural Biology Knowledge
Objective: Study frataxin at the protein level and its connection with pancreatic cancer (functional and structural data).
T4.1. Structural Knowledge on Frataxin using SBKB
T4.2. Using Uniprot for Frataxin Protein Study
T4.3. Protein-Protein Interaction using STRING
T4.4. Using the same method for Pancreatic Cancer and compare

T5. BioExtract Server
Objective: Use server tools to optimize the data manipulation process, applied on the BioExtract server.
T5.1. Server Initialization
T5.2. Pancreatic cancer & Frataxin (FXN)
T5.3. Mapping, Alignment
T5.4. Workflow save & reuse


Data Manipulation: Molecular Online Tools and BioExtract Server
T1. Metabolomics

Objective: Use metabolic data repositories to understand the frataxin protein mechanism.

Theme: Frataxin (FXN) implication in the pancreatic cancer genesis

T1.1. Finding the Enzyme and Pathway related to Frataxin using KEGG
On the KEGG Database website: http://www.genome.jp/kegg/
o Search frataxin, and select the first result under KEGG Gene Database (hsa:2395)
o Copy the E.C. number given in “Definition” (EC:1.16.3.1)
o To find the related pathway, search the E.C. number in the general KEGG Database search (click on the KEGG logo on top)
o Select the result given in the KEGG Enzyme Database at the bottom. Here you can see how this enzyme is involved in the given metabolism.

T1.2. Finding the Reaction involved with Frataxin using Reactome
On the Reactome website: http://www.reactome.org/ReactomeGWT/entrypoint.html
o Search frataxin and select the 4th result with Frataxin in the title. This shows you the pathway model related to frataxin and how frataxin is involved in it.

T1.3. Using BRENDA to find information on Frataxin
On the BRENDA Database website: http://www.brenda-enzymes.org/
o Search using the E.C. number obtained in T1.1 and select the result given. This website gives a wealth of information on the enzyme, including the reaction, related species, and so on. At the very bottom of the webpage you can select other databases that have information on the same compound or protein.

T1.4. Using Collected Information to Analyze the Data
On the BioModels website: http://www.ebi.ac.uk/biomodels-main/
o Search using the E.C. number obtained in T1.1 and select the first result given. Here you can download the SBML file (in student folder) for this pathway (top left corner) and analyze it on the semanticSBML website: http://semanticsbml.org/semanticSBML/simple/index
o Click on the first box “Find Similar Models”, click “Browse”, and select the file you just saved from BioModels. On this website you can use multiple tools to analyze the model and compare it with other models.

T1.5. Same Process Searching for Pancreatic Cancer Results (Optional)
o Use the same process, searching instead for pancreatic cancer results.
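The E.C. number copied in T1.1 is reused in T1.3 and T1.4, so it can help to pull it out of the copied text programmatically. The following is a minimal sketch, not part of the original lab: the function name `extract_ec_numbers` and the sample string are assumptions made for illustration, and only the E.C. number itself comes from the slide. It handles one E.C. number per `EC:` tag.

```python
import re
from typing import List

# An E.C. number has four dot-separated fields, e.g. "1.16.3.1".
# Later fields may be "-" when the classification is incomplete.
EC_PATTERN = re.compile(r"EC:\s*(\d+(?:\.(?:\d+|-)){3})")


def extract_ec_numbers(definition: str) -> List[str]:
    """Return every E.C. number that follows an 'EC:' tag in the text."""
    return EC_PATTERN.findall(definition)


# Illustrative text only: the exact wording of the KEGG "Definition" field
# is an assumption; the E.C. number is the one quoted in T1.1.
sample_definition = "frataxin (EC:1.16.3.1)"
print(extract_ec_numbers(sample_definition))
```

The returned string can then be pasted into the BRENDA and BioModels search boxes.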


Data Manipulation: Molecular Online Tools and BioExtract Server
T2. Genome Exploration

Objective: Use Ensembl online tools to localize FXN on the human genome and identify the genes implicated in pancreatic cancer. Next, retrieve the corresponding sequence data in FASTA format.

Theme: Frataxin (FXN) implication in the pancreatic cancer genesis

T2.1. Locate a given gene on human genome
On the Ensembl website: http://uswest.ensembl.org/index.html
o Select our species “human”
o Do a keyword search using the term “FXN”
o Follow the link of the “Gene” drop-down feature
o Click the link for “Location”
o Export this gene by clicking “Export data” (left side bar) as a FASTA sequence in an HTML file
o Click Next
o Click the “HTML” link
o Do the same process by searching for “pancreatic cancer”. When you find the list of genes, select the CDKN2A gene

T2.2. Get a genomic sequence from NCBI (42 databases)
On the NCBI website: http://www.ncbi.nlm.nih.gov/guide/
o Pull down “All Databases” and select the “Gene” database, then do a keyword search using the term FXN
o Click the corresponding Homo sapiens FXN gene (first result)
o Scroll down to the “NCBI Reference Sequences” title and go to the subtitle “mRNA and Proteins”
o Click on the accession number of the first transcript variant (NM_000144.4)
o Get the same sequence in FASTA format by clicking on the “FASTA” link
o Click “Send” on the top right (in blue), select complete record, file, FASTA, and Create File, then save in the student folder if possible (the file saves to Downloads automatically)

T2.3. Get the protein information and sequence from EBI
The common protein name for FXN is frataxin.
On the EBI website: http://www.ebi.ac.uk/
o Type “FXN” in the search and click on “find”
o Select the Homo sapiens frataxin entry to get all the information about the protein (function, domains, structure, gene expression, ...)
o Don't close the window
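Once a sequence has been exported in FASTA format, a quick check that the file holds what you expect can be done in a few lines. This is a minimal sketch, assuming plain-text FASTA input; the function name `parse_fasta` and the example record are hypothetical, and the bases shown are placeholders rather than the real NM_000144.4 sequence.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequence may span several lines
    return records


# Placeholder record: header and bases are made up for illustration.
example = """>example_record hypothetical description
ATGTGGACTC
TCGGGCGCCG
""".splitlines()

for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```

To read a saved export instead, pass an open file handle, for example `parse_fasta(open(path))` with `path` pointing at the file in your student folder.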


Data Manipulation: Molecular Online Tools and BioExtract Server
T3. Sequences Manipulation

Objective: Find similar sequences using BLAST tools and align the given sequences.

Theme: Frataxin (FXN) implication in the pancreatic cancer genesis

T3.1. Find similar sequences using BLAST tool
o Continuing from Task T2.3, select the “Protein” tab on the left and select “view sequence in Uniprot”
o You can get the FASTA format of the protein by clicking on “fasta” in the top right
o Go back to the previous page (using the browser's back button) and check the box next to the first sequence under the “Sequences” title
o Select the “Blast” tool in the drop-down menu, then click on “Go”
o The best-matched sequences will appear on the first page (green indicates a better match). To see other sequences, click on “next”. BLAST parameters can be modified by clicking on “Options” at the top

T3.2. Align generated sequences with ClustalW tool
o Select about 10 different species, then click on “Align” at the bottom of the screen. The selected sequences will be inserted directly into the ClustalW tool, which will run automatically.
o From the right menu, you can highlight similarities, polar residues, aromatic residues, etc.
o From the same page you may add further sequences to the same alignment if needed. You can also access the phylogenetic tree. More details about the residues and the distances can be obtained by clicking on “Jalview” on the top right (in orange). You may have to open Jalview manually.
o In Jalview, click “File”, “Add sequences”, “From file”, then select the sequence file you saved earlier.
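To make the idea of a "better match" concrete, the sketch below computes percent identity between two rows of an alignment. It is an illustration under stated assumptions, not the scoring used by BLAST, ClustalW, or Jalview: columns with a gap in either sequence are skipped, which is only one of several common conventions, and the two aligned strings are toy data rather than real frataxin sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free alignment columns where both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned rows (hypothetical, for illustration only).
row_1 = "MWTLGRRA"
row_2 = "MWAL-RRA"
print(round(percent_identity(row_1, row_2), 1))
```

Here 7 columns are compared and 6 of them match, so the value is a little under 86%.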


Data Manipulation: Molecular Online Tools and BioExtract Server
T4. Protein Data and Structure Data

Objective: Study frataxin at the protein level and its connection with pancreatic cancer (functional and structural data).

Theme: Frataxin (FXN) implication in the pancreatic cancer genesis

T4.1. Structural Knowledge on Frataxin using SBKB
On the Structural Biology Knowledgebase (SBKB): http://www.sbkb.org/
o Select “by text” (options on left) and search “frataxin”.
o For our example, select the link next to “Structures and annotations…”. Here you can obtain information on all the different hits, such as structure, by looking under each of the given tabs.

T4.2. Using Uniprot for Frataxin Protein Study
On the Uniprot Database: http://www.uniprot.org/
o Search frataxin, select the first 3 results given, and click “Download” in the top right. You can then “Open” or “Download” any of the results given.

T4.3. Protein-Protein Interaction using STRING
On the STRING Database: http://string-db.org/
o Under “search by name”, search “FXN”.
o Select the first result given and click “Continue”. Here you can look at the protein-protein interaction model, obtain more information on a given protein or interaction by clicking on it in the model, and use many other useful tools.

T4.4. Using same method for Pancreatic Cancer and compare
o Go back to the STRING Database home page and, under “multiple names”, search “frataxin” and “pancreatic cancer”. Select the first result.
o Select all three results given and click “Continue”. The 3 selected proteins are shown, but this database shows no interactions between them.
o You can widen the result by changing the search to cancer in general.
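The check made by eye in T4.4 (do any of the selected proteins interact with each other?) can be written down as a small lookup over an edge list. This is a sketch only: the function name `interactions_among` is hypothetical, and the edges use placeholder names rather than real STRING output.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def interactions_among(
    selected: Set[str], edges: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """List the pairs of selected proteins that share a known interaction."""
    known = {frozenset(pair) for pair in edges}  # direction does not matter
    return [
        (a, b)
        for a, b in combinations(sorted(selected), 2)
        if frozenset((a, b)) in known
    ]


# Hypothetical edge list with placeholder names (not real STRING data).
toy_edges = [("P1", "P4"), ("P2", "P4"), ("P3", "P5")]

print(interactions_among({"P1", "P2", "P3"}, toy_edges))  # no direct links
print(interactions_among({"P1", "P2", "P4"}, toy_edges))
```

An empty list corresponds to the situation described above, where the selected proteins appear in the network without any edges between them.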


Data Manipulation: Molecular Online Tools and BioExtract Server
T5. BioExtract Server

Objective: Use Workflow Management Systems (WMS) to optimize data manipulation processes (BioExtract server).

Theme: Frataxin (FXN) implication in the pancreatic cancer genesis

http://bioextract.org

T5.1. Server Initialization
o Register on BioExtract Server to be able to create and save your own workflows.
o Click on the “workflows” tab, then click “create and import workflows”. Now click “record workflow”, then “close”.
o To obtain the workflow at the end of the lab: from the “workflows” tab click on “create and import workflows”, then click on “save records”.

T5.2. Pancreatic cancer & Frataxin (FXN) data
o Select the query tab. Then select the protein sequences and check the box next to the NCBI protein database. Select “gene” as the search field and type “FXN”. Click on “Add Search Line”, select “Species”, and type “Human”. Submit the query.
o Results will appear on the “extract page”. You can get the GenBank view of each sequence by clicking on “View record”. We need only the Homo sapiens frataxin. For that, click “select records”, then check the corresponding box. Click on “keep only selected records”. The results can be saved or extracted in FASTA or txt format (export the records in FASTA format).
o Click the “tools” tab, then click on “Alignment Tools” and “showalign”. Select “Use records on extract page formatted in Fasta”.
o Click on “execute” to run the tool. When execution is complete, results can be retrieved by selecting the desired format and clicking on “view results”.
o (Optional) Repeat the search process with “pancreatic cancer”. Make sure you change the first search field to “all text”.

T5.3. Mapping, Alignment
o (If the previous step was skipped, skip this step as well.) Go to the query tab again and search “FXN”. Search and select a few listings. Export them as done in T5.2, then go to the tools tab.
o Select similarity search tools, then select “blastp”. Select “use records on extract page formatted as Fasta”. Under “choose search set” select the database “swissprot”.
o When execution is complete, go to the extract page and select 10 different sequences belonging to 10 different species, including human, then click “keep only selected records”. Export the records again.
o Go to the tools tab again, select “iPlant”, then “clustal w2”. Select “use records on extract page formatted as Fasta”. Your 10 protein sequences will be automatically used as input to the clustalw2 tool. Execute the tool. Use the pull-down for “Search Results” and select “clustalw2.fa” before viewing the results.

T5.4. Workflow save & reuse
o Go back to the “workflow” tab and click “create and import workflows”. Write a name and a description for your workflow, then click on Save. All the previous steps will be saved in this workflow.
o Once the workflow is saved, you will find it at the bottom of the workflow list. Click on the name of the workflow to see a schematic view of it. Run the workflow by clicking on “start”.
o Get and verify all the results by clicking on “provenance”. The general report can be saved for later analysis. Results of each tool can be viewed or saved by clicking on “view file”.
o The same workflow can be executed for another query by simply modifying the accession number of the protein. (Click save in the “create and import workflows” section to temporarily save the new query.)
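The record, save, and re-run cycle described above can be pictured as an ordered list of steps that is replayed on a new query. The sketch below is a toy model of that idea, not BioExtract's implementation: the `Workflow` class, the step functions, and the in-memory `TOY_DB` records (including the sequences) are all hypothetical and exist only for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical in-memory "database"; sequences are placeholders.
TOY_DB: List[Dict[str, str]] = [
    {"gene": "FXN", "species": "human", "seq": "MAAA"},
    {"gene": "FXN", "species": "mouse", "seq": "MAAC"},
    {"gene": "CDKN2A", "species": "human", "seq": "MCCC"},
]


class Workflow:
    """Records named steps and replays them in order on a query."""

    def __init__(self, name: str, description: str = "") -> None:
        self.name = name
        self.description = description
        self.steps: List[Tuple[str, Callable[[Any], Any]]] = []

    def record(self, label: str, func: Callable[[Any], Any]) -> None:
        self.steps.append((label, func))

    def run(self, query: Any) -> List[Tuple[str, Any]]:
        """Return (label, result) for each step: a simple provenance log."""
        provenance = []
        data = query
        for label, func in self.steps:
            data = func(data)
            provenance.append((label, data))
        return provenance


def search_gene(gene: str) -> List[Dict[str, str]]:
    return [r for r in TOY_DB if r["gene"] == gene]


def keep_human(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    return [r for r in records if r["species"] == "human"]


def to_fasta(records: List[Dict[str, str]]) -> str:
    return "".join(">{gene}|{species}\n{seq}\n".format(**r) for r in records)


wf = Workflow("toy_lab", "query, keep selected records, export as FASTA")
wf.record("query", search_gene)
wf.record("keep only selected records", keep_human)
wf.record("export FASTA", to_fasta)

print(wf.run("FXN")[-1][1])     # first run
print(wf.run("CDKN2A")[-1][1])  # same workflow, modified query
```

Only the query changes between the two runs; the recorded steps stay the same, which is the point of saving a workflow.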