Tutorial 6: NetworkAnalyst Galaxy Server

Intro to NetworkAnalyst

• Web application that enables complex meta-analysis and visualization
• Designed to be accessible to biologists rather than specialized bioinformaticians
• Integrates advanced statistical methods and innovative data visualization to support:
  • Efficient data comparisons
  • Biological interpretation
  • Hypothesis generation

Overview

• Galaxy is a web-based platform for integrating various computational tools and resources into a cohesive workspace for comparative genomics.
• NetworkAnalyst Galaxy server
  • You can easily upload your own data
  • 34 workflows (i.e. customized pipelines for RNA-seq raw data processing) are provided for 17 species and 2 different RNA-seq alignment programs
  • Results and step-by-step analysis can be recorded (Data Libraries and Histories)
• Note: if this is your first time using Galaxy, please read the Galaxy 101 tutorial first (https://galaxyproject.org/tutorials/g101)

Goals for this tutorial

• Introduce the steps to execute a workflow on NetworkAnalyst Galaxy server
• Upload a group of fastq.gz files to the server and build collections
• Learn about existing alignment options and pipelines
• Run workflows and prepare output for NetworkAnalyst.ca

How does it work?

1) Account registration: you need to first register in order to upload files to our server.

2) Data upload: upload your RNA-seq fastq.gz files, directly or using FTP.

3) Building collection: this is a very practical step when you have tens or more RNA-seq samples to process.

4) Import workflow: you need to first specify the alignment program, sequencing type and organism.

5) Run workflow: after importing the workflow, you can directly start using it.

6) Download your gene count table, add class labels and upload to NetworkAnalyst.

1) Account Registration

• It is highly recommended to create an account on the NetworkAnalyst Galaxy server, although one is not required for access.
• With an account, the data quota is increased (50 GB) and full functionality across sessions opens up, such as naming, saving, sharing, and publishing Galaxy objects (Histories, Workflows, Datasets, Pages).

Start by clicking here to Register.

Galaxy.networkanalyst.ca Main Page

Step-by-step instructions to start executing our workflows.

Tools panel: contains links to the data upload, workflows, preparation and analysis tools.

History panel: shows you the history of your analysis steps, allows you to view data and results, and more.

2) Data upload

• NetworkAnalyst Galaxy server expects compressed fastq raw files (i.e. fastq.gz).
• Files can be uploaded directly from the interface, or an FTP server can be used (see Appendix A).

Click Get Data

Upload File

Upload or paste file. For large files (> 1 GB), see Appendix A for the FTP upload option.

File format: it is recommended to provide “fastq.gz”.

Species: can be left on the default selection.

Click here to choose a file from the local drive.
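
Before uploading, it can help to check locally that each file really is gzip-compressed and to see which ones are large enough that the FTP route from Appendix A is the better choice. The following is a minimal sketch using only the Python standard library. The function names `is_gzip` and `choose_upload_route` are hypothetical, and reading "1 GB" as 10^9 bytes is an assumption.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"        # first two bytes of every gzip stream
FTP_THRESHOLD_BYTES = 10**9     # assumption: "1 GB" read as 10^9 bytes


def is_gzip(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def choose_upload_route(path, threshold=FTP_THRESHOLD_BYTES):
    """Suggest 'browser' or 'ftp' for a fastq.gz file; raise if it is not gzip."""
    path = Path(path)
    if not path.name.endswith(".fastq.gz"):
        raise ValueError(f"{path.name}: expected a .fastq.gz file name")
    if not is_gzip(path):
        raise ValueError(f"{path.name}: not gzip-compressed")
    # Files above the threshold are better sent over FTP (Appendix A).
    return "ftp" if path.stat().st_size > threshold else "browser"
```

Calling `choose_upload_route("sample_1.fastq.gz")` on each file gives a quick list of which samples to upload through the interface and which to send by FTP.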

Once files are chosen, click on “Start” to load file names and information.

The status bar shows the progress of loading the files to the history workspace in Galaxy.

Once data uploading is finished, the file names appear highlighted in green in the History.

3) Building collection

• Collections are:
  • Helpful for managing your data and minimizing the amount of clicking you need
  • Convenient for processing many samples at once
  • A way to combine datasets to minimize clutter
• Different collection types are used to represent the type of RNA-seq experiment
  • Collection type: List of pairs → paired-end RNA-seq experiments
  • Collection type: List → single-end RNA-seq experiments
• Workflows in NetworkAnalyst Galaxy server are designed to recognize collection input

Collection for single-end uploaded samples

1) Click on the checkmark for operations on multiple datasets.

2) Select uploaded samples to build the collection or click “All” to select all samples.

3) Click on For all selected and choose Build Dataset List.

1) Assign a name for the list.

2) Click on Create list.

The collection will then appear in the History.

Collection for paired-end uploaded samples

1) Click on the checkmark for operations on multiple datasets.

2) Select uploaded samples to build the collection or click “All” to select all samples.

3) Click on For all selected and choose Build List of Dataset Pairs.

1) Galaxy provides a flexible way to pair samples using a filter with a certain pattern. For example, “_1” refers to the forward reads and “_2” to the reverse reads.

2) Assign a name to the collection.

3) Finally, click on “Create list”. The collection will then appear in the History.
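
The pairing filter described above can be previewed on your own file names before building the collection. This is a minimal sketch of the same idea, not Galaxy's implementation: it assumes mates are named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz`, and the function name `pair_reads` is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: mates are named <sample>_1.fastq.gz and <sample>_2.fastq.gz.
MATE_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.fastq\.gz$")


def pair_reads(file_names):
    """Group file names into (forward, reverse) pairs keyed by sample name.

    Returns (pairs, unpaired): a dict of complete pairs and a sorted list of
    names that did not match the pattern or lack a mate.
    """
    mates = defaultdict(dict)
    unpaired = []
    for name in file_names:
        match = MATE_PATTERN.match(name)
        if match:
            mates[match["sample"]][match["mate"]] = name
        else:
            unpaired.append(name)
    pairs = {}
    for sample, found in mates.items():
        if "1" in found and "2" in found:
            pairs[sample] = (found["1"], found["2"])
        else:
            unpaired.extend(found.values())
    return pairs, sorted(unpaired)
```

Any name that ends up in the `unpaired` list is worth checking before building a List of pairs, since a sample with a missing mate cannot form a pair.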

Once the collection is created, it appears in the History. Now you can run single-end or paired-end workflows based on the type of created collection.

4) Import workflow

• Workflows specify the steps of execution in a process.
• A workflow allows the user to repeat an analysis using different datasets.
• In NetworkAnalyst Galaxy server, 34 workflows are provided for 17 species and 2 different RNA-seq alignment programs.
• Different workflows are designed for single-end and paired-end experiments.

Choose alignment program, sequencing type and organism

Click on “View workflow”

1) Click here to import the workflow

Click here and choose Run to start using the workflow

Click here to view the workflow steps.

5) Run workflow

• In NetworkAnalyst Galaxy server, workflows are designed for:
  • Paired-end RNA-seq experiments
  • Single-end RNA-seq experiments
• An option is provided for the alignment program:
  • Kallisto: a recently developed solution in the class of alignment-free solutions for processing RNA-seq reads. It is very fast, and its results have been shown to correlate highly with those of other existing solutions.
  • HiSAT2: another well-maintained solution for RNA-seq alignment. For HiSAT2, an annotation file should be provided. Please see Appendix B for details.
• In general, the steps of our workflows are:
  • Trimming → Alignment → Data conversion/summarization
• For checking the quality of the fastq files, the FASTQC tool is available under “Sequence operations” in the Tools panel.
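
For a quick local look at a file before uploading, the sketch below reads the first records of a fastq.gz file and reports the mean read length and mean base quality. It is only a rough sanity check and not a substitute for FASTQC. It assumes four-line FASTQ records and Phred+33 quality encoding, and the function name `summarize_fastq` is hypothetical.

```python
import gzip


def summarize_fastq(path, max_reads=10000):
    """Return (reads_checked, mean_read_length, mean_phred) for a fastq.gz file.

    Assumes four-line records and Phred+33 quality encoding.
    """
    reads = bases = quality_sum = 0
    with gzip.open(path, "rt") as handle:
        while reads < max_reads:
            header = handle.readline()
            if not header:
                break  # end of file
            sequence = handle.readline().rstrip("\r\n")
            handle.readline()  # '+' separator line
            quality = handle.readline().rstrip("\r\n")
            if not header.startswith("@") or len(sequence) != len(quality):
                raise ValueError(f"malformed record near read {reads + 1}")
            reads += 1
            bases += len(sequence)
            # Phred+33: quality score = ASCII code minus 33
            quality_sum += sum(ord(char) - 33 for char in quality)
    if bases == 0:
        return reads, 0.0, 0.0
    return reads, bases / reads, quality_sum / bases
```

A `ValueError` here usually points to a truncated or corrupted file, which is worth fixing before running a workflow.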

Workflow using Kallisto Alignment Program

1) Choose the collection name which was created in Step 3 from the History

2) Click on “Run workflow”

Steps of execution for RNA-seq workflow.

Workflow using HiSAT2 Alignment Program

HiSAT2 workflows require a reference annotation file. Choose the reference annotation after loading it to the History (see Appendix B).

6) Download your gene count table, add class labels and upload to NetworkAnalyst

Downloading Results Count Table from Galaxy

Click on the output link to view more options.

Click on the save icon to start downloading the file.

Adding Class Labels to the Results File

The Galaxy results output file will be similar to this table.

For NetworkAnalyst.ca, please provide the #CLASS annotation before uploading the dataset.
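
The class labels can be added in a spreadsheet editor, or with a short script when there are many samples. The sketch below copies the count table and writes a #CLASS row directly under the header. It rests on assumptions not stated in this tutorial: the table is tab-delimited, its first row is a header whose first column holds the gene identifier label and whose remaining columns are sample names, and the #CLASS row belongs on the second line. The function name `add_class_row` and the sample and class names are hypothetical.

```python
import csv


def add_class_row(in_path, out_path, sample_classes):
    """Copy a tab-delimited count table, inserting a #CLASS row under the header.

    sample_classes maps each sample name in the header to its class label.
    """
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        header = next(reader)
        samples = header[1:]  # assumption: first column holds gene identifiers
        missing = [name for name in samples if name not in sample_classes]
        if missing:
            raise KeyError(f"no class label for: {', '.join(missing)}")
        writer.writerow(header)
        writer.writerow(["#CLASS"] + [sample_classes[name] for name in samples])
        writer.writerows(reader)  # remaining rows are copied unchanged
```

For example, `add_class_row("counts.txt", "counts_labeled.txt", {"s1": "control", "s2": "treated"})` would label two samples. Compare the result against the table format expected by NetworkAnalyst.ca before uploading.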

Appendix A: Uploading data using FTP

• FTP (File Transfer Protocol) allows you to upload files from your computer to your NetworkAnalyst Galaxy account.
• This is the recommended option for uploading several large fastq.gz files to a Galaxy server.
• In order to use FTP, you will need an FTP client, which is a desktop app that connects your computer to your NetworkAnalyst Galaxy hosting account.
• Any FTP client can be used; some popular options are:
  • FileZilla
    • Supported operating systems: Windows, Linux and Mac
    • https://filezilla-project.org/
  • WinSCP (Windows), Cyberduck (Windows & Mac)

Using FileZilla

Fill in the details for the FTP connection. Host: galaxy.networkanalyst.ca. The username and password are the same values used for your account on the NetworkAnalyst Galaxy server.

Drag and drop the gzip-compressed fastq.gz files.

NOTE: after sample files are uploaded, you can log in to galaxy.networkanalyst.ca and retrieve data from FTP to your Galaxy account.
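
If you prefer a script to a graphical client, the same transfer can be sketched with Python's standard `ftplib` module. This is an illustration under stated assumptions, not a documented NetworkAnalyst procedure: it assumes the server accepts a plain FTP login with your Galaxy credentials (it may instead require FTPS or other settings), and the function names `upload_fastq_files` and `upload_to_galaxy` are hypothetical.

```python
from ftplib import FTP
from pathlib import Path

GALAXY_FTP_HOST = "galaxy.networkanalyst.ca"  # host named in this tutorial


def upload_fastq_files(ftp, paths):
    """Upload each .fastq.gz file in binary mode over an open FTP connection.

    `ftp` is any object with ftplib.FTP's storbinary method. Returns the
    names of the uploaded files.
    """
    uploaded = []
    for path in map(Path, paths):
        if not path.name.endswith(".fastq.gz"):
            continue  # skip anything that is not a compressed FASTQ file
        with open(path, "rb") as handle:
            ftp.storbinary(f"STOR {path.name}", handle)
        uploaded.append(path.name)
    return uploaded


def upload_to_galaxy(paths, user, password, host=GALAXY_FTP_HOST):
    """Log in with the Galaxy account credentials and upload the files."""
    with FTP(host) as ftp:  # assumption: plain FTP; the server may require FTPS
        ftp.login(user=user, passwd=password)
        return upload_fastq_files(ftp, paths)
```

A call such as `upload_to_galaxy(["sample_1.fastq.gz", "sample_2.fastq.gz"], user, password)` would send both files, after which they can be retrieved from FTP inside Galaxy as described in the note above.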

Loading FTP Files from Galaxy

Click on: Get Data > Upload File to start

Or, click on load your own data

List of uploaded FTP files

1) Click here to load all files

1) Specify the type of all files to be fastq.gz

2) Click Start

Data loading to the Galaxy history will start. You can click on Close.

Appendix B: Load Annotation Library for HiSAT2

Choose Data Libraries under Shared Data to view our uploaded genome annotation files

Click on this library to view the list of annotations

1) Choose the genome annotation for the species of interest

2) Click on “To History” and choose “as Datasets” to load the library to your Galaxy History

The library is loaded successfully to the History and highlighted in green

Choose the reference annotation file that was loaded to the History

The end

For more information, visit the FAQs, Tutorials, Resources, and Contact pages on www.networkanalyst.ca