Tutorial 6: NetworkAnalyst Galaxy Server

Intro to NetworkAnalyst

• Web application that enables complex meta-analysis and visualization
• Designed to be accessible to biologists rather than specialized bioinformaticians
• Integrates advanced statistical methods and innovative data visualization to support:
  • Efficient data comparisons
  • Biological interpretation
  • Hypothesis generation


Overview

• Galaxy is a web-based platform for integrating various computational tools and resources into a cohesive workspace for comparative genomics.
• NetworkAnalyst Galaxy server:
  • You can easily upload your own data.
  • 34 workflows (i.e., customized pipelines for processing raw RNA-seq data) are provided for 17 species and 2 different RNA-seq alignment programs.
  • Results and step-by-step analyses can be recorded (Data Libraries and Histories).

• Note: if this is your first time using Galaxy, please read the Galaxy 101 tutorial first: https://galaxyproject.org/tutorials/g101

Goals for this tutorial

• Introduce the steps to execute a workflow on the NetworkAnalyst Galaxy server
• Upload a group of fastq.gz files to the server and build collections
• Learn about the available alignment options and pipelines
• Run workflows and prepare output for NetworkAnalyst.ca

How does it work?

1) Account registration: you first need to register in order to upload files to our server.

2) Data upload: upload your RNA-seq fastq.gz files, either directly or using FTP.

3) Building collection: a very practical step when you have tens of RNA-seq samples or more to process.

4) Import workflow: first specify the alignment program, sequencing type, and organism.

5) Run workflow: after importing the workflow, you can start using it directly.

6) Download your gene count table, add class labels, and upload it to NetworkAnalyst.

1) Account Registration


• It is highly recommended to create an account on the NetworkAnalyst Galaxy server, although an account is not required for access.
• With an account, the data quota increases to 50 GB and full functionality across sessions becomes available, such as naming, saving, sharing, and publishing Galaxy objects (Histories, Workflows, Datasets, Pages).

Start by clicking here to Register.


Step-by-step instructions to start executing our workflows.

Tools panel: contains links to data upload, workflows, and data preparation and analysis tools.

History panel: shows the history of your analysis steps, lets you view data and results, and more.

galaxy.networkanalyst.ca Main Page

2) Data upload


• The NetworkAnalyst Galaxy server expects compressed raw FASTQ files (i.e., fastq.gz).
• Files can be uploaded directly from the interface, or an FTP server can be used (see Appendix A).

Click Get Data > Upload File.

Upload or paste a file. For large files (>1 GB), see Appendix A for the FTP upload option.

File format: it is recommended to select “fastq.gz”.

Species: can be left at the default selection.

Click here to choose a file from the local drive.

Once the files are chosen, click “Start” to load the file names and information.

The status bar shows the progress of loading the files into the Galaxy history workspace.

Once uploading is finished, the file names appear highlighted in green in the History.
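Before uploading, it can help to check which local files are usable fastq.gz files and which exceed the ~1 GB browser-upload guideline, so they can go through FTP instead. The Python sketch below is not part of NetworkAnalyst or Galaxy. The function names are hypothetical, and it assumes that "1 GB" and the 50 GB quota are binary gigabytes. It only confirms that each file is gzip-compressed and that its first line starts with "@" (the FASTQ record marker). It does not validate whole files; use FastQC for quality checks.

```python
import gzip
from pathlib import Path

# Thresholds from this tutorial, read as binary units (an assumption):
FTP_THRESHOLD_BYTES = 1 * 1024**3  # files above ~1 GB: use FTP (Appendix A)
QUOTA_BYTES = 50 * 1024**3         # data quota for a registered account


def looks_like_fastq_gz(path: Path) -> bool:
    """Cheap check: .fastq.gz name, readable gzip, first line starts with '@'."""
    if not path.name.endswith(".fastq.gz"):
        return False
    try:
        with gzip.open(path, "rt") as handle:
            first_line = handle.readline()
    except (OSError, EOFError, UnicodeDecodeError):
        # OSError covers gzip.BadGzipFile (not gzip data); EOFError a truncated file
        return False
    return first_line.startswith("@")


def plan_upload(folder: str) -> dict:
    """Sort the files in `folder` into browser uploads, FTP uploads and skipped files."""
    plan = {"browser": [], "ftp": [], "skipped": [], "total_bytes": 0}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        if not looks_like_fastq_gz(path):
            plan["skipped"].append(path.name)
            continue
        size = path.stat().st_size
        plan["total_bytes"] += size
        plan["ftp" if size > FTP_THRESHOLD_BYTES else "browser"].append(path.name)
    # Flag when the fastq.gz files together exceed the account quota
    plan["over_quota"] = plan["total_bytes"] > QUOTA_BYTES
    return plan
```

For example, `plan_upload("my_fastq_folder")` (a hypothetical folder name) returns the files to upload through the browser, the files to send by FTP, and the files that were skipped.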

3) Building collection


• Collections are:
  • Helpful for managing your data with less clicking
  • Convenient for processing many samples at once
  • A way to combine datasets and minimize clutter

• Different collection types are used to represent the type of RNA-seq experiment:
  • Collection type: List of pairs → paired-end RNA-seq experiments
  • Collection type: List → single-end RNA-seq experiments

• Workflows in the NetworkAnalyst Galaxy server are designed to accept collections as input.

Collection for single-end uploaded samples

1) Click on the checkmark for operations on multiple datasets.

2) Select the uploaded samples to build the collection, or click “All” to select all samples.

3) Click on “For all selected” and choose “Build Dataset List”.

1) Assign a name to the list.

2) Click on “Create list”.

The collection will then appear in the History.

Collection for paired-end uploaded samples

1) Click on the checkmark for operations on multiple datasets.

2) Select the uploaded samples to build the collection, or click “All” to select all samples.

3) Click on “For all selected” and choose “Build List of Dataset Pairs”.

1) Galaxy provides a flexible way to pair samples using a filter with a certain pattern. For example, “_1” refers to forward reads and “_2” to reverse reads.

2) Assign a name to the collection.

3) Finally, click on “Create list”. The collection will then appear in the History.

Once the collection is created, it appears in the History. You can now run single-end or paired-end workflows, depending on the type of collection created.
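To preview how files will be grouped before building a list of pairs, the pairing rule above can be sketched in Python. This is a hypothetical helper, not Galaxy's own pairing code. It assumes file names end in ".fastq.gz" and that the "_1"/"_2" marker sits immediately before the extension, so "s_10_1.fastq.gz" pairs with "s_10_2.fastq.gz". Galaxy's filter in the interface may match the pattern elsewhere in the name.

```python
from collections import defaultdict

EXTENSION = ".fastq.gz"


def pair_samples(filenames, forward="_1", reverse="_2"):
    """Group names into (sample, forward_file, reverse_file) tuples.

    Returns (pairs, unpaired); unpaired lists files with no matching mate
    or no forward/reverse marker.
    """
    groups = defaultdict(dict)
    unpaired = []
    for name in filenames:
        # Strip the extension so the marker is checked at the end of the stem
        stem = name[: -len(EXTENSION)] if name.endswith(EXTENSION) else name
        if stem.endswith(forward):
            groups[stem[: -len(forward)]]["forward"] = name
        elif stem.endswith(reverse):
            groups[stem[: -len(reverse)]]["reverse"] = name
        else:
            unpaired.append(name)
    pairs = []
    for sample in sorted(groups):
        reads = groups[sample]
        if "forward" in reads and "reverse" in reads:
            pairs.append((sample, reads["forward"], reads["reverse"]))
        else:
            unpaired.extend(reads.values())
    return pairs, unpaired
```

If `unpaired` is not empty, some samples are missing a mate or use a different naming pattern. Check the file names before building the list of pairs.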

4) Import workflow


• Workflows specify the steps of execution in a process.
• A workflow allows users to repeat an analysis on different datasets.
• In the NetworkAnalyst Galaxy server, 34 workflows are provided for 17 species and 2 different RNA-seq alignment programs.
• Different workflows are designed for single-end and paired-end experiments.

Choose the alignment program, sequencing type, and organism.

Click on “View workflow”.

1) Click here to import the workflow.

Click here and choose Run to start using the workflow.

Click here to view the workflow steps.

5) Run workflow


• In the NetworkAnalyst Galaxy server, workflows are designed for:
  • Paired-end RNA-seq experiments
  • Single-end RNA-seq experiments

• A choice of alignment program is provided:
  • Kallisto: a recently developed, alignment-free tool for processing RNA-seq reads. It is very fast, and its results have been shown to correlate highly with those of other existing tools.
  • HiSAT2: another well-maintained tool for RNA-seq alignment. HiSAT2 workflows require an annotation file; please see Appendix B for details.

• In general, the steps of our workflows are:
  • Trimming → Alignment → Data conversion/summarization

• To check the quality of the fastq files, the FastQC tool is available under “Sequence operations” in the Tools panel.
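The "Data conversion/summarization" step turns per-transcript results into the gene count table downloaded in step 6. These workflows may perform this conversion differently. As a hedged illustration only, the Python sketch below sums Kallisto's `est_counts` column per gene from an `abundance.tsv` file. It assumes you already have a transcript-to-gene mapping; the `tx2gene` dictionary and the function name are hypothetical inputs, not workflow outputs.

```python
import csv
import io
from collections import defaultdict


def summarize_to_genes(abundance_tsv: str, tx2gene: dict) -> dict:
    """Sum Kallisto est_counts per gene.

    abundance_tsv: text of a Kallisto abundance.tsv file
        (columns: target_id, length, eff_length, est_counts, tpm).
    tx2gene: hypothetical mapping from transcript ID to gene ID.
    Transcripts missing from the mapping are skipped.
    """
    gene_counts = defaultdict(float)
    reader = csv.DictReader(io.StringIO(abundance_tsv), delimiter="\t")
    for row in reader:
        gene = tx2gene.get(row["target_id"])
        if gene is not None:
            gene_counts[gene] += float(row["est_counts"])
    return dict(gene_counts)
```

Summed estimated counts are not integers in general. Downstream tools that expect whole counts may require rounding.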

1) Choose the collection created in Step 3 from the History.

2) Click on “Run workflow”

Workflow using Kallisto Alignment Program

Steps of execution for RNA-seq workflow.

HiSAT2 workflows require a reference annotation file. Choose the reference annotation after loading it into the History (see Appendix B).

Workflow using HiSAT2 Alignment Program

6) Download your gene count table, add class labels, and upload it to NetworkAnalyst

Downloading Results Count Table from Galaxy

Click on the save icon to start downloading the file.

Click on the output link to view more options.

The Galaxy results output file will be similar to this table.

For NetworkAnalyst.ca, please provide the #CLASS annotation before uploading the dataset.

Adding Class Labels to the Results File
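Class labels can be added in a spreadsheet editor, or the step can be scripted. The Python sketch below is a hypothetical helper that inserts a "#CLASS" row directly under the header of a tab-separated count table, with one label per sample column. It assumes the first column holds gene IDs and the remaining header cells are sample names. Check NetworkAnalyst's data format documentation for the exact row label it expects, for example whether a factor name must follow "#CLASS".

```python
def add_class_labels(table_tsv: str, labels: dict, class_row_label: str = "#CLASS") -> str:
    """Insert a class-label row below the header of a tab-separated count table.

    labels: hypothetical mapping from sample name (header cell) to class label.
    The header row itself is left unchanged.
    """
    lines = table_tsv.splitlines()
    header = lines[0].split("\t")
    samples = header[1:]  # first column assumed to hold gene IDs
    missing = [s for s in samples if s not in labels]
    if missing:
        raise ValueError("No class label for: " + ", ".join(missing))
    class_row = "\t".join([class_row_label] + [labels[s] for s in samples])
    return "\n".join([lines[0], class_row] + lines[1:]) + "\n"
```

Raising an error for unlabeled samples catches typos in sample names before the file is uploaded.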

Appendix A: Uploading data using FTP

• FTP (File Transfer Protocol) allows you to upload files from your computer to your NetworkAnalyst Galaxy account.
• This is the recommended option for uploading many or large fastq.gz files to a Galaxy server.
• To use FTP, you will need an FTP client, a desktop app that connects your computer to your NetworkAnalyst Galaxy hosting account.
• Any FTP client can be used; some popular options are:
  • FileZilla
    • Supported operating systems: Windows, Linux and Mac
    • https://filezilla-project.org/
  • WinSCP (Windows), Cyberduck (Windows & Mac)

Fill in the details for the FTP connection. Host: galaxy.networkanalyst.ca. For the username and password, use the same values as for your NetworkAnalyst Galaxy server account.

Drag and drop the gzip-compressed fastq.gz files.

NOTE: after the sample files are uploaded, you can log in to galaxy.networkanalyst.ca and import the data from FTP into your Galaxy account.

Using FileZilla
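As an alternative to a desktop client, Python's standard `ftplib` module can perform the same upload. This is a minimal sketch. It assumes the server accepts connections on the default FTP port with your Galaxy username and password, and that files go to your FTP root directory. Whether the server requires encrypted FTP (FTPS) is an assumption you should check, which is why the sketch includes a hypothetical `use_tls` switch. The function names are also hypothetical.

```python
import ftplib
from pathlib import Path

HOST = "galaxy.networkanalyst.ca"  # host name given in this tutorial


def fastq_files(folder: str) -> list:
    """Return the fastq.gz files in a local folder, sorted by name."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.name.endswith(".fastq.gz"))


def upload_fastq(folder: str, user: str, password: str,
                 host: str = HOST, use_tls: bool = False) -> list:
    """Upload every fastq.gz file in `folder` to the FTP root; return their names.

    use_tls=True switches to explicit FTPS via ftplib.FTP_TLS (an assumption
    about what the server supports).
    """
    ftp_class = ftplib.FTP_TLS if use_tls else ftplib.FTP
    uploaded = []
    with ftp_class(host) as ftp:  # the context manager closes the connection
        ftp.login(user, password)
        if use_tls:
            ftp.prot_p()  # also encrypt the data channel
        for path in fastq_files(folder):
            with open(path, "rb") as handle:
                ftp.storbinary(f"STOR {path.name}", handle)
            uploaded.append(path.name)
    return uploaded
```

For example, `upload_fastq("my_fastq_folder", "your_username", password)` (placeholder values) returns the names of the uploaded files. Avoid hard-coding passwords in scripts. The uploaded files can then be imported through Get Data > Upload File in the Galaxy interface.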

Click on: Get Data > Upload File to start.

Or, click on “load your own data”.

List of uploaded FTP files

Loading FTP Files from Galaxy

1) Click here to load all files.

2) Set the type of all files to fastq.gz.

3) Click Start.

Data loading into the Galaxy history will start. You can click Close.

Appendix B: Load Annotation Library for HiSAT2

Choose Data Libraries under Shared Data to view our uploaded genome annotation files.

Click on this library to view the list of annotations.

1) Choose the genome annotation for the species of interest.

2) Click on “To History” and choose “as Datasets” to load the library into your Galaxy History.

The library is loaded successfully into the History and highlighted in green.

Choose the reference annotation file that was loaded into the History.

The end

For more information, visit the FAQs, Tutorials, Resources, and Contact pages on www.networkanalyst.ca
