
Domain-oriented services and resources of Polish Infrastructure for Supporting Computational Science in the European Research Space – PLGrid Plus

EUROPEAN UNION

EUROPEAN REGIONAL DEVELOPMENT FUND

INNOVATIVE ECONOMY NATIONAL COHESION STRATEGY

Genomic Data Analysis Services Available for PL-Grid Users

Clinical Genomic Analysis (CGA) Workshop


ACC Cyfronet AGH and PL-Grid Infrastructure

Academic Computer Centre Cyfronet AGH

• Established in 1973 (40 years of experience)

• Main mission: to provide network, computational power and data storage capabilities for Polish science

• ~374 TFlops (No. 145 on the TOP500 list), 2.5 PB (disks) and 3.5 PB (tapes)

• Regular and bigmem nodes, vSMP, GPGPU, FPGA, MPI over InfiniBand

• Details: http://kdm.cyfronet.pl/

PL-Grid Infrastructure for Polish science

• Five computing centers with Cyfronet as the consortium leader

• Total: ~588 TFlops and ~5.6 PB (disks)

• Planned for 1Q2015: >900 TFlops, 8 PB

• Available free of charge to all Polish scientists and their foreign collaborators

• Details: http://www.plgrid.pl


Using PL-Grid Infrastructure

Register at https://portal.plgrid.pl

• User verification is based on the Polish OPI number

• Assistants and foreigners are confirmed by Polish PIs

A variety of basic and higher-level services is available after login

• Local SSH access, cloud computing, middleware

Considerable library of installed applications

• GATK, MACS, SAMtools, Picard, TopHat, Bowtie, (p)BWA, R/Bioconductor, AutoDock/AutoGrid, BLAST, Clustal, CPMD, Gromacs, NAMD, Matlab, Mathematica …

• Users are free to compile and install their own applications using the shell login

• Own commercial licenses can be used on HPC resources

Questions: https://helpdesk.plgrid.pl or [email protected]


PLGrid PLUS: Domain-oriented Services, Resources and Tools

Preparation of specific computing environments, i.e., solutions, services and extended infrastructure tailored to the needs of different groups of scientists (2012-2014)

Life Science is one of the 13 domains of science covered

LS Domain Leader: Kraków LifeScience Klaster

Tasks:

• Analysis of user needs

• Development of services

• Procurement and deployment of applications on HPC resources

• Continuous assistance for the Life Science community


DNA Microarray Integromics Analysis Platform (1/2)

https://lifescience.plgrid.pl/

For people who perform biological investigations using DNA microarrays

Goal: help analyze gene expression information and correlate it with other clinical data

In development since 1Q2013, first version deployed

Analyses available now: normalization, clustering, SAM, t-test, GO-based enrichment, ANNs, PCA, panel filtering

Integromics analyses in preparation:

• CCA, PLS (gene expression and lipidomics)

• Roleswitch, TargetScore (gene expression and miRNA)

Supported models: Affymetrix, Agilent (support for others can be added on demand)


DNA Microarray Integromics Analysis Platform (2/2)

Notable features

• Integration with EBI ArrayExpress (import, MIAME)

• Sharing experiments with others

• Importing own data for further analysis

• Supported languages: PL, EN

Manual: https://docs.cyfronet.pl/x/JpaZ

Cooperation

• Jagiellonian University Medical College, Kraków

• Medical University of Silesia, Katowice

• Institute of Oncology, Gliwice


Galaxy NGS Server (1/2)

https://galaxy.plgrid.pl/

“Galaxy is an open, web-based platform for data intensive biomedical research.”

Goal: deploy a high-performance, high-throughput NGS data analysis solution on top of HPC resources for PL-Grid users

Needs a lot of adjustments and in-house add-on development

Work started 12.2013, first version planned ~08.2014

Planned integrated tools (list not closed): GATK, SAMtools, Bowtie, TopHat, BWA, bedtools, Cufflinks, Picard, SnpEff/SnpSift, Flexbar, FastQC, MACS

References: human, mouse, domestic animals

Targeted platforms: Illumina *Seq, Roche 454, Ion Proton


Galaxy NGS Server (2/2)

Notable features

• Full integration with the Zeus cluster and large disk arrays

• PBS and MQ system for effective job queuing and management

• Secured environment (open to all PL-Grid users, not “public”)

• All major Galaxy features (history, sharing, viewers) enabled

Well-documented workflows designed by NGS experts

• Basics (alignment and quality control, trimming, filtering)

• DNA-Seq, RNA-Seq, variant calling, SNP calling, methylation, exome analysis with annotations

Manual: https://docs.cyfronet.pl/x/voas (available when the service goes into production)

Cooperation

• Institute of Pharmacology, Polish Academy of Sciences, Kraków

• Jagiellonian University Medical College, Kraków

• National Research Institute of Animal Production, Kraków-Balice
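
The slide mentions PBS as the job queuing layer behind the Galaxy server. As a rough illustration only, the Python sketch below builds the text of a PBS batch script for a single alignment step. The function name `make_pbs_script`, the job name, the file names and the resource values are all hypothetical, and the `nodes=1:ppn=N` resource syntax is an assumption that varies between PBS variants; this is not a description of how the PL-Grid Galaxy deployment actually submits its jobs.

```python
def make_pbs_script(job_name, command, cores=4, walltime="02:00:00", queue=None):
    """Return the text of a PBS batch script that runs `command`.

    All parameter values are illustrative; real limits and queue names
    depend on the cluster configuration.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",           # job name shown in the queue
        f"#PBS -l nodes=1:ppn={cores}",  # one node, `cores` processes per node
        f"#PBS -l walltime={walltime}",  # maximum run time (HH:MM:SS)
    ]
    if queue:
        lines.append(f"#PBS -q {queue}")  # optional target queue
    # Run from the directory the job was submitted from.
    lines += ["", 'cd "$PBS_O_WORKDIR"', command, ""]
    return "\n".join(lines)


# Hypothetical example: align reads with BWA on 4 threads.
script = make_pbs_script(
    "bwa_align",
    "bwa mem -t 4 reference.fa reads.fastq > aligned.sam",
    cores=4,
)
print(script)
```

On a cluster with a PBS-compatible scheduler, the printed text could be saved to a file and handed to `qsub`; within Galaxy itself, job submission is handled by the server rather than by the user.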


Agilent GeneSpring GX

RDP: genespring.plgrid.pl

Used with Windows Remote Desktop

Integrated with the DNA Microarray Integromics Platform for uniform management of microarray files

5-year, single-seat license for all registered Polish scientists

Manual: https://docs.cyfronet.pl/x/JIq1


PLGData – Simple Service to Manage Files on Clusters

https://data.plgrid.pl/

Simple file and folder management: upload, delete, download, rename, change access rights

Integrated with Cyfronet’s Zeus cluster, accessible to all users

Uses GridFTP and HTTPS protocols for secure data transfer

Access to group storage for team/project collaboration

Manual: https://docs.cyfronet.pl/x/64es


Links, Contact, Partners

These resources, services and tools (and much more) are available after registering with PL-Grid: https://portal.plgrid.pl/

PL-Grid User Manual

• https://docs.plgrid.pl/podrecznik_uzytkownika (PL)

• https://docs.plgrid.pl/display/PLGDoc/User+manual (EN)

Questions, problems, requests about PL-Grid: https://helpdesk.plgrid.pl or [email protected]

Contact for LifeScience domain services: [email protected]

Collaborative effort

• Academic Computer Centre Cyfronet AGH, Kraków (project leader)

• Kraków LifeScience Klaster (Life Science domain services leader)

• 10 expert institutes/laboratories from Małopolska and Śląsk