Scaling Galaxy on GCP
Lynn Langit
Cloud and Data Architect
Google Developer Cloud Expert, AWS Community Hero, Microsoft Data Platform MVP
Agenda
• Scaling Up
  • Virtual Machines
  • Hello Galaxy
  • Adding Tools to Galaxy
  • Genomic Data on GCP
• Scaling Out
  • Docker Containers
  • Google Persistent Disks
  • Pipelines
  • Google Genomics APIs
  • BigQuery
Galaxy on Google Cloud Platform
Google Cloud in Australia
Data center here in 2017
Galaxy on GCP – Scale Up
Google Cloud Platform
Demo 1 - Hello Galaxy on Google Cloud
Demo 2 - Adding Galaxy Tools
GCP Virtual Machine Services
• Cloud Storage (file) buckets
  • Source data
• Compute Engine Virtual Machines
  • Virtual Machine image files
  • External VM persistent hard disks with your source data
Key Concepts:
-- VM configuration as code
-- Fast, cheap, scalable VMs
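One way to read "VM configuration as code" is to keep the VM definition in a small script instead of clicking through the console. The sketch below only builds the argument list for `gcloud compute instances create`; it does not run it. The VM name, zone, machine type, disk size, and image family are placeholder assumptions, not values from the demo.

```python
import shlex


def gcloud_create_vm(name, zone, machine_type, boot_disk_gb,
                     image_family="ubuntu-2204-lts",
                     image_project="ubuntu-os-cloud"):
    """Build (but do not run) a `gcloud compute instances create` command."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        f"--boot-disk-size={boot_disk_gb}GB",
        f"--image-family={image_family}",
        f"--image-project={image_project}",
    ]


# Hypothetical values: a Galaxy VM in a Sydney zone with a 100 GB boot disk.
cmd = gcloud_create_vm("galaxy-vm", "australia-southeast1-a", "n1-standard-4", 100)
print(shlex.join(cmd))
```

Keeping the command in a function means the same definition can be reviewed, versioned, and reused for every VM that is created.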
Scale Up Patterns
• Resize Virtual Machines
• Attach more persistent disks
• Update base image
• Monitor with Stackdriver
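The first two scale-up patterns can be sketched as command sequences. This is a minimal sketch, assuming the `gcloud` CLI is used: a VM has to be stopped before its machine type is changed, and an existing persistent disk is attached by name. The VM, zone, machine type, and disk names are hypothetical.

```python
def resize_vm_commands(name, zone, new_machine_type):
    """Commands to resize a VM: stop it, change the machine type, restart it."""
    base = ["gcloud", "compute", "instances"]
    return [
        base + ["stop", name, f"--zone={zone}"],
        base + ["set-machine-type", name, f"--zone={zone}",
                f"--machine-type={new_machine_type}"],
        base + ["start", name, f"--zone={zone}"],
    ]


def attach_disk_command(name, zone, disk):
    """Command to attach an existing persistent disk to a VM."""
    return ["gcloud", "compute", "instances", "attach-disk", name,
            f"--disk={disk}", f"--zone={zone}"]


# Hypothetical names, for illustration only.
for command in resize_vm_commands("galaxy-vm", "australia-southeast1-a", "n1-highmem-8"):
    print(" ".join(command))
print(" ".join(attach_disk_command("galaxy-vm", "australia-southeast1-a", "galaxy-data")))
```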
Genomic Data
• Files at GCS
  • gs://genomics-public-data
• Query via BigQuery
  • https://bigquery.cloud.google.com/queries/genomics-public-data
• Code via Genomics API
  • Implements Global Alliance for Genomics and Health APIs
• Genome browser - https://gabrowse.appspot.com
• Google Genomics example code on GitHub
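Files in the public bucket are addressed by `gs://` paths. As a small illustration, the sketch below splits such a path into bucket and object name and builds the matching `https://storage.googleapis.com/...` URL for a publicly readable object. The object path used here is a made-up placeholder, not a real file in the bucket.

```python
from urllib.parse import quote


def split_gs_url(url):
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    prefix = "gs://"
    if not url.startswith(prefix):
        raise ValueError(f"not a gs:// URL: {url!r}")
    bucket, _, object_name = url[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {url!r}")
    return bucket, object_name


def public_https_url(url):
    """HTTPS form of a gs:// path (only works for publicly readable objects)."""
    bucket, object_name = split_gs_url(url)
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name)}"


# 'path/to/file.vcf' is a hypothetical object name.
print(split_gs_url("gs://genomics-public-data/path/to/file.vcf"))
print(public_https_url("gs://genomics-public-data/path/to/file.vcf"))
```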
Galaxy on GCP – Scale Out
GCP Docker Container Services
• Cloud Storage
• Container Engine / Docker
Key Concepts:
-- Container configuration as code
-- Fast, cheap, scalable Docker Containers
Scale Out Patterns
• Docker Container Cluster
• Kubernetes manager
• Container orchestration
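A container cluster managed by Kubernetes is typically described by a manifest. The sketch below builds a minimal Deployment manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts. The image name `bgruening/galaxy-stable`, the port, the labels, and the replica count are assumptions for illustration; the slides do not say which image the demo used.

```python
import json


def galaxy_deployment(replicas, image="bgruening/galaxy-stable"):
    """Minimal Kubernetes Deployment manifest for a Galaxy container (a sketch)."""
    labels = {"app": "galaxy"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "galaxy"},
        "spec": {
            "replicas": replicas,  # scale out by raising this number
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "galaxy",
                        "image": image,  # assumed image, not from the slides
                        "ports": [{"containerPort": 80}],
                    }],
                },
            },
        },
    }


manifest = galaxy_deployment(replicas=3)
print(json.dumps(manifest, indent=2))
```

Scaling out then becomes a one-line change to `replicas`, which the orchestrator reconciles against the running cluster.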
GCP Serverless Services
• Cloud Functions
• Microservices
Key Concepts:
-- Function configuration as code
-- Fast, cheap, scalable Microservices
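As a rough illustration of a serverless microservice, the sketch below shows a function in the shape of a Cloud Storage-triggered background function, assuming a Python runtime where the handler receives an event dictionary with `bucket` and `name` keys. The function name, the file-extension rule, and the returned fields are hypothetical.

```python
def on_upload(event, context=None):
    """Handle a 'new object' event and decide whether to queue it for ingest."""
    path = f"gs://{event['bucket']}/{event['name']}"
    # Assumption: only sequence files are of interest to the pipeline.
    is_sequence_file = event["name"].lower().endswith((".fastq", ".fastq.gz", ".bam"))
    return {"path": path, "queue_for_ingest": is_sequence_file}


# Local usage with a hypothetical event payload.
print(on_upload({"bucket": "my-sequencer-bucket", "name": "run42/sample1.fastq.gz"}))
```

Because the handler is a plain function, it can be exercised locally with a dictionary before it is deployed.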
Galaxy on GCP – Advanced Pipelines
Demo 3 – Using the Google Genomics API & BigQuery
BigQuery
• ANSI SQL Queries
• Query-as-a-service
Key Concepts:
-- SQL query configuration as code
-- Fast, cheap, scalable SQL Queries
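"SQL query configuration as code" can be as simple as keeping queries in versioned functions. The sketch below only builds a standard-SQL string; it does not contact BigQuery. The table name `genomics-public-data.1000_genomes.variants` and the `reference_name` column are assumptions about the public dataset and should be checked in the BigQuery console before use.

```python
def variants_per_chromosome_sql(
        table="genomics-public-data.1000_genomes.variants", limit=25):
    """Build a query that counts variant rows per reference (chromosome)."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    # Backticks are needed because the project name contains hyphens.
    return (
        "SELECT reference_name, COUNT(*) AS variant_count\n"
        f"FROM `{table}`\n"
        "GROUP BY reference_name\n"
        "ORDER BY variant_count DESC\n"
        f"LIMIT {limit}"
    )


print(variants_per_chromosome_sql(limit=5))
```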
Variant Analysis
[Architecture diagram]
• Actors: Scientist; High Throughput Genome Sequencers
• Private and Public Datasets (each in Cloud Storage): MSSNG Autism, 1000 Genomes, Patient Data, Illumina Platform, Ref Genomes, TCGA
• Data Ingest: Genomics (BAM, FASTQ)
• Analytics: Online Analytics (BigQuery), Batch Analytics (Cloud Dataflow), Lab Notebooks (Cloud Datalab)
Genomics, Secondary Analysis
[Architecture diagram]
• Actors: Scientist; High Throughput Genome Sequencers
• Connectivity: Carrier Interconnect, Cloud Network, Cloud Load Balancing
• Ingest: Ingest Server (Compute Engine)
• Elastic Cluster: HPC Cluster (Compute Engine, 10 nodes)
• Storage: Raw Data files (Cloud Storage), Processed Data (Cloud Storage), Metadata (Cloud SQL)
• Analytics: Online Analytics (BigQuery), Lab notebooks (Cloud Datalab)
Advanced GCP Pipelines Core Products
• Cloud Storage / Public datasets on GCP
• BigQuery
• Cloud Dataflow
• Genomics API
Key Concepts:
-- Pipeline configuration as code
-- Fast, cheap, scalable cloud services
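To make "pipeline configuration as code" concrete, the sketch below declares a pipeline as plain data (each stage mapped to the stages it depends on) and derives a valid run order with the standard library. The stage names are hypothetical and loosely follow the ingest, storage, and analytics flow of the diagrams; this is not the Dataflow or Genomics pipeline API.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical stages: each key lists the stages that must finish first.
PIPELINE = {
    "ingest_fastq": set(),
    "align_reads": {"ingest_fastq"},
    "call_variants": {"align_reads"},
    "load_to_bigquery": {"call_variants"},
}


def run_order(pipeline):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(pipeline).static_order())


print(run_order(PIPELINE))
```

Because the pipeline is data, adding a stage is a one-line change that can be reviewed like any other code.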
Resources
• Cloud Storage (files) -- here
• Compute Engine (VMs) -- here
• Container Engine (Docker) -- here
• BigQuery (SQL) -- here
• Cloud Dataflow (pipelines) -- here
• Genomics API -- here
• Genomics Cookbook -- here
• Public datasets on GCP -- here
• Google's Genomic code samples -- here
• Lynn's GitHub code samples -- here
More about Google Cloud Services
Google Cloud Platform Services Part One
• Compute: Compute Engine, App Engine, Container Engine, Container Registry, Cloud Functions
• Networking: Cloud Virtual Network, Cloud Load Balancing, Cloud CDN, Cloud Interconnect, Cloud DNS
• Big Data: BigQuery, Cloud Dataflow, Cloud Dataproc, Cloud Datalab, Cloud Pub/Sub, Genomics
• Identity & Security: Cloud IAM, Cloud Resource Manager, Cloud Security Scanner, Cloud Platform Security
• Storage and Databases: Cloud Storage, Cloud Bigtable, Cloud Datastore, Cloud SQL, Persistent Disk
• Machine Learning: Cloud Machine Learning, Vision API, Speech API, Natural Language API, Translation API, Jobs API
Google Cloud Platform Services Part Two
• Management Tools: Stackdriver, Monitoring, Logging, Error Reporting, Trace, Debugger, Deployment Manager, Cloud Endpoints, Cloud Console, Cloud Shell, Cloud Mobile App, Billing App, Cloud APIs
• Developer Tools: Cloud SDK, Deployment Manager, Cloud Source Repositories, Cloud Tools for Android Studio, Cloud Tools for IntelliJ, Cloud Tools for PowerShell, Cloud Tools for Visual Studio, Google Plug-in for Eclipse, Cloud Test Lab
GCE Persistence Options – Disks, etc.
Option | Created From | Notes
Image | GCS file or disk | File path <bucket>/<folder>/<file>; disk must be detached from VM
Snapshot | Disk or instance (boot) | Can create an instance from a snapshot
Persistent Disk | Image, snapshot, or blank | Blank disk must be formatted; can create an instance or snapshot from a disk
Bucket | GCS console for file | Access via path gs://<bucketName>/<fileName>
VM Instance Boot Disk | Image, snapshot, or disk | Image: OS, application, or custom image; snapshot: N/A; disk: from saved disk
VM Instance Additional Disk | Local scratch, standard persistent, or SSD persistent | Local scratch: max 8 at 375 GB each; 500 GB; 64 TB; read/write or read-only; attach up to 16 disks* per VM
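The local scratch limit in the last row works out to 8 × 375 GB = 3,000 GB per VM. The sketch below turns the two limits quoted on the slide into a small planning check. Treating the "16 disks" limit as applying to persistent disks only is an assumption, and the slide's asterisk suggests there are exceptions, so current limits should be confirmed in the Compute Engine documentation.

```python
# Limits as quoted on the slide (they may have changed since).
LOCAL_SCRATCH_MAX_DISKS = 8
LOCAL_SCRATCH_DISK_GB = 375
MAX_ATTACHED_PERSISTENT_DISKS = 16  # assumption: applies to persistent disks only


def local_scratch_capacity_gb(disk_count):
    """Total local scratch space for a given number of local scratch disks."""
    return disk_count * LOCAL_SCRATCH_DISK_GB


def check_disk_plan(local_scratch, persistent):
    """Return a list of problems with a planned disk layout (empty if fine)."""
    problems = []
    if local_scratch > LOCAL_SCRATCH_MAX_DISKS:
        problems.append(f"at most {LOCAL_SCRATCH_MAX_DISKS} local scratch disks")
    if persistent > MAX_ATTACHED_PERSISTENT_DISKS:
        problems.append(f"at most {MAX_ATTACHED_PERSISTENT_DISKS} persistent disks")
    return problems


print(local_scratch_capacity_gb(LOCAL_SCRATCH_MAX_DISKS))  # 3000
print(check_disk_plan(local_scratch=9, persistent=4))
```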