Using multi-core algorithms to speed up optimization
Gary K. Chen
Biostat Noon Seminar
March 23, 2011

Multi-core programming talk for weekly biostat seminar

An outline

Introduction to high-performance computing

Concepts

Example 1: Hidden Markov Model Training

Example 2: Regularized Logistic Regression

Closing remarks

Introduction to high-performance computing
CPUs are not getting any faster

- Heat and power are the sole obstacles
- According to Intel: underclock a single core by 20 percent and you save half the power while sacrificing only 13 percent of the performance
- Implication? Two cores at the same power deliver roughly 74% more performance: $(100 - 13) \times 2 / 100 = 1.74$

1. High-performance computing clusters

- Coarse-grained, aka "embarrassingly parallel", problems
  1. Launch multiple instances of the program
  2. Compute summary statistics across the log files
- Examples
  - Monte Carlo simulations (power/specificity), GWAS scans, imputation, etc.
- Remarks
  - Pros: maximizes throughput (CPUs are kept busy); gentle learning curve
  - Cons: doesn't address some interesting computational problems

Cluster Resource Example

- HPCC at USC
  - 94-teraflop cluster
  - 1,980 simultaneous processes running on the main queue
  - Jobs are asynchronous; they can start and end in any order
- Portable Batch System (PBS)
  - Simply prepend some headers to your shell script (e.g. "#PBS -l mem=2gb,walltime=24:00:00") describing how much memory you want, how long your job will run, etc.

2. High-performance computing clusters

- Tightly-coupled parallel programs
- Message Passing Interface (MPI); a minimal sketch follows this list
  1. Programs are distributed across multiple physical hosts
  2. Each program executes the exact same code
  3. All processes can be synchronized at strategic points
- Remarks
  - Pro: can run interesting algorithms like parallel-tempered MCMC
  - Con: the developer is responsible for establishing a communication protocol
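
A minimal MPI sketch of this pattern, assuming C with a standard MPI installation (the partial-sum computation and all names are illustrative, not from the talk): every process runs the same binary, ranks diverge by branching, and the reduction at the end is the synchronization point.

    /* mpi_sum.c -- compile: mpicc mpi_sum.c -o mpi_sum; run: mpirun -np 4 ./mpi_sum */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);                 /* the same code starts on every host */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* each process learns its identity */
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* each process computes its own partial result */
        double partial = (double)(rank + 1);

        /* strategic synchronization point: combine partial results on rank 0 */
        double total = 0.0;
        MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum across %d processes = %g\n", size, total);
        MPI_Finalize();
        return 0;
    }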

Exploiting multiple-core processors

- Fine-grained parallelism
  - Implies a much higher degree of inter-dependence between processes
  - A "master" process executes the majority of the code base; "slave" processes are invoked to ease bottlenecks
  - We hope to minimize the time spent in the master process
- Some Bayesian algorithms stand to benefit

Amdahl’s Law

The maximum speedup from parallelizing a fraction $P$ of a program across $N$ processors:

$$\mathrm{Speedup}(N) = \frac{1}{(1 - P) + \frac{P}{N}}$$

For example, with $P = 0.95$ and $N = 8$ the speedup is $1/(0.05 + 0.95/8) \approx 5.9$, and no number of processors can push it past $1/(1 - P) = 20$.

Heterogeneous Computing

Multi-core programming

- aka data-parallel programming
- Built in to common compilers (e.g. gcc); very easy to get started!
  - SSE (Streaming SIMD Extensions): each core can do vector operations
  - OpenMP: parallel processing across multiple cores
    - e.g. simply insert an OpenMP directive such as "#pragma omp parallel for" and compile with gcc (see the sketch below)
- CUDA/OpenCL
  - CUDA is a proprietary C-based language endorsed by nVidia
  - OpenCL: a standards-based implementation backed by the Khronos Group
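
A minimal sketch of the OpenMP route, assuming gcc with -fopenmp (the loop body and names are illustrative, not from the talk); the single pragma is the only parallel-specific line.

    /* dot.c -- compile: gcc -fopenmp dot.c -o dot */
    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void) {
        static double x[N], y[N];
        double sum = 0.0;
        for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

        /* one directive parallelizes the loop across all available cores */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += x[i] * y[i];

        printf("dot = %g using up to %d threads\n", sum, omp_get_max_threads());
        return 0;
    }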

OpenCL and CUDA

- CUDA
  - Powerful libraries available to enrich productivity
  - Thrust: C++ generics; cuBLAS: Level 1 and 2 parallel BLAS
  - Supported only on nVidia GPU devices
- OpenCL
  - Compatible with nVidia and ATI GPU devices, as well as AMD/Intel CPUs
  - Lags behind CUDA in libraries and tools
  - Good to work with, given that ATI hardware currently leads in value

A $60 HPC under your desk

Concepts

Threads and threadblocks

- Threads:
  - Perform a very limited function, but do all the heavy lifting
  - Are extremely lightweight, so you'll want to launch thousands
- Threadblocks:
  - The developer assigns threads that can cooperate on a common task into threadblocks
  - Threadblocks cannot communicate with one another and run in any order (asynchronously)

Thread organization

Memory hierarchy

Kernels

- Warps/wavefronts:
  - An atomic set of threads (32 for nVidia, 64 for ATI)
  - Instructions are executed in lock step across the set, each thread processing a distinct data element
  - The developer is responsible for synchronizing across warps
- Kernels:
  - The code the developer writes, which executes on a SIMD device
  - Essentially C functions (a minimal sketch follows)
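
A minimal CUDA kernel sketch (names illustrative, not from the talk): the __global__ qualifier is about all that separates it from an ordinary C function, and each thread handles one data element.

    // Each thread computes one element of c = a + b.
    __global__ void vec_add(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's global index
        if (i < n)                                      // guard: the grid may overshoot n
            c[i] = a[i] + b[i];
    }

    // Host-side launch: enough 256-thread blocks to cover n elements.
    // vec_add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);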

Example 1: Hidden Markov Model Training

Hidden Markov Models

- A staple in machine learning
- Many applications in statistical genetics, including imputation of untyped genotypes, local ancestry, and sequence alignment (e.g. protein family scoring)

Application to cancer tumor data

- Extending PennCNV
  - Tissues are assumed to be a mixture of tumor and normal cells
  - Tumors are assumed to be heterogeneous in copy number across cells, implying fractional copy-number states
  - PennCNV defines 6 hidden integer states for normal cells and does not infer allelic state
  - We can make more precise estimates of both copy number and allelic state in tumors with little sacrifice in performance
- Copy number: $z = (1 - \alpha)\, z_{\mathrm{normal}} + \alpha\, z_{\mathrm{tumor}}$
  - $z$ is fractional, whereas $z_{\mathrm{tumor}} = I(z \le 2)\lfloor z \rfloor + I(z > 2)\lceil z \rceil$

State Space

state  CNfrac  BACnormal  CNtumor  BACtumor
0      2       0          2        0
1      2       1          2        1
2      2       2          2        2
3      0       0          0        0
4      0       1          0        0
5      0       2          0        0
6      0.5     0          0        0
7      0.5     1          0        0
8      0.5     2          0        0
9      1       0          1        0
10     1       1          1        0
11     1       1          1        1
12     1       2          1        1
13     1.5     0          1        0
14     1.5     1          1        0
15     1.5     1          1        1
16     1.5     2          1        1
17     2.5     0          3        0
18     2.5     1          3        1
19     2.5     1          3        2
20     2.5     2          3        3
21     3       0          4        0
22     3       1          4        1
23     3       1          4        2
24     3       1          4        3
25     3       2          4        4
26     3.5     0          4        0
27     3.5     1          4        1
28     3.5     1          4        2
29     3.5     1          4        3
30     3.5     2          4        4

Training a Hidden Markov Model

- Objective: infer the probabilities of transitioning between any pair of states
- Apply the forward-backward and Baum-Welch algorithms
  - A special case of the Expectation-Maximization (or, more generally, MM) family of algorithms
  - Expectation step: forward-backward computes posterior probabilities based on the estimated parameters
  - Maximization step: Baum-Welch empirically estimates parameters by averaging across observations

Forward algorithm

- We compute the probability vector at observation t: $f_{0:t} = f_{0:t-1}\, T\, O_t$
- Each state (element of the m-state vector) can independently compute a sum-product
- Threadblocks map to states
- Threads calculate products in parallel, followed by a $\log_2(m)$ addition reduction (see the sketch below)
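
A hedged sketch of how one forward step could map onto the GPU, assuming the state count m fits in a single threadblock (the kernel and array names are illustrative, not the talk's actual code): block j owns destination state j, its threads form the m products in parallel, and a log2(m)-step tree reduction produces the sum.

    #define M 512  // number of hidden states (assumed <= threads per block)

    // One forward step: f_t[j] = O_t[j] * sum_i f_prev[i] * T[i][j]
    __global__ void forward_step(const float *f_prev, const float *T,
                                 const float *O_t, float *f_t) {
        __shared__ float prod[M];
        int j = blockIdx.x;                   // destination state owned by this block
        int i = threadIdx.x;                  // source state handled by this thread
        prod[i] = f_prev[i] * T[i * M + j];   // all m products computed in parallel
        __syncthreads();

        // log2(M)-step tree reduction to form the sum-product
        for (int stride = M / 2; stride > 0; stride >>= 1) {
            if (i < stride) prod[i] += prod[i + stride];
            __syncthreads();
        }
        if (i == 0) f_t[j] = O_t[j] * prod[0];
    }
    // launch: forward_step<<<M, M>>>(d_f_prev, d_T, d_O_t, d_f_t);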

Gridblock of threadblocks

Speedups

- We implement 8 kernels. Examples:
  - Re-scaling the transition matrix (for SNP spacing)
    - Serial: $O(2nm^2)$; Parallel: $O(n)$
  - Forward-backward
    - Serial: $O(2nm^2)$; Parallel: $O(n \log_2 m)$
  - Normalizing constant (Baum-Welch)
    - Serial: $O(nm)$; Parallel: $O(\log_2 n)$
  - MLE of the transition matrix (Baum-Welch)
    - Serial: $O(nm^2)$; Parallel: $O(n)$

Run time comparison

Table: 1 iteration of HMM training on Chr 1 (41,263 SNPs)

states  CPU     GPU     fold-speedup
128     9.5m    37s     15x
512     2h 35m  1m 44s  108x

Example 2: Regularized Logistic Regression

Regularized Regression

- Variable selection
  - For tractability, most GWAS analyses entail separate univariate tests of each variable (e.g. SNP, GxG, GxE)
  - However, it is preferable to model all variables simultaneously to tease apart correlated variables
  - This is problematic when p > n: parameters are unestimable, and matrix inversion becomes computationally intractable

Regularized Regression

- The LASSO method (Tibshirani, 1996)
  - Seeded a cottage industry of related methods, e.g. Group LASSO, Elastic Net, MCP, NEG, Overlap LASSO, Graph LASSO
  - Fundamentally solves the variable-selection problem by introducing an L1 norm to induce sparsity
- Limitation: these methods do not provide a mechanism for hypothesis testing (e.g. p-values)

Regularized Regression

- Bayesian methods
  - Posterior inferences on $\beta$
  - e.g. Bayesian LASSO, Bayesian Elastic Net
  - Highly computational; scaling up to genome-wide data is not obvious
  - MCMC is inherently serial, so the best option is to speed up the sampling chain
- Proposal: implement the key bottleneck on the GPU: fitting $\beta_{\mathrm{LASSO}}$ to the data

Optimization

- For binomial logistic regression:
  - $L(\beta) = \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]$
  - $p_i = \dfrac{e^{\mu + x_i^t \beta}}{1 + e^{\mu + x_i^t \beta}}$
  - $\nabla L(\beta) = \sum_{i=1}^{n} \left[ y_i - p_i(\beta) \right] x_i$
  - $-d^2 L(\beta) = \sum_{i=1}^{n} p_i(\beta) \left[ 1 - p_i(\beta) \right] x_i x_i^t$
- For *penalized* regression:
  - $f(\beta) = L(\beta) - \lambda \sum_{j=1}^{p} |\beta_j|$
- Find the global maximum by applying Newton-Raphson one variable at a time:
  - $\beta_j^{m+1} = \beta_j^m - \dfrac{\sum_{i=1}^{n} \left[ y_i - p_i(\beta^m) \right] x_{ij} - \lambda\, \mathrm{sgn}(\beta_j^m)}{\sum_{i=1}^{n} p_i(\beta^m) \left[ 1 - p_i(\beta^m) \right] x_{ij}^2}$

Overview of algorithm

- Newton-Raphson kernel (a sketch follows this list)
  - Each threadblock maps to a block of 512 subjects (threads) for 1 variable
  - Each thread calculates its subject's contribution to the gradient and hessian
  - Sum (reduction) across the 512 subjects
  - Sum (reduction) across subject blocks in a new kernel
- Compute the log-likelihood change for each variable (as above)
- Apply a max operator (a $\log_2$-step reduction) to select the variable with the greatest contribution to the likelihood
- Iterate until the likelihood increase is less than epsilon

Gridblock of threadblocks

Consideration of datatypes

- Need to compress genotypes
  - Why? Global memory is scarce, and bandwidth is expensive
  - A warp of 32 threads loads 32 words (containing 512 genotypes) into local memory (see the packing sketch below)
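
Those numbers work out to 2 bits per genotype (32 words x 32 bits = 1,024 bits = 512 genotypes), so a packing scheme along these lines would fit; a sketch with illustrative names, not the talk's actual code:

    typedef unsigned int word_t;  /* one 32-bit word holds 16 two-bit genotypes */

    /* host side: pack genotype codes 0/1/2 (3 = missing) into words */
    void pack_genotypes(const unsigned char *geno, int n, word_t *packed) {
        for (int i = 0; i < n; i++) {
            int w = i / 16, slot = i % 16;
            if (slot == 0) packed[w] = 0;                      /* clear word first */
            packed[w] |= (word_t)(geno[i] & 0x3) << (2 * slot);
        }
    }

    /* device side: recover one genotype from a word already loaded by the warp */
    __device__ inline int unpack_genotype(word_t w, int slot) {
        return (w >> (2 * slot)) & 0x3;
    }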

Distributed GPU implementation

- For really large dimensions, we can link up an arbitrary number of GPUs
  - MPI allows us to spread the work across a cluster
  - Developed on Epigraph: 2 Tesla C2050s
- Approach
  - The MPI master node delegates the heavy lifting to slaves across the network
  - The master node performs fast serial code, such as sampling from the full conditional likelihood of any penalty parameter (e.g. $\lambda$)
  - To minimize network traffic, slaves maintain up-to-date copies of the data structures

Evaluation on large dataset

- GWAS data
  - 6,806 African American subjects in a case-control study of prostate cancer
  - 1,047,986 SNPs typed
- Elapsed walltime for 1 LASSO iteration (a sweep across all variables)
  - 15 minutes for an optimized serial implementation across 2 slave CPUs
  - 5.8 seconds for the parallel implementation across 2 nVidia Tesla C2050 GPU devices
  - A 155x speedup

Closing remarks

Conclusion

- Multi-core programming is not a panacea
  - Insufficient parallelism leads to an inferior implementation
  - Graph algorithms *generally* do not map well to SIMD architectures
- Programming effort
  - Expect to spend at least 90% of your time debugging a black box
  - Is it worth it? Human time > computer time?
  - For generic problems (matrix multiplication, sorting), absolutely
  - OpenCL is a bit more verbose than CUDA, but is more portable

Potential Future Work

- Reconstructing Bayesian networks
  - Compute the joint probability for each possible topology
  - Code the graph as a sparse adjacency matrix
- Approximate Bayesian Computation
  - Sample $\theta$ from some assumed prior distribution
  - Generate a dataset conditional on $\theta$
  - Examine how close the fake data is to the real data

Tomorrow's clusters will require heterogeneous programming

Tianhe-1A

- World's fastest supercomputer
  - 4.7 petaflops (quadrillion floating-point operations/sec)
  - 14,336 Xeon CPUs; 7,168 Tesla M2050 GPUs
- According to nVidia
  - A CPU-only equivalent would need 50k CPUs and twice the floor space
  - CPU-only: 12 megawatts, compared to 4.04 megawatts
- $88 million to build; $20 million in annual energy costs

Thanks to

- Kai: ideas for CNV analysis
- Duncan, Wei: discussions on LASSO
- Tim, Zack: access to Epigraph
- Alex, James: lively HPC discussions/debates