Upload
vudien
View
213
Download
0
Embed Size (px)
Citation preview
Puzzles in massively parallel
genome biology
await GPU solutions
Oleksiy Karpenko
Bioinformatics Program
University of Illinois at Chicago
Super Computing 2011
Seattle WA
This presentation is funded by Pan-American Advanced Studies Institute of NSF
2
Human genome for computer
scientists
hellipACGTTGGATCGAGACATGACGATGhellip
bull 4 letter alphabet of DNA A C G T
bull ~3 GB letters long
3
Human genome for computer
scientists
hellipACGTTGGATCGAGACATGACGATGhellip
bull 4 letter alphabet of DNA A C G T
bull ~3 GB letters long
bull 2 are genes coding for proteins
bull 98 formerly known as ldquojunk DNArdquo
4
DNA is the same
but active genes are different
ne
httpwwwbioapsanlgovscihiimgheartjpg httpwwwsciencephotocomimage190555largeF0014577-The_femur-SPLjpg
bone genesheart genes
5
Human genome for computer
scientists
hellipACGTTGGATCGAGACATGACGATGhellip
bull 4 letter alphabet of DNA A C G T
bull ~3 GB letters long
bull 2 are genes coding for proteins
bull 98 formerly known as ldquojunk DNArdquo
bull the non-coding regions host important regulatory sites
6
1) Transcription factors
httpwwwnaturecomnrijournalv2n9imagesnri887-f2gif
7
1) Transcription factors
httpwwwnaturecomnrijournalv2n9imagesnri887-f2gif
8
2) Histone modifications
httpwwwnyasorgimageaxdid=a261ff6e-1e42-442c-8dc6-428a60574df4ampt=634329511969400000
9
2) Histone modifications
httpwwwnyasorgimageaxdid=a261ff6e-1e42-442c-8dc6-428a60574df4ampt=634329511969400000
10
3) DNA methylation
httpwwwscqubccawp-contentuploads200608methylation[1]-GIFgif
11
Some known regulatory rules
httpagarizonaeduresearchxwangmilestonesRice_Chipchipjpg
12
Genome site signals detected
individually
hellipACGTTGGATCGAGACATGACGATGhellip
Single transcription
factor binding sites
Single histone modifcation
sites
DNA methylation
sites
13
Different signals for different
tissues
hellipACGTTGGATCGAGACATGACGATGhellip
ne
14
Complexity
Consider a combination of
bull ~2600 transcription factors
bull ~210 unique cell types
bull dozens of histone modifications
15
Genome signals
16
Gene neighborhood matters
17
Look for common patterns
18
Partition the data
19
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
Process in parallel
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology
2
Human genome for computer
scientists
hellipACGTTGGATCGAGACATGACGATGhellip
bull 4 letter alphabet of DNA A C G T
bull ~3 GB letters long
3
Human genome for computer
scientists
hellipACGTTGGATCGAGACATGACGATGhellip
bull 4 letter alphabet of DNA A C G T
bull ~3 GB letters long
bull 2 are genes coding for proteins
bull 98 formerly known as ldquojunk DNArdquo
4
DNA is the same
but active genes are different
ne
httpwwwbioapsanlgovscihiimgheartjpg httpwwwsciencephotocomimage190555largeF0014577-The_femur-SPLjpg
bone genesheart genes
5
Human genome for computer
scientists
hellipACGTTGGATCGAGACATGACGATGhellip
bull 4 letter alphabet of DNA A C G T
bull ~3 GB letters long
bull 2 are genes coding for proteins
bull 98 formerly known as ldquojunk DNArdquo
bull the non-coding regions host important regulatory sites
6
1) Transcription factors
httpwwwnaturecomnrijournalv2n9imagesnri887-f2gif
7
1) Transcription factors
httpwwwnaturecomnrijournalv2n9imagesnri887-f2gif
8
2) Histone modifications
httpwwwnyasorgimageaxdid=a261ff6e-1e42-442c-8dc6-428a60574df4ampt=634329511969400000
9
2) Histone modifications
httpwwwnyasorgimageaxdid=a261ff6e-1e42-442c-8dc6-428a60574df4ampt=634329511969400000
10
3) DNA methylation
httpwwwscqubccawp-contentuploads200608methylation[1]-GIFgif
11
Some known regulatory rules
httpagarizonaeduresearchxwangmilestonesRice_Chipchipjpg
12
Genome site signals detected
individually
hellipACGTTGGATCGAGACATGACGATGhellip
Single transcription
factor binding sites
Single histone modifcation
sites
DNA methylation
sites
13
Different signals for different
tissues
hellipACGTTGGATCGAGACATGACGATGhellip
ne
14
Complexity
Consider a combination of
bull ~2600 transcription factors
bull ~210 unique cell types
bull dozens of histone modifications
15
Genome signals
16
Gene neighborhood matters
17
Look for common patterns
18
Partition the data
19
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
Process in parallel
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology
3
Human genome for computer
scientists
hellipACGTTGGATCGAGACATGACGATGhellip
bull 4 letter alphabet of DNA A C G T
bull ~3 GB letters long
bull 2 are genes coding for proteins
bull 98 formerly known as ldquojunk DNArdquo
4
DNA is the same
but active genes are different
ne
httpwwwbioapsanlgovscihiimgheartjpg httpwwwsciencephotocomimage190555largeF0014577-The_femur-SPLjpg
bone genesheart genes
5
Human genome for computer
scientists
hellipACGTTGGATCGAGACATGACGATGhellip
bull 4 letter alphabet of DNA A C G T
bull ~3 GB letters long
bull 2 are genes coding for proteins
bull 98 formerly known as ldquojunk DNArdquo
bull the non-coding regions host important regulatory sites
6
1) Transcription factors
httpwwwnaturecomnrijournalv2n9imagesnri887-f2gif
7
1) Transcription factors
httpwwwnaturecomnrijournalv2n9imagesnri887-f2gif
8
2) Histone modifications
httpwwwnyasorgimageaxdid=a261ff6e-1e42-442c-8dc6-428a60574df4ampt=634329511969400000
9
2) Histone modifications
httpwwwnyasorgimageaxdid=a261ff6e-1e42-442c-8dc6-428a60574df4ampt=634329511969400000
10
3) DNA methylation
httpwwwscqubccawp-contentuploads200608methylation[1]-GIFgif
11
Some known regulatory rules
httpagarizonaeduresearchxwangmilestonesRice_Chipchipjpg
12
Genome site signals detected
individually
hellipACGTTGGATCGAGACATGACGATGhellip
Single transcription
factor binding sites
Single histone modifcation
sites
DNA methylation
sites
13
Different signals for different
tissues
hellipACGTTGGATCGAGACATGACGATGhellip
ne
14
Complexity
Consider a combination of
bull ~2600 transcription factors
bull ~210 unique cell types
bull dozens of histone modifications
15
Genome signals
16
Gene neighborhood matters
17
Look for common patterns
18
Partition the data
19
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
Process in parallel
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology
4
DNA is the same
but active genes are different
ne
httpwwwbioapsanlgovscihiimgheartjpg httpwwwsciencephotocomimage190555largeF0014577-The_femur-SPLjpg
bone genesheart genes
5
Human genome for computer
scientists
hellipACGTTGGATCGAGACATGACGATGhellip
bull 4 letter alphabet of DNA A C G T
bull ~3 GB letters long
bull 2 are genes coding for proteins
bull 98 formerly known as ldquojunk DNArdquo
bull the non-coding regions host important regulatory sites
6
1) Transcription factors
httpwwwnaturecomnrijournalv2n9imagesnri887-f2gif
7
1) Transcription factors
httpwwwnaturecomnrijournalv2n9imagesnri887-f2gif
8
2) Histone modifications
httpwwwnyasorgimageaxdid=a261ff6e-1e42-442c-8dc6-428a60574df4ampt=634329511969400000
9
2) Histone modifications
httpwwwnyasorgimageaxdid=a261ff6e-1e42-442c-8dc6-428a60574df4ampt=634329511969400000
10
3) DNA methylation
httpwwwscqubccawp-contentuploads200608methylation[1]-GIFgif
11
Some known regulatory rules
httpagarizonaeduresearchxwangmilestonesRice_Chipchipjpg
12
Genome site signals detected
individually
hellipACGTTGGATCGAGACATGACGATGhellip
Single transcription
factor binding sites
Single histone modifcation
sites
DNA methylation
sites
13
Different signals for different
tissues
hellipACGTTGGATCGAGACATGACGATGhellip
ne
14
Complexity
Consider a combination of
bull ~2600 transcription factors
bull ~210 unique cell types
bull dozens of histone modifications
15
Genome signals
16
Gene neighborhood matters
17
Look for common patterns
18
Partition the data
19
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
Process in parallel
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology
5
Human genome for computer
scientists
hellipACGTTGGATCGAGACATGACGATGhellip
bull 4 letter alphabet of DNA A C G T
bull ~3 GB letters long
bull 2 are genes coding for proteins
bull 98 formerly known as ldquojunk DNArdquo
bull the non-coding regions host important regulatory sites
6
1) Transcription factors
httpwwwnaturecomnrijournalv2n9imagesnri887-f2gif
7
1) Transcription factors
httpwwwnaturecomnrijournalv2n9imagesnri887-f2gif
8
2) Histone modifications
httpwwwnyasorgimageaxdid=a261ff6e-1e42-442c-8dc6-428a60574df4ampt=634329511969400000
9
2) Histone modifications
httpwwwnyasorgimageaxdid=a261ff6e-1e42-442c-8dc6-428a60574df4ampt=634329511969400000
10
3) DNA methylation
httpwwwscqubccawp-contentuploads200608methylation[1]-GIFgif
11
Some known regulatory rules
httpagarizonaeduresearchxwangmilestonesRice_Chipchipjpg
12
Genome site signals detected
individually
hellipACGTTGGATCGAGACATGACGATGhellip
Single transcription
factor binding sites
Single histone modifcation
sites
DNA methylation
sites
13
Different signals for different
tissues
hellipACGTTGGATCGAGACATGACGATGhellip
ne
14
Complexity
Consider a combination of
bull ~2600 transcription factors
bull ~210 unique cell types
bull dozens of histone modifications
15
Genome signals
16
Gene neighborhood matters
17
Look for common patterns
18
Partition the data
19
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
Process in parallel
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology
6
1) Transcription factors
httpwwwnaturecomnrijournalv2n9imagesnri887-f2gif
7
1) Transcription factors
httpwwwnaturecomnrijournalv2n9imagesnri887-f2gif
8
2) Histone modifications
httpwwwnyasorgimageaxdid=a261ff6e-1e42-442c-8dc6-428a60574df4ampt=634329511969400000
9
2) Histone modifications
httpwwwnyasorgimageaxdid=a261ff6e-1e42-442c-8dc6-428a60574df4ampt=634329511969400000
10
3) DNA methylation
httpwwwscqubccawp-contentuploads200608methylation[1]-GIFgif
11
Some known regulatory rules
httpagarizonaeduresearchxwangmilestonesRice_Chipchipjpg
12
Genome site signals detected
individually
hellipACGTTGGATCGAGACATGACGATGhellip
Single transcription
factor binding sites
Single histone modifcation
sites
DNA methylation
sites
13
Different signals for different
tissues
hellipACGTTGGATCGAGACATGACGATGhellip
ne
14
Complexity
Consider a combination of
bull ~2600 transcription factors
bull ~210 unique cell types
bull dozens of histone modifications
15
Genome signals
16
Gene neighborhood matters
17
Look for common patterns
18
Partition the data
19
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
Process in parallel
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology
7
1) Transcription factors
httpwwwnaturecomnrijournalv2n9imagesnri887-f2gif
8
2) Histone modifications
httpwwwnyasorgimageaxdid=a261ff6e-1e42-442c-8dc6-428a60574df4ampt=634329511969400000
9
2) Histone modifications
httpwwwnyasorgimageaxdid=a261ff6e-1e42-442c-8dc6-428a60574df4ampt=634329511969400000
10
3) DNA methylation
httpwwwscqubccawp-contentuploads200608methylation[1]-GIFgif
11
Some known regulatory rules
httpagarizonaeduresearchxwangmilestonesRice_Chipchipjpg
12
Genome site signals detected
individually
hellipACGTTGGATCGAGACATGACGATGhellip
Single transcription
factor binding sites
Single histone modifcation
sites
DNA methylation
sites
13
Different signals for different
tissues
hellipACGTTGGATCGAGACATGACGATGhellip
ne
14
Complexity
Consider a combination of
bull ~2600 transcription factors
bull ~210 unique cell types
bull dozens of histone modifications
15
Genome signals
16
Gene neighborhood matters
17
Look for common patterns
18
Partition the data
19
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
Process in parallel
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology
8
2) Histone modifications
httpwwwnyasorgimageaxdid=a261ff6e-1e42-442c-8dc6-428a60574df4ampt=634329511969400000
9
2) Histone modifications
httpwwwnyasorgimageaxdid=a261ff6e-1e42-442c-8dc6-428a60574df4ampt=634329511969400000
10
3) DNA methylation
httpwwwscqubccawp-contentuploads200608methylation[1]-GIFgif
11
Some known regulatory rules
httpagarizonaeduresearchxwangmilestonesRice_Chipchipjpg
12
Genome site signals detected
individually
hellipACGTTGGATCGAGACATGACGATGhellip
Single transcription
factor binding sites
Single histone modifcation
sites
DNA methylation
sites
13
Different signals for different
tissues
hellipACGTTGGATCGAGACATGACGATGhellip
ne
14
Complexity
Consider a combination of
bull ~2600 transcription factors
bull ~210 unique cell types
bull dozens of histone modifications
15
Genome signals
16
Gene neighborhood matters
17
Look for common patterns
18
Partition the data
19
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
Process in parallel
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology
9
2) Histone modifications
httpwwwnyasorgimageaxdid=a261ff6e-1e42-442c-8dc6-428a60574df4ampt=634329511969400000
10
3) DNA methylation
httpwwwscqubccawp-contentuploads200608methylation[1]-GIFgif
11
Some known regulatory rules
httpagarizonaeduresearchxwangmilestonesRice_Chipchipjpg
12
Genome site signals detected
individually
hellipACGTTGGATCGAGACATGACGATGhellip
Single transcription
factor binding sites
Single histone modifcation
sites
DNA methylation
sites
13
Different signals for different
tissues
hellipACGTTGGATCGAGACATGACGATGhellip
ne
14
Complexity
Consider a combination of
bull ~2600 transcription factors
bull ~210 unique cell types
bull dozens of histone modifications
15
Genome signals
16
Gene neighborhood matters
17
Look for common patterns
18
Partition the data
19
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
Process in parallel
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology
10
3) DNA methylation
httpwwwscqubccawp-contentuploads200608methylation[1]-GIFgif
11
Some known regulatory rules
httpagarizonaeduresearchxwangmilestonesRice_Chipchipjpg
12
Genome site signals detected
individually
hellipACGTTGGATCGAGACATGACGATGhellip
Single transcription
factor binding sites
Single histone modifcation
sites
DNA methylation
sites
13
Different signals for different
tissues
hellipACGTTGGATCGAGACATGACGATGhellip
ne
14
Complexity
Consider a combination of
bull ~2600 transcription factors
bull ~210 unique cell types
bull dozens of histone modifications
15
Genome signals
16
Gene neighborhood matters
17
Look for common patterns
18
Partition the data
19
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
Process in parallel
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology
11
Some known regulatory rules
httpagarizonaeduresearchxwangmilestonesRice_Chipchipjpg
12
Genome site signals detected
individually
hellipACGTTGGATCGAGACATGACGATGhellip
Single transcription
factor binding sites
Single histone modifcation
sites
DNA methylation
sites
13
Different signals for different
tissues
hellipACGTTGGATCGAGACATGACGATGhellip
ne
14
Complexity
Consider a combination of
bull ~2600 transcription factors
bull ~210 unique cell types
bull dozens of histone modifications
15
Genome signals
16
Gene neighborhood matters
17
Look for common patterns
18
Partition the data
19
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
Process in parallel
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology
12
Genome site signals detected
individually
hellipACGTTGGATCGAGACATGACGATGhellip
Single transcription
factor binding sites
Single histone modifcation
sites
DNA methylation
sites
13
Different signals for different
tissues
hellipACGTTGGATCGAGACATGACGATGhellip
ne
14
Complexity
Consider a combination of
bull ~2600 transcription factors
bull ~210 unique cell types
bull dozens of histone modifications
15
Genome signals
16
Gene neighborhood matters
17
Look for common patterns
18
Partition the data
19
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
Process in parallel
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology
13
Different signals for different
tissues
hellipACGTTGGATCGAGACATGACGATGhellip
ne
14
Complexity
Consider a combination of
bull ~2600 transcription factors
bull ~210 unique cell types
bull dozens of histone modifications
15
Genome signals
16
Gene neighborhood matters
17
Look for common patterns
18
Partition the data
19
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
Process in parallel
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology
14
Complexity
Consider a combination of
bull ~2600 transcription factors
bull ~210 unique cell types
bull dozens of histone modifications
15
Genome signals
16
Gene neighborhood matters
17
Look for common patterns
18
Partition the data
19
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
Process in parallel
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology
15
Genome signals
16
Gene neighborhood matters
17
Look for common patterns
18
Partition the data
19
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
Process in parallel
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology
16
Gene neighborhood matters
17
Look for common patterns
18
Partition the data
19
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
Process in parallel
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology
17
Look for common patterns
18
Partition the data
19
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
Process in parallel
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology
18
Partition the data
19
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
Process in parallel
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology
19
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
GPU
core
Process in parallel
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology
20
Take home message
bull Biology as a science is becoming more
computationally oriented
bull Genome biology is massively parallel by
its nature
bull GPUs fit perfectly for solving problems in
genome biology