Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Data to Insight to Change
Chris JohnsonScientific Computing and Imaging Institute
University of Utah
Sources: Lesk, Berkeley SIMS, Landauer, EMC, TechCrunch, Smart Planet
0.1
1
10
100
1000
1999 2001 2003 2005 2007 2011
all disk storage
all digital info
new digital info/yr
all human documents in 40k Yrs
all spoken words in all lives
amount human minds can store in 1yr
Exab
ytes
(10
)
18
2737
2014
Every two days we create as much data as we did from the beginning of mankind until 2003!
Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it...Dan Ariely
Big Data
New Visual Analysis Techniques
341 Sections90nm thick sections
~32GB/Section~1000 tiles/section
4096x4096 pixels/tile2.18 nm/Pixel
16.5 TB after processing
Scientific Computing and Imaging Institute, University of Utah
Antony van Leeuwenhoek (1632-1723)
. . . my work, which I've done for a long time, was not pursued in order to gain the praise I now enjoy, but chiefly from a craving after knowledge, which I notice resides in me more than in most other men. And therewithal, whenever I found out anything remarkable, I have thought it my duty to put down my discovery on paper, so that all ingenious people might be informed thereof.
Antony van Leeuwenhoek. Letter of June 12, 1716
Scientific Computing and Imaging Institute, University of Utah
Connectome
PROBLEM-DRIVEN VISUALIZATION RESEARCH for biological data
- target specific biological problems- close collaboration with biologists- rapid, iterative prototyping- focus on genomic and molecular data
!"
!#
!$%!$&!$#!$'!()!($!((!(*
+$
+$#
,$
+$#,(
+(
+**
,*
+$*
+*)
,"
-
.
+&
+$)
/
0
+(
+$)
1$
!*)
+(#
)2)) $2))
)2)) $2))
!"# !"#"$3456/5. 5787597 0:.:57-;:4</=
%"&'()*+&"$ %"&,+-$,7/594<>??
$%
@A2)
B(2#
$&
@%2"
B*2'
$'(
@"2%
B*2*
$')
@"2#
B&2A
$'&
@A2"
BA2)
$'*
@A2(
BA2)
$+,
@%2'
B"2*
$+'
@%2*
B*2#
$++
@A2)
B#2$
$+-
@%2(
B#2$
@%2'
B&2)
@"2'
B&2A
@$2&
B(2"
@(2*
B*2'
@"2$
B"2%
@)2*
BA2)
@%2"
B$2'
@A2#
B(2A
@(2#
B*2)
@"2%
B*2"
@A2A
BA2)
@*2&
B$2&
@*2$
B$2(
@$2'
B#2$
.'
.+
.-
.%
./
.(
.)
.&
.*
.',
.''
.'+
.'-
.'%
0123456" 172884798:7;8<01:125="79>6;2586147?"68<5;@0123A1# <"2:5;78=":=5"A @0";5"@ ;>:="<10 8=":41#@
M. Meyer et al., EuroVis 2010. Pathlinesource: Humandestination: Lizard chr1
chr2
chr3
chr4
chr5
chr6
chr7
chr8chr9
chr10
chr11
chr12
chr13
chr14chr15
chr1
6ch
r17
chr1
8ch
r19
chr2
0
chr21
chr22
chrXchrY
chr3
chr1
chr2
chr3
chr4
chr5
chr6
chrachrb
chrcchrdchrfchrgchrh
saturationline
- +
10Mb
chr3
go to:
chr3 chr3
237164 146709664
386455 146850969
orientation:matchinversion
invert
out in
M. Meyer et al., InfoVis 2009.MizBee
!"#$%&'()*+$,-
(.+/
!"#$%&'(.+/
()*+$,-
!"#$%&'(.+/
(012$3
!"#$%&'(.+/
(456$3
.5789+-'(.+/
()*+$,-
!"##$%&'(
)#*%+,-.$/
&:& &:% 3:&&:;
&:<
3:&#=>=?@=(A?=>>BA,C;3="D!EFBA(.+/
[email protected]"=@!I?BA!"#$%&'()*+$,-
&:L,
3:%,3:%,
&:L,
0"%1'#$/GHH!=HG@IFJAH!FMN ?!=G@=(AH!FMN
5//
2+.O0+
85- +0+ P69 P7Q R7 9S 96S 6K1 T2 O--!"#$%&'(
()*+$,-,3U;
()*+$,-,;&,
()*+$,-,;&;
()*+$,-,3<V
()*+$,-,;&L
()*+$,-,3,C
()*+$,-,;,C
()*+$,-,3U,
85- +0+ P69 P7Q R7 9S 96S 6K1 T2 O--
3:&
&:&
&:;V &:,V&:<U
&:UU ;:U%
3:3&
.+7218A05/W+;:C
3:3
M. Meyer et al., InfoVis 2010.MulteeSum
Alignment
clear
class1class2class3class4class5class6class7class8
Site types
100 bp
region1
region2
region3
region4
region5
region1
region2
region3
region4
region5
Regions
InSite
!"#$%&'()*+%,-&./%,0&!*+1/*+"1'(2&*$/"0"1 %3$4
%3$44
%3$444
%3$45
%3$46
%3$5
%3$54
%3$544
%3$5444
%3$6
%3$64%3$644
%3$6444
%3$645
%3$646
%3$65
%3$654
%3$6544
%3$65444
%3$66
%3$664
%3$71
%3$45
%3$8
%3$9
%3$:
%3$;
%3$<
%3$=
%3$>%3$?%3$@
%3$8A
%3$88
%3$89
%3$8:
%3$8;
%3$8<%3$8=
%3$8>
%3$8?
%3$8@
%3$9A
%3$98
!/*#$/*+"1-+1&
B C
8AD.
%3$45
E"(*"'(
%3$45 %3$8?
=9?8; ==A?===
?=>@9 ==9<?=<
"$+&1*/*+"1'F/*%3+1G&$!+"1
(+1G&$*
!"#$%&'()*+%,-&./%,0&!*+1/*+"1'(2&*$/"0"1 %3$4
%3$44
%3$444
%3$45
%3$46
%3$5
%3$54
%3$544
%3$5444
%3$6
%3$64%3$644
%3$6444
%3$645
%3$646
%3$65
%3$654
%3$6544
%3$65444
%3$66
%3$664
%3$71
%3$45
%3$8
%3$9
%3$:
%3$;
%3$<
%3$=
%3$>%3$?%3$@
%3$8A
%3$88
%3$89
%3$8:
%3$8;
%3$8<%3$8=
%3$8>
%3$8?
%3$8@
%3$9A
%3$98
!/*#$/*+"1-+1&
B C
8AD.
%3$45
E"(*"'(
%3$45 %3$8?
=9?8; ==A?===
?=>@9 ==9<?=<
"$+&1*/*+"1'F/*%3+1G&$!+"1
(+1G&$*
!"#$%&'()*+%,-&./%,0&!*+1/*+"1'(2&*$/"0"1 %3$4
%3$44
%3$444
%3$45
%3$46
%3$5
%3$54
%3$544
%3$5444
%3$6
%3$64%3$644
%3$6444
%3$645
%3$646
%3$65
%3$654
%3$6544
%3$65444
%3$66
%3$664
%3$71
%3$45
%3$8
%3$9
%3$:
%3$;
%3$<
%3$=
%3$>%3$?%3$@
%3$8A
%3$88
%3$89
%3$8:
%3$8;
%3$8<%3$8=
%3$8>
%3$8?
%3$8@
%3$9A
%3$98
!/*#$/*+"1-+1&
B C
8AD.
%3$45
E"(*"'(
%3$45 %3$8
@AAA9= <:>8?A9
88?A=?= <=8?A=9
"$+&1*/*+"1'F/*%3+1G&$!+"1
(+1G&$*
"#* +1
Genome-wide synteny through highly sensitive sequence alignment: SatsumaM. Grabherr, et al. Bioinformatics (2010) 26 (9): 1145-1151.
Data Science Programs
• http://analytics.ncsu.edu/?page_id=4184
• 19 MS programs in Data Analytics
• 8 MS programs in Data Science
• 28 MS programs in Business Analytics
• Several additional “tracks” or “concentration” programs
Big Data Curriculum
Analytics Electives:
• Data Mining (required)
• Machine Learning (required)
• Visualization (required)
• Artificial Intelligence.
Decision making under uncertainty.
• Natural Language Processing.
Understanding textual data and language.
• Probabilistic Modeling.
Advanced statistical techniques and tools (using R).
• Image Processing.
Analysis and learning on image data.
Big Data Curriculum
Algorithmics Electives:
• Advanced Algorithms (required)
• Models of Computation for Big Data.How algorithmic bottlenecks change as data becomes very large;Relation to modern big data systems (e.g. MapReduce).
• Computational Geometry.Geometric interpretation of big data analysis and computation.
• Computational Topology.Topological data analysis and algorithms.
Big Data Curriculum
Management Electives:
• Database Systems (required)
• Parallel Programming for Many-Core Architectures.Parallel Computing and High Performance Computing.Scalable programming on GPUs, many-cores, and HPC clusters.
• Advanced Computer Networks.Large-scale network protocols, architectures, and applications.
• Network Security.Message integrity, access control, authentication, confidentiality.
Piloting in Adobe (Lehi)
• Starting Fall 2014.
• Live 2-way streaming. Interaction across video.
Fall 2014: Visualization: T-Th 9:10 - 10:30am
Fall 2014: Adv. Algorithms: T-Th 10:45 - 12:05am
(plan for early evening, e.g. Data Mining M-W 5:15-6:35pm)
• Potential for Instructor on site in future.
Adobe lecture room open to others.
The SCI Institute
Productivity Machines
Acknowledgments NIH/NIGMS Center for IntegrativeBiomedical Computing
Center&for&Extreme&Data&Management,Analysis,&and&Visualiza:on
CEDMAV
Utah&Center&for&Neuroimage&Analysis
Scalable&Data&Management,&Analysisand&Visualiza:on
NIH NAMIC
IAMCSInstitute for Applied Mathematicsand Computational Science
Scientific Computing and Imaging Institute, University of Utah
More Information
www.sci.utah.edu
Ecosystem Challenges Around Data Use
Chris JohnsonScientific Computing and Imaging Institute
University of Utah
Sources: Lesk, Berkeley SIMS, Landauer, EMC, TechCrunch, Smart Planet
0.1
1
10
100
1000
1999 2001 2003 2005 2007 2011
all disk storage
all digital info
new digital info/yr
all human documents in 40k Yrs
all spoken words in all lives
amount human minds can store in 1yr
Exab
ytes
(10
)
18
2737
2014
Every two days we create as much data as we did from the beginning of mankind until 2003!
Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it...Dan Ariely
Big Data
Leonid Zhukov - Director of Data Science, Ancestry.com
Vance Checketts - VP and GM of EMC
Edison Ting, Solutions Architect, Pivotal
Panelists