Indexing the Earth
Hadoop World NYC 2011
Oliver Guinan, VP Ground Data Systems
HadoopWorld 2011
Session Agenda
‣ Skybox
‣ The Big Data problem
‣ Indexing the planet at scale
‣ Questions
Today’s data is old
Stadium under construction (completed 2010)
Bridge under construction (completed 2009)
Convention center under construction (completed 2010)
Image taken September 2008, more than three years old
Satellite Imagery = Transparency...
215 automobiles
55,245 gallons of crude oil
6,254 containers
43% damage
-15% vegetation
[Figure: monthly timeline axis, J through D, spanning roughly two years]
Total Raw Data: Compute
• Satellites produce ~1 TB of raw data per day
[Chart: Data Captured per Year (PB) and Sensors in Network, Year 1 through Year 5; series: Single Satellite, Sensor Network, Sensors in Network]
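The ~1 TB/day figure translates directly into yearly volume. A minimal sketch of that arithmetic, assuming decimal units (1 PB = 1,000 TB), a 365-day year, and that every satellite in the network produces the same ~1 TB/day; the class name and the fleet sizes in `main` are hypothetical, since the chart's actual values did not survive.

```java
// Hypothetical back-of-the-envelope model: decimal units (1 PB = 1000 TB),
// a 365-day year, and identical ~1 TB/day output from every satellite.
final class RawDataEstimate {
    static final double TB_PER_PB = 1000.0;
    static final int DAYS_PER_YEAR = 365;

    /** Raw data captured in one year, in petabytes. */
    static double petabytesPerYear(int satellites, double terabytesPerDayEach) {
        return satellites * terabytesPerDayEach * DAYS_PER_YEAR / TB_PER_PB;
    }

    public static void main(String[] args) {
        // Illustrative fleet sizes only; not taken from the slide.
        int[] fleet = {1, 6, 12, 24};
        for (int n : fleet) {
            System.out.printf("%2d satellite(s): %.2f PB/year%n", n, petabytesPerYear(n, 1.0));
        }
    }
}
```

For a single satellite this works out to 365 TB, about 0.37 PB, per year; the total scales linearly with the number of sensors in the network.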
Total Raw Data: Storage
• Satellites produce ~1 TB of raw data per day
[Chart: Data Captured per Year (PB) and Sensors in Network, Year 1 through Year 5; series: Single Satellite, Sensor Network, Sensors in Network]
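Storage grows faster than capture because raw data accumulates year over year and HDFS keeps multiple copies of every block. A minimal sketch, assuming all raw data is retained uncompressed at the HDFS default replication factor of 3 (`dfs.replication`); the class name and the five-year fleet ramp are hypothetical.

```java
// Hypothetical storage model: raw data is kept forever (cumulative), stored
// uncompressed, and replicated by HDFS (dfs.replication defaults to 3).
final class RawStorageEstimate {
    static final double TB_PER_PB = 1000.0;
    static final int DAYS_PER_YEAR = 365;

    /**
     * Cumulative physical storage in PB at the end of each year.
     * satellitesPerYear[i] is the (illustrative) fleet size during year i + 1.
     */
    static double[] cumulativePetabytes(int[] satellitesPerYear, double tbPerDayEach, int replication) {
        double[] totals = new double[satellitesPerYear.length];
        double running = 0.0;
        for (int year = 0; year < satellitesPerYear.length; year++) {
            double captured = satellitesPerYear[year] * tbPerDayEach * DAYS_PER_YEAR / TB_PER_PB;
            running += captured * replication; // every byte is stored `replication` times
            totals[year] = running;
        }
        return totals;
    }

    public static void main(String[] args) {
        int[] fleet = {1, 2, 6, 12, 24}; // hypothetical ramp over five years
        double[] totals = cumulativePetabytes(fleet, 1.0, 3);
        for (int y = 0; y < totals.length; y++) {
            System.out.printf("Year %d: %.1f PB on disk%n", y + 1, totals[y]);
        }
    }
}
```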
Hadoop from space - processing bits
Hadoop is bad at:
๏Calling native C code or libraries at scale
๏Scientific computing is immature in Java
Hadoop from space - processing bits
Standard Java Hadoop
๏Hadoop knows where data is stored
๏Jobs are scheduled efficiently, close to the data
๏Throughput optimized
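A toy model of what standard Java Hadoop is credited with here: given the hosts holding each block's replicas, place each task on a replica host with a free slot and fall back to a remote host only when none is free. This is a simplified illustration with hypothetical hosts and block names, not Hadoop's actual JobTracker scheduling algorithm.

```java
import java.util.*;

// Simplified model of data-local scheduling (NOT Hadoop's actual JobTracker code):
// each input block lists the hosts holding a replica, and each task is placed on
// a replica host when one has a free slot, falling back to any free host otherwise.
final class LocalityScheduler {
    /** Returns block -> assigned host. */
    static Map<String, String> assign(Map<String, List<String>> replicas, Map<String, Integer> slots) {
        Map<String, Integer> free = new HashMap<>(slots);
        Map<String, String> placement = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> block : replicas.entrySet()) {
            String chosen = null;
            for (String host : block.getValue()) {          // prefer a host that already has the data
                if (free.getOrDefault(host, 0) > 0) { chosen = host; break; }
            }
            if (chosen == null) {                            // fallback: any host with capacity (remote read)
                for (Map.Entry<String, Integer> h : free.entrySet()) {
                    if (h.getValue() > 0) { chosen = h.getKey(); break; }
                }
            }
            if (chosen == null) throw new IllegalStateException("no free slots for " + block.getKey());
            free.merge(chosen, -1, Integer::sum);            // consume one slot on the chosen host
            placement.put(block.getKey(), chosen);
        }
        return placement;
    }

    /** Number of tasks that read their block from local disk. */
    static long localReads(Map<String, List<String>> replicas, Map<String, String> placement) {
        return placement.entrySet().stream()
                .filter(e -> replicas.get(e.getKey()).contains(e.getValue()))
                .count();
    }

    public static void main(String[] args) {
        Map<String, List<String>> replicas = new LinkedHashMap<>();
        replicas.put("scene-001", List.of("node1", "node2"));
        replicas.put("scene-002", List.of("node2", "node3"));
        replicas.put("scene-003", List.of("node3", "node1"));
        Map<String, Integer> slots = Map.of("node1", 1, "node2", 1, "node3", 1);
        Map<String, String> placement = assign(replicas, slots);
        System.out.println(placement + " local=" + localReads(replicas, placement));
    }
}
```

In this example every scene lands on a node holding one of its replicas, so all three reads are local.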
Hadoop from space - processing bits
Hadoop Pipes & Streaming
๏Hadoop schedules jobs without regard to where the data they need is stored
๏Native code reads data across the network
๏Drives up network costs and drives down throughput
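Conversely, a sketch of the cost described for Pipes and Streaming: if placement ignores where blocks live, every task that lands away from its data pulls the block over the network. The round-robin placement, block sizes, and host names are hypothetical, chosen to show the worst case.

```java
import java.util.*;

// Toy illustration of locality-blind placement: tasks are assigned round-robin
// with no knowledge of where their input lives, so any task landing on a host
// without a replica must pull its block over the network.
final class LocalityBlindCost {
    record Block(String name, long bytes, Set<String> replicaHosts) {}

    /** Total bytes fetched over the network when blocks are assigned round-robin to hosts. */
    static long remoteBytesRoundRobin(List<Block> blocks, List<String> hosts) {
        long remote = 0;
        for (int i = 0; i < blocks.size(); i++) {
            Block b = blocks.get(i);
            String host = hosts.get(i % hosts.size());     // placement ignores b.replicaHosts()
            if (!b.replicaHosts().contains(host)) remote += b.bytes();
        }
        return remote;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        List<Block> blocks = List.of(
                new Block("scene-001", 128 * mb, Set.of("node2")),
                new Block("scene-002", 128 * mb, Set.of("node3")),
                new Block("scene-003", 128 * mb, Set.of("node1")));
        List<String> hosts = List.of("node1", "node2", "node3");
        long remote = remoteBytesRoundRobin(blocks, hosts);
        System.out.println("Bytes read across the network: " + remote / mb + " MB");
    }
}
```

Here every block lands on the wrong node, so all 384 MB cross the network; a locality-aware placement of the same blocks would read them all from local disk.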
Hadoop from space - processing bits
BusBoy
✓Hadoop manages data reads & writes
✓Hadoop schedules jobs close to the data
✓Jobs read data and hand it off to native code for processing
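The slides do not show BusBoy's interface, so the following is only a hypothetical sketch of the division of labour described above: a Java loop (where Hadoop has already arranged local reads) hands each record's bytes to a `NativeProcessor`, which stands in for the C code. In a real system that boundary would be JNI or a subprocess; here it is a plain Java lambda. All names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the BusBoy split (not its real API): the Java side owns
// I/O and iteration and only hands raw bytes across to the "native" routine.
final class BusBoySketch {
    /** Stand-in for a native routine behind JNI or a subprocess. */
    interface NativeProcessor {
        byte[] process(byte[] input);
    }

    /** Iterates over locally read records, hands each one off, and collects the outputs. */
    static List<byte[]> run(List<byte[]> localRecords, NativeProcessor nativeCode) {
        List<byte[]> outputs = new ArrayList<>();
        for (byte[] record : localRecords) {
            outputs.add(nativeCode.process(record));   // native code never touches HDFS directly
        }
        return outputs;
    }

    public static void main(String[] args) {
        // Stand-in "native" routine: reverse the bytes of each record.
        NativeProcessor reverse = in -> {
            byte[] out = new byte[in.length];
            for (int i = 0; i < in.length; i++) out[i] = in[in.length - 1 - i];
            return out;
        };
        List<byte[]> records = List.of("tile-a".getBytes(StandardCharsets.UTF_8));
        for (byte[] out : run(records, reverse)) {
            System.out.println(new String(out, StandardCharsets.UTF_8));
        }
    }
}
```

Keeping reads and writes on the Java side is what lets Hadoop keep scheduling by data location while the heavy processing runs in native libraries.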
Architecture Overview
Hadoop Task
  C code: math.lib, gdal.lib, cv.lib
  BusBoy: Logging, Progress, Inputs, Outputs
Hadoop JobTracker
HDFS, HBase, Hive
Framework Benefits - Deployment
✓Low time to first byte
✓Insight into job progress
✓Diagnostics for large scale operations
✓Logging
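The deployment benefits above come from wrapping the processing call in a loop that reports as it goes. A minimal sketch, assuming a hypothetical `ProgressWrapper` whose progress callback stands in for Hadoop's task status reporting (in the old `mapred` API, `Reporter` offers `progress()` and `setStatus(String)`); it is not BusBoy's actual code.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.DoubleConsumer;
import java.util.function.Function;
import java.util.logging.Logger;

// Hypothetical wrapper illustrating the deployment benefits: each result is emitted
// as soon as it is ready (low time to first byte), the fraction complete is reported
// after every record, and per-record timing is logged.
final class ProgressWrapper {
    private static final Logger LOG = Logger.getLogger(ProgressWrapper.class.getName());

    static <T, R> void run(List<T> inputs, Function<T, R> work,
                           Consumer<R> emit, DoubleConsumer progress) {
        int total = inputs.size();
        for (int i = 0; i < total; i++) {
            long start = System.nanoTime();
            R result = work.apply(inputs.get(i));      // stand-in for the native processing call
            emit.accept(result);                       // write output now, not at the end of the task
            long micros = (System.nanoTime() - start) / 1_000;
            LOG.info("record " + i + " processed in " + micros + " us");
            progress.accept((i + 1) / (double) total); // fraction of this task complete
        }
    }

    public static void main(String[] args) {
        run(List.of(1, 2, 3, 4), t -> t * t,
            r -> System.out.println("output " + r),
            p -> System.out.printf("progress %.0f%%%n", p * 100));
    }
}
```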
Framework Benefits - Development
✓Prototyping outside of Hadoop
✓Rapid turnaround
✓Testable interfaces
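One way to get prototyping outside Hadoop and testable interfaces is to keep each processing step a pure function over in-memory data, so it can be exercised on a laptop with no cluster. A minimal sketch; the `TileProcessor` interface, the integer-array tile format, and the mean-brightness step are all hypothetical examples, not Skybox's code.

```java
// Hypothetical "testable interface": the processing step is a pure function over
// an in-memory tile, so it can be prototyped and unit-tested without Hadoop.
final class TilePrototype {
    /** A processing step that knows nothing about HDFS or job scheduling. */
    interface TileProcessor {
        double apply(int[][] pixels);
    }

    /** Example step: mean pixel value of a tile. */
    static final TileProcessor MEAN_BRIGHTNESS = pixels -> {
        long sum = 0;
        long count = 0;
        for (int[] row : pixels) {
            for (int p : row) { sum += p; count++; }
        }
        if (count == 0) throw new IllegalArgumentException("empty tile");
        return (double) sum / count;
    };

    public static void main(String[] args) {
        int[][] tile = {{10, 20}, {30, 40}};
        double mean = MEAN_BRIGHTNESS.apply(tile);
        if (mean != 25.0) throw new AssertionError("expected 25.0, got " + mean);
        System.out.println("mean brightness = " + mean);
    }
}
```

The same step can later be called from inside a Hadoop task unchanged, which is where the rapid turnaround comes from.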
Skybox providing Big Data
✓Produce the most complete and timely data about the world
✓Make raw data available for users to mine for information
✓Turn Big Data into knowledge, at Earth scale
Questions? Sample Data?