Indexing the Earth

Hadoop World NYC 2011

Oliver Guinan, VP Ground Data Systems

[email protected]


HadoopWorld 2011

Session Agenda


‣ Skybox

‣ The Big Data problem

‣ Indexing the planet at scale

‣ Questions


Today’s data is old

Stadium under construction (completed 2010)

Bridge under construction (completed 2009)

Convention center under construction (completed 2010)

Image taken September 2008: more than three years old


A problem of scale


Satellite Imagery = Transparency...

215 automobiles

55,245 gallons of crude oil

6,254 containers

43% damage

-15% vegetation

[Figure: monthly timeline axis, J through D repeated, spanning 26 months]


The problem of capacity


Sensor network in space


New approach: Many distributed, low-cost satellites


Total Raw Data compute

• Satellites produce ~1TB of raw data/day

[Chart: Data Captured per Year (PB) and Sensors in Network, Year 1 to Year 5; axis ranges 0 to 15 and 0 to 20; legend: Sensor Network, Single Satellite, Sensors in Network]


Total Raw Data storage

• Satellites produce ~1TB of raw data/day

[Chart: Data Captured per Year (PB) and Sensors in Network, Year 1 to Year 5; axis ranges 0 to 30 and 0 to 20; legend: Sensor Network, Single Satellite, Sensors in Network]
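
As a rough back-of-the-envelope check on the ~1 TB/day figure, the sketch below converts a daily raw-data rate into petabytes captured per year and a running storage total. It reads the figure as a per-satellite rate, uses decimal units (1 PB = 1000 TB), and the fleet sizes are made-up illustration values; none of these assumptions come from the slides.

```python
# Assumption: the slide's ~1 TB/day is treated as a per-satellite rate.
TB_PER_SATELLITE_PER_DAY = 1.0
DAYS_PER_YEAR = 365
TB_PER_PB = 1000.0  # decimal units assumed


def yearly_capture_pb(satellites):
    """Raw data captured in one year, in PB, for a fleet of the given size."""
    return satellites * TB_PER_SATELLITE_PER_DAY * DAYS_PER_YEAR / TB_PER_PB


def cumulative_storage_pb(fleet_by_year):
    """Running total of raw data stored, in PB, after each year."""
    totals = []
    running = 0.0
    for satellites in fleet_by_year:
        running += yearly_capture_pb(satellites)
        totals.append(running)
    return totals


if __name__ == "__main__":
    hypothetical_fleet = [1, 2, 4, 8, 12]  # illustration only, not from the talk
    totals = cumulative_storage_pb(hypothetical_fleet)
    for year, (sats, total) in enumerate(zip(hypothetical_fleet, totals), start=1):
        print(f"Year {year}: {sats} satellites, "
              f"{yearly_capture_pb(sats):.2f} PB captured, {total:.2f} PB stored")
```

Under these assumptions a single satellite yields about 0.365 PB per year, so the yearly capture rate tracks fleet size while stored volume keeps accumulating.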


Enter the elephant


Hadoop from space - processing bits

Hadoop is bad at:

๏ Calling native C code or libraries at scale

๏ Scientific computing, which is immature in Java


Hadoop from space - processing bits

Standard Java Hadoop

๏ Hadoop knows where data is stored

๏ Jobs are scheduled efficiently, close to the data

๏ Optimized for throughput
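
The idea of scheduling work close to the data can be illustrated with a toy assignment routine. This is only a sketch of the concept, not Hadoop's actual scheduler: the function name, the block and host identifiers, and the greedy first-fit policy are all hypothetical.

```python
def assign_tasks(block_locations, free_slots):
    """Greedy, locality-first task placement (toy model).

    block_locations: {block_id: [hosts holding a replica of that block]}
    free_slots:      {host: number of free task slots}
    Returns {block_id: (host, is_local)}.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment = {}
    for block_id, replicas in block_locations.items():
        # Prefer a host that already stores the block and has a free slot.
        local = [h for h in replicas if slots.get(h, 0) > 0]
        if local:
            host, is_local = local[0], True
        else:
            # Otherwise fall back to any free host; the data must cross the network.
            remote = [h for h, n in slots.items() if n > 0]
            if not remote:
                raise RuntimeError("no free task slots")
            host, is_local = remote[0], False
        slots[host] -= 1
        assignment[block_id] = (host, is_local)
    return assignment


if __name__ == "__main__":
    blocks = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n3"]}
    print(assign_tasks(blocks, {"n1": 1, "n2": 1, "n3": 1}))
```

Every assignment flagged `is_local=False` stands for a read that has to travel over the network, which is the cost that locality-aware scheduling tries to avoid.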


Hadoop from space - processing bits

Hadoop Pipes & Streaming

๏ Hadoop schedules jobs without regard to the data required by the job

๏ Native code reads data across the network

๏ Drives up network costs and drives down throughput


Hadoop from space - processing bits

BusBoy

✓ Hadoop manages data reads & writes

✓ Hadoop schedules jobs close to the data

✓ Jobs read data and hand off to native code for processing
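
The slides do not show BusBoy's interface, so the following is only a hypothetical sketch of the handoff pattern: the framework side has already read a record into memory, and the task passes that buffer to native code instead of letting the native code fetch data itself. Here the system zlib library's `crc32` stands in for the native processing step, bound through Python's `ctypes`; the function names are invented for illustration.

```python
import ctypes
import ctypes.util
import zlib


def load_native_crc32():
    """Try to bind crc32() from the system zlib C library; return None if unavailable."""
    path = ctypes.util.find_library("z")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        native = lib.crc32
    except (OSError, AttributeError):
        return None
    # C signature: uLong crc32(uLong crc, const Bytef *buf, uInt len)
    native.argtypes = [ctypes.c_ulong, ctypes.c_char_p, ctypes.c_uint]
    native.restype = ctypes.c_ulong
    return lambda buf: native(0, buf, len(buf)) & 0xFFFFFFFF


_NATIVE_CRC32 = load_native_crc32()


def process_record(record: bytes) -> int:
    """Task-side stand-in: the record is already in memory when this is called."""
    if _NATIVE_CRC32 is not None:
        return _NATIVE_CRC32(record)        # hand the in-memory buffer to C code
    return zlib.crc32(record) & 0xFFFFFFFF  # fallback so the sketch still runs


if __name__ == "__main__":
    print(hex(process_record(b"raw sensor bytes")))
```

The point of the pattern is that the native routine only ever sees a buffer and a length, so where the bytes came from stays the framework's concern.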

HadoopWorld 2011

Architecture Overview

Hadoop Task

C code

math.lib, gdal.lib, cv.lib

BusBoy

Logging, Progress, Inputs, Outputs

Hadoop JobTracker

HDFS, HBase, Hive
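
One way to picture the four channels named in the diagram (logging, progress, inputs, outputs) is as a small context object passed to the processing step. The class and function names below are hypothetical and are not taken from BusBoy; this is a sketch of the shape such an interface might have.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class TaskContext:
    """Hypothetical bundle of the four channels: inputs, outputs, logging, progress."""
    inputs: Dict[str, bytes]
    outputs: Dict[str, bytes] = field(default_factory=dict)
    log: logging.Logger = field(default_factory=lambda: logging.getLogger("task"))
    progress: float = 0.0

    def report_progress(self, fraction: float) -> None:
        # Clamp to [0, 1] so a misbehaving step cannot report nonsense.
        self.progress = max(0.0, min(1.0, fraction))
        self.log.info("progress %.0f%%", self.progress * 100)


def run_task(ctx: TaskContext, process: Callable[[bytes], bytes]) -> TaskContext:
    """Apply `process` to every input, store the results, and report progress."""
    names = sorted(ctx.inputs)
    for i, name in enumerate(names, start=1):
        ctx.outputs[name] = process(ctx.inputs[name])
        ctx.report_progress(i / len(names))
    return ctx


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    done = run_task(TaskContext(inputs={"tile-1": b"abc"}), lambda b: b.upper())
    print(done.outputs)
```

Because the processing step only depends on the context object, it can be exercised with in-memory data on a laptop before being wired into a cluster job.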


Framework Benefits - Deployment

✓ Low time to first byte

✓ Insight into job progress

✓ Diagnostics for large scale operations

✓ Logging


Framework Benefits - Development

✓ Prototyping outside of Hadoop

✓ Rapid turnaround

✓ Testable interfaces


Skybox providing Big Data

✓ Produce the most complete and timely data about the world

✓ Make data available to users to mine the raw data for information

✓ Turn Big Data into knowledge, at Earth scale

Skybox BusBoy


Simulated from aerial platform using flight sensor

Color Images


HD Video


Questions? Sample Data?

[email protected]