© 2013 by Elbit Systems | Elbit Systems Proprietary
Elbit Systems Land and C4I
Defense Industry & Open Source & BigData

Defense Industry & Open Source & BigData - hamakor.org.il





Speaker

German Gavrilov ([email protected])

Elbit Systems Land and C4I

Intelligence Directorate

Cyber Division


Defense Industry

Open Source

Big Data

Defense Industry & Open Source & Big Data


Agenda

The Need

Growth in global data volume

The need for intelligence systems

What is Big Data?

3V Model of Big Data

Scale up / Scale out

CAP theorem

Types of solutions

The Apache Hadoop project

HDFS

Map Reduce

Hadoop Projects

Example architecture of an information system using Hadoop


The Need: Growth in Global Data Volume

• Twitter produces over 340 million tweets per day, with over 500 million registered users as of 2012
• Over 32 billion searches were performed last month on Twitter
• Facebook creates over 30 billion pieces of content, ranging from web links and news to blogs and photos
• Zynga processes 1 petabyte of content for players every day
• More than 2 billion videos are watched on YouTube every day
• By 2015, nearly 3 billion people will be online, pushing the data created and shared to nearly 8 zettabytes


[Figure: quantity of global data]


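The Need: Requirements for Intelligence Systems

• Ability to ingest large volumes of data in a short time (near real-time)
• Ability to ingest different types of data
• Ability to process large volumes of information
• Ability to run different analyses, each tailored to the type of data
• Ability to query and present information quickly, conveniently, and clearly

The customer wants to be able to read the information available in the world conveniently.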


Examples of images people uploaded to their Twitter accounts


What is Big Data?

What is data?

Data is information in raw or unorganized form, such as letters, numbers, or symbols.

What is Big Data?

Big Data refers to datasets so large that they are difficult to store, manage, and analyze.

Every day, we create over 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.
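To put these volumes in perspective, the short Python sketch below converts a byte count into the decimal (SI) units used on these slides. It assumes IBM's often-quoted figure of 2.5 quintillion (2.5 × 10^18) bytes per day and a 365-day year. Both constants are illustrative inputs and are not figures the slides compute themselves.

```python
# Decimal (SI) byte units: each step is a factor of 1000, as in "terabytes" or "zettabytes".
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

# Assumptions for this illustration (not figures computed on the slide):
DAILY_BYTES = 2.5e18   # IBM's often-quoted "2.5 quintillion bytes per day"
DAYS_PER_YEAR = 365    # leap years ignored


def human_bytes(n: float) -> str:
    """Format a byte count with the largest SI unit that keeps the number below 1000."""
    for unit in UNITS[:-1]:
        if n < 1000:
            return f"{n:.2f} {unit}"
        n /= 1000
    return f"{n:.2f} {UNITS[-1]}"


if __name__ == "__main__":
    print("per day: ", human_bytes(DAILY_BYTES))
    print("per year:", human_bytes(DAILY_BYTES * DAYS_PER_YEAR))
```

With these inputs the script prints 2.50 EB per day and 912.50 EB per year, which is a little under one zettabyte.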


• O’Reilly Radar definition:
Big data is when the size of the data itself becomes part of the problem.

• EMC/IDC definition of big data:
Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.

• IBM says that "three characteristics define big data":
Volume (Terabytes -> Zettabytes)
Variety (Structured -> Semi-structured -> Unstructured)
Velocity (Batch -> Streaming Data)


3V Model of Big Data


Distributing Data Across Machines

Scale up / Vertical scaling
To scale vertically means to add resources to a single node in a system, typically by adding CPUs or memory to a single computer.

Scale out / Horizontal scaling / Distributed systems
To scale horizontally means to add more nodes to a system, such as adding a new computer to a distributed software application.
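Scaling out means the data itself has to be split across machines. A common way to do this is hash partitioning, where each record key is hashed and the hash chooses a node. The Python sketch below is only an illustration. The node names are hypothetical, and MD5 is used because it is stable across runs, unlike Python's built-in `hash` for strings. The sketch does not reproduce the placement logic of any particular product.

```python
import hashlib
from collections import Counter


def node_for(key: str, nodes: list[str]) -> str:
    """Pick a node for `key` by hashing it and taking the result modulo the node count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def moved_fraction(keys: list[str], before: list[str], after: list[str]) -> float:
    """Fraction of keys whose node changes when the cluster changes from `before` to `after`."""
    moved = sum(node_for(k, before) != node_for(k, after) for k in keys)
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"record-{i}" for i in range(1000)]
    three = ["node-a", "node-b", "node-c"]    # hypothetical node names
    four = three + ["node-d"]                 # scale out by one machine
    print(Counter(node_for(k, three) for k in keys))
    print(f"keys that move when adding a node: {moved_fraction(keys, three, four):.0%}")
```

With plain modulo hashing, roughly three quarters of the keys change nodes when the cluster grows from three to four nodes. This heavy reshuffling is why stores such as Cassandra and Dynamo use consistent hashing instead.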


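CAP theorem (Consistency, Availability, Partition tolerance): a distributed data store can guarantee at most two of the three properties.

CA: RDBMSs (MySQL, …), Greenplum, Vertica, Aster Data
AP: Cassandra, CouchDB, SimpleDB, Dynamo
CP: HBase, MongoDB, Terrastore, BigTable, MemcacheDB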
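The CP/AP split can be illustrated with a toy replicated register. The Python sketch below is hypothetical: the class and method names do not come from any product, and it assumes a single writer that numbers its writes. In CP mode, reads and writes need a majority of replicas and are rejected during a partition that leaves only a minority reachable. In AP mode, they proceed on whatever replicas are reachable, so a reader on the other side of the partition can see stale data.

```python
class Unavailable(Exception):
    """Raised when a CP-style operation cannot reach a majority of replicas."""


class ToyRegister:
    """One value copied to several replicas; a hypothetical teaching model, not a real store."""

    def __init__(self, replicas: int = 3, mode: str = "CP") -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.copies = [(0, None)] * replicas    # (version, value) held by each replica
        self.version = 0                        # simplification: one writer numbers the writes
        self.reachable = set(range(replicas))   # no partition at start

    def partition(self, reachable: set[int]) -> None:
        """Simulate a network partition: only these replica indices can be contacted."""
        self.reachable = set(reachable)

    def _check(self) -> None:
        has_majority = len(self.reachable) > len(self.copies) // 2
        if self.mode == "CP" and not has_majority:
            # CP: give up availability rather than risk stale or conflicting data.
            raise Unavailable("majority of replicas unreachable")
        if not self.reachable:
            raise Unavailable("no replica reachable")

    def write(self, value) -> None:
        self._check()
        self.version += 1
        for i in self.reachable:                # unreachable replicas keep their old copy
            self.copies[i] = (self.version, value)

    def read(self):
        self._check()
        # Newest value among reachable replicas. In CP mode reads and writes both use a
        # majority, and any two majorities overlap, so this sees the latest write.
        newest = max((self.copies[i] for i in self.reachable), key=lambda c: c[0])
        return newest[1]
```

For example, after `ap = ToyRegister(mode="AP")`, `ap.write("v1")`, `ap.partition({0})` and `ap.write("v2")`, a reader that can only reach replicas 1 and 2 still gets `"v1"`. The same sequence in CP mode rejects the second write instead.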


Types of Solutions

• Key-value stores: schema-less. Conceptual structure: Key → Value.
• Column-oriented databases: storage by column. Conceptual structure: the values of each column are stored together, e.g. Weight (2.85 kg, 1.23 kg, 3.76 kg), Price (24.00 $, 17.50 $, 27.30 $), Target (Israel, Italia, Turkey).
• Graph databases: use nodes and edges to represent data. Conceptual structure: data nodes connected by edges.
• Document-oriented databases: store semi-structured documents; often XML databases. Conceptual structure: Key → Document (XML).
• Sharded RDBMS (MPP databases): structured. Conceptual structure: several RDBMS instances, each holding a shard of the data.
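The difference between a row-oriented layout and the column-oriented layout listed above can be sketched in Python using the Weight/Price/Target sample values from the slide. This is a simplified in-memory illustration and does not show how any particular database lays data out on disk. The point is that an aggregate over one column only has to touch that column's values.

```python
# The three sample records from the slide, stored row by row.
rows = [
    {"weight_kg": 2.85, "price_usd": 24.00, "target": "Israel"},
    {"weight_kg": 1.23, "price_usd": 17.50, "target": "Italia"},
    {"weight_kg": 3.76, "price_usd": 27.30, "target": "Turkey"},
]


def to_columns(records: list[dict]) -> dict[str, list]:
    """Pivot row records into one list per column (a column-oriented layout)."""
    columns: dict[str, list] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every record is visited, even though only one field is needed.
total_from_rows = sum(r["price_usd"] for r in rows)
# Column layout: only the price column is read.
total_from_columns = sum(columns["price_usd"])

if __name__ == "__main__":
    print(columns["target"])
    print(round(total_from_columns, 2))
```

Both layouts give the same answer (68.8). On wide tables with many columns, the column layout reads far less data for this kind of query.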


Types of Solutions

| Type | Performance | Horizontal scalability | Flexibility in data variety | Complexity of operation | Functionality | Examples |
|---|---|---|---|---|---|---|
| Key-value stores | high | high | high | none | variable (none) | Berkeley DB, Scalaris, MemcacheDB |
| Column-oriented databases | high | high | moderate | low | minimal | Cassandra, HP Vertica, BigTable, HBase, OrientDB |
| Graph databases | variable | variable | high | high | graph theory | Neo4j, InfiniteGraph, Titan, OrientDB |
| Document-oriented databases | high | variable (high) | high | low | variable (low) | CouchDB, MongoDB, SimpleDB, Redis |
| Sharded RDBMS (MPP) | variable | variable | low | moderate | relational | HP Vertica, EMC Greenplum, Aster Data |


The Apache Hadoop Project

hadoop.apache.org:
“The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model”

wikipedia.org:
Apache Hadoop is an open-source software framework that supports data-intensive distributed applications. Hadoop implements a computational paradigm named MapReduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster.

Hadoop provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. It enables applications to work with thousands of computation-independent computers and petabytes of data.
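The re-execution idea can be sketched in plain Python. Work is cut into independent fragments, and a fragment that fails on one node is simply rerun on another. The sketch simulates nodes and failures in a single process, and the node names and the `failing_nodes` parameter are hypothetical. Hadoop's real scheduler also takes data locality into account, which this sketch ignores.

```python
import itertools


class NodeFailure(Exception):
    """Raised by a simulated node that crashes while running a task."""


def run_fragments(fragments, task, nodes, failing_nodes=frozenset()):
    """Run `task` on every fragment; a fragment that fails is re-executed on another node.

    `nodes` and `failing_nodes` are hypothetical worker names used only to simulate failures.
    """
    results = []
    node_cycle = itertools.cycle(nodes)
    for index, fragment in enumerate(fragments):
        for _attempt in range(len(nodes)):    # consecutive attempts go to different nodes
            node = next(node_cycle)
            try:
                if node in failing_nodes:
                    raise NodeFailure(node)
                results.append(task(fragment))
                break
            except NodeFailure:
                continue                      # fragments are independent: retry elsewhere
        else:
            raise RuntimeError(f"fragment {index} failed on every node")
    return results


if __name__ == "__main__":
    data = list(range(100))
    fragments = [data[i:i + 10] for i in range(0, len(data), 10)]  # 10 small pieces of work
    partial_sums = run_fragments(fragments, sum, ["n1", "n2", "n3"], failing_nodes={"n2"})
    print(sum(partial_sums))
```

Because each fragment depends only on its own input, rerunning it elsewhere gives the same result. Here the total is still 4950 even though node `n2` fails every task it receives.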


The Apache Hadoop Project

Organizations are using Hadoop to run large distributed computations:

Facebook.com, Amazon.com, Ancestry.com, Akamai, American Airlines, AOL, Apple, eBay, Hortonworks, Federal Reserve Board of Governors, Foursquare, Yahoo!, InMobi, Intuit, Joost, Last.fm, LinkedIn, Microsoft, NetApp, Netflix, Ooyala, Riot Games, The New York Times, SAP AG, SAS Institute, StumbleUpon, Twitter, Yodlee, Fox Interactive Media, Gemvara, Google, Hewlett-Packard, IBM

Companies that provide Hadoop in their products:

• IBM - InfoSphere BigInsights
• Oracle - Big Data Appliance
• EMC - Pivotal HD
• Microsoft - HDInsight
• Others


The Apache Hadoop Project: HDFS

HDFS (Hadoop Distributed File System) is a distributed, scalable, and portable file system. It is designed to store large amounts of data across many servers in a cluster.
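HDFS stores each file as a sequence of large blocks and keeps several replicas of every block on different machines. In Hadoop 2 the default block size is 128 MB (64 MB in Hadoop 1), and the default replication factor is 3. The Python sketch below shows only the bookkeeping idea. Its placement rule, round-robin over hypothetical DataNode names, is a simplification and not HDFS's actual rack-aware policy.

```python
import math

MB = 1024 * 1024


def plan_blocks(file_size, block_size, datanodes, replication=3):
    """Split a file into fixed-size blocks and pick `replication` distinct nodes per block.

    Returns a list of (block_index, length_in_bytes, nodes) tuples. The round-robin
    placement is a simplification, not HDFS's rack-aware placement policy.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    plan = []
    for b in range(math.ceil(file_size / block_size)):
        length = min(block_size, file_size - b * block_size)   # the last block may be short
        # Start at a different node for each block, then take consecutive nodes,
        # so the replicas of one block never land on the same machine.
        nodes = [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        plan.append((b, length, nodes))
    return plan


if __name__ == "__main__":
    datanodes = ["dn1", "dn2", "dn3", "dn4"]    # hypothetical DataNode names
    for block in plan_blocks(300 * MB, 128 * MB, datanodes):
        print(block)
```

A 300 MB file becomes two full 128 MB blocks and one 44 MB block, each stored on three different nodes. Losing any single machine therefore still leaves two copies of every block.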


The Apache Hadoop Project: MapReduce

MapReduce is the programming model that the Hadoop MapReduce engine uses to distribute work across a cluster.
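The classic illustration of the MapReduce model is counting words. The single-process Python sketch below mimics the three phases. The map phase emits key/value pairs, the shuffle groups values by key, and the reduce phase combines each group. It is not Hadoop's Java API, which runs the same phases in parallel across the cluster and performs the shuffle itself.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["big data big clusters", "open source big data"]
    print(word_count(docs))
```

For the two sample lines, `word_count` returns `{'big': 3, 'data': 2, 'clusters': 1, 'open': 1, 'source': 1}`. Map and reduce calls are independent of each other, so the framework can spread them over many nodes.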


The Apache Hadoop Project

• Pig (simple query language)
• Hive (SQL-like queries)
• Cascading (software abstraction layer)
• Mahout (machine learning)
• Hama (scientific computation)
• Avro (data serialization system)
• Hadoop MapReduce implementation
• Ambari (deployment, management, and monitoring tool)
• Sqoop (data transfer tool)
• Oozie (workflow scheduler system)
• ZooKeeper (coordination service)
• Flume (framework for populating Hadoop with data)
• Hadoop Distributed File System (HDFS)
• Hue (file browser for HDFS)
• HBase (column-oriented database)
• HCatalog (table and storage management service)

Categories: Data access / query abilities; MapReduce / distributed processing; Storage / data structure; Management tools


Hadoop Ecosystem


Example Architecture of an Information System Using Hadoop


The End

Thank You!

Growth in global data volume

The need for intelligence systems

Big Data solutions

Implementation using Apache Hadoop