Shared Storage for Shared Nothing

John Webster, Senior Partner
[email protected]

© 2011 Evaluator Group, Inc.
The Information Company for Storage Professionals


Big Data “Never has a term so vague meant so much to so many”

- Chief Marketing Officer of Major IT Vendor

Agenda

• The two ways to say Big Data:
  – Big Data Storage
  – Big Data Analytics
• Distributed computing for Big Data Analytics (a.k.a. Shared Nothing)
  – MapReduce (i.e. Apache Hadoop and knock-offs) and the Shared Nothing architecture
  – Distributed/scalable database from Open Source and the traditional data warehouse vendors
• Stream computing and Complex Event Processing (CEP)
• Is there a place for shared Big Data Storage in Big Data Analytics?
• If so, what does it look like?
• Overheard around the Big Data water cooler

The Storage Way to Say Big Data

Defined by architectural platform, Big Data Storage is:
  – Scale-out NAS
  – Single NameSpace, Global NameSpace File System
  – NAS gateway to SAN and Scale-out SAN
  – Object-based storage

Defined by application, Big Data Storage is:
  – Storage for applications that handle large data sets
  – Examples: Media & Entertainment, Oil & Gas Exploration, Life Sciences, etc.

The Analytics Way to Say Big Data

Big Data Analytics is:
• A term for business intelligence (BI) processes that are different from traditional Data Warehousing
• The ability to tap unstructured data as a source for BI processes
• Information delivered to users in real or near-real time (but not an absolute requirement)
• Convergence of multiple data sources
• Latency introduced by storage, including networked storage, is often assiduously avoided
• Cost is minimized

MapReduce and Apache Hadoop

• Apache Hadoop: an Open Source project inspired by Google’s MapReduce framework and the need for an alternative to traditional data warehousing
• Cloudera is the commercial face of Apache Hadoop
• However, there are derivatives (Yahoo/Hortonworks, MapR)
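
As a rough illustration of the MapReduce idea referred to above, the sketch below counts words in three phases: map, shuffle, and reduce. It is a single-process toy and not Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example. In a real cluster each phase would run in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map over every document, then shuffle, then reduce per key."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data storage", "big data analytics"])
print(counts)
```

Because each map call sees only its own split and each reduce call sees only one key, neither phase needs shared state, which is what lets the work spread across independent nodes.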

Scalable Database

• The xSQL communities (MySQL, NoSQL, NewSQL) are another open source way to do Big Data Analytics
  – Vibrant and growing communities
  – Examples: MongoDB (as in “humongous”), Terrastore
• The traditional DW vendors are responding with:
  – In-memory DB
  – In-memory Hadoop
  – The discovery of Flash-based SSD and DRAM as block storage

Stream Processing for Real Time Analytics

• Big Data Analytics delivering information in real time
• StreamSQL says process first, then store
• Examples: StreamBase, IBM InfoSphere Streams, Ingres VectorWise
• Real time processing applications using StreamSQL today: Equity Trading, Telecom Infrastructure Monitoring, Intelligence, Fraud Detection
• Complex Event Processing (CEP) is a platform for real time analytics using stream processing

Source: StreamBase
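
The "process first, then store" idea can be sketched in a few lines. The example below is written in plain Python rather than StreamSQL, and the function name, window size, and threshold are all assumptions chosen for illustration. Each price is evaluated against a sliding window as it arrives, and only the events that trigger an alert are kept.

```python
from collections import deque


def detect_spikes(prices, window=3, threshold=1.10):
    """Yield (price, window_average) when a price exceeds the recent average.

    Each event is inspected as it arrives; only the last `window` prices are
    held in memory, and nothing is written to storage before it is processed.
    """
    recent = deque(maxlen=window)
    for price in prices:
        if len(recent) == window:
            average = sum(recent) / window
            if price > average * threshold:
                yield price, average
        recent.append(price)


# Only the alerts are stored; the raw stream is never materialized here.
alerts = list(detect_spikes([100, 101, 99, 120, 102]))
print(alerts)
```

A batch system would load all prices into a table and query it afterwards; here the decision is made while the data is still in flight, which is the trade-off the slide points at.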

Shared Storage for the Traditional Data Warehouse

[Figure labels: Data Warehouse; Extract, Transform, Load (ETL); Archive; Schedules; Reports; Ad hoc Queries; Dashboards; Notifications]

Distributed, Shared Nothing Architectures for Big Data Analytics

[Figure: three-layer diagram. Network Layer: switch. Compute Layer: NODE 1, NODE 2, NODE 3, ... NODE n, and CONTROL. Storage Layer: DAS attached to each node.]
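
A minimal sketch of the shared nothing layout, assuming simple hash partitioning: every record is owned by exactly one node and lives only on that node's local disk. The `Node` class, the `owner` function, and the choice of CRC32 as the hash are illustrative assumptions, not a description of how Hadoop or any particular product places data.

```python
import zlib


class Node:
    """One compute node with its own direct-attached storage (DAS)."""

    def __init__(self, name):
        self.name = name
        self.local_disk = {}  # stands in for the node's private DAS

    def put(self, key, value):
        self.local_disk[key] = value

    def count(self):
        return len(self.local_disk)


def owner(key, nodes):
    """Pick the single node that owns a key, using a stable hash."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


nodes = [Node(f"node{i}") for i in range(1, 5)]
for i in range(1000):
    key = f"record-{i}"
    owner(key, nodes).put(key, i)

total = sum(node.count() for node in nodes)
for node in nodes:
    print(node.name, node.count())
```

No node reads another node's disk, so adding nodes adds both compute and storage. The flip side is that a plain modulo scheme like this one reshuffles most keys when the node count changes, and losing a node loses its data unless copies are kept elsewhere.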

CAP theorem

It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
• Consistency (all nodes see the same data at the same time)
• Availability (a guarantee that every request receives a response about whether it was successful or failed)
• Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)

A distributed system can satisfy any two of these guarantees at the same time, but not all three
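
The trade-off can be made concrete with a toy two-replica store. This is a deliberately simplified sketch: the class names and the "CP"/"AP" mode flag are invented for this example, and a refused write is treated as a loss of availability. During a network partition the store must either refuse the write (stay consistent) or accept it on one side (stay available and let the replicas diverge).

```python
class Replica:
    """Holds one copy of a single value."""

    def __init__(self):
        self.value = None


class TwoNodeStore:
    """Two replicas, with a client that always writes through replica a."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica()
        self.b = Replica()
        self.partitioned = False  # True when a and b cannot reach each other

    def write(self, value):
        """Return True if the write was accepted, False if it was refused."""
        if not self.partitioned:
            self.a.value = value
            self.b.value = value
            return True
        if self.mode == "CP":
            return False  # refuse the write: consistent, but not available
        self.a.value = value  # accept locally: available, but b is now stale
        return True

    def consistent(self):
        return self.a.value == self.b.value


cp = TwoNodeStore("CP")
cp.write("v1")
cp.partitioned = True
cp_accepted = cp.write("v2")

ap = TwoNodeStore("AP")
ap.write("v1")
ap.partitioned = True
ap_accepted = ap.write("v2")

print("CP:", cp_accepted, cp.consistent())
print("AP:", ap_accepted, ap.consistent())
```

In both cases the system keeps running through the partition; what differs is which of the other two guarantees it gives up.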

Why Should IT Professionals Care?

• Distributed computing for analytics (Hadoop for example) is moving from science experiment to mission critical
• As this happens, data encompassed by these applications becomes the responsibility of IT professionals who worry about:
  – Security
  – Data Protection/Disaster Recovery/Business Continuance
  – Data Governance and Compliance
  – Digital Records Management and Archiving
• Shared storage can be used to address these concerns

Shared Storage as Secondary Storage

[Figure: three-layer diagram. Network Layer: switch. Compute Layer: NODE 1, NODE 2, NODE 3, ... NODE n, and CONTROL. Storage Layer: SAN/NAS.]

Shared Storage as Primary Storage

[Figure: three-layer diagram. Network Layer: switch. Compute Layer: NODE 1, NODE 2, NODE 3, ... NODE n, and CONTROL. Storage Layer: SAN and Scale-out NAS.]

Why not Shared Storage?

Shared Primary/Secondary Storage

Advantages
  – Enhances system availability and performance in some cases
  – Addresses the enterprise storage requirements
      – Security
      – Data Protection/Disaster Recovery/Business Continuance
      – Data Governance and Compliance
      – Digital Records Management and Archiving

Disadvantages
  – Latency
  – Additional cost
  – Crosses a “cultural” boundary

Is Hadoop a Storage Device?

• No
  – It’s a distributed computing platform
• Yes
  – HDFS - Embedded, distributed file system (like scale-out NAS)
  – Data protection built-in (multiple data copies but not RAID)
  – 1K node cluster w/ 1TB RAM per node = 1PB of very high performance storage
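
The arithmetic behind the last two points can be checked with a short sketch. The aggregate RAM figure comes straight from the slide (decimal units assumed, 1 PB = 1000 TB). The disk figure of 12 TB per node is a hypothetical value picked only to show how keeping multiple copies reduces usable capacity; 3 is the default HDFS replication factor. Both function names are made up for this example.

```python
TB_PER_PB = 1000  # decimal units, as the slide's round numbers imply


def aggregate_ram_pb(node_count, ram_tb_per_node):
    """Total cluster RAM in petabytes."""
    return node_count * ram_tb_per_node / TB_PER_PB


def usable_capacity_tb(raw_tb, replication_factor):
    """Disk capacity left for unique data when every block is stored N times."""
    return raw_tb / replication_factor


# Slide figure: 1K nodes with 1 TB of RAM each.
ram_pb = aggregate_ram_pb(1000, 1)

# Hypothetical: 12 TB of disk per node, replication factor 3.
raw_disk_tb = 1000 * 12
usable_tb = usable_capacity_tb(raw_disk_tb, 3)

print(ram_pb, "PB of aggregate RAM")
print(usable_tb, "TB usable out of", raw_disk_tb, "TB raw")
```

Replication protects data by copying it whole, so with three copies two thirds of the raw disk goes to redundancy, a larger overhead than typical parity-based RAID.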

Evaluating Hadoop as a Storage Device

• Single Points of Failure Eliminated?
• SSD and automated tiering?
• Dedupe?
• Snapshots?
• Insert your hot-button storage feature here: __________

Overheard at the Big Data Water Cooler “Hadoop is a revamp of how we store and access data”

Overheard at the Big Data Water Cooler “Hadoop is not about real time”

Overheard at the Big Data Water Cooler “The big elephant doesn’t move through the little pipes especially well”

Overheard at the Big Data Water Cooler “Hadoop is the new tape”

Overheard at the Big Data Water Cooler “If we don’t move on this, someone else will.”

Overheard at the Big Data Water Cooler “We don’t know the questions we may want to ask in the future”

Overheard at the Big Data Water Cooler “It’s not information overload. It’s filter failure”

Questions?

John Webster

[email protected]