IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT. SAP HANA DATABASE PRESENTED BY : GEORGE JOSEPH S7 CS ALPHA ROLL NO-39 RSET , KERALA.

AGENDA

•Revisiting Traditional RDBMS

•Defining IMDB

•A look at a few IMDB products in the market

•SAP HANA database in detail

What is a database ?

•An organised collection of information
•Allows reading and writing
•Provides authorisation and authentication
•Provides some level of data safety

Traditional RDBMS

•The relational model was developed by E. F. Codd in the early 1970s.
•The model is based on tables, rows, and columns, and on the manipulation of the data stored within them.
•A relational DB is the collection of all these tables.
•Examples: Oracle, MySQL, and Microsoft Access

Data store for typical RDBMS

•Data resides on disk.
•Data may be cached in memory for faster access.

PROBLEM

•Existing disk-based systems can no longer offer timely responses because of the high access latency of hard disks.
•This unacceptable performance is an obstacle to meaningful real-time services.
•E.g.: real-time bidding, advertising, social gaming, stock markets.

“Memory is the new disk, disk is the new tape”

Jim Gray, computer scientist and member of the IBM System R team

© 2013 SAP AG. All rights reserved. Public

Hardware Advances: Moore’s Law - DRAM Pricing

1980: Memory $10,000/MB

2000: Memory $1/MB

2013: Memory $0.004/MB

[Figure: memory cost / speed plotted over time]
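
To make the price drop concrete, the short calculation below works out what 1 TB of DRAM would cost at each of the three price points on this slide. The 1 TB working-set size and the binary definition of a terabyte (1024 × 1024 MB) are illustrative assumptions, not figures from the slide.

```python
# Price points taken from the slide (US dollars per megabyte of DRAM).
PRICE_PER_MB = {1980: 10_000.0, 2000: 1.0, 2013: 0.004}

# Assumption for illustration: a binary terabyte, i.e. 1024 * 1024 MB.
MB_PER_TB = 1024 * 1024


def cost_of_one_terabyte(year: int) -> float:
    """Return the cost in dollars of 1 TB of DRAM at that year's price."""
    return PRICE_PER_MB[year] * MB_PER_TB


for year in sorted(PRICE_PER_MB):
    print(f"{year}: ${cost_of_one_terabyte(year):,.0f} per TB")
```

Under these assumptions, 1 TB costs roughly $10.5 billion at the 1980 price, about $1.05 million at the 2000 price, and about $4,200 at the 2013 price.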

Hardware Advances: Moore’s Law - CPUs

2002
•1 core
•32 bits
•4MB

2007
•2 cores
•2 CPUs per server
•External controllers

2010
•8 cores, 16 threads per CPU
•4 CPUs per server
•On-chip memory control
•Quick interconnect
•VM and vector support
•64 bits; 256 GB - 1 TB

2013
•More cores, bigger caches
•16 ... 64 CPUs per server
•Greater on-chip integration (PCIe, network, ...)
•Data-direct I/O
•Tens of TBs

Images: Intel, Danilo Rizzuti / FreeDigitalPhotos.net

IN-MEMORY DATABASE SYSTEMS

•In an in-memory DB, data resides permanently in main memory.

•Source data is loaded into system memory in a compressed, non-relational format.

•Only a backup copy is kept on disk.

•Memory-optimised data structures are used.
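
The idea of "data lives in memory, only a backup lives on disk" can be sketched with a toy key-value store. This is a minimal illustration and not how any real IMDB is implemented: the class name `InMemoryStore`, its methods, and the JSON backup format are all hypothetical, and the sample row is borrowed from the sales table used later in the deck.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class InMemoryStore:
    """Toy key-value store: every read and write is served from a dict in RAM."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def put(self, key: str, row: dict) -> None:
        self._rows[key] = row

    def get(self, key: str) -> Optional[dict]:
        return self._rows.get(key)

    def save_backup(self, path: str) -> None:
        # Write to a temporary file first, then rename it over the target,
        # so an interrupted write does not leave a half-written backup.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self._rows, f)
        os.replace(tmp_path, path)

    @classmethod
    def load_backup(cls, path: str) -> "InMemoryStore":
        # After a restart, the in-memory state is rebuilt from the disk copy.
        store = cls()
        with open(path, encoding="utf-8") as f:
            store._rows = json.load(f)
        return store


backup_path = os.path.join(tempfile.mkdtemp(), "backup.json")

store = InMemoryStore()
store.put("456", {"country": "France", "product": "corn", "sales": 1000})
store.save_backup(backup_path)

restored = InMemoryStore.load_backup(backup_path)
print(restored.get("456"))
```

The disk is touched only by `save_backup` and `load_backup`; normal reads and writes never leave memory.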

Disk VS Memory

•Access time for main memory is orders of magnitude lower than for disk.
•Main memory is normally volatile, while disk storage is not.
•Data layout is much more critical on disk than in main memory, because sequential disk access is far faster than random access.
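
A rough calculation shows what "orders of magnitude" means in practice. The two latencies below are assumed round numbers (about 100 ns for a random DRAM access, about 10 ms for a random hard-disk access); they are not taken from the slides, and real hardware varies widely.

```python
# Assumed order-of-magnitude latencies; not measurements.
DRAM_ACCESS_SECONDS = 100e-9  # ~100 nanoseconds per random memory access
DISK_ACCESS_SECONDS = 10e-3   # ~10 milliseconds per random disk access


def total_seconds(accesses: int, seconds_per_access: float) -> float:
    """Time needed for a number of independent random accesses."""
    return accesses * seconds_per_access


accesses = 1_000_000
memory_time = total_seconds(accesses, DRAM_ACCESS_SECONDS)
disk_time = total_seconds(accesses, DISK_ACCESS_SECONDS)
ratio = disk_time / memory_time

print(f"memory: {memory_time:.3f} s")
print(f"disk:   {disk_time / 3600:.2f} h")
print(f"disk is about {ratio:,.0f}x slower")
```

With these assumed figures, one million random lookups take about a tenth of a second in memory and close to three hours on disk.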

MMDB PRODUCTS AVAILABLE

•SAP HANA is a leading IMDB product. It is also a platform for big data processing, analysis, and prediction.
•SAP HANA can help businesses build real-time applications and analytics that accelerate their processes.

•In-Memory
•Column Database
•Massively Parallel Processing
•Optimized Calculation Engine

•Columnar storage increases the amount of data that can be stored in limited memory (compared to disk)
•Column databases enable easier parallelization of queries
•Row buffer for fast transactional processing
•In-memory processing gives more time for relatively slow updates to column data
•In-memory allows sophisticated calculations in real time
•MPP-optimized software enables linear performance scaling, making sophisticated calculations like allocations possible

Each technology works well on its own, but combining them all is the real opportunity: the combination provides all of the upside benefits while mitigating the downsides.

SAP in-memory innovations make the “New Way” a reality
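
The claim that column data is easy to parallelize can be sketched as follows: a column is split into contiguous partitions, each partition is aggregated independently, and the partial results are combined. This is only an illustration of the structure; the function names are hypothetical, and in CPython a thread pool will not actually speed up a CPU-bound sum. A real MPP engine distributes partitions across cores and servers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(column, parts):
    """Split a column into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(column), parts)
    partitions, start = [], 0
    for i in range(parts):
        # The first `extra` partitions receive one additional element.
        end = start + size + (1 if i < extra else 0)
        partitions.append(column[start:end])
        start = end
    return partitions


def parallel_sum(column, workers=4):
    """Sum each partition in its own task, then combine the partial sums."""
    partitions = split_into_partitions(column, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, partitions))
    return sum(partial_sums)


sales = [1000, 900, 600, 800]
total = parallel_sum(sales, workers=2)
print(total)
```

Because a column holds values of a single attribute in one contiguous sequence, cutting it into partitions needs no knowledge of the other attributes in the table.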

SAP HANA: Column Store

Order  Country  Product  Sales
456    France   corn     1000
457    Italy    wheat    900
458    Italy    corn     600
459    Spain    rice     800

Typical Database (row order)
456 France corn 1000 | 457 Italy wheat 900 | 458 Italy corn 600 | 459 Spain rice 800

SAP HANA (column order)
456 457 458 459 | France Italy Italy Spain | corn wheat corn rice | 1000 900 600 800

SELECT Country, SUM(Sales) FROM SalesOrders WHERE Product = 'corn' GROUP BY Country
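
The difference between the two layouts, and how the query above runs against the column layout, can be sketched in a few lines. The data is the four-row table from this slide; the function name `sales_by_country` and the dict-of-lists representation are illustrative assumptions, not HANA internals.

```python
from collections import defaultdict

# Row order: each record is stored together.
rows = [
    (456, "France", "corn", 1000),
    (457, "Italy", "wheat", 900),
    (458, "Italy", "corn", 600),
    (459, "Spain", "rice", 800),
]

# Column order: all values of one attribute are stored together.
columns = {
    "Order": [r[0] for r in rows],
    "Country": [r[1] for r in rows],
    "Product": [r[2] for r in rows],
    "Sales": [r[3] for r in rows],
}


def sales_by_country(columns, product):
    """SELECT Country, SUM(Sales) ... WHERE Product = product GROUP BY Country."""
    totals = defaultdict(int)
    # Scan only the Product column; touch Country and Sales for matching rows.
    for row_index, value in enumerate(columns["Product"]):
        if value == product:
            totals[columns["Country"][row_index]] += columns["Sales"][row_index]
    return dict(totals)


result = sales_by_country(columns, "corn")
print(result)
```

The query reads three of the four columns and never touches the Order column, whereas a row-ordered scan would have to step over every field of every record.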

SAP HANA: Data Compression

•Efficient compression methods (dictionary, run length, cluster, prefix, etc.)
•Compression works well with columns and can speed up operations on columns (~ factor 10)
•Because of compression, changes are written into a less compressed delta storage
•The delta needs to be merged into the columns from time to time, or when a certain size is exceeded
•Delta merge can be done in the background
•Trade-off between compression ratio and delta merge runtime
•Updates go into delta data storage and are periodically merged into main data storage
•High write performance is not affected by compression
•Data is written to delta storage with less compression, which is optimized for write access. It is merged into the main area of the column store later on.
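
The delta/main split described above can be sketched as a single column with two parts: a compressed, read-optimized main store and an uncompressed, write-optimized delta. The class below is a simplified illustration under stated assumptions: the name `ColumnWithDelta`, the size threshold, and the use of dictionary encoding for the main store are choices made for this example, and the merge runs synchronously here although the slide notes it can run in the background.

```python
class ColumnWithDelta:
    """One column: dictionary-encoded main store plus an append-only delta."""

    def __init__(self, merge_threshold=4):
        self.dictionary = []  # sorted distinct values of the main store
        self.main_ids = []    # one value ID per row in the main store
        self.delta = []       # recent writes, kept uncompressed
        self.merge_threshold = merge_threshold

    def append(self, value):
        # Writes are cheap: no dictionary lookup, no re-encoding.
        self.delta.append(value)
        if len(self.delta) >= self.merge_threshold:
            self.merge_delta()

    def merge_delta(self):
        # Decode the main store, add the delta rows, then rebuild the
        # dictionary and re-encode everything into a new main store.
        values = [self.dictionary[i] for i in self.main_ids] + self.delta
        self.dictionary = sorted(set(values))
        position = {value: i for i, value in enumerate(self.dictionary)}
        self.main_ids = [position[value] for value in values]
        self.delta = []

    def scan(self):
        """Readers see main and delta together, in insertion order."""
        return [self.dictionary[i] for i in self.main_ids] + self.delta


col = ColumnWithDelta(merge_threshold=3)
for product in ["corn", "wheat", "corn", "rice"]:
    col.append(product)

print(col.dictionary, col.main_ids, col.delta)
```

The trade-off from the slide shows up directly: a larger threshold means fewer expensive merges but more uncompressed data sitting in the delta.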

SAP HANA: Dictionary Compression

Column “Name” (uncompressed)
Jones, Miller, Millman, Zsuwalski, Baker, Miller, John, Miller, Johnson, Jones

Dictionary (sorted)
•Values: Baker, John, Johnson, Jones, Miller, Millman, ..., Zsuwalski
•Value IDs: 0 to N
•Value ID implicitly given by sequence in which values are stored

Column “Name” (dictionary compressed)
•Value-ID sequence: 4, 1, 5, N, 0, 4, 2, 4, 3, 1
•One element for each row in column
•Value IDs point into dictionary
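
Dictionary compression as described on this slide can be sketched in a few functions: build a sorted dictionary of distinct values, then replace every row value with its position in that dictionary. The example uses the Country column from the earlier sales table; the function names are hypothetical, and the lookup helper is an added illustration of why a sorted dictionary is convenient.

```python
from bisect import bisect_left


def dictionary_encode(column):
    """Return (sorted dictionary, value-ID sequence) for a column."""
    dictionary = sorted(set(column))
    # The value ID is implicit: it is the position in the sorted dictionary.
    value_id = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [value_id[value] for value in column]


def dictionary_decode(dictionary, ids):
    """Rebuild the original column from the dictionary and the ID sequence."""
    return [dictionary[i] for i in ids]


def rows_matching(dictionary, ids, value):
    """Find rows equal to `value` by comparing integers, not strings."""
    pos = bisect_left(dictionary, value)  # binary search in sorted dictionary
    if pos == len(dictionary) or dictionary[pos] != value:
        return []  # value does not occur in the column at all
    return [row for row, vid in enumerate(ids) if vid == pos]


countries = ["France", "Italy", "Italy", "Spain"]
dictionary, ids = dictionary_encode(countries)

print(dictionary)
print(ids)
print(rows_matching(dictionary, ids, "Italy"))
```

Each distinct string is stored once, the column itself shrinks to a sequence of small integers, and a predicate such as Country = 'Italy' needs one dictionary lookup followed by integer comparisons.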

SAP HANA: Scalability

Scales from very small servers to very large clusters

Single Server
•2 CPU 128GB to 8 CPU 1TB

Scale Out Cluster
•2 to n servers per cluster
•Largest certified configuration: 16 servers
•Largest tested configuration: 100+ servers
•Support for high availability and disaster tolerance

Cloud Deployment

What is inside HANA?

ACID Compliant Database
•In-Memory
•Column Store

Components
•Parallel Execution
•Scripting Engine
•Business Function Library
•Unstructured (Text)
•Predictive Analysis Library
•OLAP
•XS App Server
•“R” Integration
•ESP
•Spatial / Geospatial
•Query Federation: 1. IQ / ASE 2. Teradata / Oracle 3. Hadoop
•Replication Services: 1. Near Real Time 2. Non-SAP
•HANA Studio

In
•Data Services: 1. Batch Transfer 2. SAP & Non-SAP 3. Extensive Transformations 4. Structured & Unstructured 5. Hadoop Integration

Out
•SQL: 1. ODBC / JDBC 2. 3rd Party Apps 3. 3rd Party Tools
•BICS: 1. BICS 2. NetWeaver BW 3. SAP BOBJ
•MDX: 1. ODBO 2. MS Excel 3. 3rd Party OLAP Tools
•JSON / XML: 1. HTTP 2. RESTful services 3. OData Compliant
