70 Everest-PGCon RT

Page 1: 70 Everest-PGCon RT

Everest: Scaling to Petabytes

Yahoo!

May 2008

Page 2: 70 Everest-PGCon RT

Everest Architecture

Clients

Query Server

Storage Server

• Massively Parallel (Tens of PB)
– Commodity Clusters
– Multi-tier scalability
– Distributed Columnar Storage

• Smart
– Optimized compression
– Parallel Vector Query Processing
– Query and Storage optimizations
– Query Expression and Columnar caching

• Leverage PostgreSQL
– Tools and Connectivity (ODBC)
– Extensibility
– UDF & UDAF framework

• Inexpensive
– COTS
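The multi-tier design listed above (a query tier fanning work out to a storage tier that holds data by column) can be illustrated with a minimal scatter-gather sketch. Everything in it is a hypothetical stand-in: the in-memory `NODE_PARTITIONS`, the `partial_sum` and `distributed_sum` helpers, and the use of threads in place of networked storage servers are assumptions for illustration, not Everest's actual interfaces.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical storage nodes: each holds one partition of a table, stored by column.
NODE_PARTITIONS = [
    {"clicks": [3, 1, 4], "country": ["us", "in", "us"]},
    {"clicks": [1, 5, 9], "country": ["in", "us", "br"]},
    {"clicks": [2, 6], "country": ["br", "us"]},
]


def partial_sum(partition, country):
    """Storage-node side: scan only the two columns the query touches."""
    return sum(
        clicks
        for clicks, key in zip(partition["clicks"], partition["country"])
        if key == country
    )


def distributed_sum(country):
    """Query-server side: fan out to every node, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=len(NODE_PARTITIONS)) as pool:
        partials = pool.map(lambda p: partial_sum(p, country), NODE_PARTITIONS)
        return sum(partials)


print(distributed_sum("us"))  # 18
```

Because each node returns a single partial sum rather than its rows, the amount of data crossing the network stays small no matter how large the partitions are.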

[Figure: Everest architecture diagram. Component labels, as they appear in the diagram:
Client side: Scripts / Apps, PgAdmin, PERL/Ruby DBI, ADO.NET, ODBC, PQLib.
Query side: PostgreSQL Server, PostgreSQL Lib, Everest Extensions, Distributed QP, QP, Segmentation Platform, Segment Manager, Shared Memory, Trans Proxy, Mgmt Proxy, LSM Proxy, Trans Server, Mgmt Services, Logical Storage Manager.
Storage side: Node Storage Manager, Volume Storage Manager, Volume Metadata, Chunk Storage, Storage Services, Volume (x3).
Labels that appear twice: Storage Provider, Storage Proxy, Storage Cache, Asynchronous Communications.]

Page 3: 70 Everest-PGCon RT

Performance and Scale

Data size              Everest (min)   Vendor A (min)   Vendor B (min)
90 TB (600 B rows)     177             414              325
30 TB (200 B rows)     60              95               91
HW Cost (1 PB)         250             1200             1200

• Proven petabyte scale in production
– Approaching 2 PB, projected to grow beyond 30 PB by 2009
– Largest table: 3.5 trillion rows (time partitioned)

• 10x Price-Performance relative to commercial systems
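One way to see where a figure near 10x could come from is to combine the response times and hardware costs in the table. The slides do not define price-performance, so the metric used here (query speedup multiplied by hardware cost ratio) is an assumption, and `price_performance_ratio` is a hypothetical helper.

```python
# Figures copied from the comparison table; the metric itself is an assumption.
response_min = {
    "90 TB": {"Everest": 177, "Vendor A": 414, "Vendor B": 325},
    "30 TB": {"Everest": 60, "Vendor A": 95, "Vendor B": 91},
}
hw_cost_1pb = {"Everest": 250, "Vendor A": 1200, "Vendor B": 1200}


def price_performance_ratio(size, vendor):
    """Everest's speedup over `vendor` times the hardware cost ratio (assumed metric)."""
    speedup = response_min[size][vendor] / response_min[size]["Everest"]
    cost_ratio = hw_cost_1pb[vendor] / hw_cost_1pb["Everest"]
    return speedup * cost_ratio


for size in response_min:
    for vendor in ("Vendor A", "Vendor B"):
        print(f"{size} vs {vendor}: {price_performance_ratio(size, vendor):.1f}x")
# 90 TB vs Vendor A: 11.2x
# 90 TB vs Vendor B: 8.8x
# 30 TB vs Vendor A: 7.6x
# 30 TB vs Vendor B: 7.3x
```

Under this assumed definition the ratios fall between roughly 7x and 11x, with the 90 TB workload closest to the 10x quoted on the slide.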

[Figure: Performance comparison. Bar chart of response time (min, axis 0 to 500) by data size (30 TB, 90 TB) for Vendor A, Vendor B, and Everest.]

Page 4: 70 Everest-PGCon RT

Everest Performance Advantages

• Source of Performance and Scale
– Distributed Compressed Columnar Storage
– Highly Parallel and Asynchronous
  • Multi-threaded Query Execution as well as Storage
– Vector Query Processing
– Multi-level data partitioning and query partitioning
– Cluster-level Compressed Columnar caching
– Query expression caching
– Yahoo!-specific language extensions and UDF & UDAF
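To make the pairing of compressed columnar storage with query processing concrete, here is a minimal sketch in which an aggregate runs directly on a compressed column. Run-length encoding is used purely as an illustrative assumption, since the slides do not say which compression schemes Everest applies; `rle_encode`, `rle_sum`, and `rle_count_equal` are hypothetical helpers.

```python
from itertools import groupby


def rle_encode(values):
    """Compress a column into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_sum(runs):
    """Sum the column without decompressing it: one multiply-add per run."""
    return sum(value * length for value, length in runs)


def rle_count_equal(runs, target):
    """Count rows equal to `target` by inspecting each run once."""
    return sum(length for value, length in runs if value == target)


column = [7, 7, 7, 7, 2, 2, 9, 9, 9]
runs = rle_encode(column)

print(runs)                      # [(7, 4), (2, 2), (9, 3)]
print(rle_sum(runs))             # 59
print(rle_count_equal(runs, 9))  # 3
```

The aggregate touches three runs instead of nine rows. On sorted or low-cardinality columns this kind of saving grows with run length, which is one reason columnar layouts and compression are commonly used together.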