Dremel: Interactive Analysis of Web-Scale Datasets

S. Melnik, A. Gubarev, J. Long, G. Romer, S. Shivakumar, M. Tolton
Google Inc.
VLDB 2010

Presented by Ke Hong (slides adapted from Melnik’s)


Outline

Problem

Motivation

Nested Columnar Storage

Query Execution

Evaluation

Summary

Interactive Query

Run a MapReduce to extract billions of signals from web pages
Launch Dremel to execute several interactive commands and find an irregularity in signal1
Feed the incoming input data into another MapReduce pipeline

DEFINE TABLE t AS /path/to/data/*
SELECT TOP(signal1, 100), COUNT(*) FROM t
. . .

Problem

How to do SQL-like aggregation queries

On trillion-row tables

On petabytes of data

Using thousands of CPUs

Executed within seconds

Serving thousands of concurrent users

Motivation

Analysis of crawled web documents

Tracking install data for applications on Android Market

Crash reporting for Google products

OCR results from Google Books

Spam analysis

Debugging of map tiles on Google Maps

Tablet migrations in managed Bigtable instances

Results of tests run on Google's distributed build system

Disk I/O statistics for hundreds of thousands of disks

Resource monitoring for jobs run in Google's data centers

Symbols and dependencies in Google's codebase

Motivation

Parallel DBMS
  Limited scalability
  Limited interactive speed
  Not fault tolerant

Translating SQL into MapReduce jobs
  HadoopDB, Pig Latin, Hive
  One job takes thousands of seconds
  Hard to support multi-user

Dremel Data Layout

Nested structure
  τ = dom | <A1:τ[*|?], ..., An:τ[*|?]>
  τ is an atomic type (dom) or a record type
  * marks a repeated field
  ? marks an optional field

[Figure: a nested schema (fields A, B, C, D, E, with * on repeated fields) and two records r1, r2, laid out column-oriented vs. row-oriented. Annotation on the column-oriented layout: read less, cheaper decompression.]

Row vs. Column

[Figure: execution time (sec) on 3000 nodes to process an 85-billion-record table. Bars annotated with 87 TB, 0.5 TB, and 0.5 TB.]

Nested Data Model

Schema (multiplicity shown in brackets):

message Document {
  required int64 DocId;          [1,1]
  optional group Links {
    repeated int64 Backward;     [0,*]
    repeated int64 Forward;
  }
  repeated group Name {
    repeated group Language {
      required string Code;
      optional string Country;   [0,1]
    }
    optional string Url;
  }
}

Sample record r1:

DocId: 10
Links
  Forward: 20
  Forward: 40
  Forward: 60
Name
  Language
    Code: 'en-us'
    Country: 'us'
  Language
    Code: 'en'
  Url: 'http://A'
Name
  Url: 'http://B'
Name
  Language
    Code: 'en-gb'
    Country: 'gb'

Sample record r2:

DocId: 20
Links
  Backward: 10
  Backward: 30
  Forward: 80
Name
  Url: 'http://C'
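As a rough illustration of how the field labels in the Document schema bound the levels stored with each column, the sketch below writes the schema as plain Python data and derives, for every leaf field, the largest repetition level (number of repeated fields on its path) and the largest definition level (number of optional or repeated fields on its path). The tuple layout and the name `leaf_levels` are assumptions made for this example, not part of Dremel.

```python
# Schema from the slide as (name, label, children); children is None for atomic fields.
# This encoding is a hypothetical convenience for the example.
DOCUMENT = [
    ("DocId", "required", None),
    ("Links", "optional", [
        ("Backward", "repeated", None),
        ("Forward", "repeated", None),
    ]),
    ("Name", "repeated", [
        ("Language", "repeated", [
            ("Code", "required", None),
            ("Country", "optional", None),
        ]),
        ("Url", "optional", None),
    ]),
]


def leaf_levels(fields, prefix="", max_r=0, max_d=0):
    """Return {dotted path: (max repetition level, max definition level)}."""
    levels = {}
    for name, label, children in fields:
        r = max_r + (label == "repeated")   # only repeated fields raise r
        d = max_d + (label != "required")   # optional and repeated fields raise d
        path = prefix + name
        if children is None:
            levels[path] = (r, d)
        else:
            levels.update(leaf_levels(children, path + ".", r, d))
    return levels


for path, (r, d) in leaf_levels(DOCUMENT).items():
    print(f"{path}: max r = {r}, max d = {d}")
```

For example, Name.Language.Country comes out with a maximum repetition level of 2 and a maximum definition level of 3, while the required DocId has 0 for both.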

Column-Striped Representation

DocId
value     r  d
10        0  0
20        0  0

Name.Url
value     r  d
http://A  0  2
http://B  1  2
NULL      1  1
http://C  0  2

Name.Language.Code
value     r  d
en-us     0  2
en        2  2
NULL      1  1
en-gb     1  2
NULL      0  1

Name.Language.Country
value     r  d
us        0  3
NULL      2  2
NULL      1  1
gb        1  3
NULL      0  1

Links.Forward
value     r  d
20        0  2
40        1  2
60        1  2
80        0  2

Links.Backward
value     r  d
NULL      0  1
10        0  2
30        1  2


Repetition & Definition Level

Name.Language.Code
value   r  d   path in the record
en-us   0  2   r1.Name1.Language1.Code: 'en-us'
en      2  2   r1.Name1.Language2.Code: 'en'
NULL    1  1   r1.Name2
en-gb   1  2   r1.Name3.Language1.Code: 'en-gb'
NULL    0  1   r2.Name1

Here r1 and r2 are the two sample documents (DocId 10 and DocId 20) from the Nested Data Model slide.

Repetition and definition levels encode the structural delta between the current value and the previous value.

r: length of the common path prefix, that is, the depth of the repeated field at which the value repeats (Name = 1, Language = 2); 0 marks the start of a new record
d: length of the current path, counting only the optional and repeated fields that are actually present (required fields such as Code do not count)
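The level rules can be checked mechanically. The following is a minimal sketch, assuming records are held as nested Python dicts and a column is described by a list of (field name, label) pairs; the function name `stripe` and this representation are hypothetical, and the code is not Dremel's actual column-striping algorithm. It walks one record along one path and emits (value, r, d) triples.

```python
def stripe(record, path):
    """Emit (value, r, d) triples for one column of one record.

    path is a list of (field name, label) pairs, where label is
    "required", "optional", or "repeated".
    """
    out = []

    def walk(node, i, r, d, depth):
        # depth counts the repeated fields already passed on the path
        if i == len(path):
            out.append((node, r, d))
            return
        name, label = path[i]
        if label == "repeated":
            items = node.get(name, [])
            if not items:
                out.append((None, r, d))      # NULL keeps the levels reached so far
                return
            for k, item in enumerate(items):
                # first item inherits r; later items repeat at this field's depth
                walk(item, i + 1, r if k == 0 else depth + 1, d + 1, depth + 1)
        elif name in node:
            walk(node[name], i + 1, r, d + (label == "optional"), depth)
        else:
            out.append((None, r, d))

    walk(record, 0, 0, 0, 0)
    return out


# The two sample documents from the slides.
r1 = {"DocId": 10,
      "Links": {"Forward": [20, 40, 60]},
      "Name": [{"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}],
                "Url": "http://A"},
               {"Url": "http://B"},
               {"Language": [{"Code": "en-gb", "Country": "gb"}]}]}
r2 = {"DocId": 20,
      "Links": {"Backward": [10, 30], "Forward": [80]},
      "Name": [{"Url": "http://C"}]}

CODE = [("Name", "repeated"), ("Language", "repeated"), ("Code", "required")]
COUNTRY = [("Name", "repeated"), ("Language", "repeated"), ("Country", "optional")]

code_column = stripe(r1, CODE) + stripe(r2, CODE)
for value, r, d in code_column:
    print(value, r, d)
```

Run on the two sample documents, this reproduces the five rows of the Name.Language.Code table; swapping in `COUNTRY` gives the Name.Language.Country column, where the optional Country field raises the definition level to 3 when present.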

Record Assembly

[Figure: finite state machine that assembles complete records from the six columns. Transitions labeled with repetition levels.]

from                    level(s)  to
DocId                   0         Links.Backward
Links.Backward          1         Links.Backward
Links.Backward          0         Links.Forward
Links.Forward           1         Links.Forward
Links.Forward           0         Name.Language.Code
Name.Language.Code      0,1,2     Name.Language.Country
Name.Language.Country   2         Name.Language.Code
Name.Language.Country   0,1       Name.Url
Name.Url                1         Name.Language.Code
Name.Url                0         end

Record Assembly

Simpler FSM to retrieve a subset of fields
Cheaper to execute
Useful for queries like /Name[2]/Language[1]/Country

FSM for the fields DocId and Name.Language.Country (transitions labeled with repetition levels):

from                    level(s)  to
DocId                   0         Name.Language.Country
Name.Language.Country   1,2       Name.Language.Country
Name.Language.Country   0         end

Assembled partial records:

s1
DocId: 10
Name
  Language
    Country: 'us'
  Language
Name
Name
  Language
    Country: 'gb'

s2
DocId: 20
Name
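To make the two-field assembly concrete, here is a small sketch that rebuilds s1 and s2 from the DocId values and the Name.Language.Country column. It is hand-specialized to these two fields rather than a generic FSM, and the function name `assemble` and the dict-based output shape are assumptions for illustration only. The repetition level decides where a value attaches (new record, new Name, or new Language), and the definition level decides how much of the path exists.

```python
def assemble(doc_ids, country_column):
    """Rebuild partial records from DocId values and (value, r, d) Country triples."""
    records = []
    ids = iter(doc_ids)
    for value, r, d in country_column:
        if r == 0:                     # repetition level 0: a new record starts
            records.append({"DocId": next(ids), "Name": []})
        names = records[-1]["Name"]
        if r <= 1 and d >= 1:          # repeat at Name (or new record): open a new Name
            names.append({"Language": []})
        if d >= 2:                     # Language is present on this path
            names[-1]["Language"].append({})
        if d == 3:                     # Country itself is defined
            names[-1]["Language"][-1]["Country"] = value
    return records


# Name.Language.Country column from the column-striped representation.
country_column = [("us", 0, 3), (None, 2, 2), (None, 1, 1), ("gb", 1, 3), (None, 0, 1)]

s1, s2 = assemble([10, 20], country_column)
print(s1)
print(s2)
```

In this representation a Name without any Language shows up as `{"Language": []}`, which corresponds to the bare `Name` lines in s1 and s2.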

Query Execution

SELECT DocId AS Id,
  COUNT(Name.Language.Code) WITHIN Name AS Cnt,
  Name.Url + ',' + Name.Language.Code AS Str
FROM t
WHERE REGEXP(Name.Url, '^http') AND DocId < 20;

Output table t1:

Id: 10
Name
  Cnt: 2
  Language
    Str: 'http://A,en-us'
    Str: 'http://A,en'
Name
  Cnt: 0

Output schema:

message QueryResult {
  required int64 Id;
  repeated group Name {
    optional uint64 Cnt;
    repeated group Language {
      optional string Str;
    }
  }
}

Multi-Level Serving Tree

[Figure: serving tree. client -> root server -> intermediate servers -> leaf servers (with local storage) -> storage layer (e.g., GFS)]

• Parallelize scheduling and aggregation
• Mostly one-pass aggregation
• Well suited for aggregation queries returning small and medium-sized results (<1M records)
• Common class of interactive queries

Multi-Level Serving Tree

• Root server: receive a query from the client and determine all tablets
• Root server: rewrite the query and route it to the intermediate servers
• Intermediate servers: rewrite the query into a set of partial queries and assign them to leaf servers
• Leaf servers: access the assigned tablets from the storage layer


Example: count()

Level 0 (root server):
  SELECT A, COUNT(B) FROM T GROUP BY A
  T = {/gfs/1, /gfs/2, …, /gfs/100000}
is rewritten as
  SELECT A, SUM(c) FROM (R^1_1 UNION ALL … R^1_n) GROUP BY A

Level 1 (R^1_i is the result of the partial query run by server i at level 1):
  R^1_1: SELECT A, COUNT(B) AS c FROM T^1_1 GROUP BY A
         T^1_1 = {/gfs/1, …, /gfs/10000}
  R^1_2: SELECT A, COUNT(B) AS c FROM T^1_2 GROUP BY A
         T^1_2 = {/gfs/10001, …, /gfs/20000}
  . . .

Level 3 (leaf servers, data access ops):
  SELECT A, COUNT(B) AS c FROM T^3_1 GROUP BY A
  T^3_1 = {/gfs/1}
  . . .
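The rewrite in the count() example works because COUNT can be computed per tablet and the partial counts summed on the way up. Below is a toy, single-process sketch of that idea; the in-memory tablets, the `fanout` parameter, and the function names are all assumptions made for the example, and nothing here models Dremel's actual servers or storage.

```python
def leaf_query(tablet):
    """SELECT A, COUNT(B) AS c FROM tablet GROUP BY A (a NULL B is not counted)."""
    counts = {}
    for a, b in tablet:
        counts[a] = counts.get(a, 0) + (b is not None)
    return counts


def merge(partials):
    """SELECT A, SUM(c) FROM (partials UNION ALL ...) GROUP BY A."""
    total = {}
    for partial in partials:
        for a, c in partial.items():
            total[a] = total.get(a, 0) + c
    return total


def serving_tree(tablets, fanout):
    # Each intermediate server merges the leaf results for `fanout` tablets;
    # the root then merges the intermediate results.
    intermediate = [
        merge(leaf_query(t) for t in tablets[i:i + fanout])
        for i in range(0, len(tablets), fanout)
    ]
    return merge(intermediate)


# Four tiny tablets of (A, B) rows; None stands for a NULL B.
tablets = [
    [("x", 1), ("y", None)],
    [("x", 2), ("x", None)],
    [("y", 3)],
    [("z", 4), ("x", 5)],
]

result = serving_tree(tablets, fanout=2)
print(result)
```

The result is the same as counting over all rows at once, which is the property that lets each level of the tree aggregate independently.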

Multi-Level Serving Tree

• Leaf servers: scan the assigned tablets and return partial results to the intermediate servers
• Intermediate servers: aggregate the partial results in parallel
• Root server: return the aggregate query result to the client

Query Dispatcher

Dremel as a multi-user system
  Schedule queries based on priorities and load balancing
Fault tolerance
When the number of tablets processed in one query is larger than the number of available execution threads on leaf servers
  Compute a histogram of tablet processing times
  Reschedule tablet processing on another server if a tablet takes a disproportionately long time
Internal execution tree as a physical query execution plan
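One way to picture the rescheduling rule is the sketch below. It is only an illustration under stated assumptions: the slide mentions a histogram of tablet processing times, while this example substitutes a simpler median-based cut-off, and the `factor` threshold, the function name `find_stragglers`, and the tablet ids are all hypothetical.

```python
from statistics import median


def find_stragglers(finished_secs, running_secs, factor=3.0):
    """Return ids of running tablets that look disproportionately slow.

    finished_secs: processing times (seconds) of tablets already completed
    running_secs:  {tablet id: seconds elapsed so far}
    factor:        hypothetical cut-off, as a multiple of the median finished time
    """
    if not finished_secs:
        return []                      # nothing to compare against yet
    cutoff = factor * median(finished_secs)
    return sorted(t for t, secs in running_secs.items() if secs > cutoff)


finished = [1.0, 1.2, 0.9, 1.1]
running = {"t7": 0.5, "t8": 4.0, "t9": 3.2}

# Tablets returned here would be candidates for rescheduling on another server.
print(find_stragglers(finished, running))
```

A dispatcher built along these lines would re-run the flagged tablets elsewhere and keep whichever copy finishes first; how Dremel itself chooses the threshold is not stated on the slide.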

Evaluation

Scalability of Dremel

On a trillion-row table T4 (105 TB, 50 fields):

SELECT TOP(aid, 20), COUNT(*) FROM T4
WHERE bid = {value1} AND cid = {value2}

[Figure: execution time (sec) vs. number of leaf servers]

Evaluation

Interactive speed of Dremel

[Figure: percentage of queries vs. execution time (sec), for the monthly query workload of one 3000-node Dremel instance]

Most queries complete within 10 sec

Summary

Propose a scenario for interactive queries on web-scale datasets
Design a columnar storage for Dremel’s nested data model
Develop Dremel’s SQL-like query language
Design a multi-level serving tree for efficient query execution
Evaluate Dremel’s scalability and interactive speed using web-scale workloads