USER-CENTRIC DATA MANAGEMENT IN THE ERA OF BIG DATA

Alexandros Labrinidis
Advanced Data Management Technologies Lab
Department of Computer Science
University of Pittsburgh
http://labrinidis.cs.pitt.edu

© 2014 Alexandros Labrinidis, University of Pittsburgh
October 8, 2014



Data-Intensive Science

Data-intensive science

Observational Simulation

One (virtual) instrument

Multiple instruments

✤ SDSS: Sloan Digital Sky Survey (2000 - ): 200 GB/night

✤ LSST: Large Synoptic Survey Telescope (2015 - ): 30 TB/night -- 1.28PB/year

✤ LHC: Large Hadron Collider: 15 PB/year

✤ SKA: Square Kilometer Array (2019 - ): 10 PB/hour


Data-Intensive Science


Data-intensive science

Observational Simulation

One (virtual) instrument

Multiple instruments

✤ Gene Sequencing

✤ Personalized Medicine


Data-Intensive Science


Data-intensive science

Observational Simulation

One (virtual) instrument

Multiple instruments

✤ Climate Modeling

✤ Turbulent Combustion Flow


What’s the Big Deal with Big Data?

•  Featured on the cover of Nature and the Economist!


What’s the Big Deal with Big Data?

• And even has a Dilbert Cartoon!


Big Data Definition - The three Vs

• Volume - size does matter!

• Velocity - data at speed, i.e., the data “fire-hose”

• Variety - heterogeneity is the rule


Five more Vs

• Variability - rapid change of data characteristics over time

• Veracity - ability to handle uncertainty, inconsistency, etc.

• Visibility - protect privacy and provide security

• Value - usefulness & ability to find the right hay-colored needle in the haystack

• Voracity - strong appetite for data!


Enter Moore’s Law

[ Wikipedia Image ]

Moore's law is the observation that, over the history of computing hardware, the number of transistors in a dense integrated circuit doubles approximately every two years. The law is named after Gordon E. Moore, co-founder of Intel Corporation, who described the trend in his 1965 paper.

Source: http://en.wikipedia.org/wiki/Moore's_law
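
As a rough illustration of the doubling rule quoted above: if a quantity doubles every two years, then after t years it has grown by a factor of 2^(t/2). The sketch below is not part of the original talk; the function name `growth_factor` and the sample time horizons are assumptions chosen for illustration only.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Multiplicative growth after `years`, assuming one doubling per period.

    Illustrative helper (hypothetical name): it encodes only the
    "doubles approximately every two years" rule of thumb.
    """
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Sample horizons are arbitrary and only show how fast doubling compounds.
    for years in (2, 10, 20):
        print(f"after {years:2d} years: x{growth_factor(years):,.0f}")
```

Under this idealized rule, ten years correspond to five doublings (a factor of 32) and twenty years to ten doublings (a factor of 1024).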


Enter Bezos’ Law

Photo: http://www.slashgear.com/google-data-center-hd-photos-hit-where-the-internet-lives-gallery-17252451/

Bezos' law is the observation that, over the history of the cloud, the price of a unit of computing power drops by 50% approximately every 3 years.

Source: http://blog.appzero.com/blog/futureofcloud
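
To make the halving rule concrete: a price that halves every three years falls to 0.5^(t/3) of its starting value after t years. The sketch below is only an illustration of that arithmetic, not something from the original talk; the function name `relative_price` and the sample horizons are assumptions.

```python
def relative_price(years: float, halving_period_years: float = 3.0) -> float:
    """Fraction of the starting price left after `years`.

    Illustrative helper (hypothetical name): it encodes only the
    "reduced by 50% approximately every 3 years" rule of thumb.
    """
    return 0.5 ** (years / halving_period_years)


if __name__ == "__main__":
    # Sample horizons are arbitrary multiples of the halving period.
    for years in (3, 6, 9):
        print(f"after {years} years: {relative_price(years):.1%} of the starting price")
```

Under this idealized rule, the same unit of computing power costs one eighth of its starting price after nine years.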


Storage capacity increase

[ Chart: HDD Capacity (GB), axis from 0 to 7000. Wikipedia Data ]

Insert other exponentially increasing graphs here (e.g., data generation rates, world-wide smartphone access rates, Internet of Things, …)


But

• Human processing capacity remains roughly the same!


We refer to this as the:

Big Data – Same Humans Problem


About the ADMT Lab

• Directed by:
  • Panos K. Chrysanthis
  • Alexandros Labrinidis

• Established in 1995

• Currently: 5 PhD students

• Our “slogan”: User-centric data management for network-centric applications


Look at the entire data lifecycle


AQSIOS - A DSMS Architecture

[ Architecture diagram. Components: data stream sources; query optimizer; query networks (Q1, Q2, Q3); continuous queries; stream applications; scheduler; statistics collector; load manager. The administrator sets the delay targets and priorities for queries. ]

AQSIOS is the DSMS prototype developed at our ADMT Lab. It is built on top of the STREAM prototype from Stanford.


DILoS evaluation – QoS and QoD

                          Average response time (ms)      Average data loss (%)
                          Class 1   Class 2    Class 3    Class 1   Class 2   Class 3
No load manager              3.40      3.53   56541.69          0         0         0
Common load manager          3.00      3.13     517.07      11.42     11.43     11.60
Per-class load manager       3.55      3.75     492.84          0         0     35.95
DILoS                        4.28      4.38      42.95          0         0         0


Style of research

• Emphasis on systems and algorithms

• Building real systems
  • Often based on academic prototypes (e.g., STREAM from Stanford) or on top of well-known open-source software (e.g., Storm)

• Experimenting using real systems and simulation

• Comparing alternatives
  • Should we do grouping of queries in way A or way B?
  • If we do 4 different optimizations, what is the relative benefit of each one?
  • In which cases would a certain algorithm be better than another?


Types of projects for undergrads

• Upcoming:
  • web-based user interface to visualize run-time behavior of a real system

• Past:
  • clustering of tweets
  • web-based interfaces to different database back-ends
  • REST APIs for remote data access
  • application to coordinate supernovae observations
  • monitoring application for transient astronomical events


More info

• [1] The Beckman Report on Database Research, by Abadi et al., October 2013. http://beckman.cs.wisc.edu

• [2] Big Data and Its Technical Challenges, by Jagadish, Gehrke, Labrinidis, Papakonstantinou, Patel, Ramakrishnan, and Shahabi, Communications of the ACM, July 2014. http://bit.ly/bigdatachallenges (over 4,500 downloads)

• [3] Contact me: http://labrinidis.cs.pitt.edu/contact