36
Defining Architecture Components of the Big Data Ecosystem Yuri Demchenko SNE Group, University of Amsterdam 2 nd BDDAC2014 Symposium, CTS2014 Conference 19-23 May 2014, Minneapolis, USA

Defining Architecture Components of the Big Data Ecosystem

Embed Size (px)

Citation preview

Page 1: Defining Architecture Components of the Big Data Ecosystem

Defining Architecture Components of

the Big Data Ecosystem

Yuri Demchenko

SNE Group, University of Amsterdam

2nd BDDAC2014 Symposium, CTS2014 Conference

19-23 May 2014, Minneapolis, USA

Page 2: Defining Architecture Components of the Big Data Ecosystem

Outline

• Big Data and Data Intensive Science as a new technology wave

– The Fourth Paradigm

• Big Data definition: From 6 Vs to 5 parts

• Big Data technology drivers

– Where do the data come from? What are Big Data drivers?

• Big Data: Paradigm change and new challenges

– From Big Data to All-Data – Moving to data centric service models

• Defining Big Data Architecture Framework (BDAF)

– Big Data Infrastructure (BDI) and Big Data Analytics infrastructure/tools

• Summary and Discussion

BDDAC2014 @CTS2014 Big Data Architecture Framework Slide_2

Page 3: Defining Architecture Components of the Big Data Ecosystem

Big Data and Security Research at System and

Network Engineering, University of Amsterdam

• Long time research and development on Infrastructure services and facilities

– High speed optical networking and data intensive applications

– Semantic description of infrastructure and network services

– Collaborative systems, Grid, Clouds and currently Big Data

• Focus on Infrastructure definition and services

– Software Defined Infrastructure based on Cloud/Intercloud technologies

– Dynamically provisioned security infrastructure and services

• NIST Big Data Working Group

– Contribution to Reference Architecture, Big Data Definition and Taxonomy, Big Data

Security

• Research Data Alliance (RDA)

– Interest Group on Education and Skills Development on Data Intensive Science

– Big Data Analytics Interest Group

• Big Data Interest Group at UvA

– Non-formal but active, meets two-weekly/monthly

– Provided input to NIST BD-WG and RDA activities

BDDAC2014 @CTS2014 Big Data Architecture Framework 3

Page 4: Defining Architecture Components of the Big Data Ecosystem

Technology Definitions and Timeline - Overview

• Service Oriented Architecture (SOA): First proposed in 1996 and revived with

the Web Services advent in 2001-2002

– Currently standard for industry, and widely used

– Provided a conceptual basis for Web Services development

• Computer Grids: Initially proposed in 1998 and finally shaped in 2003 with the

Open Grid Services Architecture (OGSA) by Open Grid Forum (OGF)

– Currently remains as a collaborative environment

– Migrates to cloud and inter-cloud platform

• Cloud Computing: Initially proposed in 2008 – Now in a productive phase

– Defined new features, capabilities, operational/usage models and actually provided

a guidance for the new technology development

– Originated from the Service Computing domain and service management focused

• Big Data and Data Intensive Science: Yet to be defined

– Involves more components and processes to be included into the definition

– Can be better defined as Ecosystem where data are the main driving component

– Need to define the Big Data properties, expected technology capabilities and provide a

guidance/vision for future technology development

BDDAC2014 @CTS2014 Big Data Architecture Framework 4

Page 5: Defining Architecture Components of the Big Data Ecosystem

Visionaries and Drivers:

Seminal works, High level reports, Activities

The Fourth Paradigm: Data-Intensive Scientific Discovery.

By Jim Gray, Microsoft, 2009. Edited by Tony Hey, et al.

http://research.microsoft.com/en-us/collaboration/fourthparadigm/

BDDAC2014 @CTS2014 Big Data Architecture Framework 5

Riding the wave: How Europe can gain from

the rising tide of scientific data.

Final report of the High Level Expert Group on

Scientific Data. October 2010.

http://cordis.europa.eu/fp7/ict/e-

infrastructure/docs/hlg-sdi-report.pdf

AAA Study: Study on

AAA Platforms For

Scientific

data/information

Resources in Europe,

TERENA, UvA, LIBER,

UinvDeb.

(2011-2012)

https://www.rd-alliance.org/

NIST Big Data Working Group (NBD-WG)

https://www.rd-alliance.org/

ISO/IEC JTC1 Big Data Study Group (SGBD)

http://jtc1bigdatasg.nist.gov/home.php

Page 6: Defining Architecture Components of the Big Data Ecosystem

The Fourth Paradigm of Scientific Research

1. Theory and logical reasoning

2. Observation or Experiment

– E.g. Newton observed apples falling to design his theory of

mechanics

– But Gallileo Galilei made experiments with falling objects from the

Pisa leaning tower

3. Simulation of theory or model

– Digital simulation can prove theory or model

4. Data-driven Scientific Discovery (aka Data Science)

– More data beat hypnotized theory

BDDAC2014 @CTS2014 Big Data Architecture Framework 6

Page 7: Defining Architecture Components of the Big Data Ecosystem

Gartner Technology Hypercycle (October 2013)

Source http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp

BDDAC2014 @CTS2014 Big Data Architecture Framework 7

Big Data

Cloud Computing

Page 8: Defining Architecture Components of the Big Data Ecosystem

Our/SNE Big Data Technology Research Cycle

Source http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp

BDDAC2014 @CTS2014 Big Data Architecture Framework 8

Big Data

Cloud Computing

Mid 2014

Mid-End 2013

2012

2011

End 2014

Active research into Big Data domain definition

Building community links

Remote BD technology following.

EU Study AAA for Research Data

Main research in Cloud/Intercloud

Component technologies mastering

Education courses development

Active and productive research

Teaching on Big Data Tech/Infra

New style of technology development

Technology consolidation

Page 9: Defining Architecture Components of the Big Data Ecosystem

Big Data Definitions Overview

• IDC definition of Big Data (conservative and strict approach) :"A new generation of technologies and architectures designed to economically extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis“

• Gartner definitionBig data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.http://www.gartner.com/it-glossary/big-data/

– Termed as 3 parts definition, not 3V definition

• Big Data: a massive volume of both structured and unstructured data that is so large that it's difficult to process using traditional database and software techniques.

– From “The Big Data Long Tail” blog post by Jason Bloomberg (Jan 17, 2013). http://www.devx.com/blog/the-big-data-long-tail.html

• “Data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the structures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”

– Ed Dumbill, program chair for the O’Reilly Strata Conference

BDDAC2014 @CTS2014 Big Data Architecture Framework 9

Page 10: Defining Architecture Components of the Big Data Ecosystem

Improved: 6 (5+1) V’s of Big Data

BDDAC2014 @CTS2014 Big Data Architecture Framework 10

• Trustworthiness

• Authenticity

• Origin, Reputation

• Availability

• Accountability

Veracity

• Batch

• Real/near-time

• Processes

• Streams

Velocity

• Changing data

• Changing model

• Linkage

Variability

• Correlations

• Statistical

• Events

• Hypothetical

Value

• Terabytes

• Records/Arch

• Tables, Files

• Distributed

Volume

• Structured

• Unstructured

• Multi-factor

• Probabilistic

• Linked

• Dynamic

Variety

6 Vs of

Big Data

Generic Big Data

Properties

• Volume

• Variety

• Velocity

Acquired Properties

(after entering system)

• Value

• Veracity

• Variability

Commonly accepted

3V’s of Big Data

Adopted in general

by NIST BD-WG

Page 11: Defining Architecture Components of the Big Data Ecosystem

Big Data Definition: From 6V to 5 Parts (1)

(1) Big Data Properties: 5V– Volume, Variety, Velocity, Value, Veracity

– Additionally: Data Dynamicity (Variability)

(2) New Data Models– Data Lifecycle and Variability

– Data linking, provenance and referral integrity

(3) New Analytics– Real-time/streaming analytics, interactive and machine learning analytics

(4) New Infrastructure and Tools– High performance Computing, Storage, Network

– Heterogeneous multi-provider services integration

– New Data Centric (multi-stakeholder) service models

– New Data Centric security models for trusted infrastructure and data processing and storage

(5) Source and Target– High velocity/speed data capture from variety of sensors and data sources

– Data delivery to different visualisation and actionable systems and consumers

– Full digitised input and output, (ubiquitous) sensor networks, full digital control

BDDAC2014 @CTS2014 Big Data Architecture Framework 11

Page 12: Defining Architecture Components of the Big Data Ecosystem

Big Data Definition: From 6V to 5 Parts (1)

(1) Big Data Properties: 5V– Volume, Variety, Velocity, Value, Veracity

– Additionally: Data Dynamicity (Variability)

(2) New Data Models– Data linking, provenance and referral integrity

– Data Lifecycle and Variability/Evolution

(3) New Analytics– Real-time/streaming analytics, interactive and machine learning analytics

(4) New Infrastructure and Tools– High performance Computing, Storage, Network

– Heterogeneous multi-provider services integration

– New Data Centric (multi-stakeholder) service models

– New Data Centric security models for trusted infrastructure and data processing and storage

(5) Source and Target– High velocity/speed data capture from variety of sensors and data sources

– Data delivery to different visualisation and actionable systems and consumers

– Full digitised input and output, (ubiquitous) sensor networks, full digital control

BDDAC2014 @CTS2014 Big Data Architecture Framework 12

Page 13: Defining Architecture Components of the Big Data Ecosystem

Big Data Definition: From 6V to 5 Parts (2)

Refining Gartner definition

“Big data is (1) high-volume, high-velocity and high-variety information assets that demand (3) cost-effective, innovative forms of information processing for (5) enhanced insight and decision making”

• Big Data (Data Intensive) Technologies are targeting to process (1) high-volume, high-velocity, high-variety data (sets/assets) to extract intended data value and ensure high-veracity of original data and obtained information that demand cost-effective, innovative forms of data and information processing (analytics) for enhanced insight, decision making, and processes control; all of those demand (should be supported by) new data models (supporting all data states and stages during the whole data lifecycle) and new infrastructure services and tools that allows also obtaining (and processing data) from a variety of sources (including sensor networks) and delivering data in a variety of forms to different data and information consumers and devices.

(1) Big Data Properties: 5V

(2) New Data Models

(3) New Analytics

(4) New Infrastructure and Tools

(5) Source and Target

BDDAC2014 @CTS2014 Big Data Architecture Framework 13

Page 14: Defining Architecture Components of the Big Data Ecosystem

Big Data Nature: Origin and Target (consumers)

Big Data Origin

• Science

• Internet, Web

• Industry

• Business

• Living Environment,

Cities

• Social media and

networks

• Healthcare

• Telecom/Infrastructure

BDDAC2014 @CTS2014 Big Data Architecture Framework 14

Big Data Target Use

• Scientific discovery

• New technologies

• Manufacturing,

processes, transport

• Personal services,

campaigns

• Living environment

support

• Healthcare support

• Social Networking

Da

ta T

ransfo

rmation

Volume, Velocity, Variety & Value, Veracity, Variability

Page 15: Defining Architecture Components of the Big Data Ecosystem

Big Data technology drivers (1)

• Modern e-Science in search for new knowledge– Scientific experiments and tools are becoming bigger and

heavily based on data processing and mining

– The long tail of science

• Traditional data intensive industry– Genomic research, drugs development, Healthcare

– High-tech industry, CAD/CAM, weather/climate, etc.

• Customer facing industry and companies– Advertisement, retail business, service delivery

• Intelligence and security

• Network/infrastructure management– Network monitoring, Intrusion detection, troubleshooting

BDDAC2014 @CTS2014 Big Data Architecture Framework 15

Page 16: Defining Architecture Components of the Big Data Ecosystem

The Long Tail of Science (aka “Dark Data”)

• Collectively “Long Tail” science is generating a lot of data

– Estimated as over 1PB per year and it is growing fast with the new

technology proliferation

• 80-20 rule: 20% users generate 80% data but not necessarily 80%

knowledge

BDDAC2014 @CTS2014 Big Data Architecture Framework 16

Source: Dennis Gannon (Microsoft)

NIST Big Data Workshop, 2012

Page 17: Defining Architecture Components of the Big Data Ecosystem

Big Data technology drivers – Technology Loop

• Technology loop (known as Jevons Paradox)

– Increased efficiency to process current demand will create

new uses and increase demand even more

BDDAC2014 @CTS2014 Big Data Architecture Framework 17

Elastic Demand for Work:

A doubling of fuel

efficiency more than

doubles work demanded,

increasing the amount of

fuel used.

Jevons paradox occurs.

Page 18: Defining Architecture Components of the Big Data Ecosystem

Big Data technology drivers (2) – Managing public

campaigns, e.g. election, public relations

• The rise of public opinion stored in platforms like Twitter,

Google, Facebook, etc. provide enough intelligence to

influence the campaign development, timing, geography and

even the colour of the campaign signs

– Twitter was a major source of data aggregation for the Republican

Race in the US

– Multimillion-dollar contract for data management

and collection services awarded May 1, 2013 to

Liberty Work to build advanced list of voters

• Article “In Data we trust” by T.Edsall in The New York Times

– Book: In Data We Trust: How Customer Data is

Revolutionising Our Economy (Aug 2012)

• A strategy for tomorrow's data world

BDDAC2014 @CTS2014 Big Data Architecture Framework 18

Page 19: Defining Architecture Components of the Big Data Ecosystem

NIST Big Data Working Group (NBD-WG) and

ISO/IEC JTC1 Study Group on Big Data (SGBD)

• Started June 2013 - http://bigdatawg.nist.gov/home.php– Weekly calls, open participation, mailing list

• Targeted formal delivery Autumn 2014 of a set of NIST documentshttp://bigdatawg.nist.gov/V1_output_docs.phpVolume 1: NIST Big Data Definitions

Volume 2: NIST Big Data Taxonomies

Volume 3: NIST Big Data Use Case & Requirements (co-chair Geoffrey Fox)

Volume 4: NIST Big Data Security and Privacy Requirements

Volume 5: NIST Big Data Architectures White Paper Survey

Volume 6: NIST Big Data Reference Architecture

Volume 7: NIST Big Data Technology Roadmap

• ISO/IEC Study Group on Big Data (SGBD)http://jtc1bigdatasg.nist.gov/home.php

– Term (December 2013) – September 2014

– Extends NIST BDWG activity and scope

– 2nd meeting hosted in Amsterdam 13-16 May 2014http://jtc1bigdatasg.nist.gov/programEU.php

BDDAC2014 @CTS2014 Big Data Architecture Framework 19

Page 20: Defining Architecture Components of the Big Data Ecosystem

2nd ISO/IEC SGBD meeting 13-16 May 2014

Discussions and results

• Two days workshop 13-14 May 2014

– EU and NL focus, UvA activities

• Refining Big Data technology definition and Big

Data Architecture definition

• New items proposed

– Big Data market aspects

– Data ownership (including during data lifecycle/staging

and aggregation)

– Opacity (obfuscation) data linkage during processing

– Data linkage

BDDAC2014 @CTS2014 Big Data Architecture Framework 20

Page 21: Defining Architecture Components of the Big Data Ecosystem

From Big Data to All-Data – Paradigm Change

• Breaking paradigm changing factor

– Data storage and processing

– Security

– Identification and provenance

• Traditional model

– BIG Storage and BIG Computer with

FAT pipe

– Move compute to data vs Move data

to compute

• New Paradigm

– Continuous data production

– Continuous data processing

– DataBus as a Data container and

Protocol

BDDAC2014 @CTS2014 Big Data Architecture Framework 21

?

Big DataBig

ComputerNetwork

Move or not

to move?

Data Bus

Visuali

sation

Presen

tation

Action

Distributed Compute

and Analytics

Infrastructure Abstraction

Distributed Big Data

StorageData Abstraction

DataBus: (1) Data Container

(2) Metadata, State

(3) Data Transfer

Protocol

Page 22: Defining Architecture Components of the Big Data Ecosystem

Moving to Data-Centric Models and Technologies

• Current IT and communication technologies are host based or host centric– Any communication or processing are bound to host/computer that

runs software

– Especially in security: all security models are host/client based

• Big Data requires new data-centric models– Data location, search, access

– Data integrity and identification

– Data lifecycle and variability

– Data centric (declarative) programming models

– Data aware infrastructure to support new data formats and data centric programming models

• Data centric security and access control

BDDAC2014 @CTS2014 Big Data Architecture Framework 22

Page 23: Defining Architecture Components of the Big Data Ecosystem

Defining Big Data Architecture Framework

• Existing attempts address architecture issues in a traditional way: ODCA, TMF, NIST– http://bigdatawg.nist.gov/_uploadfiles/BD_Vol5_RefArchSurvey_V1Draft_Pre-

release.pdf

• Architecture vs Ecosystem– Big Data undergo a number of transformations during their lifecycle

– Big Data fuel the whole transformation chain • Data sources and data consumers, target data usage

– Multi-dimensional relations between• Data models and data driven processes

• Infrastructure components and data centric services

• Architecture vs Architecture Framework– Separates concerns and factors

• Control and Management functions, orthogonal factors

– Architecture Framework components are inter-related

BDDAC2014 @CTS2014 Big Data Architecture Framework 23

Page 24: Defining Architecture Components of the Big Data Ecosystem

Big Data Architecture Framework (BDAF) (1)

(1) Data Models, Structures, Types– Data formats, non/relational, file systems, etc.

(2) Big Data Management– Big Data Lifecycle (Management) Model

• Big Data transformation/staging

– Provenance, Curation, Archiving

(3) Big Data Analytics and Tools– Big Data Applications

• Target use, presentation, visualisation

(4) Big Data Infrastructure (BDI)– Storage, Compute, (High Performance Computing,) Network

– Sensor network, target/actionable devices

– Big Data Operational support

(5) Big Data Security– Data security in-rest, in-move, trusted processing environments

BDDAC2014 @CTS2014 Big Data Architecture Framework 24

Page 25: Defining Architecture Components of the Big Data Ecosystem

Big Data Architecture Framework (BDAF) –

Aggregated – Relations between components (2)

Col: Used By

Row: Requires

This

Data

Models

Structrs

Data

Managmnt

& Lifecycle

BigData

Infrastr &

Operations

BigData

Analytics &

Applicatn

BigData

Security

Data Models

& Structures

+ ++ + ++

Data

Managmnt &

Lifecycle

++ ++ ++ ++

BigData

Infrastruct &

Operations

+++ +++ ++ +++

BigData

Analytics &

Applications

++ + ++ ++

BigData

Security

+++ +++ +++ +

BDDAC2014 @CTS2014 Big Data Architecture Framework 25

Page 26: Defining Architecture Components of the Big Data Ecosystem

Big Data Ecosystem: General BD Infrastructure

Data Transformation, Data Management

BDDAC2014 @CTS2014 Big Data Architecture Framework 26

Co

nsu

me

r

Data

Source

Data

Filter/Enrich,

Classification

Data

Delivery,

Visualisatio

n

Data

Analytics,

Modeling,

Prediction

Data

Collection&

Registratio

n

Storage

General

Purpose

Compute

General

Purpose

High

Performance

Computer

Clusters

Data Management

Big Data Analytic/Tools

Big Data Source/Origin (sensor, experiment, logdata, behavioral data)

Big Data Target/Customer: Actionable/Usable Data

Target users, processes, objects, behavior, etc.

Infrastructure

Management/MonitoringSecurity InfrastructureNetwork Infrastructure

Internal

Intercloud multi-provider heterogeneous Infrastructure

Federated

Access

and

Delivery

Infrastructure

(FADI)

Storage

Specialised

Databases

Archives

(analytics DB,

In memory,

operstional)

Data categories: metadata, (un)structured, (non)identifiableData categories: metadata, (un)structured, (non)identifiableData categories: metadata, (un)structured, (non)identifiable

Big Data Infrastructure

• Heterogeneous multi-provider

inter-cloud infrastructure

• Data management

infrastructure

• Collaborative Environment

(user/groups managements)

• Advanced high performance

(programmable) network

• Security infrastructure

Page 27: Defining Architecture Components of the Big Data Ecosystem

Big Data Infrastructure and Analytics Tools

BDDAC2014 @CTS2014 Big Data Architecture Framework 27

Big Data Infrastructure• Heterogeneous multi-provider

inter-cloud infrastructure

• Data management

infrastructure

• Collaborative Environment

(user/groups managements)

• Advanced high performance

(programmable) network

• Security infrastructure

Big Data Analytics• High Performance Computer

Clusters (HPCC)

• Analytics/processing: Real-

time, Interactive, Batch,

Streaming

• Big Data Analytics tools and

applications

Page 28: Defining Architecture Components of the Big Data Ecosystem

Data Lifecycle/Transformation Model

• Does Data Model changes along

lifecycle or data evolution?

• Identifying and linking data

– Persistent identifiers

– Data ownership

– Traceability vs Opacity

– Referral integrity

BDDAC2014 @CTS2014 Big Data Architecture Framework 28

Common Data Model?

• Data Variety and Variability

• Semantic InteroperabilityData Model (1)Data Model (1)

Data Model (3)

Data Model (4)Data (inter)linking?

• PID/OID

• ORCID

• Identification

• Privacy, Opacity

Data Storage

Con

su

me

r

Data

An

alit

ics

Ap

plic

atio

n

Data

Source

Data

Filter/Enrich,

Classification

Data

Delivery,

Visualisation

Data

Analytics,

Modeling,

Prediction

Data

Collection&

Registration

Data repurposing,

Analitics re-factoring,

Secondary processing

Page 29: Defining Architecture Components of the Big Data Ecosystem

Evolutional/Hierarchical Data Model

Topics for discussion, research and standardisation

• Common Data Model?

• Data interlinking?

• Fits to Graph data type?

• Metadata

• Referrals

• Control information

• Policy

• Data patterns

BDDAC2014 @CTS2014 Big Data Architecture Framework 29

Usable DataActionable Data

Processed Data (for target use)

Classified/Structured Data

Raw Data

Classified/Structured Data

Classified/Structured Data

Processed Data (for target use)

Papers/Reports Archival Data

Processed Data (for target use)

ORCID

PID/DOI

Page 30: Defining Architecture Components of the Big Data Ecosystem

Summary and topics for discussion

• Researching, learning mastering Big Data domain is a Big

Data problem itself

• Cloud Computing as a natural platform for Big Data

– Acceptance of clouds will grow, so demand for specialists

• New generically data centric models are required

• New distributed data processing and analytics computing

models to be developed/re-factored

• Data Scientist is a new focus for talents search by

companies and task for universities to develop a new

curriculum

BDDAC2014 @CTS2014 Big Data Architecture Framework 30

Page 31: Defining Architecture Components of the Big Data Ecosystem

Additional topics

• Data Scientist: New profession and need for

Education&Training

BDDAC2014 @CTS2014 Big Data Architecture Framework 31

Page 32: Defining Architecture Components of the Big Data Ecosystem

Data Scientist: New Profession and Opportunities

• There will be a shortage of talent necessary for organizations to take advantage

of Big Data. – By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep

analytical skills as well as

– 1.5 million managers and analysts with the know-how to use the analysis of big data to make

effective decisions

SOURCE:US Bureau of Labor Statistics; US Census; Dun & Bradstreet; company interviews; McKinsey analysis

Wsh SGBD, 13-16 May 2014 Big Data and Education 32

McKinsey Institute on Big Data Jobs (2011)

http://www.mckinsey.com/mgi/publications/big_data/index.asp

Page 33: Defining Architecture Components of the Big Data Ecosystem

Strata Survey Skills and Data Scientist Self-ID

Wsh SGBD, 13-16 May 2014 Big Data and Education 33

Analysing the Analysers. O’Reilly Strata Survey – Harris, Murphy & Vaisman, 2013

• Based on how data scientists think about themselves and their work

• Identified four Data Scientist clusters

Page 34: Defining Architecture Components of the Big Data Ecosystem

Skills and Self-ID Top Factors

Wsh SGBD, 13-16 May 2014 Big Data and Education 34

ML/BigData

Business

Math/OR

Programming

Statistics

ML – Machine Learning

OR – Operations Research

Page 35: Defining Architecture Components of the Big Data Ecosystem

Wsh SGBD, 13-16 May 2014 Big Data and Education 35

Slide from the presentation

Demystifying Data Science

(by Natasha Balac, SDSC)

Page 36: Defining Architecture Components of the Big Data Ecosystem

Key to a Great Data Scientist

Technical skills (Coding, Statistics, Math)

+ Commitment +Creativity

+ Intuition

+ Presentation Skills

+ Business Savvy

= Great Data Scientist!

• How Long Does It Take For a Beginner to Become a Good

Data Scientist?

– 3-5 years according to KDnuggets survey [278 votes total]

Wsh SGBD, 13-16 May 2014 Big Data and Education 36