Data Mining and Machine Learning for Big Data Chengqi Zhang Director of QCIS University of Technology, Sydney



Outline

• ARC CoE bid in 2013
• What we have learnt
• What we plan to do

18 September 2014


Big Data Research CoE

Scale

High quality postgraduates

Transformational

Critical issue

Global competitiveness

Community engagement

Capacity building


Overview

Today

1 Vision and Mission
2 Big Data Paradigm
3 Big Data Research
4 Big Data Research Ecosystem
5 Objectives and Milestones
6 Team and Governance
7 The Critical Imperative
8 Distinctive Value and Impact


1 Vision and Mission

Vision

The centre will generate wealth and increased productivity for Australia by creating a vibrant Big Data Research Ecosystem that will put Australia in a global leadership position.

Mission

• Transform foundational data science
• Create a Big Data high performance utility to unlock Big Data for smarter decision-making
• Build human capacity and train next generation of Big Data researchers


2 Big Data Paradigm - The Need

Big Data is a Game Changer


2 Big Data Paradigm

Big Data is pushing the frontiers of the current paradigm

Volume: Data in storage
• Data production is big and doubling each year!
• 2010: we crossed the barrier of one zettabyte (ZB); 1 ZB = 10^12 GB
• 2013: more than 4 ZB of data.

Variety: Data in many forms
• Network data
• Spatial data
• Sensor data

Velocity: Data on the move

Veracity: Data in doubt
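The zettabyte figure in the Volume item can be checked from decimal (SI) prefixes. A short worked conversion, assuming 1 GB = 10^9 bytes and 1 ZB = 10^21 bytes (standard SI definitions, not stated on the slide):

```latex
\[
1\,\mathrm{ZB} = 10^{21}\,\mathrm{B}
             = \frac{10^{21}\,\mathrm{B}}{10^{9}\,\mathrm{B/GB}}
             = 10^{12}\,\mathrm{GB},
\qquad
4\,\mathrm{ZB} = 4 \times 10^{12}\,\mathrm{GB}.
\]
```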


3 Big Data Research

Five target challenges

1 Data Acquisition and Quality: Just-in-time data linking and integration; data quality management; provenance.
2 Big Data Processing: Storage and retrieval of big data; scalability; efficient indexing and searching.
3 Real-time Analytics: Real-time machine learning at Big Data scale; real-time stream analytics with high volume; real-time knowledge discovery from deep analytics.
4 Decision-Making: Gathering the “best” evidence; making sense of Big Data; developing and exploiting insight and foresight with uncertain, inconsistent, incomplete info; risk.
5 Big Data Computing Paradigm: Fast real-time iterative processing with big distributed data.


3 Big Data Research

Five research programs

1 Data Acquisition and Quality
2 Big Data Processing
3 Real-Time Analytics
4 Decision-Making
5 Big Data Computing Paradigm


3 Big Data Research

Major scientific problems

1 Data Acquisition and Quality: Data inconsistency.
2 Big Data Processing: Sublinear time (approximate) algorithms against complexity.
3 Real-Time Analytics: Trade-off between scalability and analytics depth.
4 Decision-Making: Reasoning with quantitative and qualitative uncertain real-time information.
5 Big Data Computing Paradigm: Concurrency and mobility in computing Big Data.
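As one illustration of what a sublinear-time approximate algorithm can look like (this example is not from the slides), the sketch below estimates the fraction of records that satisfy a condition by uniform random sampling. The sample size comes from Hoeffding's inequality and depends only on the chosen error `epsilon` and failure probability `delta`, not on the size of the data. The names `sample_size` and `approximate_fraction` are hypothetical, and the data is assumed to support `len()` and random access by index.

```python
import math
import random


def sample_size(epsilon: float, delta: float) -> int:
    """Number of samples so that the estimated mean of values in [0, 1]
    is within epsilon of the true mean with probability >= 1 - delta
    (Hoeffding's inequality: n >= ln(2 / delta) / (2 * epsilon^2))."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def approximate_fraction(data, predicate, epsilon=0.05, delta=0.01, rng=None):
    """Estimate the fraction of items in `data` for which `predicate` is true.

    Items are drawn uniformly at random with replacement, so the work done
    is proportional to the sample size rather than to len(data).
    """
    rng = rng or random.Random()
    n = sample_size(epsilon, delta)
    hits = 0
    for _ in range(n):
        item = data[rng.randrange(len(data))]  # one random-access read
        if predicate(item):
            hits += 1
    return hits / n


if __name__ == "__main__":
    # Hypothetical data set: one million integers, a quarter divisible by 4.
    records = range(1_000_000)
    estimate = approximate_fraction(records, lambda x: x % 4 == 0)
    print(f"samples used: {sample_size(0.05, 0.01)}, estimate: {estimate:.3f}")
```

With the default `epsilon=0.05` and `delta=0.01` the sketch reads 1060 records regardless of how large the data set is. The price is an approximate answer, which is the kind of accuracy-versus-cost trade the slide's "approximate" qualifier points to.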


4 Big Data Research Ecosystem

Big Data Research Centre of Excellence

Industry Engagement
• Business Value: Westfield, Woolworths, IBM, Google
• Customer Behaviour: CBA, IBM, Woolworths
• Geolocation: Westfield, Woolworths, IBM, Google
• Technology: SAP, HP, CA, Oracle, Schneider Electric

Infrastructure and Networks
• National: e.g. NCI
• Universities: UTS, UoM, UQ, UNSW
• National: CSIRO, CoEs, Industry
• Global Collaborators: Academic, Industry, Govt

Training and Outreach

Doctoral Training


4 Big Data Research Ecosystem

Big Data Research Centre of Excellence

Training and Outreach

Doctoral Training

• Scale of PhD program
• Industry embedded
• Industry Doctoral Training Centre
• Co-supervision of students
• Schools outreach


5 Objectives and Milestones

Overarching centre objectives:
• Establish the computational foundation of Big Data Science
• Develop new framework for high performance Big Data technologies
• Train new generation of researchers for Big Data research and applications

Key mid-term milestones (Year 4)
• Develop 10 benchmark problems that are adopted by industry
• New computational models and programming abstractions to support improvements of one order of magnitude in computational performance on all benchmark problems
• Algorithms to enable real-time querying, analytics and processing of multimodal data with competitive performance on international benchmark problems
• Methodologies for asserting data quality and enabling integration required for the benchmark problems
• Real-time interactive visualisations for decision-making with benchmark problems
• First cohort of PhDs will graduate, over 30 of whom will be embedded in industry


5 Objectives and Milestones

Overarching centre objectives:
• Establish the computational foundation of Big Data Science
• Develop new framework for high performance Big Data technologies
• Train new generation of researchers for Big Data research and applications

Final deliverables (Year 7)
• Industry-ready techniques to facilitate real-time retrieval of actionable information that enables predictive analytics and decision-making with world leading performance.
• Big Data is a utility that can support three orders of magnitude improvement in computational performance on ten benchmark problems.


6 Team and Governance

The Dream Team

Chief Investigators:
• Top research groups
• Awarded over 60 ARC grants since 2008
• Program leaders (70% FTE time) and CIs (50% FTE time)

Partner Investigators (Academic):
• International research leaders from USA, Europe and Asia

Partner Investigators (Engineers):
• Five PIs are from leading companies, such as IBM, SAP, Oracle


6 Team and Governance

Structure and governance

Advisory Board (Chair: Ron Sandland)

Executive Management Team (Chair: Centre Director)

Key Sub-Committees
• Research (Chair: Research Director)
• Mentoring (Chair: Education Director)
• Industry Engagement & Outreach (Chair: IE & O Director)
• Commercialisation


6 Team and Governance

Advisory Board

• Chair – Dr Ron Sandland
• National Agencies – NICTA / CSIRO
• Industry – 3 representatives (rotating)
• University – DVCRs (or nominees)
• Leading Scholars – 3 international scholars


7 The Critical Imperative

Limited window of opportunity

A International effort gathering speed
B Australia can benefit from first mover advantage
C Industry and community focus and demand
D International and national “Dream Team”

Centre of Excellence
• Critical mass
• Increases effectiveness of other efforts
• Leverages momentum
• Creates intensity!


8 Distinctive Value and Impact

Our distinctive value

• Right area: Transforming fundamentals of data science to make Big Data a high performance utility
• Right people: Elite world-leading team of academic and industry leaders with decent time commitments
• Right support: Strong industry commitment; Funding, People, Infrastructure, Data
• Right time: Australia should take this golden opportunity to lead the world in this very important area


8 Distinctive Value and Impact

Delivering global impact

• Establish Australia at the global forefront of information and Big Data Science
• Increasing Australia’s global competitiveness and productivity
• Paradigm shift - significant advances and new frameworks for decision-making with Big Data
• Significant growth in national critical mass capability and knowledge



What we have learnt

• Track record for collaborations
• Specific application areas that impact Australia

25 November 2014


What we plan to do

• Establishing a network of “Data Science Australia” (DSA)
• Identify one or two application areas
• Some pilot projects involving all members of the DSA network


Establishing a network of “Data Science Australia”

• DSA was established on 13 November 2014
• DSA includes UTS, UQ, U. of Melbourne, UNSW, Monash U. and CSIRO
• Its objective is to prepare for the next ARC CoE bid


Identify one or two application areas

• It could be the resources industry
• It could be the finance industry
• To be further investigated


Some pilot projects

This is the work to be done over the next year or two.


Thank You!

Questions?
