Data Mining & Big Data Landscape

Thanh Luong

30th Aug 2016

Agenda

• Data Mining, Data Science

• Data Analytics in Business

• Big Data

• Data Analytics Techniques

  o Classification (Decision tree, ANN, K-nearest neighbor, Bayesian)

  o Regression

  o Cluster

  o Text Mining

  o SVM

  o Machine Learning

Data - Information - Knowledge - Wisdom

Data, Information and Knowledge

• Statistics

• Machine Learning

• Data Mining

• Data Science

Data Science and Business

Data Implementation for Enterprise

What is Big Data?

Data Analysis Process

Data types vs mining methods

• Data types and models

  o Flat data tables

  o Relational databases

  o Temporal & spatial data

  o Transactional databases

  o Multimedia data

  o Genome databases

  o Materials science data

  o Textual data

  o Web data

  o Etc.

• Mining tasks and methods

  o Classification / Prediction

    - Decision trees

    - Bayesian classification

    - Neural networks

    - Rule induction

    - Support vector machine (SVM)

    - Hidden Markov Model

    - Etc.

  o Description

    - Association analysis

    - Clustering

    - Summarization

    - Etc.

Data types

• Symbolic

  o Indexing

  o Binary

  o Boolean

  o Nominal

  o Ordinal

• Numeric

  o Integer

  o Continuous

• Structured vs Unstructured data

• Semi-structured data

• Supervised vs Unsupervised data

Data mining techniques

• Supervised Learning

(predictive ability based on past data)

  o Classification (statistical)

  o Decision Trees

  o Regression

  o Artificial Neural Networks (ANN)

  o Classification (machine learning)

• Unsupervised Learning

(Exploratory analysis to discover patterns)

  o Clustering Analysis

  o Association Rules

Study 1: Classification - Overview

Study 1: Classification with decision tree

Outlook   Temp   Humidity   Windy   Play
Sunny     Hot    Normal     True    ??
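
The question in the table above (should we play on a sunny, hot, normal-humidity, windy day?) can be sketched with a small ID3-style decision tree. This is only an illustration: the training rows below are the classic 14-row weather dataset, which is an assumption because the slides do not include the training data, and the helper names (`entropy`, `information_gain`, `build_tree`, `predict`) are made up for this example.

```python
from collections import Counter
from math import log2

FEATURES = ["Outlook", "Temp", "Humidity", "Windy"]

# ASSUMPTION: classic 14-row weather dataset, not taken from the slides.
# Each row is (Outlook, Temp, Humidity, Windy, Play).
ROWS = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, idx):
    """Entropy reduction obtained by splitting rows on feature column idx."""
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for value in set(r[idx] for r in rows):
        subset = [r[-1] for r in rows if r[idx] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, feature_idxs):
    """Recursively split on the feature with the highest information gain."""
    labels = [r[-1] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not feature_idxs:
        return majority  # leaf: a class label
    best = max(feature_idxs, key=lambda i: information_gain(rows, i))
    remaining = [i for i in feature_idxs if i != best]
    branches = {
        value: build_tree([r for r in rows if r[best] == value], remaining)
        for value in set(r[best] for r in rows)
    }
    return {"feature": best, "branches": branches, "default": majority}


def predict(tree, sample):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(sample[tree["feature"]], tree["default"])
    return tree


tree = build_tree(ROWS, list(range(len(FEATURES))))
query = ("Sunny", "Hot", "Normal", "True")
print(FEATURES[tree["feature"]], "->", predict(tree, query))
```

Under the assumed training data, the tree splits on Outlook first and then on Humidity for sunny days, so the query row is classified as Play = Yes. A different training table would give a different tree.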

Study 2: Regression

• List all the variables available for building the model

• Establish a Dependent Variable (DV) of interest

• Examine relationships between the variables of interest, visually if possible

• Find a way to predict DV using other variables
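
The four steps above can be sketched with a one-variable least-squares fit. The data here is a made-up toy example (a hypothetical advertising spend predicting hypothetical sales, chosen to lie exactly on a line), and the function names `correlation` and `fit_line` are illustrative, not from the slides.

```python
from math import sqrt

# Step 1: list the available variables (hypothetical toy data).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]   # predictor
sales = [3.0, 5.0, 7.0, 9.0, 11.0]     # Step 2: the dependent variable (DV)


def correlation(xs, ys):
    """Pearson correlation, a numeric stand-in for eyeballing a scatter plot."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Step 3: examine the relationship between predictor and DV.
r = correlation(ad_spend, sales)

# Step 4: predict the DV from the other variable.
slope, intercept = fit_line(ad_spend, sales)
predicted_sales = slope * 6.0 + intercept

print(f"r={r:.2f}, slope={slope:.2f}, intercept={intercept:.2f}")
print(f"predicted DV at x=6: {predicted_sales:.2f}")
```

Real data would not fall exactly on a line; with several predictors the same idea extends to multiple regression.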

Study 3: ANN
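
As a rough illustration of the building block of an ANN, the sketch below trains a single artificial neuron (a perceptron) to reproduce the logical AND function. The task, the learning rate, and the function names are assumptions made for this example; the slides do not specify a network or dataset, and practical ANNs stack many such units with smooth activation functions.

```python
# Training samples for logical AND: ((x1, x2), target)
SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def step(z):
    """Threshold activation: fire (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0


def neuron_output(weights, bias, inputs):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)


def train_perceptron(samples, epochs=20, learning_rate=1):
    """Perceptron learning rule: nudge weights by the prediction error."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - neuron_output(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


weights, bias = train_perceptron(SAMPLES)
predictions = [neuron_output(weights, bias, inputs) for inputs, _ in SAMPLES]
print("weights:", weights, "bias:", bias, "predictions:", predictions)
```

A single neuron can only learn linearly separable patterns such as AND; problems like XOR need at least one hidden layer.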

Study 4: Cluster Analysis (K-Means)

• Market Segmentation

• Product Portfolio

• Text Mining
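
To make the market segmentation use case concrete, here is a minimal K-Means sketch. The six customers (two made-up numeric attributes each), the choice of k = 2, and the naive "first k points" initialisation are all assumptions for illustration; production code would normally use a library implementation with better initialisation.

```python
# Hypothetical customers described by two numeric attributes.
CUSTOMERS = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),   # low-value group
    (8.0, 8.0), (9.0, 9.5), (8.5, 9.0),   # high-value group
]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, max_iterations=100):
    """Alternate assignment and centroid update until centroids stop moving."""
    centroids = list(points[:k])  # naive deterministic initialisation (assumption)
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        new_centroids = [
            mean_point(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


centroids, labels = kmeans(CUSTOMERS, k=2)
print("segments:", labels)
print("centroids:", centroids)
```

Each resulting label is a segment id; the centroid of a segment can be read as its "typical customer".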

Study 5: Association Rule Mining

• Market Segmentation

• Product Portfolio

• Text Mining
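
Association rule mining scores rules of the form "customers who buy A also buy B" with support, confidence, and lift. The sketch below computes those three measures for one rule on a small made-up set of shopping baskets; the transactions and function names are illustrative assumptions, and a full miner (for example Apriori) would additionally search over all candidate itemsets.

```python
# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline frequency (>1 means positive association)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


rule_support = support({"diapers", "beer"}, TRANSACTIONS)
rule_confidence = confidence({"diapers"}, {"beer"}, TRANSACTIONS)
rule_lift = lift({"diapers"}, {"beer"}, TRANSACTIONS)

print(f"diapers -> beer: support={rule_support:.2f}, "
      f"confidence={rule_confidence:.2f}, lift={rule_lift:.2f}")
```

In practice a minimum support and minimum confidence threshold are chosen first, and only rules that pass both are reported.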

THANK YOU