
Data Mining DM 2020.pdf


Data Mining
http://www.unhas.ac.id/amil/S1TIF/DM2020/

L3

Amil Ahmad Ilham

Data Mining (DM) Algorithms

1. Estimation:
   • Linear Regression, Neural Network, Support Vector Machine, etc.

2. Prediction/Forecasting:
   • Linear Regression, Neural Network, Support Vector Machine, etc.

3. Classification:
   • Naive Bayes, K-Nearest Neighbor, C4.5, ID3, CART, Linear Discriminant Analysis, Logistic Regression, etc.

4. Clustering:
   • K-Means, K-Medoids, Self-Organizing Map (SOM), Fuzzy C-Means, etc.

5. Association:
   • FP-Growth, Apriori, Coefficient of Correlation, Chi Square, etc.

Linear Regression

Relationships between Variables

• Two variables may be related or unrelated.
• A relationship may be linear or non-linear.
• A linear relationship may be strong or weak.

Linear Regression

[Figure: three panels showing a strong linear, a weak linear, and a non-linear relationship]

Linear Regression: Residuals

[Figure: residual plots for three cases]

• Strong linear relationship: residuals are random
• Weak linear relationship: residuals are random and scattered
• Non-linear relationship: residuals are patterned and scattered
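As a rough illustration of the patterned residuals described above, the sketch below fits a least-squares line to deliberately non-linear data and prints the residuals. The data (y = x²) is made up for illustration and does not come from the slides, and the helper names `fit_line` and `residuals` are hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def residuals(xs, ys):
    """Residual = observed y minus the y predicted by the fitted line."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Illustrative non-linear data (an assumption, not from the slides): y = x^2
xs = list(range(11))
ys = [x ** 2 for x in xs]
res = residuals(xs, ys)
print([round(r, 1) for r in res])
```

The residuals come out positive at both ends of the x range and negative in the middle. A systematic shape like this, rather than random scatter, suggests that a straight line is the wrong model for the data.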

Correlation Coefficient

• Measures the correlation between two variables
• Indicates the strength of the correlation

Pearson correlation coefficient:

$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \, \sum (Y_i - \bar{Y})^2}}$$
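A minimal sketch of the Pearson formula in plain Python, applied to the engine size and CO2 emission columns (rows 0 to 8) of the car table used in these slides. The function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Engine size and CO2 emissions, rows 0-8 of the car table in these slides
engine_size = [2.0, 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7]
co2 = [196, 221, 136, 255, 244, 230, 232, 255, 267]

r = pearson_r(engine_size, co2)
print(round(r, 2))
```

For this data r is about 0.92, which indicates a strong positive linear relationship between engine size and CO2 emissions.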


What is regression?

     Engine size   Cylinders   Fuel consumption   CO2 emissions
0    2.0           4           8.5                196
1    2.4           4           9.6                221
2    1.5           4           5.9                136
3    3.5           6           11.1               255
4    3.5           6           10.6               244
5    3.5           6           10.0               230
6    3.5           6           10.1               232
7    3.7           6           11.1               255
8    3.7           6           11.6               267
9    2.4           4           9.2                ?

X (independent variables): engine size, cylinders, fuel consumption. Values may be categorical or continuous.
Y (dependent variable): CO2 emissions. Values are continuous.

Regression is the process of predicting a continuous value of the dependent variable.

What is a regression model?

• Training: historical car data (number of cylinders, engine size, fuel consumption, and CO2 emissions) is used to build the model.
• Prediction: given a new car type, the model estimates its CO2 emissions.

Types of Regression Models

• Simple regression:
  • Linear
  • Non-linear
  Example: predict CO2 emissions from engine size.

• Multiple regression (several independent variables):
  • Linear
  • Non-linear
  Example: predict CO2 emissions from engine size and number of cylinders.

Applications of Regression

• Predicting a salesperson's annual sales
  • Based on the salesperson's age, education, and experience
• Analyzing customer satisfaction
  • Based on customer demographics and psychological factors
• Estimating house prices
  • Based on floor area, number of rooms, etc.
• Employee salaries
  • Based on job type, education, gender, age, work experience, etc.

Regression Algorithms

• Ordinal Regression
• Poisson Regression
• Fast Forest Quantile
• Linear, Polynomial, Lasso, Stepwise, Ridge
• Bayesian Linear Regression
• Neural Network
• Decision Forest
• Boosted Decision Tree
• KNN (K-nearest neighbors)

Simple Linear Regression

Types of Linear Regression

• Simple linear regression:
  • Predict CO2 emissions from the engine size of all cars
  • Independent variable (x1): engine size
  • Predicted variable (y): CO2 emissions

• Multiple linear regression:
  • Predict CO2 emissions from the engine size and cylinders of all cars
  • Independent variables (x1, x2, …): engine size, cylinders, …
  • Predicted variable (y): CO2 emissions
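To make the difference between the two cases concrete, the sketch below fits both a simple and a multiple linear regression to rows 0 to 8 of the car table. It assumes NumPy is available and uses ordinary least squares via `np.linalg.lstsq`; the slides do not say which tool or method they use for the multiple case, and the helper name `fit_least_squares` is hypothetical.

```python
import numpy as np

# Rows 0-8 of the car table: engine size, cylinders, CO2 emissions
engine_size = np.array([2.0, 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7])
cylinders = np.array([4, 4, 4, 6, 6, 6, 6, 6, 6], dtype=float)
co2 = np.array([196, 221, 136, 255, 244, 230, 232, 255, 267], dtype=float)


def fit_least_squares(features, target):
    """Fit target ~ theta_0 + theta_1*x1 + theta_2*x2 + ... by least squares."""
    # First column of ones carries the intercept theta_0
    design = np.column_stack([np.ones(len(target))] + list(features))
    theta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return theta


theta_simple = fit_least_squares([engine_size], co2)            # [theta_0, theta_1]
theta_multi = fit_least_squares([engine_size, cylinders], co2)  # [theta_0, theta_1, theta_2]
print("simple:  ", theta_simple)
print("multiple:", theta_multi)
```

The simple model returns two coefficients (intercept and one slope), while the multiple model returns one extra coefficient per additional independent variable.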

Using linear regression to estimate a continuous value

In the car data table, X denotes the independent variables and Y denotes the dependent variable (CO2 emissions), which is a continuous value. Linear regression is used to estimate the missing CO2 emission value in row 9.

How to apply linear regression

Plot the data from the car table, with the independent variable x1 (engine size) on the horizontal axis and the dependent variable Y (CO2 emissions) on the vertical axis.

[Figure: plot of Y (CO2 emissions) against x1 (engine size); the plot marks x1 = 2.4 with an emission value of 214]

How to apply linear regression: the line equation

A straight line is fitted through the plot of Y against x1:

$$\hat{y} = \theta_0 + \theta_1 x_1$$

• $\hat{y}$: the predicted variable
• $x_1$: the single independent variable
• $\theta_0$: the intercept with the y-axis
• $\theta_1$: the gradient (slope)
• $\theta_0$ and $\theta_1$ are the coefficients of the line

How do we find the values of $\theta_0$ and $\theta_1$ so that the line comes as close as possible to the data (best fit)?

How to find the best fit?

Take one value of the independent variable, $x_1 = 5.4$, for which the data gives $y = 250$ (the actual CO2 emission). Suppose the line $\hat{y} = \theta_0 + \theta_1 x_1$ predicts $\hat{y} = 340$ for this $x_1$. The error is:

$$\text{Error} = y - \hat{y} = 250 - 340 = -90$$

Because the error can be positive or negative, use its square. The best fit is the line whose sum of squared errors is smallest, that is, the line that minimizes the mean squared error (MSE) over $\theta_0$ and $\theta_1$:

$$\min_{\theta_0, \theta_1} \; MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
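The error and MSE definitions above can be checked numerically. The sketch below reproduces the single-point error from the slide and then compares two candidate lines on rows 0 to 8 of the car table. The second line (100 + 40·x1) is an arbitrary, hypothetical candidate chosen only for comparison, and the function names are assumptions.

```python
def mse(actual, predicted):
    """Mean squared error between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


def predict_line(theta_0, theta_1, xs):
    """Evaluate y_hat = theta_0 + theta_1 * x for every x."""
    return [theta_0 + theta_1 * x for x in xs]


# Single point from the slide: actual y = 250, predicted y_hat = 340
error = 250 - 340
print(error)  # -90

# Rows 0-8 of the car table
engine_size = [2.0, 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7]
co2 = [196, 221, 136, 255, 244, 230, 232, 255, 267]

mse_fitted = mse(co2, predict_line(92.8, 43.98, engine_size))  # line from the slides
mse_other = mse(co2, predict_line(100.0, 40.0, engine_size))   # hypothetical candidate
print(mse_fitted < mse_other)
```

The line with the lower MSE is the better fit. Squaring keeps positive and negative errors from cancelling each other out.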

Approaches to minimizing the MSE

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

• Mathematical approach
• Optimization approach

[Figure: plot of Y (CO2 emissions) against x1 (engine size), showing the distribution of the errors]

Estimating the parameters $\theta_0$ and $\theta_1$ (mathematical approach)

Take engine size as $x_1$ and CO2 emissions as $y$ from rows 0 to 8 of the car data table. The parameters of $\hat{y} = \theta_0 + \theta_1 x_1$ are:

$$\theta_1 = \frac{\sum_{i=1}^{s} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{s} (x_i - \bar{x})^2} = 43.98$$

$$\theta_0 = \bar{y} - \theta_1 \bar{x} = 92.8$$

Here $s$ is the number of data rows, and $\bar{x}$ and $\bar{y}$ are the means of $x_1$ and $y$.
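The closed-form estimates can be reproduced in a few lines of plain Python. This is a minimal sketch using rows 0 to 8 of the car table; the function name `estimate_parameters` is hypothetical.

```python
def estimate_parameters(xs, ys):
    """Closed-form least-squares estimates (theta_0, theta_1) for y = theta_0 + theta_1 * x."""
    s = len(xs)
    x_mean = sum(xs) / s
    y_mean = sum(ys) / s
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    theta_1 = numerator / denominator
    theta_0 = y_mean - theta_1 * x_mean
    return theta_0, theta_1


# Rows 0-8 of the car table: engine size (x1) and CO2 emissions (y)
engine_size = [2.0, 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7]
co2 = [196, 221, 136, 255, 244, 230, 232, 255, 267]

theta_0, theta_1 = estimate_parameters(engine_size, co2)
print(round(theta_0, 1), round(theta_1, 2))  # 92.8 43.98
```

Row 9 is left out because its CO2 emission is the unknown value to be predicted.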

Estimating the parameters $\theta_0$ and $\theta_1$ (optimization approach)

For the same $x_1$ (engine size) and $y$ (CO2 emissions) columns of the car data table, the fitted line is:

$$\hat{y} = 92.8 + 43.98\, x_1$$
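The slides do not say which optimization method is used. As one possible illustration, the sketch below assumes batch gradient descent on the MSE, with a hand-picked learning rate and step count (both assumptions), and arrives at approximately the same line.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=20000):
    """Minimize the MSE of y_hat = theta_0 + theta_1 * x by batch gradient descent."""
    theta_0, theta_1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error (y_hat - y) for every data row
        errors = [(theta_0 + theta_1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to theta_0 and theta_1
        grad_0 = 2 / n * sum(errors)
        grad_1 = 2 / n * sum(e * x for e, x in zip(errors, xs))
        theta_0 -= learning_rate * grad_0
        theta_1 -= learning_rate * grad_1
    return theta_0, theta_1


# Rows 0-8 of the car table: engine size (x1) and CO2 emissions (y)
engine_size = [2.0, 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7]
co2 = [196, 221, 136, 255, 244, 230, 232, 255, 267]

theta_0, theta_1 = gradient_descent(engine_size, co2)
print(round(theta_0, 1), round(theta_1, 2))
```

The learning rate matters here: a value that is too large makes the updates diverge for this data, while a much smaller one needs many more steps before the intercept settles.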

Prediction with Linear Regression

Row 9 of the car data table has engine size $x_1 = 2.4$ and an unknown CO2 emission. Substituting into the fitted line gives:

$$\hat{y} = 92.8 + 43.98\, x_1 = 92.8 + 43.98 \times 2.4 = 198.352$$

Tutorial: Simple Linear Regression

• Download the file automobileEDA.csv from http://www.unhas.ac.id/amil/S1TIF/DM2020/
  • Right-click the file => Save Link As => Save as type: All Files
• Run Jupyter Notebook

Dataset

Example plot of the relationship between 'highway-mpg' and 'price'

[Figure: plot of 'price' against 'highway-mpg'; mpg = miles per gallon]

Checking the correlation coefficient

• 'highway-mpg' and 'price'
• 'engine-size' and 'price'
• 'Peak-rpm' and 'price'
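The notebook code for these checks is not preserved in the slides. A minimal sketch, assuming pandas is used, might look like the following. The rows below are made-up stand-in values so the example runs on its own; in the tutorial the DataFrame would instead come from `pd.read_csv("automobileEDA.csv")`.

```python
import pandas as pd

# Hypothetical stand-in rows (not the real dataset); in the tutorial use
# df = pd.read_csv("automobileEDA.csv") after downloading the file.
df = pd.DataFrame({
    "highway-mpg": [27, 26, 30, 22, 25, 20],
    "engine-size": [130, 152, 109, 136, 131, 209],
    "price": [13495, 16500, 13950, 17450, 15250, 24565],
})

for column in ["highway-mpg", "engine-size"]:
    r = df[column].corr(df["price"])  # Pearson correlation by default
    print(f"{column} vs price: r = {r:.3f}")
```

A coefficient close to +1 or -1 suggests the variable is a good candidate predictor for a linear model of price, while a value near 0 suggests it is not.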

Linear function

$$Y = a + bX$$

• a refers to the intercept of the regression line, in other words: the value of Y when X is 0
• b refers to the slope of the regression line, in other words: the amount by which Y changes when X increases by 1 unit


Load the modules for linear regression

Linear function

• "highway-mpg" is the predictor variable and "price" is the response variable.
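The notebook cells for this step are not preserved in the slides. The sketch below assumes scikit-learn's `LinearRegression` is the module being loaded, and it uses made-up stand-in rows so it runs on its own; in the tutorial the DataFrame would come from `pd.read_csv("automobileEDA.csv")`.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in rows (not the real dataset)
df = pd.DataFrame({
    "highway-mpg": [27, 26, 30, 22, 25, 20],
    "price": [13495, 16500, 13950, 17450, 15250, 24565],
})

X = df[["highway-mpg"]]  # predictor: must be 2-D, hence the double brackets
Y = df["price"]          # response

lm = LinearRegression()
lm.fit(X, Y)

print("intercept a:", lm.intercept_)
print("slope b:", lm.coef_[0])

# Predict the price for new highway-mpg values (illustrative inputs)
new_cars = pd.DataFrame({"highway-mpg": [24, 28]})
predictions = lm.predict(new_cars)
print(predictions)
```

After fitting, `lm.intercept_` and `lm.coef_[0]` play the roles of a and b in the linear function.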

Linear model

Plotting the data and the model

Assignment

• Run a new Jupyter Notebook
• Write a program that shows the relationship between "engine-size" and "price"
• Predict price for larger engine-size values (500 to 1000)

Prediction model