
Implementation of the Bagging Nearest Neighbor Support Vector Machine Method for Bankruptcy Prediction

Author:
M. Ulin Nuha – 5108100164

Supervisors: Isye Arieshanti, S.Kom., M.Phil
Yudhi Purwananto, S.Kom., M.Kom.

FINAL PROJECT PRESENTATION – KI091391

(Keywords: bankruptcy prediction, BNNSVM, bootstrap aggregating, K-nearest neighbor, Support Vector Machine)

Background
Objective
Problem Statement
Software Development
Conclusion
References

Background

Global financial crisis
Companies going bankrupt
Bankruptcy prediction
Bagging Nearest Neighbor Support Vector Machine

Objective

Implement the Bagging Nearest Neighbor Support Vector Machine (BNNSVM) for bankruptcy prediction.

Problem Statement

How can the Bagging Nearest Neighbor Support Vector Machine (BNNSVM) method be implemented for bankruptcy prediction?

How can the BNNSVM model be tested for predicting corporate bankruptcy?

Software Development

Literature Study
Design and Implementation
Testing

Literature Study

BNNSVM
Bagging (Bootstrap Aggregating)
SVM (Support Vector Machine)
KNN (K-Nearest Neighbor)

K-Nearest Neighbor

The distance between data points can be computed with the Euclidean distance:

d(p, q) = √( Σᵢ₌₁ⁿ (pᵢ − qᵢ)² )

[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor]
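The distance formula and the majority-vote rule above can be sketched in a few lines of Python (the toy dataset and the value of k are illustrative, not taken from the experiments):

```python
import math
from collections import Counter

def euclidean(p, q):
    # d(p, q) = sqrt(sum_i (p_i - q_i)^2)
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_predict(train, query, k):
    # train: list of (vector, label); take the k closest points,
    # then a majority vote over their labels
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((0.0, 0.0), -1), ((0.1, 0.2), -1), ((1.0, 1.0), +1), ((0.9, 1.1), +1)]
print(knn_predict(train, (0.2, 0.1), k=3))  # two of the 3 nearest are -1 -> -1
```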

SVM – Basic Concepts

Find a hyperplane (decision boundary) that separates the data.

One possible solution: hyperplane B1. Another solution: hyperplane B2.

[Figure: hyperplane w · x + b = 0 with margin boundaries w · x + b = 1 (b11) and w · x + b = −1 (b12)]

SVM – Basic Concepts

Optimization problem:

min (1/2) ‖w‖²

subject to yᵢ (w · xᵢ + b) ≥ 1, i = 1, …, n

Decision function:

f(x) = sign(w · x + b)

◦ w : the unique normal vector of the hyperplane
◦ b : the intercept of the hyperplane
◦ y : the class label (+1, −1)
◦ x : the vector of attribute values of each data point
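A minimal sketch of the decision function f(x) = sign(w · x + b); the hyperplane coefficients here are made-up values for illustration only:

```python
def svm_decision(w, b, x):
    # f(x) = sign(w . x + b): +1 on one side of the hyperplane, -1 on the other
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Illustrative hyperplane x1 + x2 - 1 = 0
w, b = (1.0, 1.0), -1.0
print(svm_decision(w, b, (2.0, 2.0)))   # score 3 -> 1
print(svm_decision(w, b, (0.0, 0.0)))   # score -1 -> -1
```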

SVM – Basic Concepts

f(x) = 1 if w · x + b ≥ 1
f(x) = −1 if w · x + b ≤ −1

Margin = 2 / ‖w‖

[Figure: hyperplane w · x + b = 0 with margin boundaries w · x + b = 1 (b11) and w · x + b = −1 (b12)]

SVM – Soft Margin

The data cannot be separated linearly.

Slack variable ξ:

w · x + b = −1 + ξ

[Figure: a misclassified point at distance ξ beyond its margin boundary, relative to the hyperplane w · x + b = 0]

SVM – Soft Margin

Optimization problem:

min over w, b, ξ of (1/2) wᵀw + C Σᵢ₌₁ˡ ξᵢ

subject to yᵢ (wᵀ Φ(xᵢ) + b) ≥ 1 − ξᵢ, i = 1, …, l; ξᵢ ≥ 0

Cost value C
◦ controls the penalty on the slack: the larger C, the more heavily training errors are penalized

SVM – Kernel Trick

The decision boundary is not linear.

Transform the data into a higher-dimensional space.

Kernel type                   Function
Linear                        K(xᵢ, xⱼ) = xᵢ · xⱼ
Polynomial                    K(xᵢ, xⱼ) = (γ · xᵢ · xⱼ + c)ᵈ
Radial Basis Function (RBF)   K(xᵢ, xⱼ) = exp(−γ ‖xᵢ − xⱼ‖²)
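The three kernel functions in the table can be written out directly; the parameter defaults below (γ, c, d) are illustrative choices, not the values used in the experiments:

```python
import math

def linear_kernel(x, y):
    # K(xi, xj) = xi . xj
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, gamma=1.0, c=1.0, d=2):
    # K(xi, xj) = (gamma * xi . xj + c)^d
    return (gamma * linear_kernel(x, y) + c) ** d

def rbf_kernel(x, y, gamma=0.5):
    # K(xi, xj) = exp(-gamma * ||xi - xj||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = (1.0, 0.0), (0.0, 1.0)
print(linear_kernel(x, y))      # orthogonal vectors -> 0.0
print(polynomial_kernel(x, y))  # (1*0 + 1)^2 = 1.0
print(rbf_kernel(x, x))         # zero distance -> 1.0
```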

Bagging (Bootstrap Aggregating)

D: the original training set.
D1, D2, …, Dn−1, Dn: build n new training sets by sampling with replacement.
C1, C2, …, Cn−1, Cn: build n classification models, one per sample.
C*: combine the models' predictions by voting.

Bagging (Bootstrap Aggregating)

Sampling with replacement:

Original training set   1 2 3 4 5 6 7 8 9
Bagging (1)             7 8 9 8 2 5 9 2 1
Bagging (2)             1 4 9 1 2 3 2 7 3
Bagging (3)             1 8 5 9 5 5 9 6 3
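Sampling with replacement as in the table above can be sketched as follows (the fixed seed is only there to make the illustration repeatable; the drawn rows will differ from the table):

```python
import random

def bootstrap_sample(data, rng):
    # Sampling with replacement: draw len(data) items, duplicates allowed
    return [rng.choice(data) for _ in range(len(data))]

rng = random.Random(0)  # fixed seed for a repeatable illustration
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
samples = [bootstrap_sample(data, rng) for _ in range(3)]
for s in samples:
    print(s)  # same length as data, items repeated, some items missing
```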

Design and Implementation

Interface
Process
Data

Data: Input

Wieslaw dataset
Australian credit approval dataset

Dataset                      Records (+)   Records (−)   Total records   Attributes
Wieslaw                      128           112           240             30
Australian credit approval   383           307           690             14

Data: Output

Training output: the classification model.
Testing output: the prediction.
◦ −1: bankrupt (Wieslaw); credit rejected (Australian)
◦ +1: not bankrupt (Wieslaw); credit accepted (Australian)

Process: Data Splitting

Cross validation, outer level — the dataset is split into training/test folds:
Dataset → (Training data 1, Test data 1), …, (Training data n, Test data n)

Cross validation, inner level — each training fold is split again:
Training data → (Data trs 1, Data ts 1), …, (Data trs n, Data ts n)
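The fold construction used at both levels can be sketched as a plain index split (a simplified, unshuffled version; the actual fold count and shuffling strategy are not specified here):

```python
def kfold_indices(n_items, n_folds):
    # Split indices 0..n_items-1 into n_folds (train, test) pairs;
    # each index appears in exactly one test fold
    indices = list(range(n_items))
    fold_size = n_items // n_folds
    folds = []
    for f in range(n_folds):
        start = f * fold_size
        end = start + fold_size if f < n_folds - 1 else n_items
        test = indices[start:end]
        train = indices[:start] + indices[end:]
        folds.append((train, test))
    return folds

folds = kfold_indices(9, 3)
for train, test in folds:
    print(train, test)  # each test fold is disjoint from its train fold
```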

Process: Training

1. Bagging: draw bootstrap samples 1, 2, …, 9, 10 from Data trs (sampling with replacement).
2. KNN: for each bootstrap sample, K-Nearest Neighbor selects, from that sample, the points nearest to the data in Data ts (KNN 1, KNN 2, …, KNN 9, KNN 10).
3. SVM training: an SVM is trained on each selected neighbor set, producing SVM Model 1, SVM Model 2, …, SVM Model 9, SVM Model 10.

Process: Testing

1. SVM testing: the test data (Data Uji) is classified by each of SVM Model 1, 2, …, 9, 10 (SVM Testing), giving Prediction 1, 2, …, 9, 10.
2. Bagging (voting): the predictions are combined by majority vote into the final prediction.
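The final voting step can be sketched directly (the prediction list below is an illustrative set of outputs from 10 models, not real experiment output):

```python
from collections import Counter

def vote(predictions):
    # Bagging combination step: majority vote over the models' predictions
    return Counter(predictions).most_common(1)[0][0]

preds = [1, -1, 1, 1, -1, 1, 1, -1, 1, 1]  # illustrative outputs of 10 SVM models
print(vote(preds))  # 7 votes for +1 vs 3 for -1 -> 1
```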

Interface

[Interface screenshots]

Testing

k value (KNN)
Cost value & kernel type (SVM)
Comparison with other methods

Experiments with different k values (Wieslaw)

k    Accuracy   Precision   Sensitivity   Specificity
1    65.67      67.88       69.76         61.67
2    68.92      71.39       70.61         67.18
3    68.92      72.41       69.85         69.78
4    70.58      72.87       71.99         68.6
5    70.83      73.82       71.03         70.25
6    69.42      72.78       69.06         69.48
7    70.5       75.3        68.76         72.5
8    70.58      73.55       70.41         70.91
9    70.83      74.39       70.55         72.58
10   71.58      73.99       71.86         71.08

[Chart: Accuracy, Precision, Sensitivity, Specificity for k = 1–10]

Experiments with different k values (Australian)

k    Accuracy   Precision   Sensitivity   Specificity
1    84.49      83.91       81.17         87.22
2    84.9       83.94       82.35         87.19
3    84.93      83.78       82.76         87.04
4    85.74      84.81       83.02         87.84
5    85.3       83.19       84.54         86.28
6    84.43      83.53       81.88         86.51
7    85.07      83.08       83.94         86.16
8    86.23      85.92       83.06         88.95
9    85.77      84.96       82.95         88.1
10   85.22      84.68       82.34         87.3

[Chart: Accuracy, Precision, Sensitivity, Specificity for k = 1–10]

Experiments with different SVM cost values (Wieslaw)

Cost   Accuracy   Precision   Sensitivity   Specificity
0.01   67.58      68.22       75.51         59.25
0.1    70.17      72.66       72.12         66.93
1      71.08      75.12       71.61         72.17
10     70.33      72.58       70.85         69.47
100    71         75.25       70.03         72.87

[Chart: Accuracy, Precision, Sensitivity, Specificity for cost = 0.01–100]

Experiments with different SVM cost values (Australian)

Cost   Accuracy   Precision   Sensitivity   Specificity
0.01   83.51      86.86       74.5          90.7
0.1    85.54      82.51       85.55         85.44
1      84.64      83.64       82.09         86.56
10     80.72      81.68       73.79         86.27
100    75.8       78.57       65.46         83.85

[Chart: Accuracy, Precision, Sensitivity, Specificity for cost = 0.01–100]

Experiments with the RBF kernel and different gamma values (Wieslaw)

Gamma    Accuracy   Precision   Sensitivity   Specificity
0.0001   54.17      58.36       73.19         35.07
0.001    58.75      60.59       65.61         50.55
0.01     56.83      57.28       74.14         37.45
0.1      54.5       54.36       89.48         14.33
1        53.25      53.3        99.87         0
10       52.67      51          96            4

[Chart: Accuracy, Precision, Sensitivity, Specificity for gamma = 0.0001–10]

Experiments with the RBF kernel and different gamma values (Australian)

Gamma    Accuracy   Precision   Sensitivity   Specificity
0.0001   67.88      68.43       52.17         80.79
0.001    68.81      64.98       64.13         72.45
0.01     56.06      51.05       14.67         89.41
0.1      55.04      16          0.4           98.89
1        55.51      0           0             100
10       55.51      0           0             100

[Chart: Accuracy, Precision, Sensitivity, Specificity for gamma = 0.0001–10]

Experiments with the Polynomial kernel and different degree values (Wieslaw)

Degree   Accuracy   Precision   Sensitivity   Specificity
1        65.25      66.92       71.77         58.96
2        71.33      72.34       74.22         67.8
3        69.67      72.22       69.36         69.95
4        68.08      71.42       68.64         68.47
5        70.08      72.25       71.11         69.32

[Chart: Accuracy, Precision, Sensitivity, Specificity for degree = 1–5]

Experiments with the Polynomial kernel and different degree values (Australian)

Degree   Accuracy   Precision   Sensitivity   Specificity
1        79.83      89.47       62.87         93.85
2        80.32      85.79       67.64         90.48
3        72.35      75.74       62.65         79.87
4        60.09      64.66       62.55         57.88
5        57.97      64.9        52.6          62.46

[Chart: Accuracy, Precision, Sensitivity, Specificity for degree = 1–5]

Comparison with other classification methods (Wieslaw)

Method   Accuracy   Precision   Sensitivity   Specificity
KNN      75         76.22       78.04         73.03
ANN      70         70          70            70
SVM      70.42      74.29       69.91         74.82
BLR      87.54      90.68       86.42         11.07
BNNSVM   71.58      73.99       71.86         71.08

[Chart: Accuracy, Precision, Sensitivity, Specificity per method]

KNN = K-Nearest Neighbor
ANN = Artificial Neural Network
SVM = Support Vector Machine
BLR = Binary Logistic Regression
BNNSVM = Bagging Nearest Neighbor Support Vector Machine

Comparison with other classification methods (Australian)

Method   Accuracy   Precision   Sensitivity   Specificity
KNN      83.19      80.9        80.5          85.2
ANN      83.48      83.5        85.11         87.94
SVM      84.35      77.03       93.44         77.92
BLR      80.83      81.73       75.89         14.84
BNNSVM   86.23      85.92       83.06         88.95

[Chart: Accuracy, Precision, Sensitivity, Specificity per method]

Conclusion

BNNSVM was implemented for bankruptcy prediction, with the following results:

Dataset                      Accuracy   Precision   Sensitivity   Specificity
Wieslaw                      71.58 %    73.99 %     71.86 %       71.08 %
Australian credit approval   86.23 %    85.92 %     83.06 %       88.95 %

References

Li, H., & Sun, J. (2011). Forecasting Business Failure: The Use of Nearest-Neighbour Support Vectors and Correcting Imbalanced Samples – Evidence from Chinese Hotel Industry. Tourism Management, 33(3), 622-634.

Frank, A., & Asuncion, A. (2010). UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences. Retrieved from http://archive.ics.uci.edu/ml

Wieslaw, P. (2004). Application of Discrete Predicting Structures in An Early Warning Expert System for Financial Distress. Tourism Management.

Tan, P. N., Steinbach, M., & Kumar, V. (2006). Introduction to Data Mining (4th ed.). Boston: Pearson Addison Wesley.

Thank you