Implementation of the Bagging Nearest Neighbor Support Vector Machine Method for Bankruptcy Prediction

Author: M. Ulin Nuha – 5108100164
Supervisors: Isye Arieshanti, S.Kom., M.Phil.
             Yudhi Purwananto, S.Kom., M.Kom.
FINAL PROJECT PRESENTATION – KI091391
(Keywords: bankruptcy prediction, BNNSVM, bootstrap aggregating, k-nearest neighbor, Support Vector Machine)
Background
Objective
Problem Statement
Software Development
Conclusion
References
Background
Global financial crisis
Companies go bankrupt
Bankruptcy prediction
Bagging Nearest Neighbor Support Vector Machine
Objective
Implement the Bagging Nearest Neighbor Support Vector Machine (BNNSVM) method for bankruptcy prediction.
Problem Statement
How can the Bagging Nearest Neighbor Support Vector Machine (BNNSVM) method be implemented for bankruptcy prediction?
How can the BNNSVM model be tested for predicting company bankruptcy?
Software Development
Literature Study
Design and Implementation
Testing
BNNSVM combines three techniques:
Bagging (bootstrap aggregating)
SVM (Support Vector Machine)
KNN (k-nearest neighbor)
K-Nearest Neighbor
The distance between two data points p and q can be computed with the Euclidean distance:

$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$

[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor]
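The distance formula and the neighbor vote above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code; the function names are chosen here for clarity.

```python
import math

def euclidean(p, q):
    # d(p, q) = sqrt(sum_i (p_i - q_i)^2)
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_predict(X_train, y_train, x, k):
    # label x by majority vote among its k nearest training points
    ranked = sorted(range(len(X_train)), key=lambda i: euclidean(X_train[i], x))
    votes = [y_train[i] for i in ranked[:k]]
    return max(set(votes), key=votes.count)
```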
SVM – Basic Concept
Find a hyperplane (decision boundary) that separates the data. Several candidate hyperplanes (e.g. B1, B2) can separate the same data.

[Figure: separating hyperplane B1 with boundary w·x + b = 0 and margin boundaries b11: w·x + b = 1, b12: w·x + b = −1]
SVM – Basic Concept
Optimization problem:

$\min_{w} \ \frac{1}{2}\|w\|^2$

subject to $y_i (w \cdot x_i + b) \ge 1, \quad i = 1, \dots, n$

Decision function:

$f(x) = \operatorname{sign}(w \cdot x + b)$

w : the unique weight vector of the hyperplane
b : the intercept of the hyperplane
y : class label (+1, −1)
x : vector of the attribute values of each data point
SVM – Basic Concept

[Figure: hyperplane w·x + b = 0 with margin boundaries b11: w·x + b = 1 and b12: w·x + b = −1]

$f(x) = \begin{cases} 1 & \text{if } w \cdot x + b \ge 1 \\ -1 & \text{if } w \cdot x + b \le -1 \end{cases}$

$\text{Margin} = \frac{2}{\|w\|}$
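The decision function and the margin width can be checked numerically. A minimal sketch (illustrative helper names, not the project's code):

```python
import math

def svm_decision(w, b, x):
    # f(x) = sign(w . x + b)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

def margin_width(w):
    # distance between the boundaries w.x + b = +1 and w.x + b = -1 is 2 / ||w||
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))
```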
SVM – Soft Margin
When the data cannot be separated linearly, slack variables ξ allow margin violations: a point inside the margin satisfies, e.g., $w \cdot x + b = -1 + \xi$.

[Figure: hyperplane w·x + b = 0, margin boundary w·x + b = −1, and a slack violation ξ]
SVM – Soft Margin
Optimization problem:

$\min_{w,b,\xi} \ \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i$

subject to $y_i (b + w^T \Phi(x_i)) \ge 1 - \xi_i, \quad i = 1, \dots, l; \quad \xi_i \ge 0$

Cost value C: the larger C is, the heavier the penalty on each slack error, so fewer margin violations are tolerated.
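The soft-margin objective above can be evaluated directly for a given (w, b), which makes the role of C concrete. A sketch under the linear case (Φ = identity); the function name is illustrative:

```python
def soft_margin_objective(w, b, X, y, C):
    # (1/2) w^T w + C * sum_i xi_i, where xi_i = max(0, 1 - y_i (w . x_i + b))
    dot = lambda a, c: sum(ai * ci for ai, ci in zip(a, c))
    slack = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    return 0.5 * dot(w, w) + C * sum(slack)
```

Doubling C doubles the contribution of every margin violation, pushing the optimizer toward a smaller-slack (but possibly smaller-margin) solution.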
SVM – Kernel Trick
When the decision boundary is not linear, transform the data into a higher-dimensional space.

Kernel type                  Function
Linear                       $K(x_i, x_j) = x_i \cdot x_j$
Polynomial                   $K(x_i, x_j) = (\gamma \, x_i \cdot x_j + c)^d$
Radial Basis Function (RBF)  $K(x_i, x_j) = \exp(-\gamma \, \|x_i - x_j\|^2)$
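The three kernels in the table translate directly to code. A minimal sketch (function names are illustrative):

```python
import math

def linear_kernel(xi, xj):
    # K(xi, xj) = xi . xj
    return sum(a * b for a, b in zip(xi, xj))

def polynomial_kernel(xi, xj, gamma, c, d):
    # K(xi, xj) = (gamma * xi . xj + c)^d
    return (gamma * linear_kernel(xi, xj) + c) ** d

def rbf_kernel(xi, xj, gamma):
    # K(xi, xj) = exp(-gamma * ||xi - xj||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)
```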
Bagging (Bootstrap Aggregating)
D: the original training data.
D1, D2, …, Dn−1, Dn: n new training sets created by sampling with replacement.
C1, C2, …, Cn−1, Cn: n classification models, one trained per sample.
C*: the combined model; predictions are merged by voting.
Bagging (Bootstrap Aggregating)
Sampling with replacement:

Original training data  1 2 3 4 5 6 7 8 9
Bagging (1)             7 8 9 8 2 5 9 2 1
Bagging (2)             1 4 9 1 2 3 2 7 3
Bagging (3)             1 8 5 9 5 5 9 6 3
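The sampling step above can be sketched as follows; each bootstrap sample has the same size as the original data, so some records repeat and others are left out (a minimal illustration, not the project's code):

```python
import random

def bootstrap_samples(data, n_samples, seed=None):
    # draw n_samples new training sets, each built by sampling
    # with replacement from the original training data
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in range(len(data))]
            for _ in range(n_samples)]
```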
Design and Implementation
Interface
Process
Data
Data: Input
Wieslaw dataset and Australian credit approval dataset.

Dataset                     Positive records  Negative records  Total records  Attributes
Wieslaw                     128               112               240            30
Australian credit approval  383               307               690            14
Data: Output
Training output: a classification model.
Testing output: predicted labels.
−1: bankrupt (Wieslaw); credit denied (Australian)
+1: not bankrupt (Wieslaw); credit approved (Australian)
Process: Data Splitting
Cross validation splits the dataset into training/test pairs: Training Data 1 / Test Data 1, …, Training Data n / Test Data n.
Each training fold is then split again by cross validation into sub-training sets (Data trs 1, …, Data trs n) and selection sets (Data ts 1, …, Data ts n).
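The fold construction can be sketched as an index split; this is a generic k-fold helper (illustrative, not the project's code), applied once to the dataset and again to each training fold:

```python
def kfold_indices(n, k):
    # split indices 0..n-1 into k disjoint test folds;
    # the training set of each fold is everything outside its test fold
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds
```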
Process: Training
The training pipeline runs Bagging, then KNN, then SVM training:
Bagging: Data trs is resampled with replacement into ten bootstrap samples (Bootstrap sample 1, …, Bootstrap sample 10).
KNN: from each bootstrap sample, the k nearest neighbors of the Data ts points are selected (KNN 1, …, KNN 10).
SVM training: one SVM is trained on each selected neighbor set, yielding SVM Model 1, …, SVM Model 10.
Process: Testing
SVM testing: the test data (Data Uji) is classified by each of SVM Model 1, …, SVM Model 10, producing Prediction 1, …, Prediction 10.
Bagging (voting): the ten predictions are combined by majority vote into the final prediction.
Interface
[Figure: application interface screenshot]

Testing
Different k values (KNN)
Different cost values & kernel types (SVM)
Comparison with other methods
Test with different k values (Wieslaw)

k    Accuracy  Precision  Sensitivity  Specificity
1    65.67     67.88      69.76        61.67
2    68.92     71.39      70.61        67.18
3    68.92     72.41      69.85        69.78
4    70.58     72.87      71.99        68.6
5    70.83     73.82      71.03        70.25
6    69.42     72.78      69.06        69.48
7    70.5      75.3       68.76        72.5
8    70.58     73.55      70.41        70.91
9    70.83     74.39      70.55        72.58
10   71.58     73.99      71.86        71.08

[Chart: accuracy, precision, sensitivity, and specificity for k = 1–10]
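The four metrics reported throughout these result tables come from the confusion matrix counts (TP, TN, FP, FN). A small illustrative helper (not the project's evaluation code):

```python
def metrics(y_true, y_pred, positive=1):
    # accuracy, precision, sensitivity (recall of +), specificity (recall of -)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```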
Test with different k values (Australian)

k    Accuracy  Precision  Sensitivity  Specificity
1    84.49     83.91      81.17        87.22
2    84.9      83.94      82.35        87.19
3    84.93     83.78      82.76        87.04
4    85.74     84.81      83.02        87.84
5    85.3      83.19      84.54        86.28
6    84.43     83.53      81.88        86.51
7    85.07     83.08      83.94        86.16
8    86.23     85.92      83.06        88.95
9    85.77     84.96      82.95        88.1
10   85.22     84.68      82.34        87.3

[Chart: accuracy, precision, sensitivity, and specificity for k = 1–10]
Test with different SVM cost values (Wieslaw)

Cost   Accuracy  Precision  Sensitivity  Specificity
0.01   67.58     68.22      75.51        59.25
0.1    70.17     72.66      72.12        66.93
1      71.08     75.12      71.61        72.17
10     70.33     72.58      70.85        69.47
100    71        75.25      70.03        72.87

[Chart: accuracy, precision, sensitivity, and specificity for cost = 0.01–100]
Test with different SVM cost values (Australian)

Cost   Accuracy  Precision  Sensitivity  Specificity
0.01   83.51     86.86      74.5         90.7
0.1    85.54     82.51      85.55        85.44
1      84.64     83.64      82.09        86.56
10     80.72     81.68      73.79        86.27
100    75.8      78.57      65.46        83.85

[Chart: accuracy, precision, sensitivity, and specificity for cost = 0.01–100]
Test with the RBF kernel and different gamma values (Wieslaw)

Gamma   Accuracy  Precision  Sensitivity  Specificity
0.0001  54.17     58.36      73.19        35.07
0.001   58.75     60.59      65.61        50.55
0.01    56.83     57.28      74.14        37.45
0.1     54.5      54.36      89.48        14.33
1       53.25     53.3       99.87        0
10      52.67     51         96           4

[Chart: accuracy, precision, sensitivity, and specificity per gamma value]
Test with the RBF kernel and different gamma values (Australian)

Gamma   Accuracy  Precision  Sensitivity  Specificity
0.0001  67.88     68.43      52.17        80.79
0.001   68.81     64.98      64.13        72.45
0.01    56.06     51.05      14.67        89.41
0.1     55.04     16         0.4          98.89
1       55.51     0          0            100
10      55.51     0          0            100

[Chart: accuracy, precision, sensitivity, and specificity per gamma value]
Test with the polynomial kernel and different degree values (Wieslaw)

Degree  Accuracy  Precision  Sensitivity  Specificity
1       65.25     66.92      71.77        58.96
2       71.33     72.34      74.22        67.8
3       69.67     72.22      69.36        69.95
4       68.08     71.42      68.64        68.47
5       70.08     72.25      71.11        69.32

[Chart: accuracy, precision, sensitivity, and specificity for degree = 1–5]
Test with the polynomial kernel and different degree values (Australian)

Degree  Accuracy  Precision  Sensitivity  Specificity
1       79.83     89.47      62.87        93.85
2       80.32     85.79      67.64        90.48
3       72.35     75.74      62.65        79.87
4       60.09     64.66      62.55        57.88
5       57.97     64.9       52.6         62.46

[Chart: accuracy, precision, sensitivity, and specificity for degree = 1–5]
Comparison with other classification methods (Wieslaw)

Method   Accuracy  Precision  Sensitivity  Specificity
KNN      75        76.22      78.04        73.03
ANN      70        70         70           70
SVM      70.42     74.29      69.91        74.82
BLR      87.54     90.68      86.42        11.07
BNNSVM   71.58     73.99      71.86        71.08

[Chart: accuracy, precision, sensitivity, and specificity per method]

KNN = K-Nearest Neighbor
ANN = Artificial Neural Network
SVM = Support Vector Machine
BLR = Binary Logistic Regression
BNNSVM = Bagging Nearest Neighbor Support Vector Machine
Comparison with other classification methods (Australian)

Method   Accuracy  Precision  Sensitivity  Specificity
KNN      83.19     80.9       80.5         85.2
ANN      83.48     83.5       85.11        87.94
SVM      84.35     77.03      93.44        77.92
BLR      80.83     81.73      75.89        14.84
BNNSVM   86.23     85.92      83.06        88.95

[Chart: accuracy, precision, sensitivity, and specificity per method]
Conclusion
The BNNSVM method was implemented for bankruptcy prediction, with the following results:

Dataset                     Accuracy  Precision  Sensitivity  Specificity
Wieslaw                     71.58 %   73.99 %    71.86 %      71.08 %
Australian credit approval  86.23 %   85.92 %    83.06 %      88.95 %
References
Li, H., & Sun, J. (2011). Forecasting business failure: The use of nearest-neighbour support vectors and correcting imbalanced samples – Evidence from Chinese hotel industry. Tourism Management, 33(3), 622–634.
Frank, A., & Asuncion, A. (2010). UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences. Retrieved from http://archive.ics.uci.edu/ml
Wieslaw, P. (2004). Application of discrete predicting structures in an early warning expert system for financial distress.
Tan, P. N., Steinbach, M., & Kumar, V. (2006). Introduction to Data Mining. Boston: Pearson Addison Wesley.
Thank you