CHAPTER 5
IMPLEMENTATION AND TESTING
5.1 Implementation
This system is implemented in PHP. It starts with the user uploading txt files
or typing the text manually, and entering kgram and window values in the range 1
to 10. This process runs in index.php. The user can also add a text to the
system by clicking the Add Text button displayed in the interface. The interface
is as follows:
Illustration 5.1: interfacesystem1
As seen in the interface, the Browse button is used to search for the txt files
to upload. After selecting at least 2 txt files, the user clicks the Upload Text
button to upload the texts and display them in the text areas. The user can also
type the text directly into a text area. The user must then enter the kgram and
window values, and can detect plagiarism in the texts with the winnowing
algorithm by clicking the Process Text button.
When the Process Text button is clicked, the system redirects to a new page
and displays the results. The results page runs in hasilProses.php. Here is an
example of the results interface:
Illustration 5.2: interfacesystem2
Illustration 5.3: interfacesystem3
Illustration 5.4: interfacesystem4
Illustration 5.5: interfacesystem5
Illustration 5.6: interfacesystem6
Illustration 5.7: interfacesystem7
The result page displays each step of the winnowing algorithm used to detect
plagiarism in the texts, together with the plagiarism percentage of the input
texts under 3 different fingerprint matching methods: the Jaccard Similarity
Coefficient, the Sorensen Dice Similarity Coefficient, and the Andberg
Similarity Coefficient. As seen in the result interface above, the first step of
the winnowing algorithm is whitespace insensitivity, which removes symbols,
spaces, and irrelevant characters. This step is executed in proses1.php. The
second step splits the text into k-grams based on the kgram value entered in the
system; it is executed in proses2.php. The third step applies a rolling hash to
produce a hash value for each k-gram from the second step; it is executed in
proses3.php. The fourth step partitions the hash values into windows based on
the window value entered in the system; it is executed in proses4.php. The fifth
step chooses the smallest value in each window as a fingerprint; it is executed
in proses5.php. The last step calculates the similarity of the texts with the 3
similarity methods: the Jaccard similarity coefficient runs in proses6.php, the
Sorensen Dice similarity coefficient in proses7.php, and the Andberg similarity
coefficient in proses8.php. Each process inherits from the process before it,
and the whole chain is called from hasilProses.php.
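The five fingerprinting steps described above can be sketched compactly. The
Python sketch below is an illustration only, not the PHP code in proses1.php to
proses5.php: the function names, the lowercase folding, the rule of keeping only
letters and digits, and the polynomial hash with base 3 and no modulus are all
assumptions made for the example.

```python
import re


def clean(text):
    # Step 1 (whitespace insensitivity): keep only letters and digits.
    # Lowercasing is an assumption; the chapter does not mention it.
    return re.sub(r"[^a-z0-9]", "", text.lower())


def kgrams(text, k):
    # Step 2: every contiguous substring of length k.
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def hash_kgram(gram, base=3):
    # Step 3: polynomial hash of one k-gram. `base` plays the role of the
    # "basic prime number"; the exact formula is an assumption.
    h = 0
    for ch in gram:
        h = h * base + ord(ch)
    return h


def windows(hashes, w):
    # Step 4: sliding windows of w consecutive hash values.
    return [hashes[i:i + w] for i in range(len(hashes) - w + 1)]


def fingerprints(text, k, w, base=3):
    # Step 5: the smallest hash in each window, collected as a set.
    hashes = [hash_kgram(g, base) for g in kgrams(clean(text), k)]
    return {min(win) for win in windows(hashes, w)}


if __name__ == "__main__":
    print(sorted(fingerprints("A do run, run!", k=3, w=2)))
```

With window 1 every hash value becomes a fingerprint, so larger windows keep
fewer fingerprints per text.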
5.2 Testing
Testing is divided into 3 sections. The first test analyses kgram and window
values in the range 1 to 10 on text samples, to find the optimal kgram and
window values for detecting plagiarism within that range. Once the optimal kgram
and window values are found, the second test analyses the basic prime number in
the rolling hash function, to find the optimal basic prime number for detecting
plagiarism. The third test compares the 3 similarity methods: the Jaccard
Similarity Coefficient, the Sorensen Dice Similarity Coefficient, and the
Andberg Similarity Coefficient.
The text sample used in the first test is the history of the computer, taken
from Wikipedia, rewritten into a txt file, and split into 4 text samples.
In the first test, kgram and window values in the range 1 to 10 are entered
in the system. The test is repeated for every combination of kgram and window in
that range, so the first test consists of 100 runs for each similarity method.
The results are recorded in tables; the tables below recap the first test
sample:
1.) Jaccard Similarity Table Test
Table 5.1: jaccardtabletest1
kgram  window  Text 1 vs Text 2  Text 1 vs Text 3  Text 1 vs Text 4
1      1       95,46%            95,45%            86,36%
1      2       95%               90%               85%
1      3       94,74%            84,21%            73,68%
1      4       94,12%            82,35%            70,59%
1      5       86,67%            86,67%            60%
1      6       91,67%            83,33%            58,33%
1      7       81,82%            81,82%            63,64%
1      8       88,89%            88,89%            77,78%
1      9       87,5%             100%              75%
1      10      85,71%            100%              71,43%
Mean           90,16%            89,27%            72,18%
2.) Sorensen Dice Similarity Table Test
Table 5.2: sorensendicetabletest1
kgram  window  Text 1 vs Text 2  Text 1 vs Text 3  Text 1 vs Text 4
1      1       97,67%            97,67%            92,68%
1      2       97,43%            94,74%            91,89%
1      3       97,30%            91,43%            84,85%
1      4       96,97%            90,32%            82,76%
1      5       92,86%            92,86%            75%
1      6       95,65%            90,91%            73,68%
1      7       90%               90%               77,78%
1      8       94,12%            94,12%            87,5%
1      9       93,33%            100%              85,71%
1      10      92,31%            100%              83,33%
Mean           94,76%            94,20%            83,52%
3.) Andberg Similarity Table Test
Table 5.3: andbergtabletest1
kgram  window  Text 1 vs Text 2  Text 1 vs Text 3  Text 1 vs Text 4
1      1       91,30%            91,30%            76%
1      2       90,48%            81,82%            73,91%
1      3       90%               72,73%            58,33%
1      4       88,89%            70%               54,54%
1      5       76,47%            76,47%            42,86%
1      6       84,61%            71,43%            41,18%
1      7       69,23%            69,23%            46,67%
1      8       80%               80%               63,64%
1      9       77,78%            100%              60%
1      10      75%               100%              55,56%
Mean           82,38%            81,30%            57,27%
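The three percentages reported for one kgram and window setting are linked,
because all three can be derived from the same pair of fingerprint sets. The
Python sketch below is not the code in proses6.php to proses8.php. It assumes
fingerprints are compared as sets and uses the usual set definitions of the
Jaccard and Sorensen Dice coefficients. The Andberg formula, shared / (shared +
2 x unshared), is an assumption inferred from the table values. The example sets
are hypothetical, chosen so that 19 of 20 distinct fingerprints are shared; this
reproduces the kgram 1, window 2 row for text 1 and text 2 (95%, 97,43%, 90,48%)
to within 0,01.

```python
def jaccard(a, b):
    # Shared fingerprints divided by all distinct fingerprints.
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def sorensen_dice(a, b):
    # Twice the shared fingerprints divided by the sum of both set sizes.
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def andberg(a, b):
    # Assumed formula: unshared fingerprints are weighted twice.
    shared = len(a & b)
    unshared = len(a ^ b)
    denom = shared + 2 * unshared
    return shared / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical fingerprint sets: 19 shared values, 1 unshared value.
    text1 = set(range(20))
    text2 = set(range(19))
    for name, fn in [("Jaccard", jaccard),
                     ("Sorensen Dice", sorensen_dice),
                     ("Andberg", andberg)]:
        print(f"{name}: {fn(text1, text2) * 100:.2f}%")
```

Under these definitions the Sorensen Dice value is always at least the Jaccard
value and the Andberg value is always at most the Jaccard value, which matches
the ordering seen in every row of the three tables.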
The results of the first test are summarised in the following diagrams:
1. Jaccard Similarity Diagram
kgram : 1
With kgram 1 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 decreases as the window grows,
reaching up to 95,46%.
2. The similarity between text 1 and text 3 fluctuates, reaching up to 100%.
3. The similarity between text 1 and text 4 fluctuates, reaching up to 86,36%.
Illustration 5.8: jsdiagram1 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 2
With kgram 2 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 decreases as the window grows,
reaching up to 93,33%.
2. The similarity between text 1 and text 3 fluctuates, reaching up to 96%.
3. The similarity between text 1 and text 4 fluctuates, reaching up to 84%.
Illustration 5.9: jsdiagram2 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 3
With kgram 3 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 decreases as the window grows,
reaching up to 94,79%.
2. The similarity between text 1 and text 3 fluctuates, reaching up to 72,51%.
3. The similarity between text 1 and text 4 fluctuates, reaching up to 55,92%.
Illustration 5.10: jsdiagram3 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 4
With kgram 4 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 decreases as the window grows,
reaching up to 90,15%.
2. The similarity between text 1 and text 3 fluctuates, reaching up to 57,88%.
3. The similarity between text 1 and text 4 fluctuates, reaching up to 41,13%.
Illustration 5.11: jsdiagram4 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 5
With kgram 5 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 fluctuates, reaching up to 85,95%.
2. The similarity between text 1 and text 3 fluctuates, reaching up to 55,13%.
3. The similarity between text 1 and text 4 fluctuates, reaching up to 36,22%.
Illustration 5.12: jsdiagram5 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 6
With kgram 6 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 83,10%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 53,80%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 32,61%.
Illustration 5.13: jsdiagram6 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 7
With kgram 7 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 81,79%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 52,81%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 30,77%.
Illustration 5.14: jsdiagram7 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 8
With kgram 8 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 81,29%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 52,20%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 30,14%.
Illustration 5.15: jsdiagram8 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 9
With kgram 9 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 81,21%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 52,05%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 30,14%.
Illustration 5.16: jsdiagram9 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 10
With kgram 10 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 80,78%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 51,35%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 29,73%.
Illustration 5.17: jsdiagram10 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
2. Sorensen Dice Similarity Diagram
kgram : 1
With kgram 1 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 fluctuates, reaching up to 97,67%.
2. The similarity between text 1 and text 3 fluctuates, reaching up to 100%.
3. The similarity between text 1 and text 4 fluctuates, reaching up to 92,68%.
Illustration 5.18: sdicediagram1 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 2
With kgram 2 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 decreases as the window grows,
reaching up to 96,55%.
2. The similarity between text 1 and text 3 fluctuates, reaching up to 97,96%.
3. The similarity between text 1 and text 4 fluctuates, reaching up to 91,30%.
Illustration 5.19: sdicediagram2 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 3
With kgram 3 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 97,32%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 84,06%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 71,73%.
Illustration 5.20: sdicediagram3 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 4
With kgram 4 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 94,82%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 73,32%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 58,29%.
Illustration 5.21: sdicediagram4 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 5
With kgram 5 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 92,45%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 71,07%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 53,18%.
Illustration 5.22: sdicediagram5 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 6
With kgram 6 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 90,77%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 69,96%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 49,18%.
Illustration 5.23: sdicediagram6 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 7
With kgram 7 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 89,97%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 69,12%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 47,06%.
Illustration 5.24: sdicediagram7 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 8
With kgram 8 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 89,68%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 68,59%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 46,31%.
Illustration 5.25: sdicediagram8 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 9
With kgram 9 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 89,63%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 68,47%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 46,31%.
Illustration 5.26: sdicediagram9 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 10
With kgram 10 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 89,37%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 67,86%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 45,83%.
Illustration 5.27: sdicediagram10 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
3. Andberg Similarity Diagram
kgram : 1
With kgram 1 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 decreases as the window grows,
reaching up to 91,30%.
2. The similarity between text 1 and text 3 fluctuates, reaching up to 100%.
3. The similarity between text 1 and text 4 fluctuates, reaching up to 76%.
Illustration 5.28: andiagram1 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 2
With kgram 2 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 fluctuates, reaching up to 87,50%.
2. The similarity between text 1 and text 3 fluctuates, reaching up to 92,31%.
3. The similarity between text 1 and text 4 fluctuates, reaching up to 72,41%.
Illustration 5.29: andiagram2 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 3
With kgram 3 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 fluctuates, reaching up to 90,10%.
2. The similarity between text 1 and text 3 fluctuates, reaching up to 56,88%.
3. The similarity between text 1 and text 4 fluctuates, reaching up to 38,81%.
Illustration 5.30: andiagram3 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 4
With kgram 4 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 82,06%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 40,73%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 25,89%.
Illustration 5.31: andiagram4 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 5
With kgram 5 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 75,36%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 38,05%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 22,11%.
Illustration 5.32: andiagram5 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 6
With kgram 6 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 71,08%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 36,12%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 19,48%.
Illustration 5.33: andiagram6 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 7
With kgram 7 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 69,17%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 35,88%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 18,18%.
Illustration 5.34: andiagram7 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 8
With kgram 8 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 68,48%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 35,31%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 17,74%.
Illustration 5.35: andiagram8 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 9
With kgram 9 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 68,36%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 35,18%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 17,74%.
Illustration 5.36: andiagram9 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
kgram : 10
With kgram 10 and window values from 1 to 10, the analysis is:
1. The similarity between text 1 and text 2 does not change significantly,
reaching up to 67,76%.
2. The similarity between text 1 and text 3 does not change significantly,
reaching up to 34,54%.
3. The similarity between text 1 and text 4 does not change significantly,
reaching up to 17,46%.
Illustration 5.37: andiagram10 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Text 1 with Text 2, Text 1 with Text 3, Text 1 with Text 4]
The second test analyses the basic prime number in the rolling hash
function. The text samples are the same as in the first test. The second test
uses kgram 3 and window 1.
The results of the second test for the 3 similarity methods, using the text
samples above, are summarised below:
1. Jaccard Similarity Table Test
Kgram : 3 , Window : 1
Table 5.4: jsprimetable
Basic Prime Number  Text 1 vs Text 2  Text 1 vs Text 3  Text 1 vs Text 4
3                   94,79%            72,51%            55,92%
5                   91,46%            63,72%            44,51%
7                   86,77%            61,64%            40,74%
11                  86,47%            57,34%            38,07%
19                  86,55%            56,50%            37,44%
29                  86,56%            55,95%            37%
47                  86,56%            55,95%            37%
109                 86,56%            55,95%            37%
199                 86,56%            55,95%            37%
2. Sorensen Dice Similarity Table Test
Kgram : 3 , Window : 1
Table 5.5: sdiceprimetable
Basic Prime Number  Text 1 vs Text 2  Text 1 vs Text 3  Text 1 vs Text 4
3                   97,32%            84,06%            71,73%
5                   95,54%            77,84%            61,60%
7                   92,92%            76,27%            57,89%
11                  92,74%            72,89%            55,15%
19                  92,79%            72,21%            54,49%
29                  92,80%            71,75%            54,02%
47                  92,80%            71,75%            54,02%
109                 92,80%            71,75%            54,02%
199                 92,80%            71,75%            54,02%
3. Andberg Similarity Table Test
Kgram : 3 , Window : 1
Table 5.6: andprimetable
Basic Prime Number  Text 1 vs Text 2  Text 1 vs Text 3  Text 1 vs Text 4
3                   90,09%            56,88%            38,81%
5                   84,27%            46,76%            28,63%
7                   76,63%            44,55%            25,58%
11                  76,16%            40,19%            23,51%
19                  76,28%            39,37%            23,03%
29                  76,31%            38,84%            22,70%
47                  76,31%            38,84%            22,70%
109                 76,31%            38,84%            22,70%
199                 76,31%            38,84%            22,70%
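The basic prime number varied in the tables above is the base of the rolling
hash. The Python sketch below shows one common way such a hash can be rolled
from one k-gram to the next without rehashing the whole k-gram. It is an
illustration, not the code in proses3.php: the polynomial form, the absence of a
modulus, and the function names are assumptions.

```python
def direct_hash(gram, base):
    # Polynomial hash: c1*base^(k-1) + c2*base^(k-2) + ... + ck.
    h = 0
    for ch in gram:
        h = h * base + ord(ch)
    return h


def rolling_hashes(text, k, base):
    # Hash every k-gram of `text`, updating the previous hash in O(1):
    # remove the leading character's term, shift by `base`, add the new one.
    if len(text) < k:
        return []
    high = base ** (k - 1)
    h = direct_hash(text[:k], base)
    hashes = [h]
    for i in range(k, len(text)):
        h = (h - ord(text[i - k]) * high) * base + ord(text[i])
        hashes.append(h)
    return hashes


if __name__ == "__main__":
    sample = "historyofthecomputer"  # hypothetical cleaned text
    for base in (3, 5, 7, 11, 19, 29, 47, 109, 199):
        hashes = rolling_hashes(sample, 3, base)
        # Distinct hash values per base; fewer means more collisions.
        print(base, len(set(hashes)), "distinct of", len(hashes))
```

Comparing the number of distinct hash values per base on the actual test texts
is one way to check whether hash collisions differ between bases.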
Based on the summaries of the first and second tests above, it can be
concluded that kgram 2 and 3 with window values from 1 to 10 and basic prime
number 3 are the optimal values for the third test, because across the first and
second tests they give the highest mean result with the three similarity
methods.
The third test tests the winnowing algorithm on detecting plagiarism in text and
compares the 3 similarity methods. The test text is one of the plagiarism cases
in Indonesia, as reported in the news
(kabar24.bisnis.com/diduga-plagiat-ini-perbandingan-artikel-anggito-abimanyu-hotbonar-sinaga, 2014).
Master text (munawarkasan.com/artikel-asuransi/43-menggagas-asuransi-bencana, 2014).
Plagiarism text (budisanblog.blogspot.co.id/gagasan-asuransi-bencana, 2014).
In the third test, the optimal kgram values 2 and 3 are entered in the
system. The test is repeated for each of those kgram values with window values
from 1 to 10, so the third test consists of 20 runs for each similarity method.
The third test results are shown in the tables below:
1. kgram : 2
Table 5.7: thirdtesttable1
Percentage similarity by method:
Window  Jaccard Similarity  Sorensen Dice Similarity  Andberg Similarity
1       81,37%              89,73%                    68,59%
2       80,13%              88,97%                    66,85%
3       78,77%              88,12%                    64,97%
4       78,42%              87,90%                    64,50%
5       73,88%              84,98%                    58,58%
6       71,32%              83,26%                    55,42%
7       71,67%              83,49%                    55,84%
8       70,69%              82,83%                    54,67%
9       70,53%              82,72%                    54,48%
10      69,09%              81,72%                    52,78%
Mean    74,59%              85,37%                    59,67%
Based on the table, the analysis is:
1. The highest similarity for the article with the Jaccard similarity occurs at
kgram 2 and window 1, with a result of 81,37%.
2. The highest similarity for the article with the Sorensen Dice similarity
occurs at kgram 2 and window 1, with a result of 89,73%.
3. The highest similarity for the article with the Andberg similarity occurs at
kgram 2 and window 1, with a result of 68,59%.
4. The lowest similarity for the article with the Jaccard similarity occurs at
kgram 2 and window 10, with a result of 69,09%.
5. The lowest similarity for the article with the Sorensen Dice similarity
occurs at kgram 2 and window 10, with a result of 81,72%.
6. The lowest similarity for the article with the Andberg similarity occurs at
kgram 2 and window 10, with a result of 52,78%.
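The mean row and the highest and lowest windows listed above can be rechecked
directly from the Table 5.7 values. The Python sketch below does this; the
dictionary keys and the function name are hypothetical, and the numbers are
copied from the table with decimal points in place of decimal commas.

```python
# Percentage similarity for kgram 2, windows 1 to 10, copied from Table 5.7.
table_5_7 = {
    "jaccard": [81.37, 80.13, 78.77, 78.42, 73.88,
                71.32, 71.67, 70.69, 70.53, 69.09],
    "sorensen_dice": [89.73, 88.97, 88.12, 87.90, 84.98,
                      83.26, 83.49, 82.83, 82.72, 81.72],
    "andberg": [68.59, 66.85, 64.97, 64.50, 58.58,
                55.42, 55.84, 54.67, 54.48, 52.78],
}


def summarise(values):
    # values[i] is the result for window i + 1.
    window_numbers = range(1, len(values) + 1)
    return {
        "mean": round(sum(values) / len(values), 2),
        "highest_window": max(window_numbers, key=lambda w: values[w - 1]),
        "lowest_window": min(window_numbers, key=lambda w: values[w - 1]),
    }


if __name__ == "__main__":
    for method, values in table_5_7.items():
        print(method, summarise(values))
```

For all three methods this gives window 1 as the highest and window 10 as the
lowest, and the means agree with the Mean row of the table.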
2. kgram : 3
Table 5.8: thirdtesttable2
Percentage similarity by method:
Window  Jaccard Similarity  Sorensen Dice Similarity  Andberg Similarity
1       83,33%              90,90%                    71,43%
2       80,38%              89,12%                    67,19%
3       78,45%              87,92%                    64,53%
4       77,16%              87,10%                    62,81%
5       73,91%              85%                       58,62%
6       71,53%              83,40%                    55,68%
7       71,04%              83,07%                    55,09%
8       70,8%               82,90%                    54,80%
9       69,04%              81,68%                    52,71%
10      69,78%              82,20%                    53,58%
Mean    74,54%              85,33%                    59,64%
Based on the table, the analysis is:
1. The highest similarity for the article with the Jaccard similarity occurs at
kgram 3 and window 1, with a result of 83,33%.
2. The highest similarity for the article with the Sorensen Dice similarity
occurs at kgram 3 and window 1, with a result of 90,90%.
3. The highest similarity for the article with the Andberg similarity occurs at
kgram 3 and window 1, with a result of 71,43%.
4. The lowest similarity for the article with the Jaccard similarity occurs at
kgram 3 and window 9, with a result of 69,04%.
5. The lowest similarity for the article with the Sorensen Dice similarity
occurs at kgram 3 and window 9, with a result of 81,68%.
6. The lowest similarity for the article with the Andberg similarity occurs at
kgram 3 and window 9, with a result of 52,71%.
The results of the third test are summarised in the following diagrams:
1. kgram : 2
With kgram 2 and window values from 1 to 10, the analysis is:
1. The article similarity with the Jaccard similarity reaches up to 81,37%.
2. The article similarity with the Sorensen Dice similarity reaches up to 89,73%.
3. The article similarity with the Andberg similarity reaches up to 68,59%.
Illustration 5.38: thirdtestdiagram1 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Article using Jaccard Similarity, Sorensen Dice Similarity, Andberg Similarity]
2. kgram : 3
With kgram 3 and window values from 1 to 10, the analysis is:
1. The article similarity with the Jaccard similarity reaches up to 83,33%.
2. The article similarity with the Sorensen Dice similarity reaches up to 90,90%.
3. The article similarity with the Andberg similarity reaches up to 71,43%.
Illustration 5.39: thirdtestdiagram2 [line chart; x-axis: Window (1 to 10); y-axis: Percentage Similarity; series: Similarity Article using Jaccard Similarity, Sorensen Dice Similarity, Andberg Similarity]