CHAPTER 5 IMPLEMENTATION AND TESTINGrepository.unika.ac.id/16185/6/14K10040 Luke Michael Febriansyah.… · CHAPTER 5 IMPLEMENTATION AND TESTING 5.1 Implementation This system is

CHAPTER 5

IMPLEMENTATION AND TESTING

5.1 Implementation

The system is implemented in PHP. The user starts by uploading .txt files or typing the texts manually, and then enters the kgram and window values, each in the range 1 to 10. This process runs in index.php. The user can also add another text to the system by clicking the Add Text button shown in the interface. The interface is shown below:

Illustration 5.1: interfacesystem1

As seen in the interface, the Browse button is used to select the .txt file to be uploaded. After selecting at least two .txt files, the user clicks the Upload Text button to upload the texts and display them in the text areas. The user can also type the texts directly into the text areas. The user must then enter the kgram and window values, and can detect plagiarism between the texts with the winnowing algorithm by clicking the Process Text button.

When the Process Text button is clicked, the system redirects to a new page that displays the results. The results page is rendered by hasilProses.php. An example of the results interface is shown below:

Illustration 5.2: interfacesystem2

Illustration 5.3: interfacesystem3

Illustration 5.4: interfacesystem4


The results page displays each step of the winnowing algorithm as well as the plagiarism percentage of the input texts, computed with three fingerprint-matching methods: the Jaccard Similarity Coefficient, the Sorensen Dice Similarity Coefficient, and the Andberg Similarity Coefficient. As seen in the results interface, the steps are:

1. Whitespace insensitivity: symbols, spaces, and other irrelevant characters are removed. This step is executed in proses1.php.
2. Kgram: the text is split into kgrams according to the kgram value entered in the system. This step is executed in proses2.php.
3. Rolling hash: a hash value is produced for each kgram resulting from the kgram step. This step is executed in proses3.php.
4. Window: the hash values are partitioned according to the window value entered in the system. This step is executed in proses4.php.
5. Fingerprint: the smallest hash value in each window is selected as a fingerprint. This step is executed in proses5.php.
6. Similarity: the similarity of the texts is calculated with the three similarity methods. The Jaccard similarity coefficient is computed in proses6.php, the Sorensen Dice similarity coefficient in proses7.php, and the Andberg similarity coefficient in proses8.php.

Each of these processes inherits from the one before it, and the whole chain is called from hasilProses.php.

Illustration 5.5: interfacesystem5

Illustration 5.6: interfacesystem6

Illustration 5.7: interfacesystem7
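The five fingerprinting steps described above can be sketched compactly. The following is a minimal illustration in Python, not the author's PHP code: all function names are hypothetical, kgrams are assumed to be character substrings of length k, the hash is a plain polynomial hash with base 3 and no modulus, and fingerprints are kept as a set of minimum values without their positions.

```python
import re


def normalize(text):
    """Step 1: lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def kgrams(text, k):
    """Step 2: every substring of length k, in order of appearance."""
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def rolling_hashes(text, k, base=3):
    """Step 3: polynomial hash of each kgram, updated incrementally."""
    count = len(text) - k + 1
    if count <= 0:
        return []
    h = 0
    for ch in text[:k]:
        h = h * base + ord(ch)
    hashes = [h]
    high = base ** (k - 1)  # weight of the character that leaves the kgram
    for i in range(1, count):
        # Remove the leading character, shift, then append the new character.
        h = (h - ord(text[i - 1]) * high) * base + ord(text[i + k - 1])
        hashes.append(h)
    return hashes


def windows(hashes, w):
    """Step 4: overlapping windows of w consecutive hash values."""
    return [hashes[i:i + w] for i in range(len(hashes) - w + 1)]


def fingerprints(hashes, w):
    """Step 5: the smallest hash value of each window."""
    return {min(win) for win in windows(hashes, w)}


def winnow(text, k, w, base=3):
    """Run steps 1 to 5 on one text and return its fingerprint set."""
    clean = normalize(text)
    return fingerprints(rolling_hashes(clean, k, base), w)
```

Calling `winnow(text, k, w)` on each uploaded text yields the fingerprint sets that the similarity step then compares. Python integers do not overflow, which is why this sketch can skip the modulus that a fixed-width implementation would normally need.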

5.2 Testing

Testing is divided into three parts. The first test analyses kgram and window values in the range 1 to 10 on sample texts, to find the values in that range that are optimal for detecting plagiarism. Once the optimal kgram and window values are known, the second test analyses the prime number used as the base of the rolling hash function, to find the optimal base for detecting plagiarism. The third test compares the three similarity methods: the Jaccard Similarity Coefficient, the Sorensen Dice Similarity Coefficient, and the Andberg Similarity Coefficient.

The sample text used in the first test is a history of computers taken from Wikipedia, rewritten into a .txt file and split into four sample texts.

In the first test, kgram and window values from 1 to 10 are entered into the system, and the test is repeated for every combination in those ranges, giving 100 runs for each similarity method. The results are recorded in tables; a sample of the first test results is recapped in the tables below:

1.) Jaccard Similarity Table Test

Table 5.1: jaccardtabletest1

kgram  window  Text 1 vs Text 2  Text 1 vs Text 3  Text 1 vs Text 4
1      1       95,46%            95,45%            86,36%
1      2       95%               90%               85%
1      3       94,74%            84,21%            73,68%
1      4       94,12%            82,35%            70,59%
1      5       86,67%            86,67%            60%
1      6       91,67%            83,33%            58,33%
1      7       81,82%            81,82%            63,64%
1      8       88,89%            88,89%            77,78%
1      9       87,5%             100%              75%
1      10      85,71%            100%              71,43%
Mean           90,16%            89,27%            72,18%

2.) Sorensen Dice Similarity Table Test

Table 5.2: sorensendicetabletest1

kgram  window  Text 1 vs Text 2  Text 1 vs Text 3  Text 1 vs Text 4
1      1       97,67%            97,67%            92,68%
1      2       97,43%            94,74%            91,89%
1      3       97,30%            91,43%            84,85%
1      4       96,97%            90,32%            82,76%
1      5       92,86%            92,86%            75%
1      6       95,65%            90,91%            73,68%
1      7       90%               90%               77,78%
1      8       94,12%            94,12%            87,5%
1      9       93,33%            100%              85,71%
1      10      92,31%            100%              83,33%
Mean           94,76%            94,20%            83,52%

3.) Andberg Similarity Table Test

Table 5.3: andbergtabletest1

kgram  window  Text 1 vs Text 2  Text 1 vs Text 3  Text 1 vs Text 4
1      1       91,30%            91,30%            76%
1      2       90,48%            81,82%            73,91%
1      3       90%               72,73%            58,33%
1      4       88,89%            70%               54,54%
1      5       76,47%            76,47%            42,86%
1      6       84,61%            71,43%            41,18%
1      7       69,23%            69,23%            46,67%
1      8       80%               80%               63,64%
1      9       77,78%            100%              60%
1      10      75%               100%              55,56%
Mean           82,38%            81,30%            57,27%
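The three coefficients in Tables 5.1 to 5.3 can be written down in a few lines. The sketch below is in Python rather than the system's PHP, and it assumes the standard set-based definitions over two fingerprint sets (the source does not print its formulas, and "Andberg" is taken here to mean the Anderberg coefficient); the function names are hypothetical.

```python
def jaccard(a, b):
    """Shared fingerprints divided by all distinct fingerprints."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sorensen_dice(a, b):
    """Twice the shared fingerprints divided by the sum of both set sizes."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def anderberg(a, b):
    """Shared fingerprints divided by shared plus twice the unshared ones."""
    a, b = set(a), set(b)
    shared = len(a & b)
    unshared = len(a ^ b)  # fingerprints found in exactly one of the sets
    denominator = shared + 2 * unshared
    return shared / denominator if denominator else 0.0


# Hypothetical fingerprint sets: 21 shared values, 1 value unique to the first.
first = set(range(22))
second = set(range(21))
scores = (jaccard(first, second), sorensen_dice(first, second), anderberg(first, second))
```

Under these assumed definitions, 21 shared fingerprints and 1 unshared fingerprint give about 95,45%, 97,67% and 91,30%, which agrees with the kgram 1, window 1 row of the three tables to within rounding. That agreement suggests, but does not prove, that the system uses these definitions.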

The results of the first test are summarised in the diagrams below:

1. Jaccard Similarity Diagram

kgram : 1

With kgram 1 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 tends to decrease as the window grows, peaking at 95,46%.
2. The similarity between text 1 and text 3 fluctuates across the window range, peaking at 100%.
3. The similarity between text 1 and text 4 fluctuates across the window range, peaking at 86,36%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.8: jsdiagram1


kgram : 2

With kgram 2 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 decreases as the window grows, peaking at 93,33%.
2. The similarity between text 1 and text 3 fluctuates across the window range, peaking at 96%.
3. The similarity between text 1 and text 4 fluctuates across the window range, peaking at 84%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.9: jsdiagram2

kgram : 3

With kgram 3 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 decreases as the window grows, peaking at 94,79%.
2. The similarity between text 1 and text 3 fluctuates across the window range, peaking at 72,51%.
3. The similarity between text 1 and text 4 fluctuates across the window range, peaking at 55,92%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.10: jsdiagram3

kgram : 4

With kgram 4 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 decreases as the window grows, peaking at 90,15%.
2. The similarity between text 1 and text 3 fluctuates across the window range, peaking at 57,88%.
3. The similarity between text 1 and text 4 fluctuates across the window range, peaking at 41,13%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.11: jsdiagram4


kgram : 5

With kgram 5 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 fluctuates across the window range, peaking at 85,95%.
2. The similarity between text 1 and text 3 fluctuates across the window range, peaking at 55,13%.
3. The similarity between text 1 and text 4 fluctuates across the window range, peaking at 36,22%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.12: jsdiagram5


kgram : 6

With kgram 6 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 83,10%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 53,80%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 32,61%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.13: jsdiagram6


kgram : 7

With kgram 7 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 81,79%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 52,81%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 30,77%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.14: jsdiagram7


kgram : 8

With kgram 8 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 81,29%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 52,20%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 30,14%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.15: jsdiagram8


kgram : 9

With kgram 9 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 81,21%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 52,05%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 30,14%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.16: jsdiagram9


kgram : 10

With kgram 10 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 80,78%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 51,35%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 29,73%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.17: jsdiagram10


2. Sorensen Dice Similarity Diagram

kgram : 1

With kgram 1 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 fluctuates across the window range, peaking at 97,67%.
2. The similarity between text 1 and text 3 fluctuates across the window range, peaking at 100%.
3. The similarity between text 1 and text 4 fluctuates across the window range, peaking at 92,68%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.18: sdicediagram1


kgram : 2

With kgram 2 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 decreases as the window grows, peaking at 96,55%.
2. The similarity between text 1 and text 3 fluctuates across the window range, peaking at 97,96%.
3. The similarity between text 1 and text 4 fluctuates across the window range, peaking at 91,30%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.19: sdicediagram2


kgram : 3

With kgram 3 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 97,32%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 84,06%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 71,73%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.20: sdicediagram3


kgram : 4

With kgram 4 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 94,82%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 73,32%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 58,29%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.21: sdicediagram4


kgram : 5

With kgram 5 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 92,45%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 71,07%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 53,18%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.22: sdicediagram5


kgram : 6

With kgram 6 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 90,77%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 69,96%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 49,18%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.23: sdicediagram6


kgram : 7

With kgram 7 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 89,97%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 69,12%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 47,06%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.24: sdicediagram7


kgram : 8

With kgram 8 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 89,68%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 68,59%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 46,31%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.25: sdicediagram8


kgram : 9

With kgram 9 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 89,63%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 68,47%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 46,31%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.26: sdicediagram9


kgram : 10

With kgram 10 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 89,37%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 67,86%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 45,83%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.27: sdicediagram10


3. Andberg Similarity Diagram

kgram : 1

With kgram 1 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 tends to decrease as the window grows, peaking at 91,30%.
2. The similarity between text 1 and text 3 fluctuates across the window range, peaking at 100%.
3. The similarity between text 1 and text 4 fluctuates across the window range, peaking at 76%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.28: andiagram1


kgram : 2

With kgram 2 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 fluctuates across the window range, peaking at 87,50%.
2. The similarity between text 1 and text 3 fluctuates across the window range, peaking at 92,31%.
3. The similarity between text 1 and text 4 fluctuates across the window range, peaking at 72,41%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.29: andiagram2


kgram : 3

With kgram 3 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 fluctuates across the window range, peaking at 90,10%.
2. The similarity between text 1 and text 3 fluctuates across the window range, peaking at 56,88%.
3. The similarity between text 1 and text 4 fluctuates across the window range, peaking at 38,81%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.30: andiagram3


kgram : 4

With kgram 4 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 82,06%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 40,73%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 25,89%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.31: andiagram4


kgram : 5

With kgram 5 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 75,36%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 38,05%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 22,11%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.32: andiagram5


kgram : 6

With kgram 6 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 71,08%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 36,12%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 19,48%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.33: andiagram6


kgram : 7

With kgram 7 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 69,17%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 35,88%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 18,18%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.34: andiagram7


kgram : 8

With kgram 8 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 68,48%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 35,31%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 17,74%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.35: andiagram8


kgram : 9

With kgram 9 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 68,36%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 35,18%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 17,74%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.36: andiagram9


kgram : 10

With kgram 10 and window values from 1 to 10, the analysis is:

1. The similarity between text 1 and text 2 does not change significantly across the window range, peaking at 67,76%.
2. The similarity between text 1 and text 3 does not change significantly across the window range, peaking at 34,54%.
3. The similarity between text 1 and text 4 does not change significantly across the window range, peaking at 17,46%.

[Line chart: percentage similarity (y-axis) against window 1 to 10 (x-axis) for text 1 with text 2, text 1 with text 3, and text 1 with text 4]

Illustration 5.37: andiagram10


The second test analyses the prime number used as the base of the rolling hash function. The sample texts are the same as in the first test. The kgram and window values used in the second test are kgram 3 and window 1.

The results of the second test for the three similarity methods, using the sample texts above, are summarised below:

1. Jaccard Similarity Table Test

Kgram : 3 , Window : 1

Table 5.4: jsprimetable

Basic Prime Number  Text 1 vs Text 2  Text 1 vs Text 3  Text 1 vs Text 4
3                   94,79%            72,51%            55,92%
5                   91,46%            63,72%            44,51%
7                   86,77%            61,64%            40,74%
11                  86,47%            57,34%            38,07%
19                  86,55%            56,50%            37,44%
29                  86,56%            55,95%            37%
47                  86,56%            55,95%            37%
109                 86,56%            55,95%            37%
199                 86,56%            55,95%            37%

2. Sorensen Dice Similarity Table Test

Kgram : 3 , Window : 1

Table 5.5: sdiceprimetable

Basic Prime Number  Text 1 vs Text 2  Text 1 vs Text 3  Text 1 vs Text 4
3                   97,32%            84,06%            71,73%
5                   95,54%            77,84%            61,60%
7                   92,92%            76,27%            57,89%
11                  92,74%            72,89%            55,15%
19                  92,79%            72,21%            54,49%
29                  92,80%            71,75%            54,02%
47                  92,80%            71,75%            54,02%
109                 92,80%            71,75%            54,02%
199                 92,80%            71,75%            54,02%

3. Andberg Similarity Table Test

Kgram : 3 , Window : 1

Table 5.6: andprimetable

Basic Prime Number  Text 1 vs Text 2  Text 1 vs Text 3  Text 1 vs Text 4
3                   90,09%            56,88%            38,81%
5                   84,27%            46,76%            28,63%
7                   76,63%            44,55%            25,58%
11                  76,16%            40,19%            23,51%
19                  76,28%            39,37%            23,03%
29                  76,31%            38,84%            22,70%
47                  76,31%            38,84%            22,70%
109                 76,31%            38,84%            22,70%
199                 76,31%            38,84%            22,70%
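In Tables 5.4 to 5.6 the percentages stop changing from base 29 onward. One possible reading, offered here as an assumption rather than something the source states, is that a small base makes different kgrams collide on the same hash value, and that collisions stop once the base exceeds the spread of the character codes. The Python sketch below (hypothetical function names, a polynomial hash without modulus, and an alphabet restricted to the 26 lowercase letters) counts how many distinct hash values the 26³ possible 3-grams receive for a given base.

```python
from itertools import product
import string


def poly_hash(gram, base):
    """Polynomial hash of one kgram, without a modulus."""
    h = 0
    for ch in gram:
        h = h * base + ord(ch)
    return h


def distinct_hashes(base, k=3, alphabet=string.ascii_lowercase):
    """Number of different hash values over every possible kgram."""
    return len({poly_hash("".join(g), base) for g in product(alphabet, repeat=k)})


# Hash counts for some of the bases tested in the second test.
counts = {base: distinct_hashes(base) for base in (3, 19, 29, 199)}
total_grams = 26 ** 3
```

In this simplified setting, bases 3 and 19 map several different 3-grams to one hash value, while bases 29 and 199 give every 3-gram its own value. Real input may also contain digits, so the effect in the actual system could differ.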

Based on the summaries of the first and second tests above, it can be concluded that kgram 2 and 3, with the window in the range 1 to 10 and basic prime number 3, are the optimal values for the third test, because across the first and second tests they give the highest mean result for all three similarity methods.

The third test evaluates how well the winnowing algorithm detects plagiarism in text and compares the three similarity methods. The test texts are taken from a plagiarism case in Indonesia that was reported in the news (kabar24.bisnis.com/diduga-plagiat-ini-perbandingan-artikel-anggito-abimanyu-hotbonar-sinaga, 2014).

Master text (munawarkasan.com/artikel-asuransi/43-menggagas-asuransi-bencana, 2014).

Plagiarism text (budisanblog.blogspot.co.id/gagasan-asuransi-bencana, 2014).


In the third test, the system is run with the optimal kgram values 2 and 3. The test is repeated for each of these kgram values with the window ranging from 1 to 10, so the third test produces 20 results for each similarity method. The results of the third test are shown in the following tables:

1. kgram : 2

Table 5.7: thirdtesttable1

         Percentage Similarity
Window   Jaccard Similarity   Sorensen Dice Similarity   Andberg Similarity
1        81,37%               89,73%                     68,59%
2        80,13%               88,97%                     66,85%
3        78,77%               88,12%                     64,97%
4        78,42%               87,90%                     64,50%
5        73,88%               84,98%                     58,58%
6        71,32%               83,26%                     55,42%
7        71,67%               83,49%                     55,84%
8        70,69%               82,83%                     54,67%
9        70,53%               82,72%                     54,48%
10       69,09%               81,72%                     52,78%
Mean     74,59%               85,37%                     59,67%

Based on the table, the analysis is:

1. The highest similarity for the article with Jaccard similarity is obtained with kgram 2 and window 1, reaching 81,37%.

2. The highest similarity for the article with Sorensen Dice similarity is obtained with kgram 2 and window 1, reaching 89,73%.

3. The highest similarity for the article with Andberg similarity is obtained with kgram 2 and window 1, reaching 68,59%.

4. The lowest similarity for the article with Jaccard similarity is obtained with kgram 2 and window 10, at 69,09%.

5. The lowest similarity for the article with Sorensen Dice similarity is obtained with kgram 2 and window 10, at 81,72%.

6. The lowest similarity for the article with Andberg similarity is obtained with kgram 2 and window 10, at 52,78%.
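All three methods reach their highest and lowest values at the same window. A possible explanation, assuming the textbook set-based definitions of the three measures, is that the Sorensen Dice and Andberg values are both increasing functions of the Jaccard value, so the ranking of the windows cannot differ between methods. The Python sketch below checks these identities against three rows of Table 5.7; the function names are hypothetical and the identities are an assumption about the measures, not something stated in the system's code.

```python
def dice_from_jaccard(j):
    """Sorensen Dice expressed through Jaccard: 2J / (1 + J)."""
    return 2 * j / (1 + j)


def andberg_from_jaccard(j):
    """Andberg (assumed form a / (a + 2(b + c))) through Jaccard: J / (2 - J)."""
    return j / (2 - j)


# Jaccard percentages copied from Table 5.7 (kgram 2) for windows 1, 5 and 10.
jaccard_by_window = {1: 81.37, 5: 73.88, 10: 69.09}

derived = {
    window: (
        round(100 * dice_from_jaccard(pct / 100), 2),
        round(100 * andberg_from_jaccard(pct / 100), 2),
    )
    for window, pct in jaccard_by_window.items()
}

if __name__ == "__main__":
    for window, (dice_pct, andberg_pct) in derived.items():
        print(window, dice_pct, andberg_pct)
```

The derived pairs agree with the Sorensen Dice and Andberg columns of Table 5.7 to two decimal places for these rows.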

2. kgram : 3

Table 5.8: thirdtesttable2

         Percentage Similarity
Window   Jaccard Similarity   Sorensen Dice Similarity   Andberg Similarity
1        83,33%               90,90%                     71,43%
2        80,38%               89,12%                     67,19%
3        78,45%               87,92%                     64,53%
4        77,16%               87,10%                     62,81%
5        73,91%               85%                        58,62%
6        71,53%               83,40%                     55,68%
7        71,04%               83,07%                     55,09%
8        70,8%                82,90%                     54,80%
9        69,04%               81,68%                     52,71%
10       69,78%               82,20%                     53,58%
Mean     74,54%               85,33%                     59,64%

Based on the table, the analysis is:

1. The highest similarity for the article with Jaccard similarity is obtained with kgram 3 and window 1, reaching 83,33%.

2. The highest similarity for the article with Sorensen Dice similarity is obtained with kgram 3 and window 1, reaching 90,90%.

3. The highest similarity for the article with Andberg similarity is obtained with kgram 3 and window 1, reaching 71,43%.

4. The lowest similarity for the article with Jaccard similarity is obtained with kgram 3 and window 9, at 69,04%.

5. The lowest similarity for the article with Sorensen Dice similarity is obtained with kgram 3 and window 9, at 81,68%.

6. The lowest similarity for the article with Andberg similarity is obtained with kgram 3 and window 9, at 52,71%.
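The roles of the kgram and window values in the third test can be illustrated with a minimal Python sketch of winnowing. The preprocessing (lowercasing and dropping non-alphanumeric characters), the hash form, the default base of 3 and the function names are assumptions for illustration; the system's PHP implementation may differ in each of these.

```python
def kgram_hashes(text, k, base=3):
    """Hash every k-gram of the cleaned text (assumed polynomial hash)."""
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    hashes = []
    for i in range(len(cleaned) - k + 1):
        h = 0
        for ch in cleaned[i:i + k]:
            h = h * base + ord(ch)
        hashes.append(h)
    return hashes


def winnow(hashes, w):
    """Keep the minimum hash of every window of w consecutive hashes."""
    fingerprints = set()
    for i in range(len(hashes) - w + 1):
        fingerprints.add(min(hashes[i:i + w]))
    return fingerprints


if __name__ == "__main__":
    sample = "A do run run run, a do run run"  # illustrative text only
    hashes = kgram_hashes(sample, k=3)
    for w in (1, 4, 10):
        print(w, len(winnow(hashes, w)))
```

With window 1 every k-gram hash is kept as a fingerprint, and each larger window keeps a subset of the fingerprints kept by the previous one. Fewer fingerprints are therefore compared as the window grows, which is consistent with the generally falling percentages in Tables 5.7 and 5.8.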

Presented as diagrams, the summary of the third test is as follows:

1. kgram : 2

With kgram 2 and the window in the range 1 – 10, the analysis is:

1. The similarity for the article using Jaccard similarity reaches up to 81,37%.

2. The similarity for the article using Sorensen Dice similarity reaches up to 89,73%.

3. The similarity for the article using Andberg similarity reaches up to 68,59%.

[Diagram: Percentage Similarity (vertical axis, 0 to 100) against Window (horizontal axis, 1 to 10), with three series: Similarity Article using Jaccard Similarity, Similarity Article using Sorensen Dice Similarity, Similarity Article using Andberg Similarity]

Illustration 5.38: thirdtestdiagram1


2. kgram : 3

With kgram 3 and the window in the range 1 – 10, the analysis is:

1. The similarity for the article using Jaccard similarity reaches up to 83,33%.

2. The similarity for the article using Sorensen Dice similarity reaches up to 90,90%.

3. The similarity for the article using Andberg similarity reaches up to 71,43%.

[Diagram: Percentage Similarity (vertical axis, 0 to 100) against Window (horizontal axis, 1 to 10), with three series: Similarity Article using Jaccard Similarity, Similarity Article using Sorensen Dice Similarity, Similarity Article using Andberg Similarity]

Illustration 5.39: thirdtestdiagram2