12
Research Article Intelligent Bar Chart Plagiarism Detection in Documents Mohammed Mumtaz Al-Dabbagh, 1,2 Naomie Salim, 1 Amjad Rehman, 3 Mohammed Hazim Alkawaz, 1,2 Tanzila Saba, 4 Mznah Al-Rodhaan, 5 and Abdullah Al-Dhelaan 5 1 Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia 2 Faculty of Computer Sciences and Mathematics, University of Mosul, Mosul, Iraq 3 MIS Department, CBA, Salman Bin Abdulaziz University, Alkharj, Saudi Arabia 4 College of Computer and Information Sciences (CCIS), Prince Sultan University, Riyadh, Saudi Arabia 5 Computer Science Department, College of Computer & Information Sciences, King Saud University, Riyadh, Saudi Arabia Correspondence should be addressed to Amjad Rehman; [email protected] Received 30 March 2014; Revised 21 June 2014; Accepted 7 July 2014; Published 17 September 2014 Academic Editor: Iſtikhar Ahmad Copyright © 2014 Mohammed Mumtaz Al-Dabbagh et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. is paper presents a novel features mining approach from documents that could not be mined via optical character recognition (OCR). By identifying the intimate relationship between the text and graphical components, the proposed technique pulls out the Start, End, and Exact values for each bar. Furthermore, the word 2-gram and Euclidean distance methods are used to accurately detect and determine plagiarism in bar charts. 1. Introduction Detection, determination, and rectification of plagiarism are outstanding quests in every sphere of documentation and copyright. Lately, the significant advancement in information technology represented by digital libraries and World Wide Web is regarded as one of the main reasons for exponential growth in plagiarism appearance. It has become effortless for the plagiarist to utilize or copy the work of others without acknowledging or citing them due to the easy availability of most resources in digital format. us, plagiarism is regarded as one of the electronic crimes and intellectual theſts from others documents [13]. In academia, plagiarism posed a severe educational challenge which is acutely faced by research institutions, universities, and even schools. Several efforts are dedicated to detecting different types of plagiarism via programming code and text. Plagiarism detection began in the 1970s, where the identification of rate of plagiarism in programming code written by some computer languages such as C and Pascal was introduced [4]. Digital documents being the major carriers of information require extreme authentication in terms of their origins and trustfulness. e quest for achieving an accurate and efficient image forgery detection method in digital documentation is never ending. Developing a robust plagiarism detector by overcoming the limitations associated with human intervention is the key issue [5]. Recently, several researchers developed the algorithmic approach using computer codes to detect plagiarism in the homework of students [6]. Based on levels of plagiarism pat- terns some studies introduced plagiarism detection methods which are implemented in the algorithms and program codes [7]. Generally, the computerized or statistical approaches are exploited to detect plagiarism in natural language since the 1990s. e techniques used for natural language are based on various factors such as grammar, semantic, and grammar- semantics hybridizations [1, 4]. However, the grammar-based method is one of the restrictive ones to detect plagiarism. is type of method analyzes the sentences based on grammatical structure, which can be efficiently used to detect the Exact Hindawi Publishing Corporation e Scientific World Journal Volume 2014, Article ID 612787, 11 pages http://dx.doi.org/10.1155/2014/612787

Research Article Intelligent Bar Chart Plagiarism ...downloads.hindawi.com/journals/tswj/2014/612787.pdf · Research Article Intelligent Bar Chart Plagiarism Detection in Documents

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Research Article Intelligent Bar Chart Plagiarism ...downloads.hindawi.com/journals/tswj/2014/612787.pdf · Research Article Intelligent Bar Chart Plagiarism Detection in Documents

Research ArticleIntelligent Bar Chart Plagiarism Detection in Documents

Mohammed Mumtaz Al-Dabbagh12 Naomie Salim1

Amjad Rehman3 Mohammed Hazim Alkawaz12 Tanzila Saba4

Mznah Al-Rodhaan5 and Abdullah Al-Dhelaan5

1 Faculty of Computing Universiti Teknologi Malaysia 81310 Skudai Johor Malaysia2 Faculty of Computer Sciences and Mathematics University of Mosul Mosul Iraq3MIS Department CBA Salman Bin Abdulaziz University Alkharj Saudi Arabia4College of Computer and Information Sciences (CCIS) Prince Sultan University Riyadh Saudi Arabia5 Computer Science Department College of Computer amp Information Sciences King Saud University Riyadh Saudi Arabia

Correspondence should be addressed to Amjad Rehman arkhansauedusa

Received 30 March 2014 Revised 21 June 2014 Accepted 7 July 2014 Published 17 September 2014

Academic Editor Iftikhar Ahmad

Copyright copy 2014 Mohammed Mumtaz Al-Dabbagh et al This is an open access article distributed under the Creative CommonsAttribution License which permits unrestricted use distribution and reproduction in any medium provided the original work isproperly cited

This paper presents a novel features mining approach from documents that could not be mined via optical character recognition(OCR) By identifying the intimate relationship between the text and graphical components the proposed technique pulls out theStart End and Exact values for each bar Furthermore the word 2-gram and Euclidean distance methods are used to accuratelydetect and determine plagiarism in bar charts

1 Introduction

Detection determination and rectification of plagiarism areoutstanding quests in every sphere of documentation andcopyright Lately the significant advancement in informationtechnology represented by digital libraries and World WideWeb is regarded as one of the main reasons for exponentialgrowth in plagiarism appearance It has become effortless forthe plagiarist to utilize or copy the work of others withoutacknowledging or citing them due to the easy availabilityof most resources in digital format Thus plagiarism isregarded as one of the electronic crimes and intellectualthefts from others documents [1ndash3] In academia plagiarismposed a severe educational challengewhich is acutely faced byresearch institutions universities and even schools Severalefforts are dedicated to detecting different types of plagiarismvia programming code and text Plagiarism detection beganin the 1970s where the identification of rate of plagiarismin programming code written by some computer languagessuch as C and Pascal was introduced [4] Digital documents

being the major carriers of information require extremeauthentication in terms of their origins and trustfulness Thequest for achieving an accurate and efficient image forgerydetection method in digital documentation is never endingDeveloping a robust plagiarism detector by overcoming thelimitations associated with human intervention is the keyissue [5]

Recently several researchers developed the algorithmicapproach using computer codes to detect plagiarism in thehomework of students [6] Based on levels of plagiarism pat-terns some studies introduced plagiarism detection methodswhich are implemented in the algorithms and program codes[7] Generally the computerized or statistical approaches areexploited to detect plagiarism in natural language since the1990s The techniques used for natural language are basedon various factors such as grammar semantic and grammar-semantics hybridizations [1 4] However the grammar-basedmethod is one of the restrictive ones to detect plagiarismThistype of method analyzes the sentences based on grammaticalstructure which can be efficiently used to detect the Exact

Hindawi Publishing Corporatione Scientific World JournalVolume 2014 Article ID 612787 11 pageshttpdxdoiorg1011552014612787

2 The Scientific World Journal

Summarising caption

Synonyms of caption

Restructuring caption sentence

With caption

Without captionAll

chart

Part of chart

Exact copy

Modified copy

Chart plagiarism

Translation from one language to others

All data in chart

Part of data in chart

Restructuring chart

Same shape

Different shape

Changing color

Changing scale

Change words and symbols

Changing words via synonyms

Changing among symbols

Change value within range

Changing part of colors for chart

Changing all colors of chart

Convert to text

Convert among chart types

Convert to paragraph

Convert to table

All data converted

Part of data converted

Same colors

Different colors

All data

Part of data

Arrange manner

Arbitrary manner

All data

Part of data

Figure 1 Taxonomy of chart plagiarism

Copy of text While semantic-based method utilizes vectorspace model to calculate the similarities among the textsUndoubtedly the grammar-semantics hybrid approach over-comes all disadvantages of the other methods This is con-sidered as one of the most versatile techniques to detect textplagiarism [1 8 9]

A new taxonomy is introduced to explain the concepts forvarious types and patterns of text plagiarism [4] Plagiarismis divided into twomain parts including literal and intelligentone Each part consists of several subparts which cover allpossibilities of text plagiarism Generally the representationsof quantitative information are formulated via infographicformby using figures charts and tablesThe information thatis displayed in charts figures and tables includes results ofexperiments framework and statistical facts These data andinformation in homogenous form can be formulated by usingvarious shapes such as pie chart bar chart and 2D and 3Dplots [10ndash12]

We report a new type of plagiarism detection methodby highlighting the types of information that can be stolenfrom others work without referencing Firstly different typesof forged information are organized into taxonomy of chart

figure and table to highlight varieties of plagiarism patternssuch as Exact and Modified Copy Secondly plagiarismdetection in bar chart image is performed depending on tenfeatures in images Some of the features are extracted byOCR tool while others are acquired from the relationship oftext and graphic components [13 14] Finally the proposedtechnique is used to extract the features of bar chart imageswhich cannot be extracted by OCR to detect plagiarism Thepaper is organized as follows Section 2 describes variousexisting techniques for extracting data from bar chart images[15] The taxonomy of chart figure and table related toplagiarism is presented in Section 3 Section 4 discussesthe methodology and Section 5 includes the experimentalresults of bar chart plagiarism detection The discussions areelucidated in Section 6 Section 7 concludes the paper

2 Related Work

Categorizations of bar chart images refer to their labelinginto one of the predefined geometrical or nongeometricalclasses Though the classification is apparently manageable

The Scientific World Journal 3

Figure plagiarism

Modified copy

Exact copy

All figure

Part of figure

Without caption

With caption

Summarising caption

Synonyms of caption

Restructuring caption

Translation from one language to others

All data in figure

Part of data in figure

Restructuring figure

Same shape

Different shape

Convert to other shapes

Same colors

Different colors

Convert to paragraph

All data

Part of data

Changing color

Change words and sentences

Swap amongcomponents

of figure

All data

Part of data

Change by synonyms

Change by summarising

Change by restructuring

Figure 2 Taxonomy of figure plagiarism

it is proven to be an extremely difficult problem in com-puter programming Hence there is an intense attention indeveloping automatic tools to categorize describe or retrieveimages based on their contents

Consequently researchers attempted to extract the fea-tures and data from chart images For automatic images cat-egorization and description computational model is success-fully introduced [16] The analysis of local and global imagecharacteristics by using text and image features is used in themodel The model is capable of differentiating geometricaland ordinary imagesThe computational model is comprisedof classifier stage which is trained by the associated textfeatures using advanced concepts and similarity matchingstage

Classification methods based on multiple-instance learn-ing for chart images are also developed [10] A re-revision

system consisting of three concatenated major stages suchas classification extraction and redesigned chart images isemployed [17] In the extraction stage two types of charts (pieand bar) are focused on Some techniques are presented inextracting data and graphicalmarks from chart images Trulythe understanding and recognition of chart images requirethe preprocessing and extraction of data and informationPrimarily two types of available methods that deal with chartimages are either to consider electronic chart directly [18ndash20]or to obtain them after converting into raster images [21ndash24]Mishchenko and Vassilieva [25] introduced a model-basedmethod for the classification of chart images which involvedtwomain stages firstly predicting the location and the size ofchart depending on the color distribution of chart image andsecondly the extraction andmatching of chart image edges toachieve the best match between query and database images

4 The Scientific World Journal

Table plagiarism

Modified copy

Exact copy

All table

Part of table

Without caption

With caption

Summarising caption

Synonyms of caption

Restructuring caption

Copy column(s) from table

Copy row(s) from table

Copy columns and rows from table

Translation from one language to others

Restructuring table

Same shape

Different shape

Convert to paragraph

All table

Part of table

Convert to chart

All data

Part of data

All data

Part of data

Swapping among cells

Change words and sentences

Used synonyms

Summarising

Restructuring

Swapping of columns

Swapping of rows

Swap row(s) with column(s)

Figure 3 Taxonomy of table plagiarism

The techniques for features extraction of image dependonthe type of images such as chart or medical representationSome techniques are applicable on two-dimensional plot ofchart images while others work well for bar chart imagesHough transform technique is introduced as an approachto extract the features of bar chart images [23] Someinvestigations are based on the edges of bars to extract thefeatures [24 26] The learning-based method is establishedto recognize the chart images [22] The features of bar chartimages can be extracted by describing the height andwidth ofeach bar which is applied on statistical images to determinethe similarities [27] Meanwhile other techniques focusedon geometric features rather than data and information ofscientific bar chart images [28]

Currently several techniques are developed to extract thefeatures frommedical imagesThe texture is one of the visual

contents of a medical image used in content-based imageretrieval (CBIR) to represent the image effectively for search-ing and recovering similar areas [29] Gray-level statisticalmatrix technique is applied to extract the texture informationfor the content-based retrieval of mammograms from theMIAS database [30] 3D texture features technique based onthe cooccurrence matrixes of the gray-level gradient andcurvature information regarding the nodule volume data forclassifying themalignancy from benign is introduced [31 32]

3 Plagiarism Taxonomy and Patterns

Three types of graphic plagiarism such as figure chartand table are important to emphasize Each type highlightsdifferent levels of plagiarism The patterns and types ofplagiarism for figures charts and tables are presented as

The Scientific World Journal 5

taxonomy Some kinds of text plagiarism are also evaluated[33] The methods for detecting passages of text plagiarismfor documents without appropriate citations are also sug-gested Taxonomy is further extended to cover other typesof plagiarism [4] The taxonomy presented in various studiesmajorly demonstrates literal and intelligent plagiarism whereeach kind includes many patterns of plagiarism Howeverwe are interested in detecting plagiarism of charts as well astheir taxonomy figures and tables Alternatively charts (piebar and line) can be considered as one of the methods forrepresenting the data and information of experimental resultsor comparing among techniques which are copied from otherreferences without citation Therefore plagiarism of chartscan be formulated in several forms to manifest the sameinformation in various shapes Taxonomy of chart plagiarismdemonstrates many patterns and models which may be usedto plagiarize the data of chart image

Plagiarism patterns of chart figure and table are dividedinto Exact Copy and Modified Copy prototypes The ExactCopy patterns of plagiarism are defined as the direct quoteof data from other works without referencing where copyand paste of the whole or part of the information image isperformed Simplicity is one of the important attributes ofthis type of plagiarism Besides this type of plagiarism doesnot require much time to hide the academic crimeThe othertype of graphic plagiarism is the Modified Copy for infor-mation of chart figure and table This is more intelligentlyperformed than the previous one because the same data canbe formulated in many ways to exhibit the work in a differentstyle than the original oneThe goal of these intelligentmeansis that the plagiarist attempts to deceive the readers by doingsome changes such as translation from other languages orgenerating another shape for the same data

TheModifiedCopy plagiarisms are primarily divided intotranslation and restructuring In this research new types ofcopying are organized by taxonomy which explains variouspatterns of graphic plagiarism Furthermore the primaryfocus of the bar chart image is to detect the proportion ofplagarism Figures 1 2 and 3 depict the taxonomy of chartfigure and table plagiarism respectively

4 Methodology

Themethodology of bar chart plagiarism detection as shownin Figure 4 consists of three main stages namely planningand collection feature extraction and development withsystem evaluation In the planning and collection stagevarious patterns of graphical plagiarism via taxonomy ofchart figures and table are presented (Figures 1 2 and 3)Thetaxonomy of chart plagiarism explains different formulationswhich plagiarize the data of bar chart images Thereforevarieties of bar chart images are collected and data sets areconsidered The gatherings of the data sets consist of 100 barchart images for storing in databases and twenty images forquery including all possibilities of plagiarism for bar chartimages These data sets are collected from different resourcessuch as thesis which represented various types of bar chartimages in 2D and 3D Besides vertical and horizontal barchart images as shown in Figure 5 are taken into account

Planning and collection

Extract text features

Preprocessing

Extract numeric features

Detect plagiarism

System evaluation

Feat

ures

extr

actio

nD

evelo

pmen

t an

d ev

alua

tion

Figure 4 Flowchart of methodology

In the feature extraction stage the features of bar chartimages are used to detect plagiarism Various types of barchart images are analyzed to detect the features of imageThebar chart images inferred to acquiremaximum of ten featuresrepresenting the information and the data of image Thesefeatures are common in different types of bar chart imagesfor instance in 2D and 3D images However the number ofuses for these features may different from each other

The features extraction is an essential process to get thedata from images which can be utilized to detect the rate ofplagiarized data Therefore these ten features are categorizedinto low-level and high-level features The low-level featuresrefer to the text features of bar chart The text features are thetext which can be used in image to represent the informationand data such as caption of image label of each bar label ofcoordinates and values on coordinates Generally they areextracted from bar chart images using OCR tool Converselythe high-level features referring to numeric features cannot beextracted using OCR toolThe extraction of numeric featuresrequires a relationship between the text and graphic compo-nents The numeric features include values of bars in imageEach bar in an image has three numeric features which canbe extracted by the proposed technique depending on StartEnd and Exact values The Start and End values representthe first and last values while the Exact one corresponds tothe real value of the bar For instance the text features forimage in Figure 5 are ldquoFigure 10 Revision Design GalleriesrdquoldquoSilver Platinum Cashrdquo and ldquo0 5 40rdquo whichrepresent caption of image and label of coordinates X and Yrespectively Meanwhile the numeric features represent thevalue of each bar which can be detected by Start End andExact features For example the Start End and Exact featuresof bar Silver are 5 10 and 6 respectively

These features are used to detect the proportion of pla-giarism for bar chart imageThe extraction of Start End and

6 The Scientific World Journal

Cash

Platinum

BondsStocks

Gold

Silver

0 5 10 15 20 25 30 35 40Revision Design Galleries ()

(a)

MinMean

Std

Rosenbrock function on 100 dimensions BSFP-PSO-CUI BSFP-PSO-CAI

250E + 07

200E + 07

150E + 07

100E + 07

500E + 06

000E + 00

(b)

Figure 5 Samples of data set [17 36]

Exact values necessitates preprocessing of bar chart image tothe adjacent coordinates of the image Image scanning is thenperformed to detect the length of each bar in order to findthe numeric features for each bar Storing of the features indatabases depends on the type of features whether numericor text The features which are extracted by the proposedtechnique are represented as vectors while the text featuresthat are extracted by OCR are characterized as string

The detection methods for text plagiarism are mainlycategorized based on character semantics structure cita-tion cluster cross language and syntax Comparatively thesmaller number of textual components than normal para-graph text allows us to use character-basedmethods to detectplagiarism of bar chart imagesThe character-based methodsdepend on character matching approaches to exactly orpartially detect the identical string for features of bar chartimages Various algorithms of plagiarism are adopted in thetext as character 119899-gram to identify the similarity betweentwo strings based on the number of identical characters offeatures Some researchers use 8-gramand 5-gram techniques[34 35] for matching strings to detect plagiarismWe used 2-gram technique to detect plagiarism of bar chart imagesThistechnique is used to represent the text features of bar chartimages Different similarity measures can be used to obtainthe similarity for numeric features such as Euclidean distanceJaccard or cosine coefficient The Euclidean distance iscalculated by the following

Ec (119909 119910) =radic

sum

119894

1003816

1003816

1003816

1003816

119909

119894minus 119910

119894

1003816

1003816

1003816

1003816

2 (1)

Once the detection and storing of the proportion ofplagiarism are completed then the performance of the systemis evaluated The performance is evaluated by overlapping of

features using the relation of Precision andRecall given by thefollowing

Recall = Relavent Documents RetreivedAll Relavent Documents

Precision = Relavent Documents RetreivedAll Documents Retrived

(2)

5 Experimental Results

The bar chart plagiarism detections are carried out in fourmain stages such as submission of query images featureextraction of bar chart image plagiarism detection andhighlighting results The first stage is to submit varioustypes of query images covering different kinds of possibleplagiarism to detect and judge plagiarism of bar chart imageswhile the features of query are extracted in the second stageThe third stage includes detection of plagiarism by usingword 2-gram and Euclidean distance techniques Finally thefeatures of query bar chart image that are plagiarized fromothers are highlighted and the proportion of the similarity isdisplayed

The bar chart plagiarism is further divided into ExactCopy and Modified Copy as explained in taxonomyFigure 6(a) shows the query image while Figure 6(b)depicts the plagiarized images detected by the system Thefirst plagiarized image is similar to the whole data in theimage while the second plagiarized image contains the samedata that was plagiarized but presented as a horizontal barchart image The system extracts the features of query imageand detects the proportion of plagiarism depending on StartEnd and Exact values for each bar as well as the label of eachbar The system highlights the data and information that areplagiarized and provides the proportion of plagiarism

One of the patterns of plagiarism derived from ModifiedCopy is the stealing by changing scales Each bar is modified

The Scientific World Journal 7

0

10

20

30

40

Cash Bonds Stocks Gold SilverPlatinumFigure Revision Design Galleries

Query image

()

(a)

The rate of similarity is 100

Cash [30 40 32]Bonds [20 30 22]Stocks [20 30 22]Gold [0 10 10]Platinum [0 10 6]Silver [0 10 5]

Figure Revision Design Galleries

The rate of similarity is 100

Cash [30 40 32]Bonds [20 30 22]Stocks [20 30 22]

Gold [0 10 10]Platinum [0 10 6]

Silver [0 10 5]Figure Revision Design Galleries

Cash Bonds Stocks Gold SilverPlatinum

Figure Revision Design Galleries ()

0

10

20

30

40

()

Cash

Platinum

BondsStocks

Gold

Silver

0 5 10 15 20 25 30 35 40

The rate of similarity is 71

Cash [30 - 32]Bonds [20 - 22]Stocks [20 - 22]Gold [- - 10]

Platinum [- 10 6]Silver [- 10 5]

Figure Revision Design Galleries

The rate of similarity is 71Cash [30 - 32]Bonds [20 - 22]Stocks [20 - 22]Gold [- - 10]Platinum [- 10 6]Silver [- 10 5]

Figure Revision Design Galleries

Plagiarized image

Plagiarized image

Figure Revision Design Galleries

(b)

Figure 6 Plagiarism detection for one of Exact Copy patterns as plagiarism of the whole data of image

by plagiarists by changing Start and End values to be differentfrom the original image Figure 7 illustrates the significantrole played by the Exact values to detect this type of plagia-rism

The plagiarists may use integration among patterns ofpossible bar chart plagiarism to present a more compleximage which has the same data quoting from other worksFigure 8 displays the query image which is modified bychanging colours and scales of bars as well as changing theirlocation via swapping The proposed system is capable ofdetecting this type of plagiarism and identifies the proportionof similarity

Figure 9 illustrates the performance of the system for pla-giarism detection of Exact Copy andModified Copy patternsrespectively

6 Discussion

The state-of-the-art graphical plagiarism techniques andpatterns are presentedThe graphical plagiarism is consideredone of the electronic crimes and thefts and the concepts ofsuch stealing are newly viewed Various important informa-tion and data can be represented as graphical forms suchas results or frameworks for academic and business aspectsHowever many systems of text plagiarism methods such asTurnitin are incapable of detecting plagiarism of images Inspite of the different styles of bar chart images the extractionof features of image plays an important role in detectingplagiarism Our proposed technique which is used to extractthe numeric features played an essential role for bar chartplagiarism detection The patterns of Exact Copy of bar chart

8 The Scientific World Journal

Query image

TA fo

und

40

32

24

16

8

0CTN CEA PP CA TPST

Manual searchSystem search

Trust attributesComparison between manual search and system search

from UK e-commerce website

(a)

40

35

30

25

5

0CTN CEA PP CA TPST

20

15

10

TA fo

und

Plagiarized image

The rate of similarity is 53 CTN-manual search [- 40 36]CTN-system search [- - 35]

CEA-system search [- - 29]PP-manual search [- - 32]

CA-manual search [- - 32]

TPST-manual search [- - 8]TPST-system search [- - 6]

PP-system search [- - 32]

CA-system search [- - 32]

CEA-manual search [- - 29]

Comparison between manual search and system search from UK e-commerce website

Manual searchSystem search

Trust attributesComparison between manual search and system search

from UK e-commerce website

(b)

Figure 7 Plagiarism detection for one of the Modified Copy patterns which is the stealing by changing scales of image

plagiarism detection including direct copy of all data orpart of data are underscored Plagiarism which is carriedout by modifying caption of images via restructuring orsummarizing for label sentences is emphasized Alternativelythe patterns of Modified Copy which are regarded as morecomplicated than Exact Copy patterns are also analyzedThe difficulty of these patterns is the changing on imagewhich appears as the same data and information in differentformsThe restructuring of information for image within thesame shape is also covered The edition of bar chart imagesincluding the change of image bar colors or changing thebar locations either by swapping or via generating horizontalbars from vertical bars and vice versa is discussed in detail

Besides more professional modification such as changingof scales on coordinates which is completely different fromoriginal one can be detected by the proposed method Inthis case the Start and End features of bars are completelydifferent Consequently the Exact features as well as otherattributes play significant role in detecting plagiarism in barchart image

7 Conclusion

We demonstrate the precise recognition of different plagia-rized patterns in business documents using an intelligent barchart detection system The types and patterns of plagiarism

The Scientific World Journal 9

Query image

Orthoptera

Hymenoptera

Hemiptera

Lepidoptera

Other orders

Coleoptera

0 8 16 24 32 40 Example of bar chart Revision redesigns ()

(a)

Example Revision redesigns for input bar charts ()

Orthoptera

Hymenoptera

Hemiptera

Lepidoptera

Other orders

Coleoptera

0 10 20 30 40

The rate of similarity is 46

Orthoptera [- - 9]Hymenoptera [- - 11]Hemiptera [- - 11]Lepidoptera [0 - 7]Other orders [- - -]Coleoptera [- - -]

Plagiarized image

(b)

Figure 8 Plagiarism detection for integration among possible bar chart plagiarism

1

08

06

04

02

0

Prec

ision

1080604020Recall

(a)

1

08

06

04

02

0

Prec

ision

1080604020Recall

(b)

Figure 9 The evaluation of performance (a) for Exact Copy patterns and (b) forModified Copy patterns

10 The Scientific World Journal

are presented via taxonomy of figure chart and table Variouskinds of possible plagiarism are highlighted using taxonomyPlagiarism of bar chart image as type of chart is detectedby the newly proposed technique It is established that thepresent technique is capable of extracting the features from abar chart image which cannot be pulled out using OCR toolOur technique first recognizes the connection between thetext and graphical components to extract the Start End andExact value for each bar Using word 2-gram and Euclideandistance methods the accurate detection of plagiarism isperformed The detection of plagiarism is based on tenstriking features The system is capable of detecting differentlevels of plagiarism not only copy and paste of bar chartimage but also modification on images such as changingcolor or scales The present system efficiently and accuratelydistinguishes other possible alteration administered on theseimages such as swapping among bars location and evenchanges on caption via summarizing and restructuring Theproposed technique may be useful for intelligent plagiarismdetection in business and academic documents

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors extend their appreciation to the Deanship ofScientific Research at King Saud University for funding thiswork through Research Group no RGP-264 The authorsare also thankful to Ministry of Science and TechnologyInnovation (MOSTI) Malaysia and Research ManagementCenter (RMC) Universiti Teknologi Malaysia (UTM) JohorMalaysia for their technical support and expertise in con-ducting this research

References

[1] A M El Tahir Ali H M D Abdulla and V Snasel ldquoSurveyof plagiarism detection methodsrdquo in Proceedings of the 5th AsiaModelling Symposium (AMS rsquo05) pp 39ndash42 May 2011

[2] A Rehman and T Saba ldquoFeatures extraction for soccer videosemantic analysis current achievements and remaining issuesrdquoArtificial Intelligence Review vol 41 no 3 pp 451ndash461 2014

[3] A Rehman and T Saba ldquoEvaluation of artificial intelligent tech-niques to secure information in enterprisesrdquo Artificial Intelli-gence Review pp 1ndash16 2012

[4] S M Alzahrani N Salim and A Abraham ldquoUnderstandingplagiarism linguistic patterns textual features and detectionmethodsrdquo IEEE Transactions on Systems Man and CyberneticsC Applications and Reviews vol 42 no 2 pp 133ndash149 2012

[5] T Saba and A Altameem ldquoAnalysis of vision based systems todetect real time goal events in soccer videosrdquo Applied ArtificialIntelligence vol 27 no 7 pp 656ndash667 2013

[6] K J Ottenstein ldquoAn algorithmic approach to the detection andprevention of plagiarismrdquo SIGCSE Bulletin vol 8 pp 30ndash411976

[7] A Parker and J O Hamblen ldquoComputer algorithms for plagia-rismdetectionrdquo IEEETransactions on Education vol 32 pp 94ndash99 1989

[8] J P Bao J Y Shen X D Liu andQ B Song ldquoSurvey on naturallanguage text copy detectionrdquo Journal of Software vol 14 no 10pp 1753ndash1760 2003

[9] T Saba and A Rehman Machine Learning and Script Recogni-tion Lambert Academic Publisher 2012

[10] W Huang S Zong and C L Tan ldquoChart image classificationusing multiple-instance learningrdquo in Proceedings of the IEEEWorkshop on Applications of Computer Vision (WACV 07) p27 Austin Tex USA February 2007

[11] T Saba A Rehman and M Elarbi-Boudihir ldquoMethods andstrategies on off-line cursive touched characters segmentationa directional reviewrdquo Artificial Intelligence Review 2011

[12] A Rehman D Mohammad G Sulong and T Saba ldquoSimpleand effective techniques for core-region detection and slantcorrection in offline script recognitionrdquo in Proceedings of theIEEE International Conference on Signal and Image ProcessingApplications (ICSIPA rsquo09) pp 15ndash20 Kuala Lumpur MalaysiaNovember 2009

[13] A Rehman and T Saba ldquoDocument skew estimation andcorrection analysis of techniques common problems andpossible solutionsrdquo Applied Artificial Intelligence vol 25 no 9pp 769ndash787 2011

[14] A Rehman and T Saba ldquoOff-line cursive script recognitioncurrent advances comparisons and remaining problemsrdquo Arti-ficial Intelligence Review vol 37 no 4 pp 261ndash288 2012

[15] T Saba A Rehman and G Sulong ldquoImproved statisticalfeatures for cursive character recognitionrdquo International Journalof Innovative Computing Information and Control vol 7 no 9pp 5211ndash5224 2011

[16] T Helmy ldquoA computational model for context-based imagecategorization and descriptionrdquo International Journal of Imageand Graphics vol 12 no 1 Article ID 1250001 2012

[17] M Savva N Kong A Chhajta F-F Li M Agrawala and JHeer ldquoReVision automated classification analysis and redesignof chart imagesrdquo in Proceedings of the 24th Annual ACMSymposium on User Interface Software and Technology (UISTrsquo11) pp 393ndash402 Santa Barbara Calif USA October 2011

[18] S Elzer S Carberry I Zukerman D Chester N Green and SDemir ldquoA probabilistic framework for recognizing intention ininformation graphicsrdquo in Proceedings of the 19th InternationalJoint Conference on Artificial Intelligence (IJCAI rsquo05) pp 1042ndash1047 Edinburgh UK August 2005

[19] S Carberry S Elzer N Green K McCoy and D ChesterldquoUnderstanding information graphics a discourse-level prob-lemrdquo in Proceedings of the 4th SIGdial Workshop of DiscourseandDialogue (SIGDIAL rsquo03) pp 1ndash12 Sapporo Japan July 2003

[20] M S M Rahim A Rehman S Nirsquomatus F Kurniawan andT Saba ldquoRegion-based features extraction in ear biometricsrdquoInternational Journal of Academic Research vol 4 no 1 pp 37ndash42 2012

[21] N Yokokura and T Watanabe ldquoLayout-based approach forextracting constructive elements of bar-chartsrdquo in Graph-ics Recognition Algorithms and Systems K Tombre and AChhabra Eds vol 1389 pp 163ndash174 Springer Berlin Germany1998

The Scientific World Journal 11

[22] Y Zhou and C L Tan ldquoLearning-based scientific chart recog-nitionrdquo in Proceedings of the 4th IAPR International Workshopon Graphics Recognition (GREC rsquo01) pp 482ndash492 2001

[23] Y P Zhou and C L Tan ldquoHough technique for bar chartsdetection and recognition in document imagesrdquo in Proceedingof the International Conference on Image Processing (ICIP 00)vol 2 pp 605ndash608 Vancouver Canada September 2000

[24] W Huang C L Tan and W K Leow ldquoModel-based chartimage recognitionrdquo in Graphics Recognition Recent Advancesand Perspectives J Llados and Y-B Kwon Eds vol 3088 pp87ndash99 Springer Berlin Germany 2004

[25] A Mishchenko and N Vassilieva ldquoModel-based chart imageclassificationrdquo in Advances in Visual Computing G Bebis RBoyle B Parvin et al Eds vol 6939 of Lecture Notes inComputer Science pp 476ndash485 Springer Berlin Germany 2011

[26] W Huang and C L Tan ldquoA system for understanding imagedinfographics and its applicationsrdquo in Proceedings of the ACMSymposium on Document Engineering pp 9ndash18 ManitobaCanada August 2007

[27] M M Hassan and W Al Khatib ldquoSimilarity searching instatistical figures based on extracted meta datardquo in Proceedingsof the ComputerGraphics Imaging andVisualisation (CGIV 07)pp 329ndash334 Bangkok Thailand August 2007

[28] L Yang W Huang and C L Tan ldquoSemi-automatic groundtruth generation for chart image recognitionrdquo in Proceedings ofthe 7th international conference on Document Analysis SystemsNelson New Zealand 2006

[29] G D Tourassi ldquoJourney toward computer-aided diagnosis roleof image texture analysisrdquo Radiology vol 213 no 2 pp 317ndash3201999

[30] D A Chandy J S Johnson and S E Selvan ldquoTexture featureextraction using gray level statistical matrix for content-basedmammogram retrievalrdquoMultimedia Tools and Applications vol72 no 2 pp 2011ndash2024 2014

[31] F Han HWang B Song et al ldquoA new 3D texture feature basedcomputer-aided diagnosis approach to differentiate pulmonarynodulesrdquo in Medical Imaging Computer-Aided Diagnosis Pro-ceedings of SPIE San Diego Calif USA 2014

[32] F Han H Wang B Song et al ldquoEfficient 3D texture featureextraction from CT images for computer-aided diagnosis ofpulmonary nodulesrdquo in Proceedings of the SPIE Medical Imag-ing Computer-Aided Diagnosis vol 9035 pp 1ndash7 2014

[33] S M Eissen B Stein and M Kulig ldquoPlagiarism detectionwithout reference collectionsrdquo in Advances in Data AnalysisR Decker and H-J Lenz Eds pp 359ndash366 Springer BerlinGermany 2007

[34] J Kasprzak M Brandejs and M Kripac ldquoFinding plagiarismby evaluating document similaritiesrdquo in Proceedings of the 3rdWorkshop onUncovering Plagiarism Authorship and Social Soft-ware Misuse (PAN rsquo09) pp 24ndash28 Donostia Spain September2009

[35] C Basile D Benedetto E Caglioti G Cristadoro and MD Esposti ldquoA plagiarism detection procedure in three stepsselection matches and squaresrdquo Donostia Spain 2009

[36] Z A Hamed and S Z Mohd Hashim Hybrid particle swarmoptimization and black stork foraging for functional neural fuzzynetwork learning enhancement [UTMThesis] 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 2: Research Article Intelligent Bar Chart Plagiarism ...downloads.hindawi.com/journals/tswj/2014/612787.pdf · Research Article Intelligent Bar Chart Plagiarism Detection in Documents

2 The Scientific World Journal

Summarising caption

Synonyms of caption

Restructuring caption sentence

With caption

Without captionAll

chart

Part of chart

Exact copy

Modified copy

Chart plagiarism

Translation from one language to others

All data in chart

Part of data in chart

Restructuring chart

Same shape

Different shape

Changing color

Changing scale

Change words and symbols

Changing words via synonyms

Changing among symbols

Change value within range

Changing part of colors for chart

Changing all colors of chart

Convert to text

Convert among chart types

Convert to paragraph

Convert to table

All data converted

Part of data converted

Same colors

Different colors

All data

Part of data

Arrange manner

Arbitrary manner

All data

Part of data

Figure 1 Taxonomy of chart plagiarism

Copy of text While semantic-based method utilizes vectorspace model to calculate the similarities among the textsUndoubtedly the grammar-semantics hybrid approach over-comes all disadvantages of the other methods This is con-sidered as one of the most versatile techniques to detect textplagiarism [1 8 9]

A new taxonomy is introduced to explain the concepts forvarious types and patterns of text plagiarism [4] Plagiarismis divided into twomain parts including literal and intelligentone Each part consists of several subparts which cover allpossibilities of text plagiarism Generally the representationsof quantitative information are formulated via infographicformby using figures charts and tablesThe information thatis displayed in charts figures and tables includes results ofexperiments framework and statistical facts These data andinformation in homogenous form can be formulated by usingvarious shapes such as pie chart bar chart and 2D and 3Dplots [10ndash12]

We report a new type of plagiarism detection methodby highlighting the types of information that can be stolenfrom others work without referencing Firstly different typesof forged information are organized into taxonomy of chart

figure and table to highlight varieties of plagiarism patternssuch as Exact and Modified Copy Secondly plagiarismdetection in bar chart image is performed depending on tenfeatures in images Some of the features are extracted byOCR tool while others are acquired from the relationship oftext and graphic components [13 14] Finally the proposedtechnique is used to extract the features of bar chart imageswhich cannot be extracted by OCR to detect plagiarism Thepaper is organized as follows Section 2 describes variousexisting techniques for extracting data from bar chart images[15] The taxonomy of chart figure and table related toplagiarism is presented in Section 3 Section 4 discussesthe methodology and Section 5 includes the experimentalresults of bar chart plagiarism detection The discussions areelucidated in Section 6 Section 7 concludes the paper

2 Related Work

Categorizations of bar chart images refer to their labelinginto one of the predefined geometrical or nongeometricalclasses Though the classification is apparently manageable

The Scientific World Journal 3

Figure plagiarism

Modified copy

Exact copy

All figure

Part of figure

Without caption

With caption

Summarising caption

Synonyms of caption

Restructuring caption

Translation from one language to others

All data in figure

Part of data in figure

Restructuring figure

Same shape

Different shape

Convert to other shapes

Same colors

Different colors

Convert to paragraph

All data

Part of data

Changing color

Change words and sentences

Swap amongcomponents

of figure

All data

Part of data

Change by synonyms

Change by summarising

Change by restructuring

Figure 2 Taxonomy of figure plagiarism

it is proven to be an extremely difficult problem in com-puter programming Hence there is an intense attention indeveloping automatic tools to categorize describe or retrieveimages based on their contents

Consequently researchers attempted to extract the fea-tures and data from chart images For automatic images cat-egorization and description computational model is success-fully introduced [16] The analysis of local and global imagecharacteristics by using text and image features is used in themodel The model is capable of differentiating geometricaland ordinary imagesThe computational model is comprisedof classifier stage which is trained by the associated textfeatures using advanced concepts and similarity matchingstage

Classification methods based on multiple-instance learn-ing for chart images are also developed [10] A re-revision

system consisting of three concatenated major stages suchas classification extraction and redesigned chart images isemployed [17] In the extraction stage two types of charts (pieand bar) are focused on Some techniques are presented inextracting data and graphicalmarks from chart images Trulythe understanding and recognition of chart images requirethe preprocessing and extraction of data and informationPrimarily two types of available methods that deal with chartimages are either to consider electronic chart directly [18ndash20]or to obtain them after converting into raster images [21ndash24]Mishchenko and Vassilieva [25] introduced a model-basedmethod for the classification of chart images which involvedtwomain stages firstly predicting the location and the size ofchart depending on the color distribution of chart image andsecondly the extraction andmatching of chart image edges toachieve the best match between query and database images

4 The Scientific World Journal

Table plagiarism

Modified copy

Exact copy

All table

Part of table

Without caption

With caption

Summarising caption

Synonyms of caption

Restructuring caption

Copy column(s) from table

Copy row(s) from table

Copy columns and rows from table

Translation from one language to others

Restructuring table

Same shape

Different shape

Convert to paragraph

All table

Part of table

Convert to chart

All data

Part of data

All data

Part of data

Swapping among cells

Change words and sentences

Used synonyms

Summarising

Restructuring

Swapping of columns

Swapping of rows

Swap row(s) with column(s)

Figure 3 Taxonomy of table plagiarism

The techniques for features extraction of image dependonthe type of images such as chart or medical representationSome techniques are applicable on two-dimensional plot ofchart images while others work well for bar chart imagesHough transform technique is introduced as an approachto extract the features of bar chart images [23] Someinvestigations are based on the edges of bars to extract thefeatures [24 26] The learning-based method is establishedto recognize the chart images [22] The features of bar chartimages can be extracted by describing the height andwidth ofeach bar which is applied on statistical images to determinethe similarities [27] Meanwhile other techniques focusedon geometric features rather than data and information ofscientific bar chart images [28]

Currently several techniques are developed to extract thefeatures frommedical imagesThe texture is one of the visual

contents of a medical image used in content-based imageretrieval (CBIR) to represent the image effectively for search-ing and recovering similar areas [29] Gray-level statisticalmatrix technique is applied to extract the texture informationfor the content-based retrieval of mammograms from theMIAS database [30] 3D texture features technique based onthe cooccurrence matrixes of the gray-level gradient andcurvature information regarding the nodule volume data forclassifying themalignancy from benign is introduced [31 32]

3 Plagiarism Taxonomy and Patterns

Three types of graphic plagiarism such as figure chartand table are important to emphasize Each type highlightsdifferent levels of plagiarism The patterns and types ofplagiarism for figures charts and tables are presented as

The Scientific World Journal 5

taxonomy Some kinds of text plagiarism are also evaluated[33] The methods for detecting passages of text plagiarismfor documents without appropriate citations are also sug-gested Taxonomy is further extended to cover other typesof plagiarism [4] The taxonomy presented in various studiesmajorly demonstrates literal and intelligent plagiarism whereeach kind includes many patterns of plagiarism Howeverwe are interested in detecting plagiarism of charts as well astheir taxonomy figures and tables Alternatively charts (piebar and line) can be considered as one of the methods forrepresenting the data and information of experimental resultsor comparing among techniques which are copied from otherreferences without citation Therefore plagiarism of chartscan be formulated in several forms to manifest the sameinformation in various shapes Taxonomy of chart plagiarismdemonstrates many patterns and models which may be usedto plagiarize the data of chart image

Plagiarism patterns of chart figure and table are dividedinto Exact Copy and Modified Copy prototypes The ExactCopy patterns of plagiarism are defined as the direct quoteof data from other works without referencing where copyand paste of the whole or part of the information image isperformed Simplicity is one of the important attributes ofthis type of plagiarism Besides this type of plagiarism doesnot require much time to hide the academic crimeThe othertype of graphic plagiarism is the Modified Copy for infor-mation of chart figure and table This is more intelligentlyperformed than the previous one because the same data canbe formulated in many ways to exhibit the work in a differentstyle than the original oneThe goal of these intelligentmeansis that the plagiarist attempts to deceive the readers by doingsome changes such as translation from other languages orgenerating another shape for the same data

TheModifiedCopy plagiarisms are primarily divided intotranslation and restructuring In this research new types ofcopying are organized by taxonomy which explains variouspatterns of graphic plagiarism Furthermore the primaryfocus of the bar chart image is to detect the proportion ofplagarism Figures 1 2 and 3 depict the taxonomy of chartfigure and table plagiarism respectively

4 Methodology

Themethodology of bar chart plagiarism detection as shownin Figure 4 consists of three main stages namely planningand collection feature extraction and development withsystem evaluation In the planning and collection stagevarious patterns of graphical plagiarism via taxonomy ofchart figures and table are presented (Figures 1 2 and 3)Thetaxonomy of chart plagiarism explains different formulationswhich plagiarize the data of bar chart images Thereforevarieties of bar chart images are collected and data sets areconsidered The gatherings of the data sets consist of 100 barchart images for storing in databases and twenty images forquery including all possibilities of plagiarism for bar chartimages These data sets are collected from different resourcessuch as thesis which represented various types of bar chartimages in 2D and 3D Besides vertical and horizontal barchart images as shown in Figure 5 are taken into account

Planning and collection

Extract text features

Preprocessing

Extract numeric features

Detect plagiarism

System evaluation

Feat

ures

extr

actio

nD

evelo

pmen

t an

d ev

alua

tion

Figure 4 Flowchart of methodology

In the feature extraction stage the features of bar chartimages are used to detect plagiarism Various types of barchart images are analyzed to detect the features of imageThebar chart images inferred to acquiremaximum of ten featuresrepresenting the information and the data of image Thesefeatures are common in different types of bar chart imagesfor instance in 2D and 3D images However the number ofuses for these features may different from each other

The features extraction is an essential process to get thedata from images which can be utilized to detect the rate ofplagiarized data Therefore these ten features are categorizedinto low-level and high-level features The low-level featuresrefer to the text features of bar chart The text features are thetext which can be used in image to represent the informationand data such as caption of image label of each bar label ofcoordinates and values on coordinates Generally they areextracted from bar chart images using OCR tool Converselythe high-level features referring to numeric features cannot beextracted using OCR toolThe extraction of numeric featuresrequires a relationship between the text and graphic compo-nents The numeric features include values of bars in imageEach bar in an image has three numeric features which canbe extracted by the proposed technique depending on StartEnd and Exact values The Start and End values representthe first and last values while the Exact one corresponds tothe real value of the bar For instance the text features forimage in Figure 5 are ldquoFigure 10 Revision Design GalleriesrdquoldquoSilver Platinum Cashrdquo and ldquo0 5 40rdquo whichrepresent caption of image and label of coordinates X and Yrespectively Meanwhile the numeric features represent thevalue of each bar which can be detected by Start End andExact features For example the Start End and Exact featuresof bar Silver are 5 10 and 6 respectively

These features are used to detect the proportion of pla-giarism for bar chart imageThe extraction of Start End and

6 The Scientific World Journal

Cash

Platinum

BondsStocks

Gold

Silver

0 5 10 15 20 25 30 35 40Revision Design Galleries ()

(a)

MinMean

Std

Rosenbrock function on 100 dimensions BSFP-PSO-CUI BSFP-PSO-CAI

250E + 07

200E + 07

150E + 07

100E + 07

500E + 06

000E + 00

(b)

Figure 5 Samples of data set [17 36]

Exact values necessitates preprocessing of bar chart image tothe adjacent coordinates of the image Image scanning is thenperformed to detect the length of each bar in order to findthe numeric features for each bar Storing of the features indatabases depends on the type of features whether numericor text The features which are extracted by the proposedtechnique are represented as vectors while the text featuresthat are extracted by OCR are characterized as string

The detection methods for text plagiarism are mainlycategorized based on character semantics structure cita-tion cluster cross language and syntax Comparatively thesmaller number of textual components than normal para-graph text allows us to use character-basedmethods to detectplagiarism of bar chart imagesThe character-based methodsdepend on character matching approaches to exactly orpartially detect the identical string for features of bar chartimages Various algorithms of plagiarism are adopted in thetext as character 119899-gram to identify the similarity betweentwo strings based on the number of identical characters offeatures Some researchers use 8-gramand 5-gram techniques[34 35] for matching strings to detect plagiarismWe used 2-gram technique to detect plagiarism of bar chart imagesThistechnique is used to represent the text features of bar chartimages Different similarity measures can be used to obtainthe similarity for numeric features such as Euclidean distanceJaccard or cosine coefficient The Euclidean distance iscalculated by the following

Ec (119909 119910) =radic

sum

119894

1003816

1003816

1003816

1003816

119909

119894minus 119910

119894

1003816

1003816

1003816

1003816

2 (1)

Once the detection and storing of the proportion ofplagiarism are completed then the performance of the systemis evaluated The performance is evaluated by overlapping of

features using the relation of Precision andRecall given by thefollowing

Recall = Relavent Documents RetreivedAll Relavent Documents

Precision = Relavent Documents RetreivedAll Documents Retrived

(2)

5 Experimental Results

The bar chart plagiarism detections are carried out in fourmain stages such as submission of query images featureextraction of bar chart image plagiarism detection andhighlighting results The first stage is to submit varioustypes of query images covering different kinds of possibleplagiarism to detect and judge plagiarism of bar chart imageswhile the features of query are extracted in the second stageThe third stage includes detection of plagiarism by usingword 2-gram and Euclidean distance techniques Finally thefeatures of query bar chart image that are plagiarized fromothers are highlighted and the proportion of the similarity isdisplayed

The bar chart plagiarism is further divided into ExactCopy and Modified Copy as explained in taxonomyFigure 6(a) shows the query image while Figure 6(b)depicts the plagiarized images detected by the system Thefirst plagiarized image is similar to the whole data in theimage while the second plagiarized image contains the samedata that was plagiarized but presented as a horizontal barchart image The system extracts the features of query imageand detects the proportion of plagiarism depending on StartEnd and Exact values for each bar as well as the label of eachbar The system highlights the data and information that areplagiarized and provides the proportion of plagiarism

One of the patterns of plagiarism derived from ModifiedCopy is the stealing by changing scales Each bar is modified

The Scientific World Journal 7

0

10

20

30

40

Cash Bonds Stocks Gold SilverPlatinumFigure Revision Design Galleries

Query image

()

(a)

The rate of similarity is 100

Cash [30 40 32]Bonds [20 30 22]Stocks [20 30 22]Gold [0 10 10]Platinum [0 10 6]Silver [0 10 5]

Figure Revision Design Galleries

The rate of similarity is 100

Cash [30 40 32]Bonds [20 30 22]Stocks [20 30 22]

Gold [0 10 10]Platinum [0 10 6]

Silver [0 10 5]Figure Revision Design Galleries

Cash Bonds Stocks Gold SilverPlatinum

Figure Revision Design Galleries ()

0

10

20

30

40

()

Cash

Platinum

BondsStocks

Gold

Silver

0 5 10 15 20 25 30 35 40

The rate of similarity is 71

Cash [30 - 32]Bonds [20 - 22]Stocks [20 - 22]Gold [- - 10]

Platinum [- 10 6]Silver [- 10 5]

Figure Revision Design Galleries

The rate of similarity is 71Cash [30 - 32]Bonds [20 - 22]Stocks [20 - 22]Gold [- - 10]Platinum [- 10 6]Silver [- 10 5]

Figure Revision Design Galleries

Plagiarized image

Plagiarized image

Figure Revision Design Galleries

(b)

Figure 6 Plagiarism detection for one of Exact Copy patterns as plagiarism of the whole data of image

by plagiarists by changing Start and End values to be differentfrom the original image Figure 7 illustrates the significantrole played by the Exact values to detect this type of plagia-rism

The plagiarists may use integration among patterns ofpossible bar chart plagiarism to present a more compleximage which has the same data quoting from other worksFigure 8 displays the query image which is modified bychanging colours and scales of bars as well as changing theirlocation via swapping The proposed system is capable ofdetecting this type of plagiarism and identifies the proportionof similarity

Figure 9 illustrates the performance of the system for pla-giarism detection of Exact Copy andModified Copy patternsrespectively

6 Discussion

The state-of-the-art graphical plagiarism techniques andpatterns are presentedThe graphical plagiarism is consideredone of the electronic crimes and thefts and the concepts ofsuch stealing are newly viewed Various important informa-tion and data can be represented as graphical forms suchas results or frameworks for academic and business aspectsHowever many systems of text plagiarism methods such asTurnitin are incapable of detecting plagiarism of images Inspite of the different styles of bar chart images the extractionof features of image plays an important role in detectingplagiarism Our proposed technique which is used to extractthe numeric features played an essential role for bar chartplagiarism detection The patterns of Exact Copy of bar chart

8 The Scientific World Journal

Query image

TA fo

und

40

32

24

16

8

0CTN CEA PP CA TPST

Manual searchSystem search

Trust attributesComparison between manual search and system search

from UK e-commerce website

(a)

40

35

30

25

5

0CTN CEA PP CA TPST

20

15

10

TA fo

und

Plagiarized image

The rate of similarity is 53 CTN-manual search [- 40 36]CTN-system search [- - 35]

CEA-system search [- - 29]PP-manual search [- - 32]

CA-manual search [- - 32]

TPST-manual search [- - 8]TPST-system search [- - 6]

PP-system search [- - 32]

CA-system search [- - 32]

CEA-manual search [- - 29]

Comparison between manual search and system search from UK e-commerce website

Manual searchSystem search

Trust attributesComparison between manual search and system search

from UK e-commerce website

(b)

Figure 7 Plagiarism detection for one of the Modified Copy patterns which is the stealing by changing scales of image

plagiarism detection including direct copy of all data orpart of data are underscored Plagiarism which is carriedout by modifying caption of images via restructuring orsummarizing for label sentences is emphasized Alternativelythe patterns of Modified Copy which are regarded as morecomplicated than Exact Copy patterns are also analyzedThe difficulty of these patterns is the changing on imagewhich appears as the same data and information in differentformsThe restructuring of information for image within thesame shape is also covered The edition of bar chart imagesincluding the change of image bar colors or changing thebar locations either by swapping or via generating horizontalbars from vertical bars and vice versa is discussed in detail

Besides more professional modification such as changingof scales on coordinates which is completely different fromoriginal one can be detected by the proposed method Inthis case the Start and End features of bars are completelydifferent Consequently the Exact features as well as otherattributes play significant role in detecting plagiarism in barchart image

7 Conclusion

We demonstrate the precise recognition of different plagia-rized patterns in business documents using an intelligent barchart detection system The types and patterns of plagiarism

The Scientific World Journal 9

Query image

Orthoptera

Hymenoptera

Hemiptera

Lepidoptera

Other orders

Coleoptera

0 8 16 24 32 40 Example of bar chart Revision redesigns ()

(a)

Example Revision redesigns for input bar charts ()

Orthoptera

Hymenoptera

Hemiptera

Lepidoptera

Other orders

Coleoptera

0 10 20 30 40

The rate of similarity is 46

Orthoptera [- - 9]Hymenoptera [- - 11]Hemiptera [- - 11]Lepidoptera [0 - 7]Other orders [- - -]Coleoptera [- - -]

Plagiarized image

(b)

Figure 8 Plagiarism detection for integration among possible bar chart plagiarism

1

08

06

04

02

0

Prec

ision

1080604020Recall

(a)

1

08

06

04

02

0

Prec

ision

1080604020Recall

(b)

Figure 9 The evaluation of performance (a) for Exact Copy patterns and (b) forModified Copy patterns

10 The Scientific World Journal

are presented via taxonomy of figure chart and table Variouskinds of possible plagiarism are highlighted using taxonomyPlagiarism of bar chart image as type of chart is detectedby the newly proposed technique It is established that thepresent technique is capable of extracting the features from abar chart image which cannot be pulled out using OCR toolOur technique first recognizes the connection between thetext and graphical components to extract the Start End andExact value for each bar Using word 2-gram and Euclideandistance methods the accurate detection of plagiarism isperformed The detection of plagiarism is based on tenstriking features The system is capable of detecting differentlevels of plagiarism not only copy and paste of bar chartimage but also modification on images such as changingcolor or scales The present system efficiently and accuratelydistinguishes other possible alteration administered on theseimages such as swapping among bars location and evenchanges on caption via summarizing and restructuring Theproposed technique may be useful for intelligent plagiarismdetection in business and academic documents

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors extend their appreciation to the Deanship ofScientific Research at King Saud University for funding thiswork through Research Group no RGP-264 The authorsare also thankful to Ministry of Science and TechnologyInnovation (MOSTI) Malaysia and Research ManagementCenter (RMC) Universiti Teknologi Malaysia (UTM) JohorMalaysia for their technical support and expertise in con-ducting this research

References

[1] A M El Tahir Ali H M D Abdulla and V Snasel ldquoSurveyof plagiarism detection methodsrdquo in Proceedings of the 5th AsiaModelling Symposium (AMS rsquo05) pp 39ndash42 May 2011

[2] A Rehman and T Saba ldquoFeatures extraction for soccer videosemantic analysis current achievements and remaining issuesrdquoArtificial Intelligence Review vol 41 no 3 pp 451ndash461 2014

[3] A Rehman and T Saba ldquoEvaluation of artificial intelligent tech-niques to secure information in enterprisesrdquo Artificial Intelli-gence Review pp 1ndash16 2012

[4] S M Alzahrani N Salim and A Abraham ldquoUnderstandingplagiarism linguistic patterns textual features and detectionmethodsrdquo IEEE Transactions on Systems Man and CyberneticsC Applications and Reviews vol 42 no 2 pp 133ndash149 2012

[5] T Saba and A Altameem ldquoAnalysis of vision based systems todetect real time goal events in soccer videosrdquo Applied ArtificialIntelligence vol 27 no 7 pp 656ndash667 2013

[6] K J Ottenstein ldquoAn algorithmic approach to the detection andprevention of plagiarismrdquo SIGCSE Bulletin vol 8 pp 30ndash411976

[7] A Parker and J O Hamblen ldquoComputer algorithms for plagia-rismdetectionrdquo IEEETransactions on Education vol 32 pp 94ndash99 1989

[8] J P Bao J Y Shen X D Liu andQ B Song ldquoSurvey on naturallanguage text copy detectionrdquo Journal of Software vol 14 no 10pp 1753ndash1760 2003

[9] T Saba and A Rehman Machine Learning and Script Recogni-tion Lambert Academic Publisher 2012

[10] W Huang S Zong and C L Tan ldquoChart image classificationusing multiple-instance learningrdquo in Proceedings of the IEEEWorkshop on Applications of Computer Vision (WACV 07) p27 Austin Tex USA February 2007

[11] T Saba A Rehman and M Elarbi-Boudihir ldquoMethods andstrategies on off-line cursive touched characters segmentationa directional reviewrdquo Artificial Intelligence Review 2011

[12] A Rehman D Mohammad G Sulong and T Saba ldquoSimpleand effective techniques for core-region detection and slantcorrection in offline script recognitionrdquo in Proceedings of theIEEE International Conference on Signal and Image ProcessingApplications (ICSIPA rsquo09) pp 15ndash20 Kuala Lumpur MalaysiaNovember 2009

[13] A Rehman and T Saba ldquoDocument skew estimation andcorrection analysis of techniques common problems andpossible solutionsrdquo Applied Artificial Intelligence vol 25 no 9pp 769ndash787 2011

[14] A Rehman and T Saba ldquoOff-line cursive script recognitioncurrent advances comparisons and remaining problemsrdquo Arti-ficial Intelligence Review vol 37 no 4 pp 261ndash288 2012

[15] T Saba A Rehman and G Sulong ldquoImproved statisticalfeatures for cursive character recognitionrdquo International Journalof Innovative Computing Information and Control vol 7 no 9pp 5211ndash5224 2011

[16] T Helmy ldquoA computational model for context-based imagecategorization and descriptionrdquo International Journal of Imageand Graphics vol 12 no 1 Article ID 1250001 2012

[17] M Savva N Kong A Chhajta F-F Li M Agrawala and JHeer ldquoReVision automated classification analysis and redesignof chart imagesrdquo in Proceedings of the 24th Annual ACMSymposium on User Interface Software and Technology (UISTrsquo11) pp 393ndash402 Santa Barbara Calif USA October 2011

[18] S Elzer S Carberry I Zukerman D Chester N Green and SDemir ldquoA probabilistic framework for recognizing intention ininformation graphicsrdquo in Proceedings of the 19th InternationalJoint Conference on Artificial Intelligence (IJCAI rsquo05) pp 1042ndash1047 Edinburgh UK August 2005

[19] S Carberry S Elzer N Green K McCoy and D ChesterldquoUnderstanding information graphics a discourse-level prob-lemrdquo in Proceedings of the 4th SIGdial Workshop of DiscourseandDialogue (SIGDIAL rsquo03) pp 1ndash12 Sapporo Japan July 2003

[20] M S M Rahim A Rehman S Nirsquomatus F Kurniawan andT Saba ldquoRegion-based features extraction in ear biometricsrdquoInternational Journal of Academic Research vol 4 no 1 pp 37ndash42 2012

[21] N Yokokura and T Watanabe ldquoLayout-based approach forextracting constructive elements of bar-chartsrdquo in Graph-ics Recognition Algorithms and Systems K Tombre and AChhabra Eds vol 1389 pp 163ndash174 Springer Berlin Germany1998

The Scientific World Journal 11

[22] Y Zhou and C L Tan ldquoLearning-based scientific chart recog-nitionrdquo in Proceedings of the 4th IAPR International Workshopon Graphics Recognition (GREC rsquo01) pp 482ndash492 2001

[23] Y P Zhou and C L Tan ldquoHough technique for bar chartsdetection and recognition in document imagesrdquo in Proceedingof the International Conference on Image Processing (ICIP 00)vol 2 pp 605ndash608 Vancouver Canada September 2000

[24] W Huang C L Tan and W K Leow ldquoModel-based chartimage recognitionrdquo in Graphics Recognition Recent Advancesand Perspectives J Llados and Y-B Kwon Eds vol 3088 pp87ndash99 Springer Berlin Germany 2004

[25] A Mishchenko and N Vassilieva ldquoModel-based chart imageclassificationrdquo in Advances in Visual Computing G Bebis RBoyle B Parvin et al Eds vol 6939 of Lecture Notes inComputer Science pp 476ndash485 Springer Berlin Germany 2011

[26] W Huang and C L Tan ldquoA system for understanding imagedinfographics and its applicationsrdquo in Proceedings of the ACMSymposium on Document Engineering pp 9ndash18 ManitobaCanada August 2007

[27] M M Hassan and W Al Khatib ldquoSimilarity searching instatistical figures based on extracted meta datardquo in Proceedingsof the ComputerGraphics Imaging andVisualisation (CGIV 07)pp 329ndash334 Bangkok Thailand August 2007

[28] L Yang W Huang and C L Tan ldquoSemi-automatic groundtruth generation for chart image recognitionrdquo in Proceedings ofthe 7th international conference on Document Analysis SystemsNelson New Zealand 2006

[29] G D Tourassi ldquoJourney toward computer-aided diagnosis roleof image texture analysisrdquo Radiology vol 213 no 2 pp 317ndash3201999

[30] D A Chandy J S Johnson and S E Selvan ldquoTexture featureextraction using gray level statistical matrix for content-basedmammogram retrievalrdquoMultimedia Tools and Applications vol72 no 2 pp 2011ndash2024 2014

[31] F Han HWang B Song et al ldquoA new 3D texture feature basedcomputer-aided diagnosis approach to differentiate pulmonarynodulesrdquo in Medical Imaging Computer-Aided Diagnosis Pro-ceedings of SPIE San Diego Calif USA 2014

[32] F Han H Wang B Song et al ldquoEfficient 3D texture featureextraction from CT images for computer-aided diagnosis ofpulmonary nodulesrdquo in Proceedings of the SPIE Medical Imag-ing Computer-Aided Diagnosis vol 9035 pp 1ndash7 2014

[33] S M Eissen B Stein and M Kulig ldquoPlagiarism detectionwithout reference collectionsrdquo in Advances in Data AnalysisR Decker and H-J Lenz Eds pp 359ndash366 Springer BerlinGermany 2007

[34] J Kasprzak M Brandejs and M Kripac ldquoFinding plagiarismby evaluating document similaritiesrdquo in Proceedings of the 3rdWorkshop onUncovering Plagiarism Authorship and Social Soft-ware Misuse (PAN rsquo09) pp 24ndash28 Donostia Spain September2009

[35] C Basile D Benedetto E Caglioti G Cristadoro and MD Esposti ldquoA plagiarism detection procedure in three stepsselection matches and squaresrdquo Donostia Spain 2009

[36] Z A Hamed and S Z Mohd Hashim Hybrid particle swarmoptimization and black stork foraging for functional neural fuzzynetwork learning enhancement [UTMThesis] 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 3: Research Article Intelligent Bar Chart Plagiarism ...downloads.hindawi.com/journals/tswj/2014/612787.pdf · Research Article Intelligent Bar Chart Plagiarism Detection in Documents

The Scientific World Journal 3

Figure plagiarism

Modified copy

Exact copy

All figure

Part of figure

Without caption

With caption

Summarising caption

Synonyms of caption

Restructuring caption

Translation from one language to others

All data in figure

Part of data in figure

Restructuring figure

Same shape

Different shape

Convert to other shapes

Same colors

Different colors

Convert to paragraph

All data

Part of data

Changing color

Change words and sentences

Swap amongcomponents

of figure

All data

Part of data

Change by synonyms

Change by summarising

Change by restructuring

Figure 2 Taxonomy of figure plagiarism

it is proven to be an extremely difficult problem in com-puter programming Hence there is an intense attention indeveloping automatic tools to categorize describe or retrieveimages based on their contents

Consequently researchers attempted to extract the fea-tures and data from chart images For automatic images cat-egorization and description computational model is success-fully introduced [16] The analysis of local and global imagecharacteristics by using text and image features is used in themodel The model is capable of differentiating geometricaland ordinary imagesThe computational model is comprisedof classifier stage which is trained by the associated textfeatures using advanced concepts and similarity matchingstage

Classification methods based on multiple-instance learn-ing for chart images are also developed [10] A re-revision

system consisting of three concatenated major stages suchas classification extraction and redesigned chart images isemployed [17] In the extraction stage two types of charts (pieand bar) are focused on Some techniques are presented inextracting data and graphicalmarks from chart images Trulythe understanding and recognition of chart images requirethe preprocessing and extraction of data and informationPrimarily two types of available methods that deal with chartimages are either to consider electronic chart directly [18ndash20]or to obtain them after converting into raster images [21ndash24]Mishchenko and Vassilieva [25] introduced a model-basedmethod for the classification of chart images which involvedtwomain stages firstly predicting the location and the size ofchart depending on the color distribution of chart image andsecondly the extraction andmatching of chart image edges toachieve the best match between query and database images

4 The Scientific World Journal

Table plagiarism

Modified copy

Exact copy

All table

Part of table

Without caption

With caption

Summarising caption

Synonyms of caption

Restructuring caption

Copy column(s) from table

Copy row(s) from table

Copy columns and rows from table

Translation from one language to others

Restructuring table

Same shape

Different shape

Convert to paragraph

All table

Part of table

Convert to chart

All data

Part of data

All data

Part of data

Swapping among cells

Change words and sentences

Used synonyms

Summarising

Restructuring

Swapping of columns

Swapping of rows

Swap row(s) with column(s)

Figure 3 Taxonomy of table plagiarism

The techniques for features extraction of image dependonthe type of images such as chart or medical representationSome techniques are applicable on two-dimensional plot ofchart images while others work well for bar chart imagesHough transform technique is introduced as an approachto extract the features of bar chart images [23] Someinvestigations are based on the edges of bars to extract thefeatures [24 26] The learning-based method is establishedto recognize the chart images [22] The features of bar chartimages can be extracted by describing the height andwidth ofeach bar which is applied on statistical images to determinethe similarities [27] Meanwhile other techniques focusedon geometric features rather than data and information ofscientific bar chart images [28]

Currently several techniques are developed to extract thefeatures frommedical imagesThe texture is one of the visual

contents of a medical image used in content-based imageretrieval (CBIR) to represent the image effectively for search-ing and recovering similar areas [29] Gray-level statisticalmatrix technique is applied to extract the texture informationfor the content-based retrieval of mammograms from theMIAS database [30] 3D texture features technique based onthe cooccurrence matrixes of the gray-level gradient andcurvature information regarding the nodule volume data forclassifying themalignancy from benign is introduced [31 32]

3 Plagiarism Taxonomy and Patterns

Three types of graphic plagiarism such as figure chartand table are important to emphasize Each type highlightsdifferent levels of plagiarism The patterns and types ofplagiarism for figures charts and tables are presented as

The Scientific World Journal 5

taxonomy Some kinds of text plagiarism are also evaluated[33] The methods for detecting passages of text plagiarismfor documents without appropriate citations are also sug-gested Taxonomy is further extended to cover other typesof plagiarism [4] The taxonomy presented in various studiesmajorly demonstrates literal and intelligent plagiarism whereeach kind includes many patterns of plagiarism Howeverwe are interested in detecting plagiarism of charts as well astheir taxonomy figures and tables Alternatively charts (piebar and line) can be considered as one of the methods forrepresenting the data and information of experimental resultsor comparing among techniques which are copied from otherreferences without citation Therefore plagiarism of chartscan be formulated in several forms to manifest the sameinformation in various shapes Taxonomy of chart plagiarismdemonstrates many patterns and models which may be usedto plagiarize the data of chart image

Plagiarism patterns of chart figure and table are dividedinto Exact Copy and Modified Copy prototypes The ExactCopy patterns of plagiarism are defined as the direct quoteof data from other works without referencing where copyand paste of the whole or part of the information image isperformed Simplicity is one of the important attributes ofthis type of plagiarism Besides this type of plagiarism doesnot require much time to hide the academic crimeThe othertype of graphic plagiarism is the Modified Copy for infor-mation of chart figure and table This is more intelligentlyperformed than the previous one because the same data canbe formulated in many ways to exhibit the work in a differentstyle than the original oneThe goal of these intelligentmeansis that the plagiarist attempts to deceive the readers by doingsome changes such as translation from other languages orgenerating another shape for the same data

TheModifiedCopy plagiarisms are primarily divided intotranslation and restructuring In this research new types ofcopying are organized by taxonomy which explains variouspatterns of graphic plagiarism Furthermore the primaryfocus of the bar chart image is to detect the proportion ofplagarism Figures 1 2 and 3 depict the taxonomy of chartfigure and table plagiarism respectively

4 Methodology

Themethodology of bar chart plagiarism detection as shownin Figure 4 consists of three main stages namely planningand collection feature extraction and development withsystem evaluation In the planning and collection stagevarious patterns of graphical plagiarism via taxonomy ofchart figures and table are presented (Figures 1 2 and 3)Thetaxonomy of chart plagiarism explains different formulationswhich plagiarize the data of bar chart images Thereforevarieties of bar chart images are collected and data sets areconsidered The gatherings of the data sets consist of 100 barchart images for storing in databases and twenty images forquery including all possibilities of plagiarism for bar chartimages These data sets are collected from different resourcessuch as thesis which represented various types of bar chartimages in 2D and 3D Besides vertical and horizontal barchart images as shown in Figure 5 are taken into account

Planning and collection

Extract text features

Preprocessing

Extract numeric features

Detect plagiarism

System evaluation

Feat

ures

extr

actio

nD

evelo

pmen

t an

d ev

alua

tion

Figure 4 Flowchart of methodology

In the feature extraction stage the features of bar chartimages are used to detect plagiarism Various types of barchart images are analyzed to detect the features of imageThebar chart images inferred to acquiremaximum of ten featuresrepresenting the information and the data of image Thesefeatures are common in different types of bar chart imagesfor instance in 2D and 3D images However the number ofuses for these features may different from each other

The features extraction is an essential process to get thedata from images which can be utilized to detect the rate ofplagiarized data Therefore these ten features are categorizedinto low-level and high-level features The low-level featuresrefer to the text features of bar chart The text features are thetext which can be used in image to represent the informationand data such as caption of image label of each bar label ofcoordinates and values on coordinates Generally they areextracted from bar chart images using OCR tool Converselythe high-level features referring to numeric features cannot beextracted using OCR toolThe extraction of numeric featuresrequires a relationship between the text and graphic compo-nents The numeric features include values of bars in imageEach bar in an image has three numeric features which canbe extracted by the proposed technique depending on StartEnd and Exact values The Start and End values representthe first and last values while the Exact one corresponds tothe real value of the bar For instance the text features forimage in Figure 5 are ldquoFigure 10 Revision Design GalleriesrdquoldquoSilver Platinum Cashrdquo and ldquo0 5 40rdquo whichrepresent caption of image and label of coordinates X and Yrespectively Meanwhile the numeric features represent thevalue of each bar which can be detected by Start End andExact features For example the Start End and Exact featuresof bar Silver are 5 10 and 6 respectively

These features are used to detect the proportion of pla-giarism for bar chart imageThe extraction of Start End and

6 The Scientific World Journal

Cash

Platinum

BondsStocks

Gold

Silver

0 5 10 15 20 25 30 35 40Revision Design Galleries ()

(a)

MinMean

Std

Rosenbrock function on 100 dimensions BSFP-PSO-CUI BSFP-PSO-CAI

250E + 07

200E + 07

150E + 07

100E + 07

500E + 06

000E + 00

(b)

Figure 5 Samples of data set [17 36]

Exact values necessitates preprocessing of bar chart image tothe adjacent coordinates of the image Image scanning is thenperformed to detect the length of each bar in order to findthe numeric features for each bar Storing of the features indatabases depends on the type of features whether numericor text The features which are extracted by the proposedtechnique are represented as vectors while the text featuresthat are extracted by OCR are characterized as string

The detection methods for text plagiarism are mainlycategorized based on character semantics structure cita-tion cluster cross language and syntax Comparatively thesmaller number of textual components than normal para-graph text allows us to use character-basedmethods to detectplagiarism of bar chart imagesThe character-based methodsdepend on character matching approaches to exactly orpartially detect the identical string for features of bar chartimages Various algorithms of plagiarism are adopted in thetext as character 119899-gram to identify the similarity betweentwo strings based on the number of identical characters offeatures Some researchers use 8-gramand 5-gram techniques[34 35] for matching strings to detect plagiarismWe used 2-gram technique to detect plagiarism of bar chart imagesThistechnique is used to represent the text features of bar chartimages Different similarity measures can be used to obtainthe similarity for numeric features such as Euclidean distanceJaccard or cosine coefficient The Euclidean distance iscalculated by the following

Ec (119909 119910) =radic

sum

119894

1003816

1003816

1003816

1003816

119909

119894minus 119910

119894

1003816

1003816

1003816

1003816

2 (1)

Once the detection and storing of the proportion ofplagiarism are completed then the performance of the systemis evaluated The performance is evaluated by overlapping of

features using the relation of Precision andRecall given by thefollowing

Recall = Relavent Documents RetreivedAll Relavent Documents

Precision = Relavent Documents RetreivedAll Documents Retrived

(2)

5 Experimental Results

The bar chart plagiarism detections are carried out in fourmain stages such as submission of query images featureextraction of bar chart image plagiarism detection andhighlighting results The first stage is to submit varioustypes of query images covering different kinds of possibleplagiarism to detect and judge plagiarism of bar chart imageswhile the features of query are extracted in the second stageThe third stage includes detection of plagiarism by usingword 2-gram and Euclidean distance techniques Finally thefeatures of query bar chart image that are plagiarized fromothers are highlighted and the proportion of the similarity isdisplayed

The bar chart plagiarism is further divided into ExactCopy and Modified Copy as explained in taxonomyFigure 6(a) shows the query image while Figure 6(b)depicts the plagiarized images detected by the system Thefirst plagiarized image is similar to the whole data in theimage while the second plagiarized image contains the samedata that was plagiarized but presented as a horizontal barchart image The system extracts the features of query imageand detects the proportion of plagiarism depending on StartEnd and Exact values for each bar as well as the label of eachbar The system highlights the data and information that areplagiarized and provides the proportion of plagiarism

One of the patterns of plagiarism derived from ModifiedCopy is the stealing by changing scales Each bar is modified

The Scientific World Journal 7

0

10

20

30

40

Cash Bonds Stocks Gold SilverPlatinumFigure Revision Design Galleries

Query image

()

(a)

The rate of similarity is 100

Cash [30 40 32]Bonds [20 30 22]Stocks [20 30 22]Gold [0 10 10]Platinum [0 10 6]Silver [0 10 5]

Figure Revision Design Galleries

The rate of similarity is 100

Cash [30 40 32]Bonds [20 30 22]Stocks [20 30 22]

Gold [0 10 10]Platinum [0 10 6]

Silver [0 10 5]Figure Revision Design Galleries

Cash Bonds Stocks Gold SilverPlatinum

Figure Revision Design Galleries ()

0

10

20

30

40

()

Cash

Platinum

BondsStocks

Gold

Silver

0 5 10 15 20 25 30 35 40

The rate of similarity is 71

Cash [30 - 32]Bonds [20 - 22]Stocks [20 - 22]Gold [- - 10]

Platinum [- 10 6]Silver [- 10 5]

Figure Revision Design Galleries

The rate of similarity is 71Cash [30 - 32]Bonds [20 - 22]Stocks [20 - 22]Gold [- - 10]Platinum [- 10 6]Silver [- 10 5]

Figure Revision Design Galleries

Plagiarized image

Plagiarized image

Figure Revision Design Galleries

(b)

Figure 6 Plagiarism detection for one of Exact Copy patterns as plagiarism of the whole data of image

by plagiarists by changing Start and End values to be differentfrom the original image Figure 7 illustrates the significantrole played by the Exact values to detect this type of plagia-rism

The plagiarists may use integration among patterns ofpossible bar chart plagiarism to present a more compleximage which has the same data quoting from other worksFigure 8 displays the query image which is modified bychanging colours and scales of bars as well as changing theirlocation via swapping The proposed system is capable ofdetecting this type of plagiarism and identifies the proportionof similarity

Figure 9 illustrates the performance of the system for pla-giarism detection of Exact Copy andModified Copy patternsrespectively

6 Discussion

The state-of-the-art graphical plagiarism techniques andpatterns are presentedThe graphical plagiarism is consideredone of the electronic crimes and thefts and the concepts ofsuch stealing are newly viewed Various important informa-tion and data can be represented as graphical forms suchas results or frameworks for academic and business aspectsHowever many systems of text plagiarism methods such asTurnitin are incapable of detecting plagiarism of images Inspite of the different styles of bar chart images the extractionof features of image plays an important role in detectingplagiarism Our proposed technique which is used to extractthe numeric features played an essential role for bar chartplagiarism detection The patterns of Exact Copy of bar chart

8 The Scientific World Journal

Query image

TA fo

und

40

32

24

16

8

0CTN CEA PP CA TPST

Manual searchSystem search

Trust attributesComparison between manual search and system search

from UK e-commerce website

(a)

40

35

30

25

5

0CTN CEA PP CA TPST

20

15

10

TA fo

und

Plagiarized image

The rate of similarity is 53 CTN-manual search [- 40 36]CTN-system search [- - 35]

CEA-system search [- - 29]PP-manual search [- - 32]

CA-manual search [- - 32]

TPST-manual search [- - 8]TPST-system search [- - 6]

PP-system search [- - 32]

CA-system search [- - 32]

CEA-manual search [- - 29]

Comparison between manual search and system search from UK e-commerce website

Manual searchSystem search

Trust attributesComparison between manual search and system search

from UK e-commerce website

(b)

Figure 7 Plagiarism detection for one of the Modified Copy patterns which is the stealing by changing scales of image

plagiarism detection including direct copy of all data orpart of data are underscored Plagiarism which is carriedout by modifying caption of images via restructuring orsummarizing for label sentences is emphasized Alternativelythe patterns of Modified Copy which are regarded as morecomplicated than Exact Copy patterns are also analyzedThe difficulty of these patterns is the changing on imagewhich appears as the same data and information in differentformsThe restructuring of information for image within thesame shape is also covered The edition of bar chart imagesincluding the change of image bar colors or changing thebar locations either by swapping or via generating horizontalbars from vertical bars and vice versa is discussed in detail

Besides more professional modification such as changingof scales on coordinates which is completely different fromoriginal one can be detected by the proposed method Inthis case the Start and End features of bars are completelydifferent Consequently the Exact features as well as otherattributes play significant role in detecting plagiarism in barchart image

7 Conclusion

We demonstrate the precise recognition of different plagia-rized patterns in business documents using an intelligent barchart detection system The types and patterns of plagiarism

The Scientific World Journal 9

Query image

Orthoptera

Hymenoptera

Hemiptera

Lepidoptera

Other orders

Coleoptera

0 8 16 24 32 40 Example of bar chart Revision redesigns ()

(a)

Example Revision redesigns for input bar charts ()

Orthoptera

Hymenoptera

Hemiptera

Lepidoptera

Other orders

Coleoptera

0 10 20 30 40

The rate of similarity is 46

Orthoptera [- - 9]Hymenoptera [- - 11]Hemiptera [- - 11]Lepidoptera [0 - 7]Other orders [- - -]Coleoptera [- - -]

Plagiarized image

(b)

Figure 8 Plagiarism detection for integration among possible bar chart plagiarism

1

08

06

04

02

0

Prec

ision

1080604020Recall

(a)

1

08

06

04

02

0

Prec

ision

1080604020Recall

(b)

Figure 9 The evaluation of performance (a) for Exact Copy patterns and (b) forModified Copy patterns

10 The Scientific World Journal

are presented via taxonomy of figure chart and table Variouskinds of possible plagiarism are highlighted using taxonomyPlagiarism of bar chart image as type of chart is detectedby the newly proposed technique It is established that thepresent technique is capable of extracting the features from abar chart image which cannot be pulled out using OCR toolOur technique first recognizes the connection between thetext and graphical components to extract the Start End andExact value for each bar Using word 2-gram and Euclideandistance methods the accurate detection of plagiarism isperformed The detection of plagiarism is based on tenstriking features The system is capable of detecting differentlevels of plagiarism not only copy and paste of bar chartimage but also modification on images such as changingcolor or scales The present system efficiently and accuratelydistinguishes other possible alteration administered on theseimages such as swapping among bars location and evenchanges on caption via summarizing and restructuring Theproposed technique may be useful for intelligent plagiarismdetection in business and academic documents

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors extend their appreciation to the Deanship ofScientific Research at King Saud University for funding thiswork through Research Group no RGP-264 The authorsare also thankful to Ministry of Science and TechnologyInnovation (MOSTI) Malaysia and Research ManagementCenter (RMC) Universiti Teknologi Malaysia (UTM) JohorMalaysia for their technical support and expertise in con-ducting this research

References

[1] A M El Tahir Ali H M D Abdulla and V Snasel ldquoSurveyof plagiarism detection methodsrdquo in Proceedings of the 5th AsiaModelling Symposium (AMS rsquo05) pp 39ndash42 May 2011

[2] A Rehman and T Saba ldquoFeatures extraction for soccer videosemantic analysis current achievements and remaining issuesrdquoArtificial Intelligence Review vol 41 no 3 pp 451ndash461 2014

[3] A Rehman and T Saba ldquoEvaluation of artificial intelligent tech-niques to secure information in enterprisesrdquo Artificial Intelli-gence Review pp 1ndash16 2012

[4] S M Alzahrani N Salim and A Abraham ldquoUnderstandingplagiarism linguistic patterns textual features and detectionmethodsrdquo IEEE Transactions on Systems Man and CyberneticsC Applications and Reviews vol 42 no 2 pp 133ndash149 2012

[5] T Saba and A Altameem ldquoAnalysis of vision based systems todetect real time goal events in soccer videosrdquo Applied ArtificialIntelligence vol 27 no 7 pp 656ndash667 2013

[6] K J Ottenstein ldquoAn algorithmic approach to the detection andprevention of plagiarismrdquo SIGCSE Bulletin vol 8 pp 30ndash411976

[7] A Parker and J O Hamblen ldquoComputer algorithms for plagia-rismdetectionrdquo IEEETransactions on Education vol 32 pp 94ndash99 1989

[8] J P Bao J Y Shen X D Liu andQ B Song ldquoSurvey on naturallanguage text copy detectionrdquo Journal of Software vol 14 no 10pp 1753ndash1760 2003

[9] T Saba and A Rehman Machine Learning and Script Recogni-tion Lambert Academic Publisher 2012

[10] W Huang S Zong and C L Tan ldquoChart image classificationusing multiple-instance learningrdquo in Proceedings of the IEEEWorkshop on Applications of Computer Vision (WACV 07) p27 Austin Tex USA February 2007

[11] T Saba A Rehman and M Elarbi-Boudihir ldquoMethods andstrategies on off-line cursive touched characters segmentationa directional reviewrdquo Artificial Intelligence Review 2011

[12] A Rehman D Mohammad G Sulong and T Saba ldquoSimpleand effective techniques for core-region detection and slantcorrection in offline script recognitionrdquo in Proceedings of theIEEE International Conference on Signal and Image ProcessingApplications (ICSIPA rsquo09) pp 15ndash20 Kuala Lumpur MalaysiaNovember 2009

[13] A Rehman and T Saba ldquoDocument skew estimation andcorrection analysis of techniques common problems andpossible solutionsrdquo Applied Artificial Intelligence vol 25 no 9pp 769ndash787 2011

[14] A Rehman and T Saba ldquoOff-line cursive script recognitioncurrent advances comparisons and remaining problemsrdquo Arti-ficial Intelligence Review vol 37 no 4 pp 261ndash288 2012

[15] T Saba A Rehman and G Sulong ldquoImproved statisticalfeatures for cursive character recognitionrdquo International Journalof Innovative Computing Information and Control vol 7 no 9pp 5211ndash5224 2011

[16] T Helmy ldquoA computational model for context-based imagecategorization and descriptionrdquo International Journal of Imageand Graphics vol 12 no 1 Article ID 1250001 2012

[17] M Savva N Kong A Chhajta F-F Li M Agrawala and JHeer ldquoReVision automated classification analysis and redesignof chart imagesrdquo in Proceedings of the 24th Annual ACMSymposium on User Interface Software and Technology (UISTrsquo11) pp 393ndash402 Santa Barbara Calif USA October 2011

[18] S Elzer S Carberry I Zukerman D Chester N Green and SDemir ldquoA probabilistic framework for recognizing intention ininformation graphicsrdquo in Proceedings of the 19th InternationalJoint Conference on Artificial Intelligence (IJCAI rsquo05) pp 1042ndash1047 Edinburgh UK August 2005

[19] S Carberry S Elzer N Green K McCoy and D ChesterldquoUnderstanding information graphics a discourse-level prob-lemrdquo in Proceedings of the 4th SIGdial Workshop of DiscourseandDialogue (SIGDIAL rsquo03) pp 1ndash12 Sapporo Japan July 2003

[20] M S M Rahim A Rehman S Nirsquomatus F Kurniawan andT Saba ldquoRegion-based features extraction in ear biometricsrdquoInternational Journal of Academic Research vol 4 no 1 pp 37ndash42 2012

[21] N Yokokura and T Watanabe ldquoLayout-based approach forextracting constructive elements of bar-chartsrdquo in Graph-ics Recognition Algorithms and Systems K Tombre and AChhabra Eds vol 1389 pp 163ndash174 Springer Berlin Germany1998

The Scientific World Journal 11

[22] Y Zhou and C L Tan ldquoLearning-based scientific chart recog-nitionrdquo in Proceedings of the 4th IAPR International Workshopon Graphics Recognition (GREC rsquo01) pp 482ndash492 2001

[23] Y P Zhou and C L Tan ldquoHough technique for bar chartsdetection and recognition in document imagesrdquo in Proceedingof the International Conference on Image Processing (ICIP 00)vol 2 pp 605ndash608 Vancouver Canada September 2000

[24] W Huang C L Tan and W K Leow ldquoModel-based chartimage recognitionrdquo in Graphics Recognition Recent Advancesand Perspectives J Llados and Y-B Kwon Eds vol 3088 pp87ndash99 Springer Berlin Germany 2004

[25] A Mishchenko and N Vassilieva ldquoModel-based chart imageclassificationrdquo in Advances in Visual Computing G Bebis RBoyle B Parvin et al Eds vol 6939 of Lecture Notes inComputer Science pp 476ndash485 Springer Berlin Germany 2011

[26] W Huang and C L Tan ldquoA system for understanding imagedinfographics and its applicationsrdquo in Proceedings of the ACMSymposium on Document Engineering pp 9ndash18 ManitobaCanada August 2007

[27] M M Hassan and W Al Khatib ldquoSimilarity searching instatistical figures based on extracted meta datardquo in Proceedingsof the ComputerGraphics Imaging andVisualisation (CGIV 07)pp 329ndash334 Bangkok Thailand August 2007

[28] L Yang W Huang and C L Tan ldquoSemi-automatic groundtruth generation for chart image recognitionrdquo in Proceedings ofthe 7th international conference on Document Analysis SystemsNelson New Zealand 2006

[29] G D Tourassi ldquoJourney toward computer-aided diagnosis roleof image texture analysisrdquo Radiology vol 213 no 2 pp 317ndash3201999

[30] D A Chandy J S Johnson and S E Selvan ldquoTexture featureextraction using gray level statistical matrix for content-basedmammogram retrievalrdquoMultimedia Tools and Applications vol72 no 2 pp 2011ndash2024 2014

[31] F Han HWang B Song et al ldquoA new 3D texture feature basedcomputer-aided diagnosis approach to differentiate pulmonarynodulesrdquo in Medical Imaging Computer-Aided Diagnosis Pro-ceedings of SPIE San Diego Calif USA 2014

[32] F Han H Wang B Song et al ldquoEfficient 3D texture featureextraction from CT images for computer-aided diagnosis ofpulmonary nodulesrdquo in Proceedings of the SPIE Medical Imag-ing Computer-Aided Diagnosis vol 9035 pp 1ndash7 2014

[33] S M Eissen B Stein and M Kulig ldquoPlagiarism detectionwithout reference collectionsrdquo in Advances in Data AnalysisR Decker and H-J Lenz Eds pp 359ndash366 Springer BerlinGermany 2007

[34] J Kasprzak M Brandejs and M Kripac ldquoFinding plagiarismby evaluating document similaritiesrdquo in Proceedings of the 3rdWorkshop onUncovering Plagiarism Authorship and Social Soft-ware Misuse (PAN rsquo09) pp 24ndash28 Donostia Spain September2009

[35] C Basile D Benedetto E Caglioti G Cristadoro and MD Esposti ldquoA plagiarism detection procedure in three stepsselection matches and squaresrdquo Donostia Spain 2009

[36] Z A Hamed and S Z Mohd Hashim Hybrid particle swarmoptimization and black stork foraging for functional neural fuzzynetwork learning enhancement [UTMThesis] 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 4: Research Article Intelligent Bar Chart Plagiarism ...downloads.hindawi.com/journals/tswj/2014/612787.pdf · Research Article Intelligent Bar Chart Plagiarism Detection in Documents

4 The Scientific World Journal

Table plagiarism

Modified copy

Exact copy

All table

Part of table

Without caption

With caption

Summarising caption

Synonyms of caption

Restructuring caption

Copy column(s) from table

Copy row(s) from table

Copy columns and rows from table

Translation from one language to others

Restructuring table

Same shape

Different shape

Convert to paragraph

All table

Part of table

Convert to chart

All data

Part of data

All data

Part of data

Swapping among cells

Change words and sentences

Used synonyms

Summarising

Restructuring

Swapping of columns

Swapping of rows

Swap row(s) with column(s)

Figure 3 Taxonomy of table plagiarism

The techniques for features extraction of image dependonthe type of images such as chart or medical representationSome techniques are applicable on two-dimensional plot ofchart images while others work well for bar chart imagesHough transform technique is introduced as an approachto extract the features of bar chart images [23] Someinvestigations are based on the edges of bars to extract thefeatures [24 26] The learning-based method is establishedto recognize the chart images [22] The features of bar chartimages can be extracted by describing the height andwidth ofeach bar which is applied on statistical images to determinethe similarities [27] Meanwhile other techniques focusedon geometric features rather than data and information ofscientific bar chart images [28]

Currently several techniques are developed to extract thefeatures frommedical imagesThe texture is one of the visual

contents of a medical image used in content-based imageretrieval (CBIR) to represent the image effectively for search-ing and recovering similar areas [29] Gray-level statisticalmatrix technique is applied to extract the texture informationfor the content-based retrieval of mammograms from theMIAS database [30] 3D texture features technique based onthe cooccurrence matrixes of the gray-level gradient andcurvature information regarding the nodule volume data forclassifying themalignancy from benign is introduced [31 32]

3 Plagiarism Taxonomy and Patterns

Three types of graphic plagiarism such as figure chartand table are important to emphasize Each type highlightsdifferent levels of plagiarism The patterns and types ofplagiarism for figures charts and tables are presented as

The Scientific World Journal 5

taxonomy Some kinds of text plagiarism are also evaluated[33] The methods for detecting passages of text plagiarismfor documents without appropriate citations are also sug-gested Taxonomy is further extended to cover other typesof plagiarism [4] The taxonomy presented in various studiesmajorly demonstrates literal and intelligent plagiarism whereeach kind includes many patterns of plagiarism Howeverwe are interested in detecting plagiarism of charts as well astheir taxonomy figures and tables Alternatively charts (piebar and line) can be considered as one of the methods forrepresenting the data and information of experimental resultsor comparing among techniques which are copied from otherreferences without citation Therefore plagiarism of chartscan be formulated in several forms to manifest the sameinformation in various shapes Taxonomy of chart plagiarismdemonstrates many patterns and models which may be usedto plagiarize the data of chart image

Plagiarism patterns of chart figure and table are dividedinto Exact Copy and Modified Copy prototypes The ExactCopy patterns of plagiarism are defined as the direct quoteof data from other works without referencing where copyand paste of the whole or part of the information image isperformed Simplicity is one of the important attributes ofthis type of plagiarism Besides this type of plagiarism doesnot require much time to hide the academic crimeThe othertype of graphic plagiarism is the Modified Copy for infor-mation of chart figure and table This is more intelligentlyperformed than the previous one because the same data canbe formulated in many ways to exhibit the work in a differentstyle than the original oneThe goal of these intelligentmeansis that the plagiarist attempts to deceive the readers by doingsome changes such as translation from other languages orgenerating another shape for the same data

TheModifiedCopy plagiarisms are primarily divided intotranslation and restructuring In this research new types ofcopying are organized by taxonomy which explains variouspatterns of graphic plagiarism Furthermore the primaryfocus of the bar chart image is to detect the proportion ofplagarism Figures 1 2 and 3 depict the taxonomy of chartfigure and table plagiarism respectively

4 Methodology

Themethodology of bar chart plagiarism detection as shownin Figure 4 consists of three main stages namely planningand collection feature extraction and development withsystem evaluation In the planning and collection stagevarious patterns of graphical plagiarism via taxonomy ofchart figures and table are presented (Figures 1 2 and 3)Thetaxonomy of chart plagiarism explains different formulationswhich plagiarize the data of bar chart images Thereforevarieties of bar chart images are collected and data sets areconsidered The gatherings of the data sets consist of 100 barchart images for storing in databases and twenty images forquery including all possibilities of plagiarism for bar chartimages These data sets are collected from different resourcessuch as thesis which represented various types of bar chartimages in 2D and 3D Besides vertical and horizontal barchart images as shown in Figure 5 are taken into account

Planning and collection

Extract text features

Preprocessing

Extract numeric features

Detect plagiarism

System evaluation

Feat

ures

extr

actio

nD

evelo

pmen

t an

d ev

alua

tion

Figure 4 Flowchart of methodology

In the feature extraction stage the features of bar chartimages are used to detect plagiarism Various types of barchart images are analyzed to detect the features of imageThebar chart images inferred to acquiremaximum of ten featuresrepresenting the information and the data of image Thesefeatures are common in different types of bar chart imagesfor instance in 2D and 3D images However the number ofuses for these features may different from each other

The features extraction is an essential process to get thedata from images which can be utilized to detect the rate ofplagiarized data Therefore these ten features are categorizedinto low-level and high-level features The low-level featuresrefer to the text features of bar chart The text features are thetext which can be used in image to represent the informationand data such as caption of image label of each bar label ofcoordinates and values on coordinates Generally they areextracted from bar chart images using OCR tool Converselythe high-level features referring to numeric features cannot beextracted using OCR toolThe extraction of numeric featuresrequires a relationship between the text and graphic compo-nents The numeric features include values of bars in imageEach bar in an image has three numeric features which canbe extracted by the proposed technique depending on StartEnd and Exact values The Start and End values representthe first and last values while the Exact one corresponds tothe real value of the bar For instance the text features forimage in Figure 5 are ldquoFigure 10 Revision Design GalleriesrdquoldquoSilver Platinum Cashrdquo and ldquo0 5 40rdquo whichrepresent caption of image and label of coordinates X and Yrespectively Meanwhile the numeric features represent thevalue of each bar which can be detected by Start End andExact features For example the Start End and Exact featuresof bar Silver are 5 10 and 6 respectively

These features are used to detect the proportion of pla-giarism for bar chart imageThe extraction of Start End and

6 The Scientific World Journal

Cash

Platinum

BondsStocks

Gold

Silver

0 5 10 15 20 25 30 35 40Revision Design Galleries ()

(a)

MinMean

Std

Rosenbrock function on 100 dimensions BSFP-PSO-CUI BSFP-PSO-CAI

250E + 07

200E + 07

150E + 07

100E + 07

500E + 06

000E + 00

(b)

Figure 5 Samples of data set [17 36]

Exact values necessitates preprocessing of bar chart image tothe adjacent coordinates of the image Image scanning is thenperformed to detect the length of each bar in order to findthe numeric features for each bar Storing of the features indatabases depends on the type of features whether numericor text The features which are extracted by the proposedtechnique are represented as vectors while the text featuresthat are extracted by OCR are characterized as string

The detection methods for text plagiarism are mainlycategorized based on character semantics structure cita-tion cluster cross language and syntax Comparatively thesmaller number of textual components than normal para-graph text allows us to use character-basedmethods to detectplagiarism of bar chart imagesThe character-based methodsdepend on character matching approaches to exactly orpartially detect the identical string for features of bar chartimages Various algorithms of plagiarism are adopted in thetext as character 119899-gram to identify the similarity betweentwo strings based on the number of identical characters offeatures Some researchers use 8-gramand 5-gram techniques[34 35] for matching strings to detect plagiarismWe used 2-gram technique to detect plagiarism of bar chart imagesThistechnique is used to represent the text features of bar chartimages Different similarity measures can be used to obtainthe similarity for numeric features such as Euclidean distanceJaccard or cosine coefficient The Euclidean distance iscalculated by the following

Ec (119909 119910) =radic

sum

119894

1003816

1003816

1003816

1003816

119909

119894minus 119910

119894

1003816

1003816

1003816

1003816

2 (1)

Once the detection and storing of the proportion ofplagiarism are completed then the performance of the systemis evaluated The performance is evaluated by overlapping of

features using the relation of Precision andRecall given by thefollowing

Recall = Relavent Documents RetreivedAll Relavent Documents

Precision = Relavent Documents RetreivedAll Documents Retrived

(2)

5 Experimental Results

The bar chart plagiarism detections are carried out in fourmain stages such as submission of query images featureextraction of bar chart image plagiarism detection andhighlighting results The first stage is to submit varioustypes of query images covering different kinds of possibleplagiarism to detect and judge plagiarism of bar chart imageswhile the features of query are extracted in the second stageThe third stage includes detection of plagiarism by usingword 2-gram and Euclidean distance techniques Finally thefeatures of query bar chart image that are plagiarized fromothers are highlighted and the proportion of the similarity isdisplayed

The bar chart plagiarism is further divided into ExactCopy and Modified Copy as explained in taxonomyFigure 6(a) shows the query image while Figure 6(b)depicts the plagiarized images detected by the system Thefirst plagiarized image is similar to the whole data in theimage while the second plagiarized image contains the samedata that was plagiarized but presented as a horizontal barchart image The system extracts the features of query imageand detects the proportion of plagiarism depending on StartEnd and Exact values for each bar as well as the label of eachbar The system highlights the data and information that areplagiarized and provides the proportion of plagiarism

One of the patterns of plagiarism derived from ModifiedCopy is the stealing by changing scales Each bar is modified

The Scientific World Journal 7

0

10

20

30

40

Cash Bonds Stocks Gold SilverPlatinumFigure Revision Design Galleries

Query image

()

(a)

The rate of similarity is 100

Cash [30 40 32]Bonds [20 30 22]Stocks [20 30 22]Gold [0 10 10]Platinum [0 10 6]Silver [0 10 5]

Figure Revision Design Galleries

The rate of similarity is 100

Cash [30 40 32]Bonds [20 30 22]Stocks [20 30 22]

Gold [0 10 10]Platinum [0 10 6]

Silver [0 10 5]Figure Revision Design Galleries

Cash Bonds Stocks Gold SilverPlatinum

Figure Revision Design Galleries ()

0

10

20

30

40

()

Cash

Platinum

BondsStocks

Gold

Silver

0 5 10 15 20 25 30 35 40

The rate of similarity is 71

Cash [30 - 32]Bonds [20 - 22]Stocks [20 - 22]Gold [- - 10]

Platinum [- 10 6]Silver [- 10 5]

Figure Revision Design Galleries

The rate of similarity is 71Cash [30 - 32]Bonds [20 - 22]Stocks [20 - 22]Gold [- - 10]Platinum [- 10 6]Silver [- 10 5]

Figure Revision Design Galleries

Plagiarized image

Plagiarized image

Figure Revision Design Galleries

(b)

Figure 6 Plagiarism detection for one of Exact Copy patterns as plagiarism of the whole data of image

by plagiarists by changing Start and End values to be differentfrom the original image Figure 7 illustrates the significantrole played by the Exact values to detect this type of plagia-rism

The plagiarists may use integration among patterns ofpossible bar chart plagiarism to present a more compleximage which has the same data quoting from other worksFigure 8 displays the query image which is modified bychanging colours and scales of bars as well as changing theirlocation via swapping The proposed system is capable ofdetecting this type of plagiarism and identifies the proportionof similarity

Figure 9 illustrates the performance of the system for pla-giarism detection of Exact Copy andModified Copy patternsrespectively

6 Discussion

The state-of-the-art graphical plagiarism techniques andpatterns are presentedThe graphical plagiarism is consideredone of the electronic crimes and thefts and the concepts ofsuch stealing are newly viewed Various important informa-tion and data can be represented as graphical forms suchas results or frameworks for academic and business aspectsHowever many systems of text plagiarism methods such asTurnitin are incapable of detecting plagiarism of images Inspite of the different styles of bar chart images the extractionof features of image plays an important role in detectingplagiarism Our proposed technique which is used to extractthe numeric features played an essential role for bar chartplagiarism detection The patterns of Exact Copy of bar chart

8 The Scientific World Journal

Query image

TA fo

und

40

32

24

16

8

0CTN CEA PP CA TPST

Manual searchSystem search

Trust attributesComparison between manual search and system search

from UK e-commerce website

(a)

40

35

30

25

5

0CTN CEA PP CA TPST

20

15

10

TA fo

und

Plagiarized image

The rate of similarity is 53 CTN-manual search [- 40 36]CTN-system search [- - 35]

CEA-system search [- - 29]PP-manual search [- - 32]

CA-manual search [- - 32]

TPST-manual search [- - 8]TPST-system search [- - 6]

PP-system search [- - 32]

CA-system search [- - 32]

CEA-manual search [- - 29]

Comparison between manual search and system search from UK e-commerce website

Manual searchSystem search

Trust attributesComparison between manual search and system search

from UK e-commerce website

(b)

Figure 7 Plagiarism detection for one of the Modified Copy patterns which is the stealing by changing scales of image

plagiarism detection including direct copy of all data orpart of data are underscored Plagiarism which is carriedout by modifying caption of images via restructuring orsummarizing for label sentences is emphasized Alternativelythe patterns of Modified Copy which are regarded as morecomplicated than Exact Copy patterns are also analyzedThe difficulty of these patterns is the changing on imagewhich appears as the same data and information in differentformsThe restructuring of information for image within thesame shape is also covered The edition of bar chart imagesincluding the change of image bar colors or changing thebar locations either by swapping or via generating horizontalbars from vertical bars and vice versa is discussed in detail

Besides more professional modification such as changingof scales on coordinates which is completely different fromoriginal one can be detected by the proposed method Inthis case the Start and End features of bars are completelydifferent Consequently the Exact features as well as otherattributes play significant role in detecting plagiarism in barchart image

7 Conclusion

We demonstrate the precise recognition of different plagia-rized patterns in business documents using an intelligent barchart detection system The types and patterns of plagiarism

The Scientific World Journal 9

Query image

Orthoptera

Hymenoptera

Hemiptera

Lepidoptera

Other orders

Coleoptera

0 8 16 24 32 40 Example of bar chart Revision redesigns ()

(a)

Example Revision redesigns for input bar charts ()

Orthoptera

Hymenoptera

Hemiptera

Lepidoptera

Other orders

Coleoptera

0 10 20 30 40

The rate of similarity is 46

Orthoptera [- - 9]Hymenoptera [- - 11]Hemiptera [- - 11]Lepidoptera [0 - 7]Other orders [- - -]Coleoptera [- - -]

Plagiarized image

(b)

Figure 8 Plagiarism detection for integration among possible bar chart plagiarism

1

08

06

04

02

0

Prec

ision

1080604020Recall

(a)

1

08

06

04

02

0

Prec

ision

1080604020Recall

(b)

Figure 9 The evaluation of performance (a) for Exact Copy patterns and (b) forModified Copy patterns

10 The Scientific World Journal

are presented via taxonomy of figure chart and table Variouskinds of possible plagiarism are highlighted using taxonomyPlagiarism of bar chart image as type of chart is detectedby the newly proposed technique It is established that thepresent technique is capable of extracting the features from abar chart image which cannot be pulled out using OCR toolOur technique first recognizes the connection between thetext and graphical components to extract the Start End andExact value for each bar Using word 2-gram and Euclideandistance methods the accurate detection of plagiarism isperformed The detection of plagiarism is based on tenstriking features The system is capable of detecting differentlevels of plagiarism not only copy and paste of bar chartimage but also modification on images such as changingcolor or scales The present system efficiently and accuratelydistinguishes other possible alteration administered on theseimages such as swapping among bars location and evenchanges on caption via summarizing and restructuring Theproposed technique may be useful for intelligent plagiarismdetection in business and academic documents

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors extend their appreciation to the Deanship ofScientific Research at King Saud University for funding thiswork through Research Group no RGP-264 The authorsare also thankful to Ministry of Science and TechnologyInnovation (MOSTI) Malaysia and Research ManagementCenter (RMC) Universiti Teknologi Malaysia (UTM) JohorMalaysia for their technical support and expertise in con-ducting this research

References

[1] A M El Tahir Ali H M D Abdulla and V Snasel ldquoSurveyof plagiarism detection methodsrdquo in Proceedings of the 5th AsiaModelling Symposium (AMS rsquo05) pp 39ndash42 May 2011

[2] A Rehman and T Saba ldquoFeatures extraction for soccer videosemantic analysis current achievements and remaining issuesrdquoArtificial Intelligence Review vol 41 no 3 pp 451ndash461 2014

[3] A Rehman and T Saba ldquoEvaluation of artificial intelligent tech-niques to secure information in enterprisesrdquo Artificial Intelli-gence Review pp 1ndash16 2012

[4] S M Alzahrani N Salim and A Abraham ldquoUnderstandingplagiarism linguistic patterns textual features and detectionmethodsrdquo IEEE Transactions on Systems Man and CyberneticsC Applications and Reviews vol 42 no 2 pp 133ndash149 2012

[5] T Saba and A Altameem ldquoAnalysis of vision based systems todetect real time goal events in soccer videosrdquo Applied ArtificialIntelligence vol 27 no 7 pp 656ndash667 2013

[6] K J Ottenstein ldquoAn algorithmic approach to the detection andprevention of plagiarismrdquo SIGCSE Bulletin vol 8 pp 30ndash411976

[7] A Parker and J O Hamblen ldquoComputer algorithms for plagia-rismdetectionrdquo IEEETransactions on Education vol 32 pp 94ndash99 1989

[8] J P Bao J Y Shen X D Liu andQ B Song ldquoSurvey on naturallanguage text copy detectionrdquo Journal of Software vol 14 no 10pp 1753ndash1760 2003

[9] T Saba and A Rehman Machine Learning and Script Recogni-tion Lambert Academic Publisher 2012

[10] W Huang S Zong and C L Tan ldquoChart image classificationusing multiple-instance learningrdquo in Proceedings of the IEEEWorkshop on Applications of Computer Vision (WACV 07) p27 Austin Tex USA February 2007

[11] T Saba A Rehman and M Elarbi-Boudihir ldquoMethods andstrategies on off-line cursive touched characters segmentationa directional reviewrdquo Artificial Intelligence Review 2011

[12] A Rehman D Mohammad G Sulong and T Saba ldquoSimpleand effective techniques for core-region detection and slantcorrection in offline script recognitionrdquo in Proceedings of theIEEE International Conference on Signal and Image ProcessingApplications (ICSIPA rsquo09) pp 15ndash20 Kuala Lumpur MalaysiaNovember 2009

[13] A Rehman and T Saba ldquoDocument skew estimation andcorrection analysis of techniques common problems andpossible solutionsrdquo Applied Artificial Intelligence vol 25 no 9pp 769ndash787 2011

[14] A Rehman and T Saba ldquoOff-line cursive script recognitioncurrent advances comparisons and remaining problemsrdquo Arti-ficial Intelligence Review vol 37 no 4 pp 261ndash288 2012

[15] T Saba A Rehman and G Sulong ldquoImproved statisticalfeatures for cursive character recognitionrdquo International Journalof Innovative Computing Information and Control vol 7 no 9pp 5211ndash5224 2011

[16] T Helmy ldquoA computational model for context-based imagecategorization and descriptionrdquo International Journal of Imageand Graphics vol 12 no 1 Article ID 1250001 2012

[17] M Savva N Kong A Chhajta F-F Li M Agrawala and JHeer ldquoReVision automated classification analysis and redesignof chart imagesrdquo in Proceedings of the 24th Annual ACMSymposium on User Interface Software and Technology (UISTrsquo11) pp 393ndash402 Santa Barbara Calif USA October 2011

[18] S Elzer S Carberry I Zukerman D Chester N Green and SDemir ldquoA probabilistic framework for recognizing intention ininformation graphicsrdquo in Proceedings of the 19th InternationalJoint Conference on Artificial Intelligence (IJCAI rsquo05) pp 1042ndash1047 Edinburgh UK August 2005

[19] S Carberry S Elzer N Green K McCoy and D ChesterldquoUnderstanding information graphics a discourse-level prob-lemrdquo in Proceedings of the 4th SIGdial Workshop of DiscourseandDialogue (SIGDIAL rsquo03) pp 1ndash12 Sapporo Japan July 2003

[20] M S M Rahim A Rehman S Nirsquomatus F Kurniawan andT Saba ldquoRegion-based features extraction in ear biometricsrdquoInternational Journal of Academic Research vol 4 no 1 pp 37ndash42 2012

[21] N Yokokura and T Watanabe ldquoLayout-based approach forextracting constructive elements of bar-chartsrdquo in Graph-ics Recognition Algorithms and Systems K Tombre and AChhabra Eds vol 1389 pp 163ndash174 Springer Berlin Germany1998

The Scientific World Journal 11

[22] Y Zhou and C L Tan ldquoLearning-based scientific chart recog-nitionrdquo in Proceedings of the 4th IAPR International Workshopon Graphics Recognition (GREC rsquo01) pp 482ndash492 2001

[23] Y P Zhou and C L Tan ldquoHough technique for bar chartsdetection and recognition in document imagesrdquo in Proceedingof the International Conference on Image Processing (ICIP 00)vol 2 pp 605ndash608 Vancouver Canada September 2000

[24] W Huang C L Tan and W K Leow ldquoModel-based chartimage recognitionrdquo in Graphics Recognition Recent Advancesand Perspectives J Llados and Y-B Kwon Eds vol 3088 pp87ndash99 Springer Berlin Germany 2004

[25] A Mishchenko and N Vassilieva ldquoModel-based chart imageclassificationrdquo in Advances in Visual Computing G Bebis RBoyle B Parvin et al Eds vol 6939 of Lecture Notes inComputer Science pp 476ndash485 Springer Berlin Germany 2011

[26] W Huang and C L Tan ldquoA system for understanding imagedinfographics and its applicationsrdquo in Proceedings of the ACMSymposium on Document Engineering pp 9ndash18 ManitobaCanada August 2007

[27] M M Hassan and W Al Khatib ldquoSimilarity searching instatistical figures based on extracted meta datardquo in Proceedingsof the ComputerGraphics Imaging andVisualisation (CGIV 07)pp 329ndash334 Bangkok Thailand August 2007

[28] L Yang W Huang and C L Tan ldquoSemi-automatic groundtruth generation for chart image recognitionrdquo in Proceedings ofthe 7th international conference on Document Analysis SystemsNelson New Zealand 2006

[29] G D Tourassi ldquoJourney toward computer-aided diagnosis roleof image texture analysisrdquo Radiology vol 213 no 2 pp 317ndash3201999

[30] D A Chandy J S Johnson and S E Selvan ldquoTexture featureextraction using gray level statistical matrix for content-basedmammogram retrievalrdquoMultimedia Tools and Applications vol72 no 2 pp 2011ndash2024 2014

[31] F Han HWang B Song et al ldquoA new 3D texture feature basedcomputer-aided diagnosis approach to differentiate pulmonarynodulesrdquo in Medical Imaging Computer-Aided Diagnosis Pro-ceedings of SPIE San Diego Calif USA 2014

[32] F Han H Wang B Song et al ldquoEfficient 3D texture featureextraction from CT images for computer-aided diagnosis ofpulmonary nodulesrdquo in Proceedings of the SPIE Medical Imag-ing Computer-Aided Diagnosis vol 9035 pp 1ndash7 2014

[33] S M Eissen B Stein and M Kulig ldquoPlagiarism detectionwithout reference collectionsrdquo in Advances in Data AnalysisR Decker and H-J Lenz Eds pp 359ndash366 Springer BerlinGermany 2007

[34] J Kasprzak M Brandejs and M Kripac ldquoFinding plagiarismby evaluating document similaritiesrdquo in Proceedings of the 3rdWorkshop onUncovering Plagiarism Authorship and Social Soft-ware Misuse (PAN rsquo09) pp 24ndash28 Donostia Spain September2009

[35] C Basile D Benedetto E Caglioti G Cristadoro and MD Esposti ldquoA plagiarism detection procedure in three stepsselection matches and squaresrdquo Donostia Spain 2009

[36] Z A Hamed and S Z Mohd Hashim Hybrid particle swarmoptimization and black stork foraging for functional neural fuzzynetwork learning enhancement [UTMThesis] 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 5: Research Article Intelligent Bar Chart Plagiarism ...downloads.hindawi.com/journals/tswj/2014/612787.pdf · Research Article Intelligent Bar Chart Plagiarism Detection in Documents

The Scientific World Journal 5

taxonomy Some kinds of text plagiarism are also evaluated[33] The methods for detecting passages of text plagiarismfor documents without appropriate citations are also sug-gested Taxonomy is further extended to cover other typesof plagiarism [4] The taxonomy presented in various studiesmajorly demonstrates literal and intelligent plagiarism whereeach kind includes many patterns of plagiarism Howeverwe are interested in detecting plagiarism of charts as well astheir taxonomy figures and tables Alternatively charts (piebar and line) can be considered as one of the methods forrepresenting the data and information of experimental resultsor comparing among techniques which are copied from otherreferences without citation Therefore plagiarism of chartscan be formulated in several forms to manifest the sameinformation in various shapes Taxonomy of chart plagiarismdemonstrates many patterns and models which may be usedto plagiarize the data of chart image

Plagiarism patterns of chart figure and table are dividedinto Exact Copy and Modified Copy prototypes The ExactCopy patterns of plagiarism are defined as the direct quoteof data from other works without referencing where copyand paste of the whole or part of the information image isperformed Simplicity is one of the important attributes ofthis type of plagiarism Besides this type of plagiarism doesnot require much time to hide the academic crimeThe othertype of graphic plagiarism is the Modified Copy for infor-mation of chart figure and table This is more intelligentlyperformed than the previous one because the same data canbe formulated in many ways to exhibit the work in a differentstyle than the original oneThe goal of these intelligentmeansis that the plagiarist attempts to deceive the readers by doingsome changes such as translation from other languages orgenerating another shape for the same data

TheModifiedCopy plagiarisms are primarily divided intotranslation and restructuring In this research new types ofcopying are organized by taxonomy which explains variouspatterns of graphic plagiarism Furthermore the primaryfocus of the bar chart image is to detect the proportion ofplagarism Figures 1 2 and 3 depict the taxonomy of chartfigure and table plagiarism respectively

4 Methodology

Themethodology of bar chart plagiarism detection as shownin Figure 4 consists of three main stages namely planningand collection feature extraction and development withsystem evaluation In the planning and collection stagevarious patterns of graphical plagiarism via taxonomy ofchart figures and table are presented (Figures 1 2 and 3)Thetaxonomy of chart plagiarism explains different formulationswhich plagiarize the data of bar chart images Thereforevarieties of bar chart images are collected and data sets areconsidered The gatherings of the data sets consist of 100 barchart images for storing in databases and twenty images forquery including all possibilities of plagiarism for bar chartimages These data sets are collected from different resourcessuch as thesis which represented various types of bar chartimages in 2D and 3D Besides vertical and horizontal barchart images as shown in Figure 5 are taken into account

Planning and collection

Extract text features

Preprocessing

Extract numeric features

Detect plagiarism

System evaluation

Feat

ures

extr

actio

nD

evelo

pmen

t an

d ev

alua

tion

Figure 4 Flowchart of methodology

In the feature extraction stage the features of bar chartimages are used to detect plagiarism Various types of barchart images are analyzed to detect the features of imageThebar chart images inferred to acquiremaximum of ten featuresrepresenting the information and the data of image Thesefeatures are common in different types of bar chart imagesfor instance in 2D and 3D images However the number ofuses for these features may different from each other

The features extraction is an essential process to get thedata from images which can be utilized to detect the rate ofplagiarized data Therefore these ten features are categorizedinto low-level and high-level features The low-level featuresrefer to the text features of bar chart The text features are thetext which can be used in image to represent the informationand data such as caption of image label of each bar label ofcoordinates and values on coordinates Generally they areextracted from bar chart images using OCR tool Converselythe high-level features referring to numeric features cannot beextracted using OCR toolThe extraction of numeric featuresrequires a relationship between the text and graphic compo-nents The numeric features include values of bars in imageEach bar in an image has three numeric features which canbe extracted by the proposed technique depending on StartEnd and Exact values The Start and End values representthe first and last values while the Exact one corresponds tothe real value of the bar For instance the text features forimage in Figure 5 are ldquoFigure 10 Revision Design GalleriesrdquoldquoSilver Platinum Cashrdquo and ldquo0 5 40rdquo whichrepresent caption of image and label of coordinates X and Yrespectively Meanwhile the numeric features represent thevalue of each bar which can be detected by Start End andExact features For example the Start End and Exact featuresof bar Silver are 5 10 and 6 respectively

These features are used to detect the proportion of pla-giarism for bar chart imageThe extraction of Start End and

6 The Scientific World Journal

Cash

Platinum

BondsStocks

Gold

Silver

0 5 10 15 20 25 30 35 40Revision Design Galleries ()

(a)

MinMean

Std

Rosenbrock function on 100 dimensions BSFP-PSO-CUI BSFP-PSO-CAI

250E + 07

200E + 07

150E + 07

100E + 07

500E + 06

000E + 00

(b)

Figure 5 Samples of data set [17 36]

Exact values necessitates preprocessing of bar chart image tothe adjacent coordinates of the image Image scanning is thenperformed to detect the length of each bar in order to findthe numeric features for each bar Storing of the features indatabases depends on the type of features whether numericor text The features which are extracted by the proposedtechnique are represented as vectors while the text featuresthat are extracted by OCR are characterized as string

The detection methods for text plagiarism are mainlycategorized based on character semantics structure cita-tion cluster cross language and syntax Comparatively thesmaller number of textual components than normal para-graph text allows us to use character-basedmethods to detectplagiarism of bar chart imagesThe character-based methodsdepend on character matching approaches to exactly orpartially detect the identical string for features of bar chartimages Various algorithms of plagiarism are adopted in thetext as character 119899-gram to identify the similarity betweentwo strings based on the number of identical characters offeatures Some researchers use 8-gramand 5-gram techniques[34 35] for matching strings to detect plagiarismWe used 2-gram technique to detect plagiarism of bar chart imagesThistechnique is used to represent the text features of bar chartimages Different similarity measures can be used to obtainthe similarity for numeric features such as Euclidean distanceJaccard or cosine coefficient The Euclidean distance iscalculated by the following

Ec (119909 119910) =radic

sum

119894

1003816

1003816

1003816

1003816

119909

119894minus 119910

119894

1003816

1003816

1003816

1003816

2 (1)

Once the detection and storing of the proportion ofplagiarism are completed then the performance of the systemis evaluated The performance is evaluated by overlapping of

features using the relation of Precision andRecall given by thefollowing

Recall = Relavent Documents RetreivedAll Relavent Documents

Precision = Relavent Documents RetreivedAll Documents Retrived

(2)

5 Experimental Results

The bar chart plagiarism detections are carried out in fourmain stages such as submission of query images featureextraction of bar chart image plagiarism detection andhighlighting results The first stage is to submit varioustypes of query images covering different kinds of possibleplagiarism to detect and judge plagiarism of bar chart imageswhile the features of query are extracted in the second stageThe third stage includes detection of plagiarism by usingword 2-gram and Euclidean distance techniques Finally thefeatures of query bar chart image that are plagiarized fromothers are highlighted and the proportion of the similarity isdisplayed

The bar chart plagiarism is further divided into ExactCopy and Modified Copy as explained in taxonomyFigure 6(a) shows the query image while Figure 6(b)depicts the plagiarized images detected by the system Thefirst plagiarized image is similar to the whole data in theimage while the second plagiarized image contains the samedata that was plagiarized but presented as a horizontal barchart image The system extracts the features of query imageand detects the proportion of plagiarism depending on StartEnd and Exact values for each bar as well as the label of eachbar The system highlights the data and information that areplagiarized and provides the proportion of plagiarism

One of the patterns of plagiarism derived from ModifiedCopy is the stealing by changing scales Each bar is modified

The Scientific World Journal 7

0

10

20

30

40

Cash Bonds Stocks Gold SilverPlatinumFigure Revision Design Galleries

Query image

()

(a)

The rate of similarity is 100

Cash [30 40 32]Bonds [20 30 22]Stocks [20 30 22]Gold [0 10 10]Platinum [0 10 6]Silver [0 10 5]

Figure Revision Design Galleries

The rate of similarity is 100

Cash [30 40 32]Bonds [20 30 22]Stocks [20 30 22]

Gold [0 10 10]Platinum [0 10 6]

Silver [0 10 5]Figure Revision Design Galleries

Cash Bonds Stocks Gold SilverPlatinum

Figure Revision Design Galleries ()

0

10

20

30

40

()

Cash

Platinum

BondsStocks

Gold

Silver

0 5 10 15 20 25 30 35 40

The rate of similarity is 71

Cash [30 - 32]Bonds [20 - 22]Stocks [20 - 22]Gold [- - 10]

Platinum [- 10 6]Silver [- 10 5]

Figure Revision Design Galleries

The rate of similarity is 71Cash [30 - 32]Bonds [20 - 22]Stocks [20 - 22]Gold [- - 10]Platinum [- 10 6]Silver [- 10 5]

Figure Revision Design Galleries

Plagiarized image

Plagiarized image

Figure Revision Design Galleries

(b)

Figure 6 Plagiarism detection for one of Exact Copy patterns as plagiarism of the whole data of image

by plagiarists by changing Start and End values to be differentfrom the original image Figure 7 illustrates the significantrole played by the Exact values to detect this type of plagia-rism

The plagiarists may use integration among patterns ofpossible bar chart plagiarism to present a more compleximage which has the same data quoting from other worksFigure 8 displays the query image which is modified bychanging colours and scales of bars as well as changing theirlocation via swapping The proposed system is capable ofdetecting this type of plagiarism and identifies the proportionof similarity

Figure 9 illustrates the performance of the system for pla-giarism detection of Exact Copy andModified Copy patternsrespectively

6 Discussion

The state-of-the-art graphical plagiarism techniques andpatterns are presentedThe graphical plagiarism is consideredone of the electronic crimes and thefts and the concepts ofsuch stealing are newly viewed Various important informa-tion and data can be represented as graphical forms suchas results or frameworks for academic and business aspectsHowever many systems of text plagiarism methods such asTurnitin are incapable of detecting plagiarism of images Inspite of the different styles of bar chart images the extractionof features of image plays an important role in detectingplagiarism Our proposed technique which is used to extractthe numeric features played an essential role for bar chartplagiarism detection The patterns of Exact Copy of bar chart

8 The Scientific World Journal

Query image

TA fo

und

40

32

24

16

8

0CTN CEA PP CA TPST

Manual searchSystem search

Trust attributesComparison between manual search and system search

from UK e-commerce website

(a)

40

35

30

25

5

0CTN CEA PP CA TPST

20

15

10

TA fo

und

Plagiarized image

The rate of similarity is 53 CTN-manual search [- 40 36]CTN-system search [- - 35]

CEA-system search [- - 29]PP-manual search [- - 32]

CA-manual search [- - 32]

TPST-manual search [- - 8]TPST-system search [- - 6]

PP-system search [- - 32]

CA-system search [- - 32]

CEA-manual search [- - 29]

Comparison between manual search and system search from UK e-commerce website

Manual searchSystem search

Trust attributesComparison between manual search and system search

from UK e-commerce website

(b)

Figure 7 Plagiarism detection for one of the Modified Copy patterns which is the stealing by changing scales of image

plagiarism detection including direct copy of all data orpart of data are underscored Plagiarism which is carriedout by modifying caption of images via restructuring orsummarizing for label sentences is emphasized Alternativelythe patterns of Modified Copy which are regarded as morecomplicated than Exact Copy patterns are also analyzedThe difficulty of these patterns is the changing on imagewhich appears as the same data and information in differentformsThe restructuring of information for image within thesame shape is also covered The edition of bar chart imagesincluding the change of image bar colors or changing thebar locations either by swapping or via generating horizontalbars from vertical bars and vice versa is discussed in detail

Besides more professional modification such as changingof scales on coordinates which is completely different fromoriginal one can be detected by the proposed method Inthis case the Start and End features of bars are completelydifferent Consequently the Exact features as well as otherattributes play significant role in detecting plagiarism in barchart image

7 Conclusion

We demonstrate the precise recognition of different plagia-rized patterns in business documents using an intelligent barchart detection system The types and patterns of plagiarism

The Scientific World Journal 9

Query image

Orthoptera

Hymenoptera

Hemiptera

Lepidoptera

Other orders

Coleoptera

0 8 16 24 32 40 Example of bar chart Revision redesigns ()

(a)

Example Revision redesigns for input bar charts ()

Orthoptera

Hymenoptera

Hemiptera

Lepidoptera

Other orders

Coleoptera

0 10 20 30 40

The rate of similarity is 46

Orthoptera [- - 9]Hymenoptera [- - 11]Hemiptera [- - 11]Lepidoptera [0 - 7]Other orders [- - -]Coleoptera [- - -]

Plagiarized image

(b)

Figure 8 Plagiarism detection for integration among possible bar chart plagiarism

1

08

06

04

02

0

Prec

ision

1080604020Recall

(a)

1

08

06

04

02

0

Prec

ision

1080604020Recall

(b)

Figure 9 The evaluation of performance (a) for Exact Copy patterns and (b) forModified Copy patterns

10 The Scientific World Journal

are presented via taxonomy of figure chart and table Variouskinds of possible plagiarism are highlighted using taxonomyPlagiarism of bar chart image as type of chart is detectedby the newly proposed technique It is established that thepresent technique is capable of extracting the features from abar chart image which cannot be pulled out using OCR toolOur technique first recognizes the connection between thetext and graphical components to extract the Start End andExact value for each bar Using word 2-gram and Euclideandistance methods the accurate detection of plagiarism isperformed The detection of plagiarism is based on tenstriking features The system is capable of detecting differentlevels of plagiarism not only copy and paste of bar chartimage but also modification on images such as changingcolor or scales The present system efficiently and accuratelydistinguishes other possible alteration administered on theseimages such as swapping among bars location and evenchanges on caption via summarizing and restructuring Theproposed technique may be useful for intelligent plagiarismdetection in business and academic documents

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors extend their appreciation to the Deanship ofScientific Research at King Saud University for funding thiswork through Research Group no RGP-264 The authorsare also thankful to Ministry of Science and TechnologyInnovation (MOSTI) Malaysia and Research ManagementCenter (RMC) Universiti Teknologi Malaysia (UTM) JohorMalaysia for their technical support and expertise in con-ducting this research

References

[1] A M El Tahir Ali H M D Abdulla and V Snasel ldquoSurveyof plagiarism detection methodsrdquo in Proceedings of the 5th AsiaModelling Symposium (AMS rsquo05) pp 39ndash42 May 2011

[2] A Rehman and T Saba ldquoFeatures extraction for soccer videosemantic analysis current achievements and remaining issuesrdquoArtificial Intelligence Review vol 41 no 3 pp 451ndash461 2014

[3] A Rehman and T Saba ldquoEvaluation of artificial intelligent tech-niques to secure information in enterprisesrdquo Artificial Intelli-gence Review pp 1ndash16 2012

[4] S M Alzahrani N Salim and A Abraham ldquoUnderstandingplagiarism linguistic patterns textual features and detectionmethodsrdquo IEEE Transactions on Systems Man and CyberneticsC Applications and Reviews vol 42 no 2 pp 133ndash149 2012

[5] T Saba and A Altameem ldquoAnalysis of vision based systems todetect real time goal events in soccer videosrdquo Applied ArtificialIntelligence vol 27 no 7 pp 656ndash667 2013

[6] K J Ottenstein ldquoAn algorithmic approach to the detection andprevention of plagiarismrdquo SIGCSE Bulletin vol 8 pp 30ndash411976

[7] A Parker and J O Hamblen ldquoComputer algorithms for plagia-rismdetectionrdquo IEEETransactions on Education vol 32 pp 94ndash99 1989

[8] J P Bao J Y Shen X D Liu andQ B Song ldquoSurvey on naturallanguage text copy detectionrdquo Journal of Software vol 14 no 10pp 1753ndash1760 2003

[9] T Saba and A Rehman Machine Learning and Script Recogni-tion Lambert Academic Publisher 2012

[10] W Huang S Zong and C L Tan ldquoChart image classificationusing multiple-instance learningrdquo in Proceedings of the IEEEWorkshop on Applications of Computer Vision (WACV 07) p27 Austin Tex USA February 2007

[11] T Saba A Rehman and M Elarbi-Boudihir ldquoMethods andstrategies on off-line cursive touched characters segmentationa directional reviewrdquo Artificial Intelligence Review 2011

[12] A Rehman D Mohammad G Sulong and T Saba ldquoSimpleand effective techniques for core-region detection and slantcorrection in offline script recognitionrdquo in Proceedings of theIEEE International Conference on Signal and Image ProcessingApplications (ICSIPA rsquo09) pp 15ndash20 Kuala Lumpur MalaysiaNovember 2009

[13] A Rehman and T Saba ldquoDocument skew estimation andcorrection analysis of techniques common problems andpossible solutionsrdquo Applied Artificial Intelligence vol 25 no 9pp 769ndash787 2011

[14] A Rehman and T Saba ldquoOff-line cursive script recognitioncurrent advances comparisons and remaining problemsrdquo Arti-ficial Intelligence Review vol 37 no 4 pp 261ndash288 2012

[15] T Saba A Rehman and G Sulong ldquoImproved statisticalfeatures for cursive character recognitionrdquo International Journalof Innovative Computing Information and Control vol 7 no 9pp 5211ndash5224 2011

[16] T Helmy ldquoA computational model for context-based imagecategorization and descriptionrdquo International Journal of Imageand Graphics vol 12 no 1 Article ID 1250001 2012

[17] M Savva N Kong A Chhajta F-F Li M Agrawala and JHeer ldquoReVision automated classification analysis and redesignof chart imagesrdquo in Proceedings of the 24th Annual ACMSymposium on User Interface Software and Technology (UISTrsquo11) pp 393ndash402 Santa Barbara Calif USA October 2011

[18] S Elzer S Carberry I Zukerman D Chester N Green and SDemir ldquoA probabilistic framework for recognizing intention ininformation graphicsrdquo in Proceedings of the 19th InternationalJoint Conference on Artificial Intelligence (IJCAI rsquo05) pp 1042ndash1047 Edinburgh UK August 2005

[19] S Carberry S Elzer N Green K McCoy and D ChesterldquoUnderstanding information graphics a discourse-level prob-lemrdquo in Proceedings of the 4th SIGdial Workshop of DiscourseandDialogue (SIGDIAL rsquo03) pp 1ndash12 Sapporo Japan July 2003

[20] M S M Rahim A Rehman S Nirsquomatus F Kurniawan andT Saba ldquoRegion-based features extraction in ear biometricsrdquoInternational Journal of Academic Research vol 4 no 1 pp 37ndash42 2012

[21] N Yokokura and T Watanabe ldquoLayout-based approach forextracting constructive elements of bar-chartsrdquo in Graph-ics Recognition Algorithms and Systems K Tombre and AChhabra Eds vol 1389 pp 163ndash174 Springer Berlin Germany1998

The Scientific World Journal 11

[22] Y Zhou and C L Tan ldquoLearning-based scientific chart recog-nitionrdquo in Proceedings of the 4th IAPR International Workshopon Graphics Recognition (GREC rsquo01) pp 482ndash492 2001

[23] Y P Zhou and C L Tan ldquoHough technique for bar chartsdetection and recognition in document imagesrdquo in Proceedingof the International Conference on Image Processing (ICIP 00)vol 2 pp 605ndash608 Vancouver Canada September 2000

[24] W Huang C L Tan and W K Leow ldquoModel-based chartimage recognitionrdquo in Graphics Recognition Recent Advancesand Perspectives J Llados and Y-B Kwon Eds vol 3088 pp87ndash99 Springer Berlin Germany 2004

[25] A Mishchenko and N Vassilieva ldquoModel-based chart imageclassificationrdquo in Advances in Visual Computing G Bebis RBoyle B Parvin et al Eds vol 6939 of Lecture Notes inComputer Science pp 476ndash485 Springer Berlin Germany 2011

[26] W Huang and C L Tan ldquoA system for understanding imagedinfographics and its applicationsrdquo in Proceedings of the ACMSymposium on Document Engineering pp 9ndash18 ManitobaCanada August 2007

[27] M M Hassan and W Al Khatib ldquoSimilarity searching instatistical figures based on extracted meta datardquo in Proceedingsof the ComputerGraphics Imaging andVisualisation (CGIV 07)pp 329ndash334 Bangkok Thailand August 2007

[28] L Yang W Huang and C L Tan ldquoSemi-automatic groundtruth generation for chart image recognitionrdquo in Proceedings ofthe 7th international conference on Document Analysis SystemsNelson New Zealand 2006

[29] G D Tourassi ldquoJourney toward computer-aided diagnosis roleof image texture analysisrdquo Radiology vol 213 no 2 pp 317ndash3201999

[30] D A Chandy J S Johnson and S E Selvan ldquoTexture featureextraction using gray level statistical matrix for content-basedmammogram retrievalrdquoMultimedia Tools and Applications vol72 no 2 pp 2011ndash2024 2014

[31] F Han HWang B Song et al ldquoA new 3D texture feature basedcomputer-aided diagnosis approach to differentiate pulmonarynodulesrdquo in Medical Imaging Computer-Aided Diagnosis Pro-ceedings of SPIE San Diego Calif USA 2014

[32] F Han H Wang B Song et al ldquoEfficient 3D texture featureextraction from CT images for computer-aided diagnosis ofpulmonary nodulesrdquo in Proceedings of the SPIE Medical Imag-ing Computer-Aided Diagnosis vol 9035 pp 1ndash7 2014

[33] S M Eissen B Stein and M Kulig ldquoPlagiarism detectionwithout reference collectionsrdquo in Advances in Data AnalysisR Decker and H-J Lenz Eds pp 359ndash366 Springer BerlinGermany 2007

[34] J Kasprzak M Brandejs and M Kripac ldquoFinding plagiarismby evaluating document similaritiesrdquo in Proceedings of the 3rdWorkshop onUncovering Plagiarism Authorship and Social Soft-ware Misuse (PAN rsquo09) pp 24ndash28 Donostia Spain September2009

[35] C Basile D Benedetto E Caglioti G Cristadoro and MD Esposti ldquoA plagiarism detection procedure in three stepsselection matches and squaresrdquo Donostia Spain 2009

[36] Z A Hamed and S Z Mohd Hashim Hybrid particle swarmoptimization and black stork foraging for functional neural fuzzynetwork learning enhancement [UTMThesis] 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 6: Research Article Intelligent Bar Chart Plagiarism ...downloads.hindawi.com/journals/tswj/2014/612787.pdf · Research Article Intelligent Bar Chart Plagiarism Detection in Documents

6 The Scientific World Journal

Cash

Platinum

BondsStocks

Gold

Silver

0 5 10 15 20 25 30 35 40Revision Design Galleries ()

(a)

MinMean

Std

Rosenbrock function on 100 dimensions BSFP-PSO-CUI BSFP-PSO-CAI

250E + 07

200E + 07

150E + 07

100E + 07

500E + 06

000E + 00

(b)

Figure 5 Samples of data set [17 36]

Exact values necessitates preprocessing of bar chart image tothe adjacent coordinates of the image Image scanning is thenperformed to detect the length of each bar in order to findthe numeric features for each bar Storing of the features indatabases depends on the type of features whether numericor text The features which are extracted by the proposedtechnique are represented as vectors while the text featuresthat are extracted by OCR are characterized as string

The detection methods for text plagiarism are mainlycategorized based on character semantics structure cita-tion cluster cross language and syntax Comparatively thesmaller number of textual components than normal para-graph text allows us to use character-basedmethods to detectplagiarism of bar chart imagesThe character-based methodsdepend on character matching approaches to exactly orpartially detect the identical string for features of bar chartimages Various algorithms of plagiarism are adopted in thetext as character 119899-gram to identify the similarity betweentwo strings based on the number of identical characters offeatures Some researchers use 8-gramand 5-gram techniques[34 35] for matching strings to detect plagiarismWe used 2-gram technique to detect plagiarism of bar chart imagesThistechnique is used to represent the text features of bar chartimages Different similarity measures can be used to obtainthe similarity for numeric features such as Euclidean distanceJaccard or cosine coefficient The Euclidean distance iscalculated by the following

Ec (119909 119910) =radic

sum

119894

1003816

1003816

1003816

1003816

119909

119894minus 119910

119894

1003816

1003816

1003816

1003816

2 (1)

Once the detection and storing of the proportion ofplagiarism are completed then the performance of the systemis evaluated The performance is evaluated by overlapping of

features using the relation of Precision andRecall given by thefollowing

Recall = Relavent Documents RetreivedAll Relavent Documents

Precision = Relavent Documents RetreivedAll Documents Retrived

(2)

5 Experimental Results

The bar chart plagiarism detections are carried out in fourmain stages such as submission of query images featureextraction of bar chart image plagiarism detection andhighlighting results The first stage is to submit varioustypes of query images covering different kinds of possibleplagiarism to detect and judge plagiarism of bar chart imageswhile the features of query are extracted in the second stageThe third stage includes detection of plagiarism by usingword 2-gram and Euclidean distance techniques Finally thefeatures of query bar chart image that are plagiarized fromothers are highlighted and the proportion of the similarity isdisplayed

The bar chart plagiarism is further divided into ExactCopy and Modified Copy as explained in taxonomyFigure 6(a) shows the query image while Figure 6(b)depicts the plagiarized images detected by the system Thefirst plagiarized image is similar to the whole data in theimage while the second plagiarized image contains the samedata that was plagiarized but presented as a horizontal barchart image The system extracts the features of query imageand detects the proportion of plagiarism depending on StartEnd and Exact values for each bar as well as the label of eachbar The system highlights the data and information that areplagiarized and provides the proportion of plagiarism

One of the patterns of plagiarism derived from ModifiedCopy is the stealing by changing scales Each bar is modified

The Scientific World Journal 7

0

10

20

30

40

Cash Bonds Stocks Gold SilverPlatinumFigure Revision Design Galleries

Query image

()

(a)

The rate of similarity is 100

Cash [30 40 32]Bonds [20 30 22]Stocks [20 30 22]Gold [0 10 10]Platinum [0 10 6]Silver [0 10 5]

Figure Revision Design Galleries

The rate of similarity is 100

Cash [30 40 32]Bonds [20 30 22]Stocks [20 30 22]

Gold [0 10 10]Platinum [0 10 6]

Silver [0 10 5]Figure Revision Design Galleries

Cash Bonds Stocks Gold SilverPlatinum

Figure Revision Design Galleries ()

0

10

20

30

40

()

Cash

Platinum

BondsStocks

Gold

Silver

0 5 10 15 20 25 30 35 40

The rate of similarity is 71

Cash [30 - 32]Bonds [20 - 22]Stocks [20 - 22]Gold [- - 10]

Platinum [- 10 6]Silver [- 10 5]

Figure Revision Design Galleries

The rate of similarity is 71Cash [30 - 32]Bonds [20 - 22]Stocks [20 - 22]Gold [- - 10]Platinum [- 10 6]Silver [- 10 5]

Figure Revision Design Galleries

Plagiarized image

Plagiarized image

Figure Revision Design Galleries

(b)

Figure 6 Plagiarism detection for one of Exact Copy patterns as plagiarism of the whole data of image

by plagiarists by changing Start and End values to be differentfrom the original image Figure 7 illustrates the significantrole played by the Exact values to detect this type of plagia-rism

The plagiarists may use integration among patterns ofpossible bar chart plagiarism to present a more compleximage which has the same data quoting from other worksFigure 8 displays the query image which is modified bychanging colours and scales of bars as well as changing theirlocation via swapping The proposed system is capable ofdetecting this type of plagiarism and identifies the proportionof similarity

Figure 9 illustrates the performance of the system for pla-giarism detection of Exact Copy andModified Copy patternsrespectively

6 Discussion

The state-of-the-art graphical plagiarism techniques andpatterns are presentedThe graphical plagiarism is consideredone of the electronic crimes and thefts and the concepts ofsuch stealing are newly viewed Various important informa-tion and data can be represented as graphical forms suchas results or frameworks for academic and business aspectsHowever many systems of text plagiarism methods such asTurnitin are incapable of detecting plagiarism of images Inspite of the different styles of bar chart images the extractionof features of image plays an important role in detectingplagiarism Our proposed technique which is used to extractthe numeric features played an essential role for bar chartplagiarism detection The patterns of Exact Copy of bar chart

8 The Scientific World Journal

Query image

TA fo

und

40

32

24

16

8

0CTN CEA PP CA TPST

Manual searchSystem search

Trust attributesComparison between manual search and system search

from UK e-commerce website

(a)

40

35

30

25

5

0CTN CEA PP CA TPST

20

15

10

TA fo

und

Plagiarized image

The rate of similarity is 53 CTN-manual search [- 40 36]CTN-system search [- - 35]

CEA-system search [- - 29]PP-manual search [- - 32]

CA-manual search [- - 32]

TPST-manual search [- - 8]TPST-system search [- - 6]

PP-system search [- - 32]

CA-system search [- - 32]

CEA-manual search [- - 29]

Comparison between manual search and system search from UK e-commerce website

Manual searchSystem search

Trust attributesComparison between manual search and system search

from UK e-commerce website

(b)

Figure 7 Plagiarism detection for one of the Modified Copy patterns which is the stealing by changing scales of image

plagiarism detection including direct copy of all data orpart of data are underscored Plagiarism which is carriedout by modifying caption of images via restructuring orsummarizing for label sentences is emphasized Alternativelythe patterns of Modified Copy which are regarded as morecomplicated than Exact Copy patterns are also analyzedThe difficulty of these patterns is the changing on imagewhich appears as the same data and information in differentformsThe restructuring of information for image within thesame shape is also covered The edition of bar chart imagesincluding the change of image bar colors or changing thebar locations either by swapping or via generating horizontalbars from vertical bars and vice versa is discussed in detail

Besides more professional modification such as changingof scales on coordinates which is completely different fromoriginal one can be detected by the proposed method Inthis case the Start and End features of bars are completelydifferent Consequently the Exact features as well as otherattributes play significant role in detecting plagiarism in barchart image

7 Conclusion

We demonstrate the precise recognition of different plagia-rized patterns in business documents using an intelligent barchart detection system The types and patterns of plagiarism

The Scientific World Journal 9

Query image

Orthoptera

Hymenoptera

Hemiptera

Lepidoptera

Other orders

Coleoptera

0 8 16 24 32 40 Example of bar chart Revision redesigns ()

(a)

Example Revision redesigns for input bar charts ()

Orthoptera

Hymenoptera

Hemiptera

Lepidoptera

Other orders

Coleoptera

0 10 20 30 40

The rate of similarity is 46

Orthoptera [- - 9]Hymenoptera [- - 11]Hemiptera [- - 11]Lepidoptera [0 - 7]Other orders [- - -]Coleoptera [- - -]

Plagiarized image

(b)

Figure 8 Plagiarism detection for integration among possible bar chart plagiarism

1

08

06

04

02

0

Prec

ision

1080604020Recall

(a)

1

08

06

04

02

0

Prec

ision

1080604020Recall

(b)

Figure 9 The evaluation of performance (a) for Exact Copy patterns and (b) forModified Copy patterns

10 The Scientific World Journal

are presented via taxonomy of figure chart and table Variouskinds of possible plagiarism are highlighted using taxonomyPlagiarism of bar chart image as type of chart is detectedby the newly proposed technique It is established that thepresent technique is capable of extracting the features from abar chart image which cannot be pulled out using OCR toolOur technique first recognizes the connection between thetext and graphical components to extract the Start End andExact value for each bar Using word 2-gram and Euclideandistance methods the accurate detection of plagiarism isperformed The detection of plagiarism is based on tenstriking features The system is capable of detecting differentlevels of plagiarism not only copy and paste of bar chartimage but also modification on images such as changingcolor or scales The present system efficiently and accuratelydistinguishes other possible alteration administered on theseimages such as swapping among bars location and evenchanges on caption via summarizing and restructuring Theproposed technique may be useful for intelligent plagiarismdetection in business and academic documents

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors extend their appreciation to the Deanship ofScientific Research at King Saud University for funding thiswork through Research Group no RGP-264 The authorsare also thankful to Ministry of Science and TechnologyInnovation (MOSTI) Malaysia and Research ManagementCenter (RMC) Universiti Teknologi Malaysia (UTM) JohorMalaysia for their technical support and expertise in con-ducting this research

References

[1] A M El Tahir Ali H M D Abdulla and V Snasel ldquoSurveyof plagiarism detection methodsrdquo in Proceedings of the 5th AsiaModelling Symposium (AMS rsquo05) pp 39ndash42 May 2011

[2] A Rehman and T Saba ldquoFeatures extraction for soccer videosemantic analysis current achievements and remaining issuesrdquoArtificial Intelligence Review vol 41 no 3 pp 451ndash461 2014

[3] A Rehman and T Saba ldquoEvaluation of artificial intelligent tech-niques to secure information in enterprisesrdquo Artificial Intelli-gence Review pp 1ndash16 2012

[4] S M Alzahrani N Salim and A Abraham ldquoUnderstandingplagiarism linguistic patterns textual features and detectionmethodsrdquo IEEE Transactions on Systems Man and CyberneticsC Applications and Reviews vol 42 no 2 pp 133ndash149 2012

[5] T Saba and A Altameem ldquoAnalysis of vision based systems todetect real time goal events in soccer videosrdquo Applied ArtificialIntelligence vol 27 no 7 pp 656ndash667 2013

[6] K J Ottenstein ldquoAn algorithmic approach to the detection andprevention of plagiarismrdquo SIGCSE Bulletin vol 8 pp 30ndash411976

[7] A Parker and J O Hamblen ldquoComputer algorithms for plagia-rismdetectionrdquo IEEETransactions on Education vol 32 pp 94ndash99 1989

[8] J P Bao J Y Shen X D Liu andQ B Song ldquoSurvey on naturallanguage text copy detectionrdquo Journal of Software vol 14 no 10pp 1753ndash1760 2003

[9] T Saba and A Rehman Machine Learning and Script Recogni-tion Lambert Academic Publisher 2012

[10] W Huang S Zong and C L Tan ldquoChart image classificationusing multiple-instance learningrdquo in Proceedings of the IEEEWorkshop on Applications of Computer Vision (WACV 07) p27 Austin Tex USA February 2007

[11] T Saba A Rehman and M Elarbi-Boudihir ldquoMethods andstrategies on off-line cursive touched characters segmentationa directional reviewrdquo Artificial Intelligence Review 2011

[12] A Rehman D Mohammad G Sulong and T Saba ldquoSimpleand effective techniques for core-region detection and slantcorrection in offline script recognitionrdquo in Proceedings of theIEEE International Conference on Signal and Image ProcessingApplications (ICSIPA rsquo09) pp 15ndash20 Kuala Lumpur MalaysiaNovember 2009

[13] A Rehman and T Saba ldquoDocument skew estimation andcorrection analysis of techniques common problems andpossible solutionsrdquo Applied Artificial Intelligence vol 25 no 9pp 769ndash787 2011

[14] A Rehman and T Saba ldquoOff-line cursive script recognitioncurrent advances comparisons and remaining problemsrdquo Arti-ficial Intelligence Review vol 37 no 4 pp 261ndash288 2012

[15] T Saba A Rehman and G Sulong ldquoImproved statisticalfeatures for cursive character recognitionrdquo International Journalof Innovative Computing Information and Control vol 7 no 9pp 5211ndash5224 2011

[16] T Helmy ldquoA computational model for context-based imagecategorization and descriptionrdquo International Journal of Imageand Graphics vol 12 no 1 Article ID 1250001 2012

[17] M Savva N Kong A Chhajta F-F Li M Agrawala and JHeer ldquoReVision automated classification analysis and redesignof chart imagesrdquo in Proceedings of the 24th Annual ACMSymposium on User Interface Software and Technology (UISTrsquo11) pp 393ndash402 Santa Barbara Calif USA October 2011

[18] S Elzer S Carberry I Zukerman D Chester N Green and SDemir ldquoA probabilistic framework for recognizing intention ininformation graphicsrdquo in Proceedings of the 19th InternationalJoint Conference on Artificial Intelligence (IJCAI rsquo05) pp 1042ndash1047 Edinburgh UK August 2005

[19] S Carberry S Elzer N Green K McCoy and D ChesterldquoUnderstanding information graphics a discourse-level prob-lemrdquo in Proceedings of the 4th SIGdial Workshop of DiscourseandDialogue (SIGDIAL rsquo03) pp 1ndash12 Sapporo Japan July 2003

[20] M S M Rahim A Rehman S Nirsquomatus F Kurniawan andT Saba ldquoRegion-based features extraction in ear biometricsrdquoInternational Journal of Academic Research vol 4 no 1 pp 37ndash42 2012

[21] N Yokokura and T Watanabe ldquoLayout-based approach forextracting constructive elements of bar-chartsrdquo in Graph-ics Recognition Algorithms and Systems K Tombre and AChhabra Eds vol 1389 pp 163ndash174 Springer Berlin Germany1998

The Scientific World Journal 11

[22] Y Zhou and C L Tan ldquoLearning-based scientific chart recog-nitionrdquo in Proceedings of the 4th IAPR International Workshopon Graphics Recognition (GREC rsquo01) pp 482ndash492 2001

[23] Y P Zhou and C L Tan ldquoHough technique for bar chartsdetection and recognition in document imagesrdquo in Proceedingof the International Conference on Image Processing (ICIP 00)vol 2 pp 605ndash608 Vancouver Canada September 2000

[24] W Huang C L Tan and W K Leow ldquoModel-based chartimage recognitionrdquo in Graphics Recognition Recent Advancesand Perspectives J Llados and Y-B Kwon Eds vol 3088 pp87ndash99 Springer Berlin Germany 2004

[25] A Mishchenko and N Vassilieva ldquoModel-based chart imageclassificationrdquo in Advances in Visual Computing G Bebis RBoyle B Parvin et al Eds vol 6939 of Lecture Notes inComputer Science pp 476ndash485 Springer Berlin Germany 2011

[26] W Huang and C L Tan ldquoA system for understanding imagedinfographics and its applicationsrdquo in Proceedings of the ACMSymposium on Document Engineering pp 9ndash18 ManitobaCanada August 2007

[27] M M Hassan and W Al Khatib ldquoSimilarity searching instatistical figures based on extracted meta datardquo in Proceedingsof the ComputerGraphics Imaging andVisualisation (CGIV 07)pp 329ndash334 Bangkok Thailand August 2007

[28] L Yang W Huang and C L Tan ldquoSemi-automatic groundtruth generation for chart image recognitionrdquo in Proceedings ofthe 7th international conference on Document Analysis SystemsNelson New Zealand 2006

[29] G D Tourassi ldquoJourney toward computer-aided diagnosis roleof image texture analysisrdquo Radiology vol 213 no 2 pp 317ndash3201999

[30] D A Chandy J S Johnson and S E Selvan ldquoTexture featureextraction using gray level statistical matrix for content-basedmammogram retrievalrdquoMultimedia Tools and Applications vol72 no 2 pp 2011ndash2024 2014

[31] F Han HWang B Song et al ldquoA new 3D texture feature basedcomputer-aided diagnosis approach to differentiate pulmonarynodulesrdquo in Medical Imaging Computer-Aided Diagnosis Pro-ceedings of SPIE San Diego Calif USA 2014

[32] F Han H Wang B Song et al ldquoEfficient 3D texture featureextraction from CT images for computer-aided diagnosis ofpulmonary nodulesrdquo in Proceedings of the SPIE Medical Imag-ing Computer-Aided Diagnosis vol 9035 pp 1ndash7 2014

[33] S M Eissen B Stein and M Kulig ldquoPlagiarism detectionwithout reference collectionsrdquo in Advances in Data AnalysisR Decker and H-J Lenz Eds pp 359ndash366 Springer BerlinGermany 2007

[34] J Kasprzak M Brandejs and M Kripac ldquoFinding plagiarismby evaluating document similaritiesrdquo in Proceedings of the 3rdWorkshop onUncovering Plagiarism Authorship and Social Soft-ware Misuse (PAN rsquo09) pp 24ndash28 Donostia Spain September2009

[35] C Basile D Benedetto E Caglioti G Cristadoro and MD Esposti ldquoA plagiarism detection procedure in three stepsselection matches and squaresrdquo Donostia Spain 2009

[36] Z A Hamed and S Z Mohd Hashim Hybrid particle swarmoptimization and black stork foraging for functional neural fuzzynetwork learning enhancement [UTMThesis] 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 7: Research Article Intelligent Bar Chart Plagiarism ...downloads.hindawi.com/journals/tswj/2014/612787.pdf · Research Article Intelligent Bar Chart Plagiarism Detection in Documents

The Scientific World Journal 7

0

10

20

30

40

Cash Bonds Stocks Gold SilverPlatinumFigure Revision Design Galleries

Query image

()

(a)

The rate of similarity is 100

Cash [30 40 32]Bonds [20 30 22]Stocks [20 30 22]Gold [0 10 10]Platinum [0 10 6]Silver [0 10 5]

Figure Revision Design Galleries

The rate of similarity is 100

Cash [30 40 32]Bonds [20 30 22]Stocks [20 30 22]

Gold [0 10 10]Platinum [0 10 6]

Silver [0 10 5]Figure Revision Design Galleries

Cash Bonds Stocks Gold SilverPlatinum

Figure Revision Design Galleries ()

0

10

20

30

40

()

Cash

Platinum

BondsStocks

Gold

Silver

0 5 10 15 20 25 30 35 40

The rate of similarity is 71

Cash [30 - 32]Bonds [20 - 22]Stocks [20 - 22]Gold [- - 10]

Platinum [- 10 6]Silver [- 10 5]

Figure Revision Design Galleries

The rate of similarity is 71Cash [30 - 32]Bonds [20 - 22]Stocks [20 - 22]Gold [- - 10]Platinum [- 10 6]Silver [- 10 5]

Figure Revision Design Galleries

Plagiarized image

Plagiarized image

Figure Revision Design Galleries

(b)

Figure 6 Plagiarism detection for one of Exact Copy patterns as plagiarism of the whole data of image

by plagiarists by changing Start and End values to be differentfrom the original image Figure 7 illustrates the significantrole played by the Exact values to detect this type of plagia-rism

The plagiarists may use integration among patterns ofpossible bar chart plagiarism to present a more compleximage which has the same data quoting from other worksFigure 8 displays the query image which is modified bychanging colours and scales of bars as well as changing theirlocation via swapping The proposed system is capable ofdetecting this type of plagiarism and identifies the proportionof similarity

Figure 9 illustrates the performance of the system for pla-giarism detection of Exact Copy andModified Copy patternsrespectively

6 Discussion

The state-of-the-art graphical plagiarism techniques andpatterns are presentedThe graphical plagiarism is consideredone of the electronic crimes and thefts and the concepts ofsuch stealing are newly viewed Various important informa-tion and data can be represented as graphical forms suchas results or frameworks for academic and business aspectsHowever many systems of text plagiarism methods such asTurnitin are incapable of detecting plagiarism of images Inspite of the different styles of bar chart images the extractionof features of image plays an important role in detectingplagiarism Our proposed technique which is used to extractthe numeric features played an essential role for bar chartplagiarism detection The patterns of Exact Copy of bar chart

8 The Scientific World Journal

Query image

TA fo

und

40

32

24

16

8

0CTN CEA PP CA TPST

Manual searchSystem search

Trust attributesComparison between manual search and system search

from UK e-commerce website

(a)

40

35

30

25

5

0CTN CEA PP CA TPST

20

15

10

TA fo

und

Plagiarized image

The rate of similarity is 53 CTN-manual search [- 40 36]CTN-system search [- - 35]

CEA-system search [- - 29]PP-manual search [- - 32]

CA-manual search [- - 32]

TPST-manual search [- - 8]TPST-system search [- - 6]

PP-system search [- - 32]

CA-system search [- - 32]

CEA-manual search [- - 29]

Comparison between manual search and system search from UK e-commerce website

Manual searchSystem search

Trust attributesComparison between manual search and system search

from UK e-commerce website

(b)

Figure 7 Plagiarism detection for one of the Modified Copy patterns which is the stealing by changing scales of image

plagiarism detection including direct copy of all data orpart of data are underscored Plagiarism which is carriedout by modifying caption of images via restructuring orsummarizing for label sentences is emphasized Alternativelythe patterns of Modified Copy which are regarded as morecomplicated than Exact Copy patterns are also analyzedThe difficulty of these patterns is the changing on imagewhich appears as the same data and information in differentformsThe restructuring of information for image within thesame shape is also covered The edition of bar chart imagesincluding the change of image bar colors or changing thebar locations either by swapping or via generating horizontalbars from vertical bars and vice versa is discussed in detail

Besides more professional modification such as changingof scales on coordinates which is completely different fromoriginal one can be detected by the proposed method Inthis case the Start and End features of bars are completelydifferent Consequently the Exact features as well as otherattributes play significant role in detecting plagiarism in barchart image

7 Conclusion

We demonstrate the precise recognition of different plagia-rized patterns in business documents using an intelligent barchart detection system The types and patterns of plagiarism

The Scientific World Journal 9

Query image

Orthoptera

Hymenoptera

Hemiptera

Lepidoptera

Other orders

Coleoptera

0 8 16 24 32 40 Example of bar chart Revision redesigns ()

(a)

Example Revision redesigns for input bar charts ()

Orthoptera

Hymenoptera

Hemiptera

Lepidoptera

Other orders

Coleoptera

0 10 20 30 40

The rate of similarity is 46

Orthoptera [- - 9]Hymenoptera [- - 11]Hemiptera [- - 11]Lepidoptera [0 - 7]Other orders [- - -]Coleoptera [- - -]

Plagiarized image

(b)

Figure 8 Plagiarism detection for integration among possible bar chart plagiarism

1

08

06

04

02

0

Prec

ision

1080604020Recall

(a)

1

08

06

04

02

0

Prec

ision

1080604020Recall

(b)

Figure 9 The evaluation of performance (a) for Exact Copy patterns and (b) forModified Copy patterns

10 The Scientific World Journal

are presented via taxonomy of figure chart and table Variouskinds of possible plagiarism are highlighted using taxonomyPlagiarism of bar chart image as type of chart is detectedby the newly proposed technique It is established that thepresent technique is capable of extracting the features from abar chart image which cannot be pulled out using OCR toolOur technique first recognizes the connection between thetext and graphical components to extract the Start End andExact value for each bar Using word 2-gram and Euclideandistance methods the accurate detection of plagiarism isperformed The detection of plagiarism is based on tenstriking features The system is capable of detecting differentlevels of plagiarism not only copy and paste of bar chartimage but also modification on images such as changingcolor or scales The present system efficiently and accuratelydistinguishes other possible alteration administered on theseimages such as swapping among bars location and evenchanges on caption via summarizing and restructuring Theproposed technique may be useful for intelligent plagiarismdetection in business and academic documents

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors extend their appreciation to the Deanship ofScientific Research at King Saud University for funding thiswork through Research Group no RGP-264 The authorsare also thankful to Ministry of Science and TechnologyInnovation (MOSTI) Malaysia and Research ManagementCenter (RMC) Universiti Teknologi Malaysia (UTM) JohorMalaysia for their technical support and expertise in con-ducting this research

References

[1] A M El Tahir Ali H M D Abdulla and V Snasel ldquoSurveyof plagiarism detection methodsrdquo in Proceedings of the 5th AsiaModelling Symposium (AMS rsquo05) pp 39ndash42 May 2011

[2] A Rehman and T Saba ldquoFeatures extraction for soccer videosemantic analysis current achievements and remaining issuesrdquoArtificial Intelligence Review vol 41 no 3 pp 451ndash461 2014

[3] A Rehman and T Saba ldquoEvaluation of artificial intelligent tech-niques to secure information in enterprisesrdquo Artificial Intelli-gence Review pp 1ndash16 2012

[4] S M Alzahrani N Salim and A Abraham ldquoUnderstandingplagiarism linguistic patterns textual features and detectionmethodsrdquo IEEE Transactions on Systems Man and CyberneticsC Applications and Reviews vol 42 no 2 pp 133ndash149 2012

[5] T Saba and A Altameem ldquoAnalysis of vision based systems todetect real time goal events in soccer videosrdquo Applied ArtificialIntelligence vol 27 no 7 pp 656ndash667 2013

[6] K J Ottenstein ldquoAn algorithmic approach to the detection andprevention of plagiarismrdquo SIGCSE Bulletin vol 8 pp 30ndash411976

[7] A Parker and J O Hamblen ldquoComputer algorithms for plagia-rismdetectionrdquo IEEETransactions on Education vol 32 pp 94ndash99 1989

[8] J P Bao J Y Shen X D Liu andQ B Song ldquoSurvey on naturallanguage text copy detectionrdquo Journal of Software vol 14 no 10pp 1753ndash1760 2003

[9] T Saba and A Rehman Machine Learning and Script Recogni-tion Lambert Academic Publisher 2012

[10] W Huang S Zong and C L Tan ldquoChart image classificationusing multiple-instance learningrdquo in Proceedings of the IEEEWorkshop on Applications of Computer Vision (WACV 07) p27 Austin Tex USA February 2007

[11] T Saba A Rehman and M Elarbi-Boudihir ldquoMethods andstrategies on off-line cursive touched characters segmentationa directional reviewrdquo Artificial Intelligence Review 2011

[12] A Rehman D Mohammad G Sulong and T Saba ldquoSimpleand effective techniques for core-region detection and slantcorrection in offline script recognitionrdquo in Proceedings of theIEEE International Conference on Signal and Image ProcessingApplications (ICSIPA rsquo09) pp 15ndash20 Kuala Lumpur MalaysiaNovember 2009

[13] A Rehman and T Saba ldquoDocument skew estimation andcorrection analysis of techniques common problems andpossible solutionsrdquo Applied Artificial Intelligence vol 25 no 9pp 769ndash787 2011

[14] A Rehman and T Saba ldquoOff-line cursive script recognitioncurrent advances comparisons and remaining problemsrdquo Arti-ficial Intelligence Review vol 37 no 4 pp 261ndash288 2012

[15] T Saba A Rehman and G Sulong ldquoImproved statisticalfeatures for cursive character recognitionrdquo International Journalof Innovative Computing Information and Control vol 7 no 9pp 5211ndash5224 2011

[16] T Helmy ldquoA computational model for context-based imagecategorization and descriptionrdquo International Journal of Imageand Graphics vol 12 no 1 Article ID 1250001 2012

[17] M Savva N Kong A Chhajta F-F Li M Agrawala and JHeer ldquoReVision automated classification analysis and redesignof chart imagesrdquo in Proceedings of the 24th Annual ACMSymposium on User Interface Software and Technology (UISTrsquo11) pp 393ndash402 Santa Barbara Calif USA October 2011

[18] S Elzer S Carberry I Zukerman D Chester N Green and SDemir ldquoA probabilistic framework for recognizing intention ininformation graphicsrdquo in Proceedings of the 19th InternationalJoint Conference on Artificial Intelligence (IJCAI rsquo05) pp 1042ndash1047 Edinburgh UK August 2005

[19] S Carberry S Elzer N Green K McCoy and D ChesterldquoUnderstanding information graphics a discourse-level prob-lemrdquo in Proceedings of the 4th SIGdial Workshop of DiscourseandDialogue (SIGDIAL rsquo03) pp 1ndash12 Sapporo Japan July 2003

[20] M S M Rahim A Rehman S Nirsquomatus F Kurniawan andT Saba ldquoRegion-based features extraction in ear biometricsrdquoInternational Journal of Academic Research vol 4 no 1 pp 37ndash42 2012

[21] N Yokokura and T Watanabe ldquoLayout-based approach forextracting constructive elements of bar-chartsrdquo in Graph-ics Recognition Algorithms and Systems K Tombre and AChhabra Eds vol 1389 pp 163ndash174 Springer Berlin Germany1998

The Scientific World Journal 11

[22] Y Zhou and C L Tan ldquoLearning-based scientific chart recog-nitionrdquo in Proceedings of the 4th IAPR International Workshopon Graphics Recognition (GREC rsquo01) pp 482ndash492 2001

[23] Y P Zhou and C L Tan ldquoHough technique for bar chartsdetection and recognition in document imagesrdquo in Proceedingof the International Conference on Image Processing (ICIP 00)vol 2 pp 605ndash608 Vancouver Canada September 2000

[24] W Huang C L Tan and W K Leow ldquoModel-based chartimage recognitionrdquo in Graphics Recognition Recent Advancesand Perspectives J Llados and Y-B Kwon Eds vol 3088 pp87ndash99 Springer Berlin Germany 2004

[25] A Mishchenko and N Vassilieva ldquoModel-based chart imageclassificationrdquo in Advances in Visual Computing G Bebis RBoyle B Parvin et al Eds vol 6939 of Lecture Notes inComputer Science pp 476ndash485 Springer Berlin Germany 2011

[26] W Huang and C L Tan ldquoA system for understanding imagedinfographics and its applicationsrdquo in Proceedings of the ACMSymposium on Document Engineering pp 9ndash18 ManitobaCanada August 2007

[27] M M Hassan and W Al Khatib ldquoSimilarity searching instatistical figures based on extracted meta datardquo in Proceedingsof the ComputerGraphics Imaging andVisualisation (CGIV 07)pp 329ndash334 Bangkok Thailand August 2007

[28] L Yang W Huang and C L Tan ldquoSemi-automatic groundtruth generation for chart image recognitionrdquo in Proceedings ofthe 7th international conference on Document Analysis SystemsNelson New Zealand 2006

[29] G D Tourassi ldquoJourney toward computer-aided diagnosis roleof image texture analysisrdquo Radiology vol 213 no 2 pp 317ndash3201999

[30] D A Chandy J S Johnson and S E Selvan ldquoTexture featureextraction using gray level statistical matrix for content-basedmammogram retrievalrdquoMultimedia Tools and Applications vol72 no 2 pp 2011ndash2024 2014

[31] F Han HWang B Song et al ldquoA new 3D texture feature basedcomputer-aided diagnosis approach to differentiate pulmonarynodulesrdquo in Medical Imaging Computer-Aided Diagnosis Pro-ceedings of SPIE San Diego Calif USA 2014

[32] F Han H Wang B Song et al ldquoEfficient 3D texture featureextraction from CT images for computer-aided diagnosis ofpulmonary nodulesrdquo in Proceedings of the SPIE Medical Imag-ing Computer-Aided Diagnosis vol 9035 pp 1ndash7 2014

[33] S M Eissen B Stein and M Kulig ldquoPlagiarism detectionwithout reference collectionsrdquo in Advances in Data AnalysisR Decker and H-J Lenz Eds pp 359ndash366 Springer BerlinGermany 2007

[34] J Kasprzak M Brandejs and M Kripac ldquoFinding plagiarismby evaluating document similaritiesrdquo in Proceedings of the 3rdWorkshop onUncovering Plagiarism Authorship and Social Soft-ware Misuse (PAN rsquo09) pp 24ndash28 Donostia Spain September2009

[35] C Basile D Benedetto E Caglioti G Cristadoro and MD Esposti ldquoA plagiarism detection procedure in three stepsselection matches and squaresrdquo Donostia Spain 2009

[36] Z A Hamed and S Z Mohd Hashim Hybrid particle swarmoptimization and black stork foraging for functional neural fuzzynetwork learning enhancement [UTMThesis] 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 8: Research Article Intelligent Bar Chart Plagiarism ...downloads.hindawi.com/journals/tswj/2014/612787.pdf · Research Article Intelligent Bar Chart Plagiarism Detection in Documents

8 The Scientific World Journal

Query image

TA fo

und

40

32

24

16

8

0CTN CEA PP CA TPST

Manual searchSystem search

Trust attributesComparison between manual search and system search

from UK e-commerce website

(a)

40

35

30

25

5

0CTN CEA PP CA TPST

20

15

10

TA fo

und

Plagiarized image

The rate of similarity is 53 CTN-manual search [- 40 36]CTN-system search [- - 35]

CEA-system search [- - 29]PP-manual search [- - 32]

CA-manual search [- - 32]

TPST-manual search [- - 8]TPST-system search [- - 6]

PP-system search [- - 32]

CA-system search [- - 32]

CEA-manual search [- - 29]

Comparison between manual search and system search from UK e-commerce website

Manual searchSystem search

Trust attributesComparison between manual search and system search

from UK e-commerce website

(b)

Figure 7 Plagiarism detection for one of the Modified Copy patterns which is the stealing by changing scales of image

plagiarism detection including direct copy of all data orpart of data are underscored Plagiarism which is carriedout by modifying caption of images via restructuring orsummarizing for label sentences is emphasized Alternativelythe patterns of Modified Copy which are regarded as morecomplicated than Exact Copy patterns are also analyzedThe difficulty of these patterns is the changing on imagewhich appears as the same data and information in differentformsThe restructuring of information for image within thesame shape is also covered The edition of bar chart imagesincluding the change of image bar colors or changing thebar locations either by swapping or via generating horizontalbars from vertical bars and vice versa is discussed in detail

Besides more professional modification such as changingof scales on coordinates which is completely different fromoriginal one can be detected by the proposed method Inthis case the Start and End features of bars are completelydifferent Consequently the Exact features as well as otherattributes play significant role in detecting plagiarism in barchart image

7 Conclusion

We demonstrate the precise recognition of different plagia-rized patterns in business documents using an intelligent barchart detection system The types and patterns of plagiarism

The Scientific World Journal 9

Query image

Orthoptera

Hymenoptera

Hemiptera

Lepidoptera

Other orders

Coleoptera

0 8 16 24 32 40 Example of bar chart Revision redesigns ()

(a)

Example Revision redesigns for input bar charts ()

Orthoptera

Hymenoptera

Hemiptera

Lepidoptera

Other orders

Coleoptera

0 10 20 30 40

The rate of similarity is 46

Orthoptera [- - 9]Hymenoptera [- - 11]Hemiptera [- - 11]Lepidoptera [0 - 7]Other orders [- - -]Coleoptera [- - -]

Plagiarized image

(b)

Figure 8 Plagiarism detection for integration among possible bar chart plagiarism

1

08

06

04

02

0

Prec

ision

1080604020Recall

(a)

1

08

06

04

02

0

Prec

ision

1080604020Recall

(b)

Figure 9 The evaluation of performance (a) for Exact Copy patterns and (b) forModified Copy patterns

10 The Scientific World Journal

are presented via taxonomy of figure chart and table Variouskinds of possible plagiarism are highlighted using taxonomyPlagiarism of bar chart image as type of chart is detectedby the newly proposed technique It is established that thepresent technique is capable of extracting the features from abar chart image which cannot be pulled out using OCR toolOur technique first recognizes the connection between thetext and graphical components to extract the Start End andExact value for each bar Using word 2-gram and Euclideandistance methods the accurate detection of plagiarism isperformed The detection of plagiarism is based on tenstriking features The system is capable of detecting differentlevels of plagiarism not only copy and paste of bar chartimage but also modification on images such as changingcolor or scales The present system efficiently and accuratelydistinguishes other possible alteration administered on theseimages such as swapping among bars location and evenchanges on caption via summarizing and restructuring Theproposed technique may be useful for intelligent plagiarismdetection in business and academic documents

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors extend their appreciation to the Deanship ofScientific Research at King Saud University for funding thiswork through Research Group no RGP-264 The authorsare also thankful to Ministry of Science and TechnologyInnovation (MOSTI) Malaysia and Research ManagementCenter (RMC) Universiti Teknologi Malaysia (UTM) JohorMalaysia for their technical support and expertise in con-ducting this research

References

[1] A M El Tahir Ali H M D Abdulla and V Snasel ldquoSurveyof plagiarism detection methodsrdquo in Proceedings of the 5th AsiaModelling Symposium (AMS rsquo05) pp 39ndash42 May 2011

[2] A Rehman and T Saba ldquoFeatures extraction for soccer videosemantic analysis current achievements and remaining issuesrdquoArtificial Intelligence Review vol 41 no 3 pp 451ndash461 2014

[3] A Rehman and T Saba ldquoEvaluation of artificial intelligent tech-niques to secure information in enterprisesrdquo Artificial Intelli-gence Review pp 1ndash16 2012

[4] S M Alzahrani N Salim and A Abraham ldquoUnderstandingplagiarism linguistic patterns textual features and detectionmethodsrdquo IEEE Transactions on Systems Man and CyberneticsC Applications and Reviews vol 42 no 2 pp 133ndash149 2012

[5] T Saba and A Altameem ldquoAnalysis of vision based systems todetect real time goal events in soccer videosrdquo Applied ArtificialIntelligence vol 27 no 7 pp 656ndash667 2013

[6] K J Ottenstein ldquoAn algorithmic approach to the detection andprevention of plagiarismrdquo SIGCSE Bulletin vol 8 pp 30ndash411976

[7] A Parker and J O Hamblen ldquoComputer algorithms for plagia-rismdetectionrdquo IEEETransactions on Education vol 32 pp 94ndash99 1989

[8] J P Bao J Y Shen X D Liu andQ B Song ldquoSurvey on naturallanguage text copy detectionrdquo Journal of Software vol 14 no 10pp 1753ndash1760 2003

[9] T Saba and A Rehman Machine Learning and Script Recogni-tion Lambert Academic Publisher 2012

[10] W Huang S Zong and C L Tan ldquoChart image classificationusing multiple-instance learningrdquo in Proceedings of the IEEEWorkshop on Applications of Computer Vision (WACV 07) p27 Austin Tex USA February 2007

[11] T Saba A Rehman and M Elarbi-Boudihir ldquoMethods andstrategies on off-line cursive touched characters segmentationa directional reviewrdquo Artificial Intelligence Review 2011

[12] A Rehman D Mohammad G Sulong and T Saba ldquoSimpleand effective techniques for core-region detection and slantcorrection in offline script recognitionrdquo in Proceedings of theIEEE International Conference on Signal and Image ProcessingApplications (ICSIPA rsquo09) pp 15ndash20 Kuala Lumpur MalaysiaNovember 2009

[13] A Rehman and T Saba ldquoDocument skew estimation andcorrection analysis of techniques common problems andpossible solutionsrdquo Applied Artificial Intelligence vol 25 no 9pp 769ndash787 2011

[14] A Rehman and T Saba ldquoOff-line cursive script recognitioncurrent advances comparisons and remaining problemsrdquo Arti-ficial Intelligence Review vol 37 no 4 pp 261ndash288 2012

[15] T Saba A Rehman and G Sulong ldquoImproved statisticalfeatures for cursive character recognitionrdquo International Journalof Innovative Computing Information and Control vol 7 no 9pp 5211ndash5224 2011

[16] T Helmy ldquoA computational model for context-based imagecategorization and descriptionrdquo International Journal of Imageand Graphics vol 12 no 1 Article ID 1250001 2012

[17] M Savva N Kong A Chhajta F-F Li M Agrawala and JHeer ldquoReVision automated classification analysis and redesignof chart imagesrdquo in Proceedings of the 24th Annual ACMSymposium on User Interface Software and Technology (UISTrsquo11) pp 393ndash402 Santa Barbara Calif USA October 2011

[18] S Elzer S Carberry I Zukerman D Chester N Green and SDemir ldquoA probabilistic framework for recognizing intention ininformation graphicsrdquo in Proceedings of the 19th InternationalJoint Conference on Artificial Intelligence (IJCAI rsquo05) pp 1042ndash1047 Edinburgh UK August 2005

[19] S Carberry S Elzer N Green K McCoy and D ChesterldquoUnderstanding information graphics a discourse-level prob-lemrdquo in Proceedings of the 4th SIGdial Workshop of DiscourseandDialogue (SIGDIAL rsquo03) pp 1ndash12 Sapporo Japan July 2003

[20] M S M Rahim A Rehman S Nirsquomatus F Kurniawan andT Saba ldquoRegion-based features extraction in ear biometricsrdquoInternational Journal of Academic Research vol 4 no 1 pp 37ndash42 2012

[21] N Yokokura and T Watanabe ldquoLayout-based approach forextracting constructive elements of bar-chartsrdquo in Graph-ics Recognition Algorithms and Systems K Tombre and AChhabra Eds vol 1389 pp 163ndash174 Springer Berlin Germany1998

The Scientific World Journal 11

[22] Y Zhou and C L Tan ldquoLearning-based scientific chart recog-nitionrdquo in Proceedings of the 4th IAPR International Workshopon Graphics Recognition (GREC rsquo01) pp 482ndash492 2001

[23] Y P Zhou and C L Tan ldquoHough technique for bar chartsdetection and recognition in document imagesrdquo in Proceedingof the International Conference on Image Processing (ICIP 00)vol 2 pp 605ndash608 Vancouver Canada September 2000

[24] W Huang C L Tan and W K Leow ldquoModel-based chartimage recognitionrdquo in Graphics Recognition Recent Advancesand Perspectives J Llados and Y-B Kwon Eds vol 3088 pp87ndash99 Springer Berlin Germany 2004

[25] A Mishchenko and N Vassilieva ldquoModel-based chart imageclassificationrdquo in Advances in Visual Computing G Bebis RBoyle B Parvin et al Eds vol 6939 of Lecture Notes inComputer Science pp 476ndash485 Springer Berlin Germany 2011

[26] W Huang and C L Tan ldquoA system for understanding imagedinfographics and its applicationsrdquo in Proceedings of the ACMSymposium on Document Engineering pp 9ndash18 ManitobaCanada August 2007

[27] M M Hassan and W Al Khatib ldquoSimilarity searching instatistical figures based on extracted meta datardquo in Proceedingsof the ComputerGraphics Imaging andVisualisation (CGIV 07)pp 329ndash334 Bangkok Thailand August 2007

[28] L Yang W Huang and C L Tan ldquoSemi-automatic groundtruth generation for chart image recognitionrdquo in Proceedings ofthe 7th international conference on Document Analysis SystemsNelson New Zealand 2006

[29] G D Tourassi ldquoJourney toward computer-aided diagnosis roleof image texture analysisrdquo Radiology vol 213 no 2 pp 317ndash3201999

[30] D A Chandy J S Johnson and S E Selvan ldquoTexture featureextraction using gray level statistical matrix for content-basedmammogram retrievalrdquoMultimedia Tools and Applications vol72 no 2 pp 2011ndash2024 2014

[31] F Han HWang B Song et al ldquoA new 3D texture feature basedcomputer-aided diagnosis approach to differentiate pulmonarynodulesrdquo in Medical Imaging Computer-Aided Diagnosis Pro-ceedings of SPIE San Diego Calif USA 2014

[32] F Han H Wang B Song et al ldquoEfficient 3D texture featureextraction from CT images for computer-aided diagnosis ofpulmonary nodulesrdquo in Proceedings of the SPIE Medical Imag-ing Computer-Aided Diagnosis vol 9035 pp 1ndash7 2014

[33] S M Eissen B Stein and M Kulig ldquoPlagiarism detectionwithout reference collectionsrdquo in Advances in Data AnalysisR Decker and H-J Lenz Eds pp 359ndash366 Springer BerlinGermany 2007

[34] J Kasprzak M Brandejs and M Kripac ldquoFinding plagiarismby evaluating document similaritiesrdquo in Proceedings of the 3rdWorkshop onUncovering Plagiarism Authorship and Social Soft-ware Misuse (PAN rsquo09) pp 24ndash28 Donostia Spain September2009

[35] C Basile D Benedetto E Caglioti G Cristadoro and MD Esposti ldquoA plagiarism detection procedure in three stepsselection matches and squaresrdquo Donostia Spain 2009

[36] Z A Hamed and S Z Mohd Hashim Hybrid particle swarmoptimization and black stork foraging for functional neural fuzzynetwork learning enhancement [UTMThesis] 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 9: Research Article Intelligent Bar Chart Plagiarism ...downloads.hindawi.com/journals/tswj/2014/612787.pdf · Research Article Intelligent Bar Chart Plagiarism Detection in Documents

The Scientific World Journal 9

Query image

Orthoptera

Hymenoptera

Hemiptera

Lepidoptera

Other orders

Coleoptera

0 8 16 24 32 40 Example of bar chart Revision redesigns ()

(a)

Example Revision redesigns for input bar charts ()

Orthoptera

Hymenoptera

Hemiptera

Lepidoptera

Other orders

Coleoptera

0 10 20 30 40

The rate of similarity is 46

Orthoptera [- - 9]Hymenoptera [- - 11]Hemiptera [- - 11]Lepidoptera [0 - 7]Other orders [- - -]Coleoptera [- - -]

Plagiarized image

(b)

Figure 8 Plagiarism detection for integration among possible bar chart plagiarism

1

08

06

04

02

0

Prec

ision

1080604020Recall

(a)

1

08

06

04

02

0

Prec

ision

1080604020Recall

(b)

Figure 9 The evaluation of performance (a) for Exact Copy patterns and (b) forModified Copy patterns

10 The Scientific World Journal

are presented via taxonomy of figure chart and table Variouskinds of possible plagiarism are highlighted using taxonomyPlagiarism of bar chart image as type of chart is detectedby the newly proposed technique It is established that thepresent technique is capable of extracting the features from abar chart image which cannot be pulled out using OCR toolOur technique first recognizes the connection between thetext and graphical components to extract the Start End andExact value for each bar Using word 2-gram and Euclideandistance methods the accurate detection of plagiarism isperformed The detection of plagiarism is based on tenstriking features The system is capable of detecting differentlevels of plagiarism not only copy and paste of bar chartimage but also modification on images such as changingcolor or scales The present system efficiently and accuratelydistinguishes other possible alteration administered on theseimages such as swapping among bars location and evenchanges on caption via summarizing and restructuring Theproposed technique may be useful for intelligent plagiarismdetection in business and academic documents

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors extend their appreciation to the Deanship ofScientific Research at King Saud University for funding thiswork through Research Group no RGP-264 The authorsare also thankful to Ministry of Science and TechnologyInnovation (MOSTI) Malaysia and Research ManagementCenter (RMC) Universiti Teknologi Malaysia (UTM) JohorMalaysia for their technical support and expertise in con-ducting this research

References

[1] A M El Tahir Ali H M D Abdulla and V Snasel ldquoSurveyof plagiarism detection methodsrdquo in Proceedings of the 5th AsiaModelling Symposium (AMS rsquo05) pp 39ndash42 May 2011

[2] A Rehman and T Saba ldquoFeatures extraction for soccer videosemantic analysis current achievements and remaining issuesrdquoArtificial Intelligence Review vol 41 no 3 pp 451ndash461 2014

[3] A Rehman and T Saba ldquoEvaluation of artificial intelligent tech-niques to secure information in enterprisesrdquo Artificial Intelli-gence Review pp 1ndash16 2012

[4] S M Alzahrani N Salim and A Abraham ldquoUnderstandingplagiarism linguistic patterns textual features and detectionmethodsrdquo IEEE Transactions on Systems Man and CyberneticsC Applications and Reviews vol 42 no 2 pp 133ndash149 2012

[5] T Saba and A Altameem ldquoAnalysis of vision based systems todetect real time goal events in soccer videosrdquo Applied ArtificialIntelligence vol 27 no 7 pp 656ndash667 2013

[6] K J Ottenstein ldquoAn algorithmic approach to the detection andprevention of plagiarismrdquo SIGCSE Bulletin vol 8 pp 30ndash411976

[7] A Parker and J O Hamblen ldquoComputer algorithms for plagia-rismdetectionrdquo IEEETransactions on Education vol 32 pp 94ndash99 1989

[8] J P Bao J Y Shen X D Liu andQ B Song ldquoSurvey on naturallanguage text copy detectionrdquo Journal of Software vol 14 no 10pp 1753ndash1760 2003

[9] T Saba and A Rehman Machine Learning and Script Recogni-tion Lambert Academic Publisher 2012

[10] W Huang S Zong and C L Tan ldquoChart image classificationusing multiple-instance learningrdquo in Proceedings of the IEEEWorkshop on Applications of Computer Vision (WACV 07) p27 Austin Tex USA February 2007

[11] T Saba A Rehman and M Elarbi-Boudihir ldquoMethods andstrategies on off-line cursive touched characters segmentationa directional reviewrdquo Artificial Intelligence Review 2011

[12] A Rehman D Mohammad G Sulong and T Saba ldquoSimpleand effective techniques for core-region detection and slantcorrection in offline script recognitionrdquo in Proceedings of theIEEE International Conference on Signal and Image ProcessingApplications (ICSIPA rsquo09) pp 15ndash20 Kuala Lumpur MalaysiaNovember 2009

[13] A Rehman and T Saba ldquoDocument skew estimation andcorrection analysis of techniques common problems andpossible solutionsrdquo Applied Artificial Intelligence vol 25 no 9pp 769ndash787 2011

[14] A Rehman and T Saba ldquoOff-line cursive script recognitioncurrent advances comparisons and remaining problemsrdquo Arti-ficial Intelligence Review vol 37 no 4 pp 261ndash288 2012

[15] T Saba A Rehman and G Sulong ldquoImproved statisticalfeatures for cursive character recognitionrdquo International Journalof Innovative Computing Information and Control vol 7 no 9pp 5211ndash5224 2011

[16] T Helmy ldquoA computational model for context-based imagecategorization and descriptionrdquo International Journal of Imageand Graphics vol 12 no 1 Article ID 1250001 2012

[17] M Savva N Kong A Chhajta F-F Li M Agrawala and JHeer ldquoReVision automated classification analysis and redesignof chart imagesrdquo in Proceedings of the 24th Annual ACMSymposium on User Interface Software and Technology (UISTrsquo11) pp 393ndash402 Santa Barbara Calif USA October 2011

[18] S Elzer S Carberry I Zukerman D Chester N Green and SDemir ldquoA probabilistic framework for recognizing intention ininformation graphicsrdquo in Proceedings of the 19th InternationalJoint Conference on Artificial Intelligence (IJCAI rsquo05) pp 1042ndash1047 Edinburgh UK August 2005

[19] S Carberry S Elzer N Green K McCoy and D ChesterldquoUnderstanding information graphics a discourse-level prob-lemrdquo in Proceedings of the 4th SIGdial Workshop of DiscourseandDialogue (SIGDIAL rsquo03) pp 1ndash12 Sapporo Japan July 2003

[20] M S M Rahim A Rehman S Nirsquomatus F Kurniawan andT Saba ldquoRegion-based features extraction in ear biometricsrdquoInternational Journal of Academic Research vol 4 no 1 pp 37ndash42 2012

[21] N Yokokura and T Watanabe ldquoLayout-based approach forextracting constructive elements of bar-chartsrdquo in Graph-ics Recognition Algorithms and Systems K Tombre and AChhabra Eds vol 1389 pp 163ndash174 Springer Berlin Germany1998

The Scientific World Journal 11

[22] Y Zhou and C L Tan ldquoLearning-based scientific chart recog-nitionrdquo in Proceedings of the 4th IAPR International Workshopon Graphics Recognition (GREC rsquo01) pp 482ndash492 2001

[23] Y P Zhou and C L Tan ldquoHough technique for bar chartsdetection and recognition in document imagesrdquo in Proceedingof the International Conference on Image Processing (ICIP 00)vol 2 pp 605ndash608 Vancouver Canada September 2000

[24] W Huang C L Tan and W K Leow ldquoModel-based chartimage recognitionrdquo in Graphics Recognition Recent Advancesand Perspectives J Llados and Y-B Kwon Eds vol 3088 pp87ndash99 Springer Berlin Germany 2004

[25] A Mishchenko and N Vassilieva ldquoModel-based chart imageclassificationrdquo in Advances in Visual Computing G Bebis RBoyle B Parvin et al Eds vol 6939 of Lecture Notes inComputer Science pp 476ndash485 Springer Berlin Germany 2011

[26] W Huang and C L Tan ldquoA system for understanding imagedinfographics and its applicationsrdquo in Proceedings of the ACMSymposium on Document Engineering pp 9ndash18 ManitobaCanada August 2007

[27] M M Hassan and W Al Khatib ldquoSimilarity searching instatistical figures based on extracted meta datardquo in Proceedingsof the ComputerGraphics Imaging andVisualisation (CGIV 07)pp 329ndash334 Bangkok Thailand August 2007

[28] L Yang W Huang and C L Tan ldquoSemi-automatic groundtruth generation for chart image recognitionrdquo in Proceedings ofthe 7th international conference on Document Analysis SystemsNelson New Zealand 2006

[29] G D Tourassi ldquoJourney toward computer-aided diagnosis roleof image texture analysisrdquo Radiology vol 213 no 2 pp 317ndash3201999

[30] D A Chandy J S Johnson and S E Selvan ldquoTexture featureextraction using gray level statistical matrix for content-basedmammogram retrievalrdquoMultimedia Tools and Applications vol72 no 2 pp 2011ndash2024 2014

[31] F Han HWang B Song et al ldquoA new 3D texture feature basedcomputer-aided diagnosis approach to differentiate pulmonarynodulesrdquo in Medical Imaging Computer-Aided Diagnosis Pro-ceedings of SPIE San Diego Calif USA 2014

[32] F Han H Wang B Song et al ldquoEfficient 3D texture featureextraction from CT images for computer-aided diagnosis ofpulmonary nodulesrdquo in Proceedings of the SPIE Medical Imag-ing Computer-Aided Diagnosis vol 9035 pp 1ndash7 2014

[33] S M Eissen B Stein and M Kulig ldquoPlagiarism detectionwithout reference collectionsrdquo in Advances in Data AnalysisR Decker and H-J Lenz Eds pp 359ndash366 Springer BerlinGermany 2007

[34] J Kasprzak M Brandejs and M Kripac ldquoFinding plagiarismby evaluating document similaritiesrdquo in Proceedings of the 3rdWorkshop onUncovering Plagiarism Authorship and Social Soft-ware Misuse (PAN rsquo09) pp 24ndash28 Donostia Spain September2009

[35] C Basile D Benedetto E Caglioti G Cristadoro and MD Esposti ldquoA plagiarism detection procedure in three stepsselection matches and squaresrdquo Donostia Spain 2009

[36] Z A Hamed and S Z Mohd Hashim Hybrid particle swarmoptimization and black stork foraging for functional neural fuzzynetwork learning enhancement [UTMThesis] 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 10: Research Article Intelligent Bar Chart Plagiarism ...downloads.hindawi.com/journals/tswj/2014/612787.pdf · Research Article Intelligent Bar Chart Plagiarism Detection in Documents

10 The Scientific World Journal

are presented via taxonomy of figure chart and table Variouskinds of possible plagiarism are highlighted using taxonomyPlagiarism of bar chart image as type of chart is detectedby the newly proposed technique It is established that thepresent technique is capable of extracting the features from abar chart image which cannot be pulled out using OCR toolOur technique first recognizes the connection between thetext and graphical components to extract the Start End andExact value for each bar Using word 2-gram and Euclideandistance methods the accurate detection of plagiarism isperformed The detection of plagiarism is based on tenstriking features The system is capable of detecting differentlevels of plagiarism not only copy and paste of bar chartimage but also modification on images such as changingcolor or scales The present system efficiently and accuratelydistinguishes other possible alteration administered on theseimages such as swapping among bars location and evenchanges on caption via summarizing and restructuring Theproposed technique may be useful for intelligent plagiarismdetection in business and academic documents

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors extend their appreciation to the Deanship ofScientific Research at King Saud University for funding thiswork through Research Group no RGP-264 The authorsare also thankful to Ministry of Science and TechnologyInnovation (MOSTI) Malaysia and Research ManagementCenter (RMC) Universiti Teknologi Malaysia (UTM) JohorMalaysia for their technical support and expertise in con-ducting this research

References

[1] A M El Tahir Ali H M D Abdulla and V Snasel ldquoSurveyof plagiarism detection methodsrdquo in Proceedings of the 5th AsiaModelling Symposium (AMS rsquo05) pp 39ndash42 May 2011

[2] A Rehman and T Saba ldquoFeatures extraction for soccer videosemantic analysis current achievements and remaining issuesrdquoArtificial Intelligence Review vol 41 no 3 pp 451ndash461 2014

[3] A Rehman and T Saba ldquoEvaluation of artificial intelligent tech-niques to secure information in enterprisesrdquo Artificial Intelli-gence Review pp 1ndash16 2012

[4] S M Alzahrani N Salim and A Abraham ldquoUnderstandingplagiarism linguistic patterns textual features and detectionmethodsrdquo IEEE Transactions on Systems Man and CyberneticsC Applications and Reviews vol 42 no 2 pp 133ndash149 2012

[5] T Saba and A Altameem ldquoAnalysis of vision based systems todetect real time goal events in soccer videosrdquo Applied ArtificialIntelligence vol 27 no 7 pp 656ndash667 2013

[6] K J Ottenstein ldquoAn algorithmic approach to the detection andprevention of plagiarismrdquo SIGCSE Bulletin vol 8 pp 30ndash411976

[7] A Parker and J O Hamblen ldquoComputer algorithms for plagia-rismdetectionrdquo IEEETransactions on Education vol 32 pp 94ndash99 1989

[8] J P Bao J Y Shen X D Liu andQ B Song ldquoSurvey on naturallanguage text copy detectionrdquo Journal of Software vol 14 no 10pp 1753ndash1760 2003

[9] T Saba and A Rehman Machine Learning and Script Recogni-tion Lambert Academic Publisher 2012

[10] W Huang S Zong and C L Tan ldquoChart image classificationusing multiple-instance learningrdquo in Proceedings of the IEEEWorkshop on Applications of Computer Vision (WACV 07) p27 Austin Tex USA February 2007

[11] T Saba A Rehman and M Elarbi-Boudihir ldquoMethods andstrategies on off-line cursive touched characters segmentationa directional reviewrdquo Artificial Intelligence Review 2011

[12] A Rehman D Mohammad G Sulong and T Saba ldquoSimpleand effective techniques for core-region detection and slantcorrection in offline script recognitionrdquo in Proceedings of theIEEE International Conference on Signal and Image ProcessingApplications (ICSIPA rsquo09) pp 15ndash20 Kuala Lumpur MalaysiaNovember 2009

[13] A Rehman and T Saba ldquoDocument skew estimation andcorrection analysis of techniques common problems andpossible solutionsrdquo Applied Artificial Intelligence vol 25 no 9pp 769ndash787 2011

[14] A Rehman and T Saba ldquoOff-line cursive script recognitioncurrent advances comparisons and remaining problemsrdquo Arti-ficial Intelligence Review vol 37 no 4 pp 261ndash288 2012

[15] T Saba A Rehman and G Sulong ldquoImproved statisticalfeatures for cursive character recognitionrdquo International Journalof Innovative Computing Information and Control vol 7 no 9pp 5211ndash5224 2011

[16] T Helmy ldquoA computational model for context-based imagecategorization and descriptionrdquo International Journal of Imageand Graphics vol 12 no 1 Article ID 1250001 2012

[17] M Savva N Kong A Chhajta F-F Li M Agrawala and JHeer ldquoReVision automated classification analysis and redesignof chart imagesrdquo in Proceedings of the 24th Annual ACMSymposium on User Interface Software and Technology (UISTrsquo11) pp 393ndash402 Santa Barbara Calif USA October 2011

[18] S Elzer S Carberry I Zukerman D Chester N Green and SDemir ldquoA probabilistic framework for recognizing intention ininformation graphicsrdquo in Proceedings of the 19th InternationalJoint Conference on Artificial Intelligence (IJCAI rsquo05) pp 1042ndash1047 Edinburgh UK August 2005

[19] S Carberry S Elzer N Green K McCoy and D ChesterldquoUnderstanding information graphics a discourse-level prob-lemrdquo in Proceedings of the 4th SIGdial Workshop of DiscourseandDialogue (SIGDIAL rsquo03) pp 1ndash12 Sapporo Japan July 2003

[20] M S M Rahim A Rehman S Nirsquomatus F Kurniawan andT Saba ldquoRegion-based features extraction in ear biometricsrdquoInternational Journal of Academic Research vol 4 no 1 pp 37ndash42 2012

[21] N Yokokura and T Watanabe ldquoLayout-based approach forextracting constructive elements of bar-chartsrdquo in Graph-ics Recognition Algorithms and Systems K Tombre and AChhabra Eds vol 1389 pp 163ndash174 Springer Berlin Germany1998

The Scientific World Journal 11

[22] Y Zhou and C L Tan ldquoLearning-based scientific chart recog-nitionrdquo in Proceedings of the 4th IAPR International Workshopon Graphics Recognition (GREC rsquo01) pp 482ndash492 2001

[23] Y P Zhou and C L Tan ldquoHough technique for bar chartsdetection and recognition in document imagesrdquo in Proceedingof the International Conference on Image Processing (ICIP 00)vol 2 pp 605ndash608 Vancouver Canada September 2000

[24] W Huang C L Tan and W K Leow ldquoModel-based chartimage recognitionrdquo in Graphics Recognition Recent Advancesand Perspectives J Llados and Y-B Kwon Eds vol 3088 pp87ndash99 Springer Berlin Germany 2004

[25] A Mishchenko and N Vassilieva ldquoModel-based chart imageclassificationrdquo in Advances in Visual Computing G Bebis RBoyle B Parvin et al Eds vol 6939 of Lecture Notes inComputer Science pp 476ndash485 Springer Berlin Germany 2011

[26] W Huang and C L Tan ldquoA system for understanding imagedinfographics and its applicationsrdquo in Proceedings of the ACMSymposium on Document Engineering pp 9ndash18 ManitobaCanada August 2007

[27] M M Hassan and W Al Khatib ldquoSimilarity searching instatistical figures based on extracted meta datardquo in Proceedingsof the ComputerGraphics Imaging andVisualisation (CGIV 07)pp 329ndash334 Bangkok Thailand August 2007

[28] L Yang W Huang and C L Tan ldquoSemi-automatic groundtruth generation for chart image recognitionrdquo in Proceedings ofthe 7th international conference on Document Analysis SystemsNelson New Zealand 2006

[29] G D Tourassi ldquoJourney toward computer-aided diagnosis roleof image texture analysisrdquo Radiology vol 213 no 2 pp 317ndash3201999

[30] D A Chandy J S Johnson and S E Selvan ldquoTexture featureextraction using gray level statistical matrix for content-basedmammogram retrievalrdquoMultimedia Tools and Applications vol72 no 2 pp 2011ndash2024 2014

[31] F Han HWang B Song et al ldquoA new 3D texture feature basedcomputer-aided diagnosis approach to differentiate pulmonarynodulesrdquo in Medical Imaging Computer-Aided Diagnosis Pro-ceedings of SPIE San Diego Calif USA 2014

[32] F Han H Wang B Song et al ldquoEfficient 3D texture featureextraction from CT images for computer-aided diagnosis ofpulmonary nodulesrdquo in Proceedings of the SPIE Medical Imag-ing Computer-Aided Diagnosis vol 9035 pp 1ndash7 2014

[33] S M Eissen B Stein and M Kulig ldquoPlagiarism detectionwithout reference collectionsrdquo in Advances in Data AnalysisR Decker and H-J Lenz Eds pp 359ndash366 Springer BerlinGermany 2007

[34] J Kasprzak M Brandejs and M Kripac ldquoFinding plagiarismby evaluating document similaritiesrdquo in Proceedings of the 3rdWorkshop onUncovering Plagiarism Authorship and Social Soft-ware Misuse (PAN rsquo09) pp 24ndash28 Donostia Spain September2009

[35] C Basile D Benedetto E Caglioti G Cristadoro and MD Esposti ldquoA plagiarism detection procedure in three stepsselection matches and squaresrdquo Donostia Spain 2009

[36] Z A Hamed and S Z Mohd Hashim Hybrid particle swarmoptimization and black stork foraging for functional neural fuzzynetwork learning enhancement [UTMThesis] 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 11: Research Article Intelligent Bar Chart Plagiarism ...downloads.hindawi.com/journals/tswj/2014/612787.pdf · Research Article Intelligent Bar Chart Plagiarism Detection in Documents

The Scientific World Journal 11

[22] Y Zhou and C L Tan ldquoLearning-based scientific chart recog-nitionrdquo in Proceedings of the 4th IAPR International Workshopon Graphics Recognition (GREC rsquo01) pp 482ndash492 2001

[23] Y P Zhou and C L Tan ldquoHough technique for bar chartsdetection and recognition in document imagesrdquo in Proceedingof the International Conference on Image Processing (ICIP 00)vol 2 pp 605ndash608 Vancouver Canada September 2000

[24] W Huang C L Tan and W K Leow ldquoModel-based chartimage recognitionrdquo in Graphics Recognition Recent Advancesand Perspectives J Llados and Y-B Kwon Eds vol 3088 pp87ndash99 Springer Berlin Germany 2004

[25] A Mishchenko and N Vassilieva ldquoModel-based chart imageclassificationrdquo in Advances in Visual Computing G Bebis RBoyle B Parvin et al Eds vol 6939 of Lecture Notes inComputer Science pp 476ndash485 Springer Berlin Germany 2011

[26] W Huang and C L Tan ldquoA system for understanding imagedinfographics and its applicationsrdquo in Proceedings of the ACMSymposium on Document Engineering pp 9ndash18 ManitobaCanada August 2007

[27] M M Hassan and W Al Khatib ldquoSimilarity searching instatistical figures based on extracted meta datardquo in Proceedingsof the ComputerGraphics Imaging andVisualisation (CGIV 07)pp 329ndash334 Bangkok Thailand August 2007

[28] L Yang W Huang and C L Tan ldquoSemi-automatic groundtruth generation for chart image recognitionrdquo in Proceedings ofthe 7th international conference on Document Analysis SystemsNelson New Zealand 2006

[29] G D Tourassi ldquoJourney toward computer-aided diagnosis roleof image texture analysisrdquo Radiology vol 213 no 2 pp 317ndash3201999

[30] D A Chandy J S Johnson and S E Selvan ldquoTexture featureextraction using gray level statistical matrix for content-basedmammogram retrievalrdquoMultimedia Tools and Applications vol72 no 2 pp 2011ndash2024 2014

[31] F Han HWang B Song et al ldquoA new 3D texture feature basedcomputer-aided diagnosis approach to differentiate pulmonarynodulesrdquo in Medical Imaging Computer-Aided Diagnosis Pro-ceedings of SPIE San Diego Calif USA 2014

[32] F Han H Wang B Song et al ldquoEfficient 3D texture featureextraction from CT images for computer-aided diagnosis ofpulmonary nodulesrdquo in Proceedings of the SPIE Medical Imag-ing Computer-Aided Diagnosis vol 9035 pp 1ndash7 2014

[33] S M Eissen B Stein and M Kulig ldquoPlagiarism detectionwithout reference collectionsrdquo in Advances in Data AnalysisR Decker and H-J Lenz Eds pp 359ndash366 Springer BerlinGermany 2007

[34] J Kasprzak M Brandejs and M Kripac ldquoFinding plagiarismby evaluating document similaritiesrdquo in Proceedings of the 3rdWorkshop onUncovering Plagiarism Authorship and Social Soft-ware Misuse (PAN rsquo09) pp 24ndash28 Donostia Spain September2009

[35] C Basile D Benedetto E Caglioti G Cristadoro and MD Esposti ldquoA plagiarism detection procedure in three stepsselection matches and squaresrdquo Donostia Spain 2009

[36] Z A Hamed and S Z Mohd Hashim Hybrid particle swarmoptimization and black stork foraging for functional neural fuzzynetwork learning enhancement [UTMThesis] 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 12: Research Article Intelligent Bar Chart Plagiarism ...downloads.hindawi.com/journals/tswj/2014/612787.pdf · Research Article Intelligent Bar Chart Plagiarism Detection in Documents

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014