
Global Sci-Tech: Al-Falah's Journal of Science & Technology. ISSN 0975-9638 (Print). Available at: alfalahuniversity.edu.in/wp-content/uploads/2019/03/Global-Sci-Tech-104-2018.pdf


Al-Falah Charitable Trust, New Delhi

Global Sci-Tech: Al-Falah's Journal of Science & Technology

Published by

ISSN 0975-9638 (Print)
ISSN 2455-7110 (Online)

AL-FALAH UNIVERSITY, 1997

Global Sci-Tech, 10(4): 187-248, October-December 2018

EDITORS

Prof. Saoud Sarwar

Prof. Khalil Ahmad

Available at: www.alfalahuniversity.edu.in / www.kurra.co.in

CHIEF EDITOR

Prof. Z.H. ZAIDI

VOLUME - 10 NUMBER - 4 OCTOBER-DECEMBER 2018

EDITORIAL BOARD

Prof. Abdullah M. Jarrah, Jordan

Prof. Akhtar A. Khan, USA

Prof. Ash Mohd Abbas, India

Prof. Carlos Castro, USA

Prof. D. Bahuguna, India

Prof. H.P. Dikshit, India

Prof. H.R. Khan, Germany

Prof. Ishwar Singh, India

Prof. Lovely Agarwal, USA

Prof. M.S. Jamil Asghar, India

Prof. Mohd Zulfiquar, India

Prof. Mohd. Sharif, India

Prof. Mursaleen, India

Prof. Pankaj Maheshwari, USA

Prof. R.K. Pandey, India

Prof. R.M. Mehra, India

Prof. Tabrez Alam Khan, India

Prof. Vikram Kumar, India

Prof. Zahid A. Khan, India

Prof. Z.A. Jaffery


EDITORIAL BOARD

CHIEF EDITOR

Prof. Z.H. ZAIDI

EDITORS

Prof. Saoud Sarwar, Al-Falah University, Faridabad

Prof. Khalil Ahmad, Al-Falah University, Faridabad

Prof. Z.A. Jaffery, Jamia Millia Islamia, New Delhi

Prof. Abdullah M. Jarrah, Jordan

Prof. Akhtar A. Khan, USA

Prof. Ash Mohd Abbas, India

Prof. Carlos Castro, USA

Prof. D. Bahuguna, India

Prof. H.P. Dikshit, India

Prof. H.R. Khan, Germany

Prof. Ishwar Singh, India

Prof. Lovely Agarwal, USA

Prof. M.S. Jamil Asghar, India

Prof. Mohd Zulfiquar, India

Prof. Mohd. Sharif, India

Prof. Mursaleen, India

Prof. Pankaj Maheshwari, USA

Prof. R.K. Pandey, India

Prof. R.M. Mehra, India

Prof. Tabrez Alam Khan, India

Prof. Vikram Kumar, India

Prof. Zahid A. Khan, India


Global Sci-Tech: Al-Falah's Journal of Science & Technology

Volume 10 Number 4 October-December 2018

CONTENTS

1. Analysis of advanced software tools and techniques for software risk management and classification (p. 187)
Nadiya Afzal and Mohd. Sadim

2. Collection of software requisite prioritization with analytic hierarchy process (p. 193)
Rajesh Kumar Singh

3. Comparative study of different classification techniques using Weka tool (p. 200)
Mohd. Soban Siddiqui and Ali Imam Abidi

4. An efficient machine learning model for classification of liver patient diseases (p. 209)
Tajamul Maqbool

5. Halftoning of images in visual cryptography by direct binary search (p. 217)
Mohammad Mahtab Kazim

6. Study of web mining and its types (p. 227)
Sushma Pal

7. IoT strategic research and use case scenario: A direction to the smart life (p. 235)
Preety Khatri

8. Digital watermarking using MATLAB (p. 242)
Arsheen Neda Siddiqui

Owned and Published by J.A. Siddiqui (Chairman, Al-Falah Charitable Trust), Global Sci-Tech, 274-A, Al-Falah House, Jamia Nagar, Okhla, New Delhi-110 025. Printed at Alpha Printers, WZ-35/C, Naraina, Ring Road, New Delhi-110 028.

Editors: Khalil Ahmad, Z.A. Jaffery and Saoud Sarwar, 274-A, Al-Falah House, Jamia Nagar, New Delhi-110 025.



Analysis of advanced software tools and techniques for software risk management and classification

NADIYA AFZAL1 and MOHD. SADIM2

Department of Computer Science and Engineering, School of Engineering and Technology, Al-Falah University, Dhauj, Faridabad, Haryana, India
E-mail: [email protected]

ABSTRACT

Software development today faces many challenges and risks. Software tools have long been used in development for performance analysis, testing and verification, debugging, and design review. Such tools and techniques range from the small and simple, e.g. linkers, to the very large and complex. Software risk management activities run through the whole project, from inception to delivery. Because risks persist throughout development, we need to understand the capabilities and purpose of the available software tools and apply suitable risk management tools and techniques. The aim of this research paper is to analyse the advanced tools and techniques used for software risk management and classification.

Key words: Software development tools, analysis of advanced tools, software risks, risk management, risk management tools and techniques, risk assessment, software engineering tools.

Global Sci-Tech, 10 (4) October-December 2018; pp. 187-92 DOI No.: 10.5958/2455-7110.2018.00027.7

1. INTRODUCTION

Designing the immense variety of software tools involves various software risks that must be managed very carefully. Despite advanced technology, innovative approaches, and modern software development tools and techniques, software development still faces an abundance of risks, and every organization faces distinct IT risks associated with its software development projects. Software risk identification gathers risks into a common data store, assesses them, applies tools and techniques, selects appropriate mitigation actions, and tracks them so that mitigated risks are reduced. The need for software project risk management is widely recognized across software development organizations such as Amazon, Microsoft, Oracle, and IBM. Risk management as a technique was advanced as far back as the sixteenth century, during the Renaissance. Since 1990, a large number of approaches and mechanisms have been proposed to address the demand for more adequate Risk Management Processes (RMP)[7]. Among them we can distinguish PUMA[5] and MRMP[8] in the construction engineering context; RFRM[6] in the systems engineering context; SHAMPU[2] and PMBoK[9] in the project management context; and the AS/NZS 4360 standard[4] and the DOD approach[3] in the public sector. In this paper we review and compare the main risk-related approaches in the software engineering context.

2. APPLICATION

The value of a software tool is greater when suitable software analysis programmes are available. Several tools prescribe a risk classification, but not all identified risks should be treated the same: some classified risks are likely to occur, while others, if they occurred, would have an extensive impact. Risk analysis and management depend on the types of risks being considered[13].
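The likelihood-and-impact idea above can be made concrete with a small, hypothetical classifier. The risk names, scores, and class thresholds below are invented for illustration and are not taken from the paper or any particular tool:

```python
# Hypothetical sketch: classifying risks by likelihood and impact.
# Risk names, scores, and thresholds are illustrative assumptions.

def classify(probability: float, impact: float) -> str:
    """Map a probability (0-1) and an impact score (1-5) to a risk class."""
    exposure = probability * impact
    if exposure >= 2.0:
        return "high"
    if exposure >= 0.8:
        return "medium"
    return "low"

risks = {
    "requirements churn": (0.6, 4),
    "key developer leaves": (0.2, 5),
    "build-tool upgrade": (0.5, 1),
}

for name, (p, i) in risks.items():
    print(f"{name}: exposure={p * i:.1f}, class={classify(p, i)}")
```

Risks in the "high" band would then be candidates for active mitigation, while "low" risks might simply be monitored.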

2.1 Methodological Risks

Methodological risks are associated with the administration of the software product and introduce disagreements over communication, workload, planning, standards, methods, quality, and consistency of representation. Even when there is no mid-course change in scope, unexpected technical complications can still alter the work. Project managers may know the technologies they are applying very well, yet when those technologies are combined with an unfamiliar component, the result can be an entire muddle.

2.2 Economic Risks

Economic risks concern available resources, budgets, estimated expenses, releases, and delivery commitments. These risks are correlated with the revenue of the software product during development, together with its non-return costs, non-recurring costs, fixed costs, variable costs, and profit/loss margin.

2.3 Human Resource Risks

Human resource risks include loss of staff knowledge, disagreements over standards, conflicts over goals and duties, staff turnover, and differences in productivity. Other resource risks include inadequate or delayed delivery of materials and supplies, inadequate tools and equipment, poorly allocated positions, insufficient artificial-intelligence capability, and weak acknowledgement of work.

2.4 Program and Capacity Risk

These risks are associated with the schedule and scope of the software product during development. Scope risk is frequent in IT projects and, to some extent, quite natural.

3. ESTIMATION OF APPLICATION SOFTWARE RISK MANAGEMENT

Risk identification and risk assessment should be completed as early as possible, to limit adverse change and to increase positive outcomes during project development. Estimating software risks means determining the likelihood and impact of potential risks. During risk estimation, automated tools can supply predefined, validated methods that help practitioners conduct management assessments. Different approaches to software risk management have been advanced and used in software engineering settings. Yet despite the analysis and experience published on risk management, software managers conventionally fail to move from the classical style to one that considers and controls risks throughout the development of their products[11]. According to Johnson[10], two approaches to software project management can be identified: traditional and risk-oriented. The traditional approach is reactive in nature and deals with the complications common to all software projects, analysing and fixing problems as they appear. The risk-oriented approach, by contrast, is proactive: it aims to identify the specific conditions of a characteristic project before they impact the project. Risk inquiry and management are commonly based on knowledge drawn from past experience, on comparable well-known checklists and practices, on the conclusions of analysis, and on the evaluation of newly disclosed information.

The first task for automated tools is to assemble historical data to build up a database. Once the database exists, the tool processes the data and offers applied knowledge that helps the manager assess risks and adjust decisions. Today's software tools can automate much of this, recording every action in a basic archive shared by all users. Requirements and changes can be analysed, characterized, and prioritized, and work products can be traced back to requirements across the integrated life cycle. This means that data storage and search should be crucial criteria when selecting a tool. Today we can choose from a wide selection of discrete technologies and adopt software as we need it. Many software purchasers prefer computerized tools with a short setup time, and would rather not worry about reverse integration, performance, support, or resource effort. Nowadays, value is characterized not only by individual functions but by how they combine: users expect to work against a central server, with client applications distributing work and data over common software with real-time connectivity. Supporting guidance, standards, and a defined risk approach help the customer decide, on an accurate basis, how controls should be distributed across the organization's life-cycle processes: analysis of the controls the enterprise needs; validation of software, customers, agents, automation, and engineering agreements; exploration of disputes and possible threats to system exploitation, together with information security; and assessment of system complexity, delivery factors, configuration, certification, and logical system use and optimization[1]. Frequently, where risk estimates are produced but not standardized, risk evaluations vary from one assessor to the next: whether an appropriate action is taken depends on the particular reviewer, meaning that comparable situations may be judged differently. To avoid inconsistent risk estimates, a single classification should be used to cluster and manage risk across comparable enterprises, and the organization should ensure that risk management lessons are captured from risk-related work across the completed IT project. PMBOK[9], by the Project Management Institute (PMI), is a project management guide and an internationally accepted standard that applies the foundations of project management to architecture, software, engineering technology, automotive, and other domains. Risk management comprises a number of steps:

1. Risk management planning

2. Risk identification

3. Quantitative risk analysis

4. Qualitative risk analysis

5. Risk response planning

6. Risk monitoring and control

7. Risk management planning

8. Risk identification

9. Risk categorization

10. Assigning probability and impact

11. Aggregating results

12. Impacts by objective

13. Assumptions testing

14. Data precision ranking

15. Qualitative risk analysis

16. Product and activity risk

17. Probability assessment

18. Sensitivity analysis and decision-tree analysis

19. Simulation techniques

20. Risk response planning

21. Risk responses, which should be: appropriate; cost-effective; timely and practical; agreed and funded

22. Risk monitoring and control: a continuous, ongoing activity in which risks are audited, newly arising risks are identified, and the effectiveness of risk management is evaluated
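Decision-tree analysis and the comparison of risk responses mentioned in the steps above are often carried out via expected monetary value (EMV). A minimal sketch, in which all probabilities and cost figures are invented for illustration:

```python
# Hypothetical sketch of decision-tree analysis via expected monetary
# value (EMV). Probabilities and cost figures are invented.

def emv(outcomes):
    """Expected monetary value of a list of (probability, payoff) pairs."""
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9
    return sum(p * payoff for p, payoff in outcomes)

# Decision: mitigate a schedule risk now, or accept it?
mitigate = emv([(0.9, -10_000), (0.1, -25_000)])  # upfront cost caps the loss
accept   = emv([(0.6,       0), (0.4, -60_000)])  # no cost unless the risk fires

print("mitigate EMV:", mitigate)
print("accept EMV:  ", accept)
print("choose:", "mitigate" if mitigate > accept else "accept")
```

The branch with the less negative EMV is preferred; here the upfront mitigation cost buys down a much larger expected loss.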

4. AVAILABLE TOOLS AND TECHNIQUES

Tools and techniques are applied to deliver high-quality software products to the organization on time. Given the organization's needs, it is crucial to adopt artificial-intelligence tools and techniques with high predictive efficiency to help management reach sound decisions. Software risk estimation and classification can be partly recast as data analysis or data mining. Computerized tools are built to aid the project administrator in planning and setting up the design, assigning resources to tasks, tracking progress, and organizing costs, demands, contracts, and risks, as well as estimating effort and cost. They thereby accommodate an effective course of risk management through software tools and techniques that are informed by, and adapt to, the chosen risk management approach.

4.1 Effective Risk Administrator

Effective Risk Administrator (ERA) is presented as the world's leading Enterprise Risk Management (ERM) software suite. Through various acclaimed, compliance-driven solutions, ERA delivers both cost savings and capability to its customers. Beyond its robust and varied integrated features, ERA is described as the only ERM solution that supports the risk management commitments of the entire organization. From ongoing activities and business risks through to critical project planning, ERA helps organizations identify, inspect, control, mitigate, moderate, and report on risk across the business. ERA is an award-winning platform whose ERM capability is used by some of the world's most admired organizations, including London Underground, Crossrail, Lockheed Martin, EADS, the US Department of Homeland Security, the UK MOD, Saudi Aramco, Rio Tinto, Bechtel, and Skanska. Together with additional solution packages (risk performance, risk connectivity, and enterprise-wide ERM), ERA offers a comprehensive ERM solution with capabilities that combine to support the performance of risk management across the organization.

4.2 Risk Supervision

Risk Supervision, established in 1993, has been a worldwide leader in providing risk assessment solutions. Risk Supervision is confident that it can execute security and compliance risk assessment conclusively, even as demands grow. Its offering includes:

• Global references that are evidence-based and derived from the risk models communicated by ISO 32001, Sandia Labs, and FEMA.

• A framework that can be comfortably customized by the client to perform any category of risk assessment, so that it is compatible with the individual organization.

• An organization model that provides clients with a unified view of risks across the whole organization.

• Assurance based on real-time risk aggregation to centralize and accelerate application development.

• An automated, content-driven solution built to support collections of organizational communities.

• Cutting-edge intelligence and agility against emerging threats and liabilities.

• Curated content covering thousands of U.S. and worldwide regulations, comparative practices, and case studies.

4.3 Contingency

Contingency is a comprehensive risk analysis tool that integrates seamlessly with Microsoft Project to quantify the cost and schedule uncertainty associated with a project plan. Accurately predicting how long a project will take, or how much it will cost, is close to impossible, and single-point estimates of task duration and cost can be desperately misleading. Contingency applies proven Monte Carlo-based simulation to answer questions such as "What are the prospects of finishing by 2/28/2002?" or "How likely is it that the budget will stay under $9 million?". Many large organizations have adopted these tools and techniques, and Contingency delivers this power on the PC at an affordable price.
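The Monte Carlo approach behind tools of this kind can be sketched in a few lines: sample each task's duration from a three-point (optimistic, most likely, pessimistic) estimate many times, and count how often the total meets the deadline. The task list, estimates, and deadline below are invented for illustration:

```python
# Minimal Monte Carlo schedule-risk sketch: sample serial task durations
# from triangular three-point estimates and estimate the probability of
# meeting a deadline. Tasks and figures are invented for illustration.
import random

random.seed(42)

# (optimistic, most likely, pessimistic) durations in days, tasks in series
tasks = [(4, 6, 12), (8, 10, 20), (3, 5, 9)]
DEADLINE = 28
TRIALS = 20_000

hits = 0
for _ in range(TRIALS):
    # random.triangular takes (low, high, mode)
    total = sum(random.triangular(lo, hi, mode) for lo, mode, hi in tasks)
    if total <= DEADLINE:
        hits += 1

print(f"P(finish within {DEADLINE} days) = {hits / TRIALS:.2f}")
```

The same loop, run against sampled task costs instead of durations, answers budget questions such as the "$9 million" example above.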


4.4 Risk Accumulate

Risk Accumulate, located in St. John's, NL, has worked very hard to become a leading provider of web-based risk and insurance solutions since 2006. Risk Accumulate grew out of a concept developed and driven by Craig Rowe, who has worked in risk and insurance management since 1989. Drawing on his experience, including committee work in management associations such as the risk and insurance industry community, Craig set out to articulate how industry commonly approaches risk management and to characterize the best conventions that businesses already employ when they think about risk.

Risk Accumulate continues to provide enterprise-strength applications and risk reporting at mid-market prices. Its solutions correlate, automate, and analyse resident risk, coverage, and claims data, and enable its clients to identify their exposures and risks better, more simply, and more cost-effectively. Companies and managers can access best-in-class functionality built on the most modern technologies, and should expect to pay only for the features they actually need. Risk Accumulate's solutions are continually elaborated together with its customers: as a web-based solution provider, its customers are always using the current version, and features, upgrades, and maintenance are added automatically, so each update is seamless and not disruptive to the customers.

4.5 Risk Detecting Enterprise (RDE)

Risk Detecting Enterprise provides real decision support at the enterprise, programme, and software-project levels through a complete risk management programme. Its frame of reference covers risk management programme development and fundamentals. Risk Detecting Enterprise enables administrators and affords their company the clarity they need to classify, evaluate, track, control, mitigate, and report on risk. It supports active management of budget, schedule, and technical, performance, and probability risks within a scalable, extensible business schema. It improves the visibility of business risks by ordering them so that they can be recognized, analysed, assessed, alleviated, and contained, saving colossal amounts of money and person-time and keeping projects on time and on cost.

5. CONCLUSION

This paper has characterized the analysis of advanced tools and techniques for risk management in software engineering. Risk management is a crucial part of software development: at every stage, the project is assessed for risks, and risk management must keep pace with the complications of the product under development. Informed, programmable risk management tools and techniques are now widely used, and such tools and techniques can be applied with any software development methodology, whether conventional, agile, or a combination of methods. There are no universally good or bad tools for risk management, as the field is still under active analysis and innumerable new tools and techniques are being released to organizations. Different tools and techniques are chosen to fit the demands of the project.

REFERENCES

[1] S. Avdoshin and E. Pesotskaya. Business Informatization: Managing Risks. Moscow: DMK Press [in Russian], 176 pp. (2011).

[2] C.B. Chapman and S.C. Ward. Project Risk Management: Processes, Techniques and Insights, 2nd Edition. John Wiley, Chichester, UK (2003).

[3] E.H. Conrow. Effective Risk Management: Some Keys to Success, 2nd Edition. American Institute of Aeronautics and Astronautics, Reston, USA (2003).

[4] D. Cooper. Tutorial Notes: The Australian and New Zealand Standard on Risk Management (AS/NZS 4360). Retrieved May 2004 from http://www.broadleaf.com.

[5] A. Del Cano and M.P. De La Cruz. Integrated methodology for project risk management. Journal of Construction Engineering and Management, 128(6): 473-485 (2002).

[6] Y.Y. Haimes, S. Kaplan and J.H. Lambert. Risk filtering, ranking and management framework using hierarchical holographic modeling. Risk Analysis, 22(2): 381-395 (2002).

[7] Y.A. Kwak and J. Stoddard. Project risk management: lessons learned from software development environment. Technovation, 24: 915-920 (2003).

[8] J. Pipattanapiwong. Development of Multi-party Risk and Uncertainty Management Process for an Infrastructure Project. Ph.D. Thesis, Kochi University of Technology, Kochi, Japan (2004).

[9] PMI (Project Management Institute). A Guide to the Project Management Body of Knowledge (PMBoK). Newtown Square, Pennsylvania, USA (2004).

[10] D.L. Johnson. Risk Management and the Small Software Project, viewed 4 May 2009. http://www.sei.cmu.edu/iprc/sepg2006/johnson.pdf

[11] T.G. Kimer and L.E. Concalves. Software Engineering Techniques: Design for Quality. IFIP International Federation for Information Processing, 227: 149-154 (2006).

[12] Sergey M. Avdoshin and Elena Y. Pesotskaya. Software Risk Management: Using Automated Tools. School of Software Engineering, Software Management Department, National Research University Higher School of Economics, Moscow, Russian Federation.


Collection of software requisite prioritization with analytic hierarchy process

RAJESH KUMAR SINGH
Department of MCA, Bhagwant University, Ajmer, Rajasthan
E-mail: [email protected]

ABSTRACT

Selection of software requirement prioritization gives an overview of techniques for prioritizing requirements for software releases. Prioritization is an essential step towards making superior decisions concerning product planning for single and multiple releases. Various aspects of functionality are considered, such as importance, risk, and cost. Prioritization decisions are made by stakeholders, including customers, administrators, and planners, or their representatives. Requirement prioritization methods set out how to merge individual prioritizations based on overall objectives and constraints. A collection of different techniques and aspects is applied to an example to illustrate their use. Finally, restrictions and imperfections of existing processes are indicated, and open research questions in the field of requirements prioritization are discussed.

Key words: Requirements analysis, software product planning, requirements prioritization, software realization.

Global Sci-Tech, 10 (4) October-December 2018; pp. 193-199 DOI No.: 10.5958/2455-7110.2018.00028.9

1. INTRODUCTION

In daily life we make various selections, e.g. when purchasing a TV, a meal, or a mobile phone; often we are not even aware of making one. Typically we do not have many options to weigh, such as which brand to buy, or whether to take this cab or another one. Even with just a few options a selection can be tough to make; when we have multitudes of options, selection becomes much harder.

One of the crucial steps in making the right selection is to prioritize between the dissimilar options. It is often not obvious which alternative is superior, since numerous aspects must be taken into consideration. For example, when buying a new car it is comparatively simple to make a preference based on speed, cost, security, or comfort alone.

When taking several characteristics into consideration at once, such as price, protection, comfort, or luggage capacity, the choice becomes much harder. When evolving software systems, comparable trade-offs must be managed. The functionality that is most important to the consumers may not be as significant once additional characteristics are factored in: we want to deliver the functionality that is most desired by the consumers while also being least precarious, least expensive, and so on. Prioritization helps to deal with these multifarious assessment problems. This chapter contributes a description of the accessible procedures and tactics, and of how to approach prioritization circumstances.

The remaining sections are organized as follows. First, an overview of the area of prioritization is given, followed by a presentation and discussion of the different aspects that can be used when prioritizing. Next, the chapter describes prioritization techniques and their characteristics and provides an example of prioritization. The remaining part presents conclusions and future research in the field of requirements prioritization.


2. SOFTWARE SELECTION REQUIREMENTS PRIORITIZATION

Software requirements prioritization decisions are not unique to software engineering; other disciplines, such as psychology and organizational behaviour, have studied constrained decision-making systematically[1]. Important software-requirement decisions have been mapped to a variety of requirements engineering activities to demonstrate the similarities and the decision support available in requirements engineering. Current work mainly targets requirements prioritization, an essential part of decision-making[3].

The purpose here is to explain the existing body of knowledge in the requirements prioritization field. The quality of a software product is frequently determined by its ability to satisfy the requirements of its clients and users[4]. Hence, eliciting and specifying the right requirements, and planning suitable releases with the right functionality, are crucial steps towards the success of a project or product. If the wrong requirements are implemented and buyers refuse to accept the product, it does not matter how solid the product is or how thoroughly it has been tested.

Most software projects have more candidate requirements than can be accomplished within the time and cost constraints. Requirements prioritization helps to identify the most valuable requirements from this set by distinguishing the significant ones.

The procedure of prioritizing requirements provides support for the following activities (e.g. [5, 6, 7, 8]):

• For stakeholders to decide on the core requirements for the system.

• To plan and select an ordered, optimal set of software requirements for implementation in successive releases.

• To trade off desired project scope against sometimes conflicting constraints such as schedule, budget, resources, time to market, and quality.

• To balance the business benefit of each requirement against its cost.

• To balance the implications of requirements for the software architecture and for the future evolution of the product and its associated cost.

• To select only a subset of the requirements and still produce a system that will satisfy the customer(s).

• To estimate expected customer satisfaction.

• To obtain a technological advantage and improve market opportunity.

• To reduce rework and schedule slippage (plan stability).

• To handle conflicting requirements, focus the negotiation process, and resolve disagreements between stakeholders.

• To establish the relative importance of each requirement so as to provide the maximum value at the minimum cost.

The list above clearly shows the importance of prioritizing and deciding which requirements to include in a product. This is a strategic process, because these decisions drive development expenses and product revenue, and can make the difference between market gain and market loss [1]. Further, the results of prioritization may form the basis of product and marketing plans, as well as being a driving force during project development. Ruhe et al. summarize this as: "The challenge is to select the 'right' requirements out of a given superset of candidate requirements so that all the different key interests, technical constraints and preferences of the critical stakeholders are fulfilled and the overall business value of the product is maximized" [9].

Of course, it is possible to correct mistaken decisions later on by means of change management, but this can be very expensive, because it is considerably more costly to correct problems late in the development process [5]. Frederick P. Brooks puts it in the following words: "The hardest single part



Collection of software requisite prioritization with analytic hierarchy process

of building a software system is deciding precisely what to build. No other part of the work so cripples the resulting system if done wrong. No other part is more difficult to rectify later." [11].

3. ASPECTS OF PRIORITIZATION

Requirements can be prioritized taking several different aspects into account. An aspect is a property or attribute of a project and its requirements that can be used to prioritize those requirements. Common aspects are importance, penalty, cost, time, and risk. When prioritizing requirements based on a single aspect, it is easy to decide which requirement is most desirable.

When other aspects, such as cost, are brought in, customers may change their minds, and high-priority requirements may turn out to be less important if they are very expensive to satisfy [12]. Hence, it is necessary to know what effects such trade-offs may have, and it is important to consider not only importance when prioritizing requirements but also the other aspects affecting software development and satisfaction with the resulting product. A number of aspects can be prioritized, and it may not be feasible to consider them all. Which ones to consider depends on the specific situation; a few examples of aspects suitable for software projects are described below. Aspects are usually evaluated by the stakeholders in a project (managers, customers, developers, etc.).

3.1 Importance

When prioritizing importance, the stakeholders should determine which requirements are most important for the system. However, importance can be an extremely broad concept, since it depends very much on the perspective of the stakeholders. Importance could, for example, be urgency of implementation, importance of a requirement for the product architecture, strategic importance for the company, etc. [13].

3.2 Penalty

It is possible to evaluate the penalty that is introduced if a requirement is not fulfilled [7]. Penalty is not just the opposite of importance. For example, failing to conform to a standard might incur a high penalty even if the requirement is of low importance to the customer.

3.3 Cost

The implementation cost is usually estimated by the developing organization. Measures that influence cost include: complexity of the requirement, the ability to reuse existing code, the amount of testing and documentation needed, etc. [7].

3.4 Time

As can be seen in the section above, cost in software development is often related to the number of staff hours. However, time (i.e. lead time) is influenced by many other factors, such as the degree of parallelism in development, training needs, the need to develop support infrastructure, compliance with industry standards, etc. [7].

3.5 Risk

Every project carries some amount of risk. In project management, risk management is used to cope with both internal risks (technical and market risks) and external risks (e.g. regulations, suppliers). Both probability and impact must be considered when determining the level of risk of an item or activity [15]. Risk management can also be used when planning requirements into products and releases, by analyzing the risks that are likely to cause problems during development [14, 7].

4.1 Requirements Prioritization Techniques

The aim of any prioritization is to assign values to distinct prioritization objects that allow the establishment of a relative order between the objects in the set. In our case, the objects are the requirements to prioritize. The prioritization can be done on various measurement scales and with various types of values. The least powerful prioritization scale is the ordinal scale, where the requirements are


Rajesh Kumar Singh


ordered so that it is possible to see which requirements are more important than others, but not how much more important. The ratio scale is more powerful, because it is possible to quantify how much more important one requirement is than another. An even more powerful scale is the absolute scale, which can be used in situations where an absolute number can be assigned (e.g. number of hours). With higher scales of measurement, more sophisticated calculations become possible [16].
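As an illustration of these three scales, consider a toy set of requirements (the names and numbers are invented for the example):

```python
# Three requirements prioritized on the three measurement scales above.
reqs = ["login", "export", "search"]

# Ordinal scale: only the order carries information (1 = highest priority).
ordinal = {"login": 1, "export": 3, "search": 2}
ranked = sorted(reqs, key=lambda r: ordinal[r])

# Ratio scale: relative weights, so ratios between values are meaningful.
ratio = {"login": 3, "export": 1, "search": 1}

# Absolute scale: an unconditional number, e.g. estimated effort in hours.
hours = {"login": 40, "export": 12, "search": 20}

print(ranked)                              # order, but not distance
print(ratio["login"] / ratio["export"])    # "three times as important"
print(sum(hours.values()))                 # absolute values can be summed
```

Note that only the ratio and absolute scales support the arithmetic in the last two lines; on the ordinal scale, computing ratios or sums of rank numbers would be meaningless.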

4.2 Analytic Hierarchy Process

The Analytic Hierarchy Process (AHP) is a multi-criteria decision-making method developed by Thomas L. Saaty in the 1970s, based on pair-wise comparisons among alternatives. AHP has been applied in software testing, business applications, software requirements selection [17, 18, 19], and other areas. It is conducted by comparing all possible pairs of hierarchically classified requirements, in order to determine which has higher priority, and to what extent (usually on a scale from one to nine, where one represents equal importance and nine represents extreme importance of one over the other). The total number of comparisons to perform with AHP is n(n-1)/2 (where n is the number of requirements) at each hierarchy level, which results in a dramatic increase in the number of comparisons as the number of requirements grows.
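A minimal numerical sketch of how AHP turns a pairwise comparison matrix into relative priorities, assuming an invented three-requirement example and using the common geometric-mean approximation of Saaty's principal-eigenvector method:

```python
import numpy as np

# Hypothetical pairwise comparison matrix for requirements R1..R3.
# A[i, j] is the Saaty-scale judgement "how much more important is i
# than j"; reciprocals fill the lower triangle, the diagonal is 1.
A = np.array([
    [1.0, 3.0, 5.0],   # R1 vs R1, R2, R3
    [1/3, 1.0, 3.0],   # R2 ...
    [1/5, 1/3, 1.0],   # R3 ...
])

# Geometric-mean approximation of the principal eigenvector:
# geometric mean of each row, normalised to sum to 1.
gm = A.prod(axis=1) ** (1.0 / A.shape[0])
priorities = gm / gm.sum()

# Consistency check: lambda_max from A @ w, CI = (lambda_max - n)/(n - 1),
# CR = CI / RI, where RI = 0.58 is Saaty's random index for n = 3.
n = A.shape[0]
lambda_max = float(np.mean((A @ priorities) / priorities))
CR = ((lambda_max - n) / (n - 1)) / 0.58

print(priorities)   # roughly [0.64, 0.26, 0.10]: R1 dominates
print(CR < 0.1)     # judgements are acceptably consistent
```

Filling the upper triangle of such a matrix takes the n(n-1)/2 judgements mentioned above; the consistency ratio (CR) flags contradictory judgements, with CR below about 0.1 conventionally considered acceptable.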

4.3 Software Requirements Prioritization

Requirements prioritization needs to take into account several different aspects, techniques, and stakeholder situations. This section presents additional issues to consider and some ways to deal with them.

i. Abstraction Level

Requirements are usually specified at different levels of abstraction [20], which causes problems when prioritizing requirements. One reason is that requirements at higher abstraction levels tend to get higher priority in pair-wise comparisons [21]. For example, when prioritizing the requirements on a car, a lamp in the dashboard cannot reasonably be compared with having a luggage boot. Most customers would

probably favour a luggage boot over a lamp in the dashboard, but if one had to compare a lamp in the luggage boot with a lamp in the dashboard, the lamp in the dashboard might get higher priority. Hence, it is really important that the requirements compared are not mixed across abstraction levels [7]. Deciding on the level of abstraction can be difficult, and depends very much on the number of requirements and their complexity. With a small number of requirements, it may be possible to prioritize them at a low level of abstraction, while it may be better to start with requirements at a high level and prioritize the lower levels within the higher levels later when there are many requirements to prioritize [7]. AHP supports this approach of decomposing requirements into different hierarchical levels in order to reduce the number of comparisons. In other cases, it may even be better to prioritize only the high-level requirements, and then let the subsidiary requirements inherit the priorities. If this approach is chosen, it is important that stakeholders are aware of the inheritance [7]. Regnell et al. discuss the difficulty of having a large number of requirements to prioritize [22]. They grouped the requirements to make the prioritization simpler: the requirements were divided into a low level (the original requirements) and a higher level (requirements grouped based on their relationships). This approach not only reduces the number of requirements to prioritize but also copes with dependencies, since requirements are grouped based on those dependencies. According to the results of the study, forming logical groups was easy, and the stakeholders successfully prioritized at both levels.

ii. Re-Prioritization

When developing software products, it is likely that new requirements will appear, that requirements will be deleted, that priorities of existing requirements will change, or that the requirements themselves will change [23]. Hence, it is extremely important that the prioritization approach is able to cope with changing requirements and changing priorities of previously prioritized requirements. When prioritizations




are on an ordinal or absolute scale, this does not introduce any major problems, because the new or changed requirement just needs to be assigned a value, or a correct place in the ranking. Such iterations of the numerical assignment technique have been used successfully [24].

When using prioritization on a ratio scale (such as AHP), the situation becomes more intricate, since all requirements are supposed to be compared to all others to establish the correct relative priorities. However, it is possible to shorten this process by comparing new or changed requirements with selected reference requirements and thereby estimating their relative value. Admittedly, this means that the original procedure is not followed and the result may differ from a complete re-prioritization, even though the cost versus benefit of such a shortcut may be good enough. Cost and benefit must be taken into consideration when selecting a prioritization technique. It is also important not to forget that the priorities of already implemented requirements can change, especially for non-functional requirements.
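The reference-requirement shortcut described above can be sketched as follows (a simplified illustration with invented weights and names; a full re-prioritization would redo all pairwise comparisons rather than a single reference comparison):

```python
# Existing ratio-scale priorities (invented; they sum to 1).
priorities = {"R1": 0.50, "R2": 0.30, "R3": 0.20}

def add_requirement(priorities, name, reference, judgement):
    """Estimate the weight of a new requirement by comparing it against
    one reference requirement only. `judgement` says how many times more
    important `name` is than `reference` (< 1 means less important)."""
    est = priorities[reference] * judgement          # estimated raw weight
    total = sum(priorities.values()) + est
    updated = {k: v / total for k, v in priorities.items()}
    updated[name] = est / total                      # renormalise to sum to 1
    return updated

# A new requirement R4 judged half as important as R2.
new = add_requirement(priorities, "R4", reference="R2", judgement=0.5)
print(new["R4"])   # about 0.13 of the total after renormalisation
```

This keeps the effort per change constant instead of requiring another n(n-1)/2 comparisons, at the price of the approximation discussed above.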

iii. Non-Functional Requirements

Non-functional requirements are concerned with the quality attributes of the system: they characterize the system and how well it performs, rather than what it does. Non-functional requirements are sometimes defined as constraints on quality metrics such as:

(i) Cost
(ii) Classification
(iii) Storage
(iv) Configuration
(v) Performance
(vi) Security
(vii) Flexibility
(viii) Usability

5. CONCLUSION AND FUTURE WORK

Requirements engineering is an area with a great deal of research activity. Nevertheless, the available work in the area of requirements prioritization is still limited, even though the need for prioritizing software requirements is acknowledged in the research literature.

In particular, few empirical validations of different prioritization techniques and methods exist. Instead, it is common that new techniques and methods are introduced and seem to work well, but the scalability of the approach has not been tested (e.g. [9]). However, there do exist some studies that have evaluated different prioritization techniques.

Unfortunately, such empirical evaluations most often focus on toy systems with a few requirements; this does not really provide any evidence of whether one approach is better than another, even though some preliminary evidence can be obtained. One of the few industrial studies, for example, found that AHP was not usable with more than 20 requirements, because the number of comparisons became too large for the practitioners [21]. Hence, more studies are needed where prioritization methods are used in industry. A further question that is rarely addressed in requirements prioritization research is how much precision is actually needed. Many techniques and methods are developed, and they become more and more complex with the aim of giving more support to practitioners, but the results are rarely used in industry. Instead, practitioners use simple methods such as numerical assignment. Practitioners live in different environments than experimental subjects and are more constrained by time and cost [4]. Hence, an important question to answer is how much precision is really necessary and desired by practitioners. The above issues lead to one more open question concerning when a technique or method is suitable. Available empirical studies rarely discuss factors such as company size, time-to-market constraints, number of stakeholders, domain, etc. Instead, the focus is on whether one technique or method is better than another. A sounder approach would be to test different approaches in a variety of environments to gain some understanding of when different prioritization techniques, aspects, etc. are suitable. In [25], a framework for evaluating pair programming is suggested, and independent (e.g. technique), dependent (e.g. quality), and context variables (e.g. type of task)




are proposed for evaluating programming techniques. A similar framework for requirements prioritization would be valuable. One more important difficulty in the area of requirements prioritization is requirement dependencies: the impact of dependencies can be tremendous. For example, prioritization techniques (such as AHP) assume that requirements are independent, even though we know that they rarely are [26]. We need to find better ways to handle dependencies in an efficient way.

(i) Functional and non-functional requirements are very different, even though they have a serious impact on each other. Prioritizing the two entirely together, or entirely separately, may not be the best solution. Approaches where prioritizations of functional and non-functional requirements can be combined in an efficient way are needed. Several methods that seem suitable for prioritizing non-functional requirements are available, and it would be interesting to evaluate these empirically in industrial settings.

(ii) Another direction is to find ways to unite such approaches with approaches more oriented towards functional requirements.

REFERENCES

[1] A. Aurum and C. Wohlin. The Fundamental Nature of Requirements Engineering Activities as a Decision-Making Process. Information and Software Technology, 45(14): 945-954, (2003).

[2] K. Beck. Extreme Programming Explained. Addison-Wesley, Upper Saddle River, (1999).

[3] G. Ruhe. Software Engineering Decision Support - A New Paradigm for Learning Software Organizations. Advances in Learning Software Organizations, Lecture Notes in Computer Science, Springer-Verlag, 2640: 104-115, (2003).

[4] G.G. Schulmeyer and J.I. McManus. Handbook of Software Quality Assurance, 3rd Edition. Prentice Hall, Upper Saddle River, (1999).

[5] J. Karlsson. A Systematic Approach for Prioritizing Software Requirements. Ph.D. Thesis, Linköping Institute of Technology, (1998).

[6] I. Sommerville and P. Sawyer. Requirements Engineering - A Good Practice Guide. John Wiley and Sons, Chichester, (1997).

[7] K. Wiegers. Software Requirements. Microsoft Press, Redmond, (1999).

[8] A.C. Yeh. Requirements Engineering Support Technique (REQUEST) - A Market Driven Requirements Management Process. Proceedings of Second Symposium, (1992).

[9] G. Ruhe, A. Eberlein and D. Pfahl. Quantitative WinWin - A New Method for Decision Support in Requirements Negotiation. Proceedings of the 14th International Conference on Software Engineering and Knowledge Engineering (SEKE'02), ACM Press, New York, pp. 159-166, (2002).

[10] B.W. Boehm. Software Engineering Economics. Prentice Hall, Englewood Cliffs, (1981).

[11] F.P. Brooks. The Mythical Man-Month: Essays on Software Engineering. Addison-Wesley Longman, Boston, (1995).

[12] S. Lauesen. Software Requirements - Styles and Techniques. Pearson Education, Essex, (2002).

[13] L. Lehtola, M. Kauppinen and S. Kujala. Requirements Prioritization Challenges in Practice. Proceedings of the 5th International Conference on Product Focused Software Process Improvement, (2004).

[14] L.A. Maciaszek. Requirements Analysis and System Design - Developing Information Systems with UML. Addison-Wesley, London, (2001).

[15] J.M. Nicholas. Project Management for Business and Technology - Principles and Practice, 2nd Edition. Prentice Hall, Upper Saddle River, (2001).

[16] N.E. Fenton and S.L. Pfleeger. Software Metrics - A Rigorous and Practical Approach, 2nd Edition. PWS Publishing Company, Boston, (1997).

[17] T.L. Saaty. How to Make a Decision: The Analytic Hierarchy Process. European Journal of Operational Research, 48(1): 9-26, (1990).

[18] H. Min. Selection of Software: The Analytic Hierarchy Process. International Journal of Physical Distribution and Logistics Management, 22(1): 42-52, (1992).

[19] E.E. Karsak and C.O. Ozogul. An Integrated Decision Making Approach for ERP System Selection. Expert Systems with Applications, 36(1): 660-667, (2009).

[20] T. Gorschek. Software Process Assessment & Improvement in Industrial Requirements Engineering. Licentiate Thesis, Blekinge Institute of Technology, (2004).

[21] L. Lehtola and M. Kauppinen. Empirical Evaluation of Two Requirements Prioritization Methods in Product Development Projects. Proceedings of the European Software Process Improvement Conference (EuroSPI 2004), Springer-Verlag, Berlin Heidelberg, pp. 161-170, (2004).

[22] B. Regnell, M. Höst, J. Natt och Dag, P. Beremark and T. Hjelm. An Industrial Case Study on Distributed Prioritization in Market-Driven Requirements Engineering for Packaged Software. Requirements Engineering, 6(1): 51-62, (2001).

[23] D. Greer and G. Ruhe. Software Release Planning: an Evolutionary and Iterative Approach. Information and Software Technology, 46(4): 243-253, (2004).

[24] A.M. Davis. The Art of Requirements Triage. IEEE Computer, 36(3): 42-49, (2003).

[25] H. Gallis, E. Arisholm and T. Dybå. An Initial Framework for Research on Pair Programming. Proceedings of the 2003 International Symposium on Empirical Software Engineering (ISESE'03), IEEE Computer Society, Los Alamitos, pp. 132-142, (2003).

[26] B. Regnell, B. Paech, A. Aurum, C. Wohlin, A. Dutoit and J. Natt och Dag. Requirements Mean Decisions! - Research Issues for Understanding and Supporting Decision Making in Requirements Engineering. First Swedish Conference on Software Engineering Research and Practice (SERP'01): Proceedings, Blekinge Institute of Technology, Ronneby, pp. 49-52, (2001).


Mohd. Soban Siddiqui and Ali Imam Abidi


Comparative study of different classification techniques using Weka tool

MOHD. SOBAN SIDDIQUI and ALI IMAM ABIDI
Department of Computer Science

Al-Falah University, Faridabad, Haryana, India
E-mail: [email protected]

ABSTRACT

In today's world, datasets have become very large, creating a great need for data mining techniques that can generate meaningful knowledge. Data mining is a fruitful technique for extracting useful information from large amounts of data stored in databases. Data mining tools address large problems with techniques such as classification, clustering, association rules, and neural networks. Classification techniques are a very effective way to classify data, which is essential in the decision-making process for large problems. Solving such problems effectively with classification techniques has been a challenge for researchers. This research discusses the classification algorithms J48, Random Forest, IBk, KStar, and Naïve Bayes. In this paper, we compare the percentage of correctly classified instances, MAE, RMSE, Kappa statistics, and error-rate measurements for the different classifiers in Weka using 10-fold cross-validation; the results indicate that no single classification algorithm is best for all databases. It depends on the dataset.

Key words: Data Mining, Classification algorithms, Weka tool.

Global Sci-Tech, 10 (4) October-December 2018; pp. 200-208 DOI No.: 10.5958/2455-7110.2018.00029.0

1. INTRODUCTION

Nowadays, a large amount of data is being gathered and stored in databases everywhere across the world, and the volume keeps increasing year after year. That is, over 1,099,511,627,776 bytes of data, and precious information and knowledge are hidden in these databases. It is practically impossible to mine them to extract information without automated methods. Over the years, a number of algorithms have been created to extract "nuggets of knowledge" from large sets of data. Recently, much data mining research has been done in various domains, such as mobile commerce.

Data mining is a technique that uses data-analysis tools to discover previously unknown, valid patterns and relationships in

large data sets. Data mining tools can include statistical models, mathematical algorithms, and machine learning methods. Consequently, data mining consists of more than collecting and managing data; it also includes analysis and prediction. Classification is one of the most important tasks in data mining; it can process a wider variety of data than regression and is growing in popularity. There are several areas of machine learning (ML), one of the most significant of which is data mining. People often make mistakes during analysis, or when trying to establish relationships between multiple features, which makes it difficult for them to find solutions to certain problems. Machine learning can solve these kinds of problems, improving the efficiency of systems and the design of machines.




2. LITERATURE REVIEW

The authors analysed different classification algorithms and provided a comprehensive theoretical treatment, aimed at mining the relationships in diabetes data for efficient classification; however, a model that can diagnose the diabetes dataset was still needed [1].

In the paper "Comparative Analysis of Data Mining Tools and Classification Techniques using WEKA in Medical Bioinformatics", the authors studied the performance of Random Forest, the J48 decision tree, Naïve Bayes, and Lazy IBk. They compared the classification algorithms based on their accuracy, learning time, and error rate. They found that the execution time for building the tree model is directly related to the volume of data records, while it is indirectly related to the attribute size of the data sets. Throughout the experiments, they observed that the Bayesian algorithms have better accuracy than the other compared algorithms [2].

In the article "Evaluating Performance of Data Mining Classification Algorithm in Weka", the authors evaluate the performance of data mining classification algorithms on different datasets using the machine learning tool Weka. The main objective of the paper is to judge the performance of different data mining classification algorithms on various datasets [3].

In the article "Performance Analysis and Evaluation of Different Data Mining Algorithms used for Cancer Classification", the authors made a comprehensive comparative analysis of fourteen different classification algorithms, evaluating their performance on three different cancer data sets. The results indicate that no single algorithm beats all others in terms of accuracy when applied to all three data sets. Most of the algorithms performed better as the size of the data set increased. They concluded that one should not stick to a particular classification method, but should evaluate different classification algorithms and select the best one [4].

Paper [5] deals with recognizing students' mood during online self-assessment tests. The authors used exponential logic and its formulas for computation. A student's previous answers and slide-bar status are taken as input; the total number of questions in the online self-assessment test, the student's goal, and the slide-bar value are used as variables in the exponential logic. The system identifies the student's current mood and gives appropriate feedback. A limitation of this system is that students select their mood manually using a slide bar, without any automation.

The authors of [6] worked on improving aspect-level opinion mining for online customer reviews. The Joint Aspect/Sentiment (JAS) model is used to extract aspects and aspect-dependent sentiment lexicons from online customer reviews in a unified framework, using a Gibbs sampling algorithm. Paper [4] presents a novel weakly supervised cybercriminal-network mining method which can uncover both explicit and implicit relationships among cybercriminals based on their conversational messages posted on online social media. Two types of semantics, transactional and collaborative relationships among cybercriminals, were mined using a context-sensitive Gibbs sampling algorithm. The authors used a probabilistic generative model to extract multi-word expressions describing the two types of cybercriminal relationships in unlabelled messages, and concept-level approaches to better grasp the implicit semantics associated with the text.

In [7], the authors applied different classification algorithms to different databases. They used three datasets downloaded from the UCI Machine Learning Repository and four classifier algorithms: J48, Multilayer Perceptron, BayesNet, and Naïve Bayes Updateable. Their work evaluates the performance of the above algorithms [7].
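The 10-fold cross-validation used in these studies partitions the data into ten disjoint folds, trains on nine, and tests on the held-out fold in turn. A minimal sketch of the fold construction (pure Python; the dataset size of 150 is invented for the example):

```python
import random

def kfold_indices(n_samples, k=10, seed=0):
    """Return k disjoint test folds that together cover every sample
    index exactly once, after a seeded shuffle."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = kfold_indices(150, k=10)          # e.g. an iris-sized dataset
for test_fold in folds:
    train = [i for f in folds if f is not test_fold for i in f]
    # fit the classifier on `train`, evaluate on `test_fold`,
    # then average the per-fold scores to get the reported accuracy
```

Weka performs this automatically in its Classify panel; the sketch only shows what the "10-fold" partitioning means.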

Tiwari et al. [8], in their paper "Performance analysis of Data mining algorithms in Weka", set out to find the accuracy of different data mining algorithms on various data sets.

Bin Othman et al. [9] compared different classification techniques using WEKA for breast cancer. The objective of their paper is to carry




out a performance comparison of different classification and clustering methods on a large data set. The methods tested were the Bayes Network, Radial Basis Function, Pruned Tree, Single Conjunctive Rule Learner, and Nearest Neighbours algorithms. The best algorithm on the breast cancer data was the Bayes Network classifier, with an accuracy of 89.71% and a model-building time of 0.19 seconds. The Bayes Network classifier also had the lowest average error, at 0.2140, compared to the others.

Fifteen attributes of real medical data were collected from the dataset. Classification is a sequence of processes used to classify the data; pre-processing is used to cleanse the data for effective classification. In this paper, the author studied two machine learning algorithms, C4.5 and Naïve Bayes. The datasets were divided into three different ratios based on the average and standard deviation of each factor of both classes, and the accuracy was evaluated. After evaluating the accuracy, the author concluded that C4.5 gives better accuracy than Naïve Bayes, because it achieves higher accuracy in less time [10].

In his paper, the author reviewed classification algorithms in data mining for liver disease disorders and compared two algorithms, FT Growth and Naïve Bayes, to find out which gives better accuracy. After analysing the results, the author found that Naïve Bayes (75.54%) gives better accuracy than the FT Growth algorithm (72.66%) using the WEKA tool. 29 datasets with 12 different attributes were used in this study [24].

3. CLASSIFICATION TECHNIQUES

Constructing accurate and efficient classifiers for massive databases is one of the essential tasks of data mining and machine learning research. Classification techniques play a major role in every field, such as medicine, artificial intelligence, and supermarkets. There

are several classification methods available in data mining, such as decision-tree-based algorithms, rule-based algorithms, Naïve Bayesian algorithms, nearest-neighbour algorithms, neural networks, rough sets, support vector machines, distance-based methods, associative classification, and genetic algorithms. This study focuses on the following five classification techniques.

3.1 J48

The J48 algorithm is an optimized implementation, or improved version, of C4.5. The output of J48 is a decision tree. A decision tree is a tree structure with different kinds of nodes: a root node, intermediate nodes, and leaf nodes. Each internal node contains a decision, and following these decisions leads to the result, hence the name decision tree. A decision tree partitions the input space of a data set into mutually exclusive regions, each of which is assigned a label, a value, or an action describing its data points. The decision tree uses a splitting-criterion method to determine which attribute is best for splitting the portion of the training data that reaches a particular node [11].
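The splitting criterion mentioned above can be illustrated with the entropy-based information gain that C4.5/J48 builds on (a toy example with invented data; note that J48 actually normalises this into a gain ratio rather than using raw gain):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain from splitting `rows` (dicts of attribute values) on `attr`:
    parent entropy minus the size-weighted entropy of the partitions."""
    parent = entropy(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    child = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return parent - child

# Tiny invented dataset: should we play tennis, given the outlook?
rows = [{"outlook": "sunny"}, {"outlook": "sunny"},
        {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, "outlook"))   # 1.0: a perfect split
```

At each node, the attribute with the highest gain (or gain ratio, in C4.5) is chosen as the split, and the procedure recurses on each partition.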

3.2 K-Nearest-Neighbour (KNN)

KNN is a non-parametric classification method which is simple but effective in many cases [12]. KNN has two major drawbacks: (1) low efficiency, since being a lazy learning method prohibits its use in applications such as dynamic web mining over large repositories; and (2) its dependency on the selection of a good value for K. Several approaches [13, 14] significantly reduce the computation required at query time, for example by indexing training examples. Because the KNN classifier requires storing the whole training set, when this is too costly, several methods exploit the redundancy of the training set to alleviate the problem [15, 16, 17, 18]. Condensed Nearest Neighbour (CNN) minimizes the number of stored patterns by storing only a subset of the training set for classification. Additionally, the Reduced Nearest Neighbour (RNN) rule was proposed, which aims to further reduce the stored subset after CNN has been applied, by simply removing those elements from the subset whose removal does not cause an error [19].
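The basic KNN decision rule reads roughly as follows (a self-contained sketch on invented toy data; as a lazy learner it stores the entire training set, which is exactly the storage drawback discussed above):

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify point x by majority vote among its k nearest training
    points, using squared Euclidean distance (order-equivalent)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, x)), y)
        for p, y in zip(train_X, train_y)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Invented 2-D toy data: two well-separated clusters.
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (2, 2)))   # 'a' (near the first cluster)
print(knn_predict(X, y, (8, 7)))   # 'b'
```

There is no training phase at all: every query scans the whole training set, which is why indexing and set-condensation methods such as CNN/RNN matter at scale.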


203

Comparative study of different classification techniques using weka tool

3.3 Random Forest

Random Forest is an ensemble technique that aggregates many decision-tree predictors. It uses a divide-and-conquer approach to improve efficiency: several small, weak learners are combined into one strong learner, and as a strong learner it can run efficiently on large databases. The error rate of a random forest depends on two main factors: the correlation between trees and the strength of each individual tree[20]. The higher the correlation between trees, the higher the error rate; the greater the strength of the individual trees, the lower the error rate. Some key features of Random Forest are listed below:
1. It is unexcelled in accuracy among current algorithms.
2. It is very effective for large databases.
3. It can handle thousands of input variables without variable deletion.
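The two error-rate factors above map directly onto Random Forest hyperparameters, which this sketch makes explicit. Assumptions: scikit-learn's RandomForestClassifier stands in for WEKA's implementation, and the bundled breast-cancer dataset stands in for the paper's databases.

```python
# Sketch: a random forest as an ensemble of weak tree learners.
# max_features controls feature subsampling, which DEcorrelates the trees
# (lower correlation -> lower error rate); n_estimators controls how many
# weak learners are combined into the strong learner.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=100,      # number of weak learners in the ensemble
    max_features="sqrt",   # per-split feature subsampling decorrelates trees
    random_state=42)

accuracy = cross_val_score(forest, X, y, cv=10).mean()
```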

3.4 K-Star

K-Star is an instance-based learner (IBL) that tries to improve performance in the presence of missing values and smoothness problems, handling both real-valued and symbolic attributes; less is known, however, about how it copes with attribute and class noise, or with mixed attribute values in a dataset. In instance-based classification, "each new instance is compared with existing ones using a distance metric, and the closest existing instance is used to assign the class to the new one"[21]. The principal difference between K* and other IB algorithms is its use of the entropy concept to define the distance metric, computed as the complexity of transforming one instance into another; this takes into account the probability of the transformation occurring in a "random walk" manner. Classification with K* is performed by summing the probabilities from the new instance to all members of a category; the same is done for the remaining categories, and the category with the highest probability is finally selected[22].

4. RESEARCH METHODOLOGY

The research procedure followed in this work is shown in Fig. 1. First, the data are pre-processed by removing noise and outliers (e.g. transmission errors). The datasets were downloaded from the WEKA dataset repository[23]. Data pre-processing is an important step in the data mining process, since poor data quality is one of the main challenges facing knowledge discovery in databases. The different data mining techniques were then applied to the different databases, their performance was compared, and a comprehensive account is provided.

5. EVALUATION METRICS

In selecting the appropriate algorithms and parameters that best model the different dataset variables, the following performance metrics were used:

5.1 Time : The time required to complete training or modelling of a dataset, measured in seconds.

5.2 Kappa Statistic : A measure of the degree of non-random agreement between observers or measurements of the same categorical variable.

5.3 Mean Absolute Error : The average of the absolute differences between predicted and actual values over all test cases; it is the average prediction error.

5.4 Mean Squared Error : One of the most commonly used measures of success for numeric prediction, computed as the average of the squared differences between predicted and actual values.

Fig. 1. Research Methodology


Mohd. Soban Siddiqui and Ali Imam Abidi


Fig. 3. Accuracy Details of J48

The root mean-squared error is simply the square root of the mean-squared error; taking the square root gives the error value the same dimensionality as the actual and predicted values.

5.5 Root Relative Squared Error : The relative squared error is the total squared error divided by the squared error that would have been obtained if the prediction had simply been the average of the actual values. As with the root mean squared error, the square root of the relative squared error is taken to give it the same dimensions as the predicted values.

5.6 Relative Absolute Error : The total absolute error divided by the absolute error that would have been obtained if the prediction had simply been the average of the actual values.
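The error metrics of Sections 5.3 to 5.6 can be computed in a few lines of NumPy. The actual/predicted vectors below are illustrative values, not results from the paper's experiments.

```python
# Sketch of the error metrics above for numeric prediction,
# on illustrative values (not from the paper's experiments).
import numpy as np

actual    = np.array([3.0, -0.5, 2.0, 7.0])
predicted = np.array([2.5,  0.0, 2.0, 8.0])

mae  = np.mean(np.abs(predicted - actual))   # mean absolute error
mse  = np.mean((predicted - actual) ** 2)    # mean squared error
rmse = np.sqrt(mse)                          # root mean squared error

# The "relative" errors compare against the trivial baseline that always
# predicts the mean of the actual values.
baseline = np.mean(actual)
rae  = np.sum(np.abs(predicted - actual)) / np.sum(np.abs(baseline - actual))
rrse = np.sqrt(np.sum((predicted - actual) ** 2)
               / np.sum((baseline - actual) ** 2))
```

Values of RAE/RRSE below 1 mean the model beats the mean-prediction baseline.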

6. RESULT AND DISCUSSION

An experimental comparison of the classification techniques was carried out in WEKA. We applied all five data mining techniques to different databases and compared their performance on different numbers of instances. All techniques were run in the WEKA machine learning tool using 10-fold cross-validation. J48 achieved its highest accuracy, 96.3%, on the 'vote' database, as shown in Fig. 2.

Figure 3 shows the performance details of J48 when applied to the vote database. The confusion matrix shows the correctly classified instances, i.e. 259 + 160 = 419. When Random Forest was applied to the different datasets in WEKA, it achieved its highest accuracy, 96.09%, on the same 'vote' database, which is lower than J48 in terms of classification success rate. For the remaining databases, however, Random Forest achieved better or equal accuracy, as shown in Fig. 4 below.
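The 96.3% figure can be recovered arithmetically from the confusion-matrix counts just quoted, assuming the standard UCI 'vote' dataset size of 435 instances (the total is not restated in the text).

```python
# Arithmetic check: 259 + 160 correctly classified instances out of the
# 435 instances in the UCI congressional 'vote' dataset.
correct = 259 + 160
total = 435                      # assumed UCI 'vote' dataset size
accuracy = correct / total       # fraction correctly classified
percent = round(accuracy * 100, 1)
```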

Figure 5 shows the detailed accuracy of Random Forest when applied to the 'vote' database.

Fig. 2. Graphical representation of J48 performance


Fig. 4. Graphical representation of Random Forest performance

When Naïve Bayes was applied to the different databases, it achieved its highest accuracy, 96%, on the iris database. The J48 classifier achieved the same accuracy on the same database, which shows that for the iris database Naïve Bayes and J48 are equally optimal classifier algorithms. Figure 6 gives a graphical representation of the performance of Naïve Bayes on the 'iris' database.

Figure 7 shows the detailed accuracy of Naïve Bayes when applied to the 'iris' database.

Fig. 5. Accuracy Details of Random forest

Fig. 6. Graphical representation of Naïve Bayes performance


Fig. 7. Accuracy details of Naïve Bayes

Fig. 8. Graphical representation of IBK performance

When IBK was applied to the different databases, it achieved its highest accuracy, 95.33%, on the 'iris' database, as shown in Fig. 8, while K-Star achieved its highest accuracy, 94.6%, on the same database, as shown in Fig. 9.

The classification success rates of the different classifier algorithms fluctuate considerably across databases: for example, K-Star gives better accuracy than IBK on the 'glass' database, whereas IBK gives better accuracy than K-Star on the 'iris' database, as shown in Figs. 8 and 9. No single classifier is therefore best for all databases; the choice depends on the database.

7. CONCLUSION

The objective of this study was to evaluate and analyse five selected classification algorithms in WEKA.


Fig. 9. Graphical representation of K-Star performance

J48 achieved an accuracy of 96% on the iris database, and Naïve Bayes also gave its highest accuracy, 96%, on the same database. Random Forest gave its highest accuracy, 96.09%, on the vote database. IBK and K-Star achieved their highest accuracies on the iris database, 95.33% and 94.6% respectively, both less accurate than J48. These techniques are used in healthcare units all over the world, and future work will aim to improve the performance of these classification algorithms.

We used data mining classifiers to compare performance across different databases. Data mining offers promising ways to uncover hidden patterns within large amounts of data, and these hidden patterns can be used to predict future behaviour. The accessibility of data mining algorithms, however, should be met with caution. First of all, these techniques are only as good as the data that have been collected: good data is the first requirement for good data exploration. Assuming good data are available, the next step is to choose the most appropriate technique to mine the data. There are trade-offs to consider when choosing the data mining technique for a given application, and definite differences in the types of problems suited to each technique. The "best" model is often found by trial and error, trying different technologies and algorithms; often the data analyst must compare or even combine the available techniques in order to obtain the best possible results.

Future work will focus on using other classification algorithms of data mining. It is a known fact that the performance of an algorithm depends on the domain and the type of the data set; hence the use of other classification algorithms, including machine learning approaches, will be explored in future.

REFERENCES

[1] K. Rajesh and V. Sangeetha. "Application of data mining methods and techniques for diabetes diagnosis". International Journal of Engineering and Innovative Technology (IJEIT), 2(3), (2012).

[2] S.K. David, A.T.M. Saeb and K. Al Rubeaan. "Comparative Analysis of Data Mining Tools and Classification Techniques using WEKA in Medical Bioinformatics". Computer Engineering and Intelligent Systems, 4(13): 28-38, (2013).

[3] N.N. Salvithal and R.B. Kulkarni. "Evaluating Performance of Data Mining Classification Algorithm in Weka", 2(10), (2013).

[4] G.K.M. Nookala, B.K. Pottumuthu, N. Orsu and S.B. Mudunuri. "Performance analysis and evaluation of different data mining algorithms used for cancer classification". International Journal of Advanced Research in Artificial Intelligence (IJARAI), 2(5), (2013).

[5] C.N. Moridis and A.A. Economides. "Mood Recognition during Online Self-Assessment Tests". IEEE Transactions on Learning Technologies, 2(1), (2009).

[6] X. Xueke, C. Xueqi, T. Songbo, L. Yue and S. Huawei. "Aspect-Level Opinion Mining of Online Customer Reviews". Key Laboratory of Web Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China. China Communications, (2013).

[7] V. Vaithiyanathan, K. Rajeswari, K. Tajane and R. Pitale. "Comparison of Different Classification Techniques Using Different Datasets", 6(2), (2013).

[8] M. Tiwari, M.B. Jha and O.P. Yadav. "Performance analysis of Data Mining algorithms in Weka". IOSR Journal of Computer Engineering (IOSRJCE), ISSN 2278-0661, 6(3), (2012).

[9] M.F. Bin Othman and T.M.S. Yau. "Comparison of different classification techniques using WEKA for breast cancer". 3rd Kuala Lumpur International Conference on Biomedical Engineering 2006, Springer Berlin Heidelberg, (2007).

[10] A.S. Aneesh Kumar and C. Jothi Venkateswaran. "Estimating the Surveillance of Liver Disorder using Classification Algorithms". International Journal of Computer Applications, 57(6), (2012).

[11] V. Vaithiyanathan, K. Rajeswari, K. Tajane and R. Pitale. "Comparison of Different Classification Techniques Using Different Datasets", 6(2), (2013).

[12] D. Hand, H. Mannila and P. Smyth. Principles of Data Mining. The MIT Press, (2001).

[13] T. Mitchell. Machine Learning. McGraw-Hill, (1997).

[14] C.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, UK, (1995).

[15] P. Hart. "The Condensed Nearest Neighbour Rule". IEEE Transactions on Information Theory, 14: 515-516, (1968).

[16] G. Gates. "The Reduced Nearest Neighbour Rule". IEEE Transactions on Information Theory, 18: 431-433, (1972).

[17] E. Alpaydin. "Voting Over Multiple Condensed Nearest Neighbors". Artificial Intelligence Review, Kluwer Academic Publishers, 11: 115-132, (1997).

[18] M. Kubat and M. Cooperson Jr. "Voting Nearest-Neighbour Sub-classifiers". Proceedings of the 17th International Conference on Machine Learning, ICML-2000, Stanford, CA, 2: 503-510, (2000).

[19] N. Friedman, D. Geiger and M. Goldszmidt. "Bayesian Network Classifiers". Machine Learning, 29: 131-163, (1997).

[20] A. Liaw and M. Wiener. "Classification and regression by randomForest". R News, 2(3): 18-22, (2002).

[21] I.H. Witten, E. Frank and M.A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Elsevier, (2011).

[22] J. Cleary and L. Trigg. "K*: An Instance-based Learner Using an Entropic Distance Measure". 12th International Conference on Machine Learning, 108-114, (1995).

[23] http://storm.cis.fordham.edu/~gweiss/data-mining/datasets.html

[24] S. Dhamodharan. "Liver Disease Prediction Using Bayesian Classification". 4th National Conference on Advanced Computing, Applications and Technologies, Special Issue, (2014).


An efficient machine learning model for classification of liver patient diseases

TAJAMUL MAQBOOL
Knowlarity Communications Private Limited, Gurgaon, Haryana

E-mail: [email protected]

ABSTRACT

Data mining methods and techniques play a very important role in healthcare these days. Owing to changed lifestyles, diseases are becoming common, and among them liver-related diseases are one of the fastest growing. Many researchers have contributed to this field by applying various classification techniques, such as Decision Tree, Support Vector Machine, Naïve Bayes and Artificial Neural Network (ANN), to classify such data. These techniques enable timely and accurate classification, making disease prediction much easier and in turn leading to better patient care. In this research work we provide a comparative analysis of various classification methods on a data set taken from the UCI repository. We have improved the prediction accuracy by applying pre-processing, feature selection and several other methods to the data. Of all the classification techniques, Random Forest with recursive feature elimination as the feature selection technique has the highest accuracy.

Key words: Classification, Data Mining, Decision Tree, Random Forest, Feature Selection.

Global Sci-Tech, 10(4), October-December 2018, pp. 209-216. DOI No.: 10.5958/2455-7110.2018.00030.7

1. INTRODUCTION

The liver is the largest internal organ in the human body; it plays a significant role in metabolism and serves many important functions. It is the largest glandular organ of the body, weighing around 3 lb (roughly 1.36 kg) in a normal person. Almost every organ of the body is supported by the liver, and it is vital for our very survival[6]. Liver disease can be defined as any disturbance of liver function that causes illness. If the liver becomes infected, diseased or injured, the loss of the functions it performs can cause major damage to the body. Liver disease is also referred to as hepatic disease. Symptoms of liver disease include vomiting, nausea, right-upper-quadrant abdominal pain, fatigue and weakness, and digestion is hard hit when there are liver problems. Early identification or diagnosis of a liver complication can result in more effective treatment and better chances of survival. Classification (Han, J. et al., 2006)[2] is the process of finding a model that describes and distinguishes data categories or concepts, for the purpose of using the model to predict the category of objects whose class label is unknown.

Healthcare is an important issue in human life, with most people facing various diseases; liver disease is a very serious condition faced by many. Machine learning techniques are very effective in classifying liver and non-liver patients with good accuracy, and classification is one of the simplest and most effective techniques for judging liver data. In this paper, we provide a comparative analysis of various


classification methods on a data set taken from the UCI repository. We improved the prediction accuracy by applying pre-processing, feature selection and several other methods to the data. Of all the classification techniques, Random Forest with recursive feature elimination as the feature selection technique has the highest accuracy.

2. LITERATURE SURVEY

Many authors have worked in the field of classification for liver patient data. H. Jin et al. (2014) proposed "Decision Factors on Effective Liver Patient Data Prediction", evaluated on the Indian Liver Patient Dataset (ILPD)[4]. P. Saxena et al. (2013) proposed "Analysis of Various Clustering Algorithms of Data Mining on Health Informatics"[5]. A. Gulia et al. (2014) proposed "Liver Patient Classification using Intelligent Techniques"[6].

That work evaluated the results of various classification algorithms with and without the application of a feature selection technique. The results obtained show that the Support Vector Machine algorithm offers better performance, with an accuracy of around 71%, compared to the other algorithms when evaluated without feature selection, while the Random Forest algorithm offers better performance, with an accuracy of around 72%, when evaluated with feature selection. The authors stated that future work would include the development of a hybrid model for classification of healthcare data, and proposed the use of newer algorithms with the aim of achieving better performance than the techniques used. J. Pahareeya et al. (2014) proposed "Liver Patient Classification using Intelligence Techniques"[8]. S. Bahramirad et al. (2013) proposed "Classification of Liver Disease Diagnosis: A Comparative Study"[12]. E.M. Hashem et al. (2013) proposed "A Study of Support Vector Machine Algorithm for Liver Disease Diagnosis"[10]. C. Liang et al. (2013) proposed "An Automatic Diagnosis System of Liver Disease using Artificial Immune and Genetic Algorithms"[11]. Suryakant et al. (2015) proposed "An Improved K-means Clustering with Atkinson Index to Classify Liver Patient Dataset"[9]. Dr. S. Vijayarani et al. (2015) proposed "Liver Disease Prediction using SVM and Naïve Bayes Algorithms"[7]. B.V. Ramana et al. (2012) proposed "Liver Classification using Modified Rotation Forest"[17]. Reetu et al. (2015) proposed "Medical Diagnosis for Liver Cancer Using Classification Techniques"[13]. S. Dhamodharan (2014) proposed "Liver Disease Prediction using Bayesian Classification"[14]. H.R. Kiruba et al. (2014) proposed "An Intelligent Agent Based Framework for Liver Disorder Diagnosis Using Artificial Intelligence Techniques"[15]. A.S. Aneesh Kumar et al. (2012) proposed "Estimating the Surveillance of Liver Disorder using Classification Algorithms"[16]; they worked with Naïve Bayesian and C4.5 decision tree algorithms for classification of liver patients, using a real medical data set with 15 different attributes collected from a public charitable hospital in Chennai. That work gives a binary classification: either an individual is a liver patient or not. The results obtained show that Random Forest offers better performance than the other techniques in classification of the liver patient dataset.

3. PROPOSED METHODOLOGY

3.1 Machine Learning Models:

Decision Tree : A decision tree is a tree-like structure in which each internal node represents an attribute, each branch represents a decision and each leaf node represents an outcome. The root and internal nodes of the tree contain feature test conditions that separate tuples with dissimilar characteristics. Once the tree has been constructed, it becomes very easy to classify the test records[1].

Support Vector Machine : SVM is a very powerful supervised machine learning model used for both regression and classification. SVM works by finding the best hyperplane that divides the dataset into two classes; the best hyperplane is selected as the one with the maximum margin[18].
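The maximum-margin idea above can be sketched with scikit-learn's SVC. Assumptions: the bundled breast-cancer dataset stands in for the paper's liver data, and a linear kernel is used so that the decision boundary is literally a separating hyperplane.

```python
# Sketch: a maximum-margin SVM classifier. With kernel="linear" the model
# finds the hyperplane separating the two classes with the widest margin.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42)

scaler = StandardScaler().fit(X_train)   # SVMs are scale-sensitive
svm = SVC(kernel="linear", C=1.0)
svm.fit(scaler.transform(X_train), y_train)
accuracy = svm.score(scaler.transform(X_test), y_test)
```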

Page 29: ISSN 0975-9638 (Print) Global Sci-Tech - Al-Falah Universityalfalahuniversity.edu.in/wp-content/uploads/2019/03/Global-Sci-Tech-104-2018.pdfGlobal Sci - Tech Al-Falah’s Journal of

211

Comparative study of different classification techniques using weka tool

K Nearest Neighbour : KNN is one of the simplest and most widely used machine learning classifiers. It can be used for both regression and classification problems. It is a very simple algorithm that classifies new cases using distance functions. The distance functions commonly used in KNN are the Euclidean, Manhattan and Minkowski distances, which apply to continuous variables; for categorical variables the Hamming distance is used.
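The distance functions just named are related: Minkowski with p = 1 is the Manhattan distance and with p = 2 is the Euclidean distance, while Hamming counts mismatched categorical positions. A small sketch with made-up points:

```python
# Sketch of the KNN distance functions named above.
import numpy as np

def minkowski(a, b, p):
    # Minkowski distance; p=1 -> Manhattan, p=2 -> Euclidean
    return np.sum(np.abs(a - b) ** p) ** (1 / p)

def hamming(a, b):
    # Number of positions where the categorical values differ
    return sum(x != y for x, y in zip(a, b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

euclidean = minkowski(a, b, 2)   # sqrt(9 + 16 + 0) = 5.0
manhattan = minkowski(a, b, 1)   # 3 + 4 + 0 = 7.0
categorical_dist = hamming(["M", "yes"], ["F", "yes"])  # one mismatch
```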

Random Forest : Random Forest can be used to solve both classification and regression problems. It is an ensemble learning technique that builds multiple decision trees and combines them with the goal of obtaining more exact and precise results. It can also be used to find the importance of features in the data[3].

Naive Bayes : For predictive analysis, Naïve Bayes is a simple but surprisingly powerful model. It is based on Bayes' theorem and assumes that the features within a class are unrelated and independent of each other. Bayes' theorem can be written mathematically as:

P(A|Z) = P(Z|A) * P(A) / P(Z)

and, under the naive independence assumption with features Z = (Z1, ..., Zn),

P(A|Z) ∝ P(Z1|A) * P(Z2|A) * ... * P(Zn|A) * P(A)

Here, P(A|Z) is the posterior probability, P(A) is the prior probability of the class, P(Zi|A) is the likelihood and P(Z) is the prior probability of the predictor (evidence).
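A worked numeric example of Bayes' rule makes the posterior/prior/likelihood roles concrete. The numbers below (a test with 90% sensitivity, 5% false-positive rate, 1% disease prevalence) are made up for illustration only.

```python
# Worked example of Bayes' theorem with illustrative numbers.
p_disease = 0.01                 # P(A): prior probability of the class
p_pos_given_disease = 0.90       # P(Z|A): likelihood
p_pos_given_healthy = 0.05       # false-positive rate

# P(Z): total probability of a positive test (the evidence)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior P(A|Z) = P(Z|A) * P(A) / P(Z)
posterior = p_pos_given_disease * p_disease / p_pos
```

Even a fairly accurate test yields a posterior of only about 15% here, because the prior is so small; this is the interplay the formula captures.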

Multi Layer Perceptron : The MLP is the most common type of artificial neural network. It consists of one input layer, one output layer and one or more hidden layers. Mathematically it can be written as:

Y = ψ(Σ wi xi + b) = ψ(w^T x + b)

where 'x' is the input vector, 'b' is the bias, 'ψ' is the activation function and 'Y' is the calculated output.
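One forward pass of the perceptron equation can be sketched directly; the weights, bias and input below are made-up values, and a sigmoid stands in for the unspecified activation ψ.

```python
# Sketch: one forward pass of Y = psi(w^T x + b), with sigmoid as psi.
import numpy as np

def sigmoid(z):
    # A common choice for the activation function psi
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # input vector (illustrative)
w = np.array([0.4,  0.3, 0.1])   # weights (illustrative)
b = 0.2                          # bias

pre_activation = w @ x + b       # w^T x + b = 0.3 for these values
y = sigmoid(pre_activation)      # Y = psi(w^T x + b)
```

An MLP stacks many such units in layers, feeding each layer's outputs as the next layer's inputs.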

3.2 Dataset Description

The data set we have chosen consists of 10 variables: age, gender, A/G ratio, SGPT, SGOT, Alk Phos, total bilirubin, direct bilirubin, total proteins and albumin. It contains records for 416 liver patients and 167 non-liver patients. The data set was collected from the north east of Andhra Pradesh, India. A class label known as the selector is used to divide the data into groups (liver patient or not). In total there are 441 male patient records and 142 female patient records[19].

Any patient, male or female, whose age exceeded 89 was recorded with an age of "90".

Attribute Information:

1. Age : Age of the patient
2. Gender : Gender of the patient
3. SGPT : Alanine Aminotransferase
4. SGOT : Aspartate Aminotransferase
5. TP : Total Proteins
6. ALB : Albumin
7. TB : Total Bilirubin
8. DB : Direct Bilirubin
9. Alk Phos : Alkaline Phosphatase
10. A/G Ratio : Albumin and Globulin Ratio
11. Selector : field used to split the data into two sets (labelled by the experts)

Table 1 : General data set information.

1. Data Set Characteristics : Multivariate
2. Attribute Characteristics : Integer, Real
3. Associated Tasks : Classification
4. Number of Instances : 583
5. Number of Attributes : 10
6. Missing Values : N/A
7. Area : Life
8. Date Donated : 2012-05-21

Environment : This project has only software requirements. These are:

OS : Linux or a Linux-like system

IDE : Jupyter (install via pip)

Main libraries required:
• scikit-learn (sklearn)
• Pandas
• NumPy
• Matplotlib
• Seaborn
• StandardScaler (from sklearn.preprocessing)

Proposed Model : The process includes the following steps; please refer to Fig. 1.


Data Gathering : Data gathering is the process of systematically collecting and measuring information on targeted variables in a given data set, which then enables one to answer the relevant questions correctly and evaluate outcomes.

Feature Extraction : This is the process of taking raw, unstructured data and producing potentially useful variables as output. A feature is a combination of attributes that captures the essential characteristics of the data. A feature extraction method detects and retains only the important features, a far smaller set than the original number of attributes, and forms them into a new feature set by decomposition of the original data. This process therefore increases the speed of supervised learning algorithms[21].

Feature Selection : Once we have the potential variables, a particular subset specific to the problem is chosen to help in model construction.
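The paper's best-performing combination is Random Forest with recursive feature elimination (RFE) as the selection step, which can be sketched as below. Assumptions: scikit-learn's RFE stands in for the paper's (unspecified) implementation, and synthetic data stands in for the liver dataset.

```python
# Sketch: recursive feature elimination with a random forest ranker.
# RFE repeatedly fits the estimator and drops the least important
# features until the requested number remains.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the 10-attribute liver dataset
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=42)

selector = RFE(RandomForestClassifier(n_estimators=50, random_state=42),
               n_features_to_select=5)
selector.fit(X, y)
selected = selector.support_     # boolean mask of the kept features
```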

Feature Classification : Classification is done on the basis of some quality of an entity, and labels are attached.

Decision Making : Finally, once we have the classified data, we apply the various classifiers to predict the actual problem statement accurately from the given data set. These classifiers take the decision and categorize the data according to the set of possible outcomes.

3.3 Performance Measures Used for Evaluation

After feature extraction, feature selection and data normalization, the effectiveness of a model can be evaluated on the basis of metrics known as performance metrics. Let us look at some of these metrics[20].

3.3.1 Confusion Matrix

The confusion matrix is one of the most intuitive and simple metrics used for assessing the correctness and accuracy of a model. It is used for classification problems where the output can be one of a fixed number of classes (two or more).

Let us try to understand the confusion matrix by assuming we are solving a classification problem in which we predict whether or not a person has cancer.

To start with, let us give labels to our target variable:

Fig. 1. Proposed Model

Fig. 2. Confusion matrix for a classification model


1 : the person has cancer
0 : the person does NOT have cancer

Now that we have pinpointed the problem, let us see how the confusion matrix is laid out. It is a table with two dimensions, "Actual" and "Predicted", with the set of classes along both dimensions; columns are taken as the actual classes and rows as the predicted ones.

Terms associated with confusion matrix :

True Positives (TP) : These are the cases where both the actual and the predicted value are True (1).

Ex. The model predicts that the patient has cancer, and the patient actually has cancer.

True Negatives (TN) : These are the cases where the actual value was False (0) and the model also predicts False (0).

Ex. The model predicts that the patient does not have cancer, and the patient actually does not have cancer.

False Positives (FP) : These are the cases where the actual value was False (0) but the model predicts True (1).

Ex. A person who does not have cancer but whom the model classifies as having cancer is a False Positive.

False Negatives (FN) : These are the cases where the actual value was True (1) but the model predicts False (0).

Ex. A person who has cancer but whom the model classifies as not having cancer is a False Negative.

Here, TP = True Positive, FP = False Positive, TN = True Negative and FN = False Negative.
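Building the 2x2 matrix for the cancer example above is a one-liner with scikit-learn. The label vectors below are illustrative, not from the paper's data (1 = cancer, 0 = no cancer).

```python
# Sketch: a 2x2 confusion matrix for the cancer example.
# For binary labels (0, 1), ravel() yields the counts in the
# order tn, fp, fn, tp.
from sklearn.metrics import confusion_matrix

actual    = [1, 0, 1, 1, 0, 0, 1, 0]   # illustrative ground truth
predicted = [1, 0, 0, 1, 0, 1, 1, 0]   # illustrative model output

tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()
```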

3.3.2 Accuracy

Accuracy is the proportion of the total number of predictions that are correct. It is determined using the equation below:

Accuracy = (TP + TN) / (TP + FP + FN + TN)

It is a good measure when the target variable classes in the data are nearly balanced.

3.3.3 Precision

Precision is the proportion of instances assigned to class C that actually belong to class C. It is calculated using the equation:

Precision = TP / (TP + FP)

3.3.4 Recall or Sensitivity

Recall tells us the percentage of patients who actually had cancer that the algorithm also diagnosed as having cancer. The actual positives (people having cancer) comprise TP and FN. Note that FN counts a person who actually had cancer but whom the algorithm failed to identify, predicting otherwise.

Recall = TP / (TP + FN)
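The three formulas above (Sections 3.3.2 to 3.3.4) can be verified directly from confusion-matrix counts; the counts below are illustrative values.

```python
# Sketch: accuracy, precision and recall from confusion-matrix counts.
tp, tn, fp, fn = 40, 45, 5, 10   # illustrative counts

accuracy  = (tp + tn) / (tp + fp + fn + tn)   # 85 / 100
precision = tp / (tp + fp)                    # 40 / 45
recall    = tp / (tp + fn)                    # 40 / 50
```

Note how precision and recall can diverge: here the model is more precise (0.89) than it is complete (0.80), because it misses 10 true cases but raises only 5 false alarms.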

3.4 Correlation matrix for data :

The correlation matrix gives us an idea of how closely our features are related to each other. It can be visualized in the form of a graph, as given in Fig. 3.

Here 1, 2, ..., 10 denote our feature columns (age, gender, etc.). A correlation ratio of 1 means the two features are totally correlated.
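A correlation matrix like the one in Fig. 3 is produced by pandas' corr(). The small frame below uses made-up values with illustrative column names from the dataset description.

```python
# Sketch: a feature correlation matrix, as in Fig. 3.
import pandas as pd

df = pd.DataFrame({
    "age":              [45, 60, 32, 50, 28],     # illustrative values
    "total_bilirubin":  [0.7, 3.2, 0.9, 2.5, 0.6],
    "direct_bilirubin": [0.2, 1.6, 0.3, 1.2, 0.1],
})

corr = df.corr()
# The diagonal is 1.0 (each feature is fully correlated with itself);
# the two bilirubin columns are nearly proportional, so their
# off-diagonal entry is close to 1 as well.
```

Highly correlated feature pairs are natural candidates for removal during feature selection, since they carry largely redundant information.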

3.5 Results :

Now let us apply the various techniques and tabulate the respective metrics for the classification algorithms discussed above. In addition,

Table 2 : Performance evaluation for various classification algorithms after applying feature selection (60-40 split).

Technique        Accuracy (%)   Precision   Recall   F1_score
Naive Bayes      58.3           0.58        0.58     0.58
SVM              73.0           0.53        0.73     0.61
KNN              70.0           0.70        0.70     0.70
Decision Tree    63.21          0.62        0.64     0.65
Random Forest    75.67          0.75        0.77     0.75
MLP              72.67          0.72        0.72     0.72


Fig. 3. Correlation matrix

Fig. 4. Performance evaluation for various classification algorithms with feature selection (60-40)


we vary the percentage split between the training and test data sets.
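The split-varying evaluation loop can be sketched as follows. Assumptions: synthetic data stands in for the liver dataset, and a random forest stands in for whichever classifier is being evaluated.

```python
# Sketch: evaluating one classifier under different train/test splits
# (60-40, 70-30, 80-20), as done for the tables above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the liver dataset
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

results = {}
for test_frac in (0.4, 0.3, 0.2):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_frac, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_tr, y_tr)
    results[test_frac] = clf.score(X_te, y_te)   # accuracy per split
```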

4. CONCLUSION

In medical science, diagnosing health conditions is a very daunting and challenging task, as indeed it is in any field; one can offer a cure only after knowing the root of a problem. In this paper we applied several data mining techniques, mainly classification algorithms, since our dataset called for classification of liver patients. As more and more patient records are digitized, there is ample reason to manage these data properly and, after some pre-processing, to use them for accurate prediction of such diseases, thus helping to provide more accurate and better healthcare to patients. This work has achieved higher prediction accuracy than earlier efforts by applying thorough data pre-processing techniques. An accuracy of 75.67% is the highest for this data set to date, which is very good considering that the sample size of the data set is very small.

5. FUTURE SCOPE

Machine learning and data mining are difficult technologies to get right; however, their benefits are great. The more data we have, the more we can train our model and cross-validate, and the higher the accuracy will be. In future work we will try to obtain more liver patient samples to further train our model. An app could also be built in which a person enters his test details and, based upon them, the likelihood of that person being affected by a liver-related disease is generated.

REFERENCES

[1] A.K. Pujari, Data Mining Techniques, 4th edition, University Press (India) Private Limited, (2001).

[2] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, (2006).

[3] R. Parimala et al., A Study of Spam Email Classification using Feature Selection Package, Global Journal of Computer Science and Technology, 11, ISSN 0975-4172, (2011).

[4] H. Jin, S. Kim and J. Kim, Decision Factors on Effective Liver Patient Data Prediction, International Journal of Bio-Science and Bio-Technology, 6(4), 167-178, (2014).

[5] P. Saxena and S. Lahre, Analysis of Various Clustering Algorithms of Data Mining on Health Informatics, International Journal of Computer & Communication Technology, 4(2), 108-112 (2013).

[6] A. Gulia, R. Vohra and P. Rani, Liver Patient Classification using Intelligent Techniques, International Journal of Computer Science and Information Technologies, 5(4), 5110-5115 (2014).

[7] S. Vijayarani and S. Dhayanand, Liver Disease Prediction using SVM and Naïve Bayes Algorithms, International Journal of Science, Engineering and Technology Research (IJSETR), 4(4), 816-820 (2015).

[8] J. Pahareeya, R. Vohra, J. Makhijani and S. Patsariya, Liver Patient Classification using Intelligence Techniques, International Journal of Advanced Research in Computer Science and Software Engineering, 4(2), 295-299 (2014).

[9] S. and I.A. Ansari, An Improved K-means Clustering with Atkinson Index to Classify Liver Patient Dataset, Springer, (2015).

[10] Er. M. Hashem and M.S. Mabrouk, A Study of Support Vector Machine Algorithm for Liver Disease Diagnosis, American Journal of Intelligent Systems, 9-14 (2014).

[11] C. Liang and L. Peng, An Automated Diagnosis System of Liver Disease using Artificial Immune and Genetic Algorithms, Springer, (2013).

[12] S. Bahramirad et al., Classification of Liver Disease Diagnosis: A Comparative Study, IEEE, 42-46 (2013).

[13] Reetu and Narendra Kumar, Medical Diagnosis for Liver Cancer using Classification Techniques, International Journal of Recent Scientific Research, 6(6), 4809-4813 (2015).

[14] S. Dhamodharan, Liver Prediction using Bayesian Classification, An International Journal of Advanced Computer Technology, (2014).

[15] H.R. Kiruba and G.T. Arasu, An Intelligent Agent Based Framework for Liver Disorder Diagnosis using Artificial Intelligent Techniques, Journal of Theoretical and Applied Information Technology, 91-99 (2014).

[16] A.S.A. Kumar and C.J. Venkateswaran, Estimating the Surveillance of Liver Disorder using Classification Algorithms, International Journal of Computer Application, 57, 39-42 (2012).

[17] B.V. Ramana and M.S. Prasad Babu, Liver Classification using Modified Rotation Forest, International Journal of Engineering Research and Development, 1(6), 17-24 (2012).

[18] V.N. Vapnik, Statistical Learning Theory, New York: John Wiley and Sons, (1998).

[19] https://archive.ics.uci.edu/ml/datasets/ILPD+(Indian+Liver+Patient+Dataset)

[20] Performance Metrics for Classification Problems in Machine Learning, https://medium.com/greyatom/performance-metrics-for-classification-problems-in-machine-learning-part-i-b085d432082b

[21] M. Agarwal and S. Sinha, Polarity Detection in Reviews (Sentiment Analysis).


Halftoning of images in visual cryptography by direct binary search

MOHAMMAD MAHTAB KAZIM
Abydos Technologies, 85-T, Fourth Floor, Sector-7, Jasola Vihar, New Delhi-110025

E-mail: [email protected]

ABSTRACT

The motivation behind this paper is to encode a secret image SI into n halftone shares within a visual cryptography (VC) scheme. We propose a strategy for encoding a color image into n meaningful halftone shares using halftone visual cryptography. Although the secret pixels encrypted into the shares introduce noise into the halftone images, the perceptual error between the halftone shares and the continuous-tone images is minimized with respect to a human visual system (HVS) model. The secret image can be effectively decoded without showing any interference with the share images, and the security of our strategy is guaranteed by the properties of VC. The proposed technique encodes the secret pixels into the shares using the direct binary search (DBS) halftoning strategy for color images. Simulation results show that the proposed method significantly improves the halftone image quality of the encoded shares for grayscale as well as color images, using DBS to enhance the quality of both the halftone images and the revealed secret image.

Key words: Visual Cryptography, Halftoning, DBS, HVS

Global Sci-Tech, 10 (4) October-December 2018; pp. 217-226 DOI No.: 10.5958/2455-7110.2018.00031.9

1. INTRODUCTION

Visual Cryptography (VC) is a cryptographic technique that allows information (images, text, and so forth) to be encrypted in such a way that decryption is performed by human vision. The technique was first proposed by Naor and Shamir in 1994 [1]. They presented a basic scheme for sharing a secret binary image using their own coding table.

This strategy divides the secret image into several parts, each called a share. The shares are distributed among different participants and are later photocopied onto transparencies or superimposed together to recover the secret image.

Visual Cryptography (VC) uses two transparent images: one contains noisy or random pixels, and the other contains the secret information. Both images are required to reconstruct and reveal the secret; possession of only one is useless for determining the secret image, which makes the encryption more secure.

The benefit of a visual cryptography scheme is that it eliminates the computational cost of the decoding process: the secret image is restored by a simple stacking operation and is decoded directly by human vision.

The principles of visual cryptography can be illustrated by considering a basic 2-out-of-2 visual cryptography scheme, shown in Fig. 1.


Naor and Shamir introduced an encryption scheme to share a secret binary image using their own coding table. In this scheme, two transparent images called shares are created: the binary image is divided into two shares, Share 1 and Share 2. If a pixel of the secret image is white, one of the upper two rows of Table 1 is used to generate Share 1 and Share 2; if the pixel is black, one of the lower two rows is used. The scheme involves pixel expansion, where each pixel of the secret image is expanded to 2 pixels, so when the shares are generated and superimposed the reconstructed image is twice the size of the original secret image. Moreover, the resolution of the reconstructed image is lower than that of the original secret image, as each white pixel is reconstructed as one white and one black sub-pixel. Only a single secret can be concealed using this technique. The table below shows how shares are created for white and black pixels.

Now consider the superposition of the two shares, as shown in Fig. 1. If the pixel p was black, the superposition of the two shares outputs two black sub-pixels, corresponding to a grey level of 1. If p is white, it results in one white and one black sub-pixel, corresponding to a grey level of 1/2. Compared with the secret image, there is therefore a contrast loss in the reconstructed image.

Based on the process described above, we can construct two shares for SI, as shown in Fig. 2(b) and Fig. 2(c). Superimposing the two shares leads to the output secret image shown in Fig. 2(d). The decoded image is clearly identified, although some contrast loss occurs. The width of the decoded image is twice that of the original secret image, since each pixel is expanded to two sub-pixels in each share, as shown in Fig. 1.
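The (2, 2) construction described above can be sketched in code. This is a minimal illustration under assumed conventions (1 = black, 0 = white, horizontal 1×2 pixel expansion), not the authors' implementation.

```python
import random

# Sketch of a (2, 2) VC scheme with 1x2 pixel expansion (1 = black, 0 = white).
# For a white secret pixel both shares get the SAME sub-pixel pair, so stacking
# leaves one black sub-pixel (grey level 1/2). For a black pixel the shares get
# COMPLEMENTARY pairs, so stacking (logical OR) yields two black sub-pixels.

def encode_pixel(p):
    pair = random.choice([(0, 1), (1, 0)])  # pick a row of the coding table
    if p == 0:                              # white secret pixel
        return pair, pair
    comp = tuple(1 - v for v in pair)       # black secret pixel
    return pair, comp

def stack(a, b):
    # Superimposing transparencies: a sub-pixel is black if it is black in either share.
    return tuple(x | y for x, y in zip(a, b))

s1, s2 = encode_pixel(1)   # encode a black secret pixel
print(stack(s1, s2))       # -> (1, 1): both sub-pixels black
```

Note how neither share alone reveals p: each share is a uniformly random pair, which is exactly the security property mentioned above.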

In this paper, we propose an improved DBS VC that relies on the simple operation of DBS using a threshold image. The noise of the encoded secret pixels is totally diffused away, and high-quality halftone shares showing natural images are produced. The secret image can be decoded without showing any interference with the share images.

Fig. 1. Construction of (2, 2) VC Scheme

Fig. 2. Example of 2-out-of-2 scheme


2. HALFTONE VISUAL CRYPTOGRAPHY

Previous constructions of VC rely only on combinatorial techniques. In the halftoning framework of visual cryptography, a secret binary image is encoded into high-quality halftone images, or halftone shares. In particular, this technique applies the rich theory of blue-noise halftoning to the construction used in the traditional VSS scheme to produce halftone shares; the security properties are still maintained, and the decoded secret image has uniform contrast.

The halftone shares convey meaningful visual information to the viewer, such as a landscape or a building. The visual quality achieved by the new method is significantly better than that achieved by any VSS technique known to date. Thus an adversary inspecting a halftone share is less inclined to suspect that cryptographic data is hidden in it, and a higher security level is therefore achieved [2].

Halftone VC was proposed in [2] and is based upon the matrices and collections available in traditional VC. A secret binary pixel p in halftone VC is encoded into an array of Q1 × Q2 sub-pixels, called a halftone cell, in each of the n shares. The pixel expansion in halftone VC is therefore Q1 × Q2, and the secret pixel p in the reconstructed image can be visually decoded with contrast 1/(Q1·Q2) [17]. In a (2, 2) halftone visual threshold scheme, a halftone image I, obtained from a grayscale image, is assigned to participant 1. Its complementary image, obtained by reversing all black/white pixels of I to white/black, is assigned to participant 2. To encode a secret pixel p into a Q1 × Q2 halftone cell in each of the two shares, only 2 pixels per halftone cell, referred to as the secret information pixels, need to be changed. Both secret information pixels should be at the same positions in the two shares. If p is white, a matrix M is randomly selected from the collection of matrices C0; if p is black, M is randomly selected from C1. The secret information pixels in the i-th (i = 1, 2) share are replaced with the two sub-pixels in the i-th row of M. These modified pixels carry the secret information of the encoded image; the other pixels in the halftone cells, which are not modified, are ordinary pixels [2]. As in the above method, the choice of the secret information pixels within a halftone cell is important, as it affects the visual quality of the resulting halftone shares. To obtain better visual results, the void-and-cluster algorithm [3], which spreads the minority pixels as evenly as possible, was used to produce an improved halftone image in each share; however, the void-and-cluster algorithm has two disadvantages.

First, it is computationally expensive: each binary pixel in the original halftone image must pass through a nonlinear low-pass filter, which requires intensive computation. Second, using the void-and-cluster algorithm to choose the positions of the secret information pixels makes those positions dependent on the white/black pixel distribution of the original halftone image; the reconstructed image may therefore reveal some trace of the original halftone image in which the secret image is encoded.

3. HALFTONING OF IMAGES USING ERROR DIFFUSION

3.1 Error Diffusion

A halftone image is obtained by applying the error diffusion algorithm [4]. This method is used because it is simple and effective. The error diffusion algorithm is designed to preserve the average intensity level between input and output images by propagating the quantization errors to unprocessed neighboring pixels according to fixed ratios. To produce the i-th halftone share, each of the three color layers is fed into the input.


The quantizer output depends on the current input and output, and also on the entire past history. Fig. 3 shows the block diagram of the error diffusion algorithm and Fig. 4 shows the Floyd-Steinberg error filter.

The halftoning process converts a continuous-tone grayscale image into a binary-valued image using algorithms such as error diffusion. Using the secret image and multiple grayscale images, halftone shares are generated such that the resulting halftone shares are no longer random patterns but meaningful visual images. A secret binary pixel p is expanded, as in visual secret sharing, into sub-pixels generated at random from the matrix collections C0 and C1. The sub-pixels are then encoded into a block of the halftoned image of size q = v1 × v2, referred to as a halftone cell, in each of the n shares. Error diffusion spreads the quantization error over the neighboring continuous-tone pixels using an error filter.
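The error-diffusion step described above can be sketched as a single Floyd-Steinberg pass. This is a minimal illustration of the general technique, not the paper's exact implementation.

```python
# Minimal Floyd-Steinberg error diffusion for a grayscale image held as a
# list of rows of floats in [0, 255]. The quantization error at each pixel
# is pushed to unprocessed neighbours with the classic 7/16, 3/16, 5/16,
# 1/16 weights, preserving the average intensity.

def floyd_steinberg(img):
    h, w = len(img), len(img[0])
    img = [row[:] for row in img]           # work on a copy
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = img[y][x]
            new = 255 if old >= 128 else 0  # quantize at mid-gray
            out[y][x] = 1 if new else 0     # 1 = white dot, 0 = black dot
            err = old - new
            if x + 1 < w:
                img[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1][x - 1] += err * 3 / 16
                img[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1][x + 1] += err * 1 / 16
    return out

halftone = floyd_steinberg([[128.0] * 4 for _ in range(4)])
```

On a flat mid-gray input the output alternates between black and white dots, so the average intensity of the halftone approximates that of the input.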

4. HALFTONING OF IMAGES USING DIRECT BINARY SEARCH

DBS seeks to minimize the mean-squared error between perceptually filtered versions of the continuous-tone original and the binary image. This is done by scanning the image pixel by pixel. At each pixel, we consider

Fig. 3. Error diffusion block diagram

Fig. 4. Error Filter

Fig. 5. Original Grey Scale Image of Lena
Fig. 6. Halftone Image by Error Diffusion


changing the state of that pixel (a toggle), or swapping it with each pixel that differs in state from it. After the effect of each of these trial changes is evaluated, we keep the change, if any, that decreases the mean-squared error the most. A single iteration of the algorithm consists of a visit to every pixel in the image. The algorithm terminates when no change is accepted during an entire iteration. This typically requires 105 iterations, regardless of the size of the image.

Direct binary search (DBS) uses a human visual system (HVS) model to minimize the error between the continuous-tone image and the output halftone image by iteratively searching for the best configuration of the binary values in the halftone image [5]. The HVS model is a linear shift-invariant low-pass filter based on the contrast sensitivity function (CSF) of the human visual system.

4.1 Given an error metric:

d(I(x, y), b(x, y))

Example: d(I, b) = Σ (I(x, y) − b(x, y))²

Initialize a binary image b(x, y) (for example, choose a random binary image).

Randomly choose a pixel (x0, y0) in b(x, y) and let b̃ be equal to b except that b̃(x0, y0) = 1 − b(x0, y0). If d(I, b̃) < d(I, b), assign b = b̃. Repeat the last step until |d(I, b̃) − d(I, b)| is "small".

The error metric can be "smart", for example based on the human visual system.

An outline of the basic steps of the algorithm is given below:

1. Generate an initial halftone image Ih(m, n)

2. Compute the error e(m, n) in approximating I(m, n) by Ih(m, n)

3. Iterate the following: For each pixel (m, n):

a. Check which of the following changes to Ih decreases e the most:
   o swapping pixel (m, n) with another pixel
   o toggling pixel (m, n) to the opposite bit

b. if any of the possible actions decreasese, perform the best one

c. update e
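A toggle-only variant of the loop above can be sketched as follows. A real DBS also evaluates swaps and passes the error through an HVS low-pass filter; both are omitted here for clarity, so the "perceptual filter" in this sketch is the identity.

```python
# Toggle-only DBS sketch: greedily flip the pixel whose flip reduces the
# squared error to the continuous-tone image, and iterate until no flip
# helps. Swap moves and the HVS error filter of full DBS are omitted.

def dbs_toggle(cont, binary):
    h, w = len(cont), len(cont[0])
    b = [row[:] for row in binary]
    changed = True
    while changed:                      # one pass over all pixels = one iteration
        changed = False
        for y in range(h):
            for x in range(w):
                cur = (cont[y][x] - b[y][x]) ** 2
                alt = (cont[y][x] - (1 - b[y][x])) ** 2
                if alt < cur:           # trial toggle decreases the error: keep it
                    b[y][x] = 1 - b[y][x]
                    changed = True
    return b

result = dbs_toggle([[0.9, 0.1], [0.2, 0.8]], [[0, 0], [1, 1]])
print(result)  # -> [[1, 0], [0, 1]]
```

With the identity filter this reduces to per-pixel thresholding; it is the HVS filter that couples neighbouring pixels and makes the full search worthwhile.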

The DBS algorithm needs to evaluate many trial toggles and swaps; recomputing the full matrix ê at each trial is infeasible.

The swapping or toggling introduces only a small, localized change to ê. To reduce computation in DBS, swaps and toggles that do not decrease the error significantly are avoided [6]. The adaptive search and swap procedure is given below:

1. Split the image into m × m size blocks

2. Sort each block based on |ê(i, j)|

3. Generate an initial search set S

4. Repeat the following until the ending criterion (εn − εn−1)/εn−1 < 0.01 is met. For each pixel (m, n) in S:
   a. Remove (m, n) from S
   b. Process pixel (m, n)
   c. If pixel (m, n) is changed, add it to S

5. For each pixel to process:

a. If the best trial change is a toggle, perform the change and continue to the next pixel. Otherwise:

b. If the best trial swap gives ε < threshold, then:
   o Perform the change
   o Update ε

Thus halftone shares are produced that have minimal difference with respect to the original continuous-tone image in the HVS sense and also contain the secret image information. All the shares except the complementary share are produced via DBS with pixel flipping.

4.2 Universal Image Quality Index (UQI)

Image quality measures play vital roles in various image processing applications. There are basically two classes of objective quality or distortion assessment approaches. The first are mathematically defined measures, such as the widely used mean squared error (MSE), peak signal-to-noise ratio (PSNR), root mean squared error (RMSE), mean absolute error (MAE), and signal-to-noise ratio (SNR). The other class of measurement methods considers human visual system (HVS) characteristics in an attempt to incorporate perceptual


quality measures [7, 8]. Unfortunately, none of the complicated objective metrics in the literature has demonstrated any clear advantage over simple numerical measures such as RMSE and PSNR under strict testing conditions and varied image distortion environments [10]. Mathematically defined measures are still attractive for two reasons. First, they are simple to calculate and usually have low computational complexity. Second, they are independent of viewing conditions and individual observers. Although it is believed that viewing conditions play critical roles in the human perception of image quality, they are in general not fixed, and the specific information is usually unavailable to the image analysis system. If there are N different viewing conditions, a viewing-condition-dependent strategy will produce N different measurement outcomes that are not convenient to use; in addition, it must determine the viewing conditions and compute and input the condition parameters.

If two images f and g are considered as matrices with M columns and N rows containing pixel values f[i, j] and g[i, j], respectively (0 ≤ i < M, 0 ≤ j < N), the universal image quality index Q may be calculated as a product of three components:

Q = [σfg / (σf σg)] · [2 f̄ ḡ / (f̄² + ḡ²)] · [2 σf σg / (σf² + σg²)]

where f̄ and ḡ denote the mean values of f and g, σf² and σg² their variances, and σfg their covariance.

The first component is the correlation coefficient, which measures the degree of linear correlation between images f and g. It varies in the range [−1, 1]; the best value, 1, is obtained when f and g are linearly related, i.e., g[i, j] = a·f[i, j] + b for all values of i and j. The second component, with a value range of [0, 1], measures how close the mean luminance is between the images. Since σf and σg can be considered as estimates of the contrast of f and g, the third component measures how

Fig. 7. Original Color Image
Fig. 8. Halftone Color Image


similar the contrasts of the images are. The value range for this component is also [0, 1].

The range of values for the index Q is [−1, 1]. The best value, 1, is achieved if and only if the images are identical.
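The three-component product described above can be computed directly. The sketch below assumes the standard global form of the index (computed over whole images given as flat lists); in practice the index is often computed over a sliding window and averaged.

```python
import math

# Universal Image Quality Index sketch: the product of the correlation,
# mean-luminance and contrast components described above, computed
# globally over two equal-sized grayscale images (flat lists of floats).
# Assumes non-constant images with non-zero means.

def uqi(f, g):
    n = len(f)
    mf, mg = sum(f) / n, sum(g) / n
    vf = sum((x - mf) ** 2 for x in f) / (n - 1)
    vg = sum((x - mg) ** 2 for x in g) / (n - 1)
    cov = sum((x - mf) * (y - mg) for x, y in zip(f, g)) / (n - 1)
    corr = cov / math.sqrt(vf * vg)                       # component 1: correlation
    lum = 2 * mf * mg / (mf ** 2 + mg ** 2)               # component 2: mean luminance
    con = 2 * math.sqrt(vf) * math.sqrt(vg) / (vf + vg)   # component 3: contrast
    return corr * lum * con

print(uqi([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]))   # identical images give Q = 1
```

For g = 2·f + 1 the correlation component is still 1, but the luminance and contrast components fall below 1, so Q < 1, matching the behaviour described above.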

4.3 Matrix Derivation with VIP Synchronization and Generating Shares

A Visual Information Pixel (VIP) is a pixel [11] on the encrypted shares that has the color value of the original image, which makes the shares meaningful. In the proposed method each of the n sub-pixels carries visual information as well as message information, while in other methods [1], [12] extra pixels are needed in addition to the pixel expansion n to produce meaningful shares. The VIP synchronization process is applied independently to each of the Red (R), Green (G) and Blue (B) color channels. Fig. 9 illustrates the matrix distribution along with a message pixel. Every message pixel, composed of 3 bits, is encoded into four sub-pixels for each color channel by referring to the bit value on each channel of the message bit. Each encrypted share has the VIPs at the same positions across the color channels (colored in gray in Fig. 9). This feature makes the shares carry accurate colors of the original image after encryption.

4.4 Algorithm of matrix construction with VIP synchronization:

1: For given matrices S0 and S1 of size n × m, let Sc[ij] be the j-th bit of the i-th row in Sc, c ∈ {0, 1} (1 ≤ i ≤ n, 1 ≤ j ≤ m). Here p is the number of 1s and the given q is the number of ci entries in row i of Sc (1 ≤ q ≤ m − p − 1).

2: procedure MATRICES CONSTRUCTION (S0, S1, q)

3: for i = 1 do

4: for l ← 1, q do

5: if S0[1j] = S1[1j] = 0 then

6: S0[1j ] ← c1 and S1[1j ] ← c1

7: end if

8: end for

9: end for

10: for i = 2, n do

11: for l ← 1, q do

12: repeat

13: if S0[ij] = S1[ij] = 0 then

14: S0[ij ] ← ci and S1[ij ] ← ci

15: else

Fig. 9


16: switch(S0[ij1], S0[ij2]) or

17: switch(S1[ij1], S1[ij2]),

18: where j1 ≠ j2

19: end if

20: until there exists an α satisfying

21: w(S1[i]) − w(S0[i]) = α · m

22: end for

23: end for

24: end procedure

Fig. 9. General illustration of matrix distribution of (2, 2)-color EVC. (a) Matrix distribution along with a message pixel. Every message pixel, composed of 3 bits, is encoded into four subpixels for each color channel by referring to the bit value on each channel of the message bit. The positions of VIPs across color channels, colored in gray, are preserved after encryption. (b) Decryption example of subpixels. Regardless of VIP values, the decrypted subpixels represent the intended color, the same as that of the original message pixel, colored in gray. The || represents the logical "OR" operation.

4.5 Thresholding Approach in DBS

So far we have halftoned the image using DBS, taking the Floyd-Steinberg halftone as the initial image. Our task now is to use a threshold image as the initial image in order to improve the result.

We use a noise-based approach that provides an alternative to conventional dithering. Rather than thresholding the sum of the image and noise (as in dithering), or thresholding the image (as in error diffusion), we threshold the noise itself. A block diagram depicts the approach: the noise process (passed through a shaping filter) is thresholded to produce the halftoned output. In the open-loop approach the threshold value depends only on the (feedforward-filtered) image. In the closed-loop approach the threshold is adapted according to the error between the (feedback-filtered) halftone and the (feedforward-filtered) input image.

In this paper the threshold image at gray level demonstrates the improved result for the secret image, the shares and the decoded image.

4.6 Simulation Results

Simulation results for the proposed secret sharing scheme for color images are illustrated in this section. The experiment was conducted for different color images of size 512 × 512. The embedded secret image is of the same size as the original image. The original color image is shown in Fig. 10(a). The encoded halftone shares obtained via DBS are shown in Fig. 10(b) and Fig. 10(c). Fig. 10(d) shows the decoded secret image obtained by stacking the shares together. The simulation result for multiple color images is shown in Fig. 11.

Fig. 10. Halftoned Images of Color Image

5. CONCLUSION

Visual cryptography is a current area of research in which considerable scope exists. In this paper, the construction of basis matrices for a 2-out-of-2 VCS is demonstrated with examples, and we have shown how halftone visual cryptography can be improved to achieve better halftone images by simultaneously encoding the secret image and producing the


Fig. 11. Halftoned Images for Shares

Fig. 12. Encrypted Share Images

Fig. 13. Decrypted Secret Images Fig. 14. Final Improved Decrypted Secret Image

                            PSNR    Correlation  UQI       SSIM

Share 1 (Floyd Halftoning)  51.402  0.30733      0.072488  0.047497
Share 2 (Floyd Halftoning)  51.291  0.29179      0.054117  0.035541
Share 1 (Standard DBS)      51.606  0.29989      0.12575   0.083654
Share 2 (Standard DBS)      51.552  0.303        0.11138   0.074741
Share 1 (Improved DBS)      52.124  0.37084      0.19886   0.092463
Share 2 (Improved DBS)      51.921  0.3301       0.17988   0.095533

Fig. 15. Comparison table of shares at various parameters


halftone shares via DBS. So far, we have halftoned the color image, implemented it in VC shares, and decoded the secret image; in doing so we have generated the meaningful shares.

Halftone visual cryptography is advanced to obtain higher-quality halftone images by simultaneously encoding the secret image and generating the halftone shares via DBS. Simulation results show that the proposed approach outperforms the other strategies and that the recovered image is of high quality. The proposed approach achieves high-quality halftone images, the recovered secrets hold good for multiple color images as well, and it does not require any extra computational complexity.

The comparison table of the shares at different parameters shows how the quality has been improved by using a threshold input for halftoning the images via DBS.

REFERENCES

[1] M. Naor and A. Shamir, "Visual Cryptography", Advances in Cryptology - EUROCRYPT, 1-12 (1995).

[2] Z. Zhou, G.R. Arce and G. Di Crescenzo, "Halftone Visual Cryptography", IEEE Transactions on Image Processing, 15(8), (2006).

[3] H.C. Wu and C.-C. Chang, "Sharing Visual Multi-Secrets Using Circle Shares", Comput. Stand. Interfaces, 134(28), 123-135 (2005).

[4] A. Parakh and S. Kak, "A Recursive Threshold Visual Cryptography Scheme", Department of Computer Science, Oklahoma State University, Stillwater, OK 74078.

[5] S.H. Kim and J.P. Allebach, "Impact of HVS models on model-based halftoning", IEEE Transactions on Image Processing, 11, 258-269 (2002).

[6] S. Bhatt, J. Harlim, J. Lepak, R. Ronkese and J. Sabino, "Direct Binary Search with Adaptive Search and Swap", in the College of Information Sciences and Technology, (2005).

[7] T.N. Pappas and R.J. Safranek, "Perceptual criteria for image quality evaluation", in Handbook of Image and Video Processing, A.C. Bovik, Ed. New York: Academic, (2000).

[8] "Special issue on image and video quality metrics", Signal Process., 70, (1998).

[9] J.-B. Martens and L. Meesters, "Image dissimilarity", Signal Process., 70, 155-176 (1998).

[10] Final report from the Video Quality Experts Group on the validation of objective models of video quality assessment, (2000). [Online] Available: http://www.vqeg.org/

[11] InKoo Kang, Gonzalo R. Arce and Heung-Kyu Lee, "Color Extended Visual Cryptography Using Error Diffusion", IEEE Transactions on Image Processing, 20(1), (2011).

[12] G. Ateniese, C. Blundo, A. Santis and D.R. Stinson, "Extended capabilities for visual cryptography", ACM Theor. Comput. Sci., 250, 143-161 (2001).


Study of web mining and its types

SUSHMA PAL
School of Electronics & Communication Engineering Department, Saraswati Institute of Engineering & Technology, Hapur, India
E-mail: [email protected]

ABSTRACT

Software testing is the process of identifying the correctness and quality of a software program. The purpose is to check whether the software satisfies the specific requirements, needs and expectations of the customer. In other words, the job of testing is to find out the reasons for application failures so that they can be corrected according to requirements. Acceptance testing is a key feature of software implementation. AT is performed to ensure that the new system meets all the essential user requirements. It is the final testing activity, performed by the customer, to test the completeness, correctness and consistency of the software.

Key words: Web mining, web content mining, web structure mining, web usagemining, decision tree, k-nearest neighbor.

Global Sci-Tech, 10 (4) October-December 2018; pp. 227-234 DOI No.: 10.5958/2455-7110.2018.00032.0

1. INTRODUCTION

Web mining is the process of using data mining techniques and algorithms to extract information directly from the Web, by extracting it from Web documents and services, Web content, hyperlinks and server logs. The goal of Web mining is to look for patterns in Web data by collecting and analyzing information in order to gain insight into trends, the industry and users in general. Web mining can be broadly divided into three categories.

2. WEB MINING

2.1 Web Content Mining

2.2 Web Structure Mining

2.3 Web Usage Mining.

2.1 Web Content Mining

Web content mining targets knowledge discovery, in which the main objects are the traditional collections of multimedia documents such as images, video and audio, which are embedded in or linked to the web pages.

Fig. 1. Type of Web Mining
Fig. 2. Type of Web Content Mining

Page 46: ISSN 0975-9638 (Print) Global Sci-Tech - Al-Falah Universityalfalahuniversity.edu.in/wp-content/uploads/2019/03/Global-Sci-Tech-104-2018.pdfGlobal Sci - Tech Al-Falah’s Journal of

Mohd. Soban Siddiqui and Ali Imam Abidi


It is also quite different from Data mining, because Web data are mainly semi-structured and/or unstructured, while Data mining deals primarily with structured data. Web content mining is also different from Text mining because of the semi-structured nature of the Web, while Text mining focuses on unstructured texts. Web content mining thus requires creative applications of Data mining and/or Text mining techniques, as well as its own unique approaches. In the past few years there has been a rapid expansion of activities in the Web content mining area. This is not surprising, because of the phenomenal growth of Web content and the significant economic benefit of such mining. However, due to the heterogeneity and lack of structure of Web data, automated discovery of targeted or unexpected knowledge still presents many challenging research problems.

The goal of Web structure mining is to generate a structural summary about a Web site and its Web pages. Technically, Web content mining mainly focuses on the inner-document structure, while Web structure mining tries to discover the link structure of the hyperlinks at the inter-document level. Based on the topology of the hyperlinks, Web structure mining categorizes Web pages and generates information such as the similarity and relationship between different Web sites.

Web structure mining can also take another direction: discovering the structure of the Web document itself. This type of structure mining can be used to reveal the structure (schema) of Web pages; this is useful for navigation and makes it possible to compare and integrate Web page schemes. It also facilitates introducing database techniques for accessing information in Web pages by providing a reference schema.

2.3 Web Usage Mining

Web Usage Mining focuses on techniques that can predict the behavior of users while they interact with the WWW. Web usage mining discovers user navigation patterns from web data: it tries to extract useful information from the secondary data derived from the interactions of users while surfing the Web. Web usage mining collects data from Web log records to discover the access patterns of web pages. There are several available research projects and commercial tools that analyze those patterns for different purposes. The insight gained can be utilized in personalization, system improvement, site modification, business intelligence and usage characterization.

Fig. 3. Types of Web Structure Mining

Fig. 4. Types of Web Usage Mining

2.2 Web Structure Mining

Web Structure Mining focuses on analysis of the link structure of the web, and one of its purposes is to identify the more preferable documents. The different objects are linked in some way. The intuition is that a hyperlink from document A to document B implies that the author of document A thinks document B contains worthwhile information. Web structure mining helps in discovering similarities between web sites, discovering important sites for a particular topic or discipline, or discovering web communities.

Simply applying the traditional processes and assuming that the events are independent can lead to wrong conclusions. However, appropriate handling of the links can reveal potential correlations and thereby improve the predictive accuracy of the learned models.


Comparative study of different classification techniques using weka tool

The only information left behind by many users visiting a Web site is the path through the pages they have accessed. Most Web information retrieval tools use only the textual information, while they ignore the link information, which can be very valuable. In general, there are mainly four kinds of data mining techniques applied in the web mining domain to discover user navigation patterns.

2.4 Real World Scenarios

The proposed method targets a wide range of semantic data extraction from web sources, though in this research work two of them are considered: news harvesting and commercial offers. On-line newspapers and news portals provide a huge amount of semi-structured information: title, abstract, body and main image are the most frequent data found on such websites. Once again there are thousands of sites with very different formats and no standard means to aggregate contents, apart from feed formats like RSS, which are often implemented exposing partial information and moreover do not provide a standardized way of retrieving past entries. For these reasons news harvesting is still an interesting and challenging task. In particular, a given e-commerce website is viewed as a collection of product offers, each with product name, relevant image, price, description, technical details, etc. As with the news harvesting domain, target websites have different structures and provide no standardized means of collecting data.

3. WEB CONTENT MINING TECHNIQUES

The two common tasks through which useful information can be mined from the Web are clustering and classification. In this paper, I present various classification algorithms used to fetch the information.

Classification is often posed as a supervised learning problem, in which a set of labeled data is used to train a classifier that can then be applied to label future examples.

3.1 Decision Tree

Decision tree is a powerful classification technique. A decision tree takes an instance described by its features as input and outputs a decision, denoting the class information in our case. Two widely known algorithms for building decision trees are Classification and Regression Trees (CART) and ID3/C4.5. The tree tries to infer a split of the training data based on the values of the available features to produce a good generalization. The split at each node is based on the feature that gives the maximum information gain. Each leaf node corresponds to a class label. A new example is classified by following a path from the root node to a leaf node, where at each node a test is performed on some feature of that example. The leaf node reached is considered the class label for that example. The algorithm naturally handles binary or multiclass classification problems; the leaf nodes can refer to any of the K classes concerned.
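The information-gain split rule just described can be sketched in a few lines of plain Python (illustrative only: the toy weather-style dataset and function names below are made up, and this is not the Weka implementation the paper's title refers to):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Reduction in label entropy obtained by splitting `rows` on `feature`."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_split(rows, labels):
    """The node-split rule described above: pick the feature with maximum gain."""
    return max(rows[0], key=lambda f: information_gain(rows, labels, f))

# Hypothetical training data: each row is a dict of categorical features.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain",  "windy": "no"},
    {"outlook": "rain",  "windy": "yes"},
]
labels = ["play", "stay", "play", "stay"]  # the class depends only on "windy"
print(best_split(rows, labels))  # → windy
```

Building the full tree would recurse on each split until the leaves are pure; C4.5 additionally normalizes the gain by the split's own entropy (the gain ratio).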

3.2 K-nearest Neighbor

KNN is considered among the oldest nonparametric classification algorithms. To classify an unknown example, the distance (using some distance measure, e.g. Euclidean) from that example to every training example is measured. The k smallest distances are identified, and the most represented class among these k nearest neighbors is considered the output class label. The value of k is normally determined using a validation set or using cross-validation.
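A minimal sketch of this rule, with Euclidean distance and majority voting (the two-cluster toy data is made up purely for illustration):

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training examples."""
    # Distance from the query to every training example, as the text describes.
    neighbours = sorted((math.dist(x, query), y) for x, y in train)
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training set: two well-separated classes.
train = [
    ((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
    ((5.0, 5.0), "B"), ((5.1, 4.9), "B"), ((4.9, 5.2), "B"),
]
print(knn_classify(train, (0.3, 0.3), k=3))  # → A
```

In practice k would be chosen on a validation set or by cross-validation, exactly as the text notes.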

3.3 Neural Network

The most popular neural network algorithm is back propagation, which performs learning on a multilayer feed-forward neural network consisting of an input layer, one or more hidden layers and an output layer. The basic unit in a neural network is a neuron or unit. The inputs to the network correspond to the attributes measured for each training tuple. Inputs are fed simultaneously into the units making up the input layer. They are then weighted and fed simultaneously to a hidden layer. The number of hidden layers is arbitrary, although usually only one is used. The weighted outputs of the last hidden layer are input to the units making up the output layer, which emits the network's prediction. The network is feed-forward in that none of the weights cycles back to an input unit or to an output unit of a previous layer.
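The forward pass and the backpropagated error terms can be sketched in pure Python for a single hidden layer and one sigmoid output (a toy illustration; the architecture, seed and learning rate are arbitrary choices, not taken from the paper):

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class MLP:
    """One hidden layer, one sigmoid output unit, trained by back propagation."""

    def __init__(self, n_in, n_hidden):
        rnd = lambda: random.uniform(-1.0, 1.0)
        self.w1 = [[rnd() for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [rnd() for _ in range(n_hidden)]
        self.w2 = [rnd() for _ in range(n_hidden)]
        self.b2 = rnd()

    def forward(self, x):
        # Inputs are weighted and fed to the hidden layer, then to the output.
        self.h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                  for ws, b in zip(self.w1, self.b1)]
        self.o = sigmoid(sum(w * h for w, h in zip(self.w2, self.h)) + self.b2)
        return self.o

    def train_step(self, x, target, lr=0.5):
        o = self.forward(x)
        # Error term at the output (gradient of squared error through the sigmoid).
        delta_o = (o - target) * o * (1.0 - o)
        # Error terms propagated back to the hidden layer.
        delta_h = [delta_o * w * h * (1.0 - h) for w, h in zip(self.w2, self.h)]
        for j, h in enumerate(self.h):
            self.w2[j] -= lr * delta_o * h
        self.b2 -= lr * delta_o
        for j, dh in enumerate(delta_h):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dh * xi
            self.b1[j] -= lr * dh

# XOR: the classic task a single layer cannot solve but a hidden layer can.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
net = MLP(n_in=2, n_hidden=4)
for _ in range(10000):
    for x, t in data:
        net.train_step(x, t)
print([round(net.forward(x), 2) for x, _ in data])
```

The weight updates in `train_step` are exactly the gradient-descent step of back propagation; practical networks differ mainly in layer count, activation choice and optimizer.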


4. EXPERIMENTAL EVALUATION

In this section we report the experimental evaluation of the effectiveness of the proposed approach as an automated Web Content Mining (WCM) system able to operate on a real case workload. In particular the experiments address these questions:

1. Evaluate the fitness of multi-site WCM when generalizing common layouts of proper content pages.

2. Quantify whether or not the combined use of anchor text and web object features is able to improve the recognition process. This step also validates the effectiveness of the proposed machine-learning model.

3. Isolate the contribution of non-conventional visual features.

4.1 Evaluation Metrics

Usually, standard evaluation metrics such as the error matrix and derived measures can be adopted to evaluate classification results. However, such metrics are not able to directly evaluate how well the system recognizes a given field of interest in a given web page. This can be assessed using the well-known Information Retrieval metrics Precision (P), Recall (R) and F-Measure (Fβ).

P = correct / actual                            (4.1)

R = correct / possible                          (4.2)

Fβ = (β² + 1) * P * R / (β² * P + R)            (4.3)

where correct is the number of field-of-interest instances correctly recognized by the system, actual is the total number of web objects recognized as fields of interest by the system, and possible is the total number of field-of-interest objects we expected from the system. The metrics were evaluated with a macro-average approach, and the F-Measure Fβ was used with equal weight for P and R (β = 1).
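Equations (4.1)-(4.3) translate directly into code; the counts used below are made-up numbers, purely for illustration:

```python
def precision_recall_f(correct, actual, possible, beta=1.0):
    """P = correct/actual, R = correct/possible,
    F_beta = (beta^2 + 1) * P * R / (beta^2 * P + R), as in (4.1)-(4.3)."""
    p = correct / actual
    r = correct / possible
    f = (beta ** 2 + 1.0) * p * r / (beta ** 2 * p + r)
    return p, r, f

# Suppose the system flagged 40 web objects as fields of interest, 32 of them
# correctly, while the truth set contains 50 fields of interest in total.
p, r, f1 = precision_recall_f(correct=32, actual=40, possible=50)
print(round(p, 2), round(r, 2), round(f1, 2))  # → 0.8 0.64 0.71
```

With β = 1 this reduces to the familiar F1 = 2PR/(P + R) reported in the result tables.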

4.2 Datasets

The experimental assessment of the proposed approach is based on three different datasets. The first two datasets, which we name COMMOFF-1 and WEBNEWS-1 respectively, have been built to analyze WCM when no hyperlink information is available. The third dataset, which we name COMMOFF-LA, is from the web news domain. We decided to collect and publish the datasets used in the experiments since, to our knowledge, there are no public-domain resources on the WCM problem that apply to our paradigm. Each dataset is provided as a plain text file with the URI of each web page and the set of fields of interest. Web site pages are complex and dynamic: advertisements, images and dynamic contents can appear in very different ways in each rendering of each page. Due to this, a snapshot of each URI has been taken for each dataset, which allows the reproduction of all the HTTP requests as performed during our experiments. The WEBNEWS-1 dataset was collected from 29 daily news websites, with a total dataset dimension of 310 pages. The training set TrSnews has 207 pages. The total number of images in the dataset is 13192. The truth set W was built with the help of a script that properly manipulates the RSS feeds of the selected websites.

4.3 Web Content Mining without Link Analysis

The effectiveness of our method with no hyperlink information can be measured on the two distinct kinds of web object recognized by the approach: text and images. We name the image kind of field of interest Image of Interest and the text kind Text of Interest.

4.3.1 Image of interest recognition

In this section we evaluate the performance of the proposed approach in the task of identifying semantically meaningful images within web pages, i.e. images that are directly related to the page content and that we name Images of Interest (IOI). News harvesting, for example, is a challenging task, with the need to collect the title, abstract (or full content) and image for each article. Finding a content-related image within the page for a given article exemplifies what we intend by IOI identification. A robust and systematic approach to the identification of the Image of Interest in the domain of web news is of great importance because alternative domain-specific solutions suffer from some limitations; in particular, RSS [59] feeds are often implemented exposing partial information, and do not provide a standardized way of retrieving past entries.

Table 1: Web News: feature analysis results.

Id     Features                     P     R     F1
wn1    x,y,w,h,type,name,intitle    0.97  0.94  0.96
wn2    x,y,w,h,type                 0.96  0.93  0.95
wn3    x,y,w,h,type,intitle         0.96  0.92  0.94
wn4    x,y,w,h,type,name            0.97  0.95  0.96
wn5    y,w,h,type                   0.92  0.86  0.89
wn6    y,w,h,type,name              0.94  0.85  0.89
wn7    y,w,h,type,name,intitle      0.94  0.85  0.89
wn8    x,y,w,h                      0.97  0.95  0.96
wn9    w,h,type                     0.85  0.68  0.76
wn10   w,h,type,name                0.93  0.81  0.86
wn11   w,h,type,intitle             0.85  0.68  0.76
wn12   w,h,type,name,intitle        0.92  0.79  0.85
wn13   type,name,intitle            0.92  0.12  0.21

Table 2: Commercial Offers: feature analysis results.

Id     Features                     P     R     F1
co1    x,y,w,h,type,name,intitle    0.91  0.83  0.87
co2    x,y,w,h,type                 0.86  0.63  0.73
co3    x,y,w,h,type,intitle         0.88  0.78  0.83
co4    x,y,w,h,type,name            0.90  0.80  0.85
co5    y,w,h,type                   0.81  0.57  0.67
co6    y,w,h,type,name              0.87  0.69  0.77
co7    y,w,h,type,name,intitle      0.89  0.75  0.82
co8    x,y,w,h                      0.85  0.73  0.79
co9    w,h,type                     0.76  0.20  0.31
co10   w,h,type,name                0.84  0.30  0.44
co11   w,h,type,intitle             0.82  0.42  0.56
co12   w,h,type,name,intitle        0.88  0.48  0.62
co13   type,name,intitle            0.83  0.26  0.40

In the e-commerce domain, images, together with offer names and prices, are the relevant items to be identified within a huge number of diversified product-offering web pages.

The feature analysis results for the WEBNEWS-1 and COMMOFF-1 datasets are reported in Tables 1 and 2. Each experiment is configured with a different feature set and is identified by an Id for convenience. Only the most significant experiments are reported. The contribution of the fx and fy layout features is clearly isolated in the result tables: the rows for experiments wni, coi, i = 1, ..., 8 report results obtained using the layout features, while the rows wni, coi, i = 9, ..., 13 refer to experiments without them.

The improvement given by the Fl features in the WEBNEWS-1 dataset permits the Recall to be boosted from ~0.81 to ~0.95 (see experiments wn8 and wn10 in Table 1) and the Precision to improve further. In the Commercial Offers dataset the Fl features improve Precision in all experiments (coi, i = 1, ..., 8 in Table 2) from ~0.48 up to ~0.83 in the best case.

The difficulty of the two problems is different: the e-commerce domain requires a larger set of features to obtain satisfactory results, whereas the web news domain can be approached with fewer features. Consequently, the boost given by the layout features Fl in the first domain yields satisfactory results for real-world Web Content Mining applications, while in the second domain the improvement enhances Precision and Recall on already high scores. The features fx and fy have similar discriminant power, and give their best when used as a pair. The image dimension features fw and fh probably contribute the most to the discrimination of the IOI class (see experiments wn12-wn13 and co12-co13).

Table 3: Performance evaluation for various classification algorithms after applying feature selection (60-40).

Techniques      Accuracy (%)  Precision  Recall  F1_score
Naive Bayes     58.3          0.58       0.58    0.58
SVM             73.0          0.53       0.73    0.61
KNN             70.0          0.70       0.70    0.70
Decision Tree   63.21         0.62       0.64    0.65
Random Forest   75.67         0.75       0.77    0.75
MLP             72.67         0.72       0.72    0.72

The ftype, fname and fintitle features, as expected, improve both Precision and Recall, in particular in the Commercial Offers dataset.

The best feature set in both domains is Fbest = {fx, fy, fw, fh, ftype, fname, fintitle} when considering the F1 measure, even though in the Web News dataset a smaller set of features leads to similar figures. A difference between the Commercial Offers and Web News datasets is that in the former domain there can be many product images with size and position similar to the IOI. The fintitle feature can be useful to discriminate non-relevant images.

4.4 Web Content Mining with Link Analysis

The results obtained by the overall set of experiments show that in the best configuration, using the complete set of features, we obtained a satisfactory result with a value of F1 = 0.86. The precision value of P = 0.88 determines the ability to adopt the proposed solution in real-world scenarios. The addition of anchor-text features improves both P and R, raising the overall F1 from 0.72 to 0.86. This latter observation deserves particular attention, since it justifies the definition of the proposed novel ML model.

Experimental results show that the use of the ffsize and ffbold visual features enables a further improvement of the recognition performance, in particular the Recall value, which goes from 0.81 to 0.84. This latter experiment can be considered an integration test in which the whole set of proposed techniques is combined in the definition of a complete Web Content Mining technique coupled with Web Structure Mining information.

5. CONCLUSION

The need for structured, semantics-carrying data on the Web is driving the change toward the adoption of Semantic Web standards and protocols. The large-scale deployment of such technologies greatly simplifies the effort required to gather structured data and also permits further reasoning because of the relationships among data that can be expressed.

However, the diffusion of such standards is still immature because of the many proposed alternatives and the initial instability of the standards. Therefore, intelligent techniques able to adapt to the evolving data of the Web are required for high-volume structured data extraction. In this work we have presented a complete strategy for Web Content Mining aimed at automating the collection of data from the web, given a specific page topic.

Fig. 5. Experiments with dataset size of growing dimension on the News dataset

REFERENCES

[1] R. Kosala and H. Blockeel. Web mining research: a survey. SIGKDD Explor. Newsl., 2(1): 1-15 (2000).

[2] S. Kuhlins and R. Tredwell. Toolkits for generating wrappers. In NODe '02: Revised Papers from the International Conference NetObjectDays on Objects, Components, Architectures, Services, and Applications for a Networked World, London, UK. Springer-Verlag, pp. 184-198 (2003).

[3] T. Berners-Lee, J. Hendler and O. Lassila. The semantic web. Scientific American, 284(5): 34-43 (2001).

[4] R.O. Duda, P.E. Hart and D.G. Stork. Pattern Classification. Wiley-Interscience Publication, (2000).

Fig. 6. Experiments with dataset size of growing dimension on the Offers dataset.

[5] X. Zhu. Semi-supervised learning literature survey. Technical report, Computer Sciences, University of Wisconsin-Madison, (2005).

[6] K.R. McKeown, R. Barzilay, D. Evans, V. Hatzivassiloglou, J.L. Klavans, A. Nenkova, C. Sable, B. Schiffman and S. Sigelman. Tracking and summarizing news on a daily basis with Columbia's Newsblaster. In Proceedings of the Second International Conference on Human Language Technology Research, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc., pp. 280-285 (2002).

[7] W3C. RDFa in XHTML: Syntax and Processing. (2008). http://www.w3.org/TR/rdfa-syntax/.

[8] B. Adida. hGRDDL: Bridging microformats and RDFa. Web Semantics: Science, Services and Agents on the World Wide Web, 6(1): 54-60 (2008).

[9] W3C. XHTML 1.1 - Module-based XHTML - Second Edition, (2010). http://www.w3.org/TR/xhtml11/.

[10] W3C. HTML markup language, (1999a). http://www.w3.org/TR/1999/REC-html401-19991224/.

[11] N. Kushmerick. Wrapper induction for information extraction. PhD thesis, University of Washington. Chairperson: Daniel S. Weld, (1997).

[12] A.H.F. Laender, B.A. Ribeiro-Neto, A.S. da Silva and J.S. Teixeira. A brief survey of web data extraction tools. SIGMOD Rec., 31(2): 84-93 (2002).

[13] C.-H. Chang, M. Kayed, M.R. Girgis and K.F. Shaalan. A survey of web information extraction systems. IEEE Trans. on Knowl. and Data Eng., 18(10): 1411-1428 (2006).

[14] G. Fiumara. Automated information extraction from web sources: a survey. In Proceedings of the Between Ontologies and Folksonomies Workshop at the 3rd International Conference on Communities and Technology, (2007).

[15] S. Sarawagi. Automation in information extraction and integration. In The 28th International Conference on Very Large Data Bases (VLDB), (2002).

[16] J. Hammer, J. McHugh and H. Garcia-Molina. Semistructured data: The TSIMMIS experience. In First East-European Workshop on Advances in Databases and Information Systems - ADBIS '97, (1997).

[17] W3C. Cascading style sheets. (2009). http://www.w3.org/TR/CSS2/.

[18] Ecma International. ECMAScript (JavaScript), (2009). http://www.ecma-international.org/publications/standards/Ecma-262.htm.

[19] W3C. The Document Object Model. (1997). http://www.w3.org/DOM/.

[20] D. Cai, S. Yu, J.-R. Wen and W.-Y. Ma. Extracting content structure for web pages based on visual representation. pp. 596, (2003).


IOT strategic research and use case scenario:A direction to the smart life

PREETY KHATRI
Institute of Management Studies (IMS), Noida

E-mail: [email protected]

ABSTRACT

IOT (Internet of Things) has become an important technology which allows communication between objects, machines, etc., and a wide area of research. It is a technology which helps objects to interact with the internal as well as the external environment, which in turn affects the decisions taken. The communication may be human-machine, human-human or machine-machine. IOT sensors support different types of connections such as GSM, GPRS, 3G, LTE, RFID, Wi-Fi, Bluetooth and ZigBee. This paper covers the most important issues and challenges for Internet of Things technology. It elaborates the key issues with the help of different types of technologies, as well as current and future research and development efforts in this field.

Key words: Internet of Things, RFID, technologies, research, use case scenario.

Global Sci-Tech, 10 (4) October-December 2018; pp. 235-241 DOI No.: 10.5958/2455-7110.2018.00033.2

1. INTRODUCTION

The Internet of Things (IOT) is a continuing advancement in technology (figure 1) which describes a future where physical objects can be connected to the Internet with the help of various technologies like GSM, RFID, 3G, etc. Objects obtain intelligence and make themselves recognizable and identifiable.

Through the internet, objects can communicate with one another, for example one machine with another machine (M2M). According to a survey, the number of internet-connected devices will be up to 50 billion in 2020.

According to an industry analyst firm, International Data Corporation, the installed base for IOT will grow to 212 billion by 2020 (figure 2). International Data Corporation's analysis is that this growth is driven by intelligent systems that will be installed and collecting data that is customer- and industry-specific.

Fig. 1. Internet of Things

IOT includes, for example, cameras connected to the internet, switching off the lights automatically in a room when no one is around, changing lanes safely while driving, and posting pictures online with a single click. It also includes electric vehicles and smart houses that are security-enabled with connectivity through the internet. IOT is also able to transfer data over a network without human interaction.

2. IOT ANALYTICS AND CHARACTERISTICS

IOT analytics examines and analyzes the data. Sensors, RFID and other devices help to collect the data on which analysis is performed. IOT analytics starts from IOT data (from different devices like smart vehicles, smart houses, smart devices, etc.), connected through multiple sensors producing continuous, high-volume data; that data is stored, blended and managed. The second step of IOT analytics (figure 3) adds more complexity, consisting of multiple and distributed analytics, with the data integrated with the operational system. The final step is more automation, which consists of bidirectional communication and control of end points.

Fig. 2. Installed base for IOT analysis (will grow up to 212 billion by 2020)

Fig. 3. IOT Analytics

The main characteristics (figure 4) of IOT consist of:

• Data: Data is the first step in IOT towards action and intelligence.

• Connectivity: Connectivity involves connecting devices, sensors, etc. with network compatibility and accessibility. The devices are to be connected with any device, item and actuators, and to 'the Internet' or another network. Compatibility involves the ability to produce and consume data, whereas accessibility means being able to connect to a network.

• Communication: The devices can communicate through data and analyze data.

• Intelligence: Algorithms make the devices smart and intelligent, which is the main aspect of the sensing capabilities in IOT devices.

• Things: The things which are connected with devices are the main characteristic of IOT. Thing-related services are provided by IOT, such as privacy protection and semantic consistency between physical things. The devices can contain sensors, or sensing materials can be attached to devices.

• Action: The action can be physical action, or it can be based on some consequence or mechanization. The action is the outcome of intelligence; overall, it is the most important part of IOT.

• Ecosystem: The ecosystem in IOT consists of the expertise and ability required to create the value chain, which begins with different mechanisms and consists of different components like processors, modules, etc.

3. INTERNET OF THINGS STRATEGIC RESEARCH

The Internet of Things is one of the most powerful drivers of the next industrial revolution and of business activities. The IOT requires sound information processing capabilities and physical, digital, cyber and virtual worlds for the "digital shadows" of these real things. The European Research Cluster on the Internet of Things (IERC) has involved experts from research, academia and industry who provide their vision on IOT technologies, research challenges and the key applications.

The IOT innovations affect many industries, research, as well as big organizations. Some important technologies, like embedded systems, are filling the gap between the physical world of real things and cyberspace. Figure 5 shows how IOT enables knowledge integration, filling the gap between the real physical world and the virtual cyber world. There is bidirectional communication between the digital world and the real physical world, which shows things integration; between the digital world and the virtual cyber world, which shows data integration; and also between the real physical world and the virtual cyber world, which shows semantic integration.

Fig. 4. IOT Characteristics

Fig. 5. The relationship between the real physical world and the virtual cyber world using IOT

The IERC Research and Innovation Agenda focuses on the most important aspects of IOT: technologies such as identification, communication, architecture, network technology, security, cyber security, interoperability, data and signal processing, etc. The Strategic Research Agenda is developed with the help of a European-led community consisting of different types of projects. Most of the research has been done in the areas of cloud systems, cyber-physical systems, social networks, etc.

4. INTERNET OF THINGS APPLICATIONS

There are lots of everyday applications that are smart but do not communicate with each other, and there are many IOT applications to make them communicate. Some of the applications are:

• Smart Home: Smart Home ranks as the highest IOT application. More than 70,000 people search for the term "Smart Home" each month. Smart Home has become an innovative step of success in residential space. IOT also provides solutions for home automation through which we can control home appliances. Using smart applications for the smart home saves energy, time and money.

• Wearables: Wearables are also among the most important IOT applications. There are lots of devices available in the market, like smart watches, smart headphones, smart gesture control and smart bracelets.

• Smart City: The smart city is another powerful application of IOT. It spans a variety of areas like urban security, water distribution, traffic management, waste management and environmental monitoring. Smart city applications also address pollution, traffic congestion problems and noise. With the help of sensors, citizens can find the free available parking slots nearby.

• Smart Grid: The Smart Grid is also a popular application of IOT. It will improve efficiency, economics and reliability, in whatever manner it is used.

Fig. 6. IOT Applications


• Industrial Internet: The Industrial Internet has become a bigger push than consumer internet sites like Facebook and Twitter, and it is the new buzz in the industrial sector. It empowers industrial engineering with big data, sensors, etc. IOT provides great potential for quality control and sustainability in the case of the Industrial Internet.

• Connected Cars: New automotive digital technology has worked to optimize vehicles' internal functions based on smart applications. A large number of automakers are working on connected car solutions. A connected car is a vehicle which is able to optimize its own operation, maintenance and the comfort of passengers with the help of internet connectivity and sensors.

• Connected Health: Connected health has become a most important part of IOT applications. Smart medicare devices and connected healthcare systems hold potential for companies as well as for people.

• Smart Retailing and Supply-chain Management: Smart retailing is a most important part of daily life. IOT with smart equipment, like RFID or sensors, supplies lots of benefits to retailers; for example, with the help of RFID the retailer can track and detect the stocks and prevent them from going out of stock. With smart gadgets, it can track the supply chain management system and also generate graphs for useful strategies.

Fig. 7. IOT Devices connected with network

• Smart Farming: Smart farming has become a most important application in the field of agriculture. With the help of smart equipment with sensor technology, farming becomes interesting and easy.

5. IOT USE CASE SCENARIOS

Future technology is built upon IOT, in which uniquely identifiable objects are interconnected through a network, providing a new platform for growth (Figure 7).

There are many kinds of use case scenarios and real-life examples related to IOT. They span industries and include pilot projects, simulation, robotics, AI, and RFID deployments. IOT is also growing in healthcare, smart cities, and various other areas of innovation.

Data analysis covering 2017 to 2022 shows that in healthcare applications the demand for IOT is increasing day by day; a digital transformation of the healthcare industry is under way. Consumers are increasingly conscious of their health, so the demand for remote and home-care possibilities keeps growing, and different healthcare systems are turning up with novel approaches.


Preety Khatri


Fig. 8. Use case scenarios (source: IOT Analytics, which sells an Excel list of over 640 enterprise IOT projects in the IOT segments shown in this illustration)

eHealthcare is another new area within healthcare: around 60% of healthcare organizations have introduced IOT devices into their facilities. Real-time healthcare systems are also a key area of IOT, where big-data analytics tools and processes are used to analyse the healthcare system.

The Internet of Things (IOT) is making it possible to make cities smarter, greener, safer, and more efficient. In smart cities, many different stakeholders must work together to provide the best technology solutions. By connecting devices, vehicles, and infrastructure across the city, governments and their partners can reduce energy and water consumption, keep people moving efficiently, and improve safety and quality of life. Integrating technology that provides security and safety is therefore in high demand.

So by optimizing resources, through urban farming, reducing traffic congestion, and so on, we can make a city smart. Whether it is a smart city or a smart building, it is connectivity and data that enable the various technologies and make them smart.

Connected communities and engaged citizens are also an important part. The goal is to shape the evolution of, and within, smart cities at all levels. According to a survey, by 2019 around 40% of local and regional governments will use IOT to turn infrastructure such as roads, streetlights, and traffic signals into smart infrastructure. Through at least 2020, smart-city projects are poised to increase sharply as we move from ad hoc smart-city projects to the first true smart cities.

The diagram above shows the different industries using IOT and their global share of IOT projects. Some firms, such as IOT Analytics, have compiled lists of use case scenarios based upon publicly available customer success stories and other sources, and analyst firms and research companies provide further real-world examples. We can say that effective IOT use cases start with the challenges and end goals in mind; the effective way forward for organizations is then to look at how IOT deployments can help them succeed in their digital transformations by optimizing their efficiency and better serving their customers.

6. CONCLUSION

IOT will grow to around 30 billion installed units by 2020, an almost 30-fold increase from 0.9 billion in 2009, adding economic value at the European and global levels. IOT applications are still under development across different industries, but there is already growth in areas such as sensors, electronic processing, microcontrollers, and information and communication services. The use case scenarios show that not all industries grow at the same speed. IOT will also enable new business models based on real-time data from billions of sensor nodes, driving the enhancement and development of smart technology.



Digital watermarking using MATLAB

ARSHEEN NEDA SIDDIQUI
Network Engineer at Asian Technical Training Centre, New Delhi-110025

E-mail: [email protected]

ABSTRACT

With the development of multimedia technologies, networks, fast-growing social streams, and the rapid evolution of the internet, multimedia copyright protection and content authentication have become serious problems that need to be solved urgently. Digital watermarking technology provides a strong solution. The general purpose of digital watermarking is to protect images, audio, and video from copying. In this project our aim is to study digital watermarking techniques using a strong tool, MATLAB, which provides a powerful platform for implementing watermarking using different approaches. The objective of this work is to understand the use of digital watermarking and to implement it in MATLAB. In future work we will learn how MATLAB can be used in image processing more broadly.

Key words: MATLAB, Watermarking, Images, Audio, Video and Digital.

Global Sci-Tech, 10 (4) October-December 2018; pp. 242-246 DOI No.: 10.5958/2455-7110.2018.00034.4

1. INTRODUCTION

Digital watermarking is simply the act of hiding a message related to a digital signal (i.e. an image, song, or video). The concept is closely related to steganography, in that both conceal a message inside a digital signal. What separates them is their goal: watermarking conceals a message related to the actual content of the digital signal, while in steganography the digital signal has no relation to the message and is merely used as a cover to hide its existence. Watermarking came into existence centuries ago in the form of watermarks found initially in plain paper and subsequently in paper bills. However, the field of digital watermarking was only developed during the last 15 years, and it is now being used for many different applications: copyright protection, source tracking, broadcast tracking (such as watermarked videos from global news organizations), and hidden communication. It is typically used to identify ownership of the copyright of the signal.

Watermarking is the process of hiding digital information in a carrier signal; the hidden information should, but does not need to, contain a relation to the carrier signal. Digital watermarks may be used to verify the authenticity or integrity of the carrier signal or to show the identity of its owners, and they are prominently used for tracing copyright infringements and for banknote authentication. As a technique for hiding multimedia information, watermarking is relatively new, and its applications are broad, including data authentication, ownership protection, side-information conveyance, and broadcast monitoring. Basically, digital watermarking is the technique in which we embed a signal or proprietary information (the watermark) into digital media such as images, audio, or video; the embedded signal is later detected and extracted to reveal the true identity of the digital media.

MATLAB is a data analysis and visualization tool designed with powerful support for matrices and matrix operations. In addition, MATLAB has excellent graphics capabilities and its own powerful programming language. One of the reasons MATLAB has become such an important tool is the availability of sets of MATLAB programs designed to support particular tasks.

We now go through several basic image-processing commands. When loading an image [4], it is best to put the image in the same folder as the m-file; this way, the image can be easily loaded through the "imread" command: A = imread('lena.tif');

If instead the image is in a different folder, it should be fully addressed: A = imread('C:\Users\User1\Desktop\lena.tif');

The formats supported by MATLAB are: bmp, cur, fts (fits), gif, hdf, ico, j2c (j2k), jp2, jpf (jpx), jpg (jpeg), pbm, pcx, pgm, png, pnm, ppm, ras, tif (tiff), and xwd. A is now a matrix of pixel brightness values. If the image is black and white, the matrix is 2-dimensional; if it is a colour image, we have a 3-dimensional matrix with three planes for the main colours red, green, and blue. The number of bits needed to store the value of each pixel is called the "bit depth" of the image. The output class of "imread" is "logical" for a bit depth of one, "uint8" for bit depths between 2 and 8, and "uint16" for higher bit depths.
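The shape and class conventions just described can be sketched outside MATLAB as well. The following Python/NumPy sketch is an illustrative analogue, not part of the paper; the helper name `storage_class` is my own, and synthetic arrays stand in for an actual file load:

```python
import numpy as np

# A grayscale image is a 2-D array of brightness values (like MATLAB uint8).
gray = np.zeros((112, 92), dtype=np.uint8)

# A colour image adds a third dimension holding the red, green, blue planes.
rgb = np.zeros((112, 92, 3), dtype=np.uint8)

def storage_class(bit_depth):
    """Mirror the imread output classes described in the text:
    logical for 1 bit, uint8 for 2-8 bits, uint16 above that."""
    if bit_depth == 1:
        return "logical"
    if bit_depth <= 8:
        return "uint8"
    return "uint16"

print(gray.ndim, rgb.ndim)  # 2 3
print(storage_class(1), storage_class(8), storage_class(16))
```

The 2-D versus 3-D distinction is exactly what the text describes for black-and-white versus colour images.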

2. LITERATURE SURVEY

In 2012, Amit Kumar Singh, Nomit Sharma, Mayank Dev and Anand Mohan proposed a novel technique for digital image watermarking in the spatial domain, describing the LSB technique and the observations made by applying the LSB algorithm to different colour images (Lena, Baboon, Space and Medical). In 2016, Sanjai Kumar and Ambar Dutta presented an analysis of spatial-domain digital watermarking techniques in which they implemented two spatial-domain algorithms and compared them with respect to different performance measures in the presence and absence of noise. The experimental results showed that the algorithm using the maximum-entropy block of the cover image (Algorithm 2) performed better than Algorithm 1, while Algorithm 1 was more robust to Gaussian noise; both algorithms were robust to salt-and-pepper noise.
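The LSB technique surveyed above can be illustrated with a minimal sketch (my own NumPy illustration, not the cited authors' code): each watermark bit replaces the least significant bit of a cover pixel, so extraction is simply reading that bit back.

```python
import numpy as np

def lsb_embed(cover, bits):
    """Replace the least significant bit of each cover pixel
    with the corresponding watermark bit (0 or 1)."""
    return (cover & 0xFE) | bits

def lsb_extract(marked):
    """Recover the watermark: it lives in the lowest bit."""
    return marked & 1

cover = np.array([[200, 113], [56, 255]], dtype=np.uint8)
bits = np.array([[1, 0], [1, 1]], dtype=np.uint8)

marked = lsb_embed(cover, bits)
assert np.array_equal(lsb_extract(marked), bits)
# Each pixel changes by at most 1, so the mark is imperceptible.
assert np.max(np.abs(marked.astype(int) - cover.astype(int))) <= 1
```

This also shows why LSB marks are fragile: any operation that disturbs the lowest bit (compression, noise) destroys the watermark, motivating the robustness comparisons in the survey.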

In 2006, Yuan et al. proposed an integer-wavelet-based multiple-logo watermarking scheme for copyright protection of digital images. A visually meaningful binary logo is used as the watermark; it is permutated using the Arnold transform and embedded by modifying the coefficients of the HH and LL sub-bands. Watermark embedding is carried out by transforming the host image into the integer wavelet domain, and to construct a blind watermarking scheme the wavelet coefficients of the HH and LL bands are modified depending on the watermark bits. For added security, the watermark is preprocessed by permutation.
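The Arnold transform used to permute the logo maps pixel (x, y) of an N×N image to ((x + y) mod N, (x + 2y) mod N); iterating it scrambles the watermark, and because the map is periodic the original reappears after enough iterations. A small illustrative sketch (my own, not the authors' code):

```python
import numpy as np

def arnold(img):
    """One iteration of the Arnold cat map on a square image:
    (x, y) -> ((x + y) mod n, (x + 2y) mod n)."""
    n = img.shape[0]
    out = np.empty_like(img)
    for x in range(n):
        for y in range(n):
            out[(x + y) % n, (x + 2 * y) % n] = img[x, y]
    return out

logo = np.arange(16).reshape(4, 4)
scrambled = arnold(logo)
assert not np.array_equal(scrambled, logo)   # pixels are shuffled

# The map is periodic: for n = 4 the original returns after 3 iterations.
img = logo
for _ in range(3):
    img = arnold(img)
assert np.array_equal(img, logo)
```

The period depends on N, which is why the iteration count can serve as part of the secret in such schemes.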

3. IMPLEMENTATION

3.1 Watermarking without side-information

As described earlier, some communication-based watermarking models do not take advantage of channel side-information. In such models the image is simply considered another form of channel noise that distorts the message during transmission, as shown in Figure 1. The watermark embedder encodes a message using a watermark encoder and a key; the result is added to the original image and transmitted over the communication channel, which adds some noise. The watermark detector at the other end receives the noisy watermarked image and tries to decode the message using the key.

An example watermarking system illustrating watermarking without side-information was implemented in MATLAB.

Fig. 1. Standard model for watermarking with no side-information

Page 62: ISSN 0975-9638 (Print) Global Sci-Tech - Al-Falah Universityalfalahuniversity.edu.in/wp-content/uploads/2019/03/Global-Sci-Tech-104-2018.pdfGlobal Sci - Tech Al-Falah’s Journal of

Arsheen Neda Siddiqui

244

3.2 Blind embedding

This system is an example of blind embedding, which does not exploit the original image statistics to embed a message in an image. Detection is done using linear correlation. This is a 1-bit watermarking system; in other words, it embeds only one bit (a 1 or a 0) inside the cover image [9].

The algorithms for the embedder and the detector are as follows:

Embedder:

1. Choose a random reference pattern. This is simply an array with the same dimensions as the original image, whose elements are drawn from a random Gaussian distribution in the interval [-1, 1]. The watermarking key is the seed used to initialize the pseudo-random number generator that creates the reference pattern.

2. Calculate a message pattern depending on whether we are embedding a 1 or a 0. For a 1, leave the random reference pattern as it is; for a 0, take its negative to get the message pattern.

3. Scale the message pattern by a constant α, which controls the embedding strength. Higher values of α give more robust embedding at the expense of losing image quality; the value used in the initial experiment was α = 1.

4. Add the scaled message pattern to the original image to obtain the watermarked image.

Detector:

1. Calculate the linear correlation between the received watermarked image and the reference pattern, which can be recreated from the initial seed that acted as the watermarking key.

2. Decide what the watermark message was according to the result of the correlation. If the linear correlation is above a threshold, we say the message was a 1; if it is below the negative of the threshold, we say the message was a 0; if it lies between the negative and positive thresholds, we say that no message was embedded.

An example of the embedding process can be seen in Figure 2. The top-left image is the original image, the bottom-left image is the reference pattern, and the watermarked image resulting from embedding a 1 with α = 1 is shown on the right. As we can see, there is no perceptual difference between the original and the watermarked image.

To test the effectiveness of this system, I used one small image (112 × 92 pixels), ran the embedding algorithm with α = 1, and embedded a 1 and a 0 in the image, resulting in watermarked images.
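The embedder and detector steps above can be sketched end-to-end. This is an illustrative Python/NumPy version of the described 1-bit scheme, not the paper's MATLAB code; the zero-mean normalisation of the pattern and the threshold value 0.1 are my own choices to make the toy example deterministic:

```python
import numpy as np

def reference_pattern(shape, seed):
    """Recreate the reference pattern from the key (the RNG seed).
    It is made exactly zero-mean so that a flat cover image
    contributes nothing to the linear correlation."""
    rng = np.random.default_rng(seed)
    pattern = rng.standard_normal(shape)
    return pattern - pattern.mean()

def embed(cover, bit, seed, alpha=1.0):
    """Blind embedding: add +pattern for a 1, -pattern for a 0,
    scaled by the embedding strength alpha."""
    pattern = reference_pattern(cover.shape, seed)
    message = pattern if bit == 1 else -pattern
    return cover + alpha * message

def detect(image, seed, threshold=0.1):
    """Decide 1, 0, or no message (None) by linear correlation
    with the recreated reference pattern."""
    corr = float(np.mean(image * reference_pattern(image.shape, seed)))
    if corr > threshold:
        return 1
    if corr < -threshold:
        return 0
    return None

cover = np.full((112, 92), 128.0)   # flat grey stand-in for the test image
key = 42                            # the watermarking key (RNG seed)
assert detect(embed(cover, 1, key), key) == 1
assert detect(embed(cover, 0, key), key) == 0
assert detect(cover, key) is None   # unmarked image: no message detected
```

With an embedded 1, the correlation is approximately α times the pattern's variance (about +1 here), and with a 0 it is about -1, which is why a small symmetric threshold separates the three cases.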

4. APPLICATIONS AND CHALLENGES

4.1 Challenges of watermarking

First we will discuss the challenges of digital watermarking faced today.

An obvious distraction: One of the major complaints raised by photographers and viewers alike is the distraction a watermark can create within the image. Some photographers and companies place a large watermark across the middle of the image, asserting copyright but obscuring the subject of the photo. If the watermark is too dark or too large, it becomes the only thing the viewer can truly focus on, creating a negative impression.

Fig. 2. An example of the blind embedding system results

Not foolproof: Watermarks are typically used to reduce the possibility of theft. With the availability of affordable or free photo-editing programs such as Photoshop and GIMP, this may prove a wasted effort: using the clone or trimming tools, even a novice can remove a watermark, and an unscrupulous thief may even place his own watermark on the image after removing yours. Watermarks do not offer foolproof protection against theft.

Reduces social sharing: Social networking sites help spread information and news, but heavily watermarked photos may not receive much attention. People tend to share images they find appealing, and some may be turned off by a web-sized image with a prominent watermark.

Promotes a negative image of you: Although a watermark is intended to protect your rights, its appearance could have a negative impact on your brand. Copyrighting all your online images could give viewers the impression that you are more concerned with possible theft than with displaying the image as it was intended. Large watermarks across a product photo may frustrate shoppers, who cannot see the product clearly behind the distracting watermark.

4.2 Watermarking applications

The increasing amount of research on watermarking over the past decade has been largely driven by its important applications in digital copyright management and protection. The first application of watermarking was broadcast monitoring.

It is often crucially important to be able to track when a specific video is broadcast by a TV station. This matters to advertising agencies that want to ensure their commercials get the air time they paid for. Another very important application is owner identification. Identifying the owner of a specific digital work of art, such as a video or image, can be quite difficult; nevertheless, it is a very important task, especially in cases related to copyright infringement. Instead of including copyright notices with every image or song, we could use watermarking to embed the copyright in the image or the song itself.

Transaction tracking is another interesting application of watermarking. Here the watermark embedded in a digital work records one or more transactions in the history of a copy of that work. For example, watermarking could record the recipient of every legal copy of a movie by embedding a different watermark in each copy; if the movie is then leaked to the internet, the producers could identify which recipient was the source of the leak. Copy control is a further promising application: a watermark embedded in a song or movie can instruct a watermarking-compatible DVD or CD writer not to write the content, because it is an illegal copy.

5. CONCLUSION AND FUTURE SCOPE

5.1 Conclusion

In the previous sections we reviewed and discussed digital watermarking using MATLAB. We explained a simple technique for implementing digital watermarking on a small image using MATLAB code, and showed an example, with a diagram, of blind embedding. It is a very simple technique: we ran the embedding algorithm with α = 1 and embedded a 1 and a 0 in the image, resulting in watermarked images. Due to the rapidly growing role of social networks and communications in everyday life, taking and sharing images has become a widespread practice, and a remarkable share of modern mobile phones and computers, as well as digital cameras, support high-resolution imaging. However, transferring such images from one device to another may be seriously exposed to the risks of security, manipulation, and copyright attacks, unless care is taken by embedding data into the media content through watermarking. Watermarking provides a vital platform for protecting multimedia material from a variety of undesired operations and illegal interferences, such as distribution and manipulation; for reliable performance, systems need to generate seamless watermarks that can handle large volumes of data robustly and securely.

5.2 Future work

In this work we simply wanted to understand the basic approaches to implementing the watermarking technique in MATLAB with a small image. In future we will focus on DWT and DCT, the two most popular tools used in watermarking algorithms. With the increasing use of SVD, digital watermarking in the transform domain has developed greatly; based on DWT, DCT, and SVD, we will try to develop a new watermarking algorithm for digital images.

REFERENCES

[1] V. Solachidis, E. Maiorana, P. Campisi and F. Banterle, "HDR image watermarking based on bracketing decomposition," in Digital Signal Processing (DSP), 18th International Conference on, IEEE, pp. 1-6, (2013).

[2] I. Cox, M. Miller, J. Bloom, J. Fridrich and T. Kalker. Digital Watermarking and Steganography, Second Edition. Elsevier, (2008).

[3] X. Xue, M. Okuda and S. Goto, "μ-law based watermarking for HDR image robust to tone mapping," (2011).

[4] Pooya Monshizadeh Naini. "Digital Watermarking Using MATLAB." University of Tehran, Iran.

[5] K. Zebbiche and F. Khelifi, "Efficient wavelet-based perceptual watermark masking for robust fingerprint image watermarking," IET Image Processing, 8(1): 23-32, (2014).

[6] L. Laur, M. Daneshmand, M. Agoyi and G. Anbarjafari, "Robust grayscale watermarking technique based on face detection," in Signal Processing and Communications Applications Conference (SIU), 23rd, IEEE, pp. 471-475, (2015).

[7] P.A. Hernandez-Avalos, C. Feregrino-Uribe and R. Cumplido, "Watermarking using similarities based on fractal codification," Digital Signal Processing, 22(2): 324-336, (2012).

[8] P. Rasti, S. Samiei, M. Agoyi, S. Escalera and G. Anbarjafari, "Robust non-blind color video watermarking using QR decomposition and entropy analysis," Journal of Visual Communication and Image Representation, 38: 838-847, (2016).

[9] E. Elbaşı, "Robust multimedia watermarking: hidden Markov model approach for video sequences," Turkish Journal of Electrical Engineering & Computer Sciences.

[10] Wikipedia, https://www.wikipedia.org/



AL-FALAH UNIVERSITY
Faridabad (Haryana), India

Abstracted/Indexed by: Scribd, Advanced Science Index (ASI), Qwant, ResearchBib