ICCCI Proceedings

Velammal Silver Jubilee Celebrations 1986-2010

22nd & 23rd July 2010

Proceedings of the International Conference On COMPUTERS, COMMUNICATION & INTELLIGENCE

Organised by

Velammal College of Engineering & Technology

Viraganoor, Madurai 625 009, India


CONFERENCE ORGANIZATION
Chief Patron: Shri. M.V. Muthuramalingam, Chairman
Organising Chair: Dr. N. Suresh Kumar, Principal
Organising Secretaries: Dr. P. Alli and Dr. G. Manikandan

INTERNATIONAL ADVISORY COMMITTEE MEMBERS
Dr. Kimmo Salmenjoki, Seinajoki University of Applied Sciences, Finland
Dr. Fiorenzo Franceschini, Politecnico di Torino, Italy
Dr. Namudri Kamesh, University of North Texas, USA
Dr. Henry Selvaraj, University of Nevada, USA
Dr. Kasim Mousa Al Aubidy, Philadelphia University, Jordan
Dr. Jiju Antony, University of Strathclyde, UK
Dr. Lorna Uden, University of Staffordshire, UK
Dr. A. Vallavaraj, Caledonian College of Engineering, Sultanate of Oman
Dr. Paulraj Murugesa Pandiyan, University of Malaysia, Perlis
Dr. Arunagiri, Yanbu Industrial College, Kingdom of Saudi Arabia
Dr. Raja Sooriya Moorthi, Carnegie Mellon University, USA
Dr. Angappa Gunasekaran, University of Massachusetts, USA
Dr. Sridhar Arjunan, RMIT University, Australia

NATIONAL ADVISORY COMMITTEE MEMBERS
Cmdr. Suresh Kumar Thakur, NRB, DRDO, India
Dr. M. Mathirajan, Anna University, India
Dr. Chitra T. Rajan, PSG Tech, India
Dr. G. Arumugam, MKU, India
Dr. S. Ibrahim Sadhar, Wins Infotech Pvt. Ltd., India
Dr. T. Devi, Bharathiar University, India
Dr. L. Ganesan, AC Tech, India
Dr. T. Purushothaman, GCT, India
Dr. R. Murugesan, MKU, India
Dr. B. Ramadoss, NIT, India
Dr. S. Mercy Shalini, TCE, India
Dr. Kannan Balasubramanian, MSEC, India
Dr. K. Muneeswaran, MSEC, India
Dr. K. Ramar, NEC, India
Mr. Jegan Jothivel, Cisco Networking Academy, India

PAPER ID | PAPER TITLE | SESSION ID | PAGE NO. IN PROCEEDINGS

SESSION 1
AI002 | Study of similarity metrics for genomic data using GO-ontology | S1-01 | 87-94
AI005 | Hybrid PSO based neural network classifier and decision tree for brain MRI mining | S1-02 | 95-100
AI006 | GAP: genetic algorithm based power estimation technique for behavioral circuits | S1-03 | 101-107
AI007 | Human action classification using 3D star skeletonization and RVM classifier | S1-04 | 108-115
COMP002 | Enhanced knowledge base representation technique for intelligent storage and efficient retrieval using knowledge based markup language | S1-05 | 154-157
AI009 | Face detection using wavelet transform and RBF neural network | S1-06 | 303-306
COMP016 | Automated test case generation and performance analysis for GUI application | S1-07 | 178-187
COMP017 | The need of the hour: NoSQL technology for next generation data storage | S1-08 | 188-192
COMP115 | Intelligent Agent based Data Cleaning to improve the Accuracy of WiFi Positioning System Using Geographical Information System (GIS) | S1-09 | 24-30
COMP102 | Designing Health Care Forum using Semantic Search Engine & Diagnostic Ontology | S1-10 | 77-81
COMP135 | Optimization of Tool Wear in Shaping Process by Machine vision system Using Genetic Algorithm | S1-11 | 453-456
COMP111 | Framework for Comparison of Association Rule Mining using Genetic Algorithm | S1-12 | 14-20
COMP114 | A New Method For Solving Fuzzy Linear Programming With TORA | S1-13 | 237-239

SESSION 2
AI008 | Relevance vector machine based gender classification using gait appearance features | S2-01 | 116-122
COMN013 | An energy efficient advanced data compression and decompression schemes for WSN | S2-02 | 316-319
COMN018 | Active noise control: a simulation study | S2-03 | 320-325
AI013 | A survey on gait recognition using HMM model | S2-04 | 123-126
COMN022 | Human motion tracking and behavior classification using multiple cameras | S2-05 | 131-134
COMP007 | Adaptive visible watermarking and copy protection of reverted multimedia data | S2-06 | 168-173
COMP026 | Hiding sensitive frequent item set by database extension | S2-07 | 357-362
COMP027 | Integrated biometric authentication using fingerprint and iris matching | S2-08 | 436-441
COMP103 | Improvement towards efficient OPFET detector | S2-09 | 417-420
COMN028 | Texture segmentation method based on combinatorial of morphological and statistical operations using wavelets | S2-10 | 326-329
COMP118 | High Performance Evaluation of 600-1200V, 1-40A Silicon Carbide Schottky Barrier Diodes and Their Applications Using MATLAB | S2-11 | 369-376
COMN034 | Phase based shape matching for trademarks retrieval | S2-12 | 149-153
COMP146 | The Medical Image segmentation | S2-13 | 212-215

SESSION 3
AI014 | A clustering approach based on functionality of genes for microarray data to find meaningful associations | S3-01 | 307-315
COMP006 | Semantic web based personalization of e-learning courseware using concept maps and clustering | S3-02 | 158-167
COMP142 | Modeling of Cutting Parameters for Surface Roughness in Machining | S3-03 | 464-467
COMP008 | A web personalization system for evolving user profiles in dynamic web sites based on web usage mining techniques and agent technology | S3-04 | 174-177
COMP013 | Creating actionable knowledge within the organization using rough set computing | S3-05 | 334-336
COMP032 | A new framework for analyzing document clustering algorithms | S3-06 | 442-452
COMP038 | Secure Multiparty Computation Based Privacy Preserving Collaborative Data Mining | S3-07 | 66-70
COMP133 | Towards Energy Efficient Protocols For Wireless Body Area Networks | S3-08 | 207-211
COMP119 | A cascade data mining approach for network anomaly detection system | S3-09 | 377-384
COMP124 | Rule Analysis Based On Rough Set Data Mining Technique | S3-10 | 291-296
COMP128 | On the Investigations of Design, Implementation, Performance and Evaluation issues of a Novel BD-SIIT Stateless IPv4/IPv6 Translator | S3-11 | 260-269
COMP129 | The Role of IPv6 over Fiber (FIPv6): Issues, Challenges and its Impact on Hardware and Software | S3-12 | 270-277
COMP137 | Entrustment based authentication protocol for mobile systems | S3-13 | 389-392

SESSION 4
COMP022 | Exploiting parallelism in bidirectional Dijkstra for shortest-path computation | S4-01 | 351-356
COMP138 | CLD for improving overall throughput in wireless networks | S4-02 | 46-49
COMN005 | Congestion management routing protocol in mobile ad hoc networks | S4-03 | 127-130
COMN020 | Performance improvement in ad hoc networks using dynamic addressing | S4-04 | 6-13
COMN023 | Hierarchical zone based intrusion detection system for mobile ad hoc networks | S4-05 | 135-138
COMN024 | Implementing High Performance Hybrid Search Using CELL Processor | S4-06 | 139-148
COMN026 | Enhancing temporal privacy and source-location privacy in WSN routing by FFT based data perturbation method | S4-07 | 421-425
COMN027 | Mixed-radix 4-2 butterfly FFT/IFFT for wireless communication | S4-08 | 203-206
COMP035 | NTRU - public key cryptosystem for constrained memory devices | S4-09 | 55-59
COMP036 | A novel randomized key multimedia encryption algorithm secure against several attacks | S4-10 | 60-65
COMP037 | Denial Of Service: New Metrics And Their Measurement | S4-11 | 363-368
COMP012 | FPGA design of application specific routing algorithms for network on chip | S4-12 | 330-333
COMP019 | Selection of checkpoint interval in coordinated checkpointing protocol for fault tolerant Open MPI | S4-13 | 216-223

SESSION 5
COMP116 | Latest Trends and Technologies in Enterprise Resource Planning (ERP) | S5-01 | 240-245
COMP020 | Integrating the static and dynamic processes in software development | S5-02 | 343-350
COMP023 | Content management through electronic document management system | S5-03 | 21-23
COMP024 | A Multi-Agent Based Personalized e-Learning Environment | S5-04 | 397-401
COMP109 | Architecture Evaluation for Web Service Security Policy | S5-05 | 284-290
COMP110 | Harmonics In Single Phase Motor Drives And Power Conservation | S5-06 | 412-416
COMP030 | Identification in the e-health information systems | S5-07 | 402-405
COMP033 | A Robust Security metrics for the e-Healthcare Information Systems | S5-08 | 297-302
COMP147 | Theoretical Investigation Of Size Effect On The Thermal Properties Of Nanoparticles | S5-09 | 426-431
COMN033 | An efficient turbo coded OFDM system | S5-10 | 193-198
COMP018 | COMPVAL: a system to mitigate SQLIA | S5-11 | 337-342
COMP126 | Fault Prediction Using Conceptual Cohesion in Object Oriented System | S5-12 | 256-259
COMP127 | A Framework for Multiple Classifier Systems Comparison (MCSCF) | S5-13 | 31-40
COMP150 | A Comparative Study of Various Topologies and its performance analysis using WDM Networks | S5-14 | 457-463

SESSION 6
COMP117 | A New Semantic Similarity Metric for Handling all Relations in WordNet Ontology | S6-01 | 246-255
COMP112 | Simplification of diagnosing disease through microscopic images of blood cells | S6-02 | 224-229
COMP123 | Efficient Apriori Hybrid Algorithm For Pattern Extraction Process | S6-03 | 41-45
COMP125 | MRI Mammogram Image Segmentation using NCut method and Genetic Algorithm with partial filters | S6-04 | 1-5
COMP130 | Localized CBIR for indexing image database | S6-05 | 278-283
COMP021 | Particle swarm optimization algorithm in grid computing | S6-06 | 50-54
COMP149 | Advancement in mobile technology using Bada | S6-07 | 199-202
COMP107 | Enhancing the Life Time of Wireless Sensor Networks Using Mean Measure Mechanism | S6-08 | 82-86
COMP113 | Cloud Computing And Virtualization | S6-09 | 230-236
COMP121 | Dynamic Key Management to minimize communication latency for efficient group communication | S6-10 | 432-435
COMP101 | Towards Customer Churning Prevention through Class Imbalance | S6-11 | 71-76
COMP120 | Membrane Computing - an Overview | S6-12 | 385-388
COMP148 | Privacy Preserving Distributed Data Mining Using Elliptic Curve Cryptography | S6-13 | 406-411
COMP149 | Modeling A Frequency Selective Wall For Indoor Wireless Environment | S6-14 | 393-396

Paper Index

Sl. No. | Title | Authors | Page No.
1 | MRI Mammogram Image Segmentation using NCut Method and Genetic Algorithm with Partial Filters | A. Pitchumani Angayarkanni | 1-5
2 | Performance Improvement in Ad Hoc Networks Using Dynamic Addressing | S. Jeyanthi & N. Uma Maheswari | 6-13
3 | Framework for Comparison of Association Rule Mining using Genetic Algorithm | K. Indira & S. Kanmani | 14-20
4 | Content Management through Electronic Document Management System | T. Vengattaraman, A. Ramalingam & P. Dhavachelvan | 21-23
5 | Intelligent Agent based Data Cleaning to improve the Accuracy of WiFi Positioning System Using Geographical Information System (GIS) | T. Joshva Devadas | 24-30
6 | A Framework for Multiple Classifier Systems Comparison (MCSCF) | P. Shanmugapriya & S. Kanmani | 31-40
7 | Efficient Apriori Hybrid Algorithm For Pattern Extraction Process | J. Kavitha, D. Magdalene Delighta Angeline & P. Ramasubramanian | 41-45
8 | CLD for Improving Overall Throughput in Wireless Networks | Dr. P. Seethalakshmi & Ms. A. Subasri | 46-49
9 | Particle Swarm Optimization Algorithm In Grid Computing | Mrs. R. Aghila, M. Harine & G. Priyadharshini | 50-54
10 | NTRU - Public Key Cryptosystem For Constrained Memory Devices | V. Pushparani & Kannan Balasubramaniam | 55-59
11 | A Novel Randomized Key Multimedia Encryption Algorithm Secure Against Several Attacks | S. Arul Jothi | 60-65
12 | Secure Multiparty Computation Based Privacy Preserving Collaborative Data Mining | J. Bhuvana & Dr. T. Devi | 66-70
13 | Towards Customer Churning Prevention through Class Imbalance | M. Rajeswari & Dr. T. Devi | 71-76
14 | Designing Health Care Forum Using Semantic Search Engine & Diagnostic Ontology | Prof. Mr. V. Shunmughavel & Dr. P. Jaganathan | 77-81
15 | Enhancing the Life Time of Wireless Sensor Networks Using Mean Measure Mechanism | P. Ponnu Rajan & D. Bommudurai | 82-86
16 | Study of Similarity Metrics for Genomic Data Using GO-Ontology | V. Annalakshmi, R. Priyadarshini & V. Bhuvaneshwari | 87-94
17 | Hybrid PSO based neural network classifier and decision tree for brain MRI mining | Dr. V. Saravanan & T.R. Sivapriya | 95-100
18 | GAP: Genetic Algorithm based Power Estimation Technique for Behavioral Circuits | Johnpaul C. I, Elson Paul & Dr. K. Najeeb | 101-107
19 | Human Action Classification Using 3D Star Skeletonization and RVM Classifier | Mrs. B. Yogameena, M. Archana & Dr. (Mrs) S. Raju Abhaikumar | 108-115
20 | Relevance Vector Machine Based Gender Classification using Gait Appearance Features | Mrs. B. Yogameena, M. Archana & Dr. (Mrs) S. Raju Abhaikumar | 116-122
21 | A Survey on Gait Recognition Using HMM Model | M. Siva Sangari & M. Yuvaraju | 123-126
22 | Congestion Management Routing Protocol In Mobile Ad Hoc Networks | A. Valarmathi & RM. Chandrasekaran | 127-130
23 | Human Motion Tracking And Behaviour Classification Using Multiple Cameras | M.P. Jancy & B. Yogameena | 131-134
24 | Hierarchical Zone Based Intrusion Detection System for Mobile Ad Hoc Networks | D.G. Jyothi & S.N. Chandra shekara | 135-138

25 | Implementing High Performance Hybrid Search Using CELL Processor | Mrs. Umarani Srikanth | 139-148
26 | Phase Based Shape Matching For Trademarks Retrieval | B. Sathya Bama, M. Anitha & Dr. S. Raju | 149-153
27 | Enhanced Knowledge Base Representation Technique for Intelligent Storage and Efficient Retrieval Using Knowledge Based Markup Language | A. Meenakshi, V. Thirunageswaran & M.G. Avenash | 154-157
28 | Semantic Web Based Personalization Of E-Learning Courseware Using Concept Maps And Clustering | D. Anitha | 158-167
29 | Adaptive visible watermarking and copy protection of reverted multimedia data | S.T. Veena & Dr. K. Muneeswaran | 168-173
30 | A Web Personalization System for evolving user profiles in Dynamic Web Sites based on Web Usage Mining Techniques and Agent Technology | G. Karthik, R. Vivekanandam & P. Rupa Ezhil Arasi | 174-177
31 | Automated Test Case Generation and Performance Analysis for GUI Application | Ms. A. Askarunisa & Ms. D. Thangamari | 178-187
32 | The Need Of The Hour - NOSQL Technology for Next Generation Data Storage | K. Chitra & Sherin M John | 188-192
33 | An Efficient Turbo Coded OFDM System | Prof. Vikas Dhere | 193-198
34 | Advancement In Mobile Technology Using Bada | V. Aishwarya, J. Manibharathi & Dr. S. Durai Raj | 199-202
35 | Mixed-Radix 4-2 Butterfly FFT/IFFT For Wireless Communication | A. Umasankar & S. Vinayagakarthikeyan | 203-206
36 | Towards Energy Efficient Protocols For Wireless Body Area Networks | Shajahan Kutty & J.A. Laxminarayana | 207-211
37 | The Medical Image Segmentation | Hemalatha & R. Kalaivani | 212-215
38 | Selection of a Checkpoint Interval in Coordinated Checkpointing Protocol for Fault Tolerant Open MPI | P.M. Mallikarjuna Shastry & K. Venkatesh | 216-223
39 | Simplification Of Diagnosing Disease Through Microscopic Images Of Blood Cells | Benazir Fathima, K.V. Gayathiri Devi, M. Arunachalam & M.K. Hema | 224-229
40 | Cloud Computing And Virtualization | R. Nilesh Madhukar Patil & Mr. Shailesh Somnath Sangle | 230-236
41 | A New Method For Solving Fuzzy Linear Programming With TORA | S. Sagaya Roseline, A. Faritha Asma & E.C. Henry Amirtharaj | 237-239
42 | Latest Trends And Technologies In Enterprise Resource Planning (ERP) | B.S. Dakshayani | 240-245
43 | A New Semantic Similarity Metric for Handling all Relations in WordNet Ontology | K. Saruladha, Dr. G. Aghila & Sajina Raj | 246-255
44 | Fault Prediction Using Conceptual Cohesion in Object Oriented System | V. Lakshmi, P.V. Eswaripriya, C. Kiruthika & M. Shanmugapriya | 256-259
45 | On the Investigations of Design, Implementation, Performance and Evaluation issues of a Novel BD-SIIT Stateless IPv4/IPv6 Translator | J. Hanumanthappa, D.H. Manjaiah & C.V. Aravinda | 260-269
46 | The Role of IPv6 over Fiber (FIPv6): Issues, Challenges and its Impact on Hardware and Software | J. Hanumanthappa, D.H. Manjaiah & C.V. Aravinda | 270-277
47 | Localized CBIR for Indexing Image Databases | D. Vijayalakshmi & P. Vijayalakshmi | 278-283
48 | Architecture Evaluation for Web Service Security Policy | B. Joshi Vinayak, Dr. D.H. Manjaiah, J. Hanumathappa & Nayak Ramesh Sunder | 284-290
49 | Rule Analysis Based On Rough Set Data Mining Technique | P. Ramasubramanian, V. Sureshkumar & P. Alli | 291-296
50 | A Robust Security metrics for the e-Healthcare Information Systems | Said Jafari, Fredrick Mtenzi, Ronan Fitzpatrick & Brendan O'Shea | 297-302
51 | Face Detection Using Wavelet Transform And RBF Neural Network | M. Madhu, M. Moorthi, S. Sathish Kumar & Dr. R. Amutha | 303-306

52 | A Clustering approach based on Functionality of Genes for Microarray data to find meaningful associations | M. Selvanayaki & V. Bhuvaneshwari | 307-315
53 | An Energy Efficient Advanced Data Compression And Decompression Schemes For WSN | G. Mohanbabu & Dr. P. Renuga | 316-319
54 | Active Noise Control: A Simulation Study | Sivadasan Kottayi & N.K. Narayanan | 320-325
55 | Texture Segmentation Method Based On Combinatorial Of Morphological And Statistical Operations Using Wavelets | V. Vijayapriya & Prof. K.R. Krishnamoorthy | 326-329
56 | FPGA Design Of Routing Algorithms For Network On Chip | R. Anitha & Dr. P. Renuga | 330-333
57 | Creating Actionable Knowledge within the Organization using Rough set computing | Mr. R. Rameshkumar, Dr. A. Arunagiri, Dr. V. Khanaa & Mr. C. Poornachandran | 334-336
58 | COMPVAL: A system to mitigate SQLIA | S. Fouzul Hidhaya & Dr. Angelina Geetha | 337-342
59 | Integrating the Static and Dynamic Processes in Software Development | V. Hepsiba Mabel, K. Alagarsamy & S. Justus | 343-350
60 | Exploiting Parallelism in Bidirectional Dijkstra for Shortest-Path Computation | R. Kalpana, Dr. P. Thambidurai, R. Arvind kumar, S. Parthasarathi & Praful Ravi | 351-356
61 | Hiding Sensitive Frequent Item Set by Database Extension | B. Mullaikodi & Dr. S. Sujatha | 357-362
62 | Denial Of Service: New Metrics And Their Measurement | Dr. Kannan Balasubramanian & P. Kavithapandian | 363-368
63 | High Performance Evaluation of 600-1200V, 1-40A Silicon Carbide Schottky Barrier Diodes and Their Applications Using MATLAB | K. Manickavasagan | 369-376
64 | A Cascade Data Mining Approach for Network Anomaly Detection System | C. Seelammal | 377-384
65 | Membrane Computing - an Overview | R. Raja Rajeswari & Devi Thirupathi | 385-388
66 | Entrustment Based Authentication Protocol For Mobile Systems | R. Rajalakshmi & R.S. Ponmagal | 389-392
67 | Modeling A Frequency Selective Wall For Indoor Wireless Environment | Mrs. K. Suganya, Dr. N. Suresh Kumar & P. Senthil Kumar | 393-396
68 | A Multi-Agent Based Personalized e-Learning Environment | T. Vengattaraman, A. Ramalingam, P. Dhavachelvan & R. Baskaran | 397-401
69 | Identification in the E-Health Information Systems | Ales Zivkovic | 402-405
70 | Privacy Preserving Distributed Data Mining Using Elliptic Curve Cryptography | M. Rajalakshmi & T. Purusothaman | 406-411
71 | Harmonics In Single Phase Motor Drives And Energy Conservation | Mustajab Ahmed Khan & Dr. A. Arunagiri | 412-416
72 | Improvement towards efficient OPFET detector | Jaya V. Gaitonde & Rajesh B. Lohani | 417-420
73 | Enhancing Temporal Privacy and Source-Location Privacy in WSN Routing by FFT Based Data Perturbation Method | R. Prasanna Kumar & T. Ravi | 421-425
74 | Theoretical Investigation of Size Effect on the Thermal Properties of Nanoparticles | K. Sadaiyandi & M.A. Zafrulla Khan | 426-431
75 | Dynamic Key Management to minimize communication latency for efficient group communication | Dr. P. Alli, G. Vinoth Chakkaravarthy & R. Deepalakshmi | 432-435
76 | Integrated Biometric Authentication Using Fingerprint and IRIS Matching | A. Muthukumar & S. Kannan | 436-441
77 | New Framework for Analyzing Document Clustering Algorithms | Mrs. J. Jayabharathy & Dr. S. Kanmani | 442-452
78 | Optimization of Tool Wear in Shaping Process by Machine vision system Using Genetic Algorithm | S. Palani, G. Senthilkumar, S. Saravanan & J. Ragunesan | 453-456
79 | A Comparative Study of Various Topologies and its performance analysis using WDM Networks | P. Poothathan, S. Devipriya & S. John Ethilton | 457-463
80 | Modeling of Cutting Parameters for Surface Roughness in Machining | M. Aruna & P. Ramesh Kumar | 464-467

Proceedings of International Conference on Computers, Communication & Intelligence, July 22nd & 23rd 2010

Velammal College of Engineering and Technology, Madurai Page 1

MRI Mammogram Image Segmentation Using NCut Method and Genetic Algorithm with Partial Filters

S. Pitchumani Angayarkanni, M.C.A., M.Phil., Ph.D.

Lecturer, Department of Computer Science, Lady Doak College, Madurai

ABSTRACT: Cancer is one of the leading causes of death among men and women around the world, and among cancers, breast cancer is of particular concern for women. Over the past 50 years it has become a major health problem in both developed and developing countries, and its incidence has increased in recent years. A recent trend in digital image processing is computer-aided detection (CAD) systems, computerized tools designed to assist radiologists; most of these systems are used for the automatic detection of abnormalities. However, recent studies have shown that their sensitivity decreases significantly as breast density increases. In this paper, the proposed algorithm uses partial filters to enhance the images, and the NCut method is applied to segment the malignant and benign regions. A genetic algorithm is then applied to identify the nipple position, followed by bilateral subtraction of the left and right breast images to cluster the cancerous and non-cancerous regions. The system is trained using the back-propagation neural network (BPN) algorithm. The computational efficiency and accuracy of the proposed system are evaluated using the free-response receiver operating characteristic (FROC) curve. The algorithm was tested on 161 pairs of digitized mammograms from the MIAS database, and the receiver operating characteristic analysis indicates 99.987% accuracy in the detection of cancerous masses.

Keywords: Filters, Normalized Cut, Segmentation, BPN, Genetic Algorithm and FROC.

1. INTRODUCTION: Breast cancer is one of the major causes of increased mortality among women, especially in developed countries, and it is the second most common cancer in women. The World Health Organization estimated that more than 150,000 women worldwide die of breast cancer each year. In India, breast cancer accounts for 23% of all female cancer deaths, followed by cervical cancer at 17.5%.
Early detection of cancer leads to significant improvements in conservation treatment. However, recent studies have shown that the sensitivity of CAD systems decreases significantly as breast density increases, while their specificity remains relatively constant. In this work we have developed an automatic neuro-genetic approach to detect suspicious regions on digital mammograms based on asymmetries between the left and right breast images. One of the major tools used for the early detection of breast cancer is mammography. Mammography offers high-quality images at low radiation doses and is the only widely accepted imaging method for routine breast cancer screening. Although mammography is widely used around the world, it presents some difficulties for diagnosing breast cancer. One difficulty is that abnormalities in mammograms generally have low contrast compared with normal breast structure, which makes mammograms difficult for radiologists to interpret. Studies show that the interpretation of mammograms by radiologists can result in high rates of false positives and false negatives. As a result, a high proportion of women without cancer undergo breast biopsies, while some women with cancer miss the optimal time for treatment. Several solutions have been proposed to increase the accuracy and sensitivity of mammography and to reduce unnecessary biopsies. Double reading of mammograms, in which each mammogram is read by two radiologists, has been advocated to reduce the proportion of missed cancers; however, this solution is both costly and time consuming. Instead, CAD has drawn attention from both computer scientists and radiologists for the interpretation of mammograms. CAD, which integrates computer science, image processing, pattern recognition and artificial intelligence technologies, can be defined as a diagnosis made by a radiologist who uses the output of a computerized analysis of medical images as a second opinion in detecting lesions and making diagnostic decisions.
It has been shown that this kind of system can improve the accuracy of breast diagnosis for the early prediction of breast cancer. A computer-aided breast cancer detection system is especially useful when radiologists become fatigued from screening many mammograms. In a CAD system for breast cancer, the detection of abnormal regions such as calcifications, masses and architectural distortions is the central task, and the performance of the system depends on how well these abnormalities are detected. Many algorithms have been proposed for detecting them. This paper addresses the detection of microcalcifications (MCs). As one of the early signs of breast cancer, microcalcifications are tiny granule-like deposits of calcium that appear as small bright spots on mammograms; their size varies from 0.1 mm to 1 mm. A cluster of MCs is defined as a group of three to five MCs within a region, and microcalcification clusters (MCCs) are generally an important indication of possible cancer. The proposed algorithm detects MCs effectively and automatically.

2. ALGORITHM DESIGN: The algorithm for detecting MCCs involves four steps, shown in Fig 1.

2.1 PARTIAL FILTER FOR IMAGE ENHANCEMENT:

A filter is a mathematical transformation (a convolution product) that modifies the value of each pixel in the region to which it is applied according to the values of its neighbouring pixels, weighted by coefficients. The filter is represented by a table (matrix), characterized by its dimensions and its coefficients, whose centre corresponds to the pixel being processed; the coefficients determine the properties of the filter. The following is an example of a 3 × 3 filter:

$$
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 4 & 1 \\ 1 & 1 & 1 \end{bmatrix}
$$

Convolving the image matrix, which is usually very large because it represents the whole initial image (pixel table), with the filter yields a matrix corresponding to the processed image.

2.1.1 HIGH PASS FILTER: A high-pass filter lets high-frequency content pass, so the resulting image has greater detail and appears sharpened. The boundary information of the enhanced image was extracted for visual evaluation using a high-pass (Laplacian) filter.

Figure 2: Mammogram Image enhanced using high pass filter

2.1.2 LOW PASS FILTER: Low-pass filtering, also known as "smoothing", is employed to remove high-frequency noise from a digital image. Noise is often introduced during analog-to-digital conversion as a side effect of converting patterns of light energy into electrical patterns. There are several common approaches to removing this noise. If several copies of an image have been obtained from a static source, it may be possible to sum the values of each pixel across the copies and compute an average. This is not possible, however, if the image comes from a moving source or if there are other time or size restrictions. If such averaging is not possible, or is insufficient, some form of low-pass spatial filtering may be required. There are two main types: reconstruction filtering, where an image is restored based on some knowledge of the type of degradation it has undergone (filters that do this are often called "optimal filters"), and enhancement filtering, which attempts to improve the (subjectively measured) quality of an image for human or machine interpretation. Enhancement filters are generally heuristic and problem oriented. One of the most important problems in image processing is denoising. The denoising procedure usually depends on the features of the image, the aim of processing and the post-processing algorithms [5]. Denoising by low-pass filtering not only reduces noise but also blurs edges.
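To make the convolution described in Section 2.1 concrete, the following Python sketch applies a 3 × 3 kernel to a small gray-level image stored as nested lists. It uses the example kernel above, normalized by the sum of its weights, as the low-pass filter, and a standard 4-neighbour Laplacian kernel as the high-pass filter. The paper does not list its high-pass coefficients, so that kernel, the replicate-border handling and the function name `convolve3x3` are assumptions made for illustration only.

```python
from typing import List

Image = List[List[float]]
Kernel = List[List[float]]

# Example 3x3 smoothing (low-pass) kernel from the text; its weights sum to 12.
SMOOTH_KERNEL: Kernel = [[1, 1, 1],
                         [1, 4, 1],
                         [1, 1, 1]]

# Assumed high-pass kernel: the standard 4-neighbour Laplacian, signed so that
# bright spots give a positive response. The paper does not give its coefficients.
LAPLACIAN_KERNEL: Kernel = [[0, -1, 0],
                            [-1, 4, -1],
                            [0, -1, 0]]


def convolve3x3(image: Image, kernel: Kernel, normalize: bool = False) -> Image:
    """Apply a 3x3 kernel to every pixel, replicating edge pixels at the borders.

    If normalize is True, the result is divided by the sum of the kernel weights,
    which keeps the mean brightness unchanged for a smoothing kernel.
    """
    rows, cols = len(image), len(image[0])
    weight = sum(sum(row) for row in kernel) if normalize else 1
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp neighbour coordinates so border pixels reuse their nearest neighbour.
                    yy = min(max(y + dy, 0), rows - 1)
                    xx = min(max(x + dx, 0), cols - 1)
                    acc += kernel[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = acc / weight
    return out


# Toy image with one bright pixel, a stand-in for a microcalcification.
img = [[10, 10, 10],
       [10, 100, 10],
       [10, 10, 10]]
smoothed = convolve3x3(img, SMOOTH_KERNEL, normalize=True)  # bright spot spread out (blurred)
edges = convolve3x3(img, LAPLACIAN_KERNEL)                  # strong response at the bright spot
```

Both kernels are symmetric, so correlation and convolution give the same result here, which is why the sketch does not flip the kernel. Adding the high-pass response back to the original image is one common way to obtain a sharpened image; the smoothing example also shows the edge blurring noted above, since the bright pixel drops from 100 to 40.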

Fig 1: Flow Chart of Algorithm (Partial Filter → Feature Extraction using NCut Segmentation → Genetic Algorithm → Multilayered BPN)



Spatial- and frequency-domain filters are widely used as tools for image enhancement. Low-pass filters smooth the image by suppressing detail. Because mass detection aims to extract the edge of the tumor from the surrounding normal tissue and background, high-pass (sharpening) filters can be used to enhance image details.

Figure 3: Mammogram Image Enhanced Using Low Pass filter

2.2 IMAGE SEGMENTATION:

The goal of image segmentation is to cluster pixels into salient image regions, i.e., regions corresponding to individual surfaces, objects, or natural parts of objects. Here we apply the Normalized Cut method of segmentation to cluster microcalcification regions, following the normalized cut approach of Shi and Malik [13]. We seek a partition of the vertex set V of the affinity-weighted, undirected graph (without source and sink nodes) into F and G = V \ F. To avoid partitions in which F or G is a tiny region, Shi and Malik propose the normalized cut criterion, namely that F and G should minimize

$$
N(F, G) = \frac{L(F, G)}{L(F, V)} + \frac{L(F, G)}{L(G, V)}, \qquad L(A, B) = \sum_{i \in A} \sum_{j \in B} w_{ij},
$$

where $w_{ij}$ is the affinity between pixels $i$ and $j$.
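The criterion can be evaluated directly for a small graph. The Python sketch below computes N(F, G) from a symmetric affinity matrix and, for a toy four-node graph, finds the minimizing bipartition by exhaustive search. Exhaustive search is only feasible for tiny graphs; Shi and Malik solve a spectral relaxation instead, so this illustrates the criterion itself rather than the segmentation procedure used in the paper. The helper names and the example matrix are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple

Affinity = List[List[float]]


def assoc(w: Affinity, a: Set[int], b: Set[int]) -> float:
    """L(A, B): sum of affinities w[i][j] for i in A and j in B."""
    return sum(w[i][j] for i in a for j in b)


def ncut_value(w: Affinity, f: Set[int]) -> float:
    """Normalized cut N(F, G) with G = V - F; both parts must be non-empty."""
    v = set(range(len(w)))
    g = v - f
    cut = assoc(w, f, g)
    return cut / assoc(w, f, v) + cut / assoc(w, g, v)


def best_bipartition(w: Affinity) -> Tuple[Set[int], float]:
    """Try every proper, non-empty subset F (exponential, so tiny graphs only)."""
    n = len(w)
    best_f, best_val = set(), float("inf")
    for size in range(1, n):
        for combo in combinations(range(n), size):
            f = set(combo)
            val = ncut_value(w, f)
            if val < best_val:
                best_f, best_val = f, val
    return best_f, best_val


# Hypothetical affinities: nodes {0, 1} and {2, 3} are strongly linked pairs,
# joined to each other only by two weak edges.
W = [[0.0, 1.0, 0.5, 0.0],
     [1.0, 0.0, 0.0, 0.5],
     [0.5, 0.0, 0.0, 1.0],
     [0.0, 0.5, 1.0, 0.0]]
partition, value = best_bipartition(W)
```

Because each term is normalized by the total affinity of its own side, cutting off a single node is penalized: N({0}, V \ {0}) is larger than the cut that separates the two strongly linked pairs.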

Note that any segmentation technique can be used to generate proposals for suitable regions F, for which N(F, V \ F) can be evaluated. Indeed, the S-T min cut (SMC) approach can be viewed as using seed regions S and T to provide lower bounds on the terms L(F, V) and L(G, V) (namely L(S, V) and L(T, V), respectively), and then using the S-T min cut to globally minimize L(F, G) subject to S ⊆ F and T ⊆ G. Using this method, the microcalcifications are clustered.

Figure 4: After Normalized Cut Segmentation

The segmentation took 12.563 seconds on a 160 × 160 image.

2.3 GENETIC ALGORITHM: A partial-filtering-based normalized cut method is used to generate an image that separates the breast and non-breast regions, and the GA enhances the breast border. A border detector detects the edges in the binary image, where each pixel takes the intensity value zero for a non-border pixel or one for a border pixel; each pixel in the binary map corresponds to an underlying pixel in the original image. In the proposed system, a kernel is extracted at each border point as the 3 × 3 neighbourhood of pixels in the binary image, and these binary kernels are treated as the population strings for the GA. The corresponding kernels are extracted from the gray-level mammogram using the same spatial coordinates, and the sum of their intensity values is taken as the fitness value. After the initial population and fitness values have been identified, the genetic operators are applied to generate a new population. The reproduction operator produces new strings for crossover. Reproduction is implemented as a linear search through a roulette wheel with slots weighted in proportion to kernel fitness: a random number is multiplied by the sum of the population fitness to give a stopping point, and the search accumulates fitness values until the running total reaches that stopping point.

Figure 5: GA
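A minimal Python sketch of the fitness computation and roulette-wheel reproduction described in Section 2.3 follows. It assumes the border points have already been found by the border detector and are given as (row, column) pairs; the zero-padding at the image edge, the function names and the use of Python's `random.Random` are illustrative assumptions, and crossover and mutation are omitted.

```python
import random
from typing import List, Sequence, Tuple

Image = List[List[int]]


def kernel_fitness(gray: Image, border_points: Sequence[Tuple[int, int]]) -> List[int]:
    """Fitness of each kernel: sum of gray levels in the 3x3 window centred on a border point.

    Pixels outside the image are treated as zero (an assumption; the paper does not say).
    """
    rows, cols = len(gray), len(gray[0])
    fitness = []
    for y, x in border_points:
        total = 0
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                yy, xx = y + dy, x + dx
                if 0 <= yy < rows and 0 <= xx < cols:
                    total += gray[yy][xx]
        fitness.append(total)
    return fitness


def roulette_select(fitness: Sequence[float], rng: random.Random) -> int:
    """Pick an index with probability proportional to its fitness.

    A random number in [0, 1) times the total fitness gives the stopping point;
    a linear search accumulates fitness until the running sum passes it, so
    zero-fitness kernels are never chosen while any kernel has positive fitness.
    """
    stop = rng.random() * sum(fitness)
    running = 0.0
    for index, value in enumerate(fitness):
        running += value
        if running > stop:
            return index
    return len(fitness) - 1  # guard against floating-point round-off


# Hypothetical 3x3 gray-level patch and two border points.
gray = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
scores = kernel_fitness(gray, [(1, 1), (0, 0)])
parent = roulette_select(scores, random.Random(0))
```

The linear search is the simplest form of the roulette wheel and costs O(n) per draw; for large populations a cumulative-sum array with binary search is a common alternative, but the linear version matches the description in the text.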



2.4 GENERATING THE ASYMMETRIC IMAGE: After the images were aligned, bilateral subtraction was performed [47,48] by subtracting the digital matrix of the left breast image from that of the right breast image. Microcalcifications in the right breast image therefore have positive pixel values in the subtracted image, while microcalcifications in the left breast image have negative values. As a result, two new images were generated: one with the positive values and the other with the negative values. The most common gray level was zero, which indicates no difference between the left and right images. Each of the two generated images was then linearly stretched to cover the entire available range of 1024 gray levels. The difference between corresponding pixels carries important information that can be used to discriminate between normal and abnormal tissue, so the asymmetry image can be thresholded to extract suspicious regions. To generate the FROC curve, the asymmetry image is thresholded at ten different intensity values ranging from 50 to 150. Figure 6 shows an asymmetry image and the connected regions extracted by thresholding, which yield progressively larger numbers of high-difference pixels.

Figure 6 Asymmetric images
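A minimal Python sketch of the bilateral subtraction, linear stretching and multi-threshold steps of Section 2.4 follows. It assumes the left and right images are already aligned and of equal size, stores the negative differences as magnitudes so that both images can be stretched the same way, and spaces the ten thresholds evenly over 50 to 150; the paper does not state the spacing, and all names here are illustrative.

```python
def bilateral_subtract(right, left):
    """Pixel-wise right - left, split into a positive and a negative image."""
    diff = [[r - l for r, l in zip(rrow, lrow)] for rrow, lrow in zip(right, left)]
    positive = [[max(d, 0) for d in row] for row in diff]    # brighter in the right breast
    negative = [[max(-d, 0) for d in row] for row in diff]   # brighter in the left breast (as magnitudes)
    return positive, negative

def linear_stretch(img, levels=1024):
    """Linearly map the image range [min, max] onto [0, levels - 1]."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [[0 for _ in row] for row in img]
    scale = (levels - 1) / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in img]

def threshold_masks(img, thresholds):
    """One binary mask per threshold: 1 where the pixel exceeds that threshold."""
    return {t: [[1 if p > t else 0 for p in row] for row in img] for t in thresholds}

# Ten thresholds evenly spaced from 50 to 150 (the even spacing is an assumption).
THRESHOLDS = [50 + i * 100 / 9 for i in range(10)]
```

Each mask would then be passed to a connected-component labelling step to extract the candidate regions used for one point of the FROC curve.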

Two different techniques are used in the interpretation of mammograms. The first consists of a systematic search of each mammogram for visual patterns symptomatic of tumors; for example, a bright, approximately circular blob with a hazy boundary might indicate the presence of a circumscribed mass. The second technique, the asymmetry approach, consists of a systematic comparison of corresponding regions in the left and right breasts.

2.5. BPN TRAINING: In addition, a backpropagation artificial neural network (BP-ANN) was developed and evaluated on the same data. The parameters for ANN training have been published previously. Figure 5 compares the ROC curves for the LGP and BP-ANN algorithms. The BP-ANN yielded an ROC area index of Az = 0.88 ± 0.01. Our GP approach achieved a statistically significantly better performance, with Az = 0.91 ± 0.01.
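The ROC area index Az quoted above can be estimated non-parametrically with the same Mann-Whitney statistic that Section 2.6 uses to compare lesion areas: Az equals the U statistic divided by the number of (abnormal, normal) pairs. The paper does not say how its Az values were computed (fitting a binormal model is also common), so the sketch below is only one possible estimator, and the score lists in the test are hypothetical classifier outputs.

```python
def roc_auc(scores_abnormal, scores_normal):
    """Empirical area under the ROC curve via the Mann-Whitney U statistic.

    Counts the (abnormal, normal) pairs in which the abnormal case receives the
    higher score, with ties counted as one half, and normalizes by the number
    of pairs: Az = U / (n_abnormal * n_normal).
    """
    if not scores_abnormal or not scores_normal:
        raise ValueError("both classes need at least one score")
    u = 0.0
    for a in scores_abnormal:
        for n in scores_normal:
            if a > n:
                u += 1.0
            elif a == n:
                u += 0.5
    return u / (len(scores_abnormal) * len(scores_normal))
```

The quadratic pair loop keeps the correspondence with U explicit; for large data sets a rank-based computation of the same statistic is faster.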

Figure 7a) Steps involved in automated Classification using Ant Colony Optimization

2.6. ROC CURVE: Finally, the technique was evaluated on mammograms randomly selected from the non-suspicious section of the database. The method outlined small regions in 5 of the 15 non-suspicious mammograms. The areas identified were generally very small compared with those in abnormal mammograms.

Figure 8 Lesion Areas detected for Abnormal and Non-Suspicious cases (large image extracts). [Figures (a) and (b) are presented at different ordinate scales]

Fig 8(a) shows the extracted areas for the abnormal lesions (image sequences 54 to 87 are stellate lesions and 74 to 100 are regular masses). We first establish whether these represent two different populations by applying a Mann-Whitney (Wilcoxon rank sum) non-parametric test, since it is unrealistic to presume any specific underlying distribution. The median values are 450 and 1450 pixels respectively, which gives a confidence level of 85% that the two data sequences come from distinct populations. Since this is not significant at normally accepted levels, we can treat the abnormal cases as a single distribution and compare it against the non-suspicious set, Fig 8(b). Using the same test, median values of 5500 and 10 pixels are established for the two distributions, giving a confidence level greater than 97.5% that the two distributions are different. This suggests that our protocols are an effective method of area detection.

CONCLUSION: The proposed algorithms were tested on 161 pairs of digitized mammograms from the Mammographic Image Analysis Society (MIAS) database. To evaluate the performance of the proposed method, a free-response receiver operating characteristic (FROC) curve was generated for the mean detection rate over all 161 pairs of mammograms in the MIAS database. There is no doubt that, for the immediate future, mammography will continue to play a major role in the detection of breast cancer. The ultimate objective of this thesis was to identify tumors or masses in breast tissue. Since hamartomas consist of normal breast tissue in abnormal proportions, the first step was to identify the different tissue types in mammograms with normal breast tissue. Important features were extracted from each sub-image produced by the normalized cut method using various statistical techniques. The genetic algorithm was implemented and the breast border was identified from the clustered image. The tests were carried out on a set of 117 tissue samples, 67 benign and 50 malignant. The analysis gave a sensitivity of 99.8%, a specificity of 99.9% and an accuracy above 99.9%, which are encouraging results. The preliminary results of this approach are very promising for characterizing breast tissue.

REFERENCES:
[1] A. Bosch, X. Munoz, A. Oliver, and J. Marti, Modeling and Classifying Breast Tissue Density in Mammograms, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 1552-1558, 2006.
[2] D.-R. Chen, R.-F. Chang, C.-J. Chen, M.-F. Ho, S.-J. Kuo, S.-T. Chen, S.-J. Hung, and W. K. Moon, Classification of breast ultrasound images using fractal feature, Clinical Imaging, vol. 29, no. 4, pp. 234-245.
[3] J. S. Suri and R. M. Rangayyan, Recent Advances in Breast Imaging, Mammography, and Computer-Aided Diagnosis of Breast Cancer, 1st ed., SPIE, 2006.
[4] A. Hoos and C. Cordon-Cardo, Tissue microarray profiling of cancer specimens and cell lines: Opportunities and limitations, Mod. Pathol., vol. 81, no. 10, pp. 1331-1338, 2001.
[5] K. Lekadir, D. S. Elson, J. Requejo-Isidro, C. Dunsby, J. McGinty, N. Galletly, G. Stamp, P. M. French, and G. Z. Yang, Tissue characterization using dimensionality reduction and fluorescence imaging, in R. Larsen, M. Nielsen, and J. Sporring (eds.), MICCAI 2006, LNCS, vol. 4191, pp. 586-593, Springer, Heidelberg, 2006.

[6] A. Papadopoulos, D. I. Fotiadis, and A. Likas, An automatic microcalcification detection system based on a hybrid neural network classifier, Artificial Intelligence in Medicine, vol. 25, pp. 149-167, 2002.
[7] A. Papadopoulos, D. I. Fotiadis, and A. Likas, Characterization of clustered microcalcifications in digitized mammograms using neural networks and support vector machines, Artificial Intelligence in Medicine, vol. 34, pp. 141-150, 2005.
[8] R. Mousa, Q. Munib, and A. Moussa, Breast cancer diagnosis system based on wavelet analysis and fuzzy-neural, Expert Systems with Applications, vol. 28, pp. 713-723, 2005.



Performance Improvement in Ad Hoc Networks Using Dynamic Addressing

S. Jeyanthi#1, N. Uma Maheswari*2
#Lecturer, Computer Science Department

PSNA College of Engg & Tech, Dindigul, Tamilnadu, India

*Assistant Professor, PSNA College of Engg & Tech, Dindigul, Tamilnadu, India


Abstract Dynamic addressing refers to the automatic assignment of IP addresses. In this paper, we address scalable routing in ad hoc networks. It is well known that current ad hoc protocols do not scale to work efficiently in networks of more than a few hundred nodes. Most current ad hoc routing architectures use flat static addressing and thus need to keep track of each node individually, creating a massive overhead problem as the network grows. We propose that the use of dynamic addressing can enable scalable routing in ad hoc networks. We provide an initial design of a routing layer based on dynamic addressing and evaluate its performance. Each node has a unique permanent identifier and a transient routing address, which indicates its location in the network at any given time. The main challenge is dynamic address allocation in the face of node mobility. We propose mechanisms to implement dynamic addressing efficiently. Our initial evaluation suggests that dynamic addressing is a promising approach for achieving scalable routing in large ad hoc and mesh networks. Keywords Ad hoc networks, Flat static addressing, Dynamic addressing, Unique permanent identifier.

I. Introduction

Ad hoc networking technology has advanced tremendously, but it has yet to become widely deployed. Ad hoc network research seems to have downplayed the importance of scalability; in fact, current ad hoc architectures do not scale well beyond a few hundred nodes. Existing ad hoc routing layers lack scalability: they use flat static addressing, which creates massive routing overhead and increases search time. The easy-to-use, self-organizing nature of ad hoc networks makes them attractive to a diverse set of applications. Today, these are usually limited to smaller deployments, but if we can solve the scalability problem, and provide support for heterogeneous means of connectivity, including directional antennas, communication lasers, even satellites and wires, ad hoc and mesh-style networking is likely to see adoption in very large networks as well. Large-scale events such as disaster relief or rescue efforts are highly dependent on effective communication capabilities. Such efforts could benefit tremendously from self-organizing networks that improve the available communication and monitoring capabilities. Other interesting candidate scenarios are community networks in dense residential areas, large-scale, long-range networks in developing regions, and others where no central administrator exists, or where administration would prove too costly. The current routing protocols and architectures work well only up to a few hundred nodes. Most current research in ad hoc networks focuses on performance and power consumption issues in relatively small networks, and less on scalability. The main reason behind the lack of scalability is that these protocols rely on flat and static addressing. With scalability as a partial goal, some efforts have been made in the direction of hierarchical routing and clustering [1] [2] [3]. These approaches do hold promise, but they do not seem to be actively pursued. It appears to us that these protocols would work well in scenarios with group mobility [4], which is also a common assumption among cluster-based routing protocols. We examine whether dynamic addressing is a feasible way to achieve scalable ad hoc routing. By scalable, we mean thousands up to millions of nodes in an ad hoc or mesh network. With dynamic addressing, nodes change addresses as they move, so that their addresses have a topological meaning. Dynamic addressing simplifies routing but introduces two new problems: address allocation and address lookup.
As a guideline, we identify a set of properties that a scalable and efficient solution must have. Localization of overhead: a local change should affect only the immediate neighborhood, thus limiting the overall overhead incurred due to the change. Lightweight, decentralized protocols: no individual node should carry a disproportionate share of responsibility, and the state maintained at each node should be as small as possible. Zero-configuration: there should be no need for manual configuration beyond what can be done at the time of manufacture. Minimal restrictions on hardware: omnidirectional link layers do not scale to large networks, and relying on localization technologies such as GPS may limit protocol applicability.

We present a complete design, including address allocation, routing and address lookup mechanisms, and provide thorough evaluation results for the address allocation and routing components. First, we develop a dynamic addressing scheme that has the properties listed above. Our scheme separates node identity from node address, and uses the address to indicate the node's current location in the network. Second, we study the performance of a new routing protocol based on dynamic addressing, through analysis and simulations. The address allocation scheme uses the address space efficiently on topologies of randomly and uniformly distributed nodes, empirically resulting in an average routing table size below $2\log_2 n$, where $n$ is the number of nodes in the network. We describe a new approach to routing in ad hoc networks and compare it to current routing architectures. However, our goal is to show the potential of this approach, not to provide an optimized protocol. We believe that dynamic addressing is a viable strategy for scalable routing in ad hoc networks.

II. Related Work

In most common IP-based ad hoc routing protocols [5] [7] [8], addresses are used as pure identifiers. Without any structure in the address space, there are two choices: either keep routing entries for every node in the network, or resort to flooding route requests throughout the network upon connection setup. However, this approach can be severely limiting, as location information is not always available and can be misleading in, among others, non-planar networks. For a survey of ad hoc routing, see [9]. In the Zone Routing Protocol (ZRP) [10] and Fisheye State Routing (FSR) [11], nodes are treated differently depending on their distance from the destination. In FSR, link updates are propagated more slowly the further away they travel from their origin, with the motivation that changes far away are unlikely to affect local routing decisions. ZRP is a hybrid reactive/proactive protocol in which a technique called bordercasting is used to limit the damaging effects of global broadcasts. In multilevel-clustering approaches such as Landmark [12], LANMAR [3], L+ [13], MMWN [1] and Hierarchical State Routing (HSR) [2], certain nodes are elected as cluster heads. These cluster heads in turn select higher-level cluster heads, up to some desired level. A node's address is defined as a sequence of cluster head identifiers, one per level, allowing the size of routing tables to be logarithmic in the size of the network, but easily resulting in long hierarchical addresses. In HSR, for example, the hierarchical address is a sequence of MAC addresses, each of which is 6 bytes long. A problem with having explicit cluster heads is that routing through them creates traffic bottlenecks. In Landmark, LANMAR and L+, this is partially solved by allowing nearby nodes to route packets instead of the cluster head, if they know a route to the destination. Our work is, as far as we know, the first attempt to use this type of addressing in ad hoc networks. Tribe [14] is similar to DART at a high level, in that it uses a two-phase process for routing: first address lookup, and then routing to the address discovered. However, the tree-based routing strategy used in Tribe bears little or no resemblance to the area-based approach in DART. Tree-based routing may, under many circumstances, suffer from severe traffic concentration at nodes high up in the tree and from a high sensitivity to node failure.

III. Overview of Network Architecture

In this section, we present a sketch of a network architecture, shown in figure 1, that could utilize the new addressing scheme effectively. In our approach, we separate the routing address and the identity of a node. The routing address of a node is dynamic and changes with node movement to reflect the node's location in the network topology.

Figure.1 Overall system design (DART components: cluster creation, address allocation, distributed lookup table, routing, mapping)



The identifier is a globally unique number that stays the same throughout the lifetime of the node. For ease of presentation, we assume for now that each node has a single identifier. When a node joins the network, it listens to the periodic routing updates of its neighboring nodes and uses these to identify an unoccupied address. The joining node registers its unique identifier and the newly obtained address in the distributed node lookup table. Due to mobility, the address may subsequently change, and the lookup table then needs to be updated. When a node wants to send packets to a node known only by its identifier, it uses the lookup table to find that node's current address. Once the destination address is known, the routing function takes care of the communication. The routing function should make use of the topological meaning that our routing addresses possess. We start by presenting two views of the network that we use to describe our approach: a) the address tree, and b) the network topology. Address Tree: In this abstraction, we visualize the network from the address space point of view. Addresses are $l$-bit binary numbers, $a_{l-1}, \ldots, a_0$. The address space can be thought of as a binary address tree of $l+1$ levels, as shown in figure 2. The leaves of the address tree represent actual node addresses; each inner node represents an address sub tree, that is, a range of addresses with a common prefix. In the 3-bit space of figure 2, a Level-0 sub tree is a single address, a Level-1 sub tree has a 2-bit prefix and can contain up to two leaf nodes, and the Level-2 sub tree [1xx] contains addresses [100] through [111].

Figure.2 Address tree of 3-bit binary address space.

For presentation purposes, nodes are sorted in increasing address order, from left to right. The actual physical links are represented by dotted lines connecting the leaves in figure 2. Network Topology: This view represents the connectivity between nodes. In figure 3, the network from figure 2 is presented as a set of nodes and the physical connections between them. Each solid line is an actual physical connection, wired or wireless, and the sets of nodes from each sub tree of the address tree are enclosed with dotted lines. Note that the set of nodes from any sub tree in figure 2 induces a connected sub graph in the network topology in figure 3.

Figure.3 A network topology with node addresses assigned.

Nodes that are close to each other in the address space should be relatively close in the network topology. More formally, we state the following constraint. Prefix Sub graph Constraint: The set of nodes that share a given address prefix forms a connected sub graph in the network topology. This constraint is fundamental to the scalability of our approach. Intuitively, it helps us map the virtual hierarchy of the address space onto the network topology: the longer the shared address prefix between two nodes, the shorter their expected distance in the network topology. Finally, let us define two new terms that will facilitate the discussion in the following sections. A Level-k sub tree of the address tree is defined by an address prefix of (l-k) bits, as shown in figure 2. For example, a Level-0 sub tree is a single address, or one leaf node in the address tree. A Level-1 sub tree has an (l-1)-bit prefix and can contain up to two leaf nodes. In figure 2, [0xx] is a Level-2 sub tree containing addresses [000] through [011]. Note that every Level-k sub tree consists of exactly two Level-(k-1) sub trees. We define the Level-k sibling of a given address to be the sibling of the Level-k sub tree to which that address belongs. By drawing entire sibling sub trees as triangles, we can create abstracted views of the address tree, as shown in figure 4.

Figure.4 Routing entries corresponding to figure 2. Node 100 has entries for sub trees 0xx, 11x (null entry) and 101.

Here, we show the siblings of all levels for the address [100] as triangles: the Level-0 sibling is [101], the Level-1 sibling is [11x], and the Level-2 sibling is [0xx]. Note that each address has exactly one Level-k sibling, and thus at most l siblings in total.
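The sibling definitions can be checked mechanically. The short Python sketch below, with addresses held as integers and hypothetical helper names, derives the Level-k sibling of an address by keeping the bits above bit k, flipping bit k, and leaving the lower k bits free; for [100] it reproduces the siblings listed above.

```python
def sibling_prefix(address: int, k: int, l: int) -> str:
    """Level-k sibling of an l-bit address, written as a pattern such as '11x'.

    The Level-k sub tree containing the address is fixed by its top (l - k)
    bits; its sibling shares the top (l - k - 1) bits and differs in bit k.
    """
    bits = format(address, f"0{l}b")              # most significant bit first
    shared = bits[: l - k - 1]                    # bits above bit k
    flipped = "1" if bits[l - k - 1] == "0" else "0"  # bit k, inverted
    return shared + flipped + "x" * k             # lower k bits are free

def all_siblings(address: int, l: int) -> list:
    """Siblings for levels 0 .. l-1: exactly one per level."""
    return [sibling_prefix(address, k, l) for k in range(l)]
```

For example, `all_siblings(0b100, 3)` yields `["101", "11x", "0xx"]`, one entry per level.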



Finally, we define the identifier of a sub tree to be the minimum of the identifiers of all nodes that have addresses from that sub tree. In cases where the prefix sub graph constraint is temporarily violated, two disconnected instances of the address sub tree exist in the network. In this case, each instance is uniquely identified by the minimum of the subset of identifiers that belongs to its connected sub graph. Our addressing and routing schemes have several attractive properties. First, they can work with omnidirectional and directional antennas as well as wires. Second, we do not need to assume the existence of central servers or any other infrastructure, nor do we need to assume any geographical location information, such as GPS coordinates. However, if infrastructure and wires exist, they can, and will, be used to improve performance. Third, we make no assumptions about mobility patterns, although high mobility will certainly lead to increased overhead and decreased throughput. Finally, since our approach was designed primarily for scalability, we do not need to limit the size of the network; most popular ad hoc routing protocols today implicitly impose network size restrictions.
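The following Python sketch tests the prefix sub graph constraint on a given topology and, at the same time, computes each sub tree identifier as the minimum node identifier of each connected instance. It uses global knowledge of every node's address, identifier and links, which a real DART node does not have; the function names and the example nodes, identifiers and links are hypothetical.

```python
from collections import defaultdict

def subtree_instances(addresses, ids, links, prefix_len, l):
    """Connected instances of every prefix_len-bit sub tree, with their identifiers.

    addresses: node -> integer address; ids: node -> unique identifier;
    links: iterable of (u, v) physical links. Returns {prefix: sorted identifiers},
    one identifier (the minimum node id) per connected instance of that prefix.
    """
    shift = l - prefix_len
    groups = defaultdict(set)
    for node, addr in addresses.items():
        groups[addr >> shift].add(node)
    adj = defaultdict(set)
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    result = {}
    for prefix, members in groups.items():
        unseen, instance_ids = set(members), []
        while unseen:
            start = unseen.pop()
            stack, component = [start], {start}
            while stack:                      # depth-first search restricted to the prefix
                node = stack.pop()
                for nb in adj[node]:
                    if nb in unseen:
                        unseen.remove(nb)
                        component.add(nb)
                        stack.append(nb)
            instance_ids.append(min(ids[n] for n in component))
        result[prefix] = sorted(instance_ids)
    return result

def satisfies_prefix_constraint(addresses, ids, links, l):
    """True if every prefix, at every length, induces exactly one connected sub graph."""
    return all(len(inst) == 1
               for p in range(1, l + 1)
               for inst in subtree_instances(addresses, ids, links, p, l).values())
```

With nodes A [000], B [100], C [110] and D [010] (identifiers 7, 3, 9 and 5) linked A-B, B-C and A-D, every prefix is connected and the [1xx] sub tree has identifier 3. Replacing the A-D link with D-C splits [0xx] into two instances, identified by 5 and 7.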

IV. Routing

In this work, we use a form of proactive distance-vector routing, made scalable by the hierarchical nature of the address space. Although we have chosen distance-vector routing, many of the advantages of dynamic addressing can be utilized by a link-state protocol as well. Each node keeps a small amount of routing state, organized as arrays indexed by level: the routing state about a node's Level-i sibling is stored at position i in each of the respective arrays. The routing state for a sibling contains the information necessary to maintain a route toward a node (any node) in that sub tree. The address field contains the current address of the node, and bit i of the address is referred to as address[i], where i = 0 for the least significant bit. The next hop and cost arrays are self-explanatory. The id array contains the identifier of the sub tree in question; as described earlier, the identifier of a sub tree equals the lowest of the identifiers of the nodes that constitute that sub tree. Finally, route log[i] contains the log of the current route to the sibling at level i, where bit b of route log[i] is referenced by the syntax route log[i][b]. To forward a packet, a node identifies the most significant bit that differs between its own address and the destination address. In this case, the most significant differing bit is bit number 2. The node then looks up the entry with index two in the next hop table and sends the packet there; in our example, this is the neighbor with address [011]. The process is repeated until the packet has reached the given destination address. The hierarchical technique of keeping track only of sibling sub trees, rather than complete addresses, has three immediate benefits. One, the amount of routing state kept at each node is drastically reduced. Two, the size of the routing updates is similarly reduced. Three, it provides an efficient routing abstraction, so that routing entries for distant nodes can remain valid despite local topology changes in the vicinity of these nodes.

A. Loop Avoidance DART uses a novel scheme for detecting and avoiding routing loops, which leverages the hierarchical nature of the address space to improve scalability. A simple way of implementing loop freedom is to append a list of all visited nodes to the routing update, and to have nodes check this list before accepting an update, so that an update never revisits a node. However, this approach has a scalability problem, in that routing updates quickly grow to unwieldy sizes. Instead, DART makes use of the structured address space to create a new kind of loop avoidance scheme. To preserve scalability, we generalize the loop freedom rule above: for each sub tree, once a routing entry has left the sub tree, it is not allowed to re-enter. This effectively prevents loops, and can be implemented in a highly scalable manner.
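Packet forwarding and the sub tree re-entry rule can be sketched in a few lines of Python. The forwarding function follows the description above: take the most significant bit in which the node's address differs from the destination and use the next hop stored at that index. The route-log function is our own reconstruction of the loop-avoidance rule, not DART's published bookkeeping: bit b set means the entry has left a Level-b sub tree, a hop that would cross bit b again is refused, and lower bits are cleared after each crossing because they referred to sub trees inside the sub tree just left, which can only be re-entered by crossing bit b again. The next-hop table for node [100] in the test is hypothetical, chosen to match the text (next hop [011] for the Level-2 entry).

```python
def msb_differing_bit(a: int, b: int) -> int:
    """Index of the most significant bit where a and b differ (bit 0 = LSB); -1 if equal."""
    return (a ^ b).bit_length() - 1

def next_hop(own_address: int, dest_address: int, next_hop_table: list) -> int:
    """One forwarding step: use the entry for the most significant differing bit.

    next_hop_table[i] is the neighbor toward the Level-i sibling, or None for a
    null entry. Returns own_address when the packet has arrived.
    """
    i = msb_differing_bit(own_address, dest_address)
    if i < 0:
        return own_address                       # destination reached
    hop = next_hop_table[i]
    if hop is None:
        raise LookupError(f"no route to the Level-{i} sibling")
    return hop

def accept_route_log(route_log: int, sender: int, receiver: int):
    """Apply the re-entry rule when an entry moves from sender to receiver.

    Returns the receiver's updated log, or None if accepting the entry would
    let it re-enter a sub tree it has already left.
    """
    d = msb_differing_bit(sender, receiver)      # boundary crossed by this hop
    if d < 0:
        raise ValueError("neighbors must have distinct addresses")
    if (route_log >> d) & 1:
        return None                              # would re-enter a sub tree it left
    # Record the crossing and clear the bits for sub trees inside the one just left.
    return (route_log | (1 << d)) & ~((1 << d) - 1)
```

For instance, an entry created at [000] for the Level-2 sibling starts with log [100]; passing it to [010] gives [110], then to [011] gives [111], and a further hop to [001] is refused because it would re-enter [00x].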

V. Node Lookup

We propose to use a distributed node lookup table, which maps each identifier to an address, similar to what we proposed in [5]. Here, we assume that all nodes take part in the lookup table, each storing a few entries. However, this node lookup scheme is only one possibility among many, and more work is needed to determine the best lookup scheme to deploy. For our proposed distributed lookup table, the question now arises: which node stores a given entry? Let us call this node the anchor node of the identifier. We use a globally, and a priori, known hash function that takes an identifier as argument and returns an address where the entry can be found. If there exists a node that occupies this address, then that node is the anchor node. If there is no node with that address, then the node with the least edit distance between its own address and the returned address is the anchor node. To route packets to an anchor node, we use a slightly modified routing algorithm: if no route can be found to a sibling sub tree indicated by a bit in the address, that bit of the address is ignored, and the packet is routed to the sub tree indicated by the next (less significant) bit. When the last bit has been processed, the packet has reached its destination. This method effectively finds the node whose address has the minimum edit distance to the address returned by the hash function. For example, using figure 3 for reference, let's assume a node with identifier ID1 has a current routing address of [010]. This node will periodically send an updated entry to the lookup table, namely the mapping from ID1 to [010]. To figure out where to send the entry, the node uses the hash function to calculate an address, hash(ID1). If the returned address is [100], the packet will simply be routed to the node with that address. However, if the returned address was instead [111], the packet could not be routed to the node with address [111], because there is no such node. In such a situation, the packet is automatically routed to the node with the most similar address, which in this case would be [101].

A. Improved Scalability We would like to stress that all node lookup operations use unicast only: no broadcasting or flooding is required. This maintains the advantage of proactive and distance-vector-based protocols over on-demand protocols: the routing overhead is independent of how many connections are active. Compared with other distance-vector protocols, our scheme improves scalability by drastically reducing the size of the routing tables, as described earlier. In addition, updates due to a topology change are in most cases contained within a lower-level sub tree and do not affect distant nodes, which keeps routing overhead low. To further improve the performance of our node lookup operations, we envision using the locality optimization technique described in [5]. Here, each lookup entry is stored in several locations, at increasing distance from the node in question. By starting with a small, local lookup and gradually moving to more distant locations, we can avoid sending lookup requests across long distances to find a node that is nearby. B. Coping with Temporary Route Failures On occasion, due to link or node failure, a node will not have a completely accurate routing table. This could lead to lookup packets, both updates and requests, terminating at the wrong node, with the result that requests cannot be promptly served. To reduce the effect of such intermittent errors, a node can periodically check the lookup entries it stores to see whether a route to a more suitable host has been found. If so, the entry is forwarded toward that more suitable host.
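The anchor-node rule (follow the bit selected by the hashed address when that sub tree is reachable, otherwise ignore the bit) can be sketched as a bit-by-bit walk. For simplicity, the sketch uses the global set of occupied addresses in place of per-node routing tables, and the SHA-256-based `lookup_hash` is a hypothetical stand-in for DART's globally known hash function. With occupied addresses [000] through [101] (so that [11x] is empty, as in the example above), a hashed address of [111] resolves to [101].

```python
import hashlib

def lookup_hash(identifier: str, l: int) -> int:
    """Hypothetical global hash: the top l bits of SHA-256(identifier)."""
    digest = int.from_bytes(hashlib.sha256(identifier.encode()).digest(), "big")
    return digest >> (256 - l)

def anchor_address(target: int, occupied: set, l: int) -> int:
    """Address of the anchor node for a hashed target address.

    Walks from the most significant bit down. If some occupied address lies in
    the sub tree selected by the target's bit, follow it; otherwise ignore the
    bit and stay in the other half, which must then be non-empty.
    """
    if not occupied:
        raise ValueError("empty network")
    prefix = 0
    for i in range(l - 1, -1, -1):
        want = (prefix << 1) | ((target >> i) & 1)
        if any(a >> i == want for a in occupied):
            prefix = want
        else:
            prefix = want ^ 1
    return prefix
```

Because an ignored bit leaves only one non-empty half, the result does not depend on where the walk starts, which is why every sender reaches the same anchor node.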
Requests are handled in a similar manner: if a request cannot be answered with an address, it is kept in a buffer awaiting either the arrival of the requested information or the appearance of a route to a node that more closely matches the requested key. This way, even if a request packet arrives at the anchor node before the update has reached it, the request will be buffered and served as soon as the update information is available. C. Practical Considerations Due to the possibility of network partitioning and node failure, some redundancy mechanism must be built in. We have opted for periodic refresh, where every node periodically sends its information to its anchor node. By doing so, the node ensures that if its anchor node becomes unavailable, the lookup information will be available again within one refresh period. Similarly, without an expiry mechanism, outdated information may linger even after a node has left the network. Therefore, all lookup table entries expire automatically after a period twice as long as the periodic refresh interval.
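The refresh, expiry and buffering behavior of an anchor node can be sketched as a small Python class. The class and method names are hypothetical, time is passed in explicitly, and buffered requests are simply returned to the caller when the matching update arrives; unlike the stored entries, buffered requests never expire in this sketch.

```python
class LookupStore:
    """Entries held by one anchor node: identifier -> (address, expiry time).

    Entries expire after twice the refresh interval, so an entry whose owner
    stops refreshing (for example because it left the network) disappears
    within two periods. Requests that cannot be answered are buffered.
    """

    def __init__(self, refresh_interval):
        self.ttl = 2 * refresh_interval
        self.entries = {}
        self.pending = {}                 # identifier -> list of requesters

    def update(self, identifier, address, now):
        """Store a periodic refresh and serve any requests that arrived earlier."""
        self.entries[identifier] = (address, now + self.ttl)
        waiting = self.pending.pop(identifier, [])
        return [(requester, address) for requester in waiting]

    def request(self, identifier, requester, now):
        """Return the current address, or None after buffering the request."""
        entry = self.entries.get(identifier)
        if entry is not None and entry[1] > now:
            return entry[0]
        self.entries.pop(identifier, None)    # drop an expired entry, if any
        self.pending.setdefault(identifier, []).append(requester)
        return None
```

A request that reaches the anchor before the first update is answered by the `update()` call that later delivers the mapping.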

VI. Dynamic Address Allocation

To assess the feasibility of dynamic addressing, we develop a suite of protocols that implement such an approach. Our work solves the main algorithmic problems and forms a stable framework for further dynamic addressing research. When a node joins an existing network, it uses the periodic routing updates of its neighbors to identify and select an unoccupied and legitimate address. It starts by selecting which neighbor to get an address from: the neighbor with the highest-level insertion point is selected as the best neighbor. The insertion point is defined as the highest level for which no routing entry exists in a given neighbor's routing table. However, the fact that a routing entry happens to be unoccupied in one neighbor's routing table does not guarantee that it represents a valid address choice; we discuss how the validity of an address is verified in the next subsection. The new node picks an address out of a possibly large set of available addresses. In our current implementation, nodes pick an address in the largest unoccupied address block. For example, in figure 4, a joining node connecting to the node with address [100] will pick an address in the [11x] sub tree. Figure 5 illustrates the address allocation procedure for a 3-bit address space.

Figure.5 Address allocation procedure for a 3-bit address space

There are several ways to choose among the available addresses, and we have presented only one such method. However, this method of address selection has worked well in simulation trials. Under steady state, and discounting concurrency, the presented address selection technique leads to a legitimate address allocation: the joining node is by definition connected to the neighbor it got its new address from, and the new address is taken from one of that neighbor's empty sibling sub trees, so the prefix sub graph constraint is satisfied. For example, node A starts out alone with address [000]. When node B joins the network, it observes that A has a null routing entry corresponding to the sub tree [1xx], and picks the address [100]. Similarly, when C joins the network by connecting to B, C picks the address [110]. Finally, when D joins via A, A's [1xx] routing entry is now occupied. However, the entry corresponding to the sibling [01x] is still empty, so D takes the address [010]. Merging Networks Efficiently: DART handles the merging of two initially separate networks as part of normal operations. In a nutshell, the nodes in the network with the higher identifier join the other network one by one. The lower-id network absorbs the other network gradually: the nodes at the border first join the other network, and then their neighbors join them recursively. Dealing with Split Networks: Here, we describe how we deal with network partitioning. Intuitively, each partition can keep its addresses, but one of the partitions will need to change its network identifier. In this situation, there are generally no constraint violations. This reduces to the case where the node with the lowest identifier leaves the network. Since the previous lowest-identifier node is no longer part of the network, the routing update from the new lowest-identifier node can propagate through the network until all nodes are aware of the new network identifier.
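The join procedure can be replayed in Python. This sketch assumes a global view of occupied addresses in place of each neighbor's routing table, ignores concurrency and the address-validity check, and takes the lowest address of the empty Level-k sibling sub tree, which reproduces the A, B, C, D example above; the helper names are hypothetical.

```python
def occupied(prefix: int, prefix_len: int, addresses: set, l: int) -> bool:
    """True if some node's address starts with the given prefix_len-bit prefix."""
    return any(a >> (l - prefix_len) == prefix for a in addresses)

def insertion_point(neighbor: int, addresses: set, l: int) -> int:
    """Highest level k whose Level-k sibling of `neighbor` is empty, or -1."""
    for k in range(l - 1, -1, -1):
        sibling = (neighbor >> k) ^ 1             # (l - k)-bit prefix of the sibling
        if not occupied(sibling, l - k, addresses, l):
            return k
    return -1

def allocate(neighbors: list, addresses: set, l: int) -> int:
    """Address for a joining node, taken from its best neighbor.

    The neighbor with the highest insertion point wins, and the new node takes
    the lowest address in that neighbor's empty Level-k sibling sub tree.
    """
    best = max(neighbors, key=lambda n: insertion_point(n, addresses, l))
    k = insertion_point(best, addresses, l)
    if k < 0:
        raise RuntimeError("no free address next to any neighbor")
    return ((best >> k) ^ 1) << k
```

Taking the lowest address of the empty block is only one way to pick inside the largest unoccupied block; any address in that sub tree satisfies the prefix sub graph constraint.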

VII. Maintaining the Dynamic Routing Table

While packet forwarding is a simple matter of looking up a next hop in a routing table, maintaining a consistent routing state does involve a moderate amount of sophistication. In addition to address allocation, loop detection and avoidance are crucial to correct protocol operation. DART nodes use periodic routing updates to notify their neighbors of the current state of their routing table. If a node does not hear an update from a neighbor within a constant number of update periods, that neighbor is removed from the list of neighbors and its last update is discarded. Every period, each node executes the Refresh() function, which checks the validity of the node's current address, populates a routing table using the information received from its neighbors, and broadcasts a routing update. When populating the routing table, the entry for each level i in the received routing update is inspected in sequence, starting at the top level. For neighbors whose address prefix differs at bit i, we create a new routing entry with a one-hop distance. This entry also has an empty route log, except for bit i, which represents the level-i subtree boundary that was just crossed. The subtree identifier is computed using the id array. The procedure then returns, as the remaining routing information is internal to the neighbor's subtree and irrelevant to the current node. For neighbors with the same address prefix as the current node, we go on to inspect their routing entry for level i. First, we ensure that the entry is loop free. If it is, we keep the routing entry as long as its identifier is the same as or smaller than the one already in the routing table, and as long as its distance is smaller.
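The table-population step of Refresh() can be sketched as follows. This is a simplified, hypothetical reading of the description above, not DART's code: the `Entry` and `Update` types, the per-level `ids` list standing in for the id array, and the loop check (rejecting an entry whose next hop is the current node) are all assumptions, since the text does not spell out DART's loop-detection rule. The acceptance test implements the literal wording: identifier the same or smaller, and distance smaller.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    ident: int          # identifier of the subtree this entry leads to
    distance: int       # hop count to that subtree
    next_hop: str       # neighbor address used to reach it
    route_log: int = 0  # bitmask of subtree boundaries crossed (bit i = level i)


@dataclass
class Update:
    addr: str                     # neighbor's address, e.g. "100"
    ids: List[int]                # hypothetical id array: one identifier per level
    table: List[Optional[Entry]]  # neighbor's routing entries, one per level


def accept(cand: Entry, cur: Optional[Entry]) -> bool:
    # Literal reading: identifier same or smaller AND distance smaller.
    return cur is None or (cand.ident <= cur.ident and cand.distance < cur.distance)


def populate(my_addr: str, updates: List[Update]) -> List[Optional[Entry]]:
    table: List[Optional[Entry]] = [None] * len(my_addr)
    for up in updates:
        for i in range(len(my_addr)):  # level 0 is the top level
            if up.addr[i] != my_addr[i]:
                # Neighbor lies across the level-i boundary: one-hop entry whose
                # route log is empty except for bit i.
                cand = Entry(up.ids[i], 1, up.addr, 1 << i)
                if accept(cand, table[i]):
                    table[i] = cand
                break  # the rest is internal to the neighbor's subtree
            entry = up.table[i]
            # Hypothetical loop check: skip routes the neighbor learned through us.
            if entry is None or entry.next_hop == my_addr:
                continue
            cand = Entry(entry.ident, entry.distance + 1, up.addr, entry.route_log)
            if accept(cand, table[i]):
                table[i] = cand
    return table


t = populate("000", [Update("001", [0, 0, 9], [Entry(4, 2, "011"), None, None])])
print(t)
```

In the usage line, neighbor [001] shares the top two bits, so its level-0 entry is adopted one hop further away, and the prefix first differs at bit 2, which yields a one-hop entry with route log bit 2 set.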

VIII. Simulation Results

We conduct our experiments using two simulators. One is the well-known ns-2 network simulator. In ns-2, we used the standard distribution (version 2.26) with the standard values for the Lucent WaveLAN physical layer and the IEEE 802.11 MAC layer code, together with a patch for a retry counter bug recently identified by Dan Berger at UC Riverside. For all of the ns-2 simulations, we used the Random Waypoint mobility model with up to 800 nodes, a maximum speed of 5 m/s, a minimum speed of 0.5 m/s, a maximum pause time of 100 seconds and a warm-up period of 3600 seconds. The duration of all the ns-2 simulations was 300 seconds, of which the first 60 seconds are free of data traffic, allowing the initial address allocation to take place and the network to organize itself. The size of the simulation area was chosen to keep the average node degree close to 9; for example, for a 400-node network, the simulation area was 2800 x 2800 meters. This was done to maintain a mostly connected topology. Mobility parameters were chosen to simulate a moderately mobile network. DART is not suitable for networks with very high levels of mobility, as little route aggregation benefit is to be had when the current location of most nodes bears little relation to where these nodes were a few seconds ago. Our simulations focus on the address allocation and routing aspects of our protocol and do not include the node lookup layer, which is replaced by a global lookup table accessible by all nodes in the simulation. The choice of lookup mechanism (for example distributed, hierarchical, replicated, centralized, or out-of-band) should be determined by network characteristics, and performance may vary depending on the mechanism used. A summary of our findings follows.
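The link between node count, area and average node degree can be checked with a back-of-the-envelope calculation. The sketch assumes nodes spread uniformly over a square and a radio range of 250 m (a commonly used ns-2 WaveLAN setting, not stated in the text). Under these assumptions each node expects about (n - 1)·π·r²/A neighbors, ignoring border effects. The function names are illustrative.

```python
import math


def expected_degree(n: int, side_m: float, radio_range_m: float = 250.0) -> float:
    """Expected neighbors per node for n uniform nodes in a side x side square (no border effects)."""
    return (n - 1) * math.pi * radio_range_m ** 2 / side_m ** 2


def side_for_degree(n: int, degree: float, radio_range_m: float = 250.0) -> float:
    """Square side length that yields the requested expected degree (no border effects)."""
    return math.sqrt((n - 1) * math.pi * radio_range_m ** 2 / degree)


print(f"{expected_degree(400, 2800):.2f}")  # degree for the 400-node, 2800 m setup
print(f"{side_for_degree(400, 9):.0f}")     # side that would give degree 9 exactly
```

For 400 nodes in 2800 x 2800 m this gives a degree of about 10 before border effects. Nodes near the edges have fewer neighbors, which plausibly brings the measured average closer to the stated target of 9.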
DSDV, due to its periodic updates and flat routing tables, experiences very high overhead growth as the network grows beyond 100 nodes, but nevertheless performs well in comparison with the other protocols in the size ranges studied. AODV, due to its reactive nature, suffers from high overhead growth as both the network size and the number of flows grow. While AODV performs very well in small networks, the trend suggests that it is not advisable for larger networks. DSR, in our simulations, performed well in small networks and never experienced high overhead growth, likely due to its route caching functionality. However, due to excessive routing failures, DSR demonstrated unacceptable performance in larger networks. Finally, DART demonstrated its scalability benefits: no overhead growth with the number of flows, and logarithmic overhead growth with network size. The simulation results are shown in Figures 6, 7 and 8.
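To illustrate the contrast between flat and per-level routing state mentioned above, the sketch below compares a flat table with one entry per destination (as in DSDV) against a table with one entry per address level (as in DART). As an idealization, it assumes DART uses just ceil(log2 n) address bits. In practice, the address length and the number of populated entries may differ. The function names are illustrative.

```python
import math


def flat_table_size(n: int) -> int:
    """Flat table (DSDV-style): one entry per other node."""
    return n - 1


def level_table_size(n: int) -> int:
    """Per-level table (DART-style): one entry per address bit, assuming ceil(log2 n) bits."""
    return max(1, math.ceil(math.log2(n)))


for n in (100, 200, 400, 800):
    print(n, flat_table_size(n), level_table_size(n))
```

Doubling the network adds roughly one entry to the per-level table, while the flat table roughly doubles.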



Figure 6. Communication between source and destination

Figure 7. Destination becomes a member of group 3

Figure 8. Destination leaving group 3 and moving towards group 4

IX. Performance Evaluation

We compare our protocol to reactive protocols (AODV, DSR) and a proactive protocol (DSDV). Our results, shown in Table 1, suggest that dynamic addressing and proactive routing together provide significant scalability advantages and support high-level addressing. DART shows no overhead growth with the number of flows and logarithmic overhead growth with network size, as shown in Figure 1.
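The raw counts in Table 1 are easier to compare once normalized. The sketch below computes the percentage of transmitted packets that were lost for each protocol. It assumes that the flattened table lists its values in the column order DSDV, DSR, AODV, DART. The names `TABLE_1` and `loss_percent` are illustrative.

```python
# Table 1 values as (transmitted, lost), assuming the column order DSDV, DSR, AODV, DART.
TABLE_1 = {
    "DSDV": (10726, 82),
    "DSR": (5138, 234),
    "AODV": (23349, 1003),
    "DART": (42661, 1654),
}


def loss_percent(transmitted: int, lost: int) -> float:
    """Lost packets as a percentage of transmitted packets."""
    return 100.0 * lost / transmitted


for protocol, (transmitted, lost) in TABLE_1.items():
    print(f"{protocol}: {loss_percent(transmitted, lost):.2f}% of transmitted packets lost")
```

Under that column assumption, DSDV loses about 0.76% of its packets, DSR about 4.55%, AODV about 4.30% and DART about 3.88%, while DART transmits by far the most packets.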

Table 1. DART performance

Metric | DSDV | DSR | AODV | DART
No. of transmitted packets | 10726 | 5138 | 23349 | 42661
No. of lost packets | 82 | 234 | 1003 | 1654

Figure 9. Throughput vs. network size (nodes)

X. Conclusion

We proposed dynamic address routing (DART), an initial design toward scalable ad hoc routing. We outlined the novel challenges involved in a dynamic addressing scheme and proceeded to describe efficient algorithmic solutions. We showed how our dynamic addressing can support scalable routing. We demonstrated, through simulation and analysis, that our approach has promising scalability properties and is a viable alternative to current ad hoc routing protocols. First, we qualitatively compared proactive and reactive overhead and determined the regime in which proactive routing exhibits less overhead than its reactive counterpart. Large-scale simulations show that the average routing table size with DART grows logarithmically with the size of the network. Second, using the ns-2 simulator, we compared our routing scheme to AODV, DSR and DSDV, and observed that our approach achieves superior throughput, with considerably smaller overhead, in networks larger than 400 nodes. The trend in simulated overhead, together with the analysis provided, strongly indicates that, among the protocols compared, DART is the only feasible routing protocol for large networks. We believe that dynamic addressing can be the basis for ad hoc routing protocols for massive ad hoc and mesh networks.

REFERENCES

[1] Ram Ramanathan and Martha Steenstrup, "Hierarchically-organized, multihop mobile wireless networks for quality-of-service support," Mobile Networks and Applications, vol. 3, no. 1, pp. 101-119, 1998.
[2] Guangyu Pei, Mario Gerla, Xiaoyan Hong, and Ching-Chuan Chiang, "A wireless hierarchical routing protocol with group mobility," in WCNC, 1999.
[3] G. Pei, M. Gerla, and X. Hong, "LANMAR: Landmark routing for large scale wireless ad hoc networks with group mobility," in ACM MobiHOC'00, 2000.
[4] X. Hong, M. Gerla, G. Pei, and C. Chiang, "A group mobility model for ad hoc wireless networks," 1999.
[5] J. Eriksson, M. Faloutsos, and S. Krishnamurthy, "PeerNet: Pushing peer-2-peer down the stack," in IPTPS, 2003.
[6] C. Perkins, "Ad hoc on demand distance vector routing," 1997.
[7] Charles Perkins and Pravin Bhagwat, "Highly dynamic destination-sequenced distance-vector routing (DSDV) for mobile computers," in ACM SIGCOMM'94, 1994.
[8] David B. Johnson and David A. Maltz, "Dynamic source routing in ad hoc wireless networks," in Mobile Computing, vol. 353, Kluwer Academic Publishers, 1996.
[9] Xiaoyan Hong, Kaixin Xu, and Mario Gerla, "Scalable routing protocols for mobile ad hoc networks," IEEE Network, vol. 16, no. 4, 2002.
[10] Z. Haas, "A new routing protocol for the reconfigurable wireless networks," 1997.
[11] Guangyu Pei, Mario Gerla, and Tsu-Wei Chen, "Fisheye state routing: A routing scheme for ad hoc wireless networks," in ICC (1), 2000, pp. 70-74.
[12] Paul F. Tsuchiya, "The landmark hierarchy: A new hierarchy for routing in very large networks," in SIGCOMM, ACM, 1988.
[13] Benjie Chen and Robert Morris, "L+: Scalable landmark routing and address lookup for multi-hop wireless networks," 2002.
[14] Aline C. Viana, Marcelo D. de Amorim, Serge Fdida, and José F. de Rezende, "Indirect routing using distributed location information," ACM Mobile Networks and Applications, Special Issue on Mobile and Pervasive Computing, 2003.
[15] Jakob Eriksson, Michalis Faloutsos, and Srikanth Krishnamurthy, "Scalable ad hoc routing: The case for dynamic addressing," in IEEE INFOCOM, 2004.



Framework for Comparison of Association Rule Mining using Genetic Algorithm

K. Indira, Research Scholar, Department of CSE, Pondicherry Engineering College, Pondicherry, India
S. Kanmani, Professor & Head, Department of IT, Pondicherry Engineering College, Pondicherry, India

Abstract - A new framework for comparing the literature on Genetic Algorithms for Association Rule Mining is proposed in this paper. Genetic Algorithms have emerged as practical, robust optimization and search methods for generating accurate and reliable association rules. The main motivation for using GAs in the discovery of high-level prediction rules is that they perform a global search and cope better with attribute interaction than the greedy rule induction algorithms often used in data mining. The objective of the paper is to compare the performance of different methods based on the methodology, the datasets used and the results achieved. It is shown that the modifications introduced in GAs increase prediction accuracy and reduce the error rate in mining effective association rules. The time required for mining is also reduced.

Keywords - Data Mining, Genetic Algorithm, Association Rule Mining

I. INTRODUCTION

Today, enormous amounts of data are stored in files, databases, and other repositories. Hence it becomes necessary to develop powerful means for the analysis and interpretation of such data and for the extraction of interesting knowledge to help in decision-making. Thus, there is a clear need for (semi-)automatic methods for extracting knowledge from data. This need has led to the emergence of a field called data mining and knowledge discovery. Data mining, also popularly known as Knowledge Discovery in Databases (KDD), refers to the nontrivial extraction of implicit, previously unknown and potentially useful information from data in databases. The KDD process comprises a few steps leading from raw data collections to the formation of new knowledge. The iterative process consists of the following steps:

Data cleaning: also known as data cleansing, a phase in which noisy and irrelevant data are removed from the collection.
Data integration: multiple data sources, often heterogeneous, may be combined into a common source.
Data selection: the data relevant to the analysis is decided on and retrieved from the data collection.
Data transformation: also known as data consolidation, a phase in which the selected data is transformed into forms appropriate for the mining procedure.
Data mining: the crucial step in which clever techniques are applied to extract potentially useful patterns.
Pattern evaluation: strictly interesting patterns representing knowledge are identified based on given measures.
Knowledge representation: the final phase, in which the discovered knowledge is visually represented to the user. This essential step uses visualization techniques to help users understand and interpret the data mining results.

The architecture of a typical data mining system is depicted in Figure 1.

Figure 1. Architecture of a typical data mining system.



This paper reviews works published in the literature in which the basic Genetic Algorithm is modified in some form to address Association Rule Mining. The rest of the paper is organized as follows. Section II briefly explains association analysis. Section III gives a preliminary overview of Genetic Algorithms for rule mining. Section IV reviews the different approaches reported in the literature that use Genetic Algorithms for mining association rules. Section V lists the inferences drawn from the comparison. Section VI presents concluding remarks and suggestions for further research.

II. ASSOCIATION ANALYSIS

Association analysis is the discovery of what are commonly called association rules. It studies the frequency of items occurring together in transactional databases and, based on a threshold called support, identifies the frequent item sets. Another threshold, confidence, which is the conditional probability that an item appears in a transaction when another item appears, is used to pinpoint association rules. The discovered association rules are of the form P ⇒ Q [s, c], where P and Q are conjunctions of attribute-value pairs, s (support) is the probability that P and Q appear together in a transaction, and c (confidence) is the conditional probability that Q appears in a transaction when P is present.
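The two measures can be computed directly over a list of transactions. The sketch below is a minimal illustration: items are plain strings (an attribute-value pair could be encoded as a string such as "colour=red"), and the transaction data is made up for the example.

```python
from typing import FrozenSet, List, Set


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(p: FrozenSet[str], q: FrozenSet[str], transactions: List[Set[str]]) -> float:
    """Estimate of P(Q | P) as support(P and Q together) / support(P)."""
    s_p = support(p, transactions)
    return support(p | q, transactions) / s_p if s_p > 0 else 0.0


transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
p, q = frozenset({"bread"}), frozenset({"milk"})
print(support(p | q, transactions), confidence(p, q, transactions))
```

Here bread and milk occur together in 2 of 4 transactions (s = 0.5), and milk appears in 2 of the 3 transactions containing bread (c ≈ 0.67), so the rule is bread ⇒ milk [0.5, 0.67].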

III. GENETIC ALGORITHM

A Genetic Algorithm (GA) is a procedure used to find approximate solutions to search problems through the application of the principles of evolutionary biology. Genetic algorithms use biologically inspired techniques such as genetic inheritance, natural selection, mutation, and sexual reproduction (recombination, or crossover). Genetic algorithms are typically implemented as computer simulations in which an optimization problem is specified. For this problem, members of a space of candidate solutions, called individuals, are represented using abstract representations called chromosomes. The GA consists of an iterative process that evolves a working set of individuals, called a population, toward better values of an objective function, or fitness function. Traditionally, solutions are represented using fixed-length strings, especially binary strings, but alternative encodings have been developed. The evolutionary process of a GA is a highly simplified and stylized simulation of the biological version. It starts from a population of individuals randomly generated according to some probability distribution, usually uniform, and updates this population in steps called generations. In each generation, multiple individuals are randomly selected from the current population based on their fitness, bred using crossover, and modified through mutation to form a new population. The flowchart of the basic Genetic Algorithm is given in Figure 2.

Figure 2. Basic Genetic Algorithm.

A. [Start] Generate a random population of n chromosomes (suitable solutions for the problem).
B. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.
C. [New population] Create a new population by repeating the following steps until the new population is complete:

i. [Selection] Select two parent chromosomes from the population according to their fitness (the better the fitness, the bigger the chance of being selected).

ii. [Crossover] With a crossover probability, cross over the parents to form new offspring (children). If no crossover is performed, the offspring are exact copies of the parents.

iii. [Mutation] With a mutation probability, mutate the new offspring at each locus (position in the chromosome).

iv. [Accepting] Place the new offspring in the new population.

D. [Replace] Use the newly generated population for a further run of the algorithm.
E. [Test] If the end condition is satisfied, stop and return the best solution in the current population.
F. [Loop] Go to step B.
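Steps A to F above can be sketched in Python. This is a minimal, illustrative version, not any surveyed author's implementation. The operator choices are assumptions not taken from the text: binary chromosomes, roulette-wheel (fitness-proportionate) selection, single-point crossover and bit-flip mutation. The parameter values and the OneMax fitness (the number of 1 bits) used in the example are also assumptions. One small deviation from step E: the function returns the best chromosome seen in any generation, not only in the current one.

```python
import random
from typing import Callable, List, Optional

Chromosome = List[int]


def run_ga(fitness: Callable[[Chromosome], float], n_bits: int = 20, pop_size: int = 30,
           p_cross: float = 0.8, p_mut: float = 0.01, generations: int = 100,
           target: Optional[float] = None, seed: int = 0) -> Chromosome:
    rng = random.Random(seed)
    # A. [Start] random population of bit-string chromosomes
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # B. [Fitness] evaluate every chromosome (assumes non-negative fitness)
        fits = [fitness(c) for c in pop]
        weights = fits if sum(fits) > 0 else None  # roulette wheel; uniform if all zero
        new_pop: List[Chromosome] = []
        # C. [New population]
        while len(new_pop) < pop_size:
            # i. [Selection] two parents, chosen in proportion to fitness
            a, b = rng.choices(pop, weights=weights, k=2)
            # ii. [Crossover] single point, otherwise copy the parents
            if rng.random() < p_cross:
                point = rng.randint(1, n_bits - 1)
                kids = [a[:point] + b[point:], b[:point] + a[point:]]
            else:
                kids = [a[:], b[:]]
            # iii. [Mutation] flip each locus with probability p_mut
            for kid in kids:
                for i in range(n_bits):
                    if rng.random() < p_mut:
                        kid[i] ^= 1
            # iv. [Accepting]
            new_pop.extend(kids)
        # D. [Replace]
        pop = new_pop[:pop_size]
        best = max(pop + [best], key=fitness)
        # E. [Test] stop early once the target fitness is reached
        if target is not None and fitness(best) >= target:
            break
    return best  # F. [Loop] is the enclosing for-loop


best = run_ga(sum, target=20)
print(best, sum(best))
```

Roulette-wheel selection is only one of several selection schemes; the surveyed works vary exactly these operators, as discussed in the next section.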



IV. ANALYSIS ON GENETIC ALGORITHM FOR MINING ASSOCIATION RULES

The genetic algorithms designed for association rule mining are discussed based on the following criteria:
1. Genetic Operations
2. Encoding
I. Initial Population
II. Crossover
III. Mutation
IV. Fitness Threshold
3. Methodology
4. Application Areas
5. Evaluation Parameters

The various methodologies are listed in Table A1 in the Annexure.

1. Genetic Operations: The basic steps in traditional Genetic Algorithm implementations were discussed in the previous section. Modifications are made to the traditional GA to increase prediction accuracy and thereby reduce the error rate in mining association rules. These variations have been carried out in various steps of the GA.

2. Encoding: Encoding is the process of representing the entities in datasets for mining. Rules or chromosomes can be represented either with fixed-length data [2]-[18] or with variable-length chromosomes [1]. Fuzzy rules are used for encoding data in [10]; in [4] and [6], coding is done using natural numbers; in [17], gene expressions are used to represent chromosomes; and in [7], encoding is carried out using arrays.

Initial Population: The initial population can be generated by random selection, seeded by users [1], by single rule set generation [5], or by fuzzy rules [10].

Crossover: The crossover operator, which produces new offspring and hence the new population, plays a vital role in enhancing the efficiency of the algorithms. The changes carried out are summarized in Table A1. Saggar et al. [2] describe how the decision whether to perform crossover and, if it is required, the locus point where the crossover begins are of prime importance. In [11], crossover is carried out on the same attributes of both parents if such attributes are present, or on random attributes otherwise. In [12], the crossover rate is set dynamically for each generation. The concept of symbiotic combination, based on symbiogenesis, in which chromosomes are combined to generate a new chromosome instead of being crossed over, was shown by Ramin Halavathy et al. [5] to increase the speed of the rule generation system.
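To make fixed-length rule encoding concrete, the sketch below uses one simple scheme. This scheme is assumed for illustration and is not attributed to any of the surveyed papers. Each item gets one gene: 0 means the item is absent, 1 places it in the antecedent P, and 2 places it in the consequent Q. The fitness shown is likewise a hypothetical choice: the confidence of the decoded rule, or zero when the rule falls below a minimum support. The surveyed works use a variety of fitness functions. The transaction data is made up.

```python
from typing import FrozenSet, List, Set, Tuple

ABSENT, ANTECEDENT, CONSEQUENT = 0, 1, 2  # hypothetical gene values


def decode(chrom: List[int], items: List[str]) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Turn a fixed-length chromosome into a rule P => Q."""
    p = frozenset(item for gene, item in zip(chrom, items) if gene == ANTECEDENT)
    q = frozenset(item for gene, item in zip(chrom, items) if gene == CONSEQUENT)
    return p, q


def frac_containing(itemset: FrozenSet[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_fitness(chrom: List[int], items: List[str], transactions: List[Set[str]],
                 min_support: float = 0.3) -> float:
    """Confidence of the decoded rule, or 0.0 if it is incomplete or too rare."""
    p, q = decode(chrom, items)
    if not p or not q:
        return 0.0
    s_pq = frac_containing(p | q, transactions)
    if s_pq == 0 or s_pq < min_support:
        return 0.0
    return s_pq / frac_containing(p, transactions)


items = ["bread", "milk", "butter"]
transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(rule_fitness([1, 2, 0], items, transactions))  # rule: bread => milk
```

With this encoding, crossover and mutation operate on the gene list, and any chromosome always decodes to a rule, although that rule may be incomplete and receive zero fitness.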

Mutation: Mutation is the process in which attributes of selected entities are changed to form new offspring. The mutation operator is altered based on the application domain: into macro mutation in [1], and by changing the locus points of mutation in [2]. In [5], a weight factor is taken into consideration when choosing the locus point of mutation so as to generate better offspring. Dynamic mutation, in which the mutation point is decided based on the particular entity and generation selected, is introduced in [12] and enhances the diversity of the population. In [16], mutation is performed twice (mutation 1 and mutation 2) to generate offspring. Adaptive mutation, in which the mutation rate differs in each generation, is found to produce better offspring in [17].

Fitness Threshold: The passing of chromosomes from one population to the new population of the succeeding generation depends on the fitness criteria. Changes to the fitness functions or threshold values alter the population, and hence effective fitness values lead to more efficient rule generation. In [2], negation of attributes is taken into consideration while generating rules by including criteria such as True Positive, True Negative, False Positive and False Negative; with these criteria, rules with negated attribute conditions are also generated. By varying the fitness values dynamically in each generation, the speed of the system can be improved [4]. Considering factors such as the strength of implication of rules when calculating the fitness threshold generates more interesting rules [6]. Considering the sustainability index, creditable index and inclusive index for the fitness threshold results in better predictive accuracy [7]. Deriving the real values of confidence and support and applying them in the threshold was found to generate rules faster than traditional methods [8]. The predictability and comprehensibility factors of rules tend to provide better classification performance [11].

3. Methodology: Rather than altering the operations of the basic GA, changes made to the methodology have also been shown to increase performance. In [5], the crossover operation is replaced by symbiotic recombination techniques. Wenxi
