CyberMiner
Software Design Description

CS/SE 6362.001
Fall 2015

Submitted to Dr. Lawrence Chung
Associate Professor, Department of Computer Science,
The University of Texas at Dallas, Richardson, TX 75080

Team:
Karthik Kannambadi Sridhar
Ramakrishnan Sathyavageeswaran
Vaidehi Jariwala

Team Website: http://utdallas.edu/~rxs142530/cyberminer/readme.html


Document status overview

Document title: Software Design Description
Identification: Software Design Description.docx
Author: Ramakrishnan Sathyavageeswaran, Karthik Kannambadi Sridhar, Vaidehi Jariwala
Document status: Complete

Version | Primary Author(s)     | Description of Version                          | Reason of Change | Date
1.0     | Karthik, Ram, Vaidehi | DRAFT                                           | Documentation    | 10/01/2015
1.1     | Karthik, Ram, Vaidehi | Phase 1 - FINAL                                 | Documentation    | 10/15/2015
2.0     | Karthik, Ram, Vaidehi | Phase 2 Draft                                   | Documentation    | 11/10/2015
2.1     | Karthik, Ram, Vaidehi | Phase 2 - Adding changes to Project description | Documentation    | 11/25/2015
2.2     | Karthik, Ram, Vaidehi | Phase 2 - Final                                 | Documentation    | 11/30/2015


Table of Contents

1. INTRODUCTION
  1.1 Design Overview
  1.2 Goals and objectives
    1.2.1 Phase 1
    1.2.2 Phase 2
    1.2.3 Statement of Scope
2 SYSTEM ARCHITECTURAL DESIGN
  2.1 KWIC System
    2.1.1 Chosen System Architecture
    2.1.2 Discussion of Alternative Designs
    2.1.3 Trade off Analysis
    2.1.4 Advantages
    2.1.5 Disadvantages
  2.2 Cyberminer System
    2.2.1 Chosen System Architecture
    2.2.2 Alternate Architecture Discussions
    2.2.3 Trade off analysis of Alternate Architecture Styles
    2.2.4 System Interface Description
3 DETAILED DESCRIPTION OF COMPONENTS
  3.1 KWIC Engine
    3.1.1 Line Storage
    3.1.2 Circular Shift
    3.1.3 Noise Eliminator
    3.1.4 Alphabetizer
  3.2 Cyberminer Engine
    3.2.1 Insert
    3.2.2 Query
    3.2.3 Delete
4 USER INTERFACE DESIGN
  4.1 Description of the User Interface
  4.2 Use Cases
    Use-case 1: User provides input
    Use-case 2: User provides number input
    Use-case 3: User provides input with special characters
    Use-case 4: User provides a search query
    Use-case 5: User provides a search query with AND, OR, NOT specifier
    Use-case 6: User deletes any existing entry from the Database
5 ADDITIONAL MATERIAL


1. INTRODUCTION

1.1 Design Overview

This document describes the design of “Cyber Miner”, a web search engine. The intent is to build the CyberMiner web search engine to mimic popular search engines such as Google, Bing and Yahoo, but in a smaller environment: the web links to be searched are held in a local database, and the entries that match the search keywords are shown to the user from within the indexed database.

In Phase 1, the system shall be implemented in J2EE and shall contain the implementation of KWIC. The user provides an input, and the system shows the circularly shifted lines and the alphabetized lines on the web page with a clear description. In this phase the indices are generated but not stored in the database.

In Phase 2, the system shall be a four-tier J2EE web application. The front end is developed using web technologies such as HTML, CSS and jQuery. The Controller layer is a Java servlet, and the Model, which holds the core logic, consists of Java beans. We use Elasticsearch, which is built on top of Apache Lucene, to store each URL and its description. All URLs and descriptions are stored as JSON objects. The system shall provide the user with search capabilities and the ability to add URLs to the database.

1.2 Goals and objectives

1.2.1 Phase 1

To create a J2EE website which generates the KWIC index from the user's input.

1.2.2 Phase 2

To create a web search engine which answers the user's search input with results from a database.

1.2.3 Statement of Scope

The software being developed will take lines as input and will store indices for those lines. The output will be the circular shifts of all lines in ascending alphabetical order. A database will be used to store the indices and to display the matches during a later search. We use the Elasticsearch data store to store the URLs and descriptions, and to perform search operations on them.


2 SYSTEM ARCHITECTURAL DESIGN

2.1 KWIC System

2.1.1 Chosen System Architecture

The Abstract Data Type (ADT) or Object Oriented (OOP) style is a software architecture style in which separate modules are built to perform specific tasks and communicate by explicitly invoking each other's functionality. This project uses the Object Oriented style for the implementation. A typical architecture diagram of a KWIC system implemented in OOP style is shown below.

2.1.2 Discussion of Alternative Designs

To perform the trade-off analysis, two other architecture styles were examined: Shared Data and Implicit Invocation. Because of their shortcomings for implementing a KWIC system specifically, both were rejected.

Shared Data Type Architecture:

In this architecture style, data is shared across all the modules, and the modules themselves do not store any data. This is good for performance, as there is no data replication, but poor modifiability and poor enhanceability are the two major disadvantages of this style.


Implicit Invocation Type Architecture:

In this architecture style, component integration is similar to that of shared data, but the interface to the data is more abstract. The style is based on an active data model: any write to the data raises an event, and those events in turn invoke the next stage of processing. Its major disadvantage is performance, in terms of both space and time.
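The active data model described above can be sketched in Java as follows. This is only an illustration of the style, not code from the project: the class `ObservableLines` and the helper `attachShifter` are hypothetical names, and the circular-shift stage is a simplified stand-in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical line store with an active data model: each write raises an event. */
final class ObservableLines {
    private final List<String> lines = new ArrayList<>();
    private final List<Consumer<String>> listeners = new ArrayList<>();

    /** Registers a module that wants to be invoked whenever a line is written. */
    void onLineAdded(Consumer<String> listener) {
        listeners.add(listener);
    }

    /** Stores the line, then implicitly invokes every registered listener. */
    void add(String line) {
        lines.add(line);
        for (Consumer<String> listener : listeners) {
            listener.accept(line);
        }
    }

    /** Returns a copy of the stored lines. */
    List<String> snapshot() {
        return new ArrayList<>(lines);
    }

    /** Wires a circular-shift stage to the input store; the store never names that stage. */
    static void attachShifter(ObservableLines input, ObservableLines shifted) {
        input.onLineAdded(line -> {
            String[] words = line.trim().split("\\s+");
            for (int start = 0; start < words.length; start++) {
                StringBuilder sb = new StringBuilder();
                for (int k = 0; k < words.length; k++) {
                    if (k > 0) sb.append(' ');
                    sb.append(words[(start + k) % words.length]);
                }
                shifted.add(sb.toString());
            }
        });
    }

    public static void main(String[] args) {
        ObservableLines input = new ObservableLines();
        ObservableLines shifted = new ObservableLines();
        attachShifter(input, shifted);
        input.add("software design description");
        shifted.snapshot().forEach(System.out::println);
    }
}
```

The writer of `input` never calls the shifter directly, which is the point of the style; the extra listener lists and per-write callbacks hint at where the space and time overhead comes from.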

2.1.3 Trade off Analysis

The following table summarizes the trade-offs of each architecture style.

                                      Shared Data   ADT   Implicit Invocation
Modifiability    Algorithms           -             +     +
                 Data Representation  --            +     ++-
Enhanceability   Add Function         +-            ++-   ++
Performance      Space                ++            +     -
                 Time                 +             +     -
Reusability                           --            +     +
Intuitiveness                         -             +     +-

2.1.4 Advantages

This style gives strong support to modifiability and scalability.

Modifiability:
  o Change in processing algorithm
  o If we change the processing in one algorithm, it will not affect the others.
  o Change in data representation

Scalability:
  o There is no constraint on the number of lines processed and stored in the database.


2.1.5 Disadvantages

Although this style has a good number of advantages, it has some disadvantages too. Any object that depends on, or invokes functions of, other objects has to know their identity and definition beforehand in order to communicate. This can sometimes become an overhead.


2.2 Cyberminer System

2.2.1 Chosen System Architecture

Cyberminer internally consists of the previously defined KWIC system along with other components, which are modeled in the Model View Controller (MVC) style of architecture. It includes Elasticsearch, an open source search server based on Lucene. Elasticsearch provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. It is developed in Java and is released as open source under the terms of the Apache License.

Here, the View comprises the GUI-specific components of Cyberminer. It enables users to add, delete and search for entries in the backend. The Control layer consists of a delegate which forwards the transactions from the View to the corresponding element in the Model layer, which consists of individual processing elements for add, delete, search and user configuration. As part of the Insert transaction, Control interacts with the previously defined KWIC system and passes the alphabetized output on to the Model, which in turn stores it in the Elastic database.

Advantages:

As per the MVC guidelines, the Control layer interacts only with the Model, which is split into the Elasticsearch model and the KWIC system. This increases cohesion within the Model, View and Control layers.

Reusability of the KWIC Engine (Model), as only the Control delegates the interface with KWIC.
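The Insert transaction described above (Control passes the description through KWIC and hands the alphabetized output to the Model) could look roughly like the sketch below. `InsertController` and `UrlStore` are hypothetical names introduced for illustration, and the KWIC system is represented by a plain `Function`; the project's actual servlet and bean classes are not shown in this document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical Control-layer delegate for the Insert transaction. */
final class InsertController {
    /** Hypothetical Model-side interface that persists one indexed line for a URL. */
    interface UrlStore {
        void save(String url, String indexedLine);
    }

    private final Function<String, List<String>> kwic; // description -> alphabetized shifts
    private final UrlStore store;

    InsertController(Function<String, List<String>> kwic, UrlStore store) {
        this.kwic = kwic;
        this.store = store;
    }

    /** Runs the description through KWIC, then hands each output line to the Model. */
    int addUrl(String url, String description) {
        List<String> indexed = kwic.apply(description);
        for (String line : indexed) {
            store.save(url, line);
        }
        return indexed.size();
    }

    public static void main(String[] args) {
        List<String> saved = new ArrayList<>();
        // Stand-in KWIC: returns a fixed alphabetized list instead of computing real shifts.
        InsertController controller = new InsertController(
                description -> Arrays.asList("engine search", "search engine"),
                (url, line) -> saved.add(url + " <- " + line));
        controller.addUrl("http://example.com", "search engine");
        saved.forEach(System.out::println);
    }
}
```

Because the KWIC system and the store are both passed in, either can be swapped without touching the other, which is the reusability argument made above.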


2.2.2 Alternate Architecture Discussions

2.2.2.1 Alternate Architecture: 1

Here, the KWIC system is coupled with the Model layer. The Model-specific components interact directly with the KWIC system, without any external interfaces.

Advantages:

Performance of KWIC system access is good.

Disadvantages:

High coupling of the KWIC system with the overall system.

High degree of dependency on the KWIC system APIs within the Model.

The KWIC system is not replaceable with a different one.


2.2.2.2 Alternate Architecture: 2

Here, the Model layer interfaces directly with the KWIC system, instead of Control interfacing with the KWIC system.

Advantages:

The Control layer is simple, as all the processing happens in the Model.

Disadvantages:

The Model layer is tightly coupled with the KWIC system.

The Model does more than interface with the Elastic database, which is against the MVC guidelines.

The Model layer is no longer reusable and is bulky.

Performance of the Model layer is adversely affected, and hence that of the Cyberminer system.


2.2.2.3 Alternate Architecture: 3

Here, the KWIC system is interfaced with both the Control layer and the Model layer. In addition, the KWIC system interfaces with the Elastic database.

Advantages:

High-performance inserts into, and retrievals from, the database.

Disadvantages:

The KWIC system is tightly coupled with the Control and Model layers, and with the Elastic database. This affects reusability of the system modules.

Portability and scalability of the Cyberminer system are also affected.


2.2.3 Trade off analysis of Alternate Architecture Styles

2.2.4 System Interface Description

The Class Diagram of the Cyberminer System is as follows.


3 DETAILED DESCRIPTION OF COMPONENTS

3.1 KWIC Engine

3.1.1 Line Storage

The input line from the input medium is populated in the LineStorage data structure. This module stores lines as an ordered set of words, and words as an ordered set of characters.

public String getWord(int wordIdx, int lineIdx) - Returns the specific word
public String[] getLine(int lineIdx) - Returns the specific line
public void setWord(String word, int lineIdx) - Sets the specific word in the storage
public void addNewLine() - Adds a new line entry to Line Storage
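A minimal sketch of a class exposing the four operations listed above is given here. It assumes that `setWord` appends the word to the end of the given line and that the backing structure is a list of word lists; neither detail is stated in this document, and `lineCount` is a hypothetical helper added for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of Line Storage: lines are ordered lists of words (assumed representation). */
final class LineStorage {
    private final List<List<String>> lines = new ArrayList<>();

    /** Adds a new, empty line entry. */
    public void addNewLine() {
        lines.add(new ArrayList<>());
    }

    /** Appends a word to the given line (append semantics are an assumption). */
    public void setWord(String word, int lineIdx) {
        lines.get(lineIdx).add(word);
    }

    /** Returns the word at wordIdx within line lineIdx. */
    public String getWord(int wordIdx, int lineIdx) {
        return lines.get(lineIdx).get(wordIdx);
    }

    /** Returns all words of the given line, in order. */
    public String[] getLine(int lineIdx) {
        return lines.get(lineIdx).toArray(new String[0]);
    }

    /** Hypothetical helper: number of stored lines. */
    public int lineCount() {
        return lines.size();
    }

    public static void main(String[] args) {
        LineStorage storage = new LineStorage();
        storage.addNewLine();
        storage.setWord("cyber", 0);
        storage.setWord("miner", 0);
        System.out.println(String.join(" ", storage.getLine(0)));
    }
}
```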

3.1.2 Circular Shift

This module creates the circular shifts of the stored lines. This is achieved by iteratively removing the first word of each line and appending it to the end of that line. It also provides routines to access individual characters and words in the shifted lines.

public void doCircularShift(String line) - Initiates circular shift processing
public String[] getLine(int lineIdx) - Returns the specific line
public String getWord(int lineIdx, int wordIdx) - Returns the specific word
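The shifting step described above can be sketched as a single function. `CircularShifter.shifts` is a hypothetical name, not the project's `doCircularShift`; instead of physically moving the first word each round, it reads the words starting at successive offsets, which produces the same lines.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of circular shifting for one input line. */
final class CircularShifter {
    /** Returns every circular shift of the line, starting with the line itself. */
    static List<String> shifts(String line) {
        List<String> result = new ArrayList<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return result; // nothing to shift
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            StringBuilder sb = new StringBuilder();
            for (int k = 0; k < words.length; k++) {
                if (k > 0) sb.append(' ');
                // Equivalent to having moved the first word to the end 'start' times.
                sb.append(words[(start + k) % words.length]);
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        shifts("the quick fox").forEach(System.out::println);
    }
}
```

A line of n words yields n shifted lines, so the index grows linearly with the length of each description.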

3.1.3 Noise Eliminator

This module removes common articles such as ‘a’, ‘an’ and ‘the’ from the circularly shifted output. These are considered noise words, as they serve no purpose when searching the indices. This reduces the number of lines presented to the next module, and hence improves the overall performance of the KWIC system.
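One common way to realize this in a KWIC index is to drop every shifted line whose first word is a noise word. The sketch below assumes that interpretation; the class name `NoiseEliminator`, the method `filter` and the case-insensitive comparison are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Sketch of a noise filter over circularly shifted lines. */
final class NoiseEliminator {
    /** Keeps only the lines whose first word is not in the (lower-case) noise set. */
    static List<String> filter(List<String> shiftedLines, Set<String> noiseWords) {
        List<String> kept = new ArrayList<>();
        for (String line : shiftedLines) {
            String first = line.trim().split("\\s+")[0].toLowerCase(Locale.ROOT);
            if (!first.isEmpty() && !noiseWords.contains(first)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> noise = new HashSet<>(Arrays.asList("a", "an", "the"));
        List<String> shifted = Arrays.asList("the quick fox", "quick fox the", "fox the quick");
        filter(shifted, noise).forEach(System.out::println);
    }
}
```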

3.1.4 Alphabetizer

Creates alphabetized lines of the circular shifts using CS-Char and CS-Word. It provides routines to access the shifted lines in alphabetical order.

public void doAlphabetize(CircularShift tempCS) - Initiates Alphabetizer processing
public String[] getLine(int lineIdx) - Returns the specific line
private void downwardShift(int rootIdx, int leafIdx) - Private method to sort the lines

Insertion Sort
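As an illustration of the insertion sort named above, the following sketch orders shifted lines alphabetically. It is not the project's `doAlphabetize` or `downwardShift`; the class name `InsertionAlphabetizer` is hypothetical and the case-insensitive ordering is an assumption.

```java
import java.util.Arrays;

/** Sketch of alphabetizing shifted lines with an in-place insertion sort. */
final class InsertionAlphabetizer {
    /** Sorts the lines in ascending, case-insensitive alphabetical order. */
    static void sort(String[] lines) {
        for (int i = 1; i < lines.length; i++) {
            String current = lines[i];
            int j = i - 1;
            // Shift larger lines one slot to the right until current fits.
            while (j >= 0 && String.CASE_INSENSITIVE_ORDER.compare(lines[j], current) > 0) {
                lines[j + 1] = lines[j];
                j--;
            }
            lines[j + 1] = current;
        }
    }

    public static void main(String[] args) {
        String[] lines = {"quick fox the", "fox the quick", "Brown dog"};
        sort(lines);
        System.out.println(Arrays.toString(lines));
    }
}
```

Insertion sort is quadratic in the worst case but cheap when lines arrive nearly sorted, which suits an index that grows one entry at a time.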


3.2 Cyberminer Engine

3.2.1 Insert

The alphabetized output, which is the KWIC-indexed description and URL, is inserted into the Elastic database.

public IndexResponse insert(String index, XContentBuilder document)
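Since URLs and descriptions are stored as JSON objects, the document handed to the insert call might be shaped like the one built below. This sketch uses only the Java standard library rather than the Elasticsearch client, and the field names `url` and `description` as well as the class `UrlDocument` are assumptions, not the project's actual mapping.

```java
/** Sketch of the JSON document for one URL entry (field names are assumed). */
final class UrlDocument {
    /** Escapes a string for use inside a JSON string literal. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control characters
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds the JSON object holding the URL and one KWIC-indexed description line. */
    static String toJson(String url, String description) {
        return "{\"url\":\"" + escape(url) + "\",\"description\":\"" + escape(description) + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(toJson("http://utdallas.edu", "search engine design"));
    }
}
```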

3.2.2 Query

AND Query: The keywords entered by the user are run through an AND query, where only the entries which match ALL the keywords are returned.

public SearchResponse searchWithAnd(String index, XContentBuilder document)

OR Query: The keywords entered by the user are run through an OR query, where all the entries which match one or more of the entered keywords are returned.

public SearchResponse searchWithOr(String index, XContentBuilder document)

NOT Query: The keywords entered by the user are run through a NOT query, where all the entries EXCEPT those matching the specified keywords are returned.

public SearchResponse searchWithNot(String index, XContentBuilder document)
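The semantics of the three query types can be illustrated with an in-memory sketch over a list of descriptions. It does not call Elasticsearch; `BooleanSearch` and its methods are hypothetical names, and whole-word, case-insensitive matching is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** In-memory illustration of AND, OR and NOT keyword queries. */
final class BooleanSearch {
    /** True when the description contains the keyword as a whole word, ignoring case. */
    static boolean matches(String description, String keyword) {
        String wanted = keyword.toLowerCase(Locale.ROOT);
        for (String word : description.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.equals(wanted)) return true;
        }
        return false;
    }

    /** AND: entries that match every keyword. */
    static List<String> and(List<String> entries, List<String> keywords) {
        return entries.stream()
                .filter(e -> keywords.stream().allMatch(k -> matches(e, k)))
                .collect(Collectors.toList());
    }

    /** OR: entries that match at least one keyword. */
    static List<String> or(List<String> entries, List<String> keywords) {
        return entries.stream()
                .filter(e -> keywords.stream().anyMatch(k -> matches(e, k)))
                .collect(Collectors.toList());
    }

    /** NOT: entries that match none of the keywords. */
    static List<String> not(List<String> entries, List<String> keywords) {
        return entries.stream()
                .filter(e -> keywords.stream().noneMatch(k -> matches(e, k)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList(
                "java search engine", "python web search", "cooking recipes");
        System.out.println(and(entries, Arrays.asList("search", "java")));
        System.out.println(or(entries, Arrays.asList("java", "python")));
        System.out.println(not(entries, Arrays.asList("search")));
    }
}
```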

3.2.3 Delete

Deletes an out-of-date URL and the corresponding description from the Elastic database.

public DeleteResponse deleteOutDatedUrl(String index, XContentBuilder document)


4 USER INTERFACE DESIGN

4.1 Description of the User Interface

The user interface (UI) is web-based and provides an easy way to create the KWIC index. It consists of a text input box and multiple internal window panes, which display the URL and descriptor. The interface is intended to be intuitive, so that users can understand every function. Appropriate tool tips shall be provided for the various functions.

The home page provides the following buttons: Add URL, Search URL, Delete URL and User Config.

Add URL:

Used when the user wants to add a new URL to the database.

Search URL:

Used when the user wants to search for a URL which is already stored in the database.

Delete URL:

Used when the user wants to delete an outdated URL.

User Config:

Used when the user wants to filter out words or symbols which are not meaningful to the user.


4.2 Use Cases

Cyberminer System Use Case

Use-case 1: User provides input

Use case ID: 1
Primary Actor: User
Main Flow: User enters a URL and a description in the input box, and then clicks on “ADD URL”. This input is then passed to the KWIC system for circular shift and alphabetizer processing, and the output is then stored in the database.
Alternate Flow: N/A

Use-case 2: User provides number input

Use case ID: 2
Primary Actor: User
Main Flow: User enters an invalid URL. This input is processed and the error message “Input should be a valid URL” shall be shown.
Alternate Flow: N/A

Use-case 3: User provides input with special characters

Use case ID: 3
Primary Actor: User
Secondary Actor: KWIC system
Main Flow: User enters input containing special characters such as @, #, $, %, *, etc. in the input box. This input is processed and the error message “Input Should not contain special characters” shall be shown.
Alternate Flow: N/A
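The checks behind use cases 2 and 3 could be sketched as follows. This is a guess at one reasonable implementation: `InputValidator` is a hypothetical class, the rule that a valid URL needs an http or https scheme and a host is an assumption, and so is applying the special-character check to the description (letters, digits and spaces only) rather than to the URL. The error messages are the ones quoted in the use cases.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

/** Sketch of input validation for the Add URL form (rules are assumptions). */
final class InputValidator {
    /** Assumed rule: a valid URL parses, uses http or https, and names a host. */
    static boolean isValidUrl(String url) {
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return web && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Returns the error message to show, or an empty Optional when the input is accepted. */
    static Optional<String> validate(String url, String description) {
        if (!isValidUrl(url)) {
            return Optional.of("Input should be a valid URL");
        }
        if (!description.matches("[A-Za-z0-9 ]*")) {
            return Optional.of("Input Should not contain special characters");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(validate("12345", "numbers only"));
        System.out.println(validate("http://utdallas.edu", "c# tips"));
        System.out.println(validate("http://utdallas.edu", "search engine"));
    }
}
```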

Use-case 4: User provides a search query

Use case ID: 4
Primary Actor: User
Secondary Actor: Cyberminer system
Main Flow: User enters, via the User Config menu, the set of words to be filtered out of the output.
Alternate Flow: N/A

Use-case 5: User provides a search query with AND, OR, NOT specifier

Use case ID: 5
Primary Actor: User
Secondary Actor: Cyberminer system
Main Flow: User enters the search query with an AND, OR or NOT specifier. The Cyberminer system creates a search query for the Elasticsearch module, retrieves the relevant entries and displays them to the user.
Alternate Flow: N/A

Use-case 6: User deletes any existing entry from the Database

Use case ID: 6
Primary Actor: User
Secondary Actor: KWIC system
Main Flow: User enters the specific entry to be deleted from the database. The Cyberminer system creates a search query accordingly and deletes the entry if it is found in the database. Otherwise, it returns an error to the user.
Alternate Flow: N/A

5 ADDITIONAL MATERIAL

[1] KWIC Wikipedia Source - https://en.wikipedia.org/wiki/Key_Word_in_Context

[2] David L. Parnas uses a KWIC Index as an example of how to perform modular design in his paper “On the Criteria To Be Used in Decomposing Systems into Modules”, available as an ACM Classic Paper.