Final Report Acg

  • 8/3/2019 Final Report Acg

    1/25

    Automatic community generator

    Project Approval Sheet

    The Project entitled

    AUTOMATIC COMMUNITY GENERATOR

    Is hereby approved in partial fulfillment for the Bachelors Degree of Engineering in

    Information Technology and will be carried out by

    Name of students: Roll Number

    1. Parag Jain 06

    2. Saurabh Jain 08

    3. Manas Jain 04

    4. Nikhil Bhutada 21

    5. Yash Bhise 19

    (Prof. L.M.R.J Lobo)

    HOD I.T. DEPT.

    Department of Information Technology

    Walchand Institute of Technology, Solapur

    Year 2010-2011

    Certificate

    This is to certify that the Project design entitled

    Automatic Community Generator

    Has been carried out by

    1. Parag Jain 06

    2. Saurabh Jain 08

    3. Manas Jain 04

    4. Nikhil Bhutada 21

    5. Yash Bhise 19

    of B.E.(Information technology) Class in partial fulfillment for the award of Degree in

    Bachelor of Engineering in Information Technology as per requirement of Solapur

    University in academic Year 2010-11.

    (Prof: L.M.R.J. Lobo)

    Head IT Dept.

    (Dr. S. A. Halkude)

    Principal

    Department Of Information Technology,

    Walchand Institute of Technology, Solapur

    ACKNOWLEDGEMENT

    It is with a great sense of gratitude that we acknowledge the support given to us
    by our project guide, Prof. L.M.R.J. Lobo. It is difficult to put into words the
    confidence and support he gave us; they were a great boost to our morale. We are
    truly grateful for his confidence in us, which proved to be our strength throughout
    our work. His support, both technical and moral, helped us in completing the first
    task of our project, the Design Report. We could do this only because of
    Prof. L.M.R.J. Lobo's guidance.

    INDEX

    1. Brief Idea about the Project & Algorithms used
    2. Software used
    3. Snapshots of screens
    4. Results generated
    5. Comparison with existing systems
    6. Testing and Maintenance
    7. References

    BRIEF IDEA ABOUT PROJECT

    PURPOSE

    The purpose of our project is to enhance communication by classifying users
    according to their level of activity and the content of their communication. It is
    an extension to social networking websites in which communities are generated
    automatically on the basis of blogs, forums, etc. Every post from a user is
    considered and helps in learning about that user.

    This clustering helps a user to search for a topic and to find the other users who
    have knowledge about that topic, so that the user can communicate with them and get
    relevant information.

    FOCUS

    Our project focuses on how data mining can be used to extract patterns and find a
    solution to clustering. From a large volume of data we extract the keywords that
    occur frequently, together with their use by users.

    We found this task interesting and challenging because the project needs thorough
    knowledge, along with special skills, to develop web mining tools. Our project
    reflects our career.

    Problem Statement

    To build a social networking website with the enhanced feature of automatic
    community generation for improved communication.

    Idea:

    We will build a social networking website enhanced with automatic community
    generation on the basis of personal information, chats, forums and the threads a
    user has started. These communities will help other users to find information on a
    topic and to find the specialists in that field.

    How the Automatic grouping of users is done:

    Our objective with this task is to group users into subgroups to facilitate
    collaboration among them with the course tools. In our case we have opted for the
    clustering algorithm EM (Expectation Maximization). Once the groups have been
    formed, if we have some interaction data on a specific user, the system
    automatically assigns him/her to a group. The user will then be advised to contact
    members of that group.

    Block Diagram

    Structure of the Project

    1. SQL database
    2. Java Program i.e. Main Algorithm
    3. JSP pages

    Tables

    blogs, communities, communityusers, distance1, forumposts, forumthreads,
    pendingresource, users, userwords, words

    Java Program

    Classes:

    DatabaseManager
        int getWordId(String word)
        int getPendingBlogs()
        int getPendingChats()
        int getPendingPosts()
        String getBlogText(int bid)
        String getChatText(int cid)
        String getPostText(int pid)
        void idDelete(int id, String s)
        int getUserid(int id, String s)
        void InsertToUserwords(int uid, int wid)
        void calc_weight()
        int[] getComm(int[][] w)

    WordExtractor
        static List extract(String s)
        static int InsertToDatabase(String word)
        static void distance(List words)
        public static void main(String args[])

    Functions Used:

    int getWordId(String word)
        This function is used to retrieve the wordId for a given word.

    int getPendingBlogs()
        This function is used to retrieve the ID of pending blogs from the table
        pendingresource so that the blog data can be scanned.

    int getPendingPosts()
        This function is used to retrieve the ID of pending posts from the table
        pendingresource so that the forum data can be scanned.

    String getBlogText(int bid)
        Returns the text of the blog whose ID, as returned from the function
        getPendingBlogs(), is passed as a parameter to this function.

    String getPostText(int pid)
        Returns the text of the post whose ID, as returned from the function
        getPendingPosts(), is passed as a parameter to this function.

    void idDelete(int id, String s)
        This function deletes the entry from the table pendingresource once the entry
        is no longer pending, that is, after the WordExtractor has been executed.

    int getUserid(int id, String s)
        This function returns the userid corresponding to the area in which the user
        is working.

    void InsertToUserwords(int uid, int wid)
        This function is used to insert a word with wordid wid, used by the user with
        userid uid; a userwordid is assigned to the entry.

    void calc_weight()
        This function uses the formula

            Weight = hitcount * number of users

        to calculate the weight of a word, which is used for ranking purposes.
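    The weight formula can be illustrated with a small in-memory sketch. It is not
    the project's JDBC code: the class WeightRanking, the helper WordStat, the method
    rankByWeight and the sample numbers are all assumptions made for illustration.
    Only the formula itself (hitcount multiplied by the number of users) comes from
    the report.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch of the ranking step: weight = hitcount * number of users. */
final class WeightRanking {

    /** Hypothetical in-memory view of one word (not the real table layout). */
    static final class WordStat {
        final int wordId;
        final int hitCount;   // how often the word occurred
        final int userCount;  // how many users used the word

        WordStat(int wordId, int hitCount, int userCount) {
            this.wordId = wordId;
            this.hitCount = hitCount;
            this.userCount = userCount;
        }

        /** Weight as stated in the report: hitcount * number of users. */
        long weight() {
            return (long) hitCount * userCount;
        }
    }

    /** Returns the word IDs ordered by descending weight. */
    static List<Integer> rankByWeight(List<WordStat> stats) {
        List<WordStat> sorted = new ArrayList<>(stats);
        sorted.sort(Comparator.comparingLong(WordStat::weight).reversed());
        List<Integer> ids = new ArrayList<>();
        for (WordStat s : sorted) {
            ids.add(s.wordId);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<WordStat> stats = List.of(
                new WordStat(1, 10, 2),   // weight 20
                new WordStat(2, 4, 9),    // weight 36
                new WordStat(3, 7, 1));   // weight 7
        System.out.println(rankByWeight(stats)); // prints [2, 1, 3]
    }
}
```

    Multiplying by the number of users favours words that many different users
    share over words that a single user repeats often.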

    Algorithm 1 for generating community (for single words)

    calc_weight()

    Start
    1: connect to the database ocg using ConnectionManager
    2: execute the query: select count(*) as h from words; // words is a table in ocg
    3: print size of words
    4: execute the query: select wid as w1, hitcount * (select count(*) from userwords
       where wid = w1) as uh from words order by uh desc
    5: for i = 0 to size
    6:     go to the next record
    7:     store the word ID and the value uh in the two-dimensional array words
    8: end for
    9: for i = 0 to size
    10:     display the two-dimensional array
    11: end for
    12: int a[] = getComm(words);
    13: for i = 0 to a.length
    14:     execute the query: select word from words where wid = a[i]
    15:     while rs.next()
    16:         print word
    17:         execute the query: insert ignore into communities values (0, word, CURRENT_DATE)
    18:         if st1.executeUpdate(q) != 0 then
    19:             execute the query: select LAST_INSERT_ID(cid) from communities order by cid desc limit 1
    20:             print id
    21:             execute the query: select uid from userwords where wid = a[i]
    22:             while rs.next()
    23:                 print userid
    24:                 execute the query: insert ignore into communityusers values (0, id, userid)
    25:             end while
    26:         end if
    27:     end while
    28: end for
    Stop
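    The flow of Algorithm 1 can be sketched without a database as follows. This is
    an illustrative stand-in, not the project's implementation: the maps replace the
    tables words, userwords, communities and communityusers, the class and method
    names are hypothetical, and the report does not say how getComm selects words, so
    taking the top "limit" ranked words is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** In-memory sketch of single-word community generation. */
final class SingleWordCommunities {

    /**
     * @param rankedWords words already ordered by descending weight
     * @param usersByWord word -> IDs of the users who used it (stand-in for userwords)
     * @param limit       number of top words that become communities (assumed policy)
     * @return community name -> member user IDs
     */
    static Map<String, Set<Integer>> build(List<String> rankedWords,
                                           Map<String, Set<Integer>> usersByWord,
                                           int limit) {
        Map<String, Set<Integer>> communities = new LinkedHashMap<>();
        for (String word : rankedWords) {
            if (communities.size() >= limit) {
                break;
            }
            // Like "insert ignore": an existing community is never created twice.
            Set<Integer> members = communities.computeIfAbsent(word, w -> new TreeSet<>());
            // Every user who used the word joins the community named after it.
            members.addAll(usersByWord.getOrDefault(word, Set.of()));
        }
        return communities;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> usersByWord = Map.of(
                "java", Set.of(1, 2),
                "rmi", Set.of(2, 3),
                "sql", Set.of(4));
        System.out.println(build(List.of("java", "rmi", "sql"), usersByWord, 2));
        // prints {java=[1, 2], rmi=[2, 3]}
    }
}
```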

    Algorithm 2 for generating community (Distance method)

    distance(List words)

    Start
    1: get the words as a string array y
    2: get connection
    3: create statements st, st1
    4: initialize l = words.size()
    5: for i = 0 to l
    6:     for j = i + 1 to l
    7:         select hits from distance1 where wid1 = y[i] and wid2 = y[j]
    8:         if a record exists
    9:             update distance1
    10:             executeUpdate(qq)
    11:         else insert into distance1
    12:             executeUpdate(q)
    13:         String n = "select * from distance1 where hits > 4"
    14:         while a record exists
    15:             get string wid1
    16:             get string wid2
    17:             wid3 = wid1 + " " + wid2
    18:             select * from communities where name = wid3
    19:             executeQuery(n)
    20:             if a record exists, do nothing
    21:             else
    22:                 insert into communities values (0, wid3, CURRENT_DATE)
    23:                 executeUpdate(n)
    24:                 initialize new database object dbm
    25:                 get wid1 in id1
    26:                 get wid2 in id2
    27:                 select LAST_INSERT_ID(cid) from communities order by cid desc limit 1
    28:                 while a record exists
    29:                     uu = get first record
    30:                     insert into communityusers values (0, cid, uu)
    31:                     nt.executeUpdate(n)
    32:     end for
    33: end for
    Stop
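    The pair-counting idea behind Algorithm 2 can be sketched in memory as below.
    This is a hedged illustration rather than the project's code: the map stands in
    for the distance1 table, the names WordPairDistance, addResource and
    communityNames are hypothetical, and it is assumed that the update step increases
    the hit count of an existing pair by one. The threshold (hits > 4) and the
    community name (the two words joined by a space) follow the pseudocode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of the distance method: count word pairs, keep frequent ones. */
final class WordPairDistance {

    /** A pair becomes a community once its hits exceed this value ("hits > 4"). */
    static final int HIT_THRESHOLD = 4;

    // "word1 word2" -> hits; stands in for the distance1 table.
    private final Map<String, Integer> hits = new TreeMap<>();

    /** Records one hit for every ordered pair (i, j), i < j, of one resource's words. */
    void addResource(List<String> words) {
        for (int i = 0; i < words.size(); i++) {
            for (int j = i + 1; j < words.size(); j++) {
                String pair = words.get(i) + " " + words.get(j);
                // Insert the pair with 1 hit, or update the existing count.
                hits.merge(pair, 1, Integer::sum);
            }
        }
    }

    /** Names of the two-word communities: pairs seen more than HIT_THRESHOLD times. */
    List<String> communityNames() {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> e : hits.entrySet()) {
            if (e.getValue() > HIT_THRESHOLD) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        WordPairDistance d = new WordPairDistance();
        for (int k = 0; k < 5; k++) {
            d.addResource(List.of("java", "rmi"));
        }
        d.addResource(List.of("java", "sql"));
        System.out.println(d.communityNames()); // prints [java rmi]
    }
}
```

    As in the pseudocode, pairs are ordered, so "java rmi" and "rmi java" are counted
    separately in this sketch.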

    SOFTWARE USED

    Front end: Java and HTML

    Java has entrenched itself on the server side, where it has clear advantages over
    other existing technologies. However, just about any application has some form of
    user interface and front-end presentation.

    Given the simplicity of the HTTP/HTTPS protocols, you are also guaranteed to enjoy
    the predictability of programming for various network configurations and
    firewalls. That comes at a price, however. The trade-off with HTML is the lack of
    user interaction and the necessity of making network trips to the server for every
    response to a user action. JavaScript is a great language for adding uncomplicated
    interactive logic to otherwise static HTML, but it is not one for developing
    sophisticated, intelligent user interfaces.

    Java Server Pages (JSP) technology provides a simplified, fast way to create
    dynamic web content. JSP technology enables rapid development of web-based
    applications that are server and platform-independent. JSP technology lets you
    add snippets of servlet code directly into a text-based document. Typically, a
    JSP page is a text-based document that contains two types of text:

    - Static data, which can be expressed in any text-based format, such as HTML,
      Wireless Markup Language (WML), or XML
    - JSP technology elements, which determine how the page constructs dynamic content

    HTML can be easily embedded in JSP. We have developed our logic in Java and used
    HTML and JSP technology to develop the web pages.

    Middleware: Apache Tomcat

    A Web application runs within a Web container of a Web server. The Web container
    provides the runtime environment through components that provide naming context
    and life cycle management. Some Web servers may also provide additional services
    such as security and concurrency control. A Web server may work with an EJB server
    to provide some of those services. A Web server, however, does not need to be
    located on the same machine as an EJB server.

    Web applications are composed of web components and other data such as HTML pages.
    Web components can be servlets, JSP pages created with the Java Server Pages
    technology, web filters, and web event listeners. These components typically
    execute in a web server and may respond to HTTP requests from web clients.
    Servlets, JSP pages, and filters may be used to generate HTML pages that are an
    application's user interface. They may also be used to generate XML or other
    format data that is consumed by other application components.

    Back end: MySQL

    MySQL is one of the world's most popular open source, platform-independent
    databases, with over 100 million copies of its software downloaded or distributed
    throughout its history. With its speed, reliability, and ease of use, MySQL has
    become a preferred choice for Web, Web 2.0, SaaS, ISV and telecom companies and
    for corporate IT managers, because it reduces the problems associated with
    downtime, maintenance and administration of modern online applications.

    Hardware requirements

    Minimum: CPU 2.6 GHz, 2 GB RAM and 160 GB HDD

    SCREENSHOTS

    Home page (homepage.jsp)

    Signup (signup.jsp)

    New user registration (registration.jsp)

    Home page after login (home.jsp)

    Users Blog list (blog.jsp)

    New Blog (blog1.jsp)

    Forum list (forum.jsp)

    Post to forums (replyto.jsp)

    Search Community (communityhome.jsp)

    Community page after search (community.jsp)

    List of users in community (communitypage.jsp)

    User details

    RESULTS GENERATED

    Using Algorithm 1 and Algorithm 2 we have generated communities automatically.
    The screenshots of the community pages are shown above (fig 9 & 10).

    Fig: Community list. This figure shows the list of all the communities that were
    generated automatically.

    Query: select * from communities

    Fig: Users list sorted according to communities

    Query: select * from communityusers

    COMPARISON WITH EXISTING SYSTEMS:

    On today's social networking websites we search for communities, but if a
    community is not present we have to create it before we can access it; communities
    are not generated automatically. Moreover, the data is not organized properly, so
    it is difficult to find particular data.

    For example, if we want information about RMI in Java, we would go to the community
    named "java" and try to find relevant results there. We may or may not get the
    desired result, and it is cumbersome to browse the entire community. In our system
    this browsing is replaced by simply searching for the name of the required
    community: because communities are created automatically, a separate community is
    created even for a small field or area, and hence the problem of browsing within a
    community is solved.

    TESTING AND MAINTENANCE:

    Testing

    There are two approaches for testing:

    1) White Box Testing
    2) Black Box Testing

    1) White Box Testing: In our project, white box testing methods can be used to
    evaluate the completeness of a test suite that was created with black box testing
    methods.

    In white box testing we check the value of each variable in every method by using
    the debugger.

    This allows the software team to examine parts of the system that are rarely
    tested and ensures that the most important function points have been tested.

    We used two common forms of code coverage:

    - Function coverage, which reports on the functions executed.
    - Statement coverage, which reports on the number of lines executed to complete
      the test.

    Both return a code coverage metric, measured as a percentage.

    2) Black Box Testing: Black box testing treats our software as a black box, without
    any knowledge of the internal implementation. Black box testing methods include
    equivalence partitioning, boundary value analysis, all-pairs testing, fuzz
    testing, model-based testing, exploratory testing and specification-based testing.

    Specification-based Testing: Specification-based testing aims to test the
    functionality of software according to the applicable requirements. Thus, the
    tester inputs data into, and only sees the output from, the test object. This
    level of testing usually requires thorough test cases to be provided to the tester,
    who can then simply verify that, for a given input, the output value (or behavior)
    either "is" or "is not" the same as the expected value specified in the test case.

    Specification-based testing is necessary, but it is insufficient to

    guard against certain risks.
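    A specification-based test of this kind can be sketched as a table of inputs and
    expected outputs that is checked against the unit under test. The example below is
    only an assumption-laden illustration: the class WeightSpecTest, the method run
    and the chosen cases are hypothetical, and the unit under test is a stand-in for
    the weight formula (hitcount * number of users), not the project's calc_weight()
    itself.

```java
/** Sketch of a specification-based (black box) test driven by a table of cases. */
final class WeightSpecTest {

    /** Stand-in for the unit under test: weight = hitcount * number of users. */
    static long weight(int hitCount, int userCount) {
        return (long) hitCount * userCount;
    }

    /**
     * Each case is {hitCount, userCount, expectedWeight}.
     * Returns the number of cases whose output "is not" the expected value.
     */
    static int run(int[][] cases) {
        int failures = 0;
        for (int[] c : cases) {
            long actual = weight(c[0], c[1]);
            if (actual != c[2]) {
                failures++;
                System.out.println("FAIL: weight(" + c[0] + ", " + c[1] + ") = "
                        + actual + ", expected " + c[2]);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        int[][] cases = {
            {0, 5, 0},    // boundary value: the word was never hit
            {3, 0, 0},    // boundary value: no user used the word
            {1, 1, 1},    // smallest non-zero case
            {10, 4, 40},  // typical case
        };
        System.out.println("failures: " + run(cases)); // prints "failures: 0"
    }
}
```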

    Advantages and Disadvantages: The black box tester has no "bonds" with the code,
    and a tester's perception is very simple: the code must have bugs. Using the
    principle "Ask and you shall receive," black box testers find bugs where
    programmers do not. On the other hand, black box testing has been said to be "like
    a walk in a dark labyrinth without a flashlight," because the tester doesn't know
    how the software being tested was actually constructed. As a result, there are
    situations when

    1) a tester writes many test cases to check something that could have been tested
       by only one test case, and/or
    2) some parts of the back end are not tested at all.

    Therefore, black box testing has the advantage of "an unaffiliated opinion", on the
    one hand, and the disadvantage of "blind exploring", on the other.

    REFERENCES

    We have taken the basic idea of grouping users from "Towards web-based adaptive
    learning communities" [1]. This paper gives the general idea of grouping students
    depending on their requirements. The authors opted for the clustering algorithm EM
    (Expectation Maximization).

    To identify the contents of web pages, we propose a combined mechanism

    which computes the product of term frequency and document frequency and

    prioritizes the terms based on the calculation of entropies. We assign any Web

    page into a category within domain ontology [2]. Our approach to identifying

    associations between a Keyword and a predefined category is to use term-

    classification rules compiled by machine learning algorithms.

    CONCLUSION

    We have built a website with the feature of automatic community generation, which
    can be used as an extension to current social networking websites. It will help in
    enhancing communication, as the clustering is done automatically. Other algorithms
    can also be included to increase the efficiency of clustering and community
    generation.

    The program will run on the server at regular intervals, as set by the
    administrator, creating communities from time to time and updating the users'
    information.
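    One possible way to run the generator at a fixed interval is Java's
    ScheduledExecutorService. The sketch below is an assumption about how such
    scheduling could be done, not a description of the project's deployment: the
    class CommunityScheduler, the method start and the placeholder task are
    hypothetical, and the period would be whatever the administrator configures.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch: run a community-generation task repeatedly at a fixed period. */
final class CommunityScheduler {

    /**
     * Starts the given task immediately and then once per period.
     * Note: if the task throws, scheduleAtFixedRate cancels later runs,
     * so a real task should catch and log its own exceptions.
     */
    static ScheduledExecutorService start(Runnable generator, long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(generator, 0, period, unit);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch threeRuns = new CountDownLatch(3);
        // Placeholder task; the real one would scan pending resources and
        // regenerate the communities.
        ScheduledExecutorService scheduler = start(() -> {
            System.out.println("regenerating communities");
            threeRuns.countDown();
        }, 100, TimeUnit.MILLISECONDS);

        threeRuns.await(5, TimeUnit.SECONDS); // wait for three runs, then stop
        scheduler.shutdownNow();
    }
}
```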