18
Ontological Classification of Web Pages Zafer Erenel Many users use search engines to locate and buy goods and services (such as choosing a vacation). Web pages presented on the internet do not conform to any data organization standard and search engines provide primitive query capabilities for users to retrieve relevant data [1]. In addition to that, they do not list sites equally and are inclined toward listing more popular pages. These tendencies brush many web pages aside and leave a limited number of alternatives to the users. I have created a lightweight domain ontology that consists of a taxonomic hierarchy and made use of it by an automated agent to classify web pages on the internet. The automated agent discovers and classifies relevant pages with the help of Yahoo and Google search engines.

Ontological Classification of Web Pages Zafer Erenel

Embed Size (px)

DESCRIPTION

Ontological Classification of Web Pages Zafer Erenel Many users use search engines to locate and buy goods and services (such as choosing a vacation). - PowerPoint PPT Presentation

Citation preview

Page 1: Ontological Classification of Web Pages Zafer Erenel

Ontological Classification of Web Pages

Zafer Erenel

• Many users use search engines to locate and buy goods and services (such as choosing a vacation).

• Web pages presented on the internet do not conform to any data organization standard and search engines provide primitive query capabilities for users to retrieve relevant data [1].

• In addition to that, they do not list sites equally and are inclined toward listing more popular pages. These tendencies brush many web pages aside and leave a limited number of alternatives to the users.I have created a lightweight domain ontology that consists of a taxonomic hierarchy and made use of it by an automated agent to classify web pages on the internet.

• The automated agent discovers and classifies relevant pages with the help of Yahoo and Google search engines.

Page 2: Ontological Classification of Web Pages Zafer Erenel

Related Research• Desai and Spink presented a clustering scheme that

groups documents into partially and substantially relevant pages by using similarity measures and ranking heuristics [2].

• They worked with the end-user queries (limited number of terms) to obtain the relevance score. Instead, my automated agent will act along with an established ontology to discover and classify documents.

• Chiang, Chua, and Storey parsed snippets of returned links to find the ratio of the number of matching terms to rank the web pages for relevance [1].

• I believe snippets consist of a very few number of words and we cannot judge the web page on the basis of snippets. My agent scours the entire web page which is more time-consuming but more effective

Page 3: Ontological Classification of Web Pages Zafer Erenel

• Yahoo and Google search results contain scores of links. • My lightweight domain ontology consists of 7 branches. Each

branch is comprised of predetermined terms.• Score of the web page increases in a certain branch as the

agent comes across these predetermined terms on the html code.

• I’ve chosen country ontology because internet users’ interest in a certain country can be quite high.

Page 4: Ontological Classification of Web Pages Zafer Erenel
Page 5: Ontological Classification of Web Pages Zafer Erenel
Page 6: Ontological Classification of Web Pages Zafer Erenel
Page 7: Ontological Classification of Web Pages Zafer Erenel

• The ranks of web pages in each cluster will clarify their content to the user.

• In addition to that, we can compare result sets of different search engines (Yahoo and Google) for the same queries and find complement and intersection of their result sets to have a clear understanding of search engines’ behaviors.

Page 8: Ontological Classification of Web Pages Zafer Erenel

• I’ve used web stream classes in C# Programming language to create my agent.

• A WebRequest is an object that requests a Uniform Resource Identifier (URI) such as the URL for a web page [3].

• You can use a WebRequest object to create a WebResponse object that will encapsulate the object pointed to by the URI.

• Once you get the actual object (e.g., a web page) pointed to by the URI, what you get back is a stream of the web page.

Page 9: Ontological Classification of Web Pages Zafer Erenel

• I used this capability for reading a page from a site to extract the information I need. I have created two web requests using search syntax given below

http://www.google.com/search?q=cyprus+vacation+travel&lr=&start=0&sa=N

http://search.yahoo.com/search?p=cyprus+vacation+travel&b=1

Page 10: Ontological Classification of Web Pages Zafer Erenel

• Google search engine has returned 200 URLs and I have created 200 web requests to extract relevant information from each web page.

• Yahoo search engine has been used in the same manner to extract relevant information.

Page 11: Ontological Classification of Web Pages Zafer Erenel
Page 12: Ontological Classification of Web Pages Zafer Erenel
Page 13: Ontological Classification of Web Pages Zafer Erenel

• If we analyze price rankings, we come across pages that have information about student flights, travel insurances, vacation package discounts, cheap flights and etc..

• If we analyze nature rankings, we come across web pages that offer adventure and etc.

• If I want to do scuba diving on my vacation, I know that hawai and fiji are among my options by looking at activities rankings

Page 14: Ontological Classification of Web Pages Zafer Erenel

• Interestingly enough, in the top 100 search lists, the number of web pages that both appear on Google and Yahoo is 19.

Page 15: Ontological Classification of Web Pages Zafer Erenel

• In the top 200 search lists, the number of web pages that both appear on Google and Yahoo is 24.

Page 16: Ontological Classification of Web Pages Zafer Erenel
Page 17: Ontological Classification of Web Pages Zafer Erenel

• As a result, ontologically organized clusters of web sites that are offering information about a given country regarding vacation and travel alternatives serve our objective to a greater extent in finding what we are in search of.

• In my work, I have used 2 search engines and a single ontology. Search Engines’ shortcomings can be prevented by combining multiple engines with multiple ontologies to ease the search for most needed information on the internet.

• Venn Diagrams prove that a specific search engine is not very effective by itself.

Page 18: Ontological Classification of Web Pages Zafer Erenel

References

[1] R.H.L. Chiang, C.E.H. Chua, V.C. Storey, A smart web query method for semantic retrieval of web data, Data & Knowledge Engineering 38 (2001) 63-84.

[2] M. Desai, A. Spink, An algorithm to cluster documents based on relevance, Information Processing and Management 41 (2005) 1035-1049.

[3] Liberty, J., Programming C#,3rd ed. O’REILLY, 2003.