Chapter 1 Introduction

Page 1: Chapter 1 Introduction

Chapter 1 Introduction

Page 2: Chapter 1 Introduction

The Web redefines the meanings and processes of business, commerce, marketing, publishing, education, research, government, and development, as well as other aspects of our daily life.

Page 3: Chapter 1 Introduction

What’s the difference?

Page 4: Chapter 1 Introduction

New challenges of the Web
  • Size
  • Complexity

We need to modify or enhance existing theories and technologies to deal with the size and complexity of the Web.

Page 5: Chapter 1 Introduction

What is WI?

“Web Intelligence (WI) exploits Artificial Intelligence (AI) and advanced Information Technology (IT) on the Web and Internet.”

[Diagram labels: AI, IT, WI]

Page 6: Chapter 1 Introduction

Web Intelligence (WI)
  • The term WI was conceived in late 1999
  • A recent subdiscipline of computer science; the first WI conference was the Asia-Pacific Conference on WI-2001

Page 7: Chapter 1 Introduction

Intelligent Web
  • Learning new knowledge from the Web
  • Searching for relevant information
  • Personalized web pages
  • Learning about individual users

Page 8: Chapter 1 Introduction

Information Retrieval

Page 9: Chapter 1 Introduction

Information Retrieval (IR)
  • Information retrieval techniques began to develop as soon as information archives started to be built
      • Catalogues, indexes, tables of contents
  • Computerized information storage and retrieval dates from the 1950s and 1960s
  • Renewed interest after the advent of the Web

Page 10: Chapter 1 Introduction

Figure 1.1 Timeline of information and retrieval (Courtesy of Ned Fielden, San Francisco State University)

Page 11: Chapter 1 Introduction

Modern Information Retrieval
  • Document representation
  • Query representation
  • Retrieval model
  • Similarity between document and query
  • Rank the documents
  • Performance evaluation of the retrieval process
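
The steps on this slide can be illustrated with a minimal sketch. It assumes a vector-space retrieval model with TF-IDF weights and cosine similarity, which is only one of several possible retrieval models and is not named on the slide. The function names, the whitespace tokenizer, and the smoothed IDF formula are illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def tf_idf_vectors(docs):
    """Document representation: one sparse {term: weight} dictionary per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (an assumed variant; others exist).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: freq * idf[t] for t, freq in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    """Similarity between two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (score, document index) pairs, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    # Query representation: same weighting; unseen terms get weight 0.
    q_vec = {t: f * idf.get(t, 0.0) for t, f in Counter(tokenize(query)).items()}
    scores = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


def precision_recall(retrieved, relevant):
    """Performance evaluation: precision and recall of a retrieved set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Calling `rank("web mining", docs)` on a small list of strings returns the documents ordered by similarity to the query. `precision_recall` then compares the top results against a hand-labelled set of relevant document indices.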

Page 12: Chapter 1 Introduction

Semantic Web

Page 13: Chapter 1 Introduction

Keywords versus Semantics
  • Traditional IR is limited by keywords
  • Key phrases can be used to introduce a bit of semantics
  • The Semantic Web is an emerging area

Page 14: Chapter 1 Introduction

Semantic Web
  • The Semantic Web was proposed by Tim Berners-Lee, the developer of the World Wide Web
  • The Semantic Web is concerned with the representation of data on the World Wide Web
  • Involves the W3C, researchers, and industrial partners

Page 15: Chapter 1 Introduction

Web Mining

Page 16: Chapter 1 Introduction

Data Mining Applied to the Web
  • Data mining is the process of discovering knowledge from large amounts of data
  • Used extensively in commercial and scientific applications
  • Adjustments need to be made for the Web

Page 17: Chapter 1 Introduction

Data Mining
  • Clustering: Finding natural groupings of users or pages
  • Classification and prediction: Determining the class or behavior of a user or resource
  • Associations: Determining which URLs tend to be requested together
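
As a rough illustration of the association task, the sketch below counts how often pairs of URLs appear in the same user session and keeps the pairs that meet a minimum support. The session data, the function name `co_requested_pairs`, and the `min_support` threshold are hypothetical. A full association-rule miner would also handle larger itemsets and confidence measures.

```python
from collections import Counter
from itertools import combinations


def co_requested_pairs(sessions, min_support=2):
    """Count unordered URL pairs that occur together in at least min_support sessions."""
    pair_counts = Counter()
    for urls in sessions:
        # Each pair is counted once per session, however often a URL repeats.
        for pair in combinations(sorted(set(urls)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


# Hypothetical sessions: each inner list holds the URLs one visitor requested.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
]
frequent = co_requested_pairs(sessions)
```

With these sample sessions, only the pair `/home` and `/products` occurs in two sessions, so it is the only pair that survives the default threshold.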

Page 18: Chapter 1 Introduction

Web Mining
  • Web content mining
      • Applied to primary data on the Web: text and multimedia documents
  • Web structure mining
      • Hyperlink analysis
  • Web usage mining
      • Secondary data consisting of user interaction with the Web
      • User profiles

Page 19: Chapter 1 Introduction

Figure 1.2 Web mining classifications (Courtesy of O. Romanko, 2002)

Page 20: Chapter 1 Introduction

Web Usage Mining

Page 21: Chapter 1 Introduction

Web Usage Mining
  • Study of data generated by surfers’ sessions or behaviors
  • Works with secondary data from users’ communications with the Web
      • Web logs, proxy-server logs, browser logs
  • A Web-access log is an inventory of page-reference data, referred to as clickstream data because each entry corresponds to a mouse click
  • Cookies
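
To make the idea of clickstream data concrete, here is a minimal sketch that turns access-log lines into per-visitor click sequences. The slide does not say which log format is meant; the sketch assumes the widely used Common Log Format. Treating the client host as a stand-in for a user is a simplification, since proxies and shared addresses blur that mapping. The pattern and function names are illustrative.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed layout (Common Log Format):
# host ident user [time] "METHOD url protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry


def clickstreams(lines):
    """Group requested URLs by client host, ordered by request time."""
    streams = defaultdict(list)
    for line in lines:
        entry = parse_line(line)
        if entry is not None:  # Skip lines that are not valid log entries.
            streams[entry["host"]].append((entry["time"], entry["url"]))
    return {host: [url for _, url in sorted(clicks)] for host, clicks in streams.items()}


# Made-up sample lines using documentation-range IP addresses.
sample = [
    '192.0.2.1 - - [10/Oct/2000:13:56:01 -0700] "GET /products.html HTTP/1.0" 200 1045',
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.7 - - [10/Oct/2000:13:56:10 -0700] "GET /index.html HTTP/1.0" 200 2326',
    'not a log line',
]
streams = clickstreams(sample)
```

Each value in `streams` is one visitor's sequence of page references in time order, the raw material that usage mining works with.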

Page 22: Chapter 1 Introduction

Figure 1.3 High level web usage mining process (Courtesy of Srivastava et al., 2000)

Page 23: Chapter 1 Introduction

Web Usage Mining
  • Logs can be observed from two angles:
      • Server: to improve the design of a website
      • Client: assessing a client’s sequence of clicks
          • Useful for caching of pages
          • Efficient loading of Web pages
  • Helps organizations market their products efficiently on the Web
  • Can supply essential information on how to restructure a website
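
One way a client's click sequence might inform caching is to predict the most likely next page so that it can be fetched in advance. The sketch below assumes a simple first-order model built from page-to-page transition counts. Both the model and the function names are illustrative choices and do not come from the slides.

```python
from collections import Counter, defaultdict


def transition_counts(clickstreams):
    """Count how often each page is followed directly by each other page."""
    counts = defaultdict(Counter)
    for stream in clickstreams:
        for current, following in zip(stream, stream[1:]):
            counts[current][following] += 1
    return counts


def likely_next(counts, page):
    """Return the page most often requested right after `page`, or None if unseen."""
    followers = counts.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical click sequences, one list per visitor session.
observed = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
]
counts = transition_counts(observed)
candidate = likely_next(counts, "/home")
```

Here `candidate` is the page a cache could prefetch when a visitor lands on `/home`. A page that never had a successor in the observed data yields `None`.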

Page 24: Chapter 1 Introduction

Applications of Web Usage Mining

Figure 1.4 Applications of web usage mining (Courtesy of O. Romanko, 2002; Courtesy of Srivastava et al., 2000)

Page 25: Chapter 1 Introduction

Web Content Mining

Page 26: Chapter 1 Introduction

Web Content Mining
  • Text mining
      • Traditional information retrieval
      • Semantic Web
  • Multimedia
      • Images
      • Audio
      • Video
  • Web crawlers

Page 27: Chapter 1 Introduction

Figure 1.5 Architecture of a search engine (Courtesy of O. Romanko, 2002)

Page 28: Chapter 1 Introduction

Web Structure Mining

Page 29: Chapter 1 Introduction

Web Structure Mining
  • Finding the model underlying the link structures of the Web
  • Classifying web pages
  • Identifying similarity and relationships between various websites

Page 30: Chapter 1 Introduction

Web Structure Mining
  • Algorithms to model web topology
      • PageRank
      • HITS
      • CLEVER
  • Primarily useful as a technique for computing the rank of every web page
  • Assumption: if one web page points to another web page, then the former is endorsing the significance of the latter
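
The link-as-endorsement assumption can be illustrated with a simplified PageRank-style power iteration. This is a sketch under stated assumptions and not the production algorithm of any search engine. The damping factor of 0.85, the fixed iteration count, and the even redistribution of rank from pages without outgoing links are common conventions assumed here. The function name and the sample graph are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page ranks from a {page: [pages it links to]} dictionary."""
    # Include pages that appear only as link targets.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank to the pages it endorses by linking.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumed convention: a page with no outgoing links
                # spreads its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: a and b both link to c, and c links back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this small graph, page `c` is linked to by two pages and ends up with the highest rank, while page `b`, which nothing links to, ends up with the lowest.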

Page 31: Chapter 1 Introduction

Why Web Intelligence?

Page 32: Chapter 1 Introduction

Build Better Web Sites Using Intelligent Technologies
  • Better keyword and key-phrase based search
  • Multimedia information retrieval using Web content mining
  • Analyze shopping trends using data mining
  • Improve access to a website by studying Web usage
  • Improved structure using Web structure mining

Page 33: Chapter 1 Introduction

Benefits of Intelligent Web
  • Matching existing resources to a visitor’s interests
  • Boost the value of visitors
  • Enhance the visitor’s experience on the web site
  • Achieve targeted resource management
  • Test the significance of content and web site architecture