Semantic collaborative web caching

Jean-Marc Pierson, Lionel Brunie, David Coquil
LISI, INSA de LYON
Jean-Marc.Pierson@insa-lyon.fr


Page 1: Semantic collaborative web caching

Semantic collaborative web caching

Jean-Marc Pierson, Lionel Brunie, David Coquil

LISI, INSA de LYON
Jean-Marc.Pierson@insa-lyon.fr

Page 2: Semantic collaborative web caching

Outline

► Motivations and proxies
► Document indexation
► Temperature of documents
► Collaboration schema and architecture
► Results, evaluation and discussion
► Conclusion

Page 3: Semantic collaborative web caching

Sharing information / sharing usage

► Information is disseminated
► The volume of information is huge

How do I find my way in the jungle of the IS (information system)?

► Many possible solutions: search engines, agents, ontologies...
► A solution to be explored: help from, and collaboration with, other users

Page 4: Semantic collaborative web caching

Making users share usages

► ... is an issue that has been addressed for a long time: proxies

[Figure: users, proxy, server]

Page 5: Semantic collaborative web caching

Proxies

► Proxies allow:
    - reducing the response time
    - reducing the server load
    - reducing the network load
► Proxies can be located close to the server and/or close to the users
► Proxies can collaborate (hierarchical or "flat" collaboration)
► Proxy management policies are based on operational (LRU/MFU-like) information
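As a point of comparison for what follows, here is a minimal sketch of a purely operational policy of the LRU kind mentioned above. The class name `LRUCache` and its methods are illustrative assumptions, not part of the presented system; the point is that the policy only looks at access order and knows nothing about document content.

```python
from collections import OrderedDict


class LRUCache:
    """Illustrative LRU cache: evicts the least recently used URL."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # url -> content, oldest first

    def get(self, url):
        if url not in self._entries:
            return None
        # A hit makes the entry the most recently used one.
        self._entries.move_to_end(url)
        return self._entries[url]

    def put(self, url, content):
        self._entries[url] = content
        self._entries.move_to_end(url)
        if len(self._entries) > self.capacity:
            # Drop the entry at the "oldest" end of the ordering.
            self._entries.popitem(last=False)
```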

Page 6: Semantic collaborative web caching

Motivations

► Users are generally interested in a limited set of topics
► User caches therefore contain related documents
► Metadata, user profiles, virtual communities and hot topics can provide proxies with semantic and contextual information about the queries they have to serve

Page 7: Semantic collaborative web caching

Proposition

Monitoring this semantic and contextual information to:

► optimize proxy management policies and proxy communication policies
► allow users to share usages
► give users a personalized view of the web information space

Page 8: Semantic collaborative web caching

► Proposition: use collaborative proxies to:
    - improve performance (basic)
    - act as forums and mediators that help users share usage information
► Assumptions:
    - proxies do not share raw data but documents, which hold information that can be described by metadata (descriptors)
    - users are neither isolated nor autistic: they share some common interest, experience, objective or behavior (virtual communities)
    - information and topics of interest evolve rapidly: "hot" topics

Page 9: Semantic collaborative web caching

From proxies to adaptive indexes

► The (present + past) content of a proxy de facto provides a view over the global information system
► This view has real added value
► Examples:
    - which teaching materials about Java are the most accessed?
    - is there any news about football?
    - which related documents did people who read this document access afterwards?

Page 10: Semantic collaborative web caching

Document indexation

► Indexing tree: an "ontology" of the web space
► Difficulty: finding one!
► "Yahoo"-like

Page 11: Semantic collaborative web caching

How is the indexation performed?

► The content of the document is analyzed...
    - title
    - meta tags (Content, Keywords, ...)
    - links
    - formatting (headers, bold face, outline)
► ... to extract keywords
► Keywords are analyzed to find related concepts
► A mapping is made from concepts to the ontology
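The keyword extraction step could look roughly like the sketch below, which uses Python's standard `html.parser`. The weights in `SOURCE_WEIGHTS`, the stop-word list and the name `extract_keywords` are assumptions made for illustration; the slides do not say how the different sources are weighted. The later steps (keywords to concepts, concepts to ontology) are not shown.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: a word counts more in a title than in bold text.
SOURCE_WEIGHTS = {"title": 3, "meta": 3, "h1": 2, "h2": 2, "b": 1, "a": 1}
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "about"}


class KeywordExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.scores = Counter()
        self._open = []  # currently open tags listed in SOURCE_WEIGHTS

    def _add(self, text, weight):
        for word in text.lower().replace(",", " ").split():
            word = word.strip(".;:!?()\"'")
            if word and word not in STOP_WORDS:
                self.scores[word] += weight

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if (attrs.get("name") or "").lower() in ("keywords", "description"):
                self._add(attrs.get("content") or "", SOURCE_WEIGHTS["meta"])
        elif tag in SOURCE_WEIGHTS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open:
            # Close the most recently opened occurrence of this tag.
            idx = len(self._open) - 1 - self._open[::-1].index(tag)
            del self._open[idx]

    def handle_data(self, data):
        # Plain body text is ignored; only text inside weighted tags counts.
        if self._open:
            self._add(data, max(SOURCE_WEIGHTS[t] for t in self._open))


def extract_keywords(html, top=5):
    parser = KeywordExtractor()
    parser.feed(html)
    parser.close()
    return [word for word, _ in parser.scores.most_common(top)]
```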

Page 12: Semantic collaborative web caching

Weighted indexing tree

► Edges between concepts (ancestors and children) are weighted
► The weight relates to the probability that a document located under the child node is the next one requested after a document under the parent node has been requested
► It is the "correlation" (in terms of access patterns) between the target node and its siblings

Page 13: Semantic collaborative web caching

Weighted tree

For instance, someone interested in baseball is more likely to be interested in soccer than in skiing (this is open to discussion).
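The slides do not say how the edge weights are obtained. One plausible reading of "probability of being requested next" is a simple transition frequency over a request log, sketched below. The function name `estimate_edge_weights`, the log format and the sample data are all assumptions.

```python
from collections import Counter, defaultdict


def estimate_edge_weights(children, request_log):
    """Estimate (parent, child) weights from consecutive requests.

    children:    dict mapping a parent concept to its child concepts
    request_log: concept names, in the order they were requested
    Returns a dict (parent, child) -> fraction of the requests that
    followed `parent` and went to `child`.
    """
    follow = defaultdict(Counter)
    for previous, current in zip(request_log, request_log[1:]):
        follow[previous][current] += 1

    weights = {}
    for parent, kids in children.items():
        total = sum(follow[parent].values())
        for kid in kids:
            weights[(parent, kid)] = follow[parent][kid] / total if total else 0.0
    return weights


# Hypothetical log: Soccer follows Sports twice, Skiing only once.
tree = {"Sports": ["Soccer", "Skiing"]}
log = ["Sports", "Soccer", "Sports", "Soccer",
       "Sports", "Skiing", "Sports", "News"]
weights = estimate_edge_weights(tree, log)
```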

Page 14: Semantic collaborative web caching

Notion of temperature

► Documents are assigned a temperature related to their "hotness": the more a document is accessed, the higher its temperature
► The cache replacement policy uses the temperature of documents: cooler documents are removed from the cache first; prefetching uses the hottest documents
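A minimal sketch of such a policy, assuming the cache is a dictionary keyed by URL and that temperatures are available in a separate dictionary. The function names and the data layout are illustrative only.

```python
def evict_coolest(cache, temperature, capacity):
    """Remove the coolest documents until the cache fits `capacity`.

    cache:       dict url -> content (modified in place)
    temperature: dict url -> current temperature (missing means 0.0)
    Returns the evicted URLs, coolest first.
    """
    evicted = []
    while len(cache) > capacity:
        coolest = min(cache, key=lambda url: temperature.get(url, 0.0))
        del cache[coolest]
        evicted.append(coolest)
    return evicted


def prefetch_candidates(temperature, cache, n):
    """Return the n hottest known documents that are not cached yet."""
    not_cached = [url for url in temperature if url not in cache]
    return sorted(not_cached, key=lambda url: temperature[url], reverse=True)[:n]
```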

Page 15: Semantic collaborative web caching

Temperature

► Represents the probability that a document will be accessed in the near future
► It is a synthesis of the number of requests for a document in the last time interval and the semantic links represented by the data structure
► A temperature value is also associated with the internal nodes of the data structure

Page 16: Semantic collaborative web caching

Temperature computation

► Temperature computation occurs at regular request intervals
► The number of accesses to each document between two consecutive computations is stored in an access table:
    - if a document has been accessed since the last temperature computation, its temperature increases by the corresponding value in the table, and this value is stored in a stack for future cooling
    - otherwise, its temperature decreases
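The slides do not specify by how much an unaccessed document cools. The sketch below assumes that cooling undoes the most recent stored increase, which is one simple way to use the stack; treat that rule, and the names `DocumentTemperature` and `recompute`, as assumptions.

```python
class DocumentTemperature:
    def __init__(self):
        self.value = 0.0
        self.heat_stack = []  # past increases, kept for future cooling

    def update(self, accesses):
        """Apply one temperature computation and return the variation."""
        if accesses > 0:
            self.heat_stack.append(accesses)
            delta = float(accesses)
        elif self.heat_stack:
            # Assumption: cooling removes the most recent increase.
            delta = -float(self.heat_stack.pop())
        else:
            delta = 0.0
        self.value += delta
        return delta


def recompute(temperatures, access_table):
    """Run one computation over all known documents.

    temperatures: dict doc -> DocumentTemperature (extended in place)
    access_table: dict doc -> accesses since the last computation
    Returns dict doc -> variation, then resets the access table.
    """
    deltas = {}
    for doc in set(temperatures) | set(access_table):
        temp = temperatures.setdefault(doc, DocumentTemperature())
        deltas[doc] = temp.update(access_table.get(doc, 0))
    access_table.clear()
    return deltas
```

The returned variations are what the propagation phases then diffuse through the data structure.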

Page 17: Semantic collaborative web caching

Temperature propagation up the data structure

► The temperature variation (Δ) of each document is diffused along the edges of the data structure
► More precisely, for each (document, concept) pair linked by an edge of weight W, the temperature of the concept increases or decreases by W * Δ
► The temperature variation of the concept may be further diffused to its parent node (subject to a given threshold)

Page 18: Semantic collaborative web caching

Example:

For document 1: +3

Temperature variation for Soccer (from ΔT1): Δs = 3 * 70% = 2.1
Temperature variation for Sports = 2.1 * 40% = 0.84
Temperature variation for Recreation and Sports = 0.84 * 15% = 0.126

[Propagation stops here if the threshold is 0.5]
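The upward phase can be sketched as a walk from the document toward the root, multiplying the variation by each edge weight and stopping once the variation just applied falls below the threshold. The function name, the `parent_edge` layout and the "Root" node with its 10% weight are assumptions; the other weights are those of the example above.

```python
def propagate_up(node, delta, parent_edge, temperature, threshold):
    """Diffuse a document's temperature variation toward the root.

    parent_edge: dict child -> (parent, edge weight)
    temperature: dict node -> temperature (updated in place)
    A variation below `threshold` is applied but not diffused further.
    """
    while node in parent_edge:
        parent, weight = parent_edge[node]
        delta *= weight
        temperature[parent] = temperature.get(parent, 0.0) + delta
        if abs(delta) < threshold:
            break
        node = parent


parent_edge = {
    "Document 1": ("Soccer", 0.70),
    "Soccer": ("Sports", 0.40),
    "Sports": ("Recreation and Sports", 0.15),
    "Recreation and Sports": ("Root", 0.10),  # hypothetical extra level
}
temperature = {}
propagate_up("Document 1", 3.0, parent_edge, temperature, threshold=0.5)
# Soccer, Sports and Recreation and Sports receive about 2.1, 0.84 and 0.126;
# "Root" is never reached because 0.126 is below the threshold.
```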

Page 19: Semantic collaborative web caching

Temperature retropropagation down the data structure

► Temperature is diffused from concepts down to documents
► Each document under a concept whose temperature has been modified sees its own temperature modified
► Even "non-accessed" documents might see their temperature increase

Page 20: Semantic collaborative web caching

Example:

Temperature variation for the Games concept = +0.126 * 15% = 0.0189
Temperature variation for Baseball = 0.84 * 40% = 0.336
Temperature variation for Document 2 = 2.1 * 50% = 1.05
Temperature variation for Document 3 = 2.1 * 60% = 1.26

In fact, there is one upward phase for all documents, then one downward phase for all concepts.

[Figure: indexing tree annotated with the upward variations +2.1, 0.84 and 0.126]
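A sketch of one downward step under the following assumptions: each concept passes the variation it received in the upward phase to its children, scaled by the edge weight, and nodes already updated on the way up are skipped. Whether the diffusion recurses further down (for example from Baseball to its documents) is not stated in the slides, so only one level is shown. The function name and data layout are illustrative.

```python
def propagate_down(concept_deltas, children, temperature, already_updated):
    """Diffuse concept variations one level down the tree.

    concept_deltas:  dict concept -> variation received going up
    children:        dict concept -> list of (child, edge weight)
    temperature:     dict node -> temperature (updated in place)
    already_updated: nodes that received their variation going up
    Returns dict node -> variation received in this downward step.
    """
    received = {}
    for concept, delta in concept_deltas.items():
        for child, weight in children.get(concept, []):
            if child in already_updated:
                continue
            received[child] = received.get(child, 0.0) + delta * weight
    for node, delta in received.items():
        temperature[node] = temperature.get(node, 0.0) + delta
    return received


upward = {"Soccer": 2.1, "Sports": 0.84, "Recreation and Sports": 0.126}
children = {
    "Recreation and Sports": [("Sports", 0.15), ("Games", 0.15)],
    "Sports": [("Soccer", 0.40), ("Baseball", 0.40)],
    "Soccer": [("Document 1", 0.70), ("Document 2", 0.50), ("Document 3", 0.60)],
}
on_upward_path = {"Document 1", "Soccer", "Sports", "Recreation and Sports"}
temperature = {}
received = propagate_down(upward, children, temperature, on_upward_path)
```

With the weights of the example, Games, Baseball, Document 2 and Document 3 receive approximately 0.0189, 0.336, 1.05 and 1.26.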

Page 21: Semantic collaborative web caching

Document-concept link (precision)

► When a document is related to two concepts, we duplicate its node and link each of the two resulting nodes to one of the two related concepts
► Otherwise, with a single node, temperature variations would propagate between unrelated documents (by rebound)
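The duplication can be pictured as creating one leaf per (document, concept) pair, so that each leaf has exactly one parent. The helper `link_document`, the node naming scheme and the sample concepts and weights below are hypothetical.

```python
def link_document(doc_id, related_concepts, parent_edge):
    """Create one leaf node per related concept.

    related_concepts: list of (concept, edge weight)
    parent_edge:      dict child -> (parent, edge weight), updated in place
    Returns the created leaf nodes.
    """
    nodes = []
    for concept, weight in related_concepts:
        node = (doc_id, concept)  # one copy of the document per concept
        parent_edge[node] = (concept, weight)
        nodes.append(node)
    return nodes


parent_edge = {}
leaves = link_document(
    "doc7", [("Soccer", 0.7), ("Television", 0.3)], parent_edge
)
```

Because each copy has a single parent, a variation arriving from one concept cannot bounce through the document into the other concept's subtree.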

Page 22: Semantic collaborative web caching

A distributed collaborative architecture

Page 23: Semantic collaborative web caching

Proxy architecture

[Figure: proxy components: Index, Query processing, Server/proxy connection, Profile, Cache, Client connection, Temperature]

Page 24: Semantic collaborative web caching

Navigator cache vs. user proxy

► Navigator "local caches" are basic and cannot communicate
► Implementing true communicating proxies at the navigator/user level allows:
    - reducing the intermediate proxy load
    - optimizing the network traffic
    - reducing the response time
    - managing the user profile
    - counting document hits
    - customizing semantic and contextual information

Page 25: Semantic collaborative web caching

From proxies to virtual communities

► User profile: topics of interest
► Virtual community = users with similar profiles
► Virtual communities could be used for:
    - monitoring document usage
    - associating proxies with specific communities
    - providing users with pertinent information about the content of proxy caches
    - monitoring the evolution of the topics of interest
    - sharing experiences and optimizing queries
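"Similar profile" is not defined in the slides. As an illustration, a profile can be represented as a topic-to-interest mapping and compared with cosine similarity. The measure, the 0.8 cut-off and the sample users are assumptions.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two topic -> interest mappings."""
    dot = sum(p[t] * q[t] for t in p.keys() & q.keys())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0


def similar_users(profiles, user, min_similarity=0.8):
    """Return the users whose profile is close enough to `user`'s."""
    return sorted(
        other
        for other, profile in profiles.items()
        if other != user
        and cosine_similarity(profiles[user], profile) >= min_similarity
    )


profiles = {
    "alice": {"Soccer": 3, "Baseball": 1},
    "bob": {"Soccer": 2, "Baseball": 1},
    "carol": {"Skiing": 4},
}
community = similar_users(profiles, "alice")
```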

Page 26: Semantic collaborative web caching

Collaboration and communities

► Subscription: manual and static at first, evolving toward dynamic and automatic
► Relationships between the user proxy and the aggregate proxies in charge of the community:
    - finding a requested document in another user's proxy
    - seeing the most accessed documents in the community
► The proxy organization must reflect the community structure and usages
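The two relationships listed above can be sketched with a small directory kept by the aggregate proxy. The class `AggregateProxy` and its methods are hypothetical and stand in for whatever protocol the real proxies use.

```python
from collections import Counter, defaultdict


class AggregateProxy:
    """Illustrative community directory held by an aggregate proxy."""

    def __init__(self):
        self.holders = defaultdict(set)  # url -> user proxies caching it
        self.hits = Counter()            # url -> accesses in the community

    def report(self, user_proxy, url):
        """A user proxy announces that it fetched and cached `url`."""
        self.holders[url].add(user_proxy)
        self.hits[url] += 1

    def locate(self, url, asking_proxy):
        """Return the other user proxies that hold `url`."""
        return sorted(self.holders.get(url, set()) - {asking_proxy})

    def most_accessed(self, n):
        """Return the n most accessed documents in the community."""
        return [url for url, _ in self.hits.most_common(n)]
```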

Page 27: Semantic collaborative web caching

Prototype

► Java
► Indexation tree limited to 2 or 3 levels of Yahoo!
► Matching is done only with keywords (present or not in the indexing tree), not with concepts
► Interfaced with ThoughtTreasure (a French-English WordNet-like resource) for keywords not in the indexing tree

Page 28: Semantic collaborative web caching

Evaluation

► The temperature notion has already proved efficient (in terms of hit rate) for caching video archives
► Small-scale experiments showed the proxy-web architecture to be robust
► Indexation works well (more than 90% of documents indexed)
► Difficulties stem from the need to handle the contents of web pages in order to test the behavior

Page 29: Semantic collaborative web caching

Conclusion

► Enhancing the integration of distributed information systems or servers into a global service by means of collaborative proxies
► Management and collaboration based on semantic and contextual information: temperature
► Performance improvement
► Virtual communities
► Attachment of a proxy to each user

Page 30: Semantic collaborative web caching

Future work

► Test the prototype on a large scale: design a test platform!
► Push the intermediate cache management to the heart of the networks (active routers)
► Enhance the indexation algorithm
► Apply the technology to Grid computing (cache management)