1
Supporting End Users in the Creation of Dependable Web Clips
Sandeep Lingam, Sebastian Elbaum
Proceedings of the 16th international conference on World Wide Web (WWW2007)
Reporter: Shih-Feng Yang
2007/7/2
2
Outline
Introduction
Web Clipper
Evaluation
Conclusion
3
Introduction
Web authoring environments have enabled end-users who are non-programmers to design and quickly construct web pages.
Web clip: a component within the end-user's website that dynamically extracts information from other web sources.
4
Introduction
Web Clip
5
Introduction
Goal
Web Clipper: an approach to support end-users through the entire process of creating a dependable web clip.
Three fundamental aspects:
1. The tool is embedded in the interface of the web authoring tool.
2. Training: increases the robustness of the web clip.
3. Multiple filters are deployed to increase confidence in the correctness of the retrieved information.
6
Introduction
Challenges
End-users cannot be expected to have any programming experience with web clips.
The content within the target site of a web clip will change.
7
Web Clipper
Approach Overview
8
Web Clipper-Clipping
Target Clip Selection
Clipping takes place in a custom browser. Every extractable document element is highlighted as the user moves the mouse over it, and the user makes a selection by clicking on it.
Extraction Pattern
Once a selection is made, an extraction pattern is generated. During the clipping process, the user's selection is uniquely identified by its HTML-Path.
HTML-Path: a specialized XPath expression.
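The slides only say that an HTML-Path is a specialized XPath expression. The sketch below assumes the simplest form of such a path, a positional one (tag name plus 1-based index among same-tag siblings), and uses Python's `xml.etree.ElementTree`, so the page is assumed to be well-formed XHTML. `html_path` and `resolve` are hypothetical helper names, not the tool's API.

```python
import xml.etree.ElementTree as ET


def html_path(root: ET.Element, target: ET.Element) -> str:
    """Return a positional XPath-like path such as /html/body[1]/div[2]/p[2]."""
    # ElementTree elements have no parent pointer, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps, node = [], target
    while node is not root:
        parent = parent_of[node]
        # XPath positions are 1-based and count only siblings with the same tag.
        same_tag = [c for c in parent if c.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    return "/" + "/".join([root.tag] + list(reversed(steps)))


def resolve(root: ET.Element, path: str):
    """Evaluate a path produced by html_path() against a (possibly newer) copy of the page."""
    prefix = "/" + root.tag
    if not (path == prefix or path.startswith(prefix + "/")):
        raise ValueError(f"path does not start at <{root.tag}>")
    # ElementTree's find() is relative to the root element, so drop the root step.
    return root.find("." + path[len(prefix):])


page = ET.fromstring(
    "<html><body>"
    "<div><p>Header</p></div>"
    "<div><p>Weather</p><p>72F</p></div>"
    "</body></html>"
)
selected = page.findall("./body/div")[1].findall("p")[1]  # the "72F" paragraph
path = html_path(page, selected)
print(path)                      # /html/body[1]/div[2]/p[2]
print(resolve(page, path).text)  # 72F
```

A purely positional path is easy to generate from a click, but it depends on the exact page layout, which is one reason the training step builds additional patterns.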
9
Web Clipper-Clipping
10
Web Clipper-Training
To increase the robustness of the web clip, the system constructs several extraction patterns, each of which uniquely characterizes the end-user's selection.
Several clips are then created, each using a different extraction pattern.
Every time the user marks a clipping as valid, the system generates a filter corresponding to that clipping. Filter: JavaScript code embedded within the user's web page.
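The slides do not list which kinds of extraction patterns are built, so the three used here are assumptions for illustration: a positional path, a pattern anchored on the element's `id` attribute, and a pattern anchored on the text of the preceding sibling (a "label"). The sketch keeps only those patterns that reproduce the user's selection on the training page; `candidate_patterns` is a hypothetical name, and filter generation is not shown.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Optional

# Hypothetical pattern type: a function that locates the clipped element on a page.
Pattern = Callable[[ET.Element], Optional[ET.Element]]


def candidate_patterns(root: ET.Element, target: ET.Element) -> Dict[str, Pattern]:
    if target is root:
        raise ValueError("select an element below the document root")
    parent_of = {child: parent for parent in root.iter() for child in parent}
    patterns: Dict[str, Pattern] = {}

    # 1. Positional pattern: tag name plus 1-based index among same-tag siblings.
    steps, node = [], target
    while node is not root:
        parent = parent_of[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    path = "./" + "/".join(reversed(steps))
    patterns["path"] = lambda page: page.find(path)

    # 2. Attribute pattern: only available when the selection carries an id.
    ident = target.get("id")
    if ident:
        patterns["id"] = lambda page: page.find(f".//*[@id='{ident}']")

    # 3. Label pattern: the text of the preceding sibling serves as a label.
    siblings = list(parent_of[target])
    pos = siblings.index(target)
    label = (siblings[pos - 1].text or "").strip() if pos > 0 else ""
    if label:
        def by_label(page: ET.Element) -> Optional[ET.Element]:
            for parent in page.iter():
                kids = list(parent)
                for prev, nxt in zip(kids, kids[1:]):
                    if (prev.text or "").strip() == label:
                        return nxt
            return None
        patterns["label"] = by_label

    # Keep only patterns that reproduce the user's selection on the training page.
    return {name: p for name, p in patterns.items() if p(root) is target}


page = ET.fromstring(
    "<html><body><div>"
    "<span>Temperature:</span><span id='temp'>72F</span>"
    "</div></body></html>"
)
selected = page.find(".//*[@id='temp']")
patterns = candidate_patterns(page, selected)
print(sorted(patterns))  # ['id', 'label', 'path']
```

Because each pattern relies on a different property of the page, a later change to the site is less likely to break all of them at once.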
11
Web Clipper-Training
Validation of the extraction patterns presented by the system.
12
Web Clipper-Training
Extraction Patterns
13
Web Clipper-Training
14
Web Clipper-Deployment
The URL and extraction patterns of the clipped content are used to generate an AJAX script.
HTML documents are converted to XHTML.
Relative URLs are converted to absolute URLs.
Filters are generated from pre-defined templates for each of the extraction patterns validated during training.
The user can move, resize, or annotate the web clip to suit her preference.
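In the tool itself this step happens in the generated AJAX (JavaScript) code; the Python sketch below only illustrates the relative-to-absolute URL conversion, using the standard `urllib.parse.urljoin`. Which attributes get rewritten (`href` and `src` here) is an assumption, and `absolutize_links` and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def absolutize_links(clip: ET.Element, page_url: str) -> ET.Element:
    """Rewrite href/src attributes in a clipped subtree so they resolve against the source page."""
    for el in clip.iter():
        for attr in ("href", "src"):
            value = el.get(attr)
            if value is not None:
                # urljoin leaves already-absolute URLs unchanged.
                el.set(attr, urljoin(page_url, value))
    return clip


clip = ET.fromstring(
    '<div><a href="/news/today.html">Today</a>'
    '<img src="img/logo.png"/>'
    '<a href="https://example.org/x">X</a></div>'
)
absolutize_links(clip, "https://example.com/weather/index.html")
print([a.get("href") for a in clip.findall("a")])
# ['https://example.com/news/today.html', 'https://example.org/x']
print(clip.find("img").get("src"))  # https://example.com/weather/img/logo.png
```

Without this rewrite, links inside the clip would resolve against the end-user's own site once the clip is embedded there.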
15
Web Clipper-Filtering and Assessment
The content that the user wants to see in the web clip
16
Web Clipper-Filtering and Assessment
17
Web Clipper-Filtering and Assessment
18
Web Clipper-Filtering and Assessment
The paper then defines the confidence score as the ratio of the maximum filter score to the number of valid extraction patterns generated during the training session.
The prototype alerts the user when the content within the target site changes.
The user can also configure the web clips to provide alerts when the confidence score falls below a particular threshold.
19
Web Clipper-Filtering and Assessment
The label filter has the highest score, so the system uses this pattern to extract content, and the confidence score is 2/3 ≈ 67%.
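The confidence computation can be sketched as follows. The slides do not spell out how a filter score is computed, so this sketch assumes that a pattern's score is the number of filters that accept the content it extracts, and that there is one filter per valid pattern, so the number of filters equals the number of valid extraction patterns. The three filters, their names, the page contents, and the 0.5 alert threshold are all hypothetical, chosen so that the numbers match the 2/3 example above.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical filter type: accepts or rejects a piece of extracted content.
Filter = Callable[[Optional[str]], bool]


def assess(extracted: Dict[str, Optional[str]],
           filters: Dict[str, Filter],
           threshold: float = 0.5) -> Tuple[str, float, bool]:
    """Pick the best extraction pattern and compute the confidence score."""
    # Assumed filter score: how many filters accept the content a pattern extracted.
    scores = {name: sum(f(content) for f in filters.values())
              for name, content in extracted.items()}
    best = max(scores, key=scores.get)
    # Assumed one filter per valid pattern from training, so len(filters) is the
    # number of valid extraction patterns in the confidence ratio.
    confidence = scores[best] / len(filters)
    return best, confidence, confidence < threshold


# Three hypothetical filters, one per validated pattern; the training clip was "72F".
filters = {
    "not_empty": lambda c: c is not None and c.strip() != "",
    "temperature_format": lambda c: c is not None and re.fullmatch(r"-?\d+F", c) is not None,
    "same_as_training": lambda c: c == "72F",
}
# Content each pattern extracts after the target page has changed.
extracted = {"path": "Humidity: 40%", "id": None, "label": "75F"}

best, confidence, alert = assess(extracted, filters)
print(best, round(confidence, 2), alert)  # label 0.67 False
```

Here the label pattern passes two of the three filters, the path pattern passes one, and the id pattern passes none, so the label pattern is used and the confidence is 2/3, which is above the assumed alert threshold.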
20
Web Clipper-Filtering and Assessment
Alert the user when the content within the target site changes
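The slides do not say how the prototype detects that the target content has changed. One simple way, sketched here purely as an assumption, is to remember a digest of the last extracted content and compare each new extraction against it. `ChangeMonitor` is a hypothetical name.

```python
import hashlib
from typing import Optional


class ChangeMonitor:
    """Remembers a digest of the last clipped content and reports when it differs."""

    def __init__(self) -> None:
        self._last: Optional[str] = None

    def check(self, content: str) -> bool:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        # No alert on the very first observation: there is nothing to compare against.
        changed = self._last is not None and digest != self._last
        self._last = digest
        return changed


monitor = ChangeMonitor()
print(monitor.check("72F"))  # False
print(monitor.check("72F"))  # False
print(monitor.check("75F"))  # True
```

Storing only a digest keeps the stored state small regardless of how large the clipped content is.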
21
Evaluation
Effectiveness of the extraction patterns used in generating web clips.
Dependability of web clips in providing sufficiently correct information over time.
Robustness of web clips to changes in the clipped web site.
22
Evaluation Effectiveness of extraction patterns
23
Evaluation Dependability of web clips
confidence scores
24
Evaluation Robustness
This experiment tests how robust the web clips are to the following changes in the target site:
1. Block Insertion
2. Block Movement
3. Block Deletion
4. Enclosing Element Changes
5. Target Clipping Removed
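The first change type, block insertion, can be illustrated with a small hypothetical page. The sketch compares a positional pattern recorded on the training page with a pattern anchored on the enclosing element's `id`. The pages, pattern strings, and `extract` helper are assumptions for illustration and do not reproduce the paper's experiment or results.

```python
import xml.etree.ElementTree as ET

TRAINING = "<html><body><div id='weather'><p>72F</p></div></body></html>"
# Block insertion: a new <div> is added before the clipped block.
CHANGED = ("<html><body><div id='ad'><p>Sale!</p></div>"
           "<div id='weather'><p>75F</p></div></body></html>")

PATTERNS = {
    "path": "./body[1]/div[1]/p[1]",  # positional, recorded on the training page
    "id": ".//*[@id='weather']/p",    # anchored on the enclosing element's id
}


def extract(xhtml: str, pattern: str):
    """Return the text of the first element matching the pattern, or None."""
    el = ET.fromstring(xhtml).find(pattern)
    return None if el is None else el.text


for name, pattern in PATTERNS.items():
    print(name, extract(TRAINING, pattern), "->", extract(CHANGED, pattern))
# path 72F -> Sale!
# id 72F -> 75F
```

The positional pattern does not fail outright; it silently returns the wrong block, which is the kind of error the filters and confidence score are meant to surface.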
25
Evaluation Robustness
26
Conclusion
This paper presented an approach to support end-users through the entire process of creating a dependable web clip.
Web Clipper addresses the shortcomings of existing tools by introducing the notions of training and of dynamic confidence evaluation.
27
Finish
Thanks for your patience!