Intelligent Database Systems Lab, N.Y.U.S.T. I.M. Corroborate and Learn Facts from the Web. Presenter: Lin, Shu-Han. Authors: Shubin Zhao, Jonathan Betz. SIGKDD (2008).

Corroborate and Learn Facts from the Web

Page 1: Corroborate and Learn Facts from the Web

Corroborate and Learn Facts from the Web

Presenter: Lin, Shu-Han
Authors: Shubin Zhao, Jonathan Betz

SIGKDD (2008)

Page 2: Corroborate and Learn Facts from the Web

Outline

  • Motivation
  • Objective
  • Methodology
  • Experiments
  • Conclusion
  • Comments

Page 3: Corroborate and Learn Facts from the Web

Motivation

Many “facts” about the movie “Independence Day” appear on different websites:

  • Wikipedia
  • Infoplease.com
  • moviefone.com

Page 4: Corroborate and Learn Facts from the Web

Motivation

Combine them: the sources mention the same fact, e.g. the director of the movie, “Roland Emmerich”.

Page 5: Corroborate and Learn Facts from the Web

Objectives

  • Cache the new “facts” (attribute + value)
      - They have the same HTML patterns
  • Then corroborate these new “facts”
      - Check whether other websites also mention these “facts”
  • Learn the fact
      - Good facts: commonly referenced
      - Incorrect facts: mentioned by very few sources

Figure labels: attribute, value
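
The corroboration objective above can be sketched as a simple counting rule. This is a minimal illustration, not the authors' implementation: the function name `corroborate`, the `min_sources` cutoff, and the sample data are all assumptions made for the example. A candidate fact is kept only if enough distinct websites mention it.

```python
from collections import defaultdict


def corroborate(mentions, min_sources=2):
    """Keep facts that are mentioned by at least `min_sources` distinct sites.

    `mentions` is an iterable of (site, entity, attribute, value) tuples.
    `min_sources` is a hypothetical cutoff, not a value taken from the slides.
    Returns a dict mapping each accepted fact to its number of supporting sites.
    """
    sites_per_fact = defaultdict(set)
    for site, entity, attribute, value in mentions:
        # Normalize case and whitespace so trivially different spellings agree.
        key = (entity.strip().lower(), attribute.strip().lower(), value.strip().lower())
        sites_per_fact[key].add(site)
    return {
        fact: len(sites)
        for fact, sites in sites_per_fact.items()
        if len(sites) >= min_sources
    }


# Made-up sample data, for illustration only.
mentions = [
    ("wikipedia", "Independence Day", "director", "Roland Emmerich"),
    ("infoplease", "Independence Day", "director", "Roland Emmerich"),
    ("moviefone", "Independence Day", "Director", "roland emmerich"),
    ("some-blog", "Independence Day", "director", "Someone Else"),
]
accepted = corroborate(mentions)
```

Here the commonly referenced value is kept with three supporting sites, while the value mentioned by a single site is dropped.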

Page 6: Corroborate and Learn Facts from the Web

Methodology – Overview

  1. Relevant page
  2. Match
  3. Extract new facts

Figure labels: Wiki, seed set; search engine

Page 7: Corroborate and Learn Facts from the Web

Methodology – Corroborate fact – Common fact

A common fact: “Susan”, gender: female

Threshold:

Match

Page 8: Corroborate and Learn Facts from the Web

Methodology – Extract new facts

Cache “repeated HTML patterns”

Figure label: 3. Extract new facts
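
One way to picture “repeated HTML patterns” is sketched below. It is an assumption-laden illustration, not the method from the paper: the helper names `learn_pattern` and `extract_new_facts`, the regex-based approach, and the sample HTML are all made up for this example. The markup that surrounds a known attribute/value pair is turned into a pattern, and the same pattern is then applied to the rest of the page to pick up new pairs.

```python
import re


def learn_pattern(html, attribute, value):
    """Build a regex from the tags around a known attribute/value pair.

    Hypothetical helper: uses the single tag before the attribute, the markup
    between attribute and value, and the single tag after the value.
    """
    m = re.search(re.escape(attribute) + r"(.*?)" + re.escape(value), html, re.S)
    if m is None:
        return None
    before = re.search(r"<[^<>]+>\s*$", html[:m.start()])
    after = re.match(r"\s*<[^<>]+>", html[m.end():])
    if before is None or after is None:
        return None
    return re.compile(
        re.escape(before.group(0))
        + r"([^<>]+)"
        + re.escape(m.group(1))
        + r"([^<>]+)"
        + re.escape(after.group(0))
    )


def extract_new_facts(html, attribute, value):
    """Return (attribute, value) pairs that repeat the known pair's HTML pattern."""
    pattern = learn_pattern(html, attribute, value)
    if pattern is None:
        return []
    pairs = [(a.strip(), v.strip()) for a, v in pattern.findall(html)]
    return [pair for pair in pairs if pair != (attribute, value)]


# Made-up page fragment, for illustration only.
html = (
    "<table>"
    "<tr><td>Director</td><td>Roland Emmerich</td></tr>"
    "<tr><td>Release year</td><td>1996</td></tr>"
    "<tr><td>Running time</td><td>145 minutes</td></tr>"
    "</table>"
)
new_facts = extract_new_facts(html, "Director", "Roland Emmerich")
```

Regular expressions keep the sketch short; a real page would likely need an HTML parser and a more tolerant notion of "same pattern".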

Page 9: Corroborate and Learn Facts from the Web

Experiments

Page 10: Corroborate and Learn Facts from the Web

Experiments

Page 11: Corroborate and Learn Facts from the Web

Conclusions

  • Find relevant pages about entities
  • Extract new facts by corroborating existing facts
  • Based on string matching and HTML pattern discovery

Page 12: Corroborate and Learn Facts from the Web

Comments

  • Advantages
      - The idea is intuitive
      - Language independent
      - Searches and integrates information/data on the web
  • Drawbacks
      - Can only adapt to old entities
      - Much information is hidden in the text of articles, not only in tables
  • Application
      - It cannot be used to extract comments or new information, such as comments about food in a blog

Page 13: Corroborate and Learn Facts from the Web

Edit distance
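
The slide only names the technique, so the following is a generic sketch rather than the authors' code. It assumes "edit distance" means the standard Levenshtein distance; the `is_match` helper and its `max_distance` threshold are hypothetical additions showing how such a distance could drive a tolerant string match.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn `a` into `b`."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ca
                    current[j - 1] + 1,      # insert cb
                    previous[j - 1] + cost,  # substitute, or keep if equal
                )
            )
        previous = current
    return previous[-1]


def is_match(a, b, max_distance=2):
    """Hypothetical matcher: strings match if their case-insensitive
    edit distance is at most `max_distance` (an assumed threshold)."""
    return edit_distance(a.lower(), b.lower()) <= max_distance


distance = edit_distance("kitten", "sitting")  # 3
close = is_match("Roland Emmerich", "roland emerich")
far = is_match("Roland Emmerich", "Will Smith")
```

A fixed absolute threshold is the simplest choice; dividing the distance by the length of the longer string would be a natural variant for values of very different lengths.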