Never-Ending Language Learning for Vietnamese Student: Phạm Xuân Khoái Instructor: PhD Lê Hồng Phương Coupled SEAL


Never-Ending Language Learning for Vietnamese

Student: Phạm Xuân Khoái

Instructor: PhD Lê Hồng Phương

Coupled SEAL


Main content

1. Introduction

2. Concepts

3. How it works


1. Introduction

• SEAL (Set Expander for Any Language) is a set expansion system that accepts input elements (seeds) of some target set S and automatically finds other probable elements of S in semi-structured documents such as web pages.

• CSEAL (Coupled SEAL) is a SEAL system extended with two kinds of constraints:

• mutual-exclusion constraints

• type-checking constraints
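To make the type-checking constraint concrete, here is a minimal sketch. It assumes that each relation declares a category for each argument and that a candidate pair is kept only if every argument is already a member of its declared category. The function name, the relation, and the sample data are hypothetical and are not taken from the CSEAL implementation.

```python
def passes_type_check(candidate, arg_types, category_members):
    """Return True if every argument belongs to its declared category.

    candidate: tuple of argument strings, e.g. ("Hanoi", "Vietnam")
    arg_types: tuple of category names, one per argument (assumed schema)
    category_members: dict mapping category name -> set of known instances
    """
    return all(
        arg in category_members.get(arg_type, set())
        for arg, arg_type in zip(candidate, arg_types)
    )


# Hypothetical relation cityInCountry(city, country) and toy beliefs.
members = {"city": {"Hanoi", "Hue"}, "country": {"Vietnam"}}
schema = ("city", "country")

ok = passes_type_check(("Hanoi", "Vietnam"), schema, members)
swapped = passes_type_check(("Vietnam", "Hanoi"), schema, members)
```

In this toy run the pair in the declared order passes, while the swapped pair is rejected because "Vietnam" is not a known city.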


1. Introduction

• Coupled SEAL: a semi-structured extractor

• SEAL: uses a wrapper induction algorithm

• Queries the Internet with sets of beliefs from each category or relation, and mines lists and tables for instances

• Uses mutual-exclusion relationships to provide negative examples for filtering out overly general lists and tables

• Issues 5 queries per category and 10 queries per relation, fetching 50 web pages per query

• Ranks candidates by probabilities assigned in the same way as in CPL
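The filtering and ranking steps above can be sketched as follows. This is an illustration under stated assumptions, not the CSEAL code: a wrapper is discarded if it extracts any instance already believed to belong to a mutually exclusive category, and each remaining candidate is scored with the heuristic 1 − 0.5^c, where c is the number of unfiltered wrappers that extract it. The scoring formula is assumed to be the one used for CPL; the wrapper names and instances are made up.

```python
from collections import Counter


def filter_wrappers(extractions, negatives):
    """Drop wrappers whose extracted set contains a known negative example."""
    return {w: items for w, items in extractions.items() if not (items & negatives)}


def score_candidates(extractions, known):
    """Score each new instance by 1 - 0.5**c (assumed CPL-style heuristic).

    c is the number of wrappers in `extractions` that extract the instance.
    """
    counts = Counter(item for items in extractions.values() for item in items)
    return {item: 1 - 0.5 ** c for item, c in counts.items() if item not in known}


# Toy data: beliefs for "city" and for a mutually exclusive category "country".
city_beliefs = {"Hanoi", "Hue"}
country_beliefs = {"Vietnam", "Laos"}

extractions = {
    "wrapper_1": {"Hanoi", "Hue", "Da Nang"},
    "wrapper_2": {"Hanoi", "Da Nang", "Hai Phong"},
    "wrapper_3": {"Hanoi", "Vietnam", "Laos", "Paris"},  # overly general list
}

kept = filter_wrappers(extractions, negatives=country_beliefs)
scores = score_candidates(kept, known=city_beliefs)
```

Here `wrapper_3` is filtered out because it extracts known countries, so "Paris" never becomes a candidate city; "Da Nang" is supported by two wrappers and "Hai Phong" by one.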


1. Introduction

(Figure: Beliefs → CSEAL → New candidate facts, with CSEAL drawing on the Internet.)


1. Introduction

(Figure: architecture diagram. Labels: Knowledge Base (Beliefs, Candidate facts); Knowledge Integrator; Subsystem Components (CPL, CSEAL, CMC, RL); Data Resources.)


Example


2. Concepts

• Seed: an input element of the target set

• Wrapper: defined by two character strings, which specify the left context and right context necessary for an entity to be extracted from a page. These strings are chosen to satisfy two conditions:

• They are maximally long contexts

• They bracket at least one occurrence of every seed string on the page
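The two conditions above can be illustrated with a small brute-force sketch. It tries every way of picking one occurrence per seed, takes the longest common suffix of the left contexts and the longest common prefix of the right contexts, and then extracts every string bracketed by that pair. This is a simplification for tiny pages, not the algorithm used by SEAL itself; the function names and the sample page are hypothetical.

```python
import itertools
import os
import re


def common_prefix(strings):
    """Longest common prefix of a list of strings (character-wise)."""
    return os.path.commonprefix(list(strings))


def common_suffix(strings):
    """Longest common suffix, computed on the reversed strings."""
    return common_prefix([s[::-1] for s in strings])[::-1]


def induce_wrappers(page, seeds):
    """Return (left, right) pairs bracketing one occurrence of every seed."""
    occurrences = []
    for seed in seeds:
        spans = [(m.start(), m.end()) for m in re.finditer(re.escape(seed), page)]
        if not spans:
            return set()  # a seed is missing, so no wrapper exists for this page
        occurrences.append(spans)
    wrappers = set()
    # Brute force: one occurrence per seed, every combination.
    for combo in itertools.product(*occurrences):
        left = common_suffix([page[:start] for start, _ in combo])
        right = common_prefix([page[end:] for _, end in combo])
        if left and right:
            wrappers.add((left, right))
    return wrappers


def extract(page, wrapper):
    """Extract every string found between the wrapper's left and right contexts."""
    left, right = wrapper
    # Lookarounds let neighbouring items share context characters.
    pattern = "(?<=" + re.escape(left) + ")(.+?)(?=" + re.escape(right) + ")"
    return set(re.findall(pattern, page))


page = (
    "<td class=car>Honda</td>\n"
    "<td class=car>Toyota</td>\n"
    "<td class=car>Nissan</td>\n"
    "<td class=car>Mazda</td>\n"
    "<td class=ad>Buy now</td>\n"
)
wrappers = induce_wrappers(page, ["Honda", "Toyota"])
found = set().union(*(extract(page, w) for w in wrappers))
```

On this page the single induced wrapper recovers "Nissan" in addition to the two seeds. "Mazda" is missed because its right context differs from the maximally long one shared by the seeds, which shows how strict the maximal-length condition is when only two seeds are available.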


Example


3. How it works



References

• Toward an Architecture for Never-Ending Language Learning (http://www.cs.cmu.edu/~acarlson/papers/carlson-aaai10.pdf)

• Language-Independent Set Expansion of Named Entities using the Web (http://www.cs.cmu.edu/~wcohen/postscript/icdm-2007.pdf)

• Coupled Semi-Supervised Learning for Information Extraction (http://www.cs.cmu.edu/~rcwang/papers/wsdm-2010.pdf)

• Character-level Analysis of Semi-Structured Documents for Set Expansion (https://www.cs.cmu.edu/~rcwang/papers/emnlp-2009.pdf)