Knowledge Enabled Information and Services Science Knowledge Acquisition on the Web Growing the amount of available knowledge from within Christopher Thomas

Page 1

Knowledge Enabled Information and Services Science

Knowledge Acquisition on the Web

Growing the amount of available knowledge from within

Christopher Thomas

Page 2

Overview

• Knowledge Representation
  – GlycO – Complex Carbohydrates domain ontology
• Information Extraction
  – Taxonomy creation (Doozer/Taxonom.com)
  – Fact Extraction (Doozer++)
• Validation

Page 3

Circle of knowledge on the Web

Page 4

Goal: Harness the Wisdom of the Crowds to automatically model a domain, verify the model and give the verified knowledge back to the community

Page 5

Circle of knowledge on the Web

What is knowledge?

How do we turn propositions/beliefs into knowledge?

How do we acquire knowledge?

Page 6

Background Knowledge

[15] Christopher Thomas and Amit Sheth, “On the Expressiveness of the Languages for the Semantic Web–Making a Case for ‘A Little More,’” in Fuzzy Logic and the Semantic Web, Eli Sanchez (Ed.), Elsevier, 2006.

[11] Amit Sheth, Cartic Ramakrishnan, and Christopher Thomas, “Semantics for The Semantic Web: the Implicit, the Formal and the Powerful,” International Journal on Semantic Web & Information Systems, 1 (no. 1), 2005, pp. 1–18.

Page 7

Different Angles

• Social construction
  – Large-scale creation of knowledge
  vs.
  – Small communities define their domains
• Normative vs. Descriptive = Top-Down vs. Bottom-Up
• Formal vs. Informal = Machine-readable vs. human-readable

Page 8

Community-created knowledge

• Descriptive
• Bottom-up
• Formally less rigid
• May contain false information
• If a statement in the world is in conflict with the Ontology, both may be wrong or both may be right
• Good for broad, shallow domains
• Good for human processing and IR tasks

Page 9

Wikipedia and Linked Open Data

• Created by large communities
• Constantly growing
• Domains within the linked data are not always easily discernible
• Contain few axioms and restrictions
  – Little value to evaluation using logics

Page 10

Formal - Modeling deep domains

• Prescriptive / Normative
• Top-down
• Contains “true knowledge”
• If a statement in the world is in conflict with the Ontology, the statement is false
• Good for scientific domains
• Good for computational reasoning/inference
• Usually created by small communities of experts
• Usually static, little change is expected

Page 11

Example: GlycO

• Created in collaboration with the Complex Carbohydrate Research Center at the University of Georgia on an NCRR grant.
• Deep modeling of glycan structures and metabolic pathways

[6] Christopher Thomas, Amit P. Sheth, and William S. York, “Modular Ontology Design Using Canonical Building Blocks in the Biochemistry Domain,” in Formal Ontology in Information Systems (FOIS 2006).

[5] Satya S. Sahoo, Christopher Thomas, Amit P. Sheth, William York, and Samir Tartir, “Knowledge Modeling and Its Application in Life Sciences: A Tale of Two Ontologies,” 15th International World Wide Web Conference (WWW2006).

Page 12

GlycO

Page 13

N-Glycosylation metabolic pathway

GNT-I attaches GlcNAc at position 2

UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-$beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-$R2

GNT-V attaches GlcNAc at position 6

UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021

N-acetyl-glucosaminyl_transferase_V
N-glycan_beta_GlcNAc_9
N-glycan_alpha_man_4

Page 14

Glycan Structures for the ontology

• Import structures from heterogeneous databases
• Possible connections modeled in the form of GlycoTree
• Match structures to archetypes

b-D-Manp-(1-6)+
              |
              b-D-Manp-(1-4)-b-D-GlcpNAc-(1-4)-D-GlcNAc
              |
b-D-Manp-(1-3)+

N. Takahashi and K. Kato, Trends in Glycosciences and Glycotechnology, 15, 2003: 235-251

Page 15

Interplay of extraction and evaluation

• Errors in the source databases are propagated through various new databases, so comparing multiple sources fails as a means of error correction
• Less than 2% of incorrect information makes a database useless for automatic validation of hypotheses
• The ontology contains rules on how carbohydrate structures are known to be composed
• By mapping information in databases to the ontology and analyzing how successful the mapping was, we can identify possible errors.

Page 16

Database Verification using GlycO

b-D-Manp-(1-6)+
              |
              a-D-Manp-(1-4)-b-D-GlcpNAc-(1-4)-D-GlcNAc
              |
b-D-Manp-(1-3)+

N. Takahashi and K. Kato, Trends in Glycosciences and Glycotechnology, 15, 2003: 235-251

a-D-Manp-(1-4) is not part of the identified canonical structure for N-Glycans, hence it is likely that the database entry is incorrect
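
The check described on this slide can be sketched in a few lines. This is only an illustration, not GlycO's actual implementation: `CANONICAL_LINKAGES` and `find_suspect_residues` are hypothetical names, the allowed residues are simply read off the archetype structure shown on Page 14, and the sketch ignores tree topology (it only asks whether each linked residue is known at all).

```python
import re

# Hypothetical stand-in for the ontology's canonical N-glycan core,
# read off the archetype structure on Page 14.
CANONICAL_LINKAGES = {
    "b-D-Manp-(1-6)",
    "b-D-Manp-(1-4)",
    "b-D-Manp-(1-3)",
    "b-D-GlcpNAc-(1-4)",
}

# Matches linked residues such as "a-D-Manp-(1-4)".
RESIDUE = re.compile(r"[ab]-D-[A-Za-z]+-\(\d-\d\)")


def find_suspect_residues(entry: str) -> list[str]:
    """Return linked residues in a database entry that the canonical set lacks."""
    return [r for r in RESIDUE.findall(entry) if r not in CANONICAL_LINKAGES]


archetype = "b-D-Manp-(1-6)+ | b-D-Manp-(1-4)-b-D-GlcpNAc-(1-4)-D-GlcNAc | b-D-Manp-(1-3)+"
entry = "b-D-Manp-(1-6)+ | a-D-Manp-(1-4)-b-D-GlcpNAc-(1-4)-D-GlcNAc | b-D-Manp-(1-3)+"

print(find_suspect_residues(archetype))  # []
print(find_suspect_residues(entry))      # ['a-D-Manp-(1-4)']
```

A flagged residue marks the entry as a candidate for review rather than proving it wrong, which matches the slide's wording ("it is likely that the database entry is incorrect").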

Page 17

Pathway Steps - Reaction

Evidence for this reaction from three experiments

Pathway visualization tool by M. Eavenson and M. Janik, LSDIS Lab, Univ. of Georgia

Page 19

Summary - GlycO

• The amount of accuracy and detail that can be found in ontologies such as GlycO could most likely not be acquired automatically
• Only a small community of experts has the depth of knowledge to model such scientific ontologies

Page 20

Summary - GlycO

• However, the automatic population shows that a highly restrictive, expert-created rule set allows for automation or involvement of larger communities.
• Frame-based population of knowledge
• The formal knowledge encoded in the ontology serves to acquire new knowledge
• The circle is completed

Page 21

Summary Background Knowledge

• Large amounts of information and knowledge are available
• Some machine readable by default
• Others need specific algorithms to extract information
• The more available information we can use, the better the extraction of new information will be.

Page 22

Circle of knowledge on the Web

What is knowledge?

How do we turn propositions into knowledge?

Part 2

How do we acquire knowledge?

Page 23

Knowledge Acquisition through Model Creation

[3] Christopher Thomas, Pankaj Mehra, Roger Brooks, and Amit Sheth, “Growing Fields of Interest - Using an Expand and Reduce Strategy for Domain Model Extraction,” Web Intelligence 2008, pp. 496-502.

[2] Christopher Thomas, Wenbo Wang, Delroy Cameron, Pablo Mendes, Pankaj Mehra, and Amit Sheth, “What Goes Around Comes Around - Improving Linked Open Data through On-Demand Model Creation,” WebScience 2010.

[1] Christopher Thomas, Pankaj Mehra, Wenbo Wang, Amit Sheth, Gerhard Weikum, and Victor Chan, “Automatic Domain Model Creation Using Pattern-Based Fact Extraction,” Knoesis Center technical report.

Page 24

First create a domain hierarchy

Example: a hierarchy for the domain of Human Performance and Cognition

Page 25

Connect with learned facts

Page 26

Example: strongly connected component

Page 27

Excerpt: strongly connected component

Page 28

Expert evaluation of facts in the ontology

1-2: Information that is overall incorrect
3-4: Information that is somewhat correct
5-6: Correct general information
7-9: Correct information not commonly known

Page 29

Technical Details

Page 30

Step 1: Domain hierarchy creation

• Input terms, e.g. related to Human Performance and Cognition
• Hierarchy is automatically carved from articles and categories on Wikipedia

Page 31

Overview - conceptual

• Expand and Reduce approach
  – Start with “high recall” methods
    • Exploration: Full text search
    • Exploitation: Node Similarity Method
    • Category growth
  – End with “high precision” methods
    • Apply restrictions on the concepts found
    • Remove unwanted terms and categories

Page 32

Graph-based expansion

Expand - conceptually

• Full text search on article texts
• Delete results with low confidence score

Page 33

Collecting Instances

Page 34

Creating a Hierarchy

Page 35

Step 2: Pattern-Based Relationship Extraction

Extracting meaningful relationships by macro-reading free text

Page 36

Extracting from Plain text or hypertext

• Informal, human-readable presentation of information
• Vast amounts of information available
  – Web
  – Scientific publications
  – Encyclopediae
• Need sophisticated algorithms to extract information

Page 37

Pattern-based Fact Extraction

• Learn textual patterns that express known relationship types
• Search the text corpus for occurrences of known entities (e.g. from domain hierarchy)
• Semi-open
  – Types are known and limited
  – Types are automatically expanded when LOD grows
• Vector-Space Model
• Probabilistic representation

Page 38

Training

• Relationship data in the UMLS Metathesaurus or the Wikipedia Infobox data provide a large set of facts in RDF triple format
  – Limited set of relationships that can be arranged in a schema
  – Semi-open
    • Types are known and limited
    • Types are automatically expanded when LOD grows

Page 39

Training procedure

• Iterate through all facts (S->P->O triples)
• Find evidence for the fact in a corpus
  – Wikipedia, WWW, PubMed or any other collection
  – If triple subject and triple object occur in close proximity in text, add the pattern in-between to the learned patterns
• Combined evidence from many different patterns increases the certainty of a relationship between the entities
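
The training loop above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Doozer++ implementation: the function name `learn_patterns`, the character-based proximity limit `max_gap`, and the exact-string entity matching are all hypothetical simplifications (for instance, the sketch would not link "Australian" to the entity "Australia" the way the slides' examples do).

```python
import re
from collections import Counter


def learn_patterns(facts, sentences, max_gap=60):
    """Count the text found between subject and object for each relation.

    facts: iterable of (subject, predicate, object) triples.
    sentences: iterable of corpus sentences (plain strings).
    max_gap: assumed proximity limit, in characters, between the entities.
    """
    patterns = {}
    for subj, pred, obj in facts:
        # Subject, a short gap, then the object; the gap is the pattern body.
        regex = re.compile(
            re.escape(subj) + r"(.{1,%d}?)" % max_gap + re.escape(obj)
        )
        for sentence in sentences:
            for match in regex.finditer(sentence):
                pattern = "<Subject>" + match.group(1) + "<Object>"
                patterns.setdefault(pred, Counter())[pattern] += 1
    return patterns


facts = [("Canberra", "Capital_of", "Australia")]
sentences = [
    "Canberra, capital of the Commonwealth of Australia",
    "Canberra is a planned city.",
]
learned = learn_patterns(facts, sentences)
print(learned["Capital_of"])
# Counter({'<Subject>, capital of the Commonwealth of <Object>': 1})
```

Counting per relation keeps the evidence from many facts in one place, which is what allows the later steps to combine patterns into a confidence for a relationship.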

Page 40

Overview – initial computations

[Diagram: pipeline with the components Fact Collection, Text Corpus, CP2P, CP2Pmod, R2P, R2Pmod, Matrix Computations, and Modifications (Entropy, SVD/LSI, Pertinence)]

Page 41

Training procedure cont’d

Fact: Canberra::Australia

Text occurrences:
Canberra, the Australian capital city
Canberra, capital of the Commonwealth of Australia
Canberra, the Australian capital

Extracted patterns (count):
<Subject>, the <Object> capital city (1)
<Subject>, capital of the Commonwealth of <Object> (1)
<Subject>, the <Object> capital (1)

Page 42

Relationship Patterns

Learned patterns:
Relationship | X, the Y capital city | X, capital of the Commonwealth of Y | X, the Y capital
Capital_of | 1 | 1 | 1

After extracting synonyms:
Relationship | X, the Y capital city | X, capital of Y | X, the Y capital
Capital_of | 1 | 1 | 1

After generalizing:
Relationship | X, the Y capital * | X, capital of Y
Capital_of | 2 | 1

Page 43

Relationship Patterns

Counts:
Relationship | X, the Y capital * | X, capital of Y | X, * * Y | X, predecessor of Y
Capital_of | 2 | 2 | 2 | 0
predecessor | 0 | 0 | 2 | 2

Resulting weights:
Relationship | X, the Y capital * | X, capital of Y | X, * * Y | X, predecessor of Y
Capital_of | 1.0 | 1.0 | 0.5 | 0
predecessor | 0 | 0 | 0.5 | 1.0
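
The step from the count table to the weight table on this slide is consistent with dividing each pattern column by its total. The sketch below assumes that this is the intended normalization; the slides do not state the formula, and `normalize_columns` is a hypothetical helper name.

```python
def normalize_columns(counts):
    """Turn pattern counts into per-pattern relation weights.

    counts: dict mapping relation -> list of counts, one per pattern.
    Each pattern column is divided by its total, so the values for one
    pattern sum to 1 across relations (or stay 0 for unseen patterns).
    """
    relations = list(counts)
    n_patterns = len(counts[relations[0]])
    totals = [sum(counts[r][j] for r in relations) for j in range(n_patterns)]
    return {
        r: [counts[r][j] / totals[j] if totals[j] else 0.0 for j in range(n_patterns)]
        for r in relations
    }


# Columns: "X, the Y capital *", "X, capital of Y", "X, * * Y", "X, predecessor of Y"
counts = {
    "Capital_of": [2, 2, 2, 0],
    "predecessor": [0, 0, 2, 2],
}
weights = normalize_columns(counts)
print(weights)
# {'Capital_of': [1.0, 1.0, 0.5, 0.0], 'predecessor': [0.0, 0.0, 0.5, 1.0]}
```

The generic pattern "X, * * Y" ends up with weight 0.5 for both relations, reflecting that it does not discriminate between them.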

Page 44

Resolve Relationships

Relationship-to-pattern matrix:
Relationship | X, the Y capital * | X, capital of Y | X, * * Y | X, predecessor of Y
Capital_of | 1.0 | 1.0 | 0.5 | 0
predecessor | 0 | 0 | 0.5 | 1.0

Pattern vector (multiplied with the matrix):
0.5 X, the Y capital *
0.25 X, capital of Y
0.25 X, * * Y
0 X, predecessor of Y

Page 45

Resolve Relationships

Relationship-to-pattern matrix:
Relationship | X, the Y capital * | X, capital of Y | X, * * Y | X, predecessor of Y
Capital_of | 1.0 | 1.0 | 0.5 | 0
predecessor | 0 | 0 | 0.5 | 1.0

Pattern vector (multiplied with the matrix):
0.5 X, the Y capital *
0.25 X, capital of Y
0.25 X, * * Y
0 X, predecessor of Y

Result:
Capital_of | predecessor
0.875 | 0.125
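
The result on this slide is a matrix-vector product: each relationship row is multiplied element-wise with the pattern vector and summed. The sketch below reproduces the slide's numbers; `resolve` is a hypothetical name and the plain dot product is the only operation assumed.

```python
def resolve(pattern_vector, relation_rows):
    """Score each relation as the dot product of its row with the pattern vector."""
    return {
        rel: sum(w * p for w, p in zip(pattern_vector, row))
        for rel, row in relation_rows.items()
    }


# Columns: "X, the Y capital *", "X, capital of Y", "X, * * Y", "X, predecessor of Y"
relation_rows = {
    "Capital_of": [1.0, 1.0, 0.5, 0.0],
    "predecessor": [0.0, 0.0, 0.5, 1.0],
}
pattern_vector = [0.5, 0.25, 0.25, 0.0]

scores = resolve(pattern_vector, relation_rows)
best = max(scores, key=scores.get)
print(scores)  # {'Capital_of': 0.875, 'predecessor': 0.125}
print(best)    # Capital_of
```

Written out, Capital_of scores 0.5·1.0 + 0.25·1.0 + 0.25·0.5 + 0·0 = 0.875 and predecessor scores 0.25·0.5 = 0.125.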

Page 46

Advanced Computations

[Diagram: pipeline with the components Fact Collection, Text Corpus, CP2P, CP2Pmod, R2P, R2Pmod, Matrix Computations, and Modifications (Entropy, SVD/LSI, Pertinence)]

Page 47

Advanced Computations

[Diagram excerpt: Entropy, SVD/LSI, Pertinence, R2P, Matrix Computations, R2Pmod]

LSI to determine relationship similarities
  – Reduces sparsity in the matrix and makes relationship rows more comparable
  – Allows better use of pertinence computation

Entropy
  – Increase weights for more unique patterns

Pertinence
  – Smoothing of pattern occurrence frequencies
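
One way to "increase weights for more unique patterns" is the entropy weighting familiar from log-entropy term weighting in information retrieval. The slides do not give the formula, so the sketch below is an assumption: a pattern gets weight 1 minus the normalized entropy of its distribution over relationship types. The name `entropy_weights` is hypothetical.

```python
import math


def entropy_weights(rows):
    """Weight each pattern by 1 - normalized entropy over relation types.

    rows: dict mapping relation -> list of per-pattern probabilities,
    where each pattern column sums to 1 across relations.
    A pattern seen with one relation only gets weight 1.0; a pattern spread
    evenly over all relations gets weight 0.0.
    """
    relations = list(rows)
    n_patterns = len(rows[relations[0]])
    if len(relations) < 2:
        return [1.0] * n_patterns  # entropy is undefined for a single class
    max_entropy = math.log2(len(relations))
    weights = []
    for j in range(n_patterns):
        column = [rows[r][j] for r in relations]
        entropy = -sum(p * math.log2(p) for p in column if p > 0)
        weights.append(1.0 - entropy / max_entropy)
    return weights


# Columns: "X, the Y capital *", "X, capital of Y", "X, * * Y", "X, predecessor of Y"
rows = {
    "Capital_of": [1.0, 1.0, 0.5, 0.0],
    "predecessor": [0.0, 0.0, 0.5, 1.0],
}
pattern_weights = entropy_weights(rows)
print(pattern_weights)  # [1.0, 1.0, 0.0, 1.0]
```

Under this assumption the generic pattern "X, * * Y" contributes nothing, while the three patterns that occur with a single relationship type keep full weight.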

Page 48

Example Output (DBPedia)

Subject :: Object | Extracted Rank 1 (Rel;Confidence) | Rank 2 | Rank 3
Howard Pawley :: Gary Filmon | successor;0.799 | after;0.768 | office;0.686
Species Deceases :: Midnight Oil | producer;0.761 | artist;0.719 | genre;0.467
The Crystal City :: Orson Scott Card | artist;0.625 | author;0.617 | writer;0.583
Horatio Allen :: William Maxwell | predecessor;0.629 | before;0.475 |
Basdeo Panday :: Trinidad & Tobago | deathPlace;0.658 | birthplace;0.658 | nationality;0.330
Beccles railway station :: Suffolk | district;0.772 | borough;0.770 | friend;0.749

Page 49

Pertinence for Relations

• Looking at fact extraction as a classification of concept pairs into classes of relations
• Class boundaries are not clear cut
• E.g. has_physical_part and has_part
• Don’t punish the occurrence of the same pattern with relationship types that are similar

Page 50

Relationship Patterns

Counts:
Relationship | X, the Y capital * | X, capital of Y | X, * * Y | X, located in Y
Capital_of | 2 | 2 | 2 | 2
Located_in | 0 | 0 | 2 | 4

Resulting weights:
Relationship | X, the Y capital * | X, capital of Y | X, * * Y | X, located in Y
Capital_of | 1.0 | 1.0 | 0.2 | 0.5
Located_in | 0 | 0 | 0.2 | 0.9

Page 51

Resolve Relationships

Relationship-to-pattern matrix:
Relationship | X, the Y capital * | X, capital of Y | X, * * Y | X, located in Y
Capital_of | 1.0 | 1.0 | 0.2 | 0.5
Located_in | 0 | 0 | 0.2 | 0.9

Pattern vector (multiplied with the matrix):
0.4 X, the Y capital *
0.1 X, capital of Y
0.3 X, * * Y
0.2 X, located in Y

Result:
Capital_of | Located_in
0.66 | 0.24

Page 52

Evaluation of the fact extraction - DBPedia

[Plot: Precision / Recall versus Confidence Threshold]

Strict evaluation: Only the 1st-ranked extracted relation is compared to the gold standard. Averaged over relation types.

60% training set, 40% testing, DBPedia Infobox fact corpus, Wikipedia text corpus
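
The strict evaluation described here can be sketched as below. This is an illustration only: the function name `precision_recall`, the macro-averaging details, and the four test facts are hypothetical, and a prediction whose confidence falls below the threshold is treated as "no answer".

```python
from collections import defaultdict


def precision_recall(predictions, threshold):
    """Precision and recall of top-ranked relations, averaged over relation types.

    predictions: list of (gold_relation, top_relation, confidence) tuples,
    one per test fact; only the 1st-ranked extracted relation is considered.
    """
    gold_total = defaultdict(int)  # test facts per gold relation
    predicted = defaultdict(int)   # answers given per predicted relation
    correct = defaultdict(int)     # correct answers per relation
    for gold, top, confidence in predictions:
        gold_total[gold] += 1
        if confidence >= threshold:
            predicted[top] += 1
            if top == gold:
                correct[gold] += 1
    precisions = [correct[r] / predicted[r] for r in predicted]
    recalls = [correct[r] / gold_total[r] for r in gold_total]
    precision = sum(precisions) / len(precisions) if precisions else 0.0
    recall = sum(recalls) / len(recalls) if recalls else 0.0
    return precision, recall


# Made-up test facts: (gold relation, 1st-ranked extraction, confidence)
predictions = [
    ("successor", "successor", 0.8),
    ("successor", "after", 0.6),
    ("author", "author", 0.7),
    ("author", "artist", 0.4),
]
for threshold in (0.5, 0.75):
    print(threshold, precision_recall(predictions, threshold))
```

Raising the threshold drops low-confidence answers, which typically trades recall for precision; sweeping the threshold yields the kind of curve plotted on this slide.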

Page 53

Evaluation of the fact extraction - UMLS

[Plot: Precision / Recall versus Confidence Threshold]

Strict evaluation: Only the 1st-ranked extracted relation is compared to the gold standard. Averaged over relation types.

60% training set, 40% testing, UMLS fact corpus, MedLine text corpus

Page 54

Manual Evaluation strategy (DBPedia)

Score | Subject :: Object | Suggested Relationship | Extracted Rank 1 (Rel;Confidence) | Rank 2 | Rank 3
1 | Howard Pawley :: Gary Filmon | after | successor;0.799 | after;0.768 | office;0.686
0.5 | Mulan :: Tarzan | after | nextSingle;0.603 | followedBy;0.533 | after;0.416
1 | Species Deceases :: Midnight Oil | artist | producer;0.761 | artist;0.719 | genre;0.467
1 | The Crystal City :: Orson Scott Card | author | artist;0.625 | author;0.617 | writer;0.583
1 | Horatio Allen :: William Maxwell | before | predecessor;0.629 | before;0.475 |
1 | Basdeo Panday :: Trinidad & Tobago | birthplace | deathPlace;0.658 | birthplace;0.658 | nationality;0.330
1 | Bob Nystrom :: Stockholm | birthplace | cityOfBirth;0.677 | birthplace;0.513 |
1 | Beccles railway station :: Suffolk | borough | district;0.772 | borough;0.770 | friend;0.749

Page 55

Manual Evaluation strategy (UMLS)

poisoning, fluoride::teeth [finding_site_of] finding_site_of 1
polyneuritis, endemic::vitamin b 1 [associated_with] has_form 0
polyp of cervix nos (disorder)::768 polyps [associated_with] associated_with 1
polyp of cervix nos (disorder)::neck of uterus [location_of] finding_site_of 1
polyp of colon::benign neoplasms [related_to] associated_with 0.5
brain::brain contusion [has_location] associated_morphology_of 0.25
brain::brain ischemia [has_finding_site] location_of 0.5
polyp of colon::gastrointestinal tract, nos [is_primary_anatomic_site_of_disease] location_of 0.5
polyvesicular vitelline tumor::gamete structure (cell structure) [is_normal_cell_origin_of_disease] is_normal_cell_origin_of_disease 1
proptosis::apert syndrome [has_manifestation] has_manifestation 1

Page 56

Manually evaluated precision for different confidence values

Page 57

Manually evaluated precision, confidence > 0.5 (on UMLS – MedLine corpus)

[Chart: precision axis from 0 to 1; series label: UMLS - Pert - Ent]

Page 58

Summary Model Creation

• Using background knowledge in the form of a fact corpus and a text corpus, we can suggest new facts/propositions
• Possible to try all combinations of known concepts (e.g. Read-the-Web project), but huge validation backlog
• Letting users drive the model creation focuses the creation on the parts that are of common interest
  – Willingness to help validate facts

Page 59

Circle of knowledge on the Web

What is knowledge?

How do we turn propositions/beliefs into knowledge?

How do we acquire knowledge?

Part 3

Page 60

Evaluation and Use

Current Work

Christopher Thomas, Wenbo Wang, Delroy Cameron, Pablo Mendes, Pankaj Mehra, and Amit Sheth, “What Goes Around Comes Around - Improving Linked Open Data through On-Demand Model Creation,” to appear in WebScience 2010.

Page 61

Explicit evaluation

• “Evaluate for evaluation’s sake”
  – Domain experts rank the value of a proposition
  – Committees of experts and/or laymen vote on the correctness of propositions

Page 62

Explicit evaluation in the Semantic Browser

• The user can vote on facts
• Some facts are presented randomly
• Most facts are presented after the user (by browsing) showed interest in
  – The full triple
  – Subject/Object of the triple

Page 63

Implicit evaluation

• Evaluation that does not explicitly involve a vote on the extracted information
• Use the Wisdom of the Crowds
• Users show support for a proposition by performing an action
• Every action taken on a piece of information is recorded and analyzed
• The cumulative behavior of the users gives an indication of which propositions are correct or interesting

Page 64

Implicit evaluation in the Semantic Browser

• The user simply searches and browses
• The search history and the click-stream provide information about whether a page transition using an extracted triple was successful
• Assumption: on average, a successful trail-browsing session includes valid triples
• Problem: requires extensive use
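
The assumption on this slide suggests a simple aggregation: for every extracted triple, compare how often it appears in successful sessions with how often it appears at all. The sketch below is a hypothetical illustration, not the Semantic Browser's code; `triple_support`, the session format, and the example sessions are made up, and how a session is judged "successful" is assumed to be decided elsewhere from the search history and click-stream.

```python
from collections import Counter


def triple_support(sessions):
    """Estimate per-triple support from browsing sessions.

    sessions: list of (triples_used, successful) pairs, where triples_used
    lists the extracted triples followed during one session and successful
    is a boolean judgement of the whole session.
    Returns triple -> fraction of its sessions that were successful.
    """
    seen, good = Counter(), Counter()
    for triples_used, successful in sessions:
        for triple in set(triples_used):  # count each triple once per session
            seen[triple] += 1
            if successful:
                good[triple] += 1
    return {triple: good[triple] / seen[triple] for triple in seen}


t1 = ("Canberra", "Capital_of", "Australia")
t2 = ("Beccles railway station", "friend", "Suffolk")
sessions = [
    ([t1], True),
    ([t1, t2], False),
    ([t1], True),
]
support = triple_support(sessions)
for triple, value in support.items():
    print(triple, round(value, 2))
```

With few sessions these fractions are noisy, which illustrates the problem the slide names: the approach needs extensive use before the estimates become meaningful.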

Page 65

Implicit evaluation in the Semantic Browser

[Figure labels: 1st triple, 2nd triple, Triples]

Page 66

Conclusion

• Creating domain models gives a way of selectively adding knowledge to a system
• We showed that it is possible to automatically create such models with high accuracy
• The models immediately impact users
  – Willingness to help evaluate
• Evaluation becomes an integral part of the knowledge lifecycle

Page 67

?

Page 68

Thank you