Using Text Comprehension Model for Learning Concepts, Context, and Topic of Web Content
11th International Conference on Semantic Computing, IEEE ICSC 2017, San Diego, California, USA, Jan 30 - Feb 1, 2017
Ismael Ali, Naser Al Madi, Austin Melton
Department of Computer Science, Kent State University



Outline
• Text Comprehension
• System Architecture and Workflow
• Semantic Learning
  – Semantic Network Construction
  – Mathematical Foundation
  – Domain Concept Learning
  – Topic Learning
  – Context Learning
• Experimental Design
• Evaluation Strategy
• Results
• Conclusion and Future Work

Abstract
• The role of learning semantics, including concepts, contexts, and topics, from web documents
  – semantic-based structuring and retrieval

• We present a novel approach for domain-independent semantic learning.

• Our approach uses a computational version of the Construction-Integration (CI) model of text comprehension.

Text Comprehension

• Comprehension is a cognitive-based learning process

• Comprehension produces three mental representations:
  – perceptual
  – verbal
  – semantic

• The CI model simulates the incremental, dynamic task of comprehending text, which leads to the construction of a semantic network (SN)

CI as a Cognitive Model of Text Comprehension

Figure from: Cathleen Wharton and Walter Kintsch, 1991, ACM SIGART Bulletin.
(The figure shows the three levels of representation: Surface Model, Text-Base Model, and Situation Model.)

Situation Model
• Time of acquisition

• Recognizing main concepts

• Integrating them with background knowledge

System Architecture and Workflow

Using Stanford CoreNLP:
1. Text tokenization
2. Lemmatization
3. Sentence splitting (to obtain the Surface Model)
4. Part-of-speech tagging
5. Anaphora resolution

Running the computational CI model to produce a weighted semantic network

Analysis and filtering of the weighted semantic networks
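The deck uses Stanford CoreNLP (a Java library) for these preprocessing steps; a simplified Python stand-in can sketch the idea of turning text into per-sentence reading episodes. The regex-based splitter and tokenizer here are rough assumptions, not CoreNLP's trained models:

```python
import re

def split_sentences(text):
    # Naive splitter: break on ., !, ? followed by whitespace
    # (CoreNLP uses a trained model for this).
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

def tokenize(sentence):
    # Word tokens only; punctuation is dropped for simplicity.
    return re.findall(r"[A-Za-z']+", sentence.lower())

text = ("Knowledge is a familiarity. "
        "Awareness or understanding of something. Such as facts.")

# Each sentence becomes one reading episode for the CI model.
episodes = [tokenize(s) for s in split_sentences(text)]
print(episodes[0])  # -> ['knowledge', 'is', 'a', 'familiarity']
```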

Semantic Network Construction

• Sentences are presented as single units of time (a reading episode)

• “Knowledge is a familiarity. Awareness or understanding of something. Such as facts.”

Fig. 2. Sample Concept Network (after running the CI model). Legend: recognized vs. neglected concepts, and recognized vs. neglected associations.

• “Knowledge is a familiarity. Awareness or understanding of something. Such as facts.”

• Episodes {e1, e2, ..., ei} are background knowledge for episode ei+1

• Weights on edges represent the semantic association strength

Fig. 2. Sample Concept Network. (After running the CI model)

1. The concept recognition threshold (S) is 7 for Fig. 2:
   – s(“something”) = 6, accumulated over e1 + e2, so s(“something”) < S
   – s(“Awareness”) = 12, accumulated over e3 + e4, so s(“Awareness”) > S

2. The association recognition threshold (I) is 5 for Fig. 2:
   – i(“Knowledge”, “facts”) < I
   – i(“Knowledge”, “Awareness”) > I
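A minimal sketch of this threshold test. The concept weights match the Fig. 2 example; the association weights (4 and 9) are assumptions consistent with the stated inequalities:

```python
# Concept weights from the Fig. 2 example, accumulated over reading episodes.
concept_weight = {"something": 6, "Awareness": 12}
# Association weights are illustrative assumptions (slide only gives < I and > I).
association_weight = {("Knowledge", "facts"): 4, ("Knowledge", "Awareness"): 9}

S = 7  # concept recognition threshold
I = 5  # association recognition threshold

recognized_concepts = {c for c, w in concept_weight.items() if w > S}
recognized_associations = {pair for pair, w in association_weight.items() if w > I}

print(recognized_concepts)      # {'Awareness'}
print(recognized_associations)  # {('Knowledge', 'Awareness')}
```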

Semantic Network Construction

1. Associative Matrix is generated from Text-base model

2. Each sentence forms an Individual Concept Network, ICN

3. All ICN graphs are combined to create the Base Semantic Network, BSN
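Steps 2 and 3 can be sketched as follows. Treating each sentence's content words as an ICN with unit co-occurrence edges is a simplifying assumption; the actual edge weights come from the CI model:

```python
from collections import Counter
from itertools import combinations

# One Individual Concept Network (ICN) per sentence: here, simple
# co-occurrence edges between the sentence's content words (an assumption).
sentences = [
    ["knowledge", "familiarity"],
    ["awareness", "understanding", "something"],
    ["facts"],
]

bsn = Counter()  # Base Semantic Network: edge -> accumulated weight
for words in sentences:
    for a, b in combinations(sorted(set(words)), 2):
        bsn[(a, b)] += 1  # combine ICNs by summing edge weights

print(dict(bsn))
```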

Semantic Network Construction: Semantic Association Graph

• The associative matrix is an n × n table over the concepts C1, C2, ..., Cn.
• Each concept Ci is tagged with the sentence ID of the episode in which it first occurred.
• Cell (i, j) holds the sentence ID of the first episode in which Ci and Cj co-occur.
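The associative matrix can be sketched as dictionaries keyed by concepts and concept pairs; the three toy episodes below are illustrative assumptions:

```python
from itertools import combinations

# Sentences as episodes, 1-indexed; concepts as content words (an assumption).
episodes = {
    1: ["knowledge", "familiarity"],
    2: ["knowledge", "awareness", "understanding"],
    3: ["knowledge", "facts"],
}

first_seen = {}  # concept -> sentence ID of its first occurrence
first_cooc = {}  # (concept_a, concept_b) -> sentence ID of first co-occurrence
for sid, words in sorted(episodes.items()):
    for w in words:
        first_seen.setdefault(w, sid)
    for a, b in combinations(sorted(set(words)), 2):
        first_cooc.setdefault((a, b), sid)

print(first_seen["awareness"])             # 2
print(first_cooc[("facts", "knowledge")])  # 3
```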

- Finding weights and thresholds:

4. The BSN shows which concepts and associations were recognized and which were neglected

6. The BSN semantic network is represented as a set of inequalities:
   – The inequalities set upper and lower bounds for the concept (S) and association (I) recognition thresholds
   – Linear programming finds suitable values for all variables that satisfy the inequalities

7. Values for the variable vector X that satisfy the inequalities are found by minimizing the problem specified in:

Semantic Network Construction: Mathematical Foundation

minimize f·X subject to A·X ≤ B, with LB ≤ X ≤ UB

Where:
- f is the linear objective function
- A is the left-hand side of the inequalities
- B is the right-hand side of the inequalities
- LB is the lower bound of the solution
- UB is the upper bound of the solution
- The resulting variable vector contains weights for nodes and associations, along with the individual threshold values (S) and (I) for recognizing concepts and associations.
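A minimal SciPy sketch of this linear program. The two-variable inequality system is an illustrative toy, not the paper's actual BSN constraints:

```python
from scipy.optimize import linprog

# Toy system: two variables standing in for a concept weight and a threshold S.
# One constraint in A·X <= B form: weight - S <= -1 (the weight stays below S).
f = [1, 1]                   # linear objective: minimize the sum of variables
A = [[1, -1]]                # left-hand side of the inequalities
B = [-1]                     # right-hand side of the inequalities
bounds = [(0, 20), (0, 20)]  # LB and UB for each variable

res = linprog(c=f, A_ub=A, b_ub=B, bounds=bounds)
print(res.x)  # smallest values satisfying the inequality
```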

Domain Concept Learning

• The variable vector is used to construct the semantic network Gi = (Ci, Ei)
• Concept filtering is then performed to learn the domain concepts
• The domain concepts of web document di are the concepts in a subgraph G*i of its semantic network Gi:
  – G*i = (C*i, E*i), where C*i ⊂ Ci and E*i ⊂ Ei
• Filtering mechanisms:
  (1) statistical-based filtering: mean threshold and median threshold
  (2) positive-based filtering: suggested for the proposed cognitive-based semantic learning approach
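A minimal sketch of the statistical filters, assuming illustrative concept weights. The `w > 0` rule used for positive-based filtering is our reading of the slide, not a stated definition:

```python
from statistics import mean, median

# Illustrative CI weights for candidate concepts (assumed values).
weights = {"ecology": 14, "species": 11, "habitat": 9, "the": 2, "also": 1}

mean_t = mean(weights.values())      # 7.4
median_t = median(weights.values())  # 9

mean_kept = {c for c, w in weights.items() if w > mean_t}
median_kept = {c for c, w in weights.items() if w > median_t}
positive_kept = {c for c, w in weights.items() if w > 0}  # assumed definition

print(mean_kept)    # {'ecology', 'species', 'habitat'}
print(median_kept)  # {'ecology', 'species'}
```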

Topic Learning

• For each domain concept ci ∈ C*i in dj, calculate the Topic Identification Weight (Tiw):
  – CIw(ci): the weight calculated by the computational CI model
  – Eigenvector(ci): the eigenvector centrality of ci, computed as a function of the centralities of its neighbors
  – e(ci): the episode in which the concept ci first appeared

• Topic Identification:
  – The topic concept of di is the concept with the highest Tiw weight
  – It is the most influential node in the semantic network G*i of the domain concept set
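Eigenvector centrality, one Tiw component, can be sketched with power iteration. The toy graph is an assumption, and since the slide does not give the exact formula combining CIw, centrality, and episode, only the centrality piece is shown:

```python
# Undirected toy semantic network: concept -> neighbors (an assumption).
graph = {
    "knowledge": ["awareness", "facts", "understanding"],
    "awareness": ["knowledge", "understanding"],
    "understanding": ["knowledge", "awareness"],
    "facts": ["knowledge"],
}

# Power iteration: repeatedly set each node's score to the sum of its
# neighbors' scores, normalizing each round, until it stabilizes.
score = {c: 1.0 for c in graph}
for _ in range(100):
    new = {c: sum(score[n] for n in graph[c]) for c in graph}
    norm = max(new.values())
    score = {c: v / norm for c, v in new.items()}

topic = max(score, key=score.get)  # most influential node
print(topic)  # 'knowledge'
```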

Context Learning

• The context of di is the set of all nearest neighbors (nodes at distance k = 1) of the topic concept

• Thus the context includes:
  – the concepts most semantically associated with the topic concept
  – a normally distributed selection of concepts from different sections of the text
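Reading the context off the topic concept's k = 1 neighborhood, with an assumed toy network and topic:

```python
# Toy semantic network: concept -> neighbors (an assumption).
graph = {
    "knowledge": ["awareness", "facts", "understanding"],
    "awareness": ["knowledge", "understanding"],
    "understanding": ["knowledge", "awareness"],
    "facts": ["knowledge"],
}

topic = "knowledge"          # assumed topic concept for this toy network
context = set(graph[topic])  # all neighbors at distance k = 1

print(context)  # {'awareness', 'facts', 'understanding'}
```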

Experimental Design

• A diverse set of ten randomly selected web documents from Wikipedia
  – astronomy, brain, cognition, ecology, knowledge, law, literacy, robotics, virus, and tennis

• Testing the openness (domain-independence) property of our approach in learning the semantics of web content

Evaluation Strategies

• Results of filtering mechanisms are evaluated by human judgment strategy [4]:

1. A set of seven human judges (domain experts) was selected from KSU

2. The judges were asked to evaluate the list of all potential concepts learned by the CI model for each web document

3. They were then asked to identify whether each concept belonged to the given domain or not

4. Next, the domain concepts identified by the domain experts were compared against the domain concepts identified by each concept filtering strategy

5. Then the quality of each concept filtering strategy was evaluated

• The evaluation was performed using binary evaluation measures from IR: Precision, Recall, and F1
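The IR measures in step 5 can be computed directly from the two concept sets; the expert and system sets below are illustrative:

```python
# Expert-judged vs. system-identified domain concepts (illustrative sets).
expert = {"ecology", "species", "habitat", "ecosystem"}
system = {"ecology", "species", "climate"}

tp = len(expert & system)        # true positives: concepts both sets agree on
precision = tp / len(system)     # fraction of system concepts that are correct
recall = tp / len(expert)        # fraction of expert concepts the system found
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)  # precision ~ 0.67, recall = 0.5, F1 ~ 0.571
```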

Domain Concepts Analysis

Domain concepts for web document of Ecology

Context and Topic Analysis

Context for web document of Ecology
Topic concept for web document of Ecology

Conclusion and Future Work

• We investigated a novel approach for open learning of the concepts, contexts, and topics of web content.

• Our approach is based on the Construction-Integration (CI) model of text comprehension, which mimics the way humans learn the semantic components of a web document.

• We also highlighted the use of cognitive science results in learning semantics from web content.

• Our work is a step toward our future research on cognition-based and open:
  – Ontology Learning
  – Ontology Selection

Thank you.