Visual analytics for discovering entity relationship on text data Hanbo Dai Ee-Peng Lim Hady Wirawan Lauw HweeHwa Pang

Visual analytics for discovering entity relationship on text data


Page 1: Visual analytics for  discovering entity relationship on text data

Visual analytics for discovering entity relationship on text data

Hanbo Dai, Ee-Peng Lim, Hady Wirawan Lauw, HweeHwa Pang

Page 2: Visual analytics for  discovering entity relationship on text data

Analysis scenario

• A homeland security analyst
  – Finds out relationships between two terrorists on complex, large information sources
  – Needs user judgments

[Figure: example relationship network. Labels: Jemaah Islamiah, Al-Qaeda, Mas Selamat, Osama Bin Laden, Justinus Andjarwirawan, Abu Latif. Annotations: "Born in Central Java", "Was not directly connected".]

Page 3: Visual analytics for  discovering entity relationship on text data

Visual analytics system architecture

Page 4: Visual analytics for  discovering entity relationship on text data

Two TUBE (Text-Cube) instances for entity relationship discovery

[Figure: two TUBE instances built over entities e0, e1, e2, e3, e4.]

• T1 = <S1, B1, M1, D> (nodes)
  – Document evidence, e.g. {d1, d2, …}
  – Mask value (0/1)
  – Measures, e.g. path_strength

• T2 = <S2, B2, M2, D> (edges)
  – Document evidence, e.g. {d3, d4, …}
  – Mask value (0/1)
  – Measures, e.g. strength

Page 5: Visual analytics for  discovering entity relationship on text data

ER-Explorer interface

Page 6: Visual analytics for  discovering entity relationship on text data

Visual analytical operations

• Insert

• Cluster

• Delete

Page 7: Visual analytics for  discovering entity relationship on text data

Our tool helps to discover new relationships

Page 8: Visual analytics for  discovering entity relationship on text data

Conclusion

• Interactive visual method to discover entities and relationships embedded in text data

• ER-Explorer equipped with TUBE model and operations

• Our tool assisted analysts in finding relationships between two terrorists

Page 9: Visual analytics for  discovering entity relationship on text data

Back up slides

Page 10: Visual analytics for  discovering entity relationship on text data

Case study

• Dataset: The hijacking of IC814
• Entities of type Person, Organization, Event, GPE are extracted
• Co-occurrence relationships are identified at the sentence level.
• Each sentence is considered as a document.
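The sentence-level co-occurrence step can be sketched as follows. This is a minimal illustration, assuming entity extraction has already been done; the function name `cooccurrence_evidence` and the toy input are hypothetical and not taken from the case study.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_evidence(sentences):
    """Map each unordered entity pair to the ids of sentences containing both.

    `sentences` is a list of entity lists, one per sentence; each sentence
    plays the role of a document, so its index serves as the document id.
    """
    evidence = defaultdict(set)
    for doc_id, entities in enumerate(sentences):
        # Sort the distinct entities so each pair has one canonical key.
        for a, b in combinations(sorted(set(entities)), 2):
            evidence[(a, b)].add(doc_id)
    return dict(evidence)


# Hypothetical input: entities already extracted from four sentences.
sentences = [
    ["A", "B"],
    ["B", "C"],
    ["A", "B", "C"],
    [],
]
pairs = cooccurrence_evidence(sentences)
for pair, doc_ids in sorted(pairs.items()):
    print(pair, sorted(doc_ids))
```

Keeping the sentence ids, rather than only a count, preserves the supporting evidence for each relationship.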

Page 11: Visual analytics for  discovering entity relationship on text data

Text-Cube Model Represents Entities and Relationships

• An entity is either a named entity or a conceptual entity.
• An n-dimensional TUBE is a tuple T = <S, B, M, D>
  – S: Schema = {s1, s2, …, sn}
    • si denotes the list of entities of dimension i
  – B: Mask
    • 0 or 1 value
  – M: Measure = {m1, m2, …, m|M|}
    • Each measure mi is associated with a measure function mfi
  – D: Document Collection
  – A TUBE T has |s1| × |s2| × … × |sn| cells
• A cell c
  – Has document evidence denoted as Fd(c)
  – Is present if B(c) = 1, or hidden if B(c) = 0
  – Has measure value denoted as c.mj, computed by mfj(c)
  – Represents the co-occurrence relationship if Fd(c) is not empty
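The tuple T = <S, B, M, D> can be pictured as a small data structure. The sketch below is only one possible reading of the definitions above: the class name `Tube`, the choice to treat unset mask entries as hidden, and the `support` measure are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import product
from math import prod


@dataclass
class Tube:
    schema: list  # S: one list of entities per dimension
    docs: dict    # D: document id -> set of entities mentioned in it
    measures: dict = field(default_factory=dict)  # M: name -> function(tube, cell)
    mask: dict = field(default_factory=dict)      # B: cell -> 0 or 1

    def cells(self):
        """Enumerate every cell, i.e. every combination of one entity per dimension."""
        return product(*self.schema)

    def num_cells(self):
        """|s1| x |s2| x ... x |sn|."""
        return prod(len(s) for s in self.schema)

    def evidence(self, cell):
        """Fd(c): ids of documents that mention every entity of the cell."""
        return {d for d, ents in self.docs.items() if set(cell) <= ents}

    def is_present(self, cell):
        """Present if B(c) = 1; unset cells count as hidden (an assumption)."""
        return self.mask.get(cell, 0) == 1

    def measure(self, name, cell):
        """c.mj, computed by the measure function registered under `name`."""
        return self.measures[name](self, cell)


# Toy document collection and a hypothetical "support" measure (evidence size).
docs = {"d1": {"e0", "e1"}, "d2": {"e1", "e2"}, "d3": {"e0", "e1", "e2"}}
entities = ["e0", "e1", "e2"]
t2 = Tube(
    schema=[entities, entities],
    docs=docs,
    measures={"support": lambda tube, cell: len(tube.evidence(cell))},
)
t2.mask[("e0", "e1")] = 1
print(t2.num_cells(), sorted(t2.evidence(("e0", "e1"))))
```

Computing Fd(c) on demand keeps the sketch short; a real system would more likely index documents by entity.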

Page 12: Visual analytics for  discovering entity relationship on text data

Measure formulas

Page 13: Visual analytics for  discovering entity relationship on text data

Two TUBE Instances for entity relationship discovery

• A discovery task is to find interesting paths between two entities, source (s) and target (t)
  – A path represents a chain of relationships
• 1-Dimension TUBE instance: T1 = <S1, B1, M1, D>
  – S1 initiated as all named entities
  – M1 = {path_strength}
    • The strength of the shortest path between s and t through an entity
• 2-Dimension TUBE instance: T2 = <S2, B2, M2, D>
  – S2 initiated as all named entities on both dimensions
  – M2 = {name_sim, strength, dom_entity}
    • name_sim
      – Computed by edit distance
    • strength
      – Computed by Jaccard Coefficient or Dice Coefficient
    • dom_entity
      – Whenever ei appears, ej is always there: ej dominates ei
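The three T2 measures can be sketched with standard definitions. The slides do not give the exact formulas, so the following is an assumption-laden illustration: `name_sim` is normalized here as 1 minus edit distance over the longer name length, `strength` is computed over the sets of documents mentioning each entity, and all function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def name_sim(a, b):
    """Similarity in [0, 1]; this normalization is an assumption, not from the slides."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def jaccard(docs_i, docs_j):
    """|Di & Dj| / |Di | Dj| over the document sets of two entities."""
    union = docs_i | docs_j
    return len(docs_i & docs_j) / len(union) if union else 0.0


def dice(docs_i, docs_j):
    """2 |Di & Dj| / (|Di| + |Dj|)."""
    total = len(docs_i) + len(docs_j)
    return 2 * len(docs_i & docs_j) / total if total else 0.0


def dominates(docs_j, docs_i):
    """True if ej dominates ei: every document containing ei also contains ej."""
    return bool(docs_i) and docs_i <= docs_j


# Toy document-id sets for two entities ei and ej.
d_i, d_j = {2, 3}, {1, 2, 3}
print(name_sim("Osama Bin Laden", "Osama bin Laden"))
print(jaccard(d_i, d_j), dice(d_i, d_j), dominates(d_j, d_i))
```

Dominance is asymmetric: in the toy data ej dominates ei, but not the other way round.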

Page 14: Visual analytics for  discovering entity relationship on text data

Related Work

• Social network visualization
  – Assume entities and relations
    • have been identified and verified.
    • can be studied without supporting documents
  – Use only measures of graph structure, such as degree, centrality.
• Automatic path/subgraph finding algorithms
  – Users have little control over the relations and entities involved
  – Do not consider semantically identical entities.

Page 15: Visual analytics for  discovering entity relationship on text data

Formal definition of entity

• Entity e is defined as a named object or a set of other entities.

Page 16: Visual analytics for  discovering entity relationship on text data

Tube operations

• Insert
  – Add an entity to a dimension
• Remove
  – Remove an existing entity from a dimension
• SelectCell
  – Assign 0 or 1 to an entry (a cell in T) in Mask
• Cluster
  – Add a new conceptual entity representing a subset of entities to a dimension
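The four operations above can be sketched over a bare schema and mask. This is a hypothetical illustration, not the authors' implementation: the schema is a list of entity lists, the mask is a dict from cell tuples to 0/1, and a conceptual entity is modelled as a `frozenset` of its members, following the definition of an entity as a named object or a set of other entities. Whether Remove also clears mask entries and whether Cluster keeps the member entities are assumptions.

```python
def insert(schema, dim, entity):
    """Insert: add an entity to dimension `dim` (no-op if already there)."""
    if entity not in schema[dim]:
        schema[dim].append(entity)


def remove(schema, mask, dim, entity):
    """Remove: drop an entity from `dim` and discard mask entries that use it."""
    schema[dim].remove(entity)
    for cell in [c for c in mask if c[dim] == entity]:
        del mask[cell]


def select_cell(mask, cell, value):
    """SelectCell: assign 0 (hidden) or 1 (present) to a cell in the mask."""
    if value not in (0, 1):
        raise ValueError("mask value must be 0 or 1")
    mask[cell] = value


def cluster(schema, dim, members):
    """Cluster: add a conceptual entity standing for a subset of `dim`'s entities."""
    concept = frozenset(members)
    if not concept <= set(schema[dim]):
        raise ValueError("members must already be entities of this dimension")
    insert(schema, dim, concept)
    return concept


# Toy 1-dimensional example.
schema = [["e0", "e1", "e2"]]
mask = {}
insert(schema, 0, "e3")
select_cell(mask, ("e3",), 1)
group = cluster(schema, 0, ["e0", "e1"])
remove(schema, mask, 0, "e3")
print(schema, mask)
```

In this sketch the members of a cluster remain in the dimension, since the operation is described only as adding a new conceptual entity.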

Page 17: Visual analytics for  discovering entity relationship on text data

Visual Analytics Operations

• Insert an entity
  – SelectCell in T1 and T2
  – Reveals all relationships this entity has with all entities in the network
• Delete
  – Delete a named entity
    • SelectCell in T1
  – Delete a conceptual entity
    • Remove in T1 and T2
  – Delete a relationship (a cell)
    • SelectCell in T2
• Cluster
  – Cluster in T1 and T2