CausalTriad: Toward Pseudo Causal Relation Discovery and ...ir.hit.edu.cn/~sdzhao/Stan_Zhao_ACM BCB...


CausalTriad: Toward Pseudo Causal Relation Discovery and Hypotheses Generation from Medical Text Data

Sendong (Stan) Zhao+, Meng Jiang*, Ming Liu+, Bing Qin+, Ting Liu+

+Harbin Institute of Technology, China; *University of Notre Dame, USA

Pseudo Causal Relation

• Gold standard

⁃ Randomized controlled experiments

⁃ Too costly

• Observational data

⁃ Structured data, e.g., EHRs

⁃ Unstructured data (text), e.g., medical literature and patient reports

• Pseudo causal relation

⁃ Semantic-level causal relations

⁃ May be verified, true causal knowledge

⁃ Or may not have been identified previously

⁃ Or may not yet have evidence to support them

Previous Studies

• Extract causal relations from single sentences

• But causal relations often span multiple sentences

• Use only textual information and ignore structural information

• But causal relations naturally form a network structure

• Perform only extraction rather than inference

• But causality itself follows basic logical rules

Causation Transitivity

• Preserving transitivity is a basic desideratum for an adequate analysis of causation

--L. A. Paul and Ned Hall “Causation: A User’s Guide”

$$A \rightarrow B,\quad B \rightarrow C \;\Longrightarrow\; A \rightarrow C$$

Causation Transitivity in Medical Text

Obesity usually increases the risk of diabetes.

People with diabetes have more sugar in their blood, a condition called hyperglycemia.

Metformin has become a mainstay of type 2 diabetes management and is now the recommended first-line drug for treating the disease.

[Figure: candidate network over Obesity, Diabetes, Hyperglycemia, and Metformin, with two edges labeled "cause" and two unknown relations marked "?"]
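As a rough illustration of how transitivity turns extracted relations into hypotheses, the sketch below applies one step of the chain rule to a set of directed cause-effect pairs. It assumes every edge means "cause" and that the two relations stated in the example sentences (obesity causes diabetes, diabetes causes hyperglycemia) have already been extracted; the function name `transitive_hypotheses` is hypothetical, and the actual CausalTriad rules are richer than this single chain rule.

```python
def transitive_hypotheses(causal_pairs):
    """One step of transitivity: propose (a, c) whenever a -> b and b -> c are known.

    causal_pairs: iterable of directed (cause, effect) tuples (assumed all "cause" edges).
    Returns only pairs that are not already known.
    """
    known = set(causal_pairs)
    effects = {}
    for cause, effect in known:
        effects.setdefault(cause, set()).add(effect)

    hypotheses = set()
    for a, b in known:
        for c in effects.get(b, ()):
            # skip trivial cycles (a -> b -> a) and relations we already have
            if c != a and (a, c) not in known:
                hypotheses.add((a, c))
    return hypotheses


extracted = {("obesity", "diabetes"), ("diabetes", "hyperglycemia")}
print(transitive_hypotheses(extracted))  # {('obesity', 'hyperglycemia')}
```

The output is a hypothesis, not a verified relation: it is exactly the kind of "pseudo causal relation" that may or may not have supporting evidence.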

Motivation

• Jointly utilize

⁃ Textual information (context and co-occurrence)

⁃ Structural information (causation transitivity rule)

• Through inference to

⁃ Discover causal relations in text

⁃ Generate new causal relation hypotheses

Problem Definition

• Problem: Causal Relation Discovery from Triad Structures

• Medical cause-effect candidate network $G = (V, E)$, $E \subseteq V \times V$

• Triad Structure

⁃ Each Triangle in the network

⁃ Basic unit
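Under the definition above, the triads are the triangles of the candidate network. A minimal sketch of enumerating them, assuming edges are treated as undirected because causal direction has not yet been decided; the function name `triangles` is hypothetical.

```python
from itertools import combinations


def triangles(edges):
    """Enumerate triangles (closed triads) in an undirected candidate network."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    found = set()
    for u, neighbours in adj.items():
        # any two neighbours of u that are themselves linked close a triangle
        for v, w in combinations(sorted(neighbours), 2):
            if w in adj[v]:
                found.add(tuple(sorted((u, v, w))))
    return found


edges = [("obesity", "diabetes"), ("diabetes", "hyperglycemia"),
         ("obesity", "hyperglycemia"), ("diabetes", "metformin")]
print(triangles(edges))  # {('diabetes', 'hyperglycemia', 'obesity')}
```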

Our method

• Causal Relation Candidates Matching

• 3 Clues for Causal Discovery

⁃ Causal Association

⁃ Contextual Information

⁃ Causal Transitivity Rules

• Factor Graph Model

Causal Relation Candidates Matching

• Medical Dictionary

⁃ Dryad data package

⁃ TCMonline and TCMID

• For every n consecutive sentences

• Match medical entities

• Pair the matched entities with one another

• Every two pairs that share an entity form a triad structure

• E.g., (𝑒𝑖, 𝑒𝑘) and (𝑒𝑖, 𝑒𝑗) form the triad structure (𝑒𝑘, 𝑒𝑖, 𝑒𝑗)
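The matching steps above can be sketched as a sliding window over sentences. This is a minimal sketch, assuming naive case-insensitive substring lookup against a toy dictionary (the real pipeline uses the medical dictionaries listed above); the function names `match_entities`, `candidate_pairs`, and `triads` are hypothetical.

```python
from itertools import combinations


def match_entities(sentence, dictionary):
    """Naive dictionary lookup: case-insensitive substring match (assumption)."""
    text = sentence.lower()
    return {term for term in dictionary if term.lower() in text}


def candidate_pairs(sentences, dictionary, n=2):
    """Pair every two entities matched within each window of n consecutive sentences."""
    pairs = set()
    for start in range(max(1, len(sentences) - n + 1)):
        entities = set()
        for sentence in sentences[start:start + n]:
            entities |= match_entities(sentence, dictionary)
        pairs.update(combinations(sorted(entities), 2))
    return pairs


def triads(pairs):
    """Two pairs sharing exactly one entity e_i yield the triad (e_k, e_i, e_j)."""
    result = set()
    for p, q in combinations(sorted(pairs), 2):
        shared = set(p) & set(q)
        if len(shared) == 1:
            (e_i,) = shared
            (e_k,) = set(p) - shared
            (e_j,) = set(q) - shared
            result.add((e_k, e_i, e_j))
    return result


sentences = ["Obesity usually increases the risk of diabetes.",
             "People with diabetes have more sugar in their blood, called hyperglycemia."]
dictionary = {"obesity", "diabetes", "hyperglycemia", "metformin"}
print(triads(candidate_pairs(sentences, dictionary, n=1)))
# {('hyperglycemia', 'diabetes', 'obesity')}
```

Widening the window to n = 2 also pairs obesity with hyperglycemia, so the same two sentences yield three triads.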


3 Clues for Causal Discovery

• Causal Association

⁃ Frequently co-occurring entities are more likely to be causally related [Do and Roth 2013]

⁃ Entity $e_i$ is a possible cause of entity $e_j$ if $e_j$ occurs more frequently with $e_i$ than by itself, i.e., $P(e_j \mid e_i) > P(e_j)$ [Suppes 1970]

• Contextual Information

⁃ Causal relations in text tend to share characteristic contexts

⁃ E.g., domain-related words, causal triggers, and connectives

• Causation Transitivity Rule
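The Suppes-style criterion in the first clue can be checked directly from co-occurrence counts. A minimal sketch, assuming probabilities are estimated as the fraction of text windows (each a set of matched entities) that contain an entity; the name `is_prima_facie_cause` is hypothetical. Note that on counts alone the test is symmetric, since $P(e_j \mid e_i) > P(e_j)$ holds exactly when $P(e_i \mid e_j) > P(e_i)$; Suppes's definition also requires the cause to precede the effect, which plain co-occurrence does not capture.

```python
def is_prima_facie_cause(e_i, e_j, windows):
    """Check P(e_j | e_i) > P(e_j), estimating probabilities from window counts."""
    n = len(windows)
    n_i = sum(1 for w in windows if e_i in w)
    n_j = sum(1 for w in windows if e_j in w)
    n_ij = sum(1 for w in windows if e_i in w and e_j in w)
    if n == 0 or n_i == 0:
        return False  # no evidence about e_i at all
    return n_ij / n_i > n_j / n


windows = [{"obesity", "diabetes"}, {"obesity", "diabetes"}, {"diabetes"}, {"cough"}]
print(is_prima_facie_cause("obesity", "diabetes", windows))  # True: 2/2 > 3/4
print(is_prima_facie_cause("cough", "diabetes", windows))    # False: 0/1 > 3/4 fails
```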

Causal Association

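• Modeling causal association

$$CA(e_{ij}) = I(e_i, e_j) \times D(e_i, e_j) \times \max(u_i, u_j)$$

⁃ Reward pairs with larger (pointwise) mutual information

$$I(e_i, e_j) = \log \frac{P(e_i, e_j)}{P(e_i)\,P(e_j)}$$

⁃ Reward pairs that co-occur closer together in the text, while penalizing those that are farther apart, where $sent(e)$ is the index of the sentence containing $e$ and $WS$ is the window size

$$D(e_i, e_j) = -\log \frac{|sent(e_i) - sent(e_j)| + 1}{2 \times WS}$$

⁃ Model the frequency of co-occurrence of two medical entities, $\max(u_i, u_j)$

$$u_i = \frac{P(e_i, e_j)}{\max_k P(e_i, e_k) - P(e_i, e_j) + \varepsilon}, \qquad u_j = \frac{P(e_i, e_j)}{\max_k P(e_k, e_j) - P(e_i, e_j) + \varepsilon}$$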
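The causal association score can be sketched in Python as follows. This is a minimal sketch under stated assumptions: $P(e)$ and $P(e_i, e_j)$ are estimated as the fraction of windows containing the entity or the unordered pair, `sent_i`/`sent_j` are the sentence indices of a single co-occurrence, `ws` plays the role of $WS$, and the name `causal_association` and the default `eps` are hypothetical choices, not values from the slides.

```python
import math
from collections import Counter
from itertools import combinations


def causal_association(e_i, e_j, sent_i, sent_j, windows, ws, eps=1e-3):
    """Sketch of CA(e_ij) = I(e_i, e_j) * D(e_i, e_j) * max(u_i, u_j).

    windows: list of entity collections, one per text window (assumption).
    """
    n = len(windows)
    ent = Counter(e for w in windows for e in set(w))
    pair = Counter(frozenset(p) for w in windows
                   for p in combinations(sorted(set(w)), 2))

    def p(e):
        return ent[e] / n

    def pp(a, b):
        return pair[frozenset((a, b))] / n  # pairs treated as unordered

    p_ij = pp(e_i, e_j)
    if p_ij == 0:
        return 0.0  # never co-occur: PMI undefined, treated as no association

    # I: pointwise mutual information
    i_score = math.log(p_ij / (p(e_i) * p(e_j)))
    # D: smaller sentence distance -> larger score
    d_score = -math.log((abs(sent_i - sent_j) + 1) / (2 * ws))
    # u_i, u_j: compare the pair with each entity's strongest co-occurring partner
    max_i = max(pp(e_i, k) for k in ent if k != e_i)
    max_j = max(pp(k, e_j) for k in ent if k != e_j)
    u_i = p_ij / (max_i - p_ij + eps)
    u_j = p_ij / (max_j - p_ij + eps)
    return i_score * d_score * max(u_i, u_j)


windows = [{"obesity", "diabetes"}, {"obesity", "diabetes"},
           {"diabetes", "hyperglycemia"}, {"fever", "cough"}, {"cough"}]
print(causal_association("obesity", "diabetes", 0, 1, windows, ws=2))
```

In this toy corpus obesity and diabetes are each other's strongest partner, so both denominators reduce to ε and $\max(u_i, u_j)$ dominates the product.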

Contextual Information (1)

• Encode syntactic context

Contextual Information (2)

• Encode context using pre-trained word2vec word embeddings

• Three ways
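The slide does not spell out the three encodings. As one common option only, and not necessarily one of the three used in CausalTriad, the context between two entity mentions can be encoded as the mean of its word vectors. In this sketch a toy 2-dimensional dictionary stands in for pre-trained word2vec vectors, and the names `context_tokens` and `mean_embedding` are hypothetical.

```python
def context_tokens(tokens, pos_i, pos_j):
    """Tokens strictly between the two entity mentions."""
    lo, hi = sorted((pos_i, pos_j))
    return tokens[lo + 1:hi]


def mean_embedding(words, embeddings, dim):
    """Average the vectors of in-vocabulary words; zero vector if none are known."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Toy vectors standing in for a pre-trained word2vec model (assumption).
embeddings = {"increases": [1.0, 0.0], "risk": [0.0, 1.0]}
tokens = "obesity usually increases the risk of diabetes".split()
ctx = context_tokens(tokens, 0, 6)
print(ctx)                                 # ['usually', 'increases', 'the', 'risk', 'of']
print(mean_embedding(ctx, embeddings, 2))  # [0.5, 0.5]
```

Out-of-vocabulary words are simply skipped here; a real system would need a policy for them.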

Causation Transitivity Rules

• Angle rules and the triadic rule

Integrate 3 Clues

• By combining evidence from textual support and structural inference, the three clues together are better equipped to discover causal relations.

• They are complementary in several ways:

⁃ Causal association gives preference to frequently co-occurring causal pairs.

⁃ Causal transitivity rules are designed to identify causal relations that have little textual support but follow the transitivity rule, and to generate new causal hypotheses.

⁃ Contextual information from the text can help filter out frequently co-occurring medical entity pairs that are not causal.


CausalTriad: Factor Graph for Each Triad Structure
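The slide's factor graph figure is not reproduced here. As a hypothetical sketch only, one triad can be modeled with three binary variables (is each edge causal?), unary log-potentials standing in for the textual clues, and one triadic factor that penalizes breaking transitivity; exact MAP inference is then a search over the 2³ = 8 joint assignments. The potential values, the rule weight `w_rule`, and the function `map_triad` are illustrative assumptions, not the paper's parameterization or learning procedure.

```python
import math
from itertools import product


def map_triad(unary, w_rule=1.0):
    """Exact MAP for one triad with edges (a->b, b->c, a->c).

    unary: three (log_potential_if_0, log_potential_if_1) pairs, one per edge.
    w_rule: penalty when a->b and b->c are causal but a->c is not.
    """
    best, best_score = None, -math.inf
    for y in product((0, 1), repeat=3):
        score = sum(unary[k][y[k]] for k in range(3))
        # triadic factor: discourage configurations that violate transitivity
        if y[0] and y[1] and not y[2]:
            score -= w_rule
        if score > best_score:
            best, best_score = y, score
    return best, best_score


# Strong textual evidence for a->b and b->c, weak evidence against a->c.
unary = [(0, 2), (0, 2), (0.5, 0)]
print(map_triad(unary, w_rule=1.0))  # ((1, 1, 1), 4)
print(map_triad(unary, w_rule=0.0))  # ((1, 1, 0), 4.5)
```

With these toy scores, the weak direct evidence against a → c is overridden by the rule, so the model labels all three edges causal; setting `w_rule=0` removes the rule and keeps a → c non-causal.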

Experiments

• Data collection

⁃ TCM consists of the abstracts of 106,151 papers.

⁃ HealthBoards consists of forum posts on health and medical issues such as diseases, symptoms, medicines, and side effects.

• Generating new causal relation hypotheses

Experimental Results

• Different types of causal relations

⁃ DISEASE–cause–SYMPTOM

⁃ FORMULA–against–DISEASE

⁃ HERB–against–DISEASE

⁃ FORMULA–relieve–SYMPTOM

⁃ HERB–relieve–SYMPTOM

⁃ DISEASE–bring–DISEASE

⁃ DRUG–against–DISEASE

⁃ DISEASE–cause–SYMPTOM

Experimental Results

• Patterns of causal reasoning rules

Experimental Results

• Causal relation extraction

Experimental Results

• Extracting causal relations from single sentences and from multiple sentences

• Extracting implicit causal relations

Experimental Results

Influence Factors

• Influence from the size of labeled training data

Influence Factors

• Influence from the number of bootstrapping rounds and window size

Conclusions

• We propose CausalTriad to incorporate both textual and structural clues for causal relation discovery from text.

• Experimental results on two datasets demonstrate that:

⁃ CausalTriad is effective for discovering explicit and implicit causal relations from both single sentences and multiple sentences.

⁃ CausalTriad can generate new causal relation hypotheses through inference.

Thank you! Any comments or suggestions?

Homepage: http://ir.hit.edu.cn/~sdzhao/

Email: zhaosendong@gmail.com
