Constraint-Aware Dynamic Truth Discovery in Big Data Social Sensing

IEEE Bigdata 17, Boston, MA, USA

Daniel Zhang, Dong Wang, Yang Zhang

Department of Computer Science and Engineering, University of Notre Dame

What is Social Sensing? (1/2)

• A new sensing paradigm that collects observations about the physical environment from humans (social sensors) or from devices acting on their behalf.

Example applications: social media sensing for disaster reporting, water/air quality sensing, traffic monitoring, personalized recommendation.

What is Social Sensing? (2/2)

Advantages compared to physical sensors:

• Infrastructure-free

• Economical

• Versatile

• Mobile

Truth Discovery Problem in Social Sensing

Sources: People, Smart Devices

Measurements (Claims): Numeric data, Images, Text

What to believe? Who to believe?

Truth Discovery Problem in Social Media Sensing - A Twitter Example

• Social sensors are subjective

• Social sensors’ reliabilities are unknown a priori

Related Work

Model categories: Classic Batch Model, Extended Batch Model, Dynamic Model, Distributed Model

Cited work: IPSN 12 - Basic EM; TKDE 08 - Truth Finder; ACL 10 - Invest; BigData 16 - RTD; VLDB 14 - CATD; IPSN 14 - EM Source; KDD 15 - DynaTD; ICDCS 13 - EM Recursive; ICDCS 17 - SSTD; TPDS 16 - EM Hadoop; Bigdata 17 - CADTD (ours)

We address three challenges:

                           Batch   Extended   Dynamic   Distributed   Ours
Dynamic Truth                ×        ×          √           √          √
Incomplete & Noisy Data      ×        √          ×           ×          √
Physical Constraints         ×        ×          ×           ×          √

Our approach: Constraint-Aware Dynamic Truth Discovery

Challenges (1/3)

Dynamic Truth Challenge:

When the truth changes dynamically, how can we track it effectively?

Example - Suspect’s escape path

Example - Impact area of a hurricane

• When will the truth change? How?

• How to handle rumors and conflicting information?

Challenges (2/3)

Noisy and Incomplete Data Challenge:

Social media data is incomplete and noisy by nature - how do we get enough information to estimate the truth accurately?

Incomplete: 86% of the users post only one tweet, and more than 91% post at most two tweets, during a terrorist attack event.

This leaves inadequate evidence to estimate the users’ reliabilities.

Noisy: rumors, misinformation, spam …

Challenges (3/3)

Physical Constraints Challenge:

How to incorporate prior knowledge and physical constraints into the truth discovery framework?

Common sense and prior knowledge can help improve dynamic truth discovery performance.

Example 1 - the suspect cannot travel 80 miles within 10 minutes during a terrorist attack.

Example 2 - the number of casualties can only be non-decreasing.

[Figure: # of casualties over time, with a "Wrong!" annotation]

Proposed Solution - Summary

• Dynamic Truth Discovery
    • Designed a Hidden Markov Model based algorithm

• Incomplete and Noisy Data
    • Data fusion of traditional news and online social media

• Physical Constraints
    • Extended the Viterbi algorithm to consider the difficulty of state transitions

Proposed Solution - Modules

[Figure: module diagram with two data sources, Twitter and News Media]

HMM for Dynamic Truth

Idea: use crowd intelligence to infer the hidden truth.

Issue: how to measure individual contribution to the claim?

Contribution Score (CS) = Attitude Score * (1 - Uncertainty Score) * Independence Score

Disagree, Assertive, Independent: CS = -1 * 1 * 1

Agree, Assertive, Dependent: CS = 1 * 1 * 0.1

Disagree, Uncertain, Independent: CS = -1 * 0.5 * 1
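The contribution score formula can be checked against the three examples with a few lines of Python. This is only a sketch: the function name `contribution_score` is hypothetical, and the input values assume that "assertive" means an uncertainty score of 0, "uncertain" means 0.5, and "dependent" means an independence score of 0.1, which is how the three products on the slide read.

```python
def contribution_score(attitude, uncertainty, independence):
    """CS = attitude * (1 - uncertainty) * independence, as on the slide."""
    return attitude * (1.0 - uncertainty) * independence


# Assumed encodings (not spelled out on the slide):
#   attitude: +1 agree, -1 disagree
#   uncertainty: 0.0 assertive, 0.5 uncertain
#   independence: 1.0 independent, 0.1 dependent
cs_disagree_assertive_independent = contribution_score(-1, 0.0, 1.0)
cs_agree_assertive_dependent = contribution_score(1, 0.0, 0.1)
cs_disagree_uncertain_independent = contribution_score(-1, 0.5, 1.0)

print(cs_disagree_assertive_independent)  # -1.0
print(cs_agree_assertive_dependent)       # 0.1
print(cs_disagree_uncertain_independent)  # -0.5
```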

External Source Fusion for Incomplete & Noisy Data

Adopt Modified HITS Algorithm

[Figure: example with Claim 1 and Claim 2, annotated with numeric scores and True/False labels]
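The slide names a modified HITS algorithm but does not say what the modification is. For reference only, the sketch below runs textbook HITS on a small source-claim graph, treating sources as hubs and claims as authorities. It is not the authors' variant, and the source names, claim names, and edges are made up for illustration.

```python
import math


def hits(edges, iterations=50):
    """Textbook HITS on a bipartite graph given as (source, claim) pairs.

    Returns (hub, authority): hub scores for sources, authority scores for claims.
    """
    edges = list(edges)
    sources = sorted({s for s, _ in edges})
    claims = sorted({c for _, c in edges})
    hub = {s: 1.0 for s in sources}
    auth = {c: 1.0 for c in claims}

    def normalize(scores):
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {k: v / norm for k, v in scores.items()}

    for _ in range(iterations):
        # A claim is as credible as the sources asserting it.
        auth = normalize({c: sum(hub[s] for s, c2 in edges if c2 == c) for c in claims})
        # A source is as reliable as the claims it asserts.
        hub = normalize({s: sum(auth[c] for s2, c in edges if s2 == s) for s in sources})
    return hub, auth


# Hypothetical example: two news outlets and two Twitter users.
example_edges = [
    ("news_a", "claim_1"),
    ("news_b", "claim_1"),
    ("user_x", "claim_1"),
    ("user_x", "claim_2"),
    ("user_y", "claim_2"),
]
hub_scores, auth_scores = hits(example_edges)
print(auth_scores["claim_1"] > auth_scores["claim_2"])  # True
```

In this toy graph the claim backed by more sources ends up with the higher authority score, which is the basic effect that fusing news reports with tweets would rely on.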

Extended Viterbi for Physical Constraints

Define 4 types of constraints:

• Global order constraints: the number of casualties can only go up (↑)

• Spatial-temporal constraints: e.g., traveling between cities within 10 minutes

• Frequency constraints: five tornadoes in a row within 3 days is less likely

• Global path constraints: snow in Florida is barely possible

[Figure: # of casualties over time, with a "Wrong!" annotation]
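Two of these constraint types are easy to express as checks. The sketch below is illustrative only: the helper names `order_violations` and `travel_feasible`, the sample casualty series, and the 100 mph speed cap are all assumptions, not values from the slides.

```python
def order_violations(values):
    """Global order constraint: return indices where a non-decreasing series drops."""
    return [i for i in range(1, len(values)) if values[i] < values[i - 1]]


def travel_feasible(miles, minutes, max_mph):
    """Spatial-temporal constraint: is the implied travel speed plausible?"""
    implied_mph = miles * 60.0 / minutes
    return implied_mph <= max_mph


# Hypothetical casualty estimates over time; the drop from 5 to 4 is a violation.
reported_casualties = [3, 3, 5, 4, 6]
print(order_violations(reported_casualties))  # [3]

# 80 miles in 10 minutes implies 480 mph, far above the assumed 100 mph cap.
print(travel_feasible(80, 10, max_mph=100))   # False
```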

Extended Viterbi for Physical Constraints

We propose an extended Viterbi algorithm that considers the “difficulty” of each truth transition.

If the accumulated difficulty score exceeds a threshold, the transition is invalid.

[Figure: state transition diagram over the locations Boston, NYC, and Houston, annotated with difficulty scores (0, 0.5, 0.7)]
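One way to read the difficulty rule is as a pruning step inside ordinary Viterbi decoding: each surviving path carries its accumulated difficulty, and a transition is skipped when it would push that total over the threshold. The sketch below follows that reading. It is not the authors' implementation, and the probabilities, difficulty values, and threshold are invented for illustration. Because only one best path is kept per state, it is a heuristic rather than an exact constrained search.

```python
import math


def constrained_viterbi(states, start_p, trans_p, emit_p, difficulty,
                        observations, threshold):
    """Viterbi decoding that prunes transitions over an accumulated difficulty threshold.

    Returns (best_path, accumulated_difficulty), or (None, None) if all paths are pruned.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # Each cell stores (log_prob, accumulated_difficulty, path) for one state.
    layer = {}
    for s in states:
        lp = log(start_p[s]) + log(emit_p[s][observations[0]])
        if lp > float("-inf"):
            layer[s] = (lp, 0.0, [s])

    for obs in observations[1:]:
        next_layer = {}
        for s in states:
            best = None
            for prev, (lp, acc, path) in layer.items():
                new_acc = acc + difficulty[prev][s]
                if new_acc > threshold:
                    continue  # transition is invalid under the constraint
                cand = lp + log(trans_p[prev][s]) + log(emit_p[s][obs])
                if cand == float("-inf"):
                    continue
                if best is None or cand > best[0]:
                    best = (cand, new_acc, path + [s])
            if best is not None:
                next_layer[s] = best
        layer = next_layer

    if not layer:
        return None, None
    _, acc, path = max(layer.values(), key=lambda cell: cell[0])
    return path, acc


# Hypothetical setup: hidden truth is the suspect's city; observations are crowd claims.
states = ["Boston", "NYC", "Houston"]
start_p = {s: 1 / 3 for s in states}
trans_p = {a: {b: 1 / 3 for b in states} for a in states}
emit_p = {a: {b: (0.6 if a == b else 0.2) for b in states} for a in states}
difficulty = {
    "Boston": {"Boston": 0.0, "NYC": 0.5, "Houston": 1.0},
    "NYC": {"Boston": 0.5, "NYC": 0.0, "Houston": 1.0},
    "Houston": {"Boston": 1.0, "NYC": 1.0, "Houston": 0.0},
}
claims = ["Boston", "Houston", "Boston"]  # the middle claim is implausible

loose_path, loose_cost = constrained_viterbi(
    states, start_p, trans_p, emit_p, difficulty, claims, threshold=10.0)
strict_path, strict_cost = constrained_viterbi(
    states, start_p, trans_p, emit_p, difficulty, claims, threshold=0.6)
print(loose_path)   # ['Boston', 'Houston', 'Boston']
print(strict_path)  # ['Boston', 'Boston', 'Boston']
```

With a loose threshold the decoder follows the noisy claims through Houston; with a strict one, the Boston-to-Houston jump is pruned and the decoded truth stays in Boston.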

Data Description

Primary (Twitter): two real-world data traces collected using the Twitter Search API.

Complementary (Traditional News Media): 228 reports relevant to the events, crawled from six major news media outlets using Google Search’s customized time frame feature.

Evaluation Results (1/2)

72% of the measured variables in the Boston Bombing dataset and 75% of the measured variables in the Hurricane Matthew dataset evolve at least once.

Evaluation Results (2/2)

Future Work

• Collusion Attack - A group of users can collude to intentionally craft fake social media news.

• Knowledge Transfer - The current HMM-based model is event-specific; in the future we will explore more generic, transferable solutions.

• Cyclic Dependency - Social media and news media can cite each other, which creates source dependency.

Thank You!

Social Sensing Lab at Univ. Notre Dame

http://www3.nd.edu/~sslab/
