Constraint-Aware Dynamic Truth Discovery in Big Data Social Sensing
IEEE Bigdata 17, Boston, MA, USA
Daniel Zhang, Dong Wang, Yang Zhang
Department of Computer Science and Engineering
University of Notre Dame
What is Social Sensing? (1/2)
• A new sensing paradigm of collecting observations about the physical environment from humans (social sensors) or devices on their behalf.
Example applications:
• Social Media Sensing for Disaster Report
• Water/Air Quality Sensing
• Traffic Monitoring
• Personalized Recommendation
What is Social Sensing? (2/2)
Advantages Compared to Physical Sensors:
• Infrastructure free
• Economic
• Versatile
• Mobility
Truth Discovery Problem in Social Sensing
Sources: People, Smart Devices
Measurements (Claims): Numeric data, Images, Text
What to believe? Who to believe?
Truth Discovery Problem in Social Media Sensing - A Twitter Example
• Social sensors are subjective
• Social sensors’ reliabilities are unknown a priori
Related Work
Model categories: Classic Batch Model, Extended Batch Model, Dynamic Model, Distributed Model
Representative work:
• TKDE 08 - Truth Finder
• ACL 10 - Invest
• IPSN 12 - Basic EM
• ICDCS 13 - EM Recursive
• IPSN 14 - EM Source
• VLDB 14 - CATD
• KDD 15 - DynaTD
• BigData 16 - RTD
• TPDS 16 - EM Hadoop
• ICDCS 17 - SSTD
• Bigdata 17 - CADTD (Constraint-Aware Dynamic Truth Discovery)

We Address Three Challenges

                          Batch  Extended  Dynamic  Distributed  Our
Dynamic Truth               ×       ×         √          √        √
Incomplete & Noisy Data     ×       √         ×          ×        √
Physical Constraints        ×       ×         ×          ×        √
Challenges (1/3)
Dynamic Truth Challenge:
When the truth changes dynamically, how can those changes be tracked effectively?
Example - Suspect’s Escape Path
Example - Impact Area of Hurricane
• When will the truth change? How?
• How to handle rumors and conflicting information?
Challenges (2/3)
Noisy and Incomplete Data Challenge:
Social media data is incomplete and noisy in nature. How can enough information be obtained to estimate the truth accurately?
Incomplete: 86% of the users post only one tweet and more than 91% post at most two tweets during a terrorist attack event. This leaves inadequate evidence to estimate the users’ reliabilities.
Noisy: rumors, misinformation, spam …
Challenges (3/3)
Physical Constraints Challenge:
How to incorporate prior knowledge and physical constraints into the truth discovery framework?
Common sense and prior knowledge can help improve dynamic truth discovery performance.
Example 1 - the suspect cannot travel 80 miles within 10 minutes during a terrorist attack.
Example 2 - the number of casualties can only be non-decreasing.
[Figure: # of casualties over time, with one point annotated "Wrong!"]
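The two examples above can be written as simple feasibility checks. The following is a minimal sketch, not the authors' implementation: the function names and the 100 mph speed cap are assumptions chosen only for illustration.

```python
def violates_speed_limit(distance_miles, elapsed_minutes, max_mph):
    """Return True if covering the distance in the elapsed time needs more than max_mph."""
    if elapsed_minutes <= 0:
        # No time has passed: any positive distance is infeasible.
        return distance_miles > 0
    required_mph = distance_miles / (elapsed_minutes / 60.0)
    return required_mph > max_mph


def is_non_decreasing(values):
    """Return True if every value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Example 1: 80 miles in 10 minutes would require 480 mph.
# The 100 mph cap is a hypothetical value, not taken from the slides.
print(violates_speed_limit(80, 10, max_mph=100))  # True

# Example 2: a casualty count that drops from 5 to 4 breaks the order constraint.
print(is_non_decreasing([3, 3, 5, 4]))  # False
```

A truth estimate that fails either check can be flagged as inconsistent with prior knowledge before it is reported.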
Proposed Solution - Summary
• Dynamic Truth Discovery
  • Designed a Hidden Markov Model based algorithm
• Incomplete and Noisy Data
  • Data Fusion of Traditional News and Online Social Media
• Physical Constraints
  • Extended Viterbi Algorithm to consider the difficulty of state transitions
Proposed Solution - Modules
Data sources: Twitter, News Media
HMM for Dynamic Truth
Idea: use crowd intelligence to infer the hidden truth.
Issue: how to measure individual contribution to the claim?
Contribution Score (CS) = Attitude Score * (1 - Uncertainty Score) * Independence Score
• Disagree, Assertive, Independent: CS = -1 * 1 * 1 = -1
• Agree, Assertive, Dependent: CS = 1 * 1 * 0.1 = 0.1
• Disagree, Uncertain, Independent: CS = -1 * 0.5 * 1 = -0.5
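The contribution score formula can be checked against the three examples with a few lines of code. This is a minimal sketch: the function and argument names are assumptions, and the uncertainty values (0 for assertive, 0.5 for uncertain) are read off the factors shown in the examples.

```python
def contribution_score(attitude, uncertainty, independence):
    """CS = Attitude Score * (1 - Uncertainty Score) * Independence Score."""
    return attitude * (1.0 - uncertainty) * independence


# (attitude, uncertainty, independence) for the three cases on the slide.
examples = {
    "Disagree, Assertive, Independent": (-1, 0.0, 1.0),
    "Agree, Assertive, Dependent": (1, 0.0, 0.1),
    "Disagree, Uncertain, Independent": (-1, 0.5, 1.0),
}

for label, args in examples.items():
    print(label, contribution_score(*args))
```

The sign carries the attitude toward the claim, while uncertainty and dependence only shrink the magnitude, so a hedged or copied post counts for less than an assertive, original one.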
External Source Fusion for Incomplete & Noisy Data
Adopt Modified HITS Algorithm
[Figure: example graph linking sources to Claim 1 and Claim 2; labels shown: True, False; values shown: 1, 1, 0.8 -> 1, 0.5, 0.4]
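The slides do not spell out how HITS is modified, so the sketch below shows only the textbook HITS iteration on a weighted source-claim graph, as a point of reference. Sources play the role of hubs and claims the role of authorities. The source names, edge weights, and iteration count are hypothetical.

```python
def hits(edges, iterations=50):
    """Textbook HITS on a bipartite graph.

    edges maps (source, claim) -> weight. Returns (hub, auth) score dicts,
    each rescaled so that its largest score is 1.
    """
    sources = {s for s, _ in edges}
    claims = {c for _, c in edges}
    hub = {s: 1.0 for s in sources}
    auth = {c: 1.0 for c in claims}
    for _ in range(iterations):
        # A claim's authority is the weighted sum of the hubs that assert it.
        auth = {c: sum(w * hub[s] for (s, c2), w in edges.items() if c2 == c)
                for c in claims}
        norm = max(auth.values()) or 1.0
        auth = {c: v / norm for c, v in auth.items()}
        # A source's hub score is the weighted sum of the claims it asserts.
        hub = {s: sum(w * auth[c] for (s2, c), w in edges.items() if s2 == s)
               for s in sources}
        norm = max(hub.values()) or 1.0
        hub = {s: v / norm for s, v in hub.items()}
    return hub, auth


# Hypothetical graph: a news outlet and a user back claim_1; the user also
# backs claim_2 with a lower weight.
edges = {
    ("news_a", "claim_1"): 1.0,
    ("user_x", "claim_1"): 1.0,
    ("user_x", "claim_2"): 0.8,
}
hub, auth = hits(edges)
print(auth["claim_1"] > auth["claim_2"])  # True
```

In this toy graph claim_1 ends with the higher score because it is supported by more, and more heavily weighted, sources.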
Extended Viterbi for Physical Constraints
Define 4 types of constraints:
• Global order constraints: the number of casualties ↑ (non-decreasing)
• Spatial-temporal constraints: traveling across cities within 10 mins
• Frequency constraints: five tornados in a row within 3 days is less likely
• Global path constraints: snow in Florida is barely possible
[Figure: # of casualties over time, with one point annotated "Wrong!"]
Extended Viterbi for Physical Constraints
We propose an extended Viterbi algorithm that considers the “difficulty” of each truth transition. If the accumulated difficulty score exceeds a threshold, the transition is invalid.
[Figure: state-transition trellis over Boston, NYC, and Houston; transition values shown: 0.7, 0.5, 0.7, 0, 0]
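The idea of rejecting transitions by accumulated difficulty can be sketched on top of standard Viterbi decoding. This is an illustrative approximation, not the authors' algorithm: each trellis cell keeps only its single most probable valid path, and all probabilities, difficulty scores, and the threshold of 1.0 are hypothetical values.

```python
import math


def _log(p):
    return math.log(p) if p > 0 else -math.inf


def constrained_viterbi(states, start_p, trans_p, emit_p, obs, difficulty, threshold):
    """Viterbi decoding that skips a transition when the accumulated
    difficulty of the path would exceed the threshold."""
    # Each cell holds (log-probability, accumulated difficulty, path).
    cells = {s: (_log(start_p[s]) + _log(emit_p[s][obs[0]]), 0.0, [s])
             for s in states}
    for o in obs[1:]:
        nxt = {}
        for s in states:
            best = (-math.inf, 0.0, None)
            for prev in states:
                lp, acc, path = cells[prev]
                if path is None:
                    continue  # no valid path reaches prev
                new_acc = acc + difficulty[prev][s]
                if new_acc > threshold:
                    continue  # transition is invalid under the constraint
                cand = lp + _log(trans_p[prev][s]) + _log(emit_p[s][o])
                if cand > best[0]:
                    best = (cand, new_acc, path + [s])
            nxt[s] = best
        cells = nxt
    reachable = [c for c in cells.values() if c[2] is not None]
    return max(reachable, key=lambda c: c[0])[2] if reachable else None


cities = ["Boston", "NYC", "Houston"]
start_p = {"Boston": 0.8, "NYC": 0.1, "Houston": 0.1}
trans_p = {a: {b: 0.6 if a == b else 0.2 for b in cities} for a in cities}
emit_p = {a: {b: 0.7 if a == b else 0.15 for b in cities} for a in cities}
# Hypothetical difficulty scores: staying put costs 0, moving costs more.
difficulty = {a: {b: 0.0 for b in cities} for a in cities}
for a, b, d in [("Boston", "NYC", 0.5), ("Boston", "Houston", 0.7), ("NYC", "Houston", 0.7)]:
    difficulty[a][b] = difficulty[b][a] = d

obs = ["Boston", "NYC", "NYC", "Houston"]
unconstrained = constrained_viterbi(cities, start_p, trans_p, emit_p, obs, difficulty, math.inf)
constrained = constrained_viterbi(cities, start_p, trans_p, emit_p, obs, difficulty, 1.0)
print(unconstrained)  # ['Boston', 'NYC', 'NYC', 'Houston']
print(constrained)    # ['Boston', 'NYC', 'NYC', 'NYC']
```

With no threshold the decoder follows the observations to Houston. With the threshold at 1.0 the Boston -> NYC -> Houston path accumulates a difficulty of 1.2 and is rejected, so the final Houston observation is treated as noise.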
Data Description
Primary (Twitter): two real-world data traces collected using the Twitter Search API.
Complementary (Traditional News Media): 228 reports relevant to the events, crawled from six major news media outlets using Google Search’s customized time frame feature.
Evaluation Results (1/2)
72% of the measured variables in the Boston Bombing dataset and 75% of the measured variables in the Hurricane Matthew dataset evolve at least once.
Evaluation Results (2/2)
Future Work
• Collusion Attack - A group of users can collude to intentionally craft fake social media news.
• Knowledge Transfer - The current HMM-based model is event-specific. In the future we will explore more generic/transferable solutions.
• Cyclic Dependency - Social media and news media can cite each other -> source dependency
Thank You!
Social Sensing Lab at Univ. Notre Dame
http://www3.nd.edu/~sslab/