Mining Citizen Sensor Communities to Improve Cooperation with Organizational Actors June 23 2015 PhD Defense Hemant Purohit (Advisor: Prof. Amit Sheth) Kno.e.sis, Dept. of CSE, Wright State University, USA

Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Page 1: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations

Mining Citizen Sensor Communities to Improve Cooperation with Organizational Actors

June 23 2015 PhD Defense

Hemant Purohit (Advisor: Prof. Amit Sheth)  

Kno.e.sis, Dept. of CSE, Wright State University, USA

Page 2: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations

@hemant_pt

Outline

•  Citizen Sensor Communities & Organizations
•  Cooperative System Design Challenges
•  Contributions
   •  Problem 1. Conversation Classification using Offline Theories
   •  Problem 2. Intent Classification
   •  Problem 3. Engagement Modeling
•  Applications
•  Limitations & Future Work

Page 3: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Citizen Sensors: Access to Human Observations & Interactions

Uni-directional communication (TO people)

Unstructured, Unconstrained Language Data
•  Ambiguity
•  Sparsity
•  Diversity
•  Scalability

Bi-directional (BY people, TO people)

Web 2.0 media


Page 4: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Goal: Data to Decision Making

Organizational Decision Making

Noisy Citizen Sensor data


SOCIAL SCIENCE
•  Experts on Organizations
•  Small-scale data

COMPUTER SCIENCE
•  Experts on Mining
•  Large-scale data

Scope of My Research

Page 5: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


COOPERATIVE SYSTEM

ORGANIZATIONS ("Can you help us?")
1.  Structured Roles
2.  Defined Tasks
✓  COLLECT Data
✓  Process, & Make Decisions

CITIZEN SENSOR COMMUNITIES ("Sure! How to help?")
1.  No Structured Roles
2.  No Defined Tasks
✓  But "GENERATE" Massive Data

Page 6: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Computer-Supported Cooperative Work (CSCW) Matrix [Johansen 1988, Baecker 1995]

[Figure: CSCW matrix with axes TIME and PLACE]

Page 7: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Challenges (Malone & Crowston 1990; Schmidt & Bannon 1992)

COOPERATIVE SYSTEM: ORGANIZATIONS and CITIZEN SENSOR COMMUNITIES

•  Awareness (DESIGN PROBLEM) → ENGAGEMENT MODELING (DATA PROBLEM)
   Org. Actor: Q1. Who to engage first?
•  Articulation (DESIGN PROBLEM) → INTENT MINING (DATA PROBLEM)
   Org. Actor: Q2. What are resource needs & availabilities?

Page 8: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Research Questions

•  Can general theories of offline conversation be applied in the online context?
•  Can we model intentions to inform organizational tasks using knowledge-guided features?
•  Can we find reliable groups to engage by modeling collective group divergence using a content-based measure?

Page 9: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Thesis: Statement

Prior knowledge, and interplay of features of users, their content, and network efficiently model Intent & Engagement for cooperation of citizen sensor communities.

Scope of Concepts
•  Intent: aim of action, e.g., offering help
•  Engagement: involvement in activity, e.g., participating in discussion

Page 10: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Contributions

1.  Operationalized computing in cooperative system design
    •  by accommodating articulation in Intent Mining, and
    •  enriching awareness by Engagement Modeling
2.  Improved computation of online social data
    •  by incorporating features from offline social theoretical knowledge
3.  Improved performance of intent classification
    •  by fusing top-down & bottom-up data representations
4.  Improved explanation of group engagement
    •  by modeling content divergence to complement existing structural measures

Page 11: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Data: Scope

•  Social Platform: Twitter
   •  Important bridge between citizens & organizations
•  Characteristics
   •  Users: follow/subscribe
   •  Content: status updates (140 chars max)
   •  Network: directed
•  Platform conversation functions
   •  Reply
   •  Retweet
   •  Mention

Page 12: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Outline

•  Citizen Sensor Communities & Organizations
•  Cooperative System Design Challenges
   •  Awareness: tackle via Engagement Modeling
   •  Articulation: tackle via Intent Mining
•  Contributions
   •  Problem 1. Conversation Classification using Offline Theories
   •  Problem 2. Intent Classification
   •  Problem 3. Engagement Modeling
•  Applications
•  Limitations & Future Work

Page 13: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Problem 1. Conversation Classification

R1. Can general theories of conversation be applied in the online context?

•  Functions of Reply, Retweet, and Mention reflect conversation

Example:
User1. Analyzing #Conversations on Twitter. Using platform provided functions #REPLY, #RT, and #Mention. .. … ……..
User2. I kinda feel one might need more than just the platform fn -- @User1 u can think #Psycholinguistics, dude!

Page 14: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Problem 1. Conversation Classification

•  Functions of Reply, Retweet, and Mention reflect conversation
•  Task: Given a set S of messages m_i, classify a sample {m_i} for {RP, None}, {RT, None}, {MN, None}, where
   •  Ground-truth corpora:
      •  RP = { m_i | has_Reply_function(m_i) = True }
      •  RT = { m_i | has_Retweet_function(m_i) = True }
      •  MN = { m_i | has_Mention_function(m_i) = True }
      •  None = S − (RP ∪ RT ∪ MN)
   •  Sample {m_i} size = 3, based on average Reply conversation size
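The ground-truth sets above can be sketched with plain set operations. This is only an illustration: the record fields (`id`, `is_reply`, `is_retweet`, `has_mention`) and the chunking of messages into samples of three are assumptions, not details given in the slides.

```python
# Hypothetical tweet records; the field names are assumptions for
# illustration and do not come from the slides or the Twitter API.
def build_corpora(messages):
    """Split message ids into the RP, RT, MN and None ground-truth sets."""
    all_ids = {m["id"] for m in messages}
    rp = {m["id"] for m in messages if m["is_reply"]}
    rt = {m["id"] for m in messages if m["is_retweet"]}
    mn = {m["id"] for m in messages if m["has_mention"]}
    # None = S minus every message that uses a conversation function
    return {"RP": rp, "RT": rt, "MN": mn, "None": all_ids - (rp | rt | mn)}


def samples(message_ids, size=3):
    """Group ids into non-overlapping samples of `size`; leftovers are dropped."""
    ordered = sorted(message_ids)
    return [ordered[i:i + size] for i in range(0, len(ordered) - size + 1, size)]
```

A message can belong to more than one positive set (for example a reply that also mentions a user), which is why each binary task pairs one positive set with None.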


Page 15: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Conversation Classification: Offline Theories

•  Psycholinguistics Indicators [Clark & Gibbs, 1986, Chafe 1987, etc.]
   •  Determiners (‘the’ vs. ‘a/an’)
   •  Dialogue Management (e.g., ‘thanks’, ‘anyway’), etc.
•  Drawback
   •  Offline analysis focused on positive conversation instances
•  Hypotheses
   •  Offline theoretic features are discriminative
   •  Such features correlate with information density

Page 16: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Conversation Classification: Feature Examples


CATEGORY H_j: H_j SET
•  H1 - Determiners: the
•  H3 - Subject pronouns: she, he, we, they
•  H9 - Dialogue management indicators: thanks, yes, ok, sorry, hi, hello, bye, anyway, how about, so, what do you mean, please, {could, would, should, can, will} followed by pronoun
•  H11 - Hedge words: kinda, sorta

•  Feature_Hj(m_i) = term-frequency(H_j-set, m_i)
•  Normalized
•  Total 14 feature categories
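A minimal sketch of the term-frequency feature, assuming that "Normalized" means dividing by the number of tokens in the message (the slides do not say which normalization was used). The indicator sets are short excerpts from the table, and the tokenizer is a deliberately simple stand-in.

```python
import re

# Excerpts of the indicator sets listed above; the full 14-category lexicon
# is not reproduced in the slides, so these dictionaries are illustrative.
H_SETS = {
    "H1_determiners": ["the"],
    "H3_subject_pronouns": ["she", "he", "we", "they"],
    "H9_dialogue_management": ["thanks", "yes", "ok", "sorry", "hi", "hello",
                               "bye", "anyway", "how about", "please"],
    "H11_hedges": ["kinda", "sorta"],
}


def tokenize(message):
    """Lowercase the message and keep alphabetic tokens (a simple assumption)."""
    return re.findall(r"[a-z']+", message.lower())


def feature_hj(h_set, message):
    """Count whole-word occurrences of the set's terms, divided by token count."""
    tokens = tokenize(message)
    if not tokens:
        return 0.0
    text = " ".join(tokens)
    count = 0
    for term in h_set:
        # Lookarounds keep 'he' from matching inside 'the' and allow phrases.
        count += len(re.findall(r"(?<![a-z'])" + re.escape(term) + r"(?![a-z'])", text))
    return count / len(tokens)


def feature_vector(message):
    """One normalized feature per category H_j."""
    return {name: feature_hj(terms, message) for name, terms in H_SETS.items()}
```

Calling `feature_vector("Thanks, I kinda think the plan works")` yields one value per category, which can then be fed to any classifier.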

Page 17: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Conversation Classification: Results

•  Dataset
   •  Tweets from 3 Disasters, and 3 Non-Disaster events
   •  Varying set size (3.8K - 609K), time periods
•  Classifier
   •  Decision Tree
   •  Evaluation: 10-fold Cross Validation
   •  Accuracy: 62% - 78% [Lowest for {Mention, None}]
   •  AUC range: 0.63 - 0.84

Purohit, Hampton, Shalin, Sheth & Flach. In Journal of Computers in Human Behavior, 2013

Page 18: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Conversation Classification: Discriminative Features

•  Consistent top features across classifiers
   •  Pronouns (e.g., you, he)
   •  Dialogue management (e.g., thanks)
   •  Determiners (e.g., the)
   •  Word counts
•  Positively correlated with RP, RT, MN
   •  Correlation Coefficient up to 0.69

Page 19: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Conversation Classification: Psycholinguistic Analysis

•  LIWC: Tool for deeper content analysis [Pennebaker, 2001]
•  Gives a measure per psychological category
•  Categories of interest
   •  Social Interaction
   •  Sensed Experience
   •  Communication
•  Analyzed output sets in confusion matrices (True Positive, False Negative, False Positive, True Negative)
   ➢  Higher values for positively classified conversations
   ➢  Suggests higher information for cooperative intent

Purohit, Hampton, Shalin, Sheth & Flach. In Journal of Computers in Human Behavior, 2013

Page 20: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Conversation Classification: Lessons

1.  Offline theoretic features of conversations exist in the online environment
    ➢  Can be applied for computing social data
2.  Such features correlate with information density in content
    -  Reflection of conversation for an intent

Page 23: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Short-text Document Intent

•  Intent: Aim of action

DOCUMENT → INTENT
•  "Text REDCROSS to 90999 to donate 10$ to help the victims of hurricane sandy" → SEEKING HELP
•  "Anyone know where the nearest #RedCross is? I wanna give blood today to help the victims of hurricane Sandy" → OFFERING HELP
•  "Would like to urge all citizens to make the proper preparations for Hurricane #Sandy - prep is key - http://t.co/LyCSprbk has valuable info!" → ADVISING

Page 24: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Short-text Document Intent

How to identify relevant intent from ambiguous, unconstrained natural language text?

Relevant intent → Articulation of organizational tasks (e.g., Seeking vs. Offering resources)

Page 25: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Intent Classification: Problem Formulation

•  Given a set of user-generated text documents, identify existing intents
•  Variety of interpretations
•  Problem statement: a multi-class classification task

   approximate f: S → C, where C = {c_1, c_2, …, c_K} is a set of K predefined intent classes, and S = {m_1, m_2, …, m_N} is a set of N short text documents

Focus: Cooperation-assistive intent classes, C = {Seeking, Offering, None}

Page 26: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Intent Classification: Related Work

TEXT CLASSIFICATION TYPE: FOCUS; EXAMPLE
•  Topic: predominant subject matter; sports or entertainment
•  Sentiment/Emotion/Opinion: present state of emotional affairs; negative or positive, happy emotion
•  Intent: action, hence future state of affairs; offer to help after floods

e.g., I am going to watch the awesome Fast and Furious movie!! #Excited

Page 27: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Intent Classification: Related Work

DATA TYPE: Formal text on Webpages/blogs (Kröll and Strohmaier 2009, -15; Raslan et al. 2013, -14)
APPROACH FOCUS: Knowledge Acquisition via Rules, Clustering
LIMITED APPLICABILITY:
•  Lack of large corpora with proper grammatical structure
•  Poor quality text hard to parse for dependencies

DATA TYPE: Commercial Reviews, marketplace (Hollerit et al. 2013, Wu et al. 2011, Ramanand et al. 2010, Carlos & Yalamanchi 2012, Nagarajan et al. 2009)
APPROACH FOCUS: Classification via Rules, Lexical template based, Pattern
LIMITED APPLICABILITY:
•  More generalized intents (e.g., ‘help’ broader than ‘sell’)
•  Patterns more implicit to capture than for buying/selling

DATA TYPE: Search Queries (Broder 2002, Downey et al. 2008, Case 2012, Wu et al. 2010, Strohmaier & Kröll 2012)
APPROACH FOCUS: User Profiling: Query Classification
LIMITED APPLICABILITY:
•  Lack of large query logs, click graphs
•  Existence of social conversation

Page 28: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Intent Classification: Challenges

•  Unconstrained Natural Language in small space
•  Ambiguity in interpretation
•  Sparsity of low ‘signal-to-noise’: Imbalanced classes
   •  1% signals (Seeking/Offering) in 4.9 million tweets #Sandy
•  Hard-to-predict problem:
   •  commercial intent, F-1 score 65% on Twitter [Hollerit et al. 2013]

Example (in the original slide, blue marks offering intent and red marks seeking intent):
@Zuora wants to help @Network4Good with Hurricane Relief. Text SANDY to 80888 & donate $10 to @redcross @AmeriCares & @SalvationArmyUS #help

Page 29: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Intent Classification: Types & Features

Intent
•  Binary
   •  Crisis Domain: [Varga et al. 2013] Problem vs. Aid (Japanese)
      -  Features: Syntactic, Noun-Verb templates, etc.
   •  Commercial Domain: [Hollerit et al. 2013] Buy vs. Sell intent
      -  Features: N-grams, Part-of-Speech
•  Multiclass
   •  Commercial Domain: Not on Twitter

Page 30: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


TOP-DOWN: Pattern Rules, Declarative Knowledge (patterns defined for intent association)

BOTTOM-UP: Bag of N-grams Tokens, Independent Tokens (patterns derived from the data)

Our Hybrid Approach

[Figure: the two approaches placed on axes labeled "Learning Improves" and "Expressivity Increases"]

Page 31: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Intent Classification Top-Down: Binary Classifier - Prior Knowledge

•  Conceptual Dependency Theory [Schank, 1972]
   •  Make meaning independent from the actual words in input
   •  e.g., Class in an Ontology abstracts similar instances
•  Verb Lexicon [Hollerit et al. 2013]
   •  Relevant Levin’s Verb categories [Levin, 1993]
   •  e.g., give, send, etc.
•  Syntactic Pattern
   •  Auxiliary & modals: e.g., ‘be’, ‘do’, ‘could’, etc. [Ramanand et al. 2010]
   •  Word order: Verb-Subject positions, etc.

Purohit, Hampton, Bhatt, Shalin, Sheth & Flach. In Journal of CSCW, 2014

Page 32: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Intent Classification Top-Down: Binary Classifier - Psycholinguistic Rules

•  Transform knowledge into rules
•  Examples:
   (Pronouns except 'you' = yes) ^ (need/want = yes) ^ (Adjective = yes/no) ^ (Things = yes) → Seeking
   (Pronoun except 'you' | Proper Noun = yes) ^ (can/could/would/should = yes) ^ (Levin Verb = yes) ^ (Determiner = yes/no) ^ (Adjective = yes/no) ^ (Things = yes) → Offering
•  Domain ontology (source of 'Things')
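The two example rules can be approximated with ordered keyword slots. This is a rough sketch under several assumptions: the word lists are small hypothetical excerpts (not the actual Levin verb classes or domain ontology), the optional Adjective and Determiner slots are simply skipped, and the Proper Noun alternative is omitted because it would need a part-of-speech tagger.

```python
import re

# Hypothetical vocabularies for illustration only.
PRONOUNS = {"i", "we", "he", "she", "they"}            # pronouns except 'you'
NEED_WANT = {"need", "needs", "want", "wants"}
MODALS = {"can", "could", "would", "should"}
LEVIN_VERBS = {"give", "send", "donate", "bring"}      # excerpt, not the full lexicon
THINGS = {"clothes", "food", "water", "blood", "shelter"}  # stand-in for the ontology


def matches_in_order(tokens, slots):
    """True if one token from each slot vocabulary appears, in slot order."""
    position = 0
    for vocabulary in slots:
        while position < len(tokens) and tokens[position] not in vocabulary:
            position += 1
        if position == len(tokens):
            return False
        position += 1  # continue searching after the matched token
    return True


def classify(message):
    """Apply the Seeking rule first, then the Offering rule, else 'None'."""
    tokens = re.findall(r"[a-z']+", message.lower())
    if matches_in_order(tokens, [PRONOUNS, NEED_WANT, THINGS]):
        return "Seeking"
    if matches_in_order(tokens, [PRONOUNS, MODALS, LEVIN_VERBS, THINGS]):
        return "Offering"
    return "None"
```

Rules of this kind are precise but brittle: any phrasing outside the slot vocabularies falls through to "None", which matches the low-recall limitation reported for the rule-based classifier.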

Purohit, Hampton, Bhatt, Shalin, Sheth & Flach. In Journal of CSCW, 2014

Page 33: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Intent Classification Top-Down: Binary Classifier - Lessons

•  Preliminary Study
   •  2000 tweets, classified first as conversation and then by the rules, labeled by two native speakers
   •  Labels: Seeking, Offering, None
•  Results
   •  Avg. F-1 score: 78% (Baseline F-1 score: 57% [Varga et al. 2013])
•  Lessons
   •  Role of prior knowledge: Domain Independent & Dependent
   •  Limitation: exhaustive rule-set, low Recall; ambiguity addressed, but not sparsity

Purohit, Hampton, Bhatt, Shalin, Sheth & Flach. In Journal of CSCW, 2014

Page 34: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


TOP-DOWN: Pattern Rules, Declarative Knowledge

BOTTOM-UP: Bag of N-grams Tokens, Independent Tokens

Hybrid Approach

Page 35: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Intent Classification Hybrid: Binary Classifier - Design

•  AMBIGUITY: addressed via rich feature space
   1. Top-Down: Declarative Knowledge Patterns [Ramanand et al. 2010]
      DK(m_i, P) → {0, 1}
      e.g., P = \b(like|want) \b.*\b(to)\b.*\b(bring|give|help|raise|donate)\b
      (acquired via Red Cross expert searches)
   2. Abstraction: due to importance in info sharing [Nagarajan et al. 2010]
      -  Numeric (e.g., $10) → _NUM_
      -  Interactions (e.g., RT & @user) → _RT_, _MENTION_
      -  Links (e.g., http://bit.ly) → _URL_
   3. Bottom-Up: N-grams after stemming and abstraction [Hollerit et al. 2013]
      TOKENIZER(m_i) → { bi-, tri-gram }
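The three feature sources can be sketched with the standard `re` module. The declarative pattern below is the slide's example with the stray space removed; the abstraction regexes and the whitespace tokenizer are assumptions, and stemming is left out because the slides do not name the stemmer that was used.

```python
import re

# Declarative knowledge pattern from the slide (stray space removed).
DK_PATTERN = re.compile(
    r"\b(like|want)\b.*\b(to)\b.*\b(bring|give|help|raise|donate)\b",
    re.IGNORECASE,
)


def abstract(message):
    """Replace links, retweet markers, mentions and numbers with placeholders."""
    text = re.sub(r"https?://\S+", " _URL_ ", message)
    text = re.sub(r"\bRT\b", " _RT_ ", text)
    text = re.sub(r"@\w+", " _MENTION_ ", text)
    text = re.sub(r"\$?\d+(?:[.,]\d+)*\$?", " _NUM_ ", text)
    return text


def dk_feature(message, pattern=DK_PATTERN):
    """DK(m_i, P): 1 if the declarative pattern occurs in the message, else 0."""
    return 1 if pattern.search(message) else 0


def ngrams(message, sizes=(2, 3)):
    """Bi- and tri-grams over the abstracted, lowercased message (no stemming)."""
    tokens = abstract(message).lower().split()
    return [" ".join(tokens[i:i + n])
            for n in sizes
            for i in range(len(tokens) - n + 1)]
```

For a message such as `"RT @user: I want to donate $10 http://bit.ly/x"`, the binary pattern feature and the n-gram tokens would be concatenated into one feature vector before training.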


Page 36: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Intent Classification Hybrid: Binary Classifier - Design

•  SPARSITY: addressed via algorithmic choices
   1.  Feature Selection
   2.  Ensemble Learning
   3.  Classifier Chain

[Figure: classifier chain. DATASET → knowledge-driven features (X^T, y) → model m_1 on (X_1^T, y_1) with output P(c_1) → model m_2 on (X_2^T, y_2), reached with 1 - P(c_1), with output P(c_2)]

Page 37: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Intent Classification Hybrid: Binary Classifier - Experiments

•  Binary classifiers:
   •  Seeking vs. not Seeking
   •  Offering vs. not Offering
•  Dataset:
   •  Candidate set: 4000 tweets classified as donation-related
   •  Labels: min. 3 judges
   •  Annotations: Seeking, Offering, None

Purohit, Castillo, Diaz, Sheth, & Meier. First Monday journal, 2014

Page 38: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Intent Classification Hybrid: Binary Classifier - Results

Experiment: Seeking vs. (None’ + Offering)
•  Supervised learning: RF (CR=50:1)
•  Training samples: 3836
•  Precision (*Baseline): 98% (*79%)
•  F-1 score: 46% (56%)
•  Class-labels: 56% requests

Experiment: Offering vs. (None’)
•  Supervised learning: RF (CR=9:2)
•  Training samples: 1763
•  Precision (*Baseline): 90% (*65%)
•  F-1 score: 44% (*58%)
•  Class-labels: 13% offers

RF = Random Forest ensemble
CR = Asymmetric false-alarm Cost Ratios for True:False
Evaluation: 10-fold CV

Notes:
-  Domain requires higher precision than recall
-  Scope for improving low recall

Purohit, Castillo, Diaz, Sheth, & Meier. First Monday journal, 2014

Page 39: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Intent Classification Hybrid: Multiclass Classifier - Generalization

•  Lessons from binary classification
   •  Improvement by fusing top-down & bottom-up
   •  Sparsity
   •  Ambiguity (Seeking & Offering complementary)
      •  addressed via improved data representation

Hypothesis: Knowledge-guided approach improves multiclass classification accuracy

Page 40: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


TOP-DOWN: Knowledge Patterns
•  (DK) Declarative
•  (SK) Social Behavior
•  (CTK, CSK) Contrast Patterns

BOTTOM-UP: Bag of N-grams Tokens
•  (T) Independent Tokens

Hybrid Approach

Page 41: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Intent Classification Hybrid: Multiclass Classifier - Feature Creation

1. (T) Bag of Tokens
   TOKENIZER(m_i, min, max)
2. (DK) Declarative Knowledge Patterns
   •  Domain expert guidance
   •  Psycholinguistics syntactic & semantic rules
   •  Expand by WordNet and Levin Verbs
   e.g., (how = yes) ^ (Modal-Set 'can' = yes) ^ (Pronouns except 'you' = yes) ^ (Levin Verb-Set 'give' = yes)
   P_j = Feature_Pj(m_i) = 1 if P_j exists in m_i, else 0
3. (SK) Social Knowledge Indicators
   •  Offline conversation indicators studied in Problem 1
   e.g., H_j = Dialogue Management, H_j-set = {Thanks, anyway, ..}
   Feature_Hj(m_i) = term-frequency(H_j-set, m_i)

Page 42: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Intent Classification Hybrid: Multiclass Classifier - Feature Creation

4. (CTK) Contrast Knowledge Patterns
   INPUT: corpus {m_i} cleaned and abstracted, min. support, X
   For each class C_j
   •  Find contrasting patterns using sequential pattern mining
   OUTPUT: contrast patterns set {P} for each class C_j
   e.g., unique sequential patterns:
   SEEKING: help .* victim .* _url_ .*
   OFFERING: anyon .* know .* cloth .*
5. (CPK) Contrast Patterns: on Part-of-Speech tags of {m_i}

Page 43: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Intent Classification Hybrid: Multiclass Classifier - Feature Creation
Finding CTK: Contrast Knowledge Patterns

For each class C_j:
1.  Tokenize the cleaned, abstracted text of {m_i}
2.  Mine Sequential Patterns: SPADE Algorithm
    •  Output: sequences of token sets, {P’}
3.  Reduce to minimal sequences {P}
4.  Compute growth rate & contrast strength for P with all other C_k:

    $gr(P, C_j, C_k) = \dfrac{support(P, C_j)}{support(P, C_k)}$  (1)

    $\text{Contrast-Growth}(P, C_j) = \dfrac{1}{|C| - 1} \sum_{C_k,\, k \neq j} \dfrac{gr(P, C_j, C_k)}{1 + gr(P, C_j, C_k)}$  (2)

    $\text{Contrast-Strength}(P, C_j) = support(P, C_j) \cdot \text{Contrast-Growth}(P, C_j)$  (3)

5.  Select the top-K ranked {P} by contrast strength

OUTPUT: contrast patterns set {P} for each class C_j
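Equations (1) to (3) can be traced on a toy corpus. The sketch below assumes that support is the fraction of a class's documents containing the pattern as an ordered (gapped) token sequence, that the average in (2) runs over the other classes, and that a pattern absent from the contrasting class contributes the limiting value 1; none of these details are spelled out in the slides. Pattern mining itself (SPADE) is not shown.

```python
def contains_sequence(tokens, pattern):
    """True if the pattern's terms occur in order within tokens (gaps allowed)."""
    remaining = iter(tokens)
    return all(term in remaining for term in pattern)


def support(pattern, documents):
    """Fraction of documents that contain the sequential pattern."""
    if not documents:
        return 0.0
    return sum(contains_sequence(d, pattern) for d in documents) / len(documents)


def contrast_growth(pattern, target, corpus):
    """Equation (2): mean of gr / (1 + gr) over all classes other than target."""
    others = [label for label in corpus if label != target]
    s_target = support(pattern, corpus[target])
    total = 0.0
    for other in others:
        s_other = support(pattern, corpus[other])
        if s_other == 0.0:
            # gr is unbounded, so gr / (1 + gr) tends to 1 (assumed convention)
            total += 1.0 if s_target > 0 else 0.0
        else:
            gr = s_target / s_other          # equation (1)
            total += gr / (1.0 + gr)
    return total / len(others)


def contrast_strength(pattern, target, corpus):
    """Equation (3): support in the target class times its contrast growth."""
    return support(pattern, corpus[target]) * contrast_growth(pattern, target, corpus)
```

Ranking every mined pattern by `contrast_strength` and keeping the top K per class gives the contrast pattern set described in step 5.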

Page 44: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Binarization Frameworks for Multiclass Classifier: 1 vs. All

[Figure: CORPUS (set of short text documents, S) → FEATURES (knowledge-driven features, X^T, y) → K binary models M_1, M_2, ..., M_K; model M_j is trained on (X_j^T, y_j) and outputs P(c_j)]

Subset X_j^T ⊂ S such that X_j^T includes all the labeled instances of class C_j for model M_j

(In 1 vs. 1 framework: K*(K-1)/2 classifiers, for each C_j, C_k pair)
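The two binarization frameworks differ only in how the multiclass labels are turned into binary problems. A minimal sketch follows; the helper names are hypothetical, and choosing the final class by the highest P(c_j) is an assumption about how the binary outputs are combined.

```python
from itertools import combinations


def one_vs_all_labels(labels, classes):
    """For each class c_j, build the binary target vector y_j (1 = c_j, 0 = rest)."""
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def one_vs_one_pairs(classes):
    """All (C_j, C_k) pairs; there are K * (K - 1) / 2 of them."""
    return list(combinations(classes, 2))


def predict_one_vs_all(probabilities):
    """Pick the class whose binary model reports the highest P(c_j)."""
    return max(probabilities, key=probabilities.get)
```

With C = {Seeking, Offering, None}, 1 vs. All trains three models and 1 vs. 1 also trains three (3 * 2 / 2); the counts diverge as K grows.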

Page 45: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Intent Classification Hybrid: Multiclass Classifier - Experiments

•  Datasets
   •  Dataset-1: Hurricane Sandy, Oct 27 - Nov 7, 2012
   •  Dataset-2: Philippines Typhoon, Nov 7 - Nov 17, 2013
•  Parameters
   •  Base Learner M_j: Random Forest, 10 trees with 100 features
   •  bi-, tri-gram for (T)
   •  K=100% & min. support 10% for CTK, 50% for CPK

Page 46: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Intent Classification: Multiclass Classifier - Results

[Figure: bar chart of Avg. F-1 Score (10-fold CV), axis range 56% to 70%, on Dataset-1 (Hurricane Sandy, 2012). Feature sets: T (Baseline); T,DK (Declarative); T,SK (Social); T,CTK,CSK (Contrast); T,DK,SK,CTK,CSK. Frameworks: 1-vs-1 and 1-vs-All. Gain 7%, p < 0.05]

Page 47: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Intent Classification: Multiclass Classifier - Results

[Figure: bar chart of Avg. F-1 Score (10-fold CV), axis range 74% to 86%, on Dataset-2 (Philippines Typhoon, 2013). Feature sets: T (Baseline); T,DK (Declarative); T,SK (Social); T,CTK,CSK (Contrast); T,DK,SK,CTK,CSK. Frameworks: 1-vs-1 and 1-vs-All. Gain 6%, p < 0.05]

Page 48: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Lessons

1.  Top-down & Bottom-up hybrid approach improves data representation for learning (complementary) intent classes
    •  Of the top 1% discriminative features, 50% were knowledge-driven
2.  Offline theoretic social conversation (SK) features (the, thanks, etc.), often removed for text classification, are valuable for intent.
3.  There is a varying effect of knowledge types (SK vs. DK vs. CTK/CPK) in different types of real world event datasets
    ➢  Culturally-sensitive psycholinguistics knowledge in future

Page 51: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Problem 3. Group Engagement Model

How can organizations find reliable groups to engage for action?

•  Engagement: degree of involvement in discussion
•  Reliable groups: stay focused and collectively behave to diverge on topics

Purohit, Ruan, Fuhry, Parthasarathy, & Sheth. ICWSM 2014

Page 52: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Problem 3. Group Engagement Model

•  Engagement: degree of involvement in discussion
•  Reliable groups: stay focused and collectively behave to diverge on topics
•  Why & How do groups collectively evolve over time?
   1.  Define a group from interaction network, g
   2.  Define Divergence of g: content based in contrast to structure
   3.  Predict change in the divergence between time slices
       •  Features of g based on theories of social identity, & cohesion

Purohit, Ruan, Fuhry, Parthasarathy, & Sheth. ICWSM 2014

Page 53: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Group Engagement Model: Integrated Approach Unlike Prior Work

People (User): Participant of the discussion
AND
Content (Text): Topic of Interest
AND
Network (Community): Group around topic

KEY POINT: capture User Node Diversity

Sources: tupper-lake.com/.../uploads/Community.jpg http://www.iconarchive.com/show/people-icons-by-aha-soft/user-icon.html

Page 54: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Group Engagement Model: Discussion Divergence

•  Candidate Group: Detect in interaction network
•  Group Discussion Divergence: Jensen-Shannon Divergence of topic distribution on group members’ tweets

where H(*) = Shannon Entropy,
B_t = latent topic distribution of each tweet t in all members’ tweets T_g, and
B_g = mean topic distribution of group g, such that $B_g = \frac{1}{|T_g|} \sum_{t \in T_g} B_t$
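The slide text names the ingredients of the divergence: Shannon entropy H, the per-tweet topic distributions, and their group mean. A minimal sketch follows, assuming the standard generalized Jensen-Shannon divergence with uniform weights, JS(g) = H(B_g) - (1/|T_g|) Σ_t H(B_t). This is an assumption about the exact form, and the topic distributions are taken as given (for example from a topic model that is not shown here).

```python
import math


def entropy(distribution):
    """Shannon entropy in bits; zero-probability topics contribute nothing."""
    return -sum(p * math.log2(p) for p in distribution if p > 0)


def mean_distribution(distributions):
    """B_g: the element-wise mean of the per-tweet topic distributions."""
    n = len(distributions)
    return [sum(column) / n for column in zip(*distributions)]


def group_divergence(tweet_topic_distributions):
    """Entropy of the mean distribution minus the mean of per-tweet entropies."""
    n = len(tweet_topic_distributions)
    b_g = mean_distribution(tweet_topic_distributions)
    mean_entropy = sum(entropy(b_t) for b_t in tweet_topic_distributions) / n
    return entropy(b_g) - mean_entropy
```

Under this form, a group whose tweets all share one topic distribution has divergence 0, and the value grows as members' tweets spread over different topics.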


Page 55: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Lessons

1.  Content Divergence based measure helps explain why groups collectively diverge
    •  Less diverging groups write more social & future action related content
2.  Emerging events such as disasters have higher correlation with social identity-driven features
    ➢  Role of social context

Page 57: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Application-1: Filter Content for Disaster Response

Actors: DISASTER Event; CITIZEN Sensors; RESPONSE Organizations

Intent-Classifiers as a Service

[SEEKING] Me and @CeceVancePR are coordinating a clothing/food drive for families affected by Hurricane Sandy. If you would like to donate, DM us
[OFFERING] Does anyone know how to donate clothes to hurricane #Sandy victims?

Page 58: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Broader Impact: Classifier Model integrated by Crisis Mapping Pioneer


Page 59: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Application-2: “We TRUST people!” User engagement tool

Actors: DISASTER Event; CITIZEN Sensors; RESPONSE Organizations

Tool to mine important users

Page 60: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Broader Impact: Winner of Int’l Challenge: UN ITU Young Innovators 2014


Page 61: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


COOPERATIVE SYSTEM: ORGANIZATIONS and CITIZEN SENSOR COMMUNITIES

•  Awareness → ENGAGEMENT MODELING
   Org. Actor: Q1. Who to engage first?
•  Articulation → INTENT MINING
   Org. Actor: Q2. What are resource needs & availabilities?

Page 62: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Limitations & Future Work

•  Cooperative System
   •  CSCW Application specific to domain of crisis
   ➢  How to create a full What-Where-When-Who knowledge base
•  Intent Mining
   •  Non-cooperation assistive intent classes and the temporal drift of intent not considered
   ➢  How to mine actor-level intent beyond document level
•  Group Engagement
   •  Reliable prioritized groups based on Correlation, not Causality
   •  Interplay of Offline and Online interactions beyond the scope
   ➢  How to incorporate intent in the group divergence
•  Bipartite Intent Graph Matching
   •  Reducing time complexity of Seeking vs. Offering matching

Page 63: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Conclusion

Prior knowledge, and interplay of features of users, their content, and network efficiently model Intent & Engagement for cooperation between citizen sensors and organizations in the online social communities.

Page 64: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Thanks to the Committee Members


[Left to Right] Prof. Amit Sheth, (advisor, WSU), Prof. Guozhu Dong (WSU), Prof. Srinivasan Parthasarathy (OSU), Prof. TK Prasad (WSU), Dr. Patrick Meier (QCRI), Prof. Valerie Shalin (WSU)

Computer Science Social Science

Page 65: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Acknowledgement, Thanks and Questions :)

•  NSF SoCS grant IIS-1111182 to support this work
•  Interdisciplinary Mentors especially Prof. John Flach (WSU), Drs. Carlos Castillo (QCRI), Fernando Diaz (Microsoft), Meena Nagarajan (IBM)
•  Kno.e.sis team especially Andrew Hampton from Psychology dept. and Shreyansh and Tanvi from CSE at Wright State, as well as Yiye Ruan (now Google) & David Fuhry at the Data Mining Lab, Ohio State University
•  Colleagues: Digital Volunteers from the CrisisMappers network, StandBy Task Force, InCrisisRelief.org, info4Disasters, Humanity Road, Ushahidi, etc. and the subject matter experts at UN FPA

Page 66: Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation with Organizations


Other works

Challenges: Ambiguity, Sparsity, Diversity, Scalability
Diagram labels: (Interpretation), (users), (behaviors)

•  Mutual Influence in Sparse Friendship Network [AAAI ICWSM’12]
•  User Summarization with Sparse Profile Metadata [ASE SocialInfo’12]
•  Matching intent as task of Information Retrieval [FM’14]
•  Knowledge-aware Bi-partite Matching [In preparation]
•  Short-Text Document Intent Mining [FM’14, JCSCW’14]
•  Actor-Intent Mining Complexity [In preparation]
•  Modeling Group Using Diverse Social Identity & Cohesion [AAAI ICWSM’14]
•  Modeling Diverse User-Engagement [SOME WWW’11, ACM WebSci’12]