NOTICING THE NUANCE: Designing intelligent systems that can understand semantic, psychological, and behavioral dimensions
of our digital footprints
Elizabeth L. Murnane
[email protected] www.cs.cornell.edu/~elm236/
ABOUT ELIZABETH
Currently • 3rd year PhD at Cornell Information Science • Committee: Profs. Dan Cosley (chair), Claire Cardie, Geri Gay
Research • Personalization; IR/NLP; Personal Informatics; Affective, Semantic, and Social Computing • 2011 NSF Graduate Research Fellow
Background • 2007 MIT S.B. in Mathematics with Computer Science • Co-founded MIT CSAIL startup
USER-CENTRIC DATA
• Explicit & Implicit
• User-generated content
• Sensor data
• Big Data & Big Personal Data (“Little Data”)
DIGITAL FOOTPRINTS
• Search Queries
• Social web, microblogs, media sharing
• Mobile sensing, personal informatics, life-logging, check-ins
• Social networking
NUANCED DIMENSIONS OF DATA
• Semantics
• Helping machines extract intended meaning from an individual’s content
• Personality & Emotion
• Helping machines interpret psychological, affective, and subjective characteristics of users and their data
• Behavior
• Helping machines understand the dynamics of both private and interpersonal activities
APPLICATION AREAS
Knowledge Sharing
Personal Informatics
Information Retrieval
RESEARCH PROJECTS
• Information Retrieval • Dimensions mined: Semantic, Psychological • Projects: RESLVE, Sentiment-based search
• Knowledge Sharing • Dimensions mined: Semantic, Psychological • Projects: CeRI (Outreach, Task routing, Commenting interface)
• Personal Informatics • Dimensions mined: Psychological, Behavioral • Projects: Smart Pensieve, Activity Rhythms, Smoking Cessation
THE RESLVE PROJECT
• Gain better understanding of challenges machines face in understanding semantic meaning of social Web data
• Use those insights to develop more advanced computational methods that can more reliably make sense of this data
SOCIAL WEB
• 10 million pages per day
• 800 million visitors per month
• 7 billion images (twice as many as 4 years ago)
TASK DEFINITION
Named Entity Recognition (NER) • Systematically identifying mentions of entities (e.g., people, places, concepts, ideas)
Named Entity Disambiguation (NED) • Resolving the intended meaning of ambiguous entities from multiple candidate meanings
AMBIGUOUS ENTITIES
aaahh one more day until finn!!! #cantwait
office holiday party Beetle
Footage:
office holiday party
Episode 4
office, december 3
Footage: • Workplace? • TV Show? • US Version? • UK Version?
ANALYSIS
Data Sample
• Twitter: tweets • YouTube: video titles, descriptions • Flickr: photo tags, titles, descriptions
TEXT LENGTH
• Longest utterances still shorter than even the shortest texts from NER task corpora like Reuters-21578 and the Brown Corpus
[Figure: text lengths for Twitter, YouTube, Flickr, Reuters, and Brown]
HIGH AMBIGUITY
• NER services have low confidence
[Figure: confidence scores (0 to 1) for Wikipedia Miner and DBPedia Spotlight]
• Many potential candidates (2 to 163, avg. 5-6, median 4)
• 91% of utterances contain at least 1 ambiguous entity
• 2/3 of entities detected are ambiguous
• Almost no entities without at least 2 senses to disambiguate
CHALLENGES & FOCUS
• Short Length • Sparse Lexical Context
• Noisy • Highly personal in nature
LIMITATIONS OF EXTANT RESEARCH
Tweets severely degrade traditional techniques
• Stanford NER: F1 drops 90% → 46%
• DBPedia Spotlight & Wikipedia Miner: P@1 < 40%
Recent strategies
• Crowd-sourcing • Limitation: Dependent on reliable human workers
• Automated attempts • Limitation: Focus on NER not NED • Limitation: Generalizability beyond Twitter?
HYPOTHESES
• User has core interests • User more likely to mention an entity about a topic relevant to personal interests than mention a topic of non-interest
• User expresses these interests consistently in content she posts online in multiple communities
• Can use a semantic knowledge base to formally represent these topics of interest
• Wikipedia • Articles, categories effectively represent topic • Compatible with NER toolkits (DBPedia Spotlight, Wikipedia Miner) • Article editing behavior ≈ interests
QUALITATIVE ANALYSIS: STABLE INTERESTS
A user's topics of contribution are similar across the Web:
• Same categories: on average, 52.4% of entities a user mentions in the social Web (e.g., "Java") have at least 1 candidate sense in the same parent category as a Wikipedia article the same user edited (e.g., "Programming language"). Extending to just 4 parents up the category hierarchy covers 100%.
• Same topic: ambiguous YouTube post "office, december 3"; same user's recent Wikipedia edit: <item userid="xxxx" user="xxxx" pageid="31841130" title="The Office (U.S. season 8)"/>
STRATEGY
• Bridge user identity between social Web and knowledge base, K
• Model interests using K's organizational scheme
• Rank entity senses according to relevance to interests
EXPLORING A PERSONALIZED SOLUTION
Individual-centric approach to NED
Incorporates external, user-specific semantic data
Model personal interests with respect to this information
Determine user's likely intended meaning of ambiguous entity based on similarity between potential meanings and interests
RESLVE: Resolving Entity Sense by LeVeraging Edits
Personal Context
IMPLEMENTATION: THE RESLVE SYSTEM
RESLVE (Resolving Entity Sense by LeVeraging Edits) addresses NED by:
I. Connecting social Web + Wikipedia editor identity
II. Modeling topics of interests using article edits
III. Ranking entity candidates by personal relevance
[Figure: RESLVE system diagram. Components: user utterances (unstructured short texts); pre-processor; Wikipedia Miner; DBPedia Spotlight; detected entities & candidate meanings ("m"); username; user contributed structured documents; user interest model; top ranked personally-relevant candidates. Phases: I. Bridging user identity; II. Modeling user interest; III. Ranking candidates by personal relevance]
PHASE 1: BRIDGING WEB IDENTITIES
• Connect identity of social media user with Wikipedia editor
• Simple string matching (Iofciu, 2011; Perito, 2011)
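A minimal sketch of what simple string matching between a social Web username and Wikipedia editor names could look like. The normalization rule (lowercasing and dropping non-alphanumeric characters) and the names `normalize` and `bridge_identity` are assumptions for illustration, not the method from the cited work.

```python
import re
from typing import List, Optional


def normalize(username: str) -> str:
    """Lowercase the name and drop everything except letters and digits (assumed rule)."""
    return re.sub(r"[^a-z0-9]", "", username.lower())


def bridge_identity(social_username: str, wiki_editors: List[str]) -> Optional[str]:
    """Return the first Wikipedia editor name whose normalized form equals the
    normalized social Web username, or None when nothing matches."""
    target = normalize(social_username)
    for editor in wiki_editors:
        if normalize(editor) == target:
            return editor
    return None


# Hypothetical usernames, used only to exercise the function.
editors = ["Jane_Doe42", "wikifan", "Some Editor"]
match = bridge_identity("jane.doe42", editors)
```

Exact matching after normalization keeps false bridges rare, at the cost of missing users who pick unrelated names on different sites.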
PHASE 2: REPRESENTING USERS AND ENTITIES
Models user's topics of interest using bridged Wiki account's editing history
Compares similarity of those topics to topic associated with candidate sense
Content-based & knowledge-graph based similarity
MODELING A KNOWLEDGE CONTEXT
Knowledge base, K
K=(N,E)
2 node types: Categories, Topics
[Figure: example knowledge graph with categories c1 to c4, topics t1 to t3, and d1 to d3]
USER INTEREST MODEL
• Editing a description signals interest in associated topic
• Topic nodes: all topics whose description the user edited
• Category nodes: categories reachable in knowledge graph from those topics
• Edge weight = inverse of shortest path length
[Table: example topic-by-category weights, rows t1 to t3, columns c1 to c4; entries are 0, 1, or a fraction]
• Same representation for candidates
Weighted vectors used to represent user and candidate sense
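The weighting rule (edge weight = inverse of shortest path length from an edited topic to a category) can be sketched with a breadth-first search. The toy graph, the node names, and the function names are hypothetical, and giving unreachable categories a weight of 0 is an assumption.

```python
from collections import deque
from typing import Dict, List


def shortest_path_lengths(graph: Dict[str, List[str]], start: str) -> Dict[str, int]:
    """Breadth-first search: number of edges from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def interest_model(graph: Dict[str, List[str]],
                   edited_topics: List[str],
                   categories: List[str]) -> Dict[str, List[float]]:
    """One weight vector per edited topic: 1 / shortest path length to each
    category, and 0.0 for categories that cannot be reached (assumption)."""
    model = {}
    for topic in edited_topics:
        dist = shortest_path_lengths(graph, topic)
        model[topic] = [1.0 / dist[c] if dist.get(c, 0) > 0 else 0.0
                        for c in categories]
    return model


# Hypothetical knowledge graph: topics link to categories, categories to parents.
toy_graph = {
    "t1": ["c2"],
    "t2": ["c2", "c4"],
    "t3": ["c4"],
    "c2": ["c1"],
    "c4": ["c3"],
}
weights = interest_model(toy_graph, ["t1", "t2", "t3"], ["c1", "c2", "c3", "c4"])
```

A candidate sense can be passed through the same function, which gives user and candidate vectors over the same category axes.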
PHASE 3: RANKING BY PERSONAL RELEVANCE
Output highest scoring candidate as intended meaning by measuring: sim(u, m) = α · sim_content(u, m) + (1 − α) · sim_category(u, m)
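A rough sketch of the ranking step under stated assumptions: both similarities are computed as cosine similarity over sparse weight vectors, and α defaults to 0.5. Neither choice is given in the slides, and the feature names and candidate data are made up for illustration.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sim(user: Dict[str, Vector], cand: Dict[str, Vector], alpha: float = 0.5) -> float:
    """sim(u, m) = alpha * sim_content(u, m) + (1 - alpha) * sim_category(u, m)."""
    return (alpha * cosine(user["content"], cand["content"])
            + (1 - alpha) * cosine(user["category"], cand["category"]))


def rank_candidates(user: Dict[str, Vector],
                    candidates: Dict[str, Dict[str, Vector]],
                    alpha: float = 0.5) -> List[str]:
    """Candidate names sorted from most to least personally relevant."""
    return sorted(candidates, key=lambda name: sim(user, candidates[name], alpha),
                  reverse=True)


# Hypothetical user model and candidate senses for the entity "office".
user_model = {"content": {"sitcom": 1.0, "episode": 1.0},
              "category": {"Television": 1.0}}
senses = {
    "Office (workplace)": {"content": {"workplace": 1.0, "building": 1.0},
                           "category": {"Architecture": 1.0}},
    "The Office (TV series)": {"content": {"sitcom": 1.0, "episode": 1.0},
                               "category": {"Television": 1.0}},
}
ranking = rank_candidates(user_model, senses)
```

The first element of `ranking` is the sense that would be output as the intended meaning.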
PRE-PROCESSING & PREPARATION MODULES
[Figure: RESLVE system diagram]
EXPERIMENT
Labeling correct entity meaning
• 1545 valid ambiguous entities
• Mechanical Turk Categorization Masters
• Average observed agreement across all coders and items = 0.866
• Average Fleiss' Kappa = 0.803
• 918 unanimously labeled ambiguous entities
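For readers unfamiliar with the agreement statistics above, here is a small sketch of the standard Fleiss' kappa computation. It assumes every item is labeled by the same number of coders; the toy count matrix is invented and is not the study's data.

```python
from typing import List, Tuple


def fleiss_kappa(counts: List[List[int]]) -> Tuple[float, float]:
    """Return (observed agreement, Fleiss' kappa).

    counts[i][j] is the number of coders who put item i in category j.
    Every row must sum to the same number of coders (assumption).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Per-item agreement: share of coder pairs that agree on the item.
    per_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
                for row in counts]
    observed = sum(per_item) / n_items

    # Chance agreement from the overall share of labels in each category.
    shares = [sum(row[j] for row in counts) / (n_items * n_raters)
              for j in range(n_categories)]
    expected = sum(p * p for p in shares)

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Toy example: 4 items, 3 coders, 2 candidate meanings.
toy_counts = [[3, 0], [0, 3], [2, 1], [1, 2]]
observed_agreement, kappa = fleiss_kappa(toy_counts)
```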
PERFORMANCE
Metric • Precision at rank 1 (P@1)
Methods of comparison • Human annotated gold standard • RC: Randomly sorted candidates • PF: Prior frequency • RU: RESLVE given a random Wikipedia user's interest model • DS: DBPedia Spotlight • WM: Wikipedia Miner
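Precision at rank 1 is the fraction of ambiguous entities whose top-ranked candidate matches the gold-standard label. A minimal sketch, with hypothetical entity ids and candidate names:

```python
from typing import Dict, List


def precision_at_1(rankings: Dict[str, List[str]], gold: Dict[str, str]) -> float:
    """Fraction of entities whose first-ranked candidate equals the gold label."""
    if not rankings:
        return 0.0
    hits = sum(1 for entity, ranked in rankings.items()
               if ranked and ranked[0] == gold[entity])
    return hits / len(rankings)


# Hypothetical system output and human-annotated gold standard.
system_rankings = {
    "e1": ["The Office (TV series)", "Office (workplace)"],
    "e2": ["Java (island)", "Java (programming language)"],
    "e3": ["Beetle (insect)", "Volkswagen Beetle"],
    "e4": ["Finn (band)", "Finn (given name)"],
}
gold_labels = {
    "e1": "The Office (TV series)",
    "e2": "Java (programming language)",
    "e3": "Beetle (insect)",
    "e4": "Finn (band)",
}
p_at_1 = precision_at_1(system_rankings, gold_labels)
```

Each baseline (RC, PF, RU, DS, WM) would be scored the same way by swapping in its own rankings.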
RESULTS (P@1)
Method  Flickr  Twitter  YouTube
RESLVE  0.63  0.76  0.84
RC  0.21  0.32  0.31
PF  0.74  0.69  0.66
RU  0.51  0.71  0.78
WM  0.78  0.58  0.80
DS  0.53  0.67  0.63
• Best performance on YouTube texts (longest) due to content-based sim
• Outperforms on more personal text (e.g., tweets)
• Less effective on impersonal text (e.g., photo geo-tags) • High prior frequency so standard methods suffice • Personally-unfamiliar topics so not likely to make Wiki edits about them • Stable interests assumption breaks down here
SENTIMENT BASED SEARCH
• Zip codes of 10 most populated cities, 10 least populated cities, 10 random cities across the country
• 54,015 places across 1500 US cities • Movie theaters, hotels, spas, stores, restaurants, etc.
CHALLENGES FOR USERS
• Interpreting mixed reviews
• Confidence in reviewer’s subjective opinions
• Reading multiple reviews
RESEARCH QUESTIONS
• Language and rating • How does a place’s rating relate to the language used in its reviews?
• Personality and rating • Do people with similar personalities tend to like or dislike the same places?
• Search interfaces • How can we rank search results in order to recommend places according to how appealing their atmosphere is likely to be to a user based on her personality and mood?
STRATEGY
• Extract features from reviews using Linguistic Inquiry and Word Count (LIWC) and MRC Psycholinguistic Database
• Support vector models trained by Mairesse algorithm to derive Big Five personality types of reviewers
• Average personality score of reviewers of a place who rated the place 5 or higher/lower as proxy for people who like/dislike a location’s essence
SEARCH INTERFACES
Extraversion = Red, Stability = Purple, Agreeableness = Green, Conscientiousness = Blue,
Openness = Yellow
CONSUMING INFORMATION
CREATING & SHARING INFORMATION
CERI: CORNELL E-RULEMAKING INITIATIVE
• Law School
• Legal Information Institute (LII)
• Information Science
• Computer Science
BACKGROUND
• Rulemaking: process federal agencies use to create regulations (called “rules”)
• e-Rulemaking: the use of digital technologies during this process
• Regulations.gov, RegulationRoom.org: online communities that allow people to learn about, discuss, and react to proposed rules during e-Rulemaking process
PARTICIPATION PATTERNS
• Regulations.gov • 14,000 rules • 2 million comments
• Regulation Room • 5 live rules • 1,318 comments
• Common problem: under-contribution
[Figures: frequency of comments per rule; comments per rule across agencies]
CHALLENGE
• A major goal of eRulemaking is to increase public participation across a broad audience and make the process more representative
• A major challenge is sustained participation by multiple actors across rules
SOLUTION
A solution is to bring new users to an e-rule
• Twitter is a popular medium where people express views and ideas
• Identify and target Twitter users who may be interested in contributing feedback on a rule
EXPERIMENT
• How useful is Twitter content for drawing inferences about people’s interests and knowledgeability about a topic?
• Are users who create content about topics relevant to an e-rule more likely to engage in related e-Rulemaking processes if targeted with requests for participation?
1 Identify Subjects
• Conditions: Bio, Tweet, Combo, Control
• User: D1 = bio, D2 = tweets, D3 = bio+tweets (“combo”)
• Rule: q = words in rule + query expansion *
• Document term matrix and query
• Similarity between query and each document
• Highest score used to assign user to condition
* via Google Keyword Tool, which provides less technical words used by public to discuss same topics
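The assignment step can be sketched as scoring a rule query against a user's three documents and keeping the best one. This sketch assumes raw term counts and cosine similarity, which may differ from the weighting actually used, and it does not model how the Control group is formed. All text below is invented.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def term_vector(text: str) -> Counter:
    """Bag-of-words term counts over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_condition(query: str, bio: str, tweets: List[str]) -> Tuple[str, Dict[str, float]]:
    """Score the query against D1 (bio), D2 (tweets), and D3 (combo) and
    return the best-scoring condition together with all three scores."""
    joined = " ".join(tweets)
    docs = {"bio": bio, "tweet": joined, "combo": bio + " " + joined}
    q = term_vector(query)
    scores = {name: cosine(q, term_vector(text)) for name, text in docs.items()}
    return max(scores, key=scores.get), scores


# Hypothetical rule query (rule words plus expansion terms) and user documents.
rule_query = "airline passenger rights flight delays"
user_bio = "Frequent flyer and travel blogger"
user_tweets = ["Another flight delays mess, airline passenger rights matter",
               "coffee time"]
condition, scores = assign_condition(rule_query, user_bio, user_tweets)
```

Users can then be ranked within each condition by their best score before outreach.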
2 Send Tweets
• Highest ranked users in each group sent an outreach tweet
3 Measure Response
• Engagement (retweets, replies, and follows)
• Click Through Rate
• Contributed to the rule
PSYCHOLOGICAL TRAITS OF EFFECTIVE CONTRIBUTORS
• Connecting psychological traits, language use, and contribution capability
• Classification, Outreach, and Task Routing
• Inventories • Self-efficacy & self-esteem • Big 5 personality • Self-regulation & self-monitoring • Trendsetting & Opinion Leadership • Pro-social & altruistic value orientations
COMPUTATIONAL SUPPORTS FOR KNOWLEDGE SHARING
• Meaningful games to teach community norms
• Personalized rule recommendation
• Providing assistance, prompts, and examples to improve the quality of contributions
REMINISCENCE
• Current tools are too technically focused • Emphasize data capture and logging (photos, videos, scanned documents) • Treat memories as information to be later manipulated
• But the activity of reminiscence is actually... • Imprecise • Social • Nuanced
SMART PENSIEVE: WHAT MAKES A MEMORY MEANINGFUL?
• Content type • Photos, wall posts, status updates, event information
• Social dynamics • Tie strength, kind of relationship, amount of interaction
• Temporal features • Recent, distant past
TRIGGERING MEMORY
PAM, PANAS, ISS, MSCS
COLLECTING SURVEY DATA
• Laboratory setting
  • Pro: Can monitor participants & ensure data quality
  • Con: More time-consuming for researcher
  • Con: Higher pay rates
• Online surveys
  • Pro: Allow larger-scale collection
  • Pro: Cheaper (time & money)
  • Con: Drop-outs and missing responses common
IMPROVING SURVEY ADMINISTRATION
BEHAVIOR & HEALTH
• Assess sleep patterns & circadian rhythm
• Capture behavioral factors associated with stress
• Approach
  • Screen on/off
  • Unlocking
  • Application usage
  • Internet search
  • SMS, email, phone
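One way the screen on/off signal could serve as a sleep proxy is to treat the longest screen-off interval as a rough estimate of the sleep window. The sketch below illustrates only that assumption. The function name `longest_screen_off_gap`, the `(timestamp, state)` event format, and the sample events are hypothetical and are not taken from the project.

```python
from datetime import datetime, timedelta

def longest_screen_off_gap(events):
    """Return (start, end) of the longest screen-off interval, or None.

    `events` is a list of (timestamp, state) pairs where state is
    "on" or "off" (a hypothetical logging format).
    """
    best = None
    off_since = None
    for ts, state in sorted(events):
        if state == "off":
            # Keep the earliest "off" if several arrive in a row.
            if off_since is None:
                off_since = ts
        elif state == "on" and off_since is not None:
            if best is None or ts - off_since > best[1] - best[0]:
                best = (off_since, ts)
            off_since = None
    return best

# Made-up events for one night.
events = [
    (datetime(2014, 3, 1, 22, 50), "on"),
    (datetime(2014, 3, 1, 23, 10), "off"),
    (datetime(2014, 3, 2, 6, 45), "on"),
    (datetime(2014, 3, 2, 6, 50), "off"),
    (datetime(2014, 3, 2, 7, 30), "on"),
]
gap = longest_screen_off_gap(events)
sleep_estimate = gap[1] - gap[0]
```

A real pipeline would likely combine this with the other signals listed above (unlocking, application usage, messaging), since a phone left idle is not the same as a sleeping user.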
BEHAVIOR & HEALTH
• Scheduling Patterns
• Socially-Oriented Behaviors
• Approach
  • Calendar entries, social media posts, messages
  • Psycholinguistic Analysis
  • Personality Inventory
SMOKING CESSATION
• Leading cause of preventable death & leading form of chemical dependence in U.S.
• 44 million smokers in the U.S. alone (1/5 of population)
• 68.8% report they want to quit and over 50% have tried for at least 1 day in the past year
• Relapse common & a minority permanently abstain
INTERVENTION
• Requires tailoring to individual conditions
• Lack of long-term patient assessment & follow-up
• Access and affordability are obstacles
• Technology-based interventions have major shortcomings
  • Low adherence to established guidelines
  • Not personalized
  • Unable to handle user struggles and setbacks
FACTORS INFLUENCING OUTCOME
• Personal, psychological, emotional traits
• Behaviors & activities
• Environment and social interactions
• Cessation motivations and process
LEVERAGING DIGITAL FOOTPRINTS
• Naturally expressed language
• Content is posted spontaneously and regularly
• Social setting
• Low-cost, large-scale, longitudinal data access
MAKE A PREDICTION
General illness + coughing + wheezing = Today I quit smoking.
Just saw a cigarette commercial with people with holes in their throat. It's official. No more cigarettes.
Today, I quit smoking. My son came home with an ashtray he made in arts and crafts class. FML
• i’m cool, day 4 no cigs but my mom smokes, i stay with her, does not respect me trying to quit :\
• I quit smoking on Sunday evening. Day 3 today. I feel exhausted, annoyed, bored. But the fight must go on. Keep fighting :)
• somebody is getting punched in the f***ing mouth today. #coldturkey
METHODOLOGY & DATA COLLECTION
• Identify smokers
• Query Twitter firehose for cessation event tweets
• Sample 2000 users
• 3 Mechanical Turkers per tweet for verification
• 2 years' worth of tweets per verified smoker (1 year before cessation event, 1 year after)
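The verification and windowing steps above can be sketched as follows. This is an illustration only: the slides do not say how the three Turker judgments were combined, so a strict majority vote is assumed here, and the names `majority_label` and `tweets_in_window` along with the `(timestamp, text)` tweet format are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

def majority_label(labels):
    """Return the label chosen by a strict majority of annotators, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None

def tweets_in_window(tweets, cessation_time, days=365):
    """Split (timestamp, text) tweets into (before, after) lists.

    Only tweets within `days` of the cessation event are kept; the
    cessation tweet itself falls in the `after` list.
    """
    window = timedelta(days=days)
    before = [t for t in tweets
              if cessation_time - window <= t[0] < cessation_time]
    after = [t for t in tweets
             if cessation_time <= t[0] <= cessation_time + window]
    return before, after

# Made-up example: three annotator judgments for one candidate tweet.
verified = majority_label(["cessation", "cessation", "not_cessation"])
```

Tweets where no label reaches a majority return `None` and could be dropped or sent for additional judgments.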
MEASURES
Activity variables
• Tweet volume, burstiness, frequency
Social variables
• Friends, followers, tweets with @mentions, unique mentions
Personal & Emotional variables
• Location, sentiment intensity
Behavior Change Process variables
• Cessation date, motive to quit, treatment, stages of behavior change
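A few of the activity and social variables above could be computed roughly as shown below. The slides do not define burstiness or frequency, so this sketch assumes tweets per day for frequency and the Goh-Barabási burstiness coefficient, (σ − μ) / (σ + μ) over inter-tweet gaps, which ranges from −1 to 1 and is therefore not necessarily on the same scale as the values reported in this talk. The function names and input formats are hypothetical.

```python
import re
from datetime import datetime
from statistics import mean, pstdev

MENTION = re.compile(r"@(\w+)")

def activity_measures(timestamps):
    """Volume, tweets per day, and burstiness for one user's tweets."""
    ts = sorted(timestamps)
    volume = len(ts)
    if volume < 2:
        return {"volume": volume, "per_day": None, "burstiness": None}
    span_days = (ts[-1] - ts[0]).total_seconds() / 86400
    # Gaps between consecutive tweets, in seconds.
    gaps = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
    mu, sigma = mean(gaps), pstdev(gaps)
    burstiness = (sigma - mu) / (sigma + mu) if sigma + mu > 0 else None
    per_day = volume / span_days if span_days > 0 else None
    return {"volume": volume, "per_day": per_day, "burstiness": burstiness}

def social_measures(texts):
    """Count tweets containing @mentions and distinct mentioned handles."""
    handles = {m.lower() for text in texts for m in MENTION.findall(text)}
    with_mentions = sum(1 for text in texts if MENTION.search(text))
    return {"tweets_with_mentions": with_mentions,
            "unique_mentions": len(handles)}
```

Perfectly regular posting gives a burstiness of −1, while long silences broken by clusters of tweets push the value toward 1.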
RESPONSE VARIABLES
Outcome: Survival / Relapse
Survivors
• Congratulations to me, still smoke free :)
• @username nope i don’t smoke anymore
• first few weeks were hard but I haven’t craved a cig in months
Relapsers
• Day 26: Broke down and bought a pack of smokes last weekend. Smoked the last one today.
• Well, tried to quit smokin tobacco but..had a fucked up day
• So day 3 of not smoking is about to get cut short..i can’t do it lol
ALIGNMENT WITH CDC REPORTS
[Figure: Location]
Gender
          Men   Women
CDC       54%   46%
Twitter   59%   41%
[Figure: Abstinence Rates]
RESULTS
• Survivors (S) and Relapsers (R)
• Before (B) and After (A) the cessation point
SIGNIFICANT DIFFERENCES: ACTIVITY
          Tweets before  Tweets after  Burst before  Burst after  Freq before  Freq after
FAIL      1243           3551          10.119        10.943       3.56         2.704
SUCCEED   412            771           4.459         4.278        9.906        11.254
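The slides do not state which statistical test was used to establish these differences. As one generic possibility, a two-sided permutation test on the difference in group means is sketched below. The function name `permutation_test` and any per-user values passed to it are hypothetical; the table above reports only group-level figures.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns an estimated p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

In use, `group_a` and `group_b` would hold one value per user (for example, tweets posted before the cessation point) for relapsers and survivors respectively.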
TIME OF DAY
“outside the club and guy beside me smoking makes me wanna”
“im really considering smoking tonight bcause im so stressed”
SIGNIFICANT DIFFERENCES: SOCIAL
          Friends before  Friends after  Followers before  Followers after
FAIL      .093            .073           .074              .064
SUCCEED   .187            .207           .114              .125
“Starting the patch today. Everyone please support me on the road to quitting smoking”
“Ok I started a really big challenge yesterday... I quit smoking! I may need some help from you guys in the upcoming days/weeks”.
Day 2 of not smoking #bittersweet I quit smoking yesterday and everyone is pissing me off! Day 3 without a cig. Ooo I'm about to shoot someone
MOTIVES
PREDICTION
CONTRIBUTIONS
Theoretical contributions
• Goal setting
• Behavior change
Computational contributions
• Classification of smoking-relevant content
• Extraction of informative data features
• Modeling the process & predicting ultimate outcome
Design implications for intelligent intervention technologies
SUMMARY & CONCLUSION
• Advance our understanding of what our digital footprints reveal about us as humans
• Develop new computational techniques that can make sense of and utilize this data’s nuanced semantic, psychological, and behavioral dimensions
• Apply the resulting intelligent systems across multiple domains in order to help people use digital information and have meaningful experiences with technology
THANK YOU!
• Questions, comments, and guidance welcome!
Elizabeth L. Murnane [email protected]
www.cs.cornell.edu/~elm236/