Lynne Grewe and Sushmita Pandey
California State University, East Bay
The Goal
Using social data to make social advertisement recommendations.
[Figure: PPARS inside a social network application, linking the user and friends on the social network to advertisements, with a message such as "Your friends Nathan and Marty will like this"]
The Problems
• What is the social data?
• Which social data is usable/best?
• How do we capture and analyze it?
• How do we relate social data to advertisements?
• How do we deliver a social advertisement?
The Environment
Social networks: MySpace, Facebook, Hi5, Orkut, LinkedIn, Netlog, and more.
Overview of Talk
• PPARS overview
• Data: the problem of multiple networks
• Example data
• Parsing
• Quantization results
• Advertisement recommendation results
• Future work
Our System Overview
PPARS = Peer Pressure Advertisement Recommendation System
[Figure: data input feeds the front end, which gets the user's and friends' quantized data (user-origin); groups are processed and matched against quantized model ads in peer-pressure ad selection (group/ad matches & socialize), producing the user ad choice]
Social Data
Every network can provide different social data. There are two main splits: Facebook and OpenSocial (the majority of the others).
OpenSocial is an open standard adopted by over 30 containers and growing, with an international audience. It allows for "standardized" access.
• Popular containers: MySpace, LinkedIn, Google, Yahoo!, etc.
• Corporate support: Google, Yahoo!, IBM, Microsoft, and more.
Data Fields
About_Me, Activities, Addresses, Age, Body_Type, Books, Cars, Children, Current_Location, Date_Of_Birth, Drinker, Emails, Ethnicity, Fashion, Food, Gender, Happiest_When, Has_App, Heroes, Humor, ID, Interests, Job_Interests, Jobs, Languages_Spoken, Living_Arrangements, Looking_For, Movies, Music, Name, Network_Presence, Nickname, Pets, Phone, Political_Views, Profile_Song, Profile_Url, Profile_Video, Quotes, Relationship_Status, Religion, Romance, Scared_Of, Schools, Sexual_Orientation, Sports, Status, Tags, Thumbnail_Url, Time_Zone, Turn_Ons, Turn_Offs, TV_Shows, URLs
Some Example Data
About Me: "Ok, so I am a graduate of with degrees in Philosophy, and Religion. I currently live in with my wife and daughter. I enjoy Snowboarding/skiing, Motorcycles, computers, sports cars, and hanging out with friends."
Age: 33
Books: The Professor and the Madman, Plato, Aristotle, Locke, Hume, Kant, luscombe
Movies: Things to do in Denver when yer dead, The Departed, Encino Man, Real Genius
Music: Very Eclectic, including Pennywise, Disturbed, System of a Down, Linkin Park, Senses Fail, Mudvayne, Goldfinger, and a bunch of others I am sure I cannot remember at this time
Music: allen to // chimaira // sw1tched // bleed the sky // destiny // 40 below summer // endo // nothingface // enhancer // watcha // lamb of god // soilwork // skrape // flaw // unearth // slodust // deftones // raunchy // devildriver // reveille // american head charge // nonpoint // stutterfly // factory 81 // in flames // (hed) p.e. // dry kill logic // primer 55 // 36 crazyfists // sevendust // taproot // candiria // bionic jive // funeral for a friend // .....
Television: Smallville, heros
Interests: Snowboarding/skiing, Motorcycles, computers, sports cars, and hanging out with friends.
Status: Married
Status: In a Relationship
Smoker: No
Drinker: Yes
Heroes: Father
Heroes: "Hero position open, please send applications to me..." (translated from German)
Looking_for: Networking, Friends
Ethnicity: White / Caucasian
Children: Proud parent
Sexual_Orientation: Straight
Schools:
University Of Nevada-Reno, Reno, NV. Graduated: N/A. Degree: Master's Degree. Major: Hydrogeology.
2007 to Present
Purdue University-Main Campus, West Lafayette, Indiana. Graduated: 2003. Student status: Alumni. Degree: Bachelor's Degree. Major: Philosophy. Minor: CPT. Clubs: Purdue Student Government, Liberal Arts Student Council. Greek: Delta Chi.
2001 to 2003
Reed Hs, Sparks, NV. Graduated: N/A. Student status: Alumni. Degree: High School Diploma.
Social Data – Which?
• Not all networks provide access to the same data.
• Users can keep information private.
• Not all data is "social".
• Not all data is directly useful for advertisers.
[Figure: example fields for each problem. Not typically available / private: Current_Location, Date_Of_Birth, Addresses, Phone. Fields shown under "not social" and "not directly useful for advertisers": ID, Name, Has_App, Nickname, Network_Presence, Profile_Url, Profile_Song, Profile_Video, Thumbnail_Url, URLs; Drinker, Emails, Ethnicity, Fashion, Food]
Infrequent Data
Our scheme needs data that users have in common, so that we can reason over a common feature space. Data that is NOT frequent: Cars, Fashion, Food, Humor, Political Views, Pets, Heroes.
Social Data – Which?
First pass: selected on network availability and commonality, user prevalence, and estimated advertisement usefulness, balancing a small sample space against feature dimensionality:
About Me, Activities, Age, Gender, Books, TV, Music, Looking For, Drinker, Relationship, Ethnicity, Religion, Language, Interests, Date_Of_Birth, Smoker
PPARS – Front End
[Figure: user data and Friend 1 ... Friend X data pass through PARSING into individual social data tokens; QUANTIZATION, using codebooks, an ontology codebook, and web services, turns them into the set of user and friend quantized data vectors (user-origin). Example input: "I like cars, have 2 kids, ..... Movies: Star Wars, Age = 30 ....."]
Parsing
Create small social data tokens to pass to quantization.
Raw social data → null data test → hierarchical segmentation → individual social data tokens. The hierarchical segmentation splits, in order:
1. by . / ! / ?
2. by :
3. by -
4. by ;
5. by ,
Example input: "I like lots of movies. Like:Star Wars, Star Wars II, Jaws.And I love Harrison Fords acting."
Resulting tokens:
• I like lots of movies
• Like
• Star Wars
• Star Wars II
• Jaws
• And I love Harrison Fords acting
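The parsing pipeline can be sketched in Python as below. This is a minimal sketch, assuming the null data test just rejects empty input and each level of the hierarchical segmentation is a plain regular-expression split; `is_null`, `segment`, and `parse` are hypothetical names, not PPARS code.

```python
import re

# Delimiter levels, applied in the order listed on the Parsing slide.
DELIMITER_LEVELS = [r"[.!?]", r":", r"-", r";", r","]


def is_null(raw):
    """Null data test (assumed): None or whitespace-only input carries no social data."""
    return raw is None or not raw.strip()


def segment(text, level=0):
    """Split text at the current delimiter level, then recurse into each piece."""
    if level == len(DELIMITER_LEVELS):
        piece = text.strip()
        return [piece] if piece else []
    tokens = []
    for piece in re.split(DELIMITER_LEVELS[level], text):
        tokens.extend(segment(piece, level + 1))
    return tokens


def parse(raw):
    """Turn one raw social data field into individual social data tokens."""
    if is_null(raw):
        return []
    return segment(raw)


print(parse("I like lots of movies. Like:Star Wars, Star Wars II, Jaws.And I love Harrison Fords acting."))
# ['I like lots of movies', 'Like', 'Star Wars', 'Star Wars II', 'Jaws', 'And I love Harrison Fords acting']
```

Each level is a blind split, so a hyphenated name such as "Pac-Man" is also broken apart; this is the kind of case that syntax rules around delimiters would need to handle.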
Parsing Example
About Me input = "I work as an engineer at Motorola. I work in the peripherals department and do chip design. I am doing some management."
Resulting social data tokens:
• I work as an engineer at Motorola
• I work in the peripherals department and do chip design
• I am doing some management
Interests input = "Internet, Movies, Reading, Karaoke,Building alternate communities"
Resulting social data tokens:
• Internet
• Movies
• Reading
• Karaoke
• Language
• Building alternative communities
Music input = "Bands: Superdrag, Weezer, The Doors, The Beach Boys, Journey Solo Artists: Billy Joel, Albums: Appetite for Destruction - Guns & Roses; Blue - Weezer"
Resulting social data tokens:
• Bands
• Superdrag
• The Doors
• Cheap Trick
• The Beach Boys
• Journey Solo Artists
• Billy Joel
• Albums
• Appetite for Destruction
• Guns & Roses
• Blue
• Weezer
The line return between "Journey" and "Solo Artists" was lost in the input formatting, so "Journey Solo Artists" becomes a single token.
Parsing currently uses a simple segmentation technique. Future work: include the semantics of phrases to detect potential "headings", and syntax rules around delimiters like : and -.
Quantization
Take a social data token and translate it into a numerical feature vector, e.g. "I like cars" → Cars = 0.2.
• For each social data field, we need to create meaningful feature vector elements.
• For each social data field, we need techniques/algorithms that translate the raw social data token into evidence for its different feature vector elements.
Quantization – Feature Vector
Pattern recognition and matching are later parts of PPARS, and they need numerical representations of our user's and friends' social data, and also of the ads. The raw token "I like cars" says nothing about which ad to choose; once quantized as Cars = 0.2, it can be matched to an ad whose Cars element is around 0.2.
For each social data element, such as "About Me", "Gender", and "Movies", we have designed its own feature vector. Each one is the result of:
• the technique used to quantize the input social token data, and
• studying keywords/trends in a user database of sample social tokens.
To understand this, let's first discuss the techniques used to quantize social data tokens as they relate to the "type" of data element.
Quantization and Social Data Type
• Numerical data: data that is naturally numerical, e.g. age and date of birth. It can be quickly and effectively translated into a number in some defined range: an address can be translated into latitude and longitude; a phone number is limited in digits; a time zone has predefined ranges.
• Categorizable data: data with a predefined, accepted taxonomy (e.g. movies and their genres), or data for which categories can be derived through sample analysis and advertisement goals (e.g. interests, about me, food, fashion).
• Indexed data: data with defined sets of values specific to either the container or OpenSocial, e.g. smoker = yes, no, occasionally, quit, never. Other examples: gender, relationship, drinker, sexual orientation.
• Other: data for which we cannot easily derive a categorizing algorithm, e.g. profile image, profile song URL.
Collapsing of Data
Some data fields have almost the same meaning, or their content typically overlaps greatly:
• About Me and Interests (and even Status)
• Age and Date of Birth
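Numerical fields and the collapsed Age/Date_Of_Birth pair could be quantized along these lines. The sketch assumes min-max scaling over a hypothetical 13 to 80 age range, a preference for Age over Date_Of_Birth when both exist, and -1 as the "no data" value (the value the keyword-matching results use for no hits); all function names are illustrative.

```python
from datetime import date

NO_DATA = -1.0  # "no evidence" sentinel, as in the quantization results


def age_from_dob(dob, today):
    """Whole years between a date of birth and today."""
    years = today.year - dob.year
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1  # birthday not reached yet this year
    return years


def scale(value, low, high):
    """Min-max scale value into [0, 1], clipping anything outside [low, high]."""
    return min(max((value - low) / (high - low), 0.0), 1.0)


def quantize_age(age=None, dob=None, today=None, low=13, high=80):
    """Collapse Age and Date_Of_Birth into one element (hypothetical 13-80 range)."""
    if age is None and dob is not None:
        age = age_from_dob(dob, today or date.today())
    if age is None:
        return NO_DATA
    return scale(age, low, high)


def collapse_text_fields(*token_lists):
    """Merge tokens from overlapping fields (About Me, Interests, Status), dropping case-insensitive duplicates."""
    seen, merged = set(), []
    for tokens in token_lists:
        for token in tokens:
            key = token.lower()
            if key not in seen:
                seen.add(key)
                merged.append(token)
    return merged
```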
Categorizable Data
This is the bulk of the data fields: About Me, Interests, Music, Movies, TV, Books, Looking For, Religion, Ethnicity, Language.
Determining the feature elements:
• accepted "standard" taxonomies
• web service taxonomies
• advertisement-driven taxonomies
Categorization: Web Service
For some of our social data fields, we can use popular web services to convert our social data tokens into search hits that have categorized information associated with them.
Example: Internet Video Archive (IVA) and IMDB, using movie genre.
IVA – movie search by actor "Robert Redford":
http://api.internetvideoarchive.com/Video/MoviesByActorName.aspx?DeveloperId=f377f57f-3bad-4704-8e80-1b643b206abd&SearchTerm=Robert+Redford
Some of the results:
<item>
  <Description>
    <![CDATA[ The Unforeseen movie trailer - starring Robert Redford, Willie Nelson, Ann Richards, Gary Bradley, Judah Folkman, William Greider. Directed by Laura Dunn. Theatrical Release Date: 2/29/2008 Genre: Documentary Rating: Not Rated ]]>
  </Description>
  <Title>THE UNFORESEEN</Title>
  <Language>English</Language>
  <Country>United States</Country>
  <SiteUrl />
  <Studio>Two Birds Films</Studio>
  <StudioID>3018</StudioID>
  <Rating>Not Rated</Rating>
  <Genre>Documentary</Genre>
  <GenreID>13</GenreID>
  <HomeVideoReleaseDate>9/16/2008</HomeVideoReleaseDate>
  <TheatricalReleaseDate>2/29/2008</TheatricalReleaseDate>
  <Director>Laura Dunn</Director>
  <DirectorID>36635</DirectorID>
  <Actor1>Robert Redford</Actor1>
  <ActorId1>7105</ActorId1>
  <Actor2>Willie Nelson</Actor2>
  <ActorId2>8591</ActorId2>
  <Actor3>Ann Richards</Actor3>
  <ActorId3>36642</ActorId3>
  <Actor4>Gary Bradley</Actor4>
  <ActorId4>36637</ActorId4>
  ...
  <Link>http://videodetective.com/titledetails.aspx?publishedid=947964</Link>
  <BoxOfficeInMillions>-1</BoxOfficeInMillions>
  <!-- Television Content -->
  <AirDayOfWeek>-1</AirDayOfWeek>
  <AirStartTime />
  <ShowLengthInMinutes>-1</ShowLengthInMinutes>
  <IsTelevisionContent>false</IsTelevisionContent>
  <FirstReleasedYear>2008</FirstReleasedYear>
  <Image>http://content.internetvideoarchive.com/content/photos/1250/05253626_.jpg</Image>
  <Duration>164</Duration>
  <DateCreated>3/20/2008 8:00:00 AM</DateCreated>
  <Media>Movie</Media>
  <PublishedId>947964</PublishedId>
  <DateModified>4/22/2011 1:57:00 PM</DateModified>
  ... and more
We select the <Genre> element.
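Building the actor-search request and pulling out the selected genre might look like this. The endpoint and the `DeveloperId`/`SearchTerm` parameters are copied from the URL above; the `<items>` root element in the sample response, the placeholder developer ID, and the function names are assumptions. Only the Python standard library is used, and the network call itself is omitted.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint and parameter names copied from the slide's example URL.
IVA_ACTOR_SEARCH = "http://api.internetvideoarchive.com/Video/MoviesByActorName.aspx"


def actor_search_url(actor, developer_id):
    """Build the actor-search URL; urlencode turns spaces into '+'."""
    query = urlencode({"DeveloperId": developer_id, "SearchTerm": actor})
    return f"{IVA_ACTOR_SEARCH}?{query}"


def genres_from_response(xml_text):
    """Collect the text of every <Genre> element (not <GenreID>) in a response."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("Genre") if el.text]


# Trimmed, hypothetical response: the <items> wrapper is assumed, not shown on the slide.
SAMPLE = """<items>
  <item><Title>THE UNFORESEEN</Title><Genre>Documentary</Genre><GenreID>13</GenreID></item>
</items>"""

print(actor_search_url("Robert Redford", "YOUR-DEVELOPER-ID"))
# http://api.internetvideoarchive.com/Video/MoviesByActorName.aspx?DeveloperId=YOUR-DEVELOPER-ID&SearchTerm=Robert+Redford
print(genres_from_response(SAMPLE))
# ['Documentary']
```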
IVA Genres: Our Movie Feature Elements
VideoCategory: Not Assigned, Western, Action-Adventure, Children's, Comedy, Drama, Family, Horror, Musical, Mystery-Suspense, Non-Fiction, Sci-Fi, War, Health/Workout, Documentary, Thriller, Biography, Romance
Movie Quantization
For each social data token, such as "Adam Sandler" or "Star Wars", we can get multiple hits.
Example: "Robert Redford", first 8 hits: Drama = 5, Western = 1, Documentary = 2.
Issues:
• How do we know whether the token is an actor name, a movie title, a director, or something else?
• Multiple hits for an actor or director: what do we do? (Evidence them all.)
• Multiple hits for a movie title: what do we do? (Take the first hit.)
These genres become our movie feature elements.
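One way to turn genre hits into movie feature values is to add a fixed evidence weight per hit. The slides do not state the rule; `weight_per_hit=0.2` is an assumption, chosen because the Quantization Result 1 values are all multiples of 0.2. `feature_name` is a hypothetical helper that produces names like MOVIE_SCIFI.

```python
import re
from collections import Counter


def feature_name(genre):
    """Map an IVA genre label to a feature element name, e.g. 'Sci-Fi' -> 'MOVIE_SCIFI'."""
    return "MOVIE_" + re.sub(r"[^A-Z]", "", genre.upper())


def genre_evidence(genre_hits, weight_per_hit=0.2):
    """Accumulate a fixed (assumed) evidence weight for every search hit of each genre."""
    counts = Counter(genre_hits)
    return {feature_name(g): round(n * weight_per_hit, 2) for g, n in counts.items()}


# "Robert Redford", first 8 hits from the slide: Drama = 5, Western = 1, Documentary = 2
hits = ["Drama"] * 5 + ["Western"] + ["Documentary"] * 2
print(genre_evidence(hits))
# {'MOVIE_DRAMA': 1.0, 'MOVIE_WESTERN': 0.2, 'MOVIE_DOCUMENTARY': 0.4}
```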
Order of Movie Quantization
Given any social data element parsed from the user's MOVIE data, we cannot know a priori whether it is a title, an actor's name, or a director's name. It may even be the genre of movies the user likes. We therefore try, in order:
1. Title search (take first hit)
2. Actor search (evidence all hits)
3. Director search (evidence all hits)
4. Keyword matching (described next)
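The search order can be expressed as a cascade of lookups. In this sketch the four lookups are passed in as functions; the dictionary-backed stubs stand in for the real IVA/IMDB calls and keyword database and are hypothetical, as is the choice to stop at the first lookup that returns any hits.

```python
from typing import Callable, List

# Each search returns one genre label per hit, best hit first.
Search = Callable[[str], List[str]]


def quantize_movie_token(token: str, title_search: Search, actor_search: Search,
                         director_search: Search, keyword_lookup: Search) -> List[str]:
    """Try each lookup in the slide's order and stop at the first that returns hits."""
    hits = title_search(token)
    if hits:
        return hits[:1]  # title: take the first hit only
    for search in (actor_search, director_search):
        hits = search(token)
        if hits:
            return hits  # actor/director: evidence all hits
    return keyword_lookup(token)  # last resort: keyword matching


# Hypothetical stand-ins for the real web service calls.
TITLES = {"rear window": ["Mystery-Suspense", "Drama"]}
ACTORS = {"robert redford": ["Drama", "Drama", "Western"]}
DIRECTORS = {"laura dunn": ["Documentary"]}


def title_search(token):
    return TITLES.get(token.lower(), [])


def actor_search(token):
    return ACTORS.get(token.lower(), [])


def director_search(token):
    return DIRECTORS.get(token.lower(), [])


def keyword_lookup(token):
    return ["Drama"] if "drama" in token.lower() else []


def quantize(token):
    return quantize_movie_token(token, title_search, actor_search,
                                director_search, keyword_lookup)
```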
Quantization Result 1
Input: Up, Forrest Gump, Rear Window, District 9, Pac-Man, WALL·E, My Flesh and Blood, MacMusical
Yields: MOVIE_FAMILY = 0.6, MOVIE_SCIFI = 0.2, MOVIE_DOCUMENTARY = 0.4, MOVIE_THRILLER = 0.2
Quantization Using Other Services
• TV: IMDB, http://www.imdb.com/search/title?title_type=tv_series&title=
• Books: Google Books Search, http://books.google.com/books/feeds/volumes?
• Music: IVA's music API, http://api.internetvideoarchive.com/Music/**
Quantization via Keyword Matching
What do we do when there is no predetermined taxonomy and no service to provide database hits? This calls for natural language processing techniques. We currently employ a simple (but effective and efficient) technique of keyword matching/lookup:
• create a database of predetermined phrases/keywords;
• use a lookup scheme to quantize the social data token(s).
[Figure: individual social data tokens are looked up in the codebooks (ontology codebook) to produce the set of user and friend quantized data vectors]
Examples: "I work as an engineer" → About Me lookup? "Watch a lot of drama" → Movies lookup?
Keyword Database
Used on: About Me/Interests, Religion, Ethnicity, Looking For, Language, Relationship.
Secondary use: Books, TV, Music, Movies, when the web service fails to provide any hits.
Keyword Database Creation
• manual scanning of hundreds of user profiles (at the starting level)
• domain-specific expert (human) knowledge
• dictionaries and taxonomies, when they exist
Issue: how do we determine the weight of every entry? Either expert-determined (consistency) or all equal (no sense of importance).
Issue: at this very beginning level, can we create a dictionary for everything? No. Are there more advanced NLP techniques?
Some Arbitrary Keyword DB Entries (field, category, keyword, weight)
ABOUT_ME HOME Cats 0.2
ABOUT_ME HOME Children 0.2
ABOUT_ME HOME Daughter 0.2
ABOUT_ME HOME Dog 0.2
ABOUT_ME HOME home 0.5
ABOUT_ME ENTERTAINMENT Shopping 0.2
ABOUT_ME ENTERTAINMENT Shows 0.2
ABOUT_ME ENTERTAINMENT Sing 0.2
ABOUT_ME ENTERTAINMENT Ski 0.2
ABOUT_ME ENTERTAINMENT Songwriter 0.2
Keyword DB – Evidence Weight
System options: the DB weights can take on different (expert-determined) values, or the system can run with all weights equal.
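A keyword database in the four-column layout of the example entries (field, category, keyword, weight) could be loaded as follows, with the equal-weights option as a flag. The column reading is inferred from the example entries; `load_keyword_db` and the default `equal_value=0.2` are hypothetical.

```python
def load_keyword_db(lines, equal_weights=False, equal_value=0.2):
    """Parse 'FIELD CATEGORY KEYWORD WEIGHT' lines into {(keyword, element): weight}.

    Repeated lines collapse into one entry. With equal_weights=True every entry
    gets equal_value, i.e. no sense of importance between keywords.
    """
    db = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        field, category = parts[0], parts[1]
        keyword = " ".join(parts[2:-1]).lower()  # keyword may span several words
        weight = equal_value if equal_weights else float(parts[-1])
        db[(keyword, f"{field}_{category}")] = weight
    return db


ENTRIES = [
    "ABOUT_ME HOME Cats 0.2",
    "ABOUT_ME HOME Cats 0.2",  # a repeated entry collapses into one
    "ABOUT_ME HOME home 0.5",
    "ABOUT_ME ENTERTAINMENT Ski 0.2",
]
```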
At this beginning level we cannot create a dictionary for everything, so more advanced NLP techniques remain to be explored for inferences.
While users can write anything (and do), remember that we are focused on advertisement recommendation, so the scope of our language is limited to hits related to our feature vector elements. This is a constrained problem: home, entertainment, smoking, work, social, movies, TV, shopping, books, etc. are the kinds of areas we are concerned with.
Types of Keyword Matching
• STRICT: the social data token must exactly match a DB entry. "Drama" vs. Drama: match. "I like Drama" vs. Drama: no match.
• DB_ENTRY_CONTAINS_DATA_ELEMENT: the data token must exist inside the DB entry. "Drama" vs. Drama and Comedy: match.
• DB_ENTRY_PARTOF_DATA_ELEMENT: part of the data token matches the DB entry (this further segments the data token). "I like Drama" vs. Drama: match.
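The three matching modes can be written as one predicate. This sketch assumes case-insensitive comparison and whole-word matching for the two containment modes (so "Dramatic" does not match "Drama"); the slides do not specify either detail, and `contains_phrase`/`matches` are hypothetical names.

```python
import re
from enum import Enum


class MatchMode(Enum):
    STRICT = "STRICT"
    DB_ENTRY_CONTAINS_DATA_ELEMENT = "DB_ENTRY_CONTAINS_DATA_ELEMENT"
    DB_ENTRY_PARTOF_DATA_ELEMENT = "DB_ENTRY_PARTOF_DATA_ELEMENT"


def contains_phrase(text, phrase):
    """True if phrase occurs in text as whole words (case-insensitive)."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def matches(token, entry, mode):
    """Does a social data token match one keyword DB entry under the given mode?"""
    token, entry = token.strip(), entry.strip()
    if not token or not entry:
        return False
    if mode is MatchMode.STRICT:
        return token.lower() == entry.lower()
    if mode is MatchMode.DB_ENTRY_CONTAINS_DATA_ELEMENT:
        return contains_phrase(entry, token)  # token appears inside the DB entry
    return contains_phrase(token, entry)      # DB entry appears inside the token
```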
Quantization Results: Different Kinds of Keyword Matching
Input: 'I am a student and I work and love cars'
STRICT output: no hits
ABOUT_ME_ENTERTAINMENT = -1, ABOUT_ME_WORK = -1, ABOUT_ME_HOME = -1, ABOUT_ME_SOCIAL = -1, ABOUT_ME_FOOD = -1
DB_ENTRY_CONTAINS_DATA_ELEMENT output: no hits
ABOUT_ME_ENTERTAINMENT = -1, ABOUT_ME_WORK = -1, ABOUT_ME_HOME = -1, ABOUT_ME_SOCIAL = -1, ABOUT_ME_FOOD = -1
DB_ENTRY_PARTOF_DATA_ELEMENT output:
• keyword = student → ABOUT_ME_WORK = 0.2
• keyword = work → ABOUT_ME_WORK = 0.5
• keyword = cars → ABOUT_ME_ENTERTAINMENT = 0.2
• keyword = love → ABOUT_ME_HOME = 0.2, ABOUT_ME_SOCIAL = 0.2
Final: ABOUT_ME_ENTERTAINMENT = 0.2, ABOUT_ME_WORK = 0.7, ABOUT_ME_HOME = 0.2, ABOUT_ME_SOCIAL = 0.2, ABOUT_ME_FOOD = -1
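The DB_ENTRY_PARTOF_DATA_ELEMENT result above is consistent with summing the weights of every matching keyword per element and leaving -1 where nothing matched. A sketch under that assumption, using only the five keyword entries implied by this example (the real keyword database is larger and not shown; `quantize_about_me` is a hypothetical name):

```python
import re

ABOUT_ME_ELEMENTS = ["ABOUT_ME_ENTERTAINMENT", "ABOUT_ME_WORK", "ABOUT_ME_HOME",
                     "ABOUT_ME_SOCIAL", "ABOUT_ME_FOOD"]

# Entries inferred from the example output, not the actual PPARS database.
KEYWORD_DB = [
    ("student", "ABOUT_ME_WORK", 0.2),
    ("work", "ABOUT_ME_WORK", 0.5),
    ("cars", "ABOUT_ME_ENTERTAINMENT", 0.2),
    ("love", "ABOUT_ME_HOME", 0.2),
    ("love", "ABOUT_ME_SOCIAL", 0.2),
]

NO_EVIDENCE = -1.0


def quantize_about_me(tokens):
    """Sum keyword weights per element over all tokens; -1 means no hits."""
    vector = {name: NO_EVIDENCE for name in ABOUT_ME_ELEMENTS}
    for token in tokens:
        text = token.lower()
        for keyword, element, weight in KEYWORD_DB:
            # DB entry must appear inside the token as a whole word (PARTOF mode).
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                if vector[element] == NO_EVIDENCE:
                    vector[element] = 0.0  # first evidence replaces the sentinel
                vector[element] += weight
    return {name: round(value, 2) for name, value in vector.items()}


print(quantize_about_me(["I am a student and I work and love cars"]))
# {'ABOUT_ME_ENTERTAINMENT': 0.2, 'ABOUT_ME_WORK': 0.7, 'ABOUT_ME_HOME': 0.2, 'ABOUT_ME_SOCIAL': 0.2, 'ABOUT_ME_FOOD': -1.0}
```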
Quantization Results 2 – Using DB_ENTRY_PARTOF_DATA_ELEMENT
Input: "Fell in love with computers at 11, never got over it... Nonetheless, I have always understood that human problems are solved by people, not technology. My lifes work has been to empower communities to design and build their own solutions."
Parsing produces 6 data tokens. Results:
ABOUT_ME_ENTERTAINMENT = 0.2, ABOUT_ME_WORK = 0.5, ABOUT_ME_HOME = 0.2, ABOUT_ME_SOCIAL = 0.2, ABOUT_ME_FOOD = -1
Quantization Result 3 – Good Null Results
Input: "i am xing ju. test ABOUT ME for opensocial."
Parsed results: "i am xing ju", "test ABOUT ME for opensocial"
No keyword DB hits: ABOUT_ME_ENTERTAINMENT = -1, ABOUT_ME_WORK = -1, ABOUT_ME_HOME = -1, ABOUT_ME_SOCIAL = -1, ABOUT_ME_FOOD = -1
Quantization Results: Garbage In, Garbage Out
"LoL really dude that is the way to be" produces no hits.
Is this garbage? If "LoL" = "lots of love", could this be interpreted as someone interested in social/friends? Future work: deeper interpretation/semantic analysis.
Indexed
Smoker, Drinker, Gender, Relationship (some networks), Looking For (some networks), etc.
Example for Drinker:
opensocial.Enum.Drinker.HEAVILY
opensocial.Enum.Drinker.NO
opensocial.Enum.Drinker.OCCASIONALLY
opensocial.Enum.Drinker.QUIT
opensocial.Enum.Drinker.QUITTING
opensocial.Enum.Drinker.REGULARLY
opensocial.Enum.Drinker.SOCIALLY
opensocial.Enum.Drinker.YES
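An indexed field maps each enumerated value to a number. The OpenSocial Drinker values below come from the list above, but the numbers assigned to them are hypothetical, as is the -1 fallback for missing or unrecognized values.

```python
# Hypothetical ordinal scale: the slides list the enum values but not their numbers.
DRINKER_SCALE = {
    "NO": 0.0, "QUIT": 0.1, "QUITTING": 0.2, "OCCASIONALLY": 0.4,
    "SOCIALLY": 0.5, "YES": 0.7, "REGULARLY": 0.8, "HEAVILY": 1.0,
}
PREFIX = "opensocial.Enum.Drinker."
NO_DATA = -1.0  # same "no evidence" sentinel as the keyword-matching results


def quantize_drinker(value):
    """Map an OpenSocial Drinker enum (full name or bare key) to one feature value."""
    if not value:
        return NO_DATA
    key = value[len(PREFIX):] if value.startswith(PREFIX) else value
    return DRINKER_SCALE.get(key.strip().upper(), NO_DATA)
```

A one-hot encoding (one feature element per enum value) avoids imposing an order and is an equally reasonable choice.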
Quantized Feature Vector
107 elements, normalized to (near) 0 to 1.0.
Advertisement Description
Experts manually determine the feature vector weighting for each ad.
Future work:
• automate this from a survey or from input provided directly by the advertiser;
• is there a way to analyze the ad message or image (image understanding)? Will the results even match the advertiser's goals?
PPARS – Advertisement Matching
This is not the focus of this talk. We are currently trying variations on KNN with different forms of clustering. Early results with a small advertising database and a beginning keyword database look good.
Open questions: what kinds of groups? Groups with the user in them or not? Based only on in-common feature elements or not?
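The slides say only that matching uses variations on KNN with clustering. As a minimal sketch, assuming a group is summarized by the mean of its members' vectors and each ad by its expert-weighted vector, a plain k-nearest-neighbour search over the ads could look like this. Skipping elements where the group has no data (-1) is one possible answer to the "only in-common feature elements" question, not necessarily the one PPARS uses; the ad names and vectors in the usage are hypothetical.

```python
import math

MISSING = -1.0  # "no evidence" value from quantization


def group_centroid(vectors, missing=MISSING):
    """Element-wise mean over members that have data; missing if nobody does."""
    centroid = []
    for values in zip(*vectors):
        present = [v for v in values if v != missing]
        centroid.append(sum(present) / len(present) if present else missing)
    return centroid


def knn_ads(group_vector, ad_vectors, k=3, missing=MISSING):
    """Return the names of the k ads closest (Euclidean) to the group vector.

    Only elements where the group has data contribute to the distance.
    """
    used = [i for i, v in enumerate(group_vector) if v != missing]

    def distance(ad):
        return math.sqrt(sum((group_vector[i] - ad[i]) ** 2 for i in used))

    ranked = sorted(ad_vectors.items(), key=lambda item: distance(item[1]))
    return [name for name, _ in ranked[:k]]


ads = {"car_ad": [0.2, 0.1, 0.9], "job_site": [0.0, 0.8, 0.0], "pet_food": [0.9, 0.0, 0.0]}
group = group_centroid([[0.2, 0.5, -1.0], [0.2, 0.9, -1.0]])
print(knn_ads(group, ads, k=2))
# ['job_site', 'car_ad']
```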
PPARS – Advertisement Delivery
Effective delivery of the "social message" related to the selected ad could be an area of future work; for now we use a simple form of direct delivery.
Example 1: the grouping is based on same gender and age, and strong likes in interests related to home.
Example 2: the grouping is based on same gender and age, and drinking. This is a grouping the user is not part of (only friends):
"Your friends Nathan and Marty will like this"
Example 3: here the grouping is "loose", related only by gender and very loosely by age, so the advertisement match is not great.
Question: should we serve ads only to "strong" groups?
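Direct delivery with a social message can be sketched as grouping friends on a few shared attributes and naming them in the message. The grouping key (gender, a 10-year age bracket, drinker) mirrors the drinking example; the field names, bracket width, and message template are assumptions.

```python
from collections import defaultdict


def group_friends(friends, bracket_years=10):
    """Group friends sharing gender, age bracket, and drinker value (hypothetical key)."""
    groups = defaultdict(list)
    for friend in friends:
        key = (friend["gender"], friend["age"] // bracket_years, friend["drinker"])
        groups[key].append(friend["name"])
    return dict(groups)


def social_message(names):
    """Format the peer-pressure message for one group."""
    if len(names) == 1:
        return f"Your friend {names[0]} will like this"
    return f"Your friends {', '.join(names[:-1])} and {names[-1]} will like this"


friends = [
    {"name": "Nathan", "gender": "male", "age": 31, "drinker": "YES"},
    {"name": "Marty", "gender": "male", "age": 35, "drinker": "YES"},
    {"name": "Ann", "gender": "female", "age": 30, "drinker": "NO"},
]
groups = group_friends(friends)
print(social_message(groups[("male", 3, "YES")]))
# Your friends Nathan and Marty will like this
```

A smaller `bracket_years` gives tighter groups, in the spirit of the "strong" versus "loose" groupings above.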
Analysis of Advertisement Results
Groupings are tight when the data allows. Matches to advertisements at each level (best, top 10, etc.) are correct.
Future Work
• Parsing: more syntax and semantics (NLP).
• Parsing: handling differences between languages.
• Quantization: extend to natural language understanding in addition to, or as a replacement for, keyword matching; study the effects of different evidence accumulation.
• Data extrapolation: use inference to create hits in more feature elements.