Lynne Grewe and Sushmita Pandey California State University East Bay [email protected]


The Goal

Using social data to make social advertisement recommendations.

[Figure: PPARS, the social network and its social network application, the user and friends, and advertisements; example delivered message: "Your friends Nathan and Marty will like this"]

The Problems

• What is the social data?
• Which social data is usable/best?
• How do we capture and analyze it?
• How do we relate social data to advertisements?
• How do we deliver a social advertisement?

The Environment

Social networks: MySpace, Facebook, Hi5, Orkut, LinkedIn, Netlog, and more.

Overview of Talk

• PPARS overview
• Data: the problem of multiple networks
• Example of data
• Parsing
• Quantization
• Results
• Advertisement recommendation results
• Future work

Our System Overview

PPARS = Peer Pressure Advertisement Recommendation System

[Figure: PPARS system diagram. Blocks: Data Input; Front End (get user-friends quantized data); Peer-Pressure Ad Selection (process groups; group/ad matches and socialize; model ads); labels: user-origin, quantized, ad, user ad choice.]

Social Data

Every network can provide different social data. There are two main splits: Facebook and OpenSocial (used by the majority of the others).

OpenSocial is an open standard adopted by over 30 containers and growing, with an international audience. It allows "standardized" access. Popular containers include MySpace, LinkedIn, Google, Yahoo!, etc. Corporate support comes from Google, Yahoo!, IBM, Microsoft, and more.

Data Fields

About Me, Activities, Addresses, Age, Body_type, Books, Cars, Children, Current_Location, Date_Of_Birth, Drinker, Emails, Ethnicity, Fashion, Food, Gender, Happiest_when, Has_app, Heroes, Humor, ID, Interests, Job_interests, Jobs, Languages_Spoken, Living_Arrangements, Looking_for, Movies, Music, Name, Network Presence, Nick Name, Pets, Phone, Political Views, Profile song, Profile url, Profile video, Quotes, Relationship status, Religion, Romance, Scared Of, Schools, Sexual Orientation, Sports, Status, Tags, Thumbnail Url, Time Zone, Turn Ons, Turn Offs, TV Shows, URLs

Some Example Data

About Me: Ok, so I am a graduate of with degrees in Philosophy, and Religion. I currently live in with my wife and daughter. I enjoy Snowboarding/skiing, Motorcycles, computers, sports cars, and hanging out with friends.

Age: 33

Books: The Professor and the Madman, Plato, Aristotle, Locke, Hume, Kant, luscombe

Movies: Things to do in Denver when yer dead, The Departed, Encino Man, Real Genius

Music: Very Eclectic, including Pennywise, Disturbed, System of a Down, Linkin Park, Senses Fail, Mudvayne, Goldfinger, and a bunch of others I am sure I cannot remember at this time

Music: allen to // chimaira // sw1tched // bleed the sky // destiny // 40 below summer // endo // nothingface // enhancer // watcha // lamb of god // soilwork // skrape // flaw // unearth // slodust // deftones // raunchy // devildriver // reveille // american head charge // nonpoint // stutterfly // factory 81 // in flames // (hed) p.e. // dry kill logic // primer 55 // 36 crazyfists // sevendust // taproot // candiria // bionic jive // funeral for a friend // .....

Television: Smallville, heros

Interests: Snowboarding/skiing, Motorcycles, computers, sports cars, and hanging out with friends.

Status: Married
Status: In a Relationship
Smoker: No
Drinker: Yes
Heroes: Father
Heroes: "Position of hero available, applications please to me..." (originally in German)
Looking_for: Networking, Friends
Ethnicity: White / Caucasian
Children: Proud parent
Sexual_Orientation: Straight

Schools:
University Of Nevada-Reno, Reno, NV. Graduated: N/A. Degree: Master's Degree. Major: Hydrogeology. 2007 to Present
Purdue University-Main Campus, West Lafayette, Indiana. Graduated: 2003. Student status: Alumni. Degree: Bachelor's Degree. Major: Philosophy. Minor: CPT. Clubs: Purdue Student Government Liberal Arts Student Council. Greek: Delta Chi. 2001 to 2003
Reed Hs, Sparks, NV. Graduated: N/A. Student status: Alumni. Degree: High School Diploma

Social Data – Which?

• Not all networks provide access to the same data.
• Users can keep information private.
• Not all data is "social".
• Not all data is directly useful for advertisers.

[Figure: data fields grouped by the issues above]

Not typically available / private: Current_Location, Date_Of_Birth, Addresses, Phone

Not "social" or not directly useful for advertisers: ID, Name, Has_app, Nick_Name, Network Presence, Profile url, Profile song, Profile video, Thumbnail URL, URLs

Infrequent Data

Our scheme needs data that users have in common, so that we can reason over a common feature space. Data that is NOT frequent: Cars, Fashion, Food, Humor, Political Views, Pets, Heroes.

Social Data – Which?

First pass: fields chosen based on network availability and commonality, user prevalence, and estimated advertisement usefulness, balancing a small sample space against feature dimensionality.

Selected fields: About Me, Activities, Age, Gender, Books, TV, Music, Looking For, Drinker, Relationship, Ethnicity, Religion, Language, Interests, Date_Of_Birth, Smoker

PPARS – Front End

[Figure: PPARS front end. User data and friend data (Friend 1 ... Friend X), e.g. "I like cars, have 2 kids, ... Movies: Star Wars, Age = 30 ...", pass through PARSING into individual social data tokens, then QUANTIZATION (using codebooks, an ontology codebook, and web services) into a set of user and friend quantized data vectors; label: user-origin.]

Parsing

Create small social data tokens to pass to Quantization.

[Figure: parsing pipeline. Raw social data passes a null data test, then hierarchical segmentation: split by . / ! / ?, then by :, then by -, then by ;, then by , giving individual social data tokens.]

Example input: "I like lots of movies. Like:Star Wars, Star Wars II, Jaws.And I love Harrison Fords acting."

Resulting tokens:
• I like lots of movies
• Like
• Star Wars
• Star Wars II
• Jaws
• And I love Harrison Fords acting.
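The hierarchical segmentation can be sketched as a recursive split in Python. This is a minimal sketch under assumptions, not the PPARS implementation: the null data test is taken to mean missing or whitespace-only input, and the names `SPLIT_LEVELS`, `segment` and `parse_field` are hypothetical. On the example above it returns the same six tokens, without the final period.

```python
import re

# Delimiter levels in the order shown in the figure:
# sentence ends first, then ":", "-", ";" and finally ",".
SPLIT_LEVELS = [r"[.!?]", r":", r"-", r";", r","]


def is_null(raw):
    """Null data test (assumed: missing, empty or whitespace-only input)."""
    return raw is None or not raw.strip()


def segment(text, level=0):
    """Split text on the current delimiter level, then recurse into each piece."""
    if level == len(SPLIT_LEVELS):
        return [text.strip()]
    tokens = []
    for piece in re.split(SPLIT_LEVELS[level], text):
        if piece.strip():  # drop empty pieces, e.g. after a final "."
            tokens.extend(segment(piece, level + 1))
    return tokens


def parse_field(raw):
    """Turn one raw social data field into individual social data tokens."""
    return [] if is_null(raw) else segment(raw)


example = ("I like lots of movies. Like:Star Wars, Star Wars II, Jaws."
           "And I love Harrison Fords acting.")
print(parse_field(example))
```

Splitting on a bare "-" also breaks hyphenated names such as "Pac-Man" or "Sci-Fi"; a variant could split only on " - " surrounded by spaces.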

Parsing Example

About Me input = "I work as an engineer at Motorola. I work in the peripherals department and do chip design. I am doing some management."

Resulting social data tokens:
• I work as an engineer at Motorola
• I work in the peripherals department and do chip design
• I am doing some management

Parsing Example

Interests input = "Internet, Movies, Reading, Karaoke,Building alternate communities"

Resulting social data tokens:
• Internet
• Movies
• Reading
• Karaoke
• Building alternate communities

Parsing Example

Music input = "Bands: Superdrag, Weezer, The Doors, The Beach Boys, Journey Solo Artists: Billy Joel, Albums: Appetite for Destruction - Guns & Roses; Blue - Weezer"

Resulting social data tokens:
• Bands
• Superdrag
• The Doors
• Cheap Trick
• The Beach Boys
• Journey Solo Artists
• Billy Joel
• Albums
• Appetite for Destruction
• Guns & Roses
• Blue
• Weezer

Note: the line return between "Journey" and "Solo Artists" was lost, so they merge into a single token.

Parsing

A simple segmentation technique. Future work: include the semantics of phrases to detect potential "headings", and syntax rules around delimiters like : and -.

Quantization

Take a social data token and translate it into a numerical feature vector, e.g. "I like cars" gives Cars = 0.2.

For each social data field we need to create meaningful feature vector elements.

For each social data field we need techniques/algorithms that translate the raw social data token into support for its different feature vector elements.

Quantization – Feature Vector

Pattern recognition and matching are later parts of PPARS. For these we need numerical representations of our user and friend social data, and also of the ads.

"I like cars" = ??? Which ad??
Cars = 0.2: an ad with cars around 0.2

Quantization – Feature Vector

For each social data element, such as "About Me", "Gender", and "Movies", we have designed its own feature vector. Each feature vector is the result of:
• the technique used to quantize the input social data tokens
• studying keywords/trends in a user database of sample social tokens.

To understand this, let's first discuss the techniques used to quantize social data tokens as they relate to the "type" of data element.

Quantization and Social Data Type

Numerical Data
Data that is naturally numerical, e.g. Age and Date of Birth. It can be quickly and effectively translated into a number in some defined range:
• Address: can be translated into latitude and longitude
• Phone: limited number of digits
• Time zone: predefined ranges

Categorizable Data
Data with a predefined, accepted taxonomy, e.g. movies and their genres; or data for which categories can be derived through sample analysis and advertisement goals, e.g. Interests, About Me, Food, Fashion.

Indexed Data
Data with a defined set of values specific to either the container or OpenSocial, e.g. Smoker = yes, no, occasionally, quit, never. Other examples: Gender, Relationship, Drinker, Sexual Orientation.

Other
Data for which we cannot easily derive a categorization algorithm, e.g. Profile Image, Profile Song URL.
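For numerical data, a minimal Python sketch of mapping values into a defined range follows. The ranges (ages 0 to 100, latitude and longitude bounds, UTC offsets from -12 to +14 hours) are hypothetical choices for illustration, and turning an address into latitude and longitude (geocoding) is assumed to happen elsewhere.

```python
def scale(value, lo, hi):
    """Linearly map value from [lo, hi] to [0.0, 1.0], clamping out-of-range input."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


# Hypothetical ranges, chosen only for illustration.
AGE_RANGE = (0, 100)
LATITUDE_RANGE = (-90.0, 90.0)
LONGITUDE_RANGE = (-180.0, 180.0)
UTC_OFFSET_RANGE = (-12.0, 14.0)  # time zones as hours from UTC


def quantize_age(age):
    return scale(age, *AGE_RANGE)


def quantize_location(latitude, longitude):
    """Assumes the address has already been geocoded to (latitude, longitude)."""
    return scale(latitude, *LATITUDE_RANGE), scale(longitude, *LONGITUDE_RANGE)


def quantize_time_zone(utc_offset_hours):
    return scale(utc_offset_hours, *UTC_OFFSET_RANGE)


print(quantize_age(33))  # Age from the example profile
```

Clamping keeps outliers (e.g. a mistyped age) inside the 0 to 1.0 range instead of producing out-of-range feature values.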

Collapsing of Data

Some data fields have almost the same meaning, or their content typically overlaps greatly:
• About Me and Interests (and even Status)
• Age and Date of Birth
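Collapsing can be sketched as merging the overlapping text fields into one About Me string and deriving Age from Date of Birth only when Age is missing. The dictionary keys (`about_me`, `interests`, `status`, `age`, `date_of_birth`) and the preference for Age over Date of Birth are assumptions for illustration. Joining the parts with ". " lets the parser's first-level sentence split separate them again.

```python
from datetime import date


def age_from_dob(dob, today):
    """Whole years between dob and today."""
    years = today.year - dob.year
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1  # birthday not reached yet this year
    return years


def collapse(profile, today):
    """Merge overlapping fields of a profile dict (hypothetical keys)."""
    text_parts = [(profile.get(key) or "").strip()
                  for key in ("about_me", "interests", "status")]
    age = profile.get("age")
    if age is None and profile.get("date_of_birth"):
        age = age_from_dob(profile["date_of_birth"], today)
    return {"about_me": ". ".join(p for p in text_parts if p), "age": age}


profile = {"about_me": "I like cars", "interests": "Movies, Reading",
           "date_of_birth": date(1978, 6, 15)}
print(collapse(profile, date(2011, 6, 15)))
```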

Categorizable Data

This is the bulk of the data fields: About Me, Interests, Music, Movies, TV, Books, Looking For, Religion, Ethnicity, Language.

Determine feature elements from:
• Accepted "standard" taxonomies
• Web service taxonomies
• Advertisement-driven taxonomies


Categorization: Web Service

For some of our social data fields we can use popular web services to convert our social data tokens into search hits that have categorized information associated with them.

Example: Internet Video Archive (IVA) and IMDB; we use the movie genre.

IVA – movie search by actor "Robert Redford":
http://api.internetvideoarchive.com/Video/MoviesByActorName.aspx?DeveloperId=f377f57f-3bad-4704-8e80-1b643b206abd&SearchTerm=Robert+Redford
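The query URL above can be assembled with Python's standard `urllib.parse.urlencode`, which encodes the space in "Robert Redford" as "+". This sketch only builds the string (no request is sent), and `YOUR-DEVELOPER-ID` is a placeholder, not a working key.

```python
from urllib.parse import urlencode

IVA_ACTOR_SEARCH = "http://api.internetvideoarchive.com/Video/MoviesByActorName.aspx"


def actor_search_url(developer_id, actor_name):
    """Build the actor-search URL in the form shown above; spaces become '+'."""
    query = urlencode({"DeveloperId": developer_id, "SearchTerm": actor_name})
    return f"{IVA_ACTOR_SEARCH}?{query}"


print(actor_search_url("YOUR-DEVELOPER-ID", "Robert Redford"))
```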

Some of the results:

<item>
<Description>
<![CDATA[ The Unforeseen movie trailer - starring Robert Redford, Willie Nelson, Ann Richards, Gary Bradley, Judah Folkman, William Greider. Directed by Laura Dunn. Theatrical Release Date: 2/29/2008 Genre: Documentary Rating: Not Rated ]]>
</Description>
<Title>THE UNFORESEEN</Title>
<Language>English</Language>
<Country>United States</Country>
<SiteUrl />
<Studio>Two Birds Films</Studio>
<StudioID>3018</StudioID>
<Rating>Not Rated</Rating>
<Genre>Documentary</Genre>
<GenreID>13</GenreID>

IVA – movie search (continued)

<HomeVideoReleaseDate>9/16/2008</HomeVideoReleaseDate>
<TheatricalReleaseDate>2/29/2008</TheatricalReleaseDate>
<Director>Laura Dunn</Director>
<DirectorID>36635</DirectorID>
<Actor1>Robert Redford</Actor1>
<ActorId1>7105</ActorId1>
<Actor2>Willie Nelson</Actor2>
<ActorId2>8591</ActorId2>
<Actor3>Ann Richards</Actor3>
<ActorId3>36642</ActorId3>
<Actor4>Gary Bradley</Actor4>
<ActorId4>36637</ActorId4>

<Link>http://videodetective.com/titledetails.aspx?publishedid=947964</Link>
<BoxOfficeInMillions>-1</BoxOfficeInMillions>
<AirDayOfWeek>-1</AirDayOfWeek>
<AirStartTime />
<ShowLengthInMinutes>-1</ShowLengthInMinutes>
<IsTelevisionContent>false</IsTelevisionContent>
<FirstReleasedYear>2008</FirstReleasedYear>
<Image>http://content.internetvideoarchive.com/content/photos/1250/05253626_.jpg</Image>
<Duration>164</Duration>
<DateCreated>3/20/2008 8:00:00 AM</DateCreated>
<Media>Movie</Media>
<PublishedId>947964</PublishedId>
<DateModified>4/22/2011 1:57:00 PM</DateModified>

(and more)

We select the Genre element.
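Extracting the selected Genre from a response can be sketched with Python's `xml.etree.ElementTree`. The sample is a trimmed copy of the item above; the enclosing `<results>` element is a placeholder, since the outer structure of the real response is not shown.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <item> shown above; <results> is a placeholder wrapper.
SAMPLE_RESPONSE = """<results>
  <item>
    <Title>THE UNFORESEEN</Title>
    <Genre>Documentary</Genre>
    <GenreID>13</GenreID>
  </item>
</results>"""


def genres_from_response(xml_text):
    """Return the <Genre> text of every <item>, skipping items without one."""
    root = ET.fromstring(xml_text)
    genres = []
    for item in root.iter("item"):
        genre = item.findtext("Genre")
        if genre:
            genres.append(genre.strip())
    return genres


print(genres_from_response(SAMPLE_RESPONSE))
```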

IVA genres: our movie feature elements (VideoCategory)

Not Assigned, Western, Action-Adventure, Children's, Comedy, Drama, Family, Horror, Musical, Mystery-Suspense, Non-Fiction, Sci-Fi, War, Health/Workout, Documentary, Thriller, Biography, Romance

Movie Quantization

For each social data token, e.g. "Adam Sandler" or "Star Wars", we can get multiple hits.

Example: "Robert Redford", first 8 hits: Drama = 5, Western = 1, Documentary = 2

Issues:
• How do we know whether a token is an actor name, a movie title, a director, or something else?
• Multiple hits for an actor or director: what do we do? (Evidence them all.)
• Multiple hits for a movie title: what do we do? (Take the first hit.)

These genres become our Movie feature elements.

Order of Movie Quantization

Given any social data element parsed from the user's MOVIE data, we cannot know a priori whether it is a title or an actor's or director's name. It may even be the genre of movies the user likes.

1. Title search (take first hit)
2. Actor search (evidence all)
3. Director search (evidence all)
4. Keyword matching (described next)
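The four-step order can be sketched as a cascade in which the first step that returns hits supplies the evidence. The lookup callables are hypothetical stand-ins for the IVA/IMDB searches and the keyword database, "first step with hits wins" is our reading of the ordering, and `weight_per_hit` and the -1 default for genres without evidence are assumptions borrowed from the keyword-matching results.

```python
from collections import Counter


def quantize_movie_token(token, title_search, actor_search,
                         director_search, keyword_lookup):
    """Genre evidence (a Counter) for one parsed MOVIE token.

    Each callable is a hypothetical stand-in for a service lookup and returns
    the genres of its hits, best hit first. The first step with hits wins.
    """
    hits = title_search(token)
    if hits:                                  # 1. title: take the first hit
        return Counter(hits[:1])
    for search in (actor_search, director_search):
        hits = search(token)                  # 2./3. actor, director: evidence all
        if hits:
            return Counter(hits)
    return Counter(keyword_lookup(token))     # 4. keyword matching


def movie_features(tokens, genres, weight_per_hit, **lookups):
    """Sum evidence over all tokens; genres with no evidence are set to -1."""
    total = Counter()
    for token in tokens:
        total += quantize_movie_token(token, **lookups)
    return {g: weight_per_hit * total[g] if total[g] else -1 for g in genres}


# Stub lookups with hypothetical data.
titles = {"Star Wars": ["Sci-Fi", "Action-Adventure"]}
actors = {"Robert Redford": ["Drama"] * 5 + ["Western"] + ["Documentary"] * 2}
lookups = {
    "title_search": lambda t: titles.get(t, []),
    "actor_search": lambda t: actors.get(t, []),
    "director_search": lambda t: [],
    "keyword_lookup": lambda t: [],
}
print(quantize_movie_token("Robert Redford", **lookups))
```

With these stubs, "Robert Redford" misses the title search and falls through to the actor search, which yields the genre counts from the example (Drama = 5, Western = 1, Documentary = 2).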

Quantization Result 1

Input: Up, Forrest Gump, Rear Window, District 9, Pac-Man, WALL·E, My Flesh and Blood, MacMusical

Yields: MOVIE_FAMILY = 0.6, MOVIE_SCIFI = 0.2, MOVIE_DOCUMENTARY = 0.4, MOVIE_THRILLER = 0.2

Quantization Using Other Services

• TV: IMDB, http://www.imdb.com/search/title?title_type=tv_series&title=
• Books: Google Books Search, http://books.google.com/books/feeds/volumes?
• Music: IVA's music API, http://api.internetvideoarchive.com/Music/**

Quantization via Keyword Matching

What do we do when there is no predetermined taxonomy and no service to provide database hits? Natural language processing techniques.

We currently employ a simple (but effective and efficient) technique of keyword matching/lookup:
• Create a database of predetermined phrases/keywords.
• Use a lookup scheme to quantize social data token(s).



[Figure: codebook / ontology codebook lookup. "I work as an engineer": About Me lookup? "Watch a lot of drama": Movies lookup?]

Keyword Database

Used on: About Me/Interests, Religion, Ethnicity, Looking For, Language, Relationship.

Secondary use: Books, TV, Music, Movies, when the service fails to provide any hits.

Keyword Database Creation

• Manual scanning of hundreds (at the starting level) of user profiles
• Domain-specific expert (human) knowledge
• Dictionaries and taxonomies, when they exist

Some Arbitrary Keyword DB Entries

ABOUT_ME HOME Cats 0.2
ABOUT_ME HOME Children 0.2
ABOUT_ME HOME Daughter 0.2
ABOUT_ME HOME Dog 0.2
ABOUT_ME HOME home 0.5

ABOUT_ME ENTERTAINMENT Shopping 0.2
ABOUT_ME ENTERTAINMENT Shows 0.2
ABOUT_ME ENTERTAINMENT Sing 0.2
ABOUT_ME ENTERTAINMENT Ski 0.2
ABOUT_ME ENTERTAINMENT Songwriter 0.2

Keyword DB – Evidence Weight

Issue: how do we determine the weight of every entry? Either expert-determined (for consistency) or all equal (no sense of importance).

System options: DB weights can take different values, or the system can run with all weights equal.

Keyword DB – Open Question

Issue: at this very beginning level, can we create a dictionary for everything? No. Are there more advanced NLP techniques to explore for inferences?

While users can write anything (and do), remember that we are focused on advertisement recommendation, so the scope of our language is limited to hits related to our feature vector elements. This is a constrained problem.

Home, Entertainment, Smoking, Work, Social, Movies, TV, Shopping, Books, etc.: these are the kinds of areas we are concerned with.

Types of Keyword Matching

STRICT: the social data token must exactly match a DB entry.
"Drama" vs. entry Drama: √
"I like Drama" vs. entry Drama: X

DB_ENTRY_CONTAINS_DATA_ELEMENT: the data token must exist inside the DB entry.
"Drama" vs. entry Drama and Comedy: √

DB_ENTRY_PARTOF_DATA_ELEMENT: part of the data token matches a DB entry (this further segments the data token).
"I like Drama" vs. entry Drama: √
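The three matching modes can be sketched in Python as follows. Case-insensitive, whole-word comparison is an assumption, and `keyword_match` is a hypothetical name; the mode strings are the ones used above.

```python
import re


def _contains_phrase(text, phrase):
    """True if phrase occurs in text as whole words, ignoring case."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def keyword_match(token, entry, mode):
    """Compare one social data token with one keyword DB entry."""
    token, entry = token.strip(), entry.strip()
    if mode == "STRICT":                          # exact match only
        return token.lower() == entry.lower()
    if mode == "DB_ENTRY_CONTAINS_DATA_ELEMENT":  # token inside the entry
        return _contains_phrase(entry, token)
    if mode == "DB_ENTRY_PARTOF_DATA_ELEMENT":    # entry inside the token
        return _contains_phrase(token, entry)
    raise ValueError(f"unknown matching mode: {mode}")


print(keyword_match("I like Drama", "Drama", "DB_ENTRY_PARTOF_DATA_ELEMENT"))
```

Whole-word matching keeps a keyword such as "work" from firing inside "network".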

Quantization Results – Different Kinds of Keyword Matching

Input: "I am a student and I work and love cars"

Output, STRICT: no hits
ABOUT_ME_ENTERTAINMENT = -1
ABOUT_ME_WORK = -1
ABOUT_ME_HOME = -1
ABOUT_ME_SOCIAL = -1
ABOUT_ME_FOOD = -1

Output, DB_ENTRY_CONTAINS_DATA_ELEMENT: no hits
ABOUT_ME_ENTERTAINMENT = -1
ABOUT_ME_WORK = -1
ABOUT_ME_HOME = -1
ABOUT_ME_SOCIAL = -1
ABOUT_ME_FOOD = -1

Output, DB_ENTRY_PARTOF_DATA_ELEMENT:
keyword = student: ABOUT_ME_WORK = 0.2
keyword = work: ABOUT_ME_WORK = 0.5
keyword = cars: ABOUT_ME_ENTERTAINMENT = 0.2
keyword = love: ABOUT_ME_HOME = 0.2, ABOUT_ME_SOCIAL = 0.2, ABOUT_ME_ENTERTAINMENT = 0.2

Result: ABOUT_ME_WORK = 0.7, ABOUT_ME_HOME = 0.2, ABOUT_ME_SOCIAL = 0.2, ABOUT_ME_FOOD = -1
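Accumulating evidence across tokens can be sketched as summing the DB weights per category and reporting -1 for categories with no hit. Summation is an assumption, but it is consistent with ABOUT_ME_WORK = 0.7 (student 0.2 + work 0.5); under it, ABOUT_ME_ENTERTAINMENT would come out as 0.4 (cars + love). The DB rows are hypothetical, and `equal_weight` models the option of running with all weights equal.

```python
import re

# Hypothetical keyword DB rows (field, category, keyword, weight) that mirror
# the hits listed above for "student", "work", "cars" and "love".
KEYWORD_DB = [
    ("ABOUT_ME", "WORK", "student", 0.2),
    ("ABOUT_ME", "WORK", "work", 0.5),
    ("ABOUT_ME", "ENTERTAINMENT", "cars", 0.2),
    ("ABOUT_ME", "HOME", "love", 0.2),
    ("ABOUT_ME", "SOCIAL", "love", 0.2),
    ("ABOUT_ME", "ENTERTAINMENT", "love", 0.2),
]
ABOUT_ME_CATEGORIES = ["ENTERTAINMENT", "WORK", "HOME", "SOCIAL", "FOOD"]


def quantize_field(tokens, field, categories, db=KEYWORD_DB, equal_weight=None):
    """Sum keyword DB weights over all tokens (DB_ENTRY_PARTOF_DATA_ELEMENT).

    Categories without any hit are reported as -1. Passing equal_weight
    ignores the DB weights and counts every hit with that same value.
    """
    scores = {}
    for token in tokens:
        for db_field, category, keyword, weight in db:
            if db_field != field:
                continue
            if re.search(r"\b" + re.escape(keyword) + r"\b", token, re.IGNORECASE):
                hit = weight if equal_weight is None else equal_weight
                scores[category] = scores.get(category, 0.0) + hit
    return {c: scores.get(c, -1) for c in categories}


print(quantize_field(["I am a student and I work and love cars"],
                     "ABOUT_ME", ABOUT_ME_CATEGORIES))
```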

Quantization Results 2 – Using DB_ENTRY_PARTOF_DATA_ELEMENT

Input: "Fell in love with computers at 11, never got over it... Nonetheless, I have always understood that human problems are solved by people, not technology. My lifes work has been to empower communities to design and build their own solutions."

Parsing yields 6 data tokens. Results:
ABOUT_ME_ENTERTAINMENT = 0.2
ABOUT_ME_WORK = 0.5
ABOUT_ME_HOME = 0.2
ABOUT_ME_SOCIAL = 0.2
ABOUT_ME_FOOD = -1

Quantization Result 3 – Good Null Results

Input: "i am xing ju. test ABOUT ME for opensocial."
Parsed tokens: "i am xing ju", "test ABOUT ME for opensocial"

No keyword DB hits: ABOUT_ME_ENTERTAINMENT = -1, ABOUT_ME_WORK = -1, ABOUT_ME_HOME = -1, ABOUT_ME_SOCIAL = -1, ABOUT_ME_FOOD = -1

Quantization Results – Garbage In, Garbage Out

"LoL really dude that is the way to be": no hits.

Is this garbage? If "LoL" = lots of love, could you interpret this as someone interested in social/friends? Future work: deeper interpretation/semantic analysis.

Indexed

Smoker, Drinker, Gender, Relationship (some networks), Looking For (some networks), etc.

Example for Drinker:
opensocial.Enum.Drinker.HEAVILY
opensocial.Enum.Drinker.NO
opensocial.Enum.Drinker.OCCASIONALLY
opensocial.Enum.Drinker.QUIT
opensocial.Enum.Drinker.QUITTING
opensocial.Enum.Drinker.REGULARLY
opensocial.Enum.Drinker.SOCIALLY
opensocial.Enum.Drinker.YES
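The deck does not say how indexed values become feature elements; one common option, sketched here, is one-hot encoding with one element per enum value. Normalizing inputs such as "Yes" or "opensocial.Enum.Drinker.YES" to the enum key is an assumption. An ordinal scale would use fewer elements but requires choosing an order for values such as QUIT and SOCIALLY.

```python
# The eight opensocial.Enum.Drinker values listed above.
DRINKER_VALUES = ["HEAVILY", "NO", "OCCASIONALLY", "QUIT",
                  "QUITTING", "REGULARLY", "SOCIALLY", "YES"]


def one_hot(raw, allowed):
    """One 0/1 feature element per allowed value; missing/unknown gives all zeros.

    Accepts either the enum key ("opensocial.Enum.Drinker.YES") or a display
    value such as "Yes" (hypothetical normalization).
    """
    key = (raw or "").rsplit(".", 1)[-1].strip().upper()
    return [1.0 if key == value else 0.0 for value in allowed]


print(one_hot("Yes", DRINKER_VALUES))  # "Drinker Yes" from the example data
```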

Quantized Feature Vector

107 elements, normalized to (near) the range 0 to 1.0.

Advertisement Description

Experts manually determine the feature vector weighting for each ad.

Future: automate this from a survey or from input provided directly by the advertiser. Is there a way to analyze the ad message or image (image understanding)? Will the results even match the advertiser's goals?

PPARS – Advertisement Matching

Not the focus of this talk. We are currently trying variations on KNN with different forms of clustering. Early results with a small advertising database and a beginning keyword database look good.

What kinds of groups: groups with the user in them or not? Based only on in-common feature elements or not?
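Since matching is described only as variations on KNN, here is a minimal sketch of one variation, not the PPARS matcher: rank ads by Euclidean distance to a group's centroid, computed only over feature elements present in both vectors (the "only in-common feature elements" option), with -1 treated as missing. All names and the toy vectors are hypothetical.

```python
import math

MISSING = -1  # the "no hits" value seen in the quantization results


def common_distance(a, b):
    """Euclidean distance over elements present (not -1) in both vectors."""
    pairs = [(x, y) for x, y in zip(a, b) if x != MISSING and y != MISSING]
    if not pairs:
        return math.inf  # nothing in common: rank last
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs))


def centroid(vectors):
    """Element-wise mean of a group's vectors, ignoring missing elements."""
    center = []
    for column in zip(*vectors):
        present = [v for v in column if v != MISSING]
        center.append(sum(present) / len(present) if present else MISSING)
    return center


def k_nearest_ads(group_vectors, ads, k=3):
    """Names of the k ads (name -> feature vector) closest to the group centroid."""
    center = centroid(group_vectors)
    return sorted(ads, key=lambda name: common_distance(center, ads[name]))[:k]


group = [[0.2, 0.5, MISSING], [0.4, MISSING, 0.9]]
ads = {"car_ad": [0.3, MISSING, 0.8],
       "food_ad": [0.9, 0.5, MISSING],
       "blank_ad": [MISSING, MISSING, MISSING]}
print(k_nearest_ads(group, ads, k=2))
```

Distances over fewer shared elements tend to be smaller, so a variant might divide by the number of shared elements or require a minimum overlap.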

PPARS – Advertisement Delivery

Effective delivery of the "social message" related to the selected ad could be an area of future work. For now we use a simple form of direct delivery.

Based on a grouping of the same gender and age, and strong likes in interests on home.

Another example, based on a grouping of the same gender and age, and drinking. This is a grouping the user is not part of (only friends).

Your friends Nathan and Marty will like this
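The delivery grouping can be sketched as bucketing friends by gender and age band and building the peer-pressure message for each group. The 5-year band width, the friend records and their ages are hypothetical; other keys such as drinking could be added to the grouping key.

```python
from collections import defaultdict


def group_friends(friends, age_band=5):
    """Group friends by (gender, age band); age_band is a hypothetical width in years."""
    groups = defaultdict(list)
    for friend in friends:
        key = (friend["gender"], friend["age"] // age_band)
        groups[key].append(friend["name"])
    return dict(groups)


def social_message(names):
    """Peer-pressure line in the form shown above."""
    if not names:
        raise ValueError("need at least one friend")
    if len(names) == 1:
        return f"Your friend {names[0]} will like this"
    return f"Your friends {', '.join(names[:-1])} and {names[-1]} will like this"


friends = [  # hypothetical ages and genders
    {"name": "Nathan", "gender": "male", "age": 31},
    {"name": "Marty", "gender": "male", "age": 33},
    {"name": "Ann", "gender": "female", "age": 32},
]
for names in group_friends(friends).values():
    print(social_message(names))
```

This only forms groups and phrases the message; choosing which ad a group receives is the matching step.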

Another example: here the grouping is "loose", related only by gender and very loosely by age, so the advertisement match is not great.

Question: should we serve only to "strong" groups?

Analysis of Advertisement Results

Groupings are tight when the data allows. Matches to advertisements at each level (best, top 10, etc.) are correct.

Future Work

• Parsing: more syntax and semantics (NLP).
• Parsing: handle differences between languages.
• Quantization: extend to natural language understanding in addition to, or in place of, keyword matching; study the effects of different evidence accumulation schemes.
• Data extrapolation: use inference to create hits in more feature elements.
