Thomas van der Elsen, Richard Lawrence, Jumi Oladimeji, Alastair Smith

Discovering Important Bloggers based on Analyzing Blog Threads by Nakajima et al


Page 1: Discovering Important Bloggers based on Analyzing Blog Threads by Nakajima et al.

Thomas van der Elsen, Richard Lawrence, Jumi Oladimeji, Alastair Smith

Page 2: Introduction

  • People increasingly publish their reactions to public events using a blog
      • A tool that enables this information to be published quickly
      • A journal that is available on the web
  • Need for effective data-mining techniques specific to blogs and similar tools (e.g. the Semantic Web)

“Our goal is to develop a method of capturing hot conversations by automating readers’ processes for characterizing and monitoring blogs.”

Page 3: Overview

  • Data-mining techniques
      • Creation of blog link structure
      • Analysing link structure
  • Types of important bloggers
      • Agitators
      • Summarisers
  • Applications, analysis and conclusions
      • Real-world applications and extensions
      • Pros and cons of the paper

Page 4:

  • Crawling blogs
  • Extracting hyperlinks
  • Extracting blog threads

Page 5: Crawling blogs

  • The system crawls through the RSS list, registering for each entry:
      • Title
      • Permalink
      • List entry date

Terms:

  • Aggregator: gathers RSS feeds from multiple sources and organises them
  • OPML: file format used to share RSS feed lists
  • RSS: a format for distributing content on the web

[Diagram labels: Aggregators, RSS list, RSS feeds, OPML]
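The crawling step could be sketched roughly as follows. This is a minimal illustration, not the system described in the paper: the function names (`feed_urls_from_opml`, `register_entries`), the sample documents and the example URLs are all assumptions, and the feeds are parsed from inline strings rather than fetched over the network.

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML feed list, as an aggregator might export it.
SAMPLE_OPML = """<?xml version="1.0"?>
<opml version="2.0">
  <body>
    <outline text="Example blog" type="rss" xmlUrl="http://example.org/feed.xml"/>
    <outline text="Folder">
      <outline text="Another blog" type="rss" xmlUrl="http://example.net/rss"/>
    </outline>
  </body>
</opml>"""

# Hypothetical RSS 2.0 feed for one blog on the list.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://example.org/2006/01/first-post</link>
      <pubDate>Mon, 02 Jan 2006 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/2006/01/second-post</link>
      <pubDate>Tue, 03 Jan 2006 09:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def feed_urls_from_opml(opml_text):
    """Collect the feed URL of every outline element that carries an xmlUrl."""
    root = ET.fromstring(opml_text)
    return [o.attrib["xmlUrl"] for o in root.iter("outline") if "xmlUrl" in o.attrib]


def register_entries(rss_text):
    """Record title, permalink and date for each item in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title", default=""),
            "permalink": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
        })
    return entries


if __name__ == "__main__":
    print(feed_urls_from_opml(SAMPLE_OPML))
    for entry in register_entries(SAMPLE_RSS):
        print(entry)
```

A real crawler would download each URL from the OPML list before parsing it; that step is left out here to keep the sketch self-contained.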

Page 6: Extracting hyperlinks

  • Problem: different tag structures per server

[Diagram labels: RSS feed from list, Description, Blog entries, Hyperlink list]
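One way to picture the hyperlink-extraction step is to pull every anchor out of an entry's HTML description. The sketch below is an assumption about how this might be done with Python's standard library, not the method used in the paper; `LinkExtractor`, `extract_hyperlinks` and the example URLs are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against the entry's URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_hyperlinks(description_html, base_url):
    """Return the list of absolute URLs linked from one entry description."""
    parser = LinkExtractor(base_url)
    parser.feed(description_html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    html = ('<p>See <a href="http://example.net/post/1">this</a> and '
            '<A HREF="/2006/02/reply">my reply</A>.</p>')
    print(extract_hyperlinks(html, "http://example.org/2006/02/entry"))
```

Working from the feed's description field sidesteps some of the per-server differences in page markup, although the description may be truncated on some blogs.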

Page 7: Extracting blog threads

For each hyperlink:

  • If sourceLink: check links exist in thread data, then add.
  • If replyLink: check (a) whether the departure URL exists in thread data and (b) whether the destination URL points to an entry on the list, then act on the combined result (a && b):

      a  b   Action
      1  1   Add destination entry to thread
      1  0   Add destination entry to entry list and add to thread
      0  1   Add departure entry to thread
      0  0   Create new thread
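The replyLink branch of the flowchart can be read as a two-bit decision table. The sketch below encodes one reading of the flattened diagram, in which each action is paired with the two-digit code that follows it; that pairing, the function name `reply_link_action` and the action strings are assumptions made for illustration.

```python
# Keys are (departure URL exists in thread data, destination URL points to an
# entry on the list). The pairing of codes to actions is an assumed reading
# of the slide's flowchart.
REPLY_LINK_ACTIONS = {
    (True, True): "add destination entry to thread",
    (True, False): "add destination entry to entry list and add to thread",
    (False, True): "add departure entry to thread",
    (False, False): "create new thread",
}


def reply_link_action(departure_in_thread, destination_on_list):
    """Look up the action for one replyLink from the two checks."""
    return REPLY_LINK_ACTIONS[(bool(departure_in_thread), bool(destination_on_list))]


if __name__ == "__main__":
    for dep in (True, False):
        for dest in (True, False):
            code = f"{int(dep)}{int(dest)}"
            print(code, "->", reply_link_action(dep, dest))
```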

Page 8: Example Results

Page 9:

  • Agitators
  • Summarisers
  • Joe Bloggs

Page 10: Agitators

  • Discussion stimulator
  • Threads often grow after an agitator’s entry
  • Three discriminants for an agitator
      • Link (Agi1)
      • Popularity (Agi2)
      • Topic (Agi3)

The three discriminants can be weighted using the following formula:

[Formula not preserved in the source]

Page 11: Link-based discriminant

$e_x$ is an agitator if

$k_x > \theta_1$

where

  • $e_x$ = a blog entry
  • $k_x$ = number of entries in $\text{thread}_i$ with a replyLink to $e_x$
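A minimal sketch of the link-based test, assuming a thread is given as a list of entry ids and reply links as `(source, destination)` pairs, where the source entry contains a replyLink to the destination entry. The data layout, the function names and the threshold value used in the example are illustrative assumptions, not taken from the paper.

```python
def count_reply_links_to(entry, thread_entries, reply_links):
    """k_x: number of distinct entries in the thread with a replyLink to `entry`."""
    in_thread = set(thread_entries)
    sources = {src for src, dst in reply_links if dst == entry and src in in_thread}
    return len(sources)


def is_agitator_by_links(entry, thread_entries, reply_links, theta1):
    """Link-based discriminant: the entry is an agitator if k_x > theta1."""
    return count_reply_links_to(entry, thread_entries, reply_links) > theta1


if __name__ == "__main__":
    thread = ["e1", "e2", "e3", "e4"]
    links = [("e2", "e1"), ("e3", "e1"), ("e4", "e1"), ("e4", "e3")]
    print(count_reply_links_to("e1", thread, links))
    print(is_agitator_by_links("e1", thread, links, theta1=2))
```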

Page 12: Popularity-based discriminant

$e_x$ is an agitator if

$l_x / m_x > \theta_2$

where

  • $e_x$ = a blog entry
  • $l_x$ = number of entries in $\text{thread}_i$ published $t$ days after $e_x$
  • $m_x$ = number of entries in $\text{thread}_i$ published $t$ days before $e_x$
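The popularity-based test could be sketched as below. Several points are assumptions rather than anything stated on the slide: "t days after" and "t days before" are read as "within t days", entries published on the same day as the entry under test are counted on neither side, and the ratio is compared in the cross-multiplied form l_x > θ2 · m_x so that m_x = 0 does not cause a division by zero. All names are hypothetical.

```python
from datetime import date, timedelta


def popularity_counts(entry, published, t_days):
    """Return (l_x, m_x) for `entry`.

    `published` maps each entry id in the thread to its publication date.
    l_x counts entries published within t_days after the entry, m_x those
    published within t_days before it (an assumed reading of the slide).
    """
    pivot = published[entry]
    window = timedelta(days=t_days)
    after = sum(1 for e, d in published.items()
                if e != entry and pivot < d <= pivot + window)
    before = sum(1 for e, d in published.items()
                 if e != entry and pivot - window <= d < pivot)
    return after, before


def is_agitator_by_popularity(entry, published, t_days, theta2):
    """Popularity-based discriminant, written as l_x > theta2 * m_x."""
    l_x, m_x = popularity_counts(entry, published, t_days)
    return l_x > theta2 * m_x


if __name__ == "__main__":
    published = {
        "e1": date(2006, 1, 1),
        "e2": date(2006, 1, 3),
        "e3": date(2006, 1, 4),
        "e4": date(2006, 1, 5),
        "e5": date(2006, 1, 6),
        "e6": date(2006, 1, 7),
    }
    print(popularity_counts("e2", published, t_days=3))
    print(is_agitator_by_popularity("e2", published, t_days=3, theta2=2))
```

For m_x > 0 the cross-multiplied comparison is equivalent to l_x / m_x > θ2; for m_x = 0 it flags any entry that is followed by at least one post, which is a design choice of this sketch.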

Page 13: Topic-based discriminant

$e_x$ is an agitator if

[Formula not preserved in the source]

where

  • $e_x$ = a blog entry
  • $n$ = number of entries

Page 14: Summarisers

  • Publish entries that collate and compact previous posts
  • Provide a convenient way of digesting an entire thread
  • The discriminant for summarisers is link-based:

$e_x$ is a summariser if

$p_x > \theta_4$

where

  • $e_x$ = a blog entry
  • $p_x$ = number of entries in $\text{thread}_i$ that have a replyLink from $e_x$
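The summariser test mirrors the link-based agitator test, but counts outgoing rather than incoming links. As a rough sketch, assuming reply links are stored as `(source, destination)` pairs and a thread is a list of entry ids (both the layout and the function names are illustrative assumptions):

```python
def count_reply_links_from(entry, thread_entries, reply_links):
    """p_x: number of distinct entries in the thread that `entry` links to."""
    in_thread = set(thread_entries)
    targets = {dst for src, dst in reply_links if src == entry and dst in in_thread}
    return len(targets)


def is_summariser(entry, thread_entries, reply_links, theta4):
    """Summariser discriminant: the entry is a summariser if p_x > theta4."""
    return count_reply_links_from(entry, thread_entries, reply_links) > theta4


if __name__ == "__main__":
    thread = ["e1", "e2", "e3", "e4", "e5"]
    links = [("e2", "e1"), ("e3", "e1"),
             ("e5", "e1"), ("e5", "e2"), ("e5", "e3"), ("e5", "e4")]
    print(count_reply_links_from("e5", thread, links))
    print(is_summariser("e5", thread, links, theta4=3))
```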

Page 15:

  • Applications
  • Pros and cons
  • Conclusions

Page 16: Applications

  • Supplementary information, e.g. for TV, news sites, etc.
      • Home and Away ("who shot Josh West?"): agitator
      • Sports, etc. (used by studios and media to highlight points of interest in a match): summariser

Page 17: Analysis: Pros

  • Basis for future research: a brief introduction to the subject
      • Multiple thread analysis
      • Identification of areas of bloggers’ expertise
  • Highly effective in certain specific areas
      • News and reviews
  • Implementation of theory (feature vector)

Page 18: Analysis: Cons

  • Only 25 sites used in sample (but 1000s of blogs)
  • Does not take context into consideration
      • E.g., an agitator may be posting offensive entries
  • No measurement of summary success
  • Comments are not analysed
  • Inappropriate for certain areas
      • MySpace, Bebo, et al. (due to target audience)

Page 19: Conclusions

  • Created a data-mining framework for future research
      • May instigate research into further work
  • Nice idea and potentially useful, but needs to be extended

Page 20:

Thank you for your time