Modeling Temporal Intention in Resource Sharing

Hany M. SalahEldeen & Michael L. Nelson, Old Dominion University, Department of Computer Science, Web Science and Digital Libraries Lab. WADL 2013


Page 1: Modeling Temporal Intention in Resource Sharing

Modeling Temporal Intention in Resource Sharing

Hany SalahEldeen & Michael Nelson Modeling Temporal Intention. WADL2013

Hany M. SalahEldeen & Michael L. Nelson
Old Dominion University

Department of Computer Science
Web Science and Digital Libraries Lab.

WADL 2013

Page 2: Modeling Temporal Intention in Resource Sharing

All tweets are equal…

…but some are more equal than the others


Page 3: Modeling Temporal Intention in Resource Sharing

Preliminary research questions:

1. How long would these last?
2. And if lost, is there backup somewhere?
3. Is this what the author intended?

Page 4: Modeling Temporal Intention in Resource Sharing

Historical integrity

Since tweets are considered the first draft of history, the historical integrity of the tweets could be compromised.

Page 5: Modeling Temporal Intention in Resource Sharing

People rely on social media for the most up-to-date information

Page 6: Modeling Temporal Intention in Resource Sharing

The life cycle of a social post


Page 7: Modeling Temporal Intention in Resource Sharing

The life cycle of a social post

tweets


Page 8: Modeling Temporal Intention in Resource Sharing

The life cycle of a social post

tweets Links to


Page 9: Modeling Temporal Intention in Resource Sharing

The life cycle of a social post

tweets

Links to

What the reader receives:

• Same state the author intended

Page 10: Modeling Temporal Intention in Resource Sharing

The life cycle of a social post

tweets

Links to

What the reader receives:

• Same state the author intended
• The resource has disappeared

Page 11: Modeling Temporal Intention in Resource Sharing

The life cycle of a social post

tweets

Links to

What the reader receives:

• Same state the author intended
• The resource has disappeared
• The resource has changed

Page 12: Modeling Temporal Intention in Resource Sharing

Resource’s possibilities

What the reader receives:

• Same state the author intended
• The resource has disappeared
• The resource has changed

Page 13: Modeling Temporal Intention in Resource Sharing

Resource’s possibilities

What the reader receives:

• Same state the author intended
• The resource has disappeared
• The resource has changed
  – A bigger problem, since the reader might not know.

Page 14: Modeling Temporal Intention in Resource Sharing

We could lose the linked resource


Page 15: Modeling Temporal Intention in Resource Sharing

The attack on the embassy was in February 2013

Or the resource could change


Page 16: Modeling Temporal Intention in Resource Sharing

Why do we want to detect the Author’s Temporal Intention?

• Match: match and convey the intended information.
• Notify:
  – the author that the resource is prone to change.
  – the reader that the resource has changed.
• Preserve: the resource by pushing snapshots into the archive automatically.
• Retrieve: the closest archived version to maintain consistency.

Page 17: Modeling Temporal Intention in Resource Sharing

Our investigation angles

1. The state of the archived content
2. The age of the shared resource
3. The states of the resource:
   1. Missing from the live web
   2. Changed from what the author intended to share
4. Detect the author’s intention and collect a dataset
5. Model this intention
6. Create a time-based navigation tool to match the predicted intention

Page 18: Modeling Temporal Intention in Resource Sharing

Estimating web archiving coverage

• Goal: Estimate how much of the public web is present in the public archives and how many copies are available.
• Action:
  – Get 4 different datasets from 4 different sources:
    • Search engine indices
    • Bit.ly
    • DMOZ
    • Delicious
• Results: *
• Publications:
  – How much of the web is archived? JCDL '11

* Table courtesy of Ahmed AlSum, JCDL 2011

Page 19: Modeling Temporal Intention in Resource Sharing

Our investigation angles

1. The state of the archived content
2. The age of the shared resource
3. The states of the resource:
   1. Missing from the live web
   2. Changed from what the author intended to share
4. Detect the author’s intention and collect a dataset
5. Model this intention
6. Create a time-based navigation tool to match the predicted intention

Page 20: Modeling Temporal Intention in Resource Sharing

The timeline of the resource


Page 21: Modeling Temporal Intention in Resource Sharing

Timestamps accumulation


Page 22: Modeling Temporal Intention in Resource Sharing

Our investigation angles

1. The state of the archived content
2. The age of the shared resource
3. The states of the resource:
   1. Missing from the live web
   2. Changed from what the author intended to share
4. Detect the author’s intention and collect a dataset
5. Model this intention
6. Create a time-based navigation tool to match the predicted intention

Page 23: Modeling Temporal Intention in Resource Sharing

Six socially significant events

• From Twitter, websites, books:
  – The Egyptian revolution.
• From Twitter only:
  – Stanford’s SNAP dataset:
    • Iranian elections.
    • H1N1 virus outbreak.
    • Michael Jackson’s death.
    • Obama’s Nobel Peace Prize.
• Twitter API:
  – The Syrian uprising.

Page 24: Modeling Temporal Intention in Resource Sharing

Resources missing & archived


Page 25: Modeling Temporal Intention in Resource Sharing

Revisiting after a year…


Page 26: Modeling Temporal Intention in Resource Sharing

Measured vs. predicted


Page 27: Modeling Temporal Intention in Resource Sharing

Interesting phenomenon: reappearance on the live web and disappearance from the archives

Page 28: Modeling Temporal Intention in Resource Sharing

Reappearance and disappearance predictions

Page 29: Modeling Temporal Intention in Resource Sharing

Our investigation angles

1. The state of the archived content
2. The age of the shared resource
3. The states of the resource:
   1. Missing from the live web
   2. Changed from what the author intended to share
4. Detect the author’s intention and collect a dataset
5. Model this intention
6. Create a time-based navigation tool to match the predicted intention

Page 30: Modeling Temporal Intention in Resource Sharing

Temporal Intention Relevancy Model (TIRM)

Between t_tweet and t_click:

The linked resource could have:
• Changed
• Not changed

The tweet and the linked resource could be:
• Still relevant
• No longer relevant

Page 31: Modeling Temporal Intention in Resource Sharing

Resource is changed but relevant

• The resource changed
• But it is still relevant

Intention: need the current version of the resource at any time

Page 32: Modeling Temporal Intention in Resource Sharing

Relevancy and intention mapping

• Changed and relevant: Current

Page 33: Modeling Temporal Intention in Resource Sharing

Resource is changed and not relevant

• The resource changed
• But it is no longer relevant

Intention: need the past version of the resource at any time

Page 34: Modeling Temporal Intention in Resource Sharing

Relevancy and intention mapping

• Changed and relevant: Current
• Changed and not relevant: Past

Page 35: Modeling Temporal Intention in Resource Sharing

Resource is not changed and relevant

• The resource is not changed
• And it is relevant

Intention: need the past version of the resource at any time

Page 36: Modeling Temporal Intention in Resource Sharing

Relevancy and intention mapping

• Changed and relevant: Current
• Changed and not relevant: Past
• Not changed and relevant: Past

Page 37: Modeling Temporal Intention in Resource Sharing

Resource is not changed and not relevant

• The resource is not changed
• But it is not relevant

Intention: I am not sure which version of the resource I need

Page 38: Modeling Temporal Intention in Resource Sharing

Relevancy and intention mapping

• Changed and relevant: Current
• Changed and not relevant: Past
• Not changed and relevant: Past
• Not changed and not relevant: Not Sure
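The four cases on the preceding slides can be read as a small lookup from two observations (did the resource change, is it still relevant) to an intention label. The sketch below is only an illustration of that mapping; the names `Intention` and `map_intention` are hypothetical and do not come from the slides.

```python
from enum import Enum


class Intention(Enum):
    """Intention labels as they appear on the mapping slides."""
    CURRENT = "Current"
    PAST = "Past"
    NOT_SURE = "Not Sure"


def map_intention(changed: bool, relevant: bool) -> Intention:
    """Map the two TIRM observations to an intention label.

    changed:  the linked resource changed between t_tweet and t_click.
    relevant: the tweet and the linked resource are still relevant.
    """
    if changed:
        # Changed and relevant -> Current; changed and not relevant -> Past.
        return Intention.CURRENT if relevant else Intention.PAST
    # Not changed and relevant -> Past; otherwise the slides say "Not Sure".
    return Intention.PAST if relevant else Intention.NOT_SURE
```

For example, `map_intention(changed=True, relevant=False)` yields `Intention.PAST`, matching the "changed and not relevant" slide.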

Page 39: Modeling Temporal Intention in Resource Sharing

Our investigation angles

1. The state of the archived content
2. The age of the shared resource
3. The states of the resource:
   1. Missing from the live web
   2. Changed from what the author intended to share
4. Detect the author’s intention and collect a dataset
5. Model this intention
6. Create a time-based navigation tool to match the predicted intention

Page 40: Modeling Temporal Intention in Resource Sharing

Feature extraction

• For each tweet we perform:
  – Link analysis
  – Social Media Mining
  – Archival Existence
  – Sentiment Analysis
  – Content Similarity
  – Entity Identification

Page 41: Modeling Temporal Intention in Resource Sharing

Modeling and classification using Mechanical Turk

Assignment type                                Count   Percentage
Relevant Assignments                           929     82.65%
Non-Relevant Assignments                       195     17.35%
5 MT workers agreeing (5-0 split)              589     52.40%
4 MT workers agreeing (4-1 split)              309     27.49%
3 MT workers agreeing (3-2 close call split)   226     20.11%

• To remove confusion we removed the close calls: 898 instances remaining.
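The counts on this slide are internally consistent, which a few lines of arithmetic can confirm. The sketch below uses only the numbers shown on the slide; the variable names and the `pct` helper are illustrative.

```python
# Counts copied from the Mechanical Turk slide.
splits = {"5-0": 589, "4-1": 309, "3-2": 226}
relevant, non_relevant = 929, 195

# All assignments, counted two ways.
total = sum(splits.values())

# Dropping the 3-2 close calls leaves the instances used for training.
remaining = total - splits["3-2"]


def pct(count: int, whole: int) -> float:
    """Percentage of `whole` represented by `count`, to two decimals."""
    return round(100 * count / whole, 2)
```

Here `total` is 1124 under both groupings, `remaining` is 898, and `pct(splits["5-0"], total)` reproduces the 52.40% shown for unanimous agreement.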

Page 42: Modeling Temporal Intention in Resource Sharing

The trained classifier

• From the feature extraction phase we extracted 39 different features to train the classifier.

• Using 10-fold cross-validation, the cost-sensitive classifier based on Random Forests gave the highest success rate: 90.32%.
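The slides do not say which toolkit or cost matrix was used. As a rough illustration of one common way to make a classifier cost sensitive, the sketch below picks the class with the lowest expected misclassification cost given class probabilities (which, in this setting, would come from the random forest). The function name and the cost values are assumptions made for the example, not the authors' settings.

```python
def min_expected_cost_class(probabilities, cost_matrix):
    """Return the index of the prediction with the lowest expected cost.

    probabilities[i] is the estimated probability that the true class is i.
    cost_matrix[i][j] is the cost of predicting class j when the truth is i.
    """
    n = len(probabilities)
    expected = [
        sum(probabilities[i] * cost_matrix[i][j] for i in range(n))
        for j in range(n)
    ]
    return min(range(n), key=expected.__getitem__)


# Hypothetical example: class 0 = Relevant, class 1 = Non-Relevant.
probs = [0.7, 0.3]
uniform_costs = [[0, 1], [1, 0]]  # every mistake costs the same
skewed_costs = [[0, 1], [4, 0]]   # missing a Non-Relevant item costs 4x
```

With uniform costs the majority class wins; with the skewed matrix the minority class is chosen even at probability 0.3, which is the effect cost sensitivity is meant to have on an imbalanced dataset.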

Page 43: Modeling Temporal Intention in Resource Sharing

Testing the model: 10-fold cross-validation testing

Classifier: cost-sensitive classifier based on Random Forest

Mean Absolute Error        0.15
Root Mean Squared Error    0.27
Kappa Statistic            0.39
Incorrectly Classified %   9.68%
Correctly Classified %     90.32%

Class              Precision   Recall   F-measure
Relevant           0.93        0.96     0.95
Non-Relevant       0.53        0.37     0.44
Weighted Average   0.89        0.90     0.90
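The F-measure column can be related to the other two through the usual harmonic-mean definition, F = 2PR / (P + R). The sketch below assumes that standard definition; because the slide reports precision and recall rounded to two decimals, recomputed values may differ from the slide in the last digit.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class precision and recall as reported on the slide.
relevant_f = f_measure(0.93, 0.96)
non_relevant_f = f_measure(0.53, 0.37)
```

Both recomputed values land within 0.01 of the reported 0.95 and 0.44.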

Page 44: Modeling Temporal Intention in Resource Sharing

Our investigation angles

1. The state of the archived content
2. The age of the shared resource
3. The states of the resource:
   1. Missing from the live web
   2. Changed from what the author intended to share
4. Detect the author’s intention and collect a dataset
5. Model this intention
6. Create a time-based navigation tool to match the predicted intention

Page 45: Modeling Temporal Intention in Resource Sharing

TimeLord Navigator


Page 46: Modeling Temporal Intention in Resource Sharing

Thanks!

Hany SalahEldeen
Web Science & Digital Libraries
Old Dominion University

Email: [email protected]

@hanysalaheldeen

Hany SalahEldeen

Page 47: Modeling Temporal Intention in Resource Sharing

TimeLord Navigator


Demo:

www.cnn.com

www.bbc.com

Page 48: Modeling Temporal Intention in Resource Sharing
Page 49: Modeling Temporal Intention in Resource Sharing

Evaluation


Page 50: Modeling Temporal Intention in Resource Sharing

Actual Vs. Estimated Dates


Page 51: Modeling Temporal Intention in Resource Sharing

Resources Missing & Archived

Collection        Percentage Missing   Percentage Archived
H1N1 Outbreak     23.49%               41.65%
Michael Jackson   36.24%               39.45%
Iran              26.98%               43.08%
Obama             24.59%               47.87%
Egypt             10.48%               20.18%
Syria             7.04%                5.35%
(unlabeled)       31.62%               30.78%
(unlabeled)       24.47%               36.26%
(unlabeled)       25.64%               43.87%
(unlabeled)       26.15%               46.15%

Page 52: Modeling Temporal Intention in Resource Sharing

First Attempts at Shared Content Replacement

Page 53: Modeling Temporal Intention in Resource Sharing

Link analysis

• Since the tweets have embedded resources shortened by Bit.ly we can extract:
  – Total number of clicks
  – Hourly click logs
  – Creation dates
  – Referring websites
  – Referring countries

• We calculate the depth of the resource in relation to its domain (whether it is a leaf node or a root page):
  – We count the number of slashes in the resource’s URI
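One way to turn the depth idea into a number is to count the path segments of the URI, so that a root page has depth 0 and deeper leaf pages have larger values. This is an interpretation of the slide rather than the authors' exact procedure; `uri_depth` is a hypothetical helper, and the example URIs are placeholders.

```python
from urllib.parse import urlparse


def uri_depth(uri: str) -> int:
    """Count the non-empty path segments of a URI.

    A root page such as http://example.com/ has depth 0; each additional
    slash-separated segment in the path adds one level.
    """
    path = urlparse(uri).path
    return len([segment for segment in path.split("/") if segment])


root_depth = uri_depth("http://example.com/")
leaf_depth = uri_depth("http://example.com/news/2013/story.html")
```

Counting segments rather than raw slash characters keeps the scheme separator (`//`) and a trailing slash from inflating the depth.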

Page 54: Modeling Temporal Intention in Resource Sharing

Social Media Mining

• Twitter:
  – Using Topsy.com’s API to extract:
    • Total number of tweets.
    • The most recent 500 tweets.
    • Number of tweets by influential users.

The extracted tweets provide extended context for the resource, authored by users in the Twittersphere.

Page 55: Modeling Temporal Intention in Resource Sharing

Social Media Mining

• Facebook:
  – Also mined for likes, shares, posts, and clicks related to each resource.

Page 56: Modeling Temporal Intention in Resource Sharing

Archival Existence

• Using Memento TimeMaps we get:
  – Total mementos available.
  – Different archives count.
  – The closest archived version to the tweet time.
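Once a TimeMap has been fetched and parsed, the three features listed above reduce to simple operations on a list of captures. The sketch below assumes the mementos are already available as (archive name, capture datetime) pairs; fetching and parsing the TimeMap is out of scope here, and the function name, field names, and sample data are hypothetical.

```python
from datetime import datetime, timezone


def summarize_mementos(mementos, tweet_time):
    """Derive archival features from a list of (archive, datetime) pairs.

    Returns the total number of mementos, the number of distinct archives,
    and the capture closest in time to the tweet (None if there are none).
    """
    if not mementos:
        return {"total": 0, "archives": 0, "closest": None}
    closest = min(mementos, key=lambda m: abs(m[1] - tweet_time))
    return {
        "total": len(mementos),
        "archives": len({archive for archive, _ in mementos}),
        "closest": closest,
    }


# Made-up sample data for illustration only.
tweet_time = datetime(2013, 2, 1, tzinfo=timezone.utc)
sample = [
    ("archive-a", datetime(2012, 6, 1, tzinfo=timezone.utc)),
    ("archive-a", datetime(2013, 1, 28, tzinfo=timezone.utc)),
    ("archive-b", datetime(2013, 5, 1, tzinfo=timezone.utc)),
]
features = summarize_mementos(sample, tweet_time)
```

The closest capture may fall before or after the tweet; the sketch uses absolute distance, which is one of several reasonable choices.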

Page 57: Modeling Temporal Intention in Resource Sharing

Sentiment Analysis

• Using the NLTK libraries for natural language text processing.
• Extract the most prominent sentiment in the text.

Page 58: Modeling Temporal Intention in Resource Sharing

Content Similarity

• Steps:
  – We download the content HTML using the Lynx browser.
  – We apply a boilerplate removal algorithm and full text extraction.
  – We calculate the cosine similarity between the two pages.

70% similarity
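The last step above can be sketched with plain term-frequency vectors. The example assumes the two pages have already been downloaded and reduced to plain text; the tokenizer (lowercased word characters) and the unweighted counts are simplifying assumptions, since the slides do not specify the term weighting used.

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two texts."""
    a = Counter(re.findall(r"\w+", text_a.lower()))
    b = Counter(re.findall(r"\w+", text_b.lower()))
    # Dot product over the terms the two texts share.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0
```

Identical texts score close to 1.0, texts with no shared terms score 0.0, and a value such as the 70% shown on the slide indicates substantial but incomplete overlap.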

Page 59: Modeling Temporal Intention in Resource Sharing

Entity Identification

• By visual inspection we observed that the majority of tweets about celebrities are related to current events.
• We harvested Wikipedia for lists of actors, politicians, and athletes.
• We checked for the existence of a celebrity mention in the tweets.

Actor: Johnny Depp
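The mention check described above can be approximated by matching each tweet against a list of names. The sketch below assumes the harvested names are already available as plain strings; `find_celebrity_mentions` and the sample tweets are hypothetical, and a real pipeline would likely need to handle nicknames, hashtags, and misspellings.

```python
import re


def find_celebrity_mentions(tweet: str, names):
    """Return the names from `names` that appear in the tweet as whole words."""
    text = tweet.lower()
    return [
        name
        for name in names
        if re.search(r"\b" + re.escape(name.lower()) + r"\b", text)
    ]


# Illustrative name list; the slide gives Johnny Depp as an example actor.
names = ["Johnny Depp"]
hits = find_celebrity_mentions("Just saw johnny depp at the premiere!", names)
misses = find_celebrity_mentions("Quiet day on the live web", names)
```

A non-empty result can then be used as a binary feature indicating that the tweet mentions a known public figure.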