Linking Born-Digital News and Social Media Collections via Automated Entity Detection and Authority Matching
Peter Broadwell (@peterbroadwell) and Martin Klein (@mart1nkle1n), University of California Los Angeles Research Library

@peterbroadwell and @mart1nkle1n | Linking Born-Digital News and Social Media Collections via Automated Entity Detection and Authority Matching | DLF Forum, Vancouver, 28 October 2015

Linking Born-Digital News and Social Media Collections via Automated Entity Detection and Authority Matching

DLF Forum, Vancouver, 28 October 2015

Peter Broadwell (@peterbroadwell)

Martin Klein (@mart1nkle1n)

University of California Los Angeles Research Library

2

Collections related to news events

• Collected by researchers
• Donated by activists
• Diverse in format:
  • Images, audio, video, scanned documents, social media, web server logs

3

International Digitizing Ephemera

http://digital.library.ucla.edu/dep/

• 829 digitally recorded Iranian dissident news programs
• 9,166 other videos from the Iranian Green Movement
• 29,441 digital photographs from the Green Movement
• 543 documents from Tahrir Square

4

International Digitizing Ephemera – Tweets

• Tahrir Square, Egypt & Libya unrest, 2011
• Tōhoku earthquake and tsunami, Japan, 2011
• AirAsia 8501 crash, December 2014
• Charlie Hebdo shooting, January 2015

5

Social Feed Manager: http://social-feed-manager.readthedocs.org/

6

NewsScape

• 264,000 hours of TV news archived digitally
• Recorded 2005-present, ca. 100 shows/day
• 13 countries, 9 languages
• 38 networks
• Searchable by captions, on-screen text, official transcripts

7

8

NewsScape

9

Social Local Global

10

Linking social media, TV news, and web news

11

Linking social media, TV news, and web news

Collection on the AirAsia QZ8501 crash of 12/28/2014; TV and social media recorded through 1/17/2015:
• 7.3 million tweets containing #AirAsia or #QZ8501
• 1.3 million distinct users
• 262 distinct television recordings
• 1,535 on-air mentions of AirAsia or [QZ]8501
• ~3,000 on-screen appearances of AirAsia or [QZ]8501
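The counts above rely on matching "AirAsia or [QZ]8501", where the bracket appears to mark an optional flight-code prefix. A minimal Python sketch of how such caption mentions and collection hashtags might be matched with regular expressions; the patterns and the function names `count_mentions` and `is_collected_tweet` are illustrative assumptions, not the filters actually used to build the collection.

```python
import re

# Hypothetical patterns mirroring the slide's "AirAsia or [QZ]8501",
# reading "[QZ]" as an optional flight-code prefix (assumption).
MENTION = re.compile(r"\bAirAsia\b|\b(?:QZ\s?)?8501\b", re.IGNORECASE)
# Exact collection hashtags; "#AirAsiaIndonesia" is deliberately not matched.
HASHTAG = re.compile(r"#(?:AirAsia|QZ8501)\b", re.IGNORECASE)


def count_mentions(caption_lines):
    """Count AirAsia/8501 mentions across a list of caption strings."""
    return sum(len(MENTION.findall(line)) for line in caption_lines)


def is_collected_tweet(text):
    """Return True if a tweet carries one of the collection hashtags."""
    return HASHTAG.search(text) is not None


if __name__ == "__main__":
    captions = ["AirAsia flight QZ8501 lost contact", "Search for 8501 continues"]
    print(count_mentions(captions))
    print(is_collected_tweet("Praying for #QZ8501"))
```

Because `findall` resumes after each match, "QZ8501" counts once rather than once for "QZ8501" and again for "8501".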

12

13

Linking via Automated Entity Detection

• Discover and highlight commonalities and relationships between disjoint collections on related news events
• Link to authorities
• Address the problem of disambiguation

• Establish a workflow for automatic linking
• Integrate with search and discovery interfaces
• Expose links via APIs

14

CNN 09/16/2015 05:22pm

15

CNN 09/16/2015 05:22pm

Twitter 09/16/2015 06:22pm

16

17

18

Experiment 1/3

• Apply DBpedia Spotlight Named Entity Recognition (NER) software to collections on the second GOP presidential primary debate, 09/16/2015
  • Twitter: 800,000 tweets
  • TV: CNN coverage of the debate
  • Minute granularity
  • Persons, Organizations, Places

Results:
• Entities linked via URIs to DBpedia resources
• Visualization of correlations between entities
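The per-minute entity extraction described above could be approximated as in the Python sketch below. It assumes the public DBpedia Spotlight `/annotate` endpoint (a locally hosted Spotlight server exposes the same path) and its JSON response, in which each linked entity appears under `Resources` with an `@URI` and a comma-separated `@types` string. The endpoint URL, the 0.5 confidence default, the function names, and the `(timestamp, response)` input layout are hypothetical; the slides do not say how Spotlight was called.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter, defaultdict
from datetime import datetime

# Assumed endpoint: the public DBpedia Spotlight service (hypothetical choice).
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"

# DBpedia ontology type labels for the three entity classes on the slide.
WANTED = {"DBpedia:Person": "Person",
          "DBpedia:Organisation": "Organization",
          "DBpedia:Place": "Place"}


def annotate(text, confidence=0.5):
    """Send text to Spotlight and return the decoded JSON (network call)."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    req = urllib.request.Request(f"{SPOTLIGHT_URL}?{query}",
                                 headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def entities(response):
    """Yield (entity class, DBpedia URI) pairs from a Spotlight response."""
    # "Resources" is absent when Spotlight finds nothing to link.
    for res in response.get("Resources", []):
        types = res.get("@types", "").split(",")
        for dbp_type, label in WANTED.items():
            if dbp_type in types:
                yield label, res["@URI"]
                break


def bucket_by_minute(items):
    """Map (ISO timestamp, response) pairs to minute -> Counter of URIs."""
    buckets = defaultdict(Counter)
    for ts, response in items:
        minute = datetime.fromisoformat(ts).replace(second=0, microsecond=0)
        for _label, uri in entities(response):
            buckets[minute][uri] += 1
    return buckets
```

Raising `confidence` makes Spotlight drop less certain disambiguations, trading recall for precision; running tweets and per-minute caption text through the same path yields directly comparable URI counts.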

19

Experiment 1/3 – Persons

20

Experiment 1/3 – Places

21

Experiment 1/3 – Organizations
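The slides do not specify how the entity correlations in these charts were computed. One simple assumption is the Pearson correlation between an entity's per-minute mention counts in the Twitter and TV collections, as in this hypothetical Python sketch; the function names and the minute-to-counts dictionary layout are illustrative, and no lag between on-air mentions and tweets is modeled.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length count series (0.0 if either is flat)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def entity_correlation(twitter, tv, uri, minutes):
    """Correlate one entity's per-minute mention counts across two collections.

    twitter, tv: dicts mapping minute -> {uri: count} (hypothetical layout).
    minutes: ordered list of minutes covered by both recordings;
    minutes with no data count as zero mentions.
    """
    xs = [twitter.get(m, {}).get(uri, 0) for m in minutes]
    ys = [tv.get(m, {}).get(uri, 0) for m in minutes]
    return pearson(xs, ys)
```

A cross-correlation over a few minutes of offset would be a natural extension if tweets consistently trail the broadcast.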

22

Experiment 2/3

• Expanded the TV news coverage to include up to 4 days after the debate, on 17 local, U.S., and international channels

Results:
• Discovery of related news shows by matching terms and entities from Twitter
• Visualization highlighting the degree of relationships
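Matching Twitter terms and entities against TV shows could be sketched as follows, assuming terms and entity URIs have already been extracted from the tweets and from each show's captions. Weighting each match by the term's share of Twitter volume is an illustrative choice, and `rank_shows`, `top_k`, and the show-id layout are hypothetical, not the authors' method.

```python
from collections import Counter


def rank_shows(twitter_terms, shows, top_k=50):
    """Rank TV shows by how strongly they share Twitter's most frequent terms.

    twitter_terms: Counter of terms/entity URIs extracted from the tweets.
    shows: dict mapping a (hypothetical) show id to the set of terms in its captions.
    Returns (show id, score, matched terms) tuples, best match first.
    """
    top = dict(twitter_terms.most_common(top_k))
    total = sum(top.values()) or 1
    ranked = []
    for show_id, terms in shows.items():
        matched = sorted(t for t in terms if t in top)
        # Weight each match by the term's share of the top-k Twitter volume.
        score = sum(top[t] for t in matched) / total
        ranked.append((show_id, score, matched))
    ranked.sort(key=lambda r: r[1], reverse=True)
    return ranked
```

The score lies between 0 and 1, which maps naturally onto a visual encoding of the degree of relationship.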

23

Experiment 2/3 – Terms matched

24

Experiment 2/3 – Persons matched

25

Experiment 3/3

• Automatic geocoding of place names extracted from Twitter and CNN coverage

Results:
• Using geographical proximity to explore potentially relevant correlations
• Visualization of places/regions and their frequency of reference in each collection
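Once place names are geocoded, geographic proximity can be tested with the haversine great-circle distance. The Python sketch below assumes a hypothetical `coords` lookup already produced by some geocoder (the slides do not name one) and a 100 km radius chosen only for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def nearby_pairs(places_a, places_b, coords, radius_km=100.0):
    """Pair place names from two collections whose geocodes lie within radius_km.

    coords: dict place name -> (lat, lon), e.g. a geocoder's output
    (hypothetical here). Names without coordinates are skipped.
    """
    pairs = []
    for p in places_a:
        for q in places_b:
            if p in coords and q in coords:
                d = haversine_km(coords[p], coords[q])
                if d <= radius_km:
                    pairs.append((p, q, round(d, 1)))
    return pairs
```

The nested loop is quadratic, which is acceptable for the few hundred distinct place names a single event typically yields; larger sets would call for a spatial index.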

26

Experiment 3/3 – Twitter

27

Experiment 3/3 – NewsScape

28

Next steps

• Apply techniques to other DL collections
• Explore domain-specific customization of NER tools and authority matching
• Investigate methods to quantify collection overlap
• Incorporate more linked open data ontologies
• Improve support for other languages
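As one possible starting point for quantifying collection overlap, two standard set measures over the DBpedia URIs found in each collection are sketched below in Python. This is an assumption about a possible approach, not a method from the slides; the function names are illustrative.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a, b):
    """Size of the intersection over the size of the smaller set (0.0 if either is empty)."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0
```

When one collection (e.g. 800,000 tweets) dwarfs the other (a single broadcast), Jaccard is capped by the ratio of the set sizes, so the overlap coefficient may better reflect how much of the smaller collection is covered.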
