BACK TO BASICS: BIG DATA AND EDUCATION IN THE SOCIAL SCIENCES
Matthew S. Weber, Rutgers University
AEJMC 2014, Montreal, Canada

AEJMC 2014 - Big Data and Education

DESCRIPTION

The role of big data in education for the social sciences.

Breaking down the walls of big data?

http://archivehub.rutgers.edu

EXAMPLE: Undergraduates

Learning About Your Network

• By being aware of your connections, you can take an active role in managing your connections
  – Be aware of the connections that you have, and what they contribute to your “network”
  – Seek out networking opportunities
  – Forge connections with people you admire and respect

LinkedIn Network Maps

Assignment Prompt

Prompt: Use www.touchgraph.com/facebook to generate a map of your Facebook network. Spend some time exploring your different connections, and then respond to the following:

• What different types of clusters do you see? Be specific in identifying at least 2–3 different clusters.
• Is there someone in your network you forgot about? Who? Why?
• Identify 2 people who you feel are the most useful connections in your network based on where they are positioned. Who are they and why are they useful?
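The prompt asks students to read clusters and well-positioned contacts off a TouchGraph map by eye. As a hedged illustration of how those ideas can be made computational, the sketch below works on a small hypothetical friend list (all names and ties are invented; nothing is exported from Facebook or TouchGraph). It leaves "you" out of the graph, because the ego is tied to everyone, treats connected components of the remaining ties as a coarse stand-in for clusters, and uses betweenness centrality (Brandes' algorithm) as one possible reading of "most useful by position": contacts who sit on the paths between otherwise separate groups.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def components(adj):
    """Connected components found by breadth-first search."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        group, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            group.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        groups.append(group)
    return groups


def betweenness(adj):
    """Brandes' betweenness centrality for an unweighted, undirected graph."""
    score = dict.fromkeys(adj, 0.0)
    for source in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from source
        dist = dict.fromkeys(adj, -1)  # -1 marks "not reached yet"
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate dependencies, farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical ties among one student's friends; the student ("you") is left
# out because the ego is tied to everyone and would sit on every path.
FRIEND_TIES = [
    ("Ann", "Ben"), ("Ben", "Cal"), ("Ann", "Cal"),   # e.g. college friends
    ("Cal", "Dana"), ("Dana", "Eve"),                 # Dana bridges two groups
    ("Eve", "Fay"), ("Fay", "Gus"), ("Eve", "Gus"),   # e.g. coworkers
    ("Hal", "Ivy"),                                   # e.g. family
]

adj = build_graph(FRIEND_TIES)
clusters = components(adj)
scores = betweenness(adj)
brokers = sorted(scores, key=lambda name: (-scores[name], name))[:2]
print("clusters:", [sorted(group) for group in clusters])
print("best-positioned:", brokers)
```

In this toy network the college and work groups land in one component because Dana links them, which is also why Dana tops the betweenness ranking; splitting bridged groups like these into separate clusters would take a community-detection method rather than plain connected components.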

EXAMPLE: PhD

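-- Before running, REGISTER the jars that provide ArchiveJSONViewLoader and the custom org.sci.historycrawl UDFs (jar paths are not given on the slide).
SET DEFAULT_PARALLEL 30;
-- Load each WAT record's HTML link metadata plus the WARC target URI, date, content type and content length.
titles = LOAD '/home/hai/Projects/HistoryCrawl/Data/IA/2_26_2014/nsf1.wat.gz' USING org.archive.hadoop.ArchiveJSONViewLoader('Envelope.Payload-Metadata.HTTP-Response-Metadata.HTML-Metadata','Envelope.WARC-Header-Metadata.WARC-Target-URI','Envelope.WARC-Header-Metadata.WARC-Date','Envelope.WARC-Header-Metadata.Content-Type','Envelope.WARC-Header-Metadata.Content-Length') AS (links:chararray,target:chararray,date:chararray,contenttype:chararray,contentlength:chararray);
-- Keep records that carry link metadata and parse their links with the custom parser UDF.
nonnulls = filter titles by links is not null;
paths = foreach nonnulls generate org.sci.historycrawl.parser($0,$1,$2),$2,$3,$4;
-- bagwati.url projects the url field of the bag returned by the parser (schema defined by the custom UDF).
i6 = foreach paths generate bagwati.url,$1,$2,$3;
-- One row per link; the first 10 characters of the WARC date are passed to the custom formatdate UDF.
i7 = foreach i6 generate flatten($0) as words,org.sci.historycrawl.formatdate(SUBSTRING($1,0,10)),$2,$3;
-- Custom UDFs extract source URL, destination URL and link text; content length is cast to long.
i8 = foreach i7 generate org.sci.historycrawl.getsourceURL($0),org.sci.historycrawl.getdstURL($0),org.sci.historycrawl.getText($0),$1,$2,(long)$3;
-- Per (source, destination, date): greatest text value via TOP(1,0,...), link count, greatest content type, summed bytes.
i9 = group i8 by ($0,$1,$3);
i10 = foreach i9 generate FLATTEN(group),FLATTEN(TOP(1,0,i8.$2)),COUNT(i8),FLATTEN(TOP(1,0,i8.$4)),SUM(i8.$5);
-- Drop rows missing a source or destination, then store as tab-delimited text.
i11 = filter i10 by $0 is not null;
i12 = filter i11 by $1 is not null;
store i12 INTO '/home/hai/Projects/HistoryCrawl/Data/IA/2_26_2014/HC_Output' using PigStorage();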
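For readers who do not use Pig, the sketch below re-expresses the same load, group and aggregate steps in plain Python under stated assumptions. It assumes each WAT record is a JSON object with the field paths named in the LOAD statement, and that HTML-Metadata holds a Links list whose entries carry a url and, optionally, a text field. The custom org.sci.historycrawl UDFs are not published on the slides, so their behavior here (source = WARC-Target-URI, destination = link URL, date = first 10 characters of WARC-Date) is a hypothetical stand-in. Pig's TOP(1, 0, ...) keeps the tuple with the largest value in column 0, so for text fields the sketch uses max(), i.e. the lexicographically greatest string rather than the most frequent one. The example.com record is invented.

```python
from collections import defaultdict


def link_rows(record):
    """Yield one (source, destination, text, date, content_type, length) per link.

    Hypothetical stand-in for the unpublished org.sci.historycrawl UDFs:
    source = WARC-Target-URI, destination = link "url", text = link "text",
    date = first 10 characters of WARC-Date.
    """
    envelope = record.get("Envelope", {})
    header = envelope.get("WARC-Header-Metadata", {})
    html = (envelope.get("Payload-Metadata", {})
            .get("HTTP-Response-Metadata", {})
            .get("HTML-Metadata"))
    if html is None:  # like: filter titles by links is not null
        return
    date = header.get("WARC-Date", "")[:10]
    # A missing Content-Length is counted as 0 here (Pig would carry a null).
    length = int(header["Content-Length"]) if "Content-Length" in header else 0
    for link in html.get("Links", []):
        yield (header.get("WARC-Target-URI"), link.get("url"), link.get("text"),
               date, header.get("Content-Type"), length)


def aggregate(rows):
    """Group by (source, destination, date), as in: group i8 by ($0,$1,$3)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[0], row[1], row[3])].append(row)
    result = []
    for (source, destination, date), members in groups.items():
        if source is None or destination is None:  # the two final filters
            continue
        texts = [m[2] for m in members if m[2] is not None]
        ctypes = [m[4] for m in members if m[4] is not None]
        result.append((
            source, destination, date,
            max(texts, default=None),    # TOP(1,0,i8.$2): greatest text value
            len(members),                # COUNT(i8)
            max(ctypes, default=None),   # TOP(1,0,i8.$4)
            sum(m[5] for m in members),  # SUM(i8.$5)
        ))
    return result


# Invented WAT-style record for illustration only.
record = {"Envelope": {
    "WARC-Header-Metadata": {
        "WARC-Target-URI": "http://example.com/",
        "WARC-Date": "2012-10-22T12:00:00Z",
        "Content-Type": "application/http; msgtype=response",
        "Content-Length": "5000"},
    "Payload-Metadata": {"HTTP-Response-Metadata": {"HTML-Metadata": {"Links": [
        {"path": "A@/href", "url": "http://example.com/a", "text": "Story A"},
        {"path": "A@/href", "url": "http://example.com/a", "text": "Read more"},
        {"path": "A@/href", "url": "http://example.com/b"}]}}}}}

for row in aggregate(link_rows(record)):
    print(row)
```

As in the Pig script, the bytes column sums the page's content length once per link row, so it grows with the number of links found on a page.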

Link Data:

Source | Destination | Date | Frequency | Content Type | Bytes | Descriptive Text
http://gawker.com | http://gawker.com/5953665/mitt-romneys-staff-played-the-media-covering-them-in-a-friendly-game-of-flag-football | 2012-10-22 |  |  |  | Mitt Romney's Staff Played the Media Covering Them in a Friendly Game of Flag
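PigStorage() with no arguments writes tab-delimited text, with null fields left empty, into part-* files inside the output directory, so the stored HC_Output can be read back under the column names above. The sketch below assumes that default. Note that the generate clause of i10 in the script emits fields as source, destination, date, text, count, content type, bytes, so the stored order differs from the column order in the header above. The sample line is hypothetical, not taken from the HistoryCrawl output.

```python
from pathlib import Path

# Field order follows the generate clause of i10 (source, destination, date,
# TOP text, COUNT, TOP content type, SUM bytes), not the slide's column order.
FIELDS = ("source", "destination", "date", "descriptive_text",
          "frequency", "content_type", "bytes")


def parse_line(line):
    """Parse one PigStorage() line: tab-delimited, nulls written as empty fields.

    PigStorage does not escape tabs or newlines inside values, so link text
    containing a tab would break this simple split.
    """
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    row = {name: (value if value != "" else None)
           for name, value in zip(FIELDS, values)}
    row["frequency"] = int(row["frequency"])
    if row["bytes"] is not None:
        row["bytes"] = int(row["bytes"])
    return row


def read_output(out_dir):
    """Yield parsed rows from every part-* file in a Pig output directory."""
    for part in sorted(Path(out_dir).glob("part-*")):
        with part.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    yield parse_line(line)


# Hypothetical sample line for illustration.
sample = "http://example.com/\thttp://example.com/a\t2012-10-22\tStory A\t2\ttext/html\t10000\n"
print(parse_line(sample))
```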

Dataset | Research Potential | Dates | Captures | Unique URLs
Hurricane Katrina | Online networks and organizational resilience (Chewning, Lai and Doerfel, 2012; Perry, Taylor and Doerfel, 2003) in the wake of disasters; information dissemination | 2003 – 2012 | 1,694,236 | 663,740
Superstorm Sandy | (as above) | 2003 – 2012 | 41,703,112 | 20,013,455
US Senate | Study the growth of political activity in online environments (Adamic & Glance, 2005; Bruns, 2007; Chang & Park, 2012); polarization & media discourse | 109th – 112th Congresses | 26,965,770 | 8,674,397
US House | (as above) | (as above) | 51,840,777 | 12,410,014
Occupy Wall Street | Previous research on NGOs in the online environment (Bach & Stark, 2004; Shumate, 2003, 2012; Shumate, Fulk, & Monge, 2005); use of hyperlink data to study the formation and role of alliances between social movement organizations (SMOs) | 2010 – 2012 | 247,928,272 | 113,259,655
US Media | Previous studies of news media organizations (Greer & Mensing, 2006; Weber, 2012; Weber & Monge, In Press); focus on evolutionary patterns | 2008 – 2012 | 1,315,132,555 | 539,184,823
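Dividing captures by unique URLs gives a rough sense of how many times, on average, each URL appears among the captures in a collection. The short sketch below computes that ratio from the figures in the dataset table; it reads the Occupy Wall Street unique-URL count printed on the slide as "11,3259,655" as 113,259,655 (same digits, regrouped), which is an assumption.

```python
# Captures and unique URLs as listed in the dataset table. The Occupy Wall
# Street unique-URL count is read as 113,259,655 (an assumption about the slide).
DATASETS = {
    "Hurricane Katrina": (1_694_236, 663_740),
    "Superstorm Sandy": (41_703_112, 20_013_455),
    "US Senate": (26_965_770, 8_674_397),
    "US House": (51_840_777, 12_410_014),
    "Occupy Wall Street": (247_928_272, 113_259_655),
    "US Media": (1_315_132_555, 539_184_823),
}


def captures_per_url(captures, unique_urls):
    """Average number of captures per unique URL."""
    return captures / unique_urls


for name, (captures, unique_urls) in DATASETS.items():
    print(f"{name:20s} {captures_per_url(captures, unique_urls):.2f}")
```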

• Email me! [email protected]
• ArchiveHub: http://archivehub.rutgers.edu
• The Team
  – Kris Carpenter, Vinay Goel, Internet Archive
  – David Lazer, Katherine Ognyanova, Northeastern University
  – Allie Kosterich, Hai Nguyen, Luan Nguyen, Marya Doerfel, Rutgers University
  – Peter Monge, Ayushman Datta, Kristen Guth, USC

Research supported by NSF Award #1244727 and the NetSCI Lab @ Rutgers