Crawling and Scraping: The Issuecrawler and the Lippmannian device
Michael Stevenson
Issuecrawler. What does it do?
Diagram: how the Issuecrawler crawls (sites shown as A to H)
CRAWL STARTING POINTS: sites A, B and C.
CRAWL DEPTH ONE: follow all starting points' outlinks (sites A to D).
CRAWL DEPTH TWO: follow all outlinks from the pages found in the previous depth (sites A to H).
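The crawl steps above amount to a breadth-first traversal cut off at a fixed depth. Below is a minimal Python sketch. It assumes the Web is an in-memory dict from each site to its outlinks. The graph `LINKS` is hypothetical and reuses the slide's placeholder labels A to H. The real Issuecrawler fetches and parses live pages, which is left out here.

```python
# Hypothetical link graph (placeholder sites A to H, as on the slides):
# each site maps to the sites its pages link to.
LINKS = {
    "A": ["B", "D"],
    "B": ["D"],
    "C": ["B", "D"],
    "D": ["E", "F", "G", "H"],
}


def crawl(starting_points, links, depth):
    """Breadth-first crawl.

    Depth one follows the starting points' outlinks; each further depth
    follows the outlinks of the sites first found at the previous depth.
    Returns (sites found, links found as (source, target) pairs).
    """
    seen = set(starting_points)
    frontier = list(starting_points)
    found_links = set()
    for _ in range(depth):
        next_frontier = []
        for site in frontier:
            for target in links.get(site, []):
                found_links.add((site, target))
                if target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return seen, found_links


for d in (1, 2):
    sites, _ = crawl(["A", "B", "C"], LINKS, d)
    print(d, sorted(sites))
```

With these hypothetical links, depth one reaches sites A to D and depth two reaches A to H, mirroring the diagrams.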
Diagram: which sites and links each analysis retains
ANALYSIS SNOWBALL: retain all links and sites discovered during the crawl (sites A to H).
ANALYSIS INTER-ACTOR: retain only links between the starting points (sites A, B and C).
ANALYSIS CO-LINK: retain sites that receive links from at least two other sites (sites B and D).
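The three analysis modes can be read as filters over the links a crawl has found. The sketch below follows only the one-line definitions on the slides. The real Issuecrawler's co-link module may work differently (for instance at page level or over several iterations). The edge set is a hypothetical two-deep crawl from starting points A, B and C.

```python
from collections import defaultdict

# Hypothetical links found by a two-deep crawl (placeholder sites, as on the slides).
EDGES = {("A", "B"), ("A", "D"), ("B", "D"), ("C", "B"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("D", "G"), ("D", "H")}
STARTING_POINTS = {"A", "B", "C"}


def snowball(edges):
    """Keep every site and every link discovered during the crawl."""
    sites = {site for edge in edges for site in edge}
    return sites, set(edges)


def inter_actor(edges, starting_points):
    """Keep only links whose source and target are both starting points."""
    kept = {(s, t) for s, t in edges if s in starting_points and t in starting_points}
    return set(starting_points), kept


def co_link(edges, min_inlinks=2):
    """Keep sites linked to by at least `min_inlinks` distinct other sites,
    plus the links among those retained sites."""
    linkers = defaultdict(set)
    for source, target in edges:
        if source != target:
            linkers[target].add(source)
    sites = {site for site, sources in linkers.items() if len(sources) >= min_inlinks}
    kept = {(s, t) for s, t in edges if s in sites and t in sites}
    return sites, kept


print(sorted(co_link(EDGES)[0]))
```

With these links, B is linked from A and C, and D from A, B and C, so the co-link filter keeps B and D, as in the diagram.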
Climate change blogs network
Starting points: blogroll from RealClimate.org
Results: mix of blogs, social media, traditional media and governmental and non-governmental organizations.
Climate change science network
Starting points: “science links” from RealClimate.org
Results: mix of governmental, non-governmental, educational and media organizations
OK... We have the issue networks, but what can we say about their content?
Lippmannian device (aka the Google Scraper)
What does it do?
1. Explore a source’s partisanship or commitment.
2. Show the issue agenda of an organization or movement.
Source cloud: Partisanship or commitment. Which sources mention the expert’s name?
Issue cloud: Issue agenda. Which issues are on the agenda of an organization or movement?
Lippmannian device. “Source cloud”
Showing the partisanship or commitment of sources to one name
Craig Venter's presence in the Synthetic Biology issue space, March 2008. Top sources on "synthetic biology" according to a Google query, with number of mentions of Venter per source, ordered.
Lippmannian device. “Source cloud”
Method for showing the partisanship or commitment of sources to names
1. Gather source list (e.g. through Issuecrawler)
2. Query source list for one or more experts
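The two steps can be sketched as follows. As an illustration only, not a description of the tool's internals, the sketch assumes that each source is queried with Google's `site:` operator and an exact-phrase name. `count_mentions` is a hypothetical stand-in for fetching a result count. Here it reads from a small dict holding two counts from the Frederick Seitz cloud later in this deck.

```python
from typing import Callable, Dict, List, Tuple


def build_queries(sources: List[str], names: List[str]) -> List[Tuple[str, str]]:
    """One query per (source, name) pair: exact-phrase name, limited to the source."""
    return [(source, f'"{name}" site:{source}') for source in sources for name in names]


def source_cloud(sources: List[str], names: List[str],
                 count_mentions: Callable[[str], int]) -> List[Tuple[str, int]]:
    """Total mentions of the names per source, most-mentioning sources first."""
    totals: Dict[str, int] = {source: 0 for source in sources}
    for source, query in build_queries(sources, names):
        totals[source] += count_mentions(query)
    # Sort by mentions (descending), then alphabetically to break ties.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical stand-in for live result counts; unlisted queries count as 0.
FAKE_COUNTS = {
    '"Frederick Seitz" site:realclimate.org': 35,
    '"Frederick Seitz" site:marshall.org': 8,
}

cloud = source_cloud(["epa.gov", "marshall.org", "realclimate.org"],
                     ["Frederick Seitz"],
                     lambda query: FAKE_COUNTS.get(query, 0))
print(cloud)
```

Summing over names lets one cloud cover several experts; to see each expert separately, run `source_cloud` once per name.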
Lippmannian device. “Source cloud”
Showing the partisanship or commitment of sources to names
Climate Change Skeptics: Who recognizes them?
(Digital Methods Initiative, 2007) https://wiki.digitalmethods.net/Dmi/ClimateChangeSkeptics
Lippmannian device. “Making an Issue cloud”
An organization’s issue agenda (or commitment)
Public Knowledge, a digital rights NGO, has issues. Which are they most committed to?
Lippmannian device. “Issue cloud”
Showing the issue commitments of the NGO, Public Knowledge
Public Knowledge's issue commitment. Lower six issues on Public Knowledge's issue list, ranked according to number of mentions of issues on publicknowledge.org, 2 October 2009.
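Ranking one organization's issues and sizing them as a tag cloud can be sketched like this. The mention counts are invented placeholders, not Public Knowledge's figures. The linear font scaling is also an assumption, because the deck does not say how its Tag Cloud Generator sizes words.

```python
from typing import Dict, List, Tuple


def issue_cloud(counts: Dict[str, int], min_size: int = 10,
                max_size: int = 40) -> List[Tuple[str, int, int]]:
    """Rank issues by mentions (highest first) and scale each to a font size
    between min_size and max_size, linearly in its mention count."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    low, high = min(counts.values()), max(counts.values())
    spread = (high - low) or 1  # avoid dividing by zero when all counts are equal
    return [(issue, n, round(min_size + (max_size - min_size) * (n - low) / spread))
            for issue, n in ranked]


# Invented counts for three placeholder issues.
cloud = issue_cloud({"issue-a": 120, "issue-b": 60, "issue-c": 0})
bottom_two = cloud[-2:]  # the lower end of the agenda, like the "lower six" caption
print(cloud)
```

Slicing the ranked list from the end gives the weakest commitments, which is how a "lower six" view would be taken.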
Lippmannian device. “Making an Issue cloud”
Greenpeace issues, http://www.greenpeace.org/international/campaigns.
Stop climate change
Protect ancient forests
Defending our Oceans
Say no to genetic engineering
Eliminate toxic chemicals
Demand Peace and Disarmament
End the nuclear age
Encourage sustainable trade
Keep most significant issue language.
"climate change"
"ancient forests"
oceans
"genetic engineering"
"toxic chemicals"
disarmament
"nuclear power"
"sustainable trade"
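Reducing each title is an editorial judgment. The sketch below therefore records the slide's mapping as given and then applies the quoting pattern it shows: multi-word terms go in double quotes, and single words stay bare. Restricting each query with `site:greenpeace.org` is an assumption about how the counts are taken.

```python
# Campaign titles from greenpeace.org/international/campaigns, each reduced
# by hand to its most significant issue language (as listed on the slide).
ISSUE_LANGUAGE = {
    "Stop climate change": "climate change",
    "Protect ancient forests": "ancient forests",
    "Defending our Oceans": "oceans",
    "Say no to genetic engineering": "genetic engineering",
    "Eliminate toxic chemicals": "toxic chemicals",
    "Demand Peace and Disarmament": "disarmament",
    "End the nuclear age": "nuclear power",
    "Encourage sustainable trade": "sustainable trade",
}


def as_query_term(phrase: str) -> str:
    """Wrap multi-word phrases in quotes for exact matching; leave single words bare."""
    return f'"{phrase}"' if " " in phrase else phrase


def issue_queries(site: str, issue_language: dict) -> list:
    """One site-restricted query per issue term (the site: restriction is assumed)."""
    return [f"{as_query_term(term)} site:{site}" for term in issue_language.values()]


for query in issue_queries("greenpeace.org", ISSUE_LANGUAGE):
    print(query)
```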
Lippmannian device. “Issue cloud”
Greenpeace’s issue agenda (distribution of commitment)
Greenpeace's issue commitment. Greenpeace's campaign issue list, ranked according to number of mentions of issues on greenpeace.org, 11 October 2009.
Lippmannian device. “Making an Issue cloud”
Multiple sources, multiple issues
What is the agenda of the global human rights network?
Which issues are at the top and at the bottom of the agenda?
What is the current level of commitment to a particular issue?
Lippmannian device. “Making an Issue cloud”
Multiple sources, multiple issues
This is more complicated, but still doable (Govcom.org, University of Pittsburgh, UMass Amherst, ongoing)
Lippmannian device. “Making an Issue cloud”
Take three good lists of human rights organizations (global south, global north, UN’s)
Lippmannian device. “Making an Issue cloud”
Make a list of all issues listed on all Websites
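Once every site has been queried for every issue, the network's agenda is the per-issue total across sites, ranked. The top and bottom of the agenda are the two ends of that ranking. The sketch assumes the per-site counts are already in hand, and the site names, issues and numbers are placeholders. The captions call Google's counts estimates, so any totals built from them are estimates too.

```python
from collections import Counter
from typing import Dict, List, Tuple


def network_agenda(counts_by_site: Dict[str, Dict[str, int]]) -> List[Tuple[str, int]]:
    """Sum each issue's mentions over all sites; rank from most to least mentioned."""
    totals: Counter = Counter()
    for site_counts in counts_by_site.values():
        totals.update(site_counts)  # adds counts per issue, keeping zero entries
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


def top_and_bottom(ranked: List[Tuple[str, int]], n: int):
    """The n most and n least mentioned issues of the network."""
    return ranked[:n], ranked[-n:]


# Placeholder counts: site -> {issue: mentions on that site}.
ranked = network_agenda({
    "site-1.org": {"issue-a": 5, "issue-b": 0, "issue-c": 2},
    "site-2.org": {"issue-a": 1, "issue-b": 3, "issue-c": 0},
})
print(top_and_bottom(ranked, 1))
```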
Lippmannian device. “Issue cloud”
Showing the issue commitments of global human rights network
Global human rights issue agenda. Global human rights actors' issues, ranked according to the estimated number of Google mentions on a set of global human rights actors' websites, 31 March 2009.
Lippmannian device. “Issue cloud”
Showing the issue commitments of global human rights network
Global human rights issue agenda, bottom. Global human rights actors' issues, ranked according to the estimated number of Google mentions on a set of global human rights actors' websites, 31 March 2009.
Lippmannian device.
Partisanship check. Which side of the controversy is an actor on?
Use the source cloud
Lippmannian device.
1. Check an organization’s issue agenda. What are its current commitments?
2. Check a national or global movement’s issue agenda. What are its current commitments?
Use the issue cloud
Questions.
Exercise: Sourcing Climate Change Skeptics.
Climate Change Sceptics on the Web (Frederick Seitz)
Research Question: To what extent are climate change 'skeptics' present in the climate change spaces on the Web?
Findings: There is distance between the skeptics and the top of the search engine returns.
Source: google.com
Query: “Frederick Seitz”
Method: Search for query “Frederick Seitz” in top 100. Organized in order.
Tools: Google Scraper and Tag Cloud Generator
Date: 30 July 2007
Product of the Digital Methods Initiative, dmi.mediastudies.nl. Analysis by Bram Nijhof, Richard Rogers and Laura van der Vlies. Design: Anne Helmond.
CC BY-NC-SA
campaigncc.org (1)
climateark.org (4)
marshall.org (8)
realclimate.org (35)
sourcewatch.org (21)
abc.net.au (0)
acfonline.org.au (0)
bbc.co.uk (0)
bom.gov.au (0)
cbc.ca (0)
ciel.org (0)
climatechallenge.gov.uk (0)
climatechange.ca.gov (0)
climatechange.com.au (0)
climatechangecentral.com (0)
climatechangecollege.org (0)
climatecrisis.net (0)
climatescience.gov (0)
dar.csiro.au (0)
davidsuzuki.org (0)
defra.gov.uk (0)
dfat.gov.au (0)
ec.gc.ca (0)
ecn.ac.uk (0)
ecokids.ca (0)
ecy.wa.gov (0)
eea.europa.eu (0)
eldis.org (0)
energy.gov (0)
envirolink.org (0)
epa.gov (0)
exploratorium.edu (0)
faqs.org (0)
foe.co.uk (0)
ft.com (0)
g8.gov.uk (0)
gcrio.org (0)
greenpeace.org (0)
grida.no (0)
guardian.co.uk (0)
iea.org (0)
iisd.org (0)
ipcc.ch (0)
iucn.org (0)
ltscotland.org.uk (0)
metoffice.gov.uk (0)
mfe.govt.nz (0)
mofa.go.jp (0)
nature.com (0)
nature.org (0)
ncdc.noaa.gov (0)
open2.net (0)
panda.org (0)
pewclimate.org (0)
royalsoc.ac.uk (0)
scidev.net (0)
scienceagogo.com (0)
state.gov (0)
theglobeandmail.com (0)
ucar.edu (0)
un.org (0)
unep.org (0)
who.int (0)
whoi.edu (0)
worldwildlife.org (0)
CLIMATE CHANGE SCEPTICS
Research Question: Which climate change issue actors mention the skeptics, and what kinds of actors are more likely to mention them?
Method: Comparative Query: skeptics in three source sets (‘top’ sources, climate change blogs and climate change science network), outputting source cloud for each.
Source Sets:
(1) Top ten Google returns for “climate change” (mix of media as well as governmental organizations)
(2) Climate change blogs network (IssueCrawler results - mix of blogs, social media, traditional media and governmental and non-governmental organizations)
(3) Climate change science network (IssueCrawler results - governmental, non-governmental, educational and media organizations)
Steps:
- Install the DMI toolbar, and open the Lippmannian device (aka Google Scraper; see tools.digitalmethods.net).
- Acquire source sets and skeptics list.
- Enter source sets and skeptics names. Query the source sets separately, and remember to use “” to get exact returns.
- Wait, and fill in CAPTCHAs if necessary. Also use this moment to discuss hypotheses.
- Explore the output, and present findings.
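When exploring the output, one way to compare the three source sets is the share of sources in each set that mention at least one skeptic. This measure is a suggestion, not one the exercise prescribes. The set names follow the exercise, but the sources and counts below are placeholders.

```python
from typing import Dict


def mention_share(cloud: Dict[str, int]) -> float:
    """Fraction of sources in one source cloud with at least one mention."""
    if not cloud:
        return 0.0
    mentioning = sum(1 for n in cloud.values() if n > 0)
    return mentioning / len(cloud)


def compare_source_sets(clouds: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Mention share per source set, for side-by-side comparison."""
    return {name: mention_share(cloud) for name, cloud in clouds.items()}


# Placeholder source clouds: source -> total mentions of any skeptic's name.
shares = compare_source_sets({
    "top sources": {"source-1": 0, "source-2": 0, "source-3": 2, "source-4": 0},
    "blogs": {"blog-1": 3, "blog-2": 1},
    "science": {"site-1": 0, "site-2": 0},
})
print(shares)
```

A share ignores how often a source mentions the skeptics. It is meant to sit next to the source clouds, which show that intensity, rather than replace them.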