DataJournalism: How To get data and process them?

Workshop on data journalism given at the DataDays, organised by the Open Knowledge Foundation, on 17 February 2014.

Page 1

Workshop on Data Journalism

February 17, 2014, Ghent

How to get the data and how to process them?

Lorenzo Pellizzari

Page 2

About me …

Page 3

Get the data

How to get the data?

• Receive it

• Advanced search techniques

• Scrape it

Page 4

1. Receive it

Analyzing the War Logs (Associated Press)

Page 5

2. Advanced search techniques: Google

79,300,000 results

5 results
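The drop from roughly 79 million results to 5 presumably comes from refining the query with Google's search operators. As a minimal sketch, the Python function below assembles a query from three common operators (`site:`, `filetype:` and a quoted exact phrase) and turns it into a search URL. The function name, its parameters and the example terms are illustrative assumptions, not taken from the slides.

```python
from urllib.parse import urlencode


def google_search_url(terms, site=None, filetype=None, exact_phrase=None):
    """Build a Google search URL that narrows a query with advanced operators.

    The function name and parameters are hypothetical, chosen for illustration.
    """
    parts = list(terms)
    if exact_phrase:
        # Quotation marks ask for this exact phrase.
        parts.append(f'"{exact_phrase}"')
    if site:
        # site: restricts results to one domain (or top-level domain).
        parts.append(f"site:{site}")
    if filetype:
        # filetype: restricts results to one document type, e.g. xls or pdf.
        parts.append(f"filetype:{filetype}")
    # urlencode escapes spaces, colons and quotes for use in a URL.
    return "https://www.google.com/search?" + urlencode({"q": " ".join(parts)})


# Hypothetical example: spreadsheets mentioning "budget" on europa.eu.
print(google_search_url(["budget"], site="europa.eu", filetype="xls"))
# → https://www.google.com/search?q=budget+site%3Aeuropa.eu+filetype%3Axls
```

Pasting the printed URL into a browser runs the refined search; each operator added usually shrinks the result list further, which is the point of the comparison on the slide.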

Page 6

2. Advanced search techniques: SPARQL

http://dbpedia.org/sparql
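DBpedia exposes data extracted from Wikipedia through the SPARQL endpoint above. The Python code below is a minimal sketch of querying it with only the standard library: the query is sent as a GET parameter (SPARQL 1.1 Protocol) and the answer, requested as SPARQL JSON results through the `Accept` header, is flattened into plain rows. The example query (Belgian cities by population) and the `dbo:`/`dbr:` terms it uses are illustrative assumptions; whether it returns rows depends on how DBpedia currently models those entities. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative query (an assumption, not from the slides): the ten most
# populous resources that DBpedia types as cities in Belgium.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:country dbr:Belgium ;
        dbo:populationTotal ?population .
}
ORDER BY DESC(?population)
LIMIT 10
"""


def sparql_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as an HTTP GET URL (SPARQL 1.1 Protocol)."""
    return endpoint + "?" + urlencode({"query": query})


def bindings_to_rows(results):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # A variable left unbound in a solution is absent from its binding.
        rows.append({var: binding[var]["value"] if var in binding else None
                     for var in variables})
    return rows


def run_query(query, endpoint=ENDPOINT):
    """Send the query and return flattened rows (requires network access)."""
    request = Request(sparql_url(query, endpoint),
                      headers={"Accept": "application/sparql-results+json"})
    with urlopen(request) as response:
        return bindings_to_rows(json.load(response))
```

Calling `run_query(QUERY)` sends the request; the resulting list of dicts can be written to CSV or loaded into a spreadsheet for the processing step. Keeping the HTTP call separate from `bindings_to_rows` makes the parsing easy to check without a network connection.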

Page 7

2. Advanced search techniques: SPARQL

Page 8

2. Advanced search techniques: SPARQL

http://latemar.science.unitn.it/spacetime/spacetime.html

Page 9

3. Freedom of Information laws

Page 10

3. Freedom of Information laws

Page 11

4. Scrape your data

“Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites.” (Wikipedia)

http://www-news.iaea.org/
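As a minimal sketch of the technique, the Python code below uses only the standard library's `html.parser` to pull the cells of HTML tables into rows and write them out as CSV. It assumes the data sit in simple, non-nested `<table>` markup; it makes no assumption about the layout of the IAEA news site shown on the slide, and the class and function names are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> or <b> still belongs to the cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace and line breaks inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop a cell left open by malformed markup


def table_rows(html):
    """Return the table rows found in an HTML string."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialise rows as CSV text, ready for a spreadsheet or R."""
    out = io.StringIO()
    csv.writer(out).writerows(rows)
    return out.getvalue()


def scrape(url):
    """Download a page and return its table rows (requires network access)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return table_rows(response.read().decode(charset))
```

For a real page, `rows_to_csv(scrape(url))` would give text that opens directly in a spreadsheet. Pages that build their tables with JavaScript do not contain the data in the downloaded HTML, so they need a different approach.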

Page 12

4. Scrape your data

Page 13

4. Scrape your data

Page 14

Process the data

KDnuggets poll: “What Analytics, Data mining, Big Data software you used in the past 12 months for a real project (not just evaluation)” [798 voters]

http://www.kdnuggets.com/

Page 15

The software for data analysis

Share of R- or SAS-related posts to Stack Overflow by week.

http://r4stats.com/articles/popularity/

Page 16

The software for data analysis

Page 17

Example: ABC News

• Scraping: main data came from government websites

• Variety of reports: data on salt and water

• FOI: data on chemical releases

Interactive map of gas wells and leases in Australia

http://datajournalismhandbook.org/

Page 18

Example: ABC News

• A web developer and designer

• A lead journalist

• A part-time researcher with expertise in data extraction, Excel spreadsheets and data cleaning

• A part-time junior journalist

• A consultant executive producer

• An academic consultant with expertise in data mining, graphic visualization and advanced research skills

• The services of a project manager and the administrative assistance of the ABC’s multi-platform unit

• Importantly, we also had a reference group of journalists and others whom we consulted on a needs basis

http://datajournalismhandbook.org/

Page 19