MAKING THE GOV DATA OPEN | MAREK SOTAK | ATOMIC ANT | www.atomicant.co.uk | May 2011

Making the gov data more open

http://spring2011.drupalcamp.se/schedule/making-government-data-open-drupal-and-other-tools

OH HAI! ABOUT ME & ATOMIC ANT

Marek Sotak
• Web designer, developer
• From Prague, Czech Republic
• Over 5 years with Drupal - since v4.6
• Rootcandy admin theme
• Organising events - Drupal Design Camp, local meet-ups
• @sotak on twitter
• http://sotak.co.uk - personal blog/experiments

6 : 0 2 : 1

#justsaying ;)

• Based in London & Prague
• Human interface design, training, branding, development
• Clients all over the world
• http://atomicant.co.uk

OPEN DATA? HUH?

What is OPEN DATA?

Wikileaks Iraq war logs: every death mapped http://bit.ly/iraqwarlogs

Don't eat at ____ http://donteat.at

DATA MINING - SCRAPING: LET'S GET DIRTY

BigClean.org – Prague

There's a lot of data lying around on the internet that can be useful → crime reports, government reports, statistics, missing pets registers, current affairs

However, it often comes in a format such as PDF or HTML: something you can't readily run calculations, visualizations or filtering on.

Is it really that hard to publish something as CSV, XML, ...?

Ministry of the Interior – Czech Republic
Public Collections - open what?

Request a site/content

Run through the HTML (DOM, selectors)

Do whatever you want with the data

Save the data
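The four steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the code shown in the talk: the class and function names (TableScraper, request_page, scrape_rows, clean_rows, save_csv) and the sample table are made up for the example, and a real page would need selectors tailored to its markup.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen


class TableScraper(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def request_page(url):
    """Step 1: request a site/content (assumes the page is UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_rows(html):
    """Step 2: run through the HTML and pick out the table cells."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def clean_rows(rows):
    """Step 3: do whatever you want with the data; here, drop empty rows."""
    return [row for row in rows if any(row)]


def save_csv(rows, out):
    """Step 4: save the data as CSV to any file-like object."""
    csv.writer(out).writerows(rows)


# Made-up sample standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>District</th><th>Reports</th></tr>
  <tr><td> Praha 1 </td><td>12</td></tr>
  <tr><td></td><td></td></tr>
  <tr><td>Praha 2</td><td>7</td></tr>
</table>
"""

if __name__ == "__main__":
    rows = clean_rows(scrape_rows(SAMPLE_HTML))
    buffer = io.StringIO()
    save_csv(rows, buffer)
    print(buffer.getvalue())
```

To scrape a live page, pass the result of request_page(url) to scrape_rows instead of SAMPLE_HTML, and open a real file for save_csv.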

SCRAPERWIKI: REFINE AND SCRAPE DATA

SCRAPERWIKI: WHAT IS IT? HOW TO USE IT

Scrape and link data using Ruby, Python and PHP scripts that run maintenance-free in the cloud. Request data for scoops and better decisions.

Why would you want to use SCRAPERWIKI rather than other scraping tools or custom code?

• The dataset is available to everyone
• Anyone can access the data through the API
• If the source changes and the scraper breaks, anyone can fix the scraper
• Anyone can fork the scraper

IS THAT IT? CERTAINLY NOT

GOOGLE REFINE: WHAT IS IT? HOW TO USE IT

Google Refine is a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services,...
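To give a feel for what "cleaning up messy data" means in practice, here is a small Python sketch that groups spellings of the same value by a normalised key. It is loosely modelled on the key-collision "fingerprint" idea used for clustering in Refine, but it is an independent illustration, not Refine's own code; the function names and the sample values are invented for the example.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalised key: trimmed, lower-cased, accent-free,
    punctuation-free, with tokens de-duplicated and sorted."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    # Drop the combining marks that NFKD splits off accented letters.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values sharing a fingerprint; keep groups with real variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


if __name__ == "__main__":
    # Invented sample of inconsistent spellings.
    messy = [
        "Ministry of the Interior",
        "ministry of the interior ",
        "Interior, Ministry of the",
        "Ministry of Finance",
    ]
    for group in cluster(messy):
        print(group)
```

Each group returned by cluster is a candidate for merging into one canonical spelling; a person should still review the groups, since a key this aggressive can also merge values that are genuinely different.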

VISUALISE: TELL THE STORY

There is more to it

It's not just data with values in a spreadsheet or database

Data can tell the story!

GOOGLE FUSION TABLES: WHAT IS IT? HOW TO USE IT

Easy visualisation http://tables.googlelabs.com/

SCRAPING WITH DRUPAL: AND NOW FOR SOMETHING COMPLETELY DIFFERENT

Feeds – http://drupal.org/project/feeds

Scraping
Feeds query path parser - project/feeds_querypath_parser
Feeds xpath parser – project/feeds_xpathparser

Cleaning up data
Feeds tamper - http://drupal.org/project/feeds_tamper
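The idea behind this combination (a context query that selects each item, one path per field, then a clean-up pass over the values) can be illustrated outside Drupal. The Python sketch below is not how the Feeds modules are implemented or configured; CONTEXT, MAPPING, parse_items, tamper and the sample XML are all hypothetical, and the standard library's ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Invented sample source document.
SAMPLE_XML = """
<reports>
  <report id="1"><title>  Missing cat </title><area>Praha 1</area></report>
  <report id="2"><title>Broken street lamp</title><area> Praha 2 </area></report>
</reports>
"""

# Context query: which nodes become items.
CONTEXT = ".//report"
# Field mapping: target field name -> path relative to the item node.
MAPPING = {"title": "title", "area": "area"}


def parse_items(xml_text, context, mapping):
    """Select item nodes and read one value per mapped field."""
    root = ET.fromstring(xml_text)
    items = []
    for node in root.findall(context):
        item = {"id": node.get("id", "")}
        for field, path in mapping.items():
            item[field] = node.findtext(path, default="")
        items.append(item)
    return items


def tamper(item):
    """Clean-up pass: trim and collapse whitespace in every value."""
    return {key: " ".join(value.split()) for key, value in item.items()}


if __name__ == "__main__":
    for item in parse_items(SAMPLE_XML, CONTEXT, MAPPING):
        print(tamper(item))
```

Keeping the extraction rules (CONTEXT, MAPPING) separate from the clean-up step mirrors the split between a parser and a tamper stage: when the source markup changes, only the paths need updating.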

VISUALISE WITH DRUPAL: AND NOW FOR SOMETHING COMPLETELY DIFFERENT

Mapping
- Location – http://drupal.org/project/location
- Openlayers – http://drupal.org/project/openlayers
- Gmap – http://drupal.org/project/gmap

Graphs/Charts
- Graphs
- Graphs Charts
- Open Flash Chart
- Views

GO! SCRAPE IT! CHALLENGE

EU Open Data Challenge - €20,000 to win - 28 days left to enter

http://opendatachallenge.org/

TOOLS: SCRAPING DATA

ScraperWiki – http://scraperwiki.com

PHP Simple HTML DOM – http://bit.ly/phphtmldom

PHPQuery - http://code.google.com/p/phpquery/

Open Data Kit - http://opendatakit.org/

TOOLS: CLEANING DATA

Google Refine - http://code.google.com/p/google-refine/

TOOLS: VISUALIZING DATA

Google fusion tables - http://tables.googlelabs.com/

The Best Tools for Visualization - http://rww.to/toolsforvis

OpenHeatmap http://bit.ly/openheatmap

THANK YOU: Q&A | LET'S CONNECT

QUESTIONS?

@sotak - twitter
http://sotak.co.uk - personal blog
http://atomicant.co.uk - company website