(Some of) Wikipedia's Open Data


Page 1: (Some of) Wikipedia's Open Data
Page 2: (Some of) Wikipedia's Open Data

Analytics Engineering

Page 3: (Some of) Wikipedia's Open Data
Page 4: (Some of) Wikipedia's Open Data

We build analytics infrastructure

[email protected]

Page 5: (Some of) Wikipedia's Open Data

The Analytics Team sees its primary responsibility as making Wikimedia-related data available for querying and analysis to both WMF and the different wiki communities and stakeholders. We develop infrastructure so that all our users, both within the Foundation and within the different communities, can access data in a self-service fashion that is consistent with the values of the movement.

Page 6: (Some of) Wikipedia's Open Data

We do not handle data requests (for the most part)

Page 7: (Some of) Wikipedia's Open Data

We try for (all) data to be public by default.

The more accessible the data is, the more impact it can have.

Page 8: (Some of) Wikipedia's Open Data

But we are not there yet

Page 9: (Some of) Wikipedia's Open Data

Public Data

Page 10: (Some of) Wikipedia's Open Data

Data that is useful for the world at large.

Page 11: (Some of) Wikipedia's Open Data
Page 12: (Some of) Wikipedia's Open Data

6th largest website [Alexa]

Page 13: (Some of) Wikipedia's Open Data

Wikipedia reaches hundreds of millions of unique devices every month and, as such, is a good barometer of browser popularity.

Page 14: (Some of) Wikipedia's Open Data

The most popular browser

Page 15: (Some of) Wikipedia's Open Data

?

Page 16: (Some of) Wikipedia's Open Data

The most popular browser

in April 2017

Page 17: (Some of) Wikipedia's Open Data

Was Chrome 56 with 25% market share

Page 19: (Some of) Wikipedia's Open Data

Issues

Page 20: (Some of) Wikipedia's Open Data

IE7 making a comeback… up more than 1% last year

Page 21: (Some of) Wikipedia's Open Data
Page 22: (Some of) Wikipedia's Open Data
Page 23: (Some of) Wikipedia's Open Data

Bots ...

Page 24: (Some of) Wikipedia's Open Data

Data useful to WMF, Researchers and Community

Page 25: (Some of) Wikipedia's Open Data

Pageviews

Page 26: (Some of) Wikipedia's Open Data

We process about 200,000 HTTP requests / second at peak

Page 27: (Some of) Wikipedia's Open Data

At peak we process about 200,000 requests per second

Page 28: (Some of) Wikipedia's Open Data

Pageview API

Page 31: (Some of) Wikipedia's Open Data

http://tools.wmflabs.org/siteviews/?platform=all-access&source=pageviews&agent=user&range=latest-20&sites=tr.wikipedia.org
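The Siteviews link above charts per-site pageview counts. As a rough sketch, similar numbers could be requested directly from the Pageview API's aggregate endpoint. The helper name `buildAggregateUrl`, the base URL, and the example date range are assumptions for illustration and do not come from the slides; the project, access, and agent values mirror the query parameters in the link.

```javascript
// Sketch only: builds a request URL for aggregate pageviews of one project.
// ASSUMPTION: the base URL and path layout follow the public Wikimedia REST API
// (.../aggregate/{project}/{access}/{agent}/{granularity}/{start}/{end}).
var PAGEVIEWS_BASE = 'https://wikimedia.org/api/rest_v1/metrics/pageviews';

// Hypothetical helper: joins the path segments in the order the endpoint expects.
function buildAggregateUrl(project, access, agent, granularity, start, end) {
    return [
        PAGEVIEWS_BASE, 'aggregate',
        project, access, agent, granularity, start, end
    ].join('/');
}

// Same project, access and agent as the Siteviews link; the dates
// (YYYYMMDDHH) are made up for the example.
var exampleUrl = buildAggregateUrl(
    'tr.wikipedia.org', 'all-access', 'user', 'daily', '2017040100', '2017042000'
);
console.log(exampleUrl);
```

Requesting `agent` as `user` rather than all agents is one way to leave out traffic that has been classified as automated, which ties in with the bot issue on the next slides.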

Page 32: (Some of) Wikipedia's Open Data

Issues

Page 33: (Some of) Wikipedia's Open Data

Bots, Bots, Bots

Page 34: (Some of) Wikipedia's Open Data

Data useful to WMF (mostly)

Page 35: (Some of) Wikipedia's Open Data

Unique Devices

Page 38: (Some of) Wikipedia's Open Data

Data useful to Community (mostly)

Page 39: (Some of) Wikipedia's Open Data

Wikistats 2.0

http://stats.wikimedia.org

Page 40: (Some of) Wikipedia's Open Data
Page 41: (Some of) Wikipedia's Open Data

CHECK IN

April 2007

TEAM/DEPT

Analytics

Wikistats exists to motivate our editor community.

In Wikistats 2.0 we are not only updating the website interface; we are also providing new access to all our edit data in an analytics-friendly form. This greatly improves (and fundamentally changes) how edit metrics are calculated, and the time and resources that takes, for both WMF and the community.

Page 42: (Some of) Wikipedia's Open Data

https://analytics-prototype.wmflabs.org/

Page 43: (Some of) Wikipedia's Open Data

https://www.mediawiki.org/wiki/Wikistats_2.0_Design_Project/RequestforFeedback/Round2

Please Chime in!

Page 44: (Some of) Wikipedia's Open Data

Live Data

Page 45: (Some of) Wikipedia's Open Data

EventStreams is a web service that exposes continuous streams of structured event data. Get live updates to Wikimedia projects.

Page 46: (Some of) Wikipedia's Open Data
Page 47: (Some of) Wikipedia's Open Data

Navigate to http://wikimedia.org in your browser and open the developer console

// This is the EventStreams RecentChange stream endpoint
var url = 'https://stream.wikimedia.org/v2/stream/recentchange';

// Use EventSource (available in most browsers, or as an
// npm module: https://www.npmjs.com/package/eventsource)
// to subscribe to the stream.
var recentChangeStream = new EventSource(url);

// Print each event to the console
recentChangeStream.onmessage = function(message) {
    // Parse the message.data string as JSON.
    var event = JSON.parse(message.data);
    console.log(event);
};
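The snippet above prints every event, which gets noisy quickly. A minimal sketch of filtering the stream, assuming each RecentChange event carries `wiki`, `type` and `bot` fields: the helper name `isHumanEditOn` and the sample message are hypothetical and only illustrate the idea.

```javascript
// Hypothetical helper: returns true when a stream message describes
// a non-bot edit on the given wiki (for example 'trwiki').
// ASSUMPTION: the event JSON has `wiki`, `type` and `bot` fields.
function isHumanEditOn(wiki, message) {
    var event = JSON.parse(message.data);
    return event.wiki === wiki && event.type === 'edit' && event.bot === false;
}

// A made-up message shaped like what EventSource delivers (a `data` string).
var sampleMessage = {
    data: JSON.stringify({ wiki: 'trwiki', type: 'edit', bot: false, title: 'Example' })
};
console.log(isHumanEditOn('trwiki', sampleMessage));

// In the browser console this could replace the handler above:
// recentChangeStream.onmessage = function(message) {
//     if (isHumanEditOn('trwiki', message)) {
//         console.log(JSON.parse(message.data).title);
//     }
// };
```

Keeping the check in a small function that only reads `message.data` means it can be tried on a hand-written message before attaching it to the live stream.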

Page 48: (Some of) Wikipedia's Open Data

Questions?

https://xkcd.com/285

Page 49: (Some of) Wikipedia's Open Data

Most things documented at: https://wikitech.wikimedia.org/wiki/Analytics/

Page 50: (Some of) Wikipedia's Open Data