Big Data
Using Big Data for Cultures and Communities
Jeremy Reffin, Simon Wibberley (CASM, University of Sussex)
Carl Miller (CASM, Demos)
July 2014



Overview
- Emergence of "Big Data"
- Open Data initiatives
- Open source tools and processor power
- Analytics culture in decision making

Overview: Emergence of "Big Data"
- Environment becoming saturated with digital devices that record data
- Social lives becoming increasingly digital (e.g. 500 million tweets / day)
- Access to big data now widespread through internet

Overview: Open Data initiatives
- Key indices / measures have long been collected but sequestered
- Access to key data now widespread through internet
- Culture of "what gets measured gets managed" led to collection of more indices / measures
- Political initiatives to make data more widely available (open) as a market tool

Overview: Open source tools and processor power
- Sophisticated data processing and visualisation tools now in the hands of most users

Overview: Analytics culture in decision making
- 1980s: rise of data-driven approach to decision-making in business management
- Trend increasingly influenced formation of public policy
- Now potentially useful data accessible to community-level initiatives

SOURCES / TOOLS
Sources
- Social Media
- Community Information
- Local Government
- Central Government
- Corporate

Sources: characteristics across the range from Social Media to Corporate

Attribute              Social Media end    Corporate end
Availability           Ubiquitous          Isolated
Subject Matter         Social              Factual
Focus                  General             Specific
Degree of Structure    Unstructured        Structured
Degree of Oversight    Not curated         Often curated

High Volume / Open Access (values as listed on the slide): Depends, No, Sometimes, No, Sometimes, Rarely

Tools
- Extract & process
- Analyse
- Visualise

Sources x Tools
The five sources set against the three tool stages (Extract & process, Analyse, Visualise). Examples shown:
- APIs / Downloadable data sets
- Analytics tools (Excel, Google Analytics)

Wordle
- Wordle is a tool for generating word clouds from text that you provide.
- Word clouds give greater prominence to words that appear more frequently in the source text.
- The word clouds can be customised with different fonts, layouts, and colour schemes.
- See: wordle.net

Text is Beautiful
- This resource creates interactive word clouds, concept webs, and correlation wheels.
- In a concept web, the position of concepts matters; those more related appear near each other. Related concepts are grouped into themes, denoted by colour.
- A correlation wheel visualises concepts that are correlated with each other. Two concepts are correlated if they appear together in the text often and appear apart rarely.
- See: textisbeautiful.net

Circos
- Circos extends the correlation wheel concept into much more sophisticated data visualisation using a circular layout.
- Originally designed for genomic data, it produces attractive graphics that are particularly effective for displaying pairwise interactions in general and flows in particular.
- See: circos.ca

Circos (continued)
- Circos is often used to convey complex data in a condensed format.
- Circos is relatively simple to use but is not point and click.
- It requires comfort / familiarity with scripting approaches.
- See: circos.ca

Gephi
- Gephi visualises graphs. A graph is a representation of a set of objects where there are pairwise relationships between some of these objects. Any network can be usefully represented as a graph, as can most data with structural relationships between elements.
- Gephi sits on the interface between data visualisation and data analysis. It has sophisticated tools for exploring data and for analysing, filtering, clustering, manipulating and exporting it.
- See: gephi.github.io

Gephi: example output

Tableau
- Tableau provides sophisticated tools for analysis and visualisation of data stored in a spreadsheet or similar data format.
- It can create interactive
- See: tableausoftware.com

Integrated analysis: DataShine
- Provides interactive mapping visualisation of UK Census data
- Source: datashine.org.uk

Sources x Tools: Unstructured, Not curated, High Volume, Open Access, General

Method51
Diagram components: Twitter, Boolean term scraper, Relevancy Classifier, Pattern Classifier (1), Pattern Classifier (2), Output 1, Output 2
Stages: Extract & process, Analyse, Visualise
- Method51 is a framework for collecting, analysing, and understanding Twitter data sets.
- It helps users to locate tweets relevant to a precise topic of interest (i.e. separate the wheat from the chaff) and to gain the best possible insight into what is being said about that topic.
- This is achieved using chains of classifiers: devices for placing tweets into different categories based on the words that they contain.
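
The idea of chaining classifiers can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Method51's implementation: the function names (`boolean_term_filter`, `keyword_classifier`, `run_chain`), the sample tweets, and the keyword lists are all hypothetical, and the simple keyword-overlap rule is a stand-in for whatever classifiers the framework actually uses. The order of steps (term filter, then relevancy classifier, then pattern classifier) follows the component names on the slide; how the real components are wired together is assumed.

```python
import re
from typing import Callable, Dict, Iterable, List

Tweet = str
Classifier = Callable[[Tweet], str]


def tokenize(text: str) -> List[str]:
    """Lower-case a tweet and split it into word tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def boolean_term_filter(tweets: Iterable[Tweet], any_of: List[str]) -> List[Tweet]:
    """Keep tweets containing at least one search term (a Boolean OR)."""
    terms = {t.lower() for t in any_of}
    return [t for t in tweets if terms & set(tokenize(t))]


def keyword_classifier(labels: Dict[str, List[str]], default: str) -> Classifier:
    """Toy classifier (assumption): the label sharing most keywords with the tweet wins."""
    vocab = {label: set(words) for label, words in labels.items()}

    def classify(tweet: Tweet) -> str:
        tokens = set(tokenize(tweet))
        scores = {label: len(tokens & words) for label, words in vocab.items()}
        best = max(scores, key=scores.get)
        # Fall back to the default label when no keyword matches at all.
        return best if scores[best] > 0 else default

    return classify


def run_chain(
    tweets: Iterable[Tweet],
    search_terms: List[str],
    relevancy: Classifier,
    pattern: Classifier,
) -> Dict[str, List[Tweet]]:
    """Filter by terms, drop irrelevant tweets, then group the rest by category."""
    collected = boolean_term_filter(tweets, search_terms)
    relevant = [t for t in collected if relevancy(t) == "relevant"]
    outputs: Dict[str, List[Tweet]] = {}
    for tweet in relevant:
        outputs.setdefault(pattern(tweet), []).append(tweet)
    return outputs


# Hypothetical sample data: the topic of interest is public libraries.
TWEETS = [
    "The new library opening hours are great",
    "Library budget cuts are terrible for our town",
    "Importing a python library today",
    "Lovely weather in Brighton",
]

relevancy = keyword_classifier(
    {
        "relevant": ["opening", "hours", "budget", "town", "council"],
        "irrelevant": ["python", "importing", "code"],
    },
    default="irrelevant",
)
pattern = keyword_classifier(
    {"positive": ["great", "love", "good"], "negative": ["terrible", "cuts", "bad"]},
    default="other",
)

outputs = run_chain(TWEETS, ["library"], relevancy, pattern)
for category, members in outputs.items():
    print(category, members)
```

The term filter discards the tweet that never mentions the search term, the relevancy step removes the tweet about a software library (the chaff), and the pattern step sorts what remains into categories. Each step sees only what the previous step passed on, which is what makes it a chain.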