Big Data: How Will It Change Your Life?
Gregory Butler
Department of Computer Science & Software Engineering
Data Science Research Centre
Centre for Structural and Functional Genomics
Concordia University, Montreal, Canada
October 2019 — Malaysia
Outline
What is Big Data?
What is Data Analytics?
What is Big Data Analytics?
Changes in Your Life
Conclusion
Outline
Examples of Changes due to Big Data
Transport and Mobility
Money, Finance and Credit
Traditional Credit Score
Future Psychometric Credit Score
The Elderly or Remote Patient Perspective
Dimitrov (Health Informatics Research, 2016)
VISR: A Canadian Company
Better mental and emotional health via social media data mining
“On a mission to help families better navigate technology, by notifying parents about safety and wellness issues their kids face on social media”
http://www.visr.co
Outline
What is Big Data?
Big Data: 2,000,000,000,000,000,000 bytes (2 EB) per day!
https://shhrota.com/2012/01/02/the-big-in-big-data/
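To put the daily figure in perspective, here is a back-of-the-envelope conversion to a per-second rate. It is only a sketch: it assumes decimal (SI) units, where 1 EB = 10^18 bytes and 1 TB = 10^12 bytes, and a constant rate over a 24-hour day.

```python
# Back-of-the-envelope conversion of the "2 EB per day" figure.
# Assumptions: decimal (SI) units and a constant rate across the day.
BYTES_PER_DAY = 2 * 10**18          # 2 EB, as quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400 seconds

bytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY
terabytes_per_second = bytes_per_second / 10**12

print(f"{terabytes_per_second:.1f} TB per second")
```

Under these assumptions the rate works out to roughly 23 TB every second.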
Big Data
Actionable Data in Data-Driven (Clinical) Healthcare
(Infoway Health Canada 2016)
Big Data (http://dsrc.encs.concordia.ca/what-is-bigdata.html)
Big Data
The definition of “Big” has changed as we have become more advanced.
History
- Hollerith Cards 1890 (US population census)
- Economic Data 1952 (GDP etc.)
- Computers 1959: The First Digital Data Tsunami
- World Wide Web 1990’s: The Second Digital Data Tsunami
- Social Media 1985: The Third Digital Data Tsunami
- Internet of Things 2000: The Fourth Digital Data Tsunami
- Big Science: 1960’s onwards
- Deep Knowledge: 2011 onwards
A key notion is actionable data that is useful in supporting decisions, determining actions, and adding value to an endeavour.
Big Data
The 5 V’s
- Volume: amount of data
- Variety: different types of data
- Velocity: rate at which data is generated
- Veracity: trustworthiness, level of noise
- Value: usefulness of data to a business
plus Visualization, Viscosity (sticky), Virality (convey a message)
Drivers
- Transactions
- Mobile
- Social Media
- Internet of Things
MGI Report
McKinsey Global Institute, Big data: The next frontier for innovation, competition, and productivity, May 2011.
Types of Jobs in Big Data
Data Analyst
- analyses data from a business perspective to put data to use
Data Scientist
- expert at data analysis, mathematical modeling, machine learning and coding
Data Architect
- designs, creates, deploys, and manages an organization’s data architecture, composed of models, policies, and standards for data collection, storage, structure, integration, and use
Chief Data Officer
- responsible for enterprise-wide governance and utilization of data as an asset; integrates data-driven business processes
Outline
What is Data Analytics?
Data Analytics
(Microsoft)
Data Analytics
(Microsoft)
Data Analytics: Data Wrangling
Design a Data Collection Program
- Establish whether or not the data exists in the real world and is relevant to the question
- Devise a collection scheme to acquire it
  Logistical considerations? Cost? Privacy issues?
- Coordinate with departments or agencies needed for collection
Data Analytics: Data Wrangling
Collect and Review the Data
- Store the incoming data to allow modeling and reporting
- Join data from multiple sources in a relevant and logical manner
- Check for anomalies or unusual patterns
  - Caused by the collection process?
  - Inherent to the topic of investigation?
  - Correct them, or develop a new collection scheme?
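The join and anomaly-check steps can be sketched in a few lines of standard-library Python. Everything here is illustrative: the records, the field names (`patient_id`, `heart_rate`, `age`), and the threshold of 5 median absolute deviations are assumptions, not part of the talk.

```python
from statistics import median

# Hypothetical records from two sources that share a patient_id key.
visits = [
    {"patient_id": 1, "heart_rate": 72},
    {"patient_id": 2, "heart_rate": 68},
    {"patient_id": 3, "heart_rate": 700},  # suspicious: likely a collection error
    {"patient_id": 4, "heart_rate": 75},
]
profiles = {1: {"age": 34}, 2: {"age": 51}, 4: {"age": 47}}


def join_records(visits, profiles):
    """Inner join on patient_id; also report ids with no matching profile."""
    joined, unmatched = [], []
    for visit in visits:
        profile = profiles.get(visit["patient_id"])
        if profile is None:
            unmatched.append(visit["patient_id"])
        else:
            joined.append({**visit, **profile})
    return joined, unmatched


def flag_anomalies(values, threshold=5.0):
    """Return values lying more than `threshold` median absolute deviations from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread at all, so nothing can be flagged this way
    return [v for v in values if abs(v - med) / mad > threshold]


joined, unmatched = join_records(visits, profiles)
suspicious = flag_anomalies([v["heart_rate"] for v in visits])
print("unmatched ids:", unmatched)
print("suspicious heart rates:", suspicious)
```

Whether a flagged value is a collection error or a genuine feature of the topic is exactly the question the slide raises; the code only surfaces candidates for review.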
Data Analytics: Exploratory Data Analysis
Exploratory Data Analysis
Learn about the properties of the data
Steps
- Descriptive statistics: mean/median, variance/quartiles, outliers
- Correlation
- Fitting curves and distributions
- Dimension reduction
- Clustering
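The first two steps can be illustrated with Python's standard `statistics` module. This is a minimal sketch on made-up data: the `steps` and `resting_hr` values are hypothetical, and the 1.5 × IQR fence is one common outlier convention rather than anything prescribed by the talk.

```python
from statistics import mean, median, quantiles, variance

# Hypothetical fitness-tracker data: daily step counts and resting heart rate.
steps = [4000, 5200, 6100, 7000, 8300, 9100, 25000]
resting_hr = [78, 75, 73, 70, 68, 66, 64]


def iqr_outliers(values):
    """Values outside the 1.5 * IQR fences around the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


print("mean:", mean(steps), "median:", median(steps))
print("sample variance:", variance(steps))
print("quartiles:", quantiles(steps, n=4))
print("outliers:", iqr_outliers(steps))
print("correlation(steps, resting_hr):", round(pearson(steps, resting_hr), 3))
```

Note how the single extreme day pulls the mean well above the median, which is why both are worth reporting before any modeling.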
Data Analytics: Modeling
Modeling
Getting “meaning” from a clean data set
Steps
- Build a data model to fit the question
- Validate the model against the actual collected data
- Perform the necessary statistical analyses
- Machine-learning or recursive analysis
- Regression testing and other classical statistical analysis
- Compare results against other techniques or sources
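As one concrete reading of "build a model, then validate it", the sketch below fits a least-squares line on part of a data set and measures the error on held-out points. The data, the split, and the choice of a straight-line model are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(xs, ys, slope, intercept):
    """Root-mean-square error of the line's predictions on (xs, ys)."""
    errors = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(errors) / len(errors)) ** 0.5


# Hypothetical observations that roughly follow y = 2x + 1.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 12.9, 15.2]

# Build the model on the first six points, validate on the last two.
train_x, train_y = xs[:6], ys[:6]
valid_x, valid_y = xs[6:], ys[6:]

slope, intercept = fit_line(train_x, train_y)
print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"training RMSE:   {rmse(train_x, train_y, slope, intercept):.3f}")
print(f"validation RMSE: {rmse(valid_x, valid_y, slope, intercept):.3f}")
```

A validation error far above the training error would suggest the model does not generalise, which is the cue to compare against other techniques or sources.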
Data Analytics: Modeling
(Microsoft)
Data Analytics: Storytelling
Visualize and Communicate the Results
The most challenging part of the data scientist’s job is taking the results of the investigation and presenting them to the public or internal consumers of information in a way that makes sense and can be easily communicated.
Steps
- Graph or chart the information
- Tell a story to fit the results: interpret the data to describe the real-world sources in a plausible manner
- Assist decision-makers in using results to make decisions
Approaches to Data Analysis
Scripting
- e.g. Unix tools
- text files, csv files for inputs, outputs, intermediate steps
- stepwise development of analysis
- script captures steps, parameters
- easy to replay
Notebooks
- e.g. Jupyter
- interactive scripting with “literate programming”
- keep track of thought processes during analysis
- work with files to replay analysis
“Spreadsheet” Environments
- e.g. OpenRefine
- lots of tools, little guidance
- need macros, histories, to capture/replay work
- often proprietary
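The scripting approach can be sketched as a small Python script in which each step reads and writes a CSV file and the parameters are recorded next to the outputs, so the analysis can be replayed. The file names, the `amount` column, and the `min_value` threshold are hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path

# Parameters live in one place so a rerun uses exactly the same settings.
PARAMS = {"column": "amount", "min_value": 10.0}


def filter_rows(in_path, out_path, column, min_value):
    """One analysis step: keep rows whose `column` is at least `min_value`."""
    kept = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if float(row[column]) >= min_value:
                writer.writerow(row)
                kept += 1
    return kept


def run(workdir):
    """Replay the whole analysis inside `workdir`, leaving intermediate files behind."""
    workdir = Path(workdir)
    raw = workdir / "raw.csv"
    raw.write_text("id,amount\n1,5.0\n2,12.5\n3,40.0\n")  # stand-in for collected data
    kept = filter_rows(raw, workdir / "filtered.csv", **PARAMS)
    (workdir / "params.json").write_text(json.dumps(PARAMS))  # record the settings used
    return kept


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(run(workdir), "rows kept")
```

Because every intermediate result is an ordinary file, each step can be inspected or rerun on its own, which is the main appeal of this style over interactive tools.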
Outline
What is Big Data Analytics?
Big Data Analytics — Compute Clusters & the Cloud
Map Reduce Approach
- Hadoop, Spark
- Distributed database support (HBase)
Knowledge Graphs
- Linked data, ontologies & semantic web
Cloud
- Flexible, distributed computing, as needed
NoSQL Databases
- Modern technology for varieties of data
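The map-reduce idea itself fits in a few lines. The sketch below is a single-process illustration of the three phases (map, shuffle, reduce) using word counting; it does not use the Hadoop or Spark APIs, and the function names are chosen here for illustration only.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a collection of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["Big Data", "big science", "data driven science"]))
```

On a cluster, the map and reduce calls run in parallel on different machines and the framework performs the shuffle over the network; the program structure stays the same.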
Canada’s Big Data Consortium
Who
- Established by Ryerson University in mid 2014
- Brings together Govt, Industry, and Universities
- Four academic founding partners: Ryerson, SFU, Dal, Concordia
Lessons Learned
- Govt keen to exploit Big Data for job growth & economic growth, and to improve efficiency of govt operations
- Some schools recognise importance of numeracy & analytics
- Many universities introducing courses & programmes
- Enormous talent gap between demand and supply
- 4 Job Types Identified: Chief Data Officer, Data Scientist, Data Analyst, Data Architect
Montreal International Panel of Experts
Who: Dec 2014 to Dec 2015
- MI profile of Big Data industry in Montreal and Quebec
- All the “movers and shakers”
  - Quebec govt: industry, trade, research, labour
  - Open Montreal director
  - Montreal smart city initiative
  - industry, SME, university, CRIM
Lessons Learned
- Montreal has what it takes!
- Big Data is ideal space for innovative small start-ups!
- Big Data essential for biotech, pharma industries and research, and Quebec is lagging in such data-driven life sciences
- Importance of open source and open data
Working Group on Predictive Health
Who: WG of Canada’s Big Data Consortium
- Academia, govt, consultants, industry, hospital, insurance, start-ups
Data Access is Key!
- Open data by default will be the future.
- Data secure in the cloud and on personal smartcards
- Privacy and sharing of data is in control of patient!
- Anonymised data using “id servers” for data integration
- Future is linked open data using RDF, ontologies, semantic web
Incubators need Technology + Business + Healthcare
- Physicians must buy in ... or ... it’s a no-go!
Scale Out beyond Incubators
- Need community engagement ...
  - for acceptance, use, and data sharing
  - to establish agreed terminology for data sharing
  - to reap benefits of innovation from Big Data
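The slide does not say how an “id server” works. One possible reading is keyed pseudonymisation: a trusted service replaces each patient identifier with a keyed hash, so that data sets can be joined without exposing the original ids. The sketch below assumes that reading; the `IdServer` class, the use of HMAC-SHA256, and all field names are hypothetical. Pseudonymised data is also not fully anonymous, since other fields may still identify a person.

```python
import hashlib
import hmac


class IdServer:
    """Hypothetical id server: maps real identifiers to stable pseudonyms."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key  # known only to the id server

    def pseudonym(self, patient_id: str) -> str:
        """Keyed hash of the identifier; the same id always yields the same pseudonym."""
        return hmac.new(self._key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def link(records_a, records_b, server):
    """Integrate two data sets on pseudonyms so raw ids never appear in the result."""
    diagnosis_by_pid = {server.pseudonym(r["patient_id"]): r["diagnosis"] for r in records_b}
    linked = []
    for record in records_a:
        pid = server.pseudonym(record["patient_id"])
        if pid in diagnosis_by_pid:
            linked.append({"pid": pid, "steps": record["steps"], "diagnosis": diagnosis_by_pid[pid]})
    return linked


# Hypothetical data held by two different organisations.
tracker_data = [{"patient_id": "QC-1001", "steps": 8200}, {"patient_id": "QC-1002", "steps": 3100}]
clinic_data = [{"patient_id": "QC-1001", "diagnosis": "hypertension"}]

server = IdServer(secret_key=b"example-key-not-for-production")
for row in link(tracker_data, clinic_data, server):
    print(row)
```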
Outline
Changes to Your Life
Big Data
What Data is Gathered?
GPS in your watch and car
- Your location at all times
Your web browser
- Which sites you visit
- How long you stay
- What search queries you ask
- When, from where, how often
Amazon and Shops
- What you look at
- What you buy
- Are you influenced by recommendations
Email
- What you say
- Who you know
Social Media
- What you say
- Who you know
- What you like
Your Fitness Tracker
- Your health
- Your exercise regime
Credit Card and Bank
- What you buy
- When, from where, how often
Business Benefits of Big Data
Better decision-making
Increased productivity
Reduced costs
Understanding your customers
Fraud detection
Transport and Mobility
Transport and Mobility
Walker for the Blind
Sport and Leisure
Badminton
Money, Finance and Credit
Traditional Credit Score
Future Psychometric Credit Score
Outline
Conclusion
Conclusion — Challenges
Data Ownership and Control
- Will you control your data?
- Control ... Who can access it? How they can use it?
- Will you benefit ($$$) from others using your data?
Balance of Control between Computer & Human
- smart homes, smart cars, smart buildings, smart cities
Safeguards for Disasters
- for example, loss of electrical power
- for example, loss of communication
- for example, loss of satellites & GPS
when everything depends on computers
Thank You!
Questions, Please?
Privacy and Security
“Privacy refers to an individual’s right to control the collection, use, and disclosure of his/her personal health information (PHI) and/or personal information (PI) in a manner that allows health care providers to do their work.
Security is about ensuring the information gets to the right person in a secure manner.”
Ontario’s Ehealth Blueprint http://www.ehealthblueprint.com
Privacy by Design 2009
Seven Foundational Principles
1) being proactive not reactive;
2) having privacy as the default setting;
3) having privacy embedded into design;
4) avoiding the pretence of false dichotomies, such as privacy vs. security;
5) providing full life-cycle management of data;
6) ensuring visibility and transparency of data; and
7) being user-centric
Prof. Ann Cavoukian, formerly Information and Privacy Commissioner of Ontario; now Ryerson University. http://www.privacybydesign.ca