Panel I hosted at MIT for the 7th Information Quality Conference in July 2013, with J. Andrew Rogers (SpaceCurve) and Matt Piekarczyk (Cortix Systems).
The 7th Annual MIT Chief Data Officer & Information Quality Symposium (CDOIQ)
New Trends and Directions in Data Science
Moderator: Mario Faria
July 19th, 2013
July 17, 2012
Panelists
• J. Andrew Rogers (SpaceCurve)
• Matt Piekarczyk (Cortix Systems)
Format
• Mario’s introduction on the subject
• Each panelist will have 20 minutes to present a point of view
• Mario will ask a few questions
• Panelists will debate among each other or answer questions from the audience
Data Science
The process of taking raw data, producing information from data, and using this information to guide actions that will bring financial benefits to business
Quality is mandatory for Data Science to work
Where we stand today
• Fragmented ecosystem
• Overuse of the term “Big Data”
• The “how to compete on analytics” is still hard to achieve
• In the majority of companies, data is still managed with an IT mindset
The Big Data Fragmented Tech Vendors data life cycle process view
New Trends and Directions in Data Science
J. Andrew Rogers, Founder and CTO
SpaceCurve
www.spacecurve.com
© 2013 SpaceCurve, Inc. All rights reserved.
Five Big Data Trends and Directions In Data Science
J. Andrew Rogers Founder & CTO
July 18, 2013
The Evolution Of Data Science
• 1st Generation
– An organization’s structured data
– Example: OLAP / Data Warehouse
• 2nd Generation
– An organization’s unstructured data
– Example: Hadoop / MapReduce
• 3rd Generation
– Real-time context and actionability of an organization’s data
– Example: SpaceCurve
Capturing and Fusing In-Motion Data
• Monetization of data-in-motion
– Satellites, smartphones, sensors, social media, spatial, radar, …
• Real-time processing and fusing
• Immediate insights from multiple layers of data in motion and historical data at once
• Immersive intelligence with real-time location analysis
Trend #1. Use of diverse data sources for better situational awareness
• Proliferation of inexpensive sensors creates new possibilities
– Imagery and video: satellite, UAV, coincidental
– GPS-tagged entities and entity motion vectors
– Sensor networks, RF, radar
• Many challenges
– Integration and fusion of unrelated data sources
– Domain expertise required to use data effectively
– Standardization of data representation
Trend #2. Leveraging machine-generated data to increase model quality
• Machines continuously make measurements of reality
– Sensor networks, e.g. imaging, radar, GPS tracking, RF, seismic
– Operational sensors on machines, e.g. automotive and aircraft
– Computer network activity and audit logs
• Challenge is extreme data generation rates
– Few big data platforms designed for continuous data ingest
– Computers and sensors are not constrained by human biology
Real-world scenario: Hurricane Sandy
Trend #3. Real-time data ingestion concurrent with analysis (“round-trip real-time”)
• Minimizing latency from new data availability to updated analytic models and actionable intelligence is a multi-faceted advantage
– Leverage highly perishable contextual data before it expires
– Identify operational risks as soon as they manifest in the data
– Continuously evolve models to reflect operational environment
• Challenges for traditional data science platforms
– Moving from batch to on-line or near-line analytical models
– Minimizing data movement in analytical processes
– Scaling out analytic query performance with online updates
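The move from batch to on-line analytical models can be illustrated with a minimal sketch. It assumes a single numeric sensor stream and uses Welford's algorithm to keep a running mean and variance, so each reading is scored against the current model and then folded into it without any batch recomputation. The names `RunningStats` and `score_stream`, the warm-up of 5 readings, and the threshold of 3 standard deviations are illustrative assumptions, not anything taken from the talk or from a SpaceCurve product.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Online mean/variance via Welford's algorithm (one pass, O(1) memory)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Fold one new reading into the model.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; reported as 0.0 until two readings have arrived.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0

    def zscore(self, x: float) -> float:
        std = math.sqrt(self.variance)
        return (x - self.mean) / std if std > 0 else 0.0


def score_stream(readings, threshold=3.0, warmup=5):
    """Score each reading against the current model, then update the model."""
    stats = RunningStats()
    flagged = []
    for x in readings:
        # Query first, ingest second: analysis runs alongside ingestion.
        if stats.count >= warmup and abs(stats.zscore(x)) > threshold:
            flagged.append(x)
        stats.update(x)
    return stats, flagged


# Hypothetical sensor readings with one obvious spike at the end.
stats, flagged = score_stream([10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 50.0])
print(flagged)
```

The model is never rebuilt from stored history: every reading costs constant time and memory, which is the property a batch pipeline lacks.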
Trend #4. Space and time relationships for data fusion and deeper insights
• Space and time are primary keys of reality
– Entities and events can be localized at a point in time
– Robust method for fusing unrelated slow and fast moving data
– Interactions and movement over time can be modeled as graphs
• Powerful and unique analytical capability
– Correlation of data by time and space relationships
– Relationship discovery by analyzing unrelated entity vectors
– Anomaly detection using vector analysis
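Correlating data by time and space relationships can be sketched as a join whose key is "close in time and close in space" rather than a shared identifier. The example below is a minimal illustration under stated assumptions: the `Observation` record, the `co_located` function, the 1 km and 300 second limits, and the sample data are all hypothetical, and the brute-force pairwise comparison stands in for the spatial and temporal indexing a real platform would need at scale.

```python
import math
from typing import NamedTuple


class Observation(NamedTuple):
    source: str   # which data set the record came from
    entity: str   # identifier that is only meaningful within that source
    t: float      # seconds since an arbitrary shared epoch
    lat: float    # degrees
    lon: float    # degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def co_located(a_obs, b_obs, max_km=1.0, max_seconds=300.0):
    """Pair entities from two unrelated sources seen near each other in space and time."""
    pairs = []
    for a in a_obs:
        for b in b_obs:
            close_in_time = abs(a.t - b.t) <= max_seconds
            close_in_space = haversine_km(a.lat, a.lon, b.lat, b.lon) <= max_km
            if close_in_time and close_in_space:
                pairs.append((a.entity, b.entity))
    return pairs


# Hypothetical records from two sources that share no common key.
social = [Observation("social", "user_1", 1000.0, 42.3601, -71.0942)]
flights = [
    Observation("flights", "flight_A", 1100.0, 42.3605, -71.0940),  # near in both
    Observation("flights", "flight_B", 5000.0, 42.3605, -71.0940),  # too late
    Observation("flights", "flight_C", 1100.0, 40.7000, -74.0000),  # too far
]
matches = co_located(social, flights)
print(matches)
```

Only `flight_A` pairs with `user_1`: the other two records fail the time limit and the distance limit respectively.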
Real-world scenario: Correlating entities on social media with flight data
Trend #5. Layering many data sources for data quality and immersive intelligence
• Understanding the full context in which events occur for maximum model fidelity
• Reinforce signal and cancel out noise by overlaying different measurements of the same event
– Fill in incomplete or missing data from single data sources
– Corroborate similar data sources against each other to detect errors and fraud
– Corroborate a fact analytically from dissimilar data sources
– Identify subtle semantic and representation differences across data sets
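Overlaying several measurements of the same event can be sketched in a few lines. This is one simple way to do it, assuming the sources report the same numeric quantity in the same units: missing readings are skipped, the median of the remaining values serves as the consensus, and any source that strays from the consensus by more than a tolerance is flagged for review. The function name `fuse_readings`, the sensor names, and the tolerance are hypothetical; the talk does not prescribe a particular fusion method.

```python
from statistics import median


def fuse_readings(readings, tolerance):
    """Fuse per-source readings of one event.

    readings:  dict mapping source name -> value, or None when missing
    tolerance: largest allowed distance from the consensus value
    Returns (consensus, suspects). The consensus also serves as the
    fill-in value for sources that reported nothing.
    """
    present = {src: v for src, v in readings.items() if v is not None}
    if not present:
        return None, []
    # The median is robust to a minority of wrong or fraudulent sources.
    consensus = median(present.values())
    suspects = sorted(
        src for src, v in present.items() if abs(v - consensus) > tolerance
    )
    return consensus, suspects


# Hypothetical temperature readings for one event from four sources.
consensus, suspects = fuse_readings(
    {"sensor_a": 20.1, "sensor_b": 19.9, "sensor_c": None, "sensor_d": 35.0},
    tolerance=1.0,
)
print(consensus, suspects)
```

Here `sensor_c` contributes nothing, and `sensor_d` is flagged because it disagrees with the two sources that corroborate each other.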
New Big Data capabilities needed to meet future market requirements
Delivering immediately actionable intelligence
www.spacecurve.com
Thank You!
For More Information, Please Contact:
J. Andrew Rogers
Office: +1 206.453.2236
Email: [email protected]
Twitter: @jandrewrogers
New Trends and Directions in Data Science
Matt Piekarczyk, President
Cortix Systems
Matt Piekarczyk, President, (703) 740-9162 x701, [email protected]
Let knowledge flow
17 hrs/week spent gathering and fusing data
80% Effort 1/3 Cost 11% Integrated
[Chart: axis values scaled x 100000]
Fundamental Law
Parse Clean Map Find
Use
There is a better way
Learn Learn Learn Learn
Use Share
Learning solutions
Custom dynamic fused data go
Data is the platform
Cost
Focus
Underpowered High Risk
Cost
Focus
Optimize Resource Allocation and Focus
The Debate
• Mario Faria (Moderator)
• J. Andrew Rogers (SpaceCurve)
• Matt Piekarczyk (Cortix Systems)