The 7th Annual MIT Chief Data Officer & Information Quality Symposium (CDOIQ) New Trends and Directions in Data Science Moderator: Mario Faria July 19th, 2013

New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

DESCRIPTION

Panel I hosted at MIT for the 7th Information Quality Conference in July 2013, with J. Andrew Rogers (SpaceCurve) and Matt Piekarczyk (Cortix Systems).

Page 1: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

The 7th Annual MIT Chief Data Officer & Information Quality Symposium (CDOIQ)

New Trends and Directions in Data Science

Moderator: Mario Faria

July 19th, 2013

July 17, 2012

Page 2: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Panelists

•  J. Andrew Rogers (SpaceCurve)
•  Matt Piekarczyk (Cortix Systems)

Page 3: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Format

•  Mario’s introduction on the subject
•  Each panelist will have 20 minutes to present a point of view
•  Mario will ask a few questions
•  Panelists will debate among each other or answer questions from the audience

Page 4: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Data Science

The process of taking raw data, producing information from data, and using this information to guide actions that will bring financial benefits to business

Page 5: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Quality is mandatory for Data Science to work

Page 6: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Where we stand today

•  Fragmented ecosystem
•  Overuse of the term "Big Data"
•  "How to compete on analytics" is still hard to achieve
•  In the majority of companies, data is still managed with an IT mindset

Page 7: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Mario Faria

The Big Data Fragmented Tech Vendors data life cycle process view

Page 8: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Page 9: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Page 10: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Page 11: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

New Trends and Directions in Data Science

J. Andrew Rogers
Founder and CTO
SpaceCurve

Page 12: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

www.spacecurve.com

© 2013 SpaceCurve, Inc. All rights reserved.

Five Big Data Trends and Directions In Data Science

J. Andrew Rogers Founder & CTO

July 18, 2013

Page 13: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

The Evolution Of Data Science

•  1st Generation
    –  An organization’s structured data
    –  Example: OLAP / Data Warehouse
•  2nd Generation
    –  An organization’s unstructured data
    –  Example: Hadoop / MapReduce
•  3rd Generation
    –  Real-time context and actionability of an organization’s data
    –  Example: SpaceCurve

Page 14: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Capturing and Fusing In-Motion Data

•  Monetization of data-in-motion
    –  Satellites, smartphones, sensors, social media, spatial, radar, …
•  Real-time processing and fusing
•  Immediate insights from multiple layers of data in motion and historical data at once
•  Immersive intelligence with real-time location analysis

Page 15: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Trend #1. Use of diverse data sources for better situational awareness

•  Proliferation of inexpensive sensors creates new possibilities
    –  Imagery and video: satellite, UAV, coincidental
    –  GPS-tagged entities and entity motion vectors
    –  Sensor networks, RF, radar
•  Many challenges
    –  Integration and fusion of unrelated data sources
    –  Domain expertise required to use data effectively
    –  Standardization of data representation

Page 16: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Trend #2. Leveraging machine-generated data to increase model quality

•  Machines continuously make measurements of reality
    –  Sensor networks, e.g. imaging, radar, GPS tracking, RF, seismic
    –  Operational sensors on machines, e.g. automotive and aircraft
    –  Computer network activity and audit logs
•  The challenge is extreme data generation rates
    –  Few big data platforms are designed for continuous data ingest
    –  Computers and sensors are not constrained by human biology

Page 17: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Real-world scenario: Hurricane Sandy

Page 18: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Trend #3. Real-time data ingestion concurrent with analysis (“round-trip real-time”)

•  Minimizing latency from new data availability to updated analytic models and actionable intelligence is a multi-faceted advantage
    –  Leverage highly perishable contextual data before it expires
    –  Identify operational risks as soon as they manifest in the data
    –  Continuously evolve models to reflect operational environment
•  Challenges for traditional data science platforms
    –  Moving from batch to on-line or near-line analytical models
    –  Minimizing data movement in analytical processes
    –  Scaling out analytic query performance with online updates
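
The move from batch to on-line models can be illustrated with a small sketch. It is not taken from the talk or from any SpaceCurve product; it assumes a single numeric sensor stream and uses Welford's running mean and variance as a stand-in "model" that is updated per reading instead of being recomputed over the full history. The class name `RunningStats` and the 3-sigma threshold are hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running mean/variance (Welford's method), updated one reading at a time."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Fold one new reading into the model without revisiting old data.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 until there are two readings.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0

    def is_outlier(self, x: float, threshold: float = 3.0) -> bool:
        # Hypothetical rule: flag readings more than `threshold` std devs away.
        std = math.sqrt(self.variance)
        return std > 0 and abs(x - self.mean) > threshold * std


stats = RunningStats()
for reading in [10.0, 12.0, 11.0, 13.0, 12.0, 50.0]:
    if stats.is_outlier(reading):
        print("possible operational risk:", reading)
    stats.update(reading)  # analysis and ingest happen in the same pass
```

Each reading is checked against the current model and then folded into it, so the model is always as fresh as the latest ingested value and no batch job has to move or rescan the history.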

Page 19: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Trend #4. Space and time relationships for data fusion and deeper insights

•  Space and time are primary keys of reality
    –  Entities and events can be localized at a point in time
    –  Robust method for fusing unrelated slow and fast moving data
    –  Interactions and movement over time can be modeled as graphs
•  Powerful and unique analytical capability
    –  Correlation of data by time and space relationships
    –  Relationship discovery by analyzing unrelated entity vectors
    –  Anomaly detection using vector analysis
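
One way to picture space and time acting as a join key is the sketch below. It is an illustration under stated assumptions, not the method presented in the talk: events from unrelated sources are bucketed by a coarse latitude/longitude grid cell and a time window, and entities from different sources that land in the same bucket are reported as candidate relationships. The `Event` type, the cell size, the window length, and the sample entities are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Iterable, NamedTuple, Set, Tuple


class Event(NamedTuple):
    source: str   # e.g. "gps" or "social" (hypothetical source names)
    entity: str
    lat: float
    lon: float
    t: float      # seconds since some shared epoch


def space_time_key(e: Event, cell_deg: float, window_s: float) -> Tuple[int, int, int]:
    # Space and time together form the join key: (grid row, grid column, time bucket).
    return (math.floor(e.lat / cell_deg),
            math.floor(e.lon / cell_deg),
            math.floor(e.t / window_s))


def co_located(events: Iterable[Event], cell_deg: float = 0.01,
               window_s: float = 60.0) -> Set[Tuple[str, str]]:
    buckets = defaultdict(list)
    for e in events:
        buckets[space_time_key(e, cell_deg, window_s)].append(e)
    pairs = set()
    for group in buckets.values():
        for a in group:
            for b in group:
                if a.source < b.source:  # only pair entities across sources, once
                    pairs.add((a.entity, b.entity))
    return pairs


events = [
    Event("gps", "truck-17", 47.6062, -122.3321, 1200.0),
    Event("social", "@user_a", 47.6065, -122.3325, 1230.0),
    Event("social", "@user_b", 40.7128, -74.0060, 1210.0),
]
print(co_located(events))
```

A fixed grid misses pairs that straddle a cell or window boundary; a fuller version would also compare neighboring buckets or use a proper spatial index.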

Page 20: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Real-world scenario: Correlating entities on social media with flight data

Page 21: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Trend #5. Layering many data sources for data quality and immersive intelligence

•  Understanding the full context in which events occur for maximum model fidelity
•  Reinforce signal and cancel out noise by overlaying different measurements of the same event
    –  Fill in incomplete or missing data from single data sources
    –  Corroborate similar data sources against each other to detect errors and fraud
    –  Corroborate a fact analytically from dissimilar data sources
    –  Identify subtle semantic and representation differences across data sets
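
The corroboration idea above can be sketched in a few lines. This is a minimal illustration, assuming several sources report the same numeric measurement of one event; the median serves as the consensus value, missing readings are skipped, and sources that disagree with the consensus by more than a tolerance are flagged for review. The function name `fuse`, the source names, and the tolerance are hypothetical, and the median is a swapped-in simple technique, not one named in the talk.

```python
from statistics import median
from typing import Dict, List, Optional, Tuple


def fuse(readings: Dict[str, Optional[float]],
         tolerance: float) -> Tuple[Optional[float], List[str]]:
    """Return (consensus value, sources that disagree with it)."""
    # Sources with no reading are simply left out; the others fill the gap.
    present = {src: v for src, v in readings.items() if v is not None}
    if not present:
        return None, []
    consensus = median(present.values())
    # A source far from the consensus may be faulty or fraudulent.
    suspects = sorted(src for src, v in present.items()
                      if abs(v - consensus) > tolerance)
    return consensus, suspects


value, suspects = fuse(
    {"sensor_a": 20.1, "sensor_b": 19.9, "sensor_c": 35.0, "sensor_d": None},
    tolerance=1.0,
)
print(value, suspects)
```

With three or more overlapping sources a single bad reading cannot move the median far, which is the "reinforce signal, cancel out noise" effect in its simplest form.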

Page 22: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

New Big Data capabilities needed to meet future market requirements

Page 23: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Delivering immediately actionable intelligence

Page 24: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Thank You!

For More Information, Please Contact:

J. Andrew Rogers
Office: +1 206.453.2236
Email: [email protected]
Twitter: @jandrewrogers

Page 25: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

New Trends and Directions in Data Science

Matt Piekarczyk
President
Cortix Systems

Page 26: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Matt Piekarczyk
President
(703) 740-9162 x701
[email protected]

Let knowledge flow

Page 27: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Page 28: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Page 29: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

17 hrs/week spent gathering and fusing data

Page 30: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

80% Effort 1/3 Cost 11% Integrated

Page 31: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Fundamental Law

[Chart: axis ticks 0 to 5 and 1 to 801; scale label "x 100000"]

Page 32: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Parse Clean Map Find

Use

Page 33: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Page 34: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Page 35: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Page 36: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Page 37: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Page 38: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

There is a better way

Page 39: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Learn Learn Learn Learn

Use Share

Page 40: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Learning solutions

Page 41: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Custom dynamic fused data go  

Data is the platform

Page 42: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Page 43: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

[Chart labels: Cost, Focus]

Underpowered High Risk

Page 44: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

[Chart labels: Cost, Focus]

Optimize Resource Allocation and Focus

Page 45: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

The Debate

•  Mario Faria (Moderator)
•  J. Andrew Rogers (SpaceCurve)
•  Matt Piekarczyk (Cortix Systems)