More Than A Buzzword: Big Data in the Environmental Arena
2015 National Environmental Monitoring Conference | July 15, 2015

Brooke Roecker, Senior Environmental Data Analyst
Mark Packard, PG, CPG, President/CEO


Presentation Outline

• Big Data Defined
• Environmental Data Past & Present
• Today's Tools and Approaches
• Example Projects
• Future Considerations


Big Data Defined: (yet again)

"Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."
Laney, Douglas. "The Importance of 'Big Data': A Definition". Gartner. Retrieved 21 June 2012.

Big Data in Environmental Monitoring/Remediation


• Velocity: high frequency data
• Variety: mixed data/attributes
• Volume: very large datasets
• VERACITY: Accuracy of data

Diya Soubra, "The 3Vs that define Big Data". http://www.datasciencecentral.com/forum/topics/the-3vs-that-define-big-data


Where have we come from?

• Hand-written field logs
• Text files
• Spreadsheets
• Simple Reports


Where are we today?

• Older technologies remain
• Database storage
• Out of the box storage/analysis tools
  – EQuIS™
  – ENFOS
  – Locus
  – Project Portal™


• Limitations?
• What data are you not managing/analyzing?


"Bigger" data tools available today

• High frequency advancements:
  – EQuIS Live
  – Project Portal Analytics Module
• Analysis and modeling tools
  – Spatial: ArcGIS, EVS
  – Visualization: Tableau
• Custom scripting (R, Python, T-SQL…)
  – SSAS, Weka
  – Matplotlib (Python), ggplot2 (R)


Project  Examples  


Project: Surface Water Monitoring

• High velocity data, 5-minute intervals
• Teledyne ISCO samplers
• Historical data archived in raw MS Excel files

Challenge:
• Dataset too large for "Big picture" trends
• Storing/archiving data long-term
• Centralized access for project team


Project: Surface Water Monitoring

Solution:
• Streamlined data resampling & import routine
  – Resample from 5-minute to 12-hour averages (or totals)
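The resampling step can be sketched in plain Python. This is a minimal illustration rather than the routine used on the project: the function name `resample_12h`, the `(timestamp, value)` input shape, and the bin alignment to midnight and noon are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def resample_12h(readings, how="mean"):
    """Collapse (timestamp, value) readings into 12-hour bins.

    Bins start at 00:00 and 12:00 of each day (an assumed alignment).
    how="mean" averages each bin; how="sum" totals it.
    """
    if how not in ("mean", "sum"):
        raise ValueError("how must be 'mean' or 'sum'")

    bins = defaultdict(list)
    for ts, value in readings:
        # Floor the timestamp to the start of its 12-hour bin.
        bin_start = ts.replace(hour=0 if ts.hour < 12 else 12,
                               minute=0, second=0, microsecond=0)
        bins[bin_start].append(value)

    result = {}
    for bin_start in sorted(bins):
        values = bins[bin_start]
        total = sum(values)
        result[bin_start] = total if how == "sum" else total / len(values)
    return result


# Hypothetical example: one day of 5-minute readings (288 values).
start = datetime(2015, 7, 15)
readings = [(start + timedelta(minutes=5 * i), float(i)) for i in range(288)]
resampled = resample_12h(readings)          # two 12-hour averages
totals = resample_12h(readings, how="sum")  # two 12-hour totals
```

Averages suit level-type parameters such as stage or temperature, while totals suit accumulating quantities such as rainfall, which is presumably why the slide mentions both.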

[Figure panels: Raw Data | Resampled Data]

Project: Surface Water Monitoring

Solution:
• Data available to project team via Project Portal
  – Environmental Database module for resampled data
  – Documents module for raw data


Project: RAD Site Monitoring

Challenge:
• High volume, high velocity sensor data w/ telemetry
• Automated database storage
• Visual analysis of high volume weather data


Project: RAD Site Monitoring

Solution:
• Automated data import via web upload
• Database available to field staff via Project Portal
• Wind rose graphics to visualize data via EnviroInsite
• QA of erroneous data points (particulates)
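A wind rose is, underneath the graphic, a frequency table of wind observations by compass sector. The sketch below shows one common way to build that table with 16 sectors. It is not a description of how EnviroInsite computes its plots; the function name `wind_rose_counts` and the sample directions are hypothetical.

```python
from collections import Counter

SECTORS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def wind_rose_counts(directions_deg):
    """Count observations per 16-point compass sector.

    Each sector is 22.5 degrees wide and centered on its compass point,
    so "N" covers 348.75 to 11.25 degrees.
    """
    counts = Counter({name: 0 for name in SECTORS})
    for deg in directions_deg:
        # Shift by half a sector so each bin is centered, then wrap 360 to 0.
        index = int(((deg % 360) + 11.25) // 22.5) % 16
        counts[SECTORS[index]] += 1
    return counts


# Hypothetical wind directions in degrees from north.
counts = wind_rose_counts([0, 10, 350, 45, 90, 95, 180, 270, 359.9])
```

A full wind rose would further split each sector by wind speed class; that second dimension is omitted here to keep the sketch short.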


Project: Mine Tunnel Monitoring

• High volume, high velocity data
  – On-site sensor data from PLC system
  – Public big data streams

Challenge:
• Centralized database storage
• Real-time data access
• Real-time notifications/alarms


Project: Mine Tunnel Monitoring

Solution:
• Generic database design optimized for high frequency data
• Predictive trend modeling calculations
• Data available via Project Portal
• Email alerts when incoming data parameters out of spec.
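Two of the ideas above, out-of-spec checks and trend calculations, can be illustrated with a short sketch. The slides do not say which model or alert rules were used, so this assumes a simple per-parameter (low, high) limit table and an ordinary least-squares line. The names `out_of_spec` and `linear_trend`, the parameter names, and all values are hypothetical. Delivering the alert by email is left out.

```python
def out_of_spec(reading, limits):
    """Return alert messages for parameters outside their (low, high) limits."""
    alerts = []
    for name, value in reading.items():
        if name not in limits:
            continue  # no limit configured for this parameter
        low, high = limits[name]
        if value < low or value > high:
            alerts.append(f"{name}={value} outside [{low}, {high}]")
    return alerts


def linear_trend(times, values):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


# Hypothetical limits and one incoming reading.
limits = {"pH": (6.0, 9.0), "flow_gpm": (0.0, 500.0)}
alerts = out_of_spec({"pH": 9.4, "flow_gpm": 120.0, "temp_c": 11.2}, limits)

# Hypothetical hourly pH values; extrapolate the fitted line to hour 6.
hours = [0, 1, 2, 3, 4]
ph = [7.0, 7.5, 8.0, 8.5, 9.0]
slope, intercept = linear_trend(hours, ph)
projected_ph = slope * 6 + intercept
```

Combining the two lets a system warn before a limit is crossed, by running the same limit check against `projected_ph` instead of the latest reading.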


Project: O&M Site Monitoring

High velocity, automated SVE and GW treatment systems

Challenge:
• Centralized storage
• Centralized monitoring
• System troubleshooting…

[Figure: Groundwater Treatment System]

Project: O&M Site Monitoring

System Solution:
• Multivariate analysis to review system variables
• Secondary analysis to identify fluctuations
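The slides do not name the multivariate method used. One simple starting point for reviewing how system variables move together is a pairwise Pearson correlation matrix, sketched below. The variable names and values are hypothetical and chosen only to make the result easy to check.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise correlations for a dict of {variable name: list of values}."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Hypothetical SVE system variables sampled at the same five times.
system = {
    "vacuum_inHg": [10.0, 12.0, 14.0, 16.0, 18.0],
    "flow_scfm": [50.0, 60.0, 70.0, 80.0, 90.0],
    "voc_ppm": [40.0, 35.0, 30.0, 25.0, 20.0],
}
matrix = correlation_matrix(system)
```

Variable pairs whose correlation shifts sharply between time windows are natural candidates for the secondary analysis of fluctuations.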


What's Next?

• Use of emerging technologies:
  – Distributed data sourcing
    • Hadoop HDFS
    • NoSQL
  – Distributed processing
    • Batch processing (MapReduce, Apache Hive)
    • Real-time processing/streaming (Cloudera Impala, Apache S4)
    • PolyBase (cross-querying HDFS and SQL Server)


Summary / Key Takeaways

• Free big data – go out and use it!
• Big data to investigate the unknown
• Greater project intelligence & decision making

linkedin.com/pulse/auto-industry-bored-big-data-brian-pasch

Brooke Roecker [email protected]
Mark Packard, PG, CPG [email protected]

Thank you!

Contributors:
Myles Hook, ddms
Jon Turner, ddms
Heidi Gaedy, ddms
Ed Larson, ddms
Angela Remer, ddms
Emily Mulford, EarthSoft Inc.