Learn with WSO2: Building your Big Data Solution. Srinath Perera, Director of Research, WSO2 Inc.


Page 1: Building your big data solution

Learn with WSO2 - Building your Big Data Solution

Srinath Perera, Director of Research

WSO2 Inc.

Page 2: Building your big data solution

About WSO2

• Providing the only complete open source componentized cloud platform
– Dedicated to removing all the stumbling blocks to enterprise agility
– Enabling you to focus on business logic and business value
• Recognized by leading analyst firms as visionaries and leaders
– Gartner cites WSO2 as visionaries in all 3 categories of application infrastructure
– Forrester places WSO2 in top 2 for API Management
• Global corporation with offices in USA, UK & Sri Lanka
– 200+ employees and growing
• Business model of selling comprehensive support & maintenance for our products

Page 3: Building your big data solution

150+ globally positioned support customers

Page 4: Building your big data solution

Consider a day in your life

• What is the best road to take?
• Would there be any bad weather?
• What is the best way to invest the money?
• Should I take that loan?
• Can I optimize my day?
• Is there a way to do this faster?
• What have others done in similar cases?
• Which product should I buy?

Page 5: Building your big data solution

What people have wanted (through the ages)

• To know (what happened?)
• To explain (why did it happen?)
• To predict (what will happen?)

 

Page 6: Building your big data solution

What is Big Data?

• There is a lot of data available
– E.g. Internet of Things
• We have computing power
• We have technology
• The goal is the same
– To know
– To explain
– To predict
• The challenge is the full lifecycle

Page 7: Building your big data solution

Drivers of Big Data

Page 8: Building your big data solution

Data Avalanche / Moore’s law of data

• We are now collecting large amounts of data and converting it to digital form
• 90% of the data in the world today was created within the past two years.
• The amount of data we have doubles very fast

Page 9: Building your big data solution

In real life, most data are Big

• The Web handles millions of activities per second, and a huge volume of server logs is created.
• Social networks, e.g. Facebook: 800 million active users, 40 billion photos from its user base.
• There are >4 billion phones and >25% are smartphones. There are billions of RFID tags.
• Observational and sensor data
– Weather radars, balloons
– Environmental sensors
– Telescopes
– Complex physics simulations

Page 10: Building your big data solution

Why is Big Data hard?

• How to store? Assuming 1 TB of disk per computer, it takes 1000 computers to store 1 PB
• How to move? Assuming a 10 Gb/s network running at full line rate, it takes about 13 minutes to copy 1 TB, or about 9 days to copy 1 PB
• How to search? Assuming each record is 1 KB and one machine can process 1000 records per second, it needs about 278 CPU hours (11.6 CPU days) to process 1 TB and about 32 CPU years to process 1 PB
• How to process?
– How to convert algorithms to work at large scale
– How to create new algorithms

http://www.susanica.com/photo/9
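
These back-of-envelope estimates can be recomputed from the stated assumptions. The following is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and an ideal link with no protocol overhead; the function names are illustrative and real-world transfers will be slower.

```python
TB = 10**12  # bytes, decimal units (assumption)
PB = 10**15


def machines_needed(total_bytes, disk_bytes_per_machine=TB):
    """Number of machines needed to hold the data, rounded up."""
    return -(-total_bytes // disk_bytes_per_machine)


def copy_seconds(total_bytes, link_bits_per_sec=10 * 10**9):
    """Time to push the data through a link at full line rate."""
    return total_bytes * 8 / link_bits_per_sec


def scan_cpu_seconds(total_bytes, record_bytes=1000, records_per_sec=1000):
    """CPU time for one machine to touch every record once."""
    return total_bytes / record_bytes / records_per_sec


if __name__ == "__main__":
    for label, size in (("1 TB", TB), ("1 PB", PB)):
        print(label,
              "machines:", machines_needed(size),
              "copy hours:", copy_seconds(size) / 3600,
              "scan CPU days:", scan_cpu_seconds(size) / 86400)
```

Changing `link_bits_per_sec` or `records_per_sec` shows how sensitive the estimates are to the assumed hardware.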

Page 11: Building your big data solution

Why is it hard (Contd.)?

• The system is built of many computers
• That handle lots of data
• Running complex logic
• This pushes us to the frontier of distributed systems and databases
• More data does not mean there is a simple model
• Some models can be as complex as the system

http://www.flickr.com/photos/mariachily/5250487136, Licensed CC

Page 12: Building your big data solution

Big Data Architecture

Page 13: Building your big data solution

WSO2 Offerings

• Two tools
– WSO2 BAM for storing and processing
– WSO2 CEP for real-time processing
• These tools cover the whole processing life cycle for your Big Data, with the help of a few other products as needed.
– WSO2 Storage Server
– WSO2 User Experience Server

Page 14: Building your big data solution

Big Data Architecture Implementation

Page 15: Building your big data solution

Sensors

• Built-in sensors in WSO2 products
• Event logs
– Click streams, emails, chat, search, tweets, transactions …
• Custom sensors
– Video surveillance, cash flows, traffic, surveillance, smart grid, production line, RFID (e.g. Walmart), GPS sensors, mobile phones, Internet of Things

http://www.flickr.com/photos/imuttoo/4257813689/ by Ian Muttoo, http://www.flickr.com/photos/eastcapital/4554220770/, http://www.flickr.com/photos/patdavid/4619331472/ by Pat David, copyright CC

 

Page 16: Building your big data solution

Collecting Data

• Data is collected at sensors and sent to the big data system via events or flat files
• Event streams: we name the events by their content/originator
• Get data through
– Point to point
– Event bus
• E.g. Data Bridge
– A Thrift-based transport we built that handles about 400k events/sec
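
To make the difference between point-to-point delivery and an event bus concrete, here is a minimal in-memory sketch. It is not the Data Bridge API; the `EventBus` class, the stream name and the event fields are hypothetical and only illustrate named event streams with multiple subscribers.

```python
from collections import defaultdict


class EventBus:
    """Toy publish/subscribe bus: producers publish to a named stream,
    and every handler subscribed to that stream receives the event."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, stream, handler):
        self._subscribers[stream].append(handler)

    def publish(self, stream, event):
        # Streams nobody subscribed to simply drop the event.
        for handler in self._subscribers.get(stream, ()):
            handler(event)


# Usage: two independent consumers receive the same sensor event.
bus = EventBus()
stored, alerts = [], []
bus.subscribe("temperature", stored.append)
bus.subscribe("temperature", lambda e: alerts.append(e) if e["value"] > 40 else None)
bus.publish("temperature", {"sensor": "s1", "value": 42})
bus.publish("temperature", {"sensor": "s2", "value": 21})
```

With point-to-point delivery the sensor would have to know each consumer; with a bus it only needs to know the stream name.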

Page 17: Building your big data solution

Storing Data

• Historically we used databases
– Scale is a challenge: replication, sharding
• Scalable options
– NoSQL (Cassandra, HBase) [if data is structured]; column families are gaining ground
– Distributed file systems (e.g. HDFS) [if data is unstructured]
• NewSQL
– In-memory computing, VoltDB
• Specialized data structures
– Graph databases, data structure servers

http://www.flickr.com/photos/keso/363133967/

Page 18: Building your big data solution

Storing Data (Contd.)

• WSO2 offerings (WSO2 Storage Server)
– Small structured data: keep in relational databases
– Large structured data: Cassandra
– Large unstructured data: HDFS
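
The storage guidance above can be read as a simple decision rule. The sketch below encodes it, assuming a hypothetical 1 TB cut-off between "small" and "large" (the slide does not define one); the function name is illustrative.

```python
def choose_store(structured, size_bytes, large_threshold_bytes=10**12):
    """Map the slide's three cases to a storage option.

    The 1 TB threshold is an assumption made for this sketch only.
    Returns None for small unstructured data, which the slide does not cover.
    """
    large = size_bytes >= large_threshold_bytes
    if structured:
        return "Cassandra" if large else "relational database"
    return "HDFS" if large else None


print(choose_store(structured=True, size_bytes=5 * 10**9))
print(choose_store(structured=True, size_bytes=5 * 10**13))
print(choose_store(structured=False, size_bytes=5 * 10**13))
```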

Page 19: Building your big data solution

Making Sense of Data

• To know (what happened?)
– Basic analytics + visualizations (min, max, average, histogram, distributions …)
– Interactive drill down
• To explain (why)
– Data mining, classification, building models, clustering
• To forecast
– Neural networks, decision models

 

Page 20: Building your big data solution

Making Sense of Data (Contd.)

• Batch processing – WSO2 BAM
– Hive scripts
– MapReduce jobs
• Real-time processing – CEP
– Event Query Language
• The two above are the platform; you need to program your use case.

Page 21: Building your big data solution

To know (what happened?)

• Mainly analytics
– Min, max, average, correlation, histograms
– Might join or group data in many ways
• Implemented with MapReduce or queries
• Data is often presented with some visualizations
• Examples
– Forensics
– Assessments
– Historical data / reports / trends

http://www.flickr.com/photos/isriya/2967310333/
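
As a small-scale illustration of the basic analytics listed above, the following sketch computes min, max, average and a fixed-width histogram over a list of numbers. The `summarize` function and the sample values are hypothetical; at Big Data scale the same aggregates would be computed with MapReduce or queries rather than in one process.

```python
from collections import Counter
from statistics import mean


def summarize(values, bucket_width):
    """Return min, max, mean and a histogram keyed by bucket lower bound."""
    histogram = Counter((v // bucket_width) * bucket_width for v in values)
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "histogram": dict(sorted(histogram.items())),
    }


# Hypothetical response times in milliseconds.
summary = summarize([120, 80, 95, 310, 150], bucket_width=100)
print(summary)
```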

Page 22: Building your big data solution

To Explain (Patterns)

• Correlation
– Scatter plot, statistical correlation
• Data mining (detecting patterns)
– Clustering and classification
– Finding similar items
– Finding hubs and authorities in a graph
– Finding frequent item sets
– Making recommendations
• Apache Mahout

http://www.flickr.com/photos/eriwst/2987739376/ and http://www.flickr.com/photos/focx/5035444779/
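
Statistical correlation is the simplest of the techniques listed above to show in code. This is a minimal sketch of the Pearson correlation coefficient in plain Python; the `pearson` function name and the sample data are illustrative, and this is not how Apache Mahout exposes it.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: deal size vs. number of meetings held.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```

A value near +1 or -1 suggests a strong linear relationship; a value near 0 suggests none, although it says nothing about non-linear patterns.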

Page 23: Building your big data solution

To Predict: Forecasts and Models

• Trying to build a model for the data
• Theoretically or empirically
– Analytical models (e.g. physics)
– Neural networks
– Reinforcement learning
– Unsupervised learning (clustering, dimensionality reduction, kernel methods)
• Examples
– Translation
– Weather forecast models
– Building profiles of users
– Traffic models
– Economic models
• Lots of domain-specific work

http://misterbijou.blogspot.com/2010_09_01_archive.html

Page 24: Building your big data solution

Information Visualization

• Presenting information
– To end users
– To decision makers
– To scientists
• Interactive exploration
• Sending alerts
• WSO2 UES
– Jaggery based
• BAM/CEP can work with most other UI tools

http://www.flickr.com/photos/stevefaeembra/3604686097/

Page 25: Building your big data solution

WSO2 UES

• Dashboards and Store
• Build your own UIs with Jaggery

Page 26: Building your big data solution

MapReduce / Hadoop

• First introduced by Google, and used as the processing model for their architecture
• Implemented by open source projects like Apache Hadoop and Spark
• Users write two functions: map and reduce
• The framework handles details like distributed processing, fault tolerance, load balancing, etc.
• Widely used, and one of the catalysts of Big Data

void map(ctx, k, v){
    tokens = v.split();
    for t in tokens
        ctx.emit(t, 1);
}

void reduce(ctx, k, values[]){
    count = 0;
    for v in values
        count = count + v;
    ctx.emit(k, count);
}
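
The word-count pseudocode can be tried out locally with a single-process simulation of the map, shuffle and reduce phases. This is a sketch only: `run_mapreduce` is a hypothetical helper standing in for the framework, and it does none of the distribution, fault tolerance or load balancing that Hadoop provides.

```python
from collections import defaultdict


def map_words(key, line):
    """Map: emit (word, 1) for every token in the line."""
    for token in line.split():
        yield token, 1


def reduce_counts(word, counts):
    """Reduce: sum all the counts emitted for one word."""
    yield word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Run mapper over (key, value) records, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # shuffle: group by output key
    results = {}
    for out_key in sorted(groups):
        for final_key, final_value in reducer(out_key, groups[out_key]):
            results[final_key] = final_value
    return results


lines = [(0, "big data big"), (1, "data")]
word_counts = run_mapreduce(lines, map_words, reduce_counts)
print(word_counts)
```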

Page 27: Building your big data solution

MapReduce (Contd.)

Page 28: Building your big data solution

Data on the Move

• The idea is to process data as it is received, in a streaming fashion
• Used when we need
– Very fast output
– Lots of events (a few 100k to millions)
– Processing without storing (e.g. too much data)
• Two main technologies
– Stream processing (e.g. Storm, http://storm-project.net/)
– Complex Event Processing (CEP) http://wso2.com/products/complex-event-processor/

Page 29: Building your big data solution

Complex Event Processing (CEP)

• Sees inputs as event streams, which are queried with an SQL-like language
• Supports filters, windows, joins, patterns and sequences

from p=PINChangeEvents#win.time(3600) join
    t=TransactionEvents[p.custid=custid][amount>10000]#win.time(3600)
return t.custid, t.amount;
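
The intent of this query (flag a large transaction that follows a PIN change by the same customer within the time window) can be sketched in plain Python. This is an illustration and not the CEP engine's semantics: it assumes events arrive in timestamp order, assumes the window of 3600 is in seconds, and only handles the case where the PIN change comes first. The event tuple layout is hypothetical.

```python
from collections import defaultdict, deque

WINDOW = 3600       # window length from the query; unit assumed to be seconds
THRESHOLD = 10000   # amount filter from the query


def suspicious_transactions(events):
    """Yield (custid, amount) for large transactions that occur within
    WINDOW of a PIN change by the same customer.

    events: iterable of (timestamp, kind, custid, amount), time-ordered.
    """
    recent_pin_changes = defaultdict(deque)  # custid -> PIN change timestamps
    for ts, kind, custid, amount in events:
        if kind == "pin_change":
            recent_pin_changes[custid].append(ts)
        elif kind == "transaction" and amount > THRESHOLD:
            changes = recent_pin_changes[custid]
            while changes and ts - changes[0] > WINDOW:
                changes.popleft()  # expire PIN changes outside the window
            if changes:
                yield custid, amount


sample_events = [
    (0, "pin_change", "c1", None),
    (100, "transaction", "c1", 20000),   # within the window: flagged
    (200, "transaction", "c2", 50000),   # no PIN change: ignored
    (5000, "transaction", "c1", 30000),  # PIN change expired: ignored
]
flagged = list(suspicious_transactions(sample_events))
print(flagged)
```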

Page 30: Building your big data solution

Summary

Page 31: Building your big data solution

Case Study 1: Tracing Business Process

• A business process is built using many services
• Track and trace each step, and analyze to understand how to optimize
• E.g. sales pipeline

Page 32: Building your big data solution

Some Queries

• Conversion rate?
• How many deals are in the pipeline each month?
• Average size of the deals?
• Average time a deal takes?
• Can we spot large deals early?
• Which is better: going for a few large ones or many small ones?
• Were there any delays from our side?

Page 33: Building your big data solution

Hive: Average Size of the Deal

• Hive uses an SQL-like syntax.
• Easy to understand and learn

hive> LOAD DATA ..

hive> SELECT avg(value) FROM LEAD_ACTIVITY WHERE action="closedWon" GROUP BY month;
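
For readers less familiar with SQL, the same aggregation can be sketched in plain Python. The field names `action`, `value` and `month` come from the query; the sample rows and the function name are hypothetical.

```python
from collections import defaultdict


def average_deal_size_by_month(rows):
    """Average `value` of closedWon rows, grouped by `month`."""
    totals = defaultdict(lambda: [0.0, 0])  # month -> [sum of values, count]
    for row in rows:
        if row["action"] != "closedWon":
            continue  # WHERE action = "closedWon"
        bucket = totals[row["month"]]
        bucket[0] += row["value"]
        bucket[1] += 1
    return {month: total / count
            for month, (total, count) in sorted(totals.items())}


lead_activity = [
    {"month": 1, "action": "closedWon", "value": 100},
    {"month": 1, "action": "closedWon", "value": 200},
    {"month": 1, "action": "closedLost", "value": 900},
    {"month": 2, "action": "closedWon", "value": 300},
]
averages = average_deal_size_by_month(lead_activity)
print(averages)
```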

Page 34: Building your big data solution

MapReduce: How many deals in Pipeline?

Page 35: Building your big data solution

How many deals in Pipeline? (Contd.)

void map(ctx, k, v){
    Deals deal = parse(v);
    int month = getMonth(deal.time);
    ctx.emit(month, 1);
}

void reduce(ctx, k, values[]){
    count = 0;
    for v in values
        count = count + v;
    ctx.emit(k, count);
}

Page 36: Building your big data solution

Case Study 2: DEBS Challenge

• Event processing challenge
• Real football game, sensors in player shoes + ball
• Events at 15 kHz
• Event format
– Sensor ID, TS, x, y, z, v, a
• Queries
– Running stats
– Ball possession
– Heat map of activity
– Shots at goal

Page 37: Building your big data solution

Example: Detect Ball Possession

• Possession is the time from when a player hits the ball until someone else hits it or it goes out of the ground

from Ball#window.length(1) as b join
    Players#window.length(1) as p unidirectional
    on debs:getDistance(b.x, b.y, b.z, p.x, p.y, p.z) < 1000 and b.a > 55
select ...
insert into hitStream

from old = hitStream,
    b = hitStream[old.pid != pid],
    n = hitStream[b.pid == pid]*,
    (e1 = hitStream[b.pid != pid] or e2 = ballLeavingHitStream)
select ...
insert into BallPossessionStream

http://www.flickr.com/photos/glennharper/146164820/
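
The two steps of this example (detect hits, then turn the hit stream into possession intervals) can be sketched in plain Python. This is a simplified illustration and not the CEP engine's pattern semantics: the thresholds 1000 and 55 are taken from the query with their units left unspecified, the event layout is hypothetical, and a possession that is still open at the end of the input is not reported.

```python
from math import dist


def is_hit(ball_pos, player_pos, ball_accel, max_distance=1000, min_accel=55):
    """A hit: the ball is close to a player's sensor and accelerating sharply."""
    return dist(ball_pos, player_pos) < max_distance and ball_accel > min_accel


def possession_intervals(hits):
    """Turn time-ordered (timestamp, pid) hits into (pid, start, end) intervals.

    pid is None when the ball leaves the ground. A possession starts at a
    player's first hit and ends when another player hits or the ball leaves.
    """
    intervals = []
    holder, start = None, None
    for ts, pid in hits:
        if pid == holder:
            continue  # same player hits again: possession continues
        if holder is not None:
            intervals.append((holder, start, ts))
        holder, start = pid, ts
    return intervals


hit_stream = [(0, "A"), (5, "A"), (9, "B"), (12, None), (20, "A")]
possessions = possession_intervals(hit_stream)
print(possessions)
```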

Page 38: Building your big data solution

Conclusions

• What is Big Data?
• Big Data architecture
– Collecting data
– Storing data
– Processing data
• WSO2 offerings
• Case studies

Page 39: Building your big data solution

Questions?

Page 40: Building your big data solution

Engage with WSO2

• Helping you get the most out of your deployments
• From project evaluation and inception to development and going into production, WSO2 is your partner in ensuring 100% project success

Page 41: Building your big data solution