Flume-Cassandra Log Processor

DESCRIPTION

Gemini Mobile Technologies ("Gemini") released a Real-Time Log Processing System based on Flume and Cassandra ("Flume-Cassandra Log Processor") as open source. The Flume-Cassandra Log Processor enables massive volumes of production system logs to be collected and processed into graphical reports in real time. In addition, logs from multiple data centers can be aggregated simultaneously and analyzed in a single database.

Page 1: Flume-Cassandra Log Processor

Real-Time, High Volume Log Processing with Flume & Cassandra

Gemini Mobile Technologies

11.3.6 Gemini Mobile Technologies, Inc. 1

Page 2: Flume-Cassandra Log Processor

Overview

1. Log Collection & Storage in DB
   • Reliably and efficiently moves logs from multiple application nodes using Flume
   • Stores raw and processed log data in Cassandra DB

2. Real Time and On Demand reports
   • Via Web GUI to query against Cassandra, e.g. Transactions Per Second (TPS) vs. time, or searching a user's records

3. Summary reports by Map-Reduce
   • E.g. monthly usage by category (voice, data, mail, etc.) for groups of users

[Architecture diagram: Application Nodes, Log Aggregators, Cassandra nodes, Reports (Web GUI), OA&M]

Page 3: Flume-Cassandra Log Processor

Key Benefits

1. Real Time, Up to Date Business Intelligence
   • Dynamic, near-real-time reports

2. Flexible Analysis on Large Historical Data
   • Instant query by time range, raw log fields, and processed log fields (data is stored in a database for fast querying, not in flat log files)
   • Create on-demand custom summary reports by Map-Reduce

3. Multiple Data Center Support
   • Collect and store in the local data center; query and analyze across data centers

4. Reliable, Easy Operation, Maintenance, and Scalability
   • No data loss if network and PCs fail
   • As data volume (size of data stored) or velocity (speed of new data arrival) grows, scale to 100s of nodes and TBs of data/day by adding PCs horizontally
   • Easy to set up, configure, and monitor for a large network

5. Easy Customization
   • Open source, easy to change for custom log formats, custom reports, and queries

Page 4: Flume-Cassandra Log Processor

Log Collection: Flume

• Open-source log collection system: http://archive.cloudera.com/cdh/3/flume/UserGuide.html
• Flume Agent: Reads logs at a configurable interval (e.g., 100 ms) and sends them to Collector nodes.
• Flume Collector: Parses logs and inserts them into Cassandra.
• Flume Master: Monitors health and processing state of Agents and Collectors.

[Diagram: App Node 1 (Flume agent1_src1, Flume agent1_src2); App Node 2 (Flume agent2_src1, Flume agent2_src2); Log Aggregator (Flume collector_src1, Flume collector_src2); Cassandra nodes; Flume Master]

Page 5: Flume-Cassandra Log Processor

Storage Layer: Cassandra

• Cassandra, an open-source Apache project, is the storage layer. It is a high-performance, highly scalable distributed database.
• Top-level Apache Project (http://cassandra.apache.org/)
• Key Features
  • Optimized for fast writes of small data (<100 KB each)
  • Peer-to-peer nodes, easy to add/remove nodes ad hoc
  • Scalable for clusters from 2 to 100s of nodes
  • Multiple data center replication
  • Tunable consistency level, set per request

Page 6: Flume-Cassandra Log Processor

Log Collection System Monitoring (Flume Master)

Page 7: Flume-Cassandra Log Processor

Reports

• Search by attribute:
  • Date range
  • Log fields (e.g., userID, Message Type)
• List view (rows of log data)
• Graph view (quantity vs. time)
• Data downloadable in CSV format

Page 8: Flume-Cassandra Log Processor

Reports Example: CDR Search

Page 9: Flume-Cassandra Log Processor

Reports Example: CDR Search Results

Page 10: Flume-Cassandra Log Processor

Reports Example: Graphs

Page 11: Flume-Cassandra Log Processor

Sizing Example

Node Hardware: Supermicro (CPU: 2 quad-core Intel E5420, 32 GB RAM, 16x1 TB SATA HD), ~$6,000.

Monitoring Layer:
• Nodes required: 2 (1 Master + 1 Standby for High Availability)

Collector Layer:
• Nodes required = MAX(2, (log bytes per transaction * transactions per second (TPS)) / Node Write Throughput (MB/s)), rounded up
• Example: 1 MB/s write throughput per node, 1 KB/transaction, 1000 TPS system = 1 MB/s of writes, so Nodes required = MAX(2, 1 / 1) = 2

Storage Layer:
• Nodes required = MAX(Replication Factor, Data Per Day * # of Days to Keep / (Effective Node Storage / Replication Factor)), rounded up
• Example: Data Per Day = 100 GB, # of Days to Keep = 365, Effective Node Storage = 8 TB, Replication Factor = 2; then Nodes Required = 100 * 365 / (8000 / 2) = 9.125, rounded up to 10 nodes

[Chart: Collector nodes required vs. MB/sec (log bytes/tx * TPS); both axes are marked from 2 to 5]

Effective storage (GB) / node   Replication factor   Data (GB) / day   # days of data / node   # of nodes for 365 days
8000                            2                    10                400                     2
8000                            3                    10                266                     3
8000                            2                    100               40                      10
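The sizing rules on this slide can be sketched in Python. This is a minimal sketch, not part of Gemini's package: the function and parameter names are hypothetical. It assumes the collector rule means load (log bytes per transaction * TPS) divided by per-node write throughput, which is what the worked example and the chart suggest. It also assumes fractional results are rounded up, as in the 9.125 to 10 storage example, and that 1 MB = 1000 KB, matching the slide's arithmetic.

```python
import math


def collector_nodes(node_write_mb_per_s, kb_per_tx, tps, minimum=2):
    """Collector nodes needed for a given log write load (hypothetical helper)."""
    # Assumption: 1 MB = 1000 KB, as in the slide's "1 KB * 1000 TPS = 1 MB/s".
    load_mb_per_s = kb_per_tx * tps / 1000.0
    return max(minimum, math.ceil(load_mb_per_s / node_write_mb_per_s))


def storage_nodes(gb_per_day, days_to_keep, node_storage_gb, replication_factor):
    """Storage nodes needed to retain the data (hypothetical helper)."""
    # Each node effectively holds only 1/RF of its capacity in unique data.
    usable_gb_per_node = node_storage_gb / replication_factor
    needed = math.ceil(gb_per_day * days_to_keep / usable_gb_per_node)
    return max(replication_factor, needed)


if __name__ == "__main__":
    print(collector_nodes(1.0, 1.0, 1000))   # slide example: 1 MB/s of writes
    print(storage_nodes(100, 365, 8000, 2))  # slide example: 100 GB/day for a year
```

The same two functions can be used to recompute the three rows of the sizing table by varying the replication factor and the daily data volume.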

Page 12: Flume-Cassandra Log Processor

Open Source Components

• Flume and Cassandra are available open-source components. We add the following components:
  1. Custom Flume-Cassandra Connector: Reads our log format and inserts into Cassandra
  2. Cassandra data design, including schemas and configuration
  3. Browser UI and queries to Cassandra
  4. Post-processor to generate custom log format files

Page 13: Flume-Cassandra Log Processor

Cassandra Data Model

Currently, Flume inserts into 4 tables:

1. Raw Data Table
   • Function: Store original log data as received.
   • Row key: YYYYMMDDHH, one for each hour.
   • Column: Name: log entry UUID; Value: log data.

2. CDR Entry Table
   • Function: Represent each log field as a column. Useful for querying and indexing.
   • Row key: Log entry UUID.
   • Column: Name: log data field name; Value: log data field value.

Example: Raw Data Table
(A row is added for each hour. A column is added for each log entry in that hour; columns are sorted by unique log entry ID (UUID).)

Row key: 2011011107
  Column name   Column value
  AAB32431352   01S,Market1,12345AA,20110111071200000,10.10.2.9,,10.10.2.10,09012345673,carrier.ne.jp,carrier.ne.jp,,,,,
  ABC32433781   04RR,Market1,12345ZZ,20110111071200005,10.10.2.9,,10.10.2.10,09023456890,carrier.ne.jp,carrier.ne.jp,,,,,
  BCD32433901   07S,Market1,12345BB,20110111071200010,10.10.2.9,,10.10.2.10,09012345673,carrier.ne.jp,carrier.ne.jp,,,,,

Example: CDR Entry Table
(A row is added for each log entry.)

Row key      type  market   id       timestamp          moipaddress  mtipaddress  msisdn       senderdomain   recipientdomain
AAB32431352  01S   Market1  12345AA  20110111071200000  10.10.2.9    10.10.2.10   09012345673  carrier.ne.jp  carrier.ne.jp
ABC32433781  04RR  Market1  12345ZZ  20110111071200005  10.10.2.9    10.10.2.10   09023456890  carrier.ne.jp  carrier.ne.jp
BCD32433901  07S   Market1  12345BB  20110111071200010  10.10.2.9    10.10.2.10   09012345673  carrier.ne.jp  carrier.ne.jp
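The Raw Data and CDR Entry tables can be modeled with plain Python dictionaries to show how one log line ends up in both. This is only an in-memory sketch under stated assumptions, not the Flume-Cassandra connector itself. The names `insert_log`, `raw_data`, `cdr_entry` and `FIELD_POSITIONS` are hypothetical. The field positions are inferred from the sample rows on this slide, and the positions that are empty in the samples are skipped because the source does not name them.

```python
from collections import defaultdict

# Assumption: field positions inferred from the sample log lines. Position 5
# and positions 10+ are empty in the samples and unnamed in the source.
FIELD_POSITIONS = {
    "type": 0, "market": 1, "id": 2, "timestamp": 3, "moipaddress": 4,
    "mtipaddress": 6, "msisdn": 7, "senderdomain": 8, "recipientdomain": 9,
}

raw_data = defaultdict(dict)  # row key YYYYMMDDHH -> {log entry UUID: raw line}
cdr_entry = {}                # row key log entry UUID -> {field name: value}


def insert_log(uuid, line):
    """Store one log line as-is (Raw Data) and field by field (CDR Entry)."""
    values = line.split(",")
    fields = {name: values[pos] for name, pos in FIELD_POSITIONS.items()}
    hour = fields["timestamp"][:10]  # YYYYMMDDHH prefix of YYYYMMDDHHMMSSmmm
    raw_data[hour][uuid] = line
    cdr_entry[uuid] = fields
    return hour


SAMPLE = [
    ("AAB32431352", "01S,Market1,12345AA,20110111071200000,10.10.2.9,,"
                    "10.10.2.10,09012345673,carrier.ne.jp,carrier.ne.jp,,,,,"),
    ("ABC32433781", "04RR,Market1,12345ZZ,20110111071200005,10.10.2.9,,"
                    "10.10.2.10,09023456890,carrier.ne.jp,carrier.ne.jp,,,,,"),
    ("BCD32433901", "07S,Market1,12345BB,20110111071200010,10.10.2.9,,"
                    "10.10.2.10,09012345673,carrier.ne.jp,carrier.ne.jp,,,,,"),
]
for sample_uuid, sample_line in SAMPLE:
    insert_log(sample_uuid, sample_line)
```

After the loop, `raw_data["2011011107"]` holds the three raw lines keyed by UUID, and `cdr_entry["ABC32433781"]["msisdn"]` gives that entry's MSISDN without re-parsing the raw line.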

Page 14: Flume-Cassandra Log Processor

Cassandra Data Model

3. MSISDN Timeline Table
   • Function: Organize by MSISDN, then by timestamp.
   • Row key: MSISDN.
   • Column: Name: timestamp; Value: log entry UUID pointing to the CDREntry.

4. HourlyTimeline Table
   • Function: Organize by time (hour), then by timestamp.
   • Row key: YYYYMMDDHH.
   • Column: Name: timestamp; Value: log entry UUID pointing to the CDREntry.

Example: MSISDN Timeline Table
(A row is added for each MSISDN. A column is added for each log entry for that MSISDN; columns are sorted by timestamp.)

Row key      Column name (timestamp)  Column value (UUID)
09012345673  20110111071200000        AAB32431352
             20110111071200010        BCD32433901
09023456890  20110111071200005        ABC32433781

Example: HourlyTimeline Table
(A row is added for each hour. A column is added for each log entry in that hour; columns are sorted by timestamp.)

Row key     Column name (timestamp)  Column value (UUID)
2011011107  20110111071200000        AAB32431352
            20110111071200005        ABC32433781
            20110111071200010        BCD32433901
2011011108  20110111081200001        BDB32431352
            20110111081200010        CDC32431352
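The two timeline tables act as indexes: a query first finds UUIDs by MSISDN or by hour, then fetches the full records from the CDR Entry table. The sketch below models that lookup in memory. It is an illustration under assumptions, not the package's query code: `index_entry` and `uuids_in_range` are hypothetical names, and only the three sample entries whose MSISDN is shown on the slides are indexed.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

msisdn_timeline = defaultdict(dict)  # row key MSISDN -> {timestamp: UUID}
hourly_timeline = defaultdict(dict)  # row key YYYYMMDDHH -> {timestamp: UUID}


def index_entry(uuid, msisdn, timestamp):
    """Add one log entry to both timeline indexes."""
    msisdn_timeline[msisdn][timestamp] = uuid
    hourly_timeline[timestamp[:10]][timestamp] = uuid


def uuids_in_range(row, start, end):
    """Return UUIDs with start <= timestamp <= end, in time order."""
    # Timestamps are fixed-width digit strings, so string order is time order.
    # Cassandra keeps columns sorted; a Python dict does not, so sort here.
    stamps = sorted(row)
    lo = bisect_left(stamps, start)
    hi = bisect_right(stamps, end)
    return [row[ts] for ts in stamps[lo:hi]]


index_entry("AAB32431352", "09012345673", "20110111071200000")
index_entry("ABC32433781", "09023456890", "20110111071200005")
index_entry("BCD32433901", "09012345673", "20110111071200010")

# All entries for one MSISDN during hour 07 of 2011-01-11.
hits = uuids_in_range(msisdn_timeline["09012345673"],
                      "20110111070000000", "20110111079999999")
```

In this sketch, two entries with an identical timestamp in the same row would overwrite each other; how the original package handles that case is not stated in the slides.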

Page 15: Flume-Cassandra Log Processor

Next Steps

• Gemini has open sourced the package at https://github.com/geminitech/logprocessing
  • README, sample data, package
• To try:
  • Download and install Flume, Cassandra, and Gemini's code
  • Try with sample data
• To use for a Production System:
  • Get sample logs from the actual system; customize the Flume plug-in if needed
  • Decide what reports are needed; customize the Cassandra table format and UI if needed
  • Test functionality and performance with sample logs
  • Deploy: Lab system first, then Production System

Page 16: Flume-Cassandra Log Processor

Backup

Page 17: Flume-Cassandra Log Processor

Database Storage Alternatives

Cassandra is the storage system used.

Comparisons to some alternatives:

• SQL. Cannot insert this much data at a high rate. Cannot scale horizontally easily.
• Hadoop. Cannot query flexibly and manipulate data, since it is not in a database-like system.
• HBase or Hibari. Provide much of the same capability as Cassandra. Cassandra was chosen because of:
  • Multiple data center support
  • Peer-to-peer nodes, easy to add/remove nodes ad hoc
  • Tunable consistency
    • Not currently used, but would be useful with multiple data centers, or with different classes of data (e.g. Billing Records vs. Statistics Records)

Page 18: Flume-Cassandra Log Processor

FAQ (Page 1)

From the viewpoint of using stored logs, please share your know-how, e.g.:

Q. What approach to storage (lumped storage of multiple logs, or concentrated storage of similar data) would be effective for analysis or parsing?

A. It depends on what analysis/reports we would like to do later. In our solution, we have one table which stores all logs, and 3 other tables which provide indexing based on similar data (e.g. MSISDN, timestamp, log ID) for fast queries. The exact tables/schema may be customized depending on the actual logs and desired reports.

Q. When logs are analyzed/parsed later, is it better to use the data stored in the distributed DB on an as-is basis, or is it better to convert the data structure in a certain way before storing the data in the distributed DB and DWH?

A. For real-time queries and fast report generation, it is useful to convert the data into certain tables. As shown in our example, we store the log both "as-is" and in table format. This allows the most flexible usage.

Page 19: Flume-Cassandra Log Processor

FAQ (Page 2)

Q. What type of logs (logs with short records, e.g., syslog and the AP log of a system; a variety of logs such as Lifelog; or large logs such as multimedia data or web pages) would best fit aggregation?

A. All of these fit well. In our Cassandra-based system, we can set an expiry time for each log entry. Short-lived records can then be deleted automatically after a certain period, while long-lived logs can stay for a long time. So in our system, it is possible to have different types of logs in the same database.

Q. Is there any assumption or example, such as a BI tool, for analysis/parsing?

A. No. Once the data is in the database, any BI tool can be used for analysis. The BI tool would need to be integrated with Cassandra. There are a variety of ways to do this, and the amount of customization depends on the BI tool.

Page 20: Flume-Cassandra Log Processor

FAQ (Page 3)

Q. How is older data deleted?

A. Cassandra has a time-to-live (TTL) for each column (in seconds). After the TTL expires, the data is automatically deleted at compaction time.

Q. How do we detect/process alarms when the data store gets full? How do we predict when the data store will be full so we can expand?

A. SNMP (netsnmp) can be used to monitor server disk usage. When it exceeds a certain threshold, an SNMP trap is generated.

Q. How does this compare with a Hadoop-based log processing system?

A. By adding a database (i.e., Cassandra), we can query in real time, issue complex queries, and do other database-type operations.

Q. Do we use Map/Reduce?

A. A map-reduce script can be used to post-process the log data and to generate other log formats or analysis. We haven't tested this, but it "should work."

Page 21: Flume-Cassandra Log Processor

FAQ (Page 4)

Q. How real time is this system (exactly how much delay would there be under the best circumstances)? What would it take to make it more real time?

A. Total latency is A + B + C, where A is the configurable delay to read the log file, B is the time to move data from the Agent node to the Collector node, and C is the time to insert into Cassandra. In an example scenario with A = 100 ms, B = 50 ms, and C = 10 ms, the total is 160 ms.

Q. How many lines of code? In what language?

A. Flume-to-Cassandra plugin (~40 lines of Java), UI (~2000 lines of Java and JSP), post-processor for log formats (~250 lines of Java).

Q. Areas to improve?

A. 1. Generalize the UI so it can work with any log format.
   2. Extensive load and large-system testing.
   3. Add Pig scripts to post-process log data.

Page 22: Flume-Cassandra Log Processor

Pig for Cassandra

• Pig (http://pig.apache.org/) is a high-level, relational language for writing queries that are then translated to Map/Reduce jobs.
• The Map/Reduce jobs are supported by Cassandra.
• Example Pig script that finds the top-100 MSISDNs that have the highest number of log records since 2011-01-01:

-- Each loaded row is (MSISDN row key, bag of (timestamp, log entry UUID) columns).
msisdn = LOAD 'cassandra://CDRLogs/MSISDNTimeline' USING CassandraStorage();
-- Keep the row key next to each column so records can be counted per MSISDN.
cdrs = FOREACH msisdn GENERATE $0, FLATTEN($1);
-- $1 is the timestamp; the cutoff uses the 17-digit form seen in the sample data.
recentCdrs = FILTER cdrs BY $1 > 20110101000000000L;
cdrsByMsisdn = GROUP recentCdrs BY $0;
msisdnCount = FOREACH cdrsByMsisdn GENERATE COUNT($1), group;
-- Highest counts first, so LIMIT keeps the top 100.
orderedMsisdn = ORDER msisdnCount BY $0 DESC;
topUserAfterNewYear = LIMIT orderedMsisdn 100;
DUMP topUserAfterNewYear;