I got 10 trillion problems, but logging ain’t one

John Graham-Cumming


10 trillion HTTP requests per month


4 MHz log lines (roughly 4 million log lines per second)

A log processing* company that also runs a CDN and web security service

*not storage

The Data Team

400TB/day*

* 146 PB per year
* and that’s the compressed size

122,000,000,000 floppies
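The figures on these slides can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the 30-day month, the decimal units, and the 1.2 MB floppy capacity are assumptions chosen because they reproduce the quoted numbers, and all function names are hypothetical.

```go
package main

import "fmt"

// Figures quoted on the slides.
const (
	requestsPerMonth = 10e12 // 10 trillion HTTP requests per month
	tbPerDay         = 400.0 // compressed log volume in TB per day
)

// Assumptions made for this illustration (not stated on the slides).
const (
	secondsPerMonth = 30 * 24 * 60 * 60 // a 30-day month
	floppyMB        = 1.2               // capacity of one floppy in MB
)

// requestsPerSecond converts the monthly request count to an average rate.
func requestsPerSecond() float64 {
	return requestsPerMonth / secondsPerMonth
}

// pbPerYear converts TB/day to PB/year using decimal units (1 PB = 1000 TB).
func pbPerYear() float64 {
	return tbPerDay * 365 / 1000
}

// floppiesPerYear estimates how many floppies one year of logs would fill.
func floppiesPerYear() float64 {
	mbPerYear := pbPerYear() * 1e9 // 1 PB = 1e9 MB
	return mbPerYear / floppyMB
}

func main() {
	fmt.Printf("%.2f million requests/s\n", requestsPerSecond()/1e6)
	fmt.Printf("%.0f PB/year\n", pbPerYear())
	fmt.Printf("%.0f billion floppies/year\n", floppiesPerYear()/1e9)
}
```

Under these assumptions the average rate comes out just under 4 million requests per second, which lines up with the "4 MHz" figure, and 400 TB/day works out to 146 PB per year.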


Privacy


Three Things

• Provide charts for our customers
• Give customers data about attacks
• Automatically spot attacks in real-time


Things we ❤️

• NGINX + LuaJIT
• Cap’n Proto
• Apache Kafka
• Redis
• Go
• Postgres and CitusDB
• Streaming Algorithms

https://blog.cloudflare.com/tag/go/


NGINX + LuaJIT

• Every request executes Lua code… lots
• LuaJIT is very fast
• Mixture of human-written Lua and generated code
• http://wiki.nginx.org/HttpLuaModule

Lua for WAF

Generated Code

NGINX + LuaJIT

• Go program receives log events from NGINX in Cap’n Proto format
• Batches events
• Compresses using LZ4
• Sends via TLS to Data


Cap’n Proto

• Insanely fast
• Saw 20x speedup over cjson
• It’s a wire format and an in memory representation
• Extend with no penalty
• https://capnproto.org/
• Our interface from Lua
  https://blog.cloudflare.com/introducing-lua-capnproto-better-serialization-in-lua/


Apache Kafka

• Fast, scalable, resilient queue
• Queue on a cluster not a single machine
• Allows clusters of readers to process queue messages
• https://kafka.apache.org/
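One way to picture how a cluster of readers shares a queue: Kafka splits a topic into partitions, and within a group of readers each partition is read by one member. The Go sketch below only illustrates that idea and does not use any Kafka client. The FNV-1a hash and the round-robin assignment are assumptions picked for brevity, and `partitionFor` and `assign` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a message key to one of n partitions. FNV-1a is an
// illustrative choice here, not necessarily the hash a Kafka client uses.
func partitionFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(n))
}

// assign spreads partitions across readers round-robin, so that every
// partition has exactly one reader in the group.
func assign(partitions, readers int) map[int][]int {
	owned := make(map[int][]int)
	for p := 0; p < partitions; p++ {
		r := p % readers
		owned[r] = append(owned[r], p)
	}
	return owned
}

func main() {
	const partitions, readers = 6, 3
	for r, ps := range assign(partitions, readers) {
		fmt.Printf("reader %d owns partitions %v\n", r, ps)
	}
	for _, key := range []string{"zone-a", "zone-b", "zone-c"} {
		fmt.Printf("%s -> partition %d\n", key, partitionFor(key, partitions))
	}
}
```

Because a given key always lands in the same partition, all messages for that key reach the same reader, which is convenient when readers keep per-key aggregates.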


Apache Kafka

• Cluster of Go programs process log messages
• Generate detailed attack logs for customers
• Feed aggregates to Postgres

Attack Log

Postgres and CitusDB

• Go processes produce 1 minute roll ups of customer analytics and insert into CitusDB
• Later 1 hour, 1 day etc. roll ups created
• CitusDB is a sharded, replicated Postgres implementation for very fast queries
• https://blog.cloudflare.com/scaling-out-postgresql-for-cloudflare-analytics-using-citusdb/
• https://www.citusdata.com/
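The roll-up idea above (count per customer per minute, then derive coarser buckets from the finer ones) might look something like this in Go. It is a sketch under assumptions: the `Event` and `Key` types, the use of Unix seconds for bucket starts, and the in-memory maps are all hypothetical, and nothing here talks to Postgres or CitusDB.

```go
package main

import "fmt"

// Event is a hypothetical per-request record: a customer and a Unix timestamp.
type Event struct {
	Customer string
	Unix     int64
}

// Key identifies one roll-up row: a customer and the start of a time bucket.
type Key struct {
	Customer string
	Bucket   int64 // Unix seconds at the start of the bucket
}

// bucketStart rounds a Unix timestamp down to a multiple of width seconds.
func bucketStart(unix, width int64) int64 {
	return unix - unix%width
}

// rollUp counts events per customer per bucket of the given width.
func rollUp(events []Event, width int64) map[Key]int {
	out := make(map[Key]int)
	for _, e := range events {
		out[Key{e.Customer, bucketStart(e.Unix, width)}]++
	}
	return out
}

// coarsen builds a wider roll-up (for example 1 hour) from a finer one
// (for example 1 minute) without rereading the raw events.
func coarsen(fine map[Key]int, width int64) map[Key]int {
	out := make(map[Key]int)
	for k, n := range fine {
		out[Key{k.Customer, bucketStart(k.Bucket, width)}] += n
	}
	return out
}

func main() {
	base := int64(3600 * 1000) // an arbitrary timestamp on an hour boundary
	events := []Event{
		{"a", base + 5}, {"a", base + 40}, {"a", base + 70}, {"b", base + 1800},
	}
	minute := rollUp(events, 60)
	hour := coarsen(minute, 3600)
	fmt.Println("a, first minute:", minute[Key{"a", base}])
	fmt.Println("a, whole hour:  ", hour[Key{"a", base}])
	fmt.Println("b, whole hour:  ", hour[Key{"b", base}])
}
```

Building the hourly and daily rows from the minute rows keeps the later roll-ups cheap, since they never need to touch the raw log lines again.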


• 128GB RAM
• 2x Intel® Xeon® Processor E5-2630 v3 (16 cores)
• 10G Ethernet
• Disks vary
• Custom made for us by Quanta


• 40 machines for Kafka
  400TB of compressed, replicated 500-byte log lines
  Ingest at ~15Gbps
  ~50TB of spinning rust per node
• 5 machines for CitusDB
  Analytics for 2 million customers
  ~12TB of SSD per node
• > 100 machines for consumers written in Go
  The analytics roll up processes
  Attack detection
  Botnet analysis


Streaming Algorithms

• Space Saving Algorithm
  Efficient Computation of Frequent and Top-k Elements in Data Streams
  https://icmi.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf
  Hasn’t worked well for ‘long tail data’
• HyperLogLog
  Counting distinct elements
  https://github.com/aggregateknowledge/postgresql-hll
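For readers unfamiliar with the Space Saving algorithm, here is a minimal Go sketch of its core update rule: keep at most k counters, and when a new item arrives with no free counter, evict the item with the smallest count and let the newcomer inherit that count plus one. The type and method names are hypothetical, and the linear scan for the minimum is a simplification; the paper describes a linked structure that finds it in constant time.

```go
package main

import (
	"fmt"
	"sort"
)

// counter holds an estimated count and the maximum possible overestimate.
type counter struct {
	count int
	err   int
}

// SpaceSaving tracks approximate top-k items using at most k counters.
type SpaceSaving struct {
	k        int
	counters map[string]*counter
}

func NewSpaceSaving(k int) *SpaceSaving {
	return &SpaceSaving{k: k, counters: make(map[string]*counter)}
}

// Add records one occurrence of item.
func (s *SpaceSaving) Add(item string) {
	if c, ok := s.counters[item]; ok {
		c.count++
		return
	}
	if len(s.counters) < s.k {
		s.counters[item] = &counter{count: 1}
		return
	}
	// No free counter: evict the item with the smallest count.
	minItem, minCount := "", -1
	for it, c := range s.counters {
		if minCount < 0 || c.count < minCount {
			minItem, minCount = it, c.count
		}
	}
	delete(s.counters, minItem)
	// The newcomer inherits the evicted count, so its count may be too high
	// by up to minCount but is never too low.
	s.counters[item] = &counter{count: minCount + 1, err: minCount}
}

// Top returns the monitored items ordered by estimated count, highest first.
func (s *SpaceSaving) Top() []string {
	items := make([]string, 0, len(s.counters))
	for it := range s.counters {
		items = append(items, it)
	}
	sort.Slice(items, func(i, j int) bool {
		ci, cj := s.counters[items[i]].count, s.counters[items[j]].count
		if ci != cj {
			return ci > cj
		}
		return items[i] < items[j]
	})
	return items
}

func main() {
	s := NewSpaceSaving(3)
	for i := 0; i < 50; i++ {
		s.Add("a")
	}
	for i := 0; i < 30; i++ {
		s.Add("b")
	}
	for _, it := range []string{"c", "d", "e", "f"} {
		s.Add(it)
	}
	for _, it := range s.Top() {
		c := s.counters[it]
		fmt.Printf("%s: count=%d err=%d\n", it, c.count, c.err)
	}
}
```

In the example the rare items c, d, e and f keep evicting each other while a and b stay put. When most of a stream consists of such rare items, the inherited counts and error bounds keep growing, which may be related to the slide's remark about long tail data.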


https://cloudflare.github.io/