Measure() or die()

Page 1: Measure() or die()
Page 2: Measure() or die()

By Arik Lerner, Team Lead, Automation & Performance/Resilience

Measure() OR Die();

Measure or Die

Page 3: Measure() or die()

Bio

- 3.5 years at LivePerson

- 2 years: Reporting Platform

- 1.5 years: Team Lead, Automation & Performance/Resilience

- Interests: private pilot on a Cessna 172

Page 4: Measure() or die()
Page 5: Measure() or die()

Meetup Agenda

➔ How we monitor with e2e testing

➔ E2E Products & Personas

➔ The Awakening of the End2End Data

➔ Architecture & Life cycle

Page 6: Measure() or die()

About LivePerson

LivePerson transforms the connection between brands and consumers.

Page 7: Measure() or die()

Our Scale

3BN visits/month

200BN API calls/month

2 PB of data a year

1.5M concurrent visits

Page 8: Measure() or die()

Our Engineering

~200 people RnD

Constant innovation

Multiple Technologies

Fast release cycle

Page 9: Measure() or die()
Page 10: Measure() or die()

We Monitor LivePerson Services

With e2e tests that simulate real business scenarios

➔ Indicate real business problems

➔ Show service availability through the consumer's eyes

➔ Alert and prompt immediate action

➔ Give insight into our business services

Page 11: Measure() or die()

E2E Scenario Example

Agent login: the agent enters the system

Visitor enters the site

Visitor initiates a chat

Agent chat

Page 12: Measure() or die()

E2E customers' expectations

➔ Stability == TRUST

➔ Investigatable

➔ Service Coverage

➔ Scale

Page 13: Measure() or die()

E2E

Page 14: Measure() or die()

E2E Dashboard Statistics

Page 15: Measure() or die()

Real Time Dashboard

Page 16: Measure() or die()

Kibana - HAR statistics & Aggregation

Page 17: Measure() or die()

E2E Personas

Production specialist

PMO

Management

Page 18: Measure() or die()

Production Specialist User Story

This is Yossi.

When Yossi gets up in the morning, Yossi looks at the E2E RT dashboard.

Yossi recognizes a failure.

Yossi enters the E2E debug center tools.

Yossi is smart!

Be like Yossi.

Page 19: Measure() or die()

PMO User Story

This is Michal.

Before any software deployment, when the dashboard failure rate is below 3%, Michal has a GO for deployment.

Michal is smart!

Be like Michal.
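Michal's rule can be read as a simple deployment gate. The following is a minimal sketch, assuming the dashboard exposes a count of failed and total runs; the function names and the strict "below 3%" comparison are illustrative assumptions, not the talk's actual implementation.

```python
def failure_rate(failed_runs, total_runs):
    """Fraction of e2e runs that failed (0.0 to 1.0)."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    return failed_runs / total_runs


def deployment_decision(failed_runs, total_runs, threshold=0.03):
    """Return "GO" only when the failure rate is strictly below the threshold.

    The 3% default comes from the slide; everything else is hypothetical.
    """
    rate = failure_rate(failed_runs, total_runs)
    return "GO" if rate < threshold else "NO-GO"


if __name__ == "__main__":
    print(deployment_decision(failed_runs=2, total_runs=100))
    print(deployment_decision(failed_runs=5, total_runs=100))
```

With no runs recorded the function raises instead of returning "GO", so a silent dashboard never approves a deployment by accident.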

Page 20: Measure() or die()

Management story

This is Eli.

When Eli gets up in the morning, Eli looks at the dashboard statistics.

Eli can see the health and availability of each Data Center.

Eli is smart!

Be like Eli.

Page 21: Measure() or die()
Page 22: Measure() or die()

What KPIs do I need to measure?

KPIs

➔ Total failure rate

◆ Filter by Data Center

◆ Filter by business flow

Widgets

➔ Trend to understand service stability

Page 23: Measure() or die()

What KPIs do I need to measure?

KPIs

➔ Total chat failure rate

➔ Total missing engagements

➔ Total login failures

➔ Average login response time

Widgets

➔ Failure cause breakdown

➔ Client location root cause

➔ Test scenario failures
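To make these KPIs concrete, here is a minimal sketch that derives a few of them from a list of test-run records. The record fields (`flow`, `passed`, `failure_cause`, `duration_ms`) are assumptions made for illustration; the talk does not show the real schema.

```python
from collections import Counter
from statistics import mean


def compute_kpis(runs):
    """Derive chat/login KPIs and a failure-cause breakdown from run records."""
    chat = [r for r in runs if r["flow"] == "chat"]
    login = [r for r in runs if r["flow"] == "login"]
    failed = [r for r in runs if not r["passed"]]
    return {
        # Share of chat runs that failed; 0.0 when there were no chat runs.
        "chat_failure_rate": (
            sum(not r["passed"] for r in chat) / len(chat) if chat else 0.0
        ),
        "login_failures": sum(not r["passed"] for r in login),
        "avg_login_response_ms": (
            mean(r["duration_ms"] for r in login) if login else None
        ),
        # Feeds a "failure cause breakdown" widget.
        "failure_causes": Counter(r["failure_cause"] for r in failed),
    }


# Hypothetical sample data.
SAMPLE_RUNS = [
    {"flow": "chat", "passed": True, "failure_cause": None, "duration_ms": 900},
    {"flow": "chat", "passed": False, "failure_cause": "missing engagement", "duration_ms": 30000},
    {"flow": "login", "passed": True, "failure_cause": None, "duration_ms": 400},
    {"flow": "login", "passed": False, "failure_cause": "timeout", "duration_ms": 600},
]

if __name__ == "__main__":
    print(compute_kpis(SAMPLE_RUNS))
```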

Page 25: Measure() or die()

The Awakening of the End2End Data

Page 26: Measure() or die()

Start collecting the data!

➔ Get build failures/successes

➔ Get failure cause

➔ Business flows

➔ Test duration

➔ Client location

➔ Data Center location

➔ Account

@Test -> Raw Data Output

Page 27: Measure() or die()

HAR (HTTP Archive)

➔ Logging web browser traffic

The HTTP Archive format, or HAR, is a JSON-formatted archive file format for logging a web browser's interaction with a site. The common extension for these files is .har.

The specification for the HTTP Archive (HAR) format defines an archival format for HTTP transactions that can be used by a web browser to export detailed performance data about web pages it loads. The specification for this format is produced by the Web Performance Working Group [1] of the World Wide Web Consortium (W3C). The specification is in draft form and is a work in progress.
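Because a HAR file is plain JSON, it can be inspected with a standard JSON parser. The sketch below reads the `log.entries` array defined by the HAR format and summarizes it; the sample document and its URLs are made up for illustration.

```python
import json


def summarize_har(har_text):
    """Summarize a HAR document: request count, total time, errors, slowest URL."""
    entries = json.loads(har_text)["log"]["entries"]
    slowest = max(entries, key=lambda e: e["time"], default=None)
    return {
        "requests": len(entries),
        # "time" is the total elapsed time of the request in milliseconds.
        "total_time_ms": sum(e["time"] for e in entries),
        "errors": sum(1 for e in entries if e["response"]["status"] >= 400),
        "slowest_url": slowest["request"]["url"] if slowest else None,
    }


# Hypothetical two-request HAR, reduced to the fields used above.
SAMPLE_HAR = json.dumps({
    "log": {
        "version": "1.2",
        "entries": [
            {"time": 120,
             "request": {"method": "GET", "url": "https://example.com/login"},
             "response": {"status": 200}},
            {"time": 480,
             "request": {"method": "POST", "url": "https://example.com/chat"},
             "response": {"status": 503}},
        ],
    }
})

if __name__ == "__main__":
    print(summarize_har(SAMPLE_HAR))
```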

Page 28: Measure() or die()

HAR proxy diagram

Selenium WebDriver -> Proxy on port XXX -> www.Liveperson.com

The request passes through the proxy, which produces the HAR.

Based on the BrowserMob embedded proxy server

Code snippet: adding the proxy into Selenium

Page 29: Measure() or die()

Pop quiz:

• N scenarios

• Running from M locations

• Running to X Data Centers

• Yields HAR data

Question: how do we investigate the data for an entire farm, location, scenario, etc.?

Answer: aggregation.
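The aggregation answer can be sketched in a few lines: group the per-request records by any combination of dimensions and compute statistics per group. The talk does this with Elasticsearch and Kibana; the in-memory version below only illustrates the idea, and the `time_ms` field name is an assumption.

```python
from collections import defaultdict
from statistics import mean


def aggregate(records, dimensions, metric="time_ms"):
    """Group records by the given dimensions and summarize one metric per group."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        groups[key].append(rec[metric])
    return {
        key: {"count": len(values), "avg": mean(values), "max": max(values)}
        for key, values in groups.items()
    }


# Hypothetical records: one per HTTP request, tagged with test metadata.
SAMPLE_RECORDS = [
    {"Testname": "ChatFlow", "ClientLocation": "US", "DataCenter": "UK", "time_ms": 100},
    {"Testname": "ChatFlow", "ClientLocation": "US", "DataCenter": "UK", "time_ms": 300},
    {"Testname": "ChatFlow", "ClientLocation": "US", "DataCenter": "US", "time_ms": 200},
]

if __name__ == "__main__":
    # The same data can be sliced per data center, per location, or per scenario.
    print(aggregate(SAMPLE_RECORDS, ["DataCenter"]))
    print(aggregate(SAMPLE_RECORDS, ["Testname", "ClientLocation"]))
```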

Page 30: Measure() or die()

Start by collecting the data!

@Test -> Raw Data Output: Metadata + HAR

Raw Data Output:

{
  "metaData": {
    "Testname": "ChatFlow",
    "Account": "qa12345",
    "ClientLocation": "US",
    "DataCenter": "UK"
  }
}
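One way to combine the two outputs is to attach the test metadata to every HAR entry, so that each request can later be filtered by scenario, account, location, or data center. This is a minimal sketch under that assumption; the flattened document layout is hypothetical, and only the four metadata field names come from the slide.

```python
import json

# Metadata field names as shown on the slide.
REQUIRED_FIELDS = ("Testname", "Account", "ClientLocation", "DataCenter")


def to_documents(metadata, har):
    """Return one JSON document per HAR entry, each tagged with the test metadata."""
    missing = [f for f in REQUIRED_FIELDS if f not in metadata]
    if missing:
        raise ValueError("missing metadata fields: %s" % missing)
    docs = []
    for entry in har["log"]["entries"]:
        doc = {
            "metaData": metadata,
            "url": entry["request"]["url"],
            "status": entry["response"]["status"],
            "time_ms": entry["time"],
        }
        docs.append(json.dumps(doc, sort_keys=True))
    return docs


if __name__ == "__main__":
    metadata = {"Testname": "ChatFlow", "Account": "qa12345",
                "ClientLocation": "US", "DataCenter": "UK"}
    har = {"log": {"entries": [
        {"time": 120,
         "request": {"url": "https://example.com/chat"},
         "response": {"status": 200}},
    ]}}
    for line in to_documents(metadata, har):
        print(line)
```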

Page 31: Measure() or die()

Pipeline diagram:

@Test runs on Jenkins slaves -> HAR files -> HAR Processor (reads the output files, gets the JSON, sends the data) -> Kafka (topic e2e) -> Logstash + Elasticsearch -> Kibana Dashboard

Code snippet: sending a message into Kafka

Page 32: Measure() or die()

Our benefits

➔ Data retention: 30 days

➔ Ability to query and aggregate over the data for investigation

➔ Ability to build dashboards

➔ Access to the data through Elasticsearch APIs

ELK & HAR Downsides

➔ Complicated queries in Kibana

➔ ELK setup & maintenance

➔ On a response timeout, the HAR shows an enormous number (needs to be handled in code)
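The last downside has to be handled before the data reaches the dashboards, otherwise a single timed-out request skews every average. The slide does not say how this was done; the sketch below is one possible approach, assuming a plausibility ceiling (the 5-minute default is an arbitrary, hypothetical value) and a made-up `suspectTiming` flag.

```python
def sanitize_entry(entry, max_plausible_ms=300_000):
    """Return a copy of a HAR entry with implausible timings nulled and flagged.

    Entries with a missing, negative, or enormous "time" keep their other
    fields, but "time" becomes None so aggregations can skip it.
    """
    cleaned = dict(entry)
    time_ms = cleaned.get("time")
    if time_ms is None or time_ms < 0 or time_ms > max_plausible_ms:
        cleaned["time"] = None
        cleaned["suspectTiming"] = True
    else:
        cleaned["suspectTiming"] = False
    return cleaned


def average_time(entries):
    """Average "time" over entries whose timing survived sanitization."""
    times = [e["time"] for e in map(sanitize_entry, entries) if e["time"] is not None]
    return sum(times) / len(times) if times else None


if __name__ == "__main__":
    entries = [{"time": 100}, {"time": 300}, {"time": 9_000_000_000_000}]
    print(average_time(entries))
```

Flagging instead of dropping keeps the timed-out request visible as a failure while protecting the timing statistics.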

Page 33: Measure() or die()

What other E2E outputs do we have?

@Test -> More Output

➔ BDD reports

➔ Video

➔ Logs

➔ Browser console logs

Page 34: Measure() or die()

Code snippet

BDD - Behaviour Driven Development

Page 35: Measure() or die()
Page 36: Measure() or die()

[Architecture diagram. Components shown: Jenkins Master and Jenkins Master DR; Jenkins slaves running @Test in each data center (DC-1, DC-2 ... DC-N); HAR data and e2e data; Kafka + ELK; MySQL DB; Kibana service; E2E Reports; RT Dashboard; production metrics; Graphite; Zabbix; Grafana.]

Page 37: Measure() or die()

E2E Test Lifecycle

DEV -> QA -> Staging -> Production

Page 38: Measure() or die()

E2E @ Scale

Page 39: Measure() or die()

E2E @ Scale

➔ 1.5M HTTP traffic records per day

➔ 200K runs per day

➔ 60 Jenkins slave machines

➔ 28 scenarios

➔ 6 client locations

➔ 6 regions

Page 40: Measure() or die()

What to take home?

➔ Monitor your Data Centers from the consumer's point of view

➔ Collect data

➔ Give the data business meaning

Page 41: Measure() or die()
Page 42: Measure() or die()

THANK YOU!

We are hiring

Page 43: Measure() or die()

YouTube.com/LivePersonDev

Twitter.com/LivePersonDev

Facebook.com/LivePersonDev

Slideshare.net/LivePersonDev