44
WE SCALED IT Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018

WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

WE SCALED IT

Scale testing Eclipse Hono on OpenShiftDejan Bosanac, Jens ReimannSenior Software EngineerEclipseCon Europe 2018

Page 2: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

2

DEJAN BOSANAC

Twitter: @dejanbGitHub: @dejanb

Senior Software EngineersMiddleware / Messaging

Focus on IoT

JENS REIMANN

Twitter: @ctronGitHub: @ctron

WHO WE AREWorking for Red Hat

Page 3: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

3

ECLIPSE HONOProject Introduction

Connect. Command. Control.

“Eclipse Hono™ provides remote service interfaces for connecting large numbers of IoT devices to a back end and interacting with them in a uniform way regardless of the device communication protocol. “…“Hono specifcally supports scalable and secure ingestion of large volumes of sensor data by means of its Telemetry and Event APIs.“

Page 4: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

THE CLAIM

Page 5: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

5

ECLIPSE HONOConnect. Command. Control.

Hono provides scalability for IoT

● Hono is scalable● Hono allows to connect a large numbers of devices ● A core functionality is the ingestion of telemetry

data

Page 6: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

6

OpenShift

BASIC ARCHITECTUREThis is what was interesting to us.

HonoProtocol Adapters

EnMasseAMQP 1.0 Layer Consumer

TelemetryDevices

Page 7: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

7

First attempt to run a larger test.

WE’VE DONE THIS BEFORE

Nov2017

Jan2018

Validate that our fxes actually worked.

Yes, they did.

This wasn’t the frst test. We already tested Hono previously, but not with a focus on scalability.

Test if Hono scales.

… does it?July

2018

Page 8: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

THE LAB

Page 9: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

9

PICK A LABLuckily we have more than one, we went with the one we already knew.

Lab capabilities

● 148 nodes● 1888 cores (3392 threads)● 1.1PB of storage capacity

Our reservation – for 3+1 weeks

● 16 DELL R620 - 2x E5-2620, 64/128 GB RAM● 192 total cores, 384 total threads● 1536 GB RAM

Page 10: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

10

HOW WE SET IT UPA tale of two clusters.

Two separate OpenShift clusters.

● One for Hono and its infrastructure● 3 infrastructure nodes● Cluster native storage (Gluster)

● One for simulating devices and consumers● Was it necessary? No, but easier to work with.

● Dedicated masters

Master Infra Infra Infra

Master Compute Compute Compute

Compute Compute Compute Compute

Compute Compute Compute Compute

Page 11: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

11

WHAT WE WANTED TO TESTThe use case we focused on.

Ingestion of telemetry data via HTTP.

● We also did tests around events, MQTT. But the focus was on telemetry over HTTP.

● We wanted to establish a baseline for telemetry over HTTP, and then start playing around, making modifcations.

● We also wanted to validate a view fxes and ideas we had in the past.

● Take a top down approach. Use the defaults, then optimize.

ConsumerQpid Router

Device Simulator

HTTP Adapter

Page 12: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

12

SOFTWARE COMPONENTSA few facts about the software we used.

● OpenShift (OCP) 3.9.1● RHEL 7.5● EnMasse 0.21

● Eclipse Hono 0.7● Snapshot version, close to M2● Forked on GitHub – “redhat-iot/hono”.

● Hono Simuator – “redhat-iot/hono-simulator”● Device simulator● Consumer simulator

● Hono Scale Test – “redhat-iot/hono-scale-test”● Scale test deployment and tools

Page 13: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

13

MAKING PLANS

A few simple steps

● Get the test setup up and running● Create a baseline for testing, max out the number of

messages/s● Start automated testing, ramping up the workload,

measuring the results, seeing if it scales.● Test a few scenarios: OpenJ9, Threading settings,

events, …

Page 14: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

UP AND RUNNING

Page 15: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

15

SCALING EVERYTHING

Page 16: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

16

SCALING EVERYTHING

Page 17: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

17

SCALING EVERYTHING

Page 18: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

MAXING IT OUT

Page 19: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

19

PLEASE NOTEBeware absolute numbers.

Results may be differentin your environment.

Do your own tests!It is all Open Source!

Page 20: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

20

MEASURING PERFORMANCEGenerating and consuming IoT workload.

● The simulation cluster runs X pods, simulating Y devices, sending one message each second.● How many attempts to send?

● The simulation cluster runs Z pods, consuming messages as fast as possible.● How many messages received?

● Hono handles back-pressure by rejecting messaging at the protocol adapter. Has an HTTP code for that.● Which errors?

● We can calculate the error rate, messages that couldn’t be sent due to an error/reject on the cloud side.

● We can scale up producers, consumers and infrastructure components (EnMasse, Hono).

Page 21: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

21

CATCH A GLIMPSE

Page 22: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

22

CATCH A GLIMPSEQPID Routers 1 1 1 1 1 2 2

Devices / Pod 1,000 1,000 1,000 1,000 1,000 1,000 1,000Pods 1 2 4 8 16 32 40Messages / second 1,000 2,000 4,000 8,000 16,000 32,000 40,000

Request RTT (ms) <5 <5 11 45 20 50 150

Consumers 20 20 20 20 20 20 8Consumer Credits 1,000 1,000 1,000 1,000 1,000 1,000 1,000

Hono Adapters 1 1 1 1 2 5 10Receiver Link Credit 1,000 1,000 1,000 1,000 1,000 1,000 1,000

Hono Device Registry 10 10 10 10 10 10 10Receiver Link Credit 1,000 1,000 1,000 1,000 1,000 1,000 1,000

Failure Ratio (%) < 0.1 < 0.1 < 0.1 < 0.1 < 0.1 < 0.1 < 1Average Throughput (msgs 1,000 2,000 4,000 8,000 16,000 32,000 40,000

Messages / Instance 1000 2000 4000 8000 8000 6400 4000

Page 23: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

23

THE BASELINE

Maximum of 80.000 msgs/s

● 2 Dedicated QPID router nodes● Device registry service as a sidecar● 8 nodes running Hono services● Using a HTTP client library which scales.

● Throughput is limited by CPU and Network resources

● RAM is not an limiting factor● Neither is disk (for telemetry!)

Page 24: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

THINGS WE LEARNED

Page 25: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

25

DEVICE REGISTRY SIDECAR

DEVICE ADAPTER QPIDROUTER

REGISTRY

Page 26: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

26

DEVICE REGISTRY SIDECAR

POD

CONTAINER

CONTAINER

POD

CONTAINER

POD

CONTAINER

POD

CONTAINER

Page 27: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

27

QPID ROUTER NODES

Putting QPID router on infra nodes helped streamline the message fow.

● Having dedicated QPID router nodes, did reduce the inter node traffc.

● Using node ports (vs routes) improved the performance a lot.

Master Infra Infra Infra

Master Compute Compute Compute

Compute Compute Compute Compute

Compute Compute Compute Compute

Page 28: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

28

A PERFORMANT HTTP CLIENT

Finding a performant HTTP client library is tricky.

● Some of the initial limitations came from the device simulator, which could not keep up sending requests.

● OkHttp – Simple API, average performance● Java default HTTP client – Complicated, bad

performance● Apache HttpClient – Complicated, bad performance

● AHC – Average API, good performance● Vert.x HTTP Client – Good API, good performance

● We went with vert.x it also provides a simple API and we already knew it.

Page 29: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

29

NATIVE SOCKETS AND TLS

Vertx/Netty allow the use of “epoll()” and “libssl.so”

● Switching to native TLS reduces the CPU load, especially over the long run.

● Switching to netty epoll() improves the throughput and takes load off the JVM.

Page 30: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

AUTOMATED SCALING TESTS

Page 31: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

31

DEFAULT SCENARIOScale. Wait. Stabilize. Repeat.

We wrote an automated test, to run the same scenario with different parameters.

1) Scale up producer2) Wait for a stable message fow3) If the fow is stable (error rate), continue with 1)4) Otherwise scale up Hono adapters until we hit a limit5) End the test when we reached the adapter limit

+1 PROD

ERR < 2%

+1 HTTP

END

HTTP < x

Page 32: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

32

DEFAULT SCENARIOResults.

Page 33: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

THINGS WE TRIED

Page 34: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

34

I/O THREADSComparing 1 vs 9 vert.x worker threads.

The same scenario was run, with a different thread confguration.

● Vert.x assigns “verticles” (services) to a worker thread.

● Is 1 JVM with 9 worker threads better than 9 JVMs with 1 worker thread?

2.00

04.

0006.

0008.

000

10.0

00

12.0

00

14.0

00

16.0

00

18.0

00

20.0

00

22.0

00

24.0

00

26.0

00

28.0

00

30.0

00

32.0

00

34.0

00

36.0

00

38.0

00

40.0

00

42.0

00

44.0

00

46.0

00

48.0

00

50.0

00

52.0

00

54.0

00

56.0

00

58.0

00

60.0

00

62.0

00

64.0

00

66.0

00

68.0

00

70.0

00

72.0

000

20

40

60

80

100

120

0

200

400

600

800

1.000

1.200

rtt (ms) – 1

msgs/s/thread – 1

rtt (ms) – 9

msgs/s/thread – 9

Page 35: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

35

1 vs 9 IO THREADS

2.00

04.

0006.

0008.

000

10.0

00

12.0

00

14.0

00

16.0

00

18.0

00

20.0

00

22.0

00

24.0

00

26.0

00

28.0

00

30.0

00

32.0

00

34.0

00

36.0

00

38.0

00

40.0

00

42.0

00

44.0

00

46.0

00

48.0

00

50.0

00

52.0

00

54.0

00

56.0

00

58.0

00

60.0

00

62.0

00

64.0

00

66.0

00

68.0

00

70.0

00

72.0

000

20

40

60

80

100

120

0

200

400

600

800

1.000

1.200

rtt (ms) – 1

msgs/s/thread – 1

rtt (ms) – 9

msgs/s/thread – 9

Page 36: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

36

OPENJ9 vs OPENJDKTesting with Eclipse OpenJ9 vs OpenJDK

The claim is, that OpenJ9 consumes less memory, having the same performance.

● Create a fabric S2I builder for OpenJ9● ctron/s2i-java-openj9

● Run the default scenario with it

● No real difference could be seen.● RAM never was a limiting factor.● But the performance was the same.

Page 37: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

SUMMARY

Page 38: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

38

YES, IT SCALES!

Page 39: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

39

YES, IT SCALES.So, what does that mean?

In the amount of resources we had available, we could fnd that:

1) In the default scenario, with 9 worker threads, adding an additional pod increased the cluster capacity by ~5.000 msgs/s.

2) We could repeat that until we hit out cluster limit.

Page 40: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

40

IS THAT GOOD ENOUGH?Can we do better?

There is always room for improvement.

● The request/response client had some issues with back-pressure handling (fxed in 0.7)

● There was a memory leak, building up stress on the JVM (fxed in 0.7)

● There was a performance issue in the AMQP 1.0 encoding (fxed in 0.8-M1)

● And we continue to improve it …

Page 41: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

41

THINGS TO CONSIDERDeploying a cloud IoT solution can be tricky.

Keep in mind:

● The device registry, which is not part of Hono, has a huge impact on the overall performance.

● Consumers also play a role in the overall performance. We simply threw away the payload. This may not be viable business case for you ;-)

Page 42: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

42

TRY THIS AT HOME

Everything is Open Source, you can do your own scale test.

● Eclipse Hono● eclipse/hono

● Scale Test Deployment● redhat-iot/hono-scale-test

● Hono Simulator● redhat-iot/hono-simulator

Page 43: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max

QUESTIONS?

Page 44: WE SCALED IT - EclipseCon 2020 · 2018. 11. 12. · Scale testing Eclipse Hono on OpenShift Dejan Bosanac, Jens Reimann Senior Software Engineer EclipseCon Europe 2018. 2 ... max