Move Over IBM WebSeal and F5 BigIP, Here Comes NGINX 09/23/2015

Nginx conference 2015

Page 1: Nginx conference 2015

Move Over IBM WebSeal and F5 BigIP, Here Comes NGINX

09/23/2015

Page 2: Nginx conference 2015

#nginx #nginxconf2

Bart Warmerdam

Advisory IT Specialist at ING Bank N.V.

Page 3: Nginx conference 2015

Who is ING globally

Page 4: Nginx conference 2015

Who is ING in the Netherlands

Page 5: Nginx conference 2015

History up to 2.5 years ago within ING

• Bank with diverse software and hardware landscape
• Cost driven IT
• Traditional software development: design, build, test, implement
• Software strategy: buy before build
• Middleware strategy: buy
• Hardware strategy: appliance

Page 6: Nginx conference 2015

From 2.5 years ago up to now

• Bank with diverse software and hardware landscape
• IT and Time-to-Market is important
• 60 scrum teams internally working on software
• Software strategy: build before buy (a lot of time)
• Middleware strategy: buy but…
• Hardware strategy: standard scalable stacks

Page 7: Nginx conference 2015

Complex IT landscape

Task: simplify IT

Add missing functionality

Page 8: Nginx conference 2015

Infrastructure to replace

• Internet facing reverse proxies (IBM TAM WebSeal)
    • Authenticating proxy
    • Content caching and compression
    • Cookie jar functionality
• Multiple layers of load balancers (F5 BigIP)
    • Over data centers
    • Over nodes in different network zones

For all internet facing domains of domestic banking Netherlands

Page 9: Nginx conference 2015

The Plan to Simplify

• Investigate open source software: NGINX or Apache vs IBM WebSeal / F5
• Perform a proof of concept with NGINX for Authentication and Event Publishing
• Write a report for the deciding architects, which concluded after the proof of concept:
    • Replace IBM TAM WebSeal with NGINX using custom modules
    • Integrate the layers of F5 BigIPs with NGINX

The result: “GO!” Now we are more in control than ever.

Page 10: Nginx conference 2015

Starting with

[Architecture diagram; recoverable labels: Tier 1 (DMZ); Tier 2; Load balancer (F5); WebSeal (IBM); Load balancer (F5); Load balancer (F5); External Authentication Interface; Policy Mgr; LDAP; Application (×3); Inter Connectivity Cloud (between DCs)]

Page 11: Nginx conference 2015

Working towards

[Architecture diagram; recoverable labels: Tier 1 (DMZ); Tier 2; Load balancer (F5); NGINX; NGINX; External Authentication Interface; Application (×3); Inter Connectivity Cloud (between DCs)]

Page 20: Nginx conference 2015

Control in…

Functionality
• Integrate Authentication and Event Messaging module from PoC
• Add missing cookie jar functionality
• Add load balancing persistency over data centers

Time-to-Market
• Add dynamic service discovery so teams can self-service end points
• Integrate existing (Java) Continuous Delivery Pipeline

Operational Monitoring
• Monitor system resource usages and errors to Graphite
• Add Grafana dashboards and Mobile alerts for team dashboards
• Monitor and report upstream errors to Tivoli Omnibus (MCR)
• Make performance data and reports available to all scrum teams

Page 21: Nginx conference 2015

Roll-out planning

• First step: Integrate into the Continuous Delivery Pipeline
    • From GIT to production
• Second step: Add additional functionality to NGINX
• Future roadmap of the NGINX authenticating proxy environment

Page 22: Nginx conference 2015

First step: integrate in continuous delivery pipeline

• Using standard open source tools like: Git, Jenkins, Maven, Nexus, Docker, Valgrind, Python
• And closed source tools like: Nolio (deployments), Fortify (static source code analysis)

Page 23: Nginx conference 2015

GIT repository

Page 24: Nginx conference 2015

Commits on “develop” trigger a build in Jenkins
Using an Apache Maven build profile

Page 25: Nginx conference 2015

Which builds the project modules

Page 26: Nginx conference 2015

By packaging all own modules
And add nginx.org source from our Nexus repository
And 3rd party source modules from our Nexus repository
As a tar.gz file

Page 27: Nginx conference 2015

And add the RedHat .spec file

Page 28: Nginx conference 2015

To start a Docker build in a CentOS image
Which results in an RPM

Page 29: Nginx conference 2015

If all Python tests succeed on the binary

Page 30: Nginx conference 2015

If all integration test scripts ran successfully
All product acceptance scripts ran successfully

Page 31: Nginx conference 2015

And all module tests succeed as well

Page 32: Nginx conference 2015

Using a Python test framework
To easily create test cases for the binary and modules
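
A test case in such a framework might look like the minimal sketch below. It is not the ING framework: it uses only the Python standard library, and a stand-in HTTP server plays the role of the NGINX binary under test. The names `StubProxy`, `fetch`, `ProxySmokeTest` and the `X-Proxy` header are hypothetical.

```python
import http.client
import http.server
import threading
import unittest


class StubProxy(http.server.BaseHTTPRequestHandler):
    """Stand-in for the binary under test (an assumption of this sketch)."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.send_header("X-Proxy", "stub")  # hypothetical header
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet


def fetch(port, path="/"):
    """Return (status, X-Proxy header, body) for a GET against localhost."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("X-Proxy"), resp.read()
    finally:
        conn.close()


class ProxySmokeTest(unittest.TestCase):
    def setUp(self):
        # Port 0 lets the operating system pick a free port.
        self.server = http.server.HTTPServer(("127.0.0.1", 0), StubProxy)
        self.port = self.server.server_address[1]
        threading.Thread(target=self.server.serve_forever, daemon=True).start()

    def tearDown(self):
        self.server.shutdown()
        self.server.server_close()

    def test_root_returns_200(self):
        status, proxy_header, body = fetch(self.port)
        self.assertEqual(status, 200)
        self.assertEqual(proxy_header, "stub")
        self.assertEqual(body, b"ok")
```

In a real pipeline the `setUp` step would start the freshly built binary with a test configuration instead of the stub server; the assertions would stay the same shape.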

Page 33: Nginx conference 2015

The RPMs and test results are uploaded to a Nexus Repository
Together with Nolio deployment scripts
After which Jenkins triggers an automatic Nolio deployment in LCM

Page 34: Nginx conference 2015

Each commit in “develop” also starts a Jenkins job that
Triggers the Valgrind tests on all modules
And emails the results on failures

Page 35: Nginx conference 2015

Each commit in “develop” also starts a nightly Jenkins job that
Starts a Fortify scan for static source code analysis
On all own modules, NGINX code and all 3rd party modules used

Page 36: Nginx conference 2015

Releases on “master” trigger a build in Jenkins
Using Apache Maven release profile
Where versioned artifacts are uploaded to Nexus

Page 37: Nginx conference 2015

Configuration releases on “master” trigger a build in Jenkins
Where the correct nginx.conf and site information are created

Page 38: Nginx conference 2015

And SQL is used to create a list of URL endpoints
And their module directives

Page 39: Nginx conference 2015

Using a Maven plugin to create the correct configuration files
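
The idea of generating configuration from a queried endpoint list can be sketched as follows. This is an illustration in Python with an in-memory SQLite database, not the Maven plugin from the talk; the `endpoints` table, its columns and the upstream names are hypothetical. `proxy_pass` and `gzip` are standard NGINX directives.

```python
import sqlite3


def render_locations(conn):
    """Render one nginx `location` block per row of the endpoints table."""
    rows = conn.execute(
        "SELECT uri, upstream, directive FROM endpoints ORDER BY uri"
    )
    blocks = []
    for uri, upstream, directive in rows:
        lines = [f"location {uri} {{", f"    proxy_pass http://{upstream};"]
        if directive:
            # Optional extra module directive stored alongside the endpoint.
            lines.append(f"    {directive};")
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE endpoints (uri TEXT, upstream TEXT, directive TEXT)")
conn.executemany(
    "INSERT INTO endpoints VALUES (?, ?, ?)",
    [
        ("/api/", "api_backend", "gzip on"),
        ("/static/", "static_backend", None),
    ],
)
print(render_locations(conn))
```

Keeping the endpoint list in a table means a new URL is a data change rather than a hand edit of nginx.conf, which is what makes the generated files testable in the pipeline.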

Page 40: Nginx conference 2015

Using Docker to build an RPM and test all generated configurations

Page 41: Nginx conference 2015

So it can be automatically deployed in Nolio in LCM by Jenkins

Page 42: Nginx conference 2015

The result…

• LCM DEV + TST environment for internal team tests
• DEV + TST for integration tests for all other teams
• ACC for pre-production tests
    • Daily load tests using LoadRunner and performance reports using Python, LaTeX and gnuplot
    • Weekly resilience tests
    • Unplanned Simian Army tests
    • Run “perf” tests for NGINX profiling (if a change requires it)
    • Penetration and security tests
• Multiple PRD environments in different data centers
    • Replaced all IBM WebSeal reverse proxies with NGINX
    • Starting to replace all F5 BigIP internal load balancers with NGINX load balancer module

Page 43: Nginx conference 2015

Optimizing the result

• Using “perf” we analyzed the binary under load (~500 URI/sec)

Entries in the “perf” output, by rank:
• Numbers 1, 3, 8 and 11 are GZIP compression
• Number 2 is memset => hard to pinpoint since generic use
• Number 4 is network driver => cannot change
• Number 5 is cookie header parsing, triggered by our code
• Number 6 is OS
• Number 7 is Kafka CRC32 code
• Number 9 is memcpy => hard to pinpoint since generic use
• Number 10 is caused by the audit system => cannot change
• Number 20 is the first of our own methods listed

Page 44: Nginx conference 2015

Include optimized libraries

• GZIP is expensive on the CPU, use optimized libraries when possible
• Use static linking when replacing the patched library cannot be done on target machine
• Two patches available, from Intel and Cloudflare

Compression level 5

Source: https://www.snellman.net/blog/archive/2014-08-04-comparison-of-intel-and-cloudflare-zlib-patches.html
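
The CPU cost of compression can be made visible with a small measurement, sketched below. It uses whatever zlib the local Python is linked against, so it does not exercise the Intel or Cloudflare patches; the payload is synthetic and the `measure` helper is a hypothetical name. Timings depend on the machine, so none are quoted here.

```python
import time
import zlib


def measure(data, level):
    """Return (compressed_size, seconds) for one zlib compression pass."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Round-trip check, done outside the timed section.
    assert zlib.decompress(compressed) == data
    return len(compressed), elapsed


# Synthetic, HTML-like payload; real traffic will behave differently.
payload = b"<div class='row'><span>account</span><span>balance</span></div>\n" * 20000

for level in (1, 5, 9):
    size, seconds = measure(payload, level)
    print(f"level {level}: {size} bytes in {seconds * 1000:.1f} ms")
```

Running the same loop against a build linked to a patched zlib is one way to compare libraries before deciding on static linking.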

Page 45: Nginx conference 2015

Patching libraries for performance

• Some libraries are not available on the target machine (Kafka, MaxMind, Protobuf)
• Some libraries are too old on target machine (PCRE3 – for JIT)
• CPU optimized versions are added in the Docker image and statically linked

Page 46: Nginx conference 2015

Second step: Add additional functionality to NGINX

• Our five most important home-made modules
    • Cookie jar module – store Set-Cookie operations in reverse proxy
    • WebSeal module – Authentication module based on External Authentication Interface (EAI)
    • Kafka module – Send Event Messages from proxy layer to other systems
    • Load balancing – Rule based upstream use, allow dynamic service discovery
    • Monitoring module – Monitor application use and system resource usage

Page 47: Nginx conference 2015

Cookie jar module

• Uses two levels of RB Trees to store state
• Highly configurable
• Use timers for automatic expiration and cleanup
• Use shared memory to share state between workers
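
The two-level lookup with expiration can be illustrated with the sketch below. It is a simplification under stated assumptions: Python dictionaries stand in for the red-black trees, an explicit `expire` call stands in for the NGINX timers, and there is no shared memory between workers. The class and method names are hypothetical, as is the choice of session id and cookie name as the two key levels.

```python
import time


class CookieJar:
    """Two-level store: session id -> cookie name -> (value, expiry time)."""

    def __init__(self, ttl_seconds=300.0):
        self.ttl = ttl_seconds
        self.sessions = {}

    def store(self, session_id, name, value, now=None):
        """Keep a cookie set by an upstream at the proxy."""
        now = time.monotonic() if now is None else now
        self.sessions.setdefault(session_id, {})[name] = (value, now + self.ttl)

    def cookie_header(self, session_id, now=None):
        """Build a Cookie header value from the unexpired cookies of a session."""
        now = time.monotonic() if now is None else now
        cookies = self.sessions.get(session_id, {})
        live = [f"{n}={v}" for n, (v, exp) in sorted(cookies.items()) if exp > now]
        return "; ".join(live)

    def expire(self, now=None):
        """Drop expired cookies and empty sessions; a timer would call this."""
        now = time.monotonic() if now is None else now
        for sid in list(self.sessions):
            cookies = self.sessions[sid]
            for name in [n for n, (_, exp) in cookies.items() if exp <= now]:
                del cookies[name]
            if not cookies:
                del self.sessions[sid]
```

The optional `now` argument exists only so the expiry logic can be tested without waiting for real time to pass.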

Page 48: Nginx conference 2015

WebSeal module

• Uses an RB Tree to store session state
• Allows access on different policies (fine or coarse grained)
• Use timers for automatic expiration and cleanup
• Use shared memory to share state between workers
• Implement the EAI interface to allow gradual migration

Page 49: Nginx conference 2015

Event Publishing (Kafka) module

• Publish Events for monitoring and error analysis
• Highly configurable using a separate JSON config file
• Fast and asynchronous to avoid processing overhead
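
The point of publishing asynchronously is that the request path never waits on the event system. The sketch below shows that pattern in generic form with a bounded queue and a background thread; it is not how the NGINX module is built (NGINX is event driven), and it does not use a Kafka client. `EventPublisher` and the `send` callable are hypothetical.

```python
import queue
import threading


class EventPublisher:
    """Queue events in the request path; deliver them on a background thread."""

    def __init__(self, send, max_pending=1000):
        self._send = send  # callable that delivers one event (assumption)
        self._queue = queue.Queue(maxsize=max_pending)
        self.dropped = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def publish(self, event):
        """Never block the caller: drop the event if the queue is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _run(self):
        while True:
            event = self._queue.get()
            if event is None:  # sentinel: stop the worker
                break
            self._send(event)

    def close(self):
        """Deliver what is still queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

Dropping on overflow is a design choice of this sketch: it trades completeness of the event stream for a bounded cost per request.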

Page 50: Nginx conference 2015

Load balancing module

• Use specific upstream servers based on rules (e.g. confidence test)
• Allow static load balancing over data centers for stateful applications
• Allow TCP connection re-use, using pools
• Integration with monitoring module to allow monitoring via MCR
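
Rule based selection combined with static persistence over data centers can be sketched as below. Everything concrete here is an assumption made for illustration: the addresses, the `X-Confidence-Test` header used as the rule, and the use of a SHA-256 hash of the session id to pin a session to one data center.

```python
import hashlib

# Hypothetical topology, for illustration only.
DATA_CENTERS = {
    "dc1": ["10.0.1.10:8080", "10.0.1.11:8080"],
    "dc2": ["10.0.2.10:8080", "10.0.2.11:8080"],
}
CONFIDENCE_UPSTREAM = "10.0.9.10:8080"


def _bucket(key, size):
    """Map a string to a stable index in range(size)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


def pick_upstream(session_id, headers):
    """Choose an upstream server for a request."""
    # Rule: requests flagged as confidence tests go to a dedicated upstream.
    if headers.get("X-Confidence-Test") == "1":
        return CONFIDENCE_UPSTREAM
    # Static persistence: the same session always maps to the same data center.
    dc_names = sorted(DATA_CENTERS)
    servers = DATA_CENTERS[dc_names[_bucket(session_id, len(dc_names))]]
    return servers[_bucket(session_id + ":node", len(servers))]
```

Because the mapping depends only on the session id, every proxy instance reaches the same decision without sharing state, which is what a stateful application behind it needs.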

Page 51: Nginx conference 2015

Monitoring module

• Read variables from other modules to monitor
• Create and expose variables with system resources to monitor
• Use UDP or TCP to transfer monitor data to Graphite
• Integration with Tivoli Omnibus to allow monitoring via MCR
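
Sending a metric to Graphite over UDP can be sketched as follows, using Graphite's plaintext protocol (one `path value timestamp` line per metric, port 2003 by default). The metric path, host and helper names are hypothetical, and this is a standalone Python illustration rather than the NGINX module itself.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_udp(lines, host="127.0.0.1", port=2003):
    """Send metric lines in a single UDP datagram; returns the bytes sent."""
    payload = "".join(lines).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return len(payload)
```

A call such as `send_udp([graphite_line("proxy.site1.upstream_errors", 3)])` is fire and forget: UDP keeps the sender cheap, at the price of losing metrics silently if the receiver is down.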

Page 52: Nginx conference 2015

Monitoring example

Page 53: Nginx conference 2015

Future roadmap of the NGINX authenticating proxy environment

• Add WAF modules
• Fully implement dynamic service discovery to dynamically add/remove URIs and upstream servers
• Implement cross datacenter persistency for cookie jar

Page 54: Nginx conference 2015

Lessons learned so far…

• Remove manual work in development and testing ASAP
• NGINX has a lot of configuration optimization possibilities
    • TCP Socket/TCP options, caching, connection re-use, JIT, Threads, upstream zone, buffer settings, timeouts
• In own modules
    • Use Shared Memory for Session State (if needed), RB Trees, Thread pools, Timers and the event queue
    • Use atomic reference counters over shared mutex locks if possible
    • Use variables to pass data between modules
• In NGINX modules
    • Compression on content is CPU expensive!
    • Cookie lookups in modules are potentially CPU expensive
    • CRC32 is potentially CPU expensive
    • If using symmetric crypto, use types supported by the CPU (AES-NI), like AES GCM/CTR

Page 55: Nginx conference 2015

Lessons learned so far…

• Older stacks require more work to fully use all configurations
    • Recompiled new GCC C-compiler for strong stack protector and CPU optimization options
    • Recompiled libz and static link for latest version and add Intel performance patches
    • Recompiled libpcre and static link for latest version for JIT, and use CPU optimize flags
    • Recompiled other libs which are not present in RHEL and use CPU optimize flags
• Make monitoring highly configurable per site and fine-tune over time
• Use good monitoring dashboards
    • Combination of Graphite and Grafana works very well
    • Test which log data in error.log is required for good root-cause-analysis if an error occurs
• Take enough time to test
    • Performance tests under stress load with tools like “perf” give a lot of insight
    • Invest enough time in resilience tests and what key data is needed to monitor your system
    • All code which involves shared memory, locks, timers and configuration reloads takes more time to get right

Page 56: Nginx conference 2015

Lessons learned so far…

And… NGINX is very fast, very efficiently coded and extremely fun to program for!

Page 57: Nginx conference 2015

And...

Questions?

E-mail: [email protected]

Page 58: Nginx conference 2015

Disclaimer

The opinions expressed in this publication are based on information gathered by ING and on sources that ING deems reliable. This data has been processed with care in our analyses. Neither ING nor employees of the bank can be held liable for any inaccuracies in this publication. No rights can be derived from the information given. ING accepts no liability whatsoever for the content of the publication or for information offered on or via the sites. Author rights and data protection rights apply to this publication. Nothing in this publication may be reproduced, distributed or published without explicit mention of ING as the source of this information. The user of this information is obliged to abide by ING's instructions relating to the use of this information. Dutch law applies.

www.ing.com