CONFIDENTIAL | FOR INTERNAL USE ONLY
OpenShift at Point72: A view into our journey
About Point72
Who is Point72
Values
Ethics and Integrity: We are professionals who conduct ourselves ethically and with integrity at all times.
Firm First: We operate as one firm, dedicated to succeeding together, with mutual respect and commitment.
Innovation & Excellence: We are not satisfied with the status quo and are committed to pursuing innovation and excellence.
Growth and Development: We work together to advance our professional and personal development.
Community: We are exemplary citizens of the world and contribute to the communities in which we live and work.
Point72 Asset Management, L.P. is a family office investment company. We manage the assets of our founder Steve Cohen and eligible employees. The firm operates with approximately $11B AUM. Our primary office is located in Stamford, CT, with offices in London, Hong Kong, Tokyo, and Singapore. We have a world-class team of approximately 1,000 employees.
Mission: To be the industry's premier asset management firm by delivering superior risk-adjusted returns, adhering to the highest ethical standards, and offering the greatest opportunities to the industry's brightest talent.
Who we are
Billy Shaw (@360linux), Director of Systems Engineering at Point72 Asset Management, has worked with Linux and Unix for the last 24 years. Since 2004 he has been the primary person responsible for Linux services at Point72, growing it from a handful of servers to the majority server operating system at the firm. Prior to Point72, Billy worked for JPMorgan Chase, Travelocity, Organic Inc, and was a Cryptologic Technician in the United States Navy.
Dan Foley (@djfoley0), Systems Engineer at Agio, has worked with Linux and Unix for the past 7 years. In 2013 he joined Agio's Linux support and implementation team, working on projects to deploy new software and support client environments. Since March 2016 he has been a dedicated resource for the Point72 Linux Team, working on several projects including the deployment and development of the OpenShift Enterprise environment.
How we got started with OpenShift
As we started working on the next-generation platform for our trade processing
applications, we did a lot of planning. As a team we came up with principles to guide us
on our journey to a microservices architecture.
• Open Source
• Cloud First
• Reactive
• CI/CD pipelines
• Elastic Scale
• Resilient
• Evolvable
• Secure
• Everything is streaming
• Everything is distributed
• Big things from small things
• Test Driven Development
• Move code not data
• Anything can fail
• Documentation
Principles for development
Open Source: As a team we promote the use of open source software tools and technologies. It is important to us that we contribute back to the open source projects we use.
Cloud First: Always deliver new services and workloads using a cloud platform for IaaS.
Reactive: Reactive systems rely on asynchronous message-passing to establish a boundary between components that ensures loose coupling, isolation, and location transparency.
CI/CD: Development processes are automated wherever possible.
Principles for development (continued)
Elastic Scale: Systems respond to workload requirements, expanding and contracting based on business needs.
Resilient: Systems stay responsive, meeting SLAs under varying workloads.
Evolvable: Applications and systems are decoupled, and we are not locked to specific providers at any layer in the stack.
Secure: Security is not optional and is applied appropriately to every layer of the CI/CD pipelines.
Principles for development (continued)
Everything is streaming: The services to the business cannot pause or stop. It is far easier to treat a batch as a stream process than to treat a stream as a batch process.
Everything is distributed: Presume code will run on multiple pods simultaneously across multiple servers.
Big things from small things: Focus on the details, delivering components in small deliverable cycles following agile principles.
Test Driven Development: Follow test-driven development principles to maintain quality by having full test coverage of all modules.
Principles for development (continued)
Move code not data: Code is usually smaller than data. With a distributed, containerized infrastructure we are able to ensure the code is moved, not the data.
Anything can fail: Hardware will fail and software will crash. We embrace that in our design principles; with distributed, orchestrated containers and streaming it becomes easier to achieve.
Documentation: Everything we do is clearly written up as comments in code, README markdown in git, and wiki pages with appropriate supporting details.
Monitoring: All vital functions are monitored, managed, and logged in a consistent manner across the entire platform.
OpenShift adoption timeline (June 2016 to March 2017)
• PaaS POCs start
• Decide to use Origin
• OpenShift Enterprise purchased
• OCP installed with Red Hat consulting
• POC for microservices starts
• Sprints start for MVP
• Jenkins deployed
• CloudForms deployed
• EFK stack deployed
• MVP complete
OpenShift deployment strategy
Early on we knew that we wanted to empower our developers to have as much control as possible over their deployments into OpenShift. To facilitate that, we put together a technique using our existing resources in a new way.
All git repositories contain an OpenShift template in JSON format maintained as part of the application repository. Each template has a corresponding answer file with definitions of values which can be overridden whenever a developer needs to (any valid OpenShift configuration can be specified, and a build will fail if it is incorrect).
At build time the code is compiled as needed and merged with our docker images from our docker registry; a new image is created containing the application and artifacts, deployed via an API call into OpenShift.
As we focused on OpenShift we started with a strong deployment strategy from day one.
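A minimal sketch of that flow with standard oc commands, assuming illustrative file paths (newer oc clients accept --param-file; older 3.x clients take repeated -v key=value flags instead):

```shell
# render the repo's OpenShift template with the answer file, then apply the
# resulting objects through the API
oc process -f openshift/template.json --param-file=openshift/answers.env \
  | oc apply -f -
```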
Reference architecture
• All connectivity is done with dedicated encrypted circuits.
• Using node selectors during a deployment, applications are deployed only to the nodes approved for that type of work.
• A multi-master design provides HA for scheduling and API calls.
• Router sharding is used to provide isolated environments for Sandbox, DEV, QA, and UAT.
• Deployments can also guarantee compute resources for applications with well-defined SLAs.
Monitoring and Reporting
Tools for understanding the state of OpenShift
• Cloudforms
• EFK
• Prometheus
• Grafana
CloudForms
Requirements:
• Chargeback or showback reports
• Support for multiple cloud providers
• Provisioning capabilities
• Consolidated view of cloud resources
• Consolidated view of OpenShift resources
Implementation: CloudForms is installed and running inside OpenShift in its own namespace. We have had success using it with both Origin and OCP. We have been able to use it to gain insight into resources across our cloud providers and give an "elevator pitch" view of resources and cost.
Example of our implementation
EFK
Requirements:
• UI for generating dashboards
• Easy expansion
• Fast queries
• Does not run alongside the applications being monitored
• Access control
• Can accept API queries
• Data can come from sources other than OpenShift
Implementation: We set up EFK outside of the cluster, using a fluentd daemonset inside OpenShift to get messages out. This allows us to keep up with the frequent Elastic.co release cycle and install any plugins we require for any component of the stack.
Example of our implementation
Prometheus
Requirements:
• Open Source
• Pull metrics from nodes, containers, and applications
• Create custom metrics for specific applications
• Export data
• Provide a live data feed for external applications via a RESTful API
• Pull metrics from Jenkins
Implementation: Prometheus is deployed in a custom container, which gives us more control over the application in our environment. Currently it is deployed within OpenShift, pulling metrics from Hawkular. Our microservices in OpenShift also provide data for Prometheus. We do not expose Prometheus data to users; instead we expose the data via Grafana.
Implementation example
Grafana
Requirements:
• Open Source
• Handle multiple data feeds
• Active Directory integration
• User access control
• Export graphs and data
• Ability to create custom views and dashboards
Implementation: Grafana is deployed in a container we built. To reach the data feeds while keeping Prometheus inaccessible from outside, we removed the Prometheus route and use the internal DNS name: prometheus.devops.svc.cluster.local. In Grafana we use multiple data feeds and are able to control user access, create custom dashboards and queries, and export data for analysis.
Example of our implementation
Troubleshooting
Techniques and tools we use for addressing issues
• Cluster
• Routes
• Services
• Pods
• Docker
• Networking
• Persistent Storage
• Backups
Cluster
Scenario
We use node selectors as part of our deployments, and we have had to change cloud instance types to match the business workloads running on OpenShift. We found there were times when node labels were no longer applied after the instance type was changed.
How we handled it
After some investigation we found all we had to do
was add the label back to the node after the instance
type change.
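Re-applying the label is a one-liner; a sketch with an illustrative node name and label key:

```shell
# put the label the deployments' node selectors expect back on the node
oc label node node2.example.com computenode=true --overwrite
oc get nodes --show-labels    # confirm the selector target is back
```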
Cluster recovery
Scenario
In order to reduce cost we turn off the POC, and now also the DEV, environment when not in use.
This also allowed us to evaluate, over a long period of time, how OpenShift behaves when servers "just go away" and "come back".
How we handled it
The servers were stopped and started using cloud provider APIs.
Over the course of 9 months we saw only a small number of issues, each of which was straightforward to resolve.
Cluster node reports ‘NotReady’
Scenario
At some point you will encounter a node which
reports a status of NotReady.
How we handled it
Start with 'oc describe node <nodename>'. It provides details about the node including pods, their status, allocated resources, and events. We review these and check the node's health if anything looks amiss.
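A sketch of those triage steps, with an illustrative node name (on OCP 3.x hosts the node service is atomic-openshift-node):

```shell
oc get nodes
oc describe node node3.example.com
# then, on the node itself, check the node service and docker:
systemctl status atomic-openshift-node docker
journalctl -u atomic-openshift-node --since today
```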
Routes
Scenario
When adding or removing routers we would see
inconsistent results when requests would come in for
routes.
How we handled it
This was an easy one to fix but took a little while to diagnose; we spent a long time looking for the issue inside OpenShift.
Eventually we realized that the subdomain entries for the OpenShift environment on our DNS servers had not been updated.
Routes and DNS
Scenario
It is important to know how SkyDNS resolves names in order to keep specific connections within OpenShift without having to use pod IP addresses, which can and will change over time.
How we handled it
SkyDNS internal name formats:
Default: <pod_namespace>.cluster.local
Services: <service>.<project / namespace>.svc.cluster.local
Endpoints: <name>.<namespace>.endpoints.cluster.local
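A quick way to confirm one of these names resolves from inside a pod, with illustrative pod and service names:

```shell
# resolve a service name from within a running pod; no extra tools needed
# beyond getent, which most base images carry
oc rsh mypod-1-abcde getent hosts prometheus.devops.svc.cluster.local
```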
Services
Scenario
We found that applications expose ports other than http or https, and developers require access to the applications on those ports.
Examples include crate.io, ZooKeeper, Kafka, and various custom applications.
How we handled it
In order to resolve this issue we took advantage of node ports.
Node ports operate by reserving a port on each instance. The port range available is 30000 to 32000.
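A minimal sketch of a NodePort service for one of those workloads, assuming an illustrative Kafka selector and port (the nodePort value must fall inside the configured range):

```shell
oc create -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: kafka-external    # illustrative name
spec:
  type: NodePort
  selector:
    app: kafka            # illustrative selector
  ports:
  - port: 9092
    targetPort: 9092
    nodePort: 30092       # must be within the configured node port range
EOF
```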
Pods
Scenario
We wanted to ensure anti-affinity for pods so they would spread evenly across nodes. Prior to 3.4, node anti-affinity was an issue, so we created a workaround using multiple labels for groups of servers.
How we handled it
Example: say we have 3 pods we want to run on high-compute nodes, but they must be evenly distributed. To do this you can have 3 or 6 nodes, split into subgroups. Give each node a group label, computenodeg1 / g2 / g3, and tag one or more compute nodes in the different groups, while also keeping a main label of "computenode".
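A sketch of the labeling, with illustrative node names (the group labels follow the computenodeg1/g2/g3 scheme above):

```shell
oc label node node1.example.com computenode=true computenodeg1=true
oc label node node2.example.com computenode=true computenodeg2=true
oc label node node3.example.com computenode=true computenodeg3=true
# each of the 3 deployments then sets a nodeSelector on its own group label,
# e.g. computenodeg1=true, so the pods land on different nodes
```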
Networking
Scenario
We often need to troubleshoot network connectivity at many different layers, but do not want to install tools like ping, netcat, nmap, curl, wget, etc. everywhere (especially in the containers which run our pods).
How we handled it
Using bash's built-in /dev/tcp pseudo-device we can open a TCP connection on any port we need for testing (the same works for /dev/udp).
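A sketch of a reusable check built on that, wrapped in a timeout so filtered ports fail fast instead of hanging (host and port below are illustrative):

```shell
# open (and immediately discard) a TCP connection using bash's /dev/tcp;
# "timeout" keeps us from hanging on filtered ports
check_tcp() {
    host=$1
    port=$2
    if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "open"
    else
        echo "closed"
    fi
}

check_tcp 127.0.0.1 1    # port 1 is almost certainly closed
```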
Docker registry
Scenario
While it is great that OpenShift can act as our primary docker registry, we wanted to be able to leverage our images across multiple OpenShift installations and standalone docker daemons, and to ensure we had full control over the images coming into the firm.
How we handled it
Using the docker registry in Artifactory we ensured
that all our /etc/sysconfig/docker files were set up to
use our registry as the primary source for images.
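A sketch of the corresponding /etc/sysconfig/docker entries on RHEL's docker build (the registry hostname is illustrative):

```shell
# prefer the internal Artifactory registry and block direct docker.io pulls
ADD_REGISTRY='--add-registry artifactory.example.com:5000'
BLOCK_REGISTRY='--block-registry docker.io'
```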
Docker networking
Scenario
From vanilla docker on RHEL 7 or CentOS 7 up through Origin and OCP, the IP range used by the default docker configuration conflicted with network ranges used at Point72.
How we handled it
Since we currently use 172.17.0.0/16 for production network segments, we had to make sure that all docker configurations were set up to use something else. We opted to ensure that all docker settings (/etc/sysconfig/docker) use a default network in the 192.168.0.0/16 CIDR block.
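A sketch of the /etc/sysconfig/docker option that moves the docker bridge off 172.17.0.0/16 (the exact subnet below is illustrative):

```shell
# --bip sets the docker0 bridge address/CIDR used for container IPs
OPTIONS='--selinux-enabled --bip=192.168.5.1/24'
```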
Docker
Scenario
We found that over time the docker daemon would hold onto images which no longer had a repository associated with them. Over time this would fill the volume allocated to docker, preventing deployments from running correctly.
How we handled it
We started to regularly go through and clean up the
docker images which were no longer used.
If for some reason an image was removed by
mistake on the next deployment it would be pulled
from our internal docker registry.
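The cleanup can be scripted as a one-liner over dangling images; a sketch:

```shell
# remove images with no repository/tag; anything removed by mistake is
# simply re-pulled from the internal registry on the next deployment
docker rmi $(docker images --filter dangling=true -q) 2>/dev/null || true
```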
Persistent Storage
Scenario
Running out of space on your EBS persistent
storage. It needs to be expanded.
How we handled it
Expanding an EBS volume can be done by creating a
snapshot, then creating a new larger volume from the
snapshot.
The next step is updating the PV object in OpenShift with the new volume ID and size. Finally, the tricky part: start the pod so the volume mounts to a node, then log into that node and run xfs_growfs (or the appropriate command for your filesystem).
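A sketch of the snapshot-and-replace flow with the AWS CLI (IDs, size, and zone are illustrative):

```shell
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
    --description "pre-expansion snapshot"
aws ec2 create-volume --snapshot-id snap-0123456789abcdef0 \
    --size 100 --availability-zone us-east-1a --volume-type gp2
# update the PV object with the new volume ID and size, start the pod so the
# volume mounts, then on that node:
xfs_growfs /path/where/the/volume/mounted   # or your filesystem's equivalent
```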
Persistent Storage
Scenario
Prior to v3.4, using the AWS API for storage management was in technical preview. We found issues with persistent volumes detaching and attaching correctly when pods came up on different nodes.
How we handled it
To address the issue we created our own API to interface between OpenShift and AWS. The API requires the volume size and name, along with a valid authorization token from one of the masters, as input. It creates the EBS volume in AWS in the correct zone and returns the volume ID; with the volume ID, the API then creates the Persistent Volume object in OpenShift using the name, size, and returned volume ID.
Backups
Scenario
Our ability to recover from failure was an important requirement.
Backups had to occur daily and be easy to use for restores.
How we handled it
To address this we looked at each part of the cluster
and came up with techniques to store the
configuration on a daily basis.
Our backup system would pick up the files during the
scheduled backups on each master and node.
tar cf ${BACKUPDIR}/certs-and-keys-$(hostname).tar *.key *.crt

etcdctl backup --data-dir $ETCD_DATA_DIR --backup-dir ${BACKUPDIR}/etcd-$(hostname).bak

oc login -u <user> -p <passwd>
oc get projects -o name
oc export dc <project> --as-template=projectBackup -o json > yourProjectTemplate.json
oc get rolebindings --export=true --as-template=roleBindingsBackup -o json > yourRolebindingsTemplate.json
oc get serviceaccount --as-template=serviceaccountBackup -o json > yourServiceaccountTemplate.json
oc get secrets --as-template=secretsBackup -o json > yourSecretsTemplate.json
oc get pvc --as-template=pvcBackup -o json > yourPVCsTemplate.json
OpenShift and Docker best practices
Guidelines and practices we adhere to
Our docker best practices
Reuse images: New images should be based off an existing image. Use the FROM statement in your Dockerfile; it ensures that updates to the upstream image are available in your new image.
Maintain compatibility within tags: If you tag your image as p72image:v1, then stick with that for updates to the image. If an update is no longer compatible with p72image:v1, move to v2.
Limit services running in containers: Keep your containers as simple as possible. Include only what is necessary for the container to do the work it needs to do. THINK LIGHTWEIGHT.
Use exec in wrapper scripts: Always use exec. Keep in mind that a Docker container's main process runs as PID 1, so when the exec'd process dies, the container (and pod) dies with it.
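A minimal sketch of such a wrapper, written to /tmp here only so it can be exercised standalone (the setup step is illustrative):

```shell
# entrypoint.sh: do setup, then exec the real command so it replaces the
# shell and runs as the container's PID 1, receiving signals directly
cat > /tmp/entrypoint.sh <<'EOF'
#!/bin/bash
set -e
export APP_HOME=/opt/myapp    # illustrative setup step
exec "$@"
EOF
chmod +x /tmp/entrypoint.sh

RESULT=$(/tmp/entrypoint.sh echo "running as PID 1")
echo "$RESULT"
```

In a Dockerfile this becomes ENTRYPOINT ["/entrypoint.sh"] with the real command supplied via CMD.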
Our docker best practices
Remove temporary files: Always remove temporary files to keep bloat out of your image. For example, if you install RPMs via yum, put the yum commands plus a 'yum clean all' on the same line (otherwise each command becomes its own layer).
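A sketch of that single-layer pattern, with an illustrative base image and package names:

```dockerfile
FROM registry.access.redhat.com/rhel7
# install and clean in one RUN so the yum cache never lands in a layer
RUN yum install -y httpd mod_ssl && \
    yum clean all && \
    rm -rf /var/cache/yum
```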
Order your docker instructions properly: If your Dockerfile starts getting complex, think it through. Docker processes instructions from top to bottom, so put the steps least likely to change at the top. Through layer caching, subsequent builds will be faster.
Always expose important ports: Expose only what is needed. Pay attention to the software you run; if a port is not needed by your application, it is not important, so don't expose it.
Set environment variables: Using the ENV instruction, set your environment variables. Always include the version of your project, making it easy for others to know which version of your code is running.
OpenShift best practices
Be ready for any user id: Containers in pods will run with an arbitrary user id; it is a small measure of security. If you need known permissions, use a default group of root:
RUN chgrp -R 0 /some/directory
RUN chmod -R g+rw /some/directory
RUN find /some/directory -type d -exec chmod g+x {} +
Use services: Communication between pods must use services. The service provides a static endpoint for access and will not change as pods come up and down.
More environment variables: Include environment variables in your OpenShift deployments. Control groups can be used too; for example, the Java HEAP size can be derived dynamically from the cgroup memory values.
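One way to sketch that derivation in an entrypoint script, assuming the cgroup v1 limit path and a 50% heap ratio (both illustrative):

```shell
# derive the JVM max heap from the container's cgroup memory limit,
# falling back to a default when no limit file is present
LIMIT_FILE=/sys/fs/cgroup/memory/memory.limit_in_bytes
DEFAULT_BYTES=$((512 * 1024 * 1024))
MEM_BYTES=$(cat "$LIMIT_FILE" 2>/dev/null || echo "$DEFAULT_BYTES")
HEAP_MB=$(( MEM_BYTES / 1024 / 1024 / 2 ))   # give the JVM half the limit
JAVA_OPTS="-Xmx${HEAP_MB}m"
echo "$JAVA_OPTS"
# a real entrypoint would then: exec java $JAVA_OPTS -jar /app/app.jar
```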
Use image metadata: Metadata will assist everyone using the deployments and images you create. The spirit is to ensure enough information is available for others down the road to know what was intended.
OpenShift best practices
Clustering: There will be applications which need to be clustered (think ZooKeeper). Pod IPs will change over time, and the underlying clustered application needs to be able to handle this in its election process.
Logging: Send all logging to STDOUT. This data is collected by OpenShift and sent via a fluentd forwarder to the EFK stack.
Liveness and readiness probes: A liveness probe is a simple way to check whether your container is still running and restart it based on policy. A readiness probe checks whether a pod is ready to service requests; if it fails, the endpoint controller removes the pod from the service.
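Probes can be attached to an existing deployment with 'oc set probe'; a sketch with illustrative names, port, and endpoints:

```shell
oc set probe dc/myapp --liveness \
    --get-url=http://:8080/healthz --initial-delay-seconds=30
oc set probe dc/myapp --readiness --get-url=http://:8080/ready
```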
Templates: Always think of deployments as templates. This is how all our deployments are done, from an application's git repository.
Thank you! Q&A time.