Ansible at Scale Ansible Israel, May 9, 2016 David Melamed Senior Research Engineer, CTO Office, CloudLock [email protected] @dvdmelamed

Ansible at Scale - Meetup (files.meetup.com/17312132/Ansible IL - Ansible at scale.pdf)

Ansible at Scale

Ansible Israel, May 9, 2016

David Melamed

Senior Research Engineer, CTO Office, CloudLock

@dvdmelamed

Who is this guy?

Where is he working?

Founded: 2011

Corporate Headquarters: Waltham, Mass. (U.S.A.)

R&D Headquarters: Tel Aviv

Employees: 140 (30 in TLV)

Trusted by major brands:

10M USERS

157K APPS

4B ACTIVITIES

01 Ansible main notions

What is Ansible?

● Open-source configuration automation tool
● Written in Python and easily extensible
● Agentless (only requires SSH / WinRM)
● Idempotent modules
● Ad hoc task execution
● Reusable list of tasks
● Code deployment

Inventory

Static inventory

[webservers]
192.168.1.12
192.168.1.13
192.168.1.19

[daemonservers]
192.168.1.34
192.168.4.24

[vpc:children]
webservers
daemonservers

(Diagram labels: VPC, web servers, daemon servers, file servers, computing cluster)

Task, play & playbook

- name: check server is alive
  action: ping

- name: update app configuration
  action: copy src=myapp.conf dest=/etc/myapp/prod.conf

...

task: a single module call; play: a list of tasks applied to a group of hosts; playbook: a list of plays

Role

- tasks/
    main.yml
- handlers/
    main.yml
- templates/
    template.conf.j2
- files/
    file1.txt
- vars/
    main.yml

Vault

● Put all secrets in one place
● Store secrets (encrypted) in git

02 Our requirements

CloudLock requirements

● Multiple environments (AIO (all-in-one) vs. VPC, AWS vs. AppEngine)
● Multiple environment types (local / stage / prod)
● 10 different VPCs with different access levels
● VPCs with ~100 machines of several types
● Multiple small repos (Python packages) with dependencies
● Zero-downtime deployment as much as possible

Multiple stacks & environments

Stack: Web server (Angular app), API server (Flask app), Database (PostgreSQL or RDS), Cache server (Redis or ElastiCache), Message Queue (RabbitMQ)

Environment types: LOCAL, STAGE, PRE-PROD, PROD

Targets: my laptop (OSX), your laptop (Ubuntu), AIO in AWS, multi-tier env. in AWS

03 Ansible profiling

Profiling Ansible (2)

PLAY [Deploy | Ensure database and user] ***************************
Thursday 15 October 2015 09:51:01 +0000 (0:00:01.786) 0:00:12.318 ******
===============================================================================

TASK: [storage/postgresql-database | Create | Ensure database from database variable] ***
Thursday 15 October 2015 09:51:01 +0000 (0:00:00.011) 0:00:12.329 ******
ok: [sandbox]

TASK: [storage/postgresql-database | Create | Ensure database user from database.user variable] ***
Thursday 15 October 2015 09:51:01 +0000 (0:00:00.163) 0:00:12.493 ******
ok: [sandbox]

TASK: [storage/pgbouncer | Start pgBouncer] ***********************************
Thursday 15 October 2015 09:51:09 +0000 (0:00:00.242) 0:00:20.782 ******
ok: [sandbox]

TASK: [storage/pgbouncer | Bump file descriptor limits] ***********************
Thursday 15 October 2015 09:51:09 +0000 (0:00:00.177) 0:00:20.960 ******
changed: [sandbox] => (item=hard)
changed: [sandbox] => (item=soft)

...

PLAY RECAP ********************************************************************
module1 | Install | Ensure modules ------------------------------------- 13.14s
module2 | Install pgBouncer --------------------------------------------- 7.51s
module3 | Install | Clean/uninstall modules ----------------------------- 6.85s
module4 | Install | Ensure core installed ------------------------------ 4.66s
...
Thursday 15 October 2015 09:52:49 +0000 (0:00:00.023) 0:02:00.236 ******
===============================================================================
sandbox : ok=142 changed=82 unreachable=0 failed=0
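
The per-task timing lines in the recap are easy to post-process when looking for the slowest tasks. The sketch below is only an illustration: it assumes the "name, dashes, seconds" line shape shown on this slide, and the function names `parse_timings` and `slowest` are made up for the example rather than part of Ansible.

```python
import re

# Timing lines copied from the recap on this slide.
RECAP = """\
module1 | Install | Ensure modules ------------------------------------- 13.14s
module2 | Install pgBouncer --------------------------------------------- 7.51s
module3 | Install | Clean/uninstall modules ----------------------------- 6.85s
module4 | Install | Ensure core installed ------------------------------ 4.66s
"""

# Assumed line shape: "<task name> <run of dashes> <seconds>s"
TIMING_LINE = re.compile(r"^(?P<name>.+?)\s-{2,}\s(?P<seconds>\d+(?:\.\d+)?)s$")


def parse_timings(text):
    """Return (task name, seconds) pairs for every timing line in the text."""
    timings = []
    for line in text.splitlines():
        match = TIMING_LINE.match(line.strip())
        if match:
            timings.append((match.group("name"), float(match.group("seconds"))))
    return timings


def slowest(timings, count=3):
    """Return the `count` slowest tasks, slowest first."""
    return sorted(timings, key=lambda item: item[1], reverse=True)[:count]


if __name__ == "__main__":
    for name, seconds in slowest(parse_timings(RECAP)):
        print("%6.2fs  %s" % (seconds, name))
```

Lines that do not match the assumed shape (timestamps, separators, host summaries) are simply skipped, so the whole run output can be fed in unchanged.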

04 Tips for scale support

(faster & easier to maintain)

Factors impacting ansible speed

● SSH connection
● Facts gathering
● Tasks performed serially
● Redundant tasks

Improving SSH speed

● Persistent connection (on by default for SSH)
    ○ ControlMaster=auto
    ○ ControlPersist=60s

● SSH pipelining (1 connection per task)
    ○ Requires disabling requiretty in sudoers

Ansible configuration

● Commit your ansible.cfg
● Control facts gathering (gathering)
    ○ implicit (default): always gather facts, unless the play sets gather_facts: False
    ○ explicit: use the facts cache; facts are gathered only when the play sets gather_facts: True
    ○ smart: use the facts cache; gather facts only for hosts not yet in it

● Control the number of parallel processes (forks)
    ○ default is 5
    ○ we use 25

● SSH args / SSH pipelining
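
Since the ansible.cfg is committed, a small check can confirm that the tuning options from this slide are actually set. The following is a minimal sketch, not CloudLock's tooling: the sample file contents, the fact-cache path `/tmp/ansible_facts` and the helper name `load_settings` are assumptions made for illustration, while the option names themselves (`gathering`, `fact_caching`, `forks`, `ssh_args`, `pipelining`) are standard ansible.cfg settings.

```python
import configparser

# Example ansible.cfg contents. The fact-cache path is a hypothetical choice.
SAMPLE_CFG = """
[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
forks = 25

[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
pipelining = True
"""


def load_settings(text):
    """Read the speed-related options, falling back to Ansible's defaults."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        "gathering": parser.get("defaults", "gathering", fallback="implicit"),
        "forks": parser.getint("defaults", "forks", fallback=5),
        "pipelining": parser.getboolean(
            "ssh_connection", "pipelining", fallback=False
        ),
        "ssh_args": parser.get("ssh_connection", "ssh_args", fallback=""),
    }


if __name__ == "__main__":
    for option, value in sorted(load_settings(SAMPLE_CFG).items()):
        print("%-10s %s" % (option, value))
```

Run against the repository's own ansible.cfg (for example from a CI job), a check like this catches a reverted `forks` or `pipelining` setting before it slows deployments down again.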

Inventory

● Make your Ansible code environment-agnostic
● Group machines by environment or by “role” type
● Hierarchical inventory
● Vault per environment
● Dynamic inventory for better cloud support
● Use a dedicated machine to deploy (ansible-workstation)

CloudLock static inventory overview

inventory/
|
|---- environments
|     |----- allinone
|     |----- beta
|     |----- demo
|     |----- dev1
|     |----- dev2
|     |----- qa1
|     |----- qa2
|---- group_vars
      |----- allinone/
      |----- beta/
      |----- demo/
      |----- dev1/
      |----- dev2/
      |----- qa1/
      |----- qa2/

+ use of Route 53 for internal DNS

EC2 dynamic inventory

● Python script using boto
● List of instances + hostvars
● Use instance names or IPs
● Groups by instance tags, vpc, …
● List cached
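
The grouping step of such a script can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual EC2 inventory script: the boto query is replaced by hard-coded sample instances, and the IPs, VPC id, the `ec2_tags` hostvar and the function names are all hypothetical. What is standard is the output contract: Ansible runs an inventory script with `--list` and reads JSON containing groups and an optional `_meta.hostvars` section.

```python
import json
import re

# Hypothetical sample data standing in for the result of a boto EC2 query.
INSTANCES = [
    {
        "ip": "10.0.0.11",
        "vpc_id": "vpc-1a2b",
        "tags": {"Name": "prod-bastion", "Environment": "prod"},
    },
    {
        "ip": "10.0.0.12",
        "vpc_id": "vpc-1a2b",
        "tags": {"Name": "devpi", "Environment": "prod"},
    },
]


def to_safe(word):
    """Replace characters that are awkward in group names with underscores."""
    return re.sub(r"[^A-Za-z0-9_]", "_", word)


def build_inventory(instances):
    """Group hosts under 'ec2', per-VPC and per-tag groups, plus hostvars."""
    inventory = {"ec2": [], "_meta": {"hostvars": {}}}
    for instance in instances:
        host = instance["ip"]
        inventory["ec2"].append(host)
        vpc_group = to_safe("vpc_id_" + instance["vpc_id"])
        inventory.setdefault(vpc_group, []).append(host)
        for key, value in instance["tags"].items():
            tag_group = to_safe("tag_%s_%s" % (key, value))
            inventory.setdefault(tag_group, []).append(host)
        # 'ec2_tags' is an illustrative hostvar name, not a fixed convention.
        inventory["_meta"]["hostvars"][host] = {"ec2_tags": instance["tags"]}
    return inventory


if __name__ == "__main__":
    print(json.dumps(build_inventory(INSTANCES), indent=2, sort_keys=True))
```

Returning `_meta.hostvars` in the same response spares Ansible a separate `--host` call per machine, which matters once an inventory reaches about a hundred instances.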

"ec2": [
  "52….",
  "52….",
  "52…."
],
"tag_Environment_prod": [
  "52….",
  "52…..",
  "54….."
],
"tag_Name_prod_bastion": [
  "54…."
],
"tag_Name_Report_Decryptor": [
  "52….."
],
"tag_Name_devpi": [
  "52….."
]

Playbooks

● Tasks executed synchronously
    ○ Segment roles/groups to leverage parallel forks

● Use tags to add modularity (e.g. config, deploy…)
● Name each task
● Limit conditional execution in roles; put the conditions in the playbooks instead

Tasks & Roles

● Make your roles generic and simple
● Roles should be decoupled from the inventory
● Keep your configuration separate
● Tasks should be idempotent
● Use “include” for sub-roles
● Try to avoid redundant tasks (use an AMI)
● Share handlers with a global role
● Avoid command and shell; use the appropriate modules instead

- roles/

ci/

jenkins/

jobs/

monitor/

cloudwatch/

nagios/

platform/

base/

component-a/

component-b/

events/

setup/

teardown/

system/

web/

Vault

● Encrypt only what is necessary
● No way to merge 2 encrypted files
● Several tools to improve vault management
    ○ https://github.com/building5/ansible-vault-tools
    ○ https://gist.github.com/benzado/7bf5aa15e15d2d0d0380

ansible-playbook vs ansible-pull

● Regular (push) mode: connect to each server and deploy
● “Pull” mode: pull the repo onto the remote host and execute it there
● Syntax: ansible-pull -U git://github.com/REPO.git -d DEST_DIR
● Example of cron install using Ansible:
  https://github.com/ansible/ansible-examples/blob/master/language_features/ansible_pull.yml

CI for Ansible

● Test locally with Vagrant / Docker
● PR reviews (issue with vault changes)
● Jenkins job deploying to AIO + GitHub hook
● Coming soon: unit tests (ansible-kitchen)

Ansible 1.9 vs. Ansible 2.0

● Some breaking changes
● A lot of new cloud modules (e.g. ECS, VPC)

Results

● Before: deployment to VPC took several hours
● After: ~20 min for a full deployment

More about Ansible

● Awesome Ansible: https://github.com/jdauphant/awesome-ansible
● Ansible for DevOps: https://leanpub.com/ansible-for-devops

CloudLock is looking for talent

Questions/feedback