From Data Science to Production - deploy, scale, enjoy! / PyData Amsterdam - Mar 12, 2016

Sergii Khomenko, Data Scientist, sergii.khomenko@stylight.com, @lc0d3r

PyData Amsterdam - March 12, 2016

Sergii Khomenko

Data scientist at one of the biggest fashion communities, Stylight.

Data analysis and visualisation hobbyist who works on data problems not only at work but also in his free time, for fun and for personal data visualisations.

Originally from a computer engineering background.

Speaker at Berlin Buzzwords 2014, ApacheCon Europe 2014, Puppet Camp London 2015, Berlin Buzzwords 2015, Tableau Conference on Tour 2015, Budapest BI Forum 2015, Crunchsconf 2015, FOSDEM 2016.

Fellow DevOps

Quentin Nerden, Milos Radovanovic, Patrick Roelke

Profitable Leads: Stylight provides its partners with high-quality leads, enabling partner shops to leverage Stylight as an ROI-positive traffic channel.

Inspiration: Stylight offers shoppable inspiration that makes it easy to know what to buy and how to style it.

Branding & Reach: Stylight offers a unique opportunity for brands to reach an audience that is actively looking for style online.

Shopping: Stylight helps users search and shop fashion and lifestyle products smarter across hundreds of shops.

Stylight – Make Style Happen

Core Target Group

Stylight helps aspiring women between 18 and 35 evolve their style through shoppable inspiration.

Stylight – acting on a global scale

Experienced & Ambitious Team

An innovative cross-functional organisation with a flat hierarchy builds a unique team spirit.

• +200 employees
• 40 PhDs/Engineers
• 28 years average age
• 63% female
• 23 nationalities
• 0 suits

Data Scientist: Person who is better at statistics than any software engineer and better at software engineering than any statistician.

Agenda

• Early days of startups
• Software engineering
• Immutable infrastructure
• Serverless architecture

The Early Days of Startups

Problem definition:

• Many different technologies
• Hard to reproduce data science results
• Issues with backward compatibility
• Dependency hell
• Hard to scale products
• Hard to on-board new people

Software engineering

built circa 2015-16

Our stack

You are most likely doing it already

• Version control
• Cover code with tests
  - nosetests, pytest, unittest2
  - start small with doctests
  - try out TDD: rednose, nose-watch
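To make "start small with doctests" concrete, here is a minimal sketch. The function `click_through_rate` is a hypothetical example and not taken from the talk; the only library used is the standard `doctest` module, and Python 3 division semantics are assumed.

```python
import doctest


def click_through_rate(clicks, impressions):
    """Return clicks divided by impressions, or 0.0 when there are no impressions.

    The examples below are documentation and tests at the same time.

    >>> click_through_rate(5, 100)
    0.05
    >>> click_through_rate(0, 0)
    0.0
    """
    if impressions == 0:
        return 0.0
    return clicks / impressions


if __name__ == "__main__":
    # Run every example embedded in the docstrings of this module.
    print(doctest.testmod())
```

The same examples can later be collected by a test runner, so a doctest is a cheap first step before moving to a full test suite.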

You are most likely doing it already

• Cover code with tests
  - yes, even your R application could have tests: testthat, devtools
• Code reviews
• Pair programming

Some of the mentioned problems

• Many different technologies
• Issues with backward compatibility
• Dependency hell
• Hard to on-board new people

How it could help:

• Every technology has its own container
  - just docker run
• Every package with version defined in Dockerfile
  - have a base image for more advanced cases
• New people
  - just docker run
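One way to picture "every package with version defined in Dockerfile" is a small helper that renders the Dockerfile text from a dictionary of pinned versions. This is only a sketch: the helper `render_dockerfile`, the base image tag, the package versions and `app.py` are illustrative assumptions, not the setup shown in the talk.

```python
import json


def render_dockerfile(base_image, pinned_packages, command):
    """Build Dockerfile text that installs exact package versions with pip."""
    # Sort the pins so the generated file is stable and easy to review.
    pins = " ".join(
        "{}=={}".format(name, version)
        for name, version in sorted(pinned_packages.items())
    )
    lines = [
        "FROM {}".format(base_image),
        "RUN pip install {}".format(pins),
        # JSON array syntax is the exec form of CMD.
        "CMD {}".format(json.dumps(command)),
    ]
    return "\n".join(lines) + "\n"


# All values below are hypothetical examples.
dockerfile = render_dockerfile(
    base_image="python:3.5",
    pinned_packages={"pandas": "0.18.0", "numpy": "1.10.4"},
    command=["python", "app.py"],
)
print(dockerfile)
```

Because every version is written down, rebuilding the image later should give the same environment, which is what makes "just docker run" workable for new team members.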

r-base/Dockerfile

lc0/docker-shiny-server

Known issues

• Images could be really huge
• Try to skip anything you do not need
• Alpine Linux as a base image
  - 5 MB base image (musl libc and BusyBox)
• Iron.io has pre-built images based on Alpine
  - python, scala, java, elixir, etc.

[Image size comparison: 16 MB vs. 232 MB]

Some of the mentioned problems

• Hard to roll out
• Hard to maintain production dependencies

AWS ECR

CircleCI deployments

Immutable infrastructure

Infrastructure as Code

Need to upgrade? No problem. Build a new, upgraded system and throw the old one away. New app revision? Same thing. Build a server (or image) with a new revision and throw away the old ones.
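The idea in the quote can be sketched as a toy model in which servers are immutable values and a deployment returns a new fleet instead of patching the old one. `Server` and `replace_fleet` are hypothetical names for illustration; nothing here talks to a real cloud provider.

```python
from collections import namedtuple

# A server is described only by values fixed at build time.
Server = namedtuple("Server", ["name", "revision"])


def replace_fleet(old_fleet, new_revision):
    """Return a brand-new fleet for new_revision; old servers are never modified."""
    return [
        Server(name="app-{}-{}".format(new_revision, index), revision=new_revision)
        for index in range(len(old_fleet))
    ]


fleet_v1 = [Server("app-v1-0", "v1"), Server("app-v1-1", "v1")]
fleet_v2 = replace_fleet(fleet_v1, "v2")
# fleet_v1 is untouched and can be thrown away as a whole once v2 serves traffic.
print(fleet_v2)
```

Since nothing is upgraded in place, rolling back amounts to keeping or rebuilding the previous fleet.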

CloudFormation

cloudtools/troposphere
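Tools such as troposphere generate CloudFormation JSON from Python code. The sketch below shows the same idea with the standard library only, so it does not use or depict troposphere's API. The resource name `AppServer`, the `ImageId` parameter and the `t2.micro` instance type are assumptions chosen for illustration.

```python
import json


def build_template(instance_type="t2.micro"):
    """Return a CloudFormation template as a plain dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Single application server built from a versioned image",
        "Parameters": {
            # The machine image to launch is passed in when the stack is
            # created or updated.
            "ImageId": {"Type": "String"},
        },
        "Resources": {
            "AppServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": {"Ref": "ImageId"},
                    "InstanceType": instance_type,
                },
            },
        },
    }


template_json = json.dumps(build_template(), indent=2, sort_keys=True)
print(template_json)
```

Keeping the template in code means it can be reviewed, versioned and tested like the rest of the application.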

Terraform

Kubernetes and Docker {Swarm, Compose}

Serverless architecture

Possibilities

• all Lambdas in one place with version control
• integration tests with real events
• proper CI/CD setup
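As a rough sketch of testing a Lambda against an event, the handler below decodes records from a Kinesis-shaped event, and a small helper builds such an event locally. The choice of Kinesis as the event source, the payload fields and the helper `make_kinesis_event` are assumptions; in a real integration test the event would more likely be a captured production event replayed against the function.

```python
import base64
import json


def handler(event, context):
    """Decode the JSON payload of every Kinesis record in the event."""
    decoded = []
    for record in event["Records"]:
        # Kinesis delivers the record payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        decoded.append(json.loads(payload.decode("utf-8")))
    return {"count": len(decoded), "events": decoded}


def make_kinesis_event(payloads):
    """Build a test event shaped like the one Lambda receives from Kinesis."""
    records = []
    for payload in payloads:
        data = base64.b64encode(json.dumps(payload).encode("utf-8"))
        records.append(
            {"eventSource": "aws:kinesis", "kinesis": {"data": data.decode("ascii")}}
        )
    return {"Records": records}


# Hypothetical payload, used only to exercise the handler locally.
sample_event = make_kinesis_event([{"type": "click", "value": 3}])
result = handler(sample_event, None)
print(result)
```

Because the handler is a plain function, the same test can run in a CI pipeline before the function is deployed.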

CircleCI deployments
Cloud functions

Use-case of outlier detection

[Diagram: custom unification pipeline; Departments; Business Intelligence; internal processes; variety of event types and structures]

Outlier detection to Slack
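The slides do not say which detection method is used, so the sketch below swaps in a common one as an assumption: the modified z-score based on the median absolute deviation. It then builds the JSON body that a Slack incoming webhook accepts, without sending it. The metric name and the sample numbers are made up.

```python
import json
import statistics


def find_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # All deviations are (almost) identical; nothing can be flagged safely.
        return []
    # 0.6745 scales the MAD so it is comparable with a standard deviation.
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


def slack_payload(metric, outliers):
    """Build the JSON body for a Slack incoming webhook message."""
    return json.dumps({"text": "Outliers in {}: {}".format(metric, outliers)})


hourly_events = [100, 102, 98, 101, 99, 103, 97, 500]
outliers = find_outliers(hourly_events)
print(slack_payload("hourly_events", outliers))
```

In a serverless setup, a function like this could run on each batch of events and post the payload to a webhook URL only when the list of outliers is not empty.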

www.stylight.com

sergii.khomenko@stylight.com, @lc0d3r

Related links

1. Testing Your Code - The Hitchhiker's Guide to Python
2. https://hub.docker.com/_/r-base/
3. http://www.alpinelinux.org/
4. https://github.com/iron-io/dockers
5. Docker Hub: A new stack plus ecosystem partners automate developer workflows
6. Trash Your Servers and Burn Your Code: Immutable Infrastructure and Disposable Components
7. https://github.com/cloudtools/troposphere
8. CloudFormation UpdatePolicy Attribute
9. https://www.terraform.io/
10. (Docker Compose + Docker Swarm) or Kubernetes
11. Google Cloud Functions
12. https://github.com/apex/apex
13. Streaming Data Processing with Amazon Kinesis and AWS Lambda