From Data Science to Production - deploy, scale, enjoy!
Sergii Khomenko, Data Scientist, [email protected], @lc0d3r
PyData Amsterdam - March 12, 2016

From Data Science to Production - deploy, scale, enjoy! / PyData Amsterdam - Mar 12, 2016

Sergii Khomenko

Data scientist at Stylight, one of the biggest fashion communities.

Data analysis and visualisation hobbyist who works on problems not only at work but also in his free time, for fun and for personal data visualisations.

Comes from a computer engineering background.

Speaker at Berlin Buzzwords 2014, ApacheCon Europe 2014, Puppet Camp London 2015, Berlin Buzzwords 2015, Tableau Conference on Tour 2015, Budapest BI Forum 2015, Crunchconf 2015, and FOSDEM 2016.

Fellow DevOps

Quentin Nerden, Milos Radovanovic, Patrick Roelke

Stylight – Make Style Happen

Core Target Group
Stylight helps aspiring women between 18 and 35 evolve their style through shoppable inspiration.

Profitable Leads
Stylight provides its partners with high-quality leads, enabling partner shops to use Stylight as an ROI-positive traffic channel.

Inspiration
Stylight offers shoppable inspiration that makes it easy to know what to buy and how to style it.

Branding & Reach
Stylight offers a unique opportunity for brands to reach an audience that is actively looking for style online.

Shopping
Stylight helps users search and shop fashion and lifestyle products smarter across hundreds of shops.

Stylight – acting on a global scale

Experienced & Ambitious Team

An innovative cross-functional organisation with a flat hierarchy builds a unique team spirit.

• 200+ employees
• 40 PhDs/Engineers
• 28 years average age
• 63% female
• 23 nationalities
• 0 suits

Data Scientist: Person who is better at statistics than any software engineer and better at software engineering than any statistician.

Agenda

• Early days of startups
• Software engineering
• Immutable infrastructure
• Serverless architecture

The Early Days of Startups

Problem definition:

• Many different technologies
• Hard to reproduce data science results
• Issues with backward compatibility
• Dependency hell
• Hard to scale products
• Hard to on-board new people

Software engineering

built circa 2015-16

Our stack

You are most likely doing it already

• Version control
• Cover code with tests
  • nosetests, pytest, unittest2
    - start small with doctests
    - try out TDD: rednose, nose-watch
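The "start small" advice can be sketched with Python's built-in `doctest` module. The function `normalize` below is a hypothetical example and not something shown in the talk; the examples in its docstring double as tests.

```python
import doctest


def normalize(values):
    """Scale a list of numbers to the range [0, 1].

    >>> normalize([2, 4, 6])
    [0.0, 0.5, 1.0]
    >>> normalize([5, 5])
    [0.0, 0.0]
    """
    low, high = min(values), max(values)
    if high == low:
        # Avoid division by zero when all values are equal.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


if __name__ == "__main__":
    # Run every example found in the docstrings of this module.
    print(doctest.testmod())
```

The same examples can also be run with `python -m doctest` on the file, or collected by pytest with its `--doctest-modules` option, which makes it easy to grow from docstring examples into a fuller test suite.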

You are most likely doing it already

• Cover code with tests
  • yes, even your R application could have tests
    - testthat
    - devtools
• Code reviews
• Pair programming

Some of the mentioned problems

• Many different technologies
• Issues with backward compatibility
• Dependency hell
• Hard to on-board new people

How it could help:

• Every technology has its own container
  - just docker run
• Every package has its version defined in the Dockerfile
  - have a base image for more advanced cases
• New people
  - just docker run

r-base/Dockerfile

lc0/docker-shiny-server

Known issues

• Images could be really huge
  • Try to skip anything you do not need
  • Alpine Linux as a base image
    - 5 MB base image (musl libc and BusyBox)
• Iron.io has pre-built images based on Alpine
  • python, scala, java, elixir, etc.

Image sizes compared on the slide: 16 MB and 232 MB

Some of the mentioned problems

• Hard to roll out
• Hard to maintain production dependencies

AWS ECR

CircleCI deployments

Immutable infrastructure

Infrastructure as Code

Need to upgrade? No problem. Build a new, upgraded system and throw the old one away. New app revision? Same thing. Build a server (or image) with a new revision and throw away the old ones.
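The principle can be illustrated with a toy model. Nothing here talks to a real cloud provider; `Server` and `roll_out` are hypothetical names used only to show that an upgrade produces new objects instead of changing existing ones.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Server:
    """An immutable description of a running server (or image)."""
    name: str
    app_revision: str


def roll_out(old_fleet, new_revision):
    """Return a brand-new fleet built for new_revision.

    The old servers are never modified; the caller simply discards them.
    """
    return tuple(replace(server, app_revision=new_revision) for server in old_fleet)


old_fleet = (Server("web-1", "v1"), Server("web-2", "v1"))
new_fleet = roll_out(old_fleet, "v2")

# In-place upgrades are rejected by the frozen dataclass.
try:
    old_fleet[0].app_revision = "v2"
    upgraded_in_place = True
except FrozenInstanceError:
    upgraded_in_place = False
```

Because the old fleet is left untouched until it is discarded, rolling back in this model is just a matter of keeping the previous tuple around.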

CloudFormation

cloudtools/troposphere
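CloudFormation templates are JSON (or YAML) documents, so they can be generated from Python; troposphere wraps this idea in typed classes. As a dependency-free sketch, the snippet below builds a minimal template with only the standard library. The helper `build_template` and the logical name `DataScienceBucket` are hypothetical, chosen for illustration, and are not taken from the talk.

```python
import json


def build_template(bucket_logical_name):
    """Return a minimal CloudFormation template as a Python dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative template with a single S3 bucket",
        "Resources": {
            # Logical name -> resource definition.
            bucket_logical_name: {"Type": "AWS::S3::Bucket"},
        },
        "Outputs": {
            # Ref on a bucket resolves to the bucket name.
            "BucketName": {"Value": {"Ref": bucket_logical_name}},
        },
    }


template = build_template("DataScienceBucket")
template_json = json.dumps(template, indent=2, sort_keys=True)
print(template_json)
```

Keeping a generator like this in version control means infrastructure changes can go through the same code review and tests as application code.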

Terraform

Kubernetes and Docker {Swarm, Compose}

Serverless architecture

Possibilities

• all Lambdas in one place with version control
• integration tests with real events
• proper CI/CD setup
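Testing a Lambda against realistic events might look like the sketch below: a Python handler with the `(event, context)` signature that AWS Lambda uses, exercised with a sample event shaped like a Kinesis record batch. The handler body, the helper `make_kinesis_event`, and the payload fields are assumptions for illustration; in practice the event would be captured from a real stream.

```python
import base64
import json


def handler(event, context):
    """Decode Kinesis records and return the parsed JSON payloads."""
    payloads = []
    for record in event.get("Records", []):
        # Kinesis delivers each record's data base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw.decode("utf-8")))
    return {"count": len(payloads), "payloads": payloads}


def make_kinesis_event(messages):
    """Build a test event shaped like a Kinesis batch (hypothetical helper)."""
    records = []
    for message in messages:
        encoded = base64.b64encode(json.dumps(message).encode("utf-8"))
        records.append({"kinesis": {"data": encoded.decode("ascii")}})
    return {"Records": records}


sample_event = make_kinesis_event([{"type": "click", "value": 3}])
result = handler(sample_event, None)
```

Because the handler is a plain function, the same call can run locally, in CI, and inside Lambda without changes.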

CircleCI deployments

Cloud functions

Use-case of outlier detection

[Diagram labels: internal processes; variety of event types and structures; custom unification pipeline; Departments; Business Intelligence]

Outlier detection to Slack
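One simple way to flag outliers and prepare a Slack notification is sketched below. The slides do not say which detection method was used; the median-absolute-deviation rule, the threshold of 3.5, the sample data, and the function names are all assumptions. The code only builds the JSON body that a Slack incoming webhook accepts (a `text` field) and does not send anything.

```python
import json
import statistics


def find_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    deviations = [abs(v - median) for v in values]
    mad = statistics.median(deviations)
    if mad == 0:
        # No spread around the median, so nothing can be scored.
        return []
    # 0.6745 rescales the MAD so scores are comparable to z-scores
    # for normally distributed data.
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


def slack_payload(metric, outliers):
    """Build the JSON body for a Slack incoming webhook message."""
    listed = ", ".join(str(v) for v in outliers)
    return json.dumps({"text": "Outliers detected in {}: {}".format(metric, listed)})


hourly_events = [100, 102, 98, 101, 99, 250]
outliers = find_outliers(hourly_events)
payload = slack_payload("hourly_events", outliers)
```

A median-based rule is used here because a single extreme value shifts the mean and standard deviation far more than it shifts the median.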

Related links

1. Testing Your Code - The Hitchhiker's Guide to Python
2. https://hub.docker.com/_/r-base/
3. http://www.alpinelinux.org/
4. https://github.com/iron-io/dockers
5. Docker Hub: A new stack plus ecosystem partners automate developer workflows
6. Trash Your Servers and Burn Your Code: Immutable Infrastructure and Disposable Components
7. https://github.com/cloudtools/troposphere
8. CloudFormation UpdatePolicy Attribute
9. https://www.terraform.io/
10. (Docker Compose + Docker Swarm) or Kubernetes
11. Google Cloud Functions
12. https://github.com/apex/apex
13. Streaming Data Processing with Amazon Kinesis and AWS Lambda
