Kakaocorp From unmanned Datacenter To Algorithmic Economy using Openstack Andrew Yongjoon Kong [email protected] LTHlab

Cloud: From Unmanned Data Center to Algorithmic Economy using Openstack


Andrew Yongjoon Kong

• Cloud Technical Advisory for Government Broadcast Agency
• Adjunct Prof., Ajou Univ.
• Korea Database Agency, Acting Professor for Big Data
• Member of National Information Agency Big Data Advisory Committee
• KT cloudware Tech lead (ex)
• Kakao → Daum Kakao → Kakaocorp, Cloud Computing Cell lead

Supervised, Korean edition

Korean edition coming soon.

Our vision.

Our culture.

Trust, Conflicts, Commitment

What is Cloud?

From Our Side

• Cloud == “Programmable Resource Management”
• What is Programmable?
• What is Resource?
• What is Management?
• NOP!
• Cloud is one of the ways of managing/deploying resources
• Basically, it’s culture.
• Tech. can support this culture
• Our culture is “Automation”

Some Numbers

5xxx VMs are running.

We already revealed this last February at OpenStack Community Days, Korea.

Some Numbers

964 tenants

455 pull requests since 2014.9

136 VMs are created/deleted per day

Some information about kakao OpenStack

OpenStack releases: from Grizzly to Kilo

3 regions in total

Additional services: Heat/Trove/Sahara

Unmanned Data Center

Self Managed Computational Resource - Krane

• No dedicated human resource at the front desk for taking orders
• The API is open 24 x 7 (we try to)
• Using the OpenStack API is the user's job.
• Maintaining the OpenStack cloud is our job.
• We do not control anything at all.
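Because using the OpenStack API is the user's job, ordering a VM comes down to an HTTP request rather than a ticket. The sketch below only builds the JSON body for the Nova "create server" call (`POST /servers`); the server name and the image, flavor and network identifiers are placeholders, not values from Kakao's cloud.

```python
import json


def build_create_server_body(name, image_ref, flavor_ref, network_id):
    """Build the JSON body for Nova's "create server" call (POST /servers)."""
    return {
        "server": {
            "name": name,
            "imageRef": image_ref,
            "flavorRef": flavor_ref,
            "networks": [{"uuid": network_id}],
        }
    }


# All identifiers below are hypothetical placeholders.
body = build_create_server_body(
    name="dev-web-01",
    image_ref="IMAGE_UUID",
    flavor_ref="FLAVOR_ID",
    network_id="NETWORK_UUID",
)
print(json.dumps(body, indent=2))
```

Sending this body still requires a Keystone token and the region's Compute endpoint, which are left out here.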

Success or Issue?

Unit is “krane” (virtual money). Just for fun, not actually charged.

From only one region.

Critical Volume vs Controlled Volume

When something grows beyond its critical volume, we have to change our point of view!

Cloud: we do adopt DevOps culture: KField

Controlled volume

Genesis:
- Krane was based on “left-over or out-of-warranty resources”
- Some hypervisors (not VMs) had only 16 GB.
- Interconnect was only 1 Gbps
- It was only for “dev” stage services.

Exodus:
- more than 128 GB
- 10 Gbps
- SSD

It needs to have control

The easiest way:
- Making quota like everyone does.
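For contrast, the quota approach amounts to an admission check at creation time. This is a minimal sketch of such a check; the limits, field names and numbers are made up for illustration and do not describe any real tenant.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    """Hypothetical per-tenant limits; the numbers are illustrative only."""
    max_instances: int
    max_vcpus: int
    max_ram_gb: int


@dataclass
class Usage:
    instances: int = 0
    vcpus: int = 0
    ram_gb: int = 0


def can_create(quota: Quota, usage: Usage, vcpus: int, ram_gb: int) -> bool:
    """Return True if one more VM of the given size fits within the quota."""
    return (
        usage.instances + 1 <= quota.max_instances
        and usage.vcpus + vcpus <= quota.max_vcpus
        and usage.ram_gb + ram_gb <= quota.max_ram_gb
    )


quota = Quota(max_instances=10, max_vcpus=40, max_ram_gb=128)
usage = Usage(instances=9, vcpus=36, ram_gb=100)
small_ok = can_create(quota, usage, vcpus=4, ram_gb=16)   # fits exactly
large_ok = can_create(quota, usage, vcpus=8, ram_gb=16)   # exceeds vCPU limit
print(small_ok, large_ok)
```

The check is cheap to build, but it rejects requests up front regardless of whether the tenant's existing VMs are actually in use.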

Cloud: we do have SDN, but no OpenFlow and no L2 network either

Cloud: we do have SDN, but no OpenFlow and nothing else

[Diagram: compute node networking. Recoverable labels: compute node with eth0, nova-compute, neutron-linuxbridge-agent and neutron-dhcp-agent; Linux bridge; VM with IP 10.10.100.2/32 and gateway 10.10.100.1; routing table entry “10.100.10.2/32 via 192.1.1.201”; BGP, 192.1.1.202; Neutron-l3-agent; VLANs; virtual router with a service route table and a management route table. Legend: virtual switch block, process block.]
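The diagram shows a /32 host route per VM being exchanged over BGP instead of a shared L2 segment. A minimal sketch of why that works: a router picks the most specific matching route, so the host route always wins over broader ones. The host route is copied from the routing-table label on the slide; the default route and its next hop are hypothetical.

```python
import ipaddress

# Host route copied from the routing-table label on the slide; the default
# route and its next hop are hypothetical, added so the lookup has a fallback.
ROUTES = [
    (ipaddress.ip_network("10.100.10.2/32"), "192.1.1.201"),
    (ipaddress.ip_network("0.0.0.0/0"), "192.1.1.1"),
]


def next_hop(destination: str) -> str:
    """Longest-prefix match: the most specific matching route wins."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    best_net, best_hop = max(matches, key=lambda item: item[0].prefixlen)
    return best_hop


print(next_hop("10.100.10.2"))  # matched by the /32 host route
print(next_hop("10.100.10.3"))  # falls through to the default route
```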

Practice Frugality to Boost Creativity

Why we want to rethink quota

Things that do not exist at Kakao:

• No live migration
• No H/A on compute nodes
• No mirroring of the system disk
• No bonding for the compute node network
• No extra interface for service and storage

→ Technically, nothing extra is in place for failure
→ What if a server goes down? Software will take care of it.
→ We recommend that users be ready for some failures.
→ We do have LB, volume and object storage for stability
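"Software will take care of it" means the caller, or the load balancer in front of it, moves on when a node disappears. A minimal sketch of that idea, assuming a service that runs on several VMs; the host names and the `fake_fetch` stand-in are hypothetical.

```python
from typing import Callable, Sequence


def call_with_failover(replicas: Sequence[str], fetch: Callable[[str], str]) -> str:
    """Try each replica in order and return the first successful response."""
    errors = []
    for host in replicas:
        try:
            return fetch(host)
        except ConnectionError as exc:
            errors.append(f"{host}: {exc}")
    raise ConnectionError("all replicas failed: " + "; ".join(errors))


def fake_fetch(host: str) -> str:
    """Stand-in for a real request; pretends vm-a has died."""
    if host == "vm-a":
        raise ConnectionError("host unreachable")
    return f"ok from {host}"


result = call_with_failover(["vm-a", "vm-b"], fake_fetch)
print(result)
```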

Culture: Trust

We understand that our developers work in a harsh environment.

Adding quotas on top of that would only make them more stressed.

Culture 2: Commitment

So, we want our users to have the freedom to create resources.

But we also want our users to take responsibility for deleting unused resources.

We understand this is quite a tedious job.

So, we decided to find the unused VMs ourselves instead.

Simple and Difficult at the Same Time

It looks simple, but it is quite difficult to define what an “unused” versus a “low usage” resource is.

The Initial DC-scale Garbage Collection

Anyway, we somehow have to define some guidelines for unused resources, and it should be done in an algorithmic way.

[Diagram labels: CPU, Load, Traffic, Login, IO, Top process; every-resource data model; Analysis; Notification.]
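The signals named above can be combined into a simple rule. The sketch below is one possible way to separate "unused" from "low usage"; the thresholds, the observation window and the field names are all assumptions, since the talk does not give the real guideline, and the "top process" signal is left out.

```python
from dataclasses import dataclass


@dataclass
class VmSample:
    """Aggregated signals for one VM over an observation window."""
    name: str
    cpu_pct: float            # average CPU utilisation
    load_avg: float           # average load
    traffic_kbps: float       # average network traffic
    io_ops: float             # average disk IO operations per second
    days_since_login: int     # days since the last interactive login


# Hypothetical thresholds; the talk does not give the real ones.
IDLE = {"cpu_pct": 1.0, "load_avg": 0.05, "traffic_kbps": 1.0, "io_ops": 1.0}
LOGIN_DAYS = 30


def classify(vm: VmSample) -> str:
    """Label a VM as "unused", "low usage" or "in use"."""
    idle_signals = sum(getattr(vm, key) < limit for key, limit in IDLE.items())
    no_login = vm.days_since_login > LOGIN_DAYS
    if idle_signals == len(IDLE) and no_login:
        return "unused"
    if idle_signals >= 2:
        return "low usage"
    return "in use"


samples = [
    VmSample("old-test", 0.2, 0.01, 0.1, 0.0, 120),
    VmSample("batch", 0.5, 0.02, 50.0, 20.0, 3),
    VmSample("web", 35.0, 1.2, 800.0, 150.0, 1),
]
for vm in samples:
    print(vm.name, classify(vm))
```

Requiring every signal to be idle and no recent login keeps the "unused" label conservative, which matters when the result triggers a notification to the owner.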

Unified Data System

First of all, a unified system for storing and retrieving metrics is needed to detect the usage levels of computing resources.

• Has to gather and retrieve data in a unified way
• Has to cover all resources, from physical machines to virtual machines and network switches
• Has to interface with the Configuration Management Database
• Has to interface with the internal ERP
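One way to meet the "unified way" requirement is to give every data point the same shape regardless of its source. The sketch below assumes a name/value/timestamp/tags record; the metric names, tag keys and helper functions are hypothetical, not the actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Metric:
    """One data point in a shape shared by every resource type."""
    name: str                 # e.g. "cpu.util"
    value: float
    timestamp: int            # Unix seconds
    tags: Dict[str, str] = field(default_factory=dict)


def from_vm(instance_id: str, cpu_pct: float, ts: int) -> Metric:
    """Normalise a VM CPU reading."""
    return Metric("cpu.util", cpu_pct, ts, {"kind": "vm", "id": instance_id})


def from_switch(port: str, rx_bps: float, ts: int) -> Metric:
    """Normalise a switch port traffic reading."""
    return Metric("net.rx_bps", rx_bps, ts, {"kind": "switch", "id": port})


points = [
    from_vm("vm-123", 2.5, 1_700_000_000),
    from_switch("sw1/eth3", 9.1e6, 1_700_000_000),
]
for p in points:
    print(p.name, p.value, p.tags["kind"])
```

With the resource identity kept in tags, the same record can be joined against CMDB or ERP data by ID.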

Integrated Information Service Bus & EIP: Code Name Crow

Based on open source

Components:
• Kafka
• Samza
• Camel
• Storm
• Gobblin
• Yarn
• HDFS
• Etcd
• OpenTSDB
• HBase
• Tajo
• Grafana

Enterprise Integration

• Topic-based data ETL
• Can cover every computing resource (physical server, virtual instance, container, public cloud)
• Abstracts a “Data Center Information layer”
• Enables a deep engineering experience across all resources

[Diagram: Crow architecture. Sources: physical servers, virtual instances, containers, external clouds, others (switches, logs). Crow blocks: monitoring, IMS (kakao CMDB API), SB, rule engine, notification, ETL. Together these are labelled “Data Center Information abstraction layer”, with an API for predicting, scheduling and control. Targets: OpenStack Heat, other service APIs, data center (or service) management activity.]
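The combination of topic-based routing, a rule engine and notification can be sketched in a few lines. The `Bus` class below is an in-memory stand-in for a real message bus, and the topic names, the rule and its threshold are hypothetical; it illustrates the pattern only, not Crow's implementation.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class Bus:
    """In-memory stand-in for a topic-based message bus."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(message)


bus = Bus()
notifications: List[str] = []


def idle_rule(message: dict) -> None:
    """Hypothetical rule: forward VMs with near-zero CPU to the notifier."""
    if message["cpu_pct"] < 1.0:
        bus.publish("notify", message)


bus.subscribe("metrics.vm", idle_rule)
bus.subscribe("notify", lambda m: notifications.append(f"{m['id']} looks idle"))

bus.publish("metrics.vm", {"id": "vm-1", "cpu_pct": 0.3})
bus.publish("metrics.vm", {"id": "vm-2", "cpu_pct": 42.0})
print(notifications)
```

Because producers and rules only share topic names, a new resource type or a new rule can be added without touching the existing ones.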

Result

We started out by looking for unused resources:

– We created a unified monitoring system
– It can replace the pre-existing system monitoring
– It can extract/analyze more information
– We ended up creating a brand new resource information center

We targeted 10% as potential candidates. More than 40% of them were real “abandoned VMs”, left behind by structural changes (that is, not left on purpose).

Result: Add another controlled volume subsystem

Q&A

P.S. We’re hiring, always!

http://www.kakaocorp.com/recruit