Kakaocorp
From Unmanned Datacenter to Algorithmic Economy Using OpenStack
Andrew Yongjoon Kong
LTHlab
Andrew Yongjoon Kong
• Cloud Technical Advisory for Government Broadcast Agency
• Adjunct Prof., Ajou Univ.
• Korea Database Agency, Acting Professor for Big Data
• Member of National Information Agency Big Data Advisory Committee
• KT cloudware, Tech lead (ex)
• Kakao → Daum Kakao → Kakaocorp, Cloud Computing Cell lead
Supervised, Korean edition
Korean edition coming soon.
What is Cloud?
From Our Side
• Cloud == “Programmable Resource Management”
• What is programmable?
• What is a resource?
• What is management?
• NOP!
• Cloud is one of the ways of managing and deploying resources
• Basically, it’s culture.
• Tech can support this culture
• Our culture is “Automation”
Some Numbers
5xxx VMs are running.
We already revealed this last February at OpenStack Community Days, Korea.
Some information about Kakao OpenStack
OpenStack release: from Grizzly to Kilo
Regions: 3 in total
Additional services: Heat, Trove, Sahara
Unmanned Data Center
Self-Managed Computational Resource - Krane
• No dedicated staff at a front desk to take orders
• The API is open 24 x 7 (we try to)
• Using the OpenStack API is the user’s job.
• Maintaining the OpenStack cloud is our job.
• We do not control anything at all.
Success or Issue?
The unit is “krane” (virtual money). Just for fun, not actually charged.
Data from only one region.
Critical Volume vs Controlled Volume
When something grows past its critical volume, we have to change our point of view!
Controlled volume
Genesis:
- Krane was based on “left over or out-of-warranty resources”
- Some hypervisors (not VMs) had only 16 GB.
- Interconnect was only 1 Gbps
- It was only for “dev” stage services.
Exodus:
- more than 128 GB
- 10 Gbps
- SSD
It needs to have control
The easiest way:
- Making quotas like everyone does.
Cloud: we do have SDN, but no OpenFlow and nothing else
Diagram labels: compute node with eth0, nova-compute, neutron-linuxbridge-agent, neutron-dhcp-agent; Linux bridge with gateway 10.10.100.1; VM with IP 10.10.100.2/32; routing table entry 10.100.10.2/32 via 192.1.1.201; BGP; 192.1.1.202 (BGP); neutron-l3-agent; virtual router with two VLANs; service route table; management route table. Legend: virtual switch block, process block.
Practice Frugality to Boost Creativity
Why we want to rethink quota
Things that do not exist at Kakao:
• No live migration
• No H/A on compute nodes
• No mirroring of system disks
• No bonding for compute node networks
• No extra interface for service and storage
→ Technically, nothing extra for failure
→ What if a server goes down? Software will take care of it.
→ We recommend that users be ready for some failure.
→ We do have LB, volume and object storage for stability
Culture: Trust
We understand that our developers work in a harsh environment.
Adding quotas on top of this would make developers more stressed.
Culture 2: Commitment
So, we want our users to have the freedom to create resources.
But we also want our users to take responsibility for deleting unused resources.
We understand this is quite a tedious job.
So we decided to find unused VMs instead.
Simple and Difficult at the Same Time
It looks simple, but it is quite difficult to define what an “unused” vs a “low usage” resource is.
The Initial DC-Scale Garbage Collection
Anyway, we somehow have to define some guidelines for unused resources, and it should be done in an algorithmic way.
Diagram labels: CPU, load, traffic, login, IO, top process; analysis; notification; every-resource data model.
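One way the signals named on this slide (CPU, load, traffic, login, IO, top process) could be combined into an algorithmic guideline is sketched below. This is only an illustration, not Kakao's actual rule set: the field names, the thresholds, the list of baseline processes and the three verdict labels are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ACTIVE = "active"
    LOW_USAGE = "low usage"
    UNUSED = "unused"


@dataclass(frozen=True)
class VmSample:
    """Averages over an observation window (all field names are hypothetical)."""
    name: str
    cpu_percent: float      # mean CPU utilisation, 0-100
    load_avg: float         # mean 1-minute load average
    traffic_kbps: float     # mean network traffic, in plus out
    io_ops_per_sec: float   # mean disk operations per second
    logins: int             # interactive logins seen in the window
    top_process: str        # busiest process in the window


# Illustrative thresholds only; real values would need tuning per fleet.
IDLE_LIMITS = {
    "cpu_percent": 2.0,
    "load_avg": 0.05,
    "traffic_kbps": 1.0,
    "io_ops_per_sec": 1.0,
}
# Assumed set of processes that run on every image and say nothing about real use.
BASELINE_PROCESSES = {"systemd", "sshd", "cron"}


def idle_signals(sample: VmSample) -> int:
    """Count how many of the six signals say the VM is doing nothing."""
    count = sum(
        1 for field, limit in IDLE_LIMITS.items()
        if getattr(sample, field) < limit
    )
    count += int(sample.logins == 0)
    count += int(sample.top_process in BASELINE_PROCESSES)
    return count


def classify(sample: VmSample) -> Verdict:
    """All six signals idle means unused; four or five means low usage."""
    signals = idle_signals(sample)
    if signals == 6:
        return Verdict.UNUSED
    if signals >= 4:
        return Verdict.LOW_USAGE
    return Verdict.ACTIVE
```

Counting independent signals rather than thresholding a single metric is one way to keep a quiet but still-used VM (someone logs in, or it serves traffic) out of the “unused” bucket.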
Unified Data System
First of all, a unified metric store/retrieve system is needed to detect certain usage levels of computing resources.
• Has to gather and retrieve data in a unified way
• Has to cover all resources, from physical machines to virtual machines and network switches
• Has to interface with the Configuration Management Database
• Has to interface with the internal ERP
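The requirements above suggest a single record shape for every metric, enriched from the CMDB. A minimal sketch follows, assuming a hypothetical `Metric` schema and an in-memory dictionary standing in for the CMDB; none of these names come from the slides, and a real system would call the CMDB API instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One schema for every resource kind (hypothetical field names)."""
    resource_id: str     # key shared with the CMDB
    resource_type: str   # e.g. "physical", "virtual", "switch"
    name: str            # metric name, e.g. "cpu.percent"
    value: float
    timestamp: int       # Unix seconds
    owner: str           # taken from the CMDB, "unknown" if absent


# Stand-in for a CMDB lookup; the entries are invented for the example.
CMDB = {
    "host-01": {"type": "physical", "owner": "infra"},
    "vm-42": {"type": "virtual", "owner": "search-team"},
}


def normalize(resource_id: str, name: str, value: float, timestamp: int) -> Metric:
    """Attach CMDB context so every consumer sees the same fields."""
    entry = CMDB.get(resource_id, {})
    return Metric(
        resource_id=resource_id,
        resource_type=entry.get("type", "unknown"),
        name=name,
        value=float(value),
        timestamp=timestamp,
        owner=entry.get("owner", "unknown"),
    )


def to_line(metric: Metric) -> str:
    """Render a tagged text line of the kind time-series stores commonly accept."""
    return (
        f"{metric.name} {metric.timestamp} {metric.value} "
        f"resource={metric.resource_id} type={metric.resource_type} "
        f"owner={metric.owner}"
    )
```

Resources missing from the CMDB are kept rather than dropped, tagged as `unknown`, so that unregistered machines still show up in later analysis.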
Integrated Information Service Bus & EIP: Code Name Crow
Based on open source
Components:
• Kafka
• Samza
• Camel
• Storm
• Gobblin
• YARN
• HDFS
• etcd
• OpenTSDB
• HBase
• Tajo
• Grafana
Integrated Information Service Bus & EIP: Code Name Crow
Enterprise Integration
• Topic-based data ETL
• Can cover every computing resource (physical server, virtual instance, container, public cloud)
• Abstracts the “Data Center Information layer”
• Enables a deep engineering experience across all resources.
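The idea of topic-based ETL can be illustrated without any of the real components. The sketch below uses a small in-memory bus as a stand-in for a message broker; the class, the topic names `metrics.raw` and `metrics.normalized`, and the record fields are assumptions made for the example and do not describe how Crow is actually built.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class TopicBus:
    """In-memory stand-in for a topic-based message bus."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Deliver the message to every handler subscribed to this topic.
        for handler in list(self._handlers[topic]):
            handler(message)


def build_pipeline(bus: TopicBus, sink: List[dict]) -> None:
    """Wire extract -> transform -> load as two topic subscriptions."""

    def transform(raw: dict) -> None:
        # Transform: map a source-specific record onto shared field names
        # and republish it on the normalized topic.
        bus.publish("metrics.normalized", {
            "resource": raw["host"],
            "kind": raw.get("kind", "unknown"),
            "cpu_percent": float(raw["cpu"]),
        })

    def load(record: dict) -> None:
        # Load: append to the sink (a list here, a data store in practice).
        sink.append(record)

    bus.subscribe("metrics.raw", transform)
    bus.subscribe("metrics.normalized", load)
```

Because each stage only knows topic names, a new resource kind (a container, a public cloud instance) can be added by publishing to the raw topic without touching the downstream stages.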
Diagram labels: physical servers, virtual instances, containers, external clouds, others (switches, logs); monitoring; CROW; IMS (kakao CMDB API); SB; rule engine; notification; ETL; Data Center Information abstraction layer; API; predicting; scheduling; control; OpenStack Heat; other service API; data center (or service) management activity.
Result
We started out by finding unused resources
– We created a unified monitoring system
– It can replace the pre-existing system monitoring
– It can extract and analyze more information
– We ended up creating a brand-new resource information center
We try to target 10% as potential candidates. More than 40% of them were really “abandoned VMs”, left behind by structural changes (that is, not left on purpose).
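To put the two percentages together: if the 10% target is read as a share of all VMs (an assumption, the slide does not say), then a 40% hit rate among candidates corresponds to at least 4% of the fleet. The sketch below works this through for a hypothetical fleet of 1,000 VMs; the fleet size is invented for the example and is not the deck's figure.

```python
def reclaim_estimate(total_vms: int, candidate_pct: int = 10, confirmed_pct: int = 40) -> dict:
    """Estimate abandoned VMs from a candidate share and a confirmation rate.

    Integer percentages keep the arithmetic exact. The 40% figure is a lower
    bound in the source, so the result is a floor, not a point estimate.
    """
    candidates = total_vms * candidate_pct // 100
    confirmed = candidates * confirmed_pct // 100
    return {
        "candidates": candidates,
        "confirmed": confirmed,
        "fleet_share_pct": 100 * confirmed / total_vms,
    }


# Hypothetical fleet of 1,000 VMs: 100 candidates, at least 40 confirmed.
estimate = reclaim_estimate(1000)
```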