Copyright © 2012 NTT DATA Corporation
15/Oct/2012 NTT DATA INTELLILINK
Motonobu Ichimura @famao
Inter-cloud object storage: Colony
Copyright © 2012 NTT DATA INTELLILINK Corporation
EtherPad: http://etherpad.openstack.org/grizzly-colony
Agenda
• What is Colony?
  – Our goal
  – Use cases
• How to make Swift network-aware (or region-aware)
  – Problems with the original Swift code
  – Our modification
  – Investigation
  – Conclusion
• Future plan
  – Problems to tackle (and being tackled)
  – Collaboration
What is Colony?
Goal: academic community cloud
[Figure: university clouds (Univ.-A, Univ.-B, …, Univ.-X) connected over the Science Information Network form an academic community cloud offering education clouds, research clouds, and intercloud services]
Intercloud object storage service
[Figure: several clouds, each running Nova, Glance, and a Swift for local use, together with Swifts for intercloud use]
Colony federates cloud object storage services, such as Swift, to achieve an intercloud object storage service.
Inter-cloud object storage service: Colony
[Figure (users' points of view): Cloud-A's Swift-A holds local containers A1–A3 and Cloud-B's Swift-B holds local containers B1–B3, while the geographically distributed Swift-I holds inter-cloud containers I1, I2, I3, I4, I10, and I13. A Cloud-A user sees containers A1–A3 plus inter-cloud containers I1 and I4; a Cloud-B user sees B1–B3 plus I1 and I8.]
How Colony achieves the federation
[Figure: the Colony host runs Apache (mod_wsgi, mod_shib), Colony-Horizon, Colony-Keystone, Colony-Dispatcher, Squid, and slapd on Ubuntu. Swift-I and Swift-A each run Swift with their own Colony-Keystone and slapd. A Cloud-A user authenticates with a Shibboleth IdP.]
• Provide seamless access to multiple Swifts
• Authenticate with a Shibboleth IdP
Use cases
We plan to use Colony as:
• Object storage for cloud-to-cloud migration
• Object storage to deliver VM images around Japan
• Object storage for big data
Software components developed in Colony
• Colony-Horizon – based on diablo/stable Horizon with some enhancements
  – Multi-region support: users can choose which Swift is used to store/retrieve objects
  – Swift container ACL and metadata support
  – Swift object metadata support
  – Segmented upload support for objects larger than 5 GB …
• Colony-Keystone – based on diablo/stable Keystone with some enhancements
  – Authentication with Shibboleth
  – %{tenant_name} can be used in endpointTemplates, in addition to %{tenant_id}, to federate cloud services
• Colony-Dispatcher – new
  – Relays requests to multiple object storage services (and merges the responses for clients)
  – Relays requests to the specific object storage service indicated by the URI
  – Chooses the "nearest" swift-proxy server to relay requests to
  – Copies objects among different Swifts
• Utilities – new
  – Tools to simplify admin tasks for federating object storage services
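The slides do not show how Colony-Keystone expands these placeholders. Below is a minimal Python sketch, assuming a plain %{key} substitution; TEMPLATES, the example.org URLs, and expand_endpoint() are hypothetical and are not Keystone's API. Presumably tenant_name matters for federation because each cloud's Keystone assigns its own tenant IDs, while a tenant name can be agreed on across clouds.

```python
import re

# Hypothetical endpoint templates using the %{...} placeholders from the slide.
TEMPLATES = {
    "swift-a": "https://swift-a.example.org/v1/AUTH_%{tenant_id}",
    "swift-i": "https://swift-i.example.org/v1/AUTH_%{tenant_name}",
}

# Matches %{key} and captures "key".
_PLACEHOLDER = re.compile(r"%\{(\w+)\}")


def expand_endpoint(template, tenant):
    """Replace each %{key} in template with tenant[key]; a missing key raises KeyError."""
    return _PLACEHOLDER.sub(lambda m: str(tenant[m.group(1)]), template)


tenant = {"tenant_id": "9f3c2a", "tenant_name": "univ-a"}
for name, template in TEMPLATES.items():
    print(name, expand_endpoint(template, tenant))
```

The swift-i template resolves the same way in every cloud as long as the tenant name is shared, whereas the swift-a template depends on the locally issued ID.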
Colony-Horizon: users can choose which Swift to use
[Figure: Colony-Horizon screenshot offering a choice between Swift-A and Swift-I]
Colony-Keystone
[Figure: authentication flow among the Shibboleth IdP, the Shibboleth SP in front of Colony-Horizon, and Colony-Keystone]
0-1. User registration by mail_addr
0-2. Associate the ePPN with the mail_addr on initial access
1. ID/passwd
2. Attribute: ePPN, mail_addr
3. Attribute: ePPN
4. auth_token
Modifications to Keystone
• Add an ePPN field to the Keystone schema
• Add REST API services to create a token by ePPN ('/token_by/eppn') and by email address ('/token_by/email')
• Add a REST API service to register/update the ePPN ('/users/{user_id}/eppn')
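The registration and token steps (0-1, 0-2, 4) can be modelled with a small in-memory Python sketch. EppnDirectory, its methods, and the token format are hypothetical; they only mimic the shape of the ePPN flow described above, not Colony-Keystone's code, and a real service would persist users and validate tokens.

```python
import secrets


class EppnDirectory:
    """Hypothetical in-memory stand-in for Keystone users extended with an ePPN field."""

    def __init__(self):
        self._by_mail = {}   # mail_addr -> user record
        self._by_eppn = {}   # ePPN -> the same user record

    def register(self, user_id, mail_addr):
        # Step 0-1: an administrator registers the user by mail address.
        self._by_mail[mail_addr] = {"user_id": user_id, "eppn": None}

    def token_by_attributes(self, eppn, mail_addr=None):
        """Return a token for the IdP-asserted ePPN (mail_addr is only needed on first access)."""
        user = self._by_eppn.get(eppn)
        if user is None:
            # Step 0-2: on initial access, link the ePPN to the registered mail address.
            user = self._by_mail.get(mail_addr)
            if user is None:
                raise KeyError(f"no registered user for {mail_addr!r}")
            user["eppn"] = eppn
            self._by_eppn[eppn] = user
        # Step 4: hand back an opaque token (random hex here, purely illustrative).
        return {"user_id": user["user_id"], "auth_token": secrets.token_hex(16)}
```

After the first login, only the ePPN attribute (step 3) is needed to find the user again.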
Colony-Dispatcher
[Figure: a Swift client sends requests to Colony-Dispatcher, which relays them to the Swift proxies of Swift-A (local) and Swift-I (intercloud); the client sees the merged listing A:container1, A:container2, I:container1, I:container2]
1. The Swift client can send requests to Swift-A and Swift-I through Colony-Dispatcher.
2. Colony-Dispatcher merges the responses from each Swift and sends the result to the Swift client.
Requests modified for merging responses:
• Account info
• Container list
• X-Copy-From/To
Responses merged by Colony-Dispatcher carry a prefix indicating which Swift stores the item.
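The merging step can be sketched in Python as follows. This is a minimal illustration, not Colony-Dispatcher's code: merge_container_listings() and route() are hypothetical helpers, and the "A:"/"I:" labels are taken from the figure. A real dispatcher would also have to handle Swift's listing formats (plain, JSON, XML) and pagination, which this sketch ignores.

```python
def merge_container_listings(listings):
    """Merge {swift_label: [container, ...]} into one listing of 'label:container' names."""
    merged = []
    for label in sorted(listings):          # deterministic order: A before I
        for name in listings[label]:
            merged.append(f"{label}:{name}")
    return merged


def route(prefixed_name):
    """Split 'I:container1' into ('I', 'container1') so the request can go to that Swift."""
    label, sep, name = prefixed_name.partition(":")
    if not sep or not name:
        raise ValueError(f"missing Swift prefix in {prefixed_name!r}")
    return label, name


print(merge_container_listings({
    "I": ["container1", "container2"],
    "A": ["container1", "container2"],
}))
```

Because route() splits at the first colon only, a container name that itself contains a colon survives the round trip.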
Caching
Colony-Dispatcher can use a cache proxy (such as Squid) per Swift proxy to retrieve objects from remote Swifts.
[Figure: the Swift client sees the merged listing (A:container1, A:container2, I:container1, I:container2); Colony-Dispatcher relays requests through a cache proxy to the Swift proxies of Swift-A (local) and Swift-I (intercloud)]
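To show why a per-proxy cache helps repeated reads of remote objects, here is a minimal read-through LRU cache in Python. It is a stand-in for Squid under simplifying assumptions, not how Colony configures it: ReadThroughCache and the fetch callable are hypothetical, and unlike an HTTP cache it does no expiry or revalidation (for example via ETag), so an overwritten object would be served stale.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Minimal LRU read-through cache standing in for a Squid-style cache proxy."""

    def __init__(self, fetch, capacity=128):
        self._fetch = fetch              # callable(url) -> bytes, e.g. a GET to a remote Swift proxy
        self._capacity = capacity
        self._entries = OrderedDict()    # url -> body, least recently used first
        self.misses = 0

    def get(self, url):
        if url in self._entries:
            self._entries.move_to_end(url)       # mark as most recently used
            return self._entries[url]
        self.misses += 1
        body = self._fetch(url)                  # only misses cross the wide-area link
        self._entries[url] = body
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)    # evict the least recently used entry
        return body
```

Only misses pay the remote round trip, so objects that are read repeatedly from a distant region are served at local speed after the first access.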
How to make Swift network-aware
Current implementation
Problems with the original Swift code
• PUT/GET performance
  – The Swift proxy waits until the object has been written to all storage servers.
  – The Swift proxy chooses the node to retrieve an object from at random.
Test Environments
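[Figure: Sapporo and Tokyo sites connected by a 200–850 Mbps link (18 ms); other links in the figure are labeled 9900 Mbps and 900 Mbps (0.1 ms)]
• CPU: AMD Opteron 6128 2000 MHz (16 cores), Mem: 32 GB, NIC: 10000baseT/Full (×2)
• CPU: Intel(R) Xeon(R) CPU E7-8870 (40 cores), Mem: 126 GB, NIC: 1000baseT/Full (×2)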
PUT operation
[Figure: a client, a proxy, and three storage servers in Tokyo; three storage servers in Sapporo; the proxy writes replicas to both sites]
An object PUT is always limited by the worst case (the slowest replica write).
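Assuming, as the slide states, that the proxy replies only after every replica write has finished, PUT latency is the maximum of the per-replica write times. A minimal Python sketch; put_duration() and put_throughput() are hypothetical helpers, and the timings below are illustrative, not measurements:

```python
def put_duration(replica_write_seconds):
    """Time until the proxy answers a PUT, if it waits for every replica write."""
    if not replica_write_seconds:
        raise ValueError("need at least one replica")
    return max(replica_write_seconds)    # the slowest replica decides


def put_throughput(object_bytes, replica_write_seconds):
    """Client-observed PUT throughput in bytes/sec."""
    return object_bytes / put_duration(replica_write_seconds)
```

For example, with replica writes taking 1, 2, and 4 seconds for a 1,000-byte object, the client waits 4 seconds and sees 250 bytes/sec; the two fast writes to nearby servers do not help.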
Object's location (replicas at each site)
Object  @Tokyo  @Sapporo
1K      1       2
1M      2       1
10M     1       2
100M    2       1
1G      2       1
PUT object's throughput @Tokyo (bytes/sec)
Size   1           2           3           4           5
1K     4,857       5,596       2,384       405         7,844
1M     1,109,196   1,161,519   1,157,529   1,092,685   1,162,359
10M    2,052,541   1,935,695   2,066,010   2,065,412   2,068,340
100M   9,425,346   9,411,894   9,441,722   9,427,770   9,432,213
1G     47,020,441  47,032,115  47,667,067  47,083,438  47,852,594
GET operation
[Figure: a client, a proxy, and three storage servers in Tokyo; three storage servers in Sapporo; the proxy reads from one replica chosen at random (probability 1/replicas), and the links within each site are high-bandwidth and low-latency]
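Under the "1/replications" rule in the figure, each replica is equally likely to be picked, so the chance that a GET crosses the Sapporo–Tokyo link is simply the share of replicas at the remote site. A small Python sketch of that arithmetic (remote_read_probability() is a hypothetical helper):

```python
from fractions import Fraction


def remote_read_probability(local_replicas, remote_replicas):
    """Chance that a uniformly random replica choice (1/replicas each) lands on a remote site."""
    total = local_replicas + remote_replicas
    if total == 0:
        raise ValueError("object has no replicas")
    return Fraction(remote_replicas, total)
```

For the 1K object in the object location table (1 replica @Tokyo, 2 @Sapporo), a proxy in Tokyo reads remotely with probability 2/3; for the 1M object (2 @Tokyo, 1 @Sapporo) the probability is 1/3.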
Object's location
Object       @Tokyo  @Sapporo
1K           1       2
1M           2       1
10M          1       2
100M         2       1
1G           2       1
1.txt (1G)   3       0
5.txt (1G)   0       3
GET object's throughput @Tokyo (bytes/sec)
Size                 1            2            3            4            5
1K                   8,859        8,165        8,225        11,455       11,504
1M                   1,222,259    1,172,193    1,149,629    1,148,493    49,542,924
10M                  96,848,249   97,777,529   2,098,071    100,899,319  99,814,948
100M                 104,857,600  9,670,414    9,672,893    9,658,095    9,657,313
1G                   117,490,592  115,273,333  51,117,116   51,109,464   51,099,616
1.txt (worst case)   51,085,780   44,245,222   50,812,419   50,923,435   51,066,880
5.txt (best case)    117,473,740  115,216,645  115,340,248  115,288,545  114,347,285
Performance is degraded by the network between Sapporo and Tokyo.
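The worst/best gap can be quantified from the table itself. This Python snippet only averages the five runs for the two 1G objects; the numbers are copied from the GET throughput table, and nothing new is measured:

```python
from statistics import mean

# GET throughput @Tokyo (bytes/sec) for the two 1G objects, copied from the table.
WORST_1_TXT = [51_085_780, 44_245_222, 50_812_419, 50_923_435, 51_066_880]
BEST_5_TXT = [117_473_740, 115_216_645, 115_340_248, 115_288_545, 114_347_285]

worst = mean(WORST_1_TXT)
best = mean(BEST_5_TXT)
print(f"worst {worst:,.0f} B/s, best {best:,.0f} B/s, ratio {best / worst:.2f}")
```

The averages come to roughly 49.6 MB/s for 1.txt versus 115.5 MB/s for 5.txt, so the best case is about 2.3 times faster than the worst case.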
Our modification
How to solve: basic idea
• Limitations
  – Don't modify data structures (including the ring)
  – Minimize customization
• Add some rules to the ring's data structure
  – Zone information is treated as a decimal number, so the difference between zone A and zone B represents the distance between zone A and zone B
• Add zone hints to Swift proxy servers
• Change the order of nodes for the proxy server
How to solve
[app:proxy-server]
nearby_mode = true
own_zone = 100
near_distance = 10
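[Figure: Tokyo storage servers use zones 100–102 and the Tokyo proxy has zone 100 with distance 10; Sapporo storage servers use zones 200–202 and the Sapporo proxy has zone 200 with distance 10]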
A proxy with zone info 100 and zone distance 10 considers storage servers in zones 100 up to (but not including) 110 to be near it.
A proxy with zone info 200 and zone distance 10 considers storage servers in zones 200 up to (but not including) 210 to be near it.
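The proxy-side settings and the nearness rule can be sketched in Python. This assumes the settings live in the [app:proxy-server] section shown above and reads them with the standard configparser module rather than Swift's own config loader; load_nearby_settings() and is_nearby() are hypothetical names. is_nearby() uses the same half-open test as isnearby() in the ring.py patch, so zone 110 is not near a proxy with own_zone 100 and near_distance 10.

```python
import configparser

SAMPLE_CONF = """
[app:proxy-server]
nearby_mode = true
own_zone = 100
near_distance = 10
"""


def load_nearby_settings(text):
    """Return (nearby_mode, own_zone, near_distance) from a proxy-server config snippet."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["app:proxy-server"]
    return (
        section.getboolean("nearby_mode", fallback=False),
        section.getint("own_zone", fallback=0),
        section.getint("near_distance", fallback=0),
    )


def is_nearby(own_zone, zone, near_distance):
    """Half-open range test: own_zone <= zone < own_zone + near_distance."""
    return own_zone <= zone < own_zone + near_distance
```

With these settings, the Tokyo proxy treats zones 100–102 as near and zones 200–202 as remote.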
PUT operation
[Figure: the client PUTs through the proxy in Tokyo; storage servers A, B, and C are in Tokyo and three more storage servers are in Sapporo]
The proxy (zone_info: 100, zone_distance: 10) initially puts objects to the nearest storage servers, using the zone information and zone distance. The object replicator then replicates them to their proper locations asynchronously.
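A simplified Python sketch of the fix-up the replicator has to perform afterwards, assuming a partition's primaries and current holders are known by node ID. plan_revert() is hypothetical and ignores the real replicator's hashing, rsync, and retry logic:

```python
def plan_revert(primary_ids, holder_ids):
    """
    Decide what the replicator must do after a near-first PUT.

    primary_ids: node IDs the ring lists as primaries for the partition.
    holder_ids:  node IDs that currently hold a copy.
    Returns (push_to, remove_from): primaries still missing a copy, and
    non-primary (handoff) holders whose copy can be dropped once every
    primary has one.
    """
    primaries = list(primary_ids)
    holders = set(holder_ids)
    push_to = [node for node in primaries if node not in holders]
    remove_from = sorted(holders - set(primaries))
    return push_to, remove_from
```

In a hypothetical case with primaries A (Tokyo), D, and E (Sapporo) and the PUT written near-first to A, B, and C, the replicator must push to D and E, after which the handoff copies on B and C can be removed.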
PUT operation
Hinted handoff
[Figure: the same client, Tokyo proxy, and storage servers A–C in Tokyo, with storage servers D, E, and F in Sapporo crossed out (×)]
This is the same situation as when all storage servers located in Sapporo are down.
GET operation
[Figure: a client, a proxy, and three storage servers in Tokyo; three storage servers in Sapporo]
1. First, try to retrieve the object from the storage servers near the proxy.
2. After that, try to retrieve the object from the primary storage servers.
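The read order and fallback can be sketched in Python. read_order() mirrors the list comprehension in the proxy/server.py patch (near nodes first, then primaries whose zone is not already covered); get_object() and the fetch callable are hypothetical stand-ins for the proxy's backend GET loop:

```python
def read_order(primaries, near_nodes):
    """Near nodes first, then the remaining primaries (deduplicated by zone, as in the patch)."""
    near_zones = {node["zone"] for node in near_nodes}
    return near_nodes + [node for node in primaries if node["zone"] not in near_zones]


def get_object(nodes, fetch):
    """Try nodes in order; fetch(node) returns the body or raises OSError on failure."""
    last_error = None
    for node in nodes:
        try:
            return fetch(node)
        except OSError as err:      # e.g. timeout or refused connection
            last_error = err
    raise LookupError("object unavailable on all nodes") from last_error
```

Like the patch, read_order() compares zones rather than node identities, so a primary is skipped whenever some near node already occupies its zone.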
DELETE operation
[Figure: a client, a proxy, and three storage servers in Tokyo; three storage servers in Sapporo]
1. First, try to delete the object from the storage servers near the proxy.
2. After that, try to delete the object from the primary storage servers.
Code
1. Add get_near_nodes() to ring.py:

    def get_near_nodes(self, account, container, obj, own_zone, near_distance):
        """
        Get the partition and nodes in the same way as get_nodes(), but keep
        only the nodes whose zone is near own_zone.

        :param account: account name
        :param container: container name
        :param obj: object name
        :param own_zone: lowest zone number of the proxy's own area
        :param near_distance: zones from own_zone up to (but not including)
                              own_zone + near_distance are treated as near
        :returns: a tuple of (partition, list of node dicts)
        """
        part, nodes = self.get_nodes(account, container, obj)

        def isnearby(one, other, distance):
            # True when one <= other < one + distance
            return one <= other < one + distance

        near_nodes = []
        # Primary nodes that are near the proxy come first.
        for node in nodes:
            if isnearby(own_zone, node['zone'], near_distance):
                near_nodes.append(node)
        # Top up with near handoff nodes until replica_count nodes are collected.
        if len(near_nodes) < self.replica_count:
            for node in self.get_more_nodes(part):
                if isnearby(own_zone, node['zone'], near_distance):
                    near_nodes.append(node)
                    if len(near_nodes) >= self.replica_count:
                        break
        return part, near_nodes

2. Then modify proxy/server.py to use get_near_nodes() in each method (excerpt from POST):

    @@ -1044,6 +1056,14 @@ def POST(self, req):
             container_partition, containers, _junk, req.acl, _junk = \
                 self.container_info(self.account_name, self.container_name,
                     account_autocreate=self.app.account_autocreate)
    +        if self.app.nearby_mode:
    +            partition, near_nodes = self.app.object_ring.get_near_nodes(
    +                self.account_name, self.container_name, self.object_name,
    +                self.app.own_zone, self.app.near_distance)
    +            print 'before nodes: %s' % containers
    +            containers = near_nodes + \
    +                [cont for cont in containers if cont['zone'] not in [c['zone'] for c in near_nodes]]
    +            print 'after nodes: %s' % containers
             if 'swift.authorize' in req.environ:
                 aresp = req.environ['swift.authorize'](req)
                 if aresp:
Investigation
[Chart: PUT average throughput (bytes/sec) @Sapporo, original vs. patched, for object sizes 1K, 1M, 10M, 100M, and 1G]
[Chart: PUT average throughput (bytes/sec) @Tokyo, original vs. patched, for object sizes 1K, 1M, 10M, 100M, and 1G]
Using Cache
[Figure: a client, a proxy, and three storage servers in Tokyo; three storage servers in Sapporo; a second proxy in Kyushu]
What about the case in which all objects are located in remote areas?
Colony-Dispatcher as a cache
Colony-Dispatcher can act as a "swift-proxy proxy" with a caching mechanism.
Investigation – Cache effectiveness
Using Colony-Dispatcher as a cache can improve the performance of retrieving objects from remote areas.
[Chart: GET average throughput (bytes/sec) @Tokyo for object sizes 1K, 1M, 10M, 100M, and 1G, four data series]
[Chart: GET average throughput (bytes/sec) @Sapporo for object sizes 1K, 1M, 10M, 100M, and 1G, four data series]
Conclusion
• Reordering the nodes by region in the proxy resolves the GET/PUT performance issues
  – and this feature can be implemented with minimal customization (<50 lines of code).
• Using a cache is a good idea for inter-cloud use
Our future plan
Problems to tackle
• Object location
  – Adding a region concept to the ring structure might help
  – Primary nodes isolated by region
• Replication performance
  – Key factor: we aggressively use the hinted-handoff mechanism, so replication performance matters
  – Using UDT instead of TCP for replication
  – Using pyinotify for I/O-event-driven replication
  – Separating the network used for replication
  – Hop-by-hop replication
Are you interested in Colony?
• Please contact me if you are interested in the Colony project.
  – We want to collaborate with people who want to use or develop Swift as an inter-cloud object store.
Are you interested in academic clouds?
• If you are interested in how to integrate clouds using Dodai and Colony:
  – My colleague (Guan-san) will give a presentation about Dodai (cluster as a service) at 17:20 in Manchester A.
  – Yokoyama-san (a member of NII) might talk about how to integrate Colony and Dodai in a lightning talk (LT).
Thank you.
Q&A
• Please phrase your question using simple grammar if possible.