Peer-to-Peer Supported Cache System for File Transfer. 2003.8.28 Joonbok Lee KAIST jblee@cosmos.kaist.ac.kr. Contents. Motivation Problem Statement Related Work Approach Simulation Conclusion Reference. 1. Motivation. KAIST Netflow Measurement (2002.10.4) - PowerPoint PPT Presentation
Peer-to-Peer Supported Cache System for File Transfer
2003.8.28
Joonbok Lee
KAIST
jblee@cosmos.kaist.ac.kr
Contents
1. Motivation
2. Problem Statement
3. Related Work
4. Approach
5. Simulation
6. Conclusion
7. Reference
1. Motivation
► KAIST Netflow Measurement (2002.10.4)
  Analyze the flow data of the KAIST border router.
Fig 1. The byte ratio in terms of protocols. [Chart values: http 17%, ftp-data 17%, nntp 6%, telnet 0.4%, Microsoft-ds 0.5%, NETBIOS-ss 0.4%, unknown 60%.]
Fig 2. Cumulative distribution function of the files transferred by FTP and HTTP. [Plot: x-axis File Size (Byte), 10^2 to 10^8; y-axis Ratio, 0 to 1; series HTTP and FTP; 10MB marked.]
Some findings:
1) The bandwidth consumed by FTP is similar to that consumed by HTTP.
2) 78% of the FTP traffic is due to large files, i.e. files larger than 10MB.
2. Problem Statement
► Non-negligible access to large multimedia data. [Jung00]
► FTP traffic:
  17% of total traffic.
  78% of it is due to files larger than 10MB.
  11% of them failed during transfer.
► The large files transferred by FTP generate much traffic, and many of them take a long time.
► To solve this problem, we propose an HTTP/FTP proxy cache that is scalable in terms of bandwidth and storage.
3. Related Work
► Research that addresses the transfer of large files.
  RepliCache: A New Approach to Scalable Networking Storage System for Large Objects [Jung97]
  Proactive Web caching with cumulative prefetching [Jung00]
► Research that has a scalable architecture.
  Squirrel: A decentralized peer-to-peer web cache [Iyer02]
  Peer-to-Peer Caching Scheme to Address Flash Crowds [Stading02]
4. Approach
4.1 Motivation
4.2 Cache with Peer-to-Peer Storage
4.3 Model
4.4 Detail Design
4.1 Motivation
► Peer-to-Peer Architecture as a Cache
  Scalability (bandwidth, computing power and storage)
  Cost
  Overhead (to find objects and to keep the system running)
► Latency
  One of the important metrics of cache performance.
  Latency = lookup time + delivery time.
  Delivery time depends on the file size.
  Small files: the lookup time dominates.
  Large files: the delivery time dominates.
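The latency split above can be made concrete with a small calculation. This is only an illustrative sketch: the 50 ms lookup time is an assumed value that does not appear in the slides, and the 100Mbps link speed is borrowed from the simulation assumption in section 5.1.

```python
def latency_seconds(size_bytes, lookup_s, bandwidth_bps):
    """Latency = lookup time + delivery time, with delivery = size / bandwidth."""
    delivery_s = size_bytes * 8 / bandwidth_bps
    return lookup_s + delivery_s


LOOKUP_S = 0.05        # assumed lookup time (hypothetical, not from the slides)
LAN_BPS = 100_000_000  # 100Mbps local network

small = latency_seconds(100 * 1000, LOOKUP_S, LAN_BPS)         # a 100KB file
large = latency_seconds(100 * 1000 * 1000, LOOKUP_S, LAN_BPS)  # a 100MB file

# Share of the total latency that is spent on lookup in each case.
small_lookup_share = LOOKUP_S / small
large_lookup_share = LOOKUP_S / large
```

Under these assumed numbers the lookup accounts for most of the small file's latency and for well under 1% of the large file's, which is why a slower lookup path is tolerable for large files only.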
4.2 Cache with Peer-to-Peer Storage
► Hybrid Approach
  Scalability: peer-to-peer storage.
  Lookup and control: central cache.
► Two-layer storage
  The storage in the central cache
    ► Expected to be always available, with low latency.
    ► Stores small files.
  The second-tier storage (peers)
    ► Can be unavailable.
    ► Stores large files.
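A minimal sketch of the two-layer placement rule follows. The slides do not state the size cutoff between "small" and "large"; the 10MB value below is an assumption taken from the measurement in section 1, and the names `choose_tier` and `serve_from` are hypothetical.

```python
LARGE_FILE_THRESHOLD = 10 * 1024 * 1024  # assumed cutoff (10MB), not given in the slides


def choose_tier(size_bytes, threshold=LARGE_FILE_THRESHOLD):
    """Small files go to the central cache, large files to peer storage."""
    return "central" if size_bytes < threshold else "peer"


def serve_from(location, alive_peers):
    """Return where a cached object can be served from, or None for a miss.

    The central cache is treated as always available; a peer may be down.
    """
    if location == "central":
        return "central"
    return location if location in alive_peers else None
```

For example, `choose_tier(100 * 1024)` selects the central cache, while an object recorded on `"peer-2"` becomes a miss whenever `"peer-2"` is not in `alive_peers`.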
4.3 Model
Fig 3. Cache with two-layer storage. [Diagram labels: HTTP/FTP Server A, HTTP/FTP Server B, Connectivity Cloud, Local Area Network, Web Proxy Cache with FTP supporting module, Peer-to-Peer Storage (Peer 1, Peer 2, ..., Peer n).]
OS1, OS2: small objects. OL1, OL2: large objects.
4.4 Detail Design
► Two new components to support FTP and large files.
  Preserve transparency of file location.
► FTP Cache Daemon
  Stores the state of the FTP connection.
  Builds the URL of files transferred by FTP.
  Checks consistency.
► P2P Storage Manager
  Controls its own storage.
  Managed by the object table in the central cache.
Fig 4. Control and data connections between components. [Diagram labels: HTTP Cache Daemon, FTP Cache Daemon, Object Table, Storage Manager, FTP/HTTP Server, two FTP/HTTP Clients each with a P2P Storage Manager; connections numbered 1 to 4; legend: Control, Data.]
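The FTP Cache Daemon has to keep connection state because an FTP retrieval names a file relative to the session's current directory, so a cache key cannot be formed from a single command. The sketch below shows one way this could work; the class `FtpSession` and its methods are hypothetical, and the URL form is a simplification used only as a cache key.

```python
import posixpath
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class FtpSession:
    """Per-connection state tracked while watching FTP control commands."""
    host: str
    port: int = 21
    cwd: str = "/"

    def change_dir(self, path):
        # CWD: a relative path is resolved against the current directory.
        self.cwd = posixpath.normpath(posixpath.join(self.cwd, path))

    def url_for(self, filename):
        # RETR: combine host, current directory and file name into one key.
        path = posixpath.normpath(posixpath.join(self.cwd, filename))
        netloc = self.host if self.port == 21 else f"{self.host}:{self.port}"
        return f"ftp://{netloc}{quote(path)}"


session = FtpSession("ftp.example.org")
session.change_dir("pub")
session.change_dir("linux")
key = session.url_for("kernel.tar.gz")
```

The resulting string can then serve as the object-table key in the same way an HTTP URL does for the web proxy cache.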
5. Simulation
5.1 Simulation Environment
5.2 Simulation Result
5.1 Simulation Environment
► Trace
  Requested FTP file list.
  Gathered the FTP control (port 21) packets and produced the trace.
    ► 2002.10.23 ~ 2002.11.5 (two weeks)
    ► 76,880 file requests (783GB).
    ► 417 clients
► Assumption
  Local network: 100Mbps
► Simulated Caches
  Cache A: 100GB storage, 100Mbps
  Cache B: Infinite storage, 100Mbps
  Cache C: Infinite storage, infinite bandwidth
  Cache D: Cache with peer-to-peer storage
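A trace-driven simulation of this kind replays the request list against a cache model and counts hits. The sketch below computes the two metrics reported in section 5.2, count hit ratio and byte hit ratio. The slides do not name the replacement policy, so LRU is an assumption here, and the function name `simulate` and the toy trace are hypothetical.

```python
from collections import OrderedDict


def simulate(trace, capacity_bytes=None):
    """Replay (url, size_bytes) requests; return (count hit ratio, byte hit ratio).

    capacity_bytes=None models infinite storage. Eviction is LRU (assumed).
    """
    cache = OrderedDict()  # url -> size, least recently used first
    used = 0
    requests = hits = total_bytes = hit_bytes = 0
    for url, size in trace:
        requests += 1
        total_bytes += size
        if url in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(url)  # mark as most recently used
            continue
        if capacity_bytes is not None and size > capacity_bytes:
            continue  # object can never fit, so it is not cached
        cache[url] = size
        used += size
        while capacity_bytes is not None and used > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
    if requests == 0 or total_bytes == 0:
        return 0.0, 0.0
    return hits / requests, hit_bytes / total_bytes


toy_trace = [("a", 10), ("b", 90), ("a", 10), ("c", 50), ("b", 90)]
infinite = simulate(toy_trace)
limited = simulate(toy_trace, capacity_bytes=100)
```

Running the same trace with and without a capacity limit mirrors the comparison between Cache A and Cache B: the limited cache evicts the large object before it is requested again, so its byte hit ratio drops more than its count hit ratio.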
5.2 Simulation Result: Hit Ratio
Fig 5. Cache hit ratio. [Bar chart: count hit ratio and byte hit ratio (%), axis 0% to 60%, for Cache A, Cache B, Cache C and Cache D.]
Fig 6. Outbound traffic. [Bar chart: traffic (GB), axis 0 to 900, for No Cache, Cache A, Cache B, Cache C and Cache D.]
No strict storage control
• Some peers may hold the same files in their storage.
• Even when some peers have free storage, other peers may evict a file from their cache as a victim.
• This degrades the performance of the storage, but not by much.
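The cost of duplicated files can be expressed as the fraction of stored bytes that are distinct. This is an illustrative sketch only; the function `storage_efficiency` and the sample contents are hypothetical and are not taken from the simulation.

```python
def storage_efficiency(peer_contents):
    """peer_contents maps peer -> {url: size_bytes}.

    Returns distinct bytes / stored bytes; 1.0 means no duplication.
    """
    stored = sum(sum(files.values()) for files in peer_contents.values())
    distinct = {}
    for files in peer_contents.values():
        distinct.update(files)  # the same url on two peers is counted once
    unique = sum(distinct.values())
    return unique / stored if stored else 1.0


peers = {"p1": {"a": 100, "b": 50}, "p2": {"a": 100}}
efficiency = storage_efficiency(peers)
```

In this toy case 250 bytes are stored but only 150 are distinct, so 40% of the peer storage holds copies.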
5.2 Simulation Result: Latency
Fig 7. Average latency of 95~105MB files. [Bar chart: time (seconds), axis 0 to 1400, for No Cache, Cache A, Cache B, Cache C and Cache D.]
Fig 8. Average latency of 95~105KB files. [Bar chart: time (seconds), axis 0 to 2, for No Cache, Cache A, Cache B, Cache C and Cache D.]
We can reduce the latency of large files without increasing the latency of small files.
5.2 Simulation Result: Cache Hit Ratio Degradation by Peer Failure
Fig 9. Cache hit ratio degradation by peer failure. [Line chart: hit ratio (%), axis 0% to 60%, against peer failure ratio (%), 0% to 100%; series: byte hit ratio, count hit ratio; annotation: 30%.]
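The effect of peer failure on the hit ratio can be sketched as follows: objects held by the central cache always hit, while objects held by a failed peer turn into misses. The failure model here (a random fixed fraction of peers down for the whole run, one copy per object) is an assumption, and the function and variable names are hypothetical.

```python
import random


def hit_ratio_with_failures(trace, location, peers, failure_ratio, seed=0):
    """trace: list of urls; location: url -> "central" or a peer name.

    A cached object on a failed peer counts as a miss.
    """
    rng = random.Random(seed)
    failed_count = round(len(peers) * failure_ratio)
    failed = set(rng.sample(sorted(peers), failed_count))
    hits = 0
    for url in trace:
        where = location.get(url)
        if where is not None and where not in failed:
            hits += 1
    return hits / len(trace) if trace else 0.0


peers = {"p1", "p2", "p3"}
location = {"s": "central", "l1": "p1", "l2": "p2"}
trace = ["s", "l1", "l2", "x"]
no_failure = hit_ratio_with_failures(trace, location, peers, 0.0)
all_failed = hit_ratio_with_failures(trace, location, peers, 1.0)
```

With every peer down, only the objects in the central cache still hit, which gives the floor that the curve approaches as the failure ratio grows.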
6. Conclusion
1) The measurement shows that FTP produces a large amount of traffic; 78% of it is caused by files larger than 10MB.
2) We propose a cache system with two-layer storage using a peer-to-peer architecture. It is transparent to the location of files.
3) Trace-driven simulation shows that the two-layer storage performs well for large files as well as small files.
4) Caching with our system can reduce outbound traffic and latency.
► Other issues
  Collaboration between proposed systems.
  Load balancing between peers.
  Security problem.
7. Reference
► Jaeyeon Jung, "RepliCache: Enhancing Web Caching Architecture with Replication of Large Objects".
► Jaeyeon Jung, Dongman Lee and Kilnam Chon, "Proactive Web Caching with Cumulative Prefetching for Large Multimedia Data", Computer Networks 33 (2000), pp. 645-655.
► Sitaram Iyer, Ant Rowstron and Peter Druschel, "Squirrel: A decentralized peer-to-peer web cache", In Proceedings of PODC '02, Monterey, CA.
► Tyron Stading, Petros Maniatis and Mary Baker, "Peer-to-Peer Caching Schemes to Address Flash Crowds", In Proceedings of IPTPS '02, MA, USA.
► Hyun-chul Kim, Joonbock Lee, Jungwon Suh and Kilnam Chon, "Measurements of File-Systems Deployed on High-Performance Research and Education Networks", Technical Report.
► I. Stoica, R. Morris, D. Karger, M. F. Kaashoek and H. Balakrishnan, "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications", In Proceedings of the ACM SIGCOMM 2001 Technical Conference, San Diego, CA, USA, August 2001.
► S. Ratnasamy, P. Francis, M. Handley, R. Karp and S. Shenker, "A scalable content-addressable network", In Proceedings of the ACM SIGCOMM 2001 Technical Conference, San Diego, CA, USA, August 2001.
► A. Rowstron and P. Druschel, "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems", IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), Heidelberg, Germany, pages 329-350, November 2001.
► Ian Clarke, Theodore W. Hong, Scott G. Miller, Oskar Sandberg and Brandon Wiley, "Protecting Free Expression Online with Freenet", IEEE Internet Computing 6(1), January/February 2002.
► William J. Bolosky, John R. Douceur, David Ely and Marvin Theimer, "Feasibility of a Serverless Distributed File System Deployed on an Existing Set of Desktop PCs", In Proceedings of SIGMETRICS 2000.
► Internet RFC 959, File Transfer Protocol.
Appendix A
[Flowchart: handling of a file request. Branch structure reconstructed from the recovered node and edge labels.]
Request File → Check Protocol
  HTTP: Handle the request like a web proxy cache.
  FTP: Lookup Object Table
    cached: Check Consistency
      consistent: Check Cached Location
        central server: Central cache opens data connection to client. Transfer file.
        peer: Open FTP control connections to both the peer that has the file and the peer that requests it. Make an FTP data connection between the two peers. Transfer file. Update Object Table.
    not cached, or inconsistent: Check File Size
      small: Server opens data connection to central cache. Central cache opens data connection to client. Transfer file. Update Object Table.
      large: Open data connection between server and client. Transfer file. Update Object Table.
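The request-handling flow in Appendix A can be sketched as a single dispatch function that returns the steps to carry out. This is one reading of the flowchart labels, not the authors' implementation: the treatment of an inconsistent entry as a miss, the 10MB size cutoff, and the names `plan_request` and `is_consistent` are all assumptions.

```python
LARGE_FILE_THRESHOLD = 10 * 1024 * 1024  # assumed cutoff, not given in the slides


def plan_request(protocol, url, size_bytes, object_table, is_consistent):
    """Return the list of steps for one request.

    object_table maps url -> "central" or a peer name (hypothetical layout).
    is_consistent is a callable url -> bool (hypothetical consistency check).
    """
    if protocol == "http":
        return ["handle the request like a web proxy cache"]

    location = object_table.get(url)
    if location is not None and is_consistent(url):
        if location == "central":
            return ["central cache opens data connection to client",
                    "transfer file"]
        return [f"open FTP control connections to {location} and the requesting peer",
                "make FTP data connection between the two peers",
                "transfer file",
                "update object table"]

    # Not cached, or cached but stale: fetch from the origin server.
    if size_bytes < LARGE_FILE_THRESHOLD:
        return ["server opens data connection to central cache",
                "central cache opens data connection to client",
                "transfer file",
                "update object table"]
    return ["open data connection between server and client",
            "transfer file",
            "update object table"]
```

Returning a plan instead of performing the transfers keeps the control decision (made by the central cache) separate from the data path, which matches the control and data split shown in Fig 4.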