

1106 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 22, NO. 4, AUGUST 2014

Performance Modeling and Evaluation of Peer-to-Peer Live Streaming Systems Under Flash Crowds

Yishuai Chen, Baoxian Zhang, Senior Member, IEEE, Changjia Chen, Senior Member, IEEE, and Dah Ming Chiu, Fellow, IEEE

Abstract—A peer-to-peer (P2P) live streaming system faces a big challenge under flash crowds. When a flash crowd occurs, the sudden arrival of numerous peers may starve the upload capacity of the system, hurt its quality of service, and even cause system collapse. This paper provides a comprehensive study on the performance of P2P live streaming systems under flash crowds. By modeling the systems using a fluid model, we study the system capacity, peer startup latency, and system recovery time of systems with and without admission control for flash crowds, respectively. Our study demonstrates that, without admission control, a P2P live streaming system has limited capacity to handle flash crowds. We quantify this capacity by the largest flash crowd (measured in shock level) that the system can handle, and further find that this capacity is independent of the system initial state while decreasing, in a power-law relationship, as the departure rate of stable peers increases. We also establish the mathematical relationship of flash crowd size to the worst-case peer startup latency and system recovery time. For a system with admission control, we prove that it can recover stability under flash crowds of any size. Moreover, its worst-case peer startup latency and system recovery time increase logarithmically with the flash crowd size. Based on the analytical results, we present detailed flash crowd handling strategies, which can be used to achieve satisfying peer startup performance while keeping system stability in the presence of flash crowds under different circumstances.

Index Terms—Flash crowd, modeling, peer-to-peer, streaming media, videos.

I. INTRODUCTION

AS PEER-TO-PEER (P2P) live streaming systems become popular over the Internet [1], study of the performance of such systems under flash crowds is becoming critical. A flash

Manuscript received March 13, 2012; revised October 29, 2012 and February 04, 2013; accepted June 03, 2013; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor T. Bonald. Date of publication August 02, 2013; date of current version August 14, 2014. This work was supported by the NSF of China under Grants 61271199, 61173158, and 61101133; HK RGC Grant 411508; the National Key Special Program of China under Grant No. 2010ZX03006-001-02; and the Fundamental Research Funds in Beijing Jiaotong University under Grant W11JB00630.

Y. Chen and C. Chen are with the School of Electrical and Information Engineering, Beijing Jiaotong University, Beijing 100044, China (e-mail: [email protected]; [email protected]).

B. Zhang is with the Research Center of Ubiquitous Sensor Networks, University of Chinese Academy of Sciences, Beijing 100049, China (e-mail: [email protected]).

D. M. Chiu is with the Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TNET.2013.2272056

crowd is a sudden arrival of numerous peers at a system. It is typically triggered by popular programs that cause a surge of users to join the system at a scheduled time [2]. Existing measurement results on commercial P2P live streaming systems (e.g., UUSee [3] and CoolStreaming [4]) indicate that the performance of such systems is in general acceptable under small-sized flash crowds [3], but degrades seriously under large-sized flash crowds [4]. In the latter case, many users are congested and cannot watch the program normally.

In this paper, we focus on studying a fundamental issue about the performance of a P2P live streaming system under flash crowds: How, and to what degree, does it scale with the size of flash crowds? Although earlier studies [5]–[7] have obtained some preliminary understanding of the behavior of P2P live streaming systems under flash crowds through measurement and analysis, a concise characterization of the maximal size of flash crowds that such a system can handle is still missing. In this paper, we answer these questions using mathematical analysis. The major contributions are as follows.

• We build fluid-based models for P2P live streaming systems with and without admission control, respectively. These models establish the relationship between peer parameters (including the number of startup peers, the number of stable peers, and peer startup latency) and system parameters (including peer arriving rate, peer departure rate, and system upload bandwidth), and characterize the generic startup process of peers and the system stabilization process under flash crowds of various sizes.

• For a system without admission control, we find that its capacity for handling flash crowds is limited and can be quantified by the maximum shock level of flash crowd that a system can sustain. The shock level of a flash crowd is defined to be equal to the new peer arriving rate after the flash crowd occurs divided by the original peer arriving rate before the flash crowd occurs [8]. Beyond this capacity, the system collapses. Furthermore, we find that this system capacity is independent of the system's initial state (i.e., the number of already-online peers before the flash crowd), while it decreases with the departure rate of stable peers according to a power law. We also establish the relation of flash crowd size to the worst-case peer startup latency and system recovery time. Accordingly, given the maximum allowable peer startup latency, we can find the maximum shock level of flash crowd that a system can support.

• For a system with admission control, we prove that it can recover stability under flash crowds of any size, and the worst-case peer startup latency and system recovery time increase logarithmically with the size of flash crowd, which shows the superiority of admission control in helping a system sustain large flash crowds.

• We conduct simulations to validate the analytical results. Based on these analytical results, we further discuss some major considerations in designing flash crowd handling strategies and present detailed flash crowd handling algorithms for various situations.

The rest of this paper is organized as follows. Section II reviews related work. Section III introduces the problems caused by flash crowds. Sections IV and V build fluid-based models to characterize peers' startup process under flash crowds in P2P live streaming systems without and with admission control, respectively, and study the system performance with the models. Section VI presents simulations that validate the derived models. Section VII presents several flash crowd handling strategies based on the derived analysis results. Section VIII concludes the paper.

II. RELATED WORK

Although much work has been done regarding how to handle flash crowds in Web services [8]–[10] and P2P file-sharing applications [11], how a P2P live streaming system performs under flash crowds has not received enough attention. The limited existing work leaves certain fundamental questions unanswered. In the following, we review existing work on handling flash crowds in the fields of P2P file sharing, Web site accessing, and P2P live streaming, and compare our work to it.

P2P file-sharing applications such as BitTorrent were proven to be remarkably robust and scalable under flash crowds [11]–[13]. Inspired by these results, some recent work proposed to use P2P networking to help Web sites handle flash crowds. For instance, the Backslash system [14] is a collaborative Web mirroring system that protects participating Web sites from significant performance degradation in the presence of flash crowds. Rubenstein et al. proposed to use P2P-based caching systems for the same purpose [15], [16]. In a P2P live streaming system, however, this extraordinary scalability under flash crowds cannot be taken for granted, due to the stringent real-time requirement in terms of peer startup latency. We will elaborate on these reasons in Section III-B.

Content distribution network (CDN) is an approach commonly used for shortening the access time of Web sites [17]. A CDN replicates a Web server's contents at various locations in the Internet and redirects users to local replicas. Similarly, some commercial P2P live streaming systems have also used CDNs to help content distribution [4]. However, it is not economical to deploy redundant system resources at a level high enough to handle the peak demand that appears only under flash crowds, which last for a short time. Hence, this paper focuses on studying a P2P live streaming system's inherent ability to handle flash crowds.

For P2P live streaming applications, Leighton claimed that a P2P live streaming system may abruptly degrade under a flash crowd, as the suddenly increased join requests may outstrip the uploading capacity of already-online peers in the system [17]. However, Leighton did not provide theoretical proof or experimental evidence to support this claim. In this paper, we prove via theoretical analysis that the capacity of a P2P live streaming system to handle flash crowds is limited.

P2P live streaming systems can be categorized into two types [18], tree-based and mesh-based, according to their underlying overlay networking architectures. In a tree-based system, streaming data are disseminated over single or multiple dissimilar trees. In a mesh-based system, however, data are disseminated in a less structured manner. Seibert et al. [5] found that mesh-based systems are more adaptive than tree-based systems under flash crowds, as the topologies of mesh-based systems are more flexible. In this paper, we consider the system residual uploading bandwidth allocation problem faced by a P2P live streaming system when a flash crowd occurs, assuming the system can resolve the topology management problem properly. Thus, our analytical results are applicable to both tree-based and mesh-based systems.

P2P streaming systems can also be categorized as either chunk-based or substream-based, based on how media contents are divided. In a chunk-based system [1], [19], media contents are broken down into chunks, and each chunk is transferred in the network independently, like what is done by the swarming data transfer technology in BitTorrent [20]. In a substream-based system, however, media contents are split into substreams, and each substream is propagated independently in the Internet [21], [22]. Previous work [6], [7] studying the performance of systems under flash crowds focused on substream-based systems. In reality, however, many popular P2P live streaming systems are chunk-based, such as PPLive [20] and PPStream [23]. For this reason, this paper focuses on studying chunk-based systems and fills this gap.

The fluid model is an important modeling approach for analyzing the performance of P2P systems. Qiu et al. [24] introduced this method in their seminal work to analyze P2P file-sharing applications. Inspired by [24], in this paper we use a fluid model to investigate the performance of P2P live streaming systems under flash crowds. Specifically, we study the case in which newly arrived peers temporarily do not serve other peers until they obtain a certain amount of content. This case was not extensively studied in [24] because it is not realistic in P2P file sharing. In P2P live streaming, however, it is realistic. We will further elaborate the reasons in Section III-B. To the best of our knowledge, this is the first study of P2P live streaming systems under flash crowds using a fluid model.

Admission control is a commonly used method for preventing Web sites or streaming systems from being overloaded. There are in general two major strategies for performing admission control [25]: reject excessive incoming requests immediately, or hold them temporarily until vacant system resources appear and then admit them to join. The former strategy is not friendly to users. Thus, we focus on studying the latter strategy. We will evaluate an admission control method that works as follows: The system does not reject newly arrived peers, but admits them in a way such that the sum of the downlink rates of the newly admitted (while not yet stable) peers always equals the system's residual upload bandwidth, until no peer is waiting for admission. This admission control method is different from the slot-based admission control method used for substream-based P2P live streaming systems [6], [7] and BitTorrent-like P2P video-on-demand (VoD) systems [26]. In the slot-based method in the above work, newly arrived peers are admitted on a per-slot basis. In each slot, the system (usually, the tracker server) admits a number of newly arrived peers equal to the system's residual upload capacity divided by the streaming rate. Then, these peers get their wanted video contents and finish their startup in this slot. References [6], [7], and [26] then model and analyze how many slots are needed to admit all the newly arrived peers of a flash crowd. One big issue associated with the slot-based model is the difficulty for the tracker server (or a component with a similar function in the system) to accurately estimate the optimal slot length, which can be affected by the (worst-case) end-to-end delay among peers, which, however, is usually unknown to the tracker server. Such an estimation is critical since longer slots mean larger startup latency, while shorter slots may make the system unstable. Moreover, [6], [7], and [26] only analyzed systems with admission control and did not study systems without admission control. In practice, however, as shown by existing measurement results [3], [4], most commercial live streaming systems have not deployed admission control algorithms. Thus, the study of systems without admission control and their inherent ability to handle flash crowds is highly desirable. To the best of our knowledge, this is the first study on P2P live streaming systems without admission control under flash crowds.

III. PRELIMINARIES

In this section, we first introduce preliminary knowledge regarding P2P live streaming systems and flash crowds, and then discuss the problems caused by flash crowds. Symbols and notations used in this paper are summarized in Table I.

A. System-Related Background

In this paper, we study chunk-based P2P live streaming systems due to their high popularity in the Internet. In such systems, media contents are broken down into a large number of small chunks. A peer saves received chunks in a small local buffer. The buffer only stores the latest 10–100 seconds' media contents, as in a live streaming system a media chunk quickly becomes outdated and hence can be purged from the system. In such a small buffer, received chunks are reordered, assembled, and sent to a media player program. Each peer maintains the information about chunks available for sharing, referred to as its window of chunk availability. These chunks are propagated in the network in a swarming pattern, i.e., peers receiving a chunk serve as proxies that forward it to other peers.

B. Flash Crowds

We model a flash crowd as a phenomenon in which the rate at which peers arrive at a system suddenly jumps to a higher value and keeps at this value for a certain long period of time (e.g., a few minutes). This model has also been used in [7] to evaluate the performance of P2P live streaming systems under flash crowds. To characterize the size of a flash crowd, we use two metrics: 1) the new peer arriving rate after the flash crowd occurs, denoted by λf; 2) the shock level of the flash crowd (denoted by F), which equals λf divided by the original peer arriving rate before the flash crowd (denoted by λ0). We have F = λf/λ0. Thus, F describes the difference between the peer arrival rates before and after a flash crowd, while λf describes the actual peer arrival rate associated with a flash crowd. In this paper, whether a flash crowd can be said to be "large" is relative and is evaluated based on its shock level F instead of its actual peer arrival rate λf. When λ0 is not specified, a large flash crowd means a high shock level; when λ0 is given, a large flash crowd means not only a high shock level, but also a λf significantly larger than λ0.

TABLE I
SYMBOL LIST

λ(t)   peer arrival rate at time t
λ0     peer arriving rate before the flash crowd
λf     peer arriving rate after the flash crowd
F      shock level of the flash crowd, F = λf/λ0
x(t)   number of startup peers at time t
y(t)   number of stable peers at time t
r(t)   peer startup rate at time t
γ      departure rate of a stable peer
m      number of chunks a peer must obtain to finish startup
c      chunk size
s      residual upload bandwidth of a stable peer (chunks/s)
d      downlink rate of a startup peer (chunks/s)
b(t)   download rate of a startup peer at time t
a(t)   arrival time of the peer that finishes startup at time t
T0     fixed startup latency in RBA state, T0 = m/d
n(t)   number of peers waiting for admission at time t
ρ      residual bandwidth of a stable peer (startup peers/s), ρ = s/m
Fmax   maximum shock level a system can sustain without collapsing
λmax   maximum supportable peer arrival rate, λmax = Fmax λ0
Tmax   worst-case peer startup latency
Tr     system recovery time

P2P live streaming has some unique requirements that differ from those of other P2P applications (e.g., file sharing and video-on-demand), and these requirements make designing such systems to handle flash crowds challenging. The problems are listed as follows.

First, P2P live streaming systems have a stringent requirement in terms of peer startup latency, and this requirement is difficult to meet under flash crowds. Specifically, unlike a user of file-sharing applications, who can spend hours or even days fetching a file, a user of a live streaming system expects to watch the selected program within a few seconds. Hence, a live streaming system must be able to deliver a certain number of chunks to each newly arrived peer within a short period of time (usually a few seconds). This stringent delay requirement is difficult to meet under a large flash crowd, where a surge of new users arrive at the system during a very short period of time and require a large amount of system upload bandwidth.

Second, as shown by measurement results of real systems [4], in a P2P live streaming system, newly joined peers upload little data to other peers, which hurts the stability and scalability of the system under flash crowds. The main reason for this behavior is that a considerable percentage of users in a live streaming system usually watch only a small portion of a video and then leave [27]. Thus, it is not preferable to let a newly joined peer notify other peers of its window of chunk availability and accept chunk requests immediately after it joins the system: This is not only unnecessary, as it has not obtained enough chunks, but also dangerous, as its departure (which is quite probable) would affect the stability of other peers' downloading and thus degrade their service quality [6], [28]. Moreover, the tracker server should not recommend such a newly joined peer to other newly joined peers, which could cause their startup performance to become even worse. As a result, other peers do not know the existence of such a newly joined peer and thus cannot request anything from it. Moreover, even if the newly arrived peer obtains some chunks and starts advertising its locally available chunks to other peers, the special chunk fetching mechanism used during its startup process also causes it to actually share little with others for some time [28], [29].

Fig. 1. Buffer filling status of a newly joined peer during its startup phase. To start playback quickly, the peer will request chunks starting from its buffer tail and in sequential order.

Fig. 1 illustrates the buffer filling status of a newly arrived peer when it uses such a special chunk fetching mechanism. As shown in Fig. 1, the newly arrived peer's buffer is almost empty (the filling status of a buffer position is indicated by a "1" or a "0": "1" means "filled" by the corresponding chunk, whereas "0" means not filled). Then, to shorten its startup latency, the peer usually requests chunks starting from its buffer tail and in sequential order, as the leftmost chunk is the oldest chunk and the closest to the local playback point. Consequently, the chunks it obtains are not wanted by peers who have already obtained these chunks and started their local playback stably, nor by other newly arrived peers, as those peers request chunks at the same rate, in the same order, and starting from almost the same chunk, so the data diversity among them is quite low. Thus, newly arrived peers have little to share with others. Since the scalability of a P2P system relies mainly on the sharing of load among peers, the lack of content uploading from newly joined peers hurts the system's scalability, in particular when a large flash crowd occurs. Certainly, after a newly arrived peer receives a certain number of chunks, it switches to a more appropriate chunk fetching mechanism to share chunks with other peers [28], [30].
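To make the mechanism concrete, the following is a minimal illustrative sketch (not from the paper) of the oldest-first startup request order and of why two concurrent startup peers end up holding nearly identical chunk sets:

```python
# Illustrative sketch (not from the paper): a startup peer requests chunks
# sequentially, oldest-first, from its buffer tail (the chunk closest to the
# playback point). Two peers joining at almost the same moment therefore
# request nearly identical chunk sequences and have little to trade.
def startup_request_order(window_start: int, want: int) -> list[int]:
    return list(range(window_start, window_start + want))

peer_a = startup_request_order(1000, 10)
peer_b = startup_request_order(1001, 10)   # joins a moment later
overlap = set(peer_a) & set(peer_b)
print(len(overlap))                        # 9 of 10 chunks identical: low data diversity
```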

IV. SYSTEMS WITHOUT ADMISSION CONTROL

In this section, we make assumptions and build a mathematical model that characterizes peers' startup processes in a P2P live streaming system without admission control.

Fig. 2. Generic model for P2P live streaming system.

A. Assumptions

Before presenting our model, we make the following assumptions, which were also used in [6] and [7] for investigating the performance of systems under flash crowds.

First, as explained in Section III-B, we assume that a P2P live streaming system consists of two types of peers: startup peers and stable peers. Before a newly joined peer obtains m chunks, it is a startup peer and does not serve data to other peers. After it obtains m chunks, it becomes a stable peer and starts to advertise its chunk availability information and serve chunks to other peers. Such an assumption is consistent with existing measurement results of real-world systems [4].

Then, we assume that each stable peer serves other stable peers with high priority and only uses residual upload bandwidth to serve startup peers. This assumption is reasonable, as the watching performance of existing stable peers should not be affected by newly joined peers [6]. Denote a system's streaming rate as R and each peer's uploading bandwidth as U. As each stable peer requires download bandwidth R to ensure its playback continuity, each stable peer on average must upload to other stable peers at rate R. Thus, each stable peer on average can provide residual upload bandwidth U − R to startup peers. We define this upload bandwidth as the residual upload bandwidth of a stable peer and denote it by s, measured in number of chunks per second. Thus, we have s = (U − R)/c, where c represents the size of a chunk.

Finally, we assume that startup peers are treated fairly with respect to (w.r.t.) the allocation of all the stable peers' residual upload bandwidth (called the system residual bandwidth). This assumption is realistic, as most systems currently in use treat newly arrived startup peers fairly [4]. Thus, each startup peer obtains a fair share of the system residual bandwidth, and all startup peers have the same download rate.

B. System Model

In this section, we build a fluid-based model for a P2P live streaming system without admission control. Fig. 2 shows a generic model characterizing the system from the perspective of peers' arrival, startup, and departure. As shown in Fig. 2, the system consists of startup peers and stable peers. The numbers of startup peers and stable peers at time t are denoted by x(t) and y(t), respectively. The state of such a system can be characterized by the pair of variables (x(t), y(t)). The peer arrival rate is denoted by λ(t). Similar to [6] and [7], we consider the case in which newly arrived peers do not leave the system midway during their startup phase, in order to find the worst-case system performance, since midway leavings of startup peers decrease the system load and alleviate the potential congestion. This case is not unrealistic, as it was reported that P2P live streaming users can wait for tens of seconds before starting to watch a program, although feeling frustrated to some extent [31]. Let r(t) represent the peer startup rate, which is defined as the rate at which startup peers become stable peers at time t, measured in number of peers per second. Similar to [24], we assume the departure rate of a stable peer is a constant and denote it by γ. Thus, the total peer departure rate is γ y(t).

We now study how x(t) and y(t) evolve. To ease the presentation, we write x(t) and y(t) as x and y, respectively. A fluid model is given by

    dx/dt = λ(t) − r(t)                                   (1)

    dy/dt = r(t) − γ y                                    (2)

Equation (1) means that the change rate of x is equal to new peers' arriving rate λ(t) minus their startup rate r(t), and (2) means that the change rate of y equals the startup peers' startup rate r(t) minus the stable peers' departure rate γ y.

We use peer startup latency to characterize peers' startup process. Consider a peer that finishes its startup at time t and denote its arrival time as a(t); its startup latency is t − a(t). First, as all the startup peers download at the same (fair) rate and each of them needs to download m chunks to finish startup, a peer that arrives earlier will finish its startup earlier, i.e., peers finish startup in a first-in-first-out (FIFO) pattern. Thus, at time t, the startup peers consist of the peers that arrived during the period from time a(t) to time t. We have x(t) = ∫ from a(t) to t of λ(τ) dτ. Taking its derivative w.r.t. t, we have

    dx/dt = λ(t) − λ(a) (da/dt)                           (3)

where we write a(t) as a and da(t)/dt as da/dt to ease the description. Comparing (1) to (3), we have r(t) = λ(a) (da/dt). Replacing it into (2), we have

    dy/dt = λ(a) (da/dt) − γ y                            (4)

Then, according to the definition of s, we know that at time t the system's residual bandwidth, which is the sum of all stable peers' residual upload bandwidth, equals s y(t). Since all startup peers are assumed to be treated fairly in terms of the system's residual bandwidth allocation, each startup peer can obtain residual bandwidth s y(t)/x(t). Denoting each startup peer's downlink rate by d and its download rate by b(t), we have b(t) = min(d, s y(t)/x(t)). Accordingly, we define the following two system working states.

• Residual-bandwidth-abundant (RBA) state: In this state, s y(t)/x(t) ≥ d, meaning that a startup peer does not need to compete with other startup peers for system residual bandwidth, and we have b(t) = d.

• Residual-bandwidth-inabundant (RBIA) state: In this state, s y(t)/x(t) < d, meaning that startup peers need to compete with each other for residual bandwidth sharing, and we have b(t) = s y(t)/x(t) < d.

Based on the definition for startup finishing, a peer who joins the system at time a(t) and finishes its startup at time t needs to download m chunks during this period. We have

    ∫ from a(t) to t of b(τ) dτ = m                       (5)

Equations (3)–(5) form a deterministic model for a P2P live streaming system. The model reflects the peer startup process and reveals the relationship between peer parameters (including the number of startup peers, the number of stable peers, and peer startup latency) and system parameters (including peer arriving rate, peer departure rate, and system upload bandwidth). In Sections IV-C and IV-D, we shall use it to characterize peers' startup processes in a P2P live streaming system under a flash crowd and evaluate the system's performance, supposing the system works in RBA state before the flash crowd.
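As a concrete illustration of (1)–(5), the following minimal sketch integrates the model with a simple Euler scheme. It uses the notation reconstructed above and the parameter values introduced later in Section IV-C (treat the exact numbers as assumptions); the scheme itself is illustrative, not the authors' implementation.

```python
import numpy as np

# Fluid model (1)-(5) for a system without admission control.
# x: startup peers, y: stable peers, a: arrival time a(t) of the peer
# finishing startup at time t, b: fair-share download rate min(d, s*y/x).
gamma, m, s, d = 0.05, 10, 2.0, 12.0   # departure rate, chunks, residual bw, downlink
lam0, lam_f = 10.0, 80.0               # arrival rate before/after the flash crowd at t = 0

dt, T = 1e-3, 30.0
n_steps = int(T / dt)
b_hist = np.empty(n_steps)             # b(t) history, needed to evaluate b(a(t))

x, y = lam0 * m / d, lam0 / gamma      # pre-crowd steady state: x0 = lam0*T0, y0 = lam0/gamma
a = -m / d                             # the peer finishing at t = 0 arrived at -T0
for i in range(n_steps):
    b = min(d, s * y / max(x, 1e-9))   # RBA if s*y/x >= d, else RBIA
    b_hist[i] = b
    # differentiating (5) gives b(t) - b(a(t)) * a'(t) = 0
    b_at_a = d if a <= 0 else b_hist[min(int(a / dt), i)]
    da = b / max(b_at_a, 1e-9)
    r = (lam0 if a < 0 else lam_f) * da   # startup rate r(t) = lambda(a(t)) a'(t)
    x += (lam_f - r) * dt                 # (1), with lambda(t) = lam_f for t > 0
    y += (r - gamma * y) * dt             # (2)
    a += da * dt

print(f"after {T:.0f} s: x = {x:.1f}, y = {y:.1f}, startup latency = {T - a:.2f} s")
```

With λf = 80 peers/s, this reproduces the qualitative behavior analyzed in Section IV-D: x first swells and then drains as y grows toward λf/γ.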

C. System Under Flash Crowd: RBA Case

This section studies the RBA case, that is, the case in which the system under study continues working in RBA state after a flash crowd occurs. We first study how x(t) and y(t) evolve in this case, and then derive the condition for this case to occur.

Consider a flash crowd that occurs at t = 0. According to the definition of flash crowd introduced in Section III-B, the flash crowd can be represented by λ(t) = λ0 + (λf − λ0) u(t), where λ0 and λf are the peer arriving rates before and after the occurrence of the flash crowd, respectively, and u(t) is the unit step input. As b(t) = d in the RBA case, (5) can be rewritten as

    d (t − a(t)) = m                                      (6)

i.e., a(t) = t − m/d. Thus, da/dt = 1, meaning that all newly arrived peers have the same fixed startup latency m/d. Denote it by T0. We have T0 = m/d. Substituting it and λ(t) into x(t) = ∫ from a(t) to t of λ(τ) dτ, we have

    x(t) = λ0 T0,                      t < 0
    x(t) = λ0 T0 + (λf − λ0) t,        0 ≤ t < T0          (7)
    x(t) = λf T0,                      t ≥ T0

Equation (7) means that the evolution process of x(t) contains the following three phases.
1) Initial phase: When t < 0, i.e., before the flash crowd occurs, x(t) keeps stable at the steady value λ0 T0.
2) Transition phase: When 0 ≤ t < T0, i.e., after the flash crowd occurs and before time T0, x(t) linearly increases as time evolves. The increasing rate is λf − λ0 peers/s.
3) Stable phase: At time T0, x(t) reaches its new steady value λf T0. Since then, it keeps stable around this value.

Replacing a(t) = t − T0 into r(t) = λ(a) (da/dt), we have r(t) = λ(t − T0). Substituting it into (2), we have

    dy/dt = λ(t − T0) − γ y                               (8)

Equation (8) means that the dynamic process of y(t) can be characterized by the following two phases.
1) Stable phase: When t ≤ T0, the equilibrium value of y is λ0/γ (by setting dy/dt = 0, we have y = λ0/γ). As real-world measurement results show that a system can work in steady state in normal situations [1], we approximately set y(t) = λ0/γ for t ≤ T0.
2) Transition phase: When t > T0, the solution of dy/dt = λf − γ y is y(t) = λf/γ + C e^(−γ t), where C is a constant determined by the initial value of y. Recalling y(T0) = λ0/γ, we have C = −((λf − λ0)/γ) e^(γ T0). Thus, y(t) = λf/γ − ((λf − λ0)/γ) e^(−γ (t − T0)). Since λf > λ0, we have dy/dt > 0, meaning y(t) gradually approaches its new steady value λf/γ.

In summary, we have

    y(t) = λ0/γ,                                         t ≤ T0
    y(t) = λf/γ − ((λf − λ0)/γ) e^(−γ (t − T0)),         t > T0    (9)

We then derive the maximum size of flash crowd under which a system can keep working in RBA state, denoted by F_RBA. As shown in (7) and (9), in [0, T0], x(t) linearly increases and y(t) keeps stable. Then, for t ≥ T0, x(t) keeps stable and y(t) keeps increasing. Thus, at time T0, s y(t)/x(t) has its minimal value. Thus, the condition that the system can keep working in RBA state under a flash crowd is that s y(T0)/x(T0) ≥ d. Replacing y(T0) = λ0/γ, x(T0) = λf T0, and T0 = m/d into it, after simple mathematical transformation, we have

    F_RBA = λf/λ0 = s/(γ m)                               (10)

meaning that F_RBA is independent of λ0. Denote the initial numbers of startup peers and stable peers when the flash crowd occurs by x0 and y0, respectively. The system initial state is (x0, y0). As shown in (7) and (9), x0 and y0 are decided by λ0. Thus, (10) means that F_RBA is independent of the system initial state (x0, y0). Furthermore, as shown in (10), F_RBA is inversely proportional to the stable peer's departure rate γ, meaning that a system can keep stability under larger flash crowds if stable peers stay longer. We attribute this to the fact that stable peers staying longer contribute more.

We then use the models in (7) and (9) to study a P2P live streaming system's stabilization process in the RBA case with real system parameter settings. For this purpose, we consider a system with the following parameters, which are typical for a real-world P2P live streaming system.

• Peer departure rate γ = 0.05, which results in a mean staying time of 20 s. This choice is made based on observations from real system deployments wherein a large majority of sessions are short sessions [14], [27]. In the later studies in this paper, we will also use longer staying times.

• Chunk size c = 0.1 second's streaming content, which is the default chunk size used in PPLive [1].

• m = 10, which is the number of chunks that a startup peer must fetch to become a stable peer. As each chunk includes 0.1 second's streaming content, this configuration means that after a startup peer fetches 10 × 0.1 = 1 second's content, it becomes a stable peer.

• Peer residual upload bandwidth s = 2 chunks/s, which is realistic as a P2P live streaming service provider usually sets the streaming rate close to stable peers' upload bandwidth, in order to fully utilize peers' upload bandwidth and obtain high streaming quality. As a result, the ratio between peer upload rate and streaming rate is usually in the range from 1.1 to 1.3 [32], [33]. In this paper, we set the ratio to 1.2. As the streaming rate is 10 chunks/s, the peer residual upload bandwidth is 1.2 × 10 − 10 = 2 chunks/s.

Fig. 3. System stabilization process in RBA case.

• Peer downlink rate d = 12 chunks/s, which is 1.2 times the streaming rate. We select such a rate because streaming service providers tend to provide videos with higher resolution and users tend to select videos with higher resolution to watch when possible for better video quality. Thus, we set the peers' downlink rate slightly higher than the streaming rate, at 1.2 × 10 = 12 chunks/s.

We use λ0 = 10 peers per second, corresponding to a relatively popular channel. With the above parameter settings, the maximum peer arriving rate of flash crowd under which the system can keep working in RBA state is F_RBA λ0 = (s/(γ m)) λ0 = 40 peers/s. Thus, to show the system's stabilization process in the RBA case, we select a λf below this threshold. Fig. 3 plots the modeling results (see the curves for modeling results). As shown in Fig. 3, after the flash crowd occurs, the number of startup peers quickly increases to its new stable value λf T0, and the number of stable peers gradually increases to approach its new steady value λf/γ.
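A quick numeric check of (10) and the steady-state values under these settings (a sketch; the parameter values are the reconstructions above and should be treated as assumptions):

```python
gamma, m, s, d, lam0 = 0.05, 10, 2.0, 12.0, 10.0
T0 = m / d                        # fixed RBA startup latency: ~0.83 s
F_rba = s / (gamma * m)           # (10): maximum shock level for staying in RBA = 4.0
lam_rba = F_rba * lam0            # corresponding maximum arrival rate = 40 peers/s
x0, y0 = lam0 * T0, lam0 / gamma  # pre-crowd steady state: ~8.3 startup, 200 stable peers
print(T0, F_rba, lam_rba, x0, y0)
```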

D. System Under Flash Crowd: RBIA Case

This section studies the RBIA case. In this case, the system enters RBIA state after a flash crowd occurs and then may or may not recover back to RBA state, depending on the size of the flash crowd. Existing measurement results indicate that this situation frequently happens when flash crowds occur [4]. We first derive a model for characterizing a system in this case, and then examine the system performance with the model.

Consider a flash crowd that occurs at time t = 0. As the system works in RBA state when the flash crowd occurs, we have s y(t)/x(t) ≥ d when t ≤ 0. We next analyze the system's evolution process by the following five phases.

Phase 1: RBA phase. After the flash crowd occurs, x(t) increases as x(t) = λ0 T0 + (λf − λ0) t [see (7)] and y(t) keeps unchanged as y(t) = λ0/γ [see (9)]. Thus, s y(t)/x(t) keeps decreasing. If it drops below d before time T0, the system enters RBIA state. Denoting by t1 the time when s y(t)/x(t) drops to equal d, we have t1 < T0 and s y(t1)/x(t1) = d. Replacing y(t1) = λ0/γ and x(t1) = λ0 T0 + (λf − λ0) t1 into it, after simple mathematical transformation, we have t1 = (s λ0/(γ d) − λ0 T0)/(λf − λ0). As the system works in RBA state when t ≤ t1, the startup latency of a startup peer that finishes startup before time t1 is still T0. Thus, Phase 1 tells the startup process of peers arriving before t1 − T0.

Phase 2: Startup process of peers arriving during (t1 − T0, 0]. As the peer arriving at time t1 − T0 finishes its startup at time t1 and peers finish startups in FIFO pattern, if a peer arriving after t1 − T0 can finish startup, its startup finishing time must be later than t1. Thus, such a peer's downloading process consists of the following two stages: 1) During [a(t), t1], the system works in RBA state, and the peer downloads with rate d. 2) During [t1, t], the system works in RBIA state, and the peer has to download with rate s y/x. Thus, (5) can be rewritten as

    d (t1 − a(t)) + ∫ from t1 to t of s y(τ)/x(τ) dτ = m

Taking its derivative w.r.t. t, we have s y/x − d (da/dt) = 0, i.e.,

    da/dt = s y/(d x)                                     (11)

As we are studying peers arriving during (t1 − T0, 0], we have λ(a) = λ0. Replacing it and (11) into (3) and (4), we have

    dx/dt = λf − λ0 s y/(d x)                             (12)

    dy/dt = λ0 s y/(d x) − γ y                            (13)

Equations (11)–(13) are a system of ordinary differential equations (ODEs), which represent the system evolution process for the peers arriving during (t1 − T0, 0] to finish their startups (if possible). While the equations have irregular forms, so that we cannot obtain closed-form solutions of x(t), y(t), and a(t) from them, we can resolve them by numerical method (see the sketch following this proof).

We next prove that, according to (12) and (13), x(t) keeps increasing and y(t) keeps decreasing in Phase 2. Accordingly, the system keeps working in RBIA state in this phase. For this purpose, we first discuss the initial varying trends of x(t) and y(t) at t = t1. For x(t), as s y(t1)/x(t1) = d, according to (12), we have dx/dt = λf − λ0 > 0 at t = t1, meaning x(t) is increasing when t = t1. Thus, we have x(t) > x(t1) immediately after t1. For y(t), we rewrite (13) as follows:

    dy/dt = y (λ0 s/(d x) − γ)                            (14)

Since s y(t1)/x(t1) = d, replacing it into λ0 s/(d x), we have λ0 s/(d x(t1)) = λ0/y(t1). After simple mathematical transformation (recalling y(t1) = λ0/γ), we have λ0 s/(d x(t1)) = γ. Replacing it into (14), we have dy/dt = 0 at t = t1, meaning that y(t) does not change when t = t1. Thus, y(t) = y(t1) immediately after t1. As a result, s y(t)/x(t) < s y(t1)/x(t1) = d, i.e., the system enters RBIA state since t1.

We then prove that x(t) is increasing while y(t) is decreasing when t > t1. For x(t), as the system is in RBIA state at t > t1, i.e., s y/(d x) < 1, according to (12), we have dx/dt > λf − λ0 > 0, meaning x(t) is still increasing when t > t1. Thus, x(t) keeps growing beyond x(t1). For y(t), as x(t) > x(t1), we have λ0 s/(d x(t)) < λ0 s/(d x(t1)) = γ. Thus, according to (14), we have dy/dt < 0, meaning y(t) is decreasing when t > t1. Thus, y(t) keeps dropping below y(t1). As a result, s y(t)/x(t) keeps decreasing below d, i.e., the system is still in RBIA state. Following the above deriving strategy, it can be proven that x(t) keeps increasing and y(t) keeps decreasing, and the system keeps working in RBIA state, until the end of Phase 2. The proof is now complete.
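Since (11)–(13) have no closed-form solution, they can be integrated numerically. Below is a minimal sketch using SciPy under the reconstructed notation and parameter values (an illustration, not the authors' solver):

```python
import numpy as np
from scipy.integrate import solve_ivp

gamma, m, s, d = 0.05, 10, 2.0, 12.0
lam0, lam_f = 10.0, 80.0
T0 = m / d
t1 = (s * lam0 / (gamma * d) - lam0 * T0) / (lam_f - lam0)   # ~0.36 s

def phase2_rhs(t, state):
    x, y, a = state
    da = s * y / (d * x)          # (11)
    dx = lam_f - lam0 * da        # (12), lambda(a) = lam0 while a <= 0
    dy = lam0 * da - gamma * y    # (13)
    return [dx, dy, da]

# Initial conditions at t = t1: s*y(t1)/x(t1) = d, y(t1) = lam0/gamma, a(t1) = t1 - T0.
y1 = lam0 / gamma
sol = solve_ivp(phase2_rhs, (t1, 1.2), [s * y1 / d, y1, t1 - T0], max_step=1e-3)
x_end, y_end, a_end = sol.y[:, -1]   # valid while a(t) <= 0, i.e., within Phase 2
print(f"x = {x_end:.1f}, y = {y_end:.1f}, a = {a_end:.2f} at t = 1.2 s")
```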

Thus, the system's evolution process in Phase 2 has the following two possibilities: 1) The size of the flash crowd is extremely large, so that the system collapses, i.e., the number of stable peers drops to zero and the remaining startup peers will never finish their startups, meaning that the size of the arriving flash crowd has exceeded the system's capacity for handling flash crowds. 2) The number of stable peers, although it keeps decreasing, does not drop to zero at the end of Phase 2. We next focus on studying the second case to explore the largest flash crowd under which a system can recover to RBA state. Accordingly, the system enters the following Phase 3.

Phase 3: Startup process of peers arriving during (0, t1]. As the system enters RBIA state at time t1, similar to the peers arriving during (t1 − T0, 0] that we studied in Phase 2, peers arriving during (0, t1] also first download with rate d when t ≤ t1 and then download with rate s y/x after t1. Thus, (11) is still applicable for these peers. Moreover, as these peers arrive after time 0, we have λ(a) = λf. Replacing it and (11) into (3) and (4), we have dx/dt = λf − λf s y/(d x) and dy/dt = λf s y/(d x) − γ y. Similar to Phase 2, we can resolve them by numerical method.

After extensive experiments, we find that the system's state has the following three possibilities when Phase 3 terminates: 1) One extreme case is that the shock level of the flash crowd is extremely high, so that y(t) drops to zero and the system collapses. 2) Another extreme case is that the flash crowd's shock level is just slightly higher than F_RBA, and the system reenters RBA state. 3) The flash crowd's size is moderately large, such that the system is still working in RBIA state when all the peers arriving during (0, t1] finish their startups. Next, instead of focusing on extremely large flash crowds (for the first case) or too-moderate flash crowds (for the second case), we will study the third case to explore the largest flash crowd that a system can handle. Accordingly, the system enters the following Phase 4.

Phase 4: Startup process of peers arriving after t1 — system still in RBIA state. Peers that arrive after t1 and finish startup in this phase always download with rate s y/x. Thus, (5) can be rewritten as ∫ from a(t) to t of s y(τ)/x(τ) dτ = m. Taking its derivative w.r.t. t, we have s y(t)/x(t) − (s y(a)/x(a)) (da/dt) = 0, i.e., da/dt = (y(t) x(a))/(x(t) y(a)). Replacing it and λ(a) = λf (as a > t1 > 0) into (3) and (4), we have dx/dt = λf − λf (y x(a))/(x y(a)) and dy/dt = λf (y x(a))/(x y(a)) − γ y. These equations are not ODEs, as an ODE should not include the delayed terms x(a) and y(a). Fortunately, in our problem, a(t) is always smaller than t. Thus, we can resolve them by numerical method (a minimal marching pattern is sketched after this phase) to obtain x(t), y(t), and a(t). After extensive experiments, we find the system's evolution process has two possibilities: 1) the system collapses; 2) the system returns back to RBA state. In the latter case, the system enters the following Phase 5.

Phase 5: Startup process of peers arriving after t1 — system in RBA state again. Denote the time when the system reenters the RBA state by t2. We have s y(t2)/x(t2) = d. For peers arriving during (a(t2), t2], they first download with rate s y/x during [a(t), t2] and then download with rate d during [t2, t]. Thus, (5) can be rewritten as

    ∫ from a(t) to t2 of s y(τ)/x(τ) dτ + d (t − t2) = m

Taking its derivative w.r.t. t, we have d − (s y(a)/x(a)) (da/dt) = 0, i.e., da/dt = d x(a)/(s y(a)). Replacing it and λ(a) = λf (as a > t1) into (3) and (4), we have dx/dt = λf − λf d x(a)/(s y(a)) and dy/dt = λf d x(a)/(s y(a)) − γ y. Similar to Phase 4, we can resolve them by numerical method.

For peers arriving after t2, as they always download with rate d, (6) is applicable again. Thus, we have da/dt = 1, meaning that they have the fixed startup latency T0 again. Accordingly, we have r(t) = λ(a) = λf. Replacing it and λ(t) = λf into (3) and (4), we have dx/dt = 0, meaning that x enters its new steady state since then, and dy/dt = λf − γ y, whose solution is y(t) = λf/γ + C′ e^(−γ t), where C′ is decided by the system state at time t2 + T0, meaning that the number of stable peers gradually increases to approach its new steady value λf/γ.

We finally present the modeling results of the evolution process for the system described in Section IV-C when it works in the RBIA case. As the maximum arrival rate for keeping RBA state is 40 peers/s, we select λf = 80 and λf = 160 peers/s, respectively, for study. We first present the modeling results for λf = 80 peers/s. The system evolution process includes the following five phases.
1) Phase 1: 0 s < t ≤ 0.36 s. The system works in RBA state. In this phase, x keeps increasing and y keeps constant. Note that t1 = 0.36 s.
2) Phase 2: During (0.36 s, 1.2 s], the system works in RBIA state. x keeps increasing, and y keeps decreasing. When t = 1.2 s, the peer arriving at time 0 finishes its startup.
3) Phase 3: During (1.2 s, 2.58 s], the system still works in RBIA state, but both x and y keep increasing. When t = 2.58 s, the peer arriving at time t1 = 0.36 s finishes its startup.
4) Phase 4: During (2.58 s, 20.26 s], the system still works in RBIA state. During this process, x first keeps increasing during (2.58 s, 15.5 s]. At 15.5 s, x reaches its maximal value 380. Then, x starts decreasing. At the same time, y keeps increasing. When t = 20.26 s, s y/x rises back to d, meaning that the system reenters RBA state, i.e., t2 = 20.26 s.
5) Phase 5: After 20.26 s, the system works in RBA state. x keeps decreasing during (20.26 s, 21.09 s]. At 21.09 s, the peer arriving at time t2 = 20.26 s finishes its startup, and x reaches its new steady value. At the same time, y keeps increasing and gradually approaches its new steady value.

Fig. 4. System evolution process in RBIA case. (a) λf = 80 peers/s. (b) λf = 160 peers/s.

Fig. 4(a) plots the system evolution process for λf = 80 peers/s (see the curves for the modeling results). As shown in Fig. 4(a), after the flash crowd occurs, the number of startup peers keeps increasing until it reaches its maximal value 380 at 15.5 s, indicating that a large number of newly arrived peers are congested. This modeling result is consistent with the measurement results reported in [4]. Fortunately, the number of stable peers also keeps increasing, meaning that the system's residual upload bandwidth keeps increasing. As a result, at 15.5 s, the number of startup peers starts decreasing. Eventually, at 21.09 s, all previously congested startup peers are drained out, and the number of startup peers reaches its new steady value. This result shows that a P2P live streaming system has inherent capacity to recover stability under flash crowds, even in the RBIA case.

We then present the modeling results for λf = 160 peers/s. As shown in Fig. 4(b), when the flash crowd occurs, the number of startup peers keeps increasing, and the number of stable peers keeps decreasing until it drops to zero, i.e., the system collapses. Specifically, when 0 s < t ≤ 0.17 s, the system works in RBA state. After 0.17 s, x keeps increasing, and y keeps decreasing. Phase 2 ends at 5.72 s, and Phase 3 ends at 18.66 s. Then, y drops to zero in Phase 4. This result shows that the capacity of the system to recover stability from flash crowds in the RBIA case is limited.

E. System Capacity to Handle Flash Crowds

Based on the models derived in Section IV-D, this section analyzes the capacity of a system to handle flash crowds (i.e., without collapsing) and its relation to the system initial state (i.e., the initial numbers of startup peers and stable peers) and the stable peer's departure rate. Again, we consider the system described in Section IV-C. To find the capacity of this system, we vary λ0 from 2 to 100 peers per second. Then, for each λ0, we gradually increase λf with a step size of 0.1 peers/s to observe the system's evolution process under flash crowds with different λf. We also change the value of γ to find the relation between the capacity and γ. We have the following three findings.

Finding 1: For a system with given λ0 and γ, it collapses when λf is larger than a certain value. When λf is lower than this value, however, the system can avoid collapse and recover stability. We call this threshold of λf the system's maximum supportable peer arrival rate and denote it as λmax.


Fig. 5. System capacity to handle flash crowds. (a) Relation between maximum supportable shock level Fmax and departure rate of stable peer γ. (b) Relation between peers' arriving time and their startup latencies when the system recovers stability from the RBIA case. (c) Relation of the worst-case peer startup latency Tmax to shock level of flash crowd for systems with and without admission control, respectively. (d) Relation between peer waiting time and arriving time when the system initial residual bandwidth is lower than the new peer arriving rate.

TABLE II. λmax for the system described in Section IV-C with changing γ.

Finding 2: For a system with given γ, the maximum shock level of flash crowd under which a system can avoid collapse and recover stability is a constant and is independent of λ0. Denote it by Fmax. As the system initial state is decided by λ0 (i.e., x0 = λ0 T0 and y0 = λ0/γ), Finding 2 can be understood as saying that Fmax is independent of the system initial state. Based on these considerations, it is natural for us to use Fmax to quantify (represent) a system's capacity to handle flash crowds.

According to Finding 2, we can answer the question raised but left open in [4] regarding the relationship between the initial number of stable peers y0 and the maximum supportable peer arriving rate λmax. Specifically, as λmax = Fmax λ0, Finding 2 means that λmax is proportional to λ0. Thus, Finding 2 also means that λmax is proportional to the initial system state (x0, y0), as the system initial state is decided by λ0. Thus, a system with more initial startup and stable peers can sustain flash crowds with a higher λf. Moreover, Finding 2 also enables us to estimate the performance of the server-assisted method to handle flash crowds, according to which extra assisting servers are utilized to serve newly arrived peers to alleviate the effect of flash crowds. Specifically, the introduction of extra assisting servers is equivalent to increasing the number of stable peers from the perspective of increasing system residual bandwidth. For example, suppose the uploading bandwidth of an assisting server is 1.2 Mb/s while the residual uploading bandwidth of a stable peer is 80 kb/s. In this case, the introduction of an extra server can be approximately seen as adding 1.2 Mb/s ÷ 80 kb/s = 15 extra always-online stable peers into the system. This will be very helpful for system stabilization in the presence of a flash crowd that causes a shortage of uploading resources. Note that the server does not consume any uploading resources from stable peers in the system.
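The search for λmax described in Finding 1 can be sketched as follows, reusing the Euler integrator from Section IV-B and replacing the paper's 0.1-step scan with bisection. Both collapses() and lam_max() are hypothetical helpers for illustration, not the authors' code:

```python
import numpy as np

def collapses(lam_f, lam0=10.0, gamma=0.05, m=10, s=2.0, d=12.0, T=200.0, dt=1e-3):
    """Integrate the fluid model of Section IV-B; report whether y(t) hits zero."""
    n = int(T / dt)
    b_hist = np.empty(n)
    x, y, a = lam0 * m / d, lam0 / gamma, -m / d
    for i in range(n):
        b = min(d, s * y / max(x, 1e-9))
        b_hist[i] = b
        b_at_a = d if a <= 0 else b_hist[min(int(a / dt), i)]
        da = b / max(b_at_a, 1e-9)
        r = (lam0 if a < 0 else lam_f) * da
        x += (lam_f - r) * dt
        y += (r - gamma * y) * dt
        a += da * dt
        if y <= 1e-3:
            return True            # stable peers exhausted: collapse
    return False

def lam_max(lam0=10.0, gamma=0.05, tol=0.1):
    lo, hi = lam0, 1000.0          # bisect between "recovers" and "collapses"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if collapses(mid, lam0, gamma) else (mid, hi)
    return lo

print(lam_max())   # should land near F_max * lam0 (about 10 * 10 for these values)
```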

Finding 3: Fmax decreases with the departure rate of stable peers, and the relationship is a power law. Table II shows the values of Fmax for the system described in Section IV-C with different γ. Fig. 5(a) plots the relation between Fmax and γ as listed in Table II on a log-log scale and shows a very straight line, which means that the relation between Fmax and γ is a power law. We conduct linear regression on the modeling result and obtain a fitting curve with R² of 0.9994, meaning a nearly perfect fit. This result means that a system can handle larger flash crowds if its stable peers stay longer. We attribute this to the fact that stable peers staying longer can contribute more to newly joined peers.
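The log-log regression of Finding 3 can be sketched as below, reusing the hypothetical lam_max() helper above to generate the (γ, Fmax) pairs, since the actual Table II values are not recoverable from this copy; the sampled γ values are assumptions:

```python
import numpy as np

gammas = np.array([0.02, 0.05, 0.1, 0.2])                  # assumed departure rates
F = np.array([lam_max(10.0, g) / 10.0 for g in gammas])    # F_max = lam_max / lam0
slope, intercept = np.polyfit(np.log(gammas), np.log(F), 1)
# A power law F_max = A * gamma**k is a straight line in log-log coordinates;
# the fitted slope estimates k and exp(intercept) estimates A.
print(slope, np.exp(intercept))
```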

F. Peer Startup Latency

This section studies the peer startup latency of a P2P live streaming system under flash crowds. As explained in Section III-B, P2P live streaming systems have a stringent real-time requirement in terms of peer startup latency. Thus, if peers experience too long a startup latency when a flash crowd occurs, the system's performance is also unacceptable. According to our models, we can resolve the equations and obtain a(t) by numerical method. As a(t) is the arriving time of the peer who finishes its startup at time t, the peer's startup latency is t − a(t). Fig. 5(b) plots the peer startup latency as a function of peer arrival time for the system described in Fig. 4(a). As shown in Fig. 5(b), peers arriving after the occurrence of the flash crowd experience longer startup latency, meaning that peers' startups are slowing down. Then, peers arriving after 5.97 s have reduced startup latency, meaning that peers' startups are accelerated as the number of stable peers increases. Eventually, peers arriving after 20.26 s (i.e., t2) can finish startup in the constant latency T0 again. Such results are consistent with the system stabilization process shown in Fig. 4(a).

We then evaluate the relationship between the size of flash crowd and the worst-case peer startup latency, which reflects the longest time that newly arrived peers take to finish startup after a flash crowd occurs. Denote it by Tmax. As shown in Fig. 5(b), in this experiment Tmax is experienced by the peer that arrives at time 5.97 s. We find that Tmax is only relevant to the flash crowd's shock level F. Fig. 5(c) plots Tmax as a function of F for the system described in Section IV-C. As shown in Fig. 5(c), without admission control, Tmax first increases exponentially with the shock level. When F further increases, Tmax increases more quickly than exponentially. Furthermore, as F approaches Fmax (which is 10.09 for the system under study), the system approaches collapse (i.e., Tmax approaches infinity). We do curve fitting to the modeled curve. The obtained fitting curve is as follows:

    (15)

with small root mean square error (RMSE) and mean absolute error (MAE), meaning a good fitting.
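Given a(t) from any of the integrators above, the latency curve of Fig. 5(b) and the worst case Tmax fall out directly; a small helper (hypothetical, for illustration):

```python
import numpy as np

def worst_case_latency(finish_t: np.ndarray, arrival_a: np.ndarray):
    """The peer finishing at finish_t[i] arrived at arrival_a[i], so its startup
    latency is finish_t[i] - arrival_a[i]; return (arrival time of the worst-off
    peer, T_max)."""
    lat = finish_t - arrival_a
    i = int(np.argmax(lat))
    return float(arrival_a[i]), float(lat[i])
```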

G. System Recovery Time

This section analyzes the system recovery time, which means the time required by a system to recover to RBA state after the occurrence of a flash crowd that causes the system to degrade and work in RBIA state. Denote it by Tr. For instance, for the experiment shown in Fig. 5(b), Tr is 20.26 s. We study the relationship between Tr and the size of flash crowd with our models. We find that Tr is also only relevant to F and approaches infinity when F approaches Fmax. We perform curve fitting on the obtained curve; the obtained fitting curve also fits well, with small RMSE and MAE.

V. SYSTEMS WITH ADMISSION CONTROL

The analysis results in Section IV-D show that the intense competition for system residual upload bandwidth among newly joined peers is the key reason for system collapse upon large flash crowds. Specifically, this competition causes newly joined peers to become congested and unable to finish their startups, so they cannot be promoted to stable peers; meanwhile, existing stable peers keep leaving the system, so the number of stable peers eventually drops to zero and the system collapses. Since admission control protects existing startup peers from the impact of excessive newly arrived peers, so that the existing startup peers can finish startup and become stable peers that serve the remaining startup peers, it could be effective in solving this problem.

The admission control approach we shall study works as follows: keep the sum of the downlink rates of the admitted startup peers always equal to the system residual bandwidth until there is no peer waiting for admission. Thus, the admitted peers can download at their downlink rate (which is usually high) and quickly obtain the chunks required for startup. Then, they can serve the remaining newly arrived peers in return. In a real system, when a peer joins a channel, it first contacts the tracker server to register itself and obtain a list of possible neighbors. The desired admission control strategy can be applied at this stage. Moreover, a distributed implementation can also be adopted [26].

Similar to the analysis of systems without admission control in Section IV, in this section we first mathematically model the evolution process of systems with admission control in the presence of flash crowds and then use the derived model to evaluate the system performance under flash crowds.

A. System Model

We build a fluid model to characterize a system with admission control under flash crowds. Consider a system experiencing a flash crowd with new peer arrival rate $\lambda$ at $t = 0$; the arrival rate before the flash crowd is $\lambda_0$, so the shock level is $\sigma = \lambda/\lambda_0$. Let $s$ represent a stable peer's residual bandwidth measured in number of startup peers per second. Alternatively, $s$ can be interpreted as how many newly joined peers can finish startup using a single stable peer's unit-time residual bandwidth. Thus, $s$ equals a stable peer's residual upload rate divided by the amount of data a peer must download to finish startup. Accordingly, the system's residual bandwidth is $s y$, where $y$ denotes the number of stable peers. We further define a new concept: the waiting peer, which is a peer that has arrived at the system but has not yet been admitted to begin startup. We denote the number of peers waiting for admission at time $t$ by $w(t)$ and its derivative w.r.t. $t$ by $\dot{w}(t)$. We present the model in (16) and (17), where $w(t)$, $y(t)$, and $\dot{w}(t)$ are written as $w$, $y$, and $\dot{w}$, respectively, to simplify the description:

$$\dot{w} = \begin{cases} \lambda - s y, & w > 0 \\ 0, & w = 0 \text{ and } s y \ge \lambda \\ \lambda - s y, & w = 0 \text{ and } s y < \lambda \end{cases} \tag{16}$$

$$\dot{y} = \begin{cases} s y - \gamma y, & w > 0 \\ \lambda - \gamma y, & w = 0 \text{ and } s y \ge \lambda \\ s y - \gamma y, & w = 0 \text{ and } s y < \lambda \end{cases} \tag{17}$$

where $\gamma$ denotes the departure rate of stable peers.

The model can be explained as follows.

Case 1: $w > 0$ (meaning there are newly arrived peers waiting for admission). As the system residual upload bandwidth has been fully utilized (otherwise there would be no peers waiting for admission) and the admitted peers download at their downlink rates, the peer startup rate equals the system residual bandwidth $s y$. Thus, the rate at which $w$ changes is $\dot{w} = \lambda - s y$, and the rate at which $y$ changes is $\dot{y} = s y - \gamma y$.

Case 2: $w = 0$ (meaning there is no newly arrived peer waiting for admission) and $s y \ge \lambda$ (meaning the system's residual bandwidth is abundant, i.e., not lower than the peer arrival rate $\lambda$). Based on the definition of $s$, $s y$ is the number of newly admitted peers that can become stable in unit time using the system's residual bandwidth. Thus, $s y \ge \lambda$ means that the newly arrived peers that arrive in unit time can all become stable in unit time. Therefore, no peer waits for admission, $w$ remains zero with $\dot{w} = 0$, and the peer startup rate equals the peer arrival rate $\lambda$. We have $\dot{y} = \lambda - \gamma y$.

Case 3: $w = 0$ and $s y < \lambda$ (meaning the system's residual bandwidth is not abundant enough, i.e., the peer arrival rate exceeds it). Based on the definition of $s$, $s y < \lambda$ means that, among the $\lambda$ newly arrived peers that arrive in unit time, only $s y$ peers can finish startup and become stable, and the remaining $\lambda - s y$ peers have to wait for admission. Thus, the peer startup rate equals the system residual bandwidth $s y$. As a result, $\dot{w} = \lambda - s y$ and $\dot{y} = s y - \gamma y$.
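To make the piecewise dynamics concrete, the following minimal sketch integrates (16) and (17) with forward Euler. It is not the authors' code; all parameter values ($s$, $\gamma$, $\lambda_0$, and the shock level) are illustrative assumptions chosen to satisfy the precondition $s > \gamma$ of Theorem 1 below.

```python
# Forward-Euler integration of the fluid model (16)-(17).
# All parameter values are illustrative assumptions, not the paper's settings.
s, gamma = 0.2, 0.05        # per-stable-peer residual bandwidth, departure rate
lam0, sigma = 1.0, 20.0     # pre-crowd arrival rate and shock level
lam = sigma * lam0          # arrival rate after the flash crowd occurs
y, w = lam0 / gamma, 0.0    # pre-crowd steady state: y0 = lam0/gamma, w0 = 0
dt, T = 0.01, 400.0

peak_w = 0.0
for _ in range(int(T / dt)):
    if w > 0 or s * y < lam:   # Cases 1 and 3: startup rate is the residual bandwidth s*y
        dw, dy = lam - s * y, s * y - gamma * y
    else:                      # Case 2: residual bandwidth abundant, startup rate is lam
        dw, dy = 0.0, lam - gamma * y
    w = max(0.0, w + dw * dt)  # w is a queue length and cannot go negative
    y += dy * dt
    peak_w = max(peak_w, w)

print(f"peak queue length: {peak_w:.1f} waiting peers")
print(f"final stable peers: {y:.1f} (Theorem 1 predicts lam/gamma = {lam / gamma:.1f})")
```

Running the sketch shows the queue of waiting peers building up, draining out, and the number of stable peers settling at the new steady state, mirroring the two-phase behavior analyzed next.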

B. Performance Analysis

Based on the above model, we analyze the system performance under flash crowds.

1) System Capacity to Handle Flash Crowds: We first study an admission-control-enabled system's capacity to handle flash crowds.


The state of the system can be characterized by the pair of variables $(w, y)$. We have the following theorem.

Theorem 1: When $s > \gamma$, a P2P live streaming system with admission control has superior scalability under flash crowds: It can drain out all waiting peers and recover to a new steady state under flash crowds of any size, and the new system steady state is $w = 0$ and $y = \lambda/\gamma$.

Proof: Consider a flash crowd occurring at $t = 0$, and suppose some newly arrived peers are already waiting for admission at $t = 0$ (this is the worst case). Denoting the initial number of waiting peers by $w_0$, we have $w_0 > 0$. Since $w > 0$, according to (16) and (17), we have $\dot{w} = \lambda - s y$ and $\dot{y} = (s - \gamma) y$. Since $s > \gamma$, we have $\dot{y} > 0$ and thus $y = C e^{(s-\gamma)t}$, where $C$ is a constant determined by the initial value of $y$ at $t = 0$. Thus, the number of stable peers increases exponentially, and the system's residual bandwidth $s y$ increases exponentially as well. As the initial value of the stable peer number is $y_0$, the initial system residual bandwidth is $s y_0$.

We have the following two cases.

Case 1: $s y_0 \ge \lambda$, i.e., the system's initial residual bandwidth is higher than or equal to the new peer arrival rate. In this case, as $y$ keeps increasing with time, we have $s y \ge s y_0 \ge \lambda$. Thus, $\lambda - s y \le 0$. Therefore, $\dot{w} \le 0$, meaning that $w$ keeps decreasing with time. Let $y_1$ represent the number of stable peers when $w$ drops to 0, and call this time the transition time. When $w$ drops to 0, as $w = 0$ and $s y \ge \lambda$, according to (17), we have $\dot{y} = \lambda - \gamma y$. By simple integral calculation, we have $y = \lambda/\gamma + C_1 e^{-\gamma t}$, where $C_1$ is a constant determined by the value of $y$ at the transition time (i.e., $y_1$). Thus, $y \to \lambda/\gamma$ as $t \to \infty$. We now proceed to prove that the peer arrival rate is always smaller than the residual bandwidth during this process: First, if $y_1 \ge \lambda/\gamma$, then $y$ monotonically decreases from $y_1$ toward $\lambda/\gamma$, and thus $y \ge \lambda/\gamma$ during this process. As $s > \gamma$ (see the precondition claimed in Theorem 1), we have $s y \ge s\lambda/\gamma > \lambda$. Second, if $y_1 < \lambda/\gamma$, then $y$ monotonically increases from $y_1$ toward $\lambda/\gamma$, and thus $y \ge y_1$ during this process. As $s y_1 \ge \lambda$, we have $s y \ge \lambda$. Thus, we always have $s y \ge \lambda$ and $w = 0$ after the transition time. Accordingly, the evolution of $y$ can always be represented by $\dot{y} = \lambda - \gamma y$, which means $y \to \lambda/\gamma$ as $t \to \infty$. Thus, Case 1 is established.

Case 2: $s y_0 < \lambda$, i.e., the system's initial residual bandwidth is lower than the new peer arrival rate. In this case, as $s y < \lambda$ initially, we have $\dot{w} = \lambda - s y > 0$, meaning $w$ keeps increasing. Simultaneously, as $w > 0$ and $s > \gamma$, $y$ increases exponentially. When $s y$ eventually exceeds $\lambda$, we have $\dot{w} = \lambda - s y < 0$, meaning that $w$ starts decreasing. From this time on, the system dynamics are similar to those described in Case 1. Thus, Case 2 is also established.

2) System Stabilization Process Under Flash Crowds: In this section, we obtain closed-form expressions of $w(t)$ and $y(t)$ when a flash crowd occurs in order to characterize the system's stabilization process under a flash crowd. Assume that the P2P live streaming system with admission control works in the steady state $w = 0$ and $y_0 = \lambda_0/\gamma$ when the flash crowd occurs at time $t = 0$. This assumption is reasonable, as proven in Theorem 1. We have the following two cases.

Case 1: $s y_0 \ge \lambda$. In the proof of Theorem 1 (see Case 1), we have shown that when the initial condition is $w = 0$ and $s y \ge \lambda$ (after the transition time), the evolution of the system state is given by $w = 0$ and $\dot{y} = \lambda - \gamma y$. In the current case, we have a similar initial condition (i.e., $w = 0$ and $s y_0 \ge \lambda$) and thus similar results, i.e., $w = 0$, and with $y(0) = y_0$, we have $y = \lambda/\gamma + (y_0 - \lambda/\gamma)e^{-\gamma t}$ and $y \to \lambda/\gamma$ as $t \to \infty$.

Case 2: $s y_0 < \lambda$. In the proof of Theorem 1 (see Case 2), we have shown that there are two phases in this case. In the first phase, we have $\dot{y} = (s - \gamma)y$ and $\dot{w} = \lambda - s y$. Solving them with the initial condition $y(0) = y_0$ and $w(0) = 0$, we have $y = y_0 e^{(s-\gamma)t}$ and $w = \lambda t - \frac{s y_0}{s-\gamma}\left(e^{(s-\gamma)t} - 1\right)$. When $w$ drops to 0, the process enters the second phase. In the second phase, we have $w = 0$ and $\dot{y} = \lambda - \gamma y$. Denote the phase conversion time by $t_c$. We can obtain it by solving $w(t_c) = 0$, i.e., $\lambda t_c = \frac{s y_0}{s-\gamma}\left(e^{(s-\gamma)t_c} - 1\right)$. With $y(t_c) = y_0 e^{(s-\gamma)t_c}$ as the initial condition, we have $y = \lambda/\gamma + \left(y_0 e^{(s-\gamma)t_c} - \lambda/\gamma\right)e^{-\gamma(t - t_c)}$ for $t \ge t_c$.

Fig. 6. System dynamics in the presence of a flash crowd. (a) Case 1: $s y_0 \ge \lambda$. (b) Case 2: $s y_0 < \lambda$.
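The phase conversion time $t_c$ solves a transcendental equation and has no elementary closed form, but it is easy to obtain numerically. Below is a small sketch (using the same illustrative parameters as the earlier sketch, not the paper's settings) based on SciPy's brentq root finder.

```python
# Numerically solving lam*t = (s*y0/(s-gamma))*(exp((s-gamma)*t) - 1) for t_c > 0.
import numpy as np
from scipy.optimize import brentq

s, gamma, lam0, sigma = 0.2, 0.05, 1.0, 20.0   # illustrative assumptions
lam, y0 = sigma * lam0, lam0 / gamma           # Case 2 requires s*y0 < lam

def w(t):
    """Number of waiting peers during the first phase."""
    return lam * t - s * y0 / (s - gamma) * (np.exp((s - gamma) * t) - 1.0)

# w(t) rises from 0, peaks where dw/dt = 0, and falls back to 0 at t_c,
# so we bracket the root strictly to the right of the peak.
t_peak = np.log(lam / (s * y0)) / (s - gamma)
t_c = brentq(w, t_peak, 10.0 * t_peak + 100.0)
print(f"phase conversion time t_c ~= {t_c:.1f} s")
```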

Fig. 6 demonstrates the system stabilization processes of the above two cases according to the modeling results (see the curves for the modeling results). Again, we use the system described in Section IV-C; when the flash crowd occurs, the system works in the pre-crowd steady state, i.e., $w = 0$ and $y_0 = \lambda_0/\gamma$. The difference is as follows. In Fig. 6(a), the new peer arrival rate $\lambda$ does not exceed the initial residual bandwidth, i.e., $s y_0 \ge \lambda$, so it is an example of Case 1. In contrast, in Fig. 6(b), the new peer arrival rate is higher and $s y_0 < \lambda$, so it is an example of Case 2.

3) Peer Startup Latency: In this section, we analyze peers' startup latency in a P2P live streaming system with admission control under flash crowds. We consider a common implementation method: Newly arrived peers are placed in a FIFO queue to await admission. In such a system, a peer's startup latency consists of two parts: waiting time and downloading time. The waiting time is defined as the time from the instant when a peer arrives at the system to the instant when the peer is admitted to join the system. The downloading time is the time taken by the peer to finish startup after it is admitted to join. With admission control, an admitted startup peer can download at its downlink rate; thus, its downloading time equals the amount of data required for startup divided by the downlink rate and is a constant. We therefore focus on analyzing the waiting time. Regarding this, we have the following theorem.

Theorem 2: When a P2P live streaming system with a FIFO-like admission control mechanism experiences a flash crowd, a newly arrived peer's waiting time increases logarithmically with the shock level of the flash crowd, in the worst case.

Proof: Consider a flash crowd occurring at time $t = 0$. Denote the waiting time of a peer that arrives at time $t_a$ by $T(t_a)$. We have the following two cases for obtaining $T(t_a)$.


Case 1: $s y_0 \ge \lambda$. In Section V-B.2, we have derived that in this case $w$ always equals zero, meaning that no peer waits for admission and every newly arrived peer can begin startup immediately once it arrives. Thus, $T(t_a) = 0$.

Case 2: $s y_0 < \lambda$. In Section V-B.2, we have derived that in this case $w$ is larger than zero and keeps increasing (for some time). As peers are admitted to join in a FIFO pattern, the peer arriving at time $t_a$ has to wait until all the peers in front of it finish their startup. Since the peer startup rate is $s y(t)$, and these peers will have finished startup by $t_a + T(t_a)$, we have $\int_{t_a}^{t_a + T(t_a)} s\,y(t)\,dt = w(t_a)$. Replacing $y(t) = y_0 e^{(s-\gamma)t}$ and $w(t_a) = \lambda t_a - \frac{s y_0}{s-\gamma}\left(e^{(s-\gamma)t_a} - 1\right)$ (the solutions of $y$ and $w$ derived in Section V-B.2 for this case) into it, we have $T(t_a) = \frac{1}{s-\gamma}\ln\left(1 + \frac{(s-\gamma)\lambda t_a}{s y_0}\right) - t_a$. As $\lambda/(s y_0) = \gamma\sigma/s$, this result indicates that the peer waiting time $T(t_a)$ increases logarithmically with the flash crowd's shock level $\sigma$.
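As a quick check of this closed form, the sketch below evaluates $T(t_a)$ for a few arrival times under the illustrative parameters used in the earlier sketches. The expression is clamped at zero because it applies only while the queue persists; it reaches exactly zero at the moment the queue drains.

```python
# Evaluating the first-phase waiting time T(t_a) derived above.
import numpy as np

s, gamma, lam0, sigma = 0.2, 0.05, 1.0, 20.0   # illustrative assumptions
lam, y0 = sigma * lam0, lam0 / gamma

def t_wait(t_a):
    t = np.log1p((s - gamma) * lam * t_a / (s * y0)) / (s - gamma) - t_a
    return max(0.0, t)   # zero once the queue has drained

for t_a in (0.0, 2.0, 5.0, 10.0, 20.0):
    print(f"t_a = {t_a:5.1f} s  ->  waiting time {t_wait(t_a):.2f} s")
# The waiting time rises, peaks at t_a* = (lam - s*y0)/((s-gamma)*lam),
# and then falls back to zero, matching the shape of Fig. 5(d).
```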

Fig. 5(d) plots the peer waiting time as a function of peer arrival time when a flash crowd occurs at $t = 0$ for the system considered in Fig. 6(b). The results shown in Fig. 5(d) are consistent with the system stabilization process shown in Fig. 6(b): Peers' waiting time first increases, meaning that more and more peers are congested; it then decreases, meaning that the congested peers are drained out (admitted) gradually. In particular, after the system recovery time, the waiting time of newly arrived peers drops to zero, meaning that every newly arrived peer can begin startup immediately once it arrives.

We then examine the worst-case peer waiting time when a flash crowd occurs. Denote it by $T_{\max}$. Taking the derivative of $T(t_a)$ w.r.t. $t_a$ and setting it equal to 0, we have $t_a^* = \frac{\lambda - s y_0}{(s-\gamma)\lambda}$. Replacing it into $T(t_a)$, we have

$$T_{\max} = \frac{1}{s-\gamma}\left(\ln\frac{\gamma\sigma}{s} + \frac{s}{\gamma\sigma} - 1\right) \tag{18}$$
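The sketch below evaluates (18) over a range of shock levels (illustrative parameters; valid in Case 2, i.e., for $\sigma > s/\gamma$) to make the logarithmic growth concrete.

```python
# Evaluating the worst-case waiting time (18) over a range of shock levels.
import numpy as np

s, gamma = 0.2, 0.05   # illustrative assumptions with s > gamma

def t_max(sigma):
    return (np.log(gamma * sigma / s) + s / (gamma * sigma) - 1.0) / (s - gamma)

for sigma in (5, 10, 100, 1000):
    print(f"sigma = {sigma:5d}  ->  T_max ~= {t_max(sigma):6.2f} s")
# Each tenfold increase in sigma eventually adds about
# ln(10)/(s - gamma) ~= 15.4 s, i.e., logarithmic growth.
```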

Based on (18), Fig. 5(c) plots how the worst-case peer startup latency changes with $\sigma$ for the system with admission control (see the curve "$T_{up}$: With Admission Control"). The system parameter settings are the same as those described in Section IV-C. For a system with admission control, the worst-case peer startup latency is $T_{up} = T_{\max} + T_d$, i.e., the worst-case peer waiting time plus the fixed peer downloading time $T_d$. As shown in Fig. 5(c), when $\sigma$ is smaller than 6.0, the worst-case peer startup latencies with and without admission control are similar. This is because, when $\sigma$ is small, congestion in the system without admission control is not severe, so newly arrived peers can still finish startup quickly. As $\sigma$ increases, however, $T_{up}$ for the system without admission control increases first exponentially and then even faster as $\sigma$ approaches $\sigma_{\max}$. In comparison, for the system with admission control, $T_{up}$ always increases logarithmically with $\sigma$. As a result, the improvement brought by admission control in terms of $T_{up}$ increases quickly as $\sigma$ increases. This result demonstrates the superiority of admission control in handling large flash crowds from the perspective of reducing the worst-case peer startup latency.

4) System Recovery Time: In this section, we analyze the system recovery time $T_R$, i.e., the time required, from the occurrence of a flash crowd, for the system to recover back to the state in which newly arrived peers need not wait for admission and can begin their startup immediately. We have the following theorem.

Theorem 3: For a P2P live streaming system with admission control, its system recovery time $T_R$ increases logarithmically with the shock level of the flash crowd, in the worst case.

Proof: Consider a flash crowd occurring at time $t = 0$. We have the following two cases for obtaining $T_R$.

Case 1: $s y_0 \ge \lambda$. In Section V-B.2, we have derived that in this case the number of waiting peers $w$ always equals zero, meaning that there is no peer waiting for admission and each newly arrived peer can begin startup immediately after it arrives. Thus, $T_R = 0$.

Case 2: $s y_0 < \lambda$. In this case, the system recovery time is the time at which $w$ returns to 0. Thus, we have $\lambda T_R = \frac{s y_0}{s-\gamma}\left(e^{(s-\gamma)T_R} - 1\right)$. Using $\lambda = \sigma\gamma y_0$, rewrite it as

$$\sigma\gamma T_R = \frac{s}{s-\gamma}\left(e^{(s-\gamma)T_R} - 1\right) \tag{19}$$

Taking its derivative w.r.t. $\sigma$, we have

$$\frac{dT_R}{d\sigma} = \frac{\gamma T_R}{s\,e^{(s-\gamma)T_R} - \sigma\gamma} \tag{20}$$

According to (19), we have $s\,e^{(s-\gamma)T_R} = s + (s-\gamma)\sigma\gamma T_R$. Substituting it into (20), we have $\frac{dT_R}{d\sigma} = \frac{\gamma T_R}{s - \sigma\gamma + (s-\gamma)\sigma\gamma T_R}$. When the flash crowd size is large and consequently the system recovery time $T_R$ is also large, this can be approximated as $\frac{dT_R}{d\sigma} \approx \frac{1}{(s-\gamma)\sigma}$. Thus, we have $T_R \approx \frac{\ln\sigma}{s-\gamma} + c$, where $c$ is a constant. This result indicates that the system recovery time $T_R$ increases approximately logarithmically with the shock level $\sigma$.
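As a numerical check of Theorem 3, the sketch below solves (19) for a geometric sequence of shock levels (illustrative parameters, assuming Case 2 holds, i.e., $\sigma > s/\gamma$); the spacing between successive recovery times tends toward $\ln(10)/(s-\gamma)$ as $\sigma$ grows.

```python
# Solving (19) for T_R and checking the logarithmic growth of Theorem 3.
import numpy as np
from scipy.optimize import brentq

s, gamma = 0.2, 0.05   # illustrative assumptions

def recovery_time(sigma):
    f = lambda t: sigma * gamma * t - s / (s - gamma) * (np.exp((s - gamma) * t) - 1.0)
    t_peak = np.log(sigma * gamma / s) / (s - gamma)   # f is positive at its peak
    return brentq(f, t_peak, 10.0 * t_peak + 100.0)    # root lies beyond the peak

for sigma in (10, 100, 1000):
    print(f"sigma = {sigma:5d}  ->  T_R ~= {recovery_time(sigma):5.1f} s")
# The successive differences tend toward ln(10)/(s - gamma) ~= 15.4 s.
```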

VI. SIMULATION RESULTS WITH PRACTICAL CONSIDERATIONS

In this section, we conduct simulations to validate the derived models while considering implementation details of real systems, such as fluctuations in peer arrivals, chunk-by-chunk data exchange (instead of the continuous fluid assumed in the analytical models), and network transmission delay.

A. Simulator Description

We perform event-driven discrete-time simulations, which simulate peers' startup processes as follows. Startup peers request chunks to fill up their buffers, starting from the buffer tail and in sequential order. After a startup peer issues a request, if there are idle stable peers available, a randomly selected idle stable peer serves this request. The request arrives at the stable peer after a network propagation delay chosen randomly and uniformly in (0, 100] ms. When the idle stable peer receives the request, it changes its state from "idle" to "busy" and sends the chunk back to the startup peer. Since the startup peer's downlink rate is larger than a stable peer's residual upload bandwidth, the transmission of a chunk is limited by the stable peer's residual upload rate and takes a fixed time. After that time, the startup peer obtains the chunk and the stable peer changes its state back to "idle." In contrast, if there is no idle stable peer, the startup peer's request is rejected, and the requesting peer retries after a delay chosen randomly and uniformly in (0, 100] ms as well. When the startup peer has obtained all the chunks required for startup, it becomes a stable peer and starts uploading to the remaining startup peers. With the above configuration, we simulate the competition for residual bandwidth among startup peers and their transitions to stable peers. We do not consider how peer neighbors


are discovered and maintained, since existing measurement results indicate that the system bottleneck when a flash crowd occurs is the shortage of upload bandwidth rather than neighbor discovery [4]. We also simulated other delay distributions, e.g., uniform in (0, 200] or (0, 50] ms, and obtained similar results. To simulate admission control, we implemented the admission control approach with the following option for chunk fetching: After a stable peer starts serving a startup peer, it does not accept any other startup peer's requests until the served startup peer finishes its startup, to ensure the service quality of chunk uploading. We begin each simulation with a warm-up period, allowing the system to reach steady state at an initial peer arrival rate $\lambda_0$. We then simulate a flash crowd by abruptly increasing the arrival rate to $\lambda$. We model peers' arrival behavior using a Poisson process and the staying time of stable peers using an exponential distribution. For instance, to simulate a peer arrival rate $\lambda$, we randomly choose peer interarrival times according to an exponential distribution with mean $1/\lambda$. This choice is motivated by observations of real user arrival behavior [2]. We wrote our simulation code based on SimPy, an open-source discrete-event simulation framework based on standard Python [34].
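For concreteness, here is a heavily simplified SimPy sketch of the event-driven simulator described above. It is not the authors' simulator: it omits admission control, tracks stable peers with plain counters rather than explicit neighbor state, and uses illustrative parameter values throughout.

```python
# A heavily simplified SimPy sketch of the chunk-level simulator described
# above; all parameter values are illustrative assumptions.
import random
import simpy

CHUNKS = 10            # chunks a peer must fetch to finish startup (assumed)
CHUNK_TIME = 0.5       # seconds a stable peer spends uploading one chunk (assumed)
GAMMA = 0.01           # stable-peer departure rate, 1/s (assumed)
LAM0, LAM1 = 0.5, 5.0  # arrival rates before / after the flash crowd (assumed)
T_FLASH, T_END = 300.0, 600.0

class State:
    stable = 20        # assumed initial number of stable peers
    busy = 0           # stable peers currently uploading a chunk

def stable_lifetime(env, st):
    """A stable peer stays for an exponentially distributed time, then leaves."""
    yield env.timeout(random.expovariate(GAMMA))
    st.stable -= 1

def startup_peer(env, st):
    """Fetch chunks sequentially, retrying whenever all stable peers are busy."""
    got = 0
    while got < CHUNKS:
        if st.busy < st.stable:                        # an idle stable peer exists
            st.busy += 1
            yield env.timeout(random.uniform(0, 0.1))  # request propagation delay
            yield env.timeout(CHUNK_TIME)              # chunk transmission
            st.busy -= 1
            got += 1
        else:                                          # request rejected; retry later
            yield env.timeout(random.uniform(0, 0.1))
    st.stable += 1                                     # promotion to stable peer
    env.process(stable_lifetime(env, st))

def arrivals(env, st):
    """Poisson arrivals whose rate jumps from LAM0 to LAM1 at T_FLASH."""
    while True:
        rate = LAM0 if env.now < T_FLASH else LAM1
        yield env.timeout(random.expovariate(rate))
        env.process(startup_peer(env, st))

env, st = simpy.Environment(), State()
for _ in range(st.stable):
    env.process(stable_lifetime(env, st))
env.process(arrivals(env, st))
env.run(until=T_END)
print("stable peers at the end:", st.stable)
```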

B. Simulation Results

Figs. 3, 4(a), 4(b), 6(a), and 6(b) show the simulation results (see the curves marked with "Simulation"). The parameters used are the same as those used for deriving the modeling results in each figure. For each experiment, we repeated 10 runs with different seeds, averaged the results, and plotted them in the corresponding figures. As shown in these figures, the simulation results and modeling results are consistent, indicating that the implementation details have no obvious impact on the system stabilization process in the presence of a flash crowd. The small differences between them can be explained as follows.

• Peers arrive randomly. Thus, the short-term peer arrival rate may temporarily exceed the system's residual bandwidth, and some peers then need to wait for admission [see Fig. 6(a)].

• The transmission and retry delays of peers' requests slow down the peer startup process, and thus slightly more peers need to wait [see Fig. 6(a) and (b)].

• Startup of peers in the simulations happens at a series of discrete times, which slows down the peer startup process as well [see Figs. 4(a) and 6(b)]. To understand this, consider the following example. When $y(0) = 1$ (i.e., at time $t = 0$, there is one stable peer online), $s = 1$ (i.e., a stable peer can help one startup peer finish startup in unit time), and $\gamma = 0$ (i.e., stable peers never leave), in the analytical model we have $\dot{y} = y$, and the number of stable peers evolves as $y = e^t$. In the simulations, however, the number of stable peers increases as $y = 2^t$, since peers start up in a series of discrete steps. As $2 < e$, the peer startup rate in the simulations is slower than that in the analytical model (see the short sketch after this list).

• According to the model derived for systems (without admission control) that can keep working in the RBA state under flash crowds, the peer startup time is 0.8333 s. In the simulations, however, a startup peer can only download simultaneously from a limited number of stable peers, since its downlink rate is only a small multiple of a stable peer's residual upload bandwidth. As each chunk's download takes 0.5 s, it takes a startup peer 0.5 s to obtain the first 6 chunks and another 0.5 s to obtain the remaining chunks. Thus, the peer startup latency is 1 s, which is slightly longer than the modeling result of 0.8333 s. Accordingly, the number of startup peers is correspondingly larger than the modeling result [see Fig. 4(a)].
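The third bullet's $e^t$-versus-$2^t$ gap can be reproduced in a few lines (a toy illustration of the discretization effect, not a simulation):

```python
# Fluid-model growth e^t vs. the discrete doubling 2^t seen in simulation
# for y(0) = 1, s = 1, gamma = 0.
import math

for t in range(6):
    print(f"t = {t}:  fluid e^t = {math.exp(t):7.1f}   discrete 2^t = {2 ** t:3d}")
# Since 2 < e, the discrete process lags the fluid model, so startups
# complete more slowly in simulation than in the analytical model.
```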

We further ran simulations of the system with admission control under larger flash crowds (i.e., with even larger shock levels), and the system stabilization process is similar to that shown in Fig. 6(b), as both are instances of Case 2 in Theorem 1, while the system recovery time is longer. These simulation results validate the superior scalability of an admission-control-enabled P2P live streaming system under very large flash crowds.

VII. FLASH CROWD HANDLING STRATEGIES

In this section, based on the modeling results in the preceding sections, we present several flash crowd handling strategies for different circumstances. We first define some notation. Given the maximal allowable startup latency (denoted by $T^*$), for systems without and with admission control, we can obtain the maximum shock levels of flash crowds under which all newly arrived peers have startup latency below $T^*$ by using (15) and (18), respectively [see Fig. 5(c) for illustration]. We denote them by $\sigma_{wo}$ and $\sigma_{w}$, respectively. They represent the capacity of a system to meet the maximal startup latency bound $T^*$ under flash crowds, when the system works without and with admission control, respectively.

We next discuss some major considerations in designing flash crowd handling strategies based on different criteria.

• Admission control can bring significant gain in terms of improving the system's capacity to meet the maximal startup latency bound under flash crowds when $T^*$ is large [see Fig. 5(c)]. However, caution should be taken when introducing admission control into a system (especially a commercial large-scale system already in use), since it requires an extra signaling process between joining peers and the tracker server and the collection of accurate peer state information, which increase system complexity and thus may cause performance degradation [26].

• Assisting servers have been widely deployed as default backup resources to improve system quality in most P2P streaming systems [4]. While their deployment increases system cost, assisting servers can be shared among different channels, provided that these channels suffer flash crowds at different times. Thus, the number of assisting servers need not be high enough to handle simultaneous flash crowds on all channels.

• Based on these considerations, a service provider can make a choice between using assisting servers or admission control based on its system infrastructure, service requirements, and development and maintenance costs. Generally, when $\sigma_w \le \sigma_{wo}$, i.e., admission control brings no obvious gain, it is not suggested to use/deploy admission control. When $\sigma_w > \sigma_{wo}$, use of admission control could be a good choice.

After deciding whether to use admission control or not, we present the following flash crowd handling strategies.

• For a system without admission control, if $\sigma \le \sigma_{wo}$, the system itself can handle the flash crowd smoothly; otherwise, the system (usually the tracker server) should divert the excessive $(\sigma - \sigma_{wo})/\sigma \times 100$ percent of newly arrived peers to extra assisting servers to ensure that these peers' startup latencies are still below $T^*$.

• For a system with admission control, we focus on studying the case $\sigma_w > \sigma_{wo}$ (otherwise admission control should not have been used). If $\sigma \le \sigma_{wo}$, the system itself can handle the flash crowd smoothly, and we need not enable admission control since it increases system complexity and may cause performance degradation; else if $\sigma_{wo} < \sigma \le \sigma_w$, the system can handle the flash crowd by enabling the admission control function; else if $\sigma > \sigma_w$, then besides admission control, the system should divert the excessive $(\sigma - \sigma_w)/\sigma \times 100$ percent of newly arrived peers to extra assisting servers. The sketch after this list summarizes this decision logic.
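The strategies above reduce to a small decision procedure. The sketch below expresses it in Python; the function and variable names are hypothetical, and sigma_wo and sigma_w stand for the capacities obtained from (15) and (18) for the given latency bound.

```python
# A sketch of the flash crowd handling logic described above.
# sigma:     measured shock level of the flash crowd
# sigma_wo:  capacity without admission control, from (15)
# sigma_w:   capacity with admission control, from (18)
def handle_flash_crowd(sigma, sigma_wo, sigma_w, has_admission_control):
    if not has_admission_control or sigma_w <= sigma_wo:
        # Without admission control: divert the excess arrivals to servers
        # so the effective shock level seen by the P2P system is sigma_wo.
        divert = 0.0 if sigma <= sigma_wo else (sigma - sigma_wo) / sigma
        return {"admission_control": False, "divert_fraction": divert}
    if sigma <= sigma_wo:
        return {"admission_control": False, "divert_fraction": 0.0}  # no action needed
    if sigma <= sigma_w:
        return {"admission_control": True, "divert_fraction": 0.0}   # AC alone suffices
    return {"admission_control": True,                               # AC plus diversion
            "divert_fraction": (sigma - sigma_w) / sigma}

# Example: shock level 40 against capacities 10 (without AC) and 25 (with AC)
print(handle_flash_crowd(40.0, 10.0, 25.0, True))
# -> {'admission_control': True, 'divert_fraction': 0.375}
```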

With the above strategies, a service provider can greatly simplify its service logic and fully utilize its system resources to achieve satisfactory peer startup performance while keeping the system stable in the presence of large flash crowds.

VIII. CONCLUSION

In this paper, we have conducted a comprehensive study of the performance of P2P live streaming systems under flash crowds. By modeling the systems using a fluid model, we studied the system capacity, peer startup latency, and system recovery time of systems with and without admission control under flash crowds. For systems without admission control, we used the largest flash crowd under which a system can avoid collapse to quantify the system's capacity to handle flash crowds, and we found that this capacity is independent of the system's initial state while decreasing, in a power-law relationship, as the stable peers' departure rate increases. In comparison, for a system with admission control, we proved that it can recover stability from flash crowds of any size, and its worst-case peer startup latency and system recovery time scale logarithmically with the flash crowd size. Based on the analysis results, we presented flash crowd handling strategies for providing satisfactory peer startup performance while keeping the system stable in the presence of large flash crowds under different circumstances.

ACKNOWLEDGMENT

An abridged version of this paper was presented at the IEEE International Conference on Communications (ICC), Next Generation Networking and Internet Symposium, Tokyo, Japan, June 5–9, 2011. The authors are grateful for the constructive comments from the anonymous reviewers, which helped improve the quality of this paper considerably.

REFERENCES
[1] Y. Huang, T. Z. J. Fu, D. Chiu, J. C. S. Lui, and C. Huang, "Challenges, design and analysis of a large-scale P2P-VoD system," in Proc. ACM SIGCOMM, New York, NY, USA, May 2008, pp. 375–388.
[2] K. Sripanidkulchai, B. Maggs, and H. Zhang, "An analysis of live streaming workloads on the Internet," in Proc. ACM IMC, Taormina, Italy, Feb. 2004, pp. 41–54.
[3] C. Wu, B. Li, and S. Zhao, "Magellan: Charting large-scale peer-to-peer topologies," in Proc. IEEE ICDCS, Toronto, ON, Canada, Jun. 2007, pp. 62–69.
[4] B. Li, G. Y. Keung, S. Xie, F. Liu, Y. Sun, and H. Yin, "An empirical study of flash crowd dynamics in a P2P-based live video streaming system," in Proc. IEEE GLOBECOM, New Orleans, LA, USA, Nov. 2008, pp. 1–5.
[5] J. Seibert, D. Zage, S. Fahmy, and C. Nita-Rotaru, "Experimental comparison of peer-to-peer streaming overlays: An application perspective," in Proc. IEEE LCN, Montreal, QC, Canada, Feb. 2008, pp. 20–27.
[6] F. Liu, B. Li, L. Zhong, B. Li, and D. Niu, "Flash crowd in P2P live streaming systems: Fundamental characteristics and design implications," IEEE Trans. Parallel Distrib. Syst., vol. 23, no. 7, pp. 1227–1239, Jul. 2012.
[7] Z. Chen, B. Li, G. Keung, H. Yin, C. Lin, and Y. Wang, "How scalable could P2P live media streaming system be with the stringent time constraint," in Proc. IEEE ICC, Dresden, Germany, Jun. 2009, pp. 1–5.
[8] I. Ari, B. Hong, E. Miller, S. Brandt, and D. Long, "Managing flash crowds on the Internet," in Proc. IEEE MASCOTS, Orlando, FL, USA, Oct. 2003, pp. 246–249.
[9] J. Jung, B. Krishnamurthy, and M. Rabinovich, "Flash crowds and denial of service attacks: Characterization and implications for CDNs and Web sites," in Proc. WWW, Honolulu, HI, USA, May 2002, pp. 252–262.

[10] J. A. Patel, C. M. Yang, and I. Gupta, "Turning flash crowds into smart mobs with real-time stochastic detection and adaptive cooperative caching," in Proc. ACM SOSP, Brighton, U.K., Oct. 2005, pp. 1–7.
[11] R. Bharambe, C. Herley, and V. N. Padmanabhan, "Analyzing and improving a BitTorrent network's performance mechanisms," in Proc. IEEE INFOCOM, Barcelona, Spain, Apr. 2006, pp. 1–12.
[12] A. Legout, G. Urvoy-Keller, and P. Michiardi, "Rarest first and choke algorithms are enough," in Proc. ACM IMC, New York, NY, USA, Oct. 2006, pp. 203–216.
[13] D. Stutzbach, D. Zappala, and R. Rejaie, "The scalability of swarming peer-to-peer content delivery," in Proc. IFIP Netw., Waterloo, ON, Canada, May 2005, pp. 15–26.
[14] T. Stading, P. Maniatis, and M. Baker, "Peer-to-peer caching schemes to address flash crowds," in Proc. IPTPS, Cambridge, MA, USA, Mar. 2002, pp. 203–213.
[15] A. Stavrou, D. Rubenstein, and S. Sahu, "A lightweight, robust P2P system to handle flash crowds," IEEE J. Sel. Areas Commun., vol. 22, no. 1, pp. 6–17, Jan. 2004.
[16] D. Rubenstein and S. Sahu, "Can unstructured P2P protocols survive flash crowds?," IEEE/ACM Trans. Netw., vol. 13, no. 3, pp. 501–512, Jun. 2005.
[17] T. Leighton, "Improving performance on the Internet," Commun. ACM, vol. 52, no. 2, pp. 44–51, Feb. 2009.
[18] N. Magharei, R. Rejaie, and Y. Guo, "Mesh or multiple-tree: A comparative study of live P2P streaming approaches," in Proc. IEEE INFOCOM, Anchorage, AK, USA, May 2007, pp. 1424–1432.

[19] X. Zhang, J. Liu, B. Li, and T. P. Yum, "CoolStreaming/DONet: A data-driven overlay network for peer-to-peer live streaming," in Proc. IEEE INFOCOM, Miami, FL, USA, Mar. 2005, pp. 2102–2111.
[20] B. Cohen, "Incentives build robustness in BitTorrent," 2003 [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.1911
[21] H. Chang, S. Jamin, and W. Wang, "Live streaming performance of the Zattoo network," in Proc. ACM IMC, Chicago, IL, USA, Nov. 2009, pp. 417–429.
[22] B. Li, S. Xie, Y. Qu, G. Y. Keung, C. Lin, J. Liu, and X. Zhang, "Inside the new Coolstreaming: Principles, measurements and performance implications," in Proc. IEEE INFOCOM, Phoenix, AZ, USA, Apr. 2008, pp. 1031–1039.
[23] J. K. Jia, C. Li, and C. J. Chen, "Characterizing PPStream across Internet," in Proc. NPC Workshop, Redwood City, CA, USA, Sep. 2007, pp. 413–418.
[24] D. Qiu and R. Srikant, "Modeling and performance analysis of BitTorrent-like peer-to-peer networks," in Proc. ACM SIGCOMM, Portland, OR, USA, Aug. 2004, pp. 367–378.
[25] L. Xie, P. Smith, D. Hutchison, M. Banfield, H. Leopold, A. Jabbar, and J. Sterbenz, "From detection to remediation: A self-organized system for addressing flash crowd problems," in Proc. IEEE ICC, Beijing, China, May 2008, pp. 5809–5814.


[26] L. D'Acunto, T. Vinkó, and H. Sips, "Bandwidth allocation in BitTorrent-like VoD systems under flash crowds," in Proc. IEEE P2P, Kyoto, Japan, Aug. 2011, pp. 192–201.
[27] K. C. Almeroth and M. H. Ammar, "Collecting and modeling the join/leave behavior of multicast group members in the MBone," in Proc. IEEE HPDC, New York, NY, USA, Aug. 1996, pp. 209–216.
[28] C. Li and C. Chen, "Fetching strategy in the startup stage of P2P live streaming," 2010 [Online]. Available: http://arxiv.org/abs/0810.2134
[29] A. Vlavianos, M. Iliofotou, and M. Faloutsos, "BiToS: Enhancing BitTorrent for supporting streaming applications," in Proc. IEEE INFOCOM, Barcelona, Spain, May 2006, pp. 1–6.
[30] B. Zhao, J. Lui, and D. Chiu, "Exploring the optimal chunk selection policy for data-driven P2P streaming systems," in Proc. IEEE P2P, Washington, DC, USA, Sep. 2009, pp. 271–280.
[31] Q. Ying, Y. Guo, Y. Chen, X. Tan, and W. Zhu, "Understanding users' access failure and patience in large-scale P2P VoD systems," in Proc. IEEE ICWMMN, Beijing, China, Nov. 2011, pp. 283–287.
[32] C. Liang, Y. Guo, and Y. Liu, "Is random scheduling sufficient in P2P video streaming?," in Proc. IEEE ICDCS, Beijing, China, Jun. 2008, pp. 53–60.
[33] M. Zhang, Q. Zhang, L. Sun, and S. Yang, "Understanding the power of pull-based streaming protocol: Can we do better?," IEEE J. Sel. Areas Commun., vol. 25, no. 9, pp. 1678–1694, Sep. 2007.
[34] N. Matloff, "Introduction to discrete-event simulation and the SimPy language," 2008 [Online]. Available: http://heather.cs.ucdavis.edu/~matloff/156/PLN/DESimIntro.pdf

Yishuai Chen received the B.S., M.S., and Ph.D. degrees in electronics and information engineering from Beijing Jiaotong University, Beijing, China, in 1998, 2001, and 2010, respectively.

He is currently a Lecturer with the School of Electrical and Information Engineering, Beijing Jiaotong University. From 2010 to 2012, he was a Postdoctoral Fellow with the Research Center of Ubiquitous Sensor Networks, University of Chinese Academy of Sciences, Beijing, China. From 2001 to 2007, he worked with Lucent Bell Labs, Beijing, China, on intelligent network systems as a Member of Technical Staff. His research interests include peer-to-peer computing, streaming media, Web and Internet services, and consumer behavior.

Dr. Chen has served as a TPC member for IEEE GLOBECOM.

Baoxian Zhang (M'02–SM'12) received the Ph.D. degree in electronics and information engineering from Beijing Jiaotong University, Beijing, China, in 2000.

He is currently a Professor with the Research Center of Ubiquitous Sensor Networks, University of Chinese Academy of Sciences (UCAS), Beijing, China. Prior to joining UCAS, he was a Research Scientist with the School of Information Technology and Engineering, University of Ottawa, Ottawa, ON, Canada. He has coauthored a book on wireless sensor networks and published over 100 refereed technical papers in archival journals and conference proceedings. His research interests cover network protocol and algorithm design and wireless ad hoc and sensor networks.

Prof. Zhang has served as a Guest Editor of special issues for the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, Mobile Networks and Applications, Ad Hoc Networks, and Wireless Communications and Mobile Computing. He has served as a TPC member for many international conferences and symposia, such as IEEE GLOBECOM, ICC, WCNC, and PIMRC.

Changjia Chen (M'93–SM'03) received the Ph.D. degree in electrical engineering from the University of Hawaii at Manoa, Honolulu, HI, USA, in 1986.

He is currently a Professor with Beijing Jiaotong University, Beijing, China. He has published over 100 refereed technical papers in archival journals and conference proceedings. His research interests include communication networks and communication protocols and the measurement and modeling of P2P networks.

Prof. Chen is a Fellow of the China Institute of Communications (CIC) and the Chinese Institute of Electronics (CIE).

Dah Ming Chiu (SM'02–F'08) received the B.Sc. degree in electrical engineering from Imperial College London, London, U.K., in 1975, and the Ph.D. degree in applied mathematics from Harvard University, Cambridge, MA, USA, in 1980.

He is currently the Department Chairman of Information Engineering with The Chinese University of Hong Kong (CUHK), Hong Kong. Prior to joining CUHK, he worked for Sun Labs, DEC, and AT&T Bell Labs. His current research interests include P2P networks, network measurement, architecture and engineering, network economics, and social networks.

Prof. Chiu is an IEEE Fellow for his contribution to resource allocation algorithms for the Internet.