Discusses The Greatest Scatter Plot (Hubble 1929), Irregular Time Series (Harmonic Mean), Zipf’s Law of Words, Oracle Query Times, and Eleventh Hour Spikes.
A Melange of Methods for Manipulating Monitored Data
Converging on Consistency
Neil Gunther @DrQz
en.wikipedia.org/wiki/Neil_J._Gunther
Performance Dynamics
Monitorama PDX
May 6, 2014
© 2014 Performance Dynamics A Melange of Methods for Manipulating Monitored Data May 6, 2014 1 / 52
Introductions
I didn’t do Monitorama Berlin
I didn’t get the memo about plane crashes
Sorry... Deal with it
SFO runway 28L, 11:28 a.m., July 6, 2013
Asiana Airlines Flight 214 landing arse-backwards (sans tail)
“Asiana pilots appear to be overly reliant on instrument-guided landings and lack the training to touch down manually.” (SFO Commissioner Eleanor Johns)
A Message from Your Sponsors
Don’t be too reliant on your instruments (strip charts, colored dials, shiny things)
Consistency
1 It’s not about pretty pictures
2 It’s not about whiz bang tools
3 It’s not about fancy math
4 Data are usually trying to tell you something
5 Your interpretation has to be consistent with other data
6 Your interpretation has to be consistent with other information
This talk is about
Converging on consistency by example
Topics
1 The Greatest Scatter Plot
2 Irregular Time Series
3 The Power of Power Laws
  Zipf’s Law of Words
  Database Query Times
  Eleventh Hour Spikes
The Greatest Scatter Plot
Goggle up! Science ahead...
Some Monitored Data
[Figure: two time-series panels, Metric 1 vs. Time and Metric 2 vs. Time]
Two time series, two metrics: Metric 1 and Metric 2
Scatter Plot
[Figure: scatter plot of Metric 2 (y-axis) against Metric 1 (x-axis)]
Are Metric 1 and Metric 2 related in any way?
Linear Regression
[Figure: the same scatter plot of Metric 2 against Metric 1, with the fitted regression line]
LSQ fit: Metric 2 = 423.94 × Metric 1, with $R^2 = 0.82$
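The fit quoted on the slide has no intercept term, so a minimal sketch of the same kind of calculation is a least-squares line through the origin. The sample values below are made up for illustration and are not the data behind the slide; the helper name `fit_through_origin` and the uncentered definition of $R^2$ (one common convention for no-intercept fits) are assumptions.

```python
# Illustrative (made-up) samples; NOT the data plotted on the slide.
metric1 = [0.1, 0.3, 0.5, 0.9, 1.1, 1.4, 1.7, 2.0]
metric2 = [60.0, 90.0, 260.0, 330.0, 520.0, 560.0, 760.0, 830.0]


def fit_through_origin(x, y):
    """Return (slope, r_squared) for the no-intercept model y = slope * x."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    slope = sxy / sxx
    ss_res = sum((b - slope * a) ** 2 for a, b in zip(x, y))
    # Assumption: uncentered total sum of squares, because the model has no intercept.
    ss_tot = sum(b * b for b in y)
    return slope, 1.0 - ss_res / ss_tot


slope, r2 = fit_through_origin(metric1, metric2)
print(f"Metric 2 = {slope:.2f} * Metric 1, R^2 = {r2:.2f}")
```

A good $R^2$ only says the line is a reasonable summary of these points; it says nothing about whether the slope has a sensible meaning, which is the question the next slides take up.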
This is Not the End
This is just the beginning
Need to reach consistency
1 Is the linear fit still a reasonable choice?
2 What is the meaning of the slope?
3 Willing to extrapolate this model into the future?
The most important scatter plot in history (1929)
Hubble’s diagram and cosmic expansion
Robert P. Kirshner*
Harvard–Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138
Contributed by Robert P. Kirshner, October 21, 2003

Edwin Hubble’s classic article on the expanding universe appeared in PNAS in 1929 [Hubble, E. P. (1929) Proc. Natl. Acad. Sci. USA 15, 168–173]. The chief result, that a galaxy’s distance is proportional to its redshift, is so well known and so deeply embedded into the language of astronomy through the Hubble diagram, the Hubble constant, Hubble’s Law, and the Hubble time, that the article itself is rarely referenced. Even though Hubble’s distances have a large systematic error, Hubble’s velocities come chiefly from Vesto Melvin Slipher, and the interpretation in terms of the de Sitter effect is out of the mainstream of modern cosmology, this article opened the way to investigation of the expanding, evolving, and accelerating universe that engages today’s burgeoning field of cosmology.

The publication of Edwin Hubble’s 1929 article “A relation between distance and radial velocity among extra-galactic nebulae” marked a turning point in understanding the universe. In this brief report, Hubble laid out the evidence for one of the great discoveries in 20th century science: the expanding universe. Hubble showed that galaxies recede from us in all directions and more distant ones recede more rapidly in proportion to their distance. His graph of velocity against distance (Fig. 1) is the original Hubble diagram; the equation that describes the linear fit, velocity = Ho × distance, is Hubble’s Law; the slope of that line is the Hubble constant, Ho; and 1/Ho is the Hubble time. Although there were hints of cosmic expansion in earlier work, this is the publication that convinced the scientific community that we live in an expanding universe. Because the result is so important and needs such constant reference, astronomers have created eponymous Hubble entities to use Hubble’s astonishing discovery without a reference to the original publication in PNAS (1).†

Today, ~70 years later, exquisite observations of the cosmic microwave background (2), measurement of light elements synthesized in the first few minutes of the universe (3), and modern versions of Hubble’s Law form a firm triangular foundation for modern cosmology. We now have confidence that a geometrically flat universe has been expanding for the past 14 billion yr, growing in contrast through the action of gravity from a hot and smooth Big Bang to the lumpy and varied universe of galaxies, stars, planets, and people we see around us. Observations have forced us to accept a dark and exotic universe that is ~30% dark matter with only 4% of the universe made of familiar protons and neutrons. Of that small fraction of familiar material, most is not visible. Like a dusting of snow on a mountain ridge, luminous matter reveals the presence of unseen objects.

Extensions of Hubble’s work with today’s technology have developed vast new arenas for exploration: extensive mapping using Hubble’s Law shows the arrangement of matter in the universe, and, by looking further back in time than Hubble could, we now see beyond the nearby linear expansion of Hubble’s Law to trace how cosmic expansion has changed over the vast span of time since the Big Bang. The big surprise is that recent observations show cosmic expansion has been speeding up over the last 5 billion yr. This acceleration suggests that the other 70% of the universe is composed of a “dark energy” whose properties we only dimly grasp but that must have a negative pressure to make cosmic expansion speed up over time (4–9). Future extension of the Hubble diagram to even larger distances and more precise distances where the effects of acceleration set in are the route to illuminating this mystery.

Hubble applied the fundamental discoveries of Henrietta Leavitt concerning bright Cepheid variable stars. Leavitt showed that Cepheids can be sorted in luminosity by observing their vibration periods: the slow ones are the intrinsically bright ones. By measuring the period of pulsation, an observer can determine the star’s intrinsic brightness. Then, measuring the apparent brightness supplies enough information to infer the distance.

This Perspective is published as part of a series highlighting landmark papers published in PNAS. Read more about this classic PNAS article online at www.pnas.org/misc/classics.shtml.
*E-mail: [email protected].
†There are just 73 citations of Hubble’s original paper in NASA’s Astrophysics Data System. There are 1,001 citations of ref. 7.
© 2003 by The National Academy of Sciences of the USA

Fig. 1. Velocity–distance relation among extra-galactic nebulae. Radial velocities, corrected for solar motion (but labeled in the wrong units), are plotted against distances estimated from involved stars and mean luminosities of nebulae in a cluster. The black discs and full line represent the solution for solar motion by using the nebulae individually; the circles and broken line represent the solution combining the nebulae into groups; the cross represents the mean velocity corresponding to the mean distance of 22 nebulae whose distances could not be estimated individually. [Reproduced with permission from ref. 1 (Copyright 1929, The Huntington Library, Art Collections and Botanical Gardens).]

8–13 | PNAS | January 6, 2004 | vol. 101 | no. 1 www.pnas.org/cgi/doi/10.1073/pnas.2536799100
Metric 1 (x-axis) = distance to the observed galaxy ($r$)
Metric 2 (y-axis) = recessional velocity of the galaxy ($v$)
$10^6$ parsecs ≡ 1 Mpc ≈ 3.3 million light years
Astronomer Edwin Hubble 1929
1 Is the linear fit still a reasonable choice?
  Edwin Hubble suspected $v \sim r$
  Supports Big Bang hypothesis
2 What does the slope mean?
  Slope: $\dfrac{v}{r} = \dfrac{r}{t} \times \dfrac{1}{r} = \dfrac{1}{t} \equiv H_0$ (Hubble’s constant)
  Inverse Hubble constant has units of time: $t_H = 1/H_0$
  $t_H$ is the expansion time = Age of Universe!
3 Small problem
  Hubble calculated: $t_H \simeq 2$ billion years
  Age of Earth: $t_E \simeq 3$–$5$ billion years (Oops!)
  Not consistent :-( Whaddya gonna do?
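To see where a figure of about 2 billion years comes from, the slope can be converted from km/s per Mpc into an inverse time. The following is a rough sketch: it assumes the fitted slope 423.94 from the regression slide is in km/s/Mpc, and the conversion constants and the function name `hubble_time_years` are supplied here for illustration.

```python
KM_PER_MPC = 3.0857e19        # kilometres in one megaparsec
SECONDS_PER_YEAR = 3.15576e7  # one Julian year in seconds


def hubble_time_years(h0_km_s_mpc):
    """Convert a Hubble constant in km/s/Mpc to the Hubble time 1/H0 in years."""
    h0_per_second = h0_km_s_mpc / KM_PER_MPC  # units cancel to 1/s
    return 1.0 / h0_per_second / SECONDS_PER_YEAR


# Assumption: the regression slope from the earlier slide, read as km/s/Mpc.
t_1929 = hubble_time_years(423.94)
print(f"t_H = {t_1929 / 1e9:.1f} billion years")
```

With that slope the result lands a little above 2 billion years, which is younger than the Earth and hence the inconsistency.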
[Figure: "Hubble's 1929 Corrected Data", recessional velocity (km/s) vs. galactic distance (Mpc)]
Hubble even corrected for so-called peculiar velocity (black dots)
Slope moved the wrong way :-(
Pay Day 2003
[...] result, they make good distance indicators. Refined methods for analyzing the observations of type Ia supernovae give the distance to a single event to better than 10% (19, 20). The best modern Hubble diagram, based on well observed type Ia supernovae out to a modest distance of ~2 billion light years, is shown in Fig. 3, where the axes are chosen to match those of Hubble’s original linear diagram (to mask our uncertainties, astronomers generally use a log-log form of this plot as in Fig. 4). Far beyond Hubble’s original sample, Hubble’s Law holds true.

In table 2 of his original article (1) (reproduced as Table 1, which is published as supporting information on the PNAS web site), Hubble inverted the velocity–distance relation to estimate the distances to galaxies of known redshift. For galaxies like NGC 7619 for which he had only Humason’s recently measured redshift, Hubble used the velocity–distance relation to infer the distance. This approach to estimating distances from the redshift alone has become a major industry with galaxy redshift surveys. Today’s telescopes are 1,000 times faster at measuring redshifts than in Hubble’s time, leading to large samples of galaxies that trace the texture of the galaxy distribution (21–24). As shown in Fig. 5, the 3D distribution of galaxies constructed from Hubble’s Law is surprisingly foamy, with great voids and walls that form as dark matter clusters in an expanding universe, shaping pits into which the ordinary matter drains, to form the luminous matter we see as stars in galaxies. Quantitative analysis of galaxy clustering leads to estimates for the amount of clumpy dark matter associated with galaxies. The best match comes if the clumpy matter (dark and luminous, baryons or not) adds up to ~30% of the universe.

The interpretation of the redshift as a velocity, or more precisely, as a stretching of photon wavelengths due to cosmic expansion, which we assume today’s college sophomores will grasp, was not so obvious to Hubble. Hubble was very circumspect on this topic and, more generally, on the question of whether cosmic expansion revealed a genuine cosmic history. He referred to the redshift as giving an “apparent velocity.” In a letter to Willem de Sitter (25), Hubble wrote, “Mr. Humason and I are both deeply sensible of your gracious appreciation of the papers on velocities and distances of nebulae. We use the term ‘apparent’ velocities to emphasize the empirical features of the correlation. The interpretation, we feel, should be left to you and the very few others who are competent to discuss the matter with authority.”

Part of the difficulty with the interpretation came from alternative views, notably by the local iconoclast, Fritz Zwicky, who promptly sent a note to PNAS in August 1929 that advocated thinking of the redshift as the result of an interaction between photons and intervening matter rather than cosmic expansion (26). The reality of cosmic expansion and the end of “tired light” has only recently been verified in a convincing way.

While the nature of the redshift was a bubbling discussion in Pasadena, Olin Wilson of the Mount Wilson Observatory staff suggested that measuring the time it took a supernova to rise and fall in brightness would show whether the expansion was real. Real expansion would stretch the characteristic time, about a month, by an amount determined by the redshift (27).

This time dilation was sought in 1974, but the sample was too small, too nearby, and too inhomogeneous to see anything real (28). It was only with large carefully measured and distant samples of SN Ia (29, 30) and more thorough characterization of the way supernova light curves and supernova luminosities are intertwined (31, 32) that this topic [...]

Fig. 3. The Hubble diagram for type Ia supernovae. From the compilation of well observed type Ia supernovae by Jha (29). The scatter about the line corresponds to statistical distance errors of <10% per object. The small red region in the lower left marks the span of Hubble’s original Hubble diagram from 1929.

Fig. 4. Hubble diagram for type Ia supernovae to z ≈ 1. Plot in astronomers’ conventional coordinates of distance modulus (a logarithmic measure of the distance) vs. log redshift. The history of cosmic expansion can be inferred from the shape of this diagram when it is extended to high redshift and correspondingly large distances. Diagram courtesy of Brian P. Schmidt, Australian National University, based on data compiled in ref. 18.

Kirshner | PNAS | January 6, 2004 | vol. 101 | no. 1 | 11

Hubble’s (linear) Law: $v = H_0 r$ out to 2.3 billion light years
Consistency
1 Hubble took some static for his 1929 paper
2 Couldn’t reach consistency and had to gamble
3 Best measurements (telescopes) at the time
4 Telescopes and measurements improved
5 Converged toward consistency over next decades
6 $t_H = 2.36$ Gy (1929) → $t_H = 13.89$ Gy (2003)
Data was wrong but his interpretation (model) was correct
Guerrilla Mantra 1.16:
Treating data as something divine is a sin
Irregular Time Series
Aggregating Time Series
1 Regular sample intervals:
  Samples on tick of a metronome
  Computer performance metrics
  Weather data
2 Irregular sample intervals:
  Missing data (e.g., stock exchanges)
  Unequal sampling due to:
    Events
    Subscriptions (e.g., every 10,000 sign-ups)
    Occasional (e.g., personal weight)
Back to Monitorama Boston 2013
Aggregation always assumes the arithmetic mean (AM)
Aggregation of irregular time series came up in @mleinart’s talk
NJG: “Should aggregate rate data using the harmonic mean (HM)”
But harmonic mean is not clear for time series
Cost me a month after Monitorama Boston to figure it out
See my blog post and detailed slides of April 9, 2013
Harmonic Averaging of Monitored Rate Data
Which is why Monitorama is cool :-)
Equal Intervals
[Figure: Metric vs. Time (0 to 2.5), two equal-width rectangles with the AM level marked]
Heights: $h_{\text{blue}} = 2$ and $h_{\text{red}} = 1$
Arithmetic Mean of Heights
[Figure: Metric vs. Time (0 to 2.5), two equal-width rectangles with the AM level marked]
$\mathrm{AM} = \tfrac{1}{2} h_{\text{blue}} + \tfrac{1}{2} h_{\text{red}} = \tfrac{1}{2}(2 + 1) = 1.5$
Unequal Intervals (Area = 6)
[Figure: Metric vs. Time (0 to 4), two rectangles of unequal width]
Heights: $h_{\text{blue}} = 3$ and $h_{\text{red}} = 1$
AM Leaves a Gap (Area = 6)
[Figure: Metric vs. Time (0 to 4), with the AM level marked and a gap labeled "gap?"]
$\mathrm{AM} = \tfrac{1}{2} h_{\text{blue}} + \tfrac{1}{2} h_{\text{red}} = \tfrac{1}{2}[3 + 1] = 2.0$
Stretch the Rectangle (Area = 6, Width = 4)
[Figure: Metric vs. Time (0 to 4), with the AM and HM levels marked]
$\mathrm{HM} = 1.5$, so $\mathrm{HM} \times \text{width} = 1.5 \times 4 = 6$
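The rectangle picture can be checked numerically. This is a small sketch of the two-sample example: the heights 3 and 1, the total area 6 and the total width 4 are from the slides, while the individual widths 1 and 3 are inferred from those numbers and the variable names are illustrative.

```python
# (height, width) pairs; widths inferred so that each rectangle has area 3.
samples = [(3.0, 1.0), (1.0, 3.0)]  # blue, red

heights = [h for h, _ in samples]
total_area = sum(h * w for h, w in samples)
total_width = sum(w for _, w in samples)

am = sum(heights) / len(heights)                   # arithmetic mean of heights
hm = len(heights) / sum(1.0 / h for h in heights)  # harmonic mean of heights
area_weighted = total_area / total_width           # height of one rectangle spanning the full width

print(f"AM = {am}, AM * width = {am * total_width}")
print(f"HM = {hm:.2f}, HM * width = {hm * total_width:.2f}")
print(f"area / width = {area_weighted}")
```

The harmonic mean reproduces the total area only because both rectangles have the same area, i.e. each sample represents the same count; with unequal areas a width-weighted average would be needed instead.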
Lowers the Height
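[Figure: Metric vs. Time (0 to 4), with the HM level below the AM level]
Theorem
$\mathrm{HM} \le \mathrm{AM}$
The harmonic mean is never larger than the arithmetic mean of the same positive samples; the two are equal only when all the samples are identical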
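For readers who want the reason behind the theorem, here is a short derivation sketch for $n$ positive samples $x_1, \dots, x_n$, using the Cauchy–Schwarz inequality. The notation is introduced here and is not from the slides.

```latex
\[
\Bigl(\sum_{i=1}^{n} x_i\Bigr)\Bigl(\sum_{i=1}^{n} \frac{1}{x_i}\Bigr)
\;\ge\;
\Bigl(\sum_{i=1}^{n} \sqrt{x_i}\cdot\frac{1}{\sqrt{x_i}}\Bigr)^{2} = n^{2}
\quad\Longrightarrow\quad
\mathrm{HM} = \frac{n}{\sum_{i=1}^{n} 1/x_i}
\;\le\;
\frac{1}{n}\sum_{i=1}^{n} x_i = \mathrm{AM}
\]
```

Equality holds only when all the $x_i$ are equal. For the two heights 3 and 1 used in the rectangles, this gives $\mathrm{HM} = 1.5 < 2 = \mathrm{AM}$.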
Monitored Subscription Rates
Samples only occur when subscription count reaches 10,000.
Sampling intervals are unevenly spaced in time over 33 days.
[Figure: subscription Rate (0 to 4000) vs. Time (0 to 35 days), with the AM and HM levels marked]
AM and HM are (different) averaged subscription rates.
Only HM gives the correct total time window of 33 days.
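A sketch of why only HM recovers the 33-day window: every sample stands for the same 10,000 sign-ups, so each rate is 10,000 divided by its interval. The interval lengths below are hypothetical (chosen only so that they sum to 33 days); the real intervals are not given on the slide.

```python
COUNT_PER_SAMPLE = 10_000  # sign-ups between consecutive samples (from the slide)

# Hypothetical interval lengths in days; they sum to 33.
intervals_days = [3.0, 5.0, 4.0, 8.0, 6.0, 7.0]

rates = [COUNT_PER_SAMPLE / dt for dt in intervals_days]  # sign-ups per day
total_count = COUNT_PER_SAMPLE * len(rates)

am = sum(rates) / len(rates)
hm = len(rates) / sum(1.0 / r for r in rates)

# Time window implied by each average rate: total count / average rate.
window_from_am = total_count / am
window_from_hm = total_count / hm

print(f"AM rate = {am:.0f}/day implies {window_from_am:.1f} days")
print(f"HM rate = {hm:.0f}/day implies {window_from_hm:.1f} days")
```

Because the harmonic mean of the rates equals total count divided by total time, it returns the full window; the arithmetic mean is pulled up by the short, fast intervals and implies a window that is too short.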
Consistency
Use HM to aggregate monitored data when the following criteria apply:
R — Rate metric (on y -axis)
A — Async time intervals (on x-axis)
T — Threshold is low vs. high
E — Event data
Example metrics:
Cache-hit rate
Video bit-rate
Call rate
Please send in your examples :-)
The Power of Power Laws
Example 1: Zipf’s Law
Ranked data is the 1000 most common wordforms in UK English, based on 29 works of literature by 18 authors (i.e., 4.6 million words)
Wordform: English word
Abs: absolute frequency (total number of occurrences)
Data format
> td <- read.table("~/../Power Laws/zipf1000.txt", header=TRUE)
> head(td)
  Rank Wordform    Abs  r      mod
1    1      the 225300 29 223066.9
2    2      and 157486 29 156214.4
3    3       to 134478 29 134044.8
4    4       of 126523 29 125510.2
5    5        a 100200 29  99871.2
6    6        I  91584 29  86645.5
Linear Axes
[Figure: "Ranked 1000 UK English Words" on linear axes, frequency of occurrence (F) vs. ranked words (W); sample x-axis labels: the, their, us, love, voice, true, state, eye, stand, worth, service, neck, land, art]
Log-Log Axes
[Figure: "Ranked 1000 UK English Words" on log-log axes, frequency of occurrence (F) vs. ranked words (W); sample x-axis labels: the, it, at, would, much, us, love, lay, eye, dare]
Regression Fit
[Figure: "Ranked 1000 UK English Words" on log-log axes with the regression line, frequency of occurrence (F) vs. ranked words (W)]
Consistency
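Log axes are word frequency (y) and ranked word order (x); the fitted line has slope −1.13, so up to a constant prefactor:
$\log(y) = -1.13 \log(x) + \text{const}$
$y \propto x^{-1.13}$
$y \propto \dfrac{1}{x^{1.13}}$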
Here, “power” refers to x to the power −1.13 (exponent)
Power laws differ from standard statistical distributions
Power laws carry most of the information in their tail
Fatter tail corresponds to stronger correlations than usual
Power laws imply persistent correlations that have to be explained
Zipf’s law correlations arise from grammatical rules
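The exponent comes from an ordinary straight-line fit in log-log coordinates. Below is a minimal sketch of that step on synthetic frequencies generated from an exact power law with the slide's exponent, so it is not the talk's word data; the prefactor is borrowed from the top-ranked count only to give the numbers a familiar scale, and the helper name `loglog_fit` is an assumption.

```python
import math


def loglog_fit(x, y):
    """Least-squares line through (log x, log y); returns (exponent, prefactor)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, math.exp(intercept)


# Synthetic ranked frequencies following an exact power law (illustrative only).
ranks = list(range(1, 1001))
freqs = [225300.0 * r ** -1.13 for r in ranks]

exponent, prefactor = loglog_fit(ranks, freqs)
print(f"F ~ {prefactor:.0f} * W^({exponent:.2f})")
```

On real ranked data the points scatter around the line, so the recovered exponent should be read together with the residuals rather than taken at face value.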
Example 2: Database Query Times
[Figure: orad$Elapstime vs. Index (0 to 500)]
Like Zipf’s law, data must be ranked by frequency of occurrence
Visualize Ranked Data
[Figure: "Ranked SQL Times", otr vs. Index (0 to 500), linear axes]
Impossible to tell functional form of this curve
Try Double-Log Visualization
[Figure: "Log-Log SQL Times", otr vs. Index (1 to 500), log-log axes]
Clearly not power law overall
But first 100 queries do appear to be power law
Three Data Windows
[Figure: three panels plotted against Index: "Log-Log of SQL-A Times" (etA), "Log-Lin of SQL-B Times" (etB), "Log-Lin of SQL-C Times" (etC)]
This suggests breaking data across 3 regions:
(A) log-log axes
(B) log-linear axes
(C) log-linear axes
Regression Analysis
[Figure: three panels plotted against Index: "Log-Log SQL A-Times" (etA), "Log-Lin SQL B-Times" (etB), "Log-Lin SQL C-Times" (etC)]
(A) $y_A \sim x^{-0.4632}$ power law decay
(B) $y_B \sim e^{-0.0074x}$ exponential decay
(C) $y_C \sim e^{-0.0028x}$ exponential decay
But this is still not enough
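One way to decide between the two functional forms for a window is to fit a straight line in both coordinate systems and see which is straighter. The sketch below does that on synthetic windows built from the fitted exponents above; the prefactors 500 and 80, the window length of 100 points, and the helper names are made up for illustration.

```python
import math


def linear_fit(u, v):
    """Ordinary least squares v = a + b*u; returns (a, b, r_squared)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((p - mu) ** 2 for p in u)
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    b = suv / suu
    a = mv - b * mu
    ss_res = sum((q - (a + b * p)) ** 2 for p, q in zip(u, v))
    ss_tot = sum((q - mv) ** 2 for q in v)
    return a, b, 1.0 - ss_res / ss_tot


def classify_decay(index, times):
    """Compare a log-log fit (power law) with a log-linear fit (exponential)."""
    log_t = [math.log(t) for t in times]
    _, gamma, r2_power = linear_fit([math.log(i) for i in index], log_t)
    _, rate, r2_exp = linear_fit(list(index), log_t)
    kind = "power law" if r2_power > r2_exp else "exponential"
    return kind, gamma, rate


# Synthetic windows (illustrative only; prefactors are assumptions).
idx = range(1, 101)
window_a = [500.0 * i ** -0.4632 for i in idx]
window_b = [80.0 * math.exp(-0.0074 * i) for i in idx]

kind_a, gamma_a, _ = classify_decay(idx, window_a)
kind_b, _, rate_b = classify_decay(idx, window_b)
print(kind_a, round(gamma_a, 4))
print(kind_b, round(rate_b, 4))
```

Picking the form with the better fit is only the statistical half of the job; the fit still has to be explained by something about the queries themselves.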
Consistency
1 2 5 10 20 50 100
100
200
300
400
500
Log-Log SQL A-Times
Index
etA
Power law slope γ = 0.46
Half Zipfian slope γ = 1.0
Correlations stronger than Zipf
Hypothesis
1. Shorter query times (window A) may involve dictionary lookups or other structured data. Structure provides correlations.
2. Longer queries in window B are unstructured (ad hoc?) and randomized. Weak correlations produce exponential decay.
3. Ditto for window C.
The Power of Power Laws Eleventh Hour Spikes
Example 3: Eleventh Hour Spikes
All Australian businesses were required to register with the Australian Tax Office (ATO) for an Australian Business Number (ABN) to claim an income tax refund. The ABN was introduced in Y2K.
Time series data from ABN registrations database.
Period covers March 27 to September 19, 2000
Deadline traffic spike on 31 May, 2000
Similar to rush to meet Obamacare deadline of March 31, 2014.
More details in my CMG Australia 2006 paper.
Complete Time Series
[Figure: complete time series on linear axes; x-axis: date, y-axis: ORA Connections]
Question: Could the “11th hour” spike have been predicted?
Answer: Yes, but quite involved.
How: Using a power law. What else!?
Semi-Log Plot
[Figure: semi-log plot; x-axis: date, y-axis: ORA Connections (log scale)]
y-axis is the number of Oracle RDBMS connections (log scale)
Peak growth preceding spike looks almost linear on semi-log plot
Time range: 0–38 days
Statistical Regression on Peaks
[Figure: semi-log plot; x-axis: date, y-axis: ORA Connections (log scale)]
Linear growth on semi-log axes implies exponential function $y = A e^{Bt}$
Fit parameters
Origin: $A = 1.14128 \times 10^{5}$
Curvature: $B = 0.0175$
Doubling period: $\ln(2)/B \approx 40$ days (about 6 weeks), with $t$ in days
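As a quick arithmetic check, the doubling period follows directly from the fitted rate. The sketch below assumes that t is measured in days, which is how the neighbouring slides state the time axis; the function name `forecast` is hypothetical.

```python
import math

A = 1.14128e5  # fitted origin from the slide
B = 0.0175     # fitted rate from the slide; assumed to be per day


def forecast(t_days):
    """Exponential trend y = A * exp(B * t), with t in days (assumption)."""
    return A * math.exp(B * t_days)


# Doubling period: solve A*exp(B*(t + T)) = 2*A*exp(B*t) for T.
doubling_days = math.log(2) / B
print(f"doubling period: {doubling_days:.1f} days")  # -> 39.6 days
```

Under that assumption the trend doubles roughly every 40 days, that is, about every 6 weeks.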
Trend on Linear Axes
[Figure: exponential trend overlaid on the time series, linear axes; x-axis: date, y-axis: ORA Connections]
Exponential forecast looks valid, up to the crosshairs
Significantly underestimates onset of the “11th hour” peak
And rapid drop off after the peak
Faster-than-exponential growth suggests a power law
Power Law Fit
[Figure: time series on linear axes with two fitted curves, labelled "Exp growth" and "Power law"; x-axis: date, y-axis: ORA Connections]
Log axes: connections (y) and time in days (x):
$\log(y) = -0.6421 \, \log(|x - x_c|)$
$y = \dfrac{1}{|x - x_c|^{0.6421}}$
where the peak occurs at $x_c = 61$ days
Consistency
Log-log plots are an easy way to test for power law distributions
May have mixed regions of power law and other distributions
Can even predict critical spikes
Power laws signal presence of strong correlations
Explaining those correlations may be more difficult
Zipf’s law took 40 years
Remember
Aim for consistency
Learn to talk to God (She’s listening)
Performance Dynamics Company
Castro Valley, California
www.perfdynamics.com
perfdynamics.blogspot.com
twitter.com/DrQz
Facebook
Training classes (May 19, 2014)
[email protected]: +1-510-537-5758