
Introduction GPR Derivations Conclusion

Gaussian Process Regression Forecasting of Computer Network Conditions

Christina Garman

Bucknell University

August 3, 2010

Christina Garman (Bucknell University) GPR Forecasting of NPCs August 3, 2010 1 / 22


What are we doing and why do we care?

We have investigated Gaussian process regression for forecasting network conditions

Computer network conditions concern:

Users with large data transfers or resource-intensive applications
Network engineers monitoring the quality of their network
Network researchers

Gaussian process regression has not previously been applied to the field of computer networking


Computer Networking

A computer network is a system of computers and devices connected to share information and resources

Performance metrics of interest

Available bandwidth
Latency
Loss

L. Peterson and B. Davie, Computer Networks: A Systems Approach, Elsevier, 2007.


Background

Our forecasting efforts focus on the Department of Energy's Energy Sciences Network (ESnet)

Forecasts are done in MATLAB. We have created a framework that allows the code to be run directly in MATLAB or from a C program.


ESnet

Department of Energy, Energy Sciences Network (ESnet), http://www.es.net/pub/maps/topology.html


Gaussian Process Regression

Definition

A Gaussian process is an indexed set of random variables, any finite number of which have a joint Gaussian distribution. It can be completely specified by a mean function and covariance function.

Gaussian process regression (GPR) allows us to make predictions of continuous quantities based on "learning" from a set of training data.


What is a covariance function?

Also called a kernel function

Chosen in a way that best fits the data

Gives us a model of the data

Controls the properties of the Gaussian process

Has adjustable parameters, called hyperparameters

k(x_i, x_j) = \sigma^2 e^{-\frac{1}{2}\left(\frac{|x_i - x_j|}{l}\right)^2}
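As a concrete illustration, the squared-exponential kernel above can be evaluated over a set of measurement times to build a covariance matrix. The deck's forecasts are done in MATLAB; this is a minimal Python/NumPy sketch, and the function and variable names are our own:

```python
import numpy as np

def sq_exp_kernel(x, sigma=1.0, ell=1.0):
    """Squared-exponential kernel: k(x_i, x_j) = sigma^2 * exp(-0.5 * (|x_i - x_j| / ell)^2)."""
    d = np.abs(x[:, None] - x[None, :])       # pairwise distances |x_i - x_j|
    return sigma**2 * np.exp(-0.5 * (d / ell) ** 2)

t = np.array([0.0, 1.0, 2.0])                 # example measurement times
K = sq_exp_kernel(t, sigma=2.0, ell=1.0)      # 3x3 covariance matrix
```

The diagonal entries equal σ², and off-diagonal entries decay with distance at a rate controlled by the length scale l.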


What are hyperparameters?

Adjustable

Can be “learned” or inferred from a set of training data

Allow the kernel function to provide the best description of the current data

k(x_i, x_j) = \sigma^2 e^{-\frac{1}{2}\left(\frac{|x_i - x_j|}{l}\right)^2}


Maximum Likelihood Estimation

Used to “learn” the hyperparameters

\mu = \frac{\vec{Y}^T K^{-1} \vec{1}}{\vec{1}^T K^{-1} \vec{1}} \qquad \sigma^2 = \frac{1}{n}\,\vec{Y}^T K^{-1} \vec{Y}
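The two estimators above can be implemented directly. A hedged Python/NumPy sketch (names are ours; the deck's own code is MATLAB), with a sanity check: when K is the identity, the estimates reduce to the sample mean and mean square of the data:

```python
import numpy as np

def mle_hyperparams(Y, K):
    """Maximum likelihood estimates from the slide formulas:
    mu = (Y^T K^-1 1)/(1^T K^-1 1),  sigma^2 = (1/n) Y^T K^-1 Y."""
    Kinv = np.linalg.inv(K)
    ones = np.ones(len(Y))
    mu = (Y @ Kinv @ ones) / (ones @ Kinv @ ones)
    sigma2 = (Y @ Kinv @ Y) / len(Y)
    return mu, sigma2

# Sanity check with K = I: mu is the sample mean, sigma^2 the mean square
mu, sigma2 = mle_hyperparams(np.array([1.0, 2.0, 3.0]), np.eye(3))
```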


Terminology

Expected Value

E[X] = \sum_i p_i x_i

Variance

V[X] = E[(X - E[X])^2]

Covariance

Cov[X, Y] = E[(X - E[X])(Y - E[Y])]

\Sigma = Cov[\vec{Y}] =
\begin{pmatrix}
Cov[Y_1, Y_1] & Cov[Y_1, Y_2] & \cdots & Cov[Y_1, Y_n] \\
Cov[Y_2, Y_1] & Cov[Y_2, Y_2] & \cdots & Cov[Y_2, Y_n] \\
\vdots & \vdots & \ddots & \vdots \\
Cov[Y_n, Y_1] & Cov[Y_n, Y_2] & \cdots & Cov[Y_n, Y_n]
\end{pmatrix}
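These definitions can be checked directly on a small discrete distribution. A Python/NumPy sketch with made-up values (the deck itself gives no numeric example):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])     # values x_i
p = np.full(4, 0.25)                   # probabilities p_i (uniform)
y = 2.0 * x                            # a second, perfectly correlated variable

EX = np.sum(p * x)                     # E[X] = sum_i p_i x_i
VX = np.sum(p * (x - EX) ** 2)         # V[X] = E[(X - E[X])^2]
EY = np.sum(p * y)
cov = np.sum(p * (x - EX) * (y - EY))  # Cov[X, Y] = E[(X-E[X])(Y-E[Y])]
```

Since y = 2x here, the covariance comes out as exactly twice the variance of x.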


Forecasting

Forecast

Y_f = E[Y_f \mid \vec{Y}]

Standard Error

se(Y_f) = \sqrt{V[Y_f \mid \vec{Y}]}


Basic Algorithm

1. Given a vector \vec{Y} of n measurements made at times t_1, \ldots, t_n as training data

2. Choose a kernel function

3. Perform a maximum likelihood estimate of the kernel parameters (hyperparameters) using the training data

4. Forecast the measurement Y_f at time t_f. The mean and variance of Y_f given the n measurements \vec{Y} are

E[Y_f \mid \vec{Y}] = \mu + \Sigma_f^T \Sigma^{-1} (\vec{Y} - \vec{\mu})

V[Y_f \mid \vec{Y}] = \Sigma_{ff} - \Sigma_f^T \Sigma^{-1} \Sigma_f
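The four steps can be sketched end-to-end. This is a hedged Python/NumPy sketch (the deck uses MATLAB; all names are ours) with a fixed squared-exponential kernel and hand-set hyperparameters in place of the MLE fit of step 3:

```python
import numpy as np

def k(a, b, sigma=1.0, ell=1.0):
    # Squared-exponential kernel; hyperparameters sigma, ell assumed fixed here
    return sigma**2 * np.exp(-0.5 * ((a - b) / ell) ** 2)

def gpr_forecast(t, Y, tf, mu=0.0):
    """Forecast Y_f at time tf from training measurements (t, Y)."""
    Sigma = k(t[:, None], t[None, :])    # training covariance matrix
    Sigma_f = k(t, tf)                   # cross-covariance with the forecast point
    Sigma_ff = k(tf, tf)                 # prior variance at the forecast point
    w = np.linalg.solve(Sigma, Sigma_f)  # Sigma^-1 Sigma_f, no explicit inverse
    mean = mu + w @ (Y - mu)             # E[Yf | Y]
    var = Sigma_ff - Sigma_f @ w         # V[Yf | Y]
    return mean, var

t = np.array([0.0, 1.0, 2.0])
Y = np.array([1.0, 2.0, 3.0])
mean, var = gpr_forecast(t, Y, tf=1.0)   # forecast at a training time
```

At a training time, this noiseless GP interpolates the data exactly (mean equals the measurement, variance is zero up to rounding); far from the data, the variance returns to the prior σ².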


Why GPR?

GPR accommodates

Asynchronous data sources
Periodic data
Actively measured data
Missing data
Structural data

GPR can model various trends and properties of a data set

Simple covariance functions can be combined to create more complex ones


Combining Covariance Functions

C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.


New Formulae for Updating GPR Forecasts

Expected Value

E[Y_f \mid \vec{Y}, Y_u] = E[Y_f \mid \vec{Y}] + \frac{\begin{pmatrix} \Sigma_f \\ \Sigma_{uf} \end{pmatrix}^T \begin{pmatrix} \Sigma^{-1}\Sigma_u \\ -1 \end{pmatrix} \begin{pmatrix} \Sigma^{-1}\Sigma_u \\ -1 \end{pmatrix}^T \begin{pmatrix} \vec{Y} \\ Y_u \end{pmatrix}}{\Sigma_{uu} - \Sigma_u^T \Sigma^{-1} \Sigma_u}

Variance

V[Y_f \mid \vec{Y}, Y_u] = V[Y_f \mid \vec{Y}] - \frac{\left[\begin{pmatrix} \Sigma_f \\ \Sigma_{uf} \end{pmatrix}^T \begin{pmatrix} \Sigma^{-1}\Sigma_u \\ -1 \end{pmatrix}\right]^2}{\Sigma_{uu} - \Sigma_u^T \Sigma^{-1} \Sigma_u}

Computationally efficient - no new matrix inversions

No need to redo the whole process each time a new data point is received
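The update can be checked numerically against a full recomputation on the augmented training set. A Python/NumPy sketch under a zero-mean assumption (all variable names and toy values are ours, not the deck's):

```python
import numpy as np

def k(a, b, ell=1.0):
    return np.exp(-0.5 * ((a - b) / ell) ** 2)   # unit-variance SE kernel

t = np.array([0.0, 1.0, 2.0]); Y = np.array([0.5, -0.2, 0.3])  # training data
tu, Yu = 2.5, 0.1                                              # newly received point
tf = 4.0                                                       # forecast time

Sigma = k(t[:, None], t[None, :])
Kinv = np.linalg.inv(Sigma)                  # inverted once, then reused
Sigma_f, Sigma_u = k(t, tf), k(t, tu)
Sigma_uf, Sigma_uu = k(tu, tf), k(tu, tu)

E_old = Sigma_f @ Kinv @ Y                   # forecast before the new point
V_old = k(tf, tf) - Sigma_f @ Kinv @ Sigma_f

# Update with no new matrix inversion:
c = Sigma_uf - Sigma_f @ Kinv @ Sigma_u      # Cov(Yf, Yu | Y)
d = Sigma_uu - Sigma_u @ Kinv @ Sigma_u      # Var(Yu | Y)
E_new = E_old + c * (Yu - Sigma_u @ Kinv @ Y) / d
V_new = V_old - c**2 / d
```

E_new and V_new agree with re-running the basic algorithm on the four-point training set that includes (tu, Yu), which is the point of the update formulas.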


Variance - Two Questions

Question 1

What is the effect of history length on prediction error?

[Timeline: past measurements at times t_{n+1}, t_n, \ldots, t_2, t_1 and the forecast time t_f]

E[\mathrm{Var}[Y_f \mid Y_1, \ldots, Y_n] - \mathrm{Var}[Y_f \mid Y_1, \ldots, Y_{n+1}]] = \; ?


Variance - Two Questions

Question 2

How does the variance change as our forecasting point moves out in time?


Variance - Two Questions

Both of these questions boil down to a study of the same quantity:

K_f^T K^{-1} K_f


Bounds

Using the Rayleigh-Ritz theorem, we can bound the quantity that we are interested in, giving us:

\frac{1}{\lambda_{\max}(K)} K_f^T K_f \le K_f^T K^{-1} K_f \le \frac{1}{\lambda_{\min}(K)} K_f^T K_f

Or more simply:

\frac{1}{\lambda_{\max}(K)}\, n\, k(t)^2 \le K_f^T K^{-1} K_f \le \frac{1}{\lambda_{\min}(K)}\, n\, k(t)^2
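The Rayleigh-Ritz bounds on K_f^T K^{-1} K_f can be checked numerically on a small example. A Python/NumPy sketch with our own toy times (the deck does not give numbers):

```python
import numpy as np

t = np.array([0.0, 0.7, 1.5, 2.2])                  # toy measurement times
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2)   # squared-exponential kernel matrix
Kf = np.exp(-0.5 * (t - 3.0) ** 2)                  # cross-covariances to forecast time 3.0

lam = np.linalg.eigvalsh(K)                         # eigenvalues of K, ascending
q = Kf @ np.linalg.solve(K, Kf)                     # the quantity K_f^T K^-1 K_f
lower = (Kf @ Kf) / lam[-1]                         # (1/lambda_max) K_f^T K_f
upper = (Kf @ Kf) / lam[0]                          # (1/lambda_min) K_f^T K_f
```

Since K is symmetric positive definite, every eigenvalue of K^{-1} lies between 1/λ_max(K) and 1/λ_min(K), which is exactly what the bounds express.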


Future Work

Revisit this work from an information theoretic perspective

Improve forecasting of network performance characteristics using multivariate data


Acknowledgements

Department of Energy Research Assistantship

MATLAB Code: Carl Edward Rasmussen and Hannes Nickisch


Questions?
