Empirical Analysis of Bayesian Kernel Methods for
Modeling Count Data
Molly Stam Floyd, Hiba Baroud, and Kash Barker
University of Oklahoma
mstam, hbaroud, [email protected]
Abstract - Bayesian models are used for estimation and forecasting in a wide range of application areas. One extension of such methods is the Bayesian kernel model, which integrates a Bayesian conjugate prior with kernel functions. This paper empirically analyzes the performance of Bayesian kernel models when applied to count data. The analysis is performed with several data sets with different characteristics regarding the numbers of observations and predictors. While the size of the data and the number of predictors vary across data sets, the predictors in this study are all continuous. The Poisson Bayesian kernel model is applied to each data set and compared to the Poisson generalized linear model. The measures of goodness of fit used are the deviance and the log-likelihood functional value, and the computation is done by dividing the data into training and testing sets; for the Bayesian kernel model, a tuning set is additionally used to optimize the parameter of the kernel function. The Bayesian kernel approach tends to outperform classical count data models for smaller data sets with a small number of predictors. The analysis conducted in this paper is an initial step toward the validation of the Poisson Bayesian kernel model. This type of model can be useful in risk analysis applications in which data sources are scarce and can support analytical, data-driven decision making.
Index Terms - Bayesian kernel models, Count data,
Goodness of fit, Poisson regression
INTRODUCTION
In many situations, the likelihood of an event is determined by the average rate at which the event occurs, and that rate is often a function of characteristics surrounding the event.
To integrate the impacts of both the component
characteristics and any prior failure information, we propose
to use a Bayesian kernel model as an approach to a more
accurate estimation of the rate of occurrence of an event.
More specifically, we use an extended version of this
method, the Poisson Bayesian kernel model to accommodate
count data and estimate the rate of occurrence. This paper
provides an empirical analysis of this model using different
types of datasets and measures of goodness of fit used to
validate the accuracy of the model in comparison to more
classical approaches.
Kernel methods, first introduced in a pattern recognition
setting several decades ago [1], have found popularity across a number of data mining domains, including bioinformatics
[2, 3], sensing [4, 5], and financial risk management and
forecasting [6, 7], among many others. Kernel functions are
used to map input data, for which no pattern can be
recognized, to a higher dimensional space, where patterns
are more readily detected. Such functions enable algorithms
designed to detect relationships among data in the higher
dimensional space, including least squares regression and
support vector machine (SVM) classification [8-10].
Integrating Bayesian methods with kernel methods has
recently garnered attention [11-14], as Bayesian methods
make use of previous data to estimate posterior probability
distributions of the parameter of interest given that it follows
a specific prior distribution. The integration of Bayesian and
kernel methods enables a classification algorithm which
provides probabilistic outcomes as opposed to deterministic
outcomes (e.g., those resulting from SVM classification). That is, rather than assigning a class to a data
point, Bayesian kernel methods assign a probability that the
data point belongs to a particular class. Several extensions to
Bayesian kernel models have appeared, including (i) the
relevance vector machine (RVM) which assumes a Gaussian
distribution for the probability to be estimated [15, 16], and
(ii) non-Gaussian distributions for binary problems [17-19].
This paper analyzes a similar approach to model count
data, whereby the outcome estimated is the rate of
occurrence of a certain event rather than a classification of a
data point in a deterministic class. The prior distribution of
the rate is assumed to follow a gamma distribution and the
notion of the conjugate prior is used to construct the posterior
distribution, whose parameters depend on the kernel
function. Section 2 provides a review of appropriate
literature. Section 3 details the development of the Bayesian
kernel approach for count data. Section 4 discusses the goodness of fit measures used in the empirical analysis
presented in section 5, and section 6 provides concluding
remarks.
BACKGROUND
I. Bayes Rule and the Conjugate Prior
The classic Bayes rule assumes that a prior probability for an
event of interest, A, is given as P(A), and a likelihood of
event B conditioned on the occurrence of A is given as P(B |
A). With these probabilities, along with P(B), one can
978-1-4799-4836-9/14/$31.00 (c) 2014, IEEE
calculate the posterior distribution for the event of interest
given knowledge of B, or P(A | B) shown in Eq. (1).
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \quad (1)
This manifests itself, for example, when we want to
develop a posterior distribution for a parameter of interest
from (i) the prior distribution for that parameter, and (ii) the data describing that parameter in the form of a likelihood
function, which is a conditional likelihood of obtaining the
data given what we understand about the parameter. In such
a case, the denominator does not depend on the parameter of
interest and can be excluded from the Bayes rule equation
when maximum likelihood calculations are performed.
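As a quick numerical illustration of Eq. (1), the sketch below computes a posterior from an assumed prior and likelihood; all probability values are hypothetical, chosen only to make the arithmetic concrete.

```python
# Hypothetical numbers, chosen only to illustrate Eq. (1).
p_A = 0.01           # prior P(A): base rate of the event of interest
p_B_given_A = 0.95   # likelihood P(B | A)
p_B_given_notA = 0.05

# Total probability P(B), then the posterior P(A | B) via Bayes rule.
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)
p_A_given_B = p_B_given_A * p_A / p_B
print(round(p_A_given_B, 3))  # 0.161
```

Even with a strong likelihood, the small prior keeps the posterior modest, which is exactly the behavior the denominator in Eq. (1) encodes.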
More specifically, in the SVM framework, consider a
function t that maps input data x to a value corresponding to
its binary class (y = +1, -1). Given a training set of data, a
posterior probability distribution for this function t can be
estimated as being proportional to its prior distribution
multiplied by the likelihood function, as depicted in Eq. (2).
P(t \mid x) \propto P(x \mid t)\, P(t) \quad (2)
An important notion used in the Bayesian framework is
conjugate distributions, which assume that posterior P(t | x)
and prior P(t) distributions are from the same family of
distributions. For example, in the non-Gaussian extension
for the Bayesian kernel models, MacKenzie et al. [19] use
the Beta-Bernoulli conjugate prior. Having the prior and
posterior follow the same family of distributions ensures that the overall
data properties are kept while the details of the
distribution, such as its parameters, are modified to better explain the
trends.
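A minimal sketch of this conjugacy, using the Beta-Bernoulli pairing cited above; the prior parameters and the binary outcomes are hypothetical.

```python
# Beta-Bernoulli conjugate update (the pairing used by MacKenzie et al. [19]).
# Hypothetical prior and data, for illustration only.
a, b = 1.0, 1.0            # Beta(a, b) prior on the success probability
data = [1, 0, 1, 1, 0, 1]  # binary outcomes

successes = sum(data)
failures = len(data) - successes
# The posterior stays in the Beta family: Beta(a + successes, b + failures).
a_post, b_post = a + successes, b + failures

posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, round(posterior_mean, 3))  # 5.0 3.0 0.625
```

The update only shifts the parameters; the distributional family, and hence the overall structure of the model, is preserved.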
II. Gaussian Bayesian Kernel Models and Non-Gaussian Extensions
For an m × d data matrix X with rows corresponding to m
data points each with d attributes, the function t(X) can be
thought of as a random vector of length m. Gaussian
Bayesian kernel models assume the vector-valued function t
follows a multivariate normal distribution with mean
vector 0 and covariance matrix K, where matrix
K is positive definite and matrix element Kij is the kernel
function k(xi, xj) between the ith and jth data points. The
multivariate normal distribution for the realization of t is
found in Eq. (3), where t is a vector-valued variable of
length m [16]. The first term
in the probability density function does not depend on the
parameter t, and hence the prior distribution can be further
reduced.
P(\mathbf{t}) = (2\pi)^{-m/2}\, |K|^{-1/2} \exp\!\left(-\tfrac{1}{2}\, \mathbf{t}^{\top} K^{-1} \mathbf{t}\right) \quad (3)
In the case of a binary classification, an appropriate
likelihood function, P(t | x), would be the logit function
shown in Eqs. (4) and (5).
P(y = +1 \mid t(x)) = \frac{1}{1 + e^{-t(x)}} \quad (4)

P(y = -1 \mid t(x)) = 1 - P(y = +1 \mid t(x)) = \frac{1}{1 + e^{t(x)}} \quad (5)
The posterior distribution is then the product of the
likelihood function and the prior distribution for a data set of
m data points, found in Eq. (6). To estimate the parameter of
interest, t, Eq. (6) is maximized (or its negative log is minimized) using any of several optimization algorithms
(e.g., the Newton-Raphson method).
P(\mathbf{t} \mid \mathbf{y}) \propto \exp\!\left(-\tfrac{1}{2}\, \mathbf{t}^{\top} K^{-1} \mathbf{t}\right) \prod_{i=1}^{m} \frac{1}{1 + e^{-y_i t_i}} \quad (6)
An extension to the basic Bayesian kernel model is the
non-Gaussian Bayesian kernel model [17-19], which can
improve predictive accuracy for certain problems where a
Gaussian distribution for model parameters should not
realistically be assumed. MacKenzie et al. [19] highlight
some of the drawbacks of using the Gaussian distribution for
binary classification problems, use a beta conjugate prior,
and offer an alternative likelihood function to the logit,
expanding previous work on non-Gaussian kernel
models [17, 18] by introducing a more generalized model.
III. Methods for Count Data
One of the classical approaches used to analyze count data is
the Poisson Generalized Linear Model (GLM) [20, 21]. The
Poisson GLM assumes that the rate to be estimated has an
exponential relationship with a linear combination of the
covariates, weighted by coefficients for the different attributes, shown in Eq. (7).
\lambda_i = \exp\!\left(\mathbf{x}_i^{\top} \boldsymbol{\beta}\right) \quad (7)
More sophisticated models analyze count data within a
Bayesian framework such as the Bayesian analysis of the
Poisson model using the Gamma-Poisson conjugate prior,
which will be further discussed in the next section.
Extensions to this model include the analysis of the
parameters of the gamma prior distribution [22].
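The estimation idea behind the GLM of Eq. (7) can be sketched for a single covariate as follows. The data and the Newton-Raphson implementation are hypothetical illustrations, not the authors' code; the fit maximizes the Poisson log-likelihood for the rate exp(b0 + b1·x).

```python
import math

def fit_poisson_glm(x, y, iters=25):
    """One-covariate Poisson GLM (Eq. 7): lambda_i = exp(b0 + b1 * x_i),
    fitted by Newton-Raphson on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        lam = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(yi - li for yi, li in zip(y, lam))
        g1 = sum((yi - li) * xi for xi, yi, li in zip(x, y, lam))
        # Negative Hessian (Fisher information) entries for the 2x2 system.
        h00 = sum(lam)
        h01 = sum(li * xi for xi, li in zip(x, lam))
        h11 = sum(li * xi * xi for xi, li in zip(x, lam))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by hand.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical count data with a mildly increasing trend.
x = [0, 1, 2, 3, 4, 5]
y = [2, 2, 3, 4, 5, 7]
b0, b1 = fit_poisson_glm(x, y)
```

At the maximum-likelihood solution the score equations hold, i.e. the fitted rates reproduce the total count and the covariate-weighted count of the data.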
Other extensions to Bayesian Poisson methods consider
hierarchical models [23]. The proposed model is then based
on the multivariate Poisson-log normal distribution with a
hierarchical Bayesian application. This multivariate
distribution is used to model discrete multiple count data and
is shown in Eq. (8), where N_M denotes the M-dimensional
multivariate normal distribution. The mean vector is
represented by μ, and T is the inverse of the covariance
matrix. The hyper-prior parameters R and π = M are known.
The model is advantageous in that it can model joint
responses and can detect relationships among the categories
of count variables. However, Markov Chain Monte Carlo
methods were utilized to make inferences about the model
parameters, which can oftentimes be complex.
Y_{ij} \mid \lambda_{ij} \sim \mathrm{Poisson}(\lambda_{ij}), \qquad \log \boldsymbol{\lambda}_i \sim N_M\!\left(\boldsymbol{\mu},\, T^{-1}\right) \quad (8)
The model discussed and illustrated in this paper is
simple enough to avoid expensive computations but detailed enough to overcome issues in basic Bayesian approaches such as the
Gamma-Poisson conjugate prior and in count regression
models such as the GLM.
POISSON BAYESIAN KERNEL MODEL
Bayesian kernel methods estimate the rate of occurrence of
the event rather than a deterministic value for the
number of times the event occurs. A common distribution to model count data within a Bayesian
framework is the Gamma-Poisson conjugate prior. The
development of the Poisson Bayesian kernel method
discussed here is found in [24].
It is assumed that the parameter to be estimated is the
rate of occurrence, λ, which follows a Gamma prior
distribution with parameters α and β, as shown in Eq. (9).
f(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha - 1} e^{-\beta \lambda} \quad (9)
For the likelihood function, the product of the Poisson
density function, shown in Eq. (10), is used, since this is a
Gamma-Poisson conjugate prior approach.
P(\mathbf{y} \mid \lambda) = \prod_{i=1}^{m} \frac{\lambda^{y_i} e^{-\lambda}}{y_i!} \quad (10)
Thus, the posterior distribution is the product of Eqs. (9)
and (10). The posterior distribution is also a gamma
distribution, with parameters α′ = α + Σᵢ yᵢ and β′ = β + m. This result is the basic Gamma-Poisson Bayesian approach, which
assumes the notion of exchangeability: for
different sets of training and testing data, the resulting
posterior parameters will be similar, since they are a function
only of the prior parameters, the size of the dataset, and the
summation of all the data points. The characteristics of each
outcome are not taken into consideration in this case, but
rather the overall properties of the dataset [19].
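The exchangeable Gamma-Poisson update described above reduces to two sums over the data; a minimal sketch, with a hypothetical prior and hypothetical counts:

```python
# Basic Gamma-Poisson update: prior Gamma(alpha, beta), Poisson counts y.
# Prior parameters and counts are hypothetical, for illustration only.
alpha, beta = 2.0, 1.0
y = [3, 1, 4, 2, 0]

alpha_post = alpha + sum(y)             # alpha' = alpha + sum of counts
beta_post = beta + len(y)               # beta'  = beta + number of observations
rate_estimate = alpha_post / beta_post  # posterior mean of the rate
print(alpha_post, beta_post, rate_estimate)  # 12.0 6.0 2.0
```

Note that only sum(y) and len(y) enter the update: permuting the observations leaves the posterior unchanged, which is the exchangeability property the text refers to.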
Rearranging the product of the likelihood function and
the prior distribution function results in a Gamma
distribution in Eq. (11).
P(\lambda \mid \mathbf{y}) = \frac{\left(\beta + m\right)^{\alpha + \sum_i y_i}}{\Gamma\!\left(\alpha + \sum_i y_i\right)}\, \lambda^{\alpha + \sum_i y_i - 1}\, e^{-(\beta + m)\lambda} \quad (11)
Using the same argument as above, the parameters for
the Bayesian kernel model for counts are expressed in Eqs.
(12) and (13). K is the m × m kernel matrix, Y is an m × 1
vector containing the output data associated with the m
observations of X, and V is an m × 1 vector containing ones.
\boldsymbol{\alpha}^{*} = \alpha + K Y \quad (12)

\boldsymbol{\beta}^{*} = \beta + K V \quad (13)
With the addition of the kernel function, the new data
point is compared with the training set, and according to the
similarity of the attributes, new values for the parameters of
the posterior distribution are computed. The choice of the type of kernel function depends on the application and
the model user. For the purpose of the empirical analysis
conducted in this paper, we use the most popular kernel
function, the radial basis function in Eq. (14),
where Kij is one entry in the matrix representing the
kernel function between the ith and jth data points. Note that
in the data sets used in this empirical study, all predictors are
continuous variables. The radial basis function parameter, σ,
is tuned to obtain an optimal value that either
maximizes the log-likelihood function or minimizes the
deviance; details on the tuning of this parameter are
discussed in the next section.
K_{ij} = k(x_i, x_j) = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right) \quad (14)
The rate for the new data point then follows a Gamma
distribution with parameters α* and β*. As a point estimate for this parameter, we consider the expected value of the
posterior distribution, shown in Eq. (15) as the ratio of the
gamma distribution parameters α* and β*.
\hat{\lambda} = E[\lambda \mid \text{data}] = \frac{\alpha^{*}}{\beta^{*}} \quad (15)
Note that a different point estimate for the rate can be
used such as the median, the mode, or the variance,
depending on the type of problem and the model users.
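Putting Eqs. (12)-(15) together for a single new observation, the sketch below uses the kernel row between the new point and the training set as weights in the Gamma-Poisson update. The training data are hypothetical, the prior is set to α = β = 0, and the RBF form exp(-||xi - xj||² / (2σ²)) is one common convention assumed here.

```python
import math

def rbf(xi, xj, sigma):
    """Radial basis kernel of Eq. (14) for continuous attribute vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def pbk_rate(x_new, X_train, Y_train, alpha=0.0, beta=0.0, sigma=1.0):
    """Posterior-mean rate for a new point under the Poisson Bayesian
    kernel model, combining Eqs. (12), (13), and (15)."""
    k = [rbf(x_new, xi, sigma) for xi in X_train]
    alpha_post = alpha + sum(ki * yi for ki, yi in zip(k, Y_train))  # Eq. (12)
    beta_post = beta + sum(k)                                        # Eq. (13)
    return alpha_post / beta_post                                    # Eq. (15)

# Hypothetical training data: one continuous predictor, count responses.
X_train = [[0.0], [1.0], [2.0], [3.0]]
Y_train = [1, 2, 4, 8]
rate = pbk_rate([1.5], X_train, Y_train)
```

The estimate lands between the counts of the nearest training points, since those points receive the largest kernel weights.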
GOODNESS OF FIT MEASURES
The purpose of this paper is to empirically test the Poisson
Bayesian kernel model to determine how well it fits different
data sets in comparison to another classical method for
modeling count data, the Poisson generalized linear model
(GLM) [20, 21]. The Poisson GLM, presented in the
background section in (7), assumes that the rate to be
estimated has an exponential relationship with a set of
covariates representing coefficients for the different
attributes.
The functional values of two metrics are used to compare
the two models. The first metric is the deviance, which
computes the difference in the log-likelihood function
between the fitted model and the saturated model, Eq. (16),
where n is the size of the testing set, yᵢ is the true value of
the ith data point, and λ̂ᵢ is the estimated rate for that data point.
D = 2\left[\log L(\mathbf{y};\, \mathbf{y}) - \log L(\hat{\boldsymbol{\lambda}};\, \mathbf{y})\right] \quad (16)
The deviance is the generalized form of the sum of
squared errors used in the linear regression model; it is a
metric that analyzes the discrepancy between the observed
and estimated values. The deviance for a Poisson regression
model is represented in Eq. (17), where the term yᵢ log(yᵢ / λ̂ᵢ) is taken to be 0
when yᵢ = 0. We use Eq. (17) to assess how well the fitted values
represent the observed rates of occurrence in both the
Poisson GLM and the Poisson Bayesian kernel model.
D = 2 \sum_{i=1}^{n} \left[ y_i \log\!\left(\frac{y_i}{\hat{\lambda}_i}\right) - \left(y_i - \hat{\lambda}_i\right) \right] \quad (17)
The second metric used is the functional value of the log-
likelihood, shown in Eq. (18), which is to be maximized.
Note that the Poisson GLM coefficient estimates are
computed such that the likelihood is maximized. The
Poisson Bayesian kernel model is fitted given a tuned
parameter, σ, of the radial basis kernel function in (14). This
parameter is optimized such that the log-likelihood function
is maximized.
\log L = \sum_{i=1}^{n} \left[ y_i \log \hat{\lambda}_i - \hat{\lambda}_i - \log(y_i!) \right] \quad (18)
To determine the robustness of the tuning of this
parameter and its influence on the estimated posterior
parameters, α* and β*, in (12) and (13), respectively, the
metrics are also computed for σ tuned to minimize the
deviance. Note that for the analysis of both metrics, we
discard components that are independent of the model, such
as the multiplication by 2 in the deviance and the log(yᵢ!) term in the
log-likelihood function.
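The two goodness-of-fit measures follow directly from Eqs. (17) and (18); in the sketch below the observed counts and the two competing sets of fitted rates are hypothetical, and the constant log(y!) term is dropped from the log-likelihood as described above.

```python
import math

def poisson_deviance(y, lam):
    """Poisson deviance, Eq. (17); y*log(y/lam) is taken as 0 when y = 0."""
    d = 0.0
    for yi, li in zip(y, lam):
        term = yi * math.log(yi / li) if yi > 0 else 0.0
        d += 2.0 * (term - (yi - li))
    return d

def poisson_loglik(y, lam):
    """Poisson log-likelihood, Eq. (18), without the constant log(y!) term."""
    return sum(yi * math.log(li) - li for yi, li in zip(y, lam))

# Hypothetical observed counts and two competing sets of fitted rates.
y = [2, 0, 3, 5]
fit_close = [2.1, 0.4, 2.8, 4.9]   # rates close to the observations
fit_flat = [1.0, 1.0, 1.0, 1.0]    # a poor, constant-rate fit
```

A better-fitting model has a smaller deviance and a larger log-likelihood, which is how the two models are ranked in the tables that follow.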
EMPIRICAL ANALYSIS
The model discussed above is applied to several data sets
[25-27], and its performance is compared to the Poisson
GLM using the metrics discussed in the previous section. A
brief description of the data sets is found in Table I. The first
three data sets are similar in terms of the number of
predictors and the size of the data, while the fourth set has a
larger number of predictors for a small data set, and the fifth
is a large data set with a small number of predictors. Note
that the number of predictors is held constant across the
models to ensure consistency in the comparison, though
future research could consider the goodness of fit of each
model given the number of predictors required to explain the
rate of occurrence and achieve the same level of accuracy.
Also, the prior parameters are assumed to be equal to zero,
α = β = 0. Testing is performed for 100 trials on 30% of the data,
with 50% of the data used as a training set and 20% as a
tuning set for computing the unknown parameter, σ, in the
kernel function. Once σ is tuned, the training and tuning sets are combined into one training set to perform the testing. For
each of the two models, the estimated rate of occurrence is
computed for the testing set and used to evaluate the
deviance and the log-likelihood functional value given the
observed values. This process is repeated 100 times where,
at each iteration, random samples of training, tuning, and
testing sets are chosen. Tables II and III provide a summary
of the analysis.
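One trial of this sampling scheme can be sketched as follows; the seed and the use of Python's random module are illustrative assumptions, not the authors' implementation.

```python
import random

def split_indices(m, seed=None):
    """Randomly split m observations into 50% training, 20% tuning
    (for the kernel parameter), and 30% testing, as in the analysis."""
    rng = random.Random(seed)
    idx = list(range(m))
    rng.shuffle(idx)
    n_train = int(0.5 * m)
    n_tune = int(0.2 * m)
    train = idx[:n_train]
    tune = idx[n_train:n_train + n_tune]
    test = idx[n_train + n_tune:]
    return train, tune, test

train, tune, test = split_indices(50, seed=1)
print(len(train), len(tune), len(test))  # 25 10 15
```

Repeating this 100 times with fresh random splits, and averaging the deviance and log-likelihood over the trials, yields values like those reported in Tables II and III.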
The deviance and log-likelihood values presented in the
tables below are the average values of the goodness of fit
measures evaluated over 100 trials. PBK refers to the
Poisson Bayesian kernel model and PGLM refers to the
Poisson GLM. Recall that a model with a smaller deviance
and a larger log-likelihood functional value fits the data
better.
TABLE I
DESCRIPTION OF DATA SETS IN THE POISSON BAYESIAN KERNEL MODEL VALIDATION STUDY

Data set | Number of attributes | Data set size | Dependent variable | Predictors
Crime | 4 | 50 | Crime rate | Race, percentage of high school graduates, percentage below poverty level, percentage with a single parent
Murder | 4 | 51 | Murder rate | Race, percentage of high school graduates, percentage below poverty level, percentage with a single parent
Murder in Metropolitan | 4 | 51 | Murder rate in metropolitan areas | Race, percentage of high school graduates, percentage below poverty level, percentage with a single parent
Mussels | 8 | 45 | Number of species of mussels | Area, number of stepping stones (intermediate rivers) to 4 major species-source river systems, concentration of nitrate, solid residue, concentration of hydronium
Customer | 5 | 110 | Number of customers visiting a store from a particular region | Number of housing units in the region, average household income in the region, average housing unit age in the region, distance to the nearest competitor, distance to the store
Overall, there are three out of five data sets for which the
Poisson Bayesian kernel model performs better than the
Poisson GLM, and in particular, those three cases are all
among the four small data sets. The deviance and the
log-likelihood behave similarly for all the data sets and lead
to the same conclusion regarding model performance.
The deviance tends to be larger whenever we have a
small dataset and a small number of predictors, and in both
cases where we have the largest deviances among all datasets (Crime and Murder in Metropolitan area), the
Poisson Bayesian kernel model performed better than the
Poisson GLM. While conclusions might not be definitive
without further analysis, the Poisson Bayesian kernel model
initially appears to be a good model when we have a small
data set with a small number of predictors, a situation known
to cause issues with regression modeling [21].
TABLE II
GOODNESS OF FIT MEASURES (MAXIMIZING THE LOG-LIKELIHOOD)

Data Set | Deviance (PBK) | Deviance (PGLM) | Log-Likelihood (PBK) | Log-Likelihood (PGLM)
Crime | 79.8 | 130.1 | 2632.5 | 2582.1
Murder | 23.4 | 7.6 | 172.1 | 187.9
Murder in Metropolitan area | 56.7 | 64.2 | 3190.7 | 3183.3
Mussels | 13.9 | 17.3 | 207.6 | 204.2
Customer | 24.2 | 18.6 | 567.7 | 573.2
Recall that the radial basis function parameter, σ, was
initially tuned such that the log-likelihood is maximized, which complies with the estimation method of the Poisson
GLM [21]. In order to assess the robustness of the tuning
process and its impact on the empirical analysis and the
goodness of fit measures, we perform the same computation
using a σ tuned such that the deviance is minimized. The
results of the computation are summarized in Table III.
TABLE III
GOODNESS OF FIT MEASURES (MINIMIZING THE DEVIANCE)

Data Set | Deviance (PBK) | Deviance (PGLM) | Log-Likelihood (PBK) | Log-Likelihood (PGLM)
Crime | 90.2 | 130.1 | 2662.1 | 2582.1
Murder | 24.1 | 7.6 | 171.4 | 187.9
Murder in Metropolitan area | 55.6 | 64.2 | 3191.9 | 3183.3
Mussels | 16.6 | 17.3 | 204.9 | 204.2
Customer | 41.9 | 18.6 | 550.0 | 573.2
Although the values of the goodness of fit measures are different for the Poisson Bayesian kernel model, the
conclusion regarding the performance of the model is the
same under both the deviance and log-likelihood function.
Note that these new values are compared with the same
values of deviance and log-likelihood for the Poisson GLM,
since maximum likelihood estimation is the standard method
for fitting GLMs [21]. This suggests that the tuning process is
robust enough that the difference in the goodness of fit
measures is insignificant and did not result in any change in
the conclusion of the analysis.
CONCLUDING REMARKS
The Poisson Bayesian kernel model is presented in this
paper and empirically tested and compared with the classical
Poisson GLM. Both models were used to fit several datasets
having different characteristics in terms of the size of the
data and the number of predictors.
The evaluation of the performance of each model is
based on the values of the deviance and the log-likelihood
function. Based on the results obtained, the Poisson
Bayesian kernel model outperformed the Poisson GLM in
the majority of the sets. Also, the Poisson Bayesian kernel
model might be a better model for small-sized data sets having few predictors. Such a result can be very useful in
risk analysis applications to estimate the rate of occurrence
of a certain disruption in transportation systems or power
grids. In such cases, data sources can be scarce because the
event occurs rarely and the possible factors that might cause
a disruption are not fully observed; a more accurate estimate
of the rate of disruption can help save lives and lead to more
efficient preparedness and recovery investment and
allocation.
The results presented in this paper serve as an initial step
in the validation process of the Poisson Bayesian kernel
model. Future research will investigate the impact of
changing the number of predictors across the models
analyzed and look into the comparison with other types of
count data models, in addition to considering other measures
for testing the goodness of fit.
REFERENCES
[1] Aizerman, M., Braverman, E., and Rozonoer, L., 1964, “Theoretical
foundations of the potential function method in pattern recognition
learning,” Automation and Remote Control, 25, pp. 821–837.
[2] Schölkopf, B., Guyon, I., and Weston, J., 2003, “Statistical Learning
and Kernel Methods In Bioinformatics,” IOS Press Amsterdam, The
Netherlands, pp. 1–21.
[3] Ben-Hur, A. and Noble, W.S. 2005. “Kernel methods for predicting
protein–protein interactions,” Bioinformatics, 21(Suppl. 1), pp. i38–
i46.
[4] Arias, P., Randall, G., and Sapiro, G. 2007. “Connecting the Out-of-
sample and Preimage Problems in Kernel Methods.” Proceedings of
IEEE Conference on Computer Vision and Pattern Recognition,
Minneapolis, Minnesota, pp. 18-23.
[5] Camps-Valls, G., Rojo-Alvarez, J. L., and Martinez-Ramon, M. 2006.
Kernel Methods in Bioengineering, Signal and Image Processing.
Hershey, PA: IGI Global.
[6] Wang, L. and Zhu, J., 2010, “Financial market forecasting using a
two-step kernel learning method for the support vector regression.”
Annals of Operation Research, 174, pp. 103-120.
[7] Mitschele, A., Chalup, S., Schlottmann, F., and Seese, D. 2006.
“Applications of Kernel Methods in Financial Risk Management.”
Computing in Economics and Finance , Society for Computational,
no. 317.
[8] Cherkassky, V. and F. Mulier. 1998. Learning from Data: Concepts,
Theory, and Methods. Hoboken, NJ: Wiley.
[9] Cristianini, N., J. Shawe-Taylor. 2000. An Introduction to Support
Vector Machines and Other Kernel based Learning Methods.
Cambridge, UK: Cambridge University Press.
[10] Hastie, T., R. Tibshirani, and J. Friedman. 2001. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New
York, NY: Springer.
[11] Seeger, M. “Bayesian model selection for support vector machines,
Gaussian processes and other kernel classifiers.” In Solla, S. A., Leen,
T. K., and Müller, K. R. (eds.), 2000, Advances in neural information
processing systems. Cambridge: MIT Press, pp. 603-609.
[12] Bishop, C. M. and Tipping, M. E. “Bayesian regression and
classification,” In Suykens, J. A. K., Horváth, G., Basu, S., Micchelli,
C., and Vandewalle, J. (eds), 2003, Advances in Learning Theory:
Methods, Models and Applications, IOS press, Amsterdam, pp. 267-
288
[13] Mallick, B. K., Ghosh, D., and Ghosh, M. 2005. “Bayesian
classification of tumours by using gene expression data,” Journal of the Royal Statistical Society, Part B, 67(2), pp. 219-234.
[14] Zhang, Z., Dai, G., and Jordan, M. I. 2011. “Bayesian generalized
kernel mixed models,” Journal of Machine Learning Research, 12,
pp. 111-139.
[15] Tipping, M.E. 2001. “Sparse Bayesian Learning and the Relevance
Vector Machine.” Journal of Machine Learning Research, 1, pp. 211-
244
[16] Schölkopf, B. and Smola A. J. 2002. Learning with Kernels: Support
Vector Machines, Regularization, Optimization, and Beyond. MIT
Press, Cambridge, MA.
[17] Montesano, L., and Lopes, M. June 2009. “Learning grasping
affordances from local visual descriptors.” Proceedings of the 8th
IEEE international conference on development and learning,
Shanghai, China, pp. 1-6.
[18] Mason, M., and Lopes, M. March 2011. “Robot self-initiative and
personalization by learning through repeated interactions,”
Proceedings of the 6th ACM/IEEE international conference on
human-robot interaction, Lausanne, Switzerland, pp. 433-440.
[19] MacKenzie, C.A., Trafalis, T. B., and Barker, K. “Bayesian Kernel
Methods for Non-Gaussian Distributions.” In revision.
[20] Cameron, A.C. and Trivedi, P. K. 1986. “Econometric Models Based
on Count Data: Comparisons and Applications of Some Estimators
and Tests,” Journal of Applied Econometrics, 1(1), pp. 29-53.
[21] Cameron, A.C. and Trivedi, P. K. 1998. Regression Analysis of Count
Data. Cambridge University Press, Cambridge, UK.
[22] Winkelman, R. 2008. “Chapter 8: Bayesian Analysis of Count Data.”
In Econometric Analysis of Count Data, 5th edition, Springer, Verlag
Berlin Heidelberg.
[23] Tunaru, R. 2002. “Hierarchical Bayesian Models for Multiple Count
Data,” Australian Journal of Statistics, 31(2-3), pp. 221-229.
[24] Baroud, H., Barker, K., Lurvey, R., and MacKenzie, C. A. 2013.
“Bayesian Kernel Models for Disruptive Event Data,” Proceedings of
the ISERC, San Juan, Puerto Rico, pp.1777-1785.
[25] Agresti A. and Finlay, B. 2008. Statistical Methods for the Social Sciences, 4th edition, Prentice Hall.
[26] Sepkoski, J. J. and Rex, M. A. 1974. "Distribution of Freshwater
Mussels: Coastal Rivers as Biogeographic Islands." Systematic
Zoology, 23(2), pp. 165-188.
[27] Kutner, M. H., Nachtsheim, C., Neter, J., and Li, W. 2005. Applied
Linear Statistical Models, 5th edition, New York: McGraw-Hill-Irwin.
AUTHOR INFORMATION
Molly Stam Floyd is an Undergraduate Research Assistant
in the School of Industrial and Systems Engineering at the
University of Oklahoma. She will earn a B.S. in Industrial
Engineering in May 2014 and will pursue graduate studies
thereafter. Her research interests lie in the resilience of
disaster recovery and humanitarian relief networks, and her
research has been funded by the Experimental Program to
Stimulate Competitive Research (EPSCoR).
Hiba Baroud is a Ph.D. Candidate and Graduate Research
Assistant in the School of Industrial and Systems
Engineering at the University of Oklahoma. She came to OU
following B.S. and M.S. degrees in Actuarial Science from
Notre Dame University, Lebanon and the University of
Waterloo, respectively. Her research interests include
statistical modeling for risk analysis and decision making.
Kash Barker is an Assistant Professor in the School of
Industrial and Systems Engineering at the University of
Oklahoma. His research interests primarily lie in the
reliability, resilience, and economic impact of infrastructure
networks, and his work has been funded by the National
Science Foundation and the Army Research Office, among
others. He earned B.S. and M.S. degrees in Industrial
Engineering from the University of Oklahoma and a Ph.D. in
Systems Engineering at the University of Virginia.