Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Estimating the Probability of Informed Trading:
A Bayesian approach
Jim Griffin1, Jaideep Oberoi∗2, and Samuel D. Oduro3
1Department of Statistical Science, University College, Gower Street, London WC1E 6BT, United
Kingdom. Email: [email protected]
2Kent Business School, University of Kent, Parkwood Road, Canterbury CT2 7FS, United
Kingdom. Tel: +44 1227 82 3865 Email: [email protected]
3Public Health & Intelligence, NHS National Services Scotland, 1 South Gyle Crescent, Edinburgh
EH12 9EB, United Kingdom. Tel: +44 1312 75 6934 Email: [email protected]
Abstract
The Probability of Informed Trading (PIN) is a widely used indicator of infor-
mation asymmetry risk in the trading of securities. Its estimation using maximum
likelihood algorithms has been shown to be problematic, resulting in biased estimates,
especially in the case of liquid and frequently traded assets. We provide an alternative
approach to estimating PIN by means of a Bayesian method that addresses some of
the shortcomings in the existing estimation strategies. The method leads to a natural
quantification of the uncertainty of PIN estimates, which may prove helpful in their
use and interpretation. We also provide an easy to use toolbox for estimating PIN.
JEL classification: C13, G12, G14
Keywords : PIN, software, Bayesian estimation, information asymmetry risk, robust esti-
mation.
∗Please send correspondence to [email protected]. Tel: +44 1227 82 3865
1. Introduction
The probability of informed trading (PIN) is a widely used measure of information asym-
metry risk introduced in a sequence of papers following Easley and O’Hara (1992). In
particular, the measures in Easley, Kiefer, O’Hara, and Paperman (1996) (EKOP-PIN for
the initials of the authors) and Easley, Hvidkjaer, and O’Hara (2002) (EHO-PIN) have been
frequently applied in studies related to illiquidity risk and informed trading. In this paper,
we propose a Bayesian approach to estimating PIN in order to address well-documented
problems with its estimation using maximum likelihood algorithms. We also provide the
associated code in the form of a toolbox for use by researchers.
The importance of PIN is associated with its implications for trading costs, illiquidity
risk and expected returns (see e.g. the theoretical models of Glosten and Milgrom, 1985;
Kyle, 1985; Easley and O’Hara, 1987, 2004). It is based on the assumption that there are two
types of agents that enter the market to trade: those wishing to exploit superior or private
information (informed traders) and those that wish to trade for other reasons (variously
referred to as uninformed, liquidity or noise traders). In theoretical models, market makers
adjust their bid and ask quotes to reduce the risk of losses from trading against informed
counterparties. As this may affect the expected returns on an asset, PIN has been used
as the main explanatory variable or as a control in regressions related to asset pricing.
Consequently, the measure itself has undergone scrutiny and refinement over the years.
In the theoretical model, the key information observed by the market maker to estimate
PIN is the order flow. As a result, PIN models specify a distribution for the signed trades
(buys or sells) initiated by informed and uninformed traders and estimate the relevant param-
eters using the maximum likelihood estimator (MLE). The literature has found two practical
drawbacks with the MLE. Firstly, studies have found that the estimation algorithm leads
too often to floating-point exceptions when the inputs to the likelihood function (numbers
of buy and sell trades) are large (see, e.g. Boehmer, Grammig, and Theissen, 2007; Jackson,
2013; Lei and Wu, 2005; Lin and Ke, 2011; Yan and Zhang, 2012). Secondly, the estimates
1
of some of the underlying parameters of PIN (from existing MLE algorithms) often fall on
the boundary of the parameter space. It is not clear whether the instances of boundary
value solutions for the parameters arise due to the choice of initial values or due to model
misspecification, but they could lead to bias and instability in PIN estimates.
Problems with the MLE algorithms have been acknowledged since the early work ap-
plying PIN. For instance, Easley and O’Hara (2004) introduce a modified factorization of
the likelihood function to reduce the effects of such problems. Lin and Ke (2011) address
the problems further by using an alternative factorization of the objective function. How-
ever, Yan and Zhang (2012) show that one still needs to choose initial values for the MLE
maximizer carefully in order to achieve stable results. Thus the estimates are likely to be
dependent on the choice of the initial values used by the optimizer, an issue of concern if
the likelihood surface has several local maxima. In our applications, we observed that the
parameter estimates were often very close to the starting values, suggesting the need for
additional care. Our approach sidesteps these issues by using Bayesian estimation instead.
Several papers use PIN in cross-section or panel regressions (see, e.g. Chen, Goldstein,
and Jiang, 2007; Christoffersen, Goyenko, Jacobs, and Karoui, 2018; Duarte and Young,
2009; Easley et al., 2002; Easley and O’Hara, 2004; Easley, Hvidkjaer, and O’Hara, 2010;
Lai, Ng, and Zhang, 2014; Mohanram and Rajgopal, 2009; Vega, 2006, for just a few.).
When PIN values cannot be calculated, the loss of observations in such regressions can
be significant. Jackson (2013) reports that Easley et al. (2010) “lose firms representing
nearly 24% of the market capitalization of the NYSE and AMEX” while in “Yan and Zhang
(2012), the fraction of market capitalization lost grows from 2% in 1993 to 42% in 2004.”
Lost observations can lead to biased results, especially since larger firms (which have high
numbers of trades) are more likely to be affected.
Our paper contributes to the literature by addressing estimation problems identified in
previous studies. We use the Gibbs Sampling methodology to explore the entire posterior
distribution of the model parameters in a Bayesian analysis, thereby avoiding the numerical
2
instability problem faced by MLE maximizers. The Bayesian method does not require any
ad hoc selection of initial values. This approach also provides a natural way of quantifying
uncertainty in point estimates of the PIN (using credible intervals) from its posterior dis-
tribution. Most available methods and the majority of papers in the literature do not pay
much attention to the uncertainty of PIN estimates. Given the potential for model misspec-
ification (see Gan, Wei, and Johnstone, 2017), it may be useful for researchers to be aware
of the uncertainty surrounding their point estimates of PIN. In addition, our approach leads
to reliable estimation of PIN at daily frequency using only 26 intraday observations, as com-
pared to the usual recommendation in the literature for a minimum of 60 daily observations
resulting in quarterly estimates. Higher frequency estimates offer opportunities to use PIN
for new studies.
The improvement in estimation proposed here matters not simply for its own sake. One
of the debates about PIN is that it does not represent the probability that it purports to
measure (see, e.g. Aktas, De Bodt, Declerck, and Van Oppens, 2007; Duarte and Young,
2009). However, if the bias in PIN is caused by issues such as numerical problems, then the
debate cannot be fully resolved, because the bias will not be well-understood.
The potential to obtain more stable estimates of PIN opens up opportunities to use the
measure in more applications. PIN is usually calculated with the daily numbers of buy and
sell trades over a period of between one quarter and one year. Given the speed of markets, it
would be useful to have, say, a time series of daily estimates of PIN by assuming that news
arrives and is absorbed by markets at shorter intervals than one day. Standard estimation
methods over shorter horizons are considered problematic partly because of the instability
of the estimates, so most studies have been limited to lower frequency cross-sectional appli-
cations of PIN. A notable exception is that of Brennan, Huh, and Subrahmanyam (2018),
who used PIN in an event study setting. In order to achieve this, they first estimate PIN
over a longer period (two months) and then update the estimates using Bayesian updating
on a daily basis. Our approach is more direct and can work over any of the frequencies. In
3
this paper, we demonstrate how our methodology can be used to produce daily estimates of
PIN and its credible intervals over twelve years for five stocks. We also provide quarterly
estimates over the same period, and in both cases, compare the estimates to those obtained
by MLE.
The paper is organized as follows. The next section provides background in the form of
a brief description of the EHO-PIN model along with the MLE procedure. Section 3 details
the Bayesian procedure for estimation. In Section 4, we demonstrate our results using data
on five stocks over a twelve year period. We then briefly conclude.
2. Background
This section provides background on the EHO version of the PIN model and its estimation
under the MLE approach.
2.1. The EHO-PIN Model
In the theoretical market microstructure setting (see e.g. Glosten and Milgrom, 1985),
informed traders act on advantage while uninformed traders buy or sell for reasons other
than the possession of superior knowledge of the fundamental value of the asset. In such a
setting, Easley et al. (1996) and Easley, Kiefer, and O’Hara (1997) estimated models based
on the information structure in Easley and O’Hara (1992). The main idea behind the model
is that an unusual imbalance between buy and sell trades reflects the activity of informed
traders.
The model described by Easley et al. (2002) assumes that within any trading day, the
number of buyer and seller initiated trades from informed and uninformed traders are real-
izations of independent Poisson distributions whose mean depends on whether no news, good
news or bad news occurs on that day (a representation of the model as a probability tree is
provided in Figure 1). We will think about the natural generalisation of the model where
4
the type of news is fixed over intervals at other frequencies, for example over a 15−minute
interval. As a result, we will substitute the phrase trading period instead of trading day in
this paper. The model assumes that the probability of news (good or bad) in any trading
period is α. Given that there is news, the probability that an asset value will be negatively
affected by the news event is δ (which implies that the probability of a positive effect is
1 − δ). In any given trading period, liquidity traders are present in the market to either
buy or sell the asset. In a bad news period, informed traders expect an adverse effect on
the value of the asset, and are therefore likely to sell the asset. The order arrivals in a bad
news period are assumed to follow independent Poisson distributions with means µ for the
informed traders, and λb and λs for liquidity buy and sell traders respectively. This implies
that the total numbers of buyer and seller initiated trades in a bad news period are Poisson
distributed with means λb and λs+µ respectively. Similarly, in a good news period, the total
numbers of buy and sell trades are Poisson distributed with means λb+µ and λs respectively.
Finally, in a no news period, informed traders will not participate and the total numbers of
buys and sells are Poisson distributed with means λb and λs respectively. In practice, we
do not observe the arrival of traders or the occurrence of a news event and these must be
inferred from the observable trade data.
[Insert Figure 1 near here]
2.2. Alternative models
In this paper, we will concentrate on the EHO version of the PIN model since it is widely
used despite any criticism (some of which is cited in the current paper). However, the
Gibbs sampling method used in this paper can be relatively easily extended to other similar
models or model decompositions. For example, the EHO version assumes that informed
trader arrivals are the same for bad and good news event periods. It is possible to relax
this assumption and allow for unequal arrival rates between good and bad news periods.
5
Allowing for possible asymmetry is justified, for instance, by short-sale constraints or other
conditions that restrict the ability of traders to act equally aggressively to both good and
bad news. This can be modelled by assuming that the number of informed buyers and
sellers are Poisson distributed with means µb and µs respectively. A Gibbs sampler for a
Bayesian analysis of this model, which extends our approach to the EHO model, is described
in Appendix C.
2.3. MLE Approach
Let Bt and St be the total numbers of buy and sell trades in trading period t, respectively.
The joint likelihood function for a sample of t = 1, . . . , T , trading periods is given as follows
L (Θ|B,S) =
T∏t=1
[αδe−(µ+λs) (µ+ λs)
St
St!
e−λb (λb)Bt
Bt!+ α (1− δ) e
−(µ+λb) (µ+ λb)Bt
Bt!
e−λs (λs)St
St!
+ (1− α)e−λb (λb)
Bt
Bt!
e−λs (λs)St
St!
], (1)
where Θ = (α, δ, µ, λb, λs), B = (B1, . . . , BT ) and S = (S1, . . . , ST ). Easley et al. (2002)
estimate the vector of parameters Θ by maximizing equation 1. In this model the PIN is
defined as
PIN =αµ
αµ+ λs + λb. (2)
This is the ratio of expected informed trading to expected total trades. When we do not
observe whether a trade was initiated by a buy order or a sell order, classification of trades
as buys and sells can be carried out using the information available, for instance by using a
rule based on the Lee and Ready (1991) trade classification algorithm.
The biased estimates using MLE may simply arise from the size of the numbers that
enter the estimator, leading to “NaN” error codes. As noted by Lin and Ke (2011) and Yan
and Zhang (2012), the daily number of trades has grown large. Therefore it is possible for
6
the likelihood function to produce a number larger (or smaller) than the largest (smallest)
acceptable value of a computer software. However, we also find that smaller numbers of
trades (e.g., when counted at 15−minute intervals) still lead to less stable estimates than
the Bayesian method, when plotted over time. This may be related to the fact that there
are only 26 trading periods of length 15 minutes in a day.
3. The Bayesian Estimation Approach
Our goal is to learn about PIN and its underlying parameters from observed transaction
data. The MLE approach assumes that the model parameters are unknown but fixed. In
Bayesian inference, we express the uncertainty about the unknown model parameters through
the rules of probability. We achieve this through Bayes’ rule which states that the probability
of the parameter set Θ given the observed data is
p(Θ|B, S) =p(B, S,Θ)
p(B, S)
=p(Θ) p(B, S|Θ)
p(B, S)
∝ p(Θ) p(B, S|Θ). (3)
The denominator in Equation 3, p(B, S) =∫p(Θ) p(B, S|Θ)dΘ, is a normalizing constant.
It guarantees that p(Θ|B, S) is a well defined probability density function. The term p(Θ),
referred to as the prior density, is not dependent on the data. It is used to express the prior
knowledge and uncertainty about the model parameters before observing the data. The term
p(B, S|Θ), usually referred to as the likelihood function is the probability density function of
the data conditional on the model parameters. In Bayesian inference, the primary object of
interest is p(Θ|B, S), which is referred to as the posterior density. From the posterior density,
we can compute point estimates like the mean and mode as well as credible intervals for the
model parameters. We employ a Markov Chain Monte Carlo (MCMC) method, namely the
7
Gibbs Sampler, to infer the parameters of the EHO model. The Gibbs Sampler explores the
entire support of the posterior distribution of the model parameters.
3.1. The Gibbs Sampler
The Gibbs Sampler is a Markov chain Monte Carlo algorithm which generates a sample
from the posterior distribution. The algorithm uses the full conditional distribution of the
posterior distribution. If the parameters are θ1, . . . , θk then the full conditional distribution
for θj is p(θj|θ1, . . . , θj−1, θj+1, . . . , θk, y). The algorithm proceeds by updating each parame-
ter (or block of parameters) in turn from its full conditional distribution by sampling a value
from its full conditional distribution (with all other parameters set to their current values).
Each iteration of the Gibbs sampler involves updating all parameters. A summary of the
Gibbs sampler algorithm is as follows:
• Step 0 : Initialize θ(0)i , θ
(0)2 , . . . , θ
(0)k
• Step 1 : Draw once from p(θ1|θ(0)2 , . . . , θ
(0)k , y) to obtain θ
(1)1
• Step 2 : draw once from p(θ2|θ(1)1 , . . . , θ
(0)k , y) to obtain θ
(1)2
• . . .
• Step k : draw once from p(θk|θ(1)1 , . . . , θ
(1)k−1, y) to obtain θ
(1)k
• Repeat Steps 1 to k for say G times to generate G Monte Carlo draws from the full
conditionals.
The posterior expectation of any function of the parameters can be estimated using the
output from the Gibbs sampler using the Monte Carlo average
E[f(θ1, . . . , θk)|y] =1
G−G0
G∑i=G0+1
f(θ
(i)1 , . . . , θ
(i)k
)
where G0 is a burn-in time which is used to remove the dependence of the θ(i)1 , . . . , θ
(i)k on
the initial values.
8
3.2. Joint Density Of Buy And Sell Orders
As per the EHO-PIN model, all trading periods can be classified into three types. Let
this classification be labelled Dt, such that
Dt =
1, bad news, with probability ω1 = αδ
2, good news, with probability ω2 = α(1− δ)
3, no news, with probability ω3 = 1− α.
where ωD is the probability of news type D. The model assumes that, when Dt = 1, informed
traders take a short position and liquidity traders either buy or sell the asset for reasons other
than information. Since only liquidity traders make buy trades, the total number of buy
trades (Bt) is Poisson distributed with mean λb. The numbers of sell trades by informed
traders (Sit) and by liquidity traders (Sut ) follow independent Poisson distributions with
means µ and λs respectively. Thus the total number of sell trades (St = Sit + Sut ) follows
a Poisson distribution with mean µ + λs. Using a similar reasoning for each type of news
event, we can state conditional distributions of the numbers of buy and sell trades as
St|Dt = 1 ∼ Pn (µ+ λs)
Bt|Dt = 1 ∼ Pn (λb)
St|Dt = 2 ∼ Pn (λs)
Bt|Dt = 2 ∼ Pn (µ+ λb)
St|Dt = 3 ∼ Pn (λs)
Bt|Dt = 3 ∼ Pn (λb),
where Pn (.) is the probability mass function of a Poisson random variable.1
The underlying process Dt is not observable but can be inferred from transaction data,
as a missing data problem within the Bayesian framework. Since we do not observe a bad,
good or no news period as well as the arrival of liquidity and informed traders, we employ the
data augmentation procedure to impute these missing observations. We do this by directly
sampling from the posterior distribution of Dt conditional on the available data. For a
detailed review of data augmentation, see Van Dyk and Meng (2001).
1Pn(x; θ) = e−θθx
x!
9
The joint density of buy and sell orders using the data augmentation procedure is
P (Bt, St|Dt,Θ) =
[f1 (Bt, St,Θ)
]dt,1[f2 (Bt, St,Θ)
]dt,2[f3 (Bt, St,Θ)
]dt,3
=
[e−µµS
it
Sit !
e−λbλBtbBt!
e−λsλSt−Sits
(St − Sit)!
]dt,1[e−µµB
it
Bit!
e−λsλStsSt!
e−λbλBt−Bitb
(Bt −Bit)!
]dt,2
×
[e−λs (λs)
St
St!
e−λb (λb)Bt
Bt!
]dt,3,
where dt,j = 1{Dt=j}, for j = 1, 2, 3. This is obtained by combining the likelihood functions
for the joint densities of buy and sell orders under each of the realizations of Dt. The
derivations of these joint densities are provided in Appendix B.1.
3.3. Prior Distributions And Gibbs sampler
Since we employ a Bayesian approach to estimating the parameters of the PIN model,
it is important to choose appropriate prior distributions for the parameters and write down
the posterior distribution.
3.3.1. Prior distributions
As α and δ represent probabilities strictly in the interval (0, 1), we choose beta distribu-
tions as their priors. We choose gamma prior distributions for the positive parameters µ, λb
and λs. The prior distributions for parameter set Θ = (α, δ, µ, λs, λb) are
P (α|ρ, φ) = Γ(ρ+φ)Γ(ρ)Γ(φ)
αρ−1(1− α)φ−1,
P (δ|ν, τ) = Γ(τ+ν)Γ(ν)Γ(τ)
δν−1(1− δ)τ−1,
P (µ|γ0, β0) =βγ00
Γ(γ0)µγ0−1e−β0µ,
P (λs|γ1, β1) =βγ11
Γ(β1)λγ1−1s e−β1λs ,
P (λb|γ2, β2) =βγ22
Γ(β2)λγ2−1b e−β2λb .
In our analyses, we used the following values for the hyper-parameters ρ, φ, ν, τ, γ0, γ1, β0, and β1.
10
We set ρ, φ, ν, and τ to a value of 5. This implies a prior value for both α and δ between
zero and one with a mean of 0.5 and a variance of 0.02. Next, γ0, γ1, β0, and β1 are set
equal to 1, indicating that the prior mean and variance for µ, λb and λs are all equal to 1.
3.3.2. Gibbs sampler
The use of conjugate priors allows the Gibbs Sampler to be easily applied to sample from
the posterior distribution. From Bayes’ theorem, the posterior density for the parameter set
Θ = (α, δ, µ, λs, λb) and the classification indicators D = (D1, . . . , DT ) is proportional to the
product of the likelihood and prior. If we denote T1, T2 and T3 as the number of periods
with bad, good and no news arrivals, then the posterior density can be written as
P (Θ, D|B, S) ∝ P (Θ)T∏t=1
[P (Bt, St|Dt,Θ)P (Dt|Θ)
]
= µγ0−1e−β0µλγ1−1s e−β1λsλγ2−1
b e−β2λbαρ−1(1− α)φ−1δν−1(1− δ)τ−1
×
[(αδ)T1(α(1− δ))T2(1− α)T3
]T∏t=1
[e−µµS
it
Sit !
e−λbλBtbBt!
e−λsλSt−Sits
(St − Sit)!
]dt,1
×
[e−µµB
it
Bit!
e−λsλStsSt!
e−λbλBt−Bitb
(Bt −Bit)!
]dt,2[e−λs (λs)
St
St!
e−λb (λb)Bt
Bt!
]dt,3. (4)
The full conditional distributions of the parameters of interest are
α ∼ Be(ρ+ T1 + T2, T3 + φ
), (5a)
δ ∼ Be(ν + T1 + T2, T2 + τ
), (5b)
µ ∼ Ga
(γ0 +
T∑t=1
[(Sit)dt,1 + (Bi
t)dt,2 ], T1 + T2 + β0
), (5c)
λs ∼ Ga
(γ1 +
T∑t=1
[Sdt,1t − (Sit)
dt,1 + Sdt,2t + S
dt,3t ], T1 + T2 + T3 + β1
), (5d)
λb ∼ Ga
(γ2 +
T∑t=1
[Bdt,1t +Bdt,2
t − (Bit)dt,2 +B
dt,3t ], T1 + T2 + T3 + β2
), (5e)
11
where T = T1 + T2 + T3 is the total number of trading periods in the sample. In the
above equations, Be(.) and Ga(.) denotes the beta and gamma probability density functions
respectively. In appendix B.2, we provide the derivation of the full conditional distributions.
The steps of the Gibbs sampler to estimate the parameters in our PIN model are
• Start with classification D(0) of (Bt, St)
• Initialize the parameters Θ(0)=(α(0), δ(0), λ
(0)s , λ
(0)b , µ(0)
)• Repeat for k = 1 to G sweeps
– Update µ(k)|λ(k−1)s , λ
(k−1)b , α(k−1), δ(k−1), B
i(k)t , S
i(k)t
– Update λ(k)s |λ(k−1)
b , α(k−1), δ(k−1), Bi(k)t , S
i(k)t , µ(k)
– Update λ(k)b |α(k−1), δ(k−1), B
i(k)t , S
i(k)t , µ(k), λ
(k)s
– Update α(k)|δ(k−1), Bi(k)t , S
i(k)t , µ(k), λ
(k)s , λ
(k)b
– Update δ(k)|Bi(k)t , Si(k)t , µ(k), λ
(k)s , λ
(k)b , α(k)
– Compute L1 = logω(k)1 −
(µ(k) + λ
(k)s + λ
(k)b
)+Bt log λ
(k)b + St log
(λ(k)s + µ(k)
)
– Compute L2 = logω(k)2 −
(µ(k) + λ
(k)s + λ
(k)b
)+ St log λ
(k)s +Bt log
(λ(k)b + µ(k)
)
– Compute L3 = logω(k)3 −
(λ(k)s + λ
(k)b
)+ St log λ
(k)s +Bt log λ
(k)b
– compute χ = max (L1, L2, L3)
– Compute p1 = eL1−χ
3∑j=1
eLj−χ, p2 = eL2−χ
3∑j=1
eLj−χand p3 = eL3−χ
3∑j=1
eLj−χ
– Update Dt(k), the classification of (Bt, St) by sampling from the multinomial distribution with
probability (p1, p2, p3),
where p1, p2 and p3 are the probabilities that at the beginning of the trading period there
will be bad news, good news and no news respectively.
The algorithm yields G random samples drawn from the posterior distributions of the
parameters α, δ, µ, λs, λb, and of PIN. Discarding say G0 initial draws of each posterior
12
sample and taking the average of the remaining, we obtain the estimated central value of
each parameter. As we also have the posterior distributions of the parameters and of PIN,
we can easily view the uncertainty around the PIN estimate as well.
In Appendix A, we provide a description of the BayesPIN toolbox for MATLAB that
accompanies this paper online. This toolbox implements the Gibbs Sampler algorithm for a
given input of numbers of buy and sell trades.
We next demonstrate the procedure on real data and discuss the advantages of our
approach.
4. Empirical Illustration
In order to demonstrate the methodology, we estimate PIN for five stocks on a daily basis
over the period 5th January 2004 to 31st December 2015. We also estimate the quarterly
PIN using daily counts of buy and sell trades. To compare the estimates with those from an
MLE package, we use the InfoTrad package available in R (Celik and Tinic, forthcoming).
4.1. Data
The five stocks are IBM, Coca Cola (KO), Boeing (BA), Walt Disney (DIS), and Exxon
Mobil Corporation (XOM), all large highly-traded stocks, but from different industries. We
use millisecond time stamped National Best Bid Offer (NBBO) quotes and trades from
TickData, a high frequency data vendor with expertise on generating NBBO data. Holden
and Jacobsen (2014) argue that NBBO data is likely to contain fewer errors compared with
raw quotes from individual exchanges. The data cover a total of 3, 020 trading days.
In terms of data cleaning procedures, we follow Korajczyk and Sadka (2008) by excluding
all transactions that occurred outside the normal trading hours as well as weekend trades. We
also remove all transactions that had negative prices as well as negative prevailing spreads.
We further exclude all cases where the transaction price was higher (lower) than the ask
13
(bid) price by more than 50 times the tick size ($0.01), or 50 cents.
The required input for estimating PIN is a sequence of numbers of buy and sell transac-
tions at the chosen trading period frequency. We use the Lee and Ready (1991) algorithm
to classify the individual transactions into buyer and seller initiated trades, which we then
aggregate over 15−minute intervals or daily intervals as required.
In Table 1, we provide a summary of the number of buyer and seller initiated trades for
the assets over the two intervals. It can be observed that the daily buyer and seller initi-
ated trades for these assets are large enough to potentially cause floating point exceptions.
Corresponding scatter plots of numbers of buy and sell trades are shown in Figure 2 for the
daily frequency and in Figure 3 for the 15−minute frequency. The median number of buys
exceeds the median number of sells for all of the stocks. The median number of buy trades
ranges between 12, 201 and 35, 914 across the 5 stocks at the daily level, and between 370
and 1, 108 at the 15-minute frequency.
[Insert Figures 2 and 3 near here]
In our data set there are transactions on each trading day, though when sampling at
15−minute intervals there are some periods in which either there is no buy or no sell (but
not both). In order for the algorithm to work, there should be a minimum of 1 trade (buy
or sell) in every trading period.
4.2. Daily PIN estimation results
For each day, we estimate the PIN using trading periods of 15 minutes. We summarize
and plot both the MLE benchmark and the Bayesian estimates below. For the Gibbs sampler,
we chose the number of sweeps (G) to be 35, 000 with a burn-in (G0) of 5, 000.
As a benchmark, we provide the MLE estimates using the R package InfoTrad (see Celik
and Tinic, forthcoming) in Table 2. We use the Yan and Zhang (2012) grid search approach
with likelihood function factorization proposed by Lin and Ke (2011). The corresponding
14
Asset Sampling Freq. Min Median Mean Max
BA Daily Buys 790 12,201 12,720 109,215Sells 652 11,230 11,810 103,212
15 mins Buys 0 370 478 15,866Sells 0 342 444 16,091
DIS Daily Buys 1,090 19,189 20,200 186,073Sells 990 16,543 17,440 138,201
15 mins Buys 0 590 758 31,128Sells 0 508 654 28,220
IBM Daily Buys 1,585 12,954 14,043 76,662Sells 1,523 12,840 13,830 66,630
15 mins Buys 0 409 541 11,427Sells 0 406 533 12,478
KO Daily Buys 1,343 20,380 20,682 112,880Sells 1,302 17,771 17,931 117,510
15 mins Buys 0 611 771 19,397Sells 0 527 668 18,277
XOM Daily Buys 1,730 35,914 39,822 260,5340Sells 1,600 31,310 35,652 245,164
15 mins Buys 0 1,108 14,637 27,425Sells 0 972 13,105 24,066
Table 1: Numbers of buy and sell trades counted at two frequencies
results for the Bayesian approach are provided in Table 3.
In our sample, the MLE approach of Yan and Zhang (2012) with the Lin and Ke (2011)
factorization is successful in producing a PIN estimate for each day. However, the proportion
of days on which δ is estimated as 0 or 1 (corner solutions) varies between 36% for DIS and
54% for XOM. This is despite the fact that the size of inputs (numbers of buy and sell trades
over 15−minute intervals) that are used for daily estimation is much smaller than those
used over the typical two-three month period (using daily counts). In contrast, the Bayesian
results show no trading day with corner solutions for δ, as expected.
We can review the results plotted over time (Figure 4), and also as a histogram (Figures
5 and 6). In each of these figures, the data we are plotting are the daily estimates of PIN
for each of the stocks. It is important to note that the daily estimates are independent of
each other. Yet, plotted as a time series, the contrast between the stability and smoothness
15
Parameter Statistic BA DIS IBM KO XOMα Min 0.034 0.034 0.037 0.036 0.034
Median 0.296 0.321 0.269 0.321 0.357Mean 0.319 0.339 0.284 0.341 0.426Max 0.996 0.997 0.875 0.999 0.998
δ Min 0.000 0.000 0.000 0.000 0.000Median 0.200 0.182 0.240 0.200 0.000Mean 0.369 0.337 0.378 0.348 0.257Max 1.000 1.000 1.000 1.000 1.000
µ Min 29.1 31.7 38.2 27.2 53.2Median 498.2 708.7 552.7 769.1 1372.8Mean 588.9 862.9 666.6 914.6 1648.3Max 7969.1 25658.4 5187.1 14589.4 13859.6
λs Min 0.0 0.0 1.0 0.0 0.0Median 366.4 547.2 433.5 574.6 1023.1Mean 387.1 586.1 476.6 594.6 1187.0Max 3822.6 5315.4 2449.5 4327.1 9429.4
λb Min 0.0 0.0 0.7 0.0 0.0Median 360.9 544.6 417.3 557.0 887.8Mean 381.9 581.1 453.8 579.3 1012.9Max 2335.3 5580.8 2730.8 3818.0 8406.7
PIN Min 0.024 0.024 0.021 0.023 0.022Median 0.132 0.136 0.123 0.138 0.154Mean 0.143 0.150 0.124 0.151 0.198Max 0.499 0.499 0.451 0.499 0.499
Table 2: Summary of maximum likelihood estimates over the 3020 day sample period
of the two sets of estimates is clearly noticeable.
[Insert Figures 4, 5 and 6 near here]
Finally, we can quantify the daily variation in PIN estimates by calculating the size of the
daily changes over the entire sample. We calculate the absolute value of the daily changes
in PIN estimates for each stock, and then report the median, mean and variance of these
changes in Table 4. The table confirms the relative stability of the Bayesian PIN estimates.
The average (median or mean) size of the daily change in MLE PIN estimates is between
36% and 80% larger than that of the Bayesian PIN estimates. The variance of these changes
is between 10% and 150% larger for the MLE estimates relative to the Bayesian estimates.
16
Parameter Statistic BA DIS IBM KO XOMα Min 0.166 0.163 0.192 0.182 0.183
Median 0.421 0.444 0.414 0.436 0.448Mean 0.430 0.445 0.415 0.443 0.469Max 0.838 0.818 0.724 0.838 0.838
δ Min 0.137 0.138 0.162 0.139 0.137Median 0.414 0.401 0.425 0.409 0.318Mean 0.450 0.437 0.455 0.441 0.404Max 0.863 0.861 0.839 0.862 0.862
µ Min 21.6 21.9 28.7 20.9 30.2Median 342.3 490.0 367.8 545.0 934.2Mean 377.9 548.7 409.3 581.4 1056.9Max 7149.8 15423.3 3336.9 6633.3 7422.6
λs Min 0.5 0.5 1.3 1.0 0.5Median 343.4 512.5 413.6 532.5 946.9Mean 366.8 549.6 451.3 558.6 1114.3Max 2122.0 5118.3 2358.9 4191.9 9079.8
λb Min 0.7 0.6 1.0 0.9 0.4Median 349.1 534.5 395.3 557.7 938.5Mean 373.7 584.6 436.2 583.6 1096.9Max 3900.7 5744.1 2629.6 3676.5 7980.1
PIN Min 0.071 0.059 0.057 0.067 0.064Median 0.157 0.152 0.140 0.156 0.162Mean 0.165 0.161 0.145 0.164 0.182Max 0.538 0.563 0.535 0.598 0.631
Table 3: Summary of Bayesian estimates over the 3020 day sample period
Statistic Method BA DIS IBM KO XOMMedian MLE 0.0498 0.0520 0.0401 0.0520 0.0744
BayesPin 0.0329 0.0309 0.028 0.0306 0.0413Mean MLE 0.0717 0.0785 0.0511 0.0774 0.1194
BayesPin 0.0465 0.0460 0.0375 0.0445 0.0709Variance MLE 0.0061 0.0080 0.0021 0.0071 0.0139
BayesPin 0.0029 0.0032 0.0019 0.0030 0.0075
Table 4: Comparison of daily absolute changes in PIN estimates
4.2.1. Patterns of difference between MLE and PIN estimates
In order to understand the differences between the daily estimates from the MLE and
Bayes algorithms, we plot the histogram of these differences. One potential source of differ-
17
ence identified by the literature is the occurrence of corner solutions in parameter estimates.
To evaluate this source, we split the observations into two groups - one for which the MLE
estimate of δ is either 0 or 1, the other for which it is not on the boundary. Looking at the
histograms for the five assets in Figure 7, we can see that the differences for the cases with no
boundary solutions are evenly centered around 0. On the other hand, the boundary solution
cases have a relatively higher proportion of extreme differences between the two estimates,
and appear to be centred to the right of zero.
4.3. Quarterly PIN estimation results
In order to estimate a series of quarterly PIN values, we use aggregated buy and sell
numbers over a day for all trading days in each calendar quarter. While large scale studies
involving PIN use annual estimates (see, e.g., Easley et al., 2002), other applications use
frequencies as low as quarterly (see, e.g., Christoffersen et al., 2018). The first PIN estimate is
for January - March 2004, which uses all the trading days up to and including March 31, 2004.
This procedure leads to 48 quarterly PIN estimates. Tables 5 and 6 provide summaries of the
quarterly PIN estimates computed using the MLE and BayesPin approaches respectively.
The proportion of the quarterly MLE PIN estimates in which the estimate of δ is a corner
solution (either 0 or 1) varies between 40% for BA and 71% for XOM. These proportions are
higher than in the daily case, potentially because the size of the aggregate numbers of buys
and sells is much larger over the course of a day, leading to more computational issues.
As in the daily case, we plot the two series together over time in Figure 8, for each of
the assets. In the figure, we also plot the 95% credible interval of PIN using the Bayesian
approach. Although the two series agree on the direction of changes in many cases, we can
once again see that the Bayesian estimates are relatively more stable. This observation is
confirmed by Table 7, in which we compare absolute changes and variance over time of the
two sets of estimates for each stock.
More so, Figure 8 suggests that there are certain quarters when the MLE estimates are
18
Parameter Statistic BA DIS IBM KO XOMα Min 0.016 0.016 0.033 0.115 0.109
Median 0.315 0.276 0.254 0.347 0.429Mean 0.295 0.277 0.257 0.361 0.415Max 0.500 0.625 0.519 0.812 0.828
δ Min 0.000 0.000 0.000 0.000 0.000Median 0.095 0.043 0.179 0.046 0.000Mean 0.316 0.224 0.324 0.145 0.152Max 1.000 1.000 1.000 1.000 1.000
µ Min 611.8 712.8 867.6 490.6 799.7Median 8026.9 11705.9 10185.6 11537.3 16318.6Mean 10454.9 16794.3 10045.9 11308.8 19910.7Max 97219.9 87783.8 39087.5 29440.3 82374.8
λs Min 1972.6 2321.4 3046.1 2271.5 3479.3Median 12318.0 19021.5 12750.1 19286.9 32378.2Mean 11236.0 17332.0 12743.4 16988.1 32553.8Max 23365.9 39865.2 30985.4 35387.4 87843.1
λb Min 1819.4 2246.8 3261.6 2187.8 3709.4Median 12050.9 18133.0 13964.0 20088.6 30913.1Mean 11079.8 16990.5 13041.8 17669.1 34944.9Max 25292.3 42304.8 33056.2 41278.9 130782.6
PIN Min 0.010 0.011 0.021 0.047 0.055Median 0.096 0.090 0.085 0.105 0.105Mean 0.099 0.093 0.085 0.115 0.120Max 0.170 0.195 0.148 0.305 0.310
Table 5: Summary of quarterly maximum likelihood estimates over sample period
not only outside the credible interval, they also have sharp drops to values near zero or sharp
increases. It is, however, reassuring that the two methods do tend to move similarly except
in the extreme cases (most visible in the plots for BA and DIS).
[Insert Figure 8 near here ]
19
Parameter Statistic BA DIS IBM KO XOMα Min 0.095 0.127 0.148 0.267 0.243
Median 0.386 0.397 0.359 0.426 0.469Mean 0.384 0.389 0.364 0.437 0.480Max 0.645 0.608 0.743 0.675 0.785
δ Min 0.096 0.101 0.083 0.092 0.089Median 0.254 0.229 0.283 0.233 0.141Mean 0.407 0.298 0.374 0.306 0.268Max 0.875 0.813 0.874 0.904 0.921
µ Min 497.6 542.4 619.4 391.4 635.9Median 6563.9 9328.7 7110.6 9428.8 13259.9Mean 7177.2 11032.8 6963.8 9241.1 16357.1Max 33812.3 73093.7 17642.8 25488.2 74224.0
λs Min 1784.6 2233.1 3203.2 2138.2 3596.1Median 11311.2 18008.8 13560.5 19047.4 32131.7Mean 10604.2 16661.1 12911.4 17067.56 32836.9Max 22570.1 41654.5 32523.0 40643.4 99748.3
λb Min 1862.8 2281.1 2664.7 2223.3 3465.4Median 12032.8 17821.2 12848.0 19313.9 30100.0Mean 11254.6 16753.1 12320.9 17006.2 33462.3Max 25927.1 39121.7 27208.7 32818.9 135255.5
PIN Min 0.052 0.054 0.040 0.043 0.045Median 0.086 0.087 0.079 0.081 0.090Mean 0.094 0.095 0.081 0.095 0.095Max 0.177 0.188 0.152 0.195 0.223
Table 6: Summary of quarterly Bayesian estimates over sample period
Statistic Method BA DIS IBM KO XOMMedian MLE 0.0277 0.0375 0.0261 0.0422 0.0367
BayesPin 0.0185 0.0201 0.0129 0.0242 0.0255Mean MLE 0.0329 0.0444 0.0301 0.0526 0.0522
BayesPin 0.0233 0.0297 0.0201 0.0391 0.0363Variance MLE 0.0006 0.0014 0.0004 0.0027 0.0027
BayesPin 0.0004 0.0009 0.0004 0.0013 0.0012
Table 7: Comparison of absolute changes in quarterly PIN estimates
20
5. Conclusion
We have shown a Bayesian method that can provide estimates of PIN using small data
sets at higher frequency, as well as over longer periods for stocks with very large numbers
of trades. This method avoids the non-convergence and other computational problems of
optimization functions underlying MLE routines. We know from previous literature that
two of the challenging parameters to estimate using MLE are α (the probability of news
arrival) and δ (the probability that the news is bad). We have also found that the MLE
algorithm gives us a boundary value of either zero or one in a considerable number of cases,
particularly for δ. The Bayesian methodology, on the other hand, does not suffer from
this corner solution problem. In the Bayesian approach, there is also no need for a careful
selection and specification of initial values as is known to be essential in the MLE approach.
Furthermore, in the empirical illustrations, we demonstrated that the Bayesian estimation
approach yielded relatively stable results, even at a daily frequency using intraday data
sampled over only 26 trading periods of 15−minutes each. The ability to estimate PIN in
this manner offers new opportunities to apply the measure in studies involving time-varying
information asymmetry risk.
6. Acknowledgements
We are grateful to David Veredas for valuable feedback and to conference participants at
the Paris Microstructure Conference (Dec 2014) and Computational and Financial Econo-
metrics conference (Dec 2015) for helpful comments. We gratefully acknowledge financial
support to Samuel Oduro from the University of Kent Vice Chancellor’s 50th Anniversary
Scholarship.
21
References
Aktas, N., De Bodt, E., Declerck, F., Van Oppens, H., 2007. The PIN anomaly around M&A
announcements. Journal of Financial Markets 10, 169–191.
Boehmer, E., Grammig, J., Theissen, E., 2007. Estimating the probability of informed trad-
ing: Does trade misclassification matter? Journal of Financial Markets 10, 26–47.
Brennan, M. J., Huh, S.-W., Subrahmanyam, A., 2018. High-frequency measures of informed
trading and corporate announcements. The Review of Financial Studies 31, 2326–2376.
Celik, D., Tinic, M., forthcoming. Infotrad: An R package for estimating the probability of
informed trading. The R Journal .
Chen, Q., Goldstein, I., Jiang, W., 2007. Price informativeness and investment sensitivity to
stock price. The Review of Financial Studies 20, 619–650.
Christoffersen, P., Goyenko, R., Jacobs, K., Karoui, M., 2018. Illiquidity premia in the equity
options market. The Review of Financial Studies 31, 811–851.
Duarte, J., Young, L., 2009. Why is PIN priced? Journal of Financial Economics 91, 119–138.
Easley, D., Hvidkjaer, S., O’Hara, M., 2002. Is information risk a determinant of asset
returns? The Journal of Finance 57, 2185–2221.
Easley, D., Hvidkjaer, S., O’Hara, M., 2010. Factoring information into returns. Journal of
Financial and Quantitative Analysis 45, 293–309.
Easley, D., Kiefer, N. M., O’Hara, M., 1997. One day in the life of a very common stock.
Review of Financial Studies 10, 805–835.
Easley, D., Kiefer, N. M., O’Hara, M., Paperman, J. B., 1996. Liquidity, information, and
infrequently traded stocks. The Journal of Finance 51, 1405–1436.
22
Easley, D., O’Hara, M., 1987. Price, trade size, and information in securities markets. Journal
of Financial Economics 19, 69–90.
Easley, D., O’Hara, M., 1992. Adverse selection and large trade volume: The implications
for market efficiency. Journal of Financial and Quantitative Analysis 27, 185–208.
Easley, D., O’Hara, M., 2004. Information and the cost of capital. The Journal of Finance
59, 1553–1583.
Gan, Q., Wei, W. C., Johnstone, D., 2017. Does the probability of informed trading model
fit empirical data? Financial Review 52, 5–35.
Glosten, L. R., Milgrom, P. R., 1985. Bid, ask and transaction prices in a specialist market
with heterogeneously informed traders. Journal of Financial Economics 14, 71–100.
Holden, C. W., Jacobsen, S., 2014. Liquidity measurement problems in fast, competitive
markets: expensive and cheap solutions. The Journal of Finance 69, 1747–1785.
Jackson, D., 2013. Estimating PIN for firms with high levels of trading. Journal of Empirical
Finance 24, 116–120.
Korajczyk, R. A., Sadka, R., 2008. Pricing the commonality across alternative measures of
liquidity. Journal of Financial Economics 87, 45–72.
Kyle, A., 1985. Continuous auctions and insider trading. Econometrica: Journal of the
Econometric Society pp. 1315–1335.
Lai, S., Ng, L., Zhang, B., 2014. Does PIN affect equity prices around the world? Journal
of Financial Economics 114, 178–195.
Lee, C. M. C., Ready, M. J., 1991. Inferring trade direction from intraday data. The Journal
of Finance 46, 733–46.
23
Lei, Q., Wu, G., 2005. Time-varying informed and uninformed trading activities. Journal of
Financial Markets 8, 153–181.
Lin, W. H.-W., Ke, W.-C., 2011. A computing bias in estimating the probability of informed
trading. Journal of Financial Markets 14, 625–640.
Mohanram, P., Rajgopal, S., 2009. Is PIN priced risk? Journal of Accounting and Economics
47, 226–243.
Van Dyk, D. A., Meng, X.-L., 2001. The art of data augmentation. Journal of Computational
and Graphical Statistics 10, 1–50.
Vega, C., 2006. Stock price reaction to public and private information. Journal of Financial
Economics 82, 103 – 133.
Yan, Y., Zhang, S., 2012. An improved estimation method and empirical properties of the
probability of informed trading. Journal of Banking & Finance 36, 454–467.
24
Appendix A. The BayesPin Toolbox
In what follows we provide details of the BayesPin toolbox written in Matlab that
accompanies this paper. The toolbox calculates PIN based on the Easley et al. (2002), and
Easley et al. (1996) models. The command for invoking the toolbox is
BayesPin(trades,model,sweeps,burnin,confidence).
The inputs of the toolbox are described as follows
• trades: a dataframe holding the aggregate buy and sell trades.
• model: Either of the Easley et al. (1996), Easley et al. (2002) models (i.e EHKOP96,EHO2002).
• sweeps (optional): the number of iterations for the Gibbs Sampler (the default is set
at 11, 000). This has to be large enough to ensure convergence of the Markov Chain.
In our example, we used 10, 000.
• burnin (optional): This is the number of initial iterations for which the parameter
draws should be discarded. This is to ensure that we keep the draws at the point where
the MCMC has converged to the parameter space in which the parameter estimate is
likely to fall. This figure must always be less than the sweeps (the default is set at
1, 000).
• confidence (optional): A number indicating the level of tolerance for computation
of credible interval (the default is 5 for a 95% credible interval)
The output of the toolbox will be a list of the following
• The posterior estimates of the model parameters and the PIN.
• The standard deviations of the posterior draws of parameters and PIN
• The lower credible limit of the posterior distribution for each parameter and PIN.
• The upper credible limit of the posterior distribution for each parameter and PIN.
We provide below an illustrative example of daily parameter estimates and PIN for the
Easley et al. (2002) model.
25
A.1. Matlab Implementation
In what follows it is assumed that the user of the toolbox have saved the folders of the
toolbox into a personal folder with sub-folder Estimates.
%#########################################################################
% EXAMPLE : Calculation of Cross-Sectional PIN
%#########################################################################
clc
dir=[pwd,’\’,mfilename]; %Working directory
cd (dir) %Change to working directory
%import aggregated buy and sell trades
orders = importdata(‘sampleData.txt’);
model = ‘EHO2002’; % PIN Model of interest (i.e EKOP96, EHO2002)
sweeps = 10000; %Specify number of iterations
burnin = 1000; %This has to be smaller than sweeps
confidence = 5; %Confidence level for credible interval
est = BayesPin(orders,model,sweeps,burnin,confidence);
%View the estimates
c = cell2table(est(2,:))
c.Properties.VariableNames = est(1,:)
%Save summary of posterior estimates over estimation period
filename = strcat(dir,‘\Estimates\’,model,‘_CrossSection_PIN’,‘.xls’);
xlswrite(filename,est);
26
Appendix B. Derivation of joint density and full con-
ditionals
In this appendix, we provide the derivations of the posterior density and full conditionals
for the parameters.
B.1. Joint density
First, we write down the the joint densities of buy and sell orders conditional on the
realization of Dt.
Bad News Event (Dt = 1)
The model assumes that the number of sell trades from informed traders and the total
number of sell trades are both generated from Poisson distributions. It can easily be shown
that conditioning on the total number of sell trades, the number of sell trades by informed
traders follows a binomial distribution with St trials and success probability µ/µ+λs. Sell
trades initiated by uninformed traders are then calculated as Sut = St−Sit . The trade arrival
distributions in the event of bad news are therefore given as
St|Dt = 1 ∼ Pn (µ+ λs)
Sit |St, Dt = 1 ∼ Bin
(St,
µµ+λs
).
Bt|Dt = 1 ∼ Pn (λb) ,
The probability of the numbers of different types of buy or sell trades in the event of bad
27
news is
f1
(Bt, S, S
it ,Θ)
= P(Bt, St, S
it |Dt = 1,Θ
)=P (Bt|Dt = 1,Θ)P
(Sit |St, Dt = 1,Θ
)P (St|Dt = 1,Θ)
=
(StSit
)e−(µ+λb+λS)
Bt!St!λbBtλS
St
(µ
λS
)Sit=e−λb (λb)
Bt
Bt!
(StSit
)(µ
µ+ λS
)Sit( λSµ+ λS
)St−Sit e−µ+λSt (µ+ λS)St
St!
=
(StSit
)e−(µ+λb+λS)
Bt!St!λbBtλS
St−SitµSit
=e−µµS
it
Sit !
e−λbλBtbBt!
e−λSλSt−SitS
(St − Sit)!. (6)
As expected, Equation 6 is a product of Poisson processes for the buy trades and sell trades
by uninformed traders, and the sell trades by informed traders.
Good News Event (Dt = 2)
Using similar arguments to those above, the distributions of the numbers of different types
of trades in the event of good news can be written as follows
Bt|Dt = 2 ∼ Pn (µ+ λb),
Bit|Bt, Dt = 2 ∼ Bin
(Bt,
µµ+λb
).
St|Dt = 2 ∼ Pn (λs) ,
The probability of the numbers of different types of buyer or seller initiated trades is
f2 (Bt, St,Θ) =P (Bt, St|Dt = 2,Θ)
=P (St|Dt = 2,Θ)P(Bit|Bt, Dt = 2,Θ
)P (Bt|Dt = 2,Θ)
=e−µµB
it
Bit!
e−λsλStsS!
e−λbλBt−Bitb
(Bt −Bit)!. (7)
No News Event (Dt = 3)
28
With the assumption that there is no informed trader activity during a “no news” period,
all trades are attributable to uninformed traders. Hence we have Bt|Dt = 3 ∼ Pn (λb) and
St|Dt = 3 ∼ Pn (λs) as the distributions of the number of buys and sells respectively. The
probability of the number of buyer and seller initiated trades is
f3 (Bt, St,Θ) =P (Bt, St|Dt = 3,Θ)
=P (St|Dt = 3,Θ)P (Bt|Θ)P (Bt|Dt = 3,Θ)
=e−λs (λs)
St
St!
e−λb (λb)Bt
Bt!. (8)
Putting Equations 6, 7 and 8 together, we obtain the joint probability function
P (Bt, St|Dt,Θ) =
[f1 (Bt, St,Θ)
]dt,1[f2 (Bt, St,Θ)
]dt,2[f3 (Bt, St,Θ)
]dt,3
=
[e−µµS
it
Sit !
e−λbλBtbBt!
e−λsλSt−Sits
(St − Sit)!
]dt,1[e−µµB
it
Bit!
e−λsλStsSt!
e−λbλBt−Bitb
(Bt −Bit)!
]dt,2
×
[e−λs (λs)
St
St!
e−λb (λb)Bt
Bt!
]dt,3, (9)
B.2. Derivation of Posterior Distributions
Posterior Density
In Bayesian inference, the posterior distribution is proportional to the product of the
prior distribution and the likelihood function of the parameters. Using the prior distributions
provided in Section 3.3.1 and the likelihood function, we obtain the posterior density of buys
and sells:
29
P (Dt,Θ|Bt, St) is proportional to
P (Θ)
T∏t=1
[P (Bt, St|Dt,Θ)P (Dt|Θ)
]
= P (Θ)T∏t=1
[[αδe−µµS
it
Sit !
e−λbλBtbBt!
e−λsλSt−Sits
(St − Sit)!
]d1[α(1− δ)e
−µµBit
Bit!
e−λsλStsSt!
e−λbλBt−Bitb
(Bt −Bit)!
]d2
×
[(1− α)
e−λs (λs)St
St!
e−λb (λb)Bt
Bt!
]d3]
= P (Θ)
[(αδ)T1(α(1− δ))T2(1− α)T3
]
×T∏t=1
[e−µµS
it
Sit !
e−λbλBtbBt!
e−λsλSt−Sits
(St − Sit)!
]d1[e−µµB
it
Bit!
e−λsλStsSt!
e−λbλBt−Bitb
(Bt −Bit)!
]d2[e−λs (λs)
St
St!
e−λb (λb)Bt
Bt!
]d3
= µγ0−1e−β0µλγ1−1s e−β1λsλγ2−1
b e−β2λbαρ−1(1− α)ε−1δν−1(1− δ)τ−1
[(αδ)T1(α(1− δ))T2(1− α)T3
]
×T∏t=1
[e−µµS
it
Sit !
e−λbλBtbBt!
e−λsλSt−Sits
(St − Sit)!
]d1[e−µµB
it
Bit!
e−λsλStsSt!
e−λbλBt−Bitb
(Bt −Bit)!
]d2[e−λs (λs)
St
St!
e−λb (λb)Bt
Bt!
]d3(10)
Full Conditional Distributions
In order to use the Gibbs sampler we need the full conditionals of the parameters of interest.
In order to use the Gibbs sampler we need the full conditionals of the parameters of interest,
Θ = (α, δ, µ, λs, λs). Using Equation 10 we derive below the full conditional distributions of
the parameters.
30
Full Conditonal for informed trader arrivals parameter µ
P (µ|.) ∝ µγ0−1e−β0µT∏t=1
[e−µµS
it
Sit !
]d1[e−µµB
it
Bit!
]d2
∝ e−µ(T1+T2)µγ0−1e−β0µµ
T∑t=1
(Sid1t +B
id2t
)T∏t=1
[Sit !
]d1[Bit!
]d2∝ e−µ(T1+T2)µγ0−1e−β0µµ
T∑t=1
(Sid1t +B
id2t
)
∝ e−(T1+T2+γ0)µµ
T∑t=1
(Sid1t +B
id2t
)+β0−1
(11)
This is the kernel of a gamma distribution hence the full conditional distribution of µ
can be written as µ ∼ Ga
(γ0 +
T∑t=1
(Sid1jt +Bid2
jt
), T1 + T2 + β0
).
Full Conditonal for uninformed buy trader arrivals parameter λb
P (λb|.) ∝ λβ2−1b e−γ2λb
T∏t=1
[e−λbλBtbBt!
]d1[e−λbλ
Bt−Bitb
(Bt −Bit)!
]d2[e−λb (λb)
Bt
Bt!
]d3
∝ λγ2−1b e−β2λbe−(T1+T2+T3)λbλ
T∑t=1
(Bd1t +B
d2t −B
id2t +B
d3t
)b∏T
t=1
[Bt!
]d1[(Bt −Bi
t)!
]d2[B!
]d3∝ e−(T1+T2+T3+β2)λbλ
T∑t=1
(Bd1t +B
d2t −B
id2t +B
d3t
)+γ2−1
b (12)
As before this full conditional distribution has the kernel of a gamma distribution therefore
λb ∼ Ga
(T∑t=1
(Bd1t +Bd2
t −Bid2t +Bd3
t
)+ γ2, T1 + T2 + T3 + β2
).
Full Conditonal for uninformed sell trader arrivals parameter λs
31
P (λs|.) ∝ λβ1−1s e−γ1λs
T∏t=1
[e−λsλ
St−Sits
(St − Sit)!
]d1[e−λsλStsSt!
]d2[e−λs (λs)St
St!
]d3
=λβ1−1s e−γ1λse−(T1+T2+T3)λsλ
T∑t=1
(Sd1t −S
id1t +S
d2t +S
d3t
)s∏T
t=1
[(St − Sit)!
]d1[St!
]d2[St!
]d3∝ e−(T1+T2+T3+γ1)λsλ
T∑t=1
(Sd1t −S
id1t +S
d2t +S
d3t
)+β1−1
s (13)
The expression in equation 13 is the kernel of a gamma distribution therefore the full condi-
tional distribution of λs ∼ Ga
(T∑t=1
(Sd1t − Sid1t + Sd2t + Sd3t
)+ γ1, T1 + T2 + T3 + β1
).
Full Conditonal for news arrival parameter α
The full conditional distribution for α, P (α|.), is derived as follows
P (α|.) ∝ αT1αT2(1− α)T3αρ−1(1− α)ε−1
∝ αT1+T2+ρ−1(1− α)T3+ε−1, (14)
which is the kernel of a beta distribution. Hence the full conditional distribution of α ∼
Be(ρ+ T1 + T2, T3 + ε
).
Full Conditonal for news impact parameter δ
Similarly, the full conditional distribution for δ is derived as follows :
P (δ|.) ∝ δT1(1− δ)T2δν−1(1− δ)τ−1
∝ δT1+T2+ν−1(1− δ)T2+τ−1. (15)
32
The expression in Equation 15 is the kernel of a beta distribution so
δ ∼ Be(ν + T1 + T2, T3 + τ
)
It can be observed that the estimates of α and δ are highly dependent on the correct
classification of news event period.
33
Appendix C. Extended PIN model
In this appendix, we derive the necessary quantities to estimate an extended version of
the EHO-PIN model, in which we allow for unequal arrival rates of informed traders between
good and bad news periods.
If we allow informed buyer and seller arrival rates to follow Poisson distributions with
means µb and µs respectively and keep all other assumptions the same, the model can be
estimated based on the following modified approach.
The likelihood function can be stated as
L (Θ|B,S) =
T∏t=1
[αδe−(µs+λs) (µs + λs)
St
St!
e−λb (λb)Bt
Bt!+ α (1− δ) e
−(µb+λb) (µb + λb)Bt
Bt!
e−λs (λs)St
St!
+ (1− α)e−λb (λb)
Bt
Bt!
e−λs (λs)St
St!
](16)
and the PIN becomes
PIN =α(µs + µb)
α(µs + µb) + λs + λb. (17)
This likelihood function is due to the following probabilities of the number of buy or sell
trades under the different types of news:
f1
(Bt, S, S
it ,Θ)
= P(Bt, St, S
it |Dt = 1,Θ
)=e−µsµs
Sit
Sit !
e−λbλBtbBt!
e−λStλSt−SitSt
(St − Sit)!, (18)
f2 (Bt, St,Θ) =P (Bt, St|Dt = 2,Θ)
=P (St|Dt = 2,Θ)P(Bit|Bt, Dt = 2,Θ
)P (Bt|Dt = 2,Θ)
=e−µbµb
Bit
Bit!
e−λsλStsS!
e−λbλBt−Bitb
(Bt −Bit)!, (19)
34
and
f3 (Bt, St,Θ) =P (Bt, St|Dt = 3,Θ)
=P (Bt|Dt = 3,Θ)P (St|Dt = 3,Θ)
=P (St|Dt = 3,Θ)P (Bt|Θ)P (Bt|Dt = 3,Θ)
=e−λs (λs)
St
St!
e−λb (λb)Bt
Bt!. (20)
for bad, good and no news periods respectively. The joint probability function therefore
becomes
P (Bt, St|Dt,Θ) =
[f1 (Bt, St,Θ)
]dt,1[f2 (Bt, St,Θ)
]dt,2[f3 (Bt, St,Θ)
]dt,3
=
[e−µsµs
Sit
Sit !
e−λbλBtbBt!
e−λsλSt−Sits
(St − Sit)!
]dt,1[e−µbµb
Bit
Bit!
e−λsλStsSt!
e−λbλBt−Bitb
(Bt −Bit)!
]dt,2
×
[e−λs (λs)
St
St!
e−λb (λb)Bt
Bt!
]dt,3. (21)
Keeping the same prior distributions for α, δ, λs, λb, we choose P (µs|γ0, β0) =βγ00
Γ(γ0)µγ0−1s e−β0µs
and P (µb|γ3, β3) =βγ33
Γ(γ3)µγ3−1b e−β3µb as the prior distributions for µs and µb respectively.
The corresponding posterior density of the model parameters can be written as follows
P (Θ|B,S) ∝ P (Θ)
T∏t=1
[P (Bt, St|Dt,Θ)P (Dt|Θ)
]
= µsγ0−1e−β0µsλγ1−1
s e−β1λsλγ2−1b e−β2λbαρ−1(1− α)φ−1δν−1(1− δ)τ−1µb
γ3−1e−β3µb
×
[(αδ)T1(α(1− δ))T2(1− α)T3
]T∏t=1
[e−µsµs
Sit
Sit !
e−λbλBtbBt!
e−λsλSt−Sits
(St − Sit)!
]dt,1
×
[e−µbµb
Bit
Bit!
e−λsλStsSt!
e−λbλBt−Bitb
(Bt −Bit)!
]dt,2[e−λs (λs)
St
St!
e−λb (λb)Bt
Bt!
]dt,3. (22)
The full conditional distributions for the parameters can be derived from this equation as
35
follows
α ∼ Be(ρ+ T1 + T2, T3 + φ
), (23a)
δ ∼ Be(ν + T1 + T2, T2 + τ
), (23b)
µs ∼ Ga
(γ0 +
T∑t=1
(Sit)dt,1 , T1 + β0
), (23c)
µb ∼ Ga
(γ3 +
T∑t=1
(Bit)dt,2 , T2 + β3
), (23d)
λs ∼ Ga
(γ1 +
T∑t=1
[Sdt,1t − (Sit)
dt,1 + Sdt,2t + S
dt,3t ], T1 + T2 + T3 + β1
), (23e)
λb ∼ Ga
(γ2 +
T∑t=1
[Bdt,1t +B
dt,2t − (Bi
t)dt,2 +B
dt,3t ], T1 + T2 + T3 + β2
). (23f)
The Gibbs sampling algorithm is now
• Start with classification D(0) of (Bt, St)
• Initialize the parameters Θ(0)=(α(0), δ(0), λ
(0)s , λ
(0)b , µs(0), µb
(0))
• Repeat for k = 1 to G sweeps
– Sample Bi(k)t |D(k−1)
t ∼ Bin(Bt,
µb(k−1)
µb(k−1)+λ(k−1)b
), t = 1, . . . , T
– Sample Si(k)t |D(k−1)
t ∼ Bin(St,
µs(k−1)
µs(k−1)+λ(k−1)s
), t = 1, . . . , T
– Update µs(k)|µb(k−1), λ
(k−1)s , λ
(k−1)b , α(k−1), δ(k−1), B
i(k)t , S
i(k)t
– Update µb(k)|µs(k), λ(k−1)
s , λ(k−1)b , α(k−1), δ(k−1), B
i(k)t , S
i(k)t
– Update λ(k)s |λ(k−1)
b , α(k−1), δ(k−1), Bi(k)t , S
i(k)t , µs
(k), µb(k)
– Update λ(k)b |α(k−1), δ(k−1), B
i(k)t , S
i(k)t , µs
(k), µb(k), λ
(k)s
– Update α(k)|δ(k−1), Bi(k)t , S
i(k)t , µs
(k), µb(k), λ
(k)s , λ
(k)b
– Update δ(k)|Bi(k)t , Si(k)t , µs
(k), µb(k), λ
(k)s , λ
(k)b , α(k)
36
– Compute L1 = logω(k)1 −
(µ(k)s + λ
(k)s + λ
(k)b
)+Bt log λ
(k)b + St log
(λ(k)s + µ
(k)s
)
– Compute L2 = logω(k)2 −
(µ(k)b + λ
(k)s + λ
(k)b
)+ St log λ
(k)s +Bt log
(λ(k)b + µ
(k)b
)
– Compute L3 = logω(k)3 −
(λ(k)s + λ
(k)b
)+ St log λ
(k)s +Bt log λ
(k)b
– compute χ = max (L1, L2, L3)
– Compute p1 = eL1−χ
3∑j=1
eLj−χ, p2 = eL2−χ
3∑j=1
eLj−χand p3 = eL3−χ
3∑j=1
eLj−χ
– Compute p2 = eL2−χ
3∑j=1
eLj−χ
– Compute p3 = eL3−χ
3∑j=1
eLj−χ
– Compute p1 = ω(k)1 e
−(µ(k)+λ(k)
s +λ(k)b
)λ(k)b
Bt(λ(k)s + µ(k)
)St– Compute p2 = ω
(k)2 e
−(µ(k)+λ(k)
s +λ(k)b
)λ(k)s
St(λ(k)b + µ(k)
)Bt– Compute p3 = ω
(k)3 e
−(λ(k)s +λ
(k)b
)λ(k)s
Stλ(k)b
Bt
– Update Dt(k), the classification of (Bt, St) by sampling from the multinomial distribution with
probability (p1, p2, p3),
(p1
p1+p2+p3, p2p1+p2+p3
, p3p1+p2+p3
),
where p1, p2 and p3 are the probabilities that at the beginning of the trading there will be a
bad news, good news and no news respectively.
37
Fig. 1. Tree diagram for Easley et al. (2002) model
38
0 50000 100000
Buy
0
50000
100000
Sel
l
BA
0 50000 100000 150000
Buy
0
50000
100000
150000
Sel
l
DIS
0 20000 40000 60000 80000
Buy
0
20000
40000
60000
80000
Sel
l
IBM
0 50000 100000
Buy
0
50000
100000
Sel
lKO
0 100000 200000
Buy
0
100000
200000
Sel
l
XOM
Fig. 2. Scatter plot of daily buy and sell trades
39
Fig. 3. Scatter plot of 15−minute sampled buy and sell trades
40
0.0
0.2
0.4
0.6
01
−Ja
n−
20
04
01
−Ja
n−
20
05
01
−Ja
n−
20
06
01
−Ja
n−
20
07
01
−Ja
n−
20
08
01
−Ja
n−
20
09
01
−Ja
n−
20
10
01
−Ja
n−
20
11
01
−Ja
n−
20
12
01
−Ja
n−
20
13
01
−Ja
n−
20
14
01
−Ja
n−
20
15
01
−Ja
n−
20
16
BayesPin MLE PIN
(a) BA
0.0
0.2
0.4
0.6
01
−Ja
n−
20
04
01
−Ja
n−
20
05
01
−Ja
n−
20
06
01
−Ja
n−
20
07
01
−Ja
n−
20
08
01
−Ja
n−
20
09
01
−Ja
n−
20
10
01
−Ja
n−
20
11
01
−Ja
n−
20
12
01
−Ja
n−
20
13
01
−Ja
n−
20
14
01
−Ja
n−
20
15
01
−Ja
n−
20
16
BayesPin MLE PIN
(b) DIS
0.0
0.2
0.4
0.6
01
−Ja
n−
20
04
01
−Ja
n−
20
05
01
−Ja
n−
20
06
01
−Ja
n−
20
07
01
−Ja
n−
20
08
01
−Ja
n−
20
09
01
−Ja
n−
20
10
01
−Ja
n−
20
11
01
−Ja
n−
20
12
01
−Ja
n−
20
13
01
−Ja
n−
20
14
01
−Ja
n−
20
15
01
−Ja
n−
20
16
BayesPin MLE PIN
(c) IBM
0.0
0.2
0.4
0.6
01
−Ja
n−
20
04
01
−Ja
n−
20
05
01
−Ja
n−
20
06
01
−Ja
n−
20
07
01
−Ja
n−
20
08
01
−Ja
n−
20
09
01
−Ja
n−
20
10
01
−Ja
n−
20
11
01
−Ja
n−
20
12
01
−Ja
n−
20
13
01
−Ja
n−
20
14
01
−Ja
n−
20
15
01
−Ja
n−
20
16
BayesPin MLE PIN
(d) KO
0.0
0.2
0.4
0.6
01
−Ja
n−
20
04
01
−Ja
n−
20
05
01
−Ja
n−
20
06
01
−Ja
n−
20
07
01
−Ja
n−
20
08
01
−Ja
n−
20
09
01
−Ja
n−
20
10
01
−Ja
n−
20
11
01
−Ja
n−
20
12
01
−Ja
n−
20
13
01
−Ja
n−
20
14
01
−Ja
n−
20
15
01
−Ja
n−
20
16
BayesPin MLE PIN
(e) XOM
Fig. 4. Daily PIN estimates from intraday data using the InfoTrad package and the Bayesianalgorithm
41
0
50
100
150
200
250
0.0 0.1 0.2 0.3 0.4 0.5
BA
0
50
100
150
200
250
0.0 0.1 0.2 0.3 0.4 0.5
DIS
0
100
200
300
0.0 0.1 0.2 0.3 0.4
IBM
0
50
100
150
200
250
0.0 0.1 0.2 0.3 0.4 0.5
KO
0
100
200
300
0.0 0.1 0.2 0.3 0.4 0.5
XOM
Fig. 5. Histogram of Daily PIN estimates from intraday data using the InfoTrad package
42
0
100
200
300
0.1 0.2 0.3 0.4 0.5
BA
0
100
200
300
0.1 0.2 0.3 0.4 0.5
DIS
0
100
200
300
0.1 0.2 0.3 0.4 0.5
IBM
0
100
200
300
0.2 0.4 0.6
KO
0
50
100
150
200
250
0.2 0.4 0.6
XOM
Fig. 6. Histogram of Daily PIN estimates from intraday data using the Bayesian algorithm
43
0
50
100
150
−0.4 −0.2 0.0 0.2
Difference between ByesPin and MLE Pin
Fre
quen
cyδ is 0 or 1
FalseTrue
(a) BA
0
50
100
150
200
−0.4 −0.2 0.0 0.2 0.4
Difference between ByesPin and MLE Pin
Fre
quen
cy
δ is 0 or 1
FalseTrue
(b) DIS
0
50
100
150
−0.3 −0.2 −0.1 0.0 0.1 0.2
Difference between ByesPin and MLE Pin
Fre
quen
cy
δ is 0 or 1
FalseTrue
(c) IBM
0
50
100
150
−0.4 −0.2 0.0 0.2
Difference between ByesPin and MLE Pin
Fre
quen
cy
δ is 0 or 1
FalseTrue
(d) KO
0
50
100
−0.4 −0.2 0.0 0.2
Difference between ByesPin and MLE Pin
Fre
quen
cy
δ is 0 or 1
FalseTrue
(e) XOM
Fig. 7. Histograms of the differences between daily Bayesian and MLE Pin estimates.Note: The darker shaded areas (blue) represent the regular cases while the lighter shaded areas (gold)
represent corner solutions in MLE.
44
0.0
0.1
0.2
0.3
0.4
2004
Q1
2004
Q3
2005
Q1
2005
Q3
2006
Q1
2006
Q3
2007
Q1
2007
Q3
2008
Q1
2008
Q3
2009
Q1
2009
Q3
2010
Q1
2010
Q3
2011
Q1
2011
Q3
2012
Q1
2012
Q3
2013
Q1
2013
Q3
2014
Q1
2014
Q3
2015
Q1
2015
Q3
BayesPin MLE PIN
(a) BA
0.0
0.1
0.2
0.3
0.4
2004
Q1
2004
Q3
2005
Q1
2005
Q3
2006
Q1
2006
Q3
2007
Q1
2007
Q3
2008
Q1
2008
Q3
2009
Q1
2009
Q3
2010
Q1
2010
Q3
2011
Q1
2011
Q3
2012
Q1
2012
Q3
2013
Q1
2013
Q3
2014
Q1
2014
Q3
2015
Q1
2015
Q3
BayesPin MLE PIN
(b) DIS
0.0
0.1
0.2
0.3
0.4
2004
Q1
2004
Q3
2005
Q1
2005
Q3
2006
Q1
2006
Q3
2007
Q1
2007
Q3
2008
Q1
2008
Q3
2009
Q1
2009
Q3
2010
Q1
2010
Q3
2011
Q1
2011
Q3
2012
Q1
2012
Q3
2013
Q1
2013
Q3
2014
Q1
2014
Q3
2015
Q1
2015
Q3
BayesPin MLE PIN
(c) IBM
0.0
0.1
0.2
0.3
0.4
2004
Q1
2004
Q3
2005
Q1
2005
Q3
2006
Q1
2006
Q3
2007
Q1
2007
Q3
2008
Q1
2008
Q3
2009
Q1
2009
Q3
2010
Q1
2010
Q3
2011
Q1
2011
Q3
2012
Q1
2012
Q3
2013
Q1
2013
Q3
2014
Q1
2014
Q3
2015
Q1
2015
Q3
BayesPin MLE PIN
(d) KO
0.0
0.1
0.2
0.3
0.4
2004
Q1
2004
Q3
2005
Q1
2005
Q3
2006
Q1
2006
Q3
2007
Q1
2007
Q3
2008
Q1
2008
Q3
2009
Q1
2009
Q3
2010
Q1
2010
Q3
2011
Q1
2011
Q3
2012
Q1
2012
Q3
2013
Q1
2013
Q3
2014
Q1
2014
Q3
2015
Q1
2015
Q3
BayesPin MLE PIN
(e) XOM
Fig. 8. Quarterly BayesPin estimates from daily aggregated buys and sells with credibleintervals.Note: The solid lines (blue) represent the BayesPin estimates while the dashed lines (red) represent MLE
PIN estimates. The grey shaded region is the credible interval for the BayesPin estimates.
45