Different levels of information in financial data: an overview of some widely investigated databases
Salvatore Miccichè
http://lagash.dft.unipa.it
Observatory of Complex Systems
Dipartimento di Fisica e Tecnologie Relative, Università degli Studi di Palermo
GIACS Conference “Data in Complex Systems” - Palermo, 7-9 April 2008


Page 2

Observatory of Complex Systems

S. Miccichè, F. Lillo, R. N. Mantegna, M. Tumminello, G. Vaglica, C. Coronnello, M. Spanò

Econophysics, Bioinformatics, Stochastic Processes

Overview of Databases

Page 3

Overview of Databases

We will present an overview of some widely investigated financial and economic databases.

Most financial databases include data about transaction prices, bid and ask quotes, and transaction volumes.

In some financial databases, the coded identity of the market members acting on the order book is also available.

The economic databases we will discuss contain financial and economic information on over ten million public and private companies operating in Europe and the USA.

What do we do with them?

Page 4

Why physicists are interested in financial markets

Financial markets can be considered model complex systems:

• Many agents/factors
• Interactions are not always clear or known (no equations, no Hamiltonians?)

G. Parisi, cond-mat/0205297, Complex Systems: a Physicist's Viewpoint: “A system is complex if its behaviour crucially depends on the details of the system”

Econophysics is a recently established discipline whose main aim is to model some of the stylized facts empirically observed in financial markets.

Overview of Databases: financial databases

Page 5

Methods of statistical physics can be applied:

• Stochastic processes (Brownian motion, superdiffusivity, power-law tails, long-range correlation, ...)
• Scaling
• Network theory, clustering techniques, random matrices, ...
• Agent-based models, ...
• ...

Last but not least: there is a huge amount of data!

1995: 1 CD per month
2003: 12-13 CDs per month

Page 6

FINANCIAL databases:

TAQ, Euronext, BI, TSE
LSE, BME
MTS

Page 7

Size: 1 Tb

Trade and Quote (NYSE)
- 1995: 6.3 Gb
- 1996: 8.1 Gb
- 1997: 13.5 Gb
- 1998: 20.0 Gb
- 1999: 27.1 Gb
- 2000: 63.1 Gb
- 2001: approx. 110 Gb
- 2002: approx. 180 Gb
- 2003: approx. 215 Gb

Rebuild Order Book - LSE
- 2002: 19.5 Gb (now also 2004, 2005, 2006)

Open Book - NYSE
- 2002: approx. 110 Gb

Tokyo (TSE)
- 2002 trades: 1.6 Gb

Euronext
- 2002: 6.7 Gb

MTS
- 4/2003-3/2004: 4.0 Gb

Milano (BI)
- 2002 trades: 2.14 Gb
- 2002 best quotes: 2.43 Gb

Page 8

Transaction prices
Quotes

Page 9

To start with

Given a price S(t) at time t, the price return r(t) over a time horizon Δt is:

$r_{\Delta t}(t) = \frac{S(t + \Delta t) - S(t)}{S(t)}$

ARBITRAGE

Overview of Databases: financial databases – transaction prices - synchronized

Page 10

Multivariate description

COMOVEMENTS

$r_i(t) = \ln S_i(t + \Delta t) - \ln S_i(t) \approx \frac{S_i(t + \Delta t) - S_i(t)}{S_i(t)}$

Δt = open-to-close, 1995-2003
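The two return definitions on the preceding slides can be illustrated with a minimal sketch. It assumes an evenly sampled price series; the function names and the prices are made up for illustration and do not come from the talk.

```python
import math

def simple_returns(prices, lag=1):
    """r(t) = (S(t + lag) - S(t)) / S(t) for an evenly sampled price series."""
    return [(prices[t + lag] - prices[t]) / prices[t]
            for t in range(len(prices) - lag)]

def log_returns(prices, lag=1):
    """r(t) = ln S(t + lag) - ln S(t)."""
    return [math.log(prices[t + lag]) - math.log(prices[t])
            for t in range(len(prices) - lag)]

prices = [100.0, 101.0, 99.0, 102.0]  # hypothetical prices
simple = simple_returns(prices)
logr = log_returns(prices)
```

For small price changes the two series are close to each other, and log returns have the convenient property that they add up over consecutive intervals.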

Page 11

Multivariate description

We are looking for a possible collective stochastic dynamics and/or links between the price returns or volatilities of different stocks.

PRICE RETURN CLUSTERS

Cross-correlation clustering procedure based on a similarity measure (the correlation coefficient $\rho_{ij}$), converted into a distance:

$d_{ij} = \sqrt{2(1 - \rho_{ij})}$

$\rho_{ij} = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sqrt{(\langle r_i^2 \rangle - \langle r_i \rangle^2)(\langle r_j^2 \rangle - \langle r_j \rangle^2)}}$

where $r_i$ are the price return time series.

At any Δt: subdominant ultrametric distance, Hierarchical Tree (HT) and Minimum Spanning Tree (MST).
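The chain from return series to correlation, to distance, to Minimum Spanning Tree can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used for the talk: the MST is built with Kruskal's algorithm, and the three short return series are invented (the third is the exact negative of the first, so their distance is 2).

```python
import math

def pearson(x, y):
    """Correlation coefficient rho between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    return cov / math.sqrt(vx * vy)

def distance_matrix(returns):
    """d_ij = sqrt(2 (1 - rho_ij)) for every pair of return series."""
    n = len(returns)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            rho = pearson(returns[i], returns[j])
            d[i][j] = d[j][i] = math.sqrt(max(0.0, 2.0 * (1.0 - rho)))
    return d

def minimum_spanning_tree(d):
    """Kruskal's algorithm with union-find; returns N-1 edges (i, j, d_ij)."""
    n = len(d)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    edges = sorted((d[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two different components
            parent[ri] = rj
            tree.append((i, j, w))
    return tree

returns = [  # hypothetical return series for three stocks
    [0.01, -0.02, 0.03, 0.00, -0.01],
    [0.02, -0.01, 0.02, 0.01, -0.02],
    [-0.01, 0.02, -0.03, 0.00, 0.01],
]
d = distance_matrix(returns)
mst = minimum_spanning_tree(d)
```

With N stocks the tree always has N-1 links, so the N(N-1)/2 correlation coefficients are filtered down to the shortest distances that keep the graph connected.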

Page 12

Multivariate description

• Compare the dynamics of price returns of stocks traded at different exchanges:
- industry sector identification at different time horizons
- sector dynamics
- LSE and NYSE
- are there common (stylized) facts?

Single Linkage Clustering Analysis (SLCA)

MST construction (N-1 links)

At each step, when two elements, or one element and a cluster, or two clusters p and q merge into a wider single cluster t, the distance d_tr between the new cluster t and any cluster r is recursively given by d_tr = min{d_pr, d_qr}, i.e. the distance between clusters t and r is the shortest distance between any element of t and any element of r.

Planar Maximally Filtered Graph (3(N-2) links)
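The recursive rule d_tr = min{d_pr, d_qr} can be turned into a short single-linkage routine. The sketch below is only illustrative: the function name, the integer cluster labels and the 3x3 distance matrix are assumptions made for the example.

```python
def single_linkage(d):
    """Merge clusters by the rule d_tr = min(d_pr, d_qr).

    Returns a list of merges (members_p, members_q, height), where height is
    the distance at which the two clusters join.
    """
    n = len(d)
    clusters = {i: (i,) for i in range(n)}
    dist = {(i, j): d[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n  # label for the next merged cluster
    while len(clusters) > 1:
        (p, q), height = min(dist.items(), key=lambda kv: kv[1])
        merges.append((clusters[p], clusters[q], height))
        updated = {}
        for r in clusters:
            if r in (p, q):
                continue
            d_pr = dist[(min(p, r), max(p, r))]
            d_qr = dist[(min(q, r), max(q, r))]
            updated[(r, next_id)] = min(d_pr, d_qr)
        # Drop every distance involving p or q, then add those of the new cluster.
        dist = {k: v for k, v in dist.items() if p not in k and q not in k}
        dist.update(updated)
        clusters[next_id] = clusters.pop(p) + clusters.pop(q)
        next_id += 1
    return merges

d = [  # hypothetical distances between three elements
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]
merges = single_linkage(d)
```

The merge heights are the subdominant ultrametric distances: elements 0 and 2 are at distance 4 in the original matrix but join at height 2, the largest link on the path connecting them.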

Page 13

Synchronized data

We consider:
NYSE - the 100 most capitalized stocks in 2002.
LSE - the 92 most traded stocks in 2002.

We consider high-frequency (intraday) data. Transactions do not occur at the same time for all stocks, so we have to synchronize/homogenize the data:

NYSE: 5 min, 15 min, 30 min, 65 min, 195 min, 1 day (trading time 6h30')
LSE: 5 min, 15 min, 51 min, 102 min, 255 min, 1 day (trading time 8h30')

Trades And Quotes (TAQ) database maintained by NYSE (1995-2003)
Rebuild Order Book (ROB) database maintained by LSE (2002)
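One common way to homogenize tick data is previous-tick sampling on a regular grid: at each grid time, take the price of the last trade at or before it. The slides do not say which scheme was used, so the sketch below is an assumption, and the trade times and prices are invented. The listed horizons divide the trading day evenly (390 min for NYSE, 510 min for LSE), which keeps every interval inside a single session.

```python
from bisect import bisect_right

def previous_tick_sample(times, prices, start, end, step):
    """Price in force at each grid time: the last trade at or before it.

    `times` must be sorted. Grid points before the first trade get None.
    """
    grid, sampled = [], []
    t = start
    while t <= end:
        k = bisect_right(times, t)  # number of trades with time <= t
        grid.append(t)
        sampled.append(prices[k - 1] if k > 0 else None)
        t += step
    return grid, sampled

# Hypothetical trades: seconds since the open, and trade prices
times = [12, 250, 310, 900, 1210]
prices = [50.0, 50.1, 50.05, 50.2, 50.15]
grid, sampled = previous_tick_sample(times, prices, start=0, end=1500, step=300)
```

Applying the same grid to every stock gives return series of equal length that can be fed into a correlation matrix.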

Page 14

The set of investigated stocks

Sector                     NYSE (100 stocks)   LSE (92 stocks)
01 Technology                      8                  4
02 Financial                      24                 20
03 Energy                          3                  3
04 Consumer non-Cyclical          11                 12
05 Consumer Cyclical               2                 10
06 Healthcare                     12                  6
07 Basic Materials                 6                  5
08 Services                       20                 19
09 Utilities                       2                  6
10 Capital Goods                   6                  5
11 Transportation                  2                  2
12 Conglomerates                   4                  0

Page 15

Daily data: SLCA – hierarchy & topology

[Figure: panels "NYSE day" and "LSE day", each annotated "High level of correlation"]

Page 16

Daily data: PMFG

[Figure: panels "NYSE day" and "LSE day"]

Page 17

5-min data: SLCA – hierarchy & topology

[Figure: panels "LSE 5-min" and "NYSE 5-min"]

FINANCIAL 04 out of 20
SERVICES 02 out of 19

Page 18

5-minute data: PMFG

[Figure: panels "LSE 5-min" and "NYSE 5-min"]

Page 19

Conclusions

• The system is more hierarchically/topologically structured at daily time horizons, confirming that the market needs a finite amount of time to assess the correct degree of cross-correlation between pairs of stocks.
• Financial and Energy seem to be structured even at short time horizons (LSE more than NYSE).

overnight

Page 20

A possible use of tick-by-tick data

• The “extreme events” we consider are related to the first crossing of either of two barriers.
• The Mean Exit Time (MET) is simply the expected value of the time interval up to that first crossing.

Financial interest: the MET provides a timescale for market movements.

[Figure: GE stock; dashed black = original data, magenta = shuffled returns only; annotation: 2L]

Overview of Databases: financial databases – transaction prices – tick-by-tick
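A minimal sketch of an empirical mean exit time estimate follows. It assumes that the two barriers sit at a distance L above and below the current starting value (an interval of width 2L), that time is counted in ticks, and that the clock restarts at each exit; none of these conventions is stated on the slide, and the path is invented.

```python
def exit_times(x, L):
    """Numbers of ticks needed for x to leave [x_start - L, x_start + L].

    After each exit the interval is re-centred on the exit point and the
    count restarts (an assumed convention).
    """
    times = []
    start = 0
    for t in range(1, len(x)):
        if abs(x[t] - x[start]) >= L:
            times.append(t - start)
            start = t
    return times

def mean_exit_time(x, L):
    """Average of the observed exit times, or NaN if the path never exits."""
    times = exit_times(x, L)
    return sum(times) / len(times) if times else float("nan")

path = [0, 1, 2, 1, 0, -1, 0, 1, 2, 3]  # hypothetical (log-)price path
observed = exit_times(path, 2)
met = mean_exit_time(path, 2)
```

Repeating the estimate on a path rebuilt from randomly shuffled returns gives the kind of comparison the figure legend refers to, since shuffling keeps the return distribution but destroys temporal correlations.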

Page 21

A possible use of tick-by-tick data

QUOTES

[Figure: time between consecutive quotes]

Page 22

Another database: MTS

These are data on bonds traded in the European markets and managed by the MTS Group, a firm based in Italy. The bonds we have considered are those continuously traded in Italy throughout the year from April 2003 to March 2004.

Overview of Databases: financial databases – bonds

Page 23

Order book data

The state of the complete order book can be visualized at any time by using a schematic representation.

Order book data allow one to follow the details of price formation in a financial market.

Overview of Databases: financial databases – order book data

Page 24

Order book data: time evolution

The real behavior over a short time for a normal stock

[Figure: price (x100) versus time (s); legend: - sell limit orders, - buy limit orders, ○ sell market orders, x buy market orders]

Page 25

Order book data: time evolution

Representation of the order book focusing on the time dependence of order flow (the plot refers to a stock traded at the London Stock Exchange)

Page 26

Order book data: time evolution

A very special day (20 Sept 2002)

Page 27

(Coded) Identity

Page 28

Tick-by-tick data, volume and identity

In the LSE and BME databases, the coded identity of the market members (brokerages) acting on the order book is also available.

For LSE we obtained these data under a special confidentiality agreement: e.g. people who use these data MUST be traceable!

For BME the identity is transparent in the market.

Page 29

Tick-by-tick data, volume and identity

Inventory variation: the value (i.e. price times volume) of an asset exchanged as a buyer minus the value exchanged as a seller in a given time interval.

[Formula annotations: price (2001-2004); volume; sign, +1 for buys and -1 for sells; i = 1, …, 69 (BBVA)]

In this talk, we focus on a time interval of 1 trading day.

Most active: BBVA, TEF, SAN, REP
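The definition of inventory variation translates directly into a signed sum over a firm's transactions. The sketch below is illustrative only: the record layout (firm, day, side, price, volume) and the sample trades are hypothetical and do not reflect the actual BME data format.

```python
from collections import defaultdict

def inventory_variation(trades):
    """Sum sign * price * volume per (firm, day); sign is +1 for buys, -1 for sells."""
    v = defaultdict(float)
    for firm, day, side, price, volume in trades:
        sign = 1 if side == "buy" else -1
        v[(firm, day)] += sign * price * volume
    return dict(v)

trades = [  # hypothetical records: (firm, day, side, price, volume)
    ("A", "2003-01-02", "buy", 10.0, 100),
    ("A", "2003-01-02", "sell", 10.5, 40),
    ("B", "2003-01-02", "sell", 10.0, 100),
]
v = inventory_variation(trades)
```

A positive value means the firm was a net buyer (in value) of the stock on that day, a negative value a net seller.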

Page 30

Tick-by-tick data, volume and identity

Inventory variation correlation matrix obtained by sorting the firms in the rows and columns according to the correlation of their inventory variation with the price return.

[Figure: correlation matrix of the 69 firms, BBVA 2003; annotation: ordering]

Page 31

Tick-by-tick data, volume and identity

A classification of brokerages/firms, obtained by considering the correlation between a firm's inventory variation and the price return of the traded stock:

“trending” firms (momentum traders)
“reversing” firms (contrarian traders)
“noisy” firms

Page 32

Tick-by-tick data, volume and identity

BBVA 2003, number of firms in each group:

“Reversing” (negative correlation between inventory variation and price return): 37
“Noisy” (correlation between inventory variation and price return within noise confidence levels): 21
“Trending” (positive correlation between inventory variation and price return): 11

Trending
- Positively correlated with price return
- Large institutions
- Acting on long time scales, splitting large orders to build a portfolio position while minimizing price impact
- Their trading activity tends to be localized in time

Reversing
- Negatively correlated with price return
- Large and small institutions
- Typically acting on a short time scale, continuously reverting their position in the market
- Their trading activity tends to be homogeneous in time

Noisy
- Poorly correlated with price return
- Large and small institutions
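The three groups can be assigned by comparing each firm's correlation with a noise band. The slides do not give the confidence levels, so the sketch below assumes a band of two standard errors of the correlation of independent series, roughly 2/sqrt(T) for T observations; the function names and the daily series are invented for illustration.

```python
import math

def pearson(x, y):
    """Correlation coefficient between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    return cov / math.sqrt(vx * vy)

def classify_firm(inventory, returns, n_sigma=2.0):
    """Label a firm from the correlation of inventory variation with returns."""
    rho = pearson(inventory, returns)
    threshold = n_sigma / math.sqrt(len(returns))  # assumed noise band
    if rho > threshold:
        return "trending"
    if rho < -threshold:
        return "reversing"
    return "noisy"

returns = [0.01, -0.01] * 4                        # hypothetical daily returns
buys_on_rises = [100.0 * r for r in returns]       # moves with the price
sells_on_rises = [-100.0 * r for r in returns]     # moves against the price
unrelated = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]

labels = [classify_firm(v, returns)
          for v in (buys_on_rises, sells_on_rises, unrelated)]
```

With only eight observations the band is wide (about 0.71); on a full year of daily data it would shrink to roughly 0.13.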

Page 33

ECONOMIC databases:

Amadeus, Compustat
INPS

Overview of Databases: economic databases

Page 34

AMADEUS is a comprehensive, pan-European database containing financial information on over 10 million public and private companies in 38 European countries.

Standardised annual accounts (for up to 10 years), consolidated and unconsolidated, financial ratios, activities and ownership for approximately 9 million companies throughout Europe, including Eastern Europe.

A standard company report includes: 24 balance sheet items, 25 profit and loss account items and 26 ratios, as well as descriptive information including trade description and activity codes (NACE 1, NAICS or US SIC can be used across the database) and ownership information.

A news module contains information from Reuters, Dow Jones and the FT, as well as M&A news and rumours from the vendor's own ZEPHYR.

AMADEUS also contains security and price information and links to an executive report with integral graphs, plus a report comparing the financials of the company's default peer group.

Page 35

Logarithmic growth rate

The growth of a firm was first described by Gibrat in 1931. His model concerns the logarithmic growth rate

$r_i(t) = \ln S_i(t + \Delta t) - \ln S_i(t)$

where S(t) is some proxy of firm size: total assets, employees, sales, revenue, turnover, …

The Gibrat model is based on:

1) Law of proportionate effect: r_i(t) is independent of the initial size of the firm
2) r_i(t) and r_j(t) are uncorrelated

By making use (i) of the Central Limit Theorem and (ii) of the additional assumption of independence, one can show that growth rates should be log-normally distributed, i.e. the logarithmic growth rate r_i(t) should be normally distributed.
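The Central Limit Theorem argument can be checked numerically with a small simulation. The sketch below is a toy model under stated assumptions: each firm receives independent multiplicative shocks drawn uniformly from a small interval, regardless of its size; the parameter values and function name are arbitrary choices for illustration.

```python
import math
import random
import statistics

def simulate_log_growth_rates(n_firms=2000, n_shocks=50, amplitude=0.1, seed=0):
    """Logarithmic growth rates under size-independent multiplicative shocks."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_firms):
        initial_size = math.exp(rng.uniform(0.0, 10.0))  # very different sizes
        size = initial_size
        for _ in range(n_shocks):
            # Proportionate effect: the relative shock ignores the current size.
            size *= 1.0 + rng.uniform(-amplitude, amplitude)
        rates.append(math.log(size) - math.log(initial_size))
    return rates

rates = simulate_log_growth_rates()
mean_r = statistics.mean(rates)
std_r = statistics.pstdev(rates)
```

Each logarithmic growth rate is a sum of many independent terms ln(1 + shock), so a histogram of `rates` should look approximately Gaussian even though the individual shocks are uniform. Empirical data that deviate from this shape point to a violation of one of the model's assumptions.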

Page 36

All data are aggregated

[Figure: AMADEUS database; annotation: IC fixed]

Log-normal, Laplacian, what else?

Page 37

Data allow disaggregation in terms of economic sectors of activity

Z-transform within sectors:

$z = \frac{r - \langle r \rangle}{\sqrt{\langle (r - \langle r \rangle)^2 \rangle}}$
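The within-sector Z-transform amounts to subtracting the sector mean and dividing by the sector standard deviation. The sketch below assumes the population standard deviation and uses invented growth rates and sector labels.

```python
import statistics
from collections import defaultdict

def z_transform_by_sector(rates, sectors):
    """Standardize each growth rate with the mean and std of its own sector."""
    groups = defaultdict(list)
    for r, s in zip(rates, sectors):
        groups[s].append(r)
    stats = {s: (statistics.mean(v), statistics.pstdev(v))
             for s, v in groups.items()}
    return [(r - stats[s][0]) / stats[s][1] for r, s in zip(rates, sectors)]

rates = [0.1, 0.3, -0.2, 0.2]                  # hypothetical growth rates
sectors = ["tech", "tech", "energy", "energy"]  # hypothetical sector labels
z = z_transform_by_sector(rates, sectors)
```

After the transform every sector has zero mean and unit variance, so the shapes of the distributions can be compared across sectors on the same axes.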

Page 38

Data allow disaggregation year-by-year

Page 39

Exploring the role of correlation between firms

Shuffling experiments

Page 40

Conclusions

The availability of accurate databases allows for the inspection of the role that different variables play in the system.

Page 41

The End

micciche@[email protected]

http://lagash.dft.unipa.it