Directions in Information Systems and Data Analysis: Opportunities and Challenges from a personal perspective

Outline

● Applications

● Problems & Challenges

● Solutions

– R Framework (Business Model)

– Reproducible Research

Applications

Analysis of Investment Strategies

Analysis of Investment Strategies

● Data sources:

– Accounting

– Macroeconomics

– Financial markets

– Reports, etc.

● Techniques:

– Technical Analysis

– Statistics (econometrics, TSA, VaR, MCMC/bootstrap simulations)

– Data Mining (classifiers, neural networks, ...)

Reasons for studying IS

● Financial

– Reduce risk (market, operational, human)

– Improve return

● Monitoring / Auditing / Rating

● HRM

– Remuneration schemes

– Training

● ICT (alignment, usability)

Application 1

Rating of Investment Strategies and Hedge Funds

Example: market neutral IS

● Hedge funds often use a market neutral IS:

– Define a universe of tradable items (stocks)

– Define an investment horizon (2–30 days)

– Create a discrimination model that makes 3 piles:

  ● Neutral pile (no position)

  ● Long pile (buy stocks = long position)

  ● Short pile (sell stocks = short position)

– Hold short and long positions simultaneously during the horizon period
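
The three-pile split described above can be sketched as follows. This is only a minimal illustration: the scores, the two thresholds, and the function name `make_piles` are hypothetical, and in practice the scores would come from the fund's own discrimination model.

```python
def make_piles(scores, long_threshold=0.5, short_threshold=-0.5):
    """Split a universe of stocks into long, short, and neutral piles.

    `scores` maps ticker -> model score. Both the scores and the
    thresholds are illustrative assumptions, not values from the slides.
    """
    piles = {"long": [], "short": [], "neutral": []}
    for ticker, score in sorted(scores.items()):
        if score >= long_threshold:
            piles["long"].append(ticker)      # buy = long position
        elif score <= short_threshold:
            piles["short"].append(ticker)     # sell = short position
        else:
            piles["neutral"].append(ticker)   # no position
    return piles


# Hypothetical universe of four stocks with made-up scores.
example_scores = {"AAA": 0.9, "BBB": -0.7, "CCC": 0.1, "DDD": 0.6}
piles = make_piles(example_scores)
print(piles)
```

The long and short piles would then be held simultaneously over the chosen horizon.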

Example: conclusion

● The hedge fund managers are unable to adequately demonstrate the qualities of their investment approach and the clients have no information about the risk they are actually taking by using the hedge fund investment vehicle.

Hypothesis

● real stock market time series exhibit fundamental, testable differences when compared to the Random-Walk (Fama-efficient)

Rating Procedure

● Rating is a function of the Expert's ability to discriminate between Real and Random-Walk time series

This equally applies to Experts using:

– Technical analysis

– Statistical models

– No model or technique at all

Fundamental Problem

● This procedure does not take into account exogenous factors such as:

– Temporal properties of the market (volatility, ...)

– Geographical properties of the market

– IS-related restrictions imposed by senior management or by law

● Therefore the rating can only be used for a single case (we cannot compare Experts)

Solution

● Statistical model that discriminates well (low alpha and beta errors, i.e. few Type I and Type II errors)

● The discrimination quality of the model is used as a benchmark to create a relative rating

● This is possible if the model's performance is not too sensitive to external factors (time, place, ...)
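
As a rough sketch of how alpha and beta errors could be measured for the model and for an Expert on the same set of series, consider the code below. It assumes the null hypothesis is "random walk", so alpha counts random walks flagged as real and beta counts real series that were missed. The labels and the function name `error_rates` are made up for illustration.

```python
def error_rates(truth, predicted):
    """Return (alpha, beta) for binary labels: 1 = real series, 0 = random walk.

    Assumption: the null hypothesis is "random walk". Alpha is the share of
    random walks classified as real; beta is the share of real series
    classified as random walks.
    """
    false_pos = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    false_neg = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    return false_pos / truth.count(0), false_neg / truth.count(1)


# Hypothetical test set: four real series followed by four random walks.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
model = [1, 1, 1, 0, 0, 0, 0, 1]    # classifications by the benchmark model
expert = [1, 0, 0, 1, 1, 0, 1, 0]   # classifications by an Expert

model_alpha, model_beta = error_rates(truth, model)
expert_alpha, expert_beta = error_rates(truth, expert)
print("model :", model_alpha, model_beta)
print("expert:", expert_alpha, expert_beta)
```

An Expert's error rates can then be read against the model's error rates on the same series, which is the sense in which the model acts as a benchmark.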

Model

● Quasi Random-Walk (Airoldi, 2001)

● P. Cizeau, M. Potters, J.P. Bouchaud, Correlation Structure of Extreme Stock Returns, Quantitative Finance, 1, 217-222 (2001)

● Marco Airoldi, Correlation Structure and Fat Tails in Finance: a New Mechanism, Risk Management & Research, Intesa-Bci Bank, Milan, Italy (July 30, 2001)

Simple Logistic Regression

● I expand on this idea and introduce the logistic relationship

$f = \exp(\gamma + \delta X)$, where $P(h = 1) = \dfrac{f}{1 + f}$

and where $X$ is a “discriminating statistic”

● Estimation: Bias-Reduced Logistic Regression:

➢ Firth, D. (1993) Bias reduction of maximum likelihood estimates. Biometrika 80, 27-38.

➢ Firth, D. (1992) Bias reduction, the Jeffreys prior and GLIM. In Advances in GLIM and Statistical Modelling, Eds. L Fahrmeir, B J Francis, R Gilchrist and G Tutz, pp. 91-100. New York: Springer.

➢ Heinze, G. and Schemper, M. (2002) A solution to the problem of separation in logistic regression. Statistics in Medicine 21, 2409-2419.
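
The logistic relationship on this slide maps a value of the discriminating statistic to a probability. The sketch below evaluates it for given coefficients. It does not perform the bias-reduced estimation itself: the coefficient values are hypothetical, and the function name `prob_h1` is chosen here for illustration.

```python
import math


def prob_h1(x, gamma, delta):
    """P(h = 1) = f / (1 + f) with f = exp(gamma + delta * x).

    For a non-negative exponent the algebraically equivalent form
    1 / (1 + exp(-z)) is used, which avoids overflow in exp().
    """
    z = gamma + delta * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    f = math.exp(z)
    return f / (1.0 + f)


# Hypothetical coefficients; real values would come from the fitted model.
gamma, delta = 0.0, 1.0
for x in (-2.0, 0.0, 2.0):
    print(x, prob_h1(x, gamma, delta))
```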

Best discriminating factor?

● Based on preliminary investigation we identified the p-value of the small sample Kurtosis as the best factor (Vandervorst, Wessa, 2005).
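
The slides do not say how the p-value of the small-sample kurtosis is computed. One plausible way, shown here purely as an assumption, is a Monte Carlo test against Gaussian samples of the same length. The function names, the number of simulations, and the example returns are all hypothetical.

```python
import random


def sample_kurtosis(xs):
    """Moment-based kurtosis m4 / m2^2 (about 3 for Gaussian data)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 ** 2)


def kurtosis_p_value(xs, n_sim=2000, seed=0):
    """Monte Carlo p-value under a Gaussian null (an assumed procedure).

    Counts how often a simulated Gaussian sample of the same size shows a
    kurtosis at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = sample_kurtosis(xs)
    hits = 0
    for _ in range(n_sim):
        sim = [rng.gauss(0.0, 1.0) for _ in range(len(xs))]
        if sample_kurtosis(sim) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Hypothetical returns with a single extreme observation (fat tail).
fat_tailed = [1, -1] * 14 + [1, 10]
print(sample_kurtosis(fat_tailed), kurtosis_p_value(fat_tailed))
```

A small p-value indicates tails heavier than a Gaussian random walk would typically produce in a sample of that size.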

Conclusions

● “Do real stock market time series exhibit fundamental, testable differences when compared to the Random-Walk?” -> Yes (Kurtosis p-value works great)

● Bias-Reduced Logistic Regression = non-linear transformation of probabilities

● We can use the model as a benchmark => it looks like we can make “fair” comparisons:

– it only requires re-estimation of the model parameters

– the model's performance promises to be good over time and place (and other factors?)

● The autocorrelation-based measure requires 4 times more observations to reach the same discrimination quality

Application 2

Analysis and Parameter Estimation of trend-following Investment Strategies

“tKMACD”

“Alexander’s Filter Rule”
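
For readers unfamiliar with the filter rule, here is a minimal sketch of its textbook form: go long once the price has risen by a given percentage from its most recent low, and go short once it has fallen by that percentage from its most recent high. The 5% filter size, the price series, and the function name are illustrative assumptions. The slides do not show the exact variant or parameterisation used in the analysis.

```python
def filter_rule_positions(prices, filter_size=0.05):
    """Return a position per price: +1 long, -1 short, 0 not yet invested."""
    positions = []
    position = 0
    low = high = prices[0]
    for p in prices:
        low = min(low, p)
        high = max(high, p)
        if position <= 0 and p >= low * (1 + filter_size):
            position = 1     # price rose by the filter from the last low
            high = p         # start tracking the peak of the new up-move
        elif position >= 0 and p <= high * (1 - filter_size):
            position = -1    # price fell by the filter from the last high
            low = p          # start tracking the trough of the new down-move
        positions.append(position)
    return positions


# Hypothetical price path.
prices = [100, 102, 106, 104, 99, 101, 105]
positions = filter_rule_positions(prices)
print(positions)
```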

par3 = lambda_1, par4 = lambda_2

Expected Return (P50)

par3 = lambda_1, par4 = lambda_2

VaR 99% (P01)
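
The labels P50 and P01 suggest that the expected return is read as the median, and the 99% VaR as the 1st percentile, of a distribution of simulated returns. A minimal sketch of that reading follows. It uses a nearest-rank percentile and made-up outcomes (in percent); neither is taken from the slides.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers, for 0 < pct <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# 100 hypothetical simulated returns in percent: -50, -49, ..., 49.
simulated_returns = list(range(-50, 50))

p50 = percentile(simulated_returns, 50)   # "Expected Return (P50)"
p01 = percentile(simulated_returns, 1)    # "VaR 99% (P01)"
print(p50, p01)
```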

tKMACD

Morgan Stanley, Standard & Poor's

Expected Return (P50)

par1 = delta_1, par2 = delta_2

par1 = delta_1, par2 = delta_2

VaR 99% (P01)

Alexander’s Filter Rule

Morgan Stanley, Standard & Poor's

Application 3

Profit Density Forecasting

Application 4

Portfolio Analysis

Diversification = f(IS)

Problems & Challenges

So how can things go wrong?

Risks are not independent

Market, Operational, Human

The interaction between human and operational risks is often neglected (solution: reproducibility)

ICT creates problems

● Investment parameters change

– Horizon & timing

– Style (large caps, markets, ...)

– Importance of different types of analysis

but is ICT able to cope with rapid changes?

● Algorithms change

● Features are added/changed (mostly undocumented)

● Data availability changes

● Do users make adequate use of the Information System?

● How to learn from past mistakes?

● How do we collaborate?

The black box problem

[Diagram labels: Assignment, Process, Decision by the Agent, Result, Evaluation by the Principal, Market, Information System, Unknown factors]

We cannot attribute the Result to the Individual

The reward mechanism is not always transparent

The Information System makes things worse

[Diagram labels: Psychological, Sociological, Incentive Scheme, Reproducibility and Reusability, Responsibility, Experimentation, Communication, Collaboration, “Innovation”, “Reliability”]

Challenges

We would like to

– produce new

– maintain existing

– publish/implement

applications

– quickly & cheaply (Costs)

– with dissemination in mind (Marketing)

– with usability, scalability, security, flexibility... in mind (Infrastructure)

that generate

– reproducible

– reusable

research.

R Framework (wessa.net)

R Framework for Statistical Software Development, Maintenance, and Publishing within an Open-Access Business Model

Business Model

● Open Source / Open Access (Johnson 2001, Edwards 2005, Välimäki 2005)

● Business & Marketing Models (Osterwalder 2004, Constantinides 2004)

● A Framework for Statistical Software Development, Maintenance, and Publishing within an Open-Access Business Model (DSC 2007, Auckland, NZ) (published in Computational Statistics)

Structure

Operating System

R Language

R package

R wrapper

R Framework (wessa.net) Meta-data

Distributed Computing & Queue Manager

Compendium Platform (www.freestatistics.org)

Education & Research Publishing … … …

Example

● We build an R module that computes Bivariate Kernel Density Estimation 

Lucy, D., Aykroyd, R.G. & Pollard, A.M. (2002) Non-parametric calibration for age estimation. Applied Statistics 51(2): 183-196
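
To give an idea of what a bivariate kernel density estimate computes, the sketch below evaluates a product Gaussian kernel at a single point. It is written in Python rather than R and is not the code of the R module: the kernel choice, the bandwidths `hx` and `hy`, and the data points are assumptions made for illustration.

```python
import math


def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def bivariate_kde(points, x, y, hx=1.0, hy=1.0):
    """Product-kernel density estimate at (x, y).

    f(x, y) = 1 / (n * hx * hy) * sum_i K((x - x_i) / hx) * K((y - y_i) / hy)
    """
    total = sum(
        gaussian_kernel((x - px) / hx) * gaussian_kernel((y - py) / hy)
        for px, py in points
    )
    return total / (len(points) * hx * hy)


# Hypothetical observations.
points = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5), (0.5, 1.0)]
print(bivariate_kde(points, 1.0, 1.0, hx=0.8, hy=0.8))
```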

Meta Information

R code

We include:

● type conversion

● input validation

● actual analysis
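
The three parts listed above can be illustrated with a small stand-in module. It is written in Python rather than R, uses a plain mean as the "analysis", and returns an HTML fragment. The function name, the minimum of two observations, and the output layout are all hypothetical.

```python
from html import escape
from statistics import mean


def run_module(raw_values):
    """Process form input given as a list of strings; return an HTML fragment."""
    # 1. Type conversion: form fields arrive as text.
    try:
        values = [float(v) for v in raw_values]
    except ValueError as exc:
        return "<p>Error: " + escape(str(exc)) + "</p>"

    # 2. Input validation (the minimum sample size is an assumed rule).
    if len(values) < 2:
        return "<p>Error: at least two observations are required</p>"

    # 3. Actual analysis.
    result = mean(values)
    return "<table><tr><td>mean</td><td>{:.2f}</td></tr></table>".format(result)


print(run_module(["1", "2", "3"]))
print(run_module(["1", "x"]))
```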

R code produces HTML output

Meta-tags, links, citations

R Version, Category, Author

Editor

The R Framework creates the R module when the Editor clicks the Publish link

“Descriptive Statistics”

Bivariate KDE (R Module)

Handling Requests

• Find R module (Google)

• User submits request (HTML form, HTTP POST)

• Webserver loads R module

• R module creates pre-processed R code

• Webserver directs request to R server (callback)

• R server invokes R engine, stores output

• Webserver gets output and parses it through a template

• Webserver sends HTML reply to user

• User fetches pictures/logfiles from R server
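
The request flow above can be mimicked in a few lines to make the division of labour between webserver, R module, and R server more concrete. Everything in this sketch is a stand-in. The module registry, the template strings, and the function names are hypothetical, no R engine is invoked, and the "R server" simply records the code it receives.

```python
def load_module(name):
    """Hypothetical module store: an R code template plus an HTML template."""
    modules = {
        "mean": {
            "code_template": "result <- mean(c({values}))",
            "html_template": "<html><body><h1>{title}</h1><pre>{output}</pre></body></html>",
        }
    }
    return modules[name]


def preprocess(module, form):
    """The module turns form fields into pre-processed R code."""
    return module["code_template"].format(values=form["values"])


def r_server_execute(code, store):
    """Stand-in for the R server: stores a placeholder output under a job id."""
    job_id = len(store) + 1
    store[job_id] = "(output of: " + code + ")"
    return job_id


def handle_request(name, form, store):
    """Webserver side: load module, forward code, fill template, reply."""
    module = load_module(name)
    code = preprocess(module, form)
    job_id = r_server_execute(code, store)
    output = store[job_id]
    return module["html_template"].format(title=name, output=output)


output_store = {}
reply = handle_request("mean", {"values": "1,2,3"}, output_store)
print(reply)
```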

Computational Result

Pictures: png and postscript

PostScript (scalable, exportable)

PNG format

R Modules …

• are “indexable”, hence “findable”

– hierarchy of hyperlinks

– meta tags, titles, descriptions

– archive of old versions

– fast loading, pure html

• make “Business/Marketing Sense”

• are “flexible” (the underlying R code can be changed on the fly)

• have “impact”

R modules have impact

● Millions of unique users [Borghers, Wessa, 2005]

● Citations: Neuroscience, BMC Genomics, European Food Research and Technology, Journal of Electronic Resources in Medical Libraries, Functional Ecology, Journal of Animal Science, Nature, Ornis Fennica, Journal of Field Ornithology, Folia Zool., Journal of Animal Breeding and Genetics, Journal of Molecular Evolution, Annales Zoologici Fennici, Journal of Structural Geology, Journal of Statistics Education, Genes, Brain and Behavior, Journal of Lipid Research, PLoS Pathogens, IEEE Transactions on Knowledge and Data Engineering, Col. Vol. Sci. Pap. ICCAT, Applied Mathematics and Computation, Nucleic Acids Research, Informs, IEEE Transactions on Nuclear Science, arXiv:0704.0655v1 [astro-ph], etc.

Advantages

● Distributed computing

– Scalability

– Robustness

– Shared resources

● Thin client

– No incompatibilities (unlike DIE software)

– Client-side security

– Easy/cheap maintenance

● Flexibility

● etc.

Reproducible Research (freestatistics.org)

Archive of Computational R Objects that support reproducibility and reusability

Compendium

• Original definition: An electronic collection of Data and Software that is needed to reproduce the results in a text (in this case, an article)

• New definition: A document with (open-access) references to archived Computations (including Data, Meta-data, and Software) that allow us to reproduce and reuse the underlying analysis

=> the compendium platform is a tool for collaboration, dissemination, and monitoring.

Blogging Results

Archive of Computational R Objects

Object properties

● Date, computing time, links, sources, etc...

● GUUID

● R code

● Parameters

● Data

● Comments, keywords, tags, etc...

● R output

● HTML output

● Pictures (png, postscript)

● etc...
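
The property list above maps naturally onto a record type. The sketch below is one hypothetical way to model an archived computation. The class and field names are invented, the identifier is generated with a standard UUID on the assumption that "GUUID" denotes a globally unique identifier, and the storage format of the real archive is not known from the slides.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ArchivedComputation:
    r_code: str
    parameters: dict
    data: list
    r_output: str = ""
    html_output: str = ""
    pictures: list = field(default_factory=list)   # e.g. png / postscript file names
    keywords: list = field(default_factory=list)
    computing_time_s: float = 0.0
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def changed_copy(self, **new_parameters):
        """Return a new object with updated parameters and a fresh identifier."""
        merged = {**self.parameters, **new_parameters}
        return ArchivedComputation(
            r_code=self.r_code,
            parameters=merged,
            data=list(self.data),
            keywords=list(self.keywords),
        )


original = ArchivedComputation(
    r_code="acf(x, lag.max = par1)",
    parameters={"par1": 12},
    data=[1.0, 2.0, 3.0],
    keywords=["autocorrelation"],
)
variant = original.changed_copy(par1=24)
print(original.object_id, variant.object_id, variant.parameters)
```

Keeping code, parameters, and data together in one object is what allows a stored computation to be rerun or modified later.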

Objects on the web

Demo (screenshots)

Every archived computation can be recomputed, changed, and re-archived!

=> Archived Computation = Collaborative Software

Examples of Compendia

http://www.wessa.net/download/tutorial1.pdf (Time Series Analysis - Introduction)

http://www.wessa.net/download/tutorial.pdf (Descriptive Statistics – Central Tendency)

Note: both documents are “work in progress”

Please send corrections & suggestions to [email protected]

Queue Manager

(not freely available)

[Diagram labels: Data Providers, Data Collector, Time Series, Permanent Parallel Simulation Algorithm, Simulations & Quantiles, Aggregation, Query & Analysis (R Framework), Query Interface, Browser: HTML]