

Software, Hardware, and Database Structure Options for Research in Financial Economics

SAS and Computing Speed

Michael Boldin, WRDS, University of Pennsylvania


Main Questions

1. How can researchers take advantage of modern computing technology?

2. Which econometric software packages would you recommend to students?

3. How do SAS features and computing speed stack up?


Q1. How can researchers take advantage of modern computing technology?

Observations:

• Today’s PCs are better than yesterday’s ‘supercomputer’ (for single users).

• The system (hardware, software, and network connections) needs to work as a whole.

• Database management (DBMS) matters.


Q2. Which econometric software packages would you recommend to students?

Observations:

• Undergrad and grad advice differs.

• Power, flexibility, and user-friendly elements are not mutually exclusive.

• Almost too many choices (and change is hard).

• Few students care about good programming practice, and they keep bad habits.


Q3. How do SAS features and computing speed stack up?

• Is SAS fast enough in raw computing speed?

• Does the SAS Data step framework create performance handicaps?

• How does SAS/IML stack up to MATLAB and GAUSS in functionality?

Other issues:

• Does SAS need a better interface to C/C++ and FORTRAN modules?

• What does SAS offer as an RDBMS complement to MATLAB?

• Is greater compatibility with open-source software such as MySQL and PHP possible?


Statistical Software Evaluations: Reviewing the Reviews

Noteworthy: Jeffrey MacKie-Mason (1992) ‘Econometric Software: A User’s View’

• Could not select an unqualified winner among: Gauss, Limdep, RATS, SAS, SST, Stata, and TSP.

• Preferred TSP. Saw advantages to SAS, but found problems in PC SAS (of 1991).

• Correctly predicted movement toward matrix algebra oriented software such as GAUSS.

John Rust (1993), ‘GAUSS and MATLAB: A Comparison’

• Highlighted the advantages of matrix oriented programming for econometrics.

• Correctly predicted that users would soon be moving away from DOS.

• Incorrectly predicted that the move would be toward UNIX workstations.

Problems:

• Most other reviews just count features.

• Or worse, stress speed over all other issues.

• Within 2 years, a review is largely obsolete.

• After 5 years, it is likely to be completely misleading, if not irrelevant.


Speed Comparisons (by Stefan Steinhaus)

                    Speed Score              Overall Score
               1997    1999    2002  |   1997    1999    2002
GAUSS         49.94   47.96   47.90  |  64.38   63.64   64.80
Mathematica    7.67   31.95   31.32  |  48.76   54.93   57.34
Matlab        39.98   34.64   65.89  |  60.03   55.85   69.74
Ox            66.21   68.12   62.22  |  47.30   49.22   58.45
O-Matrix      70.80   67.29   69.80  |  48.72   43.68   45.83
S-Plus        37.18   30.51   38.56  |  54.28   44.90   48.61

Source: http://www.scientificweb.com/ncrunch/index.html

Higher scores are better. 100 is the highest possible score in each year’s evaluation. Speed scores are not comparable across years. The overall score includes breadth of functionality and other usability considerations, using these weights: Mathematical functions 38%, Graphical functions 10%, Programming environment 9%, Data import/export 5%, Available operating systems 2%, Speed comparison 36%.

In pure speed comparisons (made comparable across years), a faster PC paired with a newer software vintage turns a poor performer into the top performer relative to the ‘best’ old-technology pair.

And how about SAS?


A Helicopter View of PC Technology

1981: IBM PC = $5,500 in today’s prices

64K memory, no hard disk, monochrome monitor, no networking capabilities

Today: Dell Pentium IV < $1,000

1 GB memory, 3000x faster, 80 GB hard drive, DVD/CD burner, flat screen color monitor, and built-in networking.


The Speed Issue: Moore’s Law in Action

Pentium   Clock speed   Year   MWIPS   Time index
I         120 MHz       1995      79      100.0
II        266 MHz       1997     218       36.2
III       550 MHz       1999     448       17.6
IV        1.8 GHz       2001     638       12.4
IV        3.6 GHz       2003    1342        5.9
IV        3.8 GHz       2004    3899        2.0

      

MWIPS = Millions of Whetstone Instructions per Second. A higher MWIPS score is better (i.e. a faster chip), and doubling the MWIPS score translates to roughly 50% less time for an average numerical calculation.

Source: http://homepage.virgin.net/roy.longbottom/whetstone.htm

[Figure: MWIPS (scale 0 to 4500) and Time Index (scale 0 to 120) by year, 1995 to 2004]
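The Time index column is consistent with dividing the 1995 chip's MWIPS by each later chip's MWIPS and scaling to 100. The slide does not state this formula, so the sketch below treats it as an assumption; the function and variable names are illustrative only.

```python
# MWIPS scores copied from the table above, keyed by (chip, year).
MWIPS = {
    ("Pentium I", 1995): 79,
    ("Pentium II", 1997): 218,
    ("Pentium III", 1999): 448,
    ("Pentium IV", 2001): 638,
    ("Pentium IV", 2003): 1342,
    ("Pentium IV", 2004): 3899,
}

BASE_MWIPS = 79  # the 1995 Pentium I is the reference point (index = 100)


def time_index(mwips, base_mwips=BASE_MWIPS):
    """Relative time per calculation, assuming time is inversely
    proportional to the MWIPS score (an assumption, not stated on the slide)."""
    return 100.0 * base_mwips / mwips


for (chip, year), score in MWIPS.items():
    print(f"{chip:<12} {year}  MWIPS={score:>5}  time index={time_index(score):6.1f}")
```

Under this reading, doubling MWIPS halves the index, which matches the "roughly 50% less time" rule of thumb in the note above.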


Evaluation of Statistical Software

Three categories

1. Traditional programming languages: FORTRAN, C/C++, and Basic. Relatively new: Perl, Python, and Java.

2. Statistical packages: EVIEWS, SAS, STATA, and TSP.

3. Matrix algebra oriented computing software: GAUSS, Mathematica, MATLAB, R, and S-Plus.


Speed & User Friendliness

Computation Speed: Fortran > C > C++ > Matlab > SAS > Perl

User Friendliness: SAS > Matlab > Perl > C++ > C > Fortran

Rankings of other languages/packages? Java, Visual Basic, Stata, SPSS, S-Plus/R.

Are the speed differences significant?

Are ‘user’ elements only a matter of taste?

How can user friendliness and computation speed be combined in an evaluation?


Computing Speed: Only One Part of the Equation

Total Research Project Time

1. Planning

2. Data Management

3. Programming

4. Computation

5. Analysis of Results

6. Re-Evaluation (revisit & repeat prior steps)


Simple Model of Cost/Benefit (Time) Tradeoffs

$$\text{Programming} = \frac{b_0 + b_1 x + b_2 x^2}{\text{ease-factor}}$$

$$\text{Computation} = \frac{a_{0j} + a_1 x + a_2 x^2}{\text{speed}}$$

[Figure: three panels plotted against Complexity (0 to 10): User Programming (Time and Effort) Element; Computation Time Element; Differences in Costs (Total, Programming, Computation). Series: Package 1 (slow and easy) and Package 2 (fast and hard).]

Both programming and computing time depend on the complexity of the task, and the computing speed advantage of Package 2 may overwhelm the ease of use issue for modestly complex tasks.

Package 2 preferred for complexity level above 6


Simple Model of Cost/Benefit (Time) Tradeoffs

$$\text{Programming} = \frac{b_0 + b_1 x + b_2 x^2}{2 \times \text{ease-factor}}$$

$$\text{Computation} = \frac{a_{0j} + a_1 x + a_2 x^2}{10 \times \text{speed}}$$

[Figure: three panels plotted against Complexity (0 to 10): User Programming (Time and Effort) Element; Computation Time Element; Differences in Costs (Total, Programming, Computation). Series: Package 1 (slow and easy) and Package 2 (fast and hard).]

Increase in computing speed (relative to ease-factor) makes Package 1 a better choice for a larger range of tasks.

Threshold for preferring Package 2 rises
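The tradeoff model can be sketched in a few lines of Python. The slides give only the functional form, so every number below (the coefficients, the ease-factors, and the speeds) is a hypothetical value chosen for illustration, and the names are not from the original.

```python
# Hypothetical coefficients: the slides give the functional form only.
B = (5.0, 2.0, 0.0)  # programming effort: b0 + b1*x + b2*x^2
A = (0.0, 0.0, 0.3)  # computation time:   a0 + a1*x + a2*x^2

# Hypothetical packages, each described by (ease-factor, speed).
PACKAGE_1 = (2.0, 1.0)  # slow and easy
PACKAGE_2 = (1.0, 4.0)  # fast and hard


def quadratic(coef, x):
    """Evaluate c0 + c1*x + c2*x^2."""
    c0, c1, c2 = coef
    return c0 + c1 * x + c2 * x * x


def total_time(package, x, ease_scale=1.0, speed_scale=1.0):
    """Programming time plus computation time at complexity x."""
    ease, speed = package
    programming = quadratic(B, x) / (ease * ease_scale)
    computation = quadratic(A, x) / (speed * speed_scale)
    return programming + computation


def threshold(ease_scale=1.0, speed_scale=1.0, max_x=50.0, step=0.01):
    """Smallest complexity on the grid where Package 2 is cheaper in total,
    or None if Package 1 wins everywhere up to max_x."""
    for i in range(int(max_x / step) + 1):
        x = i * step
        cost_1 = total_time(PACKAGE_1, x, ease_scale, speed_scale)
        cost_2 = total_time(PACKAGE_2, x, ease_scale, speed_scale)
        if cost_2 < cost_1:
            return x
    return None


baseline = threshold()
faster_era = threshold(ease_scale=2.0, speed_scale=10.0)
print(f"baseline threshold:         {baseline:.2f}")
print(f"after 2x ease, 10x speed:   {faster_era:.2f}")
```

With these made-up numbers the threshold moves from roughly 6.2 to roughly 24.5: when speed improves faster than ease of use, the easy package stays the better choice over a wider range of task complexity.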


Black-Scholes Calculation Speeds

1 million cases   System A (Sun V440)   System B (Pentium 4 PC)
C Program         3.0 seconds           1.5 seconds
Fortran           4.1                   --
Matlab            2.4                   1.4
SAS               4.6                   6.7
R                 --                    1.9
EXCEL VBA         --                    560
Perl              39.6                  --

* SAS code -- Black-Scholes option value calculation;
* S = spot price, X = exercise price, sigma = stock return volatility,
  r = risk-free bond rate, q = dividend rate, tau = time till maturity;
d1 = ( log(S/X) + ( r - q + 0.5*sigma*sigma ) * tau ) / ( sigma*sqrt(tau) );
d2 = d1 - sigma * sqrt(tau);
* Normal curve cumulative distribution function values;
N1 = cdf('normal', d1);
N2 = cdf('normal', d2);
Vc = ( S * exp(-q*tau) * N1 ) - ( X * exp(-r*tau) * N2 );
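For readers who want to check the formula outside SAS, here is a minimal Python sketch of the same call-value calculation. It is not the code that was timed in the table; the function names and the sample inputs are illustrative assumptions, and the normal CDF is built from the standard library's error function.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def black_scholes_call(S, X, r, q, sigma, tau):
    """European call value with a continuous dividend rate q.

    S = spot price, X = exercise price, r = risk-free rate,
    q = dividend rate, sigma = return volatility, tau = time to maturity.
    """
    d1 = (math.log(S / X) + (r - q + 0.5 * sigma * sigma) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * math.exp(-q * tau) * norm_cdf(d1) - X * math.exp(-r * tau) * norm_cdf(d2)


# Illustrative inputs (not from the slides): at-the-money, one year to maturity.
value = black_scholes_call(S=100.0, X=100.0, r=0.05, q=0.0, sigma=0.2, tau=1.0)
print(f"call value: {value:.4f}")
```

Each line maps one-to-one onto the SAS statements above, which makes it easy to confirm that two implementations agree before comparing their run times.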


SAS vs. MATLAB Computation Speed Comparison

Basic Statistics Example

Simulated Data: 1 million observations, 10 variables, in 10 groups

                 SAS            MATLAB
Data creation    3.6            1.4
Mean & std       1.6            1.4
Frequency        0.3            0.3
REG module       0.8            2.2
Sort by group    8.4            2.4
REG by group     1.1            1.4

Sum              15.8 seconds   9.1 seconds

Bottom line:

• MATLAB is almost twice as fast in relative terms (it takes 42% less time in this example), but only 6.7 seconds faster in absolute terms.

• For most applications there are fewer than 1 million observations and the absolute difference is even smaller.
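The arithmetic behind the bottom line can be reproduced from the table. The sketch below only restates the slide's timings; the variable names are illustrative.

```python
# Timings in seconds copied from the table: (SAS, MATLAB) per step.
TIMINGS = {
    "Data creation": (3.6, 1.4),
    "Mean & std": (1.6, 1.4),
    "Frequency": (0.3, 0.3),
    "REG module": (0.8, 2.2),
    "Sort by group": (8.4, 2.4),
    "REG by group": (1.1, 1.4),
}

sas_total = sum(sas for sas, _ in TIMINGS.values())
matlab_total = sum(matlab for _, matlab in TIMINGS.values())

absolute_gap = sas_total - matlab_total   # seconds saved by MATLAB
relative_gap = absolute_gap / sas_total   # share of the SAS run time saved
speed_ratio = sas_total / matlab_total    # how many times faster MATLAB is

print(f"SAS total:    {sas_total:.1f} s")
print(f"MATLAB total: {matlab_total:.1f} s")
print(f"absolute gap: {absolute_gap:.1f} s")
print(f"relative gap: {relative_gap:.0%}")
print(f"speed ratio:  {speed_ratio:.2f}x")
```

Most of the gap comes from the sort step; on the two regression steps SAS is actually the faster package.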


SAS vs. MATLAB Computation Speed Comparison

Is MATLAB’s speed advantage due to its matrix based programming?

No. SAS also has an Interactive Matrix Language module (IML). Using SAS IML shows how alternative programming methods can matter (within the same package).

OLS Regression Example: 1 million observations, 10 variables

                       SAS IML   MATLAB
B = inv(X’X)*(X’y)     2.6       0.4
REG module             0.8       2.2

Programming the OLS matrix algebra equation in MATLAB beats MATLAB’s regress(.) function in terms of speed, while the opposite is true for SAS.
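To make the matrix formula concrete, here is a small Python sketch that forms X'X and X'y and solves the normal equations. It is neither the SAS IML nor the MATLAB code that was timed; it uses only the standard library so it runs anywhere, and the helper names and toy data are assumptions for illustration. It solves the linear system by elimination instead of forming an explicit inverse, which gives the same B for a well-conditioned problem.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) B = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy data (hypothetical): y = 1 + 2*x1 + 3*x2 exactly, first column is the constant.
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [1.0, 2.0, 1.0]]
y = [1.0, 3.0, 4.0, 6.0, 8.0]

coefficients = ols(X, y)
print([round(b, 6) for b in coefficients])
```

A packaged regression routine does more than this (standard errors, diagnostics, missing-value handling), which is one plausible reason a hand-coded formula and a built-in procedure can show different run times.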


Finance Research Example

CAPM (Beta) Test: $R_{i,t} = \alpha_i + \beta_i R_{m,t}$

500 Beta Calculations

System A: Sun V440 (multi-user UNIX)

System B: Pentium 4 Windows PC

SAS: 1.3 / 2.5 seconds (System A), 1.2 seconds (System B)

MATLAB: 1.0; ‘loop’ version: 17.3

Multi-user UNIX system run time varies depending on load.

MATLAB run time varies depending on program design: optimal vectorized code versus an inefficient loop.

A true CAPM test would estimate multi-factor betas (βi) for 5,000 to 25,000 stocks over different sample periods. Summarizations require sorting into portfolios and applying two-stage estimation and testing techniques.

Example: SAS run = 40 minutes // MATLAB = 35 minutes
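A single-factor beta is a one-regressor OLS fit, so each estimate reduces to a covariance divided by a variance. The Python sketch below runs that calculation once per stock; the return series and stock names are hypothetical, and it is not the SAS or MATLAB program that produced the timings above.

```python
from statistics import mean


def capm_alpha_beta(stock_returns, market_returns):
    """Estimate alpha_i and beta_i in R_i = alpha_i + beta_i * R_m by OLS."""
    rm_bar = mean(market_returns)
    ri_bar = mean(stock_returns)
    covariance = sum((rm - rm_bar) * (ri - ri_bar)
                     for rm, ri in zip(market_returns, stock_returns))
    variance = sum((rm - rm_bar) ** 2 for rm in market_returns)
    beta = covariance / variance
    alpha = ri_bar - beta * rm_bar
    return alpha, beta


# Hypothetical market returns and two stocks built with known alpha and beta.
market = [0.010, -0.020, 0.015, 0.030, -0.005]
stocks = {
    "STOCK_A": [0.002 + 1.5 * rm for rm in market],
    "STOCK_B": [-0.001 + 0.8 * rm for rm in market],
}

# One regression per stock: the 'loop' design mentioned on the slide.
estimates = {name: capm_alpha_beta(returns, market) for name, returns in stocks.items()}

for name, (alpha, beta) in estimates.items():
    print(f"{name}: alpha={alpha:.4f}  beta={beta:.4f}")
```

Because the market series is the same for every stock, its mean and variance could be computed once and reused; sharing that work across stocks is the kind of design choice that separates the vectorized and loop run times.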


Conclusions:

Changes in technology change the equation for determining the best system; personal preferences are important.

Absolute speed (not relative speed) may matter, but programming time is overwhelmingly the larger component anyway (in more than 90% of cases).

Software is not an either/or situation. Advice: Learn and use two or more software packages as complements.

Database management and connectivity are the key to the greatest possible flexibility.


Almost Counterintuitive General Conclusion:

Technological progress makes human factors and personal preferences most important.