Grid Computing at Texas Tech University using SAS Ron Bremer Jerry Perez Phil Smith Peter Westfall* Director, Center for Advanced Analytics and Business Intelligence Texas Tech University

Grid Computing at Texas Tech University using SAS

Ron Bremer, Jerry Perez, Phil Smith

Peter Westfall*

*Director, Center for Advanced Analytics and Business Intelligence, Texas Tech University

What is Grid Computing?

• Grid computing means using multiple computing resources, connected over a network, to perform demanding calculations.

• Example:

Economies of High Performance Computing

• Current fastest machine: ~40 Tflops ($300M)

• 10 Tflops machines: ~$50M

• Fastest cluster at TTU: 0.1 Tflops (~$0.1M)

• Speed of a PC: 0.003 Tflops (~$0.001M)

Underused Resources

• Computers are everywhere, mostly idle!

• Grid computing leverages unused resources to create an effective “Supercomputer”

• Teraflops = (N computers) x (Tflops per computer)

• For Free! (Almost)
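The formula above can be made concrete with a back-of-the-envelope calculation. The sketch below uses the speed and cost figures from the "Economies of High Performance Computing" slide and assumes a pool of 100 PCs (the size of the student lab described later in the talk). It is an idealized upper bound: it ignores communication overhead and the fact that lab machines are only partly idle. The function names are illustrative.

```python
# Idealized aggregate throughput: Tflops = (N computers) x (Tflops per computer).
PC_TFLOPS = 0.003      # speed of one PC, from the slides
PC_COST_MUSD = 0.001   # cost of one PC in $M, from the slides


def aggregate_tflops(n_computers, tflops_per_computer=PC_TFLOPS):
    """Combined speed of n identical machines, assuming no overhead."""
    return n_computers * tflops_per_computer


def cost_per_tflop(cost_musd, tflops):
    """Cost in $M per Tflop of capacity."""
    return cost_musd / tflops


n = 100  # assumption: one PC per seat in a 100-machine lab
grid_speed = aggregate_tflops(n)
print(f"{n} PCs: about {grid_speed:.2f} Tflops")

# Cost per Tflop for each machine class quoted in the slides.
machines = {
    "fastest machine": (300.0, 40.0),
    "10 Tflops machine": (50.0, 10.0),
    "fastest TTU cluster": (0.1, 0.1),
    "single PC": (PC_COST_MUSD, PC_TFLOPS),
}
for name, (cost, speed) in machines.items():
    print(f"{name}: ${cost_per_tflop(cost, speed):.2f}M per Tflop")
```

Under these assumptions, 100 idle PCs would nominally exceed the speed of the fastest TTU cluster, and the hardware is already paid for.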

Grid Initiatives at TTU and in Texas

• HipCAT – High Performance Computing Across Texas

• TIGRE – Texas Internet Grid for Research and Education

• SORCER – Service ORiented Computing EnviRonment (TTU CS dept.)

• SAS/Connect grid

HipCAT

• Consortium of Texas institutions working together to use

– High performance computing

– Clusters

– Massive data storage

– Scientific visualization

– Grid computing

• Director: Phil Smith, Texas Tech University

• Members:

– Baylor College of Medicine

– Rice University

– Texas A&M University

– Texas Tech University

– University of Houston

– University of Texas

– University of Texas at Austin

– University of Texas at Arlington

– University of Texas at El Paso

– University of Texas Southwestern Medical Center

TIGRE

• Texas Internet Grid for Research & Education

• Two-year project involving UT, TTU, UH, Rice, and TAMU

• Funding announced by the Governor in September

• TIGRE will develop a grid software stack and policies and procedures to facilitate Texas grid computing efforts.

Grid Software Products Used at TTU

• AVAKI

• Globus

• Jini Networking Technology

• SAS/Connect (MPConnect), %Distribute macro

Benefits of SAS

• Ease of Use (relative to other grid products)

• Available and applicable for many scientists in their respective fields

• Flexibility

– Database (DATA step, PROC SQL)

– Math/Optimization (SAS/IML, SAS/OR)

– Statistics (SAS/STAT, SAS/ETS)

Problems Amenable to SAS Grid

• The job consists of replicates of a fundamental task

• The fundamental task is time-consuming, and there are many replicates

• Examples

– Simulation

– Astrophysics

– Bioinformatics

– Ensembles of predictive models

Success Story

• Financial Event Studies

– Developed simulation tool to detect events

– Simulated its performance

– 25 hours finished in 40 minutes

– Published in J. Fin. Econometrics

• Old system: “Sneaker grid”

Another Success Story: Portfolio Analysis

• 300 portfolios of 50 securities each, built by randomly sampling securities from the CRSP daily database (7.23 gigabytes)

• 15 models created for each of 50 securities (PROC AUTOREG of SAS/ETS), under 169 treatment settings.

• 126,750 models and associated data steps per portfolio.

• 500 days of continuous computing time reduced to two weeks.
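The figures on this slide can be checked with simple arithmetic. The sketch below reproduces the per-portfolio model count and estimates the speedups reported in both success stories. It assumes "two weeks" means 14 days of wall-clock time. The 300-portfolio total is implied by the slide rather than stated on it.

```python
# Model count per portfolio: models x securities x treatment settings.
models_per_security = 15
securities_per_portfolio = 50
treatment_settings = 169
models_per_portfolio = (
    models_per_security * securities_per_portfolio * treatment_settings
)
print(f"Models per portfolio: {models_per_portfolio:,}")

# Implied total across all portfolios (not stated on the slide).
portfolios = 300
total_models = models_per_portfolio * portfolios
print(f"Implied total models: {total_models:,}")


def speedup(hours_before, hours_after):
    """Ratio of serial run time to grid run time."""
    return hours_before / hours_after


# Portfolio analysis: 500 days reduced to two weeks (assumed to be 14 days).
portfolio_speedup = speedup(500 * 24, 14 * 24)
# Event study: 25 hours reduced to 40 minutes.
event_study_speedup = speedup(25, 40 / 60)
print(f"Portfolio analysis speedup: about {portfolio_speedup:.1f}x")
print(f"Event study speedup: about {event_study_speedup:.1f}x")
```

Both jobs show a speedup in the mid-30s, which is plausible for a grid of some tens of concurrently available machines.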

Notoriety

• Web articles appeared at SAS, Grid Today, and the Next-Gen Data Forum

• Interviewed by Database Trends and Applications

SAS Grid Structure

• Client connects to host machines

• Client sends replicates of fundamental task (“chunks”) to hosts

• Hosts process chunks and send the results back to the client

• Client combines chunks and summarizes
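The four steps above can be sketched in a few lines. This is not SAS/CONNECT code and not the %Distribute macro: it is a language-neutral illustration in Python in which threads stand in for host machines, and `fundamental_task` is a hypothetical placeholder for whatever a single replicate computes.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fundamental_task(seed):
    """One replicate (hypothetical stand-in): mean of 1,000 uniform draws."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(1000)) / 1000


def process_chunk(seeds):
    """What a host does: run every replicate in its chunk, return the results."""
    return [fundamental_task(s) for s in seeds]


def make_chunks(items, chunk_size):
    """Client side: split the list of replicates into chunks."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def run_on_grid(n_replicates, chunk_size, n_hosts):
    """Client side: send chunks to hosts, then combine what comes back."""
    chunks = make_chunks(list(range(n_replicates)), chunk_size)
    with ThreadPoolExecutor(max_workers=n_hosts) as hosts:
        # map() returns chunk results in the order the chunks were submitted.
        chunk_results = list(hosts.map(process_chunk, chunks))
    return [value for chunk in chunk_results for value in chunk]


results = run_on_grid(n_replicates=200, chunk_size=25, n_hosts=4)
summary = sum(results) / len(results)
print(f"{len(results)} replicates, overall mean {summary:.3f}")
```

Seeding each replicate separately makes the combined result independent of which host ran which chunk. A real grid job needs the same property if its results are to be reproducible.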

The SAS Grid

SAS Farm

• 100 SAS machines in student lab

• 2.66 GHz per node

• All have SAS software installed

• SAS “Spawner” must be started on all

• AVAKI also installed; used to diagnose problems

Student Lab

Load Balancing

• Automatically supports load balancing by farming out independent tasks to the next available resource.

• Students never noticed that their machines were being used!
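"Farming out independent tasks to the next available resource" amounts to a shared work queue: each host takes a new task as soon as it finishes the previous one, so faster or less busy machines naturally do more of the work. The sketch below illustrates that idea in Python with threads standing in for hosts. It is an assumption about the general technique, not a description of how SAS/CONNECT or %Distribute implements it, and `run_balanced` is a made-up name.

```python
import queue
import threading


def run_balanced(tasks, n_hosts, work):
    """Run work(task) for every task, letting idle hosts pull the next one."""
    todo = queue.Queue()
    for task_id, task in enumerate(tasks):
        todo.put((task_id, task))

    results = {}      # task_id -> result
    handled_by = {}   # task_id -> host that processed it
    lock = threading.Lock()

    def host(host_id):
        while True:
            try:
                task_id, task = todo.get_nowait()
            except queue.Empty:
                return  # no work left for this host
            value = work(task)
            with lock:
                results[task_id] = value
                handled_by[task_id] = host_id

    threads = [threading.Thread(target=host, args=(h,)) for h in range(n_hosts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Return results in task order, regardless of completion order.
    return [results[i] for i in range(len(tasks))], handled_by


squares, handled_by = run_balanced(list(range(20)), n_hosts=3, work=lambda x: x * x)
print(squares)
```

The assignment of tasks to hosts varies from run to run, but the combined result does not, because results are keyed by task rather than by host.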

Simulation-Based Methods

PROC MULTTEST of SAS/STAT (first hard-coded bootstrap?)

Simulation-Based Methods, II

• ADJUST=SIMULATE in PROC GLM and PROC MIXED

• Posterior simulation in MIXED

Toy Example – Testing Random Number Generators

• Random number generators often fail to provide independent numbers.

• Test case: U1, U2 are Uniform on (0,1).

• If independent, then E{6(U1-U2)^2} = 1.00.

• Check: Generate many pairs, report average (should be 1.000000)
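The expected value follows from Var(U) = 1/12 for a Uniform(0,1) variable: if U1 and U2 are independent with equal means, E{(U1-U2)^2} = Var(U1) + Var(U2) = 1/6, so multiplying by 6 gives 1. The check itself can be sketched as below. This uses Python's standard `random` module (a Mersenne Twister generator) purely for illustration. It is not the SAS code from the talk, and the function name and seed are arbitrary.

```python
import random


def check_generator(n_pairs, seed=12345):
    """Average of 6*(U1-U2)^2 over n_pairs pairs; should be close to 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        u1 = rng.random()
        u2 = rng.random()
        total += 6.0 * (u1 - u2) ** 2
    return total / n_pairs


average = check_generator(200_000)
print(f"Average of 6*(U1-U2)^2: {average:.6f}")
```

Each pair is independent of the others, so the loop splits naturally into chunks that can run on separate hosts, provided each chunk uses its own random number stream.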

Code

toy_example.htm

Results

Dist_log.htm Dist_lis.htm

Startup (Windows)

1. Start Spawner:

C:\Program Files\SAS\SAS 9.1>spawner -i -comamid tcp

2. Activate Spawner:

3. Set batch log in permissions:

The %Distribute Macro

• Written by Cheryl Doninger and Randy Tobias

• File: http://support.sas.com/rnd/scalability/papers/distribute.zip

• Supporting document: http://support.sas.com/rnd/scalability/papers/distConnect0401.pdf

Problems We Have Experienced

• Random crashes (client as well as hosts)

• Diagnosing errors

• I/O problems

• Windows Service Pack 2 Firewall

• Social issues (grid involves people!)

Future Plans

• Support from business and government:

– grid-enabled bioinformatics

– business intelligence/data mining

• Support HPC at TTU and in Texas