Grid Computing at Texas Tech University using SAS
Ron Bremer, Jerry Perez, Phil Smith, Peter Westfall*
*Director, Center for Advanced Analytics and Business Intelligence, Texas Tech University
What is Grid Computing?
• Grid computing means using multiple resources connected over a network to perform demanding calculations.
• Example:
Economies of High Performance Computing
• Current fastest machine: ~40 Tflops ($300M)
• 10 Tflops machines: ~$50M
• Fastest cluster at TTU: 0.1 Tflops (~$0.1M)
• Speed of a PC: 0.003 Tflops (~$0.001M)
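The figures on this slide can be turned into a rough cost per Tflops. The sketch below uses only the approximate speeds and prices quoted above; the dictionary labels and the function name are illustrative, not from the talk.

```python
# Approximate (speed in Tflops, cost in $M) pairs quoted on the slide.
# The labels are illustrative names, not official machine names.
machines = {
    "Fastest machine": (40.0, 300.0),
    "10 Tflops machine": (10.0, 50.0),
    "Fastest TTU cluster": (0.1, 0.1),
    "Single PC": (0.003, 0.001),
}


def cost_per_tflops(speed_tflops, cost_musd):
    """Return the cost in $M for one Tflops of capacity."""
    return cost_musd / speed_tflops


costs = {name: cost_per_tflops(s, c) for name, (s, c) in machines.items()}

for name, value in costs.items():
    print(f"{name}: ~${value:.2f}M per Tflops")
```

On these rough numbers a single PC is the cheapest source of Tflops, which is the economic case for building a grid out of ordinary machines.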
Underused Resources
• Computers are everywhere, mostly idle!
• Grid computing leverages unused resources to create an effective “Supercomputer”
• Total Tflops = (N computers) x (Tflops per computer)
• For Free! (Almost)
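As a worked instance of the formula above, the sketch below plugs in two numbers that appear elsewhere in this talk: 0.003 Tflops per PC and the 100 machines of the student-lab SAS farm. The result is a theoretical ceiling that assumes every machine is fully idle and ignores communication overhead.

```python
def grid_tflops(n_computers, tflops_per_computer):
    """Upper bound on aggregate speed: N computers times Tflops per computer."""
    return n_computers * tflops_per_computer


PC_TFLOPS = 0.003    # speed of one PC, as quoted in the talk
LAB_MACHINES = 100   # size of the student-lab SAS farm, as quoted in the talk

peak = grid_tflops(LAB_MACHINES, PC_TFLOPS)
print(f"Theoretical peak of the lab grid: about {peak:.1f} Tflops")
```

For comparison, the fastest cluster at TTU is quoted at 0.1 Tflops.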
Grid Initiatives at TTU and in Texas
• HipCAT – High Performance Computing Across Texas
• TIGRE – Texas Internet Grid for Research and Education
• SORCER – Service ORiented Computing EnviRonment (TTU CS dept.)
• SAS/Connect grid
HipCAT
• Consortium of Texas institutions working together to use
– High performance computing
– Clusters
– Massive data storage
– Scientific visualization
– Grid computing
• Director: Phil Smith, Texas Tech University
• Members:
– Baylor College of Medicine
– Rice University
– Texas A&M University
– Texas Tech University
– University of Houston
– University of Texas
– University of Texas at Austin
– University of Texas at Arlington
– University of Texas at El Paso
– University of Texas Southwestern Medical Center
TIGRE
• Texas Internet Grid for Research & Education
• Two-year project involving UT, TTU, UH, Rice, and TAMU
• Funding announced by the Governor in September
• TIGRE will develop a grid software stack and policies and procedures to facilitate Texas grid computing efforts.
Grid Software Products Used at TTU
• AVAKI
• Globus
• Jini Networking Technology
• SAS/Connect (MPConnect), %Distribute macro
Benefits of SAS
• Ease of Use (relative to other grid products)
• Available and applicable for many scientists in their respective fields
• Flexibility
– Database (DATA step, PROC SQL)
– Math/Optimization (SAS/IML, SAS/OR)
– Statistics (SAS/STAT, SAS/ETS)
Problems Amenable to SAS Grid
• Replicates of a fundamental task
• Fundamental tasks are time-consuming, with many replicates
• Examples
– Simulation
– Astrophysics
– Bioinformatics
– Ensembles of predictive models
Success Story
• Financial Event Studies
– Developed simulation tool to detect events
– Simulated its performance
– 25 hours finished in 40 minutes
– Published in J. Fin. Econometrics
• Old system: “Sneaker grid”
Another Success Story: Portfolio Analysis
• 300 portfolios of 50 securities each, formed by randomly sampling securities from the CRSP daily database (7.23 gigabytes)
• 15 models created for each of the 50 securities (PROC AUTOREG of SAS/ETS), under 169 treatment settings.
• 126,750 models and associated data steps per portfolio.
• 500 days of continuous computing time reduced to two weeks.
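The counts and speedups quoted in the two success stories can be checked with a few lines of arithmetic. The sketch below uses only numbers from the slides; treating "two weeks" as 14 days and multiplying through to a total over all 300 portfolios are assumptions made here for illustration.

```python
# Model count per portfolio: 15 models x 50 securities x 169 treatment settings.
models_per_security = 15
securities_per_portfolio = 50
treatment_settings = 169
models_per_portfolio = (
    models_per_security * securities_per_portfolio * treatment_settings
)

# Assumption: every one of the 300 portfolios gets the full set of models.
portfolios = 300
total_models = models_per_portfolio * portfolios

# Speedups implied by the quoted run times.
# Assumption: "two weeks" is taken as 14 days.
portfolio_speedup = 500 / 14
event_study_speedup = (25 * 60) / 40  # 25 hours vs. 40 minutes

print(f"Models per portfolio: {models_per_portfolio:,}")
print(f"Models over all portfolios: {total_models:,}")
print(f"Portfolio analysis speedup: roughly {portfolio_speedup:.0f}x")
print(f"Event study speedup: {event_study_speedup:.1f}x")
```

Both speedups come out in the mid-thirties, well below the 100 machines in the lab, which is consistent with a grid whose nodes are shared with other users and subject to communication overhead.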
Notoriety
• Web articles appeared in SAS, Grid Today, and Next-Gen Data Forum
• Interviewed by Database Trends and Applications
SAS Grid Structure
• Client connects to host machines
• Client sends replicates of fundamental task (“chunks”) to hosts
• Hosts process chunks, send back to client
• Client combines chunks and summarizes
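The four steps above can be sketched in a few lines. This is not the SAS/Connect implementation: local threads stand in for remote hosts, the "fundamental task" is a placeholder (averaging uniform random numbers), and all function and parameter names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_chunk(seed, n_reps):
    """Host side: run n_reps replicates of a placeholder task, return a partial summary."""
    rng = random.Random(seed)  # one independent stream per chunk
    total = sum(rng.random() for _ in range(n_reps))
    return n_reps, total


def run_on_grid(n_chunks, reps_per_chunk, n_hosts):
    """Client side: send chunks to hosts, collect results, combine and summarize."""
    # Threads stand in for remote host machines in this sketch.
    with ThreadPoolExecutor(max_workers=n_hosts) as hosts:
        futures = [
            hosts.submit(run_chunk, seed, reps_per_chunk)
            for seed in range(n_chunks)
        ]
        results = [f.result() for f in futures]
    count = sum(n for n, _ in results)
    total = sum(t for _, t in results)
    return count, total / count


count, mean = run_on_grid(n_chunks=20, reps_per_chunk=5000, n_hosts=4)
print(f"{count} replicates, overall mean {mean:.4f}")
```

Each chunk returns only a small summary (a count and a sum), so the client's combine step is cheap regardless of how many replicates each host ran.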
The SAS Grid
SAS Farm
• 100 SAS machines in student lab
• 2.66 GHz per node
• All have SAS software installed
• SAS “Spawner” must be started on all machines
• AVAKI also installed; it helps diagnose problems
Student Lab
Load Balancing
• Load balancing is automatic: independent tasks are farmed out to the next available resource.
• Students never noticed that their machines were being used!
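"Next available resource" scheduling can be illustrated with a shared task queue: each host pulls a new task as soon as it finishes its previous one, so faster or less busy hosts naturally take on more work. The sketch below is a generic illustration under that assumption, not the SAS mechanism; the host names and delays are made up.

```python
import queue
import threading
import time


def worker(name, delay, tasks, done, lock):
    """Pull tasks until the queue is empty; delay simulates this host's speed."""
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return
        time.sleep(delay)  # simulated work on one independent task
        with lock:
            done.setdefault(name, []).append(task)


def farm_out(n_tasks, host_delays):
    """Distribute n_tasks over hosts; returns {host name: [task ids it processed]}."""
    tasks = queue.Queue()
    for t in range(n_tasks):
        tasks.put(t)
    done = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(name, delay, tasks, done, lock))
        for name, delay in host_delays.items()
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return done


# Hypothetical hosts: one idle (fast) and one busy with a student's work (slow).
done = farm_out(30, {"idle_host": 0.001, "busy_host": 0.01})
for name, ids in sorted(done.items()):
    print(f"{name} processed {len(ids)} tasks")
```

No task is assigned ahead of time, so a machine that becomes busy simply picks up fewer tasks instead of holding up the run.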
Simulation-Based Methods
PROC MULTTEST of SAS/STAT (first hard-coded bootstrap?)
Simulation-Based Methods, II
• ADJUST=SIMULATE in PROC GLM and PROC MIXED
• Posterior simulation in MIXED
Toy Example – Testing Random Number Generators
• Random number generators often fail to provide independent numbers.
• Test case: U1, U2 are Uniform on (0,1).
• If independent, then E{6(U1 - U2)^2} = 1.00.
• Check: generate many pairs and report the average of 6(U1 - U2)^2 (should be 1.000000)
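The target value follows from Var(U) = 1/12 for a Uniform(0,1) variable: for independent U1 and U2, E{(U1 - U2)^2} = Var(U1) + Var(U2) = 1/6, so six times that is 1. The sketch below runs the same check in Python rather than SAS, so it exercises Python's Mersenne Twister generator and not the SAS generators tested in the talk; the function name, sample size, and seed are arbitrary choices.

```python
import random


def mean_scaled_sq_diff(n_pairs, seed):
    """Average of 6*(U1 - U2)^2 over n_pairs pairs of uniform draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        u1 = rng.random()
        u2 = rng.random()
        total += 6.0 * (u1 - u2) ** 2
    return total / n_pairs


# Arbitrary sample size and seed; the average should land near 1.
estimate = mean_scaled_sq_diff(200_000, seed=12345)
print(f"Average of 6(U1 - U2)^2: {estimate:.6f}")
```

The simulation error shrinks only as 1/sqrt(number of pairs), so confirming many decimal places takes a very large number of pairs, which is what makes this toy problem a natural candidate for splitting into chunks across a grid.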
Code
toy_example.htm
Results
Dist_log.htm Dist_lis.htm
Startup (Windows)
1. Start Spawner:
C:\Program Files\SAS\SAS 9.1>spawner -i -comamid tcp
2. Activate Spawner:
3. Set batch log-in permissions:
The %Distribute Macro
• Written by Cheryl Doninger and Randy Tobias
• File: http://support.sas.com/rnd/scalability/papers/distribute.zip
• Supporting document: http://support.sas.com/rnd/scalability/papers/distConnect0401.pdf
Problems We Have Experienced
• Random crashes (client as well as hosts)
• Diagnosing errors
• I/O problems
• Windows Service Pack 2 Firewall
• Social issues (grid involves people!)
Future Plans
• Support from business and government:
– grid-enabled bioinformatics
– business intelligence/data mining
• Support HPC at TTU and in Texas