Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
1
UHD Grid Computing TeamDepartment of Computer and Mathematical SciencesUniversity of Houston-Downtown
2
Outline
Grid LaboratoryArchitectureInterfaceClustersApplication Projects
3
Historical Review
An Integrated Lab Package for Introductory Computer Science Course – CSIThe Design of Phil2000
Java ApplicationHierarchical Design
Labs -> Tasks -> ActivitiesExplorerPlug-ins to run applications
4
Grid Infrastructure for Laboratory
an interface that is extensible to incorporate more lab modules and customizable to different course structures
Solution: a lab explorer an computational backbone that provides services for various lab activities
Solution: an array of servers that run on a computational grid
5
The pyramid model of the project
Interface of functional units
Client/server model
Lab modules
Module language
Architecture specification
Design theme
Application theme
Multi-agent system
6
Labs to implementTopology: Circuiting messages in a ringCollective communications: Matrix transposeGroup management: Matrix multiplication with Fox’s algorithmScientific computation: Solving linear systems with Jacobi’s algorithmCombinatorial search: Traveling salesman problemParallel I/O: Vector processing - SummationPerformance analysis: Visualization with Upshot – Trapezoidal rule problemParallel library: Solving linear system with ScaLapackScalability analysis: Bitonic sortingLAN configuration: The use of NICs and hubsNetwork analysis: Monitoring a chat roomAddress resolution: Experiment with ARP burstIP masquerading: Clustered web serversWAN configuration: The use of routersPerformance tuning: Deal with congestionService configuration: The configuration of a networked file system:
7
The Main Menu Frame
End
ScreensCards
Tree Panel Upper Toolbar
Lower Toolbar
Save
Open
Next
Previous
Last
Help First
The Lab Layout
8
Frame title & icon Screen
menu
Tree
Upper ToolBar
Screen panel
ScrollPane LowerToolBar
Button The Main Menu
9
Open and Close Menus
Open or Save lab Massage
The file *.mla
10
Print layoutframe
Print Dialog
Data Table
Print Table
Print Dialog
Scroll Pane
11
Help Menu
Help Frame
12
Other Software used
Visual C++
MPICH
JumpShot
13
Node Calling and Services Menu
Services menu
Scroll pane
RadioButton
Text areaText
field
Scroll Menu
Node CallingMenu
14
An Example
Performance evaluation of parallel programs
15
Performance Analysis - Activities
a. Performance Prediction b. Compilation and execution c. Profiling
16
Khoi Nguyeno Construct small cluster/NOW.o CMS lab cluster (16 nodes)o Linux Beowulf Cluster functionalo Successfully run all sorting algorithms
with expected results.o Create a GUI interface as an overlay
using JAVA.
http://www.netlib.org/utk/papers/mpi-book/node2.html
17
16 Nodes RedHat 9.0 as Operating SystemConfigure rsh in root and user directoriesInstall MPICH ver. 1.2.6 (Unix- all flavors)Test with MPICH test programs & sorting program
18
The Cluster
19
Test results – Quick sortQ u i c k s o r t f o r 1 0 , 0 0 0 e l e me n t s
0
0 . 0 1
0 . 0 2
0 . 0 3
0 . 0 4
0 . 0 5
P r o c e s s o r s
0 . 0 4 2 0 . 0 1 4 5 0 . 0 1 2 9 0 . 0 1 5 8 0 . 0 1 9
1 p 2 p 4 p 8 p 1 6 p
Q u i c k s o r t f o r 1 0 0 , 0 0 0 e l e me n t s
0
0 . 0 5
0 . 1
0 . 1 5
0 . 2
0 . 2 5
P r o c e s s o r s
0 . 1 6 2 0 . 1 8 1 0 . 1 9 8 0 . 1 8 9 0 . 2 2 4
1 p 2 p 4 p 8 p 1 6 p
Q u i c k s o r t f o r 1 , 0 0 0 , 0 0 0 e l e me n t s
0
0 . 5
1
1 . 5
2
P r o c e s s o r s
1 . 4 9 1 . 6 2 1 . 5 1 1 . 3 4 1 . 3 4
1 p 2 p 4 p 8 p 1 6 p
Q u i c k s o r t f o r 1 0 , 0 0 0 , 0 0 0 e l e me n t s
0
5
1 0
1 5
2 0
P r o c e s s o r s
1 6 . 0 8 5 1 4 . 6 8 1 4 . 5 8 1 4 . 1 5 1 2 . 4 9
1 p 2 p 4 p 8 p 1 6 p
20
Test results – Merge sortM e r g e So r t - 1 0 , 0 0 0 e l e me n t s
0
0 . 0 1
0 . 0 2
0 . 0 3
0 . 0 4
0 . 0 5
0 . 0 6
P r o c e s s o r s
0 . 0 4 9 1 0 . 0 4 3 4 0 . 0 3 0 2 0 . 0 3 1 0 . 0 2 9
1 p 2 p 4 p 8 p 1 6 p
M e r g e So r t - 1 0 0 , 0 0 0 e l e me n t s
0
0 . 0 5
0 . 1
0 . 1 5
0 . 2
0 . 2 5
0 . 3
P r o c e s s o r s
0 . 2 4 8 0 . 2 2 3 0 . 1 7 9 0 . 1 4 7 0 . 1 5 5
1 p 2 p 4 p 8 p 1 6 p
M e r g e s o r t - 1 0 , 0 0 0 , 0 0 0 e l e me n t s
0
1 0
2 0
3 0
4 0
P r o c e s s o r s
3 1 . 7 5 2 5 . 7 2 1 7 . 8 9 1 3 . 8 5 1 2 . 1 4
1 p 2 p 4 p 8 p 1 6 p
M e r g e s o r t P e r f o r ma n c e I n c r e a s e f o r 1 , 0 0 0 , 0 0 0 e l e me n t s
0
0 . 2
0 . 4
0 . 6
0 . 8
1
1 . 2
1 . 4
P r o c e s s o r s
1 6 . 5 9 % 5 1 . 7 0 % 9 9 . 2 5 % 1 1 8 . 8 5 %
1 p 2 . 6 7 2 p 4 p 8 p 1 6 p
21
GUI will be basic window appletAction buttons
IntroductionRun your MPI programRun Demo programs
Sorting ProgramsDistribution sample programs
HelpOutput Window
22
Finished GUI - Introduction
23
Open your MPI file!
24
Compile & Build your MPI Program!
25
Running your MPI program!
26
Program Running in Console…
27
ABSTRACTThe construction and performance of computer clusters
running different operating systems is studied. A platforms Windows XP cluster and a Linux ‘Beowulf’ cluster needed to be constructed to conduct a time-based analysis. Details on construction, configuration, and performance between the clusters are discussed.
INTRODUCTIONThe typical Von-Neumann architecture has directed us to
increase processing power via increased transistors, addressing space, and physical memory. However, a more efficient way is through message-passing between multiple processors. The concept of message-passing is to achieve parallelism through a function that explicitly transmits data from one process to another. Message Passing Interface (MPI) is simply a “library” of functions that can be called from C/C++ and FORTRAN programs. MPI programs make use of multiple processors by assigning each processor a task. Each processor works in parallel with anotherprocessor where one sends a packet of data and one receives.
MPI programs are designed to operate most efficiently on multiple processors. They are used widely on Scalable Parallel Computers (SPCs) and Networks of Workstations (NOWs). A ‘cluster’ is simply a collection computers (2 or more) working in parallel to accomplish a given task. Here, two different clusters were constructed to run different sorting algorithms and sample MPI programs. For convenience, a Java applet was also developed to launch these programs. Subsequently, cluster construction and performance results are discussed.
Khoi Nguyen, Computer & Mathematical Sciences, University of Houston – DowntownAdvisor: Dr. Hong Lin
Fall 2004CLUSTER CONSTRUCTION
XP Cluster
2 nodes: AMD Athlon 1.33GHz and Pentium III 850MHz w/ 512MB system memory were linked via a Router/Switch (see Figure 2).
Router/ Switch
Linux Beowulf Cluster
16 nodes: (15) Pentium II 350Mhz w/ 128MB system memory and (1) server node: Pentium 550Mhz w/ 256MB system memory. All nodes were linked via 10/100Mbps Ethernet LAN switch. A KVM switch was installed for only 4 nodes (See Figure 3).
SORTING ALGORITHMSParallel implementations of Merge-sort (O(log2n)) and
Bitonic-sort (O((log2n))2/2) were used to conduct the time-based analysis. Serial implementations were also incorporated to serve as control variables.
XP CLUSTER CONFIGURATIONThe configuration for this cluster required an older protocol – NETBeui, but
the more widely used protocol today is TCP/IP. The MPICH installation was mirrored on each node, and user information and passwords must be identical, and the executable program file must be in the same location on each node. Either node could function as the server at the user’s discretion; moreover, whatever node launched the program, becomes the server. MPICH ran processes in a ‘round-robin’fashion.
BEOWULF CLUSTER CONFIGURATIONThis architecture requires the installation of a Linux distribution on each node.
One node functions as a server where the user interacts directly. The rest of the nodes serve as computational slaves (see Figure 3). Fedora (latest Red Hat) was installed on each node. Remote Shell or ‘rsh’ was used for communication between server and nodes. Each node was configured the same way – differing in IP and hostnames. The latest MPICH distribution for UNIX was installed to each node to the same directory. Sample programs included in the distribution were tested on 8 nodes.
Figure 3Figure 2
Figure 1
Workstations
LAN Switch
KVM Switch
0.001.142.283.424.565.706.847.989.12
10.2611.4012.5413.6814.8215.9617.1018.2419.3820.5221.6622.8023.9425.0826.2227.3628.5029.6430.7831.9233.0634.2035.3436.48
10000 100000 1000000 10000000
ELEMENTS
Tim
e (s
)
Serial
Parallel
XP Cluster – Mergesort – 2 Processors
0.942
0.65
0.468
0.3670.315
0.356 0.354
0.257 0.229
0.307
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1p 2p 3p 4p 5p 6p 7p 8p 9p 10p
PROCESSORS
TIM
E (s
)
CPI Test Program - Beowulf
TO BE CONTINUED…Inconclusive and inconsistent data drawn from the
XP cluster has led me to choose the Beowulf design. MPI programs are competing for resources under XP, so priority scheduling is required for running MPI programs efficiently under XP. This is a rather cumbersome and inconvenient process; moreover the Beowulf cluster offers more flexibility in configuration. The XP cluster has been abandoned and continued research is currently being conducted on the Beowulf cluster.
Server
REFERENCES
Pacheco, Peter. Pacheco, Peter. Parallel Programming with MPI.Parallel Programming with MPI.Morgan Kaufmann, 1997.Morgan Kaufmann, 1997.
Parallel Programming with MPI. Parallel Programming with MPI. Pacheco, Peter. Pacheco, Peter. 2002. <http://2002. <http://www.cs.usfca.edu/mpiwww.cs.usfca.edu/mpi/>/>
28
External Support
NSF Major Research Instrumentation Grant2 more clusters, each with 16 nodesMaster server node to distribute applicationsExternal storage
29
2nd & 3rd Cluster Configuration
Danil Safin and Hooman HemmatiSharing Internet connection using FirestarterNetwork configuration and booting on LANEnable remote/secure shells and allow passwordless loginSharing files with Network File SystemInstall and configure MPICH2
30
Real-Time Intelligent Agent Traffic Controllers
OutlineProblemGoals Traffic modelSimulation test bedPreliminary results
31
ProblemCoordinate a network of traffic lights in an inner city streets such as downtown can be very challengingThe fact that traffic configurations change constantly One might have to wait for the light to turn green at an intersection where there is no car on the cross street How long to keep the light green at a busy street?Timed traffic lightsSensors
32
Goals
Develop a traffic model that can manage a network of traffic lights efficiently by maximize the traffic flows and minimize the time of traffic flowsConstruct a visual simulation test bed to verify the our model and perform comparisons to other models
33
Real-Time Intelligent Agent Traffic Controller Model
Consists of two types of agents
Communication Agents --send and receive traffic information from and to its neighbors
Computation Agents --make decision to keep or change the current traffic light based on the information from Communication Agent
34
Communication Agents
Send and receive traffic informationNumber of cars passing at the intersectionNumber of cars waiting at the intersectionTime of light has stay greenWaiting time for a red light to change to greenAverage speed of current traffic
35
Computation Agents
Establish rules from traffic informationConvert to fuzzy rulesConstruct fuzzy controller
36
Simulation Test Bed
Implement clusters where each node serves as a traffic light at an intersectionRules for moving cars
37
SimulationsCity of 10 horizontal and 10 vertical streets, each with 3 lanes, containing a total of 100 traffic lights. Tests were run with maximum cars being 1500 (heavy traffic), 1000 (moderate) and 500 (light traffic). The cars entered the city map randomly, with no preference for any street or direction. Two Traffic models were compared:
the timed traffic lights with a checkerboard pattern (no adjacent intersections have the same light), Simple traffic controllers. Each controller based its decision on
the amounts of cars on the two sides of the intersection, the maximum waiting time that any car can be made to wait at a red light.
The data gathered was the mean speed of a single car and the average of all cars' mean speed.
38
Simulations
Intelligent Agent Traffic ControllersStandard Traffic Controllers
39
Preliminary Results
Two models:Standard traffic lightsSimple intelligent agent controlled traffic lights
Measured average speed of all cars IA traffic lights are 20% to 90% more efficient than standard lights
40
Intelligent Traffic Control by Agents
41
Data Mining
42
Data Mining Results
Gabriel Williams: Text mining on clustersDanil Safin: HTML document preprocessing
Datamining Runtime
0
100
200
300
1 8 16
# of Nodes
Runt
ime
Datamining Speedup
0
1
2
3
4
1 8 16
# of Nodes
Spee
dup
43
E-Learning Agent System Design
System arranged in a hierarchical tree structureTwo entry points into the system
Command lineNetwork socket – to service web requests
44
E-Learning Agent System Design (continued)
Maintenance Agent
Control Agent
Student Information Agent
Instructor
Notification and Recommendation Agent
Control Agent Control Agent
Master Control Agent
Link DB
Student Registrar
Student DB
45
The Agents Developed For This Project
Master Control AgentMain entry point into the system
Control AgentMaintenance Agent
InstructorNotification and Recommendation Agent
StudentStudent Information Agent
Registrar
46
Implementing MPI into the Agent System
MPI provides communication mechanism for the agentsPacks data into MPI data structureUses MPI_Send and MPI_RecvCommon for all agentsMPI Limitations
Process creation limitationEliminates 1:Many relationship between control and information agents
47
E-Learning System Agent Code Architecture
Data StructuresJOBINPUT JOB
struct JOB{int cmdSource;int cmdDestination;int cmdCode;int cmdResult;JOB_SOURCE ioSource;Socket* jobSocket;int messageLength;char messageInfo[MAX_STRING_SIZE];
};
struct INPUTJOB{int jobType;int cmdCode;Socket* jobSocket;char commandText[MAX_STRING_SIZE];
};
48
E-Learning System Agent Code Architecture (continued)
Class baseAgentContains MPI communication and process creation mechanismsAll agent classes inherit from this class
Agent classesDemos for each agent class represented
WEBAGENT.EXECGI program used to communicate with the Agent system from a web server
49
Client/Server Web Interface
The ultimate goal of this project is to formulate a formal system for creating multi-agent systems (MAS) so that one no longer has to rely on the use of a high level specification language. This will be accomplished by creating a gamma calculus parser and running the parser on a prototype to formulate a method for a formal system of creating multi-agent systems. As it stands, a prototype E-Learning MAS has been created and a preliminary Beliefs-Desires-Intentions (BDI) model, using argumentation based negotiation, has been created.
Multi-Agent Course-Scheduling System
Methods of implementation are as follows:1.First it was imperative to create a model MAS to run the calculus parser on. The chosen model was an E-Learning Environment MAS. This model was built using four main agents to distribute tasks; Master-Control Agent, Student Agent, Instructor Agent, Registrar Agent. These agents will handle registration and enrollment of students and the managing of course content.2.Second is to create advanced logic to run with these agents. The logic chosen was argumentation based negotiation. Using this with a BDI model, the agents would argue among themselves to achieve their particular goals. What is important about this model is each agent will argue for its beliefs and if other agents are coerced, they will create a compromise among themselves.3.The third point is to create a gamma-calculus parser to run on the MAS created. This will allow data to be collected and interpreted to formulate a method for the development of a formal system of creating multi-agent systems.
Methods
Successful completion of the multi-agent system prototype has been accomplished. Using MPI, the MAS divides tasks and sends the task to be accomplished by the appropriate agent. This system runs concurrently with a server/client socket structure. The Master-Control agent handles information received from the server socket which waits for a client on the same machine to communicate. This client gathers its information through use of Apache and Java Server pages. Thus the client of this system is a simple web page in which a user enters data to be used by the MAS.
The argumentation based negotiation logic with the BDI model has been preliminarily created. The rough prototype successfully integrates desires in which the registrar argues beliefs of classes that cannot be taught and classes that must be taught. When the system begins, the instructor argues what class it wants and the registrar responds, arguing what changes it may need to make. This is similar to what goes on with the student and registrar, in which the student has a list of desired classes and must argue to allow the registrar to accept its proposal. Current work is to make it more complex and add a visual aid to the program. All of this was done using Jason with agent-speak.
Results
55
Virus particle reconstruction from cryo-electron microscopy• Step 1: extract individual particle images from cryo-
electron micrographs or CCD images.• Step 2: determine orientations • Step 3: 3-D reconstruction. Execute Steps 2 and 3
repeatedly until convergence.• Step 4: dock atomic model into 3D density map
Z
X
Y
refinement
Step 1 Step 2 Step 3 Step 4
56
Challenges
Increase resolution from 20-30 Å to 5 Å.Increase number of projections, N from few hundreds to several thousand.Increase the size of pixel frames from about 1502 to 3002.
57
Computational challenges
Assuming:2,000 projections each of size 300 × 300 pixels.Then:we will solve at most 3 × 3002 = 270,000 linear least squares problems each having 2000 equations and 300 unknowns. The number of arithmetic operations:270,000 × 2000 × 3002 = O(5 × 1013)
58
Test on UHD cluster
P3DR Runtime
0
500
1000
1500
1 2 4 8 16
# of Nodes
Runt
ime
P3DR Speedup
0
2
4
6
8
1 2 4 8 16
# of Nodes
Spee
dup
59
60
61
Contact Information:Hong Lin ([email protected])