DISTRIBUTED AND INTERACTIVE SIMULATIONS AT LARGE SCALE FOR TRANSCONTINENTAL EXPERIMENTATION

19 October 2010
Dan M. Davis, for Gottschalk, Yao, Lucas, & Wagenbreth
[email protected]
(310) 448-8434
Approved for public release; distribution is unlimited.
DS-RT 2010
Co-Authors

Prof. Robert F. Lucas, Dr. Ke-Thia Yao and Gene Wagenbreth
Information Sciences Institute
University of Southern California
Marina del Rey, California 90292
{ rflucas, kyao, genew } @isi.edu

Dr. Thomas D. Gottschalk
Center for Advanced Computing Research
California Institute of Technology
Pasadena, California 91125
Overview
Needs of user community at JFCOM
Three Technologies
General Purpose GPUs Implementation
High Bandwidth Studies with Interest Management
Distributed Data Management
Lessons Learned
Theses
Transcontinentally Distributed Simulations require advanced technologies and innovative paradigmatic approaches
Academic researchers often have such capabilities, which are typically not yet in the journeyman programmer’s toolbox
GPGPUs provide usable acceleration, but conservative assessments of potential speed-up are warranted
Transcontinental High Performance Communications (“10 Gig”) are possible with Interest Management
With distributed systems and a data glut from High Performance Computing simulations, new approaches to Data Management are required
JFCOM as a Distributed Simulation User

U.S. Joint Forces Command, Norfolk, Virginia
One of DoD’s combatant commands
Key role in transforming defense capabilities
Two JFCOM Directorates use agent-based simulations:
J7 - Training
Trains forces
Develops doctrine
Leads training requirements analysis
Provides an interoperable training environment
J9 - Concept Development and Experimentation
Develops innovative joint concepts and capabilities
Provides proven solutions to problems facing the joint force
Simulations are typically GENSER Secret and characterized by:
Interactive use by hundreds of personnel
Distributed transcontinentally, but must run in real time
Vast majority of users at the terminals are uniformed warfighters
Plan View Display

3D Stealth View
Simulation Federates

Agent-based models use rules for entity behavior
Autonomous-agent entities
Can be Human-In-The-Loop (HITL) and run in real time
Large compute clusters required to run large-scale simulations
Standard interface is HLA RTI communication (IEEE 1516)
Supplanted the old DIS
Publish/Subscribe model
USC/Caltech Software Routers scale better
Common Codes in use at JFCOM:
Joint Semi-Automated Forces (JSAF)
“Culture”, a stripped-down civilian instantiation of JSAF
Simulating Location & Attack of Mobile Enemy Missiles (SLAMEM)
OneSAF
Computing Infrastructure

Koa and Glenn deployed spring ’04
At MHPCC & ASC-MSRC
2 Linux Clusters, 32-bit
256 nodes, 512 cores each
Joshua deployed 2008
GPGPU-enhanced
256 nodes, 1024 cores, 64-bit
DREN Connectivity
Users in VA, CA & Army bases
Application tolerates network latency
Real-time interactive supercomputing
Joshua GPGPU-Enhanced Cluster Configuration

256 Nodes
Nodes - (2) AMD Santa Rosa 2220 2.8 GHz dual-core processors, 1024 cores total
GPUs - (1) NVIDIA 8800 video card
Node Chassis - 4U chassis
Memory - 16 GB DIMM DDR2 667 per node
GigE inter-node communications
Delivery to: Joint Advanced Tactics and Training Laboratory (JATTL) in Suffolk, VA
Perspective: Entity Growth vs. Time
[Chart: Number and Complexity of JSAF Entities vs. time. Milestones: SAF Express (1997), UE 98-1 (1997), J9901 (1999), AO-00 (2000), JSAF/SPP Tests (2004), JSAF/SPP Urban Resolve (2004), JSAF/SPP Capability (2006), JSAF/SPP Joshua (2008); entity counts grow from 3,600 through 12,000, 50,000, 107,000, 250,000, 1,000,000 and 1,400,000 to 10,000,000. Eras: SPP Proof of Principle (DARPA / Caltech); DC Clusters at MHPCC & ASC-MSRC; DHPI GPU-Enhanced Cluster.]

Experiments continue to require orders of magnitude larger and more complex battlespaces: SCALE and FIDELITY
Why GPUs?

GPU kernel performance can be 100X the host’s
Don’t forget Gene Amdahl’s Law; 2-3X overall is typical
This speed-up is expected to grow
Early SAF work (UNC, SAIC, USC):
Line of Sight
Route Finding
Collision Detection
Sparse Matrix Factorization
ISI verified similar bottlenecks in JSAF
New ideas for use in scenario generation for new multi-spectral sensors
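Amdahl’s Law explains the gap between a 100X kernel speed-up and the 2-3X overall figure: only the accelerated fraction of the runtime benefits. A minimal sketch (the runtime fractions below are illustrative, not Joshua measurements):

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of total runtime
    is accelerated by a factor s (Amdahl's Law)."""
    return 1.0 / ((1.0 - f) + f / s)

# A 100X GPU kernel buys only ~2-3X overall when the accelerated
# kernel is roughly half to two-thirds of the runtime:
print(round(amdahl_speedup(0.6, 100.0), 2))   # 2.46
print(round(amdahl_speedup(0.5, 100.0), 2))   # 1.98
```

Even an infinitely fast kernel caps the overall gain at 1/(1-f), which is why conservative assessments of potential speed-up are warranted.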
Route Planning Performance Impact
Time Spent in Route Planning is a Critical Bottleneck
Benefits of GPGPU Computing

Joshua has provided many benefits; some are not easily quantified
Training, analysis or evaluation in cities otherwise off-limits due to:
Security issues
Public resistance to combat troops in their city
Diplomatic sensitivity about U.S. interest in cities of potential conflict
Joshua does save personnel costs, e.g. an Army Division costs ~$20M per day
The DHPI cluster can run such a program using only ~100 technicians
Cost saving may be ~$19.5M each day
Good visibility with the leadership elite:
Congressional visits
A Lieutenant General noted that it was probably the only time in his career he would have an opportunity to command so large a unit
1,500 soldiers across the country participated, all connected
Transcontinental Network High Bandwidth Research

The issue here was the potential exploitation of high-bandwidth nets, e.g. “10 Gig” (10 Gigabit per second) nets
The nodes of this WAN were located at:
ISI-East in Virginia
University of Illinois at Chicago
ISI-West in California
Previous work indicated the utility of interest-managed communications on cluster meshes, high-bandwidth Local Area Networks (LANs), and lower bandwidth WANs
Interest-limited message exchange was done using Caltech’s MeshRouter formalism
Interest Managed Software Router Diagram
Transcontinental Testbed for “10 Gig” Proof of Concept
Tests on Local ISI Cluster

Prepared for the wide-area tests
Ran a number of variations of configurations using the ISI-W cluster
Tests involved configurations with:
A single router process on its own node
A single publish process on its own node
Multiple subscriber processes on either
Performance results for these configurations are summarized on the next slide
Bandwidth numbers reflect only data rates into the subscriber processes
Bandwidth Results for Various Test Configurations

Number of     Subscriber   Per-Node        Total           Router
Subscribers   Nodes        Bandwidth       Bandwidth       Load
 1            1            680 Mbit/sec    690 Mbit/sec    41% busy
 2            1            672 Mbit/sec    1.3 Gbit/sec    54% busy
 4            1            504 Mbit/sec    2.0 Gbit/sec    59% busy
 6            2            344 Mbit/sec    2.1 Gbit/sec    55% busy
 8            2            288 Mbit/sec    2.3 Gbit/sec    51% busy
16            2            160 Mbit/sec    2.6 Gbit/sec    44% busy
Benchmark Tests: Two Forms

Application processes:
Publish Processors, which publish messages of specified length and interest state; nominal total publication rate (MByte/sec) controlled by a data file
Subscribe Processors, which receive messages for a specified interest state, collect messages from multiple publishers, and measure actual incoming message rates
Routers direct individual messages from publishers to subscribers according to the interest declarations
Router processes were instrumented to determine the fraction of time spent on management
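The publish/subscribe arrangement described above can be sketched as follows. This is a toy illustration of the interest-management idea only, not the Caltech MeshRouter or RTI-s code; all names and messages are invented:

```python
from collections import defaultdict

class InterestRouter:
    """Toy interest-managed router: messages carry an interest state,
    and the router forwards each message only to subscribers that
    declared interest in that state, rather than broadcasting."""

    def __init__(self):
        self.subscribers = defaultdict(list)   # interest state -> inboxes
        self.messages_routed = 0

    def subscribe(self, interest, inbox):
        self.subscribers[interest].append(inbox)

    def publish(self, interest, message):
        for inbox in self.subscribers[interest]:
            inbox.append(message)
            self.messages_routed += 1

router = InterestRouter()
ground, air = [], []
router.subscribe("ground-tracks", ground)
router.subscribe("air-tracks", air)

router.publish("ground-tracks", "tank at grid 123")
router.publish("air-tracks", "helo at grid 456")

print(ground)   # ['tank at grid 123'] -- the air subscriber never sees it
```

Because the router consults the interest declarations before forwarding, subscribers receive only traffic they asked for, which is what keeps aggregate bandwidth bounded as entity counts grow.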
Aggregate Bandwidth versus Number of Subscribers
10 Gig WAN Test Results

WAN tests were then performed
This entailed a great deal of trouble-shooting of the various routers along the WAN
A number of variants of the basic configuration were explored:
The number of distinct interest states
The number of processors associated with a single router process
The number of replicas of the basic “Router plus Associated Pub/Sub” nodes at each site
Typical performance numbers are from a test with eight participating nodes at each of the UIC and ISI sites
Results are summarized on the next slide
Performance Measures for Typical WAN Test

Message     Client BW     Single MR BW   Aggregate BW
Length      (bytes/sec)   (bytes/sec)    (bits/sec)
0.4 KByte    3.2 M         16.0 M         1.0 G
0.8 KByte    6.4 M         32.0 M         2.1 G
1.6 KByte   12.8 M         64.0 M         4.1 G
2   KByte   14.3 M         71.5 M         4.6 G
100 KByte    0.8 M          4.0 M         0.3 G
Final Results: ~5 Gigabits Over a “10 Gig” Line

Aggregate throughput is rather poorer for:
Individual message sizes that are too small
Individual message sizes that are too large
This is consistent with experience on a single SPP
Constraints are largely due to the nature of the RTI-s communications primitives
As long as messages are near the optimal size, the aggregate bandwidth for the WAN test is ~4.6 Gbit/sec
WAN tests with varying configurations give similar results:
Max total WAN BW = 4.6 - 4.9 Gbit/sec
Data Requirements

Larger Scale: global scale vs. theatre
Higher Fidelity: greater complexity in models of entities (sensors, people, vehicles, etc.)
Urban Operations: millions of civilians
All of the above produce dramatic increases in data relative to previously experienced events
Terrain Large, but NOT a Significant Data Issue
Growth and the Impending Data Armageddon

JFCOM had an immediate need for more entities (>10X)
Limited memory on nodes and in Tesla
Tertiary storage very limited
A TeraByte a week with existing practice
Keeping only 20% of current data
Need 10X more entities
Need 10X behavior improvement
Net growth needed: almost three orders of magnitude
Now doing face validity
Need a more quantitative, statistical approach (future work at Caltech - Dr. Thomas Gottschalk)
Data mining efforts now commencing
Two Key Challenges

Collect the “fire hoses” of data generated by large-scale, distributed, sensor-rich environments
Without interfering with communication
Without interfering with simulator performance
Maximally exploit the collected data efficiently
Without overwhelming users
Without losing critical content
Goal: a unified distributed logging/analysis infrastructure that helps the users, does not burden the computing/networking managers, and does not unduly tax network traffic loads
Limitation of the First System: It Did Not Scale!

Two separate data analysis systems:
One for near-real time during the event
Another for post-event processing
For near-real time: too much data to access over the wide-area network without crashing
For post-event processing: 1-2 weeks to stage data to a centralized data store
Discards Green entities (80%)
Handling Dynamic Data

Data is NOT static during runs, but users need access
Logger continuously inserts new data from the simulation
Need distributed queries to combine remote data sources
Distributed logger inserts data into the SDG data store at each site
Problems:
Local cache invalid with respect to inserts
Cannot preposition data to optimize queries
ISI strategy: explore trade-offs
Compute on demand for better efficiency
Compute on insert for faster queries
Variable fidelity: periodic updates
Dynamic pre-computation: detect frequent queries
Handling Distributed Data

Analyze data in place
Data is generated on distributed nodes
Leave data where it is generated
Distribute data access so data appears to be at a single site
Take advantage of HPC hardware capabilities
Large-capacity data storage
High-bandwidth network
Data archival
Exploit JSAF query characteristics
Limited number of joins
Counting/aggregation type queries
Data product is several orders of magnitude smaller than raw data
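The counting/aggregation query characteristic is what makes analyze-in-place viable: each site reduces its raw logs locally, and only the small aggregate crosses the WAN. A sketch with invented contact reports (not JSAF data or the SDG API):

```python
from collections import Counter

# Hypothetical per-site raw logs of (sensor_type, detected) contact reports.
site_va = [("UAV", True), ("UAV", False), ("radar", True)] * 1000
site_ca = [("UAV", True), ("radar", False)] * 1000

def local_aggregate(reports):
    """Run the counting query where the data lives; only this small
    aggregate, not the raw log, needs to cross the network."""
    return Counter(reports)

# Combine the partial aggregates at the query site.
total = local_aggregate(site_va) + local_aggregate(site_ca)
print(total[("UAV", True)])   # 2000
```

The data product here is a handful of counters regardless of how many thousands of raw reports each site holds, matching the slide's orders-of-magnitude reduction.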
Notional Diagram of Scalable Data Grid
Multidimensional Analysis: Sensor/Target Scoreboard

Summarizes sensor contact reports
Positive and negative sensor detections
Displays two-dimensional views of the data
Provides three levels of drill-down:
Sensor platform type vs. target class
Sensor platforms vs. sensor modes
List of contact reports
Multidimensional Analysis

Raw data has other dimensions of potential interest:
Detection status
Time, location
Terrain type, entity density
Weather condition
Each dimension can be aggregated at multiple levels:
Time: minutes, hours, days
Location: country, grid square
Collapse and expand multiple dimensions for viewing

[Diagram: data cube with sensor, target and time axes]
Multidimensional Analysis

Dimensions:
* : any
t : sensor platform type
p : sensor platform
c : target class
o : target object
m : sensor mode

[Diagram: dimension lattice with nodes *, t, p, c, tc, pc, o, to, po, m, tm, pm, cm, tcm, pcm, om, tom, pom]

Sensor/Target Scoreboard drill-downs in the context of multidimensional analysis:
Data classified along 3 dimensions
Drill-down to 3 nodes in the dimension lattice
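The dimension lattice can be made concrete with a small roll-up sketch: for each subset of the three dimensions, the remaining dimensions collapse to “*” (any). The reports and field names below are invented for illustration, not the SDG implementation:

```python
from collections import Counter
from itertools import combinations

# Hypothetical contact reports: (platform_type, target_class, sensor_mode).
reports = [
    ("UAV", "vehicle", "EO"),
    ("UAV", "vehicle", "IR"),
    ("UAV", "person",  "EO"),
    ("radar", "vehicle", "RF"),
]

dims = ("t", "c", "m")   # platform type, target class, sensor mode

def cube(reports):
    """Counts for every node of the dimension lattice: each subset of
    dimensions is kept, and the rest are rolled up to '*' (any)."""
    counts = {}
    for keep_n in range(len(dims) + 1):
        for keep in combinations(range(len(dims)), keep_n):
            key = "".join(dims[i] for i in keep) or "*"
            counts[key] = Counter(
                tuple(r[i] if i in keep else "*" for i in range(len(dims)))
                for r in reports
            )
    return counts

c = cube(reports)
print(c["tc"][("UAV", "vehicle", "*")])   # 2 -- drill-down to type x class
print(c["*"][("*", "*", "*")])            # 4 -- fully rolled up
```

A drill-down is then just a lookup at a deeper lattice node, e.g. moving from the fully rolled-up “*” node to the “tc” (platform type x target class) scoreboard.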
Cube Dimension Editor
SDG: Broader Applicability?

Scalable Data Grid: a distributed data management application/middleware that effectively:
Collects and stores high volumes of data at very high data rates from geographically distributed sources
Accesses, queries and analyzes the distributed data
Utilizes the distributed computing resources on HPC
Provides a multidimensional framework for viewing the data
Potential application areas:
Large-scale distributed simulations
Instrumented live training exercises
High-volume instrumented physics research
Virtually any distributed data environment using HPC
K-Means Testing

K-Means is a popular data mining algorithm
The K-Means algorithm requires three inputs:
- an integer k to indicate the number of desired clusters
- a distance function over the data instances
- the set of n data instances to be clustered
Typically, a data instance is represented as a vector
The output of the algorithm is a set of k points representing the means of the k clusters
Each of the n data instances is assigned to the nearest cluster mean based on the distance function
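The algorithm just described can be sketched in a few lines of Python. This is a toy implementation (with a simple first-k initialization and Euclidean distance), not the K-Means code actually tested on the SDG; the data points are invented:

```python
from math import dist

def k_means(points, k, iters=20):
    """Plain k-means: initialize the k means (here, the first k
    points), then alternate assigning each point to its nearest
    mean and recomputing each mean as its cluster's centroid."""
    means = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist(p, means[i]))].append(p)
        means = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else means[i]
            for i, cl in enumerate(clusters)
        ]
    return means, clusters

# Three well-separated 2-D blobs; with k = 3 each blob gets one mean.
pts = [(0, 0), (10, 10), (20, 0), (1, 0), (0, 1), (11, 10), (21, 1)]
means, clusters = k_means(pts, 3)
print(sorted(len(c) for c in clusters))   # [2, 2, 3]
```

Note that plain k-means is sensitive to initialization; production data-mining runs typically use several random restarts or a smarter seeding scheme.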
K-Means Clustering of Three Data Points

[Scatter plot: “Clustering of 3 Clusters”; x axis roughly -6000 to 2000, y axis roughly -2000 to 5000]
Conclusions

Large-scale Distributed Simulations demand more power
Technology can deliver that power
This paper set out three emerging technologies:
GPGPU Acceleration of Linux Clusters
High Bandwidth Interest Managed Communications
Distributed Data Management
This work shows Simulation Researchers’ new abilities:
To generate simulations
To more effectively move the data around
To more efficiently store the data
To better analyze the data
Researchers would benefit from understanding the technologies’ power, limitations and costs
Research Funded by JFCOM and AFRL
This material is based on research sponsored by the U.S. Joint Forces Command via a contract with the Lockheed Martin Corporation and SimIS, Inc., and on research sponsored by the Air Force Research Laboratory under agreement numbers F30602-02-C-0213 and FA8750-05-2-0204. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government. Approved for public release; distribution is unlimited.