ABSTRACT
Bonn, 2011/11/08
The work described below builds on a previous experience of Reply, developed in collaboration with Prompt Engineering. The technical constraints for IVECO were more stringent than Reply's; it was therefore necessary to carry out a feasibility study taking into account the requirements of IVECO, both in terms of security and accessibility of the infrastructure. The main request was to ensure the security and inviolability of the data used during the simulations, and the infrastructure was designed and built to meet this need. We will start by describing a simple configuration that is gradually made more complex until the final one is reached. Finally, we will compare the simulation results, in terms of performance, obtained with the two configurations (the virtual and the real cluster).
INTRODUCTION
The term Cloud Computing refers to a distributed, off-premises infrastructure. The solution analyzed here uses the services offered by Amazon, where the machines are virtual computing resources. It should be noted that Amazon provides only the computing resources with a minimal configuration, while the infrastructure and application codes must be installed by the end user.
VIRTUAL AND REAL INFRASTRUCTURE
The initial implementation included a Front End (hereafter FE) and a set of cores N1, N2, …, Ni, …, Nn distributed across multiple nodes (a minimum of two cores for the FE, four to sixteen cores per compute node), connected via the Internet to a remote license server.
This structure was considered non-compliant with the IVECO security standards. Therefore, a new network infrastructure was built that placed the FE and the calculation nodes Ni inside an already existing Virtual Private Network (hereafter VPN). This solution allowed the integration of the virtual Amazon resources with the IVECO network.
While the FE and the calculation nodes are within the VPN, the license servers are outside: we must therefore create a communication channel between them and the virtual calculation structure. This channel is controlled by two firewalls: the first (an Open Point) checks the communication between the FE and the VPN, while the second (a Point to Point) checks the communication between the VPN and the network of license servers.
The services and infrastructure offered by Amazon are configured as different types of instances: an instance is, typically, a CPU with a number of cores (from two to sixteen) and a disk on which the Operating System (Linux or Windows Server) and a few basic utilities are installed. The following table lists the instances that were activated for the Proof of Concept (hereafter POC).
INSTANCE  OS       MEMORY  CPU  NOTES
1         WS 2003  2 GB         Remote Graphics
2         Linux    8 GB    2    Front End
3         Linux    16 GB   4    Compute Node
The first instance, a Windows Server 2003, was used to display the calculation results with the HyperView software. On the same machine, an Expedat server dedicated to data transfer was also installed. Two different classes of software were installed on the FE: the first includes the management codes, in particular:
PBS (Portable Batch System): to manage the queues and to allocate the resources of the computation nodes.
NFS (Network File System): to share the storage disk resources.
MPI (Message Passing Interface): libraries for running the same calculation in parallel on multiple independent processors.
SSH/SSL (Secure Shell / Secure Sockets Layer): to manage the encrypted communication between the FE and the nodes, and between the FE and the graphic display;
while the second contains the calculation codes:
Radioss: explicit solver for crash analysis.
Optistruct: implicit solver for static and optimization analysis.
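To give an idea of how a run reaches the compute nodes through PBS, the sketch below composes a minimal job script for each solver. The queue name, solver command lines, file names and core counts used here are illustrative assumptions, not taken from the POC configuration.

```python
# Minimal sketch of PBS job scripts for the two solvers.
# Queue name, solver invocations and file names are hypothetical.

def make_pbs_script(job_name, ncpus, command):
    """Compose a PBS job script requesting one chunk of `ncpus` cores."""
    return "\n".join([
        "#!/bin/sh",
        f"#PBS -N {job_name}",
        f"#PBS -l select=1:ncpus={ncpus}",  # cores requested on one node
        "#PBS -q workq",                    # hypothetical queue name
        "cd $PBS_O_WORKDIR",                # run in the submission directory
        command,
    ])

# Explicit crash analysis: all four cores of a quad-core compute node.
radioss_job = make_pbs_script("crash", 4, "mpirun -np 4 radioss model_0000.rad")
# Implicit static analysis: only two of the four available cores.
optistruct_job = make_pbs_script("static", 2, "optistruct model.fem -nt 2")

print(radioss_job)
```

The script would then be handed to the scheduler with `qsub`; PBS allocates the requested cores and starts the command on the assigned node.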
NETWORK INFRASTRUCTURE
The Amazon infrastructure is essentially an internal virtual network in which the compute nodes, the FE and the Graphics Workstation are embedded. The FE has several network interfaces: one on the internal network, one on the public network (Internet), which allows client access via SSH, and one on the VPN.
The IVECO infrastructure is more complex; we outline the network in its main lines. In the figure we see three main components:
a VPN server, connected to the public and internal networks, which controls the input/output data;
a client that communicates with the outside, through the firewall, and with the license server;
two license servers that communicate with external clients only through the firewall.
The communication channel between Amazon and IVECO is an encrypted VPN. The VPN server is embedded in the IVECO network, while the client, in this case, is the Amazon FE. All communications occur via the SSH protocol, thus creating a secure communication channel.
The most critical point was to find the correct communication mechanism between the FE and the license server. From a technical point of view, the license server, which uses the FlexLM software, requires a two-way connection on two specific communication ports; it was therefore necessary to assign well-defined, non-arbitrary communication ports and to open access to them on the firewall. In the same way, the calculation nodes must be able to communicate with the license server and must therefore be embedded inside the VPN.
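The fixed ports are what make the firewall rule possible: FlexLM lets both the lmgrd port (on the SERVER line) and the vendor-daemon port be pinned in the license file. The fragment below is only a sketch; the hostname, hostid, vendor-daemon name and port numbers are illustrative assumptions, not the values used in the POC.

```
# FlexLM license file fragment (illustrative values only).
# Pinning both ports lets the firewall open exactly two known ports.
SERVER licsrv1 0017A4770001 27000    # lmgrd listening on a fixed port
VENDOR altair port=27001             # vendor daemon on a fixed port
```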
COMPUTATION CODES AND DATA TRANSFER
The software needed to set up the infrastructure is standard; besides the management packages (NFS, PBS, SSH, MPI libraries) and some software to log on to the license server, it was not necessary to install additional packages. All software is installed automatically using scripts; the whole process takes about 24 minutes to set up a FE and ten quad-core computing nodes. The computation codes were chosen considering the cluster configuration used: an explicit solver (Radioss) and an implicit one (Optistruct). The choice was made taking into account the two different calculation types: a crash analysis, performed using explicit codes, and a linear static analysis, solved using an implicit code. Since the two calculations are inherently different, the first was performed using all available cores in the compute node, while the second ran on the same node but was assigned two of the four available cores.
A critical point is the data transfer. While, in general, the input file is an ASCII file, compressible and relatively small, the output files are binary files, large and poorly compressible. If transferred by traditional methods, these represent a real bottleneck of the structure. The solution was as follows:
transfer of the results from the compute nodes to the Windows server (green arrow); as the compute nodes and the server are on the same Amazon internal network, this transfer is very fast;
opening of a graphic session using RDP (Remote Desktop Protocol) through the encrypted VPN channel;
download of the results by connecting to the Expedat server; also in this case the communication (red line) goes through the encrypted VPN channel.
The following table shows the time taken to download the results. The data for the ADSL connection are comparable to the IVECO VPN, which has a bandwidth of 10 Mbps. In particular, using the IVECO VPN, a 1.5 GB transfer, compressed to 46% by the Expedat protocol, takes about 480 s. Extrapolating from the data obtained, a download of 10 GB over the IVECO VPN requires 3200 s, i.e. about 53 minutes.
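The extrapolation is linear in the data size; a quick check, taking the measured point (1.5 GB in about 480 s over the IVECO VPN) as the baseline:

```python
# Linear extrapolation of download time over the 10 Mbps IVECO VPN,
# anchored to the measured point: 1.5 GB (compressed to 46%) -> ~480 s.

BASELINE_GB = 1.5
BASELINE_S = 480.0

def download_time_s(size_gb):
    """Estimated download time in seconds, scaling the measured point linearly."""
    return BASELINE_S * size_gb / BASELINE_GB

t = download_time_s(10.0)
print(f"10 GB -> {t:.0f} s (~{t / 60:.0f} min)")  # -> 3200 s (~53 min)
```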
The following table shows the average time taken to transfer the input data to the Amazon network using SFTP (Secure File Transfer Protocol), without compressing the data.
INPUT FILE  DIMENSION  TIME
RADIOSS     95 MBytes  112 s
OPTISTRUCT  70 MBytes  59 s
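From these figures the effective SFTP throughput can be derived; it stays below the 10 Mbps nominal VPN bandwidth, which is consistent with protocol overhead:

```python
# Effective upload throughput implied by the SFTP measurements above.
transfers = {
    "RADIOSS": (95, 112),    # (MBytes, seconds)
    "OPTISTRUCT": (70, 59),
}

throughput_mbps = {}
for name, (mbytes, seconds) in transfers.items():
    throughput_mbps[name] = mbytes * 8 / seconds  # megabits per second
    print(f"{name}: {throughput_mbps[name]:.1f} Mbps")
```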
RESULTS AND CONCLUSIONS
Normalizing the execution times with respect to the more powerful architecture, it should be noted that the Cloud solution is more competitive with explicit solvers, as can be seen in the following table and figure.

TEST CASE   CLOUD  HPC
Optistruct  3.43   1
Radioss     1.3    1
One important point is to have an appropriate scratch area on disk: the architecture used allocates a single disk area, mounted on the FE and exported via NFS to all compute nodes. The scratch area must be dimensioned as a function of the codes used; implicit solvers in particular sometimes require a disk area of up to 2 TB. In conclusion, we can say that the Cloud solution is efficient in terms of performance, provided the infrastructure is correctly dimensioned from a technical point of view (i.e. in terms of CPU time, disk space and bandwidth for data transfer optimization), and it is very useful for managing the workloads.
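The scratch layout described above (one disk area on the FE, exported via NFS to all compute nodes) corresponds to a standard NFS configuration; the sketch below is illustrative, with a hypothetical path and internal subnet.

```
# /etc/exports on the FE (hypothetical path and Amazon internal subnet):
/scratch 10.0.0.0/24(rw,sync,no_root_squash)

# On each compute node, the shared scratch area is then mounted with:
#   mount -t nfs fe:/scratch /scratch
```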
(Figures: Optistruct model and Radioss model.)
NEXT STEP
The next step will be to integrate the two solutions (virtual and real), using the virtual solution to manage the workloads on demand.
We are currently evaluating the economic impacts in order to have a "Proper Business Case".