20 November, 2010

Vladi Nosenzo, Roberto Vadori

2011 European HyperWorks Technology Conference

ABSTRACT

Bonn, 2011/11/08

The work described below builds on an earlier project by Reply, developed in collaboration with Prompt Engineering. IVECO's technical constraints were more stringent than Reply's, so a feasibility study was needed to account for IVECO's requirements in terms of both security and accessibility of the infrastructure. The main requirement was to guarantee the security and inviolability of the data used during the simulations, and the infrastructure was designed and built to meet it. We start by describing a simple configuration and make it progressively more complex until we reach the final one. Finally, we compare the simulation performance obtained with the two configurations (the virtual and the real cluster).

INTRODUCTION

The term cloud computing refers to a distributed, remotely hosted infrastructure. The solution analyzed here uses the services offered by Amazon, where the machines are virtual computing resources. Note that Amazon provides only the computing resources, with a minimal configuration; the infrastructure and application codes must be installed by the end user.

VIRTUAL AND REAL INFRASTRUCTURE

The initial implementation consisted of a Front End (hereafter FE) and a set of cores N1, N2, … Ni, … Nn, distributed across multiple nodes (a minimum of two cores for the FE, four to sixteen cores per compute node), connected via the Internet to a remote license server.
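The initial topology can be written down as a small data model. The sketch below is only an illustration. `ClusterSpec` and its methods are hypothetical names, and the model encodes just the limits stated above: at least two FE cores and four to sixteen cores per compute node. The ten quad-core nodes in the example match the setup described later in this document.

```python
from dataclasses import dataclass, field

# Limits taken from the text; the class itself is an illustrative sketch.
MIN_FE_CORES = 2
MIN_NODE_CORES, MAX_NODE_CORES = 4, 16


@dataclass
class ClusterSpec:
    """Hypothetical description of the initial cloud cluster."""
    fe_cores: int
    node_cores: list[int] = field(default_factory=list)  # cores per compute node

    def validate(self) -> None:
        """Raise ValueError if the spec violates the stated core limits."""
        if self.fe_cores < MIN_FE_CORES:
            raise ValueError(f"FE needs at least {MIN_FE_CORES} cores")
        for i, cores in enumerate(self.node_cores, start=1):
            if not MIN_NODE_CORES <= cores <= MAX_NODE_CORES:
                raise ValueError(f"node N{i} has {cores} cores, outside 4-16")

    def compute_cores(self) -> int:
        """Total cores N1..Nn available to the solvers (FE excluded)."""
        return sum(self.node_cores)


spec = ClusterSpec(fe_cores=2, node_cores=[4] * 10)  # ten quad-core nodes
spec.validate()
print(spec.compute_cores())  # 40
```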

This structure was judged non-compliant with IVECO security standards. A new network infrastructure was therefore built, placing the FE and the calculation nodes Ni inside an already existing Virtual Private Network (hereafter VPN). This solution made it possible to integrate the virtual Amazon resources into the IVECO network.

While the FE and the calculation nodes are inside the VPN, the license servers are outside it. A communication channel therefore had to be created between them and the virtual computing cluster. This channel is controlled by two firewalls. The first (an Open Point) filters traffic between the FE and the VPN. The second (a Point to Point) filters traffic between the VPN and the license server network.
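The role of the two firewalls can be illustrated as reachability between network zones. This is a toy model, not a firewall configuration. The zone names and the `route` and `firewalls_on` helpers are hypothetical. Each allowed link stands for one firewall, and a breadth-first search shows that the license servers can be reached only by crossing the VPN.

```python
from collections import deque
from typing import Optional

# Hypothetical zone names; each link stands for one of the two firewalls.
FIREWALL_LINKS = {
    ("amazon_fe", "iveco_vpn"): "firewall 1 (Open Point)",
    ("iveco_vpn", "license_net"): "firewall 2 (Point to Point)",
}


def neighbours(zone: str) -> list:
    """Zones one hop away; links are treated as two-way."""
    out = []
    for a, b in FIREWALL_LINKS:
        if a == zone:
            out.append(b)
        elif b == zone:
            out.append(a)
    return out


def route(src: str, dst: str) -> Optional[list]:
    """Breadth-first search for a path of zones from src to dst, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        zone = queue.popleft()
        if zone == dst:
            path = []
            while zone is not None:
                path.append(zone)
                zone = prev[zone]
            return path[::-1]
        for nxt in neighbours(zone):
            if nxt not in prev:
                prev[nxt] = zone
                queue.append(nxt)
    return None


def firewalls_on(path: list) -> list:
    """Names of the firewalls crossed along a path of zones."""
    return [FIREWALL_LINKS.get((a, b)) or FIREWALL_LINKS[(b, a)]
            for a, b in zip(path, path[1:])]


print(route("amazon_fe", "license_net"))  # ['amazon_fe', 'iveco_vpn', 'license_net']
print(route("node_outside_vpn", "license_net"))  # None: no link into the VPN
```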

VIRTUAL AND REAL INFRASTRUCTURE

The services and infrastructure offered by Amazon are provided as different types of instances. An instance typically consists of a CPU with a number of cores (from two to sixteen) and a disk on which the operating system (Linux or Windows Server) and a few basic utilities are installed. The following table lists the instances activated for the Proof of Concept (hereafter POC).

INSTANCE  OS                   MEMORY  CPU (cores)  NOTES
1         Windows Server 2003  2 GB    -            Remote Graphics
2         Linux                8 GB    2            Front End
3         Linux                16 GB   4            Compute Node

The first instance, running Windows Server 2003, was used to display the calculation results with the HyperView software. An Expedat server, dedicated to data transfer, was also installed on the same machine. Two classes of software were installed on the FE. The first comprises the management software, in particular:

PBS (Portable Batch System): to manage the queues and allocate the resources of the compute nodes;

NFS (Network File System): to share the storage disk resources;

MPI (Message Passing Interface): libraries for running a single calculation in parallel on multiple independent processors;

SSH/SSL (Secure Shell/Secure Sockets Layer): to manage the encrypted communication between the FE and the nodes, and between the FE and the graphics display.

The second class comprises the solvers:

Radioss: explicit solver for crash analysis;

Optistruct: implicit solver for static and optimization analysis.

NETWORK INFRASTRUCTURE

The Amazon infrastructure is essentially an internal virtual network that hosts the compute nodes, the FE and the graphics workstation. The FE has three network interfaces. The first is on the internal network. The second is on the public network (Internet) and lets the client connect via SSH. The third is on the VPN.

The IVECO infrastructure is more complex, so we outline only the main features of the network. The figure shows three main components:

a VPN server, connected to both the public and the internal network, which controls incoming and outgoing data;

a client that communicates with the outside, through the firewall, and with the license server;

two license servers that communicate with external clients only through the firewall.

NETWORK INFRASTRUCTURE

The communication channel between Amazon and IVECO is an encrypted VPN. The VPN server sits in the IVECO network, while the client is the Amazon FE. All communication runs over SSH, which provides a secure channel.

The most critical point was finding the right communication mechanism between the FE and the license server. The license server uses FlexLM, which requires a two-way connection over two specific ports. It was therefore necessary to assign a fixed, rather than arbitrary, communication port and to open access to it on the firewall. Likewise, the calculation nodes must be able to communicate with the license server, so they must be placed inside the VPN.
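One quick way to confirm that the firewall is open for both FlexLM ports is to attempt a TCP connection to each port from the FE or from a compute node. The sketch below assumes the usual FlexLM layout. One port serves the license manager daemon (lmgrd). The other serves the vendor daemon, which FlexLM otherwise assigns dynamically unless the port is fixed in the license file. The host name and both port numbers are placeholders.

```python
import socket

# Placeholders: the real host name and port numbers are site-specific.
LICENSE_HOST = "license.example.internal"
LMGRD_PORT = 27000          # license manager daemon port (assumed)
VENDOR_DAEMON_PORT = 27100  # vendor daemon port fixed in the license file (assumed)


def check_ports(host: str, ports: list, timeout: float = 3.0) -> dict:
    """Return {port: True/False} depending on whether a TCP connect succeeds."""
    status = {}
    for port in ports:
        try:
            # create_connection resolves the host and opens a TCP socket.
            with socket.create_connection((host, port), timeout=timeout):
                status[port] = True
        except OSError:  # refused, timed out, or name resolution failed
            status[port] = False
    return status
```

From inside the VPN, you would call, for example, `check_ports(LICENSE_HOST, [LMGRD_PORT, VENDOR_DAEMON_PORT])`. A `False` entry points either to a missing firewall rule or to a daemon that is not listening.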

COMPUTATION CODES AND DATA TRANSFER

The software needed to set up the infrastructure is standard. Apart from the management packages (NFS, PBS, SSH, MPI libraries) and some software to connect to the license server, no additional packages had to be installed. All software is installed automatically by scripts, and setting up an FE and ten quad-core compute nodes takes about 24 minutes in total. The solvers were chosen with the cluster configuration in mind: an explicit solver (Radioss) and an implicit one (Optistruct). The choice reflects two different calculation types: a crash analysis, performed with an explicit code, and a linear static analysis, solved with an implicit code. Because the two calculations are inherently different, the first used all available cores of the compute node, while the second ran on the same node using only two of its four cores.
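The core assignment described above maps naturally onto a PBS resource request. The sketch below is an assumption, not the scripts used in the POC. It emits Torque/OpenPBS-style directives (`-l nodes=1:ppn=N`); PBS Professional uses `select=1:ncpus=N` instead. The queue and job names are placeholders. The solver launch line is omitted because the text does not give it.

```python
# Hypothetical helper, not the authors' scripts: builds a Torque/OpenPBS-style
# job header. Queue and job names are placeholders.
CORES_PER_NODE = 4  # quad-core compute nodes, as in the POC


def pbs_header(job_name: str, cores: int, queue: str = "batch") -> str:
    """Return PBS directives requesting `cores` cores on a single compute node."""
    if not 1 <= cores <= CORES_PER_NODE:
        raise ValueError(f"cores must be between 1 and {CORES_PER_NODE}")
    return "\n".join([
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        f"#PBS -q {queue}",
        f"#PBS -l nodes=1:ppn={cores}",  # one node, `cores` processors on it
        "cd $PBS_O_WORKDIR",  # PBS sets this to the submission directory
    ])


# Explicit crash run on all four cores; implicit static run on two of the four.
print(pbs_header("radioss_crash", CORES_PER_NODE))
print(pbs_header("optistruct_static", 2))
```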

Data transfer is another critical point. Input files are generally relatively small, compressible ASCII files, whereas output files are large, poorly compressible binary files. If transferred by traditional methods, the output files become a real bottleneck. The adopted solution was as follows:

transfer of the results from the compute nodes to the Windows server (green arrow); since the compute nodes and the server are on the same Amazon internal network, this transfer is very fast;

opening a graphical session through the encrypted VPN channel using Remote Desktop Services (RDS);

downloading the results by connecting to the Expedat server; here too, the communication (red line) goes through the encrypted VPN channel.
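You can demonstrate the compressibility gap between input and output files with `zlib`. The data here are synthetic stand-ins, not real solver files. Repetitive ASCII lines imitate an input deck, and random bytes act as a worst case for binary results. Real result files usually compress somewhat better than random data.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Compressed size as a fraction of the original size (lower is better)."""
    return len(zlib.compress(data, level)) / len(data)


# Synthetic stand-ins, not real solver files:
# repetitive ASCII lines imitate an input deck; random bytes are a worst case.
ascii_deck = "".join(
    f"NODE {i:8d}{i * 0.5:16.4f}{i * 0.25:16.4f}{0.0:16.4f}\n" for i in range(20000)
).encode("ascii")
binary_like = os.urandom(len(ascii_deck))

print(f"ASCII deck : {compression_ratio(ascii_deck):.2f}")
print(f"random bin : {compression_ratio(binary_like):.2f}")
```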

COMPUTATION CODES AND DATA TRANSFER

The following table shows the time taken to download the results. The ADSL figures are comparable to those of the IVECO VPN, which has a bandwidth of 10 Mbps. In particular, over the IVECO VPN, a 1.5 GB transfer, compressed to 46% by the Expedat protocol, takes about 480 s. Extrapolating from these data, downloading 10 GB over the IVECO VPN takes about 3200 s, i.e. about 53 minutes.
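The extrapolation above is a linear scaling of the measured rate: 480 s for 1.5 GB is 320 s/GB, so 10 GB takes about 3200 s. The helper below, whose name is hypothetical, makes the underlying assumptions explicit. It assumes the same link and the same compression ratio for the larger data set.

```python
def extrapolate_transfer_time(measured_gb: float, measured_s: float, target_gb: float) -> float:
    """Scale a measured transfer time linearly to another volume of data.

    Assumes the same link (here the 10 Mbps IVECO VPN) and the same
    compression ratio for the new data; neither is guaranteed in practice.
    """
    seconds_per_gb = measured_s / measured_gb
    return seconds_per_gb * target_gb


t = extrapolate_transfer_time(1.5, 480, 10)
print(f"{t:.0f} s = {t / 60:.1f} min")  # 3200 s = 53.3 min
```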

The following table shows the average time taken to transfer the input data, uncompressed, to the Amazon network using SFTP (SSH File Transfer Protocol).

INPUT FILE   SIZE    TIME
RADIOSS      95 MB   112 s
OPTISTRUCT   70 MB   59 s
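Dividing size by time gives the effective SFTP throughput for the two input files. The sketch assumes decimal units (1 MB = 8 Mbit), because the text does not say whether the sizes are decimal or binary megabytes. The results can be set against the 10 Mbps quoted for the IVECO VPN.

```python
# Input-file transfers over SFTP, from the table above: (size in MB, time in s).
TRANSFERS = {"RADIOSS": (95, 112), "OPTISTRUCT": (70, 59)}


def throughput(size_mb: float, time_s: float) -> tuple:
    """Return (MB/s, Mbit/s), assuming decimal units (1 MB = 8 Mbit)."""
    mb_per_s = size_mb / time_s
    return mb_per_s, mb_per_s * 8


for name, (size, secs) in TRANSFERS.items():
    mbs, mbits = throughput(size, secs)
    print(f"{name}: {mbs:.2f} MB/s = {mbits:.1f} Mbit/s")
# RADIOSS: 0.85 MB/s = 6.8 Mbit/s
# OPTISTRUCT: 1.19 MB/s = 9.5 Mbit/s
```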

RESULTS AND CONCLUSIONS

When the execution times are normalized with respect to the more powerful architecture, the cloud solution turns out to be relatively more efficient with explicit solvers, as the following table and figure show.

Normalized execution time (HPC = 1):

TEST CASE    CLOUD   HPC
Optistruct   3.43    1
Radioss      1.3     1
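The published figures are ratios; the raw run times are not given. The sketch below only makes the arithmetic explicit. A normalized value is the cloud run time divided by the HPC run time, so (ratio - 1) x 100 is the extra run time in percent. The function names are illustrative.

```python
# Execution times normalized to the HPC cluster (HPC = 1), from the table above.
NORMALIZED = {"Optistruct": 3.43, "Radioss": 1.3}


def normalize(time: float, reference_time: float) -> float:
    """Ratio of a run time to the reference (faster) architecture's run time."""
    return time / reference_time


def overhead_percent(ratio: float) -> float:
    """Extra run time relative to the reference, in percent."""
    return (ratio - 1.0) * 100.0


for case, ratio in NORMALIZED.items():
    print(f"{case}: {overhead_percent(ratio):.0f}% longer on the cloud than on the HPC")
# Optistruct: 243% longer on the cloud than on the HPC
# Radioss: 30% longer on the cloud than on the HPC
```

The smaller overhead for Radioss is what the text means by the cloud solution being relatively more efficient with explicit solvers.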

An important point is providing an adequate scratch area on disk. The architecture used allocates a single disk area, mounted on the FE and exported via NFS to all compute nodes. The scratch area must be sized according to the codes used, especially for implicit solvers, which sometimes require as much as 2 TB of disk space. In conclusion, the more correctly the infrastructure is sized from a technical point of view (i.e. in terms of CPU time, disk space, and bandwidth for data transfer), the more efficient the cloud solution becomes in terms of performance. The cloud solution is also very useful for managing workloads.

[Figure: Optistruct model and Radioss model]
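Since the scratch area is a single NFS-exported volume, checking its free space before submitting a job is cheap. In this sketch, `/scratch` is a hypothetical mount point, and 2 TB is the upper value quoted above for implicit solvers. `shutil.disk_usage` queries the filesystem that holds the given path; on an NFS client, that is the exported volume, so the check can run from any compute node.

```python
import shutil

TB = 10**12  # decimal terabyte

# Upper value quoted in the text for implicit solvers; other codes need their own figure.
IMPLICIT_SCRATCH_BYTES = 2 * TB


def has_enough_scratch(path: str, required_bytes: int) -> bool:
    """True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


# Example call on a hypothetical NFS mount point:
# has_enough_scratch("/scratch", IMPLICIT_SCRATCH_BYTES)
```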

NEXT STEP

The next step will be to integrate the two solutions (virtual and real), using the virtual solution to handle workloads on demand.

We are currently evaluating the economic impact in order to build a proper business case.

THANK YOU FOR YOUR ATTENTION
