10 - 1

Introduction to Grid and Cloud Computing

Term Project Specification:

The Term Project demands in-depth research and investigative reporting. All reported contents, figures, and tables must be originally generated.

Ten topics are available for students to choose from; multiple disjoint groups of students will work on different topics.

You have only 1 month to report the work, first through a proposal and then through a complete written report at the end of the semester, which you will also present.

The proposal, which will become your report at the end, should follow the IEEE Conference paper format of about 10 pages, including original figure illustrations and tabulations plus a reference listing of 10 - 15 papers. (Template link: http://www.ieee.org/conferences_events/conferences/publishing/templates.html)

10 - 2

How to Write a Good Technical Paper in 10 Pages?

1. Title (< 8 words) must hit the hot topic: short, clear, and eye-catching. Authors and affiliations follow in 1-2 lines after the title.

2. Abstract (50 to 100 words) must state the research objectives, summarize the findings, and highlight the innovative contributions.

3. Introduction (1 page, including the title and abstract) must motivate the readers to read the rest of the paper and prepare them with the necessary background.

4. Problem Statement and Formulation (2 pages): state the problem being solved and the basic assumptions, and formulate the problem with technical specifications.

5. Architecture, algorithms, solution methods, protocols, analytical results, illustrated examples, etc. (2 pages)

6. Experimental Setting: computer simulators, benchmarks, and datasets used (1 page)

7. Experimental Results in plotted figures or tabulations, plus their interpretations and performance analysis (2 pages)

8. Related Work and Conclusions (1 page)

9. References: list of 15 relevant papers (1 page)

10 - 3

Candidate Project Topics:

Topic No.   Topic Title Assignment

1   Use of XEN to create virtual machines, conduct VM experiments, and report the measured performance

2   Exploring Amazon EC2, S3, MapReduce, a virtual cluster, or a private cloud for HPC scientific applications

3   Parallelization of a novel application idea using MPI or OpenMP, with an analysis of the performance improvements

4   Using Hadoop or node.js for a distributed Web Application

5   Integration of Globus Online by using the CLI or REST API for an application that needs data transfer capabilities

10 - 4

Candidate Project Topics:

Topic No.   Topic Title Assignment

6   Stork – Globus Online Comparison through different metrics

7   Application of a scientific problem with a workflow in the Condor scheduler

8   Development of a client/server application that improves the performance of a high-performance data transfer protocol (GridFTP, UDT)

9   MPI-Hadoop Comparison

10   A Survey on Parallel File System Comparison

10 - 5

Topic 1: Use of XEN for virtual machine (VM) creation and resource management through some VM application experiments

You are asked to port the XEN hypervisor to a local computer or to your own notebook.

Create Domain 0 (the control VM) and some User Domains (VM applications) for selected benchmarks.

Collect the performance results. Discuss the lessons learned from the XEN application experiments.

10 - 6

Prof. Kai Hwang

Suggested References for Topic 1:

1. M. Rosenblum, “Recent Advances in Virtual Machines and Operating Systems”, Keynote Address, ACM ASPLOS 2006.

2. J. Smith and R. Nair, Virtual Machines: Versatile Platforms for Systems and Processes, Morgan Kaufmann, 2005.

3. B. Sotomayor, R. Montero, and I. Foster, “Virtual Infrastructure Management in Private and Hybrid Clouds”, IEEE Internet Computing, Sept. 2009.

4. P. Barham, et al., “Xen and the Art of Virtualization”, Proc. of the 19th ACM Symp. on OS Principles (SOSP19), ACM Press, 2003.

5. A. Menon, et al., “Diagnosing Performance Overheads in the Xen Virtual Machine Environment”, Proc. of the 1st Int’l Conf. on Virtual Execution Environments, 2005.

10 - 7

Topic 2: Exploring the use of Amazon EC2, S3, MapReduce, a virtual cluster, or a private cloud in HPC scientific applications

This project requires you to use the available AWS virtual clusters (EC2, S3 instances), the MapReduce cluster, or the private cloud offered on the AWS platform. A cluster of 64 to 120 nodes is desired.

You need to perform benchmark experiments on these VM clusters, measure the performance, analyze the performance attributes, and identify performance bottlenecks.

Select some well-known high-performance scientific benchmark programs to carry out your experiments, or write your own test program, such as one for large-scale matrix multiplication.
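As a starting point for a self-written test program, the following is a minimal single-node sketch of a matrix multiplication benchmark. It is only an illustration: the function names, matrix sizes, and repeat count are assumptions, and a real experiment on a 64 to 120 node cluster would distribute the work and use an optimized numerical library.

```python
import random
import time


def make_matrix(n, seed):
    """Build an n x n matrix of pseudo-random floats (seeded for repeatability)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n)] for _ in range(n)]


def matmul(a, b):
    """Naive product of two matrices stored as lists of rows."""
    b_cols = list(zip(*b))  # transpose once so each dot product pairs a row with a column
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def time_matmul(n, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    a, b = make_matrix(n, 1), make_matrix(n, 2)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul(a, b)
        best = min(best, time.perf_counter() - start)
    return best


def gflops(n, seconds):
    """An n x n product costs about 2 * n**3 floating-point operations."""
    return 2 * n ** 3 / seconds / 1e9


if __name__ == "__main__":
    # Illustrative problem sizes; scale them up for a real benchmark run.
    for n in (50, 100, 200):
        t = time_matmul(n)
        print(f"n={n:4d}  time={t:.4f} s  rate={gflops(n, t):.4f} GFLOP/s")
```

Taking the best of several runs reduces noise from other activity on a shared VM; plotting the rate against n is one way to start looking for bottlenecks.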

10 - 8

Key References for Topic 2:

1. K. Hwang, G. Fox and J. Dongarra, Distributed and Cloud Computing, Chapters 2, 4, 6, Morgan Kaufmann, Oct. 2011.

2. K. Hwang and Z. Xu, Scalable Parallel Computing, McGraw-Hill, Chapters 2 and 12, 1998.

3. E. Walker, “Benchmarking Amazon EC2 for High-Performance Scientific Computing,” ;login:, vol. 33, no. 5, pp. 18–23, 2008.

4. D. Kirk and W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach, Morgan Kaufmann, 2010.

10 - 9

Topic 3: Parallelization of a novel application idea using MPI or OpenMP, with an analysis of the performance improvements

You are asked to find a computationally intensive application and parallelize it by using MPI or OpenMP.

Conduct a thorough performance analysis using multiple machines (or a multicore computer in the absence of multiple machines).

Test your code by running it in an SMP environment (a single computer with multiple cores) and a DSM environment (multiple computers connected via a LAN).

Prepare different test cases by varying the machine architecture, problem size, etc.
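One common way to report such an analysis is through speedup and parallel efficiency, compared against the bound from Amdahl's law. The sketch below is an assumption about how the measurements might be summarized, not a required method; the timings in the example are made-up placeholders to be replaced with your own measurements.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T1 / Tp for the same problem size."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, workers):
    """Parallel efficiency E = S / p, where p is the number of processes or threads."""
    return speedup(t_serial, t_parallel) / workers


def amdahl_bound(serial_fraction, workers):
    """Upper bound on speedup when `serial_fraction` of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


if __name__ == "__main__":
    # Placeholder timings in seconds, keyed by worker count (NOT real measurements).
    timings = {1: 120.0, 2: 63.0, 4: 34.0, 8: 20.0}
    t1 = timings[1]
    for p, tp in sorted(timings.items()):
        print(f"p={p}  speedup={speedup(t1, tp):.2f}  "
              f"efficiency={efficiency(t1, tp, p):.2f}  "
              f"Amdahl bound (5% serial)={amdahl_bound(0.05, p):.2f}")
```

Repeating the table for each problem size and each environment (SMP and DSM) makes the effect of communication cost visible.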

10 - 10

Topic 4: Using Hadoop or node.js for a distributed Web Application

Design a web application that serves thousands of users.

Each user asks for a computationally intensive service.

Distribute the load of the service given by the application to multiple machines at the back end by using technologies like Hadoop or node.js.

Analyze the performance of your application as the number of users increases.
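The load distribution and the scaling analysis can be prototyped before any Hadoop or node.js back end exists. The sketch below is a hypothetical local stand-in: `heavy_service` imitates a computationally intensive request, `assign_round_robin` shows one simple dispatch policy, and `run_load_test` reports throughput for a given number of simulated users. All names and parameters are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def heavy_service(n):
    """Stand-in for a computationally intensive request: sum of squares below n."""
    return sum(i * i for i in range(n))


def assign_round_robin(num_requests, backends):
    """Map each request id to a back-end name in round-robin order."""
    rotation = cycle(backends)
    return {request_id: next(rotation) for request_id in range(num_requests)}


def run_load_test(num_users, work_size=2000, pool_size=8):
    """Issue one request per simulated user; return (seconds, requests per second)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(heavy_service, [work_size] * num_users))
    elapsed = time.perf_counter() - start
    assert len(results) == num_users
    return elapsed, num_users / elapsed


if __name__ == "__main__":
    print(assign_round_robin(5, ["node-a", "node-b"]))  # hypothetical back-end names
    for users in (10, 100, 1000):
        elapsed, rate = run_load_test(users)
        print(f"users={users:5d}  time={elapsed:.3f} s  throughput={rate:.1f} req/s")
```

In the real project the call to `heavy_service` would be replaced by a request to the deployed application, so that the measured throughput reflects the back-end machines rather than the local thread pool.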

10 - 11

Topic 5: Integration of Globus Online by using the CLI or REST API for an application that needs data transfer capabilities

You are asked to design an application, or use an existing one, that needs data transfer capabilities.

Your application will integrate Globus Online as the data transfer capability and provide monitoring of the jobs as well.

The CLI could be used in a complex job that needs data transfers between nodes before starting execution.

The REST API could be used for any type of application.

10 - 12

Topic 6: Stork – Globus Online Comparison through different metrics

You are asked to install two GridFTP servers on two machines and integrate them with Globus Online.

Then install the Stork scheduler on one of the machines.

Design data transfer test cases and make a full comparison of the two tools.

The comparison could cover dataset characteristics, ease of use (Stork does not have an interface, so compare it with the GO CLI), individual transfer speed, and throughput.

Use Stork features such as concurrent transfers and optimization.
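Individual transfer speed and aggregate throughput differ once transfers run concurrently, so it helps to define both before collecting data. The following sketch assumes each transfer is logged with its size and its start and end times; the `Transfer` record and the function names are hypothetical and are not part of Stork or Globus Online.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    size_bytes: int
    start: float  # seconds since the start of the test run
    end: float    # seconds since the start of the test run


def transfer_speed_mbps(transfer):
    """Speed of one transfer in megabytes per second."""
    return transfer.size_bytes / (transfer.end - transfer.start) / 1e6


def aggregate_throughput_mbps(transfers):
    """Total bytes moved divided by the wall-clock span of the whole batch."""
    total_bytes = sum(t.size_bytes for t in transfers)
    wall_clock = max(t.end for t in transfers) - min(t.start for t in transfers)
    return total_bytes / wall_clock / 1e6


if __name__ == "__main__":
    # Two concurrent 100 MB transfers that both take 10 s (illustrative values only).
    batch = [Transfer(100_000_000, 0.0, 10.0), Transfer(100_000_000, 0.0, 10.0)]
    print([transfer_speed_mbps(t) for t in batch])  # each transfer: 10.0 MB/s
    print(aggregate_throughput_mbps(batch))         # whole batch: 20.0 MB/s
```

Reporting both numbers for each tool shows whether concurrency raises the overall throughput or merely divides the same bandwidth among the transfers.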

10 - 13

Topic 7: Application of a scientific problem with a workflow in the Condor scheduler

Find a scientific problem that has complex computational and data transfer needs.

Design a workflow for the solution of the problem.

Apply the workflow by using the Condor scheduler.
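Before writing any scheduler input, the workflow can be drafted as a directed acyclic graph of jobs and checked for a valid execution order. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the job names are hypothetical, and this is a planning aid only, not a Condor interface.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
workflow = {
    "fetch_input": set(),
    "preprocess": {"fetch_input"},
    "simulate_a": {"preprocess"},
    "simulate_b": {"preprocess"},
    "merge_results": {"simulate_a", "simulate_b"},
    "transfer_output": {"merge_results"},
}


def execution_waves(dependencies):
    """Group jobs into waves; every job in a wave has all its dependencies finished."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the workflow contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, jobs in enumerate(execution_waves(workflow), start=1):
        print(f"wave {number}: {', '.join(jobs)}")
```

Jobs that land in the same wave (here `simulate_a` and `simulate_b`) are the ones a scheduler could run in parallel, which is where a workflow gains over a sequential script.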

10 - 14

Topic 8: Development of a client/server application that improves the performance of a high-performance data transfer protocol (GridFTP, UDT)

By using the GridFTP or UDT APIs, design a client/server model that optimizes the data transfers.

Ex: For UDT: Use the same connection for multiple file transfers, and apply a threaded server/client model to do concurrent file transfers over multiple sockets.

Ex: For GridFTP: Use the Java or C APIs to dynamically change the number of parallel streams or the concurrency level for a directory transfer.

Test your implementation to see any improvements.
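The idea of reusing one connection for several files can be illustrated without the UDT or GridFTP libraries. The sketch below is a stand-in built on Python's standard `socket` module: each file travels as a length-prefixed frame over a single connected socket pair, and a receiver thread reads frames until the sender closes. The frame layout and all function names are assumptions made for this example, not part of either protocol.

```python
import socket
import struct
import threading

HEADER = struct.Struct("!HQ")  # name length (2 bytes), payload length (8 bytes)


def send_file(sock, name, payload):
    """Send one named payload as a length-prefixed frame."""
    encoded = name.encode("utf-8")
    sock.sendall(HEADER.pack(len(encoded), len(payload)) + encoded + payload)


def recv_exact(sock, count):
    """Read exactly `count` bytes, or raise if the peer closes early."""
    chunks = []
    while count > 0:
        chunk = sock.recv(min(count, 65536))
        if not chunk:
            raise ConnectionError("connection closed mid-frame")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_files(sock):
    """Receive frames until the sender closes the connection; return {name: payload}."""
    received = {}
    while True:
        first = sock.recv(1)
        if not first:  # clean end of stream between frames
            return received
        name_len, size = HEADER.unpack(first + recv_exact(sock, HEADER.size - 1))
        name = recv_exact(sock, name_len).decode("utf-8")
        received[name] = recv_exact(sock, size)


def transfer_over_one_connection(files):
    """Push every entry of `files` through a single connected socket pair."""
    client, server = socket.socketpair()
    result = {}
    receiver = threading.Thread(target=lambda: result.update(recv_files(server)))
    receiver.start()
    with client:  # closing the client signals end of stream to the receiver
        for name, payload in files.items():
            send_file(client, name, payload)
    receiver.join()
    server.close()
    return result


if __name__ == "__main__":
    sample = {"a.txt": b"hello", "b.bin": bytes(200_000)}
    assert transfer_over_one_connection(sample) == sample
    print("all files arrived over one connection")
```

Timing this against a variant that opens a new connection per file is one way to quantify the saving from connection reuse; the same comparison would then be repeated with the real UDT or GridFTP API.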

10 - 15

Topic 9: MPI-Hadoop Comparison

Find an application or algorithm that can be parallelized but does not need any communication between the parallel processes.

Implement it using Hadoop and MPI.

Compare their performance.
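A Monte Carlo estimate of pi is one example of a computation whose parallel tasks never communicate: each task counts random hits independently, and a single reduction sums the counts. The sketch below shows that map and reduce structure in plain Python; it is an illustrative choice of application, not one prescribed by the assignment, and the function names are assumptions.

```python
import random


def count_hits(task):
    """Map step: count random points that fall inside the unit quarter circle."""
    seed, samples = task
    rng = random.Random(seed)  # a private generator keeps tasks independent
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(num_tasks, samples_per_task, map_fn=map):
    """Reduce step: sum the per-task hit counts and scale to an estimate of pi."""
    tasks = [(seed, samples_per_task) for seed in range(num_tasks)]
    total_hits = sum(map_fn(count_hits, tasks))
    return 4.0 * total_hits / (num_tasks * samples_per_task)


if __name__ == "__main__":
    print(f"pi is approximately {estimate_pi(8, 20_000):.4f}")
```

Because `map_fn` only has to apply a function to each task, the built-in `map` can be swapped for a parallel one such as the `map` method of a `multiprocessing.Pool`; the same split into independent tasks and one final sum carries over to a Hadoop job or to MPI ranks followed by a reduction.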

10 - 16

Topic 10: A Survey Report on Parallel and Distributed File Systems

You are asked to write an extensive report on popular, currently available parallel and distributed file systems (GPFS, Lustre, HDFS, PVFS, WheelFS, GFS, AFS).

Research performance comparison metrics for these file systems.

Open-source file systems could be installed; using performance benchmarking tools, conduct test cases in which you measure the read/write speeds.

Write a paper presenting a multidimensional comparison study, and provide test case results for selected sample file systems.
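For the read/write test cases, a small script pointed at a directory on the mounted file system can give a first estimate before moving to established benchmarking tools. The sketch below is a minimal example under stated assumptions: the function name, file size, and block size are illustrative, and the read figure may be inflated by operating system caching because the file is read back immediately after it is written.

```python
import os
import tempfile
import time


def measure_io(directory, size_mb=8, block_kb=1024):
    """Write then read a scratch file in `directory`; return (write MB/s, read MB/s)."""
    block = os.urandom(block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    total_mb = blocks * block_kb / 1024
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(blocks):
                handle.write(block)
            handle.flush()
            os.fsync(handle.fileno())  # push data to storage before stopping the clock
        write_seconds = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as handle:
            while handle.read(block_kb * 1024):
                pass
        read_seconds = time.perf_counter() - start
    finally:
        os.remove(path)  # leave no scratch file behind
    return total_mb / write_seconds, total_mb / read_seconds


if __name__ == "__main__":
    # Replace the temporary directory with a path on the file system under test.
    with tempfile.TemporaryDirectory() as scratch:
        write_rate, read_rate = measure_io(scratch)
        print(f"write: {write_rate:.1f} MB/s   read: {read_rate:.1f} MB/s")
```

Running the same function with several file and block sizes on each installed file system yields directly comparable numbers for the test case section of the paper.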