10 - 1
Introduction to Grid and Cloud Computing
Term Project Specification:
The Term Project demands in-depth research and investigative reporting. All reported contents, figures, and tables must be originally generated.
Ten topics are offered for students to choose from, with different topics for multiple disjoint groups of students to work on.
You have only 1 month to report the work, first through a proposal and then through a complete written report at the end of the semester, which you will also present.
The proposal, which will become your report at the end, should follow the IEEE Conference paper format and run about 10 pages, including original figure illustrations and tabulations plus a reference listing of 10 - 15 papers. (Template link: http://www.ieee.org/conferences_events/conferences/publishing/templates.html)
10 - 2
How to Write a Good Technical Paper in 10 Pages?
1. Title (< 8 words) must hit the hot topic: short, clear, and eye-catching. Authors and Affiliations follow in 1-2 lines after the title.
2. Abstract (50 to 100 words) must state the research objectives, summarize the findings, and highlight the innovative contributions.
3. Introduction (1 page, including the title and abstract) must motivate the readers to read the rest of the paper and prepare them with the necessary background.
4. Problem Statement and Formulation (2 pages): state the problem being solved and the basic assumptions, and formulate the problem with technical specifications.
5. Architecture, algorithms, solution methods, protocols, analytical results, illustrated examples, etc. (2 pages)
6. Experimental setting: computer simulators, benchmarks, and datasets used (1 page)
7. Experimental Results in plotted figures or tabulations, plus their interpretations and performance analysis (2 pages)
8. Related Work and Conclusions (1 page)
9. References – List of 15 relevant papers (1 page)
10 - 3
Candidate Project Topics:
Topic No. Topic Title Assignment
1 Use of XEN to create virtual machines, conduct some VM experiments, and report the performance measured
2 Exploring Amazon EC2, S3, or MapReduce, or virtual cluster, or private cloud for HPC scientific applications
3 Parallelization of a novel application idea using MPI or OpenMP, with analysis of the performance improvements
4 Using Hadoop or node.js for a distributed Web Application
5 Integration of Globus Online by using CLI or REST API for an application that needs data transfer capabilities
10 - 4
Candidate Project Topics :
Topic No. Topic Title Assignment
6 Stork – Globus Online Comparison through different metrics
7 Application of a scientific problem with a workflow in the Condor scheduler
8 Development of a client/server application that improves the performance of a high-performance data transfer protocol (GridFTP, UDT)
9 MPI-Hadoop Comparison
10 A survey on Parallel File System Comparison
10 - 5
Topic 1: Use of XEN for virtual machine (VM) creation and resource management through some VM application experiments
You are asked to install the XEN hypervisor on a local computer or on your own notebook.
Create Domain 0 (the control VM) and some User Domains (VM applications) for some selected benchmarks.
Collect the performance results. Discuss lessons learned from the XEN application experiments.
10 - 6
Prof. Kai Hwang
Suggested References for Topic 1:
1. M. Rosenblum, “Recent Advances in Virtual Machines and Operating Systems”, Keynote Address, ACM ASPLOS 2006.
2. J. Smith and R. Nair, Virtual Machines: Versatile Platforms for Systems and Processes, Morgan Kaufmann, 2005.
3. B. Sotomayor, R. Montero, and I. Foster, “Virtual Infrastructure Management in Private and Hybrid Clouds”, IEEE Internet Computing, Sept. 2009.
4. P. Barham, et al., “Xen and the Art of Virtualization”, Proc. of the 19th ACM Symp. on OS Principles (SOSP '03), ACM Press, 2003.
5. A. Menon, et al., “Diagnosing Performance Overheads in the Xen Virtual Machine Environment”, Proc. of the 1st Int’l Conf. on Virtual Execution Environments, 2005.
10 - 7
Topic 2: Exploring the use of Amazon EC2, S3, MapReduce, or virtual cluster, or private cloud in HPC scientific applications
This project requires you to use the available AWS virtual clusters (EC2, S3 instances), or the MapReduce cluster, or the private cloud offered on the AWS platform. A cluster of 64 to 120 nodes is desired.
You need to perform some benchmark experiments on these VM clusters, measure the performance, analyze the performance attributes, and identify performance bottlenecks.
Select some well-known high-performance scientific benchmark programs to carry out your experiments, or write your own testing program, for example for large-scale matrix multiplication.
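A self-written test program of the kind mentioned above could start from something like the following minimal sketch. It is only an illustration in pure Python on a single machine: the function names `matmul` and `benchmark` and the matrix sizes are assumptions, and a real project would use an optimized or distributed implementation on the cluster nodes.

```python
import time


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    inner = len(b)
    # Transpose b once so each dot product walks two flat sequences.
    b_cols = list(zip(*b))
    return [[sum(row[k] * col[k] for k in range(inner)) for col in b_cols]
            for row in a]


def benchmark(n, repeats=3):
    """Return (best_seconds, mflops) for an n x n multiplication."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul(a, b)
        best = min(best, time.perf_counter() - start)
    flops = 2 * n ** 3  # roughly n^3 multiplications and n^3 additions
    return best, flops / best / 1e6


if __name__ == "__main__":
    # Sizes are arbitrary illustration values, not a recommended test plan.
    for n in (32, 64, 128):
        seconds, mflops = benchmark(n)
        print(f"n={n:4d}  time={seconds:.4f}s  rate={mflops:.1f} MFLOP/s")
```

Running the same measurement for growing problem sizes and node counts gives the raw numbers from which bottlenecks can be identified.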
10 - 8
Key References for Topic 2:
1. K. Hwang, G. Fox and J. Dongarra, Distributed and Cloud Computing, Chapters 2, 4, 6, Morgan Kaufmann, Oct. 2011.
2. K. Hwang and Z. Xu, Scalable Parallel Computing, McGraw-Hill, Chapters 2 and 12, 1998.
3. E. Walker, “Benchmarking Amazon EC2 for High-Performance Scientific Computing,” ;login:, vol. 33, no. 5, pp. 18–23, 2008.
4. D. Kirk and W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach, Morgan Kaufmann, 2010.
10 - 9
Topic 3: Parallelization of a novel application idea using MPI or OpenMP, analyze the performance improvements
You are asked to find a computationally intensive application and parallelize it by using MPI or OpenMP.
Conduct a thorough performance analysis using multiple machines (or a multi-core computer in the absence of multiple machines).
Test your code by running it in both an SMP environment (a single computer with multiple cores) and a DSM environment (multiple computers connected via LAN).
Prepare different test cases by varying the machine architecture, problem size, etc.
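The performance analysis typically reduces to speedup and efficiency figures computed from measured run times. The sketch below shows one way to tabulate them; the function names and the timing values in the demo are placeholders for illustration, not measured results.

```python
def speedup(t_serial, t_parallel):
    """Speedup S(p) = T(1) / T(p)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, workers):
    """Parallel efficiency E(p) = S(p) / p."""
    return speedup(t_serial, t_parallel) / workers


def report(timings):
    """timings maps worker count -> measured seconds and must contain key 1.

    Returns rows of (workers, seconds, speedup, efficiency), sorted by workers.
    """
    t1 = timings[1]
    return [(p, timings[p], speedup(t1, timings[p]), efficiency(t1, timings[p], p))
            for p in sorted(timings)]


if __name__ == "__main__":
    # Placeholder timings; replace with your own measurements.
    placeholder = {1: 8.0, 2: 4.0, 4: 2.5}
    for p, seconds, s, e in report(placeholder):
        print(f"p={p}  time={seconds:.2f}s  speedup={s:.2f}  efficiency={e:.2f}")
```

Producing one such table per test case (architecture, problem size) makes the comparison between the SMP and LAN-connected runs straightforward to plot.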
10 - 10
Topic 4: Using Hadoop or node.js for a distributed Web Application
Design a web application that serves thousands of users.
Each user asks for a computationally intensive service.
Distribute the load of the service provided by the application to multiple machines at the back end by using technologies like Hadoop or node.js.
Analyze the performance of your application as the number of users increases.
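The scaling analysis can be organized around a small load harness that issues a growing number of requests and records their latencies. The following is a rough single-machine sketch only: `heavy_service` is a hypothetical stand-in for the real service, the thread pool stands in for back-end machines, and CPython threads do not actually run CPU-bound work in parallel, so the numbers only illustrate the shape of the experiment.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def heavy_service(n):
    """Hypothetical stand-in for a computationally intensive request."""
    return sum(i * i for i in range(n))


def run_load(n_users, workers, work_size=20_000):
    """Serve n_users requests on `workers` threads; return per-request latencies.

    Latency is measured from submission, so it includes queueing delay.
    """
    def one_request(submitted):
        heavy_service(work_size)
        return time.perf_counter() - submitted

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(one_request, time.perf_counter())
                   for _ in range(n_users)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    # User counts are arbitrary illustration values.
    for users in (10, 50, 100):
        latencies = run_load(users, workers=4)
        mean = sum(latencies) / len(latencies)
        print(f"users={users:4d}  mean={mean:.4f}s  max={max(latencies):.4f}s")
```

In the actual project the requests would go over HTTP to the deployed application, and the worker count would correspond to the number of back-end machines.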
10 - 11
Topic 5: Integration of Globus Online by using CLI or REST API for an application that needs data transfer capabilities
You are asked to design, or use an existing, application that needs data transfer capabilities.
Your application will integrate Globus Online as the data transfer capability and provide monitoring of the jobs as well.
The CLI could be used in a complex job that needs data transfers between nodes before starting execution.
The REST API could be used for any type of application.
10 - 12
Topic 6: Stork – Globus Online Comparison through different metrics
You are asked to install two GridFTP servers on two machines and integrate these with Globus Online.
Then install the Stork scheduler on one of the machines.
Design data transfer test cases and make a full comparison of the two tools.
Comparison criteria could include dataset characteristics, ease of use (Stork does not have an interface, so compare it with the GO CLI), individual transfer speed, and throughput.
Use Stork features like concurrent transfers and optimization.
10 - 13
Topic 7: Application of a scientific problem with a workflow in the Condor scheduler
Find a scientific problem that has complex computational and data transfer needs.
Design a workflow for the solution of the problem.
Apply the workflow by using the Condor scheduler.
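Before handing a workflow to a scheduler, it can help to write it down as a dependency graph and check which jobs may run together. The sketch below does this with Python's standard `graphlib` module (Python 3.9 or later); the job names are hypothetical, and this only models the ordering, it does not submit anything to Condor.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
WORKFLOW = {
    "fetch_input": set(),
    "preprocess": {"fetch_input"},
    "simulate_a": {"preprocess"},
    "simulate_b": {"preprocess"},
    "merge_results": {"simulate_a", "simulate_b"},
    "transfer_output": {"merge_results"},
}


def execution_stages(workflow):
    """Group jobs into stages; jobs in one stage do not depend on each other."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for number, jobs in enumerate(execution_stages(WORKFLOW), start=1):
        print(f"stage {number}: {', '.join(jobs)}")
```

Jobs that land in the same stage are candidates for concurrent execution, which is where the scheduler can shorten the overall run.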
10 - 14
Topic 8: Development of a client/server application that improves the performance of a high-performance data transfer protocol (GridFTP, UDT)
By using the GridFTP or UDT APIs, design a client/server model that optimizes the data transfers.
Ex: For UDT: Use the same connection for multiple file transfers, or apply a threaded server/client model to do concurrent file transfers over multiple sockets.
Ex: For GridFTP: Use the Java or C APIs to dynamically change the number of parallel streams or the concurrency level for a directory transfer.
Test your implementation to see any improvements.
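The threaded, concurrent-transfer idea can be prototyped before touching the protocol APIs. In the sketch below a local file copy stands in for a network transfer, so it uses no UDT or GridFTP calls at all; `transfer_all` and its `concurrency` parameter are illustrative names, and the timings it prints say nothing about real network behavior.

```python
import shutil
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def transfer_all(sources, dest_dir, concurrency=4):
    """Copy every file in `sources` into dest_dir using `concurrency` threads."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    def transfer_one(src):
        # Local copy used as a placeholder for one protocol-level transfer.
        target = dest_dir / Path(src).name
        shutil.copyfile(src, target)
        return target

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map preserves the order of `sources` in its results.
        return list(pool.map(transfer_one, sources))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src_dir = Path(tmp) / "src"
        src_dir.mkdir()
        files = []
        for i in range(8):
            path = src_dir / f"file{i}.bin"
            path.write_bytes(b"x" * 1_000_000)
            files.append(path)
        for level in (1, 4):
            start = time.perf_counter()
            transfer_all(files, Path(tmp) / f"dst{level}", concurrency=level)
            print(f"concurrency={level}  time={time.perf_counter() - start:.4f}s")
```

Varying the concurrency level while keeping the file set fixed is the same experiment you would run against the real protocol to see whether the optimization pays off.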
10 - 15
Topic 9: MPI-Hadoop Comparison
Find an application or algorithm that can be parallelized but does not need any communication between the parallel processes.
Implement it using Hadoop and MPI.
Compare their performance.
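A problem of this kind splits into independent chunks whose partial results are combined only at the end. The sketch below uses prime counting as one possible example; the choice of problem and all function names are assumptions, and the built-in `map` runs sequentially here, standing in for MPI ranks or Hadoop map tasks.

```python
def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def split_range(lo, hi, parts):
    """Split [lo, hi) into `parts` contiguous chunks of near-equal size."""
    step, extra = divmod(hi - lo, parts)
    chunks, start = [], lo
    for i in range(parts):
        end = start + step + (1 if i < extra else 0)
        chunks.append((start, end))
        start = end
    return chunks


def count_primes(chunk):
    """Work done by one process: needs only its own chunk, no communication."""
    lo, hi = chunk
    return sum(1 for n in range(lo, hi) if is_prime(n))


def parallel_count(lo, hi, parts):
    """Map independent chunks to partial counts, then reduce by summing."""
    return sum(map(count_primes, split_range(lo, hi, parts)))


if __name__ == "__main__":
    print(parallel_count(0, 100, 4))  # 25 primes below 100
```

The only step that touches more than one chunk is the final sum, which corresponds to a reduce in either framework.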
10 - 16
Topic 10: A Survey Report on Parallel and Distributed File Systems
You are asked to write an extensive report on popular, currently available parallel and distributed file systems (GPFS, Lustre, HDFS, PVFS, WheelFS, GFS, AFS).
Research performance comparison metrics for these file systems.
Open source file systems could be installed, and by using performance benchmarking tools you could conduct test cases where you measure the read/write speeds.
Write a paper presenting a multidimensional comparison study and provide test case results for selected sample file systems.
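A very simple read/write speed test can look like the sketch below, which writes and reads back one temporary file in a chosen directory. It is an illustration rather than a substitute for an established benchmarking tool: the function name and sizes are assumptions, and the read figure will usually reflect the operating system's page cache rather than the underlying file system.

```python
import os
import tempfile
import time


def measure_io(directory, size_mb=16, block_kb=1024):
    """Return (write_MB_per_s, read_MB_per_s) for one file in `directory`."""
    block_bytes = block_kb * 1024
    block = os.urandom(block_bytes)
    blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force the data out of user and OS buffers
        write_seconds = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(block_bytes):
                pass
        read_seconds = time.perf_counter() - start
    finally:
        os.remove(path)
    megabytes = blocks * block_kb / 1024
    return megabytes / write_seconds, megabytes / read_seconds


if __name__ == "__main__":
    # Point `directory` at a mount of the file system under test.
    write_rate, read_rate = measure_io(tempfile.gettempdir())
    print(f"write={write_rate:.1f} MB/s  read={read_rate:.1f} MB/s")
```

Repeating the measurement across file sizes, block sizes, and client counts gives several of the dimensions a comparison study would need.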