View
2
Download
0
Category
Preview:
Citation preview
Basics of Cloud Computing – Lecture 7
Cloud Computing – Summary and
Research at UTResearch at UT
Satish Srirama
Outline
• Cloud computing – Summary
• Research at UT on cloud
– SciCloud
– Mobile Cloud– Mobile Cloud
– D2CM
23.05.2012 2/57Satish Srirama
What is Cloud Computing?
• Computing as a utility
– Consumers pay based on their usage
• Cloud Computing characteristics
– Illusion of infinite resources
– No up-front cost
– Fine-grained billing (e.g. hourly)
• Gartner: “Cloud computing is a style of computing where massively scalable IT-related capabilities are provided ‘as a service’ across the Internet to multiple external customers”
23.05.2012 3/57Satish Srirama
Cloud Computing – Services - Recap
• Software as a Service – SaaS– A way to access applications
hosted on the web through your web browser
• Platform as a Service – PaaS– A pay-as-you-go model for IT
resources accessed over the
SaaS
Facebook, Flikr, Myspace.com,
Google maps API, Gmail
Level of
Abstraction
resources accessed over the Internet
• Infrastructure as a Service –IaaS– Use of commodity computers,
distributed across Internet, to perform parallel processing, distributed storage, indexing and mining of data
– Virtualization
PaaS
Google App Engine,
Force.com, Hadoop, Azure, Amazon S3, etc
IaaS
Amazon EC2, SciCloud,
Joyent Accelerators, Nirvanix Storage Delivery Network, etc.
23.05.2012 4/57Satish Srirama
Scientific Computing Cloud (SciCloud)
• Distributed Systems Group owned private cloud infrastructure
• Eucalyptus setup
• Goal of the project• Goal of the project
– To establish a private cloud at the university
– To efficiently use the already existing resources of universities
– To address computationally intensive scientific, mathematical, and academic problems
• Hope it was a pleasant experience !!!
23.05.2012 5/57Satish Srirama
Economics of Cloud Providers –
Failures - Recap
• Cloud Computing providers bring a shift from high reliability/availability servers to commodity servers– At least one failure per day in large datacenter
• Caveat: User software has to adapt to failures• Caveat: User software has to adapt to failures
• Solution: Replicate data and computation– MapReduce & Distributed File System
• MapReduce = functional programming meets distributed processing on steroids – Not a new idea… dates back to the 50’s (or even 30’s)
23.05.2012 6/57Satish Srirama
MapReduce
• Programmers specify two functions:map (k, v) → <k’, v’>*reduce (k’, v’) → <k’, v’>*– All values with the same key are reduced together
• The execution framework handles everything else…• Not quite…usually, programmers also specify:• Not quite…usually, programmers also specify:
partition (k’, number of partitions) → partition for k’– Often a simple hash of the key, e.g., hash(k’) mod n– Divides up key space for parallel reduce operationscombine (k’, v’) → <k’, v’>*– Mini-reducers that run in memory after the map phase– Used as an optimization to reduce network traffic
23.05.2012 7/57Satish Srirama
Hadoop Processing Model
• Create or allocate a cluster
• Put data onto the file system (HDFS)– Data is split into blocks
– Replicated and stored in the cluster
• Run your job• Run your job– Copy Map code to the allocated nodes
• Move computation to data, not data to computation
– Gather output of Map, sort and partition on key
– Run Reduce tasks
• Results are stored in the HDFS
23.05.2012 8/57Satish Srirama
MapReduce Examples
• Distributed Grep
• Count of URL Access Frequency
• Reverse Web-Link Graph
• Inverted Index • Inverted Index
• Distributed Sort
23.05.2012 9/57Satish Srirama
Synchronization in Hadoop
• Approach 1: turn synchronization into an ordering problem– Sort keys into correct order of computation
– Partition key space so that each reducer gets the appropriate set of partial results
– Hold state in reducer across multiple key-value pairs to perform computationcomputation
– Illustrated by the “pairs” approach in calculating conditional probability of words
• Approach 2: construct data structures that “bring the pieces together”– Each reducer receives all the data it needs to complete the
computation
– Illustrated by the “stripes” approach
23.05.2012 10/57Satish Srirama
Information Retrieval
• We have checked Boolean retrieval
• We also have seen Ranked retrieval in corpus
– TF.IDF
– We have seen how to calculate similarity between – We have seen how to calculate similarity between
documents
• How several MapReduce jobs can be used to
solve particular problem
23.05.2012 11/57Satish Srirama
Cloud Providers
• Seen several services from Amazon
– Amzon EC2, S3, EBS
– Amazon CloudFormation and others
• Google AppEngine• Google AppEngine
• Aneka framework
• Force.com
23.05.2012 12/57Satish Srirama
RESEARCH AT UT
23.05.2012 13Satish Srirama
SciCloud [Srirama et al, CCGrid 2010] – Research
topics
• Have customized images supporting
– Scientific Computing
• NumPy
• SciPy
• Enterprise computing• Enterprise computing
– Ongoing research with auto scaling
– Porting enterprise application onto the cloud
– Ongoing research with load balancing
• Clouds promise virtually infinite resources
– Probably good for HPC!!! Are they?
23.05.2012 14/57Satish Srirama
Troubles with HPC on Cloud
• Shift from high reliability/availability servers to
commodity servers
• Communication is huge trouble [Srirama et al, SPJ
2011]2011]
– Virtualization technology is the culprit
– Thoroughly analyzed MPI on cloud
• Tests conducted with NAS PB
23.05.2012 15/57Satish Srirama
Adapting computing problems to cloud
• Reducing the algorithms to cloud computing frameworks like MapReduce [Srirama et al, FGCS 2012]
• Designed a classification on how the algorithms can be adapted to MRalgorithms can be adapted to MR
– Algorithm � single MapReduce job
– Algorithm � n MapReduce jobs
– Each iteration in algorithm � single MapReduce job
– Each iteration in algorithm � n MapReduce jobs
23.05.2012 16/57Satish Srirama
Example algorithms for classification [Srirama et al, FGCS 2012]
• Conjugate Gradient (CG) – Class 4
• K-medoid clustering
– Partitioning Around Medoids (PAM) – Class 3
– Clustering large applications (CLARA) – Class 2– Clustering large applications (CLARA) – Class 2
• Factoring integers – Class 1
23.05.2012 17/57Satish Srirama
Conjugate Gradient
• Iterative method for solving systems of linear equations
• Makes an initial guess of the solution
• Applies different matrix and vector operations at each vector operations at each iteration to improve the guess
• CG is a complex algorithm, it is not possible to reduce the whole algorithm to MapReduce model
• Adapted the matrix and vector operations used by CG instead
23.05.2012 18/57Satish Srirama
PAM
• Iterative k-medoid clustering algorithm
• Map– Find the closest medoid and assign the
object to the cluster of the medoid
– Input: (cluster id, object)
– Output: (new cluster id, object)
• Reduce• Reduce– Find which object is the most central and
assign it as the new medoid of the cluster
– Input: (cluster id, (list of all objects in the cluster))
– Output: (cluster id, new medoid)
23.05.2012 19/57Satish Srirama
CLARA in MapReduce
• The algorithm can be reduced to 2
MapReduce Jobs
• MapReduce job I (Finding the Candidate sets)
– Map: (Assign random key to each point)– Map: (Assign random key to each point)
• Input: < key, point >
• Output < random key, point >
– Reduce: (Pick first S points and use PAM on them)
• Output < key, PAM(S points) >
– result of PAM() is a set of k medoids
23.05.2012 20/57Satish Srirama
CLARA in MapReduce - continued
• MapReduce job II (Measuring the quality)
– Map: (Find each points distance from it's closest
medoid, for each candidate set)
• Input: < key, point >• Input: < key, point >
• Output: < candidate set
key, distance > – C different
sums, one for each
candidate set
– Reduce: (Sum distances)
23.05.2012 21/57Satish Srirama
PAM vs CLARA
23.05.2012 22/57Satish Srirama
Observations
• CG– Operations used inside each iteration are reduced to MapReduce
model
– Low efficiency and scalability
• PAM– Content of a whole iteration can be reduced to MapReduce model
– Low efficiency and scalability– Low efficiency and scalability
• CLARA– Iterations are removed, everything can be reduced to two different
MapReduce jobs
– Good scalability and efficiency
• Factoring integers– Everything can be reduced to one MapReduce job
– Very good efficiency and scalability
23.05.2012 23/57Satish Srirama
Results of the study
• Adapting scientific computing problems to cloud environments is not a trivial task
• MapReduce is suited better for embarrassingly parallel algorithms
• CG and PAM have serious problems with job • CG and PAM have serious problems with job latency– Most of the time is spent on the background tasks
• Reading data from the HDFS
• Input need to be read again and again, even if much of it does not change after each MapReduce job
• Approximately 18 seconds latency with Hadoop for job initiation
23.05.2012 24/57Satish Srirama
Alternative MapReduce: Twister
• Designed for iterative algorithms, by providing long running Map and Reduce tasks
• Ability to keep input data in memory across multiple MapReduce executions
• Disadvantages:• Disadvantages:– Not perfect fault tolerance
– No distributed file system integrated
– Has stability issues
– Input data must fit into the collective memory of the cluster
– Twister has much shorter startup time. But still too high for real time applications. (~3 sec)
23.05.2012 25/57Satish Srirama
Stratus
• Building our own framework for iterative
problems [Jakovits and Srirama, CCGrid 2012]
• Based on Bulk Synchronous Parallel
– Consists of a sequence of super steps– Consists of a sequence of super steps
– Concurrent tasks execute the same code on local data
partition
– Send messages to other tasks, which are received at
the next super step
– Barrier synchronization at the end of every super step
23.05.2012 26/57Satish Srirama
MOBILE CLOUD
23.05.2012 27Satish Srirama
Mobile is 7th Mass Media Channel
Tomi T Ahonen, Mobile as 7th of the Mass Media, http://mobile7th.futuretext.com/
23.05.2012 28/57Satish Srirama
Some numbers
• There are lot of mobile phones
– 5.6 billion subscriptions with global population of
6.97 billion
– > 3.6 billion people with at least one mobile– > 3.6 billion people with at least one mobile
• Mobile data services revenue totals $314.7
Billion
http://www.gartner.com/it/page.jsp?id=1759714
23.05.2012 29/57Satish Srirama
Popular consumer mobile applications
• Location-based services (LBSs)
– Deliver services to users based on his location
• Mobile social networking
– Most popular social networking platforms have – Most popular social networking platforms have apps for mobiles
• Mobile commerce
– An extension of e-commerce
• Mobile payment
– Near field communication (NFC) payment
23.05.2012 30/57Satish Srirama
Popular consumer mobile applications
- continued
• Context-aware services
– Context means person's interests, history, environment, connections, preferences etc.
– Proactively serve up the most appropriate content, product or service
–content, product or service
• Mobile instant messaging (MIM)
– Skype for mobiles
• Mobile e-mail
• Mobile video
23.05.2012 31/57Satish Srirama
Mobile Cloud Lab
• Interested in developing mobile applications
– Social networks, m-learning, m-health etc.
• Work with multiple platforms
– Android, iOS, Windows Phone 7, Sensor kits etc.
– Have ~50 latest phones
• Offered courses
– Mobile Application Development - MTAT.03.262 (Fall 2012)
– Mobile Application Development - Projects (MTAT.03.266) (Spring & Fall 2012)
23.05.2012 Satish Srirama 32/57
Mobile Cloud Computing
• One can do interesting things on mobiles
directly
– Today’s mobiles are far more capable
• However, some applications need to offload • However, some applications need to offload
certain activities to servers
– Processing sensor data
• Resource-intensive processing on the cloud
– To enrich the functionality of mobile applications
23.05.2012 33/57Satish Srirama
Mobile Cloud Applications
• Mobile has significant advantage by going
cloud-aware
– Increased data storage capacity
– Availability of unlimited processing power– Availability of unlimited processing power
– Extended battery life [Rudenko et al., SIGMOBILE 1998]
• Mobile Cloud based mash-up applications
– Mobiles need to possess multiple APIs
– Cloud interoperability is a huge trouble
23.05.2012 34/57Satish Srirama
Mobile Cloud Middleware
[Srirama et al, ICIW 2006]
[Flores et al, MoMM 2011]
23.05.2012 35/57Satish Srirama
MCM – enables
• Interoperability between different Cloud
Services (IaaS, SaaS, PaaS) and Providers
(Amazon, Eucalyptus, etc)
• Bringing the benefits of the Cloud to the • Bringing the benefits of the Cloud to the
Mobile Domain
• Composition of different Cloud Services
• Asynchronous communication between the
device and MCM
23.05.2012 36/57Satish Srirama
Technological Choices – Asynchronous
Notification
• Via third party services
– Apple Push Notification Service (APNS)
– Android Cloud to Device Messaging Framework (C2DM)
– Microsoft Push Notification Service (MPN)– Microsoft Push Notification Service (MPN)
• Mobile Host [Srirama et al, 2006]
– Providing web services from smart phones
– MCM can directly send the response back to the device
– Support for Android, iOS and J2ME
23.05.2012 37/57Satish Srirama
CroudSTag – Scenario
• CroudSTag takes the pictures/videos from the cloud and tries to recognize people
– Pictures/Videos are actually taken by the phone
– Processes the videos
– Recognizes people using facial recognition technologies– Recognizes people using facial recognition technologies
• Reports the user a list of people recognized in the pictures
• The user decides whether to add them or not to the social group
• The people selected by the user receive a message in Facebook inviting them to join the social group
23.05.2012 38/57Satish Srirama
CroudSTag [Srirama et al, PCS 2011]
• Cloud services used
– Media storage on
Amazon S3
– Processing videos on Facial Recognition
Process
Taking picture/video
using the camera
Selecting CloudMain Menu
Send Asynchronous
Notification and Results
Storage Services
1.
2.
3.
Login
– Processing videos on
Elastic MapReduce
– face.com to recognize
people on facebook
– Starting social group
on facebookSend invitation to the
social group
Selecting people
Selecting Cloud
Authentication
Start Process
4.
5.
6.
7.
8.
9.
Send next
invitation
Back to Menu
23.05.2012 39/57Satish Srirama
D2CM
23.05.2012 40Satish Srirama
Desktop to Cloud Migration [Srirama et al,
FGCS 2013/X]
• D2CM is a tool developed in the University of Tartu as part of
an FP7 project called REMICS
– “Reuse and Migration of Legacy Applications to Interoperable Cloud
Services”
• Enables scientists to deploy scientific experiments on Cloud • Enables scientists to deploy scientific experiments on Cloud
infrastructure (IaaS)
– By migrating local desktop virtual machines to the cloud
– Scenario
• Main assumptions:
– Non computer-scientists do not have significant knowledge of
computer science, clouds and migration procedures
– They are only interested in submitting an experiment and collecting
the results after some time
23.05.2012 41/57Satish Srirama
D2CM Goals
• Simplify the use of on-demand cloud resources
• Support full start-to-end migration of scientific
experiments to the cloud
• Retain total control over the environment and • Retain total control over the environment and
installed libraries
– Choosing correct optimization libraries is vital for HPC
• Easy to use
23.05.2012 42/57Satish Srirama
D2CM Goals - continued
• Enable to scale up scientific experiments
–Vertical scalability
–Horizontal scalability
• Concurrent execution of parallel subtasks
– MPI, OpenMP, ...
• Batch processing
– Multiple experiments at once
23.05.2012 43/57Satish Srirama
D2CM execution flow
• Enter cloud credentials
• Add a desktop image
– Where everything needed is installed
– Currently we are using VirtualBox– Currently we are using VirtualBox
• Migrate the image to a specific cloud
23.05.2012 44/57Satish Srirama
Migrate image
• Started with migration to Amazon and Eucalyptus
– Both based on XEN
– Proof-of-Concept implementation
• Transform it into a compatible cloud image• Transform it into a compatible cloud image
– Extract the file system
– Package kernels
– Install additional software
• Move it to the target cloud
23.05.2012 45/57Satish Srirama
Describe the deployment
• Create deployment template to describe the configuration of the IaaS resources
– Define roles
• Instance type, what resources allocated
• Number of instances• Number of instances
– Define actions for each role
• Uploads
• Initialization commands
• Run commands
• Deployment ending conditions
• Downloads
23.05.2012 46/57Satish Srirama
23.05.2012 47/57Satish Srirama
Deploy experiments
• Start the required instances
– Certain number for each role
– Certan type of cloud instance
• Execute the role scpecific commands• Execute the role scpecific commands
• Track the ending conditions
• Shutdown the instances
23.05.2012 48/57Satish Srirama
Monitor experiments
• Uses CollectD to keep track of Linux Virtual machines
– Memory usage
– Processor load
– Network
• Combined and separate graphical views for roles
• Scientist can see:
– how well the resources are utilized
– is the deployment working as intended
– was the deployment chosen correctly
23.05.2012 49/57Satish Srirama
Monitoring
23.05.2012 50/57Satish Srirama
Cost prediction
• Monetary cost is important when running
experiments in public cloud
– Price per hour cost
– Estimating the total runtime based on previous – Estimating the total runtime based on previous
experiments
23.05.2012 51/57Satish Srirama
23.05.2012 52/57Satish Srirama
FURTHER RESEARCH ON CLOUD
23.05.2012 53Satish Srirama
Data Mining on the Cloud
• Data storage and retrieval
– Also interested in processing, analysis, mining,
warehousing and presentation
• Research interests• Research interests
– NoSQL and cloud scale data storage solutions
– Migrating relational databases to non-relational
systems
– Implementing graph algorithms on graph
databases
23.05.2012 54/57Satish Srirama
Further research interests
• Cloud economics– Predicting costs based on previous executions
– Providing best estimates on ideal deployment configuration, given problem size and resource costs • Have the setup to experiment with load balancers• Have the setup to experiment with load balancers
• Distributed troubleshooting [Shor & Srirama, CLOSER 2011; Shor et al, DOA-SVI 2011]
– Finding memory leaks in distributed applications
– Adapting the tools to cloud solutions
• Bioinformatics on the cloud
• Refactoring enterprise applications for cloud
23.05.2012 55/57Satish Srirama
How you can contribute?
• Several open Bachelor/Master theses
http://ds.cs.ut.ee/theses
• Your course projects can be extended to thesestheses
• You can also do internship in summer
• All new interesting ideas are welcome
• Good students always have paid positions
– Contact srirama AT ut.ee
23.05.2012 56/57Satish Srirama
Any Questions?
• All the best with exam next week
23.05.2012 57/57Satish Srirama
References
• S. N. Srirama, P. Jakovits, E. Vainikko: Adapting Scientific Computing Problems to Clouds using MapReduce, Future Generation Computer Systems Journal, 28(1):184-192, 2012. Elsevier press. DOI 10.1016/j.future.2011.05.025
• H. Flores, S. N. Srirama, C. Paniagua: A Generic Middleware Framework for Handling Process Intensive Hybrid Cloud Services from Mobiles, The 9th International Conference on Advances in Mobile Computing & Multimedia (MoMM-2011), December 5-7, 2011, pp. 87-95. ACM.(MoMM-2011), December 5-7, 2011, pp. 87-95. ACM.
• S. N. Srirama, C. Paniagua, H. Flores: CroudSTag: Social Group Formation with Facial Recognition and Mobile Cloud Services, The 8th International Conference on Mobile Web Information Systems (MobiWIS 2011), September 19-21, 2011, v. 5 of Procedia Computer Science, pp. 633-640. Elsevier. doi: 10.1016/j.procs.2011.07.082.
• More details at http://math.ut.ee/~srirama/publications.html
23.05.2012 58/57Satish Srirama
Recommended