Explorations into Internet Distributed Computing
Kunal Agrawal, Ang Huey Ting, Li Guoliang, and Kevin Chu
Project Overview
Design and implement a simple internet distributed computing framework
Compare application development in this environment with development in a traditional parallel computing environment.
Grapevine
An Internet Distributed Computing Framework - Kunal Agrawal, Kevin Chu
What is Internet Distributed Computing?
Motivation
- Supercomputers are very expensive
- Large numbers of personal computers and workstations around the world are naturally networked via the internet
- Huge amounts of computational resources are wasted because many computers spend most of their time idle
- Growing interest in grid computing technologies
Other Distributed Computing Efforts
Internet Distributed Computing Issues
- Node reliability
- Network quality
- Scalability
- Security
- Cross-platform portability of object code
- Computing paradigm shift
Overview Of Grapevine
Architecture: a client application submits work to the Grapevine server, which distributes tasks to Grapevine volunteer nodes.
Grapevine Features
- Written in Java
- Parametrized tasks
- Inter-task communication
- Result reporting
- Status reporting
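The slides do not show Grapevine's actual API, so the following is a purely hypothetical Java sketch of what a parametrized, serializable task might look like; the `Task` interface and `SquareTask` names are invented for illustration.

```java
import java.io.Serializable;

// Hypothetical task interface: these names are invented, not Grapevine's real API.
interface Task extends Serializable {
    Serializable run(Serializable parameter); // executed on a volunteer node
}

public class SquareTask implements Task {
    // A trivial parametrized task: squares its integer parameter.
    public Serializable run(Serializable parameter) {
        int n = (Integer) parameter;
        return n * n; // the result would be reported back to the server
    }

    public static void main(String[] args) {
        System.out.println(new SquareTask().run(7)); // prints 49
    }
}
```

Serializability matters here because both the task's parameters and its result must travel over the network between the server and volunteer nodes.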
Unaddressed Issues
- Node reliability
- Load balancing
- Unintrusive operation
- Interruption semantics
- Deadlock
Meta Classifier
- Ang Huey Ting, Li Guoliang
Classifier
- A classifier is a function: classify(instance) → {True, False}
- Machine learning approach:
  - Build a model on the training set
  - Use the model to classify new instances
- Publicly available packages: WEKA (in Java), MLC++
Meta Classifier
- An assembly (ensemble) of classifiers
- Gives better performance than a single classifier
- Two ways of generating an assembly of classifiers:
  - Different training data sets
  - Different algorithms
- Predictions are combined by voting
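Majority voting over an ensemble's boolean predictions can be sketched in a few lines of Java; this is a minimal illustration of the voting step, not WEKA's or Grapevine's implementation.

```java
import java.util.List;

public class MajorityVote {
    // Returns true if strictly more than half of the classifiers vote true.
    static boolean vote(List<Boolean> predictions) {
        int yes = 0;
        for (boolean p : predictions) {
            if (p) yes++;
        }
        return yes * 2 > predictions.size();
    }

    public static void main(String[] args) {
        // Three classifiers vote on one instance: two say true, one says false.
        System.out.println(vote(List.of(true, true, false))); // prints true
    }
}
```

With an odd number of classifiers there is always a strict majority; with an even number a tie is resolved here as false, which is one of several possible tie-breaking choices.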
Building Meta Classifier
- Different training data sets - bagging:
  - Randomly generated "bags"
  - Selection with replacement
  - Creates different "flavors" of the training set
- Different algorithms:
  - e.g. naïve Bayes, neural network, SVM
  - Different algorithms work well on different training sets
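The bootstrap sampling at the heart of bagging, selection with replacement until the bag matches the training set's size, can be sketched as follows; this is a generic illustration, not the project's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class Bagging {
    // Draw train.size() items from the training set with replacement,
    // producing one "bag" (one flavor of the training set).
    static <T> List<T> bootstrapSample(List<T> train, Random rng) {
        List<T> bag = new ArrayList<>(train.size());
        for (int i = 0; i < train.size(); i++) {
            bag.add(train.get(rng.nextInt(train.size())));
        }
        return bag;
    }

    public static void main(String[] args) {
        List<Integer> train = List.of(1, 2, 3, 4, 5);
        List<Integer> bag = bootstrapSample(train, new Random(42));
        System.out.println(bag); // same size as train; duplicates are likely
    }
}
```

Because each draw is independent, some training instances appear several times in a bag while others are left out entirely, which is exactly what makes the resulting classifiers differ from one another.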
Why Parallelise?
- Computationally intensive
- One classifier = 0.5 hr
- Meta classifier (assembly of 10 classifiers) = 10 × 0.5 = 5 hr
- Distributed environment - Grapevine:
  - Build classifiers in parallel, independently
  - Little communication required
Distributed Meta Classifiers
- WEKA - machine learning package
  - University of Waikato, New Zealand
  - http://www.cs.waikato.ac.nz/~ml/weka/
  - Implemented in Java
  - Includes the most popular machine learning tools
Distributed Meta-Classifiers on Grapevine
Distributed bagging:
- Generate different bags
- Define a bag and an algorithm for each task
- Submit the tasks to Grapevine
- Volunteer nodes build the classifiers
- Receive the results
- Perform voting
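The end-to-end workflow above can be followed in a sequential simulation; since Grapevine's submission API is not shown in the slides, each "task" below simply runs locally, and a trivial majority-label rule stands in for a real WEKA classifier.

```java
import java.util.List;
import java.util.Random;

public class DistributedBagging {
    // Simulates the distributed-bagging workflow on boolean-labeled data:
    // generate bags, run one stand-in "classifier task" per bag, then vote.
    static boolean metaClassify(List<Boolean> train, int numBags, long seed) {
        Random rng = new Random(seed);
        int yesVotes = 0;
        for (int t = 0; t < numBags; t++) {
            // 1. Generate one bootstrap bag (selection with replacement).
            int yes = 0;
            for (int i = 0; i < train.size(); i++) {
                if (train.get(rng.nextInt(train.size()))) yes++;
            }
            // 2-4. Stand-in classifier task: predict the bag's majority label.
            //      In the real system this step would run on a volunteer node.
            if (yes * 2 > train.size()) yesVotes++;
        }
        // 5-6. Receive the results and perform the final vote.
        return yesVotes * 2 > numBags;
    }

    public static void main(String[] args) {
        List<Boolean> train = List.of(true, true, true, false, false);
        System.out.println(metaClassify(train, 10, 7L));
    }
}
```

The only step that changes in the real system is where each per-bag task executes: the bag generation, result collection, and final voting remain on the client side, which is why so little communication is required.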
Preliminary Study
- Bagging on Quick Propagation in OpenMP
- Implemented in C
Trial Domain
- Benchmark corpus: Reuters-21578 for text categorization
  - 9000+ training documents
  - 3000+ test documents
  - 90+ categories
- Perform feature selection
- Preprocess documents into feature vectors
Summary
- Successful internet distributed computing requires addressing many issues outside of traditional computer science
- Distributed computing is not for everyone