Explorations into Internet Distributed Computing Kunal Agrawal, Ang Huey Ting, Li Guoliang, and Kevin Chu

Explorations into Internet Distributed Computing

Kunal Agrawal, Ang Huey Ting, Li Guoliang, and Kevin Chu

Project Overview

Design and implement a simple internet distributed computing framework.

Compare application development for this environment with development for a traditional parallel computing environment.

Grapevine

An Internet Distributed Computing Framework

- Kunal Agrawal, Kevin Chu

What is Internet Distributed Computing?

Motivation

- Supercomputers are very expensive
- Large numbers of personal computers and workstations around the world are naturally networked via the internet
- Huge amounts of computational resources are wasted because many computers spend most of their time idle
- Growing interest in grid computing technologies

Other Distributed Computing Efforts

Internet Distributed Computing Issues

- Node reliability
- Network quality
- Scalability
- Security
- Cross-platform portability of object code
- Computing paradigm shift

Overview Of Grapevine

[Diagram: a Client Application, a Grapevine Server, and three Grapevine Volunteers]

Grapevine Features

- Written in Java
- Parametrized tasks
- Inter-task communication
- Result reporting
- Status reporting

Un-addressed Issues

- Node reliability
- Load balancing
- Un-intrusive operation
- Interruption semantics
- Deadlock

Meta Classifier

- Ang Huey Ting, Li Guoliang

Classifier

- Function(instance) ∈ {True, False}
- Machine learning approach
  - Build a model on the training set
  - Use the model to classify new instances
- Publicly available packages: WEKA (in Java), MLC++
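
The build-then-classify workflow above can be illustrated with a toy example. This is only a minimal sketch, not WEKA or MLC++ code: the function name `train_threshold_classifier`, the one-dimensional data, and the midpoint-threshold model are all assumptions chosen for illustration.

```python
from statistics import mean


def train_threshold_classifier(training_set):
    """Build a model from (value, label) pairs.

    Hypothetical toy model: the decision threshold is the midpoint
    between the mean of the True examples and the mean of the False ones.
    """
    pos = [x for x, label in training_set if label]
    neg = [x for x, label in training_set if not label]
    threshold = (mean(pos) + mean(neg)) / 2
    positive_is_high = mean(pos) > mean(neg)

    def classify(instance):
        # Function(instance) in {True, False}
        return (instance > threshold) == positive_is_high

    return classify


training_set = [(1.0, False), (2.0, False), (8.0, True), (9.0, True)]
classify = train_threshold_classifier(training_set)
print(classify(7.5))  # True
print(classify(0.5))  # False
```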

Meta Classifier

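- Assembly (ensemble) of classifiers
- Usually gives better performance than a single classifier
- Two ways of generating an assembly of classifiers
  - Different training data sets
  - Different algorithms
- Individual predictions are combined by voting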
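
Voting can be sketched as a simple majority count over the individual classifiers' predictions. The slides do not say which voting scheme was used, so unweighted majority voting and the name `majority_vote` are assumptions here.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    On a tie, Counter.most_common keeps first-seen order, so the label
    that appeared first among the tied ones wins.
    """
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]


print(majority_vote([True, False, True]))  # True
```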

Building Meta Classifier

- Different training datasets: bagging
  - Randomly generated 'bags'
  - Selection with replacement
  - Creates different 'flavors' of the training set
- Different algorithms
  - E.g. Naïve Bayes, neural net, SVM
  - Different algorithms work well on different training sets
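
Bag generation by selection with replacement might look like the following sketch. The slides do not state the bag size; making each bag as large as the original training set is an assumption (it is the usual choice for bootstrap samples), and `make_bags` is a hypothetical name.

```python
import random


def make_bags(training_set, n_bags, seed=0):
    """Draw n_bags bootstrap samples from training_set.

    Each bag has the same size as the training set and is drawn with
    replacement, so some examples repeat and others are left out.
    """
    rng = random.Random(seed)  # fixed seed only to make runs repeatable
    size = len(training_set)
    return [
        [rng.choice(training_set) for _ in range(size)]
        for _ in range(n_bags)
    ]


training_set = [(1.0, False), (2.0, False), (8.0, True), (9.0, True)]
bags = make_bags(training_set, n_bags=3)
for bag in bags:
    print(bag)
```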

Why Parallelise?

- Computationally intensive
  - One classifier = 0.5 hr
  - Meta classifier (assembly of 10 classifiers) = 10 × 0.5 hr = 5 hr
- Distributed environment: Grapevine
  - Build classifiers in parallel, independently
  - Little communication required
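
The arithmetic above extends to a rough best-case estimate for any number of nodes. This sketch assumes equally fast nodes and ignores communication, scheduling, and node failures, none of which the slides quantify; `ideal_build_hours` is a hypothetical helper.

```python
import math


def ideal_build_hours(n_classifiers, hours_per_classifier, n_nodes):
    """Best-case wall-clock time when classifiers are built independently.

    Each node builds one classifier at a time, so the work finishes in
    ceil(n_classifiers / n_nodes) rounds.
    """
    rounds = math.ceil(n_classifiers / n_nodes)
    return rounds * hours_per_classifier


print(ideal_build_hours(10, 0.5, 1))   # 5.0  (sequential, as on the slide)
print(ideal_build_hours(10, 0.5, 4))   # 1.5
print(ideal_build_hours(10, 0.5, 10))  # 0.5
```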

Distributed Meta Classifiers

- WEKA: machine learning package
  - University of Waikato, New Zealand
  - http://www.cs.waikato.ac.nz/~ml/weka/
  - Implemented in Java
  - Includes most popular machine learning tools

Distributed Meta-Classifiers on Grapevine

- Distributed bagging
  - Generate different bags
  - Define the bag and algorithm for each task
  - Submit tasks to Grapevine
  - Nodes build classifiers
  - Receive results
  - Perform voting
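
The steps above can be sketched end to end on a single machine. This is not Grapevine code: a local `ThreadPoolExecutor` stands in for the Grapevine server and volunteers, a toy threshold model stands in for a WEKA classifier, and all function names are hypothetical.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def make_bag(training_set, rng):
    """Generate one bag by selection with replacement."""
    return [rng.choice(training_set) for _ in training_set]


def build_classifier(bag):
    """Task run by one worker: fit a toy threshold model to its bag."""
    pos = [x for x, label in bag if label]
    neg = [x for x, label in bag if not label]
    if not pos or not neg:
        # Degenerate bag containing a single class: always predict it.
        only_label = bool(pos)
        return lambda instance: only_label
    threshold = (mean(pos) + mean(neg)) / 2
    # Toy assumption: True examples have the larger values.
    return lambda instance: instance > threshold


def distributed_bagging(training_set, n_tasks, seed=0):
    """Generate bags, build one classifier per bag in parallel, collect them."""
    rng = random.Random(seed)
    bags = [make_bag(training_set, rng) for _ in range(n_tasks)]
    # Stand-in for submitting tasks to Grapevine and receiving results.
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        return list(pool.map(build_classifier, bags))


def vote(classifiers, instance):
    """Combine the classifiers' predictions by majority vote."""
    votes = Counter(c(instance) for c in classifiers)
    return votes.most_common(1)[0][0]


training_set = [(1.0, False), (2.0, False), (3.0, False),
                (8.0, True), (9.0, True), (10.0, True)]
classifiers = distributed_bagging(training_set, n_tasks=5)
print(vote(classifiers, 9.5))
```

Threads are used only to keep the sketch short; in CPython they do not speed up CPU-bound training, which is the reason for farming the tasks out to separate machines in the first place.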

Preliminary Study

- Bagging on Quick Propagation in OpenMP
- Implemented in C

Trial Domain

- Benchmark corpus: Reuters-21578 for text categorization
  - 9000+ training documents
  - 3000+ test documents
  - 90+ categories
- Perform feature selection
- Preprocess documents into feature vectors
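
Feature selection and preprocessing into feature vectors could look roughly like this. The slides do not name the selection method or the vector representation; ranking terms by document frequency and using raw term counts are assumptions, and the three sample sentences are made up rather than taken from Reuters-21578.

```python
from collections import Counter


def select_features(documents, max_features):
    """Keep the terms that occur in the most documents (ties: alphabetical)."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc.lower().split()))
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_features]]


def to_feature_vector(document, vocabulary):
    """Represent a document as term counts over the selected vocabulary."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]


docs = ["wheat prices rise", "oil prices fall", "wheat exports rise"]
vocab = select_features(docs, max_features=3)
print(vocab)                                           # ['prices', 'rise', 'wheat']
print(to_feature_vector("wheat wheat prices", vocab))  # [1, 0, 2]
```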

Summary

- Successful internet distributed computing requires addressing many issues outside of traditional computer science
- Distributed computing is not for everyone