Designing Services for Grid-based Knowledge Discovery
A. Congiusta, A. Pugliese, Domenico Talia, P. Trunfio
DEIS, University of Calabria
Future Generation Grids, Dagstuhl Seminar, November 2004
2
SUMMARY
The use of computers is changing the way we make discoveries and is improving both the speed and the quality of the discovery process.
In this scenario the Grid can provide effective computational support for distributed knowledge discovery from large, distributed data sets. For this purpose we designed a system called the Knowledge Grid.
This talk discusses how to design distributed knowledge discovery services, according to the OGSA model, using the Knowledge Grid services: from searching Grid resources, through composing software and data elements, to executing the resulting application on a Grid.
3
OUTLINE
MOTIVATIONS
TOWARDS KNOWLEDGE SERVICES
THE KNOWLEDGE GRID
OGSA SERVICES FOR KNOWLEDGE DISCOVERY
A META-LEARNING EXAMPLE
CONCLUSIONS
4
MOTIVATIONS
Lots of data collected and warehoused.
Data collected and stored at enormous speeds in local databases, from remote sources, or from the sky.
Scientific simulations generating terabytes of data.
Huge data sets are hard to understand.
Traditional techniques are infeasible for raw data.
Computational science is evolving toward data-intensive applications that include
• data analysis,
• information management, and
• knowledge discovery.
5
MOTIVATIONS
Most data will never be examined by humans; it is analyzed and summarized by computers.
Data analysis is becoming a key element in scientific discovery and in business processes.
Data-intensive applications are those that explore, query, analyze, visualize, and in general process very large-scale data sets.
Data-intensive applications help
• scientists in hypothesis formation
• companies in providing better, customized services and in supporting decision making.
6
TOWARDS KNOWLEDGE SERVICES
SCIENTIFIC OBJECTIVES
To support the unification of data management and knowledge discovery systems with Grid technologies, in order to provide knowledge-based Grid services.
This objective can be achieved through
• development of techniques and tools for supporting data-intensive applications and
• integration of Data and Computation Grids with Information and Knowledge Grids.
Grid-aware Knowledge Discovery Systems
7
THE KNOWLEDGE GRID
KNOWLEDGE GRID: a distributed knowledge discovery architecture that integrates data mining techniques and computational Grid resources.
In the KNOWLEDGE GRID architecture, data mining tools are integrated with lower-level Grid mechanisms and services and exploit Data Grid services.
This approach benefits from "standard" Grid services and offers an open architecture that can be configured on top of generic Grid middleware.
PAST
8
KNOWLEDGE GRID ARCHITECTURE
KNOWLEDGE GRID
• High level K-Grid layer: DAS (Data Access Service), TAAS (Tools and Algorithms Access Service), EPMS (Execution Plan Management Service), RPS (Result Presentation Service)
• Core K-Grid layer: KDS (Knowledge Directory Service), RAEMS (Resource Allocation and Execution Management Service); repositories KMR, KEPR, KBR
• Metadata shown in the diagram: Resource Metadata, Execution Plan Metadata, Model Metadata
Generic and Data Grid Services
PAST
9
THE KNOWLEDGE GRID
[Figure: Grid resources labelled D1–D4, S1–S3 and H1–H3 are selected, composed and executed. Stages shown: Component Selection / Service Selection, Application Workflow Composition, Application Execution on the Grid. Panel labels: PAST, FUTURE.]
10
OGSA KNOWLEDGE GRID SERVICES
The KNOWLEDGE GRID is an abstract service-based Grid architecture that does not limit the user in developing and using service-based knowledge discovery applications.
We are defining a set of Grid Services that export functionality and operations of the KNOWLEDGE GRID.
Each of the KNOWLEDGE GRID services is exposed as a persistent service, using the OGSA conventions and mechanisms.
FUTURE
11
KNOWLEDGE SERVICES: A Meta-Learning Example
A simple example of a meta-learning process over the KNOWLEDGE GRID, showing how the execution of a significant distributed data mining application can benefit from the Knowledge Grid services provided through the OGSA model.
Meta-learning aims to generate a number of independent classifiers by applying learning programs to a collection of distributed data sets in parallel.
The classifiers computed by learning programs are then collected and combined to obtain a global classifier.
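The two sentences above can be illustrated with a minimal Python sketch. It is not the Knowledge Grid implementation: the toy threshold learner, the majority-vote combiner and the sample data are all assumptions chosen to keep the example small, and the slides do not say which learning or combining algorithms are used.

```python
from collections import Counter

def train_threshold_learner(training_set):
    """Toy learner (an assumption): predict 1 when x >= midpoint of the two class means."""
    zeros = [x for x, y in training_set if y == 0]
    ones = [x for x, y in training_set if y == 1]
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 1 if x >= threshold else 0

def combine_by_majority(classifiers):
    """Build a global classifier that returns the most common partial prediction."""
    def global_classifier(x):
        votes = Counter(c(x) for c in classifiers)
        return votes.most_common(1)[0][0]
    return global_classifier

# Three hypothetical training sets, standing in for subsets held on different nodes.
training_sets = [
    [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)],
    [(0.5, 0), (3.0, 0), (7.0, 1), (9.5, 1)],
    [(1.5, 0), (2.5, 0), (6.0, 1), (8.5, 1)],
]

# Each classifier is learned independently from its own training set.
classifiers = [train_threshold_learner(ts) for ts in training_sets]

# The independent classifiers are then combined into one global classifier.
global_classifier = combine_by_majority(classifiers)
```

With an odd number of binary classifiers the vote can never tie, which is why three training sets are used here.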
12
KNOWLEDGE SERVICES: A Meta-Learning Example
[Figure: the meta-learning process in three steps.
Step 1: on Node A, Partitioner P splits DataSet DS into TrainingSets TR1 … TRn, a ValidationSet VS and a TestingSet TS.
Step 2: on Node 1 … Node n, Learners L1 … Ln build Classifiers C1 … Cn from their training sets.
Step 3: on Node Z, Combiner/Tester CT uses C1 … Cn, VS and TS to produce the Global Classifier GC.]
13
KNOWLEDGE SERVICES: A Meta-Learning Example
A user application interacts with Knowledge Grid nodes to generate a classifier by combining the classifiers built from different subsets of a given data set.
The scenario comprises five nodes:
• NU, running the user application that builds the meta-learning application and visualizes the global classifier;
• NS, which is used for resource discovery and for steering the execution of the meta-learning application;
• NA, which hosts the original data set and provides a data partitioning service;
• NC, providing learning services that run in parallel over a homogeneous cluster;
• NZ, providing a combiner/tester service used to compute the global classifier.
14
RESOURCE DISCOVERY AND EXECUTION PLANNING
The user application invokes the DAS and TAAS services on node Ns, specifying the required resources: two nodes providing services for the meta-learning process (a learner and a combiner/tester) and for resource reservation.
The DAS and TAAS services of node Ns invoke the corresponding services on other Knowledge Grid nodes to obtain information about the needed resources. The contacted nodes reply to node Ns with meta-information.
On node Ns, the meta-information about nodes Nc and Nz is analyzed, and these nodes are identified as candidates for the computation. The DAS and TAAS services on node Ns send this information to the user application.
The application builds an execution plan for the meta-learning process, specifying strategies for data movement and algorithm execution. The execution plan is submitted to the EPMS of node Ns.
[Figure: service diagram of nodes NU, NS, NA, NC and NZ. NU: User Application. NS: DAS, TAAS, EPMS. NA: DAS, Database Service, Partitioner Factory. NC: DAS, TAAS, Resource Reservation Factory, Learner Factory. NZ: DAS, TAAS, Resource Reservation Factory, Combiner Factory. The diagram also carries the labels "R" and "Storage Reservation Factory".]
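The discovery and planning steps on this slide can be sketched in Python as a metadata filter followed by an ordered plan. This is only an illustration under stated assumptions: `NodeMetadata`, `find_candidates`, the service names and the tuple layout of the plan are hypothetical, and the slides do not describe the real metadata or execution plan formats.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeMetadata:
    """Hypothetical meta-information a node returns to NS."""
    node: str
    services: frozenset  # names of the services the node advertises

def find_candidates(metadata, required_service):
    """Return the names of nodes whose metadata advertises the required service."""
    return [m.node for m in metadata if required_service in m.services]

# Assumed replies collected on node NS from the contacted nodes.
replies = [
    NodeMetadata("NA", frozenset({"database", "partitioner"})),
    NodeMetadata("NC", frozenset({"learner", "reservation"})),
    NodeMetadata("NZ", frozenset({"combiner/tester", "reservation"})),
]

learner_nodes = find_candidates(replies, "learner")
combiner_nodes = find_candidates(replies, "combiner/tester")

# An execution plan modelled as ordered (action, source node, target node) steps.
execution_plan = [
    ("partition", "NA", "NA"),
    ("transfer training sets", "NA", learner_nodes[0]),
    ("transfer testing and validation sets", "NA", combiner_nodes[0]),
    ("learn", learner_nodes[0], learner_nodes[0]),
    ("transfer classifiers", learner_nodes[0], combiner_nodes[0]),
    ("combine and test", combiner_nodes[0], combiner_nodes[0]),
    ("transfer global classifier", combiner_nodes[0], "NU"),
]
```

Keeping the plan as plain data, separate from the code that runs it, mirrors the split the slide describes between the application that builds the plan and the EPMS that receives it.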
15
KDD APPLICATION EXECUTION
The EPMS invokes the factories on Na, Nc and Nz, requesting the creation of a partitioner service on node Na and of two reservation services on Nc and Nz. On node Nc, computing cycles are reserved (on each computing element) to execute the learner programs, and storage space is reserved to hold the subsets extracted from DS and the partial classifiers. On node Nz, storage space is reserved to hold the partial and global classifiers.
[Figure: service diagram of nodes NU, NS, NA, NC and NZ, showing the User Application, DAS, TAAS, EPMS, Database Service and the Partitioner, Learner, Combiner and Resource Reservation Factories.]
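The factory invocations described on this slide follow the general pattern of asking a per-node factory to create a service instance. The Python sketch below shows that pattern only; the `Factory` class, the handle format and the parameter names are assumptions and do not reproduce the OGSA factory interface.

```python
import itertools

class Factory:
    """Creates service instance records on one node (a hypothetical stand-in for a Grid service factory)."""

    def __init__(self, node, kind):
        self.node = node
        self.kind = kind
        self._ids = itertools.count(1)

    def create(self, **params):
        # Each call yields a fresh instance record with a unique handle.
        handle = f"{self.node}/{self.kind}-{next(self._ids)}"
        return {"handle": handle, "node": self.node, "kind": self.kind, **params}

partitioner_factory = Factory("NA", "partitioner")
reservation_factory_nc = Factory("NC", "reservation")
reservation_factory_nz = Factory("NZ", "reservation")

# The three creation requests a caller in the EPMS role would issue.
created = [
    partitioner_factory.create(dataset="DS"),
    reservation_factory_nc.create(resources=("cpu", "storage")),
    reservation_factory_nz.create(resources=("storage",)),
]
```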
16
KDD APPLICATION EXECUTION
The requests made by the EPMS result in the creation of the requested services.
[Figure: service diagram of nodes NU, NS, NA, NC and NZ, now also showing a Partitioner Service and two Reservation Services.]
17
KDD APPLICATION EXECUTION
The partitioner service interacts with the database service on the same node to extract the needed subsets from DS: n training sets, a testing set and a validation set.
[Figure: service diagram of nodes NU, NS, NA, NC and NZ, including the Partitioner Service and the two Reservation Services.]
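A partitioner of this kind could look like the following Python sketch. The split fractions, the shuffling, the round-robin assignment and the choice of disjoint training sets are all assumptions; the slides state only that n training sets, a testing set and a validation set are extracted from DS.

```python
import random

def partition(dataset, n, test_fraction=0.2, validation_fraction=0.2, seed=0):
    """Split a dataset into n training sets, a testing set and a validation set."""
    records = list(dataset)
    random.Random(seed).shuffle(records)  # seeded so the split is reproducible
    n_test = int(len(records) * test_fraction)
    n_val = int(len(records) * validation_fraction)
    testing = records[:n_test]
    validation = records[n_test:n_test + n_val]
    remaining = records[n_test + n_val:]
    # Deal the remaining records round-robin so training sets differ in size by at most one.
    training = [remaining[i::n] for i in range(n)]
    return training, testing, validation

# A stand-in data set of 100 records split for three learners.
training_sets, testing_set, validation_set = partition(range(100), n=3)
```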
18
KDD APPLICATION EXECUTION
The EPMS invokes (i) the DAS service on node Na, requesting the transfer of the training sets to node Nc and of the testing and validation sets to node Nz; and (ii) the learner factory on Nc, requesting the creation of n learner service instances to be run on the same node.
[Figure: service diagram of nodes NU, NS, NA, NC and NZ, including the Partitioner Service and the two Reservation Services.]
19
KDD APPLICATION EXECUTION
On node Nc, n learner service instances are created. On each computing element of node Nc, the learner service instances generate the partial classifiers. As soon as a partial classifier is obtained, a notification message is sent to the EPMS.
[Figure: service diagram of nodes NU, NS, NA, NC and NZ, now also showing the Learner Services.]
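The "notify as soon as each partial classifier is ready" behaviour can be sketched with Python's standard `concurrent.futures` module. Threads on one machine stand in for the computing elements of node Nc, and the toy `learn` function and message strings are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def learn(index, training_set):
    """Toy learner (an assumption): the 'classifier' is just the mean of its training set."""
    return index, sum(training_set) / len(training_set)

training_sets = [[1, 2, 3], [10, 20, 30], [5, 5, 5]]
notifications = []        # messages a caller in the EPMS role would receive
partial_classifiers = {}  # learner index -> partial classifier

with ThreadPoolExecutor(max_workers=len(training_sets)) as pool:
    futures = [pool.submit(learn, i, ts) for i, ts in enumerate(training_sets)]
    # as_completed yields each future as soon as its learner finishes,
    # so notifications arrive in completion order, not submission order.
    for future in as_completed(futures):
        index, classifier = future.result()
        partial_classifiers[index] = classifier
        notifications.append(f"classifier {index} ready")
```

Because completion order is not deterministic, downstream code should key results by learner index, as the dictionary does here, rather than rely on arrival order.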
20
KDD APPLICATION EXECUTION
The EPMS invokes (i) the DAS service on node Nc, requesting the transfer of the generated classifiers to node Nz; and (ii) the combiner/tester factory on Nz, requesting the creation of a combiner/tester service to be run on the same node.
[Figure: service diagram of nodes NU, NS, NA, NC and NZ, including the Partitioner Service, the Reservation Services and the Learner Services.]
21
KDD APPLICATION EXECUTION
On node Nz, a combiner/tester service is created to perform the combining and testing processes and generate the global classifier GC.
[Figure: service diagram of nodes NU, NS, NA, NC and NZ, now also showing the Combiner Service.]
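One way a combiner/tester might use the validation and testing sets is sketched below in Python: partial classifiers are weighted by their validation accuracy, and the resulting global classifier is scored on the testing set. The slides do not say how the two sets are used, so the weighting scheme, the toy classifiers and the data are all assumptions.

```python
def accuracy(classifier, labelled_set):
    """Fraction of (x, label) pairs the classifier predicts correctly."""
    return sum(classifier(x) == y for x, y in labelled_set) / len(labelled_set)

def combine_weighted(classifiers, validation_set):
    """Weight each binary partial classifier by its accuracy on the validation set."""
    weights = [accuracy(c, validation_set) for c in classifiers]

    def global_classifier(x):
        # Votes for class 1 add the weight, votes for class 0 subtract it.
        score = sum(w if c(x) == 1 else -w for c, w in zip(classifiers, weights))
        return 1 if score > 0 else 0

    return global_classifier

# Three toy partial classifiers; the third is deliberately poor.
classifiers = [lambda x: int(x >= 5), lambda x: int(x >= 4), lambda x: int(x < 5)]
validation_set = [(1, 0), (3, 0), (6, 1), (9, 1)]
testing_set = [(2, 0), (4, 0), (7, 1), (8, 1)]

global_classifier = combine_weighted(classifiers, validation_set)
test_accuracy = accuracy(global_classifier, testing_set)
```

The poor classifier scores zero on the validation set, so it receives zero weight and cannot affect the combined prediction.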
22
KDD APPLICATION EXECUTION
The EPMS invokes the DAS service on node Nz, requesting the transfer of the generated global classifier to node Nu.
[Figure: service diagram of nodes NU, NS, NA, NC and NZ, including the Partitioner, Reservation, Learner and Combiner Services.]
23
OPEN ISSUES
• Data privacy and security
• KDD process state management
• Complex processing patterns (Web Services are too simple to express distributed data mining processes and applications)
• KDD Grid Service standards (towards OGSA-KDAI?)
• KDD processes as G-Services workflows
• Asynchronous services
• …
FUTURE
24
CONCLUSIONS
The knowledge-building process in a distributed setting involves data and information collection, generation, and distribution, followed by the collective interpretation of processed information into “knowledge.”
Next-generation Grids must be able to produce, use, and deploy knowledge as a basic element of advanced applications.
Knowledge-based Grids can offer tools, components and services to support data analysis, inference, and discovery in scientific and business applications.
OGSA-based services for distributed knowledge discovery are a key element for broad support of e-science and e-business.
25
CREDITS:
M. Cannataro, C. Comito
THANKS
www.icar.cnr.it/kgrid