Leonid Kalinichenko, Sergey Stupnikov, Victor Zakharov, Vladimir Budzko, Vadim Korolev
Institute of Informatics Problems, Russian Academy of Sciences
Unifying mediation of knowledge, data and services in a subject domain for problem solving over heterogeneous information resources
Declaration of Intent Draft by IPI RAN SkTech.RC/IT/Madnick

Page 1: Leonid Kalinichenko, Sergey Stupnikov, Victor Zakharov, Vladimir Budzko, Vadim Korolev

Institute of Informatics Problems, Russian Academy of Sciences

Unifying mediation of knowledge, data and services in a subject domain for problem solving over heterogeneous information resources

Declaration of Intent Draft by IPI RAN SkTech.RC/IT/Madnick

Page 2: Outline

  • State of the art in subject mediation reached at IPI RAN
  • Directions of research and development suggested for use in the proposal SkTech.RC/IT/Madnick
    • Investigation of an application-driven approach for scientific problem solving in the subject mediator environment
    • Heterogeneous multidialect mediator infrastructure for semantic integration of data, knowledge and services
    • Mediation of databases with non-traditional data models
    • Storage of very large volumes of data [Zakharov]
    • Cyber security issues [Budzko, Korolev]
  • Self-assessment
  • Coverage by the DoI of the content of the three themes (Scientific Dataspace, Data Quality and Big Data) declared by Prof. Stuart Madnick

Page 3: State of the art in subject mediation reached at IPI RAN

Page 4: Basic principles

  • Subject mediation technology aims to fill the widening gap between users (applications) and heterogeneous distributed information resources
    • independence of the problem domain definition (the mediator definition) from the existing information resources
    • definition of a mediator as a result of the consolidated efforts of the respective scientific community
    • independence of user interfaces from the multiple information resources involved
    • information about new resources can be published at any time, independently of the mediators acting at that time
    • GLAV-based setting for integration of relevant information resources at the mediator
    • integrated access to the information resources in the process of problem solving
    • recursive structure of mediators

Page 5: Canonical information model synthesis

[Figure: resource information models R1, R2, R3; Canonical Model made up of a Kernel and E1, E2, E3; three "refines" links between the resource models and the canonical model.]

Page 6: Resources identification and integration

  • Identification of relevant resources
    • metadata model (capabilities)
    • ontological model (concepts and their relationships)
    • canonical model (structure and behavior)
  • Integration of relevant resources in a mediator (registration)
    • GLAV = Local As View (LAV) + Global As View (GAV)
    • GAV: provides for reconciliation of various conflicts between resource and mediator specifications
    • LAV: resource schemas are registered in the mediator as materialized views over virtual classes of the mediator
      • stability of the application problem specification under any modifications of resources is provided
      • scalability of mediators w.r.t. the number of resources is provided
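A minimal sketch of the GLAV registration idea, under strong simplifying assumptions: the mediator has a single virtual class Star(name, ra, dec, mag), the LAV part of a registration is reduced to a magnitude range describing which Star objects the resource materializes, and the GAV part is a function that reconciles naming and unit conflicts. The class `Registration`, the function `answer`, the catalogs and all attribute names are hypothetical and are not taken from the SYNTHESIS environment.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass
class Registration:
    """One resource registered at the mediator (all names are hypothetical)."""
    name: str
    rows: List[Record]                    # data held by the resource
    lav_mag_range: Tuple[float, float]    # LAV part: resource is a view over Star
                                          # objects whose mag lies in this range
    gav_map: Callable[[Record], Record]   # GAV part: resolves naming/unit conflicts


def answer(registrations: List[Registration], mag_lo: float, mag_hi: float) -> List[Record]:
    """Answer the mediator query 'Star objects with mag_lo <= mag <= mag_hi'."""
    result: List[Record] = []
    for reg in registrations:
        view_lo, view_hi = reg.lav_mag_range
        # The LAV description lets the mediator skip resources that cannot contribute.
        if view_hi < mag_lo or view_lo > mag_hi:
            continue
        for row in reg.rows:
            star = reg.gav_map(row)       # resource record -> mediator Star object
            if mag_lo <= star["mag"] <= mag_hi:
                result.append(star)
    return result


# Two toy resources with different attribute names and units (assumed data).
catalog_a = Registration(
    name="catalog_a",
    rows=[{"id": "A1", "ra_deg": 10.0, "dec_deg": 5.0, "vmag": 12.5}],
    lav_mag_range=(10.0, 15.0),
    gav_map=lambda r: {"name": r["id"], "ra": r["ra_deg"],
                       "dec": r["dec_deg"], "mag": r["vmag"]},
)
catalog_b = Registration(
    name="catalog_b",
    rows=[{"obj": "B7", "ra_h": 2.0, "dec": -1.0, "m": 18.2}],
    lav_mag_range=(16.0, 21.0),
    # Right ascension is stored in hours here; 1 hour = 15 degrees.
    gav_map=lambda r: {"name": r["obj"], "ra": r["ra_h"] * 15.0,
                       "dec": r["dec"], "mag": r["m"]},
)

bright = answer([catalog_a, catalog_b], 11.0, 13.0)
faint = answer([catalog_a, catalog_b], 17.0, 19.0)
```

The query is written only against the mediator class, so adding or changing a resource touches its registration and not the application program, which is the stability property claimed on the slide.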

Page 7: Subject mediation: results obtained at IPI RAN (I)

  • A prototype of the subject mediation infrastructure used for problem solving over multiple distributed information resources (specifically, in the astronomy problem domain) [slide 8]
  • Methods and tools for mapping and transformation of information models of heterogeneous resources, intended for their unification in mediation middleware
    • The Model Unifier prototype tool, aimed at partial automation of the unification of heterogeneous information models, has been implemented
    • The first version is based on term-rewriting technology
    • The second version, an Eclipse platform application based on model transformation languages, is under implementation [slide 9]
  • Methods for supporting semantic interoperability of information resources in the context of an application problem domain
    • Tools for identification of resources relevant to a problem on the basis of ontological descriptions of the problem domain
    • Tools for registration of the relevant resources in the mediator

Page 8: Subject mediation infrastructure

[Figure: subject mediation infrastructure. Labels: Application Problem Domain; Application Conceptualization; Rule-based Mediator Programs; Conventional Application Programs; Programming Facilities; Semantic Mediation Middleware; Mediator Specifications; Semantic Mappings; Canonical Information Model (SYNTHESIS Language); Rule-based Programs; Broker; W3C-RIF; Resource Wrappers (Data Source Wrapper, Service Wrapper, Registry Wrapper); Computation and Information Resource Environments: Clouds (Cloud Services, Resource Registries), Information Grid (Grid Services, Resource Registries, databases, workstations, servers), Computational Grid (Grid Services, Resource Registries, mainframes, workstations, servers).]

Page 9: Model Unifier architecture

Page 10: Subject mediation: results obtained at IPI RAN (II)

  • Methods and tools for rewriting non-recursive mediator programs into resource partial programs, oriented on object schemas of resources and mediators and on typed GLAV-views
  • A method for optimizing the planning of the execution of resource partial programs over a distributed environment
    • takes into account the capabilities of the resources
    • assigns the places where operations are executed on the basis of estimative samples
  • Methods for dispersed organization of problem solving in the mediation environment
    • An implementation of a problem in the mediation environment may be dispersed among programming systems, mediators, GLAV-views, wrappers and resources
    • Methods and tools for representation, manipulation and efficiency estimation of a dispersed organization
    • Algorithms for construction of an efficient dispersed organization
  • An original approach for binding programming languages with the declarative mediator rule language
    • The approach combines static and dynamic binding, overcoming impedance mismatch and allowing dynamic result types

Page 11: Directions of research and development

Application-driven approach for scientific problem solving

Page 12: Application-driven approach for scientific problem solving

  • Approaches to the integrated representation of multiple information resources for problem solving:
    • Resource-driven: an integrated representation of multiple resources is created independently of the problem
    • Application-driven: a description of the subject domain of a problem class is created, into which the resources relevant to the problem are mapped
  • The application-driven approach assumes creation of a subject mediator that supports interaction between a user and resources

Page 13: Experience of applying the application-driven approach

  • The problem of searching for secondary standards for photometric calibration of optical components of gamma-ray bursts, formulated by the Institute of Space Research of RAS
  • The problem was formalized and implemented by applying subject mediation:
    • A glossary of the problem domain was manually extracted from the textual specification
    • An ontology required for problem solving was constructed
    • Data structures, methods and functions constituting the problem domain schema were defined
    • Resources relevant to the problem were identified in the Astrogrid and VizieR information grids: SDSS, USNO B-1, 2MASS, GSC, UCAC, VSX, ASAS, GCVS, NSVS
    • Resources were registered in the mediator and the corresponding GLAV-views were obtained
    • The problem was formulated as a program consisting of a set of declarative rules over the mediator schema
  • The implemented mediator is used by an application that monitors, in real time, the e-mails announcing gamma-ray bursts. The application extracts the standards located in the area of a burst and e-mails them to subscribers.

Page 14: Issues requiring further investigation

  • Semantic identification of resources relevant to a mediator
  • Construction of semantic source-to-target schema mappings in the presence of constraints reflecting the specificity of various data models
  • Development of mediator program rewriting algorithms in the presence of source and mediator constraints over the classes of objects

Page 15: Directions of research and development

Heterogeneous multidialect mediator infrastructure for data, knowledge and services semantic integration

Page 16: An approach for the infrastructure

  • Recently W3C adopted the Rule Interchange Format (RIF) standard, oriented on interoperability of declarative programs
  • Objective: integration of
    • multilanguage knowledge representations and rule-based declarative programs,
    • heterogeneous databases and services built on the basis of unified languages and a multidialect mediation infrastructure
  • Idea
    • Combining the RIF standard paradigm and the GLAV approach built on the extensible canonical information model

Page 17: Modular mediator infrastructure

  • The multidialect construction of the canonical model
    • Mediators are represented as a functional composition of declarative specifications of modules
    • Each module is based on its own dialect with appropriate semantics
  • Mediator modules as peers:
    • Rule-based modules become mediator components alongside the GLAV-based modules
    • Interoperability of the modules is based on P2P and W3C RIF techniques
  • Combination of integration and interoperability
    • Information resource integration can be provided within the scope of an individual mediator module
    • The integration approaches in different modules can be different
  • Rule-based specifications on different levels of the infrastructure
    • Declarative programming over the mediators
    • Various modules of a mediator
    • Schema mapping for semantic integration of the information resources in the mediator
    • etc.

Page 18: Example of problem solving in the multidialect mediation infrastructure

  • A problem of finding an optimal assignment of applicants among universities
    • A set of n applicants is to be assigned among m universities, where q_i is the quota of the i-th university
    • Applicants (universities) rank the universities (the applicants) in the order of their preference
    • The aim is to find an optimal assignment from the quotas of the universities and the two sets of orderings
    • An assignment is unstable if there are two applicants α and β who are assigned to universities A and B, respectively, although β prefers A to B and A prefers β to α; otherwise the assignment is stable
    • A stable assignment is called optimal if every applicant is at least as well off under it as under any other stable assignment
  • The program calculating the assignment is defined in DLV (ASP)
  • The required information resources are integrated in a subject mediator
  • OntoBroker communicates with the users and, applying its ontologies, formulates the queries to the mediator; after collecting the required data, it initiates a program in DLV
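The assignment problem defined above can be illustrated with a short procedural sketch. The slide states that the actual program is written in DLV (ASP); the Python below instead uses the applicant-proposing deferred acceptance procedure of Gale and Shapley, which is known to yield the applicant-optimal stable assignment. The function names, the toy preference lists and the quotas are assumptions made for illustration only.

```python
from typing import Dict, List


def deferred_acceptance(applicant_prefs: Dict[str, List[str]],
                        university_prefs: Dict[str, List[str]],
                        quotas: Dict[str, int]) -> Dict[str, List[str]]:
    """Applicant-proposing deferred acceptance with quotas."""
    rank = {u: {a: i for i, a in enumerate(prefs)}
            for u, prefs in university_prefs.items()}
    next_choice = {a: 0 for a in applicant_prefs}
    held: Dict[str, List[str]] = {u: [] for u in university_prefs}
    free = list(applicant_prefs)
    while free:
        a = free.pop()
        prefs = applicant_prefs[a]
        if next_choice[a] >= len(prefs):
            continue                      # list exhausted: a stays unassigned
        u = prefs[next_choice[a]]
        next_choice[a] += 1
        if a not in rank[u]:
            free.append(a)                # u does not rank a; a tries the next one
            continue
        held[u].append(a)
        held[u].sort(key=lambda x: rank[u][x])
        if len(held[u]) > quotas[u]:
            free.append(held[u].pop())    # reject the least preferred applicant
    return held


def is_stable(assignment: Dict[str, List[str]],
              applicant_prefs: Dict[str, List[str]],
              university_prefs: Dict[str, List[str]],
              quotas: Dict[str, int]) -> bool:
    """Check the stability condition from the slide (plus unfilled quotas)."""
    rank = {u: {a: i for i, a in enumerate(prefs)}
            for u, prefs in university_prefs.items()}
    placed = {a: u for u, members in assignment.items() for a in members}
    for b, prefs in applicant_prefs.items():
        for u in prefs:
            if placed.get(b) == u:
                break                     # everything further down is less preferred
            if b not in rank[u]:
                continue
            members = assignment[u]
            if len(members) < quotas[u] or any(rank[u][b] < rank[u][a] for a in members):
                return False              # b and u would both rather be matched
    return True


applicant_prefs = {"a": ["X", "Y"], "b": ["X", "Y"], "c": ["Y", "X"]}
university_prefs = {"X": ["b", "a", "c"], "Y": ["a", "b", "c"]}
quotas = {"X": 1, "Y": 2}

optimal = deferred_acceptance(applicant_prefs, university_prefs, quotas)
```

Here `optimal` places b at X and a and c at Y. Swapping a and b gives an unstable assignment, because b prefers X to Y and X prefers b to a.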

Page 19: Optimal assignment problem infrastructure

[Figure: a Multi-Layered Broker exchanging RIF-BLD (via XML) with three components through pairs of translators: DLV (ASP facilities) via BLD → DLV and DLV → BLD; Synthesis Mediation Environment via BLD → Synthesis and Synthesis → BLD; OntoBroker via OB → BLD and BLD → OB. Further labels: Resources; Ontologies; Req. 1, 4; Resp. 1, 4; Req. 2, 3; Resp. 2, 3.]

Requests
  1. OB2DLV: GetProgram(Loc, Name [Params])
  2. OB2SYNTH: GetSchema(Loc, Name [Params])
  3. OB2SYNTH: SendExec(Loc, Name, Prog [Pars])
  4. OB2DLV: SendExec(Loc, Name, Prog [Pars])

Responses
  1. DLV2OB: DLV Program (without IDB)
  2. SYNTH2OB: Synthesis Schema
  3. SYNTH2OB: Result of OB program execution
  4. DLV2OB: Result of DLV program execution
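The request and response lists above suggest a simple routed exchange, sketched below. This is only an illustration of the message flow: in the described infrastructure the payloads are RIF-BLD documents passed via XML, whereas here they are placeholder strings, and the classes `Request` and `Broker`, the stub handlers and the argument values are all hypothetical. It also assumes that the requests are issued in the order of their numbers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Request:
    number: int               # request number as listed on the slide
    channel: str              # "OB2DLV" or "OB2SYNTH"
    operation: str            # "GetProgram", "GetSchema" or "SendExec"
    args: Tuple[str, ...]     # stands in for Loc, Name, Prog and parameters


class Broker:
    """Routes OntoBroker requests to the registered component stubs."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[[Request], str]] = {}

    def register(self, target: str, handler: Callable[[Request], str]) -> None:
        self.handlers[target] = handler

    def send(self, request: Request) -> Tuple[str, str]:
        target = request.channel.split("2", 1)[1]   # "OB2DLV" -> "DLV"
        payload = self.handlers[target](request)
        return f"{target}2OB", payload              # response channel, payload


def dlv_stub(request: Request) -> str:
    if request.operation == "GetProgram":
        return "DLV program (without IDB)"
    return "result of DLV program execution"


def synthesis_stub(request: Request) -> str:
    if request.operation == "GetSchema":
        return "Synthesis schema"
    return "result of OB program execution"


broker = Broker()
broker.register("DLV", dlv_stub)
broker.register("SYNTH", synthesis_stub)

workflow: List[Request] = [
    Request(1, "OB2DLV", "GetProgram", ("loc", "assignment")),
    Request(2, "OB2SYNTH", "GetSchema", ("loc", "admissions")),
    Request(3, "OB2SYNTH", "SendExec", ("loc", "admissions", "ob_program")),
    Request(4, "OB2DLV", "SendExec", ("loc", "assignment", "dlv_program")),
]
responses = [broker.send(request) for request in workflow]
```

Each response comes back on the channel named in the Responses list, so requests 1 and 4 are answered by DLV and requests 2 and 3 by the Synthesis environment.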

Page 20: Issues to be investigated and prototyped

  • Approaches for constructing rule-based dialect mappings
  • Methods for justification of semantic preservation by the mappings
  • Approaches for modular representation of knowledge in the multidialect mediation environment
  • Approaches for providing interoperability of the mediator multidialect modules
  • Infrastructure design and prototyping
  • Solving real problems in the chosen scientific subject domains
  • Expansion of the experience into the Semantic Web area

Page 21: Directions of research and development

Mediation of databases with non-traditional data models

Page 22: Non-traditional data models

  • NoSQL data models oriented on the support of extra large volumes of data, applying a “key-value” technology for vertical storage
    • Dynamo, BigTable, HBase, Cassandra, MongoDB, CouchDB
  • Graph data models
    • Neo4j, InfiniteGraph, DEX, InfoGrid, HyperGraphDB, Trinity, supporting flexible data structures
  • Triple-based data model (expressible in RDF, RDFS)
    • Virtuoso, OWLIM, 5Store, Bigdata
  • OWL 2 QL profile oriented on the support of ontological modeling over relational databases and expressed by data dependencies used together with Datalog
  • “Scientific” data models
    • SciDB, applying a multidimensional array data model
    • Prof. Pentland: Connection Science-oriented data models
  • For most of these data models, standards do not yet exist
  • Most of these data models and systems are oriented on “big data” support, applying massively parallel techniques of the MapReduce kind

Page 23: Research results planned to be obtained

  • Information-preserving methods for mapping and transforming various classes of non-traditional data models into the canonical one
  • Mappings and transformations for specific data models, together with adequate extensions of the canonical data model
  • Techniques for interpreting the canonical model DML in the DMLs of different classes of non-traditional data models, and approaches for their implementation
  • Architectural decisions on implementing massively parallel techniques at the level of mediators, and evaluation of the performance growth that can be reached
  • Evaluation of the suitability and efficiency of integrating non-traditional data models of different classes in the GLAV mediation infrastructure for various problem domains
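One way to read "information preserving" is that the mapping can be inverted without loss. The sketch below tests that property on a toy case: a key-value store of flat documents is mapped to subject-predicate-object triples and back. The triple form merely stands in for a target model; it is not the SYNTHESIS canonical model, and the function names and sample data are assumptions.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, Any]
Store = Dict[str, Dict[str, Any]]


def to_triples(store: Store) -> List[Triple]:
    """Map each (key, attribute, value) of a key-value store to one triple."""
    return [(key, attr, value)
            for key, doc in store.items()
            for attr, value in doc.items()]


def from_triples(triples: List[Triple]) -> Store:
    """Inverse mapping: regroup the triples by subject."""
    store: Store = {}
    for subject, predicate, obj in triples:
        store.setdefault(subject, {})[predicate] = obj
    return store


def preserves_information(store: Store) -> bool:
    """True if the round trip through the triple form loses nothing."""
    return from_triples(to_triples(store)) == store


catalog = {
    "star:1": {"ra": 10.0, "dec": 5.0, "mag": 12.5},
    "star:2": {"ra": 30.0, "dec": -1.0},
}
with_empty_document = {"star:3": {}}
```

The round trip succeeds for `catalog` but fails for `with_empty_document`, because a key without attributes produces no triples. Cases like this are where an extension of the target model, or an extra marker triple, would be needed to keep the mapping information preserving.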

Page 24: Directions of research and development

Storage of very large volumes of data [Zakharov]

Page 25: Storage of very large volumes of data [Zakharov]

  • The objective is to develop a novel distributed parallel fault-tolerant file system possessing the following capabilities:
    • storage of data volumes of petabyte scale
    • unlimited period of storage
    • scalability
    • efficient multiuser access support in different kinds of networks
    • usage of different storage types (e.g., HDD and flash memory)
  • The experience of existing file system vendors should be taken into account:
    • ReFS (Windows Server 8) by Microsoft
    • VMFS by VMware
    • Lustre
    • ZFS by Sun Microsystems
    • zFS (z/OS) by IBM
    • OneFS by Isilon

Page 26: Directions of research and development

Cyber security issues [Budzko, Korolev]

Page 27: Cyber security issues [Budzko, Korolev]

  • Information integrity and availability support for large-scale data gathering & mining
  • Technical architectures security analysis (network protocols, architectures, operating systems, DBMSs, etc.)
  • Vulnerability analysis
  • Development of threat models
  • Protection from insiders in personal information data centers

Page 28: Self-assessment

Page 29: Self-assessment (I)

  • Relevance
    • Semantic integration of resources in the context of an application
    • Mediation of knowledge
    • Mediation of non-traditional databases
    • Semantic Web and Big Data orientation
  • Novelty
    • An intellectual executable level for declarative conceptual-level specification of problems in terms of the application domain, for problem solving over diverse resources
    • Methods for information-preserving data model mappings and for their implementation
    • Schema mapping and query rewriting methods in the presence of constraints reflecting the specificity of diverse data models, etc.
  • Breadth of scope
    • Relevant to a broad area of application domains, technologies and research issues

Page 30: Self-assessment (II)

  • Challengability
    • Hard theoretical and implementation problems need to be overcome
  • Entrepreneurship possibilities
    • Areas of possible application are very diverse
    • Serious investments are required to reach a proper commercialization level
  • Educational potential
    • Very broad; various courses can be proposed for master's students
    • Many challenging research topics for PhD research

Page 31: Coverage of the content of the proposed themes

Page 32: Scientific DataSpace

  • Large-scale federated data architecture
  • Semantic integration of heterogeneous information
  • Context mediation
  • Semantic web

  • Architecture for semantic mediation and integration of heterogeneous resources
  • Infrastructures: semantic layer for grids and clouds, P2P heterogeneous knowledge-based mediator infrastructures
  • Data model transformation, data model unification, declarative canonical model extension and synthesis
  • Justification of correctness of data model transformation; sets of dependencies (constraints) extending the canonical model core should be decidable and tractable
  • Information resources: semantic description, canonical modeling, wrappers, registries, metadata
  • Problem domains: conceptual description, ontologies, metadata, multidomains, context mediation
  • Semantic-based information resource discovery
  • Semantic schema mapping for data exchange and integration

Page 33: Data Quality

  • Recognizing and resolving heterogeneous data semantics
  • Effective integration of data from multiple and disparate data sources

  • Semantic schema mapping
  • Justification of correctness of data model (schemas and operations) transformation
  • Dispersed implementation of problems in the subject mediation environment

Page 34: Big Data

  • Data extraction and gathering from the web
  • Federated data systems
  • Parallel infrastructures for high-performance big data manipulation and analysis
  • Large-scale and novel “big data” applications
  • Novel approaches to development of large-scale data warehouses

  • Mediation infrastructure including Grids and clouds
  • Non-traditional data models integration in the canonical data model
  • Parallel infrastructures at the mediation level
  • Distributed parallel fault-tolerant file system

Page 35: International Cyber Security

  • Secure information architectures
  • Techniques for assessment of threats and vulnerabilities

  • Cyber security issues