An Ontology-Based Autonomic System for Improving Data Warehouses by Cache Allocation Management


FGWM 2009 presentation


www.sp2.fr

http://www.polytech.univ-nantes.fr/COD/

Vlad Nicolicin-Georgescu, Henri Briand, Remi Lehn and Vincent Benatier

Knowledge and Experience Management Workshop FG-WM 2009, 22/09/2009

Contents

1. Introduction
2. Problematic
3. Knowledge Management
4. Autonomic Computing
5. Combining the Elements
6. Results
7. Conclusions and Future Directions

Introduction

• Decision Support Systems
  • Computerized systems whose main goal is to analyze a series of facts and propose actions regarding the facts involved (Business Intelligence)
  • Their core is the analytical (derived) data, which is organized into a data warehouse (the architecture) with the help of data marts (the bricks) (Inmon, 2005)
• The challenge: managing the data warehouses efficiently (cost, performance and resource scaling)


Problematic – Industrial

• Enterprises' decision support systems: by the end of the first year, up to 90% of data warehouse efforts are considered failures (Frolick and Lindsey, 2003)
• The main causes:
  • Bad management: manual configuration, manual maintenance operations, bad scaling of system resources
  • Bad performance due to inefficient sharing of common resources between groups and conglomerates
  • Growth of the data warehouse size over time
  • Any of the data may be accessed at any time: 'Give me what I want so I can tell you what I really want'

Problematic – Industrial

• High costs of data warehouse maintenance (due to the previous causes) translate into:
  • A need for more system hardware resources (normal cost)
  • A need for decisional experts to configure and maintain the data warehouses (more costly)

Problematic – Industrial

• Example
  • 10 data warehouses sharing the RAM
  • 1 data warehouse requires 20 GB of RAM -> 200 GB of RAM in total
    • Highly costly (sometimes not a problem)
    • Architecturally impossible (stuck!)
• How to reallocate and manage?
  • To manage them, the enterprise relies on an expert to configure and maintain how the memory is allocated based on each data warehouse's needs: priority, usage period, changes in the architecture, etc.
  • The problem repeats recursively
  • Too hard to sustain due to cost and human limits

Problematic – Scientific

• How to manage decision support systems efficiently:
  • How to formalize unstructured data from different sources (editors' readme files, forums, HTML ...)
  • How to render various processes (RAM allocation between groups of data warehouses) autonomic, based on the formalized knowledge
  • Finding suitable algorithms for resource allocation and parameter configuration (cache memory) in groups of data warehouses

Problematic – Scientific

• Building knowledge bases for decision support systems: ontologies and ontology-based rules
• Autonomic computing based on the knowledge bases and on algorithms for improving data warehouse performance
• Combining the notions of knowledge formalization with the notions of autonomic computing for data warehouse management


Knowledge Management

• Managing data warehouses to improve their performance
• Dividing the knowledge in the knowledge base to express a decision support system

Knowledge Management – Data Warehouse Performance

• The measure of performance: query response time for data retrieval operations
• Analytical data, as opposed to operational data, is presented as relaxed with respect to retrieval time (Inmon, 2005)
  • True: if the operations in question are aggregation and calculation operations (e.g. run during the night)
  • Not so true: when performing data retrieval tasks for report generation (daytime usage of the data warehouse)

Knowledge Management – Data Warehouse Performance

• Several propositions for query response time improvement:
  • (Malik et al, 2008): how to physically design databases through caches; database and architecture oriented
  • (Saharia and Babad, 2000): determining which data is most likely to be accessed so that it can be stored in caches; works well for improving a single data warehouse and concerns the data requested rather than how to modify the data warehouse parameters

Knowledge Management – Knowledge Division

• Our proposition for dividing knowledge to represent a decision support system
• Three main types:
  • Architectural
  • Configuration and performance
  • Experience and advice/best practices

Knowledge Management – Knowledge Division

• Architectural information
  • What components are part of a decision support system
  • How these entities are linked and how they exchange
  • What common resources are characteristic of each entity and shared between the

Knowledge Management – Knowledge Division

• Configuration and performance indicators (for Essbase multidimensional cubes)
  • For each data warehouse: index file and data file size (how much space it occupies on disk)
  • Three types of caches: index, data file and data cache
  • Query response time on data retrieval operations

Knowledge Management – Knowledge Division

• Experience and best practices
  • More delicate, due to its subjectivity and the unstructured form in which the information is found
  • Represents all knowledge concerning decision support system and data warehouse management (in any form)
  • Comes from several sources
  • Formalized as a rule knowledge base, such as Event Condition Rules (Huebscher et al, 2008)


Autonomic Computing

• Previous propositions for representing self-managing systems:
  • Inspired by the functioning of the human body (Wang, 2007)
  • Self-healing systems, to be further elaborated into self-X systems (Ghosh et al., 2007)
  • Proposition made by IBM in 2001 and refined towards its currently known form (IBM, 2001)

Autonomic Computing

• Autonomic computing: the ability of an IT infrastructure to adapt and change in accordance with business policies and objectives, guiding systems to be (IBM, 2001):
  • Self-configuring
  • Self-healing
  • Self-optimizing
  • Self-protecting

Autonomic Computing – Autonomic Computing Manager

• Autonomic Computing Manager: automates the self-X functions and externalizes these functions according to the behavior defined by the management interfaces (IBM, 2001). It follows the MAPE-K loop.
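As a rough illustration of the MAPE-K loop, the sketch below walks one pass through its four phases (Monitor, Analyze, Plan, Execute) around a shared knowledge store. The class name, the dictionary keys, the 10% tolerance and the cache increment are assumptions made for this example only; the analysis and planning rules are placeholders, not the rules used by the presented system.

```python
from dataclasses import dataclass, field


@dataclass
class AutonomicManager:
    """Hypothetical manager running one MAPE-K pass over a managed element."""

    # Shared knowledge (the K in MAPE-K), read and written by the phases.
    knowledge: dict = field(default_factory=dict)

    def monitor(self, element: dict) -> dict:
        # Collect raw measurements from the managed element.
        return {"response_time": element["response_time"], "cache": element["cache"]}

    def analyze(self, symptoms: dict) -> bool:
        # Compare the measurement with the baseline kept in the knowledge store.
        baseline = self.knowledge.setdefault(
            "baseline_response_time", symptoms["response_time"]
        )
        return symptoms["response_time"] > baseline * 1.10  # assumed 10% tolerance

    def plan(self, symptoms: dict, degraded: bool) -> dict:
        # Placeholder decision: change the cache by an assumed fixed increment.
        increment = self.knowledge.get("cache_increment", 10)
        return {"cache": symptoms["cache"] + increment} if degraded else {}

    def execute(self, element: dict, change: dict) -> None:
        # Apply the planned change to the managed element.
        element.update(change)

    def run_once(self, element: dict) -> dict:
        symptoms = self.monitor(element)
        degraded = self.analyze(symptoms)
        change = self.plan(symptoms, degraded)
        self.execute(element, change)
        return change
```

Keeping the knowledge store as the only state shared between phases mirrors the idea that every phase consults the same knowledge base.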

Autonomic Computing – Autonomic Computing Manager

• We propose implementing the loop on each level of the decision support system's architecture
• Each entity has its own individual loop and is related to the superior entities only
• Each entity's manager has two 'responsibilities':
  • Its individual self-management
  • The management of its direct children
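The two responsibilities of a manager can be pictured with a small tree of entities, each running its own loop and addressing only its direct children. This is a minimal sketch: the `Entity` class, the log messages and the level names used in the usage note (`Server`, `Application`, `DW1`, `DW2`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """Hypothetical node in the decision support system hierarchy."""

    name: str
    children: List["Entity"] = field(default_factory=list)

    def manage_self(self, log: List[str]) -> None:
        # Placeholder for the entity's individual self-management loop.
        log.append(f"{self.name}: self-management")

    def manage_children(self, log: List[str]) -> None:
        # A manager addresses its direct children only, never deeper levels.
        for child in self.children:
            log.append(f"{self.name}: manages {child.name}")

    def run(self, log: List[str]) -> None:
        self.manage_self(log)
        self.manage_children(log)
        # Each child then runs its own individual loop.
        for child in self.children:
            child.run(log)
```

For instance, `Entity("Server", [Entity("Application", [Entity("DW1"), Entity("DW2")])]).run(log)` records the server managing the application, and the application managing the two data warehouses, but never the server managing a data warehouse directly.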

Autonomic Computing – Autonomic Computing Manager

• Revisiting the decision support system's schema

Autonomic Computing – Algorithms: Self-Improvement

• Self-improvement algorithm:
  • Specific to the individual loop of each data warehouse
  • Executed at the end of each day, when statistics on the usage of the data warehouse are gathered and its parameters can be changed
  • Tries to improve the cache allocation for a data warehouse by repeatedly decreasing the cache values down to a certain limit:
    • Step: the amount of cache decrease at each time period (CV: cache value)
      CV1 = CV0 - (CVmax - CV0)*step
    • Delta: the threshold at which the algorithm stops, i.e. the impact that a cache modification is allowed to have. If (RT1 - RT0)/RT0 < delta, then we accept the new cache proposition (RT: average query response time)
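The step and delta rules can be written down directly. The sketch below transcribes the two formulas as they appear on the slide and wraps them in one end-of-day pass; the function names and the `measure_response_time` callback (standing in for a day of real queries) are assumptions of this example.

```python
def propose_cache_value(cv0: float, cv_max: float, step: float) -> float:
    """Next cache value, CV1 = CV0 - (CVmax - CV0) * step, as written on the slide."""
    return cv0 - (cv_max - cv0) * step


def accept_proposal(rt0: float, rt1: float, delta: float) -> bool:
    """Accept the new cache value when (RT1 - RT0) / RT0 < delta."""
    return (rt1 - rt0) / rt0 < delta


def self_improve(cv0, cv_max, step, delta, rt0, measure_response_time):
    """One end-of-day pass: try a smaller cache, keep it if the slowdown is tolerable.

    measure_response_time is a hypothetical callback returning the average
    query response time observed with the proposed cache value.
    """
    cv1 = propose_cache_value(cv0, cv_max, step)
    rt1 = measure_response_time(cv1)
    if accept_proposal(rt0, rt1, delta):
        return cv1, rt1
    # Proposal rejected: keep the previous cache value and response time.
    return cv0, rt0
```

With `cv0 = 80`, `cv_max = 100` and `step = 0.1`, the proposal is 78; it is kept only if the relative slowdown stays below `delta`.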


Autonomic Computing – Group-Improvement Algorithm

• Group-improvement algorithm
  • Specific to each application (seen as a group of data warehouses)
  • Has the role of periodically reallocating caches between the data warehouses in the group, depending on their average performance
  • 'The catch': a small sacrifice (delta) by some data warehouses brings an important performance gain to others
  • How to distinguish between performance and non-performance data warehouses?

Autonomic Computing – Group-Improvement Algorithm

• Performance data warehouse: one whose average query response time is below the average response time of the group
• Non-performance data warehouses: those that are above it (those equal to the average can go in either of the two categories)
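The split between the two categories amounts to a comparison against the group mean. In the sketch below the function name and the input format (a mapping from data warehouse name to average query response time) are assumptions, and ties are arbitrarily placed in the performance category, which the definition above permits.

```python
from statistics import mean


def split_by_performance(avg_response_times: dict) -> tuple:
    """Split data warehouses around the group's average response time.

    Returns (performance, non_performance) as sorted lists of names.
    Warehouses exactly at the average are counted as performance here.
    """
    group_avg = mean(list(avg_response_times.values()))
    performance = sorted(
        name for name, rt in avg_response_times.items() if rt <= group_avg
    )
    non_performance = sorted(
        name for name, rt in avg_response_times.items() if rt > group_avg
    )
    return performance, non_performance
```

How much cache then moves between the two categories is not specified on the slides, so the sketch stops at the classification.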


Combining the elements

• Bringing Knowledge Management, Autonomic Computing and the algorithms together
• Knowledge bases are formalized with the help of OWL ontologies and ontology-based rules
• Autonomic Computing Managers are implemented with the help of ontology-based rules and Java programs
• Algorithms are formalized by ontologies, rules and Java programs

Combining the elements – Knowledge base

• Ontology: an explicit formal specification of the terms in a domain and the relations among them (Gruber, 1992)
• It expresses:
  • The hierarchical inclusion relations between entities (taxonomy)
  • The inter-entity concept relations that make it much more powerful than a taxonomy
• Used with several knowledge formalization approaches

Combining the elements – Knowledge base

• OWL:
  • W3C recommendation, in an XML-based format, for ontology representation
  • Evolved from RDF
• It provides the main concepts of:
  • Individual: an instance of 'something', the actual concept itself (e.g. John, Mary, Bob)
  • Class: a group of individuals belonging to the same set and having common properties (e.g. John, Mary and Bob are Human; John and Bob are Men)
  • Property: a characteristic of an individual that makes it different from others and allows it to belong to a class
    • Data type property: links an individual to a literal value (John is 30 years old)
    • Object property: links an individual to other individuals (John is the friend of Mary, Mary hates Bob)
• Sentence representation: (subject, predicate, object), e.g. (John, hasAge, 30)
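To make the (subject, predicate, object) representation concrete, the sketch below stores the John/Mary/Bob examples as plain tuples and queries them. It uses no OWL library; the `Triple` type, the predicate names other than `hasAge` (`type`, `isFriendOf`, `hates`) and the `objects` helper are made up for this illustration.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One statement: (subject, predicate, object)."""

    subject: str
    predicate: str
    obj: object


TRIPLES: List[Triple] = [
    # Class membership of individuals.
    Triple("John", "type", "Human"),
    Triple("John", "type", "Man"),
    Triple("Mary", "type", "Human"),
    Triple("Bob", "type", "Human"),
    Triple("Bob", "type", "Man"),
    # Data type property: individual -> literal value.
    Triple("John", "hasAge", 30),
    # Object properties: individual -> individual.
    Triple("John", "isFriendOf", "Mary"),
    Triple("Mary", "hates", "Bob"),
]


def objects(triples: List[Triple], subject: str, predicate: str) -> list:
    """Return every object linked to `subject` through `predicate`."""
    return [
        t.obj for t in triples if t.subject == subject and t.predicate == predicate
    ]
```

For example, `objects(TRIPLES, "John", "type")` lists the classes John belongs to, while `objects(TRIPLES, "John", "hasAge")` returns the literal attached by the data type property.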

Combining the elements – Knowledge base

• Used to formalize the first two types of information: architectural and configuration/performance
• The 'static' aspect of the approach
• An OWL representation of a data warehouse

Combining the elements – Autonomic Computing

• The dynamic part of the knowledge management aspect
• The rules that formalize:
  • The passage between the four states of the Autonomic Computing Manager
  • How the knowledge base in the middle of the loop connects with each state
  • How the two algorithms are implemented over the loop
• We base our approach on previous work on using autonomic computing with ontologies (Stojanovic, 2004)

Combining the elements – Autonomic Computing

• Autonomic Computing Manager loop phases applied to the levels of the decision support system

Combining the elements – Algorithms

• Described using Jena ontology-based rules
• Example of the data warehouse individual self-improving algorithm


Results

• Scenario:
  • With the Oracle Hyperion Essbase BI solution
  • An Essbase application with two data warehouses (DW1 and DW2)
  • A period of 14 days to see how each data warehouse improves and how the application reallocates the memory
  • A random series of queries (from a given pool) is run on each data warehouse each day
  • The individual self-improvement algorithm runs each day
  • The group reallocation algorithm runs every 4 days


Results

• At the end of day 5 we have a good response time/cache allocation ratio
• The data warehouses improve themselves (individual algorithm) quickly and then oscillate around this point (DW2)
• At the end of the 6th day:
  • DW2 loses 2% in response time
  • DW1 gains around 80%
  • The application has reduced its memory consumption by 60%


Conclusions & Future Directions – Conclusions

• We have presented a common problem in enterprises today: knowledge management in decision support systems
• We have presented how data warehouses can be formalized with the help of ontologies and ontology-based rules
• We have seen how autonomy can be enabled by using Autonomic Computing
• We have presented the results of a test on a real application

Conclusions & Future Directions – Future directions

• Extension of the parameters used for data warehouse performance: calculation time, aggregation time, etc.
• Introduction of Service Level Agreement (SLA) notions for defining data warehouse usage specifications
• Extension of the knowledge base so that it can be enriched in an autonomic way
• Introduction of attenuation in the algorithms to avoid oscillation

Remarks… Questions… Propositions…

References

• Mark N. Frolick and Keith Lindsey. Critical factors for data warehouse failure. Business Intelligence Journal, Vol. 8, No. 3, 2003.
• Debanjan Ghosh, Raj Sharman, H. Raghav Rao, and Shambhu Upadhyaya. Self-healing systems – survey and synthesis. Decision Support Systems, Vol. 42: p. 2164–2185, 2007.
• T. Gruber. What is an ontology? Academic Press Pub., 1992.
• M.C. Huebscher and J.A. McCann. A survey on autonomic computing – degrees, models and applications. ACM Computing Surveys, Vol. 40, No. 3, 2008.
• Corporation IBM. An architectural blueprint for autonomic computing. IBM Corporation, 2001.
• Corporation IBM. Autonomic computing. Powering your business for success. International Journal of Computer Science and Network Security, Vol. 7, No. 10: p. 2–4, 2005.
• W.H. Inmon. Building the data warehouse, fourth edition. Wiley Publishing, 2005.
• S.S. Lightstone, G. Lohman, and D. Zilio. Toward autonomic computing with DB2 universal database. ACM SIGMOD Record, Vol. 31, Issue 3, 2002.
• A. Mateen, B. Raza, and T. Hussain. Autonomic computing in SQL Server. In 7th IEEE/ACIS International Conference on Computer and Information Science, 2008.
• L. Stojanovic, J. Schneider, A. Maedche, S. Libischer, R. Studer, Th. Lumpp, A. Abecker, G. Breiter, and J. Dinger. The role of ontologies in autonomic computing systems. IBM Systems Journal, Vol. 43, No. 3: p. 598–616, 2004.
• V. Markl, G. M. Lohman, and V. Raman. LEO: An autonomic optimizer for DB2. IBM Systems Journal, Vol. 42, No. 1, 2003.
• A. N. Saharia and Y.M. Babad. Enhancing data warehouse performance through query caching. The DATA BASE for Advances in Information Systems, Vol. 31, No. 3, 2000.
• Yingxu Wang. Toward Theoretical Foundations of Autonomic Computing. Int'l Journal of Cognitive Informatics and Natural Intelligence, 1(3), 1-16, July-September 2007.