An Ontology-Based Autonomic System for Improving Data Warehouses by Cache Allocation Management

Vlad Nicolicin-Georgescu, Henri Briand, Remi Lehn and Vincent Benatier
Knowledge and Experience Management Workshop FG-WM 2009, 22/09/2009
www.sp2.fr
http://www.polytech.univ-nantes.fr/COD/

FGWM 2009 presentation

Contents

1. Introduction
2. Problematic
3. Knowledge Management
4. Autonomic Computing
5. Combining the Elements
6. Results
7. Conclusions and Future Directions

Introduction

• Decision Support Systems
  • Computerized systems whose main goal is to analyze a series of facts and propose actions regarding the facts involved (Business Intelligence)
  • Their core is the analytical (derived) data, which is organized into a data warehouse (the architecture) with the help of data marts (the bricks) (Inmon, 2005)
  • The challenge: managing data warehouses efficiently (cost, performance and resource scaling)

Problematic – Industrial

• Enterprises’ decision support systems: by the end of the first year, up to 90% of data warehouse efforts are considered failures (Frolick and Lindsey, 2003)
• The main causes
  • Bad management: manual configuration, manual maintenance operations, bad scaling of system resources
  • Bad performance due to inefficient sharing of common resources between groups and conglomerates
  • Growth of the data warehouse size over time
  • Any of the data may be accessed at any time: ‘Give me what I want so I can tell you what I really want’
• High costs of data warehouse maintenance (due to the previous causes) translate into:
  • A need to increase the system’s hardware resources (normal cost)
  • A need for decisional experts to configure and maintain the data warehouses (more costly)
• Example
  • 10 data warehouses sharing RAM
  • 1 data warehouse requires 20 GB of RAM -> 200 GB of RAM in total
    • Very costly (sometimes not a problem)
    • Architecturally impossible (stuck!)
• How to reallocate and manage?
  • To manage them, the enterprise relies on an expert to configure and maintain how the memory is allocated, based on each data warehouse’s needs: priority, usage period, changes in the architecture, etc.
  • The problem repeats recursively
  • Too hard to sustain due to cost and human limits

Problematic – Scientific

• How to manage decision support systems efficiently:
  • How to formalize unstructured data from different sources (editors’ readme files, forums, HTML pages, ...)
  • How to render various processes (RAM allocation between groups of data warehouses) autonomic, based on the formalized knowledge
  • Finding suitable algorithms for resource allocation and parameter configuration (cache memory) in groups of data warehouses
• Building knowledge bases for decision support systems: ontologies and ontology-based rules
• Autonomic computing based on the knowledge bases, and algorithms for improving data warehouse performance
• Combining the notions of knowledge formalization with the notions of autonomic computing for data warehouse management

Knowledge Management

• Managing a data warehouse to improve its performance
• Dividing the knowledge in the knowledge base to express a decision support system

Knowledge Management: Data Warehouse Performance

• The measure of performance: query response time for data retrieval operations
• Analytical data, as opposed to operational data, is presented as relaxed with respect to retrieval time (Inmon, 2005)
  • True: if the operations concerned are aggregation and calculation operations (e.g. run during the night)
  • Not so true: when performing data retrieval tasks for report generation (daytime usage of the data warehouse)
• Several propositions for query response time improvement:
  • (Malik et al, 2008): how to physically design databases through caches; database and architecture oriented
  • (Saharia and Babad, 2000): determining which data is most likely to be accessed so that it can be stored in caches; works well for improving a single data warehouse and concerns the data requested rather than how to modify the data warehouse parameters

Knowledge Management: Knowledge Division

• Our proposition for dividing knowledge to represent a decision support system
• Three main types
  • Architectural
  • Configuration and performance
  • Experience and advice/best practices
• Architectural information
  • What components are part of a decision support system
  • How these entities are linked and how they exchange
  • What the common resources are, characteristic of each entity and shared between the
• Configuration and performance indicators (for Essbase multidimensional cubes)
  • For each data warehouse: index file and data file size (how much space it occupies on disk)
  • Three types of caches: index, data file and data cache
  • Query response time on data retrieval operations
• Experience and best practices
  • More delicate due to its subjectivity and the unstructured form in which the information is found
  • Represents all knowledge concerning decision support system and data warehouse management (in any form)
  • Comes from several sources
  • Formalized as a rule knowledge base, such as Event Condition Rules (Huebscher et al, 2008)

Autonomic Computing

• Previous propositions for representing self-managing systems:
  • Inspired by the functioning of the human body (Wang, 2007)
  • Self-healing systems, to be further elaborated into self-X systems (Ghosh et al., 2007)
  • Proposition made by IBM in 2001, and refined towards its currently known form (IBM, 2001)
• Autonomic computing: the ability of an IT infrastructure to adapt and change in accordance with business policies and objectives, guiding systems to be (IBM, 2001):
  • Self-configuring
  • Self-healing
  • Self-optimizing
  • Self-protecting

Autonomic Computing: Autonomic Computing Manager

• Autonomic Computing Manager: automates the self-X functions and externalizes these functions according to the behavior defined by the management interfaces (IBM, 2001).
  [Figure: the MAPE-K loop]
• We propose implementing the loop at each level of the decision support system’s architecture
• Each entity has its own individual loop and is related to its superior entities only
• Each entity’s manager has two ‘responsibilities’:
  • Its individual self-management
  • The management of its direct children
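As a rough illustration of the hierarchy described above, the following Python sketch gives every entity its own loop and lets each manager run the loops of its direct children after its own. MAPE-K stands for Monitor, Analyze, Plan and Execute over a shared Knowledge base. The class name `AutonomicManager`, its fields and the entity names are hypothetical and only serve the example; the presented system itself relies on ontology-based rules and Java programs.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The four phases of the loop; the knowledge base (the "K") is shared by all of them.
PHASES = ("monitor", "analyze", "plan", "execute")


@dataclass
class AutonomicManager:
    """Hypothetical manager attached to one entity (application, data warehouse, ...)."""
    name: str
    children: List["AutonomicManager"] = field(default_factory=list)
    knowledge: Dict[str, float] = field(default_factory=dict)  # placeholder for the "K"

    def run_loop(self) -> List[str]:
        """Run one pass: individual self-management first, then the direct children."""
        trace = [f"{self.name}:{phase}" for phase in PHASES]
        for child in self.children:
            trace.extend(child.run_loop())
        return trace


# Assumed two-level hierarchy: one application managing two data warehouses.
dw1 = AutonomicManager("DW1")
dw2 = AutonomicManager("DW2")
app = AutonomicManager("Application", children=[dw1, dw2])
trace = app.run_loop()
```

In this sketch a manager only knows its direct children, which mirrors the two responsibilities listed above; the phases are placeholders that record their order instead of acting on a real system.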

Autonomic Computing: Autonomic Computing Manager

• Revisiting the decision support system’s schema

Autonomic Computing: Self-Improvement Algorithm

• Self-improvement algorithm:
  • Specific to the individual loop of each data warehouse
  • Executed at the end of each day, when statistics on the usage of the data warehouse have been gathered and its parameters can be changed
  • Tries to improve the cache allocation for a data warehouse by repeatedly decreasing the cache values down to a certain limit:
    • Step: the amount by which the cache decreases at each time period (CV: cache value)
        CV1 = CV0 - (CVmax - CV0) * step
    • Delta: the threshold at which the algorithm stops; it bounds the impact that a cache modification may have. If (RT1 - RT0) / RT0 < delta, the new cache proposition is accepted (RT: average query response time)
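The two formulas of the self-improvement algorithm can be sketched in Python as follows. This is a minimal sketch that takes the update rule and the acceptance test literally as written on the slide; the function names and the numeric values are assumptions made for the example.

```python
def next_cache_value(cv0: float, cv_max: float, step: float) -> float:
    """Propose a new cache value: CV1 = CV0 - (CVmax - CV0) * step."""
    return cv0 - (cv_max - cv0) * step


def accept_proposal(rt0: float, rt1: float, delta: float) -> bool:
    """Accept the proposal if the relative change in average response time is below delta."""
    return (rt1 - rt0) / rt0 < delta


# Assumed values: current cache 80, maximum cache 100, step 0.1.
cv1 = next_cache_value(cv0=80.0, cv_max=100.0, step=0.1)

# Assumed measurements: the response time rises from 10.0 to 10.3 (3%), delta is 5%.
small_loss_ok = accept_proposal(rt0=10.0, rt1=10.3, delta=0.05)
# A rise from 10.0 to 11.0 (10%) exceeds delta, so the proposal is rejected.
large_loss_ok = accept_proposal(rt0=10.0, rt1=11.0, delta=0.05)
```

With these values the proposed cache value is 78, and only the 3% degradation is accepted.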

Autonomic Computing: Group-Improvement Algorithm

• Group improvement algorithm
  • Specific to each application (seen as a group of data warehouses)
  • Has the role of periodically reallocating caches between the data warehouses in the group, depending on their average performance
  • ‘The catch’: a small sacrifice (delta) by some data warehouses brings an important performance gain to others
  • How to distinguish between performance and non-performance data warehouses?
    • Performance data warehouse: its average query response time is below the average response time of the group
    • Non-performance data warehouses: those that are above it (those equal to it can go in either of the two categories)
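The distinction between performance and non-performance data warehouses can be sketched as below. The function name `classify` and the sample response times are hypothetical, and placing the "equal" case in the non-performance group is an arbitrary choice among the two allowed above. The reallocation step itself is not detailed here, so the sketch stops at the classification.

```python
from statistics import mean
from typing import Dict, List, Tuple


def classify(avg_response_time: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split data warehouses around the group's average query response time."""
    group_avg = mean(avg_response_time.values())
    # Below the group average: performance data warehouses.
    performance = [dw for dw, rt in avg_response_time.items() if rt < group_avg]
    # At or above the group average: non-performance (tie handling is an assumption).
    non_performance = [dw for dw, rt in avg_response_time.items() if rt >= group_avg]
    return performance, non_performance


# Assumed average response times; the group average is 7.0.
performance, non_performance = classify({"DW1": 12.0, "DW2": 3.0, "DW3": 6.0})
```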

Combining the Elements

• Bringing Knowledge Management, Autonomic Computing and the algorithms together
• Knowledge bases are formalized with the help of OWL ontologies and ontology-based rules
• Autonomic Computing Managers are implemented with the help of ontology-based rules and Java programs
• Algorithms are formalized by ontologies, rules and Java programs

Combining the Elements: Knowledge Base

• Ontology: explicit formal specifications of the terms in the domain and the relations among them (Gruber, 1992)
• It expresses:
  • The hierarchical inclusion relations between entities (taxonomy)
  • The inter-entity concept relations, which make it much more powerful than a taxonomy
• Used with several knowledge formalization approaches
• OWL:
  • W3C recommendation, in an XML-based format, for ontology representation
  • Evolved from RDF
• It provides the main concepts of:
  • Individual: an instance of ‘something’, the actual concept itself (e.g. John, Mary, Bob)
  • Class: a group of individuals belonging to the same set and having common properties (e.g. John, Mary and Bob are Human; John and Bob are Men)
  • Property: a characteristic of an individual that makes it different from others and allows it to belong to a class
    • Data type property: links an individual to a literal value (John is 30 years old)
    • Object property: links an individual to other individuals (John is the friend of Mary, Mary hates Bob)
• Sentence representation: (subject, predicate, object), e.g. (John, hasAge, 30)
• Used to formalize the first two types of information: architectural and configuration/performance
• The ‘static’ aspect of the approach
• An OWL representation of a data warehouse
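To make the (subject, predicate, object) representation concrete, the sketch below stores a few sentences as plain Python tuples instead of using an OWL library. The John and Mary triples come from the examples above; the data warehouse triples, including the property names `hasIndexCacheSize` and `belongsTo` and the value 1024, are hypothetical and only hint at how architectural and configuration knowledge could be written down.

```python
from typing import List, Tuple, Union

Triple = Tuple[str, str, Union[str, int, float]]

triples: List[Triple] = [
    ("John", "rdf:type", "Human"),          # class membership
    ("John", "hasAge", 30),                 # data type property: individual -> literal
    ("John", "isFriendOf", "Mary"),         # object property: individual -> individual
    ("DW1", "rdf:type", "DataWarehouse"),   # hypothetical architectural knowledge
    ("DW1", "belongsTo", "Application1"),   # hypothetical link to a parent entity
    ("DW1", "hasIndexCacheSize", 1024),     # hypothetical configuration value
]


def objects(subject: str, predicate: str) -> List[Union[str, int, float]]:
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


john_age = objects("John", "hasAge")
dw1_parent = objects("DW1", "belongsTo")
```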

Combining the Elements: Autonomic Computing

• The dynamic part of the knowledge management aspect
• The rules formalize:
  • The passage between the four states of the Autonomic Computing Manager
  • How the knowledge base in the middle of the loop connects with each state
  • How the two algorithms are implemented over the loop
• We base our approach on previous work on using autonomic computing with ontologies (Stojanovic, 2004)
• Autonomic Computing Manager loop phases applied to the levels of the decision support system

Combining the Elements: Algorithms

• Described using Jena ontology-based rules
• Example of the data warehouse individual self-improving algorithm

Results

• Scenario:
  • With the Oracle Hyperion Essbase BI solution
  • An Essbase application with two data warehouses (DW1 and DW2)
  • A period of 14 days to see how each data warehouse improves and how the application reallocates the memory
  • A random series of queries (from a given pool) is run on each data warehouse each day
  • The individual self-improvement algorithm runs each day
  • The group reallocation algorithm runs every 4 days
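The timing of the scenario can be sketched as a simple schedule. The function name `build_schedule` and the action labels are hypothetical, and the sketch assumes that "every 4 days" means days 4, 8 and 12, which the scenario does not state explicitly.

```python
from typing import Dict, List


def build_schedule(days: int = 14, group_period: int = 4) -> Dict[int, List[str]]:
    """List, for each day, the steps that run in the test scenario."""
    schedule: Dict[int, List[str]] = {}
    for day in range(1, days + 1):
        # Queries and the individual algorithm run every day.
        actions = ["run_queries", "self_improvement"]
        # Assumption: the group algorithm runs on days divisible by group_period.
        if day % group_period == 0:
            actions.append("group_reallocation")
        schedule[day] = actions
    return schedule


schedule = build_schedule()
group_days = [day for day, actions in schedule.items() if "group_reallocation" in actions]
```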

Results

• At the end of day 5 we have a good ratio of response time to cache allocation
• The data warehouses improve themselves quickly (individual algorithm) and then oscillate around this point (DW2)
• At the end of the 6th day:
  • DW2 loses 2% in response time
  • DW1 gains around 80%
  • The application has reduced its memory consumption by 60%

Conclusions & Future Directions: Conclusions

• We have presented a common problem in enterprises today: knowledge management in decision support systems
• We have presented how data warehouses can be formalized with the help of ontologies and ontology-based rules
• We have seen how autonomy can be enabled by using Autonomic Computing
• We presented the results of a test on a real application

Conclusions & Future Directions: Future Directions

• Extension of the parameters used for data warehouse performance: calculation time, aggregation time, etc.
• Introduction of Service Level Agreement (SLA) notions for defining data warehouse usage specifications
• Extension of the knowledge base so that it can be enriched in an autonomic way
• Introduction of attenuation in the algorithms to avoid oscillation

Remarks… Questions… Propositions…

References

• Mark N. Frolick and Keith Lindsey. Critical factors for data warehouse failure. Business Intelligence Journal, Vol. 8, No. 3, 2003.
• Debanjan Ghosh, Raj Sharman, H. Raghav Rao, and Shambhu Upadhyaya. Self-healing systems: survey and synthesis. Decision Support Systems, Vol. 42, p. 2164–2185, 2007.
• T. Gruber. What is an ontology? Academic Press Pub., 1992.
• M.C. Huebscher and J.A. McCann. A survey on autonomic computing – degrees, models and applications. ACM Computing Surveys, Vol. 40, No. 3, 2008.
• IBM Corporation. An architectural blueprint for autonomic computing. IBM Corporation, 2001.
• IBM Corporation. Autonomic computing: powering your business for success. International Journal of Computer Science and Network Security, Vol. 7, No. 10, p. 2–4, 2005.
• W.H. Inmon. Building the data warehouse, fourth edition. Wiley Publishing, 2005.
• S.S. Lightstone, G. Lohman, and D. Zilio. Toward autonomic computing with DB2 universal database. ACM SIGMOD Record, Vol. 31, Issue 3, 2002.
• A. Mateen, B. Raza, and T. Hussain. Autonomic computing in SQL Server. In 7th IEEE/ACIS International Conference on Computer and Information Science, 2008.
• L. Stojanovic, J. Schneider, A. Maedche, S. Libischer, R. Studer, Th. Lumpp, A. Abecker, G. Breiter, and J. Dinger. The role of ontologies in autonomic computing systems. IBM Systems Journal, Vol. 43, No. 3, p. 598–616, 2004.
• V. Markl, G. M. Lohman, and V. Raman. LEO: An autonomic optimizer for DB2. IBM Systems Journal, Vol. 42, No. 1, 2003.
• A. N. Saharia and Y.M. Babad. Enhancing data warehouse performance through query caching. The DATA BASE Advances in Informatics Systems, Vol. 31, No. 3, 2000.
• Yingxu Wang. Toward Theoretical Foundations of Autonomic Computing. Int’l Journal of Cognitive Informatics and Natural Intelligence, 1(3), 1-16, July-September 2007.