Berendt: Advanced databases, first semester 2009, http://www.cs.kuleuven.ac.be/~berendt/teaching
1
Advanced databases –
Introduction and overview
Prof. Dr. Bettina Berendt
Katholieke Universiteit Leuven, Department of Computer Science
http://www.cs.kuleuven.ac.be/~berendt/teaching/2009-10-1stsemester/adb/
Last update: 23 September 2009
2
Agenda
Organisation of the course
Motivation and overview
Data, information, and knowledge
Conceptual modelling, schemas, and ontologies
Recap: Entity-relationship model for data modelling
3
Organisation of the course
www.cs.kuleuven.be/~berendt/teaching/2009-10-1stsemester/adb
Master’s course for CS students specializing in Databases + others
Teaching
Lecture: Bettina Berendt, in English
Exercises and homeworks: Ilija Subašić, in English
Materials:
see Web site; available ~ 1 week before each class
Grading based on exercises; no exam
Contact us:
via the Toledo system (details to be announced)
[ bettina.berendt | ilija.subasic]@cs.kuleuven.be
5
LOTS of data (often, but not always, in database form)
6
What is this course about? (1) – What does it build on?
The database field profits from a well-understood, well-functioning, commonly-used general model: relational databases
You have learned about this in the Databases course
Relational databases: a “homogenizing model”
What else makes databases so powerful today?
7
1. Data are accessible because they are interconnected
(often, but not always, over the Internet/Web)
8
2. Heterogeneous data are integrated (often, but not always, “semantically”)
9
3. They are analysed to reveal the “knowledge” implicit in them (e.g., link structure → PageRank → sorting to order by relevance)
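As an illustration of the analysis step named on this slide, the following is a minimal sketch of PageRank computed by power iteration and then used to sort pages. The function name `pagerank`, the damping factor 0.85, the iteration count, and the toy graph `toy_links` are assumptions made for illustration; they are not taken from the lecture.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of linked pages."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for source, targets in links.items():
            for target in targets:
                # Each page passes an equal share of its rank along every outgoing link.
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical toy link structure: page -> pages it links to.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_links)

# "Sorting to order by relevance": highest rank first.
ordered = sorted(ranks, key=ranks.get, reverse=True)
print(ordered)
```

The link structure is the only input; the ordering is knowledge that is implicit in the links and becomes explicit only through the analysis.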
10
An application example: Where do people live who will buy the Qur’an soon?
11
Data source #1: Amazon wishlists
[Owad, T. (2006). Data Mining 101: Finding Subversives with Amazon Wishlists. http://www.applefritter.com/bannedbooks]
12
Data sources #2-#4: address books, geocoders, visualizations
1. http://www.amazon.com/gp/registry/search.html/?encoding=UTF8&type=wishlist&field-name=edgar&page=1 contains “edgar” wishlist URLs:
http://www.amazon.com/gp/registry/registry.html/?encoding=UTF8&type=wishlist&id=theFirstEdgar...
2. 6-line shell script + wget: many wish lists
3. ls -1 | xargs grep -HiFof /Volumes/UFS/terms.txt > /Volumes/UFS/matches.txt (or search by ISBN):
search term (or ISBN) → {person name + city}
4. http://people.yahoo.com/ :
book → {name + address}
5. http://www.ontok.com/geocode :
book → {geo-coordinates}
6. Google Maps API: insert geo-coordinates into map
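The matching in step 3 can be sketched in Python as follows. This is a rough analogue of the `grep -HiFo -f` call (case-insensitive, fixed-string matching that reports the file name and the matched text), not a reimplementation: overlapping patterns are handled more simply than in grep, and ASCII text is assumed. The function name `find_term_matches`, the file names, and the search terms are hypothetical.

```python
def find_term_matches(documents, terms):
    """Return (document name, matched text) pairs for case-insensitive fixed strings."""
    matches = []
    lowered_terms = [t.lower() for t in terms if t]
    for name, text in documents.items():
        for line in text.splitlines():
            lowered_line = line.lower()
            for term in lowered_terms:
                start = lowered_line.find(term)
                while start != -1:
                    # Report the text as it appears in the document, not the search term.
                    matches.append((name, line[start:start + len(term)]))
                    start = lowered_line.find(term, start + len(term))
    return matches


# Hypothetical stand-ins for the downloaded wish-list files and for terms.txt.
documents = {
    "wishlist_001.html": "Title: Brave New World\nTitle: brave new worlds",
    "wishlist_002.html": "Title: Database Systems",
}
terms = ["brave new world", "database"]
matches = find_term_matches(documents, terms)
print(matches)
```

Each later step of the pipeline then joins these matches with a further data source. This is the kind of integration of heterogeneous data the example is meant to illustrate.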
13
So: What is this course about? (2) – What will it be about?
The database field profits from a well-understood, well-functioning, commonly-used general model: relational databases
You have learned about this in the Databases course
Relational databases: a “homogenizing model”
What else makes databases so powerful today?
Semantic integration of heterogeneous data
Integration over the Internet/Web
Analysis beyond retrieval: “knowledge discovery (in databases)”, aka “data mining”
14
Outline of the course
Lectures (see Web page)
Exercises progress from small “Bachelor-type exercises” to a larger joint “mini-project” with distributed teams
conceptual elements (modelling), tool use, programming, reports
Will be similar in structure to last year:
1. Create a conceptual model in UML of ...
2. Model the same domain in OWL
3. Federated search: Retrieve information from different databases
4. Convert information (2008: XML → the OWL model created in ex. 2)
5. Extract implicit knowledge from a given relational database table
6. Extract implicit knowledge from a given semi-structured dataset
7. Knowledge discovery from real data on the Web (Wikipedia): retrieval, preprocessing, semantic enrichment, model integration, pattern extraction, visualisation, model comparison
15
Learning outcomes: After this course, you will ...
understand and master relevant concepts and techniques of current databases and processing based on databases
understand the potentials, limitations, and risks inherent in assembling, combining, and processing huge amounts of heterogeneous data in globally interconnected environments
be able to design such databases and connectivity and relevant methods for combining and enriching data
have worked with concrete examples of such data collection / processing
17
Data and information
Datum / Data
Fact or concept from reality, in a form suitable for communicating it, interpreting it, and processing it
Information
Interpreted data
Example:
The length of the road is 400 km
Interpretation: “The length of the road is”; data: “400 km”
(based on Henk Olivié: Gegevensbanken – 01. 2006/07)
18
Data, information, and knowledge
Data represents a fact or statement of event without relation to other things. Ex: It is raining.
Information embodies the understanding of a relationship of some sort, possibly cause and effect.
Ex: The temperature dropped 15 degrees and then it started raining.
Knowledge represents a pattern that connects and generally provides a high level of predictability as to what is described or what will happen next.
Ex: If the humidity is very high and the temperature drops substantially, the atmosphere is often unlikely to be able to hold the moisture, so it rains.
(This is from knowledge-management theory. If you want to know about wisdom, check the Web page:
G. Bellinger, D. Castro, & A. Mills: Data, Information, Knowledge, and Wisdom. http://www.systems-thinking.org/dikw/dikw.htm )
19
“Knowledge” as used in this course
Data represents a fact or statement of event without relation to other things. Ex: It is raining.
Information embodies the understanding of a relationship of some sort, possibly cause and effect.
Ex: The temperature dropped 15 degrees and then it started raining.
Knowledge represents a pattern that connects and generally provides a high level of predictability as to what is described or what will happen next.
Ex: If the humidity is very high and the temperature drops substantially, the atmosphere is often unlikely to be able to hold the moisture, so it rains.
This definition of “knowledge” corresponds to that used in data mining (aka “knowledge discovery (in databases)”) and in (particularly symbolic) AI (e.g., “knowledge-based systems”).
It is not the only definition; e.g., cognitive psychology generally assumes that only people can have knowledge, such that computers can only possess (different types of) information.
20
Computerizing data, information, and knowledge: Databases and knowledge bases
Databases
= data + interpretation (metadata)
focus on data and information
= focus on the retrieval of data and information
Knowledge bases
a special kind of database
provide the means for the computerized collection, organization, and retrieval of knowledge
focus on knowledge
= focus on the inferences that can be made from data+information
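To make “database = data + interpretation (metadata)” concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table `road`, its columns, and the stored row are hypothetical, chosen only to echo the road-length example from the earlier slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE road (name TEXT PRIMARY KEY, length_km REAL NOT NULL)")
conn.execute("INSERT INTO road VALUES (?, ?)", ("Road A", 400.0))

# Data: the bare stored values.
data = conn.execute("SELECT name, length_km FROM road").fetchall()

# Metadata: column names and declared types, read from the database's own catalogue.
# PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
metadata = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(road)")]

# Information: the data read through its metadata.
column_names = [name for name, _ in metadata]
information = [dict(zip(column_names, values)) for values in data]
print(information)
conn.close()
```

The value 400.0 on its own is data; paired with the column name `length_km` it becomes information. Deriving new statements from such information is what the knowledge-base part of the slide adds.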
21
Combining data and knowledge from different sources: The importance of conceptual models
To combine data from different databases:
know + integrate their conceptual models
To combine data from databases and knowledge bases:
1. understand the commonalities and differences of their conceptual meta-models
Simplified:
database conceptual models = entities + relations
knowledge base conceptual models = entities + relations + rules for inferencing
2. integrate these conceptual models (as for databases)
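The simplified distinction above (entities + relations versus entities + relations + rules for inferencing) can be sketched as follows. Facts are stored as (subject, relation, object) triples, and a rule is a function that derives new triples from known ones. The names `infer` and `transitive_part_of`, the `part_of` relation, and the example facts are hypothetical, and the naive forward-chaining loop is only one of several possible inference strategies.

```python
def transitive_part_of(facts):
    """Rule: if A part_of B and B part_of C, then A part_of C."""
    part_of = [(s, o) for (s, p, o) in facts if p == "part_of"]
    return {(a, "part_of", d) for (a, b) in part_of for (c, d) in part_of if b == c}


def infer(facts, rules):
    """Naive forward chaining: apply every rule until no new facts appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # Each rule returns a fully built set, so adding to `known` here is safe.
            for new_fact in rule(known):
                if new_fact not in known:
                    known.add(new_fact)
                    changed = True
    return known


# "Database" part: entities and relations only.
facts = {("piston", "part_of", "engine"), ("engine", "part_of", "car")}

# "Knowledge base" part: the same facts plus a rule for inferencing.
inferred = infer(facts, [transitive_part_of])
print(sorted(inferred - facts))
```

A plain database would return only the two stored facts. The rule lets the knowledge base also answer that the piston is part of the car.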
23
Conceptual modelling as a part of database design
24
Conceptual database schemas and conceptual models in general
Conceptual schema: a concise description of the data requirements of the users
includes detailed descriptions of the entity types, relationships, and constraints
does not include implementation details
can be used to communicate with non-technical users
(Elmasri, R. & Navathe, S.B. (2007). Fundamentals of Database Systems. Boston: Addison Wesley. 5th Edition. p. 60)
Conceptual model: a theoretical construct that represents something, with a set of variables and a set of logical and quantitative relationships between them.
describes the semantics of the modelled domain
Models in this sense are constructed to enable reasoning within an idealized logical framework
Often in the form of an ontology, or having an ontology as a part
– Ontology (a simple definition): ~ schema plus axioms for inference
25
Conceptual modelling: languages, automated code generation, integration
Typically, the conceptual model(s) that are developed are captured in a software tool, using a particular conceptual modeling language.
Entity-relationship models (ERM)
Unified modeling language (UML)
But also: resource description framework (RDF), Web ontology language (OWL)
Conceptual modeling is one of the key activities in developing computerized systems for two important reasons.
Firstly, more and more, it is now possible to use computerized tools that can generate part (or sometimes all) of a computer application from the conceptual models encoded in standardized modeling languages [such as UML].
Secondly, computerization of enterprises continues with a focus on integrating systems.
Integration of systems requires an understanding of the semantics of each of the systems to be integrated.
The availability of conceptual models for the participant systems can facilitate the integration process and will require the involved staff to be fluent with the basics of the models employed and to have some modeling capabilities of their own. ...
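The first reason, generating part of an application from a conceptual model, can be illustrated with a deliberately small sketch: an entity type described in Python is turned into an SQL `CREATE TABLE` statement. The classes `Attribute` and `EntityType`, the function `to_create_table`, and the `course` entity are hypothetical and do not correspond to any particular modelling tool; real tools work from UML or E/R diagrams and also cover relationships and constraints.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    sql_type: str
    is_key: bool = False


@dataclass
class EntityType:
    name: str
    attributes: list = field(default_factory=list)


def to_create_table(entity):
    """Generate a CREATE TABLE statement for one entity type.

    Assumption: names come from a trusted model, so they are not quoted or escaped.
    """
    columns = [f"{a.name} {a.sql_type}" for a in entity.attributes]
    keys = [a.name for a in entity.attributes if a.is_key]
    if keys:
        columns.append(f"PRIMARY KEY ({', '.join(keys)})")
    return f"CREATE TABLE {entity.name} ({', '.join(columns)});"


# Hypothetical conceptual model with a single entity type.
course = EntityType("course", [Attribute("code", "TEXT", is_key=True), Attribute("title", "TEXT")])
ddl = to_create_table(course)
print(ddl)
```

The conceptual description stays free of implementation details; the generator is the only place where a target system (here SQL) is chosen.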
27
Recap: Conceptual modelling in the Entity-Relationship Model
insert here:
Jeff Ullman
The Entity-Relationship (E/R) Model.
2004 Slide set.
http://infolab.stanford.edu/~ullman/dscb/pslides/er.ppt
(in particular pp. 1-39)
A lot of detail also in Henk Olivié, Gegevensbanken [Databases]:
3: gegevensmodellering met het entiteit-relatie model [data modelling with the entity-relationship model],
4: het uitgebreide entiteit relatie model en UML [the extended entity-relationship model and UML]
or
Instructor slides of the Elmasri/Navathe book (in English):
ch03.ppt, ch04.ppt in the directory “Lecture/OtherSlides” of this course’s Web site
28
Next lecture
Organisation of the course
Motivation and overview
Data, information, and knowledge
Conceptual modelling, schemas, and ontologies
Recap: Entity-relationship model for data modelling
Data modelling: UML, logics, Semantic Web
29
References / background reading; acknowledgements
p. 23:
Elmasri, R. & Navathe, S.B. (2007). Fundamentals of Database Systems. Boston: Addison Wesley. 5th Edition. p. 410
p. 25: Based on: Dagstuhl seminar April 2008: The Evolution of Conceptual Modeling
http://www.dagstuhl.de/de/programm/kalender/semhp/?semnr=2008181
p. 27 – the referenced Ullman slides refer to
Hector Garcia-Molina, Jeff Ullman, & Jennifer Widom (2002). Database Systems: The Complete Book. Upper Saddle River, NJ: Prentice-Hall.