SQL Unit 21: Object-Oriented Modeling and Design with UML Michael Blaha and James Rumbaugh

1

SQL Unit 21: Object-Oriented Modeling and Design with UML

Michael Blaha and James Rumbaugh

Summary of Selections from Chapter 19 prepared by Kirk Scott

2

Chapter 19 Databases

3

• Chapter 19 in the book is specifically on the topic of developing a database to match an object-oriented design

• Not surprisingly, the example pursued in chapter 19 is based on the example of chapter 12

• 19.1 Introduction

• There is no need to go over this

• It is a review of db concepts

4

19.2 Abbreviated ATM Model

• The book presents an abbreviated ATM model for chapter 19

• Some things are taken out so that the amount of stuff isn’t overwhelming

• A couple of things are added so that specific db related things can be addressed which weren’t present in the original model

• The abbreviated model is shown on the following overhead

5

6

19.3 Implementing Structure--Basic

• What happens next is a discussion of how elements of the OO model are converted into database constructs, like tables

• The book notes that there are software tools out there that will do this for you automatically

• It’s worthwhile knowing how to do it by hand in case you have to and so you know what it is the tools generate

7

• 1. Implement classes = create tables

• 2. Implement associations = create tables that are related by pk, fk pairs

• 3. Implement generalizations, namely classes in superclass-subclass relationships = again, create tables that are related by pk, fk pairs

• 4. Implement identity = make sure you have suitable pk identifiers in tables

8

19.3.1 Classes

• Each class in an O-O design will generally map to a table in a relational design

• Each attribute in a class will map to a column in the table

• Strictly speaking, classes typically don’t contain primary and foreign key attributes

• The details of this will emerge gradually

• Since you already know databases, how it’s done will be no surprise

9

• Constructors and methods in a class have nothing to do with the relational table the class is mapped to

• An example is shown on the following overhead

• This pattern will be repeated for following examples:

• A UML class diagram will be shown, followed by a table schema, followed by the SQL statement for creating the table
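As a rough sketch of the class-to-table rule (not the book's figure), the following uses Python's built-in sqlite3 module. The `Bank` class, its `name` attribute, and the added `bank_id` key are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical class, invented for illustration (not taken from the book).
class Bank:
    def __init__(self, name):   # constructors and methods do not become columns
        self.name = name

    def describe(self):
        return f"Bank {self.name}"

conn = sqlite3.connect(":memory:")
# One table per class, one column per attribute. bank_id is a primary key
# added for the database; the class itself does not declare it.
conn.execute("""
    CREATE TABLE Bank (
        bank_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    )
""")

bank = Bank("First National")
conn.execute("INSERT INTO Bank (name) VALUES (?)", (bank.name,))

# Column names as recorded by SQLite (second field of each table_info row).
columns = [row[1] for row in conn.execute("PRAGMA table_info(Bank)")]
```

Only the attribute shows up in the schema; `describe` and the constructor leave no trace in the table.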

10

11

19.3.2 Associations

• Remember that the term associations in UML refers to relationships between classes

• If classes map to tables, then associations will generally map to the inclusion of pk and fk fields

• There is more to it than that

• For example, when mapping a many-to-many relationship in an O-O system, it will be necessary to create the table in the middle

12

• The book lists the different types of O-O associations, with differing multiplicity (cardinality)

• It gives a verbal summary of how they are mapped.

• For some it gives a complete example

• For others, it’s limited to a verbal description

13

1. Many-to-Many Associations

• Make a table for each of the base classes

• Remember to give those tables primary keys

• Make a table in the middle

• Embed the primary keys of the base tables in the table in the middle and make its primary key the concatenation of the embedded foreign keys

• Remember to include any attributes of the association as fields in the table in the middle
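The steps above can be sketched with sqlite3 as follows. The table and column names (`Customer`, `Account`, `Customer_Account`, `role`) are hypothetical and are not claimed to match the book's overhead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    -- Base tables, each with its own primary key (names are hypothetical).
    CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Account  (account_id  INTEGER PRIMARY KEY, balance NUMERIC NOT NULL);

    -- Table in the middle: embedded foreign keys, concatenated into the pk.
    CREATE TABLE Customer_Account (
        customer_id INTEGER NOT NULL REFERENCES Customer(customer_id),
        account_id  INTEGER NOT NULL REFERENCES Account(account_id),
        role        TEXT,                 -- an attribute of the association itself
        PRIMARY KEY (customer_id, account_id)
    );

    INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Account  VALUES (10, 500), (11, 75);
    INSERT INTO Customer_Account VALUES
        (1, 10, 'owner'), (2, 10, 'co-signer'), (1, 11, 'owner');
""")

# One customer can hold many accounts, and one account can have many customers.
accounts_of_ann = [r[0] for r in conn.execute(
    "SELECT account_id FROM Customer_Account WHERE customer_id = 1 ORDER BY account_id")]
holders_of_10 = [r[0] for r in conn.execute(
    "SELECT customer_id FROM Customer_Account WHERE account_id = 10 ORDER BY customer_id")]
```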

14

15

2. One-to-Many Associations

• Make a table for each of the base classes

• Remember to give those tables primary keys

• Embed the primary key of the one table as a foreign key in the many table

• Sometimes in the O-O design an association arc may have a name on it

• If so, that would be a good choice for the name of the foreign key field
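A minimal sqlite3 sketch of the one-to-many case, assuming a `Bank` table on the one side and an `Account` table on the many side; the names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE Bank (bank_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- The "many" table embeds the primary key of the "one" table.
    -- If the association arc carried a name in the UML, that name would be
    -- a natural choice for this column instead of bank_id.
    CREATE TABLE Account (
        account_id INTEGER PRIMARY KEY,
        balance    NUMERIC NOT NULL,
        bank_id    INTEGER NOT NULL REFERENCES Bank(bank_id)
    );

    INSERT INTO Bank VALUES (1, 'First National');
    INSERT INTO Account VALUES (100, 250, 1), (101, 40, 1);
""")

accounts_per_bank = conn.execute(
    "SELECT bank_id, COUNT(*) FROM Account GROUP BY bank_id").fetchall()

# A row pointing at a non-existent bank is rejected by the foreign key.
try:
    conn.execute("INSERT INTO Account VALUES (102, 0, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```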

16

17

3. One-to-One Associations

• The book states that these rarely occur

• Recall that in the database discussion, the question was whether this should be one table or two, and which way to embed

• In this context, the assumption is that there are two classes in the O-O design and there will be two tables in the database design

• The question still remains of which way to embed

18

19

4. N-ary Associations

• The book states that these also rarely occur

• At an earlier point the term ternary association was used

• N-ary and ternary refer to tables in the middle of star-like designs

• In other words, tables in the middle where there are more than two base tables being connected

• They are treated like tables in the middle, with primary keys consisting of the concatenation of embedded foreign keys

• The book doesn’t provide a separate example of this

20

Association Classes

• This boils down to the idea that in an O-O design, there may already be in essence a table in the middle

• This book refers to association classes

• Design pattern terminology might refer to this as some kind of mediator class

• The bottom line is that if it exists in the O-O design, the easiest thing to do is turn it into a table in the relational design

• Again, the book doesn’t provide a separate example of this

21

Qualified Associations

• This topic will have an example

• It is worth paying attention to because it should make clear what qualified associations really are

• Remember that from the perspective of CS 202 and CS 204, the UML notation and the concept were things that hadn’t come up before

22

• The point is that we need to be aware of this because a qualified association will translate into a relational database in a certain way

• The qualified association under consideration is diagrammed on the following overhead

23

24

• When you consider the diagram, you notice that a 1-1 relationship is shown

• In fact, the relationship between banks and accounts is 1-m

• What the diagram is telling you is that within the context of the Bank class, given an accountCode, you can identify exactly 0 or 1 accounts that match that combination

25

• You could say that there is a 1-1 relationship between (bank + accountCode) and account

• But when you translate into the relational model, you get the two base tables in a 1-m relationship

• The primary key of the one table, bank, is embedded as a foreign key in the many table, account

26

• The qualifier, accountCode, appears in the O-O model in the context of bank

• However, the accountCode is a descriptor of an account

• The accountCode becomes a field in the account table in the relational model

27

• You may recall the term “candidate key”

• This referred to a field or set of fields in a table that could have served as a primary key, but wasn’t chosen as the primary key

• The concatenation of the bankID and the accountCode is a candidate key in the resulting Account table

• The book indicates this with the abbreviation ck

• The illustration follows
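One way to express the candidate key in SQL is a UNIQUE constraint over the bank foreign key and the account code. The sketch below assumes that choice and uses made-up column names and data; the book's own illustration may be written differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Bank (bank_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    CREATE TABLE Account (
        account_id   INTEGER PRIMARY KEY,                        -- chosen primary key
        bank_id      INTEGER NOT NULL REFERENCES Bank(bank_id),  -- 1-m to Bank
        account_code TEXT NOT NULL,                              -- the qualifier
        UNIQUE (bank_id, account_code)                           -- candidate key (ck)
    );

    INSERT INTO Bank VALUES (1, 'First National'), (2, 'Second Savings');
    -- The same code may appear in two different banks.
    INSERT INTO Account VALUES (1, 1, 'CHK-001'), (2, 2, 'CHK-001');
""")

# Within one bank, a given account code identifies at most one account.
matches = conn.execute(
    "SELECT account_id FROM Account WHERE bank_id = ? AND account_code = ?",
    (1, "CHK-001")).fetchall()

# Reusing a code inside the same bank violates the candidate key.
try:
    conn.execute("INSERT INTO Account VALUES (3, 1, 'CHK-001')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```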

28

29

Aggregation, Composition

• Aggregation and composition are just special forms of association

• When turning them into relational models, the process is the same as for any other association

30

19.3.3 Generalizations

• The book points out that things work differently depending on whether you have single or multiple inheritance

• Since we’re working with Java, we don’t have multiple inheritance

• We have abstract classes and interfaces though

31

• The point is that there is no such thing as an instance of an abstract class or an interface

• If, in general, classes translate into tables, then instances translate into rows in tables

• A table that cannot contain a row is meaningless

• Therefore, trying to turn abstract classes or interfaces into tables is meaningless

32

• What we are concerned with is concrete classes in an inheritance hierarchy

• The basic rule of thumb still applies:

• Turn each class into a table

• What glues this all together is that a record in a subclass table will have the same primary key value as the corresponding record in the superclass table

33

• In object-oriented terms, an object inherits certain instance variables from its superclass

• In relational terms, there is a record in the superclass table and a matching record in the subclass table

• The “inherited values” are the fields that are maintained in the superclass table with the matching primary key value

34

• You may recall that in Watson’s presentation of this, animals and horses and sheep were used as illustrations.

• There was a class for each, and the horse and sheep classes were referred to as subtypes.

• You also saw something like this in the cardealership database

35

• Car and Carsale had the same primary key

• Car contained information common to all cars

• Carsale was a subtype

• It contained information about that category of cars that had sold

• The book’s example is shown on the following overhead

36

37

• There are a few things to observe about the example

• Unlike the cardealership example, where both tables had a vin field, the different kinds of accounts have different primary key names

• It’s not a bad idea to have differing, descriptive field names

• The important point is that they are all on the same domain, and there is a pk-fk relationship from the superclass to the subclass tables

38

• The book also points out that in this example there was a class, SavingsAccount, that had no instance variables

• It translated into a table that only contained a primary key field

• The book says that it’s still a good idea to keep this table

• If it’s in one design it should be in the other

• It’s possible that it will have fields added to it later
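A sketch of the generalization mapping, assuming an `Account` superclass table with `CheckingAccount` and `SavingsAccount` subclass tables. The column names, including `overdraft_limit`, are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Superclass table: fields common to every account.
    CREATE TABLE Account (account_id INTEGER PRIMARY KEY, balance NUMERIC NOT NULL);

    -- Subclass tables: the pk has a different, descriptive name, but it is on
    -- the same domain as Account.account_id and is also a fk to it.
    CREATE TABLE CheckingAccount (
        checking_account_id INTEGER PRIMARY KEY REFERENCES Account(account_id),
        overdraft_limit     NUMERIC NOT NULL
    );
    -- A subclass with no instance variables of its own still gets a table,
    -- consisting only of the primary key.
    CREATE TABLE SavingsAccount (
        savings_account_id INTEGER PRIMARY KEY REFERENCES Account(account_id)
    );

    INSERT INTO Account VALUES (1, 500), (2, 900);
    INSERT INTO CheckingAccount VALUES (1, 100);
    INSERT INTO SavingsAccount VALUES (2);
""")

# "Inherited" values come from the superclass row with the matching key.
checking = conn.execute("""
    SELECT a.account_id, a.balance, c.overdraft_limit
    FROM Account a
    JOIN CheckingAccount c ON c.checking_account_id = a.account_id
""").fetchall()
savings_ids = [r[0] for r in conn.execute("SELECT savings_account_id FROM SavingsAccount")]
```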

39

19.3.4 Identity

• In discussing this issue the book uses these two terms:

• Object identity: This means making up an arbitrary (typically numeric) field as the pk for a table

• Value based identity: This means using some combination (concatenation) of actual data fields as the pk for a table

40

• This issue has come up before in the discussion of database design

• When translating from O-O to relational, the design choice remains

• The book prefers object identity—which is consistent with what we’ve talked about before

41

• When you translate a base class into a base table, you give it an arbitrary pk field

• The pk fields of tables in the middle are then concatenated

• Recall that a table in the middle might have something like a date field that gets added to the pk

• There is no way around that

• In that case, a data value belongs in the key

• The book illustrates its preference in the following diagram
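The preference for object identity might look like this in practice. The tables (`Customer`, `Atm`, `Visit`) and the date column are hypothetical, chosen only to show arbitrary keys on base tables and a data value joining the key of a table in the middle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Object identity: each base table gets an arbitrary numeric pk.
    CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Atm      (atm_id      INTEGER PRIMARY KEY, location TEXT NOT NULL);

    -- Table in the middle: concatenated foreign keys, plus a date that has to
    -- be part of the key so the same pair can occur more than once.
    CREATE TABLE Visit (
        customer_id INTEGER NOT NULL REFERENCES Customer(customer_id),
        atm_id      INTEGER NOT NULL REFERENCES Atm(atm_id),
        visit_date  TEXT    NOT NULL,
        PRIMARY KEY (customer_id, atm_id, visit_date)
    );
""")

# Two customers with identical data still get distinct identities.
first = conn.execute("INSERT INTO Customer (name) VALUES ('Ann Lee')").lastrowid
second = conn.execute("INSERT INTO Customer (name) VALUES ('Ann Lee')").lastrowid
atm = conn.execute("INSERT INTO Atm (location) VALUES ('Main St')").lastrowid

conn.execute("INSERT INTO Visit VALUES (?, ?, '2024-01-05')", (first, atm))
conn.execute("INSERT INTO Visit VALUES (?, ?, '2024-01-06')", (first, atm))
visit_count = conn.execute("SELECT COUNT(*) FROM Visit").fetchone()[0]
```

With value-based identity, the two "Ann Lee" rows would need some combination of real data fields to tell them apart.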

42

43

19.3.5 Summary of Basic Rules for RDBMS Implementation

44

19.4 Implementing Structure--Advanced

• This section will cover the following four topics:

• 1. Implementing foreign keys

• 2. Implementing check constraints

• 3. Implementing indexes

• 4. Considering views

45

19.4.1 Foreign Keys

• This section isn’t about creating the foreign keys

• It’s about referential integrity

• Depending on the translation from O-O to relational, there may be specific ways you want to handle ON DELETE, ON UPDATE, and so on

46

• Up to this point, this was the standard default given for how to handle this:

• ON DELETE RESTRICT

• ON UPDATE CASCADE

47

• Consider the case of generalizations

• This was the translation of superclass and subclass into table and subtype table

• The subtype table contains a fk that refers to the superclass table

48

• If the parent record is deleted, you would like the child record to be deleted

• This is accomplished by adding the following to the subtype table’s definition:

• ON DELETE CASCADE

• The book’s application of this rule to its example is shown on the following overhead by adding suitable constraints to some of the tables in the design

49

50

• The book points out that in reality, in this situation, if the child is deleted, it would also be desirable to delete the parent record

• Note that referential integrity does not support this

• This would become something that you had to implement separately
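The behavior in both directions can be checked with a small sqlite3 sketch. Table and column names are assumptions, and SQLite needs `PRAGMA foreign_keys = ON` before it enforces any of this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce the rule
conn.executescript("""
    CREATE TABLE Account (account_id INTEGER PRIMARY KEY, balance NUMERIC NOT NULL);

    -- Subtype table: its pk is also a fk to the superclass table.
    CREATE TABLE CheckingAccount (
        checking_account_id INTEGER PRIMARY KEY
            REFERENCES Account(account_id) ON DELETE CASCADE ON UPDATE CASCADE,
        overdraft_limit     NUMERIC NOT NULL
    );

    INSERT INTO Account VALUES (1, 500), (2, 900);
    INSERT INTO CheckingAccount VALUES (1, 100), (2, 300);
""")

# Deleting the parent removes the matching child row.
conn.execute("DELETE FROM Account WHERE account_id = 1")
children_left = [r[0] for r in conn.execute(
    "SELECT checking_account_id FROM CheckingAccount")]

# Deleting the child leaves the parent behind; referential integrity has no
# clause for this direction, so it would need separate code or a trigger.
conn.execute("DELETE FROM CheckingAccount WHERE checking_account_id = 2")
parents_left = [r[0] for r in conn.execute("SELECT account_id FROM Account")]
```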

51

• The book illustrates two more cases, based on association rather than generalization

• A customer has an address

• If you delete the customer, you would like to delete the address

• Alternatively, a customer has accounts

• You don’t want to be able to delete any customer that has accounts

52

• The book’s illustration of adding these constraints to the tables is shown on the following overhead

• Note that they are using a system with different syntax

• The default is apparently “ON DELETE RESTRICT”
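A sketch of the two association cases, written with explicit ON DELETE clauses in SQLite syntax. Embedding the customer key in `Address` and `Account` is an assumption made for this example; the book's tables may embed differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Deleting a customer should take the address with it.
    CREATE TABLE Address (
        address_id  INTEGER PRIMARY KEY,
        street      TEXT NOT NULL,
        customer_id INTEGER NOT NULL
            REFERENCES Customer(customer_id) ON DELETE CASCADE
    );
    -- A customer that still has accounts must not be deleted.
    CREATE TABLE Account (
        account_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
            REFERENCES Customer(customer_id) ON DELETE RESTRICT
    );

    INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Address  VALUES (1, '1 Elm St', 1), (2, '2 Oak St', 2);
    INSERT INTO Account  VALUES (10, 2);
""")

# Ann has no accounts: the delete succeeds and her address is cascaded away.
conn.execute("DELETE FROM Customer WHERE customer_id = 1")

# Bob has an account: the delete is refused and nothing of his is removed.
try:
    conn.execute("DELETE FROM Customer WHERE customer_id = 2")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

addresses_left = [r[0] for r in conn.execute("SELECT address_id FROM Address")]
```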

53

54

19.4.2 Check Constraints

• SQL has another kind of constraint which wasn’t covered in the first half of the course

• It is a way of enforcing data integrity

• If you looked through the GUI for table creation in MS Access, you would find similar capabilities

• The idea is that you can specify the set of values valid for a given field

55

• The book illustrates how this can be useful when translating generalizations (inheritance)

• In their translation, the superclass table has a field where the type of the matching subclass record is indicated

• Types could only be those of the given subclasses

• Adding such a constraint is illustrated on the following overhead
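A minimal sketch of such a constraint, assuming a type column named `account_type` whose valid values are the subclass names; both the column name and the values are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Account (
        account_id   INTEGER PRIMARY KEY,
        balance      NUMERIC NOT NULL,
        -- Discriminator: names the subclass table holding the matching record.
        account_type TEXT NOT NULL
            CHECK (account_type IN ('CheckingAccount', 'SavingsAccount'))
    )
""")

conn.execute("INSERT INTO Account VALUES (1, 500, 'CheckingAccount')")

# A value outside the listed set is refused by the check constraint.
try:
    conn.execute("INSERT INTO Account VALUES (2, 10, 'LoanAccount')")
    invalid_rejected = False
except sqlite3.IntegrityError:
    invalid_rejected = True

stored_types = [r[0] for r in conn.execute("SELECT account_type FROM Account")]
```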

56

57

19.4.3 Indexes

• This is a very short section with nothing new in it

• If you’re doing the translation, it’s up to you to create the indexes

• If you’re relying on software, it’s still up to you to make sure that you’ve got all the indexes you need

58

• The book repeats that you get pk indexes by default on all tables

• It also reiterates that at the very least you will want indexes on all fk fields

• Others may also be desirable

• The book’s illustration is given on the following overhead
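For instance, an index on a foreign key field could be created as below. The index name `idx_account_bank_id` and the tables are hypothetical; note that SQLite does not create an index on a foreign key column by itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Bank (bank_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Account (
        account_id INTEGER PRIMARY KEY,      -- primary key lookups are already fast
        bank_id    INTEGER NOT NULL REFERENCES Bank(bank_id)
    );

    -- Index the fk field so joins and lookups by bank avoid a full scan.
    CREATE INDEX idx_account_bank_id ON Account (bank_id);
""")

# Names of the indexes SQLite reports for the Account table.
index_names = [row[1] for row in conn.execute("PRAGMA index_list(Account)")]
```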

59

60

19.4.4 Views

• You can define a view for each subclass in a hierarchy

• The idea is that by doing a join query between the superclass and subclass tables, you can bring together both local and inherited instance variables

• The book’s illustration is shown on the following overhead
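A view of this kind might be sketched as follows, assuming hypothetical `Account` and `CheckingAccount` tables and a view named `CheckingAccountView`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (account_id INTEGER PRIMARY KEY, balance NUMERIC NOT NULL);
    CREATE TABLE CheckingAccount (
        checking_account_id INTEGER PRIMARY KEY REFERENCES Account(account_id),
        overdraft_limit     NUMERIC NOT NULL
    );

    -- One view per subclass: inherited fields from the superclass table,
    -- local fields from the subclass table.
    CREATE VIEW CheckingAccountView AS
        SELECT a.account_id, a.balance, c.overdraft_limit
        FROM Account a
        JOIN CheckingAccount c ON c.checking_account_id = a.account_id;

    INSERT INTO Account VALUES (1, 500), (2, 900);
    INSERT INTO CheckingAccount VALUES (1, 100);
""")

# The view reads like a single table holding the complete subclass object.
view_rows = conn.execute("SELECT * FROM CheckingAccountView").fetchall()
```

Account 2 has no row in `CheckingAccount`, so it does not appear in the view.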

61

62

19.4.5 Summary of Advanced Rules for RDBMS Implementation

63

19.5 Implementing Structure for the ATM Example

• The book gives table schemas and complete SQL for creating the database corresponding to the O-O ATM design

• The UML for the O-O design is repeated on the following overhead

• The table schemas are given on the overhead following the next one

• The SQL is also given for the sake of completeness, but I’m not going to read through it

64

65

66

67

68

69

19.6 Implementing Functionality

• Databases are all about structuring and storing data

• Software is about functionality

• There are general areas that can be identified where there are questions about matching up a software system with a database

70

• 1. Coupling a programming language to a database

• 2. Converting data

• 3. Encapsulation vs. query optimization

• 4. Use of SQL code

71

19.6.1 Coupling a Programming Language to a Database

• SQL is declarative

• Programming languages are procedural

• This means that there has to be some sort of crossover technique for merging the two

• The book identifies 8 possible ways of going about this

72

1. Preprocessor and Postprocessor

• The idea is to work with temporary tables/files

• For example, write a query that generates results.

• Save them

• Write a program that processes the result file

• Conversely, write a program that generates file output

• Then use database tools to apply that to the db

• This is clumsy and limited, although possibly useful in some settings

73

2. Script Files

• A database management system may support saving sequences of SQL commands in a single executable file

• This isn’t really programming, but it is an expansion of one at a time SQL commands

• This is a simple approach which may sometimes be sufficient

74

3. Embedded DBMS Commands

• In other words, embedded SQL

• Programs with embedded SQL are not necessarily easy to write or maintain

• The classic illustration of the mismatch of paradigms is having to loop in order to acquire query results

• You will be familiar with this from your project

• Embedded SQL is a common approach

• The book suggests that it’s not necessarily the best approach

75

4. Custom Application Programming Interface (API)

• In effect, this is built on top of embedded SQL, but it provides a better alternative

• Instead of embedding SQL directly in user programs, add classes/methods which have the embedded SQL in them and embody the needed functionality

• Then user programs can be built on those constructs

• ODBC and JDBC are examples of this

• In a given environment, a programmer might also develop reusable components like this
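A reusable component of this kind could look like the sketch below. `AccountStore` and its methods are hypothetical names, and Python's sqlite3 module stands in for a call-level interface such as JDBC.

```python
import sqlite3

class AccountStore:
    """Hypothetical data-access class: the SQL lives here, not in caller code."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS Account ("
            "account_id INTEGER PRIMARY KEY, balance NUMERIC NOT NULL)")

    def open_account(self, opening_balance):
        cur = self.conn.execute(
            "INSERT INTO Account (balance) VALUES (?)", (opening_balance,))
        return cur.lastrowid

    def deposit(self, account_id, amount):
        self.conn.execute(
            "UPDATE Account SET balance = balance + ? WHERE account_id = ?",
            (amount, account_id))

    def balances(self):
        # The loop over query results, typical of embedded SQL, is written
        # once here instead of in every program that needs the data.
        result = {}
        for account_id, balance in self.conn.execute(
                "SELECT account_id, balance FROM Account"):
            result[account_id] = balance
        return result

# User programs work with methods and never see a SQL string.
store = AccountStore(sqlite3.connect(":memory:"))
acct = store.open_account(100)
store.deposit(acct, 25)
all_balances = store.balances()
```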

76

5. Stored Procedures

• This came up briefly in the Watson presentation on SQL

• Implementations of stored procedures can vary widely

• Roughly, the range goes from scripts to database management systems that effectively have some sort of programming language built in

• The developer can write and save dbms code on the dbms side rather than in an external program

77

6. Fourth-Generation Language (4GL)

• This is a term that refers to a GUI environment for program development

• MS Access, for example, has a visual environment for putting together reports and forms, where the data that populates them is ultimately retrieved by queries under the covers

• The book says this is good for simple applications and prototyping

• It doesn’t have the same power as a programming environment

78

7. Generic Layer

• This is a simplified interface to a database for a programming language

• It is apparently somewhat like a simplified interface for embedding commands

• Any simplification involves a trade-off• It may be easier to use• But it will limit access to functionality

79

8. Metadata-Driven System

• This is an advanced topic

• Applications may be structured to query the data dictionary (SYSTABLE, etc.) and then query the database

• The book gives as an example applications that learn

• In other words, data mining, etc. might be implemented using techniques like these
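As a small illustration of the two-step pattern, the sketch below first reads the catalog and then queries whatever tables it finds. It assumes SQLite, whose catalog table is `sqlite_master` rather than SYSTABLE; the tables and data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Bank    (bank_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Account (account_id INTEGER PRIMARY KEY, balance NUMERIC NOT NULL);
    INSERT INTO Bank VALUES (1, 'First National');
    INSERT INTO Account VALUES (10, 500), (11, 75);
""")

# Step 1: ask the data dictionary which tables exist.
table_names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# Step 2: query the database using what the dictionary reported. The names
# come from the catalog, not from user input, so they are quoted as
# identifiers and placed directly in the statement.
row_counts = {}
for name in table_names:
    row_counts[name] = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
```

The program contains no table names of its own, so it keeps working if tables are added later.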

80

Data Interaction Techniques

81

19.6.2 Data Conversion

• Data conversion is a practical concern

• It has not been touched on before, but it is of interest whenever you are converting data from one form or system to another, regardless of whether an O-O model is involved

• This can involve transfer of data between current systems and transfer from an old system to a new one

82

• 1. Cleansing data = correcting data integrity problems

• 2. Handling missing data

• 3. Moving data = figuring out exactly how to export/import from one format to another

• 4. Merging data (word to the wise: figure out a combined data model first; then take care of the technical details of how to combine data from different sources/formats)

83

• 5. Changing data structure

• From the db design point of view, this is the most interesting point

• Different data sources may contain similar information

• However, field names and types may differ, and more importantly, designs may differ

84

• For example, one application may handle addresses using the LineItem model

• Another may have used a different model

• You need an overall model to convert both to, and then you have the problem of doing the conversion and merging

85

• The book suggests an approach based on what are called staging tables

• The idea is to convert raw source information into relational tables

• At that point you have the full power of SQL to manipulate the contents before arriving at the final, converted data set

• This is a very good idea
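A rough sketch of the staging-table idea: raw text is loaded unchanged, then cleaned with SQL on the way into the final table. The table names, the sample rows, and the cleansing rules (trim names, blank or non-numeric balances become NULL) are all assumptions for illustration.

```python
import sqlite3

# Raw source rows as they might arrive from a file export: everything is text.
raw_rows = [("  Ann Lee ", "120.50"), ("Bob Ray", ""), ("Cy Poe", "75"), ("   ", "10")]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Staging table: untyped, unconstrained copy of the source.
    CREATE TABLE Staging_Customer (raw_name TEXT, raw_balance TEXT);
    -- Final table in the converted design.
    CREATE TABLE Customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        balance     REAL
    );
""")
conn.executemany("INSERT INTO Staging_Customer VALUES (?, ?)", raw_rows)

# Once the data is relational, SQL does the cleansing: trim names, turn
# balances that do not start with a digit into NULL, skip rows with no name.
conn.execute("""
    INSERT INTO Customer (name, balance)
    SELECT TRIM(raw_name),
           CASE WHEN raw_balance GLOB '[0-9]*' THEN CAST(raw_balance AS REAL) END
    FROM Staging_Customer
    WHERE TRIM(raw_name) <> ''
    ORDER BY rowid
""")

converted = conn.execute(
    "SELECT name, balance FROM Customer ORDER BY customer_id").fetchall()
```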

86

19.6.3 Encapsulation vs. Query Optimization

• This topic is related to how you process your data

• The basic observation is this:

• In SQL, you can easily write a join query across many tables

• A single query is allowed to access any field of any table

87

• In a corresponding O-O implementation, the tables are classes which may have references to each other

• To process data belonging to three different tables might involve a call x.getY().getZ()

• Encapsulation says that x shouldn’t have direct access to z.

88

• The obvious problem is that calls of the form x.getY().getZ() are complex and only get worse if more tables are involved

• If you’ve had CS 304, you will recognize that such calls are not just complex

• They are bad in the sense that they will tend to violate the Law of Demeter

• In other words, you have to break encapsulation to accomplish your goals
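The contrast can be made concrete with a small sketch. The three classes and the three tables below are hypothetical; the point is only that the object version chains getters through intermediate objects while the SQL version reaches any field in one join.

```python
import sqlite3

# O-O side: each class only knows its direct neighbor.
class Bank:
    def __init__(self, name):
        self._name = name
    def get_name(self):
        return self._name

class Account:
    def __init__(self, bank):
        self._bank = bank
    def get_bank(self):
        return self._bank

class Customer:
    def __init__(self, account):
        self._account = account
    def get_account(self):
        return self._account

customer = Customer(Account(Bank("First National")))
# The caller reaches through Account to get at Bank: a getY().getZ() chain.
name_via_objects = customer.get_account().get_bank().get_name()

# Relational side: one query may touch any field of any table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Bank (bank_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Account (
        account_id INTEGER PRIMARY KEY,
        bank_id    INTEGER NOT NULL REFERENCES Bank(bank_id));
    CREATE TABLE Customer (
        customer_id INTEGER PRIMARY KEY,
        account_id  INTEGER NOT NULL REFERENCES Account(account_id));
    INSERT INTO Bank VALUES (1, 'First National');
    INSERT INTO Account VALUES (10, 1);
    INSERT INTO Customer VALUES (100, 10);
""")
name_via_sql = conn.execute("""
    SELECT b.name
    FROM Customer c
    JOIN Account a ON a.account_id = c.account_id
    JOIN Bank b    ON b.bank_id = a.bank_id
    WHERE c.customer_id = 100
""").fetchone()[0]
```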

89

19.6.4 Use of SQL Code

• There is a range of implementation choices

• Write a pure O-O front end

• This will preserve encapsulation in the code

• You will have the full power of a high level language to implement complex logic

• Considered from the point of view of querying and manipulating the db back end, performance will not be good

90

• Write a front end that essentially is a framework for executing SQL queries

• Code will not be highly O-O but performance will improve

• I am prejudiced in favor of the second option, but complex applications may require the first approach

• The book illustrates this with a query for a monthly statement of ATM transactions, as opposed to an O-O method for generating those results

91

92

19.7 Object-Oriented Databases

• Object-Oriented databases can be implemented in many different ways

• Fundamentally they are based on these concepts:

• Objects are persistent (they are what is stored in the db)

• Has-a relationships are captured by references

• There is a tree-like relationship among types of objects (due to inheritance)

93

• The book identifies two basic reasons for opting for an O-O database

• 1. An O-O programmer doesn’t fully understand the relational model and wants a database back end that reflects a known programming paradigm

• This is not a sound reason

• Relational databases are the gold standard and it’s necessary to adapt to them

94

• 2. The O-O database is more suitable to the problem domain or offers special features which the relational model doesn’t offer

• If you recall, the parts, sub-parts, assembly example pushed the limits of the relational model

• In some engineering or manufacturing environments, especially, an O-O database might be useful

• This is a valid reason

95

19.8 Practical Tips

• This section of the book is just a compressed summary of the foregoing points

• One claim is worth examining:

• “Normal forms apply regardless of the development approach. However, it is unnecessary to check them if you build a sound OO model.”

96

• This is reminiscent of Watson’s claim that you don’t need the normal forms if you build a sound E-R model.

• It’s basically a tautology.

• It’s true that you don’t need to check the normal forms if by chance you have created a model that doesn’t violate them.

97

• However, at the very least, this seems to be a corollary truth:

• You will only build a sound model if you have internalized the normal forms, whether you learned them formally or not

• In any case, it is worthwhile to know the normal forms and to be able to apply them when checking a model for correctness

98

19.9 Chapter Summary

99

The End
