A Relational Model of Data for Large Shared Data Banks - E. F. Codd


  • 8/14/2019 A Relational Model of Data for Large Shared Data Banks - E. F. Codd


    Information Retrieval P. BAXENDALE, Editor

    A Relational Model of Data for Large Shared Data Banks

    E. F. CODD
    IBM Research Laboratory, San Jose, California

    Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information.

    Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user's model.

    KEY WORDS AND PHRASES: data bank, data base, data structure, data organization, hierarchies of data, networks of data, relations, derivability, redundancy, consistency, composition, join, retrieval language, predicate calculus, security, data integrity

    CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22, 4.29

    1. Relational Model and Normal Form

    1.1. INTRODUCTION

    This paper is concerned with the application of elementary relation theory to systems which provide shared access to large banks of formatted data. Except for a paper by Childs [1], the principal application of relations to data systems has been to deductive question-answering systems. Levein and Maron [2] provide numerous references to work in this area.

    In contrast, the problems treated here are those of data independence--the independence of application programs and terminal activities from growth in data types and changes in data representation--and certain kinds of data inconsistency which are expected to become troublesome even in nondeductive systems.

    Communications of the ACM, Volume 13 / Number 6 / June, 1970

    The relational view (or model) of data described in Section 1 appears to be superior in several respects to the graph or network model [3, 4] presently in vogue for noninferential systems. It provides a means of describing data with its natural structure only--that is, without superimposing any additional structure for machine representation purposes. Accordingly, it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and machine representation and organization of data on the other.

    A further advantage of the relational view is that it forms a sound basis for treating derivability, redundancy, and consistency of relations--these are discussed in Section 2. The network model, on the other hand, has spawned a number of confusions, not the least of which is mistaking the derivation of connections for the derivation of relations (see remarks in Section 2 on the "connection trap").

    Finally, the relational view permits a clearer evaluation of the scope and logical limitations of present formatted data systems, and also the relative merits (from a logical standpoint) of competing representations of data within a single system. Examples of this clearer perspective are cited in various parts of this paper. Implementations of systems to support the relational model are not discussed.

    1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS

    The provision of data description tables in recently developed information systems represents a major advance toward the goal of data independence [5, 6, 7]. Such tables facilitate changing certain characteristics of the data representation stored in a data bank. However, the variety of data representation characteristics which can be changed without logically impairing some application programs is still quite limited. Further, the model of data with which users interact is still cluttered with representational properties, particularly in regard to the representation of collections of data (as opposed to individual items). Three of the principal kinds of data dependencies which still need to be removed are: ordering dependence, indexing dependence, and access path dependence. In some systems these dependencies are not clearly separable from one another.

    1.2.1. Ordering Dependence. Elements of data in a data bank may be stored in a variety of ways, some involving no concern for ordering, some permitting each element to participate in one ordering only, others permitting each element to participate in several orderings. Let us consider those existing systems which either require or permit data elements to be stored in at least one total ordering which is closely associated with the hardware-determined ordering of addresses. For example, the records of a file concerning parts might be stored in ascending order by part serial number. Such systems normally permit application programs to assume that the order of presentation of records from such a file is identical to (or is a subordering of) the stored ordering. Those application programs which take advantage of the stored ordering of a file are likely to fail to operate correctly if for some reason it becomes necessary to replace that ordering by a different one. Similar remarks hold for a stored ordering implemented by means of pointers.

    It is unnecessary to single out any system as an example, because all the well-known information systems that are marketed today fail to make a clear distinction between order of presentation on the one hand and stored ordering on the other. Significant implementation problems must be solved to provide this kind of independence.
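
    The failure mode described above can be illustrated with a minimal sketch. The parts file, its contents, and both lookup functions are hypothetical and are not taken from any system named in this paper; the point is only that a program which stops scanning early because it assumes ascending serial numbers gives a wrong answer once the stored ordering is replaced.

```python
# Hypothetical parts file: each record is (part_serial_number, part_name).
stored_by_serial = [(101, "washer"), (205, "gear"), (310, "bolt")]

# The same records after the stored ordering is replaced (here: by part name).
stored_by_name = sorted(stored_by_serial, key=lambda rec: rec[1])


def find_part_assuming_serial_order(records, serial):
    """Order-dependent lookup: gives up as soon as a larger serial is seen."""
    for rec_serial, name in records:
        if rec_serial == serial:
            return name
        if rec_serial > serial:
            return None  # only valid if records ascend by serial number
    return None


def find_part_order_independent(records, serial):
    """Order-independent lookup: makes no assumption about presentation order."""
    for rec_serial, name in records:
        if rec_serial == serial:
            return name
    return None
```

    With the records re-sorted by name, the first function reports part 101 as missing, while the second still finds it.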

    1.2.2. Indexing Dependence. In the context of formatted data, an index is usually thought of as a purely performance-oriented component of the data representation. It tends to improve response to queries and updates and, at the same time, slow down response to insertions and deletions. From an informational standpoint, an index is a redundant component of the data representation. If a system uses indices at all and if it is to perform well in an environment with changing patterns of activity on the data bank, an ability to create and destroy indices from time to time will probably be necessary. The question then arises: Can application programs and terminal activities remain invariant as indices come and go?

    Present formatted data systems take widely different approaches to indexing. TDMS [7] unconditionally provides indexing on all attributes. The presently released version of IMS [5] provides the user with a choice for each file: a choice between no indexing at all (the hierarchic sequential organization) or indexing on the primary key only (the hierarchic indexed sequential organization). In neither case is the user's application logic dependent on the existence of the unconditionally provided indices. IDS [8], however, permits the file designers to select attributes to be indexed and to incorporate indices into the file structure by means of additional chains. Application programs taking advantage of the performance benefit of these indexing chains must refer to those chains by name. Such programs do not operate correctly if these chains are later removed.

    1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. Application programs developed to work with these systems tend to be logically impaired if the trees or networks are changed in structure. A simple example follows.

    Suppose the data bank contains information about parts and projects. For each part, the part number, part name, part description, quantity-on-hand, and quantity-on-order are recorded. For each project, the project number, project name, project description are recorded. Whenever a project makes use of a certain part, the quantity of that part committed to the given project is also recorded. Suppose that the system requires the user or file designer to declare or define the data in terms of tree structures. Then, any one of the hierarchical structures may be adopted for the information mentioned above (see Structures 1-5).

    Structure 1. Projects Subordinate to Parts

    File  Segment    Fields
    F     PART       part #, part name, part description, quantity-on-hand, quantity-on-order
            PROJECT  project #, project name, project description, quantity committed

    Structure 2. Parts Subordinate to Projects

    File  Segment    Fields
    F     PROJECT    project #, project name, project description
            PART     part #, part name, part description, quantity-on-hand, quantity-on-order, quantity committed

    Structure 3. Parts and Projects as Peers, Commitment Relationship Subordinate to Projects

    File  Segment    Fields
    F     PART       part #, part name, part description, quantity-on-hand, quantity-on-order
    G     PROJECT    project #, project name, project description
            PART     part #, quantity committed

    Structure 4. Parts and Projects as Peers, Commitment Relationship Subordinate to Parts

    File  Segment    Fields
    F     PART       part #, part description, quantity-on-hand, quantity-on-order
            PROJECT  project #, quantity committed
    G     PROJECT    project #, project name, project description

    Structure 5. Parts, Projects, and Commitment Relationship as Peers

    File  Segment    Fields
    F     PART       part #, part name, part description, quantity-on-hand, quantity-on-order
    G     PROJECT    project #, project name, project description
    H     COMMIT     part #, project #, quantity committed


    Now, consider the problem of printing out the part number, part name, and quantity committed for every part used in the project whose project name is "alpha." The following observations may be made regardless of which available tree-oriented information system is selected to tackle this problem. If a program P is developed for this problem assuming one of the five structures above--that is, P makes no test to determine which structure is in effect--then P will fail on at least three of the remaining structures. More specifically, if P succeeds with structure 5, it will fail with all the others; if P succeeds with structure 3 or 4, it will fail with at least 1, 2, and 5; if P succeeds with 1 or 2, it will fail with at least 3, 4, and 5. The reason is simple in each case. In the absence of a test to determine which structure is in effect, P fails because an attempt is made to execute a reference to a nonexistent file (available systems treat this as an error) or no attempt is made to execute a reference to a file containing needed information. The reader who is not convinced should develop sample programs for this simple problem.

    Since, in general, it is not practical to develop application programs which test for all tree structurings permitted by the system, these programs fail when a change in structure becomes necessary.
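
    As one way of carrying out the exercise suggested to the reader, the sketch below models Structures 1 and 5 as nested Python dictionaries and writes a program P for Structure 5 only. The dictionary encoding, the sample rows, and the function names are illustrative assumptions; no existing tree-oriented system is being reproduced. A reference to a nonexistent file is modeled as a KeyError.

```python
# Structure 5: parts (F), projects (G), and commitments (H) as peer files.
structure_5 = {
    "F": [{"part#": 1, "part name": "bolt"}, {"part#": 2, "part name": "gear"}],
    "G": [{"project#": 10, "project name": "alpha"},
          {"project#": 20, "project name": "beta"}],
    "H": [{"part#": 1, "project#": 10, "quantity committed": 5},
          {"part#": 2, "project#": 20, "quantity committed": 7}],
}

# Structure 1: a single file F, with PROJECT segments subordinate to PART.
structure_1 = {
    "F": [
        {"part#": 1, "part name": "bolt",
         "PROJECT": [{"project#": 10, "project name": "alpha",
                      "quantity committed": 5}]},
        {"part#": 2, "part name": "gear",
         "PROJECT": [{"project#": 20, "project name": "beta",
                      "quantity committed": 7}]},
    ],
}


def program_p_for_structure_5(bank, project_name="alpha"):
    """List (part#, part name, quantity committed), assuming files F, G, H."""
    wanted = {p["project#"] for p in bank["G"]
              if p["project name"] == project_name}
    part_names = {p["part#"]: p["part name"] for p in bank["F"]}
    return sorted((c["part#"], part_names[c["part#"]], c["quantity committed"])
                  for c in bank["H"] if c["project#"] in wanted)


def runs_on(program, bank):
    """True if the program completes; False if it references a missing file."""
    try:
        program(bank)
        return True
    except KeyError:
        return False
```

    Here P succeeds on Structure 5 and fails on Structure 1, because file G does not exist there.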

    Systems which provide users with a network model of the data run into similar difficulties. In both the tree and network cases, the user (or his program) is required to exploit a collection of user access paths to the data. It does not matter whether these paths are in close correspondence with pointer-defined paths in the stored representation--in IDS the correspondence is extremely simple, in TDMS it is just the opposite. The consequence, regardless of the stored representation, is that terminal activities and programs become dependent on the continued existence of the user access paths.

    One solution to this is to adopt the policy that once a user access path is defined it will not be made obsolete until all application programs using that path have become obsolete. Such a policy is not practical, because the number of access paths in the total model for the community of users of a data bank would eventually become excessively large.

    1.3. A RELATIONAL VIEW OF DATA

    The term relation is used here in its accepted mathematical sense. Given sets $S_1, S_2, \ldots, S_n$ (not necessarily distinct), R is a relation on these n sets if it is a set of n-tuples each of which has its first element from $S_1$, its second element from $S_2$, and so on.1 We shall refer to $S_j$ as the jth domain of R. As defined above, R is said to have degree n. Relations of degree 1 are often called unary, degree 2 binary, degree 3 ternary, and degree n n-ary.

    For expository reasons, we shall frequently make use of an array representation of relations, but it must be remembered that this particular representation is not an essential part of the relational view being expounded. An array which represents an n-ary relation R has the following properties:

    (1) Each row represents an n-tuple of R.
    (2) The ordering of rows is immaterial.
    (3) All rows are distinct.
    (4) The ordering of columns is significant--it corresponds to the ordering $S_1, S_2, \ldots, S_n$ of the domains on which R is defined (see, however, remarks below on domain-ordered and domain-unordered relations).
    (5) The significance of each column is partially conveyed by labeling it with the name of the corresponding domain.

    1 More concisely, R is a subset of the Cartesian product $S_1 \times S_2 \times \cdots \times S_n$.

    The example in Figure 1 illustrates a relation of degree 4, called supply, which reflects the shipments-in-progress of parts from specified suppliers to specified projects in specified quantities.

    supply  (supplier  part  project  quantity)
                1        2      5        17
                1        3      5        23
                2        3      7         9
                2        7      5         4
                4        1      1        12

    FIG. 1. A relation of degree 4
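
    The definition of a relation and the array properties can be checked against Figure 1 with a small sketch. Representing a relation as a Python set of tuples is an assumption made here for illustration, and the finite domains below are hypothetical, chosen only so that they cover the values appearing in the figure.

```python
from itertools import product

# Rows of Figure 1: (supplier, part, project, quantity).
supply = {(1, 2, 5, 17), (1, 3, 5, 23), (2, 3, 7, 9), (2, 7, 5, 4), (4, 1, 1, 12)}

# Hypothetical finite domains, chosen only to cover the values in Figure 1.
suppliers = {1, 2, 4}
parts = {1, 2, 3, 7}
projects = {1, 5, 7}
quantities = {4, 9, 12, 17, 23}
domains = (suppliers, parts, projects, quantities)


def degree(relation):
    """Degree n of a non-empty relation: the length of its tuples."""
    return len(next(iter(relation)))


def is_relation_on(relation, domains):
    """True if the relation is a subset of the Cartesian product of the domains."""
    return relation <= set(product(*domains))


# Properties (2) and (3): reordering or repeating rows yields the same relation.
same_rows = set([(4, 1, 1, 12), (2, 7, 5, 4), (2, 3, 7, 9),
                 (1, 3, 5, 23), (1, 2, 5, 17), (1, 2, 5, 17)])
```

    A set has no row order and no duplicate rows, so properties (2) and (3) hold by construction; column order, property (4), is carried by position within each tuple.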

    One might ask: If the columns are labeled by the name of corresponding domains, why should the ordering of columns matter? As the example in Figure 2 shows, two columns may have identical headings (indicating identical domains) but possess distinct meanings with respect to the relation. The relation depicted is called component. It is a ternary relation, whose first two domains are called part and third domain is called quantity. The meaning of component (x, y, z) is that part x is an immediate component (or subassembly) of part y, and z units of part x are needed to assemble one unit of part y. It is a relation which plays a critical role in the parts explosion problem.

    component  (part  part  quantity)
                 1     5       9
                 2     5       7
                 3     5       2
                 2     6      12
                 3     6       3
                 4     7       1
                 6     7       1

    FIG. 2. A relation with two identical domains
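
    To make the two meanings of the part columns concrete, the following sketch computes a simple parts explosion over the rows of Figure 2. The function name, the recursive strategy, and the assumption that the relation contains no cycles are choices made for illustration; the paper does not prescribe an algorithm.

```python
from collections import defaultdict

# Rows of Figure 2: (part x, part y, quantity z), meaning that z units of
# part x are needed to assemble one unit of part y.
component = {(1, 5, 9), (2, 5, 7), (3, 5, 2), (2, 6, 12),
             (3, 6, 3), (4, 7, 1), (6, 7, 1)}


def explode(relation, part, multiplier=1, totals=None):
    """Total units of every direct or indirect component of one unit of part.

    Assumes the relation is acyclic (no part is a component of itself).
    """
    if totals is None:
        totals = defaultdict(int)
    for sub, assembly, qty in relation:
        # The first column plays the component role, the second the assembly role.
        if assembly == part:
            totals[sub] += qty * multiplier
            explode(relation, sub, qty * multiplier, totals)
    return dict(totals)
```

    Swapping the two part columns would reverse the meaning of every row, which is why the column ordering (or some other way of telling the two columns apart) is significant.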

    It is a remarkable fact that several existing information systems (chiefly those based on tree-structured files) fail to provide data representations for relations which have two or more identical domains. The present version of IMS/360 [5] is an example of such a system.

    The totality of data in a data bank may be viewed as a collection of time-varying relations. These relations are of assorted degrees. As time progresses, each n-ary relation may be subject to insertion of additional n-tuples, deletion of existing ones, and alteration of components of any of its existing n-tuples.


    and nonsimple domain, respectively. Much of the confusion in present terminology is due to failure to distinguish between type and instance (as in "record") and between components of a user model of the data on the one hand and their machine representation counterparts on the other hand (again, we cite "record" as an example).

    1.4. NORMAL FORM

    A relation whose domains are all simple can be represented in storage by a two-dimensional column-homogeneous array of the kind discussed above. Some more complicated data structure is necessary for a relation with one or more nonsimple domains. For this reason (and others to be cited below) the possibility of eliminating nonsimple domains appears worth investigating.4 There is, in fact, a very simple elimination procedure, which we shall call normalization.

    Consider, for example, the collection of relations exhibited in Figure 3(a). Job history and children are nonsimple domains of the relation employee. Salary history is a nonsimple domain of the relation job history. The tree in Figure 3(a) shows just these interrelationships of the nonsimple domains.

                  employee
                  /      \
          jobhistory    children
              |
        salaryhistory

    employee (man#, name, birthdate, jobhistory, children)
    jobhistory (jobdate, title, salaryhistory)
    salaryhistory (salarydate, salary)
    children (childname, birthyear)

    FIG. 3(a). Unnormalized set

    employee' (man#, name, birthdate)
    jobhistory' (man#, jobdate, title)
    salaryhistory' (man#, jobdate, salarydate, salary)
    children' (man#, childname, birthyear)

    FIG. 3(b). Normalized set

    Normalization proceeds as follows. Starting with the relation at the top of the tree, take its primary key and expand each of the immediately subordinate relations by inserting this primary key domain or domain combination. The primary key of each expanded relation consists of the primary key before expansion augmented by the primary key copied down from the parent relation. Now, strike out from the parent relation all nonsimple domains, remove the top node of the tree, and repeat the same sequence of operations on each remaining subtree.

    The result of normalizing the collection of relations in Figure 3(a) is the collection in Figure 3(b). The primary key of each relation is italicized to show how such keys are expanded by the normalization.
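
    The normalization procedure can be sketched on a single instance of the employee relation of Figure 3(a). The encoding is an assumption: each tuple is a Python dict, a nonsimple domain is a list of dicts, and the primary keys before expansion are supplied in a table. The employee data is invented purely for illustration.

```python
# Primary key of each relation before expansion (following Figure 3).
KEYS = {"employee": ["man#"], "jobhistory": ["jobdate"],
        "salaryhistory": ["salarydate"], "children": ["childname"]}

# One made-up employee tuple with nonsimple domains jobhistory and children.
employees = [{
    "man#": 1, "name": "Smith", "birthdate": "1930-01-01",
    "jobhistory": [{
        "jobdate": "1960-03-01", "title": "clerk",
        "salaryhistory": [{"salarydate": "1960-03-01", "salary": 4000},
                          {"salarydate": "1961-03-01", "salary": 4500}],
    }],
    "children": [{"childname": "Ann", "birthyear": 1955}],
}]


def normalize(name, rows, keys, parent_key=None, out=None):
    """Flatten nested relations, copying the parent's primary key downward."""
    if out is None:
        out = {}
    parent_key = parent_key or {}
    flat = out.setdefault(name, [])
    for row in rows:
        # Strike out nonsimple domains and insert the key inherited so far.
        simple = {k: v for k, v in row.items() if not isinstance(v, list)}
        flat.append({**parent_key, **simple})
        # Expanded key: parent's key augmented by this relation's own key.
        own_key = {**parent_key, **{k: row[k] for k in keys[name]}}
        for domain, value in row.items():
            if isinstance(value, list):
                normalize(domain, value, keys, own_key, out)
    return out


normalized = normalize("employee", employees, KEYS)
```

    The resulting column sets match Figure 3(b): for example, every salaryhistory row carries man#, jobdate, salarydate, and salary.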

    4 M. E. Sanko of IBM, San Jose, independently recognized the desirability of eliminating nonsimple domains.

    If normalization as described above is to be applicable, the unnormalized collection of relations must satisfy the following conditions:

    (1) The graph of interrelationships of the nonsimple domains is a collection of trees.

    (2) No primary key has a component domain which is nonsimple.

    The writer knows of no application which would require any relaxation of these conditions. Further operations of a normalizing kind are possible. These are not discussed in this paper.

    The simplicity of the array representation which becomes feasible when all relations are cast in normal form is not only an advantage for storage purposes but also for communication of bulk data between systems which use widely different representations of the data. The communication form would be a suitably compressed version of the array representation and would have the following advantages:

    (1) It would be devoid of pointers (address-valued or displacement-valued).

    (2) It would avoid all dependence on hash addressing schemes.

    (3) It would contain no indices or ordering lists.

    If the user's relational model is set up in normal form, names of items of data in the data bank can take a simpler form than would otherwise be the case. A general name would take a form such as

    R (g).r.d

    where R is a relational name; g is a generation identifier (optional); r is a role name (optional); d is a domain name.

    Since g is needed only when several generations of a given relation exist, or are anticipated to exist, and r is needed only when the relation R has two or more domains named d, the simple form R.d will often be adequate.
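
    The naming form can be illustrated with a small helper that assembles a name from its parts and omits the optional ones. The function, the role name "sub", and the generation number 3 are hypothetical; the output is written without the space that appears after R in the form shown above.

```python
def item_name(relation, domain, generation=None, role=None):
    """Build a name of the form R(g).r.d, omitting g and r when not given."""
    name = relation
    if generation is not None:
        name += f"({generation})"
    if role is not None:
        name += f".{role}"
    return f"{name}.{domain}"


simple_form = item_name("supply", "quantity")
with_role = item_name("component", "part", role="sub")      # hypothetical role name
with_generation = item_name("supply", "quantity", generation=3)
```

    A role name is only needed for a relation such as component, where two domains share the name part.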

    1.5. SOME LINGUISTIC ASPECTS

    The adoption of a relational model of data, as described above, permits the development of a universal data sublanguage based on an applied predicate calculus. A first-order predicate calculus suffices if the collection of relations is in normal form. Such a language would provide a yardstick of linguistic power for all other proposed data languages, and would itself be a strong candidate for embedding (with appropriate syntactic modification) in a variety of host languages (programming, command- or problem-oriented). While it is not the purpose of this paper to describe such a language in detail, its salient features would be as follows.

    Let us denote the data sublanguage by R and the host language by H. R permits the declaration of relations and their domains. Each declaration of a relation identifies the primary key for that relation. Declared relations are added to the system catalog for use by any members of the user community who have appropriate authorization. H permits supporting declarations which indicate, perhaps less permanently, how these relations are represented in storage. R permits the specification for retrieval of any subset of data from the data bank. Action on such a retrieval request is subject to security constraints.

    The universality of the data sublanguage lies in its descriptive ability (not its computing ability). In a large data bank each subset of the data has a very large number of possible (and sensible) descriptions, even when we assume (as we do) that there is only a finite set of function subroutines to which the system has access for use in qualifying data for retrieval. Thus, the class of qualification expressions which can be used in a set specification must have the descriptive power of the class of well-formed formulas of an applied predicate calculus. It is well known that to preserve this descriptive power it is unnecessary to express (in whatever syntax is chosen) every formula of the selected predicate calculus. For example, just those in prenex normal form are adequate [9].

    Arithmetic functions may be needed in the qualification or other parts of retrieval statements. Such functions can be defined in H and invoked in R.

    A set so specified may be fetched for query purposes only, or it may be held for possible changes. Insertions take the form of adding new elements to declared relations without regard to any ordering that may be present in their machine representation. Deletions which are effective for the community (as opposed to the individual user or subcommunities) take the form of removing elements from declared relations. Some deletions and updates may be triggered by others, if deletion and update dependencies between specified relations are declared in R.

    One important effect that the view adopted toward data has on the language used to retrieve it is in the naming of data elements and sets. Some aspects of this have been discussed in the previous section. With the usual network view, users will often be burdened with coining and using more relation names than are absolutely necessary, since names are associated with paths (or path types) rather than with relations.

    Once a user is aware that a certain relation is stored, he will expect to be able to exploit5 it using any combination of its arguments as "knowns" and the remaining arguments as "unknowns," because the information (like Everest) is there. This is a system feature (missing from many current information systems) which we shall call (logically) symmetric exploitation of relations. Naturally, symmetry in performance is not to be expected.

    To support symmetric exploitation of a single binary relation, two directed paths are needed. For a relation of degree n, the number of paths to be named and controlled is n factorial.
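
    Symmetric exploitation can be sketched as a single query function that accepts any combination of columns as knowns, using the supply relation of Figure 1. The function names and the dict-of-column-positions interface are assumptions made for illustration; the path count simply restates the n factorial figure given above.

```python
from math import factorial

# Rows of Figure 1: (supplier, part, project, quantity).
supply = {(1, 2, 5, 17), (1, 3, 5, 23), (2, 3, 7, 9), (2, 7, 5, 4), (4, 1, 1, 12)}
SUPPLIER, PART, PROJECT, QUANTITY = range(4)


def exploit(relation, knowns):
    """Return the tuples that agree with every known (column -> value)."""
    return {t for t in relation
            if all(t[col] == val for col, val in knowns.items())}


def directed_paths(n):
    """Number of directed paths to name for a relation of degree n."""
    return factorial(n)


by_supplier = exploit(supply, {SUPPLIER: 1})
by_project_and_quantity = exploit(supply, {PROJECT: 5, QUANTITY: 4})
```

    The same function serves every choice of knowns, so no path names are coined; a path-based system would need directed_paths(4) = 24 named paths for a relation of degree 4.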

    Again, if a relational view is adopted in which every n-ary relation (n > 2) has to be expressed by the user as a nested expression involving only binary relations (see Feldman's LEAP System [10], for example) then 2n - 1 names have to be coined instead of only n + 1 with direct n-ary notation as described in Section 1.2. For example, the 4-ary relation supply of Figure 1, which entails 5 names in n-ary notation, would be represented in the form

    P (supplier, Q (part, R (project, quantity)))

    in nested binary notation and, thus, employ 7 names.

    A further disadvantage of this kind of expression is its asymmetry. Although this asymmetry does not prohibit symmetric exploitation, it certainly makes some bases of interrogation very awkward for the user to express (consider, for example, a query for those parts and quantities related to certain given projects via Q and R).

    5 Exploiting a relation includes query, update, and delete.
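
    The name counts can be verified with a short sketch. Direct n-ary notation needs one relation name plus n domain names; the nested binary form needs n - 1 relation names plus n domain names. The function names are hypothetical, and the builder below merely reproduces the nested expression shown above for the supply relation.

```python
def names_nary(n):
    """One relation name plus n domain names."""
    return n + 1


def names_nested_binary(n):
    """n - 1 binary relation names plus n domain names."""
    return 2 * n - 1


def nested_binary(relation_names, domain_names):
    """Build the nested binary expression from the innermost pair outward."""
    expr = domain_names[-1]
    for rel, dom in zip(reversed(relation_names), reversed(domain_names[:-1])):
        expr = f"{rel}({dom}, {expr})"
    return expr


expression = nested_binary(["P", "Q", "R"],
                           ["supplier", "part", "project", "quantity"])
```

    For n = 4 the counts are 5 and 7, in agreement with the supply example.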

    1.6. EXPRESSIBLE, NAMED, AND STORED RELATIONS

    Associated with a data bank are two collections of relations: the named set and the expressible set. The named set is the collection of all those relations that the community of users can identify by means of a simple name (or identifier). A relation R acquires membership in the named set when a suitably authorized user declares R; it loses membership when a suitably authorized user cancels the declaration of R.

    The expressible set is the total collection of relations that can be designated by expressions in the data language. Such expressions are constructed from simple names of relations in the named set; names of generations, roles and domains; logical connectives; the quantifiers of the predicate calculus;6 and certain constant relation symbols such as =, >. The named set is a subset of the expressible set--usually a very small subset.
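
    As an informal illustration of an expressible but unnamed relation, the sketch below derives a unary relation from the named relation supply of Figure 1, using a logical connective, the constant relation symbols = and >, and an implicit existential quantifier over supplier and quantity. Python set comprehensions stand in for the data language here; this is an assumption for illustration, not the sublanguage the paper has in mind.

```python
# Rows of Figure 1: (supplier, part, project, quantity).
supply = {(1, 2, 5, 17), (1, 3, 5, 23), (2, 3, 7, 9), (2, 7, 5, 4), (4, 1, 1, 12)}


def parts_for_project(relation, project, min_quantity):
    """{p : there exist s, q with relation(s, p, project, q) and q > min_quantity}."""
    return {p for (s, p, j, q) in relation
            if j == project and q > min_quantity}


large_shipments_to_5 = parts_for_project(supply, 5, 10)
```

    The derived relation has no name of its own in the named set; it is designated only by the expression that defines it.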

    Since some relations in the named set may be time-independent combinations of others in that set, it is useful to consider associating with the named set a collection of statements that define these time-independent constraints. We shall postpone further discussion of this until we have introduced several operations on relations (see Section 2).

    One of the major problems confronting the designer of a data system which is to support a relational model for its users is that of determining the class of stored representations to be supported. Ideally, the variety of permitted data representations should be just adequate to cover the spectrum of performance requirements of the total collection of installations. Too great a variety leads to unnecessary overhead in storage and continual reinterpretation of descriptions for the structures currently in effect.

    For any selected class of stored representations the data system must provide a means of translating user requests expressed in the data language of the relational model into corresponding, and efficient, actions on the current stored representation. For a high level data language this presents a challenging design problem. Nevertheless, it is a problem which must be solved: as more users obtain concurrent access to a large data bank, responsibility for providing efficient response and throughput shifts from the individual user to the data system.

    6 Because each relation in a practical data bank is a finite set at every instant of time, the existential and universal quantifiers can be expressed in terms of a function that counts the number of elements in any finite set.

    382   Communications of the ACM   Volume 13 / Number 6 / June, 1970


    2. Redundancy and Consistency

    2.1. OPERATIONS ON RELATIONS

    Since relations are sets, all of the usual set operations are applicable to them. Nevertheless, the result may not be a relation; for example, the union of a binary relation and a ternary relation is not a relation.
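    The remark about unions can be made concrete with a small sketch. It assumes, purely for illustration, that a relation is represented as a Python set of equal-length tuples; the helper name `degree` and the sample rows are hypothetical and do not come from the paper.

```python
def degree(tuples):
    """Return the common degree of a set of tuples, or None if the
    tuples do not all have the same length (or the set is empty)."""
    degrees = {len(t) for t in tuples}
    return degrees.pop() if len(degrees) == 1 else None


# Hypothetical sample data: one binary and one ternary relation.
binary = {(1, 2), (2, 3)}
ternary = {(1, 2, 5)}

# The set-theoretic union exists, but its members have mixed degrees,
# so it is not a relation in the sense used here.
union = binary | ternary
print(degree(binary), degree(ternary), degree(union))
```

    Under this representation the union is still a perfectly good set; it simply fails the uniform-degree test that every relation must pass.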

    The operations discussed below are specifically for relations. These operations are introduced because of their key role in deriving relations from other relations. Their principal application is in noninferential information systems (systems which do not provide logical inference services), although their applicability is not necessarily destroyed when such services are added.

    Most users would not be directly concerned with these operations. Information systems designers and people concerned with data bank control should, however, be thoroughly familiar with them.

    2.1.1. Permutation. A binary relation has an array representation with two columns. Interchanging these columns yields the converse relation. More generally, if a permutation is applied to the columns of an n-ary relation, the resulting relation is said to be a permutation of the given relation. There are, for example, 4! = 24 permutations of the relation supply in Figure 1, if we include the identity permutation which leaves the ordering of columns unchanged.

    Since the user's relational model consists of a collection of relationships (domain-unordered relations), permutation is not relevant to such a model considered in isolation. It is, however, relevant to the consideration of stored representations of the model. In a system which provides symmetric exploitation of relations, the set of queries answerable by a stored relation is identical to the set answerable by any permutation of that relation. Although it is logically unnecessary to store both a relation and some permutation of it, performance considerations could make it advisable.

    2.1.2. Projection. Suppose now we select certain columns of a relation (striking out the others) and then remove from the resulting array any duplication in the rows. The final array represents a relation which is said to be a projection of the given relation.

    A selection operator $\pi$ is used to obtain any desired permutation, projection, or combination of the two operations. Thus, if L is a list of k indices⁷ $L = i_1, i_2, \ldots, i_k$ and R is an n-ary relation ($n \geq k$), then $\pi_L(R)$ is the k-ary relation whose jth column is column $i_j$ of R ($j = 1, 2, \ldots, k$) except that duplication in resulting rows is removed. Consider the relation supply of Figure 1. A permuted projection of this relation is exhibited in Figure 4. Note that, in this particular case, the projection has fewer n-tuples than the relation from which it is derived.
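    A minimal sketch of the operator $\pi_L$ follows, assuming relations are held as Python sets of tuples and that column indices are 1-based as in the text. The function name `project` is an assumption of this sketch, and the rows given for `supply` are illustrative values chosen so that the result agrees with Figure 4; they should not be read as an authoritative copy of Figure 1.

```python
from itertools import permutations


def project(relation, columns):
    """pi_L(R): keep the 1-based columns listed in `columns`, in that
    order. Building a set removes duplicate rows automatically."""
    return {tuple(row[i - 1] for i in columns) for row in relation}


# Illustrative rows: (supplier, part, project, quantity).
supply = {
    (1, 2, 5, 17),
    (1, 3, 5, 23),
    (2, 3, 7, 9),
    (2, 7, 5, 4),
    (4, 1, 1, 12),
}

# A permuted projection: column 3 (project) first, then column 1 (supplier).
fig4 = project(supply, [3, 1])

# A pure permutation keeps every column, so no rows can collapse.
swapped = project(supply, [2, 1, 3, 4])

# All orderings of four columns, including the identity ordering.
column_orders = list(permutations([1, 2, 3, 4]))

print(sorted(fig4), len(swapped), len(column_orders))
```

    The same function covers permutation, projection, and their combination, which mirrors the way the text uses a single operator for all three.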

    2.1.3. Join. Suppose we are given two binary relations, which have some domain in common. Under what circumstances can we combine these relations to form a ternary relation which preserves all of the information in the given relations?

    7 When dealing with relationships, we use domain names (role-qualified whenever necessary) instead of domain positions.

    The example in Figure 5 shows two relations R, S, which are joinable without loss of information, while Figure 6 shows a join of R with S. A binary relation R is joinable with a binary relation S if there exists a ternary relation U such that $\pi_{12}(U) = R$ and $\pi_{23}(U) = S$. Any such ternary relation is called a join of R with S. If R, S are binary relations such that $\pi_2(R) = \pi_1(S)$, then R is joinable with S. One join that always exists in such a case is the natural join of R with S defined by

    $$R * S = \{(a, b, c) : R(a, b) \wedge S(b, c)\}$$

    where R(a, b) has the value true if (a, b) is a member of R and similarly for S(b, c). It is immediate that

    $$\pi_{12}(R * S) = R \quad \text{and} \quad \pi_{23}(R * S) = S.$$

    Note that the join shown in Figure 6 is the natural join of R with S from Figure 5. Another join is shown in Figure 7.
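    The definitions of join and natural join can be checked mechanically on the relations of Figures 5 to 7. The sketch below assumes relations are Python sets of tuples; the names `project`, `natural_join` and `is_join` are chosen for this illustration only.

```python
def project(relation, columns):
    """pi_L(R) with 1-based column indices; duplicates vanish in the set."""
    return {tuple(row[i - 1] for i in columns) for row in relation}


def natural_join(r, s):
    """R*S = {(a, b, c) : R(a, b) and S(b, c)} for binary r and s."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}


def is_join(u, r, s):
    """U is a join of R with S when pi_12(U) = R and pi_23(U) = S."""
    return project(u, [1, 2]) == r and project(u, [2, 3]) == s


# Figure 5: R(supplier, part) and S(part, project).
R = {(1, 1), (2, 1), (2, 2)}
S = {(1, 1), (1, 2), (2, 1)}

natural = natural_join(R, S)                 # expected to match Figure 6
fig7 = {(1, 1, 2), (2, 1, 1), (2, 2, 1)}     # the join shown in Figure 7

print(sorted(natural))
print(is_join(natural, R, S), is_join(fig7, R, S))
```

    Both relations pass `is_join`, and the Figure 7 relation is a proper subset of the natural join, which illustrates that a join need not be unique.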

    $\pi_{31}$(supply)   (project   supplier)
                      5         1
                      5         2
                      1         4
                      7         2

    FIG. 4. A permuted projection of the relation in Figure 1

    R   (supplier   part)        S   (part   project)
         1          1                 1      1
         2          1                 1      2
         2          2                 2      1

    FIG. 5. Two joinable relations

    R*S   (supplier   part   project)
           1          1      1
           1          1      2
           2          1      1
           2          1      2
           2          2      1

    FIG. 6. The natural join of R with S (from Figure 5)

    U   (supplier   part   project)
         1          1      2
         2          1      1
         2          2      1

    FIG. 7. Another join of R with S (from Figure 5)

    Inspection of these relations reveals an element (element 1) of the domain part (the domain on which the join is to be made) with the property that it possesses more than one relative under R and also under S. It is this element which gives rise to the plurality of joins. Such an element in the joining domain is called a point of ambiguity with respect to the joining of R with S.

    If either $\pi_{21}(R)$ or S is a function,⁸ no point of ambiguity can occur in joining R with S. In such a case, the natural join of R with S is the only join of R with S. Note that the reiterated qualification "of R with S" is necessary, because S might be joinable with R (as well as R with S), and this join would be an entirely separate consideration. In Figure 5, none of the relations R, $\pi_{21}(R)$, S, $\pi_{21}(S)$ is a function.
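    Points of ambiguity can be located by counting relatives on each side of the joining domain. The following sketch assumes binary relations stored as Python sets of pairs; the function name `points_of_ambiguity` and the relation `S_function` are hypothetical and introduced only for this illustration.

```python
from collections import defaultdict


def points_of_ambiguity(r, s):
    """Elements b of the joining domain that have more than one
    relative under R (pairs (a, b)) and also under S (pairs (b, c))."""
    left = defaultdict(set)    # b -> {a : (a, b) in R}
    right = defaultdict(set)   # b -> {c : (b, c) in S}
    for a, b in r:
        left[b].add(a)
    for b, c in s:
        right[b].add(c)
    return {b for b in list(left) if len(left[b]) > 1 and len(right[b]) > 1}


# Figure 5: R(supplier, part) and S(part, project).
R = {(1, 1), (2, 1), (2, 2)}
S = {(1, 1), (1, 2), (2, 1)}

# A hypothetical S that is a function: each part maps to one project.
S_function = {(1, 1), (2, 1)}

print(points_of_ambiguity(R, S))           # part 1 is ambiguous
print(points_of_ambiguity(R, S_function))  # no ambiguity remains
```

    When S is replaced by a function, every part has exactly one relative under S, so the condition can never be met and the natural join is the only join.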

    Ambiguity in the joining of R with S can sometimes be resolved by means of other relations. Suppose we are given, or can derive from sources independent of R and S, a relation T on the domains project and supplier with the following properties:

    (1) $\pi_1(T) = \pi_2(S)$,

    (2) $\pi_2(T) = \pi_1(R)$,

    (3) $T(j, s) \rightarrow \exists p\,(R(s, p) \wedge S(p, j))$,

    (4) $R(s, p) \rightarrow \exists j\,(S(p, j) \wedge T(j, s))$,

    (5) $S(p, j) \rightarrow \exists s\,(T(j, s) \wedge R(s, p))$,

    then we may form a three-way join of R, S, T; that is, a ternary relation U such that

    $$\pi_{12}(U) = R, \quad \pi_{23}(U) = S, \quad \pi_{31}(U) = T.$$

    Such a join will be called a cyclic 3-join to distinguish it from a linear 3-join which would be a quaternary relation V such that

    $$\pi_{12}(V) = R, \quad \pi_{23}(V) = S, \quad \pi_{34}(V) = T.$$

    While it is possible for more than one cyclic 3-join to exist (see Figures 8, 9, for an example), the circumstances under which this can occur entail much more severe constraints than those for a plurality of 2-joins. To be specific, the relations R, S, T must possess points of ambiguity with respect to joining R with S (say point x), S with T (say y), and T with R (say z), and, furthermore, y must be a relative of x under S, z a relative of y under T, and x a relative of z under R. Note that in Figure 8 the points x = a; y = d; z = 2 have this property.

    R   (s   p)        S   (p   j)        T   (j   s)
         1   a              a   d              d   1
         2   a              a   e              d   2
         2   b              b   d              e   2
                            b   e              e   2

    FIG. 8. Binary relations with a plurality of cyclic 3-joins

    U   (s   p   j)        U'   (s   p   j)
         1   a   d               1   a   d
         2   a   e               2   a   d
         2   b   d               2   a   e
         2   b   e               2   b   d
                                 2   b   e

    FIG. 9. Two cyclic 3-joins of the relations in Figure 8

    8 A function is a binary relation, which is one-one or many-one, but not one-many.

    The natural linear 3-join of three binary relations R, S, T is given by

    $$R * S * T = \{(a, b, c, d) : R(a, b) \wedge S(b, c) \wedge T(c, d)\}$$

    where parentheses are not needed on the left-hand side because the natural 2-join (*) is associative. To obtain the cyclic counterpart, we introduce the operator $\gamma$ which produces a relation of degree n - 1 from a relation of degree n by tying its ends together. Thus, if R is an n-ary relation ($n \geq 2$), the tie of R is defined by the equation

    $$\gamma(R) = \{(a_1, a_2, \ldots, a_{n-1}) : R(a_1, a_2, \ldots, a_{n-1}, a_n) \wedge a_1 = a_n\}.$$

    We may now represent the natural cyclic 3-join of R, S, T by the expression

    $$\gamma(R * S * T).$$
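    The tie operator and the natural cyclic 3-join can be sketched directly from these two equations. The code below assumes relations are Python sets of tuples and uses the relations of Figure 8 (with T taken as a set, so its repeated row counts once); the function names are choices made for this illustration.

```python
def natural_linear_3join(r, s, t):
    """R*S*T = {(a, b, c, d) : R(a, b) and S(b, c) and T(c, d)}."""
    return {
        (a, b, c, d)
        for (a, b) in r
        for (b2, c) in s if b == b2
        for (c2, d) in t if c == c2
    }


def tie(relation):
    """gamma(R): keep rows whose first and last components agree,
    then drop the last component (degree n becomes n - 1)."""
    return {row[:-1] for row in relation if row[0] == row[-1]}


# Figure 8: R(s, p), S(p, j), T(j, s).
R = {(1, "a"), (2, "a"), (2, "b")}
S = {("a", "d"), ("a", "e"), ("b", "d"), ("b", "e")}
T = {("d", 1), ("d", 2), ("e", 2)}

natural_cyclic = tie(natural_linear_3join(R, S, T))

# Figure 9: the two cyclic 3-joins U and U'.
U = {(1, "a", "d"), (2, "a", "e"), (2, "b", "d"), (2, "b", "e")}
U_prime = U | {(2, "a", "d")}

print(sorted(natural_cyclic))
```

    On this data the natural cyclic 3-join coincides with U' of Figure 9, while U is a smaller cyclic 3-join contained in it.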

    Extension of the notions of linear and cyclic 3-join and their natural counterparts to the joining of n binary relations (where n > 3) is obvious. A few words may be appropriate, however, regarding the joining of relations which are not necessarily binary. Consider the case of two relations R (degree r), S (degree s) which are to be joined on p of their domains (p < r, p < s). For simplicity, suppose these p domains are the last p of the r domains of R and the first p of the s domains of S. If this were not so, we could always apply appropriate permutations to make it so. Now, take the Cartesian product of the first r - p domains of R, and call this new domain A. Take the Cartesian product of the last p domains of R, and call this B. Take the Cartesian product of the last s - p domains of S and call this C.

    We can treat R as if it were a binary relation on the domains A, B. Similarly, we can treat S as if it were a binary relation on the domains B, C. The notions of linear and cyclic 3-join are now directly applicable. A similar approach can be taken with the linear and cyclic n-joins of n relations of assorted degrees.

    2.1.4. Composition. The reader is probably familiar with the notion of composition applied to functions. We shall discuss a generalization of that concept and apply it first to binary relations. Our definitions of composition and composability are based very directly on the definitions of join and joinability given above.

    Suppose we are given two relations R, S. T is a composition of R with S if there exists a join U of R with S such that $T = \pi_{13}(U)$. Thus, two relations are composable if and only if they are joinable. However, the existence of more than one join of R with S does not imply the existence of more than one composition of R with S.

    Corresponding to the natural join of R with S is the natural composition⁹ of R with S defined by

    $$R \cdot S = \pi_{13}(R * S).$$

    Taking the relations R, S from Figure 5, their natural composition is exhibited in Figure 10 and another composition is exhibited in Figure 11 (derived from the join exhibited in Figure 7).

    R·S   (project   supplier)
           1         1
           1         2
           2         1
           2         2

    FIG. 10. The natural composition of R with S (from Figure 5)

    T   (project   supplier)
         1         2
         2         1

    FIG. 11. Another composition of R with S (from Figure 5)

    When two or more joins exist, the number of distinct compositions may be as few as one or as many as the number of distinct joins. Figure 12 shows an example of two relations which have several joins but only one composition. Note that the ambiguity of point c is lost in composing R with S, because of unambiguous associations made via the points a, b, d, e.

    R   (supplier   part)        S   (part   project)
         1          a                 a      g
         1          b                 b      f
         1          c                 c      f
         2          c                 c      g
         2          d                 d      g
         2          e                 e      f

    FIG. 12. Many joins, only one composition
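    The claim made for Figure 12 can be verified by brute force. The sketch below relies on the observation that every join of R with S is a subset of the natural join (each of its rows must satisfy both R and S), so it is enough to test subsets of R*S against the two projection conditions. The helper names are assumptions of this sketch, and the exhaustive search is only practical for very small relations.

```python
from itertools import combinations


def project(relation, columns):
    """pi_L(R) with 1-based column indices."""
    return {tuple(row[i - 1] for i in columns) for row in relation}


def natural_join(r, s):
    """R*S for binary relations r and s."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}


def all_joins(r, s):
    """Enumerate every U with pi_12(U) = R and pi_23(U) = S by testing
    each subset of the natural join (exponential; small inputs only)."""
    candidates = sorted(natural_join(r, s))
    joins = []
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            u = set(subset)
            if project(u, [1, 2]) == r and project(u, [2, 3]) == s:
                joins.append(u)
    return joins


# Figure 12: R(supplier, part) and S(part, project).
R = {(1, "a"), (1, "b"), (1, "c"), (2, "c"), (2, "d"), (2, "e")}
S = {("a", "g"), ("b", "f"), ("c", "f"), ("c", "g"), ("d", "g"), ("e", "f")}

joins = all_joins(R, S)
compositions = {frozenset(project(u, [1, 3])) for u in joins}

print(len(joins), len(compositions))
```

    All the variation among the joins comes from the rows through part c, and those rows contribute no supplier-project pair that the unambiguous parts a, b, d, e have not already supplied.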

    Extension of composition to pairs of relations which are not necessarily binary (and which may be of different degrees) follows the same pattern as extension of pairwise joining to such relations.

    A lack of understanding of relational composition has led several systems designers into what may be called the connection trap. This trap may be described in terms of the following example. Suppose each supplier description is linked by pointers to the descriptions of each part supplied by that supplier, and each part description is similarly linked to the descriptions of each project which uses that part. A conclusion is now drawn which is, in general, erroneous: namely that, if all possible paths are followed from a given supplier via the parts he supplies to the projects using those parts, one will obtain a valid set of all projects supplied by that supplier. Such a conclusion is correct only in the very special case that the target relation between projects and suppliers is, in fact, the natural composition of the other two relations; and we must normally add the phrase "for all time," because this is usually implied in claims concerning path-following techniques.
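    The connection trap can be reproduced with the small relations already used above. In this sketch the ternary relation of Figure 7 is assumed, for illustration only, to record which supplier actually supplies which part to which project; the pointer chains are modelled by its two binary projections, and `follow_paths` is a hypothetical name for the path-following procedure.

```python
# Figure 7, read here as the true facts: (supplier, part, project).
U = {(1, 1, 2), (2, 1, 1), (2, 2, 1)}

# The two link structures a path-following system would store.
R = {(s, p) for (s, p, j) in U}    # supplier -> part
S = {(p, j) for (s, p, j) in U}    # part -> project

# The target relation: which supplier really supplies which project.
actual = {(s, j) for (s, p, j) in U}


def follow_paths(supplier):
    """Projects reached from `supplier` by following every
    supplier -> part -> project path (the natural composition)."""
    return {j for (s, p) in R if s == supplier for (p2, j) in S if p2 == p}


for supplier in (1, 2):
    reached = follow_paths(supplier)
    real = {j for (s, j) in actual if s == supplier}
    print(supplier, sorted(reached), sorted(real), sorted(reached - real))
```

    Path following reports both projects for each supplier, whereas the assumed facts associate each supplier with only one, so the extra projects are artifacts of composing through the shared part.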

    9 Other writers tend to ignore compositions other than the natural one, and accordingly refer to this particular composition as the composition (see, for example, Kelley's "General Topology").

    2.1.5. Restriction. A subset of a relation is a relation. One way in which a relation S may act on a relation R to generate a subset of R is through the operation restriction of R by S. This operation is a generalization of the restriction of a function to a subset of its domain, and is defined as follows.

    Let L, M be equal-length lists of indices such that $L = i_1, i_2, \ldots, i_k$, $M = j_1, j_2, \ldots, j_k$ where $k \leq$ degree of R and $k \leq$ degree of S. Then the L, M restriction of R by S, denoted $R_L|_M S$, is the maximal subset R' of R such that

    $$\pi_L(R') = \pi_M(S).$$

    The operation is defined only if equality is applicable between elements of $\pi_{i_h}(R)$ on the one hand and $\pi_{j_h}(S)$ on the other for all $h = 1, 2, \ldots, k$.

    The three relations R, S, R' of Figure 13 satisfy the equation $R' = R_{(2,3)}|_{(1,2)} S$.

    R   (s   p   j)        S   (p   j)        R'   (s   p   j)
         1   a   A              a   A               1   a   A
         2   a   A              c   B               2   a   A
         2   a   B              b   B               2   b   B
         2   b   A
         2   b   B

    FIG. 13. Example of restriction
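    Restriction can be sketched as a filter on R. The code below reads the operation as keeping exactly those rows of R whose L-columns match some row of S on its M-columns; this reading is an assumption of the sketch, adopted because it reproduces R' in Figure 13. The function names are likewise illustrative.

```python
def project(relation, columns):
    """pi_L(R) with 1-based column indices."""
    return {tuple(row[i - 1] for i in columns) for row in relation}


def restrict(r, l, s, m):
    """L, M restriction of R by S, read as: the rows of R whose
    L-columns appear among the M-columns of S."""
    allowed = project(s, m)
    return {row for row in r if tuple(row[i - 1] for i in l) in allowed}


# Figure 13: R(s, p, j) and S(p, j).
R = {
    (1, "a", "A"),
    (2, "a", "A"),
    (2, "a", "B"),
    (2, "b", "A"),
    (2, "b", "B"),
}
S = {("a", "A"), ("c", "B"), ("b", "B")}

R_prime = restrict(R, [2, 3], S, [1, 2])
print(sorted(R_prime))
```

    Note that the row (c, B) of S has no counterpart in R, so it simply selects nothing; the filter never adds rows that were not already in R.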

    We are now in a position to consider various applications of these operations on relations.

    2.2. REDUNDANCY

    Redundancy in the named set of relations must be distinguished from redundancy in the stored set of representations. We are primarily concerned here with the former. To begin with, we need a precise notion of derivability for relations.

    Suppose $\theta$ is a collection of operations on relations and each operation has the property that from its operands it yields a unique relation (thus natural join is eligible, but join is not). A relation R is $\theta$-derivable from a set S of relations if there exists a sequence of operations from the collection $\theta$ which, for all time, yields R from members of S. The phrase "for all time" is present, because we are dealing with time-varying relations, and our interest is in derivability which holds over a significant period of time. For the named set of relationships in noninferential systems, it appears that an adequate collection $\theta_1$ contains the following operations: projection, natural join, tie, and restriction. Permutation is irrelevant and natural composition need not be included, because it is obtainable by taking a natural join and then a projection. For the stored set of representations, an adequate collection $\theta_2$ of operations would include permutation and additional operations concerned with subsetting and merging relations, and ordering and connecting their elements.

    2.2.1. Strong Redundancy. A set of relations is strongly redundant if it contains at least one relation that possesses a projection which is derivable from other projections of relations in the set. The following two examples are intended to explain why strong redundancy is defined this way, and to demonstrate its practical use. In the first example the collection of relations consists of just the following relation:

    employee (serial#, name, manager#, managername)

    with serial# as the primary key and manager# as a foreign key. Let us denote the active domain by $\Delta_t$, and suppose that

    $$\Delta_t(\text{manager\#}) \subset \Delta_t(\text{serial\#})$$

    and

    $$\Delta_t(\text{managername}) \subset \Delta_t(\text{name})$$

    for all time t. In this case the redundancy is obvious: the domain managername is unnecessary. To see that it is a strong redundancy as defined above, we observe that

    $$\pi_{34}(\text{employee}) = \pi_{12}(\text{employee})\,{}_{1}|_{1}\,\pi_{3}(\text{employee}).$$

    In the second example the collection of relations includes a relation S describing suppliers with primary key s#, a relation D describing departments with primary key d#, a relation J describing projects with primary key j#, and the following relations:

    P (s#, d#, ...), Q (s#, j#, ...), R (d#, j#, ...),

    where in each case ... denotes domains other than s#, d#, j#. Let us suppose the following condition C is known to hold independent of time: supplier s supplies department d (relation P) if and only if supplier s supplies some project j (relation Q) to which d is assigned (relation R). Then, we can write the equation

    $$\pi_{12}(P) = \pi_{12}(Q) \cdot \pi_{21}(R)$$

    and thereby exhibit a strong redundancy.

    An important reason for the existence of strong redundancies in the named set of relationships is user convenience. A particular case of this is the retention of semi-obsolete relationships in the named set so that old programs that refer to them by name can continue to run correctly. Knowledge of the existence of strong redundancies in the named set enables a system or data base administrator greater freedom in the selection of stored representations to cope more efficiently with current traffic. If the strong redundancies in the named set are directly reflected in strong redundancies in the stored set (or if other strong redundancies are introduced into the stored set), then, generally speaking, extra storage space and update time are consumed with a potential drop in query time for some queries and in load on the central processing units.
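    The strong redundancy in the employee example of Section 2.2.1 can be checked on sample data. The rows below are invented for illustration, and restriction is read as a filter (rows of the first operand whose selected columns occur in the second), which is an assumption of this sketch rather than a statement of the paper's definition.

```python
def project(relation, columns):
    """pi_L(R) with 1-based column indices."""
    return {tuple(row[i - 1] for i in columns) for row in relation}


def restrict(r, l, s, m):
    """Rows of r whose L-columns appear among the M-columns of s."""
    allowed = project(s, m)
    return {row for row in r if tuple(row[i - 1] for i in l) in allowed}


# Hypothetical rows: (serial#, name, manager#, managername).
employee = {
    (10, "Jones", 10, "Jones"),
    (20, "Smith", 10, "Jones"),
    (30, "Lee", 20, "Smith"),
}

# Left side: the (manager#, managername) pairs stored in the relation.
lhs = project(employee, [3, 4])

# Right side: (serial#, name) pairs restricted to serial numbers that
# occur as some employee's manager#.
rhs = restrict(project(employee, [1, 2]), [1], project(employee, [3]), [1])

print(sorted(lhs), sorted(rhs))
```

    Because the two sides agree, the managername column carries no information that cannot be recovered from the serial#, name and manager# columns, as long as the stored manager names are kept in step with the names.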

    2.2.2. Weak Redundancy. A second type of redundancy may exist. In contrast to strong redundancy it is not characterized by an equation. A collection of relations is weakly redundant if it contains a relation that has a projection which is not derivable from other members but is at all times a projection of some join of other projections of relations in the collection.

    We can exhibit a weak redundancy by taking the second example (cited above) for a strong redundancy, and assuming now that condition C does not hold at all times. The relations $\pi_{12}(P)$, $\pi_{12}(Q)$, $\pi_{12}(R)$ are complex¹⁰ relations with the possibility of points of ambiguity occurring from time to time in the potential joining of any two. Under these circumstances, none of them is derivable from the other two. However, constraints do exist between them, since each is a projection of some cyclic join of the three of them. One of the weak redundancies can be characterized by the statement: for all time, $\pi_{12}(P)$ is some composition of $\pi_{12}(Q)$ with $\pi_{21}(R)$. The composition in question might be the natural one at some instant and a nonnatural one at another instant.

    Generally speaking, weak redundancies are inherent in the logical needs of the community of users. They are not removable by the system or data base administrator. If they appear at all, they appear in both the named set and the stored set of representations.

    2.3. CONSISTENCY

    Whenever the named set of relations is redundant in either sense, we shall associate with that set a collection of statements which define all of the redundancies which hold independent of time between the member relations. If the information system lacks (and it most probably will) detailed semantic information about each named relation, it cannot deduce the redundancies applicable to the named set. It might, over a period of time, make attempts to induce the redundancies, but such attempts would be fallible.

    Given a collection C of time-varying relations, an associated set Z of constraint statements and an instantaneous value V for C, we shall call the state (C, Z, V) consistent or inconsistent according as V does or does not satisfy Z. For example, given stored relations R, S, T together with the constraint statement "$\pi_{12}(T)$ is a composition of $\pi_{12}(R)$ with $\pi_{12}(S)$", we may check from time to time that the values stored for R, S, T satisfy this constraint. An algorithm for making this check would examine the first two columns of each of R, S, T (in whatever way they are represented in the system) and determine whether

    (1) $\pi_1(T) = \pi_1(R)$,

    (2) $\pi_2(T) = \pi_2(S)$,

    (3) for every element pair (a, c) in the relation $\pi_{12}(T)$ there is an element b such that (a, b) is in $\pi_{12}(R)$ and (b, c) is in $\pi_{12}(S)$.
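    The three checks listed above translate almost line for line into code. The sketch assumes relations are Python sets of tuples with at least two columns; `is_consistent` is a hypothetical name, and the test data reuses R and S from Figure 5 with the pairs of Figure 11 written in (supplier, project) order.

```python
def first_two(relation):
    """pi_12: the first two columns of each row, duplicates removed."""
    return {(row[0], row[1]) for row in relation}


def is_consistent(r, s, t):
    """Apply checks (1)-(3) to the first two columns of R, S and T."""
    r12, s12, t12 = first_two(r), first_two(s), first_two(t)
    # (1) pi_1(T) = pi_1(R)
    check1 = {a for (a, _) in t12} == {a for (a, _) in r12}
    # (2) pi_2(T) = pi_2(S)
    check2 = {c for (_, c) in t12} == {c for (_, c) in s12}
    # (3) every (a, c) in T is linked through some b in R and S
    check3 = all(
        any((a, b) in r12 and (b, c) in s12 for (_, b) in r12)
        for (a, c) in t12
    )
    return check1 and check2 and check3


# Figure 5 relations and two candidate values for T.
R = {(1, 1), (2, 1), (2, 2)}
S = {(1, 1), (1, 2), (2, 1)}
T_good = {(1, 2), (2, 1)}
T_bad = {(1, 1)}           # omits supplier 2 and project 2

print(is_consistent(R, S, T_good), is_consistent(R, S, T_bad))
```

    The check looks only at the instantaneous values handed to it, which matches the point made below that consistency is a property of a state and not of the history that produced it.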

    There are practical problems (which we shall not discuss here) in taking an instantaneous snapshot of a collection of relations, some of which may be very large and highly variable.

    It is important to note that consistency as defined above is a property of the instantaneous state of a data bank, and is independent of how that state came about. Thus, in particular, there is no distinction made on the basis of whether a user generated an inconsistency due to an act of omission or an act of commission. Examination of a simple

    10 A binary relation is complex if neither it nor its converse is a function.

    386 Communications of the ACM Volume 13 / Number 6 / June, 1970


    example will show the reasonableness of this (possibly unconventional) approach to consistency.

    Suppose the named set C includes the relations S, J, D, P, Q, R of the example in Section 2.2 and that P, Q, R possess either the strong or weak redundancies described therein (in the particular case now under consideration, it does not matter which kind of redundancy occurs). Further, suppose that at some time t the data bank state is consistent and contains no project j such that supplier 2 supplies project j and j is assigned to department 5. Accordingly, there is no element (2, 5) in $\pi_{12}(P)$. Now, a user introduces the element (2, 5) into $\pi_{12}(P)$ by inserting some appropriate element into P. The data bank state is now inconsistent. The inconsistency could have arisen from an act of omission, if the input (2, 5) is correct, and there does exist a project j such that supplier 2 supplies j and j is assigned to department 5. In this case, it is very likely that the user intends in the near future to insert elements into Q and R which will have the effect of introducing (2, j) into $\pi_{12}(Q)$ and (5, j) in $\pi_{12}(R)$. On the other hand, the input (2, 5) might have been faulty. It could be the case that the user intended to insert some other element into P, an element whose insertion would transform a consistent state into a consistent state. The point is that the system will normally have no way of resolving this question without interrogating its environment (perhaps the user who created the inconsistency).
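    The scenario can be replayed in a few lines. The following is a minimal sketch, assuming the first two columns of P, Q, R hold (supplier, department), (supplier, project), and (department, project) pairs respectively, and checking only the requirement that every pair in P has a linking project. The function name `unsupported_pairs`, the project labels, and the sample data are hypothetical.

```python
def unsupported_pairs(P, Q, R):
    """Return the (supplier, department) pairs of P that no project j links via Q and R."""
    p12 = {(row[0], row[1]) for row in P}   # (supplier, department)
    q12 = {(row[0], row[1]) for row in Q}   # (supplier, project)
    r12 = {(row[0], row[1]) for row in R}   # (department, project)
    return {
        (s, d)
        for (s, d) in p12
        if not any((d, j) in r12 for (s2, j) in q12 if s2 == s)
    }


# Hypothetical consistent starting state: no project links supplier 2 to department 5.
Q = {(1, "j1"), (2, "j2")}
R = {(5, "j1"), (7, "j2")}
P = {(1, 5), (2, 7)}
before = unsupported_pairs(P, Q, R)

# The user inserts (2, 5) into P: the state is now inconsistent.
P.add((2, 5))
after_insert = unsupported_pairs(P, Q, R)

# If the input was correct, later insertions into Q and R restore consistency.
Q.add((2, "j3"))
R.add((5, "j3"))
after_repair = unsupported_pairs(P, Q, R)

print(before, after_insert, after_repair)
```

    The check reports which pair is unsupported, but it cannot say whether (2, 5) was a faulty input or merely the first of several intended insertions, which is the point made in the text.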

    There are, of course, several possible ways in which a system can detect inconsistencies and respond to them. In one approach the system checks for possible inconsistency whenever an insertion, deletion, or key update occurs. Naturally, such checking will slow these operations down. If an inconsistency has been generated, details are logged internally, and if it is not remedied within some reasonable time interval, either the user or someone responsible for the security and integrity of the data is notified. Another approach is to conduct consistency checking as a batch operation once a day or less frequently. Inputs causing the inconsistencies which remain in the data bank state at checking time can be tracked down if the system maintains a journal of all state-changing transactions. This latter approach would certainly be superior if few nontransitory inconsistencies occurred.
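    One way to picture the journal-based batch approach is to replay the journal and note which transactions moved the state from consistent to inconsistent. This is a rough sketch under stated assumptions, not a description of any actual system: the journal is assumed to be an ordered list of (operation, relation name, tuple) entries, the replay starts from an empty state, and the constraint is the same hypothetical P, Q, R check used for illustration above. All names are invented for the example.

```python
def consistent(state):
    """Every (supplier, department) pair in P needs a linking project in Q and R."""
    P, Q, R = (state.get(name, set()) for name in ("P", "Q", "R"))
    return all(
        any((d, j) in R for (s2, j) in Q if s2 == s)
        for (s, d) in P
    )


def offending_transactions(journal, constraint):
    """Replay the journal from an empty state and return the entries
    after which the state changed from consistent to inconsistent."""
    state, offenders = {}, []
    was_consistent = constraint(state)
    for op, name, row in journal:
        relation = state.setdefault(name, set())
        if op == "insert":
            relation.add(row)
        else:  # any other operation is treated as a deletion in this sketch
            relation.discard(row)
        now_consistent = constraint(state)
        if was_consistent and not now_consistent:
            offenders.append((op, name, row))
        was_consistent = now_consistent
    return offenders


# Hypothetical journal of state-changing transactions.
journal = [
    ("insert", "Q", (2, "j1")),
    ("insert", "R", (5, "j1")),
    ("insert", "P", (2, 5)),   # supported by project j1
    ("insert", "P", (3, 5)),   # no project links supplier 3 to department 5
]

print(offending_transactions(journal, consistent))
```

    Replaying from an empty state keeps the sketch short; a real batch check would more plausibly start from the last state known to be consistent.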

    2.4. SUMMARY

    In Section 1 a relational model of data is proposed as a basis for protecting users of formatted data systems from the potentially disruptive changes in data representation caused by growth in the data bank and changes in traffic. A normal form for the time-varying collection of relationships is introduced.

    In Section 2 operations on relations and two types of redundancy are defined and applied to the problem of maintaining the data in a consistent state. This is bound to become a serious practical problem as more and more different types of data are integrated together into common data banks.

    Many questions are raised and left unanswered. For example, only a few of the more important properties of the data sublanguage in Section 1.4 are mentioned. Neither the purely linguistic details of such a language nor the implementation problems are discussed. Nevertheless, the material presented should be adequate for experienced systems programmers to visualize several approaches. It is also hoped that this paper can contribute to greater precision in work on formatted data systems.

    Acknowledgment. It was C. T. Davies of IBM Poughkeepsie who convinced the author of the need for data independence in future information systems. The author wishes to thank him and also F. P. Palermo, C. P. [...], E. B. Altman, and M. E. Senko of the IBM San Jose Research Laboratory for helpful discussions.

    RECEIVED SEPTEMBER, 1969; REVISED FEBRUARY, 1970

    REFERENCES

    1. CHILDS, D. L. Feasibility of a set-theoretical data structure: a general structure based on a reconstituted definition of relation. Proc. IFIP Cong., 1968, North Holland Pub. Co., Amsterdam, p. 162-172.

    2. LEVEIN, R. E., AND MARON, M. E. A computer system for inference execution and data retrieval. Comm. ACM 10, 11 (Nov. 1967), 715-721.

    3. BACHMAN, C. W. Software for random access processing. Datamation (Apr. 1965), 36-41.

    4. McGEE, W. C. Generalized file processing. In Annual Review in Automatic Programming 5, 13, Pergamon Press, New York, 1969, pp. 77-149.

    5. Information Management System/360, Application Description Manual H20-0524-1. IBM Corp., White Plains, N. Y., July 1968.

    6. GIS (Generalized Information System), Application Description Manual H20-0574. IBM Corp., White Plains, N. Y., 1965.

    7. BLEIER, R. E. Treating hierarchical data structures in the SDC time-shared data management system (TDMS). Proc. ACM 22nd Nat. Conf., 1967, MDI Publications, Wayne, Pa., pp. 41-49.

    8. IDS Reference Manual GE 625/635, GE Inform. Sys. Div., Phoenix, Ariz., CPB 1093B, Feb. 1968.

    9. CHURCH, A. An Introduction to Mathematical Logic I. Princeton U. Press, Princeton, N.J., 1956.

    10. FELDMAN, J. A., AND ROVNER, P. D. An Algol-based associative language. Stanford Artificial Intelligence Rep. [...], Aug. 1, 1968.
