Making choices with data models and database ?· Nyerges GIS Database Primer GISDBP_chapter_1_v17.doc…

  • Published on
    19-Aug-2018

  • View
    212

  • Download
    0

Embed Size (px)

Transcript

  • Nyerges GIS Database Primer GISDBP_chapter_1_v17.doc

    Chapter 1 Data Modeling

    Abstract

    This chapter starts by differentiating data, information, evidence, and knowledge. Choosing the most appropriate way to organize data within a GIS depends on choices made about data models and database models. In this chapter we learn about the conceptual, logical and physical levels of data models and database models as a way of understanding levels in data management software design and database design. Understanding data models is important for understanding the full range of data representation options we might have when we want to design and implement a database. A database model is developed as the basis for a particular database design, and is constrained by what data model we have chosen as the basis for developing various designs. Several data models are described in terms of the geospatial (abstract) data types that form the basis of those data models. The character of geospatial data types at the same time enable and constrain the kind of representations possible in GIS. It is the basis for out ability to derive information and assemble evidence that builds geographic knowledge. The chapter concludes by listing and describing nine steps that compose a database design process. This process can be shortened or lengthened depending on the complexity of the design problem under consideration by a GIS analyst. Database development is one of the most important activities in GIS work. Data modeling, or what is commonly called database design, is a beginning step in database development. Database implementation follows data modeling for database design; that is the implementation depends (is enabled and/or constrained by) the database software used. Even database management software must be designed with some ideas about what kinds of features of the world are to be represented in a database. No database management software can implement ALL feature representations and needs. Such software would always be in design mode. Limits and constraints, i.e., general nature of GIS applications to be performed, exist for all software. To gain a better sense of data modeling for database design it is important to distinguish data from information. If there were no distinction then software would be very difficult to develop and GIS databases would not be as useful. In this chapter we first make that distinction. We then differentiate data models from database models as a natural outcome from how databases relate to the software used to manipulate them. We then present a general database design process that can be used to design geodatabases one of the newest and most sophisticated types of GIS databases currently in use. 1.1 Data, Information, Evidence, and Knowledge A Comparison Data modeling deals with classes of data, and thus is really more about information categories. Some might even say that the data classes are about knowledge, as the categories often become the basis of how we think about GIS data representations. To gain a sense of data modeling, let us define some terms data, information, evidence, knowledge, and wisdom. Those terms are not always well understood in common practice, as in everyday language. Over the years many people have written about the relationship among data, information and knowledge in the context

    1-1

  • Nyerges GIS Database Primer GISDBP_chapter_1_v17.doc

    of information systems (Kent 1984). Defining the terms provides a clearer sense of their differences and relatedness to help with data modeling, as well as provide a basis for understanding how information products relate to knowledge in a broader sense. In a GIScience and systems context, Longley, Goodchild, Maguire, and Rhind (2001) have written about the relationships among all five of those terms (to which some have even added a sixth truth). The definitions that we provide below are based on interpretations of Longley, et al, and integrate Sayers (1984) geographic treatment of epistemology because GIS analysts work in contexts involving many perspectives. Data is/are raw observations (as in a measurement) of some reality, whether past, current, future, in a shared understanding of an organizational context. We typically value what we measure and we measure what we value that is, what is important enough to expend human resources in order to get data? Information is/are data placed in a context for use tells us something about a world we share. Geographic information is a fundamental basis of decision making, hence information needs to be transparent in groups if people are to share an understanding about a situation. Evidence is/are information that is corroborated and hence something we can use to make reasoned thought (argument) about the world. All professionals, whether they be doctors, lawyers, scientists, GIS analysts, etc. use evidence as a matter of routine in their professions to establish shared valid information in the professional community. Credible information is the basis of evidence. How we interpret evidence shapes how we gain knowledge. When we triangulate evidence we understand how multiple sources lead to robust knowledge development as the evidence re-enforces or contradicts what we come to know. Knowledge is the result of synthesizing enduring, credible and corroborated evidence. Knowledge enables us to interpret the world through new information, and of course, data. Knowledge about circumstances is what we use to interpret information and decide if we have gained new insight or not. It is what we use to determine whether information and/or data are useful or not. Wisdom - People who apply knowledge and never make mistakes are commonly characterized as attaining wisdom - if knowledge creation were only that easy. Taking one step after another to build and rebuild knowledge to gain a sense of insight about complex problems has thus far not been easy. Once people have a much more robust knowledge of inter-relationships about sustainability, then and only then will we move to this level of knowing. The purpose of elucidating the above five levels of knowing are meant to provide readers with a perspective that GIS is not just about data and databases, but extends through higher levels of knowing. Given all that has been written and researched about these relationships, most people would say there are many ways to interpret each of the data, information, evidence, and knowledge steps, and that wisdom is often elusive. Nonetheless, with the above distinctions in mind, this chapter presents a framework for understanding the choices to be made in data modeling, and makes use of a framework to distinguish data models and database models that underlie the development of information as the middle initial of GIS.

    1-2

  • Nyerges GIS Database Primer GISDBP_chapter_1_v17.doc

    Data modeling is a process of creating database designs. Data models and database models are both used to create and implement database designs. We can differentiate data models and database models in terms of the level of abstraction in a data modeling language. A database design process creates several levels of database descriptions, some oriented for human communication, while others are oriented to computer-based computation. Conceptual, logical, physical have been used to differentiate levels of data modeling abstraction. There is also a difference between a language that can used to describe object classes and the use of that same language to specify the outcomes of in terms of a meaningful set of object classes. The former is called a data model and the latter is called a database model. The second (i.e., middle column) in Table 1.1 lists different levels of abstraction of data models. The third column (right column) lists several levels of abstraction for database models. Table 1.1 Differentiating Data Model and Database Model at Three Levels of Abstraction

    Schema Language Specifications Levels of Abstraction

    Schema Language itself, i.e. a data model ala Edgar Codd circa 1979

    Result of Schema Language use, i.e. a database model ala James Martin circa 1976

    Conceptual - Informal English narrative as objective class description framework; graphical depiction of points, lines, polygons.

    Specific implementation of object class description framework, e.g., transportation improvement decision making with regard to a specific track

    Conceptual - Formal Unified Modeling Language (UML) as for example used in the MS Visio software.

    Specific implementation of UML for a particular application, e.g., transportation improvement decision making whereby any kind of logical data model could be used to implement a data base.

    Logical Geodatabase data model (GDBDM) implemented in the MS SQL 2005 relational database management system. A set definition of constructs stored using spatial and attribute data types.

    Geodatabase database model (GDBDBM) is a specific implementation of the geodatabase data model for a particular application, e.g., an information base to support transportation improvement decision making

    Physical MS SQL 2005 implemented on the Windows Server 2003 operating system using a well-specified set of data types for all data fields in all tables (relations) in the Geodatabase

    The improvement program database implemented in MS SQL 2005 implemented on the Windows Server 2003 operating system

    No other terms have been proposed to clarify this important nuance, even the term object class has some difficulty when dealing with databases and programming languages. Nonetheless, one important thing to remember is that the database model is still an object class description it is not the database per se. The database model makes use of a particular schema language to specify certain object classes that will be used in the creation of a database. A schema language is a language for describing databases (some have called it a data description language). As mentioned in the two-right hand columns of Table 1.1 there is a difference between formulating a schema language (the basis of a data model in the middle column) and a using that same schema language to express a database model (the right-hand column).

    1-3

  • Nyerges GIS Database Primer GISDBP_chapter_1_v17.doc

    Thus, one way to think of the difference between a data model and a database model at the conceptual level of expression is to think of the basics of a natural language such as English (the data model) and the use of that language to create a story about the world (the database model). The basics of the English language are constructs such as nouns, verbs, adjectives, adverbs, prepositions, etcetera, plus the rules those constructs use (similar to a data model language). When we put the constructs and the rules of English to use, we can create a story about a particular place at particular time (as for example, a story about a communitys interest in land use and transportation change). However, neither of these is an actual conversation. A natural language like English provides us an ability to communicate. One kind of communication is telling a story by making use of a language. But, a story has not been told as yet. When a person actually tells the story, a person builds a database of verbal utterances and/or written words. When an analyst tells a GIS story, the analyst creates a geospatial database and then proceeds to elaborate on the use of that database through a workflow process as described in chapter 3. So, data models as languages are formed using basic constructs, operations, and constraint rules. Such languages provide us with the capabilities to develop specific database designs. When we create the database designs they are like a type of story about the world that is, we limit ourselves to certain constructs (categories for data) together with some potential operations on data, to tell a story through a template for a database representation. We thus include some feature categories of points, lines, and polygons, but we also exclude others. It depends on what we want to do (what kind of data analysis or display might we perform) with the story. The constructs of a data model and how we put them to use are often referred to as metadata. Metadata is information about data. It describes the particular constructs of a data model and how we make use of them. Data category definitions, need to be meaningful interpretations in order to be able to model data. However, there are at least three levels of metadata (data construct descriptions and meaningful interpretations) in a data modeling context as indicated in Table 1.1. So let us unpack those three levels for each of the data model and database model interpretations. 1.2 Data Models the Core of GIS Data Management Just before we get started, consider for yourself which of the three levels of conceptual, logical, and physical is more abstract, i.e., is more general or more concrete for you. The conceptual level is about meaning and interpretation of data categories. The physical level is about the bits and bytes of storing the data. For some, the conceptual model is more abstract; while for others, the physical model is more abstract. From the point of view of a database design specialist, the general to specific detail proceeds from conceptual, through logical, through to physical. With that in mind we tackle the levels in the order of abstractness. A conceptual data model organizes and communicates the meaning of data categories in terms of object (entity) classes, attributes and (potential) relationships. This interpretation of the term data model is often credited to James Martin (Martin 1976), a world-renowned information systems consultant, having authored some 25 books as of the mid 1980s. Many of these books described graphical languages for specifying databases at an information level of design. That level was called the infological level by Sundgren (1975, distinguishing it from the datalogical level,

    1-4

  • Nyerges GIS Database Primer GISDBP_chapter_1_v17.doc

    to highlight the difference between information and data. Information was defined as data placed in a meaningful context, i.e., an interpretation of data perhaps given by a definition or perhaps by what we expect others to know a common knowledge about the world. A major challenge for researchers has been the development of a computable conceptual data model. The first language developed to approach a level of rigor for potential computability was the entity-relationship model (Chen 1976). Although the language was not entirely computable, it was credited as the first formal language to be used widely for database design, as many software systems were designed and implemented based on that model. In addition, other researchers worked on data models that became know as semantic data m...

Recommended

View more >