Upload
tranquynh
View
227
Download
0
Embed Size (px)
Citation preview
ATTRIBUTES SANITIZATION IN OBJECT-ORIENTED DESIGN TO IMPROVE
DATABASE STRUCTURE
MOHD ZAINURI BIN SARINGAT
A thesis submitted in
fulfillment of the requirement for the award of the
Doctor of Philosophy in Information Technology
Faculty of Computer Science and Information Technology
Universiti Tun Hussein Onn Malaysia
JUNE 2014
v
ABSTRACT
Modelling using the Entity Relationship Model was introduced more than thirty
years ago. Until late 1990’s, object-oriented introduced class diagram. However,
designing a good database is still a serious issue. Some of the issues are very difficult
to handle such as consistency checking between system design and database design,
redundancy of data, mismatch of the data structure with the user’s needs in the
database and unused data in the database. In this thesis, a new technique called
UInData is introduced as an alternative method for designing database based on
attribute sanitization. The proposed technique will extract class behaviour from class
diagram to produce schema table which will then be compared with the user interface
to normalize the structure. Attributes sanitization is introduced to remove the unused
attributes and to provide final schema table. An experiment using three case studies
has shown that some improvements of designing optimal database have been
achieved in term of data sanitization and data accessibility. Attribute sanitization was
applied in LAS, SPKS and MPBP database. Data sanitizations have removed 2.2%,
14.1% and 24.5% from defined attributes which are not used by user interface.
Meanwhile the results shown in data accessibility for these three cases have shown
that LAS was reduced by 50%, SPKS have not reducing of data accessibility and
MPBP was reduced by 20% when UInData is used as compared to using ordinary
object-oriented. Therefore, the UInData is a good alternative technique to improve
database structure.
vi
ABSTRAK
Permodelan mengunakan Model Hubungan Entiti telah diperkenalkan sejak lebih
tiga puluh tahun. Sehingga lewat 1990-an, kaedah berorientasikan objek
memperkenalkan class diagram. Walaubagaimanapun, proses mereka bentuk
pangkalan data yang baik masih menjadi isu yang besar. Antara isu yang dikenalpasti
adalah sukar mempastikan keserasian diantara rekabentuk sistem dan pangkalan data,
pertindanan data, stuktur pengkalan data tidak memenuhi kehendak pengguna dan
atribut yang tidak digunakan dalam pengkalan data. Dalam tesis ini, satu teknik baru
yang dinamakan UInData telah diperkenalkan sebagai kaedah alternatif untuk
menghasilkan pangkalan data dengan membuang atribut. Teknik ini akan
mengambilkira ciri-ciri kelas bagi menghasilkan jadual skema dan
membandingkannya dengan antara muka penguna bagi melaksanakan struktur
pernormalan. Pembuangan atribut diperkenalkan bagi membuang atribut yang tidak
digunakan bagi menghasilkan skema jadual yang terkini. Pengujian terhadap tiga
kajian kes mendapati penambahbaikan rekabentuk pangkalan data telah dicapai
dalam penghapusan data dan capaian data. Penghapusan attribut telah diaplikasikan
terhadap pangkalan data LAS, SPKS dan MPBP. Penghapusan data telah memadam
2.2%, 14.1% dan 24.5% dari attribut yang dikenalpasti bilamana ia tidak digunakan
dalam antaramuka pengguna. Sementara itu dalam capaian data bagi ketiga-tiga
kajian kes menunjukkan capaian data bagi LAS telah berkurangan sebanyak 50%,
SPKS tiada perbezaan dalam kadar capian data dan MPBP telah berkurangan
sebanyak 20%. Ini menunjukkan UinData adalah salah satu teknik yang berguna
dalam menambahbaik rekabentuk pangkalan data.
vii
CONTENTS
TITLE i
DECLARATION ii
DEDICATION iii
ACKNOWLEDGEMENT iv
ABSTRACT v
ABSTRAK vi
CONTENTS vii
LIST OF PUBLICATIONS xiii
LIST OF TABLES xiv
LIST OF FIGURES xvi
LIST OF SYMBOLS AND ABBREVIATIONS xviii
LIST OF APPENDICES xx
CHAPTER 1 INTRODUCTION 1
1.1 Background of Study 1
1.2 Problem Statement 5
1.3 Aim 7
1.4 Objectives of study 8
1.5 Scope and Limitation 8
1.6 Report Organization 9
viii
CHAPTER 2 LITERATURE REVIEW 11
2.1 Introduction 11
2.2 Overview of Data 11
2.2.1 Data Sanitization 12
2.2.2 Attribute Sanitization 13
2.3 Structured Approach 16
2.3.1 Data Flow Diagram 16
2.3.2 Entity Relationship Diagram 16
2.4 Object-Oriented Approach 17
2.4.1 Class Diagram 19
2.4.2 User Interface 21
2.5 Normalization Technique 22
2.6 Database Design 23
2.6.1 Database Design Using Structured
Technique 24
2.6.2 Design Database using Object-
oriented Technique 26
2.6.3 Optimal Database 27
2.7 The Issues in Designing Database 28
2.8 Comparison Structure and Object-oriented
Techniques 31
2.9 Summary 33
CHAPTER 3 RESEARCH METHODOLOGY 34
3.1 Introduction 34
3.2 Methodology 34
ix
3.3 The Flow Chart of Research 35
3.4 Attribute Sanitization Technique 37
3.5 Validation of Database Design 40
3.6 Summary 41
CHAPTER 4 FORMALIZATION OF RULES IN DESIGINING
OPTIMAL DATABASE 43
4.1 Introduction 43
4.2 Attribute Sanitization 43
4.2.1 Attribute Sanitization in Class
Diagram 44
4.2.2 Generating Schema Table 44
4.2.3 User Interface Normal Form (UINF) 45
4.2.4 Attribute Sanitization in User
Interface 45
4.2.5 Optimal Schema Table 46
4.3 Rules for Designing Database Using Attribute
Sanitization Technique 46
4.3.1 Rules of Attribute Sanitization in a
Class Diagram 46
4.3.2 Rules of Generating Schema Table 47
4.3.3 Rules of User Interface Normal Form
(UINF) 48
4.3.4 Rule of Attribute Sanitization in User
Interface Normal Form (UINF) 49
4.4 Algorithm for Designing Database 49
x
4.4.1 Algorithm for Attribute Sanitization
in Class Diagram 49
4.4.2 Algorithm for Generating Schema
Table 50
4.4.3 Algorithm for User Interface Normal
Form (UINF) 51
4.4.4 Algorithm for Attribute Sanitization
in User Interface Normal Form
(UINF) 51
4.5 Formalization for Designing Database 52
4.5.1 Formal Model of Class Diagram 53
4.5.2 Formal Model in Schema Table 55
4.5.3 Formal Model in User Interface
Normal Form (UINF) 55
4.6 Consistency Rules for Designing Database
Using Attribute Sanitization 56
4.6.1 Consistency Rules between Class
Diagram and Schema Table 57
4.6.2 Consistency Rules between Schema
Table and User Interface 62
4.7 Summary 63
CHAPTER 5 IMPLEMENTATION OF UInData 64
5.1 Introduction 64
5.2 Environment for UInData Tool 64
5.3 Work Instruction of Create Schema Table 67
xi
5.4 Work Instruction of Create Optimal Schema
Table 70
5.5 Summary 75
CHAPTER 6 EVALUATION OF UInData 76
6.1 Introduction 76
6.2 Lecturer Assessment System (LAS) 79
6.2.1 Designing LAS Database Using
Object-oriented Technique 81
6.2.2 Designing LAS Database Using
UInData 81
6.2.3 Results Comparison on LAS
Database Design 88
6.3 Sistem Pengurusan Klinik Sejahtera (SPKS) 90
6.3.1 Designing SPKS Database Using
Object-oriented Technique 91
6.3.2 Designing SPKS Database Using
UInData 92
6.3.3 Results Comparison on SPKS
Database Design 99
6.4 Sistem Pengurusan Kompaun Majlis
Perbandaran Batu Pahat (MPBP) 101
6.4.1 Designing MPBP Database Using
Object-oriented Technique 102
6.4.2 Designing MPBP Database Using
UInData 103
xii
6.4.3 Results Comparison on MPBP
Database Design 110
6.5 Summary 113
CHAPTER 7 CONCLUSIONS AND FUTURE WORKS 114
7.1 Introduction 114
7.2 Summary of Contributions 114
7.2.1 Attribute Sanitization Technique 115
7.2.2 Formalization of Rules for Attributes
Sanitization 116
7.2.3 UInData Tool 117
7.3 Future Work 117
7.4 Summary 118
REFERENCES 119
APPENDIX 127
VITA 164
xiii
LIST OF PUBLICATIONS
Journals:
1. Mohd Zainuri Saringat, Rosziati Ibrahim, Noraini Ibrahim and Tutut
Herawan. On Data Sanitization in Designing Database From Class Diagram.
Journal of Computing, Vol. 3, Issue 4, April 2011.
2. Mohd Zainuri Saringat, Rosziati Ibrahim & Tutut Herawan. A Proposal for
Constructing Relational Database from Class Diagram. Canadian Journal:
Computer and Information Science, Vol. 3, No. 2, pp. 38-46, 2010.
Proceedings:
1. Mohd Zainuri Saringat, Rosziati Ibrahim, Noraini Ibrahim and Adekunle
Adeshina. Constructing Schema Table from Class Diagram. 2nd
World
Conference on Information Technology (WCIT 2011). AWERProcedia
Information Technology and Computer Science, Vol. 1, May 2012, pp 699-
708.
2. Mohd Zainuri Saringat, Rosziati Ibrahim, Noraini Ibrahim and Tutut
Herawan. On Database Normalization using User Interface Normal Form.
2010 International Conference On Intelligent Computing, Changsha, China.
Lecture Notes in Computer Science, 2010, Springer Verlag Berlin
Heidelberg, pp 571-578.
xiv
LIST OF TABLES
2.1 Current Designing Database Techniques 29
2.2 Comparisons of Structure and Object-oriented Techniques 32
4.1 Overview of Research Elements and Formal Logical
Specification 53
4.2 Mapping Between Consistency Rules and Formal Logical
Specification 57
6.1 LAS-OOT Schema Table 81
6.2 LAS-UInData Schema Table 82
6.3 UINF in Table lasStudentAttendance 83
6.4 UINF in Table lasSubjectMark 84
6.5 UINF in Table lasStudentMark 85
6.6 UINF in Table lasSubjectPercentage 86
6.7 Result on LAS Database Design 88
6.8 LAS Matrix Accessing Table 89
6.9 SPKS-OOT Schema Table 92
6.10 SPKS-UInData Schema Table 93
6.11 UINF in Table pekerja 94
6.12 UINF in Table temujanji 94
6.13 UINF in Table doktor 95
6.14 UINF in Table pesakit 96
6.15 UINF in Table ubat 96
6.16 UINF in Table stokUbat 97
6.17 UINF in Table rawatan 97
6.18 Result on SPKS Database Design 99
6.19 SPKS Matrix Accessing Table 100
6.20 MPBP-OOT Schema Table 103
xv
6.21 MPBP-UInData Schema Table 104
6.22 UINF in Table staf 105
6.23 UINF in Table letakKereta 106
6.24 UINF in Table jawatan 107
6.25 UINF in Table undangUndang 107
6.26 UINF in Table kesalahan 108
6.27 UINF in Table penguatkuasaan 109
6.28 Result on MPBP Database Design 110
6.27 MPBP Matrix Accessing Table 112
xvi
LIST OF FIGURES
1.1 System Design: Specify How System Works 2
1.2 The System Design Components 3
2.1 UML Class Diagram 20
2.2 Structured Technique in Designing Database 25
2.3 Object-oriented Technique in Designing Database 27
3.1 The Flow Chart Research 36
3.2 Idea of the Research 38
3.3 The Research Framework 39
4.1 Attribute Sanitization in Class Diagram Algorithm 50
4.2 Relationships Algorithm 50
4.3 User Interface Normal Form Algorithm 51
4.4 Attribute Sanitization in User Interface Algorithm 52
5.1 First Page of Visual Studio 2010 65
5.2 The C# Code Editor 65
5.3 Implementing the LAS Class Diagram 67
5.4 Sample Error in Validate Process 68
5.5 Menu Options in Generate Schema Table 68
5.6 Text File for LAS Schema Table 69
5.7 Start Page 70
5.8 Opening Text File 70
5.9 Display Tables and Attributes 71
5.10 Marking Intersection between Tables and User Interfaces 71
5.11 Calculating Intersections 72
5.12 Designing Table Structure 73
5.13 Optimal Table Structure 73
5.14 Text File of Optimal Table Structure 74
5.15 Text File of Attribute Sanitization 74
xvii
6.1 LAS Class Diagram 80
6.2 LAS Optima Schema Table 87
6.3 SPKS Class Diagram 91
6.4 SPKS Optima Schema Table 98
6.5 MPBP Class Diagram 102
6.6 MPBP Optima Schema Table 110
xviii
LIST OF SYMBOLS AND ABBREVIATIONS
1NF - First Normal Form
2NF - Second Normal Form
3NF - Third Normal Form
4NF - Fourth Normal Form
ANSI/IEEE - American National Standards Institute/ Institute of Electrical
and Electronics Engineers
AS - Attributes Sanitization
CD - Class Diagram
DBA - Database Administrator
DBMS - Database Management System
DFD - Data Flow Diagram
ERD - Entity Relationship Diagram
GUI - Graphical User Interface
K - Primary Key
KDD - Knowledge Discovery in Database
LAS - Lecturer Accessing System
MPBP - Pengurusan Kompaun Majlis Perbandaran Batu Pahat
OCL - Object Constraint Language
OOD - Object-oriented Design
OOT - Object-oriented Technique
S - Sanitization
SMP - Student Information System
SPKS - Sistem Pengurusan Klinik Sejahtera
SRS - System Specification Document
SSADM - Structure System Analysis and Design
UInData - Attributes Sanitization in Object-oriented Design to Improve
Database Structure
xix
UML - Unified Modelling Language
UINF - User Interface Normal Form
UTHM - Universiti Tun Hussein Onn Malaysia
xx
LIST OF APPENDICES
APPENDIX TITLE PAGE
A Listing A: Part of Source Code for 127
Generating Schema Table from
Class Diagram
B Text File of Schema Table from Class 149
Diagram
C Listing B: Part of Source for Designing 151
Optimal Schema Table
1 CHAPTER 1
INTRODUCTION
1.1 Background of Study
Living in this modern era is interesting but complex. Since the early 1980’s, when
computer gains its popularity, the human social interaction dramatically changes.
Furthermore, with the increase in the number of human population in the world, life
has become more challenging. People are migrating from their country to other
countries for many reasons such as study, work, travel and business. This
phenomenon encourages the establishment of good platform for communications
through which people can interact to share information, do banking transactions, pay
bills and purchase goods. All these involve huge data. The data must always be
available and free from errors. Incorrect data will lead to wrong information and
inaccurate decision making [1, 2]. The characteristics of good quality information are
that the information must be accurate, available and accessible right on time and in
right form [3]. To ensure this, good technique in designing database must be well
defined. Another reason for a good database design is to increase the speed of
accessing data. Bad database design can slow down the data transfer. In a worst
condition, the data will not be accessible and no information can be provided. In
other words, the database must be precise and concise. Using those applicable
systems, there are a series of important data that must be stored for future use. The
data can also be used to generate information such as monthly reports, bills and
receipts. The place to store all these data is known as database. However, current
technique focuses on designing the database structure based on user requirement, but
there is no consent whether the data store is used or not.
2
Before the computer system was introduced, people used files to keep all
records. The same thing happens with the filing systems; the data must be well-
organised to avoid misplace, error, wasting of time and redundancy. Designing the
database is closely related to user needs. It involves a long process and starting to
identify when the user requests for the system.
In producing a system or software application, there are four fundamental
process activities: software specification, software design and implementation,
software validation and software evolution [1-3]. Software specification is a process
of writing the user requirements to get system specification [4, 5]. People can use
existing standards technique to write documents such as using the natural technique
as proposed by Heninger [6], the US Department of Defence and the IEEE. Another
technique in writing system documentation that has been discussed by Pohl and
Rupp [5] includes the famous and widely used technique, IEEE/ANSI 830-1998.
Normally, the System Specification Document (SRS) will be produced as the
deliverable for this phase to verify the user requirements with the system developer.
However, in an established company, they usually use their own form as the standard
way to show the user requirements. These user requirements will be used in the next
phase to the software design and implementation. The system design is part of the
software fundamental process as illustrated in Figure 1.1.
Figure 1.1: System Design: Specify How System Works
3
After that, the process of collecting the requirements is completed. Modelling
or designing a software application in software design and developing phase is an
important part in order to make sure that the development is congruent with the
specification. A model can specify, visualise and document models of software
system including their structure and design. During system design, the software
engineer is needed to complete three main tasks, which are designing the system
logic, user interface and database. These system logic, user interface and database
will be used to counter back with the users before starting the coding. The counter
back process assures that business functionality is complete and correct, end-user
needs are met and program design supports the requirements for scalability,
robustness, security and extendibility. The components involved in system design are
illustrated as in Figure 1.2.
Figure 1.2: The System Design Components
Designing database is very important in system development [7]. It involves
storing data and generating information for a long term process. Currently, there are
two well-known techniques in designing database, which are the structured technique
and the object-oriented technique. Even though the techniques have been introduced
for more than 30 years, there are still some issues in designing database. One of the
issues is designing optimal tables.
4
This thesis is a study that defines an alternative approach in designing
optimal database system based on object-oriented concept. Optimal database means
all defining attributes table in database are used. The discussion is made to present
the technique on how attributes sanitization is applied in proposed technique which is
not implemented in the current defining database technique, whether using the
structure or object-oriented technique. The discussion includes process flow of the
proposed technique, rules, example of implementation and the comparisons with
current techniques to show that this technique is better than previous techniques.
Normally, in object-oriented approaches, Unified Modelling Language
(UML) is used to represent the system requirements that consist of 13 notations such
as use case, sequence diagram and class diagram [8-10]. This study has been done to
consider all the features and the constraints in the class diagram to be used in
producing the first schema table. Attributes sanitization is a technique to remove
duplicated class or attributes in constructing the first schema table. Then, this schema
table will be compared with the user interface to apply the second step which is to
normalize the structure based on user interfaces. Upon completion of the process of
normalization in the structure of schema table, the attributes sanitization technique is
again applied to remove unused data in producing the final schema table. Then, the
schema table is available to create the structure to implement the optimal database.
This chapter presents the background of this study including the related issues, aim,
research objectives, scope and the limitation of study and lastly the organization of
the thesis.
The third process in system design as illustrated in Figure 1.1 is software
validation. Software validation was involved with verification and validation
processes are used to ensure the development product are confirm to the
requirements [11, 12]. The last process in system design is software evaluation. The
evaluation is involved with the activities improvement the system after deliver to the
used. Both last processes was not involved in this research and not used in verified
the attributes sanitization in designing object-oriented to improve database design.
5
1.2 Problem Statement
System application consists of software design and database design. Software design
will show the business process of the system. Software design will do the system
logical and sketch the user interface. Another activity is database design to describe
the flow of data involving in the system [13]. We cannot deny that software design
and database design are tightly coupled. However, until now, both these designs are
not congruent. Furthermore, the practitioners have distinguished these two activities
in many ways.
In universities, students learn different subjects. System analysis and design
subject represents software design. Another subject is database system or database
design. This phenomenon also happens in organizations. Most of the companies
break up into two positions, which are software engineers or analyst programmers for
developing the system and database administrators (DBA) for managing the
database. This situation is supported with strong reasons such as for security or to
focus on particular jobs that can increase the quality of tasks. However, to break the
tasks into two groups, there must be mutual understanding in order to get high
quality system.
The situation is more chaotic when the development approaches in software
development are broken into two techniques, which are the structured and the object-
oriented approaches. The structured approach is also known as functional, procedural
and imperative. This approach consists of two techniques, which are Data Flow
Diagram (DFD) for software design [13-15] and Entity Relationship Diagram (ERD),
which are used for data modelling to assist during the development of the database
[16, 17]. The approach was popular in the 1970’s. Some programming languages that
apply structures or procedures concept are Pascal, C and COBOL. The relational data
model concept was introduced by E. F. Codd in 1970 [18]. It is a popular model and
easy to implement. Entity Relationship (ER) diagram is a famous model to present
the relation of the data [19]. Furthermore, Codd introduced normalization technique
to avoid data redundancy [20]. He introduces three techniques of normal form to
ensure that the database design is free from anomalies. However, the result of this
technique has decomposed into too many tables.
6
The second approach is object-oriented. This technique, which uses Unified
Modelling Language (UML), consists of various notations such as use case, sequence
diagram and class diagram. Normally, during system analysis and design, the
developer will produce use case diagram, state diagram and sequence diagram to
confirm the understanding of the developer with the users regarding the system [5,
21, 22]. The class diagram is produced to show the interaction among the entities. In
class diagram, it will show the behaviours of the class such as generalization,
association and multiplicity. The class diagram also contains details about the class
such as the attributes, name and data type [23]. PHP, C+, C# and Java are among the
programming languages that support object-oriented concept. However, the UML
does not have specific notation for database modelling. Normally, class diagram is
used to design the database because these diagrams represent the semantic data [1].
In current practice for software development using object-oriented methodology,
there are two famous ways on how people establish the database from designing the
problem domain. The first technique is by transferring the class diagram to the
appropriate tables after considering the class behaviour. Objects of the class are used
without any translation between data structures in the design system and the program
data structures [24]. This object-oriented approach seems to be a very attractive
solution for modelling system based on object in system design and database design.
It can be more meaningful especially if it is combined with the object-oriented
programming technique and applied in programming language to manage persistence
object. This technique gains its popularity when many references such as books and
papers have been published. However, this system is not mature. In addition, it seems
to lead to data redundancy in the object-oriented database design. Many researchers
have given suggestions to consider the relationships principle in data model to avoid
redundancy [25-28]. Since this technique is very young and not mature, further
researches have to be done.
The second technique, which is a famous practice in object-oriented software
development process, is hybrid technique. This technique allows the software
engineer to use different approaches in designing the software. Software engineers
use object-oriented notation to represent the system design by producing the
diagrams such as use case, state diagram, sequence diagram and class diagram. Then,
they use Entity Relationship Diagram (ERD), which is a relational data model to
build the database design and Relation Database Management System (RDBMS) as
7
the repository [2, 16]. Developer will use normalisation technique to reduce
redundancy. The developer combines the object-oriented technique in system design,
while structured approach is used in database design. This object-oriented technique
is workable to produce the system. However, there is lack of consistency checking
from system design to database design. The consistency checking cannot be applied
from design system to design database. Furthermore, the possibility to lose data
while transferring diagram from object to ERD is very high. This technique is also
lacking references because most of the references discuss structure technique or
object-oriented technique.
Both structure technique and object-oriented technique are commonly used
by the software developers. However, there was no consistency checking applied
from system design to database design. Without consistency checking, it will lead to
storing useless data in the database.
Therefore, in this research, an alternative technique to transfer the design
system to database design based on object-oriented approach is identified. The class
diagram produced in the system design will be used to create the schema table. This
schema table will be compared with the user interface for normalization process.
Lastly, attributes sanitization process will be applied to produce an optimal database
structure. Optimal database structure referred to use of defining attributes in data
model. This technique will apply continuous process from system design until the
creation of database. Furthermore, the normalization technique using interface will
ensure that only dedicated data are stored in database based on the user’s needs. The
technique will remove the passive data to avoid the process of inserting and keeping
the data. Furthermore, the technique will provide a platform to consistently check the
system design and database design. On the other hand, this technique will provide
better database compared to the current techniques. This technique will be
compromised with both fundamental techniques to resolve the drawbacks.
1.3 Aim
The outcome of this research is to define the appropriate technique to produce the
optimal database from the object-oriented design. The optimal database is achieved
8
when all defining attributes in schema table are used in displaying the user interface.
The result must show the improvement in terms of design and storage consumption.
Well-designed technique will clearly show the flow in software development
process. It starts from using defining class diagram until producing schema table.
This research is primarily concerned with how efficiently the new alternative
algorithms solve software development problem. It is hoped that this research would
encourage the use of object-oriented technique in software development process
including software specification and software design by using the Unified Modelling
Language (UML) as a tool to draw notation to show that the class diagram produce
in system design is harmonious with the database design.
1.4 Objectives of study
The objectives of this research are as follows:
i. To design the framework for producing an improved database structure based
on attribute sanitization from class diagram.
ii. To formalize the rules for producing an improved database structure based on
attribute sanitization from class diagram.
iii. To develop the tool based on (ii) in getting an improved database structure.
1.5 Scope and Limitation
The followings are the scope of this research:
i. Schema Table Focus on Class Diagram
The class diagram is used as the base diagram to produce optimal schema table.
The class diagram was considered in designing database because it contained
semantic rules and represents data model [1, 5, 21, 23, 24, 29-31].
ii. Optimal Database
In this research, the optimal database is considered achieved when the structure
of schema table is used in the user interface. The defining attributes that are not
9
used by the user interface are considered as not useful and should be removed
[32-34].
iii. Formalization of the rules
The entire transferring of rules from class diagram to schema table and
comparing schema table with user interface is specified using mathematics
such as logic, sets, and quantifiers in order to prove the concept of the ideas
[35, 36].
iv. Proving the consistency rules
Two ways will be used to prove the technique in producing schema table from
class diagram. First is using the defining rules. The idea was transferred to the
rules as will be described in Chapter 4. The rules are defined to prove it
consistent. Then, the rules are implemented in the proposed tool. Developing
an application is done to prove that the research idea is workable. The software
is developed using Microsoft Visual Studio 2010 to apply the rules in
producing the schema table and the detailed discussion about the tool is in
Chapter 5. The second technique is by using a selective existing system, which
is chosen to verify that the technique follows the defining rules as discussed in
Chapter 6. The discussion compares the result between the research technique
and object-oriented technique with normalize technique.
1.6 Report Organization
The next chapter deals with related work and discusses features of the existing
system design algorithm.
Chapter 2 also discusses current issues in designing database. There are two
techniques in designing database, namely the structured technique and the object-
oriented technique. The structured technique is defined as the database that is based
on two diagrams, which are data flow diagram and entity relationship diagram. On
the other hand, the object oriented is introduced nearly 30 years after the structured
technique has been introduced. The object-oriented is used as class diagram to
10
produce structure of database. Chapter 2 elaborates further regarding the structured
and object-oriented techniques.
Next, Chapter 3 illustrates the methodology used for the proposed Attributes
Sanitization in Object-oriented Design for Relational Database System. This chapter
will give clear picture about the research framework. The discussion details up the
process involved in creating schema table from class diagram and in producing
optimal schema table.
Then, the process flow is transferred to formalize the rule in Chapter 4. Four
primary topics, which are the rules, algorithm, formalization and consistency rules, in
creating optimal database schema from class diagram will be discussed.
It is followed by discussion on the tool in Chapter 5. To ensure that the
defining process in Chapter 3 and defining rules in Chapter 4 are workable, the tool
has been created using Visual Studio 2010 using C# language. Chapter 5 will give an
example of Lecture Assessment System (LAS) to show the process in producing
schema table using tools.
In Chapter 6, the manual implementation in the process of designing database
design using object-oriented technique and the research technique will be discussed.
Then, a comparison using LAS will be analysed and discussed. In Chapter 5 and
Chapter 6, the application of the process of attributes sanitization in designing
optimal database system will also be shown.
Lastly, the conclusion and the future work will be discussed in Chapter 7.
2 CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
This chapter covers the related and pertinent issues to this research. The literature
review will be discussing the spectrum of research done in this area for supporting
the research, specifically to find the gaps in producing schema table or designing the
database structure. There are two techniques in designing database: structured
technique and object-oriented technique. Structured technique is defined as the
database is based on two diagrams, which are data flow diagram and entity
relationship diagram. On the other hand, object-oriented used class diagram to
produce the structure of database. This chapter will start explaining the differences
between the data and attribute as discussion is based in database design, before
discussing the techniques of designing database.
2.2 Overview of Data
Data is known as any facts that contain numbers or texts. It is a raw fact and need to
be processed to produce information [37-39]. Data was started identify in
requirement phase when studies the business process. Eliciting the requirement will
show the flow of data involved in the system. Context diagram and data flow
diagram are used in structured technique to illustrate this business process.
Meanwhile in object-oriented, use case and sequence diagram are commonly used.
Data is an element that describe things, events, activities and transaction [39, 40].
12
The next phase is design phase where data was analysed and organized as the system
need.
In this phase, data model was established to show the conceptual structure [7,
41]. There are three basic elements used in the data modelling language which are
entities, attributes and relationships. Class diagram or entity relationship diagram is
an example of data model which is frequently used. In the design phase, each entity
was described in detail. Entity is the basic symbol and depicted by a rectangle. Entity
is a person, place, thing or event about which data is collected. A research done by
Yangbo Xu et al. [42] showed how to extract table properties from the web to store
as entity attributes. The entity name was taken from the table name. Each entity has
name and details about what it is representing is called attribute. An attribute is a
property of an entity and each entity may have many attributes. Each attribute will
hold value and is called data. The data value can be changed but once the attribute is
removed, the data value will be gone. In designing phase, Yuto Nakamura et al. [43]
are suggested to use similarity words in giving the name for entity and attributes with
the daily words used. Their research has shown that higher validity and
understandability of the diagram if used same structure of class diagram.
2.2.1 Data Sanitization
Data sanitization is the process of making sensitive information in non-production
database safe for wider visibility. The data should be sanitized or removed in order to
protect valuable business information [44-46]. Data sanitization is also referred to as
the process of removing resistant or worthless data. Some researchers use the terms
data cleansing or data reduction [47]. The objectives of doing data sanitization are to
get clean unused data and to produce quality data. Moreover, sometimes data
sanitization is used to hide important data [46]. The technique can be applied before
or after the data is stored in the database.
Formerly, database designer produces database structure for the system
operation to fulfil user needs. The term data sanitization is not familiar in database
design. However, the concept has been hidden applied since relational data model
was introduced by Codd in 1970 [20]. To avoid redundancy in the tables, Codd
13
introduced normalization technique. Codd focused on data redundancy but ignored
the usefulness of the data in the database.
The term data sanitization is formerly used in Knowledge Discovery in
Database (KDD). Data mining is one of the activities in the process to discover
previously ambiguous, duplicated and scattered data in database to define the
technique to mine the data for gaining useful information. Many techniques have
been proposed to mine the data including classification, clustering, association rule
mining using rough set and neural network [47, 48]. Then rough set and neural
network was improved by soft set theory that introduced by Molodtsov [49]. Other
researchers have proposed the improvement techniques in soft set theory such as
Herawan who proposes the alternative way to consider multi-value attributes
reduction in information system [50].
Besides that, data sanitization concepts have also been introduced in other
fields such as network data. Bishop et al [46] propose the idea of removing the
selected information such as password and information that can be identified by
individuals to hide sensitive data from passing through the network. Another
research done by Defit [47] uses data cleansing technique to identify and fill the
missing values in Jakarta Stock Exchange. Defit [47] combines several intelligent
techniques, which are rough set, neural network, knowledge based and statistic to
claim that the techniques offer better results in handling uncertain and missing value.
Literature study found that data sanitization or data cleansing was a very
popular technique in removing the available data in the database but no research was
done on attribute or column sanitization on the table structure in the database.
2.2.2 Attribute Sanitization
Data sanitization is a process of deleting data in records. It can be one data in a
record or a set of data for a few records. Meanwhile attribute sanitization is a process
of deleting a set of data for particular column. That means when an attribute is
deleted, all the data referring to that column will permanently be deleted. Inspired by
the fact that database may consist of unused attributes; this research proposed a
technique to apply attribute sanitization in designing database with reference to
object-oriented system design concept. Literature study found that attribute
14
sanitization is not popular among the researchers but attribute reduction which has a
similar meaning was used by many researchers in their areas of research.
Long Giang Nguyen [51] did a research on attribute reduction in decision
table using metric technique. This research proposed a new method for attribute
reduction in information system. It was based on the subset of attributes which
determines the knowledge structure on the set of objects. A metric on knowledge
structure will be defined on the attribute sets to measure the similarity of knowledge
structures. The researcher has published the technique which proves this metric
method is more effective based on Shannon entropy. A research done by Xu Qisheng
et al. [52] used mapping matrix according to the mapping relation between an
attribute value and a decision value from decision table. They claimed that by using
this mapping matrix optimal reduction set can be obtained compared to other
techniques such as discrimination matrix, decision matrix or heuristic reduction.
Their paper provides a detailed technique to apply mapping matrix with an example
to prove their idea is workable.
Mingquan Ye and Chanrong Wu [53] did a research on attribute reduction
using discernability matrix. Both researchers have proposed a decision table
decomposition model to solve the attribute reduction problem based on discensibility
matrix for large decision table. In their technique, they have introduced attributes
partition. The large decision table was divided into a number of decision sub-tables
and transferred the computing discernibility matrix in original decision table into
sub-tables. They have attached a complete algorithm and example in their paper for
reviewers. Their paper shows how this technique is applied and they claimed that
their technique is efficient.
A research done by Jin Zhou et al [54] deals with attribute reduction in
information system. They proposed a direct method of attribute relative reduction on
rough set theory which is called tolerance relationship. The given algorithm is based
on the concept of tolerance relationship similar matrix to calculate the attributes of
incomplete information system. It applied attribute significance based on attribute
frequency in the tolerance relationship similar matrix as the heuristic knowledge. To
calculate the candidate attribute, they used binary search heuristic algorithm. Their
paper also includes the algorithm, definition rule and examples of the proposed
technique. Another research done on information system is proposed by Yuan Ke-
yong et al. [55] using discernibility matrix. This research is based on the analysis of
15
the core attributes in information system and the cost of getting core attributes to
identify whether a set contains the core or not. By using this technique, it takes less
time and easier to achieve the smallest reduction.
Attribute reduction was also used by Xiongbin Wang et al. [56] in their area
which was rough set theory. Their research was based on the relationship of many
deferent attribute reduction definitions. It was known as relative attribute reduction.
Their research paper established the algorithm of attribute reduction and the rules
acquired from the decision table. They did a research to identify the relationship
between attribute reduction based on the system entropy and on the database system.
Wang Yanbo et al. [57] have done a research on attribute reduction for
network security. The research done is based on the network situation where the
network equipment running status, network behaviour and user behaviour was saved
in a log file. The information is very large, scrambled, repetitive and imperfect.
There are some attributes in the log file which are useless. These researchers
produced a paper that proposed a rough set reduction algorithm based on adaptive
genetic algorithm. This algorithm was improved on cross probability, variation
probability and fitness function. They claimed that their proposed technique has
reduced time and enhanced the calculation far better than the traditional algorithm.
In this research, attribute sanitization process was applied in two processes.
First sanitization process was done during the transfer from class diagram to schema
table. Attribute sanitization was applied to remove same attributes in a same class.
The second process was done after normalization process based on the user interface.
Cross checking between schema table and user interface identified the attributes that
were not retrieved in the system. This idea can improve the usage of storage through
sanitizing of unused attributes in the database. Current practices in analysis and
design system only focus on gathering user requirements to identify the flow of data
to be stored in database. These current techniques do not consent with the data in the
database. It has led the data to be stored that is used for nothing in the database for a
long life of the system. The research has done with proposed the rules as in Chapter 4
to produce database structure.
16
2.3 Structured Approach
Modelling of a system is an essential process in software development lifecycle to
produce a system model. A system model can provide the structure for problem
solving. It can also be used as an experiment to explore multiple solutions and
abstractions to manage complexity. It is not just only to understand the requirements
and business process, but more than that, which is to suggest improvement process
when the system is implemented. The models used should have traceability links
among them. In other words, a system model must be consistent.
Structured technique was introduced in 1970. The old technique for showing
the flow of data in structure system analysis and design (SSADM) is known as data
flow diagram (DFD) [15, 16]. Using the data flow diagram, users are able to show
how the system operation and data involved are used as the input and output at each
process. It was introduced by Edward Yourdon and Larry Constantine in 1974 based
on Martin and Estrin data flow graph model of computation [58]. Then, in 1977,
Gane and Sarson introduced different notation for data flow diagram, which is more
efficient [15].
2.3.1 Data Flow Diagram
Data flow diagram is very popular and easy for end users to understand the physical
idea of where the data they input has an effect to the process in the whole system.
Data flow diagram can show the flow of data from external entities into the system
and involvement of the data with the process. Data flow diagram has four notations:
input/output, process, data flow and data storage [2, 15]. A data flow diagram mainly
focuses on the processes or activities performing in the system. This diagram was
established to represent the system process in a diagram.
2.3.2 Entity Relationship Diagram
In structured technique, the Entity Relationship Diagram (ERD) was used to produce
database. Entity Relationship Diagram was popular a long time ago since Peter Chan
wrote in ACM Journal in 1976 [19]. ERD is categorised as a top-down approach that
17
represents a concept of data. The diagram contains entity and relationships between
entities. Chan defines entity as a thing that can be a person, company or event, while
relationship is an association among the entities. There are three types of association,
which are one-to-one, one-to-many or many-to-one, and many-to-many. The ERD is
a basic for advanced entity relationship as implemented in object-oriented concepts
[2, 16].
2.4 Object-Oriented Approach
Using object-oriented for software development, the system model can be developed
by using Unified Modelling Language (UML). Unified Modelling Language (UML)
is a modelling language that contains notations to represent objects-oriented
concepts. It was established in the works of Grady Booch, James Rumbaugh, Ivar
Jacobson and Rational Software Company. Since 1995 until now, UML has passed
through many evolution processes for improvement such as from UML 0.8 until
UML 2.3, which was released in 2010 [21, 29]. UML has many diagrams in view of
system modelling. However, Class Diagram is the backbone of a modelling system
because it represents the structure of the system that shows the system classes,
attributes and relationship between classes [9, 16]. Furthermore, the graphical
notation in a class diagram can be used to produce database structure.
Dobing and Parsons [59] in their interviews-based research with almost 2,700
UML practitioners and clients through the web find that use case diagrams and class
diagrams have the highest usage levels followed by sequence, state and activity
diagrams. Most of the diagrams are used to present the requirements; however, the
class diagram is one of the diagrams available to use in producing the database
design. The research study is based on class diagram as the initial stage to produce
schema table as the diagram contains semantic data [1]. Furthermore, class diagram
is also used to show entities, element of analysis and design such as associations,
aggregations and generalizations among entities [1, 21, 23] .
Currently, process modelling using object-oriented approach is already well-
known [1, 9, 29]. This technique, which has thirteen notations, uses Unified
Modelling Language (UML) that consists of various notations such as use case,
sequence diagram and class diagram. UML is a language for specifying,
18
constructing, visualizing and documenting the software system and its components.
It is an object modelling and specification language used in software engineering. It
includes a set of graphical notation techniques to create abstract models of specific
systems. Normally, during system analysis and design, the developer will produce
use case diagram, state diagram, class diagram and sequence diagram to confirm the
understanding of the developer with the users regarding the system [59]. When the
users agreed with the design, the class diagram will be produced to show the
interaction among the entities. In class diagram, it will show the behaviours of the
class such as association, composition, aggregation and generalization [16, 60]. It
also contains details about the class such as the attributes, name and data type [21,
29]. Association in object-oriented same as in structure technique was known as
multiplicity. Multiplicity in object-oriented shows the relationship between two
classes. Composition, aggregation and generalization are not defined in structured
technique. Composition is shown as a class which is a whole of another class and has
a strong relationship. Aggregation is described as a week relationship where a class
is part of another class. Generalization is shown in a class diagram that one class
shares the structure defined in one or more other classes. These four types of
relationships were considered in this research as defined in the rules in Chapter 4.
Practitioners use class diagram for the development team member as their
internal design [23, 29, 61]. This class diagram will be used as the initial part of
designing the database system. The UML does not contain dedicated diagram for
modelling database design. However, the class diagram is the only one which is the
most suitable diagram to consider as the base for database conceptual modelling. A
class diagram visually represents the semantic data [1, 23, 24].
Many tools [10, 31, 62-64] are available to represent UML notations to
design system such as Rational Rose, Poseidon, Power Designer and MagicDraw.
However, neither tool provides the facilities to generate optimal schema table from
the class diagram nor the facilities to check consistency from the class diagram to the
schema table. In the tools, the user must indicate which class will be the permanent
or transient. In this case, the permanent class will be transferred to the table in
schema table. Another class will transfer to the template program code such as C++.
In this research the idea is to create a good schema table from a class diagram. This
will help the database designer especially the novice database designer or those who
are not familiar with database concept. Database is the heart of any system
19
application and designing database is a critical part. Even though it is hidden from
the user, the database design must be fulfilled the user’s needs.
2.4.1 Class Diagram
A class in a class diagram [10, 65] represents a set of object with common features.
A class graphically shows a rectangle that is divided into three parts: names,
attributes and operations. The names for each class must be unique in the whole
diagram and the other parts are optional. Among the classes, there are four types of
relationships: associations, composition, aggregation and generalization. Association
is shown as the relationship of the object among the classes. It is called multiplicity
class diagram or cardinality on entity-relationship diagram. There are three types of
multiplicities which are one-to-one, one-to-many and many-to-many. Composition
and aggregation represent the whole and part relationships, when one class represents
the whole object and other classes represent parts. The generalization is also known
as inheritance, which is a concept in which the derived class can contain attributes
and operation as based class. The derived class can also contain additional attributes
and operations or less than the number of attributes or operations from based class.
The class is defined based on the user requirements and classified based on persons,
things, events or places. The class diagram is used to represent model data with
ignoring the operations. The UML does not have specific notation for database
modelling. However, the class diagram can be used to represent semantic data [1].
As shown in Figure 2.1, there are four (4) classes which are Subject,
Staff, SubjectFTMM and Student. The figure also shows two relationships
which are generalization and association. The generalization was shown between
SubjectFTMM and Staff. Meanwhile the associations between those classes are
shown between SubjectFTMM and Staff, and SubjectFTMM and
Student.
20
Figure 2.1: UML Class Diagram
The class diagram will be the input for process generating schema table. The
standard concept in the class diagram as class behaviour is described as follows:
i. Class diagram consists of classes and relationships.
• A class is defined as the set of shared attributes and behaviours in each
object in the class.
• Relationships are associations between entities. They are referred to as data
associations.
ii. A class consists of name, attributes and operations.
• Class name is used to distinguish the defining class from all other classes in
a class diagram.
• Attributes are used to describe the class.
• Operation is defined as the process to invoke on the attributes based on
system requirement. However, the operations are not used in designing the
structure of table.
iii. Among the classes, there are four types of relationships namely associations,
composition, aggregation and generalization.
• Associations show the relationship of the object among the classes. It is
called as multiplicity class diagram or cardinality on entity-relationship
Subject
code : String
desc : String
Student
idStudent : String
name : String
Staff
idStaff : String
name : Single
salary : Currency
address : String
SubjectFTMM
code
mark : float
gred : String
idStaff : String+m+n
+1
+n
21
diagram. There are three types of multiplicities which are one-to-one, one-
to-many and many-to-many.
• Composition relationship is when one class represents the whole object of
other classes.
• Aggregation relationship is when one class represents the part object of
other classes.
• Generalization is also known as inheritance, which is a concept where the
derived class can contain the attributes and operations as based class. It can
also contain additional attributes and operations or contain less than the
number of attributes or operations from based class.
2.4.2 User Interface
An interface is defined as a point of interaction between two systems or work groups.
It is very important as a reference point of interaction between human to system and
system to human. There are many types of interface including natural language,
question answer, menus, form fill, command language, graphical user interface
(GUI) and a variety of web interfaces [16]. Users use the interface to supply data for
store in the database and to retrieve it back. The information will be displayed after
the system retrieves the data and processes it for the user’s needs. Most of the time,
users are not familiar and do not concern with the structure in the database. They do
not bother with the database design. However, users are really consent to have
information in the interface to fulfil what they want. In other words, the interface
helps users to get information they need in and out of the system [16, 66]. Generally,
there are two purposes of the interface. Firstly, interface is used as presentation
language, which is the computer to human part of transaction, and secondly, it is
used as action language, which is the human to computer part of transaction.
Many researchers work on how to create the interface with the focus on
users’ satisfaction. Many books have also been written focusing on a good interface
according to the user’s likes preferences [1, 67]. No authors write about the impact of
interface to database structured. Most of them work on retrieving the information
from database or report to attract the users’ attention. However, a research done by
Dzafic et al. [68] proposed a technique to design database and user interface in
22
object-oriented with the aim reducing time maintenance. Reducing the time in design
database will significantly reduce the cost. Dzafic et al. has presented their work with
an example and found that the technique proposed has increased the number of
database table and increased the response time.
In this research, the attribute is referred to the data which is channelled
through the interface to be utilized but does not consider the graphics such as colour.
The user interface was an important input in this research because it was used to
create the final database structure after making comparison among the schema table
from the user requirement and the user interface. During the process of gathering
user requirements, a lot of data are requested to be stored in the database. The users
believed that these storing data will be used in the future. However, after the system
was completed and delivered to the user, not all the dreams came true. There are
many cases where the data stored in the database is never used. To avoid this, the
proposed method as suggested in this research is by tuning the structure of the
database through removal of the unused data in the database design. This method can
reduce the storage and increase the database performance. It is based on the
computer principle dealing with input and output concepts. This technique will be
further described in Chapter 3.
2.5 Normalization Technique
Normalization is a formal technique to reduce redundancy by analysing the
relationships based on primary key or candidate keys and functional dependencies
[18]. It is implemented from lower steps to higher steps: First Normal Form (1NF),
Second Normal Form (2NF), Third Normal Form (3NF), Boyce-Codd Normal Form,
Fourth Normal Form (4NF) and others. Each step corresponds to a specific normal
form that defined the properties to reduce data redundancy. The critical part in this
normal form is to define first normal form. Even though there is a series of normal
form, the users are encouraged to apply to at least Third Normal Form to avoid
update anomalies [7]. All the normalize forms are based on the relational database.
This technique is mature and well-formed. However, the drawback of this normalize
of forms is that when a higher normal form is applied, the database structure is
becomes less vulnerable and the number of tables increases [7, 13, 41]. Decomposing
23
the tables will increase the number of tables in displaying information, decrease
accessing time and make the process of updating, accessing and programming much
more complex.
2.6 Database Design
Database in general is known as allocation storage in computer to store data.
Database is a shared collection of logically related data that contains description of
the data defined according to the organization’s needs [2, 41, 69, 70]. Usually,
database put in a computer act as a server and is accessible by other computers
recognised as clients. Since a server is needed to serve many clients, all efforts must
be made when designing the database during the system design phase. Normally,
there are three stages in designing database, which are conceptual, logical and
physical design. However, this research focuses on logical database design only.
Logical database design focuses on identifying the data flow during eliciting
user’s requirements. Normally, the designer establishes the entity, attributes and
relationship between the entities. Designing logical database requires the software
engineer to comprehend thoroughly the overall business process. There are two main
approaches in designing database that are referred as bottom-up and top-down. The
bottom-up approach starts with identifying the attributes and relationships. This
approach is quite suitable for a simple database with a limited number of attributes.
This research focused on top-down approach because this approach is
appropriate for designing any type of database whether it’s simple or complex
database [7]. This approach starts with designing data model that contains entities
and relationships to show data semantic. In structured technique, the Entity
Relationship Diagram is usually used to represent the entities, relationships and
attributes. In UML technique, the data semantic can be derived from class diagram.
In current practice, a schema table is produced either from entity relationship
diagram or class diagram, depending on the design approach. A schema table is a
description of tables and its attributes [25]. This structure table will be used to create
physical database. However, normally, before the schema table is implemented in
physical database, the normalization technique that was introduced by Codd will be
applied to avoid the anomalies or data redundancy [20].
24
In this research, the schema table is constructed from class diagram with
considering the semantic data. Attribute sanitization in class diagram will eliminate
redundant of class name and redundant attributes in class diagram. Then the user
interface is used as normalized process to structure the table as wanted by users.
Finally, the fields in user interface will be compared with the schema table. If any
attributes in the schema table are not accessible from user interfaces, then it was
suggested for the schema table to be removed.
2.6.1 Database Design Using Structured Technique
In structured technique approach, the user defines the database based on two
diagrams, which are data flow diagram (DFD) and entity relationship diagram
(ERD). Data flow diagram is used to represent the user’s requirements. System
analysts will attempt to understand the user’s needs and business process in both
diagrams. Then, these diagrams are used to get confirmation from the users.
Through the structured analysis technique with the Data Flow Diagram, the
four symbols which are process, data flow, data store and entity are used to create a
pictorial depiction of process that will eventually provide system documentation.
When the system documentation is accepted by the user, the next step is to collect
the data stored in Data Flow Diagram to visualise in Entity Relationship Diagram.
Then, the schema table will be produced based on the defined Entity Relationship
Diagram.
After producing schema table, there are two steps to implement the schema
table to physical database system, done by system analysts. The first step is that they
will strictly create the database structure based on the schema table. Another step is
the systems analyst will perform current normalization technique, as introduced by
Codd, before the implementation. Although, applying normal form is optional, it is
recommended to do the process at least up to third normal form [7]. When, the
schema table were normalized until Third Normal Form, it has complied with the
Codd basic rule to set of functional dependencies for each relation and each relation
has a designated primary key. The structured technique in designing database system
is illustrated in Figure 2.2.
REFERENCES
1. Sommerville, I. Software Engineering 9 (Ninth edition). Eighth. Harlow:
Pearson Addison Wesley. 2010.
2. Roth, R. M., Dennis, A. and Wixom, B. H. System Analysis And Design.
Singapore: John Wiley. 2013.
3. Stair, R. and Reynolds, G. Principles of Information System. Tenth Edition.
Thomson. 2012.
4. Babu, P. A., Murali, N., Swaminathan, P. and Kumar, C. S. Making Formal
Software Specification Easy. Proceeding of the 2nd
International Conference
on Reliability, Safety & Hazard (ICRESH-2010). IEEE. 2010. pp 511-516.
5. Pohl, K. and Rupp, C. Requirements Engineering Fundamentals. First
Edition. California: Library of Congress. 2011.
6. Heninger, K. L. Specifying Software Requirement for Complex Systems:
New Techniques and Their Applications. Proceeding of the IEEE
Transaction on Software Engineering. 1980. pp 2-13.
7. Connolly, T. and Begg, C. Database Systems. Fifth Edition. Addison Wesley.
2010.
8. OMG. (2009). OMG Unified Modeling LanguageTM (OMG UML). Retrieved
on April 2, 2014, from http://www.uml.org/#UMLProfiles.
9. Yan, W. and Du, Y. Research on Reverse Engineering from Formal Models
to UML Models. Proceeding of the 3rd International Symposium on Parallel
Architectures, Algorithms and Programming. IEEE. 2010. pp 406-411.
10. Deenis, A., Wixom, B. H. and Tegarden, D. System Analysis Design UML
Version 2.0. Fourth Edition. John Willey & Sons. 2012.
11. Cadar, C. and Dadeau, F. Constraints in Software Testing, Verification and
Analysis CSTVA 2013. Proceeding of the 2013 IEEE Sixth International
Conference on Software Testing, Verification and Validation Workshops.
2013. pp 208-209.
12. Software & Systems Engineering Standards Commitee (2012). IEEE
Standard for System and Software Verification and Validation. New York:
IEEE Computer Society.
120
13. Elmasri, R. and Navathe, S. Fundamentals of Database Systems. Sixth
Edition. Singapore: Pearson. 2011.
14. Ganc, C. and Sarson, T. System Structured Analysis and Design Tools and
Techniques. New Jersey: Prentice Hall. 1979.
15. Ibrahim, R. and Yen, Y. S. Formalization of The Data Flow Diagram Rules
for Consistency Check. International Journal of Software Engineering &
Application (IJSEA). 2010. 1(1). pp. 99-106.
16. Kendall, K. E. and Kendall, J. E. System Analysis and Design. Ninth Edition.
New Jersey: Pearson. 2013.
17. Jiang, X. and Song, G. The Database Design for The Control Equipment
Management System. Proceeding of the 2012 3rd International Conference
On System Science, Engineering Design and Manufacturing Information
(ICSEM). IEEE. 2012. pp 261-264.
18. Codd, E. F. A Relational Model for Large Shared Data Banks.
Communication of ACM. 1970. 13(6). pp. 377-387.
19. Chen, P. P. The Entity Relationship Model - Toward a Unified View of Data.
ACM Transaction on Database System. 1976. 1 (1). pp. 9-36.
20. Codd, E. F. (1969). Derivability, Redundancy and Consistency of Relations in
Large Data Banks. IBM Research Report. Retrieved on 18 January, 2010,
from http://technology.amis.nl/ blogacc/blog/wp-content/images/RJ599.pdf.
21. Chonoles, M. J. and Schardt, J. A. UML 2 For Dummies. Wiley Publishing,
Inc. 2003.
22. Ha, I. and Kang, B. Cross Checking Rules to Improve Consistency between
UML Static Diagram and Dynamic Diagram. In. Intelligent Data Engineering
and Automated Learning – IDEAL 2008 Springer Berlin/ Heidelberg. pp 436-
443; 2008.
23. Alsaadi, A. Checking Data Integrity via the UML Class Diagram. Proceeding
of the Proceeding of International Conference on Software Engineering
Advances (ICSEA 06). IEEE Press. 2006. pp 37-46.
24. Fitsilis, P., Gerogiannis, V., Kameas, A. and Pavlides, G. Producing Relation
Database Schemata from an Object Oriented Design. Proceeding of the 20th
EUROMICRO Conference, EUROMICRO 94. 1994. pp 251-257.
121
25. Grimes, S., Alkadi, G. and Beaubouef, T. Design and Implementation of The
Rough Relational Database System. Journal of Computing in Colleges. 2009.
24(4). pp. 88-95.
26. Waseem, A. A., Hussain, S. J. and Shaikh, Z. A. An Extended Synthesis
Algorithm for Relational Database Schema Design. Proceeding of the
International Conference on Information Systems and Design of
Communication. Lisboa, Portugal: ACM. 2013. pp 94-100.
27. Kolahi, S. and Libkin, L. An Information-Theoretic Analysis of Worst-Case
Redundancy in Database Design. ACM Tansactions on Database System.
2010. 35(1). pp. 1-32.
28. Heimel, M. Designing a Database System for Modern Processing
Architectures. Proceeding of the SIGMOD'13 PhD Symposium. New York,
USA: ACM. 2013. pp 13-17.
29. Jason, T. R. UML: A Beginner Guide. Califonia: McGraw-Hill. 2003.
30. Boufares, F. and Bennaceur, H. Consistency Problem in ER-Schemas for
Database Systems. Information Sciences. 2004. 163(4). pp. 263-274.
31. Ali, A. B. H., Boufares, F. and Abdellatif, A. Checking Constraints
Consistency in UML class diagrams. Proceeding of the International
Conference ICTTA '06. 2006. pp 3599-3604.
32. Lubcke, A., Koppen, V. and Saake, G. A Decision Model to Select the
Optimal Storage Architecture for Relational Databases. Proceeding of the
2011 Fifth International Conference on Research Challenges in Information
Science (RCIS). IEEE Conference Publication. 2011. pp 1-11.
33. Sheng, C., Zhang, N., Tao, Y. and Jin, X. Optimal Algorithms for Crawing a
Hidden Database in The Web. Proceedings of The VLDB Endowment. 2012.
5(11). pp. 1112-1123.
34. Al-Zubaidy, H., Lambadaris, L. and Talim, J. Optimal Scheduling in High-
Speed Downlink Packet Access Networks. Transactions on Modeling and
Computer Simulation (TOMACS). 2010. 21(1). pp. 1-27.
35. Rosen, K. H. Discrete Mathmatics and Its Applications. Fouth Edition. New
York: McGraw Hill. 1999.
36. Grimaldi, R. P. Discrete and Combinatorial Mathematics. Fifth Edition. New
York: Pearson. 2004.
122
37. Khaswneh, A. A., Saraureh, K. T. A. and Meridji, K. Early Identification,
Specification and Measurement for Data Definition and Database
Requirements. Proceeding of the International Conference on Computer
Information and Telecomunication Systems (CITS). IEEE. 2012. pp 1-5.
38. Rainer, R. K. and Cegielski, C. G. Introduction to Information Systems. Third
Edition. New York: John Wiley. 2010.
39. Chen, Z., Gua, T., Qien, Z. and Xiao, C. Research on The Method of Data
Extraction Based on Category. Proceeding of the 12th International
Conference on Computer and Information Science (ICIS). 2013. pp 365-369.
40. Ismail, S., Shaari, M. S., Saip, M. A. and Shaari, N. Module Information
Technology for Management. Selangor: Pearson. 2010.
41. Coronel, C., Morris, S. and Rob, P. Database Principles. Tenth Edition.
China: Course Technology Cengage Learning. 2013.
42. Xu, Y., Yu, Z., Mao, C., Wang, Y. and Guo, J. Entity Answer Extraction of
Web Table. Proceeding of the 2010 Seventh International on Fuzzy Systems
and Knowledge Discovery (FSKD 2010). 2010. pp 2465-2468.
43. Nakamura, Y., Sakamoto, K., Inoue, K., Washizaki, H. and Fukazawa, Y.
Evaluation of Understanability of UML Class Diagrams by Using Word
Similarity. Proceeding of the International Workshop on Software
Measurement and The 6th International Conference on Software Process and
Product Measurement. 2011. pp 178-187.
44. Ltd, A. N. (2005). Data Sanitization Techniques. White Paper. Retrieved on
July 7, 2012, from http://www.qaguild.com/upload/
Datasanitization_Whitepaper.pdf.
45. Bishop, M., Bhumiratana, B., Crawford, R. and Levitt, K. How to sanitize
data? Proceeding of the Enabling Technologies: Infrastructure for
Collaborative Enterprises, 2004. WET ICE 2004. 13th IEEE International
Workshops on. 14-16 June 2004. 2004. pp 217-222.
46. Bishop, M., Crawford, R., Bhumiratana, B., Clark, L. and Levitt, K. Some
Problems in Sanitizing Network Data. Proceeding of the Enabling
Technologies: Infrastructure for Collaborative Enterprises, 2006. WETICE
'06. 15th IEEE International Workshops on. 26-28 June 2006. 2006. pp 307-
312.
123
47. Devit, S. Intelligent Data Cleaning Based on Knowledge Based. Proceeding
of the Information and Multimedi Technology (ICIMT 09). Jeju Island: IEEE
Press. 2009. pp 393-398.
48. Amiri, A. Dare to Share: Protecting Sensentivi Knowledge with Data
Sanitization. Science Direct: Decision Support Systems. 2006. 43. pp. 181-
191.
49. Molodtsov, D. Soft Set Theory First Results. Computer and Mathmatics with
Applications. 1999. pp. 19-13.
50. Herawan, T., Rose, A. N. M. and Deris, M. M. Soft Set Theoretic for
Dimensionality Reduction. Communication in Computer and Information
Science. 2009. pp. 171-178.
51. Nguyen, L. G. Metric Based Attribute Reduction in Decision Tables.
Proceeding of the Proceedings of The Federated Conference on Computer
Science and Information Systems. 2012. pp 311-316.
52. Qisheng, X., Zeyin, X., Yang, Z. and Houchang, X. Study on Attribute
Reduction Based on Mapping Matrix. Proceeding of the 2nd International
Confrence on Signal Processing Systems (ICSPS). 2010. pp 722-726.
53. Ye, M. and Wu, C. Decision Table Decomposition Using Core Attributes
Partition for Attribute Reduction. Proceeding of the The 5th International
Conference on Computer Science and Eduction. Hefei, China: 2010. pp 23-
26.
54. Zhou, J., E, X., Li, Y., Liu, Z., Bai, X., Huang, X. and Yang, D. A New
Reduction Algorithm Dealing With The Incomplete Information System.
Proceeding of the International Conference on Cyber-Enabled Distributed
Computing and Knowledge Discovery. 2009. pp 12-19.
55. Ke-yong, Y., Xiao-guang, H. and Fang-qiao, X. An Algorithm for Attribute
Reduction Based on Database Technology. Proceeding of the 2009
International Conference on Artificial Intelligence and Computational
Intelligence. 2009. pp 53-57.
56. Wang, X., Zheng, X. and Xu, Z. Comparative Research of Attribute
Reduction Based on The System Entropy and on The Database System.
Proceeding of the International Symposium on Computational Intelligence
and Design (ISCID '08). 2008. pp 150-153.
124
57. Yanbo, W., Huiqiang, W., Xiufeng, W. and Ming, Y. Research on Data
Attribute Reduction for Network Security Situation. Proceeding of the 2012
International Conference on Computer Science and Service System (CSSS).
2012. pp 589-593.
58. Yourdon, E. and Constantine, L. L. Structured Design. New Jersey: Prentice
Hall. 1979.
59. Dobing, B. and Parsons, J. Dimensions of UML Diagram Use: A Survey of
Practitioners. Journal of Database Management. 2008. 19(1). pp. 1-18.
60. Maciaszek, L. A. Requirements Analysis and System Design. Third Edition.
England: Pearson. 2007.
61. Saringat, M. Z., Ibrahim, R. and Ibrahim, N. A Proposal for Constructing
Relational Database from Class Diagram. Computer and Information Science.
2010. 3(2). pp. 38-46.
62. Castro, T. R. d., Souza, S. N. A. d. and Souza, L. S. d. CASE Tool for Object-
Relational Database Designs. Proceeding of the Conference on Information
Systems and Technologies (CISTI). 2012. pp 1-6.
63. Felsinger, R. and Pleasant, M. (2001). Using Rational UML Case Tool.
Retrieved on 16 MAY 2011, from http://www.uml.org.cn/ umltools/psf/
UsingRationalRose.pdf.
64. Sybase. (2011). Object-oriented Modeling. Retrieved on 16 May 2014, from
http://infocenter.sybase.com/help/topic/com.sybase.infocenter.dc38086.1610/
doc/pdf/ object_oriented_modeling.pdf.
65. Ali, N. H., Shukur, Z. and Idris, S. A Design of an Assessment System for
UML Class Diagram. Proceeding of the Fifth International Conference on
Computational Science and Applications IEEE Press. 2007. pp 539-546.
66. Kieffer, S., Coyette, A. and Vanderdanckt, J. User Interface Design by
Sketching: A Complexity Analysis of Widget Representations. Proceeding of
the EICS'10: Proceedings of the 2nd ACM SIGCHI Symposium on
Engineering Interactive Computing Systems. New York: ACM. 2010. pp 57-
66.
67. Zhu, B., Shanker, G. and Cai, Y. Integrating Data Quality into Decision-
Making Process: An Information Visualization Approach. In: S.-V. Lecture
Notes in Computer Science 4557. 2007. pp 366-369.
125
68. Dzafic, I., Soto, J., Halilovic, E., Lecek, N. and Music, M. Object-oriented
Database and User Interface Design. Proceeding of the EuroCon 2013.
Croatia: IEEE. 2013. pp 558-563.
69. Ramez, E. Database System. Sixth Edition. Boston: Pearson. 2011.
70. Kumar, V. 101 Design Methods. New Jersey: John Wiley. 2013.
71. Attraran, H. and Far, A. H. The 2011 10th IEEE International Conference.
Proceeding of the Cybernatic Intelligent Systems (CIS). London: IEEE. 2011.
pp 128-132.
72. Date, C. J. An Introduction to Database System. Eighth Edition. Singapore:
Pearson. 2004.
73. Dzafic, I., Soto, J., Halilovic, E., Lecek, N. and Music, M. Object-oriented
Database and User Interface Design. Proceeding of the EuroCom 2013.
Croatia: 2013. pp 558-563.
74. Jagannatha, S., Reddy, S. V. P., Kumar, T. V. S. and Kanth, K. R. Simulation
and Analysis of Performance Predicition In Distributed Database Using OO
Approach. Proceeding of the 2013 3rd IEEE International Advance
Computing Conference (IACC). IEEE. 2013. pp 1324-1329.
75. Osman, R., Awan, I. and Woodward, M. E. Performance Evaluation of
Database Designs. Proceeding of the 2010 24th IEEE International
Conference on Advanced Information Networking and Applications. 2010. pp
42-49.
76. Boufares, F. and Bennaceur, H. Consistency Problem in ER-schemas for
Database Systems. Information Sciences: an International Journal. 2003. pp.
263-274.
77. Ibrahim, N. and Ibrahim, R. Checking Inconsistencies in UML Diagrams.
Proceeding of the Software Engineering Postgraduate Workshop (SEPoW
2009). Penang, Malaysia: 2009. pp
78. Ibrahim, N., Ibrahim, R. and Mansor, D. Well-Formedness Rules for UML
Use Case Diagram. Proceeding of the Proceeding of International Industrial
Informatics Seminar 2009. Yogyakarta, Indonesia: 2009. pp 169-173.
79. Ahmad, M. A. and Nadem, A. Consistency Checking of UML Models Using
Description Logics. Proceeding of the 2010 6th International Conference on
Emerging Technologis. 2010. pp 310-315.
126
80. University of Namur. Towards a Unifying Conceptual Framework for
Inconsistency Management Approaches. 2009.
81. Snell, M. and Powers, L. Microsoft Visual Studio 2010. First Edition. Indiana:
Pearson. 2011.
82. Saringat, M. Z. Lecture Assessment System (LAS): Module Manage Marks
and Module Manage Attendance. Master of Computer Science (Real Time
Software Engineering). Universiti Teknologi Malaysia (UTM). 2003.
83. Zakaria, Z. Sistem Pengurusan Klinik Sejahtera. Bachelor Degree in
Information Technology. Universiti Tun Hussein Onn Malaysia. 2013.
84. Jamaluddin, M. A. W. Sistem Pengurusan Kompaun Majlis Perbandaran
Batu Pahat (MPBP). Bachelor Degree of Information Technology. Universiti
Tun Hussein Onn Malaysia. 2013.
85. Ibrahim, N. Lecture Assessment System: Module Login and Manage User.
Master of Computer Science (Real Time Software Engineering). Universiti
Teknologi Malaysia (UTM). 2003.