Improved Extraction mechanism in ETL process for building of a Data Warehouse

MPSTME, SVKM’s NMIMS, Mumbai

Chapter 3: Results and Observations Part-I

3.1 Flat file database

Experimental Setup

A. Experiments were run on:

Intel® Core™ i3 CPU, [email protected] GHz processor

4.00 GB RAM

Windows XP Operating System

Summary of the sample/real databases used for these experiments:

Portfolio/Scripts details from Moneycontrol.com

Balance sheets of various Scripts from Religare Securities

National Stock Exchange www.nseindia.com

Bombay Stock Exchange www.bseindia.com

Figure 2 Scripts Sold Data

Figure 3 Portfolio Data

Figure 4 Balance Sheet Data

Figure 5 History and Alerts Data

Figure 6 One Year Backup Basic Data

Figure 7 One Year Advanced Detailed Data

Files are called “flat files” when they contain a single data structure. Generally this structure is the row-and-column structure of a spreadsheet or table, but a binary file, or a file encrypted with a single encryption key, could also be called a flat file. Files that are not flat include marked-up files such as XML or HTML, EDI files, and other formats such as HL7 or SEF files. There are two types of flat file: delimited files and fixed width files.

Delimited File

A delimited file is a file in which the data is organized in rows and columns. Each row holds one set of data, and each column holds one type of data, much as in a spreadsheet. To form the columns, the fields in each row are separated by a character called a delimiter. See the delimited file in Figure 8, the fixed length delimited flat file in Figure 9 and the variable length delimited flat file in Figure 10.
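As a minimal sketch of how a delimiter separates the columns, the following Python fragment parses a header row and one record taken from the sample portfolio data in this chapter. The function name `read_delimited` and the use of Python's standard `csv` module are assumptions made for illustration; they are not the implementation used in this research work.

```python
import csv
import io

# A header row and one record from the sample portfolio data,
# with "," as the delimiter.
SAMPLE = (
    "Stock,Price,Change,Close,Volume,Qty,Inv.Price,Inv.Amt,Gain,Overall Gain,Latest Value\n"
    "Reliance,748.1,-3.45,751.55,2.40m,1,799,799,-3,-51,748\n"
)


def read_delimited(text, delimiter=","):
    """Return each data row as a dict keyed by the column names in the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


rows = read_delimited(SAMPLE)
# Every value is read as text; converting "748.1" to a number is a separate step.
print(rows[0]["Stock"], rows[0]["Price"])
```

Because the header row names the columns, each field can be looked up by name rather than by position, which is why a variable length record does not need padding.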

Figure 8 Delimited Flat file

Figure 9 Fixed Length Delimited Flat file

Figure 10 Variable Length Delimited Flat file

Stock,Price,Change,Close,Volume,Qty,Inv.Price,Inv.Amt,Gain,Overall Gain,Latest Value
Reliance,748.1,-3.45,751.55,2.40m,1,799,799,-3,-51,748
Alok Industries,19.65,0.05,19.6,2.73m,1,16.55,17,0,3,20
FCS Software,0.45,0.05,0.4,209633,9820,2.55,25041,491,-20622,4419
Bharti Airtel,317.05,-6.45,323.5,3.88m,1,425.2,425,-6,-108,317
DCB,49.9,1.75,48.15,6.42m,100,64,6400,175,-1410,4990
Punj Lloyd,55.65,1.1,54.55,2.01m,111,97.6,10834,122,-4657,6177
Unitech,28.6,0.15,28.45,8.54m,60,32.5,1950,9,-234,1716
Apollo Tyres,86.65,0.45,86.2,2.08m,1,81.8,82,0,5,87
GMR Infra,29.05,0.35,28.7,2.51m,1,27.15,27,0,2,29
Jaypee Infra,48.4,-0.2,48.6,104078,1,90,69,0,-21,48
Mahindra Satyam,78.9,0.45,78.45,1.97m,1,91,91,0,-12,79
ILandFS,26.75,-0.05,26.8,26293,20,29.25,585,-1,-50,535
Suzlon Energy,24.1,-0.05,24.15,9.72m,110,36.8,4048,-5,-1397,2651
Hind Constr,25.15,-0.15,25.3,1.83m,140,51.5,7210,-21,-3689,3521
Wire & Wireless,9.25,-0.05,9.3,338281,1000,14.01,14010,-50,-4760,9250

Flat files have some advantages over databases:

Available and versatile: We can create and save data in any operating system's file system. We don't need to install any extra software. Additionally, text data stored in flat files can be read by a variety of software programs, such as word processors or spreadsheets [59].

Easy to use: We don't need to do any extra preparation, such as installing database software, designing a database, creating a database, and so on. Just create the file and store the data with statements in a PHP script [54].

Smaller: Flat files store data using less disk space than databases [58].

A flat file is quick and easy to create and takes less space than a database. Flat files are particularly useful for making information available to other software, such as an editing program or a spreadsheet. Flat files can be read by anyone with access to the directory where they are stored, so they are useful when information needs to be made available to other people.

Databases have some advantages as well:

Security: A database provides a security layer of its own, in addition to the security provided by the operating system. A database protects the data from outside intrusion better than a flat file.

Accessibility of data: We can store data in a database using a very complex data structure, specifying data types and relationships among the data. This organization makes it easy to search the data and retrieve what is needed.

Ability to handle multiple users: When many users store or access data in a single file, such as a file containing names and addresses, a database ensures that users take turns with the file to avoid overwriting each other's data.

Databases require more start-up effort and use more space than flat files, but are much more suitable for handling complex information. The database handles the internal organization of the data, making data retrieval much simpler [50, 57]. A database provides more security, making it more suitable for sensitive, private information. Databases can also handle high traffic more easily and efficiently when many users try to access the data almost simultaneously [53].

Some of the disadvantages of flat file systems are listed below.

Disadvantages

1. Less security: information is easy to extract.

2. Data inconsistency.

3. Data redundancy.

4. Searching for a record is very time consuming.

This research work makes use of all the advantages of the flat file database and also tries to nullify all the disadvantages mentioned above.

The following measures are taken to overcome these disadvantages.

a. This research work explicitly provides security by applying a one-time password to the file, so that security is maintained while the file is transferred from the source to the destination. Using a one-time password guards against leakage of the password and so helps protect the file against hacking.

b. Data consistency is maintained using a change data capture mechanism, in which triggers alert the destination system to record updates at the source systems. In addition, because these flat files are converted from the database file itself, the data is already consistent before extraction [48, 51].

c. Data redundancy does not exist in the flat file because it is created from the database file itself before extraction. Once extraction is done, the flat file is converted back to a database file at the destination system [49, 52].

d. Search time is very high in an ordinary flat file because there are no constraints on the records of the file. Redundant data and the absence of a primary key are the main factors that make the search time very high. This research work does not use flat files for any kind of operation. Flat files are used only during the transfer of files, and because the flat file is created from the database file itself, most of the advantages of database files also apply.
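Measures (b) and (c) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python with the standard `sqlite3` module as a stand-in for the MySQL database used in this work, and the table, column and trigger names (`portfolio`, `change_log`, `portfolio_update`) are hypothetical. A trigger records each update at the source, and the captured changes are then written out as a comma-delimited flat file for transfer.

```python
import csv
import io
import sqlite3


def build_source():
    """Create an in-memory source database with a trigger-based change log."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE portfolio (stock TEXT PRIMARY KEY, price REAL);
        CREATE TABLE change_log (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            stock TEXT,
            old_price REAL,
            new_price REAL
        );
        -- The trigger records every price update so that the destination
        -- can be told which records changed at the source.
        CREATE TRIGGER portfolio_update AFTER UPDATE OF price ON portfolio
        BEGIN
            INSERT INTO change_log (stock, old_price, new_price)
            VALUES (OLD.stock, OLD.price, NEW.price);
        END;
        """
    )
    # Starting prices are the "Close" values from the sample portfolio data.
    conn.executemany(
        "INSERT INTO portfolio VALUES (?, ?)",
        [("Reliance", 751.55), ("DCB", 48.15)],
    )
    return conn


def export_changes(conn):
    """Write the captured changes as comma-delimited text for transfer."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["stock", "old_price", "new_price"])
    writer.writerows(
        conn.execute("SELECT stock, old_price, new_price FROM change_log ORDER BY id")
    )
    return buffer.getvalue()


source = build_source()
source.execute("UPDATE portfolio SET price = 748.1 WHERE stock = 'Reliance'")
flat_file = export_changes(source)
print(flat_file)
```

Only the row that changed appears in the exported text, and the primary key on `stock` is what keeps the source table free of duplicate records before it is converted to a flat file.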

Fixed Width Flat File

There is another type of file, called a fixed width or fixed position file. It differs from a delimited file in that the data fields are defined by character position. See the example below.

In a fixed width file, the delimiter characters are eliminated. If the data is formatted so that each data field has a fixed size, this format can be more compact than a delimited file. In the example, the size of the Birthdate data is known, so all the spaces between the Bdate and Department fields are eliminated. If all of the data were sized like this, the file could be made small enough to contain only the data.
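To illustrate how fields are located by character position, here is a minimal Python sketch. The layout (a 10-character name, an 8-character birth date and a 5-character department) and the sample record are hypothetical; they are assumptions made for illustration, not the layout of the example referred to above.

```python
# Hypothetical layout: (field name, start offset, end offset).
# The widths are assumed for illustration only.
LAYOUT = [("name", 0, 10), ("bdate", 10, 18), ("department", 18, 23)]


def parse_fixed_width(line, layout=LAYOUT):
    """Slice one record by character position and strip the padding."""
    return {field: line[start:end].strip() for field, start, end in layout}


# Build a made-up record: the name is padded to 10 characters, while the
# date and department already fill their fields, so no separator is needed.
record = "Asha".ljust(10) + "19850412" + "SALES"
print(parse_fixed_width(record))
```

The parser never searches for a separator. It relies only on the agreed offsets, so the writer and the reader of the file must share the same layout.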

Fixed width files also eliminate the problem of delimiters appearing in the data. Consider a comma-delimited file containing a field that has a comma in its data: how does the parser know that this comma is part of the data and not a delimiter? That problem does not arise in a fixed width file.
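The delimiter-in-data problem can be seen in a short Python sketch. The record below is made up for illustration. Splitting naively on the comma breaks the first field in two, whereas a parser that honours the common convention of enclosing such fields in double quotes (here Python's standard `csv` module, used as an assumption rather than as the tool employed in this work) recovers the intended three fields.

```python
import csv
import io

# A made-up record whose first field contains the delimiter itself.
line = '"Wire, Wireless",9.25,1000'

# Naive splitting treats every comma as a delimiter and yields four pieces.
naive = line.split(",")

# A quote-aware parser keeps the quoted comma as part of the data.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), naive)
print(len(parsed), parsed)
```

Quoting works only if the producer of the file applies it consistently, which is one reason unattended jobs sometimes prefer fixed width files.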

Comparison

Neither format is superior. Both file architectures are useful, and both are used commonly enough that one needs to be at ease working with both. Delimited files are easy to work with as long as the data is free of the delimiter character. For the quick integration of data common in ETL tasks, delimited files are far more common than fixed width files. Continuous data integration and import operations often find fixed width or fixed position files more reliable for unattended operation, including unattended ETL.

As with many things in integration work, we want to pick the best option. Knowing and working with both fixed width and delimited files helps determine the right choice for the task at hand.

3.2 Implementation

Experimental Results:

FRONT PAGE:

The main front page, designed to compare the extraction time for a flat file and a database file while varying the number of records in the file, is shown in the following snapshots.

To compare performance with respect to size and time, the number of records was varied from 100 to 10000. The performance of the flat file can be compared with that of the database file in the following implementation snapshots. (For part of the implementation module code, refer to Appendix-B.) The implementation was carried out in ASP.NET and MySQL.
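The shape of such a comparison can be sketched as below. This is not the ASP.NET and MySQL implementation of this work (for that, see Appendix-B). It is a hedged Python illustration that uses the standard `sqlite3` and `csv` modules as stand-ins, with hypothetical function names and synthetic rows. It times a full read of the same records from a database table and from a flat file for each record count used in this section.

```python
import csv
import os
import sqlite3
import tempfile
import time

RECORD_COUNTS = [100, 500, 1000, 2000, 5000, 10000]  # sizes used in this section


def make_rows(n):
    """Generate n synthetic (id, stock, price) records."""
    return [(i, f"stock{i}", round(10 + i * 0.05, 2)) for i in range(n)]


def time_database_extraction(rows):
    """Load rows into an in-memory table, then time reading them all back."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, stock TEXT, price REAL)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    start = time.perf_counter()
    extracted = conn.execute("SELECT id, stock, price FROM t").fetchall()
    elapsed = time.perf_counter() - start
    conn.close()
    return len(extracted), elapsed


def time_flat_file_extraction(rows):
    """Write rows to a temporary delimited file, then time reading them all back."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", newline="") as handle:
            csv.writer(handle).writerows(rows)
        start = time.perf_counter()
        with open(path, newline="") as handle:
            extracted = list(csv.reader(handle))
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return len(extracted), elapsed


def run():
    """Return (records, db_count, flat_count, db_seconds, flat_seconds) per size."""
    results = []
    for n in RECORD_COUNTS:
        rows = make_rows(n)
        db_count, db_time = time_database_extraction(rows)
        ff_count, ff_time = time_flat_file_extraction(rows)
        results.append((n, db_count, ff_count, db_time, ff_time))
    return results


for n, db_count, ff_count, db_time, ff_time in run():
    print(f"{n:>6} records  database {db_time:.6f}s  flat file {ff_time:.6f}s")
```

The timings depend on the machine, the storage engine and caching, so numbers from this sketch are not comparable with the results reported in the snapshots.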

Figure 11: Main page used to measure the extraction time for a flat file and a database file

Number of Records: 100

a) Database file extraction

Figure 12: Data extraction time result when the number of records is 100 in a database file

b) Flat file extraction

Figure 13: Data extraction time result when the number of records is 100 in a flat file

Number of Records: 500

a) Database file

Figure 14: Data extraction time result when the number of records is 500 in a database file

b) Flat file extraction

Figure 15: Data extraction time result when the number of records is 500 in a flat file

Number of Records: 1000

a) Database file

Figure 16: Data extraction time result when the number of records is 1000 in a database file

b) Flat file extraction

Figure 17: Data extraction time result when the number of records is 1000 in a flat file

Number of Records: 2000

a) Database file extraction

Figure 18: Data extraction time result when the number of records is 2000 in a database file

b) Flat file extraction

Figure 19: Data extraction time result when the number of records is 2000 in a flat file

Number of Records: 5000

a) Database file

Figure 20: Data extraction time result when the number of records is 5000 in a database file

b) Flat file

Figure 21: Data extraction time result when the number of records is 5000 in a flat file

Number of Records: 10000

a) Database file extraction

Figure 22: Data extraction time result when the number of records is 10000 in a database file

b) Flat file extraction

Figure 23: Data extraction time result when the number of records is 10000 in a flat file