
BCO5651

Deliverable 2 Data Cleansing & Conversion

Introduction

KKAE is an industrial company that is undergoing an SAP implementation. The rollout has reached the data conversion and migration phase. This deliverable shows how KKAE Pte. Ltd handled data cleansing and data mapping for a list of employees who use the SAP system at KKAE. The conversion is based on a set of business assumptions. Cleansing and mapping are done with two applications: Microsoft Excel to cleanse and Altova MapForce to convert.

Details

Two files are used in this activity. The first file is a list of employees who use the SAP system. It contains their contact and SAP access information. Some of the information is duplicated, and some contains human-unreadable characters or invalid data. The other file is a template that shows the required layout of the final output list.

The main objectives of this activity were as follows:

Mark all the duplicated data with proper error flags.

Mark all the missing data with proper error flags.

Verify the human-unreadable characters, then take the proper action: either remove each unreadable character or correct it.

Verify all the invalid data, mark it with proper error flags, then correct it.

Once the data is cleansed, map the data based on the template’s pre-defined layout and create the final list.
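The flagging objectives above can be illustrated outside Excel. The following is a minimal Python sketch, not the procedure KKAE used: the flag names (`MISSING`, `UNREADABLE`, `DUPLICATE`), the function name and the field names in the usage note are all hypothetical, and a duplicate is assumed to mean a row whose required fields match an earlier row after trimming spaces and ignoring case.

```python
def flag_records(rows, required):
    """Return a list of (row, flags) pairs, where flags is a list of error-flag strings."""
    seen = set()
    result = []
    for row in rows:
        flags = []
        # Missing data: a required field is absent or blank.
        if any(not (row.get(f) or "").strip() for f in required):
            flags.append("MISSING")
        # Unreadable data: any non-printable character (e.g. a control code) in a value.
        if any(not ch.isprintable() for v in row.values() for ch in str(v)):
            flags.append("UNREADABLE")
        # Duplicated data: same required values as an earlier row (assumed definition).
        key = tuple((row.get(f) or "").strip().lower() for f in required)
        if key in seen:
            flags.append("DUPLICATE")
        seen.add(key)
        result.append((row, flags))
    return result
```

For example, calling `flag_records(rows, ["Name", "Phone"])` on a list of dictionaries returns each row alongside its flags, so flagged rows can be written to separate sheets before any correction is made.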

Business Assumptions

The following is the list of business assumptions made in this activity.

This data cleansing and mapping activity involves fewer than 200 rows of employee records. KKAE has therefore decided to try out two tools, Microsoft Excel and Altova MapForce, to find the most suitable solution in terms of efficiency, accuracy and data integrity. This would allow the company to improve service quality and communication.

The employees’ relevant data was backed up before the data cleansing and mapping activity. The backup guards against any incident of incorrect data manipulation.

The head office of KKAE is located in Australia. Documents and reports must be displayed/printed in English. Grammar and spelling in all documents and output reports follow English conventions. For example, all foreign salutations will be converted to their English equivalents in the final output list.
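As an illustration of the salutation rule, here is a small Python sketch. The lookup table is hypothetical: the salutations that actually occur in the source file are not listed in this document, so the entries below are examples only, and unknown values are passed through unchanged for manual review.

```python
# Hypothetical lookup table; the real source file may contain other salutations.
SALUTATIONS = {
    "herr": "Mr",
    "frau": "Mrs",
    "monsieur": "Mr",
    "madame": "Mrs",
    "señor": "Mr",
    "señora": "Mrs",
}

def to_english_salutation(value):
    """Map a foreign salutation to an English one; leave unknown values as they are."""
    key = value.strip().rstrip(".").lower()
    return SALUTATIONS.get(key, value.strip())
```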

The final output list will serve as a global internal contact list of all the key employees involved in the SAP Business-By-Design Project. Because it is a contact list, some data, such as contact numbers, must be given in full: country code, then area code, then phone number.
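The contact-number rule can be sketched in Python as follows. This is only an assumption about the output layout: the document requires country code, area code and number in that order but does not specify separators, so the `+CC AC NUMBER` form, the function name, and the dropping of a leading zero from the area code (common when dialling internationally) are illustrative choices.

```python
import re

def format_phone(country_code, area_code, number):
    """Build '+<country> <area> <number>' from three parts (assumed layout)."""
    def digits(s):
        # Keep digits only, discarding spaces, brackets and dashes.
        return re.sub(r"\D", "", s)

    cc = digits(country_code)
    ac = digits(area_code).lstrip("0")  # assumption: drop the domestic trunk prefix
    num = digits(number)
    if not (cc and ac and num):
        raise ValueError("country code, area code and number are all required")
    return f"+{cc} {ac} {num}"
```

With made-up input, `format_phone("61", "(03)", "9919 4000")` gives `+61 3 99194000`, while a record lacking any of the three parts raises an error and can be flagged as incomplete.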

KKAE has been using the SAP system since 1996.

The data source was extracted on 31 March 2011.

Before the data extraction, a junior staff member ran the wrong program. That program duplicated some data in the database and shifted some data into the wrong fields. In addition, some data was overwritten with wrong data or unreadable/ASCII codes.

Each record (row) represents a single employee and stores the employee’s contact and SAP-related information. Every row should be unique, with no duplication.

In the Name field, the employee’s first name is stored before the surname.
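Under this assumption, a Name value can be split into its parts. The sketch below is a hypothetical helper, not part of the original procedure; it further assumes that the surname is the last space-separated word and that everything before it belongs to the first name, which will not hold for multi-word surnames.

```python
def split_name(name):
    """Split 'First [Middle] Surname' into (first_names, surname)."""
    parts = name.split()
    if len(parts) < 2:
        # Only one word: treat it as the first name and leave the surname empty.
        return (name.strip(), "")
    return (" ".join(parts[:-1]), parts[-1])
```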

Example mapping

Data Mapping

Header of Source (S) File:

Header of Template (T) File:

Fields Mapping:


BCO5651 ERP System Implementation Assignment Two – Data Cleansing and Conversion

TASK

You will be given two files. The first (File 1) contains 199 records of customer data and is the file to be converted. The second (File 2) contains about 10 records and serves as the template file. Using one package (Excel), cleanse the 199 customer records. Use another package (MapForce 10) to convert the customer data into the same format as your template file. Completed data conversion deliverables include:

o Text file of your template file (File 2)
o Excel sheet of missing/corrupted records (cleansing phase)
o Excel sheet of duplicated records (cleansing phase)
o Cleansed text file of records
o Screen dumps of conversion phase steps. Hard copy. (conversion phase)
o Text file of cleansed and converted records (final phase)

When cleansing data make sure you make your assumptions clear

o Records with missing data must be separated and presented
o Records with corrupted data must be separated and presented
o Records with duplicated records must be separated and presented
o Records with other char sets, unprintables, inconsistent char length must be cleansed

When converting data make sure you make your assumptions clear

o Detail the appropriate file structure conversion
o Detail the appropriate text formatting conversion
o Detail the appropriate number formatting conversion
o Detail the appropriate field order conversion
o Detail the instructions of how conversion done (formulas, functions, procedures)

BCO5651 Deliverable 2 Mark Guide

Presentation

Displayed Field Mappings x2 HD D C P N
Explained Business Assumptions x2 HD D C P N
Separated incomplete x1 HD D C P N
Cleansed Data x2 HD D C P N
Removed Duplicates x1 HD D C P N
Final data in Correct Order x2 HD D C P N
Correct Data Structure-no of fields x1 HD D C P N
Data Converted x5 HD D C P N
Documentation shows Procedure x4 HD D C P N

x1 x.8 x.6 x.5 x0

Total (20)
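One way to read the mark guide is that each criterion's weight is multiplied by a factor for the grade awarded, with x1, x.8, x.6, x.5 and x0 corresponding to HD, D, C, P and N in that order. That pairing is an assumption based on the column order, not something the guide states. Under it, the total can be computed as in this Python sketch (the function name is hypothetical):

```python
# Weights as listed in the mark guide; they sum to 20.
WEIGHTS = {
    "Displayed Field Mappings": 2,
    "Explained Business Assumptions": 2,
    "Separated incomplete": 1,
    "Cleansed Data": 2,
    "Removed Duplicates": 1,
    "Final data in Correct Order": 2,
    "Correct Data Structure-no of fields": 1,
    "Data Converted": 5,
    "Documentation shows Procedure": 4,
}

# Assumed pairing of grades with the multipliers x1, x.8, x.6, x.5, x0.
MULTIPLIERS = {"HD": 1.0, "D": 0.8, "C": 0.6, "P": 0.5, "N": 0.0}

def total_mark(grades):
    """Sum weight x multiplier over a {criterion: grade} mapping."""
    return sum(WEIGHTS[c] * MULTIPLIERS[g] for c, g in grades.items())
```

Under this reading, a submission graded HD on every criterion scores the full 20, and one graded P throughout scores 10.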
