Enterprise Business Processes and Reporting
(IS 6214)
MBS MIMAS
17th Feb 2010
Fergal Carton
Business Information Systems
Last week
• Feedback on presentations
• Data cubes
• Types of data
  – Dynamic / Static
  – Cucina example
• Soft information
• Data recording
This week
• Data storage
• Deciding what information to collect
• Exploiting data warehouses
• Cucina and real time information
• Extract, transform, load (ETL)
• Real time data
• Refresh rates and response times
Data storage
• A series of ad-hoc systems, both manual and computer-based (spreadsheets, filed forms…)
• Dedicated databases for manufacturing data (QC, shipping etc…)
• Process Control Systems (technical parameters)
• Other specialised proprietary systems (integration may not be easy)
• ERP system with its own data structure, or fed by existing systems
Deciding what information to collect
• Information costs and information overload mean that not all data are useful
• A framework can help, e.g. Critical Success Factors (CSF)
• Questions that must be answered:
  – How is it measured and broken down?
  – How often should it be measured?
  – Who should know about it?
  – Where can the data be found?
  – How should it be presented?
Exploiting the DW data
[Figure: data warehouse architecture. Source Systems → Data Staging Area (Extraction, Cleaning, Transformation, Loading) → Data Warehouse (relational database on a dedicated server; denormalised data) → exploited through Static Reporting; Scrutinising (multidimensional data cubes, OLAP tools); Discovering (data mining…)]
Think about real-time for Cucina
• What information is required real time?
• Can you differentiate between report types?
  – Static
  – Scrutinising
  – Discovery
ETL Tools
• Extraction, Transformation, and Loading
• Specification-based
• Eliminate custom coding
• Third-party and DBMS-based tools
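The idea behind "specification-based" tools can be sketched as follows: the mapping from source to target fields is written as data, and one generic engine applies it, so no custom code is written per feed. This is only an illustration under assumptions; the field names, the `SPEC` layout and the transform names are all hypothetical and do not correspond to any particular ETL product.

```python
# Hypothetical mapping specification: target field -> (source field, transform name)
SPEC = {
    "customer_id": ("CUST_NO", "strip"),
    "customer_name": ("CUST_NAME", "title"),
    "credit_limit_eur": ("CRED_LIM", "to_float"),
}

# Small library of reusable transforms the specification can refer to by name
TRANSFORMS = {
    "strip": lambda v: v.strip(),
    "title": lambda v: v.strip().title(),
    "to_float": lambda v: float(v),
}


def apply_spec(record, spec=SPEC):
    """Generic engine: builds a target record purely from the specification."""
    return {
        target: TRANSFORMS[name](record[source])
        for target, (source, name) in spec.items()
    }


legacy = {"CUST_NO": " C001 ", "CUST_NAME": "acme foods ltd", "CRED_LIM": "5000"}
row = apply_spec(legacy)
```

Adding a new feed then means writing a new specification, not a new program.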
Data extraction and transformation
• Getting data out of legacy applications
• Cleaning up the data
• Enriching it with new data
• Converting it to a form suitable for upload
• Staging areas
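The steps above can be sketched as a small pipeline: extract from a legacy export into a staging area, clean, enrich, convert, then load. Everything here is a hypothetical stand-in (the CSV text, the `REGION_BY_COUNTRY` lookup and the tuple layout of the target table); the point is only the order of the steps.

```python
import csv
import io

# Hypothetical legacy export (CSV text) standing in for a legacy application
LEGACY_EXPORT = """id,name,country
 17 ,Store A,ie
18,,ie
19,Store B,IE
"""

# Hypothetical enrichment data that the legacy system does not hold
REGION_BY_COUNTRY = {"IE": "EMEA"}


def extract(text):
    """Get the data out of the legacy export into a staging list."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Trim whitespace, normalise country codes and drop rows with no name."""
    cleaned = []
    for r in rows:
        r = {k: v.strip() for k, v in r.items()}
        if not r["name"]:
            continue  # a real process might route this row to an error file
        r["country"] = r["country"].upper()
        cleaned.append(r)
    return cleaned


def enrich(rows):
    """Add data from another source (here, a region for each country)."""
    return [{**r, "region": REGION_BY_COUNTRY.get(r["country"], "UNKNOWN")} for r in rows]


def convert(rows):
    """Convert to the tuple layout assumed for the warehouse table."""
    return [(int(r["id"]), r["name"], r["country"], r["region"]) for r in rows]


def load(rows, table):
    """Loading is simply appending the converted rows in this sketch."""
    table.extend(rows)


staging = extract(LEGACY_EXPORT)  # staging area holds the raw extracted rows
warehouse_table = []
load(convert(enrich(clean(staging))), warehouse_table)
```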
Data Quality Problems
• Multiple identifiers
• Multiple field names
• Different units
• Missing values
• Orphaned values
• Multipurpose fields
• Conflicting data
• Different update times
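Three of these problems (missing values, orphaned values and different units) can be detected or fixed with simple checks. A minimal sketch, assuming hypothetical customer and order extracts; the field names and the choice of kilograms as the standard unit are assumptions for illustration only.

```python
# Hypothetical extracts from two source systems
customers = {"C1": {"name": "Acme"}, "C2": {"name": ""}}
orders = [
    {"order": 1, "cust": "C1", "weight": 2.0, "unit": "kg"},
    {"order": 2, "cust": "C9", "weight": 4.4, "unit": "lb"},
]

LB_TO_KG = 0.45359237  # exact definition of the international pound


def missing_values(records, field):
    """Keys of records where `field` is absent or empty."""
    return [key for key, rec in records.items() if not rec.get(field)]


def orphaned(order_rows, parents):
    """Orders that refer to a customer not present in the customer file."""
    return [o["order"] for o in order_rows if o["cust"] not in parents]


def weights_in_kg(order_rows):
    """Resolve the 'different units' problem by converting everything to kg."""
    return [o["weight"] * LB_TO_KG if o["unit"] == "lb" else o["weight"] for o in order_rows]
```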
Example 1 – the supplier file
OLD: Sup code (4 digits) | Sup name | Sup address | City | Phone
NEW: Sup code (3 letters + 4 digits) | Sup name | Sup address… | Phone | Cat (1, 2 or 3, depending on total purchases last year)
• New supplier code to include the city where the firm is based
• Category assigned based on amounts purchased
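The conversion from the old to the new supplier record might look like the sketch below. The slide does not say how the three letters are derived from the city or where the category cut-offs lie, so both are hypothetical here: the letters are assumed to be the first three letters of the city name, and the `CATEGORY_THRESHOLDS` values are invented, as is the assumption that category 1 means the largest purchases.

```python
# Hypothetical cut-offs: purchases last year >= amount -> category (1 assumed to be largest)
CATEGORY_THRESHOLDS = [(100_000, 1), (10_000, 2)]


def new_supplier_code(old_code: str, city: str) -> str:
    """Prefix the old 4-digit code with 3 letters for the city (assumed: first 3 letters)."""
    if len(old_code) != 4 or not old_code.isdigit():
        raise ValueError(f"expected a 4-digit code, got {old_code!r}")
    return city.strip()[:3].upper() + old_code


def category(total_purchases_last_year: float) -> int:
    """Assign category 1, 2 or 3 from last year's purchases."""
    for threshold, cat in CATEGORY_THRESHOLDS:
        if total_purchases_last_year >= threshold:
            return cat
    return 3
```

For example, `new_supplier_code("0042", "Cork")` gives `"COR0042"`. Note that the City column can be dropped from the new file only if the derived prefix really does identify the city unambiguously.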
Example 2: merging files
• Build a complete customer file from the Accounts (finance), Sales and Shipping files
OLD (finance): CustID | name | address | city | account number | credit limit | balance
OLD (sales): CustID* | name | address | city | discount rates | sales_to_date | rep_name
OLD (shipping): CustID** | name | address | city | preferred haulier
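A minimal sketch of the merge, assuming that the asterisks mark customer identifiers that differ between the three files and that a cross-reference table (`XREF`, hypothetical) maps each file's ID to the finance CustID. The finance file is treated as the master, so when names or addresses conflict its values win; that rule is an assumption, not something the slide specifies.

```python
# Hypothetical extracts: each file keys the same customer differently
finance = {"F100": {"name": "Acme", "city": "Cork", "credit_limit": 5000, "balance": 1200}}
sales = {"S-7": {"name": "ACME", "city": "Cork", "rep_name": "Mary", "sales_to_date": 8000}}
shipping = {"77": {"name": "Acme Ltd", "city": "Cork", "preferred_haulier": "FastFreight"}}

# Hypothetical cross-reference table: local ID in each file -> finance CustID
XREF = {"sales": {"S-7": "F100"}, "shipping": {"77": "F100"}}


def merge_customers():
    """Build one complete customer record per finance CustID."""
    merged = {cid: dict(rec) for cid, rec in finance.items()}  # finance is the master
    for source_name, source in (("sales", sales), ("shipping", shipping)):
        for local_id, rec in source.items():
            cid = XREF[source_name].get(local_id)
            if cid is None:
                continue  # unmatched record: a real process would report it
            target = merged.setdefault(cid, {})
            for field, value in rec.items():
                target.setdefault(field, value)  # keep existing value on conflict
    return merged


merged = merge_customers()
```

Without the cross-reference table, the three files could only be matched on name and address, which is exactly where the "conflicting data" problem appears.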
Refreshing databases
• Timing
• Criticality of information
• Volume of data
• Response time
• Real-time requirement
• Level of aggregation / granularity
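Volume of data and level of aggregation are linked: loading the warehouse at a coarser granularity means fewer rows to refresh. A minimal sketch, assuming hypothetical transaction-level rows that are summarised to one row per day before loading.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transaction-level rows: (timestamp, amount)
transactions = [
    (datetime(2010, 2, 17, 9, 15), 12.5),
    (datetime(2010, 2, 17, 13, 40), 7.5),
    (datetime(2010, 2, 18, 10, 5), 20.0),
]


def daily_totals(rows):
    """Coarser granularity: one row per day instead of one per transaction."""
    totals = defaultdict(float)
    for ts, amount in rows:
        totals[ts.date()] += amount
    return dict(totals)


summary = daily_totals(transactions)
```

The trade-off is that questions needing individual transactions can no longer be answered from the summary.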
Life cycle of the DW
[Figure: operational databases feed the warehouse database: a first-time load, then successive refreshes, then purge or archive.]
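The life cycle can be sketched in three functions: a first-time full load, incremental refreshes that copy only rows changed since the previous refresh, and a purge that moves old rows to an archive. This assumes each operational row carries a hypothetical `updated` date; a real refresh would also need to detect rows deleted at source, which this sketch does not do.

```python
from datetime import date


def first_time_load(operational):
    """Copy every operational row into the warehouse, keyed by id."""
    return {row["id"]: dict(row) for row in operational}


def refresh(warehouse, operational, last_refresh):
    """Incremental refresh: only rows updated after the previous refresh."""
    changed = [r for r in operational if r["updated"] > last_refresh]
    for r in changed:
        warehouse[r["id"]] = dict(r)
    return len(changed)


def purge_or_archive(warehouse, cutoff, archive):
    """Move rows last updated before `cutoff` out of the warehouse."""
    for key in [k for k, r in warehouse.items() if r["updated"] < cutoff]:
        archive.append(warehouse.pop(key))


ops = [
    {"id": 1, "updated": date(2010, 1, 5), "qty": 10},
    {"id": 2, "updated": date(2010, 2, 1), "qty": 4},
]
dw = first_time_load(ops)

# Later: row 1 changes and row 3 is created in the operational system
ops[0] = {"id": 1, "updated": date(2010, 2, 10), "qty": 12}
ops.append({"id": 3, "updated": date(2010, 2, 12), "qty": 7})
n = refresh(dw, ops, last_refresh=date(2010, 2, 1))

archive = []
purge_or_archive(dw, cutoff=date(2010, 2, 5), archive=archive)
```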
Real time information
• Up to date
• On-line
• Actual data
• Live feed
• Decisions made on what basis?
Real time requirement?
• Historical sales or accounting data, not real-time
• Sales as quarter end approaches
• Inventory levels for MRP
• Exchange rates: when is the Visa rate calculated?
• Real-time processing: card transactions down
Response times
• Response times are a function of:
  – Response time
  – Infrastructure elements
  – Database sizing
  – Transaction processing
  – Interfaces
  – Reporting
  – Other processing demands
  – Peak times
  – …
Example
• Revenue reports from EMC Data warehouse
• The report can grow to more than 1 million lines at quarter end
• It should not be run on the ERP server
• Poorly designed?
Manager’s view
• Volume has been increasing at a huge pace compared to … like, you go talk to Jonathan, … my answer to it will be, get used to it, it’s not going to go away, I don’t care what you do, it’s not my problem, I want the reports, you deal with the volume of records, it’s not going to go away, you deal with it.
Refresh Optimization
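[Figure: internal and external data sources feed the data warehouse through ETL tools. Labels: Accounting, Unknown Processes, Internal Data Sources, External Data Sources, ETL Tools, Data Warehouse; Valid Time Lag, Load Time Lag; Fact and Dimension Changes, Primarily Dimension Changes.]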