
BIG DATA LONDON
Information Architecture Overview
1 June 2015

[Jigsaw diagram of the information architecture components: Metadata Management, Data Integration, Data Migration, Data Lifecycle Management, Master Data Management, Data Capture, Data Modelling, Data Security, Data Quality, Data Governance, Business Glossary, Reporting]


Purpose of the slide deck

• This slide deck will explain:-

o The different components of information architecture (depicted as jigsaw pieces in a puzzle) and how they interrelate with each other.

o The key points of each part of the architecture. More detailed slide decks will be created for each component.

o The overall aim of the deck is to provide technical staff with sufficient information about the discipline of information architecture, so that it can be established within each division. The knowledge is also useful when assessing the capabilities of vendor tools.

o The information is presented as recommendations/guidelines. It is rare for all of the elements of information architecture to be present within an organisation, since the cost and time to implement have to be weighed against the overall benefits to the business.


What is Information Architecture?

[Jigsaw diagram of the information architecture components; the IT-led elements are shown in green]

• For a long time, the discipline of information architecture was seen as a pure IT function and referred to simply as data architecture.

• The elements of information architecture which are usually implemented by IT with little business input are shown in green in the diagram.

• Since the late 90s, some businesses realised that they could gain a competitive advantage by capturing better quality business data about their customers and either using it to cross-sell/up-sell products or selling the information to 3rd parties.

• It was gradually realised that, in order to achieve this, there needed to be more business involvement in standardising names & definitions, managing important (master) data and driving data quality initiatives via stricter data governance.

• This has led to data governance councils being formed to allow MDM, data governance and business glossary content to drive these initiatives. The broadening of the discipline has led to it now being called information architecture.


What is meant by Data Capture/Acquisition?


• Data Capture/Acquisition covers the technology and design patterns required to extract data from a source (database/file) and land it in a staging area for ongoing processing.

• Typically, data is captured either in batch via file transfer or in near real time by extracting information from the source database logs.

• In order to avoid performance bottlenecks, message queues are often implemented between the source system and the staging area.

• It is also important to ensure that no data is lost from the source and that sensitive data is masked in transit.
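As an illustration only (not from the original deck), here is a minimal Python sketch of a batch capture into a staging area: it copies a source extract file, masks an assumed sensitive column (email) while the data is in transit, and reconciles row counts so that no data is lost. The file names and column names are assumptions.

```python
# Minimal sketch of batch data capture; file names and the "email" column
# are illustrative assumptions, not part of the original deck.
import csv
import hashlib

def capture_to_staging(source_path: str, staging_path: str) -> None:
    rows_read = rows_written = 0
    with open(source_path, newline="") as src, open(staging_path, "w", newline="") as stg:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(stg, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            rows_read += 1
            # Mask sensitive data while it is in transit to the staging area.
            row["email"] = hashlib.sha256(row["email"].encode()).hexdigest()
            writer.writerow(row)
            rows_written += 1
    # Reconcile counts so that no data is lost between source and staging.
    if rows_read != rows_written:
        raise RuntimeError(f"capture incomplete: {rows_read} read, {rows_written} written")

# Example usage (paths are hypothetical):
# capture_to_staging("source_extract.csv", "staging_customers.csv")
```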


What is meant by Data Quality?

• Data Quality refers to the technical implementation of processes to:-

1. Analyse (profile) data and capture metrics identifying how good the quality of the data is

2. Standardise the structure of data where there are multiple sources for that data

3. Clean data, if possible. For example, address information can be cleansed by using address verification services.

4. Validate or reject data records which do not contain sufficient information for the target system to accept.

Note: When the business is involved in creating data quality rules, this is considered to be part of “data governance”.
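For illustration, a minimal Python sketch of the four steps on an in-memory record set follows; the field names, country mappings and validation rule are assumptions, not from the deck.

```python
# Minimal sketch of the four data quality steps; fields and rules are
# illustrative assumptions.
records = [
    {"name": " Jane Smith ", "country": "uk", "postcode": "SW1A 1AA"},
    {"name": "John Doe", "country": "United Kingdom", "postcode": ""},
]

# 1. Profile: capture simple metrics on how good the data is.
profile = {f: sum(1 for r in records if not r[f].strip()) for f in records[0]}
print("empty values per field:", profile)

# 2. Standardise: bring multiple source representations to one structure.
country_map = {"uk": "GB", "united kingdom": "GB"}
for r in records:
    r["country"] = country_map.get(r["country"].strip().lower(), r["country"])

# 3. Cleanse: tidy values (an address verification service could go here).
for r in records:
    r["name"] = " ".join(r["name"].split())

# 4. Validate: reject records the target system cannot accept.
valid = [r for r in records if r["postcode"]]
rejected = [r for r in records if not r["postcode"]]
print(len(valid), "valid,", len(rejected), "rejected")
```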



What is meant by Data Integration?

• In the broader sense of the term, data integration covers all of the steps from capturing the data from the source through to pushing the data into the target system.

• However, in information architecture terms, data integration specifically refers to the technology & design patterns used to extract data from a source, transform it and load it into the target. Data capture & data quality steps are excluded from this narrower definition of the term.
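As a sketch of the narrower definition, the Python example below takes rows assumed to be already captured and cleansed in staging, transforms them onto a target schema and loads them into a target table (an in-memory SQLite database, purely for illustration; the schema is an assumption).

```python
# Minimal ETL sketch: transform staging rows and load them into a target
# table. The schema and the SQLite target are illustrative assumptions.
import sqlite3

staging_rows = [
    {"cust_id": "C001", "first": "Jane", "last": "Smith", "spend": "120.50"},
    {"cust_id": "C002", "first": "John", "last": "Doe", "spend": "80.00"},
]

def transform(row):
    # Map staging fields onto the target schema and convert data types.
    return (row["cust_id"], f'{row["first"]} {row["last"]}', float(row["spend"]))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_id TEXT, full_name TEXT, total_spend REAL)")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                 (transform(r) for r in staging_rows))
conn.commit()
print(conn.execute("SELECT * FROM dim_customer").fetchall())
```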



What is meant by Data Migration?

• Data migration refers to moving data in bulk from an old system to a new system.

During data migration, typical considerations include:-

• Where to pull data from – it is not always the original system of record, since it can be more convenient to pull data from a data warehouse which has consolidated data from multiple systems of record, provided it holds all the necessary data at a transactional level.

• How much history to capture. For example, only transactional data for open sales orders might need to be migrated from an old system, as transactional data is only required for operational purposes while the sales order is still open. Care needs to be taken, however, that regulatory requirements to hold transactional data can still be met, as well as any internal management reporting requirements.

• How much change history to capture. This refers to whether you just want to migrate the current state of data in the source system or whether you also wish to migrate all of the changes that were made to that data over the years. Having change history is important for reporting things such as like-for-like sales, where you wish to report on how much revenue and profit would have changed had the organisation remained in the same state as it was last year. This differentiates organic growth from growth via acquisition.
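As an illustration of the “how much history” decision, the Python sketch below migrates open sales orders plus anything inside an assumed seven-year regulatory retention window; the field names and the window length are assumptions.

```python
# Minimal sketch of a migration history filter; the 7-year retention window
# and the order fields are illustrative assumptions.
from datetime import date, timedelta

RETENTION_CUTOFF = date.today() - timedelta(days=7 * 365)

def should_migrate(order: dict) -> bool:
    # Open orders are still needed operationally; closed orders are kept only
    # while regulation (or internal reporting) requires them.
    return order["status"] == "OPEN" or order["order_date"] >= RETENTION_CUTOFF

orders = [
    {"id": 1, "status": "OPEN",   "order_date": date(2015, 5, 1)},
    {"id": 2, "status": "CLOSED", "order_date": date(2006, 1, 15)},
]
print([o["id"] for o in orders if should_migrate(o)])  # -> [1]
```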



What is meant by Master Data Management?

• In the broader sense of the term, master data management (MDM) can refer to all initiatives to ensure that master data is of good quality, so it can encompass data quality, data governance & the development of a business glossary.

• In the narrower sense of the term, MDM usually refers to a tool and business processes which alert a data owner that there are master data records which need managing, and which allow the data owner to manually clean master data records and map unknown source records to known master data records.
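For illustration, the minimal Python sketch below shows the narrower sense of MDM: proposing matches between unknown source records and known master records so that a data owner can confirm or correct them manually; the supplier names and the similarity threshold are assumptions.

```python
# Minimal MDM matching sketch; supplier names and the 0.6 threshold are
# illustrative assumptions.
from difflib import SequenceMatcher

master_suppliers = {"M001": "Acme Widgets Ltd", "M002": "Globex Corporation"}
unknown = ["ACME WIDGETS LIMITED", "Initech plc"]

def propose_match(name: str, threshold: float = 0.6):
    # Score the unknown name against every master record.
    scored = [(SequenceMatcher(None, name.lower(), m.lower()).ratio(), key)
              for key, m in master_suppliers.items()]
    score, key = max(scored)
    return key if score >= threshold else None  # None -> route to the data owner

for name in unknown:
    print(name, "->", propose_match(name))
```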



What is meant by Data Governance?

• In the broader sense of the term, data governance can refer to all of the tasks controlled by a data governance council, so it can include master data management, data governance in its narrower sense and the development of the business glossary. In this situation, the terms MDM & data governance are often interchangeable.

• In the narrower sense of the term, it refers to involving the business in the creation of policies, standards and rules to ensure that data is of good quality. Naming standards, cleansing & validation rules, and policies on how to deal with poor data quality are all covered by data governance.
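As an illustration of governance rules being defined by the business and enforced by the technical layer, the Python sketch below records each rule alongside its business owner; the stewards, terms and checks are assumptions, not from the deck.

```python
# Minimal sketch of business-owned governance rules enforced in code; the
# stewards, rule wording and checks are illustrative assumptions.
governance_rules = [
    {"term": "Customer Email", "owner": "Sales Data Steward",
     "standard": "must look like a valid email address",
     "check": lambda v: "@" in v and "." in v.split("@")[-1]},
    {"term": "Country Code", "owner": "Finance Data Steward",
     "standard": "ISO 3166-1 alpha-2, upper case",
     "check": lambda v: len(v) == 2 and v.isupper()},
]

record = {"Customer Email": "jane.smith@example.com", "Country Code": "uk"}

for rule in governance_rules:
    ok = rule["check"](record[rule["term"]])
    status = "pass" if ok else f"escalate to {rule['owner']}"
    print(f"{rule['term']}: {status}")
```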



What is meant by a Business Glossary?

• A business glossary contains the names of data objects, attributes (describing features) and measures (formulae) which are understood by the business. These are referred to as terms.

• A business glossary should cover all of the terms used by the business during their everyday work, including reporting.

• Since different parts of the business often use different terms to refer to the same thing or to similar things, a business glossary allows any term to be attached to synonyms and related terms.

• Terms can be grouped into different hierarchical structures (the posh words are taxonomies & ontologies) in order to allow a business glossary user to find a particular term and see where it sits within a classification hierarchy.

• A business glossary can also contain information about distinct lists of values for particular data attributes.

• A business glossary should not be confused with a data dictionary. A data dictionary is used by IT to capture information about the tables & columns used to physically store data. Its purpose is to aid development. If a reporting architecture is correctly implemented, a business user should not need to know how data is physically stored within a particular system.
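For illustration, the Python sketch below shows one possible shape for a glossary term, with synonyms, related terms, a parent in the taxonomy and an optional list of allowed values; the example term and its values are assumptions.

```python
# Minimal sketch of a glossary term record; the example content is an
# illustrative assumption.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class GlossaryTerm:
    name: str
    definition: str
    synonyms: List[str] = field(default_factory=list)       # other business names for the same thing
    related_terms: List[str] = field(default_factory=list)  # similar but distinct terms
    parent: Optional[str] = None                             # position within a taxonomy
    allowed_values: List[str] = field(default_factory=list)  # distinct list of values, if any

customer = GlossaryTerm(
    name="Customer",
    definition="A party that has purchased, or may purchase, goods or services.",
    synonyms=["Client", "Account"],
    related_terms=["Prospect"],
    parent="Party",
)
print(customer.parent, ">", customer.name, "| also known as:", ", ".join(customer.synonyms))
```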



What is meant by Reporting?

Reporting refers to:-

1. Statutory reporting decks

2. Operational reporting – which allows a business to carry out its daily activities

3. Analytical reporting – which allows management to use aggregated data to monitor business activity on a more macro scale

4. Data discovery & querying – which allows an operational user to visually examine data



What is meant by Data Modelling?

• Data Modelling refers to the conceptual, logical & physical modelling of data.

• Conceptual data modelling simply captures data objects (entities) as they are understood by the business. For example, customer, product and supplier are entities which the business refers to. At this stage, relationships between the entities may or may not be included in the model. The purpose of conceptual data models is that you can use them to talk to the business more easily, without the complication of detail that the logical & physical data models introduce.

• Logical data modelling adds in all of the detailed data fields (attributes/measures) and establishes all of the relationships. A logical data model is useful for front-end development, e.g. web forms or reporting.

• Physical data modelling converts a logical data model into a form that is suited to a particular database. Specific database objects such as views, indexes, sequences and storage characteristics are added in this model, and table/column names may well be abbreviated due to restrictions on name lengths or to speed up development.
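As an illustration of the logical-to-physical step, the Python sketch below takes a small logical entity definition and generates physical DDL with abbreviated names and an index; the entity, abbreviation list and data types are assumptions.

```python
# Minimal sketch of turning a logical entity into physical DDL; the entity,
# abbreviations and types are illustrative assumptions.
logical_entity = {
    "name": "Customer",
    "attributes": [("Customer Identifier", "INTEGER"),
                   ("Customer Name", "VARCHAR(100)"),
                   ("Date Of Birth", "DATE")],
}

ABBREVIATIONS = {"customer": "cust", "identifier": "id", "number": "no"}

def physical_name(logical_name: str) -> str:
    # Abbreviate each word of the business-friendly logical name.
    return "_".join(ABBREVIATIONS.get(w, w) for w in logical_name.lower().split())

def to_ddl(entity: dict) -> str:
    table = physical_name(entity["name"])
    cols = ",\n  ".join(f"{physical_name(n)} {t}" for n, t in entity["attributes"])
    key_col = physical_name(entity["attributes"][0][0])  # assume first attribute is the key
    return (f"CREATE TABLE {table} (\n  {cols}\n);\n"
            f"CREATE INDEX ix_{table}_{key_col} ON {table} ({key_col});")

print(to_ddl(logical_entity))
```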



What is meant by Data Security?

• Data security is a broad subject covering the need to ensure that only the right people can see the right data.

Data security includes:-

• User/role authentication and authorisation, i.e. assigning rights to create, read, update or delete (CRUD) data in objects

• Data auditing – the capture of CRUD operations carried out by users. It ensures that, should data be lost or database security be breached, we know who was responsible.

• Data masking – can refer to “on the fly” masking of sensitive data (often called data redaction) or the permanent masking of data required if you wish to migrate production data into a non-production environment

• Data encryption – in databases that hold highly sensitive data, as well as providing users with passwords, data is encrypted throughout its lifecycle and is decrypted on the fly by use of encryption keys
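For illustration, the Python sketch below shows “on the fly” redaction: sensitive fields are masked unless the requesting role is entitled to see them; the roles and field names are assumptions.

```python
# Minimal data redaction sketch; roles and sensitive fields are illustrative
# assumptions.
SENSITIVE_FIELDS = {"national_insurance_no", "salary"}
ROLES_WITH_FULL_ACCESS = {"payroll_admin"}

def redact(row: dict, role: str) -> dict:
    # Return the row unchanged for entitled roles, mask sensitive fields otherwise.
    if role in ROLES_WITH_FULL_ACCESS:
        return row
    return {k: ("****" if k in SENSITIVE_FIELDS else v) for k, v in row.items()}

employee = {"name": "Jane Smith", "national_insurance_no": "QQ123456C", "salary": 52000}
print(redact(employee, "reporting_analyst"))  # sensitive values masked
print(redact(employee, "payroll_admin"))      # full record returned
```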



What is meant by Metadata Management?

• Metadata management refers to the capture of information about the structure of data (metadata) held in the source system, in the target system and at each stage as it goes through the data capture, quality and integration phases, right through until it is consumed by an application or report.

• The metadata is held in a metadata repository and allows data lineage (tracing of data from target to source or vice versa) to be achieved. Data lineage is useful when a consumer of a report, say, wishes to verify from which source the data in a particular field originates.

Note: Metadata management remains relatively elusive. The main reason for this is that most tools use proprietary metadata which varies from version to version. Other tools will typically only support metadata integration for popular versions of the most popular tools in any particular area of information management. Even buying all your tools from a single vendor does not overcome this issue, as many tools supplied by a single vendor weren't originally developed in-house and it can take several years for their metadata to become fully integrated.
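As an illustration of data lineage held in a metadata repository, the Python sketch below stores lineage edges as a simple mapping and traces a report field back to its sources; the field names are assumptions.

```python
# Minimal lineage sketch: a metadata repository as a mapping from each target
# field to the fields it was derived from. Field names are illustrative.
lineage = {
    "report.total_revenue": ["warehouse.fact_sales.net_amount"],
    "warehouse.fact_sales.net_amount": ["staging.sales.amount", "staging.sales.discount"],
    "staging.sales.amount": ["crm.orders.amount"],
    "staging.sales.discount": ["crm.orders.discount"],
}

def trace_to_source(field: str) -> list:
    parents = lineage.get(field, [])
    if not parents:                 # no upstream metadata -> this is a source field
        return [field]
    sources = []
    for parent in parents:
        sources.extend(trace_to_source(parent))
    return sources

print(trace_to_source("report.total_revenue"))
# -> ['crm.orders.amount', 'crm.orders.discount']
```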



What is meant by Data Lifecycle Management?

• Data Lifecycle Management is typically considered after a database has gone live. In the early days, all data can easily be stored in a new database without impacting performance. However, as more data is added, performance gets worse and the need for data lifecycle management surfaces.

• In the past, the only option was to archive data off to tape and hold it in a tape archive, with retrieval of data taking days or weeks.

• Nowadays, data that is kept in the original database and is needed for operational purposes is referred to as “hot” data.

• Once data is no longer needed to support the majority of daily activities, it can be moved to cheaper online storage outside of the original database but still accessible. This is known as “warm” data. Current solutions allow the differentiation between “hot” and “warm” data to be invisible to the end user, other than a relative degradation in performance when accessing the “warm” data.

• Once data is no longer needed other than for retention purposes, such as regulatory requirements, it can be moved to traditional tape archive solutions. Data in a tape archive is called “cold” data.

• Data lifecycle management refers to implementing data retention & archiving policies to move data from hot to warm to cold environments.
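For illustration, the Python sketch below classifies records as hot, warm or cold against a retention policy; the 90-day and two-year thresholds are assumptions, not from the deck.

```python
# Minimal retention policy sketch; the tier thresholds are illustrative
# assumptions.
from datetime import date, timedelta

HOT_WINDOW = timedelta(days=90)        # stays in the original database
WARM_WINDOW = timedelta(days=2 * 365)  # cheaper online storage, still queryable

def storage_tier(record_date: date, today: date) -> str:
    age = today - record_date
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"                      # archive (e.g. tape), kept for retention only

as_of = date(2015, 6, 1)
print(storage_tier(date(2015, 5, 20), as_of))  # hot
print(storage_tier(date(2014, 1, 1), as_of))   # warm
print(storage_tier(date(2009, 1, 1), as_of))   # cold
```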
