
G00227024

Top 10 Technology Trends Impacting Information Infrastructure, 2012
Published: 23 December 2011

Analyst(s): Regina Casonato, Andrew White, Mark A. Beyer, Merv Adrian, Ted Friedman, Mike Blechar, Whit Andrews

As evidenced in Gartner's 2012 Hype Cycles, significant innovation continues in the field of information management (IM) technologies. The key factors driving this innovation are the explosion in the volume, velocity and variety of information and the huge amount of value — and potential liability — locked inside all this ungoverned information.

But the growth in information volume, velocity, variety and complexity, and the new information use cases, make IM infinitely more difficult than it has been in the past. In addition to the new internal and external sources of information, practically all information assets must be available for delivery through varied, multiple, concurrent — and, in a growing number of instances, real-time — channels and mobile devices. All this demands shareability and reusability of information for multiple context delivery and use cases.

The 10 technology trends analyzed in this document can play different roles in modernizing IM. Some help structure information, some help organize and mine information, some integrate and share information across applications and repositories, some store information, and some govern information.

Key Findings

■ Some IM technologies will enable enterprises to reduce risks, improve quality and avoid certain costs. Others will transform how they use information as an asset.

■ The top 10 technology trends identified will help enterprises cope with vastly increased volumes of information, as well as its increased velocity and variety and the proliferation of use cases for information. Some address the diversity of information, some tackle the volume of information, and some help manage information as a strategic asset and improve pattern recognition and discovery. However, modernizing current information infrastructure and management capabilities requires more than just technology — it requires the right IT, people, roles and processes.

■ Use of some of these technologies requires new skills, as well as dedicated analysts to interpret the results.

Recommendations

■ Determine, for your current IM projects and those planned in the short term, whether the main emerging trends in technology have been factored into your strategic planning process.

■ Identify where these 10 technology trends are in Gartner's 2011 Hype Cycles for master data management (MDM), content management, data management and enterprise information management (EIM). Anticipate their impact and plan adoption accordingly. Some of these technologies are in an early stage of enterprise adoption and come with a risk of failure.

Table of Contents

Analysis
    Overview
        Organize Information
        Describe Information
        Integrate Information
        Share Information
        Integrate/Store Information
        Govern Information
Recommended Reading

List of Tables

Table 1. Top 10 Technology Trends Likely to Impact Information Management in 2012

List of Figures

Figure 1. Common Features and Use Cases for Application Integration and Data Integration

Analysis

Overview

This document updates Gartner's analysis of the top 10 technology trends in IM (see Table 1). We view these technologies as strategic because they directly address some of the challenges faced by enterprises, as set out in "Information Management in the 21st Century."

This document should be read by CIOs, information architects, information managers, enterprise architects and anyone working in a team evaluating or implementing an enterprise content management system, a business intelligence (BI) application, an MDM technology, a document-centric collaboration initiative, an e-discovery effort, an information access project or a data warehouse (DW) system.

Table 1. Top 10 Technology Trends Likely to Impact Information Management in 2012

Purpose          Technology Trend

Organize         Content Analytics

Describe         Enterprise Taxonomy and Ontology Management
                 Big Data and Extreme Information Processing and Management

Integrate        Enterprisewide Metadata Repositories
                 Data/Application Integration Convergence

Share            Information Semantic Services
                 Logical Data Warehouse

Integrate/Store  Hadoop MapReduce Implementations

Govern           Multiple Domain Master Data Management
                 Data Stewardship Applications

Source: Gartner (December 2011)

Organize Information

Content Analytics

Analysis by Whit Andrews

Content analytics defines a family of technologies that processes content, and the behavior of users in consuming content, to derive answers to specific questions. Content types include text of all kinds, such as documents, blogs, news sites, customer conversations (both audio and text), and social network discussions. Analytic approaches include text analytics, rich media and speech analytics, and behavioral analytics. Applications of the technology are broad; it can serve to aid in: sentiment analysis; reputation management; trend analysis; affinity; recommendations; face recognition; industry-focused analytics (such as voice of the customer) to analyze call center data; fraud detection for insurance companies; crime detection to support law enforcement activities; competitive intelligence; and understanding consumers' reactions to a new product.

Multicomponent functions are constructed by serializing simpler functions: the output of one analysis is passed as input to the next. As virtually all content analytics applications use proprietary APIs to integrate functions, today there is no easy way to construct analyses from applications created by different vendors. In the future, the Unstructured Information Management Architecture (UIMA), governed by the Organization for the Advancement of Structured Information Standards (OASIS), may serve this purpose. Such a standard for unstructured data would serve a similar purpose to Structured Query Language (SQL) for structured data.
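
To make the serialization idea concrete, here is a minimal sketch in Python (not any vendor's API) in which each stage is a plain function that annotates a shared document object, so the output of one analysis feeds the next; all stage names, fields and lexicons are illustrative.

```python
# Minimal sketch of a serialized content-analytics pipeline.
# All names are illustrative; real products expose proprietary APIs.

def detect_language(doc):
    doc["language"] = "en" if all(ord(c) < 128 for c in doc["text"]) else "other"
    return doc

def extract_entities(doc):
    # Toy "entity extraction": capitalized tokens stand in for named entities.
    doc["entities"] = [t for t in doc["text"].split() if t.istitle()]
    return doc

def score_sentiment(doc):
    # Toy lexicon-based sentiment; real engines use trained models.
    positive, negative = {"great", "love", "good"}, {"poor", "bad", "broken"}
    tokens = {t.strip(".,!").lower() for t in doc["text"].split()}
    doc["sentiment"] = len(tokens & positive) - len(tokens & negative)
    return doc

def run_pipeline(text, stages):
    doc = {"text": text}
    for stage in stages:          # output of one analysis feeds the next
        doc = stage(doc)
    return doc

result = run_pipeline("Acme support was great, love the new phone.",
                      [detect_language, extract_entities, score_sentiment])
print(result["entities"], result["sentiment"])
```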

Due to the explosion of social networking analyses, particularly sentiment analysis, use of both general- and special-purpose content analytics applications continues to grow, as stand-alone applications and as extensions to search and content management applications, as well as more specifically targeted business applications. The greatest growth comes from generally available content resources, such as contact center records and postsale service accounts, leading to uptake, especially in CRM. Also, open-source intelligence is seeking to use content analytics for more effective understanding of public and semipublic sentiment.

Recommendations:

■ Enterprises should understand the broad range of functions that content analytics can be used for. It can identify high-priority clients, product problems, and customer sentiment and service problems; analyze competitors' activities and consumers' responses to a new product; support security and law enforcement operations by analyzing photographs; and detect fraud by analyzing complex behavioral patterns. Increasingly, it replaces difficult and time-consuming human analyses with automation, often making previously impossible tasks tractable. Complex results are often represented as visualizations (possibly graphical but also textual), making them easier for people to understand.

■ Enterprises should employ content analytics to replace/augment time-consuming and complex human analyses.

■ Firms should identify the analytics that are most able to simplify and demystify complex business processes.

■ Users should identify vendors with specific products that meet their requirements and should review case studies to understand how other users have exploited these technologies.

Describe Information

Enterprise Taxonomy and Ontology Management

Analysis by Mark Beyer

At its core, the ontological method seeks to determine a universal commonality (think: a single representation to start with) that permits the sharing of concepts across semantic barriers (for example, can many people look at something from different viewpoints and realize they are looking at the same thing?). Taxonomic design is the process of defining categories and the relationships between them — of determining what features make one thing different from another. In the relational world, these categories and subcategories are expressed as entities and attributes. The techniques for ontology and taxonomy management are still at an early stage of development, and the definition of the business needs continues to specify features or functionality in tools.


When analyzing multiple taxonomies in relation to one another, it is important to recognize that multiple metadata descriptions can exist simultaneously for each information asset. In other words, different people and systems attach seemingly different meanings to the same entity. The various metadata sources include business process modeling, federation, extraction, transformation and loading, metadata repository technologies and many others. When rationalizing more than one taxonomy, it is useful to remember that taxonomy design and ontological development occur simultaneously and implicitly. Data integration, at its core, is the process of rationalizing one taxonomy and its incumbent ontology to a second, disparate taxonomy, but with a similar ontology (the ontology of each must have some similarity, or else there is little reason to put them together). In other words, it is the reconciliation of meaning — the taxonomies have underlying commonalities which create shared meaning. So to answer the question "Should we reconcile these taxonomies?", ask whether they are describing more or less the same thing. Reconciling two taxonomies to each other is "point to point" integration.

An alternative to point-to-point data integration can be a process of taking two or more data taxonomies and creating a series of rationalization rules (coded as transformations) that move them to a neutrally designed taxonomy. To do so, there must be some unifying set of principles that brings the datasets together. These are the ontology rules of the new, neutrally designed taxonomy.
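
As an illustration of such rationalization rules, the following hypothetical sketch maps two department-specific taxonomies onto a neutrally designed one; the categories and rules are invented, and real transformations would be far richer.

```python
# Hypothetical sketch: rationalizing two source taxonomies to a neutral one.
# The mapping rules encode the shared ontology ("these categories mean the
# same kind of thing"); anything unmapped is surfaced for human review.

SALES_TO_NEUTRAL = {"prospect": "party/customer", "account": "party/customer",
                    "reseller": "party/channel_partner"}
SUPPORT_TO_NEUTRAL = {"caller": "party/customer", "partner": "party/channel_partner"}

def rationalize(record, rules):
    source_category = record["category"]
    if source_category not in rules:
        raise ValueError(f"No ontology rule for {source_category!r}; review needed")
    return {**record, "category": rules[source_category]}

sales_rec = {"id": 17, "category": "reseller"}
support_rec = {"id": 17, "category": "partner"}
# Both land in the same neutral category, so the records can be integrated.
assert rationalize(sales_rec, SALES_TO_NEUTRAL)["category"] == \
       rationalize(support_rec, SUPPORT_TO_NEUTRAL)["category"]
```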

The process of resolving multiple ontologies and their explicit or implicit rules with multiple taxonomy definitions has an extremely high degree of potential for organizations, in that this resolution can enable them to leverage data from a much wider array of sources — opening up the possibilities of cloud-based data, enabling rapid integration of new supplier and partner data, and leveraging market data. Gartner sees a growing need for these capabilities, based on the following evidence.

Throughout 2011, the business demand for taxonomy and ontology management began to emerge from its embryonic state. Organizations have realized that sharing information requires ontological continuity. Continuity is not the same as consistency. Continuity merely requires that I can follow a chain of reasoning from point A through points B, C and D and arrive at point E. In a business requirement, this means visibility into an audit trail that tracks information as it is rationalized from one representation to another. In other words, "Why are we saying that corporations are people? And why are hours of the day considered an asset again?"

Also in 2011, many organizations began to pursue information taxonomy management through the use of canonical models. Canonical models have significant benefit, but they represent a declared style of ontology/taxonomy pairing. With the rise of NoSQL deployments, social media data and the huge influx of operational technology (machine data), it is more important than ever to address the fact that an information concept can belong to more than one taxonomy/ontology pair. Focusing on a single model will not yield good results if enforcement is the goal. If, however, learning is the objective, and the goal is to test and re-test architectures using a canonical model as a learning tool, then this approach will provide valuable and insightful learning experiences.

Recommendations:

Information architects vexed by taxonomy and ontology issues should:


■ Identify information-related semantic tools that resolve differing taxonomy and ontology pairs. These include federation tools with advanced modeling capabilities and domain analytics, which populate extensible metadata repositories.

■ Initiate text mining and analytics with a practice to align relational data and content analysis metadata.

■ Acclimatize designated business personnel to their role in creating information assets and to the importance of metadata management as a precursor to introducing enterprise metadata taxonomy and ontology management practices. Utilize industry-related canonical models as candidates for leading this discussion.

Big Data and Extreme Information Processing and Management

Analysis by Merv Adrian, Mark Beyer

According to Gartner research, information quantification includes issues arising from the volume and the wide variety of information assets, and the velocity of their arrival (which includes both rapid record creation and highly variable rates of data creation, including suddenly lower velocity, which forces the issue of scaling "down"). Within each variety of data types there are significant variations and standards, creating additional complexity, including complexity in the processing of these various resources. "Big data" is the term adopted by the market to describe extreme information management and processing issues which exceed the capability of traditional information technology along one or multiple dimensions to support the use of the information assets. Throughout 2011 and into 2012, discussion about "big data" has focused primarily on the quantification issues associated with the extremely large datasets generated by technology practices such as social media, operational technology, Internet logging and streaming sources. A wide array of hardware and software solutions has emerged to address these issues. At this point, "big data" or, more broadly, extreme information processing and management, has become a disruptive transformation that presents new business opportunities.

The larger context of "big data" challenges existing practices of selecting which data to integrate with the proposition that all information can be integrated, and that technology should be developed to support this. As a new issue driving requirements that demand an approach, the breaching of traditional boundaries will occur extremely fast, because the many sources of new information assets are increasing geometrically (for example, desktops became notebooks and now tablets; portable data is everywhere and in multiple context formats), which is causing exponential increases in data volumes. Additionally, the information assets include the entire spectrum of the information content continuum, from fully undetermined structure ("content") to fully documented and traditionally accessed structures ("structured"). As a result, organizations will seek to address the full spectrum of extreme information management issues, and will seek this as differentiation from their competitors, so they can become leaders in their markets in the next two to five years. "Big data" is thus a current issue (focused on volume and including the velocity and variety of data, or, together, "V3") which highlights a much larger extreme information management topic demanding almost immediate solutions.


Vendors are almost universally claiming that they have a big data strategy or solution. However, Gartner clients have made it clear that big data must include large volumes processed in streams and batch (not just MapReduce), and an extensible services framework which can deploy processing to the data or bring data to the process, and which spans more than one variety of asset type (for example, not just tabular, or just streams, or just text). Partial solutions are acceptable but should be evaluated for what they do — not the additional claims. Gartner clients are making a very large number of inquiries into this topic, and this is evidence of growing hype in the market. Importantly, the different aspects and types of "big data" have been around for more than a decade — it is only recent market hype around legitimate new techniques and solutions which has created this heightened demand.

Enterprises should understand the many "big data" use cases. "Big data," and addressing all the extreme aspects of 21st-century information management, permits greater analysis of all available data, detecting even the smallest details of the information corpus — a precursor to effective Pattern-Based Strategies and the new type of applications they enable. In the case of complex event processing, for example, queries are complex, with many different feeds; the volume may or may not be high, the velocity will vary from high to low, and so on. Volume analytics can leverage technology such as the Apache Hadoop project. In addition to MapReduce approaches which access data in external Hadoop Distributed File System (HDFS) files, BI use cases can utilize it in-database (for example, Aster Data and Greenplum), as a service call managed by the database management system (IBM InfoSphere BigInsights, for example), or externally through third-party software (such as Apache Hadoop distributions from Cloudera or MapR).

Recommendations:

■ Identify a volume analytics use case in which to implement a pilot effort for distributed processing, such as MapReduce. Enterprises already using portals as a business delivery channel should leverage the opportunity to combine geospatial, demographic, economic and engagement preference data in analyzing their operations, and/or leverage this type of data in developing new process models. For example, supply chain situations include location tracking through route and time, which can be combined with business process tracking. Life sciences generate enormous volumes of data in clinical trials, genomic research and environmental analysis as contributing factors to health conditions. CIOs and IT leaders should investigate the opportunities and understand the challenges of extreme information management. Gartner estimates that organizations which have introduced the full spectrum of extreme information management issues into their information management strategies by 2015 will begin to outperform their unprepared competitors within their industry sectors by 20% in every available financial metric.

Integrate Information

Enterprisewide Metadata Repositories

Analysis by Michael Blechar


In "Gartner Clarifies the Definition of Metadata," Gartner defines metadata as "information thatdescribes various facets of an information asset to improve its usability throughout its life cycle."Generally speaking, the more valuable the information asset, the more critical managing themetadata about it becomes, because the contextual definition of metadata provides understandingthat unlocks the value of data. Examples of metadata are abstracted levels of information about thecharacteristics of an information asset, such as its name, location, perceived importance, quality orvalue to the organization, as well as its relationship to other information assets.

Metadata can be stored as artifacts in "metadata repositories" in the form of digital data about information assets that the enterprise wants to manage. Metadata repositories are used to document and manage metadata (in terms of governance, compliance, security and collaborative sharing), and to perform analysis (such as change impact analysis and gap analysis) using the metadata. Repositories can also be used to publish reusable assets (such as application and data services) and to browse metadata during life cycle activities (design, testing, release management and so on). As a result, metadata management and repositories are a key component of the implementation of the Gartner Information Capabilities Framework (see "Defining the Scope of Metadata Management for the Information Capabilities Framework" for more details).
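
As a rough illustration (the fields and assets below are hypothetical, not a product schema), a repository entry and a basic change-impact query might look like this:

```python
# Hypothetical shape of a metadata repository entry and a simple
# change-impact query over asset relationships.

from dataclasses import dataclass, field

@dataclass
class AssetMetadata:
    name: str
    location: str            # where the asset physically lives
    steward: str             # who is accountable for it
    quality_rating: float    # perceived quality/value to the organization
    feeds: list = field(default_factory=list)  # downstream dependent assets

repo = {
    "crm.customer": AssetMetadata("crm.customer", "CRM DB", "sales ops", 0.9,
                                  feeds=["dw.dim_customer"]),
    "dw.dim_customer": AssetMetadata("dw.dim_customer", "warehouse", "BI team",
                                     0.8, feeds=["bi.churn_report"]),
    "bi.churn_report": AssetMetadata("bi.churn_report", "BI portal", "BI team", 0.8),
}

def impact_of_change(asset_name):
    """Everything downstream of an asset: a basic impact analysis."""
    impacted, queue = set(), [asset_name]
    while queue:
        for child in repo[queue.pop()].feeds:
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted

print(impact_of_change("crm.customer"))  # both downstream assets
```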

In "The Eight Common Sources of Metadata" we have explored a range of solutions to meet"enterprisewide" metadata management needs. These include several categories of metadatarepositories, such as those used in support of tool suites (tool suite repositories), project-levelinitiatives and programs (community-based repositories), and those used to federate andconsolidate metadata from multiple sources (enterprise repositories) to manage metadata in a more"enterprisewide" fashion.

Most Global 1000 enterprises own one or more repositories that purport to address the need for enterprisewide metadata management. However, few organizations are using their solutions effectively in an "enterprisewide manner" to either federate their metadata across tool suites or community-based repositories, or to consolidate the metadata into one main enterprise repository (see "Six Common Approaches to Metadata Federation and Consolidation"). This is due to the difficulties related to integrating metadata with an enterprisewide breadth of scope.

The growth in volume, velocity, variety and complexity of information, and the new use cases with insatiable demand for real-time access to socially mediated and context-aware information, are dramatically increasing the need for improved metadata management. And these in turn also raise new enterprisewide governance, risk and compliance needs which require more integrated (federated or consolidated) repository solutions. Unfortunately, there are no easy solutions to managing metadata on an enterprisewide basis, so, not surprisingly, there is no ideal solution in terms of the metadata repository market to meet the need at this point in time.

We are seeing more and more organizations — even those that already own enterprise repositories — acquiring several other "best-of-breed" repositories, each focused on different communities of users in projects and programs involving data warehousing, MDM, business process modeling and analysis, service-oriented architecture (SOA) and data integration, to name just a few "types of communities." In each case, these community-focused repositories have shown benefits in improved quality and productivity through an improved understanding of the artifacts, the impact queries and the reuse of assets, such as data and process artifacts, services and components. This has resulted in the "subsetting" of what once was the enterprise repository market into smaller "communities of interest," using solutions that are less expensive and easier to manage. However, attempting to federate metadata across multiple repositories to provide an "enterprisewide view of metadata" is no simple task.

Because of these metadata federation issues, enterprise repository vendors are now marketing their solutions "a piece at a time" — in other words, starting with just the subset required for the needs of the first community (for example, data warehousing) and then adding other components for the needs of the second, third and subsequent communities (like MDM or SOA). This has the advantage of allowing each community to work within its own area of focus without having to worry about the issue of federating multiple community repositories when other communities are added. Or, in some cases, organizations are starting with the best-of-breed community repositories and, instead of federating across them, are passing subsets of metadata from each of the community repositories to an enterprise repository for consolidation and easier reporting. However, even in these cases, organizations are not reverting to the use of enterprise repositories on the scale seen in the 1970s.

While the need for enterprisewide metadata management is not being met by current technologies, we include this in the top 10 technology trends for 2012 due to organizations' increased focus on using data to improve business operations. This requires greater information and metadata management, and demands that enterprises broaden their understanding and skills in enterprisewide metadata management, starting with federation.

Recommendations:

By providing understanding, governance, change impact analysis and improved levels of reuse of information assets, metadata repositories can have a significant impact on the business.

■ Define metadata in a way that demonstrates the value of information to the organization.

■ Develop a metadata management strategy which incrementally improves the standardization and sharing of metadata.

■ Do not underestimate the importance of metadata management to the success of the enterprise's ability to create value from information assets, nor the challenges in successfully managing the metadata.

■ Identify the uses of the term "metadata" throughout the organization to help organizations determine the level of effort necessary to implement metadata management.

■ Form a cross-organizational team to reach agreement on the fundamental purpose of metadata and on what information asset is being described.

■ Identify the scope of your metadata management initiatives based on business opportunities and threats and the ability to govern the metadata you capture.

■ Be selective in which sources you use, since (like the data being described) metadata can be redundant, inconsistent and erroneous, depending on the chosen source.


■ Start with evaluating the metadata management tools you currently have, including their federation/integration capabilities.

When looking to acquire a metadata repository, or to evaluate the metadata management capabilities of a given technology, use the analytic hierarchy process (AHP) described in "Decision Framework for Evaluating Metadata Repositories" (or something similar), which includes a decision framework for defining and weighting criteria in areas such as tool functionality, vendor execution, service and support, vendor vision and cost.
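
In spirit, such a framework reduces the choice to weighted criteria scores. The sketch below is a deliberate simplification: a full AHP derives the weights from pairwise-comparison matrices and checks their consistency, and all weights, candidates and scores here are invented.

```python
# Simplified weighted-criteria scoring in the spirit of AHP.
# A full AHP derives the weights from pairwise-comparison matrices and
# checks their consistency; here the weights are assigned directly.

weights = {"functionality": 0.35, "vendor_execution": 0.20,
           "service_support": 0.15, "vendor_vision": 0.15, "cost": 0.15}

candidates = {                     # scores on a 1-5 scale, illustrative only
    "Repository A": {"functionality": 4, "vendor_execution": 5,
                     "service_support": 3, "vendor_vision": 4, "cost": 2},
    "Repository B": {"functionality": 3, "vendor_execution": 3,
                     "service_support": 4, "vendor_vision": 3, "cost": 5},
}

def overall(scores):
    return sum(weights[criterion] * s for criterion, s in scores.items())

for name, scores in sorted(candidates.items(), key=lambda kv: -overall(kv[1])):
    print(f"{name}: {overall(scores):.2f}")
```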

Data/Application Integration Convergence

Analysis by Ted Friedman and Jess Thompson

Application integration enables applications that were designed independently to work together. In the mid-1990s, most IT organizations thought of application integration as creating interfaces that moved data between applications (data consistency integration) and that executed applications to complete processes (multistep process integration). Since then, organizations have extended the styles of integration to include composite applications, which use existing assets to eliminate the redundant development of business logic and the creation of duplicate data.

Data integration enables data structures that were designed independently to be leveraged together. The discipline of data integration comprises the practices, architectural techniques and tools for achieving consistent access to, and delivery of, data across the spectrum of data subject areas and data structure types in the enterprise, to meet the data consumption requirements of all applications and business processes. As such, data integration capabilities are at the heart of the information infrastructure, and will power the alignment and delivery of data in support of various use cases such as BI/analytics, MDM and more.

In most companies, different teams within the IT organization handle application integration and data integration. As IT staff become experienced in those disciplines, they develop different approaches. Application integration staff tend to focus on making independently designed applications work together, and the data management and integration staff focus on enabling data structures that are designed independently to be leveraged together, such as by aggregating data. Yet these different approaches aim to support the same business strategy and goals. Organizations can use both application integration techniques and data integration techniques to solve many of the same problems. The choice of one approach versus the other is often made tactically, driven from one point of view or the other. But application integration and data integration are better together. The challenge goes beyond simply deploying both types of technology. Instead, organizations need to consider both approaches when looking at a particular integration problem, and then choose the most appropriate solution for that problem, which may include both application integration and data integration techniques.

The differences in functionality between the two disciplines and their respective technology classes mean that each is best suited to certain problem types (see Figure 1). The common problem types can often be readily solved with either discipline and its related technologies. The problem types indicated as distinct represent sweet spots best addressed by one or the other. It is critical to note that there is significant (and growing) overlap across the problem space. It may not be the "best fit" or the most effective choice, but it is entirely possible to apply one discipline and technology class to virtually any problem type.

However, doing so can dramatically increase deployment cost and risk — substantial augmentation (in the form of custom coding, manual processes and additional administrative overhead) is likely to be required to completely meet requirements, or certain requirements may go unmet.

An increasing number of organizations, as seen in an upward trend in Gartner client inquiries on this topic, are recognizing the value of managing integration in a consistent way across both disciplines. This means that they are considering both application integration and data integration techniques and tools as possible solutions for a given integration problem, and then choosing the "best fit" approach based on the problem characteristics.

Further, they are seeking ways in which both disciplines and the associated tools can be used together to address problems more completely and effectively than with a single approach to integration.

Figure 1. Common Features and Use Cases for Application Integration and Data Integration

Features

App Integration: Monitoring; Quality of service; Community management; Message-based transformation

Common Features: Communications; Transformation; Adapters; Security; Governance; Management and administration

Data Integration: Modeling/metadata; Data profiling/quality; Management and administration; Set-based transformations; Data federation/virtualization

Problem Types and Use Cases

Common: Data consistency; SOA support; B2B; Master data management

App Integration: Composite apps; Multistep processes

Data Integration: Data preparation for BI/analytics; Data migration/consolidation; HA/DR (replication)

BI = business intelligence, DR = disaster recovery, HA = high availability, SOA = service-oriented architecture

Source: Gartner (December 2011)


Recommendations:

■ Lay out a vision for bringing together your application integration and data integration technologies.

■ Identify your IT teams with responsibility for application integration and data integration.

■ Inventory the technologies supporting your application integration and data integration projects.

■ Establish centers of excellence to develop, plan and execute the integration of your application integration and data integration technologies.

■ Actively apply reference architectures that include opportunities to use application integration and data integration capabilities together.

■ Establish and begin building best practices that use those reference architectures.

Share Information

Information Semantic Services

Analysis by Mark Beyer

Information semantic services are a layer of services that provides a specific entry or "gate" into information management functions or capabilities. Information semantics follow styles or approaches, and those approaches support specific assumptions on how an application interfaces with the data it uses. For greater detail on this approach, see "Information Management in the 21st Century Is About All Kinds of Semantics." Categories of information semantic services include:

■ Dedicated — Applications assume they have priority and the authority to manage the underlying information (legacy applications, for example).

■ Registry — Applications assume that the authority for management is derived from governance processes in other data handlers (for example, reading from other applications, MDM or ERP).

■ Consolidation — Applications assume the authority to collect and collate information in a physical repository (for example, data integration and DW).

■ External — Applications read data from another single source of information governance, usually external to the organization, but it can be external to the application (for example, reading from an ERP app).

■ Auditing — Applications read metadata or data as created in a governance model (for example, taxonomy/ontology), and use comparative analysis (BI, for example).

■ Orchestration — Applications use a convergence of multiple information assets, including reformatting or enrichment as needed (for example, complex-event processing).

A modern IM approach will continue a slow evolution toward information governance practices and will become increasingly independent from application design. At the same time, services enablement will become the standard even in packaged applications. Packaged applications will evolve to become composite services networks, and custom applications will want to utilize information services in common with those packaged applications. A key component in this approach is to negotiate how different applications will interact with the information assets in the organization. Much of this interaction will be based on semantics that interpret application needs for writing, reading and presenting data, so applications can leverage finer-grained and easily recomposed data architectures that address describing, organizing, integrating, sharing and governing information through a common framework and shared governance model. However, while service architectures are increasingly important, they are not the only option. Enterprise information architecture is responsible for determining if and when service approaches should be used, in contrast to the other options that are available (such as single-purpose applications with dedicated repositories).

Gartner's view is that consistent semantic styles will emerge to enable widely varied applications to efficiently reuse a comprehensive information architecture, instead of having to write dedicated application-to-data interfaces (see "Information Management in the 21st Century Is About All Kinds of Semantics").

Recommendations:

Organizations should focus on delivering a semantic layer to support the use of granular data services. They should also:

■ Develop an enterprise information architecture approach that includes a focused strategy for addressing high-value information reuse, and which includes all of the aspects of the information capabilities framework, through the following steps.

■ Create a candidate pool of high-reuse information assets. Identify which systems in the organization populate and manage data as the originating source of information that is then shared across multiple business processes. Then further qualify this list to identify those systems that are targeted for use in planned future deployments. These are "deep waterfalls" of data management, with plans for them to get even "deeper."

■ Develop a "skeletal" business process model that focuses on locations where informationusage crosses from one business process to another. Following the guidance of the business,determine if these "crossover" points are also key performance metrics. Quantify and qualify thedefinitional, format and governance differences of the data in the different processes todetermine the level of work needed to develop a reusable data architecture at only thosecrossover points where there is also a candidate key metric input.

■ Identify which information assets move most frequently from one business process to another, and the systems involved. Determine the level of change to the data as it moves through the infrastructure. If change is minimal when audited back to the source system, do not revise those systems or the data governance, as that is evidence of an accepted governance and management strategy. For those cases wherein the data is augmented or values are changed, substituted or overwritten, the enterprise architecture team needs to lead in creating an independent model and follow a significantly more orchestration/implementation-style approach.

The Logical Data Warehouse

Analysis by Mark Beyer

DW architecture is undergoing an important evolution, compared with the relative stasis of the previous 25 years. While the term "data warehouse" was coined around 1989, the architectural style predated the term (at American Airlines, Frito-Lay and Coca-Cola).

At its core, a DW is a negotiated, consistent logical model that is populated using predefined transformation processes. Over the years, the various options — centralized enterprise DW, federated marts, a hub-and-spoke array of central warehouse with dependent marts, and virtual warehouse — have all served to emphasize certain aspects of the service expectations for a DW. The common thread running through all styles is that they were repository-oriented. This, however, is changing: the DW is evolving from competing repository concepts to include a fully enabled data management and information-processing platform. This new warehouse forces a complete rethink of how data is manipulated, and of where in the architecture each type of processing occurs to support transformation and integration. It also introduces a governance model that is only loosely coupled with data models and file structures, as opposed to the very tight, physical orientation previously used.

This new type of warehouse — the Logical Data Warehouse (LDW) — is an information management and access engine that takes an architectural approach which de-emphasizes repositories in favor of new guidelines:

■ The LDW follows a semantic directive to orchestrate the consolidation and sharing of information assets, as opposed to one that focuses exclusively on storing integrated datasets. The LDW is highly dependent upon the introduction of information semantic services (see the Information Semantic Services section above).

■ The semantics are described by governance rules from data creation and use-case business processes in a data management layer, instead of via a negotiated, static transformation process located within individual tools or platforms.

■ Integration leverages both steady-state data assets in repositories and services, in a flexible, audited model, via the best available optimization and comprehension solution.

To develop an LDW, enterprises should start by introducing the LDW concepts used for dynamic consolidation, integration and implementation (see "Does the 21st-Century "Big Data" Warehouse Mean the End of the Enterprise Data Warehouse?").

■ The data integration process can be broken into sourcing, collation, data quality, formatting and domain governance segments, based on information availability and governance rules. For example, the sourcing/extraction process can be a registry semantic layer using "describe" verbs that tell the service "where" the data is. If data for "person" is located in documents, clickstreams and enterprise systems, one service can use textual analysis and search for documents, another service can use MapReduce to read massive volumes of tags in "clicks," and a traditional native driver access approach can pull data from the enterprise system database. A data quality process can then verify the work done by each service and undertake an enrichment and/or value substitution process, before prepping the data for delivery. If the data is dynamic and constantly changing, the data integration process can deliver a virtual data object; but if the data is already validated by an MDM process and fairly static, it can be loaded into a table or file. A final service can determine the appropriate load or access format and put the data into that format.

■ In relation to latency issues, you are no longer bound by load restrictions. It is possible to indicate in a metadata layer that there are different requirements for different analytic end-use cases. For example, one department may require higher-quality data but tolerate higher-latency delivery (it would get data from fully validated tables), while another department might be prepared to risk inconsistencies in data but require low latency (it would get a combined-registry delivery of yesterday's data in the tables with today's data from the online transaction processing system — "dirty but fast"). Or, instead of this fixed approach, you could have a service that negotiates whether the quality SLA is being met for each of the departments and switches between strategies dynamically (a sketch of such a service follows this list). For example, the department requiring low latency might receive data from the warehouse repository in the morning, after the previous night's load had brought everything up to date, but in the afternoon it might receive a composite view. And, instead of switching at a predetermined time of day, the switch would be based on how far out of synchronization the two sources are, judged on record counts and data quality ratings.

■ A dynamic service that determines when to write summary or aggregate data is generally faster than one that performs a query-time summary of detailed rows. It could even switch on the basis of CPU and storage utilization/performance audits, and change its approach throughout the day. It could also switch dynamically between approaches on the basis of system audits that determine whether more memory is added for caching, or even if it is worthwhile to perform caching.

■ Adding external data, based on services written to read and analyze those data sources, also becomes easier. For example, adding operational-technology data such as the millions of records generated each day by RFID-enabled supply chain management tracking systems or utility smart grid meters requires massive data integration processing in a procedural manner when using traditional warehouses. But by developing two or three variants of the same MapReduce function, the LDW can orchestrate the preferred approach for different analyst audiences and leave the data in the source or the historian software (see "Historian Software and the Smart Grid: Questions and Misconceptions").
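
A minimal sketch of the SLA-switching service described above, with invented interfaces and thresholds, might look like the following; the point is only that the delivery strategy is chosen per query, from synchronization lag and quality ratings, rather than fixed at design time.

```python
# Hypothetical sketch: an LDW delivery service that switches between a
# validated repository and a "dirty but fast" composite view depending on
# how far out of sync the two sources are. All names are illustrative.

class Source:
    def __init__(self, rows, quality):
        self.rows, self.quality = rows, quality
    def record_count(self):
        return len(self.rows)
    def run(self, predicate):
        return [r for r in self.rows if predicate(r)]

def deliver(predicate, warehouse, oltp, max_lag, min_quality):
    lag = oltp.record_count() - warehouse.record_count()
    if lag <= max_lag and warehouse.quality >= min_quality:
        return warehouse.run(predicate)        # validated tables only
    # Composite view: validated history plus today's unvalidated records.
    extra = oltp.rows[warehouse.record_count():]
    return warehouse.run(predicate) + [r for r in extra if predicate(r)]

warehouse = Source(rows=[{"id": i} for i in range(100)], quality=0.95)
oltp = Source(rows=[{"id": i} for i in range(130)], quality=0.70)
# 30 records behind: a low-latency consumer gets the composite view.
print(len(deliver(lambda r: True, warehouse, oltp, max_lag=10, min_quality=0.9)))
```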

Note that with the LDW approach, the differing styles of support, such as federation, data repositories, messaging and reductions, are not mutually exclusive. They are just manifestations of data delivery. The focus is on getting the data first, then figuring out the delivery approach that best achieves the SLA with the querying application. The transformations occur in separate services.


Recommendations:

■ Start your evolution toward an LDW by identifying data assets that are not easily addressed by traditional data integration approaches and/or easily supported by a "single version of the truth." Consider all technology options for data access and do not focus only on consolidated repositories. This is especially relevant to "big data" issues.

■ Identify pilot projects in which to use LDW concepts by focusing on highly volatile and significantly interdependent business processes.

■ Use an LDW to create a single, logically consistent information resource independent of any semantic layer that is specific to an analytic platform. The LDW should manage reused semantics and reused data.

Integrate/Store Information

Hadoop MapReduce Implementations

Analysis by Merv Adrian and Donald Feinberg

MapReduce is a programming model for the processing and storage of both structured and semistructured data using parallel programming techniques. Although there are many implementations of MapReduce, Hadoop MapReduce is the most widely used. MapReduce and HDFS are subprojects of Hadoop Common in the Apache project hierarchy and, as of November 2011, are commonly referred to as Apache Hadoop. The programming model originated in the paper "MapReduce: Simplified Data Processing on Large Clusters," presented by Google researchers at the Sixth Symposium on Operating System Design and Implementation in 2004; the open-source Hadoop system grew out of that work.

The business value of Hadoop MapReduce lies in the performance improvement achieved by enabling the analysis of many complex mixed data types stored in a database management system (DBMS), HDFS or HBase (another Apache project: a distributed, versioned, column-oriented store modeled after Google's BigTable), in a parallel environment composed of inexpensive servers processing large amounts of data. The results of MapReduce programs can then be loaded into a DW for further processing, or used directly from within the programs.

MapReduce's other key use is as a data integration tool, to increase performance by reducing very large amounts of data to just what is needed in the DW. Due to the parallel nature of MapReduce, it increases the speed of extraction and transformation of the data.

In 2008, MapReduce was included in both Aster Data (acquired by Teradata) and Greenplum (acquired by EMC) as an internal function within the DBMS; bidirectional connectors were included in Vertica (acquired by HP) in 2009. In addition to the open-source Apache Software Foundation components, there are other stand-alone implementations available. The pioneering distribution, Cloudera Distribution for Hadoop (CDH), has been joined by Hortonworks Data Platform (HDP), from the company formed as a spinout from Yahoo with added funding from Benchmark Capital. Distributions add value by organizing multiple components into a single, preintegrated install unit, and typically add numerous additional components that extend core Hadoop. As of November 2011, the Apache Software Foundation lists 23 organizations that offer products that include Apache Hadoop, commercial support, and/or tools and utilities related to Hadoop (see http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support). Commercially, some of the uses of MapReduce are:

■ To process large sets of data, such as clickstream data from the Web (hence the development by Google).

■ To process complex or mixed data types.

■ As part of a data integration infrastructure, to process large amounts of data and then load the data into a DW for further processing.

Many MapReduce applications are also performed on small amounts of data, especially when the data contains complex mixed data types or involves complex multipass computations that include advanced analytics, such as pattern searching, clustering analysis or building "influencer networks."

MapReduce implementations represent a much wider class of processing concepts than this, however. The idea of a centralized repository for all enterprise data is becoming unmanageable, given the volume and complexity we have and the available hardware technology (see "Does the 21st-Century "Big Data" Warehouse Mean the End of the Enterprise Data Warehouse?"). A new architecture is evolving that leverages new software and new, less costly and better-performing hardware to create information assets on demand that may be either declared or inferred (sourced dynamically, based on context), or both. The processing involved treats search, mashups, metadata, integration and repositories as equal contributors. This requires a new type of processing in which massively parallel capabilities must be leveraged. The main reason is that the volume of data, and the widely variant nature of its form/format (which comprises "extreme data") and content relative to the extremely wide variations in use cases, make centralized repositories of "all things informative" unrealistic. Hadoop MapReduce represents the first popular implementation of this new class of combined processing and storage.

Here is an example of how it works. We have four hours of stock ticker data. For two hours, nothing meaningful happens. The Mapper puts two hours of data together and determines, based on a statistical algorithm, that there is nothing of value in this period. It then determines that the data for the next 25 minutes is important and groups that data into smaller clusters (each, for example, of 30 seconds). Then, after 25 minutes, the data returns to the same fluctuation pattern as before, but with a much higher average selling price, prompting the creation of a new cluster, different from that for the first two hours. Now Reduce works on the clusters, creating one two-hour-long record, multiple 30-second records and a 95-minute record. If the data actually contained billions of records for 400 stocks, the system would distribute portions of the records across a massively parallel processing complex. By the time MapReduce has finished its first run, it has created many different intervals and values. MapReduce then runs again, reconfiguring the clustered data across the system and finding better affinities. It continues to repeat until it finds the answer that best aligns the results with the original statistical algorithm. In other words, the amount of data that can be processed is no longer limited by a DBMS running on a dedicated database. Instead, MapReduce can scale across all storage and all available servers, outside the DBMS, and then feed answers in the form of records to the DBMS for final delivery.
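
A toy, single-process rendering of that example may help; the bucketing and merge thresholds below are invented, and a real Hadoop job would run the same map and reduce logic distributed across HDFS blocks.

```python
# Toy rendering of the ticker example: map assigns each trade to a
# 30-second bucket and computes its statistics; reduce merges adjacent
# buckets whose behavior is statistically similar into one record.
# Thresholds and data are invented for illustration.

from statistics import mean

def map_phase(ticks, bucket_seconds=30):
    buckets = {}
    for ts, price in ticks:                       # (epoch_seconds, price)
        buckets.setdefault(ts // bucket_seconds, []).append(price)
    return sorted((b, mean(p), max(p) - min(p)) for b, p in buckets.items())

def reduce_phase(buckets, tolerance=0.5):
    merged = [list(buckets[0])]
    for bucket, avg, spread in buckets[1:]:
        last = merged[-1]
        if abs(avg - last[1]) <= tolerance:       # "nothing meaningful happened"
            last[0], last[1], last[2] = bucket, (last[1] + avg) / 2, max(last[2], spread)
        else:                                     # regime change: new record
            merged.append([bucket, avg, spread])
    return merged

ticks = [(t, 10.0) for t in range(0, 7200, 10)] + \
        [(t, 10.0 + (t % 60) / 10) for t in range(7200, 8700, 10)] + \
        [(t, 14.0) for t in range(8700, 14400, 10)]
print(len(reduce_phase(map_phase(ticks))))  # far fewer records than 1,440 ticks
```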

Recommendations:

■ Organizations with highly technical developers or skilled statisticians should consider using MapReduce to perform complex analytics "outside" the DW on external data sources.

■ Organizations with large amounts of historical data stored outside their DW should consider using MapReduce as an alternative to their current data integration tools, as the parallel processing will deliver better performance. Today, this is one of the primary uses of MapReduce implementations. Note that since the data loaded into a MapReduce process can come from many different sources, there is none of the implied data quality or referential integrity found in some DBMSs; this would have to come from the source systems creating the data.

■ Organizations should carefully evaluate the use of MapReduce inside a DBMS, due to the volume of data that would be loaded into the data warehouse. This could result in very large database sizes, especially with complex, mixed data types, which would have a detrimental effect on performance and maintenance and deliver no clear benefit.

Govern Information

Multiple Domain Master Data Management

Analysis by Andrew White and John Radcliffe

Gartner defines master data as "the consistent and uniform set of identifiers and attributes that describe the core entities of the enterprise, and are used across multiple business processes." Core entities include parties, places and things. Groupings of master data include organizational hierarchies, sales territories, product roll-ups, pricing lists, customer segmentations and preferred suppliers.
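
As a rough sketch of that definition (the fields and identifiers below are hypothetical), a customer "golden record" carries one enterprise-wide identity, cross-referenced to each source system's local identifier:

```python
# Hypothetical "golden record" for the customer master data domain:
# one consistent set of identifiers and attributes, cross-referenced to
# the source systems that each business process actually runs on.

customer_master = {
    "master_id": "CUST-000042",
    "legal_name": "Acme Industries Ltd.",
    "segment": "enterprise",               # used by marketing and pricing
    "preferred_supplier": False,
    "source_ids": {                        # where each local identity lives
        "crm": "0013000001aBcDe",
        "erp": "400021",
        "billing": "ACME-LTD-01",
    },
}

def resolve(system, local_id, index):
    """Look up the enterprise-wide master identity for a local record."""
    for record in index:
        if record["source_ids"].get(system) == local_id:
            return record["master_id"]
    return None

print(resolve("erp", "400021", [customer_master]))  # CUST-000042
```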

The different master data domains include domains with similar complexities, oriented around provinces:

■ The "party" province:

■ Customer: including consumers, business customers and channel/trading partners, prospects and citizens.

■ Supplier: including vendors and suppliers.

■ Human capital: including employees and contractors.

■ The "thing" province:

■ Product/service: including products, parts, assets, tools, services, accounts and policies.

■ Sell side, or customer-facing.


■ Buy side, or supplier-facing.

■ Financial: financial charts of account (also some similarity with hierarchies/relationships).

■ Location: including locations, offices, regional alignments and geographies.

■ Hierarchies, or relationships.

■ Reference data: sets of permissible values in look-up tables, often used in defining master data (for example, unit of measure, currency conversion, calendar).

The scope of a true enterprise MDM strategy will naturally need to support multiple master data domains (described above), but the number, type and characteristics of those domains will differ depending on the organization's structure, use cases for MDM, implementation style and industry sector (see "The Five Vectors of Complexity That Define Your MDM Strategy"). Since interest in MDM started, most organizations have acquired MDM technology on the basis of individual MDM initiatives which mapped to specific master data domains. Now they are increasingly interested in mapping out how they will support multiple MDM data domains over time. Many are hoping to enable the revenue generation, service enhancement, time-to-market reduction, cost optimization, growth enablement, and risk and regulatory compliance business benefits typically associated with MDM initiatives, while minimizing the proliferation of different MDM vendors and products, with its potential costs and complexities.

In the long term we expect truly multidomain MDM technology to emerge, but this market dynamic is still nascent. Multidomain MDM will differ from multiple domain MDM: multidomain MDM solutions are purpose-built solutions targeted at the multidomain technology requirements of an MDM program, whereas multiple domain MDM represents multiple instances of single-domain-centric or use-case-centric MDM technologies, each possibly from a different vendor.

Some vendors have evolved with a deep focus on a single domain, such as MDM of Product Data or MDM of Customer Data. Some vendors remain so focused; others have expanded their focus (and offerings) toward a multiple domain offering using multiple products, sometimes oriented around specific industries. If the vendor offers such capability in one technology solution, we call this a multidomain MDM technology solution. If the vendor offers such capability through multiple, specific technology solutions, we call this a multiple domain MDM implementation (of single-domain-oriented technology solutions). We introduce this terminology to make the vendor perspective clear to users. However, end-user organizations use the term "multidomain MDM" and may not delineate what type of technology approach they seek; vendors exploit this by using the term "multidomain MDM" often independently of their technology approach. This creates potential confusion in the market and can create a mismatch in expectations between user and vendor.

Recommendations:

Start creating a holistic, business-driven MDM vision and strategy to meet the MDM needs of your organization. Ensure that you address governance, organizational, process and metrics issues, and that you create a technical reference architecture for MDM. Be aware that for larger firms, and/or those with complex information environments, it is likely that two or more MDM systems will be needed to meet all MDM requirements. Also, ensure that your MDM initiative is aligned with the objectives of your organization's EIM program. Without an overarching initiative like EIM, even if MDM is the only initiative adopted, it is very possible that informational and organizational silos will not be broken down completely, because the information infrastructure will be stretched in several, possibly competing, ways.

While keeping the long-term vision in mind, approach individual steps of the MDM journey based on your business priorities and the maturity of the MDM technology available. Proven MDM technology and implementation experience exist for managing customer and product data in operational use cases, and experience is building with other data domains, such as supplier, employee and location. There is also experience and proven technology in niche analytical MDM products for managing financial and other types of data.

For the next three years, be prepared to engage multiple domain MDM vendors, or at least use multiple MDM products from the same vendor, to meet all your MDM requirements. Build an MDM road map, working with your business stakeholders and your enterprise architecture team, to demonstrate how the different facets of MDM will be addressed over time. Align, or leverage, investment in resources associated with a BI or IM competency center.

MDM will require human resources: more to start with, and less once things are properly under way. When assessing your human resources requirements, take account of the need to rationalize multiple MDM initiatives in your organization over the long term.

In summary:

■ Organizations should take a long-term strategic view of MDM, but balance it with a pragmatic outlook that delivers business value in stages and leverages technology for multiple master data domains.

■ Develop an MDM strategy based on the "vectors of MDM complexity" requirements for your organization. Evaluate and select MDM vendors and products against this backdrop.

■ Organizations looking for multidomain MDM technology need to be realistic about MDM vendors' marketing claims.

■ MDM vendors need to improve their ability to provide in-depth support for multiple master data domains. This may involve providing more breadth and depth in a single MDM product, or tighter integration and consistency of usage across multiple MDM products.

Data Stewardship Applications

Analysis by Andrew White and John Radcliffe

Governance of data is a people- and process-oriented discipline that forms a key part of any EIM program. Governance includes the specification of decision rights and an accountability framework to encourage desirable behaviors in the valuation, creation, storage, use, archiving and deletion of enterprise information enterprisewide. Technology plays a key role in the enforcement of the policies and decision rights that governance produces. Data stewardship applications specifically targeted at structured data are now emerging. MDM is the first specific EIM program that really formalized the necessary technology to support the role of the master data (or, more generally, data) steward. Before the emergence of MDM, individual data quality, data profiling and other tools produced some analysis and data that would help with stewardship activities, but often informally, oriented at one task at a time, and not at the specific role of stewardship. MDM solutions have attempted to bring some additional capability together, but since 2008 a new class of technology has formed that sits alongside the established MDM tools. The early MDM solutions focused their stewardship capabilities only on the data in their own hubs; these offerings can now span "hubs of hubs" and any other stores of structured data. Overall, MDM's maturity is driving a lot of the hype in this technology, even though these tools could be (and are being) used by some organizations to govern reference data and other forms of enterprise information.

The maturation of this toolset will take time, and it will continue to evolve through different groupings of tools: initially, management processes supported by business rule engines; workflow; dashboards presenting analytics and to-do lists; more general master data reporting; metadata management of master data across application data stores, with data life cycle visibility; as well as MDM hubs and other visibility and management tools that express and operationalize governance. In the future, we expect this technology to be applied to the governance of other data, such as content, digital assets and records.
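As a rough sketch of the rule-engine-plus-to-do-list pattern described above (the rule names, record fields and sample data are hypothetical):

```python
KNOWN_PARENT_IDS = {"C-1", "C-2", None}  # None marks the top of the hierarchy

# Hypothetical stewardship rules: each inspects one master record and reports failure.
RULES = [
    ("missing_postcode", lambda rec: not rec.get("postcode")),
    ("orphaned_parent",  lambda rec: rec.get("parent_id") not in KNOWN_PARENT_IDS),
]

def build_todo_list(records):
    """Run every rule over every record; each failure becomes an open steward task."""
    tasks = []
    for rec in records:
        for rule_name, failed in RULES:
            if failed(rec):
                tasks.append({"record": rec["id"], "rule": rule_name, "status": "open"})
    return tasks

customers = [
    {"id": "C-1", "postcode": "06902", "parent_id": None},
    {"id": "C-2", "postcode": "", "parent_id": "C-9"},  # two rule violations
]
for task in build_todo_list(customers):
    print(task)  # feeds the steward's dashboard or to-do list
```

A data stewardship application wraps this loop in workflow and dashboards, so that each open task is routed to, and resolved by, the accountable steward rather than handled ad hoc.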

The emergence of composite, even packaged, applications for stewardship will soon challenge established MDM implementation efforts, because each new MDM effort brings native governance requirements, which vary greatly. In the long term, mature IT organizations will formally integrate business process management tools with MDM tools to express the governance routines, along with business activity monitoring and corporate performance management. Many self-named MDM solutions offer parts of this technology scope, although often not as an integrated environment for use by the master data steward, or to be deployed across all other master data stores.

Recommendations:

Enterprises should realize that the governance of data is a core component of any EIM discipline. Be explicit about establishing the necessary stewardship function in the business (not IT), and support it with governance overseers and the necessary set of tools and technologies to operationalize the day-to-day work of a business (data) steward. Don't assume that your MDM vendor automatically provides all the necessary tools and solutions in support of data stewardship; evaluate the options when and if you find differences between MDM vendor offerings and newly emerging data stewardship solutions. Select vendors with proven experience that aligns with your immediate priority, and whose vision aligns with where your information governance programs will take you.

Recommended Reading

Some documents may not be available as part of your current Gartner subscription.

"Hype Cycle for Analytic Applications, 2011"


"Hype Cycle for Data Management, 2011"

"Hype Cycle for Enterprise Information Management, 2011"

"Hype Cycle for Master Data Management, 2011"

"Information Management in the 21st Century"

"Information Management in the 21st Century Is About All Kinds of Semantics"

"The Eight Common Sources of Metadata"

"Does the 21st-Century "Big Data" Warehouse Mean the End of the Enterprise Data Warehouse?"

"The Five Vectors of Complexity That Define Your MDM Strategy"

"A View of Master Data Management Vendors' Experience In Handling Multiple Master DataDomains"


Regional Headquarters

Corporate Headquarters
56 Top Gallant Road
Stamford, CT 06902-7700
USA
+1 203 964 0096

Japan Headquarters
Gartner Japan Ltd.
Aobadai Hills, 6F
7-7, Aobadai, 4-chome
Meguro-ku, Tokyo 153-0042
JAPAN
+81 3 3481 3670

European Headquarters
Tamesis
The Glanty
Egham
Surrey, TW20 9AW
UNITED KINGDOM
+44 1784 431611

Latin America Headquarters
Gartner do Brazil
Av. das Nações Unidas, 12551
9° andar, World Trade Center
04578-903, São Paulo SP
BRAZIL
+55 11 3443 1509

Asia/Pacific Headquarters
Gartner Australasia Pty. Ltd.
Level 9, 141 Walker Street
North Sydney
New South Wales 2060
AUSTRALIA
+61 2 9459 4600

© 2011 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. or its affiliates. This publication may not be reproduced or distributed in any form without Gartner's prior written permission. The information contained in this publication has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information and shall have no liability for errors, omissions or inadequacies in such information. This publication consists of the opinions of Gartner's research organization and should not be construed as statements of fact. The opinions expressed herein are subject to change without notice. Although Gartner research may include a discussion of related legal issues, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner is a public company, and its shareholders may include firms and funds that have financial interests in entities covered in Gartner research. Gartner's Board of Directors may include senior managers of these firms or funds. Gartner research is produced independently by its research organization without input or influence from these firms, funds or their managers. For further information on the independence and integrity of Gartner research, see "Guiding Principles on Independence and Objectivity" on its website, http://www.gartner.com/technology/about/ombudsman/omb_guide2.jsp.
