
Next Data Center Challenge? Enterprise Analytics and Big Data




7/28/2019 Next Data Center Challenge? Enterprise Analytics and Big Data

http://slidepdf.com/reader/full/next-data-center-challenge-enterprise-analytics-and-big-data 1/9


As more organizations embrace Enterprise Analytics and Big Data processing, they're finding that the size and compute-intensive nature of these workloads is stressing their existing infrastructure – and this is only the beginning. Business-side management is going to demand ever-faster solutions to more complex problems. Fast hardware, designed to handle huge throughput, can make the difference between finding usable insights vs. documenting lost opportunities. IBM thinks they can give customers a performance edge with their new PowerLinux product line...

Big Data (and enterprise analytics) is all the rage these days, at least in the business and IT industry press, but it's really still in its infancy. According to Deloitte, more than 90% of the Fortune 500 will have dipped their toes into the analytics waters by the end of 2012.

Spending estimates for Analytics/Big Data projects are all over the map, ranging from $20 billion all the way to $50 billion worldwide. But no matter how you slice it, Big Data and enterprise analytics are new and very fast-growing markets.

As with other hot IT trends, the terminology gets very loose as the hype intensifies. The term 'Big Data,' for example, used to refer almost exclusively to clients using Hadoop or MapReduce to analyze unstructured data typically generated on the web. Today, the term is being used to refer to almost any big or difficult analytical computing problem.

But the analytics wave encompasses much more than MapReduce and Hadoop; it also includes deep data mining, visualization, modeling & simulation, and other types of processing. A better term to describe this overall trend is Enterprise Analytics (EA), which we'll be using in this research report.
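For readers new to the pattern, the MapReduce model at the heart of Hadoop can be illustrated with a minimal single-process sketch (plain Python, no Hadoop required): a map step emits key/value pairs, a shuffle groups them by key, and a reduce step aggregates each group. Real deployments run each phase in parallel across a cluster.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values (here, a simple sum).
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big insights", "big compute"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["big"])  # 3
```

The same three-phase shape underlies word counts, log aggregation, and most of the "unstructured web data" jobs the original Big Data label referred to.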

The benefits of enterprise analytics are considerable but still somewhat anecdotal. The companies that are getting the most out of it say the least about what they're doing and how they're doing it. The use of enterprise analytics has become a competitive weapon in a wide range of industries and will, if anything, become even more crucial in coming years.

At a macro level, a McKinsey & Company study ("Big data: The next frontier for innovation, competition, and productivity") identified a handful of broad categories where EA best creates value; most notably, how EA makes data more transparent and useful throughout the entire organization.

In general terms, the biggest enemy of organizations is variability – whether it's in financial results, product quality, customer loyalty, or employee productivity. With EA tools and enough processing resources, we can study vast amounts of data to discover and understand why we see these variations occur.

Once the sources of variability are identified, EA makes it easier to pinpoint potential variability-reducing solutions and to test them using models and simulations. Analytics can also significantly improve decision making by automating routine decisions and correctly identifying situations that require a higher level of attention.
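One simple way to hunt for sources of variability is to ask how much of a metric's total variance each candidate factor explains (a one-way ANOVA-style ratio). The sketch below uses only the standard library; the records, field names, and numbers are invented purely for illustration.

```python
from statistics import pvariance, mean

# Hypothetical records: (region, product_line, margin). Data and field
# names are invented to illustrate the technique, not taken from the report.
records = [
    ("east", "widgets", 0.10), ("east", "gadgets", 0.12),
    ("west", "widgets", 0.30), ("west", "gadgets", 0.33),
]

def explained_variance(records, key_index, value_index=2):
    # Fraction of total variance explained by grouping on one factor
    # (between-group variance divided by total variance).
    values = [r[value_index] for r in records]
    total_var = pvariance(values)
    grand_mean = mean(values)
    groups = {}
    for r in records:
        groups.setdefault(r[key_index], []).append(r[value_index])
    between = sum(
        len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()
    ) / len(values)
    return between / total_var

by_region = explained_variance(records, 0)
by_product = explained_variance(records, 1)
print(round(by_region, 2), round(by_product, 2))  # 0.98 0.01
```

Here region explains nearly all of the margin variability while product line explains almost none – exactly the kind of pinpointing the paragraph above describes, just at toy scale.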

The benefits from this data-led decision making can be profound. McKinsey estimates that retailers can potentially improve their margins by more than 60%, and that we could improve the effectiveness of health care while shaving 8% off of overall spending.

Academic research suggests that organizations adopting advanced analytics increase their productivity rates (and profits) by 5-6% vs. their competitors, which is an extraordinary improvement. It's no wonder that business-side management is pushing advanced analytics and Big Data initiatives.

Enterprise Analytics & HPC: Separated at Birth?

In general, Enterprise Analytics/Big Data workloads share a lot of common ground with scientific and technical computing (also known as High Performance Computing, or HPC) applications. Many of the techniques and even algorithms used in HPC have corresponding cousins that are highly useful in an enterprise context.

For example, research labs are constantly processing vast numbers of observations of natural phenomena, utilizing both structured and unstructured data. The amount of data and processing involved requires them to use clusters of small systems running highly parallel code in order to handle the workload at a reasonable cost and timeframe.

They'll use this data to build a model, and then use the model to predict the results arising from future interactions. These same methods and analysis techniques are being applied to fraud detection, customer loyalty, and a myriad of other business problems as part of the advanced analytics trend.

Scientists also look to uncover patterns in phenomena and understand what causes the patterns to exist, and what factors would change them. Companies are doing the same thing when they attempt to understand the impact of their marketing mix on sales and how changing things up might lead to greater success.

Enterprise Analytics & Data Center Stress

Both EA and HPC processing are some of the most demanding workloads in computing today. These applications can be highly compute-intensive, like a Monte Carlo simulation, or I/O-intensive, like trying to process a large and constant stream of sensor data in order to make real-time decisions. In many cases, an HPC or EA analysis is an iterative process that can be both compute- and I/O-intensive at different stages of the procedure.
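As a toy illustration of why Monte Carlo work is compute-bound: estimating π by sampling random points burns CPU time in direct proportion to the sample count, with essentially no I/O. Production simulations do the same thing with millions of scenario draws instead of coin flips.

```python
import random

def estimate_pi(samples, seed=42):
    # Draw random points in the unit square; the fraction that lands
    # inside the quarter circle approaches pi/4 as samples grow.
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / samples

print(estimate_pi(100_000))  # roughly 3.14
```

Accuracy improves only with the square root of the sample count, so halving the error quadruples the compute – the scaling behavior that makes these workloads so hungry for fast hardware.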

Data centers that are accustomed to handling traditional transaction-oriented workloads will discover that these new analytical workloads stress their existing systems and infrastructures to a greater degree, and in different ways, than anything they've seen before. They can stress individual systems and entire infrastructures to the breaking point.


One important difference between enterprise analytics and HPC requirements is the value of time. HPC labs typically don't face hard time constraints when it comes to processing. Their jobs might run for days or weeks before producing results.

In the enterprise, it's a different story. Many enterprise analytics workloads will be used to support automated or near real-time decision making. Speed, and reliable operation under heavy loads, will be key concerns in enterprise analytical processing.

The typical commodity system might be the right choice for the typical enterprise transactional workload, but it might not be able to handle the strain of a compute-intensive or I/O-hungry analytic application. Decision makers need to understand the requirements of these new applications and carefully match up system configurations and capabilities in order to meet the demands that will be placed on the systems by the business side of the organization.

Enterprise Analytics: A Systems Perspective

Pragmatically, organizations that don't already have a large enterprise analytics infrastructure in place will probably be getting into this type of computing incrementally. They'll start with pilot projects in a few areas of the company, primarily where they see 'low-hanging fruit' opportunities. They'll build on these results, and the EA usage model will then spread to more of the organization.

These limited pilot projects will likely consist of several analytical processes hosted on a smallish number of virtualized systems, with an emphasis on keeping start-up costs low to limit downside risks and increase the ROI of a successful implementation.

The requirement for high performance at low cost leads us to see Linux as the ideal operating environment. It's the dominant operating environment for HPC systems and the development platform for the vast majority of analytics ISVs.

Until recently, the only real choice for a server to host Linux was some flavor of either Intel- or AMD-based system accompanied by a virtualization suite – typically VMware's vSphere. However, IBM is shaking up this status quo by adding a variant of their Power system line into this 'industry standard' commodity server stew.

IBM’s PowerLinux Enters Commodity Fray

IBM has been the dominant vendor in HPC for almost a decade, with IBM systems representing more than 40% of the entries on the twice-yearly Top500 list of the fastest/largest systems in the world. A solid number of these systems are based on IBM's highly successful POWER processor and Power system architecture.

These systems have been designed from the ground up to handle very large scientific and enterprise computing workloads. They're highly integrated packages, with the processor, firmware, and operating system all designed to work together at maximum efficiency. Running IBM's Unix operating system, AIX, these systems became the top sellers in the Unix market, shunting both Sun (with their SPARC/Solaris offerings) and Hewlett-Packard (HP-UX and Itanium) to the sidelines.

IBM has also taken a leading role in the enterprise analytics space, spending more than $15 billion over the past decade to build capabilities in business analytics consulting and software, including the acquisition of PwC Consulting, SPSS, Cognos, Netezza, Platform Computing, and many other firms. These acquisitions, combined with IBM's existing offerings, give the company the widest and deepest range of enterprise analytics skills, packages, and individual products in the industry.

In order to capitalize on their HPC and EA prowess, IBM decided to re-package their Power line of systems to better compete with the established x86 players. The result is IBM's PowerLinux system offerings. These are a set of servers that utilize IBM's high performance POWER processors and include their PowerVM virtualization suite, but run industry standard Red Hat and SUSE Linux distributions. All of this comes at a price point that's highly competitive with existing Intel/VMware combinations.

From an enterprise analytics perspective, what does PowerLinux bring to the table? Does it really offer any advantages vs. traditional Intel/Linux/VMware offerings? Let's take a closer look at some of the key technical differences between the two, starting with the processor…

Processor: IBM's PowerLinux systems use one or two eight-core POWER7 processors operating at up to 3.55 GHz. There are some significant differences between IBM's POWER7 and Intel's Xeon product line that we've summarized in the table below.

IBM's POWER has advantages over Intel's latest Xeon server offering on processor frequency, thread count, and cache. A key advantage is in memory bandwidth, which is crucially important to performance on many analytic workloads.

                       POWER7                       Xeon E5-4650L (introduced Q2 '12)
  Cores                8                            8
  Frequency            3.6 or 4.2 GHz               2.6 GHz (3.1 GHz turbo with
                       (for all cores)              some cores idled)
  Threads per core     4x per core;                 2x per core;
                       32 threads per CPU           16 threads per CPU
  Cache                80 MB eDRAM                  20 MB
  Memory bandwidth     105 GB/sec                   51.2 GB/sec

POWER7 has more than double the memory bandwidth of Intel's fastest CPU – meaning that more data can be moved from memory to CPU, radically increasing performance on memory-bound applications.
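A quick way to judge whether a given application will actually benefit from that bandwidth is a back-of-the-envelope roofline check: attainable performance is the lesser of the chip's compute peak and bandwidth times the code's arithmetic intensity (flops per byte moved). The bandwidth numbers below come from the table; the peak flop rate and intensity are illustrative assumptions, not vendor specs.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    # Roofline model: performance is capped either by the compute peak
    # or by how fast memory can feed the cores.
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)

# Memory bandwidth figures from the comparison table; the peak flop
# rate and arithmetic intensity here are hypothetical placeholders.
power7_bw, xeon_bw = 105.0, 51.2
peak = 200.0        # assumed GF/s peak, same for both for simplicity
intensity = 0.5     # flops per byte, typical of a streaming kernel

print(attainable_gflops(peak, power7_bw, intensity))  # 52.5
print(attainable_gflops(peak, xeon_bw, intensity))    # 25.6
```

At low arithmetic intensity both chips are bandwidth-limited, so the 2x bandwidth gap translates directly into roughly 2x attainable throughput – the scenario the paragraph above describes.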

Operating System(s) & Applications: Both IBM's PowerLinux and traditional x86 systems can run distributions from Red Hat and SUSE Linux. However, x86 systems can run other Linux distributions along with Windows operating systems.

On the applications side, standard Linux applications need to be recompiled in order to run on PowerLinux servers. That said, there is already a universe of more than 2,500 applications, with more on the way.

Several IBM analytic packages are currently PowerLinux-ready. This includes their InfoSphere BigInsights, which improves Hadoop performance while adding greater functionality and flexibility. PowerLinux users can also take advantage of IBM's InfoSphere Streams to analyze real-time data streams. This allows organizations to react instantly to changing conditions – adjusting prices on the fly to take advantage of a competitive opportunity, for example.
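Reacting to a live stream generally boils down to maintaining a rolling statistic and triggering an action when it crosses a threshold. The sketch below is a generic illustration of on-the-fly price adjustment – it is not the InfoSphere Streams API, and the window size, threshold, and markdown are invented parameters.

```python
from collections import deque

def price_adjuster(events, window=3, threshold=10.0, markdown=0.05):
    # Keep a sliding window of observed competitor prices; when the
    # rolling average drops below a threshold, emit an adjusted price.
    recent = deque(maxlen=window)
    adjustments = []
    for competitor_price in events:
        recent.append(competitor_price)
        avg = sum(recent) / len(recent)
        if avg < threshold:
            # Undercut the rolling average (markdown is hypothetical).
            adjustments.append(round(avg * (1 - markdown), 2))
    return adjustments

stream = [12.0, 11.0, 10.0, 9.0, 8.0]
print(price_adjuster(stream))  # [8.55]
```

A real streaming engine adds the hard parts – parallelism, windowing over time rather than counts, and fault tolerance – but the decide-as-data-arrives shape is the same.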

When it comes to operating systems and enterprise applications, a traditional x86 system will have a much wider range of choices than PowerLinux today. This is certainly a concern if these systems are being looked at to take on the role of a typical x86 commodity server. But in an Enterprise Analytics/Big Data context, there are plenty of packages customers can use with PowerLinux systems today.

Virtualization: IBM's PowerLinux systems include IBM's PowerVM hypervisor. This is the same hypervisor used on IBM's scale-up mission critical Unix server product line, with the same reliability, security, and performance. In the x86 world, the dominant hypervisor and virtualization suite is VMware's vSphere 5. How does PowerVM stack up?

When it comes to the basics like number of VMs, size of VMs, and other configuration issues, both PowerVM and VMware have enough capacity and flexibility to satisfy most use cases.

But there are some significant differences between the two packages that should be highlighted. One of the key differences is in the way they are designed. VMware is a third-party add-on, and can be integrated with operating systems and hardware only to a certain extent. Because IBM owns the entire Power product line, PowerVM is highly integrated with underlying hardware, firmware, and host operating systems.

This gives PowerVM some additional capabilities and features that can't be matched by third-party hypervisors. The table below captures a few of the key differences:


Overhead
  PowerVM Hypervisor: Typically less than 2-4%. (Proof point: all IBM Power benchmarks are performed with PowerVM enabled – and these systems typically win a lot of benchmark battles vs. bare metal competitors.)
  VMware vSphere 5: VMware licensing prohibits disclosure of non-sanctioned benchmarks. Anecdotal information points to overhead of 10-30% vs. bare metal performance, depending on app mix.

Dynamic Scalability
  PowerVM Hypervisor: Can dynamically add or remove CPUs or memory from any VM. Can dynamically shift resources to match peak loads.
  VMware vSphere 5: Cannot dynamically add or remove CPU/memory due to operating system limitations.

Dynamic VM Move
  PowerVM Hypervisor: Live Partition Mobility, with high tolerance for network latency when moving heavily utilized VMs.
  VMware vSphere 5: vMotion, which can see performance degradation when moving large and highly utilized VMs.

Security
  PowerVM Hypervisor: No exploits or threats according to NIST.
  VMware vSphere 5: Many exploits/threats according to NIST.

Several of the differences in virtualization noted above will have a definite effect on how well systems stand up to large, resource-intensive, and often time-critical analytics workloads. We'd suggest that, at a minimum, customers test out various virtualization mechanisms to see whether hypervisor overhead might be a significant issue with their particular collection of applications.
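Testing for hypervisor overhead need not be elaborate: run the same fixed workload on bare metal and inside each candidate VM, then compare best-case wall-clock times. A minimal timing harness might look like the sketch below; the compute kernel is a stand-in, and you would substitute a representative slice of your own application mix.

```python
import time

def sample_workload(n=200_000):
    # Stand-in compute kernel; replace with a representative kernel
    # from your own application mix.
    total = 0
    for i in range(n):
        total += i * i
    return total

def time_workload(runs=3):
    # Best-of-N wall-clock timing reduces scheduler and cache noise.
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        sample_workload()
        best = min(best, time.perf_counter() - start)
    return best

bare_metal_secs = time_workload()
# Run the identical script inside each candidate VM, then compute:
#   overhead_pct = (vm_secs / bare_metal_secs - 1.0) * 100
print(bare_metal_secs > 0)
```

Repeating the comparison under realistic concurrent load matters as much as the single-stream number, since overhead often grows with contention.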

Workload & System Management

In the x86 system world, some see comprehensive virtualization suites like VMware as a type of workload and even system management solution. But even the most sophisticated native x86 virtualization suite comes up short when it's compared to IBM's Platform Computing Symphony offering.

Platform Computing was a well-known player in the HPC world for almost 20 years prior to being purchased by IBM in 2011. They specialize in software products that are designed to accomplish two broad objectives: 1) ensure that large collections of heterogeneous distributed systems are utilized to the greatest extent possible; and 2) make sure that the right applications always get the appropriate level of system resources to comply with organizational policies and needs.

A typical analytics customer (a large financial institution, for example) would use Platform Symphony to manage hundreds of applications that handle risk management, trading, account tracking, and asset pricing routines on a shared infrastructure that spans tens of thousands of cores and multiple, geographically dispersed data centers. Using Platform, they can achieve very high administrator-to-host ratios (as high as 400:1) and global average system utilization rates topping 70%.

Symphony can start/stop, monitor, and control applications on a wide variety of platforms, including x86-based Linux and Windows systems, commercial Unix servers – and PowerLinux systems. Using Platform, data centers can divide up scarce hardware resources on almost any basis desired. Applications can equally share hosts, or prioritized applications can have first dibs on free resources. Up to 10,000 priority levels can be specified, which is enough to cover virtually any situation. Workloads can also have thresholds applied so that the amount of resource they consume is limited. Priorities can be adjusted dynamically – even on running jobs – so that a sudden critical need for processing can be addressed immediately.

Symphony is a smart manager, monitoring the systems in its care and making instant adjustments when things go wrong. It will parcel out tasks to client systems to ensure that the task is completed in the time allotted. If a client system fails, Symphony can move the workload from that system to another functioning machine without skipping a beat. Even if the job-tracking system fails, the jobs will recover onto a different system and all jobs will continue to process.

Enterprise Analytics, Symphonized

In an Enterprise Analytics/Big Data context, Platform Symphony really shines on MapReduce problems. In 2011, Platform embedded MapReduce APIs into Symphony code, which adds the ability to run multiple MR/Hadoop jobs simultaneously on the same cluster or manage multiple jobs on several clusters – all while maintaining compatibility with existing MR/Hadoop applications. The result is a solution that can run as much as 10x faster than open source MR/Hadoop solutions.

Combining Symphony with IBM's InfoSphere BigInsights software produces equally impressive results. In benchmarks measuring an IBM solution vs. open source Hadoop (both running on x86 hardware), IBM found that their solution:

- Performed 60x faster than Hadoop 1.0.1 on a sleep test. This is where a job is sent to a cluster, put to sleep, and then removed; it measures the overhead that the scheduler places on jobs. On this benchmark, native Hadoop could place about 5 tasks per second, while the Platform Symphony scheduler could place more than 340 tasks/sec.

- Ran 6x faster on the SWIM (Statistical Workload Injector for MapReduce) test, which uses real Facebook 2009/10 workloads to test how quickly these tasks are performed using Hadoop on distributed systems. Symphony was 6x faster than the same hardware running Hadoop 1.0.1.

- Used 10x less hardware to top Yahoo's world record Terasort Hadoop benchmark. This benchmark requires a system to sort through 100TB of data using Hadoop. Platform Symphony, coupled with IBM's BigInsights, was able to slightly better Yahoo's results with 10x fewer CPU cores.

We believe that purchasing Platform was one of IBM's canniest moves of the past several years. As demanding tasks like enterprise analytics are added to an already crowded IT infrastructure, the need for comprehensive workload managers like Symphony will radically increase.

And while Platform's products run on competitive systems, we believe that over time, IBM will more closely integrate Platform's attributes into their entire portfolio of systems and software. This will give IBM solutions a competitive advantage over competitors' offerings when it comes to workload management, system utilization, and ultimately, total cost of computing.

With Platform Symphony as one of the earliest offerings for their PowerLinux systems, IBM has a leg up vs. competitors who rely upon less capable packages as workload managers for distributed applications. While no one can predict how much new capacity customers will need to handle new workloads like enterprise analytics, it's safe to say that they'll need "more," and probably "lots more," hardware and software.

To make an already complex task even more difficult, these new workloads are going to have rigorous time constraints. Platform Symphony could be the key to untangling and managing conflicting resource demands and ensuring that critical processing is accomplished according to business needs.

Summary & Recommendations

The adoption of Enterprise Analytics is inevitable. It has already changed the competitive battlefield in a number of industries (financial services, retail, and others) and is poised to do the same in many more.

But implementing EA is going to present significant challenges to enterprise data centers. These applications are essentially supercomputing workloads and place far greater demands on systems, storage, and networks than the typical transactional workloads that take up the majority of space in today's enterprise data centers.

The new PowerLinux system line is an interesting gambit on IBM's part to carve away a significant, and growing, niche from the commodity x86 server vendors. IBM has taken their most advanced system technology, mated it to industry standard Linux, and developed a solution that can top existing x86 systems on technical criteria like performance and RAS.

So can IBM use this PowerLinux product line to replace x86 systems universally? Nope, not a chance. But is it a valid and competitive solution for particular situations – like Enterprise Analytics and Big Data? Yes, absolutely.


The technical case for PowerLinux is solid enough vs. x86 to merit some attention from customers looking for a new EA solution. But when you factor in the price of these systems, it moves the interest meter from just 'interesting' all the way over to 'compelling.'

The acquisition price for the bare hardware (single or two socket PowerLinux system vs. same configuration x86 box) is about the same. But when you add in the operating system (Red Hat Linux, for example) and virtualization suite (PowerVM or VMware's vSphere), you'll find that IBM's PowerLinux systems have a 10-20% acquisition cost advantage.

The biggest challenge for IBM with PowerLinux will be to stay the course and to continue to invest in the product line. They need to aggressively recruit ISVs to the platform and assure them that PowerLinux is a long-term strategic play for IBM. Given the technical and price advantages vs. x86, they'll certainly get their share of customers, assuming that customers have faith that the system will be around for the long haul.

This document may not be reproduced or transmitted in any form by any means without prior written permission from the publisher. All trademarks and registered trademarks of the products and corporations mentioned are the property of the respective holders. The information contained in this publication has been obtained from sources believed to be reliable. Gabriel Consulting Group does not warrant the completeness, accuracy, or adequacy of this report and bears no liability for errors, omissions, inadequacies, or interpretations of the information contained herein. Opinions reflect the judgment of Gabriel Consulting Group at the time of publication and are subject to change without notice.