
Big Data Equals Big Challenges

Hosting Choices for Big Data Applications

11/19/2015


1 IDC, “Big Data Analytics: Future Architecture, Skills and Roadmaps for the CIO,” p. 5, 2011.

2 NIST, “NIST Cloud Computing Definition,” 25 October 2011. http://www.nist.gov/itl/csd/cloud-102511.cfm

3 http://www.listenlogic.com/about/

Definitions and Concepts

The following are definitions and concepts used within this white paper:

Big data is defined as, “a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high velocity capture, discovery and/or analysis.”1

Big data analytics, then, is the process of analyzing vast amounts of unstructured and semi-structured data that are too voluminous and time-sensitive to load into a traditional relational database.

Cloud computing is, “a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”2 This is also known as cloud hosting. Within the context of this paper, “the cloud” refers to the concept of cloud hosting/cloud computing.

Dedicated hosting refers to hardware and network connectivity provisioned exclusively for the customer at a facility not owned or leased by the customer. The customer is responsible for all administration and support of the system, though the facility owner may offer remote hands services for basic hardware issues, for an extra fee.

Managed hosting refers to a more sophisticated version of dedicated hosting, where the hosting provider takes on the responsibility of the day-to-day operation of the customer’s system. This frees the customer from network administration, infrastructure and support issues, making it possible for the customer to reduce or eliminate the need for network and infrastructure personnel.

As the ability to process, filter and classify “Big Data” becomes an increasingly critical utility of doing business, finding an economical hosting solution capable of handling vast amounts of information, free from downtime, latency, scaling and throughput issues is a mission-critical need for companies reliant on Big Data for their core business.

This white paper looks at the challenges Big Data companies face when choosing a hosting platform. In it, we look at the pros and cons of Do-It-Yourself, Colocation, Cloud and Managed Hosting options.

Peak Hosting | (888) 476-7325 (PEAK) | [email protected]


Big Data Challenges

No company can afford to lose data. While all hosting solutions have their pros and cons, there are challenges specific to Big Data that can significantly affect its distributed, high volume nature. The most detrimental of these are packet loss, latency (the time data takes to get from point A to point B) and jitter (variations in latency times), which happen in all networks to one degree or another but are especially troublesome for Big Data applications.

Packet loss is the worst of the three since it involves the loss of data. Jitter is the second most destructive: applications can cope with a steady slowdown, but a repeated cycle of slowing down and speeding up badly disrupts them.
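As a rough illustration of the three measurements described above, the sketch below summarizes a series of round-trip probe times. The function name, the sample values and the choice of jitter formula (mean absolute difference between consecutive samples) are assumptions made for this example; other jitter definitions exist.

```python
from statistics import mean

def network_metrics(rtts_ms):
    """Summarize probe results; None marks a probe that never returned."""
    received = [r for r in rtts_ms if r is not None]
    if not received:
        raise ValueError("no probes returned")
    # Packet loss: share of probes that got no reply.
    loss = 1 - len(received) / len(rtts_ms)
    # Latency: average round-trip time of the probes that did return.
    latency = mean(received)
    # Jitter (one common choice): mean absolute change between consecutive samples.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = mean(diffs) if diffs else 0.0
    return {"loss": loss, "latency_ms": latency, "jitter_ms": jitter}

# Hypothetical samples: five probes, one of which was lost.
samples = [10.0, 12.0, None, 10.0, 18.0]
metrics = network_metrics(samples)
print(metrics)
```

With these made-up samples, one probe in five is lost, the average latency is 12.5 ms and the jitter is 4 ms, even though most individual samples look healthy.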

In LAMP stacks, there used to be the notion that “silence is golden.” Unless the application had something to say, it didn’t talk and left the system open for other traffic. The development of distributed applications threw that away.

Distributed applications come from a world where bandwidth is plentiful and cheap, and were developed to use it as much as possible, which means now everyone is using as much as possible. What does all this mean? That modern distributed applications put a stress on the network that many hosting solutions, especially most cloud environments, were not designed for.

Big Data applications are resource intensive. To run properly, they need a lot of power, require a minimum of 10 Gbps networks (though 40 Gbps is better), require frequent diagnostics and need tons of storage and memory. And the memory issue isn’t just the amount of RAM you have, but also the number of DIMM slots that are filled. If you have RAM in 50% or more of the available slots, RAM clock speed can drop by as much as 30%.
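To put the network figures above in perspective, here is a small back-of-the-envelope sketch of how long it takes to move a dataset over a 10 Gbps versus a 40 Gbps link. The 1 TB dataset size is a hypothetical value, and the calculation assumes an ideal link with no protocol overhead or contention, so real transfers would take longer.

```python
def transfer_seconds(data_bytes, link_gbps):
    """Ideal time to move data over a link, ignoring overhead and contention."""
    bits = data_bytes * 8
    return bits / (link_gbps * 1e9)

ONE_TB = 10**12  # hypothetical dataset size: 1 TB (decimal)

t10 = transfer_seconds(ONE_TB, 10)
t40 = transfer_seconds(ONE_TB, 40)
print(f"10 Gbps: {t10:.0f} s, 40 Gbps: {t40:.0f} s")
```

Under these assumptions the same terabyte needs roughly 800 seconds at 10 Gbps and 200 seconds at 40 Gbps, which is why the faster network matters when nodes exchange data constantly.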

Hosting choices fall into several general categories: Cloud Hosting, DIY/Colocation and Dedicated Managed Hosting. The good news is that networks can be designed to reduce these problems, except in the Cloud, for reasons that will become apparent.

Within these hosting options, there are two underlying decisions a business needs to make: 1) whether it wants its technical operations to be managed by someone else or wants to manage them itself, and 2) whether it wants dedicated or shared resources for its implementation.

Time for Some Introspection

In order to make an informed decision about what type of hosting setup to select, there are nine questions you need to answer:

1 Does my service need to be available 24/7?

2 What would the financial impact be to my company should my service go down or if performance is sluggish?

3 Will the hosting solution scale with me as my company grows?

4 How much visibility do I want into my hosting solution?

5 Am I prepared to share hardware resources?

6 How much infrastructure do I want to maintain in-house?

7 What technical support do I have on staff to support my hosting solution?

8 What are the cost implications (monthly hosting fees, technical support, infrastructure, etc.) for my company for each hosting option? Where do I want to put my resources?

9 Does doing it all myself give me a strategic advantage over my competition?

The answers to these questions are important for any company, but even more so when it comes to running Big Data applications. By their very nature, applications like Hadoop, Cloudera, MapR and Hortonworks push huge amounts of information across a network. How much? One big social data analytics company translates millions of daily digital conversations and analyzes them in real time. Their system must be capable of processing over 1 billion streaming classification operations per second (SCOPS) to enable them to ask the Hadoop-level, deep-dive questions. These applications can crush systems that aren’t designed to handle them.


Managed versus Unmanaged

The first choice is deciding who will manage the server provisioning, maintenance, on-going hardware support, connectivity and everything else involved in running your technical infrastructure. You have two options: doing it yourself (unmanaged) or hiring someone to do it for you (managed).

Managed hosting provides you with more “ownership” of the hardware and, thus, greater control over resources housed in the hosting provider’s data center. This means you have your own dedicated servers—no sharing of server space with other companies—without the overhead cost of actually purchasing the hardware. Managed Hosting vendors typically provide customers administration and other services, including operating systems, software and security systems.

The DIY model, on the other hand, leaves you with the responsibility for dealing with the entire network infrastructure, including the data center. Colocation is essentially a DIY deployment where someone else runs the data center facilities and provides the connection to the Internet; you’re responsible for everything else. And whether you’re running your own data center or colocating equipment in someone else’s data center, unmanaged hosting requires you to have the personnel needed to provision servers, monitor the network, and work as systems administrators, programmers and in many more roles.

For companies looking for maximum control over everything in their network and willing to assume the responsibility for the care and feeding of their technical infrastructure, these options offer precisely that…but at the cost of time, money, resources and staff.

[Figure: hosting options plotted on two axes, RESOURCES (Shared to Dedicated) and MANAGEMENT (None to Fully Managed). Options shown: Cloud, DIY, Co-location, OaaS.]

Managed Hosting

Shared Resources: Cloud providers typically do not manage the hosted company’s server environments, though a few will for additional fees.

Dedicated Resources: Peak Hosting Operations-as-a-Service.

Unmanaged Hosting

Shared Resources: This is where traditional cloud hosting falls. Resources made available to the customer, where customer is one of many companies provisioned in a multi-tenant, distributed virtual server environment.

Dedicated Resources: DIY, Colocation and Dedicated (unmanaged) hosting fall into this category. In both instances, customer is responsible for all servers, switches, SANs, firewalls, etc., up to the point of Internet connectivity. Colocation provides the data center facility, HVAC, power and everything up to the point of the Internet connection, while DIY means the customer does it all.

Shared or Dedicated Resources

The other choice you need to make is whether or not you want your data and applications to operate on shared or dedicated hardware. Dedicated hosting means all the hardware is purchased, leased or assigned solely to you. This can be as basic as one single-tenant server, or as sophisticated as a complete network and infrastructure designed just for your company where nothing…servers, load balancers, SAN, switches or anything else in your infrastructure…is shared.

Obviously, the total DIY solution will be 100% dedicated since you buy or lease all the equipment, design and support the entire infrastructure and run the actual data center.


Hosting Option: Cloud Hosting

Cloud Hosting, also known as Infrastructure-as-a-Service, provides you with access to computing resources in a virtualized environment across a public connection. Specifically, the resource provided is virtualized hardware, including virtual server space, network connections, bandwidth, IP addresses and load balancers. In physical terms, the hardware is drawn from a multitude of servers and networks that exist across any number of dispersed data centers, all of which the cloud provider is responsible for maintaining.

The cloud is designed to be big, but not necessarily designed to be fast. The cloud’s design also means your applications, data and everything else can be distributed across any server in the architecture. Since Big Data applications are already distributed, this poses a challenge to running Big Data apps in this infrastructure due to latency, jitter and potential packet loss.
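The effect of physical distance on latency can be estimated with simple arithmetic. The sketch below computes a lower bound on round-trip time from distance alone. It assumes light in optical fiber travels at about two-thirds of its speed in a vacuum, and the two distances are hypothetical; real paths add switching, queuing and routing delays on top of this floor.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_FACTOR = 2 / 3  # assumption: light in fiber moves at about two-thirds of c

def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time imposed by propagation delay alone."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000

# Hypothetical placements of two cooperating nodes.
same_rack = min_round_trip_ms(0.002)     # about 2 meters apart
cross_country = min_round_trip_ms(4000)  # about 4,000 km apart

print(f"same rack: {same_rack:.6f} ms, cross country: {cross_country:.1f} ms")
```

Two nodes a few meters apart pay a negligible propagation cost, while nodes in data centers thousands of kilometers apart pay tens of milliseconds on every exchange, before any other source of delay is counted.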

By design, the cloud operates in a multi-tenant environment where hosted companies’ networks are virtualized on the many servers owned and operated by the host provider. The cloud appears attractive: a deceptively low cost model, promises that it will always scale with you, no need for detailed hardware configuration discussions, and the claim that its multitude of servers and equipment means there is no possible way to run your technical operations less expensively. Once companies move in, however, they soon realize that the cloud is not all it’s cracked up to be. In fact, there are many challenges to hosting in the cloud, especially for Big Data applications:

Oversubscription. Cloud deployments are typically oversubscribed, meaning more customers than is optimal are hosted on the same servers as you. Result? Performance issues like latency, jitter and packet loss.

Server Sprawl. In the cloud, a customer is not supposed to notice, or care, whether their next virtual machine/server is provisioned on the same server, the next server, another server rack, or even in an entirely different data center. Latency is dictated by the laws of physics, and even cloud providers are subject to them.

Jitter and Latency. Networking issues like latency and jitter (variations in latency times) kill big data performance. Big Data and distributed applications can only process as fast as signaling packets are able to move between systems, which slow to match the slowest common denominator (packet) in order to maintain synchronization. In networks with high latency or jitter, big data application performance plummets.

Transparency. Customer security and other reasons mean that cloud companies limit visibility into their systems, making troubleshooting, performance analysis and bug detection extremely difficult.

Cost. If you’re doing batch processing or only using the cloud a few hours a day, the cloud can be appropriate for your needs. Otherwise, you’ll be better off hosting at a managed hosting provider.

Customization. Big Data companies require deep levels of customization around TCP stacks, kernel patches, and a number of other factors. The one-size-fits-all nature of cloud services just doesn’t adapt to the needs of Big Data.

Service and subject matter expertise. In the cloud, you’re left to support everything yourself. If there are performance or application problems, you’re on your own. OS issues? You have to fix them. In the end, you’re left holding the bag with just a bunch of APIs, FAQs and GUIs available for your staff to do the work.

Service Level Agreement (SLA). Many cloud SLAs guarantee you can always provision and pay for more systems, but not that you will receive the performance you actually need. Cloud SLAs also make no provisions for your systems staying up; a system could fail, but as long as you’re able to start over provisioning more systems, the cloud provider’s SLA is intact.

Pros

Elasticity is a native feature that requires no additional effort to design and configure

Provides an easy point of entry for start-ups

Cons

The agility that cloud provides requires paying a significant premium

Oversubscription means resources might be unavailable when you need them

There’s no certainty around capacity because performance doesn’t scale linearly

A small VM and a giant physical server take the same amount of work to manage, but the cloud requires managing significantly more instances

Cloud is self-service and the cost of sourcing, training, and maintaining technical operations staff does not synergize with core business objectives

Cloud is unable to meet performance requirements for Big Data applications


Hosting Option: Dedicated Managed Hosting

Managed Hosting provides you the best of all worlds: “ownership” of the hardware, greater control over the resources housed in the hosting provider’s data center, the administration, support and maintenance of all equipment, software (including operating systems) and data center facilities, all in a no-multi-tenancy environment. In some instances, such as Peak Hosting’s Operations-as-a-Service℠ solution, the hosting provider will also support other applications such as Apache, NoSQL and more.

Managed hosting allows for customization and support unavailable in the cloud, without the need to hire network support staff. Hosting company personnel handle all hardware upgrades, day-to-day server management, diagnosis of occasional throughput issues, and server OS upgrades and patches. For customers, managed hosting has most of the benefits of running a data center without the personnel or capital expenses.

Pros

Customized configurations are cost efficient, from two servers to a thousand servers

Leverages the shared network and data center infrastructure you would otherwise need to acquire for a DIY deployment

Reduces the cost of your infrastructure and provides exactly as much capacity as you pay for

Reduces architectural complexity, improving performance and visibility

Outsources your technical operations instead of requiring you to build an in-house team

Reduces your capital and operational expenditures as compared to an in-house DIY solution

Provides you with greater reliability and performance compared to typical cloud environments

Allows greater flexibility for adding software and operating systems, as well as the ability to change server configurations

Cons

It’s not as flexible as a DIY solution

You need to trust your provider is knowledgeable, which can be difficult to verify until after you’ve migrated your system and are locked in to a contract

Hosting Option: Do-It-Yourself (DIY) or Colocation

DIY is exactly what it sounds like. With this option, you are responsible for your entire hosting solution, from purchasing hardware to designing the architecture, building the solution and managing all technical operations. For Colocation, the hosting provider is responsible for supporting the data center, facilities and the Internet connection, while you are responsible for your servers, switches, load balancers and all other equipment.

Pros

Company controls everything from architecture to hardware to operational procedures

You get full visibility into the infrastructure supporting your application to improve tuning and troubleshooting

DIY is most cost effective at massive scale, when you have buying power and economies of scale

Cons

Total cost of ownership is difficult to justify until you’re using thousands of servers (DIY)

Running a data center is difficult to do well and technical operations usually does not synergize with a business’s core objectives


Conclusion

Generally speaking, a server is a commodity item. All things being equal, a Dell R710 is a Dell R710, irrespective of whether it’s provided by Hosting Company A or Hosting Company B. The service, configuration, and deployment are where the magic happens. What good is the best designed and configured network if it stops working and the hosting company doesn’t have the expertise to fix it? Isn’t it better to have a system that doesn’t fail to begin with?

So what does it all mean for big data?

It’s clear that cloud hosting just won’t work. Too many issues (such as multi-tenancy, sprawl, jitter and latency) affect performance; limited visibility into the system means you never know what you’re getting in terms of specs; and limited support leaves you needing to hire people to manage the servers and apps. Add to this the cost of running in the cloud more than a couple of hours per day, and you’ll wind up paying a lot of money for something that doesn’t return much on the investment.
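The cost argument above comes down to a break-even calculation between hourly cloud billing and a flat monthly fee. The sketch below shows that arithmetic. The hourly rate and monthly fee are purely hypothetical placeholders, not quotes from any provider, and the actual threshold depends entirely on real pricing.

```python
def monthly_cloud_cost(hourly_rate, hours_per_day, days=30):
    """Cost of running a pay-per-hour environment for part of each day."""
    return hourly_rate * hours_per_day * days

def break_even_hours_per_day(hourly_rate, monthly_fixed_fee, days=30):
    """Daily usage above which a flat monthly fee becomes the cheaper option."""
    return monthly_fixed_fee / (hourly_rate * days)

# Hypothetical prices, for illustration only.
CLOUD_RATE = 4.00      # per hour for a comparable cloud environment
DEDICATED_FEE = 480.0  # flat monthly fee for dedicated hosting

threshold = break_even_hours_per_day(CLOUD_RATE, DEDICATED_FEE)
light_use = monthly_cloud_cost(CLOUD_RATE, 2)
always_on = monthly_cloud_cost(CLOUD_RATE, 24)
print(threshold, light_use, always_on)
```

With these placeholder prices, two hours of daily batch processing costs less in the cloud than the flat fee, while an always-on workload costs several times more.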

For companies requiring the maximum control in design, build, maintenance and support, DIY or colocation hosting are right up your alley, with Colocation being a better value since you don’t need to build and maintain a data center. Both, however, still require hiring personnel to manage the day-to-day issues surrounding your infrastructure, as well as staff to support any applications you use to run your business on this infrastructure.

If your company needs better performance and visibility than the cloud offers, the customized design of DIY and the data center support of Colocation, but doesn’t want to hire staff to support the application side of things, Dedicated Managed Hosting gives you the best of all worlds, in many cases at a cost that’s significantly less than any of the other solutions.

Hosting Summary

Here is a comparison of some of the performance and support challenges across the different hosting types:

[Table: hosting types compared are In-House, Cloud, Colocation and OaaS via Peak Hosting. Surviving cell values are listed after each criterion; the other per-column marks are not reproduced here.]

Minimal Packet Loss, Jitter and Latency: Maybe

Single Tenant

Reduced/No Server or Data Sprawl: Maybe

DC Techs Supplied

SysAdmin Supplied

Transparent Infrastructure

Custom Design: Maybe

10 or 40Gbps Network: Maybe, Maybe

Application Support tools

Peak Hosting has been providing custom designed and built infrastructures since 2001. With our Operations-as-a-Service dedicated managed hosting solution, we provide customers not only with all the hardware and data center facilities needed to run their technical operations, but also with the people to support the applications needed to run their infrastructure…everything but your code.®
