Provisioning Big Data Platform using Cloudbreak & Ambari
Karthik Karuppaiya, Sr. Engineering Manager, CPE
Vivek Madani, Sr. Principal Software Engineer, CPE
San Jose Hadoop Summit 2016

On-Demand HDP Clusters using Cloudbreak and Ambari

Agenda
1. Introduction
2. Big Data Platform Challenges
3. What is the solution?
4. Self Service Analytics Platform Provisioning
5. Going Hybrid Cloud using Cloudbreak
6. Monitoring & Alerting

Introduction
Symantec
- Symantec is the world leader in providing security software for both enterprises and end users
- Thousands of enterprises and more than 400 million devices (PCs, tablets and phones) rely on Symantec to help them secure their assets from attacks, including their data centers, emails and other sensitive data
Cloud Platform Engineering (CPE)
- Build consolidated cloud infrastructure and platform services for next generation data powered Symantec applications
- A big data platform for batch and stream analytics integrated with both private and public clouds
- Open source components as building blocks
- Bridge feature gaps and contribute back

Big Data Platform Challenge
- Hundreds of millions of users generating billions of events every day from across the globe
- Hundreds of big data application developers developing thousands of applications
- At 12 PB and 500+ nodes, the Cloud Platform Engineering Analytics team built the largest security data lake at Symantec
- Elasticity is built into the platform to optimize costs in the cloud

Big Data Platform Challenge
- Great! Now developers can start building applications on our big data lake
- Hundreds of developers start building applications using different big data tools

Big Data Platform Challenge
- Product team developers want quick changes and the latest versions
- The platform team wants stability!
- Soon, frustration prevails

What is the Solution?
- Build and use your own little cluster for development
- Copy a subset of data for development purposes
- Build elasticity into the platform for cost optimization
- Tear down the cluster after development is complete
- Rinse and repeat

What is the Solution?
- But building clusters is hard and time-consuming
- Too many services to install and configure
- Developers are not interested in building and managing clusters

What is the Solution? Self Service
- What if we make it really easy to build clusters?
- Abstract all the deployment complexities and enable developers to get their own cluster with one click of a button
- Use the same blueprint for both dev and prod clusters

Self Service Analytics (SSA) Clusters
- RESTful web services to allow creation and management of custom clusters
- Select from pre-defined Ambari Blueprints
- Can provision infrastructure on OpenStack as well as AWS
- Installs the HDP stack specified as part of the Ambari blueprint
- Dashing dashboard to monitor and manage (start/stop/kill) clusters
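To make the blueprint idea concrete, here is a minimal Python sketch of the two JSON documents that Ambari's blueprint API works with: a blueprint (stack plus host groups and their components) and a cluster creation template (which hosts land in which group). This is not the SSA code. The helper names, host group names, component layout and hostnames are all hypothetical, and the talk does not say what the team's blueprints contain. In Ambari the blueprint is registered under `/api/v1/blueprints/<name>` and the template is posted to `/api/v1/clusters/<cluster>`; the HTTP calls are left out so the sketch stays self-contained.

```python
import json


def make_blueprint(stack_version, host_groups):
    """Build an Ambari blueprint body.

    host_groups maps a group name to (cardinality, [component names]).
    The blueprint name is not part of the body; it goes in the URL
    when the blueprint is registered.
    """
    return {
        "Blueprints": {"stack_name": "HDP", "stack_version": stack_version},
        "host_groups": [
            {
                "name": group,
                "cardinality": str(cardinality),
                "components": [{"name": c} for c in components],
            }
            for group, (cardinality, components) in host_groups.items()
        ],
    }


def make_cluster_template(blueprint_name, hosts_by_group):
    """Build the cluster creation template that maps real hosts to groups."""
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": h} for h in hosts]}
            for group, hosts in hosts_by_group.items()
        ],
    }


def unknown_groups(blueprint, template):
    """Return template host groups that the blueprint does not define."""
    defined = {g["name"] for g in blueprint["host_groups"]}
    return [g["name"] for g in template["host_groups"] if g["name"] not in defined]


# Hypothetical layout: one master group and one worker group.
blueprint = make_blueprint(
    "2.4",
    {
        "master": (1, ["NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER"]),
        "worker": (2, ["DATANODE", "NODEMANAGER"]),
    },
)
template = make_cluster_template(
    "dev-small",  # hypothetical blueprint name
    {
        "master": ["master1.example.com"],
        "worker": ["worker1.example.com", "worker2.example.com"],
    },
)

if __name__ == "__main__":
    print(json.dumps(blueprint, indent=2))
    print(json.dumps(template, indent=2))
```

Because the blueprint carries no hostnames, the same blueprint can back a small dev cluster and a larger prod cluster; only the template changes.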

Environment
- Private cloud on OpenStack (Kilo, no Heat)
- Public cloud on AWS
- HDP 2.3.2 & 2.4.2
- Ambari 2.1.2 & 2.2

SSA Architecture

SSA Services

SSA Demo

Ambari Custom Services
- What about the services that are not supported by Ambari out of the box?
- We write our own Ambari custom stack

Next Gen SSA
- This is all great! But it is a lot of work to add more cloud providers.
- It takes a lot of effort to understand each cloud provider's APIs

Next Gen SSA: Cloudbreak
- Cloudbreak helps to simplify the provisioning of HDP clusters in cloud environments
- Supports multiple clouds including AWS, Google, Azure and OpenStack
- Uses Apache Ambari for HDP installation and management
- Has a nice UI to build and manage clusters
- Supports automated cluster scaling

AWS Cluster Architecture
Diagram labels:
- Private Subnet
- Direct Connect 10 Gbps
- Data Ingestion Pipes
- Telemetry Ingestion Pipes
- Datacenter hosts HDP over bare metal and OpenStack
- Symantec Datacenter
Cluster notes:
- Uses d3.* and r3.* flavors
- Encrypted volumes (LUKS)
- Non-EBS root volume
- Non-Dockerized HDP
- Custom AMI
- Enhanced networking

Cloudbreak Demo

Hybrid Cloud Using Cloudbreak: Customization & Contribution
- Non-Dockerized HDP installation
- Support for Keystone v3 for OpenStack
- Cloudbreak 1.2 released 03/2016
- Support for custom AMIs
  - We have our own hardened images with enhanced networking, volume encryption, etc.
- Support for non-EBS backed root volumes
- Deploy in existing private VPC/subnet
- Additional AWS instance flavors supported
  - We use r3.* and d3.*, which are not supported by Cloudbreak
- We build our own Cloudbreak package from the trunk

Cloudbreak Keystone V3 Screenshot

Cloudbreak Keystone V3 Project Scope Screenshot

Custom AMI Support
- Org security mandates using specific hardened AMIs only
- Created our own hardened image with the software and configurations required by Cloudbreak
- Allows us to use features like:
  - Volume encryption, enhanced networking enabled
  - Non-EBS volumes
  - Symantec-specific configurations like LDAP, repos, DNS, etc.
  - Symantec standard for hostnames
  - Use JDK 1.8 instead of Java 7, which comes with the Cloudbreak AMI

/cloud-aws/src/main/resources/aws-images.yml

Non-Dockerized HDP Support
Why?
- No experience running production clusters under Docker
- Unknowns with the upgrade path for HDP components
- Encrypted disk volumes had issues working with Docker
What?
- Worked with the Cloudbreak team to test out the non-Dockerized version of Cloudbreak
- Provided feedback from our test deployment of the non-Dockerized version
- Feature now available in the master branch

Non-EBS backed root volume
- Changes to the AWS CloudFormation template used by Cloudbreak
- We use ephemeral storage for root volumes for availability reasons
- Will contribute this back as an option to Cloudbreak

Cloudbreak Contribution In Progress
- Placement groups
- Multiple security groups attached to one cluster
- Multiple subnet deployment inside VPC
- Support for non-EBS root volumes

Monitoring & Alerting
Now that we have delivered an elephant, the next question from users is: "How is his health?"

Monitoring and Alerting
- Comprehensive dashboards for all environments managed by the platform team
- Extensively use Ambari Alerts
- QueryX: custom framework to fill the gaps in Ambari Alerts
- All alerts are sent to the OpenTSDB + Grafana stack
- Critical alerts go to PagerDuty
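The routing rule in the bullets above (every alert goes to the dashboard stack, critical ones also page someone) can be sketched in a few lines of Python. The sketch assumes the input has the shape returned by Ambari's alerts endpoint (`/api/v1/clusters/<cluster>/alerts`), where each item carries an `Alert` object with fields such as `state`, `definition_name`, `host_name` and `text`. The function name and the summary format are hypothetical, the PagerDuty call itself is left out, and none of this reflects how QueryX is implemented.

```python
def route_alerts(ambari_response):
    """Split Ambari alerts into dashboard records and pager records.

    Every alert is kept for the dashboard; only CRITICAL alerts are
    additionally returned for paging.
    """
    to_dashboard, to_pager = [], []
    for item in ambari_response.get("items", []):
        alert = item["Alert"]
        summary = {
            "cluster": alert.get("cluster_name"),
            "host": alert.get("host_name"),
            "definition": alert.get("definition_name"),
            "state": alert["state"],
            "text": alert.get("text", ""),
        }
        to_dashboard.append(summary)
        if alert["state"] == "CRITICAL":
            to_pager.append(summary)
    return to_dashboard, to_pager


# Hypothetical sample in the shape of an Ambari alerts response.
sample_alerts = {
    "items": [
        {"Alert": {"cluster_name": "dev1", "host_name": "nn1.example.com",
                   "definition_name": "namenode_hdfs_capacity_utilization",
                   "state": "CRITICAL", "text": "Capacity used is above threshold"}},
        {"Alert": {"cluster_name": "dev1", "host_name": "dn1.example.com",
                   "definition_name": "datanode_process",
                   "state": "OK", "text": "TCP OK"}},
    ]
}

dashboard_records, pager_records = route_alerts(sample_alerts)
```

Keeping the routing as a pure function makes the paging rule easy to test without a live Ambari server or PagerDuty account.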

Monitoring and Alerting
Diagram labels:
- Cluster 1, Cluster 2, Cluster 3
- Ambari Metrics Collector + QueryX
- Call Ambari Metrics API
- OpenTSDB
- Grafana
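The flow in this diagram (read from each cluster's Ambari Metrics Collector, write to OpenTSDB, chart in Grafana) hinges on one conversion step, sketched below in Python. It assumes the collector's timeline response shape, a `metrics` list whose entries carry `metricname`, `appid`, `hostname` and a `metrics` map of millisecond timestamps to values, and it emits datapoints in the JSON form accepted by OpenTSDB's `/api/put`. The function name and the `cluster` tag are hypothetical choices for illustration, the HTTP calls are left out, and the talk does not describe how the team's own pipeline does this.

```python
def ams_to_opentsdb(ams_response, cluster):
    """Convert an Ambari Metrics Collector response to OpenTSDB datapoints.

    Each (timestamp, value) pair becomes one datapoint tagged with the
    cluster name, the Ambari app id and, when present, the host.
    """
    points = []
    for series in ams_response.get("metrics", []):
        tags = {"cluster": cluster, "appid": series["appid"]}
        if series.get("hostname"):
            tags["host"] = series["hostname"]
        # Timestamps arrive as millisecond strings; emit them in order.
        for ts_ms, value in sorted(series["metrics"].items(), key=lambda kv: int(kv[0])):
            points.append({
                "metric": series["metricname"],
                "timestamp": int(ts_ms) // 1000,  # OpenTSDB accepts seconds
                "value": value,
                "tags": dict(tags),
            })
    return points


# Hypothetical sample in the shape of a collector response.
sample_metrics = {
    "metrics": [
        {
            "metricname": "cpu_user",
            "appid": "HOST",
            "hostname": "node1.example.com",
            "metrics": {"1466000060000": 2.5, "1466000000000": 1.5},
        }
    ]
}

datapoints = ams_to_opentsdb(sample_metrics, "cluster1")
```

Tagging each datapoint with the cluster name is what would let a single Grafana dashboard switch between clusters.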

Grafana Dashboards

Ambari Alerts

Summary and Future Work
- A journey towards one-click cluster deployment
- Cloudbreak: one tool for all clouds
- Contribute back the features developed in-house
- Enable Cloudbreak to support bare-metal cluster provisioning
- Auto-scaling using Cloudbreak and Periscope
- Single large YARN cluster for a variety of compute and storage loads
- Open source: use and contribute
- Work with the community to address gaps
- SSA code already open sourced: https://github.com/symantec/

Thank You!

Q & A
Karthik [email protected]
Vivek [email protected]