Tech Data Cloudera on Azure

Cloudera on Microsoft Azure - Step-by-Step Page 1

Classified as Microsoft General

Tech Data

Cloudera on Azure

Contents

1 Tech Data Cloudera on Azure Step by Step

1.0 Things to know prior to using this Guide

1.1 Cloudera on Azure deployment

1.2 How to connect

1.3 Post-Deployment Tasks

2 Architecture Notions

1.0 Things to know prior to using this Guide

• Familiarize yourself with this document before diving in.

• All the screenshots in this Guide are for reference only.

• This Guide will assist you with the deployment of the Cloudera Bundle in an Azure CSP subscription that was purchased through the StreamOne Portal.

o In-depth training on Azure is outside the scope of this guide.

• Accessing the Cloudera on Azure bundle

o You need to log in to the Azure portal to get the IP address.

▪ https://portal.azure.com

▪ Log in with the username and password that were created in StreamOne and emailed to you.

• For example: [email protected]

• You will be given a one-time password and will need to change it.

▪ To access the Cloudera Platform, make sure you have the login and password that were created during the StreamOne ordering process.

▪ If you were not the person who placed the order in the StreamOne ordering portal, ask that person for the login and password that were initially created.

▪ If you need to access the underlying VMs, you will need the SSH key or the password created during the ordering process.

SSH Private Key:

You will be given your private SSH key when you order your Cloudera Platform. Please make sure you secure this key and store it in a safe place, as you might need it for SSH access to any of your instances. Your key will be displayed only once and there is no way to recover it later on. For security reasons, Tech Data does not keep a copy.
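Since the key is shown only once, one option is to save it immediately to a file that only your own user can read. The following is a minimal Python sketch, assuming a Linux or macOS workstation; the helper name save_private_key and the file path are hypothetical and are not part of the StreamOne or Cloudera tooling. OpenSSH clients generally refuse private key files that other users can read, which is why the file is created with mode 0600.

```python
import os
import stat


def save_private_key(key_text: str, path: str) -> None:
    """Write key_text to path as a new file readable and writable by the owner only."""
    # O_EXCL makes the call fail instead of overwriting an existing key file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(key_text)
    # Set the mode explicitly so the result does not depend on the process umask.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

You could call it as save_private_key(pasted_key, "/home/you/.ssh/cloudera_azure.pem"), where both arguments are placeholders for your own values.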

• Prior knowledge of Cloudera and Microsoft Azure is required.

Services not included:

We do not install Apache Sentry, so you can integrate with your own Kerberos installation.

We also do not install Apache Kudu.

1.1 Cloudera on Azure deployment.

Connect to StreamOne Cloud Marketplace:

You can find the Microsoft Azure SKU under Most Viewed, by browsing Categories or Vendor, or by searching for it directly in the search field:

Click on Microsoft Azure:

You will then be able to browse the different SKUs. Click on the "ADD TO CART" button of the registration SKU:

Click on "View Cart" button:

Click on 'Proceed to Checkout' button:

Fill in the End User information, or select an existing end user using your email, and click on the "Continue to Configuration" button.

The Configuration page should be displayed.

Fill in your Microsoft Partner Network ID.

Click on the Create a New end customer Microsoft account button.

Enter a unique domain name and click on the Check Availability button.

Select the "The End Customer email" radio button from the "Account Administration" module, or select the "I will administer the account" radio button from the same module and enter the Delegate admin email ID.

Click on "Continue to Payment" button.

Click on "Continue to Summary" button.

Verify the information shown and click on "Place Order" button.

Your order should be complete.

You should receive an email with your Microsoft Subscription Information:

And another email regarding the deployment of your Azure Bundle:

Now click on Reseller Portal, then Customer Admin. Look for your Customer and click on IaaS/PaaS.

Then click on Modify:

Then click on “Click to Configure”:

Information related to the selected bundle should be displayed.

Select a Location from the Location drop-down and fill in the Resource Group Name.
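Azure restricts what a resource group name may contain, so it can help to check your name before filling in the form. The sketch below is a conservative Python check based on our understanding of the Azure naming rules (1 to 90 characters; letters, digits, underscores, parentheses, hyphens and periods; no trailing period). The function name is hypothetical, and Azure also accepts some non-ASCII letters that this check rejects, so treat it as an assumption rather than the portal's own validation.

```python
import re

# ASCII-only subset of the characters Azure accepts in resource group names.
_ALLOWED = re.compile(r"[A-Za-z0-9_().\-]{1,90}")


def is_valid_resource_group_name(name: str) -> bool:
    """Return True if name passes the conservative checks described above."""
    if _ALLOWED.fullmatch(name) is None:
        return False
    # A resource group name must not end with a period.
    return not name.endswith(".")


# Hypothetical names, for illustration only.
print(is_valid_resource_group_name("cloudera-dev_rg"))
print(is_valid_resource_group_name("cloudera dev"))
```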

Fill in the Basic Information.

You can select the Deployment Size.

Please note that the Admin Username and the Admin Password will be used to access the Cloudera Master through the SSH tunnel (see section 1.2).

An SSH key pair will be generated; copy and save the private key.

Once the Cloudera bundle is deployed in the Azure portal, you might use this key to log in to the underlying VMs.

SSH Private Key:

You will be given your private SSH key when you order your Cloudera Platform bundle. Please make sure you secure this key and store it in a safe place, as you will need it for SSH access to any of your instances. Your key will be displayed only once and there is no way to recover it later on. For security reasons, Tech Data does not keep a copy.

In the Advanced Bundle Settings, you can select the VM size.

You can also choose to deploy the Workers on Standard or Premium Storage.

You can give a name to your Cloudera cluster and choose the DNS Name Prefix for your cluster.

Click on the ‘Deploy Now’ button.

1.2 How to connect.

Connect to the Azure Portal with your credentials.

• You need to log in to the Azure portal to get the IP address.

▪ https://portal.azure.com

▪ Log in with the username and password that were created in StreamOne and emailed to you.

• For example: [email protected]

• You will be given a one-time password and will need to change it.

You will then be connected to the Azure Portal. Go to Resource groups.

You will find the Resource Group into which your resources have been deployed.

You will then see your resources listed. Find the Virtual Machine whose name ends with “-mn0”. You will use it to connect to the Cloudera Dashboard:
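If you have exported the list of resource names (for example from the portal), picking out the right Virtual Machine is a simple suffix match. The Python sketch below illustrates this; the resource names are hypothetical examples, and only the “-mn0” suffix comes from this guide.

```python
def find_master_vm(vm_names):
    """Return the single VM name ending in '-mn0', or raise if none or several match."""
    matches = [name for name in vm_names if name.endswith("-mn0")]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one '-mn0' VM, found {len(matches)}")
    return matches[0]


# Hypothetical resource names, for illustration only.
names = ["mycluster-mn0", "mycluster-dn0", "mycluster-dn1", "mycluster-dn2"]
print(find_master_vm(names))
```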

You need to retrieve its DNS Name:

You will use it to create an SSH tunnel.

Open a terminal and run the following command:

ssh -L 7180:10.3.0.4:7180 username@publicip

Here, username is the username you chose during the order process, and publicip is the public address of the “-mn0” Virtual Machine retrieved above (its DNS name or public IP).
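The -L option of ssh forwards a local port to a host and port reachable from the remote machine, in the form local_port:host:host_port. As a hedged illustration, the Python sketch below assembles the same command from its parts. The function name and the example username and host name are hypothetical; 10.3.0.4 and port 7180 are taken from the command above, and we assume 10.3.0.4 is the private address on which Cloudera Manager listens in your deployment.

```python
import shlex


def build_tunnel_command(username, public_host, private_ip="10.3.0.4", port=7180):
    """Return the argument list for an SSH local port forward to Cloudera Manager."""
    # local_port:remote_host:remote_port, as expected by ssh -L.
    forward = f"{port}:{private_ip}:{port}"
    return ["ssh", "-L", forward, f"{username}@{public_host}"]


# Hypothetical username and DNS name, for illustration only.
argv = build_tunnel_command("azureuser", "mycluster-mn0.example.com")
print(shlex.join(argv))
```

Building the command as a list keeps each argument separate, so it can be passed to subprocess.run without going through a shell.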

Then choose to continue when prompted and enter your password to log in to the Virtual Machine.

You can then open a web browser and go to http://localhost:7180 to reach the Cloudera Manager dashboard.
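If the browser cannot connect, a quick first check is whether anything is listening on the local end of the tunnel. The Python sketch below (the function name is hypothetical) only tests that a TCP connection to localhost:7180 can be opened; it does not prove that Cloudera Manager is answering on the far side of the tunnel.

```python
import socket


def port_is_open(host="localhost", port=7180, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    print("tunnel listening:", port_is_open())
```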

The default admin credentials are:

- Login: admin

- Password: admin

You will have to change them afterwards!

You will then be connected to your Cloudera Manager Dashboard:

And get started!

1.3 Post-Deployment Tasks.

Now that you have access to your Cloudera Manager dashboard, you still need to configure additional items. These tasks cover the following:

• How to fix up the Warning Configuration Issues

• How to Upgrade your License

• How to Change the Admin Password and Create Users

1.3.1 Fix the Warning Configuration Issues.

You now have access to your Cloudera Manager Dashboard. Some configuration still has to be finished, and the dashboard flags it through alerts.

The first one to configure is the Memory Overcommit Validation Threshold, which is reported for the 5 hosts.
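To our understanding, this threshold is the fraction of a host's physical memory that may be allocated to roles before Cloudera Manager raises the warning. The Python sketch below only illustrates that arithmetic; the function name, the default of 0.8 and all the memory figures are assumptions for illustration, so check the Cloudera Manager documentation for your version before changing the setting.

```python
def is_overcommitted(role_allocations_gib, physical_gib, threshold=0.8):
    """Return True if the memory allocated to roles exceeds threshold * physical memory."""
    return sum(role_allocations_gib) > physical_gib * threshold


# Hypothetical host: 28 GiB of RAM, four roles allocated 26 GiB in total.
allocations = [4, 8, 12, 2]
print(is_overcommitted(allocations, physical_gib=28))
print(is_overcommitted(allocations, physical_gib=28, threshold=0.95))
```

With these made-up numbers the first call reports an overcommit (26 GiB is more than 80% of 28 GiB) and the second does not, which shows the two ways to clear the warning: lower the role allocations or raise the threshold.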

The second warning only appears if you are deploying the Small sizing (intended only for Dev and Test purposes).

ZooKeeper needs three servers to run in High Availability. Since the Small sizing deploys only one Master, the system suggests running at least 3 servers.
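The reason for the three-server recommendation is that a ZooKeeper ensemble stays available only while a strict majority of its servers is running. The short Python sketch below works through that arithmetic; the function names are ours, chosen for illustration.

```python
def quorum_size(servers: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """Number of servers that can fail while a majority remains."""
    return servers - quorum_size(servers)


for n in (1, 3, 5):
    print(f"{n} server(s): quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

A single server tolerates no failure at all, while three servers tolerate one, which is why the warning is expected on the Small sizing and can be accepted for Dev and Test use.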

The third one is tied to the Cloudera Management Services. You need to configure:

- The Java Heap Size of Service Monitor

- The Maximum Non-Java Memory of Service Monitor

1.3.2 How to Upgrade your License.

The system is currently licensed with a 60-day trial. To upgrade the cluster with your license, you need to go to “Administration” then “License”.

Click on “Upgrade to Cloudera Enterprise”.

1.3.3 How to Change the Admin Password and Create Users.

To manage users, you need to go to “Administration” then “Users and Roles”.

You can then change the admin password by clicking on “Actions”.

To create a new User, you can click on “Add Local User”.

2. Architecture Notions.

2.1 Small Sizing (Dev and Test).

Edge Node VM Roles:

"FLUME-AGENT-BASE",

"hbase-GATEWAY-BASE",

"hbase-HBASETHRIFTSERVER-BASE",

"ks_indexer-HBASE_INDEXER-BASE",

"hdfs-BALANCER-BASE",

"hdfs-GATEWAY-BASE",

"hdfs-NFSGATEWAY-BASE",

"hdfs-SECONDARYNAMENODE-BASE",

"hive-GATEWAY-BASE",

"hive-HIVEMETASTORE-BASE",

"hue-HUE_LOAD_BALANCER-BASE",

"hue-HUE_SERVER-BASE",

"sqoop_client-GATEWAY-BASE",

"oozie-OOZIE_SERVER-BASE",

"kafka-GATEWAY-BASE",

"kafka-KAFKA_BROKER-BASE",

"solr-SOLR_SERVER-BASE",

"spark_on_yarn-GATEWAY-BASE",

"yarn-GATEWAY-BASE"

VM Master 1 Roles:

"FLUME-AGENT-BASE",

"hbase-MASTER-BASE",

"hdfs-NAMENODE-BASE",

"hive-HIVESERVER2-BASE",

"sqoop_client-GATEWAY-BASE",

"impala-CATALOGSERVER-BASE",

"impala-STATESTORE-BASE",

"kafka-KAFKA_BROKER-BASE",

"hive-GATEWAY-BASE",

"spark_on_yarn-GATEWAY-BASE",

"spark_on_yarn-SPARK_YARN_HISTORY_SERVER-BASE",

"yarn-JOBHISTORY-BASE",

"yarn-RESOURCEMANAGER-BASE",

"zookeeper-SERVER-BASE"

VM Worker Roles (1, 2 and 3):

"FLUME-AGENT-BASE",

"hbase-REGIONSERVER-BASE",

"hdfs-DATANODE-BASE",

"hive-GATEWAY-BASE",

"sqoop_client-GATEWAY-BASE",

"impala-IMPALAD-BASE",

"kafka-KAFKA_BROKER-BASE",

"spark_on_yarn-GATEWAY-BASE",

"yarn-NODEMANAGER-BASE"

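The Small sizing role lists above can be written down as a simple mapping from host to roles, which makes it easy to count how many instances of a role the cluster runs. The Python sketch below does this; the host labels (edge, master1, worker1 and so on) are hypothetical, while the role names are copied from the lists above.

```python
from collections import Counter

EDGE = [
    "FLUME-AGENT-BASE", "hbase-GATEWAY-BASE", "hbase-HBASETHRIFTSERVER-BASE",
    "ks_indexer-HBASE_INDEXER-BASE", "hdfs-BALANCER-BASE", "hdfs-GATEWAY-BASE",
    "hdfs-NFSGATEWAY-BASE", "hdfs-SECONDARYNAMENODE-BASE", "hive-GATEWAY-BASE",
    "hive-HIVEMETASTORE-BASE", "hue-HUE_LOAD_BALANCER-BASE", "hue-HUE_SERVER-BASE",
    "sqoop_client-GATEWAY-BASE", "oozie-OOZIE_SERVER-BASE", "kafka-GATEWAY-BASE",
    "kafka-KAFKA_BROKER-BASE", "solr-SOLR_SERVER-BASE",
    "spark_on_yarn-GATEWAY-BASE", "yarn-GATEWAY-BASE",
]
MASTER_1 = [
    "FLUME-AGENT-BASE", "hbase-MASTER-BASE", "hdfs-NAMENODE-BASE",
    "hive-HIVESERVER2-BASE", "sqoop_client-GATEWAY-BASE",
    "impala-CATALOGSERVER-BASE", "impala-STATESTORE-BASE",
    "kafka-KAFKA_BROKER-BASE", "hive-GATEWAY-BASE", "spark_on_yarn-GATEWAY-BASE",
    "spark_on_yarn-SPARK_YARN_HISTORY_SERVER-BASE", "yarn-JOBHISTORY-BASE",
    "yarn-RESOURCEMANAGER-BASE", "zookeeper-SERVER-BASE",
]
WORKER = [
    "FLUME-AGENT-BASE", "hbase-REGIONSERVER-BASE", "hdfs-DATANODE-BASE",
    "hive-GATEWAY-BASE", "sqoop_client-GATEWAY-BASE", "impala-IMPALAD-BASE",
    "kafka-KAFKA_BROKER-BASE", "spark_on_yarn-GATEWAY-BASE",
    "yarn-NODEMANAGER-BASE",
]

# Hypothetical host labels; the Small sizing has one edge node, one master, three workers.
small = {"edge": EDGE, "master1": MASTER_1,
         "worker1": WORKER, "worker2": WORKER, "worker3": WORKER}

# Count how many hosts carry each role.
role_counts = Counter(role for roles in small.values() for role in roles)

print("hosts:", len(small))
print("ZooKeeper servers:", role_counts["zookeeper-SERVER-BASE"])
print("Kafka brokers:", role_counts["kafka-KAFKA_BROKER-BASE"])
print("HDFS DataNodes:", role_counts["hdfs-DATANODE-BASE"])
```

The single ZooKeeper server in this layout is what triggers the High Availability warning described in section 1.3.1.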

2.2 Medium Sizing.

Edge Node VM Roles:

"FLUME-AGENT-BASE",

"hbase-GATEWAY-BASE",

"hbase-HBASERESTSERVER-BASE",

"hbase-HBASETHRIFTSERVER-BASE",

"ks_indexer-HBASE_INDEXER-BASE",

"hdfs-BALANCER-BASE",

"hdfs-GATEWAY-BASE",

"hdfs-NFSGATEWAY-BASE",

"hdfs-HTTPFS-BASE",

"hive-GATEWAY-BASE",

"hive-HIVEMETASTORE-BASE",

"hue-HUE_LOAD_BALANCER-BASE",

"hue-HUE_SERVER-BASE",

"sqoop_client-GATEWAY-BASE",

"oozie-OOZIE_SERVER-BASE",

"kafka-GATEWAY-BASE",

"kafka-KAFKA_BROKER-BASE",

"solr-SOLR_SERVER-BASE",

"spark_on_yarn-GATEWAY-BASE",

"yarn-GATEWAY-BASE"

VM Master 1 Roles:

"FLUME-AGENT-BASE",

"hbase-MASTER-BASE",

"hdfs-NAMENODE-BASE",

"hive-GATEWAY-BASE",

"hive-HIVESERVER2-BASE",

"sqoop_client-GATEWAY-BASE",

"kafka-KAFKA_BROKER-BASE",

"spark_on_yarn-GATEWAY-BASE",

"spark_on_yarn-SPARK_YARN_HISTORY_SERVER-BASE",

"zookeeper-SERVER-BASE"

VM Master 2 Roles:

"FLUME-AGENT-BASE",

"hbase-MASTER-BASE",

"hdfs-SECONDARYNAMENODE-BASE",

"hive-GATEWAY-BASE",

"hive-HIVESERVER2-BASE",

"sqoop_client-GATEWAY-BASE",

"kafka-KAFKA_BROKER-BASE",

"spark_on_yarn-GATEWAY-BASE",

"yarn-JOBHISTORY-BASE",

"yarn-RESOURCEMANAGER-BASE",

"zookeeper-SERVER-BASE"

VM Master 3 Roles:

"FLUME-AGENT-BASE",

"hbase-MASTER-BASE",

"hive-GATEWAY-BASE",

"hive-HIVESERVER2-BASE",

"sqoop_client-GATEWAY-BASE",

"impala-CATALOGSERVER-BASE",

"impala-STATESTORE-BASE",

"kafka-KAFKA_BROKER-BASE",

"spark_on_yarn-GATEWAY-BASE",

"zookeeper-SERVER-BASE"

VM Worker Roles (1, 2, 3, 4, 5 and 6):

"FLUME-AGENT-BASE",

"hbase-REGIONSERVER-BASE",

"hdfs-DATANODE-BASE",

"hive-GATEWAY-BASE",

"sqoop_client-GATEWAY-BASE",

"impala-IMPALAD-BASE",

"kafka-KAFKA_BROKER-BASE",

"spark_on_yarn-GATEWAY-BASE",

"yarn-NODEMANAGER-BASE"

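To see what changes on the Edge Node between the Small and Medium sizings, the two role lists can be compared as sets. The Python sketch below uses the role names from the lists in sections 2.1 and 2.2; the variable names are ours.

```python
SMALL_EDGE = {
    "FLUME-AGENT-BASE", "hbase-GATEWAY-BASE", "hbase-HBASETHRIFTSERVER-BASE",
    "ks_indexer-HBASE_INDEXER-BASE", "hdfs-BALANCER-BASE", "hdfs-GATEWAY-BASE",
    "hdfs-NFSGATEWAY-BASE", "hdfs-SECONDARYNAMENODE-BASE", "hive-GATEWAY-BASE",
    "hive-HIVEMETASTORE-BASE", "hue-HUE_LOAD_BALANCER-BASE", "hue-HUE_SERVER-BASE",
    "sqoop_client-GATEWAY-BASE", "oozie-OOZIE_SERVER-BASE", "kafka-GATEWAY-BASE",
    "kafka-KAFKA_BROKER-BASE", "solr-SOLR_SERVER-BASE",
    "spark_on_yarn-GATEWAY-BASE", "yarn-GATEWAY-BASE",
}
MEDIUM_EDGE = {
    "FLUME-AGENT-BASE", "hbase-GATEWAY-BASE", "hbase-HBASERESTSERVER-BASE",
    "hbase-HBASETHRIFTSERVER-BASE", "ks_indexer-HBASE_INDEXER-BASE",
    "hdfs-BALANCER-BASE", "hdfs-GATEWAY-BASE", "hdfs-NFSGATEWAY-BASE",
    "hdfs-HTTPFS-BASE", "hive-GATEWAY-BASE", "hive-HIVEMETASTORE-BASE",
    "hue-HUE_LOAD_BALANCER-BASE", "hue-HUE_SERVER-BASE",
    "sqoop_client-GATEWAY-BASE", "oozie-OOZIE_SERVER-BASE", "kafka-GATEWAY-BASE",
    "kafka-KAFKA_BROKER-BASE", "solr-SOLR_SERVER-BASE",
    "spark_on_yarn-GATEWAY-BASE", "yarn-GATEWAY-BASE",
}

# Roles present only in the Medium edge node, and only in the Small one.
added = sorted(MEDIUM_EDGE - SMALL_EDGE)
removed = sorted(SMALL_EDGE - MEDIUM_EDGE)

print("added in Medium:", added)
print("removed in Medium:", removed)
```

The comparison shows that the Medium edge node gains the HBase REST server and HttpFS roles and no longer hosts the SecondaryNameNode, which the lists above place on Master 2. The Medium sizing also runs zookeeper-SERVER-BASE on all three masters, which meets the three-server recommendation from section 1.3.1.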

2.3 Large Sizing.

Edge Node VM Roles:

"FLUME-AGENT-BASE",

"hbase-GATEWAY-BASE",

"hbase-HBASERESTSERVER-BASE",

"hbase-HBASETHRIFTSERVER-BASE",

"ks_indexer-HBASE_INDEXER-BASE",

"hdfs-BALANCER-BASE",

"hdfs-GATEWAY-BASE",

"hdfs-NFSGATEWAY-BASE",

"hdfs-HTTPFS-BASE",

"hive-GATEWAY-BASE",

"hive-HIVEMETASTORE-BASE",

"hue-HUE_LOAD_BALANCER-BASE",

"hue-HUE_SERVER-BASE",

"sqoop_client-GATEWAY-BASE",

"oozie-OOZIE_SERVER-BASE",

"kafka-GATEWAY-BASE",

"kafka-KAFKA_BROKER-BASE",

"solr-SOLR_SERVER-BASE",

"spark_on_yarn-GATEWAY-BASE",

"yarn-GATEWAY-BASE"

VM Master 1 Roles:

"FLUME-AGENT-BASE",

"hbase-MASTER-BASE",

"hdfs-NAMENODE-BASE",

"hive-GATEWAY-BASE",

"hive-HIVESERVER2-BASE",

"sqoop_client-GATEWAY-BASE",

"kafka-KAFKA_BROKER-BASE",

"spark_on_yarn-GATEWAY-BASE",

"spark_on_yarn-SPARK_YARN_HISTORY_SERVER-BASE",

"zookeeper-SERVER-BASE"

VM Master 2 Roles:

"FLUME-AGENT-BASE",

"hbase-MASTER-BASE",

"hdfs-SECONDARYNAMENODE-BASE",

"hive-GATEWAY-BASE",

"hive-HIVESERVER2-BASE",

"sqoop_client-GATEWAY-BASE",

"kafka-KAFKA_BROKER-BASE",

"spark_on_yarn-GATEWAY-BASE",

"yarn-JOBHISTORY-BASE",

"yarn-RESOURCEMANAGER-BASE",

"zookeeper-SERVER-BASE"

VM Master 3 Roles:

"FLUME-AGENT-BASE",

"hbase-MASTER-BASE",

"hive-GATEWAY-BASE",

"hive-HIVESERVER2-BASE",

"sqoop_client-GATEWAY-BASE",

"impala-CATALOGSERVER-BASE",

"impala-STATESTORE-BASE",

"kafka-KAFKA_BROKER-BASE",

"spark_on_yarn-GATEWAY-BASE",

"zookeeper-SERVER-BASE"

VM Worker Roles (1, 2, 3, up to 16):

"FLUME-AGENT-BASE",

"hbase-REGIONSERVER-BASE",

"hdfs-DATANODE-BASE",

"hive-GATEWAY-BASE",

"sqoop_client-GATEWAY-BASE",

"impala-IMPALAD-BASE",

"kafka-KAFKA_BROKER-BASE",

"spark_on_yarn-GATEWAY-BASE",

"yarn-NODEMANAGER-BASE"
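As a summary of the three sizings, the Python sketch below records the number of Virtual Machines of each kind and totals them. The master and worker counts come from the role list headings above; we assume a single Edge Node per sizing, and the Large figure uses the upper bound of 16 workers. The class and variable names are ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sizing:
    edge_nodes: int
    masters: int
    workers: int

    @property
    def total_vms(self) -> int:
        return self.edge_nodes + self.masters + self.workers


SIZINGS = {
    "small": Sizing(edge_nodes=1, masters=1, workers=3),
    "medium": Sizing(edge_nodes=1, masters=3, workers=6),
    # Upper bound: the guide lists "up to 16" workers for the Large sizing.
    "large": Sizing(edge_nodes=1, masters=3, workers=16),
}

for name, sizing in SIZINGS.items():
    print(f"{name}: {sizing.total_vms} VMs")
```

The Small total of 5 VMs matches the 5 hosts mentioned for the memory overcommit warning in section 1.3.1.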