
Prerequisite Activity: Deploying the HDP and the Data Science VM

Contents

Overview
Deploy HDP Sandbox
Create the VM
Configure Azure Data Lake and SQL Data Warehouse
Terms of Use

Interactive queries using Spark SQL on Azure HDInsight

Overview

Summary

To complete the labs we have prepared, you need an Azure subscription with admin rights. This will allow you to create the small clusters (maximum 4 nodes) that we will use during the lab. We ask that you create the HDP sandbox before arriving at the lab (see ‘Deploy HDP Sandbox’). Please liaise with your internal IT organization to gain the necessary privileges to complete the lab.

Once your internal IT organization has granted you access to the Azure Portal, we highly recommend that you complete the sections in this document before coming to the lab, to test the access granted. This document should take no more than 30 minutes to complete. If you have any difficulties, please contact your Microsoft representative.

The first lab works with the Hortonworks sandbox environment. We recommend that you deploy it and shut it down before attending the lab. We will also use the Twitter API; the ‘Deploy HDP Sandbox’ section includes a link to instructions for setting that up.

As part of this lab we will also use Visual Studio to submit Hive queries. The software required to complete the lab is already installed on a pre-configured VM in Azure called the Data Science Virtual Machine. This virtual machine has the following software installed:

Visual Studio 2015 Community Edition
Azure SDK
Revolution R Open
Power BI Desktop
SQL Server Express 2014
IPython
Azure PowerShell
Azure Storage Explorer

In this activity we will create an instance of this virtual machine and install tools on the VM.

Deploy HDP Sandbox

Summary

Deploying the Hortonworks sandbox can be done before the lab, and doing so confirms that you have all the necessary rights within your Azure subscription. It only takes a few minutes, and the sandbox can be shut down once set up to prevent any further charges.

The instructions to set up a single-node Hortonworks environment are here: http://hortonworks.com/hadoop-tutorial/deploying-hortonworks-sandbox-on-microsoft-azure/

Once deployed, you can shut the sandbox down as follows:

1. Log in to the Azure portal at https://portal.azure.com/
2. If you cannot see a VM on the dashboard with the name of your cluster, you can search for it using the dialog at the top.
3. Select your VM and in the Dashboard select ‘Stop’.
4. Confirm that you wish to stop the VM, then check that the status updates to ‘Stopped (Deallocated)’ after a few minutes to ensure you aren’t charged for the virtual machine.
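The stop-and-check step can also be scripted. The following is a minimal sketch and not part of the original lab: it assumes the present-day Azure CLI (`az`) is installed and signed in, and the resource group and VM names are hypothetical placeholders. It builds the `az vm deallocate` call and interprets a status label the way the portal step describes: only a deallocated VM stops accruing compute charges, whereas a VM that is merely stopped (for example, shut down from inside the guest OS) is still billed.

```python
import shutil
import subprocess
from typing import List


def deallocate_command(resource_group: str, vm_name: str) -> List[str]:
    """Build the Azure CLI call that stops and deallocates a VM."""
    return [
        "az", "vm", "deallocate",
        "--resource-group", resource_group,
        "--name", vm_name,
    ]


def is_deallocated(status: str) -> bool:
    """True if a status label says the VM has been deallocated.

    Accepts the portal label ('Stopped (Deallocated)') as well as
    CLI-style labels such as 'VM deallocated'.
    """
    return "deallocated" in status.strip().lower()


def stop_sandbox(resource_group: str, vm_name: str) -> None:
    """Run the deallocate command; raises if the CLI is missing or fails."""
    az = shutil.which("az")  # also resolves az.cmd on Windows
    if az is None:
        raise RuntimeError("Azure CLI ('az') was not found on PATH")
    args = deallocate_command(resource_group, vm_name)[1:]
    subprocess.run([az, *args], check=True)


if __name__ == "__main__":
    # Hypothetical names: substitute your own resource group and VM.
    print(" ".join(deallocate_command("hdp-lab-rg", "hdp-sandbox")))
    print(is_deallocated("Stopped (Deallocated)"))
    print(is_deallocated("Stopped"))
```

Only `stop_sandbox` touches Azure; the other two functions are pure, so the status check can be tried without a subscription.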

Create the Twitter API Keys

As part of the lab we’ll be collecting and processing Twitter feeds. To connect to the Twitter API you’ll need to create a Twitter app and collect its API keys. Instructions for this can be obtained here: http://www.gabfirethemes.com/create-twitter-api-key/
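Once you have the keys, it helps to keep them out of scripts and notebooks. The following is a minimal sketch, not part of the original lab, that reads them from environment variables and fails early if any is missing. The variable names are assumptions chosen for illustration; the four values correspond to the consumer key and secret and the access token and secret that Twitter issues for an app.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names; use whatever your own scripts expect.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_keys(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the four credentials, raising KeyError if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing Twitter credentials: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


if __name__ == "__main__":
    try:
        keys = load_twitter_keys()
        print("Loaded", len(keys), "Twitter credentials")
    except KeyError as err:
        print(err)
```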

Create the VM

The HDInsight lab will also use Visual Studio to connect to an HDInsight cluster and run a Hive script. Visual Studio Community Edition is installed and configured on the Data Science VM, which is freely available from the Azure Marketplace.

1. Sign in to the Azure portal: https://portal.azure.com/
2. Click on + New.
3. In the search box, type Data science virtual machine and press the return key. The Data Science Virtual Machine should appear in the results.
4. Click on the Data Science Virtual Machine (published by Microsoft).
5. Click on Create.
6. In the Basics blade, fill out a Name (n.b. this has to be unique across the whole of Azure), User name, Password and Resource group. Select the location nearest to you (this is the location of the Microsoft data center).

7. The Size blade will appear next. Select A3 (n.b. we will shut down the VM at the end of this lab).
8. On the Settings blade, click OK.
9. On the Summary blade, click OK.
10. On the Buy blade, click Purchase.
11. On the startboard you will see the VM being deployed. This will take approximately 5-10 minutes.
12. Once it is successfully deployed, you will see the VM on the startboard.
13. Click on the VM you created from the startboard to open its page.
14. Click on the Connect button. Save the RDP file.
15. Double-click the downloaded RDP file to connect to the VM and enter your credentials (note the \ before the username).
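For readers who prefer a command line, the choices made in the Basics and Size blades map onto parameters of a single `az vm create` call. The following is a minimal sketch under stated assumptions and not part of the original lab: it assumes the present-day Azure CLI, treats the portal size "A3" as the size name `Standard_A3`, and takes the marketplace image as a placeholder argument because the lab does not give an image identifier. All names in the usage example are hypothetical.

```python
from typing import List


def vm_create_command(
    name: str,
    resource_group: str,
    location: str,
    admin_username: str,
    image: str,
    size: str = "Standard_A3",  # assumption: CLI name for the portal's "A3"
) -> List[str]:
    """Build an `az vm create` call mirroring the Basics and Size blades.

    The password is left out of this sketch so that it does not end up
    in scripts or shell history; how you supply it is up to you.
    """
    return [
        "az", "vm", "create",
        "--name", name,
        "--resource-group", resource_group,
        "--location", location,
        "--admin-username", admin_username,
        "--image", image,
        "--size", size,
    ]


if __name__ == "__main__":
    # Hypothetical values; the image string is a placeholder, not a real identifier.
    command = vm_create_command(
        name="dsvm-lab-01",
        resource_group="dsvm-lab-rg",
        location="westeurope",
        admin_username="labuser",
        image="<data-science-vm-image>",
    )
    print(" ".join(command))
```

The function only builds the argument list, so it can be inspected or logged before anything is created in the subscription.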

The next steps are optional, should you wish to explore other features of the SDKs.

1. Once you have connected to the Data Science Virtual Machine, install the Azure SDK by double-clicking the Microsoft Web Platform shortcut on the desktop. In the installer, click Add for Microsoft Azure SDK for .NET (VS 2015) - <VERSION NUMBER> and then Install. This takes approximately 5 minutes to finish installing.
2. Ensure Azure Data Lake Tools for Visual Studio are installed (Data Lake Tools for Visual Studio). Once Data Lake Tools for Visual Studio is installed, you will see a Data Lake menu in Visual Studio.
3. Next, install Rtools by visiting https://cran.r-project.org/bin/windows/Rtools/ in Internet Explorer and downloading Rtools33.exe. Run through the installer, ensuring that at the additional tasks stage the checkbox that edits the system PATH is ticked. This updates the PATH environment variable so that the various R tools are available.
4. Close the remote desktop session by clicking the X on the blue bar at the top of the screen.
5. Shut down the VM by clicking the Stop button on the VM blade in the Azure preview portal (this will take a couple of minutes).
6. If you successfully completed all these steps, you are ready for the Advanced Analytics lab!
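Before closing the session in step 4, you may want to confirm that the Rtools installer in step 3 really updated PATH. The following is a minimal sketch, not part of the original lab, that lists the PATH entries mentioning Rtools. The example directories in the usage section are illustrative assumptions about where Rtools was installed; run the check from a newly opened command prompt or Python session, because processes started before the installer finished keep the old PATH.

```python
import os
from typing import List, Optional


def rtools_path_entries(
    path_value: Optional[str] = None, sep: str = os.pathsep
) -> List[str]:
    """Return the PATH entries that mention Rtools (case-insensitive)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [entry for entry in path_value.split(sep) if "rtools" in entry.lower()]


if __name__ == "__main__":
    found = rtools_path_entries()
    if found:
        print("Rtools entries on PATH:")
        for entry in found:
            print("  " + entry)
    else:
        print("No Rtools entry on PATH; rerun the installer with the PATH option ticked.")

    # Illustrative check against a hand-written Windows-style PATH value.
    sample = r"C:\Windows\system32;C:\Rtools\bin;C:\Rtools\gcc-4.6.3\bin"
    print(rtools_path_entries(sample, sep=";"))
```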

Configure Azure Data Lake and SQL Data Warehouse

Introduction

This is an optional activity and is not required to complete the labs. However, during the labs you may also wish to review some of the other Big Data services available in Azure: Azure Data Lake and the Data Lake Analytics service.

1) Familiarize yourself with Azure Data Lake Store by reading this
2) By completing this tutorial you will enable your Azure subscription for Data Lake Store Public Preview, create an Azure Data Lake Store account and test some basic Data Lake Store functionality. At the end, don’t delete the ADL account.
3) Understand this post.
4) By completing this tutorial you will create a Data Lake Analytics account, prepare source data and submit Data Lake Analytics jobs
5) Familiarize yourself with Azure SQL Data Warehouse by reading this
6) Create a SQL Data Warehouse by completing this tutorial
7) Configure integration with Visual Studio by completing this tutorial

Terms of Use

© 2015 Microsoft Corporation. All rights reserved.

By using this Hands-on Lab, you agree to the following terms:

The technology/functionality described in this Hands-on Lab is provided by Microsoft Corporation in a “sandbox” testing environment for purposes of obtaining your feedback and to provide you with a learning experience. You may only use the Hands-on Lab to evaluate such technology features and functionality and provide feedback to Microsoft. You may not use it for any other purpose. You may not modify, copy, distribute, transmit, display, perform, reproduce, publish, license, create derivative works from, transfer, or sell this Hands-on Lab or any portion thereof.

COPYING OR REPRODUCTION OF THE HANDS-ON LAB (OR ANY PORTION OF IT) TO ANY OTHER SERVER OR LOCATION FOR FURTHER REPRODUCTION OR REDISTRIBUTION IS EXPRESSLY PROHIBITED.

THIS HANDS-ON LAB PROVIDES CERTAIN SOFTWARE TECHNOLOGY/PRODUCT FEATURES AND FUNCTIONALITY, INCLUDING POTENTIAL NEW FEATURES AND CONCEPTS, IN A SIMULATED ENVIRONMENT WITHOUT COMPLEX SET-UP OR INSTALLATION FOR THE PURPOSE DESCRIBED ABOVE. THE TECHNOLOGY/CONCEPTS REPRESENTED IN THIS HANDS-ON LAB MAY NOT REPRESENT FULL FEATURE FUNCTIONALITY AND MAY NOT WORK THE WAY A FINAL VERSION MAY WORK. WE ALSO MAY NOT RELEASE A FINAL VERSION OF SUCH FEATURES OR CONCEPTS. YOUR EXPERIENCE WITH USING SUCH FEATURES AND FUNCTIONALITY IN A PHYSICAL ENVIRONMENT MAY ALSO BE DIFFERENT.

FEEDBACK. If you give feedback about the technology features, functionality and/or concepts described in this Hands-on Lab to Microsoft, you give to Microsoft, without charge, the right to use, share and commercialize your feedback in any way and for any purpose. You also give to third parties, without charge, any patent rights needed for their products, technologies and services to use or interface with any specific parts of a Microsoft software or service that includes the feedback. You will not give feedback that is subject to a license that requires Microsoft to license its software or documentation to third parties because we include your feedback in them. These rights survive this agreement.

MICROSOFT CORPORATION HEREBY DISCLAIMS ALL WARRANTIES AND CONDITIONS WITH REGARD TO THE HANDS-ON LAB, INCLUDING ALL WARRANTIES AND CONDITIONS OF MERCHANTABILITY, WHETHER EXPRESS, IMPLIED OR STATUTORY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. MICROSOFT DOES NOT MAKE ANY ASSURANCES OR REPRESENTATIONS WITH REGARD TO THE ACCURACY OF THE RESULTS, OUTPUT THAT DERIVES FROM USE OF THE VIRTUAL LAB, OR SUITABILITY OF THE INFORMATION CONTAINED IN THE VIRTUAL LAB FOR ANY PURPOSE.

DISCLAIMER

This lab contains only a portion of the features and enhancements in Microsoft Azure. Some of the features might change in future releases of the product.