
MapReduce Service

FAQ

Date 2018-12-06


Contents

1 What Is MRS?
2 What Are Highlights of MRS?
3 What Is MRS Used For?
4 How Do I Use MRS?
5 How Do I Ensure Data and Service Running Security?
6 How Is MRS Charged?
7 What Is Region and AZ?
8 How Do I Prepare a Data Source for MRS?
9 What Is the Difference Between OBS and HDFS Data Storage?
10 How Do I View All Clusters?
11 How Do I View Log Information?
12 What Types of Jobs Are Supported by MRS?
13 How Do I Submit Developed Programs to MRS?
14 How Do I View Cluster Configurations?
15 What ECS Specifications Are Supported by MRS?
16 What Is the Relationship Between Spark and Hadoop?
17 What Types of Spark Jobs Are Supported by an MRS Cluster?
18 Can a Spark Cluster Access Data in OBS?
19 What Is the Relationship Between Hive and Other Components?
20 What Types of Distributed Storage Are Supported by MRS?
21 Can MRS Cluster Nodes Be Changed on the MRS Management Console?
22 How Do I Disassociate a Subnet from the ACL Network?
23 Class Cannot Be Found After Flume Submits Jobs to Spark Streaming
24 A Job Is Running on Hue


25 An Error Is Reported When the Split Value Is Changed in the Spark Application
26 MRS Development Guide Does not Contain Information About How to Use Loader
27 Plenty of Jobs Are Found After Yarn Is Started
28 Can MRS Be Connected to An External Network?
29 An Error Is Reported When Spark Is Used
30 Task Nodes Are Failed to Be Decreased
31 How Do I Handle the Expired OBS Certificate in the Cluster?
32 Adding a New Disk to the MRS Cluster


1 What Is MRS?

MapReduce Service (MRS), one of the basic services on the public cloud, is used for managing and analyzing massive data.

MRS builds a reliable, secure, and easy-to-use operation and maintenance (O&M) platform. The platform provides analysis and computing capabilities for massive data and can address enterprises' demands on data storage and processing. Users can independently apply for and use the hosted Hadoop, Spark, HBase, and Hive components to quickly create clusters on a host, providing batch data analysis and computing capabilities for massive data that does not have demanding real-time processing requirements.


2 What Are Highlights of MRS?

The highlights of MRS are as follows:

● Easy to use
  MRS provides not only the capabilities supported by Hadoop, Spark, Spark SQL, HBase, and Hive, but also unified SQL interaction interfaces throughout the entire process, which simplifies big data application development.

● Low cost
  MRS is free of O&M and separates computing from storage. A computing cluster can be created as required and released after a job is complete.

● Stability
  MRS reduces the time you spend on commissioning and monitoring clusters. Service availability reaches 99.9% and data reliability reaches 99.9999%.

● High openness
  MRS is based on open source software, is compatible with other services, and provides REST APIs and JDBC interfaces.


3 What Is MRS Used For?

Based on the Hadoop open source software, Spark in-memory computing engine, HBase distributed storage database, and Hive data warehouse framework, MRS provides a unified platform for storing, querying, and analyzing enterprise-level big data to help enterprises quickly establish a massive data processing system. This platform has the following features:

● Analyzing and computing massive data
● Storing massive data
● Stream processing of massive data


4 How Do I Use MRS?

MRS is easy to use and provides a user-friendly user interface (UI). By using computers connected in a cluster, you can run various tasks, and process or store petabytes of data.

When Kerberos authentication is disabled, a typical procedure for using MRS is as follows:

1. Prepare data.

Upload the local programs and data files to be computed to Object Storage Service (OBS).

2. Create a cluster.

Create clusters before you use MRS. The cluster quantity is subject to the Elastic Cloud Server (ECS) quantity. Configure basic cluster information to complete cluster creation. You can submit a job at the same time as you create a cluster.

Note: When you create a cluster, only one new job can be added. If you need to add more jobs, perform Step 4.

3. Import data.

After an MRS cluster is successfully created, use the import function of the cluster to import OBS data to HDFS. An MRS cluster can process both OBS data and HDFS data.

4. Add a job.

After a cluster is created, you can analyze and process data by adding jobs. Note that MRS provides a platform for executing programs developed by users. You can submit, execute, and monitor such programs by using MRS. After a job is added, the job is in the Running state by default.

5. View the execution result.

A job takes a while to run. After the job is complete, go to the Job Management page, and refresh the job list to view the execution results on the Job tab page.

Note: You cannot re-execute a successful or failed job, but you can add or copy the job. After setting job parameters, you can submit the job again.


6. Terminate a cluster.

If you want to terminate a cluster after jobs are complete, click Terminate in Cluster. The cluster status changes from Running to Terminating. After the cluster is terminated, the cluster status changes to Terminated and the cluster is displayed in Historical Cluster. No fee will be charged for a terminated cluster.


5 How Do I Ensure Data and Service Running Security?

MRS is a platform for massive data management and analysis and features high security. It ensures user data and service running security from the following aspects:

● Network isolation

The public cloud divides the entire network into two planes: the service plane and the management plane. The two planes are physically isolated to ensure security of the service and management networks.

– Service plane

Network plane where cluster components are running. It provides service channels for users and delivers data access, task submission, and computing functions.

– Management plane

Public cloud console. It is used to apply for and manage MRS.

● Host security

Users can deploy third-party antivirus software based on their service requirements. For the operating system (OS) and interfaces, MRS provides the following security protection measures:

– Hardening OS kernel security

– Installing the latest OS patch

– Controlling the OS rights

– Managing OS interfaces

– Protecting the OS protocols and interfaces from attacks

● Data security

MRS stores data on the OBS platform, ensuring data security.

● Data integrity

After processing data, MRS encrypts and transmits data to the OBS system through SSL, ensuring data integrity.


6 How Is MRS Charged?

Currently, the commercial version of MRS is charged based on the ECSs in a cluster. By default, cluster nodes can be purchased in Yearly/Monthly mode or On-demand mode.

● Yearly/Monthly: The duration ranges from one month to three years. The customer must pay in full when purchasing a cluster.

● On-demand: Cluster node usage is charged based on time, with the default unit being per hour.


7 What Is Region and AZ?

Concept

A region and availability zone (AZ) identify the location of a data center. You can create resources in a specific region and AZ.

● Regions are divided based on geographical location and network latency. Public services, such as Elastic Cloud Server (ECS), Elastic Volume Service (EVS), Object Storage Service (OBS), Virtual Private Cloud (VPC), Elastic IP (EIP), and Image Management Service (IMS), are shared within the same region. Regions are classified as universal regions and dedicated regions. A universal region provides universal cloud services for common tenants. A dedicated region provides only services of a specific type or serves only specific tenants.

● An AZ contains one or multiple physical data centers. Each AZ has independent cooling, fire extinguishing, moisture-proof, and electricity facilities. Within an AZ, computing, network, storage, and other resources are logically divided into multiple clusters. AZs within a region are interconnected using high-speed optical fibers to allow you to build cross-AZ high-availability systems.

Figure 7-1 shows the relationship between regions and AZs.

Figure 7-1 Regions and AZs


HUAWEI CLOUD provides services in many regions around the world. You can select a region and AZ as needed. For more information, see HUAWEI CLOUD Global Regions.

Region Selection

When selecting a region, consider the following factors:

● Location

You are advised to select a region close to you or your target users. This reduces network latency and improves access speed. However, Chinese mainland regions provide basically the same infrastructure, BGP network quality, as well as operations and configurations on resources. Therefore, if you or your target users are in the Chinese mainland, you do not need to consider the network latency differences when selecting a region.

Regions outside the Chinese mainland, such as Bangkok and Hong Kong, provide services for users outside the Chinese mainland. If you or your target users are in the Chinese mainland, these regions are not recommended due to high access latency.

– If you or your target users are in Asia Pacific except the Chinese mainland, select the AP-Hong Kong, AP-Bangkok, or AP-Singapore region.

– If you or your target users are in Africa, select the AF-Johannesburg region.

– If you or your target users are in Europe, select the EU-Paris region.

● Resource price

Resource prices may vary in different regions. For details, see Product Pricing Details.
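The location guidance above can be captured as a small lookup. The following is a minimal Python sketch, not part of any HUAWEI CLOUD SDK: the function name `recommend_regions` and the area keys are hypothetical, and only the region names come from the list above. Users in the Chinese mainland are not covered, because the text states that latency differences between mainland regions need not be considered.

```python
from typing import Dict, List

# Region names are taken from the Location guidance above.
# The area keys ("asia_pacific", ...) are hypothetical labels for this sketch;
# "asia_pacific" means Asia Pacific except the Chinese mainland.
RECOMMENDED_REGIONS: Dict[str, List[str]] = {
    "asia_pacific": ["AP-Hong Kong", "AP-Bangkok", "AP-Singapore"],
    "africa": ["AF-Johannesburg"],
    "europe": ["EU-Paris"],
}


def recommend_regions(user_area: str) -> List[str]:
    """Return the regions suggested for users located in `user_area`."""
    key = user_area.strip().lower().replace(" ", "_")
    if key not in RECOMMENDED_REGIONS:
        raise ValueError(f"no recommendation recorded for area: {user_area!r}")
    # Return a copy so callers cannot modify the lookup table.
    return list(RECOMMENDED_REGIONS[key])


if __name__ == "__main__":
    print(recommend_regions("Asia Pacific"))
```

Resource price still has to be compared separately, since the lookup only reflects location.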

AZ Selection

When determining whether to deploy resources in the same AZ, consider your application's requirements on disaster recovery (DR) and network latency.

● For high DR capability, deploy resources in different AZs in the same region.
● For low network latency, deploy resources in the same AZ.

Regions and Endpoints

Before using an API to call resources, specify its region and endpoint. For more details, see Regions and Endpoints.


8 How Do I Prepare a Data Source for MRS?

MRS can process data in both OBS and HDFS. Before using MRS to analyze data, you are required to prepare the data.

1. Upload local data to OBS.

   a. Log in to the OBS management console.
   b. Create a userdata bucket, and then create the program, input, output, and log folders in the userdata bucket.
      i. Click Create Bucket to create a userdata bucket.
      ii. In the userdata bucket, click Create Folder to create the program, input, output, and log folders.
   c. Upload local data to the userdata bucket.
      i. Go to the program folder, and click to select a user program.
      ii. Click Upload.
      iii. Repeat the preceding steps to upload the data files to the input folder.

2. Import OBS data to HDFS.

   This function is available only when Kerberos authentication is disabled and the cluster is running properly.

   a. Log in to the MRS management console.
   b. Go to the File Management page and select HDFS File List.
   c. Click the data storage directory, for example, bd_app1.
      bd_app1 is just an example. The storage directory can be any directory on the page. You can create a directory by clicking Create Folder.
   d. Click Import Data, and click to configure the paths of HDFS and OBS, as shown in Figure 8-1.


Figure 8-1 Importing files

   e. Click OK.
      You can view the file upload progress in File Operation Record.
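Before uploading, it can help to plan which local files go into which folder of the userdata bucket. The following minimal Python sketch only builds that plan; it does not call OBS. The function name `plan_uploads` and the example file paths are hypothetical, while the bucket and folder names come from the procedure above.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

BUCKET = "userdata"                               # bucket name used in the procedure
FOLDERS = ("program", "input", "output", "log")   # folders created in step 1.b


def plan_uploads(programs: Iterable[str],
                 data_files: Iterable[str]) -> Dict[str, List[str]]:
    """Map each bucket folder to the object names that would be uploaded into it."""
    plan: Dict[str, List[str]] = {folder: [] for folder in FOLDERS}
    for path in programs:
        # User programs go to the program folder (step 1.c).
        plan["program"].append(PurePosixPath(path).name)
    for path in data_files:
        # Data files go to the input folder (step 1.c.iii).
        plan["input"].append(PurePosixPath(path).name)
    return plan


if __name__ == "__main__":
    # Hypothetical local files, for illustration only.
    plan = plan_uploads(["/home/user/wordcount.jar"], ["/data/a.txt", "/data/b.txt"])
    for folder, names in plan.items():
        print(f"{BUCKET}/{folder}: {names}")
```

The output and log folders stay empty in the plan because they are filled by jobs, not by the upload step.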


9 What Is the Difference Between OBS and HDFS Data Storage?

The data to be processed by MRS comes from OBS or HDFS. OBS provides you with massive, highly reliable, and secure data storage capabilities at a low cost. MRS can process the data in OBS. You can view, manage, and use data by using OBS Console or an OBS client. In addition, you can use the REST APIs to manage or access data. You can use the REST APIs alone or integrate them with service programs.

● OBS data storage: Data storage and computing are performed separately. OBS data storage features low cost and unlimited storage capacity. Because the data is kept in OBS, clusters can be terminated at any time. The computing performance is determined by OBS access performance and is lower than that of HDFS. OBS is recommended when data computing is not frequent.

● HDFS data storage: Data storage and computing are performed together. HDFS data storage features high cost, high computing performance, and limited storage capacity. Before terminating clusters, you must export and store the data. HDFS is recommended when data computing is frequent.


10 How Do I View All Clusters?

On the Cluster page, you can view clusters in various states. If there are many clusters, you can turn pages to view clusters in any state.

● Active Clusters: contains all clusters except those in the Failed or Terminated state.

● Cluster History: contains the clusters in the Failed or Terminated state. Only clusters terminated within the last six months are displayed. If you want to view clusters terminated more than six months ago, contact technical support engineers.

● Failed Tasks: contains only the tasks in the Failed state. Task failures include:
  – Cluster creation failure
  – Cluster termination failure
  – Cluster scale-out failure
  – Cluster scale-in failure
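The split between the Active Clusters and Cluster History views is a simple filter on cluster state. The following minimal Python sketch illustrates it under stated assumptions: the dictionary layout and the cluster names are hypothetical, and it does not model the six-month display limit or call any MRS API.

```python
from typing import Dict, List

# States that move a cluster out of Active Clusters, per the list above.
HISTORY_STATES = {"Failed", "Terminated"}

Cluster = Dict[str, str]  # hypothetical shape: {"name": ..., "state": ...}


def active_clusters(clusters: List[Cluster]) -> List[Cluster]:
    """Clusters shown under Active Clusters: everything not Failed or Terminated."""
    return [c for c in clusters if c["state"] not in HISTORY_STATES]


def cluster_history(clusters: List[Cluster]) -> List[Cluster]:
    """Clusters shown under Cluster History: Failed or Terminated only."""
    return [c for c in clusters if c["state"] in HISTORY_STATES]


if __name__ == "__main__":
    # Hypothetical sample data for illustration.
    sample = [
        {"name": "mrs-a", "state": "Running"},
        {"name": "mrs-b", "state": "Terminating"},
        {"name": "mrs-c", "state": "Terminated"},
        {"name": "mrs-d", "state": "Failed"},
    ]
    print([c["name"] for c in active_clusters(sample)])
    print([c["name"] for c in cluster_history(sample)])
```

Every cluster lands in exactly one of the two views, which matches the description of Active Clusters as "all clusters except" the history states.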


11 How Do I View Log Information?

On the Operation Log page, you can view log information about users' operations on clusters and jobs only when Kerberos authentication is disabled. Currently, MRS has two types of logs:

● Cluster: Creating, terminating, shrinking, and expanding a cluster
● Job: Creating, stopping, and deleting a job

Figure 11-1 shows log information about users' operations.

Figure 11-1 Log information


12 What Types of Jobs Are Supported by MRS?

A job is the means by which MRS executes your programs. Currently, MRS supports MapReduce jobs, Spark jobs, and Hive jobs. Table 12-1 describes job characteristics.

Table 12-1 Job types

Type         Description

MapReduce    MapReduce is a simplified parallel programming model used for parallel computing on large data sets (over 1 TB).
             Map divides one task into multiple tasks, and Reduce summarizes the processing results of these tasks and produces the final analysis result.
             After you complete code development, pack the code into a JAR file in IDEA or Eclipse, upload the file to the MRS cluster for execution, and obtain the execution result.

Spark        Spark is a batch data processing engine with high processing speed. Spark has demanding requirements on memory because it performs computing in memory. A Spark job includes:
             ● Spark: the program name ends with .jar (case-insensitive).
             ● Spark Script: the script name ends with .sql (case-insensitive).
             ● Spark SQL: specifies standard Spark SQL statements, for example, show tables;.

Hive         Hive is a data warehouse framework built on Hadoop. Hive provides Hive query language (HiveQL), similar to structured query language (SQL), to process structured data. Hive automatically converts HiveQL in Hive Script to a MapReduce task to query and analyze massive data stored in the Hadoop cluster.
             An example of a standard HiveQL statement is as follows:
             create table page_view(viewTime INT, userid BIGINT, page_url STRING, referrer_url STRING, ip STRING COMMENT 'IP Address of the User');
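To make the Map and Reduce roles in Table 12-1 concrete, here is a minimal word-count sketch in plain Python. It is an illustration of the programming model only and does not use the Hadoop API: all function names are hypothetical, and the explicit `shuffle` step stands in for the grouping that a MapReduce framework performs between the two phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: turn one input line into independent (word, 1) records."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group mapped values by key so that each key can be reduced on its own."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: summarize all partial results for one key."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on a small in-memory input."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

In a real cluster the map calls run in parallel on different data splits, which is what makes the model suitable for large data sets.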


13 How Do I Submit Developed Programs to MRS?

MRS provides a platform for executing programs developed by users. You can submit, execute, and monitor such programs by using MRS. To submit developed programs to MRS, set Program Path to the actual path for storing such programs, as shown in Figure 13-1.

Figure 13-1 Creating a job


14 How Do I View Cluster Configurations?

● After a cluster is created, you can choose Clusters > Active Clusters, select a running cluster, and click its name to switch to the cluster information page. You can view the basic configuration information about a cluster, including the name, ID, charging type, region, creation time, Hadoop component version, as well as the instance specifications and capacities of nodes. The instance specifications and capacities of nodes determine the data analysis and processing capability of a cluster. More advanced instance specifications and larger capacity allow faster cluster running and better data processing, and accordingly require higher cluster costs.

● Choose Clusters > Active Clusters, select a running cluster, click its name to switch to the cluster information page, and then click View to go to the cluster management page. On the MRS cluster management page that is displayed, you can view and process alarm information, modify cluster configurations, and install cluster patches.


15 What ECS Specifications Are Supported by MRS?

MRS provides optimal specifications based on extensive experience in big data product optimization. Host specifications are determined by CPUs, memory, and disks. Currently, the following specifications are supported:

MRS uses ECSs of the following types in different application scenarios.

● General computing (S1)

● General computing (S2)

● General computing (S3)

● General computing (C2)

● General computing-plus (C3)

● Disk-intensive (D2)

● General network enhancement (C3ne)

ECS Flavor Naming Rules

ECS flavors are named using the format "AB.C.D".

Example: m2.8xlarge.8

The format is defined as follows:

● A specifies the ECS type. For example, s indicates a general-purpose ECS, c a computing ECS, and m a memory-optimized ECS.

● B specifies the type ID. For example, the 1 in s1 indicates a general-purpose first-generation ECS, and the 2 in s2 indicates a general-purpose second-generation ECS.

● C specifies a flavor size and can be any of the following options: medium, large, and xlarge.

● D specifies the ratio of memory to vCPUs expressed in a digit. For example, value 4 indicates that the ratio of memory to vCPUs is 4.
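The "AB.C.D" naming rule can be checked mechanically. The following minimal Python sketch splits a flavor name into its four parts. It is not an official parser, and the names `Flavor` and `parse_flavor` are hypothetical. Three assumptions go beyond the rule as stated and are inferred from the flavor names listed in this FAQ: the size may carry a numeric multiplier (as in 8xlarge), the D part may be absent (as in s1.xlarge), and a letter suffix after the type ID (as in c3ne) is kept as part of B.

```python
import re
from typing import NamedTuple, Optional


class Flavor(NamedTuple):
    ecs_type: str                # A: ECS type, e.g. "s", "c", "m"
    type_id: str                 # B: type ID, e.g. "1", "2", "3ne" (suffix kept, assumption)
    size: str                    # C: flavor size, e.g. "xlarge", "8xlarge"
    memory_ratio: Optional[int]  # D: memory-to-vCPU ratio, None if the name omits it


# Assumed grammar: letters, digits (plus optional letter suffix), ".", size, optional ".digits".
_FLAVOR_RE = re.compile(
    r"(?P<ecs_type>[a-z]+?)(?P<type_id>\d+[a-z]*)"
    r"\.(?P<size>medium|large|\d*xlarge)"
    r"(?:\.(?P<ratio>\d+))?"
)


def parse_flavor(name: str) -> Flavor:
    """Split a flavor name such as 'm2.8xlarge.8' into its A, B, C, and D parts."""
    match = _FLAVOR_RE.fullmatch(name.strip().lower())
    if match is None:
        raise ValueError(f"not a recognized flavor name: {name!r}")
    ratio = match.group("ratio")
    return Flavor(
        ecs_type=match.group("ecs_type"),
        type_id=match.group("type_id"),
        size=match.group("size"),
        memory_ratio=int(ratio) if ratio is not None else None,
    )


if __name__ == "__main__":
    print(parse_flavor("m2.8xlarge.8"))
```

For the example name m2.8xlarge.8, A is m, B is 2, C is 8xlarge, and D is 8.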


Specifications

Table 15-1 General computing ECS specifications

ECS Type  vCPUs  Memory (GB)  Flavor        Virtualization Type
S1        4      16           s1.xlarge     XEN
S1        16     64           s1.4xlarge    XEN
S1        32     128          s1.8xlarge    XEN
S2        4      16           s2.xlarge.4   KVM
S2        8      16           s2.2xlarge.2  KVM
S2        16     32           s2.4xlarge.2  KVM
S2        16     64           s2.4xlarge.4  KVM
S2        32     128          s2.8xlarge.4  KVM
S3        8      16           s3.2xlarge.2  KVM
S3        16     32           s3.4xlarge.2  KVM
S3        4      16           s3.xlarge.4   KVM
S3        16     64           s3.4xlarge.4  KVM
C2        8      16           c2.2xlarge    XEN
C2        16     32           c2.4xlarge    XEN

Table 15-2 General computing-plus (C3) ECS specifications

ECS Type  vCPUs  Memory (GB)  Flavor         Virtualization Type
C3        4      8            c3.xlarge.2    KVM
C3        16     32           c3.4xlarge.2   KVM
C3        4      16           c3.xlarge.4    KVM
C3        8      32           c3.2xlarge.4   KVM
C3        16     64           c3.4xlarge.4   KVM
C3        32     128          c3.8xlarge.4   KVM
C3        60     256          c3.15xlarge.4  KVM


Table 15-3 D2 ECS specifications

ECS Type  vCPUs  Memory (GB)  Flavor        Virtualization Type  Local Disk (GB)  Hardware
D2        8      64           d2.2xlarge.8  KVM                  4×1800           CPU: Intel® Xeon® Gold 6151 Processor v5; Memory: 20 × 32 GB
D2        16     128          d2.4xlarge.8  KVM                  8×1800
D2        32     256          d2.8xlarge.8  KVM                  16×1800

Table 15-4 General network enhancement (C3ne) ECS specifications

ECS Type  vCPUs  Memory (GB)  Flavor           Virtualization Type
C3ne      8      16           c3ne.2xlarge.2   KVM
C3ne      4      16           c3ne.xlarge.4    KVM
C3ne      8      32           c3ne.2xlarge.4   KVM
C3ne      16     32           c3ne.4xlarge.2   KVM
C3ne      16     64           c3ne.4xlarge.4   KVM
C3ne      32     128          c3ne.8xlarge.4   KVM
C3ne      60     256          c3ne.15xlarge.4  KVM

More advanced host specifications enable better data processing, and accordingly require higher cluster costs. You can choose host specifications based on site requirements.


16 What Is the Relationship Between Spark and Hadoop?

Spark is a fast, general-purpose computing engine that is compatible with Hadoop data. Spark can run in a Hadoop cluster by using Yarn and process data of any type in HDFS, HBase, Hive, and Hadoop.


17 What Types of Spark Jobs Are Supported by an MRS Cluster?

On the MRS page, an MRS cluster supports Spark jobs submitted in Spark, Spark Script, or Spark SQL mode.
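Table 12-1 distinguishes the three modes by what is submitted: a .jar program, a .sql script, or SQL statements, with the extensions matched case-insensitively. The following minimal Python sketch applies that rule. The function name `spark_job_mode` is hypothetical, and treating any input that is neither a .jar nor a .sql name as Spark SQL statements is an assumption of this sketch, not documented MRS behavior.

```python
def spark_job_mode(program: str) -> str:
    """Guess the Spark submission mode from the submitted program or statement text."""
    lowered = program.strip().lower()  # extensions are case-insensitive per Table 12-1
    if lowered.endswith(".jar"):
        return "Spark"
    if lowered.endswith(".sql"):
        return "Spark Script"
    # Assumption: anything else is treated as inline Spark SQL statements.
    return "Spark SQL"


if __name__ == "__main__":
    # Hypothetical inputs for illustration.
    for item in ("program/WordCount.JAR", "program/report.sql", "show tables;"):
        print(item, "->", spark_job_mode(item))
```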


18 Can a Spark Cluster Access Data in OBS?

Similar to a Hadoop cluster, a Spark cluster can access data stored in the OBS system. When Kerberos authentication is disabled, you only need to set Import From and Export To to paths in the OBS system when submitting jobs.


19 What Is the Relationship Between Hive and Other Components?

● Relationship between Hive and HDFS
  Hive is a subproject of Apache Hadoop. Hive uses HDFS as the file storage system. Hive parses and processes structured data, and HDFS provides highly reliable underlying storage support for Hive. All data files in the Hive database are stored in HDFS, and all data operations on Hive are also performed using HDFS APIs.

● Relationship between Hive and MapReduce
  Hive data computing depends on MapReduce. MapReduce is a subproject of Apache Hadoop. It is a parallel computing framework based on HDFS. During data analysis, Hive translates HiveQL statements submitted by users into MapReduce jobs and submits the jobs to MapReduce for execution.

● Relationship between Hive and DBService
  MetaStore (metadata service) of Hive processes the structure and attribute information about Hive databases, tables, and partitions. The information needs to be stored in a relational database and is maintained and processed by MetaStore. In MRS, the relational database is maintained by the DBService component.

● Relationship between Hive and Spark
  Hive data computing can be implemented on Spark. Spark is an Apache project. It is a distributed computing framework based on memory. During data analysis, Hive translates HiveQL statements submitted by users into Spark jobs and submits the jobs to Spark for execution.


20 What Types of Distributed Storage Are Supported by MRS?

MRS supports Hadoop 2.8.x now and will support other mainstream Hadoop versions released by the community.


21 Can MRS Cluster Nodes Be Changed on the MRS Management Console?

MRS cluster nodes cannot be changed on the MRS management console. You are not advised to change MRS cluster nodes on the ECS management console either. If you manually stop or delete the ECS, modify or reinstall the ECS OS, or modify the ECS specifications for a cluster node on the ECS management console, the cluster may work improperly.

If you have performed any of the preceding operations, MRS automatically identifies and deletes the involved cluster node. You can replace the deleted node by expanding the capacity of the cluster on the MRS management console. Do not perform any operation on a node during capacity expansion.

22 How Do I Disassociate a Subnet from the ACL Network?

Scenario

You can disassociate a subnet from the ACL network when necessary.

Procedure

Step 1 Log in to the management console.

Step 2 On the console homepage, under Network, click Virtual Private Cloud.

Step 3 In the navigation tree on the left, choose Network ACL.

Step 4 Locate the target network ACL in the right pane, and click the network ACL name to switch to the network ACL details page.

Step 5 On the displayed page, click the Associated Subnets tab.

Step 6 On the Associated Subnets page, locate the target subnet and click Disassociate in the Operation column.

Step 7 Click OK.

----End

23 Class Cannot Be Found After Flume Submits Jobs to Spark Streaming

Issue

After Flume submits jobs to Spark Streaming, the class cannot be found.

Symptom

After the Spark Streaming code is packaged into a JAR file and submitted to the cluster, an error message is displayed indicating that the class cannot be found. Neither of the following two methods resolves the problem:

1. When submitting a Spark job, use the --jars option to reference the JAR file of the class.

2. Import the JAR file where the class resides to the JAR file of Spark Streaming.

Possible Cause

Some JAR files cannot be loaded during Spark job execution. As a result, the class cannot be found.

Procedure

Step 1 Use the --jars option to load the flume-ng-sdk-{version}.jar dependency package.

Step 2 Modify the following two configuration items in the spark-defaults.conf file:

spark.driver.extraClassPath=$PWD/*:{Add the original value}

spark.executor.extraClassPath=$PWD/*

Step 3 Run the job again. If an error is still reported, check which JAR file is not loaded and perform Step 1 and Step 2 again.

----End
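The classpath change in Step 2 can be sketched as a small shell helper. This is only an illustration: the function name prepend_pwd_classpath and the sample original value /opt/example/lib/* are hypothetical and do not come from MRS. The helper puts the literal string $PWD/* in front of whatever value was configured before, which is what the {Add the original value} placeholder asks for.

```shell
#!/bin/sh
# Sketch only: prepend_pwd_classpath and the sample path are hypothetical.
# Builds a classpath value by placing the literal string $PWD/* in front of
# the previously configured value (if there was one).
prepend_pwd_classpath() {
    original="$1"
    if [ -z "$original" ]; then
        # No previous value: the result is just $PWD/*
        printf '%s\n' '$PWD/*'
    else
        # Keep the previous value after the colon separator
        printf '%s:%s\n' '$PWD/*' "$original"
    fi
}

# Print the two configuration lines from Step 2 for a sample original value.
driver_cp=$(prepend_pwd_classpath "/opt/example/lib/*")
printf 'spark.driver.extraClassPath=%s\n' "$driver_cp"
printf 'spark.executor.extraClassPath=%s\n' '$PWD/*'
```

The single quotes around $PWD/* matter: the string must reach the configuration file unexpanded, so that it is resolved in the working directory of the Spark process rather than in the shell that edits the file.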

24 A Job Is Running on Hue

Issue

The customer finds that a job is running on Hue.

Symptom

After the customer's MRS cluster is installed, a job is running on Hue, but the customer did not start this job.

Possible Cause

This job is a permanent job generated when the system connects to JDBC after Spark is started.

Procedure

This is not a problem. No handling is required.

25 An Error Is Reported When the Split Value Is Changed in the Spark Application

Issue

An error is reported when the split value is changed in the Spark application.

Symptom

The customer needs to change the maximum split size so that more mappers can run in parallel and speed up the job. However, an error is reported when the set $parameter command is run to modify the Hive value.

Possible Cause

● When the hive.security.whitelist.switch parameter is used to enable or disable the whitelist in security mode, every parameter that needs to be set at run time must be listed in the hive.security.authorization.sqlstd.confwhitelist parameter.

● The default whitelist does not contain the mapred.max.split.size parameter. Therefore, the system displays a message indicating that the maximum split size cannot be modified.

Procedure

Step 1 Log in to MRS Manager, and choose Services > Hive > Service Configuration.

Step 2 Set Type to All, search for hive.security.authorization.sqlstd.confwhitelist, and add mapred.max.split.size to hive.security.authorization.sqlstd.confwhitelist. For details, see Using Hive from Scratch.

Step 3 Restart the Hive component after the modification.

Step 4 Run the set mapred.max.split.size=1000000 command. If no error is reported, the modification is successful.

----End

26 MRS Development Guide Does Not Contain Information About How to Use Loader

Issue

The MRS development guide does not contain information about how to use Loader. What should customers do?

Symptom

The customer cannot find the document about how to use Loader on the official website.

Possible Cause

The customer has not found the correct entry on the official website for information about how to use Loader.

Procedure

Step 1 Log in to Help Center, choose EI Enterprise Intelligence > MapReduce Service, and select User Guide.

Step 2 In the User Guide, search for the Using Loader section or the keyword Loader. Alternatively, go to the Using Loader page directly.

Step 3 Create source and destination links based on the application scenario. For details, visit Using Loader to Import Data from OBS to HDFS.

----End

27 Plenty of Jobs Are Found After Yarn Is Started

Issue

After the customer creates an MRS cluster and starts Yarn, plenty of jobs occupying resources are found.

Issue Type

Data management class

Symptom

After the customer creates an MRS cluster and starts Yarn, plenty of jobs occupying resources are found.

Possible Cause

● It is suspected that there are hacker attacks.

● The source address of the Any protocol in the inbound direction of the security group (SG) is set to 0.0.0.0/0.

Procedure

Step 1 Log in to the MRS management console. On the Active Clusters page, click the cluster name. The cluster details page is displayed.

Step 2 Click View next to Cluster Manager. The Access MRS Manager page is displayed.

Step 3 Click Manage Security Group Rule to check the security group rule configuration.

Step 4 Check whether the source address of the Any protocol in the inbound direction is 0.0.0.0/0.

Step 5 If it is 0.0.0.0/0, change the remote end of the Any protocol in the inbound direction to a specified IP address. If it is not 0.0.0.0/0, there is no need to change the value.

Step 6 After the value is changed successfully, restart the cluster VM.

----End
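The decision made in Step 4 and Step 5 can be written down as a short shell sketch. The function name check_inbound_rule and its two-argument form are hypothetical: the values are copied by hand from the security group rule list on the console, and the script does not call any cloud API.

```shell
#!/bin/sh
# Sketch only: check_inbound_rule is a hypothetical helper.
# $1 = protocol shown for the inbound rule (for example "Any")
# $2 = source address shown for the inbound rule
# Prints "restrict" when the rule is the one described in Step 5,
# otherwise prints "keep".
check_inbound_rule() {
    protocol="$1"
    source_addr="$2"
    if [ "$protocol" = "Any" ] && [ "$source_addr" = "0.0.0.0/0" ]; then
        echo "restrict"
    else
        echo "keep"
    fi
}

# A rule open to every address must be restricted to a specified IP address.
check_inbound_rule "Any" "0.0.0.0/0"
# A rule already limited to one address (sample value) can stay as it is.
check_inbound_rule "Any" "192.0.2.10/32"
```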

Summary and Suggestions

Disable the Any protocol in the inbound direction, or restrict the remote end of the Any protocol in the inbound direction to a specified IP address.

Reference

For details, see Security Configuration Suggestions for Clusters with Kerberos Authentication Disabled.

28 Can MRS Be Connected to an External Network?

Issue

Can MRS be connected to an external network?

Symptom

Can MRS be connected to an external network?

Possible Cause

MRS can be accessed from an external network only after an EIP is bound to an MRS node.

Procedure

Step 1 Log in to the MRS management console, locate the cluster to be accessed in the active cluster list, and click the cluster name.

Step 2 On the node information page, click the name of the node to be accessed, and choose EIPs > Bind EIP.

Step 3 On the Bind EIP page, select the NIC to be bound in the Select NIC drop-down list, and click OK.

If no EIP is available in the EIP address list, you can purchase an EIP. For details, see Assigning an EIP and Binding It to an ECS.

----End

29 An Error Is Reported When Spark Is Used

Issue

When Spark is used, the cluster fails to run.

Symptom

When Spark is used, the cluster fails to run.

Possible Cause

● Invalid characters are added during command execution.

● The owner and owner group of the uploaded JAR file are incorrect.

Procedure

Step 1 Run /bin/spark-submit --class cn.interf.Test --master yarn-client /opt/client/Spark/spark1-1.0-SNAPSHOT.jar to check whether the command contains invalid characters.

Step 2 If it does, correct the invalid characters and run the command again.

Step 3 If other errors occur after the command is run again, check the owner and the owner group of the JAR file. In this case, both are root.

Step 4 Change the owner and the owner group of the JAR file to omm:wheel.

----End
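The ownership check in Step 3 and Step 4 can be sketched in shell. The helper names file_owner, owner_status, and check_jar are hypothetical. The sketch reads the owner and group from the output of ls -ld and only prints the chown command that would be needed, so it changes nothing by itself.

```shell
#!/bin/sh
# Sketch only: all function names here are hypothetical helpers.

# Owner and group expected for the uploaded JAR file (from Step 4).
EXPECTED_OWNER="omm:wheel"

# Print "owner:group" of a file, taken from fields 3 and 4 of `ls -ld`.
file_owner() {
    ls -ld "$1" | awk '{print $3 ":" $4}'
}

# Compare an "owner:group" string with the expected value.
owner_status() {
    if [ "$1" = "$EXPECTED_OWNER" ]; then
        echo "ok"
    else
        echo "mismatch"
    fi
}

# Report on one JAR file and print the chown command if a change is needed.
check_jar() {
    jar="$1"
    current=$(file_owner "$jar")
    if [ "$(owner_status "$current")" = "ok" ]; then
        echo "$jar: owner is $current, no change needed"
    else
        echo "$jar: owner is $current, run: chown $EXPECTED_OWNER $jar"
    fi
}
```

For example, check_jar /opt/client/Spark/spark1-1.0-SNAPSHOT.jar reports the current owner of the JAR file used in Step 1 and, if it is root:root, prints the chown command from Step 4.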

30 Task Nodes Are Failed to Be Decreased

Issue

The customer resizes the cluster on the MRS cluster details page and changes the number of Task nodes to 0, but the Task nodes are not decreased.

Symptom

When the customer adjusts the Task nodes on the MRS cluster details page, a message stating "This operation is not allowed because the number of instances of NodeManager will be less than the minimum configuration after scale-in, which may cause data loss." is displayed, indicating that cluster scale-in fails.

Possible Cause

The customer stopped the NodeManager service of a Core node. When MRS checks whether Task nodes can be decommissioned, it finds that if all Task nodes are decommissioned, no NodeManager remains and the Yarn service becomes unavailable. MRS allows Task nodes to be decommissioned only when the number of remaining NodeManagers is greater than or equal to 1.

Procedure

Step 1 Log in to the MRS management console. On the Active Clusters page, click the cluster name. The cluster details page is displayed.

Step 2 Click View next to Cluster Manager to access the MRS Manager page.

Step 3 On the MRS Manager page, choose Services > Yarn > Instance.

Step 4 Start NodeManager of the Core node.

Step 5 Then, decommission Task nodes on the cluster list page. If you do not want to use NodeManager of the Core node after the Task node decommissioning is successful, stop NodeManager.

----End

Summary and Suggestions

Typically, NodeManager of the Core node is not stopped. You are not advised to change the cluster deployment architecture.

31 How Do I Handle the Expired OBS Certificate in the Cluster?

Issue

The certificate has expired when the customer accesses OBS from the MRS cluster.

Symptom

A certificate expiration error occurs when the customer accesses the OBS service from the MRS cluster. As a result, the OBS system data cannot be accessed.

Possible Cause

The certificate generated by the OBS system has a validity period. After the validity period expires, the server automatically updates the certificate. As a result, an error occurs when the customer uses the old certificate to access the OBS system.

Procedure

Log in to the cluster node in the background using VNC and run the following commands. For details about the configuration of each region on HUAWEI CLOUD, see Table 31-1.

/opt/Bigdata/jdk/bin/keytool -delete -storepass changeit -alias ${uds_url} -keystore /opt/Bigdata/jdk/jre/lib/security/cacerts || true
echo | /usr/bin/openssl s_client -connect ${uds_url}:${uds_port} 2>&1 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > /tmp/obs.pem
/usr/bin/openssl x509 -in /tmp/obs.pem -text | grep CN
yes | /opt/Bigdata/jdk/bin/keytool -import -storepass changeit -alias ${uds_url} -keystore /opt/Bigdata/jdk/jre/lib/security/cacerts -file /tmp/obs.pem
rm -rf /tmp/obs.pem

Table 31-1 Configuration of each region on HUAWEI CLOUD

Region               uds_url                             uds_port
CN North-Beijing1    obs.cn-north-1.myhuaweicloud.com    443
CN North-Beijing4    obs.cn-north-4.myhuaweicloud.com    443
CN East-Shanghai1    obs.cn-east-3.myhuaweicloud.com     443
CN East-Shanghai2    obs.cn-east-2.myhuaweicloud.com     443
CN South-Guangzhou   obs.cn-south-1.myhuaweicloud.com    443
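As a convenience, the lookup in Table 31-1 can be sketched as a shell function that sets the uds_url and uds_port variables used by the commands in the procedure. The function name obs_endpoint is hypothetical, and the region names and endpoints are copied from the table without further verification.

```shell
#!/bin/sh
# Sketch only: obs_endpoint is a hypothetical helper built from Table 31-1.
# Prints the OBS endpoint for a region name, or fails for an unknown region.
obs_endpoint() {
    case "$1" in
        "CN North-Beijing1")  echo "obs.cn-north-1.myhuaweicloud.com" ;;
        "CN North-Beijing4")  echo "obs.cn-north-4.myhuaweicloud.com" ;;
        "CN East-Shanghai1")  echo "obs.cn-east-3.myhuaweicloud.com" ;;
        "CN East-Shanghai2")  echo "obs.cn-east-2.myhuaweicloud.com" ;;
        "CN South-Guangzhou") echo "obs.cn-south-1.myhuaweicloud.com" ;;
        *) echo "unknown region: $1" >&2; return 1 ;;
    esac
}

# Set the variables expected by the certificate commands.
# Every region in the table uses port 443.
uds_url=$(obs_endpoint "CN North-Beijing4")
uds_port=443
echo "${uds_url}:${uds_port}"
```

With uds_url and uds_port set this way in the same shell session, the keytool and openssl commands from the procedure can be run unchanged.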

32 Adding a New Disk to the MRS Cluster

Issue

MRS HBase is unavailable.

Symptom

High disk usage of the user's host causes service faults.

Possible Cause

Services are unavailable due to insufficient disk capacity.

Procedure

Step 1 Purchase an EVS disk. For details, see Purchasing an EVS Disk.

Step 2 Attach the EVS disk. For details, see Attaching a Non-Shared Disk.

● If the EVS disk has been attached, go to Step 6.

● If an ECS fails to be selected when you attach the EVS disk on the EVS management console, go to Step 3.

Figure 32-1 Failed to select the cloud server.

Step 3 Log in to the ECS management console and click the name of the ECS to which the new disks are to be attached.

Step 4 On the Disks tab page, click Attach Disk.

Figure 32-2 Attaching EVS disks to the node

Step 5 Select new disks to be attached and click OK.

Figure 32-3 Attaching disks

Step 6 Initialize Linux data disks. For details, see Initializing a Linux Data Disk (fdisk).

The attachment point directory is named after the existing DataNode data directory, with the trailing number increased by one. For example, if you run the df -h command and find that the existing directory is /srv/BigData/hadoop/data1, the new attachment point is /srv/BigData/hadoop/data2. When initializing a Linux data disk, create the attachment point /srv/BigData/hadoop/data2 and mount the new partition on it. Example:
mkdir /srv/BigData/hadoop/data2
mount /dev/xvdb1 /srv/BigData/hadoop/data2

Step 7 Run the following command to grant the omm user permission to the new disk:
chown omm:wheel New attachment point

Example: chown omm:wheel /srv/BigData/hadoop/data2

Step 8 Run the chmod 701 command to grant the execution permission on the new attachment point directory.
chmod 701 New attachment point

Example: chmod 701 /srv/BigData/hadoop/data2

Step 9 Log in to MRS Manager and add data disks to DataNode and NodeManager instances.

Step 10 Choose Services > HDFS > Instance > DataNode > Instance Configuration. Set Type to All, and modify the DataNode instance configurations of the current node.

● Enter dfs.datanode.fsdataset.volume.choosing.policy in the search box and modify the parameter value to org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy.

● Enter dfs.datanode.data.dir in the search box and modify the parameter value to /srv/BigData/hadoop/data1/dn,/srv/BigData/hadoop/data2/dn.

After modifying the two parameters, click Save Configuration and select Restart role instance to restart the DataNode instance.

Step 11 Choose Services > Yarn > Instance > NodeManager > Instance Configuration. Set Type to All, and modify the Yarn NodeManager instance configurations of the current node.

● Enter yarn.nodemanager.local-dirs in the search box and modify the parameter value to /srv/BigData/hadoop/data1/nm/localdir,/srv/BigData/hadoop/data2/nm/localdir.

● Enter yarn.nodemanager.log-dirs in the search box and modify the parameter value to /srv/BigData/hadoop/data1/nm/containerlogs,/srv/BigData/hadoop/data2/nm/containerlogs.

After modifying the two parameters, click Save Configuration and select Restart role instance to restart the NodeManager instance.

Step 12 To check whether capacity expansion is successful, choose Services > HDFS > Instance > DataNode and check whether the total disk capacity in the DataNode Capacity real-time monitoring metric has increased in the Charts area. If the DataNode Capacity monitoring metric does not exist in the Charts area, click Customize to add the metric.

● If the total disk capacity has increased, the capacity expansion is complete.

● If the total disk capacity has not increased, contact Huawei technical support.

Step 13 (Optional) Add data disks to a Kafka instance.

Modify the Kafka instance configurations of the current node.

1. Log in to MRS Manager, and choose Services > Kafka > Instance > Broker > Instance Configuration. Set Type to All.

2. Enter log.dirs in the search box, add the directory on the new disk to the value, and separate it from the existing one with a comma (,).
For example, if there is only one existing Kafka data disk and a new one is added, change /srv/BigData/kafka/data1/kafka-logs to /srv/BigData/kafka/data1/kafka-logs,/srv/BigData/kafka/data2/kafka-logs.

3. Click Save Configuration and select Restart role instance to restart the instance as prompted.

4. To check whether capacity expansion is successful, choose Services > Kafka > Instance > Broker and check whether the total disk capacity in the Capacity of Broker Disks real-time monitoring metric has increased.

----End
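Two pieces of string handling recur in the procedure above: deriving the next attachment point from the existing one (Step 6) and adding a directory to a comma-separated parameter value (Steps 10, 11, and 13). A minimal shell sketch follows. The function names next_data_dir and append_dir are hypothetical, and the sketch assumes that the existing directory name ends in data followed by a decimal number without leading zeros.

```shell
#!/bin/sh
# Sketch only: next_data_dir and append_dir are hypothetical helpers.

# Given an existing attachment point such as /srv/BigData/hadoop/data1,
# print the next one (/srv/BigData/hadoop/data2).
next_data_dir() {
    current="$1"
    prefix="${current%data*}"    # everything before the last "data"
    index="${current##*data}"    # the number after the last "data"
    printf '%sdata%s\n' "$prefix" "$((index + 1))"
}

# Append a directory to a comma-separated list, as used by
# dfs.datanode.data.dir, yarn.nodemanager.local-dirs, and log.dirs.
append_dir() {
    existing="$1"
    new="$2"
    if [ -z "$existing" ]; then
        printf '%s\n' "$new"
    else
        printf '%s,%s\n' "$existing" "$new"
    fi
}

# Example: build the new dfs.datanode.data.dir value from the existing disk.
new_mount=$(next_data_dir "/srv/BigData/hadoop/data1")
append_dir "/srv/BigData/hadoop/data1/dn" "${new_mount}/dn"
```

The printed value is only a string to paste into the search result on MRS Manager. The sketch does not create directories, mount disks, or change any cluster configuration.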

Summary and Suggestions

● If the disk usage exceeds 85%, you are advised to expand disk capacity and attach the newly purchased disks to an ECS to associate them with a cluster.

● Perform the attachment steps and set parameters based on the site requirements.
