
HDFS Connector API Reference Part 1

Thanh Nguyen

Introduction

• Hadoop Distributed File System (HDFS) Connector.

Requirements

Requires Mule Enterprise License: Yes
Requires Entitlement: No
Mule Version: 3.6.0 or higher

Kerberos Configuration

• <hdfs:config-with-kerberos>
• Connection Management
• Kerberos authentication configuration. Here you can configure the properties required by Kerberos authentication in order to establish a connection with the Hadoop Distributed File System.

Kerberos Configuration - Attributes

Name Java Type Description

name String The name of this configuration, by which it can later be referenced.

nameNodeUri

String The name of the file system to connect to. It is passed to the HDFS client as the {FileSystem#FS_DEFAULT_NAME_KEY} configuration entry. It can be overridden by values in configurationResources and configurationEntries.

keytabPath String Path to the keytab file associated with username. It is used in order to obtain a TGT from the authorization server. If not provided, the connector will look for a TGT associated with username in your local Kerberos cache.

username String A simple user identity of a client process. It is passed to the HDFS client as the "hadoop.job.ugi" configuration entry. It can be overridden by values in configurationResources and configurationEntries.

configurationResources List<String> A list of configuration resource files to be loaded by the HDFS client. Here you can provide additional configuration files (e.g. core-site.xml).

configurationEntries Map<String,String> A map of configuration entries to be used by the HDFS client. Here you can provide additional configuration entries as key/value pairs.
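Putting the attributes above together, a Kerberos configuration element might look like the following sketch (the name node URI, principal, and keytab path are illustrative values, not defaults):

```xml
<!-- Illustrative values: adjust the URI, principal, and keytab to your cluster -->
<hdfs:config-with-kerberos name="hdfs-conf"
    nameNodeUri="hdfs://localhost:9000"
    username="client@EXAMPLE.REALM"
    keytabPath="conf/client.keytab"/>
```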

Simple Configuration

• <hdfs:config>
• Connection Management
• Simple authentication configuration. Here you can configure the properties required by simple authentication in order to establish a connection with the Hadoop Distributed File System.

Simple Configuration - Attributes

Name Java Type Description

name String The name of this configuration, by which it can later be referenced.

nameNodeUri

String The name of the file system to connect to. It is passed to the HDFS client as the {FileSystem#FS_DEFAULT_NAME_KEY} configuration entry. It can be overridden by values in configurationResources and configurationEntries.

username String A simple user identity of a client process. It is passed to the HDFS client as the "hadoop.job.ugi" configuration entry. It can be overridden by values in configurationResources and configurationEntries.

configurationResources List<String> A list of configuration resource files to be loaded by the HDFS client. Here you can provide additional configuration files (e.g. core-site.xml).

configurationEntries Map<String,String> A map of configuration entries to be used by the HDFS client. Here you can provide additional configuration entries as key/value pairs.
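A minimal simple-authentication configuration might look like this sketch (the URI and username are illustrative):

```xml
<!-- Illustrative values: point the URI at your name node -->
<hdfs:config name="hdfs-conf"
    nameNodeUri="hdfs://localhost:9000"
    username="hdfs-user"/>
```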

Processors

• Read from Path – <hdfs:read-operation>
• XML Sample – <hdfs:read-operation path="/tmp/test.dat" bufferSize="8192" config-ref="hdfs-conf"/>

Read from Path - Attributes

Name Java Type Description

config-ref String Specify which config to use

path String the path of the file to read.

bufferSize int the buffer size to use when reading the file.

Returns

Return Java Type Description

InputStream the result from executing the rest of the flow

Get Path Metadata

• <hdfs:get-metadata>
• Retrieves metadata about the designated path and stores it in flow variables. These flow variables are:
– hdfs.path.exists - Indicates if the path exists (true or false)
– hdfs.content.summary - A summary of the path info
– hdfs.file.checksum - MD5 digest of the file (if it is a file and it exists)
– hdfs.file.status - A Hadoop object that contains info about the status of the file (org.apache.hadoop.fs.FileStatus)

XML Sample

• <hdfs:get-metadata path="/tmp/test.dat" config-ref="hdfs-conf"/>
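After <hdfs:get-metadata> runs, downstream processors can inspect the flow variables, for example with Mule 3 MEL expressions. This choice flow is an illustrative sketch, not taken from the deck:

```xml
<hdfs:get-metadata path="/tmp/test.dat" config-ref="hdfs-conf"/>
<choice>
    <!-- hdfs.path.exists is a boolean flow variable set by get-metadata -->
    <when expression="#[flowVars['hdfs.path.exists']]">
        <logger level="INFO" message="Path exists, status: #[flowVars['hdfs.file.status']]"/>
    </when>
    <otherwise>
        <logger level="WARN" message="Path /tmp/test.dat does not exist"/>
    </otherwise>
</choice>
```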

Write to Path

• <hdfs:write>
• Write the current payload to the designated path, either creating a new file or appending to an existing one.

Write to Path - Attributes

Name Java Type Description

config-ref String Specify which config to use

path String the path of the file to write to.

permission String the file system permission to use if a new file is created, either in octal or symbolic format (umask).

overwrite boolean if a pre-existing file should be overwritten with the new content.

bufferSize int the buffer size to use when appending to the file.

replication int block replication for the file.

blockSize long the block size of the file.

ownerUserName String the username owner of the file.

ownerGroupName String the group owner of the file.

payload InputStream the payload to write to the file.
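Given the attributes above, a write call could be sketched as follows (the path and overwrite flag are illustrative; the current message payload becomes the file content):

```xml
<!-- Illustrative: writes the current message payload to /tmp/output.dat -->
<hdfs:write path="/tmp/output.dat" overwrite="true" config-ref="hdfs-conf"/>
```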

Append to File

• <hdfs:append>
• Append the current payload to a file located at the designated path.
• In order to be able to append data to an existing file, refer to the dfs.support.append configuration parameter.

Append to File - Attributes

Name Java Type Description

config-ref String Specify which config to use

path String the path of the file to write to.

bufferSize int the buffer size to use when appending to the file.

payload InputStream the payload to append to the file.
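By analogy with the other samples, an append call might look like this (the path and buffer size are illustrative, and dfs.support.append must be enabled on the cluster):

```xml
<!-- Illustrative: appends the current message payload to an existing file -->
<hdfs:append path="/tmp/output.dat" bufferSize="4096" config-ref="hdfs-conf"/>
```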

Delete File

• <hdfs:delete-file>
• Delete the file or directory located at the designated path.

Delete File - XML Sample

• <hdfs:delete-file path="/tmp/my-dir" config-ref="hdfs-conf"/>

Delete File - Attributes

Name Java Type Description

config-ref String Specify which config to use

path String the path of the file or directory to delete.

Questions and Answers