STREAMANALYTIX 2.1.6 ADMINISTRATION

Configure, Manage and Monitor



Introduction

Welcome to StreamAnalytix! The StreamAnalytix platform enables enterprises to analyze and respond to events in real time at Big Data scale. With its unique multi-engine architecture, StreamAnalytix provides an abstraction layer that offers the flexibility to execute data pipelines on a stream processing engine of choice, depending upon the application use case and taking into account the advantages of Storm or Spark Streaming based on processing methodology (CEP, ESP) and latency.

About This Guide

The StreamAnalytix Administrator (Superuser) manages workspaces and users, configures system properties, adds connections, operates system pipelines, and monitors system performance. This guide describes all the administration functions in detail.

More Information

Please visit www.streamanalytix.com. To give us your feedback on your experience with the application, or to report bugs or problems, mail us at [email protected]. To receive updated documentation in the future, please register at www.streamanalytix.com. We welcome your feedback.


Terms & Conditions

This manual, the accompanying software, and other documentation are protected by U.S. and international copyright laws and may be used only in accordance with the accompanying license agreement. Features of the software, and of other products and services of Impetus Technologies, may be covered by one or more patents. All rights reserved.

All other company, brand and product names are registered trademarks or trademarks of their respective holders. Impetus Technologies disclaims any responsibility for specifying which companies own which marks or which organizations.

USA (Los Gatos)
Impetus Technologies, Inc.
720 University Avenue, Suite 130
Los Gatos, CA 95032, USA
Ph.: 408.252.7111, 408.213.3310
Fax: 408.252.7114

© 2017 Impetus Technologies, Inc. All rights reserved.

If you have any comments or suggestions regarding this document, please send them via e-mail to [email protected]


Table of Contents

Introduction
  About This Guide
Terms & Conditions
ADMINISTRATION (Superuser)
  Superuser Dashboard
    Pipeline Summary
    Metrics
    SAX Web Health
    Connections
    Alerts
    License Summary
  Workspace Management
    Workspace Creation
    Workspace Cluster Isolation
    User Roles
  Workspace Dashboard
    Pipeline Summary
    Metrics
    SAX Web Health
    Connections
    Alerts
    License Summary
  Configuration
    Environment
    Common
    LogAgent
    LDAP
    Kerberos
  Connections
    ActiveMQ
    Cassandra
    Elasticsearch
    Hbase
    HDFS
    HIVE
    HIVE Emitter
    JDBC
    Kafka
    RabbitMQ
    S3
    Socket
    Solr
  System Pipelines


ADMINISTRATION (Superuser)

StreamAnalytix Superusers are responsible for setting up and administering the cluster or instances, and more. This is a step-by-step guide to Workspace Management, setting up a StreamAnalytix cluster, user-role setup, configuring Environment and other properties, System Pipelines, and Connections.

Superuser Dashboard

The Superuser Dashboard is the landing page when a Superuser logs in. Its widgets provide details about Pipelines, Metrics, SAX Web Health, Connections, Alerts, and the License Summary.

Pipeline Summary

This widget provides a summary of pipelines across all workspaces. It also provides information about cluster usage in the supported engines, Spark and Storm.

Spark

Click on the dropdown at the top right corner and choose Spark.


The table below describes the Pipeline Summary and its components:

Pipeline Summary
  Pipelines: The number of pipelines created in the Spark engine across all workspaces.
  Active: Number of active pipelines.
  Starting: Number of pipelines in Starting mode.
  Error: Number of pipelines that are in the Error state.
  Stopped: Number of pipelines that have stopped.

Cluster Usage Summary
  Node Managers: The total number of Node Managers available on the Hadoop YARN cluster.
  Cores: The utilization of cluster cores by Superuser pipelines. In the screenshot above, 3 out of 100 cores are used. (Each node has 25 cores.)
  Memory: The utilization of cluster memory by all pipelines. In the screenshot above, 8192 MB is used out of 61440 MB of total available memory. (Each node has 15360 MB of memory.)
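The capacity figures above are simple products of node count and per-node resources: 4 nodes with 25 cores and 15360 MB each yield the 100 cores and 61440 MB shown. A minimal sketch of that arithmetic (the node counts and per-node figures come from the example screenshot, not from any StreamAnalytix API):

```python
def cluster_capacity(nodes: int, cores_per_node: int, memory_per_node_mb: int):
    """Total cluster capacity as the dashboard computes it:
    per-node resources multiplied by the node count."""
    return nodes * cores_per_node, nodes * memory_per_node_mb

def utilization_pct(used: float, total: float) -> float:
    """Utilization percentage shown next to the Cores and Memory figures."""
    return 100.0 * used / total

# Figures from the example: 4 nodes, 25 cores and 15360 MB each.
total_cores, total_memory_mb = cluster_capacity(4, 25, 15360)
print(total_cores, total_memory_mb)                      # 100 61440
print(round(utilization_pct(8192, total_memory_mb), 1))  # 13.3
```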

Storm

Click on the dropdown at the top right corner and choose Storm.


Cluster Usage
  Slaves: The total count of Supervisors in the Storm cluster.
  Workers: The utilization of workers by Storm pipelines. In this case, 5 workers have been utilized out of 90 available workers. It also shows the percentage of utilization.

Metrics

Metrics provides the count of different entities across all workspaces.

  Groups: Number of groups created.
  Messages: Number of messages created in the workspaces.
  Users: Number of users created.
  Tables: Number of tables registered under Register Entities.
  Models: Number of models registered.
  Components: Total count of components used.
  Functions: Total count of functions applied.
  Variables: Total count of variables used.
  Transformations: Total number of transformations used.

SAX Web Health

This section helps monitor the overall health of the system through memory and CPU utilization.

Application

Application Web Health shows the utilization of the CPU and Memory by the StreamAnalytix Application. Choose Application from the dropdown menu provided at the top left corner.


CPU

CPU shows the average usage of all cores available on the machine (as a percentage). CPU usage is allocated node by node. For example, if there are 4 node managers on a machine and the application used 2 nodes, that could mean 100% of one node and 90% of the other. (As shown in the diagram above.)

Memory

The total utilization of memory on the machine (as a percentage). That is, the application may use any of the available memory, irrespective of node.

System

This widget provides details about system health. System Web Health concerns the machine on which the StreamAnalytix application is running.

CPU

The cumulative usage of all cores available on the machine (as a percentage), irrespective of the CPU utilization of the StreamAnalytix application itself.

Memory

The total utilization of memory on the machine (as a percentage). This is also irrespective of the application.

Connections

This widget displays all the connections with their respective counts. Connections allow you to connect to infrastructure components such as RabbitMQ, Elasticsearch, ActiveMQ, Cassandra, HDFS, JDBC, HBase, Hive Emitter, Kafka, Kinesis, MapStreams, MQTT, OpenJms, S3, Socket, Solr, and Tibco.


The total number of connections made across components is shown at the top (9 in the diagram below), and the number shown in each component tile is the number of connections made to that component.

Alerts

Alerts let you set conditions on system monitoring parameters such as CPU idle state and memory usage. An Alert Pipeline enables the Alert Processor to generate alerts on the UI and send alert emails; it stores alert messages in the database and aggregates alerts over a given time window. You can check the generated alerts on the UI as well as via email notifications.

This widget will show alerts generated by System with a brief description and their timestamp.

System

System alerts are of two types:

1. Monitoring alerts, thrown when a condition set up in the monitoring section is met.
2. Pipeline-stopped alerts for Superuser pipelines, thrown when an error occurs in the Spark or Storm pipeline.

Note: If a pipeline is stopped directly from the Spark cluster, it generates a pipeline error alert. If you stop it from Data Pipeline, it does not.

License Summary

This is the summary of the license owner and other license details, as described below:


License Type

Licenses are based on the three categories explained below:

  Category A (Time-based): The license cannot be used after the expiry date. It provides early-warning alerts.
  Category B (Total no. of cores): The license will not permit usage of more than the total number of cores.
  Category C (Total no. of nodes and cores): The license is valid for a specific number of nodes and a specific number of cores underlying each node. The system will not permit usage of more than the specified number of nodes or cores per node. For example, 2 nodes with 2 cores per node allows 4 cores in total; if you have 2 nodes but each node has 3 cores, the license will not be valid.

Any single installation of StreamAnalytix will support only one of the following license types:

1. A
2. B
3. C
4. A and B
5. A and C
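As a rough illustration of how the category limits combine, the check below sketches the A/B/C rules described above. The function name, parameters, and combination logic are hypothetical, not StreamAnalytix's actual enforcement code:

```python
from datetime import date

def license_ok(license_type: str, today: date = None, expiry: date = None,
               max_cores: int = None, cores_in_use: int = 0,
               max_nodes: int = None, cores_per_node: int = None,
               actual_nodes: int = 0, actual_cores_per_node: int = 0) -> bool:
    """Hypothetical sketch of the category checks: A is time-based,
    B caps total cores, C caps nodes and cores per node.
    Combined types ("A and C") apply every matching check."""
    ok = True
    if "A" in license_type:   # time-based: invalid after the expiry date
        ok = ok and today <= expiry
    if "B" in license_type:   # total-cores cap
        ok = ok and cores_in_use <= max_cores
    if "C" in license_type:   # nodes and cores-per-node cap
        ok = ok and (actual_nodes <= max_nodes
                     and actual_cores_per_node <= cores_per_node)
    return ok

# The example from the table: 2 nodes with 3 cores each violates a
# "2 nodes, 2 cores per node" category C license.
print(license_ok("C", max_nodes=2, cores_per_node=2,
                 actual_nodes=2, actual_cores_per_node=3))  # False
```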

You can also find the same information at the bottom left corner of the dashboard page by clicking License.

License Owner and Email are the name and email id of the license owner.


Start Date and Expiration Date are the start and end dates of the license duration. The license owner can contact [email protected] if the duration needs to be extended.

You will get an alert 14 days before the expiry date, both by email and as periodic notifications on the StreamAnalytix UI.
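The 14-day early warning amounts to a simple date comparison; a minimal sketch (the function name and window parameter are illustrative, not part of StreamAnalytix):

```python
from datetime import date, timedelta

def in_warning_window(expiry: date, today: date, window_days: int = 14) -> bool:
    """True when today falls within the early-warning window before expiry."""
    return expiry - timedelta(days=window_days) <= today <= expiry

print(in_warning_window(date(2017, 6, 15), date(2017, 6, 5)))  # True  (10 days left)
print(in_warning_window(date(2017, 6, 15), date(2017, 5, 1)))  # False (well before)
```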

Max Cores and Cores/Nodes are explained in the table above.

Violation

Three codes show the status of the license. The Violation tab is at the top right corner of the widget:

GOOD - License not expired and not violated.
Expired - License expired.
Violated - License in violated state.

In case of an Expired or Violated license, contact the StreamAnalytix Support Team at [email protected].


Workspace Management

StreamAnalytix provides multitenancy support through workspaces. You can create multiple workspaces and add users to a workspace. One user can be mapped to multiple workspaces.

Workspace Creation

A Superuser can create any number of workspaces, depending upon requirements. The steps to create a workspace are described below.

Go to Manage Workspace and click Create New Workspace.

Provide a workspace name and add users to create the workspace.

You can either create a new user to add to the workspace or select from existing users.

NOTE: A new user created during workspace creation will have the Developer role. The User Email Id and Password fields are not visible if StreamAnalytix authentication is configured with LDAP, because this information is not needed in the StreamAnalytix database.

How to move to another workspace: Click the Workspace icon in the upper right corner of the Superuser page to view a drop-down list of the workspaces, then choose the workspace you wish to enter.

Actions: Two types of actions can be performed on workspaces:


• Enter: Click the Enter button to switch workspace. If a Superuser enters the workspace, the role changes to Developer. If DevOps and Tier-II users enter the workspace, their roles are unchanged.

• Edit: Click the Edit button of the workspace in which you want to change user assignment.

Note: There is no provision to delete a workspace.

Workspace Cluster Isolation

StreamAnalytix's Cluster Isolation feature enables the Superuser to allocate specific machines to workspaces for better utilization of hardware resources.

Enable Cluster Isolation

To enable Cluster Isolation for workspaces, go to Superuser UI > Configuration > Common and set the Enable StreamAnalytix Storm Scheduling property to TRUE.

Note: Refresh the page to view the Cluster Isolation tab on the left panel.

Assign Machines to Workspaces

From the Cluster Isolation page you can see the details of the machines assigned to a workspace, view available machines, and assign machines to workspaces.


User Roles

User roles determine the level of permissions assigned to a user to perform a group of tasks. StreamAnalytix has the following four user roles:

1. Superuser
2. Developer
3. DevOps
4. Tier-II

Superuser can manage workspaces, all users, configurations, system pipelines, connections, and the cluster. By default, StreamAnalytix comes with the Superuser role. The Superuser creates Developers when workspaces are created. Workspaces are logical containers for user pipelines and message schemas. A Superuser can create DevOps and Tier-II users separately, as these roles span workspaces.

DevOps

A DevOps user can do the following:

• Start, stop, import and export pipelines
• Register entities: set pipeline-scoped variables and custom components
• Manage group, message and alert configuration
• Create connections

To create a DevOps user, log in as a Superuser, go to Manage Users, create a new user and assign the role.


Tier-II

A Tier-II user can do the following:

• Access monitoring
• View pipeline summary and errors
• View alerts

A Tier-II user cannot perform the following operations:

• Create, update, delete, play, pause or stop pipelines
• Register components
• Access group, message and alert configuration

Developer

A user with the Developer role can perform unrestricted operations within a workspace. A Developer can perform all the operations of the DevOps role, in addition to pipeline creation, updating and deletion. Only a Developer or a Superuser can create, update or delete a Developer role.

In the Superuser view, the Manage Users page does not list the Developer users of any workspace; they are listed only in their respective workspace(s) under the Manage Users tab.

If LDAP is configured, the Manage Users tab is not visible in the Superuser view; it is visible in workspaces only. LDAP users are assigned roles based on their LDAP group configurations, because LDAP itself manages DevOps and Tier-II users and there is no need to create them.


Workspace Dashboard

The Workspace Dashboard is the landing page when you enter a workspace. Its widgets provide details about Pipelines, Metrics, SAX Web Health, Connections, Alerts, and the License Summary.

Pipeline Summary

This widget provides details about the pipelines of the respective workspace. It also provides information about cluster usage in the supported engines, Spark and Storm.

Spark

Click on the dropdown at the top right corner and choose Spark. The table below describes the Pipeline Summary and its components:

Pipeline Summary
  Pipelines: The number of pipelines created in the Spark engine in the respective workspace.
  Active: Number of active pipelines in the respective workspace.
  Starting: Number of pipelines in Starting mode in the respective workspace.
  Error: Number of pipelines that are in the Error state in the respective workspace.
  Stopped: Number of pipelines that have stopped in the respective workspace.


Storm

This menu shows the details of the Storm engine. Choose Storm from the dropdown at the top right.

Cluster Usage
  Slaves: The number of Supervisors available in the Storm cluster. The count shown here depends on whether Scheduling is enabled: if it is enabled, it shows the assigned number of supervisors (slaves); if not, it shows the total number of slaves.
  Workers: The total number of workers and how many of them are currently in use. It also shows the percentage of utilization.

Metrics

Metrics provides the count of different entities across the workspaces.

  Groups: Number of groups created.
  Messages: Number of messages created in the workspaces.
  Users: Number of users created.
  Tables: Number of tables registered under Register Entities.
  Models: Number of models registered.
  Components: Total count of components used.
  Functions: Total count of functions applied.
  Variables: Total count of variables used.
  Transformations: Total number of transformations used.

SAX Web Health

This section helps monitor overall health through the memory and CPU utilization of the StreamAnalytix application and the system (the machine on which StreamAnalytix is deployed).

Application

Application Web Health shows the utilization of the CPU and Memory by the StreamAnalytix Application. Choose Application from the dropdown menu provided at the top left corner.


CPU

CPU shows the average usage of all cores available on the machine (as a percentage). CPU usage is allocated node by node. For example, if there are 4 node managers on a machine and the application used 2 nodes, that could mean 100% of one node and 90% of the other.

Memory

The total utilization of memory on the machine (as a percentage). That is, the application may use any of the available memory, irrespective of node.

System

This widget provides details about system health. System Web Health concerns the machine on which the StreamAnalytix application is running.

CPU

The cumulative usage of all cores available on the machine (as a percentage), irrespective of the CPU utilization of the StreamAnalytix application itself.

Memory

The total utilization of memory on the machine (as a percentage). This is also irrespective of the application.

Connections

This widget displays all the connections with their respective counts. Connections allow you to connect to infrastructure components such as RabbitMQ, Elasticsearch, ActiveMQ, Cassandra, HDFS, JDBC, HBase, Hive Emitter, Kafka, Kinesis, MapStreams, MQTT, OpenJms, S3, Socket, Solr, and Tibco. The total number of connections made across components is shown (9 in the diagram), and the number shown in each component tile is the number of connections made to that component.


Alerts

Alerts let you set conditions on system monitoring parameters such as CPU idle state and memory usage. An Alert Pipeline enables the Alert Processor to generate alerts on the UI and send alert emails; it stores alert messages in the database and aggregates alerts over a given time window. You can check the generated alerts on the UI as well as via email notifications. This widget shows alerts generated by the system with a brief description and their timestamp.

System

System alerts are of two types:

1. Monitoring alerts, thrown when a condition set up in the monitoring section is met.
2. Pipeline-stopped alerts for Superuser pipelines, thrown when an error occurs in the Spark or Storm pipeline.

Note: If a pipeline is stopped directly from the Spark cluster, it generates a pipeline error alert. If you stop it from Data Pipeline, it does not.

Application

If you have a streaming pipeline, you can apply an alert on it. This widget shows the description of the alert and its timestamp. The alert can have a customized description.


License Summary

This is the summary of the license owner and other license details, as described below:

License Type

Licenses are based on the three categories explained below:

  Category A (Time-based): The license cannot be used after the expiry date. It also provides early-warning alerts.
  Category B (Total no. of cores): The license will not permit usage of more than the total number of cores.
  Category C (Total no. of nodes and cores): The license is valid for a specific number of nodes and a specific number of cores underlying each node. The system will not permit usage of more than the specified number of nodes or cores per node. For example, 2 nodes with 2 cores per node allows 4 cores in total; if you have 2 nodes but each node has 3 cores, the license will not be valid.

Any single installation of StreamAnalytix will support only one of the following license types:

1. A
2. B
3. C
4. A and B
5. A and C


You can also find the same information at the bottom left corner of the dashboard page by clicking License.

License Owner and Email are the name and email id of the license owner.

Start Date and Expiration Date are the start and end dates of the license duration. If the license owner wants to extend the duration, they will have to contact [email protected].

You will get an alert 14 days before the expiry date, both by email and as periodic notifications on the StreamAnalytix UI.

Max Cores and Cores/Nodes are explained in the table above.

Violation

Three codes show the status of the license:

GOOD - License not expired and not violated.
Expired - License expired.
Violated - License in violated state.

In case of an Expired or Violated license, contact the StreamAnalytix Support Team at [email protected].


Configuration

StreamAnalytix configuration settings are divided into three categories: Environment, Common, and Log Agent.

Environment

Configuration properties of the various environment components are explained below:

ActiveMQ
- Username: Username of ActiveMQ to create the connection.
- Host: IP address of the machine where ActiveMQ is running.
- Password: Password of ActiveMQ to create the connection.

Ambari
- Metric Collector Host: Hostname where the Ambari Metric Collector is running.
- Metric Collector Port: Ambari Metric Collector port.

Couchbase
- Max Pool Size: The Couchbase maximum pool size.
- Default Bucket Memory Size: The memory size of the default bucket in Couchbase.
- Password: The Couchbase password.
- Default Bucket Replicano: The Couchbase default bucket replication number.
- Host Port: The port number of Couchbase.
- Host Name: The host on which Couchbase is running.
- Bucket List: The Couchbase bucket list.
- HTTP URL: The Couchbase HTTP URL.
- com.streamanalytix.storm.bolt.SampleBolt: Defines the fully qualified class name that instantiates the Couchbase client for a given bucket.
- Polling Timeout: Defines the polling timeout of Couchbase.
- Polling Sleeptime: Defines the sleep time between polls.
- User Name: The username of the Couchbase user.

Graphite
- Port: Port number of Graphite.
- Host: IP address of the machine where Graphite is running.
- UI Port: UI port number of Graphite.

Active Directory (LDAP)
- Group Search Base: Analogous to User Search Base, this property defines the location in the LDAP directory from which the LDAP group search begins.
- Password: Password against which you will be authenticated in the LDAP server.
- User Search Base: Defines the part of the directory tree under which DN searches are performed.


- User Search Filter: The filter used to search for DNs within the search base defined above.
- Group Search Filter: The filter used to search for group membership. The default is member={0}, corresponding to the groupOfMembers LDAP class; in this case, the substituted parameter is the full distinguished name of the user. The parameter {1} can be used if you wish to filter on the login name.
- User Distinguished Name: A unique name used to find the user in the LDAP server.
- LDAP Connection URL: The URL of the LDAP server, a string that encapsulates the address and port of a directory server. For example, ldap://host:port
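Taken together, the Active Directory fields above might be populated as in the following sketch (the key names and values are illustrative assumptions, not the exact property keys used by StreamAnalytix):

```yaml
# Illustrative values only; substitute your directory's actual DNs and host
ldap.connection.url: ldap://ldap.example.com:389
user.distinguished.name: cn=binduser,ou=service,dc=example,dc=com
password: "<bind password>"
user.search.base: ou=users,dc=example,dc=com
user.search.filter: (uid={0})
group.search.base: ou=groups,dc=example,dc=com
group.search.filter: member={0}
```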

MetaData Connection Information
- Driver Class: The database driver class name.
- Password: The database password.
- Connection URL: The connection URL for the database.
- User: The database username.

RT Dashboards
- SuperAdmin Password: The super admin password (required to access the Dashboard UI).
- ReportClient Path: The path of ReportClient.properties, required to connect with the Report Engine.
- Organization ID: The name of the organization for SAX in Intellicus.
- SuperAdmin User ID: The Dashboard super admin username, required to access Intellicus via the UI.
- Connection Name: The connection name created for SAX in the Dashboard.
- SAX URL: The dashboard web admin URL, used for showing the Dashboard UI from within the StreamAnalytix admin.
- SuperAdmin Organization: The Dashboard super admin organization name, required to access Intellicus via the UI.

RabbitMQ
- Password: Password of RabbitMQ to create the connection.
- Port: Port number of RabbitMQ.
- Host List: IP address of the machine where RabbitMQ is running.
- User: Username of RabbitMQ to create the connection.

Spark
- Spark Cluster Manager: Defines the Spark cluster manager, i.e. standalone or yarn.
- Spark UI Port: The port on which the Spark Master UI is running.
- Spark Hadoop is HDP: If the environment is HDP, set it to TRUE; otherwise set it to FALSE. Used for setting the proxy user.
- Resource Manager Host: Defines the resource manager hostname used for storm-yarn deployment.
- Resource Manager Port: Defines the resource manager port used for storm-yarn deployment.


- spark.yarn.resource.manager.webapp.port: Yarn Resource Manager UI port.
- Spark UI Host: Hostname of the Spark Master.
- Spark REST Host and Port: Spark REST hostname and port. For example, Host1:6066
- Spark Master URL: The Spark Master URL. For example, spark://host1:7077

Storm
- Nimbus Host: The IP address of the machine where Storm (Nimbus) is running.
- Nimbus Thrift Port: The Thrift port of the Nimbus machine.
- Nimbus UI Port: The UI port where the Storm UI is running, used to see all information related to each pipeline.
- storm.nimbus.seeds: Comma-separated list of Nimbus hostnames to be used.
- storm.nimbus.ha.enabled: Set to true if Nimbus is running in a highly available configuration. If false, the nimbus.seeds property is not referred to.
- Storm Zookeeper Port: The port of the Storm Zookeeper.
- Storm Impersonation Enabled: Set to true if user impersonation is to be used with Storm.
- Storm Impersonation Target User: The user to impersonate.
- storm.cluster.manager: Storm deployment mode (Standalone/Yarn).
- Storm Supervisors List: Storm supervisor hostnames; if configured, pipelines can be submitted only on the configured hosts.
- Storm Zookeeper Server List: The list of Zookeeper servers where Storm metadata is kept.
- Resource Manager Host: Yarn resource manager hostname.
- Resource Manager Port: Yarn resource manager port.
- storm.ui.host: Hostname of the Storm UI.

System
- User Authentication Source: Specifies which authentication source, either the StreamAnalytix database or an LDAP server, is used to match or bind a user's credentials at login. If configured with LDAP, a user trying to log in to the application must exist in the LDAP server.
- LogMonitoring UI Port: Defines the port of Log Monitoring.
- User Authorization Source: Specifies the user authorization mechanism; accordingly, the user is assigned the appropriate role in the webapp. Possible values are ldap and db. The default value is db.
- StreamAnalytix UI Host: Defines the IP address of StreamAnalytix.
- Indexer Type: Defines the default indexer type. For example, Solr or Elasticsearch.


- Index Default Replication Factor: Number of additional copies of data to be kept.
- Enable Index Default is Batch: Default value for the batch parameter of indexing.
- Index Default Batch Size: Default batch size for the indexing store.
- Enable Index Default Across Field Search: Enables searching without specifying a column name; this takes extra space and time.
- Index Default Number of Shards: Number of shards to be created in the index store.
- Index Default Routing Required: Default value for the routing parameter for indexing.
- Indexer Default Source: Default value for the source parameter for indexing.
- Index Retries: Number of retries for indexing.
- Index Retries Interval(in ms): Defines the retry interval for indexing when ingestion fails.
- Indexer Time to Live in Seconds: Indexed data older than the mentioned time in seconds from the current time will not be fetched.
- StreamAnalytix Monitoring Reporters Supported: Defines the monitoring reporter types; possible values are comma-separated graphite, console and logger.
- StreamAnalytix UI Port: Defines the UI port of StreamAnalytix.
- Components Supported: Creates default connections for StreamAnalytix components. Possible values are RabbitMQ, ActiveMQ, Kafka, HDFS, HBase, Cassandra, Hive, SOLR and Elasticsearch. Default connections are created only for the components entered in the Components Supported textbox.
- Elasticsearch Cluster Name: The name of the Elasticsearch cluster.
- Elasticsearch Connection URL: The connection URL for Elasticsearch.
- Elasticsearch HTTP Connection URL: The HTTP connection URL for Elasticsearch.
- Elasticsearch Embedded Data Directory: The data directory for running embedded Elasticsearch.
- Elasticsearch Embedded Enable HTTP: Whether the HTTP connection is enabled for embedded Elasticsearch.
- Elasticsearch Embedded Node Name: The node name of the embedded Elasticsearch node.


- Elasticsearch Embedded Enable Data: Whether to store data on disk or in memory (true for disk, false for memory).
- Elasticsearch Embedded Enable Local: The value of this field should be true.
- Elasticsearch HTTP Port: The port on which the Elasticsearch REST URL is hosted.
- Shield Enabled: Enables the Shield plugin for Elasticsearch authentication.
- Shield SSL Certificate Verification: Verification of the Shield SSL certificate.
- Shield Keystore Key Password: Key password for the keystore.
- Shield Hostname Verification: Hostname verification for Shield.
- Shield Transport SSL: Enables SSL on the transport networking layer to ensure that communication between nodes is encrypted. If set to true, communication is done in encrypted form.
- Shield Keystore Password: Password used to access the keystore.
- Shield Keystore Enabled: Whether to enable keystore use for the Shield plugin.
- Shield Password Enabled: User password for the Shield plugin.
- Shield Keystore Path: Path to the Elasticsearch config folder where the keystore is placed.
- Shield Truststore Password: Password used to access the truststore.
- Shield User Name: Username for Shield.
- Enabled KeyStore Password: Keystore password for SSL.
- Shield Truststore Enabled: Whether to enable the truststore for the Shield plugin.
- Shield Hostname Resolve Name: Performs hostname verification on transport connections.
- Shield Truststore Path: Truststore path for Shield.
- Messaging Type: Specifies the message queuing system that the application uses internally for messaging. Possible values are RABBITMQ (for RabbitMQ) or ACTIVEMQ (for ActiveMQ).

- LogMonitoring UI Host: Defines the host address of Log Monitoring.
- Cassandra Host List: Addresses of the servers where Cassandra is running.
- Cassandra User: Username for Cassandra data store authentication.
- Cassandra Password: Password for Cassandra data store authentication.
- Cassandra Thrift Client Retry Count: The number of retries to establish a connection.
- Cassandra Thrift Client Delay Between Retries (in ms): Time interval in milliseconds between connection retries.
- Cassandra Keyspace Replication Factor: The number of copies of the data present in the cluster.
- Cassandra Keyspace Replication Strategy: The strategy that determines the nodes where replicas are placed. SimpleStrategy places the first replica on a node determined by the partitioner; additional replicas are placed on the next nodes clockwise in the ring without considering topology.
- Cassandra Connection Retry Count: Cassandra connection retry count.
- Self Testability RMQ Exchange Name: Exchange name for the self-test pipeline.
- Self Testability RMQ Queue Name: Queue name for the self-test pipeline.
- Solr Zookeeper Hosts: Defines the Zookeeper hosts for the Solr server.
- Database Dialect: The type of database on which the StreamAnalytix database is created.
- Kafka Metadata Broker List: The comma-separated list of IP:Port entries of the Kafka brokers.
- Kafka Zookeeper Server List: The comma-separated list of IP:Port entries of Zookeeper, used for creating Kafka topics from the SAX UI.
- system-config.channels.supported: Channels supported by the self-test pipeline.
- Persistence Store: Defines the default persistence type. For example, HBASE or Cassandra.
- Persistence Default Isbatch Enable: Defines whether batching is enabled by default in persistence.
- Persistence Default Batch Size: Batch size for the persistence store.
- Persistence Default Compression: Defines the default compression type for the persistence store.
- Hadoop Enable HA: Whether the Hadoop cluster is HA enabled or not.


Common

Common properties can be platform-specific or common to multiple components. The configurable properties are listed below:

- ActiveMQ Connection Timeout(in ms): Defines the ActiveMQ connection timeout interval in milliseconds.
- Application Logging Level: The logging level to be used for StreamAnalytix logs.
- Audit Targets: The audit logging implementation to be used in the application. The default is file.
- CEP Type: The name of the CEP used. The only possible value is CEP as of now.
- CepHA Wait Interval: The wait interval of the primary CEP task node.
- Cloud Trial: The flag for cloud trial. Possible values are true/false.
- Cloud Trial Data Usage Monitoring From Time: The time from which to enable data usage monitoring.
- File System URI: The file system URI. For example, hdfs://hostname:port, hdfs://nameservice, maprfs://clustername
- Hadoop User: Name of the user through which the Hadoop service is running.
- Hadoop DFS Name Services: Name service ID of the Hadoop HA cluster.
- Hadoop Namenode 1 Details: The RPC address of namenode1.
- Hadoop Namenode 2 Details: The RPC address of namenode2.
- StreamAnalytix Installation Directory: Defines the installation directory of StreamAnalytix.
- HBase Zookeeper Host: The Zookeeper hostnames used for the HBase cluster.
- HBase Zookeeper Port: The Zookeeper port for the HBase cluster.
- HBase Client Retry Number: The number of retries for the HBase client.
- HBase Zookeeper Parent Node: Parent node in Zookeeper for HBase service metadata.
- HBase Zookeeper Recovery Retry: The number of times to retry the connection to the HBase Zookeeper.
- Self Testability Ingestion Type: Type of channel used, whether Kafka, RabbitMQ, or any other channel, for ingestion of data in the Self Testability pipeline.
- Zookeeper Host List: List of hostnames where Zookeeper is installed.


- Cloud Trial Day Datausage Monitoring Size(in bytes): The maximum daily data usage size for an FTP user.
- Cloud Trial Max Data Usage Monitoring Size(in bytes): The maximum data usage limit for the cloud trial.
- Cloud Trial Workers Limit: The maximum number of workers for an FTP user.
- Cluster Manager Home URL: The URL for the Cluster Manager home page.
- DFS Rotation Policy Type: The HDFS rotation policy type. It can be size, time, or size-time.
- Enable Audit: The value (true/false) for enabling audit in the application.
- Enable Dashboard: Whether the dashboard is enabled or disabled.
- Enable HA Global: To enable or disable HA.
- Enable Log Agent: Whether the Agent Configuration option should be visible on the StreamAnalytix GUI.
- Enable Spark Error Search: Enables indexing and searching of Spark pipeline errors in Log Monitoring.
- Enable Storm Error Search: Enables showing the pipeline Application Errors tab using the Log Monitoring search page.
- Enable StreamAnalytix Monitoring Flag: Whether monitoring is enabled or not.
- Enable StreamAnalytix Scheduler: To enable or disable the topology-stopped alert.
- Enable StreamAnalytix Storm Scheduling: To enable or disable Storm scheduling.
- Enable Topology Debug: If enabled, shows detailed debug logs.
- Export Search Raw Field: Comma-separated fields used to export Log Monitoring search results.
- FTP Base Path: Defines the base path for the FTP location.
- FTP Disk Usage Limit: Defines the disk usage limit for an FTP user.
- FTP Service URL: Defines the URL of the FTP service, used to create the FTP directory for the logged-in user (required only for cloud trial).
- Hadoop Security Enabled: Set to true if Hadoop is secured with Kerberos. If Hadoop security is enabled, HBase and Hive are assumed to be running with Kerberos security.
- Java Temp Directory: The temporary directory location.
- Kafka Key Serializer Class: Defines the class name of the Kafka producer key serializer used.
- Kafka Message Fetch Size Bytes: Defines the number of bytes of messages to attempt to fetch for each topic-partition in each fetch request.


- Kafka Producer Batch Size: Defines the batch size of data produced to Kafka from the log agent.
- Kafka Producer Message Send Max Retry Count: Defines the number of times the producer automatically retries a failed send request.
- Kafka Producer Partitioner Class: Defines the class name of the Kafka producer partitioner used.
- Kafka Producer Request Required Acks: Defines the acknowledgment level at which a produce request is considered completed.
- Kafka Producer Retry Backoff(in ms): Defines the amount of time the Kafka producer waits before refreshing the metadata.
- Kafka Producer Serializer Class: Defines the class name of the Kafka producer serializer used.
- Kafka Producer Topic Metadata Refresh Interval(in ms): Defines the metadata refresh time taken by Kafka when there is a failure.
- Kafka Producer Type: Defines whether Kafka produces data in async or sync mode.
- Kafka Spout Connection Retry Sleep Time: Time between consecutive Kafka spout connection retries.
- Kafka Zookeeper Session Timeout(in ms): Defines the Kafka Zookeeper connection timeout.
- Kerberos Sections: Comma-separated list of sections in keytab_login.conf.
- LogMonitoring Dashboard Interval(in min): Log Monitoring application refresh interval.
- Logmonitoring Supervisors Servers: Servers dedicated to running the Log Monitoring pipeline.
- Max Memory for Storm Jar Submitter: Maximum memory allocated to the Storm jar submitter process, set initially at the time of deployment of the StreamAnalytix platform.
- Message Compression: Defines the type of compression used on objects emitted from any component.
- Metrics Conversion Rate Unit: Specifies the unit of rates for calculating the metrics.
- Metrics Duration Rate Unit: Specifies the unit of duration for the metrics.
- Metrics Report Duration: Specifies the interval at which metrics are reported.
- Metrics Report Duration Unit: Specifies the unit of the interval at which metrics are reported.
- Persistence Encryption Key: Specifies the encryption key used to encrypt data in persistence.
- Query Retries: Specifies the number of retries for a query in indexing.


- Query Retry Interval(in ms): Defines the query retry interval in milliseconds.
- QueryServer Metrics Conversion Rate Unit: Specifies the unit of rates for calculating the query server metrics.
- QueryServer Metrics Duration Rate Unit: Specifies the unit of duration for the query server metrics.
- QueryServer Metrics Report Duration: Defines the time after which query server metrics are published.
- QueryServer Monitoring Flag: Defines the flag value (true/false) for enabling query monitoring.
- QueryServer Monitoring Reporters Supported: Defines the comma-separated list of appenders where metrics are published. Valid values are graphite, console and logger.
- RabbitMQ Max Retries: Defines the maximum number of retries for the RabbitMQ connection.
- RabbitMQ Retry Delay Interval: Defines the retry delay interval for the RabbitMQ connection.
- RabbitMQ Session Timeout: Defines the session timeout for the RabbitMQ connection.
- Real-time Alerts Exchange Name: Defines the RabbitMQ exchange name for real-time alert data.
- Solr Security Enabled: If Kerberos security is enabled, set this flag to true to enable secure communication with Solr.
- Spark Monitoring Enable: Specifies the flag to enable Spark monitoring.
- Spark Monitoring Scheduler Delay(in ms): Specifies the Spark monitoring scheduler delay in milliseconds.
- Spark Monitoring Scheduler Period(in ms): Specifies the Spark monitoring scheduler period in milliseconds.
- Storm Nimbus Retry Interval(in ms): Defines the retry interval in milliseconds for the Nimbus connection.
- Storm Nimbus Retry Intervalceiling(in ms): Defines the retry interval ceiling in milliseconds for the Storm Nimbus connection.
- Storm Nimbus Retry Times: Defines the Storm connection retry count.
- Storm Security Enabled: Set to true if Storm is secured with Kerberos.
- StreamAnalytix Default Tenant Token: Token of the user for HTTP calls to Log Monitoring, used for adding and modifying system information.
- StreamAnalytix Deployment Timeout(in seconds): Defines the time within which a pipeline must deploy on the Storm cluster.
- StreamAnalytix Pipeline Error Search Tenant Token: Tenant token for pipeline error search.
- StreamAnalytix Pipeline Log Location: Log location of the StreamAnalytix pipeline.


- StreamAnalytix Reporting Period: Reporting period of StreamAnalytix.
- StreamAnalytix Scheduler Interval: Defines the topology-stopped alert scheduler's time interval in seconds.
- StreamAnalytix Session Timeout: Defines the timeout for a login session in StreamAnalytix.
- StreamAnalytix Spark Error Search Index Expression: Spark pipeline application error index expression (a time-based JS expression used to create indexes in ES or Solr, also used during retrieval of data).
- StreamAnalytix Storm Error Search Index Expression: Storm pipeline application error index expression (a time-based JS expression used to create indexes in ES or Solr, also used during retrieval of data).
- StreamAnalytix Storm Max Retries: Defines the maximum retry count for the Storm Nimbus connection.
- StreamAnalytix Storm Monitor Rest API: Defines which API to use for Storm cluster monitoring, either REST or Thrift.
- StreamAnalytix Storm Retry Delay Interval: Defines the maximum delay interval in milliseconds for the Storm Nimbus connection.
- Test Connection Time Out: Timeout for test connections (in ms).
- Topology Message Timeout(in seconds): Defines the maximum amount of time given to the pipeline to process a message emitted by a channel. If the message is not acked within this period, Storm fails the message on the channel.
- Topology Mode: Defines whether the pipeline runs in local or cluster mode.
- TraceMessage Compression: Defines the type of compression used on trace messages emitted from any component.
- Zookeeper Retry Count: Zookeeper connection retry count.
- Zookeeper Retry Delay Interval: Defines the retry interval for the Zookeeper connection.
- Zookeeper Session Timeout: Zookeeper session timeout.

LogAgent

The Log Agent reads logs from different sources and transmits the messages to one or more destinations according to its own rules. Given below are the Log Agent properties:

- Agent Monitoring Flag: Defines whether monitoring is to be done on the agent machine.
- Cipher Algorithm: The encryption method used in the log agent.
- Debug Mode: Whether the generated data has to be logged in the created log file.


- Enable Log Info: Whether the Agent Configuration log file path and machine information are stored in the database.
- Enable Log Thread: Whether the log thread process is enabled.
- Enable Ping Monitoring: Used to check whether the agent process is up.
- Enable System Info: Whether system data has to be generated.
- Kafka Producer Batch Size: The batch size of data produced to Kafka from the log agent.
- Kafka Producer Key Serializer Class: The class name of the Kafka producer key serializer used.
- Kafka Producer Message Send Max Retries: The number of times the producer automatically retries a failed send request.
- Kafka Producer Partitioner Class: The class name of the Kafka producer partitioner used.
- Kafka Producer Retry Backoff(in ms): The amount of time the Kafka producer waits before refreshing the metadata.
- Kafka Producer Serializer Class: The class name of the Kafka producer serializer used.
- Kafka Producer Topic Metadata Refresh Interval(in ms): The metadata refresh time taken by Kafka when there is a failure.
- Kafka Producer Type: Whether the producer type is sync or async.
- Log Agent Kafka Producer Request Required Acks: The acknowledgment level at which a produce request is considered completed.
- LogMonitoring Agent Index Expression: Agent index expression (a time-based JS expression used to create indexes in ES or Solr, also used during retrieval). SAX admin UI and Storm log data are stored in ES; the Log Agent (agentIndex flag true) uses this expression to index log data, and Log Monitoring uses it to retrieve the data.
- LogMonitoring Agent Index Tenant Token: Token of the user used by the Log Agent (agentIndex flag true) to search admin UI and Storm logs in the Log Monitoring application.
- Logstash Bin Script Location: The Logstash script file path in the Logstash tar used.
- Number of Top Process Monitor: The number of top processes to be shown on the Log Monitoring GUI dashboard.
- Ping Monitoring Interval(in sec): The time interval to check whether the log agent process is up.
- System Broker Type: The broker used to generate system data.
- System Kafka Topic: The Kafka topic name for system data.
- System Monitoring Enable: Whether system monitoring is enabled on the agent machine.
- System Monitoring Interval(in sec): The time interval at which system data is generated.
- System RabbitMQ Exchange: The RabbitMQ exchange name for system data.


- System RabbitMQ Queue: The RabbitMQ queue for the generated system data.

LDAP

Authentication is the process of identifying a user by obtaining credentials and using those credentials to verify the user's identity. After authentication, a user must gain authorization to perform certain tasks. The authorization process determines the access control list (ACL) for the user, i.e. it controls or restricts user access to certain activities for security reasons. It is a process of enforcing policies that determine what types of tasks or services a user is permitted to perform or use.

StreamAnalytix provides three ways to handle user authentication and authorization:

1. Use a database for both authentication and authorization.
2. Use an LDAP server for both authentication and authorization.
3. Use an LDAP server for authentication and a database for authorization.

Configuration

In the application, the configuration for user authentication and authorization is done in the following manner.

# User Authentication and Authorization source
authentication.source: db (or ldap)
authorization.source: db (or ldap)

Possible values are db and ldap, for the built-in system database and the LDAP active directory server, respectively. The default value is db.

User Authentication

This property specifies the authentication source (database or LDAP).

StreamAnalytix supports two types of authentication systems:

1. Own built-in database: Enables the administrator to create and save user details in the system database. If the property authentication.source is db, a user trying to log in to the application must already exist in the database. If the user does not exist, only a SuperAdmin or a Developer can create the new user.
2. LDAP: Configure the system to integrate with an Active Directory server for user management.


If the property authentication.source is ldap, the user must exist in the LDAP directory. In this case, a user with the role Developer is pre-created in order to leverage the multi-tenancy support provided by the platform. The LDAP server directly authenticates Dev-ops and Tier-II users.

User Authorization

This property specifies the authorization source (database or LDAP) used to map users to their role(s).

StreamAnalytix supports two types of authorization systems:

1. Own built-in database: If the property authorization.source is db, two cases follow:

Case I: authentication.source is db. In this case, the user trying to log in to the application must already exist in the database with any of the four roles. During the authorization process, the user's role is fetched from the database and assigned to the user's security context.

Case II: authentication.source is ldap. In this case, the user must exist in the LDAP directory (in order to perform user authentication) as well as in the database (in order to retrieve the user's role).

2. LDAP: If the property authorization.source is ldap, it is mandatory to configure authentication.source with ldap as well. In this case, the user's role is assigned on the basis of the LDAP group's common name (cn).

You can import the required LDAP-Group vs. StreamAnalytix-Role mapping into the database prior to login by using a sample script as shown below.

Query example (with MySQL):

INSERT INTO company_sax_rolemappings (company_role, sax_role)
VALUES ('PROD_DEV_USER', 'ROLE_ADVANCED_USER'),
       ('PROD_OPS_USER', 'ROLE_NORMAL_USER');

At the time of LDAP authorization, the common name (cn) of the group in which the authenticated user exists is searched and retrieved. The group name is then mapped to a StreamAnalytix role using the table data above, and the resulting role is assigned to the user.
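The group-to-role resolution described above can be sketched as follows (the mapping values come from the sample query; the lookup logic is illustrative, not the product's actual implementation):

```python
# Illustrative LDAP-group -> StreamAnalytix-role resolution.
# The mapping mirrors the rows inserted by the sample MySQL query above.
ROLE_MAPPINGS = {
    "PROD_DEV_USER": "ROLE_ADVANCED_USER",
    "PROD_OPS_USER": "ROLE_NORMAL_USER",
}

def resolve_sax_role(ldap_group_cn):
    """Map an LDAP group common name (cn) to a StreamAnalytix role."""
    role = ROLE_MAPPINGS.get(ldap_group_cn)
    if role is None:
        raise ValueError(f"No StreamAnalytix role mapped for group '{ldap_group_cn}'")
    return role

print(resolve_sax_role("PROD_DEV_USER"))  # ROLE_ADVANCED_USER
```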

Constraints

There are a few constraints on the visibility of the Manage Users tab in the web UI when both authentication and authorization use LDAP.

The Manage Users tab is not visible to the SuperAdmin user, since Dev-ops and Tier-II users do not need to be managed explicitly; they are managed by the LDAP directory itself. In contrast, the Manage Users tab is visible for Developer users, since Developer user details need to be stored and managed in the database for multi-tenancy support in the application.


Kerberos

StreamAnalytix uses the Kerberos authentication protocol to secure communication between its infrastructure components.

Kerberos support is available for the following components:

• Kafka
• HDFS
• HBase
• Solr
• Zookeeper

Configure Kerberos

Once Kerberos is enabled, go to Superuser UI > Configuration > Environment > Kerberos to configure it.

Field Description

Hadoop NameNode Kerberos Principal: Kerberos principal for the Hadoop NameNode.

Kerberos Configuration File Override: Set TRUE to regenerate keytabs and keytab_login.conf every time a pipeline is deployed.

Hadoop Core Site Location: Path to the core-site.xml file, which contains the hadoop.security.auth_to_local property and the corresponding rules for mapping Kerberos principals to local usernames.

HBase Master Kerberos Principal: Kerberos principal for the HBase Master.

kerberos.yarn.resourcemanager.kerberos.principal: Kerberos principal for the YARN ResourceManager.

HBase RegionServer Kerberos Principal: Kerberos principal for the HBase RegionServer.

Hive Metastore Kerberos Principal: Kerberos principal used by the Hive metastore.

Configure Kerberos in Components

Go to Superuser UI > Connections and edit the component connection settings as explained below.

HBase, HDFS

Field Description

Key Tab Select Option: A keytab is a file containing pairs of Kerberos principals and encrypted keys. You can use a keytab to authenticate to various remote systems. This field has two options:

• Specify Keytab File Path: Path where the keytab file is stored.

• Upload Keytab File: Upload the keytab file from your local file system.

Specify Keytab File Path: If this option is selected, the system displays the KeyTab File Path field, where you specify the keytab file location.

Upload Keytab File: If this option is selected, the system displays the Upload Keytab File field, which enables you to upload the keytab file.

Solr, Kafka and Zookeeper

Kerberos security is configured by default for these components; no manual configuration is required.

Note: For Solr, Kafka and Zookeeper, security is configured by providing principals and keytab paths in keytab_login.conf. This file then needs to be placed in the StreamAnalytix/conf/common/kerberos and StreamAnalytix/conf/thirdpartylib folders.
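For reference, a keytab_login.conf file follows the standard Java JAAS login configuration format. The entry below is a hedged sketch only: the section names, principals, and keytab paths are placeholders that must match your environment, not values mandated by StreamAnalytix.

```
Client {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   storeKey=true
   keyTab="/etc/security/keytabs/zookeeper.client.keytab"
   principal="zkclient@EXAMPLE.COM";
};

KafkaClient {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   storeKey=true
   keyTab="/etc/security/keytabs/kafka.client.keytab"
   principal="kafkaclient@EXAMPLE.COM";
};
```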

Connections

Connections allow StreamAnalytix to connect to services such as RabbitMQ, Elasticsearch, etc. Only the Superuser can create connections; Admin and other users can use them. To add a new connection, log in as the Superuser, go to the Connections page, and click Add Connection.

Select the component from the drop-down list for which you wish to create a connection.

ActiveMQ

For creating an ActiveMQ connection, select ActiveMQ from the Component Type drop-down list and provide connection details as explained below.

Field Description

Component Type: Shows all available connection types. Select ActiveMQ from the list.

Connection Name: Name of the connection. For example, ActiveMQ.

Hosts: IP address of the machine where ActiveMQ is running.

Username: Username of the ActiveMQ server.

Password: Password of the ActiveMQ server.

Create: Click the Create button to create the connection.

Test Connection: After entering all the details, click the Test Connection button. If the credentials provided are correct and ActiveMQ is running, you will get the message Connection is available. If you enter wrong credentials or ActiveMQ is not running, you will get the message Connection unavailable.

Cassandra

For creating a Cassandra connection, select Cassandra from the Component Type drop-down list and provide connection details as explained below.

Field Description

Component Type: Shows all available connection types. Select Cassandra from the list.

Connection Name: Name of the connection. For example, Cassandra.

Hosts: Hosts and ports of the machines where Cassandra is installed.

Connection Retries: Number of retries allowed for creating a connection. For example, 1.

Authentication Enabled: If you select the Authentication Enabled check box, the system prompts you for a username and password. This enables authentication and displays the required Cassandra properties for the connection.

Create: Click the Create button to create the connection.

Test Connection: After entering all the details, click the Test Connection button. If the credentials provided are correct and Cassandra is running, you will get the message Connection is available. If you enter wrong credentials or the server is down, you will get the message Connection unavailable.
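The Hosts field accepts host and port pairs. A minimal sketch of parsing such a comma-separated host:port list follows; this is purely illustrative (how StreamAnalytix parses the field internally is not documented here), and 9042 is assumed as the default only because it is Cassandra's standard native-transport port.

```python
# Parse a comma-separated "host:port" list such as the one entered in
# the Hosts field, e.g. "10.0.0.1:9042,10.0.0.2:9042". Entries without
# an explicit port fall back to the (assumed) default.
def parse_hosts(hosts, default_port=9042):
    pairs = []
    for entry in hosts.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.partition(":")
        pairs.append((host, int(port) if port else default_port))
    return pairs

print(parse_hosts("10.0.0.1:9042, 10.0.0.2"))
# [('10.0.0.1', 9042), ('10.0.0.2', 9042)]
```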

Elasticsearch

For creating an Elasticsearch connection, select Elasticsearch from the Component Type drop-down list and provide connection details as explained below.

Field Description

Component Type: Shows all available connection types. Select Elasticsearch from the list.

Connection Name: Name of the connection. For example, Elasticsearch.

Hosts: Host names and ports of the Elasticsearch machines.

httpport: Port number where Elasticsearch is running.

Cluster Name: The name of the cluster to which Elasticsearch will connect.

isShieldEnabled: Enable the Shield plugin for Elasticsearch authentication.

Create: Click the Create button to create the connection.

Test Connection: After entering all the details, click the Test Connection button. If the credentials provided are correct and the services are up and running, you will get the message Connection is available. If you enter wrong credentials or the server is down, you will get the message Connection unavailable.

HBase

For creating an HBase connection, select HBase from the Component Type drop-down list and provide connection details as explained below.

Field Description

Component Type: Shows all available connection types. Select HBase from the list.

Connection Name: Name of the connection. For example, HBase.

HDFS User: HDFS user name. For example, sax.

zK Host: Zookeeper host name for the HBase cluster.

zK Port: Zookeeper port for the HBase cluster.

Client Retries Number: Number of retries for the HBase client. For example, 2.

zk Recovery Retry: Number of times to try to reconnect to the HBase Zookeeper.

zk Parent Node: Parent node in Zookeeper for HBase service metadata.

Table Administration: Enable this if you want to create tables in HBase.

Create: Click the Create button to create the connection.

Test Connection: After entering all the details, click the Test Connection button. If the credentials provided are correct and the services are up and running, you will get the message Connection is available. If you enter wrong credentials or the server is down, you will get the message Connection unavailable.

HDFS

For creating an HDFS connection, select HDFS from the Component Type drop-down list and provide connection details as explained below.

Field Description

Component Type: Shows all available connection types. Select HDFS from the list.

Connection Name: Name of the connection. For example, HDFS.

File System URI: File system URI of the machine where HDFS is installed.

Username: The name of the user under which Hadoop is running.

HA Enabled: Whether the Hadoop cluster is HA enabled.

Name Node1 Name: NameNode1 identifier/label.

Name Node1 RPC Address: RPC address of NameNode1.

Name Node2 Name: NameNode2 identifier/label.

Name Node2 RPC Address: RPC address of NameNode2.

Name Services: Name service ID of the Hadoop cluster.

Create: Click the Create button to create the connection.

Test Connection: After entering all the details, click the Test Connection button. If the credentials provided are correct and the services are up and running, you will get the message Connection is available. If you enter wrong credentials or the server is down, you will get the message Connection unavailable.
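The HA fields above correspond to the standard Hadoop HA client properties. The sketch below shows that correspondence; the property keys are the standard Hadoop configuration names, but the assumption that StreamAnalytix assembles them exactly this way is ours, not the documentation's.

```python
# Map the HDFS connection HA fields onto standard Hadoop HA client
# configuration properties. The example values are placeholders.
def build_ha_conf(name_service, nn1_name, nn1_rpc, nn2_name, nn2_rpc):
    return {
        "dfs.nameservices": name_service,
        "dfs.ha.namenodes.%s" % name_service: "%s,%s" % (nn1_name, nn2_name),
        "dfs.namenode.rpc-address.%s.%s" % (name_service, nn1_name): nn1_rpc,
        "dfs.namenode.rpc-address.%s.%s" % (name_service, nn2_name): nn2_rpc,
    }

conf = build_ha_conf("mycluster", "nn1", "host1:8020", "nn2", "host2:8020")
print(conf["dfs.ha.namenodes.mycluster"])  # nn1,nn2
```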

HIVE

For creating a HIVE connection, select HIVE from the Component Type drop-down list and provide connection details as explained below.

Field Description

Component Type: Shows all available connection types. Select HIVE from the list.

Connection Name: Name of the connection. For example, HIVE.

Metastore Type: Thrift URI to connect to the HIVE metastore service.

Metastore Name: Name with which the metastore is configured in HIVE.

Host: Metastore host or IP.

Port: Metastore port.

UserName: HDFS user name authorized to access the services.

Password: Password for accessing the services.

Database Name: Name of the database.

File System URL: HDFS URL at which HIVE is configured.

HA Enabled: Check this option if the NameNode of HDFS is HA enabled.

Name Node1 Name: NameNode1 identifier/label.

Name Node1 RPC Address: RPC address of NameNode1.

Name Node2 Name: NameNode2 identifier/label.

Name Node2 RPC Address: RPC address of NameNode2.

Name Services: Name service ID of the Hadoop cluster.

Create: Click the Create button to create the connection.

Test Connection: After entering all the details, click the Test Connection button. If the credentials provided are correct and the services are up and running, you will get the message Connection is available. If you enter wrong credentials or the server is down, you will get the message Connection unavailable.

HIVE Emitter

For creating a HIVE Emitter connection, select HIVE Emitter from the Component Type drop-down list and provide connection details as explained below.

Field Description

Component Type: Shows all available connection types. Select HIVE Emitter from the list.

Connection Name: Name of the connection. For example, HIVE Emitter.

metaStoreURL: Thrift URI to connect to the HIVE metastore service.

FileSystem URL: HDFS file system URI.

UserName: HDFS user name authorized to access the services.

HA Enabled: Check this option if the NameNode of HDFS is HA enabled.

Name Node1 Name: NameNode1 identifier/label.

Name Node1 RPC Address: RPC address of NameNode1.

Name Node2 Name: NameNode2 identifier/label.

Name Node2 RPC Address: RPC address of NameNode2.

Name Services: Name service ID of the Hadoop cluster.

Create: Click the Create button to create the connection.

Test Connection: After entering all the details, click the Test Connection button. If the credentials provided are correct and the services are up and running, you will get the message Connection is available. If you enter wrong credentials or the server is down, you will get the message Connection unavailable.

JDBC

For creating a JDBC connection, select JDBC from the Component Type drop-down list and provide connection details as explained below.

Field Description

Component Type: Shows all available connection types. Select JDBC from the list.

Connection Name: Name of the connection. For example, JDBC.

Database Type: Type of database in which data needs to be dumped.

Database Name: The name of the database to be used.

Host: The host of the machine where the database is deployed.

Port: The port on which the database is listening.

UserName: The username for the database.

Password: The password for the database.

Create: Click the Create button to create the connection.

Test Connection: After entering all the details, click the Test Connection button. If the credentials provided are correct and the services are up and running, you will get the message Connection is available. If you enter wrong credentials or the server is down, you will get the message Connection unavailable.

Note: The JDBC driver jar should be on the classpath when running a pipeline that has a JDBC emitter or when testing the connection.

Kafka

For creating a Kafka connection, select Kafka from the Component Type drop-down list and provide connection details as explained below.

Field Description

Component Type: Shows all available connection types. Select Kafka from the list.

Connection Name: Name of the connection to be created. For example, Kafka.

zK Hosts: Comma-separated list of Zookeeper IP:port pairs, used for creating Kafka topics from the StreamAnalytix UI.

Kafka Brokers: List of Kafka nodes against which connections need to be created.

Enable Topic Administration: Enabling topic administration allows creating, updating and deleting topics.

Create: Click the Create button to create the connection.

Test Connection: After entering all the details, click the Test Connection button. If the credentials provided are correct and the services are up and running, you will get the message Connection is available. If you enter wrong credentials or the server is down, you will get the message Connection unavailable.

RabbitMQ

For creating a RabbitMQ connection, select RabbitMQ from the Component Type drop-down list and provide connection details as explained below.

Field Description

Component Type: Shows all available connection types. Select RabbitMQ from the list.

Connection Name: Name of the connection to be created. For example, RabbitMQ.

Hosts: IP address and port of the machine where RabbitMQ is running.

Username: Username of RabbitMQ to create the connection.

Password: Password of RabbitMQ to create the connection.

Create: Click the Create button to create the connection.

Test Connection: After entering all the details, click the Test Connection button. If the credentials provided are correct and the services are up and running, you will get the message Connection is available. If you enter wrong credentials or the server is down, you will get the message Connection unavailable.

S3

For creating an S3 connection, select S3 from the Component Type drop-down list and provide the other details required for creating the connection.

Field Description

Component Type: Shows all available connection types. Select S3 from the list.

Connection Name: Name of the connection to be created. For example, S3.

AWS KeyId: S3 account access key.

Secret Access Key: S3 account secret key.

Create: Click the Create button to create the connection.

Test Connection: After entering all the details, click the Test Connection button. If the credentials provided are correct and the services are up and running, you will get the message Connection is available. If you enter wrong credentials or the server is down, you will get the message Connection unavailable.

Socket

For creating a Socket connection, select Socket from the Component Type drop-down list and provide connection details as explained below.

Field Description

Component Type: Shows all available connection types. Select Socket from the list.

Connection Name: Name of the connection to be created. For example, Socket.

Host: IP address of the machine where the socket server is running.

Port: Port on which the socket server is listening.

Create: Click the Create button to create the connection.

Test Connection: After entering all the details, click the Test Connection button. If the connection details provided are correct and the services are up and running, you will get the message Connection is available. If you enter wrong connection details or the server is down, you will get the message Connection unavailable.
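The same kind of reachability check that the Test Connection button performs can be reproduced by hand with a plain TCP connect. This sketch only mimics such a check; it is not the application's own implementation, and it verifies reachability only, not any protocol handshake.

```python
import socket

def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the
    timeout. This is only a reachability probe, not a protocol check."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(is_reachable("127.0.0.1", 1))  # False unless something listens on port 1
```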

Solr

For creating a Solr connection, select Solr from the Component Type drop-down list and provide connection details as explained below.

Field Description

Component Type: Shows all available connection types. Select Solr from the list.

Connection Name: Name of the connection to be created. For example, Solr.

zKHost: Defines the Zookeeper host for the Solr server.

Create: Click the Create button to create the connection.

Test Connection: After entering all the details, click the Test Connection button. If the credentials provided are correct and the services are up and running, you will get the message Connection is available. If you enter wrong credentials or the server is down, you will get the message Connection unavailable.

System Pipelines

System Pipelines (or Superuser Pipelines) are default pipelines created automatically when you deploy StreamAnalytix. These pipelines cannot be modified; the Superuser can only start or stop them. System Pipelines enable operations such as alert generation, error log collection, and monitoring. For Storm, the following are the System Pipelines:

1. AlertPipeline
2. DataLineagePipeline
3. ErrorSearchPipeline
4. GraphiteWriterPipeline
5. MonitorMetricPipeline

AlertPipeline

AlertPipeline enables the Alert Processor to generate alerts on the UI and send alert emails. It stores alert messages in the database and aggregates alerts for a given time window. The AlertPipeline configuration is shown below:

Hover on a pipeline component to view its metrics, as shown below:

DataLineagePipeline

DataLineagePipeline enables searching of lineage information. This pipeline reads data from the various Kafka topics configured in the Data Lineage sections of pipelines and persists it into HBase with the proper schema. To ensure that the pipeline works properly, make sure the topic configured in the Lineage section of a business pipeline is added as a Kafka source in DataLineagePipeline. If the topic is not already added, add it as a Kafka channel and connect it to HBase. Once the pipeline is started, it persists the data into HBase. The DataLineagePipeline configuration is shown below:

Hover on a pipeline component to view its metrics, as shown above.

ErrorSearchPipeline

ErrorSearchPipeline sends a pipeline's error logs and messages to an indexer so that the error logs are available on the Application Errors tab in the Pipeline Summary View. The ErrorSearchPipeline configuration is shown below:

To run this pipeline, go to the Configuration tab and enable the following properties:

1. Enable Spark Error Data Retrieval (sax.spark.error.data.retrieval.enabled): TRUE
2. Enable Spark Error Search (sax.spark.error.search.enabled): TRUE

GraphiteWriterPipeline

GraphiteWriterPipeline takes the input (metrics) from MonitorMetricPipeline and sends it to Graphite. The GraphiteWriterPipeline configuration is shown below:

MonitorMetricPipeline

MonitorMetricPipeline sends the metrics to a server, either Ambari or Graphite, as per the configuration properties. It executes the aggregation rules based on the metric type for a fixed time interval. The MonitorMetricPipeline configuration is shown below:

For Spark, there are four System Pipelines:

1. SparkDataLineagePipeline
2. SparkErrorSearchPipeline
3. SparkMetricsWriterPipeline
4. SparkMonitorMetricPipeline

SparkDataLineagePipeline

SparkDataLineagePipeline enables searching of lineage information. This pipeline reads data from the various Kafka topics configured in the Data Lineage sections of pipelines. If a topic is not already added, add it to the Kafka channel. If there are two or more channels, connect them to HBase through a Union operator. The SparkDataLineagePipeline configuration is shown below:

You can view pipeline statistical information such as Jobs, Stages, Tasks, and Streaming. It also shows the number of components (Kafka and HBase) used in the pipeline.

SparkErrorSearchPipeline

SparkErrorSearchPipeline sends pipeline error logs and messages to an indexer so that the error logs are available on the Application Errors tab in the Pipeline Summary View. The SparkErrorSearchPipeline configuration is shown below:

SparkMonitorMetricPipeline

SparkMonitorMetricPipeline monitors the metrics of a Spark pipeline. The data, which consists of a set of message types, is generated by the metrics scheduler and read by SparkMonitorMetricPipeline from the RabbitMQ queue SaxSparkMonMetricQueue. The metrics data is processed in a window of 60 seconds, and the processed data is sent to another RabbitMQ queue, SaxSparkMetricsWriterQueue. The pipeline generates different values for the incoming data by applying aggregation functions to the metrics data. The processed data helps in understanding the overall metrics of the pipeline for infrastructure, disk, network, JMX, monitoring alerts, etc. The SparkMonitorMetricPipeline configuration is shown below:
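The 60-second windowed aggregation described above can be illustrated with a small sketch. This is purely illustrative: the sample format (timestamp, metric, value) and the choice of aggregation functions are our assumptions, not the real pipeline's message schema.

```python
from collections import defaultdict

# Group (timestamp, metric, value) samples into fixed 60-second windows
# and aggregate each metric per window -- an illustrative stand-in for
# the pipeline's windowed aggregation; field names are hypothetical.
def aggregate(samples, window=60):
    buckets = defaultdict(list)
    for ts, metric, value in samples:
        buckets[(ts // window, metric)].append(value)
    return {
        key: {"min": min(vs), "max": max(vs), "avg": sum(vs) / len(vs)}
        for key, vs in buckets.items()
    }

stats = aggregate([(0, "cpu", 10), (30, "cpu", 20), (61, "cpu", 40)])
# two windows: (0, 'cpu') averages 15.0; (1, 'cpu') averages 40.0
```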

SparkMetricsWriterPipeline

SparkMetricsWriterPipeline writes the metrics generated by a Spark pipeline. It reads data, which consists of a set of message types, from the RabbitMQ queue SaxSparkMetricsWriterQueue and processes it. The data is analyzed based on the monitoring criteria applied in the Manage Alerts section of the Monitoring widget and is shown in the form of line and pie charts. Notification alerts are generated when the given criteria match the data. The SparkMetricsWriterPipeline configuration is shown below:

NOTE: Spark pipelines can only be monitored if the SparkMonitorMetricPipeline and SparkMetricsWriterPipeline pipelines are up and running. These pipelines process the metrics and make them available in the StreamAnalytix UI in the form of live widgets.

To give us your feedback on your experience with the application and report bugs or problems, mail us at [email protected]