Distributed Monitoring Tool CSE 5306 001 NICHOLAS BURNS SNEHA KADAM TRAN HOANG-DUNG ANUJ RAKHEJA 1

Design Presentation Distributed Monitoring tool


Page 1: Design Presentation Distributed Monitoring tool

Distributed Monitoring Tool

CSE 5306 – 001

NICHOLAS BURNS

SNEHA KADAM

TRAN HOANG-DUNG

ANUJ RAKHEJA


Introduction

-Description and Focus

A distributed monitoring tool collects computer and OS information from all the nodes attached to a server

The server shows each node’s data on the user’s display

All the nodes/computers will have a hybrid structure built on a basic client-server architecture

Prerequisites for computers in the system

Nodes/computers successfully connected to the server through a wired or wireless network

Latest version of JDK

SIGAR Libraries

OS (Windows, Linux, Mac, Unix)


System Overview

-Design

Multi-tiered Client-Server Architecture

Tree-based Heap Structure

Two Types of Nodes

Server (1): Main computer (root) that collects and displays all node system information

Clients (max 31): Merge their own system information with their children’s (if any) system information and pass it along to their parent

Threading used to simulate additional nodes

Leaf Nodes initiate data transmission


Detailed Design

-Server

The node number of the server will be 32. The server will parse the data received from all the client nodes and show it on the display.

The server object will contain the details of each node (i.e. node number, its children, IP addresses, etc.).


Detailed Design

-Client

There will be 31 client nodes in the system, arranged in a heap-like tree structure. Each client will collect its data every 30 seconds and transfer it to its parent. The parent node is responsible for appending its own data to the child’s data and sending the result to its own parent.

The object of each client node will look like:

Node number

IP address

Parent details:

Node number

IP address

Number of children

Children details:

Left child node number & IP address

Right child node number & IP address
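
The fields listed above could be held in a small Java class along the following lines. This is only a sketch: the class name `NodeInfo`, the field names, and the use of 0 for "no such node" (mirroring the "= 0" entries used elsewhere in the slides) are assumptions made for illustration.

```java
import java.util.Objects;

/** Hypothetical holder for the per-node details listed on the slide. */
final class NodeInfo {
    // Assumption: 0 means "no such node", as in the slides' "= 0" entries.
    static final int NONE = 0;

    final int nodeNumber;
    final String ipAddress;

    // Parent details
    int parentNumber = NONE;
    String parentIp;

    // Children details
    int leftChildNumber = NONE;
    String leftChildIp;
    int rightChildNumber = NONE;
    String rightChildIp;

    NodeInfo(int nodeNumber, String ipAddress) {
        this.nodeNumber = nodeNumber;
        this.ipAddress = Objects.requireNonNull(ipAddress);
    }

    /** The number of children is derived from the occupied child slots. */
    int numberOfChildren() {
        int n = 0;
        if (leftChildNumber != NONE) n++;
        if (rightChildNumber != NONE) n++;
        return n;
    }
}
```

Deriving the child count from the two slots, rather than storing it separately, keeps the count and the child details from drifting apart.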


Detailed Design

-Adding a Node

Initially only the server node, i.e. node number 32, is present in the network. The IP address and the port number on which the TCP communication will take place will be fixed.


Node number = 31

IP address = XXX

Parent details:

Node number = 32

IP address = XXX

Number of children = 0

Children details:

Left child node number & IP address = 0

Right child node number & IP address = 0

Node number = 32

IP address = XXX

Parent details:

Node number = 0

IP address = 0

Number of children = 1

Children details:

Left child node number & IP address = 31 & XXX

Right child node number & IP address = 0
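
The first join shown above (node 31 attaching to server node 32) can be sketched as an update to a server-side table of node records. Everything named here is hypothetical: `NodeTable`, `Entry`, and `attach` are illustrative names, and filling the left child slot before the right one is an assumption based only on the example values on this slide.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical server-side table of node records, keyed by node number. */
final class NodeTable {
    /** Minimal record; 0 means "none", as in the slide's "= 0" entries. */
    static final class Entry {
        final int number;
        final String ip;
        int parent, left, right;

        Entry(int number, String ip) {
            this.number = number;
            this.ip = ip;
        }

        int childCount() {
            return (left != 0 ? 1 : 0) + (right != 0 ? 1 : 0);
        }
    }

    private final Map<Integer, Entry> entries = new HashMap<>();

    NodeTable(int serverNumber, String serverIp) {
        entries.put(serverNumber, new Entry(serverNumber, serverIp));
    }

    Entry get(int number) {
        return entries.get(number);
    }

    /** Attach a new node under parentNumber. Assumption: left slot is filled first. */
    Entry attach(int childNumber, String childIp, int parentNumber) {
        Entry parent = entries.get(parentNumber);
        if (parent == null) {
            throw new IllegalArgumentException("unknown parent " + parentNumber);
        }
        if (entries.containsKey(childNumber)) {
            throw new IllegalArgumentException("duplicate node " + childNumber);
        }
        if (parent.left == 0) {
            parent.left = childNumber;
        } else if (parent.right == 0) {
            parent.right = childNumber;
        } else {
            throw new IllegalStateException("parent " + parentNumber + " already has two children");
        }
        Entry child = new Entry(childNumber, childIp);
        child.parent = parentNumber;
        entries.put(childNumber, child);
        return child;
    }
}
```

Calling `new NodeTable(32, serverIp).attach(31, clientIp, 32)` leaves node 31 with parent 32 and no children, and node 32 with node 31 in its left child slot.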


Detailed Design

-Adding a Node

Adding a non-initial Client node (most common)


Detailed Design

-Deleting a Node

Suppose Node 29 wishes to leave this network


Detailed Design

-Protocol Design

The communication between child and parent nodes will take place over TCP/IP on port 50000, using the packet layout below.

Byte # 1: Packet type (request, acknowledgement, delete, etc.)

Byte # 2: Number of bytes in this packet

Depending on the first byte, the following bytes will vary.

Byte # 3: Number of node details that this packet contains

Byte # 4 to byte # x: Node 1 details

Byte # x to y: Node 2 details

.

.

.

Byte # z to byte a: Last node
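
A minimal sketch of this layout in Java follows. The slide fixes only the first three bytes, so several details here are assumptions: the numeric type codes, the choice to prefix each node's details with a one-byte length, and UTF-8 text as the encoding of the details. Because byte 2 holds the packet length in a single byte, this sketch rejects packets longer than 255 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class Packet {
    // Hypothetical type codes; the slide names the kinds but not their values.
    static final byte REQUEST = 1, ACK = 2, DELETE = 3, DATA = 4;

    /** Layout: [type][total length][node count][len1][details1]...[lenN][detailsN]. */
    static byte[] encode(byte type, List<String> nodeDetails) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (String d : nodeDetails) {
            byte[] b = d.getBytes(StandardCharsets.UTF_8);
            if (b.length > 255) {
                throw new IllegalArgumentException("node details too long");
            }
            body.write(b.length);          // assumed one-byte length prefix
            body.write(b, 0, b.length);
        }
        int total = 3 + body.size();
        if (total > 255) {
            throw new IllegalArgumentException("packet exceeds one-byte length field");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(type);                   // byte 1: packet type
        out.write(total);                  // byte 2: number of bytes in this packet
        out.write(nodeDetails.size());     // byte 3: number of node details
        out.write(body.toByteArray(), 0, body.size());
        return out.toByteArray();
    }

    /** Reads the node details back out of a packet produced by encode. */
    static List<String> decodeDetails(byte[] packet) {
        if (packet.length < 3 || (packet[1] & 0xFF) != packet.length) {
            throw new IllegalArgumentException("length field does not match packet size");
        }
        int count = packet[2] & 0xFF;
        List<String> details = new ArrayList<>();
        int pos = 3;
        for (int i = 0; i < count; i++) {
            int len = packet[pos++] & 0xFF;
            details.add(new String(packet, pos, len, StandardCharsets.UTF_8));
            pos += len;
        }
        return details;
    }
}
```

With up to 32 nodes' details merged into one packet near the root, a one-byte length field is tight; a real implementation would probably need either a wider length field or very compact node details.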


Detailed Design

-System Info. Collection Protocol

SIGAR (System Information Gatherer And Reporter)

We will use the following SIGAR libraries/classes…

Version – Reports the current version of SIGAR used and general OS information

Uptime – Reports the amount of time the OS has been active or awake

CPUinfo – Reports the detailed information about the CPU

Free – Reports memory information (file systems, total, used, and free space, etc.)

Ulimit – Reports the system resource limits (stack size, virtual memory, etc.)


Detailed Design

-System Info. Collection Protocol

1. The network is settled and in “final” form (no more nodes currently being added)

2. All the leaf nodes/computers of the network each have a periodic timer within their client-side code

3. Whenever this timer is triggered they call the SIGAR commands to gather their information

4. This information is packaged within our custom built class in a rigid format

5. This Sys_Info object is passed along to its parent

6. The receiving parent node waits until it has the objects from all of its children (either 1 or 2)

7. Once it has all child information it combines those objects with its own SIGAR-gathered Sys_Info object and passes the collective data along to its parent

8. This protocol continues until all information reaches the main server node which then displays all node information on the user’s interface
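
Steps 6 and 7 above, where a parent waits for every child and then merges, can be sketched as follows. The names `SysInfo`, `ReportAggregator`, and `onChildBundle` are hypothetical; `SysInfo` is only a stand-in for the Sys_Info class mentioned in the steps, and the node's own SIGAR collection is represented by a `Supplier` so the sketch does not depend on the SIGAR library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for the Sys_Info class: one node's report. */
final class SysInfo {
    final int nodeNumber;
    final String report;

    SysInfo(int nodeNumber, String report) {
        this.nodeNumber = nodeNumber;
        this.report = report;
    }
}

/** Collects child bundles for one round and releases the merged list once all have arrived. */
final class ReportAggregator {
    private final int expectedChildren;            // 1 or 2 for an inner node
    private final Supplier<SysInfo> ownCollector;  // assumed to wrap the local SIGAR calls
    private final List<SysInfo> pending = new ArrayList<>();
    private int childrenSeen = 0;

    ReportAggregator(int expectedChildren, Supplier<SysInfo> ownCollector) {
        this.expectedChildren = expectedChildren;
        this.ownCollector = ownCollector;
    }

    /**
     * Called when one child's bundle arrives. Returns null while still waiting
     * for another child; otherwise returns the children's data plus this node's
     * own report, and resets the state for the next round.
     */
    synchronized List<SysInfo> onChildBundle(List<SysInfo> bundle) {
        pending.addAll(bundle);
        childrenSeen++;
        if (childrenSeen < expectedChildren) {
            return null;
        }
        List<SysInfo> merged = new ArrayList<>(pending);
        merged.add(ownCollector.get());
        pending.clear();
        childrenSeen = 0;
        return merged;
    }
}
```

The method is synchronized because the two children would normally be served by separate socket threads. This sketch does not handle a child that never reports; a timeout would be needed for that case.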


Detailed Design

-Output Display Format

Similar to standard SIGAR output

Server will have a simple JPanel

A tab for each Node

Non-editable text window showing information

Metrics defined in Outcome section

If a node is not in the network, its tab shows “Node #: Not Active”
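
One way the tabbed display could be assembled with Swing is sketched below. The slide only says the server will have a simple JPanel with a tab per node; using `JTabbedPane` and `JTextArea` for this, and the names `NodeTabs`, `tabText`, and `build`, are assumptions for illustration.

```java
import java.util.Map;
import javax.swing.JScrollPane;
import javax.swing.JTabbedPane;
import javax.swing.JTextArea;

final class NodeTabs {
    /** Text for one tab: the node's report if present, otherwise the "Not Active" message. */
    static String tabText(int nodeNumber, Map<Integer, String> reports) {
        String report = reports.get(nodeNumber);
        return report != null ? report : "Node " + nodeNumber + ": Not Active";
    }

    /** Builds one tab per node number in [first, last], each with a read-only text area. */
    static JTabbedPane build(int first, int last, Map<Integer, String> reports) {
        JTabbedPane tabs = new JTabbedPane();
        for (int n = first; n <= last; n++) {
            JTextArea area = new JTextArea(tabText(n, reports));
            area.setEditable(false); // non-editable text window
            tabs.addTab("Node " + n, new JScrollPane(area));
        }
        return tabs;
    }
}
```

The returned component can be added to the server's JPanel, for example with `panel.add(NodeTabs.build(1, 32, reports))`, where `reports` maps node numbers to the latest text received for each node.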


Challenges

Parsing data sent between nodes correctly

Ensuring correct and coherent table-keeping for node arrangement in the network

Guaranteeing synchronization of child node reporting to parent node

Keeping accurate and updated IP and Port information amongst the clients

Ensuring no deadlocks occur during the threading of clients


Implementation

-Software and Tools

Language: Java (using at least JDK 8u40)

Java’s Server/Client Socket Programming libraries (the java.net package and its functionality)

TCP/IP scheme

SIGAR (System Information Gatherer And Reporter)

Used for collecting each computer’s system information

Provides a single API capable of working with all the popular operating systems on the market (Windows, Linux, Solaris, Mac OS X, etc.)

Its core is implemented in C but has bindings in other languages (Java will be used for this project)


Implementation

-Work Dispersion Among Team

DropBox and code repository

Gathering system information, packaging efficiently, and passing to another node

Nicholas, Sneha

Programming for the main server node parsing all information to display to the user

Tran, Sneha

Server node programming

Anuj, Nicholas

Client node programming

Anuj, Nicholas

General socket communication between nodes

Sneha, Tran

Overall architecture and structure of the network

All


Theoretical/Simulation Study

The nodes will have different storage capacity requirements.

Nodes higher up in the heap have to combine their children’s information with their own and then transfer all of it to the next level up. Thus, they have to not only store but also transfer a larger amount of information than their children.

Consequently, the storage capacity and computational requirements increase from the leaves to the root of the distributed monitoring system.

This becomes a problem when the design is applied to a distributed monitoring system with a very large number of nodes, because of the imbalance in storage capacity and computational requirements between nodes.

Therefore, the scalability of the proposed design may be limited to a fairly small number of nodes.

To improve the balance in storage capacity and computational load between nodes, and thus increase the scalability of the distributed monitoring system, a super-peer architecture for the low-level nodes should be combined with a client-server architecture for the high-level nodes.
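
The growth in load toward the root can be made concrete with a small calculation. The sketch below assumes a perfect binary tree in which every node forwards one record for itself plus one per descendant; the class and method names are hypothetical, and the real network need not be perfectly balanced.

```java
final class TreeLoad {
    /**
     * Records forwarded upward by a node at the given height (leaf = 0)
     * in a perfect binary tree: its own record plus one per descendant,
     * i.e. the size of its subtree, 2^(height+1) - 1.
     */
    static int recordsForwarded(int height) {
        return (1 << (height + 1)) - 1;
    }

    public static void main(String[] args) {
        for (int h = 0; h <= 4; h++) {
            System.out.println("height " + h + ": " + recordsForwarded(h) + " records");
        }
    }
}
```

Under this assumption the per-node load roughly doubles with each level: a leaf handles 1 record, while a node four levels above the leaves handles 31.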


Future Work

So far we have proposed only one basic operating mode for the distributed monitoring system, in which the server periodically collects all information from its children. To improve the operating flexibility of the system, we would like to add two more operating modes to the tool in the future.

Allow the system to operate in a request-answer style. That is, the server may ask any single node to send only its own information. This option spares the server from processing large amounts of information from nodes it is not currently interested in. Additionally, this operating mode may help to reduce the communication load across the entire system.

Allow the server to find the most suitable node for a specific task. When a user wants a node to perform a specific task, the server must find the most suitable one. The server will send a request to its children to determine which node has the best resources for the task. This can be seen as a request for optimal information in specific cases.


Future Work cont.

In order to deal with a large number of nodes in the future, we would like to find an optimal design architecture for the distributed monitoring system. From the theoretical analysis in section 6, the combination of super-peer and client-server architectures may be a good choice. However, it is certainly more complicated to implement than the proposed method.

Last but not least, we have to deal with one important communication scenario in the project: a node suddenly failing (for example node 29 in section 4). In the project we simply assume that before a node leaves, it informs the server and obtains its permission. After that, the server reconfigures the heap architecture and the system works as usual again. However, a node can also suddenly lose its connection to both its children and its parent. In that situation, its parent should inform the server that one of its children is gone and ask the server to reconfigure the network structure.
