7/11/19
ISP Essentials Workshop – Network Monitoring
Manila, Philippines
8-12 July 2019
Agenda
• Intro to Network Management
• Configuration Management
• Device Monitoring
• Flow Monitoring
• Log Management
INTRO TO NETWORK MANAGEMENT
Module 1
Hosts and Services
• Host
  – Container for services
  – Can be physical or virtual
  – Both have CPU, disk, memory, and network interfaces
  – Physical hosts also have:
    • Vendors, service contracts
    • Power supplies, temperature
• Service
  – A software application
  – Runs on a host
  – Has allocated resources
  – Has vendors / suppliers
Managing Config Data
• Some Host Configuration Data to Track
  – Physical Device Locations
  – Installed CPU, Disk, Memory, Network Interfaces
  – Serial Numbers, Licenses, OS Revision & Patch Details
• Some Service Configuration Data to Track
  – Allocated Resources, Network Ports
  – Service Permissions, Filters and ACLs, Logging
  – Software Revision & Patch Details
Why Manage Config Data?
• Match Resource Allocation to Revenue Generation
• Ensure our hosts and applications have a secure configuration
• Correlate operational results with config changes
• Roll back or restore config when a fault occurs
Operational Data
• Host
  – CPU Utilisation
  – Memory Utilisation
  – Disk Utilisation
  – Network Interface Utilisation
  – Fan State
  – Port Errors
• Service
  – Time to Respond to Request
  – Processes in Use
  – Queue Length
  – State of a BGP session
Operational Data
• Availability
  – Applies to Hosts & Services
  – Percent of time a host or service is performing to specification
  – Typically measured as a percent, for example 99.99%
  – Excludes planned outages
• Performance
  – Time to respond to a request or forward a packet
  – Megabits or Packets Per Second
  – Discards, Errors, Loss
• Reachability
  – Applies to Hosts & Services
  – Percent of time a host or service is reachable
  – Typically measured as a percent, for example 99.99%
  – A host that is unreachable from one location is not necessarily unavailable to everyone
  – It may still be reachable from another location
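To make a percentage like 99.99% concrete, the sketch below converts an availability target into the unplanned outage time it allows over a period. It assumes a 365-day year and, as above, leaves planned outages out of the calculation; the function names are illustrative only.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 365.0) -> float:
    """Minutes of unplanned outage permitted by an availability target."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100.0)


def measured_availability_pct(uptime_minutes: float, total_minutes: float) -> float:
    """Availability as a percentage of the measurement window."""
    return 100.0 * uptime_minutes / total_minutes


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}% -> {allowed_downtime_minutes(target):.1f} minutes/year")
```

For example, 99.99% over a 365-day year leaves about 52.6 minutes for unplanned outages, while 99.9% leaves about 525.6 minutes.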
Why Monitor Operational Data
• Know about Problems Before your Customers Call
• Prove Hosts & Services are Delivering on SLAs
• Continue to Meet SLAs as your Network Grows
Common NMM Tools
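Common Back-end Tools
• Data storage
  – Config files, formats and locations
  – Databases: SQL, key-value, NoSQL
• RRDTool
  – The idea of a round-robin database: fixed-size storage for time-series data
• Check_mk
  – The idea of service checking: periodically testing that hosts and services behave as expected
• Nagios Plugins
  – What Nagios is, and what plugins are: small programs that each perform one check and report its result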
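The core idea of a round-robin database can be shown with a fixed-size ring buffer: storage never grows, and once it is full each new sample overwrites the oldest one. This is only a sketch of the concept under that simplification; real RRDTool files also consolidate older samples into coarser averages, which the hypothetical `RoundRobinArchive` class below does not attempt.

```python
from typing import List


class RoundRobinArchive:
    """Fixed-size store that overwrites the oldest sample when full."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self._slots: List[float] = [0.0] * size
        self._next = 0   # index the next sample will be written to
        self._count = 0  # number of slots filled so far (caps at size)

    def add(self, value: float) -> None:
        self._slots[self._next] = value
        self._next = (self._next + 1) % len(self._slots)
        self._count = min(self._count + 1, len(self._slots))

    def samples(self) -> List[float]:
        """Return the stored samples, oldest first."""
        size = len(self._slots)
        start = (self._next - self._count) % size
        return [self._slots[(start + i) % size] for i in range(self._count)]

    def average(self) -> float:
        data = self.samples()
        return sum(data) / len(data) if data else 0.0


if __name__ == "__main__":
    rra = RoundRobinArchive(3)
    for value in (10, 20, 30, 40):  # the fourth sample overwrites the first
        rra.add(value)
    print(rra.samples(), rra.average())
```

Because the size is fixed when the archive is created, disk use is predictable no matter how long a device is monitored, which is why tools such as LibreNMS can keep years of graphs per port.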
Network Automation
• A continuous process of generation and deployment of configuration changes, management, and operations of network devices (from Network Automation at Scale)
Network Automation
• Automating config management
  – Including config changes based on operational data
• Orchestrated with tools like Ansible, Chef, Puppet, and Salt
• This is the next step in network monitoring and management
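One building block of this workflow is generating device configuration from structured data instead of typing it by hand. The sketch below uses Python's standard `string.Template`; automation tools such as Ansible usually use Jinja2 templates instead. The IOS-style template text, interface names and addresses are hypothetical examples.

```python
from string import Template
from typing import Dict, List

# Hypothetical IOS-style interface template; $-placeholders are filled from data.
INTERFACE_TEMPLATE = Template(
    "interface $name\n"
    " description $description\n"
    " ip address $address $netmask\n"
    " no shutdown\n"
)


def render_interfaces(interfaces: List[Dict[str, str]]) -> str:
    """Render one config stanza per interface, separated by '!' lines."""
    stanzas = [INTERFACE_TEMPLATE.substitute(intf) for intf in interfaces]
    return "!\n".join(stanzas)


if __name__ == "__main__":
    data = [
        {"name": "GigabitEthernet0/1", "description": "uplink",
         "address": "192.0.2.1", "netmask": "255.255.255.252"},
    ]
    print(render_interfaces(data))
```

`substitute` raises `KeyError` if a placeholder has no value, so incomplete data fails loudly instead of producing a half-filled config.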
ADDRESS MANAGEMENT
Module 2
Address Management
• Planning and managing the assignment and use of IP addresses and closely related resources of a computer network
• IP Address Management (IPAM) tools
  – Racktables
  – Netbox
  – Many others (commercial and open source)
https://en.wikipedia.org/wiki/IP_address_management
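At its core, an IPAM tool tracks which addresses in each prefix are already assigned. The sketch below uses Python's standard `ipaddress` module to find the next free host address in a subnet; the prefix and assigned addresses are made-up examples, and a real tool such as Netbox keeps this state in a database.

```python
import ipaddress
from typing import Iterable, Optional


def next_free_address(prefix: str, assigned: Iterable[str]) -> Optional[str]:
    """Return the lowest unassigned host address in prefix, or None if full."""
    network = ipaddress.ip_network(prefix)
    used = {ipaddress.ip_address(a) for a in assigned}
    # hosts() yields usable addresses; for most IPv4 prefixes it skips
    # the network and broadcast addresses.
    for host in network.hosts():
        if host not in used:
            return str(host)
    return None


if __name__ == "__main__":
    print(next_free_address("192.0.2.0/29", ["192.0.2.1", "192.0.2.2"]))
```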
Tools - Racktables
• Asset management tool
https://www.racktables.org/demo.php
Tools - Netbox
• Open source web application designed to help manage and document computer networks
https://netbox.readthedocs.io/en/stable/
CONFIG MANAGEMENT
Module 3
Network Device Configuration
• How is a device configured?
  – Using the command line (Cisco)
  – From a special tool (Mikrotik)
  – From a web interface (Procurve)
  – JSON files (Arista)
  – XML files (Juniper)
• Who configures the device?
• How often do changes happen?
Why do you need to manage config?
• Know when changes are made
• Restore config during a failure
• Roll back changes with unexpected outcomes
• Track config changes over time (history)
What is Version Control?
• Also known as revision control or source control
• Manages changes to files or documents with a revision number
• Allows users to find and highlight changes
• Allows users to restore previous versions of a file or document
What’s a Diff?
• A comparison of two versions of a single file or document
• Highlights the changes between the two versions
• Allows users to quickly see only what’s changed
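Python's standard `difflib` module can produce the same unified diff format used in the Rancid examples in this module. Here two hypothetical versions of an ACL are compared; the ACL name, addresses and file labels are illustrative only.

```python
import difflib
from typing import List

old_config = [
    "ip access-list extended EDUROAM-IN",
    " permit tcp any host 192.0.2.3 eq 443",
    " deny ip any any",
]
new_config = [
    "ip access-list extended EDUROAM-IN",
    " permit tcp any host 192.0.2.3 eq 443",
    " permit tcp any host 192.0.2.5 eq 443",
    " deny ip any any",
]


def config_diff(old: List[str], new: List[str]) -> List[str]:
    """Return unified-diff lines showing only what changed between versions."""
    return list(difflib.unified_diff(
        old, new,
        fromfile="router.cfg (r1)", tofile="router.cfg (r2)",
        lineterm="",  # input lines carry no trailing newlines
        n=1,          # one line of context around each change
    ))


if __name__ == "__main__":
    print("\n".join(config_diff(old_config, new_config)))
```

Lines starting with `+` were added and lines starting with `-` were removed; the `@@` header gives the affected line ranges, and the remaining lines are unchanged context.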
What’s a Diff?
Config Management Tools
• Retrieve configuration files
• Store them as files or in a version control system
• Help solve many problems in network operations
Tools - Rancid
• Really Awesome New Cisco confIg Differ
• Monitors a router's (or more generally a device's) configuration
• Uses CVS, Subversion, or Git to maintain history
• Supports Cisco, Foundry, HP, Juniper, and more
• Runs on BSD, Linux, Mac OS
• Pros:
  – The de facto industry standard for config management
https://www.shrubbery.net/rancid/
Rancid Example
Index: configs/dc1-gw1
===================================================================
retrieving revision 1.677
diff -U 4 -r1.677 dc1-gw1
@@ -713,8 +713,10 @@
remark permit eduroam to beta-login
permit tcp any host 204.111.222.3 eq www 443
remark permit eduroam to stats
permit tcp any host 204.111.222.4 eq www 443
+ remark permit eduroam to net-api
+ permit tcp any host 204.111.222.5 eq www 443
remark temp deny access to all
deny ip any 204.111.222.0 0.0.0.64
Rancid Example
Index: configs/dc1-gw
===================================================================
retrieving revision 1.2213
diff -U 4 -r1.2213 dc1-gw
@@ -32,9 +32,8 @@
!
!Flash: bootflash: Directory of bootflash:/
!Flash: bootflash: 11 drwx 16384 Jan 11 2017 12:13:18 +10:00 lost+found
!Flash: bootflash: 12 -rw- 371180156 Oct 5 2018 14:05:16 +10:00 asr1000rp1-adventerprisek9.03.13.10.S.154-3.S10-ext.bin
- !Flash: bootflash: 13 -rw- 4 Jul 9 2019 15:15:03 +10:00 .issu_loc_lock
!Flash: bootflash: 48769 drwx 4096 Jan 11 2017 12:16:08 +10:00 .installer
!Flash: bootflash: 438913 drwx 4096 Jan 11 2017 13:05:11 +10:00 core
!Flash: bootflash: 829057 drwx 4096 Oct 11 2018 07:24:32 +10:00 .prst_sync
!Flash: bootflash: 520193 drwx 4096 Jan 11 2017 12:19:19 +10:00 .rollback_timer
Tools - Oxidized
• Network device configuration backup tool (intended to replace Rancid)
• Stores files in a version control system
• Supports a large number of manufacturers
  – Cisco (CatOS, IOS, IOSXR, NXOS)
  – Juniper (JunOS, ScreenOS)
  – Huawei (VRP, SmartAX)
  – Mikrotik (RouterOS)
• Pros:
  – Integrates with LibreNMS
https://github.com/ytti/oxidized
Other Tools
• Fetchconfig
• Jazigo
DEVICE MONITORING
Module 4
Intro to SNMP
• Simple Network Management Protocol
• Used to communicate management information between the network management stations and the agents in the network elements.
• Even though SNMP is a protocol, we use the term SNMP to describe the complete architecture of the management system
Intro to SNMP
• Network management stations execute management applications which monitor and control network elements.
• Network elements are devices such as hosts, gateways, terminal servers
• The agent is a piece of software that runs on the network devices you are managing. It can be a separate program, or it can be incorporated into the operating system. Agents listen and respond on UDP port 161.
SNMP Polling, Traps and MIB
• SNMP polling is the act of querying an agent for some piece of information. SNMP managers use UDP to poll agents.
• A trap is a way for the agent to tell the NMS that something has happened. Traps are sent asynchronously, not in response to queries from the NMS. SNMP traps are sent using UDP port 162.
• MIB or Management Information Base is a database of managed objects that the agent tracks. Any sort of status or statistical information accessed by the NMS is defined in a MIB.
  – OID or object identifier is the name of a management object. OIDs are globally unique.
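Interface utilisation is normally derived from polling rather than read directly: the NMS polls an octet counter such as IF-MIB `ifHCInOctets` twice and divides the change by the polling interval. The sketch below shows only that arithmetic, assuming 64-bit counters that wrapped at most once between polls; the SNMP request itself (for example with the Net-SNMP tools) is left out, and the sample readings are invented.

```python
COUNTER64_MODULUS = 2 ** 64


def counter_delta(previous: int, current: int, modulus: int = COUNTER64_MODULUS) -> int:
    """Difference between two counter readings, allowing for one wrap-around."""
    return (current - previous) % modulus


def utilisation_pct(prev_octets: int, curr_octets: int,
                    interval_s: float, if_speed_bps: int) -> float:
    """Percent utilisation of a link from two octet-counter polls."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (interval_s * if_speed_bps)


if __name__ == "__main__":
    # Two hypothetical polls of a 1 Gb/s interface taken 300 s apart.
    print(f"{utilisation_pct(1_000_000, 3_751_000_000, 300, 1_000_000_000):.1f}%")
```

The modulo in `counter_delta` is what keeps a graph from showing a huge negative spike when the counter rolls over; if a device reboots and resets its counters, the result is meaningless and most pollers discard that sample.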
SNMP Applications
• LibreNMS
• MRTG
• PRTG
• …
Beyond SNMP
• SNMP is a heavyweight protocol with low information density
• SNMP was not designed for streaming high-resolution data
• It’s seen as too slow, incomplete, network-specific, and hard to operationalize
New protocols are being developed to stream telemetry data in real time
• YANG data model
• XML, JSON and GPB (Google Protocol Buffers) encoding
• Data pushed from agents, not requested by managers
• UDP, TCP or gRPC transport available
Tools - LibreNMS
• An open-source network monitoring system (NMS)
• Capable of managing small or big networks
• Most management functions are supported or can be integrated
• Details under the hood:
  – Written in PHP, derived from the Observium project
  – Configuration in MySQL
  – Operational data is stored in Round Robin Database files
https://www.librenms.org/
LibreNMS Dashboard
Tools – Sensu
• Sensu is a multi-cloud monitoring system that allows for automating monitoring workflows
  – Monitor containers, instances, applications, and on-premises infrastructure
  – Integrates with PagerDuty, Slack, Grafana, etc.
• Sensu Go is the latest version
• Uchiwa is an open-source dashboard for the Sensu monitoring framework
https://sensu.io/about/
Sensu / Uchiwa Dashboard
https://github.com/sensu/uchiwa
Tools - Grafana
• Open platform for monitoring and analytics
• Does time series analytics
• Plugins to integrate with other applications
Grafana Dashboard
https://grafana.com/
FLOW MONITORING
Module 5
What is a Flow?
• A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device. (RFC3954)
Why do we monitor IP flows?
• Where is our traffic coming from?
• What kind of application traffic is it?
• Are the correct QoS bits set?
• Have routing changes impacted the network?
What’s Netflow?
• Cisco protocol for flow monitoring released in 1996
• Described by RFC3954, but not an Internet Standard
• Netflow V5 is supported by nearly all router platforms
• Versions:
  – Version 5: IPv4 only
  – Version 9: IPv4/v6 and MPLS
What is IPFIX?
• IP Flow Information Export
• Vendor-neutral protocol for flow monitoring
• Started through the IETF process in 2004 & released in 2011
• Based on Cisco’s Netflow version 9
• IPFIX is an Internet Standard replacement for version 9
How do Netflow and IPFIX work?
• Packets with matching tuples are grouped into a flow
• First occurrence of a flow is recorded in a flow cache
• Cache entries are timestamped
• Number of packets and bytes matching the flow are tallied
• Details like next hop IP, ASN, subnet masks, and TCP flags can be recorded
• Cache can be queried interactively, or flows can be exported
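The flow-cache steps above can be sketched in a few lines: packets are keyed by a 5-tuple (source and destination address, source and destination port, protocol), the first packet of a flow creates a timestamped cache entry, and later packets only update the counters. The `Packet` and `FlowRecord` types are hypothetical; real exporters key on more fields (for example ToS and input interface) and expire entries on active and inactive timeouts before exporting them, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FlowKey = Tuple[str, str, int, int, int]  # src IP, dst IP, src port, dst port, protocol


@dataclass
class Packet:
    timestamp: float
    src: str
    dst: str
    sport: int
    dport: int
    proto: int
    length: int


@dataclass
class FlowRecord:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


def build_flow_cache(packets: List[Packet]) -> Dict[FlowKey, FlowRecord]:
    cache: Dict[FlowKey, FlowRecord] = {}
    for pkt in packets:
        key = (pkt.src, pkt.dst, pkt.sport, pkt.dport, pkt.proto)
        record = cache.get(key)
        if record is None:  # first occurrence: create a timestamped entry
            record = FlowRecord(first_seen=pkt.timestamp, last_seen=pkt.timestamp)
            cache[key] = record
        record.last_seen = pkt.timestamp
        record.packets += 1          # tally packets...
        record.octets += pkt.length  # ...and bytes for this flow
    return cache
```

Because flows are unidirectional, the reply traffic of a TCP session (source and destination swapped) lands in a separate cache entry.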
Setting up Netflow & IPFIX
• Cisco – Netflow Configuration
• Juniper – Monitoring, Sampling …
• Huawei – Netstream Configuration
• Mikrotik - IP Traffic Flow
Flow Sampling / Downsampling
• Tracking every flow can take a lot of device resources
• Some routers & switches can be crippled by turning on Netflow
• Sampling helps by tracking one in n packets
• CPU load can be significantly reduced – but so can resolution
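With 1-in-n sampling, the counts from the sampled packets are multiplied by n to estimate the real traffic. The sketch below uses simple count-based sampling (every n-th packet); many platforms sample randomly instead, and the function names and packet sizes are illustrative assumptions.

```python
from typing import Iterable, List, Tuple


def sample_one_in_n(packet_sizes: Iterable[int], n: int) -> List[int]:
    """Keep every n-th packet (count-based 1-in-n sampling)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [size for i, size in enumerate(packet_sizes) if i % n == 0]


def estimate_totals(sampled_sizes: List[int], n: int) -> Tuple[int, int]:
    """Scale sampled packet and byte counts back up by the sampling rate."""
    return len(sampled_sizes) * n, sum(sampled_sizes) * n


if __name__ == "__main__":
    sizes = [1500] * 10_000
    sampled = sample_one_in_n(sizes, 100)
    print(estimate_totals(sampled, 100))
```

The loss of resolution shows up with small flows: at 1-in-100, a 5-packet flow is either missed entirely or reported as 100 packets.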
Tools - Softflowd
• Software Flow Monitoring
• Passive Netflow probe (exporter)
• Network traffic passing through a switch can be mirrored
• Attach a Unix computer to the mirrored port
• Softflowd tracks flows from the mirrored traffic
• Flows can be exported just as they are from routers & switches
Ad-Hoc Flow Queries
• Cisco
  show ip flow
• JunOS
  show services accounting flow-detail
Tools – nfdump + nfsen
• Nfdump collects and processes NetFlow and sFlow
  – C application that receives flows & logs them to files
• Nfsen generates stats and displays graphs
  – Web-based front-end to Nfdump
https://github.com/phaag/nfdump
http://nfsen.sourceforge.net/
Tools – nfdump + nfsen
Tools - ntopng
• Web-based traffic and security network monitoring tool
https://github.com/ntop/ntopng
LOG MANAGEMENT
Module 6
What generates logs?
• Operating Systems
  – Linux, Mac, Windows
• System applications
  – Cron, init, RDBMS
• Network applications
  – BGP, DHCP, HTTP, iptables …
What do servers log?
• Backups
• Connections
• Database messages
• Hardware messages
• Software versions and updates
What do Network Apps log?
• Connections
• DHCP details
• Hardware messages
• Port events
• Protocol information
Where are logs stored?
• Linux/Mac: /var/log
• Windows: Event Viewer
• Network devices: Memory
Is it useful to have logs stored all over the place? What happens to events written to memory when devices are turned off?
Firewall Log
Syslog Message Levels
Level   Description
0       Emerg
1       Alert
2       Critical
3       Error
4       Warning
5       Notice
6       Info
7       Debug
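On the wire, a syslog message carries its level together with a facility code in a single priority value, PRI = facility * 8 + level, written in angle brackets at the start of the message (RFC 5424; the older BSD format in RFC 3164 uses the same encoding). The sketch below decodes it; the sample messages are made up.

```python
import re
from typing import Tuple

LEVEL_NAMES = ["Emerg", "Alert", "Critical", "Error",
               "Warning", "Notice", "Info", "Debug"]

PRI_PATTERN = re.compile(r"^<(\d{1,3})>")


def decode_pri(message: str) -> Tuple[int, int, str]:
    """Return (facility, level, level name) from a syslog message."""
    match = PRI_PATTERN.match(message)
    if match is None:
        raise ValueError("message does not start with a <PRI> field")
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, level 7
        raise ValueError(f"invalid PRI value {pri}")
    facility, level = divmod(pri, 8)
    return facility, level, LEVEL_NAMES[level]


if __name__ == "__main__":
    print(decode_pri("<189>%LINK-3-UPDOWN: Interface Gi0/1, changed state to down"))
```

Here `<189>` decodes to facility 23 (local7) and level 5 (Notice). Filtering on the level lets a collector keep Debug messages out of long-term storage.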
Syslog aggregation
How to aggregate syslog
• Set up a remote syslog facility on a server
  – Graylog
  – Elastic Stack
  – Rsyslog
  – Splunk
  – Syslog-ng
• Configure devices to send their logs
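Routers and switches are pointed at the collector in their own configuration (on Cisco IOS, for example, with `logging host <address>`). For servers and scripts, Python's standard `logging.handlers.SysLogHandler` can forward messages to the same collector. The collector address 192.0.2.50 is a placeholder, and the sketch assumes the collector listens on the default UDP port 514.

```python
import logging
import logging.handlers
from typing import Tuple

COLLECTOR = ("192.0.2.50", 514)  # placeholder address of the central syslog server


def make_remote_logger(name: str, address: Tuple[str, int] = COLLECTOR) -> logging.Logger:
    """Return a logger that forwards records to a remote syslog collector over UDP."""
    handler = logging.handlers.SysLogHandler(
        address=address,
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
    )
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_remote_logger("backup-job")
    log.warning("nightly config backup failed for dc1-gw1")
```

The handler adds the `<PRI>` prefix itself: facility local0 (16) with level Warning (4) is sent as `<132>`. UDP delivery is not guaranteed, which is one reason collectors such as Rsyslog and Syslog-ng also accept TCP and TLS.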
Tools - Graylog
• Commercial + Open source software
• Collection, Storage, Analysis, & Visualisation
• Tightly coupled software stack including:
  – Elasticsearch for log storage and search
  – MongoDB for configuration and metadata
• LibreNMS integration
Tools – Elastic Stack
• Open source with commercial support available
• Collection via Logstash
• Elasticsearch for Storage and Search
• Kibana for Search, Analytics, and Visualisation
• (ELK stack)
Tools - Rsyslog
• Open source with commercial support available
• Transport over TCP, SSL/TLS, and RELP
• Output to MySQL, PostgreSQL, Oracle and more
• Filter on any part of a syslog message
• Multi-threaded and suitable for relay chains
Tools - Splunk
• Commercial software
• Free for small users at < 500 MB/day
• Collection, Storage, Analysis & Visualisation
• Real-time alerting engine included
• Popular corporate solution with 13k customers
Tools – Syslog-ng
• Free and open source with commercial support available
• Collection and storage
• Adds TCP and TLS to basic UDP transport
• Can extract structured information from log messages
• Can log directly to a database
• Requires external tools for Analysis and visualization
Log Alerting & Analysis
• No systems administrator has time to read all logs
• Log messages are unimportant until they aren’t
  – Post-incident security reports
  – Billing inquiries
  – Law Enforcement Agency requests
• Some platforms include analysis or alerting
• Others need external tools like Tenshi or Swatch
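Tools like Swatch and Tenshi work by matching each incoming log line against a list of regular expressions and raising an alert only for lines that match. The sketch below shows that idea; the rule names, patterns and sample lines are hypothetical, not rules shipped with either tool.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical watch rules: (alert name, pattern to look for in a log line).
WATCH_RULES = [
    ("bgp-down", re.compile(r"BGP-5-ADJCHANGE: neighbor \S+ Down")),
    ("link-down", re.compile(r"changed state to down")),
    ("ssh-auth-fail", re.compile(r"Failed password for (invalid user )?\S+")),
]


def match_alerts(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (alert name, log line) for every line that matches a rule."""
    alerts = []
    for line in lines:
        for name, pattern in WATCH_RULES:
            if pattern.search(line):
                alerts.append((name, line))
                break  # one alert per line is enough
    return alerts


if __name__ == "__main__":
    for name, line in match_alerts(["%LINK-3-UPDOWN: Interface Gi0/1, changed state to down"]):
        print(f"ALERT {name}: {line}")
```

Everything that matches no rule is silently ignored, which is what keeps the alert stream small enough for a person to read.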
Beyond Alerting: Analysis
• The volume of log entries is as important as the entries themselves
  – What’s your baseline number of entries?
  – Has it changed?
  – Do more log entries mean an attack?
• Similar log entries across a network can be important
  – Port scanning, intrusion attempts
• Similar log entries across time can be important
  – Is someone attacking you very slowly?
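One simple way to answer "has the volume changed?" is to compare the latest count of log entries with the mean and standard deviation of earlier periods, and flag large deviations in either direction. The sketch below assumes hourly counts and a three-standard-deviation threshold; both the sample counts and the threshold are arbitrary choices to tune for your own network.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_volume_anomaly(history: Sequence[int], latest: int, k: float = 3.0) -> bool:
    """Flag `latest` if it lies more than k standard deviations from the baseline."""
    if len(history) < 2:
        raise ValueError("need at least two earlier periods for a baseline")
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        return latest != baseline
    return abs(latest - baseline) > k * spread


if __name__ == "__main__":
    hourly_counts = [120, 130, 125, 118, 127, 122]
    print(is_volume_anomaly(hourly_counts, 900))
```

A sudden drop matters as much as a spike: a device that stops logging altogether shows up as a count near zero. Slow attacks need longer baselines (days or weeks) than the hourly window used here.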