Upgrade SRX With ISSU (All KB in One GO)

8/10/2019 Upgrade SRX With ISSU (All KB in One GO)


To perform an ISSU (In-Service Software Upgrade), complete the following steps:

1. Load the Junos Software package on the device - KB20955
2. Verify the Health of the Cluster (Important step) - KB20956
3. Create backup of the current configuration and set the rescue config - KB20957
4. Start the In-Service Software Upgrade - KB20958
5. Process to follow, in the event of the ISSU process stalling in the middle of the upgrade - KB19500

Load the Junos Software package on the device - KB20955

a. Junos software installation requires the package to be on the SRX device.

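For help on obtaining the Junos software package, refer to Downloading Software Packages from Juniper Networks (http://www.juniper.net/techpubs/en_US/release-independent/junos/topics/task/installation/software-packages-downloading.html).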

There are multiple methods for transferring the software package to the device. Copy the software package onto the device by using FTP or USB. (You can determine the amount of temporary storage space left on the device by following KB17367.)

b. Once the package is loaded, verify that it is available in the /cf/var/tmp directory, using either of the methods shown below:

From shell:

{primary:node0}root@test-node0> exit
root@test-node0% ls /var/tmp

juniper.conf.spu.gz juniper.data junos-srx5000-10.1R1.8-domestic.tgz

From CLI:

{primary:node0}root@test-node0> file list /cf/var/tmp/junos*

junos-srx5000-10.1R1.8-domestic.tgz

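c. To ensure that the package transferred to the device is not truncated or corrupted, calculate its MD5 checksum and compare it with the checksum published for that package on the Juniper download site. A matching value confirms the integrity of the package.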

> file checksum md5 /var/tmp/junos-srx5000-10.1R1.8-domestic.tgz
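If you also keep a copy of the package on a workstation, the same check can be scripted there before and after the transfer. The Python sketch below is an illustration, not a Juniper tool: it streams data through MD5 in chunks and compares the digest with a published checksum. The function names are hypothetical; the value should also agree with what 'file checksum md5' reports on the device.

```python
import hashlib
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a binary stream, read in fixed-size chunks."""
    digest = hashlib.md5()
    # Reading in chunks keeps memory use flat even for large Junos packages.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str) -> str:
    """Return the hex MD5 digest of the file at 'path'."""
    with open(path, "rb") as fh:
        return md5_of_stream(fh)


def checksums_match(computed: str, published: str) -> bool:
    """Compare two hex digests, ignoring case and surrounding whitespace."""
    return computed.strip().lower() == published.strip().lower()
```

For example, checksums_match(md5_of_file("junos-srx5000-10.1R1.8-domestic.tgz"), "<published value>") returns False for a truncated download, in which case the package should be transferred again.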

Verify the Health of the Cluster (Important step) - KB20956

Review the output of the following commands and verify that the cluster is fully healthy. It is highly recommended to perform ISSU only if the Chassis Cluster is in a healthy failover state.

1. Confirm that the Chassis Cluster is in the Primary/Secondary state with proper priority. Follow the steps in KB20673 - How to verify that Chassis Cluster in Primary/Secondary State has proper priority. KB20673 is the common method for verifying Chassis Cluster health; however, for an ISSU upgrade, also perform the additional steps that follow it.

KB20673

Run the command show chassis cluster status on either node to verify the Chassis Cluster status:

{primary:node0}
root@J-SRX> show chassis cluster status
Cluster ID: 1
Node                  Priority     Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
    node0                   100         secondary      no       no
    node1                   150         primary        no       no

Redundancy group: 1 , Failover count: 1
    node0                   100         secondary      no       no
    node1                   150         primary        no       no

Do you see one node with the status of primary and one node with the status of secondary?

Yes - Proceed with Step 2.
No - Go to KB20641 - Troubleshooting steps when the Chassis Cluster does not come up (http://kb.juniper.net/InfoCenter/index?page=content&id=KB20641)
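If you collect this output programmatically (the collection step, for example over SSH, is not shown and is an assumption), the primary/secondary check can be scripted. This Python 3.9+ sketch parses the text layout of the sample output above; the regular expressions are assumptions about that layout and may need adjusting for other Junos releases, which can add columns. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Patterns assume the column layout shown in the sample output above.
RG_HEADER = re.compile(r"Redundancy group:\s*(\d+)")
NODE_ROW = re.compile(r"^\s*(node\d+)\s+(\d+)\s+([\w-]+)\s+(yes|no)\s+(yes|no)\b")


def parse_cluster_status(text: str) -> dict[int, dict[str, tuple[int, str]]]:
    """Map redundancy group -> node -> (priority, status)."""
    groups: dict[int, dict[str, tuple[int, str]]] = defaultdict(dict)
    current = None
    for line in text.splitlines():
        header = RG_HEADER.search(line)
        if header:
            current = int(header.group(1))
            continue
        row = NODE_ROW.match(line)
        if row and current is not None:
            groups[current][row.group(1)] = (int(row.group(2)), row.group(3))
    return dict(groups)


def has_primary_and_secondary(groups: dict[int, dict[str, tuple[int, str]]]) -> bool:
    """True if every redundancy group has exactly one primary and one secondary node."""
    return bool(groups) and all(
        sorted(status for _, status in nodes.values()) == ["primary", "secondary"]
        for nodes in groups.values()
    )
```

A False result corresponds to the "No" branch above.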

What is the priority of each node?

If the priority is 0, then proceed to KB16869 - What does priority 0 mean in a JSRP chassis cluster?

Priority 0 means that the node is in the ineligible state. There are several reasons why you could see the ineligible state:

Cold sync failure (see the J-Series/SRX Security Configuration Guide for more details: http://www.juniper.net/techpubs/software/junos-security/junos-security10.1/junos-security-swconfig-security/topic-43677.html)
Monitored interface down
IP tracking is failing (SRX3000 and SRX5000)
Possible hardware issue

Perform the following to correct the priority 0 state:

Check chassis cluster statistics. Are there any missing heartbeats or probes?
Check chassis cluster interfaces. Are any of the monitored interfaces down, or is a tracked IP unreachable?
Check the jsrpd logs. Are there any errors?
Check the chassisd logs. Is any hardware down?
Check the messages log. Are there any events leading up to the problem?


If the priority is 255, then proceed to KB16870 - What does priority 255 mean in a JSRP chassis cluster?

Priority 255 means that a manual failover was initiated. In that scenario, the Manual failover column shows 'yes'.

After a manual failover, it is always recommended to reset the manual flag in the cluster status. Otherwise, no additional failovers may occur for that redundancy group.

To remove the manual failover state and restore the proper priority, use the following CLI command:

request chassis cluster failover reset redundancy-group <0-128>

If the priority is between 1 and 254, proceed to Step 3.

If the priority for both nodes is between 1 and 254, it means that the Chassis Clusteris in a healthy state.
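The priority rules above fit in a small lookup. This Python sketch is illustrative only: the function names and the returned labels are hypothetical, while the thresholds (0, 255, 1 to 254) and the reset command come from the text above.

```python
def classify_priority(priority: int) -> str:
    """Map a chassis cluster node priority to the cases described above."""
    if priority == 0:
        return "ineligible"       # KB16869: cold sync, interface, IP tracking or hardware
    if priority == 255:
        return "manual-failover"  # KB16870: manual failover flag is still set
    if 1 <= priority <= 254:
        return "healthy"
    raise ValueError(f"unexpected priority {priority}")


def next_action(rg: int, priority: int) -> str:
    """Suggest the follow-up for one node's priority in redundancy group 'rg'."""
    state = classify_priority(priority)
    if state == "manual-failover":
        return f"request chassis cluster failover reset redundancy-group {rg}"
    if state == "ineligible":
        return "investigate: cluster statistics, interfaces, jsrpd/chassisd logs, messages"
    return "proceed"
```

Keeping the classification separate from the suggested action makes it easy to log the state of every node before deciding whether ISSU can start.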

Each Redundancy Group (other than Redundancy Group 0) contains one or more redundant Ethernet interfaces. A redundant Ethernet interface is a pseudo interface that contains a pair of physical Gigabit Ethernet interfaces or a pair of Fast Ethernet interfaces. If a Redundancy Group is active on node 0, then the child links of all the associated redundant Ethernet interfaces on node 0 are active. If the redundancy group fails over to node 1, then the child links of all redundant Ethernet interfaces on node 1 become active.

2. Verify that all the FPCs and PICs are showing Online. If any FPC shows as 'Present' or 'Offline', determine the cause and make sure it comes up as Online on both nodes before proceeding further. (If the FPC that is not online does not take part in the Chassis Cluster failover, it should be OK to proceed. Contact your technical support representative if in doubt at this stage.)

{primary:node0}root@test-node0> show chassis fpc pic-status

node0:
--------------------------------------------------------------------------
Slot 0   Online       SRX5k DPC 40x 1GE
  PIC 0  Online       10x 1GE RichQ
  PIC 1  Online       10x 1GE RichQ
  PIC 2  Online       10x 1GE RichQ
  PIC 3  Online       10x 1GE RichQ
Slot 3   Online       SRX5k SPC
  PIC 0  Online       SPU Cp
  PIC 1  Online       SPU Flow
Slot 4   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow
Slot 5   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow
Slot 6   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow
Slot 7   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow
Slot 8   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow

node1:
--------------------------------------------------------------------------
Slot 0   Online       SRX5k DPC 40x 1GE
  PIC 0  Online       10x 1GE RichQ
  PIC 1  Online       10x 1GE RichQ
  PIC 2  Online       10x 1GE RichQ
  PIC 3  Online       10x 1GE RichQ
Slot 3   Online       SRX5k SPC
  PIC 0  Online       SPU Cp
  PIC 1  Online       SPU Flow
Slot 4   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow
Slot 5   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow
Slot 6   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow
Slot 7   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow
Slot 8   Online       SRX5k SPC
  PIC 0  Online       SPU Flow
  PIC 1  Online       SPU Flow
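With output like the listing above, entries that are not Online can be picked out automatically. This Python 3.9+ sketch assumes the layout shown (a "nodeN:" header, then "Slot" and indented "PIC" lines whose second column is the state); the function name and message format are hypothetical.

```python
import re

NODE_HEADER = re.compile(r"^(node\d+):\s*$")
SLOT_ROW = re.compile(r"^\s*Slot\s+(\d+)\s+(\S+)")
PIC_ROW = re.compile(r"^\s*PIC\s+(\d+)\s+(\S+)")


def not_online(pic_status: str) -> list[str]:
    """Return one message per FPC slot or PIC whose state is not 'Online'."""
    problems: list[str] = []
    node, slot = "unknown-node", None
    for line in pic_status.splitlines():
        header = NODE_HEADER.match(line.strip())
        slot_row = SLOT_ROW.match(line)
        pic_row = PIC_ROW.match(line)
        if header:
            node = header.group(1)
        elif slot_row:
            slot = slot_row.group(1)
            if slot_row.group(2) != "Online":
                problems.append(f"{node} slot {slot} {slot_row.group(2)}")
        elif pic_row and pic_row.group(2) != "Online":
            # PIC lines belong to the most recent Slot line above them.
            problems.append(f"{node} slot {slot} pic {pic_row.group(1)} {pic_row.group(2)}")
    return problems
```

An empty list means every slot and PIC on both nodes reported Online; anything else needs the investigation described in step 2.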

3. Check the chassis cluster control link and verify that the sent and received packet counts are closely matched. Also confirm that the error count is not increasing. It is suggested to run this command twice and compare the results.

srx> show chassis cluster control-plane statistics
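Comparing the two runs can be scripted. This Python 3.9+ sketch assumes only that each counter is printed as "label: number" on its own line, grouped under section lines such as a control link or child link name; the exact labels in the test sample are illustrative, and the 5% sent/received tolerance is a hypothetical threshold, not a Juniper recommendation.

```python
import re

# Any "label: number" line; the exact labels are an assumption and vary by release.
COUNTER = re.compile(r"^\s*(.+?):\s*(\d+)\s*$")


def parse_counters(text: str) -> dict[str, int]:
    """Map 'section/label' -> value, where section is the latest non-counter line."""
    counters: dict[str, int] = {}
    section = ""
    for line in text.splitlines():
        match = COUNTER.match(line)
        if match:
            counters[f"{section}/{match.group(1).strip()}"] = int(match.group(2))
        elif line.strip():
            section = line.strip().rstrip(":")
    return counters


def control_plane_warnings(first: dict[str, int], second: dict[str, int],
                           tolerance: float = 0.05) -> list[str]:
    """Flag growing error counters and sent/received pairs that differ by > tolerance."""
    warnings = []
    for key, value in second.items():
        if "error" in key.lower() and key in first and value > first[key]:
            warnings.append(f"{key} increased from {first[key]} to {value}")
        if key.endswith(" sent"):
            received = second.get(key[: -len(" sent")] + " received")
            if received is not None and value and abs(value - received) / value > tolerance:
                warnings.append(f"{key}={value} but received={received}")
    return warnings
```

Pass the parsed output of the first run as 'first' and of the second run as 'second'; an empty list matches the healthy case described in step 3.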

4. It is best to have all the Redundancy Groups primary on the same node, e.g. node0. If they are not, fail over the Redundancy Groups before you proceed further.

srx> request chassis cluster failover redundancy-group 0 node 0
srx> request chassis cluster failover reset redundancy-group 0

srx> request chassis cluster failover redundancy-group 1 node 0
srx> request chassis cluster failover reset redundancy-group 1

For the RG0 failover, there might be a slight lag; you may need to wait up to 2 to 3 minutes while the Routing Engine fails over. The remaining redundancy groups fail over faster.
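To apply step 4 consistently to every redundancy group, the command sequence can be generated instead of typed. This hypothetical Python 3.9+ helper only builds the command strings shown above, RG0 first; sending them to the device, and pausing after RG0 while the Routing Engine fails over, is left to whatever automation you use.

```python
def failover_commands(redundancy_groups: list[int], node: int = 0) -> list[str]:
    """Return failover-then-reset commands for each RG, ordered with RG0 first."""
    commands = []
    for rg in sorted(set(redundancy_groups)):
        # A manual failover sets the manual flag (priority 255) ...
        commands.append(f"request chassis cluster failover redundancy-group {rg} node {node}")
        # ... so reset it right away to restore normal priorities.
        commands.append(f"request chassis cluster failover reset redundancy-group {rg}")
    return commands
```

Sorting puts RG0 first, so the slowest failover happens before the data-plane groups move.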

5. Verify there are no alarms. Run the following command:

{primary:node0}root@test-node0> show chassis cluster information

The cluster information and statistics should not show any alarms that could cause a disruption during the upgrade. Check the SPU count; it should match the number of SPUs installed on the device.

Check whether the events show any irregular problems. If so, resolve them before proceeding further.

Verify that the date and time in the output match the system date and time.

Check for errors logged during the troubleshooting period prior to the upgrade.

Check whether the packet counts and SPU counts match each other. If you find discrepancies, consult your technical support representative before proceeding further.

root@test-node0> show chassis cluster information | no-more
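The expected SPU count per node can be derived from the 'show chassis fpc pic-status' output in step 2 by counting online PICs whose description starts with "SPU". This Python 3.9+ sketch assumes that layout; the numbers you read from 'show chassis cluster information' are supplied by you, and the function names are hypothetical.

```python
import re

NODE_HEADER = re.compile(r"^(node\d+):$")
# An online PIC whose description starts with "SPU" (e.g. "SPU Cp", "SPU Flow").
ONLINE_SPU = re.compile(r"^\s*PIC\s+\d+\s+Online\s+SPU\b")


def count_online_spus(pic_status: str) -> dict[str, int]:
    """Count online SPU PICs per node in 'show chassis fpc pic-status' output."""
    counts: dict[str, int] = {}
    node = None
    for line in pic_status.splitlines():
        header = NODE_HEADER.match(line.strip())
        if header:
            node = header.group(1)
            counts[node] = 0
        elif node is not None and ONLINE_SPU.match(line):
            counts[node] += 1
    return counts


def spu_mismatches(expected: dict[str, int], reported: dict[str, int]) -> list[str]:
    """List nodes where the SPU count reported elsewhere differs from pic-status."""
    return [
        f"{n}: pic-status shows {expected[n]} online SPU(s), "
        f"cluster information shows {reported.get(n)}"
        for n in sorted(expected)
        if reported.get(n) != expected[n]
    ]
```

For the pic-status output shown in step 2, this counts 12 online SPUs on each node.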

Create backup of the current configuration and set the rescue config - KB20957

If the current configuration is a good one, save it as RESCUE:

{primary:node0}root@test-node0> request system configuration rescue save

Reason: This is a precautionary step for ISSU. Even if the active configuration gets wiped out for some reason, the device will always have a rescue configuration to load from once it boots up. To verify the rescue configuration on the device, run 'file list /config/' to confirm that the file exists, then run 'file show /config/rescue.conf.gz' and review the contents.

Further, it is recommended that you keep the latest copy of the running configuration on a different storage device or server for easy retrieval if required.
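For the off-box copy, here is a minimal Python sketch, assuming you have already retrieved the configuration text (for example, by saving the output of 'show configuration' on a server). It writes a timestamped, gzip-compressed file and reads it back to confirm the copy is intact. The file naming scheme and the function name are hypothetical.

```python
import gzip
from datetime import datetime, timezone
from pathlib import Path


def store_config_backup(config_text: str, backup_dir: str, hostname: str) -> Path:
    """Write a timestamped, gzip-compressed copy and verify it reads back identically."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = Path(backup_dir) / f"{hostname}-{stamp}.conf.gz"
    target.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(target, "wt", encoding="utf-8") as fh:
        fh.write(config_text)
    # Read the file back so a failed or partial write is caught now, not during recovery.
    with gzip.open(target, "rt", encoding="utf-8") as fh:
        if fh.read() != config_text:
            raise IOError(f"backup verification failed for {target}")
    return target
```

A UTC timestamp in the name keeps backups from both nodes sortable and avoids overwriting the copy taken before an earlier upgrade.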

Start the In-Service Software Upgrade - KB20958

Finished upgrading secondary node node1
Rebooting Secondary Node
node1:
--------------------------------------------------------------------------
Shutdown NOW!
[pid 21958]
ISSU: Backup RE Prepare Done
Waiting for node1 to reboot.
node1 booted up.
Waiting for node1 to become secondary
node1 became secondary.
Waiting for node1 to be ready for failover
ISSU: Preparing Daemons

Once this is done, node1 reports the following:

{secondary:node1}
root@test-node1> show chassis cluster status
Cluster ID: 2
Node                  Priority     Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 2
    node0                   254         primary        no       no
    node1                   2           secondary      no       no

Redundancy group: 1 , Failover count: 2
    node0                   254         primary        no       no
    node1                   0           secondary      no       no

At this stage, node1 has rebooted successfully and is running the Junos version that you upgraded to. Run 'show version' to verify this. Also check the output of the following commands:

srx> show chassis cluster status
srx> show chassis fpc pic-status
(All the PICs on node1 should be online. Keep monitoring for about 2 minutes to make sure they all stay online.)
srx> show chassis alarms
srx> show system alarms
srx> show log messages | match issu
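The "keep monitoring for about 2 minutes" check can be automated as a polling loop. In this Python 3.9+ sketch, fetch_pic_status is a hypothetical callable you supply that returns the current 'show chassis fpc pic-status' text; the 10-second interval is an assumption. The sleep and clock parameters exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def all_online(pic_status: str) -> bool:
    """True if every Slot and PIC line reports the state 'Online'."""
    states = [
        fields[2]
        for fields in (line.split() for line in pic_status.splitlines())
        if len(fields) >= 3 and fields[0] in ("Slot", "PIC")
    ]
    return bool(states) and all(state == "Online" for state in states)


def pics_stay_online(
    fetch_pic_status: Callable[[], str],
    duration: float = 120.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until 'duration' seconds pass; fail on the first sample that is not all online."""
    deadline = clock() + duration
    while True:
        if not all_online(fetch_pic_status()):
            return False
        if clock() >= deadline:
            return True
        sleep(interval)
```

Failing fast on the first bad sample matches the intent of the check: a PIC that drops out even briefly after the reboot needs attention before the ISSU continues.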

Next, the automatic failover happens, and once it is done, node0 is upgraded. The messages reported are similar to those above, but still monitor them for any problems or warnings in the boot messages.

Node 0 should come back up in a healthy state. Verify everything as described in KB20673 - How to verify that Chassis Cluster in Primary/Secondary State has proper priority.

Also note that the Redundancy Groups are now primary on node1. To bring them back to node0, fail over each Redundancy Group and then reset its manual failover flag, as shown below:

srx> request chassis cluster failover redundancy-group 0 node 0
srx> request chassis cluster failover reset redundancy-group 0
srx> request chassis cluster failover redundancy-group 1 node 0
srx> request chassis cluster failover reset redundancy-group 1
srx> request chassis cluster failover redundancy-group X node 0
srx> request chassis cluster failover reset redundancy-group X







As mentioned earlier, you will see that the failover of RG0 might take some time. The rest of the Redundancy Groups should fail over quickly.
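Because RG0 can take a few minutes to move back, it helps to poll the cluster status with a timeout rather than re-running the command by hand. In this Python 3.9+ sketch, fetch_status is a hypothetical callable you supply that returns 'show chassis cluster status' output; the 180-second timeout reflects the 2 to 3 minutes mentioned above, and the 15-second interval is an assumption.

```python
import re
import time
from typing import Callable, Optional

RG_HEADER = re.compile(r"Redundancy group:\s*(\d+)")


def primary_node(status_text: str, rg: int) -> Optional[str]:
    """Return the node listed as primary for redundancy group 'rg', if any."""
    current = None
    for line in status_text.splitlines():
        header = RG_HEADER.search(line)
        if header:
            current = int(header.group(1))
            continue
        fields = line.split()
        if current == rg and len(fields) >= 3 and fields[0].startswith("node") and fields[2] == "primary":
            return fields[0]
    return None


def wait_for_primary(
    fetch_status: Callable[[], str],
    rg: int,
    node: str = "node0",
    timeout: float = 180.0,
    interval: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll cluster status until 'node' is primary for 'rg' or 'timeout' seconds pass."""
    deadline = clock() + timeout
    while True:
        if primary_node(fetch_status(), rg) == node:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

If wait_for_primary returns False for RG0, check the cluster health before failing over the remaining groups.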

The ISSU process is now complete, and you can check the health of the cluster as described in KB20673 - How to verify that Chassis Cluster in Primary/Secondary State has proper priority.

Process to follow, in the event of the ISSU process stalling in the middle of the upgrade - KB19500

If the system does not complete the ISSU process, perform the following steps to stop the ISSU process completely and roll back to the previous state.

If both nodes completed the upgrade (verify with 'show version'), run the following commands on both nodes simultaneously to roll back to the previous Junos version:

request chassis cluster in-service-upgrade abort
request system software rollback
request system reboot

If only one node completed the upgrade (verify with 'show version'), run the following commands:

1) On the upgraded node:

request chassis cluster in-service-upgrade abort
request system software rollback

2) On the node that did not complete the upgrade:

request chassis cluster in-service-upgrade abort

3) On both nodes, after completing the above steps:

request system reboot

If neither node completed the upgrade successfully (verify with 'show version'), run the following commands on both nodes simultaneously to roll back to the previous Junos version:

request chassis cluster in-service-upgrade abort

request system reboot
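The three recovery cases above reduce to one rule per node: abort the ISSU, roll back the software only if that node was upgraded, then reboot. The hypothetical Python 3.9+ helper below encodes that rule; whether each node was upgraded is an input you determine yourself with 'show version'. Run the reboots only after both nodes have finished their earlier commands.

```python
ABORT = "request chassis cluster in-service-upgrade abort"
ROLLBACK = "request system software rollback"
REBOOT = "request system reboot"


def rollback_plan(node0_upgraded: bool, node1_upgraded: bool) -> dict[str, list[str]]:
    """Return the per-node command list for recovering from a stalled ISSU."""
    plan = {}
    for node, upgraded in (("node0", node0_upgraded), ("node1", node1_upgraded)):
        steps = [ABORT]
        if upgraded:
            # Only a node that is already on the new release needs a software rollback.
            steps.append(ROLLBACK)
        # Reboot last, and only once both nodes have completed the steps above.
        steps.append(REBOOT)
        plan[node] = steps
    return plan
```

For example, rollback_plan(True, False) gives node0 the abort, rollback and reboot commands and node1 only the abort and reboot commands, matching the "only one node completed the upgrade" case.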