Fixing A Juniper Switch That Was Shut Down Improperly

From Baranoski.ca
Revision as of 13:20, 9 March 2020 by Casey (talk | contribs)

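Juniper switches need to be shut down properly (for example with "request system halt" or "request system power-off"), not just powered off. Junos is based on FreeBSD, and its file systems can be left corrupted if power is cut while they are still mounted.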


OS Primary Partition Corruption

A switch that has been shut down improperly is easy to spot. The chassis shows an amber alarm light, and the console reports this alarm:

user@switch> show chassis alarms
1 alarms currently active
Alarm time              Class  Description
2014-01-26 10:48:49 EST Minor  Host 0 Boot from backup root

As well as this banner:

***********************************************************************
**                                                                   **
**  WARNING: THIS DEVICE HAS BOOTED FROM THE BACKUP JUNOS IMAGE      **
**                                                                   **
**  It is possible that the primary copy of JUNOS failed to boot up  **
**  properly, and so this device has booted from the backup copy.    **
**                                                                   **
**  Please re-install JUNOS to recover the primary copy in case      **
**  it has been corrupted.                                           **
**                                                                   **
***********************************************************************
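
If you look after more than a few switches, it can help to check saved console output for this alarm instead of reading it by eye. The following is a minimal Python sketch, not something Juniper ships. The function names are made up for illustration, it assumes the output of "show chassis alarms" has been captured as text, and the row pattern is based only on the sample above, so other Junos versions may need a different pattern.

```python
import re

# One alarm row: date and time, time zone, alarm class, description.
# Pattern is derived from the sample output above (an assumption).
ALARM_ROW = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+)\s+(Major|Minor)\s+(.+)$"
)


def parse_chassis_alarms(text):
    """Return a list of (timestamp, alarm_class, description) tuples."""
    alarms = []
    for line in text.splitlines():
        match = ALARM_ROW.match(line.strip())
        if match:
            timestamp, _tz, alarm_class, description = match.groups()
            alarms.append((timestamp, alarm_class, description.strip()))
    return alarms


def booted_from_backup(text):
    """True if any alarm reports a boot from the backup root partition."""
    return any(
        "Boot from backup root" in description
        for _, _, description in parse_chassis_alarms(text)
    )


ALARM_SAMPLE = """\
1 alarms currently active
Alarm time              Class  Description
2014-01-26 10:48:49 EST Minor  Host 0 Boot from backup root
"""

print(booted_from_backup(ALARM_SAMPLE))
```

Run against the sample above, this prints True.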

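When Junos is installed, the switch keeps two copies of the OS on separate partitions. The second copy is a backup that the switch boots from when the primary fails to boot, for example because it was not unmounted cleanly at shutdown or the switch was simply powered off.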

To copy the running backup image over the primary image, run the following (type it in full; it will not tab-complete):

request system snapshot media internal slice alternate

Note that this command only repairs the primary copy of the OS; it does not clear the alarm.

Check which partition the switch is currently running from with the command:

show system storage partitions

You will get output like this:

Boot Media: internal (da0)
Active Partition: da0s1a
Backup Partition: da0s2a
Currently booted from: backup (da0s2a)

Note the "Currently booted from: backup" line.
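
The same check can be done on captured output. This is a rough Python sketch under a few assumptions: the output of "show system storage partitions" has been saved as text, the field labels match the sample above, and the function names are hypothetical.

```python
def parse_partitions(text):
    """Turn 'Label: value' lines into a dict keyed by the label."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon
            fields[key.strip()] = value.strip()
    return fields


def needs_primary_repair(text):
    """True if the 'Currently booted from' line names the backup partition."""
    booted = parse_partitions(text).get("Currently booted from", "")
    return booted.startswith("backup")


PARTITION_SAMPLE = """\
Boot Media: internal (da0)
Active Partition: da0s1a
Backup Partition: da0s2a
Currently booted from: backup (da0s2a)
"""

print(needs_primary_repair(PARTITION_SAMPLE))
```

For the sample output this prints True, meaning the snapshot and reboot steps are still needed.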

Once the snapshot is done, the switch must be rebooted to clear the alarm. Normally, a Juniper will boot the last-known-good copy of the OS, which at this point is the backup. It must be forced to boot from the other slice, the repaired primary:

request system reboot slice alternate media internal in 0

If the alarm is still present after the reboot, run the reboot command again. For some reason, a second reboot sometimes clears it.

SSH Issue

Sometimes, SSH will also fail after an improper shutdown. When trying to SSH to the switch, you will see this:

user@COREBOX-re0> ssh 192.168.1.2
ssh_exchange_identification: Connection closed by remote host

To fix this, console into the switch and recreate /var/empty, a directory that sshd needs in order to accept connections:

start shell user root
cd /var
mkdir empty
exit

Then you have two options: reboot the switch or restart SSH.

To restart SSH:

configure private
deactivate system services ssh
commit
rollback 1
commit


Full OS Reinstall

If the switch is powered off improperly enough times, the primary and backup images will both be marked bad, and the switch will fail to boot with output like this:

U-Boot 1.1.6 (Apr  4 2013 - 10:30:53)

Board: EX2200-C-12T-2G 4.15
EPLD:  Version 14 (0x00)
DRAM:  Initializing (512MB)
Flash: 8 MB

Firmware Version:01.00.00
USB:   scanning bus for devices... 3 USB Device(s) found
       scanning bus for storage devices... 1 Storage Device(s) found

ELF file is 32 bit
Consoles: U-Boot console

FreeBSD/arm U-Boot loader, Revision 1.1
(builder@svl-junos-pool91.juniper.net, Tue Apr  5 00:15:22 UTC 2011)
Memory: 512MB
bootsequencing is disabled
new boot device =
\
can't load '/kernel'
can't load '/kernel.old'
Press Enter to stop auto bootsequencing and to enter loader prompt.


To reinstall the OS:

  • Copy the .tgz file for the OS to a FAT32 formatted USB memory key
  • Power off the switch
  • Insert the USB key into the switch
  • Power on the switch
  • Press Enter when you see the "Press Enter to stop auto bootsequencing" prompt, which drops you to the loader prompt
  • Run this command:
install --format file:///<the .tgz file>
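
The install line is easy to mistype at the loader prompt, so one option is to generate it beforehand and paste it in. The Python helper below is only an illustration. The function name and the package file name are hypothetical, and it assumes the .tgz file sits in the top-level directory of the USB key, as the file:/// form above suggests.

```python
def loader_install_command(filename):
    """Build the loader-prompt install line for a package on the USB key."""
    if "/" in filename or "\\" in filename:
        raise ValueError("expected a bare file name, not a path")
    if not filename.endswith(".tgz"):
        raise ValueError("expected a .tgz package")
    return "install --format file:///" + filename


# Hypothetical package name, for illustration only.
print(loader_install_command("junos-package.tgz"))
```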


Avoiding This Entirely

Just enable auto-snapshot. With it on, the switch automatically repairs any damaged files on the primary partition, and enabling it also repairs anything that is already damaged. It also sets the primary partition as the boot partition for the next reboot, so there is no need to force a boot from the repaired partition.

configure private
set system auto-snapshot
commit and-quit
user@EX2200> show system auto-snapshot
Auto-snapshot Configuration:   Enabled
Auto-snapshot State: In-progress
Note: Snapshot takes about 10-15 mins depending upon disk size

user@EX2200> show system auto-snapshot
Auto-snapshot Configuration:   Enabled
Auto-snapshot State: Completed
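
Since the snapshot can take 10-15 minutes, you may want a script to tell you when it has finished. Here is a small Python sketch that reads captured "show system auto-snapshot" output. It assumes the field labels shown above, and the function names are made up for illustration. How the output gets collected from the switch is left out.

```python
def parse_auto_snapshot(text):
    """Return (configuration, state); a field that is missing comes back as None."""
    config = state = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "Auto-snapshot Configuration":
            config = value.strip()
        elif key == "Auto-snapshot State":
            state = value.strip()
    return config, state


def repair_finished(text):
    """True once auto-snapshot is enabled and reports a completed snapshot."""
    return parse_auto_snapshot(text) == ("Enabled", "Completed")


IN_PROGRESS = """\
Auto-snapshot Configuration:   Enabled
Auto-snapshot State: In-progress
Note: Snapshot takes about 10-15 mins depending upon disk size
"""

COMPLETED = """\
Auto-snapshot Configuration:   Enabled
Auto-snapshot State: Completed
"""

print(repair_finished(IN_PROGRESS), repair_finished(COMPLETED))
```

With the two outputs above this prints False True.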