Advanced Juniper PoE Debugging
Update: I might have a path to upgrade success. Looking at the controller info from show poe controller, I noticed the following: Huh. Checking the firmware version with show chassis firmware detail, I noticed that the switch had the older 1.x PoE firmware. I upgraded the software to the latest JTAC-recommended version, staying in the same release train. The door controller is now getting power, and I see a MAC address. Everything is hunky dory. No problems with the EX software upgrade.
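The checks described above can be sketched as a short CLI session (the prompt is illustrative; all three are standard operational commands):

```
user@switch> show poe controller           # per-FPC PoE controller status and firmware version
user@switch> show chassis firmware detail  # chassis firmware, including the PoE firmware revision
user@switch> show version                  # confirm the JTAC-recommended Junos release is running
```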
Now to upgrade the PoE firmware… Ten minutes later, I get the following on the terminal: Of note, and the thing that made me panic, was that out of nine switches in the stack, only one came back online. After a reboot, I get the following: WTF. WTF am I going to do?! From Juniper's site, the solution is the following (with my own notes): Power cycle the affected FPC (re-seat the power cord). Do not perform a soft reboot. Also, it is recommended that you push the PoE code one member at a time instead of upgrading all members in the virtual-chassis setup at once.
(Emphasis mine.) After the above command is executed, the FPC should automatically reboot. If not, reboot it from the command-line interface. Note: Be patient and wait. No, seriously…wait. It takes a while. The PoE version should be the latest 2.x version. If the version is correct, the PoE devices should work. The following is similar to solution 1, but the failed PoE controller basically requires you to upgrade it twice. The PoE controller will disappear when you run show poe controller, then come back and start upgrading like this: After the firmware upgrade completes, the firmware version will likely still be incorrect (it always was for me).
Power cycle the affected FPC (re-seat the power cord). If the version is correct, the PoE devices should look like this: Repeat the above steps to upgrade the PoE versions on the other FPCs in the virtual-chassis setup.
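The per-member procedure can be sketched like this; the slot and member numbers are examples, and the exact firmware-upgrade syntax can vary by platform:

```
user@switch> request system firmware upgrade poe fpc-slot 0
# wait for the FPC to reboot on its own; if it does not, reboot that member from the CLI:
user@switch> request system reboot member 0
# verify the PoE firmware version, then repeat for the next member
user@switch> show poe controller
user@switch> request system firmware upgrade poe fpc-slot 1
```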
This is the nuke-from-orbit approach for the switch, if you want to avoid doing an RMA or have no other choice. The gist of it: disconnect the switch from the VC (if connected), perform an OAM recovery, zeroize and reboot the switch, then perform the firmware upgrade. No matter what you run, the switch displays the following message: Upgrade in progress.
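A rough sketch of that sequence from the CLI; the OAM recovery itself happens from the loader and is omitted here, and the slot number is illustrative:

```
user@switch> request system zeroize    # wipe configuration and state, then reboot
# ...after the OAM recovery and reboot complete:
user@switch> request system firmware upgrade poe fpc-slot 0
user@switch> show poe controller       # confirm the controller and firmware version
```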
For instance, I have had a switch with a failed PoE controller that still operated like a non-PoE switch without issue; i.e., it kept passing traffic. If step 1 happens, try this first. After the switch reboots, the controller will still come up as failed when you run show poe controller.
Update: Juniper does have an official bug report for this, and it is apparently fixed in a later release. Try an earlier version of the JTAC software first, then go to the latest recommended one. Give the hardware some time to get back up and going.
Cross your fingers. And legs. On a full moon. Hope this helps!
Identify and remedy problematic IKE and eventd processes on Juniper SRX
February 4 — Recently we encountered some very strange behavior on an SRX cluster. Honestly, we started looking at the routers first, since this was something the SRX had never done before.
After noticing that links were actually dropping and it was not just OSPF having issues, we dug deeper into the logs (as an aside, this is an excellent reason to centrally syslog everything. And I do mean everything).
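Central syslog on Junos is only a couple of lines of configuration; the collector address below is a placeholder:

```
set system syslog host 192.0.2.50 any any            # ship everything to the collector
set system syslog host 192.0.2.50 explicit-priority  # include facility/severity in each message
```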
To our surprise and dismay, it was actually the SRX: node0 was rebooting all of its interface line cards. This is a problem. So, in order to drill down on what was causing the user CPU to be so abnormally high, we had to do a little sleuthing. From here it was pretty obvious what was eating the CPU. First things first: make sure no traceoptions are on.
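The sleuthing can be sketched with standard operational commands:

```
user@srx> show chassis routing-engine                            # is the User CPU figure abnormally high?
user@srx> show system processes extensive                        # which daemon is eating the CPU
user@srx> show configuration | display set | match traceoptions  # confirm no traceoptions are enabled
```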
KMD is the IPsec key management process. eventd is the event-processing process; it performs configured actions in response to events on a routing platform that trigger system log messages. Let's restart ipsec-key-management and see if that helps. Note: If this does not work, you may have to drop to the shell and kill it like a Unix process.
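The restart, with the shell fallback, looks roughly like this (the PID is illustrative):

```
user@srx> restart ipsec-key-management
# if the daemon is wedged and will not restart cleanly, kill it from the shell:
user@srx> start shell
% ps ax | grep kmd
% kill -9 12345    # illustrative PID from the ps output
```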
In the future this will be monitored so that this problem does not sneak up on us. If you do not plan to terminate VPN tunnels, there is no reason to run the services that do so, even though they are on by default. We opted to both disable them and apply a more inclusive loopback filter, which seems to have taken care of the problem. The takeaways from this are multifaceted. First, the SRX is a weird beast: a firewall. And an IPS. And a router. Secondly, there are a lot of intricacies in monitoring these devices, since they aren't just routers.
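A sketch of the kind of loopback filter terms involved; the filter name is illustrative, and in practice you would fold these into your existing protect-RE filter, which must still accept your legitimate management and routing traffic:

```
set firewall family inet filter protect-re term no-ike from protocol udp
set firewall family inet filter protect-re term no-ike from destination-port [ 500 4500 ]
set firewall family inet filter protect-re term no-ike then discard
set interfaces lo0 unit 0 family inet filter input protect-re
```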
Manual core dump generation – Junos
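On the manual core dump topic: one way to force a core from a userland daemon is to send it SIGABRT from the shell (assuming its signal handler does not intercept it), then confirm with show system core-dumps. The daemon and PID here are illustrative:

```
user@router> start shell
% ps ax | grep rpd
% kill -ABRT 1357    # illustrative PID; aborts the process and writes a core file
% exit
user@router> show system core-dumps
```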
Junos Architecture (Processes)
No wonder we pay contractors so much to do it!
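Junos runs each control-plane function as its own daemon on top of the kernel — mgd for management, rpd for routing, chassisd for hardware, kmd for IPsec keys, eventd for event policies — and you can see them all from the CLI:

```
user@router> show system processes | match "rpd|mgd|chassisd|kmd|eventd"
```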
Again, this is why the contractors make the big bucks. Running the show poe port 0 status command would show something like Power Management-Static -ovl in the output.
Juniper EX3400: How to Recover from PoE Firmware Upgrade Failure
Applying a little engineering translation, this means that there was a static overload condition: more power was drawn from the port than its configuration allowed, for more than just a brief burst. Since I was running a Class 4 device on a port that can actually supply Class 4 power, to me this further translated to damaged insulation, a short circuit (but not bad enough to completely short out), or a bad device on the end.
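On EX switches you can inspect the per-port status, PD class, and power draw directly (the interface name is illustrative):

```
user@switch> show poe interface ge-0/0/0                  # status, PD class, power consumed vs. limit
user@switch> show poe telemetries interface ge-0/0/0 all  # recorded history of the port's power draw
```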
Swapping the end device resulted in the same error, so I was able to further deduce it was a bad wiring job. Do you wonder what the other statuses mean?
This is the part of the post you might want to bookmark for future reference. I did some research and found the source of truth for these codes!
This is due to the nature of the bond type: only one slave in the bond is active.
Juniper Memory Leak
A different slave becomes active if, and only if, the active slave fails. Tester Method 1: Upgrade member 1, then see if you can fail over the routing engine from member 0 to member 1. The issue that could arise is that the routing engine will not fail over, as the two switches will be on different versions of code and the VC will not rejoin with member 1 as the backup routing engine. This method is out, but then, it was expected, to be honest. Tester Method 2: Upgrade using the NSSU method and, when it gets stuck, see if you can abort and fail over.
However, when I aborted the NSSU, it took over an hour to get back to the operational prompt, and once I did, the VC cluster had detached and was back to Master and Linecard roles. Additionally, whatever config changes you make while the members are separated will not be kept if you switch over the PFE.
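For reference, the NSSU attempt in Method 2 amounts to a single command; the image path and filename below are placeholders:

```
user@switch> request system software nonstop-upgrade /var/tmp/junos-install-example.tgz
```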