One of our two MS425 switches periodically "goes offline" or, even worse (and the reason I am here), just loses power. This is the third time. It has two power supplies, each plugged into a different PDU that is on battery backup. None of the logs from the other devices plugged into those PDUs indicate a power loss. I have worked with support. The only thing that brings it back is pulling power on the device. It is not good for business, nor for my reputation. Any thoughts?
This tickles a memory buried deep somewhere in regions of my brain I don't use much (which is most of it...). A long time ago in a past life I seem to recall something very similar with Aruba switches (not the rebranded Procurves, the actual Aruba switches they made before HP bought them). We had a problem with those where the power supplies would have their e-fuses (or some other solid state circuit breaker-like protection circuit) tripped for some reason and could only be brought back with a hard power reboot.
For the life of me, though, I can't remember the exact error we were seeing or how Aruba solved it. I'm not even sure this is relevant here... And if your switch is connected to multiple power sources that run off UPSs, I would expect your power to be properly conditioned against any anomalies that could trip a fault protection circuit.
You could start with just RMA'ing the power supplies as those would be an easier swap than the whole switch. If that doesn't help then replace the switch.
Do you have a spare 425 lying around? Can you connect it to the same power sources and run it without traffic to see if it shows the same behaviour?
That seems like the appropriate thing to do.
Something to test: swap the config between the two switches. If the problem follows the config to the other switch, it is a config issue; if it stays with the same physical switch, it is hardware.
I would also say the power supplies should be swapped via RMA.
I have several thoughts.
First, are you running recent firmware?
Second, push for an RMA.
Third - I have [rarely] seen issues when using independent supply rails (particularly when fed from separate UPSs, or from a UPS and a different type of supply) where the rails develop small differences between each other.
The last one I had to deal with was a leakage between the phase and ground lines causing the power supplies in the kit to explode (and I mean explode). You can also get small differences in phase between the two supplies.
To test if this remote case is the one affecting you - unplug one of the redundant power supplies and see if the issue happens again.
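To put a rough number on the phase point above, here is a back-of-the-envelope sketch. It assumes two ideal sine-wave feeds of equal RMS voltage, which real UPS outputs will only approximate, and the 230 V figure is just an example value, not something measured on this site. For two such feeds offset by an angle φ, the RMS voltage between them is 2·V·|sin(φ/2)|.

```python
import math


def rms_difference(v_rms: float, phase_deg: float) -> float:
    """RMS voltage between two equal-amplitude sine feeds offset by phase_deg.

    Assumes ideal sine waves of identical RMS voltage (an idealisation).
    """
    return 2.0 * v_rms * abs(math.sin(math.radians(phase_deg) / 2.0))


# 230 V is an assumed example value; substitute your local nominal voltage.
for offset in (0, 5, 30, 120, 180):
    diff = rms_difference(230.0, offset)
    print(f"{offset:>3} deg offset -> {diff:6.1f} V between feeds")
```

Even a few degrees of offset gives a non-zero voltage between the two feeds, which is why unplugging one supply is a cheap way to rule this case in or out.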
I have also [rarely] seen issues where the supplied power factor gets too low (I would really like it to be above 0.9). This is more typical in factory environments that run heavy inductive loads. A poor power factor affects switch mode power supplies.
To test this one you would probably need to get an electrical engineer in (not an electrician). They'll need proper kit to measure this. They can also check that your dual power rails are correctly in phase with each other and that you are not experiencing any significant earth leakage. Ideally you need the measurements taken over a longer period of time, like a week. That is because these problems can be transient and caused by other things happening in the environment (such as large inductive motors starting up).
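If the engineer hands over logged readings, the 0.9 figure can be checked with a few lines of Python. This is only a sketch: power factor is real power divided by apparent power (Vrms × Irms), and the sample readings and the tuple layout below are made up for illustration, not taken from any real meter export.

```python
PF_THRESHOLD = 0.9  # the "above 0.9" figure mentioned in the post


def power_factor(real_power_w: float, v_rms: float, i_rms: float) -> float:
    """Power factor = real power (W) / apparent power (VA)."""
    apparent_va = v_rms * i_rms
    if apparent_va <= 0:
        raise ValueError("apparent power must be positive")
    return real_power_w / apparent_va


def low_pf_samples(samples, threshold=PF_THRESHOLD):
    """Return (label, power factor) for every sample below the threshold."""
    flagged = []
    for label, watts, volts, amps in samples:
        pf = power_factor(watts, volts, amps)
        if pf < threshold:
            flagged.append((label, round(pf, 3)))
    return flagged


# Hypothetical readings: (timestamp label, real power W, Vrms, Irms).
readings = [
    ("Mon 09:00", 212.0, 230.0, 0.95),
    ("Mon 09:05", 180.0, 230.0, 1.00),
    ("Mon 09:10", 205.0, 229.0, 0.93),
]

print(low_pf_samples(readings))
```

Running it over a full week of samples makes the transient dips stand out, which is the reason for measuring over a longer period.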
I have several MS425s in production and have not experienced anything like that. I have, however, had one particular switch that seems to go offline briefly and randomly after we had a big power outage on that side of town. I initially thought it was a power issue with the switch, or potentially a power supply, but after verifying that I have good power and swapping the power supplies, I now wonder whether it has something to do with a Cisco switch in this network and PVST (see the other thread about all this).
I never had to do a hard reboot though, so it doesn't sound like this is your issue.