Does anyone else have periodic issues updating the firmware on their switches? I realize that this may not impact all of you, especially if you only have a few smaller environments. But I have > 100 switches across multiple sites and connections and it feels like I periodically get firmware issues that exhibit the following behavior:
1. The switch goes into somewhat of a bricked state where it goes offline; in some cases the power light isn't on but the ports still have lights. Plugging a computer into one of the open ports won't get connectivity, nor will it allow connecting to the switch's local interface at 1.1.1.100.
2. The only thing that seems to get it out of this state is a hard reboot of the switch
3. After a hard reboot it starts immediately downloading its new firmware then successfully installs it
4. It doesn't seem to make a difference which firmware version I'm going to or from, or whether the upgrade is scheduled or not.
Note: I talked to support and they have had multiple other sites/users report this issue. They are trying to build fail-safes and better identify what is going on, but I'd be interested in starting a dialog here to get more details or possible ways to avoid this.
I run a small network and have experienced the same issue a few times.
This was fixed in 9.26:
"MS22/42/220/320 switches may hang on boot and require physical intervention."
Once you get everything on 9.26 and above it should come right.
I'm currently on 9.30, and during the last update to this version I had two switches that I had to physically go to and unplug to bring them back online. I've not updated since.
The current stable release for the switches is 9.32 ...
Now that you are on 9.30 upgrading to 9.32 should be pain free. Note it can take quite a long time from when you say "go" to the upgrade being finished. I've seen the process taking up to 45 minutes.
I'm not sure it's hanging on boot, since it still hadn't fully downloaded the firmware. But I'll pay attention to the current firmware version vs the destination version to identify whether 9.26 resolved it.
The last two networks I experienced the issue at were going from MS 9.27 → MS 9.32. I've also had the issue in the past when going from one stable firmware to another so this seems like it still may be an unresolved issue.
The firmware issue is definitely not resolved. On this round of updates to my MS320 switches from 9.30 to 9.32, all but 5 of my switches had to be manually rebooted post deployment in order for the firmware to update. I thought last Friday's snow day was a great day to update all the switches. I was very wrong, since that required me to come in anyway and go closet to closet pulling power to switches until they were all updated and back online.
I hope some progress can be made on this. I've been experiencing this on some of my networks for about the last year and a half of updates.
Thank you to everyone for being so active in the Meraki Community.
We have identified the issue which has been causing MS22, MS42, MS220 and MS320 switches to require a physical reboot during an upgrade. The issue is present on switches currently running MS 9.0 and above. We hope to rectify this issue and provide a fix shortly.
In the interim, we have discovered a reboot of the switch via dashboard prior to performing an upgrade will prevent the switch from becoming unreachable.
As always, if you have any further questions/concerns, please do not hesitate to reach out to our Cisco Meraki Support Team.
Happy New Year to everyone!
We now have a fix for this issue in MS 9.33 and MS 10.9.
As always, if you have any further questions/concerns, please do not hesitate to reach out to our Cisco Meraki Support Team.
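For anyone with a lot of switches, the guidance in the two support posts above could be turned into a quick pre-flight check. This is only a rough sketch: the function names are made up for illustration and are not part of any Meraki tool, it does not talk to the dashboard at all, and the model list and version ranges are simply copied from the posts above (MS22/42/220/320, running 9.0 or later, fixed in 9.33 and 10.9), so treat them as assumptions.

```python
import re

# Model families named in the support post as affected.
AFFECTED_MODELS = {"MS22", "MS42", "MS220", "MS320"}


def parse_version(text):
    """Turn 'MS 9.32' or '10.9' into a comparable tuple such as (9, 32)."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised firmware version: {text!r}")
    return int(match.group(1)), int(match.group(2))


def model_family(model):
    """Reduce 'MS220-8P', 'ms-320-48' etc. to the family, e.g. 'MS220'."""
    match = re.match(r"\s*MS-?(\d+)", model, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised switch model: {model!r}")
    return "MS" + match.group(1)


def needs_pre_upgrade_reboot(model, running_version):
    """True if the switch falls in the range the support posts describe.

    Assumed ranges (from the posts): 9.0 up to but not including 9.33,
    and 10.0 up to but not including 10.9.
    """
    if model_family(model) not in AFFECTED_MODELS:
        return False
    version = parse_version(running_version)
    in_9x_range = (9, 0) <= version < (9, 33)
    in_10x_range = (10, 0) <= version < (10, 9)
    return in_9x_range or in_10x_range
```

Something like `[name for name, (model, ver) in inventory.items() if needs_pre_upgrade_reboot(model, ver)]` would then give the list of switches to reboot from the dashboard before kicking off the upgrade, where `inventory` is whatever hypothetical name-to-(model, version) mapping you keep.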
Did this issue exist in prior firmware versions as well? I've been experiencing the issue since 8.x.
That did not take long. 9.34 is already out!
Who is going to go ahead with 9.34?
@BrothersTM, I just deployed it to our internal switches for testing.
It is just a bit disconcerting to have a new "Stable release candidate" come out, and then maybe two days later to have that candidate replaced. There must have been something bad in 9.33 to have it replaced so quickly.
That is exactly my thought and apprehension.
@VictorC any feedback on the above questions/concerns?
We discovered a regression specific to the MS120 and wanted to catch it before it was widely deployed. The issue causes some problems with aggregated links.
Firmware is the same for every other model.
@VictorC Just to confirm - when upgrading to 9.34, do I need to manually reboot the switches ahead of time, or will the upgrade process take care of this on its own? The switches (MS225/MS350/MS425) are currently running 9.32.
I believe I saw in the release notes that the firmware update would first reboot the switch and then update the firmware when upgrading TO 9.34.
I agree with @BrothersTM, the release notes say it reboots the switches first. We have been running 9.34 internally for a couple of weeks without issue. I have just put a couple of customers onto it.
Following up on this. We were doing our round of firmware updates from 9.32 to 9.36 and experienced the same issue, where one of our switches just stops responding when it initiates the firmware update. In this case I was physically at the switch with my laptop connected and viewing the switch's local config page. Once it went to "downloading upgrade 0%" the switch instantly locked up. The power light went amber and all port lights went dark. There was no connectivity from my hard-wired laptop, the port showed as if it was unplugged, and no traffic was being passed. I had to manually power cycle the switch, and then it initiated its firmware update just fine. I have a ticket in with Meraki on it. They obviously know of the issue, but it is making remote firmware updates tricky.
I just upgraded a network of MS225's and MS425's last night from 10.16 to 10.19 without any problems. The 10.19 release notes do still have a known issue that switches may reboot twice while upgrading.
Just tried an upgrade from 10.23 to 10.26 last night and again ran into issues.
Some switches were stuck in blinking white state.
Downgrading did not solve this completely either...
Some seemed to run 10.23 and some 10.26.
So it seems there are still some issues with this upgrade process...
@MFuchs Thanks. I'm going to work on a notoriously problematic switch this Thursday evening. Support recommended I call them before attempting so they can enable more detailed logging. The ones I've had problems with usually get stuck in that non-responsive state, and when I power cycle them they usually finish the upgrade.
Yeah, me too, but after a physical reboot it came up. (I did them one by one: only once the first switch went solid white did I power on the next.)
I also see a frequent stacking issue reported although nothing is actually wrong, and it resolves by itself after 10-15 minutes.
I called Meraki but they were unable to capture the issue 😞
Somehow this upgrade behaviour is more than disappointing.
What I do not get is how the error can happen...
Perhaps it's not that easy, but it could be...
Fact is the customers should not have these problems.
The dashboard is not usable to check which firmware is running, because it shows the firmware that is configured (as assumed by the database), not what the switch is actually running.
So I'd like to queue the update and KNOW it works - hoping it works is not an option...
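As a rough idea of what "knowing" could look like after an upgrade window: collect the version each switch actually reports, by whatever means you trust, and compare it against the target. The sketch below assumes you already have that data as a plain dict (the input format and the function name are hypothetical, and nothing here queries the dashboard); it just groups switches by version and lists the ones left behind, which is the mixed 10.23/10.26 situation described earlier in the thread.

```python
from collections import defaultdict


def normalise(version):
    """Treat 'MS 10.26', 'ms10.26' and '10.26' as the same version string."""
    return version.upper().replace("MS", "").strip()


def audit_firmware(reported, target):
    """Group switches by reported version and list those not on the target.

    reported: hypothetical mapping of switch name -> version string it reports.
    target:   the version the upgrade was supposed to install.
    """
    wanted = normalise(target)
    by_version = defaultdict(list)
    for switch, version in reported.items():
        by_version[normalise(version)].append(switch)

    stragglers = sorted(
        switch
        for version, switches in by_version.items()
        if version != wanted
        for switch in switches
    )
    return {
        "by_version": {v: sorted(names) for v, names in by_version.items()},
        "stragglers": stragglers,
        "all_on_target": bool(reported) and not stragglers,
    }
```

If `all_on_target` comes back false, `stragglers` is the list of switches to look at before calling the upgrade done.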
I have a few sites that are running 9.37
I have some issues that are supposed to be fixed in 10.26 - so I scheduled an upgrade to 10.26
The Network consists of MS-225 Stacks in IDFs connected back to a 2 x MS-425-16 Stack via LACP - with one link on MS-425-1 and the second Link from the IDF Stack on MS-425-2
All of my switches have a static IP on a VLAN - Class C Subnet that is the LAN input on my firewall
The upgrade took place and I lost my MS-425s - they were stuck in an upgrade loop of death
If you connect to the MS-425 via the Web interface you can see the following
After the upgrade had failed - the MS-425s were in DHCP Mode on VLAN 1 - they had lost their static IP
So I set the static and the device connected to the Meraki cloud - you can see the progress - 33%, 66 % and then a reboot by the switch - during which you lose web connectivity on the management port
Once you regain connectivity, the static is gone and the unit is back on DHCP, VLAN 1 - rinse, repeat....
You are stuck in this death loop
I was able to fix one switch by adding a temp DHCP server on VLAN 1 and connecting a port on the MS-425 to VLAN 1 with a DHCP Server
But the other switch became a brick and even after a hardware reset and multiple reboots - it would not complete a code upgrade
I RMA'd the switch and downgraded back to 9.37
I had a similar issue at another site - spent 12 hours on phone with TAC - I was able to get one of the MS-425s back to 9.37 but the other was in the boot loop of death and I could not go forward or backward with the firmware upgrade
I left the switch at midnight in a defaulted state and plugged into the internet with DHCP on VLAN 1
At about 4 am I received an email that the switch had joined the cloud - a miracle....
When I looked at the switch log - it had rebooted about 8 times since midnight (power supply inserted message...) and it indeed had broken free from the reboot of death loop
Here's what I know - after spending a total of about 30 hours on the phone with TAC - they don't have a clue what the issue is
During the initial deploy, I added a static IP to the MS-425s, the MS-425s were able to join the cloud and upgrade to 9.37 - then I created the stack and added Layer-3 Interfaces and configured the interfaces and added LACP
If you attempt to upgrade a stack of MS-425 from 9.37 to 10.26 - the upgrade fails on the 425 Stack and they get stuck in the reboot of death loop
If you attempt to downgrade them back to 9.37 - good luck - I found that I had to remove the stack cables, hw default one switch, set a static and connect it back to cloud - after several repeats of this process a 425 switch may rejoin the cloud - but after each failed attempt you have to re-add the static IP and set the WAN VLAN
Eventually I was able to get some of the switches to break out of the reboot of death loop and return back to 9.37
Some of the switches could not be revived and were RMA'd
I have an open case on this - but it does not look promising - "We have not been able to duplicate this in the lab...." - So resolution is not on the horizon
TAC has asked me to go on site again and try to do another upgrade - but these are production networks and who has another 12 hours to spend on a fool's errand...
So at this time I have 2 networks that are running 9.37 - they have LACP and STP issues that are supposedly fixed in 10.26 - but I seem to have no way of getting there
I have asked TAC to send me 2 - 425s that are already at 10.26 and I will attempt to insert them into the network
As of now - it looks like if you have 425s - you might want to try and upgrade them to 10.26 before deployment and then try and stack them - It appears that after they are stacked, they are doomed to the reboot of death loop
@DrFiber I have not had the issues you're seeing with 425's when upgrading from 9.X to 10.X firmware and all of my 425's are stacked. The downstream switches from the 425's are even 225's like yours with LACP uplinks to the 425's.
The only thing I can think of is if you are scheduling the upgrades to happen all at the same time, or if you are staging them for the 225's first and then 425's. I schedule all of the 225's to go at least 1 hour prior to the 425's so they are all complete prior to the 425's going down.
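To make the staging idea concrete, here is a small sketch of how the start times could be worked out: everything that is not a core model starts at the first slot, and the core switches (MS425 by default) start one hour later. The function name, the inventory format and the prefix match on the model string are all assumptions for illustration; this only computes times and does not schedule anything in the dashboard.

```python
from datetime import datetime, timedelta


def staged_schedule(switches, first_start, core_models=("MS425",),
                    gap=timedelta(hours=1)):
    """Return switch name -> upgrade start time.

    switches:    hypothetical mapping of switch name -> model string,
                 e.g. {"idf1": "MS-225", "core1": "MS-425-16"}.
    first_start: when the access (non-core) switches should begin.
    gap:         how long to wait before the core switches begin.
    """
    schedule = {}
    for name, model in switches.items():
        # Compare without hyphens so 'MS-425-16' matches 'MS425'.
        compact = model.upper().replace("-", "")
        is_core = any(compact.startswith(core) for core in core_models)
        schedule[name] = first_start + gap if is_core else first_start
    return schedule
```

For example, with `first_start=datetime(2019, 3, 1, 22, 0)` the MS-225 stacks would be planned for 22:00 and the MS-425 stack for 23:00. The one-hour gap is the minimum mentioned above; given that upgrades have been reported to take up to 45 minutes, a larger `gap` may be safer.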
The first upgrade was a push from Meraki - with no sequencing
Since then - they added a beta feature to do staged / sequenced upgrades to my dashboard
My issue is ax post facto - after the non sequenced push - the 425s get hung in an unrecoverable state - can't go forward to 10.26 - can't go backward to 9.37
Sometimes after several hardware defaults - I am able to get out of the loop and go back to 9.37 - sometimes the switch is bricked - and sometimes after leaving the switch connected for 4 hours it magically exits the loop and downgrades back to 9.37
In all cases, during this process, the MS-425 gets partially through the firmware load, re-boots, resets the static to DHCP and loses connectivity
We don't have DHCP on VLAN 1 - and our external address is not on vlan 1 - so maintaining the static is important for us
The magic switch above recovered after 4 hours of several re-boots - I think - because I had set up DHCP on VLAN 1
Other switches are not as magical and they turn into door stops after several days of the reboot of death loop
To be clear - I had no issues upgrading/downgrading the MS-225s
Want to trade MS-425s ? ...... LOL
ex post ...
@DrFiber Do you still have 425's that need to be upgraded, or have all of them been upgraded and/or replaced at this point?
Still trying to upgrade
Have an RMA unit - plan is to upgrade as a stand alone - then add to Network - and hopefully upgrade its twin
Any updates on this issue? I have been having this problem on my MS220s, and sure enough I tried to update to 10.40 tonight and one of the 220s hung. It is not easy for me to get to this switch, and rebooting the switch never seems to help.
This issue has been going on for over a year and is still happening? Has anyone found a way to fix this remotely?