Firmware Update Issues

Adam
Kind of a big deal

Does anyone else have periodic issues updating the firmware on their switches? I realize that this may not impact all of you, especially if you only have a few smaller environments.  But I have > 100 switches across multiple sites and connections and it feels like I periodically get firmware issues that exhibit the following behavior:

 

1.  The switch goes into a somewhat bricked state where it goes offline; in some cases the power light isn't on but the ports still have lights.  Plugging a computer into one of the open ports doesn't get connectivity, nor does it allow connecting to the switch's local interface at 1.1.1.100.

2.  The only thing that seems to get it out of this state is a hard reboot of the switch

3.  After a hard reboot it starts immediately downloading its new firmware then successfully installs it

4.  It doesn't seem to make a difference which firmware version I'm going to or from, or whether the upgrade is scheduled or not.

 

Note:  I talked to support and they have had multiple other sites/users report this issue.  They are trying to build fail-safes and better identify what is going on, but I'd be interested in starting a dialog here to get more details or possible ways to avoid this.

 

 

Adam R MS | CISSP, CISM, VCP, MCITP, CCNP, ITILv3, CMNO
If this was helpful click the Kudo button below
If my reply solved your issue, please mark it as a solution.
BrothersTM
Getting noticed

I have a small network and have experienced that same issue a few times.

PhilipDAth
Kind of a big deal

This was fixed in 9.26:

"MS22/42/220/320 switches may hang on boot and require physical intervention."

 

Once you get everything on 9.26 and above it should come right.

I'm currently on 9.30, and during the last update to this version I had two switches that I had to physically go to and unplug to bring them back online.  I've not updated since.

The current stable release for the switches is 9.32 ...

 

Now that you are on 9.30 upgrading to 9.32 should be pain free.  Note it can take quite a long time from when you say "go" to the upgrade being finished.  I've seen the process taking up to 45 minutes.

Adam
Kind of a big deal

I'm not sure it's hanging on boot, since it still hadn't fully downloaded the firmware.  But I'll pay attention to the current firmware version vs the destination version to identify whether 9.26 potentially resolved it.

Adam
Kind of a big deal

The last two networks  I experienced the issue at were going from MS 9.27 → MS 9.32.  I've also had the issue in the past when going from one stable firmware to another so this seems like it still may be an unresolved issue.  

Mr_IT_Guy
A model citizen

I've also had the issue where routing is affected on some of my L3 switches. Yes, things are supposed to be resolved in these upgrades, but sometimes we are still affected. It took almost a year for us to have a working version of AMP (in my network), and that's with moving to the latest versions that supposedly fixed it.
Found this helpful? Give me some Kudos! (click on the little up-arrow below)

The firmware issue is definitely not resolved.  On this round of updates to my MS320 switches from 9.30 to 9.32, all but 5 of my switches had to be manually rebooted post deployment in order for the firmware to update.  I thought last Friday's snow day was a great day to update all the switches.  I was very wrong, since that required me to come in anyway and go closet to closet pulling power to switches until they were all updated and back online.

Adam
Kind of a big deal

I hope some progress can be made on this.  I've been experiencing this on some of my networks for about the last year and a half of updates.  

VictorC
Meraki Employee

Thank you to everyone for being so active in the Meraki Community.  

 

We have identified the issue which has been causing MS22, MS42, MS220 and MS320 switches to require a physical reboot during an upgrade.  The issue is present on switches currently running MS 9.0 and above.  We hope to rectify this issue and provide a fix shortly. 

 

In the interim, we have discovered that rebooting the switch via dashboard prior to performing an upgrade will prevent the switch from becoming unreachable.
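
A minimal sketch of that interim workaround, for anyone scripting it: reboot each switch, wait for it to come back, and only then start the upgrade. The `reboot`, `is_online` and `start_upgrade` callables are hypothetical placeholders for however you drive the dashboard; they are not Meraki API names, and the timeout values are arbitrary.

```python
import time
from typing import Callable, Iterable


def reboot_then_upgrade(
    serials: Iterable[str],
    reboot: Callable[[str], None],
    is_online: Callable[[str], bool],
    start_upgrade: Callable[[list[str]], None],
    timeout_s: float = 600.0,
    poll_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Reboot every switch, wait for each to return, then upgrade those that did.

    Returns the serials that never came back within the timeout. They are
    left out of the upgrade so they can be checked by hand.
    """
    pending = list(serials)
    for serial in pending:
        reboot(serial)  # hypothetical: trigger a reboot from the dashboard

    ready: list[str] = []
    waited = 0.0
    while pending and waited < timeout_s:
        # Sleep before polling so a switch that has not gone down yet is less
        # likely to be counted as "back". A stricter version would first wait
        # to see the switch go offline.
        sleep(poll_s)
        waited += poll_s
        still_down = []
        for serial in pending:
            (ready if is_online(serial) else still_down).append(serial)
        pending = still_down

    if ready:
        start_upgrade(ready)  # hypothetical: queue the firmware upgrade
    return pending
```

Any serial in the returned list never checked in after the reboot, so it is held back from the upgrade instead of being pushed into a state that needs a site visit.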

 

As always, if you have any further questions/concern, please do not hesitate to reach out to our Cisco Meraki Support Team.

Happy New Year to everyone!

 

We now have a fix for this issue in MS 9.33 and MS 10.9.

 

As always, if you have any further questions/concern, please do not hesitate to reach out to our Cisco Meraki Support Team.

Adam
Kind of a big deal

Did this issue exist in prior firmwares as well?  I'd been experiencing the issue since 8.x

PhilipDAth
Kind of a big deal

That did not take long. 9.34 is already out!

Who is going to go ahead with 9.34?

@BrothersTM, I just deployed it to our internal switches for testing.

 

It is just a bit disconcerting to have a new "Stable release candidate" come out, and then maybe two days later to have that candidate replaced.  There must have been something bad in 9.33 to have it replaced so quickly.

That is exactly my thought and apprehension.

Adam
Kind of a big deal

@VictorC any feedback on the above questions/concerns?

VictorC
Meraki Employee

We discovered a regression specific to the MS120 and wanted to catch it before it was widely deployed. The issue causes some problems with aggregated links.

 

Firmware is the same for every other model.

MRCUR
Kind of a big deal

@VictorC Just to confirm - when upgrading to 9.34, do I need to manually reboot the switches ahead of time, or will the upgrade process take care of this on its own? The switches (MS225/MS350/MS425) are currently running 9.32.

MRCUR | CMNO #12

I believe I saw in the release notes that the firmware update would first reboot the switch and then update the firmware when upgrading TO 9.34.

I agree with @BrothersTM, the release notes say it reboots the switches first.  We have been running 9.34 internally for a couple of weeks without issue.  I have just put a couple of customers on it.

 

Adam
Kind of a big deal

Following up on this.  We're doing our round of firmware updates from 9.32 to 9.36 and experienced the same issue, where one of our switches just stops responding when it initiates the firmware update.  In this case I was physically at the switch with my laptop connected, viewing the switch's local config page.  As soon as it reached "downloading upgrade 0%" the switch instantly locked up.  The power light went amber and all port lights went dark.  There was no connectivity from my hard-wired laptop; the port showed as if it was unplugged and no traffic was being passed.  I had to manually power cycle the switch, after which it initiated its firmware update just fine.  I have a ticket in with Meraki on it.  They obviously know of the issue, but it is making remote firmware updates tricky.


I have serious issues with upgrading the MS120 series. We have had them since they were released and I really like the design (silent and slim), but there seem to be many issues left... Support is working hard on these issues, so I hope to get a stable environment soon...
MRCUR
Kind of a big deal

I just upgraded a network of MS225's and MS425's last night from 10.16 to 10.19 without any problems. The 10.19 release notes do still have a known issue that switches may reboot twice while upgrading. 

MRCUR | CMNO #12

Just tried an upgrade from 10.23 to 10.26 last night and again ran into issues.

Some switches were stuck in blinking white state.

Downgrading did not solve this completely either...

Some seemed to run 10.23 and some 10.26.

So it seems there are still some issues with this upgrade process...

Adam
Kind of a big deal



Thanks, I'm going to work on a notoriously problematic switch this Thursday evening.  Support recommended I call them before attempting so they can enable more detailed logging.  The ones I've had problems with usually get stuck in that non-responsive state, and when I power cycle them they usually finish the upgrade.

Tony_Ang
Getting noticed

Yeah, me too, but after a physical reboot it came up. (I did them one by one: only once the first switch went solid white did I power on the next.)

I also get frequent stacking issues where nothing is actually wrong, and they resolve by themselves after 10-15 minutes.

I called Meraki but they were unable to capture the issue 😞

Somehow this upgrade behaviour is more than disappointing.

What I do not get is how the error can happen...

 

  1. The switch downloads the firmware through the tunnel established to the Meraki cloud.
  2. The switch checks that the downloaded firmware image is OK.
  3. The switch flashes the firmware.
  4. The switch reboots.

Perhaps it's not that easy, but it could be...
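
For illustration only, the four steps above could be sketched like this in Python. It is the idealized flow, not how the Meraki updater actually works: `download`, `flash` and `reboot` are hypothetical callables, and using a SHA-256 checksum for step 2 is an assumption.

```python
import hashlib
from typing import Callable


class FirmwareError(Exception):
    """Raised when the downloaded image must not be flashed."""


def apply_update(
    download: Callable[[], bytes],
    expected_sha256: str,
    flash: Callable[[bytes], None],
    reboot: Callable[[], None],
) -> None:
    # Step 1: fetch the complete image before touching anything else.
    image = download()

    # Step 2: verify the image; on a mismatch, stop with flash untouched.
    digest = hashlib.sha256(image).hexdigest()
    if digest != expected_sha256.lower():
        raise FirmwareError(f"checksum mismatch: got {digest}")

    # Step 3: write the verified image.
    flash(image)

    # Step 4: boot into the new firmware.
    reboot()
```

The point of the ordering is that a failed download or a bad image leaves the switch running its old firmware instead of hanging halfway.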

Fact is the customers should not have these problems.

The dashboard is not usable for checking which firmware is actually running, because it shows the firmware that is configured (as recorded in the database).

So I'd like to queue the update and KNOW it works - hoping it works is not an option...
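
One way to turn "hoping" into "knowing", sketched under the assumption that you can collect the version each switch itself reports (for example from its local status page) rather than the version configured in the dashboard. The `reported` mapping and the switch names are hypothetical.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '10.26' into (10, 26) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.strip().split("."))


def find_stragglers(reported: dict[str, str], target: str) -> dict[str, str]:
    """Return {switch: version} for every switch not running the target release."""
    want = parse_version(target)
    return {
        name: version
        for name, version in reported.items()
        if parse_version(version) != want
    }


# Hypothetical example: one stack member stayed on the old release.
stragglers = find_stragglers({"idf1": "10.26", "idf2": "10.23"}, "10.26")
```

Parsing into integer tuples matters because a plain string comparison would sort "9.37" after "10.26".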

 

I have a few sites that are running 9.37

 

I have some issues that are supposed to be fixed in 10.26 - so I scheduled an upgrade to 10.26

 

The Network consists of MS-225 Stacks in IDFs connected back to a 2 x MS-425-16 Stack via LACP - with one link on MS-425-1 and the second Link from the IDF Stack on MS-425-2

 

All of my switches have a static IP on a VLAN - Class C Subnet that is the LAN input on my firewall

 

The upgrade took place and I lost my MS-425s - they were stuck in an upgrade loop of death

 

If you connect to the MS-425 via the Web interface you can see the following

 

After the upgrade had failed - the MS-425s were in DHCP Mode on VLAN 1 - they had lost their static IP

 

So I set the static and the device connected to the Meraki cloud - you can see the progress - 33%, 66 % and then a reboot by the switch - during which you lose web connectivity on the management port

 

Once you regain connectivity, the static is gone and the unit is back on DHCP, VLAN 1 - rinse, repeat....

 

You are stuck in this death loop

 

I was able to fix one switch by adding a temp DHCP server on VLAN 1 and connecting a port on the MS-425 to VLAN 1 with a DHCP Server

 

But the other switch became a brick and even after a hardware reset and multiple reboots - it would not complete a code upgrade

 

I RMA'd the switch and downgraded back to 9.37

 

I had a similar issue at another site - spent 12 hours on phone with TAC - I was able to get one of the MS-425s back to 9.37 but the other was in the boot loop of death and I could not go forward or backward with the firmware upgrade

 

I left the switch at midnight in a defaulted state and plugged into the internet with DHCP on VLAN 1

 

At about 4 am I received an email that the switch had joined the cloud - a miracle....

 

When I looked at the switch log - it had rebooted about 8 times since midnight (power supply inserted message...) and it indeed had broken free from the reboot of death loop

 

Here's what I know - after spending a total of about 30 hours on the phone with TAC - they don't have a clue what the issue is

 

During the initial deploy, I added a static IP to the MS-425s, the MS-425s were able to join the cloud and upgrade to 9.37 - then I created the stack and added Layer-3 Interfaces and configured the interfaces and added LACP

 

If you attempt to upgrade a stack of MS-425 from 9.37 to 10.26 - the upgrade fails on the 425 Stack and they get stuck in the reboot of death loop

 

If you attempt to downgrade them back to 9.37 - good luck - I found that I had to remove the stack cables, hw default one switch, set a static and connect it back to cloud - after several repeats of this process a 425 switch may rejoin the cloud - but after each failed attempt you have to re-add the static IP and set the WAN VLAN

 

Eventually I was able to get some of the switches to break out of the reboot of death loop and return back to 9.37

 

Some of the switches could not be revived and were RMA'd

 

I have an open case on this - but it does not look promising - "We have not been able to duplicate this in the lab...." - So resolution is not on the horizon

 

TAC has asked me to go on site again and try to do another upgrade - but these are production networks and who has another 12 hours to spend on a fool's errand...

 

So at this time I have 2 networks that are running 9.37 - they have LACP and STP issues that are supposedly fixed in 10.26 - but I seem to have no way of getting there

 

I have asked TAC to send me 2 - 425s that are already at 10.26 and I will attempt to insert them into the network

 

As of now - it looks like if you have 425s - you might want to try and upgrade them to 10.26 before deployment and then try and stack them - It appears that after they are stacked, they are doomed to the reboot of death loop

 

 

 

 

MRCUR
Kind of a big deal

@DrFiber I have not had the issues you're seeing with 425's when upgrading from 9.X to 10.X firmware and all of my 425's are stacked. The downstream switches from the 425's are even 225's like yours with LACP uplinks to the 425's. 

 

The only thing I can think of is if you are scheduling the upgrades to happen all at the same time, or if you are staging them for the 225's first and then 425's. I schedule all of the 225's to go at least 1 hour prior to the 425's so they are all complete prior to the 425's going down. 
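
A rough sketch of that staging idea, in case it helps to plan it out before scheduling: each stage starts a fixed gap after the previous one, so the access switches are finished before the core stack goes down. The switch names are made up, and the function only computes start times; it does not talk to the dashboard.

```python
from datetime import datetime, timedelta


def staged_schedule(
    start: datetime,
    stages: list[list[str]],
    gap: timedelta = timedelta(hours=1),
) -> dict[str, datetime]:
    """Assign each switch a start time; stage N begins `gap` after stage N-1."""
    schedule: dict[str, datetime] = {}
    for index, stage in enumerate(stages):
        for switch in stage:
            schedule[switch] = start + index * gap
    return schedule


# Hypothetical layout: 225 stacks first, the 425 core one hour later.
plan = staged_schedule(
    datetime(2024, 1, 5, 22, 0),
    [["ms225-idf1", "ms225-idf2"], ["ms425-core"]],
)
```

The one-hour default mirrors the gap described above; a larger gap may be safer if upgrades at your sites tend to run long.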

MRCUR | CMNO #12

The first upgrade was a push from Meraki - with no sequencing

 

Since then - they added a beta feature to do staged / sequenced upgrades to my dashboard

 

My issue is ax post facto - after the non sequenced push - the 425s get hung in an unrecoverable state - can't go forward to 10.26 - can't go backward to 9.37

 

Sometimes after several hardware defaults - I am able to get out of the loop and go back to 9.37 - sometimes the switch is bricked - and sometimes after leaving the switch connected for 4 hours it magically exits the loop and downgrades back to 9.37

 

In all cases, during this process, the MS-425 gets partially through the firmware load, re-boots, resets the static to DHCP and loses connectivity

 

We don't have DHCP on VLAN 1 - and our external address is not on vlan 1 - so maintaining the static is important for us

 

The magic switch above recovered after 4 hours of several re-boots - I think - because I had set up DHCP on VLAN 1

 

Other switches are not as magical and they turn into door stops after several days of the reboot of death loop

 

To be clear - I had no issues upgrading/downgrading the MS-225s

 

Want to trade MS-425s ? ...... LOL

 

 

ex post ...

MRCUR
Kind of a big deal

@DrFiber Do you still have 425's that need to be upgraded, or have all of them been upgraded and/or replaced at this point? 

MRCUR | CMNO #12

Still trying to upgrade

 

Have an RMA unit - plan is to upgrade it as a stand-alone - then add it to the network - and hopefully upgrade its twin

 

 

Any updates on this issue? I have been having this problem on my MS220s, and sure enough, I tried to update to 10.40 tonight and one of the 220s hung. It is not easy for me to get to this switch, and rebooting the switch never seems to help.

 

This issue has been going on for over a year and is still happening? Has anyone found a way to fix this remotely?
