L3-Switch-Stack stops forwarding after config changes

KarstenI
Kind of a big deal


Hi all,

 

I again ran into a problem that I first faced about two years ago on firmware version 11, then saw again with MS 12, and now with MS 14:

 

Environment:

- Core-Stack with 2*MS35x

- systems were upgraded some time before

- after the upgrade *no* additional reboot was done

 

The next change to the SVIs, such as adding a VLAN, causes the switch stack to stop L3-forwarding packets. L2 switching still works. Rebooting one of the cores immediately restores normal operation.

 

While working the support case for the first occurrence, I was told that there is a bug in replicating the forwarding table between the switches over the stack link, and that this would be addressed in the next major version.

 

The last time it happened, the Meraki engineer took captures and said that it was the fault of the neighbouring Catalyst (an L2 device between the MS and the firewall), because he didn't see any packets from that device, and that there was nothing he could do. But the MS reboot directly restored operation.

 

Now it happened again with MS 14.31 and yes, the reboot worked again.

 

Because of this, I usually make sure to do an additional reboot after every MS upgrade. But I am not aware of every upgrade at every customer.
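One way to keep track of this across customers is a small inventory check. The following is only a minimal sketch: `StackRecord`, its fields, and `stacks_missing_extra_reboot` are hypothetical names, and the timestamps are assumed to come from your own records rather than from any particular Meraki API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class StackRecord:
    # Hypothetical inventory entry; not a Meraki API schema.
    name: str
    last_upgrade: datetime                  # when the stack came up on the new firmware
    last_extra_reboot: Optional[datetime]   # last deliberate reboot, None if never done


def stacks_missing_extra_reboot(stacks):
    """Return the names of stacks with no deliberate reboot since their last upgrade."""
    return [
        s.name
        for s in stacks
        if s.last_extra_reboot is None or s.last_extra_reboot < s.last_upgrade
    ]
```

Any stack this returns is a candidate for a planned reboot before the next SVI change.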

 

Why this post?

- Have you seen similar behaviour and, if so, is there anything special in your environment?

- For everyone else: just be aware of it, and if you face a problem like this, reboot one of the core members. Luckily the reboot is really fast with these switches and the network is quickly back in operation.

 

Have a great day with hopefully no downtime, Karsten

6 Replies
Ryan_Miles
Meraki Employee

Sounds like the known issue listed in MS 15 beta firmware release notes "In rare circumstances, changes made to SVIs may result in connectivity loss for one or more SVIs until reboot (predates MS 12)". A Support case could help confirm that though.

 

 

Ryan / Meraki Solutions Engineer

If you found this post helpful, please give it Kudos. If my answer solves your problem please click Accept as Solution so others can benefit from it.
KarstenI
Kind of a big deal

Oh, now that you mention it I see it in the release notes ...

But why do all these "rare circumstances" always hit me ...😭

Inderdeep
Kind of a big deal

@KarstenI: Because you are the leader who checks all the possible rare issues and alerts us early 🙂

Regards/Inder
Cisco IT Blogs awarded in 2020 & 2021
www.thenetworkdna.com
KarstenI
Kind of a big deal

Now I feel like being really important ... 😉

KarstenI
Kind of a big deal

It is only mentioned in the release notes starting with version 15.2. Probably someone forgot to add it to the v11, v12 and v14 release notes. It really should be added back, at least to the latest v14 notes.

@KarstenI, this issue is something of a hydra: we fix one instance, can no longer reproduce it, and for some reason it keeps coming back one way or another.

 

You are right that this was originally seen in MS 11, but it was rare. The issue was reported more often in later versions, which led to it being investigated further and resolved in MS 14.28. The changelog message for this fix was:

In rare circumstances, the next hop VLAN is incorrectly modified for existing OSPF SVIs when a new SVI is added which will also participate in OSPF. A reboot will correct this (present since MS 11)


Shortly after, additional reports of similar issues came in. These were resolved in MS 14.31. The changelog message for this fix was:

Modifying SVIs on switches could cause the next-hop VLAN to be incorrectly set; which would result in routing outages


The most recent occurrence, which is impacting MS 14.31+ versions, is yet another instance of this issue. The MS 14.32 and MS 15.2+ changelogs have been updated to list the following in the "Known Issues" section:

In rare scenarios, changes made to SVIs may result in connectivity loss for one or more SVIs until reboot (present since MS 14.31)

This issue is actively being investigated, with hopes that a fix will make it into a future release soon.
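The timeline above can be summarised as a small lookup. This is only a sketch built from the changelog messages quoted in this reply: `parse_ms_version` and `svi_issue_status` are hypothetical helpers, the code assumes that plain numeric ordering of version numbers reflects which fixes a build contains (which may not hold for beta trains), and the "open" status reflects the situation as of this post.

```python
def parse_ms_version(text):
    """Turn 'MS 14.31', 'MS14.31' or '14.31' into a tuple such as (14, 31)."""
    cleaned = text.upper().replace("MS", "").strip()
    return tuple(int(part) for part in cleaned.split("."))


def svi_issue_status(version_text):
    """Map a firmware version to the SVI issue instance described in this thread."""
    version = parse_ms_version(version_text)
    if version < (14, 28):
        # First instance: present since MS 11, resolved in MS 14.28.
        return "OSPF SVI next-hop VLAN bug (present since MS 11, fixed in MS 14.28)"
    if version < (14, 31):
        # Second instance: similar reports, resolved in MS 14.31.
        return "similar next-hop VLAN reports (fixed in MS 14.31)"
    # Third instance: listed under Known Issues, present since MS 14.31.
    return "open known issue: SVI changes may cause connectivity loss until reboot"
```

For example, `svi_issue_status("MS 14.31")` falls into the last branch, which matches the version reported at the top of this thread.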

 

Hopefully, I have been able to add a bit of additional clarity around the troubles you are facing. 
