Stacked Switches, Mirrored Ports, Fetch Cloud Configuration, Upgrades

sdh1972
Here to help


Hello,

 

At a remote location, we have four MS350 switches running in a stack and performed a routine upgrade from 10.25 to 11.22. After the upgrade, we started seeing the following:

  1. Dropped pings (from the data center to servers and computers at the remote location)
  2. Could not ping the SVIs on the switch stack from the data center
  3. Computers on the local LAN could not ping their default gateway (an SVI on the switch stack). The SVI still passed traffic, so the computers could reach resources on and off the network segment; they just could not ping the gateway itself.
  4. High packet loss and latency when pinging the individual switches in the stack from the Meraki portal
  5. The main applications impacted were VoIP and video conferencing. Phones would re-register several times a day or hang while trying to re-register. Zoom video conferencing was also choppy, and users complained the network was slow.
  6. Switches changing between yellow and green (I only caught this because I happened to be in the portal monitoring the sites)
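
To put numbers on symptoms like 1 and 4, ping results can be summarized as a loss percentage and latency figures. This is only a minimal Python sketch: the function name and the sample values are made up for illustration, and `None` is assumed to mark a probe that got no reply.

```python
from statistics import mean


def summarize_pings(rtts_ms):
    """Summarize ping probes. Each entry is a round-trip time in ms,
    or None when the probe got no reply (assumed convention)."""
    replies = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    loss_pct = 100.0 * (sent - len(replies)) / sent if sent else 0.0
    return {
        "sent": sent,
        "received": len(replies),
        "loss_pct": round(loss_pct, 1),
        "avg_ms": round(mean(replies), 1) if replies else None,
        "max_ms": max(replies) if replies else None,
    }


# Hypothetical samples, not measurements from this site.
samples = [12.1, None, 250.4, 11.8, None, 13.0, 480.2, None, 12.4, 12.0]
print(summarize_pings(samples))
```

Comparing the same summary before and after a firmware change gives something concrete to hand to support.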

Meraki performed packet captures and said they did not see return traffic from the upstream devices. Initially Meraki thought the issue was 11.22 (there is a known bug related to stacked switches in that release). They recommended the following:

  1. Upgrade to 11.28 (the new release candidate at that time). Most of the switches upgraded, but a few 220s didn't. We hard power cycled the Meraki switches and removed most of the aggregate ports, and the upgrade finally succeeded on those switches.
  2. Move the management network for the switch stack (only) off the stack: there is a known issue/design best practice with Meraki (for stacked switches only) where the management IPs of switches in a stack should not be on an SVI that is hosted on the stack itself. We moved our management network off the switch stack and onto our router.
  3. Remove all aggregate ports
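
The design rule in step 2 can be checked offline: flag any stack member whose management IP falls inside a subnet whose SVI is hosted on the stack. The sketch below uses only Python's standard `ipaddress` module; the function name, addresses, and subnets are hypothetical and not taken from this site.

```python
import ipaddress


def mgmt_ips_on_stack_svis(mgmt_ips, stack_svi_subnets):
    """Return {mgmt_ip: subnet} for each management IP that sits inside
    a subnet whose SVI is hosted on the switch stack."""
    networks = [ipaddress.ip_network(s) for s in stack_svi_subnets]
    flagged = {}
    for ip_text in mgmt_ips:
        ip = ipaddress.ip_address(ip_text)
        for net in networks:
            if ip in net:
                flagged[ip_text] = str(net)
                break
    return flagged


# Hypothetical values: two SVIs on the stack, four stack members.
stack_svis = ["10.20.10.0/24", "10.20.20.0/24"]
mgmt = ["10.20.10.11", "10.20.10.12", "10.99.0.13", "10.99.0.14"]
print(mgmt_ips_on_stack_svis(mgmt, stack_svis))
```

Any address that shows up in the result would be a candidate for moving to a gateway that lives off the stack, such as the router.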

 

None of those steps fixed issues 1-6. Network performance got even worse: we lost OSPF connectivity between the switch stack and our MPLS router. Per Meraki, the switch stack was having trouble fetching its configuration, OSPF was no longer established, and the site was now down hard. The portal showed an alert on the switch stack similar to "stack configuration in member different from dashboard".

 

After we finally reached the Meraki escalation team (after going through 5-6 engineers over 2 days, escalating to our account team, and ending up with a hard-down network), they isolated the issue: port mirroring on the stack triggered a problem where the switch stack could not fetch its configuration from the cloud. We removed ALL port mirroring and rebooted the switch stack; OSPF re-established and issues 1-6 cleared immediately. The site is back up and performing normally.
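
For anyone who wants to look for forgotten mirror ports before upgrading a stack, a minimal Python sketch of the audit follows. The inventory format, the `role` field, and the switch names are all assumptions made for illustration; this is not the Meraki Dashboard API schema, so the data would need to come from whatever export or inventory is available.

```python
def find_mirror_ports(inventory):
    """Return (switch, port) pairs whose role is 'mirror'.

    inventory is a hypothetical structure:
    {switch_name: [{"port": "1", "role": "access" | "trunk" | "mirror"}, ...]}
    """
    hits = []
    for switch, ports in inventory.items():
        for port in ports:
            if port.get("role") == "mirror":
                hits.append((switch, port["port"]))
    return hits


# Hypothetical stack inventory, not real configuration data.
inventory = {
    "stack-sw1": [{"port": "1", "role": "access"}, {"port": "24", "role": "mirror"}],
    "stack-sw2": [{"port": "1", "role": "trunk"}],
    "stack-sw3": [{"port": "48", "role": "mirror"}],
}

mirrors = find_mirror_ports(inventory)
if mirrors:
    print("Mirror ports found; review before upgrading:", mirrors)
else:
    print("No mirror ports found.")
```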

 

 

UPDATE 12/14: Even after all the changes above, we continued to have poor performance with the switch stack (high packet loss, error messages indicating that the Meraki switches were having difficulty reaching the cloud, user complaints of slowness, etc.). We ended up having to remove the stack. No issues have been reported since removing it.

6 REPLIES
NolanHerring
Kind of a big deal

Thank you so much for coming here and relaying this information for others to learn from. Sorry you had to go through all that =(

Time for me to go check to see if I have any port-mirroring that I've forgotten about lol
Nolan Herring | nolanwifi.com
Nash
Kind of a big deal

Oh noooo, thank you for sharing.

 

Did you restore the port mirroring after you were done, or leave it off?

Hello Nash, we have not re-enabled port mirroring. We are going to wait a few days and then re-enable it.

@sdh1972 Ouch, what a painful exercise, thanks for sharing though. Were you actually using the port mirroring, or had it been set up in the past for something and forgotten?

Brons2
Building a reputation

I had problems with 11.22 also, see my thread "11.22 killed my call center".

 

I rolled back to 10.45 though instead of going forward to 11.28.  It sounds like I need to stay on 10.45 if I can't use port mirroring on 11.28.  I have mirror ports going to a security device, looking for lateral movement of malware.  I have no intention of disabling this.

 

 

NolanHerring
Kind of a big deal


@Brons2 wrote:

I had problems with 11.22 also, see my thread "11.22 killed my call center".

 

I rolled back to 10.45 though instead of going forward to 11.28.  It sounds like I need to stay on 10.45 if I can't use port mirroring on 11.28.  I have mirror ports going to a security device, looking for lateral movement of malware.  I have no intention of disabling this.

 

 


That is a safe way to go for sure, so I respect that.

 

That being said, I am curious whether it broke things only because port mirroring was enabled 'during' the upgrade process.

 

I wonder whether it would still cause issues if you turned it off 100%, performed the upgrade, and then re-enabled it post upgrade. Obviously I don't expect you to find out with your production environment lol, but just food for thought.

Nolan Herring | nolanwifi.com