Broadcast Storm Brought Down Entire MS120 Network

RH6379
Getting noticed

I have 1 MX84 and 12 MS120-24P switches that I'm setting up for a new install.

  • MX84 is doing the L3 Routing and has 7 VLANs defined on it.
  • Switches 1-6 are in one rack and Switches 7-12 are in a second rack.
  • Switches 1 and 7 are used as "Distribution Switches".
  • Switches 1 and 7 Trunk Ports to Access Switches do not have any STP Guard enabled.
  • Switches 2-6 and 8-12 are "Access Switches".
  • Switches 2-6 and 8-12 have uplinks to both Switches 1 & 7 which are configured as TRUNK ports on Native VLAN 1.
  • Switches 1 & 7 each have a single uplink to a single MX84 which are configured as TRUNK ports on Native VLAN 1.
  • Switch 1 has an STP bridge priority of 0 and Switch 7 has a priority of 4096.  Switches 2-6 & 8-12 have the default priority of 32768.
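
For anyone following along, here is a minimal Python sketch of how those priorities decide the STP root. It is not Meraki code: the `Bridge` class, the `elect_root` helper, and the MAC addresses are made up for illustration, and it only models the standard rule that the lowest bridge ID (priority first, then MAC address) wins the election.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    name: str
    priority: int
    mac: str  # placeholder MAC address, not taken from the real switches

    @property
    def bridge_id(self):
        # STP compares priority first, then the MAC address as a tie-breaker.
        return (self.priority, int(self.mac.replace(":", ""), 16))


def elect_root(bridges):
    """Return the bridge with the lowest bridge ID, i.e. the STP root."""
    return min(bridges, key=lambda b: b.bridge_id)


def build_bridges():
    # Priorities as described in the post: Switch 1 = 0, Switch 7 = 4096,
    # every access switch = 32768.
    priorities = {1: 0, 7: 4096}
    return [
        Bridge(f"Switch {n}", priorities.get(n, 32768), f"02:00:00:00:00:{n:02x}")
        for n in range(1, 13)
    ]


bridges = build_bridges()
root = elect_root(bridges)
# Simulate Switch 1 losing power: the election reruns without it.
backup = elect_root([b for b in bridges if b.name != "Switch 1"])
print(root.name, backup.name)
```

With these priorities Switch 1 is elected root, and Switch 7 takes over if Switch 1 disappears, which is the intended behavior of this design.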

On September 22nd, our DSL Internet Link went down.

On September 24th, the DSL router was rebooted.  Once the MX84 came online, Switch 1 flooded the network with broadcast traffic to all the switches, creating a broadcast storm.  All the switches went offline and didn't come back up until I rebooted Switch 1.  (Meraki Support is stating this is what happened.)

Has anyone run into this before?  We're running 10.35 on the switches.  Meraki had me do port mirrors from all the uplink ports on switches 1 & 7 to an access port on those same switches.  They asked me to disconnect the WAN link to the MX84 for 24 hours and then connect it back to see if we can replicate this issue.

Meraki Design Redundant Uplinks.jpg

NSGuru
Getting noticed

The MXs do not participate in spanning tree, so it's possible that was the cause of your issue. Just an idea, as I've had issues with multiple uplinks going from MX to LAN when spanning tree isn't working properly.

Cloud Network Engineer | cloudIT
Certified Meraki Networking Associate

Kudo this if it helped! 🙂
RH6379
Getting noticed


@NSGuru wrote:

The MXs do not participate in spanning tree, so it's possible that was the cause of your issue. Just an idea, as I've had issues with multiple uplinks going from MX to LAN when spanning tree isn't working properly.


Meraki Support told me they don't support multiple links from one switch to the MX84.  They said a single uplink from multiple switches is fine.  Having just 1 switch with a connection to the MX84 creates a SPOF, right?

PhilipDAth
Kind of a big deal

I have had issues like this with 9.x firmware, multiple times, but not with the 10.x firmware.

This is probably cold comfort, but I always strive for a design that is as loop-free as possible.  In this case, if you used MS210 switches you could stack them together.  You could then use LACP to all the downstream switches.  Then the only loop left in the network would be through the MX84 itself.  This is likely to be a lot more solid.

RH6379
Getting noticed


@PhilipDAth wrote:

I have had issues like this with 9.x firmware, multiple times, but not with the 10.x firmware.

This is probably cold comfort, but I always strive for a design that is as loop-free as possible.  In this case, if you used MS210 switches you could stack them together.  You could then use LACP to all the downstream switches.  Then the only loop left in the network would be through the MX84 itself.  This is likely to be a lot more solid.


I have been suggesting stackable switches for some time.  Here in our HQ, we use stackable 3850s for our access switches going to 2 Catalyst 6807 switches via 10Gb links.  Someone else here supports the remote offices that we're putting this Meraki gear in, and he had the initial design discussion with them.  We even had the same discussion about what you're recommending, with the 2 stacked switches as the distribution layer and 120s for access.

RH6379
Getting noticed

I tried replicating the issue this morning following the steps Meraki Support suggested, but I couldn't replicate it.  I noticed that the uplinks from Switch 1 and 7 were both in forwarding mode so I connected a patch cable from switch 7 to switch 1.  This caused switch 7 to put its connection to the MX84 in blocking mode since switch 1 is the root.  I ran a test to ensure STP "failover" between Switch 1 and Switch 7 and back by doing the following:

  1. Connected laptop to Switch 3 in Stack 1.
  2. Started ping -t to 8.8.8.8
  3. Started ping -t to 192.168.1.10 (Switch 9 in Stack 2)
  4. Started ping -t to 192.168.1.6 (Switch 5 in Stack 1)
  5. Pulled power from Switch 1
  6. Lost 1 ping to the 3 IPs as Switch 7 became the root and put its uplink to MX84 in forwarding mode.
  7. Reconnected power to Switch 1
  8. Lost 1 ping when Switch 1 became the root again.
  9. Checked switch 7 ports on Meraki portal and port 24 (Uplink to MX84) is in blocking mode.
  10. Switch 7 shows the root being Switch 1 via port 23 (inter-switch) link.
  11. Switch 1 has port 24 (Uplink to MX84) in forwarding mode.
  12. Switch 1 shows itself as root.
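
If you want to count lost pings during a failover test like the one above instead of watching the console, a rough Python sketch is below. It assumes the English-language Windows `ping -t` output format ("Reply from ..." and "Request timed out."), and the `count_ping_results` helper and the sample lines are illustrative, not captured from this network.

```python
import re

# Matches a successful Windows ping reply, e.g.
# "Reply from 8.8.8.8: bytes=32 time=14ms TTL=117"
REPLY = re.compile(r"^Reply from \S+: bytes=\d+")


def count_ping_results(lines):
    """Return (replies, lost) for an iterable of Windows ping output lines."""
    replies = lost = 0
    for raw in lines:
        line = raw.strip()
        if REPLY.match(line):
            replies += 1
        elif line == "Request timed out." or "Destination host unreachable" in line:
            lost += 1
    return replies, lost


# Illustrative sample, not real output from the test described above.
sample = [
    "Pinging 8.8.8.8 with 32 bytes of data:",
    "Reply from 8.8.8.8: bytes=32 time=14ms TTL=117",
    "Request timed out.",
    "Reply from 8.8.8.8: bytes=32 time=15ms TTL=117",
]
replies, lost = count_ping_results(sample)
print(f"replies={replies} lost={lost}")
```

Redirecting each `ping -t` to a file and feeding the saved lines to this function would give a loss count per target for each power pull.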

I don't know if this will prevent that broadcast storm that Meraki said happened, but it will ensure that both switch 1 and switch 7 don't have their uplinks to the MX84 in forwarding mode at the same time.

trunolimit
Building a reputation

You know, it'd really be awesome if Meraki would give us visibility under the hood: CPU utilization, temperature sensors, etc.

You could have set an alert to notify you when CPU utilization went above 50 or 60%. That might have prevented your network from going down entirely if you could have gotten to it early.

I wish Meraki would implement ALL features of the Cisco IOS platform into their switches.

RH6379
Getting noticed

Agreed.
NickCalcutti
Getting noticed

It will happen down the road. I'm just thankful that Meraki gear is easy to manage, and the out-of-box visibility has helped me out tremendously. Cisco's new DNA Center brings a Meraki-like ease to Cisco IOS, but at a higher cost and with subscriptions.

Also, maybe look into the SNMP MIB; that may get you CPU/temperature info through SNMP. It's not as nice as it could be in the dashboard, but PRTG has an integration:

https://kb.paessler.com/en/topic/59986-help-monitoring-meraki-network

Stephane_F
Conversationalist

Hi,

I'm sorry, but it's an MX84 hardware issue that cannot be patched.

If you have the possibility to try with an MX100, it should be OK.

I spent many months with a Meraki engineer and product manager troubleshooting this, and that was Meraki's conclusion.

We even tried with a hot spare.

I replaced the MX84 with an MX100 without touching the LAN or architecture, and it's working perfectly.

We don't deploy any MX84 in this kind of environment.

Regards,

Stephane F

RH6379
Getting noticed

We swapped out our MX84s for MX100s and it happened with 2 MX100s in an HA configuration.  Meraki no longer recommends a direct connection for heartbeats between the 2 MX devices.  We now have 2 MX100 devices with the HA one powered off.  This is ridiculous.

Meraki Recommended HA Design.png
