MS425-16 with MX600 issues

akan33

Hi all,

My company decided to go with Meraki a few months ago, and I am still struggling with some issues that I'd like to share, to see if anyone else is facing them.

We have a pair of MX600s running as active/warm spare. Each unit connects with a single 10GbE link to one of two MS425 switches, which are stacked. The two firewalls are also directly connected to each other for VRRP.

1 - My first surprise was seeing that one of the switch ports, the one connected to the active MX, is in STP discarding state. I raised the issue, and Support says this is how Meraki switch stacks work (??): the firewalls run VRRP and also pass the STP BPDUs across the link between them, which creates a loop that the switches need to block. OK... but how is it possible for there to be a root bridge election between members of the same stack?

2 - The firewall doesn't seem to monitor its LAN ports; it relies only on VRRP packets. If we remove the cable between the two firewalls and either switch fails, we get a split-brain condition that takes down all internet access. So we need that cable in place, and if you don't have crossed connections between the firewalls and the switches, you will have a problem, because the active unit REMAINS active even when it no longer has any LAN connection.
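The failure mode in point 2 can be sketched as a toy model of VRRP-style election, in which a unit's role depends only on whether it still hears its peer's advertisements and never on the state of its own LAN link. The timer helper uses the generic VRRPv3 formulas from RFC 5798; the priorities, the 1 s advertisement interval and the unit names are hypothetical, and the thread does not say which timers the MX actually uses.

```python
from dataclasses import dataclass

# Hypothetical toy model of VRRP-style election; not Meraki's implementation.


def master_down_interval(priority: int, advert_interval: float) -> float:
    """RFC 5798: Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time,
    where Skew_Time = ((256 - priority) * Advertisement_Interval) / 256."""
    skew_time = ((256 - priority) * advert_interval) / 256
    return 3 * advert_interval + skew_time


@dataclass
class Unit:
    name: str                 # hypothetical unit name
    priority: int             # hypothetical VRRP priority
    lan_link_up: bool         # link to its switch; deliberately ignored below
    hears_peer_adverts: bool  # do the peer's VRRP adverts still arrive?


def is_master(unit: Unit, peer: Unit) -> bool:
    """Toy role decision: only advertisements matter, never lan_link_up."""
    if not unit.hears_peer_adverts:
        # Master_Down_Interval expires without adverts, so the unit takes over
        # (or, if it was already master, simply stays master).
        return True
    return unit.priority > peer.priority


# Direct MX-MX cable in place, switch behind the active unit fails:
# adverts still flow over the cable, so the active unit stays master with no LAN.
active = Unit("MX-A", 200, lan_link_up=False, hears_peer_adverts=True)
spare = Unit("MX-B", 100, lan_link_up=True, hears_peer_adverts=True)
print(is_master(active, spare), is_master(spare, active))  # True False

# No direct cable, same switch failure: adverts are lost both ways -> split brain.
active.hears_peer_adverts = spare.hears_peer_adverts = False
print(is_master(active, spare), is_master(spare, active))  # True True
print(master_down_interval(100, 1.0))  # 3.609375
```

Because `lan_link_up` never enters the decision, the model reproduces both symptoms described: the active unit keeps its role without a LAN, and losing the advert path entirely leaves two masters.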

 

3 - I powered off the switch connected to the active firewall, and I was shocked that it took 1.35 minutes for ALL my traffic to recover. Recovery should depend only on STP, shouldn't it? And even with RSTP timers, any impact at all is just not acceptable in a corporate environment. I should also mention that powering the switch back on caused another 35 seconds of service disruption. On this point, I am still waiting for a reply.
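For scale, here is a back-of-the-envelope calculation using the standard IEEE 802.1D and 802.1w default timers (hello 2 s, max age 20 s, forward delay 15 s). This assumes the MS425s use those defaults, which the thread does not confirm; on that assumption, STP reconvergence on its own would be expected to finish in less than 1.35 minutes.

```python
# IEEE 802.1D / 802.1w default timers in seconds (assumed, not read from the MS425s)
HELLO_TIME = 2
MAX_AGE = 20
FORWARD_DELAY = 15


def legacy_stp_worst_case(max_age: int = MAX_AGE, forward_delay: int = FORWARD_DELAY) -> int:
    """802.1D indirect failure: stored BPDU info ages out after Max Age, then the
    port spends one Forward Delay in Listening and one in Learning."""
    return max_age + 2 * forward_delay


def rstp_indirect_detection(hello_time: int = HELLO_TIME, missed_hellos: int = 3) -> int:
    """802.1w treats a neighbour as gone after three consecutive missed hellos;
    direct link failures are detected as soon as the port goes down."""
    return missed_hellos * hello_time


print(legacy_stp_worst_case())    # 50
print(rstp_indirect_detection())  # 6
```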

 

The HA setup is just giving us a lot of issues. Are any of you also experiencing these problems?

Thank you.

PhilipDAth

The first important thing to know is that MX units don't run spanning tree. They do forward BPDU packets, though. So in your config, a BPDU from the switch goes out one switch port, through the two MX units, and back in on another switch port. The switches see their own BPDU, know that there is a loop, and consequently block one of their own switch ports.
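The blocking decision can be sketched with the spanning-tree priority-vector comparison: when a bridge receives a copy of its own BPDU, the root ID, path cost and bridge ID all tie, so the sender's port ID breaks the tie and the receiving port with the higher port ID stops forwarding (RSTP calls it a backup port). The bridge ID, MAC and port numbers below are hypothetical, the stack is assumed to be the root, and because a hardware stack behaves as a single bridge, both ports are modelled as ports of one bridge.

```python
from dataclasses import dataclass

# Hypothetical identifiers: (priority, MAC as int) for bridges, (priority, number) for ports.
STACK_ID = (32768, 0x0018_0A00_0001)


@dataclass(frozen=True)
class Bpdu:
    root_id: tuple
    root_path_cost: int
    bridge_id: tuple   # bridge that sent the BPDU
    port_id: tuple     # port it was sent from


def state_on_own_bpdu(bridge_id, root_id, root_cost, rx_port_id, bpdu: Bpdu) -> str:
    """Port state when a bridge's own BPDU loops back in on port rx_port_id."""
    if bpdu.bridge_id != bridge_id:
        raise ValueError("this sketch only models a BPDU looping back to its sender")
    # Priority vectors compare element by element; lower is better.
    received = (bpdu.root_id, bpdu.root_path_cost, bpdu.bridge_id, bpdu.port_id)
    own = (root_id, root_cost, bridge_id, rx_port_id)
    # If the looped copy beats what this port would advertise, another port of
    # the same bridge already serves this segment, so this port discards.
    return "discarding" if received < own else "forwarding"


# BPDU leaves port 10 (towards one MX), crosses both MXs, returns on port 60.
bpdu_from_10 = Bpdu(STACK_ID, 0, STACK_ID, (128, 10))
print(state_on_own_bpdu(STACK_ID, STACK_ID, 0, (128, 60), bpdu_from_10))  # discarding

bpdu_from_60 = Bpdu(STACK_ID, 0, STACK_ID, (128, 60))
print(state_on_own_bpdu(STACK_ID, STACK_ID, 0, (128, 10), bpdu_from_60))  # forwarding
```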

 

The design you are using is called a "box" design, because the cabling between the two MXs and the core switches forms a box.

I don't personally like the box design, because you can end up in a scenario where all traffic trying to get to the active MX gets dragged through the standby MX, which is not optimal.

I prefer the approach where each MX is dual-connected, with one link to each core switch, and there is no link between the MXs. The switches will block one of the two links to each MX, but you are guaranteed that traffic always goes directly to the active MX.
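To make the difference concrete, here is a small breadth-first search over the forwarding links in each design, with the stacked MS425s treated as one switch. Which links spanning tree blocks is an assumption: for the box design it is the stack port facing the active MX, as in the original post, and for the dual-connected design one of the two links to each MX is assumed blocked. Link names are hypothetical.

```python
from collections import deque


def forwarding_path(links, blocked, src, dst):
    """Breadth-first search over links not in `blocked`; returns a node list or None."""
    adj = {}
    for name, a, b in links:
        if name in blocked:
            continue  # this link is discarding, so frames cannot use it
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


# Box design: one link per MX plus the MX-MX link; the port facing the active MX blocks.
BOX = [("stack-mxA", "stack", "MX-A"), ("stack-mxB", "stack", "MX-B"), ("mxA-mxB", "MX-A", "MX-B")]
print(forwarding_path(BOX, {"stack-mxA"}, "stack", "MX-A"))  # ['stack', 'MX-B', 'MX-A']

# Dual-connected design: two links per MX, no MX-MX link; one link per MX blocks.
DUAL = [("sw1-mxA", "stack", "MX-A"), ("sw2-mxA", "stack", "MX-A"),
        ("sw1-mxB", "stack", "MX-B"), ("sw2-mxB", "stack", "MX-B")]
print(forwarding_path(DUAL, {"sw2-mxA", "sw2-mxB"}, "stack", "MX-A"))  # ['stack', 'MX-A']
```

In the box model every frame for the active MX crosses the standby MX; in the dual-connected model the path is direct whichever of the two links happens to be blocked, since there is no MX-to-MX link to detour through.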

 

2. What you are describing is a double failure. You'll also find that my suggested approach copes better with a switch or cable failure. Basically, my approach has more links and, as a result, more redundancy.

3. If you repeat the test using my approach, the failover time is usually 30 s at most.

Also, for (3), failback with my approach usually involves no loss of service.

akan33

Thanks for the quick reply, Philip.

Is this expected in stacking mode? I mean, having a root election even among members of the same stack.

PhilipDAth

This has nothing to do with being a root bridge or a non-root bridge.

When you hardware-stack two switches, they act as a single switch. If you take any switch that runs spanning tree and connect it to itself, it will block one of those ports.

akan33

Thanks. According to Meraki Support, there's actually an election based on the lowest MAC address.

I wonder about your scenario: if we have crossed cables and both ports on the other switch are STP-blocked, you say there's no disruption, but I guess there would still be some impact from the recalculation?

PS: I don't tend to stack Meraki core switches like the MS425 (assuming you have just two of them). That is because when you upgrade a stack, both switches have to reboot at the same time.

If they are not stacked, you can upgrade and reboot one, and then the other. So if everything is dual-connected, you won't have an outage.

Also note that "stacking" can be turned off on the two 40 Gb/s ports, turning them into ordinary ports, so you can still connect the switches together using these ports.

Also note that when the switches are not stacked, you can configure one switch as a warm spare for the other. If the primary switch fails, the warm spare takes over the primary's layer 3 configuration.

akan33

I'll take that into account, as stacking in Meraki doesn't seem to bring any benefit in terms of HA and STP.

My problem is that my design is also driven by business decisions: I have to insert a third-party appliance (an IPS) between the MX and the switch, and I can't use more than one link per appliance (it is not in-line yet for my testing).

So I see two options:

1 - use the IPS with only one link, and accept this 1.35-minute downtime, or

2 - create the crossed connections and leave out the IPS.

Thanks again, Philip.

PhilipDAth

Maybe there is a third option.

Are the MXs being used "on a stick" in VPN concentrator mode?

PhilipDAth

If you are using VPN concentrator mode, have you considered running OSPF between the MX600s and the MS425s? I haven't done this specific config, but the OSPF timers are likely to time out much faster.

akan33

I don't have them in VPN concentrator mode. I will check this option to see how it fits. Thank you!

akan33

I am sorry to say it, but thinking of another deployment with ASAs and stacked Cisco switches, the redundancy there is way better from my point of view. The ASA doesn't need VRRP, so we avoid the STP problem, and it monitors whichever links you want, triggering failover perfectly.
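The contrast being drawn is that failover there is triggered by monitored interfaces rather than by heartbeats alone, unlike the VRRP behaviour described in point 2. A hypothetical sketch of such a rule follows; the threshold and the "fail over only if the standby is healthier" condition are simplifying assumptions, not the ASA's documented algorithm.

```python
from typing import Dict


def should_fail_over(active_links: Dict[str, bool], standby_links: Dict[str, bool],
                     threshold: int = 1) -> bool:
    """Hypothetical interface-monitoring rule: fail over when at least `threshold`
    monitored interfaces are down on the active unit and the standby has fewer failures."""
    active_failed = sum(not up for up in active_links.values())
    standby_failed = sum(not up for up in standby_links.values())
    return active_failed >= threshold and standby_failed < active_failed


# Inside link of the active unit lost (for example, its switch powered off).
print(should_fail_over({"inside": False, "outside": True},
                       {"inside": True, "outside": True}))  # True
print(should_fail_over({"inside": True, "outside": True},
                       {"inside": True, "outside": True}))  # False
```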

In my case, I will try the proposed design with the crossed links, but I was expecting more from a next-generation firewall and switches. 1.35 minutes, or even 30 seconds, is not acceptable.