I'm implementing Warm Spare with two MX250s. The two MXs are connected to each other via two links in different VLANs. They pick up their network addressing without problems and both are connected to the Meraki Cloud.
I also have two stacks, one with three switches and the other with two. The issue is this: when I unplug a cable on the INTERNET side, the stack with three switches still reaches the internet without problems because it uses the MX_SPARE as expected, but the stack with two switches doesn't. It loses its connection. I checked the configuration many times and also compared it with the other stack's, and they have the same config.
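For anyone comparing two stacks by hand, a field-by-field diff can catch a mismatch that is easy to miss by eye. A minimal sketch, assuming the uplink port settings have been copied out into plain dictionaries; the field names and values below are hypothetical and are not Meraki API output or the actual config from this thread:

```python
def diff_port_settings(a, b):
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    fields = sorted(set(a) | set(b))
    return {f: (a.get(f), b.get(f)) for f in fields if a.get(f) != b.get(f)}


# Hypothetical uplink settings, for illustration only.
stack1_uplink = {"type": "trunk", "native_vlan": 1, "allowed_vlans": "all"}
stack2_uplink = {"type": "trunk", "native_vlan": 1, "allowed_vlans": "1,10"}

for field, (s1, s2) in diff_port_settings(stack1_uplink, stack2_uplink).items():
    print(f"{field}: stack1={s1!r} stack2={s2!r}")
```

An empty result means the two dictionaries match on every field, including fields that exist on only one side.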
How do the MS stacks connect to the MX's?
They are connected to the MXs by two 1 Gbit/s links. For example, Stack1 (3 MS) is connected from one port 48 to MX_MASTER and from another port 48 to MX_SPARE.
Hi, and yes!
I followed that guide and I did the same steps. The stacks have 1 link to the MASTER and another to the SPARE.
Do both stacks use the same VLAN to uplink to the MX?
What are your MX and MS versions?
Do the two switch stacks have a link between them?
We have an HA pair of MXs, but they are set up with the same configuration. They are connected to one central core switch only (not to different stacks with separate VLANs, as you seem to have), which is then connected to the other stacks. You may wish to reconfigure yours into this setup, as ours works every time with zero issues. Add a core uplink switch to the mix and set up the HA pair with the same VLANs so you don't have issues.
Did you ever figure this out? I am having a similar issue. I have two separate switches, and the idea is that they provide redundancy to our Cisco UCS. I was told by one Meraki tech that the architecture will not work.
It is a bug on that appliance. They are working on a fix; so far, there is no solution.
Am I correct in assuming that your MXs are both active/active, not Warm Spare?
Does anyone have any experience of typical Warm Standby failover times?
We have a design that requires the DC to terminate 2 x Internet and 2 x MPLS. However, if we terminate 1 INT + 1 MPLS on the active MX and it fails, the Warm Standby will take x seconds to become active, during which we lose both INT and MPLS. In this situation, would you have 2 x active MXs on separate networks rather than a warm standby design?
@MickeyDawson Warm Spare does not support an active/active setup when running NAT mode. I believe you can achieve active/active when the MXen are acting as VPN concentrators but I've never personally deployed that setup.
Even with NAT HA, the failover time is extremely quick - likely sub 1 second in most cases. If you are monitoring the LAN side of the warm spare, you'd likely only notice one or two dropped pings.
This doc has some great info on how NAT HA is implemented: https://documentation.meraki.com/MX-Z/Networks_and_Routing/NAT_HA_Failover_Behavior
Hi, in what scenario would the MX have sub-second failover? I would have thought that once it loses connectivity to Meraki there would be a lag before failing over from the existing MX to the warm standby. And if you had an intermittent internet connectivity problem, would they both try to become the primary?
Maybe incorrectly, I thought the failover to warm standby took a minute or two. Maybe I am wrong. 🙂
I haven't even got to that point yet. I am working with a Cisco engineer on the problem and we think we have a solution. My problem is that I cannot get everything forwarding, let alone test for failover.
@MickeyDawson You should take a look at the document I linked to as it describes in detail how VRRP is implemented when running MX warm spare.
In the event of a hardware failure, the spare MX waits 3 seconds (misses 3 VRRP advertisements from the master) before becoming active. If the primary MX changes its VRRP priority to indicate its uplink is down, the spare MX immediately takes over.
Once the spare becomes active, that's where I'm saying you'll likely see sub-second failover. But as you can see from the doc and the above, even in a hardware failure scenario, you're looking at a 3-5 second failover time. It's very quick.
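To make that timing concrete, here is a rough Python sketch of the spare's takeover decision as described in this thread. It is only a model under stated assumptions, not Meraki's implementation: the 1-second advertisement interval is inferred from "3 seconds = 3 missed advertisements", and the `Advert` class, its `uplink_ok` flag (standing in for the lowered VRRP priority), and `spare_takeover_time` are names made up for illustration.

```python
from dataclasses import dataclass

ADVERT_INTERVAL_S = 1.0  # assumption: one VRRP advertisement per second
MISSED_LIMIT = 3         # from the thread: spare waits for 3 missed advertisements


@dataclass
class Advert:
    time_s: float    # when the spare received the advertisement
    uplink_ok: bool  # hypothetical stand-in for the master's advertised priority


def spare_takeover_time(adverts, observe_until_s):
    """Return the time at which the spare becomes active, or None if it never does."""
    hold_s = MISSED_LIMIT * ADVERT_INTERVAL_S
    last_seen = 0.0
    for adv in sorted(adverts, key=lambda a: a.time_s):
        # Too long a silence before this advertisement: hardware-failure path.
        if adv.time_s - last_seen > hold_s:
            return last_seen + hold_s
        # Master signals that its uplink is down: spare takes over immediately.
        if not adv.uplink_ok:
            return adv.time_s
        last_seen = adv.time_s
    # No more advertisements: take over once the hold time runs out.
    deadline = last_seen + hold_s
    return deadline if deadline <= observe_until_s else None


# Master goes silent after t=4 (hardware failure): spare is active 3 s later.
print(spare_takeover_time([Advert(t, True) for t in (1, 2, 3, 4)], 10))  # 7.0

# Master reports its uplink down at t=3: spare takes over at once.
print(spare_takeover_time([Advert(1, True), Advert(2, True), Advert(3, False)], 10))  # 3
```

The two paths match the two cases above: a silent master costs the full hold time, while an uplink failure that the master advertises costs nothing beyond the switchover itself.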
Sorry for the previous posts. After reading the article you advised, I see it's VRRP, so local 😉 Now I understand.
I thought incorrectly it was cloud controlled 🙂
I think I may have a solution that will work for you. I was able to get full redundancy using this configuration across multiple switches.