Meraki

Adoos · ‎Nov 14 2018

Hi,

Seeking some guidance around warm spare configuration with MX65.

Question 1: Does the spare MX need uplinks on both internet interfaces?

Currently we have a primary MX with Internet 1 (MPLS connected) and Internet 2 (behind the SP managed NAT router). The spare MX has no uplink on Internet 1 and uses only Internet 2, connected to the SP managed router.

The service provider managed router assigns the MXs their IP addresses via DHCP. The MPLS uplink uses a static private address.

Question 2: Is it poor practice to use a direct heartbeat cable between the MX devices?

We are using a heartbeat between the two MX devices and also have two MS120 switches connecting both MX devices.

Happy to clarify further if my explanation is no good. As of late we are having very strange symptoms with MX devices fighting each other randomly.

Thanks

AH

MacuserJim · ‎Nov 15 2018

The spare MX does not need an uplink on both WAN interfaces; it just needs at least one path to the internet to function. However, additional uplinks will provide redundancy.

With the Meraki warm spares you shouldn't have a heartbeat cable directly between the two. They will need to be able to see each other on the LAN though so they can send VRRP packets to each other and know when to fail over or not.

Adoos · ‎Nov 15 2018

Hmm, the documentation says this regarding uplinks:

Both MXs must share the same number of uplinks. That is, if the Primary MX has dual uplinks, then the Spare must have dual uplinks as well.

jdsilva · ‎Nov 15 2018

@Adoos wrote:

Question 2: Is it poor practice to use a direct heartbeat cable between the MX devices?
We are using a heartbeat between the two MX devices and also have two MS120 switches connecting both MX devices.

Remove it. Burn it with fire. Banish it to hell.

I've wasted untold hours of my life dealing with problems because of that stupid recommendation. Follow the updated recommended topologies.

https://documentation.meraki.com/MX/Deployment_Guides/MX_Warm_Spare_-_High_Availability_Pair#Recomme...

🙂

http://blog.brokennetwork.ca |

@jdsilva

Adoos · ‎Nov 15 2018

We are also spending countless hours on this and getting random calls from branches that have lost connectivity because of the fighting. We will try removing the heartbeat cable and see if this helps. Meraki confirmed a bug for the MX84 regarding direct heartbeat cables but said the MX65 was not affected.

NolanHerring · ‎Nov 15 2018

Direct cable for VRRP is not model specific. Avoid it always.

Nolan Herring | nolanwifi.com

Aaron_Wilson · ‎Nov 16 2018

Hold on a second, these updated drawings differ drastically from what they had before. In the past I had issues with the primary and secondary Meraki seeing each other via the switch they were connected to, so I *had* to patch the two together with a heartbeat cable.

Are you saying they resolved this? Also, I have a ton of HA pairs in deployment with direct heart-beat in use and nothing negative has occurred.....yet. Is there a certain firmware where this all goes south?

Lastly, I believe the HA pairs have to be layer 2 adjacent in some fashion, correct? So if by chance they are not tied to the same down stream switches, then the heart-beat is needed, right?

This doc still shows direct connection: https://documentation.meraki.com/MX/Networks_and_Routing/NAT_HA_Failover_Behavior#VRRP_Mechanics_for...

NolanHerring · ‎Nov 16 2018

The link you provided does not show a direct connection between the MX appliances. It is just being lazy and leaving out the switches: it summarizes them as 'LAN Connection', which basically means through the switches.

VRRP is sent out all VLANs, so if you have a warm-spare setup, and both of them are connected to the same switch/switches (which they should be), and you have proper VLANs configured on the uplinks between MX to MS, then they will 'see' each other.

Not sure if a specific firmware causes it to go south, and it DID work, and probably still does. The issue is that there have been cases/situations where it CAN cause problems, and ones that are tough to diagnose at that. So Meraki has since updated their documentation to no longer recommend this.

Nolan Herring | nolanwifi.com

jdsilva · ‎Nov 16 2018

Hi @Aaron_Wilson

@Aaron_Wilson wrote:
Hold on a second, these updated drawings differ drastically from what they had before. In the past I had issues with the primary and secondary Meraki seeing each other via the switch they were connected to, so I *had* to patch the two together with a heartbeat cable.

VRRP advertisements are a link-local multicast, so all they require is a continuous layer 2 path between the Active and Standby units. If you have a switch between two MXes, and VRRP cannot propagate between them, you have much more serious issues on your network that should be looked at. A switch's entire purpose in life is to provide layer 2 connectivity, so if it's not doing that then I would be asking myself if my switch vendor choice is the correct one.

Let's also put this in perspective. The layer 2 path that VRRP needs is no different from the layer 2 path your hosts need to reach their default gateway. We're not talking about some kind of special construct needed to handle some kind of special traffic. This is absolutely the most basic functionality required of an Ethernet switch.
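To make the "link-local multicast" point concrete, here is a minimal Python sketch. The group address 224.0.0.18 and IP protocol number 112 are the standard VRRP values from the RFCs, not anything Meraki-specific, and the helper name `stays_on_local_segment` is made up for illustration. It only checks that the destination sits in 224.0.0.0/24, the block that routers do not forward, which is why a continuous layer 2 path is all the advertisements need.

```python
import ipaddress

# Standard VRRP values from the RFCs (not Meraki-specific).
VRRP_MULTICAST_GROUP = "224.0.0.18"
VRRP_IP_PROTOCOL = 112

# 224.0.0.0/24 is the local network control block: packets sent to it
# are never forwarded by routers, so they stay on the local segment.
LOCAL_NETWORK_CONTROL_BLOCK = ipaddress.ip_network("224.0.0.0/24")


def stays_on_local_segment(destination):
    """Hypothetical helper: True if the destination is never routed off-link."""
    return ipaddress.ip_address(destination) in LOCAL_NETWORK_CONTROL_BLOCK


print(stays_on_local_segment(VRRP_MULTICAST_GROUP))
```

Any ordinary access or trunk port that carries the VLAN will pass this traffic, the same way it passes a host's ARP request for its gateway.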

@Aaron_Wilson wrote:
Are you saying they resolved this? Also, I have a ton of HA pairs in deployment with direct heart-beat in use and nothing negative has occurred.....yet. Is there a certain firmware where this all goes south?

The problems are less about firmware and more about different failure scenarios. When you're designing a network with a FHRP it's actually desirable to have the hellos follow a path representative of your user traffic. If you create a "shortcut" for FHRP traffic then you introduce scenarios where the path between your clients and the gateway is utterly broken, but VRRP is humming along just fine because it has a special path just for it.
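A rough way to picture that failure scenario is a reachability check over a toy topology. Everything in this Python sketch is an assumption for illustration: the device names, the VLAN IDs (10 for users, 999 for a dedicated heartbeat link), and the idea that the primary's only switch link has failed. It is not how the MX decides anything internally; it just shows the spare still hearing the primary over the shortcut while clients cannot reach the primary at all.

```python
from collections import deque


def l2_reachable(links, vlan, src, dst):
    """Breadth-first search over the links that carry the given VLAN."""
    neighbors = {}
    for a, b, vlans in links:
        if vlan in vlans:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


USER_VLAN, HEARTBEAT_VLAN = 10, 999  # hypothetical VLAN IDs

# Topology AFTER the primary MX has lost its only link to sw1.
switched_only = [
    ("sw1", "sw2", {USER_VLAN}),
    ("mx_spare", "sw2", {USER_VLAN}),
    ("client", "sw2", {USER_VLAN}),
]
# Same failure, but with a direct heartbeat cable between the MXes.
with_heartbeat = switched_only + [("mx_primary", "mx_spare", {HEARTBEAT_VLAN})]


def spare_hears_primary(links):
    """True if advertisements can still arrive on at least one VLAN."""
    return any(
        l2_reachable(links, vlan, "mx_spare", "mx_primary")
        for vlan in (USER_VLAN, HEARTBEAT_VLAN)
    )


def client_reaches_primary(links):
    return l2_reachable(links, USER_VLAN, "client", "mx_primary")


for name, links in (("switched only", switched_only), ("with heartbeat", with_heartbeat)):
    print(name, spare_hears_primary(links), client_reaches_primary(links))
```

In the switched-only case the spare stops hearing the primary and can take over, and the client still has a path to the spare. With the heartbeat cable the spare keeps hearing the primary, so it stays in standby while the client has no working gateway.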

And that's not the only concern. Consider Spanning Tree, and that the MXes do not participate in your STP topology. If you dual connect two switches to two MXes you break the point-to-point type links that RSTP is looking for. Or put another way, your switches will receive two different BPDUs on each port that's connected to an MX. This can cause STP to become unstable during convergence and lead to longer convergence times. Indeed, this is where I had my most pain: I actually had bridging loops exist for extended periods during reconvergence events, bringing the network to its knees.

Now, granted, the root cause of this issue is the event that's causing STP to reconverge, not the heartbeat cable, but having STP become unstable and taking extended periods to reconverge when it should simply adjust to the changing conditions is a result of poor design. This is a classic case of a poor design in one area being exposed by a problem in another area. For a good network you need to consider all aspects, and how they interoperate with each other, and design failure domains such that an issue with one part doesn't kill another, unrelated part.

@Aaron_Wilson wrote:
Lastly, I believe the HA pairs have to be layer 2 adjacent in some fashion, correct? So if by chance they are not tied to the same down stream switches, then the heart-beat is needed, right?

Yes, they must be layer 2 adjacent. But that doesn't mean they have to connect to the same switch. They only need to be on the same VLAN. And the MXes will send a VRRP advertisement on every VLAN configured on them. The advertisements on all of those VLANs must fail for the MX to fail over (which is actually a poor VRRP implementation, but that's for another post 😉 )
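As a sketch of that "all VLANs must fail" rule, assuming the spare simply tracks whether it has recently heard the primary on each VLAN (the function name and the dictionary shape are hypothetical, not Meraki's actual logic):

```python
def spare_should_take_over(heard_primary_on_vlan):
    """heard_primary_on_vlan maps VLAN ID -> True if an advertisement
    from the primary was recently seen on that VLAN.

    The spare takes over only when it hears nothing on every VLAN.
    """
    return not any(heard_primary_on_vlan.values())


# One VLAN is broken, but the spare still hears the primary elsewhere.
print(spare_should_take_over({1: True, 10: False, 20: True}))

# Advertisements are missing on every VLAN.
print(spare_should_take_over({1: False, 10: False, 20: False}))
```

The first case is the weak spot: clients on VLAN 10 may have lost their gateway, yet no failover happens because the other VLANs still carry advertisements.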

Remember, this is the same type of layer 2 connectivity you need between your hosts and your gateway. If this meant you had to connect your gateway to the same switches as your hosts, then at most you could only ever have one switch in your network! That would be a ridiculous limitation.

@Aaron_Wilson wrote:
This doc still shows direct connection: https://documentation.meraki.com/MX/Networks_and_Routing/NAT_HA_Failover_Behavior#VRRP_Mechanics_for...

What @NolanHerring said 🙂

http://blog.brokennetwork.ca |

@jdsilva

Aaron_Wilson · ‎Nov 16 2018

Thanks guys for the replies, and it makes sense.

It's just very strange how they went from preferring a direct heartbeat cable to going through the downstream switches, given they had the old recommendation for so long.

Adoos · ‎Nov 18 2018

How long have they been in production, and do you have dual internet links on any of them?

So we have two documents showing different physical architectures:

https://documentation.meraki.com/MX/Deployment_Guides/MX_Warm_Spare_-_High_Availability_Pair#Recomme...

and

https://documentation.meraki.com/MX/Networks_and_Routing/NAT_HA_Failover_Behavior#VRRP_Mechanics_for...

I can certainly say that in the real world we are having random issues with the direct cable between two MX devices. It's been confirmed that a software bug exists for the MX84 when the devices are directly connected.

Aaron_Wilson · ‎Nov 19 2018

My main US data center has two MX400s with dual internet links for both and a heartbeat between the two. Running 13.33, zero issues. It's been this way for around 2 years?

Most other head-ends/DCs are single internet (per MX) with heartbeat between the pairs.

One important note, though: each MX is single-threaded to the upstream and downstream switches. They do not cross-connect at the distribution layer as shown in Meraki's updated drawings.

I already know what you are thinking and will say, so no worries. However, keep in mind the Meraki gear we have deployed is in parallel to our core Cisco infrastructure, but again, I know the routine 😉

MX65 Warm Spare
