Meraki MX Firewalls Active / Active

Matt_Collins


This is aimed at Meraki Product Managers. I have had many requirements to put Meraki MXs into data centers in an Active/Active configuration, and I have had to explain to the customer that this configuration is not available. Meraki always states that it is an Enterprise Class vendor, but this leaves a large hole in its offerings for a basic DC requirement. Is this a feature that is coming soon, or is it something that cannot be implemented on the MX? Sadly, I have had to resort to other vendors to meet this requirement (6 times in the last 3 months). Could you please provide some feedback on this? Many thanks.

PhilipDAth

You can do Active/Active when running in VPN concentrator mode...

But tell me: what requirement needs this? What are you gaining over active/standby?

In a high-capacity DC, many customers want A/A enablement for load balancing and quick handover in the event of a hardware failure. All the other manufacturers (Cisco, Juniper, Palo Alto, Check Point, etc.) offer this functionality, and we find it is continually being asked for as a requirement instead of just having a warm spare.

Is this for AutoVPN termination or providing general Internet access? 

It's for both. Different customers want different things, but mainly in a DC our customers want an A/A setup. One of our customers has a branch MPLS network with DC breakout to the Internet, and all traffic is routed out of the main DC. They have the full SEC license on the MXs, and when one failed, the standby took too long to kick in for a banking organisation. Having swapped them out for a pair of Cisco ASAs in an A/A setup, there is now no loss when we fail one of the units.

The case you gave is more of an issue with failover time than failover type. You could have active/active, and if one active node failed you would still have to consider how long it takes before this condition is detected and the remaining active node takes over the entire load.

 

With MPLS, the best failover time is achieved when you run AutoVPN over the top of MPLS between the branch and the DC. This is because AutoVPN sends active probes to detect a far-end failure, rather than waiting to detect it reactively.

 

Let's take the first simple case: active/standby.

The MX implementation of VRRP sends a hello packet every second. If a hello is not seen for three seconds, the standby unit assumes the primary has failed and takes over as the primary.
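A minimal Python sketch of that timing, assuming the master sends a hello exactly every second (at t = 0, 1, 2, ...), the spare receives every one of them, and it takes over exactly three seconds after the last hello it saw. The function names are hypothetical, and real VRRP (RFC 5798) also adds a small skew time to the master-down interval, which this ignores. It shows that detection alone costs roughly two to three seconds before any tunnel or routing recovery even starts.

```python
import math

HELLO_INTERVAL_S = 1.0  # MX VRRP hello interval, as described in the post
DEAD_INTERVAL_S = 3.0   # silence after which the spare takes over, as described


def takeover_time(failure_s: float,
                  hello_s: float = HELLO_INTERVAL_S,
                  dead_s: float = DEAD_INTERVAL_S) -> float:
    """Time the spare becomes master, assuming hellos at t = 0, hello_s, 2*hello_s, ...
    and that a hello due exactly at failure_s was still sent."""
    # Last hello the spare received before the primary died.
    last_hello = math.floor(failure_s / hello_s) * hello_s
    # The spare waits dead_s after that last hello before taking over.
    return last_hello + dead_s


def detection_delay(failure_s: float) -> float:
    """Seconds between the primary failing and the spare taking over."""
    return takeover_time(failure_s) - failure_s


for failure in (10.0, 10.5, 10.9):
    print(f"failure at {failure}s -> takeover at {takeover_time(failure)}s, "
          f"delay {detection_delay(failure):.1f}s")
```

For a failure at t = 10.5 s, the last hello was at 10 s, so the spare takes over at 13 s, 2.5 s after the failure.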

Note that if you have just a single leg into your MPLS network, you will also have to wait for AutoVPN to rebuild. In my experience, connectivity should be restored within 30 s of a primary power failure.

 

https://documentation.meraki.com/MX-Z/Networks_and_Routing/NAT_HA_Failover_Behavior#Additional_VRRP_...

 

"In a working state, the master MX will send VRRP advertisements out to the LAN every second. If the spare MX does not receive any advertisements for three seconds, it assumes that the master MX has failed and will take over as the new master (including sending its own advertisements)."

 

Once you are using AutoVPN, rather than using active/standby at the DC, you can run two individual MXs. It is really important that each MX has a separate stub link to the network core. You can then run BGP from each MX to the network core, and because you are running BGP to the core you can crank the BGP timers right down for rapid layer 3 failover.
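To put numbers on cranking the BGP timers down, here is a small Python sketch. It relies only on two rules from RFC 4271 (the hold time must be either 0 or at least 3 seconds, and keepalives are typically sent at one third of the hold time). The class name is hypothetical, and the 180 s example is a commonly seen Cisco IOS default, assumed here rather than taken from the MX or your core switch. The worst-case detection of a silently failed MX is the full hold time.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpTimers:
    """Hypothetical helper: BGP timer arithmetic based on RFC 4271."""
    hold_time_s: int

    def __post_init__(self) -> None:
        # RFC 4271: the hold time MUST be either zero or at least three seconds.
        if self.hold_time_s != 0 and self.hold_time_s < 3:
            raise ValueError("hold time must be 0 or at least 3 seconds")

    @property
    def keepalive_s(self) -> int:
        # RFC 4271 suggests keepalives at roughly one third of the hold time.
        return self.hold_time_s // 3

    @property
    def worst_case_detection_s(self) -> float:
        # A peer that fails silently is only declared down when the hold
        # timer expires; a hold time of 0 disables the timer entirely.
        return math.inf if self.hold_time_s == 0 else float(self.hold_time_s)


for hold in (180, 3):
    t = BgpTimers(hold)
    print(f"hold={hold}s keepalive={t.keepalive_s}s "
          f"worst-case detection={t.worst_case_detection_s}s")
```

Dropping the hold time from 180 s to 3 s shrinks worst-case detection by a factor of 60, at the cost of more keepalive traffic and a higher risk of flapping on a congested link.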

Now the important thing to note in this case is that every branch builds two AutoVPN tunnels to the DC, one to each MX head end. Each branch will prefer one as the primary and use the other as a backup, but you can split the branches between them. Back to the important bit: because two AutoVPN tunnels are already built, a failure in the primary path does not take as long to fail over, since a second AutoVPN path is already up to fail over to. Plus, the network core with low BGP timers will quickly remove the failed MX as a routing path.
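A hypothetical sketch of the branch split described above: each branch is assigned a primary head-end MX round-robin, keeps a tunnel to every head end, and, when its primary's tunnel is down, simply uses the next one in its list. The hub and branch names are made up, and this models only the selection logic, not how AutoVPN actually builds or probes tunnels.

```python
from typing import Dict, List, Optional, Sequence, Set


def assign_preferences(branches: Sequence[str], hubs: Sequence[str]) -> Dict[str, List[str]]:
    """Round-robin the primary hub across branches; the other hubs follow as backups."""
    preferences: Dict[str, List[str]] = {}
    for i, branch in enumerate(branches):
        k = i % len(hubs)
        # Rotate the hub list so consecutive branches get different primaries.
        preferences[branch] = list(hubs[k:]) + list(hubs[:k])
    return preferences


def active_hub(preference: List[str], tunnels_up: Set[str]) -> Optional[str]:
    """First hub, in preference order, whose tunnel is currently up."""
    for hub in preference:
        if hub in tunnels_up:
            return hub
    return None


hubs = ["dc-mx-1", "dc-mx-2"]  # hypothetical head-end names
prefs = assign_preferences(["branch-a", "branch-b", "branch-c"], hubs)
print(prefs["branch-b"])                            # ['dc-mx-2', 'dc-mx-1']
print(active_hub(prefs["branch-a"], {"dc-mx-2"}))   # dc-mx-1 is down, so dc-mx-2
```

Because the backup tunnel is already established, switching paths is just a lookup in the preference list rather than a tunnel rebuild.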

 

There is some info about this approach at this URL:

https://documentation.meraki.com/MX-Z/Networks_and_Routing/BGP

 

 

Let's talk about ASAs for a moment. ASAs can run in active/active mode in only two cases:

* An AnyConnect user-to-site VPN cluster (not applicable to this case)

* Multi-context mode, where they run virtual firewalls. This is mostly used by service providers. It does not allow you to scale up the performance of a single virtual firewall; it allows you to scale out sideways.

 

So in the case you give, you would be running the ASAs in active/standby mode. However, the ASAs support stateful failover. This allows one ASA to mirror its state to the other. On device failure, this reduces the time for VPNs to rebuild (or NAT sessions to re-establish), since the standby ASA already has this info. Note that MXs in active/standby also mirror their NAT state tables.
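A toy Python model of why stateful failover helps: when the active unit mirrors each new NAT translation to its peer, an existing flow keeps the same translated port after failover; without mirroring, the standby has no entry and the session has to be re-established. The class, the flow key layout, and the port-allocation scheme are illustrative assumptions, not how the ASA or the MX actually replicate state.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# (protocol, inside IP, inside port) -> translated outside port
FlowKey = Tuple[str, str, int]


@dataclass
class Firewall:
    """Toy firewall with a NAT table; optionally mirrors state to a peer."""
    name: str
    mirror_state: bool = False
    peer: Optional["Firewall"] = None
    next_port: int = 40000
    nat_table: Dict[FlowKey, int] = field(default_factory=dict)

    def translate(self, flow: FlowKey) -> int:
        port = self.nat_table.get(flow)
        if port is None:
            # New flow: allocate the next outside port.
            port = self.next_port
            self.next_port += 1
            self.nat_table[flow] = port
            if self.mirror_state and self.peer is not None:
                # Stateful failover: copy the entry and the allocator to the standby.
                self.peer.nat_table[flow] = port
                self.peer.next_port = self.next_port
        return port


def flow_survives_failover(mirror_state: bool) -> bool:
    standby = Firewall("standby")
    active = Firewall("active", mirror_state=mirror_state, peer=standby)
    flow: FlowKey = ("tcp", "10.0.0.10", 51515)
    port_before = active.translate(flow)
    # The active unit now fails and the standby takes over the flow.
    had_entry = flow in standby.nat_table
    return had_entry and standby.translate(flow) == port_before


print(flow_survives_failover(mirror_state=True))
print(flow_survives_failover(mirror_state=False))
```

Note that mirroring only shortens recovery once the standby has taken over; it does nothing to shorten the time needed to detect the failure in the first place.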

ASAs also allow you to adjust the failover timers, which MXs do not. So you can crank down the failover timers so that the pair responds sub-second if desired.

 

 

So you see, it is not really a question of running active/active, but of the time to respond to and recover from a primary failure, whether it be an MX, an ASA, or something else.

 

In Meraki's case, some of this can be addressed by the network design and by taking advantage of AutoVPN. Potentially, some of this can be addressed by using active/active MXs at the head end with BGP.

 

 

What you probably need is for Meraki to provide more fine-grained control of the failover criteria, so you can control the time it takes to detect and recover from a failure.

Thanks for the reply, but the answer is really simple: does Meraki have the feature to enable Active/Active on the MX on their timeline? I totally understand what you are saying, but it doesn't really answer the question; it just goes a long way around to avoid the answer.

The point being: why build something that isn't needed? I have given a pretty good breakdown of why active/active is not the solution for the issue presented.

 

I hope Meraki are not putting any effort into this when other things are needed more badly, like IKEv2 support, IPv6 support, and AnyConnect support.

Please, in the link below, is a one-armed concentrator mandatory to establish an eBGP connection on the DC side? Is it not possible to establish an eBGP connection between the L3 switch and the MX?

 

https://documentation.meraki.com/MX-Z/Networks_and_Routing/BGP

Hello. Just to say first, I really like Meraki components and the dashboard, but honestly, for the MX I still agree with Matt: it's difficult to tell the customer it can be considered Enterprise Class when Active/Active is not there, nor things like objects or IP groups for setting up L3 rules more easily. Anyway, it's not there, but for me (even though I'm a tech engineer and not in sales) it is simply not right to buy a second piece of hardware at the same price if it is just there "in case" and so is not used or running actively. So I would suggest that Meraki create a sales pack/bundle of 2 MXs with the second at half price if it is kept in standby mode. OK, this is not a technical argument, but you know that in many cases commercial arguments are stronger for winning the deal...

You realise that you don't have to buy a license for a warm standby, so that second unit is considerably cheaper than the first?

I've said it before and I'll say it again: you do realise you can do active/active for AutoVPN deployments?

 

Here is the basic guide using BGP:

https://documentation.meraki.com/MX-Z/Networks_and_Routing/BGP

 

Here is the DC-DC failover guide:

https://documentation.meraki.com/MX-Z/Deployment_Guides/Datacenter_Redundancy_(DC-DC_Failover)_Deplo...

Hi,

Sorry for reopening this conversation. IMO, this paragraph in both supplied documents:

 

"In a DC-DC failover design, a spoke site will form VPN tunnels to all VPN hubs that are configured for that site. For subnets that are unique to a particular hub, traffic will be routed directly to that hub so long as tunnels between the spoke and hub are established successfully. For subnets that are advertised from multiple hubs, spokes sites will send traffic to the highest priority hub that is reachable."

 

For me, it means no real active/active is provided when both hubs have to advertise the same subnets. OK, you can split the spokes and configure 50% with hub 1 as primary and 50% with hub 2 as primary. Even so, traffic won't go directly to the desired hub in all cases.
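The routing rule quoted above can be sketched in a few lines of Python. This is an interpretation of the documented behavior under simplifying assumptions (subnets compared as exact strings, a per-spoke hub priority list, and a set of hubs whose tunnels are currently up); the function and hub names are hypothetical. It also illustrates the point being made: a subnet advertised by both hubs always goes to the spoke's highest-priority reachable hub, regardless of which hub is actually closer to that traffic.

```python
from typing import Dict, List, Optional, Set


def pick_hub(subnet: str,
             hub_priority: List[str],
             advertised: Dict[str, Set[str]],
             reachable: Set[str]) -> Optional[str]:
    """Highest-priority reachable hub that advertises the subnet, else None."""
    for hub in hub_priority:
        if hub in reachable and subnet in advertised.get(hub, set()):
            return hub
    return None


# Hypothetical hub advertisements: 10.0.0.0/16 is shared by both hubs.
advertised = {
    "hub1": {"10.1.0.0/16", "10.0.0.0/16"},
    "hub2": {"10.2.0.0/16", "10.0.0.0/16"},
}
priority = ["hub1", "hub2"]  # this spoke prefers hub1

print(pick_hub("10.2.0.0/16", priority, advertised, {"hub1", "hub2"}))  # hub2 (unique subnet)
print(pick_hub("10.0.0.0/16", priority, advertised, {"hub1", "hub2"}))  # hub1 (shared subnet)
print(pick_hub("10.0.0.0/16", priority, advertised, {"hub2"}))          # hub2 (hub1 down)
```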

 

It would be great if you could set the hub priority on a per-subnet basis.

 

This behavior makes me think the Meraki solution was designed with dual-Internet deployments in mind and not hybrid MPLS-Internet. If you have a hybrid solution, your hub sites (for cost reasons) usually have one access circuit per "color", not two. So you need hub 1 to also inject all of hub 2's subnets (and vice versa) in order to cover the outage of a hub's single access circuit. Usually some of your spokes have only a single access circuit too...

 

Another thing that makes me think Meraki does not fit well with MPLS is the fact that all MX WAN interfaces need to reach the Meraki cloud. For Internet access this is not an issue. For MPLS, a service provider has to convince its customer to dedicate some bandwidth from the customer's DC Internet access to your solution's control/management traffic (aggregated across all your MPLS sites), or you have to deploy a dedicated Internet access in your SP network for this.

 

Regards,

Chema.

 

 

akan33

Well, ASAs monitor the LAN interfaces for failover purposes, right? Meraki doesn't. If you don't have a proper mesh between the switches and the MX, you could see some undesired behavior. Imagine you have an issue between the switch and the active unit: that unit remains active, so traffic would go towards the passive unit and then on to the active one... Something to take into account.
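A simplified Python sketch of the difference being described. With LAN monitoring enabled, the active unit hands over when its peer has more healthy monitored interfaces; with it disabled (the MX behavior, as the poster describes it), a broken switch-to-firewall link alone never triggers failover. This comparison rule is an assumed simplification, not an exact copy of any vendor's failover policy, and the interface names are hypothetical.

```python
from typing import Dict


def should_fail_over(active_links: Dict[str, bool],
                     standby_links: Dict[str, bool],
                     monitor_lan: bool) -> bool:
    """Decide whether a still-running active unit should hand over to its peer.

    Simplified policy (an assumption): if interfaces are monitored, fail over
    only when the standby has more healthy monitored links than the active.
    """
    if not monitor_lan:
        # No interface monitoring: a broken LAN link alone never triggers failover.
        return False
    active_healthy = sum(active_links.values())
    standby_healthy = sum(standby_links.values())
    return standby_healthy > active_healthy


# Hypothetical scenario from the post: the switch link to the active unit breaks.
active = {"lan": False, "wan": True}
standby = {"lan": True, "wan": True}
print(should_fail_over(active, standby, monitor_lan=True))   # True
print(should_fail_over(active, standby, monitor_lan=False))  # False
```

Comparing against the peer's health, rather than failing over on any single link loss, avoids handing over to a unit that is no better off.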

 

For that specific DC scenario, I wouldn't consider the MX but rather another type of firewall. The MX is intended for enterprise only.

I like the MX line for what it is, but it's never struck me as even close to real "enterprise" class, despite what the marketing department says.

dpf

Enterprise class is sort of an arbitrary term. If it means being able to change route metrics like on a traditional router, then an MX is not enterprise class. If it means not being limited to two uplinks, an MX is not enterprise class. If it means doing true active/active, an MX is not enterprise class.

jdizzle

You can do active-active in the data center; it's just not with warm spare mode.

 

You could additionally have warm spares for each of the active nodes for redundancy.

 

As I understand it, the argument against an active warm spare is that if you have two MXen worth of load, then you don't really have high availability. The moment one MX goes down, the other will choke and drop traffic.
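The capacity argument can be checked with simple arithmetic, sketched here in Python: an active/active pair (or any group of N nodes) only stays highly available if the nodes left after losing the largest one can still carry the whole load. The function name and the throughput figures are hypothetical, not MX datasheet numbers.

```python
from typing import Sequence


def survives_one_failure(capacities_mbps: Sequence[float], load_mbps: float) -> bool:
    """True if the nodes left after losing the largest one can carry the whole load."""
    if len(capacities_mbps) < 2:
        return False  # a single node has nothing to fail over to
    remaining = sum(capacities_mbps) - max(capacities_mbps)
    return load_mbps <= remaining


# Hypothetical figures: 1 Gbps nodes.
print(survives_one_failure([1000, 1000], 800))         # True: one node can carry it alone
print(survives_one_failure([1000, 1000], 1500))        # False: "two MXen worth of load"
print(survives_one_failure([1000, 1000, 1000], 1500))  # True: N+1 headroom
```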

 

It sounds like the feature you really want is clustering, where N MXen can act as a single logical one. Active-active is just a lazy version that only supports 2 nodes.
