MX Failover does not update public IP of attached MS or MR

SOLVED
Ramtech
Here to help

Hi all,

We have an annoying problem at the moment.  Brief intro to the config: an MX with 2 WAN ports, not using Load Balancing or Active-Active AutoVPN.  Primary uplink is W1.  An MS is attached to the MX (no SVIs), and an MR is also attached to the MX (no SVIs).
Our problem is that when the MX fails over to W2 after a failure of W1, the public IP of the MS and the MR (and hence the source address of all attached devices) stays as the W2 IP for many hours (or days) after the MX has switched back to the primary W1 address.  All traffic from devices attached to the MS or MR is seen as coming from the W2 address until this updates.
If I reboot the MS or MR, they update and all is well.  But this is cumbersome.
The problem this causes is with SIP traffic: we get unidirectional RTP (and hence one-way audio) until they update.
How can I get the MS and MR to update their public IP (and hence the source address of all attached devices) to the active MX WAN interface immediately, or at least within minutes?
Thanks in anticipation.
Ross.

Regards
Ross

7 REPLIES
AjitKumar
Head in the Cloud

Hi @Ramtech,

Seems strange.

As the MX is the L3 and NAT device, I understand all the downstream devices should use W1 / W2 according to which link is available.

I do not see any role for the MS / MR here. They are just L2 (no NAT).

The NATed (public) IP is what the MS / MR use to reach the cloud. The devices (endpoints, i.e. laptops / desktops) connected downstream should behave the same way.

One thought - could you please review Uplink selection & Internet Flow configuration.

  • Security & SDWAN > SDWAN & Traffic Shaping > Uplink selection

[Screenshot: Load Balancing.PNG]

Also, you may want to review the firmware configuration.

Regards,
Ajit
AjitsNW@gmail.com
www.ajit.network

Hi Ajit.

Firmware is the latest on all devices.
As I mentioned in the OP, I am not using Load Balancing, Active-Active AutoVPN, or flows, and I specifically do not want to use them because of the other issues they create.
The problem is exactly as described in the OP.  When the MX fails over and then later switches back to primary, the MS and MR, and all devices connected to them, do not follow.  They remain on the WAN 2 IP for a considerable period afterwards, normally on the order of 1 day.  This is the behavior I am trying to rectify.
I have replicated the same behavior in other tenancies and other networks as well.  It seems to be an issue unique to Meraki, for some reason I am unaware of.

Ross

Regards
Ross
AjitKumar
Head in the Cloud

Hi Ross,

Apologies I did not read the OP properly.

I have no explanation for this behavior. Usually config is updated in near real time on Meraki solutions.

I hope Meraki Support can help us understand / fix this behavior.

Regards,
Ajit
AjitsNW@gmail.com
www.ajit.network

Yeah, it's weird.  But I have confirmed it in several of our environments, so it seems to be systemic. Still waiting for any response other than acknowledgement of receipt (gosh, Meraki support is slow in responding).

Regards
Ross
Ramtech
Here to help

So after speaking to Meraki Support, I now know the following.

  • It is expected behaviour.  The MX maintains maps of flows for a minimum of 5 minutes to facilitate what the support rep described as graceful failover.
  • For ANY flow the MX has a map for (i.e. any flow seen in the last 5 minutes), it will hold that flow to the previous map indefinitely.
  • So if the MX fails over to WAN2, it will disrupt traffic and force the traffic to use WAN2. Expected.
  • When the MX switches back to primary, it keeps using its maintained maps for a flow until there has been no traffic from that Src to that Dst for > 5 minutes. (Seems ridiculous to me.)
  • So this means a flow will stay on the failover WAN forever if there is steady traffic from that Src to that Dst.
  • There is apparently a back-end (Meraki engineer only) way to change this, BUT...
    • You must have the MX on a dedicated MX network, not a combined network,
    • AND you must be running v17.x firmware, which at time of writing is not a stable release.
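
For anyone trying to picture why a chatty flow never moves back, here is a rough Python sketch of the behaviour as support described it. It is a toy model only, not Meraki's actual implementation: the `FlowTable` class, its method names, and the exact 300-second comparison are my own assumptions for illustration.

```python
from dataclasses import dataclass, field

IDLE_TIMEOUT_S = 300  # 5 minutes, per the support explanation (assumed exact value)


@dataclass
class FlowTable:
    """Toy model of per-flow uplink pinning with an idle timeout (hypothetical)."""
    active_uplink: str = "WAN1"
    # (src, dst) -> (pinned uplink, time the last packet was seen)
    flows: dict = field(default_factory=dict)

    def fail_over(self, uplink: str) -> None:
        """Hard failure of the current uplink: every flow is forced onto the new one."""
        self.active_uplink = uplink
        self.flows = {key: (uplink, seen) for key, (_, seen) in self.flows.items()}

    def fail_back(self, uplink: str) -> None:
        """Graceful failback: existing mappings are left alone."""
        self.active_uplink = uplink

    def send(self, src: str, dst: str, now: float) -> str:
        """Return the uplink used for a packet sent at time `now` (seconds)."""
        entry = self.flows.get((src, dst))
        if entry is not None and now - entry[1] <= IDLE_TIMEOUT_S:
            uplink = entry[0]            # mapping still live: stay pinned
        else:
            uplink = self.active_uplink  # no mapping, or idle too long: remap
        self.flows[(src, dst)] = (uplink, now)  # every packet refreshes the timer
        return uplink


if __name__ == "__main__":
    table = FlowTable()
    print(table.send("phone", "sip-provider", now=0))  # WAN1
    table.fail_over("WAN2")
    table.fail_back("WAN1")
    # A keepalive every 60 s never lets the mapping go idle.
    print({table.send("phone", "sip-provider", now=t) for t in range(60, 3600, 60)})  # {'WAN2'}
    # After more than 300 s of silence the flow is remapped to the primary.
    print(table.send("phone", "sip-provider", now=3540 + 301))  # WAN1
```

In this model a SIP phone that sends keepalives more often than every 5 minutes refreshes its own mapping on every packet, which would explain the flow sitting on WAN2 for hours or days until the device is rebooted or goes quiet.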

This seems quite ludicrous to me.  Why would you want the flow to remain on the secondary WAN (in a non-load-balanced network) after switching back to primary?  I can understand wanting a TCP session to finish to avoid interruption, but not the entire flow.  So a 5-10 second window I could understand for a smooth transition between WANs, but not 5 minutes.
Anyway, that is what I have been told.  Hope that helps someone else who runs afoul of this.

Regards
Ross

One correction: it can be enabled on combined networks. It doesn't support template-bound networks today.

PhilipDAth
Kind of a big deal

Ask support to enable the "Enhanced Failover" option.  Then configure the MX for "immediate" failback and your SIP issue will be resolved.

https://documentation.meraki.com/MX/Firewall_and_Traffic_Shaping/Connection_Monitoring_for_WAN_Failo... 
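
If you would rather script the "immediate" setting than click through the dashboard, something like the sketch below may work once support has enabled the feature. The field names (`defaultUplink`, `loadBalancingEnabled`, `activeActiveAutoVpnEnabled`, `failoverAndFailback.immediate.enabled`) are my recollection of the Dashboard API v1 uplink selection request body, so treat them as assumptions and check the current API reference before relying on them. The helper `build_uplink_selection_body` is a hypothetical name, and the sketch only builds the JSON body; it makes no API call.

```python
import json


def build_uplink_selection_body(default_uplink: str = "wan1",
                                immediate_failback: bool = True) -> dict:
    """Build a request body for the uplink selection settings.

    Field names are assumed from the Dashboard API v1 docs; verify before use.
    """
    return {
        "defaultUplink": default_uplink,      # primary uplink (W1 in the OP's setup)
        "loadBalancingEnabled": False,        # the OP does not use load balancing
        "activeActiveAutoVpnEnabled": False,  # nor Active-Active AutoVPN
        "failoverAndFailback": {
            "immediate": {"enabled": immediate_failback},
        },
    }


if __name__ == "__main__":
    # Print the body so it can be reviewed before being sent anywhere.
    print(json.dumps(build_uplink_selection_body(), indent=2))
```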

 

> Why would you want the flow to remain on the secondary WAN (in a non-load-balanced network), after switching back to primary

 

So that it does not break anything using TCP.  For example, we use MS Teams for our phone system (which is TCP based).  By not doing an immediate failback, anyone on a phone or video call can continue to complete that call without any interruption.
