Sticky Failover

Kunst
Just browsing

MX64 with primary WAN (1) connected to a fiber-based service provider's 500M pipe and secondary WAN (2) (AKA LAN 4) connected to an AT&T LTE backup router. Public IPs are on both networks.

 

We recently discovered that during a hard failure of WAN 1, services automatically move to WAN 2 as expected. However, when WAN 1 service was restored about 30 minutes later, the MX64 did not force all services back to WAN 1, leaving some devices and VOIP services traversing WAN 2, which later caused data overages on the AT&T LTE backup.

 

We are NOT load sharing; WAN 1 is primary.

 

Any guidance the community can provide is appreciated.

 

mx64-1.JPG

5 REPLIES
UCcert
Kind of a big deal

Hi @Kunst , that doesn't sound right at all.

 

Have a quick read of the below link - https://documentation.meraki.com/MX/Firewall_and_Traffic_Shaping/Connection_Monitoring_for_WAN_Failo...

 

Sounds like a connection monitoring issue. Do you have the below ports and IPs open on your firewall?

 

UCcert_0-1614149163375.png

 

Darren O'Connor | uccert.co.uk
https://www.linkedin.com/in/darrenoconnor/

I'm not an employee of Cisco/Meraki. My posts are based on Meraki best practice and what has worked for me in the field.
ww
Kind of a big deal

AFAIK all the existing (TCP) sessions keep using WAN 2 until they end. Meraki does not actively reset them. For some WAN VoIP services, I can imagine they don't break the TCP session for a long time.
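
To illustrate the behaviour described above, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not Meraki's actual implementation: the `ToyRouter` class, the `FlowKey` fields, and the example addresses (taken from documentation ranges) are all hypothetical. It shows an existing flow staying pinned to WAN 2 after WAN 1 comes back, while a new flow (or the same flow after it ends and restarts) goes out WAN 1.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FlowKey:
    """Hypothetical flow identifier: source, destination, protocol/port."""
    src: str
    dst: str
    proto: str


@dataclass
class ToyRouter:
    """Toy model of sticky failover; NOT how the MX is actually implemented."""
    primary: str = "wan1"
    wan_up: dict = field(default_factory=lambda: {"wan1": True, "wan2": True})
    flows: dict = field(default_factory=dict)  # FlowKey -> pinned uplink name

    def preferred_uplink(self) -> str:
        # Use the primary when it is up, otherwise the first uplink that is up.
        if self.wan_up[self.primary]:
            return self.primary
        for name, up in self.wan_up.items():
            if up:
                return name
        raise RuntimeError("no uplink available")

    def send(self, key: FlowKey) -> str:
        # An existing flow keeps its uplink as long as that uplink is up.
        uplink = self.flows.get(key)
        if uplink is None or not self.wan_up[uplink]:
            uplink = self.preferred_uplink()
            self.flows[key] = uplink
        return uplink

    def end_flow(self, key: FlowKey) -> None:
        # Models the session ending (e.g. phone reboot or session teardown).
        self.flows.pop(key, None)


def demo() -> list:
    router = ToyRouter()
    phone = FlowKey("192.0.2.50", "203.0.113.10", "sip")      # hypothetical phone
    laptop = FlowKey("192.0.2.51", "203.0.113.20", "https")   # hypothetical new flow
    seen = [router.send(phone)]        # normal operation
    router.wan_up["wan1"] = False      # hard failure of WAN 1
    seen.append(router.send(phone))    # flow moves to the backup
    router.wan_up["wan1"] = True       # WAN 1 restored
    seen.append(router.send(phone))    # existing flow stays on the backup
    seen.append(router.send(laptop))   # new flow uses the primary
    router.end_flow(phone)             # session ends
    seen.append(router.send(phone))    # fresh session uses the primary
    return seen


if __name__ == "__main__":
    print(demo())
```

In this model, the only way to move the phone back to WAN 1 is for its flow to end or for WAN 2 to go down, which matches the idea that a long-lived VoIP session can sit on the backup link indefinitely.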

Kunst
Just browsing

Hi WW and thanks for your reply. 

What you are describing seems to be related to our issue.

 

Agreed, the VOIP services are always communicating with the hosted voice switch. My assumption based on your response is that we need to have the devices cease communicating, which seems problematic.

 

I contacted our VOIP service provider today, who sent me a screenshot showing some phones on the primary IP and some on the backup IP while the primary is up, which is the real issue. Their support team recommended reducing the DHCP lease time... but I am not sure that is really going to do anything to resolve this issue.

 

Are there any fixes or settings we can make in the Meraki that would help reduce the time the Meraki waits before switching the devices back to primary?

 

Kunst  

ww
Kind of a big deal

I do not know of a fix, other than just rebooting the phones or disconnecting WAN 2 of the MX.

Reducing the lease time does not disconnect any device. It just lets the phone renew its lease more often.

PhilipDAth
Kind of a big deal

@ww is correct.  This is expected behaviour.

 

For VoIP services, we always use SIP/TLS. Because it is TCP-based, the changeover breaks the TCP session and forces the phone to make a fresh registration and recover.

 

See if your provider can support something TCP based.  Ideally SIP/TLS.
