Hi team Meraki,
I come before you again with another WAN failover problem, although to be fair I think this is really a continuation of the same problem I've had for years, just with a new client.
The question: Is there any way (either from the front-end GUI or via a support team modification through the back end) to configure a timeout after which active flows have their routing preference reset?
The backstory:
Meraki MX64
- WAN1 - wired NBN internet link.
- WAN2 - 4G modem with static IP
Meraki is configured with WAN1 as the primary uplink, with failover to WAN2 as needed when WAN1 drops.
Problem:
- Client has an on-prem VoIP phone system.
- Phone system maintains a long-running SIP connection for signaling back to the provider's environment.
- Phone system processes the actual voice calls on a different IP/port socket. This bounces up and down as calls are made, etc.
- Client's WAN1 has been unreliable and keeps dropping out.
- Aside - I suspect these dropouts are false positives caused by link saturation, which makes the Meraki DNS probes time out. Either way, it doesn't change the actual problem.
- When WAN1 fails, all traffic is mapped over to WAN2. Phones continue to work, etc.
- WAN1 comes back up. Web browsing, etc, is mapped back to WAN1. Easy peasy - these flows start and stop all the time, so it's easy for them to time out (~5 mins from memory) in the Meraki and then be established next time on the primary WAN.
- But the phone traffic, specifically the signaling connection, stays on WAN2. Because this is a long-running connection with keepalives, it never drops. If it does drop (by restarting the VoIP server, for instance), it attempts to reconnect to the upstream WELL within the 5-minute flow timeout. Because of this, it NEVER fails back from WAN2. It just stays on WAN2, chewing up expensive data and having all the unreliability of a 4G/cellular connection.
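To illustrate what I think is going on, here's a toy model in Python. This is NOT Meraki's code, just my mental picture of it: a flow is pinned to an uplink and only gets re-routed once it has been idle for the flow timeout. The 300-second timeout is my "~5 mins from memory" guess, and the 30-second keepalive interval and the function name are made up for the example.

```python
IDLE_TIMEOUT_S = 300  # assumption: "~5 mins from memory", not a documented value


def uplink_history(packet_times, wan1_up_from, idle_timeout=IDLE_TIMEOUT_S):
    """Return the uplink used for each packet of a single flow.

    WAN1 is considered down until `wan1_up_from` (seconds). The flow stays
    pinned to its uplink until it has been idle for `idle_timeout` seconds;
    only then is the routing decision made again.
    """
    pinned = None
    last_seen = None
    history = []
    for t in packet_times:
        expired = last_seen is None or (t - last_seen) >= idle_timeout
        if expired:
            # New (or timed-out) flow: pick the best uplink available right now.
            pinned = "WAN1" if t >= wan1_up_from else "WAN2"
        history.append(pinned)
        last_seen = t
    return history


# SIP signaling: a keepalive every 30 s for an hour, WAN1 back after 10 min.
sip = uplink_history(range(0, 3600, 30), wan1_up_from=600)

# A bursty web-style flow with gaps longer than the idle timeout.
web = uplink_history([0, 100, 500, 1000], wan1_up_from=600)

print(set(sip))  # the SIP flow never leaves WAN2
print(web)       # the web flow ends up back on WAN1
```

In this model the SIP flow is never idle long enough to expire, so it is never re-routed, which matches what I'm seeing.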
I understand that the software is designed like this so that it doesn't induce a second outage on the failback. For most things, this is fine. But in this scenario it means that some things stay on the secondary link forever.
What I'd ideally love is a way to keep the default behaviour, but mark specific flows as needing to be reevaluated every so often (30 mins, say?). If after 30 mins the flow is reevaluated and better matches a different route, the flow preference should be hard-switched to the new interface/route. If that causes the connection to drop and need to be reestablished, so be it.
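To make the ask concrete, here's a rough Python sketch of the behaviour I'm after. To be clear, none of this exists in Meraki as far as I know: the `reevaluate` flag, the 30-minute interval and all the names are hypothetical, purely to show the logic.

```python
from dataclasses import dataclass

REEVAL_INTERVAL_S = 30 * 60  # hypothetical: "30 mins, say?"


@dataclass
class Flow:
    name: str
    uplink: str
    last_evaluated: float     # seconds; when the route was last chosen/checked
    reevaluate: bool = False  # hypothetical per-flow "re-check me" flag


def preferred_uplink(wan1_up):
    """WAN1 is primary; WAN2 is only preferred while WAN1 is down."""
    return "WAN1" if wan1_up else "WAN2"


def reevaluate_flows(flows, now, wan1_up, interval=REEVAL_INTERVAL_S):
    """Hard-switch flagged flows whose interval has elapsed.

    Unflagged flows keep the default sticky behaviour. Returns the names
    of the flows that were moved to a different uplink.
    """
    best = preferred_uplink(wan1_up)
    moved = []
    for flow in flows:
        if not flow.reevaluate or now - flow.last_evaluated < interval:
            continue
        if flow.uplink != best:
            flow.uplink = best  # may drop the connection; so be it
            moved.append(flow.name)
        flow.last_evaluated = now
    return moved


flows = [
    Flow("sip-signaling", "WAN2", last_evaluated=0, reevaluate=True),
    Flow("bulk-download", "WAN2", last_evaluated=0),
]

early = reevaluate_flows(flows, now=600, wan1_up=True)   # too soon, nothing moves
late = reevaluate_flows(flows, now=1800, wan1_up=True)   # 30 min elapsed
print(early, late)
```

Only the flagged SIP flow gets moved back to WAN1 after 30 minutes; everything else keeps the existing "don't cause a second outage" behaviour.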
Am I dreaming here, or has someone else already solved this problem with Meraki devices, long-running connections and WAN failover?
Cheers,
Matt