Odd SD-WAN Traffic Behavior - *Some* Traffic from Concentrators Arriving on WAN2, not WAN1

Crocker
A model citizen

Our topology is classic hub and spoke. We have 2 hubs (at separate datacenters), configured in one-armed concentrator mode. On the spoke side, we use full-tunnel active/active AutoVPN. Our spokes generally have a wired ISP connected to WAN 1, and an MG21 on WAN 2.

 

Our spoke-side SD-WAN policy is simple. We instruct the local VOIP subnet to "use the uplink that's best for VOIP", which I understand is based on MOS. The rest of the traffic has no associated policy, thus should only traverse WAN 2 if WAN 1 goes down hard.
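To make that expectation concrete, here is a minimal Python sketch of the policy as I understand it. It is not Meraki's actual selection algorithm. The `Uplink` class and `expected_uplink` function are hypothetical names made up for illustration, and the sketch models only the spoke's own outbound decision, not how a hub picks a tunnel for return traffic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Uplink:
    """Hypothetical model of one spoke uplink (not a Meraki API object)."""
    name: str
    up: bool
    mos: Optional[float] = None  # MOS as shown in the VPN statistics, if known


def expected_uplink(is_voip: bool, wan1: Uplink, wan2: Uplink) -> Optional[str]:
    """Return the uplink the policy described above should pick, or None if both are down."""
    # Only uplinks that are up are candidates; WAN 1 is listed first as primary.
    candidates = [u for u in (wan1, wan2) if u.up]
    if not candidates:
        return None
    if is_voip:
        # "Best for VoIP" is assumed here to mean highest MOS.
        # max() keeps the first candidate on a tie, so WAN 1 wins ties.
        return max(candidates, key=lambda u: u.mos if u.mos is not None else 0.0).name
    # No policy: stay on the primary unless it is hard down.
    return candidates[0].name
```

With WAN 1 at a MOS of 4.1 and WAN 2 anywhere in 2.5-3.8, this picks WAN 1 for both VoIP and non-VoIP traffic, which is why packets on WAN 2 look wrong to me.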

 

What I'm seeing is that if I run a packet capture on the Site-To-Site VPN over WAN 2, there are a handful of packets sourced from our VOIP controllers traversing WAN 2. *Most* of a given flow is bidirectional across WAN1 (verified by running side-by-side packet captures), but occasionally a few packets come in across WAN2. I've double-checked the VPN statistics, and WAN1 is consistently at a MOS of 4.1, while WAN2 wavers a fair bit (due to bad cellular reception) from 2.5-3.8.
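For anyone who wants to reproduce the side-by-side comparison, below is a rough Python sketch of how the two captures could be tallied per flow. It assumes the captures have already been reduced to `(uplink, src_ip, dst_ip)` tuples by whatever tool you prefer. The function names are hypothetical, and the addresses in the usage note are documentation-range placeholders, not our real ones.

```python
from collections import Counter, defaultdict


def packets_per_uplink(packets):
    """Count packets per uplink for each flow.

    packets: iterable of (uplink, src_ip, dst_ip) tuples.
    A flow is keyed by its sorted endpoint pair, so both directions count together.
    """
    tally = defaultdict(Counter)
    for uplink, src, dst in packets:
        flow = tuple(sorted((src, dst)))
        tally[flow][uplink] += 1
    return tally


def split_flows(tally):
    """Return only the flows that were seen on more than one uplink."""
    return {flow: dict(counts) for flow, counts in tally.items() if len(counts) > 1}
```

For example, feeding in three packets between `192.0.2.10` and `198.51.100.20` on `"WAN1"` plus one on `"WAN2"` makes `split_flows` report that flow with a count of 3 on WAN1 and 1 on WAN2, which matches the pattern I'm describing: most of the flow on WAN1, a stray packet or two on WAN2.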

 

As far as I can tell, I should never see any Site-To-Site VPN traffic cross WAN 2. However, I'm absolutely seeing this occur.

 

What got me looking at this in the first place was a complaint about VOIP quality from a specific spoke. I found that disabling WAN 2 (which, again, is a bit cruddy due to bad cellular service) cleared up the issue.

 

I know the 'fix' for the VOIP complaints is likely to either leave WAN 2 disabled or adjust the VOIP policy to only shift traffic if WAN 1 is hard down. However, I'm stumped as to why *some* traffic from my hub to my spoke is crossing WAN 2.

 

Any insights? I do have a ticket open with Support regarding this behavior, but thought I'd ask the community as well.

5 Replies
ww
Kind of a big deal

If the phone service has ever failed over to WAN2 and the phone keeps an active session, that session stays on WAN2 indefinitely (until it times out).

Crocker
A model citizen

I wondered about this. However, I knocked WAN2 down for 10-15 minutes (assuming that was long enough to sever any flows). When I stood it back up, I immediately started seeing this behavior.

DarrenOC
Kind of a big deal

Hi @Crocker - what version of firmware are you running?  Sounds like a bug to me.  Have you trawled through your other SD-WAN & Traffic shaping settings to ensure you have nothing erroneous configured?

 

I would assume you have your Primary uplink set as WAN1.  For giggles have you tried flicking WAN failover and fallback from Graceful to Immediate?

 

 

As you've correctly stated absolutely no traffic should be touching WAN2 until the Primary link fails.

Darren OConnor | doconnor@resalire.co.uk
https://www.linkedin.com/in/darrenoconnor/

I'm not an employee of Cisco/Meraki. My posts are based on Meraki best practice and what has worked for me in the field.
Crocker
A model citizen

We're running 16.16.8 across the board, due to some unrelated issues regarding MX17/18's IPv6 support and some carrier modems with really low IPv6 DHCPv6.

 

Unfortunately, that also means we don't have the option to select between graceful/immediate for WAN failover/fallback. I'm unsure what the 'default' is in MX16.

 

I did double and triple check our SD-WAN policy, but nothing in there looks out of place. Additionally, I'm using VOIP traffic as the example in my original post but I see other traffic (SNMP between our Solarwinds poller and the MX/MS's on-site) behaving the exact same way. That traffic absolutely is not affected by an SD-WAN policy, and should only ever traverse WAN 2 if WAN 1 goes down hard.

 

When speaking with support yesterday, there was a little back-and-forth over whether this behavior was actually expected. I was told that the only real way to ensure traffic inbound to the spoke doesn't traverse WAN 2 would be to disable active/active AutoVPN. Wasn't particularly pleased with that answer, as it sort of dodges the crux of the issue.

Crocker
A model citizen

Any Meraki folks haunting the forum who have any thoughts on this? I guess basically the question boils down to:

How does a VPN concentrator determine which tunnel to use to send traffic to a destination at a spoke site with Active/Active AutoVPN enabled? Should I just expect to see occasional packets inbound over WAN2 for no apparent reason? The answer I got from support wasn't particularly clear, but I'd paraphrase it as "We don't know, but we also don't think it's a problem."
