SD-WAN uplink vs tunnel selection

Solved
GIdenJoe
Kind of a big deal

I have a question about how traffic is sent over the SD-WAN uplinks.  I'm hoping a Meraki employee can also give some insight on this.

My example is below; I have simplified it to one spoke site and the hub.
We have MXs on all sites with both uplinks in use.
WAN1 is connected to an MPLS provider that offers an internet breakout on the MPLS, so that AutoVPN tunnels can be formed over the WAN1 uplinks.
WAN2 is connected to the public internet.

Not counting every SA that would be made for every direction and local subnet you should have 4 logical connections.  From the hub WAN1 to WAN1 on the spoke, WAN1 hub to WAN2 spoke, WAN2 hub to WAN1 spoke and WAN2 hub to WAN2 spoke.
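
To make the count concrete, here is a minimal Python sketch that enumerates the pairings. The uplink labels are just illustrative strings, not identifiers from any Meraki API.

```python
from itertools import product

# Illustrative labels only; these are not Meraki API identifiers.
HUB_UPLINKS = ("WAN1", "WAN2")
SPOKE_UPLINKS = ("WAN1", "WAN2")


def logical_tunnels(hub_uplinks, spoke_uplinks):
    """Return every (hub uplink, spoke uplink) pairing as one logical tunnel."""
    return list(product(hub_uplinks, spoke_uplinks))


tunnels = logical_tunnels(HUB_UPLINKS, SPOKE_UPLINKS)
for hub, spoke in tunnels:
    print(f"hub {hub} <-> spoke {spoke}")
```

With two uplinks on each side this yields the four logical connections listed above.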

When you define SD-WAN uplink policies you can choose your uplink based on traffic matching criteria.  However this only selects your outgoing WAN interface.  You have 2 logical tunnels from that uplink towards both WAN uplinks on the other side.  So how does the MX handle this and is this configurable?  As you can see, on the right part of the drawing, for traffic going from WAN1 to the other side on WAN2 it has to break out of the MPLS and route through the internet to the other side.

Another question: other than performing a packet capture like I did below, is there a way to see which logical tunnel the traffic takes?  The uplink selection page only shows the selected uplink, and the uplink stats only show latency, jitter, packet loss and MOS.  There is no traffic utilization, and it's not always clear which tunnel is being shown.  In the capture you can clearly see actual traffic crossing from the private MPLS IPs to the public address of the other MX.
SDWAN-question.png
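
For anyone repeating the capture approach, a rough way to sort the outer tunnel packets is to classify each source/destination pair by address type. The sketch below assumes the MPLS-side WAN uses RFC 1918 addresses and the internet-side WAN uses a public address; the sample addresses are made up (203.0.113.10 is a documentation address) and the function names are hypothetical.

```python
import ipaddress

# RFC 1918 private ranges, checked explicitly.
RFC1918 = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(addr: str) -> bool:
    """True if addr falls inside one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


def classify(src: str, dst: str) -> str:
    """Label an outer-header address pair from a capture."""
    src_private = is_rfc1918(src)
    dst_private = is_rfc1918(dst)
    if src_private and dst_private:
        return "private-to-private"
    if src_private:
        return "private-to-public"
    if dst_private:
        return "public-to-private"
    return "public-to-public"


# Made-up sample pairs, not taken from the capture in this thread.
samples = [("10.1.1.2", "10.2.2.2"), ("10.1.1.2", "203.0.113.10")]
for src, dst in samples:
    print(src, "->", dst, ":", classify(src, dst))
```

Under that assumption, a "private-to-public" pair corresponds to the cross tunnel from the MPLS uplink to the other MX's internet uplink.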

1 Accepted Solution
Raj66
Meraki Employee

Hi @GIdenJoe, That is a good question and a good representation of your thoughts.

 

So, even though there are two tunnels established on both uplinks, and even when doing active-active VPN, we can only control outbound traffic; the inbound traffic will always arrive on the primary WAN interface.

Let us take your example and assume WAN1 is the primary WAN interface on both the hub and the spoke (this is configured under "Security & SD-WAN > SD-WAN & traffic shaping"). The traffic will then flow as follows.

No SD-WAN policies configured:
Traffic will flow from WAN1 of one site to WAN1 of the second site.

SD-WAN policy configured to send traffic over WAN2:
Traffic will flow from WAN2 of one site to WAN1 of the second site.

When WAN1 is down, traffic will flow to the WAN2 interface on the spoke site.
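
The rules above can be written down as a small Python sketch. This is only a model of the answer as stated here, not Meraki's actual implementation; the function name, the parameters and the way failover is handled (fall back to the other uplink when the preferred one is down) are assumptions made for illustration.

```python
from typing import Optional


def other(uplink: str) -> str:
    """Return the opposite uplink label."""
    return "WAN2" if uplink == "WAN1" else "WAN1"


def expected_tunnel(policy_uplink: Optional[str], local_primary: str,
                    remote_primary: str, local_up: set, remote_up: set):
    """Return (egress uplink, remote ingress uplink), or None if no path."""
    # Outbound side: an SD-WAN policy overrides the local primary uplink.
    egress = policy_uplink or local_primary
    if egress not in local_up:
        egress = other(egress)
    # Inbound side: traffic lands on the remote primary unless it is down.
    ingress = remote_primary
    if ingress not in remote_up:
        ingress = other(ingress)
    if egress not in local_up or ingress not in remote_up:
        return None
    return (egress, ingress)


both = {"WAN1", "WAN2"}
print(expected_tunnel(None, "WAN1", "WAN1", both, both))
print(expected_tunnel("WAN2", "WAN1", "WAN1", both, both))
print(expected_tunnel(None, "WAN1", "WAN1", both, {"WAN2"}))
```

The three calls correspond to the three cases listed: no policy, a policy steering traffic over WAN2, and WAN1 down on the receiving side.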

 

I hope this answers your question, let me know if you have any questions.

 

Cheers!

 

Raj

If you found this post helpful, please give it kudos. If my answer solved your problem, click "accept as solution" so that others can benefit from it


11 Replies
GIdenJoe
Kind of a big deal

Great, thanks for that clarification.

Follow up question:

As you can see in the Wireshark output, there is actual traffic going from a private IP (the MPLS WAN interface) towards a public IP (91.x.x.x), which is the public internet WAN2 on the other MX.  This seems far less common but does happen.

The size is about the same as the other encapsulated packets, which contain Windows RDP traffic. So there is traffic flowing back from the primary WAN towards the secondary WAN.  Would you attribute that to keepalive traffic, or does the MX keep track of flows inside the tunnel, so that policy-routed traffic leaving WAN2 and entering WAN1 on the other side gets its response over the same tunnel?

I also assume that if the WAN1-WAN1 tunnel goes down, traffic can still exit WAN1 but go through the tunnel towards WAN2?

ali_abbass85
Getting noticed

Hi @GIdenJoe

We have exactly the same situation, with MPLS and internet on both hub and spoke and SD-WAN policies configured. What annoyed us was the tunnel going from inside the MPLS on the spoke's WAN1 to the internet on the hub's WAN2. We contacted Meraki Support and they advised that this cannot be turned off (we only wanted 2 tunnels, WAN1-to-WAN1 and WAN2-to-WAN2). We ended up blocking the public IPs of the hub's WAN2 internet link on the MPLS breakout firewall; this way we had 3 tunnels rather than 4. I believe the spoke MX would use the tunnel with the lower RTD on the same interface, although I never got confirmation of that.

We have a point here: I believe there should be a way to control how many tunnels go through each WAN interface.
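
As a rough illustration of that unconfirmed guess, the Python sketch below picks the lowest-RTD tunnel among those that share the chosen local uplink. The RTD values are invented for the example, `None` marks the tunnel that never forms because of the firewall block, and `pick_tunnel` is a hypothetical name.

```python
# Keys are (spoke uplink, hub uplink); values are round-trip delay in ms.
# All numbers are invented for illustration. None means the tunnel never
# forms (here: spoke WAN1 to hub WAN2, blocked at the MPLS breakout firewall).
rtd_ms = {
    ("WAN1", "WAN1"): 12.0,
    ("WAN1", "WAN2"): None,
    ("WAN2", "WAN1"): 30.0,
    ("WAN2", "WAN2"): 25.0,
}


def formed(tunnels):
    """Return only the tunnels that actually came up."""
    return {pair: rtd for pair, rtd in tunnels.items() if rtd is not None}


def pick_tunnel(tunnels, local_uplink):
    """Lowest-RTD formed tunnel leaving via local_uplink, or None."""
    candidates = {
        pair: rtd for pair, rtd in formed(tunnels).items()
        if pair[0] == local_uplink
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


print(len(formed(rtd_ms)), "tunnels formed")
print("WAN1 egress uses", pick_tunnel(rtd_ms, "WAN1"))
print("WAN2 egress uses", pick_tunnel(rtd_ms, "WAN2"))
```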

Raj66
Meraki Employee

Yes, the traffic heading to the WAN2 IP of the other side is keepalive traffic to make sure the tunnel is up. And again yes, in case of a failover (when WAN1 goes down), the traffic will seamlessly switch to WAN2.

GIdenJoe
Kind of a big deal

Ok, thanks again!

GIdenJoe
Kind of a big deal

@Raj66, I found a problem with this behavior.

What if you need to define WAN2 as primary so that the bulk internet traffic leaves that way, but you want time-sensitive VPN traffic to leave via your WAN1 MPLS and enter the other side via its WAN1 MPLS, while the other side also has WAN2 as primary for the same reason?

I believe downlink saturation on WAN2 on the other side may very well starve your incoming delay- and loss-sensitive traffic.

Is there a way to escalate this within Meraki to make this behavior configurable, other than using the no-feedback "Make a wish" button?

Raj66
Meraki Employee

@GIdenJoe I will try to pass this feedback on from my side, but the best approach is via "Make a wish", as that goes directly to our product team.

Meanwhile, one way you can try to achieve this is by leveraging flow preferences. You can experiment with that option to steer the outgoing traffic onto the desired interfaces based on your requirements.

 

Cheers!

 

Raj

GIdenJoe
Kind of a big deal

Sorry to bring this topic back up but...
Yesterday I did a demonstration for a group of customers about SD-WAN.

I had a hub-and-spoke WAN setup between 4 sites using a mixture of MXs (250, 84, 68, 67C).
All of them had a primary WAN going into a Cisco router of mine, each with its own little subnet (simulating an MPLS with a single breakout IP), and a second connection going to a switch leading to another ISP.

Before the demonstration I did some testing, using an iperf server on a laptop in the HQ site and another laptop in one of the branch sites, so I could send continuous heavier traffic to test the policies. I found the following:

Even when the HQ site had WAN2 defined as primary, traffic that was routed over WAN1 at the branch site also arrived on WAN1 at the HQ site.  I tested this with captures at first, but then I could just look at the uplink stats page of HQ and see the color of the downstream traffic.

We tested the other way around but the results were consistent.  So I can only conclude the MX chooses to send from WAN1 to WAN1 or WAN2 to WAN2 based on the public IP or performance metrics instead of uplink preference on the other side.

The next test I did was running the test longer and then disconnecting a local uplink.  The traffic was switched to the other WAN immediately because of the layer 1 down status of the WAN link.

The final test was disconnecting the receiving WAN link on HQ, and there we had two results.

Using UDP: traffic stopped being received for 20 to 25 seconds and then resumed on the cross VPN link.
Using TCP: the connection failed after the link was switched to the cross link (reset by peer); this could, however, be due to the behavior of iperf.
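
If you want to put a number on an outage like the UDP one, one simple approach is to take the packet arrival timestamps from a capture on the receiver and find the longest gap between consecutive packets. The Python sketch below uses synthetic timestamps (one packet per second, with a pause), not data from this test, and `longest_gap` is a hypothetical helper name.

```python
def longest_gap(arrival_times):
    """Return the largest interval, in seconds, between consecutive arrivals."""
    ordered = sorted(arrival_times)
    if len(ordered) < 2:
        return 0.0
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


# Synthetic example: packets at t=0..9 s, nothing until t=32 s, then t=32..39 s.
sample = [float(t) for t in range(0, 10)] + [float(t) for t in range(32, 40)]
print(f"longest interruption: {longest_gap(sample)} s")
```

For the synthetic sample the gap is the 23 seconds between the packet at t=9 and the one at t=32.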

So long story short:  If you have an MPLS where you overlay Meraki SD-WAN having a single breakout IP don't worry. Traffic leaving one MX onto the MPLS will be routed to the other site on that same MPLS and not crossed over to the internet unless the MPLS link on the other site is down.
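
The behavior observed in these tests can be summarized in a few lines of Python. This models only what the lab setup showed, not any documented Meraki logic; the function name is hypothetical and the choice of fallback uplink (first available, alphabetically) is an assumption.

```python
def observed_ingress(egress_uplink, remote_up):
    """Remote uplink the traffic arrived on in the lab tests.

    Traffic stayed on the like-for-like uplink (WAN1 to WAN1, WAN2 to WAN2)
    and only crossed over when that uplink was down on the receiving side.
    """
    if egress_uplink in remote_up:
        return egress_uplink
    alternatives = sorted(set(remote_up) - {egress_uplink})
    return alternatives[0] if alternatives else None


print(observed_ingress("WAN1", {"WAN1", "WAN2"}))  # both remote uplinks up
print(observed_ingress("WAN1", {"WAN2"}))          # remote WAN1 disconnected
```

Note that in this model the remote site's primary uplink setting plays no role, which matches what the uplink stats showed.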

whistleblower
Building a reputation

Hi,

 

It's an "older" post, but very relevant for me right now 🙂

Please allow me two questions...

 

1)


@GIdenJoe wrote:
From the hub WAN1 to WAN1 on the spoke, WAN1 hub to WAN2 spoke, WAN2 hub to WAN1 spoke and WAN2 hub to WAN2 spoke.

Does this description also apply to a deployment with an MX in one-armed VPN concentrator mode (hub) with two possible uplinks (MPLS + internet) and an MX in NAT mode (branch), also with two uplinks (MPLS + internet)?

 

2)


@GIdenJoe wrote:

So long story short:  If you have an MPLS where you overlay Meraki SD-WAN having a single breakout IP don't worry. Traffic leaving one MX onto the MPLS will be routed to the other site on that same MPLS and not crossed over to the internet unless the MPLS link on the other site is down.


Does anyone know whether there is official documentation that describes the behavior @GIdenJoe explains?

GIdenJoe
Kind of a big deal

I can answer no 1 immediately.

 

In one-armed concentrator mode, you only have WAN1.  So you can only terminate two tunnels from each spoke on an MX node running in this mode.  How it is routed upstream is entirely up to the underlay.  I have to assume that if incoming traffic is received on the one-armed concentrator coming from WAN1 of the branch, the return traffic will be sent into the same tunnel.

For question number two: this was not based on any documentation.  At that time I was able to do a PoC for a client with demo MXs, and we simply observed the behavior by using iperf testing and checking the uplink statistics on both MXs at the same time.

There was a router between the MX'es that connected WAN1 of both MX's and an uplink from the router to the internet where NAT was applied.  So that is how both MX's WAN1 was behind the same public IP.

Bruce
Kind of a big deal

>> I have to assume if incoming traffic is received on the one armed concentrator coming from WAN1 of the branch, that the return traffic will be sent to the same tunnel.

 

This is definitely the case. It’s how traffic that originates at the concentrator end eventually ends up in the ‘correct’ tunnel (based on the SD-WAN rules at the spoke site). Initially, traffic initiated at the VPN concentrator end is just placed into one of the two tunnels to the spoke (no logic is applied). The SD-WAN rules are then applied to the traffic as it returns from the spoke to the VPN concentrator, and it’s put into the ‘correct’ tunnel. When that traffic is received at the VPN concentrator, it then knows which tunnel to use, based on the rules applied by the spoke.
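
A toy Python model of that learning behavior might look like the following. It is a sketch of the description above, not of how the MX is actually implemented; the class, the tunnel names and the use of a simple flow label as the key are all assumptions for illustration.

```python
class Concentrator:
    """Toy model: learn the tunnel per flow from the spoke's return traffic."""

    def __init__(self, tunnels):
        self.tunnels = list(tunnels)
        self.learned = {}  # flow label -> tunnel the spoke last used for it

    def send(self, flow):
        # Before any return traffic is seen, no logic is applied: this
        # model simply uses the first tunnel in the list.
        return self.learned.get(flow, self.tunnels[0])

    def receive(self, flow, tunnel):
        # The spoke applied its SD-WAN rules when choosing this tunnel,
        # so remember it for later packets of the same flow.
        self.learned[flow] = tunnel


hub = Concentrator(["to-spoke-WAN1", "to-spoke-WAN2"])
print(hub.send("rdp"))                # initial, arbitrary choice
hub.receive("rdp", "to-spoke-WAN2")   # spoke's rules picked WAN2
print(hub.send("rdp"))                # now follows the spoke's choice
```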

 

>> for question number two: this was not based on any documentation.  I was able at that time to do a poc for a client with demo MX'es and we simply observed the behavior by using iperf testing and checking the uplink statistics on both MX'es at the same time.

 

I haven’t seen this documented anywhere either, but have heard that the MXs will always try and make the connection to the actual IP address assigned to a WAN interface first. If that fails they’ll then try and use the public IP address (I’ve never done a packet capture to see if this is actually true though). The VPN registry provides both the public address that it sees, and the IP address assigned to the WAN interface to the peer MXs (the MXs themselves send their Interface IP address to the VPN registry as part of the registration process).
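
If that hearsay is accurate, the attempt order would look roughly like the Python sketch below. Everything here is hypothetical: the `RegistryEntry` type, the function names and the sample addresses are invented for illustration, and the behavior itself is unverified.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RegistryEntry:
    """What a peer might learn about one uplink from the VPN registry."""
    interface_ip: str  # address configured on the WAN interface
    public_ip: str     # address the registry sees the uplink coming from


def connection_attempt_order(entry: RegistryEntry) -> List[str]:
    """Interface IP first, then the public IP if it is a different address."""
    order = [entry.interface_ip]
    if entry.public_ip != entry.interface_ip:
        order.append(entry.public_ip)
    return order


def first_reachable(entry: RegistryEntry,
                    reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first address in the attempt order that answers."""
    for addr in connection_attempt_order(entry):
        if reachable(addr):
            return addr
    return None


peer = RegistryEntry(interface_ip="10.1.1.2", public_ip="203.0.113.10")
# Pretend only the public address answers from where we sit.
print(first_reachable(peer, lambda addr: addr == "203.0.113.10"))
```

This would fit the lab result where both WAN1 uplinks sat behind the same router: the interface IPs were directly reachable, so the tunnel never needed the public address.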
