I am currently testing the SD-WAN capabilities of Meraki and its ability to select the best link based on the WAN links' performance.
I have 2x low-cost 50/20 Mbit WAN links on the remote site and a high-performance 100/100 Mbit fiber link on the hub site. Both sites are configured to use NAT mode, with the hub configured as a Hub and the remote site as a Spoke.
On the remote site, I have defined WAN1 as the primary uplink and have enabled Load balancing.
5x VPN traffic uplink selection policies have been defined, each configured to load balance on the uplinks suitable for its respective performance class. The uplink selection policies use both application-based and custom definitions. For example, for Voice I have configured it to use the uplink that's best for VoIP traffic, with the traffic filters: Skype, SIP (Voice), and (UDP from Any to Any:5060-5061).
5x performance classes have been defined, including the default voice class.
Additionally, 6x traffic shaping rules have been defined, reflecting the above uplink selection policies (same application assignments): the Voice (EF) and AF41 classes are assigned High priority (2/7 of the total bandwidth each), AF31 and AF21 are assigned Normal (1/7 each), and the rest of the traffic Low. No bandwidth limits have been applied.
Based on my understanding of the documentation, I would expect the outgoing traffic through the VPN to be marked as defined in the traffic shaping rules, and in the event of link congestion the bandwidth allocation to follow the priority classes (for example, in Voice's case it would be 20 Mbit x 2/7 ≈ 5.7 Mbit).
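To make the arithmetic explicit, here is a small sketch of the expected congestion-time split on a single 20 Mbit upload link, using the 2/7 and 1/7 weights described above. This models my reading of the documentation, not Meraki's actual scheduler.

```python
# Illustrative only: expected per-class bandwidth on one 50/20 Mbit link
# under congestion, assuming the priority weights behave as weighted shares.
UPLINK_MBIT = 20  # upload capacity of a single WAN link

weights = {
    "Voice (EF)": 2,  # High priority
    "AF41":       2,  # High priority
    "AF31":       1,  # Normal priority
    "AF21":       1,  # Normal priority
    "Low":        1,  # everything else
}

total = sum(weights.values())  # 7
for cls, w in weights.items():
    share = UPLINK_MBIT * w / total
    print(f"{cls}: {share:.1f} Mbit")
# Voice (EF) works out to 20 x 2/7 ≈ 5.7 Mbit, matching the figure above.
```

Whether the same split is applied per uplink (giving 2 x 5.7 Mbit for Voice across both links) is exactly the open question below.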
First question: Given that I have 2x 50/20Mbit WAN links, would that bandwidth be 2 x 5.7Mbit? What is the expected behaviour?
Second question: On the Security appliance/ VPN Status, shouldn’t I see the applied policy per connection, as defined above?
Reference Topology and settings' screenshots:
With regard to your second question, go to:
Security Appliance/VPN Status
You will see the flow decisions on the bottom.
If you click on an individual VPN link (except for on the description field) on that same page you should see something like this:
Hi @PhilipDAth,
My question is why the policy that appears to be applied doesn't match the defined one. As shown in the previous post's configuration, both the ICMP protocol and DNS (any - any UDP/TCP 53) are configured to use the "Load balance on uplinks that are suitable for LOW AF21" uplink selection policy; however, only ICMP appears to have that policy applied.
I can see in your case, @PhilipDAth that the policy for RDP is applied to the VPN connections:
While in my case it appears that it hasn't been applied for some reason:
Is the traffic you are trying to apply it to going over AutoVPN, or over the Internet?
@PhilipDAth wrote:Is the traffic you are trying to apply it to going over AutoVPN, or over the Internet?
AutoVPN
You need to configure the VPN flow preferences under:
Security Appliance/Traffic Shaping/Flow Preferences/VPN Traffic
Like this:
@PhilipDAth wrote:You need to configure the VPN flow preferences under:
Security Appliance/Traffic Shaping/Flow Preferences/VPN Traffic
Like this:
I noticed that you haven't specified a port in your traffic filter rules. If I use a generic definition like 192.168.0.0/16:any to 192.168.0.0/16:any, it works. However, I am trying to make it work based on either a specific port (TCP/UDP:3389) or a specific protocol (Remote Desktop). In general, I have specified 5 different performance classes based on applications, protocols, and the expected performance, including SNMP, Citrix, Remote Desktop, DNS, etc.
For example:
You should be able to specify a port - but you will probably have to wait for any existing cached flows to age out first. Try configuring it, and then give the MX a reboot to make it take effect immediately.
Note that modern RDP clients use UDP/3389 for their main transport - not TCP/3389. So create two rules: one matching UDP/3389 and one matching TCP/3389. That assumes you don't have an RDS gateway, in which case it will use TCP/443.
In my case, the RDP server is a dedicated RDP server, so matching the whole IP address is more straightforward.
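As a quick illustration of why two rules are needed (this is a hypothetical matcher, not Meraki's implementation): a filter matching only TCP/3389 will miss modern RDP clients, which prefer UDP/3389.

```python
# Hypothetical flow matcher illustrating the two-rule RDP filter suggested
# above. A single TCP/3389 rule would return False for modern clients.
def matches_rdp(protocol: str, dst_port: int) -> bool:
    """True if the flow matches either of the two suggested rules."""
    rules = {("tcp", 3389), ("udp", 3389)}
    return (protocol.lower(), dst_port) in rules

print(matches_rdp("udp", 3389))  # True: modern RDP client's main transport
print(matches_rdp("tcp", 3389))  # True: legacy/fallback transport
print(matches_rdp("tcp", 443))   # False: RDS-gateway traffic needs its own rule
```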
@PhilipDAth wrote:You should be able to specify a port - but you will probably have to wait for any existing cached flows to age out first. Try configuring it, and then give the MX a reboot to make it take effect immediately.
Note that modern RDP clients use UDP/3389 for their main transport - not TCP/3389. So create two rules: one matching UDP/3389 and one matching TCP/3389. That assumes you don't have an RDS gateway, in which case it will use TCP/443.
In my case, the RDP server is a dedicated RDP server, so matching the whole IP address is more straightforward.
I have tried using both Meraki's "Remote Desktop" traffic filter and a protocol/port-based definition, with the same result, even after rebooting the MX.
I have also tried splitting the protocols into UDP and TCP, and using more specific expressions like 192.168.134.0/24 UDP/3389 to 192.168.0.0/16 any. Same result.
Have you tried using 13.28, the current stable release candidate? I haven't used the 14.x series of beta code yet.
@PhilipDAth wrote:Have you tried using 13.28, the current stable release candidate? I haven't used the 14.x series of beta code yet.
I just installed 13.28; same result. The policy applied is "Fail over if uplink is down" rather than the defined one.
These are questions I hope to get answered. The specifics of flows and connections, how load balancing works, and how more specific rules are preferred need to be explained better.
I do know there is no packet replay or resets going on to manipulate the stream, so I'm more focused on the decision-making process in these circumstances.
There are also other considerations when configuring multiple datacenters, or when your headend has multiple ISPs. The traffic rules would need to either match or be configured appropriately for the links; otherwise you could end up with egress routed via a different ISP than ingress. We also keep all AutoVPN tunnels up and active even if "idle".
We are working internally on more detailed customer facing documentation as we see more use cases like this. We do appreciate the feedback and patience as we get your answers.
That's the network design and what I've been trying to achieve.
It would be really useful if there was an option to monitor, for troubleshooting purposes, traffic shaping rule statistics: queue depth, total drops, no-buffer drops, exceed drops, drop rate, etc.
I am still waiting to hear back from the PM team on the behavior of the SD-WAN setup you have.
I saw you had rules in there for EF. Keep in mind that if you're already marking DSCP on egress from the phone, there is no need to re-mark, which is what adding a rule there would do. When the MX receives the marking, it will apply it to the appropriate queue.
As it pertains to your other ask ("queue depth, total drops, no-buffer drops, exceed drops, drop rate"), this is not something we are looking into providing visibility on near term. Our focus is on simplicity; providing this visibility with no easy way to consume the data makes the solution more complex. The only need for this information is when there are issues, and when there are issues Meraki support is involved to provide resolution.
What a majority of our customers are asking for is "how are my vital applications performing, and tell me when they're not performing well," whether through the overlay or direct to the internet. Application issues can be tied back to a multitude of problems in and out of the network. In the network, we want Meraki to own the problem and resolution. We want to make it easier for you to prioritize your applications and be notified when they are not performing to expectation, before that turns into help desk calls.
I would stay tuned and keep an eye on our announcements in the next month. We may not have exactly what you're looking for around QoS/CoS visibility, but we may have something that enhances your ability to support the applications that are important to your business. This would be end-to-end visibility rather than the per-hop metrics you're asking for.
Another bug I want to point out in this scenario, where 2x WAN links and a cellular interface are used: if "Prefer WAN 1. Fail over if poor performance for <performance class>" is defined instead of "Load balance on uplinks that are suitable for <performance class>", traffic will never fail over to the cellular interface.
In the version you have access to we do not perform AutoVPN over USB-Cellular. It is working as expected.
Also, there are different types of failover, so please refer to the document below that explains how connection monitoring works. This is not how failover within the AutoVPN SD-WAN overlay works, just internet failover and fail to cellular. I'm not sure how you're testing failover, but I assume you had either a soft or hard failure on both WAN1/2.
@DCooper wrote:In the version you have access to we do not perform AutoVPN over USB-Cellular. It is working as expected.
I performed a hard failure test, pulling both WAN1/2 cables. It failed over to the USB-cellular interface, and when I had the uplink selection policies configured to use "Load balance on uplinks that are suitable for <performance class>", I was able to access the remote resources through the VPN.
I knew AutoVPN over USB was coming; I just didn't realize it was in this public beta. So did it only fail over when the rules were there? I assumed it would fail over to USB and be all or nothing, not paying any attention to the load balancing or rules you have set up. I haven't tested it, but I have it in the lab so I can reproduce what you have set up.
@DCooper wrote:I knew AutoVPN over USB was coming; I just didn't realize it was in this public beta. So did it only fail over when the rules were there? I assumed it would fail over to USB and be all or nothing, not paying any attention to the load balancing or rules you have set up. I haven't tested it, but I have it in the lab so I can reproduce what you have set up.
It currently runs v13.28. If I define a preferred link (WAN1) in the uplink selection policies, it looks like it only tries to use WAN2 if WAN1 fails. The "Load balance on uplinks" option, though, does utilize cellular in the case of a WAN1/2 failure.
Edit: The same applies to the earlier beta version, v14.20.
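To summarize the behaviour I observed (this is a sketch of the reported behaviour on v13.28/v14.20, not documented Meraki logic; the policy names are placeholders):

```python
# Hedged model of the observed uplink selection: with a preferred-uplink
# policy, cellular is never chosen; with load balancing, cellular is used
# only once both WAN links are down.
def select_uplink(policy, wan1_up, wan2_up, cell_up):
    if policy == "prefer_wan1":
        if wan1_up:
            return ["WAN1"]
        if wan2_up:
            return ["WAN2"]
        return None  # observed bug: never falls back to cellular
    if policy == "load_balance":
        wans = [u for u, up in (("WAN1", wan1_up), ("WAN2", wan2_up)) if up]
        if wans:
            return wans  # balanced across the suitable WAN uplinks
        return ["Cellular"] if cell_up else None
    return None

print(select_uplink("prefer_wan1", False, False, True))   # None (the bug)
print(select_uplink("load_balance", False, False, True))  # ['Cellular']
```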
Any update from the PM team?
@Billy Going to need some additional time on this one. The question is rolling up to our Eng team who builds the feature.
Any updates on this one? I'm looking into deploying that design on a site in the near future and knowing beforehand how this solution is expected to perform would be quite useful.
Thanks