Reported packet loss and SD-WAN performance

BazMonkey
Getting noticed

We've been having a problem where spoke-to-spoke traffic between our sites on 1G internet links performs at about 400 Mbit/s in one direction (iPerf3 testing) and about 40-50 Mbit/s in the other. This is TCP traffic, with no bandwidth policies in use.

 

The problem does affect all other sites on SD-WAN, but we've been testing the higher-capacity sites.

 

With UDP we are seeing 900 Mbit/s in one direction and about 400 Mbit/s in the other, which is much better. I'd expect UDP to perform better, but by this much?

 

We've got 700+ sites to the one HA pair hub.

 

We are upgrading the hubs to 16.16 in a few days. The spokes are already at this version. Just something else to try.

 

We are seeing occasional reports in the dashboard of packet loss across one of the data centre switches the hub sits behind. All our tests are performed during both quiet and busy network periods, but the results are about the same.

 

There are two FortiGate firewalls in the path at the hub end, but we have been assured there are no issues in this space. Link capacity across the DC is fine, and no devices or interfaces are reporting overuse.

 

We've got some Xmit errors reported on a switch the VPNs traverse in the DC, but they're intermittent and fairly low.

 

We've got a TAC case open and they have been focussing on the packet loss, which had been bubbling around the 1-3% mark. Packet loss is no longer evident at those levels.

 

The internet is lossy but should small loss affect SD-WAN performance at the level being seen?

Has anyone experienced similar issues?
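
As a rough sanity check on whether small loss can matter this much, the well-known Mathis approximation bounds the steady-state throughput of a single loss-based TCP flow at roughly (MSS / RTT) * sqrt(3/2) / sqrt(p), where p is the loss rate. The sketch below is illustrative only: the MSS and RTT values are assumptions (neither is stated in this thread), the function name is hypothetical, and the model ignores parallel streams, modern congestion control variants and any MX-specific behaviour. UDP has no such loss-driven back-off, which is one reason it can look far better in iPerf3.

```python
import math


def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on single-flow TCP throughput (bits per second).

    Uses the Mathis et al. approximation:
        rate <= (MSS / RTT) * sqrt(3/2) / sqrt(p)
    This is a simplified model, not a prediction for any specific device.
    """
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss_rate)


if __name__ == "__main__":
    # Assumed values, not taken from the thread: 1460-byte MSS and 20 ms RTT.
    # A real AutoVPN tunnel will have a smaller effective MSS.
    mss = 1460
    rtt = 0.020
    # 1% and 3% bracket the loss range mentioned above; 0.1% is for contrast.
    for loss in (0.001, 0.01, 0.03):
        mbps = mathis_throughput_bps(mss, rtt, loss) / 1e6
        print(f"loss {loss:.1%}: about {mbps:.1f} Mbit/s per TCP flow")
```

Substituting a measured RTT and the tunnel's actual MSS gives a per-flow ceiling to compare against the TCP numbers; if the observed rate is well below even that ceiling, loss alone is unlikely to be the whole story.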

5 REPLIES
PhilipDAth
Kind of a big deal

This can sometimes be caused by content filtering or IPS (iperf3 might be using a port causing a lot of usage for a particular signature).

 

If you turn IPS and content filtering off temporarily, does the issue go away?

Hi Philip. We've disabled IPS/IDS and content filtering for some testing, which did not result in much improvement.

cmr
Kind of a big deal

What model are you running on the hub, what firmware at the moment and what mode is it in (VPN concentrator I presume)? 

BazMonkey
Getting noticed

We are running MX450s as an HA pair in concentrator mode, on 16.16.

BazMonkey
Getting noticed

UPDATE - We finally got a resolution to the problem. It seems all trains of version 17 of the MX firmware had issues. Since moving to 18.x we have seen a massive drop in MX device utilization.
