Asymmetric AutoVPN throughput after MX85 → MX67 HA replacement (one site only)

H311over
Conversationalist


Hi everyone,

I’m seeing a very strange one-way performance issue after replacing an MX85 HA pair with an MX67 HA pair at one of our sites, and I would appreciate any hints or ideas on what else to check.


Environment

  • Several sites connected via AutoVPN (mixture of hub/spoke).

  • Each key site has an MX pair in HA, routed mode, behind an ISP router.

  • The MX WAN and LAN interfaces connect via a switch (separate WAN and LAN VLANs).

  • One site (“Site A”) hosts a Windows file server.

  • Another site (“Site B”) is a main hub and terminates Client VPN for home users.

  • A third site (“Site C”) is another branch used for tests.

At Site A, we replaced an MX85 HA pair with an MX67 HA pair. Configuration was kept essentially identical (networks, AutoVPN settings, no traffic shaping, no special QoS).


Problem

Since moving Site A to MX67, we see strongly asymmetric throughput over AutoVPN for any traffic involving Site A:

  • From other sites / VPN clients to Site A: throughput is fine.

  • From Site A to other sites / VPN clients: throughput is limited to about 0.5–1 Mbit/s.

This affects:

  • SMB access to file shares on the file server in Site A (Windows Explorer freezes when opening shares, files take ages or don’t open at all).

  • RDP sessions that go through Site A (they lag badly and sometimes disconnect).

  • Copying medium-sized files (100–200 MB) from Site A to other sites is extremely slow, while copying in the opposite direction is OK.

Latency looks fine:

  • From Site B to the file server in Site A: ~25 ms, 0% loss (32-byte ping).

  • From Site C (e.g. US) to Site A: ~100 ms, 0% loss (32-byte ping).

Path MTU tests (ping with DF and increasing size):

  • Between some sites we get “Packet needs to be fragmented” for larger packets; effective MTU on the path is ~1300–1350 bytes.

  • Meraki support told me that:

    • MTU on the MX67 in Site A is currently 1500.

    • MTU on at least one remote uplink is 1492.

    • They suggested reducing Site A MTU to 1492, which I plan to do.

However, the performance issue feels too drastic to be “just MTU”, and the captures don’t show obvious fragmentation.
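
For reference, the path MTU tests were done roughly like this (the remote host name is just a placeholder; 28 bytes of ICMP/IP headers come on top of the payload size):

Windows: ping remote-host -f -l 1472
Linux: ping -M do -s 1472 remote-host

Lowering the payload until the “Packet needs to be fragmented” message disappears, and adding 28, gives the effective path MTU (~1300–1350 bytes on the affected paths).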


iPerf3 measurements

All tests involve AutoVPN traffic to or from Site A and were run with iperf3 -P 4 -t 30.

  1. Remote user at home → Client VPN to Site B → AutoVPN to file server in Site A

  • Client → Site A (iperf3 -c file-server):

    • ~20–25 Mbit/s total.

  • Site A → client (iperf3 -c file-server -R):

    • ~0.5–1 Mbit/s total.

  2. On-prem PC in Site B LAN → AutoVPN to Site A

  • Site B → Site A:

    • ~80–100 Mbit/s.

  • Site A → Site B (-R):

    • ~0.5–1 Mbit/s.

  3. Third site (Site C) → AutoVPN to Site A

  • Same pattern: one direction is tens of Mbit/s, the opposite direction is around 1 Mbit/s.

So the problem is consistently: traffic originating from Site A towards the rest of the network is extremely slow, regardless of whether the other side is a branch, a hub, or a Client VPN user.
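
For completeness, each test pair was run along these lines, with the iperf3 server on the far-end machine (the host name is a placeholder for the file server / test PC):

iperf3 -s (on the target host)
iperf3 -c file-server -P 4 -t 30 (client sends towards the server)
iperf3 -c file-server -P 4 -t 30 -R (reverse: the server side sends)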


What we already checked

  • Switch ports where MX67 connects (WAN and LAN):

    • Speed/duplex correct (1G/full), no CRC errors, no obvious interface errors, no link flaps.

  • Config on the HA pair:

    • Both MX67 units have effectively identical configuration.

  • During some earlier tests, when we temporarily ran only one MX67 (no HA, maintenance situation), RDP felt noticeably faster; after bringing the HA pair back up, the very slow behaviour returned. So I’m not sure whether HA plays a role here or whether it was just a coincidence.

  • There are no traffic shaping rules or per-tunnel limits configured on AutoVPN.

  • No special QoS or SD-WAN policies that should cap bandwidth.


Packet capture

I captured traffic on the Site-to-Site VPN interface of the MX67 at Site A during iPerf tests. High-level observations:

  • AutoVPN UDP encapsulation carries the iPerf TCP flows.

  • TCP MSS on SYN/SYN-ACK is around 1386, which matches a reduced path MTU due to encapsulation.

  • Most data packets carry an IP payload of around 1426 bytes, and I don’t see IP fragmentation in the capture.

  • I don’t see obvious massive retransmission bursts that would explain such a hard ~1 Mbit/s ceiling, but I may be missing something.

I can share sanitized PCAPs if needed.
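
In case it’s useful, this is roughly how I checked MSS and fragmentation in those captures with tshark (the capture file name is a placeholder):

tshark -r siteA.pcap -Y "tcp.flags.syn == 1" -T fields -e ip.src -e ip.dst -e tcp.options.mss_val
tshark -r siteA.pcap -Y "ip.flags.mf == 1 || ip.frag_offset > 0"

The first command lists the MSS advertised on each SYN/SYN-ACK (~1386 here); the second would list any fragmented IP packets (none in my captures).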


Question for the community

Has anyone seen something like this:

  • One specific MX67 HA site where AutoVPN throughput is fine in one direction and capped around 1 Mbit/s in the other,

  • while latency and small-packet pings look normal,

  • switches show no errors,

  • and the issue affects both site-to-site traffic and Client VPN users who reach that site through a hub?

Any ideas what else I should check on:

  • MX67 HA configuration (any known bugs or gotchas with HA and AutoVPN throughput)?

  • MTU settings (would mismatched 1500/1492 on uplinks realistically cause this exact kind of one-way behaviour even when MSS looks sane)?

  • Hidden counters/logs for queueing, drops, or shaper on the MX uplinks?

  • Known issues with specific firmware versions on MX67 related to AutoVPN or asymmetric throughput?

Any pointers or similar experiences would be very helpful. Thanks!

10 Replies
ww
Kind of a big deal

Can you try iperf with a lower packet size from A to B? For example, 1300 bytes (-l 1300).
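
Something like this, reusing your earlier test parameters (host name as in your setup):

iperf3 -c file-server -P 4 -t 30 -l 1300

(-l sets the application write size; -M 1300 would clamp the TCP MSS instead, if you want to be sure each segment stays below the path MTU.)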

 

H311over
Conversationalist

B → A
[ ID] Interval Transfer Bitrate
[ 5] 0.00-30.01 sec 122 MBytes 34.1 Mbits/sec sender
[ 5] 0.00-30.09 sec 122 MBytes 34.0 Mbits/sec receiver
[ 7] 0.00-30.01 sec 56.7 MBytes 15.8 Mbits/sec sender
[ 7] 0.00-30.09 sec 56.5 MBytes 15.8 Mbits/sec receiver
[ 9] 0.00-30.01 sec 83.8 MBytes 23.4 Mbits/sec sender
[ 9] 0.00-30.09 sec 83.5 MBytes 23.3 Mbits/sec receiver
[ 11] 0.00-30.01 sec 70.4 MBytes 19.7 Mbits/sec sender
[ 11] 0.00-30.09 sec 70.0 MBytes 19.5 Mbits/sec receiver
[SUM] 0.00-30.01 sec 333 MBytes 93.1 Mbits/sec sender
[SUM] 0.00-30.09 sec 332 MBytes 92.5 Mbits/sec receiver
A → B
[ ID] Interval Transfer Bitrate
[ 5] 0.00-30.04 sec 1.07 MBytes 299 Kbits/sec sender
[ 5] 0.00-30.01 sec 1.01 MBytes 282 Kbits/sec receiver
[ 7] 0.00-30.04 sec 1.09 MBytes 303 Kbits/sec sender
[ 7] 0.00-30.01 sec 1.02 MBytes 286 Kbits/sec receiver
[ 9] 0.00-30.04 sec 1.58 MBytes 442 Kbits/sec sender
[ 9] 0.00-30.01 sec 1.52 MBytes 424 Kbits/sec receiver
[ 11] 0.00-30.04 sec 2.04 MBytes 570 Kbits/sec sender
[ 11] 0.00-30.01 sec 1.98 MBytes 553 Kbits/sec receiver
[SUM] 0.00-30.04 sec 5.78 MBytes 1.61 Mbits/sec sender
[SUM] 0.00-30.01 sec 5.53 MBytes 1.55 Mbits/sec receiver

alemabrahao
Kind of a big deal

BPDU Guard or STP misconfigurations on switch ports can block VRRP packets, causing both MXs to act as active or creating unstable forwarding paths. This can lead to traffic blackholing or severe performance degradation in one direction.
Check whether STP guard or filtering is enabled on the switch ports connected to the MXs, and disable BPDU Guard on those ports.
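
If these are Cisco IOS-style switches (just an assumption on my side, adjust for your platform; Gi1/0/10 is a placeholder for the MX-facing port), the checks would look something like this:

show spanning-tree interface Gi1/0/10 detail
show interfaces status err-disabled
interface Gi1/0/10
 no spanning-tree bpduguard enable

The first two commands show whether BPDU Guard is active or has err-disabled a port; the last two disable it on that port.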

 

AutoVPN can create asymmetric paths if hub priorities differ or if failover logic misbehaves. Normally this isn’t fatal, but combined with HA or MTU issues, it can cause severe slowdowns.

 

Capture AutoVPN UDP flows on both ends and check for retransmissions or out-of-order packets.
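
For example, counting retransmissions and out-of-order segments in each side’s capture (file names are placeholders) should show on which leg the trouble starts:

tshark -r siteA.pcap -Y "tcp.analysis.retransmission" | wc -l
tshark -r siteB.pcap -Y "tcp.analysis.retransmission" | wc -l
tshark -r siteA.pcap -Y "tcp.analysis.out_of_order" | wc -l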

H311over
Conversationalist

I tested with BPDU Guard disabled on those switch ports — nothing changed.
In the logs, a VRRP transition only appears when I manually swap the MX roles in the dashboard.

alemabrahao
Kind of a big deal

I suggest you open a support case.

H311over
Conversationalist

I opened it a couple of days ago 🙂

RWelch
Kind of a big deal

What MX firmware version or versions are the MXs currently running?  

 

Are all MXs running the same MX firmware (or are they different versions)?

Have you run simultaneous packet captures at Site A and Site B and compared the results?

 

If this were my project, I'd be inclined to do a factory reset on both MX67s and force them to fetch a fresh dashboard config, just to make sure their previous settings aren't still lingering somehow.

H311over
Conversationalist

The current firmware version, 19.1.11, is installed on all devices. Measurements and tests from both sides showed the same picture: a huge difference between download and upload speeds.

PhilipDAth
Kind of a big deal

On both the server and the workstations, enable TCP timestamping (helps a lot with asymmetric performance issues).

 

netsh int tcp set global timestamps=enabled
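
To confirm it took effect (standard Windows command, nothing Meraki-specific):

netsh int tcp show global

RFC 1323 timestamps should then be listed as enabled.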

 

H311over
Conversationalist

I did it, but it didn’t affect the measurement results at all.
