Hey
For the past week I've been troubleshooting some performance issues on AutoVPN over a 4G connection with an MG41.
After battling with it for numerous days, we may have gotten close to identifying what's going on.
Backstory
We have a Test PC that's cabled directly to an MX which acts as a Spoke. The MX builds an AutoVPN tunnel to a One-Armed Concentrator Hub in the customer DC. The Hub has an uplink to a pair of Cisco FTDs.
When the Spoke MX is connected by cable to an ISP device and AutoVPN Default Route is enabled, we don't see any performance issues, and webpages load easily.
When the Spoke MX is connected to an MG41 and AutoVPN Default Route is enabled, webpages on the Test PC can barely load, if they load at all.
All these issues seem to be related to HTTPS traffic, though this is not conclusive. Cisco (and in fact most other vendors) appear to have an open bug regarding modern cryptographic methods in HTTPS and TLS 1.3. Since Chromium (and by extension Chrome and Edge) added support for post-quantum key exchange, the TLS Client Hello is larger than it used to be, resulting in a larger packet that ends up being fragmented. For some reason the smaller packet (typically around 400 bytes) is processed faster than the larger one (1492 bytes) and hits the wire first, so the latter part of the original packet arrives at the FTD before the former part. Because of this, the FTD drops the packet, as it is seen as some form of IDS/IPS attack. See this bug: https://quickview.cloudapps.cisco.com/quickview/bug/CSCwj82736
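To make the size math a bit more concrete, here's a quick back-of-the-envelope sketch in Python. The ~400 byte "classic" Client Hello, the 1759 byte post-quantum one and the 1380 byte effective MSS are taken from the numbers in this post; everything else is an assumption for illustration.

```python
# Back-of-the-envelope: how a TLS 1.3 Client Hello gets split on the tunnel path.
# Sizes are taken from this thread or assumed for illustration, not measured constants.

def split_payload(payload_len: int, mss: int) -> list[int]:
    """Return the TCP segment payload sizes when payload_len is sent over a path
    whose effective MSS is mss."""
    segments = []
    remaining = payload_len
    while remaining > 0:
        segments.append(min(mss, remaining))
        remaining -= segments[-1]
    return segments

CLASSIC_HELLO = 400    # rough size of a Client Hello without a post-quantum key share (assumed)
PQ_HELLO      = 1759   # Client Hello with a hybrid post-quantum key share, from the pcap below

# 1460 = 1500 B wired MTU minus 40 B IPv4/TCP headers (assumed);
# 1380 = effective MSS observed on the AutoVPN-over-cellular path in the capture.
for mss in (1460, 1380):
    for hello in (CLASSIC_HELLO, PQ_HELLO):
        print(f"MSS {mss}: {hello:>4} B Client Hello -> segments {split_payload(hello, mss)}")

# With MSS 1380 the 1759 B hello becomes [1380, 379] -- the two packets seen on the
# Spoke LAN. The small classic hello always stays in one segment, so there is nothing
# to reorder or drop.
```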
Now, this is all mainly Cisco Classic. What does it have to do with Meraki? Here's the kicker...
We are attributing all of this to the MTU size on the MG being smaller, due to its cellular nature, compared to the wired uplink on the MX. However, we are still seeing some odd things going on at the packet level.
When the AutoVPN Default Route is disabled (local internet breakout via the MG41), everything is great again.
On the Test PC we have a pcap that shows a TLS 1.3 Client Hello with a packet size of 1813 bytes (TCP payload of 1759 bytes). On the Spoke MX LAN we see the same packet, fragmented into two packets of 1438 and 437 bytes, with TCP payloads of 1380 and 379 bytes respectively. 1380 + 379 = 1759 bytes, i.e. the same size as the original TCP payload.
In the figure, the Test PC is on the left-hand side and the MX Spoke LAN is on the right-hand side. Even though the DF bit is set on the Test PC, the packet is fragmented anyway when it lands on the MX Spoke LAN.
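In case anyone wants to reproduce the check, here's a rough scapy sketch of what I'm doing by hand: pull the Client Hello segments out of a capture and print their sizes and DF bits. The pcap file names and the client IP are placeholders.

```python
# Sketch: list TLS Client Hello segments in a pcap, with payload size and DF bit.
# File names and the client IP filter are placeholders -- adjust to your own captures.
from scapy.all import rdpcap, IP, TCP, Raw

def client_hello_segments(pcap_file: str, client_ip: str):
    rows = []
    for pkt in rdpcap(pcap_file):
        if not (IP in pkt and TCP in pkt and Raw in pkt):
            continue
        if pkt[IP].src != client_ip or pkt[TCP].dport != 443:
            continue
        payload = bytes(pkt[Raw].load)
        # 0x16 0x03 0x01 = start of a TLS handshake record, as sent in a Client Hello
        rows.append({
            "time": float(pkt.time),
            "seq": pkt[TCP].seq,
            "len": len(payload),
            "df": bool(pkt[IP].flags.DF),
            "hello_start": payload[:3] == b"\x16\x03\x01",
        })
    return rows

for name in ("testpc.pcap", "spoke_lan.pcap"):              # placeholder file names
    print(name)
    for row in client_hello_segments(name, "192.0.2.10"):   # placeholder client IP
        print("  ", row)

# Expected: one 1759 B segment on the Test PC, and two segments of 1380 B and 379 B
# on the Spoke LAN whose lengths add up to 1759.
```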
In this next screenshot, we have the Hub MX on the left-hand side and the Cisco FTD on the right-hand side.
Here we see the same packets from the MX Spoke LAN, but in reverse order when exiting the Hub MX. On the FTD ingress interface we see exactly the same as on the Hub MX, meaning nothing is changed on the path from the Hub MX to the FTD.
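And a similarly rough sketch (again scapy, placeholder file name) of how the reordering can be spotted: walk the capture in arrival order and flag any segment whose sequence number is lower than one already seen on the same flow.

```python
# Sketch: flag out-of-order TCP segments by comparing arrival order with sequence
# numbers. File name is a placeholder; no sequence-number wrap handling.
from scapy.all import rdpcap, IP, TCP

def out_of_order(pcap_file: str):
    highest_seq = {}   # (src, dst, sport, dport) -> highest sequence number seen so far
    flagged = []
    for index, pkt in enumerate(rdpcap(pcap_file)):
        if not (IP in pkt and TCP in pkt):
            continue
        flow = (pkt[IP].src, pkt[IP].dst, pkt[TCP].sport, pkt[TCP].dport)
        seq = pkt[TCP].seq
        if flow in highest_seq and seq < highest_seq[flow]:
            flagged.append((index, flow, seq))
        highest_seq[flow] = max(highest_seq.get(flow, 0), seq)
    return flagged

# Expect this to be empty on the Spoke LAN capture, but to flag the 1380 B part of the
# Client Hello on the Hub MX / FTD capture if the tunnel really reverses the two packets.
for index, flow, seq in out_of_order("hub_mx_egress.pcap"):   # placeholder file name
    print(f"packet #{index}: flow {flow}, seq {seq} arrived after a higher sequence number")
```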
Changing the MTU size on the Test PC seems to do wonders and in fact fixes everything. Unfortunately, this is just not a feasible solution in the long run.
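For what it's worth, the socket-level equivalent of that workaround is to clamp the MSS on a single test connection instead of lowering the interface MTU. A hedged, Linux-only sketch for testing rather than a fix; the 1200 byte value is just an arbitrary "safely small" guess.

```python
# Linux-only sketch: clamp the MSS on one test connection instead of lowering the
# interface MTU. If this works while normal browsing stalls, segment size is the culprit.
import socket, ssl

def fetch_with_small_mss(host: str, mss: int = 1200) -> None:
    raw = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    raw.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, mss)   # must be set before connect()
    raw.connect((host, 443))
    with ssl.create_default_context().wrap_socket(raw, server_hostname=host) as tls:
        tls.sendall(f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode())
        print(tls.recv(200).decode(errors="replace"))

fetch_with_small_mss("example.com")   # placeholder host
```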
My main question is, has anyone else seen something like this?
From what I gather, three things are going on here: 1) something is borked with Path MTU Discovery, as the MTU is not being set correctly, 2) the AutoVPN tunnel is reversing the order of the packets, and 3) the FTD is dropping the packet as a result of "URL Filtering and TLS Server Identity Discovery" (https://www.cisco.com/c/en/us/td/docs/security/secure-firewall/management-center/device-config/710/m...)
Or, am I simply barking up the wrong tree here?
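For point 1), this is roughly how I'd sanity-check Path MTU Discovery from the Test PC: send DF-marked UDP probes and read back the kernel's cached path MTU. Linux-only sketch; the target host, port and probe sizes are placeholders, and it needs a destination that will actually trigger ICMP "fragmentation needed" on the path to be meaningful.

```python
# Linux-only sketch: probe the cached path MTU towards a host by sending DF-marked
# UDP datagrams. An EMSGSIZE error, or a shrinking IP_MTU value, means an ICMP
# "fragmentation needed" came back -- i.e. PMTUD is doing its job.
import errno, socket, time

# Linux socket option values (newer Pythons also expose these as socket.IP_* constants)
IP_MTU_DISCOVER, IP_PMTUDISC_DO, IP_MTU = 10, 2, 14

def probe_path_mtu(host: str, port: int = 33434) -> int:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)  # set DF, never fragment
    s.connect((host, port))
    for size in (1200, 1400, 1472):            # UDP payload sizes to try (placeholders)
        try:
            s.send(b"\x00" * size)
        except OSError as exc:
            if exc.errno == errno.EMSGSIZE:    # kernel already knows this exceeds the path MTU
                print(f"{size} B payload rejected locally (EMSGSIZE)")
            else:
                raise
        time.sleep(0.5)                        # give any ICMP "frag needed" time to come back
    return s.getsockopt(socket.IPPROTO_IP, IP_MTU)

print("cached path MTU:", probe_path_mtu("198.51.100.1"))   # placeholder target
```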
LinkedIn ::: https://blog.rhbirkelund.dk/

Like what you see? Give a kudo. Did it answer your question? Mark it as a solution.
All code examples are provided as is. Responsibility for code execution is solely your own.