Hey
For the past week I've been troubleshooting some performance issues on AutoVPN over a 4G connection with an MG41.
After battling with it for numerous days, we may have gotten close to identifying what's going on.
Backstory
We have a Test PC that's cabled directly to an MX which acts as a Spoke. The MX builds an AutoVPN tunnel to a One-Armed Concentrator Hub in the customer DC. The Hub has an uplink to a pair of Cisco FTDs.
When the Spoke MX is connected by cable to an ISP device and AutoVPN Default Route is enabled, we don't see any performance issues, and webpages load easily.
When the Spoke MX is connected to an MG41 and AutoVPN Default Route is enabled, webpages on the Test PC can barely load, if they load at all.
All these issues seem to be related to HTTPS traffic, though this is not conclusive. Cisco (and in fact most other vendors) appear to have an open bug regarding modern cryptographic methods in HTTPS and TLS 1.3. Since Chromium (and by extension Chrome and Edge) added support for post-quantum key exchange, the TLS Client Hello is larger than it used to be, resulting in a larger packet that ends up being fragmented. For some reason the smaller packet (typically around 400 bytes) is processed faster than the larger one (1492 bytes) and hits the wire first, so the latter part of the original packet arrives at the FTD before the former part. Because of this, the FTD drops the packet, as it is seen as some form of IDS/IPS attack. See this bug: https://quickview.cloudapps.cisco.com/quickview/bug/CSCwj82736
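To make the size math a bit more concrete, here's a quick back-of-the-envelope sketch in Python. The ~400 byte "classic" Client Hello, the 1759 byte post-quantum one and the 1380 byte effective MSS are taken from the numbers in this post; everything else is an assumption for illustration.

```python
# Back-of-the-envelope: how a TLS 1.3 Client Hello gets split on the tunnel path.
# Sizes are taken from this thread or assumed for illustration, not measured constants.

def split_payload(payload_len: int, mss: int) -> list[int]:
    """Return the TCP segment payload sizes when payload_len is sent over a path
    whose effective MSS is mss."""
    segments = []
    remaining = payload_len
    while remaining > 0:
        segments.append(min(mss, remaining))
        remaining -= segments[-1]
    return segments

CLASSIC_HELLO = 400    # rough size of a Client Hello without a post-quantum key share (assumed)
PQ_HELLO      = 1759   # Client Hello with a hybrid post-quantum key share, from the pcap below

# 1460 = 1500 B wired MTU minus 40 B IPv4/TCP headers (assumed);
# 1380 = effective MSS observed on the AutoVPN-over-cellular path in the capture.
for mss in (1460, 1380):
    for hello in (CLASSIC_HELLO, PQ_HELLO):
        print(f"MSS {mss}: {hello:>4} B Client Hello -> segments {split_payload(hello, mss)}")

# With MSS 1380 the 1759 B hello becomes [1380, 379] -- the two packets seen on the
# Spoke LAN. The small classic hello always stays in one segment, so there is nothing
# to reorder or drop.
```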
Now, this is all mainly Cisco Classic. What does it have to do with Meraki? Here's the kicker...
We are attributing all of this to the MTU size on the MG being smaller, due to its cellular nature, compared to the wired uplink on the MX. However, we are still seeing some odd things going on at the packet level.
When the AutoVPN Default Route is disabled (local internet breakout via the MG41), everything is great again.
On the Test PC we have a pcap that shows a TLS 1.3 Client Hello with a packet size of 1813 bytes (TCP payload of 1759 bytes). On the Spoke MX LAN we see the same packet, fragmented into two packets of 1438 and 437 bytes, with TCP payloads of 1380 and 379 bytes respectively. 1380 + 379 = 1759 bytes, i.e. the same size as the original TCP payload.
In the figure, the Test PC is on the left-hand side and the MX Spoke LAN is on the right-hand side. Even though the DF bit is set on the Test PC, the packet is fragmented anyway when it lands on the MX Spoke LAN.
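In case anyone wants to reproduce the check, here's a rough scapy sketch of what I'm doing by hand: pull the Client Hello segments out of a capture and print their sizes and DF bits. The pcap file names and the client IP are placeholders.

```python
# Sketch: list TLS Client Hello segments in a pcap, with payload size and DF bit.
# File names and the client IP filter are placeholders -- adjust to your own captures.
from scapy.all import rdpcap, IP, TCP, Raw

def client_hello_segments(pcap_file: str, client_ip: str):
    rows = []
    for pkt in rdpcap(pcap_file):
        if not (IP in pkt and TCP in pkt and Raw in pkt):
            continue
        if pkt[IP].src != client_ip or pkt[TCP].dport != 443:
            continue
        payload = bytes(pkt[Raw].load)
        # 0x16 0x03 0x01 = start of a TLS handshake record, as sent in a Client Hello
        rows.append({
            "time": float(pkt.time),
            "seq": pkt[TCP].seq,
            "len": len(payload),
            "df": bool(pkt[IP].flags.DF),
            "hello_start": payload[:3] == b"\x16\x03\x01",
        })
    return rows

for name in ("testpc.pcap", "spoke_lan.pcap"):              # placeholder file names
    print(name)
    for row in client_hello_segments(name, "192.0.2.10"):   # placeholder client IP
        print("  ", row)

# Expected: one 1759 B segment on the Test PC, and two segments of 1380 B and 379 B
# on the Spoke LAN whose lengths add up to 1759.
```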
In this next screenshot, we have the Hub MX on the left-hand side and the Cisco FTD on the right-hand side.
Here we see the same packets from the MX Spoke LAN, but in reverse order when exiting the Hub MX. On the FTD ingress interface we see exactly the same as on the Hub MX, meaning nothing is changed on the path from the Hub MX to the FTD.
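And a similarly rough sketch (again scapy, placeholder file name) of how the reordering can be spotted: walk the capture in arrival order and flag any segment whose sequence number is lower than one already seen on the same flow.

```python
# Sketch: flag out-of-order TCP segments by comparing arrival order with sequence
# numbers. File name is a placeholder; no sequence-number wrap handling.
from scapy.all import rdpcap, IP, TCP

def out_of_order(pcap_file: str):
    highest_seq = {}   # (src, dst, sport, dport) -> highest sequence number seen so far
    flagged = []
    for index, pkt in enumerate(rdpcap(pcap_file)):
        if not (IP in pkt and TCP in pkt):
            continue
        flow = (pkt[IP].src, pkt[IP].dst, pkt[TCP].sport, pkt[TCP].dport)
        seq = pkt[TCP].seq
        if flow in highest_seq and seq < highest_seq[flow]:
            flagged.append((index, flow, seq))
        highest_seq[flow] = max(highest_seq.get(flow, 0), seq)
    return flagged

# Expect this to be empty on the Spoke LAN capture, but to flag the 1380 B part of the
# Client Hello on the Hub MX / FTD capture if the tunnel really reverses the two packets.
for index, flow, seq in out_of_order("hub_mx_egress.pcap"):   # placeholder file name
    print(f"packet #{index}: flow {flow}, seq {seq} arrived after a higher sequence number")
```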
Changing the MTU size on the Test PC seems to do wonders and in fact fixes everything. Unfortunately, this is just not a feasible solution in the long run.
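For what it's worth, the socket-level equivalent of that workaround is to clamp the MSS on a single test connection instead of lowering the interface MTU. A hedged, Linux-only sketch for testing rather than a fix; the 1200 byte value is just an arbitrary "safely small" guess.

```python
# Linux-only sketch: clamp the MSS on one test connection instead of lowering the
# interface MTU. If this works while normal browsing stalls, segment size is the culprit.
import socket, ssl

def fetch_with_small_mss(host: str, mss: int = 1200) -> None:
    raw = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    raw.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, mss)   # must be set before connect()
    raw.connect((host, 443))
    with ssl.create_default_context().wrap_socket(raw, server_hostname=host) as tls:
        tls.sendall(f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode())
        print(tls.recv(200).decode(errors="replace"))

fetch_with_small_mss("example.com")   # placeholder host
```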
My main question is, has anyone else seen something like this?
From what I gather, three things are going on here: 1) something is borked with Path MTU Discovery, as the MTU is not being set correctly, 2) the AutoVPN tunnel is reversing the order of the packets, and 3) the FTD is dropping the packet as a result of "URL Filtering and TLS Server Identity Discovery" (https://www.cisco.com/c/en/us/td/docs/security/secure-firewall/management-center/device-config/710/m...)
Or, am I simply barking up the wrong tree here?
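For point 1), this is roughly how I'd sanity-check Path MTU Discovery from the Test PC: send DF-marked UDP probes and read back the kernel's cached path MTU. Linux-only sketch; the target host, port and probe sizes are placeholders, and it needs a destination that will actually trigger ICMP "fragmentation needed" on the path to be meaningful.

```python
# Linux-only sketch: probe the cached path MTU towards a host by sending DF-marked
# UDP datagrams. An EMSGSIZE error, or a shrinking IP_MTU value, means an ICMP
# "fragmentation needed" came back -- i.e. PMTUD is doing its job.
import errno, socket, time

# Linux socket option values (newer Pythons also expose these as socket.IP_* constants)
IP_MTU_DISCOVER, IP_PMTUDISC_DO, IP_MTU = 10, 2, 14

def probe_path_mtu(host: str, port: int = 33434) -> int:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)  # set DF, never fragment
    s.connect((host, port))
    for size in (1200, 1400, 1472):            # UDP payload sizes to try (placeholders)
        try:
            s.send(b"\x00" * size)
        except OSError as exc:
            if exc.errno == errno.EMSGSIZE:    # kernel already knows this exceeds the path MTU
                print(f"{size} B payload rejected locally (EMSGSIZE)")
            else:
                raise
        time.sleep(0.5)                        # give any ICMP "frag needed" time to come back
    return s.getsockopt(socket.IPPROTO_IP, IP_MTU)

print("cached path MTU:", probe_path_mtu("198.51.100.1"))   # placeholder target
```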
LinkedIn ::: https://blog.rhbirkelund.dk/

Like what you see? Give a kudo. Did it answer your question? Mark it as a solution.
All code examples are provided as is. Responsibility for code execution is solely your own.