Meraki

theshmike · ‎Apr 1 2020

I am having a strange issue where the MX appliance at a remote site (in fact, at all our remote sites) sends traffic over the VPN that should be routed to the WAN.

Clients cannot reach hosts in 172.217.* (which are Google hosts). I cannot even ping those hosts from the appliance. After some investigation, I found that the MX is routing this traffic into the VPN to our main site instead of routing it over the local WAN uplink at the remote site.

This is the setup:

- main site with local subnet 172.16.0.0/16
- remote site with local subnet 172.18.5.0/24
- both connected with Meraki site-to-site VPN

The routing table (Security & SD-WAN > Route table) at the remote site looks fine:
- 172.16.0.0/16 with next hop: main site over VPN
- 0.0.0.0/0 with next hop: WAN uplink
- no other routes for 172.*

From my point of view, this is a bug: according to the routing table, the traffic should be routed to the WAN. Or am I overlooking something?
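To make the expectation concrete, here is a minimal sketch of a standard longest-prefix-match lookup over the two routes listed above. The next-hop labels and the sample addresses are illustrative assumptions, and this models ordinary routing behaviour, not what the MX does internally.

```python
import ipaddress

# Routes as described in the post; the next-hop labels are illustrative.
ROUTES = [
    (ipaddress.ip_network("172.16.0.0/16"), "VPN: main site"),
    (ipaddress.ip_network("0.0.0.0/0"), "WAN uplink"),
]


def lookup(destination, routes=ROUTES):
    """Return the next hop of the most specific route that contains destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if addr in net]
    # Longest match: the matching route with the largest prefix length wins.
    _, hop = max(matches, key=lambda item: item[0].prefixlen)
    return hop


# Sample address in the 172.217.* range mentioned in the post.
print(lookup("172.217.16.14"))  # WAN uplink
# Sample address inside the main site subnet.
print(lookup("172.16.10.20"))   # VPN: main site
```

With only these two routes, anything in 172.217.* can match nothing but the default route, which is why the observed behaviour looks wrong.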

theshmike · ‎Apr 2 2020

OK, just to inform you and without further explanation here is what I had to do to fix the issue.

Simply changing the mask to /12 did not work. And beware of this if you think it will work in your case: Doing so will reassign new random subnets to your bound networks!!!

save the firewall rules of the template (if there are rules that have VLAN1 in either source or destination)
save the Bonjour forwarding config of the template (if VLAN1 is included)
save every other piece of config of the template that includes VLAN1
delete the firewall rules that include VLAN1
delete VLAN1 from every other piece of config of the template that includes VLAN1
save the MX subnet config for VLAN1 (including DHCP settings) on every branch network

delete VLAN1 from the template
add VLAN1 with the new mask to the template
- doing this will assign random subnets to your bound networks
rebuild the firewall rules in the template
rebuild every other piece of config of the template that includes VLAN1
reassign the saved subnet configs to every single branch network

Quite a lot of work, but the API did the trick 🙂

AND, of course: create a copy of your template first, and test against that copy if you are using a script!
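The script itself was not posted. As a rough sketch of only the "save the subnet config per branch" and "reassign it afterwards" steps, something like the following could work. It assumes the Dashboard API v1 paths `/networks/{networkId}/appliance/vlans/{vlanId}` and the VLAN fields `subnet` and `applianceIp` (a 2020 script would probably have used an earlier API version). The helper names `api_call`, `backup_vlans` and `restore_vlans` are hypothetical, and DHCP fields would need to be added to `SAVED_FIELDS` after checking the API reference.

```python
import json
import urllib.request

BASE_URL = "https://api.meraki.com/api/v1"  # assumption: Dashboard API v1
# Fields copied back after the template change (assumed v1 VLAN field names).
SAVED_FIELDS = ("subnet", "applianceIp")


def api_call(api_key, method, path, body=None):
    """Hypothetical minimal HTTP helper; no retries or rate limiting."""
    data = json.dumps(body).encode() if body is not None else None
    request = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={
            "X-Cisco-Meraki-API-Key": api_key,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def backup_vlans(call, network_ids, vlan_id=1):
    """Fetch the VLAN object of every branch network before touching the template."""
    return {
        network_id: call("GET", f"/networks/{network_id}/appliance/vlans/{vlan_id}")
        for network_id in network_ids
    }


def restore_vlans(call, backup, vlan_id=1, fields=SAVED_FIELDS):
    """Write the saved fields back once the template VLAN has been recreated."""
    for network_id, vlan in backup.items():
        payload = {key: vlan[key] for key in fields if key in vlan}
        call("PUT", f"/networks/{network_id}/appliance/vlans/{vlan_id}", payload)


# Usage sketch (not executed here):
#   from functools import partial
#   call = partial(api_call, "YOUR_API_KEY")
#   backup = backup_vlans(call, ["<network id>", "..."])
#   ... delete and re-add VLAN1 in the template ...
#   restore_vlans(call, backup)
```

Passing the HTTP function in as `call` keeps the save and restore logic testable against a copy of the template, or against a fake, before it touches production networks.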

GIdenJoe · ‎Apr 1 2020

Could you share a screenshot from your routing table on the branch and the vpn config on that same site?

theshmike · ‎Apr 1 2020

The routing table looks like below.

I've cut off most of the other remote sites, because there are about 100 more of them.

GIdenJoe · ‎Apr 2 2020

There's only one default route?

Because if you configure default route via S2S VPN you should see a second entry 0.0.0.0/0 Meraki VPN: VLAN.

Only in that case should the traffic be sent through the tunnel towards unknown destinations.

And you haven't found any other summary route that the Google addresses fall under?

theshmike · ‎Apr 2 2020

@GIdenJoe wrote:

Because if you configure default route via S2S VPN you should see a second entry 0.0.0.0/0 Meraki VPN: VLAN.
Only in that case should the traffic be sent through the tunnel towards unknown destinations.

Yes, there is only this one default route and there is also absolutely no other summary route that catches the 172.217.x.x-addresses.

..and that's why I am wondering about the MX sending the traffic into the VPN - because based on the routing table it should definitely NOT do that...

Aaron_Wilson · ‎Apr 2 2020

At the remote site, you have the default route box unchecked on the VPN connection?

theshmike · ‎Apr 2 2020

Yes, the box is unchecked in the template for all remote sites...

theshmike · ‎Apr 2 2020

I've opened a case with Meraki, and it seems that this behaviour is caused by this "Addressing & VLANs" setting in the template for our remote sites:

Apparently, AutoVPN assumes that everything in 172.0.0.0/8 should be routed via the VPN, because the local subnets (which are only /24) are carved out of 172.0.0.0/8.

If so, I think this is very, very, VERY dumb logic!

GIdenJoe · ‎Apr 2 2020

Ah I see, I didn't know the templates would have effect on that.

However, I do see an error on your part in your last screenshot.

You are actually taking the entire 172.0.0.0/8 space as "private" space to carve out, while a lot of that space is public, which is the issue you are experiencing.

You should only include 172.16.0.0/12 in that template space.
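A quick check with Python's `ipaddress` module illustrates the difference between the two pools. The Google address is just a sample from the 172.217.* range mentioned earlier, and the variable names are illustrative.

```python
import ipaddress

template_pool = ipaddress.ip_network("172.0.0.0/8")    # pool set in the template
rfc1918_block = ipaddress.ip_network("172.16.0.0/12")  # private range per RFC 1918
google_host = ipaddress.ip_address("172.217.16.14")    # sample address in 172.217.*
main_site = ipaddress.ip_network("172.16.0.0/16")
remote_site = ipaddress.ip_network("172.18.5.0/24")

# The /8 pool swallows public addresses such as Google's.
print(google_host in template_pool)   # True
# The RFC 1918 block does not.
print(google_host in rfc1918_block)   # False

# Both site subnets from this thread still fit inside the /12.
print(main_site.subnet_of(rfc1918_block))    # True
print(remote_site.subnet_of(rfc1918_block))  # True
```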

theshmike · ‎Apr 2 2020

Yeah, that's true. But I also don't see the sense in this routing behaviour based on the template, especially when it's not listed in the routing table.

I am going to change the mask to /12. The f*** part about this is that it will regenerate and apply new random subnets out of the new /12 range for all my remote sites.

I think I have to fix it with an API script afterwards.

GIdenJoe · ‎Apr 2 2020

I agree that this "optimization" violates standard routing rules (longest match).

I guess someone thought it would be a great idea to aggregate all possible VPN routes when using templates for branches. And yes, you wouldn't have noticed any problem if you had originally used the /12.

But the fact is that even when optimizing, you should never ever violate standards, because sooner or later you will have cases exactly like this one.

Nash · ‎Apr 2 2020

Just here to bang the "follow the standards" drum too. Using the most specific path is so standard that I would not, frankly, ever expect to see a supernet given priority.

Not even a special supernet like the one used for templates.

PhilipDAth · ‎Apr 2 2020

The 172.0.0.0/8 prefix is wrong. It should be 172.16.0.0/12.

https://tools.ietf.org/html/rfc1918

GIdenJoe · ‎Apr 2 2020

Nope PhilipDAth, it's 172.16.0.0/12 😉 But I already informed him of that and he realizes it.

theshmike · ‎Apr 2 2020