Wrong routing decision: MX sends WAN-traffic over VPN

Solved
theshmike
Getting noticed

I am having a strange issue where the MX appliance at each of our remote sites sends traffic over the VPN that should be routed to the WAN.

 

Clients cannot reach hosts in 172.217.* (which are Google hosts). I cannot even ping those hosts from the appliance. After some digging, I found that the MX routes this traffic into the VPN to our main site instead of sending it out the local WAN uplink at the remote site.

 

This is the setup:

 

  • main site with local subnet 172.16.0.0/16
  • remote site with local subnet 172.18.5.0/24
  • both connected with Meraki site-to-site VPN
  • The routing table (Security & SD-WAN > Route table) at the remote site looks fine:
    • 172.16.0.0/16 with next hop: main site over VPN
    • 0.0.0.0/0 with next hop: WAN uplink
    • no other routes for 172.*

From my point of view this is a bug: according to the routing table, the traffic should be routed to the WAN. Or am I overlooking something?
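For reference, this is what a standard longest-prefix-match lookup would return for the two routes listed above. It is only a sketch using Python's `ipaddress` module; `ROUTES` and `lookup` are made-up names for illustration, and the Google address is just an example from 172.217.*.

```python
import ipaddress

# The two routes from the remote site's route table, as described above.
ROUTES = [
    ("172.16.0.0/16", "main site over VPN"),
    ("0.0.0.0/0", "WAN uplink"),
]

def lookup(dest, routes=ROUTES):
    """Return the next hop of the most specific route that contains dest."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    # Longest match wins: pick the candidate with the largest prefix length.
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop

print(lookup("172.217.16.14"))  # only the default route matches
print(lookup("172.16.4.10"))    # the /16 is more specific than the default
```

With only these two routes, 172.217.* can match nothing but the default route, which is why the observed VPN path looks wrong.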

 

1 Accepted Solution
theshmike
Getting noticed

OK, just to inform you, and without further explanation, here is what I had to do to fix the issue.

Simply changing the mask to /12 did not work. And beware if you think it will work in your case: doing so reassigns new random subnets to your bound networks!!!

 

  • save the firewall rules of the template (if there are rules that have VLAN1 in either source or destination)
  • save the Bonjour forwarding config of the template (if VLAN1 is included)
  • save every other piece of config of the template that includes VLAN1
  • delete the firewall rules that include VLAN1
  • delete VLAN1 from every other piece of config of the template that includes VLAN1
  • save the MX subnet config for VLAN1 (including DHCP settings) on every branch network

 

  • delete VLAN1 from the template
  • add VLAN1 with the new mask to the template
    • doing this will assign random subnets to your bound networks
  • rebuild the firewall rules in the template
  • rebuild every other piece of config of the template that includes VLAN1
  • reassign the saved subnet configs to every single branch network

 

Quite a lot of work, but the API did the trick 🙂

AND, of course: create a copy of your template first. Test on the copy if you are using a script!
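The save and restore steps for the per-branch VLAN1 settings could be scripted roughly like this. It is only a sketch, not the script used in this thread. It assumes the official Meraki Python SDK and its `getNetworkApplianceVlan` / `updateNetworkApplianceVlan` calls; `KEEP_FIELDS`, `backup_vlan` and `restore_vlan` are hypothetical names, and the field list should be checked against what your dashboard actually returns.

```python
# Fields copied back after VLAN1 is recreated in the template.
# This list is an assumption; adjust it to your own VLAN settings.
KEEP_FIELDS = (
    "subnet", "applianceIp", "dhcpHandling", "dhcpLeaseTime",
    "fixedIpAssignments", "reservedIpRanges", "dnsNameservers",
)

def backup_vlan(dashboard, network_ids, vlan_id="1"):
    """Return {network_id: saved VLAN settings} for every bound network."""
    saved = {}
    for net_id in network_ids:
        vlan = dashboard.appliance.getNetworkApplianceVlan(net_id, vlan_id)
        saved[net_id] = {k: vlan[k] for k in KEEP_FIELDS if k in vlan}
    return saved

def restore_vlan(dashboard, saved, vlan_id="1"):
    """Write the saved settings back to each bound network."""
    for net_id, settings in saved.items():
        dashboard.appliance.updateNetworkApplianceVlan(net_id, vlan_id, **settings)

# Possible usage (assumes `pip install meraki` and a valid API key):
#   import meraki
#   dashboard = meraki.DashboardAPI(api_key)
#   saved = backup_vlan(dashboard, branch_network_ids)
#   ... delete and re-add VLAN1 in the template ...
#   restore_vlan(dashboard, saved)
```

Running `backup_vlan` against a copy of the template first, and writing `saved` to a file, gives you something to fall back on if the restore goes wrong.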

GIdenJoe
Kind of a big deal

Could you share a screenshot from your routing table on the branch and the vpn config on that same site?

theshmike
Getting noticed

The routing table looks like below.

I've cut off most of the other remote sites, because there are about 100 more of them.

 

theshmike_1-1585807145205.png

 

 

GIdenJoe
Kind of a big deal

There's only one default route?

Because if you configure default route via S2S VPN you should see a second entry 0.0.0.0/0 Meraki VPN: VLAN.

Only in that case should the traffic be sent through the tunnel towards unknown destinations.

 

And you haven't found any other summary route that the Google addresses fall under?

theshmike
Getting noticed


@GIdenJoe wrote:

 

Because if you configure default route via S2S VPN you should see a second entry 0.0.0.0/0 Meraki VPN: VLAN.

Only in that case should the traffic be sent through the tunnel towards unknown destinations.


Yes, there is only this one default route and there is also absolutely no other summary route that catches the 172.217.x.x-addresses.

 

..and that's why I am wondering about the MX sending the traffic into the VPN - because based on the routing table it should definitely NOT do that...

Aaron_Wilson
A model citizen

At the remote site, you have the default route box unchecked on the VPN connection?
theshmike
Getting noticed

Yes, the box is unchecked in the template for all remote sites...

 

theshmike_1-1585822208031.png

 

 

theshmike
Getting noticed

I've opened a case with Meraki, and it seems that this behaviour is caused by this "Addressing & VLANs" setting in the template for our remote sites:

 

theshmike_0-1585823039816.png

 

Apparently AutoVPN thinks everything in 172.0.0.0/8 should be routed via VPN, because the local subnets (which are only /24) are carved out of 172.0.0.0/8.
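A toy model of the behaviour described here, purely for illustration. This is a guess at the effect based on what support indicated, not Meraki's actual implementation; `TEMPLATE_POOL` and `observed_next_hop` are invented names.

```python
import ipaddress

# The address pool configured in the template for the remote sites.
TEMPLATE_POOL = ipaddress.ip_network("172.0.0.0/8")

def observed_next_hop(dest):
    """Mimic the observed effect: anything inside the template pool goes to the VPN."""
    addr = ipaddress.ip_address(dest)
    if addr in TEMPLATE_POOL:
        return "VPN"
    return "WAN uplink"

print(observed_next_hop("172.217.16.14"))  # public Google address, but inside 172.0.0.0/8
print(observed_next_hop("8.8.8.8"))        # outside the pool
```

This matches the symptom: public 172.217.* addresses sit inside the /8 pool, so they end up in the tunnel even though the route table shows no such route.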

 

If so, I think this is very, very, VERY dumb logic!

GIdenJoe
Kind of a big deal

Ah I see, I didn't know the templates would have an effect on that.

 

However, I do see an error on your part in your last screenshot.

You are taking the entire 172.0.0.0/8 space as "private" space to carve out, while a lot of that space is public, hence the issue you are experiencing.

 

You should only include 172.16.0.0/12 in that template space.
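A quick way to check whether a template pool stays inside private address space, sketched with Python's `ipaddress` module (Python 3.7+ for `subnet_of`). The three blocks are the ones from RFC 1918; `is_private_pool` is a made-up helper name.

```python
import ipaddress

# Private address blocks defined in RFC 1918.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_private_pool(pool):
    """True if the whole pool fits inside one of the RFC 1918 blocks."""
    net = ipaddress.ip_network(pool)
    return any(net.subnet_of(block) for block in RFC1918)

print(is_private_pool("172.0.0.0/8"))    # False: includes public space such as 172.217.*
print(is_private_pool("172.16.0.0/12"))  # True
```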

theshmike
Getting noticed

Yeah, that's true. But I still don't see the sense in this routing behaviour being based on the template, especially when it's not listed in the routing table.

I am going to change the mask to /12. The f*** part about this is that it will regenerate and apply new random subnets out of the new /12 for all my remote sites.

I think I will have to fix that with an API script afterwards.
GIdenJoe
Kind of a big deal

I agree that this "optimization" violates standard routing rules (longest match).

I guess someone thought it would be a great idea to aggregate all possible VPN routes when using templates for branches. And yes, you wouldn't have noticed any problem if you had originally used the /12.

 

But the fact is that even when optimizing, you should never violate standards, because sooner or later you will run into cases exactly like this one.

Nash
Kind of a big deal

Just here to bang the "follow the standards" drum too. Using the most specific path is so standard that I would not, frankly, ever expect to see a supernet given priority.

 

Not even a special supernet like the one used for templates.

PhilipDAth
Kind of a big deal

The 172.0.0.0/8 prefix is wrong.  It should be  172.16.0.0/12.

https://tools.ietf.org/html/rfc1918 

GIdenJoe
Kind of a big deal

Nope PhilipDAth, it's 172.16.0.0/12 😉  But I already informed him of that and he realizes it.
