Site to Site tunnel with vMX on AWS

Solved
chrismoses
Here to help

Over the weekend we attempted to connect all of our remote sites (about 25) to a vMX in AWS. We already have a physical MX100 acting as a hub in our data center for our remote sites, and they've been easy and trouble-free. Unfortunately, for no reason that we can determine, four of our 26 sites will not connect to the AWS-based vMX. These same sites have no problem connecting to the physical MX, and we are really stuck. We contacted support and they just said that "something is being blocked", but I don't see how the S2S VPN to our physical MX can be working if "something is being blocked" that affects this feature. We tried assigning an Elastic IP to the vMX instance, and I added inbound UDP traffic to the TCP security group that Meraki auto-generated. Has anyone else experienced this, or have any ideas what to look at? This threw a huge spanner in a major project that I have months invested in. These sites are located around the country and I haven't been able to replicate the problem with any of my devices and internet connections locally, so I really only have the Meraki dashboard tools to work with.

1 Accepted Solution
jdizzle
Here to help

On the Site-to-site configuration page of the vMX there should be an option to switch NAT traversal between automatic and manual. The port will be configured on the vMX, but you'll need to make sure the IP matches the Elastic IP (or whatever other public IP) used by the vMX.


5 Replies
PhilipDAth
Kind of a big deal

The remote sites - do the MXs have a public IP address on them directly (best), or do they have something in front of them doing NAT?

 

What firmware version are you using for your MXs?

 

Did you try rebooting one of the four MXs with the connectivity issue?

 

You definitely want an Elastic IP address assigned to the vMX, so that is good. You probably don't need to allow all inbound UDP traffic, but that certainly won't stop it working.

 

 

The thing that stands out to me is that 22 out of 26 sites connect to the vMX just fine. So this makes me think it has nothing to do with your vMX configuration, and it makes me suspicious about the 4 sites that don't connect.

 

Even with so few details - I'm going to take a punt. The four affected sites don't have a public IP address directly on the WAN interface and are sitting behind routers doing NAT - and those four routers doing NAT have a defect in their NAT code. Something will be different about those four compared to the other 22 sites. You might be able to resolve it by upgrading the firmware used by those four units. If those four units are different to the rest in your network and the firmware upgrade does not work, you could try replacing one of them.

This is just wild speculation at this point in time.

jdizzle
Here to help

Have you tried configuring a manual VPN IP/port for your vMX? If there are issues like the other poster suggested, this may resolve them. Specifically, if the source port of outbound UDP connections at your broken sites is being remapped, the NAT punching logic likely breaks. Setting a static IP/port allows the tunnel to work without needing the NAT punching logic.
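
To illustrate the port-remapping point, here is a toy Python model (not Meraki's actual implementation). It assumes a simplified flow in which the spoke first contacts a registry, the registry advertises the spoke's public IP/port to the hub, and the hub then sends to that advertised endpoint. All names, addresses and ports below (`Nat`, `REGISTRY`, `HUB`, `SPOKE_LAN`) are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (ip, udp_port)

# Hypothetical placeholder endpoints (documentation address ranges).
REGISTRY: Endpoint = ("198.51.100.10", 9000)
HUB: Endpoint = ("203.0.113.20", 51000)
SPOKE_LAN: Endpoint = ("192.168.1.2", 45000)


@dataclass
class Nat:
    """Toy model of the upstream NAT device in front of a spoke."""
    public_ip: str
    preserves_port: bool  # False = a new source port per destination
    _next_port: int = 40000
    _mappings: Dict[Tuple[Endpoint, Endpoint], int] = field(default_factory=dict)

    def translate(self, src: Endpoint, dst: Endpoint) -> Endpoint:
        """Return the public (ip, port) that dst sees for this flow."""
        if self.preserves_port:
            return (self.public_ip, src[1])
        key = (src, dst)
        if key not in self._mappings:
            self._mappings[key] = self._next_port
            self._next_port += 1
        return (self.public_ip, self._mappings[key])


def automatic_traversal_works(nat: Nat) -> bool:
    """Hole punching: the hub sends to the endpoint the registry saw."""
    advertised = nat.translate(SPOKE_LAN, REGISTRY)
    actual = nat.translate(SPOKE_LAN, HUB)
    # If the NAT remapped the port for the hub flow, the hub is
    # aiming at a port that has no mapping for it.
    return advertised == actual


def manual_traversal_works(nat: Nat) -> bool:
    """Static hub IP/port: the spoke initiates, the hub replies to
    whatever source endpoint it actually observed."""
    observed = nat.translate(SPOKE_LAN, HUB)
    return observed == nat.translate(SPOKE_LAN, HUB)


if __name__ == "__main__":
    for preserves in (True, False):
        nat = Nat("192.0.2.1", preserves_port=preserves)
        print(preserves, automatic_traversal_works(nat), manual_traversal_works(nat))
```

In this model the port-preserving NAT works either way, while the remapping NAT only works when the hub has a fixed, known IP/port that the spoke can initiate to.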

 

Follow up Q: your post doesn't mention anything about a Checkpoint firewall, but the title does. Why is that?

chrismoses
Here to help

Thanks for the responses!

 

1) I have no idea where "checkpoint" came from in the topic. I don't use Checkpoint and never have. I typed "Site to Site VPN with vMX on AWS". I've tried changing it and it keeps popping back to "Site to Site tunnel with Checkpoint"...

 

2) Yes, all four sites are behind other devices that are NATing. At least two are at Regus facilities (hiss!).

 

3) Three sites are Z1 (FW 12.26 or 13.33) and one is a Z3 (FW 14.16 up to date).

 

4) WRT jdizzle's question, "Have you tried configuring a manual VPN IP/port for your vMX?" are you talking about the remote spoke devices or a change on the vMX in AWS? I'm not quite following your question but would like to.

 

5) Oddly, all four sites are now connected through the Meraki site-to-site VPN even though I made no changes. The connections took almost 48 hours to come up and they all seem to have come up at the same time (which kind of makes me think it was something on the AWS side and not at the individual remote locations?). EDIT: I looked into the logs and one site came up at 10:50:40 and the other three all came up at 10:50:43 (on 4/29), so they connected within three seconds of each other.

 

Thanks again and in advance.

jdizzle
Here to help

On the Site-to-site configuration page of the vMX there should be an option to switch NAT traversal between automatic and manual. The port will be configured on the vMX, but you'll need to make sure the IP matches the Elastic IP (or whatever other public IP) used by the vMX.

chrismoses
Here to help

For some reason I did have to set NAT traversal to manual on the vMX appliance. I picked a much lower port than automatic was choosing and the trouble sites came up almost immediately.
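
For anyone scripting the AWS side, the manually chosen port also needs to be reachable through the vMX's security group. Below is a minimal sketch, assuming boto3 is installed and credentials are configured; the helper names, the port value and the group ID are hypothetical, and whether a single-port rule is sufficient for your deployment is an assumption to verify rather than something confirmed in this thread.

```python
from typing import Any, Dict


def build_udp_ingress_rule(port: int, cidr: str = "0.0.0.0/0") -> Dict[str, Any]:
    """Build one inbound UDP permission for a single port."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid UDP port: {port}")
    return {
        "IpProtocol": "udp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [
            {"CidrIp": cidr, "Description": "vMX manual NAT traversal port"}
        ],
    }


def apply_rule(group_id: str, rule: Dict[str, Any]) -> None:
    """Add the rule to an existing security group (makes a real API call)."""
    import boto3  # third-party; imported lazily so the builder works without it

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=[rule])


if __name__ == "__main__":
    # Hypothetical port; use whatever you set on the vMX.
    rule = build_udp_ingress_rule(12345)
    print(rule)
    # apply_rule("sg-...", rule)  # group ID left elided on purpose
```

Restricting `cidr` to known spoke addresses is an option, but sites behind third-party NAT may not have stable public IPs.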