Non-Meraki VPN with NSX-T : flapping IPSec SA

MarcelTempelman
Getting noticed


Hello all,

 

We have a datacenter built on VMware software, and for networking we use NSX-T (we migrated from NSX-V this year). VPN tunnels with NSX-V were not really stable, but since the migration VPN issues are almost a daily occurrence. We have contacted support, but their standard answer is almost always "we see no issues here".

 

Phase 1 is generally reasonably stable, but phase 2 is really a headache.

 

Jun 6 10:03:04 VDT-HEE-FW01 Non-Meraki VPN Non-Meraki VPN negotiation msg: <remote-peer-4|42> closing CHILD_SA net-4{4913} with SPIs ce45e2d6(inbound) (13761 bytes) 9c6df103(outbound) (30904 bytes) and TS 172.30.200.0/24 === 10.95.020.0/24
Jun 6 10:03:04 VDT-HEE-FW01 Non-Meraki VPN Non-Meraki VPN negotiation msg: Jun 6 08:03:03 13[IKE] <remote-peer-4|42> outbound CHILD_SA net-4{5315} established with SPIs cd5ee2be(inbound) 25593d03(outbound) and TS 172.30.200.0/24 === 10.95.020.0/24
Jun 6 10:03:04 VDT-HEE-FW01 Non-Meraki VPN Non-Meraki VPN negotiation msg: Jun 6 08:03:03 13[IKE] <remote-peer-4|42> inbound CHILD_SA net-4{5315} established with SPIs cd5ee2be(inbound) 25593d03(outbound) and TS 172.30.200.0/24 === 10.95.020.0/24
Jun 6 10:03:04 VDT-HEE-FW01 Non-Meraki VPN Non-Meraki VPN negotiation msg: <remote-peer-4|42> closing CHILD_SA net-4{4941} with SPIs c8ba247f(inbound) (50838 bytes) 67659a00(outbound) (92069 bytes) and TS 172.30.200.0/24 === 10.95.020.0/24
Jun 6 10:03:04 VDT-HEE-FW01 Non-Meraki VPN Non-Meraki VPN negotiation msg: Jun 6 08:03:03 07[IKE] <remote-peer-4|42> outbound CHILD_SA net-4{5314} established with SPIs c3892e57(inbound) 3d90e600(outbound) and TS 172.30.200.0/24 === 10.95.020.0/24
Jun 6 10:03:04 VDT-HEE-FW01 Non-Meraki VPN Non-Meraki VPN negotiation msg: Jun 6 08:03:03 07[IKE] <remote-peer-4|42> inbound CHILD_SA net-4{5314} established with SPIs c3892e57(inbound) 3d90e600(outbound) and TS 172.30.200.0/24 === 10.95.020.0/24

 

I see IPSec SAs coming up and going down. Most of the time this does not seem to affect connectivity, but sometimes we suddenly lose connections, and we then have great difficulty restarting the tunnel because not all IPSec SAs become operational.
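For anyone wanting to quantify the flapping from syslog, here is a minimal Python sketch that counts distinct CHILD_SAs established and closed per traffic selector pair. It assumes the log lines look like the excerpt above; the regexes and the function name `summarize` are my own illustration, not a Meraki or strongSwan tool. An established/closed pair at a regular interval may simply be a normal rekey, whereas several CHILD_SAs for the same TS in the same second (as with `net-4{5314}` and `net-4{5315}` above) could point to duplicate SAs.

```python
import re
from collections import defaultdict

# Patterns modelled on the log excerpt above (assumption: other firmware
# versions may format these messages differently).
ESTABLISHED = re.compile(
    r"(?:inbound|outbound) CHILD_SA [^\s{]+\{(?P<id>\d+)\} established"
    r".*? and TS (?P<ts>.+)$"
)
CLOSING = re.compile(
    r"closing CHILD_SA [^\s{]+\{(?P<id>\d+)\}.*? and TS (?P<ts>.+)$"
)

def summarize(lines):
    """Return {traffic_selector: (established_count, closed_count)}.

    Each CHILD_SA id is counted once, even though the 'established'
    message is logged separately for inbound and outbound.
    """
    established = defaultdict(set)
    closed = defaultdict(set)
    for line in lines:
        match = ESTABLISHED.search(line)
        if match:
            established[match["ts"].strip()].add(match["id"])
            continue
        match = CLOSING.search(line)
        if match:
            closed[match["ts"].strip()].add(match["id"])
    selectors = set(established) | set(closed)
    return {ts: (len(established[ts]), len(closed[ts])) for ts in selectors}

# Three lines copied from the excerpt above.
SAMPLE = [
    "Jun 6 10:03:04 VDT-HEE-FW01 Non-Meraki VPN Non-Meraki VPN negotiation msg: <remote-peer-4|42> closing CHILD_SA net-4{4913} with SPIs ce45e2d6(inbound) (13761 bytes) 9c6df103(outbound) (30904 bytes) and TS 172.30.200.0/24 === 10.95.020.0/24",
    "Jun 6 10:03:04 VDT-HEE-FW01 Non-Meraki VPN Non-Meraki VPN negotiation msg: Jun 6 08:03:03 13[IKE] <remote-peer-4|42> outbound CHILD_SA net-4{5315} established with SPIs cd5ee2be(inbound) 25593d03(outbound) and TS 172.30.200.0/24 === 10.95.020.0/24",
    "Jun 6 10:03:04 VDT-HEE-FW01 Non-Meraki VPN Non-Meraki VPN negotiation msg: Jun 6 08:03:03 13[IKE] <remote-peer-4|42> inbound CHILD_SA net-4{5315} established with SPIs cd5ee2be(inbound) 25593d03(outbound) and TS 172.30.200.0/24 === 10.95.020.0/24",
]

if __name__ == "__main__":
    for ts, (up, down) in summarize(SAMPLE).items():
        print(f"{ts}: {up} established, {down} closed")
```

Feeding it a full day of logs and comparing the counts per selector with the configured phase 2 lifetime should show whether the churn is more than ordinary rekeying.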

 

Last night I cleaned up the traffic selectors on both sides, and that cleared up a lot in the logging (which is not surprising), but I keep seeing these IPSec SAs coming up and being torn down. Troubleshooting on NSX-T is possible but has to be done at the Linux level, because VMware does not provide any sensible tooling for it. We have support from a VMware partner but, as with a lot of server-focused companies, they have only the bare minimum of networking knowledge.

 

Phase 1 and Phase 2 are both the same. We run IKEv2; we also tried IKEv1, but that was far more unstable.

 

My question is: does anyone have any experience with VMware NSX-T and Meraki VPN tunnels?

 

4 Replies
alemabrahao
Kind of a big deal

It’s possible that there might be a mismatch in the configuration on both sides, or there could be network instability causing the VPN tunnel to drop and re-establish frequently.

I am not a Cisco Meraki employee. My suggestions are based on documentation of Meraki best practices and day-to-day experience.

Please, if this post was useful, leave your kudos and mark it as solved.

We had some differences in the traffic selectors, but we fixed that. It's only phase 2 that is flapping like hell (network instability would affect phase 1), and we do not see these issues with FortiGates.

PhilipDAth
Kind of a big deal

Does the encryption domain by chance include multiple subnets? If it does, could you try an experiment and reduce it to a single encryption domain?

 

An issue I have seen (in general) is there are two ways of building the SA:

1. Creating an SA and then adding additional encryption domain combinations via negotiation.

2. Creating a separate SA for each combination of encryption domains.

 

Meraki only supports one of the two methods above (sorry, I don't remember which). My gut feel (could well be wrong) is that it can't handle method (1), and that it sees each new negotiation as a request to replace the existing SA rather than to append to it. This results in the appearance of the VPN going up and down a lot.

 

An easy first step to determine if this is the issue is to reduce to a single encryption domain on each side.  If the problem goes away, you know you are on the right track.
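To make the difference between the two methods concrete, here is a small Python sketch that enumerates the local/remote combinations and compares how many SAs each method would need. The subnets are made-up examples and the helper names (`selector_pairs`, `sa_count`) are hypothetical; it only illustrates the counting, not how any particular vendor negotiates.

```python
from ipaddress import ip_network
from itertools import product

def selector_pairs(local_subnets, remote_subnets):
    """Every (local, remote) combination, i.e. one entry per SA in method 2."""
    return [
        (ip_network(local), ip_network(remote))
        for local, remote in product(local_subnets, remote_subnets)
    ]

def sa_count(n_local, n_remote, per_pair):
    """Number of SAs needed: one per pair (method 2) or a single shared SA (method 1)."""
    return n_local * n_remote if per_pair else 1

# Hypothetical subnets, for illustration only.
local = ["172.30.200.0/24", "172.30.201.0/24"]
remote = ["10.95.20.0/24", "10.95.21.0/24"]

if __name__ == "__main__":
    for local_net, remote_net in selector_pairs(local, remote):
        print(f"{local_net} === {remote_net}")
    print("method 1 SAs:", sa_count(len(local), len(remote), per_pair=False))
    print("method 2 SAs:", sa_count(len(local), len(remote), per_pair=True))
```

The number of pairs grows multiplicatively with the subnets on each side, so if the two ends disagree on the method, there are a lot of negotiations that can be misread as replacements. Reducing to one subnet per side leaves a single pair either way, which is why it makes a clean test.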

 

 

PS: I have gotten sick and tired of resolving these, so now I often deploy strongSwan on Ubuntu in a VM. It supports every combination of every option that you can think of, and is rock solid. And the software is free.

It's a bit of trial and error. We have about 10 subnets on both sides. I know that Meraki uses one SA for all traffic selectors when using IKEv2; I'm not sure about NSX-T (documentation is scarce). We're also using FortiGates at customer sites to connect to NSX-T, and we see few issues there (and also with ASAs).

The 3rd-party VPN issues are really holding us back from selling these MXs to customers, and that is a shame. For one customer we installed an MX in our datacenter, and AutoVPN is rock solid, but that's not our preferred solution (although I wouldn't mind having a stack of customer MXs in the DC). In the meantime I hope Meraki will give the 3rd-party VPN some more love (and add VTI!).

 

 
