3rd Party VPNs not working/General MX issues

TNAComputers
Getting noticed

Just an FYI to everyone: I have been having issues with 3rd party VPNs since 18.x code. I am on the latest 19.x code for all devices in question. I even split them across versions (19.x code on some, remotes on 18.x code, etc.) for testing purposes. I have spent about 40 hours with support working on this specific issue.

 

A little background: the head end is an MX95, and the remotes are MX67/MX68, so it is all Meraki, just in different orgs, hence the 3rd party VPN setup. The first tunnel that comes up works fine. When the second tunnel tries to come up, the initial udp/500 packet works on the proper primary uplink, but return traffic gets sent out WAN2 and the tunnel never comes up. Now on 19.x code, it just stops working altogether, even with one tunnel.

 

We went through all 18.x versions and then 19.x with the issue still present. I have had a ticket open for months on this, and the workaround was to disable multi-core support. However, I figured out a month later that this was adding 2-4 ms of latency to ALL connections and limiting WAN2 throughput by 60-70%. This was causing major issues. Recently I had to get that workaround removed, and latency and speeds returned to normal. However, the 3rd party VPN issue cropped back up again. Looking at the logs, the VPN isn't even trying to come up. It just stops working at random. Rebooting the head-end seems to fix it until the next re-key, ISP issue, etc. All VPN settings are correct, and it worked 100% fine with multi-core disabled, other than the issues I stated above.

 

AMP/IPS isn't working (I think), as I never get any alerts while AMP is active and IPS is set to prevention. If I disable AMP and set IPS to detection, I start seeing alerts again. There is nothing in the event logs showing any issues.

 

I have to say I'm shocked that basic functionality of MXs is not working correctly over multiple major release versions. It seems like (since 2017, when I started using them) major/core features are not working correctly, or problems with code are causing issues, etc. This isn't a rant at all, and I know each vendor has its own problems, but these are basic components of a "firewall", especially a stripped feature/functionality one that costs a premium to use.

 

Has anyone else had similar issues, or is it just me that is lucky enough to find problems with every release? I ended up having to move the devices from their own org into the org with the head-end to use Auto VPN, which magically works great. That presents its own problems with licensing, subnet allocation, etc. It now takes congress/CEO level approval to move devices from Co-term to PDL, as they are pushing the subscription licensing model on everyone now. PDL is no longer possible and is being phased out. It took over an hour of approvals from the AM and the licensing management team just to move the license, all of which stemmed from the 3rd party VPNs not working. Had this not been a known issue, they would have denied it and it wouldn't have worked at all.

 

 

PhilipDAth
Kind of a big deal

Are you by chance using IKEv2 and have multiple subnets selected?  IKEv2 on Meraki only supports a single subnet combination.  You have to use IKEv1 if you need to support multiple subnet combinations.
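As a rough illustration of the rule of thumb above, the sketch below counts how many local/remote subnet combinations a tunnel would need and suggests an IKE version accordingly. It is not a Meraki API; the function names and the example subnets are hypothetical, and it only encodes the single-combination limit described in this reply.

```python
from ipaddress import ip_network
from itertools import product


def subnet_pairs(local_subnets, remote_subnets):
    """Return every (local, remote) subnet combination the tunnel must carry."""
    locals_ = [ip_network(s) for s in local_subnets]
    remotes = [ip_network(s) for s in remote_subnets]
    return list(product(locals_, remotes))


def suggested_ike_version(local_subnets, remote_subnets):
    """Suggest IKEv2 only when exactly one subnet combination is needed.

    Assumption: this mirrors the limit described in the reply, nothing more.
    """
    pairs = subnet_pairs(local_subnets, remote_subnets)
    return "IKEv2" if len(pairs) == 1 else "IKEv1"


if __name__ == "__main__":
    # Hypothetical subnets: two local networks and one remote network.
    print(suggested_ike_version(["10.0.0.0/24", "10.0.1.0/24"], ["192.168.1.0/24"]))
```

Two local subnets and one remote subnet already give two combinations, so under this rule the tunnel would need IKEv1.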

 

I echo your sentiment that MX is weak in the third-party VPN area.

 

What I tend to do, when I need to link orgs, is put an MX from each org next to each other in one central site, add static routes between them, and redistribute those static routes back into AutoVPN on each side.

That works rock solid.

TNAComputers
Getting noticed

Hi Philip. Thanks for the info. On one of the VPNs it was a single subnet. However, on the other it was multiple subnets. There was no rhyme or reason as to which one would work, and with multi-core support disabled, both were working without issue. I admit I didn't test with the second one reduced to a single subnet, as that might have caused an issue for both of them, but again it was working fine that way with multi-core disabled.

 

Thanks for the tip on the Auto VPN as well. I ended up just moving those devices into the org with the head-end MX, and so far everything has been working fine. Rather than fighting this in the future, that might be what I do. I have a feeling that I'm going to have to do this again soon with other devices.

JonnyM
Getting noticed

Routed tunnels would massively improve the non-Meraki VPN tunnel experience for me, but until then you need to be very aware of the VPN limitations on the MX platform and work around them where possible. The most flexible way is to put a non-Meraki box in to do the VPN bits and then create static routes to it, but it does sort of wreck the "single pane of glass" promise that Meraki makes.

TNAComputers
Getting noticed

100% agree with you. If I have to put another device in to terminate the VPN connections, why have an MX in the first place? If I get to that point, I will just move back to FTD or some other vendor and get rid of the MX outright.

 

Going off topic, I can't really justify the "cloud" aspect of this anymore. That's the only thing that makes Meraki unique, but with Cisco hosting FMC/CDO (Juniper Mist/Prisma Cloud etc.), the same level can be achieved with other products that are way more stable and can actually do NAT (which has been on the wishlist since 2015 that I know of). Sure, Prisma Cloud isn't as pretty and doesn't have as many options as Meraki, but I would choose stability over features anytime, and I feel this should always be the foundation of any vendor's devices.

 

Issues I have specifically with the MX line:

1. NAT support- I know there is some support now with Auto VPN, but most people already have unique subnets if they are part of your org. I'm talking about non-Meraki VPN NAT. Let's face it, 99% of us will run into network overlap, especially with many peers, with no recourse other than telling the remote party to NAT to us or to change their IP subnet.

2. AMP/IPS- No indication that it's working. No reports, emails, or log entries showing that it's working, as I stated above, especially in 18.x/19.x. Maybe a bug? I have to rely on endpoint software telling me it blocked something, which means it made it past the firewall.

3. GEO blocking- Again, no indication that it's working. No logs showing what's blocked, etc.

4. Logging- Logging in general is horrible. Why Meraki doesn't put the system logs out there for those of us that want to look is beyond me. They are following the keep-it-simple mantra, I get it, but a button somewhere to turn it on for those that want to look would be great.

5. Resource/Utilization- No way to tell CPU/RAM/system utilization other than the summary report, which blends all of that together to spit out a number. I have had multiple MXs lock up on me with no way to tell why other than calling support, and it was due to CPU/RAM etc. crashing the system.

6. URL Filtering- As far as I can tell it works, but there is no easy way to tell what's getting blocked. Logging has improved, but most of the time it didn't log the events, or it was an event burst, so you still don't see them as they were dropped from the logs.

7. Mix and match license types (within MX)- Not every site needs SD-WAN or Advanced Security licenses. The answer I always get is to get a Z3/Z4 device (don't get me started on the Z4 being licensed like an MX now), but sometimes a remote site has 1 Gbps internet and those won't work, so you have to go with an MX. They are forcing you to buy unneeded licenses to use Auto VPN, etc. The only other choice is to set up another org and a non-Meraki VPN (see issues in original post).

8. FQDN VPN peers do not work (non-Meraki VPN, IKEv2).

9. Support- I have nothing against the support team at all, but I already have a pretty good idea of what's happening when I call support, and most of the time it's because of a bug or something I don't have access to. See the logging section. The default answer is to do a packet capture, which doesn't identify whether it's L3/L7/AMP/IPS/GEO etc., so you have to go through them one by one and allow/deny to figure it out. I would say 90%+ of all my support calls have been due to bugs that require back-end logs to verify/confirm.
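The subnet overlap problem from item 1 is easy to check for ahead of time, even if the MX offers no NAT to work around it. Below is a minimal sketch using Python's standard `ipaddress` module; the peer names and subnets are made up for illustration, and `find_overlaps` is a hypothetical helper, not anything from the Meraki dashboard or API.

```python
from ipaddress import ip_network


def find_overlaps(local_subnets, peer_subnets):
    """Return (peer_name, local, remote) tuples whose address ranges overlap.

    local_subnets: list of CIDR strings on our side.
    peer_subnets:  dict mapping a peer name to its list of CIDR strings.
    """
    conflicts = []
    for peer_name, subnets in peer_subnets.items():
        for remote in subnets:
            for local in local_subnets:
                if ip_network(local).overlaps(ip_network(remote)):
                    conflicts.append((peer_name, local, remote))
    return conflicts


if __name__ == "__main__":
    # Hypothetical peers; replace with your own subnets.
    peers = {"peer-a": ["192.168.1.0/24"], "peer-b": ["10.0.0.0/16"]}
    for peer, local, remote in find_overlaps(["10.0.5.0/24"], peers):
        print(f"{peer}: local {local} overlaps remote {remote}")
```

Any conflict it reports is a pair where one side would have to NAT or renumber before a non-Meraki tunnel could route correctly.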

 

All of these issues are nonexistent on Cisco FTD/ASA, Fortigate, etc., and I consider them to be core functionality of what makes a firewall a firewall.

 

TNAComputers
Getting noticed

Now, as a new issue with 19.x code (I know it's a release candidate as of this time), port forwarding is not working on WAN2. It just fails to forward anything from WAN2 to the client device. I am downgrading to the latest 18.x patch to see if any of these issues clear up, but I would expect a release candidate to be 90-95% ready to go, and port forwarding is a core feature of a firewall. This is all really basic stuff that keeps getting broken with every release.

TNAComputers
Getting noticed

As an update, downgrading did not fix the issue. I had a spare MX250, so I tested on a different platform, and it does not work there either, on both the .2 and .4 patches of 18.x code as well as on 19.x firmware. I cannot go back further than 18.211.2. The MX250 was on 16.x code from 2020-2021, so it upgraded to the org code, and I cloned my prod network to this one to retain all settings. I thought maybe something was off, so I factory reset it to make sure there was nothing lingering. I also turned off AMP/IPS/URL/GEO just to make sure nothing was interfering with the connection.

 

Maybe someone else can test this and it's just a mistake on my side. WAN1 port forwarding works fine (WAN1 is my primary uplink). WAN2 port forwarding will not work. It gets a random port and not the port that it's supposed to use. I have the ports I want forwarded set up, and I have the SD-WAN policies set to route all outbound traffic from that device over WAN2, so inbound and outbound should both be going over WAN2. All traffic is allowed, and the device is listening on that port.

 

I can see from the application that it is in fact using WAN2 outbound, so I know that works. However, it's getting a random port on WAN2 (for example, it's supposed to use 3000, but gets 28437 from the WAN2 public IP), while the device is using port 3000 inbound/outbound.

 

For example with the configured settings:

Expected flow

WAN2 public IP 1.1.1.1:3000-->172.16.1.1:3000

 

Actual flow

WAN2 public IP 1.1.1.1:28473-->172.16.1.1:3000

 

WAN1 is correctly forwarding the port with this same configuration. I should also mention that I also set up 1:many NAT with the same results. I cannot do 1:1 NAT because WAN2 has a single DHCP-assigned /32 address and it will not let me do that.
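For anyone who wants to repeat this test and compare results, here is a small sketch that parses flows written in the `public_ip:port-->lan_ip:port` form used above and checks whether the public-side port matches the configured forward. The helper names are hypothetical, and the addresses are the placeholder values from this post, not real ones.

```python
def parse_flow(flow):
    """Split 'pub_ip:pub_port-->lan_ip:lan_port' into its four parts."""
    public, private = flow.split("-->")
    pub_ip, pub_port = public.rsplit(":", 1)
    lan_ip, lan_port = private.rsplit(":", 1)
    return pub_ip, int(pub_port), lan_ip, int(lan_port)


def port_preserved(flow, forwarded_port):
    """True when the public-side port equals the configured forwarded port."""
    _, pub_port, _, _ = parse_flow(flow)
    return pub_port == forwarded_port


if __name__ == "__main__":
    expected = "1.1.1.1:3000-->172.16.1.1:3000"
    actual = "1.1.1.1:28473-->172.16.1.1:3000"
    for label, flow in (("expected", expected), ("actual", actual)):
        status = "ok" if port_preserved(flow, 3000) else "port rewritten"
        print(f"{label}: {flow} -> {status}")
```

This only classifies what was observed; it cannot say whether the rewrite comes from the port-forward rule being ignored or from outbound source-port translation on WAN2.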
