We have a situation where we manage site-to-site VPNs between Meraki devices and Cisco ASA devices. We can establish a site-to-site VPN fine, but after an undetermined, seemingly random amount of time the tunnel stops passing traffic, and we have to force a rekey on the ASA side or force the VPN down and back up on the Meraki portal side by shutting the VPN settings off and turning them back on.
We have been back and forth with support for both ends, set the recommended phase 1 and phase 2 timeouts, disabled DPD and changed other miscellaneous settings, but the issue remains. We always attempt to be on the latest firmware on both ends.
I am out of ideas.
The strange thing is that the tunnel in the portal shows the green "up" icon, and on the ASA side it still shows "active", but no traffic will pass until you reset/rekey to force a tunnel reset.
Looking for recommendations, ideas or feedback.
Is ASAv running 8.3 code or above? Meraki has an issue building tunnels to ASA's below this code level.
Hmm, we've seen similar issues with ASAv 9.5.x versions that were resolved by upgrades to 9.6.3 and later.
What we find is that duplicate IPSEC SAs are being created when they shouldn't be. The bug can be confirmed on the ASA by running "show crypto ipsec sa inactive" and looking for an inactive tunnel. Performing "clear crypto ipsec sa inactive" on the ASA is a workaround. My understanding is that 9.8.x versions were unaffected.
Interesting. We do have a couple of older ASAs running 9.5.2 and had locked-up VPNs over the past few days. I checked "sh crypto ipsec sa inactive" and it came back with 0. I must have a different issue here. Good feedback though. I hadn't heard that one before.
Hi, we had the same issue here. It turned out to be a problem with the timeouts and NAT-T.
We ended up with phase 1 at 28800 and phase 2 at 14400, and Meraki support disabled NAT-T for that endpoint (this was a configuration override that only support can do). It has been stable for us since.
We're having the same exact issue using a SonicWall NSA3500 on the hub side. Tried just about everything you did to get some stability, to no avail. I can only guess now that it's an issue with Meraki 😞
Please let us all know if you have had a fix for this issue.
This same issue has been killing us for almost 2 years. I have the exact same symptoms you describe on multiple ASA-MX VPN tunnels. ASA-ASA tunnels, ASA-SonicWall tunnels and MX-MX tunnels are all fine. Did you ever find a fix?
Sort of. As I suggested above, we found that the ASA bug was supposed to be resolved in 9.6.3, but in practice we've still had occasional issues through 9.8.1 devices. We think this is identified in the ASA bug tracker as "Stale VPN Context entries cause ASA to stop encrypting traffic despite fix for CSCup37416 - CSCvb29688."
We've successfully mitigated the issue by using the following tunnel settings on both sides:
It is important that these be the ONLY accepted/offered tunnel parameters.
It is also important that the ASA have NAT Exempt enabled for the tunnel.
Nope. Same here. Since 2014.
Recent ASA code 9.8.2 seems to have helped, as well as running 12.26 on the MX side, but they still go down once in a while.
It went from as often as multiple times a day to a couple of times a week now, which is much, much better than it has been.
Hi, has this problem come back? We are having this same issue. I'm hoping that this is the fix.
Thanks everyone for the help. I should also add that I opened a case with Meraki support and they disabled NAT traversal on the tunnel by changing some backend settings on the MX that we do not have access to. That seems to have helped significantly - we have not had the tunnel go down in over a month at this point after making all the changes in this thread and having Meraki disable NAT-T. We also adjusted the timeouts to 86400 for phase 1 and 28800 for phase 2.
I am still not convinced that the issue is resolved, but there is no question that things are much improved over where they were with the default settings.
We have exactly the same issue with MX-64s and an MX-100 connecting to a third-party Juniper firewall, and the only cause we (including Meraki support) can think of is either the ESP window or the fact that Meraki has anti-replay protection enabled. We had an issue with some ASAs connecting to the same Juniper in which we had to disable anti-replay because the Juniper is sending out-of-sequence packets.
Support cannot find the issue, as the tunnel is up but packets just drop. Our phase 1 is 28800 with phase 2 at 3600. My concern is that I am about to return these devices because of this issue, which I do not want to do.
Any help would be appreciated
Has anyone found a fix for this scenario? I still have a random issue between an MX600 and an ASA running 9.1(7)4. The tunnel always remains up but the traffic stops going through. It is very annoying and it has been going on for 2 months now.
I still have not had any tunnel outages after performing the steps noted in my post above. Still not totally convinced everything is fixed, but things are certainly WAY more stable than they were before. I saw exactly what you saw - tunnel appears to be up on both sides but traffic stops passing. Only way to fix was to clear the tunnel from the ASA side and allow it to rebuild. This has not happened at all since disabling NAT-T on the Meraki (must be done by support) and adjusting the timeouts on both sides. Also make sure to disable the data-based tunnel lifetime on the ASA, although this was never my problem.
I see. I have a case open with Meraki but I think they have already given up 😄 I don't have the data lifetime set. It is actually working without issues with another ASA running the same config and version, but I don't know why it fails so much with this other ASA.
I have read above that changing to 3DES also helped someone. I will give it a try, although I don't see any relation to the encryption algorithm. At this point I find it quite frustrating. I am escalating in parallel with Meraki again (no answer from them since Dec 19th 🙂 ) to see if NAT-T has something to do with it. I am curious how that could actually help; I thought NAT-T is needed when there is a PAT device in the path.
Just for the record, changing to 3DES hasn't changed anything; the tunnel keeps failing. I have restored AES256 and changed the lifetime to 28800 seconds (the Meraki default) instead of the 86400 seconds that I had before. I don't know if rekeying earlier would help somehow.
Changing to 28800 seconds made the difference. I don't know if it is solved, but it looks more stable. Meraki Support doesn't even respond anymore.
It seemed more stable, but it has started to happen again. Meraki doesn't provide any serious feedback on this one. They asked me whether NAT traversal is activated on the ASA (it is); I think they are already running out of ideas. I have another ASA in the same building that has been working steadily for a long time.
Yeah, it's totally ridiculous. I just had one stop passing traffic this weekend. When I looked at the ASA side (since you can't see s*** on the Meraki) there were two tunnels up and active - one with the ASA as the initiator and one with it as the responder. Had to "cl isakmp sa" and everything started working again (but who knows for how long). Meraki support is just terrible. Every time I reach out to them I get a tech that can't really help at all. Cisco TAC they are not. They certainly act like they know what they are doing, but nothing ever really gets fixed. They just don't have the knowledge and experience to support the product properly when something unusual goes wrong.
It is disappointing that this is even an issue. I've done SonicWall-ASA tunnels, Watchguard-ASA tunnels, Fortinet-ASA tunnels - all work perfectly. Meraki is owned by Cisco and they can't create a stable tunnel with the most industry-standard firewall imaginable. Ridiculous. So frustrated. Maybe someone else can stay on support about this and give them a hard time. I just don't have the time or heart any more.
My ticket has been open since December and I have contacted them multiple times, with no success at all. Now someone else has taken the ticket and they keep asking again for the same basic information: what is your config, take a capture, etc. I am sorry to say it, but this is not a next-generation firewall: unstable tunnels, VRRP HA, no outgoing NAT for IPs other than the WAN interface, terrible support, etc. My level of frustration with this product is getting really high, and I am very disappointed too.
Agreed, Akan33. We are facing the same issue, and every time we call in they have a new fix which works for a couple of days, and then the same issue comes back. I don't think the MX is an enterprise-level device.
So it seems there are multiple customers complaining about this, they should take this situation more seriously from my point of view as it is not isolated.
Same problem here. Mainly VPN to Sonicwalls but also Azure and Fortigate VPNs. Only solution is to disable/enable VPN temporarily or move to AutoVPN.
Which devices? Most of my devices show 13.33 and are considered current via the portal. The beta firmware versions I see only go up to 14.xx. No 15 options at all for firmware versions.
OHTorx, are you referring to v14.13 changelog entry?
"Non-Meraki and client VPN traffic may be dropped when packets arrive out-of-order due to an overly restrictive anti-replay window size"
Which appears to be fixed in 14.26 changelog:
If so, have you experienced any other issues on this beta firmware?
I am having the same exact issue between a Meraki MX80 HA pair and a WatchGuard firewall. I have marked this case HIGH PRIORITY CRITICAL: when I lose this tunnel, the entire organization is down.
Basically, HA and all failover work perfectly, and then, either at the end of the phase 2 key lifetime or at random, the VPN just stops. It appears phase 1 is up, and we have verified all settings on both sides and followed the Meraki docs for a WatchGuard. Either side can rekey to bring the tunnel back up and working, but otherwise it hangs.
I am using an HA pair setup with virtual IPs for greatest recovery, with two ISPs all cabled the same and a direct heartbeat cable between the units per Meraki best practices.
They had me move to 14.20 for an initial HA pair problem where STP was not being passed on a security monitoring device. That got resolved and was not related to the 14.20 firmware, so I went back to the stable release, 13.27.
I was on stable release 13.27 and, at random, the MX would lose its virtual IP and the tunnel would try to establish on the non-virtual IP, which of course wouldn't work. They pushed me up to beta 14.27, and now I'm back to my original problem.
I am using standard negotiations with the phase 1 lifetime at 28800 and the phase 2 lifetime at 14400; everything matches to a tee. I also have the WatchGuard keep-alive off because it is not supported to non-WatchGuard peers; dead peer detection is on.
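For anyone who wants to double-check the "everything matches" part rather than eyeballing two admin pages, a minimal Python sketch of a settings diff might help. The key names are hypothetical labels, not fields from either vendor's API; the Meraki-side values are the ones quoted in the post above, and the peer's phase 2 lifetime is deliberately made different for illustration.

```python
def proposal_mismatches(local, remote):
    """Map each differing setting to its (local, remote) values."""
    mismatches = {}
    for key in sorted(set(local) | set(remote)):
        if local.get(key) != remote.get(key):
            mismatches[key] = (local.get(key), remote.get(key))
    return mismatches


# Hypothetical key names; lifetimes and DPD state as quoted in the post.
meraki = {"phase1_lifetime_s": 28800, "phase2_lifetime_s": 14400, "dpd": True}

# Illustrative peer whose phase 2 lifetime has drifted (assumed value).
peer = {"phase1_lifetime_s": 28800, "phase2_lifetime_s": 3600, "dpd": True}

print(proposal_mismatches(meraki, peer))
```

An empty result only says the values you typed in agree; it says nothing about what each device actually offers on the wire.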
They have captured packets and don't see anything wrong in the tunnel setup or settings. They can't explain why it just stops, but there are over 100 tunnels connecting to my application provider without problems, with all different manufacturers, and this is the only one they are having trouble with.
Meraki is so good at so many things, but falls down on some of the most basic things, like this, or like not logging when a layer 7 firewall rule blocks a country.
I had the same setup with SonicWall and never had any trouble with the HA or tunnels, but now I have trouble. They are gathering packets etc. and I'm trying to get to engineering, but that doesn't reduce the heat I'm getting.
Has anyone got a resolution? I'm tempted to go back to the SonicWall for this tunnel. I still have it running my Verizon Wireless Private Network tunnel because Meraki doesn't support address translation on a tunnel or truly support BGP, which would let me get rid of the translation.
Anyone... Car 54, anyone? HELP
Marking the case as high priority won't make any difference, in my experience.
I had my firewall running on 12.x a few months ago and they asked me to move it to 13.28; same result.
The behavior is like you describe. I have escalated this issue to a Cisco ASA engineer. I will keep you posted, but I would recommend you do the same, as Meraki is not helping at all on this issue. It is very frustrating (they keep passing the ticket among engineers and I have to explain the same story every time, without any progress).
Thank you. I refuse to get off the phone, and they have not pressured me to; I'm over 2 hours in right now. The result so far: they parsed packet captures and verified that my settings and the remote vendor's are correct. We eliminated any setup errors on both our parts and have attached screen captures.
They verified dead peer detection is fine and correct. I'm supposedly heading into higher-level engineering.
What we are down to is this.
Describe it this way: Site A is me (Meraki) and Site B is the WatchGuard.
When this VPN-down event occurs, Site B tries to send packets to Site A (seen in the packet capture).
The phase 1 tunnel is up; as a matter of fact I get a green light on the Meraki, but the Meraki phase 2 is actually down. The green light only shows phase 1. If you reboot the primary, or turn the VPN off and back on from the VPN page, phase 1 comes down and immediately everything restarts, and both sides confirmed that dead peer detection is working properly. Then I'm good to go again. It seems related to the phase 2 key lifetime, but not always; it's random too.
We have gathered logs, screenshots, and everything else because I don't want this escalated and easily dismissed. Thanks to this thread and my gut experience, I believe this issue is more than just a simple misconfiguration or setting problem.
More to come. Thank you for the replies; it's good to know I'm not alone.
Yeah, phase 1 remains up, but no SPIs are built on the remote end; only resetting IPsec or bouncing the tunnel works.
I am trying to collect some debugging from the ASA to see if Cisco can help here.
@akan33 you are describing the same exact issue I have been troubleshooting for the last 4 months. View my notes above and how we resolved it. Meraki code 15.7 changed the anti-replay value from 4 to 32. That fixed my Meraki-to-Juniper VPN troubles. Either upgrade, or have the other side change this window to 4 to match your current code's value.
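To make the anti-replay point concrete, here is a small Python sketch of a simplified sliding-window check in the style of ESP anti-replay. It is an illustration only, not Meraki's or Juniper's implementation, and it assumes the 4 and 32 above are window sizes counted in packets, which this thread does not confirm.

```python
def replay_check(arrivals, window):
    """Return the sequence numbers a receiver would drop.

    Simplified sliding-window anti-replay check: a packet is dropped
    if it was already seen, or if it is `window` or more behind the
    highest sequence number accepted so far.
    """
    highest = 0
    seen = set()
    dropped = []
    for seq in arrivals:
        if seq in seen or seq <= highest - window:
            dropped.append(seq)
            continue
        seen.add(seq)
        highest = max(highest, seq)
    return dropped


# Packet 3 is delayed in transit and arrives after packet 10.
arrivals = [1, 2, 4, 5, 6, 7, 8, 9, 10, 3]

print(replay_check(arrivals, window=4))   # the late packet falls outside the window
print(replay_check(arrivals, window=32))  # the late packet is still inside the window
```

With a small window, a peer that reorders packets even slightly gets its traffic silently discarded while both ends still report the tunnel as up, which would be consistent with the symptoms described here.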
Update... After being here until 11:30pm last night, going through three call centers and refusing to get off the phone, they captured all the information they wanted except a down situation. My colleagues on the other side can see that Meraki Support disabled NAT-T on the Meraki side, which is an option we cannot see, and (FINGERS CROSSED) since last night I have not had one hiccup. We did temporarily remove the secondary endpoint on the WatchGuard side just for testing, but plan on putting it back if everything goes well today. I was also very patient and gave the techs time to analyze the captures, because we all know how it is to work in tech service.
The WatchGuard guys asked if I wanted anything else changed on their side. I told them not to change anything, so on the WatchGuard side we still have dead peer detection at 5 tries / 20 seconds, no keep-alive because that's WatchGuard-to-WatchGuard only, and NAT-T on, which is on by default on most firewalls now, but apparently NAT-T might be causing a problem on the Meraki.
Will keep everyone posted. For as good as they are in so many areas, this core product needs more work.
@ITofTN you are going through the same steps I did for 4 months. Have your Meraki guy look up my Case 02390711 and talk to the engineer on it. From your description the Anti-Replay is the issue. We did the NAT-T thing also with no success.
Meraki uses "lifetime-kb-unlimited" and there is no way to change this. We had an issue with MX VPNs to Cisco ASA, and this is what was recommended by Meraki support. I believe this is also why Azure tunnels won't stay connected. I believe you need an ASA running 9.1(2) or higher to use this command.
On Cisco ASA you have to specify this in crypto-map:
crypto map <map-name> <seq-num> set security-association lifetime kilobytes unlimited
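As a rough feel for why a data-based lifetime matters when the other side never rekeys on volume, here is a back-of-the-envelope Python sketch. The 4,608,000 KB figure is my assumption for the ASA's default kilobyte lifetime (check your own running config), and the throughput numbers are purely illustrative.

```python
def seconds_until_volume_rekey(lifetime_kb, throughput_mbps):
    """Seconds of sustained traffic before a kilobyte-based SA lifetime is hit."""
    # Treats 1 KB as 1000 bytes and 1 Mbps as 1,000,000 bits per second.
    kb_per_second = throughput_mbps * 1_000_000 / 8 / 1000
    return lifetime_kb / kb_per_second


# Assumption: default data lifetime on the ASA side; verify on your device.
ASSUMED_LIFETIME_KB = 4_608_000

for mbps in (10, 100):
    seconds = seconds_until_volume_rekey(ASSUMED_LIFETIME_KB, mbps)
    print(f"{mbps} Mbps sustained: volume rekey after about {seconds:.0f} s")
```

Under those assumptions a busy tunnel can reach the data limit far sooner than a 28800 second timer, so the ASA would try to rekey at moments the Meraki does not expect.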
Update: I ran all day today, and Meraki Support did not turn off NAT-T; it is still on. No drops, then I had a blip at 5pm, and now it has been down from 6pm to 9pm with no explanation.
Maybe it is related to the anti-replay window size, as per the comments above. If that's the fix, what would be shocking to me is that I have had my ticket open for months and no engineer has been able to provide any information, and that the 'fix' comes so late. In any case, the damage is done.
From the logs, I can see this on the ASA when it fails (x.x.x.x is the Meraki remote public IP):
[IKEv1]Group = x.x.x.x, IP = x.x.x.x, QM FSM error (P2 struct &0x00007ffe60d39e40, mess id 0xb107883c)!
[IKEv1]Group = x.x.x.x, IP = x.x.x.x., Removing peer from correlator table failed, no match!
[IKEv1]Group = x.x.x.x, IP = x.x.x.x, Session is being torn down. Reason: Phase 2 Mismatch
Also, from the debugging, it looks like there's a crypto ACL mismatch, but the ACL that shows the log is actually properly configured in both sides, mirrored. Again, when clearing the tunnel everything starts working fine again.
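If you are saving ASA debug output to a file, a short Python sketch like the following can count how often these phase 2 failures show up per peer. The two marker strings are taken from the log lines above; the function name and the 203.0.113.10 address are placeholders of mine, standing in for the redacted x.x.x.x.

```python
import re

# Failure messages taken from the ASA log lines quoted in this thread.
FAILURE_MARKERS = ("QM FSM error", "Phase 2 Mismatch")

# Captures the peer address from the "IP = <addr>," field.
PEER_RE = re.compile(r"IP = ([^,\s]+),")


def phase2_failures(lines):
    """Count log lines per peer that contain a phase 2 failure marker."""
    counts = {}
    for line in lines:
        if not any(marker in line for marker in FAILURE_MARKERS):
            continue
        match = PEER_RE.search(line)
        peer = match.group(1) if match else "unknown"
        counts[peer] = counts.get(peer, 0) + 1
    return counts


sample = [
    "[IKEv1]Group = 203.0.113.10, IP = 203.0.113.10, QM FSM error (P2 struct &0x00007ffe60d39e40, mess id 0xb107883c)!",
    "[IKEv1]Group = 203.0.113.10, IP = 203.0.113.10, Removing peer from correlator table failed, no match!",
    "[IKEv1]Group = 203.0.113.10, IP = 203.0.113.10, Session is being torn down. Reason: Phase 2 Mismatch",
]

print(phase2_failures(sample))
```

Lining the resulting counts up against the times users report outages is one way to show support that the drops coincide with failed phase 2 negotiations.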
Cisco is pointing to Meraki, but there is no answer from them.
are you NATting in the firewall? I have everything behind NAT, so I wouldn't understand the point of disabling NAT-T as I need to encapsulate in UDP to work with PAT 😕
Yes, we are NATted completely behind the firewall. My understanding is that the NAT-T change only affects this site-to-site VPN, whose public side is all real IPs. It's not a global setting, so someone inside my network trying to get on a VPN still can.
So far it has been up since last Wednesday with no events. The WatchGuard side just added back the secondary endpoint for ISP2, and they had to turn on dead peer detection, so now the clock is reset.
I've been having some major issues with a Meraki MX80's VPN to one site previously running a Cisco 89x series and now a Ubiquiti EdgeRouter ER8-Pro.
MX80 is on firmware 13.28. IPSEC has 3DES/SHA1 with lifetime of 86400 for both Phase 1 and 2.
What I've found is that if a change is made in the site-to-site VPN settings - such as adding/removing a subnet on any of the peers - the Meraki closes ALL tunnels and recreates them. When this happens, certain types of traffic stop passing through the tunnel to this site. For all intents and purposes the tunnel is up, however not everything works.
At the Cisco/Ubiquiti end, this manifests as failed authentication attempts to domain controllers, file shares stop working etc. The only way to fix it is to restart IPSEC on the Cisco/Ubiquiti end. I can recreate this like clockwork by simply making a change to one of the peers on the Meraki console. Within a few seconds, the tunnels drop and recreate fine but with only some of my traffic passing through.
Tonight I've had a breakthrough. By adjusting the MSS down to a conservative 1300 on all interfaces, the problem has magically gone away. As soon as I made the change, traffic started flowing freely. I didn't need to restart IPSEC, it literally just came good. I then made 10+ changes to the Meraki peer console to try and force it to break, and each time the tunnel would drop, recreate and resume normal operation.
Obviously it's too early for me to say whether this has completely resolved it, but I thought it worth sharing as I've tried almost everything else and hopefully it points someone in the right direction.
EdgeOS Commands :
set firewall options mss-clamp interface-type all
set firewall options mss-clamp mss 1300
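For anyone wondering why 1300 counts as conservative, here is the arithmetic as a small Python sketch. It assumes a 1500-byte link MTU and plain IPv4 and TCP headers with no options; the actual ESP and NAT-T overhead depends on the cipher and mode, so it is left as "headroom" rather than a fixed number. I have not proven that fragmentation was the root cause here; this only shows what the clamp changes.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def inner_packet_size(mss):
    """Largest IPv4 packet a TCP sender produces for a given MSS."""
    return mss + IPV4_HEADER + TCP_HEADER


def encapsulation_headroom(mss, link_mtu=1500):
    """Bytes left under the link MTU for ESP / NAT-T encapsulation."""
    return link_mtu - inner_packet_size(mss)


for mss in (1460, 1300):
    print(
        f"MSS {mss}: inner packet {inner_packet_size(mss)} bytes, "
        f"headroom {encapsulation_headroom(mss)} bytes"
    )
```

At the usual Ethernet MSS of 1460 there is no room left for the tunnel headers, while an MSS of 1300 leaves 160 bytes for them.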
I am still struggling with an issue between a Meraki MX and an ASA since last October 🙂 Cisco and Meraki are engaged, and although we keep trying things, the root cause has not been found. We have performed live sessions and troubleshooting.
The last thing we found was that the Meraki was changing the WAN IP randomly between the firewall's physical interface and the VIP, breaking tunnels randomly too. We were on 13.28 and they recommended an upgrade to 14.30 (beta). We just found that the issue persists, so we are really running out of ideas and it is very frustrating.
@akan33 Have you talked with support about beta 15.7 and the change it made to the anti-replay value from 4 to 32? I do not know the ASA; is it possible to change its value down to 4 to match the Meraki's pre-15.7 value?
Just jumping in to say that, assuming the issue is related to the anti-replay value as @OHTorx is advising, you should be able to change the anti-replay window size on the ASA side:
Hopefully this might be a less disruptive test than a firmware upgrade.
Keep in mind that the 15.X release is currently on unreleased Beta and we are using it for customers who face particular issues that are resolved in that specific release, which is why you are unable to schedule upgrades to it manually.
Hope this helps.