We have a Palo Alto firewall with 2 ISPs, path monitoring enabled on both default routes, and one PBR rule.
An HA pair of MX250s sits behind this firewall with the proper rules and NAT. NAT traversal works perfectly in automatic mode, with no unfriendly NAT detected.
On an ISP failover, the UDP sessions already created on the Palo Alto stay active and continue to NAT to the previous public IP.
The only solution is to clear the sessions on the firewall, after which all the VPNs start working again.
Here is the explanation from Palo Alto:
This problem has a big impact because it means our whole SD-WAN network cannot have automatic ISP failover on the concentrator (something we do have on the remote sites...).
Has anybody found something to improve this? A workaround? A redesign?
Thanks in advance.
Yesterday I was with Palo Alto support. I proposed this approach, because it is possible to reduce the UDP session timeout (the default is 30 seconds).
But from my analysis, for this to work the timeout would have to be 2 or 3 seconds, and Palo Alto said such a small value is not recommended.
Putting the MX in parallel is not an option for me because I need to keep the one-armed VPN concentrator due to the eBGP in use.
For now, I have opened a case with Palo Alto because I consider this bad behavior on their side. I'll keep you updated.
In any case, any suggestions you have are more than welcome, because I can't keep this non-automatic failover in a DC.
I went into more detail on the other thread but I think manual NAT traversal might be helpful even in this situation. Instead of clearing all of the sessions on the Palo Alto, you could manually change the NAT traversal information on the VPN concentrator to force all of the spokes to change the port and IP address to connect to. This is by no means a perfect solution as the same thing could potentially happen when the uplink is restored (I don't understand the intricacies that you're running into on the Palo Alto side). The key point of the change though is to change the port that is being used. All of the spokes would see that the port had changed and would need to re-establish the VPN. Since a new port is being used a new flow would be created on the Palo Alto. It's not a solution but another avenue that can be used while troubleshooting with Palo Alto.
Temporarily setting manual NAT traversal to force all the VPNs to change ports could be another workaround, but it is still not an automatic failover.
In this case, why can't Meraki change the source port used when a new public IP is discovered? (This could also solve other behavior I have seen, because dynamic NAT behavior varies a lot depending on the router model.)
The intricacy on the Palo Alto side is that, because the source IP (the private VIP of the one-armed concentrator) and the source port don't change on this failover, the Palo Alto keeps using the same session and keeps NATing to the old public IP. So all incoming packets sent by the other MXs to the new public IP are dropped, because no existing session matches them.
Clearing the sessions or changing the source port forces these sessions to be recreated.
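The behavior described above can be illustrated with a toy model. This is a minimal sketch, not Palo Alto's actual implementation: it assumes the firewall keys each session on the private source IP/port plus the destination, and chooses the public IP only when the session is created. The class, the method names, and all addresses (documentation ranges) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Outbound UDP flow as seen on the inside of the firewall."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class ToyNatFirewall:
    """Toy source-NAT model: the public IP is fixed at session creation."""

    def __init__(self, active_public_ip):
        self.active_public_ip = active_public_ip
        self.sessions = {}  # Flow -> public IP chosen when the session was created

    def outbound(self, flow):
        """Return the public IP used to NAT this flow."""
        # An existing session is reused as-is; the egress choice is only
        # made for flows that have no session yet.
        return self.sessions.setdefault(flow, self.active_public_ip)

    def inbound_accepted(self, flow, arriving_on):
        """A reply is accepted only if it arrives on the IP stored in the session."""
        return self.sessions.get(flow) == arriving_on

    def fail_over(self, new_public_ip):
        self.active_public_ip = new_public_ip

    def clear_sessions(self):
        self.sessions.clear()


ISP1, ISP2 = "192.0.2.10", "198.51.100.10"
fw = ToyNatFirewall(ISP1)
tunnel = Flow("10.0.0.5", 40000, "203.0.113.7", 40001)

fw.outbound(tunnel)                              # session created, NATed to ISP1
fw.fail_over(ISP2)
stale = fw.outbound(tunnel)                      # same flow key: still NATed to ISP1
dropped = not fw.inbound_accepted(tunnel, ISP2)  # spoke now replies to ISP2

fw.clear_sessions()
recovered = fw.outbound(tunnel)                  # session rebuilt on ISP2
```

In this model, a flow with a different source port would also get a fresh session on the new public IP without any clear, which is why changing the port has the same effect as clearing the sessions.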
I still think this behavior is not normal on the Palo Alto side.
Another workaround I had in mind was an inbound rule that allows the UDP flows to reach the MX VIP.
But this kind of inbound rule doesn't look good in terms of security, and Meraki doesn't recommend it in this configuration. Do you think this could be a way forward?
I strongly agree with having an incoming rule to allow the UDP flow. (Not sure where we say that we don't recommend it; it's something I commonly do.) You'll want to set up manual NAT traversal with a specific port. You would probably want to have that port set up as a port forward on both uplinks. If the VPN concentrator is able to communicate with the VPN registry after the failover, then the spokes might be able to create new sessions to the concentrator. I don't know if the traffic from the concentrator to the spoke would still get black-holed, but it might be worth a try.
The reason you want manual NAT traversal is that it prevents a new port from being picked at random. Otherwise, every time the port changes on the MX you'd have to change your inbound rules. In terms of security, it would be similar to having a port forward to a web server. There's a resource that needs to be accessed, and we rely on the web server being up to date to prevent malicious use while still maintaining access to the resource. In this case the MX is the server being accessed, and it has built-in security to only allow valid VPN connections. If you really want to lock it down, you could create allow lists with the public IPs of all of the spokes. But that might be a bit onerous to maintain.
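The allow-list idea can be expressed as a simple predicate. This is only a sketch of the rule logic, not a Palo Alto or Meraki configuration: the function name, the port number, and the addresses are hypothetical placeholders.

```python
import ipaddress


def inbound_permitted(src_ip, dst_port, vpn_port, allowed_sources):
    """Return True if a UDP packet would match the hypothetical inbound rule.

    The rule permits traffic only to the fixed VPN port, and only from
    sources that fall inside one of the allow-listed addresses or prefixes.
    """
    if dst_port != vpn_port:
        return False
    addr = ipaddress.ip_address(src_ip)
    return any(addr in ipaddress.ip_network(entry) for entry in allowed_sources)


# Hypothetical values: one single spoke address and one small prefix.
spoke_allow_list = ["203.0.113.7", "198.51.100.0/28"]
VPN_PORT = 51000  # placeholder for the port fixed by manual NAT traversal

known_spoke = inbound_permitted("203.0.113.7", VPN_PORT, VPN_PORT, spoke_allow_list)
stranger = inbound_permitted("192.0.2.99", VPN_PORT, VPN_PORT, spoke_allow_list)
```

Because the port is fixed, the same rule can be applied unchanged on both uplinks.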
About manual NAT traversal: I cannot implement this because there are 2 ISPs with 2 different public IPs. When we select manual NAT traversal, we need to enter a port and only one IP. So, in my case, automatic is the only way.
About the inbound rule, I'm surprised. I read this in the Meraki documentation:
Placing an MX appliance configured as a one-armed VPN concentrator at the perimeter of the network with a publicly routable IP address is not recommended and can present security risks. As a best practice, one-armed concentrators MX appliances should always be deployed behind an edge firewall that filters inbound connections.
Now that I read this again, I'm not sure what "that filters inbound connections" means.
So you confirm that, in my case, opening inbound rules could be a solution? Does that mean a rule that opens the whole UDP port range 32768-61000 from all IPs?
My apologies. That warning is about setting a public IP directly accessible to the internet on the VPN concentrator. I thought you were referring to setting manual NAT traversal. We were talking about two different things.
As for my recommendation on manual NAT traversal, I'm not sure how else to explain it. I still recommend setting manual NAT traversal. I understand you have 2 internet circuits. The manual NAT traversal locks the MX to a certain port then relies on the VPN registry to teach the second ISP public IP to the spokes. However, in your case, the Palo Alto might be stopping that traffic from getting out so it might not have worked anyway.
I do not recommend opening all the ports. If you leave NAT traversal set to auto, I would recommend that you open only the port that is currently being used. You can find that on the VPN status page.
So, for you, my way forward is to use manual NAT traversal. But when I try to enable it in the dashboard, the public IP is required:
Because I have 2 ISPs, a different public IP will be used during a failover. That's why I said automatic NAT traversal seems to be the only way for me.
What if I keep automatic and manage a public IP allow list? Perhaps I can make an API request to retrieve all the public IPs known to my dashboard?
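One way this could look, sketched under assumptions: the code below takes an already-parsed status payload and assumes it is a list of per-network entries, each with an `uplinks` list whose items carry a `publicIp` field. That shape is an assumption modeled on the Dashboard API's organization-level appliance VPN status data and should be checked against the current API documentation. The function name and the sample data are hypothetical, and fetching the payload (authentication, pagination) is left out.

```python
def collect_public_ips(vpn_statuses):
    """Gather the distinct uplink public IPs from an already-parsed payload."""
    ips = set()
    for network in vpn_statuses:
        for uplink in network.get("uplinks") or []:
            ip = uplink.get("publicIp")
            if ip:  # skip uplinks that report no public IP
                ips.add(ip)
    return sorted(ips)


# Hypothetical sample in the assumed shape (documentation addresses only).
sample = [
    {"networkName": "spoke-a",
     "uplinks": [{"interface": "wan1", "publicIp": "203.0.113.7"}]},
    {"networkName": "spoke-b",
     "uplinks": [{"interface": "wan1", "publicIp": "198.51.100.4"},
                 {"interface": "wan2", "publicIp": ""}]},
    {"networkName": "spoke-c", "uplinks": []},
]
allow_list = collect_public_ips(sample)
```

The resulting list would need to be refreshed regularly, since spoke public IPs can change.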