Meraki

JinSoo_Park

Hello,

I’m experiencing an issue with Site-to-Site IPsec VPN (Non-Meraki VPN) using Primary and Secondary tunnels.

Environment

Cisco Meraki MX95
IPsec VPN to AWS Transit Gateway
IKEv2
Static routing (no BGP)
Two IPsec peers configured:

Primary tunnel
Secondary tunnel

Both tunnels show green (up) in VPN Status

VPN Configuration

Local subnets (Meraki side):
10.184.x.x (multiple internal subnets)
Remote subnets (AWS side):
10.100.0.0/16, 10.130.0.0/16, 10.192.0.0/16
IPsec policy preset: AWS
Phase 1 / Phase 2 settings match AWS requirements
Health Check configured (tested with):

AWS internal IP
http://google.com

Health check source IP 192.0.2.3/32 is allowed and routed on AWS side

Issue Description

When Primary and Secondary tunnels are both enabled with Health Check:

VPN Status shows both tunnels UP (green)
However, traffic does not pass
Event log shows Traffic Selectors changing from expected subnets to:
TS 0.0.0.0/0 === 0.0.0.0/0
At this point, communication over the tunnel completely fails
When only the primary IPsec tunnel is in use, the event log shows traffic selectors such as
TS 10.184.x.x/x === 10.100.0.0/16, 10.130.0.0/16, 10.192.0.0/16,
and communication works correctly without any issues.

If I remove the Health Check:

One tunnel becomes inactive (expected behavior)
The remaining tunnel works correctly
Traffic Selectors return to normal subnet-based values
VPN traffic works as expected
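The two traffic-selector forms reported above can be compared with a short sketch. It uses only the remote subnets listed in this post and Python's standard ipaddress module; the helper name covered_by is hypothetical. It shows only that a 0.0.0.0/0 selector is a superset of every configured remote subnet, so the selector change alone does not exclude that traffic. It says nothing about how the MX or AWS choose which tunnel carries it.

```python
import ipaddress

# Remote (AWS side) subnets exactly as listed in the VPN configuration above.
REMOTE_SUBNETS = ["10.100.0.0/16", "10.130.0.0/16", "10.192.0.0/16"]


def covered_by(selector, subnets):
    """Map each subnet to True if it falls inside the given traffic selector."""
    sel = ipaddress.ip_network(selector)
    return {s: ipaddress.ip_network(s).subnet_of(sel) for s in subnets}


if __name__ == "__main__":
    # Wide selector seen when health checks are enabled.
    print(covered_by("0.0.0.0/0", REMOTE_SUBNETS))
    # A narrow selector covers only its own range.
    print(covered_by("10.100.0.0/16", REMOTE_SUBNETS))
```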

Additional Notes

Disabling “Failover directly to internet” does not resolve the issue
The problem occurs regardless of which endpoint is used for Health Check
AWS routing includes 192.0.2.3/32 pointing back to the VPN attachment
This issue only occurs when trying to keep both tunnels active simultaneously

Questions

Is it expected behavior for Traffic Selectors to change to 0.0.0.0/0 when health checks fail?
Are there known limitations or requirements for Health Checks with AWS TGW + Meraki IPsec?
Is there a recommended Health Check endpoint when using AWS Transit Gateway?
Are there any additional settings required to keep both Primary and Secondary tunnels active simultaneously without breaking traffic selectors?

Any guidance or similar experiences would be greatly appreciated.

Thank you!

Tony-Sydney-AU

Hello there @JinSoo_Park ,

Thanks for your really in-depth scenario and follow-up questions.

I'm just missing which firmware version you're running at the moment.

I'll discuss this with internal teams and get back to you with my findings.

I suppose it's 19.1.11 (the current Stable). Am I right?

I haven't had the chance to test peering with AWS TGW, but I suppose that if you want both tunnels active simultaneously and you're not using BGP, this would be expected.

But let me check this internally and I'll update you.

If you found this post helpful, please give it kudos. If my answer solved your problem, click "accept as solution" so that others can benefit from it.

Tony-Sydney-AU

Hi @JinSoo_Park ,

I'm back and sharing my findings.

We believe in this case you would benefit more from a Support Case. It's too hard to troubleshoot the behaviour you described without looking at your network and backends.

In any case, I'm answering your questions below, assuming you're running firmware 19.2.2 or later, where the Multilink feature is enabled:

1. Is it expected behaviour for Traffic Selectors to change to 0.0.0.0/0 when health checks fail?

- In general, the traffic selectors will be 0.0.0.0/0 whenever health checks are enabled.

2. Are there known limitations or requirements for Health Checks with AWS TGW + Meraki IPsec?

- No, there are no known limitations or issues for health checks.

3. Is there a recommended Health Check endpoint when using AWS Transit Gateway?

- No, nothing special is recommended here. The only requirement is that the AWS side routes the return traffic back.

4. Are there any additional settings required to keep both Primary and Secondary tunnels active simultaneously without breaking traffic selectors?

- No, there is no extra config needed here.

In conclusion, let's examine your issue over a support case as this would allow us to look deeper into this behaviour.

Looking forward to your input on this.

JinSoo_Park

Hi @Tony-Sydney-AU

Thank you for getting back to me and for the detailed explanation.

I’d like to share our current situation for clarity.

We have an active Support Case open with Cisco Meraki, and we are currently troubleshooting a Site-to-Site IPsec VPN setup between a Meraki MX and AWS Transit Gateway using Primary and Secondary IPsec tunnels with health checks enabled.

Current observations:

When health checks are not configured, and only a single IPsec tunnel is active, traffic works as expected.

Traffic selectors are shown correctly (e.g. 10.184.x.x === 10.100.0.0/16, 10.130.0.0/16, 10.192.0.0/16)

No packet loss or instability is observed.

When health checks are enabled on both primary and secondary tunnels, traffic selectors change to 0.0.0.0/0 === 0.0.0.0/0, which we now understand is expected behavior.

However, in this state, we observe intermittent or complete traffic loss, even though:

Both tunnels show green (up) status in the dashboard

AWS routing for 192.0.2.3/32 is correctly configured

The issue is not consistently reproducible:

In some cases, applying the health check to the primary tunnel first, then the secondary, temporarily results in stable traffic

However, after leaving the configuration unchanged overnight, traffic may fail again the next day

Packet captures taken on the IPsec interface confirm that:

Health check probes (HTTP) are sent

Tunnel establishment remains up

Data traffic becomes one-way or drops intermittently when health checks are enabled

At this point, based on your answers, our configuration appears to align with documented and expected behavior, yet the instability persists only when health checks are enabled.

We agree that further investigation via the Support Case and backend logs is the right path forward, and we are continuing to work with Meraki Support to identify why traffic becomes unstable under health check conditions in our environment.

As an update from the active support case:

Meraki Support observed that the health check probes (sourced from 192.0.2.3) were reaching the endpoint kix06s10-in-f14.1e100.net over HTTP (google.com), but the probes were receiving HTTP 404 responses. Based on the backend logs, the IPsec tunnels themselves remain up, however the health checks are failing due to the HTTP response, which likely contributes to the intermittent traffic behavior we are seeing.

At this point, we are checking with the customer whether it is possible to deploy a simple HTTP service within AWS that can reliably respond with a valid HTTP 200 status. Once confirmed, we plan to update the health check endpoint to this AWS-hosted domain and re-test the tunnel stability.
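A service of the kind described above could be as small as the following sketch. It assumes Python 3 is available on an EC2 instance and that the security group and TGW routing let the probe source 192.0.2.3 reach it. The port 8080, the handler name HealthHandler, and the helper make_server are illustrative choices, not anything Meraki or AWS prescribes. Whether the MX probe issues GET or HEAD is not stated in this thread, so the sketch answers both with 200.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answer every GET or HEAD with 200 so a probe never sees a 4xx/5xx."""

    def _reply(self, include_body):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if include_body:
            self.wfile.write(body)

    def do_GET(self):
        self._reply(include_body=True)

    def do_HEAD(self):
        self._reply(include_body=False)

    def log_message(self, format, *args):
        pass  # keep periodic probe requests out of stderr


def make_server(host="0.0.0.0", port=8080):
    # Host and port are placeholders; pick values that match the
    # health-check URL configured on the MX.
    return HTTPServer((host, port), HealthHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

The handler ignores the request path on purpose, so the health-check URL can point at any path on this host without risking a 404.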

For reference, the MX appliances are currently running firmware version MX 19.1.11.

Best regards,

Jinsoo Park

Tony-Sydney-AU

Hi @JinSoo_Park ,

Thanks for sharing such a detailed answer.

If you don't mind, I'm sending you a private message.

I would like to have a closer look at your support case.

I'm too curious to let this go now. I want to see how this evolves and get a solution.

Thanks again.

Tony-Sydney-AU

Hi @JinSoo_Park ,

I'm sharing here what I found after looking at your support case.

1. Your ticket mostly covers MX High Availability (HA), a.k.a. Warm Spare; the failover behaviour in HA isn't directly related to IPsec VPNs.

2. In your use case, your primary MX is running firmware 19.1.11; therefore, multilink isn't stable here. Multilink requires 19.2.2.

3. In your use-case, MX would have one active tunnel at a time. The other tunnel endpoint would be established but in standby mode.

4. Refer to this document. It explains the tunnel failover feature and scenarios, and how the health check triggers tunnel failover.

5. As you said, a health-check probe receiving a 404 would flag the target as down and result in tunnel failover. As per the above document, "'Down' means that the most recent probe has failed (ICMP unreachable, TCP reset, HTTP 4xx or 5xx code or similar) or timed out (packets lost) after 10 seconds."

6. [Most important, in my opinion] If you want multilink (active-active), run firmware 19.2.2 if possible, even though it's a Release Candidate (RC). Even better, run 19.2.2 RC and peer with the AWS VPN and TGW over BGP rather than using static routes. This would allow you to disable health checks and rely on BGP timers alone to detect tunnel and peer state, which prevents incorrect failover caused by health-check 4xx or 5xx status codes. Another advantage is that you don't need to update TGW route tables manually, because BGP would automatically propagate back to on-prem any routes that the VPC would need.
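To make the "Down" definition quoted in point 5 concrete, here is a rough sketch of how a single HTTP probe result could be classified. It is not Meraki's implementation. The function names are hypothetical, and Python's urllib follows redirects, which the MX probe may or may not do. Only the 10-second timeout and the 4xx/5xx rule come from the quoted text.

```python
import http.client
import urllib.error
import urllib.request

PROBE_TIMEOUT_SECONDS = 10  # timeout taken from the quoted definition


def status_is_down(status_code):
    """HTTP 4xx and 5xx responses count as a failed probe."""
    return 400 <= status_code < 600


def probe(url, timeout=PROBE_TIMEOUT_SECONDS):
    """Return 'up' or 'down' for one HTTP probe against url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "down" if status_is_down(resp.status) else "up"
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx; the code is in err.code.
        return "down" if status_is_down(err.code) else "up"
    except (OSError, http.client.HTTPException):
        # Covers connection refused or reset, DNS failure, timeouts
        # (URLError and socket timeouts are OSError subclasses) and
        # malformed HTTP responses.
        return "down"


if __name__ == "__main__":
    # Replace with the health-check URL actually configured on the MX.
    print(probe("http://example.com/"))
```

Under this rule, a probe that gets a 404 (as reported in the support case) is "down" even though the tunnel itself is established.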

Hope this information is useful. Feel free to post here further questions / concerns.

JinSoo_Park

Hello @Tony-Sydney-AU

Thank you for your detailed explanation and recommendations.

Regarding the suggestion to use BGP:
On the MX side, we understand that we could proceed with the configuration.
However, on the AWS side there is currently no dedicated AWS engineer available.

The customer is only running a few basic EC2 instances and does not have the resources or expertise to manage Transit Gateway BGP configuration.

This is also outside the scope of our current support responsibility, so implementing BGP on AWS would be difficult at this time.

Concerning the firmware version:
You mentioned that multilink active-active requires firmware 19.2.2 or later. In our dashboard,
version 19.2.2 is not selectable, but 19.2.4 (RC) is available.

The customer has expressed concerns about running a Release Candidate version in a production environment.

Could you please advise whether 19.2.4 RC is considered sufficiently stable compared to a general release, especially for production use?

Any guidance on the risk level or real-world stability would be very helpful for us to address the customer’s concerns.

Thank you again for your insights and support.

Kind regards,
Jinsoo Park

Tony-Sydney-AU

Hi @JinSoo_Park

In my experience on the Support Team, 19.2.4 has proven to be stable.

But my personal opinion (not as a Meraki employee) is: don't run an RC unless you absolutely need a new feature that is present only in the RC.

Regarding BGP, TGW doesn't have BGP by itself. You would need to create a new AWS VPN endpoint, this time as a BGP-routed VPN, and attach it to your TGW.

But if there are only a few EC2 instances, it looks better to avoid complexity in your solution: just run 19.1.11 and use a single tunnel with failover.

As for the health check, my personal suggestion is to check against your EC2 instances rather than something outside AWS.

Happy New Year!

Site-to-Site IPsec VPN issue with Primary/Secondary tunnels – traffic selectors changing to 0.0.0.0/
