I'm working on a solution for a Meraki HA problem for a client. The customer has two uplinks going into the active MX-450, so when the primary link goes down, the devices do not fail over to the standby switch, which has its own standby uplink. There are three uplinks in this scenario: two in the active switch, one in the standby.
My thought is that if I wrote an interface tracking script to monitor the actual status of this interface (connected/disconnected), I could then automatically disable the second link on the active switch, forcing the HA pair to fail over to the secondary.
However, I don't think this is going to work in practice. The API call for dashboard.switch_ports.getDeviceSwitchPortStatuses() is ludicrously slow, so even if I did write the script, it would be a long failover event, which is no bueno.
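To put a number on that concern, here is a rough sketch of the polling loop I had in mind. It is not tied to the Meraki SDK: `get_status` is a hypothetical callable standing in for whatever wrapper ends up around the port status call, and the function name and the "connected" string are my own assumptions. The point is only that detection time is bounded below by the poll interval plus however long each status call takes.

```python
import time
from typing import Callable, Optional


def wait_for_link_down(
    get_status: Callable[[], str],
    poll_interval_s: float = 1.0,
    max_polls: int = 10,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[float]:
    """Poll a link until it stops reporting "connected".

    get_status is a hypothetical stand-in for a dashboard API wrapper.
    Returns the seconds elapsed before the link was seen down, or None
    if it was still up after max_polls checks.
    """
    start = clock()
    for _ in range(max_polls):
        # A slow API call here adds directly to the detection time.
        if get_status() != "connected":
            return clock() - start
        sleep(poll_interval_s)
    return None
```

With a 2 second interval and a link that drops just before the third check, the earliest this can notice is about 4 seconds in, before adding API latency and the time for the HA pair to actually swap.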
I was thinking maybe I could instead monitor a route or do this some other way. I figure using the alerts feature and a webhook would be less complicated, but unfortunately the alert won't fire until 5 minutes have elapsed, which is also no good. The client may end up having to remove the non-primary uplink on the active switch, as it is interfering with HA, but I wanted to see if I could get some insight or any ideas here.
I think the issue is that the uplinks probably aren't in the ideal configuration for HA, as well as the fact that you apparently can't do 3 uplinks on an MX. It would be best if we knew what your topology looked like exactly.
Ideally, you'd bring circuit A into your switching on VLAN X, and attach both MXs' primary uplinks to the switching on VLAN X. Repeat for circuit B on VLAN Y, with the MXs' secondary uplinks also on VLAN Y. This would use a switch stack, with each circuit on a different stack member.
This allows either MX to operate on both circuits, via VRRP.
This is problematic, however, if your circuits do not have at least a /29 allocated, because VRRP won't have enough addresses. (Network ID, broadcast, gateway, MX 1, MX 2, VIP = 6 IPs)
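The address count above can be checked with a quick sketch using Python's standard `ipaddress` module. The function name and its default counts are just illustrative, and 203.0.113.0 is a documentation range, not a real circuit.

```python
import ipaddress


def fits_vrrp_pair(cidr: str, gateways: int = 1, appliances: int = 2, vips: int = 1) -> bool:
    """Return True if the subnet has enough host addresses for the
    ISP gateway, both MX uplinks, and the shared virtual IP."""
    net = ipaddress.ip_network(cidr, strict=False)
    needed_hosts = gateways + appliances + vips
    # Network ID and broadcast are not assignable to hosts.
    usable_hosts = net.num_addresses - 2
    return usable_hosts >= needed_hosts


print(fits_vrrp_pair("203.0.113.0/29"))  # 8 addresses, 6 usable hosts
print(fits_vrrp_pair("203.0.113.0/30"))  # 4 addresses, 2 usable hosts
```

A /30 only leaves two host addresses, which covers the gateway and a single MX, so it can't hold the second MX or the VIP.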
The 3rd circuit would have to go on some other device, so I imagine that and the above are why you're in this situation.
So that has the same problem as the 5-minute webhook delay, meaning automating quick custom failover using the API is basically off the table.
I'd suggest not removing the 3rd uplink; just move it to the standby MX. Worst case, you have to manually fail back to the primary, instead of getting stuck with the primary using the 3rd (and presumably worst) circuit. (I'm assuming your uplink preference is MX 1 uplink 1, MX 2 uplink 1, MX 1 uplink 2.)