MX64 pair in NAT mode Warm Standby - unstable network and packet loss
I have a pair of MX64s in NAT mode with one acting as a warm spare.
The problem I have is that when the internet service for this remote office was ordered, it was delivered as a single copper hand-off requiring PPPoE auth. When the internet service is connected to the master MX64 with the warm spare NOT connected to the internet, the network is extremely unstable. Downstream devices are disconnected from the Meraki cloud, re-establish the connection, and then fall off again.
As soon as I pull the power out of the warm spare device the network becomes stable.
Is this expected behaviour?
I know it doesn't make a whole lot of sense having a warm spare which doesn't have an internet link, because it can't really fail over automatically. I figured if there's a failure I'll just have to talk the people on-site through connecting the internet cable to the warm spare and they'll be up and running again.
Do the MXs connect directly to each other, or via some other device (like a switch)?
Have the two MXs been given the opportunity to upgrade to the same firmware version (I would use 13.28)?
Does the config show as "up to date" on both MXs?
Any chance you can talk the client into putting in a "backup" internet circuit? I often use 4G hot spot routers. Via the local status page on an MX64 you can convert the first LAN port into a second WAN port. If you plugged this port on both MX units into the 4G hot spot then the second unit would be able to come online properly.
Failing that, what about a cheap backup circuit like an ADSL circuit?
Firmware versions are consistent across all devices. The MX64s are running 12.26 and the MS225s are running 9.32. Config is 'up to date'.
There's a 4G USB dongle on-site that was used while we waited for the wired internet to be delivered. At the moment it only has a prepaid SIM which is out of credit, so no data service is available.
The second internet link isn't the concern; the concern is the behaviour when the spare doesn't have internet. I can reproduce the problem in a lab at head office which has the exact same hardware. As soon as I pull the internet from the warm spare the network falls to pieces and I don't understand why.
That's correct. Only the current master has a white light. The warm spare is orange.
The switches have a white light while the warm spare is powered off, but shortly after powering on the warm spare the switches turn orange, even though the master doesn't change. The warm spare doesn't become master. There's no indication of a dual-master scenario.
Sometimes the switches go to white but won't stay that way; they'll go back to orange after a short time.
I suspect it is because the two units are not coming online together before the "WAN failure" happens.
Can you do this test for me: plug the cellular link into the spare MX and then see if that resolves the issue. If it does, I suspect that if you pull the cellular unit back out it will keep working as expected.
If it does, then you might be able to try this (never tried it myself). You are not going to have enough ports at the moment ...
On the spare MX use the local status page and convert the first LAN port to a WAN port configured for DHCP. Plug this new WAN port into the LAN side on your switches. This will hopefully cause the spare MX to use the primary MX for its uplink. No idea if it will work. Always wanted to know though ...
Otherwise your failover strategy will be to turn off the primary MX and turn on the second MX, and the reverse to fail back. Still as valid a strategy as moving a cable from one unit to another (you could even just use a single power cable, and move the power cable between units).