I've recently been struggling with a problematic C9300L switch (less than 6 months old) in a two-switch stack: its management container stops working, but the member switch doesn't take over the management plane.
I've tried the following troubleshooting steps, with a ticket open with Meraki support throughout:
- Forced the member switch (SW2) to become the active switch by rebooting the active switch (SW1). This gave me two weeks of stable operation.
- Updated the switch firmware to CS 17.2.1.1, after which SW1 was elected as the active switch again. We got 20 days of stable operation before its management container stopped working again.
The stack is still passing data plane traffic normally, but I'm perplexed as to why our member switch isn't able to take over management of the stack. This is only a two-switch stack, but what happens if I hit this issue on a larger one? My stacks range from 2 to 6 switches. If the members of these stacks can't automagically take over management of the stack, I lose both visibility for monitoring and the ability to make configuration changes.