In recent years, we have seen more cases where code updates to the MX product line impact performance to the point where, in some scenarios, we have to force the client to upgrade their MX to the next-larger SKU.
Someone could reasonably argue that, if this is the case, the MX was too small for their environment in the first place.
As I engage with our local Meraki reps to try to get eyes on this, I am also trying to get input from other engineers who have to support this hardware.
Have you ever been in a situation where you needed local logs to prove a firmware bug or hardware failure, but the only effective way to retrieve them was to keep your client's environment down for longer?
I am wondering whether the Meraki platform could be changed to write critical local logs to persistent storage. Cisco phones and many other classic Cisco devices have had ways to do this, creating a short window of time in which logs can be retrieved after a device crashes or fails. That information can then be used to understand what happened and decide on next steps.
Since Meraki support requires these local logs to diagnose many issues, I think the best lasting fix for everyone who has to support this equipment would be a way for critical local logs to survive a reboot.
This request is born from situations where the local status page has become inaccessible or unreachable, even locally on the MX hardware, after the device begins having an issue that takes an environment offline. The equipment then has to be rebooted to bring it back online. That usually does fix the issue, but afterwards no one is able to figure out what happened or why, which means the bug team cannot address a possible issue and an RMA for hardware that is actually defective cannot be started. This puts us in a situation where a client may have to sustain two or three outages (or more), each extended so that an on-site technician can spend time trying to retrieve the local logs.
I understand there are ways to access the local status page remotely from the LAN or with a jumpbox, but again, with recent releases of code we are seeing an uptick in local status pages that simply do not function or are not accessible.
A way for the hardware to write critical errors like these to persistent storage would improve this product line drastically.