Have 60 onboarded Catalyst 9300 switches in Meraki cloud and 9 switches went offline

Solved
TOCNY
Here to help

I have a ticket open, but unfortunately support doesn't have an answer at this time. I have approximately 60 Catalyst 9300 series switches, most of them in stacks, onboarded to the Meraki cloud. Last night, at the exact same time, the switch or stack at 9 locations went offline and became unreachable.

 

A review of the affected switches shows they are all working and passing traffic locally, and all devices have Internet access. We were able to reboot a switch at a site where no one was working, and the switch immediately came back online after the reboot.

We opened the ticket with support hoping to avoid rebooting the remaining affected devices. Support had limited suggestions because the devices in question are offline and the one I rebooted is back online. As with a Meraki MS switch, a reboot flushes all logs, so the rebooted switch provides no additional help to support.

To complicate matters further, I have other like-model stacks and single switches, all on the same management VLAN, and some additional stacks at the same physical locations appear online while a few others remain offline.

 

In the past I have seen cloud connectivity issues where devices scattered throughout the WAN reported down even though internal testing showed otherwise. In all of those instances, every switch that showed down for no apparent reason came back online at some point in the day, again for no apparent reason. That has not happened with this latest outage: the remaining switches that went down are still reporting unreachable.

Meraki support was going to try to engage some additional support from the Cisco side. I agreed to reboot all but one of the remaining affected switches to get them back up and monitored, leaving one switch unreachable for additional troubleshooting with support.

Has anyone else had a similar experience with some of their enterprise Catalyst switches going down while like devices at the same site remain up?

 

Thanks,

 

Mark 

1 Accepted Solution
TOCNY
Here to help

I found that if you recycle the loopback1000 interface as well as the TLS Gateway, communications resume with no switch reboot 🙂

 

Crypto TLS-Tunnel Submode Commands:
  cc-mode           Enable CC-Mode
  device-id         Device Identifier
  exit              Exit from crypto ssl policy sub mode
  local-interface   Specify the WAN interfaces
  mode              Server Mode
  no                Negate or set default values of a command
  overlay           Specify the Overlay Type
  pki               Specify the pki trustpoint for auth
  platform-disable  Disable Platform support
  protection        Cipher-Suite
  psk               Specify the Pre-Shared Key
  server            Specify the Server Address
  shutdown          Shutdown the TLS-Tunnel
  virtual-template  virtual-template
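
For anyone who wants to try this, a minimal sketch of the bounce sequence might look like the one below. The shutdown keyword under the TLS tunnel comes from the submode help above. <tunnel-name> is a placeholder, not a name taken from this thread, so check the running config for the name actually configured on your switch. Treat this as an untested assumption, not a supported procedure.

```
! Sketch only. <tunnel-name> is a placeholder (assumption), not from this thread.
! Look up the configured tunnel name first, for example:
!   show running-config | section crypto tls-tunnel
configure terminal
 ! Bounce the loopback interface mentioned above
 interface Loopback1000
  shutdown
  no shutdown
 exit
 ! Bounce the TLS tunnel using the submode "shutdown" command
 crypto tls-tunnel <tunnel-name>
  shutdown
  no shutdown
 exit
end
```

Bouncing both should only interrupt the cloud management session, not local switching, which matches the report above that the affected switches kept passing traffic while showing offline.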


6 Replies
alemabrahao
Kind of a big deal

Are all switches running the latest stable version?

I am not a Cisco Meraki employee. My suggestions are based on documentation of Meraki best practices and day-to-day experience.

Please, if this post was useful, leave your kudos and mark it as solved.
TOCNY
Here to help

The Catalyst switches are running IOS XE 17.3.4, certainly not the latest but a very stable release. All Catalysts are in "monitor only" mode, as they have not been flipped to the Meraki "cloud based IOS XE" yet.

Logs show nothing, so we are scheduling a reboot this evening.

Mloraditch
Kind of a big deal

As far as monitor mode goes, it is known that sometimes you just have to remove it and redeploy it for it to work again.

If the reboot fixes it, great, but if not, you may be looking at that. You may also want to look at upgrading to 17.15.3 and using the new native Hybrid mode.

If you found this post helpful, please give it Kudos. If my answer solves your problem please click Accept as Solution so others can benefit from it.
TOCNY
Here to help

You are correct; that was my solution for some devices in the past that didn't come back up on their own. It's a PITA, but it does get the device talking again.

Speaking with the Meraki engineer today got me thinking: the 9300s actually report two serial numbers, the Cisco chassis S/N and a Meraki S/N. That indicates the switch supports conversion from the Cisco DNA license to Meraki's cloud license, after which you get pushed a modified IOS XE version that supports Cisco's Cloud DNA management. This provides full read/write capabilities. The only setback is that old-school Cisco guys like me live in the command-line world, and once we flip the license and the IOS XE version we lose the command line and everything is managed via the Meraki GUI. I'm wondering whether it would stabilize the connectivity issues moving forward. I believe I am going to take a few lab-environment 9300s, flip the DNA license and the modded IOS version, and try full management on these switches.

 

Mloraditch
Kind of a big deal

I've seen this with regular Meraki switches, usually after an outage or firmware update. Something just gets janky internally and a reboot fixes it. You can try to put time into it and gather support bundles (if possible), but usually the expediency of a reboot wins out over finding the root cause, at least unless and until it happens repeatedly.