I'm at a loss here and could really use some guidance, or at least to know whether others are experiencing what I am with the MS120 switch series. My STP settings were preconfigured before I added any devices to my Meraki network, and this is a 100% Meraki network. It consists of two MX100s connected to an MS425 stack in our MDF:

MS425 Switch 1 - Port 1 - ISP 1
MS425 Switch 1 - Port 2 - MX100 Firewall 1 Port 1 (Internet 1)
MS425 Switch 1 - Port 3 - MX100 Firewall 2 Port 1 (Internet 1)
MS425 Switch 1 - Port 6 - MX100 Firewall 1 Port 4 downlink, tagged for all VLANs except ISP1/ISP2
MS425 Switch 2 - Port 1 - ISP 2
MS425 Switch 2 - Port 2 - MX100 Firewall 1 Port 2 (Internet 2)
MS425 Switch 2 - Port 3 - MX100 Firewall 2 Port 2 (Internet 2)
MS425 Switch 2 - Port 6 - MX100 Firewall 2 Port 4 downlink, tagged for all VLANs except ISP1/ISP2

I use VRRP and HA. My MS425s have a single multi-mode or single-mode SFP link (native management VLAN, tagged for all VLANs) to each of our IDF switches. Two of those switches are MS225s in the same rack as the gear above, connected with 10G DAC (Twinax) cables: MS425 Switch 1 and Switch 2 Port 11 form an aggregate to MS225 SW 1 Ports 49-50, and MS425 Switch 1 and Switch 2 Port 12 form an aggregate to MS225 SW 2 Ports 49-50.

Right now the MS425s have just one server on them, a Linux box that uses a software bond for its uplinks. I see no issues with this, as it's not true LACP, and my logs are blank in terms of errors. We will connect more to the stack later, but for now that's all it has.

As I stated above, the additional SFP ports on the MS425s feed our IDFs located throughout the building. Each port goes to a different IDF, one cable (SM or MM) per IDF; distance from the MDF determined the cable media. Each IDF is an MS120, either a 24- or 48-port LP model, for phones, MR74s, workstations, and printers. For the sake of troubleshooting, though, I disabled all ports except the uplinks. I'll also note, before diving into the issue, that I had BPDU Guard enabled on all access ports and RSTP enabled on all interfaces.

There are 3 instances where I didn't have enough fiber run in the building and needed more capacity. In two of those instances:

MDF MS425 ------> IDF MS120 ----> IDF MS120

These MS120s are in the same IDF. The last instance:

MDF MS425 ------> IDF MS120 ----> IDF MS120
                      |
                  IDF MS120

Not sure if the spacing will come out right when I hit Post, but essentially one IDF in this case has an MS120 with a fiber link on Port 52 to the MDF MS425 stack, while its Ports 1 and 2 are native-management, all-VLAN trunk ports to two more MS120s (one switch per port). Again, I ran out of fiber.

The issue we faced seemed to be related to RSTP and proper delegation of the root. I've set the root under Switch > Settings as my MS425 stack (priority 0), and all switches show the MS425 stack as the root. For one, BPDU Guard didn't seem to block any BPDUs on the untagged access ports: if I created a loop, it wouldn't block it. I thought that was odd. Root Guard just took down IDF switches; I tested it in the instances above where an MS120 has an uplink to the MS425 stack but 1 or 2 downlinks to other MS120s, enabling Root Guard on those downlinks. With 30 switches on the network (and 4 not yet deployed), an MS120 would at random claim to be the root, despite the root being set to my MS425 stack at priority 0 and the switch itself recognizing this on its Monitor page. From every switch you could see the root was in fact the MS425 stack (a BPDU capture like the sketch below shows exactly who is claiming root).
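For anyone chasing something similar: rather than trusting the Dashboard view, you can watch the BPDUs on the wire and see which bridge ID is actually advertising itself as root. A minimal sketch, assuming a Linux laptop with scapy installed, patched into (or mirroring) the suspect port; the interface name "eth0" is a placeholder:

```python
# Sniff spanning-tree BPDUs and print who each sender believes the root is.
# Assumes scapy is installed and we can capture on "eth0" (placeholder).
from scapy.all import sniff
from scapy.layers.l2 import STP

def show_bpdu(pkt):
    if STP in pkt:
        stp = pkt[STP]
        # rootid/rootmac: the bridge the sender believes is root.
        # bridgeid/bridgemac: the sender itself.
        print(f"claims root {stp.rootid}/{stp.rootmac} "
              f"from bridge {stp.bridgeid}/{stp.bridgemac} "
              f"(path cost {stp.pathcost})")

# BPDUs are sent to the reserved multicast MAC 01:80:c2:00:00:00.
sniff(iface="eth0", filter="ether dst 01:80:c2:00:00:00",
      prn=show_bpdu, store=False)
```

If an MS120 really is advertising itself as root, its own MAC will show up in the root field instead of the MS425 stack's, which at least tells you whether the flapping is a bad advertisement or just bad reporting.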
And it wasn't just the one switch making the request: when this happened, all switches would broadcast that they wanted to be the root. See the example below:

Dec 9 13:59:21  Port STP change  Port 1  root→designated
Dec 9 13:59:20  Port STP change  Port 1  designated→root
Dec 9 13:59:19  Port STP change  Port 52 root→designated
Dec 9 13:40:06  Port STP change  Port 52 designated→root
Dec 9 13:40:05  Port STP change  Port 52 root→designated
Dec 9 13:39:50  Port STP change  Port 52 designated→root
Dec 9 13:39:50  Port STP change  Port 1  root→designated
Dec 9 13:39:50  Port STP change  Port 1  designated→root
Dec 9 13:39:49  Port STP change

This log output appeared for every uplink to the MDF and every downlink to another IDF. So if there was a single fiber connecting an IDF to the MDF, that would be the only port flapping; but if an MDF-connected IDF had 1 or 2 other IDFs daisy-chained off it, even those ports would flap too. I thought it was propagating from a particular IDF that had a direct fiber to the MS425 stack plus connections to 1 or 2 other IDF switches, so I tried Root Guard on those other downlinks. Even with Root Guard enabled on the other IDF switches (not on the uplink to the MS425 stack), every switch on the network would still flap its uplink port. It made no difference.

After trying to figure out what was going on, as I said above, I shut every port down to ensure there wasn't an actual loop - for example, Ports 1-24 disabled and the one fiber uplink enabled. In the cases where IDFs were connected to 1-2 other MS120s, I left those ports up. I enabled one interface on a switch and connected a laptop to it. I would drop 30+ packets trying to ping my default gateway on the MS425 stack before I got a successful reply (see the loss-probe sketch below). The gear showed green on the Dashboard, but the network was unusable.

After about 20 hours of troubleshooting, 12 of them on the phone with various levels of Meraki support, and zero luck getting us back up, I removed everything but the MX100 firewalls and replaced it with Cisco Catalyst gear that's about 7 years old. The network ran without an issue - clean logs, flawless.

My MDF ran fine prior to adding all the IDF switches, the MS120s. I have not yet deployed the MR74s; I'm aware of the lovely way they go into repeater mode, so I haven't connected them yet. The issue was not with our MDF, which consists of the MS425 stack, 2 MX100 firewalls, and 2 MS225 switches. The issue is with the MS120 series, and it seems like a hardware bug. I'm having our consultant come by this weekend to help me pull all the gear into a lab and test again.

My fear is that when this got really bad it would knock out the firewalls' connection to both ISPs, and then you cannot view the Dashboard at all. No Dashboard, no logs, no support, no nothing. I have gear sitting around for 22 sites like this and I'm honestly terrified to connect any of it. I deployed 4 other locations: 2 are 100% Meraki and the other 2 are a mix of Meraki and Cisco Catalyst. I was going to make them all 100% Meraki but stopped when I discovered this lovely experience over the weekend. The topology, whether it's 100% Meraki or not, doesn't matter: adding MS120s to my Cisco networks that are half deployed crashed them in a similar fashion.

I forgot to mention that when they do "crash," the gear takes up all the IPs in the management network trying to connect over and over, so you then start losing DHCP IPs in other VLANs, because when one pool runs out it just moves on to another. I'm not really sure what to do now. It all seems like a hardware bug.
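Here's the kind of throwaway loss probe I mean, which makes the "green on the Dashboard but unusable" state easy to put numbers on. A sketch assuming a Linux laptop (hence the -c/-W ping flags) and a placeholder gateway address:

```python
# Count failed pings to the default gateway to quantify packet loss.
# 10.0.0.1 is a placeholder; substitute the MS425 stack's interface IP.
import subprocess

GATEWAY = "10.0.0.1"
COUNT = 50

lost = 0
for _ in range(COUNT):
    # One echo request, one-second timeout (Linux ping flags).
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", GATEWAY],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if result.returncode != 0:
        lost += 1

print(f"{lost}/{COUNT} pings lost ({100 * lost / COUNT:.0f}% loss)")
```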
I can't see any source code, so I can't be sure, but from what I remember reading online, Meraki doesn't use typical VRRP. I can't find a cost option anywhere, and you can't really set any other STP parameters: just RSTP on or off, and if it's disabled, classic STP kicks in automatically (see the API sketch below for what is exposed). After hours on the phone, over email, and on calls, I still haven't gotten an answer. What scares me is that if I didn't have other gear to connect, I'd still be down - and the gear I do have is 7-10 years old. That's a gamble for a company as large as this. Very scary stuff. Has anyone had this experience and this much trouble?
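If anyone wants to sanity-check their own root settings without clicking through the Dashboard, the Dashboard API does expose the RSTP toggle and the bridge-priority table. A sketch assuming the official meraki Python SDK and the v1 API; the key and network ID are placeholders, and the field names are from my reading of the docs:

```python
# Dump a network's RSTP state and bridge-priority assignments via the
# Meraki Dashboard API, to confirm which switches hold priority 0.
# API key and network ID below are placeholders.
import meraki

API_KEY = "your-api-key-here"
NETWORK_ID = "N_123456789"

dashboard = meraki.DashboardAPI(API_KEY, suppress_logging=True)
stp = dashboard.switch.getNetworkSwitchStp(NETWORK_ID)

print("RSTP enabled:", stp["rstpEnabled"])
for entry in stp.get("stpBridgePriority", []):
    # Each entry maps switches/stacks to an STP priority;
    # the intended root should carry the lowest value (0 here).
    print(entry)
```

It won't give you port costs - as far as I can tell those really aren't exposed anywhere - but at least it's scriptable.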
Here is a rough diagram (not perfect, but pretty spot on):

[topology diagram attached]