So, here's my issue. I talked to support the other day when I had a major outage. One of the things I was told by support is the MS-320 LAN IP (management address) CANNOT be the same address used for the Uplink Interface. I had been using /30 subnets for the connection between my MS-320s and the L3 Cisco 3560 that they talk to. Mind you, this setup was in place before I started on the team.
Here's the problem: I had one MS-320 that never went down. Its VLAN interface back to the 3560 is xxx.xxx.xx.28/30, with .30/30 as its address and .29/30 as its route/next hop... AND its LAN IP (management) is xxx.xxx.xx.30/30.
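For anyone following the addressing, here is a minimal sketch of what that /30 contains. The real octets are masked in the post, so 192.0.2.28/30 (a documentation range) is an assumed stand-in, and `addresses_left` is a hypothetical helper written only for this illustration.

```python
import ipaddress

# Assumed stand-in for the masked xxx.xxx.xx.28/30 link in the post.
link = ipaddress.ip_network("192.0.2.28/30")

# A /30 has four addresses: network, two usable hosts, broadcast.
usable = [str(h) for h in link.hosts()]
print("network:  ", link.network_address)
print("usable:   ", usable)
print("broadcast:", link.broadcast_address)


def addresses_left(network, used):
    """Return the usable host addresses in `network` that are not in `used`."""
    return [str(h) for h in network.hosts() if str(h) not in used]


# .29 is the 3560 (next hop) and .30 is the MS-320 VLAN interface.
free = addresses_left(link, {"192.0.2.29", "192.0.2.30"})
print("free for a separate management IP:", free)
```

With both usable addresses taken by the two ends of the link, nothing is left for a distinct management address, which is why the management IP ended up sharing .30 with the interface.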
So, why does it work, when I'm told that it won't work?
This is almost more of a phone call.
Only the management IP address talks to the cloud to get a config. It also does not use the routing table, just the statically configured default gateway.
Let's take a simple case. If the management IP address is in a VLAN where it can see the device providing Internet access, and that device is also its default gateway, it can simply talk to the Internet to get its config.
Let's take a nastier case. If the management IP's default gateway points to the switch itself, which then routes to the Internet, we can get into a race scenario. The management IP needs the switch up to talk to the Internet, but the switch can't come up until the management IP has pulled the config from the cloud. Catch-22.
Let's say you configure the switch in your lab, and the switch pulls its config. You then configure its final management IP address so it uses the switch to get to the Internet, and put it into production. Now this will actually work, because the switch has already pulled a working configuration. But you are now on a knife edge. If a software update or anything else resets the config, then the management IP won't be able to get a config to bring the switch back online, and without the switch online it can't get the config. Catch-22.
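The cases above can be sketched as a toy model. This is only an illustration of the dependency being described, not how the switch firmware actually decides anything; `SwitchState` and `can_reach_cloud` are hypothetical names, and the model assumes the upstream device itself is always up.

```python
from dataclasses import dataclass


@dataclass
class SwitchState:
    # True when the management IP's default gateway is the switch's own L3 interface.
    gateway_is_own_l3_interface: bool
    # True when the switch already holds a working config (routing is up).
    has_running_config: bool


def can_reach_cloud(state):
    """Toy model: can the management IP reach the cloud to pull a config?"""
    if not state.gateway_is_own_l3_interface:
        # Simple case: the gateway is an upstream device, so reaching the
        # cloud does not depend on this switch's own routing being up.
        return True
    # Nastier case: traffic has to be routed by the switch itself, which
    # only works if the switch already has a config.
    return state.has_running_config


scenarios = {
    "upstream gateway, fresh switch": SwitchState(False, False),
    "own gateway, fresh switch": SwitchState(True, False),
    "own gateway, staged in lab": SwitchState(True, True),
    "own gateway, config reset": SwitchState(True, False),
}
results = {name: can_reach_cloud(state) for name, state in scenarios.items()}
for name, ok in results.items():
    print(f"{name}: {'ok' if ok else 'stuck'}")
```

The lab-staged switch works only because the second field happens to be true; anything that flips it back puts the switch in the same state as a fresh one.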
The switch that never went down worked because its next hop was another switch.
You should convert the /30 to a /29, and put the management IP and interface IP into this subnet with the default route pointing to the upstream device.
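As a rough sketch of what that conversion gives you, again using 192.0.2.28/30 as an assumed stand-in for the masked address. The role-to-address assignment in `plan` is just one hypothetical layout, not a recommendation from the thread.

```python
import ipaddress

# Assumed stand-in for the existing masked /30 link.
old_link = ipaddress.ip_network("192.0.2.28/30")

# Widen the link by one bit to get the enclosing /29.
new_link = old_link.supernet(new_prefix=29)
hosts = [str(h) for h in new_link.hosts()]
print("new subnet:", new_link)
print("usable hosts:", hosts)

# Hypothetical layout: three distinct addresses in the same subnet.
plan = {
    "upstream_device (default gateway)": hosts[0],
    "switch_vlan_interface": hosts[1],
    "switch_management_ip": hosts[2],
}
for role, address in plan.items():
    print(f"{role}: {address}")

# The enclosing /29 also covers the neighbouring .24/30 block.
neighbour = ipaddress.ip_network("192.0.2.24/30")
print("swallows neighbouring /30:", neighbour.subnet_of(new_link))
```

A /29 has six usable addresses, so the management IP and the interface IP can differ. One caveat the arithmetic shows: widening .28/30 in place gives .24/29, which also contains .24/30, so if the /30s are allocated back to back a fresh /29 block may be needed rather than an in-place mask change.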
This is all assuming you have your switch in layer 3 mode doing actual routing.
Here's the thing: I am NOT willing to change something that is working after the SNAFU I dealt with last week. Out of 5 MS-320 switches that I'm saddled with, 2 of them worked perfectly using this method. 3 of them did not... yet every single setting for the VLANs on the 3560 is exactly the same, as are the interface/route settings on the Meraki side. By same, I mean VLAN50 is .24/30, VLAN51 is .28/30, VLAN52 is .32/30... etc. Only VLAN51 and VLAN52 never had an issue.
Two switches that are experiencing the 87% update fail have been moved to a /29 subnet. They briefly talked to the Meraki cloud- but as soon as 87% hit, nothing. Power cycle.... nothing.
The worst part is.... I'm the FNG. I've only been with my job for less than a year. Though I've done some stellar background work to clean up the network mess I was left, I now look like a total effing moron because of what I believe to be faulty and inferior equipment.
This only affects the VLAN on the MS switch that is used to talk to the Internet. I'm not sure how all of these VLANs fit into your picture.
Briefly, the topology is an old MetroLAN. It's a school district in CA with 5 sites. All 5 sites used to route to the 3560 then through an MX400 (passthrough mode) and into a 4506 and beyond for firewall/filter and then CENIC internet. Two of the sites are now off the MAN and connecting via ASRs into the 3560 and then beyond the same way. You wouldn't even believe some of the just bizarre behavior I've seen from these Merakis, all the way down to MR WAPs. (Like 1 single WAP being online when nothing else around it or above it was).
There's a defunct management VLAN that hasn't worked since July, but was always flaky according to the county network admin. There's a VLAN from the 3650 to each specific site. Each site has its own internal VLAN. Each site, or at least the remaining 3 on the MAN, belongs to a VLAN for Site-to-Site connectivity/mesh. The only trunks are from the 3560 to the MS320s, and they are all set perfectly fine.
Assuming the MS320s are in layer 3 mode and are doing routing towards the network core, it is still an invalid config, even though it has previously been configured using /30s.
You need to change them to a /29 (or broader). The VLAN IP and the management IP of the MS320 need to be different.
It may be that the switch is trying to pull something from the cloud during the upgrade and failing because of the invalid design.
I did. One of the 320s that failed at the update was on a new /29 that I set up and showed the proper IP in the local page.
What you're seeing is the kind of behavior that happens with this (as @PhilipDAth mentioned) invalid config. An L3 interface configured on the MS cannot be used as its Dashboard uplink. You need to separate the VLANs used for Dashboard uplink and routing.
Yes, this is a pain and doesn't make sense if you've come from a non-MS background.
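To make the rule in this post concrete, here is a small sketch of a pre-change sanity check. It is not a Meraki tool or API; `SwitchPlan` and `find_conflicts` are hypothetical names, the addresses are assumed stand-ins for the masked ones, and VLAN 60 is invented for the example.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class SwitchPlan:
    management_ip: str    # Dashboard uplink (LAN IP) address
    management_vlan: int  # VLAN the Dashboard uplink lives in
    l3_interfaces: dict   # VLAN ID -> routed interface, e.g. "192.0.2.30/30"


def find_conflicts(plan):
    """List places where the Dashboard uplink overlaps a routed L3 interface."""
    problems = []
    mgmt = ipaddress.ip_address(plan.management_ip)
    for vlan, cidr in plan.l3_interfaces.items():
        iface = ipaddress.ip_interface(cidr)
        if iface.ip == mgmt:
            problems.append(f"management IP equals the L3 interface on VLAN {vlan}")
        if vlan == plan.management_vlan:
            problems.append(f"management VLAN {vlan} also carries an L3 interface")
    return problems


# The layout from the original post: one address doing both jobs.
shared = SwitchPlan("192.0.2.30", 51, {51: "192.0.2.30/30"})
# A separated layout: uplink in its own (hypothetical) VLAN 60.
separated = SwitchPlan("192.0.2.42", 60, {51: "192.0.2.30/30"})

print("shared:", find_conflicts(shared))
print("separated:", find_conflicts(separated))
```

A check like this only flags the unsupported layout before a change goes in; it says nothing about why a particular switch tolerates it or not.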
Mrcur- That's the thing..... an invalid config that works. I don't get it. I've got screenshots to prove it works. Or doesn't, depending on god knows what.
The two MS320s that were moved to a new circuit have the same VLAN for management and uplink just fine, on a /29. It's not the VLAN, it's the number of addresses. But, then again, when a /30 works just the same what the hell do I know? It's tech, it's supposed to be logical, but oftentimes it's really not.
I have a similar case
The switch is operating in L3 mode.
I have a separate VLAN to connect to the WAN router (/30)
First IP in this VLAN is used by the router, the second for the switch.
This IP is used for both the uplink and the management.
Never had a problem.
Should I create a new VLAN just for management?
It really depends on how much you want to muck around. Part of the problem is trying to make changes on the fly without the switch updating before you finish. It would really be nice to have a feature where we could sideload a change and then activate it, because sometimes when you're trying to change the static LAN/Management IP, interface, and default route all at the same time, it's no bueno. Even being able to set a manual timer for config updates would be pretty damn useful.
I'm finding that sometimes, with these Merakis and their convoluted way of making changes, it's just best to leave it alone if it's working and you're not seeing weirdness.
@Asavoy I know it works - sometimes. What I'm saying, and what I think @PhilipDAth is saying, is that you should really plan to move to the Meraki supported setup and go from there. I absolutely want the MS line to officially support the setup you are using. It would save me VLANs and IP space, but for now I have it running in a supported setup. The Dashboard uplink has its own VLAN/subnet that's tagged on the uplink port. The VLAN I run OSPF on is untagged on this same port back to the core the MS L3 switches uplink to.