MS-320 issues after power outages

Asavoy
Building a reputation

We recently upgraded our bandwidth at two sites, and our provider has us using Cisco ASR920 devices that my MS-320s link to.  Unfortunately, one site is under construction and power is frequently cut for longer than our battery backups can manage.  When the power comes back on, the ASR920 takes about 13 minutes to go through its boot cycle, and because of that delay the Meraki never comes back up properly: after it initially fails to communicate with the ASR920, it doesn't seem to try again.  So I have to manually power cycle the Meraki MS-320 to get it back online.

Is there any setting to force a reboot, or to force it to attempt to re-link, after a set amount of time?

Adam
Kind of a big deal

Interesting. My understanding is that the device keeps trying to connect to its upstream WAN regardless of how long it takes for that upstream device to come up.  One way to reproduce this would be to power on the switch and wait 15 minutes before plugging in an upstream network interface.  I do this all the time in my office when I'm staging switches, and they come up pretty quickly once I connect the upstream link.  Could you try bouncing the port/uplink from the ASR920 to the MS320?

Adam R MS | CISSP, CISM, VCP, MCITP, CCNP, ITILv3, CMNO
If this was helpful click the Kudo button below
If my reply solved your issue, please mark it as a solution.
PhilipDAth
Kind of a big deal

I have seen that when the devices can't connect for a "while", the frequency at which they retry drops off (maybe from every 5 minutes to hourly; I'm not sure).

How about getting a delay timer?  Set it so that the MS320s don't get power until 10 minutes after the power is restored.

Another thought: can you get additional UPS capacity for just the ASR920?

Another thought: put in an MX.  One connection goes to the ASR and the other has a 4G router on it.  This will allow a backup Internet link to come up much faster.

Next thought: once the ASR920 is back online you should be able to get remotely to the local status page of the MS320s.  I suspect that if you went into the uplink configuration and simply saved the settings again, that would force a check for it to come online.

Asavoy
Building a reputation

@PhilipDAth  I'll try to get to the local admin page; that would be much better than schlepping through the construction zone to get to the switch.  I have a feeling that there's something in the ASR that maybe shuts the port that the Meraki is plugged into.  The site is currently without power, so I should be able to check in half an hour or so, as long as the battery backup dies.

@Adam I don't think that method would help here.  I'll be able to find out in just a little bit when they restore power to the site.  I'll look at the ASR's port state and try to go from there.  When these power issues occur, I also have to reboot the downstream MS410 and MS220 devices for them to behave normally, but I never have to touch my Cisco SG350s.

The connection goes like this: ASR920 -> MS320 -> MS410 -> MS220, with everything BUT the MS220 in the same room.  That one is on the other side of the campus.

Asavoy
Building a reputation

So, quick update.  I've found that my battery backups last at least 1.5 hours.  Power went down late yesterday, so of course the network was down this morning.

I physically plugged into the management port of the ASR920, and the status of the port facing the Meraki is down.

So, basically, the Meraki boots up first and tries to say hello to the ASR920.  Since the ASR920 is still in its boot process, it cannot respond.  The Meraki goes, oh...nobody home, I'll try again in an hour.  The ASR920 comes up and tries to say hello to the Meraki, but the Meraki is already out to lunch, so the ASR920 goes oh, I can turn this port off since nobody is there.

So.... this needs to be corrected.  I cannot be the only person in the Meraki universe that has this issue.

PhilipDAth
Kind of a big deal

Once the ASR comes up, can you then access the MS320 via the local status page?

How does the MS320 get its IP address?  Static?  Remotely via DHCP through the ASR920?  Locally via a local DHCP server?

Asavoy
Building a reputation

@PhilipDAth I cannot get to the Meraki via the local status page.  The ASR puts the port that connects to the Meraki into a down state because the Meraki is not responding to ack requests when it (the ASR) reboots.  Like I said, it's an inordinately long boot time of 13 minutes or so.

The Meraki's uplink/management IP is static, and its LAN interface IP is reserved in my domain controller's DHCP table.

PhilipDAth
Kind of a big deal

But can you get to the local status page after the ASR port comes up?

Also, that does not make sense.  Why would the ASR expect an ACK from the MS320?  What kind of protocol is it talking to the MS320 to expect this?  If we know that protocol we can probably make an adjustment.

Asavoy
Building a reputation

@PhilipDAth Perhaps ack wasn't the right terminology.  I just meant acknowledgement that there is something alive and active on the port; handshaking may be a better term?

Once I physically pull power from the MS320 for a second and then plug it back in so it does a full reboot, then yes.... the ASR puts the port into the up state and I can reach the Meraki via the local page.  By then there's no point, because it's fully available on the dashboard.

BTW, they are connected with 1000Base-SX SFPs and multimode fibre.

PhilipDAth
Kind of a big deal

Has the ASR920 or the MS320 got any protective protocols enabled like UDLD?

PhilipDAth
Kind of a big deal

Are you using recent firmware, like 9.37 or 10.26?

PhilipDAth
Kind of a big deal

Are you using a genuine Meraki SFP?

Asavoy
Building a reputation

@PhilipDAth Using 9.37 firmware, and no.  AFAIK there's no special settings on the ports.

PhilipDAth
Kind of a big deal

Are you 100% sure on the ASR side the port has not gone into an err-disabled state and you are simply resetting this by rebooting the MS320?

 

What does the company that looks after the ASR920 see as the reason for the port being down on their side?

Asavoy
Building a reputation

@PhilipDAth  The console 'show interface' from the ASR reads "GigabitEthernet0/0/4 is down, line protocol is down"

 

After power cycling the MS320, it reads "GigabitEthernet0/0/4 is up, line protocol is up".

 

Zero changes to anything other than power cycle of the MS320.

 

The company that looks after the ASR920?  That's me.  They only installed it and made sure it was running.

 

I can check next time it's down (probably tonight) to see if I can dig further into the state of the port.
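
For anyone who wants to tell these port states apart from a script, here is a minimal Python sketch that classifies the first line of 'show interface' output. The two status strings are the ones quoted above; the "administratively down" wording and the "(err-disabled)" suffix are what many Cisco IOS platforms print, and whether the ASR920 prints exactly those is an assumption. The function name and the labels it returns are made up for illustration.

```python
import re

# First line of `show interface`, e.g.
# "GigabitEthernet0/0/4 is down, line protocol is down"
# The optional "(reason)" suffix, such as "(err-disabled)", is an assumption
# based on common IOS output, not something seen on this ASR920.
STATUS_RE = re.compile(
    r"^(?P<name>\S+) is (?P<link>administratively down|down|up), "
    r"line protocol is (?P<proto>down|up)(?: \((?P<reason>[^)]+)\))?"
)


def classify(status_line):
    """Return a rough label for an interface status line (labels are illustrative)."""
    m = STATUS_RE.match(status_line.strip())
    if m is None:
        return "unrecognized"
    if m.group("reason") == "err-disabled":
        return "err-disabled"      # port was shut by a protective feature
    if m.group("link") == "administratively down":
        return "admin-shutdown"    # 'shutdown' is present in the config
    if m.group("link") == "up" and m.group("proto") == "up":
        return "up"
    if m.group("link") == "down":
        return "no-link"           # no carrier seen from the far end
    return "protocol-down"         # link is up but line protocol is down


print(classify("GigabitEthernet0/0/4 is down, line protocol is down"))  # no-link
```

A plain "down/down" line with no "administratively" and no err-disabled suffix suggests the router simply sees no signal from the far end, rather than having shut the port itself.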

Asavoy
Building a reputation

@PhilipDAth Okay, so this morning I consoled into the ASR and tried a 'no shut' command on the port, but no change.  On the MS320 I have two SFP ports in use: one goes to the ASR, the other goes to an MS410-16 fibre aggregation switch.  Neither of those ports has the amber indicator lit, and neither does the port on the MS410.

So, it's becoming quite obvious that the MS320 is shutting those ports off.  Plugging directly into the MS320 and accessing the local interface gives zero indication of what's going on.  I tried to disable/enable the ports in question, to no avail.  There's really nothing else to do.

And no, they are not Meraki SFP modules but Cisco ones.  That shouldn't matter in the least for what is going on here.

Edit: This is basically all I get from the event log:

Jun 20 11:55:58 Port STP change
Port 28 disabled→designated
Jun 20 11:55:58 Port status change
port: 28, old: down, new: 1Gfdx
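
If you end up with a longer copy of this event log, a small script can pull out just the link transitions. This is a minimal Python sketch that assumes the log is pasted as plain text in the two-line shape shown above (a timestamped event name, then a detail line); the function name is hypothetical and the format is inferred from this one sample only.

```python
import re

# Header line: "Jun 20 11:55:58 Port status change"
HEADER_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) (?P<event>.+)$"
)
# Detail line: "port: 28, old: down, new: 1Gfdx"
DETAIL_RE = re.compile(r"^port: (?P<port>\d+), old: (?P<old>\S+), new: (?P<new>\S+)$")


def port_status_changes(log_text):
    """Return (timestamp, port, old_state, new_state) for each 'Port status change'."""
    changes = []
    header = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between entries
        h = HEADER_RE.match(line)
        if h:
            header = h  # remember the event this detail line belongs to
            continue
        d = DETAIL_RE.match(line)
        if d and header and header.group("event") == "Port status change":
            changes.append(
                (header.group("ts"), int(d.group("port")), d.group("old"), d.group("new"))
            )
        header = None  # a detail line (of any kind) closes the entry
    return changes
```

Filtering on one port number then shows at a glance whether it ever left the "down" state after a reboot.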
Adam
Kind of a big deal

Did you try doing a shut, leave it for 5-10 seconds, then a no shut?

Asavoy
Building a reputation

No, really don't think that would help, but I can try it next time.

 

The MS320 and MS410 are on a different battery backup than the ASR920 is.  Today, the Merakis went down (power), but the ASR did not, so when power was restored everything came up perfectly fine.

 

This tells me pretty much for certain that it's a weakness in the Meraki code.  When it doesn't see a device on the other end of the SFP port it shuts it down and doesn't even bother to check for a 'new' device at any time, until it's rebooted.

PhilipDAth
Kind of a big deal

I think I would try going to the 10.26 code on the MS320.  I don't think you have much to lose.

Asavoy
Building a reputation

@PhilipDAth Yeah, that's what people said about the 25.11 code for the MRs and I'm pissed that my wireless network is worse than ever with it.

 

I'm done with updates.  I'm damn near done with Meraki.

 

What engineer thought it appropriate to program the SFP ports to stay in a shutdown state until a power cycle?  Because essentially, that's what is happening.  If I plug something into an open standard port, it leaves the shutdown state.  If I plug something into an open SFP port, it stays in the shutdown state.  And this when their datasheet touts the "4x 10Gbe SFP+ Uplink ports for core connectivity"?

Asavoy
Building a reputation

@Adam I've tried rebooting the MS410 and the ASR920 first, but the MS320 will not even look at anything 'new' on those SFP ports until it gets a reboot.

 

It's most certainly a bug or oversight.

Asavoy
Building a reputation

I've kicked this to Meraki support.  We'll see what they say about it.

Asavoy
Building a reputation

And, just to update....

 

I was finally able to get onsite right after the MS-320 went down.  It was not a power issue at the site.  It just decided to shut those ports off.

After a few reboots and some troubleshooting, it turns out that the uplink SFP port was suffering intermittent failure.  Unfortunately, because of my setup (and not listening to my own advice) I did not have a backup port pre-set with the trunk info that I need.  We know what the local interface does for us in that situation..... a big fat nothing.

It took a few more reboots to get the port to finally come up.  At that point, I set up the secondary port and moved the uplink over to it.  If it stays up all weekend I'll know for sure, but I'm pretty sure as it is.  So yes, it showed up after power loss, but no, the power loss itself wasn't the cause; the SFP ports simply wouldn't initialize on reboot.
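
For anyone wanting to pre-stage a backup uplink port ahead of time, here is a minimal Python sketch that builds (but does not send) a Meraki Dashboard API v1 request to set a switch port as a trunk. It was not used in this thread. The serial, port number, VLAN values, port name and function name are all placeholders, and you should check the current API documentation before relying on the field names.

```python
import json
import urllib.request

API_BASE = "https://api.meraki.com/api/v1"


def build_backup_uplink_request(api_key, serial, port_id, native_vlan, allowed_vlans):
    """Build a PUT request that configures one switch port as a trunk.

    Every argument value is supplied by the caller; nothing here comes from
    the thread. allowed_vlans is a string such as "all" or "1,10,20".
    """
    body = {
        "name": "Backup uplink (pre-staged)",  # hypothetical label
        "enabled": True,
        "type": "trunk",
        "vlan": native_vlan,            # native VLAN of the trunk
        "allowedVlans": allowed_vlans,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/devices/{serial}/switch/ports/{port_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values only; sending would be urllib.request.urlopen(req).
req = build_backup_uplink_request("YOUR_API_KEY", "Q2XX-XXXX-XXXX", "27", 1, "all")
```

The point of doing this in advance is that the spare port already carries the trunk settings, so moving the fibre over is all that's needed when the primary SFP port misbehaves.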

Adam
Kind of a big deal

Great to hear that you got to the bottom of it.  Thanks for coming here to close the loop. 
