All dashboards down. All VPNs went down also

Hight
New here

All of my dashboards are down, but devices are accessible. All of my IPsec VPNs to my datacenter are down as well.

8 REPLIES
Schmitty
Here to help

This appears to be an ISP internet issue at a large scale.

Just found out that both Level 3/CenturyLink are experiencing global outages. Any data centers with Level 3 are down or bouncing. The CenturyLink outage is affecting all of their home internet users. Marking this as solved; it is not directly a Cisco/Meraki problem.

DavidF1
Conversationalist

Uh, it is a Meraki issue if they are hosting in an L3 datacenter and don't have appropriate alternate routes.

LukeRickley
Conversationalist

Does anyone have any useful updates on this outage? We are an MSP with hundreds of customers affected by this outage. What we have been able to determine is that if they are on Meraki, they are down, and if they are using Umbrella DNS, they are down. For a few that we have been able to get into, changing their DNS lookup servers to Google DNS has fixed it (at least for the moment). I went to Cisco's Umbrella status page: all green, no issues!? Not cool.
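
For anyone who wants to check quickly whether a given resolver is answering before changing DNS settings on a customer network, here is a minimal Python sketch using only the standard library. Treat it as an illustration under assumptions, not a vetted tool: the names `build_dns_query` and `resolver_answers` are made up for this example, and it only sends a single A query over UDP and checks that a response with at least one answer comes back.

```python
import socket
import struct

QUERY_ID = 0x1234  # arbitrary ID chosen for this sketch


def build_dns_query(hostname: str, query_id: int = QUERY_ID) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def resolver_answers(resolver_ip: str, hostname: str, timeout: float = 2.0) -> bool:
    """Return True if the resolver replies without error and with an answer."""
    query = build_dns_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (resolver_ip, 53))
            response, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts and unreachable-network errors alike
            return False
    if len(response) < 12:
        return False
    resp_id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    rcode = flags & 0x000F  # 0 means "no error"
    return resp_id == QUERY_ID and rcode == 0 and ancount > 0
```

As a usage example, you could compare `resolver_answers("208.67.222.222", "dashboard.meraki.com")` (a public Umbrella/OpenDNS resolver) against `resolver_answers("8.8.8.8", "dashboard.meraki.com")` (Google DNS); the hostname is just a sample. A `False` result only means no usable answer arrived within the timeout, so it cannot tell you where along the path the failure is.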

It's an ISP issue. Nothing to do with Meraki. Apparently CenturyLink/Level3 related, AS3356.
LinkedIn ::: https://blog.rhbirkelund.dk/

Hello everyone,

Thank you for bringing this up.

I can confirm this is part of a wider internet outage and isn't Meraki specific.

You can see additional information on the outage here and here

Edit: Just to confirm, I do work for Meraki but it will take a few days for my account to be verified.

BrandonS
Kind of a big deal

This info from the CenturyLink master ticket on this issue was shared on the outages list about 90 minutes ago.  May be useful to give to execs, etc. asking what happened:

 

The IP NOC with the assistance of the Operations Engineering team confirmed a routing issue to be preventing BGP sessions from establishing correctly. A configuration adjustment was deployed at a high level, and sessions began to re-establish with stability. As the change propagates through the affected devices, service affecting alarms continue to clear.

 

Due to the nature of this outage, it may be necessary to reset your services locally at your equipment, or manually reset your BGP session. If after that action has been performed a service issue prevails, please contact the CenturyLink Repair Center for troubleshooting assistance.

 

Next update by: 2020-08-30 15:50 GMT

 

This notification is sent from an unmonitored email account.  Please click the link below to reply:

 

mailto:SMC@CenturyLink.com?subject=RE%3A%20CenturyLink%20Ticket%20#:%2019545403,&body=Customer%20Rep...

 

Notes History:

 

08/30/2020 14:46:29 GMT - The IP NOC with the assistance of the Operations Engineering team confirmed a routing issue to be preventing BGP sessions from establishing correctly. A configuration adjustment was deployed at a high level, and sessions began to re-establish with stability. As the change propagates through the affected devices, service affecting alarms continue to clear.

 

Due to the nature of this outage, it may be necessary to reset your services locally at your equipment, or manually reset your BGP session. If after that action has been performed a service issue prevails, please contact the CenturyLink Repair Center for troubleshooting assistance.

 

08/30/2020 13:19:02 GMT - The IP NOC has isolated multiple Border Gateway Protocol (BGP) issues causing service impacts across multiple markets. Cooperative escalated  investigation efforts with CenturyLink leadership are underway to isolate and troubleshoot the issue and expedite service restoral.

 

08/30/2020 12:14:09 GMT - The IP NOC advises cooperative escalated investigation and troubleshooting efforts remain ongoing this time.

 

08/30/2020 11:36:30 GMT - The IP NOC is engaged in cooperative escalated investigations to isolate and troubleshoot the fault at this time.

 

08/30/2020 10:48:38 GMT - On August 30, 2020 10:04 GMT, CenturyLink identified a market wide. As this network fault is impacting multiple clients, the event has increased visibility with CenturyLink leadership. As such, client trouble tickets associated to this fault have been automatically escalated to higher priority.

The NOC is engaged and investigating in order to isolate the cause. Please be advised that updates for this event will be relayed at a minimum of hourly unless otherwise noted. The information conveyed hereafter is associated to live troubleshooting effort and as the discovery process evolves through to service resolution, ticket closure, or post incident review, details may evolve.
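
If the notes history above needs to be turned into a quick timeline for execs, a short Python sketch along these lines may help. The timestamps and the 10:04 GMT fault time are copied from the ticket text; the one-line descriptions are paraphrases, and the helper names `parse_gmt` and `minutes_since_start` are chosen for illustration only.

```python
from datetime import datetime, timezone

STAMP_FORMAT = "%m/%d/%Y %H:%M:%S"  # format used in the ticket's notes history


def parse_gmt(stamp: str) -> datetime:
    """Parse a ticket timestamp and mark it as GMT/UTC."""
    return datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)


# Fault time stated in the earliest ticket note: August 30, 2020 10:04 GMT
FAULT_START = parse_gmt("08/30/2020 10:04:00")

# (timestamp, paraphrased summary) pairs taken from the notes history
UPDATES = [
    ("08/30/2020 10:48:38", "fault identified, tickets escalated"),
    ("08/30/2020 11:36:30", "investigation to isolate the fault"),
    ("08/30/2020 12:14:09", "investigation and troubleshooting ongoing"),
    ("08/30/2020 13:19:02", "multiple BGP issues isolated"),
    ("08/30/2020 14:46:29", "configuration adjustment deployed, sessions re-establishing"),
]


def minutes_since_start(stamp: str) -> int:
    """Whole minutes elapsed between the stated fault time and an update."""
    return int((parse_gmt(stamp) - FAULT_START).total_seconds() // 60)


for stamp, summary in UPDATES:
    print(f"+{minutes_since_start(stamp):>3} min  {stamp} GMT  {summary}")
```

This only measures time up to the last note shared here; the ticket was still open at that point, so it does not give the total outage duration.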

- Ex community all-star (⌐⊙_⊙)
GaryShainberg
Building a reputation

What I find almost bizarre, if not unprecedented, is that since 2007, when I started using Meraki, I had never experienced a total dashboard outage that affected the operational integrity of a Meraki network. That changed last week, when an ISP issue in Europe took our dashboards down for about 2 hours, which also impacted VPN access (I guess because the Meraki RADIUS db is hosted within the dashboard platform).

And then it happened again today, albeit in the USA.

However, both outages were caused by ISPs with BGP issues.

Ironically, one of the key features of BGP is to prevent this type of thing. I would hope that some of the top network gurus in Meraki are re-assessing the peering and upstream connectivity and trying to understand why, in one week, there have been two ISP BGP issues that have taken down significant chunks of dashboards.

I reiterate: personally, I have never seen this before in the history of Meraki. One would question whether someone has recently made changes to the network access and peering policies within Meraki (Cisco) that have exposed their network to a higher risk.

CTO & Solutioneer
CMNA, CMNO, ECMS2
SNSA, SNSP