Making data-driven network decisions - where's the need?

GrantShirk
Meraki Employee

Hi Full-stack crew - wanted to get your take on an analytics and capacity question. We're starting a conversation on the blog this week about practical applications of AI and intelligence, and wanted to ground it in the realities you face every day. 

 

What are the top network challenges (e.g. troubleshooting, planning, support, etc.) that currently have the biggest impact on your team's capacity? 

 

And as a followup, if you could wave a magic wand and automate/accelerate parts of your network operations today, what would you wish for? 

 

[Edit on 9/22]
Our first blog post on the topic is live. 
https://meraki.cisco.com/blog/2021/09/complexity-connectivity-and-capacity-putting-network-data-to-w... 

49 Replies
PhilipDAth
Kind of a big deal

Fault detection and diagnosis.

GrantShirk
Meraki Employee

Thanks for kicking off the discussion! We'll be digging into the other responses, but on fault detection and diagnosis - are there particular root causes that are particularly pernicious to track down? 

PhilipDAth
Kind of a big deal

>are there particular root causes that are particularly pernicious to track down

 

I find tracking down access problems caused by firewall rules can take a while, because the rules can be configured in so many places.

Warren
Getting noticed


@PhilipDAth wrote:

>are there particular root causes that are particularly pernicious to track down

 

I find tracking down access problems caused by firewall rules can take a while, because the rules can be configured in so many places.


I agree - there are layer 7 rules, layer 3 rules, content category/URL filtering, and DNS filtering.  If it's only blocked because of DNS, there is a page that is displayed to the user.  The content filter may log the event, unless it's "N events were dropped".

 

Layer 3 and Layer 7 are really the harder ones.  In our case it's typically geo filtering.  The recent MaxMind/Google issue has taught us that the geo-filtering comes from MaxMind not BrightCloud.  Once we have determined the source, getting it to go through is the next challenge.

 

Other firewalls let you set allow rules at layer 7.  I understand that will not be possible in the firewall component without a substantial rewrite.

PhilipDAth
Kind of a big deal

Meraki also has some of these ...

 

 

Packet path visualisation.  Sometimes I get a client saying "x" is not working.  You might need to look through:

* Auto VPN firewall rules

* MX VPN firewall rules

* Layer 7 rules

* Content rules

* Host-based group policies

* VLAN based group policies

* Switch ACLs

* MR ACLs

* Route tables

Probably something else I have forgotten.

 

 

Maybe it would be nice to click on a client, type in what it is trying to access, and say "Visualise" and watch it move through all the different engines to see where it might be blocked.
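
As a rough illustration of that "Visualise" idea (this is not a Meraki feature), a minimal Python sketch could walk a flow through an ordered list of policy stages and stop at the first one that denies it. The `Rule`, `Stage`, and `trace_flow` names, the stage order, and the rule format are all hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional, Tuple


@dataclass
class Rule:
    action: str                     # "allow" or "deny"
    dst_cidr: str                   # destination network, e.g. "10.20.0.0/16"
    dst_port: Optional[int] = None  # None matches any port


@dataclass
class Stage:
    name: str                       # e.g. "Switch ACL" (hypothetical label)
    rules: List[Rule]
    default: str = "allow"          # verdict when no rule matches


def evaluate(stage: Stage, dst_ip: str, dst_port: int) -> str:
    """Return the action of the first matching rule, else the stage default."""
    for rule in stage.rules:
        in_network = ip_address(dst_ip) in ip_network(rule.dst_cidr)
        port_matches = rule.dst_port in (None, dst_port)
        if in_network and port_matches:
            return rule.action
    return stage.default


def trace_flow(stages: List[Stage], dst_ip: str, dst_port: int) -> List[Tuple[str, str]]:
    """Walk the stages in order and stop at the first one that denies the flow."""
    trace = []
    for stage in stages:
        verdict = evaluate(stage, dst_ip, dst_port)
        trace.append((stage.name, verdict))
        if verdict == "deny":
            break
    return trace


stages = [
    Stage("Group policy", []),
    Stage("Switch ACL", [Rule("deny", "10.20.0.0/16", 443)]),
    Stage("MX L3 firewall", [Rule("deny", "0.0.0.0/0")]),
]
print(trace_flow(stages, "10.20.1.5", 443))
```

The last entry of the returned trace names the stage that blocked the flow, which is the piece of information that takes so long to find by hand.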

PhilipDAth
Kind of a big deal

Something that can analyse blocked traffic on MX firewall rules would be nice (in general) and a bit easier than the last one.

 

Every firewall I have ever used lets you watch, in real-time, traffic flows and what is being blocked.  To do this on an MX you have to use a third-party syslog server.

 

It would be nice to have something you can say "start monitoring", you do a test, and then it returns an "expert" analysis saying what it saw that was allowed, and what got blocked.

 

Like syslog or real-time monitoring, but made Meraki simple.

 

Real Meraki simple would be proposing the firewall rule changes, and then a button to apply that recommendation.

cmr
Kind of a big deal

  • An event log message for when a device reboots or starts up if there isn't time to send the down message.
  • Seeing the actual firmware that is running on a device from the local admin page when logged in
NJNetworkGuy100
Getting noticed

This is the scenario I'm in now.

 

We just standardized all MXs worldwide with standard L3 firewall rules on all VLANs (thanks Python and the API!), but now we have a Deny All rule as the rule above the built-in Allow All rule.  

 

I would love it if I could quickly see what is being blocked by this Deny All rule right in the dashboard, or even via an API script that I could pump to an Excel spreadsheet.  

 

Now I have to start a new project to find and configure a Syslog server to monitor blocked traffic whenever someone claims to have an issue with the firewall.  
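
Once a syslog server is collecting flow logs, turning the blocked entries into something Excel can open could look roughly like the Python sketch below. It assumes each log line carries `key=value` fields and ends with a `pattern:` verdict; that layout and the sample line are assumptions for illustration, so check the real MX syslog format before relying on it.

```python
import csv
import io
import re

FIELD_RE = re.compile(r"(\w+)=(\S+)")        # key=value tokens (assumed layout)
PATTERN_RE = re.compile(r"pattern:\s*(.+)$")  # trailing verdict (assumed layout)


def parse_line(line):
    """Turn one log line into a dict of its key=value fields plus 'pattern'."""
    fields = dict(FIELD_RE.findall(line))
    match = PATTERN_RE.search(line)
    fields["pattern"] = match.group(1).strip() if match else ""
    return fields


def denied_flows(lines):
    """Keep only the flows whose verdict starts with 'deny'."""
    parsed = (parse_line(line) for line in lines)
    return [f for f in parsed if f["pattern"].lower().startswith("deny")]


def to_csv(flows, columns=("src", "dst", "protocol", "dport", "pattern")):
    """Render the flows as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    for flow in flows:
        writer.writerow({col: flow.get(col, "") for col in columns})
    return buf.getvalue()


# Made-up sample line, not copied from a real device
sample = ["1633700000.1 MX flows src=10.0.0.5 dst=8.8.8.8 protocol=udp sport=55719 dport=53 pattern: deny all"]
print(to_csv(denied_flows(sample)))
```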

Sakiriyas
Conversationalist

As networks increase in size, administering the network configuration becomes more difficult. Devices can conflict with each other. It becomes a challenge to keep the rules in firewalls and other devices up to date, and manually applying policies leads to errors and discrepancies.

TomTownsend
Here to help

An AI-controlled router could change DNS, close ports, or dump cache on the fly. It could help identify threats based on real-time security algorithm updates from a worldwide threat matrix. Any time capacity is being reached, even for a millisecond, it could reach out for help and direction using a hive-mind technology. Possibly even autonomously spinning up virtual routers to deal with a threat by dumping traffic like an offramp, or to provide additional bandwidth as needed.

GrantShirk
Meraki Employee

Really interesting ideas - I'll definitely pass these on. 

PercyInfostore
New here

To review each user's network usage, the dashboard should go back to showing usage per client. Right now it is only general.
I have to do a packet capture to know what they are browsing. At the school level this is very necessary.

RR
Here to help

It's troubleshooting.

 

We are VSAT providers and our customers are always traveling on the water trying to finish their job with very strict timelines.

 

For us, an issue on the boat while they are in the middle of the ocean with no network coverage, relying only on VSAT Internet and VoIP, is what makes troubleshooting more challenging.

The longer we take to resolve the issue, the bigger the impact is for the customer, because they have to communicate constantly with offshore using email or speaking on the phone.

 

Thanks.

Travis_Ferris
Here to help

The biggest challenges to troubleshooting are:

  1. Remote Analysis via end-user (i.e. troubleshooting the end-user in parallel to the reported issue).  There are numerous instances where this can play out, but one example would be having to answer the following:
    • Did the device reload?  (environmental issue, memory crash, or code upgrade)
    • Is the switch losing power?  (Is it a bad switch or a faulty/loose power cable?)
    • Is it a bad UPS or dirty power in the building?
    • Is the circuit bouncing and the branch power cycles the switch?
    • Having device uptime with reload reason would be helpful in troubleshooting these types of issues and comparing timestamps with other logs and devices.
  2. Cisco and Meraki barriers
    • For the most part, Cisco and Meraki hardware play nicely with each other, but some things like the ARP or MAC tables at a root Meraki device do not show every downstream device, almost like once it crosses the brand barrier, the information available is limited.
  3. Better upgrade status notifications
    • It's difficult to know when a device is fully upgraded when performing upgrades remotely.  When in the lab, I can see the light status of the device when it is reloading, but when upgrading devices at our numerous branches spread out across North America, I have to rely on the dashboard and continuous pings to know the progress.  As easy as the process is to upgrade devices, it's nearly impossible to tell if it actually completed.
    • The status under Organization > Firmware Upgrades shows different than BranchNetwork > Switches > Staged Upgrades, not updating until 30 minutes after the device is upgraded.
    • The status under the Branch Network -> Switch Summary (or individual) status updates the code version in the Dashboard/GUI immediately, before it actually loads it to the switch
    • Running a constant ping usually confirms when the switch reloads and boots to the new code version, but sometimes I've not lost any pings, almost as if the switch never truly upgraded, yet everywhere in the dashboard shows the device on the new code version.
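
For the constant-ping check described in the last bullet, a small Python sketch like the one below could flag whether a reload-sized gap ever showed up in the ping results. The input format (a list of timestamp and reply pairs) and the `min_missed` cut-off are assumptions for illustration, and a missing gap only suggests, rather than proves, that the switch never reloaded.

```python
def outage_windows(samples, min_missed=3):
    """Return (first_missed, last_missed) timestamps for each run of lost pings.

    samples: list of (timestamp_seconds, replied) tuples in time order.
    min_missed: shortest run of consecutive losses that counts as an outage.
    """
    windows = []
    run = []  # timestamps of the current streak of missed replies
    for timestamp, replied in samples:
        if not replied:
            run.append(timestamp)
            continue
        if len(run) >= min_missed:
            windows.append((run[0], run[-1]))
        run = []
    # Handle a streak that is still open at the end of the log
    if len(run) >= min_missed:
        windows.append((run[0], run[-1]))
    return windows


ping_log = [(0, True), (1, False), (2, True), (3, False), (4, False), (5, False), (6, True)]
print(outage_windows(ping_log))
```

An empty result after a scheduled upgrade is the "never lost any pings" case, which is worth comparing against the uptime the device itself reports.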

 

ksbrowning
Here to help

I think the biggest challenge is having the manpower available to troubleshoot the network quickly and easily. Also to be proactive in identifying and resolving user issues before they turn into a dumpster fire. 

ShahnMonson
Conversationalist

There are a couple, but the one that comes up the most is that we have had multiple requests to schedule a reboot for some of our MXs, and it would be nice to be able to schedule a reboot of one, much like we do with firmware upgrades. The only other thing I can think of is the monitoring in place. We get alerts to the point that it is just white noise that no one pays attention to anymore. We have it set to send an email to a DL if the VPN connection goes up or down, but there is no option for a time threshold. For example, we get this quite often: 

At 07:06 AM CDT on Oct 8, the site-to-site VPN connection to XXXXXXXX went down.
At 07:07 AM CDT on Oct 8, the site-to-site VPN connection to XXXXXXXX came up.

 

If we could set a threshold of say 5min or 10min the alerts might become more actionable. 
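
Until a threshold option exists, one workaround is to filter the up/down events yourself before anyone gets paged. The Python sketch below is a minimal version of that idea; the event tuples, the peer names, and the `actionable_outages` function are hypothetical, and it assumes you already have the events in hand (for example, parsed from the alert emails).

```python
from datetime import datetime, timedelta


def actionable_outages(events, threshold=timedelta(minutes=5), now=None):
    """Return (peer, went_down, came_up) for outages lasting at least `threshold`.

    events: iterable of (timestamp, peer, state) with state "down" or "up".
    now: optional current time, used to report outages that are still open
         (came_up is None for those).
    """
    down_since = {}
    outages = []
    for timestamp, peer, state in sorted(events):
        if state == "down":
            # Keep the earliest "down" if several arrive in a row
            down_since.setdefault(peer, timestamp)
        elif state == "up" and peer in down_since:
            started = down_since.pop(peer)
            if timestamp - started >= threshold:
                outages.append((peer, started, timestamp))
    if now is not None:
        for peer, started in down_since.items():
            if now - started >= threshold:
                outages.append((peer, started, None))
    return outages


events = [
    (datetime(2021, 10, 8, 7, 6), "site-a", "down"),
    (datetime(2021, 10, 8, 7, 7), "site-a", "up"),    # one-minute flap, ignored
    (datetime(2021, 10, 8, 8, 0), "site-b", "down"),
    (datetime(2021, 10, 8, 8, 12), "site-b", "up"),   # twelve minutes, reported
]
print(actionable_outages(events))
```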
CoachSteve
Conversationalist

Meraki is really an awesome product and I would recommend it to anyone.

The only challenge I've had with it is the reporting where you can't filter by SSID.

Working in a very big corporate/enterprise company, these type of reports are very crucial to execs.

 

EZnetwork
Conversationalist

With my magic wand, I would put a Manual or Automatic (every 1 minute, for example) Refresh option in the top-right corner of the Insight --> WAN Health page, so I don't have to hit "Last 2 hours" every once in a while.

 

Take a cue from the LogicMonitor application, the best monitoring tool so far, which is almost like the ThousandEyes application.

Arnaldo Chicola
PhilipDAth
Kind of a big deal

@CoachSteve , you know you can produce summary reports filtered by SSID?

 

PhilipDAth_0-1633730705225.png

 

 

CoachSteve
Conversationalist

Thanks for the response Philip.

I saw that option but unfortunately it doesn't seem to function properly. When I filter by SSID we don't see any data, it's just blank, but if I have a look at "All SSIDs" then I see that we have a lot of data.

We've had a TAC case open since April, and the TAC engineer recently advised that they are still working on a solution, so that is still quite unfortunate.

Habou
Conversationalist

Hello,

 

I think Planning and knowledge have the biggest impact. In many cases, Teams need capacity strengthening!

 

MakaraMEAS
Getting noticed

Here are some Meraki network challenges that impact the team or implementer:
1. Troubleshooting, Support:
  • Pro: live troubleshooting, back-end support, requesting some back-end features by opening a case...
  • Con: need to call support and it takes a long time to get a response or an answer by phone, email takes a long time to get a response, difficult to engage for live troubleshooting or Webex, sometimes no workaround solutions...
2. Planning:
  • Layer 7 (applications are still limited compared to Cisco FTD or other vendors)
  • By default the NGFW firewall rule is allow, so the network view may not get all logs?
3. Report: detailed and easy-to-generate reports: visibility, security events, users, applications, etc.

M.MAKARA
OCH
Conversationalist

Hello,

 

Wireless health !!!!

 

Before, with our « old » WLC, we had to wait for users to come back to us about association, authentication, DHCP, DNS, etc. failures…

 

Now, in real time, we get alerts, reports, who's facing a problem, where, etc. It's insane 🙂 No more time spent analyzing logs… Problems are analyzed for us and Meraki gives advice on how to troubleshoot. In case of an authentication issue, Meraki tells us to check the PSK on the client or check RADIUS… Awesome 🙂

 

Less time lost analyzing logs means more time to fix problems

starabishi
Conversationalist

Hi all, I believe the challenges reside in a few things that are kind of interrelated

1) Fast-changing IT model: work from home, the pandemic and other factors => Meraki's response is SD-WAN and other related technology such as VPN

2) Lack of documentation and automation: with fast-growing companies adding new sites => the Meraki portal's built-in topology and automation (API and templates) is the correct way to go

3) Security is at the top and is the most critical part of this => Meraki's continuous updates and security model (threat protection and content filtering), coupled with Cisco Umbrella, are the icing on the cake

In summary, going Meraki will solve most of the current IT challenges 

EJN
A model citizen

What are the top network challenges (e.g. troubleshooting, planning, support, etc.) that currently have the biggest impact on your team's capacity? 

 

Optimization. How to check if all devices are optimized? Wireless signal coverage; channels/bands; switch settings; port settings. Settings might be working OK, but how to tell if they are optimal?

 

And as a followup, if you could wave a magic wand and automate/accelerate parts of your network operations today, what would you wish for? 

 

Client traffic. Most traffic, least traffic, what traffic it is. In an easy-to-understand view. Top 25 clients, top WAP, top switch, etc. Same for bottom/least. For the traffic, show me in plain English what websites are creating the traffic.
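
If per-client usage numbers can be exported, a top/bottom list like the one wished for above is only a few lines of Python. This is a sketch under assumptions: the `usage` mapping (client name to sent and received kilobytes) and the `rank_clients` function are made up for illustration and are not a dashboard export format.

```python
import heapq


def rank_clients(usage, n=25):
    """Return (top, bottom) lists of (client, total_kb), each at most n long.

    usage: dict mapping client name -> (sent_kb, received_kb).
    """
    totals = {name: sent + received for name, (sent, received) in usage.items()}
    top = heapq.nlargest(n, totals.items(), key=lambda item: item[1])
    bottom = heapq.nsmallest(n, totals.items(), key=lambda item: item[1])
    return top, bottom


usage = {
    "laptop-1": (500, 1500),
    "phone-2": (10, 40),
    "tv-3": (100, 900),
}
top, bottom = rank_clients(usage, n=2)
print("Most traffic:", top)
print("Least traffic:", bottom)
```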

Esteban J Nunez
School and Church
K-12 Education
Inderdeep
Kind of a big deal

@GrantShirk : Well, I think the issue is multivendor environments and their integrations: multiple security tools and cyberattack threats, for which network engineers need to be alert across multiple domains. The other issue is the skill gap, and the support from the multiple vendors as well. 

Regards/Inder
Cisco IT Blogs awarded in 2020 & 2021
www.thenetworkdna.com
BGeorge
Conversationalist

Vendor lock-in.

 

Yesterday’s reasonable decision means today your architecture is built around the assumptions and capabilities of yesterday’s vendor. Bringing in the best solutions from today’s vendors means figuring out how to interoperate, integrate, and support multiple solutions, or figuring out how to tear out all of the old equipment without tearing down the entire network at the same time.

BlakeRichardson
Kind of a big deal

The biggest challenge facing my department is software that was obtained without using a proper procurement process therefore we have many systems that don't communicate that require triple handling of information in some cases. This coupled with the rate of technological change and the fact staffing hasn't increased in the 13 years I've worked here doesn't help. 

 

It sometimes feels like we are being given bigger faster steam trains to operate but our coal shovels still remain the same size. 

MK2
Building a reputation

Good morning,

 

our biggest problem is the following:
more and more IT systems and fewer qualified personnel.
everything needs to get connected, even the trash can :D.

 

this trend is quite frustrating because you cannot clone IT guys.

 

 

But one tool that I found that can assist with this mess is MERAKI. There are so many tools inside MERAKI that I could cry at how much it has helped us out.
For example monitoring, it's there, nothing to install or to setup.

The "old" way was to setup Nagios or a PRTG.

 

Our biggest challenge is support and lifecycle management, mostly changing a switchport because new devices get provisioned or moved. But here we want to try an API-based solution for the helpdesk, so that unskilled personnel can change a VLAN with a minimum of knowledge.

I wish there were some web interface for this "easy" work without the need to program an API tool.
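
A helpdesk tool along those lines could start from something like the Python sketch below, which only lets a port be moved to a VLAN from an approved list and builds the API request without sending it. The `ALLOWED_VLANS` table and `build_vlan_change` function are hypothetical, and the base URL, endpoint path, and header name reflect my understanding of the Dashboard API v1, so verify them against the current API documentation before use.

```python
import json
import urllib.request

API_BASE = "https://api.meraki.com/api/v1"  # assumed Dashboard API v1 base URL
ALLOWED_VLANS = {10: "Office", 20: "Printers", 30: "VoIP"}  # hypothetical helpdesk list


def build_vlan_change(api_key, serial, port_id, vlan):
    """Build (but do not send) a request that moves one switch port to `vlan`.

    Raises ValueError if the VLAN is not on the helpdesk allow-list.
    """
    if vlan not in ALLOWED_VLANS:
        raise ValueError(f"VLAN {vlan} is not on the helpdesk allow-list")
    body = json.dumps({"vlan": vlan}).encode("utf-8")
    return urllib.request.Request(
        # Assumed endpoint for updating a single switch port
        url=f"{API_BASE}/devices/{serial}/switch/ports/{port_id}",
        data=body,
        method="PUT",
        headers={
            "X-Cisco-Meraki-API-Key": api_key,  # assumed auth header name
            "Content-Type": "application/json",
        },
    )


# Placeholder key and serial; nothing is sent over the network here
request = build_vlan_change("example-key", "Q2XX-XXXX-XXXX", 12, 20)
print(request.get_method(), request.full_url)
```

Keeping the allow-list check in front of the request is the point of the design: the helpdesk can only pick from VLANs someone senior has already approved. Sending the request would be a separate step with `urllib.request.urlopen(request)`.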

redsector
Head in the Cloud

I like the alarm messages when a port goes down and up.

But...

Please not for all ports in the network! 

I don´t want to know when users are turning their computers on or off.

But I need this for only a few important LAN ports, e.g. for servers, printers and so on.

Is there a possibility to get an up/down alarm message for specific MS switch network ports?

cmr
Kind of a big deal

@redsector you can tag ports and then only alert on those tagged ports, i.e. ports connected to APs

redsector
Head in the Cloud

@cmr 

 

No, I can´t put tags to this error message.

redsector_0-1634024310726.png

 

cmr
Kind of a big deal

@redsector if you add tags to the switch ports then they appear as choices in the menu:

Screenshot_20211012-084306_Chrome.jpg

Screenshot_20211012-084521_Chrome.jpg

redsector
Head in the Cloud

@cmr Thank you so much. That´s it. I didn´t see it before.
Majo
Comes here often

there is no big challenge with Meraki devices in our environment at this moment, so nothing special to share with you, they just work as expected (or work as designed, as is sometimes said)

have a nice day to all of you

redsector
Head in the Cloud

I don´t know if this is possible:

All dashboard admins are logged out after a few minutes, and they have a 2FA for security reasons. That is a good thing.

But:

I need a monitoring user for our big monitoring-screen, read only, view only, but without auto logoff and without 2FA.

Can we mix admins with high security and monitoring-user(s) without high security?

 

PhilipDAth
Kind of a big deal

>All dashboard admins are logged out after a few minutes, and they have a 2FA for security reasons. That a good 

 

@redsector , check out my hack for this.

 

https://community.meraki.com/t5/Managed-Services/Need-Dynamic-MSP-Dashboard-Monitoring/m-p/7355/high... 

redsector
Head in the Cloud

@PhilipDAth

Sorry, this link is not allowed for me.

 

redsector_0-1633943518412.png

 

redsector
Head in the Cloud

I have got several networks to manage, and some of the networks are mixed up with Cisco "classic" products and more and more Meraki products.

And sometimes there is a STP issue. 

What I need is a better STP inspection feature. I need to analyze where the spanning-tree issue is. 

For example, we have stacked Meraki switches connected to Cisco "classic" distribution switch stacks.

And each time the Meraki switches start up after a firmware upgrade, I get STP errors.

andrewperry
Here to help

Grant access for vpn clients using both company's connectivity lines (main and backup ones).

MMoss
Building a reputation

I'm scared to mention it because I don't want to jinx it. My network has been running pretty smooth the last months with most issues being the result of endpoints going out like phones. With 22 locations plus remote workers I've been pretty fortunate.

GregErnest
Here to help

MS switches are only allowed 128 ACL rules, and those rules do not support port ranges.  However, port ranges are allowed in Group Policies and on MR APs.

 

This is stopping us from fully blocking key subnets off from the rest of the network.  It just cannot be done for us with the above limitation.
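
To show why the two limits bite together, here is a small Python sketch that expands a port range into one rule per port and checks the result against the 128-rule figure quoted above. The rule dictionary layout and the function names are hypothetical, not the dashboard's ACL format.

```python
MAX_ACL_RULES = 128  # limit quoted in the post above


def expand_port_range(base_rule, start, end):
    """Expand one rule with a port range into one rule per destination port."""
    if not (0 < start <= end <= 65535):
        raise ValueError("port range must satisfy 0 < start <= end <= 65535")
    return [dict(base_rule, dst_port=port) for port in range(start, end + 1)]


def fits_limit(rules, limit=MAX_ACL_RULES):
    """True if the rule list is within the ACL rule limit."""
    return len(rules) <= limit


base = {"policy": "deny", "dst_cidr": "10.1.0.0/24", "protocol": "tcp"}
small = expand_port_range(base, 8000, 8010)   # 11 rules
large = expand_port_range(base, 1000, 1200)   # 201 rules
print(len(small), fits_limit(small))
print(len(large), fits_limit(large))
```

A single range of a couple of hundred ports already exceeds the limit on its own, before any other rules are counted.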

mahrsmusic
Conversationalist

Hello!

       This is my very first post and it's sure to get me in trouble!    😉

 

       Our biggest challenge right now is just getting the equipment! We became a Meraki shop a few years ago and moved away from Cisco equipment when we saw how Meraki's cloud management solution could transform the way we manage our customers' networks. It's really saved us! Our sales rep, Patrick, I'm sure is getting beat up quite a bit lately over some of these 90 day lead times.    😑

       When I helped our organization to push us forward to the Meraki platform I knew that since it was a Cisco-backed company we were in good hands. I know the chip shortage is affecting the entire world but I'm hoping that Cisco can work its magic to keep the Meraki shipments flowing!!!   😁

 

Thank you,

David M.

enkompas

DAVID J MEYERS
DarrenOC
Kind of a big deal

Hi @mahrsmusic - the lead times for Meraki whilst currently poor are a big improvement on some of the legacy Cisco stock.  I've recently quoted some Cat 9k projects with lead times of 135+ days.

Darren OConnor | doconnor@resalire.co.uk
https://www.linkedin.com/in/darrenoconnor/

I'm not an employee of Cisco/Meraki. My posts are based on Meraki best practice and what has worked for me in the field.
mahrsmusic
Conversationalist

Hi @DarrenOC ,

 

I appreciate the info, but between 90 - 135 days I don't see a big difference there. It's still several MONTHS away.

DAVID J MEYERS
cmr
Kind of a big deal

Quite a few Meraki items (lower end PoE switches for instance) have out-done the mother ship and gone straight to 999 days now... 😱

MM_GT
Conversationalist

in the simplicity of managing Meraki networks, where there is time to create new projects

 

would you wish for? More reporting 

delfuego
Getting noticed

Meraki is worth billions of dollars and you are offering us socks? Up yours Meraki, I can buy my own socks.

DarrenOC
Kind of a big deal

I personally think Meraki need to hit pause on their development side and get back to basics in terms of bolstering their support capability and also looking at core stability.

 

Too many posts on here recently with the focus centred on shocking support times.

Darren OConnor | doconnor@resalire.co.uk
https://www.linkedin.com/in/darrenoconnor/

I'm not an employee of Cisco/Meraki. My posts are based on Meraki best practice and what has worked for me in the field.