Hi Full-stack crew - wanted to get your take on an analytics and capacity question. We're starting a conversation on the blog this week about practical applications of AI and intelligence, and wanted to ground it in the realities you face every day.
What are the top network challenges (e.g. troubleshooting, planning, support, etc.) that currently have the biggest impact on your team's capacity?
And as a followup, if you could wave a magic wand and automate/accelerate parts of your network operations today, what would you wish for?
[Edit on 9/22]
Our first blog post on the topic is live.
Thanks for kicking off the discussion! We'll be digging into the other responses, but on fault detection and diagnosis - are there particular root causes that are particularly pernicious to track down?
>are there particular root causes that are particularly pernicious to track down
I find that tracking down access problems caused by firewall rules can take a while, because the rules can be configured in so many places.
>are there particular root causes that are particularly pernicious to track down
I agree - there are the layer 7 rules, layer 3 rules, content category/URL filtering, and DNS filtering. If it's only blocked because of DNS, there is a page that is displayed to the user. The content filter may log the event, unless it's "N events were dropped".
Layer 3 and Layer 7 are really the harder ones. In our case it's typically geo filtering. The recent MaxMind/Google issue has taught us that the geo-filtering comes from MaxMind not BrightCloud. Once we have determined the source, getting it to go through is the next challenge.
Other firewalls let you set allow rules at layer 7. I understand that will not be possible in the firewall component without a substantial rewrite.
Meraki also has some of these ...
Packet path visualisation. Sometimes I get a client saying "x" is not working. You might need to look through:
* Auto VPN firewall rules
* MX VPN firewall rules
* Layer 7 rules
* Content rules
* Host-based group policies
* VLAN based group policies
* Switch ACLs
* MR ACLs
* Route tables
Probably something else I have forgotten.
Maybe it would be nice to click on a client, type in what it is trying to access, and say "Visualise" and watch it move through all the different engines to see where it might be blocked.
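A rough sketch of what such a "Visualise" walk could look like, purely as an illustration: each engine is modelled as an ordered rule list, and a flow is traced through them until something denies it. The `Rule`, `Stage` and `trace` names, the stage order, and the implicit-allow default are all assumptions made for this example; they do not describe how the dashboard or the devices actually evaluate traffic.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional, Tuple


@dataclass
class Rule:
    action: str                      # "allow" or "deny"
    dst_cidr: str                    # destination network, e.g. "10.0.0.0/8"
    dst_port: Optional[int] = None   # None matches any port

    def matches(self, dst_ip: str, dst_port: int) -> bool:
        in_net = ip_address(dst_ip) in ip_network(self.dst_cidr)
        return in_net and self.dst_port in (None, dst_port)


@dataclass
class Stage:
    name: str
    rules: List[Rule]

    def verdict(self, dst_ip: str, dst_port: int) -> str:
        # First matching rule wins within a stage.
        for rule in self.rules:
            if rule.matches(dst_ip, dst_port):
                return rule.action
        return "allow"  # assumption: no match means the stage lets it through


def trace(stages: List[Stage], dst_ip: str, dst_port: int) -> List[Tuple[str, str]]:
    """Walk the flow through each stage in order, stopping at the first deny."""
    path = []
    for stage in stages:
        result = stage.verdict(dst_ip, dst_port)
        path.append((stage.name, result))
        if result == "deny":
            break
    return path


# Hypothetical stage order and rules, for illustration only.
stages = [
    Stage("Switch ACLs", [Rule("allow", "0.0.0.0/0")]),
    Stage("VLAN group policy", []),
    Stage("MX L3 firewall", [Rule("deny", "203.0.113.0/24", 443)]),
    Stage("Layer 7 rules", []),
]

for name, result in trace(stages, "203.0.113.10", 443):
    print(f"{name}: {result}")
```

The useful part is the returned path: it shows every engine the flow passed and names the one that stopped it, which is the answer you usually spend the time hunting for.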
Something that can analyse blocked traffic on MX firewall rules would be nice (in general) and a bit easier than the last one.
Every firewall I have ever used lets you watch, in real-time, traffic flows and what is being blocked. To do this on an MX you have to use a third-party syslog server.
It would be nice to have something where you can say "start monitoring", do a test, and then have it return an "expert" analysis saying what it saw that was allowed, and what got blocked.
Like syslog or real-time monitoring, but made Meraki simple.
Real Meraki simple would be proposing the firewall rule changes, and then a button to apply that recommendation.
This is the scenario I'm in now.
We just standardized all MXs worldwide with standard L3 firewall rules on all VLANs (thanks Python and the API!), but now we have a Deny All rule as the rule above the built-in Allow All rule.
I would love it if I could quickly see what is being blocked by this Deny All rule right in the dashboard, or even via an API script that I could pump to an Excel spreadsheet.
Now I have to start a new project to find and configure a Syslog server to monitor blocked traffic whenever someone claims to have an issue with the firewall.
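Once a syslog server is collecting flow logs, the "pump it to a spreadsheet" part could be sketched roughly like this. The sample lines below are made up, and the assumption that each flow line carries `key=value` fields followed by a trailing `pattern: ...` verdict should be checked against what your MX firmware actually emits; `parse_flow` and `denied_flows_csv` are hypothetical helper names.

```python
import csv
import io
import re

KEY_VALUE = re.compile(r"(\w+)=(\S+)")
PATTERN = re.compile(r"pattern:\s*(.+)$")

# Made-up sample lines; the real log format is an assumption here.
SAMPLE = [
    "1700000000.0 MX flows src=10.0.1.5 dst=8.8.8.8 protocol=udp sport=50000 dport=53 pattern: allow all",
    "1700000001.0 MX flows src=10.0.1.9 dst=203.0.113.7 protocol=tcp sport=50001 dport=445 pattern: deny all",
]


def parse_flow(line):
    """Return the key=value fields plus the verdict, or None if not a flow line."""
    verdict = PATTERN.search(line)
    if not verdict:
        return None
    fields = dict(KEY_VALUE.findall(line))
    fields["pattern"] = verdict.group(1).strip().lower()
    return fields


def denied_flows_csv(lines):
    """Build CSV text (openable in Excel) containing only the denied flows."""
    columns = ["src", "dst", "protocol", "dport", "pattern"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for line in lines:
        flow = parse_flow(line)
        if flow and flow["pattern"].startswith("deny"):
            writer.writerow(flow)
    return out.getvalue()


print(denied_flows_csv(SAMPLE))
```

Writing the result of `denied_flows_csv` to a `.csv` file gives something Excel opens directly.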
As networks increase in size, administering the network configuration increases in difficulty. Devices can conflict with each other. It becomes challenging to keep the rules in firewalls and other devices up to date, and manually applying policies leads to errors and discrepancies.
An AI-controlled router could change DNS, close ports, or dump cache on the fly. It could help identify threats based on real-time security algorithm updates from a worldwide threat matrix. Any time capacity is reached, even for a millisecond, it could reach out for help and direction using a hive-mind technology, possibly even autonomously spinning up virtual routers to deal with a threat by dumping traffic like an off-ramp, or to provide additional bandwidth as needed.
To review each user's usage of the network, they should revert to per-client usage of the network. Right now it's just general.
I have to do a packet capture to know what they are browsing. At the school level this is very necessary.
We are VSAT providers and our customers are always traveling on the water trying to finish their job with very strict timelines.
For us, an issue on the boat while they are in the middle of the ocean, with no network coverage and relying only on VSAT Internet and VoIP, is what makes troubleshooting more challenging.
The longer we take to resolve the issue, the bigger the impact is for the customer, because they have to communicate constantly with offshore using email or speaking on the phone.
The biggest challenges to troubleshooting are:
I think the biggest challenge is having the manpower available to troubleshoot the network quickly and easily. Also being proactive in identifying and resolving user issues before they turn into a dumpster fire.
There are a couple, but the one that comes up the most is that we have had multiple requests to schedule a reboot for some of our MXs, and it would be nice to be able to schedule a reboot of one, much like we do with firmware upgrades. The only other thing I can think of is the monitoring in place. We get alerts to the point where it is just white noise that no one pays attention to anymore. We have it set to send an email to a DL if the VPN connection goes up or goes down, but there is no option for a time threshold. For example we get this quite often:
At 07:06 AM CDT on Oct 8, the site-to-site VPN connection to XXXXXXXX went down.
At 07:07 AM CDT on Oct 8, the site-to-site VPN connection to XXXXXXXX came up.
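A minimal sketch of the kind of time threshold being asked for, applied after the fact to a list of up/down events: an outage is only reported if the tunnel stayed down for at least the threshold. The event tuple format, the peer names, the year, and the `sustained_outages` helper are assumptions for illustration; this is not an existing dashboard feature.

```python
from datetime import datetime, timedelta


def sustained_outages(events, threshold, now):
    """Return (peer, went_down, came_up) for outages lasting at least `threshold`.

    `events` is a time-sorted list of (timestamp, peer, state) tuples where
    state is "down" or "up". Outages still open at `now` are reported with
    came_up set to None once they exceed the threshold.
    """
    down_since = {}
    outages = []
    for when, peer, state in events:
        if state == "down":
            down_since.setdefault(peer, when)   # keep the first "down"
        elif state == "up" and peer in down_since:
            started = down_since.pop(peer)
            if when - started >= threshold:
                outages.append((peer, started, when))
    for peer, started in down_since.items():
        if now - started >= threshold:
            outages.append((peer, started, None))
    return outages


# Hypothetical events: site-a flaps for one minute, site-b is down for 15.
events = [
    (datetime(2021, 10, 8, 7, 6), "site-a", "down"),
    (datetime(2021, 10, 8, 7, 7), "site-a", "up"),
    (datetime(2021, 10, 8, 7, 10), "site-b", "down"),
    (datetime(2021, 10, 8, 7, 25), "site-b", "up"),
]

alerts = sustained_outages(events, timedelta(minutes=5), datetime(2021, 10, 8, 8, 0))
for peer, went_down, came_up in alerts:
    print(peer, went_down, came_up)
```

With a five-minute threshold, the one-minute flap shown above produces no alert at all, while the longer outage still does.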
Meraki is really an awesome product and I would recommend it to anyone.
The only challenge I've had with it is the reporting where you can't filter by SSID.
Working in a very big corporate/enterprise company, these types of reports are very crucial to execs.
With my magic wand, I would put in the top right corner of the Insight > WAN health page a Manual or Automatic (1 minute, for example) Refresh option, so I don't have to hit Last 2 hours every so often.
Take inspiration from the LogicMonitor application, the best monitoring tool so far, almost like the ThousandEyes application.
Thanks for the response Philip.
I saw that option but unfortunately it doesn't seem to function properly. When I filter by SSID we don't see any data, it's just blank, but if I have a look at "All SSIDs" then I see that we have a lot of data.
We've had a TAC case open since April, and the TAC engineer recently advised that they are still working on a solution, so that is still quite unfortunate.
Here are some Meraki network challenges that impact the team or implementer:
.Pro: Live troubleshooting, back-end support, requesting some back-end features by opening a case...
.Con: Need to call support, and it takes a long time for them to answer the phone; email takes a long time to get a response; difficult to engage for live troubleshooting or Webex; sometimes no workaround solutions...
.Layer 7 (applications are still limited compared to Cisco FTD or other vendors),
.By default the NGFW firewall rule is allow, so the network view may not get all logs?
3. Report: detailed and easy-to-generate reports: visibility, security events, users, applications, etc.
Wireless health !!!!
Before, with our « old » WLC, we had to wait for users to come back to us about association, authentication, DHCP, DNS, etc. failures…
Now, in real time, we get alerts, reports, who’s facing problems, where, etc. It’s insane 🙂 No more time spent analyzing logs… Problems are analyzed for us and Meraki gives advice on troubleshooting. In case of an authentication issue, Meraki indicates to check the PSK on the client or check RADIUS… Awesome 🙂
Less time lost analyzing logs means more time to fix problems.
Hi all, I believe the challenges reside in a few things that are kind of inter-related:
1) Fast-changing IT model: work from home, the pandemic and other factors => Meraki's response: SD-WAN and other related technology such as VPN
2) Lack of documentation and automation: with fast-growing companies adding new sites => the Meraki portal, with its built-in topology and automation (API and templates), is the correct way to go
3) Security is the top and most critical part of this => Meraki's continuous updates and security model (threat protection and content filtering), coupled with Cisco Umbrella, are the icing on the cake
In summary, going Meraki will solve most of the current IT challenges.
Optimization. How to check if all devices are optimized? Wireless signal coverage; channels/bands; switch settings; port settings. Settings might be working OK, but how to tell if they are optimal?
Client traffic. Most traffic, least traffic, what traffic it is. In an easy-to-understand view. Top 25 clients, top WAP, top switch, etc. Same for bottom/least. For the traffic, show me in plain English what websites are creating the traffic.
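As a small illustration of the top/bottom view described above, assuming per-client usage has already been exported as `(client, bytes)` records (the record format, the client names and the `rank_clients` helper are made up for this sketch):

```python
from collections import Counter


def rank_clients(records, n=25):
    """Sum bytes per client and return (top_n, bottom_n) lists of (client, bytes)."""
    totals = Counter()
    for client, nbytes in records:
        totals[client] += nbytes
    ranked = totals.most_common()        # highest usage first
    return ranked[:n], ranked[-n:][::-1]  # bottom list is lowest usage first


# Hypothetical usage records.
records = [
    ("laptop-1", 500),
    ("phone-7", 120),
    ("laptop-1", 300),
    ("printer-2", 5),
    ("tv-3", 900),
]

top, bottom = rank_clients(records, n=2)
print("Top:", top)        # Top: [('tv-3', 900), ('laptop-1', 800)]
print("Bottom:", bottom)  # Bottom: [('printer-2', 5), ('phone-7', 120)]
```

The same grouping works for access points or switches by swapping the key in each record.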
@GrantShirk: Well, I think the issue is multivendor environments and the integrations. There are multiple security tools and cyberattack threats, for which network engineers need to be alert across multiple domains. The other issue is the skill gap, and the support from multiple vendors as well.
Yesterday’s reasonable decision means today your architecture is built around the assumptions and capabilities of yesterday’s vendor. Bringing in the best solutions from today’s vendors means figuring out how to interoperate, integrate, and support multiple solutions, or figuring out how to tear out all of the old equipment without tearing down the entire network at the same time.
The biggest challenge facing my department is software that was obtained without using a proper procurement process therefore we have many systems that don't communicate that require triple handling of information in some cases. This coupled with the rate of technological change and the fact staffing hasn't increased in the 13 years I've worked here doesn't help.
It sometimes feels like we are being given bigger faster steam trains to operate but our coal shovels still remain the same size.
Our biggest problem is the following:
More and more IT systems and fewer qualified personnel.
Everything needs to get connected, even the trash can :D.
This trend is quite frustrating because you cannot clone IT guys.
But one tool that I found that can assist with this mess is MERAKI. There are so many tools inside MERAKI that I could cry at how much it helped us out.
For example monitoring: it's there, nothing to install or to set up.
The "old" way was to set up Nagios or PRTG.
Our biggest challenge is support and lifecycle management. Mostly changing a switchport because new devices get provisioned or moved. But here we want to try an API-based solution for the helpdesk, so that unskilled personnel can change a VLAN with a minimum of knowledge.
I wish there were some web interface for this "easy" work without the need to program an API tool.
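For what it's worth, the core of such a helpdesk tool can be quite small. The sketch below only builds the HTTP request and does not send it. It assumes the Dashboard API v1 switch port update endpoint (`PUT /devices/{serial}/switch/ports/{portId}` with a `vlan` field) and Bearer-token authentication, so please verify both against the current API documentation. The allowed-VLAN list, the placeholder serial, and the `build_vlan_change` helper are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.meraki.com/api/v1"   # assumed API base URL
ALLOWED_VLANS = {10, 20, 30}                 # hypothetical: VLANs the helpdesk may assign


def build_vlan_change(api_key, serial, port_id, vlan):
    """Build (but do not send) a request that sets the VLAN on one switch port."""
    if vlan not in ALLOWED_VLANS:
        raise ValueError(f"VLAN {vlan} is not on the helpdesk allow-list")
    url = f"{BASE_URL}/devices/{serial}/switch/ports/{port_id}"
    body = json.dumps({"vlan": vlan}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Placeholder serial and key; nothing is sent over the network here.
request = build_vlan_change("YOUR_API_KEY", "Q2XX-XXXX-XXXX", "12", 20)
print(request.get_method(), request.full_url)
```

Sending it would be a call to `urllib.request.urlopen(request)`. Keeping the allow-list check in front of it is what lets unskilled staff use the tool without being able to put a port on an arbitrary VLAN.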
I like the alarm messages when a port goes down and up.
Please not for all ports in the network!
I don't want to know when users are switching their computers on or off.
But I need this for only a few important LAN ports, e.g. for servers, printers and so on.
Is there a possibility to get an up/down alarm message for specific MS switch network ports?
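If the port events end up in your own tooling anyway (via syslog or webhooks, for instance), the filtering itself is straightforward. A minimal sketch, assuming each event is a dict with `serial`, `port` and `status` keys; that event shape, the serials, and the watch-list are all made up for illustration:

```python
# Hypothetical watch-list of (switch serial, port) pairs worth alerting on.
WATCHED_PORTS = {
    ("Q2XX-AAAA-0001", "1"),    # e.g. a server uplink
    ("Q2XX-AAAA-0001", "24"),   # e.g. a printer
}


def important_port_events(events, watched=WATCHED_PORTS):
    """Keep only the up/down events for ports on the watch-list."""
    return [e for e in events if (e["serial"], e["port"]) in watched]


# Made-up events: only the first one concerns a watched port.
events = [
    {"serial": "Q2XX-AAAA-0001", "port": "1", "status": "down"},
    {"serial": "Q2XX-AAAA-0001", "port": "7", "status": "down"},
    {"serial": "Q2XX-BBBB-0002", "port": "1", "status": "up"},
]

for event in important_port_events(events):
    print(event["serial"], event["port"], event["status"])
```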
There is no big challenge with Meraki devices in our environment at this moment, so nothing special to share with you. They just work as expected (or work as designed, as is sometimes said).
Have a nice day, all of you.
I don't know if this is possible:
All dashboard admins are logged out after a few minutes, and they have a 2FA for security reasons. That is a good thing.
I need a monitoring user for our big monitoring-screen, read only, view only, but without auto logoff and without 2FA.
Can we mix admins with high security and monitoring-user(s) without high security?
>All dashboard admins are logged out after a few minutes, and they have a 2FA for security reasons. That is a good thing.
@redsector , check out my hack for this.
I have got several networks to manage, and some of the networks are mixed up with Cisco "classic" products and more and more Meraki products.
And sometimes there is a STP issue.
What I need is a better STP inspection feature. I need to analyze where the spanning-tree issue is.
For example, we have stacked Meraki switches connected to Cisco "classic" distribution switch stacks.
And each time the Meraki switches start up after a firmware upgrade, I get STP errors.
I'm scared to mention it because I don't want to jinx it. My network has been running pretty smooth the last months with most issues being the result of endpoints going out like phones. With 22 locations plus remote workers I've been pretty fortunate.
MS switching is only allowed 128 ACL rules. MS ACLs also do not support port ranges. However, port ranges are allowed in Group Policies and on MR APs.
This is stopping us from fully blocking key subnets off from the rest of the network. It just cannot be done for us with the above limitation.
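To put numbers on why the two limits bite together: without port ranges, every port in a range needs its own rule, so even a modest range blows through 128 entries. A quick sketch, where the rule dict layout, the example subnet and port range, and the `expand_port_range` helper are assumptions for illustration:

```python
MAX_ACL_RULES = 128  # the limit quoted above


def expand_port_range(base_rule, first_port, last_port):
    """Expand one ranged rule into one rule per port (no range support)."""
    return [dict(base_rule, dst_port=port) for port in range(first_port, last_port + 1)]


# Hypothetical rule: block a key subnet on destination ports 1024-1200.
base_rule = {"policy": "deny", "protocol": "tcp", "dst_cidr": "10.20.0.0/24"}
rules = expand_port_range(base_rule, 1024, 1200)

print(len(rules), "rules needed;", "fits" if len(rules) <= MAX_ACL_RULES else "over the limit")
# 177 rules needed; over the limit
```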
This is my very first post and it's sure to get me in trouble! 😉
Our biggest challenge right now is just getting the equipment! We became a Meraki shop a few years ago and moved away from Cisco equipment when we saw how Meraki's cloud management solution could transform the way we manage our customers' networks. It's really saved us! Our sales rep, Patrick, I'm sure is getting beat up quite a bit lately over some of these 90 day lead times. 😑
When I helped our organization to push us forward to the Meraki platform I knew that since it was a Cisco-backed company we were in good hands. I know the chip shortage is affecting the entire world but I'm hoping that Cisco can work its magic to keep the Meraki shipments flowing!!! 😁
Hi @mahrsmusic - the lead times for Meraki whilst currently poor are a big improvement on some of the legacy Cisco stock. I've recently quoted some Cat 9k projects with lead times of 135+ days.
Hi @UCcert ,
I appreciate the info, but between 90 - 135 days I don't see a big difference there. It's still several MONTHS away.
Quite a few Meraki items (lower end PoE switches for instance) have out-done the mother ship and gone straight to 999 days now... 😱
I personally think Meraki need to hit pause on their development side and get back to basics in terms of bolstering their support capability and also looking at core stability.
Too many posts on here recently with the focus centred on shocking support times.