Org-level endpoint for Appliance Performance Score

Prodrick
Building a reputation

What’s the likelihood that we’ll see an org-level endpoint for Appliance Performance Score in the next 12 months? Having to query each active node by serial number uses up a lot of my API call budget in bigger customer environments, even if I only poll it every 15 minutes.

 

Current endpoint:

 

https://api.meraki.com/api/v1/devices/:serial/appliance/performance
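To put rough numbers on the polling cost, here is a small sketch. The function names and the 500-appliance figure are made up for illustration; only the URL template comes from the post above.

```python
BASE_URL = "https://api.meraki.com/api/v1"


def performance_url(serial: str) -> str:
    """Build the per-device URL quoted above for one appliance serial."""
    return f"{BASE_URL}/devices/{serial}/appliance/performance"


def requests_per_day(appliance_count: int, interval_minutes: int) -> int:
    """Requests per day when every appliance is polled once per interval."""
    polls_per_day = (24 * 60) // interval_minutes
    return appliance_count * polls_per_day


if __name__ == "__main__":
    # Hypothetical estate: 500 active appliances polled every 15 minutes.
    print(requests_per_day(500, 15))
    # Placeholder serial, not a real device.
    print(performance_url("Q2XX-XXXX-XXXX"))
```

The count grows linearly with the number of appliances, which is why an org-level call would presumably help most in the larger environments.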

8 Replies
ww
Kind of a big deal

Yeah, that would be great to have.

For now, there is one for the top 10 appliances by utilization:

https://developer.cisco.com/meraki/api/get-organization-summary-top-appliances-by-utilization/

John-K
Meraki Employee

This operation doesn't solve every problem (no operation does, ha!), but it should solve many.

 

Your top appliances by utilization are the most important ones to monitor, most of the time. Appliances trending towards 100% might be candidates for hardware upgrades. And if an appliance isn't in the top 10 by utilization for that interval, then its average utilization is no higher than that of the least utilized appliance in the response. If the least utilized appliance in the response is normally pretty low (e.g. under 50%), then it's pretty unlikely that hardware utilization is going to be a concern for any of the other sites.
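As a rough sketch of that reasoning: the lowest value in the top 10 response acts as an upper bound for every appliance that is not listed. The response shape used here (a list of objects with `utilization.average.percentage`) is an assumption on my part, so check it against the API documentation; the sample data is invented.

```python
from typing import Optional


def upper_bound_for_others(top_appliances: list) -> Optional[float]:
    """Return the lowest average utilization in the response.

    Any appliance missing from the top list cannot average higher than
    this value for the same interval. Returns None for an empty list.
    """
    percentages = [
        entry["utilization"]["average"]["percentage"]  # assumed field names
        for entry in top_appliances
    ]
    return min(percentages) if percentages else None


if __name__ == "__main__":
    # Invented sample, trimmed to three entries for brevity.
    sample = [
        {"serial": "AAAA", "utilization": {"average": {"percentage": 55.0}}},
        {"serial": "BBBB", "utilization": {"average": {"percentage": 42.5}}},
        {"serial": "CCCC", "utilization": {"average": {"percentage": 31.2}}},
    ]
    print(upper_bound_for_others(sample))
```

If the response holds fewer than 10 entries, the organization presumably has no other appliances to bound, so the value is only interesting for larger organizations.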

 

But you don't need to take my word for it! The endpoint supports the t0 and t1 parameters, so you can slice the data (e.g. 1 day at a time) and build your own graphs. You can use those stats to establish a performance baseline (e.g. a typical utilization percentage) for those top 10.
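One way to sketch the slicing: generate one (t0, t1) pair per day, request each window, then average the daily figures into a baseline. The helper names and the timestamp format are my assumptions, and the HTTP call itself is left out.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed ISO 8601 form for t0/t1


def daily_windows(start: datetime, days: int):
    """Yield (t0, t1) timestamp strings, one pair per consecutive day."""
    for offset in range(days):
        t0 = start + timedelta(days=offset)
        t1 = t0 + timedelta(days=1)
        yield t0.strftime(TIMESTAMP_FORMAT), t1.strftime(TIMESTAMP_FORMAT)


def baseline(daily_percentages: list) -> float:
    """Average the per-day utilization figures into a single baseline."""
    return mean(daily_percentages)


if __name__ == "__main__":
    first_day = datetime(2024, 1, 1, tzinfo=timezone.utc)
    for t0, t1 in daily_windows(first_day, 3):
        # Each pair would go into the request as the t0 and t1 parameters.
        print(t0, t1)
    # Invented daily averages for one appliance.
    print(baseline([40.0, 50.0, 60.0]))
```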

 

If you measure this over periods of time where you have known real-world usage spikes, then that will give you some stats about how usage spikes can impact the appliance utilization. For example, imagine that the norm for a given office is 50% appliance utilization, but you've measured that a big team on-site event caused the utilization to spike to 75%. It'd be reasonable to predict that subsequent on-site events for that office on that hardware could trigger a utilization increase of about 25 percentage points.

 

Generally, 75% isn't high enough to be concerning in most scenarios, especially if it's a temporary spike. On the other hand, if your "normal" utilization for that office creeps up to 75%, then you might predict that an onsite event could push the utilization (for that site and that hardware) to 100%. Then, that might be an actionable problem and/or a sign that it's time to upgrade the MX hardware at that location.
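The arithmetic in the last two paragraphs can be written down as a tiny sketch. It assumes the spike adds the same number of percentage points whatever the baseline is, which is a simplification, and the function names are made up.

```python
def spike_delta(baseline_pct: float, peak_pct: float) -> float:
    """Percentage points added by a known event (e.g. 50 -> 75 gives 25)."""
    return peak_pct - baseline_pct


def projected_peak(current_baseline_pct: float, delta_points: float) -> float:
    """Predicted peak for a similar event, capped at 100%."""
    return min(100.0, current_baseline_pct + delta_points)


if __name__ == "__main__":
    delta = spike_delta(50.0, 75.0)
    # Same office, same hardware, but the normal load has crept up to 75%.
    print(projected_peak(75.0, delta))
```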

 

Separately, if you correlate these stats with real-world user experience data (e.g. support cases for "slow Internet" or "slow VPN") then you can build some personalized metrics about which utilization percentages might be of concern to your users for specific models. For example, if users at a site are suddenly opening lots of support tickets about slow WAN/VPN etc., and you see that the utilization is 95%, then there might be a hardware bottleneck.
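A very simple version of that correlation, as a sketch: pair each day's utilization with the number of "slow Internet" or "slow VPN" tickets, then look for the lowest utilization at which tickets piled up. The data and the ticket limit are invented, and this only hints at a bottleneck; it does not prove one.

```python
from typing import Optional


def concern_threshold(daily_samples: list, ticket_limit: int) -> Optional[float]:
    """Lowest utilization seen on a day with at least `ticket_limit` tickets.

    `daily_samples` holds (utilization_pct, ticket_count) pairs.
    Returns None when no day reached the ticket limit.
    """
    busy_days = [
        utilization
        for utilization, tickets in daily_samples
        if tickets >= ticket_limit
    ]
    return min(busy_days) if busy_days else None


if __name__ == "__main__":
    # Invented samples for one site and one appliance model.
    samples = [(40.0, 0), (62.0, 1), (88.0, 6), (95.0, 9)]
    print(concern_threshold(samples, ticket_limit=5))
```

Keeping one threshold per appliance model would match the "for specific models" idea above.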

 

All of this is to say that, if your top 10 by utilization are typically pretty low, then additional datapoints about the other appliances are probably not much of a concern. And separately, if your top 10 are usually pretty high, then you probably have enough data already to start addressing potential hardware bottlenecks that might be causing user experience issues.

GreenMan
Meraki Employee

I would suggest you get in touch with your Meraki account team so they can raise a feature request on your behalf.

Prodrick
Building a reputation

Thanks. I have the ability to submit a feature request. I prefer to post here first, as it gives others visibility and the ability to suggest alternative solutions.

RaphaelL
Kind of a big deal

Pretty sure we also made that request a long time ago.

But to be honest, it is not really useful, since the values returned are always 1 hour late. You can't closely monitor the device utilization.

Prodrick
Building a reputation

We monitor metrics like this for trending, capacity planning, and to alert if it stays at 100. Additionally, we monitor the syslog for the “why”.
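For the "stays at 100" alert, a minimal sketch of the check, assuming the polled scores are kept in time order. The function name and the choice of four consecutive samples are only examples.

```python
def stays_at_ceiling(samples: list, min_consecutive: int, ceiling: float = 100) -> bool:
    """True if `min_consecutive` samples in a row are at or above `ceiling`."""
    run_length = 0
    for value in samples:
        # Extend the current run, or reset it when the value drops below the ceiling.
        run_length = run_length + 1 if value >= ceiling else 0
        if run_length >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    # With 15-minute polling, 4 consecutive samples cover roughly an hour.
    history = [85, 100, 100, 100, 100, 90]
    print(stays_at_ceiling(history, min_consecutive=4))
```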

RaphaelL
Kind of a big deal

Trending and capacity planning make sense.

 

But if my hubs are reaching 100%, I would like to know right now, not 1 hour later. By then it's already too late: everyone has suffered.

Prodrick
Building a reputation

I'm looking to see if this is something we can identify in the syslog to alert on.
