MX "Device Utilization" vs CPU utilisation

Dylan
Just browsing


Hi,

 

Has anyone here seen high "Device Utilization" in the Organisation > Summary Report, especially with the MX400? What has your experience been?

 

We have a pair of MX400s which are used only as wireless concentrators, and the "Device Utilization" graph has grown to a consistent 85-90%. We have also been seeing unwanted failover events (VRRP failovers) and obviously suspect that CPU usage is causing them.

 

BUT TAC has investigated and advised that the actual device load is not very high and that we should not worry about it. They provided the following for our CPU load.

> Load 1.24(1 min) 1.17(5 min) 1.06(15 min)

"Since it is a quad-core CPU, if the load reaches 4.00 it reaches 100%, which would cause some performance issues. However, the actual load on primary is just around 1.00, so it should not be a problem at this moment."

 

The Meraki docs on Load monitoring say the device utilisation is "calculated based upon the CPU utilization of the MX and it's traffic load." Why is there such a disconnect between the Utilisation graph and the CPU load that Meraki TAC can see? Should we just ignore the utilisation graph? Am I missing something?

PhilipDAth
Kind of a big deal

I've just looked at a pair of what I would call "busy" MX400s. They currently show as having 238 VPNs, and the "Organisation/Summary Report" page shows the load as being 67%. They are running 13.27 (you would use 13.28 now if you were going to upgrade).

 

The output you are seeing from support appears to be from the Linux "uptime" command (or the output from 'cat /proc/loadavg').

Easy explanation: https://www.itworld.com/article/2833435/hardware/how-to-interpret-cpu-load-on-linux.html

More complex explanation: http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html
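
If you want to compare against a Linux box you control (you cannot run this on the MX itself), here is a minimal sketch that parses the /proc/loadavg format. The function name and the sample string are made up for illustration; only the first three values match what TAC quoted.

```python
import os

def parse_loadavg(text: str) -> tuple:
    """Parse the contents of /proc/loadavg.

    The file holds: 1, 5 and 15 minute load averages, then
    runnable/total scheduling entities, then the most recent PID.
    """
    fields = text.split()
    one, five, fifteen = (float(x) for x in fields[:3])
    runnable, total = (int(x) for x in fields[3].split("/"))
    return one, five, fifteen, runnable, total

# Hypothetical sample: the last two fields are invented for this example.
sample = "1.24 1.17 1.06 2/345 12345"
print(parse_loadavg(sample))

# On a real Linux host, read the live values instead.
if os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print(parse_loadavg(f.read()))
```

Note that the Linux load average counts tasks that are runnable or in uninterruptible sleep, so it is not the same thing as a CPU percentage.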

 

You would really need to see these stats at the time you are getting your VRRP events - but it would have to be pretty bad to cause a VRRP failover.  I find it difficult to believe the CPU load would cause a VRRP failover.  The Meraki hello timer is 300ms, and it needs three hellos to be missed - so the MX CPU would have to be so busy that it could not service the VRRP traffic for 900ms (that is almost a second!).
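
To make that arithmetic concrete, here is a small sketch using the timer values I quoted above (300 ms hello, three missed hellos). The function names and the gap lists are hypothetical, and standard VRRP also adds a small skew time to the master-down interval, which this ignores.

```python
def failover_window_ms(hello_interval_ms: int, missed_hellos: int) -> int:
    """How long the spare waits without hearing a hello before taking over."""
    return hello_interval_ms * missed_hellos

def would_fail_over(gaps_ms, hello_interval_ms=300, missed_hellos=3) -> bool:
    """True if any gap between received hellos reaches the failover window."""
    window = failover_window_ms(hello_interval_ms, missed_hellos)
    return any(gap >= window for gap in gaps_ms)

print(failover_window_ms(300, 3))             # 900
print(would_fail_over([300, 310, 650, 300]))  # False: a late hello, but under 900 ms
print(would_fail_over([300, 950, 300]))       # True: one gap of 950 ms
```

So a busy CPU delaying a hello by a few hundred milliseconds would not be enough on its own; the VRRP traffic has to go unserviced for the whole window.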

 

 

It is hard to talk about the Linux load values without knowing more about the internals of the MX OS. For example, if there was a single-threaded process doing most of the work then having 4 CPU cores would not help, and a load average of 1 would not be so good. But assuming there are lots of processes running and splitting the load across the CPU cores, then a load average of 1 is not that bad.
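
A rough way to see the difference between those two cases is to divide the load average by the number of cores the work can actually spread across. This is a simplification under stated assumptions: `busy_threads` is a hypothetical parameter, and the real process layout on the MX is not known.

```python
def load_per_usable_core(load_avg: float, cores: int, busy_threads: int) -> float:
    """Load average divided by the cores the workload can actually use.

    busy_threads is a hypothetical count of threads doing the real work.
    """
    if cores < 1 or busy_threads < 1:
        raise ValueError("cores and busy_threads must be at least 1")
    usable = min(cores, busy_threads)
    return load_avg / usable

tac_load = 1.24  # 1-minute figure quoted by TAC
for threads in (1, 4):
    ratio = load_per_usable_core(tac_load, cores=4, busy_threads=threads)
    print(f"{threads} busy thread(s): {ratio:.0%} of usable capacity")
```

With one busy thread the quoted load works out to 124% of the single core it can use, while spread over four cores it is 31%, which is the reading TAC gave.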
