Hi All,
We are facing an unusual issue with our MX250. When business hours are about to start and the number of clients behind the MX250 increases, there is a sudden spike in device utilization, which leads to latency affecting all traffic passing through the MX. This latency impacts both WAN links and LAN-to-MX traffic. As a result, we also experience increased latency over Auto VPN traffic.
Everything was functioning normally until mid-September. To stabilize device utilization, we rebooted the MX250, which temporarily resolved the issue for about three days. Initially, we were running MX 18.211.2, and TAC recommended upgrading to MX 18.211.3. The upgrade was stable for two days, but on the third day, latency increased again due to high device utilization. TAC then suggested upgrading to the MX 19.1.4 Beta version, but the issue persists.
All necessary logs have been captured, but no solution has been found for this behavior. We are seeing traffic of only 100-150 Mbps when the issue occurs, and after a reboot, device utilization returns to normal. We’ve even swapped the primary MX250 with a spare unit, but the behavior remains unchanged.
We also tried disabling IPS/IDS and AMP, but the behavior remains the same.
If anyone has encountered a similar issue with any of your MX devices, could you please share any fixes or suggestions?
Hi,
Two questions:
1. Number of "clients"?
2. Have you tried MX 18.107.10?
You can also ask support to show you the advanced metrics: 99th-percentile CPU, number of flows, and packets per second.
It looks like this: [screenshot of the advanced metrics]
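The 99th-percentile CPU figure is worth asking for because an average can look healthy while short peaks saturate the box. As a sketch only, assuming the common nearest-rank definition of a percentile (Meraki's exact calculation is not documented in this thread) and using made-up per-minute samples, the difference looks like this:

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing so integer pct values give an exact rank.
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical per-minute CPU readings (%), not real data from this thread:
# mostly idle, with five minutes of near-saturation.
cpu = [35] * 95 + [97, 98, 99, 99, 100]

print(round(sum(cpu) / len(cpu), 2))  # the mean stays below 40
print(percentile(cpu, 50))            # the median shows nothing unusual
print(percentile(cpu, 99))            # the 99th percentile exposes the spike
```

Here the mean and median both look harmless, while the 99th percentile lands at 99%, which is the kind of pattern that would line up with latency at the start of business hours.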
I have almost the same issue as described, on many MX68s. We are exceeding the recommended limit of 50 clients (but who cares). With 18.2XX and 19.X code we see a big bump in device utilization.
This is the device utilization from Sept 15. I asked TAC to check CPU utilization, and they confirmed there is a spike in CPU utilization, but I didn't get any report.
The average client count is under 1000, and sometimes it rises to 1890. The recommended client count for the MX250 is 2000.
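Put as a ratio of the recommended figure, the peak leaves little headroom. A minimal sketch using only the numbers quoted in this thread (the helper name is hypothetical):

```python
def load_ratio(clients, recommended):
    """Fraction of the recommended client count in use (1.0 means exactly at the limit)."""
    if recommended <= 0:
        raise ValueError("recommended must be positive")
    return clients / recommended


# Figures quoted in this thread for the MX250 (recommended: 2000 clients).
print(f"MX250 average (upper bound): {load_ratio(1000, 2000):.1%}")
print(f"MX250 peak:                  {load_ratio(1890, 2000):.1%}")
```

The peak works out to 94.5% of the recommended count, so the box is close to its sizing limit at the busiest times even though it is nominally within spec.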
When we were running 18.107.10, the device was much more stable. After the upgrade there was an increase in device utilization. I will check with TAC whether we can test by downgrading the MX to 18.107.10.
Do you have the graph to compare before (MX 18.107) and after (MX 18.211 / 19.1X)?
The only difference in my case is that when we disable the IPS, or whitelist 10/8 (10.0.0.0/8) from it, we see a huge improvement. Device utilization decreases by about 10-40% depending on the type of site.
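For scale, "10/8" is the RFC 1918 private block 10.0.0.0/8, which typically covers most internal and Auto VPN subnets, so excluding it from IPS inspection removes a large share of internal traffic from the IPS workload. How the MX applies such an exclusion internally is not shown here; this sketch with Python's standard `ipaddress` module only illustrates which addresses the prefix covers (the helper name is hypothetical):

```python
import ipaddress

# "10/8" is shorthand for the RFC 1918 block 10.0.0.0/8.
TEN_SLASH_EIGHT = ipaddress.ip_network("10.0.0.0/8")


def in_ten_slash_eight(addr):
    """True if addr falls inside 10.0.0.0/8."""
    return ipaddress.ip_address(addr) in TEN_SLASH_EIGHT


print(TEN_SLASH_EIGHT.num_addresses)            # 16777216 addresses (2**24)
print(TEN_SLASH_EIGHT[0], TEN_SLASH_EIGHT[-1])  # 10.0.0.0 10.255.255.255
print(in_ten_slash_eight("10.20.30.40"))        # True
print(in_ten_slash_eight("172.16.0.1"))         # False: a different RFC 1918 block
```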
I just verified the graph. We upgraded to MX 18.211.2 in July 2024, and utilization was normal until Sept 14, 2024.
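Because the upgrade (July) and the onset of high utilization (Sept 15) are roughly two months apart, it can help to compare the utilization history on either side of the actual change date rather than the upgrade date. A minimal sketch, assuming the history has been exported as (day, utilization %) pairs; the values below are hypothetical, not data from this thread:

```python
from datetime import date
from statistics import mean


def compare_periods(samples, cutover):
    """Split (day, utilization %) samples at cutover; return (mean before, mean on/after)."""
    before = [u for d, u in samples if d < cutover]
    after = [u for d, u in samples if d >= cutover]
    if not before or not after:
        raise ValueError("need samples on both sides of the cutover")
    return mean(before), mean(after)


# Hypothetical daily peak utilization values around the change.
samples = [
    (date(2024, 9, 12), 40),
    (date(2024, 9, 13), 42),
    (date(2024, 9, 14), 44),
    (date(2024, 9, 15), 85),
    (date(2024, 9, 16), 90),
]

before, after = compare_periods(samples, date(2024, 9, 15))
print(before, after)
```

A jump that lines up with a specific date, rather than with the firmware change, may point to a change in traffic or IPS signatures around that date rather than to the firmware itself; that is a hypothesis to check with TAC, not a conclusion.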
We have whitelisted around 20 rules in Intrusion detection and prevention because trusted traffic over Auto VPN was getting blocked.
IDS mode is Prevention and the ruleset is Balanced.
I see three suggestions:
MX 18.107.10, MX 18.211.4, MX 19.1.5 (latest stable RC)
I would hope that going back to 18.107.10 "fixes" your issue.
OK, I will switch to one of these firmware versions and check if it helps.
MX 18.211.4 reduced high device utilization on one of my many MX75s. At another location, an MX105 seems to perform better during peak utilization periods (active concurrent sessions).
Sure. I will check whether this firmware helps.
I would recommend upgrading the image.
I have already upgraded twice, and the problem still exists. I will try another version.