Hi,
Is there an ongoing issue with the performance of the Meraki Dashboard API?
We've developed software for a number of Meraki SD-WAN customers and in recent days have seen a significant drop in API reliability and a significant increase in response times.
Whilst requesting data from various endpoints we're seeing issues, which can also be reproduced in Postman.
This problem was seen with multiple unrelated customer organisations. For example, one is hosted on n376.meraki.com and another on n271.meraki.com.
Looking at some historical data this issue appears to have started around 11th Jan and has been getting progressively worse. It also appears to be worse from 8am until 8-9pm.
I'm not aware of anywhere to see service status or performance, if there is such a thing, but it looks very much like a load issue.
Hi @Martin_Rowan I am not aware of anything specific being mentioned or discussed in our Meraki Support channels. There is a relatively new discussion forum here on the Community specific to service outage/impairments: https://community.meraki.com/t5/Meraki-Service-Notices/bg-p/service-notices and at this point nothing there either specific to degraded API performance. And that's a valid data point that you are noticing this across multiple Dashboard organizations on different shards.
I would open a ticket with Meraki Support for further investigation, they will be best equipped to correlate your issue with any others and engage with Engineering.
I've seen similar on n383 over the last couple of weeks, seems to be degrading over time.
From my end (using Meraki Python package), the problem is showing as server disconnects and server not accepting connections, rather than HTTP errors.
I opened a case yesterday, low priority but fast contact back from support, and then provided a lot of detailed info to them.
As above, opening a case is the best approach.
Thanks, I've opened a support case.
@Martin_Rowan I also seem to be having major issues with the API. Lots of calls seem to be timing out and scripts that would normally run < 1 minute are taking 5+ minutes to complete. Wonder what the problem is but I doubt support has any answers 😄
Well, the more people that raise tickets the better, I guess. Slow responses are annoying; incomplete/truncated responses (invalid JSON) are something else. An example below took nearly 3 mins and sent ~48k of the 160k expected. The SSL handshake was very long and unusual in that request; most of the time it is the download that appears to be slow. It's definitely a lot slower during the 8am-9pm timeframe: early this morning a call was still slow at 5 seconds, but as the day progresses it has become more erratic and flaky.
Not sure if this will help some people in the short term, but if I make the request directly to the shard endpoint, for example https://n376.meraki.com/api/v1/, then responses are back in milliseconds, whereas going via the API gateway https://api.meraki.com/api/v1 is where it's slow.
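For anyone who wants to check this on their own organisations, here is a rough sketch of timing the same call through the gateway and through a shard. The helper names (`shard_base`, `http_fetch`, `compare`) are made up for illustration, the `X-Cisco-Meraki-API-Key` header is my assumption for how you authenticate, and you would substitute your own shard host and path.

```python
import time
import urllib.request

GATEWAY_BASE = "https://api.meraki.com/api/v1"


def shard_base(shard_host):
    """Build the base URL for one shard, e.g. 'n376.meraki.com'."""
    return f"https://{shard_host}/api/v1"


def http_fetch(api_key, timeout=60):
    """Return a fetch(url) function that does a GET and returns the body bytes.

    Assumption: the API key is sent in the X-Cisco-Meraki-API-Key header.
    """
    def fetch(url):
        req = urllib.request.Request(
            url, headers={"X-Cisco-Meraki-API-Key": api_key}
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    return fetch


def compare(path, shard_host, fetch):
    """Time the same path via the gateway and via the shard.

    Returns {"gateway": (seconds, body_bytes), "shard": (seconds, body_bytes)}.
    """
    results = {}
    bases = (("gateway", GATEWAY_BASE), ("shard", shard_base(shard_host)))
    for label, base in bases:
        start = time.monotonic()
        body = fetch(base + path)
        results[label] = (time.monotonic() - start, len(body))
    return results
```

Something like `compare("/organizations/<your org id>/devices/statuses", "n376.meraki.com", http_fetch(my_key))` would then give both timings side by side. Comparing the byte counts is also a cheap way to spot the truncated responses mentioned above.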
I fed this back to support on the case I opened and gave a link to your post.
They said they're investigating a wider issue than just my case, so good to know they are aware and working on resolution.
I think if you check now, you should find the problem fixed.
If not, we had a different problem!
Indeed performance seems to have improved dramatically. Hopefully, we won't see the gradual decline I detected since 11th Jan.
Yes we're seeing the problem fixed at around 13:10 GMT today \o/
Are you guys seeing any errors again? Things still seem to timeout for me. Are you still using the API gateway or just making the calls directly to the shard now?
@AutomationDude Checked again and response times are fine using api.meraki.com not getting timeouts or seeing very slow replies.
Normal operation still for me, no server disconnects/non-connects, using the gateway.
Wonder why it's happening to me then.. I'll investigate on my end. Cheers anyway though and have a nice weekend
We've seen a dramatic increase in API response times too. I logged a case about it on the 19th, but I couldn't see any more information there.
The job that gets all device statuses for 150-ish organizations went from ~42 sec to ~6 minutes. Calls on specific API shards became significantly slower. The graph would be way higher if the timeout for the API client wasn't 60 seconds, because some calls just took 2-4 minutes to respond.
99th percentile going up:
Averages going up:
For us the issue was resolved on 21st around midday UTC.
If you still see a problem then I suggest posting a comment on the case you opened asking for an update.
Yeah our issues have been resolved.
Can you confirm which API Endpoints you're using?
We have an issue where API requests sometimes don't respond before an internal timeout (60 seconds); I sometimes see the same issue using just Postman. As a result, we changed our implementation to add a retry policy: if we don't start to get a reply within a second, we drop that request and make it again (after a short delay). This has improved the overall reliability significantly. Not sure if you're hitting the same issue.
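To give an idea of what that retry policy can look like, here is a minimal sketch in Python. It is not our actual implementation: the function names, the attempt count and the delay are illustrative assumptions, and so is the `X-Cisco-Meraki-API-Key` header. With `urllib`, the timeout applies to each blocking socket operation (connect, then each read), so it only approximates "no reply started within a second".

```python
import time
import urllib.error
import urllib.request


def urllib_fetch(api_key):
    """Return fetch(url, timeout) doing a GET and returning the body bytes.

    Assumption: the API key is sent in the X-Cisco-Meraki-API-Key header.
    """
    def fetch(url, timeout):
        req = urllib.request.Request(
            url, headers={"X-Cisco-Meraki-API-Key": api_key}
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    return fetch


def get_with_retries(fetch, url, attempts=4, reply_timeout=1.0,
                     delay=0.5, sleep=time.sleep):
    """Call fetch(url, reply_timeout); on a timeout or dropped connection,
    wait `delay` seconds and try again, up to `attempts` times in total."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url, reply_timeout)
        except urllib.error.HTTPError:
            # The server did answer (4xx/5xx); a blind retry won't help here.
            raise
        except OSError as exc:
            # Covers socket timeouts, refused/reset connections and URLError.
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay)
    raise last_error
```

Usage would be along the lines of `get_with_retries(urllib_fetch(my_key), url)`. HTTP error responses such as rate limiting are deliberately left out of this sketch and would need their own handling.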
@Martin_Rowan I was having slow responses on:
GET /v1/networks/{networkId}/firmwareUpgrades
GET /v1/organizations/{organizationId}/devices
GET /v1/organizations/{organizationId}/devices/statuses
I'm temporarily applying the same strategy as well: implemented a lower timeout and retries but that's just putting on a band aid unfortunately.
I'm running a script that loops through many orgs and it's quite clear that certain orgs on specific shards are experiencing a ton of latency. Most of the orgs however are doing just fine. It also seems to be specific API calls on specific orgs that are timing out.
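If it helps anyone narrow this down, below is a rough sketch of how per-call timings can be grouped by shard. It assumes you already record an elapsed time per call and have a URL for each org whose hostname is the shard (I believe the `url` field returned for each organization can serve for this, but treat that as an assumption). The function name `latency_by_shard` is made up.

```python
from collections import defaultdict
from statistics import median
from urllib.parse import urlparse


def latency_by_shard(samples):
    """Group timings by shard hostname.

    samples: iterable of (org_url, elapsed_seconds) pairs.
    Returns {shard_host: {"calls": n, "median": seconds, "max": seconds}}.
    """
    grouped = defaultdict(list)
    for org_url, elapsed in samples:
        grouped[urlparse(org_url).hostname].append(elapsed)
    return {
        host: {"calls": len(times), "median": median(times), "max": max(times)}
        for host, times in grouped.items()
    }
```

Sorting the result by median makes the slow shards stand out, which is handy detail to attach to a support case.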
I've also experienced timeouts on several of my scripts running API calls. I have narrowed it down to n146.meraki.com timing out. Contacted Meraki Support, but they have no solutions.
Yep, it's the same shard for me that's causing issues. Doubt Meraki support can do anything about this; the problem is likely to be at a much deeper level.
You need to persuade them to open an internal case with the Dashboard-API support team.
It's already done! But they can't say anything......