We have also experienced this from time to time. To answer your questions: I guess Meraki does need to improve this on their side, but since it works more than 99% of the time I'd say they are doing a decent job; guaranteeing 100% is pretty much impossible. I don't think there's much you can do on your side to fix the underlying issue, but you could add better error handling to your code so that it retries a failed call until you get the data you need. Good luck!
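To make the retry idea concrete, here is a minimal sketch of retrying with exponential backoff and jitter. It is not tied to any Meraki library: `TransientAPIError` and `call_with_retry` are hypothetical names, and you would need to map your client's timeout/5xx/429 errors onto that exception yourself.

```python
import random
import time


class TransientAPIError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 5xx, 429)."""


def call_with_retry(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                    sleep=time.sleep):
    """Call func() and retry on TransientAPIError with capped backoff.

    The last failure is re-raised once max_attempts is reached, so the
    caller never loops forever on a long-lasting fault.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientAPIError:
            if attempt == max_attempts:
                raise
            # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter so parallel pollers do not retry in lockstep.
            sleep(random.uniform(0, delay))
```

Capping the number of attempts matters here: "retry until it works" is fine for transient faults, but a poller that retries forever will pile up requests when the fault takes hours to fix.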
In the example you posted it looks like you are using the V0 API, which I think is now deprecated. Do you see the same with V1?
FWIW, I use the API across several organizations for monitoring/stats/analytics, using quite a few different endpoints, currently with the Python V1 library. Defensive programming with plenty of error recovery is essential; the error recovery built into the library is certainly not always enough.
There will be faults at times, and some may take quite a while to fix, but there will also be transient faults, including Internet or other issues unrelated to Meraki's infrastructure. All of them can disrupt things.
One thing that may reduce issues is using the organization's specific shard rather than the generic api.meraki.com; there's another recent thread about this.
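As a rough illustration of the shard suggestion, this sketch builds a V1 request against a shard hostname instead of api.meraki.com. The helper `build_request`, the shard name, the organization ID, and the API key are all placeholders; check your own organization's shard before using anything like this.

```python
import urllib.request


def build_request(shard, path, api_key):
    """Build (but do not send) a V1 API request aimed at a specific shard.

    shard is the hostname prefix, e.g. "n146"; path starts with "/".
    """
    url = f"https://{shard}.meraki.com/api/v1{path}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


# Placeholder values for illustration only.
req = build_request("n146", "/organizations/123456/networks", "YOUR_API_KEY")
print(req.full_url)
# prints "https://n146.meraki.com/api/v1/organizations/123456/networks"
```

If you use the Python library instead, my understanding is that `meraki.DashboardAPI` accepts a `base_url` argument for the same purpose, but verify that against the library version you have installed.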
You mention polling every minute. If that is across multiple networks/devices in an organization, you might also be hitting the rate limit at times.
I guess you prefer the same approach across multiple vendors, but it seems odd to poll so frequently an orchestrator that can itself send you alerts as and when they occur. Maybe you could get what you need more reliably via webhooks?
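For the webhook route, a minimal sketch of the receiving side could look like the following. It assumes the alert arrives as a JSON body carrying a `sharedSecret` field plus fields such as `alertType`, `networkName`, and `occurredAt`; treat those field names as assumptions and check them against the webhook documentation. `parse_alert` is a hypothetical helper you would call from whatever HTTP handler receives the POST.

```python
import hmac
import json


def parse_alert(body, expected_secret):
    """Validate a webhook body and return the fields we care about.

    Returns None when the body is not a JSON object or the shared
    secret does not match, so the caller can reject the request.
    """
    try:
        payload = json.loads(body)
    except ValueError:
        return None
    if not isinstance(payload, dict):
        return None
    secret = str(payload.get("sharedSecret", ""))
    # Constant-time comparison to avoid leaking the secret via timing.
    if not hmac.compare_digest(secret.encode(), expected_secret.encode()):
        return None
    return {
        "type": payload.get("alertType"),
        "network": payload.get("networkName"),
        "at": payload.get("occurredAt"),
    }
```

Even with webhooks in place you would probably keep a slow poll as a safety net, since a missed delivery is otherwise invisible.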
We run similar in-house tooling to monitor multiple organizations and have lately seen a significant number of issues. Currently it's frequent timeouts on API calls.
The main problem, in our experience, is that we cannot pinpoint any pattern in when or why we start getting timeouts on some of the calls for some of the organizations.
Regarding the 429 error code returned by the Meraki API: we worked out a solution that limits the number of calls to conform with Meraki's rate limiting. Yet from time to time we still start getting lots of 429 errors. One workaround we found is to change the egress IP we are running from, but it costs us a lot of time. Has anyone seen similar problems?
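For anyone wanting to try the same kind of call limiting, here is a rough sketch of one way to do it: space calls out on the client side, and when a 429 does come back, wait for the number of seconds given in the `Retry-After` response header. `Throttle` and `retry_after_seconds` are hypothetical names, and the rate you pass in is an assumption to check against the documented limits.

```python
import time


class Throttle:
    """Space calls at least 1/rate_per_second seconds apart."""

    def __init__(self, rate_per_second, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed, then reserve a slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


def retry_after_seconds(headers, default=1.0):
    """Read Retry-After as a number of seconds; fall back to default.

    headers is a plain dict here; real HTTP clients usually offer a
    case-insensitive mapping instead.
    """
    value = headers.get("Retry-After")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return default
```

Call `throttle.wait()` before each request, and on a 429 sleep for `retry_after_seconds(response_headers)` before retrying. Note that a per-process throttle like this cannot see other scripts or teams using the same organization or egress IP, which may be one reason 429s still show up.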
We respect the rate limits and 429s, but the issue is just random. Out of the blue, some calls are simply not handled properly.
We are not polling all APIs every minute, only some; that was just to show that the error rate of the Meraki API is way higher than that of APIs we call more frequently. We have our reasons for doing so. The Meraki API we hit every 5 minutes, for just two organizations on n146, and I see more posts about n146 being slow.