'Network Device Loss and Latency History' inconsistency

kvkewn
Conversationalist


Hi,

 

I'm hitting:

https://developer.cisco.com/meraki/api/#/rest/api-endpoints/devices/get-network-device-loss-and-late...

 

I then parse the response into PRTG.

 

The problem I have is that sometimes it returns nothing ([]) in Postman (and PRTG), and other times it returns two groups of keys and values.

 

My GET:

 

/networks/{networkId}/devices/{serial}/lossAndLatencyHistory?ip=8.8.8.8&timespan=90&uplink-WAN1

 

Normal (expected) response:

[
    {
        "startTs": "2020-02-28T13:58:00Z",
        "endTs": "2020-02-28T13:59:00Z",
        "lossPercent": 0.0,
        "latencyMs": 23.4
    }
]
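
For reference, a minimal sketch of what the call and parsing step boil down to before the values go to PRTG (the API key, network ID and serial below are placeholders, and the uplink filter is assumed to be passed as uplink=wan1):

import requests

API_KEY = "REPLACE_ME"          # placeholder API key
NETWORK_ID = "N_123456"         # placeholder network ID
SERIAL = "QXXX-XXXX-XXXX"       # placeholder device serial

url = (f"https://api.meraki.com/api/v0/networks/{NETWORK_ID}"
       f"/devices/{SERIAL}/lossAndLatencyHistory")
params = {"ip": "8.8.8.8", "timespan": 90, "uplink": "wan1"}

resp = requests.get(url, headers={"X-Cisco-Meraki-API-Key": API_KEY}, params=params)
resp.raise_for_status()

# The body is a list of {startTs, endTs, lossPercent, latencyMs} objects,
# or [] when nothing comes back for the requested window.
for sample in resp.json():
    print(sample["startTs"], sample["lossPercent"], sample["latencyMs"])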

If I keep clicking Send in Postman every 10-20 seconds, it sometimes returns:

[]

Other times (rare, but it happens) it will return something like:

[
    {
        "startTs": "2020-02-28T14:13:00Z",
        "endTs": "2020-02-28T14:14:00Z",
        "lossPercent": 0.0,
        "latencyMs": 23.3
    },
    {
        "startTs": "2020-02-28T14:14:00Z",
        "endTs": "2020-02-28T14:15:00Z",
        "lossPercent": 0.0,
        "latencyMs": 24.1
    }
]

That results in a parsing and graphing mess in PRTG.

 

The issue in the last example seems to be the 'timespan' parameter: notice that it returns values for two periods of time, each one minute long. That's consistent with the dashboard, where the uplink 'Historical Data' graph appears to work in one-minute intervals.

 

My first thought was that the 90-second timespan would occasionally return two values: one for the past 60 seconds and one for the 60 seconds before that. I then tried requesting a timespan of 60 seconds instead, but I am seeing the same issue where it often returns nothing. That stops PRTG from plotting a smooth graph similar to the one on the dashboard. Interestingly, the issue applies to both WAN links: they either both return something or both return nothing.

 

Nothing else is querying this API, so we are definitely not hitting the 5-calls-per-second rate limit.

 

I could play around with a longer timespan or with the t0 and t1 values, but I would like to monitor this in a way that lets me spot latency spikes, ideally so it looks similar to the graph on the dashboard.

 

Does anyone know what pattern this follows?

 

The API seems to clear the values every now and then, eventually replacing them with the most current ones and then adding to those.

 

Some experiments I've done:

 

-- timespan=300 usually returns 4 values, but sometimes 5.

-- timespan=180 sometimes returns 1 value, other times 3, and occasionally 2, which makes no sense.

-- timespan=180 and above seem to ALWAYS return at least 1 value.

6 REPLIES
PhilipDAth
Kind of a big deal

What HTTP response code are you getting when it doesn't work?

 

Perhaps you are getting a 429 asking you to back off, or something else.

kvkewn
Conversationalist

It's actually returning a 200 with an empty array; all I see in Postman is:

 

[]

 

I was wondering if someone could test this with a timespan somewhere between 60 and 90 seconds and confirm.

 

Just before I left the office I tried requesting the org-wide MX latency and loss** with a high timespan instead of a single device, and noticed that the timestamps in the responses came back out of order, and also that they were sometimes within 1 second of each other and other times within 10 seconds. I'm trying to figure out whether there's any pattern here or whether the API is queueing/caching data and responding with it in a random fashion.

 

** https://developer.cisco.com/meraki/api/#/rest/api-endpoints/organizations/get-organization-uplinks-l...
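
One quick sanity check (a sketch, assuming the startTs field shown in the per-device responses above; the org-wide endpoint may name its timestamp field differently) is to sort whatever comes back by timestamp before comparing runs, so only real data differences stand out:

def sort_samples(samples, ts_field="startTs"):
    # ISO 8601 timestamps like "2020-02-28T14:13:00Z" sort correctly as plain strings,
    # so a lexicographic sort is enough to restore chronological order.
    return sorted(samples, key=lambda s: s[ts_field])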

This is happening to me as well, but I can never make it work. Were you able to figure it out? Thanks.

sungod
Head in the Cloud

Looking at the OP's findings, I'd say the variable number of response samples is just a side effect of a too-short sampling window (timespan) for a set of events (60-second buckets by default) that aren't synchronised with that window. This is in line with seeing 4 or 5 samples for a 300-second timespan.

 

If you need consistent 60-second samples, try making requests every 2-3 minutes with t0 = (now - 300 seconds) rounded down to the nearest minute (so t0 aligns to hh:mm:00) and t1 = t0 + 300, then add new samples to a list and discard or overwrite duplicates.

 

That should allow plotting without gaps; the only cost is that the graph will lag by a couple of minutes, though you could reduce that by making the API call at a higher rate, say every minute.
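
A rough sketch of that polling pattern (assuming the v0 endpoint and X-Cisco-Meraki-API-Key header, and that t0/t1 are accepted as Unix timestamps; the network ID and serial are placeholders):

import time
import requests

API_KEY = "REPLACE_ME"                      # placeholder
URL = ("https://api.meraki.com/api/v0/networks/N_123456"
       "/devices/QXXX-XXXX-XXXX/lossAndLatencyHistory")

def aligned_window(lookback=300):
    # t0 = (now - lookback) snapped back to a whole minute (hh:mm:00), t1 = t0 + lookback
    t0 = int(time.time()) - lookback
    t0 -= t0 % 60
    return t0, t0 + lookback

history = {}                                # samples keyed by startTs

def poll_once():
    t0, t1 = aligned_window()
    resp = requests.get(
        URL,
        headers={"X-Cisco-Meraki-API-Key": API_KEY},
        params={"ip": "8.8.8.8", "uplink": "wan1", "t0": t0, "t1": t1},
    )
    resp.raise_for_status()
    for sample in resp.json():
        # Duplicates from overlapping windows just overwrite themselves;
        # minutes that were missing last time fill in on a later poll.
        history[sample["startTs"]] = sample

# Call poll_once() every 2-3 minutes, then plot history sorted by startTs.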

boomi
Getting noticed

I had to do something similar: I just poll the past 5 minutes every 5 minutes and average the values (sloppy, but I don't actually need granular accuracy).
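
For what it's worth, the averaging step is just something like this (a sketch; samples is whatever list the endpoint returned for the 5-minute window):

def average_metrics(samples):
    # Collapse however many samples came back into one loss/latency pair;
    # skip the datapoint entirely when the API returned [].
    if not samples:
        return None
    loss = sum(s["lossPercent"] for s in samples) / len(samples)
    latency = sum(s["latencyMs"] for s in samples) / len(samples)
    return loss, latency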

 

It's unfortunate that the API doesn't enforce a minimum time frame; it would avoid a lot of confusion.

klausengelmann
Here to help

Hello kvkewn,

 

Many thanks for posting your issue and thoughts. I am in a similar situation using PRTG too.

I am struggling to understand how to combine the "timespan" parameter with the right polling frequency to collect data from the Meraki API.

I am collecting the number of clients associated per Meraki Access Point or Network using the following URL:

 

 

To avoid the API rate limit, I am collecting the result every hour using PRTG.

The timespan used inside the URL is also 3600 seconds (1 hour).

The problem is that the data is coming back in a very inconsistent way.

 

klausengelmann_0-1619211236428.png

 

klausengelmann_1-1619211296844.png

 

Please could you help me understand how I can use the Meraki API timespan correctly?

What is the recommended time interval for collecting the connected clients?

 

Regards,
