'Network Device Loss and Latency History' inconsistency

Just browsing

Hi,

I'm hitting:

https://developer.cisco.com/meraki/api/#/rest/api-endpoints/devices/get-network-device-loss-and-late...

I then parse the response and feed it into PRTG.

The problem I have is that sometimes it returns nothing ([]) in Postman (and PRTG), and other times it returns two sets of keys and values instead of one.

My GET:

/networks/{networkId}/devices/{serial}/lossAndLatencyHistory?ip=8.8.8.8&timespan=90&uplink=wan1
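For anyone scripting this outside Postman, here is a minimal Python sketch of the same request using only the standard library. It assumes the v0 Dashboard API base URL (https://api.meraki.com/api/v0) and the X-Cisco-Meraki-API-Key header; the network ID, serial, and key are placeholders, and the helper names are hypothetical. Building the query string with urlencode keeps every parameter as a proper key=value pair.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: v0 Dashboard API base URL, as used when this thread was written.
BASE_URL = "https://api.meraki.com/api/v0"


def loss_latency_url(network_id: str, serial: str, ip: str = "8.8.8.8",
                     timespan: int = 90, uplink: str = "wan1") -> str:
    """Build the lossAndLatencyHistory URL; urlencode writes each key=value pair."""
    path = f"/networks/{network_id}/devices/{serial}/lossAndLatencyHistory"
    query = urlencode({"ip": ip, "timespan": timespan, "uplink": uplink})
    return f"{BASE_URL}{path}?{query}"


def fetch_samples(url: str, api_key: str) -> list:
    """GET the URL and decode the JSON array of samples (it may be empty: [])."""
    req = urllib.request.Request(url, headers={"X-Cisco-Meraki-API-Key": api_key})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder IDs; substitute a real network ID and MX serial.
    print(loss_latency_url("N_1234", "Q2XX-XXXX-XXXX"))
```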

 

Normal (expected) response:

[
    {
        "startTs": "2020-02-28T13:58:00Z",
        "endTs": "2020-02-28T13:59:00Z",
        "lossPercent": 0.0,
        "latencyMs": 23.4
    }
]

 

 

 

If I keep clicking Send in Postman every 10-20 seconds, it sometimes returns:

[]

 

 

 

Occasionally (rarely, but it does happen), it returns something like:

[
    {
        "startTs": "2020-02-28T14:13:00Z",
        "endTs": "2020-02-28T14:14:00Z",
        "lossPercent": 0.0,
        "latencyMs": 23.3
    },
    {
        "startTs": "2020-02-28T14:14:00Z",
        "endTs": "2020-02-28T14:15:00Z",
        "lossPercent": 0.0,
        "latencyMs": 24.1
    }
]

 

 

 

That results in a parsing and graphing mess in PRTG.
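One way to keep PRTG's input stable no matter how many objects come back is to reduce each response to a single reading. The Python sketch below keeps only the sample with the newest endTs and maps it to two channels in the JSON result format used by PRTG's script-based (EXE/Script Advanced) sensors; whether that matches the sensor type used here is an assumption, and the channel names are hypothetical. An empty array yields None, leaving it to the caller to skip that poll instead of recording a bogus value.

```python
import json
from typing import Optional


def latest_sample(samples: list) -> Optional[dict]:
    """Return the sample with the newest endTs, or None for an empty array."""
    if not samples:
        return None
    # Same-format ISO-8601 UTC strings ("...Z") sort chronologically as text.
    return max(samples, key=lambda s: s["endTs"])


def to_prtg(samples: list) -> Optional[dict]:
    """Map the newest sample to PRTG channels; None means no data this poll."""
    s = latest_sample(samples)
    if s is None:
        return None
    return {"prtg": {"result": [
        # Hypothetical channel names; "float": 1 marks non-integer values.
        {"channel": "Loss", "value": s["lossPercent"], "float": 1},
        {"channel": "Latency", "value": s["latencyMs"], "float": 1},
    ]}}


# The two-sample response quoted above.
two = [
    {"startTs": "2020-02-28T14:13:00Z", "endTs": "2020-02-28T14:14:00Z",
     "lossPercent": 0.0, "latencyMs": 23.3},
    {"startTs": "2020-02-28T14:14:00Z", "endTs": "2020-02-28T14:15:00Z",
     "lossPercent": 0.0, "latencyMs": 24.1},
]
print(json.dumps(to_prtg(two)))
```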

 

The issue in the last example seems to be the 'timespan' parameter: notice that it returns values for two periods of time, each 1 minute long. The dashboard is consistent with this, as the uplink 'Historical Data' graph seems to work in 1-minute intervals.

My first thought was that a 90-second timespan would occasionally return two values, one for the past 60 seconds and one for the 60 seconds before that. I then tried requesting a 60-second timespan instead, but I see the same issue: it often returns nothing. That stops PRTG from plotting a smooth graph like the one on the dashboard. Interestingly, the issue applies to both WAN links; they either both return something or both return nothing.

Nothing else is querying this API, so we are definitely not hitting the 5 calls per second limit.

I could play around with a longer timespan or with the t0 and t1 values, but I would like to monitor this in a way that lets me spot latency spikes, ideally so it looks similar to the graph on the dashboard.

Does anyone know what pattern this follows?

It seems to clear the values every now and then, eventually replacing them with the most recent ones and then appending to those.

Some experiments I've done:

-- timespan=300 usually returns 4 values but sometimes 5.

-- timespan=180 sometimes returns 1 value, other times 3, and other times 2, which makes no sense.

-- timespan=180 or higher seems to ALWAYS return at least 1 value.

Kind of a big deal

Re: 'Network Device Loss and Latency History' inconsistency

What HTTP response code are you getting when it doesn't work?

 

Perhaps you are getting a 429 asking you to back off, or something else.
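To rule this out in a script, check the status code before parsing. Below is a minimal Python sketch using only the standard library: it retries on 429, waiting for the number of seconds given in a Retry-After header when one is present (an assumption about the response; a 1-second default is used otherwise), and re-raises any other HTTP error. A 200 with [] passes straight through, so it can be told apart from throttling. The function names are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def retry_delay(status: int, retry_after: Optional[str],
                default: float = 1.0) -> Optional[float]:
    """Seconds to wait before retrying, or None if the status is not 429."""
    if status != 429:
        return None
    try:
        return float(retry_after) if retry_after else default
    except ValueError:
        # Retry-After can also be an HTTP date; fall back to the default wait.
        return default


def get_with_backoff(req: urllib.request.Request, max_tries: int = 5,
                     sleep: Callable[[float], None] = time.sleep) -> tuple:
    """Return (status, body bytes), retrying only on HTTP 429."""
    for attempt in range(max_tries):
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                return resp.status, resp.read()
        except urllib.error.HTTPError as err:
            wait = retry_delay(err.code, err.headers.get("Retry-After"))
            if wait is None or attempt == max_tries - 1:
                raise  # not a rate limit, or out of retries
            sleep(wait)
    raise RuntimeError("max_tries must be at least 1")
```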

Just browsing

Re: 'Network Device Loss and Latency History' inconsistency

It's actually returning a 200 with an empty array; all I see in Postman is:

[]

 

I was wondering if someone could test this with a timespan somewhere between 60 and 90 seconds and confirm.

Just before I left the office, I tried requesting the org-wide MX uplink loss and latency** with a long timespan instead of querying a single device. I noticed that the timestamps in the responses came back out of order, and that they were sometimes within 1 second of each other and at other times within 10 seconds. I'm trying to figure out whether there's a pattern here, or whether the API is queueing/caching data and responding from it in a random fashion.

** https://developer.cisco.com/meraki/api/#/rest/api-endpoints/organizations/get-organization-uplinks-l...

New here

Re: 'Network Device Loss and Latency History' inconsistency

This is happening to me as well, but I can never make it work. Were you able to figure it out? Thanks.

Getting noticed

Re: 'Network Device Loss and Latency History' inconsistency

Looking at the OP's findings, I'd say the variable number of samples in the response is just a side-effect of a sampling window (timespan) that is too short for a set of events (60 seconds long by default) that aren't synchronised with the window. This seems in line with seeing 4 or 5 samples with a 300-second timespan.
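The windowing effect can be illustrated with a toy model. The Python sketch below assumes, purely for illustration and not as documented API behaviour, that samples are 1-minute buckets aligned to hh:mm:00 and that only buckets lying entirely inside [now - timespan, now] are returned. It ignores any delay before new samples become available, so it will not reproduce every count the OP saw, but it shows why a 60-second window is usually empty and a 300-second window usually holds 4 samples.

```python
BUCKET = 60  # seconds per sample, matching the 1-minute startTs/endTs pairs


def complete_buckets(now: int, timespan: int, bucket: int = BUCKET) -> int:
    """Count bucket-aligned intervals that fit entirely in [now - timespan, now]."""
    t0 = now - timespan
    first_start = -(-t0 // bucket) * bucket  # round t0 up to a bucket boundary
    last_end = (now // bucket) * bucket      # round now down to a bucket boundary
    return max(0, (last_end - first_start) // bucket)


if __name__ == "__main__":
    # Sweep 'now' across one minute and collect the counts each timespan can give.
    for span in (60, 90, 180, 300):
        counts = sorted({complete_buckets(600 + s, span) for s in range(60)})
        print(span, counts)
```

Under this model, 60 s and 90 s windows give 0 or 1 samples, 180 s gives 2 or 3, and 300 s gives 5 only when the request lands exactly on a minute boundary and 4 otherwise. The extra variation reported above (such as 2 samples at 90 s) presumably comes from factors the model leaves out.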

 

If you need consistent 60-second samples, try making requests every 2-3 minutes with t0 = (now - 300) - ((now - 300) % 60) seconds, which rounds t0 down to hh:mm:00, and t1 = t0 + 300. Add the new samples to a list, discarding or overwriting duplicates.
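The alignment and de-duplication steps above could look like this in Python. It is a sketch under assumptions: t0 and t1 are sent as Unix timestamps in seconds, and samples are stored in a dict keyed by startTs (a swap for the plain list, so that a repeated sample simply overwrites the older copy). The helper names are hypothetical.

```python
import time
from typing import Optional, Tuple

WINDOW = 300  # seconds covered by each request, as suggested in the reply
BUCKET = 60   # sample length seen in the responses


def aligned_window(now: Optional[float] = None) -> Tuple[int, int]:
    """Return (t0, t1): t0 = (now - 300) rounded down to hh:mm:00, t1 = t0 + 300."""
    if now is None:
        now = time.time()
    start = int(now) - WINDOW
    t0 = start - start % BUCKET
    return t0, t0 + WINDOW


def merge_samples(history: dict, samples: list) -> dict:
    """Store samples keyed by startTs; a duplicate startTs overwrites the old entry."""
    for s in samples:
        history[s["startTs"]] = s
    return history


if __name__ == "__main__":
    t0, t1 = aligned_window()
    # Hypothetical query parameters for the next request.
    print({"ip": "8.8.8.8", "uplink": "wan1", "t0": t0, "t1": t1})
```

Iterating over sorted(history) then yields one sample per minute in chronological order, since the startTs strings share a single ISO-8601 format.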

 

That should allow plotting without gaps; the only cost is that things will lag a couple of minutes, though you could reduce that by making the API call more often, say every minute.

Getting noticed

Re: 'Network Device Loss and Latency History' inconsistency

I had to do something similar: I just poll the past 5 minutes every 5 minutes and average the values (sloppy, but I don't actually need granular accuracy).
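For that coarser approach, the averaging step might look like the Python sketch below. It assumes every sample carries numeric lossPercent and latencyMs values, as in the responses quoted in this thread, and returns None for an empty array so the caller can skip that poll; the function name is hypothetical.

```python
from typing import Optional


def average_samples(samples: list) -> Optional[dict]:
    """Mean lossPercent and latencyMs over one response; None if it is []."""
    if not samples:
        return None
    n = len(samples)
    return {
        "lossPercent": sum(s["lossPercent"] for s in samples) / n,
        "latencyMs": sum(s["latencyMs"] for s in samples) / n,
    }


if __name__ == "__main__":
    samples = [
        {"startTs": "2020-02-28T14:13:00Z", "lossPercent": 0.0, "latencyMs": 23.3},
        {"startTs": "2020-02-28T14:14:00Z", "lossPercent": 0.0, "latencyMs": 24.1},
    ]
    print(average_samples(samples))
```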

 

It's unfortunate that the API doesn't enforce a minimum time frame, it would avoid a lot of confusion. 
