Has anyone else had a need to monitor which uplink interface is currently 'Active'? If so, what was your method for accomplishing it?
The current issue we're facing is that our 4G failover simply works too well. We're finding sites whose primary cable/DSL/whatever circuit has been offline for days without anyone realizing it. We have 700+ sites equipped with MXs and a 4G USB failover device, so this can add up to thousands in charges on our Verizon bill. I understand there are definitely ways to get this information; we're just having a hard time molding them to our needs.
For example, we could use the built-in alert 'The primary uplink status changes' at the template level and fire off an email to our ticketing system; however, we don't necessarily care about an outage that lasts only 30 minutes. We could also use SNMP to detect when the primary uplink interface goes down, but that doesn't help with routing problems, since the interface will still show as up even though the MX has already failed over to the 4G interface.
Ultimately, we're looking for a way to detect when the cellular interface has been 'Active' on the MX for longer than 1 hour. We use SolarWinds for other monitoring, so it would be fantastic if I could find a way to do it there.
The only way I can see to do this currently would be to take advantage of the alerting within Dashboard, unless you're going to proactively monitor the state via SNMP.
Both the 'The primary uplink status changes' alert and the 'Cellular connection state change' alert can be fed into your ticketing tool. Normally two alerts would be generated, one when failover occurs and one when service is restored, so a ticket with both indicates a normal restoration. Any ticket that still has only the one alert 60 minutes after it was created indicates the device is still operating on the cellular interface.
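The ticket logic above can be expressed as "the latest alert for a device is a failover, and it is at least 60 minutes old". Here is a minimal sketch of that check, assuming the alerts have already been parsed into hypothetical `(serial, kind, timestamp)` records; the `FAILOVER`/`RESTORED` labels and the record shape are illustrative, not anything Dashboard emits directly.

```python
from datetime import datetime, timedelta

# Hypothetical labels for the two Dashboard alerts once parsed by a ticketing tool.
FAILOVER = "failover"   # e.g. primary uplink went down / cellular became active
RESTORED = "restored"   # e.g. primary uplink came back

def still_on_cellular(alerts, now, threshold=timedelta(minutes=60)):
    """Return sorted serials whose most recent alert is a failover at least
    `threshold` old, i.e. no restoration alert has arrived since.

    `alerts` is an iterable of (serial, kind, timestamp) tuples (assumed shape).
    """
    latest = {}
    for serial, kind, ts in alerts:
        # Keep only the newest alert per device.
        if serial not in latest or ts > latest[serial][1]:
            latest[serial] = (kind, ts)
    return sorted(
        serial
        for serial, (kind, ts) in latest.items()
        if kind == FAILOVER and now - ts >= threshold
    )
```

Looking at the newest alert per device, rather than counting alerts per ticket, also copes with a site that flaps several times before settling on cellular.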
If this is still too ad hoc, my recommendation would be to "make a wish" requesting time thresholds for the primary uplink and cellular connection state alerts.
Eliot F | Simplifying IT with Cloud Solutions
For future reference, for anyone who finds this thread, here is what I came up with.
As I mentioned earlier, our SolarWinds instance can report values at the interface level. We don't care about the actual interface status, however. The key value I found is Received bps.
When our primary WAN connection goes down, this interface value drops below 1,000 bps (it usually fluctuates between 50 and 600). I have set the alert to send an email after an hour of continuous detection. It's certainly not a foolproof method, but based on my limited testing I believe it will be consistent enough to roll out without annoying our Service Desk.
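SolarWinds' alert engine handles the "condition must persist for an hour" part itself; the sketch below only illustrates that sustained-threshold logic. It assumes a hypothetical list of `(timestamp, received_bps)` samples sorted by time, and uses the 1,000 bps threshold and one-hour window from the post.

```python
from datetime import datetime, timedelta

def sustained_below(samples, threshold_bps=1000, duration=timedelta(hours=1)):
    """Return True if the most recent samples have stayed below threshold_bps
    continuously for at least `duration`.

    samples: list of (timestamp, received_bps) tuples sorted by timestamp.
    """
    if not samples:
        return False
    latest_ts = samples[-1][0]
    start_of_run = None
    # Walk backwards from the newest sample until one is at/above the threshold.
    for ts, bps in reversed(samples):
        if bps >= threshold_bps:
            break
        start_of_run = ts
    if start_of_run is None:
        # The newest sample is already above the threshold: no low run.
        return False
    return latest_ts - start_of_run >= duration
```

Requiring the whole run to stay low means a single brief spike above 1,000 bps resets the clock, which errs toward fewer alerts rather than more.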
We had a similar situation. Support told us to "make a wish" and/or hit the API, so we did both.
Here's a link to the GitHub page for what we use. We use it to check for WAN interfaces (wan1 and wan2) that are in a "failed" or "not connected" state. We have the script set up to run once per day. If an interface has been in one of those states for that length of time, it gets added to a list that is emailed to us. It may not be super pretty, but it was our only option.
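This is not the linked script, just a minimal sketch of the same idea. It assumes the uplink data has already been fetched into a list of dicts shaped like `{"serial": ..., "uplinks": [{"interface": "wan1", "status": "failed"}, ...]}`; that shape is an assumption loosely modelled on Meraki's per-appliance uplink status data. "Down for a day" is approximated by an interface being bad in both today's and yesterday's run.

```python
# Statuses and interfaces the post treats as problems.
BAD_STATUSES = {"failed", "not connected"}
WAN_INTERFACES = {"wan1", "wan2"}

def find_failed_wans(uplink_statuses):
    """Return (serial, interface, status) for every WAN uplink in a bad state."""
    problems = []
    for device in uplink_statuses:
        for uplink in device.get("uplinks", []):
            if (uplink.get("interface") in WAN_INTERFACES
                    and uplink.get("status") in BAD_STATUSES):
                problems.append((device["serial"], uplink["interface"], uplink["status"]))
    return problems

def still_failed(today, yesterday):
    """Problems seen in both daily runs. Caveat: the uplink may have
    recovered and failed again between checks; a daily poll can't tell."""
    previous = {(serial, iface) for serial, iface, _ in yesterday}
    return [p for p in today if (p[0], p[1]) in previous]

def format_report(problems):
    """Plain-text body for the daily email."""
    if not problems:
        return "All WAN uplinks OK."
    lines = ["WAN uplinks in a failed or not connected state:"]
    lines += [f"{serial} {iface}: {status}" for serial, iface, status in problems]
    return "\n".join(lines)
```

Persisting each day's `find_failed_wans` result (for example as JSON) is what makes `still_failed` possible on the next run; the report text could then be sent with the standard library's `smtplib`.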
We use two Perl scripts for this. The first calls /api/v0/organizations/$orgid/deviceStatuses every 60 seconds and saves the response locally. The second takes the device serial as a parameter and reads the most recently saved file.
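The split (one poller writing a cache, many per-device readers) keeps API calls to one per minute regardless of how many sensors there are. Here is a rough Python equivalent of that pattern, not the poster's Perl. The `X-Cisco-Meraki-API-Key` header and base URL are assumptions to check against Meraki's docs, and the v0 path quoted in the post has since been superseded by v1. The file is written atomically so a reader never sees a half-written cache.

```python
import json
import os
import tempfile
import time
import urllib.request

API_BASE = "https://api.meraki.com/api/v0"  # v0 path as quoted in the post

def fetch_device_statuses(org_id, api_key):
    """GET /organizations/{org_id}/deviceStatuses (header name assumed)."""
    req = urllib.request.Request(
        f"{API_BASE}/organizations/{org_id}/deviceStatuses",
        headers={"X-Cisco-Meraki-API-Key": api_key},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

def save_cache(data, path):
    """Write to a temp file in the same directory, then atomically replace."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(data, f)
    os.replace(tmp, path)

def lookup(serial, path):
    """Second script: return the cached status dict for one serial, or None."""
    with open(path) as f:
        devices = json.load(f)
    return next((d for d in devices if d.get("serial") == serial), None)

def poll_forever(org_id, api_key, path, interval=60):
    """First script: refresh the cache every `interval` seconds."""
    while True:
        save_cache(fetch_device_statuses(org_id, api_key), path)
        time.sleep(interval)
```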
We use PRTG for monitoring, and the script generates XML output that can be handled by the "custom EXE/XML" sensor type. For alerting, we use PRTG's built-in notification features to alert us when one of the sensors has been on failover for at least 30 minutes.
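Here is a sketch of what that XML output could look like, built with the standard library. It assumes the cached device dict carries a boolean `usingCellularFailover` field (an assumed field name to verify against your API responses). The `<prtg>`/`<result>`/`<channel>`/`<value>` layout, the `<error>`/`<text>` error form, and the `LimitMode`/`LimitMaxError` tags follow PRTG's EXE/XML format as I understand it; check them against the PRTG manual before relying on them.

```python
import xml.etree.ElementTree as ET

def prtg_xml(device):
    """Build EXE/XML sensor output for one cached device-status dict (or None)."""
    root = ET.Element("prtg")
    if device is None:
        # <error>1</error> puts the sensor into an error state showing <text>.
        ET.SubElement(root, "error").text = "1"
        ET.SubElement(root, "text").text = "Device not found in cache"
        return ET.tostring(root, encoding="unicode")
    on_failover = bool(device.get("usingCellularFailover"))  # assumed field name
    result = ET.SubElement(root, "result")
    ET.SubElement(result, "channel").text = "Cellular failover"
    ET.SubElement(result, "value").text = "1" if on_failover else "0"
    # Channel goes into error while the value exceeds 0.5, i.e. while on failover.
    ET.SubElement(result, "LimitMode").text = "1"
    ET.SubElement(result, "LimitMaxError").text = "0.5"
    ET.SubElement(root, "text").text = str(device.get("status", "unknown"))
    return ET.tostring(root, encoding="unicode")
```

The sensor reads whatever the script prints to stdout, so the script would `print(prtg_xml(...))`. The 30-minute delay then lives in a PRTG notification trigger rather than in the script.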
I've put the files on GitHub; feel free to use them. If you need any help, just contact me.