So, we do have a second MX; that's what let me reboot the primary unit without _serious_ problems,
although we did briefly lose connectivity on the public safety apps. Not the end of the world, but not how I like to treat public safety either. The applications recover, but it's still unprofessional to have no recourse other than rebooting the entire firewall to reset the IPsec parameters on just one of many VPN tunnels.
I did talk to Support, and they made changes to which VPN registries are available to (known by) the networks involved,
but that wasn't the real problem. The problem was that, from where I was at the time of the outage, I couldn't get on the phone with Support.
And to the other part of the question: yes, there was traffic. The subnets full of Cisco phones and PCs on either end
were definitely still trying to talk, so this was not a lack of "interesting traffic".
On an ASA, I could have run debug commands to see the attempted ISAKMP setups (or failures),
and captured traffic matching the interesting-traffic ACLs, etc.
As it was, all I could see was that the endpoints on the two sides could not reach each other.
Both MXes could see each other's MAC address in their ARP tables but could not ping each other, yet each could still ping other devices (the gateway router, other public IP devices) in the same public-side IP subnet.
There was zero ability to see that the tunnel was "down" other than a red bar in the GUI, and zero ability to see lifetimes, keepalives, etc.
That info should all be in the GUI and monitorable in SolarWinds (or another SIEM/NMS).
So much more information exists about an IPsec tunnel, and none of it is visible in the GUI or in the logs.
And there's no "button" to reset just one tunnel without dropping the others.