A further update - my issue turned out to be time synchronisation on the VMX. Its clock was too far out of sync with the SAML provider's.
Just in case anyone else runs into this issue. About once a year, when running in AWS, I see this event:
msg: SAML: Assertion is expired or not valid
The VMX and the SAML provider have to have their clocks synchronised to within 3 minutes of each other. Any more than that, and you get the above error. When the VMX runs in AWS, it seems to source its time from the host (rather than NTP).
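To illustrate why clock drift produces this error, here is a minimal Python sketch of the kind of time check a SAML service provider performs. It is not the VMX's actual code. The function names are hypothetical, the 3-minute figure is simply the tolerance I observed above, and the way the validity window is built around the issue time is an assumption for the example.

```python
from datetime import datetime, timedelta, timezone

# Tolerance observed in this post; an assumption, not a documented constant.
MAX_SKEW = timedelta(minutes=3)


def skew_within_tolerance(local_time, idp_time, max_skew=MAX_SKEW):
    """Return True if the two clocks differ by no more than max_skew."""
    return abs(local_time - idp_time) <= max_skew


def assertion_time_valid(now, not_before, not_on_or_after):
    """Time check on a SAML assertion: NotBefore <= now < NotOnOrAfter."""
    return not_before <= now < not_on_or_after


# Hypothetical assertion issued at 12:00 UTC, valid 3 minutes either side.
issued = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
not_before = issued - MAX_SKEW
not_on_or_after = issued + MAX_SKEW

# A receiver whose clock is 2 minutes fast still accepts the assertion.
ok_clock = issued + timedelta(minutes=2)
# A receiver whose clock is 5 minutes fast sees the assertion as expired.
drifted_clock = issued + timedelta(minutes=5)

print(assertion_time_valid(ok_clock, not_before, not_on_or_after))
print(assertion_time_valid(drifted_clock, not_before, not_on_or_after))
```

The assertion itself is fine in both cases. Only the receiver's idea of "now" changes, which is why fixing the clock (rather than anything on the SAML provider) makes the error go away.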
The fix is to shut down the VMX and then start it again. This moves it to a new host, which 99.999% of the time has the correct time.