Our us2 cluster had issues on 2018-05-31 between around 07:14 and 09:03. The root case was an outage in AWS: the us-east-2 region lost all connection to the Internet for approximately 30 minutes. Internal networking in the region was unaffected, so, with one exception, our us2 cluster became available again when the connection between us-east-2 and the Internet was restored.
The one exception to automatic recovery was our webhook feature. Webhook events are placed on a queue and consumed by a webhook sender. However, this network outage triggered a bug in our webhook sender, which meant it stopped sending webhooks. We manually restarted the webhook sender. Webhook events were not lost, but were delayed by up to two hours.