Timeouts have subsided and the service has returned to acceptable performance. We tuned some parameters so that the service can handle increased load, and deployed a new version with better logging and metrics so that we can react more quickly and devise a fix if future incidents occur.
Posted Feb 15, 2017 - 10:18 UTC
Investigating
At 09:50 UTC the Push Notifications service was restarted and returned to acceptable performance. We are deploying additional metrics to understand the issue and determine a fix.
Posted Feb 14, 2017 - 11:32 UTC
Identified
The problem has been narrowed down to a load issue that occurs when a very large batch of APNS publishes is requested within a short timeframe. We are improving our metrics collection to better understand the issue.
Posted Feb 13, 2017 - 14:33 UTC
Investigating
There appear to be intermittent request timeouts on our Push Notifications service. We're actively investigating the cause.