Elevated 5XX and 4XX errors on mt1 cluster
Incident Report for Pusher
Resolved
A postmortem will follow.
Posted Jan 14, 2020 - 19:38 UTC
Monitoring
All HTTP API traffic on mt1 is being directed to working instances.

We are still seeing a small number of requests reach faulty instances. We believe this is because some clients are not respecting the 60-second DNS TTL. If you are still seeing errors, please clear your DNS cache.

We are monitoring 4XX and 5XX error rates, as well as overall traffic, on this cluster.

A postmortem will follow.
Posted Jan 14, 2020 - 19:07 UTC
Update
DNS now directs all traffic to working instances. Around 10% of requests are still reaching faulty instances. We believe some of these arrive over existing Keep-Alive connections, so we are closing those connections to prompt clients to reconnect to working instances.

(Other requests may come from clients that do not respect the 60-second DNS TTL; in that case, users may need to clear their DNS caches.)
Posted Jan 14, 2020 - 18:48 UTC
Update
Users should no longer be receiving 5XX responses, though some may still receive 4XX responses. We are continuing to migrate traffic away from affected instances and expect all traffic to be migrated within 60 minutes.
Posted Jan 14, 2020 - 18:34 UTC
Update
We are continuing to move traffic back to instances unaffected by the 4XX and 5XX issues.
Posted Jan 14, 2020 - 18:09 UTC
Identified
We have identified the cause of the 4XX and 5XX responses and are rolling back a failed deployment. We expect error rates to fall as traffic migrates.

We'll update this with more details soon.
Posted Jan 14, 2020 - 17:47 UTC
Update
We are continuing to investigate this issue.
Posted Jan 14, 2020 - 17:21 UTC
Investigating
We're seeing an increase in 5XX and 4XX errors on the mt1 cluster and are investigating the cause.
Posted Jan 14, 2020 - 15:22 UTC
This incident affected: Channels REST API.