Metrics missing from the dashboard
Incident Report for Pusher
Postmortem

Overview

For roughly 3-4 hours, historical stats were unavailable to customers. The cause was a misconfiguration in one of our stats pipeline components. No core functionality was affected.

Issue description

One of our components reads stats from the pipeline and writes them to an Amazon RDS instance. As part of an upgrade, we promoted a read replica to master. The promotion did not change the endpoint of the previous master, and this component was not updated to write to the new one, so it kept writing to the now-orphaned database.

How it was resolved

The core issue was resolved by centralizing this point of configuration, so that database endpoints can be changed in one place for all the services that rely on them. The stats issue was resolved by exporting the incorrectly written stats from the orphaned database, importing them into the new master, and then rerunning the stats aggregation jobs (these jobs produce the hourly/daily/weekly stats).
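As a rough illustration of the backfill described above, the sketch below copies rows from an orphaned database into the new master and then rebuilds hourly aggregates for the affected window. It is not the actual tooling used in this incident. SQLite stands in for the RDS database, and the table names (`raw_stats`, `hourly_stats`), the column names, and the `backfill_and_aggregate` helper are all hypothetical.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the hypothetical raw and hourly stats tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_stats ("
        "app_id INTEGER, ts INTEGER, messages INTEGER, "
        "PRIMARY KEY (app_id, ts))"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hourly_stats ("
        "app_id INTEGER, hour INTEGER, messages INTEGER, "
        "PRIMARY KEY (app_id, hour))"
    )


def backfill_and_aggregate(
    orphan: sqlite3.Connection, master: sqlite3.Connection, since_ts: int
) -> int:
    """Copy raw stats written to the orphaned database at or after `since_ts`
    (a Unix timestamp in seconds) into the master, then rebuild the hourly
    aggregates from the start of the hour containing `since_ts`.

    Returns the number of rows read from the orphaned database.
    """
    rows = orphan.execute(
        "SELECT app_id, ts, messages FROM raw_stats WHERE ts >= ?",
        (since_ts,),
    ).fetchall()

    hour_start = since_ts - since_ts % 3600

    # One transaction: either the whole backfill lands or none of it does.
    with master:
        # INSERT OR IGNORE skips rows whose primary key already exists,
        # which makes the copy safe to rerun.
        master.executemany(
            "INSERT OR IGNORE INTO raw_stats (app_id, ts, messages) "
            "VALUES (?, ?, ?)",
            rows,
        )
        # Drop and recompute the hourly rows for the affected window.
        master.execute("DELETE FROM hourly_stats WHERE hour >= ?", (hour_start,))
        master.execute(
            "INSERT INTO hourly_stats (app_id, hour, messages) "
            "SELECT app_id, ts - ts % 3600, SUM(messages) FROM raw_stats "
            "WHERE ts >= ? GROUP BY app_id, ts - ts % 3600",
            (hour_start,),
        )
    return len(rows)
```

In this sketch the copy is idempotent and the aggregates are recomputed from the raw rows instead of being patched, so rerunning the function after a partial failure should not double-count anything.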

Posted Jul 23, 2018 - 10:58 UTC

Resolved
All seems well! Contact us if you see anything strange. Goodnight! 👋
Posted Jul 18, 2018 - 23:34 UTC
Monitoring
All stats have been migrated to the correct database, and the aggregation process is finished. No data was lost, nor was any core functionality affected. We'll monitor for a little bit just to make sure all is well! 🤞
Posted Jul 18, 2018 - 23:21 UTC
Update
The missing data is being aggregated, and you should see your graphs filling in with it. This process is estimated to finish in around an hour or so, at which point the issue will be resolved! Thanks for your patience. 🚀
Posted Jul 18, 2018 - 20:52 UTC
Identified
Due to a misconfiguration after a database failover, our metrics were written to the wrong database for the last hour. We have switched to the correct database and are working on inserting the affected metrics into it.
Posted Jul 18, 2018 - 16:06 UTC
This incident affected: Channels Stats Integrations.