Stats graphs on dashboard.pusher.com are backed by a "logs" table in a MySQL database. Entries in this table are populated from a Kafka instance by a component called "stats-forwarder". At 01:56 UTC, stats-forwarder stopped forwarding stats to MySQL. We are still investigating the root cause, and we have improved our logging to aid that investigation.
A separate component called "stats-forwarder-production-tests" checks whether stats-forwarder is making progress. It correctly detected the problem, but because its alert was misconfigured, it did not page our on-call engineer. As a result, we did not become aware of the issue until 14:45 UTC, nearly 13 hours later. We have fixed the alert configuration.
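The post does not describe how stats-forwarder-production-tests measures progress. As a minimal sketch, assuming the check compares the forwarder's committed Kafka offset across two samples, a "stalled" condition might look like the code below. `Sample`, `is_stalled`, `check`, and the `page` callback are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Sample:
    """One observation of the forwarder's position in a Kafka partition (hypothetical)."""
    committed_offset: int  # last offset the forwarder has committed
    end_offset: int        # newest offset available in the partition


def is_stalled(previous: Sample, current: Sample) -> bool:
    """True if there is a backlog but the committed offset did not advance."""
    has_backlog = current.end_offset > current.committed_offset
    made_progress = current.committed_offset > previous.committed_offset
    return has_backlog and not made_progress


def check(previous: Sample, current: Sample, page: Callable[[str], None]) -> None:
    """Run the progress check and page on-call if the forwarder is stalled."""
    if is_stalled(previous, current):
        page("stats-forwarder is not making progress")


# Backlog grew from 120 to 150 while the committed offset stayed at 100: stalled.
check(Sample(100, 120), Sample(100, 150), page=print)
```

Note that an idle forwarder with no backlog is not reported as stalled. As this incident shows, the detection logic is only half of the check: its result must actually be routed to whoever is on call.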
Once we were aware of the issue, we restarted stats-forwarder, and it began repopulating the logs table. However, we found that stats-forwarder was dropping 5% of the stats. The cause was that the pool of connections from stats-forwarder to MySQL was unbounded in size: stats-forwarder kept opening new connections until the instance it runs on could not open any more, and the stats that needed those connections were dropped. The bug only manifested under high load, which in this case came from the large backlog of stats that had built up in Kafka. We have fixed it by bounding the size of the connection pool in stats-forwarder.
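To illustrate what "bounding the connection pool" means, here is a minimal sketch of a generic bounded pool. It is not stats-forwarder's actual code; `BoundedPool` and its methods are hypothetical, and `factory` stands in for whatever opens a MySQL connection. Many database client libraries expose the same idea as a setting (for example, Go's `database/sql` has `DB.SetMaxOpenConns`).

```python
import queue
import threading
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, Optional, TypeVar

T = TypeVar("T")


class BoundedPool(Generic[T]):
    """Connection pool that never has more than max_size connections open.

    Connections are created lazily. Once max_size are checked out, acquire()
    waits for one to be released instead of opening another, so load alone
    cannot exhaust the host's ability to open connections.
    """

    def __init__(self, factory: Callable[[], T], max_size: int) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._factory = factory
        self._idle: "queue.Queue[T]" = queue.Queue()
        # One permit per connection that may be checked out at once.
        self._permits = threading.BoundedSemaphore(max_size)

    def acquire(self, timeout: Optional[float] = None) -> T:
        """Check out a connection, waiting up to `timeout` seconds for a free slot."""
        if not self._permits.acquire(timeout=timeout):
            raise TimeoutError("no connection became available")
        try:
            return self._idle.get_nowait()  # reuse an idle connection if there is one
        except queue.Empty:
            pass
        try:
            return self._factory()  # otherwise open a new one, still within the bound
        except BaseException:
            self._permits.release()  # give the slot back if connecting failed
            raise

    def release(self, conn: T) -> None:
        """Return a connection to the pool and free its slot."""
        self._idle.put(conn)
        self._permits.release()

    @contextmanager
    def connection(self, timeout: Optional[float] = None) -> Iterator[T]:
        """Context manager that always returns the connection to the pool."""
        conn = self.acquire(timeout)
        try:
            yield conn
        finally:
            self.release(conn)
```

The key design choice is that excess demand waits rather than failing: with a bounded pool, a large backlog slows the forwarder down instead of causing dropped writes. A caller that does hit the timeout would presumably retry the write rather than discard the stat.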
The dropped 5% of stats cannot be recovered, so the usage shown in the dashboard for 2018-10-09 is slightly lower than actual usage. We apologize for the loss of these stats and for the delay in displaying stats on the dashboard.