This incident has been resolved.
Oct 17, 13:07 UTC
Our aggregation layer has suffered a further decrease in capacity, leading to backlogs of up to 5 minutes.
We have expanded capacity in our aggregation layer to help work through the backlogs.
Oct 17, 12:20 UTC
A fix has been implemented and we are monitoring the results.
Oct 17, 10:55 UTC
As of 10:52 UTC, our aggregation layer has returned to full health and all backlogs have been replayed.
We continue to monitor the situation.
Oct 17, 10:53 UTC
Since 10:12 UTC, network connectivity issues have caused datapoints to be dropped in our aggregation layer.
We have switched to a less strict healthcheck mechanism and are seeing recovery.
Backlogs of up to 7 minutes are currently being replayed.
This will have delayed the processing of datapoints, leading to gaps in graphs and causing alerts to trigger in error.
No data has been lost.
Oct 17, 10:35 UTC