All datapoints from the impacted aggregation server have been successfully written out to our persistent storage backend.
Feb 13, 16:49 UTC
All datapoints have now been restored and normal service has resumed. We will resolve this incident when the datapoints from the impacted server have been written out to our persistent storage backend.
Feb 13, 15:49 UTC
The replay has been completed for the long-term resolution (1 hour), so graphs over long periods have been repaired. We're replaying the missing data for the other resolutions, which will repair shorter-term graphs over the next couple of hours.
Feb 13, 13:27 UTC
We have identified a failure in one of our aggregation servers that resulted in the loss of data from its leading edge cache. This affects leading edge reads for approximately 2% of all metrics, at all resolutions.
Feb 13, 13:00 UTC