Feb 20, 17:30 UTC
We have successfully expanded our capacity and the incident is now resolved. All systems are operating normally.
Feb 15, 18:43 UTC
There continues to be no user impact, and we have identified this as a capacity issue. We are expanding capacity in the 300s resolution storage cluster to enable writes again and prevent further occurrences. We will update again when the capacity issue has been resolved or we have further information.
Feb 15, 17:34 UTC
There continues to be no user impact. We are investigating a potential issue with the service that writes to our 300s storage cluster: duplicate objects are being written, which has led to increased load on the cluster. We will update again in two hours or when we have further information.
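For illustration only, here is a minimal sketch of how duplicate writes of this kind could be quantified from a write log; the log format, field positions, and function name below are assumptions, not our actual tooling:

    # Hypothetical sketch: estimate what fraction of write traffic is
    # duplicated. Assumes one write per log line, with the object key
    # as the first whitespace-separated field.
    from collections import Counter

    def duplicate_write_ratio(write_log_lines):
        keys = [line.split()[0] for line in write_log_lines if line.strip()]
        counts = Counter(keys)
        duplicates = sum(c - 1 for c in counts.values())
        return duplicates / len(keys) if keys else 0.0

A ratio well above zero would mean the cluster is repeatedly handling the same objects, consistent with the increased load described above.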
Feb 15, 15:32 UTC
Timeouts to our 300s resolution storage no longer occur and graphs should have returned to normal.
We've halted long-term writes to our 300s data storage to give it sufficient room to respond to requests. We are continuing our investigation of the issue and will update again in an hour or when we know more.
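As an illustrative sketch of what halting long-term writes can look like in a write path, assuming a simple in-process flag (our actual mechanism may differ):

    # Hypothetical kill switch: the live path keeps serving while the
    # 300s long-term path is disabled to shed load on the cluster.
    class Store:
        def __init__(self, name):
            self.name = name
            self.points = []

        def write(self, point):
            self.points.append(point)

    LONG_TERM_WRITES_ENABLED = False  # turned off to give the cluster headroom

    def store_datapoint(point, short_term, long_term):
        short_term.write(point)       # real-time path stays up
        if LONG_TERM_WRITES_ENABLED:
            long_term.write(point)    # 300s rollup path, currently halted

The point of this design is that reads and short-term writes are unaffected, which is why graphs return to normal while the long-term cluster recovers.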
Feb 15, 14:11 UTC
We are still seeing an elevated percentage of timeouts. We have identified an issue with unusually large objects being written to our storage backend responsible for 300s data and are currently working to address that.
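As a hypothetical sketch of the kind of guard that catches unusually large objects before they reach the storage backend (the size cap and names are assumptions for illustration):

    # Hypothetical size guard: flag objects above a cap before writing.
    MAX_OBJECT_BYTES = 1 * 1024 * 1024  # assumed 1 MiB cap

    def oversized(objects):
        """Yield (key, size) for objects exceeding the cap."""
        for key, payload in objects.items():
            if len(payload) > MAX_OBJECT_BYTES:
                yield key, len(payload)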
Feb 15, 13:36 UTC
At 12:10 UTC today we began seeing intermittent timeouts of up to 7% to the 300s resolution tier of our long-term storage, which could lead to some partial graphs.
At 12:35 UTC these timeouts increased to up to 11%. They are still ongoing, and we are investigating the issue.
Feb 15, 13:11 UTC