Network Connectivity Issues

Incident Report for Hosted Graphite

Resolved

This incident has been resolved.

Posted Oct 17, 2019 - 13:07 UTC

Update

Our aggregation layer has suffered a further decrease in capacity, leading to backlogs of up to 5 minutes.

We have expanded capacity in our aggregation layer to help work through the backlogs.

Posted Oct 17, 2019 - 12:20 UTC

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Oct 17, 2019 - 10:55 UTC

Update

As of 10:52 UTC, our aggregation layer has returned to full health and all backlogs have been replayed.

We continue to monitor the situation.

Posted Oct 17, 2019 - 10:53 UTC

Investigating

As of 10:12 UTC, network connectivity issues have caused datapoints to be dropped in our aggregation layer.
We have switched to a less strict healthcheck mechanism and are seeing recovery.
Backlogs of up to 7 minutes are currently being replayed.

This will have caused delays in processing datapoints, leading to gaps in graphs and causing alerts to trigger in error.

No data has been lost.

Posted Oct 17, 2019 - 10:35 UTC

This incident affected: Graph rendering, Ingestion, and Alerting.