All Systems Operational
Website: Operational (99.99 % uptime over the past 90 days)
Graph rendering: Operational (99.98 % uptime over the past 90 days)
Ingestion: Operational (100.0 % uptime over the past 90 days)
Alerting: Operational (99.99 % uptime over the past 90 days)
System metrics (live graphs, not captured):

* www.hostedgraphite.com uptime
* Interface health: TCP
* Interface health: UDP
* Interface health: StatsD
* Interface health: HTTP API
* Interface health: carbon relay (pickle)
* Graph render time (95th percentile)
* Interface health: Heroku integration
* AWS connectivity (US-East-1)
* AWS connectivity (US-West-1)
Past Incidents
Dec 13, 2018
Completed - The scheduled maintenance has been completed.
Dec 13, 05:30 UTC
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Dec 13, 03:01 UTC
Scheduled - Between 03:00 UTC and 05:30 UTC on December 13th our hosting provider will be performing essential maintenance to their network infrastructure. This maintenance is expected to cause widespread disruption across our services, resulting in:

* degraded ingestion
* intermittent access to the main site and application
* partial/empty renders
* delayed alerting

Once the maintenance window is over, we will monitor the situation and provide status updates for any services that remain affected.
Dec 10, 14:55 UTC
Dec 12, 2018

No incidents reported.

Dec 11, 2018
Resolved - Our services have finished working through the backlog and all affected datapoints are now available for querying.

This incident is now resolved.
Dec 11, 17:30 UTC
Monitoring - Our services are continuing to work through the backlogs of affected datapoints.

We will next update when the replay has been completed.
Dec 11, 16:34 UTC
Update - Our aggregation layer has returned to a healthy state.

Leading edge data has recovered and we continue to work through backlogs of affected datapoints. Datapoints received between 14:50 UTC and 15:20 UTC will continue to be delayed.
Dec 11, 15:38 UTC
Identified - As of 14:50 UTC, a configuration change has resulted in our aggregation layer being overloaded.
This will delay datapoint processing by a few minutes.

The configuration change has been reverted and we are seeing recovery in our aggregation layer.
Dec 11, 15:12 UTC
Completed - The scheduled maintenance has been completed.
Dec 11, 05:30 UTC
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Dec 11, 03:00 UTC
Scheduled - Between 03:00 UTC and 05:30 UTC on December 11th our hosting provider will be performing essential maintenance to their network infrastructure. This maintenance is expected to cause widespread disruption across our services, resulting in:

* degraded ingestion
* intermittent access to the main site and application
* partial/empty renders
* delayed alerting

Once the maintenance window is over, we will monitor the situation and provide status updates for any services that remain affected.
Dec 10, 14:42 UTC
Dec 10, 2018

No incidents reported.

Dec 9, 2018

No incidents reported.

Dec 8, 2018

No incidents reported.

Dec 7, 2018

No incidents reported.

Dec 6, 2018

No incidents reported.

Dec 5, 2018
Postmortem - A postmortem for this incident has been published.
Dec 6, 17:08 UTC
Resolved - We are no longer seeing any impact on ingested datapoints, and we are now marking this incident as resolved.

Between 18:01 UTC and 19:10 UTC, we experienced delays of up to 3 minutes in processing ingested metrics, which resulted in gaps in the leading edge of graphs.

From 19:10 UTC until 21:05 UTC, the processing rate of our ingestion pipeline improved, but we still saw occasional spikes of up to one minute in the processing delay for incoming datapoints.

By 21:05 UTC, our ingestion layer had recovered and normal operation had resumed.

We will publish a postmortem for this incident before the end of this week.
Dec 5, 21:50 UTC
Update - Our aggregation layer continues to work through the backlog of affected datapoints.

We will provide another update in one hour or when we have more information.
Dec 5, 20:24 UTC
Monitoring - Our aggregation layer has returned to a healthy state.

Leading edge data has recovered and we continue to work through backlogs of affected datapoints.
Dec 5, 18:44 UTC
Update - The configuration change has been reverted and we are seeing recovery in our aggregation layer.

Delays remain in the processing of datapoints.
Dec 5, 18:33 UTC
Identified - As of 18:01 UTC, a configuration change has caused an issue in our aggregation layer.
This will result in delays in processing datapoints.

We are reverting the change now.
Dec 5, 18:25 UTC
Dec 4, 2018

No incidents reported.

Dec 3, 2018

No incidents reported.

Dec 2, 2018

No incidents reported.

Dec 1, 2018

No incidents reported.

Nov 30, 2018
Resolved - As of 10:13 UTC, users should no longer see errors when deleting metrics. We identified and corrected an issue in the caching layer of our metric naming service that caused it to route delete requests incorrectly after the failover.

We will be taking steps to ensure that this does not occur again in future failovers.
Nov 30, 10:23 UTC
Investigating - Since 12:00 UTC on Thursday, 29 November, following a failover of our internal service that handles metric creation and deletion, users have seen errors when attempting to delete metrics. We are investigating the issue.
Nov 30, 10:06 UTC
Nov 29, 2018

No incidents reported.