All Systems Operational
Website Operational
Graph rendering Operational
Ingestion Operational
Status legend: Operational, Degraded Performance, Partial Outage, Major Outage, Maintenance
System Metrics
www.hostedgraphite.com uptime
Interface health: TCP
Interface health: UDP
Interface health: StatsD
Interface health: HTTP API
Interface health: carbon relay (pickle)
Graph render time (95th percentile)
Interface health: Heroku integration
AWS connectivity (US-East-1)
AWS connectivity (US-West-1)
Past Incidents
Feb 24, 2017

No incidents reported today.

Feb 23, 2017
While we were performing maintenance on some of our systems today, our metric naming service was impacted. This delayed the appearance of newly created metrics in the system and prevented users from deleting metrics from the UI.

The system was impacted between 10:40 and 11:20 UTC. No data has been lost, and metrics created during this time interval will appear with all previously sent data intact as soon as a new datapoint is received for them.
Feb 23, 11:31 UTC
Feb 22, 2017

No incidents reported.

Feb 21, 2017

No incidents reported.

Feb 20, 2017

No incidents reported.

Feb 19, 2017

No incidents reported.

Feb 18, 2017

No incidents reported.

Feb 17, 2017

No incidents reported.

Feb 16, 2017
Resolved - This issue has now been resolved.

We have identified a service that was making an excessive number of connections to our leading edge cache. This caused intermittent failures when requesting data, resulting in partial graphs. The offending service has now been shut down.
Feb 16, 19:30 UTC
Investigating - We are currently investigating an issue with our leading edge cache that is affecting graph renders for a small percentage of metrics. You can expect to see gaps for some metrics at the leading edge of graph renders.
Feb 16, 17:31 UTC
Feb 15, 2017

No incidents reported.

Feb 14, 2017

No incidents reported.

Feb 13, 2017
Resolved - All datapoints from the impacted aggregation server have been successfully written out to our persistent storage backend.
Feb 13, 16:49 UTC
Monitoring - All datapoints have now been restored and normal service has resumed. We will resolve this incident when the datapoints from the impacted server have been written out to our persistent storage backend.
Feb 13, 15:49 UTC
Update - The replay has been completed for the long-term resolution (1 hour), so graphs over long periods have been repaired. We're replaying the missing data for the other resolutions, which will repair shorter-term graphs over the next couple of hours.
Feb 13, 13:27 UTC
Identified - We have identified a failure in one of our aggregation servers that resulted in the loss of data from its leading edge cache, affecting leading edge reads for approximately 2% of all metrics at all resolutions.
Feb 13, 13:00 UTC
Feb 12, 2017

No incidents reported.

Feb 11, 2017

No incidents reported.

Feb 10, 2017
Resolved - This incident has been resolved.
Feb 10, 21:01 UTC
Investigating - Our Heroku integration doesn't seem to be correctly receiving all log-based metric data. Our internal monitoring suggests that only a small portion of datapoints is being correctly ingested.
Feb 10, 20:35 UTC
Resolved - We have successfully replayed all resolutions. All gaps in graphs for the 2% of affected metrics should now be filled in.
Feb 10, 05:45 UTC
Update - We have successfully replayed all affected data for 3600s, 300s, and 30s resolutions. Gaps in graphs for those resolutions should now be filled in. We are currently working on replaying the 5s data.
Feb 10, 05:25 UTC
Identified - We have identified a failure in one of our aggregation servers that resulted in the loss of data from its leading edge cache, affecting leading edge reads for approximately 2% of all metrics at all resolutions. You can expect 3600s data for this percentage of metrics to be missing from Feb 8 03:23 UTC, 300s from 10:38 UTC, 30s from Feb 10 01:08 UTC, and 5s from Feb 10 03:00:12 UTC. We are currently working on replaying the data from our backups, and these gaps should fill in soon.
Feb 10, 05:00 UTC