All Systems Operational
Website: Operational (99.97% uptime over the past 90 days)
Graph rendering: Operational (99.95% uptime over the past 90 days)
Ingestion: Operational (99.96% uptime over the past 90 days)
Alerting: Operational (99.95% uptime over the past 90 days)
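
To put the uptime percentages above in perspective, the following minimal sketch converts an uptime percentage into the implied downtime over the window. It assumes the percentage is measured over a full 90-day window; the function name `downtime_minutes` is hypothetical and is not part of any status page API.

```python
def downtime_minutes(uptime_percent: float, days: int = 90) -> float:
    """Return the downtime, in minutes, implied by an uptime percentage.

    Assumes the percentage covers a continuous window of `days` days.
    """
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = days * 24 * 60  # 129,600 minutes in 90 days
    return (100.0 - uptime_percent) / 100.0 * total_minutes


if __name__ == "__main__":
    # Uptime figures as listed for each component.
    components = {
        "Website": 99.97,
        "Graph rendering": 99.95,
        "Ingestion": 99.96,
        "Alerting": 99.95,
    }
    for name, uptime in components.items():
        print(f"{name}: about {downtime_minutes(uptime):.1f} minutes of downtime")
```

Under that assumption, 99.97% uptime corresponds to roughly 39 minutes of downtime in 90 days, and 99.95% to roughly 65 minutes.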
www.hostedgraphite.com uptime
Interface health: TCP
Interface health: UDP
Interface health: StatsD
Interface health: HTTP API
Interface health: carbon relay (pickle)
Graph render time (95th percentile)
Interface health: Heroku integration
AWS connectivity (US-East-1)
AWS connectivity (US-West-1)
Past Incidents
Apr 18, 2019

No incidents reported today.

Apr 17, 2019

No incidents reported.

Apr 16, 2019

No incidents reported.

Apr 15, 2019
Resolved - As of 12:35 UTC, the affected datapoints have been replayed and are all available for querying.
Apr 15, 12:44 UTC
Identified - Since 12:07 UTC, we have been experiencing a delay in processing metrics. The issue was caused by traffic being incorrectly routed to an unhealthy node. We've removed the node, and the affected datapoints are being replayed.

No ingested data has been lost.
Apr 15, 12:37 UTC
Apr 14, 2019

No incidents reported.

Apr 13, 2019
Resolved - Between 15:04 UTC and 15:09 UTC, our render API returned HTTP 503 errors in response to requests.

Since 15:09 UTC, the render API has been fully stable.
Apr 13, 15:18 UTC
Apr 12, 2019
Resolved - Our render layer has remained stable since 10:20 UTC. This incident is resolved.
Apr 12, 11:32 UTC
Monitoring - At 10:20 UTC, we restarted some of the affected webservers in order to clear existing load. The render API has been fully operational since then.

We continue to investigate this incident and are monitoring to make sure it does not recur.
Apr 12, 10:26 UTC
Investigating - Since 10:10 UTC, render requests have been returning an HTTP 503 error. We are investigating the situation.
Apr 12, 10:15 UTC
Apr 11, 2019

No incidents reported.

Apr 10, 2019

No incidents reported.

Apr 9, 2019

No incidents reported.

Apr 8, 2019

No incidents reported.

Apr 7, 2019

No incidents reported.

Apr 6, 2019

No incidents reported.

Apr 5, 2019
Resolved - We have replayed and verified all affected data.

This incident is resolved.
Apr 5, 09:11 UTC
Update - We have replayed 80% of the affected data, which is available for querying again.
We expect the remaining replays to complete within the next couple of hours.

We'll resolve this issue tomorrow once we have verified the replayed data.
Apr 4, 22:12 UTC
Update - The replay is underway and some of the data from the affected period is now available. More data will become available as the replay continues. We expect this process to take several more hours, and we will provide further updates when we have new information.
Apr 4, 17:54 UTC
Monitoring - At 16:48 UTC, we started replaying data from the affected period. Because this is a significant amount of data, we are expanding our aggregation layer and limiting the replay process to a subset of hosts, so that both the persistent storage backend and the aggregation servers responsible for replaying the data remain stable. We will provide additional updates on the status of this replay in an hour or when we have further information.
Apr 4, 16:51 UTC
Identified - Up to 3% of metrics ingested between 04:11 UTC on April 3 and 11:51 UTC on April 4 may not have been persisted to our backend storage layer for 5-minute resolution data. All other resolutions are unaffected. Our leading-edge cache is protecting up to 16 hours of this data, which is still available for querying.
We've identified an issue that affected several nodes across our backend storage layer for 5-minute resolution data during this time period.

We are currently working on replaying the affected data from this time period. No data has been lost, and we will provide an update on the status of the data replay in an hour or when we have further information.
Apr 4, 15:48 UTC