Monitoring - After rolling back the configuration changes, we are seeing recovery in ingestion, graph rendering, and alerting. We will continue to monitor the situation and provide another update in one hour.
Dec 12, 14:15 UTC
Update - We are still rolling back the configuration changes. We will provide another update once the rollback is complete.
Dec 12, 13:34 UTC
Identified - We have traced the issue to a DNS configuration change made earlier today. As a result, approximately 70% of ingestion traffic is failing, which leads to partial graphs and may cause alerts to fire incorrectly. We are rolling back the change and will provide updates as more information becomes available.
Dec 12, 12:33 UTC
Investigating - We are currently investigating an issue at our load-balancing layer that is affecting ingestion. We will post updates as more information becomes available.
Dec 12, 11:42 UTC
Website: Operational (99.97% uptime over the past 90 days)
Graph rendering: Degraded Performance (99.97% uptime over the past 90 days)
Ingestion: Partial Outage (99.94% uptime over the past 90 days)
Alerting: Degraded Performance (99.97% uptime over the past 90 days)
System Metrics
www.hostedgraphite.com uptime
Interface health: TCP
Interface health: UDP
Interface health: StatsD
Interface health: HTTP API
Interface health: carbon relay (pickle)
Graph render time (95th percentile)
Interface health: Heroku integration
AWS connectivity (US-East-1)
AWS connectivity (US-West-1)
Past Incidents
Dec 11, 2017

No incidents reported.

Dec 10, 2017

No incidents reported.

Dec 9, 2017

No incidents reported.

Dec 8, 2017

No incidents reported.

Dec 7, 2017

No incidents reported.

Dec 6, 2017

No incidents reported.

Dec 5, 2017

No incidents reported.

Dec 4, 2017

No incidents reported.

Dec 3, 2017

No incidents reported.

Dec 2, 2017

No incidents reported.

Dec 1, 2017

No incidents reported.

Nov 30, 2017

No incidents reported.

Nov 29, 2017

No incidents reported.

Nov 28, 2017
Resolved - All data has been replayed.
Nov 28, 10:17 UTC
Identified - One of our aggregation servers failed at 09:33 UTC, leaving leading-edge data unavailable for approximately 1% of all metrics at all resolutions. The server is now back online and we are replaying all data from the affected period.
Nov 28, 10:02 UTC