All Systems Operational
Website: Operational (100.0% uptime over the past 90 days)
Graph rendering: Operational (99.99% uptime over the past 90 days)
Ingestion: Operational (99.93% uptime over the past 90 days)
Alerting: Operational (100.0% uptime over the past 90 days)
System Metrics
www.hostedgraphite.com uptime
Interface health: TCP
Interface health: UDP
Interface health: StatsD
Interface health: HTTP API
Interface health: carbon relay (pickle)
Graph render time (95th percentile)
Interface health: Heroku integration
AWS connectivity (US-East-1)
AWS connectivity (US-West-1)
Past Incidents
Feb 22, 2018

No incidents reported today.

Feb 21, 2018

No incidents reported.

Feb 20, 2018

No incidents reported.

Feb 19, 2018

No incidents reported.

Feb 18, 2018

No incidents reported.

Feb 17, 2018

No incidents reported.

Feb 16, 2018

No incidents reported.

Feb 15, 2018
Postmortem - Read details
Feb 20, 17:30 UTC
Resolved - We have successfully expanded our capacity and the incident is now resolved. All systems are operating normally.
Feb 15, 18:43 UTC
Update - There continues to be no user impact and we have identified that this is a capacity issue. We are expanding capacity in the 300s resolution storage cluster to enable writes again and prevent any further occurrences. We will update again when the capacity issue has been resolved or we have further information.
Feb 15, 17:34 UTC
Update - There continues to be no user impact. We are investigating a potential issue with the service that writes to our 300s storage cluster: duplicate objects appear to have led to increased load on the cluster. We will update again in two hours or when we have further information.
Feb 15, 15:32 UTC
Monitoring - Timeouts to our 300s-resolution storage are no longer occurring, and graphs should have returned to normal.

We've halted long-term writes to our 300s data storage to give it sufficient room to respond to requests. We are continuing to investigate the issue and will update again in an hour or when we know more.
Feb 15, 14:11 UTC
Update - We are still seeing an elevated percentage of timeouts. We have identified an issue with unusually large objects being written to our storage backend responsible for 300s data and are currently working to address that.
Feb 15, 13:36 UTC
Identified - At 12:10 UTC today we began seeing intermittent timeouts, at a rate of up to 7%, to the 300s-resolution data in our long-term storage, which could lead to some partial graphs.
At 12:35 UTC, these timeouts increased to up to 11%. They are still ongoing, and we are investigating the issue.
Feb 15, 13:11 UTC
Feb 14, 2018
From 15:54 UTC to 16:33 UTC, we had connectivity issues to our leading-edge storage and long-term storage. Up to 5% of requests to leading-edge storage and up to 2% of requests to long-term storage failed, resulting in partial graphs. This has since been resolved and renders have returned to normal.

Ingestion and alerting were unaffected during this time period.
Feb 14, 17:30 UTC
Feb 13, 2018

No incidents reported.

Feb 12, 2018

No incidents reported.

Feb 11, 2018

No incidents reported.

Feb 10, 2018

No incidents reported.

Feb 9, 2018

No incidents reported.

Feb 8, 2018

No incidents reported.