All Systems Operational
Website: Operational (99.96% uptime over the past 90 days)
Graph rendering: Operational (99.96% uptime over the past 90 days)
Ingestion: Operational (99.96% uptime over the past 90 days)
Alerting: Operational (99.96% uptime over the past 90 days)
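As a rough illustration of what a 99.96 % uptime figure over a 90-day window means in practice, the sketch below converts an uptime percentage into implied minutes of downtime. It assumes the figure is simply the fraction of the window during which the component was operational; the page does not state how the number is computed, and the function name `downtime_minutes` is hypothetical.

```python
def downtime_minutes(uptime_percent: float, window_days: int = 90) -> float:
    """Minutes of downtime implied by an uptime percentage over a window.

    Assumption: uptime is the plain fraction of wall-clock time in the
    window during which the component was operational.
    """
    total_minutes = window_days * 24 * 60  # 90 days -> 129,600 minutes
    return total_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # 99.96 % over 90 days works out to roughly 52 minutes of downtime.
    print(round(downtime_minutes(99.96), 2))
```

Under that assumption, 99.96 % over 90 days corresponds to a little under an hour of cumulative downtime per component.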
www.hostedgraphite.com uptime
Interface health: TCP
Interface health: UDP
Interface health: StatsD
Interface health: HTTP API
Interface health: carbon relay (pickle)
Graph render time (95th percentile)
Interface health: Heroku integration
AWS connectivity (US-East-1)
AWS connectivity (US-West-1)
Past Incidents
Jun 20, 2018

No incidents reported today.

Jun 19, 2018

No incidents reported.

Jun 18, 2018

No incidents reported.

Jun 17, 2018
Resolved - This incident has been resolved.
Jun 17, 11:13 UTC
Monitoring - A fault within our hosting provider's infrastructure has resulted in severe network issues across all our services.

Between 10:25 and 10:29 UTC this resulted in the following impact:
* Ingestion: Our ingestion layer successfully processed all data points sent; however, during the outage a large backlog of data built up between the ingestion layer and the aggregation layer. This would have led to gaps in graphs, and possibly erroneously triggered alerts, until this data was successfully replayed at 10:45 UTC.
* Graph Rendering: 50% of renders failed during the affected period. This would have caused alerts to trigger in error.
* Website: The website may have been unavailable for a short time during the incident.
* Alerting: Valid alerts may not have been triggered during the incident.
Jun 17, 10:49 UTC
Jun 16, 2018

No incidents reported.

Jun 15, 2018

No incidents reported.

Jun 14, 2018
Resolved - Ingestion levels have returned to normal and this incident is now resolved.
Jun 14, 15:53 UTC
Monitoring - From 15:13 UTC to 15:30 UTC, approximately 10% of connections made to ingestion endpoints would have failed; this also affected 10% of UDP traffic. No data that we received has been lost.

We have traced the issue to a recent configuration change, which has been reverted.
Jun 14, 15:40 UTC
Jun 13, 2018

No incidents reported.

Jun 12, 2018

No incidents reported.

Jun 11, 2018

No incidents reported.

Jun 10, 2018
Resolved - Between 13:41 UTC and 14:00 UTC, alert evaluation was delayed and missing-metrics alerts would have fired incorrectly.
Jun 10, 14:13 UTC
Jun 9, 2018

No incidents reported.

Jun 8, 2018

No incidents reported.

Jun 7, 2018
Resolved - From 18:20 UTC to 18:48 UTC we experienced elevated rates of read errors from our long-term storage for 300s resolution data. Approximately 10% of graphs rendered during this period would have shown partial data.
Jun 7, 18:49 UTC
Jun 6, 2018

No incidents reported.