All Systems Operational
Website Operational
Graph rendering Operational
Ingestion Operational
System Metrics
www.hostedgraphite.com uptime
Interface health: TCP
Interface health: UDP
Interface health: StatsD
Interface health: HTTP API
Interface health: carbon relay (pickle)
Graph render time (95th percentile)
Interface health: Heroku integration
AWS connectivity (US-East-1)
AWS connectivity (US-West-1)
Past Incidents
Jun 22, 2017

No incidents reported today.

Jun 21, 2017

No incidents reported.

Jun 20, 2017
Resolved - This incident is now resolved.
Jun 20, 14:05 UTC
Monitoring - We've identified and rolled back a configuration change that prevented our service discovery mechanisms from automatically removing faulty nodes. Our ingestion pipeline is now fully operational and the affected data is currently being replayed.

We will continue to monitor the situation until we're confident that it is resolved.
Jun 20, 13:22 UTC
Investigating - We're investigating an issue that's resulting in degraded performance of our ingestion pipeline (starting 11:25 UTC). A small percentage of data points being ingested aren't immediately available for rendering. No data has been lost at this stage and all data will be replayed once we apply fixes for the ongoing issue.
Jun 20, 12:27 UTC
Jun 19, 2017

No incidents reported.

Jun 18, 2017

No incidents reported.

Jun 17, 2017
Resolved - This incident is now resolved.
Jun 17, 03:32 UTC
Monitoring - Our service provider informed us that the issue was due to faulty network equipment, which was replaced. Our services were impacted between 01:54 and 02:55 UTC. Annotations sent between those times will have been dropped.

We will continue to monitor the situation until we're confident that it is resolved.
Jun 17, 03:10 UTC
Investigating - We're investigating an internal network outage within our service provider that has caused high render response times and impacted our data ingestion services and API access since approximately 01:54 UTC. Some rendered graphs might be temporarily missing data until we fully restore our services. Additionally, our Annotations service is currently unavailable.
Jun 17, 02:47 UTC
Jun 16, 2017

No incidents reported.

Jun 15, 2017
Resolved - We have restored the alerting service and normal operation has resumed. However, some alerts triggered between 17:15 UTC and 18:19 UTC will not have been processed.
Jun 15, 18:33 UTC
Investigating - We are currently investigating an issue that is causing degradation of our alerting service (starting 17:15 UTC). The majority of alerts will not be processed/sent until we identify and fix the root cause. We will update this status page once we have more information.
Jun 15, 18:13 UTC
Jun 14, 2017
Resolved - We have restored the alerting service and normal operation has resumed. However, some alerts triggered between 17:50 UTC and 18:25 UTC will not have been processed.
Jun 14, 18:49 UTC
Monitoring - We have performed the rollback as of 18:25 UTC and alerts are beginning to be processed and sent again. We are monitoring the situation as recovery continues.
Jun 14, 18:30 UTC
Identified - We've identified a problem with our alerting caused by a change rolled out to fix issues experienced last night, and we are currently rolling back this change. As of 17:50 UTC, some alerts will not be sent or processed until this change is fully rolled back.
Jun 14, 18:15 UTC
Resolved - We have restored the alerting service. However, any alerts triggered between 03:58 UTC and 05:44 UTC will have been impacted by the downtime, with only some of these being processed.
Jun 14, 06:13 UTC
Monitoring - Our alerting service is back online. To clarify the impact described in our previous update, you can expect alerts triggered during this period to have failed to reach your notification channels.
Jun 14, 05:58 UTC
Investigating - We are currently investigating an issue that is causing downtime for our alerting service. No alerts will be processed/sent until we identify and fix the root cause. We will update this status page once we have more information.
Jun 14, 05:26 UTC
Jun 13, 2017

No incidents reported.

Jun 12, 2017

No incidents reported.

Jun 11, 2017
Resolved - All datapoints from the impacted aggregation server have been successfully replayed and are now queryable.
Jun 11, 01:17 UTC
Update - We're replaying the missing data for all resolutions and will update when this process has completed.
Jun 11, 01:07 UTC
Identified - We have identified a failure in one of our aggregation servers that resulted in the loss of data from its leading edge cache, affecting leading edge reads for approximately 2% of all metrics at all resolutions.
Jun 11, 00:39 UTC
Jun 10, 2017

No incidents reported.

Jun 9, 2017
Resolved - The previous change has been successfully rolled back as of 17:38 UTC and use of the highestCurrentIndex and lowestCurrentIndex Graphite functions has returned to normal.

The graphs at hostedgraphite.com/app and /app/traffic are now rendering as expected.
Jun 9, 16:39 UTC
Update - We have identified the issue and are working on reverting a change that was rolled out at 11:20 UTC.
Jun 9, 16:19 UTC
Identified - We have identified an issue with an internal service that handles the highestCurrentIndex and lowestCurrentIndex Graphite functions. Any graphs using these functions will not currently render. This will also affect our user traffic graphs seen at hostedgraphite.com/app.
Jun 9, 15:43 UTC
Jun 8, 2017

No incidents reported.