As of 11:55 UTC, we have identified an issue with our config tooling that incorrectly disabled health checks for our web servers, leaving them unable to communicate with our long-term storage layer. This caused failed reads across all resolutions for 100% of renders between 11:30 UTC and 11:55 UTC. As a result, alerts may have fired incorrectly and graphs may have rendered empty for any renders during the affected period.
We have rolled back this change and reads have returned to normal. This incident is now resolved.
May 07, 2019 - 12:04 UTC
As of 11:30 UTC, we are experiencing partial renders across all resolutions due to an issue reading from our long-term storage layer that affects up to 80% of reads. We are currently investigating this issue.