Intermittent Grafana Error: "Grafana has failed to load its application files"
Incident Report for Hosted Graphite
Resolved
We are no longer experiencing elevated error rates or intermittent connectivity issues, and the affected user accounts have been fixed. All users should now be able to access Grafana.

The affected accounts were experiencing HTTP 500s because the connection limit for the MySQL backend had not been applied correctly, leaving a much lower default limit in place. Normal usage was enough to hit that lower limit, and because of how Grafana handles failed connections to our MySQL backend, the affected accounts saw elevated error rates.
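A check along the following lines could flag a connection limit that has silently fallen back to a default. This is a minimal sketch under stated assumptions, not the tooling used here: the function name and the example values are hypothetical, and it assumes the effective limit can be read back from the backend (for instance, if the limit in question were MySQL's `max_connections` variable, via `SHOW VARIABLES LIKE 'max_connections'`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LimitCheck:
    """Result of comparing the intended limit with the one in effect."""
    expected: int
    effective: int

    @property
    def applied(self) -> bool:
        # The configuration counts as applied only if the limit in effect
        # is at least as high as the one we intended to set.
        return self.effective >= self.expected


def check_connection_limit(expected: int, effective: int) -> LimitCheck:
    """Hypothetical helper: validate inputs and build the comparison result.

    `expected` is the limit the configuration was meant to set;
    `effective` is the value read back from the running backend.
    """
    if expected <= 0:
        raise ValueError("expected limit must be positive")
    if effective < 0:
        raise ValueError("effective limit cannot be negative")
    return LimitCheck(expected=expected, effective=effective)


if __name__ == "__main__":
    # Illustrative values only: an intended limit of 1000 against a
    # backend that is still running with a lower default.
    result = check_connection_limit(expected=1000, effective=151)
    if not result.applied:
        print(
            f"connection limit not applied: expected {result.expected}, "
            f"effective {result.effective}"
        )
```

Running a comparison like this after each configuration rollout, and alerting when `applied` is false, would surface the mismatch before normal traffic reaches the lower limit.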

From the 11th of July at 14:26 UTC to the 17th of July at 15:50 UTC (today), fewer than 1% of users were affected by this issue. These users experienced intermittent connectivity issues when accessing Grafana, and 1-4% of total Grafana requests returned HTTP 500 errors.

This issue is now resolved.
Posted Jul 17, 2019 - 16:06 UTC
Update
Between 1% and 4% of total Grafana requests are responding with the error "HTTP 500: Grafana has failed to load its application files". These failures affect fewer than 1% of users.

We believe that this is due to intermittent connectivity issues at the container level, which we are continuing to investigate.

While our investigation continues, refreshing the page will allow access to the instance of Grafana, as the connectivity issues are transient in nature. We'll provide a further update in 3 hours, or when we have new information.
Posted Jul 17, 2019 - 13:24 UTC
Monitoring
Between 1% and 4% of Grafana requests are responding with the error "HTTP 500: Grafana has failed to load its application files".

We believe this issue was introduced during the rollout of a new container image version, and we are currently investigating further.

While we develop a fix, refreshing the page will allow access to the Grafana instance, as the error only happens intermittently.
Posted Jul 17, 2019 - 09:57 UTC
This incident affected: Website.