U IS FOR UNCERTAINTY

When we plot energy consumption against driving-factor values on a scatter diagram, the points don’t fall exactly on the regression line. The degree of scatter is summarised by a statistic called the ‘coefficient of determination’, commonly known as R-squared, which tells us how much of the variation in energy consumption is explained by the regression model. When all the points fall exactly on the line, the model explains all the variation in energy consumption and R-squared has a value of one. If there were no relationship at all between consumption and the chosen driving factor, R-squared would be zero. If R-squared is 0.9, it means that the model explains 90% of the observed variation in energy consumption, with the remaining 10% attributable to errors or to factors that were not taken into account.
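To make the definition concrete, here is a short Python sketch that fits a straight line to some invented consumption and degree-day figures and computes R-squared from the residuals. The numbers and variable names are illustrative assumptions, not measurements from any real building.

```python
# Illustrative sketch: fit a straight-line model of energy consumption
# against a driving factor (here, heating degree days) and compute
# R-squared. All figures are invented for demonstration purposes.
import numpy as np

degree_days = np.array([120, 95, 140, 60, 30, 10, 5, 25, 70, 110])
energy_kwh  = np.array([3400, 2800, 3900, 2100, 1500, 1100, 980, 1400, 2300, 3200])

# Least-squares fit: energy = slope * degree_days + intercept
slope, intercept = np.polyfit(degree_days, energy_kwh, 1)
predicted = slope * degree_days + intercept

# R-squared = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((energy_kwh - predicted) ** 2)
ss_tot = np.sum((energy_kwh - energy_kwh.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"R-squared: {r_squared:.3f}")  # e.g. 0.9 means 90% of variation explained
```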

There are two common misconceptions about R-squared. One is that on a heating system, a low value of R-squared signifies poor control. This is not necessarily the case, as the following thought experiment will show. Consider a well-controlled heating system whose consumption is assessed against a reliable local source of degree-day data. Whatever value of R-squared is observed, if you were to substitute degree-day statistics from a more distant weather station in the regression analysis, R-squared would go down, because the distant station’s weather is a poorer proxy for conditions at the site, even though the heating system continues to be just as well controlled. So beware: a low R-squared may be telling you more about the quality of the model and your data than about the behaviour of the thing you are monitoring.
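The effect is easy to demonstrate in a simulation. The sketch below regresses the same invented, well-controlled consumption data once against accurate local degree days and once against a noisier proxy standing in for a distant weather station. The noise levels are arbitrary assumptions; only the qualitative outcome matters.

```python
# Thought-experiment sketch: the same well-behaved system, regressed
# against accurate local degree days and then against a noisy proxy
# (standing in for a distant weather station). Figures are invented.
import numpy as np

rng = np.random.default_rng(42)
local_dd = rng.uniform(0, 150, size=200)                  # true local degree days
energy = 1000 + 20 * local_dd + rng.normal(0, 100, 200)   # well-controlled system

# Distant station: same underlying climate, but a noisier measure of
# the weather the building actually experienced
distant_dd = local_dd + rng.normal(0, 30, 200)

def r_squared(x, y):
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1 - residuals.var() / y.var()

print(f"R-squared vs local data:   {r_squared(local_dd, energy):.3f}")
print(f"R-squared vs distant data: {r_squared(distant_dd, energy):.3f}")  # lower
```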

The other common misconception is that there is a threshold for R-squared (0.75, or 0.9, or whatever) below which your regression model cannot be trusted. There is no such cut-off. If you have chosen the most relevant driving factor and a straight-line model is plausible, you have the right model; a low R-squared just means its predictions are less precise than they could be. In practice that simply means a deviation has to be bigger before it can be treated as a genuine anomaly rather than something that happened by chance. By refining the model you will improve your ability to discriminate between real faults and random variation. So it is not a question of whether or not you trust the model; the question is: “given a plausible model, how much uncertainty is there in its predictions?” Hence the idea, introduced in an earlier bulletin, of tuneable +/- control limits on charts showing the history of deviation from expected consumption.
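One way to set such limits, sketched below, is to take a tuneable multiple of the regression’s standard error. The multiplier of 2 and the data (reused from the first sketch) are illustrative assumptions, not a prescribed method.

```python
# Minimal sketch of tuneable +/- control limits around expected
# consumption. The multiplier and all figures are illustrative.
import numpy as np

degree_days = np.array([120, 95, 140, 60, 30, 10, 5, 25, 70, 110])
energy_kwh  = np.array([3400, 2800, 3900, 2100, 1500, 1100, 980, 1400, 2300, 3200])

slope, intercept = np.polyfit(degree_days, energy_kwh, 1)
residuals = energy_kwh - (slope * degree_days + intercept)

# Standard error of the estimate: n - 2 degrees of freedom for a
# two-parameter straight-line model
std_error = np.sqrt(np.sum(residuals ** 2) / (len(energy_kwh) - 2))

multiplier = 2.0              # tuneable: wider limits tolerate more random variation
limit = multiplier * std_error

# A new observation is flagged only if it deviates more than the limit
new_degree_days, new_energy = 80, 2900
expected = slope * new_degree_days + intercept
deviation = new_energy - expected
flagged = abs(deviation) > limit
print(f"expected {expected:.0f} kWh, deviation {deviation:+.0f} kWh, "
      f"limit +/-{limit:.0f} kWh, exceptional: {flagged}")
```

A low-R-squared model simply produces a larger std_error, so the same multiplier yields wider limits: the model is still trusted, but smaller deviations are written off as noise.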
