Pitfalls of regression analysis: case study

I began monitoring this external lighting circuit at a retail park in the autumn of 2016. It seems from the scatter diagram below that it exhibits weekly consumption which is well-correlated with changing daylight availability expressed as effective hours of darkness per week.

The only anomaly is the implied negative intercept, which I will return to later; when you view actual against expected consumption, as below, the relationship seems perfectly rational:

Consumption follows the annual sinusoidal profile that you might expect.

But what about that negative intercept? The model appears to predict close to zero consumption in the summer weeks, when there would still be roughly six hours a night of darkness. One explanation could be that the lights are actually habitually turned off in the middle of the night for six hours when there is no activity. That is entirely plausible, and it is a regime that does apply in some places, but not here. For evidence see the ‘heatmap’ view of half-hourly consumption from September to mid November:

As you can see, lighting is only off during hours of daylight; note by the way how the duration of daylight gradually diminishes as winter draws on. But the other very clear feature is the difference before and after 26 October when the overnight power level abruptly increased. When I questioned that change, the explanation was rather simple: they had turned on the Christmas lights (you can even see they tested them mid-morning as well on the day of the turn-on).

So that means we must disregard that week and subsequent ones when setting our target for basic external lighting consumption. This puts a different complexion on our regression analysis. If we use only the first four weeks’ data we get the relationship shown with a red line:

In this modified version, the negative intercept is much less marked and the data-points at the top right-hand end of the scatter are anomalous because they include Christmas lighting. There are, in effect, two behaviours here.
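The effect of mixing two behaviours in one regression can be sketched in a few lines of Python. The weekly figures here are invented purely for illustration (they are not the retail park's readings), and `fit_line` is just a hypothetical helper name. The point is only that fitting through weeks that carry an extra load drags the intercept negative, while fitting the base-load weeks alone does not.

```python
def fit_line(x, y):
    """Ordinary least-squares straight line; returns (gradient, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x

# Invented data: hours of darkness per week and kWh per week.
# First four weeks: basic lighting only (assumed 10 kW whenever it is dark).
# Last three weeks: an assumed extra 5 kW of Christmas lighting on top.
dark_hours = [80, 84, 88, 92, 96, 100, 104]
kwh = [800, 840, 880, 920, 1440, 1500, 1560]

gradient_all, intercept_all = fit_line(dark_hours, kwh)
gradient_base, intercept_base = fit_line(dark_hours[:4], kwh[:4])

print(f"all weeks:        {gradient_all:.1f} kWh/h, intercept {intercept_all:.0f} kWh")
print(f"first four weeks: {gradient_base:.1f} kWh/h, intercept {intercept_base:.0f} kWh")
```

With these made-up numbers the all-weeks fit has a steep gradient and a strongly negative intercept, whereas the first four weeks on their own give a line through the origin.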

The critical lesson we must draw is that regression analysis is just a statistical guess at what is happening: you must moderate the analysis by taking into account any engineering insights that you may have about the case you are analysing.

Lego shows why built form affects energy performance

This illustrates why building energy performance indicators can’t really be expected to work. Here we have four buildings with identical volumes and floor areas (same set of Lego blocks) but just look at the different amount of external wall, roof and ground-floor perimeter – even exposed soffit in two of them.

But all is not lost: there are techniques we can use to benchmark dissimilar buildings, in some cases leveraging submeters and automatic meter reading, but also using good old-fashioned whole-building weekly manual meter readings if that’s all we have. Join me for my lunchtime lecture on 23 February to find out more.

Advanced benchmarking of building heating systems

The traditional way to compare buildings’ fuel consumptions is to use annual kWh per square metre. When they are in the same city, evaluated over the same interval, and just being compared with each other, there is no need for any normalisation. So it was with “Office S” and “Office T” which I recently evaluated. I found that Office S uses 65 kWh per square metre and Office T nearly double that. Part of the difference is that Office T is an older building; and it is open all day Saturday and Sunday morning, not just five days a week. But desktop analysis of consumption patterns showed that Office T also has considerable scope to reduce its demand through improved control settings.

Two techniques were used for the comparison. The first is to look at the relationship between weekly gas consumption and the weather (expressed as heating degree days).

The chart on the right shows the characteristic for Office S. Although not a perfect correlation, it exhibits a rational relationship.

Office T, by contrast, has a quite anomalous relationship which actually looks like two different behaviours: a high one during the heating season and another in milder weather.

The difference in the way the two heating systems behave can be seen by examining their half-hourly consumption patterns. These are shown below using ‘heat map’ visualisations for the period 3 September to 10 November, i.e., spanning the transition from summer to winter weather. In an energy heatmap each vertical stripe is one day, midnight to midnight GMT from top to bottom and each cell represents half an hour. First Office S. You can see its daytime load progressively becoming heavier as the heating season progresses:
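For anyone wanting to build this kind of view themselves, the layout described above amounts to reshaping a flat run of half-hourly readings into a grid with one column per day and one row per half-hour slot. Here is a minimal sketch in Python, assuming the readings start at midnight and have no gaps; the function name is hypothetical and the readings are dummy values.

```python
def to_heatmap_grid(readings, slots_per_day=48):
    """Arrange a flat list of half-hourly readings as rows of time slots.

    grid[slot][day]: row 0 is 00:00-00:30, the last row is 23:30-24:00,
    and each column is one day, so days run left to right as vertical stripes.
    """
    if len(readings) % slots_per_day != 0:
        raise ValueError("readings must cover a whole number of days")
    n_days = len(readings) // slots_per_day
    return [
        [readings[day * slots_per_day + slot] for day in range(n_days)]
        for slot in range(slots_per_day)
    ]

# Two days of dummy readings, numbered 0..95 so the layout is easy to check.
readings = list(range(96))
grid = to_heatmap_grid(readings)

print(grid[0])   # first half-hour of each day
print(grid[47])  # last half-hour of each day
```

The resulting grid can be handed to any plotting tool that colours cells by value.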

Compare Office T, below. It has some low background consumption (for hot water) but note how, after its heating system is brought into service at about 09:00 on 3 October, it abruptly starts using fuel at similar levels every day:

Office T displays classic signs of mild-weather overheating, symptomatic of faulty heating control. It was no surprise to find that its heating system uses radiators with weather compensation and no local thermostatic control. In all likelihood the compensation slope has been set too shallow – a common and easily-rectified failing.

By the way, although it does not represent major energy waste, note how the hot water system evidently comes on at 3 in the morning and runs until after midnight seven days a week.

This case history showcases two of the advanced benchmarking techniques that will be covered in my lunchtime lecture in Birmingham on 23 February 2017.

Air-compressor benchmarking

Readers with reliably-metered compressed-air installations are invited to participate in an exercise using a comparison technique called parametric benchmarking.

Background

Traditionally, air-compressor installations have been benchmarked against each other by comparing their simple specific energy ratios (SER) expressed typically as kWh per normal cubic metre. However, as this daily data kindly supplied by a reader shows, there may be an element of fixed consumption which confounds the analysis because the SER will be misleadingly higher at low output:

It seems to me that the gradient of the regression line would be a much better parameter for comparison; broadly speaking, on a simple thermodynamic view, one would expect similar gradients for compressors with the same output pressure, and differences would imply differences in the efficiency of compression. The intercept on the other hand is a function of many other factors. It may include parasitic loads; it will certainly depend on the size of the installation, which the gradient should not.
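A small numerical sketch shows why the SER misleads when there is an element of fixed consumption, and why the regression gradient does not suffer in the same way. The figures are invented (an assumed 200 kWh per day of fixed load and 0.11 kWh per normal cubic metre of variable load); they are not the reader's data, and `fit_line` is a hypothetical helper.

```python
def fit_line(x, y):
    """Ordinary least-squares straight line; returns (gradient, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x

FIXED_KWH = 200.0    # assumed fixed daily consumption
KWH_PER_NM3 = 0.11   # assumed variable consumption per normal cubic metre

air_nm3 = [1000, 2000, 4000, 8000]                    # daily air output
kwh = [FIXED_KWH + KWH_PER_NM3 * v for v in air_nm3]  # daily electricity

# Simple specific energy ratio: total kWh divided by total output.
ser = [e / v for e, v in zip(kwh, air_nm3)]
gradient, intercept = fit_line(air_nm3, kwh)

for v, s in zip(air_nm3, ser):
    print(f"{v:>5} Nm3/day -> SER {s:.3f} kWh/Nm3")
print(f"gradient {gradient:.3f} kWh/Nm3, intercept {intercept:.0f} kWh/day")
```

In this example the SER more than doubles between the highest and lowest output days even though nothing about the compressor has changed, while the gradient stays at the assumed 0.11 kWh per normal cubic metre.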

I am proposing to run a pilot exercise pooling anonymous data from readers of the Energy Management Register to try “parametric” benchmarking, in which the intercepts and gradients of regression lines are compared separately.

Call for data

Participants must have reliable data for electricity consumption and air output at either daily or weekly intervals: we will also need to know what compressor technology they use, the capacity of each compressor, and the air delivery pressures.

In terms of the metered data the ideal would be to have an electricity and air meter associated with each individual compressor. However, metering arrangements may force us to group compressors together, the aim being to create the smallest possible block model whose electricity input and air output is measurable.

Please register your interest by email to vilnis@vesma.com with ‘compressor benchmarking’ in the subject line: once I have a reasonable group of participants I will approach them for the data.

Vilnis Vesma

4 January 2017

The meaning of R-squared

In statistical analysis the coefficient of determination (more commonly known as R²) is a measure of how well variation in one variable explains the variation in something else, for instance how well the variation in hours of darkness explains variation in electricity consumption of yard lighting.

R² varies between zero, meaning there is no effect, and 1.0, which would signify total correlation between the two with no error. It is commonly held that higher R² is better, and you will often see a value of (say) 0.9 stated as the threshold below which you cannot trust the relationship. But that is nonsense, and one reason can be seen from the diagrams below, which show how, for two different objects, energy consumption on the vertical or y axis might relate to a particular driving factor or independent variable on the horizontal or x axis.

In both cases, the relationship between consumption and its driving factor is imperfect. But the data were arranged to have exactly the same degree of dispersion. This is shown by the CV(RMSE) value, which is the root mean square deviation expressed as a percentage of the average consumption. R² is 0.96 (so-called “good”) in one case but only 0.10 (“bad”) in the other. But why would we regard the right-hand model as worse than the left? If we were to use either model to predict expected consumption, the absolute error in the estimates would be the same.
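The same point can be reproduced numerically. The sketch below uses invented data, not the data behind the diagrams: two consumption series share identical scatter about their best-fit lines and the same average, but one depends strongly on its driving factor and the other only weakly. CV(RMSE) is computed here with the number of points as the divisor, which is an assumption on my part; some conventions divide by the degrees of freedom instead.

```python
from math import sqrt

def fit_line(x, y):
    """Ordinary least-squares straight line; returns (gradient, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x

def r2_and_cvrmse(x, y):
    """Return (R-squared, CV(RMSE) in percent) for a straight-line fit."""
    gradient, intercept = fit_line(x, y)
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - (intercept + gradient * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot, 100 * sqrt(ss_res / n) / mean_y

driver = [1, 2, 3, 4, 5, 6, 7, 8]
# Scatter chosen to sum to zero and to be uncorrelated with the driver,
# so both fits recover their underlying lines and leave identical residuals.
scatter = [20, -20, -20, 20, 20, -20, -20, 20]

steep = [325.0 + 50 * x + e for x, e in zip(driver, scatter)]  # strong dependence
flat = [536.5 + 3 * x + e for x, e in zip(driver, scatter)]    # weak dependence

r2_steep, cv_steep = r2_and_cvrmse(driver, steep)
r2_flat, cv_flat = r2_and_cvrmse(driver, flat)

print(f"steep: R2 = {r2_steep:.2f}, CV(RMSE) = {cv_steep:.2f}%")
print(f"flat:  R2 = {r2_flat:.2f}, CV(RMSE) = {cv_flat:.2f}%")
```

Both series give the same CV(RMSE), yet R² is high for the steep one and low for the flat one, simply because the flat series has little variation for the driving factor to explain.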

By the way, if anyone ever asks how to get R² = 1.0 the answer is simple: use only two data points. By definition, the two points will lie exactly on the best-fit line through them!

Another common misconception is that a low value of R² in the case of heating fuel signifies poor control of the building. This is not a safe assumption. Try this thought experiment. Suppose that a building’s fuel consumption is being monitored against locally-measured degree days. You can expect a linear relationship with a certain R² value. Now suppose that the local weather monitoring fails and you switch to using published degree-day figures from a meteorological station 35 km away. The error in the driving factor data caused by using remote weather observations will reduce R² because the estimates of expected consumption are less accurate; more of the apparent variation in consumption will be attributable to error and less to the measured degree days. Does the reduced R² signify worse control? No; the building’s performance hasn’t changed.
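The thought experiment can be run as a toy calculation. Everything here is invented: a building whose weekly consumption follows the local degree days exactly, and a remote station whose weekly figures differ from the local ones by an arbitrary pattern of errors. For a straight-line fit, R² equals the square of the correlation coefficient, which is what the hypothetical `r_squared` helper computes.

```python
def r_squared(x, y):
    """R-squared of a straight-line fit of y on x (squared correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

# Invented weekly degree days measured on site.
local_dd = [10, 20, 30, 40, 50, 60, 70, 80]
# A perfectly controlled building (assumed): consumption tracks local weather.
kwh = [500 + 45 * dd for dd in local_dd]

# Assumed discrepancy between the remote station and the site, week by week.
station_error = [10, -10, -10, 10, 10, -10, -10, 10]
remote_dd = [dd + err for dd, err in zip(local_dd, station_error)]

r2_local = r_squared(local_dd, kwh)
r2_remote = r_squared(remote_dd, kwh)

print(f"against local degree days:  R2 = {r2_local:.2f}")
print(f"against remote degree days: R2 = {r2_remote:.2f}")
```

The consumption figures are identical in both cases; only the quality of the driving-factor data has changed, and R² falls accordingly.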

Degree-day base temperature

When considering the consumption of fuel for space heating, the degree-day base temperature is the outside air temperature above which heating is not required, and the presumption is that when the outside air is below the base temperature, heat flow from the building will be proportional to the deficit in degrees. Similar considerations apply to cooling load, but for simplicity this article deals only with heating.

In UK practice, free published degree-day data have traditionally been calculated against a default base temperature of 15.5°C (60°F). However, this is unlikely to be truly reflective of modern buildings and the ready availability of degree-day data to alternative base temperatures makes it possible to be more accurate. But how does one identify the correct base temperature?

The first step is to understand the effect of getting the base temperature wrong. Perhaps the most common symptom is the negative intercept that can be seen in Figure 1 which compares the relationships between consumption and degree days. This is what most often alerts you to a problem:

It should be evident that in Figure 1 we are trying to fit a straight line to what is actually a curved characteristic. The shape of the curve depends on whether the base temperature was too low or too high, and Figure 2 shows the same consumptions plotted against degree days computed to three different base temperatures: one too high (as Figure 1), one too low, and one just right.

Notice in Figure 2 that the characteristics are only curved near the origin. They are parallel at their right-hand ends, that is to say, in weeks when the outside air temperature never went above the base temperature. The gradients of the straight sections are all the same, including of course the case where the base temperature was appropriate. This is significant because although in real life we only have the distorted view represented by Figure 1, we now know that the gradient of its straight section is equal to the true gradient of the correct line.
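Why the characteristics are parallel in cold weeks but not in mild ones can be seen with a simplified degree-day calculation. The sketch below assumes degree days are just the sum of daily shortfalls of mean outside temperature below the base (published figures are generally computed by more elaborate methods), and the temperatures and the second base temperature are invented.

```python
def weekly_degree_days(daily_mean_temps, base_temp):
    """Simplified heating degree days: sum of daily shortfalls below base."""
    return sum(max(0.0, base_temp - t) for t in daily_mean_temps)

cold_week = [2, 3, 1, 4, 5, 3, 2]        # never above either base temperature
mild_week = [12, 14, 16, 18, 17, 13, 11] # sometimes above the lower base

LOW_BASE, HIGH_BASE = 15.5, 20.0

diff_cold = (weekly_degree_days(cold_week, HIGH_BASE)
             - weekly_degree_days(cold_week, LOW_BASE))
diff_mild = (weekly_degree_days(mild_week, HIGH_BASE)
             - weekly_degree_days(mild_week, LOW_BASE))

print(f"cold week shifts by {diff_cold:.1f} degree days")
print(f"mild week shifts by {diff_mild:.1f} degree days")
```

In the cold week every day contributes the full difference between the two base temperatures, so the point shifts by seven times that difference (31.5 degree days here); in the mild week some days contribute less, so the point shifts by a smaller amount and the characteristic bends.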

So let’s revert to our original scenario: the case where we had a single line where the base temperature was too high. Figure 3 shows that a projection of the straight segment of the line intersects the vertical axis at -1000 kWh per week, well below the true position, which from Figure 1 we can judge to be around 500 kWh per week. The gradient of the straight section, incidentally, is 45 kWh per degree day.

To correct the distortion we need to shift the line in Figure 3 to the left by a certain number of degree days so that it ends up looking like Figure 4 below. The change in intercept we are aiming for is 1,500 kWh (the difference between the apparent intercept of -1,000, and the true intercept, 500*). We can work out how far left to move the line by dividing the required change in the intercept by the gradient: 1,500/45 = 33.3 degree days. Given that the degree-day figures are calculated over a 7-day interval, the required reduction in base temperature is 33.3/7 = 4.8 degrees.

Note that only the points in the straight section moved the whole distance to the left: in the curved sections, the further left the point originally sits, the less it moves. This can best be visualised by looking again at Figure 2.

In more general terms the base-temperature adjustment is given by (Ct - Ca)/(m × t) where:

Ct is the true intercept;
Ca is the apparent intercept when projecting the straight portion of the distorted characteristic;
m is the gradient of that straight portion; and
t is the observing-interval length in days

* The intercept could be judged or estimated by a variety of methods including: empirical observations like averaging the consumption in non-heating weeks; by ‘eyeball’; or by fitting a curved regression line, etc.
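As a check on the arithmetic, the general expression can be written as a short Python function and fed the figures from the worked example (true intercept 500 kWh, apparent intercept -1,000 kWh, gradient 45 kWh per degree day, weekly data). The function and parameter names are my own, chosen for illustration.

```python
def base_temperature_adjustment(true_intercept, apparent_intercept,
                                gradient, interval_days):
    """Degrees by which the base temperature should be changed.

    true_intercept      Ct, kWh per interval
    apparent_intercept  Ca, kWh per interval, from projecting the straight section
    gradient            m, kWh per degree day
    interval_days       t, length of the observing interval in days
    """
    return (true_intercept - apparent_intercept) / (gradient * interval_days)

adjustment = base_temperature_adjustment(
    true_intercept=500,
    apparent_intercept=-1000,
    gradient=45,
    interval_days=7,
)
print(f"base temperature adjustment: {adjustment:.1f} degrees")
```

This gives 4.8 degrees to one decimal place, matching the two-step calculation in the text.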

A new dark age?

Is this the worst energy dashboard ever?

It’s an anonymised but accurate reconstruction of something I recently saw touted as an example of a ‘visual energy display’ suitable for a reception area. Apart from patently being an advertisement for an equipment supplier — name changed to protect the innocent (guilty?) — the only numerical information in the display is in small type against a background which makes it hard to read. Also, one might ask, “so what?”. There is no context. What proportion was 3.456 kWh? What were we aiming for? What is the trend?

There’s a bigger picture here: in energy reporting generally, system suppliers have descended into “content-lite” bling warfare (why do bar charts now have to bounce into view with a flourish?). And nearly always the displays are just passive and uncritical statements of quantities consumed. Anybody who wants to display energy information graphically should read Stephen Few’s book Information Dashboard Design. It is clear that almost no suppliers of energy monitoring systems have ever done so, but perhaps if their customers did, and became more discerning and demanding, we might see more useful information and less meaningless noise and clutter.

Flexible degree-day service

The FlexDD service from Degree Days Direct is a framework for delivering degree-day data. It allows you to create Excel energy workbooks with degree-day tables in them which update themselves automatically from the cloud.

Having subscribed to the observing stations you require, you’ll receive an Excel workbook linked to your account with some initial tables built in which you can customise as required.

This is a typical Excel table. In each column you specify the observing station (a), heating or cooling mode (b), and base temperature (c). Put datestamps at (d) and copy the output formulae into the table (e).

Available base temperatures for heating are at whole-number increments from 10°C to 25°C. For cooling the range is 5°C to 30°C. For compatibility with legacy reports, additional base temperatures of 15.5°C and 18.5°C heating and 15.5°C cooling are also provided.

You can clone the worksheet to have both monthly and weekly reports if you wish, and your weekly reports can end on any day of the week.

Pricing

There is an initial setup charge of £50, with per-station subscription charges of £12 per annum. Prices exclude VAT. Orders can be placed by emailing sales@vesma.com.

Energy balance: debunk bogus product claims

One of the most powerful basic concepts for the energy manager to understand is that of the energy balance, i.e., that all the energy you put into a system comes out again as energy in one form or another. This fundamental principle enables you to spot at least some of the dodgy offerings out there.

Take, for example, any product that claims to increase the efficiency of a heating boiler by improving heat transfer: if it does so, it can only do so by increasing the quantity of heat absorbed from the flame. This leaves less heat in the exhaust gas and so reduces the flue-gas temperature. If the treatment doesn’t reduce the exhaust temperature it hasn’t worked, and the extent of temperature reduction indicates how much improvement there has been.

Likewise with voltage reduction. Unless operating at reduced voltage somehow improves the energy-conversion efficiency* of the connected equipment, any saving in energy purchased (input) must be manifested as a reduction in output (light, mechanical effort or heat) from the equipment. Hence you can only save input energy if you can tolerate reduced output. You certainly cannot, as one product shamelessly claims, recover the saved energy and store it for use later.

*Electric motors do change efficiency with voltage. When trying to provide the same mechanical output at reduced voltage, the current in their windings has to increase to compensate, and because this increases the resistive heating effect, the result is a small increase in power consumed – not a reduction.
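The voltage-reduction argument is easiest to see with the simplest possible load. The sketch below assumes a purely resistive heater of fixed resistance (the 52.9 ohm value is invented to give a round 1 kW at 230 V), for which power is voltage squared divided by resistance and all of the electrical input leaves as heat. It does not attempt to model motors or lighting.

```python
def resistive_power_w(volts, ohms):
    """Power drawn by a fixed resistance: P = V squared / R."""
    return volts ** 2 / ohms

HEATER_OHMS = 52.9  # assumed element resistance (about 1 kW at 230 V)

nominal_w = resistive_power_w(230.0, HEATER_OHMS)
reduced_w = resistive_power_w(230.0 * 0.95, HEATER_OHMS)  # 5% voltage reduction

# For a resistive heater the heat output equals the electrical input,
# so the saving in input is exactly the loss of output.
input_saving_w = nominal_w - reduced_w
saving_percent = 100 * input_saving_w / nominal_w

print(f"nominal: {nominal_w:.0f} W in, {nominal_w:.0f} W of heat out")
print(f"reduced: {reduced_w:.1f} W in, {reduced_w:.1f} W of heat out")
print(f"input saved = output lost = {input_saving_w:.1f} W ({saving_percent:.2f}%)")
```

A 5% cut in voltage saves just under 10% of the input energy in this case, but only because the heater delivers just under 10% less heat.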

Gas meter conundrum

A reader contacted me to say that he had a gas meter on his site that runs backwards when there is no demand. It is a rotary positive-displacement type (see picture) and I believe it is one of a number of sub-meters on what I know to be a sprawling industrial complex. His first thought had been that the gas in the downstream pipework might be warming up and expanding, pushing back through the meter, but a quick calculation showed that this would only account for about 10% of the observed volume. We established that the meter was installed the right way around, and had mechanical as well as digital readouts, which tallied with each other. I put the puzzle to the readers of the Energy Management Register to see what they thought. I got about twenty responses and here is an edited summary of what came back.

A few people asked questions and suggested some measurements that might be useful: for example wanting to know what sizes and types of equipment were on the network, and what the standing pressures were upstream and downstream of the meter (it can only run backwards if the downstream pressure is higher than upstream). One theory was that there is something pressurising the downstream side. This is not completely fanciful and one reader on a similar site mentioned that he had gas supplies at both ends, one of which had been capped off. A long-forgotten redundant but imperfectly-isolated second mains supply could easily be the culprit. On a big network it would feed back through the meter and into other branches if there is even the slightest pressure differential.

Another reader asked if there were any gas booster sets downstream of the meter, fitted to increase gas pressure above that of the supply main on high-output burners. When the burners shut off, any residual pressure could dissipate via a defective non-return valve back through the meter.

A lot of respondents asked no questions but fired off some inspired suggestions. One raised the possibility that groundwater was leaking into buried pipework and displacing gas. It would not need to be very deep for this to happen but presumably there would be obvious and dramatic consequences whenever the burners fired up after a prolonged idle spell. Several raised the possibility that air was being pumped in somehow — which could create a gas mixture that could detonate rather than ignite when next called on.

Other readers focussed on the upstream gas pressure and the possibility that something might be causing it to drop. For example, it might be cooling down during periods of no demand. If the upstream pipework is extensive this could draw back a larger volume via the meter than expansion downstream but this would have only a temporary effect, and only when other branches were not drawing gas. Two or three people raised the possibility that there were gas booster sets in the other upstream branches and that the suction from those would depress the upstream pressure. Indeed one reader had seen exactly this effect trip out a CHP plant on low pressure. Two ingenious folk suggested a Bernoulli effect, in which the problematic supply is teed off from a main and high-velocity gas passing the junction sucks gas from the branch.

One reader thought that the meter ought to be fitted with a ratchet to stop it turning backwards; I think this is normal with fiscal meters for obvious reasons, and it is not a bad point. If you stall a gear meter like the one in question, it stops the flow, and as there are safety implications in many of the ideas put forward, that seems like a good idea.

At the time of writing we are waiting to find out what was discovered. Meanwhile thanks to Bill Gysin, Mike Mann, Mike Muscott, Andrew Cowan, Vic Tuffen, Ben Davies-Nash, Jeremy Draper, James Pollington, John Perkin, Alan Turner, Bill Spragg, Mike Bond, Jonathan Morgan, James Ferguson, Neil Howison, Ian Hill, Tony Duffin, Neil Alcock, Peter Thompson and others for your insights.