Few statistics are quoted more often by empirical researchers than r-squared. While I applaud the value of an intuitive interpretation in principle, it is pretty clear that the usual interpretation is wrong. Apart from honesty, the main reason I care about this is that it gets me into trouble with (the more discerning) students.
Not for the first time, a student recently came back to me with a query. I had given him some data, and the task was to draw some kind of causality diagram using correlations, partial correlations and common sense (for the causality). They had just had the class on r-squared, so the idea was to put these on the arrows.
The student was interested in checking the interpretation of r-squared. So he broke the y-variable down into groups of equal x-values (x was discrete). He looked at the standard deviation of Y for each group (using PivotTables). He compared these with the overall standard deviation and found that the within-group standard deviation was, on average, about 40% of the overall. So 60% is explained by X. Yet the correlation was about 0.9, and we say 81% is explained.
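The student's check is easy to reproduce. The sketch below uses simulated data, not the student's: a discrete x from 1 to 10, with noise whose spread (`noise_sd`) is an assumed value chosen so that the correlation lands near 0.9. The function names are made up for illustration.

```python
import math
import random
import statistics
from collections import defaultdict


def pearson_r(xs, ys):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def within_group_sd_ratio(xs, ys):
    """Average within-group SD of y (grouped by x) divided by the overall SD of y."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    group_sds = [statistics.stdev(g) for g in groups.values()]
    return statistics.fmean(group_sds) / statistics.stdev(ys)


# Simulated data (assumption): 500 observations at each x in 1..10.
rng = random.Random(0)
noise_sd = 1.39  # assumed value, picked so that r comes out near 0.9
xs = [x for x in range(1, 11) for _ in range(500)]
ys = [x + rng.gauss(0, noise_sd) for x in xs]

r = pearson_r(xs, ys)
ratio = within_group_sd_ratio(xs, ys)

print(f"r                     = {r:.3f}")
print(f"r squared             = {r * r:.3f}")
print(f"within SD / overall SD = {ratio:.3f}")
print(f"sqrt(1 - r squared)    = {math.sqrt(1 - r * r):.3f}")
```

With data like this the within-group standard deviation should come out at a bit over 40% of the overall standard deviation, which matches the square root of 1 minus r-squared rather than 1 minus r-squared itself.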
I had to tell him that the common interpretation of r-squared is wrong but ubiquitous and that I had hoped he wouldn’t notice!
The problem, of course, is that we can explain 81% of the variance. But variance does not measure variability (or anything sensible?). Standard deviation does. This being the case, it seems that we should redefine variation explained as
$1 - \sqrt{1 - r^2}$
which is always way smaller. Not that I am game to try! Maybe if we collectively came up with a better name we could get away with it. One possibility would be to incorporate this adjustment into the adjusted r-squared. In other words, substitute the adjusted r-squared into the above formula and call this variation explained. The downside of this is that the incorrectness of the interpretation of ordinary r-squared would then stand out like the proverbials.
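As a rough illustration of both versions, here is a minimal sketch. The function names are made up for this example, and the sample size `n = 30` and single predictor `k = 1` are assumed values, not figures from the student's data. The adjusted r-squared used is the usual one, 1 − (1 − r²)(n − 1)/(n − k − 1); plugging it into the formula gives one minus the ratio of the residual standard error to the sample standard deviation of Y.

```python
import math


def sd_explained(r_squared):
    """Share of the standard deviation explained: 1 - sqrt(1 - r^2)."""
    return 1.0 - math.sqrt(1.0 - r_squared)


def adjusted_r_squared(r_squared, n, k):
    """Usual adjusted r-squared for n observations and k predictors."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


r2 = 0.9 ** 2  # the correlation of about 0.9 from the student's example

# Plain version: about 56%, close to the 60% the student found.
print(round(sd_explained(r2), 3))  # 0.564

# Adjusted version, with hypothetical n and k (assumptions for illustration).
n, k = 30, 1
adj = adjusted_r_squared(r2, n, k)
print(round(adj, 3), round(sd_explained(adj), 3))
```

For any r-squared strictly between 0 and 1 the new measure is below r-squared, and the two agree only at 0 and 1.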