### Degrees of freedom and cowboy econometrics

In the glossary at the end of his “A Guide to Econometrics” text (Fifth Edition, 2003, p. 545) Peter Kennedy defines degrees of freedom as “...the number of free or linearly independent sample observations used in the calculation of a statistic”. In regression models we often see the number of degrees of freedom defined as “the number of observations minus the number of parameters to be estimated” – so for simple bivariate regression models that means n-2 (there are two parameters here: one fixing the slope of the line relating the variables and one fixing the intercept on the vertical axis).
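To make the n-2 rule concrete, here is a small sketch in Python (the data are made up purely for illustration): fit a bivariate regression and count the degrees of freedom left over after estimating the slope and intercept.

```python
import numpy as np

# Hypothetical sample of n = 10 observations on (X, Y)
x = np.arange(1.0, 11.0)
y = 2.0 + 0.5 * x + np.array([0.1, -0.2, 0.05, 0.3, -0.1,
                              0.2, -0.3, 0.1, 0.0, -0.15])

n = len(x)                    # number of observations
k = 2                         # parameters estimated: slope and intercept
df = n - k                    # degrees of freedom = n - 2 for bivariate regression

slope, intercept = np.polyfit(x, y, 1)
print(df)                     # → 8
```

With only two observations, df would be 2 - 2 = 0: no freedom at all, as the next paragraph illustrates.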

Take an extreme case where you only have two observations on (X,Y). You have no freedom in fitting the line at all as there is only one line that can be chosen to connect the two points (see Figure 1).

Anybody undertaking any serious applied work would be advised to ensure that they have a great many more observations than two. Not only do these extra observations provide some freedom in estimating the equation of the line, they also ensure that the 95% confidence intervals for the parameters are not too wide. The confidence interval will be the point estimate plus or minus the product of the standard error of the parameter estimate and the t-value leaving two and a half percent of the distribution in each tail (where we look up t with the appropriate number of degrees of freedom). A glance at the t-tables shows that the t-values fall as the number of degrees of freedom increases. For example t(0.025;10) = 2.228 while t(0.025;60) = 2.000.
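A quick check of those tabulated values, using SciPy's t-distribution in place of the printed t-tables (an illustrative sketch, not part of the original text):

```python
from scipy import stats

# Two-tailed 95% critical values: 2.5% in each tail, so take the 0.975 quantile
for df in (10, 60):
    t_crit = stats.t.ppf(0.975, df)
    print(f"{df}: {t_crit:.3f}")   # 10: 2.228, then 60: 2.000
```

The confidence interval for a parameter is then simply `estimate ± t_crit * standard_error`.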

When assessing the statistical significance of a variable in a regression (or more correctly of its accompanying parameter) the calculated t-value must be compared with the critical value from the tables. So, for example, if the calculated t-value were, say, 3.5 then we would be able to reject the null hypothesis that the parameter is zero and accept the alternative hypothesis. [If we have a strong a priori view about the sign of the parameter, as predicted by theory, we might use a one-tailed test, which would put all of the 5% significance level area into one tail and thus pick out a smaller critical value that has to be exceeded for the decision to be taken to reject the null.] Your computer software might also produce a figure for the P-value or probability value linked to the calculated statistic. This measures the area beyond the calculated value, in the tail(s). This provides an alternative way for you to decide whether to reject the null or not. You simply compare the P-value with 0.05 (i.e. 5%). If the P-value is < 0.05 then you can reject the null.

When I was a student the computer software was not that sophisticated and you definitely had to use the t-tables. A friend of mine on the same course never had his tables with him when he was doing the practical exercises set by the lecturer and was famous for saying “Just check if it's bigger than 2”. His reasoning was that whatever degrees of freedom might be appropriate, the t-value you got from the table was always approximately 2. See the examples I mentioned above and also Figure 2, which shows the tabulated t-values at 5% and 2½% for various degrees of freedom.
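You can see how rough my friend's rule is by printing the two-tailed 5% critical values for a range of degrees of freedom (a quick sketch with SciPy; the particular df values are my own choice):

```python
from scipy import stats

# Two-tailed 5% critical values (the 0.975 quantile) for various df
for df in (5, 10, 30, 60, 120):
    print(f"{df}: {stats.t.ppf(0.975, df):.3f}")

# As df grows the critical value falls toward the normal value 1.96,
# so "bigger than 2" is too lenient at small df and slightly strict at large df.
```

The values run from about 2.571 at 5 degrees of freedom down to about 1.980 at 120, straddling the magic number 2.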

Of course if you do this you will be slightly misrepresenting the actual significance level of the test. I called my friend’s approach to the subject “cowboy econometrics” (an analogy with “cowboy builders”, like the ones who worked on my house and didn’t properly measure the doors they were fitting: they shut OK, but they don’t fit snugly, so I get a draught under the gap at the bottom).

These days there really is no excuse not to do the tests properly. There are also some very nice online Java applets that will calculate either the probability value to go with any t-value (for a given number of degrees of freedom) or the t-value to go with a specified P-value. See, for example, the one produced by R Webster West of the Department of Statistics at Texas A&M University, from which I have taken the screen grab of the t-distribution applet shown in Figure 3.

So it turns out that they are not all cowboys in Texas!


## 2 Comments:

I've called cowboy econometrics the "TNT rule of thumb" (TNT goes for t-Near-Two)... Another "rule of thumb" like the Durbin-Watson-near-2.

That's good. I like memorable ideas like this.
