### Look at the stars

Modern econometrics software, such as PcGive (part of the OxMetrics suite), helpfully adds asterisks (or "stars") to test statistics to indicate statistical significance: one star for significance at the 5% level and two stars for significance at the 1% level. This can be useful for showing you quickly whether or not a model fails a diagnostic test, for example whether or not the residuals appear to exhibit heteroskedasticity.


Using the data on house prices in Baton Rouge, Louisiana in 1985, supplied by Hill, Griffiths and Judge (2001), we run a simple regression of house prices (in dollars) on plot size (in square feet). At first sight we appear to have a satisfactory model, with a positive coefficient on the independent variable and a t-value showing statistical significance even at the 1% level. The F statistic for the regression (here equivalent to the square of the t-value for the coefficient of the explanatory variable) is large enough for us to reject the null hypothesis of no relationship even at the 1% level. The P value shown after the F statistic, 0.000 to three decimal places, is clearly less than 0.01 (1%). PcGive highlights this by attaching two stars.

```
EQ( 1) Modelling price by OLS-CS (using b_rouge.xls)
       The estimation sample is: 1 to 213

                 Coefficient  Std.Error  t-value  t-prob  Part.R^2
Constant            -426.708       5061  -0.0843   0.933    0.0000
sqft                 46.0050      2.803     16.4    0.000    0.5608

sigma      8163.25     RSS         1.40607587e+010
R^2       0.560768     F(1,211) =  269.4 [0.000]**
```
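The equivalence between the overall F statistic and the square of the slope's t-value in a one-regressor model can be checked numerically. The sketch below uses synthetic data (not the actual Baton Rouge sample; the coefficients and noise level are illustrative assumptions), fits OLS by hand with numpy, and confirms that F = t² holds exactly:

```python
import numpy as np

# Illustrative sketch, NOT the b_rouge.xls data: simulate a bivariate
# regression and verify that the regression F statistic equals the
# square of the slope's t-value when there is a single regressor.
rng = np.random.default_rng(0)
n = 213                                   # same sample size as the example
sqft = rng.uniform(1000, 4000, n)
price = 50 * sqft + rng.normal(0, 8000, n)

X = np.column_stack([np.ones(n), sqft])   # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, price, rcond=None)
resid = price - X @ beta
dof = n - 2
sigma2 = resid @ resid / dof              # residual variance estimate
cov = sigma2 * np.linalg.inv(X.T @ X)     # OLS covariance matrix
t_slope = beta[1] / np.sqrt(cov[1, 1])    # t-value for the slope

# Overall F: explained sum of squares over residual variance
tss = ((price - price.mean()) ** 2).sum()
rss = resid @ resid
F = (tss - rss) / (rss / dof)

print(t_slope ** 2, F)                    # identical up to floating-point error
```

This identity is why PcGive's two stars on the F statistic here convey no information beyond the two stars already attached to the slope's t-value.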

However, a post-regression test for heteroskedasticity tells a different story. Using the test based on White (1980), PcGive runs an auxiliary regression of the squared residuals on the original regressor (in this case plot size in square feet) and the squared value of this variable. If the null hypothesis of homoskedasticity is correct we should get a low value for the test statistic (Chi-squared and F-form versions of the test are available) and an associated high probability value. In this case, however, because the spread of house prices is larger at higher plot sizes, we must reject the null hypothesis and accept the alternative; that is, we must conclude that there is a problem of heteroskedasticity. PcGive helpfully makes sure that we can see this instantly by attaching stars to the output.

```
Testing for heteroscedasticity using squares
Chi^2(2) = 11.048 [0.0040]** and F-form F(2,208) = 5.6892 [0.0039]**
```
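The auxiliary regression behind the Chi-squared form of the test can be sketched in a few lines. This is a hedged illustration with synthetic data in which the error spread grows with the regressor, standing in for the house-price sample; the test statistic is n times the R² of the auxiliary regression, compared with Chi²(2) critical values (5.99 at 5%, 9.21 at 1%):

```python
import numpy as np

# Sketch of the White (1980) test as described above: regress the
# squared OLS residuals on the regressor and its square, then use
# n * R^2 of that auxiliary regression as a Chi^2(2) statistic.
# Synthetic heteroskedastic data, not the Baton Rouge sample.
rng = np.random.default_rng(1)
n = 213
x = rng.uniform(1000, 4000, n)
y = 50 * x + rng.normal(0, 2.0 * x)      # error spread rises with x

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e2 = (y - X @ beta) ** 2                 # squared residuals

# Auxiliary regression: e^2 on a constant, x and x^2
Z = np.column_stack([np.ones(n), x, x ** 2])
gamma, *_ = np.linalg.lstsq(Z, e2, rcond=None)
aux_resid = e2 - Z @ gamma
r2_aux = 1 - aux_resid @ aux_resid / ((e2 - e2.mean()) ** 2).sum()

white_stat = n * r2_aux                  # large when variance depends on x
print(white_stat)
```

Under homoskedastic errors the statistic would typically fall well below the 5% critical value; here the built-in dependence of the variance on x drives it up, mirroring the rejection in the PcGive output.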

Clearly we must reconsider our model. A simple linear regression is inadequate, and we should consider an alternative functional form (perhaps a log-linear model). We should also add, and test the relevance of, other possible variables to explain variation in house prices, some of them in dummy variable form (such as the presence or absence of central heating).
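A minimal sketch of the kind of respecification suggested above: a log-linear model with an added dummy regressor. The variable names, coefficient values and the central-heating dummy are hypothetical illustrations, not taken from the Hill, Griffiths and Judge dataset:

```python
import numpy as np

# Hypothetical respecification sketch: log(price) on plot size plus a
# central-heating dummy. All parameter values here are made up for
# illustration; the point is the functional form, not the numbers.
rng = np.random.default_rng(2)
n = 213
sqft = rng.uniform(1000, 4000, n)
heating = rng.integers(0, 2, n)           # 1 = central heating present
log_price = 9.5 + 0.0004 * sqft + 0.15 * heating + rng.normal(0, 0.2, n)

X = np.column_stack([np.ones(n), sqft, heating])
beta, *_ = np.linalg.lstsq(X, log_price, rcond=None)

# In a log-linear model each slope is approximately the proportional
# change in price per unit change in the regressor; for the dummy,
# exp(beta) - 1 gives the exact proportional effect.
print(beta)
```

One reason the log-linear form is attractive here is that modelling the log of price often tames exactly the kind of variance growth that the White test flagged in levels.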

The asterisks (or stars) that appear in the PcGive output are a great help, but they should only be used to draw your attention to certain features of the results; they must not be used mechanically.

As this example illustrates, the presence or absence of stars can be something we welcome (as with the overall F statistic in the basic regression) or something that causes us to pause and consider what to do next (as with the heteroskedasticity test result). In all cases it is NOT appropriate to report your results simply by talking about the presence or absence of the stars. A full consideration of the null and alternative hypotheses, the test statistic and its meaning should be provided. Unfortunately I have seen assessed work handed in by some students where this doesn't happen and all I get is star gazing. I call this approach to econometrics "Yellow Econometrics"*. It is a kind of technological upgrade on the "cowboy econometrics" that I discussed in an earlier blog post (where people would just glance to see whether t-values were bigger than 2, rather than looking up the exact critical value from the tables).

* Based on the opening line of Coldplay's 2000 hit "Yellow"

"Look at the stars,

Look how they shine for you,

And everything you do,

Yeah they were all yellow"

**References**

Doornik, J A and Hendry, D F (1994-2008) PcGive Professional. Timberlake Consultants Limited.

Hill, R C, Griffiths, W E and Judge, G G (2001) Undergraduate Econometrics. Second Edition John Wiley and Sons Ltd.


White, H (1980) "A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity", Econometrica, 48, 817-838.

## Comments


Guy, do you have any tips on interpreting large negative constant terms? The variables I'm working with are all non-negative values (they are mostly dummy variables and some dollar amount variables). Is there something wrong with my model?
