Monday, February 26, 2007

Guinness is good for you!

Students taking my course on Introductory Econometrics probably won’t recognise this slogan, but it was to be found on all bottles of Guinness when I was a student first studying econometrics at the end of the nineteen-sixties. Actually there is a connection between Guinness and econometrics, and an interesting story to go with it.

The link is Student’s t-distribution and its originator, Mr W S Gosset, who worked as a chemist and mathematician for the Guinness brewery a century ago.

In econometrics we regularly use the t-distribution to assess the significance of individual regression coefficients or to compute confidence intervals based on our sample estimates. We divide the coefficient estimate by its standard error and then check whether this calculated value exceeds (in absolute value) the relevant critical t value from the tables (based on the available degrees of freedom and the agreed significance level for the test – usually 5%). To get a 95% confidence interval for a parameter we take the point estimate plus or minus the estimated standard error multiplied by the appropriate t-value from the table with 0.025 in each tail.
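As a concrete (and entirely made-up) example, suppose we have a slope estimate of 0.8 with a standard error of 0.3 from a regression with 25 observations and two estimated parameters. A quick sketch of the test and the confidence interval in Python (using scipy, so the critical value comes from code rather than the tables):

```python
from scipy import stats

# Hypothetical numbers: estimate 0.8, standard error 0.3,
# 25 observations and 2 estimated parameters, so 23 degrees of freedom
beta_hat = 0.8
se = 0.3
df = 25 - 2

# t-ratio: coefficient estimate divided by its standard error
t_ratio = beta_hat / se

# critical value with 0.025 in each tail (5% two-tailed test)
t_crit = stats.t.ppf(0.975, df)

# reject H0: beta = 0 at the 5% level if |t| exceeds the critical value
significant = abs(t_ratio) > t_crit

# 95% confidence interval: point estimate +/- critical value x standard error
ci = (beta_hat - t_crit * se, beta_hat + t_crit * se)

print(round(t_ratio, 3), round(t_crit, 3), significant)
print(tuple(round(v, 3) for v in ci))
```

Here the t-ratio (about 2.67) exceeds the critical value (about 2.07), so the coefficient is significantly different from zero at the 5% level.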

As you may know, the t-distribution is rather like the normal distribution: it is symmetrical around its mean, with most of the values falling quite close to the mean and only a small amount of the area out in the extreme tails. Actually the tails are a bit “fatter” than those of the normal distribution, so if you were to use the normal distribution critical values of +1.96 and –1.96 to separate extreme values in the tails from those in the middle of the distribution, you would be slightly out in your assessment of the significance level of your hypothesis test, or in setting 95% confidence limits for parameter estimates. However, as you perhaps also know, the exact t-values depend on the number of degrees of freedom available to you in estimating the parameter(s), and the t-distribution approaches the normal distribution as the number of degrees of freedom increases – so with a big enough sample size it perhaps wouldn’t make much difference.
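You can see how quickly the t critical values settle down towards 1.96 with a couple of lines of Python (the particular degrees-of-freedom values here are just my choice for illustration):

```python
from scipy import stats

# 2.5%-in-each-tail critical values: normal versus t for increasing df
z_crit = stats.norm.ppf(0.975)           # approximately 1.96
for df in (5, 10, 30, 120, 1000):
    t_crit = stats.t.ppf(0.975, df)      # fatter tails -> larger critical value
    print(df, round(t_crit, 3))
```

With only 5 degrees of freedom the critical value is about 2.57, noticeably bigger than 1.96; by 120 degrees of freedom the difference has all but disappeared.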

Where does this t-distribution come from and why is it important? Let’s suppose first of all that a factory production line is filling bottles with an amount of liquid (Guinness maybe!) supposed to be 1 pint. It is not really possible for the technology to guarantee an exact amount of 1 pint each time – in reality the quantity dispensed will be a random variable which sometimes puts a bit more than a pint into a bottle and sometimes a bit less. It may be OK to assume that this random variable has a constant mean and variance, and even that the distribution is normal. In that case, if these two parameters were known in advance, it would be possible to set up the machinery in such a way that, say, 95% of the time there would be at least 1 pint in each bottle. The mean amount dispensed would have to exceed 1 pint, the gap between this figure and 1 pint obviously being smaller the lower the variance of the distribution of liquid dispensed.

The problem is that in most cases we won’t know either the mean of the distribution or its variance in advance. We will have to take a sample of values and use the sample mean to estimate the population mean, and then base our estimate of the population standard deviation on the standard deviation of our sample. (Actually the estimate of the standard deviation of the sampling distribution of the mean – called the standard error – will be the sample standard deviation divided by the square root of the sample size; see any basic statistics text.) In a situation like this it is quite likely that the sample size we work with will be small. After all, if we have to remove bottles from the production line in order to measure how much liquid they contain, those bottles and their contents cannot be sold. Taking a large sample would just be too costly.
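A minimal sketch of the sampling calculation in Python, with invented parameter values (a true mean of 1.02 pints and a true standard deviation of 0.01, which in practice the inspector would not know):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical fill process: true mean 1.02 pints, true sd 0.01 pints.
# The inspector sees only a small sample of 10 bottles.
sample = rng.normal(loc=1.02, scale=0.01, size=10)

mean = sample.mean()              # sample mean estimates the population mean
sd = sample.std(ddof=1)           # sample standard deviation (n - 1 divisor)
se = sd / np.sqrt(len(sample))    # standard error of the sample mean

print(round(mean, 4), round(se, 4))
```

It is precisely because `sd` here is itself an estimate, from only 10 observations, that the t-distribution rather than the normal is the right reference distribution.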
Fortunately Gosset discovered how the sampling distribution of the mean is affected by having to use a variance estimated from a small sample, rather than working with the known population value or an estimate based on a very large sample. This distribution has now come to be known as the t-distribution, or more formally, Student’s t-distribution. It is worth noting that Gosset had no computers to help him undertake the necessary simulations to arrive at his result. All his calculations had to be done by hand.

Having completed this work, Gosset naturally wanted to share his findings with other members of the statistics community through the usual method of a published journal article. However his employers at Guinness were not at all keen on this, and he had to resort to the ruse of publishing under the pseudonym “Student”. Hence “Student’s” t-distribution. (Actually he didn’t originally call it the t-distribution; it was his fellow statistician Fisher who gave it this name.)

You can read the original paper online if you wish:
Student [W S Gosset] (1908) The probable error of a mean. Biometrika, 6(1): 1–25.

And you can read more on William Sealy Gosset and his work for Guinness (as well as other leading statisticians of the early days such as Fisher, Pearson and others) in a fascinating book called The Lady Tasting Tea by David Salsburg.

Friday, February 16, 2007

Why least squares?

Introductory courses in econometrics quickly tell students about the use of the least squares criterion in estimating regression equation parameters. The difference between an actual value of the dependent variable Y and its fitted value Yhat is called the residual. Least squares estimators are produced in such a way as to minimise the sum of the squares of these residuals (RSS = Residual Sum of Squares).

Most students will accept that the slope and intercept of a fitted regression line need to be found by some kind of objective method. It is not very reliable just to put down a ruler and draw in the line that seems to give a good fit, balancing in some way the positive and negative errors. But why minimise the sum of the squares? There are other possible objective criteria that could be used. For example, why not just minimise the sum of the absolute deviations of the actual points from the fitted line? (It is easy enough to show that you couldn’t choose values to minimise the simple sum of the deviations, because the positive and negative errors would just cancel each other out, so you wouldn’t be able to get a solution this way.)
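To see the two criteria disagree in practice, here is a small Python sketch with made-up data containing one outlier: the least squares slope is dragged towards the extreme point, while a numerically minimised least-absolute-deviations line largely ignores it.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up data in which the last observation is an outlier
x = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.9, 30.0])

# OLS: closed-form solution minimising the sum of squared residuals
b_ols = np.cov(x, y, bias=True)[0, 1] / np.var(x)
a_ols = y.mean() - b_ols * x.mean()

# Least absolute deviations: no closed form, so minimise numerically
def sad(params):
    a, b = params
    return np.abs(y - a - b * x).sum()

res = minimize(sad, x0=[a_ols, b_ols], method="Nelder-Mead")
res = minimize(sad, x0=res.x, method="Nelder-Mead")   # restart to polish
a_lad, b_lad = res.x

print(round(b_ols, 2), round(b_lad, 2))
```

Note that the absolute-deviations criterion needs an iterative search where least squares has a one-line formula, which is one of the computational points made below.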

Minimising the squared deviations will apply a greater penalty to points that lie further away from the fitted regression line than if we worked only with the absolute distances of the points from the line. It is sometimes argued that this is a desirable feature of the estimation technique. But in some sense the residual that you get is an estimate of the disturbance for that observation, so you might alternatively ask why atypical observations with apparently big disturbances should be given extra importance. As Peter Kennedy points out in his book A Guide to Econometrics (page 12), even if you use the squared deviations rather than the absolute deviations as a way of getting over the problem of positive and negative errors cancelling out, you don’t have to give each of these squared deviations equal weight. Indeed, as you will find out later in your studies, there may be occasions where it is better to use Weighted Least Squares rather than Ordinary Least Squares as the criterion for determining your estimator.

Of course one advantage of least squares estimators is that they are computationally straightforward. The result of applying the criterion using basic methods of calculus is a simple formula for each regression coefficient in terms of the X and Y data (or more accurately their sums, sums of squares and sums of cross-products). Other estimation techniques might require iterative procedures to arrive at an estimate. Although that would be less of a worry in modern times given the advances in computer power, conceptually it seems attractive to have a formula that can always be used.
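That “simple formula” can be written down in a few lines of plain Python (the data here are invented purely for illustration):

```python
# OLS slope and intercept from the sums, sums of squares and cross-products
x = [2, 4, 6, 8, 10]
y = [3, 7, 8, 12, 15]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# slope and intercept from the normal equations
b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a = (sum_y - b * sum_x) / n

print(b, a)
```

No iteration is needed: the two coefficients drop straight out of the five sums, which is exactly the computational convenience being described.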

Least squares estimation might also seem to have a “natural” justification, as W W Sawyer suggested in his wonderful little book The Search for Pattern (published by Penguin Books in 1970 as part of a series called Introducing Mathematics, but now unfortunately out of print). In part of the chapter on algebra and statistics (pp. 312–313) he described a mechanical device that could provide a visual confirmation of the least squares criterion. The device consists of a solid piece of wood with nails or screws inserted at places supposed to correspond with the XY values on a graph. Sawyer suggested that a steel rod could be used for the line and the nails could be connected to the steel rod by elastic bands. To quote what he said: “Things must be arranged in such a way that, if the rod actually passed through one of the points, that point’s band would ‘feel satisfied’ – there would be no tension in it. But the further away the rod is the greater the tension in the band must be; in fact the tension must be proportional to the amount by which the rod misses the point. Things must be so arranged that the bands are compelled to remain upright, as shown in the figure. Each band then is doing its best for the point to which it belongs, and under all these conflicting pulls the rod would eventually come to rest in a position which represented a fair compromise.”

I well remember constructing a device of this kind when, with great excitement, I first began teaching econometrics back in the 1970s. I even painted the fitted least squares line on the wood so that students could see the rod settling exactly where the line was. Everything worked perfectly with the first group that I used it with, but later in the week I guess the elastic bands must have weakened and, just after I had held the board aloft with the metal rod sitting perfectly in place, one of the elastic bands snapped and the rod was fired across the room, narrowly missing one of the students. So my experiment with a physical representation of the least squares line was short-lived. Looking back at Sawyer’s book now I see that he does say that “the device may not be too easy to set up in actual practice”, a phrase that I must have missed in my enthusiasm at the time.

Another justification for using OLS that I remember from my own student days was that, despite its alleged limitations, it was quite robust to departures from its underlying assumptions. Thus, while it might be better to make use of a more complex estimation technique to overcome problems of heteroskedasticity or autocorrelation, those techniques actually require you to have a good idea of the form of autocorrelation or heteroskedasticity that you are going to allow for – something that you might not have. I remember my tutor at the time, Farouk El Sheikh (sadly now no longer with us), setting us an essay question: “‘In the country of the blind the one-eyed man is king.’ Discuss in relation to the use of OLS and other estimation techniques.” However he was somewhat taken aback by my answer, which pointed out that in H G Wells’ short story, The Country of the Blind, the fully blind inhabitants of the remote South American country eventually decided that the one-eyed man was insane because of the visions that he kept talking about, and so decided that he must be operated on to make him ‘normal’.

Perhaps sometimes visual and literary allusions in econometrics can be taken too far!

Monday, February 12, 2007

Nothing to prove *

I don’t really go in much for proofs in my Introduction to Econometrics (INEMET) course.

That doesn’t mean that proofs aren’t important in econometrics. Without a proof, how could we be sure that the least squares estimators are unbiased, given the classical assumptions, or that omitting a relevant explanatory variable from a model will not only cause bias in the estimation of the coefficients of the other variables but will also affect their standard errors and t-values?

Certainly anyone who wishes to pursue the study of econometrics beyond an introductory course will need to become familiar with proofs of these and other important propositions found in the textbooks. But beginners can be overwhelmed by all the technical stuff (as I can still remember from my own initial exposure to the subject back in the late 1960s!). It is more important for students who are just beginning their study of econometrics to get a good intuitive feel for the subject, its scope and methodology, than to grapple with formal proofs. So in this respect I go along completely with Christopher Dougherty, who says in the preface to his book Introduction to Econometrics (Third Edition) p vi “For nearly everyone, there is a limit to the rate at which formal mathematical analysis can be digested. If this limit is exceeded, the student spends much mental energy grappling with the technicalities rather than the substance, impeding the development of a unified understanding of the subject.”

That doesn’t mean that students have to just accept a whole set of results without any attempt being made to justify them. In a number of cases a convincing intuitive argument can be provided for the propositions in question, without the need to resort to a proof. Or alternatively a simple quantitative example can be used to support the argument.

Take the case of the formula for the standard error of the X coefficient in the simple linear regression model. As Dougherty shows mathematically (!) on page 83 of his book, the theoretical variance of the X coefficient is the variance of the disturbance term divided by the sum of squares of the deviations of the X variable from its sample mean. We can calculate the latter, but we have to estimate the former as we don’t observe the actual disturbances. The square root of this estimate of the variance is then the standard error that we use for the t-test and for computing confidence intervals for the parameter.

In my lectures I have tried to convince students of the result by using a simple spreadsheet (Excel) demonstration which amounts to a simple Monte Carlo experiment. I begin by setting up an assumed model with known intercept and slope parameters – say Y = 2 + 0.8X + u.

Next I create a sample of fixed X values in the spreadsheet. I usually centre the values on a mean of 100 and have maybe 12 values either side of that (so X runs from 88 to 112). Then I use the random number generator to create a large number of sets (say 500) of 25 values of u, initially making u ~ N(0,1). From this I can create 500 sets of Y values to go with the Xs. Now I can run regressions based on these 500 data sets and collect the 500 estimates of the slope coefficient. (You might prefer to set this up as a batch job in EViews or some other specialist econometric software package if you wish.) After that, plot a histogram of the beta hat values, and calculate the mean and standard deviation of the 500 values. (An interesting discussion point is whether you should measure the deviations around the true value 0.8 or around the mean of the 500 beta hat values in this calculation.) Compare these values with those predicted by the theory for the sampling distribution of beta hat.

Now you can repeat the process, varying first the variance of u (maybe make it smaller than 1 – say 0.5). You should see immediately that the variance of the beta hat estimates falls proportionately.

Then you can illustrate the effect of more spread out values of the Xs. Multiply each of them by 10 and recalculate all of the Y values (go back to the original standard normal distribution for u). Rerun the regressions and compare the distribution of the beta hat values. The standard deviation should be one tenth of what it was initially.
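For readers without the spreadsheet to hand, the same experiment can be sketched in a few lines of Python (numpy standing in for Excel’s random number generator; the model and sample layout are exactly as described above):

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed X values: 25 observations centred on 100 (88 to 112)
x = np.arange(88, 113, dtype=float)
n = len(x)
sxx = ((x - x.mean()) ** 2).sum()

# Assumed "true" model Y = 2 + 0.8X + u with u ~ N(0, 1)
alpha, beta, sigma = 2.0, 0.8, 1.0
reps = 500

beta_hats = np.empty(reps)
for r in range(reps):
    u = rng.normal(0.0, sigma, n)         # fresh disturbances each replication
    y = alpha + beta * x + u
    # OLS slope for this sample
    beta_hats[r] = ((x - x.mean()) * (y - y.mean())).sum() / sxx

# Compare the simulated spread with the theoretical standard deviation
theoretical_sd = sigma / np.sqrt(sxx)
print(round(beta_hats.mean(), 4), round(beta_hats.std(ddof=1), 4),
      round(theoretical_sd, 4))
```

Halving the variance of u, or multiplying all the X values by 10, and re-running the loop reproduces the comparative results described above.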

If all this takes too much time to do interactively, you could prepare everything in advance.

Another advantage of this exercise is that it introduces students to the idea of Monte Carlo studies and computer simulation at an early stage.

[1] Dougherty, C (2006) Introduction to Econometrics, Third Edition, Oxford University Press.
[2] Judge, G (1999) Simple Monte Carlo studies on a spreadsheet, CHEER, Volume 13, Issue 2.

* The phrase “Nothing to prove” is one that I always associate with the Sunderland striker David Connolly. He began his career at Watford, averaging a goal every two games, before he left for a spell at the Dutch side Feyenoord. A bit of a flop in Holland, he returned to England to play for Wimbledon, declaring that he had “nothing to prove”. This caused some amusement among Watford supporters, who thought of him as rather arrogant and used to label him W4BS (Watford’s 4th Best Striker).