by Tim Cohn

USGS National Center MS:107

Reston, VA 22092

703/648-5711

$$B = (X'X)^{-1}X'Y$$

$$S^{2} = \frac{(Y-XB)'(Y-XB)}{n-k}$$

where Y is an n x 1 vector of responses, X is an n x k matrix of explanatory variables, B is a k x 1 vector of parameter estimates, and $S^{2}$ is the residual mean square, an unbiased estimate of the error variance. OLS has several advantages: it is easy to apply, and the properties of its estimates are well understood [see Draper and Smith, 1981]. OLS procedures are implemented in MINITAB with the REGRESS command.
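The two OLS formulas can be checked with a few lines of code. The sketch below is plain Python with no external libraries, chosen only to keep it self-contained; it is not how MINITAB's REGRESS command is implemented, and the function names are hypothetical. It solves the normal equations (X'X)B = X'Y by Gaussian elimination rather than forming (X'X)^{-1} explicitly, which yields the same B and is usually better behaved numerically.

```python
# Ordinary least squares from the normal equations:
#   B   = (X'X)^{-1} X'Y
#   S^2 = (Y - XB)'(Y - XB) / (n - k)
# Pure Python sketch; names are illustrative, not from MINITAB or AMLEREG.F77.


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # pivot row
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """Return (B, S2) for an n x k design matrix X (list of rows) and responses y."""
    n, k = len(X), len(X[0])
    Xt = transpose(X)
    B = solve(matmul(Xt, X), [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt])
    resid = [yi - sum(b * xij for b, xij in zip(B, row)) for row, yi in zip(X, y)]
    S2 = sum(e * e for e in resid) / (n - k)
    return B, S2
```

For example, `ols([[1, 0], [1, 1], [1, 2], [1, 3]], [1, 3, 5, 7])` returns B close to [1.0, 2.0] and S^2 close to 0, because those points lie exactly on y = 1 + 2x. The leading column of ones plays the role of the constant term.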

However, environmental data are sometimes subject to censoring; that is, some observations are reported only as "less than" an analytical detection limit. Statistical procedures for this situation have been studied extensively in the statistical and economics literature. Perhaps the most widely accepted method is the Tobit estimator, named after the economist James Tobin. The Tobit estimator is simply a maximum likelihood estimator, and its properties are well understood [see Chapter 18, Judge et al., 1980]. However, Cohn [1988], among others, has observed that the Tobit estimator can be substantially biased in some cases. Although it can be proven that the bias cannot be eliminated entirely, one can easily derive an estimator that is unbiased to first order. This is called an Adjusted Maximum Likelihood Estimator, or AMLE. A simple FORTRAN program, called AMLEREG.F77, has been written that implements both the Tobit estimator (MLE) and the AMLE.
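Because the Tobit estimator is a maximum likelihood estimator, it may help to see the function being maximized. For a normal regression with left censoring, assuming observation i is censored when its value falls below its own threshold c_i, the log-likelihood is

$$\ell(B,\sigma)=\sum_{y_i \ge c_i}\left[\log\phi\left(\frac{y_i-x_i'B}{\sigma}\right)-\log\sigma\right]+\sum_{y_i<c_i}\log\Phi\left(\frac{c_i-x_i'B}{\sigma}\right)$$

where φ and Φ are the standard normal density and distribution function. The Python sketch below evaluates this expression; the function names are hypothetical and are not taken from AMLEREG.F77, and the sketch does not attempt the AMLE bias adjustment.

```python
import math


def norm_logpdf(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def tobit_loglik(beta, sigma, X, y, c):
    """Log-likelihood of a normal regression with left censoring (Tobit).

    X : rows of explanatory variables (include a 1 for the constant)
    y : responses; for a censored observation only the fact y < c matters
    c : censoring threshold, one per observation
    """
    ll = 0.0
    for row, yi, ci in zip(X, y, c):
        mu = sum(b * x for b, x in zip(beta, row))  # fitted mean x_i'B
        if yi < ci:
            # censored: probability that the value lies below the threshold
            # (math.log fails if norm_cdf underflows to 0; acceptable for a sketch)
            ll += math.log(norm_cdf((ci - mu) / sigma))
        else:
            # uncensored: normal density of the observed value
            ll += norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return ll
```

Maximizing `tobit_loglik` over `beta` and `sigma` with any general-purpose optimizer gives the MLE; the choice of optimizer is left open here.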

- 1) The Tobit program uses likelihood ratio tests, rather than t-tests, to determine the p-value associated with each fitted parameter. These tests are not exact, but they appear to be quite good in typical circumstances.
- 2) Any transformations of the input data (e.g. taking logarithms) must be done before running TOBIT.
- 3) One should have at least 3(k+1) above-threshold observations. The properties of estimates based on fewer above-threshold observations have not been investigated.
- 4) The AMLE procedures have been tested with up to 80% censoring.
- 5) The user should specify a censoring threshold for every observation, regardless of whether or not that particular observation is censored. This information is used to remove the bias from the MLE estimates.
- 6) Because the usual residuals are not available for censored observations, there are some potential problems with using censored-data regression estimates:
  - a) It may be difficult to identify model lack of fit;
  - b) The familiar diagnostics are unavailable;
  - c) On the other hand, it may be advantageous to have estimates that are insensitive to the exact values of the smallest observations. In a certain sense, the AMLE procedures are more robust to certain types of model misspecification than are "complete data" methods (depending on what one wants to estimate). In fact, it has been suggested that deliberately censoring the left-hand tail of one's data may be an appropriate way to eliminate certain types of lack of fit. However, this needs further investigation.
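Note 1 says the program reports likelihood-ratio p-values rather than t-test p-values. Assuming each -2*L-ratio in the Tobit output compares the full model with one that omits the corresponding parameter, the statistic is referred to a chi-square distribution with one degree of freedom, whose upper tail probability is erfc(sqrt(x/2)). The helper below is a hypothetical sketch of that calculation, not code from AMLEREG.F77. Applied to the statistic 0.945 printed for the constant in Example 2, it gives about 0.331, in line with the printed 0.330940; the small difference is consistent with the statistic being printed to only three decimals.

```python
import math


def lr_pvalue(minus2_log_lr):
    """Approximate p-value for a likelihood-ratio statistic -2 ln(L0/L1)
    with one degree of freedom (one parameter dropped from the model).

    For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    if minus2_log_lr < 0:
        raise ValueError("statistic must be non-negative")
    return math.erfc(math.sqrt(minus2_log_lr / 2.0))


# Statistics printed for the constant and Column 1 in Example 2.
print(round(lr_pvalue(0.945), 3))   # 0.331
print(lr_pvalue(43.203) < 5e-7)     # True, i.e. printed as 0.000000
```

Tests that drop more than one parameter at a time would need a chi-square distribution with more degrees of freedom, which this sketch deliberately does not handle.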

Example 1: The Minitab Result

MTB > READ 'TEST4.DAT' C1-C6

100 ROWS READ

| ROW | C1 | C2 | C3 | C4 | C5 | C6 |
|---|---|---|---|---|---|---|
| 1 | -0.08635 | -0.09670 | 0.17524 | 0.24622 | 1.50243 | 0 |
| 2 | 0.42794 | -0.14250 | -0.84265 | 0.76608 | -1.60559 | 0 |
| 3 | -0.87022 | -0.48002 | 1.87135 | -0.91082 | -0.54038 | 0 |
| 4 | 0.02228 | 0.02247 | -0.72753 | -1.08101 | -2.64420 | 0 |

MTB > REGRESS C5 4 C1-C4

The regression equation is

C5 = - 0.009 + 0.994 C1 + 0.914 C2 + 0.873 C3 + 1.06 C4

| Predictor | Coef | Stdev | t-ratio | p |
|---|---|---|---|---|
| Constant | -0.0085 | 0.1042 | -0.08 | 0.935 |
| C1 | 0.9937 | 0.1082 | 9.19 | 0.000 |
| C2 | 0.9142 | 0.1297 | 7.05 | 0.000 |
| C3 | 0.8729 | 0.1078 | 8.10 | 0.000 |
| C4 | 1.0611 | 0.1281 | 8.28 | 0.000 |

s = 1.034 R-sq = 74.7% R-sq(adj) = 73.6%

Analysis of Variance

| SOURCE | DF | SS | MS | F | p |
|---|---|---|---|---|---|
| Regression | 4 | 299.229 | 74.807 | 69.94 | 0.000 |
| Error | 95 | 101.610 | 1.070 | | |
| Total | 99 | 400.839 | | | |

Continue?

| SOURCE | DF | SEQ SS |
|---|---|---|
| C1 | 1 | 124.059 |
| C2 | 1 | 40.872 |
| C3 | 1 | 60.911 |
| C4 | 1 | 73.387 |

Unusual Observations

| Obs. | C1 | C5 | Fit | Stdev.Fit | Residual | St.Resid |
|---|---|---|---|---|---|---|
| 14 | 1.38 | 0.369 | 0.388 | 0.420 | -0.019 | -0.02X |
| 16 | 0.73 | 0.080 | 2.488 | 0.225 | -2.408 | -2.39R |
| 41 | 0.45 | 5.604 | 2.832 | 0.236 | 2.772 | 2.75R |
| 46 | 1.51 | 2.716 | 0.438 | 0.277 | 2.278 | 2.29R |
| 54 | 0.17 | 4.582 | 1.759 | 0.172 | 2.823 | 2.77R |
| 59 | -0.05 | 3.736 | 1.604 | 0.183 | 2.132 | 2.09R |

R denotes an obs. with a large st. resid.

X denotes an obs. whose X value gives it large influence.
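The summary statistics in the MINITAB output follow directly from the analysis-of-variance table: s is the square root of the error mean square, R-sq is the ratio of the regression sum of squares to the total sum of squares, R-sq(adj) divides each sum of squares by its degrees of freedom before taking the ratio, and F is the ratio of the two mean squares. The short Python check below recomputes them from the printed values (the function name is hypothetical) and also confirms that the sequential sums of squares for C1 through C4 add up to the regression sum of squares.

```python
import math


def anova_summary(ss_reg, ss_err, df_reg, df_err):
    """Recompute the derived OLS summary statistics from an ANOVA table."""
    ss_tot = ss_reg + ss_err
    df_tot = df_reg + df_err
    ms_reg = ss_reg / df_reg
    ms_err = ss_err / df_err
    return {
        "MS_reg": ms_reg,
        "MS_err": ms_err,
        "F": ms_reg / ms_err,                         # F statistic
        "s": math.sqrt(ms_err),                       # residual standard deviation
        "R_sq": ss_reg / ss_tot,                      # coefficient of determination
        "R_sq_adj": 1.0 - ms_err / (ss_tot / df_tot), # adjusted for degrees of freedom
    }


# Values copied from the Example 1 ANOVA table.
stats = anova_summary(299.229, 101.610, 4, 95)
seq_ss = [124.059, 40.872, 60.911, 73.387]  # sequential SS for C1..C4
print(round(stats["F"], 2), round(stats["s"], 3))
print(round(100 * stats["R_sq"], 1), round(100 * stats["R_sq_adj"], 1))
print(round(sum(seq_ss), 3))
```

The recomputed values (F of about 69.94, s of about 1.034, R-sq of about 74.7% and R-sq(adj) of about 73.6%) match the printed output, and the sequential sums of squares total 299.229.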

Example 2: Tobit Results with Threshold Corresponding to 50% Censoring:

RVARES:

R TOBIT

TOBIT REGRESSION ANALYSIS PROGRAM USING

EITHER MLE OR ADJUSTED MLE ESTIMATORS

**** VERSION 90.09 ****

TIM COHN, SEPTEMBER 1990

ENTER THE INPUT FILE NAME (OR ?)

TEST4.DAT

ENTER NO. VARS.(<20) IN FILE

ENTER NO. OF EXPLANATORY VARIABLES IN MODEL

(NOT COUNTING A CONSTANT TERM)

ENTER THE COLUMN NO. OF PREDICTOR 1

ENTER THE COLUMN NO. OF PREDICTOR 2

ENTER THE COLUMN NO. OF PREDICTOR 3

ENTER THE COLUMN NO. OF PREDICTOR 4

IS THERE A CONSTANT IN THE MODEL? (Y/N)

ENTER THE COLUMN NO. OF RESPONSE VAR.

ENTER THE COLUMN NO. OF DET. LIMIT VAR.

NO. OBS. READ IN: 100

NUMBER OF COLUMNS: 6

FILE NAME: TEST4.DAT

MAXIMUM LIKELIHOOD ESTIMATES (TOBIT)

The regression equation is

C05 = -1.707E-01 + 1.162E+00*C01 + 9.432E-01*C02 + 9.222E-01*C03 + 1.133E+00*C04

| predictor | Coef | Stdev | -2*L-ratio | Approx-p |
|---|---|---|---|---|
| Constant | -1.706510E-01 | 1.868682E-01 | 0.945 | 0.330940 |
| Column 1 | 1.162288E+00 | 1.848199E-01 | 43.203 | 0.000000 |
| Column 2 | 9.432051E-01 | 1.756538E-01 | 27.898 | 0.000000 |
| Column 3 | 9.222313E-01 | 1.560587E-01 | 32.678 | 0.000000 |
| Column 4 | 1.133125E+00 | 1.828878E-01 | 36.061 | 0.000000 |

S = 1.111476E+00

LIKELIHOOD = 4.875195E-20

APPROX. DF: 36.4

ENTER 1 FOR AMLE ESTIMATES

ADJUSTED MAXIMUM LIKELIHOOD ESTIMATES

N.B. THESE ARE, AT PRESENT, EXPERIMENTAL

The regression equation is

C05 = -1.646E-01 + 1.157E+00*C01 + 9.392E-01*C02 + 9.227E-01*C03 + 1.131E+00*C04

| predictor | Coef | Stdev | -2*L-ratio | Approx-p |
|---|---|---|---|---|
| Constant | -1.646218E-01 | 1.939814E-01 | 0.945 | 0.330940 |
| Column 1 | 1.156585E+00 | 1.918551E-01 | 43.203 | 0.000000 |
| Column 2 | 9.391652E-01 | 1.823401E-01 | 27.898 | 0.000000 |
| Column 3 | 9.226745E-01 | 1.619991E-01 | 32.678 | 0.000000 |
| Column 4 | 1.130771E+00 | 1.898494E-01 | 36.061 | 0.000000 |

S = 1.153785E+00

LIKELIHOOD = 4.875195E-20

APPROX. DF: 36.4

**** STOP
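The program asks for a detection-limit column because every observation needs its own threshold, censored or not. The sketch below shows how a censoring flag could be derived from such a column. It assumes that an observation is censored when its response falls below its threshold and that C6 was the column given as the detection limit; the typed responses to the prompts are not shown in the transcript, so the second point is an assumption. It uses only the four rows listed in Example 1, so the fraction it reports says nothing about the full 100-row file. The function name is hypothetical.

```python
def censoring_summary(response, threshold):
    """Flag observations whose response lies below their own censoring
    threshold and return (flags, fraction_censored)."""
    if len(response) != len(threshold):
        raise ValueError("need one threshold per observation")
    flags = [y < c for y, c in zip(response, threshold)]
    return flags, sum(flags) / len(flags)


# First four rows printed in Example 1: C5 (response) and C6 (assumed threshold).
c5 = [1.50243, -1.60559, -0.54038, -2.64420]
c6 = [0.0, 0.0, 0.0, 0.0]
flags, frac = censoring_summary(c5, c6)
print(flags, frac)  # [False, True, True, True] 0.75
```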

## Driver program and main subroutines

## dhumsl subroutine

## imslfake subroutines

## tacit subroutines

## Test Data Set

## References

- Cohn, T., "Adjusted Maximum Likelihood Estimation of the Moments of Lognormal Populations from Type I Censored Samples," U.S. Geological Survey Open-File Report 88-350, 34 pp., 1988.
- Draper, N. R., and H. Smith, Applied Regression Analysis, John Wiley and Sons, New York, 1981.
- Judge, G. E., W. E. Griffiths, R. C. Hill, and T-C. Lee, The Theory and Practice of Econometrics, John Wiley and Sons, New York, 1980.

This page last modified on 01 February 2001

Please email comments or suggestions to Tim Cohn at: software@timcohn.com