Tobit and Adjusted Maximum Likelihood Estimation:

Documentation for FORTRAN Program


by

Tim Cohn
USGS National Center MS:107
Reston, VA 22092
703/648-5711



Introduction

It is sometimes of interest to fit multiple linear regression models to data. The conventional approach is to use ordinary least squares (OLS):
 
$$B = (X'X)^{-1} X'Y$$

$$S^2 = \frac{(Y - XB)'(Y - XB)}{n - k}$$


where Y is an n × 1 vector of responses, X is an n × k matrix of explanatory variables (including a column of ones if the model has a constant), B is a k × 1 vector of parameter estimates, and S^2 is an unbiased estimate of the residual variance (the residual mean square). OLS has several advantages: it is easy to apply, and its estimates have well-understood properties [see Draper and Smith, 1981]. In MINITAB, OLS is implemented by the REGRESS command.
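As a sketch of what these formulas compute, the pure-Python functions below form X'X and X'Y and solve the normal equations by Gaussian elimination, which is numerically preferable to inverting X'X explicitly. The names `ols` and `solve` are hypothetical (they are not part of TOBIT or MINITAB), and the sketch assumes X already contains a column of ones when a constant is wanted.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b]; copies the rows so the caller's lists are not modified.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from the rows below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(X, y):
    """Return (B, S2) for y = X B + e, where X is a list of n rows of k values."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    B = solve(xtx, xty)  # B = (X'X)^{-1} X'Y, computed without forming the inverse
    resid = [y[i] - sum(X[i][p] * B[p] for p in range(k)) for i in range(n)]
    S2 = sum(r * r for r in resid) / (n - k)  # dividing by n - k makes S2 unbiased
    return B, S2


B, S2 = ols([[1, 0], [1, 1], [1, 2], [1, 3]], [1, 3, 4, 8])
print(B, S2)
```

For the four points (0, 1), (1, 3), (2, 4), (3, 8) fitted with a constant, this gives B ≈ (0.7, 2.2) and S^2 ≈ 0.9, since the residual sum of squares is 1.8 and n - k = 2.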

However, environmental data are sometimes censored: some observations are reported only as "less than" an analytical detection limit. This situation has been addressed extensively in the statistical and economics literature. Perhaps the most widely accepted method is the Tobit estimator, named after the economist James Tobin. The Tobit estimator is simply a maximum likelihood estimator, and its properties are well understood [see Chapter 18, Judge et al., 1980]. However, Cohn [1988], among others, has observed that the Tobit estimator can be substantially biased in some cases. Although it can be proven that the bias cannot be eliminated entirely, one can easily derive an estimator that is unbiased to first order. This is called an Adjusted Maximum Likelihood Estimator, or AMLE. A simple FORTRAN program, AMLEREG.F77, implements both the Tobit estimator (MLE) and the AMLE.
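The Tobit estimator maximizes a censored-data log-likelihood. The sketch below illustrates that likelihood; it is not the code in AMLEREG.F77. It assumes normally distributed errors with standard deviation sigma and left-censoring at a detection limit, and the names `tobit_loglik`, `norm_logpdf`, and `norm_logcdf` are hypothetical. Uncensored observations contribute the normal log-density of their residual; censored observations contribute the log-probability of falling below the detection limit. Maximizing this function over beta and sigma would give Tobit (MLE) estimates; the AMLE bias adjustment is not shown.

```python
import math


def norm_logpdf(z):
    """Log of the standard normal density at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_logcdf(z):
    """Log of the standard normal CDF, using Phi(z) = erfc(-z / sqrt(2)) / 2.

    Adequate for moderate z; for very negative z the CDF underflows to 0 and log fails.
    """
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def tobit_loglik(beta, sigma, X, y, dl):
    """Log-likelihood of y = X beta + e, e ~ N(0, sigma^2), left-censored at dl.

    Observation i is treated as censored when dl[i] > y[i], the rule TOBIT uses.
    X is a list of rows; include a 1.0 in each row for a constant term.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for xi, yi, di in zip(X, y, dl):
        mu = sum(a * b for a, b in zip(xi, beta))  # fitted value x_i' beta
        if di > yi:
            # Censored: all that is known is that the value lies below di.
            total += norm_logcdf((di - mu) / sigma)
        else:
            # Uncensored: normal density of the residual, rescaled by 1/sigma.
            total += norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return total
```

For a censored observation the reported y value is not used, only its detection limit. With no censored observations the function reduces to the ordinary normal log-likelihood, whose maximizer for beta is the OLS estimate B.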

Using TOBIT

A driver program called TOBIT implements the censored-data equivalent of the MINITAB REGRESS command. It is interactive and prints to the screen much of the usual MINITAB output. It requires an input data file containing, in free-format ASCII, columns for the explanatory variables (X), the response (Y), and the detection limit (YD). An observation is treated as censored if its detection limit is greater than its observed value.
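A minimal sketch of the input step, assuming a plain whitespace-separated file (FORTRAN free-format input also accepts commas and other forms, which this ignores) and the column layout used in the examples: predictors in columns 1 to 4, the response in column 5, and the detection limit in column 6. The function `read_tobit_columns` and the string `SAMPLE` are hypothetical; the sample rows are the first four rows of the test data listed in Example 1.

```python
def read_tobit_columns(text, x_cols, y_col, dl_col):
    """Split each non-blank line on whitespace; column numbers are 1-based, as in TOBIT's prompts."""
    X, y, dl = [], [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        values = [float(f) for f in fields]
        X.append([values[c - 1] for c in x_cols])
        y.append(values[y_col - 1])
        dl.append(values[dl_col - 1])
    # TOBIT's rule: an observation is censored when its detection limit exceeds the observed value.
    censored = [d > v for v, d in zip(y, dl)]
    return X, y, dl, censored


SAMPLE = """\
-0.08635 -0.09670  0.17524  0.24622  1.50243 0
 0.42794 -0.14250 -0.84265  0.76608 -1.60559 0
-0.87022 -0.48002  1.87135 -0.91082 -0.54038 0
 0.02228  0.02247 -0.72753 -1.08101 -2.64420 0
"""

X, y, dl, censored = read_tobit_columns(SAMPLE, x_cols=[1, 2, 3, 4], y_col=5, dl_col=6)
print(censored)  # [False, True, True, True]
```

With a detection limit of 0 in column 6, only the first row (response 1.50243) is uncensored; the three negative responses fall below the limit.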
 

Notes

Some Examples

The following examples analyze a data set of 100 observations with four explanatory variables and a constant. The first example shows the usual MINITAB output; the second shows TOBIT results with the detection threshold set so that 50% of the observations are censored. Note that commands typed by the user are in italics.

Example 1: The Minitab Result

MTB > READ 'TEST4.DAT' C1-C6
100 ROWS READ

ROW C1 C2 C3 C4 C5 C6
1 -0.08635 -0.09670 0.17524 0.24622 1.50243 0
2 0.42794 -0.14250 -0.84265 0.76608 -1.60559 0
3 -0.87022 -0.48002 1.87135 -0.91082 -0.54038 0
4 0.02228 0.02247 -0.72753 -1.08101 -2.64420 0
. . .

MTB > REGRESS C5 4 C1-C4

The regression equation is
C5 = - 0.009 + 0.994 C1 + 0.914 C2 + 0.873 C3 + 1.06 C4
Predictor Coef Stdev t-ratio p
Constant -0.0085 0.1042 -0.08 0.935
C1 0.9937 0.1082 9.19 0.000
C2 0.9142 0.1297 7.05 0.000
C3 0.8729 0.1078 8.10 0.000
C4 1.0611 0.1281 8.28 0.000

s = 1.034 R-sq = 74.7% R-sq(adj) = 73.6%

Analysis of Variance

SOURCE DF SS MS F p
Regression 4 299.229 74.807 69.94 0.000
Error 95 101.610 1.070
Total 99 400.839

Continue?
SOURCE DF SEQ SS
C1 1 124.059
C2 1 40.872
C3 1 60.911
C4 1 73.387

Unusual Observations
Obs. C1 C5 Fit Stdev.Fit Residual St.Resid
14 1.38 0.369 0.388 0.420 -0.019 -0.02X
16 0.73 0.080 2.488 0.225 -2.408 -2.39R
41 0.45 5.604 2.832 0.236 2.772 2.75R
46 1.51 2.716 0.438 0.277 2.278 2.29R
54 0.17 4.582 1.759 0.172 2.823 2.77R
59 -0.05 3.736 1.604 0.183 2.132 2.09R

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.
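The summary statistics in this output follow from the analysis-of-variance table. The short Python check below is our own arithmetic, not MINITAB code; it recomputes them from the printed sums of squares with n = 100 observations and k = 5 parameters: R-sq = SS(Regression)/SS(Total), R-sq(adj) = 1 - MS(Error)/(SS(Total)/(n - 1)), s = sqrt(MS(Error)), and F = MS(Regression)/MS(Error).

```python
import math

# Values copied from the MINITAB output above.
n, k = 100, 5                      # 100 observations; 4 predictors plus a constant
ss_reg, ss_err, ss_tot = 299.229, 101.610, 400.839
seq_ss = [124.059, 40.872, 60.911, 73.387]

ms_reg = ss_reg / (k - 1)          # regression mean square, DF = 4
ms_err = ss_err / (n - k)          # error mean square, DF = 95 (this is S^2)
r_sq = ss_reg / ss_tot
r_sq_adj = 1 - ms_err / (ss_tot / (n - 1))
s = math.sqrt(ms_err)
f_stat = ms_reg / ms_err

print(f"R-sq = {100 * r_sq:.1f}%  R-sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"s = {s:.3f}  F = {f_stat:.2f}")
print(f"sum of sequential SS = {sum(seq_ss):.3f}")
```

The printed values reproduce MINITAB's R-sq = 74.7%, R-sq(adj) = 73.6%, s = 1.034, and F = 69.94. The sequential sums of squares add up to the regression sum of squares, 299.229, because each one records the reduction in the error sum of squares as C1, C2, C3, and C4 are entered in turn.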
 

Example 2: TOBIT Results with a Threshold Giving 50% Censoring

RVARES:
R TOBIT
TOBIT REGRESSION ANALYSIS PROGRAM USING
EITHER MLE OR ADJUSTED MLE ESTIMATORS

**** VERSION 90.09 ****
TIM COHN, SEPTEMBER 1990

ENTER THE INPUT FILE NAME (OR ?)
TEST4.DAT
ENTER NO. VARS.(<20) IN FILE

ENTER NO. OF EXPLANATORY VARIABLES IN MODEL
(NOT COUNTING A CONSTANT TERM)

ENTER THE COLUMN NO. OF PREDICTOR 1

ENTER THE COLUMN NO. OF PREDICTOR 2

ENTER THE COLUMN NO. OF PREDICTOR 3

ENTER THE COLUMN NO. OF PREDICTOR 4

IS THERE A CONSTANT IN THE MODEL? (Y/N)

ENTER THE COLUMN NO. OF RESPONSE VAR.

ENTER THE COLUMN NO. OF DET. LIMIT VAR.

NO. OBS. READ IN: 100
NUMBER OF COLUMNS: 6
FILE NAME: TEST4.DAT

MAXIMUM LIKELIHOOD ESTIMATES (TOBIT)

The regression equation is

C05 = -1.707E-01 + 1.162E+00*C01 + 9.432E-01*C02 + 9.222E-01*C03 +
1.133E+00*C04
predictor Coef Stdev -2*L-ratio Approx-p
Constant -1.706510E-01 1.868682E-01 0.945 0.330940
Column 1 1.162288E+00 1.848199E-01 43.203 0.000000
Column 2 9.432051E-01 1.756538E-01 27.898 0.000000
Column 3 9.222313E-01 1.560587E-01 32.678 0.000000
Column 4 1.133125E+00 1.828878E-01 36.061 0.000000

S = 1.111476E+00

LIKELIHOOD = 4.875195E-20
APPROX. DF: 36.4

ENTER 1 FOR AMLE ESTIMATES

ADJUSTED MAXIMUM LIKELIHOOD ESTIMATES
N.B. THESE ARE, AT PRESENT, EXPERIMENTAL

The regression equation is

C05 = -1.646E-01 + 1.157E+00*C01 + 9.392E-01*C02 + 9.227E-01*C03 + 1.131E+00*C04
predictor Coef Stdev -2*L-ratio Approx-p
Constant -1.646218E-01 1.939814E-01 0.945 0.330940
Column 1 1.156585E+00 1.918551E-01 43.203 0.000000
Column 2 9.391652E-01 1.823401E-01 27.898 0.000000
Column 3 9.226745E-01 1.619991E-01 32.678 0.000000
Column 4 1.130771E+00 1.898494E-01 36.061 0.000000

S = 1.153785E+00

LIKELIHOOD = 4.875195E-20
APPROX. DF: 36.4

**** STOP
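The Approx-p column in the TOBIT output is consistent with comparing -2*L-ratio to a chi-square distribution with one degree of freedom, the usual likelihood-ratio test for dropping a single coefficient; this is our reading of the printed numbers, not something stated by the program. For one degree of freedom the upper-tail probability has the closed form P(chi^2 > x) = erfc(sqrt(x / 2)), so the check needs only Python's `math` module. The helper name `chi2_1df_pvalue` is hypothetical.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    If Z ~ N(0, 1), then P(Z^2 > x) = 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2)).
    """
    if stat < 0:
        raise ValueError("test statistic must be non-negative")
    return math.erfc(math.sqrt(stat / 2.0))


# -2*L-ratio values printed by TOBIT for the constant and columns 1-4.
for name, stat in [("Constant", 0.945), ("Column 1", 43.203), ("Column 2", 27.898),
                   ("Column 3", 32.678), ("Column 4", 36.061)]:
    print(f"{name:9s} {stat:7.3f} {chi2_1df_pvalue(stat):.6f}")
```

The constant's p-value comes out at about 0.331, matching TOBIT's 0.330940 to within the rounding of the printed statistic (0.945); the other p-values round to 0.000000, as in the output.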
 

Tobit2001 Compiled for Macintosh (with FORTRAN Source and Test Data)

FORTRAN Source Code (Non-Mac users should download all four routines)

Driver program and main subroutines

dhumsl subroutine

imslfake subroutines

tacit subroutines

Test Data Set (to generate results above)


References


This page last modified on 01 February 2001
Please email comments or suggestions to Tim Cohn at: software@timcohn.com