What is the "7 Parameter Model"?
Estimator can fit a variety of models. However, for estimating nutrient loads a 7-parameter model seems to work well (see Cohn et al., 1992):
\[ \ln[L] = b_0 + b_1 \ln[Q] + b_2 (\ln[Q])^2 + b_3 T + b_4 T^2 + b_5 \sin[2\pi T] + b_6 \cos[2\pi T] + e \tag{1} \]
where
L is the nutrient load
Q is the daily discharge
T is time, expressed in years
e is the model error
The parameters b1 and b2 in equation (1) correspond to variability related to flow dependence, the next pair (b3 and b4) correspond to time trends, and the third pair (b5 and b6) are used to fit a first-order Fourier series to the seasonal component of variability.
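As a rough illustration of equation (1), the sketch below builds the seven explanatory terms for one day and evaluates the log of the load. It is not Estimator's code: the function names, the coefficient values, and the option to center ln[Q] and T about fixed values are all assumptions made for the example.

```python
import math

def model_terms(q, t, ln_q_center=0.0, t_center=0.0):
    """Return the 7 explanatory terms of equation (1).

    q is daily discharge, t is decimal time in years. The centering
    arguments are an assumption of this sketch, not part of equation (1).
    """
    lq = math.log(q) - ln_q_center
    tc = t - t_center
    return [
        1.0,                        # constant (b0)
        lq,                         # flow dependence (b1)
        lq ** 2,                    # flow dependence (b2)
        tc,                         # time trend (b3)
        tc ** 2,                    # time trend (b4)
        math.sin(2 * math.pi * t),  # seasonality (b5)
        math.cos(2 * math.pi * t),  # seasonality (b6)
    ]

def log_load(coeffs, q, t, ln_q_center=0.0, t_center=0.0):
    """Evaluate ln[L] from equation (1), with the error term e set to zero."""
    terms = model_terms(q, t, ln_q_center, t_center)
    return sum(b * x for b, x in zip(coeffs, terms))

# Hypothetical coefficients b0..b6, chosen only for illustration.
example_coeffs = [7.0, 1.1, -0.05, -0.1, -0.02, -0.9, -0.05]
example_ln_load = log_load(example_coeffs, q=math.e, t=0.0)
```

Note that simply taking exp() of the result is a biased estimate of the load itself; handling that retransformation is part of what the statistical approach in Cohn et al. (1992) addresses, and it is not attempted here.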
Where is this model appropriate?
The model has been tested in a number of watersheds. Generally, it should work in large watersheds (greater than 100 square miles).
Why use a 7 Parameter Regression Model?
The 7 parameter model is the simplest model that captures the main sources of variability for most nutrient species. The first coefficient is a constant; the next two address variability connected with discharge; the next two deal with time trends; the final two capture annual seasonality. Simpler models are appropriate in some circumstances; the 7 parameter model seems to work for almost all nutrient species for large (greater than 100 square miles) watersheds.
An "official USGS" version of Estimator (which will be called Loadest3) is currently (March 2002) being developed by Rob Runkle. While the new version will retain Estimator's statistical approach, it promises to have a much-improved interface. Stay tuned!
What data does Estimator use?
Estimator uses a daily value discharge record (streamflow) and a set of unit value nutrient ("water quality") data.
What is the required format for the discharge record?
It is the USGS ADAPS "2 and 3" 80-character card format. This is primitive, but it works. Conversion software is available that will produce this file format from other formats.
The water quality file?
This is stored in the USGS QWDATA "PSTAT" format. Again, conversion software is available.
How much data is needed to run Estimator?
Estimator requires a continuous record of daily discharges, which must cover both the time period of the calibration data set (at least those data that are to be used) and the period for which load estimates are desired. The more water quality data, the more precise the load estimates will be. Estimator works well with 25 water quality samples each year, half at high flow and half uniformly distributed over the course of the year (bi-weekly sampling).
Can you restrict the data used for calibration to a specific time interval?
Yes, Estimator allows the user to specify a time interval for the calibration data set. This can be useful for many purposes. Where watersheds are changing rapidly (due to development, phosphate bans, agricultural practices, etc.), it may be desirable to limit the calibration data set to data collected close to the period for which estimates are wanted.
Does it make sense to use a "moving window" of calibration data?
Sometimes. In work done related to Chesapeake Bay, an asymmetrical 10-year moving window was used. Because of management needs, preliminary load estimates were given for the current year (year "10") based on data collected during years "1" through "10". These were then made final one year later by fitting the model to a different 10-year window: the last 9 years of data plus the year of new data. Final estimates (which are for year "9" of the new 10-year window) are made using data from years 1 through 10 of that window.
In addition to meeting management needs, there was a statistical justification for this: the quadratic approximation to the time trend in the 7-parameter model is arguably not quite right, and it is desirable to consider the impact of a higher-order series. However, any omitted third-order term will have roots at {1.13, 5.0, 8.87} (assuming uniform sampling, etc.; this is a consequence of orthogonal polynomials), which means the omitted term will have minimal impact in years "2", "5", "6" and "9".
Figure: Third Order Orthogonal Polynomial on (0,10). Note that roots occur at {1.13, 5, 8.87}.
Thus, by estimating in year "9", the model is nearly exact for any third-order time trend.
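The quoted roots match those of the third-order Legendre polynomial rescaled to the interval (0,10). The sketch below checks that arithmetic and lists the calendar years of the window used for a final estimate. The function names are hypothetical, and the use of the Legendre polynomial is an assumption based on the "uniform sampling" remark above.

```python
import math

def cubic_roots(a, b):
    """Roots of the third-order Legendre polynomial rescaled from (-1, 1) to (a, b).

    On (-1, 1) the polynomial is P3(x) = (5x^3 - 3x) / 2,
    with roots 0 and +/- sqrt(3/5).
    """
    mid = (a + b) / 2.0
    half = (b - a) / 2.0
    r = math.sqrt(3.0 / 5.0)
    return [mid - half * r, mid, mid + half * r]

def window_year(position):
    """Year label "k" is taken to cover the interval (k-1, k] of the window."""
    return math.ceil(position)

def final_window(estimate_year):
    """Calendar years of the 10-year window in which estimate_year is year "9"."""
    return list(range(estimate_year - 8, estimate_year + 2))

roots = cubic_roots(0.0, 10.0)
rounded_roots = [round(x, 2) for x in roots]
root_years = [window_year(x) for x in roots]
```

The middle root sits exactly on the boundary between years "5" and "6", which is why both appear in the list of years with minimal impact. For a final estimate of 1999, for example, `final_window(1999)` gives the years 1991 through 2000, with 1999 in ninth position.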
NUMBER  NAME                 CENTER    COEFF.    S.D.      T        P     COEFF.    S.D.      T        P
1       CONSTANT               .000   -3.3313   .2376  *****  .000000    6.8562   .2376  28.86  .000000
2       LOG-FLOW              9.293     .0802   .0959    .84  .394928    1.0802   .0959  11.26  .000000
3       LOG-FLOW SQUARED      9.293    -.0539   .0648   -.83  .389298    -.0539   .0648   -.83  .389298
4       DEC_TIME           1984.513    -.1191   .0276  -4.32  .000011    -.1191   .0276  -4.32  .000011
5       DEC_TIME SQUARED   1984.513    -.0188   .0113  -1.67  .084545    -.0188   .0113  -1.67  .084545
6       SIN(2*PI*T)            .000    -.8617   .1711  -5.04  .000001    -.8617   .1711  -5.04  .000001
7       COS(2*PI*T)            .000    -.0537   .1500   -.36  .715957    -.0537   .1500   -.36  .715957
S                                     1.23106                           1.23106
R**2 (%)                                 26.6                              52.8
N       198
M       191
NCENS   46
Take a look at an introductory linear model text, such as Draper and Smith.
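One relationship that such a text explains, and that can be checked against the regression output above, is that the T column is the coefficient divided by its standard deviation. The short sketch below repeats that arithmetic for three rows of the second set of columns. The variable and function names are hypothetical, and small differences can arise because the printed coefficients are rounded.

```python
# (COEFF., S.D., printed T) copied from the regression output above.
printed_rows = {
    "LOG-FLOW": (1.0802, 0.0959, 11.26),
    "DEC_TIME": (-0.1191, 0.0276, -4.32),
    "SIN(2*PI*T)": (-0.8617, 0.1711, -5.04),
}

def t_ratio(coeff, sd):
    """T statistic for a coefficient: the estimate divided by its standard deviation."""
    return coeff / sd

# Recompute T for each row and compare it with the printed value.
recomputed = {
    name: round(t_ratio(coeff, sd), 2)
    for name, (coeff, sd, _printed_t) in printed_rows.items()
}
```

A large absolute T (and a correspondingly small P) suggests the term contributes to the fit; a value near zero, as for COS(2*PI*T) in the output above, suggests it contributes little.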
COMPUTED LOADS [KG/DAY] OR [G/DAY]
YEAR MO.   LOAD:P00671   95% CONFIDENCE INTERVAL   S.E.        S.E. PRED.
1980 CALENDAR YEAR
1          2933.4050     1145.7127   6231.6363     708.2666    1328.8673
...
12          836.0575      327.0023   1774.4992     204.4908     378.1788
Q1 1980    2018.7822     1043.0113   3544.7960     416.2953     645.6858
...
CY 1980    1580.3346     1061.0595   2267.0815     232.5623     308.9721
WY 1980    3538.1675     2020.9621   5766.6898     744.3518     963.5743
The first column (Time Intervals): This defines the time interval being considered: Numbers 1 to 12 refer to the months January to December; "Q" refers to quarters, which are here defined so that Q1 is January-March; Q2 is April-June, etc; CY refers to the calendar year (January-December), while WY refers to the "Water Year" (October-September). The 1999 water year begins on 1 October 1998 and extends to 30 September 1999.
The second column (Load): This gives the estimated load for the time interval. All loads are reported as daily average values for the time interval, and the units are either kilograms/day or grams/day depending on whether the concentration data was reported in milligrams/liter or micrograms/liter.
The third and fourth columns (Confidence Intervals): These provide approximate 95% confidence intervals for the true load. That is, if one uses these confidence intervals at 100 sites, at about 5 of the sites the estimated confidence intervals will not contain the true (unknown) loads.
The fifth column (SE): This gives the estimated standard error of the load estimator. This captures variability related to our uncertainty in the parameters of the model.
The sixth column (SEP): This gives the estimated standard error of prediction of the load estimate. This captures variability in the system as well as variability related to our uncertainty in the parameters of the model.
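The time-interval labels in the first column can be reproduced with a little date arithmetic. The sketch below assigns a date to its quarter and water year as defined above, and averages a set of daily loads by water year. The function names and the sample values are hypothetical, and this is not how Estimator itself is implemented.

```python
from collections import defaultdict
from datetime import date

def quarter(d):
    """Quarter number, with Q1 = January-March, Q2 = April-June, etc."""
    return (d.month - 1) // 3 + 1

def water_year(d):
    """Water year runs October-September and is named for the year in which it ends."""
    return d.year + 1 if d.month >= 10 else d.year

def mean_daily_load_by_water_year(daily_loads):
    """Average daily loads within each water year.

    daily_loads is an iterable of (date, load) pairs; the result maps
    water year -> mean daily load, in the same units as the input.
    """
    groups = defaultdict(list)
    for d, load in daily_loads:
        groups[water_year(d)].append(load)
    return {wy: sum(vals) / len(vals) for wy, vals in groups.items()}

# Made-up daily loads, for illustration only.
sample = [
    (date(1998, 9, 30), 10.0),
    (date(1998, 10, 1), 20.0),
    (date(1999, 9, 30), 40.0),
]
sample_means = mean_daily_load_by_water_year(sample)
```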
Note that each row corresponds to an observation used to calibrate the regression model.
Column   Definition of Variable
------   ----------------------
1        STATION ID
2        YEAR
3        MONTH (1=JANUARY, 2=FEB...)
4        DAY
5        TIME (2399=1 MINUTE BEFORE MIDNIGHT)
6        IGNORE (USED TO BE INST. Q, BUT THAT WAS DROPPED)
7        DAILY DISCHARGE [CFS]
8        MEASURED CONCENTRATION (REAL UNITS, MG/L)
9        ESTIMATED CONCENTRATION (BY ESTIMATOR)
10       STANDARD ERROR OF ESTIMATED CONCENTRATION
11       ESTIMATED LOAD (REAL UNITS, KG/DAY)
12       STANDARD ERROR OF ESTIMATED LOAD
THE NEXT FEW RELATE TO THE FIT OF THE REGRESSION MODEL
13...    PREDICTOR VARIABLES (USUALLY TRANSFORMS OF Q AND T)
         ASSUME THERE ARE K EXPLANATORY VARIABLES IN MODEL
13+K     RESPONSE VARIABLE (LOG UNITS)
14+K     CENSORING THRESHOLD (LOG UNITS)
15+K     RESIDUAL (LOG UNITS)
16+K     PREDICTED VALUE (LOG UNITS)
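The units in columns 7, 8 and 11 are related by a fixed conversion factor. The sketch below shows only that unit arithmetic: a discharge in cubic feet per second multiplied by a concentration in milligrams/liter gives a load in kilograms/day. The function and constant names are hypothetical, and this simple product is not the regression estimate that Estimator reports in column 11.

```python
LITERS_PER_CUBIC_FOOT = 28.316846592  # 0.3048 m cubed, expressed in liters
SECONDS_PER_DAY = 86400
MG_PER_KG = 1.0e6

def load_kg_per_day(discharge_cfs, conc_mg_per_l):
    """Convert discharge [cfs] times concentration [mg/L] to a load [kg/day]."""
    liters_per_day = discharge_cfs * LITERS_PER_CUBIC_FOOT * SECONDS_PER_DAY
    return liters_per_day * conc_mg_per_l / MG_PER_KG

# Load carried by 1 cfs at 1 mg/L, i.e. the conversion factor itself.
factor = load_kg_per_day(1.0, 1.0)
```

The same numeric factor converts cfs times micrograms/liter to grams/day, which is why the load units follow the concentration units.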